Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions
On Sun, Sep 3, 2017 at 3:26 PM, Thomas Graf wrote:
> On 1 September 2017 at 04:30, William Tu via iovisor-dev wrote:
> > This patch adds two BPF conntrack helper functions, bpf_ct_lookup()
> > and bpf_ct_commit(), to enable the possibility of a BPF stateful firewall.
> >
> > There are two ways to implement BPF conntrack. One way is to not
> > rely on helpers but implement the conntrack state table using BPF
> > maps. So conntrack is basically another BPF program extracting
> > the tuples and looking up/updating its map. Currently the Cilium
> > project has implemented it this way.
>
> This helper looks great. We implemented our own conntrack table for
> two reasons:
> 1. We wanted the option of per-endpoint tables, and netfilter
>    conntrack had already switched back to a global table.
> 2. The conntrack helper was not available back then and we wanted
>    a backwards-compatible alternative.
>
> We are definitely interested in using this as well once it is merged.
> Are you maintaining a development branch somewhere? We would love to
> test it with Cilium.

Thanks for the feedback! I have pushed the branch below; I will work on
Daniel's feedback and update later.

https://github.com/williamtu/net-next/commits/bpfct

William
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev
Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions
On 09/01/2017 01:30 PM, William Tu wrote:
> This patch adds two BPF conntrack helper functions, bpf_ct_lookup()
> and bpf_ct_commit(), to enable the possibility of a BPF stateful firewall.
>
> There are two ways to implement BPF conntrack. One way is to not rely on
> helpers but implement the conntrack state table using BPF maps. So
> conntrack is basically another BPF program extracting the tuples and
> looking up/updating its map. Currently the Cilium project has implemented
> it this way.
>
> Instead, this patch proposes to reuse the existing netfilter conntrack.
> By creating two helpers, bpf_skb_ct_lookup() and bpf_skb_ct_commit(), a
> BPF program can simply look up the conntrack state, based on the
> conntrack zone, or commit a new connection into the netfilter conntrack
> table for a later lookup hit. In the case where a fragmented packet is
> received, the helper handles it by submitting the packet to ip_defrag()
> and returning TC_ACT_STOLEN if it is queued.
>
> A simple example is to allow PING packets into your host only when your
> host initiated the ping. The first ping from your host to the outside
> will be tracked in the conntrack table as new, and the echo reply coming
> into your host will make it established, so subsequent pings will pass
> the firewall. (see tcbpf3_kern.c and test_conntrack_bpf.h)
>
> For now the helper is attached at TC and requires an skb to interact with
> netfilter conntrack, so it does not apply to XDP programs. In the future,
> I'd like to test/implement IP defragmentation, ALG, and NAT, but I'd like
> to have some early feedback first. Thanks!
>
> Signed-off-by: William Tu
> Cc: Joe Stringer
> Cc: Daniel Borkmann
> Cc: Alexei Starovoitov
[...]

Just curious, were you able to do some performance testing with the helper?

[...]
> 	FN(unspec), \
> @@ -638,6 +650,8 @@ union bpf_attr {
> 	FN(redirect_map), \
> 	FN(sk_redirect_map), \
> 	FN(sock_map_update), \
> +	FN(skb_ct_lookup), \
> +	FN(skb_ct_commit), \
>
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>  * function eBPF program intends to call
> @@ -737,6 +751,23 @@ struct bpf_tunnel_key {
> 	__u32 tunnel_label;
> };
>
> +#define BPF_CT_LABELS_LEN 16
> +#define BPF_F_CT_DEFRAG (1ULL << 0)
> +
> +struct bpf_conntrack_info {
> +	__u32 ct_state;
> +	__u16 family;
> +	__u16 zone_id;

Have you considered also supporting zone directions, so you can have
overlapping tuples? Presumably if you have some other interaction with CT
where they are used and you do the lookup here with dir as
NF_CT_DEFAULT_ZONE_DIR, then CT won't find them, if I remember correctly.

> +	/* for conntrack mark */
> +	__u32 mark_value;
> +	__u32 mark_mask;
> +
> +	/* for conntrack label */
> +	__u8 ct_label_value[16];
> +	__u8 ct_label_mask[16];
> +};
> +
> /* Generic BPF return codes which all BPF program types may support.
>  * The values are binary compatible with their TC_ACT_* counter-part to
>  * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c6a37fe0285b..906518043f19 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -56,6 +56,11 @@
>  #include
>  #include
>  #include
> +#include
> +#include
> +#include
> +#include
> +#include
>
> /**
>  * sk_filter_trim_cap - run a packet through a socket filter
> @@ -2674,6 +2679,221 @@ static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
> 	return 0;
> }
>
> +/* Lookup the conntrack with bpf_conntrack_info.
> + * If found, receive the conntrack state, mark, and label
> + */
> +BPF_CALL_3(bpf_skb_ct_lookup, struct sk_buff *, skb,
> +	   struct bpf_conntrack_info *, info, u64, flags)
> +{
> +	struct net *net = dev_net(skb->dev);
> +	enum ip_conntrack_info ctinfo;
> +	struct nf_conntrack_zone zone;
> +	struct nf_conn_labels *cl;
> +	struct nf_conn *ct, *tmpl;
> +	int err, nh_ofs;
> +	struct iphdr *iph;

Please err out on invalid flags, otherwise new flags cannot be added
anymore since they would break existing programs.

> +	/* Conntrack expects L3 packet */
> +	nh_ofs = skb_network_offset(skb);
> +	skb_pull_rcsum(skb, nh_ofs);
> +	iph = ip_hdr(skb);

What happens if you call this function with an IPv6 or some other
non-IPv4 pkt and go below into IPv4 defrag, for example? We would also
need to handle this for IPv6 in general, and bail out if skb->protocol
is neither IPv4 nor IPv6.

> +	if (ip_is_fragment(iph) && (flags & BPF_F_CT_DEFRAG)) {
> +		/* XXX: not test yet */
> +		enum ip_defrag_users user =
> +			IP_DEFRAG_CONNTRACK_IN + info->zone_id;
> +
> +		memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));

The cb[] is already used in this layer by tc and BPF, you would somewhat
need to save / restore the content of the cb[] before returning back to
the BPF program, or make sure the skb never goes back to either of t
Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions
On 1 September 2017 at 04:30, William Tu via iovisor-dev wrote:
> This patch adds two BPF conntrack helper functions, bpf_ct_lookup()
> and bpf_ct_commit(), to enable the possibility of a BPF stateful firewall.
>
> There are two ways to implement BPF conntrack. One way is to not
> rely on helpers but implement the conntrack state table using BPF
> maps. So conntrack is basically another BPF program extracting
> the tuples and looking up/updating its map. Currently the Cilium
> project has implemented it this way.

This helper looks great. We implemented our own conntrack table for
two reasons:
1. We wanted the option of per-endpoint tables, and netfilter
   conntrack had already switched back to a global table.
2. The conntrack helper was not available back then and we wanted
   a backwards-compatible alternative.

We are definitely interested in using this as well once it is merged.
Are you maintaining a development branch somewhere? We would love to
test it with Cilium.
Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions
Hi Alexei,

Thanks, I do see the lockdep complaint now. I will switch to using
GFP_ATOMIC.

William

On Sat, Sep 2, 2017 at 8:20 AM, William Tu wrote:
> On Fri, Sep 1, 2017 at 10:53 PM, Alexei Starovoitov
> <alexei.starovoi...@gmail.com> wrote:
>> On Fri, Sep 1, 2017 at 4:30 AM, William Tu wrote:
>> > +
>> > +	/* TODO: conntrack expectation */
>> > +
>> > +	nf_ct_zone_init(&zone, info->zone_id,
>> > +			NF_CT_DEFAULT_ZONE_DIR, 0);
>> > +	tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);
>>
>> did you test with lockdep?
>> I think above should complain.
>
> I didn't see any complaint. At least after the system boots up, the
> "Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar"
> run shows all 256 testcases passed.
>
> I have lock debugging enabled as below:
>
> #
> # Lock Debugging (spinlocks, mutexes, etc...)
> #
> CONFIG_DEBUG_RT_MUTEXES=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_LOCKDEP=y
> CONFIG_LOCK_STAT=y
> CONFIG_DEBUG_LOCKDEP=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
> CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
> CONFIG_LOCK_TORTURE_TEST=m
> CONFIG_WW_MUTEX_SELFTEST=m
> CONFIG_TRACE_IRQFLAGS=y
> CONFIG_STACKTRACE=y
>
> Thanks,
> William
Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions
On Fri, Sep 1, 2017 at 10:53 PM, Alexei Starovoitov
<alexei.starovoi...@gmail.com> wrote:
> On Fri, Sep 1, 2017 at 4:30 AM, William Tu wrote:
> > +
> > +	/* TODO: conntrack expectation */
> > +
> > +	nf_ct_zone_init(&zone, info->zone_id,
> > +			NF_CT_DEFAULT_ZONE_DIR, 0);
> > +	tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);
>
> did you test with lockdep?
> I think above should complain.

I didn't see any complaint. At least after the system boots up, the
"Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar"
run shows all 256 testcases passed.

I have lock debugging enabled as below:

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
CONFIG_LOCK_TORTURE_TEST=m
CONFIG_WW_MUTEX_SELFTEST=m
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y

Thanks,
William
Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions
On Fri, Sep 1, 2017 at 4:30 AM, William Tu wrote:
> +
> +	/* TODO: conntrack expectation */
> +
> +	nf_ct_zone_init(&zone, info->zone_id,
> +			NF_CT_DEFAULT_ZONE_DIR, 0);
> +	tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);

did you test with lockdep?
I think above should complain.
[iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions
This patch adds two BPF conntrack helper functions, bpf_ct_lookup() and
bpf_ct_commit(), to enable the possibility of a BPF stateful firewall.

There are two ways to implement BPF conntrack. One way is to not rely on
helpers but implement the conntrack state table using BPF maps. So
conntrack is basically another BPF program extracting the tuples and
looking up/updating its map. Currently the Cilium project has implemented
it this way.

Instead, this patch proposes to reuse the existing netfilter conntrack.
By creating two helpers, bpf_skb_ct_lookup() and bpf_skb_ct_commit(), a
BPF program can simply look up the conntrack state, based on the
conntrack zone, or commit a new connection into the netfilter conntrack
table for a later lookup hit. In the case where a fragmented packet is
received, the helper handles it by submitting the packet to ip_defrag()
and returning TC_ACT_STOLEN if it is queued.

A simple example is to allow PING packets into your host only when your
host initiated the ping. The first ping from your host to the outside
will be tracked in the conntrack table as new, and the echo reply coming
into your host will make it established, so subsequent pings will pass
the firewall. (see tcbpf3_kern.c and test_conntrack_bpf.h)

For now the helper is attached at TC and requires an skb to interact with
netfilter conntrack, so it does not apply to XDP programs. In the future,
I'd like to test/implement IP defragmentation, ALG, and NAT, but I'd like
to have some early feedback first. Thanks!
Signed-off-by: William Tu
Cc: Joe Stringer
Cc: Daniel Borkmann
Cc: Alexei Starovoitov
---
 include/uapi/linux/bpf.h                  |  31 +++
 net/core/filter.c                         | 224 ++++++++++++++++++++++
 samples/bpf/Makefile                      |   1 +
 samples/bpf/tcbpf3_kern.c                 | 122 ++++++++++++
 samples/bpf/test_conntrack_bpf.sh         |  21 +++
 tools/testing/selftests/bpf/bpf_helpers.h |   5 +-
 6 files changed, 403 insertions(+), 1 deletion(-)
 create mode 100644 samples/bpf/tcbpf3_kern.c
 create mode 100755 samples/bpf/test_conntrack_bpf.sh

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 08c206a863e1..b8c70012a56f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -582,6 +582,18 @@ union bpf_attr {
  * @map: pointer to sockmap to update
  * @key: key to insert/update sock in map
  * @flags: same flags as map update elem
+ *
+ * int bpf_skb_ct_lookup(skb, info, flags)
+ *	@skb: pointer to skb
+ *	@info: pointer to 'struct bpf_conntrack_info'
+ *	@flags: reserved for future use
+ *	Return: 0 on success or negative error
+ *
+ * int bpf_skb_ct_commit(skb, info, flags)
+ *	@skb: pointer to skb
+ *	@info: pointer to 'struct bpf_conntrack_info'
+ *	@flags: reserved for future use
+ *	Return: 0 on success or negative error
  */
 #define __BPF_FUNC_MAPPER(FN) \
 	FN(unspec), \
@@ -638,6 +650,8 @@ union bpf_attr {
 	FN(redirect_map), \
 	FN(sk_redirect_map), \
 	FN(sock_map_update), \
+	FN(skb_ct_lookup), \
+	FN(skb_ct_commit), \

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -737,6 +751,23 @@ struct bpf_tunnel_key {
 	__u32 tunnel_label;
 };

+#define BPF_CT_LABELS_LEN 16
+#define BPF_F_CT_DEFRAG (1ULL << 0)
+
+struct bpf_conntrack_info {
+	__u32 ct_state;
+	__u16 family;
+	__u16 zone_id;
+
+	/* for conntrack mark */
+	__u32 mark_value;
+	__u32 mark_mask;
+
+	/* for conntrack label */
+	__u8 ct_label_value[16];
+	__u8 ct_label_mask[16];
+};
+
 /* Generic BPF return codes which all BPF program types may support.
  * The values are binary compatible with their TC_ACT_* counter-part to
  * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT

diff --git a/net/core/filter.c b/net/core/filter.c
index c6a37fe0285b..906518043f19 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -56,6 +56,11 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
+#include

 /**
  * sk_filter_trim_cap - run a packet through a socket filter
@@ -2674,6 +2679,221 @@ static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
 	return 0;
 }

+/* Lookup the conntrack with bpf_conntrack_info.
+ * If found, receive the conntrack state, mark, and label
+ */
+BPF_CALL_3(bpf_skb_ct_lookup, struct sk_buff *, skb,
+	   struct bpf_conntrack_info *, info, u64, flags)
+{
+	struct net *net = dev_net(skb->dev);
+	enum ip_conntrack_info ctinfo;
+	struct nf_conntrack_zone zone;
+	struct nf_conn_labels *cl;
+	struct nf_conn *ct, *tmpl;
+	int err, nh_ofs;
+	struct iphdr *iph;
+
+	/* Conntrack expects L3 packet */
+	nh_ofs = skb_network_offset(skb);