Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions

2017-09-04 Thread William Tu via iovisor-dev
On Sun, Sep 3, 2017 at 3:26 PM, Thomas Graf  wrote:

> On 1 September 2017 at 04:30, William Tu via iovisor-dev
>  wrote:
> > This patch adds two BPF conntrack helper functions, bpf_ct_lookup()
> > and bpf_ct_commit(), to enable BPF-based stateful firewalls.
> >
> > There are two ways to implement BPF conntrack.  One way is to not
> > rely on helpers but to implement the conntrack state table using BPF
> > maps, so conntrack is just another BPF program that extracts the
> > tuples and looks up/updates its map.  The Cilium project currently
> > implements it this way.
>
> This helper looks great. We implemented our own conntrack table for
> two reasons:
> 1. We wanted the option of per-endpoint tables, and netfilter
> conntrack had already switched back to a global table.
> 2. The conntrack helper was not available back then and we wanted a
> backwards-compatible alternative.
>
> We are definitely interested in using this as well once it is merged.
> Are you maintaining a development branch somewhere? We would love to
> test it with Cilium.
>

Thanks for the feedback!
I've put the branch below; I will work on Daniel's feedback and update
it later.
https://github.com/williamtu/net-next/commits/bpfct

William
___
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev


Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions

2017-09-04 Thread Daniel Borkmann via iovisor-dev

On 09/01/2017 01:30 PM, William Tu wrote:

This patch adds two BPF conntrack helper functions, bpf_ct_lookup()
and bpf_ct_commit(), to enable BPF-based stateful firewalls.

There are two ways to implement BPF conntrack.  One way is to not
rely on helpers but to implement the conntrack state table using BPF
maps, so conntrack is just another BPF program that extracts the
tuples and looks up/updates its map.  The Cilium project currently
implements it this way.

Instead, this patch proposes to reuse the existing netfilter
conntrack.  With the two helpers, bpf_skb_ct_lookup() and
bpf_skb_ct_commit(), a BPF program can simply look up the conntrack
state, based on the conntrack zone, or commit a new connection into
the netfilter conntrack table so that later lookups hit.  When a
fragmented packet is received, the helper handles it by submitting
the packet to ip_defrag() and returning TC_ACT_STOLEN if it is queued.

A simple example is to allow ping packets to your host only when
your host initiates the ping.  The first ping from your host to the
outside is tracked in the conntrack table as new, and the echo
reply coming back to your host makes it established, so subsequent
pings pass the firewall.  (See tcbpf3_kern.c and
test_conntrack_bpf.h.)

For now the helpers are attached at TC and require an skb to interact
with netfilter conntrack, so they do not apply to XDP programs.
In the future, I'd like to test/implement IP defragmentation, ALG,
and NAT, but I'd like to get some early feedback first. Thanks!

Signed-off-by: William Tu 
Cc: Joe Stringer 
Cc: Daniel Borkmann 
Cc: Alexei Starovoitov 

[...]

Just curious, were you able to do some performance testing with
the helper?

[...]

FN(unspec), \
@@ -638,6 +650,8 @@ union bpf_attr {
FN(redirect_map),   \
FN(sk_redirect_map),\
FN(sock_map_update),\
+   FN(skb_ct_lookup),  \
+   FN(skb_ct_commit),  \

  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
   * function eBPF program intends to call
@@ -737,6 +751,23 @@ struct bpf_tunnel_key {
__u32 tunnel_label;
  };

+#define BPF_CT_LABELS_LEN 16
+#define BPF_F_CT_DEFRAG	(1ULL << 0)
+
+struct bpf_conntrack_info {
+   __u32 ct_state;
+   __u16 family;
+   __u16 zone_id;


Have you considered also supporting zone directions, so that you can
have overlapping tuples? Presumably, if you have some other interaction
with CT where directions are used and you do the lookup here with dir
set to NF_CT_DEFAULT_ZONE_DIR, then CT won't find those entries, if I
remember correctly.
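
For reference, nf_ct_zone_init() already takes a direction argument; a
minimal sketch of how the helper might expose it (the zone_dir field
below is hypothetical and not part of the RFC, which hard-codes
NF_CT_DEFAULT_ZONE_DIR):

        /* Hypothetical extension: let the program choose the zone
         * direction (zone_dir is an assumed new field in
         * bpf_conntrack_info, e.g. NF_CT_ZONE_DIR_ORIG or
         * NF_CT_ZONE_DIR_REPL) so overlapping tuples can be found.
         */
        u8 dir = info->zone_dir ? info->zone_dir : NF_CT_DEFAULT_ZONE_DIR;

        nf_ct_zone_init(&zone, info->zone_id, dir, 0);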


+   /* for conntrack mark */
+   __u32 mark_value;
+   __u32 mark_mask;
+
+   /* for conntrack label */
+   __u8 ct_label_value[16];
+   __u8 ct_label_mask[16];
+};
+
  /* Generic BPF return codes which all BPF program types may support.
   * The values are binary compatible with their TC_ACT_* counter-part to
   * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
diff --git a/net/core/filter.c b/net/core/filter.c
index c6a37fe0285b..906518043f19 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -56,6 +56,11 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
+#include 
+#include 

  /**
   *sk_filter_trim_cap - run a packet through a socket filter
@@ -2674,6 +2679,221 @@ static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
return 0;
  }

+/* Lookup the conntrack with bpf_conntrack_info.
+ * If found, receive the conntrack state, mark, and label
+ */
+BPF_CALL_3(bpf_skb_ct_lookup, struct sk_buff *, skb,
+  struct bpf_conntrack_info *, info, u64, flags)
+{
+   struct net *net = dev_net(skb->dev);
+   enum ip_conntrack_info ctinfo;
+   struct nf_conntrack_zone zone;
+   struct nf_conn_labels *cl;
+   struct nf_conn *ct, *tmpl;
+   int err, nh_ofs;
+   struct iphdr *iph;


Please err out on invalid flags, otherwise new flags cannot
be added anymore since they would break existing programs.
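
The usual pattern for that check, with BPF_F_CT_DEFRAG being the only
flag the RFC defines, would be roughly (a sketch of the validation):

        /* Reject unknown flag bits so new flags can be added later
         * without silently changing behaviour for existing programs.
         */
        if (unlikely(flags & ~BPF_F_CT_DEFRAG))
                return -EINVAL;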


+   /* Conntrack expects L3 packet */
+   nh_ofs = skb_network_offset(skb);
+   skb_pull_rcsum(skb, nh_ofs);
+   iph = ip_hdr(skb);


What happens if you call this function with an IPv6 or some
other non-IPv4 pkt and go below into IPv4 defrag for example?

We would also need to handle this for IPv6 in general, and
bail out if skb->protocol is neither IPv4 nor IPv6.
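
A guard along those lines, placed before ip_hdr() is dereferenced,
might look like this (a sketch; the IPv6 branch is only indicated and
is not implemented by the RFC):

        if (skb->protocol == htons(ETH_P_IP)) {
                /* existing IPv4 path: ip_hdr(), ip_is_fragment(),
                 * ip_defrag()
                 */
        } else if (skb->protocol == htons(ETH_P_IPV6)) {
                /* would need an IPv6 tuple and nf_ct_frag6_gather()
                 * for defragmentation
                 */
        } else {
                return -EPROTONOSUPPORT;        /* neither IPv4 nor IPv6 */
        }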


+   if (ip_is_fragment(iph) && (flags & BPF_F_CT_DEFRAG)) {
+   /* XXX: not tested yet */
+   enum ip_defrag_users user =
+   IP_DEFRAG_CONNTRACK_IN + info->zone_id;
+
+   memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));


The cb[] is already used in this layer by tc and BPF; you would
need to save/restore the content of the cb[] before returning
back to the BPF program, or make sure the skb never goes
back to either of t
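
A sketch of the save/restore pattern being described, factored into a
hypothetical bpf_ct_defrag() helper (this is an assumption, not code
from the RFC; it presumes net/ip.h and net/sch_generic.h are available
in filter.c):

static int bpf_ct_defrag(struct net *net, struct sk_buff *skb, u16 zone_id)
{
        enum ip_defrag_users user = IP_DEFRAG_CONNTRACK_IN + zone_id;
        /* tc/BPF metadata sits at the front of skb->cb and overlaps
         * IPCB(), so preserve it across the defrag call.
         */
        struct qdisc_skb_cb cb = *qdisc_skb_cb(skb);
        int err;

        memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
        err = ip_defrag(net, skb, user);
        if (err == -EINPROGRESS)
                return err;     /* skb queued; caller returns TC_ACT_STOLEN */
        if (err)
                return err;     /* fragment dropped by ip_defrag() */

        *qdisc_skb_cb(skb) = cb;        /* full datagram reassembled */
        return 0;
}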

Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions

2017-09-03 Thread Thomas Graf via iovisor-dev
On 1 September 2017 at 04:30, William Tu via iovisor-dev
 wrote:
> This patch adds two BPF conntrack helper functions, bpf_ct_lookup()
> and bpf_ct_commit(), to enable BPF-based stateful firewalls.
>
> There are two ways to implement BPF conntrack.  One way is to not
> rely on helpers but to implement the conntrack state table using BPF
> maps, so conntrack is just another BPF program that extracts the
> tuples and looks up/updates its map.  The Cilium project currently
> implements it this way.

This helper looks great. We implemented our own conntrack table for
two reasons:
1. We wanted the option of per-endpoint tables, and netfilter
conntrack had already switched back to a global table.
2. The conntrack helper was not available back then and we wanted a
backwards-compatible alternative.

We are definitely interested in using this as well once it is merged.
Are you maintaining a development branch somewhere? We would love to
test it with Cilium.


Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions

2017-09-03 Thread William Tu via iovisor-dev
Hi Alexei,
Thanks, I do see the lockdep complaint now. I will switch to GFP_ATOMIC.
William

On Sat, Sep 2, 2017 at 8:20 AM, William Tu  wrote:

>
>
> On Fri, Sep 1, 2017 at 10:53 PM, Alexei Starovoitov <
> alexei.starovoi...@gmail.com> wrote:
>
>> On Fri, Sep 1, 2017 at 4:30 AM, William Tu  wrote:
>> > +
>> > +   /* TODO: conntrack expectation */
>> > +
>> > +   nf_ct_zone_init(&zone, info->zone_id,
>> > +   NF_CT_DEFAULT_ZONE_DIR, 0);
>> > +   tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);
>>
>> did you test with lockdep?
>> I think the above should complain.
>>
>
> I didn't see any complaint. At least after the system boots up, the
> "Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar"
> shows all 256 test cases passed.
>
> I have lock debugging as below
> #
> # Lock Debugging (spinlocks, mutexes, etc...)
> #
> CONFIG_DEBUG_RT_MUTEXES=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_LOCKDEP=y
> CONFIG_LOCK_STAT=y
> CONFIG_DEBUG_LOCKDEP=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
> CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
> CONFIG_LOCK_TORTURE_TEST=m
> CONFIG_WW_MUTEX_SELFTEST=m
> CONFIG_TRACE_IRQFLAGS=y
> CONFIG_STACKTRACE=y
>
> Thanks,
> William
>


Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions

2017-09-02 Thread William Tu via iovisor-dev
On Fri, Sep 1, 2017 at 10:53 PM, Alexei Starovoitov <
alexei.starovoi...@gmail.com> wrote:

> On Fri, Sep 1, 2017 at 4:30 AM, William Tu  wrote:
> > +
> > +   /* TODO: conntrack expectation */
> > +
> > +   nf_ct_zone_init(&zone, info->zone_id,
> > +   NF_CT_DEFAULT_ZONE_DIR, 0);
> > +   tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);
>
> did you test with lockdep?
> I think the above should complain.
>

I didn't see any complaint. At least after the system boots up, the
"Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar"
shows all 256 test cases passed.

I have lock debugging as below
#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y
CONFIG_LOCK_TORTURE_TEST=m
CONFIG_WW_MUTEX_SELFTEST=m
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y

Thanks,
William


Re: [iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions

2017-09-01 Thread Alexei Starovoitov via iovisor-dev
On Fri, Sep 1, 2017 at 4:30 AM, William Tu  wrote:
> +
> +   /* TODO: conntrack expectation */
> +
> +   nf_ct_zone_init(&zone, info->zone_id,
> +   NF_CT_DEFAULT_ZONE_DIR, 0);
> +   tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);

did you test with lockdep?
I think the above should complain.
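
For context, TC BPF programs execute in atomic (RCU/softirq) context,
so a GFP_KERNEL allocation can sleep there and the kernel's
atomic-sleep/lockdep debugging should flag it; the non-sleeping variant
would be (a sketch):

        /* BPF programs at TC run in atomic context, so the conntrack
         * template allocation must not sleep.
         */
        tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_ATOMIC);
        if (!tmpl)
                return -ENOMEM;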


[iovisor-dev] [PATCH RFC] bpf: add connection tracking helper functions

2017-09-01 Thread William Tu via iovisor-dev
This patch adds two BPF conntrack helper functions, bpf_ct_lookup()
and bpf_ct_commit(), to enable BPF-based stateful firewalls.

There are two ways to implement BPF conntrack.  One way is to not
rely on helpers but to implement the conntrack state table using BPF
maps, so conntrack is just another BPF program that extracts the
tuples and looks up/updates its map.  The Cilium project currently
implements it this way.

Instead, this patch proposes to reuse the existing netfilter
conntrack.  With the two helpers, bpf_skb_ct_lookup() and
bpf_skb_ct_commit(), a BPF program can simply look up the conntrack
state, based on the conntrack zone, or commit a new connection into
the netfilter conntrack table so that later lookups hit.  When a
fragmented packet is received, the helper handles it by submitting
the packet to ip_defrag() and returning TC_ACT_STOLEN if it is queued.

A simple example is to allow ping packets to your host only when
your host initiates the ping.  The first ping from your host to the
outside is tracked in the conntrack table as new, and the echo
reply coming back to your host makes it established, so subsequent
pings pass the firewall.  (See tcbpf3_kern.c and
test_conntrack_bpf.h.)

For now the helpers are attached at TC and require an skb to interact
with netfilter conntrack, so they do not apply to XDP programs.
In the future, I'd like to test/implement IP defragmentation, ALG,
and NAT, but I'd like to get some early feedback first. Thanks!
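
For illustration, a minimal sketch of the ingress half of the ping
policy described above, assuming the helper declarations come from
bpf_helpers.h and using an assumed CT_ESTABLISHED bit name for
ct_state (the real sample is tcbpf3_kern.c):

#include <uapi/linux/bpf.h>
#include <uapi/linux/pkt_cls.h>
#include "bpf_helpers.h"        /* assumed to declare the new helpers */

#define CT_ESTABLISHED  (1 << 1)        /* assumed bit; see tcbpf3_kern.c */

SEC("classifier")
int ct_ingress(struct __sk_buff *skb)
{
        struct bpf_conntrack_info info = {};

        info.family = 2;        /* AF_INET */
        info.zone_id = 0;

        /* Look the packet up in the netfilter conntrack table. */
        if (bpf_skb_ct_lookup(skb, &info, 0) < 0)
                return TC_ACT_SHOT;     /* no entry or lookup error */

        /* Only pass replies to connections this host committed on
         * egress (an egress program would call bpf_skb_ct_commit()
         * on outgoing pings).
         */
        if (info.ct_state & CT_ESTABLISHED)
                return TC_ACT_OK;

        return TC_ACT_SHOT;
}

char _license[] SEC("license") = "GPL";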

Signed-off-by: William Tu 
Cc: Joe Stringer 
Cc: Daniel Borkmann 
Cc: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h  |  31 +
 net/core/filter.c | 224 ++
 samples/bpf/Makefile  |   1 +
 samples/bpf/tcbpf3_kern.c | 122 
 samples/bpf/test_conntrack_bpf.sh |  21 +++
 tools/testing/selftests/bpf/bpf_helpers.h |   5 +-
 6 files changed, 403 insertions(+), 1 deletion(-)
 create mode 100644 samples/bpf/tcbpf3_kern.c
 create mode 100755 samples/bpf/test_conntrack_bpf.sh
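
The bpf_helpers.h change itself is not shown in this excerpt; the
conventional declarations for the two new helpers would presumably look
like this (an assumption based on how the other helpers are declared
there, not the actual hunk):

/* Assumed bpf_helpers.h-style declarations for the new helpers. */
static int (*bpf_skb_ct_lookup)(void *ctx, struct bpf_conntrack_info *info,
                                unsigned long long flags) =
        (void *) BPF_FUNC_skb_ct_lookup;
static int (*bpf_skb_ct_commit)(void *ctx, struct bpf_conntrack_info *info,
                                unsigned long long flags) =
        (void *) BPF_FUNC_skb_ct_commit;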

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 08c206a863e1..b8c70012a56f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -582,6 +582,18 @@ union bpf_attr {
  * @map: pointer to sockmap to update
  * @key: key to insert/update sock in map
  * @flags: same flags as map update elem
+ *
+ * int bpf_skb_ct_lookup(skb, info, flags)
+ * @skb: pointer to skb
+ * @info: pointer to 'struct bpf_conntrack_info'
+ * @flags: reserved for future use
+ * Return: 0 on success or negative error
+ *
+ * int bpf_skb_ct_commit(skb, info, flags)
+ * @skb: pointer to skb
+ * @info: pointer to 'struct bpf_conntrack_info'
+ * @flags: reserved for future use
+ * Return: 0 on success or negative error
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -638,6 +650,8 @@ union bpf_attr {
FN(redirect_map),   \
FN(sk_redirect_map),\
FN(sock_map_update),\
+   FN(skb_ct_lookup),  \
+   FN(skb_ct_commit),  \
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -737,6 +751,23 @@ struct bpf_tunnel_key {
__u32 tunnel_label;
 };
 
+#define BPF_CT_LABELS_LEN 16
+#define BPF_F_CT_DEFRAG	(1ULL << 0)
+
+struct bpf_conntrack_info {
+   __u32 ct_state;
+   __u16 family;
+   __u16 zone_id;
+
+   /* for conntrack mark */
+   __u32 mark_value;
+   __u32 mark_mask;
+
+   /* for conntrack label */
+   __u8 ct_label_value[16];
+   __u8 ct_label_mask[16];
+};
+
 /* Generic BPF return codes which all BPF program types may support.
  * The values are binary compatible with their TC_ACT_* counter-part to
  * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
diff --git a/net/core/filter.c b/net/core/filter.c
index c6a37fe0285b..906518043f19 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -56,6 +56,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 /**
  * sk_filter_trim_cap - run a packet through a socket filter
@@ -2674,6 +2679,221 @@ static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
return 0;
 }
 
+/* Lookup the conntrack with bpf_conntrack_info.
+ * If found, receive the conntrack state, mark, and label
+ */
+BPF_CALL_3(bpf_skb_ct_lookup, struct sk_buff *, skb,
+  struct bpf_conntrack_info *, info, u64, flags)
+{
+   struct net *net = dev_net(skb->dev);
+   enum ip_conntrack_info ctinfo;
+   struct nf_conntrack_zone zone;
+   struct nf_conn_labels *cl;
+   struct nf_conn *ct, *tmpl;
+   int err, nh_ofs;
+   struct iphdr *iph;
+
+   /* Conntrack expects L3 packet */
+   nh_ofs = skb_network_offset(skb);
+