Re: [bpf-next RFC 2/3] flow_dissector: implements eBPF parser

2018-08-18 Thread Willem de Bruijn
On Sat, Aug 18, 2018 at 11:56 AM Tom Herbert  wrote:
>
> On Thu, Aug 16, 2018 at 9:44 AM, Petar Penkov  wrote:
> > From: Petar Penkov 
> >
> > This eBPF program extracts basic/control/ip address/ports keys from
> > incoming packets. It supports recursive parsing for IP
> > encapsulation, MPLS, GUE, and VLAN, along with IPv4/IPv6 and extension
> > headers. This program is meant to show how flow dissection and key
> > extraction can be done in eBPF.
> >
> > It is initially meant to be used for demonstration rather than as a
> > complete replacement of the existing flow dissector.
> >
> > This includes parsing of GUE and MPLS payload, which cannot be done
> > in production in general, as GUE tunnels and MPLS payloads cannot
> > unambiguously be detected in general.
> >
> > In closed environments, however, it can be enabled. Another example
> > where the programmability of BPF aids flow dissection.

> > +static __always_inline int write_ports(struct __sk_buff *skb, __u8 proto)
> > +{
> > +   struct bpf_dissect_cb *cb = (struct bpf_dissect_cb *)(skb->cb);
> > +   struct flow_dissector_key_ports ports;
> > +
> > +   /* The supported protocols always start with the ports */
> > +   if (bpf_skb_load_bytes(skb, cb->nhoff, , sizeof(ports)))
> > +   return BPF_DROP;
> > +
> > +   if (proto == IPPROTO_UDP && ports.dst == bpf_htons(GUE_PORT)) {
> > +   /* GUE encapsulation */
> > +   cb->nhoff += sizeof(struct udphdr);
> > +   bpf_tail_call(skb, _table, GUE);
> > +   return BPF_DROP;
>
> It's a nice sentiment to support GUE, but this really isn't the right
> way to do it.

Yes, this was just for demonstration purposes. The same for
unconditionally parsing MPLS payload as IP.

Though note the point in the commit message that within a closed
network with fixed reserved GUE ports, a custom BPF program
like this could be sufficient. That's true not only for UDP tunnels.

> What would be much better is a means to generically
> support all the various UDP encapsulations like GUE, VXLAN, Geneve,
> GRE/UDP, MPLS/UDP, etc. I think there's two ways to do that:
>
> 1) A UDP socket lookup that returns an encapsulation socket containing
> a flow dissector function that can be called. This is the safest
> method because of the UDP are reserved numbers problem. I implement
> this in kernel flow dissector, not upstreamed though.

Yes, similar to udp_gro_receive. Socket lookup is not free, however,
and this is a relatively rarely used feature.

I want to move the one in udp_gro_receive behind a static key.
udp_encap_needed_key is the likely target. Then the same can
eventually be done for flow dissection inside UDP tunnels.

> 2) Create a lookup table based on destination port that returns the
> flow dissector function to call. This doesn't have the socket lookup
> so it isn't quite as robust as the socket lookup. But, at least it's a
> generic interface and programmable so it might be appropriate in the
> BPF flow dissector case.

Option 1 sounds preferable to me.


Re: [bpf-next RFC 2/3] flow_dissector: implements eBPF parser

2018-08-18 Thread Tom Herbert
On Thu, Aug 16, 2018 at 9:44 AM, Petar Penkov  wrote:
> From: Petar Penkov 
>
> This eBPF program extracts basic/control/ip address/ports keys from
> incoming packets. It supports recursive parsing for IP
> encapsulation, MPLS, GUE, and VLAN, along with IPv4/IPv6 and extension
> headers. This program is meant to show how flow dissection and key
> extraction can be done in eBPF.
>
> It is initially meant to be used for demonstration rather than as a
> complete replacement of the existing flow dissector.
>
> This includes parsing of GUE and MPLS payload, which cannot be done
> in production in general, as GUE tunnels and MPLS payloads cannot
> unambiguously be detected in general.
>
> In closed environments, however, it can be enabled. Another example
> where the programmability of BPF aids flow dissection.
>
> Link: http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
> Signed-off-by: Petar Penkov 
> Signed-off-by: Willem de Bruijn 
> ---
>  tools/testing/selftests/bpf/Makefile   |   2 +-
>  tools/testing/selftests/bpf/bpf_flow.c | 542 +
>  2 files changed, 543 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/bpf_flow.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile 
> b/tools/testing/selftests/bpf/Makefile
> index fff7fb1285fc..e65f50f9185e 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -35,7 +35,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o 
> test_tcp_estats.o test
> test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
> test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o 
> test_lirc_mode2_kern.o \
> get_cgroup_id_kern.o socket_cookie_prog.o 
> test_select_reuseport_kern.o \
> -   test_skb_cgroup_id_kern.o
> +   test_skb_cgroup_id_kern.o bpf_flow.o
>
>  # Order correspond to 'make run_tests' order
>  TEST_PROGS := test_kmod.sh \
> diff --git a/tools/testing/selftests/bpf/bpf_flow.c 
> b/tools/testing/selftests/bpf/bpf_flow.c
> new file mode 100644
> index ..9c11c644b713
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/bpf_flow.c
> @@ -0,0 +1,542 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "bpf_helpers.h"
> +#include "bpf_endian.h"
> +
> +int _version SEC("version") = 1;
> +#define PROG(F) SEC(#F) int bpf_func_##F
> +
> +/* These are the identifiers of the BPF programs that will be used in tail
> + * calls. Name is limited to 16 characters, with the terminating character 
> and
> + * bpf_func_ above, we have only 6 to work with, anything after will be 
> cropped.
> + */
> +enum {
> +   IP,
> +   IPV6,
> +   IPV6OP, /* Destination/Hop-by-Hop Options IPv6 Extension header */
> +   IPV6FR, /* Fragmentation IPv6 Extension Header */
> +   MPLS,
> +   VLAN,
> +   GUE,
> +};
> +
> +#define IP_MF  0x2000
> +#define IP_OFFSET  0x1FFF
> +#define IP6_MF 0x0001
> +#define IP6_OFFSET 0xFFF8
> +
> +struct vlan_hdr {
> +   __be16 h_vlan_TCI;
> +   __be16 h_vlan_encapsulated_proto;
> +};
> +
> +struct gre_hdr {
> +   __be16 flags;
> +   __be16 proto;
> +};
> +
> +#define GUE_PORT 6080
> +/* Taken from include/net/gue.h. Move that to uapi, instead? */
> +struct guehdr {
> +   union {
> +   struct {
> +#if defined(__LITTLE_ENDIAN_BITFIELD)
> +   __u8hlen:5,
> +   control:1,
> +   version:2;
> +#elif defined (__BIG_ENDIAN_BITFIELD)
> +   __u8version:2,
> +   control:1,
> +   hlen:5;
> +#else
> +#error  "Please fix "
> +#endif
> +   __u8proto_ctype;
> +   __be16  flags;
> +   };
> +   __be32  word;
> +   };
> +};
> +
> +enum flow_dissector_key_id {
> +   FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
> +   FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
> +   FLOW_DISSECTOR_KEY_IPV4_ADDRS, /* struct 
> flow_dissector_key_ipv4_addrs */
> +   FLOW_DISSECTOR_KEY_IPV6_ADDRS, /* struct 
> flow_dissector_key_ipv6_addrs */
> +   FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
> +   FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
> +   FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs 
> */
> +   FLOW_DISSECTOR_KEY_TIPC, /* struct flow_dissector_key_tipc */
> +   FLOW_DISSECTOR_KEY_ARP, /* struct flow_dissector_key_arp */
> +   FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
> +   FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_flow_tags 
> 

[bpf-next RFC 2/3] flow_dissector: implements eBPF parser

2018-08-16 Thread Petar Penkov
From: Petar Penkov 

This eBPF program extracts basic/control/ip address/ports keys from
incoming packets. It supports recursive parsing for IP
encapsulation, MPLS, GUE, and VLAN, along with IPv4/IPv6 and extension
headers. This program is meant to show how flow dissection and key
extraction can be done in eBPF.

It is initially meant to be used for demonstration rather than as a
complete replacement of the existing flow dissector.

This includes parsing of GUE and MPLS payload, which cannot be done
in production in general, as GUE tunnels and MPLS payloads cannot
unambiguously be detected in general.

In closed environments, however, it can be enabled. Another example
where the programmability of BPF aids flow dissection.

Link: http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
Signed-off-by: Petar Penkov 
Signed-off-by: Willem de Bruijn 
---
 tools/testing/selftests/bpf/Makefile   |   2 +-
 tools/testing/selftests/bpf/bpf_flow.c | 542 +
 2 files changed, 543 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/bpf_flow.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index fff7fb1285fc..e65f50f9185e 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -35,7 +35,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o 
test_tcp_estats.o test
test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o 
test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
-   test_skb_cgroup_id_kern.o
+   test_skb_cgroup_id_kern.o bpf_flow.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/bpf_flow.c 
b/tools/testing/selftests/bpf/bpf_flow.c
new file mode 100644
index ..9c11c644b713
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpf_flow.c
@@ -0,0 +1,542 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+int _version SEC("version") = 1;
+#define PROG(F) SEC(#F) int bpf_func_##F
+
+/* These are the identifiers of the BPF programs that will be used in tail
+ * calls. Name is limited to 16 characters, with the terminating character and
+ * bpf_func_ above, we have only 6 to work with, anything after will be 
cropped.
+ */
+enum {
+   IP,
+   IPV6,
+   IPV6OP, /* Destination/Hop-by-Hop Options IPv6 Extension header */
+   IPV6FR, /* Fragmentation IPv6 Extension Header */
+   MPLS,
+   VLAN,
+   GUE,
+};
+
+#define IP_MF  0x2000
+#define IP_OFFSET  0x1FFF
+#define IP6_MF 0x0001
+#define IP6_OFFSET 0xFFF8
+
+struct vlan_hdr {
+   __be16 h_vlan_TCI;
+   __be16 h_vlan_encapsulated_proto;
+};
+
+struct gre_hdr {
+   __be16 flags;
+   __be16 proto;
+};
+
+#define GUE_PORT 6080
+/* Taken from include/net/gue.h. Move that to uapi, instead? */
+struct guehdr {
+   union {
+   struct {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+   __u8hlen:5,
+   control:1,
+   version:2;
+#elif defined (__BIG_ENDIAN_BITFIELD)
+   __u8version:2,
+   control:1,
+   hlen:5;
+#else
+#error  "Please fix "
+#endif
+   __u8proto_ctype;
+   __be16  flags;
+   };
+   __be32  word;
+   };
+};
+
+enum flow_dissector_key_id {
+   FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
+   FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
+   FLOW_DISSECTOR_KEY_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs 
*/
+   FLOW_DISSECTOR_KEY_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs 
*/
+   FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
+   FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
+   FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
+   FLOW_DISSECTOR_KEY_TIPC, /* struct flow_dissector_key_tipc */
+   FLOW_DISSECTOR_KEY_ARP, /* struct flow_dissector_key_arp */
+   FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
+   FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_flow_tags */
+   FLOW_DISSECTOR_KEY_GRE_KEYID, /* struct flow_dissector_key_keyid */
+   FLOW_DISSECTOR_KEY_MPLS_ENTROPY, /* struct flow_dissector_key_keyid */
+   FLOW_DISSECTOR_KEY_ENC_KEYID, /* struct flow_dissector_key_keyid */
+   FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS, /* struct 
flow_dissector_key_ipv4_addrs */
+