Hi Eelco and Ilya, Do you think this patch is ok? Thanks William
On Thu, Jan 16, 2020 at 1:49 PM William Tu <[email protected]> wrote: > > Now netdev-afxdp always forwards all packets to userspace because > it is using libbpf's default XDP program, see 'xsk_load_xdp_prog'. > There are some cases when users want to keep packets in kernel instead > of sending to userspace, for example, management traffic such as SSH > should be processed in kernel. > > The patch enables loading the user-provided XDP program by > $ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj> > > So users can implement their filtering logic or traffic steering idea > in their XDP program, and rest of the traffic passes to AF_XDP socket > handled by OVS. > > Signed-off-by: William Tu <[email protected]> > --- > v6: > - rebase to master > - mostly remains the same as v5, but make sure there is no > leak using bpftool and no repeated loop issued reported from Eelco > here: > https://patchwork.ozlabs.org/patch/1199734/ > which has been fixed at > netdev-afxdp: Avoid removing of XDP program if not loaded. > - travis: https://travis-ci.org/williamtu/ovs-travis/builds/638126505 > > v5: > - rebase to master > Feedbacks from Eelco: > - Remove xdp-obj="__default__" case, to remove xdp-obj, use > ovs-vsctl remove int <dev> options xdp-obj > - Fix problem of xdp program not unloading > verify by bpftool. > - use xdp-obj instead of xdpobj > - Limitation: xdp-obj doesn't work when using best-effort-mode > because best-effort mode tried to probe mode by setting up queue, > and loading xdp-obj requires knwoing mode in advance. > (to support it, we might need to use the > XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD as in v3) > > Testing > - I place two xdp binary here > https://drive.google.com/open?id=1QCCdNE-5CwlKCFV6Upg9mOPnnbVkUwA5 > [xdpsock_pass.o] Working one, which forwards packets to dpif-netdev > [xdpsock_invalid.o] invalid one, which has no map > > v4: > Feedbacks from Eelco. > - First load the program, then configure xsk. > Let API take care of xdp prog and map loading, don't set > XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD. > - When loading custom xdp, need to close(prog_fd) and close(map_fd) > to release the resources > - make sure prog and map is unloaded by bpftool. > - update doc, afxdp.rst > - Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/608986781 > > v3: > Feedbacks from Eelco. > - keep using xdpobj not xdp-obj (because we alread use xdpmode) > or we change both to xdp-obj and xdp-mode? > - log a info message when using external program for better debugging > - combine some failure messages > - update doc > NEW: > - add options:xdpobj=__default__, to set back to libbpf default prog > - Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/606153231 > > v2: > A couple fixes and remove RFC > --- > Documentation/intro/install/afxdp.rst | 59 +++++++++++++++ > NEWS | 2 + > lib/netdev-afxdp.c | 135 > ++++++++++++++++++++++++++++++++-- > lib/netdev-linux-private.h | 4 + > 4 files changed, 193 insertions(+), 7 deletions(-) > > diff --git a/Documentation/intro/install/afxdp.rst > b/Documentation/intro/install/afxdp.rst > index 15e3c918f942..e72bb3edabe6 100644 > --- a/Documentation/intro/install/afxdp.rst > +++ b/Documentation/intro/install/afxdp.rst > @@ -283,6 +283,65 @@ Or, use OVS pmd tool:: > ovs-appctl dpif-netdev/pmd-stats-show > > > +Loading Custom XDP Program > +-------------------------- > +By defailt, netdev-afxdp always forwards all packets to userspace because > +it is using libbpf's default XDP program. There are some cases when users > +want to keep packets in kernel instead of sending to userspace, for example, > +management traffic such as SSH should be processed in kernel. This can be > +done by loading the user-provided XDP program:: > + > + ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj> > + > +So users can implement their filtering logic or traffic steering idea > +in their XDP program, and rest of the traffic passes to AF_XDP socket > +handled by OVS. To set it back to default, use:: > + > + ovs-vsctl remove int afxdp-p0 options xdp-obj > + > +Below is a sample C program compiled under kernel's samples/bpf/. > + > +.. code-block:: c > + > + #include <uapi/linux/bpf.h> > + #include "bpf_helpers.h" > + > + #if LINUX_VERSION_CODE < KERNEL_VERSION(5,3,0) > + /* Kernel version before 5.3 needed an additional map */ > + struct bpf_map_def SEC("maps") qidconf_map = { > + .type = BPF_MAP_TYPE_ARRAY, > + .key_size = sizeof(int), > + .value_size = sizeof(int), > + .max_entries = 64, > + }; > + #endif > + > + /* OVS will associate map 'xsks_map' to xsk socket. */ > + struct bpf_map_def SEC("maps") xsks_map = { > + .type = BPF_MAP_TYPE_XSKMAP, > + .key_size = sizeof(int), > + .value_size = sizeof(int), > + .max_entries = 32, > + }; > + > + SEC("xdp_sock") > + int xdp_sock_prog(struct xdp_md *ctx) > + { > + int index = ctx->rx_queue_index; > + > + /* Customized by user. > + * For example > + * 1) filter out all SSH traffic and return XDP_PASS > + * for kernel to process. > + * 2) Drop unwanted packet by returning XDP_DROP. > + */ > + > + /* Rest of packets goes to AF_XDP. */ > + return bpf_redirect_map(&xsks_map, index, 0); > + } > + char _license[] SEC("license") = "GPL"; > + > + > Example Script > -------------- > > diff --git a/NEWS b/NEWS > index e8d662a0c15f..a939262ce09e 100644 > --- a/NEWS > +++ b/NEWS > @@ -19,6 +19,8 @@ Post-v2.12.0 > generic - former SKB > best-effort [default] - new one, chooses the best available from > 3 above modes > + * New option 'xdp-obj' for loading custom XDP program. Default uses > + the libbpf builtin XDP program. > - DPDK: > * DPDK pdump packet capture support disabled by default. New configure > option '--enable-dpdk-pdump' to enable it. > diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c > index 6ac0bc2dde90..421566e36a40 100644 > --- a/lib/netdev-afxdp.c > +++ b/lib/netdev-afxdp.c > @@ -21,6 +21,7 @@ > #include "netdev-afxdp.h" > #include "netdev-afxdp-pool.h" > > +#include <bpf/bpf.h> > #include <errno.h> > #include <inttypes.h> > #include <linux/rtnetlink.h> > @@ -30,6 +31,7 @@ > #include <stdlib.h> > #include <sys/resource.h> > #include <sys/socket.h> > +#include <sys/stat.h> > #include <sys/types.h> > #include <unistd.h> > > @@ -93,7 +95,8 @@ static struct xsk_socket_info *xsk_configure(int ifindex, > int xdp_queue_id, > enum afxdp_mode mode, > bool use_need_wakeup, > bool report_socket_failures); > -static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode); > +static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode, > + int prog_fd, int map_fd); > static void xsk_destroy(struct xsk_socket_info *xsk); > static int xsk_configure_all(struct netdev *netdev); > static void xsk_destroy_all(struct netdev *netdev); > @@ -255,6 +258,23 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED) > ovs_mutex_unlock(&unused_pools_mutex); > } > > +static int > +xsk_load_prog(const char *path, struct bpf_object **obj, > + int *prog_fd) > +{ > + struct bpf_prog_load_attr attr = { > + .prog_type = BPF_PROG_TYPE_XDP, > + .file = path, > + }; > + > + if (bpf_prog_load_xattr(&attr, obj, prog_fd)) { > + VLOG_ERR("Can't load XDP program at '%s'", path); > + return EINVAL; > + } > + > + return 0; > +} > + > static struct xsk_umem_info * > xsk_configure_umem(void *buffer, uint64_t size) > { > @@ -471,6 +491,50 @@ xsk_configure_queue(struct netdev_linux *dev, int > ifindex, int queue_id, > return 0; > } > > +static int > +xsk_configure_prog(struct netdev *netdev, int ifindex) > +{ > + struct netdev_linux *dev = netdev_linux_cast(netdev); > + struct bpf_object *obj; > + uint32_t prog_id = 0; > + uint32_t flags; > + int prog_fd = 0; > + int map_fd = 0; > + int mode; > + int ret; > + > + mode = dev->xdp_mode_in_use; > + flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST; > + > + ret = xsk_load_prog(dev->xdp_obj, &obj, &prog_fd); > + if (ret) { > + goto err; > + } > + dev->prog_fd = prog_fd; > + > + bpf_set_link_xdp_fd(ifindex, prog_fd, flags); > + ret = bpf_get_link_xdp_id(ifindex, &prog_id, flags); > + if (ret < 0) { > + VLOG_ERR("%s: Cannot get XDP prog id.", > + netdev_get_name(netdev)); > + goto err; > + } > + > + map_fd = bpf_object__find_map_fd_by_name(obj, "xsks_map"); > + if (map_fd < 0) { > + VLOG_ERR("%s: Cannot find \"xsks_map\".", > + netdev_get_name(netdev)); > + goto err; > + } > + dev->map_fd = map_fd; > + > + VLOG_INFO("%s: Loaded custom XDP program at %s prog_id %d.", > + netdev_get_name(netdev), dev->xdp_obj, prog_id); > + return 0; > + > +err: > + return ret; > +} > > static int > xsk_configure_all(struct netdev *netdev) > @@ -507,6 +571,13 @@ xsk_configure_all(struct netdev *netdev) > qid++; > } else { > dev->xdp_mode_in_use = dev->xdp_mode; > + if (dev->xdp_obj) { > + /* XDP program is per-netdev, so all queues share > + * the same XDP program. */ > + if (xsk_configure_prog(netdev, ifindex)) { > + goto err; > + } > + } > } > > /* Configure remaining queues. */ > @@ -581,7 +652,12 @@ xsk_destroy_all(struct netdev *netdev) > > VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev)); > ifindex = linux_get_ifindex(netdev_get_name(netdev)); > - xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use); > + xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use, dev->prog_fd, > + dev->map_fd); > + dev->prog_fd = 0; > + dev->map_fd = 0; > + free(CONST_CAST(char *, dev->xdp_obj)); > + dev->xdp_obj = NULL; > > if (dev->tx_locks) { > for (i = 0; i < netdev_n_txq(netdev); i++) { > @@ -598,9 +674,11 @@ netdev_afxdp_set_config(struct netdev *netdev, const > struct smap *args, > { > struct netdev_linux *dev = netdev_linux_cast(netdev); > const char *str_xdp_mode; > + const char *str_xdp_obj; > enum afxdp_mode xdp_mode; > bool need_wakeup; > int new_n_rxq; > + struct stat s; > > ovs_mutex_lock(&dev->mutex); > new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); > @@ -634,12 +712,34 @@ netdev_afxdp_set_config(struct netdev *netdev, const > struct smap *args, > } > #endif > > + str_xdp_obj = smap_get_def(args, "xdp-obj", NULL); > + if (str_xdp_obj) { > + if (stat(str_xdp_obj, &s)) { > + ovs_mutex_unlock(&dev->mutex); > + VLOG_ERR("Invalid xdp-obj '%s': %s.", str_xdp_obj, > + ovs_strerror(errno)); > + return EINVAL; > + } else if (!S_ISREG(s.st_mode)) { > + ovs_mutex_unlock(&dev->mutex); > + VLOG_ERR("xdp-obj '%s' is not a regular file.", str_xdp_obj); > + return EINVAL; > + } > + } > + > + if (str_xdp_obj && xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) { > + ovs_mutex_unlock(&dev->mutex); > + VLOG_ERR("best-effort mode and xdp-obj can't be set together"); > + return EINVAL; > + } > + > if (dev->requested_n_rxq != new_n_rxq > || dev->requested_xdp_mode != xdp_mode > - || dev->requested_need_wakeup != need_wakeup) { > + || dev->requested_need_wakeup != need_wakeup > + || !nullable_string_is_equal(dev->requested_xdp_obj, str_xdp_obj)) { > dev->requested_n_rxq = new_n_rxq; > dev->requested_xdp_mode = xdp_mode; > dev->requested_need_wakeup = need_wakeup; > + dev->requested_xdp_obj = nullable_xstrdup(str_xdp_obj); > netdev_request_reconfigure(netdev); > } > ovs_mutex_unlock(&dev->mutex); > @@ -658,6 +758,8 @@ netdev_afxdp_get_config(const struct netdev *netdev, > struct smap *args) > xdp_modes[dev->xdp_mode_in_use].name); > smap_add_format(args, "use-need-wakeup", "%s", > dev->use_need_wakeup ? "true" : "false"); > + smap_add_format(args, "xdp-obj", "%s", > + dev->xdp_obj ? dev->xdp_obj : "builtin"); > ovs_mutex_unlock(&dev->mutex); > return 0; > } > @@ -674,7 +776,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev) > if (netdev->n_rxq == dev->requested_n_rxq > && dev->xdp_mode == dev->requested_xdp_mode > && dev->use_need_wakeup == dev->requested_need_wakeup > - && dev->xsks) { > + && dev->xsks > + && nullable_string_is_equal(dev->xdp_obj, dev->requested_xdp_obj)) { > goto out; > } > > @@ -692,6 +795,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev) > } > dev->use_need_wakeup = dev->requested_need_wakeup; > > + dev->xdp_obj = dev->requested_xdp_obj; > + > err = xsk_configure_all(netdev); > if (err) { > VLOG_ERR("%s: AF_XDP device reconfiguration failed.", > @@ -715,7 +820,8 @@ netdev_afxdp_get_numa_id(const struct netdev *netdev) > } > > static void > -xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode) > +xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode, > + int prog_fd, int map_fd) > { > uint32_t flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST; > uint32_t ret, prog_id = 0; > @@ -732,7 +838,19 @@ xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode > mode) > return; > } > > - bpf_set_link_xdp_fd(ifindex, -1, flags); > + ret = bpf_set_link_xdp_fd(ifindex, -1, flags); > + if (ret) { > + VLOG_ERR("Failed to unload prog ID: %d", prog_id); > + } > + > + if (prog_fd) { > + close(prog_fd); > + } > + if (map_fd) { > + close(map_fd); > + } > + > + VLOG_INFO("Removed XDP program ID: %d", prog_id); > } > > void > @@ -744,7 +862,8 @@ signal_remove_xdp(struct netdev *netdev) > ifindex = linux_get_ifindex(netdev_get_name(netdev)); > > VLOG_WARN("Force removing xdp program."); > - xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use); > + xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use, > + dev->prog_fd, dev->map_fd); > } > > static struct dp_packet_afxdp * > @@ -1158,10 +1277,12 @@ netdev_afxdp_construct(struct netdev *netdev) > netdev->n_txq = 0; > dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC; > dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC; > + dev->xdp_obj = NULL; > > dev->requested_n_rxq = NR_QUEUE; > dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT; > dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT; > + dev->requested_xdp_obj = NULL; > > dev->xsks = NULL; > dev->tx_locks = NULL; > diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h > index f08159aa7b53..190927cec098 100644 > --- a/lib/netdev-linux-private.h > +++ b/lib/netdev-linux-private.h > @@ -109,6 +109,10 @@ struct netdev_linux { > bool requested_need_wakeup; > > struct netdev_afxdp_tx_lock *tx_locks; /* Array of locks for TX queues. > */ > + const char *xdp_obj; /* XDP object file path. */ > + const char *requested_xdp_obj; > + int prog_fd; > + int map_fd; > #endif > }; > > -- > 2.7.4 > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
