> -----Original Message----- > From: William Tu [mailto:[email protected]] > Sent: Thursday, October 24, 2019 2:00 AM > To: Noa Levy <[email protected]> > Cc: [email protected]; Oz Shlomo <[email protected]>; Majd Dibbiny <[email protected]>; Ameer Mahagneh <[email protected]>; Eli Britstein <[email protected]> > Subject: Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port > > Hi Noa, > > I have a couple more questions. I'm still at the learning stage of this new > feature, thanks in advance for your patience. > > On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote: > > dpdkvdpa netdev works with 3 components: > > a vhost-user socket, a vdpa device (a real vdpa device or a VF), and a > > representor of the vdpa device. > > Which NICs support this feature? > I don't have a real vdpa device; can I use an Intel X540 VF? >
This feature will have two modes, SW and HW. The SW mode doesn't depend on a real vdpa device and allows you to use this feature even if you don't have a NIC that supports it. The HW mode will be implemented in the future and will use a real vdpa device; it will be preferable to use the HW mode if you have a NIC that supports it. For now we only support the SW mode; once vdpa is supported in dpdk, we will add the HW mode to OVS. > > > > In order to add a new vDPA port, add a new port to an existing bridge > > with type dpdkvdpa and vDPA options: > > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa > > options:vdpa-socket-path=<sock path> > > options:vdpa-accelerator-devargs=<VF pci id> > > options:dpdk-devargs=<vdpa pci id>,representor=[id] > > > > For this command OVS will create a new netdev: > > 1. Register a vhost-user-client device. > > 2. Open and configure a VF dpdk port. > > 3. Open and configure a representor dpdk port. > > > > The new netdev will use the netdev_rxq_recv() function in order to receive > > packets from the VF and push them to vhost-user, and to receive packets from > > vhost-user and push them to the VF. 
> > > > Signed-off-by: Noa Ezra <[email protected]> > > Reviewed-by: Oz Shlomo <[email protected]> > > --- > > Documentation/automake.mk | 1 + > > Documentation/topics/dpdk/index.rst | 1 + > > Documentation/topics/dpdk/vdpa.rst | 90 ++++++++++++++++++++ > > NEWS | 1 + > > lib/netdev-dpdk.c | 162 > ++++++++++++++++++++++++++++++++++++ > > vswitchd/vswitch.xml | 25 ++++++ > > 6 files changed, 280 insertions(+) > > create mode 100644 Documentation/topics/dpdk/vdpa.rst > > > > diff --git a/Documentation/automake.mk b/Documentation/automake.mk > > index cd68f3b..ee574bc 100644 > > --- a/Documentation/automake.mk > > +++ b/Documentation/automake.mk > > @@ -43,6 +43,7 @@ DOC_SOURCE = \ > > Documentation/topics/dpdk/ring.rst \ > > Documentation/topics/dpdk/vdev.rst \ > > Documentation/topics/dpdk/vhost-user.rst \ > > + Documentation/topics/dpdk/vdpa.rst \ > > Documentation/topics/fuzzing/index.rst \ > > Documentation/topics/fuzzing/what-is-fuzzing.rst \ > > Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \ diff > > --git a/Documentation/topics/dpdk/index.rst > > b/Documentation/topics/dpdk/index.rst > > index cf24a7b..c1d4ea7 100644 > > --- a/Documentation/topics/dpdk/index.rst > > +++ b/Documentation/topics/dpdk/index.rst > > @@ -41,3 +41,4 @@ The DPDK Datapath > > /topics/dpdk/pdump > > /topics/dpdk/jumbo-frames > > /topics/dpdk/memory > > + /topics/dpdk/vdpa > > diff --git a/Documentation/topics/dpdk/vdpa.rst > > b/Documentation/topics/dpdk/vdpa.rst > > new file mode 100644 > > index 0000000..34c5300 > > --- /dev/null > > +++ b/Documentation/topics/dpdk/vdpa.rst > > @@ -0,0 +1,90 @@ > > +.. > > + Copyright (c) 2019 Mellanox Technologies, Ltd. > > + > > + Licensed under the Apache License, Version 2.0 (the "License"); > > + you may not use this file except in compliance with the License. 
> > + You may obtain a copy of the License at: > > + > > + http://www.apache.org/licenses/LICENSE-2.0 > > + > > + Unless required by applicable law or agreed to in writing, software > > + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT > > + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the > > + License for the specific language governing permissions and limitations > > + under the License. > > + > > + Convention for heading levels in Open vSwitch documentation: > > + > > + ======= Heading 0 (reserved for the title in a document) > > + ------- Heading 1 > > + ~~~~~~~ Heading 2 > > + +++++++ Heading 3 > > + ''''''' Heading 4 > > + > > + Avoid deeper levels because they do not render well. > > + > > + > > +=============== > > +DPDK VDPA Ports > > +=============== > > + > > +In user space there are two main approaches to communicating with a > > +guest (VM): using virtIO ports (e.g. netdev > > +type=dpdkvhostuser/dpdkvhostuserclient) or SR-IOV using phy ports (e.g. netdev type=dpdk). > > +Phy ports allow working with a port representor which is attached to > > +OVS, while a matching VF is passed through to the guest. > > +HW rules can process packets from the uplink and direct them to the VF > > +without going through SW (OVS), and therefore using phy ports gives > > +the best performance. > > +However, the SR-IOV architecture requires that the guest use a > > +driver which is specific to the underlying HW. A HW-specific driver has two main drawbacks: > > +1. It breaks virtualization in some sense (the guest is aware of the HW) and can > > +also limit the types of images supported. > > +2. 
Less natural support for live migration. > > + > > +Using a virtIO port solves both problems, but reduces performance and > > +loses some functionality; for example, some HW > > +offloads cannot be used when working directly with virtIO. > > + > > +We created a new netdev type, dpdkvdpa, which resolves this conflict. > > +The new netdev is basically very similar to a regular dpdk netdev, but > > +it has some additional functionality. > > +This port translates between a phy port and a virtIO port: it takes > > +packets from an rx-queue and sends them to the suitable tx-queue, > > +allowing packets to be transferred from a virtIO guest (VM) to a VF and vice > > +versa, benefiting from both SR-IOV and virtIO. > > + > > +Quick Example > > +------------- > > + > > +Configure OVS bridge and ports > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +You must first create a bridge and add ports to the switch. > > +Since the dpdkvdpa port is configured as a client, the > > +vdpa-socket-path must be configured by the user. > > +VHOST_USER_SOCKET_PATH=/path/to/socket > > + > > + $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev > > + $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \ > > + type=dpdk options:dpdk-devargs=<pf pci id> > > Is adding pf port to br0 necessary? > > > + $ ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa \ > > + options:vdpa-socket-path=VHOST_USER_SOCKET_PATH \ > > + options:vdpa-accelerator-devargs=<vf pci id> \ > > + options:dpdk-devargs=<pf pci id>,representor=[id] > > + > > +Once the ports have been added to the switch, they must be added to the guest. > > + > > +Adding vhost-user ports to the guest (QEMU) > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +Attach the vhost-user device sockets to the guest. 
To do this, you > > +must pass the following parameters to QEMU: > > + > > + -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server > > + -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce > > + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 > > + > > +QEMU will wait until the port is created successfully in OVS before booting the VM. > > +In this mode, if the switch crashes, the vHost ports will > > +reconnect automatically once it is brought back. > > diff --git a/NEWS b/NEWS > > index f5a0b8f..6f315c6 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -542,6 +542,7 @@ v2.6.0 - 27 Sep 2016 > > * Remove dpdkvhostcuse port type. > > * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7) > > * 'dpdkvhostuserclient' port type. > > + * 'dpdkvdpa' port type. > > - Increase number of registers to 16. > > - ovs-benchmark: This utility has been removed due to lack of use and bitrot. > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index bc20d68..16ddf58 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -47,6 +47,7 @@ > > #include "dpif-netdev.h" > > #include "fatal-signal.h" > > #include "netdev-provider.h" > > +#include "netdev-dpdk-vdpa.h" > > #include "netdev-vport.h" > > #include "odp-util.h" > > #include "openvswitch/dynamic-string.h" > > @@ -137,6 +138,9 @@ typedef uint16_t dpdk_port_t; > > > > /* Legacy default value for vhost tx retries. */ > > #define VHOST_ENQ_RETRY_DEF 8 > > > > +/* Size of VDPA custom stats. */ > > +#define VDPA_CUSTOM_STATS_SIZE 4 > > + > > #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? 
PATH_MAX : IFNAMSIZ) > > > > static const struct rte_eth_conf port_conf = { @@ -461,6 +465,8 @@ > > struct netdev_dpdk { > > int rte_xstats_ids_size; > > uint64_t *rte_xstats_ids; > > ); > > + > > + struct netdev_dpdk_vdpa_relay *relay; > > }; > > > > struct netdev_rxq_dpdk { > > @@ -1346,6 +1352,30 @@ netdev_dpdk_construct(struct netdev *netdev) > > return err; > > } > > > > +static int > > +netdev_dpdk_vdpa_construct(struct netdev *netdev) { > > + struct netdev_dpdk *dev; > > + int err; > > + > > + err = netdev_dpdk_construct(netdev); > > + if (err) { > > + VLOG_ERR("netdev_dpdk_construct failed. Port: %s\n", netdev- > >name); > > + goto out; > > + } > > + > > + ovs_mutex_lock(&dpdk_mutex); > > + dev = netdev_dpdk_cast(netdev); > > + dev->relay = netdev_dpdk_vdpa_alloc_relay(); > > + if (!dev->relay) { > > + err = ENOMEM; > > + } > > + > > + ovs_mutex_unlock(&dpdk_mutex); > > +out: > > + return err; > > +} > > + > > static void > > common_destruct(struct netdev_dpdk *dev) > > OVS_REQUIRES(dpdk_mutex) > > @@ -1428,6 +1458,19 @@ dpdk_vhost_driver_unregister(struct > netdev_dpdk > > *dev OVS_UNUSED, } > > > > static void > > +netdev_dpdk_vdpa_destruct(struct netdev *netdev) { > > + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > > + > > + ovs_mutex_lock(&dpdk_mutex); > > + netdev_dpdk_vdpa_destruct_impl(dev->relay); > > + rte_free(dev->relay); > > + ovs_mutex_unlock(&dpdk_mutex); > > + > > + netdev_dpdk_destruct(netdev); > > +} > > + > > +static void > > netdev_dpdk_vhost_destruct(struct netdev *netdev) { > > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -1878,6 > > +1921,47 @@ out: > > } > > > > static int > > +netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct smap > *args, > > + char **errp) { > > + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > > + const char *vdpa_accelerator_devargs = > > + smap_get(args, "vdpa-accelerator-devargs"); > > + const char *vdpa_socket_path = > > + smap_get(args, "vdpa-socket-path"); > > + int err 
= 0; > > + > > + if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path == > NULL)) { > > + VLOG_ERR("netdev_dpdk_vdpa_set_config failed." > > + "Required arguments are missing for VDPA port %s", > > + netdev->name); > > + goto free_relay; > > + } > > + > > + err = netdev_dpdk_set_config(netdev, args, errp); > > + if (err) { > > + VLOG_ERR("netdev_dpdk_set_config failed. Port: %s", netdev- > >name); > > + goto free_relay; > > + } > > + > > + err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id, > > + vdpa_socket_path, > > + vdpa_accelerator_devargs); > > + if (err) { > > + VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s", > > + netdev->name); > > + goto free_relay; > > + } > > + > > + goto out; > > + > > +free_relay: > > + rte_free(dev->relay); > > +out: > > + return err; > > +} > > + > > +static int > > netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap > *args, > > char **errp OVS_UNUSED) { @@ -2273,6 > > +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct > dp_packet_batch *batch, > > return 0; > > } > > > > +static int > > +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq, > > + struct dp_packet_batch *batch, > > + int *qfill) { > > + struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); > > + int fwd_rx; > > + int ret; > > + > > + fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, > > + rxq->queue_id); > I'm still not clear about the above function. > So netdev_dpdk_vdpa_recv_impl() > netdev_dpdk_vdpa_forward_traffic(), with a queue pair as parameter > ... > rte_eth_rx_burst(qpair->port_id_rx...) > ... > rte_eth_tx_burst(qpair->port_id_tx...) > > So looks like forwarding between vf to vhostuser and vice versa is done in > this function. > > > + ret = netdev_dpdk_rxq_recv(rxq, batch, qfill); > > Then why do we call netdev_dpdk_rxq_recv() above again? > Are packets received above the same packets as rte_eth_rx_burst() > previously called in netdev_dpdk_vdpa_forward_traffic()? 
> > > Thanks > William > > > + if ((ret == EAGAIN) && fwd_rx) { > > + return 0; > > + } > > + return ret; > > +} > > + > > static inline int > > netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts, > > int cnt, bool should_steal) @@ -2854,6 +2955,29 > > @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev, > } > > > > static int > > +netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev, > > + struct netdev_custom_stats > > +*custom_stats) { > > + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > > + int err = 0; > > + > > + ovs_mutex_lock(&dev->mutex); > > + > > + custom_stats->size = VDPA_CUSTOM_STATS_SIZE; > > + custom_stats->counters = xcalloc(custom_stats->size, > > + sizeof *custom_stats->counters); > > + err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay, > > + custom_stats); > > + if (err) { > > + VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed." > > + "Port %s\n", netdev->name); > > + } > > + > > + ovs_mutex_unlock(&dev->mutex); > > + return err; > > +} > > + > > +static int > > netdev_dpdk_get_features(const struct netdev *netdev, > > enum netdev_features *current, > > enum netdev_features *advertised, @@ -4237,6 > > +4361,31 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev) } > > > > static int > > +netdev_dpdk_vdpa_reconfigure(struct netdev *netdev) { > > + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > > + int err; > > + > > + err = netdev_dpdk_reconfigure(netdev); > > + if (err) { > > + VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s", netdev- > >name); > > + goto out; > > + } > > + > > + ovs_mutex_lock(&dev->mutex); > > + err = netdev_dpdk_vdpa_update_relay(dev->relay, dev->dpdk_mp- > >mp, > > + dev->up.n_rxq); > > + if (err) { > > + VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. 
Port %s", > > + netdev->name); > > + } > > + > > + ovs_mutex_unlock(&dev->mutex); > > +out: > > + return err; > > +} > > + > > +static int > > netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) { > > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -4456,6 > > +4605,18 @@ static const struct netdev_class dpdk_vhost_client_class = { > > .rxq_enabled = netdev_dpdk_vhost_rxq_enabled, }; > > > > +static const struct netdev_class dpdk_vdpa_class = { > > + .type = "dpdkvdpa", > > + NETDEV_DPDK_CLASS_COMMON, > > + .construct = netdev_dpdk_vdpa_construct, > > + .destruct = netdev_dpdk_vdpa_destruct, > > + .rxq_recv = netdev_dpdk_vdpa_rxq_recv, > > + .set_config = netdev_dpdk_vdpa_set_config, > > + .reconfigure = netdev_dpdk_vdpa_reconfigure, > > + .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats, > > + .send = netdev_dpdk_eth_send > > +}; > > + > > void > > netdev_dpdk_register(void) > > { > > @@ -4463,4 +4624,5 @@ netdev_dpdk_register(void) > > netdev_register_provider(&dpdk_ring_class); > > netdev_register_provider(&dpdk_vhost_class); > > netdev_register_provider(&dpdk_vhost_client_class); > > + netdev_register_provider(&dpdk_vdpa_class); > > } > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index > > 9a743c0..9e94950 100644 > > --- a/vswitchd/vswitch.xml > > +++ b/vswitchd/vswitch.xml > > @@ -2640,6 +2640,13 @@ > > <dd> > > A pair of virtual devices that act as a patch cable. > > </dd> > > + > > + <dt><code>dpdkvdpa</code></dt> > > + <dd> > > + The dpdk vDPA port allows forwarding bi-directional traffic > > between > > + SR-IOV virtual functions (VFs) and VirtIO devices in virtual > > + machines (VMs). 
> > + </dd> > > </dl> > > </column> > > </group> > > @@ -3156,6 +3163,24 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ > > </p> > > </column> > > > > + <column name="options" key="vdpa-socket-path" > > + type='{"type": "string"}'> > > + <p> > > + The value specifies the path to the socket associated with a VDPA > > + port that will be created by QEMU. > > + Only supported by dpdkvdpa interfaces. > > + </p> > > + </column> > > + > > + <column name="options" key="vdpa-accelerator-devargs" > > + type='{"type": "string"}'> > > + <p> > > + The value specifies the PCI address associated with the virtual > > + function. > > + Only supported by dpdkvdpa interfaces. > > + </p> > > + </column> > > + > > <column name="options" key="dq-zero-copy" > > type='{"type": "boolean"}'> > > <p> > > -- > > 1.8.3.1 > > > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
