> Hello Ciara,
>
> Thanks for the patches.
>
> I did not review the code, but I have a few general concerns about the
> number of tx descriptors in HW NICs, inline.
>
> Best regards, Ilya Maximets.
>
> On 08.12.2017 18:26, Ciara Loftus wrote:
> > Enabled per port like so:
> > ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> >
> > The feature is disabled by default and can only be enabled/disabled
> > when a vHost port is down.
> >
> > When packets from a vHost device with zero copy enabled are destined
> > for a 'dpdk' port, the number of tx descriptors on that 'dpdk' port
> > must be set to a smaller value. 128 is recommended. This can be
> > achieved like so:
> >
> > ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >
> > Signed-off-by: Ciara Loftus <[email protected]>
> > ---
> > v5:
> > * Rebase
> > * Update docs with warning of potential packet loss
> > * Update docs with info on NIC & Virtio descriptor values and working
> >   combinations
> >
> >  Documentation/howto/dpdk.rst             | 33 ++++++++++++
> >  Documentation/topics/dpdk/vhost-user.rst | 61 ++++++++++++++++++++++
> >  NEWS                                     |  2 +
> >  lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-
> >  vswitchd/vswitch.xml                     | 11 ++++
> >  5 files changed, 195 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> > index d123819..3e1b8f8 100644
> > --- a/Documentation/howto/dpdk.rst
> > +++ b/Documentation/howto/dpdk.rst
> > @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the below steps:
> >     Check traffic on multiple queues::
> >
> >        $ cat /proc/interrupts | grep virtio
> > +
> > +PHY-VM-PHY (vHost Dequeue Zero Copy)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +vHost dequeue zero copy functionality can be validated using the
> > +PHY-VM-PHY configuration.
> > +To begin, follow the steps described in
> > +:ref:`dpdk-phy-phy` to create and initialize the database, start
> > +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
> > +and flows to bridge ``br0``. Once complete, follow the below steps:
> > +
> > +1. Enable dequeue zero copy on the vHost devices.
> > +
> > +       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
> > +       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
> > +
> > +The following log should be observed for each device:
> > +
> > +    netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
> > +
> > +2. Reduce the number of txq descriptors on the phy ports.
> > +
> > +       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
> > +       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
> > +
> > +3. Proceed with the test by launching the VM and configuring guest
> > +forwarding, be it via the vHost loopback method or kernel forwarding
> > +method, and sending traffic. The following log should be observed for
> > +each device as it becomes active during VM boot:
> > +
> > +    VHOST_CONFIG: dequeue zero copy is enabled
> > +
> > +It is essential that step 1 is performed before booting the VM,
> > +otherwise the feature will not be enabled.
> > diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> > index 8b1c671..8587370 100644
> > --- a/Documentation/topics/dpdk/vhost-user.rst
> > +++ b/Documentation/topics/dpdk/vhost-user.rst
> > @@ -441,3 +441,64 @@ Sample XML
> >      </domain>
> >
> >  .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
> > +
> > +vhost-user Dequeue Zero Copy
> > +----------------------------
> > +
> > +Normally when dequeuing a packet from a vHost User device, a memcpy
> > +operation must be used to copy that packet from guest address space
> > +to host address space.
> > +This memcpy can be removed by enabling dequeue zero-copy like so:
> > +
> > +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> > +
> > +With this feature enabled, a reference (pointer) to the packet is
> > +passed to the host, instead of a copy of the packet. Removing this
> > +memcpy can give a performance improvement for some use cases, for
> > +example switching large packets between different VMs. However,
> > +additional packet loss may be observed.
> > +
> > +Note that the feature is disabled by default and must be explicitly
> > +enabled by using the command above.
> > +
> > +The feature cannot be enabled when the device is active (i.e. VM
> > +booted). If you wish to enable the feature after the VM has booted,
> > +you must shut down the VM and bring it back up.
> > +
> > +The same logic applies for disabling the feature - it must be
> > +disabled when the device is inactive, for example before VM boot.
> > +To disable the feature:
> > +
> > +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
> > +
> > +The feature is available to both dpdkvhostuser and
> > +dpdkvhostuserclient port types.
> > +
> > +A limitation exists whereby if packets from a vHost port with
> > +dq-zero-copy=true are destined for a 'dpdk' type port, the number of
> > +tx descriptors (n_txq_desc) for that port must be reduced to a
> > +smaller number, 128 being the recommended value. This can be
> > +achieved by issuing the following command:
> > +
> > +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> > +
> > +More information on the n_txq_desc option can be found in the "DPDK
> > +Physical Port Queue Sizes" section of the `intro/install/dpdk.rst`
> > +guide.
> > +
> > +The reason for this limitation is due to how the zero copy
> > +functionality is implemented. The vHost device's 'tx used vring', a
> > +virtio structure used for tracking used, i.e.
> > +sent, descriptors, will only be updated when the NIC frees the
> > +corresponding mbuf. If we don't free the mbufs frequently enough,
> > +that vring will be starved and packets will no longer be processed.
> > +One way to ensure we don't encounter this scenario is to configure
> > +n_txq_desc to a small enough number such that the 'mbuf free
> > +threshold' for the NIC will be hit more often and thus free mbufs
> > +more frequently. The value of 128 is suggested, but values of 64 and
> > +256 have been tested and verified to work too, with differing
> > +performance characteristics. A value of 512 can be used too, if the
> > +virtio queue size in the guest is increased to 1024 (available to
> > +configure in QEMU versions v2.10 and greater). This value can be set
> > +like so:
> > +
> > +    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
> > +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> > +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
> > +      tx_queue_size=1024
>
> What if we have 2, 4, 10 HW NICs and a few different destinations for
> traffic from the VM?
> What if we have 2, 4, 10 HW NICs added to balanced OVS bonding and just
> a few different packet flows?
> What if we have HW NICs with a small number of TX queues (like VFs) and
> XPS is working?
> We may actually get stuck with a non-working virtual interface in all of
> the above cases.
>
> IMHO, this feature should be marked as experimental until we have a
> proper solution for those stuck mbufs in HW rings.
>
> Thoughts?
My 2 cents: I'd like to see this get in for 2.9. I understand there are
concerns with corner cases and config combinations, but it has been
available for review and testing for a while and, for the most part, no
blocking issues were found. That said, as a fix could still be required
for the mbufs stuck in HW rings and the other corner cases highlighted
above as testing continues, I wouldn't be opposed to putting an
experimental tag against the feature for the 2.9 release.

Ian

> > +
> > +Further information can be found in the `DPDK documentation
> > +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> > diff --git a/NEWS b/NEWS
> > index d45904e..50630f8 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -18,6 +18,8 @@ Post-v2.8.0
> >     - DPDK:
> >       * Add support for DPDK v17.11
> >       * Add support for vHost IOMMU
> > +     * Optional dequeue zero copy feature for vHost ports can be enabled per
> > +       port via the boolean 'dq-zero-copy' option.
> >
> >  v2.8.0 - 31 Aug 2017
> >  --------------------
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 879019e..03cc6cb 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -361,6 +361,9 @@ struct netdev_dpdk {
> >          /* True if vHost device is 'up' and has been reconfigured at least once */
> >          bool vhost_reconfigured;
> >          /* 3 pad bytes here.
> >           */
> > +
> > +        /* True if dq-zero-copy feature has successfully been enabled */
> > +        bool dq_zc_enabled;
> >      );
> >
> >      PADDED_MEMBERS(CACHE_LINE_SIZE,
> > @@ -871,6 +874,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
> >      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >      ovsrcu_index_init(&dev->vid, -1);
> >      dev->vhost_reconfigured = false;
> > +    dev->dq_zc_enabled = false;
> >      dev->attached = false;
> >
> >      ovsrcu_init(&dev->qos_conf, NULL);
> > @@ -1383,6 +1387,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
> >      return 0;
> >  }
> >
> > +static void
> > +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
> > +                             const struct smap *args)
> > +{
> > +    bool needs_reconfigure = false;
> > +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> > +
> > +    if (zc_requested &&
> > +        !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> > +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> > +        needs_reconfigure = true;
> > +    } else if (!zc_requested &&
> > +               (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> > +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> > +        needs_reconfigure = true;
> > +    }
> > +
> > +    /* Only try to change ZC mode when device is down */
> > +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
> > +        netdev_request_reconfigure(&dev->up);
> > +    }
> > +}
> > +
> >  static int
> >  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >                                      const struct smap *args,
> > @@ -1399,6 +1426,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >              netdev_request_reconfigure(netdev);
> >          }
> >      }
> > +
> > +    dpdk_vhost_set_config_helper(dev, args);
> > +
> > +    ovs_mutex_unlock(&dev->mutex);
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +netdev_dpdk_vhost_set_config(struct netdev *netdev,
> > +                             const struct smap *args,
> > +                             char **errp OVS_UNUSED)
> > +{
> > +    struct netdev_dpdk *dev =
> > +        netdev_dpdk_cast(netdev);
> > +
> > +    ovs_mutex_lock(&dev->mutex);
> > +    dpdk_vhost_set_config_helper(dev, args);
> >      ovs_mutex_unlock(&dev->mutex);
> >
> >      return 0;
> > @@ -2723,6 +2767,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
> >      }
> >  }
> >
> > +static void
> > +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
> > +                            bool enable)
> > +{
> > +    int err = rte_vhost_driver_unregister(dev->vhost_id);
> > +
> > +    if (err) {
> > +        VLOG_ERR("Error unregistering vHost socket %s; can't change zero copy "
> > +                 "mode", dev->vhost_id);
> > +    } else {
> > +        err = dpdk_setup_vhost_device(dev, client_mode);
> > +        if (err) {
> > +            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
> > +                     dev->vhost_id);
> > +        } else if (enable) {
> > +            dev->dq_zc_enabled = true;
> > +            VLOG_INFO("Zero copy enabled for vHost socket %s", dev->vhost_id);
> > +        } else {
> > +            dev->dq_zc_enabled = false;
> > +            VLOG_INFO("Zero copy disabled for vHost socket %s", dev->vhost_id);
> > +        }
> > +    }
> > +}
> > +
> > +static void
> > +vhost_check_zero_copy_status(struct netdev_dpdk *dev)
> > +{
> > +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
> > +
> > +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> > +        && !dev->dq_zc_enabled) {
> > +        /* ZC disabled but requested to be enabled, enable it. */
> > +        vhost_change_zero_copy_mode(dev, mode, true);
> > +    } else if (!(dev->vhost_driver_flags &
> > +                 RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
> > +        /* ZC enabled but requested to be disabled, disable it. */
> > +        vhost_change_zero_copy_mode(dev, mode, false);
> > +    }
> > +}
> > +
> >  /*
> >   * Remove a virtio-net device from the specific vhost port.
> >   * Use dev->remove
> >   * flag to stop any more packets from being sent or received to/from a VM and
> > @@ -2768,6 +2852,7 @@ destroy_device(int vid)
> >           */
> >          ovsrcu_quiesce_start();
> >          VLOG_INFO("vHost Device '%s' has been removed", ifname);
> > +        netdev_request_reconfigure(&dev->up);
> >      } else {
> >          VLOG_INFO("vHost Device '%s' not found", ifname);
> >      }
> > @@ -3259,6 +3344,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
> >              /* Carrier status may need updating. */
> >              netdev_change_seq_changed(&dev->up);
> >          }
> > +    } else {
> > +        vhost_check_zero_copy_status(dev);
> >      }
> >
> >      return 0;
> > @@ -3421,7 +3508,7 @@ static const struct netdev_class dpdk_vhost_class =
> >          NULL,
> >          netdev_dpdk_vhost_construct,
> >          netdev_dpdk_vhost_destruct,
> > -        NULL,
> > +        netdev_dpdk_vhost_set_config,
> >          NULL,
> >          netdev_dpdk_vhost_send,
> >          netdev_dpdk_vhost_get_carrier,
> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> > index 4c317d0..e8409b4 100644
> > --- a/vswitchd/vswitch.xml
> > +++ b/vswitchd/vswitch.xml
> > @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
> >          </p>
> >        </column>
> >
> > +      <column name="options" key="dq-zero-copy"
> > +              type='{"type": "boolean"}'>
> > +        <p>
> > +          The value specifies whether or not to enable dequeue zero copy on
> > +          the given interface.
> > +          The port must be in an inactive state in order to enable or disable
> > +          this feature.
> > +          Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
> > +        </p>
> > +      </column>
> > +
> >        <column name="options" key="n_rxq_desc"
> >                type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
> >          <p>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
