> Hello Ciara,
> Thanks for the patches.
> 
> I did not review the code. But I have few general concerns about number of
> tx descriptors in HW NICs inline.
> 
> Best regards, Ilya Maximets.
> 
> On 08.12.2017 18:26, Ciara Loftus wrote:
> > Enabled per port like so:
> > ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> >
> > The feature is disabled by default and can only be enabled/disabled
> > when a vHost port is down.
> >
> > When packets from a vHost device with zero copy enabled are destined
> > for a 'dpdk' port, the number of tx descriptors on that 'dpdk' port
> > must be set to a smaller value. 128 is recommended. This can be
> > achieved like
> > so:
> >
> > ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >
> > Signed-off-by: Ciara Loftus <[email protected]>
> > ---
> > v5:
> > * Rebase
> > * Update docs with warning of potential packet loss
> > * Update docs with info on NIC & Virtio descriptor values and working
> > combinations
> >
> >  Documentation/howto/dpdk.rst             | 33 ++++++++++++
> >  Documentation/topics/dpdk/vhost-user.rst | 61 ++++++++++++++++++++++
> >  NEWS                                     |  2 +
> >  lib/netdev-dpdk.c                        | 89
> +++++++++++++++++++++++++++++++-
> >  vswitchd/vswitch.xml                     | 11 ++++
> >  5 files changed, 195 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/howto/dpdk.rst
> > b/Documentation/howto/dpdk.rst index d123819..3e1b8f8 100644
> > --- a/Documentation/howto/dpdk.rst
> > +++ b/Documentation/howto/dpdk.rst
> > @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow
> the below steps:
> >     Check traffic on multiple queues::
> >
> >         $ cat /proc/interrupts | grep virtio
> > +
> > +PHY-VM-PHY (vHost Dequeue Zero Copy)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +vHost dequeue zero copy functionality can  be validated using the
> > +PHY-VM-PHY configuration. To begin, follow the steps described in
> > +:ref:`dpdk-phy-phy` to create and initialize the database, start
> > +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
> > +and flows to bridge ``br0``. Once complete, follow the below steps:
> > +
> > +1. Enable dequeue zero copy on the vHost devices.
> > +
> > +       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-
> copy=true
> > +       $ ovs-vsctl set Interface dpdkvhostuser1
> > + options:dq-zero-copy=true
> > +
> > +The following log should be observed for each device:
> > +
> > +       netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
> > +
> > +2. Reduce the number of txq descriptors on the phy ports.
> > +
> > +       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
> > +       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
> > +
> > +3. Proceed with the test by launching the VM and configuring guest
> > +forwarding, be it via the vHost loopback method or kernel forwarding
> > +method, and sending traffic. The following log should be oberved for
> > +each device as it becomes active during VM boot:
> > +
> > +       VHOST_CONFIG: dequeue zero copy is enabled
> > +
> > +It is essential that step 1 is performed before booting the VM,
> > +otherwise the feature will not be enabled.
> > diff --git a/Documentation/topics/dpdk/vhost-user.rst
> > b/Documentation/topics/dpdk/vhost-user.rst
> > index 8b1c671..8587370 100644
> > --- a/Documentation/topics/dpdk/vhost-user.rst
> > +++ b/Documentation/topics/dpdk/vhost-user.rst
> > @@ -441,3 +441,64 @@ Sample XML
> >      </domain>
> >
> >  .. _QEMU documentation:
> > http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user
> > .txt;h=7890d7169;hb=HEAD
> > +
> > +vhost-user Dequeue Zero Copy
> > +-------------------------------------
> > +
> > +Normally when dequeuing a packet from a vHost User device, a memcpy
> > +operation must be used to copy that packet from guest address space
> > +to host address space. This memcpy can be removed by enabling dequeue
> zero-copy like so:
> > +
> > +    $ ovs-vsctl set Interface dpdkvhostuserclient0
> > + options:dq-zero-copy=true
> > +
> > +With this feature enabled, a reference (pointer) to the packet is
> > +passed to the host, instead of a copy of the packet. Removing this
> > +memcpy can give a performance improvement for some use cases, for
> > +example switching large packets between different VMs. However
> additional packet loss may be observed.
> > +
> > +Note that the feature is disabled by default and must be explicitly
> > +enabled by using the command above.
> > +
> > +The feature cannot be enabled when the device is active (ie. VM
> > +booted). If you wish to enable the feature after the VM has booted,
> > +you must shutdown the VM and bring it back up.
> > +
> > +The same logic applies for disabling the feature - it must be
> > +disabled when the device is inactive, for example before VM boot. To
> disable the feature:
> > +
> > +    $ ovs-vsctl set Interface dpdkvhostuserclient0
> > + options:dq-zero-copy=false
> > +
> > +The feature is available to both dpdkvhostuser and
> > +dpdkvhostuserclient port types.
> > +
> > +A limitation exists whereby if packets from a vHost port with
> > +dq-zero-copy=true are destined for a 'dpdk' type port, the number of
> > +tx descriptors (n_txq_desc) for that port must be reduced to a
> > +smaller number, 128 being the recommended value. This can be achieved
> by issuing the following command:
> > +
> > +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> > +
> > +More information on the n_txq_desc option can be found in the "DPDK
> > +Physical Port Queue Sizes" section of the  `intro/install/dpdk.rst`
> guide.
> > +
> > +The reason for this limitation is due to how the zero copy
> > +functionality is implemented. The vHost device's 'tx used vring', a
> > +virtio structure used for tracking used ie. sent descriptors, will
> > +only be updated when the NIC frees the corresponding mbuf. If we
> > +don't free the mbufs frequently enough, that vring will be starved
> > +and packets will no longer be processed. One way to ensure we don't
> > +encounter this scenario, is to configure n_txq_desc to a small enough
> > +number such that the 'mbuf free threshold' for the NIC will be hit
> > +more often and thus free mbufs more frequently. The value of 128 is
> > +suggested, but values of 64 and 256 have been tested and verified to
> > +work too, with differing performance characteristics. A value of 512
> > +can be used too, if the virtio queue size in the guest is increased to
> 1024 (available to configure in QEMU versions v2.10 and greater). This
> value can be set like so:
> > +
> > +    $ qemu-system-x86_64 ... -chardev
> socket,id=char1,path=<sockpath>,server
> > +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> > +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
> > +      tx_queue_size=1024
> 
> What if we have 2, 4, 10 HW NICs and a few different destinations for
> traffic from VM?
> What if we have 2, 4, 10 HW NICs added to balanced OVS bonding and just a
> few different packet flows?
> What if we have HW NICs with small number of TX queues (like VFs) and XPS
> is working?
> We may actually stuck with non-working virtual interface in all above
> cases.
> 
> IMHO, this feature should be marked as experimental until we don't have
> proper solution for that stuck mbufs in HW rings.
> 
> Thoughts?

My 2 cents, I'd like to see this get in for 2.9.

I understand there are concerns with corner cases and config combinations but 
it has been available for review and testing for a while and for the most part 
no blocking issues were found.

But as there could be a fix required for the mbuff and rings and other corner 
cases as highlighted above as testing continues I wouldn’t be opposed to 
putting an experimental tag against the feature for the 2.9 release.

Ian

> 
> > +
> > +Further information can be found in the `DPDK documentation
> > +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> > diff --git a/NEWS b/NEWS
> > index d45904e..50630f8 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -18,6 +18,8 @@ Post-v2.8.0
> >     - DPDK:
> >       * Add support for DPDK v17.11
> >       * Add support for vHost IOMMU
> > +     * Optional dequeue zero copy feature for vHost ports can be
> enabled per
> > +       port via the boolean 'dq-zero-copy' option.
> >
> >  v2.8.0 - 31 Aug 2017
> >  --------------------
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > 879019e..03cc6cb 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -361,6 +361,9 @@ struct netdev_dpdk {
> >          /* True if vHost device is 'up' and has been reconfigured at
> least once */
> >          bool vhost_reconfigured;
> >          /* 3 pad bytes here. */
> > +
> > +        /* True if dq-zero-copy feature has successfully been enabled
> */
> > +        bool dq_zc_enabled;
> >      );
> >
> >      PADDED_MEMBERS(CACHE_LINE_SIZE,
> > @@ -871,6 +874,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
> port_no,
> >      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >      ovsrcu_index_init(&dev->vid, -1);
> >      dev->vhost_reconfigured = false;
> > +    dev->dq_zc_enabled = false;
> >      dev->attached = false;
> >
> >      ovsrcu_init(&dev->qos_conf, NULL); @@ -1383,6 +1387,29 @@
> > netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap
> *args,
> >      return 0;
> >  }
> >
> > +static void
> > +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
> > +                             const struct smap *args) {
> > +    bool needs_reconfigure = false;
> > +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> > +
> > +    if (zc_requested &&
> > +            !(dev->vhost_driver_flags &
> RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> > +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> > +        needs_reconfigure = true;
> > +    } else if (!zc_requested &&
> > +            (dev->vhost_driver_flags &
> RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> > +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> > +        needs_reconfigure = true;
> > +    }
> > +
> > +    /* Only try to change ZC mode when device is down */
> > +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
> > +        netdev_request_reconfigure(&dev->up);
> > +    }
> > +}
> > +
> >  static int
> >  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >                                      const struct smap *args, @@
> > -1399,6 +1426,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev
> *netdev,
> >              netdev_request_reconfigure(netdev);
> >          }
> >      }
> > +
> > +    dpdk_vhost_set_config_helper(dev, args);
> > +
> > +    ovs_mutex_unlock(&dev->mutex);
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +netdev_dpdk_vhost_set_config(struct netdev *netdev,
> > +                             const struct smap *args,
> > +                             char **errp OVS_UNUSED) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +
> > +    ovs_mutex_lock(&dev->mutex);
> > +    dpdk_vhost_set_config_helper(dev, args);
> >      ovs_mutex_unlock(&dev->mutex);
> >
> >      return 0;
> > @@ -2723,6 +2767,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk
> *dev)
> >      }
> >  }
> >
> > +static void
> > +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
> > +                            bool enable) {
> > +    int err = rte_vhost_driver_unregister(dev->vhost_id);
> > +
> > +    if (err) {
> > +        VLOG_ERR("Error unregistering vHost socket %s; can't change
> zero copy "
> > +                "mode", dev->vhost_id);
> > +    } else {
> > +        err = dpdk_setup_vhost_device(dev, client_mode);
> > +        if (err) {
> > +            VLOG_ERR("Error changing zero copy mode for vHost socket
> %s",
> > +                    dev->vhost_id);
> > +        } else if (enable) {
> > +            dev->dq_zc_enabled = true;
> > +            VLOG_INFO("Zero copy enabled for vHost socket %s", dev-
> >vhost_id);
> > +        } else {
> > +            dev->dq_zc_enabled = false;
> > +            VLOG_INFO("Zero copy disabled for vHost socket %s", dev-
> >vhost_id);
> > +        }
> > +    }
> > +}
> > +
> > +static void
> > +vhost_check_zero_copy_status(struct netdev_dpdk *dev) {
> > +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
> > +
> > +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> > +                && !dev->dq_zc_enabled) {
> > +        /* ZC disabled but requested to be enabled, enable it. */
> > +        vhost_change_zero_copy_mode(dev, mode, true);
> > +    } else if (!(dev->vhost_driver_flags &
> > +            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
> > +        /* ZC enabled but requested to be disabled, disable it. */
> > +        vhost_change_zero_copy_mode(dev, mode, false);
> > +    }
> > +}
> > +
> >  /*
> >   * Remove a virtio-net device from the specific vhost port.  Use dev-
> >remove
> >   * flag to stop any more packets from being sent or received to/from
> > a VM and @@ -2768,6 +2852,7 @@ destroy_device(int vid)
> >           */
> >          ovsrcu_quiesce_start();
> >          VLOG_INFO("vHost Device '%s' has been removed", ifname);
> > +        netdev_request_reconfigure(&dev->up);
> >      } else {
> >          VLOG_INFO("vHost Device '%s' not found", ifname);
> >      }
> > @@ -3259,6 +3344,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk
> *dev)
> >              /* Carrier status may need updating. */
> >              netdev_change_seq_changed(&dev->up);
> >          }
> > +    } else {
> > +        vhost_check_zero_copy_status(dev);
> >      }
> >
> >      return 0;
> > @@ -3421,7 +3508,7 @@ static const struct netdev_class dpdk_vhost_class
> =
> >          NULL,
> >          netdev_dpdk_vhost_construct,
> >          netdev_dpdk_vhost_destruct,
> > -        NULL,
> > +        netdev_dpdk_vhost_set_config,
> >          NULL,
> >          netdev_dpdk_vhost_send,
> >          netdev_dpdk_vhost_get_carrier, diff --git
> > a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 4c317d0..e8409b4
> > 100644
> > --- a/vswitchd/vswitch.xml
> > +++ b/vswitchd/vswitch.xml
> > @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> type=patch options:peer=p1 \
> >          </p>
> >        </column>
> >
> > +      <column name="options" key="dq-zero-copy"
> > +              type='{"type": "boolean"}'>
> > +        <p>
> > +          The value specifies whether or not to enable dequeue zero
> copy on
> > +          the given interface.
> > +          The port must be in an inactive state in order to enable or
> disable
> > +          this feature.
> > +          Only supported by dpdkvhostuserclient and dpdkvhostuser
> interfaces.
> > +        </p>
> > +      </column>
> > +
> >        <column name="options" key="n_rxq_desc"
> >                type='{"type": "integer", "minInteger": 1, "maxInteger":
> 4096}'>
> >          <p>
> >
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to