On 12/14/2017 03:30 PM, Ilya Maximets wrote:
> Hello Ciara,
> Thanks for the patches.
> 
> I did not review the code, but I have a few general concerns inline about
> the number of tx descriptors in HW NICs.
> 
> Best regards, Ilya Maximets.
> 
> On 08.12.2017 18:26, Ciara Loftus wrote:
>> Enabled per port like so:
>> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
>>
>> The feature is disabled by default and can only be enabled/disabled when
>> a vHost port is down.
>>
>> When packets from a vHost device with zero copy enabled are destined for
>> a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
>> set to a smaller value. 128 is recommended. This can be achieved like
>> so:
>>
>> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>>
>> Signed-off-by: Ciara Loftus <[email protected]>
>> ---
>> v5:
>> * Rebase
>> * Update docs with warning of potential packet loss
>> * Update docs with info on NIC & Virtio descriptor values and working
>> combinations
>>
>>  Documentation/howto/dpdk.rst             | 33 ++++++++++++
>>  Documentation/topics/dpdk/vhost-user.rst | 61 ++++++++++++++++++++++
>>  NEWS                                     |  2 +
>>  lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-
>>  vswitchd/vswitch.xml                     | 11 ++++
>>  5 files changed, 195 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>> index d123819..3e1b8f8 100644
>> --- a/Documentation/howto/dpdk.rst
>> +++ b/Documentation/howto/dpdk.rst
>> @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the below steps:
>>     Check traffic on multiple queues::
>>  
>>         $ cat /proc/interrupts | grep virtio
>> +
>> +PHY-VM-PHY (vHost Dequeue Zero Copy)
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +vHost dequeue zero copy functionality can be validated using the
>> +PHY-VM-PHY configuration. To begin, follow the steps described in
>> +:ref:`dpdk-phy-phy` to create and initialize the database, start
>> +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
>> +and flows to bridge ``br0``. Once complete, follow the below steps:
>> +
>> +1. Enable dequeue zero copy on the vHost devices::
>> +
>> +       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
>> +       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
>> +
>> +The following log should be observed for each device::
>> +
>> +       netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
>> +
>> +2. Reduce the number of txq descriptors on the phy ports::
>> +
>> +       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
>> +       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
>> +
>> +3. Proceed with the test by launching the VM and configuring guest
>> +forwarding, be it via the vHost loopback method or kernel forwarding
>> +method, and sending traffic. The following log should be observed for
>> +each device as it becomes active during VM boot::
>> +
>> +       VHOST_CONFIG: dequeue zero copy is enabled
>> +
>> +It is essential that step 1 is performed before booting the VM, otherwise
>> +the feature will not be enabled.
>> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
>> index 8b1c671..8587370 100644
>> --- a/Documentation/topics/dpdk/vhost-user.rst
>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>> @@ -441,3 +441,64 @@ Sample XML
>>      </domain>
>>  
>> .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
>> +
>> +vhost-user Dequeue Zero Copy
>> +----------------------------
>> +
>> +Normally when dequeuing a packet from a vHost User device, a memcpy operation
>> +must be used to copy that packet from guest address space to host address
>> +space. This memcpy can be removed by enabling dequeue zero-copy like so::
>> +
>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
>> +
>> +With this feature enabled, a reference (pointer) to the packet is passed to
>> +the host, instead of a copy of the packet. Removing this memcpy can give a
>> +performance improvement for some use cases, for example switching large packets
>> +between different VMs. However, additional packet loss may be observed.
>> +
>> +Note that the feature is disabled by default and must be explicitly enabled
>> +by using the command above.
>> +
>> +The feature cannot be enabled when the device is active (i.e. VM booted). If
>> +you wish to enable the feature after the VM has booted, you must shut down
>> +the VM and bring it back up.
>> +
>> +The same logic applies for disabling the feature - it must be disabled when
>> +the device is inactive, for example before VM boot. To disable the feature::
>> +
>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
>> +
>> +The feature is available to both dpdkvhostuser and dpdkvhostuserclient port
>> +types.
>> +
>> +A limitation exists whereby if packets from a vHost port with dq-zero-copy=true
>> +are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)
>> +for that port must be reduced to a smaller number, 128 being the recommended
>> +value. This can be achieved by issuing the following command::
>> +
>> +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>> +
>> +More information on the n_txq_desc option can be found in the "DPDK Physical
>> +Port Queue Sizes" section of the ``intro/install/dpdk.rst`` guide.
>> +
>> +This limitation is due to how the zero copy functionality is
>> +implemented. The vHost device's 'tx used vring', a virtio structure used for
>> +tracking used (i.e. sent) descriptors, will only be updated when the NIC frees
>> +the corresponding mbuf. If we don't free the mbufs frequently enough, that
>> +vring will be starved and packets will no longer be processed. One way to
>> +ensure we don't encounter this scenario is to configure n_txq_desc to a small
>> +enough number such that the 'mbuf free threshold' for the NIC will be hit more
>> +often and thus free mbufs more frequently. The value of 128 is suggested, but
>> +values of 64 and 256 have been tested and verified to work too, with differing
>> +performance characteristics. A value of 512 can be used too, if the virtio
>> +queue size in the guest is increased to 1024 (available to configure in QEMU
>> +versions v2.10 and greater). This value can be set like so::
>> +
>> +    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
>> +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
>> +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
>> +      tx_queue_size=1024
> 
> What if we have 2, 4, 10 HW NICs and a few different destinations for traffic from the VM?
> What if we have 2, 4, 10 HW NICs added to balanced OVS bonding and just a few different packet flows?
> What if we have HW NICs with a small number of TX queues (like VFs) and XPS is working?
> We may actually end up stuck with a non-working virtual interface in all of the above cases.
> 
> IMHO, this feature should be marked as experimental until we have a proper solution for those stuck mbufs in HW rings.
> 
> Thoughts?
> 
> 

How about using rte_eth_tx_done_cleanup()? I guess a user would not
want to combine this feature with sw tx batching, though.

On a related note, I would like to ask Ciara/Jan about the use case where this
feature would likely be enabled. The RFC2544 test is very opaque, but it
seems there is at least some increased packet loss, so would it be enabled just
when the user knows the traffic is VM-to-VM with large packets? (In which case
you could argue packets would not get stuck in the HW NIC anyway.)

Kevin.

>> +
>> +Further information can be found in the
>> +`DPDK documentation
>> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
>> diff --git a/NEWS b/NEWS
>> index d45904e..50630f8 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -18,6 +18,8 @@ Post-v2.8.0
>>     - DPDK:
>>       * Add support for DPDK v17.11
>>       * Add support for vHost IOMMU
>> +     * Optional dequeue zero copy feature for vHost ports can be enabled per
>> +       port via the boolean 'dq-zero-copy' option.
>>  
>>  v2.8.0 - 31 Aug 2017
>>  --------------------
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 879019e..03cc6cb 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -361,6 +361,9 @@ struct netdev_dpdk {
>>          /* True if vHost device is 'up' and has been reconfigured at least once */
>>          bool vhost_reconfigured;
>>          /* 3 pad bytes here. */
>> +
>> +        /* True if dq-zero-copy feature has successfully been enabled. */
>> +        bool dq_zc_enabled;
>>      );
>>  
>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>> @@ -871,6 +874,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>      ovsrcu_index_init(&dev->vid, -1);
>>      dev->vhost_reconfigured = false;
>> +    dev->dq_zc_enabled = false;
>>      dev->attached = false;
>>  
>>      ovsrcu_init(&dev->qos_conf, NULL);
>> @@ -1383,6 +1387,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
>>      return 0;
>>  }
>>  
>> +static void
>> +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
>> +                             const struct smap *args)
>> +{
>> +    bool needs_reconfigure = false;
>> +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
>> +
>> +    if (zc_requested &&
>> +            !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
>> +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
>> +        needs_reconfigure = true;
>> +    } else if (!zc_requested &&
>> +            (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
>> +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
>> +        needs_reconfigure = true;
>> +    }
>> +
>> +    /* Only try to change ZC mode when the device is down. */
>> +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
>> +        netdev_request_reconfigure(&dev->up);
>> +    }
>> +}
>> +
>>  static int
>>  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>>                                      const struct smap *args,
>> @@ -1399,6 +1426,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>>              netdev_request_reconfigure(netdev);
>>          }
>>      }
>> +
>> +    dpdk_vhost_set_config_helper(dev, args);
>> +
>> +    ovs_mutex_unlock(&dev->mutex);
>> +
>> +    return 0;
>> +}
>> +
>> +static int
>> +netdev_dpdk_vhost_set_config(struct netdev *netdev,
>> +                             const struct smap *args,
>> +                             char **errp OVS_UNUSED)
>> +{
>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +
>> +    ovs_mutex_lock(&dev->mutex);
>> +    dpdk_vhost_set_config_helper(dev, args);
>>      ovs_mutex_unlock(&dev->mutex);
>>  
>>      return 0;
>> @@ -2723,6 +2767,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
>>      }
>>  }
>>  
>> +static void
>> +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
>> +                            bool enable)
>> +{
>> +    int err = rte_vhost_driver_unregister(dev->vhost_id);
>> +
>> +    if (err) {
>> +        VLOG_ERR("Error unregistering vHost socket %s; can't change zero copy "
>> +                "mode", dev->vhost_id);
>> +    } else {
>> +        err = dpdk_setup_vhost_device(dev, client_mode);
>> +        if (err) {
>> +            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
>> +                    dev->vhost_id);
>> +        } else if (enable) {
>> +            dev->dq_zc_enabled = true;
>> +            VLOG_INFO("Zero copy enabled for vHost socket %s", dev->vhost_id);
>> +        } else {
>> +            dev->dq_zc_enabled = false;
>> +            VLOG_INFO("Zero copy disabled for vHost socket %s", dev->vhost_id);
>> +        }
>> +    }
>> +}
>> +
>> +static void
>> +vhost_check_zero_copy_status(struct netdev_dpdk *dev)
>> +{
>> +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
>> +
>> +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
>> +                && !dev->dq_zc_enabled) {
>> +        /* ZC disabled but requested to be enabled, enable it. */
>> +        vhost_change_zero_copy_mode(dev, mode, true);
>> +    } else if (!(dev->vhost_driver_flags &
>> +            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
>> +        /* ZC enabled but requested to be disabled, disable it. */
>> +        vhost_change_zero_copy_mode(dev, mode, false);
>> +    }
>> +}
>> +
>>  /*
>>   * Remove a virtio-net device from the specific vhost port.  Use dev->remove
>>   * flag to stop any more packets from being sent or received to/from a VM and
>> @@ -2768,6 +2852,7 @@ destroy_device(int vid)
>>           */
>>          ovsrcu_quiesce_start();
>>          VLOG_INFO("vHost Device '%s' has been removed", ifname);
>> +        netdev_request_reconfigure(&dev->up);
>>      } else {
>>          VLOG_INFO("vHost Device '%s' not found", ifname);
>>      }
>> @@ -3259,6 +3344,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>>              /* Carrier status may need updating. */
>>              netdev_change_seq_changed(&dev->up);
>>          }
>> +    } else {
>> +        vhost_check_zero_copy_status(dev);
>>      }
>>  
>>      return 0;
>> @@ -3421,7 +3508,7 @@ static const struct netdev_class dpdk_vhost_class =
>>          NULL,
>>          netdev_dpdk_vhost_construct,
>>          netdev_dpdk_vhost_destruct,
>> -        NULL,
>> +        netdev_dpdk_vhost_set_config,
>>          NULL,
>>          netdev_dpdk_vhost_send,
>>          netdev_dpdk_vhost_get_carrier,
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 4c317d0..e8409b4 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>          </p>
>>        </column>
>>  
>> +      <column name="options" key="dq-zero-copy"
>> +              type='{"type": "boolean"}'>
>> +        <p>
>> +          The value specifies whether or not to enable dequeue zero copy on
>> +          the given interface.
>> +          The port must be in an inactive state in order to enable or disable
>> +          this feature.
>> +          Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
>> +        </p>
>> +      </column>
>> +
>>        <column name="options" key="n_rxq_desc"
>>                type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
>>          <p>
>>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
