> > On 18.12.2017 15:28, Loftus, Ciara wrote:
> >>
> >> Not a full review.
> >
> > Thanks for your feedback.
> >
> >>
> >> General thoughts:
> >>
> >> If following conditions are true:
> >>
> >> 1. We don't need to add new feature to deprecated vhostuser port.
> >
> > Agree.
> >
> >>
> >> 2. We actually don't need to have ability to change ZC config if
> >>    vhost-server-path already configured.
> >>
> >> Let me explain this condition:
> >> To change 'dq-zero-copy' if VM already started we need to stop VM,
> >> change the 'dq-zero-copy' and start the VM back. It's not much simpler
> >> than stop VM, re-add port with different value for 'dq-zero-copy' and
> >> start VM back. I'm assuming here that VM restart is much more complex
> >> and unlikely operation than port removing from OVS and adding back.
> >> This will require documenting, but we already have a bunch of docs
> >> about how to modify this config option in current version.
> >
> > Don't fully agree. I can't say whether your assumption is correct.
> > Port-deletion & re-addition has repercussions eg. loss of statistics
> > from an interface. Retaining those stats might be important for some.
> > Why not leave it up to the user to choose their preferred method of
> > enabling the feature ie. reboot the VM or delete & add the port.
> >
> >>
> >> we may drop patch #1 from the series and use just something like the
> >> following code instead of the code changes in this patch (not tested,
> >> just for a reference):
> >>
> >> ---------------
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index 2325f0e..c4cbf7c 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -379,6 +379,9 @@ struct netdev_dpdk {
> >>          /* True if vHost device is 'up' and has been reconfigured at least once */
> >>          bool vhost_reconfigured;
> >>          /* 3 pad bytes here. */
> >> +
> >> +        /* True if dq-zero-copy feature requested for vhost device. */
> >> +        bool vhost_dq_zc;
> >>      );
> >>
> >>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >> @@ -889,6 +892,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
> >>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >>      ovsrcu_index_init(&dev->vid, -1);
> >>      dev->vhost_reconfigured = false;
> >> +    dev->vhost_dq_zc = false;
> >>      dev->attached = false;
> >>
> >>      ovsrcu_init(&dev->qos_conf, NULL);
> >> @@ -1384,6 +1388,7 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >>      path = smap_get(args, "vhost-server-path");
> >>      if (path && strcmp(path, dev->vhost_id)) {
> >>          strcpy(dev->vhost_id, path);
> >> +        dev->vhost_dq_zc = smap_get_bool(args, "dq-zero-copy", false);
> >>          netdev_request_reconfigure(netdev);
> >>      }
> >>  }
> >> @@ -3288,6 +3293,11 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
> >>      if (dpdk_vhost_iommu_enabled()) {
> >>          vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
> >>      }
> >> +
> >> +    /* Enable Zero Copy mode if configured. */
> >> +    if (dev->vhost_dq_zc) {
> >> +        vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> >> +    }
> >>      err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
> >>      if (err) {
> >>          VLOG_ERR("vhost-user device setup failure for device %s\n",
> >>
> >> ---------------
> >>
> >> Thoughts?
> >
> > The above is a lot shorter of course which is nice, but I don't think
> > we should sacrifice the ability to enable the feature post-boot for
> > this.
>
> But you wrote in documentation below:
>
> +The feature cannot be enabled when the device is active (ie. VM booted). If
> +you wish to enable the feature after the VM has booted, you must shutdown
> +the VM and bring it back up.
>
> What do you mean by saying "sacrifice the ability to enable the feature
> post-boot"? Am I missing something?
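The shorter variant suggested above amounts to latching the option into a boolean at set_config time and OR-ing one extra flag into the mask passed to rte_vhost_driver_register(). A standalone sketch of that flag computation follows; the flag values here are stand-ins for illustration, not the real definitions from rte_vhost.h:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values; the real ones come from DPDK's rte_vhost.h. */
#define RTE_VHOST_USER_CLIENT            (1ULL << 0)
#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2)
#define RTE_VHOST_USER_IOMMU_SUPPORT     (1ULL << 3)

/* Build the registration flag mask, mirroring the suggested change to
 * netdev_dpdk_vhost_client_reconfigure(): each independent capability
 * contributes one bit. */
static uint64_t
vhost_registration_flags(bool client_mode, bool iommu, bool dq_zc)
{
    uint64_t flags = 0;

    if (client_mode) {
        flags |= RTE_VHOST_USER_CLIENT;
    }
    if (iommu) {
        flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
    }
    /* Enable Zero Copy mode if configured. */
    if (dq_zc) {
        flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
    }
    return flags;
}
```

Because the boolean is only read at registration time, the zero-copy setting in this variant takes effect only when the socket is (re)registered, which is the trade-off discussed above.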
My understanding is that your suggestion means you sacrifice the ability to
enable the feature post *first* VM boot. Am I correct? My suggestion (and how
the patch is implemented now) allows you to enable the feature post *first*
VM boot, by means of a re-boot.

Thanks,
Ciara

> >
> > Thanks,
> > Ciara
> >
> >>
> >>
> >> Best regards, Ilya Maximets.
> >>
> >> P.S. A few comments about the patch itself are inline.
> >>
> >> On 08.12.2017 18:26, Ciara Loftus wrote:
> >>> Enabled per port like so:
> >>> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> >>>
> >>> The feature is disabled by default and can only be enabled/disabled when
> >>> a vHost port is down.
> >>>
> >>> When packets from a vHost device with zero copy enabled are destined for
> >>> a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
> >>> set to a smaller value. 128 is recommended. This can be achieved like
> >>> so:
> >>>
> >>> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >>>
> >>> Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
> >>> ---
> >>> v5:
> >>> * Rebase
> >>> * Update docs with warning of potential packet loss
> >>> * Update docs with info on NIC & Virtio descriptor values and working
> >>>   combinations
> >>>
> >>>  Documentation/howto/dpdk.rst             | 33 ++++++++++++
> >>>  Documentation/topics/dpdk/vhost-user.rst | 61 ++++++++++++++++++++++
> >>>  NEWS                                     |  2 +
> >>>  lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-
> >>>  vswitchd/vswitch.xml                     | 11 ++++
> >>>  5 files changed, 195 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> >>> index d123819..3e1b8f8 100644
> >>> --- a/Documentation/howto/dpdk.rst
> >>> +++ b/Documentation/howto/dpdk.rst
> >>> @@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the below steps:
> >>>  Check traffic on multiple queues::
> >>>
> >>>      $ cat /proc/interrupts | grep virtio
> >>> +
> >>> +PHY-VM-PHY (vHost Dequeue Zero Copy)
> >>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> +
> >>> +vHost dequeue zero copy functionality can be validated using the
> >>> +PHY-VM-PHY configuration. To begin, follow the steps described in
> >>> +:ref:`dpdk-phy-phy` to create and initialize the database, start
> >>> +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
> >>> +and flows to bridge ``br0``. Once complete, follow the below steps:
> >>> +
> >>> +1. Enable dequeue zero copy on the vHost devices.
> >>> +
> >>> +       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
> >>> +       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
> >>> +
> >>> +The following log should be observed for each device:
> >>> +
> >>> +       netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
> >>> +
> >>> +2. Reduce the number of txq descriptors on the phy ports.
> >>> +
> >>> +       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
> >>> +       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
> >>> +
> >>> +3. Proceed with the test by launching the VM and configuring guest
> >>> +forwarding, be it via the vHost loopback method or kernel forwarding
> >>> +method, and sending traffic. The following log should be observed for
> >>> +each device as it becomes active during VM boot:
> >>> +
> >>> +       VHOST_CONFIG: dequeue zero copy is enabled
> >>> +
> >>> +It is essential that step 1 is performed before booting the VM, otherwise
> >>> +the feature will not be enabled.
> >>> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> >>> index 8b1c671..8587370 100644
> >>> --- a/Documentation/topics/dpdk/vhost-user.rst
> >>> +++ b/Documentation/topics/dpdk/vhost-user.rst
> >>> @@ -441,3 +441,64 @@ Sample XML
> >>>  </domain>
> >>>
> >>>  .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
> >>> +
> >>> +vhost-user Dequeue Zero Copy
> >>> +----------------------------
> >>> +
> >>> +Normally when dequeuing a packet from a vHost User device, a memcpy
> >>> +operation must be used to copy that packet from guest address space to
> >>> +host address space. This memcpy can be removed by enabling dequeue
> >>> +zero-copy like so:
> >>> +
> >>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> >>> +
> >>> +With this feature enabled, a reference (pointer) to the packet is passed
> >>> +to the host, instead of a copy of the packet. Removing this memcpy can
> >>> +give a performance improvement for some use cases, for example switching
> >>> +large packets between different VMs. However additional packet loss may
> >>> +be observed.
> >>> +
> >>> +Note that the feature is disabled by default and must be explicitly
> >>> +enabled by using the command above.
> >>> +
> >>> +The feature cannot be enabled when the device is active (ie. VM booted).
> >>> +If you wish to enable the feature after the VM has booted, you must
> >>> +shutdown the VM and bring it back up.
> >>> +
> >>> +The same logic applies for disabling the feature - it must be disabled
> >>> +when the device is inactive, for example before VM boot. To disable the
> >>> +feature:
> >>> +
> >>> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
> >>> +
> >>> +The feature is available to both dpdkvhostuser and dpdkvhostuserclient
> >>> +port types.
> >>> +
> >>> +A limitation exists whereby if packets from a vHost port with
> >>> +dq-zero-copy=true are destined for a 'dpdk' type port, the number of tx
> >>> +descriptors (n_txq_desc) for that port must be reduced to a smaller
> >>> +number, 128 being the recommended value. This can be achieved by issuing
> >>> +the following command:
> >>> +
> >>> +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >>> +
> >>> +More information on the n_txq_desc option can be found in the "DPDK
> >>> +Physical Port Queue Sizes" section of the `intro/install/dpdk.rst` guide.
> >>
> >> I think some :ref: link should be here.
> >>
> >>> +
> >>> +The reason for this limitation is due to how the zero copy functionality
> >>> +is implemented. The vHost device's 'tx used vring', a virtio structure
> >>> +used for tracking used ie. sent descriptors, will only be updated when
> >>> +the NIC frees the corresponding mbuf. If we don't free the mbufs
> >>> +frequently enough, that vring will be starved and packets will no longer
> >>> +be processed. One way to ensure we don't encounter this scenario is to
> >>> +configure n_txq_desc to a small enough number such that the 'mbuf free
> >>> +threshold' for the NIC will be hit more often and thus free mbufs more
> >>> +frequently. The value of 128 is suggested, but values of 64 and 256 have
> >>> +been tested and verified to work too, with differing performance
> >>> +characteristics. A value of 512 can be used too, if the virtio queue
> >>> +size in the guest is increased to 1024 (available to configure in QEMU
> >>> +versions v2.10 and greater). This value can be set like so:
> >>> +
> >>> +    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
> >>> +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> >>> +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
> >>> +      tx_queue_size=1024
> >>> +
> >>> +Further information can be found in the
> >>> +`DPDK documentation <http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> >>> diff --git a/NEWS b/NEWS
> >>> index d45904e..50630f8 100644
> >>> --- a/NEWS
> >>> +++ b/NEWS
> >>> @@ -18,6 +18,8 @@ Post-v2.8.0
> >>>     - DPDK:
> >>>       * Add support for DPDK v17.11
> >>>       * Add support for vHost IOMMU
> >>> +     * Optional dequeue zero copy feature for vHost ports can be enabled
> >>> +       per port via the boolean 'dq-zero-copy' option.
> >>>
> >>>  v2.8.0 - 31 Aug 2017
> >>>  --------------------
> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >>> index 879019e..03cc6cb 100644
> >>> --- a/lib/netdev-dpdk.c
> >>> +++ b/lib/netdev-dpdk.c
> >>> @@ -361,6 +361,9 @@ struct netdev_dpdk {
> >>>          /* True if vHost device is 'up' and has been reconfigured at least once */
> >>>          bool vhost_reconfigured;
> >>>          /* 3 pad bytes here. */
> >>> +
> >>> +        /* True if dq-zero-copy feature has successfully been enabled */
> >>> +        bool dq_zc_enabled;
> >>
> >> "pad bytes" should be fixed above or removed entirely, or we may just
> >> revert padding here too because it's incorrect just like in
> >> dp_netdev_pmd_thread. The only difference is that struct netdev_dpdk is
> >> aligned to a cacheline.
> >>
> >>>      );
> >>>
> >>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >>> @@ -871,6 +874,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
> >>>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >>>      ovsrcu_index_init(&dev->vid, -1);
> >>>      dev->vhost_reconfigured = false;
> >>> +    dev->dq_zc_enabled = false;
> >>>      dev->attached = false;
> >>>
> >>>      ovsrcu_init(&dev->qos_conf, NULL);
> >>> @@ -1383,6 +1387,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
> >>>      return 0;
> >>>  }
> >>>
> >>> +static void
> >>> +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
> >>> +                             const struct smap *args)
> >>> +{
> >>> +    bool needs_reconfigure = false;
> >>> +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> >>> +
> >>> +    if (zc_requested &&
> >>> +        !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> >>> +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> >>> +        needs_reconfigure = true;
> >>> +    } else if (!zc_requested &&
> >>> +               (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> >>> +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> >>> +        needs_reconfigure = true;
> >>> +    }
> >>> +
> >>> +    /* Only try to change ZC mode when device is down */
> >>> +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
> >>> +        netdev_request_reconfigure(&dev->up);
> >>> +    }
> >>> +}
> >>> +
> >>>  static int
> >>>  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >>>                                      const struct smap *args,
> >>> @@ -1399,6 +1426,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >>>          netdev_request_reconfigure(netdev);
> >>>      }
> >>>  }
> >>> +
> >>> +    dpdk_vhost_set_config_helper(dev, args);
> >>> +
> >>> +    ovs_mutex_unlock(&dev->mutex);
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static int
> >>> +netdev_dpdk_vhost_set_config(struct netdev *netdev,
> >>> +                             const struct smap *args,
> >>> +                             char **errp OVS_UNUSED)
> >>> +{
> >>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >>> +
> >>> +    ovs_mutex_lock(&dev->mutex);
> >>> +    dpdk_vhost_set_config_helper(dev, args);
> >>>      ovs_mutex_unlock(&dev->mutex);
> >>>
> >>>      return 0;
> >>> @@ -2723,6 +2767,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
> >>>      }
> >>>  }
> >>>
> >>> +static void
> >>> +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
> >>> +                            bool enable)
> >>> +{
> >>> +    int err = rte_vhost_driver_unregister(dev->vhost_id);
> >>> +
> >>> +    if (err) {
> >>> +        VLOG_ERR("Error unregistering vHost socket %s; can't change zero copy "
> >>> +                 "mode", dev->vhost_id);
> >>> +    } else {
> >>> +        err = dpdk_setup_vhost_device(dev, client_mode);
> >>> +        if (err) {
> >>> +            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
> >>> +                     dev->vhost_id);
> >>> +        } else if (enable) {
> >>> +            dev->dq_zc_enabled = true;
> >>> +            VLOG_INFO("Zero copy enabled for vHost socket %s", dev->vhost_id);
> >>> +        } else {
> >>> +            dev->dq_zc_enabled = false;
> >>> +            VLOG_INFO("Zero copy disabled for vHost socket %s", dev->vhost_id);
> >>> +        }
> >>> +    }
> >>> +}
> >>> +
> >>> +static void
> >>> +vhost_check_zero_copy_status(struct netdev_dpdk *dev)
> >>> +{
> >>> +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
> >>> +
> >>> +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> >>> +        && !dev->dq_zc_enabled) {
> >>> +        /* ZC disabled but requested to be enabled, enable it. */
> >>> +        vhost_change_zero_copy_mode(dev, mode, true);
> >>> +    } else if (!(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> >>> +               && dev->dq_zc_enabled) {
> >>> +        /* ZC enabled but requested to be disabled, disable it. */
> >>> +        vhost_change_zero_copy_mode(dev, mode, false);
> >>> +    }
> >>> +}
> >>> +
> >>>  /*
> >>>   * Remove a virtio-net device from the specific vhost port. Use dev->remove
> >>>   * flag to stop any more packets from being sent or received to/from a VM and
> >>> @@ -2768,6 +2852,7 @@ destroy_device(int vid)
> >>>           */
> >>>          ovsrcu_quiesce_start();
> >>>          VLOG_INFO("vHost Device '%s' has been removed", ifname);
> >>> +        netdev_request_reconfigure(&dev->up);
> >>>      } else {
> >>>          VLOG_INFO("vHost Device '%s' not found", ifname);
> >>>      }
> >>> @@ -3259,6 +3344,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
> >>>              /* Carrier status may need updating. */
> >>>              netdev_change_seq_changed(&dev->up);
> >>>          }
> >>> +    } else {
> >>> +        vhost_check_zero_copy_status(dev);
> >>>      }
> >>>
> >>>      return 0;
> >>> @@ -3421,7 +3508,7 @@ static const struct netdev_class dpdk_vhost_class =
> >>>      NULL,
> >>>      netdev_dpdk_vhost_construct,
> >>>      netdev_dpdk_vhost_destruct,
> >>> -    NULL,
> >>> +    netdev_dpdk_vhost_set_config,
> >>>      NULL,
> >>>      netdev_dpdk_vhost_send,
> >>>      netdev_dpdk_vhost_get_carrier,
> >>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >>> index 4c317d0..e8409b4 100644
> >>> --- a/vswitchd/vswitch.xml
> >>> +++ b/vswitchd/vswitch.xml
> >>> @@ -2669,6 +2669,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
> >>>        </p>
> >>>      </column>
> >>>
> >>> +    <column name="options" key="dq-zero-copy"
> >>> +            type='{"type": "boolean"}'>
> >>> +      <p>
> >>> +        The value specifies whether or not to enable dequeue zero copy on
> >>> +        the given interface.
> >>> +        The port must be in an inactive state in order to enable or disable
> >>> +        this feature.
> >>> +        Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
> >>> +      </p>
> >>> +    </column>
> >>> +
> >>>      <column name="options" key="n_rxq_desc"
> >>>              type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
> >>>        <p>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
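The reconfigure gating in the patch's dpdk_vhost_set_config_helper() can be exercised in isolation: flip the zero-copy bit only when the request differs from the current flags, and request a reconfigure only while the device is down. A minimal standalone sketch follows; the flag value and the struct are stand-ins (the real definitions live in rte_vhost.h and netdev-dpdk.c), and vid == -1 stands for "no virtio device attached":

```c
#include <stdbool.h>
#include <stdint.h>

#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2)  /* stand-in value */

/* Trimmed stand-in for struct netdev_dpdk. */
struct fake_dev {
    uint64_t vhost_driver_flags;
    int vid;                       /* -1 while the device is down */
};

/* Mirrors dpdk_vhost_set_config_helper(): update the flag if the requested
 * state differs from the current one, and return whether a reconfigure
 * should be requested (only when the device is down). */
static bool
set_dq_zero_copy(struct fake_dev *dev, bool zc_requested)
{
    bool zc_current = dev->vhost_driver_flags
                      & RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
    bool needs_reconfigure = false;

    if (zc_requested && !zc_current) {
        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
        needs_reconfigure = true;
    } else if (!zc_requested && zc_current) {
        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
        needs_reconfigure = true;
    }

    /* Only try to change ZC mode when the device is down. */
    return needs_reconfigure && dev->vid == -1;
}
```

Note one consequence visible in the sketch: when the change is requested while a device is active, the flag still flips but no reconfigure fires; the later reconfigure (triggered from destroy_device() in the patch) then picks up the new state, which is what allows enabling the feature by rebooting the VM.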