On 11/01/2019 16:02, Ian Stokes wrote: > On 1/10/2019 4:58 PM, Tiago Lam wrote: >> TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP >> network stack to delegate segmentation of a TCP segment to the hardware >> NIC, thus saving compute resources. This may improve performance >> significantly for TCP workload in virtualized environments. >> >> While a previous commit already added the necesary logic to netdev-dpdk >> to deal with packets marked for TSO, this set of changes enables TSO by >> default when using multi-segment mbufs. >> >> Thus, to enable TSO on the physical DPDK interfaces, only the following >> command needs to be issued before starting OvS: >> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true > > A general question (and not a blocker by any means), is it more > intuitive to change above to be 'other_config:dpdk-tso=true' or > something similar? > > I'm thinking from a user perspective, they would be familiar with the > concept of TSO already but not necessarily multi segments. It seems to > be a technical per-requisite for DPDK (and OVS DPDK) under the hood. > > But again multi segments will be needed for other offloads besides TSO > so maybe it makes more sense to stick with the argument you have already.
That was my first thought, and the main reason why I went with `dpdk-multi-seg-mbufs`. But I'm open to suggestions, as mentioned in the first RFC of the series. If people think it's better to go with `dpdk-tso` for now, I'd gladly change. Thanks for the comments. I've replied to some in-line, but assume I'll deal with ones I haven't comment as well. > >> >> Co-authored-by: Mark Kavanagh <[email protected]> >> >> Signed-off-by: Mark Kavanagh <[email protected]> >> Signed-off-by: Tiago Lam <[email protected]> >> --- >> Documentation/automake.mk | 1 + >> Documentation/topics/dpdk/index.rst | 1 + >> Documentation/topics/dpdk/tso.rst | 111 >> ++++++++++++++++++++++++++++++++++++ >> NEWS | 1 + >> lib/netdev-dpdk.c | 63 +++++++++++++++++--- >> 5 files changed, 168 insertions(+), 9 deletions(-) >> create mode 100644 Documentation/topics/dpdk/tso.rst >> >> diff --git a/Documentation/automake.mk b/Documentation/automake.mk >> index 082438e..a20deb8 100644 >> --- a/Documentation/automake.mk >> +++ b/Documentation/automake.mk >> @@ -39,6 +39,7 @@ DOC_SOURCE = \ >> Documentation/topics/dpdk/index.rst \ >> Documentation/topics/dpdk/bridge.rst \ >> Documentation/topics/dpdk/jumbo-frames.rst \ >> + Documentation/topics/dpdk/tso.rst \ >> Documentation/topics/dpdk/memory.rst \ >> Documentation/topics/dpdk/pdump.rst \ >> Documentation/topics/dpdk/phy.rst \ >> diff --git a/Documentation/topics/dpdk/index.rst >> b/Documentation/topics/dpdk/index.rst >> index cf24a7b..eb2a04d 100644 >> --- a/Documentation/topics/dpdk/index.rst >> +++ b/Documentation/topics/dpdk/index.rst >> @@ -40,4 +40,5 @@ The DPDK Datapath >> /topics/dpdk/qos >> /topics/dpdk/pdump >> /topics/dpdk/jumbo-frames >> + /topics/dpdk/tso >> /topics/dpdk/memory >> diff --git a/Documentation/topics/dpdk/tso.rst >> b/Documentation/topics/dpdk/tso.rst >> new file mode 100644 >> index 0000000..503354f >> --- /dev/null >> +++ b/Documentation/topics/dpdk/tso.rst >> @@ -0,0 +1,111 @@ >> +.. >> + Copyright 2018, Red Hat, Inc. > > Probably Intel in this case rather than RH. :) > >> + >> + Licensed under the Apache License, Version 2.0 (the "License"); you >> may >> + not use this file except in compliance with the License. You may >> obtain >> + a copy of the License at >> + >> + http://www.apache.org/licenses/LICENSE-2.0 >> + >> + Unless required by applicable law or agreed to in writing, software >> + distributed under the License is distributed on an "AS IS" BASIS, >> WITHOUT >> + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See >> the >> + License for the specific language governing permissions and >> limitations >> + under the License. >> + >> + Convention for heading levels in Open vSwitch documentation: >> + >> + ======= Heading 0 (reserved for the title in a document) >> + ------- Heading 1 >> + ~~~~~~~ Heading 2 >> + +++++++ Heading 3 >> + ''''''' Heading 4 >> + >> + Avoid deeper levels because they do not render well. >> + >> +=== >> +TSO >> +=== > You should probably add a notice that this feature is (experimental). > >> + >> +.. versionadded:: 2.11.0 > It's good practice to add the version added but traditionally we've done > this after the fact once the release is out. Other wise you can have > commits on master referring to a release that is not officially out yet. > > This has come up before and in that case we waited for the release > before adding. >> + >> +TCP Segmentation Offload (TSO) is a mechanism which allows a TCP/IP stack to >> +offload the TCP segmentation into hardware, thus saving the cycles that >> would >> +be required to perform this same segmentation in Software. > Minor: Is software capitalized for a reason above and throughout the > text below? I would have thought lower case would be fine. > >> + >> +TCP Segmentation Offload (TSO) enables a network stack to delegate >> segmentation >> +of an oversized TCP segment to the underlying physical NIC. Offload of frame >> +segmentation achieves computational savings in the core, freeing up CPU >> cycles >> +for more useful work. >> + >> +A common use case for TSO is when using virtualization, where traffic that's >> +coming in from a VM can offload the TCP segmentation, thus avoiding the >> +fragmentation in Software. Additionally, if the traffic is headed to a VM >> +within the same host further optimization can be expected. As the traffic >> never >> +leaves the machine, no MTU needs to be accounted for, and thus no >> segmentation >> +and checksum calculations are required, which saves yet more cycles. Only >> when >> +the traffic actually leaves the host the segmentation needs to happen, in >> which >> +case it will be performed by the egress NIC. >> + >> +When using TSO with DPDK, the implementation relies on the multi-segment >> mbufs >> +feature, described in :doc:`/topics/dpdk/jumbo-frames`, where each mbuf >> +contains ~2KiB of the entire packet's data and is linked to the next mbuf >> that >> +contains the next portion of data. >> + >> +Enabling TSO >> +~~~~~~~~~~~~ >> +Once multi-segment mbufs is enabled, TSO will be enabled by default, if >> there's >> +support for it in the underlying physical NICs attached to OvS-DPDK. >> + > > Probably worth calling out above with an Important: tag. > >> +When using :doc:`vHost User ports <vhost-user>`, TSO may be enabled in one >> of >> +two ways, as follows. >> + >> +`TSO` is enabled in OvS by the DPDK vHost User backend; when a new guest >> +connection is established, `TSO` is thus advertised to the guest as an >> +available feature: >> + >> + 1. QEMU Command Line Parameter: >> + ``` >> + sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \ >> + ... >> + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,\ >> + mrg_rxbuf=on,csum=on,gso=on,guest_csum=on,guest_tso4=on,\ >> + guest_tso6=on,guest_ecn=on \ >> + ... >> + ``` > > Your passing gso above as well as tso, is that required? I wasn't sure > myself. Good point. I've cleaned this example to include the bare minimum; GSO is not required here. > >> + >> + 2. ethtool >> + >> +Assuming that the guest's OS also supports `TSO`, ethtool can be used to >> enable >> +same: >> + >> + ``` >> + ethtool -K eth0 sg on # scatter-gather is a prerequisite for TSO >> + ethtool -K eth0 tso on >> + ethtool -k eth0 >> + ``` >> + >> + <b>Note:</b> In both methods, `mergeable buffers` are required: > Would call out above as important rather than note. > >> + ``` >> + sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \ >> + ... >> + mrg_rxbuf=on,\ >> + ... >> + ``` >> + >> +To enable TSO in a guest, the underlying NIC must first support `TSO` - >> consult >> +your controller's datasheet for compatibility. Secondly, the NIC must have >> an >> +associated DPDK Poll Mode Driver (PMD) which supports `TSO`. >> + >> +~~~~~~~~~~~ >> +Limitations >> +~~~~~~~~~~~ >> +The current OvS `TSO` implementation supports flat and VLAN networks only >> (i.e. >> +no support for `TSO` over tunneled connection [VxLAN, GRE, IPinIP, etc.]). >> + >> +Also, as TSO is built on top of multi-segments mbufs, the constraints >> pointed >> +out in :doc:`/topics/dpdk/jumbo-frames` also apply for TSO. Thus, some >> +performance hits might be noticed when running specific functionalitry, like > Minor typo in 'functionality' above. > >> +the Userspace Connection tracker. And as mentioned in the same section, it >> is >> +paramount that a packet's headers is contained within the first mbuf (~2KiB >> in >> +size). >> diff --git a/NEWS b/NEWS >> index 98f5a9b..c7cfd17 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -23,6 +23,7 @@ Post-v2.10.0 >> * Add option for simple round-robin based Rxq to PMD assignment. >> It can be set with pmd-rxq-assign. >> * Add support for DPDK 18.11 >> + * Add support for TSO (between DPDK interfaces only). > > can you add the experimental tag next to this (there's a few examples in > the news doc). > >> - Add 'symmetric_l3' hash function. >> - OVS now honors 'updelay' and 'downdelay' for bonds with LACP >> configured. >> - ovs-vswitchd: >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> index ad7223a..7af37ee 100644 >> --- a/lib/netdev-dpdk.c >> +++ b/lib/netdev-dpdk.c >> @@ -374,7 +374,8 @@ struct ingress_policer { >> enum dpdk_hw_ol_features { >> NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, >> NETDEV_RX_HW_CRC_STRIP = 1 << 1, >> - NETDEV_RX_HW_SCATTER = 1 << 2 >> + NETDEV_RX_HW_SCATTER = 1 << 2, >> + NETDEV_TX_TSO_OFFLOAD = 1 << 3, >> }; >> >> /* >> @@ -1005,19 +1006,26 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, >> int n_rxq, int n_txq) >> /* Multi-segment-mbuf-specific setup. */ >> if (dpdk_multi_segment_mbufs) { >> if (info.tx_offload_capa & DEV_TX_OFFLOAD_MULTI_SEGS) { >> - /* Enable multi-seg mbufs. DPDK PMDs typically attempt to use >> - * simple or vectorized transmit functions, neither of which are >> - * compatible with multi-segment mbufs. */ >> + /* DPDK PMDs typically attempt to use simple or vectorized >> + * transmit functions, neither of which are compatible with >> + * multi-segment mbufs. Ensure that these are disabled when >> + * multi-segment mbufs are enabled. >> + */ > I would think this comment should be added when the conf.txmode.offloads > |= DEV_TX_OFFLOAD_MULTI_SEGS; cod below was introduced rather than at > this stage? >> conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS; >> - } else { >> - VLOG_WARN("Interface %s doesn't support multi-segment mbufs", >> - dev->up.name); >> - conf.txmode.offloads &= ~DEV_TX_OFFLOAD_MULTI_SEGS; > Although it's removed here, I believe we discussed that rather than a > warning the device should fail to configure if MSEG is enabled but the > device doesn't support? This seems to have been something I missed while rebasing TSO on top of the latest multi-segs. I've fixed it now. >> + } >> + >> + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { >> + conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_TSO; >> + conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_CKSUM; >> + conf.txmode.offloads |= DEV_TX_OFFLOAD_IPV4_CKSUM; >> } >> >> txconf = info.default_txconf; >> - //txconf.txq_flags = ETH_TXQ_FLAGS_IGNORE; > I guess this is left over from previous dev work you had, should be > removed from this patch for the next revision. > >> txconf.offloads = conf.txmode.offloads; >> + } else if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { >> + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; >> + VLOG_WARN("Failed to set Tx TSO offload in %s. Requires option " >> + "`dpdk-multi-seg-mbufs` to be enabled.", dev->up.name); >> } >> >> conf.intr_conf.lsc = dev->lsc_interrupt_mode; >> @@ -1134,6 +1142,9 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) >> uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM | >> DEV_RX_OFFLOAD_TCP_CKSUM | >> DEV_RX_OFFLOAD_IPV4_CKSUM; >> + uint32_t tx_tso_offload_capa = DEV_TX_OFFLOAD_TCP_TSO | >> + DEV_TX_OFFLOAD_TCP_CKSUM | >> + DEV_TX_OFFLOAD_IPV4_CKSUM; >> >> rte_eth_dev_info_get(dev->port_id, &info); >> >> @@ -1160,6 +1171,18 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) >> dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER; >> } >> >> + if (dpdk_multi_segment_mbufs) { >> + if (info.tx_offload_capa & tx_tso_offload_capa) { >> + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; >> + } else { >> + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; >> + VLOG_WARN("Tx TSO offload is not supported on port " >> + DPDK_PORT_ID_FMT, dev->port_id); >> + } >> + } else { >> + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; >> + } >> + >> n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq); >> n_txq = MIN(info.max_tx_queues, dev->up.n_txq); >> >> @@ -1684,6 +1707,11 @@ netdev_dpdk_get_config(const struct netdev *netdev, >> struct smap *args) >> } else { >> smap_add(args, "rx_csum_offload", "false"); >> } >> + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { >> + smap_add(args, "tx_tso_offload", "true"); >> + } else { >> + smap_add(args, "tx_tso_offload", "false"); >> + } >> smap_add(args, "lsc_interrupt_mode", >> dev->lsc_interrupt_mode ? "true" : "false"); >> } >> @@ -2372,6 +2400,15 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk >> *dev, struct rte_mbuf **pkts, >> for (i = 0; i < pkt_cnt; i++) { >> pkt = pkts[i]; >> >> + /* Drop TSO packet if there's no TSO support on egress port */ >> + if ((pkt->ol_flags & PKT_TX_TCP_SEG) && >> + !(dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD)) { >> + VLOG_WARN_RL(&rl, "%s: TSO is disabled on port, TSO packet >> dropped" >> + "%" PRIu32 " ", dev->up.name, pkt->pkt_len); >> + rte_pktmbuf_free(pkt); >> + continue; >> + } > Hmmm, have we changed the nature of the function now. where as before it > was just checking packet length now we inspect for offload type as well. > > It could be better to rename the function 'netdev_dpdk_filter_packet' > and add an explanation in the comment preceding the function WRT what > filters are applied (length and TSO offload support). Good point. I've renamed the function to the above and added a function comment to make it clear. > > Ian >> + >> if (OVS_UNLIKELY(pkt->pkt_len > dev->max_packet_len)) { >> if (!(pkt->ol_flags & PKT_TX_TCP_SEG)) { >> VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " " >> @@ -4245,6 +4282,14 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) >> dev->tx_q[0].map = 0; >> } >> >> + if (dpdk_multi_segment_mbufs) { >> + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; >> + >> + VLOG_DBG("%s: TSO enabled on vhost port", dev->up.name); >> + } else { >> + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; >> + } >> + >> netdev_dpdk_remap_txqs(dev); >> >> err = netdev_dpdk_mempool_configure(dev); >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
