TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP
network stack to delegate segmentation of a TCP segment to the hardware
NIC, thus saving compute resources. This may improve performance
significantly for TCP workload in virtualized environments.

While a previous commit already added the necesary logic to netdev-dpdk
to deal with packets marked for TSO, this set of changes enables TSO by
default when using multi-segment mbufs.

Thus, to enable TSO on the physical DPDK interfaces, only the following
command needs to be issued before starting OvS:
    ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Mark Kavanagh <[email protected]>

Signed-off-by: Mark Kavanagh <[email protected]>
Signed-off-by: Tiago Lam <[email protected]>
---
 Documentation/automake.mk           |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/tso.rst   | 111 ++++++++++++++++++++++++++++++++++++
 NEWS                                |   1 +
 lib/netdev-dpdk.c                   |  63 +++++++++++++++++---
 5 files changed, 168 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/topics/dpdk/tso.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 082438e..a20deb8 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -39,6 +39,7 @@ DOC_SOURCE = \
        Documentation/topics/dpdk/index.rst \
        Documentation/topics/dpdk/bridge.rst \
        Documentation/topics/dpdk/jumbo-frames.rst \
+       Documentation/topics/dpdk/tso.rst \
        Documentation/topics/dpdk/memory.rst \
        Documentation/topics/dpdk/pdump.rst \
        Documentation/topics/dpdk/phy.rst \
diff --git a/Documentation/topics/dpdk/index.rst 
b/Documentation/topics/dpdk/index.rst
index cf24a7b..eb2a04d 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -40,4 +40,5 @@ The DPDK Datapath
    /topics/dpdk/qos
    /topics/dpdk/pdump
    /topics/dpdk/jumbo-frames
+   /topics/dpdk/tso
    /topics/dpdk/memory
diff --git a/Documentation/topics/dpdk/tso.rst 
b/Documentation/topics/dpdk/tso.rst
new file mode 100644
index 0000000..503354f
--- /dev/null
+++ b/Documentation/topics/dpdk/tso.rst
@@ -0,0 +1,111 @@
+..
+      Copyright 2018, Red Hat, Inc.
+
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===
+TSO
+===
+
+.. versionadded:: 2.11.0
+
+TCP Segmentation Offload (TSO) is a mechanism which allows a TCP/IP stack to
+offload the TCP segmentation into hardware, thus saving the cycles that would
+be required to perform this same segmentation in Software.
+
+TCP Segmentation Offload (TSO) enables a network stack to delegate segmentation
+of an oversized TCP segment to the underlying physical NIC. Offload of frame
+segmentation achieves computational savings in the core, freeing up CPU cycles
+for more useful work.
+
+A common use case for TSO is when using virtualization, where traffic that's
+coming in from a VM can offload the TCP segmentation, thus avoiding the
+fragmentation in Software. Additionally, if the traffic is headed to a VM
+within the same host further optimization can be expected. As the traffic never
+leaves the machine, no MTU needs to be accounted for, and thus no segmentation
+and checksum calculations are required, which saves yet more cycles. Only when
+the traffic actually leaves the host the segmentation needs to happen, in which
+case it will be performed by the egress NIC.
+
+When using TSO with DPDK, the implementation relies on the multi-segment mbufs
+feature, described in :doc:`/topics/dpdk/jumbo-frames`, where each mbuf
+contains ~2KiB of the entire packet's data and is linked to the next mbuf that
+contains the next portion of data.
+
+Enabling TSO
+~~~~~~~~~~~~
+Once multi-segment mbufs is enabled, TSO will be enabled by default, if there's
+support for it in the underlying physical NICs attached to OvS-DPDK.
+
+When using :doc:`vHost User ports <vhost-user>`, TSO may be enabled in one of
+two ways, as follows.
+
+`TSO` is enabled in OvS by the DPDK vHost User backend; when a new guest
+connection is established, `TSO` is thus advertised to the guest as an
+available feature:
+
+  1. QEMU Command Line Parameter:
+     ```
+     sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
+     ...
+     -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,\
+     mrg_rxbuf=on,csum=on,gso=on,guest_csum=on,guest_tso4=on,\
+     guest_tso6=on,guest_ecn=on \
+     ...
+     ```
+
+  2. ethtool
+
+Assuming that the guest's OS also supports `TSO`, ethtool can be used to enable
+same:
+
+    ```
+    ethtool -K eth0 sg on     # scatter-gather is a prerequisite for TSO
+    ethtool -K eth0 tso on
+    ethtool -k eth0
+    ```
+
+    <b>Note:</b> In both methods, `mergeable buffers` are required:
+    ```
+    sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
+    ...
+    mrg_rxbuf=on,\
+    ...
+    ```
+
+To enable TSO in a guest, the underlying NIC must first support `TSO` - consult
+your controller's datasheet for compatibility. Secondly, the NIC must have an
+associated DPDK Poll Mode Driver (PMD) which supports `TSO`.
+
+~~~~~~~~~~~
+Limitations
+~~~~~~~~~~~
+The current OvS `TSO` implementation supports flat and VLAN networks only (i.e.
+no support for `TSO` over tunneled connection [VxLAN, GRE, IPinIP, etc.]).
+
+Also, as TSO is built on top of multi-segments mbufs, the constraints pointed
+out in :doc:`/topics/dpdk/jumbo-frames` also apply for TSO. Thus, some
+performance hits might be noticed when running specific functionalitry, like
+the Userspace Connection tracker. And as mentioned in the same section, it is
+paramount that a packet's headers is contained within the first mbuf (~2KiB in
+size).
diff --git a/NEWS b/NEWS
index 98f5a9b..c7cfd17 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,7 @@ Post-v2.10.0
      * Add option for simple round-robin based Rxq to PMD assignment.
        It can be set with pmd-rxq-assign.
      * Add support for DPDK 18.11
+     * Add support for TSO (between DPDK interfaces only).
    - Add 'symmetric_l3' hash function.
    - OVS now honors 'updelay' and 'downdelay' for bonds with LACP configured.
    - ovs-vswitchd:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ad7223a..7af37ee 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -374,7 +374,8 @@ struct ingress_policer {
 enum dpdk_hw_ol_features {
     NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
     NETDEV_RX_HW_CRC_STRIP = 1 << 1,
-    NETDEV_RX_HW_SCATTER = 1 << 2
+    NETDEV_RX_HW_SCATTER = 1 << 2,
+    NETDEV_TX_TSO_OFFLOAD = 1 << 3,
 };
 
 /*
@@ -1005,19 +1006,26 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int 
n_rxq, int n_txq)
     /* Multi-segment-mbuf-specific setup. */
     if (dpdk_multi_segment_mbufs) {
         if (info.tx_offload_capa & DEV_TX_OFFLOAD_MULTI_SEGS) {
-            /* Enable multi-seg mbufs. DPDK PMDs typically attempt to use
-             * simple or vectorized transmit functions, neither of which are
-             * compatible with multi-segment mbufs. */
+            /* DPDK PMDs typically attempt to use simple or vectorized
+             * transmit functions, neither of which are compatible with
+             * multi-segment mbufs. Ensure that these are disabled when
+             * multi-segment mbufs are enabled.
+             */
             conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
-        } else {
-            VLOG_WARN("Interface %s doesn't support multi-segment mbufs",
-                      dev->up.name);
-            conf.txmode.offloads &= ~DEV_TX_OFFLOAD_MULTI_SEGS;
+        }
+
+        if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
+            conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_TSO;
+            conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_CKSUM;
+            conf.txmode.offloads |= DEV_TX_OFFLOAD_IPV4_CKSUM;
         }
 
         txconf = info.default_txconf;
-        //txconf.txq_flags = ETH_TXQ_FLAGS_IGNORE;
         txconf.offloads = conf.txmode.offloads;
+    } else if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
+        dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+        VLOG_WARN("Failed to set Tx TSO offload in %s. Requires option "
+                  "`dpdk-multi-seg-mbufs` to be enabled.", dev->up.name);
     }
 
     conf.intr_conf.lsc = dev->lsc_interrupt_mode;
@@ -1134,6 +1142,9 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
     uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
                                      DEV_RX_OFFLOAD_TCP_CKSUM |
                                      DEV_RX_OFFLOAD_IPV4_CKSUM;
+    uint32_t tx_tso_offload_capa = DEV_TX_OFFLOAD_TCP_TSO |
+                                   DEV_TX_OFFLOAD_TCP_CKSUM |
+                                   DEV_TX_OFFLOAD_IPV4_CKSUM;
 
     rte_eth_dev_info_get(dev->port_id, &info);
 
@@ -1160,6 +1171,18 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
         dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER;
     }
 
+    if (dpdk_multi_segment_mbufs) {
+        if (info.tx_offload_capa & tx_tso_offload_capa) {
+            dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD;
+        } else {
+            dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+            VLOG_WARN("Tx TSO offload is not supported on port "
+                      DPDK_PORT_ID_FMT, dev->port_id);
+        }
+    } else {
+        dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+    }
+
     n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
     n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
 
@@ -1684,6 +1707,11 @@ netdev_dpdk_get_config(const struct netdev *netdev, 
struct smap *args)
         } else {
             smap_add(args, "rx_csum_offload", "false");
         }
+        if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
+            smap_add(args, "tx_tso_offload", "true");
+        } else {
+            smap_add(args, "tx_tso_offload", "false");
+        }
         smap_add(args, "lsc_interrupt_mode",
                  dev->lsc_interrupt_mode ? "true" : "false");
     }
@@ -2372,6 +2400,15 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, 
struct rte_mbuf **pkts,
     for (i = 0; i < pkt_cnt; i++) {
         pkt = pkts[i];
 
+        /* Drop TSO packet if there's no TSO support on egress port */
+        if ((pkt->ol_flags & PKT_TX_TCP_SEG) &&
+            !(dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD)) {
+            VLOG_WARN_RL(&rl, "%s: TSO is disabled on port, TSO packet dropped"
+                        "%" PRIu32 " ", dev->up.name, pkt->pkt_len);
+            rte_pktmbuf_free(pkt);
+            continue;
+        }
+
         if (OVS_UNLIKELY(pkt->pkt_len > dev->max_packet_len)) {
             if (!(pkt->ol_flags & PKT_TX_TCP_SEG)) {
                 VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " "
@@ -4245,6 +4282,14 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
         dev->tx_q[0].map = 0;
     }
 
+    if (dpdk_multi_segment_mbufs) {
+        dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD;
+
+        VLOG_DBG("%s: TSO enabled on vhost port", dev->up.name);
+    } else {
+        dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+    }
+
     netdev_dpdk_remap_txqs(dev);
 
     err = netdev_dpdk_mempool_configure(dev);
-- 
2.7.4

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to