The outer UDP checksum of TSO packets can be computed in software regardless of how segmentation is done (SW or HW). This property comes from the nested checksums: a valid inner L4 checksum cancels the contribution of the inner L4 payload to the outer checksum.
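The cancellation can be illustrated with a small standalone sketch. This is not the code from the series: every function name below is hypothetical and the packet bytes are made up. It assumes the inner L4 checksum is already valid and that the inner L4 header starts at an even offset inside the outer UDP datagram. Under those assumptions the inner L4 segment sums to the ones' complement negation of the inner pseudo-header sum, so the outer checksum can be derived from headers alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds 'len' bytes as big-endian 16-bit words to a ones' complement sum.
 * An odd trailing byte is zero padded (only valid for the last chunk). */
static uint32_t
csum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) buf[i] << 8) | buf[i + 1];
    }
    if (i < len) {
        sum += (uint32_t) buf[i] << 8;
    }
    return sum;
}

/* Folds the carries back until the sum fits in 16 bits. */
static uint16_t
csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Writes a valid checksum at 'off' in the inner L4 segment. */
static void
inner_csum_fill(const uint8_t *pseudo, size_t pseudo_len,
                uint8_t *l4, size_t l4_len, size_t off)
{
    uint16_t csum;

    assert(off % 2 == 0 && off + 1 < l4_len);
    l4[off] = l4[off + 1] = 0;
    csum = (uint16_t) ~csum_fold(csum_add(csum_add(0, pseudo, pseudo_len),
                                          l4, l4_len));
    l4[off] = csum >> 8;
    l4[off + 1] = csum & 0xff;
}

/* Reference: sums every byte covered by the outer UDP checksum.  'hdrs' is
 * the outer UDP header (checksum zeroed), the tunnel header and the inner
 * L2/L3 headers; its length must be even in this sketch.  A real UDP sender
 * would also transmit a computed 0 as 0xffff. */
static uint16_t
outer_csum_full(const uint8_t *opseudo, size_t opseudo_len,
                const uint8_t *hdrs, size_t hdrs_len,
                const uint8_t *inner_l4, size_t inner_l4_len)
{
    uint32_t sum = csum_add(0, opseudo, opseudo_len);

    sum = csum_add(sum, hdrs, hdrs_len);
    sum = csum_add(sum, inner_l4, inner_l4_len);
    return (uint16_t) ~csum_fold(sum);
}

/* Shortcut: replaces the inner L4 segment with the complement of the inner
 * pseudo-header sum, so the inner payload is never read. */
static uint16_t
outer_csum_nested(const uint8_t *opseudo, size_t opseudo_len,
                  const uint8_t *hdrs, size_t hdrs_len,
                  const uint8_t *ipseudo, size_t ipseudo_len)
{
    uint32_t sum = csum_add(0, opseudo, opseudo_len);

    sum = csum_add(sum, hdrs, hdrs_len);
    sum += (uint16_t) ~csum_fold(csum_add(0, ipseudo, ipseudo_len));
    return (uint16_t) ~csum_fold(sum);
}

/* Self-check on made-up bytes: returns 1 if both methods agree before and
 * after the inner payload is modified. */
static int
nested_csum_demo(void)
{
    const uint8_t opseudo[] = { 10, 0, 0, 1, 10, 0, 0, 2, 0, 17, 0, 74 };
    const uint8_t ipseudo[] = { 192, 168, 0, 1, 192, 168, 0, 2, 0, 6, 0, 24 };
    const uint8_t hdrs[50] = { 0x12, 0x34, 0x12, 0xb5, 0, 74, 0, 0, 0x08 };
    uint8_t l4[24] = { 0x80, 0x00, 0x00, 0x50 }; /* 20B header + 4B data. */
    uint16_t nested, full;

    inner_csum_fill(ipseudo, sizeof ipseudo, l4, sizeof l4, 16);
    nested = outer_csum_nested(opseudo, sizeof opseudo, hdrs, sizeof hdrs,
                               ipseudo, sizeof ipseudo);
    full = outer_csum_full(opseudo, sizeof opseudo, hdrs, sizeof hdrs,
                           l4, sizeof l4);
    if (full != nested) {
        return 0;
    }
    l4[20] ^= 0x5a;                 /* Different payload, same length. */
    l4[23] ^= 0xff;
    inner_csum_fill(ipseudo, sizeof ipseudo, l4, sizeof l4, 16);
    full = outer_csum_full(opseudo, sizeof opseudo, hdrs, sizeof hdrs,
                           l4, sizeof l4);
    return full == nested;
}
```

In this sketch, nested_csum_demo() changes the inner payload between the two comparisons while the shortcut result stays the same, which is why the outer checksum can be prepared without knowing how (or where) the payload will later be segmented.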
Performance was evaluated using iperf between two virtual machines hosted by two RHEL 9 hypervisors running OVS-DPDK. The main limiting factor is the receiving endpoint, as the receiving virtual machine seems too slow to dequeue packets coming from the hypervisor (tx retries/tx drops seen on the vhost-user port).

A word of warning: these numbers should be taken simply as an indication of the improvement and not as absolute numbers:
- the VMs and OVS-DPDK were running on NUMA 0 while the E810 NIC is placed in NUMA 1,
- the random drops on the receiving side mean that there is some variance in the numbers between runs.

Main branch:
Switching to 0000:3b:00.0 (mlx5_core)
IPv4/IPv4 [  5]   0.00-1.00   sec   767 MBytes  6.43 Gbits/sec    39  2.10 MBytes
IPv4/IPv6 [  5]   0.00-1.00   sec   768 MBytes  6.44 Gbits/sec     0  2.86 MBytes
IPv6/IPv4 [  5]   0.00-1.00   sec   771 MBytes  6.46 Gbits/sec     0  2.56 MBytes
IPv6/IPv6 [  5]   0.00-1.00   sec   639 MBytes  5.36 Gbits/sec   416  1.77 MBytes
Switching to 0000:5e:00.0 (i40e)
IPv4/IPv4 [  5]   0.00-1.00   sec   583 MBytes  4.89 Gbits/sec   990   228 KBytes
IPv4/IPv6 [  5]   0.00-1.00   sec   646 MBytes  5.42 Gbits/sec     0  2.12 MBytes
IPv6/IPv4 [  5]   0.00-1.00   sec   633 MBytes  5.31 Gbits/sec   465   538 KBytes
IPv6/IPv6 [  5]   0.00-1.00   sec   657 MBytes  5.51 Gbits/sec     0  2.14 MBytes
Switching to 0000:d8:00.0 (ice)
IPv4/IPv4 [  5]   0.00-1.00   sec  1.27 GBytes  10.9 Gbits/sec    64  1.52 MBytes
IPv4/IPv6 [  5]   0.00-1.00   sec  1.16 GBytes  9.93 Gbits/sec   110  2.30 MBytes
IPv6/IPv4 [  5]   0.00-1.00   sec  1.17 GBytes  10.1 Gbits/sec   112  2.35 MBytes
IPv6/IPv6 [  5]   0.00-1.00   sec  1.14 GBytes  9.79 Gbits/sec    94  1.75 MBytes

After series:
Switching to 0000:3b:00.0 (mlx5_core)
IPv4/IPv4 [  5]   0.00-1.00   sec  1.10 GBytes  9.49 Gbits/sec   493  1.14 MBytes
IPv4/IPv6 [  5]   0.00-1.00   sec  1.08 GBytes  9.30 Gbits/sec   162  1.77 MBytes
IPv6/IPv4 [  5]   0.00-1.00   sec  1.10 GBytes  9.43 Gbits/sec  1354   642 KBytes
IPv6/IPv6 [  5]   0.00-1.00   sec  1.10 GBytes  9.44 Gbits/sec   103  1.36 MBytes
Switching to 0000:5e:00.0 (i40e)
IPv4/IPv4 [  5]   0.00-1.00   sec  1.13 GBytes  9.71 Gbits/sec   516  1.49 MBytes
IPv4/IPv6 [  5]   0.00-1.00   sec  1.29 GBytes  11.0 Gbits/sec     7  2.54 MBytes
IPv6/IPv4 [  5]   0.00-1.00   sec  1.28 GBytes  11.0 Gbits/sec  1220   920 KBytes
IPv6/IPv6 [  5]   0.00-1.00   sec  1.32 GBytes  11.3 Gbits/sec   116  1.89 MBytes
Switching to 0000:d8:00.0 (ice)
IPv4/IPv4 [  5]   0.00-1.00   sec  1.23 GBytes  10.6 Gbits/sec  1867   400 KBytes
IPv4/IPv6 [  5]   0.00-1.00   sec  1.17 GBytes  10.0 Gbits/sec   113  2.28 MBytes
IPv6/IPv4 [  5]   0.00-1.00   sec  1.22 GBytes  10.5 Gbits/sec   102  2.33 MBytes
IPv6/IPv6 [  5]   0.00-1.00   sec  1.15 GBytes  9.86 Gbits/sec   109  2.29 MBytes

-- 
David Marchand

David Marchand (8):
  netdev-dpdk: Dump checksum offloads flags for debug.
  netdev-dpdk: Fix rx queue fill level with QoS.
  netdev-dpdk: Enforce mono-segment mbufs.
  netdev-dpdk: Fix TSO packet length check for tunnels.
  dp-packet-gso: Request UDP checksum when needed.
  dp-packet: Optimize outer checksum for nested checksums.
  dp-packet-gso: Refactor software segmentation code.
  netdev: Use HW segmentation without outer UDP checksum.

 lib/dp-packet-gso.c | 296 ++++++++++++++++++++++++++------------------
 lib/dp-packet-gso.h |   4 +-
 lib/dp-packet.c     |  20 +--
 lib/dp-packet.h     |  12 ++
 lib/netdev-dpdk.c   | 129 +++++++++----------
 lib/netdev.c        |  36 +++---
 lib/packets.c       | 117 +++++++++++++++--
 lib/packets.h       |   1 +
 8 files changed, 392 insertions(+), 223 deletions(-)

-- 
2.51.0
