>
> Currently, the dpdk-vhost side in OVS doesn't support TCP/UDP tx cksum,
> so L4 packets' checksums are calculated on the VM side, which performs
> poorly.
> Implementing TCP/UDP tx cksum on the ovs-dpdk side improves throughput in
> the VM->phy->phy->VM case. It also lets the virtio-net frontend driver
> support NETIF_F_SG (scatter-gather).
>
> Signed-off-by: Zhenyu Gao <[email protected]>
> ---
>
> Here are some performance numbers:
Hi Zhenyu,
Thanks for the code changes since v3.
I tested a VM to VM case using iperf and observed a performance degradation
when the tx cksum was offloaded to the host:
checksum in VM
0.0-30.0 sec 10.9 GBytes 3.12 Gbits/sec
0.0-30.0 sec 10.9 GBytes 3.11 Gbits/sec
0.0-30.0 sec 11.0 GBytes 3.16 Gbits/sec
checksum in ovs dpdk
0.0-30.0 sec 7.52 GBytes 2.15 Gbits/sec
0.0-30.0 sec 7.12 GBytes 2.04 Gbits/sec
0.0-30.0 sec 8.17 GBytes 2.34 Gbits/sec
I think for this feature to be enabled, performance needs to be roughly the
same or better for all use cases. For now, I think the gap here is too big.
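To put a rough number on that gap, here is a small sketch that averages the three iperf runs for each case and computes the relative drop. The helper names (mean, relative_drop_pct) and the array names are only illustrative and are not part of the patch; the samples are the Gbits/sec figures listed above.

```c
#include <stddef.h>

/* iperf throughput samples in Gbits/sec, copied from the runs above. */
static const double cksum_in_vm[]  = { 3.12, 3.11, 3.16 };
static const double cksum_in_ovs[] = { 2.15, 2.04, 2.34 };

/* Arithmetic mean of 'n' samples (hypothetical helper). */
static double
mean(const double *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return n ? sum / n : 0.0;
}

/* Relative drop of 'candidate' against 'baseline', in percent
 * (hypothetical helper). */
static double
relative_drop_pct(double baseline, double candidate)
{
    return (baseline - candidate) / baseline * 100.0;
}
```

Calling relative_drop_pct(mean(cksum_in_vm, 3), mean(cksum_in_ovs, 3)) gives a drop of roughly 30% for this VM to VM case.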
If you wish to reproduce:
1 host, 2 VMs each with 1 vhost port and flows set up to switch packets from
each vhost port to the other.
VM1:
ifconfig eth1 1.1.1.1/24 up
ethtool -K eth1 tx on/off
iperf -c 1.1.1.2 -i 1 -t 30
VM2:
ifconfig eth1 1.1.1.2/24 up
ethtool -K eth1 tx on/off
iperf -s -i 1
Thanks,
Ciara
>
> Setup:
>
>      qperf client
>      +---------+
>      |   VM    |
>      +---------+
>           |
>           |                       qperf server
>   +--------------+               +------------+
>   | vswitch+dpdk |               | bare-metal |
>   +--------------+               +------------+
>          |                             |
>          |                             |
>         pNic---------PhysicalSwitch-----
>
> do cksum in ovs-dpdk: Applied this patch and executed 'ethtool -K eth0 tx on'
> on the VM side.
> This offloads the cksum job to the ovs-dpdk side.
>
> do cksum in VM: Applied this patch and executed 'ethtool -K eth0 tx off' on
> the VM side.
> The VM calculates the cksum for tcp/udp packets.
>
> We can see a huge improvement in TCP throughput if we leverage ovs-dpdk
> cksum.
>
> [root@localhost ~]# qperf -t 10 -oo msg_size:1:64K:*2 host-qperf-server01
> tcp_bw tcp_lat udp_bw udp_lat
> do cksum in ovs-dpdk do cksum in VM without this patch
> tcp_bw:
> bw = 1.9 MB/sec bw = 1.92 MB/sec bw = 1.95 MB/sec
> tcp_bw:
> bw = 3.97 MB/sec bw = 3.99 MB/sec bw = 3.98 MB/sec
> tcp_bw:
> bw = 7.75 MB/sec bw = 7.79 MB/sec bw = 7.89 MB/sec
> tcp_bw:
> bw = 14.7 MB/sec bw = 14.7 MB/sec bw = 14.9 MB/sec
> tcp_bw:
> bw = 27.7 MB/sec bw = 27.4 MB/sec bw = 28 MB/sec
> tcp_bw:
> bw = 51.1 MB/sec bw = 51.3 MB/sec bw = 51.8 MB/sec
> tcp_bw:
> bw = 86.2 MB/sec bw = 84.4 MB/sec bw = 87.6 MB/sec
> tcp_bw:
> bw = 141 MB/sec bw = 142 MB/sec bw = 141 MB/sec
> tcp_bw:
> bw = 203 MB/sec bw = 201 MB/sec bw = 211 MB/sec
> tcp_bw:
> bw = 267 MB/sec bw = 250 MB/sec bw = 260 MB/sec
> tcp_bw:
> bw = 324 MB/sec bw = 295 MB/sec bw = 302 MB/sec
> tcp_bw:
> bw = 397 MB/sec bw = 363 MB/sec bw = 347 MB/sec
> tcp_bw:
> bw = 765 MB/sec bw = 510 MB/sec bw = 383 MB/sec
> tcp_bw:
> bw = 850 MB/sec bw = 710 MB/sec bw = 417 MB/sec
> tcp_bw:
> bw = 1.09 GB/sec bw = 860 MB/sec bw = 444 MB/sec
> tcp_bw:
> bw = 1.17 GB/sec bw = 979 MB/sec bw = 447 MB/sec
> tcp_bw:
> bw = 1.17 GB/sec bw = 1.07 GB/sec bw = 462 MB/sec
> tcp_lat:
> latency = 29.1 us latency = 28.7 us latency = 29.1 us
> tcp_lat:
> latency = 29 us latency = 28.8 us latency = 29 us
> tcp_lat:
> latency = 29 us latency = 28.8 us latency = 29 us
> tcp_lat:
> latency = 29 us latency = 28.9 us latency = 29 us
> tcp_lat:
> latency = 29.2 us latency = 28.9 us latency = 29.1 us
> tcp_lat:
> latency = 29.1 us latency = 29.1 us latency = 29.1 us
> tcp_lat:
> latency = 29.5 us latency = 29.5 us latency = 29.5 us
> tcp_lat:
> latency = 29.8 us latency = 29.8 us latency = 29.9 us
> tcp_lat:
> latency = 30.7 us latency = 30.7 us latency = 30.7 us
> tcp_lat:
> latency = 47.1 us latency = 46.2 us latency = 47.1 us
> tcp_lat:
> latency = 52.1 us latency = 52.3 us latency = 53.3 us
> tcp_lat:
> latency = 44 us latency = 43.8 us latency = 43.2 us
> tcp_lat:
> latency = 50 us latency = 46.6 us latency = 47.8 us
> tcp_lat:
> latency = 79.2 us latency = 77.9 us latency = 78.9 us
> tcp_lat:
> latency = 82.3 us latency = 81.7 us latency = 82.2 us
> tcp_lat:
> latency = 96.7 us latency = 90.8 us latency = 127 us
> tcp_lat:
> latency = 215 us latency = 177 us latency = 225 us
> udp_bw:
> send_bw = 422 KB/sec send_bw = 415 KB/sec send_bw = 405 KB/sec
> recv_bw = 402 KB/sec recv_bw = 404 KB/sec recv_bw = 403 KB/sec
> udp_bw:
> send_bw = 845 KB/sec send_bw = 835 KB/sec send_bw = 802 KB/sec
> recv_bw = 831 KB/sec recv_bw = 804 KB/sec recv_bw = 802 KB/sec
> udp_bw:
> send_bw = 1.69 MB/sec send_bw = 1.66 MB/sec send_bw = 1.62 MB/sec
> recv_bw = 1.45 MB/sec recv_bw = 1.63 MB/sec recv_bw = 1.6 MB/sec
> udp_bw:
> send_bw = 3.38 MB/sec send_bw = 3.33 MB/sec send_bw = 3.24 MB/sec
> recv_bw = 3.32 MB/sec recv_bw = 3.25 MB/sec recv_bw = 3.24 MB/sec
> udp_bw:
> send_bw = 6.76 MB/sec send_bw = 6.63 MB/sec send_bw = 6.47 MB/sec
> recv_bw = 6.42 MB/sec recv_bw = 5.59 MB/sec recv_bw = 6.45 MB/sec
> udp_bw:
> send_bw = 13.5 MB/sec send_bw = 13.3 MB/sec send_bw = 13 MB/sec
> recv_bw = 13.4 MB/sec recv_bw = 12.1 MB/sec recv_bw = 13 MB/sec
> udp_bw:
> send_bw = 27 MB/sec send_bw = 26.5 MB/sec send_bw = 25.9 MB/sec
> recv_bw = 26.4 MB/sec recv_bw = 21.5 MB/sec recv_bw = 25.9 MB/sec
> udp_bw:
> send_bw = 53.8 MB/sec send_bw = 52.9 MB/sec send_bw = 51.7 MB/sec
> recv_bw = 49.1 MB/sec recv_bw = 47.6 MB/sec recv_bw = 51.1 MB/sec
> udp_bw:
> send_bw = 108 MB/sec send_bw = 105 MB/sec send_bw = 102 MB/sec
> recv_bw = 91.1 MB/sec recv_bw = 101 MB/sec recv_bw = 100 MB/sec
> udp_bw:
> send_bw = 212 MB/sec send_bw = 208 MB/sec send_bw = 203 MB/sec
> recv_bw = 204 MB/sec recv_bw = 204 MB/sec recv_bw = 169 MB/sec
> udp_bw:
> send_bw = 414 MB/sec send_bw = 407 MB/sec send_bw = 398 MB/sec
> recv_bw = 403 MB/sec recv_bw = 312 MB/sec recv_bw = 343 MB/sec
> udp_bw:
> send_bw = 555 MB/sec send_bw = 561 MB/sec send_bw = 557 MB/sec
> recv_bw = 354 MB/sec recv_bw = 368 MB/sec recv_bw = 360 MB/sec
> udp_bw:
> send_bw = 877 MB/sec send_bw = 880 MB/sec send_bw = 868 MB/sec
> recv_bw = 551 MB/sec recv_bw = 542 MB/sec recv_bw = 562 MB/sec
> udp_bw:
> send_bw = 1.1 GB/sec send_bw = 1.08 GB/sec send_bw = 1.09 GB/sec
> recv_bw = 805 MB/sec recv_bw = 785 MB/sec recv_bw = 766 MB/sec
> udp_bw:
> send_bw = 1.21 GB/sec send_bw = 1.19 GB/sec send_bw = 1.22 GB/sec
> recv_bw = 899 MB/sec recv_bw = 715 MB/sec recv_bw = 700 MB/sec
> udp_bw:
> send_bw = 1.31 GB/sec send_bw = 1.31 GB/sec send_bw = 1.31 GB/sec
> recv_bw = 614 MB/sec recv_bw = 622 MB/sec recv_bw = 661 MB/sec
> udp_bw:
> send_bw = 0 bytes/sec send_bw = 0 bytes/sec send_bw = 0 bytes/sec
> recv_bw = 0 bytes/sec recv_bw = 0 bytes/sec recv_bw = 0 bytes/sec
> udp_lat:
> latency = 25.9 us latency = 26.5 us latency = 26.5 us
> udp_lat:
> latency = 26.3 us latency = 26.4 us latency = 26.5 us
> udp_lat:
> latency = 26 us latency = 26.4 us latency = 26.6 us
> udp_lat:
> latency = 26.1 us latency = 26.2 us latency = 26.4 us
> udp_lat:
> latency = 26.3 us latency = 26.5 us latency = 26.7 us
> udp_lat:
> latency = 26.3 us latency = 26.4 us latency = 26.5 us
> udp_lat:
> latency = 26.3 us latency = 26.7 us latency = 26.9 us
> udp_lat:
> latency = 27.1 us latency = 27.1 us latency = 27.2 us
> udp_lat:
> latency = 27.5 us latency = 27.8 us latency = 28.1 us
> udp_lat:
> latency = 28.7 us latency = 28.9 us latency = 29.1 us
> udp_lat:
> latency = 30.4 us latency = 30.5 us latency = 30.9 us
> udp_lat:
> latency = 41.2 us latency = 41.3 us latency = 41.1 us
> udp_lat:
> latency = 41.3 us latency = 41.5 us latency = 41.5 us
> udp_lat:
> latency = 64.4 us latency = 64.5 us latency = 64.2 us
> udp_lat:
> latency = 71.5 us latency = 71.5 us latency = 71.7 us
> udp_lat:
> latency = 120 us latency = 120 us latency = 120 us
> udp_lat:
> latency = 0 ns latency = 0 ns latency = 0 ns
>
> lib/netdev-dpdk.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 75 insertions(+), 4 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index f58e9be..0f91def 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -31,6 +31,7 @@
> #include <rte_errno.h>
> #include <rte_eth_ring.h>
> #include <rte_ethdev.h>
> +#include <rte_ip.h>
> #include <rte_malloc.h>
> #include <rte_mbuf.h>
> #include <rte_meter.h>
> @@ -992,8 +993,7 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
>
>      err = rte_vhost_driver_disable_features(dev->vhost_id,
>                                  1ULL << VIRTIO_NET_F_HOST_TSO4
> -                                | 1ULL << VIRTIO_NET_F_HOST_TSO6
> -                                | 1ULL << VIRTIO_NET_F_CSUM);
> +                                | 1ULL << VIRTIO_NET_F_HOST_TSO6);
>      if (err) {
>          VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
>                   "port: %s\n", name);
> @@ -1455,6 +1455,76 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
>      rte_free(rx);
>  }
>
> +static inline void
> +netdev_dpdk_vhost_refill_l4_cksum(const char *data, struct dp_packet *pkt,
> +                                  uint8_t l4_proto, bool is_ipv4)
> +{
> +    void *l3hdr = (void *)(data + pkt->mbuf.l2_len);
> +
> +    if (l4_proto == IPPROTO_TCP) {
> +        struct tcp_header *tcp_hdr = (struct tcp_header *)(data +
> +                                         pkt->mbuf.l2_len + pkt->mbuf.l3_len);
> +
> +        tcp_hdr->tcp_csum = 0;
> +        if (is_ipv4) {
> +            tcp_hdr->tcp_csum = rte_ipv4_udptcp_cksum(l3hdr, tcp_hdr);
> +        } else {
> +            tcp_hdr->tcp_csum = rte_ipv6_udptcp_cksum(l3hdr, tcp_hdr);
> +        }
> +    } else if (l4_proto == IPPROTO_UDP) {
> +        struct udp_header *udp_hdr = (struct udp_header *)(data +
> +                                         pkt->mbuf.l2_len + pkt->mbuf.l3_len);
> +        /* do not recalculate udp cksum if it was 0 */
> +        if (udp_hdr->udp_csum != 0) {
> +            udp_hdr->udp_csum = 0;
> +            if (is_ipv4) {
> +                /*do not calculate udp cksum if it was a fragment IP*/
> +                if (IP_IS_FRAGMENT(((struct ipv4_hdr *)l3hdr)->
> +                                   fragment_offset)) {
> +                    return;
> +                }
> +
> +                udp_hdr->udp_csum = rte_ipv4_udptcp_cksum(l3hdr, udp_hdr);
> +            } else {
> +                udp_hdr->udp_csum = rte_ipv6_udptcp_cksum(l3hdr, udp_hdr);
> +            }
> +        }
> +    }
> +
> +    pkt->mbuf.ol_flags &= ~PKT_TX_L4_MASK;
> +}
> +
> +static inline void
> +netdev_dpdk_vhost_tx_csum(struct dp_packet **pkts, int pkt_cnt)
> +{
> +    int i;
> +
> +    for (i = 0; i < pkt_cnt; i++) {
> +        ovs_be16 dl_type;
> +        struct dp_packet *pkt = (struct dp_packet *)pkts[i];
> +        const char *data = dp_packet_data(pkt);
> +        void *l3hdr = (char *)(data + pkt->mbuf.l2_len);
> +
> +        if (!(pkt->mbuf.ol_flags & PKT_TX_L4_MASK)) {
> +            /* DPDK vhost tags PKT_TX_L4_MASK if a L4 packet need cksum. */
> +            continue;
> +        }
> +
> +        if (OVS_UNLIKELY(pkt->mbuf.l2_len == 0 || pkt->mbuf.l3_len == 0)) {
> +            continue;
> +        }
> +
> +        dl_type = *(ovs_be16 *)(data + pkt->mbuf.l2_len - sizeof dl_type);
> +        if (dl_type == htons(ETH_TYPE_IP)) {
> +            uint8_t l4_proto = ((struct ipv4_hdr *)l3hdr)->next_proto_id;
> +            netdev_dpdk_vhost_refill_l4_cksum(data, pkt, l4_proto, true);
> +        } else if (dl_type == htons(ETH_TYPE_IPV6)) {
> +            uint8_t l4_proto = ((struct ipv6_hdr *)l3hdr)->proto;
> +            netdev_dpdk_vhost_refill_l4_cksum(data, pkt, l4_proto, false);
> +        }
> +    }
> +}
> +
>  /* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes ownership of
>   * 'pkts', even in case of failure.
>   *
> @@ -1646,6 +1716,8 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
>
>      dp_packet_batch_init_cutlen(batch);
>      batch->count = (int) nb_rx;
> +    netdev_dpdk_vhost_tx_csum(batch->packets, batch->count);
> +
>      return 0;
>  }
>
> @@ -3288,8 +3360,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>
>          err = rte_vhost_driver_disable_features(dev->vhost_id,
>                                      1ULL << VIRTIO_NET_F_HOST_TSO4
> -                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6
> -                                    | 1ULL << VIRTIO_NET_F_CSUM);
> +                                    | 1ULL << VIRTIO_NET_F_HOST_TSO6);
>          if (err) {
>              VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user "
>                       "client port: %s\n", dev->up.name);
> --
> 1.8.3.1
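
As background for the checksum refill in the patch above, here is a minimal standalone sketch of the ones' complement L4 checksum over an IPv4 pseudo-header, in the style of RFC 1071. It is an illustration only and is not DPDK's rte_ipv4_udptcp_cksum(); the names sum16, fold and l4_cksum_ipv4 are hypothetical. It assumes the checksum field in the L4 header has already been zeroed, and it returns the value in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds 'len' bytes at 'p' to 'acc' as big-endian 16-bit words. */
static uint32_t
sum16(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += ((uint32_t) p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len) {
        /* Odd trailing byte is padded with a zero byte. */
        acc += (uint32_t) p[0] << 8;
    }
    return acc;
}

/* Folds the carries back into the low 16 bits and complements the sum. */
static uint16_t
fold(uint32_t acc)
{
    while (acc >> 16) {
        acc = (acc & 0xffff) + (acc >> 16);
    }
    return (uint16_t) ~acc;
}

/* Checksum over the IPv4 pseudo-header (source address, destination
 * address, protocol, L4 length) followed by the L4 header and payload.
 * The checksum field inside 'l4' must be zero on entry. */
static uint16_t
l4_cksum_ipv4(const uint8_t src[4], const uint8_t dst[4], uint8_t proto,
              const uint8_t *l4, uint16_t l4_len)
{
    uint32_t acc = 0;
    uint16_t csum;

    acc = sum16(src, 4, acc);
    acc = sum16(dst, 4, acc);
    acc += proto;
    acc += l4_len;
    acc = sum16(l4, l4_len, acc);

    csum = fold(acc);
    if (proto == 17 && csum == 0) {
        /* For UDP, a computed zero is sent as all ones, because a zero
         * field means "no checksum". */
        csum = 0xffff;
    }
    return csum;
}
```

A caller would zero the checksum field, call l4_cksum_ipv4() with the addresses taken from the IP header, and store the result in network byte order. The sketch does not handle IPv6 or fragments, which the patch treats separately.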