On 5/22/25 1:13 PM, Ilya Maximets wrote:
> On 3/13/25 1:43 PM, David Marchand wrote:
>> Following a bug report about vhost-user port traffic with a vxlan tunnel
>> being dropped, I took another look at the offloading code to see how we
>> could switch to the non-legacy offload flags mode in the vhost-user library.
>>
>> As I dug into this, and with Ilya's recent comment (on switching to an
>> inner offload API) in mind, I wrote this series.
>>
>> The result seems promising in terms of code.
>> I have not done extensive performance testing yet, but I would expect
>> similar performance, or maybe better performance for non-offloaded
>> traffic, as a number of branches are removed from the flow extraction
>> code.
>>
>>
>> Mike recently posted a patch for removing csum_start/csum_offset from
>> netdev-linux, for which I also had a patch, but since it had some
>> implications on the handling of inner checksums, I ended up including my
>> version for now.
>>
>>
>> There is still one extra step left for a future revision:
>> eliminating the last if (tunnel) in miniflow_extract().
>>
>>
>> Comments?
> 
> Thanks, David.
> 
> I think this is the right approach for moving forward; it should be
> much easier to understand and maintain in the long term.
> 
> For now, I applied patches 1 through 6 and patch 8.  I made a
> few small adjustments, listed below, per Mike's review comments
> and my own while looking through the set.
> Since these are mainly bug fixes and test improvements, I also
> backported them down to 3.3.  Having all these tests on LTS will
> also allow us to be slightly more confident when backporting
> checksum offload fixes in the future.
> 
> Patch 7 is also a bug fix, and I agree that we should be handling
> offload flags properly in the simple match case; I just want to
> do a few more tests manually, including a performance check,
> before applying this one.
> 
> List of adjustments:
> Patch 3: Re-worded a commit message.
> Patch 4: Added coverage counter checks in the test as they were
>          missing in a couple of 'Flag as Rx good.' cases.
> Patch 6: Renamed coverage counters, removing the 'netdev_' part.
>          This module is generally named 'native_tnl' in the logs;
>          we should use the same shorter name for the counters.
> 
> Will follow up on reviews for the rest of the set next week.

I replied to the patch 9 conversation, and I don't have major comments on
the other patches in the set, so it may be a good idea to respin and
continue from there.

I did some performance testing with a basic v2v setup, and it's a bit of a
roller coaster throughout the set.  The virtio-user ports in this setup do
not have any offloads enabled, and pure v2v performance first goes
noticeably higher after patches 9 and 10, but then it drops and ends up
about 1-2% lower than the original by the end of the set.

I can get back most of the performance gains seen at patch 11 by moving the
following part to the case where L4 checksums are enabled:

            /* There is no support in virtio net to offload IPv4 csum,
             * but the vhost library handles IPv4 csum offloading fine. */
            dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD;

And then wrapping the netdev_dpdk_prep_hwol_batch() call in an
if (!netdev->ol_flags) check.
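
Put together, a rough sketch of both changes (the vhost feature-probe
condition here is my assumption and may differ from the actual reconfigure
path; the moved comment/flag and the guarded call are the same code shown
in the diff further down):

    /* Sketch only: advertise Tx IPv4 csum offload on the vhost port
     * only when L4 csum offload is enabled as well. */
    if (dev->hw_ol_features & NETDEV_TX_TCP_CKSUM_OFFLOAD) {
        /* There is no support in virtio net to offload IPv4 csum,
         * but the vhost library handles IPv4 csum offloading fine. */
        dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD;
    }

    /* And in netdev_dpdk_common_send(): skip the per-packet mbuf
     * preparation entirely when the netdev has no Tx offloads enabled. */
    if (netdev->ol_flags) {
        cnt = netdev_dpdk_prep_hwol_batch(dev, pkts, pkt_cnt);
        stats->tx_invalid_hwol_drops += pkt_cnt - cnt;
        pkt_cnt = cnt;
    }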

I'm not sure having IP checksum offload makes a lot of sense when L4
checksums are not supported.

I also tried the following dirty hack:

@@ -3281,6 +3283,7 @@ netdev_dpdk_common_send(struct netdev *netdev, struct dp_packet_batch *batch,
     size_t cnt, pkt_cnt = dp_packet_batch_size(batch);
     struct dp_packet *packet;
     bool need_copy = false;
+    uint64_t offloads = 0;
 
     memset(stats, 0, sizeof *stats);
 
@@ -3289,6 +3292,8 @@ netdev_dpdk_common_send(struct netdev *netdev, struct dp_packet_batch *batch,
             need_copy = true;
             break;
         }
+        offloads |= (uint64_t) packet->offloads & ALL_CSUM_MASK;
+        offloads |= pkts[i]->tso_segsz;
     }
 
     /* Copy dp-packets to mbufs. */
@@ -3304,9 +3309,11 @@ netdev_dpdk_common_send(struct netdev *netdev, struct dp_packet_batch *batch,
     pkt_cnt = cnt;
 
     /* Prepare each mbuf for hardware offloading. */
-    cnt = netdev_dpdk_prep_hwol_batch(dev, pkts, pkt_cnt);
-    stats->tx_invalid_hwol_drops += pkt_cnt - cnt;
-    pkt_cnt = cnt;
+    if (offloads || need_copy) {
+        cnt = netdev_dpdk_prep_hwol_batch(dev, pkts, pkt_cnt);
+        stats->tx_invalid_hwol_drops += pkt_cnt - cnt;
+        pkt_cnt = cnt;
+    }
 
     /* Apply Quality of Service policy. */
     cnt = netdev_dpdk_qos_run(dev, pkts, pkt_cnt, true);
---

It also improved performance quite a bit.
I guess there might be an optimization here: accumulate the offload flags
per batch, and then avoid calling the offload preparation function per
packet if none are required.
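
Something like the following, as a hypothetical helper (the field names
and ALL_CSUM_MASK follow the dp_packet changes in this series and the
hack above; the helper itself is just a sketch):

    /* Sketch only: OR together the offload requests of the whole batch
     * once, so callers can skip the per-packet preparation walk when no
     * packet requested csum or TSO offloading. */
    static inline bool
    dp_packet_batch_needs_hwol(const struct dp_packet_batch *batch)
    {
        const struct dp_packet *packet;
        uint64_t offloads = 0;

        DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
            offloads |= (uint64_t) packet->offloads & ALL_CSUM_MASK;
            offloads |= packet->tso_segsz;
        }
        return offloads != 0;
    }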

WDYT?

Best regards, Ilya Maximets.