On 1/11/23 11:24, David Marchand wrote:
> On Wed, Jan 11, 2023 at 10:35 AM Kevin Traynor <[email protected]> wrote:
>>
>> Sleep for an incremental amount of time if none of the Rx queues
>> assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
>> on a polling iteration of the PMD.
>>
>> Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
>> sleep time to zero (i.e. no sleep).
>>
>> Sleep time will be increased on each iteration where the low load
>> conditions remain up to a total of the max sleep time which is set
>> by the user, e.g.:
>> ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500
>>
>> The default pmd-maxsleep value is 0, which means that no sleeps
>> will occur and the default behaviour is unchanged from previously.
>>
>> Also add new stats to pmd-perf-show to get visibility of operation,
>> e.g.:
>> ...
>>   - sleep iterations:     153994  ( 76.8 % of iterations)
>>   Sleep time (us):       9159399  ( 46 us/iteration avg.)
>> ...
>>
>> Reviewed-by: Robin Jarry <[email protected]>
>> Reviewed-by: David Marchand <[email protected]>
>> Signed-off-by: Kevin Traynor <[email protected]>
>
> Checked v4 -> v5 diff.
> Reviewed-by: David Marchand <[email protected]>
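
As an illustration of the heuristic described in the quoted commit message,
here is a minimal C sketch. It is a sketch only: the function and constant
names, the 10 us increment, and the plain nanosleep() call are assumptions
made for illustration, not the actual OVS implementation.

#include <stdint.h>
#include <time.h>

#define PMD_SLEEP_INC_US 10   /* Assumed increment per low-load iteration. */
#define HALF_BATCH_PKTS  16   /* Half of a full Rx batch. */

static void
pmd_maybe_sleep(uint64_t max_sleep_us, uint32_t max_pkts_on_any_rxq,
                uint64_t *cur_sleep_us)
{
    if (!max_sleep_us || max_pkts_on_any_rxq >= HALF_BATCH_PKTS) {
        /* Load detected, or the feature is disabled: don't sleep and
         * restart the ramp-up from zero. */
        *cur_sleep_us = 0;
        return;
    }

    /* Low load on every rxq: grow the sleep request, capped at the
     * user-configured pmd-maxsleep value. */
    *cur_sleep_us += PMD_SLEEP_INC_US;
    if (*cur_sleep_us > max_sleep_us) {
        *cur_sleep_us = max_sleep_us;
    }

    struct timespec ts = {
        .tv_sec = *cur_sleep_us / 1000000,
        .tv_nsec = (long) ((*cur_sleep_us % 1000000) * 1000),
    };
    nanosleep(&ts, NULL);
}

With the ovs-vsctl example above, the requested sleep would ramp up from 0
towards the 500 us cap on consecutive low-load iterations and snap back to 0
as soon as any rxq delivers a half batch (16 packets) or more.
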
Thanks, Kevin, David and Robin!

As we discussed off-list with Kevin, I changed the 'us/iteration avg.'
statistic to report the average time per sleep iteration, not an average
over all iterations. The average sleep time among sleep iterations is more
intuitive and much more useful, as it conveys how long sleeps actually
take. An average over all iterations doesn't seem to be useful in any way,
from my perspective.

With that, I applied the set.

There is some unwanted behavior, though, that I noticed during the tests
and for which we'll need some follow-up fixes.

I'm using my usual setup: OVS with 2 PMD threads and 2 testpmd applications
with virtio-user ports, one in txonly and the other in mac mode. So, the
traffic is almost bi-directional:

  txonly --> vhost0 --> PMD#0 --> vhost1 --> mac --> vhost1 --> PMD#1 --> drop

Since the first testpmd is in txonly mode, it doesn't receive any packets,
and they end up dropped on a send attempt by PMD#1. The load on PMD#0 is
higher than on PMD#1, because dropping packets is much faster than actually
sending them. So, only 66% of cycles on PMD#1 are busy cycles, the rest are
idle. PMD#0 is using 100% of its cycles for forwarding.

If pmd-maxsleep is not enabled, both threads forward the same amount of
traffic (8.2 Mpps in my case). However, once pmd-maxsleep is set, PMD#1
starts forwarding only 70% of the previous amount of traffic, while still
consuming only 60% of its CPU cycles. That is strange behavior from a
user's perspective, as we're seemingly prioritizing sleeps over avoiding
packet drops.

Here is what happens:

1. The thread has idle cycles, so it decides that it can sleep.
2. Every time it sleeps, it sleeps for 60+ us: the 50 us default timer
   slack in the kernel plus the 10 us that OVS requests.
3. 60+ us is enough to overflow the rxq, and testpmd in mac mode starts
   dropping packets on transmit.
4. PMD#1 wakes up and quickly clears the queue in a few iterations.
5. Goto step 1.

This cycle continues, resulting in a constant drop rate of about 30% of
the incoming traffic.

The main problem in this case is the extra 50 us of slack on the timer.
Setting PR_SET_TIMERSLACK to 1 us solves the problem, as 10 us is not
enough to overflow the queue in this scenario.

Another thing worth changing is setting the sleep increment back to 1 us
once the timer slack is reduced. That helps make the operation smoother.
It will also increase the ramp-up time to the maxsleep value, allowing OVS
to react to bursty traffic faster.

We may think of other mitigation strategies to help avoid packet drops in
the future. But for now, I think, we should consider getting the timer
slack fix and the sleep increment reduction in, so that we don't fall into
such pathological cases so easily.

Thoughts?

Best regards, Ilya Maximets.
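
For reference, the timer slack fix suggested above could look roughly like
the sketch below. This is only an illustration under the assumption that
each PMD thread sets its own slack (PR_SET_TIMERSLACK is a per-thread Linux
prctl and takes nanoseconds); the function name is made up and this is not
a posted patch.

#include <stdio.h>
#include <sys/prctl.h>

/* Reduce this thread's timer slack from the 50 us kernel default to 1 us,
 * so short sleeps wake up close to the requested time. */
static void
pmd_reduce_timer_slack(void)
{
    if (prctl(PR_SET_TIMERSLACK, 1000UL, 0, 0, 0)) {
        perror("prctl(PR_SET_TIMERSLACK)");
    }
}

With the slack at 1 us, a 10 us sleep request wakes up after roughly 11 us
instead of 60+ us which, per the analysis above, is short enough to avoid
overflowing the vhost queue in this scenario.
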
