On 12/01/2023 19:00, Ilya Maximets wrote:
On 1/11/23 11:24, David Marchand wrote:
On Wed, Jan 11, 2023 at 10:35 AM Kevin Traynor <[email protected]> wrote:

Sleep for an incremental amount of time if none of the Rx queues
assigned to a PMD have at least half a batch of packets (i.e. 16 pkts)
on a polling iteration of the PMD.

Upon detecting the threshold of >= 16 pkts on an Rxq, reset the
sleep time to zero (i.e. no sleep).

Sleep time will be increased on each iteration where the low load
conditions remain, up to the maximum sleep time set by the
user, e.g.:
ovs-vsctl set Open_vSwitch . other_config:pmd-maxsleep=500

The default pmd-maxsleep value is 0, which means that no sleeps
will occur and the behaviour is unchanged from previous releases.

Also add new stats to pmd-perf-show to get visibility into its
operation, e.g.:
...
    - sleep iterations:       153994  ( 76.8 % of iterations)
    Sleep time (us):         9159399  ( 46 us/iteration avg.)
...
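
The snippet above is from the output of:

   ovs-appctl dpif-netdev/pmd-perf-show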

Reviewed-by: Robin Jarry <[email protected]>
Reviewed-by: David Marchand <[email protected]>
Signed-off-by: Kevin Traynor <[email protected]>

Checked v4 -> v5 diff.
Reviewed-by: David Marchand <[email protected]>

Thanks, Kevin, David and Robin!

As we discussed off-list with Kevin, I changed the 'us/iteration avg.'
statistic to report the average time per sleep iteration, not across
all iterations.  The average sleep time among sleep iterations is more
intuitive and much more useful, as it conveys how long sleeps actually
take.  The average across all iterations doesn't seem useful in any
way, from my perspective.  With that, I applied the set.


There is some unwanted behavior, though, that I noticed during the
tests and for which we'll need some follow-up fixes.

I'm using my usual setup: OVS with 2 PMD threads and 2 testpmd
applications with virtio-user ports.  One testpmd is in txonly mode
and the other in mac mode, so the traffic is almost bi-directional:

   txonly --> vhost0 --> PMD#0 --> vhost1 --> mac --+
   drop   <-- vhost0 <-- PMD#1 <-- vhost1 <---------+

Since the first testpmd is in txonly mode, it never reads incoming
packets, so they end up dropped by PMD#1 on the send attempt to vhost0.

The load on PMD#0 is higher than on PMD#1, because dropping packets
is much faster than actually sending them.  So, only 66% of cycles
on PMD#1 are busy cycles, the rest are idle.  PMD#0 is using 100% of
its cycles for forwarding.

If pmd-maxsleep is not set, both threads forward the
same amount of traffic (8.2 Mpps in my case).

However, once pmd-maxsleep is set, PMD#1 starts forwarding only 70%
of the previous amount of traffic, while still consuming only 60%
of its CPU cycles.  That is strange behavior from a user's
perspective, as OVS is seemingly prioritizing sleeps over avoiding
packet drops.

Here is what happens:

1. The thread has idle cycles, so it decides that it can sleep.
2. Every time it sleeps, it sleeps for 60+ us: the 50 us default
    timer slack in the kernel plus the 10 us that OVS requests.
3. 60+ us is enough to overflow the rxq, and the testpmd in mac
    mode starts dropping packets on transmit.
4. PMD#1 wakes up and quickly clears the queue in a few iterations.
5. Goto step 1.

This cycle continues, resulting in a constant drop rate of 30%
of the incoming traffic.
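
To put rough numbers on it (the ring size here is my assumption):
at 8.2 Mpps, a single 60 us sleep lets roughly
8.2e6 * 60e-6 ~= 490 packets build up, already well beyond a
typical virtio ring of 256 descriptors, so a single sleep is
enough to overflow the rxq.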

The main problem for this case is the extra 50 us slack on the timer.
Setting PR_SET_TIMERSLACK to 1 us solves the problem, as 10 us
is not enough to overflow the queue in this scenario.
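
A minimal sketch of that fix, assuming it is called once from each
PMD thread (illustrative, not the actual patch):

   #include <stdio.h>
   #include <sys/prctl.h>

   /* Timer slack is per-thread and specified in nanoseconds, so
    * this has to run in the context of the PMD thread itself. */
   static void
   reduce_timer_slack(void)
   {
       if (prctl(PR_SET_TIMERSLACK, 1000UL, 0, 0, 0)) {
           perror("prctl(PR_SET_TIMERSLACK)");
       }
   }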

Another thing worth changing is setting the sleep increment back
to 1 us, once the timer slack is reduced.  That helps make the
sleeping smoother.  It will also increase the ramp-up time to the
maxsleep value, allowing OVS to react to bursty traffic faster.
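
For illustration, a rough sketch of the per-iteration ramp with a
1 us increment (cur_sleep_us, max_sleep_us and rx_batch are made-up
names; xnanosleep() is OVS's nanosecond sleep helper):

   #define SLEEP_INC_US 1

   if (rx_batch >= 16) {
       cur_sleep_us = 0;                 /* Load detected: no sleep. */
   } else if (cur_sleep_us < max_sleep_us) {
       cur_sleep_us += SLEEP_INC_US;     /* Low load: ramp up. */
   }
   if (cur_sleep_us) {
       xnanosleep(cur_sleep_us * 1000);  /* Sleep, converting us to ns. */
   }

With a 1 us increment, reaching a 500 us maxsleep takes 500
consecutive low-load iterations, so a burst arriving mid-ramp is
met with much shorter sleeps than with larger increments.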

We may think of some other mitigation strategies to help avoid
packet drops in the future.  But for now, I think we should get
the timer slack fix and the sleep increment reduction in, so as
not to fall into such pathological cases so easily.

Thoughts?


Hi Ilya,

Thanks for the testing. It all makes sense.  It seems the difference
from my testing with a physical NIC is the burstiness and batch sizes
of the vhost traffic, combined with the timer slack delay, making the
sleeping too aggressive for this interface.  This is definitely
something we have to avoid.

I agree that setting the timer slack and the sleep start/increment
to 1 us should fix this. As you mention, we can look at further
mitigations, like a delay before sleeping, if we find a need for it.
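
For example, one possible shape for such a delay (purely
illustrative, not an agreed design; names are made up):

   /* Only start the sleep ramp after N consecutive low-load
    * iterations, so short bursts are handled without sleeping. */
   if (rx_batch >= 16) {
       low_load_iters = 0;
   } else if (++low_load_iters > SLEEP_DELAY_ITERS) {
       /* Begin or continue the usual incremental sleep. */
   }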

I will chat with David and we can co-ordinate on patches reducing
the timer slack and sleep increment.

thanks,
Kevin.

Best regards, Ilya Maximets.

