On 05/22/2017 04:43 PM, Jan Scheurich wrote:
> Hi Kevin,
> 
> Thanks a lot for addressing this very important limitation of OVS-DPDK 
> multi-core scalability in cloud contexts such as OpenStack. This is highly 
> appreciated!
> 
> We have not started testing this, so for the time being just some high-level 
> comments:
> 
> I would really like to see a new command ovs-appctl 
> dpif-netdev/pmd-rxq-rebalance or similar to manually trigger redistribution 
> for testing without having to reconfigure things.
> 

Hi Jan,

Do you want this just for testing, or do you think it would be useful more generally?

Probably the lightest way to kick a re-balance at present is to
alternate between two pmd-cpu-masks that differ only in a pmd that is
not actually needed. For example, switch between pmd-cpu-mask=1ff and
pmd-cpu-mask=2ff when there are <= 8 rxqs.
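
For reference, that toggle is just the following (mask values as in the
example above; adjust them for your own core layout):

  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=1ff
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2ff

Each change of pmd-cpu-mask triggers a pmd reconfiguration and so a
reschedule of the rxqs, which can then be checked with
ovs-appctl dpif-netdev/pmd-rxq-show.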

> Any redistribution of rx queues across PMDs under high load is a critical 
> thing as the service interruption during the PMD reload can easily cause rx 
> queue overruns and packet drop. Independently from this patch that optimizes 
> the load balance of PMDs after redistribution, we should try to improve the 
> actual reconfiguration to become hitless (i.e. not requiring a reload of 
> PMDs).
> 
> In an OpenStack context we really need automatic re-balancing of rx queues 
> over PMDs when the load balance of PMDs becomes so skewed that OVS 
> unnecessarily drops packets due to overload of some PMDs while others are not 
> fully loaded. Without such a function this patch does not really solve the 
> scalability issue. Starting a new VM forces a re-balance, but that cannot 
> take the load on the just-added ports into account, so it will typically be 
> sub-optimal. Also OVS would have no means to adapt to shifting load over time 
> in a stable configuration.
> 
> If re-balance were hitless (see above) it could be triggered at any time. As 
> long as it is not, it should probably only be triggered if a) there is 
> overload on some PMD and b) a rebalancing would improve the situation such 
> that there is zero (or at least reduced) loss. Due to a) the additional short service 
> interruption should not matter.
> 

I have done some experimenting around automatic re-balancing and so far
it has been quite easy to see how it can thrash:
- rxqs on overloaded pmds may or may not require more cycles than they
  are measured to be consuming
- inaccuracies in the method of distribution
- traffic pattern changes and bursty traffic

That is why, for the moment, I went with the approach of modifying the
current round-robin scheme. I think it's an improvement and, more
importantly, safe.
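
To illustrate the shape of the scheme (the types and names below are
made up for illustration, not the actual dpif-netdev code; the real
description is in the cover letter further down, and details such as
numa placement are ignored here):

/* Illustration only: hypothetical types and names, not dpif-netdev code. */
#include <stdint.h>
#include <stdlib.h>

struct rxq_info {
    int port_no;            /* port the rxq belongs to */
    int qid;                /* queue id within the port */
    uint64_t proc_cycles;   /* processing cycles measured for this rxq */
    int pmd_idx;            /* output: pmd the rxq gets assigned to */
};

/* Order rxqs by measured processing cycles, most expensive first. */
static int
compare_rxq_cycles(const void *a, const void *b)
{
    const struct rxq_info *qa = a;
    const struct rxq_info *qb = b;

    return (qa->proc_cycles < qb->proc_cycles)
           - (qa->proc_cycles > qb->proc_cycles);
}

/* Same round robin as before, but walked in order of decreasing measured
 * cycles, so the heaviest rxqs end up on different pmds. */
static void
schedule_rxqs(struct rxq_info *rxqs, size_t n_rxqs, int n_pmds)
{
    qsort(rxqs, n_rxqs, sizeof rxqs[0], compare_rxq_cycles);
    for (size_t i = 0; i < n_rxqs; i++) {
        rxqs[i].pmd_idx = i % n_pmds;
    }
}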

> A final note: when experimenting with a similar in-house prototype for rx 
> queue rebalancing we have experienced strange effects with vhostuser tx 
> queues locking up as a result of frequent reconfiguration. These might have 
> been caused by internal vulnerabilities of the tested complex DPDK 
> application in the guest, but I would suggest we pay very good attention to 
> thread safety of shared DPDK and virtio data structures in host and guest 
> when testing and reviewing this.
> 

ok, thanks for the warning, I will watch for that.

thanks,
Kevin.

> BR, Jan
> 
> 
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Kevin Traynor
>> Sent: Friday, 05 May, 2017 18:34
>> To: [email protected]
>> Subject: [ovs-dev] [RFC PATCH 0/6] Change dpdk rxq scheduling to incorporate 
>> rxq processing cycles.
>>
>> Rxqs are scheduled to be handled across available pmds in round robin
>> order with no weight or priority.
>>
>> It can happen that some very busy queues are handled by one pmd which
>> does not have enough cycles to prevent packets being dropped on them,
>> while at the same time another pmd, which handles queues with no traffic
>> on them, is essentially idling.
>>
>> Rxq scheduling happens as a result of a number of events and when it does,
>> the same unweighted round robin approach is applied each time.
>>
>> This patchset proposes to augment the round robin nature of rxq scheduling
>> by counting the processing cycles used by the rxqs during their operation
>> and incorporating them into the rxq scheduling.
>>
>> Before distributing in a round robin manner, the rxqs will be sorted in
>> order of the processing cycles they have been consuming. Assuming multiple
>> pmds, this ensures that the rxqs measured as using the most processing
>> cycles will be distributed to different cores.
>>
>> To try out:
>> This patchset requires the updated pmd counting patch applied as a
>> prerequisite. https://patchwork.ozlabs.org/patch/729970/
>>
>> Alternatively the series with dependencies can be cloned from here:
>> https://github.com/kevintraynor/ovs-rxq.git
>>
>> A simple way to test is to add some dpdk ports and multiple pmds, vary the
>> traffic rates and rxqs on the ports, and trigger reschedules, e.g. by
>> changing the number of rxqs or the pmd-cpu-mask.
>>
>> Check the rxq distribution with ovs-appctl dpif-netdev/pmd-rxq-show and see
>> if it matches what is expected.
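>>
>> For example (the port name and mask value here are only placeholders):
>>
>>   ovs-vsctl set Interface dpdk0 options:n_rxq=4
>>   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=3c
>>   ovs-appctl dpif-netdev/pmd-rxq-show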
>>
>> todo:
>> -possibly add a dedicated reschedule trigger command
>> -use consistent type names
>> -update docs
>> -more testing, especially for dual numa
>>
>> thanks,
>> Kevin.
>>
>> Kevin Traynor (6):
>>   dpif-netdev: Add rxq processing cycle counters.
>>   dpif-netdev: Update rxq processing cycles from
>>     cycles_count_intermediate.
>>   dpif-netdev: Change polled_queue to use dp_netdev_rxq.
>>   dpif-netdev: Make dpcls optimization interval more generic.
>>   dpif-netdev: Count the rxq processing cycles for an rxq.
>>   dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
>>
>>  lib/dpif-netdev.c | 163 ++++++++++++++++++++++++++++++++++++++++++++----------
>>  1 file changed, 133 insertions(+), 30 deletions(-)
>>
>> --
>> 1.8.3.1
>>
