On 05/22/2017 04:43 PM, Jan Scheurich wrote:
> Hi Kevin,
>
> Thanks a lot for addressing this very important limitation of OVS-DPDK
> multi-core scalability in cloud contexts such as OpenStack. This is
> highly appreciated!
>
> We have not started testing this, so for the time being just some
> high-level comments:
>
> I would really like to see a new command ovs-appctl
> dpif-netdev/pmd-rxq-rebalance or similar to manually trigger
> redistribution for testing without having to reconfigure things.
>

Hi Jan,

Do you want this just for testing, or do you think it could be useful
more generally? Probably the lightest way to kick a re-balance at
present is to alternate between two pmds that are not in use, e.g.
switch between pmd-cpu-mask=1ff and pmd-cpu-mask=2ff when there are
<= 8 rxqs.
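Something like the following should do it (the masks are just the ones
from the example above - pick values that match the cores on your
system):

  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=1ff
  # ...and later, to force another re-balance:
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2ff

Each change to pmd-cpu-mask reconfigures the pmds, so the rxqs get
redistributed as a side effect.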
> Any redistribution of rx queues across PMDs under high load is a
> critical thing as the service interruption during the PMD reload can
> easily cause rx queue overruns and packet drop. Independently from this
> patch that optimizes the load balance of PMDs after redistribution, we
> should try to improve the actual reconfiguration to become hitless
> (i.e. not requiring a reload of PMDs).
>
> In OpenStack context we really need an automatic re-balancing of rx
> queues over PMDs when the load balance of PMDs becomes so skewed that
> OVS unnecessarily drops packets due to overload of some PMDs while
> others are not fully loaded. Without such a function this patch does
> really not solve the scalability issue. Starting a new VM forces a
> re-balance, but that cannot take the load on the just added ports into
> account, so it will typically be sub-optimal. Also OVS would have no
> means to adapt to shifting load over time in a stable configuration.
>
> If re-balance were hitless (see above) it could be triggered at any
> time. As long as it is not, it should probably only be triggered if
> a) there is overload on some PMD and b) a rebalancing would improve the
> situation such that there is zero (or less) loss. Due to a) the
> additional short service interruption should not matter.
>

I have done some experimenting around self re-balancing and so far it
has been quite easy to see how it can thrash:

- rxqs on overloaded pmds may or may not require more cycles than they
  are measured to be consuming
- inaccuracies in the method of distribution
- traffic pattern changes, e.g. bursty traffic

That is why, for the moment, I went with the approach of modifying the
current round-robin scheme. I think it's an improvement and, more
importantly, it is safe.
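To illustrate the shape of what the series does: the rxqs are sorted by
the processing cycles they have been consuming and then handed out
round-robin, so the busiest rxqs end up on different pmds. A rough
standalone sketch (made-up numbers and names, not the actual
dpif-netdev code):

/* Illustrative only: sort rxqs by measured cycles, then assign them to
 * pmds in round-robin order so the heaviest rxqs land on different
 * cores. */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

struct rxq {
    int id;
    uint64_t cycles;      /* Processing cycles measured for this rxq. */
};

static int
compare_rxq_cycles(const void *a, const void *b)
{
    const struct rxq *qa = a;
    const struct rxq *qb = b;

    /* Descending order of consumed cycles. */
    return (qa->cycles < qb->cycles) - (qa->cycles > qb->cycles);
}

int
main(void)
{
    struct rxq rxqs[] = {
        { 0, 900 }, { 1, 50 }, { 2, 800 }, { 3, 20 }, { 4, 700 },
    };
    const int n_rxqs = sizeof rxqs / sizeof rxqs[0];
    const int n_pmds = 2;

    qsort(rxqs, n_rxqs, sizeof rxqs[0], compare_rxq_cycles);

    /* Round-robin over the sorted list: the two busiest rxqs go to
     * different pmds, and so on down the list. */
    for (int i = 0; i < n_rxqs; i++) {
        printf("rxq %d (%" PRIu64 " cycles) -> pmd %d\n",
               rxqs[i].id, rxqs[i].cycles, i % n_pmds);
    }
    return 0;
}

The real code works on the per-rxq cycle counters added earlier in the
series and also has to consider things like numa locality, but the
sort-then-round-robin idea is the same.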
> A final note: when experimenting with a similar in-house prototype for
> rx queue rebalancing we have experienced strange effects with vhostuser
> tx queues locking up as a result of frequent reconfiguration. These
> might have been caused by internal vulnerabilities of the tested
> complex DPDK application in the guest, but I would suggest we pay very
> good attention to thread safety of shared DPDK and virtio data
> structures in host and guest when testing and reviewing this.
>

Ok, thanks for the warning, I will watch out for that.

thanks,
Kevin.

> BR, Jan
>
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Kevin Traynor
>> Sent: Friday, 05 May, 2017 18:34
>> To: [email protected]
>> Subject: [ovs-dev] [RFC PATCH 0/6] Change dpdk rxq scheduling to
>> incorporate rxq processing cycles.
>>
>> Rxqs are scheduled to be handled across available pmds in round robin
>> order with no weight or priority.
>>
>> It can happen that some very busy queues are handled by one pmd which
>> does not have enough cycles to prevent packets being dropped on them,
>> while at the same time another pmd, which handles queues with no
>> traffic on them, is essentially idling.
>>
>> Rxq scheduling happens as a result of a number of events, and when it
>> does, the same unweighted round robin approach is applied each time.
>>
>> This patchset proposes to augment the round robin nature of rxq
>> scheduling by counting the processing cycles used by the rxqs during
>> their operation and incorporating them into the rxq scheduling.
>>
>> Before distributing in a round robin manner, the rxqs will be sorted
>> in order of the processing cycles they have been consuming. Assuming
>> multiple pmds, this ensures that the measured rxqs using the most
>> processing cycles will be distributed to different cores.
>>
>> To try out:
>> This patchset requires the updated pmd counting patch applied as a
>> prerequisite: https://patchwork.ozlabs.org/patch/729970/
>>
>> Alternatively the series with dependencies can be cloned from here:
>> https://github.com/kevintraynor/ovs-rxq.git
>>
>> A simple way to test is to add some dpdk ports, add multiple pmds,
>> vary traffic rates and rxqs on ports, and trigger reschedules e.g. by
>> changing rxqs or the pmd-cpu-mask.
>>
>> Check the rxq distribution with ovs-appctl dpif-netdev/pmd-rxq-show
>> and see if it matches what is expected.
>>
>> todo:
>> - possibly add a dedicated reschedule trigger command
>> - use consistent type names
>> - update docs
>> - more testing, especially for dual numa
>>
>> thanks,
>> Kevin.
>>
>> Kevin Traynor (6):
>>   dpif-netdev: Add rxq processing cycle counters.
>>   dpif-netdev: Update rxq processing cycles from
>>     cycles_count_intermediate.
>>   dpif-netdev: Change polled_queue to use dp_netdev_rxq.
>>   dpif-netdev: Make dpcls optimization interval more generic.
>>   dpif-netdev: Count the rxq processing cycles for an rxq.
>>   dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
>>
>>  lib/dpif-netdev.c | 163 ++++++++++++++++++++++++++++++++++++++++++++----------
>>  1 file changed, 133 insertions(+), 30 deletions(-)
>>
>> --
>> 1.8.3.1
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
