On 6/11/21 1:24 AM, Ben Pfaff wrote:
> On Mon, Jun 07, 2021 at 01:01:33PM +0200, Ilya Maximets wrote:
>> On 8/2/17 11:09 PM, Andy Zhou wrote:
>>> On Thu, Jul 20, 2017 at 10:21 AM, Ilya Maximets <[email protected]>
>>> wrote:
>>>> There are 3 constraints for moving hashes from one slave to another:
>>>>
>>>> 1. The load difference is larger than ~3% of one slave's load.
>>>> 2. The load difference between slaves exceeds 100000 bytes.
>>>> 3. Moving of the hash makes the load difference lower by > 10%.
>>>>
>>>> In current implementation if one of the slaves goes DOWN state, all
>>>> the hashes assigned to it will be moved to other slaves. After that,
>>>> if slave will go UP it will wait for rebalancing to get some hashes.
>>>> But in case where we have more than 10 equally loaded hashes it
>>>> will never fit constraint #3, because each hash will handle less than
>>>> 10% of the load. Situation becomes worse when number of flows grows
>>>> higher and it's almost impossible to migrate any hash when all the
>>>> 256 hash entries are used which is very likely when we have few
>>>> hundreds/thousands of flows.
>>>>
>>>> As a result, if one of the slaves goes down and up while traffic
>>>> flows, it will never be used again for packet transmission.
>>>> Situation will not be fixed even if we'll stop traffic completely
>>>> and start it again because first two constraints will block
>>>> rebalancing on the earlier stages while we have low amount of traffic.
>>>>
>>>> Moving of one hash if destination has no hashes as it was before
>>>> commit c460a6a7bc75 ("ofproto/bond: simplify rebalancing logic")
>>>> will not help because having one hash isn't enough to make load
>>>> difference less than 10% of total load and this slave will
>>>> handle only that one hash forever.
>>>>
>>>> To fix this let's try to move few hashes simultaneously to fit
>>>> constraint #3.
>>>
>>> Thanks for working on this.
>>
>> Sorry for not replying for almost 4 years. :)
>>
>> And sorry for resurrecting the thread, but the issue still exists and
>> I think that we still need to fix it. The first patch needs a
>> minor rebase, but it still works fine. The test in the second patch
>> is still valid.
>
> I don't think Andy is working on OVS these days. I'm the original
> author of the rebalancing algorithm, so I went back and took a look at
> patch 1. I see a little bit of coding style I'd do differently these
> days (e.g. declare 'i' in the 'for' loops rather than at the top of a
> block) but the code and the rationale for it seems solid to me.
>
> I'll read v2 but I expect to ack it.

Thanks for taking a look!

I rebased the patch and adjusted coding style a little bit:
  https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/

I didn't observe any benefits from changing migration threshold though,
so I kept it as it is in v1.

Best regards, Ilya Maximets.
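
For anyone following the thread without the patch at hand, here is a rough
sketch of the three constraints from the original commit message and of the
"move a few hashes at once" idea. It is not the code from ofproto/bond.c or
from the patch. All names (worth_rebalancing, move_improves_enough,
hashes_to_move) are made up for illustration, constraint 1 is assumed to be
measured against the less loaded member, and constraint 3 is read as "the
load difference must shrink by more than 10% of its current value", which is
only one possible reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_DIFF_BYTES 100000u   /* Constraint 2 threshold from the thread. */

/* Absolute load difference after moving 'moved' bytes from the busier
 * member ('from') to the less loaded one ('to'). */
static uint64_t
diff_after_move(uint64_t from, uint64_t to, uint64_t moved)
{
    assert(moved <= from);
    uint64_t new_from = from - moved;
    uint64_t new_to = to + moved;

    return new_from > new_to ? new_from - new_to : new_to - new_from;
}

/* Constraints 1 and 2: the difference must exceed ~3% of one member's load
 * (assumption: the less loaded one) and 100000 bytes. */
static bool
worth_rebalancing(uint64_t from, uint64_t to)
{
    if (from <= to) {
        return false;
    }
    uint64_t diff = from - to;

    return diff > to * 3 / 100 && diff > MIN_DIFF_BYTES;
}

/* Constraint 3 (assumed reading): moving 'moved' bytes must shrink the
 * difference by more than 10% of its current value. */
static bool
move_improves_enough(uint64_t from, uint64_t to, uint64_t moved)
{
    if (from <= to || moved > from) {
        return false;
    }
    uint64_t diff = from - to;
    uint64_t new_diff = diff_after_move(from, to, moved);

    return new_diff < diff && diff - new_diff > diff / 10;
}

/* Batch idea: accumulate hashes in order until their combined load passes
 * constraint 3.  Returns how many leading hashes to move, or 0 if no
 * prefix qualifies. */
static size_t
hashes_to_move(const uint64_t *hash_load, size_t n,
               uint64_t from, uint64_t to)
{
    if (!worth_rebalancing(from, to)) {
        return 0;
    }

    uint64_t moved = 0;
    for (size_t i = 0; i < n; i++) {
        moved += hash_load[i];
        if (moved > from) {
            break;
        }
        if (move_improves_enough(from, to, moved)) {
            return i + 1;
        }
    }
    return 0;
}
```

With 256 hashes of 10000 bytes each on one member and nothing on the other,
move_improves_enough() rejects any single hash under this reading, while
hashes_to_move() finds that a batch of 13 is enough. With only 50000 bytes of
total load, constraint 2 blocks rebalancing entirely, which matches the "low
amount of traffic" case described in the commit message.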
