On Mon, Jun 07, 2021 at 01:01:33PM +0200, Ilya Maximets wrote:
> On 8/2/17 11:09 PM, Andy Zhou wrote:
> > On Thu, Jul 20, 2017 at 10:21 AM, Ilya Maximets <[email protected]>
> > wrote:
> >> There are 3 constraints for moving hashes from one slave to another:
> >>
> >> 1. The load difference is larger than ~3% of one slave's load.
> >> 2. The load difference between slaves exceeds 100000 bytes.
> >> 3. Moving of the hash makes the load difference lower by > 10%.
> >>
> >> In current implementation if one of the slaves goes DOWN state, all
> >> the hashes assigned to it will be moved to other slaves. After that,
> >> if slave will go UP it will wait for rebalancing to get some hashes.
> >> But in case where we have more than 10 equally loaded hashes it
> >> will never fit constraint #3, because each hash will handle less than
> >> 10% of the load. Situation become worse when number of flows grows
> >> higher and it's almost impossible to migrate any hash when all the
> >> 256 hash entries are used which is very likely when we have few
> >> hundreds/thousands of flows.
> >>
> >> As a result, if one of the slaves goes down and up while traffic
> >> flows, it will never be used again for packet transmission.
> >> Situation will not be fixed even if we'll stop traffic completely
> >> and start it again because first two constraints will block
> >> rebalancing on the earlier stages while we have low amount of traffic.
> >>
> >> Moving of one hash if destination has no hashes as it was before
> >> commit c460a6a7bc75 ("ofproto/bond: simplify rebalancing logic")
> >> will not help because having one hash isn't enough to make load
> >> difference less than 10% of total load and this slave will
> >> handle only that one hash forever.
> >>
> >> To fix this lets try to move few hashes simultaniously to fit
> >> constraint #3.
> >
> > Thanks for working on this.
>
> Sorry for not replying for almost 4 years. :)
>
> And sorry for resurrecting the thread, but the issue still exists and
> I think that we still need to fix it.  The first patch needs a
> minor rebase, but it still works fine.  The test in the second patch
> is still valid.
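
For anyone following the thread without the code at hand, here is a rough
Python sketch of the three constraints quoted above and of the proposed
"move several hashes at once" idea.  It is not the OVS implementation and
not the patch itself: the function names, the choice to measure the ~3%
against the busier member, and the way the "improvement" is computed are
all assumptions made for illustration.

```python
# Illustrative model only; names and exact formulas are assumptions,
# not taken from ofproto/bond.c or from the patch under review.
MIN_DIFF_FRACTION = 0.03   # constraint 1: ~3% (assumed: of the busier member)
MIN_DIFF_BYTES = 100_000   # constraint 2: absolute load difference in bytes
MIN_IMPROVEMENT = 0.10     # constraint 3: difference must shrink by > 10%


def worth_rebalancing(from_load, to_load):
    """Constraints 1 and 2: is the imbalance large enough to act on?"""
    diff = from_load - to_load
    return diff > MIN_DIFF_FRACTION * from_load and diff > MIN_DIFF_BYTES


def improvement(from_load, to_load, moved):
    """Fraction by which the load difference shrinks if 'moved' bytes migrate."""
    old = from_load - to_load
    new = abs((from_load - moved) - (to_load + moved))
    return (old - new) / old


def pick_single(from_hashes, to_load):
    """One hash at a time: return a hash load meeting constraint 3, or None."""
    from_load = sum(from_hashes)
    if not worth_rebalancing(from_load, to_load):
        return None
    for h in from_hashes:
        if improvement(from_load, to_load, h) > MIN_IMPROVEMENT:
            return h
    return None


def pick_batch(from_hashes, to_load):
    """Accumulate hashes until the batch as a whole meets constraint 3."""
    from_load = sum(from_hashes)
    if not worth_rebalancing(from_load, to_load):
        return []
    half_diff = (from_load - to_load) / 2
    batch, moved = [], 0
    for h in sorted(from_hashes, reverse=True):
        if moved + h > half_diff:
            continue  # would overshoot the balance point, skip this hash
        batch.append(h)
        moved += h
        if improvement(from_load, to_load, moved) > MIN_IMPROVEMENT:
            return batch
    return []


# 256 equally loaded hashes on one member, none on the other.
hashes = [1000] * 256
print(pick_single(hashes, 0))      # None: one hash shrinks the gap by < 1%
print(len(pick_batch(hashes, 0)))  # 13: 13 hashes together exceed 10%
```

In this model a single hash out of 256 equal ones never satisfies
constraint 3, while a batch of 13 does, which is the situation the commit
message describes.  With a total load below 100000 bytes both functions
return nothing, matching the remark that the first two constraints block
rebalancing while traffic is low.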

I don't think Andy is working on OVS these days.

I'm the original author of the rebalancing algorithm, so I went back and
took a look at patch 1.  I see a little bit of coding style I'd do
differently these days (e.g. declare 'i' in the 'for' loops rather than
at the top of a block) but the code and the rationale for it seem solid
to me.  I'll read v2 but I expect to ack it.

Thanks,

Ben.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
