On 6/11/21 1:24 AM, Ben Pfaff wrote:
> On Mon, Jun 07, 2021 at 01:01:33PM +0200, Ilya Maximets wrote:
>> On 8/2/17 11:09 PM, Andy Zhou wrote:
>>> On Thu, Jul 20, 2017 at 10:21 AM, Ilya Maximets wrote:
>>>> There are 3 constraints for moving hashes from one slave to another:
>>>>
>>>>         1. The load difference is larger than ~3% of one slave's load.
>>>>         2. The load difference between slaves exceeds 100000 bytes.
>>>>         3. Moving the hash reduces the load difference by more than 10%.
>>>>
>>>> In the current implementation, if one of the slaves goes to the DOWN
>>>> state, all the hashes assigned to it are moved to other slaves.
>>>> After that, if the slave comes back UP, it waits for rebalancing to
>>>> get some hashes.  But when there are more than 10 equally loaded
>>>> hashes, constraint #3 can never be met, because each hash handles
>>>> less than 10% of the load.  The situation becomes worse as the number
>>>> of flows grows: it's almost impossible to migrate any hash when all
>>>> 256 hash entries are in use, which is very likely with a few hundred
>>>> or thousand flows.
>>>>
>>>> As a result, if one of the slaves goes down and up while traffic
>>>> flows, it will never be used again for packet transmission.  The
>>>> situation is not fixed even if we stop traffic completely and start
>>>> it again, because the first two constraints block rebalancing in the
>>>> early stages, while the amount of traffic is still low.
>>>>
>>>> Moving a single hash when the destination has no hashes, as was done
>>>> before commit c460a6a7bc75 ("ofproto/bond: simplify rebalancing logic"),
>>>> will not help either: one hash isn't enough to bring the load
>>>> difference below 10% of the total load, so this slave would handle
>>>> only that one hash forever.
>>>>
>>>> To fix this, let's try to move a few hashes simultaneously to satisfy
>>>> constraint #3.
>>>
>>> Thanks for working on this.
>>
>> Sorry for not replying for almost 4 years. :)
>>
>> And sorry for resurrecting the thread, but the issue still exists and
>> I think that we still need to fix it.  The first patch needs a
>> minor rebase, but it still works fine.  The test in the second patch
>> is still valid.
> 
> I don't think Andy is working on OVS these days.  I'm the original
> author of the rebalancing algorithm, so I went back and took a look at
> patch 1.  I see a little bit of coding style I'd do differently these
> days (e.g. declare 'i' in the 'for' loops rather than at the top of a
> block), but the code and the rationale for it seem solid to me.
> 
> I'll read v2 but I expect to ack it.

Thanks for taking a look!  I rebased the patch and adjusted the coding
style a little:
  
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/

I didn't observe any benefit from changing the migration threshold,
though, so I kept it as it was in v1.
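To make the three constraints and the "move a few hashes at once" idea concrete, here is a minimal C sketch.  It is hypothetical and simplified, not the code from the patch: the function names are made up, loads are plain byte counts for one rebalancing interval, the "~3%" of constraint #1 is modeled as 1/32 of the busier member's load (an assumption), and constraint #3 is read as "the move must shrink the load difference by more than 10% of its current value".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Constraints #1 and #2.  ASSUMPTION: "~3% of one slave's load" is modeled
 * here as 1/32 of the busier member's load. */
static bool
worth_rebalancing(uint64_t from_load, uint64_t to_load)
{
    assert(from_load >= to_load);
    uint64_t diff = from_load - to_load;

    return diff > from_load / 32     /* #1: larger than ~3% of a load. */
           && diff > 100000;         /* #2: more than 100000 bytes. */
}

/* Constraint #3 (as read for this sketch): moving 'moved' bytes from the
 * busier member to the other changes the difference from 'diff' to
 * |diff - 2 * moved|, and that must cut 'diff' by more than 10%. */
static bool
improves_enough(uint64_t diff, uint64_t moved)
{
    uint64_t new_diff = diff > 2 * moved ? diff - 2 * moved
                                         : 2 * moved - diff;

    return new_diff < diff && (diff - new_diff) * 10 > diff;
}

/* Hypothetical greedy selection: instead of requiring one hash to satisfy
 * constraint #3 on its own, collect hashes from the busier member until
 * their combined load does.  Hashes that would push the busier member below
 * the other one (past the midpoint) are skipped.  Stores the chosen indices
 * in 'picked' and returns how many were chosen, or 0 if this search finds
 * no qualifying set. */
static size_t
pick_hashes(const uint64_t *hash_loads, size_t n_hashes,
            uint64_t from_load, uint64_t to_load, size_t *picked)
{
    if (!worth_rebalancing(from_load, to_load)) {
        return 0;
    }

    uint64_t diff = from_load - to_load;
    uint64_t moved = 0;
    size_t n_picked = 0;

    for (size_t i = 0; i < n_hashes; i++) {
        if (2 * (moved + hash_loads[i]) > diff) {
            continue;               /* Would overshoot the midpoint. */
        }
        moved += hash_loads[i];
        picked[n_picked++] = i;
        if (improves_enough(diff, moved)) {
            return n_picked;
        }
    }
    return 0;
}
```

With 32 equally loaded hashes of 1000000 bytes on the busy member and an idle member, no single hash passes improves_enough(), but pick_hashes() succeeds after taking two of them.  Skipping hashes that would overshoot the midpoint is a conservative choice made only for this sketch.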

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
