On 17.03.2025 22:26, Ilya Maximets wrote:
> On 3/17/25 16:56, Mike Pattrick wrote:
>> On Fri, Mar 14, 2025 at 4:11 PM Vladislav Odintsov <vlodintsov@k2.cloud> 
>> wrote:
>>> With this commit the rebalance of hash entries between bonding members
>>> becomes less frequent.  If we know all bond members' speed, we do not
>>> move hash entries between them if load difference is less than 1.5% of
>>> minimum link speed of bond members.
>>>
>>> Reported-at: 
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-March/422028.html
>>> Suggested-by: Ilya Maximets <i.maxim...@ovn.org>
>>> Signed-off-by: Vladislav Odintsov <vlodintsov@k2.cloud>
>>> ---
> Hi, Vladislav.  Thanks for the patch!
>
> As Eelco mentioned, this will need an update in the documentation as well.
> In particular in Documentation/topics/bonding.rst.
>
> May also be good to add NEWS entry, since it's a user-visible change.

Hi Eelco, Mike and Ilya,

thanks for your time and comments & reviews!

I'll address most of the comments in v2. I've got some comments inline. PSB.

>
>>>   ofproto/bond.c | 11 +++++++++--
>>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>> index 17bf10be5..70219db89 100644
>>> --- a/ofproto/bond.c
>>> +++ b/ofproto/bond.c
>>> @@ -1336,6 +1336,7 @@ bond_rebalance(struct bond *bond)
>>>       struct ovs_list bals;
>>>       bool rebalanced = false;
>>>       bool use_recirc;
>>> +    uint32_t min_member_mbps = UINT32_MAX;
> Reverse x-mass tree, please.  It's not perfect here, but we shouldn't
> make it worse.
Ack.
>
>>>       ovs_rwlock_wrlock(&rwlock);
>>>       if (!bond_is_balanced(bond) || time_msec() < bond->next_rebalance) {
>>> @@ -1378,7 +1379,11 @@ bond_rebalance(struct bond *bond)
>>>       ovs_list_init(&bals);
>>>       HMAP_FOR_EACH (member, hmap_node, &bond->members) {
>>>           if (member->enabled) {
>>> +            uint32_t mbps;
>>> +
>>>               insert_bal(&bals, member);
>>> +            netdev_get_speed(member->netdev, &mbps, NULL);
>> This function may return an error. And also it may not return an error
>> but still set mbps to zero. We should probably discard the value in
>> those cases.
> It's not really necessary to check the result of this function here.
> If it failed, we'll just get the perdefined minimum value in the MAX
> condition below.  We do similar thing in sFlow processing, for example.
>
> But I agree that it makes sense to add a comment about this.
Ack.
>
>>> +            min_member_mbps = MIN(mbps, min_member_mbps);
>>>           }
>>>       }
>>
>> Should probably also have some sane default if all the calls to
>> netdev_get_speed failed and min_member_mbps is still UINT32_MAX.
> In the current implementation, it can't end up to still be UINT32_MAX,
> unless there are no active members.  But in that case the load shift
> below will also not run, as the list will be empty.
>
>>
>>>       log_bals(bond, &bals);
>>> @@ -1392,10 +1397,12 @@ bond_rebalance(struct bond *bond)
>>>           uint64_t overload;
>>>
>>>           overload = from->tx_bytes - to->tx_bytes;
>>> -        if (overload < to->tx_bytes >> 5 || overload < 100000) {
>>> +        if (overload < to->tx_bytes >> 5 ||
>>> +            overload / 1000000 < MAX(1, min_member_mbps / 8 >> 6)) {
> Overload is in bytes here, so the left side is in MBps (not Mbps).
> On the right side we have the link speed converted to MBps as well
> and then shifted to get ~1.5%.  So, the compared values are of the
> same order, which is good.  But the MAX is taken from 1 and it means
> that this 1 is 1 MBps, and not 1 Mbps as it was before.
>
> What we could do:
>
> 1. overload * 8 / 1000000 < MAX(1, min_member_mbps >> 6)
>     That will keep the calculations in bits and will match the old
>     behavior in case the link speed is not available or too low.
>     It's likely not possible to overflow the overload here.
>
> 2. Keep the calculations in bytes as in this patch.  This corresponds
>     to a link speed of 512 Mbps ((1 << 6) * 8), which maybe a little
>     too high.  100 Mbps links are still common on smaller devices.

I can switch to using bits instead of bytes [1] to save old behavior 
with 1Mbit/s, but is it really a significant change if we switch 1Mb to 
1MB (1% vs 1,048576% for the 100Mbit/s link)?

Also, when I changed this code I noticed some strange comparison logic 
(at least for me): in the left side we have total amount of data in 
bytes transferred in last rebalance interval (default 10s). And on the 
right side we compare it with % of a link speed in bytes *per second*. 
It that correct? Shouldn't we normalize left side and divide it by 
rebalance interval to compare data amount per second with data amount 
per second? This is why I changed 1Mbps to 1MB (without ps) in the 
comment below.

>
>>>               /* The extra load on 'from' (and all less-loaded members), 
>>> compared
>>>                * to that of 'to' (the least-loaded member), is less than 
>>> ~3%, or
>>> -             * it is less than ~1Mbps.  No point in rebalancing. */
>>> +             * it is less than MAX of 1MB and ~1.5% of minimum bond member
>> Should this comment be "1MB or ~1.5%" ?
> nit: Don't omit the 'per second' part of the M[Bb]ps.
>
>>
>> Cheers,
>> M
>>
>>> +             * link speed, if present.  No point in rebalancing. */
>>>               break;
>>>           }
>>>
>>> --
>>> 2.48.1
>>>
>>> _______________________________________________
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>

-- 
regards,
Vladislav Odintsov

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to