On 3/18/25 08:36, Vladislav Odintsov wrote:
> On 18 Mar 2025, at 10:12, Vladislav Odintsov <vlodintsov@k2.cloud> wrote:
>>
>> On 17.03.2025 22:26, Ilya Maximets wrote:
>>>> On 3/17/25 16:56, Mike Pattrick wrote:
>>>> On Fri, Mar 14, 2025 at 4:11 PM Vladislav Odintsov <vlodintsov@k2.cloud> 
>>>> wrote:
>>>>> With this commit the rebalance of hash entries between bonding members
>>>>> becomes less frequent.  If we know all bond members' speed, we do not
>>>>> move hash entries between them if load difference is less than 1.5% of
>>>>> minimum link speed of bond members.
>>>>>
>>>>> Reported-at: 
>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-March/422028.html
>>>>> Suggested-by: Ilya Maximets <i.maxim...@ovn.org>
>>>>> Signed-off-by: Vladislav Odintsov <vlodintsov@k2.cloud>
>>>>> ---
>>> Hi, Vladislav.  Thanks for the patch!
>>>
>>> As Eelco mentioned, this will need an update in the documentation as well.
>>> In particular in Documentation/topics/bonding.rst.
>>>
>>> May also be good to add NEWS entry, since it's a user-visible change.
>>
>> Hi Eelco, Mike and Ilya,
>>
>> thanks for your time and comments & reviews!
>>
>> I'll address most of the comments in v2. I've got some comments inline. PSB.
>>
>>>
>>>>>  ofproto/bond.c | 11 +++++++++--
>>>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>>>> index 17bf10be5..70219db89 100644
>>>>> --- a/ofproto/bond.c
>>>>> +++ b/ofproto/bond.c
>>>>> @@ -1336,6 +1336,7 @@ bond_rebalance(struct bond *bond)
>>>>>      struct ovs_list bals;
>>>>>      bool rebalanced = false;
>>>>>      bool use_recirc;
>>>>> +    uint32_t min_member_mbps = UINT32_MAX;
>>> Reverse x-mass tree, please.  It's not perfect here, but we shouldn't
>>> make it worse.
>> Ack.
>>>
>>>>>      ovs_rwlock_wrlock(&rwlock);
>>>>>      if (!bond_is_balanced(bond) || time_msec() < bond->next_rebalance) {
>>>>> @@ -1378,7 +1379,11 @@ bond_rebalance(struct bond *bond)
>>>>>      ovs_list_init(&bals);
>>>>>      HMAP_FOR_EACH (member, hmap_node, &bond->members) {
>>>>>          if (member->enabled) {
>>>>> +            uint32_t mbps;
>>>>> +
>>>>>              insert_bal(&bals, member);
>>>>> +            netdev_get_speed(member->netdev, &mbps, NULL);
>>>> This function may return an error. And also it may not return an error
>>>> but still set mbps to zero. We should probably discard the value in
>>>> those cases.
>>> It's not really necessary to check the result of this function here.
>>> If it failed, we'll just get the perdefined minimum value in the MAX
>>> condition below.  We do similar thing in sFlow processing, for example.
>>>
>>> But I agree that it makes sense to add a comment about this.
>> Ack.
>>>
>>>>> +            min_member_mbps = MIN(mbps, min_member_mbps);
>>>>>          }
>>>>>      }
>>>>
>>>> Should probably also have some sane default if all the calls to
>>>> netdev_get_speed failed and min_member_mbps is still UINT32_MAX.
>>> In the current implementation, it can't end up to still be UINT32_MAX,
>>> unless there are no active members.  But in that case the load shift
>>> below will also not run, as the list will be empty.
>>>
>>>>
>>>>>      log_bals(bond, &bals);
>>>>> @@ -1392,10 +1397,12 @@ bond_rebalance(struct bond *bond)
>>>>>          uint64_t overload;
>>>>>
>>>>>          overload = from->tx_bytes - to->tx_bytes;
>>>>> -        if (overload < to->tx_bytes >> 5 || overload < 100000) {
>>>>> +        if (overload < to->tx_bytes >> 5 ||
>>>>> +            overload / 1000000 < MAX(1, min_member_mbps / 8 >> 6)) {
>>> Overload is in bytes here, so the left side is in MBps (not Mbps).
>>> On the right side we have the link speed converted to MBps as well
>>> and then shifted to get ~1.5%.  So, the compared values are of the
>>> same order, which is good.  But the MAX is taken from 1 and it means
>>> that this 1 is 1 MBps, and not 1 Mbps as it was before.
>>>
>>> What we could do:
>>>
>>> 1. overload * 8 / 1000000 < MAX(1, min_member_mbps >> 6)
>>>    That will keep the calculations in bits and will match the old
>>>    behavior in case the link speed is not available or too low.
>>>    It's likely not possible to overflow the overload here.
>>>
>>> 2. Keep the calculations in bytes as in this patch.  This corresponds
>>>    to a link speed of 512 Mbps ((1 << 6) * 8), which maybe a little
>>>    too high.  100 Mbps links are still common on smaller devices.
>>
>> I can switch to using bits instead of bytes [1] to save old behavior
>> with 1Mbit/s, but is it really a significant change if we switch 1Mb to
>> 1MB (1% vs 1,048576% for the 100Mbit/s link)?
> 
> Sorry, my calculation is incorrect here. I was thinking about power of
> two vs power of ten. Sure, there is 1 vs 8 % for 100Mbit/s link. It is
> quite significant.

Yep.

> 
>>
>> Also, when I changed this code I noticed some strange comparison logic
>> (at least for me): in the left side we have total amount of data in
>> bytes transferred in last rebalance interval (default 10s). And on the
>> right side we compare it with % of a link speed in bytes *per second*.
>> It that correct? Shouldn't we normalize left side and divide it by
>> rebalance interval to compare data amount per second with data amount
>> per second? This is why I changed 1Mbps to 1MB (without ps) in the
>> comment below.

Hmm.  That's a good point.  I suppose we're indeed still comparing apples to
oranges here.  So, maybe it was a dumb idea on my end in the first place. :)

What we may do, I think, is to change how we calculate the load.  For
comparison with the link speed to make any sense, we need the load to be an
average traffic rate for the hash.  As you said, we need to convert the total
bytes on the hash into average speed for the last rebalancing interval by
dividing the byte difference from the previous stats check by the rebalancing
interval.  And then we need to calculate a moving average of the traffic rate
using those values instead of a basic / 2 decay that we have today.  We have
a library for the moving average that we can use here.  It will be a more
significant logic change though.

Do you want to take a stab at it?

Best regards, Ilya Maximets.

>>
>>>
>>>>>              /* The extra load on 'from' (and all less-loaded members), 
>>>>> compared
>>>>>               * to that of 'to' (the least-loaded member), is less than 
>>>>> ~3%, or
>>>>> -             * it is less than ~1Mbps.  No point in rebalancing. */
>>>>> +             * it is less than MAX of 1MB and ~1.5% of minimum bond 
>>>>> member
>>>> Should this comment be "1MB or ~1.5%" ?
>>> nit: Don't omit the 'per second' part of the M[Bb]ps.
>>>
>>>>
>>>> Cheers,
>>>> M
>>>>
>>>>> +             * link speed, if present.  No point in rebalancing. */
>>>>>              break;
>>>>>          }
>>>>>
>>>>> --
>>>>> 2.48.1
>>>>>
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> d...@openvswitch.org
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>
>>
>> --
>> regards,
>> Vladislav Odintsov
>>
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to