On 18 Mar 2025, at 10:12, Vladislav Odintsov <vlodintsov@k2.cloud> wrote:
> 
> On 17.03.2025 22:26, Ilya Maximets wrote:
>>> On 3/17/25 16:56, Mike Pattrick wrote:
>>> On Fri, Mar 14, 2025 at 4:11 PM Vladislav Odintsov <vlodintsov@k2.cloud> 
>>> wrote:
>>>> With this commit the rebalance of hash entries between bonding members
>>>> becomes less frequent.  If we know all bond members' speed, we do not
>>>> move hash entries between them if load difference is less than 1.5% of
>>>> minimum link speed of bond members.
>>>> 
>>>> Reported-at: 
>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-March/422028.html
>>>> Suggested-by: Ilya Maximets <i.maxim...@ovn.org>
>>>> Signed-off-by: Vladislav Odintsov <vlodintsov@k2.cloud>
>>>> ---
>> Hi, Vladislav.  Thanks for the patch!
>> 
>> As Eelco mentioned, this will need an update in the documentation as well.
>> In particular in Documentation/topics/bonding.rst.
>> 
>> May also be good to add NEWS entry, since it's a user-visible change.
> 
> Hi Eelco, Mike and Ilya,
> 
> thanks for your time and comments & reviews!
> 
> I'll address most of the comments in v2. I've got some comments inline. PSB.
> 
>> 
>>>>  ofproto/bond.c | 11 +++++++++--
>>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>>> index 17bf10be5..70219db89 100644
>>>> --- a/ofproto/bond.c
>>>> +++ b/ofproto/bond.c
>>>> @@ -1336,6 +1336,7 @@ bond_rebalance(struct bond *bond)
>>>>      struct ovs_list bals;
>>>>      bool rebalanced = false;
>>>>      bool use_recirc;
>>>> +    uint32_t min_member_mbps = UINT32_MAX;
>> Reverse x-mass tree, please.  It's not perfect here, but we shouldn't
>> make it worse.
> Ack.
>> 
>>>>      ovs_rwlock_wrlock(&rwlock);
>>>>      if (!bond_is_balanced(bond) || time_msec() < bond->next_rebalance) {
>>>> @@ -1378,7 +1379,11 @@ bond_rebalance(struct bond *bond)
>>>>      ovs_list_init(&bals);
>>>>      HMAP_FOR_EACH (member, hmap_node, &bond->members) {
>>>>          if (member->enabled) {
>>>> +            uint32_t mbps;
>>>> +
>>>>              insert_bal(&bals, member);
>>>> +            netdev_get_speed(member->netdev, &mbps, NULL);
>>> This function may return an error. And also it may not return an error
>>> but still set mbps to zero. We should probably discard the value in
>>> those cases.
>> It's not really necessary to check the result of this function here.
>> If it failed, we'll just get the perdefined minimum value in the MAX
>> condition below.  We do similar thing in sFlow processing, for example.
>> 
>> But I agree that it makes sense to add a comment about this.
> Ack.
>> 
>>>> +            min_member_mbps = MIN(mbps, min_member_mbps);
>>>>          }
>>>>      }
>>> 
>>> Should probably also have some sane default if all the calls to
>>> netdev_get_speed failed and min_member_mbps is still UINT32_MAX.
>> In the current implementation, it can't end up to still be UINT32_MAX,
>> unless there are no active members.  But in that case the load shift
>> below will also not run, as the list will be empty.
>> 
>>> 
>>>>      log_bals(bond, &bals);
>>>> @@ -1392,10 +1397,12 @@ bond_rebalance(struct bond *bond)
>>>>          uint64_t overload;
>>>> 
>>>>          overload = from->tx_bytes - to->tx_bytes;
>>>> -        if (overload < to->tx_bytes >> 5 || overload < 100000) {
>>>> +        if (overload < to->tx_bytes >> 5 ||
>>>> +            overload / 1000000 < MAX(1, min_member_mbps / 8 >> 6)) {
>> Overload is in bytes here, so the left side is in MBps (not Mbps).
>> On the right side we have the link speed converted to MBps as well
>> and then shifted to get ~1.5%.  So, the compared values are of the
>> same order, which is good.  But the MAX is taken from 1 and it means
>> that this 1 is 1 MBps, and not 1 Mbps as it was before.
>> 
>> What we could do:
>> 
>> 1. overload * 8 / 1000000 < MAX(1, min_member_mbps >> 6)
>>    That will keep the calculations in bits and will match the old
>>    behavior in case the link speed is not available or too low.
>>    It's likely not possible to overflow the overload here.
>> 
>> 2. Keep the calculations in bytes as in this patch.  This corresponds
>>    to a link speed of 512 Mbps ((1 << 6) * 8), which maybe a little
>>    too high.  100 Mbps links are still common on smaller devices.
> 
> I can switch to using bits instead of bytes [1] to save old behavior
> with 1Mbit/s, but is it really a significant change if we switch 1Mb to
> 1MB (1% vs 1,048576% for the 100Mbit/s link)?

Sorry, my calculation is incorrect here. I was thinking about power of two vs 
power of ten. Sure, there is 1 vs 8 % for 100Mbit/s link. It is quite 
significant.

> 
> Also, when I changed this code I noticed some strange comparison logic
> (at least for me): in the left side we have total amount of data in
> bytes transferred in last rebalance interval (default 10s). And on the
> right side we compare it with % of a link speed in bytes *per second*.
> It that correct? Shouldn't we normalize left side and divide it by
> rebalance interval to compare data amount per second with data amount
> per second? This is why I changed 1Mbps to 1MB (without ps) in the
> comment below.
> 
>> 
>>>>              /* The extra load on 'from' (and all less-loaded members), 
>>>> compared
>>>>               * to that of 'to' (the least-loaded member), is less than 
>>>> ~3%, or
>>>> -             * it is less than ~1Mbps.  No point in rebalancing. */
>>>> +             * it is less than MAX of 1MB and ~1.5% of minimum bond member
>>> Should this comment be "1MB or ~1.5%" ?
>> nit: Don't omit the 'per second' part of the M[Bb]ps.
>> 
>>> 
>>> Cheers,
>>> M
>>> 
>>>> +             * link speed, if present.  No point in rebalancing. */
>>>>              break;
>>>>          }
>>>> 
>>>> --
>>>> 2.48.1
>>>> 
>>>> _______________________________________________
>>>> dev mailing list
>>>> d...@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>> 
> 
> --
> regards,
> Vladislav Odintsov
> 
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to