On 18 Mar 2025, at 10:12, Vladislav Odintsov <vlodintsov@k2.cloud> wrote: > > On 17.03.2025 22:26, Ilya Maximets wrote: >>> On 3/17/25 16:56, Mike Pattrick wrote: >>> On Fri, Mar 14, 2025 at 4:11 PM Vladislav Odintsov <vlodintsov@k2.cloud> >>> wrote: >>>> With this commit the rebalance of hash entries between bonding members >>>> becomes less frequent. If we know all bond members' speed, we do not >>>> move hash entries between them if load difference is less than 1.5% of >>>> minimum link speed of bond members. >>>> >>>> Reported-at: >>>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-March/422028.html >>>> Suggested-by: Ilya Maximets <i.maxim...@ovn.org> >>>> Signed-off-by: Vladislav Odintsov <vlodintsov@k2.cloud> >>>> --- >> Hi, Vladislav. Thanks for the patch! >> >> As Eelco mentioned, this will need an update in the documentation as well. >> In particular in Documentation/topics/bonding.rst. >> >> May also be good to add NEWS entry, since it's a user-visible change. > > Hi Eelco, Mike and Ilya, > > thanks for your time and comments & reviews! > > I'll address most of the comments in v2. I've got some comments inline. PSB. > >> >>>> ofproto/bond.c | 11 +++++++++-- >>>> 1 file changed, 9 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/ofproto/bond.c b/ofproto/bond.c >>>> index 17bf10be5..70219db89 100644 >>>> --- a/ofproto/bond.c >>>> +++ b/ofproto/bond.c >>>> @@ -1336,6 +1336,7 @@ bond_rebalance(struct bond *bond) >>>> struct ovs_list bals; >>>> bool rebalanced = false; >>>> bool use_recirc; >>>> + uint32_t min_member_mbps = UINT32_MAX; >> Reverse x-mass tree, please. It's not perfect here, but we shouldn't >> make it worse. > Ack. >> >>>> ovs_rwlock_wrlock(&rwlock); >>>> if (!bond_is_balanced(bond) || time_msec() < bond->next_rebalance) { >>>> @@ -1378,7 +1379,11 @@ bond_rebalance(struct bond *bond) >>>> ovs_list_init(&bals); >>>> HMAP_FOR_EACH (member, hmap_node, &bond->members) { >>>> if (member->enabled) { >>>> + uint32_t mbps; >>>> + >>>> insert_bal(&bals, member); >>>> + netdev_get_speed(member->netdev, &mbps, NULL); >>> This function may return an error. And also it may not return an error >>> but still set mbps to zero. We should probably discard the value in >>> those cases. >> It's not really necessary to check the result of this function here. >> If it failed, we'll just get the perdefined minimum value in the MAX >> condition below. We do similar thing in sFlow processing, for example. >> >> But I agree that it makes sense to add a comment about this. > Ack. >> >>>> + min_member_mbps = MIN(mbps, min_member_mbps); >>>> } >>>> } >>> >>> Should probably also have some sane default if all the calls to >>> netdev_get_speed failed and min_member_mbps is still UINT32_MAX. >> In the current implementation, it can't end up to still be UINT32_MAX, >> unless there are no active members. But in that case the load shift >> below will also not run, as the list will be empty. >> >>> >>>> log_bals(bond, &bals); >>>> @@ -1392,10 +1397,12 @@ bond_rebalance(struct bond *bond) >>>> uint64_t overload; >>>> >>>> overload = from->tx_bytes - to->tx_bytes; >>>> - if (overload < to->tx_bytes >> 5 || overload < 100000) { >>>> + if (overload < to->tx_bytes >> 5 || >>>> + overload / 1000000 < MAX(1, min_member_mbps / 8 >> 6)) { >> Overload is in bytes here, so the left side is in MBps (not Mbps). >> On the right side we have the link speed converted to MBps as well >> and then shifted to get ~1.5%. So, the compared values are of the >> same order, which is good. But the MAX is taken from 1 and it means >> that this 1 is 1 MBps, and not 1 Mbps as it was before. >> >> What we could do: >> >> 1. overload * 8 / 1000000 < MAX(1, min_member_mbps >> 6) >> That will keep the calculations in bits and will match the old >> behavior in case the link speed is not available or too low. >> It's likely not possible to overflow the overload here. >> >> 2. Keep the calculations in bytes as in this patch. This corresponds >> to a link speed of 512 Mbps ((1 << 6) * 8), which maybe a little >> too high. 100 Mbps links are still common on smaller devices. > > I can switch to using bits instead of bytes [1] to save old behavior > with 1Mbit/s, but is it really a significant change if we switch 1Mb to > 1MB (1% vs 1,048576% for the 100Mbit/s link)?
Sorry, my calculation is incorrect here. I was thinking about power of two vs power of ten. Sure, there is 1 vs 8 % for 100Mbit/s link. It is quite significant. > > Also, when I changed this code I noticed some strange comparison logic > (at least for me): in the left side we have total amount of data in > bytes transferred in last rebalance interval (default 10s). And on the > right side we compare it with % of a link speed in bytes *per second*. > It that correct? Shouldn't we normalize left side and divide it by > rebalance interval to compare data amount per second with data amount > per second? This is why I changed 1Mbps to 1MB (without ps) in the > comment below. > >> >>>> /* The extra load on 'from' (and all less-loaded members), >>>> compared >>>> * to that of 'to' (the least-loaded member), is less than >>>> ~3%, or >>>> - * it is less than ~1Mbps. No point in rebalancing. */ >>>> + * it is less than MAX of 1MB and ~1.5% of minimum bond member >>> Should this comment be "1MB or ~1.5%" ? >> nit: Don't omit the 'per second' part of the M[Bb]ps. >> >>> >>> Cheers, >>> M >>> >>>> + * link speed, if present. No point in rebalancing. */ >>>> break; >>>> } >>>> >>>> -- >>>> 2.48.1 >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> d...@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> > > -- > regards, > Vladislav Odintsov > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev