On 17.03.2025 22:26, Ilya Maximets wrote: > On 3/17/25 16:56, Mike Pattrick wrote: >> On Fri, Mar 14, 2025 at 4:11 PM Vladislav Odintsov <vlodintsov@k2.cloud> >> wrote: >>> With this commit the rebalance of hash entries between bonding members >>> becomes less frequent. If we know all bond members' speed, we do not >>> move hash entries between them if load difference is less than 1.5% of >>> minimum link speed of bond members. >>> >>> Reported-at: >>> https://mail.openvswitch.org/pipermail/ovs-dev/2025-March/422028.html >>> Suggested-by: Ilya Maximets <i.maxim...@ovn.org> >>> Signed-off-by: Vladislav Odintsov <vlodintsov@k2.cloud> >>> --- > Hi, Vladislav. Thanks for the patch! > > As Eelco mentioned, this will need an update in the documentation as well. > In particular in Documentation/topics/bonding.rst. > > May also be good to add NEWS entry, since it's a user-visible change.
Hi Eelco, Mike and Ilya, thanks for your time and comments & reviews! I'll address most of the comments in v2. I've got some comments inline. PSB. > >>> ofproto/bond.c | 11 +++++++++-- >>> 1 file changed, 9 insertions(+), 2 deletions(-) >>> >>> diff --git a/ofproto/bond.c b/ofproto/bond.c >>> index 17bf10be5..70219db89 100644 >>> --- a/ofproto/bond.c >>> +++ b/ofproto/bond.c >>> @@ -1336,6 +1336,7 @@ bond_rebalance(struct bond *bond) >>> struct ovs_list bals; >>> bool rebalanced = false; >>> bool use_recirc; >>> + uint32_t min_member_mbps = UINT32_MAX; > Reverse x-mass tree, please. It's not perfect here, but we shouldn't > make it worse. Ack. > >>> ovs_rwlock_wrlock(&rwlock); >>> if (!bond_is_balanced(bond) || time_msec() < bond->next_rebalance) { >>> @@ -1378,7 +1379,11 @@ bond_rebalance(struct bond *bond) >>> ovs_list_init(&bals); >>> HMAP_FOR_EACH (member, hmap_node, &bond->members) { >>> if (member->enabled) { >>> + uint32_t mbps; >>> + >>> insert_bal(&bals, member); >>> + netdev_get_speed(member->netdev, &mbps, NULL); >> This function may return an error. And also it may not return an error >> but still set mbps to zero. We should probably discard the value in >> those cases. > It's not really necessary to check the result of this function here. > If it failed, we'll just get the perdefined minimum value in the MAX > condition below. We do similar thing in sFlow processing, for example. > > But I agree that it makes sense to add a comment about this. Ack. > >>> + min_member_mbps = MIN(mbps, min_member_mbps); >>> } >>> } >> >> Should probably also have some sane default if all the calls to >> netdev_get_speed failed and min_member_mbps is still UINT32_MAX. > In the current implementation, it can't end up to still be UINT32_MAX, > unless there are no active members. But in that case the load shift > below will also not run, as the list will be empty. > >> >>> log_bals(bond, &bals); >>> @@ -1392,10 +1397,12 @@ bond_rebalance(struct bond *bond) >>> uint64_t overload; >>> >>> overload = from->tx_bytes - to->tx_bytes; >>> - if (overload < to->tx_bytes >> 5 || overload < 100000) { >>> + if (overload < to->tx_bytes >> 5 || >>> + overload / 1000000 < MAX(1, min_member_mbps / 8 >> 6)) { > Overload is in bytes here, so the left side is in MBps (not Mbps). > On the right side we have the link speed converted to MBps as well > and then shifted to get ~1.5%. So, the compared values are of the > same order, which is good. But the MAX is taken from 1 and it means > that this 1 is 1 MBps, and not 1 Mbps as it was before. > > What we could do: > > 1. overload * 8 / 1000000 < MAX(1, min_member_mbps >> 6) > That will keep the calculations in bits and will match the old > behavior in case the link speed is not available or too low. > It's likely not possible to overflow the overload here. > > 2. Keep the calculations in bytes as in this patch. This corresponds > to a link speed of 512 Mbps ((1 << 6) * 8), which maybe a little > too high. 100 Mbps links are still common on smaller devices. I can switch to using bits instead of bytes [1] to save old behavior with 1Mbit/s, but is it really a significant change if we switch 1Mb to 1MB (1% vs 1,048576% for the 100Mbit/s link)? Also, when I changed this code I noticed some strange comparison logic (at least for me): in the left side we have total amount of data in bytes transferred in last rebalance interval (default 10s). And on the right side we compare it with % of a link speed in bytes *per second*. It that correct? Shouldn't we normalize left side and divide it by rebalance interval to compare data amount per second with data amount per second? This is why I changed 1Mbps to 1MB (without ps) in the comment below. > >>> /* The extra load on 'from' (and all less-loaded members), >>> compared >>> * to that of 'to' (the least-loaded member), is less than >>> ~3%, or >>> - * it is less than ~1Mbps. No point in rebalancing. */ >>> + * it is less than MAX of 1MB and ~1.5% of minimum bond member >> Should this comment be "1MB or ~1.5%" ? > nit: Don't omit the 'per second' part of the M[Bb]ps. > >> >> Cheers, >> M >> >>> + * link speed, if present. No point in rebalancing. */ >>> break; >>> } >>> >>> -- >>> 2.48.1 >>> >>> _______________________________________________ >>> dev mailing list >>> d...@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> -- regards, Vladislav Odintsov _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev