On 3/17/25 16:56, Mike Pattrick wrote: > On Fri, Mar 14, 2025 at 4:11 PM Vladislav Odintsov <vlodintsov@k2.cloud> > wrote: >> >> With this commit the rebalance of hash entries between bonding members >> becomes less frequent. If we know all bond members' speed, we do not >> move hash entries between them if load difference is less than 1.5% of >> minimum link speed of bond members. >> >> Reported-at: >> https://mail.openvswitch.org/pipermail/ovs-dev/2025-March/422028.html >> Suggested-by: Ilya Maximets <i.maxim...@ovn.org> >> Signed-off-by: Vladislav Odintsov <vlodintsov@k2.cloud> >> ---
Hi, Vladislav. Thanks for the patch! As Eelco mentioned, this will need an update in the documentation as well. In particular in Documentation/topics/bonding.rst. May also be good to add NEWS entry, since it's a user-visible change. >> ofproto/bond.c | 11 +++++++++-- >> 1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/ofproto/bond.c b/ofproto/bond.c >> index 17bf10be5..70219db89 100644 >> --- a/ofproto/bond.c >> +++ b/ofproto/bond.c >> @@ -1336,6 +1336,7 @@ bond_rebalance(struct bond *bond) >> struct ovs_list bals; >> bool rebalanced = false; >> bool use_recirc; >> + uint32_t min_member_mbps = UINT32_MAX; Reverse x-mass tree, please. It's not perfect here, but we shouldn't make it worse. >> >> ovs_rwlock_wrlock(&rwlock); >> if (!bond_is_balanced(bond) || time_msec() < bond->next_rebalance) { >> @@ -1378,7 +1379,11 @@ bond_rebalance(struct bond *bond) >> ovs_list_init(&bals); >> HMAP_FOR_EACH (member, hmap_node, &bond->members) { >> if (member->enabled) { >> + uint32_t mbps; >> + >> insert_bal(&bals, member); >> + netdev_get_speed(member->netdev, &mbps, NULL); > > This function may return an error. And also it may not return an error > but still set mbps to zero. We should probably discard the value in > those cases. It's not really necessary to check the result of this function here. If it failed, we'll just get the perdefined minimum value in the MAX condition below. We do similar thing in sFlow processing, for example. But I agree that it makes sense to add a comment about this. > >> + min_member_mbps = MIN(mbps, min_member_mbps); >> } >> } > > > Should probably also have some sane default if all the calls to > netdev_get_speed failed and min_member_mbps is still UINT32_MAX. In the current implementation, it can't end up to still be UINT32_MAX, unless there are no active members. But in that case the load shift below will also not run, as the list will be empty. > > >> log_bals(bond, &bals); >> @@ -1392,10 +1397,12 @@ bond_rebalance(struct bond *bond) >> uint64_t overload; >> >> overload = from->tx_bytes - to->tx_bytes; >> - if (overload < to->tx_bytes >> 5 || overload < 100000) { >> + if (overload < to->tx_bytes >> 5 || >> + overload / 1000000 < MAX(1, min_member_mbps / 8 >> 6)) { Overload is in bytes here, so the left side is in MBps (not Mbps). On the right side we have the link speed converted to MBps as well and then shifted to get ~1.5%. So, the compared values are of the same order, which is good. But the MAX is taken from 1 and it means that this 1 is 1 MBps, and not 1 Mbps as it was before. What we could do: 1. overload * 8 / 1000000 < MAX(1, min_member_mbps >> 6) That will keep the calculations in bits and will match the old behavior in case the link speed is not available or too low. It's likely not possible to overflow the overload here. 2. Keep the calculations in bytes as in this patch. This corresponds to a link speed of 512 Mbps ((1 << 6) * 8), which maybe a little too high. 100 Mbps links are still common on smaller devices. >> /* The extra load on 'from' (and all less-loaded members), >> compared >> * to that of 'to' (the least-loaded member), is less than ~3%, >> or >> - * it is less than ~1Mbps. No point in rebalancing. */ >> + * it is less than MAX of 1MB and ~1.5% of minimum bond member > > Should this comment be "1MB or ~1.5%" ? nit: Don't omit the 'per second' part of the M[Bb]ps. > > > Cheers, > M > >> + * link speed, if present. No point in rebalancing. */ >> break; >> } >> >> -- >> 2.48.1 >> >> _______________________________________________ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev