On 14.03.2025 16:43, Ilya Maximets wrote: > On 3/14/25 13:03, Ilya Maximets wrote: >> On 3/14/25 10:26, Vladislav Odintsov wrote: >>> On 13.03.2025 22:05, Ilya Maximets wrote: >>>> On 3/12/25 19:07, Vladislav Odintsov wrote: >>>>> Sorry, I hit the send button too early. >>>>> >>>>> On 12.03.2025 20:49, Vladislav Odintsov wrote: >>>>>> On 12.03.2025 19:16, Ilya Maximets wrote: >>>>>>> On 3/12/25 16:01, Vladislav Odintsov wrote: >>>>>>>> On 12.03.2025 17:45, Ilya Maximets wrote: >>>>>>>>> On 3/12/25 12:41, Vladislav Odintsov wrote: >>>>>>>>>> On 12.03.2025 14:11, Ilya Maximets wrote: >>>>>>>>>>> On 3/12/25 11:22, Vladislav Odintsov wrote: >>>>>>>>>>>> Hi Han, Ilya and Adrian, >>>>>>>>>>>> >>>>>>>>>>>> I'm also suffering from these messages in logs. Han, do you have >>>>>>>>>>>> plans >>>>>>>>>>>> on working with requested changes? >>>>>>>>>>>> >>>>>>>>>>>> If you've got no time at the moment, I can continue this work >>>>>>>>>>>> addressing >>>>>>>>>>>> reviewers' comments. Ilya, you said 1kB is a small chunk of data >>>>>>>>>>>> and we >>>>>>>>>>>> shouldn't care too much about it. Which value is okay to ignore >>>>>>>>>>>> from >>>>>>>>>>>> your perspective? >>>>>>>>>>> Are you suffering from excessive logging or from not enough >>>>>>>>>>> granularity >>>>>>>>>>> in the logs? This patch increases the accuracy and doesn't >>>>>>>>>>> reduce the >>>>>>>>>>> amount of logs. >>>>>>>>>>> >>>>>>>>>>> If the amount of logs is a problem, then maybe we should move the >>>>>>>>>>> logs >>>>>>>>>>> for individual load shifts to DBG level (maybe keep a >>>>>>>>>>> particularly heavy >>>>>>>>>>> ones like >1MB move at INFO) and have a separate overview of the >>>>>>>>>>> change >>>>>>>>>>> instead showing the load distribution after the rebalancing, e.g.: >>>>>>>>>>> >>>>>>>>>>> bond X: rebalanced (now carrying: portA:XXXkB, portB:YYYkB, >>>>>>>>>>> portC:ZZZkB) >>>>>>>>>>> >>>>>>>>>>> For the debug logs we could improve the accuracy as this patch does. >>>>>>>>>>> >>>>>>>>>>> What do you think? >>>>>>>>>> We suffer from excessive logging of "0 kB" records. There are tons of >>>>>>>>>> them. So I planned to take your proposal and do not to log these >>>>>>>>>> events >>>>>>>>>> at all. But I like your last idea about hiding messages of >>>>>>>>>> re-balancing >>>>>>>>>> <1MB under DBG log level and keep INFO for others. >>>>>>>>>> >>>>>>>>>> If you guys are okay with that, I can prepare and send a patch. >>>>>>>>> I think, if we're hiding some information under DBG, then we need >>>>>>>>> to add >>>>>>>>> a new log with an overview of the rebalance with a new load >>>>>>>>> distribution. >>>>>>>>> With that, sounds fine to me. >>>>>>>> Did you mean that we should log total cumulative re-balanced amount >>>>>>>> once >>>>>>>> in a timeframe (let say 5-15 minutes) or just log with some rate-limit? >>>>>>> The total cumulative amount at the end of >>>>>>> ofproto/bond.c:bond_rebalance() >>>>>>> if rebalanced == true. See the example log above. So, it the log >>>>>>> should >>>>>>> appear at most once per rebalance interval, which is 10s by default. >>>>>>> >>>>>>> The current 'shift X of load' log will remain but at DBG level, >>>>>>> except for >>>>>>> a very large ones. >>>>>>> >>>>>>> Will that solve your problem, or is that still too much of logging (once >>>>>>> in 10s) ? >>>>>> 50/50 :) >>>>>> >>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | wc -l >>>>>> 47388 >>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift -c >>>>>> 47339 >>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep 'shift >>>>>> 0kB' | wc -l >>>>>> 20384 >>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift | >>>>>> cut -d '|' -f1 | uniq -c | sort -n | sed -e 's/^[[:space:]]*//' | cut >>>>>> -d ' ' -f1 | uniq -c | sort -n >>>>>> 1 102 >>>>>> 1 108 >>>>>> 1 109 >>>> <snip> >>>> >>>>>> 70 6 >>>>>> 112 5 >>>>>> 205 4 >>>>>> 449 3 >>>>>> 741 1 >>>>>> 1034 2 >>>>>> >>>>>> The last command shows the amount (1st column) of shift events in one >>>>>> microsecond. Like this: >>>>> Sorry, there was an error: >>>>> >>>>> The *second* column (value) shows the total amount of shift log lines >>>>> occurred in any one millisecond in a log by last day. >>>>> The *first* column (key) shows the total times in a log by last day >>>>> where there was a (value) events in one millisecond. >>>>> >>>>> Like this: >>>>> >>>>> 2025-03-12T17:49:12.307Z|2772298|bond|INFO|bond directconnect: shift >>>>> 407kB of load (with hash 40) from eth6 to eth7 (now carrying 1283kB and >>>>> 935kB load, respectively) >>>>> 2025-03-12T17:49:12.307Z|2772299|bond|INFO|bond directconnect: shift 1kB >>>>> of load (with hash 82) from eth6 to eth7 (now carrying 1282kB and 936kB >>>>> load, respectively) >>>>> 2025-03-12T17:49:12.307Z|2772300|bond|INFO|bond directconnect: shift 0kB >>>>> of load (with hash 85) from eth6 to eth7 (now carrying 1281kB and 937kB >>>>> load, respectively) >>>>> 2025-03-12T17:49:12.307Z|2772301|bond|INFO|bond directconnect: shift 0kB >>>>> of load (with hash 45) from eth6 to eth7 (now carrying 1280kB and 938kB >>>>> load, respectively) >>>>> 2025-03-12T17:49:12.307Z|2772302|bond|INFO|bond directconnect: shift 0kB >>>>> of load (with hash 233) from eth6 to eth7 (now carrying 1280kB and 938kB >>>>> load, respectively) >>>>> 2025-03-12T17:49:12.307Z|2772303|bond|INFO|bond directconnect: shift 0kB >>>>> of load (with hash 19) from eth6 to eth7 (now carrying 1279kB and 938kB >>>>> load, respectively) >>>>> 2025-03-12T17:49:12.307Z|2772304|bond|INFO|bond directconnect: shift 0kB >>>>> of load (with hash 1) from eth6 to eth7 (now carrying 1279kB and 939kB >>>>> load, respectively) >>>>> 2025-03-12T17:49:12.307Z|2772305|bond|INFO|bond directconnect: shift 0kB >>>>> of load (with hash 227) from eth6 to eth7 (now carrying 1279kB and 939kB >>>>> load, respectively) >>>>> 2025-03-12T17:49:12.307Z|2772306|bond|INFO|bond directconnect: shift 0kB >>>>> of load (with hash 76) from eth6 to eth7 (now carrying 1279kB and 939kB >>>>> load, respectively) >>>>> 2025-03-12T17:49:12.307Z|2772307|bond|INFO|bond directconnect: shift 0kB >>>>> of load (with hash 14) from eth6 to eth7 (now carrying 1279kB and 939kB >>>>> load, respectively) >>>>> >>>>> This server lightly loaded (<100mb/s of network traffic). >>>>> >>>>> Additionally: >>>>> >>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift | >>>>> cut -d '|' -f1 | uniq -c | sort -n >>>>> ...(a lot of lines). >>>>> 243 2025-03-12T06:32:25.462Z >>>>> 243 2025-03-12T06:47:36.978Z >>>>> 243 2025-03-12T15:28:17.831Z >>>>> 244 2025-03-12T07:50:35.699Z >>>>> 244 2025-03-12T13:22:43.546Z >>>>> 245 2025-03-12T12:46:39.905Z >>>>> 247 2025-03-12T06:51:57.431Z >>>>> 247 2025-03-12T08:49:12.899Z >>>>> 248 2025-03-12T07:24:31.333Z >>>>> 248 2025-03-12T14:40:52.260Z >>>>> 249 2025-03-12T13:24:33.725Z >>>>> 249 2025-03-12T14:23:29.867Z >>>>> 250 2025-03-12T08:22:30.169Z >>>>> 250 2025-03-12T13:37:25.005Z >>>>> 251 2025-03-12T07:15:20.215Z >>>>> 251 2025-03-12T13:10:22.321Z >>>>> 251 2025-03-12T14:24:39.985Z >>>>> 252 2025-03-12T07:57:47.565Z >>>>> 252 2025-03-12T14:40:32.224Z >>>>> 252 2025-03-12T15:26:17.663Z >>>>> 254 2025-03-12T08:05:08.456Z >>>>> 254 2025-03-12T08:21:50.104Z >>>>> 254 2025-03-12T12:42:09.455Z >>>>> >>>>> The 1st column here is the total shift log lines from 1 millisecond >>>>> (shown in 2nd column). >>>>> >>>>> So, from my example, if there is a maximum 1 log line per 10s, the >>>>> logfile still will be filled with only rebalance lines, but the total >>>>> amount will be noticeably reduced. >>>>> >>>>> By the way, why do we need this information in log? What can we conclude >>>>> from this information? >>>> This information is useful for correlation with networking issues. We had >>>> a >>>> few cases in the past where some weird behavior happened exactly right >>>> after >>>> rebalancing and having rebalancing logs was crucial to figure that out. >>>> >>>> The proposed change will significantly reduce the amount of logging for >>>> you, >>>> it seems, so I'd say we should proceed with that. >>>> >>>> The next step, I think, would be to increase the rebalancing thresholds. >>>> Today we do not rebalance if the overload is less than 3% between ports or >>>> less than ~1Mbps. On modern hardware, 1Mbps is practically nothing and >>>> even >>>> 10 or 100 Mbps may not be worth rebalancing in some cases. So, instead of >>>> hardcoding the 1Mbps threshold we may get a MIN of the link speeds of the >>>> ports and use 1-3% of that as a threshold. On your lightly loaded system >>>> that will likely eliminate most or all the rebalancing altogether. >>>> >>>> It will be a little tricky, since it's not always possible to get the link >>>> speed and the link speed can be changed dynamically, but should generally >>>> be doable. >>>> >>>> This way we'll keep the logs, but avoid a log spam for all cases, except >>>> the ones where rebalancing is actually necessary. >>>> >>>> WDYT? >>> Sounds good for me. >>> >>> I'll post logging changes shortly, since Han is not answering and the >>> final solution is different from original. >>> >>> Regarding rebalance threshold, did you mean something like this? >> Yes, something like this. I thought to store the speed in the member >> structure >> initially, but it creates challenges if the speed changes, and it's probably >> not >> a big deal to check the speed once in the rebalancing interval. So, the >> solution >> below might be fine. I didn't check the math in the code below, but see some >> comments inline.
Thanks for the review! I'll address your comments and send out the patch series. >> >>> diff --git a/ofproto/bond.c b/ofproto/bond.c >>> index 45a36fabb..083c0be40 100644 >>> --- a/ofproto/bond.c >>> +++ b/ofproto/bond.c >>> @@ -1371,6 +1371,8 @@ bond_rebalance(struct bond *bond) >>> } >>> } >>> >>> + uint32_t mbps; >> This is temporary and can be defined inside the if condition in the loop. >> >>> + uint64_t min_member_bps = UINT64_MAX; >> An empty line would be nice. But this may be moved to the top as well, >> since this variable is used not only for the closest loop and this function >> generally defines most of its variables at the top. >> >>> /* Add enabled members to 'bals' in descending order of tx_bytes. >>> * >>> * XXX This is O(n**2) in the number of members but it could be >>> O(n lg n) >>> @@ -1379,6 +1381,10 @@ bond_rebalance(struct bond *bond) >>> HMAP_FOR_EACH (member, hmap_node, &bond->members) { >>> if (member->enabled) { >>> insert_bal(&bals, member); >>> + netdev_get_speed(member->netdev, &mbps, NULL); >>> + if (min_member_bps > mbps) { >>> + min_member_bps = mbps * 1000000; >>> + } >> The comparison above is not correct as it compares apples (bps) to >> oranges (Mbps). Use the MIN macro to avoid this issue. You're right. Initially I compared mbps, but then changed variable to bps and lost this place. Thanks for the hint, I'll fix this. > Also, this multiplication will easily overflow uin32_t, so UINT64_C(1000000) > should be used instead to avoid type conversion issues. Ideally, I'd suggest > to keep the value in Mbps and divide the overload if necessary instead. > We do not care much about accuracy on small values, but we do care about not > cropping the large ones. You're right. Will fix that. > >>> } >>> } >>> log_bals(bond, &bals); >>> @@ -1392,10 +1398,13 @@ bond_rebalance(struct bond *bond) >>> uint64_t overload; >>> >>> overload = from->tx_bytes - to->tx_bytes; >>> - if (overload < to->tx_bytes >> 5 || overload < 100000) { >>> + if (overload < to->tx_bytes >> 5 || >>> + overload < 100000 || >>> + overload * 8 < min_member_bps >> 6) { /* speed can be 0. */ >> Comments should start with a capital letter. But also, I'd suggest we just >> remove the 1Mbps condition above and instead do mbps = MAX(mbps, <some >> value>) >> at the point of getting the speed, where <some value> will be the link speed >> that will result in an equivalent 1Mbps overload check. Ack. >> >>> /* The extra load on 'from' (and all less-loaded members), >>> compared >>> * to that of 'to' (the least-loaded member), is less than >>> ~3%, or >>> - * it is less than ~1Mbps. No point in rebalancing. */ >>> + * it is less than ~1Mbps, or this extra load is < ~1.5% of >>> minimum >>> + bond member link speed, if present. No point in >>> rebalancing. */ >>> break; >>> } >>> >>> If yes, I can send both patches in a series. >> Sure. >> >>>>>> I guess we should not log 0kb movement at all. >>>>>> >>>>>>>>> Note: chinatelecom.cn mail server doesn't seem to like >>>>>>>>> communicating with >>>>>>>>> gmail smtp, and so my replies likely do not reach Han. Let's wait >>>>>>>>> maybe a >>>>>>>>> couple days for a reply, in case at least some of these messages >>>>>>>>> are delivered. >>>>>>>>> >>>>>>>>>>> Best regards, Ilya Maximets. >>>>>>>>>>> >>>>>>>>>>>> On 05.08.2024 10:08, Adrián Moreno wrote: >>>>>>>>>>>>> On Fri, Aug 02, 2024 at 06:38:19PM GMT, Ilya Maximets wrote: >>>>>>>>>>>>>> On 7/2/24 16:36, Han Ding wrote: >>>>>>>>>>>>>>> When the delta is less than 1024 in bond_shift_load, it print >>>>>>>>>>>>>>> "shift 0kB of load". >>>>>>>>>>>>>>> Like this: >>>>>>>>>>>>>>> bond dpdkbond0: shift 0kB of load (with hash 71) from nic1 to >>>>>>>>>>>>>>> nic2 (now carrying 20650165kB and 8311662kB load, respectively) >>>>>>>>>>>>>> Hi, Han. Thanks for the patch! >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm curious, is this information about movements that small >>>>>>>>>>>>>> practically useful? >>>>>>>>>>>>>> >>>>>>>>>>>>> I had the same thought. >>>>>>>>>>>>> I guess it should not happen very often given how the algorithm >>>>>>>>>>>>> works but >>>>>>>>>>>>> OTOH, printing "0kB" is definitely not useful. >>>>>>>>>>>>> >>>>>>>>>>>>>>> Signed-off-by: Han Ding<hand...@chinatelecom.cn> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> ofproto/bond.c | 24 +++++++++++++++++------- >>>>>>>>>>>>>>> 1 file changed, 17 insertions(+), 7 deletions(-) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/ofproto/bond.c b/ofproto/bond.c >>>>>>>>>>>>>>> index c31869a..5b1975d 100644 >>>>>>>>>>>>>>> --- a/ofproto/bond.c >>>>>>>>>>>>>>> +++ b/ofproto/bond.c >>>>>>>>>>>>>>> @@ -1192,13 +1192,23 @@ bond_shift_load(struct bond_entry >>>>>>>>>>>>>>> *hash, struct bond_member *to) >>>>>>>>>>>>>>> struct bond *bond = from->bond; >>>>>>>>>>>>>>> uint64_t delta = hash->tx_bytes; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with hash >>>>>>>>>>>>>>> %"PRIdPTR") " >>>>>>>>>>>>>>> - "from %s to %s (now carrying %"PRIu64"kB and " >>>>>>>>>>>>>>> - "%"PRIu64"kB load, respectively)", >>>>>>>>>>>>>>> - bond->name, delta / 1024, hash - bond->hash, >>>>>>>>>>>>>>> - from->name, to->name, >>>>>>>>>>>>>>> - (from->tx_bytes - delta) / 1024, >>>>>>>>>>>>>>> - (to->tx_bytes + delta) / 1024); >>>>>>>>>>>>>>> + if (delta >= 1024) { >>>>>>>>>>>>>>> + VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with >>>>>>>>>>>>>>> hash %"PRIdPTR") " >>>>>>>>>>>>>>> + "from %s to %s (now carrying %"PRIu64"kB and " >>>>>>>>>>>>>>> + "%"PRIu64"kB load, respectively)", >>>>>>>>>>>>>>> + bond->name, delta / 1024, hash - bond->hash, >>>>>>>>>>>>>>> + from->name, to->name, >>>>>>>>>>>>>>> + (from->tx_bytes - delta) / 1024, >>>>>>>>>>>>>>> + (to->tx_bytes + delta) / 1024); >>>>>>>>>>>>>>> + } else { >>>>>>>>>>>>>>> + VLOG_INFO("bond %s: shift %"PRIu64"B of load (with >>>>>>>>>>>>>>> hash %"PRIdPTR") " >>>>>>>>>>>>>>> + "from %s to %s (now carrying %"PRIu64"kB and " >>>>>>>>>>>>> Apart from Ilya's comment, missing one more indentation space >>>>>>>>>>>>> in this >>>>>>>>>>>>> line (and all others). >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> >>>>>>>>>>>>>>> + "%"PRIu64"kB load, respectively)", >>>>>>>>>>>>>>> + bond->name, delta, hash - bond->hash, >>>>>>>>>>>>>>> + from->name, to->name, >>>>>>>>>>>>>>> + (from->tx_bytes - delta) / 1024, >>>>>>>>>>>>>>> + (to->tx_bytes + delta) / 1024); >>>>>>>>>>>>>>> + } >>>>>>>>>>>>>> I'd suggest instead of copying the whole thing, just replace "kB" >>>>>>>>>>>>>> with another %s and use a couple of ternary operators to produce >>>>>>>>>>>>>> correct value and "kB" or "B" depending on the delta. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards, Ilya Maximets. >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> dev mailing list >>>>>>>>>>>>>> d...@openvswitch.org >>>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> dev mailing list >>>>>>>>>>>>> d...@openvswitch.org >>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev -- Regards, Vladislav Odintsov _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev