Re: [ovs-dev] [PATCH] bond: Fix inaccurate log info in bond_shift_load.

Ilya Maximets Fri, 14 Mar 2025 05:03:37 -0700

On 3/14/25 10:26, Vladislav Odintsov wrote:
> 
> On 13.03.2025 22:05, Ilya Maximets wrote:
>> On 3/12/25 19:07, Vladislav Odintsov wrote:
>>> Sorry, I hit the send button too early.
>>>
>>> On 12.03.2025 20:49, Vladislav Odintsov wrote:
>>>> On 12.03.2025 19:16, Ilya Maximets wrote:
>>>>> On 3/12/25 16:01, Vladislav Odintsov wrote:
>>>>>> On 12.03.2025 17:45, Ilya Maximets wrote:
>>>>>>> On 3/12/25 12:41, Vladislav Odintsov wrote:
>>>>>>>> On 12.03.2025 14:11, Ilya Maximets wrote:
>>>>>>>>> On 3/12/25 11:22, Vladislav Odintsov wrote:
>>>>>>>>>> Hi Han, Ilya and Adrian,
>>>>>>>>>>
>>>>>>>>>> I'm also suffering from these messages in logs. Han, do you have
>>>>>>>>>> plans
>>>>>>>>>> on working with requested changes?
>>>>>>>>>>
>>>>>>>>>> If you've got no time at the moment, I can continue this work
>>>>>>>>>> addressing
>>>>>>>>>> reviewers' comments. Ilya, you said 1kB is a small chunk of data
>>>>>>>>>> and we
>>>>>>>>>> shouldn't care too much about it. Which value is okay to ignore
>>>>>>>>>> from
>>>>>>>>>> your perspective?
>>>>>>>>> Are you suffering from excessive logging or from not enough
>>>>>>>>> granularity
>>>>>>>>> in the logs?  This patch increases the accuracy and doesn't
>>>>>>>>> reduce the
>>>>>>>>> amount of logs.
>>>>>>>>>
>>>>>>>>> If the amount of logs is a problem, then maybe we should move the
>>>>>>>>> logs
>>>>>>>>> for individual load shifts to DBG level (maybe keep a
>>>>>>>>> particularly heavy
>>>>>>>>> ones like >1MB move at INFO) and have a separate overview of the
>>>>>>>>> change
>>>>>>>>> instead showing the load distribution after the rebalancing, e.g.:
>>>>>>>>>
>>>>>>>>>      bond X: rebalanced (now carrying: portA:XXXkB, portB:YYYkB,
>>>>>>>>> portC:ZZZkB)
>>>>>>>>>
>>>>>>>>> For the debug logs we could improve the accuracy as this patch does.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>> We suffer from excessive logging of "0 kB" records. There are tons of
>>>>>>>> them. So I planned to take your proposal and do not to log these
>>>>>>>> events
>>>>>>>> at all. But I like your last idea about hiding messages of
>>>>>>>> re-balancing
>>>>>>>> <1MB under DBG log level and keep INFO for others.
>>>>>>>>
>>>>>>>> If you guys are okay with that, I can prepare and send a patch.
>>>>>>> I think, if we're hiding some information under DBG, then we need
>>>>>>> to add
>>>>>>> a new log with an overview of the rebalance with a new load
>>>>>>> distribution.
>>>>>>> With that, sounds fine to me.
>>>>>> Did you mean that we should log total cumulative re-balanced amount
>>>>>> once
>>>>>> in a timeframe (let say 5-15 minutes) or just log with some rate-limit?
>>>>> The total cumulative amount at the end of
>>>>> ofproto/bond.c:bond_rebalance()
>>>>> if rebalanced == true.  See the example log above.  So, it the log
>>>>> should
>>>>> appear at most once per rebalance interval, which is 10s by default.
>>>>>
>>>>> The current 'shift X of load' log will remain but at DBG level,
>>>>> except for
>>>>> a very large ones.
>>>>>
>>>>> Will that solve your problem, or is that still too much of logging (once
>>>>> in 10s) ?
>>>> 50/50 :)
>>>>
>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | wc -l
>>>> 47388
>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift -c
>>>> 47339
>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep 'shift
>>>> 0kB' | wc -l
>>>> 20384
>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift |
>>>> cut -d '|' -f1 | uniq -c | sort -n | sed -e 's/^[[:space:]]*//' | cut
>>>> -d ' ' -f1 | uniq -c | sort -n
>>>>        1 102
>>>>        1 108
>>>>        1 109
>> <snip>
>>
>>>>       70 6
>>>>      112 5
>>>>      205 4
>>>>      449 3
>>>>      741 1
>>>>     1034 2
>>>>
>>>> The last command shows the amount (1st column) of shift events in one
>>>> microsecond. Like this:
>>> Sorry, there was an error:
>>>
>>> The *second* column (value) shows the total amount of shift log lines
>>> occurred in any one millisecond in a log by last day.
>>> The *first* column (key) shows the total times in a log by last day
>>> where there was a (value) events in one millisecond.
>>>
>>> Like this:
>>>
>>> 2025-03-12T17:49:12.307Z|2772298|bond|INFO|bond directconnect: shift
>>> 407kB of load (with hash 40) from eth6 to eth7 (now carrying 1283kB and
>>> 935kB load, respectively)
>>> 2025-03-12T17:49:12.307Z|2772299|bond|INFO|bond directconnect: shift 1kB
>>> of load (with hash 82) from eth6 to eth7 (now carrying 1282kB and 936kB
>>> load, respectively)
>>> 2025-03-12T17:49:12.307Z|2772300|bond|INFO|bond directconnect: shift 0kB
>>> of load (with hash 85) from eth6 to eth7 (now carrying 1281kB and 937kB
>>> load, respectively)
>>> 2025-03-12T17:49:12.307Z|2772301|bond|INFO|bond directconnect: shift 0kB
>>> of load (with hash 45) from eth6 to eth7 (now carrying 1280kB and 938kB
>>> load, respectively)
>>> 2025-03-12T17:49:12.307Z|2772302|bond|INFO|bond directconnect: shift 0kB
>>> of load (with hash 233) from eth6 to eth7 (now carrying 1280kB and 938kB
>>> load, respectively)
>>> 2025-03-12T17:49:12.307Z|2772303|bond|INFO|bond directconnect: shift 0kB
>>> of load (with hash 19) from eth6 to eth7 (now carrying 1279kB and 938kB
>>> load, respectively)
>>> 2025-03-12T17:49:12.307Z|2772304|bond|INFO|bond directconnect: shift 0kB
>>> of load (with hash 1) from eth6 to eth7 (now carrying 1279kB and 939kB
>>> load, respectively)
>>> 2025-03-12T17:49:12.307Z|2772305|bond|INFO|bond directconnect: shift 0kB
>>> of load (with hash 227) from eth6 to eth7 (now carrying 1279kB and 939kB
>>> load, respectively)
>>> 2025-03-12T17:49:12.307Z|2772306|bond|INFO|bond directconnect: shift 0kB
>>> of load (with hash 76) from eth6 to eth7 (now carrying 1279kB and 939kB
>>> load, respectively)
>>> 2025-03-12T17:49:12.307Z|2772307|bond|INFO|bond directconnect: shift 0kB
>>> of load (with hash 14) from eth6 to eth7 (now carrying 1279kB and 939kB
>>> load, respectively)
>>>
>>> This server lightly loaded (<100mb/s of network traffic).
>>>
>>> Additionally:
>>>
>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift |
>>> cut -d '|' -f1 | uniq -c | sort -n
>>> ...(a lot of lines).
>>>       243 2025-03-12T06:32:25.462Z
>>>       243 2025-03-12T06:47:36.978Z
>>>       243 2025-03-12T15:28:17.831Z
>>>       244 2025-03-12T07:50:35.699Z
>>>       244 2025-03-12T13:22:43.546Z
>>>       245 2025-03-12T12:46:39.905Z
>>>       247 2025-03-12T06:51:57.431Z
>>>       247 2025-03-12T08:49:12.899Z
>>>       248 2025-03-12T07:24:31.333Z
>>>       248 2025-03-12T14:40:52.260Z
>>>       249 2025-03-12T13:24:33.725Z
>>>       249 2025-03-12T14:23:29.867Z
>>>       250 2025-03-12T08:22:30.169Z
>>>       250 2025-03-12T13:37:25.005Z
>>>       251 2025-03-12T07:15:20.215Z
>>>       251 2025-03-12T13:10:22.321Z
>>>       251 2025-03-12T14:24:39.985Z
>>>       252 2025-03-12T07:57:47.565Z
>>>       252 2025-03-12T14:40:32.224Z
>>>       252 2025-03-12T15:26:17.663Z
>>>       254 2025-03-12T08:05:08.456Z
>>>       254 2025-03-12T08:21:50.104Z
>>>       254 2025-03-12T12:42:09.455Z
>>>
>>> The 1st column here is the total shift log lines from 1 millisecond
>>> (shown in 2nd column).
>>>
>>> So, from my example, if there is a maximum 1 log line per 10s, the
>>> logfile still will be filled with only rebalance lines, but the total
>>> amount will be noticeably reduced.
>>>
>>> By the way, why do we need this information in log? What can we conclude
>>> from this information?
>> This information is useful for correlation with networking issues.  We had a
>> few cases in the past where some weird behavior happened exactly right after
>> rebalancing and having rebalancing logs was crucial to figure that out.
>>
>> The proposed change will significantly reduce the amount of logging for you,
>> it seems, so I'd say we should proceed with that.
>>
>> The next step, I think, would be to increase the rebalancing thresholds.
>> Today we do not rebalance if the overload is less than 3% between ports or
>> less than ~1Mbps.  On modern hardware, 1Mbps is practically nothing and even
>> 10 or 100 Mbps may not be worth rebalancing in some cases.  So, instead of
>> hardcoding the 1Mbps threshold we may get a MIN of the link speeds of the
>> ports and use 1-3% of that as a threshold.  On your lightly loaded system
>> that will likely eliminate most or all the rebalancing altogether.
>>
>> It will be a little tricky, since it's not always possible to get the link
>> speed and the link speed can be changed dynamically, but should generally
>> be doable.
>>
>> This way we'll keep the logs, but avoid a log spam for all cases, except
>> the ones where rebalancing is actually necessary.
>>
>> WDYT?
> 
> Sounds good for me.
> 
> I'll post logging changes shortly, since Han is not answering and the 
> final solution is different from original.
> 
> Regarding rebalance threshold, did you mean something like this?


Yes, something like this.  I thought to store the speed in the member structure
initially, but it creates challenges if the speed changes, and it's probably not
a big deal to check the speed once in the rebalancing interval.  So, the 
solution
below might be fine.  I didn't check the math in the code below, but see some
comments inline.

> 
> diff --git a/ofproto/bond.c b/ofproto/bond.c
> index 45a36fabb..083c0be40 100644
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -1371,6 +1371,8 @@ bond_rebalance(struct bond *bond)
>           }
>       }
> 
> +    uint32_t mbps;

This is temporary and can be defined inside the if condition in the loop.

> +    uint64_t min_member_bps = UINT64_MAX;

An empty line would be nice.  But this may be moved to the top as well,
since this variable is used not only for the closest loop and this function
generally defines most of its variables at the top.

>       /* Add enabled members to 'bals' in descending order of tx_bytes.
>        *
>        * XXX This is O(n**2) in the number of members but it could be 
> O(n lg n)
> @@ -1379,6 +1381,10 @@ bond_rebalance(struct bond *bond)
>       HMAP_FOR_EACH (member, hmap_node, &bond->members) {
>           if (member->enabled) {
>               insert_bal(&bals, member);
> +            netdev_get_speed(member->netdev, &mbps, NULL);
> +            if (min_member_bps > mbps) {
> +                min_member_bps = mbps * 1000000;
> +            }

The comparison above is not correct as it compares apples (bps) to
oranges (Mbps).  Use the MIN macro to avoid this issue.

>           }
>       }
>       log_bals(bond, &bals);
> @@ -1392,10 +1398,13 @@ bond_rebalance(struct bond *bond)
>           uint64_t overload;
> 
>           overload = from->tx_bytes - to->tx_bytes;
> -        if (overload < to->tx_bytes >> 5 || overload < 100000) {
> +        if (overload < to->tx_bytes >> 5 ||
> +            overload < 100000 ||
> +            overload * 8 < min_member_bps >> 6) {  /* speed can be 0. */

Comments should start with a capital letter.  But also, I'd suggest we just
remove the 1Mbps condition above and instead do mbps = MAX(mbps, <some value>)
at the point of getting the speed, where <some value> will be the link speed
that will result in an equivalent 1Mbps overload check.

>               /* The extra load on 'from' (and all less-loaded members), 
> compared
>                * to that of 'to' (the least-loaded member), is less than 
> ~3%, or
> -             * it is less than ~1Mbps.  No point in rebalancing. */
> +             * it is less than ~1Mbps, or this extra load is < ~1.5% of 
> minimum
> +             bond member link speed, if present.  No point in 
> rebalancing. */
>               break;
>           }
> 
> If yes, I can send both patches in a series.

Sure.

> 
>>>> I guess we should not log 0kb movement at all.
>>>>
>>>>>>> Note: chinatelecom.cn mail server doesn't seem to like
>>>>>>> communicating with
>>>>>>> gmail smtp, and so my replies likely do not reach Han. Let's wait
>>>>>>> maybe a
>>>>>>> couple days for a reply, in case at least some of these messages
>>>>>>> are delivered.
>>>>>>>
>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>
>>>>>>>>>> On 05.08.2024 10:08, Adrián Moreno wrote:
>>>>>>>>>>> On Fri, Aug 02, 2024 at 06:38:19PM GMT, Ilya Maximets wrote:
>>>>>>>>>>>> On 7/2/24 16:36, Han Ding wrote:
>>>>>>>>>>>>> When the delta is less than 1024 in bond_shift_load, it print
>>>>>>>>>>>>> "shift 0kB of load".
>>>>>>>>>>>>> Like this:
>>>>>>>>>>>>> bond dpdkbond0: shift 0kB of load (with hash 71) from nic1 to
>>>>>>>>>>>>> nic2 (now carrying 20650165kB and 8311662kB load, respectively)
>>>>>>>>>>>> Hi, Han.  Thanks for the patch!
>>>>>>>>>>>>
>>>>>>>>>>>> I'm curious, is this information about movements that small
>>>>>>>>>>>> practically useful?
>>>>>>>>>>>>
>>>>>>>>>>> I had the same thought.
>>>>>>>>>>> I guess it should not happen very often given how the algorithm
>>>>>>>>>>> works but
>>>>>>>>>>> OTOH, printing "0kB" is definitely not useful.
>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Han Ding<hand...@chinatelecom.cn>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>       ofproto/bond.c | 24 +++++++++++++++++-------
>>>>>>>>>>>>>       1 file changed, 17 insertions(+), 7 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>>>>>>>>>>>> index c31869a..5b1975d 100644
>>>>>>>>>>>>> --- a/ofproto/bond.c
>>>>>>>>>>>>> +++ b/ofproto/bond.c
>>>>>>>>>>>>> @@ -1192,13 +1192,23 @@ bond_shift_load(struct bond_entry
>>>>>>>>>>>>> *hash, struct bond_member *to)
>>>>>>>>>>>>>           struct bond *bond = from->bond;
>>>>>>>>>>>>>           uint64_t delta = hash->tx_bytes;
>>>>>>>>>>>>>
>>>>>>>>>>>>> -    VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with hash
>>>>>>>>>>>>> %"PRIdPTR") "
>>>>>>>>>>>>> -              "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>>>>>> -              "%"PRIu64"kB load, respectively)",
>>>>>>>>>>>>> -              bond->name, delta / 1024, hash - bond->hash,
>>>>>>>>>>>>> -              from->name, to->name,
>>>>>>>>>>>>> -              (from->tx_bytes - delta) / 1024,
>>>>>>>>>>>>> -              (to->tx_bytes + delta) / 1024);
>>>>>>>>>>>>> +    if (delta >= 1024) {
>>>>>>>>>>>>> +        VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with
>>>>>>>>>>>>> hash %"PRIdPTR") "
>>>>>>>>>>>>> +                "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>>>>>> +                "%"PRIu64"kB load, respectively)",
>>>>>>>>>>>>> +                bond->name, delta / 1024, hash - bond->hash,
>>>>>>>>>>>>> +                from->name, to->name,
>>>>>>>>>>>>> +                (from->tx_bytes - delta) / 1024,
>>>>>>>>>>>>> +                (to->tx_bytes + delta) / 1024);
>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>> +        VLOG_INFO("bond %s: shift %"PRIu64"B of load (with
>>>>>>>>>>>>> hash %"PRIdPTR") "
>>>>>>>>>>>>> +                "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>>>> Apart from Ilya's comment, missing one more indentation space
>>>>>>>>>>> in this
>>>>>>>>>>> line (and all others).
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>>>> + "%"PRIu64"kB load, respectively)",
>>>>>>>>>>>>> +                bond->name, delta, hash - bond->hash,
>>>>>>>>>>>>> +                from->name, to->name,
>>>>>>>>>>>>> +                (from->tx_bytes - delta) / 1024,
>>>>>>>>>>>>> +                (to->tx_bytes + delta) / 1024);
>>>>>>>>>>>>> +    }
>>>>>>>>>>>> I'd suggest instead of copying the whole thing, just replace "kB"
>>>>>>>>>>>> with another %s and use a couple of ternary operators to produce
>>>>>>>>>>>> correct value and "kB" or "B" depending on the delta.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> dev mailing list
>>>>>>>>>>>> d...@openvswitch.org
>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> dev mailing list
>>>>>>>>>>> d...@openvswitch.org
>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] bond: Fix inaccurate log info in bond_shift_load.

Reply via email to