On 14.03.2025 16:43, Ilya Maximets wrote:
> On 3/14/25 13:03, Ilya Maximets wrote:
>> On 3/14/25 10:26, Vladislav Odintsov wrote:
>>> On 13.03.2025 22:05, Ilya Maximets wrote:
>>>> On 3/12/25 19:07, Vladislav Odintsov wrote:
>>>>> Sorry, I hit the send button too early.
>>>>>
>>>>> On 12.03.2025 20:49, Vladislav Odintsov wrote:
>>>>>> On 12.03.2025 19:16, Ilya Maximets wrote:
>>>>>>> On 3/12/25 16:01, Vladislav Odintsov wrote:
>>>>>>>> On 12.03.2025 17:45, Ilya Maximets wrote:
>>>>>>>>> On 3/12/25 12:41, Vladislav Odintsov wrote:
>>>>>>>>>> On 12.03.2025 14:11, Ilya Maximets wrote:
>>>>>>>>>>> On 3/12/25 11:22, Vladislav Odintsov wrote:
>>>>>>>>>>>> Hi Han, Ilya and Adrian,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm also suffering from these messages in logs. Han, do you have
>>>>>>>>>>>> plans
>>>>>>>>>>>> on working with requested changes?
>>>>>>>>>>>>
>>>>>>>>>>>> If you've got no time at the moment, I can continue this work
>>>>>>>>>>>> addressing
>>>>>>>>>>>> reviewers' comments. Ilya, you said 1kB is a small chunk of data
>>>>>>>>>>>> and we
>>>>>>>>>>>> shouldn't care too much about it. Which value is okay to ignore
>>>>>>>>>>>> from
>>>>>>>>>>>> your perspective?
>>>>>>>>>>> Are you suffering from excessive logging or from not enough
>>>>>>>>>>> granularity
>>>>>>>>>>> in the logs?  This patch increases the accuracy and doesn't
>>>>>>>>>>> reduce the
>>>>>>>>>>> amount of logs.
>>>>>>>>>>>
>>>>>>>>>>> If the amount of logs is a problem, then maybe we should move the
>>>>>>>>>>> logs
>>>>>>>>>>> for individual load shifts to DBG level (maybe keep a
>>>>>>>>>>> particularly heavy
>>>>>>>>>>> ones like >1MB move at INFO) and have a separate overview of the
>>>>>>>>>>> change
>>>>>>>>>>> instead showing the load distribution after the rebalancing, e.g.:
>>>>>>>>>>>
>>>>>>>>>>>       bond X: rebalanced (now carrying: portA:XXXkB, portB:YYYkB,
>>>>>>>>>>> portC:ZZZkB)
>>>>>>>>>>>
>>>>>>>>>>> For the debug logs we could improve the accuracy as this patch does.
>>>>>>>>>>>
>>>>>>>>>>> What do you think?
>>>>>>>>>> We suffer from excessive logging of "0 kB" records. There are tons of
>>>>>>>>>> them. So I planned to take your proposal and do not to log these
>>>>>>>>>> events
>>>>>>>>>> at all. But I like your last idea about hiding messages of
>>>>>>>>>> re-balancing
>>>>>>>>>> <1MB under DBG log level and keep INFO for others.
>>>>>>>>>>
>>>>>>>>>> If you guys are okay with that, I can prepare and send a patch.
>>>>>>>>> I think, if we're hiding some information under DBG, then we need
>>>>>>>>> to add
>>>>>>>>> a new log with an overview of the rebalance with a new load
>>>>>>>>> distribution.
>>>>>>>>> With that, sounds fine to me.
>>>>>>>> Did you mean that we should log total cumulative re-balanced amount
>>>>>>>> once
>>>>>>>> in a timeframe (let say 5-15 minutes) or just log with some rate-limit?
>>>>>>> The total cumulative amount at the end of
>>>>>>> ofproto/bond.c:bond_rebalance()
>>>>>>> if rebalanced == true.  See the example log above.  So, it the log
>>>>>>> should
>>>>>>> appear at most once per rebalance interval, which is 10s by default.
>>>>>>>
>>>>>>> The current 'shift X of load' log will remain but at DBG level,
>>>>>>> except for
>>>>>>> a very large ones.
>>>>>>>
>>>>>>> Will that solve your problem, or is that still too much of logging (once
>>>>>>> in 10s) ?
>>>>>> 50/50 :)
>>>>>>
>>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | wc -l
>>>>>> 47388
>>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift -c
>>>>>> 47339
>>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep 'shift
>>>>>> 0kB' | wc -l
>>>>>> 20384
>>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift |
>>>>>> cut -d '|' -f1 | uniq -c | sort -n | sed -e 's/^[[:space:]]*//' | cut
>>>>>> -d ' ' -f1 | uniq -c | sort -n
>>>>>>         1 102
>>>>>>         1 108
>>>>>>         1 109
>>>> <snip>
>>>>
>>>>>>        70 6
>>>>>>       112 5
>>>>>>       205 4
>>>>>>       449 3
>>>>>>       741 1
>>>>>>      1034 2
>>>>>>
>>>>>> The last command shows the amount (1st column) of shift events in one
>>>>>> microsecond. Like this:
>>>>> Sorry, there was an error:
>>>>>
>>>>> The *second* column (value) shows the total amount of shift log lines
>>>>> occurred in any one millisecond in a log by last day.
>>>>> The *first* column (key) shows the total times in a log by last day
>>>>> where there was a (value) events in one millisecond.
>>>>>
>>>>> Like this:
>>>>>
>>>>> 2025-03-12T17:49:12.307Z|2772298|bond|INFO|bond directconnect: shift
>>>>> 407kB of load (with hash 40) from eth6 to eth7 (now carrying 1283kB and
>>>>> 935kB load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772299|bond|INFO|bond directconnect: shift 1kB
>>>>> of load (with hash 82) from eth6 to eth7 (now carrying 1282kB and 936kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772300|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 85) from eth6 to eth7 (now carrying 1281kB and 937kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772301|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 45) from eth6 to eth7 (now carrying 1280kB and 938kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772302|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 233) from eth6 to eth7 (now carrying 1280kB and 938kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772303|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 19) from eth6 to eth7 (now carrying 1279kB and 938kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772304|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 1) from eth6 to eth7 (now carrying 1279kB and 939kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772305|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 227) from eth6 to eth7 (now carrying 1279kB and 939kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772306|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 76) from eth6 to eth7 (now carrying 1279kB and 939kB
>>>>> load, respectively)
>>>>> 2025-03-12T17:49:12.307Z|2772307|bond|INFO|bond directconnect: shift 0kB
>>>>> of load (with hash 14) from eth6 to eth7 (now carrying 1279kB and 939kB
>>>>> load, respectively)
>>>>>
>>>>> This server lightly loaded (<100mb/s of network traffic).
>>>>>
>>>>> Additionally:
>>>>>
>>>>> # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift |
>>>>> cut -d '|' -f1 | uniq -c | sort -n
>>>>> ...(a lot of lines).
>>>>>        243 2025-03-12T06:32:25.462Z
>>>>>        243 2025-03-12T06:47:36.978Z
>>>>>        243 2025-03-12T15:28:17.831Z
>>>>>        244 2025-03-12T07:50:35.699Z
>>>>>        244 2025-03-12T13:22:43.546Z
>>>>>        245 2025-03-12T12:46:39.905Z
>>>>>        247 2025-03-12T06:51:57.431Z
>>>>>        247 2025-03-12T08:49:12.899Z
>>>>>        248 2025-03-12T07:24:31.333Z
>>>>>        248 2025-03-12T14:40:52.260Z
>>>>>        249 2025-03-12T13:24:33.725Z
>>>>>        249 2025-03-12T14:23:29.867Z
>>>>>        250 2025-03-12T08:22:30.169Z
>>>>>        250 2025-03-12T13:37:25.005Z
>>>>>        251 2025-03-12T07:15:20.215Z
>>>>>        251 2025-03-12T13:10:22.321Z
>>>>>        251 2025-03-12T14:24:39.985Z
>>>>>        252 2025-03-12T07:57:47.565Z
>>>>>        252 2025-03-12T14:40:32.224Z
>>>>>        252 2025-03-12T15:26:17.663Z
>>>>>        254 2025-03-12T08:05:08.456Z
>>>>>        254 2025-03-12T08:21:50.104Z
>>>>>        254 2025-03-12T12:42:09.455Z
>>>>>
>>>>> The 1st column here is the total shift log lines from 1 millisecond
>>>>> (shown in 2nd column).
>>>>>
>>>>> So, from my example, if there is a maximum 1 log line per 10s, the
>>>>> logfile still will be filled with only rebalance lines, but the total
>>>>> amount will be noticeably reduced.
>>>>>
>>>>> By the way, why do we need this information in log? What can we conclude
>>>>> from this information?
>>>> This information is useful for correlation with networking issues.  We had 
>>>> a
>>>> few cases in the past where some weird behavior happened exactly right 
>>>> after
>>>> rebalancing and having rebalancing logs was crucial to figure that out.
>>>>
>>>> The proposed change will significantly reduce the amount of logging for 
>>>> you,
>>>> it seems, so I'd say we should proceed with that.
>>>>
>>>> The next step, I think, would be to increase the rebalancing thresholds.
>>>> Today we do not rebalance if the overload is less than 3% between ports or
>>>> less than ~1Mbps.  On modern hardware, 1Mbps is practically nothing and 
>>>> even
>>>> 10 or 100 Mbps may not be worth rebalancing in some cases.  So, instead of
>>>> hardcoding the 1Mbps threshold we may get a MIN of the link speeds of the
>>>> ports and use 1-3% of that as a threshold.  On your lightly loaded system
>>>> that will likely eliminate most or all the rebalancing altogether.
>>>>
>>>> It will be a little tricky, since it's not always possible to get the link
>>>> speed and the link speed can be changed dynamically, but should generally
>>>> be doable.
>>>>
>>>> This way we'll keep the logs, but avoid a log spam for all cases, except
>>>> the ones where rebalancing is actually necessary.
>>>>
>>>> WDYT?
>>> Sounds good for me.
>>>
>>> I'll post logging changes shortly, since Han is not answering and the
>>> final solution is different from original.
>>>
>>> Regarding rebalance threshold, did you mean something like this?
>> Yes, something like this.  I thought to store the speed in the member 
>> structure
>> initially, but it creates challenges if the speed changes, and it's probably 
>> not
>> a big deal to check the speed once in the rebalancing interval.  So, the 
>> solution
>> below might be fine.  I didn't check the math in the code below, but see some
>> comments inline.

Thanks for the review!

I'll address your comments and send out the patch series.

>>
>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>> index 45a36fabb..083c0be40 100644
>>> --- a/ofproto/bond.c
>>> +++ b/ofproto/bond.c
>>> @@ -1371,6 +1371,8 @@ bond_rebalance(struct bond *bond)
>>>            }
>>>        }
>>>
>>> +    uint32_t mbps;
>> This is temporary and can be defined inside the if condition in the loop.
>>
>>> +    uint64_t min_member_bps = UINT64_MAX;
>> An empty line would be nice.  But this may be moved to the top as well,
>> since this variable is used not only for the closest loop and this function
>> generally defines most of its variables at the top.
>>
>>>        /* Add enabled members to 'bals' in descending order of tx_bytes.
>>>         *
>>>         * XXX This is O(n**2) in the number of members but it could be
>>> O(n lg n)
>>> @@ -1379,6 +1381,10 @@ bond_rebalance(struct bond *bond)
>>>        HMAP_FOR_EACH (member, hmap_node, &bond->members) {
>>>            if (member->enabled) {
>>>                insert_bal(&bals, member);
>>> +            netdev_get_speed(member->netdev, &mbps, NULL);
>>> +            if (min_member_bps > mbps) {
>>> +                min_member_bps = mbps * 1000000;
>>> +            }
>> The comparison above is not correct as it compares apples (bps) to
>> oranges (Mbps).  Use the MIN macro to avoid this issue.

You're right. Initially I compared mbps, but then changed variable to 
bps and lost this place.

Thanks for the hint, I'll fix this.

> Also, this multiplication will easily overflow uin32_t, so UINT64_C(1000000)
> should be used instead to avoid type conversion issues.  Ideally, I'd suggest
> to keep the value in Mbps and divide the overload if necessary instead.
> We do not care much about accuracy on small values, but we do care about not
> cropping the large ones.

You're right. Will fix that.

>
>>>            }
>>>        }
>>>        log_bals(bond, &bals);
>>> @@ -1392,10 +1398,13 @@ bond_rebalance(struct bond *bond)
>>>            uint64_t overload;
>>>
>>>            overload = from->tx_bytes - to->tx_bytes;
>>> -        if (overload < to->tx_bytes >> 5 || overload < 100000) {
>>> +        if (overload < to->tx_bytes >> 5 ||
>>> +            overload < 100000 ||
>>> +            overload * 8 < min_member_bps >> 6) {  /* speed can be 0. */
>> Comments should start with a capital letter.  But also, I'd suggest we just
>> remove the 1Mbps condition above and instead do mbps = MAX(mbps, <some 
>> value>)
>> at the point of getting the speed, where <some value> will be the link speed
>> that will result in an equivalent 1Mbps overload check.

Ack.

>>
>>>                /* The extra load on 'from' (and all less-loaded members),
>>> compared
>>>                 * to that of 'to' (the least-loaded member), is less than
>>> ~3%, or
>>> -             * it is less than ~1Mbps.  No point in rebalancing. */
>>> +             * it is less than ~1Mbps, or this extra load is < ~1.5% of
>>> minimum
>>> +             bond member link speed, if present.  No point in
>>> rebalancing. */
>>>                break;
>>>            }
>>>
>>> If yes, I can send both patches in a series.
>> Sure.
>>
>>>>>> I guess we should not log 0kb movement at all.
>>>>>>
>>>>>>>>> Note: chinatelecom.cn mail server doesn't seem to like
>>>>>>>>> communicating with
>>>>>>>>> gmail smtp, and so my replies likely do not reach Han. Let's wait
>>>>>>>>> maybe a
>>>>>>>>> couple days for a reply, in case at least some of these messages
>>>>>>>>> are delivered.
>>>>>>>>>
>>>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>>>
>>>>>>>>>>>> On 05.08.2024 10:08, Adrián Moreno wrote:
>>>>>>>>>>>>> On Fri, Aug 02, 2024 at 06:38:19PM GMT, Ilya Maximets wrote:
>>>>>>>>>>>>>> On 7/2/24 16:36, Han Ding wrote:
>>>>>>>>>>>>>>> When the delta is less than 1024 in bond_shift_load, it print
>>>>>>>>>>>>>>> "shift 0kB of load".
>>>>>>>>>>>>>>> Like this:
>>>>>>>>>>>>>>> bond dpdkbond0: shift 0kB of load (with hash 71) from nic1 to
>>>>>>>>>>>>>>> nic2 (now carrying 20650165kB and 8311662kB load, respectively)
>>>>>>>>>>>>>> Hi, Han.  Thanks for the patch!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm curious, is this information about movements that small
>>>>>>>>>>>>>> practically useful?
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I had the same thought.
>>>>>>>>>>>>> I guess it should not happen very often given how the algorithm
>>>>>>>>>>>>> works but
>>>>>>>>>>>>> OTOH, printing "0kB" is definitely not useful.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Han Ding<hand...@chinatelecom.cn>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>        ofproto/bond.c | 24 +++++++++++++++++-------
>>>>>>>>>>>>>>>        1 file changed, 17 insertions(+), 7 deletions(-)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>>>>>>>>>>>>>> index c31869a..5b1975d 100644
>>>>>>>>>>>>>>> --- a/ofproto/bond.c
>>>>>>>>>>>>>>> +++ b/ofproto/bond.c
>>>>>>>>>>>>>>> @@ -1192,13 +1192,23 @@ bond_shift_load(struct bond_entry
>>>>>>>>>>>>>>> *hash, struct bond_member *to)
>>>>>>>>>>>>>>>            struct bond *bond = from->bond;
>>>>>>>>>>>>>>>            uint64_t delta = hash->tx_bytes;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -    VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with hash
>>>>>>>>>>>>>>> %"PRIdPTR") "
>>>>>>>>>>>>>>> -              "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>>>>>>>> -              "%"PRIu64"kB load, respectively)",
>>>>>>>>>>>>>>> -              bond->name, delta / 1024, hash - bond->hash,
>>>>>>>>>>>>>>> -              from->name, to->name,
>>>>>>>>>>>>>>> -              (from->tx_bytes - delta) / 1024,
>>>>>>>>>>>>>>> -              (to->tx_bytes + delta) / 1024);
>>>>>>>>>>>>>>> +    if (delta >= 1024) {
>>>>>>>>>>>>>>> +        VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with
>>>>>>>>>>>>>>> hash %"PRIdPTR") "
>>>>>>>>>>>>>>> +                "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>>>>>>>> +                "%"PRIu64"kB load, respectively)",
>>>>>>>>>>>>>>> +                bond->name, delta / 1024, hash - bond->hash,
>>>>>>>>>>>>>>> +                from->name, to->name,
>>>>>>>>>>>>>>> +                (from->tx_bytes - delta) / 1024,
>>>>>>>>>>>>>>> +                (to->tx_bytes + delta) / 1024);
>>>>>>>>>>>>>>> +    } else {
>>>>>>>>>>>>>>> +        VLOG_INFO("bond %s: shift %"PRIu64"B of load (with
>>>>>>>>>>>>>>> hash %"PRIdPTR") "
>>>>>>>>>>>>>>> +                "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>>>>>> Apart from Ilya's comment, missing one more indentation space
>>>>>>>>>>>>> in this
>>>>>>>>>>>>> line (and all others).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> + "%"PRIu64"kB load, respectively)",
>>>>>>>>>>>>>>> +                bond->name, delta, hash - bond->hash,
>>>>>>>>>>>>>>> +                from->name, to->name,
>>>>>>>>>>>>>>> +                (from->tx_bytes - delta) / 1024,
>>>>>>>>>>>>>>> +                (to->tx_bytes + delta) / 1024);
>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> I'd suggest instead of copying the whole thing, just replace "kB"
>>>>>>>>>>>>>> with another %s and use a couple of ternary operators to produce
>>>>>>>>>>>>>> correct value and "kB" or "B" depending on the delta.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> dev mailing list
>>>>>>>>>>>>>> d...@openvswitch.org
>>>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> dev mailing list
>>>>>>>>>>>>> d...@openvswitch.org
>>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
Regards,
Vladislav Odintsov

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to