Re: [ovs-dev] [PATCH] bond: Fix inaccurate log info in bond_shift_load.

Vladislav Odintsov Wed, 12 Mar 2025 10:49:51 -0700

On 12.03.2025 19:16, Ilya Maximets wrote:
> On 3/12/25 16:01, Vladislav Odintsov wrote:
>> On 12.03.2025 17:45, Ilya Maximets wrote:
>>> On 3/12/25 12:41, Vladislav Odintsov wrote:
>>>> On 12.03.2025 14:11, Ilya Maximets wrote:
>>>>> On 3/12/25 11:22, Vladislav Odintsov wrote:
>>>>>> Hi Han, Ilya and Adrian,
>>>>>>
>>>>>> I'm also suffering from these messages in logs. Han, do you have plans
>>>>>> on working with requested changes?
>>>>>>
>>>>>> If you've got no time at the moment, I can continue this work addressing
>>>>>> reviewers' comments. Ilya, you said 1kB is a small chunk of data and we
>>>>>> shouldn't care too much about it. Which value is okay to ignore from
>>>>>> your perspective?
>>>>> Are you suffering from excessive logging or from not enough granularity
>>>>> in the logs?  This patch increases the accuracy and doesn't reduce the
>>>>> amount of logs.
>>>>>
>>>>> If the amount of logs is a problem, then maybe we should move the logs
>>>>> for individual load shifts to DBG level (maybe keep a particularly heavy
>>>>> ones like >1MB move at INFO) and have a separate overview of the change
>>>>> instead showing the load distribution after the rebalancing, e.g.:
>>>>>
>>>>>     bond X: rebalanced (now carrying: portA:XXXkB, portB:YYYkB, 
>>>>> portC:ZZZkB)
>>>>>
>>>>> For the debug logs we could improve the accuracy as this patch does.
>>>>>
>>>>> What do you think?
>>>> We suffer from excessive logging of "0 kB" records. There are tons of
>>>> them. So I planned to take your proposal and do not to log these events
>>>> at all. But I like your last idea about hiding messages of re-balancing
>>>> <1MB under DBG log level and keep INFO for others.
>>>>
>>>> If you guys are okay with that, I can prepare and send a patch.
>>> I think, if we're hiding some information under DBG, then we need to add
>>> a new log with an overview of the rebalance with a new load distribution.
>>> With that, sounds fine to me.
>> Did you mean that we should log total cumulative re-balanced amount once
>> in a timeframe (let say 5-15 minutes) or just log with some rate-limit?
> The total cumulative amount at the end of ofproto/bond.c:bond_rebalance()
> if rebalanced == true.  See the example log above.  So, it the log should
> appear at most once per rebalance interval, which is 10s by default.
>
> The current 'shift X of load' log will remain but at DBG level, except for
> a very large ones.
>
> Will that solve your problem, or is that still too much of logging (once
> in 10s) ?


50/50 :)

# grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | wc -l
47388
# grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift -c
47339
# grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep 'shift 
0kB' | wc -l
20384
# grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift | 
cut -d '|' -f1 | uniq -c | sort -n | sed -e 's/^[[:space:]]*//' | cut -d 
' ' -f1 | uniq -c | sort -n
       1 102
       1 108
       1 109
       1 110
       1 112
       1 116
       1 117
       1 118
       1 119
       1 120
       1 123
       1 124
       1 127
       1 132
       1 133
       1 135
       1 136
       1 139
       1 142
       1 151
       1 153
       1 155
       1 159
       1 161
       1 163
       1 164
       1 166
       1 169
       1 170
       1 171
       1 172
       1 173
       1 174
       1 175
       1 179
       1 182
       1 185
       1 186
       1 187
       1 188
       1 191
       1 193
       1 194
       1 196
       1 198
       1 202
       1 207
       1 209
       1 211
       1 212
       1 213
       1 217
       1 220
       1 222
       1 223
       1 224
       1 227
       1 228
       1 229
       1 230
       1 231
       1 233
       1 235
       1 236
       1 238
       1 242
       1 245
       1 25
       1 34
       1 39
       1 42
       1 47
       1 48
       1 50
       1 54
       1 56
       1 57
       1 62
       1 65
       1 66
       1 71
       1 74
       1 75
       1 78
       1 82
       1 84
       1 85
       1 86
       1 89
       1 92
       1 98
       2 103
       2 105
       2 114
       2 126
       2 128
       2 138
       2 157
       2 165
       2 167
       2 189
       2 190
       2 197
       2 199
       2 20
       2 200
       2 21
       2 210
       2 215
       2 225
       2 226
       2 239
       2 244
       2 247
       2 248
       2 249
       2 250
       2 26
       2 40
       2 46
       2 51
       2 52
       2 53
       2 55
       2 58
       2 59
       2 63
       2 67
       2 70
       2 81
       2 83
       2 88
       2 90
       2 91
       2 99
       3 115
       3 192
       3 204
       3 216
       3 240
       3 251
       3 252
       3 254
       3 27
       3 28
       3 29
       3 30
       3 32
       3 33
       3 41
       3 43
       3 44
       3 49
       3 68
       3 69
       3 76
       3 77
       3 96
       3 97
       4 17
       4 208
       4 24
       4 241
       4 35
       4 37
       4 60
       4 64
       5 152
       5 232
       5 243
       5 36
       6 19
       6 23
       6 38
       8 22
      11 16
      12 11
      12 15
      13 12
      13 13
      13 14
      14 18
      19 9
      20 8
      25 10
      42 7
      70 6
     112 5
     205 4
     449 3
     741 1
    1034 2

The last command shows the amount (1st column) of shift events in one 
microsecond. Like this:


I guess we should not log 0kb movement at all.

>
>>> Note: chinatelecom.cn mail server doesn't seem to like communicating with
>>> gmail smtp, and so my replies likely do not reach Han.  Let's wait maybe a
>>> couple days for a reply, in case at least some of these messages are 
>>> delivered.
>>>
>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>>> On 05.08.2024 10:08, Adrián Moreno wrote:
>>>>>>> On Fri, Aug 02, 2024 at 06:38:19PM GMT, Ilya Maximets wrote:
>>>>>>>> On 7/2/24 16:36, Han Ding wrote:
>>>>>>>>> When the delta is less than 1024 in bond_shift_load, it print "shift 
>>>>>>>>> 0kB of load".
>>>>>>>>> Like this:
>>>>>>>>> bond dpdkbond0: shift 0kB of load (with hash 71) from nic1 to nic2 
>>>>>>>>> (now carrying 20650165kB and 8311662kB load, respectively)
>>>>>>>> Hi, Han.  Thanks for the patch!
>>>>>>>>
>>>>>>>> I'm curious, is this information about movements that small
>>>>>>>> practically useful?
>>>>>>>>
>>>>>>> I had the same thought.
>>>>>>> I guess it should not happen very often given how the algorithm works 
>>>>>>> but
>>>>>>> OTOH, printing "0kB" is definitely not useful.
>>>>>>>
>>>>>>>>> Signed-off-by: Han Ding <hand...@chinatelecom.cn>
>>>>>>>>> ---
>>>>>>>>>      ofproto/bond.c | 24 +++++++++++++++++-------
>>>>>>>>>      1 file changed, 17 insertions(+), 7 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/ofproto/bond.c b/ofproto/bond.c
>>>>>>>>> index c31869a..5b1975d 100644
>>>>>>>>> --- a/ofproto/bond.c
>>>>>>>>> +++ b/ofproto/bond.c
>>>>>>>>> @@ -1192,13 +1192,23 @@ bond_shift_load(struct bond_entry *hash, 
>>>>>>>>> struct bond_member *to)
>>>>>>>>>          struct bond *bond = from->bond;
>>>>>>>>>          uint64_t delta = hash->tx_bytes;
>>>>>>>>>
>>>>>>>>> -    VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with hash 
>>>>>>>>> %"PRIdPTR") "
>>>>>>>>> -              "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>> -              "%"PRIu64"kB load, respectively)",
>>>>>>>>> -              bond->name, delta / 1024, hash - bond->hash,
>>>>>>>>> -              from->name, to->name,
>>>>>>>>> -              (from->tx_bytes - delta) / 1024,
>>>>>>>>> -              (to->tx_bytes + delta) / 1024);
>>>>>>>>> +    if (delta >= 1024) {
>>>>>>>>> +        VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with hash 
>>>>>>>>> %"PRIdPTR") "
>>>>>>>>> +                "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>>>> +                "%"PRIu64"kB load, respectively)",
>>>>>>>>> +                bond->name, delta / 1024, hash - bond->hash,
>>>>>>>>> +                from->name, to->name,
>>>>>>>>> +                (from->tx_bytes - delta) / 1024,
>>>>>>>>> +                (to->tx_bytes + delta) / 1024);
>>>>>>>>> +    } else {
>>>>>>>>> +        VLOG_INFO("bond %s: shift %"PRIu64"B of load (with hash 
>>>>>>>>> %"PRIdPTR") "
>>>>>>>>> +                "from %s to %s (now carrying %"PRIu64"kB and "
>>>>>>> Apart from Ilya's comment, missing one more indentation space in this
>>>>>>> line (and all others).
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>>> +                "%"PRIu64"kB load, respectively)",
>>>>>>>>> +                bond->name, delta, hash - bond->hash,
>>>>>>>>> +                from->name, to->name,
>>>>>>>>> +                (from->tx_bytes - delta) / 1024,
>>>>>>>>> +                (to->tx_bytes + delta) / 1024);
>>>>>>>>> +    }
>>>>>>>> I'd suggest instead of copying the whole thing, just replace "kB"
>>>>>>>> with another %s and use a couple of ternary operators to produce
>>>>>>>> correct value and "kB" or "B" depending on the delta.
>>>>>>>>
>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>> _______________________________________________
>>>>>>>> dev mailing list
>>>>>>>> d...@openvswitch.org
>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> d...@openvswitch.org
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

-- 
Regards,
Vladislav Odintsov

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] bond: Fix inaccurate log info in bond_shift_load.

Reply via email to