On 12.03.2025 19:16, Ilya Maximets wrote: > On 3/12/25 16:01, Vladislav Odintsov wrote: >> On 12.03.2025 17:45, Ilya Maximets wrote: >>> On 3/12/25 12:41, Vladislav Odintsov wrote: >>>> On 12.03.2025 14:11, Ilya Maximets wrote: >>>>> On 3/12/25 11:22, Vladislav Odintsov wrote: >>>>>> Hi Han, Ilya and Adrian, >>>>>> >>>>>> I'm also suffering from these messages in logs. Han, do you have plans >>>>>> on working with requested changes? >>>>>> >>>>>> If you've got no time at the moment, I can continue this work addressing >>>>>> reviewers' comments. Ilya, you said 1kB is a small chunk of data and we >>>>>> shouldn't care too much about it. Which value is okay to ignore from >>>>>> your perspective? >>>>> Are you suffering from excessive logging or from not enough granularity >>>>> in the logs? This patch increases the accuracy and doesn't reduce the >>>>> amount of logs. >>>>> >>>>> If the amount of logs is a problem, then maybe we should move the logs >>>>> for individual load shifts to DBG level (maybe keep a particularly heavy >>>>> ones like >1MB move at INFO) and have a separate overview of the change >>>>> instead showing the load distribution after the rebalancing, e.g.: >>>>> >>>>> bond X: rebalanced (now carrying: portA:XXXkB, portB:YYYkB, >>>>> portC:ZZZkB) >>>>> >>>>> For the debug logs we could improve the accuracy as this patch does. >>>>> >>>>> What do you think? >>>> We suffer from excessive logging of "0 kB" records. There are tons of >>>> them. So I planned to take your proposal and do not to log these events >>>> at all. But I like your last idea about hiding messages of re-balancing >>>> <1MB under DBG log level and keep INFO for others. >>>> >>>> If you guys are okay with that, I can prepare and send a patch. >>> I think, if we're hiding some information under DBG, then we need to add >>> a new log with an overview of the rebalance with a new load distribution. >>> With that, sounds fine to me. >> Did you mean that we should log total cumulative re-balanced amount once >> in a timeframe (let say 5-15 minutes) or just log with some rate-limit? > The total cumulative amount at the end of ofproto/bond.c:bond_rebalance() > if rebalanced == true. See the example log above. So, it the log should > appear at most once per rebalance interval, which is 10s by default. > > The current 'shift X of load' log will remain but at DBG level, except for > a very large ones. > > Will that solve your problem, or is that still too much of logging (once > in 10s) ?
50/50 :) # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | wc -l 47388 # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift -c 47339 # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep 'shift 0kB' | wc -l 20384 # grep 2025-03-12 /var/log/openvswitch/ovs-vswitchd.log | grep shift | cut -d '|' -f1 | uniq -c | sort -n | sed -e 's/^[[:space:]]*//' | cut -d ' ' -f1 | uniq -c | sort -n 1 102 1 108 1 109 1 110 1 112 1 116 1 117 1 118 1 119 1 120 1 123 1 124 1 127 1 132 1 133 1 135 1 136 1 139 1 142 1 151 1 153 1 155 1 159 1 161 1 163 1 164 1 166 1 169 1 170 1 171 1 172 1 173 1 174 1 175 1 179 1 182 1 185 1 186 1 187 1 188 1 191 1 193 1 194 1 196 1 198 1 202 1 207 1 209 1 211 1 212 1 213 1 217 1 220 1 222 1 223 1 224 1 227 1 228 1 229 1 230 1 231 1 233 1 235 1 236 1 238 1 242 1 245 1 25 1 34 1 39 1 42 1 47 1 48 1 50 1 54 1 56 1 57 1 62 1 65 1 66 1 71 1 74 1 75 1 78 1 82 1 84 1 85 1 86 1 89 1 92 1 98 2 103 2 105 2 114 2 126 2 128 2 138 2 157 2 165 2 167 2 189 2 190 2 197 2 199 2 20 2 200 2 21 2 210 2 215 2 225 2 226 2 239 2 244 2 247 2 248 2 249 2 250 2 26 2 40 2 46 2 51 2 52 2 53 2 55 2 58 2 59 2 63 2 67 2 70 2 81 2 83 2 88 2 90 2 91 2 99 3 115 3 192 3 204 3 216 3 240 3 251 3 252 3 254 3 27 3 28 3 29 3 30 3 32 3 33 3 41 3 43 3 44 3 49 3 68 3 69 3 76 3 77 3 96 3 97 4 17 4 208 4 24 4 241 4 35 4 37 4 60 4 64 5 152 5 232 5 243 5 36 6 19 6 23 6 38 8 22 11 16 12 11 12 15 13 12 13 13 13 14 14 18 19 9 20 8 25 10 42 7 70 6 112 5 205 4 449 3 741 1 1034 2 The last command shows the amount (1st column) of shift events in one microsecond. Like this: I guess we should not log 0kb movement at all. > >>> Note: chinatelecom.cn mail server doesn't seem to like communicating with >>> gmail smtp, and so my replies likely do not reach Han. Let's wait maybe a >>> couple days for a reply, in case at least some of these messages are >>> delivered. >>> >>>>> Best regards, Ilya Maximets. >>>>> >>>>>> On 05.08.2024 10:08, Adrián Moreno wrote: >>>>>>> On Fri, Aug 02, 2024 at 06:38:19PM GMT, Ilya Maximets wrote: >>>>>>>> On 7/2/24 16:36, Han Ding wrote: >>>>>>>>> When the delta is less than 1024 in bond_shift_load, it print "shift >>>>>>>>> 0kB of load". >>>>>>>>> Like this: >>>>>>>>> bond dpdkbond0: shift 0kB of load (with hash 71) from nic1 to nic2 >>>>>>>>> (now carrying 20650165kB and 8311662kB load, respectively) >>>>>>>> Hi, Han. Thanks for the patch! >>>>>>>> >>>>>>>> I'm curious, is this information about movements that small >>>>>>>> practically useful? >>>>>>>> >>>>>>> I had the same thought. >>>>>>> I guess it should not happen very often given how the algorithm works >>>>>>> but >>>>>>> OTOH, printing "0kB" is definitely not useful. >>>>>>> >>>>>>>>> Signed-off-by: Han Ding <hand...@chinatelecom.cn> >>>>>>>>> --- >>>>>>>>> ofproto/bond.c | 24 +++++++++++++++++------- >>>>>>>>> 1 file changed, 17 insertions(+), 7 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/ofproto/bond.c b/ofproto/bond.c >>>>>>>>> index c31869a..5b1975d 100644 >>>>>>>>> --- a/ofproto/bond.c >>>>>>>>> +++ b/ofproto/bond.c >>>>>>>>> @@ -1192,13 +1192,23 @@ bond_shift_load(struct bond_entry *hash, >>>>>>>>> struct bond_member *to) >>>>>>>>> struct bond *bond = from->bond; >>>>>>>>> uint64_t delta = hash->tx_bytes; >>>>>>>>> >>>>>>>>> - VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with hash >>>>>>>>> %"PRIdPTR") " >>>>>>>>> - "from %s to %s (now carrying %"PRIu64"kB and " >>>>>>>>> - "%"PRIu64"kB load, respectively)", >>>>>>>>> - bond->name, delta / 1024, hash - bond->hash, >>>>>>>>> - from->name, to->name, >>>>>>>>> - (from->tx_bytes - delta) / 1024, >>>>>>>>> - (to->tx_bytes + delta) / 1024); >>>>>>>>> + if (delta >= 1024) { >>>>>>>>> + VLOG_INFO("bond %s: shift %"PRIu64"kB of load (with hash >>>>>>>>> %"PRIdPTR") " >>>>>>>>> + "from %s to %s (now carrying %"PRIu64"kB and " >>>>>>>>> + "%"PRIu64"kB load, respectively)", >>>>>>>>> + bond->name, delta / 1024, hash - bond->hash, >>>>>>>>> + from->name, to->name, >>>>>>>>> + (from->tx_bytes - delta) / 1024, >>>>>>>>> + (to->tx_bytes + delta) / 1024); >>>>>>>>> + } else { >>>>>>>>> + VLOG_INFO("bond %s: shift %"PRIu64"B of load (with hash >>>>>>>>> %"PRIdPTR") " >>>>>>>>> + "from %s to %s (now carrying %"PRIu64"kB and " >>>>>>> Apart from Ilya's comment, missing one more indentation space in this >>>>>>> line (and all others). >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>>>> + "%"PRIu64"kB load, respectively)", >>>>>>>>> + bond->name, delta, hash - bond->hash, >>>>>>>>> + from->name, to->name, >>>>>>>>> + (from->tx_bytes - delta) / 1024, >>>>>>>>> + (to->tx_bytes + delta) / 1024); >>>>>>>>> + } >>>>>>>> I'd suggest instead of copying the whole thing, just replace "kB" >>>>>>>> with another %s and use a couple of ternary operators to produce >>>>>>>> correct value and "kB" or "B" depending on the delta. >>>>>>>> >>>>>>>> Best regards, Ilya Maximets. >>>>>>>> _______________________________________________ >>>>>>>> dev mailing list >>>>>>>> d...@openvswitch.org >>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> dev mailing list >>>>>>> d...@openvswitch.org >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev -- Regards, Vladislav Odintsov _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev