On 5/17/23 13:09, Eelco Chaudron wrote:
> 
> 
> On 31 Mar 2023, at 15:19, Eelco Chaudron wrote:
> 
>> On 31 Mar 2023, at 15:15, Ilya Maximets wrote:
>>
>>> On 3/31/23 15:06, Eelco Chaudron wrote:
>>>>
>>>>
>>>> On 31 Mar 2023, at 12:38, Simon Horman wrote:
>>>>
>>>>> On Fri, Mar 31, 2023 at 12:05:09PM +0200, Ilya Maximets wrote:
>>>>>> On 3/31/23 11:07, Simon Horman wrote:
>>>>>>> On Thu, Mar 30, 2023 at 09:04:02PM +0200, Ilya Maximets wrote:
>>>>>>>> On 3/30/23 11:45, Simon Horman wrote:
>>>>>>>>> On Fri, Mar 17, 2023 at 09:47:36PM +0100, Eelco Chaudron wrote:
>>>>>>>>>>> Op 17 mrt. 2023 om 21:11 heeft Marcelo Ricardo Leitner 
>>>>>>>>>>> <[email protected]> het volgende geschreven:
>>>>>>>>>>> Thu, Mar 16, 2023 at 09:51:34AM +0100, Eelco Chaudron wrote:
>>>>>>>>>>>>> On 22 Dec 2022, at 13:32, Eelco Chaudron wrote:
>>>>>>>>>>>>>> On 22 Dec 2022, at 10:26, Balazs Nemeth wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The only way that stats->{n_packets,n_bytes} would decrease is 
>>>>>>>>>>>>>> due to an
>>>>>>>>>
>>>>>>>>> nit: s/way/ways/
>>>>>
>>>>>>>>>>>>>> overflow, or if there are bugs in how statistics are handled. In 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> past, there were multiple bugs that caused a jump backward. A
>>>>>>>>>>>>>> workaround was in place to set the statistics to 0 in that case. 
>>>>>>>>>>>>>> When
>>>>>>>>>>>>>> this happened while the revalidator was under heavy load, the 
>>>>>>>>>>>>>> workaround
>>>>>>>>>>>>>> had an unintended side effect where should_revalidate returned 
>>>>>>>>>>>>>> false
>>>>>>>>>>>>>> causing the flow to be removed because the metric it calculated 
>>>>>>>>>>>>>> was
>>>>>>>>>>>>>> based on a bogus value. Since many of those bugs have now been
>>>>>>>>>>>>>> identified and resolved, there is no need to set the statistics 
>>>>>>>>>>>>>> to 0. In
>>>>>>>>>>>>>> addition, the (unlikely) overflow still needs to be handled
>>>>>>>>>>>>>> appropriately.
>>>>>>>>>
>>>>>>>>> 1. Perhaps it would be prudent to log/count if/when this occurs
>>>>>>>>
>>>>>>>> +1
>>>>>>>> We do have a coverage counter that will indicate the case where stats
>>>>>>>> jump back.  However, if we're certain that this should never happen,
>>>>>>>> we should, probably, emit a warning or even an error log as well, so
>>>>>>>> users are aware that something went wrong.
>>>>>>>
>>>>>>> I was thinking more of a counter, which seems to already be covered.
>>>>>>> But I have no objection to your reasoning about having a warning (too).
>>>>>>>
>>>>>>>>
>>>>>>>>> 2. I take it that the overflow handling would be follow-up work,
>>>>>>>>>    is that correct?
>>>>>>>>
>>>>>>>> The unsigned arithmetic should take case of overflowed counters,
>>>>>>>> because the result of subtraction will still give a correct difference
>>>>>>>> between the old and a new value, even if it overflowed and the new
>>>>>>>> value is smaller.  Unless, of course, it overflowed more than once.
>>>>>>>
>>>>>>> More than once between samples?
>>>>>>> If so, I'm assuming that is not a case we can hit unless there is a bug.
>>>>>>
>>>>>> Right.  It's actually should be practically not possible to overflow
>>>>>> even once with a current hardware.  Assuming we have a fancy 400 Gbps
>>>>>> NIC, then it should take 11.7 years to overflow a byte counter.
>>>>>>
>>>>>> So, this patch is mostly removing a workaround for some bug that we
>>>>>> hope we fixed.  But it's not clear what the original bug was as the
>>>>>> commit message for this workaround didn't specify a root cause.  So,
>>>>>> it's hard to say if it's fixed or not.  And that's why I'm thinking
>>>>>> that the error message is needed.
>>>>>
>>>>> Yes, I agree that is prudent.
>>>>
>>>> If we do add a log message, we should be careful as it could still be a 
>>>> wrap (for bytes). For packets, it’s not very likely with the current 
>>>> speeds 400G, will be around 980 years… For bytes, we can wrap easily.
>>>>
>>>> So I would suggest it only for packet count…
>>>>
>>>
>>> We already only use a packet counter for the 'ukey_invalid_stat_reset' 
>>> coverage.
>>> So, I suppose, we can just add the log under the same condition.
>>>
>>> OTOH, Don't we check that the difference is within 3/4 of 64-bit range?
>>> It should still take many years to overflow the byte counter.
>>
>> Yes, I was looking at this on my review directory which did not have my 
>> patches (or latest master).
>>
>> So just adding it to the ‘ukey_invalid_stat_reset’ if() case will work :)
> 
> 
> Balasz are you planning to send out a v6 for this patch? If I understand Ilya 
> correctly all that needs to change is adding a log to the section below, Ilya?

Yep.  Some rate-limited warning or error should be fine.

> 
> if (stats->n_packets < ukey->stats.n_packets &&
>     ukey->stats.n_packets < UINT64_THREE_QUARTERS) {
>     /* Report cases where the packet counter is lower than the previous
>      * instance, but exclude the potential wrapping of an uint64_t. */
>     COVERAGE_INC(ukey_invalid_stat_reset);
> 
> ++ ADD LOG HERE…
> }
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to