On 31 Mar 2023, at 15:19, Eelco Chaudron wrote:
> On 31 Mar 2023, at 15:15, Ilya Maximets wrote: > >> On 3/31/23 15:06, Eelco Chaudron wrote: >>> >>> >>> On 31 Mar 2023, at 12:38, Simon Horman wrote: >>> >>>> On Fri, Mar 31, 2023 at 12:05:09PM +0200, Ilya Maximets wrote: >>>>> On 3/31/23 11:07, Simon Horman wrote: >>>>>> On Thu, Mar 30, 2023 at 09:04:02PM +0200, Ilya Maximets wrote: >>>>>>> On 3/30/23 11:45, Simon Horman wrote: >>>>>>>> On Fri, Mar 17, 2023 at 09:47:36PM +0100, Eelco Chaudron wrote: >>>>>>>>>> Op 17 mrt. 2023 om 21:11 heeft Marcelo Ricardo Leitner >>>>>>>>>> <[email protected]> het volgende geschreven: >>>>>>>>>> Thu, Mar 16, 2023 at 09:51:34AM +0100, Eelco Chaudron wrote: >>>>>>>>>>>> On 22 Dec 2022, at 13:32, Eelco Chaudron wrote: >>>>>>>>>>>>> On 22 Dec 2022, at 10:26, Balazs Nemeth wrote: >>>>>>>>>>>> >>>>>>>>>>>>> The only way that stats->{n_packets,n_bytes} would decrease is >>>>>>>>>>>>> due to an >>>>>>>> >>>>>>>> nit: s/way/ways/ >>>> >>>>>>>>>>>>> overflow, or if there are bugs in how statistics are handled. In >>>>>>>>>>>>> the >>>>>>>>>>>>> past, there were multiple bugs that caused a jump backward. A >>>>>>>>>>>>> workaround was in place to set the statistics to 0 in that case. >>>>>>>>>>>>> When >>>>>>>>>>>>> this happened while the revalidator was under heavy load, the >>>>>>>>>>>>> workaround >>>>>>>>>>>>> had an unintended side effect where should_revalidate returned >>>>>>>>>>>>> false >>>>>>>>>>>>> causing the flow to be removed because the metric it calculated >>>>>>>>>>>>> was >>>>>>>>>>>>> based on a bogus value. Since many of those bugs have now been >>>>>>>>>>>>> identified and resolved, there is no need to set the statistics >>>>>>>>>>>>> to 0. In >>>>>>>>>>>>> addition, the (unlikely) overflow still needs to be handled >>>>>>>>>>>>> appropriately. >>>>>>>> >>>>>>>> 1. Perhaps it would be prudent to log/count if/when this occurs >>>>>>> >>>>>>> +1 >>>>>>> We do have a coverage counter that will indicate the case where stats >>>>>>> jump back. However, if we're certain that this should never happen, >>>>>>> we should, probably, emit a warning or even an error log as well, so >>>>>>> users are aware that something went wrong. >>>>>> >>>>>> I was thinking more of a counter, which seems to already be covered. >>>>>> But I have no objection to your reasoning about having a warning (too). >>>>>> >>>>>>> >>>>>>>> 2. I take it that the overflow handling would be follow-up work, >>>>>>>> is that correct? >>>>>>> >>>>>>> The unsigned arithmetic should take case of overflowed counters, >>>>>>> because the result of subtraction will still give a correct difference >>>>>>> between the old and a new value, even if it overflowed and the new >>>>>>> value is smaller. Unless, of course, it overflowed more than once. >>>>>> >>>>>> More than once between samples? >>>>>> If so, I'm assuming that is not a case we can hit unless there is a bug. >>>>> >>>>> Right. It's actually should be practically not possible to overflow >>>>> even once with a current hardware. Assuming we have a fancy 400 Gbps >>>>> NIC, then it should take 11.7 years to overflow a byte counter. >>>>> >>>>> So, this patch is mostly removing a workaround for some bug that we >>>>> hope we fixed. But it's not clear what the original bug was as the >>>>> commit message for this workaround didn't specify a root cause. So, >>>>> it's hard to say if it's fixed or not. And that's why I'm thinking >>>>> that the error message is needed. >>>> >>>> Yes, I agree that is prudent. >>> >>> If we do add a log message, we should be careful as it could still be a >>> wrap (for bytes). For packets, it’s not very likely with the current speeds >>> 400G, will be around 980 years… For bytes, we can wrap easily. >>> >>> So I would suggest it only for packet count… >>> >> >> We already only use a packet counter for the 'ukey_invalid_stat_reset' >> coverage. >> So, I suppose, we can just add the log under the same condition. >> >> OTOH, Don't we check that the difference is within 3/4 of 64-bit range? >> It should still take many years to overflow the byte counter. > > Yes, I was looking at this on my review directory which did not have my > patches (or latest master). > > So just adding it to the ‘ukey_invalid_stat_reset’ if() case will work :) Balasz are you planning to send out a v6 for this patch? If I understand Ilya correctly all that needs to change is adding a log to the section below, Ilya? if (stats->n_packets < ukey->stats.n_packets && ukey->stats.n_packets < UINT64_THREE_QUARTERS) { /* Report cases where the packet counter is lower than the previous * instance, but exclude the potential wrapping of an uint64_t. */ COVERAGE_INC(ukey_invalid_stat_reset); ++ ADD LOG HERE… } _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
