On 31 Mar 2023, at 12:38, Simon Horman wrote:
> On Fri, Mar 31, 2023 at 12:05:09PM +0200, Ilya Maximets wrote: >> On 3/31/23 11:07, Simon Horman wrote: >>> On Thu, Mar 30, 2023 at 09:04:02PM +0200, Ilya Maximets wrote: >>>> On 3/30/23 11:45, Simon Horman wrote: >>>>> On Fri, Mar 17, 2023 at 09:47:36PM +0100, Eelco Chaudron wrote: >>>>>>> Op 17 mrt. 2023 om 21:11 heeft Marcelo Ricardo Leitner >>>>>>> <[email protected]> het volgende geschreven: >>>>>>> Thu, Mar 16, 2023 at 09:51:34AM +0100, Eelco Chaudron wrote: >>>>>>>>> On 22 Dec 2022, at 13:32, Eelco Chaudron wrote: >>>>>>>>>> On 22 Dec 2022, at 10:26, Balazs Nemeth wrote: >>>>>>>>> >>>>>>>>>> The only way that stats->{n_packets,n_bytes} would decrease is due >>>>>>>>>> to an >>>>> >>>>> nit: s/way/ways/ > >>>>>>>>>> overflow, or if there are bugs in how statistics are handled. In the >>>>>>>>>> past, there were multiple bugs that caused a jump backward. A >>>>>>>>>> workaround was in place to set the statistics to 0 in that case. When >>>>>>>>>> this happened while the revalidator was under heavy load, the >>>>>>>>>> workaround >>>>>>>>>> had an unintended side effect where should_revalidate returned false >>>>>>>>>> causing the flow to be removed because the metric it calculated was >>>>>>>>>> based on a bogus value. Since many of those bugs have now been >>>>>>>>>> identified and resolved, there is no need to set the statistics to >>>>>>>>>> 0. In >>>>>>>>>> addition, the (unlikely) overflow still needs to be handled >>>>>>>>>> appropriately. >>>>> >>>>> 1. Perhaps it would be prudent to log/count if/when this occurs >>>> >>>> +1 >>>> We do have a coverage counter that will indicate the case where stats >>>> jump back. However, if we're certain that this should never happen, >>>> we should, probably, emit a warning or even an error log as well, so >>>> users are aware that something went wrong. >>> >>> I was thinking more of a counter, which seems to already be covered. >>> But I have no objection to your reasoning about having a warning (too). >>> >>>> >>>>> 2. I take it that the overflow handling would be follow-up work, >>>>> is that correct? >>>> >>>> The unsigned arithmetic should take case of overflowed counters, >>>> because the result of subtraction will still give a correct difference >>>> between the old and a new value, even if it overflowed and the new >>>> value is smaller. Unless, of course, it overflowed more than once. >>> >>> More than once between samples? >>> If so, I'm assuming that is not a case we can hit unless there is a bug. >> >> Right. It's actually should be practically not possible to overflow >> even once with a current hardware. Assuming we have a fancy 400 Gbps >> NIC, then it should take 11.7 years to overflow a byte counter. >> >> So, this patch is mostly removing a workaround for some bug that we >> hope we fixed. But it's not clear what the original bug was as the >> commit message for this workaround didn't specify a root cause. So, >> it's hard to say if it's fixed or not. And that's why I'm thinking >> that the error message is needed. > > Yes, I agree that is prudent. If we do add a log message, we should be careful as it could still be a wrap (for bytes). For packets, it’s not very likely with the current speeds 400G, will be around 980 years… For bytes, we can wrap easily. So I would suggest it only for packet count… _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
