On 12/21/23 09:20, Eelco Chaudron wrote: > Add new coverage counters that might help debugging flow > dump length related issues. > > Signed-off-by: Eelco Chaudron <[email protected]> > --- > ofproto/ofproto-dpif-upcall.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index cd71e3ee3..85f78675a 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -52,6 +52,9 @@ > > VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall); > > +COVERAGE_DEFINE(dump_duration_low); > +COVERAGE_DEFINE(dump_duration_medium); > +COVERAGE_DEFINE(dump_duration_high); > COVERAGE_DEFINE(dumped_duplicate_flow); > COVERAGE_DEFINE(dumped_inconsistent_flow); > COVERAGE_DEFINE(dumped_new_flow); > @@ -1039,11 +1042,15 @@ udpif_revalidator(void *arg) > udpif->dump_duration = duration; > if (duration > 2000) { > flow_limit /= duration / 1000; > + COVERAGE_INC(dump_duration_high); > } else if (duration > 1300) { > flow_limit = flow_limit * 3 / 4; > - } else if (duration < 1000 && > - flow_limit < n_flows * 1000 / duration) { > - flow_limit += 1000; > + COVERAGE_INC(dump_duration_medium); > + } else if (duration < 1000) { > + if (flow_limit < n_flows * 1000 / duration) { > + flow_limit += 1000; > + } > + COVERAGE_INC(dump_duration_low); > }
Hi, Eelco. Thanks for the change! Though I'm not sure we actually need all three of these. The main concern being a low counter. In my experience, counters that just continuously go up create a lot of confusion and panic among users, who do not fully understand what they mean. For example the memory allocation counter; nobody really knows what is the good or bad value or a grow rate for it, it just always goes up at a fairly high rate. And the 'dump_duration_low' counter name doesn't help much here, because something being low can mean either a bad or a good thing. E.g. low battery or something like that. In practice, I'm not sure if there is a lot of value of having this counter as it just tells how many revalidation cycles we had, which is not a very meaningful number either since every cycle can be different and can take different amount of time. I'd suggest not having this counter. For the high and medium, I'm also not sure why we need to distinguish them. Both mean that something is not good, and both can't actually tell what exactly is wrong. For the emphasis we have a warning in the log now that will differentiate the high from the medium, if necessary. Maybe we can go with a single counter with a more understandable name like 'upcall_flow_limit_reduced' ? Adding to a family of other flow limit counters - upcall_flow_limit_hit and upcall_flow_limit_kill. What do you think? Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
