On 12/21/23 09:20, Eelco Chaudron wrote:
> Add new coverage counters that might help debugging flow
> dump length related issues.
> 
> Signed-off-by: Eelco Chaudron <[email protected]>
> ---
>  ofproto/ofproto-dpif-upcall.c |   13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index cd71e3ee3..85f78675a 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -52,6 +52,9 @@
>  
>  VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);
>  
> +COVERAGE_DEFINE(dump_duration_low);
> +COVERAGE_DEFINE(dump_duration_medium);
> +COVERAGE_DEFINE(dump_duration_high);
>  COVERAGE_DEFINE(dumped_duplicate_flow);
>  COVERAGE_DEFINE(dumped_inconsistent_flow);
>  COVERAGE_DEFINE(dumped_new_flow);
> @@ -1039,11 +1042,15 @@ udpif_revalidator(void *arg)
>              udpif->dump_duration = duration;
>              if (duration > 2000) {
>                  flow_limit /= duration / 1000;
> +                COVERAGE_INC(dump_duration_high);
>              } else if (duration > 1300) {
>                  flow_limit = flow_limit * 3 / 4;
> -            } else if (duration < 1000 &&
> -                       flow_limit < n_flows * 1000 / duration) {
> -                flow_limit += 1000;
> +                COVERAGE_INC(dump_duration_medium);
> +            } else if (duration < 1000) {
> +                if (flow_limit < n_flows * 1000 / duration) {
> +                    flow_limit += 1000;
> +                }
> +                COVERAGE_INC(dump_duration_low);
>              }

Hi, Eelco.  Thanks for the change!
Though I'm not sure we actually need all three of these.

The main concern being a low counter.  In my experience, counters that
just continuously go up create a lot of confusion and panic among users,
who do not fully understand what they mean.  For example the memory
allocation counter; nobody really knows what is the good or bad value
or a grow rate for it, it just always goes up at a fairly high rate.

And the 'dump_duration_low' counter name doesn't help much here, because
something being low can mean either a bad or a good thing.
E.g. low battery or something like that.  In practice, I'm not sure if
there is a lot of value of having this counter as it just tells how many
revalidation cycles we had, which is not a very meaningful number either
since every cycle can be different and can take different amount of time.
I'd suggest not having this counter.

For the high and medium, I'm also not sure why we need to distinguish
them.  Both mean that something is not good, and both can't actually
tell what exactly is wrong.  For the emphasis we have a warning in the
log now that will differentiate the high from the medium, if necessary.

Maybe we can go with a single counter with a more understandable name
like 'upcall_flow_limit_reduced' ?  Adding to a family of other flow
limit counters - upcall_flow_limit_hit and upcall_flow_limit_kill.

What do you think?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to