On 8 Dec 2022, at 0:25, Balazs Nemeth wrote:

> Besides the unrealistic case of an overflow, stats->n_packets can become
> less than ukey->stats.n_packets only when incorrect statistics were
> read. For that reason, the condition that the code employed is a
> workaround to avoid large jumps in statistics simply by assuming no
> packets were read. However, if this happens while the revalidator is
> under heavy load, the workaround has an unintended side effect where
> should_revalidate returns false because the metric it calculates is
> based on a bogus value.
>
> Therefore, this patch makes it explicit that the workaround has been
> triggered and consequently skipping should_revalidate assuming it should
> have returned true. Note also that the workaround is only triggered
> if stats->n_packets is _strictly_ less than ukey->stats.n_packets. Not
> receiving any packets is a valid scenario where should_revalidate should
> not be skipped.

I think we should not commit any workarounds in the code but concentrate on 
fixing the real issue that could happen, which is the wrap-around of statistics.

> Signed-off-by: Balazs Nemeth <[email protected]>

Next time please keep a list of revisions with changes here.

> ---
>  ofproto/ofproto-dpif-upcall.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index ad9635496..f0078e3c1 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -2327,6 +2327,7 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
> *ukey,
>      OVS_REQUIRES(ukey->mutex)
>  {
>      bool need_revalidate = ukey->reval_seq != reval_seq;
> +    bool stats_workaround;

I guess the name of this stat does not really explain what we work around.
But I guess it might not be needed, see comments below.

>      enum reval_result result = UKEY_DELETE;
>      struct dpif_flow_stats push;
>
> @@ -2334,7 +2335,8 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
> *ukey,
>
>      push.used = stats->used;
>      push.tcp_flags = stats->tcp_flags;
> -    push.n_packets = (stats->n_packets > ukey->stats.n_packets
> +    stats_workaround = stats->n_packets < ukey->stats.n_packets;
> +    push.n_packets = (!stats_workaround
>                        ? stats->n_packets - ukey->stats.n_packets
>                        : 0);
>      push.n_bytes = (stats->n_bytes > ukey->stats.n_bytes

I guess the design goal was taken to reset the stats when a wrap happens, which 
if true we should not reset the stats to zero, but set them to the actual value.

So please update both packets and bytes to something like (not tested): 
UINT64_MAX - ukey->stats.n_packets + stats->n_packets + 1

> @@ -2342,7 +2344,8 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key 
> *ukey,
>                      : 0);
>
>      if (need_revalidate) {
> -        if (should_revalidate(udpif, push.n_packets, ukey->stats.used)) {
> +        if (stats_workaround ||
> +            should_revalidate(udpif, push.n_packets, ukey->stats.used)) {

If we calculate the actual packets as suggested above, we should no longer need 
the stats_workaround.

>              if (!ukey->xcache) {
>                  ukey->xcache = xlate_cache_new();
>              } else {

Note that we also need to do this wrap-around in the push_dp_ops().

Doing it this way should also not mask the potential issue with the large 
packet counts, as the statistics will still show them (being a bit less large).
See the discussion here: 
https://patchwork.ozlabs.org/project/openvswitch/patch/164362644549.2824752.16138250405530944551.stgit@ebuild/

In the meantime, I’ll try to re-visit the issue mentioned above to see if I can 
replicate it again…

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to