Hi, Eelco and Ilya

It has been a long time and I see there are a lot fixes on the
revdalidator's code about this statistics code.
Has this stats inconsistent issue been solved?

I just realize that making dump seq equal in *try_ukey_replace* is not
enough.
we might need to clear the old ukey's stats.

since we only reuse the old ukey in EVICTED states, which means we have
executed delete-megaflow-op on this ukey and
the stats info has been pushed.

so if there are new megaflow upcalls and replaces the existing ukey's, the
ukey->stats should be cleared
to sync with the new megaflows.

if not, when revaldiator tries to delete this ukey, the push_dp_ops will
find inconsistent stats between ukey->stats( as it's synced with
old megaflow) and new magaflow, which is just the case you describe here.





Peng He <[email protected]> 于2022年5月21日周六 13:02写道:

> Hi, Eelco and Ilya,
>
> We have observed the similar case in our production environments.
>
> After digging some code, I would like a discussion:
>
> In the situation that the ovs revalidator is under high load, i.e. the
> number of megaflows are too large,
> and the revalidators would try to delete some megaflows, at the same time,
> there are traffic keeps
> generating new megaflows through upcalls.
>
> the underlying cmap of megaflows changes due to a lot of deletion and
> insertion (cmap will change its bucket number, capacity,
> and re-place the megaflows in different buckets), resulting in a same
> magaflow might be dumped twice in two revalidator threads in one dump stage.
>
> Consider a case:
>
> revalidator 1 gets the megaflow A dumped and its related ukey u(A), and
> decide to kill the megaflow A.
> After megaflow deletion, the pmd generates a new megaflow B with the same
> ufid, i.e. the same match
> and the same actions, and replace the ukey u(A) with u(B).
>
> Now revalidator 2 gets the old megaflow A again, and it will find the ukey
> u(B).
> u(B) is the new ukey, it's dump_seq is 0, and this leads megaflow A is
> viewed as megaflow B, and its
> statistics are contributed into ukey(B).
>
> The megaflow A and megaflow B is in essential the same one, the only
> difference is the statistics value.
> This mismatch of megaflow and ukey has two side effects:
>
> 1) the megaflow A's statistics have been contributed twice, leading to a
> amplified value of openflow rule's
> statistics. This is our observation in our environments.
>
> 2) revalidator 2 might also decides to kill ukey(B), but now its
> statistics equals to megaflow A, and thus
> results in a mismatch value in the email that Eelco observed.
>
> To fix the case 1), I think, during the ukey replace, the new ukey
> generated by upcall should take the dump_seq
> value of the replaced old key, and avoid the old megaflow being dumped and
> find the new key.
>
> and the case 2) is also fixed.
>
> Any thoughts?
>
>
> Eelco Chaudron <[email protected]> 于2022年2月22日周二 16:10写道:
>
>>
>>
>> On 21 Feb 2022, at 12:29, Eelco Chaudron wrote:
>>
>> > On 17 Feb 2022, at 14:10, Ilya Maximets wrote:
>> >
>> >> On 1/31/22 11:54, Eelco Chaudron wrote:
>> >>> Make sure to only update packet and byte counters when valid,
>> >>> or else this could lead to "temporarily/occasionally"
>> >>> out-of-sync flow counters.
>> >>
>> >> There was already the same patch submitted here:
>> >>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
>> >>
>> >> And I'm still not comfortable with the change, because it seems
>> >> like it only hides the underlying datapath problem.  Do you know
>> >> why exactly datapath stats become lower than previously reported?
>> >>
>> >> If it's some kind of a statistics flush, that will mean that flow
>> >> statistics will not be updated until new stats will catch up to the
>> >> old value leading to the flow revalidation and incorrect flow stats
>> >> anyway.
>> >
>> > This was very hard to reproduce, only once out of 20-30 runs if I
>> remember correctly.
>> >
>> > Without the patch, it would sometimes show very high numbers and then
>> got updated after a while with the correct numbers.
>> > At least this is what I remember, as I did not take any notes, and my
>> brain wanted to forget this patchset :)
>> >
>> >
>> > Guess the fix is needed anyway as this behavior was there since day one
>> in revalidate_ukey(), e79a6c833.
>> >
>> > I also see that you got no reply to your comments, so I’ll take another
>> stab to make sure this patch is really fixing the problem, or fixing a
>> problem, and hiding another TC problem.
>> >
>> >> We fixed incorrect stats on flow modification for dpdk offload provider
>> >> previously here:
>> >>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
>> >>
>> >> Do we need something similar for tc?
>> >
>> > I’ll take a close look at the DPDK patch, once I get my reproducer
>> going.
>>
>> Unfortunately, after an afternoon and night or running tests, I can not
>> replicate the problem with the weird counters :(
>>
>> So for now I’ll drop this patch from the set, and hopefully, it will
>> resurface when I’m integrating the system-traffic tests into the hardware
>> offload set.
>>
>> <SNIP>
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
>
> --
> hepeng
>


-- 
hepeng
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to