On 1/4/23 10:43, Eelco Chaudron wrote:
>
>
> On 3 Jan 2023, at 21:03, Ilya Maximets wrote:
>
>> On 1/3/23 20:42, William Zhao wrote:
>>> Hi Ilya and Eelco,
>>>
>>> I have seen udpif->dump_duration in revalidate_ukey() go as high as
>>> ~2 seconds, but the dump duration is very random and sporadic. Most
>>> likely a function of how busy the CPU is.
>>>
>>> We have tried a max-revalidator time of 1 second and we saw
>>> connectivity losses, presumably due to the flows being deleted in the
>>> "else" case for the should_revalidate() call in the revalidate_ukey()
>>> function.
>>>
>>> Taking the Nvidia MLX driver as an example:
>>> https://elixir.bootlin.com/linux/v5.14/source/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c#L40
>>> We can see that they poll for stats every 1 second, but the poll can
>>> definitely be delayed. So the update period of hardware stats is at
>>> minimum every 1 second for MLX NICs.
>>
>> What it tells me is that stats are not the problem here.
>> The problem is a very long dump duration. If we're talking any values
>> above one second for a dump duration, we're very likely talking about
>> hitting a dynamic datapath flow limit. This will affect the max-idle
>> configuration or even trigger the 'kill_them_all' condition.
>>
>> How many datapath flows does that system have installed? Does the
>> 'function of how busy the CPU is' mean that something else is running
>> on the same cores?
>>
>> And if it takes 2 seconds to just dump flows, setting max-revalidator
>> to 4 seconds should not be a huge deal.
>>
>> max-revalidator of 1 second is clearly not enough, as it should be
>> at least twice as high as the expected dump duration.
>
> Hi Ilya,
>
> Statistics are definitely the problem in this case, and I verified
> that in William's setup. They might have seen +1 second dump times
> previously; however, those were not responsible for the flow deletes.
> It was the flow statistics not being synced yet.
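The min-revalidate-pps check the thread keeps coming back to boils down to simple integer arithmetic. Below is a minimal standalone sketch of that check, not the actual OVS code; the function and parameter names are made up for illustration, and treating a value of 0 as "check disabled" is an assumption here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: returns true if the flow's observed packet rate is above
 * min_revalidate_pps, i.e. the flow is busy enough to keep without a
 * full revalidation.  'duration_ms' is the time since the flow's
 * statistics were last updated ('now - used' in OVS terms). */
static bool
above_min_revalidate_pps(uint64_t packets, long long duration_ms,
                         unsigned int min_revalidate_pps)
{
    if (!min_revalidate_pps) {
        return true;  /* Assumed semantics: 0 disables the check. */
    }
    if (!packets || duration_ms <= 0) {
        return false; /* No traffic observed in the window. */
    }
    /* Milliseconds per packet; a smaller value means a higher rate. */
    long long metric = duration_ms / packets;
    return metric < 1000 / min_revalidate_pps;
}
```

This also illustrates the failure mode under discussion: if hardware counters lag by up to 2 seconds, a busy offloaded flow can report zero new packets for the whole window and fail this check even though traffic is flowing.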
>
> The part that calculates the pps rate has a flaw, as it assumes the
> retrieved statistics are always accurate, which is not the case for
> hw-offloaded flows. This is what this patch is trying to fix. So if we
> do not like the static 2s delay, we could add a configurable delay,
> but we need a fix for this specific problem.
>
> For now, a small spike of more than 1/2 max-revalidator could cause
> all HW-offloaded flows to be removed. And although we could change
> max-revalidator to 4 seconds, I do not see this as a final solution,
> as it affects statistics accuracy, including that of non-offloaded
> flows.
>
> Note that in the field I've seen a lot of setups that have occasional
> spikes of more than 1/2 max-revalidator, and this should not be a
> problem. Having to tweak OVS for all these cases so that HW-offloaded
> flows do not time out seems a bit too much.
>
> I guess William can debug why he has the occasional +1s spikes (and
> whether those spikes are responsible, by dumping the last duration
> values), but that does not solve the root cause of the flow deletes,
> which is the statistics not being correctly used.
>
> Cheers,
>
> Eelco
>
>>> On Tue, Jan 3, 2023 at 2:22 PM Ilya Maximets <[email protected]> wrote:
>>>>
>>>> On 1/3/23 13:58, Eelco Chaudron wrote:
>>>>>
>>>>>
>>>>> On 3 Jan 2023, at 13:36, Ilya Maximets wrote:
>>>>>
>>>>>> On 1/3/23 13:26, Eelco Chaudron wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 3 Jan 2023, at 13:20, Ilya Maximets wrote:
>>>>>>>
>>>>>>>> On 1/3/23 12:33, Eelco Chaudron wrote:
>>>>>>>>> Depending on the driver implementation, it can take up to 2 seconds
>>>>>>>>> before offloaded flow statistics are updated. This is causing a
>>>>>>>>> problem with min-revalidate-pps, as old statistic values are used
>>>>>>>>> during this period.
>>>>>>>>>
>>>>>>>>> This fix will wait for at least 2 seconds before assuming no packets
>>>>>>>>> were received during this period.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> Changes:
>>>>>>>>>  - v2: Use existing ukey->offloaded so the revalidate_missed_dp_flow
>>>>>>>>>        case is also covered.
>>>>>>>>>
>>>>>>>>>  ofproto/ofproto-dpif-upcall.c | 25 +++++++++++++++----------
>>>>>>>>>  1 file changed, 15 insertions(+), 10 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>> index ad9635496..c395adbc6 100644
>>>>>>>>> --- a/ofproto/ofproto-dpif-upcall.c
>>>>>>>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>> @@ -2094,10 +2094,11 @@ ukey_delete(struct umap *umap, struct udpif_key *ukey)
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>>  static bool
>>>>>>>>> -should_revalidate(const struct udpif *udpif, uint64_t packets,
>>>>>>>>> -                  long long int used)
>>>>>>>>> +should_revalidate(const struct udpif *udpif, struct udpif_key *ukey,
>>>>>>>>> +                  uint64_t packets)
>>>>>>>>>  {
>>>>>>>>>      long long int metric, now, duration;
>>>>>>>>> +    long long int used = ukey->stats.used;
>>>>>>>>>
>>>>>>>>>      if (!used) {
>>>>>>>>>          /* Always revalidate the first time a flow is dumped. */
>>>>>>>>> @@ -2124,8 +2125,12 @@ should_revalidate(const struct udpif *udpif, uint64_t packets,
>>>>>>>>>      duration = now - used;
>>>>>>>>>      metric = duration / packets;
>>>>>>>>>
>>>>>>>>> -    if (metric < 1000 / ofproto_min_revalidate_pps) {
>>>>>>>>> -        /* The flow is receiving more than min-revalidate-pps, so keep it. */
>>>>>>>>> +    if (metric < 1000 / ofproto_min_revalidate_pps ||
>>>>>>>>> +        (ukey->offloaded && duration < 2000)) {
>>>>>>>>> +        /* The flow is receiving more than min-revalidate-pps, so keep it.
>>>>>>>>> +         * Or it's a hardware offloaded flow that might take up to 2 seconds
>>>>>>>>> +         * to update its statistics. Until we are sure the statistics had a
>>>>>>>>> +         * chance to be updated, also keep it. */
>>>>>>>>
>>>>>>>> Hmm.
>>>>>>>> If you know that a flow dump is taking a long time on this setup,
>>>>>>>> shouldn't we just bump the max-revalidator value in the database
>>>>>>>> instead of hardcoding yet another magic constant?
>>>>>>>
>>>>>>> I don't think increasing the max-revalidator value is a good
>>>>>>> solution, as it affects statistics gathering in general.
>>>>>>
>>>>>> It affects the judgement on the dump duration and the
>>>>>> time when revalidators will wake up next time. If your
>>>>>> dump duration is already high, your statistics gathering
>>>>>> is already delayed. And your hardware doesn't seem to
>>>>>> provide statistics in any timely manner either.
>>>>>
>>>>> Well, this is true if the delay is constant, but this is from the
>>>>> last run, and not all runs take the same amount of time :( Also,
>>>>> there is a mix of hardware and kernel flows, where kernel flows do
>>>>> provide real-time statistics vs. up to 2 seconds of delay with HW
>>>>> offload.
>>>>>
>>>>> Take the following example: we just added some flows, which will
>>>>> trigger multiple updates back to back. So let's say it takes 251ms
>>>>> to do the update (way under the 1.x seconds at which the hardware
>>>>> counter is updated).
>>>>>
>>>>> The second iteration would purge all HW-offloaded flows, as no
>>>>> counter was updated in the 251ms window :(
>>>>>
>>>>> To solve this we could set the max-revalidator time to 4 seconds,
>>>>> but this would negatively affect statistics gathering for all
>>>>> flows. And maybe this might also have some other side effects in
>>>>> existing configurations.
4 seconds is only needed if the dump takes 2 seconds. If it's 251ms,
504ms will do. But, anyway... I'm not saying that we don't have an
issue with statistics. We have 2 separate issues here: dump duration
and statistics update frequency. I'm just trying to figure out which is
the primary and which is the secondary contributor to the flow deletion
issues that you observe.

My point is that if the dump durations weren't that unreasonably high,
we would not see flow deletions. OTOH, if statistics updates weren't
delayed, we would not see the issue either. So, I propose a draw. :)

We know the root cause of the statistics delay (driver), and we don't
know the root cause of the long dump duration. So we need to figure
that out. It would still be great to know how many datapath flows we're
dealing with here.

'dump_duration' is a misleading name though, as it is the total
revalidation time including the sweep and all other phases. So, the
'/ 2' part of the check in should_revalidate() doesn't seem to make a
lot of sense. Removing it might help with the situation a bit.

If the dump durations are too inconsistent, we could replace that
metric with an exponential moving average of the last N runs
(lib/mov-avg.h).

Also, we could go the other way and just allow users to disable that
"optimization" by setting min-revalidate-pps to zero (disabled). This
is what Han proposed here:
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
I'm guessing that it's very similar in its roots, or even the exact
same issue, that you both are trying to solve.

What do you think?

>>>>
>>>> If the dump takes 251ms, you only need to bump max-revalidator to
>>>> 504ms, not 4 seconds. We should not mix up the time it takes to
>>>> update the statistics and the dump duration.
>>>>
>>>> A flow being revalidated doesn't mean it will be deleted. The flow
>>>> will be deleted during revalidation only if it is actually incorrect.
>>>>
>>>> If it is idle for 10 seconds, it will be deleted without even
>>>> considering revalidation. This is controlled by the max-idle config.
>>>>
>>>> In your case the flows didn't reach max-idle yet, but were removed
>>>> *without* revalidation. This happens because OVS thinks that it's
>>>> too expensive to trigger revalidation, based on the flow dump
>>>> duration. This is controlled by the max-revalidator config.
>>>>
>>>> If your actual dump takes 251ms, but max-revalidator is 504ms, the
>>>> flow should not be removed even if it has zero stats for the next
>>>> few seconds, as long as the flow itself is correct.
>>>>
>>>> That's why we need to know what the actual dump duration is here.
>>>>
>>>>>
>>>>>>> It also does not solve the problem, as any update in the flow
>>>>>>> table/config can trigger a revalidation.
>>>>>>
>>>>>> It will, but there is a check right above the one that
>>>>>> you're changing, and it looks like this:
>>>>>>
>>>>>>     if (udpif->dump_duration < ofproto_max_revalidator / 2) {
>>>>>>         /* We are likely to handle full revalidation for the flows. */
>>>>>>         return true;
>>>>>>     }
>>>>>>
>>>>>> So, by bumping the max-revalidator value we will avoid
>>>>>> blind deletion and will properly revalidate the flow.
>>>>>> The flow will not be deleted then, unless we hit max-idle.
>>>>>
>>>>> See above.
>>>>>
>>>>>> Or am I missing something?
>>>>>>
>>>>>>>
>>>>>>> If we do not like the magic static values, we can add a
>>>>>>> configuration option so it can be adjusted based on the driver
>>>>>>> implementation.
>>>>>>>
>>>>>>>> How long does the dump actually take? And how many flows are in it?
>>>>>>>
>>>>>>> William, do you have these values? I did not capture any of this
>>>>>>> when debugging your setup.
>>>>>>>
>>>>>>> //Eelco
>>>>>>>
>>>>>
>>>>
>>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
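The exponential-moving-average idea suggested above for smoothing dump durations (lib/mov-avg.h) can be sketched as follows. This is only an illustration of the underlying arithmetic under assumed names, not the actual lib/mov-avg.h API:

```c
/* Minimal exponential moving average over duration samples (sketch). */
struct ema {
    double avg;        /* Current smoothed value. */
    double alpha;      /* Weight of the newest sample, 0 < alpha <= 1. */
    int initialized;   /* First sample seeds the average directly. */
};

static void
ema_init(struct ema *e, double alpha)
{
    e->avg = 0.0;
    e->alpha = alpha;
    e->initialized = 0;
}

static double
ema_update(struct ema *e, double sample)
{
    if (!e->initialized) {
        e->avg = sample;
        e->initialized = 1;
    } else {
        e->avg = e->alpha * sample + (1.0 - e->alpha) * e->avg;
    }
    return e->avg;
}
```

With alpha = 0.25, a run of 251ms dumps followed by one 2000ms spike only moves the smoothed value to about 688ms, so a single outlier would no longer flip the should_revalidate() decision the way a raw last-run dump_duration does.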
