On 1/4/23 22:58, Ilya Maximets wrote:
> On 1/4/23 10:43, Eelco Chaudron wrote:
>>
>>
>> On 3 Jan 2023, at 21:03, Ilya Maximets wrote:
>>
>>> On 1/3/23 20:42, William Zhao wrote:
>>>> Hi Ilya and Eelco,
>>>>
>>>> I have seen udpif->dump_duration in revalidate_ukey() go as high as
>>>> ~2 seconds, but the dump duration is very random and sporadic, most
>>>> likely a function of how busy the CPU is.
>>>>
>>>> We have tried a max-revalidator time of 1 second and we saw
>>>> connectivity losses, presumably due to the flows being deleted in
>>>> the "else" case of the should_revalidate() call in the
>>>> revalidate_ukey() function.
>>>>
>>>> Take the Nvidia MLX driver for instance:
>>>> https://ansymbol.com/linux/v5.14/source/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c#L40
>>>> We can see that it polls for stats every 1 second, though the
>>>> polling could definitely be delayed. So the update period of
>>>> hardware stats is at least 1 second for MLX NICs.
>>>
>>> What that tells me is that stats are not the problem here.
>>> The problem is a very long dump duration. If we're talking any
>>> values above one second for a dump duration, we're very likely
>>> talking about hitting a dynamic datapath flow limit. This will
>>> affect the max-idle configuration or even trigger the
>>> 'kill_them_all' condition.
>>>
>>> How many datapath flows does that system have installed? Does
>>> 'a function of how busy the CPU is' mean that something else is
>>> running on the same cores?
>>>
>>> And if it takes 2 seconds just to dump flows, setting
>>> max-revalidator to 4 seconds should not be a huge deal.
>>>
>>> A max-revalidator of 1 second is clearly not enough, as it should
>>> be at least twice as high as the expected dump duration.
>>
>> Hi Ilya,
>>
>> Statistics are definitely the problem in this case, and I verified
>> that in William's setup. They might have seen +1 second dump times
>> previously; however, those were not responsible for the flow
>> deletes. It was the flow statistics not being synced yet.
>>
>> The part that calculates the pps rate has a flaw: it assumes the
>> retrieved statistics are always accurate, which is not the case for
>> HW-offloaded flows. This is what this patch is trying to fix. So if
>> we do not like the static 2s delay, we could add a configurable
>> delay, but we need a fix for this specific problem.
>>
>> For now, a small spike of more than 1/2 max-revalidator could cause
>> all HW-offloaded flows to be removed. And although we could change
>> max-revalidator to 4 seconds, I do not see this as a final solution,
>> as it affects statistics accuracy, including that of non-offloaded
>> flows.
>>
>> Note that in the field I've seen a lot of setups that have
>> occasional spikes of more than 1/2 max-revalidator, and this should
>> not be a problem. And having to tweak OVS in all these cases to keep
>> HW-offloaded flows from timing out seems a bit too much.
>>
>> I guess William can debug why he has the occasional +1s spikes (and
>> whether those spikes are responsible, by dumping the last duration
>> values), but that does not solve the root cause of the flow deletes,
>> which is the statistics not being used correctly.
>>
>> Cheers,
>>
>> Eelco
>>
>>>> On Tue, Jan 3, 2023 at 2:22 PM Ilya Maximets <[email protected]> wrote:
>>>>>
>>>>> On 1/3/23 13:58, Eelco Chaudron wrote:
>>>>>>
>>>>>>
>>>>>> On 3 Jan 2023, at 13:36, Ilya Maximets wrote:
>>>>>>
>>>>>>> On 1/3/23 13:26, Eelco Chaudron wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3 Jan 2023, at 13:20, Ilya Maximets wrote:
>>>>>>>>
>>>>>>>>> On 1/3/23 12:33, Eelco Chaudron wrote:
>>>>>>>>>> Depending on the driver implementation, it can take up to
>>>>>>>>>> 2 seconds before offloaded flow statistics are updated. This
>>>>>>>>>> is causing a problem with min-revalidate-pps, as old statistic
>>>>>>>>>> values are used during this period.
>>>>>>>>>>
>>>>>>>>>> This fix will wait for at least 2 seconds before assuming no
>>>>>>>>>> packets were received during this period.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eelco Chaudron <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> Changes:
>>>>>>>>>>  - v2: Use existing ukey->offloaded so the
>>>>>>>>>>        revalidate_missed_dp_flow case is also covered.
>>>>>>>>>>
>>>>>>>>>>  ofproto/ofproto-dpif-upcall.c | 25 +++++++++++++++----------
>>>>>>>>>>  1 file changed, 15 insertions(+), 10 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>> index ad9635496..c395adbc6 100644
>>>>>>>>>> --- a/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>> @@ -2094,10 +2094,11 @@ ukey_delete(struct umap *umap, struct udpif_key *ukey)
>>>>>>>>>>  }
>>>>>>>>>>
>>>>>>>>>>  static bool
>>>>>>>>>> -should_revalidate(const struct udpif *udpif, uint64_t packets,
>>>>>>>>>> -                  long long int used)
>>>>>>>>>> +should_revalidate(const struct udpif *udpif, struct udpif_key *ukey,
>>>>>>>>>> +                  uint64_t packets)
>>>>>>>>>>  {
>>>>>>>>>>      long long int metric, now, duration;
>>>>>>>>>> +    long long int used = ukey->stats.used;
>>>>>>>>>>
>>>>>>>>>>      if (!used) {
>>>>>>>>>>          /* Always revalidate the first time a flow is dumped. */
>>>>>>>>>> @@ -2124,8 +2125,12 @@ should_revalidate(const struct udpif *udpif, uint64_t packets,
>>>>>>>>>>      duration = now - used;
>>>>>>>>>>      metric = duration / packets;
>>>>>>>>>>
>>>>>>>>>> -    if (metric < 1000 / ofproto_min_revalidate_pps) {
>>>>>>>>>> -        /* The flow is receiving more than min-revalidate-pps, so keep it. */
>>>>>>>>>> +    if (metric < 1000 / ofproto_min_revalidate_pps ||
>>>>>>>>>> +        (ukey->offloaded && duration < 2000)) {
>>>>>>>>>> +        /* The flow is receiving more than min-revalidate-pps, so keep it.
>>>>>>>>>> +         * Or it's a hardware offloaded flow that might take up to 2 seconds
>>>>>>>>>> +         * to update its statistics. Until we are sure the statistics had a
>>>>>>>>>> +         * chance to be updated, also keep it. */
>>>>>>>>>
>>>>>>>>> Hmm. If you know that a flow dump is taking a long time on this
>>>>>>>>> setup, shouldn't we just bump the max-revalidator value in the
>>>>>>>>> database instead of hardcoding yet another magic constant?
>>>>>>>>
>>>>>>>> I don't think increasing the max-revalidator value is a good
>>>>>>>> solution, as it affects the statistics gathering in general.
>>>>>>>
>>>>>>> It affects the judgement on the dump duration and the
>>>>>>> time when revalidators will wake up next time. If your
>>>>>>> dump duration is already high, your statistics gathering
>>>>>>> is already delayed. And your hardware doesn't seem to
>>>>>>> provide statistics in a timely manner either.
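To make concrete the flaw Eelco describes, here is a toy version of the
pps check that the patch above relaxes for offloaded flows. This is not
the actual OVS code; the helper name and structure are made up for
illustration, and only the 1000 / min-revalidate-pps threshold mirrors
should_revalidate():

    #include <stdbool.h>
    #include <stdint.h>

    #define MIN_REVALIDATE_PPS 5    /* OVS default for min-revalidate-pps. */

    /* Keep a flow only if it averages better than min-revalidate-pps,
     * i.e. under 1000 / 5 = 200 ms between packets. */
    static bool
    busy_enough(uint64_t packets, long long int duration_ms)
    {
        packets = packets ? packets : 1;    /* Same floor OVS applies. */
        return duration_ms / (long long int) packets
               < 1000 / MIN_REVALIDATE_PPS;
    }

    /* With hardware offload the driver may sync counters only every
     * 1-2 seconds.  A busy flow can therefore report 0 new packets and
     * a stale 'used' timestamp; duration_ms keeps growing, the
     * per-packet interval blows past 200 ms, and busy_enough() returns
     * false for a flow that is not idle at all.  Hence the
     * 'ukey->offloaded && duration < 2000' escape hatch in the patch. */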
>>>>>>
>>>>>> Well, this is true if the delay is constant, but this is from the
>>>>>> last run, and not all runs take the same amount of time :( Also,
>>>>>> there is a mix of hardware and kernel flows, where kernel flows do
>>>>>> provide real-time statistics vs. the up-to-2-second delay of HW
>>>>>> offload.
>>>>>>
>>>>>> Take the following example; we just added some flows, which will
>>>>>> trigger multiple updates back to back. So let's say it takes 251ms
>>>>>> to do the update (way under the 1.x seconds it takes the hardware
>>>>>> counters to be updated).
>>>>>>
>>>>>> The second iteration would purge all HW-offloaded flows, as no
>>>>>> counter was updated in the 251ms window :(
>>>>>>
>>>>>> To solve this we could set the max-revalidator time to 4 seconds,
>>>>>> but this would negatively affect all statistics gathering for
>>>>>> flows, and it might also have some other side effects in existing
>>>>>> configurations.
>
> 4 seconds is only needed if the dump takes 2 seconds. If it's 251ms,
> 504ms will do. But, anyway...
Please, ignore the sentence above; I posted a reply in the wrong part
of the email. :/

>
> I'm not saying that we don't have an issue with statistics. We do
> have 2 separate issues here - dump duration and statistics update
> frequency. Just trying to figure out which is the primary and which
> the secondary contributor to the flow deletion issues that you
> observe.
>
> My point is that if the dump durations weren't that unreasonably
> high, we would not see flow deletions. OTOH, if statistics updates
> weren't delayed, we would not see the issue either. So, I propose a
> draw. :)
>
> We know the root cause of the statistics delay (driver), and we don't
> know the root cause of the long dump duration. So we need to figure
> that out.
>
> Would still be great to know how many datapath flows we're dealing
> with here.
>
> 'dump_duration' is a misleading name though, as it is the total
> revalidation time, including the sweep and all other phases. So, the
> '/ 2' part of the check in should_revalidate() doesn't seem to make a
> lot of sense. Removing it might help with the situation a bit.
> If the dump durations are too inconsistent, we could replace that
> metric with an exponential moving average of the last N runs
> (lib/mov-avg.h); see the sketch further below.
>
> Also, we could go the other way and just allow users to disable that
> "optimization" by setting min-revalidate-pps to zero (disabled).
> This is what Han proposed here:
>
> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
>
> I'm guessing that it's very similar in its roots, or even the exact
> same issue, that you both are trying to solve.
>
> What do you think?
>
>>>>>
>>>>> If the dump takes 251ms, you only need to bump max-revalidator to
>>>>> 504ms, not 4 seconds. We should not mix up the time it takes to
>>>>> update the statistics and the dump duration.
>>>>>
>>>>> A flow being revalidated doesn't mean it will be deleted. The flow
>>>>> will be deleted during revalidation only if it is actually
>>>>> incorrect.
>>>>>
>>>>> If it is idle for 10 seconds, it will be deleted without even
>>>>> considering revalidation. This is controlled by the max-idle
>>>>> config.
>>>>>
>>>>> In your case flows didn't reach max-idle yet, but were removed
>>>>> *without* revalidation. This happens because OVS thinks that it's
>>>>> too expensive to trigger revalidation, based on the flow dump
>>>>> duration. This is controlled by the max-revalidator config.
>>>>>
>>>>> If your actual dump takes 251ms, but max-revalidator is 504ms, the
>>>>> flow should not be removed even if it has zero stats for the next
>>>>> few seconds. As long as the flow itself is correct.
>>>>>
>>>>> That's why we need to know what the actual dump duration is here.
>>>>>
>>>>>>
>>>>>>>> It also does not solve the problem, as any update in the flow
>>>>>>>> table/config can trigger a revalidation.
>>>>>>>
>>>>>>> It will, but there is a check right above the one that
>>>>>>> you're changing and it looks like this:
>>>>>>>
>>>>>>>     if (udpif->dump_duration < ofproto_max_revalidator / 2) {
>>>>>>>         /* We are likely to handle full revalidation for the flows. */
>>>>>>>         return true;
>>>>>>>     }
>>>>>>>
>>>>>>> So, by bumping the max-revalidator value we will avoid
>>>>>>> blind deletion and will properly revalidate the flow.
>>>>>>> The flow will not be deleted then, unless we hit max-idle.
>>>>>>
>>>>>> See above.
>>>>>>
>>>>>>> Or am I missing something?
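Coming back to the moving-average idea quoted above, it could look
roughly like the toy sketch below. OVS ships a real implementation in
lib/mov-avg.h; this is not its API, and the alpha value is just an
example:

    /* Exponentially weighted average of revalidation run durations. */
    struct dump_ema {
        double mean_ms;   /* Smoothed dump duration. */
        double alpha;     /* Weight of the newest sample, 0 < alpha <= 1. */
    };

    static void
    dump_ema_update(struct dump_ema *ema, double sample_ms)
    {
        ema->mean_ms = ema->alpha * sample_ms
                       + (1.0 - ema->alpha) * ema->mean_ms;
    }

With alpha = 0.125 and a steady 250 ms history, a single 2000 ms
outlier only moves the average to about 469 ms, so one slow run would
no longer push flows into the blind-delete path of should_revalidate().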
>>>>>>>
>>>>>>>>
>>>>>>>> If we do not like the magic static values, we can add a
>>>>>>>> configuration option, so it can be adjusted based on the driver
>>>>>>>> implementation.
>>>>>>>>
>>>>>>>>> How long does the dump actually take? And how many flows are
>>>>>>>>> in it?
>>>>>>>>
>>>>>>>> William, do you have these values, as I did not capture any of
>>>>>>>> this when debugging your setup.
>>>>>>>>
>>>>>>>> //Eelco
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>
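For reference, the dump duration and datapath flow counts asked about
above can be read at runtime with 'ovs-appctl upcall/show'. The numbers
below are placeholders, and the exact layout varies between OVS
versions:

    $ ovs-appctl upcall/show
    system@ovs-system:
      flows         : (current 1024) (avg 980) (max 4096) (limit 200000)
      dump duration : 251ms
      ufid enabled  : true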
