On 18/06/2024 11:53, Eelco Chaudron wrote:
> 
> 
> On 18 Jun 2024, at 8:05, Roi Dayan wrote:
> 
>> On 03/06/2024 16:29, Eelco Chaudron wrote:
>>>
>>>
>>> On 3 Jun 2024, at 10:07, Roi Dayan wrote:
>>>
>>>> On 03/06/2024 10:18, Roi Dayan wrote:
>>>>>
>>>>>
>>>>> On 30/05/2024 18:48, Eelco Chaudron wrote:
>>>>>>
>>>>>>
>>>>>> On 23 May 2024, at 12:46, Roi Dayan via dev wrote:
>>>>>>
>>>>>>> It is observed in some environments that there are much more ukeys than
>>>>>>> actual DP flows. For example:
>>>>>>>
>>>>>>> $ ovs-appctl upcall/show
>>>>>>> system@ovs-system:
>>>>>>> flows : (current 7) (avg 6) (max 117) (limit 2125)
>>>>>>> offloaded flows : 525
>>>>>>> dump duration : 1063ms
>>>>>>> ufid enabled : true
>>>>>>>
>>>>>>> 23: (keys 3612)
>>>>>>> 24: (keys 3625)
>>>>>>> 25: (keys 3485)
>>>>>>>
>>>>>>> The revalidator threads are busy revalidating the stale ukeys leading to
>>>>>>> high CPU and long dump duration.
>>>>>>>
>>>>>>> There are some possible situations that may result in stale ukeys that
>>>>>>> have no corresponding DP flows.
>>>>>>>
>>>>>>> In revalidator, push_dp_ops() doesn't check error if the op type is not
>>>>>>> DEL. It is possible that a PUT(MODIFY) fails, especially for tc offload
>>>>>>> case, where the old flow is deleted first and then the new one is
>>>>>>> created. If the creation fails, the ukey will be stale (no corresponding
>>>>>>> DP flow). This patch adds a warning in such case.
>>>>>>
>>>>>> Not sure I understand, this behavior should be captured by the 
>>>>>> UKEY_INCONSISTENT state.
>>>>>
>>>>> Hi Eelco,
>>>>>
>>>>> thanks for reviewing.
>>>>> We started the patch on older branch as we didn't rebase for some time
>>>>> and got a little later on sending it.
>>>>> I see the addition now for UKEY_INCOSISTENT in the following patch:
>>>>>
>>>>> cf11766cbcf1 ofproto-dpif-upcall: Fix push_dp_ops to handle all errors.
>>>>>
>>>>> Looks like it handles the same situation we tried to handle in this patch.
>>>>>
>>>>>>
>>>>>>> Another possible scenario is in handle_upcalls() if a PUT operation did
>>>>>>> not succeed and op->error attribute was not set correctly it can lead to
>>>>>>> stale ukey in operational state.
>>>>>>
>>>>>>
>>>>>> Guess we need to figure out these cases, as these are the root cause of 
>>>>>> your problem.
>>>>>
>>>>> right. maybe. but this could help keep the system alive/clean while 
>>>>> logging the
>>>>> bad situation instead of having increase in those stale/inconsistent 
>>>>> ukeys.
>>>>> I understand if it's not nice to handle it like this.
>>>>>
>>>>
>>>> Hi Eelco,
>>>>
>>>> I remember now one of the reproduction scenarios we did is do some traffic
>>>> to get rules added using TC and then delete those from tc command line
>>>> and it got to stale ukeys.
>>>> The revalidator dump thread not seeing the rules so not updating anything
>>>> and sweep over the ukeys skipping them as well.
>>>> This is why we checked against the timing stats of the ukey.
>>>>
>>>> I remember I tested on the upstream code and not our development branch
>>>> and it reproduces. I didn't notice if the commit adding UKEY_INCONSISTENT
>>>> existed but it handles errors from adding flows so I dont think it matters.
>>>>
>>>> I'll try to check and verify again but I think it's still here.
>>>> So while cases getting dop.error handled with UKEY_INCONSISTENT,
>>>> this case I think is not.
>>>
>>> I think you are right this case is not covered by the UKEY_INCONSISTENT 
>>> test below. See my suggestion on fixing this in revalidate_ukey().
>>>
>>> Maybe you could also add a test case for this in the offload test suite.
>>>
>>> //Eelco
>>
>> Hi Eelco,
>>
>> Thanks for the review. I didn't have time to check this but wanted to
>> reply that it's in my queue to verify your suggestion and add a test.
> 
> Thanks for letting me know, and I understand as we are all busy :)
> 
> //Eelco
> 


Hi Eelco,

I tested your suggestion to move to against removing tc rules
from tc command I see it's helpful in cleaning those stale ukeys.
I have trouble doing a clean test for this for the testsuite.

At first I tested with modification in revalidator_sweep__() to
always set seq_mismatch to reach revalidate_ukey().
Later I moved to create a redundant veth and doing add/del loop
of it to the bridge to cause a seq mismatch. not sure this is
the clean way but without the change I get stale ukeys and with
it we get to the cleanup change and cleaning the ukeys.

I'll send v2 for now as a clean patch with only the relevant change
and cleaner commit msg and lets take it from there if my suggestion
here of the test is ok or something else or the test can be postponed
to a later time to think of a cleaner/right way to do it.

Thanks,
Roi
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to