On 10/25/22 07:38, Ales Musil wrote:
> On Tue, Oct 25, 2022 at 5:51 AM Han Zhou <[email protected]> wrote:
> 
>>
>>
>> On Sun, Oct 23, 2022 at 10:21 PM Ales Musil <[email protected]> wrote:
>>>
>>>
>>>
>>> On Sun, Oct 23, 2022 at 10:16 PM Han Zhou <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Oct 21, 2022 at 1:37 AM Ales Musil <[email protected]> wrote:
>>>>>
>>>>> Run ovs_list_init on list_node from desired flow.
>>>>> Not doing that can potentially cause crash on
>>>>> ovs_assert(ovs_list_is_empty(&f->list_node))
>>>>> this might happen when we have conditional monitoring
>>>>> enabled and have some flow like below being processed
>>>>> by ofctrl in a single run.
>>>>>
>>>>> flowA is removed, flowA is added, flowA is removed.
>>>>>
>>>>
>>>
>>> Hi Han,
>>>
>>>>
>>>> Thanks Ales for the patch. First of all I agree this patch is not
>> harmful and is the right thing to do, so I am going to apply it.
>>>> At the same time, I still don't understand the root cause analysis
>> above and like you said, I am not sure if it would fix the bug. Two
>> questions here:
>>>
>>>
>>> The description is based on some assumptions and might not be the
>> correct way to get into that state.
>>>
>>>>
>>>> 1. How would it relate to conditional monitoring?
>>>
>>>
>>> It is more likely to receive multiple updates at once with conditional
>> monitoring or am I wrong?
>>>
>> I don't see a reason for this. I think it only depends on what changes
>> happened in SB and notified and pending to ovn-controller during an
>> iteration of ovn-controller control loop, which doesn't seem to have
>> relationship with conditional monitoring. Do you have more details on the
>> reasoning?
>>
> 
> It was based on the configuration, however it was confirmed that it still
> happens even with conditional monitoring disabled.
> 
> 
>>
>>>>
>>>> 2. For the example "flowA is removed, flowA is added, flowA is
>> removed", I assume flowA is "a desired OVS flow", not a logical flow,
>> correct? So when flowA is removed, it should be removed from the desired
>> flow table and added to the "tracked_flows". Then when flowA is added, it
>> would in fact be a newly allocated desired_flow, and no chance to find the
>> earlier tracked deleted flowA from the desired flow table. The 2nd flowA
>> deletion would find the 2nd flowA instance and destroy it, but the first
>> deleted desired_flow instance of flowA would not be referenced any more and
>> have no chance to use the list_node before it is finally destroyed, right?
>>>
>>>
>>> That makes sense, as I said, I was not able to reproduce it, the issue
>> seemed to make sense under those conditions. Anyway I can update the commit
>> message leaving the flows part out.
>>
>> Understand, please update the commit message.
>> I tend to believe that the core dump was caused by some other bug that may
>> result in corruption of the list/memory (in such case this change won't
>> help), and also not sure if it is a problem only with this old version
>> v20.12 or still a problem of the latest code.
>>
> 
> Ack, I will update the commit message in  v2. We have also provided a build
> so we will get feedback if it still happens with this patch or not. I will
> postpone v2 until then.
> 
> 
>>
>>>
>>> Also one thing to keep in mind, it was originally reported against
>> v20.12. We should also backport it if possible.
>>
>> I am not sure if it is a candidate for backporting. Usually we backport
>> critical bug fixes only, especially for such an old version. But in this
>> case the change itself is so simple and we are almost sure it won't cause
>> any harm, although no evidence of any benefit, either.
>>
>> What's your opinion of backporting this? Dumitru, Mark, Numan?
>>

Hi Ales, Han,

Regardless if we think this is the root cause of the reported crash or
not, I still think there's a possibility of failing the assertion
because we didn't reinitialize the list_node field while computing
incremental OF updates.

As you said, it won't cause any harm but it hardens the ovn-controller
code so I would backport it.

Regards,
Dumitru

>> Thanks,
>> Han
>>>
>>> Thanks,
>>> Ales
>>>
>>>>
>>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>> This is most likely related to BZ[0].
>>>>>
>>>>> [0] https://bugzilla.redhat.com/2125723
>>>>> Suggested-By: Dumitru Ceara <[email protected]>
>>>>> Signed-off-by: Ales Musil <[email protected]>
>>>>> ---
>>>>> I'm not adding the BZ as Reported-At because this might
>>>>> not fix the issue they are facing. It seems to be very hard
>>>>> to reproduce, however it still is a good idea to apply this
>>>>> fix as in the context it's the right to do and shouldn't cause
>>>>> any harm.
>>>>> ---
>>>>>  controller/ofctrl.c | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>>>>> index c77991258..fb0e0869c 100644
>>>>> --- a/controller/ofctrl.c
>>>>> +++ b/controller/ofctrl.c
>>>>> @@ -1535,6 +1535,7 @@ remove_flows_from_sb_to_flow(struct
>> ovn_desired_flow_table *flow_table,
>>>>>              free(sfr);
>>>>>          }
>>>>>          ovs_list_remove(&f->list_node);
>>>>> +        ovs_list_init(&f->list_node);
>>>>>          if (log_msg) {
>>>>>              ovn_flow_log(&f->flow, log_msg);
>>>>>          }
>>>>> --
>>>>> 2.37.3
>>>>>
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> [email protected]
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>>
>>>
>>> --
>>>
>>> Ales Musil
>>>
>>> Senior Software Engineer - OVN Core
>>>
>>> Red Hat EMEA
>>>
>>> [email protected]    IM: amusil
>>
> 
> 
> Thanks,
> Ales
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to