On 10/25/22 07:38, Ales Musil wrote: > On Tue, Oct 25, 2022 at 5:51 AM Han Zhou <[email protected]> wrote: > >> >> >> On Sun, Oct 23, 2022 at 10:21 PM Ales Musil <[email protected]> wrote: >>> >>> >>> >>> On Sun, Oct 23, 2022 at 10:16 PM Han Zhou <[email protected]> wrote: >>>> >>>> >>>> >>>> On Fri, Oct 21, 2022 at 1:37 AM Ales Musil <[email protected]> wrote: >>>>> >>>>> Run ovs_list_init on list_node from desired flow. >>>>> Not doing that can potentially cause crash on >>>>> ovs_assert(ovs_list_is_empty(&f->list_node)) >>>>> this might happen when we have conditional monitoring >>>>> enabled and have some flow like below being processed >>>>> by ofctrl in a single run. >>>>> >>>>> flowA is removed, flowA is added, flowA is removed. >>>>> >>>> >>> >>> Hi Han, >>> >>>> >>>> Thanks Ales for the patch. First of all I agree this patch is not >> harmful and is the right thing to do, so I am going to apply it. >>>> At the same time, I still don't understand the root cause analysis >> above and like you said, I am not sure if it would fix the bug. Two >> questions here: >>> >>> >>> The description is based on some assumptions and might not be the >> correct way to get into that state. >>> >>>> >>>> 1. How would it relate to conditional monitoring? >>> >>> >>> It is more likely to receive multiple updates at once with conditional >> monitoring or am I wrong? >>> >> I don't see a reason for this. I think it only depends on what changes >> happened in SB and notified and pending to ovn-controller during an >> iteration of ovn-controller control loop, which doesn't seem to have >> relationship with conditional monitoring. Do you have more details on the >> reasoning? >> > > It was based on the configuration, however it was confirmed that it still > happens even with conditional monitoring disabled. > > >> >>>> >>>> 2. For the example "flowA is removed, flowA is added, flowA is >> removed", I assume flowA is "a desired OVS flow", not a logical flow, >> correct? So when flowA is removed, it should be removed from the desired >> flow table and added to the "tracked_flows". Then when flowA is added, it >> would in fact be a newly allocated desired_flow, and no chance to find the >> earlier tracked deleted flowA from the desired flow table. The 2nd flowA >> deletion would find the 2nd flowA instance and destroy it, but the first >> deleted desired_flow instance of flowA would not be referenced any more and >> have no chance to use the list_node before it is finally destroyed, right? >>> >>> >>> That makes sense, as I said, I was not able to reproduce it, the issue >> seemed to make sense under those conditions. Anyway I can update the commit >> message leaving the flows part out. >> >> Understand, please update the commit message. >> I tend to believe that the core dump was caused by some other bug that may >> result in corruption of the list/memory (in such case this change won't >> help), and also not sure if it is a problem only with this old version >> v20.12 or still a problem of the latest code. >> > > Ack, I will update the commit message in v2. We have also provided a build > so we will get feedback if it still happens with this patch or not. I will > postpone v2 until then. > > >> >>> >>> Also one thing to keep in mind, it was originally reported against >> v20.12. We should also backport it if possible. >> >> I am not sure if it is a candidate for backporting. Usually we backport >> critical bug fixes only, especially for such an old version. But in this >> case the change itself is so simple and we are almost sure it won't cause >> any harm, although no evidence of any benefit, either. >> >> What's your opinion of backporting this? Dumitru, Mark, Numan? >>
Hi Ales, Han, Regardless if we think this is the root cause of the reported crash or not, I still think there's a possibility of failing the assertion because we didn't reinitialize the list_node field while computing incremental OF updates. As you said, it won't cause any harm but it hardens the ovn-controller code so I would backport it. Regards, Dumitru >> Thanks, >> Han >>> >>> Thanks, >>> Ales >>> >>>> >>>> >>>> Thanks, >>>> Han >>>> >>>>> This is most likely related to BZ[0]. >>>>> >>>>> [0] https://bugzilla.redhat.com/2125723 >>>>> Suggested-By: Dumitru Ceara <[email protected]> >>>>> Signed-off-by: Ales Musil <[email protected]> >>>>> --- >>>>> I'm not adding the BZ as Reported-At because this might >>>>> not fix the issue they are facing. It seems to be very hard >>>>> to reproduce, however it still is a good idea to apply this >>>>> fix as in the context it's the right to do and shouldn't cause >>>>> any harm. >>>>> --- >>>>> controller/ofctrl.c | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c >>>>> index c77991258..fb0e0869c 100644 >>>>> --- a/controller/ofctrl.c >>>>> +++ b/controller/ofctrl.c >>>>> @@ -1535,6 +1535,7 @@ remove_flows_from_sb_to_flow(struct >> ovn_desired_flow_table *flow_table, >>>>> free(sfr); >>>>> } >>>>> ovs_list_remove(&f->list_node); >>>>> + ovs_list_init(&f->list_node); >>>>> if (log_msg) { >>>>> ovn_flow_log(&f->flow, log_msg); >>>>> } >>>>> -- >>>>> 2.37.3 >>>>> >>>>> _______________________________________________ >>>>> dev mailing list >>>>> [email protected] >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >>> >>> >>> -- >>> >>> Ales Musil >>> >>> Senior Software Engineer - OVN Core >>> >>> Red Hat EMEA >>> >>> [email protected] IM: amusil >> > > > Thanks, > Ales > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
