On 2/21/21 5:11 PM, William Tu wrote:
> On Fri, Feb 19, 2021 at 6:29 PM 贺鹏 <[email protected]> wrote:
>>
>> Hi, Ilya
>>
>> Ilya Maximets <[email protected]> 于2021年2月19日周五 下午7:19写道:
>>>
>>> On 2/19/21 3:12 AM, 贺鹏 wrote:
>>>> Hi,
>>>>
>>>> Looks like this bug is caused by violating the fact that if a rule is
>>>> referenced, the related ofproto should not be destroyed.
>>>>
>>>> If so, I have a patch that also fixes the problem, not sure if this helps.
>>>>
>>>> http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
>>>
>>> There is at least one more problem that is not strictly related but
>>> in more or less the same part of the code:
>>>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
>>
>> So maybe before add it into *ovsrcu_postpone*, we should add refcount
>> of ofproto also?
>>
> Yes, I think that will fix the issue. But are we able to find out all the
> places that we need to add refcount of ofproto?

destruct() is called for ofproto_dpif unconditionally without any deferring,
so this will not help unless we're delaying the the destruct() itself, and
I'm not sure if we can do that.

> 
> Looks like we might have multiple rcu postponed function that might
> access the 'ofproto'. And 'ofproto' might be freed already and cause segfault.
> 
> Hepeng's patch fixes two places.
>   
> http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
> Ilya pointed out another place
>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
> yifeng's case is a little different (not due to ofproto = NULL, but
> due to setting
> the p->connmgr = NULL before postponed)

In my case ofproto is alive too.  The problem is destroyed ofproto->baker.
And in case of meter_id we don't really need to refcount the ofproto itself.
'baker' already has a refcount, so we could increase it and pass 'baker'
pointer to free_meter_id instead of 'ofproto'.  This function doesn't
actually need an 'ofproto' pointer.

Regarding this connmgr related patch... It's still valid even with refcounts
because destruction of connmgr is explicitly not deferred for a reason.  It's
to free the listening socket that users might want to reuse.  Refcounts will
not help here.
Also for the meters related issue.. destruct() is called for ofproto_dpif
without any postponing and I'm not sure if we need or actually able to
postpone it without consequences.  So, baker->refcount might be a better
solution keeping the ofproto->refcount only for rules.  This will reduce
the scope of changes, otherwise we will have to do a lot more work tracking
down all the users that holds 'ofproto' pointer and will miss some of them
with high probability.

I'll recheck this (connmgr) patch and, likely, apply it.   Will also review
ofproto refcount patch.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to