On 2/21/21 5:11 PM, William Tu wrote: > On Fri, Feb 19, 2021 at 6:29 PM 贺鹏 <[email protected]> wrote: >> >> Hi, Ilya >> >> Ilya Maximets <[email protected]> 于2021年2月19日周五 下午7:19写道: >>> >>> On 2/19/21 3:12 AM, 贺鹏 wrote: >>>> Hi, >>>> >>>> Looks like this bug is caused by violating the fact that if a rule is >>>> referenced, the related ofproto should not be destroyed. >>>> >>>> If so, I have a patch that also fixes the problem, not sure if this helps. >>>> >>>> http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ >>> >>> There is at least one more problem that is not strictly related but >>> in more or less the same part of the code: >>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html >> >> So maybe before add it into *ovsrcu_postpone*, we should add refcount >> of ofproto also? >> > Yes, I think that will fix the issue. But are we able to find out all the > places that we need to add refcount of ofproto?
destruct() is called for ofproto_dpif unconditionally without any deferring, so this will not help unless we're delaying the the destruct() itself, and I'm not sure if we can do that. > > Looks like we might have multiple rcu postponed function that might > access the 'ofproto'. And 'ofproto' might be freed already and cause segfault. > > Hepeng's patch fixes two places. > > http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > Ilya pointed out another place > https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html > yifeng's case is a little different (not due to ofproto = NULL, but > due to setting > the p->connmgr = NULL before postponed) In my case ofproto is alive too. The problem is destroyed ofproto->baker. And in case of meter_id we don't really need to refcount the ofproto itself. 'baker' already has a refcount, so we could increase it and pass 'baker' pointer to free_meter_id instead of 'ofproto'. This function doesn't actually need an 'ofproto' pointer. Regarding this connmgr related patch... It's still valid even with refcounts because destruction of connmgr is explicitly not deferred for a reason. It's to free the listening socket that users might want to reuse. Refcounts will not help here. Also for the meters related issue.. destruct() is called for ofproto_dpif without any postponing and I'm not sure if we need or actually able to postpone it without consequences. So, baker->refcount might be a better solution keeping the ofproto->refcount only for rules. This will reduce the scope of changes, otherwise we will have to do a lot more work tracking down all the users that holds 'ofproto' pointer and will miss some of them with high probability. I'll recheck this (connmgr) patch and, likely, apply it. Will also review ofproto refcount patch. Best regards, Ilya Maximets. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
