Hi, Ilya and William, Ilya Maximets <[email protected]> 于2021年2月22日周一 下午8:12写道: > > On 2/21/21 5:11 PM, William Tu wrote: > > On Fri, Feb 19, 2021 at 6:29 PM 贺鹏 <[email protected]> wrote: > >> > >> Hi, Ilya > >> > >> Ilya Maximets <[email protected]> 于2021年2月19日周五 下午7:19写道: > >>> > >>> On 2/19/21 3:12 AM, 贺鹏 wrote: > >>>> Hi, > >>>> > >>>> Looks like this bug is caused by violating the fact that if a rule is > >>>> referenced, the related ofproto should not be destroyed. > >>>> > >>>> If so, I have a patch that also fixes the problem, not sure if this > >>>> helps. > >>>> > >>>> http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > >>> > >>> There is at least one more problem that is not strictly related but > >>> in more or less the same part of the code: > >>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html > >> > >> So maybe before add it into *ovsrcu_postpone*, we should add refcount > >> of ofproto also? > >> > > Yes, I think that will fix the issue. But are we able to find out all the > > places that we need to add refcount of ofproto?
Yes, this could be difficult. > > destruct() is called for ofproto_dpif unconditionally without any deferring, > so this will not help unless we're delaying the the destruct() itself, and > I'm not sure if we can do that. > > > > > Looks like we might have multiple rcu postponed function that might > > access the 'ofproto'. And 'ofproto' might be freed already and cause > > segfault. > > > > Hepeng's patch fixes two places. > > > > http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > > Ilya pointed out another place > > https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html > > yifeng's case is a little different (not due to ofproto = NULL, but > > due to setting > > the p->connmgr = NULL before postponed) > > In my case ofproto is alive too. The problem is destroyed ofproto->baker. > And in case of meter_id we don't really need to refcount the ofproto itself. > 'baker' already has a refcount, so we could increase it and pass 'baker' > pointer to free_meter_id instead of 'ofproto'. This function doesn't > actually need an 'ofproto' pointer. > > Regarding this connmgr related patch... It's still valid even with refcounts > because destruction of connmgr is explicitly not deferred for a reason. It's > to free the listening socket that users might want to reuse. Refcounts will > not help here. > Also for the meters related issue.. destruct() is called for ofproto_dpif > without any postponing and I'm not sure if we need or actually able to > postpone it without consequences. So, baker->refcount might be a better > solution keeping the ofproto->refcount only for rules. This will reduce > the scope of changes, otherwise we will have to do a lot more work tracking > down all the users that holds 'ofproto' pointer and will miss some of them > with high probability. Agree. I recheck the patch. It's a better idea to use ofproto->backer's refcount. But in the rule case, looks like the rule->ofproto needs to be alive to access both ofproto->vl_mff_map and ofproto->ofproto_class. static void rule_destroy_cb(struct rule *rule) OVS_NO_THREAD_SAFETY_ANALYSIS { /* Send rule removed if needed. */ if (rule->flags & OFPUTIL_FF_SEND_FLOW_REM && rule->removed_reason != OVS_OFPRR_NONE && !rule_is_hidden(rule)) { ofproto_rule_send_removed(rule); } rule->ofproto->ofproto_class->rule_destruct(rule); mf_vl_mff_unref(&rule->ofproto->vl_mff_map, rule->match_tlv_bitmap); mf_vl_mff_unref(&rule->ofproto->vl_mff_map, rule->ofpacts_tlv_bitmap); ofproto_rule_destroy__(rule); } Maybe add refcount into vl_mff_map and ofproto_class? > > I'll recheck this (connmgr) patch and, likely, apply it. Will also review > ofproto refcount patch. > > Best regards, Ilya Maximets. -- hepeng _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
