Hi, Ilya and William,

Ilya Maximets <[email protected]> 于2021年2月22日周一 下午8:12写道:
>
> On 2/21/21 5:11 PM, William Tu wrote:
> > On Fri, Feb 19, 2021 at 6:29 PM 贺鹏 <[email protected]> wrote:
> >>
> >> Hi, Ilya
> >>
> >> Ilya Maximets <[email protected]> 于2021年2月19日周五 下午7:19写道:
> >>>
> >>> On 2/19/21 3:12 AM, 贺鹏 wrote:
> >>>> Hi,
> >>>>
> >>>> Looks like this bug is caused by violating the fact that if a rule is
> >>>> referenced, the related ofproto should not be destroyed.
> >>>>
> >>>> If so, I have a patch that also fixes the problem, not sure if this 
> >>>> helps.
> >>>>
> >>>> http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
> >>>
> >>> There is at least one more problem that is not strictly related but
> >>> in more or less the same part of the code:
> >>>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
> >>
> >> So maybe before add it into *ovsrcu_postpone*, we should add refcount
> >> of ofproto also?
> >>
> > Yes, I think that will fix the issue. But are we able to find out all the
> > places that we need to add refcount of ofproto?

Yes, this could be difficult.

>
> destruct() is called for ofproto_dpif unconditionally without any deferring,
> so this will not help unless we're delaying the the destruct() itself, and
> I'm not sure if we can do that.
>
> >
> > Looks like we might have multiple rcu postponed function that might
> > access the 'ofproto'. And 'ofproto' might be freed already and cause 
> > segfault.
> >
> > Hepeng's patch fixes two places.
> >   
> > http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
> > Ilya pointed out another place
> >   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
> > yifeng's case is a little different (not due to ofproto = NULL, but
> > due to setting
> > the p->connmgr = NULL before postponed)
>
> In my case ofproto is alive too.  The problem is destroyed ofproto->baker.
> And in case of meter_id we don't really need to refcount the ofproto itself.
> 'baker' already has a refcount, so we could increase it and pass 'baker'
> pointer to free_meter_id instead of 'ofproto'.  This function doesn't
> actually need an 'ofproto' pointer.
>
> Regarding this connmgr related patch... It's still valid even with refcounts
> because destruction of connmgr is explicitly not deferred for a reason.  It's
> to free the listening socket that users might want to reuse.  Refcounts will
> not help here.
> Also for the meters related issue.. destruct() is called for ofproto_dpif
> without any postponing and I'm not sure if we need or actually able to
> postpone it without consequences.  So, baker->refcount might be a better
> solution keeping the ofproto->refcount only for rules.  This will reduce
> the scope of changes, otherwise we will have to do a lot more work tracking
> down all the users that holds 'ofproto' pointer and will miss some of them
> with high probability.

Agree.
I recheck the patch. It's a better idea to use ofproto->backer's refcount.
But in the rule case, looks like the rule->ofproto needs to be alive
to access both ofproto->vl_mff_map and ofproto->ofproto_class.

 static void
 rule_destroy_cb(struct rule *rule)
     OVS_NO_THREAD_SAFETY_ANALYSIS
 {
     /* Send rule removed if needed. */
     if (rule->flags & OFPUTIL_FF_SEND_FLOW_REM
         && rule->removed_reason != OVS_OFPRR_NONE
         && !rule_is_hidden(rule)) {
         ofproto_rule_send_removed(rule);
     }
     rule->ofproto->ofproto_class->rule_destruct(rule);
     mf_vl_mff_unref(&rule->ofproto->vl_mff_map, rule->match_tlv_bitmap);
     mf_vl_mff_unref(&rule->ofproto->vl_mff_map, rule->ofpacts_tlv_bitmap);
     ofproto_rule_destroy__(rule);
 }

Maybe add refcount into vl_mff_map and ofproto_class?

>
> I'll recheck this (connmgr) patch and, likely, apply it.   Will also review
> ofproto refcount patch.
>
> Best regards, Ilya Maximets.



-- 
hepeng
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to