Adrian Moreno <[email protected]> 于2022年2月9日周三 17:41写道:
> > > On 2/9/22 10:08, Peng He wrote: > > Hi, Adrian, > > > > > > > > Adrian Moreno <[email protected] <mailto:[email protected]>> > 于2022年2月8日 > > 周二 18:14写道: > > > > > > > > On 2/5/22 04:58, Peng He wrote: > > > Hi, Adrian, > > > > > > > > > > > > Adrian Moreno <[email protected] <mailto:[email protected]> > > <mailto:[email protected] <mailto:[email protected]>>> > 于2022年2月4日 > > > 周五 21:37写道: > > > > > > > > > > > > On 1/18/22 16:01, Peng He wrote: > > > > From hepeng: > > > > > > > > > > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473 > > < > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473 > > > > > > > < > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473 > < > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473 > >> > > > > > > > > also from guohongzhi <[email protected] > > <mailto:[email protected]> > > > <mailto:[email protected] <mailto:[email protected] > >>>: > > > > > > > > > > http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > > < > http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > > > > > > > < > http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > < > http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > >> > > > > > > > > also from a discussion about the mixing use of RCU and > refcount in > > the mail > > > > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan > Rivet. > > > > > > > > A summary, as quoted from Ilya: > > > > > > > > " > > > > RCU for ofproto was introduced for one > > > > and only one reason - to avoid freeing ofproto while rules > are still > > > > alive. This was done in commit f416c8d61601 ("ofproto: > RCU postpone > > > > rule destruction."). The goal was to allow using rules > without > > > > refcounting them within a single grace period. And that > forced us > > > > to postpone destruction of the ofproto for a single grace > period. > > > > Later commit 39c9459355b6 ("Use classifier versioning.") > made it > > > > possible for rules to be alive for more than one grace > period, so > > > > the commit made ofproto wait for 2 grace periods by double > postponing. > > > > As we can see now, that wasn't enough and we have to wait > for more > > > > than 2 grace periods in certain cases. > > > > " > > > > > > > > In a short, the ofproto should have a longer life time > than rule, if > > > > the rule lasts for more than 2 grace periods, the ofproto > should live > > > > longer to ensure rule->ofproto is valid. It's hard to > predict how long > > > > a ofproto should live, thus we need to use refcount on > ofproto to make > > > > things easy. The controversial part is that we have > already used > > RCU postpone > > > > to delay ofproto destrution, if we have to add refcount, > is it > > simpler to > > > > use just refcount without RCU postpone? > > > > > > > > IMO, I think going back to the pure refcount solution is > more > > > > complicated than mixing using both. > > > > > > > > Gaëtan Rive asks some questions on guohongzhi's v2 patch: > > > > > > > > during ofproto_rule_create, should we use ofproto_ref > > > > or ofproto_try_ref? how can we make sure the ofproto is > alive? > > > > > > > > By using RCU, ofproto has three states: > > > > > > > > state 1: alive, with refcount >= 1 > > > > state 2: dying, with refcount == 0, however pointer is > valid > > > > state 3: died, memory freed, pointer might be dangling. > > > > > > > > Without using RCU, there is no state 2, thus, we have to > be very > > careful > > > > every time we see a ofproto pointer. In contrast, with > RCU, we can > > be sure > > > > that it's alive at least in this grace peroid, so we can > just check if > > > > it is dying by ofproto_try_ref. > > > > > > > > This shows that by mixing use of RCU and refcount we can > save a > > lot of work > > > > worrying if ofproto is dangling. > > > > > > > > In short, the RCU part makes sure the ofproto is alive > when we use it, > > > > and the refcount part makes sure it lives longer enough. > > > > > > > > Also regarding a new patch filed recently, people are now > making use > > > > of RCU to protect ofproto: > > > > > > > > > > > > > > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > > < > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > > > > > > > < > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > < > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > >> > > > > > > > > In this patch, I have merged guohongzhi's patch and mine, > and fixes > > > > accoring to the previous comments. > > > > > > > > Signed-off-by: Peng He <[email protected] > > <mailto:[email protected]> > > > <mailto:[email protected] <mailto: > [email protected]>>> > > > > Signed-off-by: guohongzhi <[email protected] > > <mailto:[email protected]> > > > <mailto:[email protected] <mailto:[email protected] > >>> > > > > --- > > > > ofproto/ofproto-dpif-xlate-cache.c | 2 + > > > > ofproto/ofproto-dpif-xlate.c | 14 ++++--- > > > > ofproto/ofproto-dpif.c | 24 +++++++----- > > > > ofproto/ofproto-provider.h | 2 + > > > > ofproto/ofproto.c | 62 > > +++++++++++++++++++++++++++--- > > > > ofproto/ofproto.h | 4 ++ > > > > 6 files changed, 87 insertions(+), 21 deletions(-) > > > > > > > > diff --git a/ofproto/ofproto-dpif-xlate-cache.c > > > b/ofproto/ofproto-dpif-xlate-cache.c > > > > index dcc91cb38..9224ee2e6 100644 > > > > --- a/ofproto/ofproto-dpif-xlate-cache.c > > > > +++ b/ofproto/ofproto-dpif-xlate-cache.c > > > > @@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct > xc_entry *entry) > > > > { > > > > switch (entry->type) { > > > > case XC_TABLE: > > > > + ofproto_unref(&(entry->table.ofproto->up)); > > > > break; > > > > case XC_RULE: > > > > ofproto_rule_unref(&entry->rule->up); > > > > @@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct > xc_entry *entry) > > > > free(entry->learn.ofm); > > > > break; > > > > case XC_NORMAL: > > > > + ofproto_unref(&(entry->normal.ofproto->up)); > > > > break; > > > > case XC_FIN_TIMEOUT: > > > > /* 'u.fin.rule' is always already held as a > XC_RULE, which > > > > diff --git a/ofproto/ofproto-dpif-xlate.c > > b/ofproto/ofproto-dpif-xlate.c > > > > index 6fb59e170..129cdf714 100644 > > > > --- a/ofproto/ofproto-dpif-xlate.c > > > > +++ b/ofproto/ofproto-dpif-xlate.c > > > > @@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx) > > > > struct xc_entry *entry; > > > > > > > > /* Save just enough info to update mac learning > table > > later. */ > > > > - entry = xlate_cache_add_entry(ctx->xin->xcache, > XC_NORMAL); > > > > - entry->normal.ofproto = ctx->xbridge->ofproto; > > > > - entry->normal.in_port = flow->in_port.ofp_port; > > > > - entry->normal.dl_src = flow->dl_src; > > > > - entry->normal.vlan = vlan; > > > > - entry->normal.is_gratuitous_arp = is_grat_arp; > > > > + if (ofproto_try_ref(&ctx->xbridge->ofproto->up)) { > > > > + entry = > xlate_cache_add_entry(ctx->xin->xcache, > > XC_NORMAL); > > > > + entry->normal.ofproto = ctx->xbridge->ofproto; > > > > + entry->normal.in_port = > flow->in_port.ofp_port; > > > > + entry->normal.dl_src = flow->dl_src; > > > > + entry->normal.vlan = vlan; > > > > + entry->normal.is_gratuitous_arp = is_grat_arp; > > > > + } > > > > } > > > > > > > > /* Determine output bundle. */ > > > > diff --git a/ofproto/ofproto-dpif.c > b/ofproto/ofproto-dpif.c > > > > index 8143dd965..c0a87456a 100644 > > > > --- a/ofproto/ofproto-dpif.c > > > > +++ b/ofproto/ofproto-dpif.c > > > > @@ -4471,12 +4471,14 @@ rule_dpif_lookup_from_table(struct > > ofproto_dpif > > > *ofproto, > > > > atomic_add_relaxed(&tbl->n_matched, > > stats->n_packets, > > > &orig); > > > > } > > > > if (xcache) { > > > > - struct xc_entry *entry; > > > > + if (ofproto_try_ref(&ofproto->up)) { > > > > + struct xc_entry *entry; > > > > > > > > - entry = xlate_cache_add_entry(xcache, > XC_TABLE); > > > > - entry->table.ofproto = ofproto; > > > > - entry->table.id <http://table.id> > > <http://table.id <http://table.id>> = *table_id; > > > > - entry->table.match = true; > > > > + entry = xlate_cache_add_entry(xcache, > XC_TABLE); > > > > + entry->table.ofproto = ofproto; > > > > + entry->table.id <http://table.id> > > <http://table.id <http://table.id>> = *table_id; > > > > + entry->table.match = true; > > > > + } > > > > } > > > > return rule; > > > > } > > > > @@ -4507,12 +4509,14 @@ rule_dpif_lookup_from_table(struct > > ofproto_dpif > > > *ofproto, > > > > stats->n_packets, &orig); > > > > } > > > > if (xcache) { > > > > - struct xc_entry *entry; > > > > + if (ofproto_try_ref(&ofproto->up)) { > > > > + struct xc_entry *entry; > > > > > > > > - entry = xlate_cache_add_entry(xcache, > XC_TABLE); > > > > - entry->table.ofproto = ofproto; > > > > - entry->table.id <http://table.id> < > http://table.id > > <http://table.id>> = next_id; > > > > - entry->table.match = (rule != NULL); > > > > + entry = xlate_cache_add_entry(xcache, > XC_TABLE); > > > > + entry->table.ofproto = ofproto; > > > > + entry->table.id <http://table.id> > > <http://table.id <http://table.id>> = next_id; > > > > + entry->table.match = (rule != NULL); > > > > + } > > > > } > > > > if (rule) { > > > > goto out; /* Match. */ > > > > diff --git a/ofproto/ofproto-provider.h > b/ofproto/ofproto-provider.h > > > > index 14b909973..ed10b8c76 100644 > > > > --- a/ofproto/ofproto-provider.h > > > > +++ b/ofproto/ofproto-provider.h > > > > @@ -143,6 +143,8 @@ struct ofproto { > > > > /* Variable length mf_field mapping. Stores all > configured > > variable > > > length > > > > * meta-flow fields (struct mf_field) in a switch. */ > > > > struct vl_mff_map vl_mff_map; > > > > + /* refcount to this ofproto, holds by > rule/group/xlate_caches */ > > > > + struct ovs_refcount refcount; > > > > }; > > > > > > > > void ofproto_init_tables(struct ofproto *, int n_tables); > > > > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c > > > > index 56aeac720..10a22d9ec 100644 > > > > --- a/ofproto/ofproto.c > > > > +++ b/ofproto/ofproto.c > > > > @@ -549,6 +549,7 @@ ofproto_create(const char > *datapath_name, > > const char > > > *datapath_type, > > > > > > > > ovs_mutex_init(&ofproto->vl_mff_map.mutex); > > > > cmap_init(&ofproto->vl_mff_map.cmap); > > > > + ovs_refcount_init(&ofproto->refcount); > > > > > > > > error = ofproto->ofproto_class->construct(ofproto); > > > > if (error) { > > > > @@ -1695,9 +1696,24 @@ ofproto_destroy__(struct ofproto > *ofproto) > > > > ofproto->ofproto_class->dealloc(ofproto); > > > > } > > > > > > > > -/* Destroying rules is doubly deferred, must have > 'ofproto' > > around for them. > > > > - * - 1st we defer the removal of the rules from the > classifier > > > > - * - 2nd we defer the actual destruction of the rules. */ > > > > +/* We used to use defer function to wait for two grace > periods > > > > + * as we assume the rule that holds the ofproto pointer > will > > > > + * live at most two grace period. Howvever, we found at > certain > > > > + * cases, this assumption does not stand. > > > > + * > > > > + * destroying a rule may have to wait multiple grace > periods: > > > > + * remove_rules_postponed (one grace period) > > > > + * -> remove_rule_rcu > > > > + * -> remove_rule_rcu__ > > > > + * -> ofproto_rule_unref -> ref count != 1 > > > > + * -> ... more grace periods. > > > > + * -> rule_destroy_cb (> 2 grace > periods) > > > > + * -> free > > > > + * > > > > + * So we have to check the refcount for sure all the rules > > > > + * have been destroyed. > > > > + * > > > > + */ > > > > static void > > > > ofproto_destroy_defer__(struct ofproto *ofproto) > > > > OVS_EXCLUDED(ofproto_mutex) > > > > @@ -1705,6 +1721,26 @@ ofproto_destroy_defer__(struct > ofproto > > *ofproto) > > > > ovsrcu_postpone(ofproto_destroy__, ofproto); > > > > } > > > > > > > > +void > > > > +ofproto_ref(struct ofproto *ofproto) > > > > +{ > > > > + ovs_refcount_ref(&ofproto->refcount); > > > > +} > > > > + > > > > +bool > > > > +ofproto_try_ref(struct ofproto *ofproto) > > > > +{ > > > > + return ovs_refcount_try_ref_rcu(&ofproto->refcount); > > > > +} > > > > + > > > > +void > > > > +ofproto_unref(struct ofproto *ofproto) > > > > +{ > > > > + if (ofproto && ovs_refcount_unref(&ofproto->refcount) > == 1) { > > > > + ovsrcu_postpone(ofproto_destroy_defer__, ofproto); > > > > + } > > > > +} > > > > + > > > > void > > > > ofproto_destroy(struct ofproto *p, bool del) > > > > OVS_EXCLUDED(ofproto_mutex) > > > > @@ -1736,8 +1772,7 @@ ofproto_destroy(struct ofproto *p, > bool del) > > > > p->connmgr = NULL; > > > > ovs_mutex_unlock(&ofproto_mutex); > > > > > > > > - /* Destroying rules is deferred, must have 'ofproto' > around > > for them. */ > > > > - ovsrcu_postpone(ofproto_destroy_defer__, p); > > > > + ofproto_unref(p); > > > > } > > > > > > > > /* Destroys the datapath with the respective 'name' and > 'type'. > > With > > > the Linux > > > > @@ -2929,6 +2964,10 @@ ofproto_rule_destroy__(struct rule > *rule) > > > > cls_rule_destroy(CONST_CAST(struct cls_rule *, > &rule->cr)); > > > > rule_actions_destroy(rule_get_actions(rule)); > > > > ovs_mutex_destroy(&rule->mutex); > > > > + /* we need to call ofproto_unref first, and thanks to > rcu, > > ofproto > > > is alive > > > > + * otherwise, group is freed, group->ofproto is > invalid > > > > + */ > > > > + ofproto_unref(rule->ofproto); > > > > rule->ofproto->ofproto_class->rule_dealloc(rule); > > > > } > > > > > > > > @@ -3069,6 +3108,10 @@ group_destroy_cb(struct ofgroup > *group) > > > > > &group->props)); > > > > ofputil_bucket_list_destroy(CONST_CAST(struct > ovs_list *, > > > > > &group->buckets)); > > > > + /* we need to call ofproto_unref first, and thanks to > rcu, > > ofproto > > > is alive > > > > + * otherwise, group is freed, group->ofproto is > invalid > > > > + */ > > > > + ofproto_unref(group->ofproto); > > > > group->ofproto->ofproto_class->group_dealloc(group); > > > > } > > > > > > > > @@ -5279,6 +5322,11 @@ ofproto_rule_create(struct ofproto > *ofproto, > > > struct cls_rule *cr, > > > > return OFPERR_OFPFMFC_UNKNOWN; > > > > } > > > > > > > > + if (!ofproto_try_ref(ofproto)) { > > > > + cls_rule_destroy(cr); > > > > + return OFPERR_OFPFMFC_UNKNOWN; > > > > + } > > > > + > > > > /* Initialize base state. */ > > > > *CONST_CAST(struct ofproto **, &rule->ofproto) = > ofproto; > > > > cls_rule_move(CONST_CAST(struct cls_rule *, > &rule->cr), cr); > > > > @@ -7345,6 +7393,10 @@ init_group(struct ofproto *ofproto, > const > > struct > > > ofputil_group_mod *gm, > > > > return OFPERR_OFPGMFC_OUT_OF_GROUPS; > > > > } > > > > > > > > + if (!ofproto_try_ref(ofproto)) { > > > > + return OFPERR_OFPFMFC_UNKNOWN; > > > > + } > > > > + > > > > *CONST_CAST(struct ofproto **, &(*ofgroup)->ofproto) > = ofproto; > > > > *CONST_CAST(uint32_t *, &((*ofgroup)->group_id)) = > gm->group_id; > > > > *CONST_CAST(enum ofp11_group_type *, > &(*ofgroup)->type) = > > gm->type; > > > > diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h > > > > index b0262da2d..4e15167ab 100644 > > > > --- a/ofproto/ofproto.h > > > > +++ b/ofproto/ofproto.h > > > > @@ -563,6 +563,10 @@ int ofproto_port_get_cfm_status(const > struct > > ofproto *, > > > > enum ofputil_table_miss > ofproto_table_get_miss_config(const struct > > > ofproto *, > > > > > uint8_t > > table_id); > > > > > > > > +void ofproto_ref(struct ofproto *); > > > > +void ofproto_unref(struct ofproto *); > > > > +bool ofproto_try_ref(struct ofproto *); > > > > + > > > > #ifdef __cplusplus > > > > } > > > > #endif > > > > > > > > > Hi Peng, > > > > > > Do we still to protect the reference that "struct upcall" > holds? i.e: > > > > > > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > > < > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > > > > > > > < > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > < > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > >> > > > > > > > > > there are two code path which leads to the call to > "upcall_receive", the > > > userspace uses upcall_cb and the kernel > > > datapath uses recv_upcalls. In both places, the upcall struct is > an > > > on-stack-allocated value. > > > > > > For userspace, the upcall_cb is called directly from pmd thread, > which I > > guess > > > inside the upcall_cb, no places will sleep or call rcu_wait, > meaning > > > that this pointer to the ofproto will not be used for too long, > i.e. > > longer than > > > one grace period. > > > > > > The fact of upcall_cb calling from pmd thread can be used to > prove that > > > upcall_receive contains no rcu_wait, so also for the kernel > datapath, > > > I guess no need to protect the upcall ref in "struct upcall". > > > > Maybe I'm missing something, but what I see is that the main thread > can call > > ofproto_destroy at any time which will call ofproto_dpif->destruct(). > > The ofproto_dpif destruct() will run rcu_synchronize() which will > block until > > all handler threads have finished processing a batch. > > > > But if a hanlder thread wakes up before the main thread does, it > could pick up > > the "ofproto_dpif" reference again from "all_ofproto_dpifs_by_uuid" > and use it > > while the main thread continues destruct()ing it. The result could > very well be > > a ovs crash with the stack trace reported by Yunjian, i.e: > > > > > > If I understand correctly your problem, I think ovs will not crash after > > applying Yunjian's patch. > > Right. So we agree we need Yunjian as well. I thought you were suggesting > to > drop Yunjian's and only apply yours. I think we need both. > Yes, indeed we need both. > > > > > if a handler wakes up and gets the "ofproto" ref before the main thread > removes > > the "ofproto" ref from > > "all_ofproto_dpifs_by_uuid", it can use it, but since the handler is > using the > > ofproto, it > > will not enter the quiescent state, and if at the same time the main > thread is > > destroying ofproto, as > > it will run rcu_synchronize(), it will wait until the handler finishes > using > > ofproto, then destroying it. > > > > (gdb) bt > > hmap_next (hmap.h:398) > > seq_wake_waiters (seq.c:326) > > seq_change_protected (seq.c:134) > > seq_change (seq.c:144) > > ofproto_dpif_send_async_msg (ofproto_dpif.c:263) > > process_upcall (ofproto_dpif_upcall.c:1782) > > recv_upcalls (ofproto_dpif_upcall.c:1026) > > udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945) > > ovsthread_wrapper (ovs_thread.c:734) > > > > > > Maybe I've misinterpreted you're reference to Yunjian's patch so > just wanted to > > make sure. > > > > > > > > > > Thanks > > -- > > Adrián Moreno > > > > > > > > -- > > hepeng > > -- > Adrián Moreno > > -- hepeng _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
