Adrian Moreno <[email protected]> 于2022年2月15日周二 17:19写道:
> Hi Peng, > > On 2/13/22 03:14, Peng He wrote: > > From hepeng: > > > https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473 > > > > also from guohongzhi <[email protected]>: > > > http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > > > > also from a discussion about the mixing use of RCU and refcount in the > mail > > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet. > > > > A summary, as quoted from Ilya: > > > > " > > RCU for ofproto was introduced for one > > and only one reason - to avoid freeing ofproto while rules are still > > alive. This was done in commit f416c8d61601 ("ofproto: RCU postpone > > rule destruction."). The goal was to allow using rules without > > refcounting them within a single grace period. And that forced us > > to postpone destruction of the ofproto for a single grace period. > > Later commit 39c9459355b6 ("Use classifier versioning.") made it > > possible for rules to be alive for more than one grace period, so > > the commit made ofproto wait for 2 grace periods by double postponing. > > As we can see now, that wasn't enough and we have to wait for more > > than 2 grace periods in certain cases. > > " > > > > In a short, the ofproto should have a longer life time than rule, if > > the rule lasts for more than 2 grace periods, the ofproto should live > > longer to ensure rule->ofproto is valid. It's hard to predict how long > > a ofproto should live, thus we need to use refcount on ofproto to make > > things easy. The controversial part is that we have already used RCU > postpone > > to delay ofproto destrution, if we have to add refcount, is it simpler to > > use just refcount without RCU postpone? > > > > IMO, I think going back to the pure refcount solution is more > > complicated than mixing using both. > > > > Gaëtan Rive asks some questions on guohongzhi's v2 patch: > > > > during ofproto_rule_create, should we use ofproto_ref > > or ofproto_try_ref? how can we make sure the ofproto is alive? > > > > By using RCU, ofproto has three states: > > > > state 1: alive, with refcount >= 1 > > state 2: dying, with refcount == 0, however pointer is valid > > state 3: died, memory freed, pointer might be dangling. > > > > Without using RCU, there is no state 2, thus, we have to be very careful > > every time we see a ofproto pointer. In contrast, with RCU, we can be > sure > > that it's alive at least in this grace peroid, so we can just check if > > it is dying by ofproto_try_ref. > > > > This shows that by mixing use of RCU and refcount we can save a lot of > work > > worrying if ofproto is dangling. > > > > In short, the RCU part makes sure the ofproto is alive when we use it, > > and the refcount part makes sure it lives longer enough. > > > > In this patch, I have merged guohongzhi's patch and mine, and fixes > > accoring to the previous comments. > > > > v4->v5: > > * fix the commits, remove the ref to wangyunjian's patch and > > remove the comments describing the previous ofproto destruction code. > > * fix group alloc leak issues. > > > > v5->v6: > > * fix ofproto unref leak > > * fix comments > > > > Signed-off-by: Peng He <[email protected]> > > Signed-off-by: guohongzhi <[email protected]> > > Acked-by: Mike Pattrick <[email protected]> > > --- > > ofproto/ofproto-dpif-xlate-cache.c | 2 + > > ofproto/ofproto-dpif-xlate.c | 14 ++++--- > > ofproto/ofproto-dpif.c | 24 ++++++----- > > ofproto/ofproto-provider.h | 2 + > > ofproto/ofproto.c | 65 +++++++++++++++++++++++++++--- > > ofproto/ofproto.h | 4 ++ > > 6 files changed, 90 insertions(+), 21 deletions(-) > > > > diff --git a/ofproto/ofproto-dpif-xlate-cache.c > b/ofproto/ofproto-dpif-xlate-cache.c > > index dcc91cb38..9224ee2e6 100644 > > --- a/ofproto/ofproto-dpif-xlate-cache.c > > +++ b/ofproto/ofproto-dpif-xlate-cache.c > > @@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct xc_entry *entry) > > { > > switch (entry->type) { > > case XC_TABLE: > > + ofproto_unref(&(entry->table.ofproto->up)); > > break; > > case XC_RULE: > > ofproto_rule_unref(&entry->rule->up); > > @@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct xc_entry *entry) > > free(entry->learn.ofm); > > break; > > case XC_NORMAL: > > + ofproto_unref(&(entry->normal.ofproto->up)); > > break; > > case XC_FIN_TIMEOUT: > > /* 'u.fin.rule' is always already held as a XC_RULE, which > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > > index 6fb59e170..129cdf714 100644 > > --- a/ofproto/ofproto-dpif-xlate.c > > +++ b/ofproto/ofproto-dpif-xlate.c > > @@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx) > > struct xc_entry *entry; > > > > /* Save just enough info to update mac learning table later. */ > > - entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL); > > - entry->normal.ofproto = ctx->xbridge->ofproto; > > - entry->normal.in_port = flow->in_port.ofp_port; > > - entry->normal.dl_src = flow->dl_src; > > - entry->normal.vlan = vlan; > > - entry->normal.is_gratuitous_arp = is_grat_arp; > > + if (ofproto_try_ref(&ctx->xbridge->ofproto->up)) { > > + entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL); > > + entry->normal.ofproto = ctx->xbridge->ofproto; > > + entry->normal.in_port = flow->in_port.ofp_port; > > + entry->normal.dl_src = flow->dl_src; > > + entry->normal.vlan = vlan; > > + entry->normal.is_gratuitous_arp = is_grat_arp; > > + } > > } > > > > /* Determine output bundle. */ > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > > index 8143dd965..c0a87456a 100644 > > --- a/ofproto/ofproto-dpif.c > > +++ b/ofproto/ofproto-dpif.c > > @@ -4471,12 +4471,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif > *ofproto, > > atomic_add_relaxed(&tbl->n_matched, stats->n_packets, > &orig); > > } > > if (xcache) { > > - struct xc_entry *entry; > > + if (ofproto_try_ref(&ofproto->up)) { > > + struct xc_entry *entry; > > > > The comment above "struct xc_entry" > (ofproto/ofproto-dpif-xlate-cache.h:67" reads: > > * An explicit reference is taken to all pointers other than the ones for > * struct ofproto_dpif. ofproto_dpif pointers are explicitly protected by > * destroying all xlate caches before the ofproto is destroyed. */ > > We might need to change this. > can just remove the words from "other than ... ". > > > - entry = xlate_cache_add_entry(xcache, XC_TABLE); > > - entry->table.ofproto = ofproto; > > - entry->table.id = *table_id; > > - entry->table.match = true; > > + entry = xlate_cache_add_entry(xcache, XC_TABLE); > > + entry->table.ofproto = ofproto; > > + entry->table.id = *table_id; > > + entry->table.match = true; > > + } > > } > > return rule; > > } > > @@ -4507,12 +4509,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif > *ofproto, > > stats->n_packets, &orig); > > } > > if (xcache) { > > - struct xc_entry *entry; > > + if (ofproto_try_ref(&ofproto->up)) { > > + struct xc_entry *entry; > > > > - entry = xlate_cache_add_entry(xcache, XC_TABLE); > > - entry->table.ofproto = ofproto; > > - entry->table.id = next_id; > > - entry->table.match = (rule != NULL); > > + entry = xlate_cache_add_entry(xcache, XC_TABLE); > > + entry->table.ofproto = ofproto; > > + entry->table.id = next_id; > > + entry->table.match = (rule != NULL); > > + } > > } > > if (rule) { > > goto out; /* Match. */ > > diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h > > index 14b909973..ed10b8c76 100644 > > --- a/ofproto/ofproto-provider.h > > +++ b/ofproto/ofproto-provider.h > > @@ -143,6 +143,8 @@ struct ofproto { > > /* Variable length mf_field mapping. Stores all configured > variable length > > * meta-flow fields (struct mf_field) in a switch. */ > > struct vl_mff_map vl_mff_map; > > + /* refcount to this ofproto, holds by rule/group/xlate_caches */ > > + struct ovs_refcount refcount; > > See Gaëtan's comment. > > > }; > > > > void ofproto_init_tables(struct ofproto *, int n_tables); > > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c > > index 56aeac720..fe9bb943f 100644 > > --- a/ofproto/ofproto.c > > +++ b/ofproto/ofproto.c > > @@ -549,6 +549,7 @@ ofproto_create(const char *datapath_name, const char > *datapath_type, > > > > ovs_mutex_init(&ofproto->vl_mff_map.mutex); > > cmap_init(&ofproto->vl_mff_map.cmap); > > + ovs_refcount_init(&ofproto->refcount); > > > > error = ofproto->ofproto_class->construct(ofproto); > > if (error) { > > @@ -1695,9 +1696,27 @@ ofproto_destroy__(struct ofproto *ofproto) > > ofproto->ofproto_class->dealloc(ofproto); > > } > > > > -/* Destroying rules is doubly deferred, must have 'ofproto' around for > them. > > - * - 1st we defer the removal of the rules from the classifier > > - * - 2nd we defer the actual destruction of the rules. */ > > +/* > > + * Rule destruction requires ofproto to remain accessible. > > + * Depending on the rule destruction call (shown in below), it can take > several > > + * RCU grace periods before the ofproto reference is not needed anymore. > > + * The ofproto destruction callback is thus protected by a refcount, > > + * and such destruction is itself deferred. > > + * > > + * remove_rules_postponed (one grace period) > > + * -> remove_rule_rcu > > + * -> remove_rule_rcu__ > > + * -> ofproto_rule_unref -> ref count != 1 > > + * -> ... more grace periods. > > + * -> rule_destroy_cb (> 2 grace periods) > > + * -> free > > + * > > + * NOTE: The original ofproto destruction is only deferred by two grace > > + * periods to keep ofproto accessible. By using refcount the > destruction can be > > + * deferred for longer time. But we only add refounct to > rule/group/xlate cache, in case > > + * there are other places which need ofproto refcount, I keep the > original two RCU > > + * grace periods destruction deferring. > > + */ > > Maybe s/I/we/? > Agree. > s/refounct/refcount/ > Agree. > I don't quite get the last sentence, do you mean something like the > following? > We only need to add refcount to certain objects whose destruction can take > several RCU grace periods (rule, group, xlate_cache). Other references to > ofproto must be cleared before the 2 RCU grace periods. yes, but do we have to explain why using 2 other than 1? My original comments are trying to explain why 2 grace periods, not 1, as this is a legacy logic which can cover most other cases. > > Also, I find the description of the 3 stages in your commit message quite > illustrating: > state 1: alive, with refcount >= 1 > state 2: dying, with refcount == 0, however pointer is valid > state 3: died, memory freed, pointer might be dangling. > > Do you think we could include it? > yes, combining this with your description is clearer. We only need to add refcount to certain objects whose destruction can take > several RCU grace periods (rule, group, xlate_cache). Other references to > ofproto must be cleared before the 2 RCU grace periods. > > > static void > > ofproto_destroy_defer__(struct ofproto *ofproto) > > OVS_EXCLUDED(ofproto_mutex) > > @@ -1705,6 +1724,26 @@ ofproto_destroy_defer__(struct ofproto *ofproto) > > ovsrcu_postpone(ofproto_destroy__, ofproto); > > } > > > > +void > > +ofproto_ref(struct ofproto *ofproto) > > +{ > > + ovs_refcount_ref(&ofproto->refcount); > > +} > > + > > +bool > > +ofproto_try_ref(struct ofproto *ofproto) > > +{ > > + return ovs_refcount_try_ref_rcu(&ofproto->refcount); > > +} > > + > > +void > > +ofproto_unref(struct ofproto *ofproto) > > +{ > > + if (ofproto && ovs_refcount_unref(&ofproto->refcount) == 1) { > > + ovsrcu_postpone(ofproto_destroy_defer__, ofproto); > > + } > > +} > > + > > void > > ofproto_destroy(struct ofproto *p, bool del) > > OVS_EXCLUDED(ofproto_mutex) > > @@ -1736,8 +1775,7 @@ ofproto_destroy(struct ofproto *p, bool del) > > p->connmgr = NULL; > > ovs_mutex_unlock(&ofproto_mutex); > > > > - /* Destroying rules is deferred, must have 'ofproto' around for > them. */ > > - ovsrcu_postpone(ofproto_destroy_defer__, p); > > + ofproto_unref(p); > > } > > > > /* Destroys the datapath with the respective 'name' and 'type'. With > the Linux > > @@ -2929,6 +2967,9 @@ ofproto_rule_destroy__(struct rule *rule) > > cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr)); > > rule_actions_destroy(rule_get_actions(rule)); > > ovs_mutex_destroy(&rule->mutex); > > + /* ofproto_unref() must be called first. It is possible because > ofproto > > + * destruction is deferred by an RCU grace period. */ > > + ofproto_unref(rule->ofproto); > > rule->ofproto->ofproto_class->rule_dealloc(rule); > > } > > > > @@ -3069,6 +3110,9 @@ group_destroy_cb(struct ofgroup *group) > > &group->props)); > > ofputil_bucket_list_destroy(CONST_CAST(struct ovs_list *, > > &group->buckets)); > > + /* ofproto_unref() must be called first. It is possible because > ofproto > > + * destruction is deferred by an RCU grace period. */ > > + ofproto_unref(group->ofproto); > > group->ofproto->ofproto_class->group_dealloc(group); > > } > > > > @@ -5271,10 +5315,15 @@ ofproto_rule_create(struct ofproto *ofproto, > struct cls_rule *cr, > > struct rule *rule; > > enum ofperr error; > > > > + if (!ofproto_try_ref(ofproto)) { > > + return OFPERR_OFPFMFC_UNKNOWN; > > + } > > + > > /* Allocate new rule. */ > > rule = ofproto->ofproto_class->rule_alloc(); > > if (!rule) { > > cls_rule_destroy(cr); > > + ofproto_unref(ofproto); > > VLOG_WARN_RL(&rl, "%s: failed to allocate a rule.", > ofproto->name); > > return OFPERR_OFPFMFC_UNKNOWN; > > } > > @@ -7339,8 +7388,13 @@ init_group(struct ofproto *ofproto, const struct > ofputil_group_mod *gm, > > return OFPERR_OFPGMFC_BAD_TYPE; > > } > > > > + if (!ofproto_try_ref(ofproto)) { > > + return OFPERR_OFPFMFC_UNKNOWN; > > + } > > + > > *ofgroup = ofproto->ofproto_class->group_alloc(); > > if (!*ofgroup) { > > + ofproto_unref(ofproto); > > VLOG_WARN_RL(&rl, "%s: failed to allocate group", > ofproto->name); > > return OFPERR_OFPGMFC_OUT_OF_GROUPS; > > } > > @@ -7377,6 +7431,7 @@ init_group(struct ofproto *ofproto, const struct > ofputil_group_mod *gm, > > > &(*ofgroup)->props)); > > ofputil_bucket_list_destroy(CONST_CAST(struct ovs_list *, > > &(*ofgroup)->buckets)); > > + ofproto_unref(ofproto); > > ofproto->ofproto_class->group_dealloc(*ofgroup); > > } > > return error; > > diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h > > index b0262da2d..4e15167ab 100644 > > --- a/ofproto/ofproto.h > > +++ b/ofproto/ofproto.h > > @@ -563,6 +563,10 @@ int ofproto_port_get_cfm_status(const struct > ofproto *, > > enum ofputil_table_miss ofproto_table_get_miss_config(const struct > ofproto *, > > uint8_t > table_id); > > > > +void ofproto_ref(struct ofproto *); > > +void ofproto_unref(struct ofproto *); > > +bool ofproto_try_ref(struct ofproto *); > > + > > #ifdef __cplusplus > > } > > #endif > > -- > Adrián Moreno > > -- hepeng _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
