On Sat, Feb 5, 2022, at 06:26, Peng He wrote:
> From hepeng:
> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473
>
> also from guohongzhi <[email protected]>:
> http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
>
> also from a discussion about the mixing use of RCU and refcount in the mail
> list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
>
> A summary, as quoted from Ilya:
>
> "
> RCU for ofproto was introduced for one
> and only one reason - to avoid freeing ofproto while rules are still
> alive. This was done in commit f416c8d61601 ("ofproto: RCU postpone
> rule destruction."). The goal was to allow using rules without
> refcounting them within a single grace period. And that forced us
> to postpone destruction of the ofproto for a single grace period.
> Later commit 39c9459355b6 ("Use classifier versioning.") made it
> possible for rules to be alive for more than one grace period, so
> the commit made ofproto wait for 2 grace periods by double postponing.
> As we can see now, that wasn't enough and we have to wait for more
> than 2 grace periods in certain cases.
> "
>
> In a short, the ofproto should have a longer life time than rule, if
> the rule lasts for more than 2 grace periods, the ofproto should live
> longer to ensure rule->ofproto is valid. It's hard to predict how long
> a ofproto should live, thus we need to use refcount on ofproto to make
> things easy. The controversial part is that we have already used RCU postpone
> to delay ofproto destrution, if we have to add refcount, is it simpler to
> use just refcount without RCU postpone?
>
> IMO, I think going back to the pure refcount solution is more
> complicated than mixing using both.
>
> Gaëtan Rive asks some questions on guohongzhi's v2 patch:
>
> during ofproto_rule_create, should we use ofproto_ref
> or ofproto_try_ref? how can we make sure the ofproto is alive?
>
> By using RCU, ofproto has three states:
>
> state 1: alive, with refcount >= 1
> state 2: dying, with refcount == 0, however pointer is valid
> state 3: died, memory freed, pointer might be dangling.
>
> Without using RCU, there is no state 2, thus, we have to be very careful
> every time we see a ofproto pointer. In contrast, with RCU, we can be sure
> that it's alive at least in this grace peroid, so we can just check if
> it is dying by ofproto_try_ref.
>
> This shows that by mixing use of RCU and refcount we can save a lot of work
> worrying if ofproto is dangling.
>
> In short, the RCU part makes sure the ofproto is alive when we use it,
> and the refcount part makes sure it lives longer enough.
>
> In this patch, I have merged guohongzhi's patch and mine, and fixes
> accoring to the previous comments.
>
> v4->v5:
> * fix the comments, remove the ref to wangyunjian's patch and
> remove the comments describing the previous ofproto destruction code.
> * fix group alloc leak issues.
>
> Signed-off-by: Peng He <[email protected]>
> Signed-off-by: guohongzhi <[email protected]>
> Acked-by: Mike Pattrick <[email protected]>
> ---
> ofproto/ofproto-dpif-xlate-cache.c | 2 ++
> ofproto/ofproto-dpif-xlate.c | 14 ++++----
> ofproto/ofproto-dpif.c | 24 +++++++------
> ofproto/ofproto-provider.h | 2 ++
> ofproto/ofproto.c | 57 +++++++++++++++++++++++++++---
> ofproto/ofproto.h | 4 +++
> 6 files changed, 82 insertions(+), 21 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-xlate-cache.c
> b/ofproto/ofproto-dpif-xlate-cache.c
> index dcc91cb38..9224ee2e6 100644
> --- a/ofproto/ofproto-dpif-xlate-cache.c
> +++ b/ofproto/ofproto-dpif-xlate-cache.c
> @@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
> {
> switch (entry->type) {
> case XC_TABLE:
> + ofproto_unref(&(entry->table.ofproto->up));
> break;
> case XC_RULE:
> ofproto_rule_unref(&entry->rule->up);
> @@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
> free(entry->learn.ofm);
> break;
> case XC_NORMAL:
> + ofproto_unref(&(entry->normal.ofproto->up));
> break;
> case XC_FIN_TIMEOUT:
> /* 'u.fin.rule' is always already held as a XC_RULE, which
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 6fb59e170..129cdf714 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx)
> struct xc_entry *entry;
>
> /* Save just enough info to update mac learning table later. */
> - entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
> - entry->normal.ofproto = ctx->xbridge->ofproto;
> - entry->normal.in_port = flow->in_port.ofp_port;
> - entry->normal.dl_src = flow->dl_src;
> - entry->normal.vlan = vlan;
> - entry->normal.is_gratuitous_arp = is_grat_arp;
> + if (ofproto_try_ref(&ctx->xbridge->ofproto->up)) {
> + entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
> + entry->normal.ofproto = ctx->xbridge->ofproto;
> + entry->normal.in_port = flow->in_port.ofp_port;
> + entry->normal.dl_src = flow->dl_src;
> + entry->normal.vlan = vlan;
> + entry->normal.is_gratuitous_arp = is_grat_arp;
> + }
> }
>
> /* Determine output bundle. */
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 8143dd965..c0a87456a 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -4471,12 +4471,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif
> *ofproto,
> atomic_add_relaxed(&tbl->n_matched, stats->n_packets,
> &orig);
> }
> if (xcache) {
> - struct xc_entry *entry;
> + if (ofproto_try_ref(&ofproto->up)) {
> + struct xc_entry *entry;
>
> - entry = xlate_cache_add_entry(xcache, XC_TABLE);
> - entry->table.ofproto = ofproto;
> - entry->table.id = *table_id;
> - entry->table.match = true;
> + entry = xlate_cache_add_entry(xcache, XC_TABLE);
> + entry->table.ofproto = ofproto;
> + entry->table.id = *table_id;
> + entry->table.match = true;
> + }
> }
> return rule;
> }
> @@ -4507,12 +4509,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif
> *ofproto,
> stats->n_packets, &orig);
> }
> if (xcache) {
> - struct xc_entry *entry;
> + if (ofproto_try_ref(&ofproto->up)) {
> + struct xc_entry *entry;
>
> - entry = xlate_cache_add_entry(xcache, XC_TABLE);
> - entry->table.ofproto = ofproto;
> - entry->table.id = next_id;
> - entry->table.match = (rule != NULL);
> + entry = xlate_cache_add_entry(xcache, XC_TABLE);
> + entry->table.ofproto = ofproto;
> + entry->table.id = next_id;
> + entry->table.match = (rule != NULL);
> + }
> }
> if (rule) {
> goto out; /* Match. */
> diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
> index 14b909973..ed10b8c76 100644
> --- a/ofproto/ofproto-provider.h
> +++ b/ofproto/ofproto-provider.h
> @@ -143,6 +143,8 @@ struct ofproto {
> /* Variable length mf_field mapping. Stores all configured variable
> length
> * meta-flow fields (struct mf_field) in a switch. */
> struct vl_mff_map vl_mff_map;
> + /* refcount to this ofproto, holds by rule/group/xlate_caches */
holds -> held
> + struct ovs_refcount refcount;
> };
>
> void ofproto_init_tables(struct ofproto *, int n_tables);
> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
> index 56aeac720..e83869eda 100644
> --- a/ofproto/ofproto.c
> +++ b/ofproto/ofproto.c
> @@ -549,6 +549,7 @@ ofproto_create(const char *datapath_name, const
> char *datapath_type,
>
> ovs_mutex_init(&ofproto->vl_mff_map.mutex);
> cmap_init(&ofproto->vl_mff_map.cmap);
> + ovs_refcount_init(&ofproto->refcount);
>
> error = ofproto->ofproto_class->construct(ofproto);
> if (error) {
> @@ -1695,9 +1696,20 @@ ofproto_destroy__(struct ofproto *ofproto)
> ofproto->ofproto_class->dealloc(ofproto);
> }
>
> -/* Destroying rules is doubly deferred, must have 'ofproto' around for them.
> - * - 1st we defer the removal of the rules from the classifier
> - * - 2nd we defer the actual destruction of the rules. */
> +/*
> + * destroying a rule may have to wait multiple grace periods:
> + * remove_rules_postponed (one grace period)
> + * -> remove_rule_rcu
> + * -> remove_rule_rcu__
> + * -> ofproto_rule_unref -> ref count != 1
> + * -> ... more grace periods.
> + * -> rule_destroy_cb (> 2 grace periods)
> + * -> free
> + *
> + * So we have to check the refcount for sure all the rules
> + * have been destroyed.
> + *
> + */
This comment is not clear enough. Here is a proposed reformulation
that could be used instead. Feel free to either take it or improve
the original comment to make it clearer.
/* Rule destruction requires ofproto to remain accessible.
* Depending on the rule destruction call, it can take several
* RCU grace periods before the ofproto reference is not needed anymore.
* The ofproto destruction callback is thus protected by a refcount,
* and such destruction is itself deferred by an RCU grace period. */
> static void
> ofproto_destroy_defer__(struct ofproto *ofproto)
> OVS_EXCLUDED(ofproto_mutex)
> @@ -1705,6 +1717,26 @@ ofproto_destroy_defer__(struct ofproto *ofproto)
> ovsrcu_postpone(ofproto_destroy__, ofproto);
> }
>
> +void
> +ofproto_ref(struct ofproto *ofproto)
> +{
> + ovs_refcount_ref(&ofproto->refcount);
> +}
> +
> +bool
> +ofproto_try_ref(struct ofproto *ofproto)
> +{
> + return ovs_refcount_try_ref_rcu(&ofproto->refcount);
> +}
> +
> +void
> +ofproto_unref(struct ofproto *ofproto)
> +{
> + if (ofproto && ovs_refcount_unref(&ofproto->refcount) == 1) {
> + ovsrcu_postpone(ofproto_destroy_defer__, ofproto);
> + }
> +}
> +
> void
> ofproto_destroy(struct ofproto *p, bool del)
> OVS_EXCLUDED(ofproto_mutex)
> @@ -1736,8 +1768,7 @@ ofproto_destroy(struct ofproto *p, bool del)
> p->connmgr = NULL;
> ovs_mutex_unlock(&ofproto_mutex);
>
> - /* Destroying rules is deferred, must have 'ofproto' around for them. */
> - ovsrcu_postpone(ofproto_destroy_defer__, p);
> + ofproto_unref(p);
> }
>
> /* Destroys the datapath with the respective 'name' and 'type'. With the
> Linux
> @@ -2929,6 +2960,10 @@ ofproto_rule_destroy__(struct rule *rule)
> cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr));
> rule_actions_destroy(rule_get_actions(rule));
> ovs_mutex_destroy(&rule->mutex);
> + /* we need to call ofproto_unref first, and thanks to rcu, ofproto is
> alive
> + * otherwise, group is freed, group->ofproto is invalid
> + */
Reformulation:
/* ofproto_unref() must be called first. It is possible because ofproto
* destruction is deferred by an RCU grace period. */
> + ofproto_unref(rule->ofproto);
> rule->ofproto->ofproto_class->rule_dealloc(rule);
> }
>
> @@ -3069,6 +3104,10 @@ group_destroy_cb(struct ofgroup *group)
> &group->props));
> ofputil_bucket_list_destroy(CONST_CAST(struct ovs_list *,
> &group->buckets));
> + /* we need to call ofproto_unref first, and thanks to rcu, ofproto is
> alive
> + * otherwise, group is freed, group->ofproto is invalid
> + */
Reformulation:
/* ofproto_unref() must be called first. It is possible because ofproto
* destruction is deferred by an RCU grace period. */
> + ofproto_unref(group->ofproto);
> group->ofproto->ofproto_class->group_dealloc(group);
> }
>
> @@ -5271,6 +5310,10 @@ ofproto_rule_create(struct ofproto *ofproto,
> struct cls_rule *cr,
> struct rule *rule;
> enum ofperr error;
>
> + if (!ofproto_try_ref(ofproto)) {
> + return OFPERR_OFPFMFC_UNKNOWN;
> + }
> +
> /* Allocate new rule. */
> rule = ofproto->ofproto_class->rule_alloc();
> if (!rule) {
Unfortunately, if (!rule), then a dangling ofproto reference has been counted.
It must be either unref() in all the error path of this function, or reorganized
somehow.
Here it should be directly ofproto_unref(), while in the later error,
ofproto_rule_destroy__ would be called and ofproto_unref() would be done within.
> @@ -7339,6 +7382,10 @@ init_group(struct ofproto *ofproto, const struct
> ofputil_group_mod *gm,
> return OFPERR_OFPGMFC_BAD_TYPE;
> }
>
> + if (!ofproto_try_ref(ofproto)) {
> + return OFPERR_OFPFMFC_UNKNOWN;
> + }
> +
> *ofgroup = ofproto->ofproto_class->group_alloc();
> if (!*ofgroup) {
ditto, although in the later error path line 7422, ofproto_unref() should
be called as it is not done as part of any of the three calls, unlike for the
rules above.
> VLOG_WARN_RL(&rl, "%s: failed to allocate group", ofproto->name);
> diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
> index b0262da2d..4e15167ab 100644
> --- a/ofproto/ofproto.h
> +++ b/ofproto/ofproto.h
> @@ -563,6 +563,10 @@ int ofproto_port_get_cfm_status(const struct ofproto *,
> enum ofputil_table_miss ofproto_table_get_miss_config(const struct ofproto *,
> uint8_t table_id);
>
> +void ofproto_ref(struct ofproto *);
> +void ofproto_unref(struct ofproto *);
> +bool ofproto_try_ref(struct ofproto *);
> +
> #ifdef __cplusplus
> }
> #endif
> --
> 2.25.1
--
Gaetan Rivet
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev