On 2/15/22 11:19, Peng He wrote:


Adrian Moreno <[email protected] <mailto:[email protected]>> 于2022年2月15日 周二 17:19写道:

    Hi Peng,

    On 2/13/22 03:14, Peng He wrote:
     >  From hepeng:
     >
    
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473
    
<https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473>
     >
     > also from guohongzhi <[email protected]
    <mailto:[email protected]>>:
     >
    
http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
    
<http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>
     >
     > also from a discussion about the mixing use of RCU and refcount in the 
mail
     > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
     >
     > A summary, as quoted from Ilya:
     >
     > "
     > RCU for ofproto was introduced for one
     > and only one reason - to avoid freeing ofproto while rules are still
     > alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
     > rule destruction.").  The goal was to allow using rules without
     > refcounting them within a single grace period.  And that forced us
     > to postpone destruction of the ofproto for a single grace period.
     > Later commit 39c9459355b6 ("Use classifier versioning.") made it
     > possible for rules to be alive for more than one grace period, so
     > the commit made ofproto wait for 2 grace periods by double postponing.
     > As we can see now, that wasn't enough and we have to wait for more
     > than 2 grace periods in certain cases.
     > "
     >
     > In a short, the ofproto should have a longer life time than rule, if
     > the rule lasts for more than 2 grace periods, the ofproto should live
     > longer to ensure rule->ofproto is valid. It's hard to predict how long
     > a ofproto should live, thus we need to use refcount on ofproto to make
     > things easy. The controversial part is that we have already used RCU 
postpone
     > to delay ofproto destrution, if we have to add refcount, is it simpler to
     > use just refcount without RCU postpone?
     >
     > IMO, I think going back to the pure refcount solution is more
     > complicated than mixing using both.
     >
     > Gaëtan Rive asks some questions on guohongzhi's v2 patch:
     >
     > during ofproto_rule_create, should we use ofproto_ref
     > or ofproto_try_ref? how can we make sure the ofproto is alive?
     >
     > By using RCU, ofproto has three states:
     >
     > state 1: alive, with refcount >= 1
     > state 2: dying, with refcount == 0, however pointer is valid
     > state 3: died, memory freed, pointer might be dangling.
     >
     > Without using RCU, there is no state 2, thus, we have to be very careful
     > every time we see a ofproto pointer. In contrast, with RCU, we can be 
sure
     > that it's alive at least in this grace peroid, so we can just check if
     > it is dying by ofproto_try_ref.
     >
     > This shows that by mixing use of RCU and refcount we can save a lot of 
work
     > worrying if ofproto is dangling.
     >
     > In short, the RCU part makes sure the ofproto is alive when we use it,
     > and the refcount part makes sure it lives longer enough.
     >
     > In this patch, I have merged guohongzhi's patch and mine, and fixes
     > accoring to the previous comments.
     >
     > v4->v5:
     > * fix the commits, remove the ref to wangyunjian's patch and
     > remove the comments describing the previous ofproto destruction code.
     > * fix group alloc leak issues.
     >
     > v5->v6:
     > * fix ofproto unref leak
     > * fix comments
     >
     > Signed-off-by: Peng He <[email protected]
    <mailto:[email protected]>>
     > Signed-off-by: guohongzhi <[email protected]
    <mailto:[email protected]>>
     > Acked-by: Mike Pattrick <[email protected] <mailto:[email protected]>>
     > ---
     >   ofproto/ofproto-dpif-xlate-cache.c |  2 +
     >   ofproto/ofproto-dpif-xlate.c       | 14 ++++---
     >   ofproto/ofproto-dpif.c             | 24 ++++++-----
     >   ofproto/ofproto-provider.h         |  2 +
     >   ofproto/ofproto.c                  | 65 +++++++++++++++++++++++++++---
     >   ofproto/ofproto.h                  |  4 ++
     >   6 files changed, 90 insertions(+), 21 deletions(-)
     >
     > diff --git a/ofproto/ofproto-dpif-xlate-cache.c
    b/ofproto/ofproto-dpif-xlate-cache.c
     > index dcc91cb38..9224ee2e6 100644
     > --- a/ofproto/ofproto-dpif-xlate-cache.c
     > +++ b/ofproto/ofproto-dpif-xlate-cache.c
     > @@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
     >   {
     >       switch (entry->type) {
     >       case XC_TABLE:
     > +        ofproto_unref(&(entry->table.ofproto->up));
     >           break;
     >       case XC_RULE:
     >           ofproto_rule_unref(&entry->rule->up);
     > @@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
     >           free(entry->learn.ofm);
     >           break;
     >       case XC_NORMAL:
     > +        ofproto_unref(&(entry->normal.ofproto->up));
     >           break;
     >       case XC_FIN_TIMEOUT:
     >           /* 'u.fin.rule' is always already held as a XC_RULE, which
     > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
     > index 6fb59e170..129cdf714 100644
     > --- a/ofproto/ofproto-dpif-xlate.c
     > +++ b/ofproto/ofproto-dpif-xlate.c
     > @@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx)
     >           struct xc_entry *entry;
     >
     >           /* Save just enough info to update mac learning table later. */
     > -        entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
     > -        entry->normal.ofproto = ctx->xbridge->ofproto;
     > -        entry->normal.in_port = flow->in_port.ofp_port;
     > -        entry->normal.dl_src = flow->dl_src;
     > -        entry->normal.vlan = vlan;
     > -        entry->normal.is_gratuitous_arp = is_grat_arp;
     > +        if (ofproto_try_ref(&ctx->xbridge->ofproto->up)) {
     > +            entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
     > +            entry->normal.ofproto = ctx->xbridge->ofproto;
     > +            entry->normal.in_port = flow->in_port.ofp_port;
     > +            entry->normal.dl_src = flow->dl_src;
     > +            entry->normal.vlan = vlan;
     > +            entry->normal.is_gratuitous_arp = is_grat_arp;
     > +        }
     >       }
     >
     >       /* Determine output bundle. */
     > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
     > index 8143dd965..c0a87456a 100644
     > --- a/ofproto/ofproto-dpif.c
     > +++ b/ofproto/ofproto-dpif.c
     > @@ -4471,12 +4471,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif
    *ofproto,
     >                   atomic_add_relaxed(&tbl->n_matched, stats->n_packets,
    &orig);
     >               }
     >               if (xcache) {
     > -                struct xc_entry *entry;
     > +                if (ofproto_try_ref(&ofproto->up)) {
     > +                    struct xc_entry *entry;
     >

    The comment above "struct xc_entry" (ofproto/ofproto-dpif-xlate-cache.h:67"
    reads:

       * An explicit reference is taken to all pointers other than the ones for
       * struct ofproto_dpif.  ofproto_dpif pointers are explicitly protected by
       * destroying all xlate caches before the ofproto is destroyed. */

    We might need to change this.


can just remove the words from "other than ... ".


Right. everything from "other than..." to the end is now wrong.


     > -                entry = xlate_cache_add_entry(xcache, XC_TABLE);
     > -                entry->table.ofproto = ofproto;
     > -                entry->table.id <http://table.id> = *table_id;
     > -                entry->table.match = true;
     > +                    entry = xlate_cache_add_entry(xcache, XC_TABLE);
     > +                    entry->table.ofproto = ofproto;
     > +                    entry->table.id <http://table.id> = *table_id;
     > +                    entry->table.match = true;
     > +                }
     >               }
     >               return rule;
     >           }
     > @@ -4507,12 +4509,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif
    *ofproto,
     >                                  stats->n_packets, &orig);
     >           }
     >           if (xcache) {
     > -            struct xc_entry *entry;
     > +            if (ofproto_try_ref(&ofproto->up)) {
     > +                struct xc_entry *entry;
     >
     > -            entry = xlate_cache_add_entry(xcache, XC_TABLE);
     > -            entry->table.ofproto = ofproto;
     > -            entry->table.id <http://table.id> = next_id;
     > -            entry->table.match = (rule != NULL);
     > +                entry = xlate_cache_add_entry(xcache, XC_TABLE);
     > +                entry->table.ofproto = ofproto;
     > +                entry->table.id <http://table.id> = next_id;
     > +                entry->table.match = (rule != NULL);
     > +            }
     >           }
     >           if (rule) {
     >               goto out;   /* Match. */
     > diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
     > index 14b909973..ed10b8c76 100644
     > --- a/ofproto/ofproto-provider.h
     > +++ b/ofproto/ofproto-provider.h
     > @@ -143,6 +143,8 @@ struct ofproto {
     >       /* Variable length mf_field mapping. Stores all configured variable
    length
     >        * meta-flow fields (struct mf_field) in a switch. */
     >       struct vl_mff_map vl_mff_map;
     > +    /* refcount to this ofproto, holds by rule/group/xlate_caches */
     > +    struct ovs_refcount refcount;

    See Gaëtan's comment.

     >   };
     >
     >   void ofproto_init_tables(struct ofproto *, int n_tables);
     > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
     > index 56aeac720..fe9bb943f 100644
     > --- a/ofproto/ofproto.c
     > +++ b/ofproto/ofproto.c
     > @@ -549,6 +549,7 @@ ofproto_create(const char *datapath_name, const char
    *datapath_type,
     >
     >       ovs_mutex_init(&ofproto->vl_mff_map.mutex);
     >       cmap_init(&ofproto->vl_mff_map.cmap);
     > +    ovs_refcount_init(&ofproto->refcount);
     >
     >       error = ofproto->ofproto_class->construct(ofproto);
     >       if (error) {
     > @@ -1695,9 +1696,27 @@ ofproto_destroy__(struct ofproto *ofproto)
     >       ofproto->ofproto_class->dealloc(ofproto);
     >   }
     >
     > -/* Destroying rules is doubly deferred, must have 'ofproto' around for 
them.
     > - * - 1st we defer the removal of the rules from the classifier
     > - * - 2nd we defer the actual destruction of the rules. */
     > +/*
     > + * Rule destruction requires ofproto to remain accessible.
     > + * Depending on the rule destruction call (shown in below), it can take
    several
     > + * RCU grace periods before the ofproto reference is not needed anymore.
     > + * The ofproto destruction callback is thus protected by a refcount,
     > + * and such destruction is itself deferred.
     > + *
     > + * remove_rules_postponed (one grace period)
     > + *       -> remove_rule_rcu
     > + *           -> remove_rule_rcu__
     > + *               -> ofproto_rule_unref -> ref count != 1
     > + *                   -> ... more grace periods.
     > + *                   -> rule_destroy_cb (> 2 grace periods)
     > + *                       -> free
     > + *
     > + * NOTE: The original ofproto destruction is only deferred by two grace
     > + * periods to keep ofproto accessible. By using refcount the destruction
    can be
     > + * deferred for longer time. But we only add refounct to
    rule/group/xlate cache, in case
     > + * there are other places which need ofproto refcount, I keep the
    original two RCU
     > + * grace periods destruction deferring.
     > + */

    Maybe s/I/we/?


Agree.

    s/refounct/refcount/


Agree.

    I don't quite get the last sentence, do you mean something like the 
following?
    We only need to add refcount to certain objects whose destruction can take
    several RCU grace periods (rule, group, xlate_cache). Other references to
    ofproto must be cleared before the 2 RCU grace periods.


yes, but do we have to explain why using 2 other than 1? My original comments
are trying to explain why 2 grace periods, not 1, as this is a legacy logic 
which
can cover most other cases.


I think the rest of the explanation is clear. The only part that felt strange was that last sentence about why we are only adding refcount to 3 objects.



    Also, I find the description of the 3 stages in your commit message quite
    illustrating:
        state 1: alive, with refcount >= 1
        state 2: dying, with refcount == 0, however pointer is valid
        state 3: died, memory freed, pointer might be dangling.

    Do you think we could include it?

yes, combining this with your description is clearer.

    We only need to add refcount to certain objects whose destruction can take
    several RCU grace periods (rule, group, xlate_cache). Other references to
    ofproto must be cleared before the 2 RCU grace periods.




     >   static void
     >   ofproto_destroy_defer__(struct ofproto *ofproto)
     >       OVS_EXCLUDED(ofproto_mutex)
     > @@ -1705,6 +1724,26 @@ ofproto_destroy_defer__(struct ofproto *ofproto)
     >       ovsrcu_postpone(ofproto_destroy__, ofproto);
     >   }
     >
     > +void
     > +ofproto_ref(struct ofproto *ofproto)
     > +{
     > +    ovs_refcount_ref(&ofproto->refcount);
     > +}
     > +
     > +bool
     > +ofproto_try_ref(struct ofproto *ofproto)
     > +{
     > +    return ovs_refcount_try_ref_rcu(&ofproto->refcount);
     > +}
     > +
     > +void
     > +ofproto_unref(struct ofproto *ofproto)
     > +{
     > +    if (ofproto && ovs_refcount_unref(&ofproto->refcount) == 1) {
     > +        ovsrcu_postpone(ofproto_destroy_defer__, ofproto);
     > +    }
     > +}
     > +
     >   void
     >   ofproto_destroy(struct ofproto *p, bool del)
     >       OVS_EXCLUDED(ofproto_mutex)
     > @@ -1736,8 +1775,7 @@ ofproto_destroy(struct ofproto *p, bool del)
     >       p->connmgr = NULL;
     >       ovs_mutex_unlock(&ofproto_mutex);
     >
     > -    /* Destroying rules is deferred, must have 'ofproto' around for 
them. */
     > -    ovsrcu_postpone(ofproto_destroy_defer__, p);
     > +    ofproto_unref(p);
     >   }
     >
     >   /* Destroys the datapath with the respective 'name' and 'type'.  With
    the Linux
     > @@ -2929,6 +2967,9 @@ ofproto_rule_destroy__(struct rule *rule)
     >       cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr));
     >       rule_actions_destroy(rule_get_actions(rule));
     >       ovs_mutex_destroy(&rule->mutex);
     > +    /* ofproto_unref() must be called first. It is possible because 
ofproto
     > +     * destruction is deferred by an RCU grace period. */
     > +    ofproto_unref(rule->ofproto);
     >       rule->ofproto->ofproto_class->rule_dealloc(rule);
     >   }
     >
     > @@ -3069,6 +3110,9 @@ group_destroy_cb(struct ofgroup *group)
     >                                                   &group->props));
     >       ofputil_bucket_list_destroy(CONST_CAST(struct ovs_list *,
     >                                              &group->buckets));
     > +    /* ofproto_unref() must be called first. It is possible because 
ofproto
     > +     * destruction is deferred by an RCU grace period. */
     > +    ofproto_unref(group->ofproto);
     >       group->ofproto->ofproto_class->group_dealloc(group);
     >   }
     >
     > @@ -5271,10 +5315,15 @@ ofproto_rule_create(struct ofproto *ofproto,
    struct cls_rule *cr,
     >       struct rule *rule;
     >       enum ofperr error;
     >
     > +    if (!ofproto_try_ref(ofproto)) {
     > +        return OFPERR_OFPFMFC_UNKNOWN;
     > +    }
     > +
     >       /* Allocate new rule. */
     >       rule = ofproto->ofproto_class->rule_alloc();
     >       if (!rule) {
     >           cls_rule_destroy(cr);
     > +        ofproto_unref(ofproto);
     >           VLOG_WARN_RL(&rl, "%s: failed to allocate a rule.", 
ofproto->name);
     >           return OFPERR_OFPFMFC_UNKNOWN;
     >       }
     > @@ -7339,8 +7388,13 @@ init_group(struct ofproto *ofproto, const struct
    ofputil_group_mod *gm,
     >           return OFPERR_OFPGMFC_BAD_TYPE;
     >       }
     >
     > +    if (!ofproto_try_ref(ofproto)) {
     > +        return OFPERR_OFPFMFC_UNKNOWN;
     > +    }
     > +
     >       *ofgroup = ofproto->ofproto_class->group_alloc();
     >       if (!*ofgroup) {
     > +        ofproto_unref(ofproto);
     >           VLOG_WARN_RL(&rl, "%s: failed to allocate group", 
ofproto->name);
     >           return OFPERR_OFPGMFC_OUT_OF_GROUPS;
     >       }
     > @@ -7377,6 +7431,7 @@ init_group(struct ofproto *ofproto, const struct
    ofputil_group_mod *gm,
     >                                                       
&(*ofgroup)->props));
     >           ofputil_bucket_list_destroy(CONST_CAST(struct ovs_list *,
     >                                                  &(*ofgroup)->buckets));
     > +        ofproto_unref(ofproto);
     >           ofproto->ofproto_class->group_dealloc(*ofgroup);
     >       }
     >       return error;
     > diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
     > index b0262da2d..4e15167ab 100644
     > --- a/ofproto/ofproto.h
     > +++ b/ofproto/ofproto.h
     > @@ -563,6 +563,10 @@ int ofproto_port_get_cfm_status(const struct 
ofproto *,
     >   enum ofputil_table_miss ofproto_table_get_miss_config(const struct
    ofproto *,
     >                                                         uint8_t 
table_id);
     >
     > +void ofproto_ref(struct ofproto *);
     > +void ofproto_unref(struct ofproto *);
     > +bool ofproto_try_ref(struct ofproto *);
     > +
     >   #ifdef  __cplusplus
     >   }
     >   #endif

-- Adrián Moreno



--
hepeng

--
Adrián Moreno

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to