On 2/5/22 04:58, Peng He wrote:
Hi, Adrian,



Adrian Moreno <[email protected] <mailto:[email protected]>> 于2022年2月4日 周五 21:37写道:



    On 1/18/22 16:01, Peng He wrote:
     >  From hepeng:
     >
    
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473
    
<https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473>
     >
     > also from guohongzhi <[email protected]
    <mailto:[email protected]>>:
     >
    
http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
    
<http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>
     >
     > also from a discussion about the mixing use of RCU and refcount in the 
mail
     > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
     >
     > A summary, as quoted from Ilya:
     >
     > "
     > RCU for ofproto was introduced for one
     > and only one reason - to avoid freeing ofproto while rules are still
     > alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
     > rule destruction.").  The goal was to allow using rules without
     > refcounting them within a single grace period.  And that forced us
     > to postpone destruction of the ofproto for a single grace period.
     > Later commit 39c9459355b6 ("Use classifier versioning.") made it
     > possible for rules to be alive for more than one grace period, so
     > the commit made ofproto wait for 2 grace periods by double postponing.
     > As we can see now, that wasn't enough and we have to wait for more
     > than 2 grace periods in certain cases.
     > "
     >
     > In a short, the ofproto should have a longer life time than rule, if
     > the rule lasts for more than 2 grace periods, the ofproto should live
     > longer to ensure rule->ofproto is valid. It's hard to predict how long
     > a ofproto should live, thus we need to use refcount on ofproto to make
     > things easy. The controversial part is that we have already used RCU 
postpone
     > to delay ofproto destrution, if we have to add refcount, is it simpler to
     > use just refcount without RCU postpone?
     >
     > IMO, I think going back to the pure refcount solution is more
     > complicated than mixing using both.
     >
     > Gaëtan Rive asks some questions on guohongzhi's v2 patch:
     >
     > during ofproto_rule_create, should we use ofproto_ref
     > or ofproto_try_ref? how can we make sure the ofproto is alive?
     >
     > By using RCU, ofproto has three states:
     >
     > state 1: alive, with refcount >= 1
     > state 2: dying, with refcount == 0, however pointer is valid
     > state 3: died, memory freed, pointer might be dangling.
     >
     > Without using RCU, there is no state 2, thus, we have to be very careful
     > every time we see a ofproto pointer. In contrast, with RCU, we can be 
sure
     > that it's alive at least in this grace peroid, so we can just check if
     > it is dying by ofproto_try_ref.
     >
     > This shows that by mixing use of RCU and refcount we can save a lot of 
work
     > worrying if ofproto is dangling.
     >
     > In short, the RCU part makes sure the ofproto is alive when we use it,
     > and the refcount part makes sure it lives longer enough.
     >
     > Also regarding a new patch filed recently, people are now making use
     > of RCU to protect ofproto:
     >
     >
    
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
    
<https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>
     >
     > In this patch, I have merged guohongzhi's patch and mine, and fixes
     > accoring to the previous comments.
     >
     > Signed-off-by: Peng He <[email protected]
    <mailto:[email protected]>>
     > Signed-off-by: guohongzhi <[email protected]
    <mailto:[email protected]>>
     > ---
     >   ofproto/ofproto-dpif-xlate-cache.c |  2 +
     >   ofproto/ofproto-dpif-xlate.c       | 14 ++++---
     >   ofproto/ofproto-dpif.c             | 24 +++++++-----
     >   ofproto/ofproto-provider.h         |  2 +
     >   ofproto/ofproto.c                  | 62 +++++++++++++++++++++++++++---
     >   ofproto/ofproto.h                  |  4 ++
     >   6 files changed, 87 insertions(+), 21 deletions(-)
     >
     > diff --git a/ofproto/ofproto-dpif-xlate-cache.c
    b/ofproto/ofproto-dpif-xlate-cache.c
     > index dcc91cb38..9224ee2e6 100644
     > --- a/ofproto/ofproto-dpif-xlate-cache.c
     > +++ b/ofproto/ofproto-dpif-xlate-cache.c
     > @@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
     >   {
     >       switch (entry->type) {
     >       case XC_TABLE:
     > +        ofproto_unref(&(entry->table.ofproto->up));
     >           break;
     >       case XC_RULE:
     >           ofproto_rule_unref(&entry->rule->up);
     > @@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct xc_entry *entry)
     >           free(entry->learn.ofm);
     >           break;
     >       case XC_NORMAL:
     > +        ofproto_unref(&(entry->normal.ofproto->up));
     >           break;
     >       case XC_FIN_TIMEOUT:
     >           /* 'u.fin.rule' is always already held as a XC_RULE, which
     > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
     > index 6fb59e170..129cdf714 100644
     > --- a/ofproto/ofproto-dpif-xlate.c
     > +++ b/ofproto/ofproto-dpif-xlate.c
     > @@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx)
     >           struct xc_entry *entry;
     >
     >           /* Save just enough info to update mac learning table later. */
     > -        entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
     > -        entry->normal.ofproto = ctx->xbridge->ofproto;
     > -        entry->normal.in_port = flow->in_port.ofp_port;
     > -        entry->normal.dl_src = flow->dl_src;
     > -        entry->normal.vlan = vlan;
     > -        entry->normal.is_gratuitous_arp = is_grat_arp;
     > +        if (ofproto_try_ref(&ctx->xbridge->ofproto->up)) {
     > +            entry = xlate_cache_add_entry(ctx->xin->xcache, XC_NORMAL);
     > +            entry->normal.ofproto = ctx->xbridge->ofproto;
     > +            entry->normal.in_port = flow->in_port.ofp_port;
     > +            entry->normal.dl_src = flow->dl_src;
     > +            entry->normal.vlan = vlan;
     > +            entry->normal.is_gratuitous_arp = is_grat_arp;
     > +        }
     >       }
     >
     >       /* Determine output bundle. */
     > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
     > index 8143dd965..c0a87456a 100644
     > --- a/ofproto/ofproto-dpif.c
     > +++ b/ofproto/ofproto-dpif.c
     > @@ -4471,12 +4471,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif
    *ofproto,
     >                   atomic_add_relaxed(&tbl->n_matched, stats->n_packets,
    &orig);
     >               }
     >               if (xcache) {
     > -                struct xc_entry *entry;
     > +                if (ofproto_try_ref(&ofproto->up)) {
     > +                    struct xc_entry *entry;
     >
     > -                entry = xlate_cache_add_entry(xcache, XC_TABLE);
     > -                entry->table.ofproto = ofproto;
     > -                entry->table.id <http://table.id> = *table_id;
     > -                entry->table.match = true;
     > +                    entry = xlate_cache_add_entry(xcache, XC_TABLE);
     > +                    entry->table.ofproto = ofproto;
     > +                    entry->table.id <http://table.id> = *table_id;
     > +                    entry->table.match = true;
     > +                }
     >               }
     >               return rule;
     >           }
     > @@ -4507,12 +4509,14 @@ rule_dpif_lookup_from_table(struct ofproto_dpif
    *ofproto,
     >                                  stats->n_packets, &orig);
     >           }
     >           if (xcache) {
     > -            struct xc_entry *entry;
     > +            if (ofproto_try_ref(&ofproto->up)) {
     > +                struct xc_entry *entry;
     >
     > -            entry = xlate_cache_add_entry(xcache, XC_TABLE);
     > -            entry->table.ofproto = ofproto;
     > -            entry->table.id <http://table.id> = next_id;
     > -            entry->table.match = (rule != NULL);
     > +                entry = xlate_cache_add_entry(xcache, XC_TABLE);
     > +                entry->table.ofproto = ofproto;
     > +                entry->table.id <http://table.id> = next_id;
     > +                entry->table.match = (rule != NULL);
     > +            }
     >           }
     >           if (rule) {
     >               goto out;   /* Match. */
     > diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
     > index 14b909973..ed10b8c76 100644
     > --- a/ofproto/ofproto-provider.h
     > +++ b/ofproto/ofproto-provider.h
     > @@ -143,6 +143,8 @@ struct ofproto {
     >       /* Variable length mf_field mapping. Stores all configured variable
    length
     >        * meta-flow fields (struct mf_field) in a switch. */
     >       struct vl_mff_map vl_mff_map;
     > +    /* refcount to this ofproto, holds by rule/group/xlate_caches */
     > +    struct ovs_refcount refcount;
     >   };
     >
     >   void ofproto_init_tables(struct ofproto *, int n_tables);
     > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
     > index 56aeac720..10a22d9ec 100644
     > --- a/ofproto/ofproto.c
     > +++ b/ofproto/ofproto.c
     > @@ -549,6 +549,7 @@ ofproto_create(const char *datapath_name, const char
    *datapath_type,
     >
     >       ovs_mutex_init(&ofproto->vl_mff_map.mutex);
     >       cmap_init(&ofproto->vl_mff_map.cmap);
     > +    ovs_refcount_init(&ofproto->refcount);
     >
     >       error = ofproto->ofproto_class->construct(ofproto);
     >       if (error) {
     > @@ -1695,9 +1696,24 @@ ofproto_destroy__(struct ofproto *ofproto)
     >       ofproto->ofproto_class->dealloc(ofproto);
     >   }
     >
     > -/* Destroying rules is doubly deferred, must have 'ofproto' around for 
them.
     > - * - 1st we defer the removal of the rules from the classifier
     > - * - 2nd we defer the actual destruction of the rules. */
     > +/* We used to use defer function to wait for two grace periods
     > + * as we assume the rule that holds the ofproto pointer will
     > + * live at most two grace period. Howvever, we found at certain
     > + * cases, this assumption does not stand.
     > + *
     > + * destroying a rule may have to wait multiple grace periods:
     > + * remove_rules_postponed (one grace period)
     > + *       -> remove_rule_rcu
     > + *           -> remove_rule_rcu__
     > + *               -> ofproto_rule_unref -> ref count != 1
     > + *                   -> ... more grace periods.
     > + *                   -> rule_destroy_cb (> 2 grace periods)
     > + *                       -> free
     > + *
     > + * So we have to check the refcount for sure all the rules
     > + * have been destroyed.
     > + *
     > + */
     >   static void
     >   ofproto_destroy_defer__(struct ofproto *ofproto)
     >       OVS_EXCLUDED(ofproto_mutex)
     > @@ -1705,6 +1721,26 @@ ofproto_destroy_defer__(struct ofproto *ofproto)
     >       ovsrcu_postpone(ofproto_destroy__, ofproto);
     >   }
     >
     > +void
     > +ofproto_ref(struct ofproto *ofproto)
     > +{
     > +    ovs_refcount_ref(&ofproto->refcount);
     > +}
     > +
     > +bool
     > +ofproto_try_ref(struct ofproto *ofproto)
     > +{
     > +    return ovs_refcount_try_ref_rcu(&ofproto->refcount);
     > +}
     > +
     > +void
     > +ofproto_unref(struct ofproto *ofproto)
     > +{
     > +    if (ofproto && ovs_refcount_unref(&ofproto->refcount) == 1) {
     > +        ovsrcu_postpone(ofproto_destroy_defer__, ofproto);
     > +    }
     > +}
     > +
     >   void
     >   ofproto_destroy(struct ofproto *p, bool del)
     >       OVS_EXCLUDED(ofproto_mutex)
     > @@ -1736,8 +1772,7 @@ ofproto_destroy(struct ofproto *p, bool del)
     >       p->connmgr = NULL;
     >       ovs_mutex_unlock(&ofproto_mutex);
     >
     > -    /* Destroying rules is deferred, must have 'ofproto' around for 
them. */
     > -    ovsrcu_postpone(ofproto_destroy_defer__, p);
     > +    ofproto_unref(p);
     >   }
     >
     >   /* Destroys the datapath with the respective 'name' and 'type'.  With
    the Linux
     > @@ -2929,6 +2964,10 @@ ofproto_rule_destroy__(struct rule *rule)
     >       cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr));
     >       rule_actions_destroy(rule_get_actions(rule));
     >       ovs_mutex_destroy(&rule->mutex);
     > +    /* we need to call ofproto_unref first, and thanks to rcu, ofproto
    is alive
     > +     * otherwise, group is freed, group->ofproto is invalid
     > +     */
     > +    ofproto_unref(rule->ofproto);
     >       rule->ofproto->ofproto_class->rule_dealloc(rule);
     >   }
     >
     > @@ -3069,6 +3108,10 @@ group_destroy_cb(struct ofgroup *group)
     >                                                   &group->props));
     >       ofputil_bucket_list_destroy(CONST_CAST(struct ovs_list *,
     >                                              &group->buckets));
     > +    /* we need to call ofproto_unref first, and thanks to rcu, ofproto
    is alive
     > +     * otherwise, group is freed, group->ofproto is invalid
     > +     */
     > +    ofproto_unref(group->ofproto);
     >       group->ofproto->ofproto_class->group_dealloc(group);
     >   }
     >
     > @@ -5279,6 +5322,11 @@ ofproto_rule_create(struct ofproto *ofproto,
    struct cls_rule *cr,
     >           return OFPERR_OFPFMFC_UNKNOWN;
     >       }
     >
     > +    if (!ofproto_try_ref(ofproto)) {
     > +        cls_rule_destroy(cr);
     > +        return OFPERR_OFPFMFC_UNKNOWN;
     > +    }
     > +
     >       /* Initialize base state. */
     >       *CONST_CAST(struct ofproto **, &rule->ofproto) = ofproto;
     >       cls_rule_move(CONST_CAST(struct cls_rule *, &rule->cr), cr);
     > @@ -7345,6 +7393,10 @@ init_group(struct ofproto *ofproto, const struct
    ofputil_group_mod *gm,
     >           return OFPERR_OFPGMFC_OUT_OF_GROUPS;
     >       }
     >
     > +    if (!ofproto_try_ref(ofproto)) {
     > +        return OFPERR_OFPFMFC_UNKNOWN;
     > +    }
     > +
     >       *CONST_CAST(struct ofproto **, &(*ofgroup)->ofproto) = ofproto;
     >       *CONST_CAST(uint32_t *, &((*ofgroup)->group_id)) = gm->group_id;
     >       *CONST_CAST(enum ofp11_group_type *, &(*ofgroup)->type) = gm->type;
     > diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
     > index b0262da2d..4e15167ab 100644
     > --- a/ofproto/ofproto.h
     > +++ b/ofproto/ofproto.h
     > @@ -563,6 +563,10 @@ int ofproto_port_get_cfm_status(const struct 
ofproto *,
     >   enum ofputil_table_miss ofproto_table_get_miss_config(const struct
    ofproto *,
     >                                                         uint8_t 
table_id);
     >
     > +void ofproto_ref(struct ofproto *);
     > +void ofproto_unref(struct ofproto *);
     > +bool ofproto_try_ref(struct ofproto *);
     > +
     >   #ifdef  __cplusplus
     >   }
     >   #endif


    Hi Peng,

    Do we still to protect the reference that "struct upcall" holds? i.e:
    
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
    
<https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>


there are two code path which leads to the call to "upcall_receive", the userspace uses upcall_cb and the kernel datapath uses recv_upcalls.  In both places, the upcall struct is an on-stack-allocated value.
        
For userspace, the upcall_cb is called directly from pmd thread, which I guess inside the upcall_cb, no places will sleep or call  rcu_wait, meaning that this pointer to the ofproto will not be used for too long, i.e. longer than one grace period.

The fact of upcall_cb calling from pmd thread can be used to prove that upcall_receive contains no rcu_wait, so also for the kernel datapath,
I guess no need to protect the upcall ref in "struct upcall".

Maybe I'm missing something, but what I see is that the main thread can call ofproto_destroy at any time which will call ofproto_dpif->destruct(). The ofproto_dpif destruct() will run rcu_synchronize() which will block until all handler threads have finished processing a batch.

But if a hanlder thread wakes up before the main thread does, it could pick up the "ofproto_dpif" reference again from "all_ofproto_dpifs_by_uuid" and use it while the main thread continues destruct()ing it. The result could very well be a ovs crash with the stack trace reported by Yunjian, i.e:

(gdb) bt
  hmap_next (hmap.h:398)
  seq_wake_waiters (seq.c:326)
  seq_change_protected (seq.c:134)
  seq_change (seq.c:144)
  ofproto_dpif_send_async_msg (ofproto_dpif.c:263)
  process_upcall (ofproto_dpif_upcall.c:1782)
  recv_upcalls (ofproto_dpif_upcall.c:1026)
  udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945)
  ovsthread_wrapper (ovs_thread.c:734)


Maybe I've misinterpreted you're reference to Yunjian's patch so just wanted to make sure.

Thanks
--
Adrián Moreno

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to