Re: [ovs-dev] [ovs-dev v4] ofproto: add refcount to ofproto to fix ofproto use-after-free

Adrian Moreno Wed, 09 Feb 2022 01:41:29 -0800


On 2/9/22 10:08, Peng He wrote:

Hi, Adrian,

Adrian Moreno <[email protected] <mailto:[email protected]>> 于2022年2月8日周二 18:14写道：




    On 2/5/22 04:58, Peng He wrote:
     > Hi, Adrian,
     >
     >
     >
     > Adrian Moreno <[email protected] <mailto:[email protected]>
    <mailto:[email protected] <mailto:[email protected]>>> 于2022年2月4日
     > 周五 21:37写道：
     >
     >
     >
     >     On 1/18/22 16:01, Peng He wrote:
     >      >  From hepeng:
     >      >
     >
    
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473
    
<https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473>

> <https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473 <https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/#2487473>>

     >      >
     >      > also from guohongzhi <[email protected]
    <mailto:[email protected]>
     >     <mailto:[email protected] <mailto:[email protected]>>>:
     >      >
     >
    
http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
    
<http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>

> <http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ <http://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>>

     >      >
     >      > also from a discussion about the mixing use of RCU and refcount in
    the mail
     >      > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
     >      >
     >      > A summary, as quoted from Ilya:
     >      >
     >      > "
     >      > RCU for ofproto was introduced for one
     >      > and only one reason - to avoid freeing ofproto while rules are 
still
     >      > alive.  This was done in commit f416c8d61601 ("ofproto: RCU 
postpone
     >      > rule destruction.").  The goal was to allow using rules without
     >      > refcounting them within a single grace period.  And that forced us
     >      > to postpone destruction of the ofproto for a single grace period.
     >      > Later commit 39c9459355b6 ("Use classifier versioning.") made it
     >      > possible for rules to be alive for more than one grace period, so
     >      > the commit made ofproto wait for 2 grace periods by double 
postponing.
     >      > As we can see now, that wasn't enough and we have to wait for more
     >      > than 2 grace periods in certain cases.
     >      > "
     >      >
     >      > In a short, the ofproto should have a longer life time than rule, 
if
     >      > the rule lasts for more than 2 grace periods, the ofproto should 
live
     >      > longer to ensure rule->ofproto is valid. It's hard to predict how 
long
     >      > a ofproto should live, thus we need to use refcount on ofproto to 
make
     >      > things easy. The controversial part is that we have already used
    RCU postpone
     >      > to delay ofproto destrution, if we have to add refcount, is it
    simpler to
     >      > use just refcount without RCU postpone?
     >      >
     >      > IMO, I think going back to the pure refcount solution is more
     >      > complicated than mixing using both.
     >      >
     >      > Gaëtan Rive asks some questions on guohongzhi's v2 patch:
     >      >
     >      > during ofproto_rule_create, should we use ofproto_ref
     >      > or ofproto_try_ref? how can we make sure the ofproto is alive?
     >      >
     >      > By using RCU, ofproto has three states:
     >      >
     >      > state 1: alive, with refcount >= 1
     >      > state 2: dying, with refcount == 0, however pointer is valid
     >      > state 3: died, memory freed, pointer might be dangling.
     >      >
     >      > Without using RCU, there is no state 2, thus, we have to be very
    careful
     >      > every time we see a ofproto pointer. In contrast, with RCU, we can
    be sure
     >      > that it's alive at least in this grace peroid, so we can just 
check if
     >      > it is dying by ofproto_try_ref.
     >      >
     >      > This shows that by mixing use of RCU and refcount we can save a
    lot of work
     >      > worrying if ofproto is dangling.
     >      >
     >      > In short, the RCU part makes sure the ofproto is alive when we 
use it,
     >      > and the refcount part makes sure it lives longer enough.
     >      >
     >      > Also regarding a new patch filed recently, people are now making 
use
     >      > of RCU to protect ofproto:
     >      >
     >      >
     >
    
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
    
<https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>

> <https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ <https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>>

     >      >
     >      > In this patch, I have merged guohongzhi's patch and mine, and 
fixes
     >      > accoring to the previous comments.
     >      >
     >      > Signed-off-by: Peng He <[email protected]
    <mailto:[email protected]>
     >     <mailto:[email protected] 
<mailto:[email protected]>>>
     >      > Signed-off-by: guohongzhi <[email protected]
    <mailto:[email protected]>
     >     <mailto:[email protected] <mailto:[email protected]>>>
     >      > ---
     >      >   ofproto/ofproto-dpif-xlate-cache.c |  2 +
     >      >   ofproto/ofproto-dpif-xlate.c       | 14 ++++---
     >      >   ofproto/ofproto-dpif.c             | 24 +++++++-----
     >      >   ofproto/ofproto-provider.h         |  2 +
     >      >   ofproto/ofproto.c                  | 62
    +++++++++++++++++++++++++++---
     >      >   ofproto/ofproto.h                  |  4 ++
     >      >   6 files changed, 87 insertions(+), 21 deletions(-)
     >      >
     >      > diff --git a/ofproto/ofproto-dpif-xlate-cache.c
     >     b/ofproto/ofproto-dpif-xlate-cache.c
     >      > index dcc91cb38..9224ee2e6 100644
     >      > --- a/ofproto/ofproto-dpif-xlate-cache.c
     >      > +++ b/ofproto/ofproto-dpif-xlate-cache.c
     >      > @@ -209,6 +209,7 @@ xlate_cache_clear_entry(struct xc_entry 
*entry)
     >      >   {
     >      >       switch (entry->type) {
     >      >       case XC_TABLE:
     >      > +        ofproto_unref(&(entry->table.ofproto->up));
     >      >           break;
     >      >       case XC_RULE:
     >      >           ofproto_rule_unref(&entry->rule->up);
     >      > @@ -231,6 +232,7 @@ xlate_cache_clear_entry(struct xc_entry 
*entry)
     >      >           free(entry->learn.ofm);
     >      >           break;
     >      >       case XC_NORMAL:
     >      > +        ofproto_unref(&(entry->normal.ofproto->up));
     >      >           break;
     >      >       case XC_FIN_TIMEOUT:
     >      >           /* 'u.fin.rule' is always already held as a XC_RULE, 
which
     >      > diff --git a/ofproto/ofproto-dpif-xlate.c
    b/ofproto/ofproto-dpif-xlate.c
     >      > index 6fb59e170..129cdf714 100644
     >      > --- a/ofproto/ofproto-dpif-xlate.c
     >      > +++ b/ofproto/ofproto-dpif-xlate.c
     >      > @@ -3024,12 +3024,14 @@ xlate_normal(struct xlate_ctx *ctx)
     >      >           struct xc_entry *entry;
     >      >
     >      >           /* Save just enough info to update mac learning table
    later. */
     >      > -        entry = xlate_cache_add_entry(ctx->xin->xcache, 
XC_NORMAL);
     >      > -        entry->normal.ofproto = ctx->xbridge->ofproto;
     >      > -        entry->normal.in_port = flow->in_port.ofp_port;
     >      > -        entry->normal.dl_src = flow->dl_src;
     >      > -        entry->normal.vlan = vlan;
     >      > -        entry->normal.is_gratuitous_arp = is_grat_arp;
     >      > +        if (ofproto_try_ref(&ctx->xbridge->ofproto->up)) {
     >      > +            entry = xlate_cache_add_entry(ctx->xin->xcache,
    XC_NORMAL);
     >      > +            entry->normal.ofproto = ctx->xbridge->ofproto;
     >      > +            entry->normal.in_port = flow->in_port.ofp_port;
     >      > +            entry->normal.dl_src = flow->dl_src;
     >      > +            entry->normal.vlan = vlan;
     >      > +            entry->normal.is_gratuitous_arp = is_grat_arp;
     >      > +        }
     >      >       }
     >      >
     >      >       /* Determine output bundle. */
     >      > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
     >      > index 8143dd965..c0a87456a 100644
     >      > --- a/ofproto/ofproto-dpif.c
     >      > +++ b/ofproto/ofproto-dpif.c
     >      > @@ -4471,12 +4471,14 @@ rule_dpif_lookup_from_table(struct
    ofproto_dpif
     >     *ofproto,
     >      >                   atomic_add_relaxed(&tbl->n_matched,
    stats->n_packets,
     >     &orig);
     >      >               }
     >      >               if (xcache) {
     >      > -                struct xc_entry *entry;
     >      > +                if (ofproto_try_ref(&ofproto->up)) {
     >      > +                    struct xc_entry *entry;
     >      >
     >      > -                entry = xlate_cache_add_entry(xcache, XC_TABLE);
     >      > -                entry->table.ofproto = ofproto;
     >      > -                entry->table.id <http://table.id>
    <http://table.id <http://table.id>> = *table_id;
     >      > -                entry->table.match = true;
     >      > +                    entry = xlate_cache_add_entry(xcache, 
XC_TABLE);
     >      > +                    entry->table.ofproto = ofproto;
     >      > +                    entry->table.id <http://table.id>
    <http://table.id <http://table.id>> = *table_id;
     >      > +                    entry->table.match = true;
     >      > +                }
     >      >               }
     >      >               return rule;
     >      >           }
     >      > @@ -4507,12 +4509,14 @@ rule_dpif_lookup_from_table(struct
    ofproto_dpif
     >     *ofproto,
     >      >                                  stats->n_packets, &orig);
     >      >           }
     >      >           if (xcache) {
     >      > -            struct xc_entry *entry;
     >      > +            if (ofproto_try_ref(&ofproto->up)) {
     >      > +                struct xc_entry *entry;
     >      >
     >      > -            entry = xlate_cache_add_entry(xcache, XC_TABLE);
     >      > -            entry->table.ofproto = ofproto;
     >      > -            entry->table.id <http://table.id> <http://table.id
    <http://table.id>> = next_id;
     >      > -            entry->table.match = (rule != NULL);
     >      > +                entry = xlate_cache_add_entry(xcache, XC_TABLE);
     >      > +                entry->table.ofproto = ofproto;
     >      > +                entry->table.id <http://table.id>
    <http://table.id <http://table.id>> = next_id;
     >      > +                entry->table.match = (rule != NULL);
     >      > +            }
     >      >           }
     >      >           if (rule) {
     >      >               goto out;   /* Match. */
     >      > diff --git a/ofproto/ofproto-provider.h 
b/ofproto/ofproto-provider.h
     >      > index 14b909973..ed10b8c76 100644
     >      > --- a/ofproto/ofproto-provider.h
     >      > +++ b/ofproto/ofproto-provider.h
     >      > @@ -143,6 +143,8 @@ struct ofproto {
     >      >       /* Variable length mf_field mapping. Stores all configured
    variable
     >     length
     >      >        * meta-flow fields (struct mf_field) in a switch. */
     >      >       struct vl_mff_map vl_mff_map;
     >      > +    /* refcount to this ofproto, holds by 
rule/group/xlate_caches */
     >      > +    struct ovs_refcount refcount;
     >      >   };
     >      >
     >      >   void ofproto_init_tables(struct ofproto *, int n_tables);
     >      > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
     >      > index 56aeac720..10a22d9ec 100644
     >      > --- a/ofproto/ofproto.c
     >      > +++ b/ofproto/ofproto.c
     >      > @@ -549,6 +549,7 @@ ofproto_create(const char *datapath_name,
    const char
     >     *datapath_type,
     >      >
     >      >       ovs_mutex_init(&ofproto->vl_mff_map.mutex);
     >      >       cmap_init(&ofproto->vl_mff_map.cmap);
     >      > +    ovs_refcount_init(&ofproto->refcount);
     >      >
     >      >       error = ofproto->ofproto_class->construct(ofproto);
     >      >       if (error) {
     >      > @@ -1695,9 +1696,24 @@ ofproto_destroy__(struct ofproto *ofproto)
     >      >       ofproto->ofproto_class->dealloc(ofproto);
     >      >   }
     >      >
     >      > -/* Destroying rules is doubly deferred, must have 'ofproto'
    around for them.
     >      > - * - 1st we defer the removal of the rules from the classifier
     >      > - * - 2nd we defer the actual destruction of the rules. */
     >      > +/* We used to use defer function to wait for two grace periods
     >      > + * as we assume the rule that holds the ofproto pointer will
     >      > + * live at most two grace period. Howvever, we found at certain
     >      > + * cases, this assumption does not stand.
     >      > + *
     >      > + * destroying a rule may have to wait multiple grace periods:
     >      > + * remove_rules_postponed (one grace period)
     >      > + *       -> remove_rule_rcu
     >      > + *           -> remove_rule_rcu__
     >      > + *               -> ofproto_rule_unref -> ref count != 1
     >      > + *                   -> ... more grace periods.
     >      > + *                   -> rule_destroy_cb (> 2 grace periods)
     >      > + *                       -> free
     >      > + *
     >      > + * So we have to check the refcount for sure all the rules
     >      > + * have been destroyed.
     >      > + *
     >      > + */
     >      >   static void
     >      >   ofproto_destroy_defer__(struct ofproto *ofproto)
     >      >       OVS_EXCLUDED(ofproto_mutex)
     >      > @@ -1705,6 +1721,26 @@ ofproto_destroy_defer__(struct ofproto
    *ofproto)
     >      >       ovsrcu_postpone(ofproto_destroy__, ofproto);
     >      >   }
     >      >
     >      > +void
     >      > +ofproto_ref(struct ofproto *ofproto)
     >      > +{
     >      > +    ovs_refcount_ref(&ofproto->refcount);
     >      > +}
     >      > +
     >      > +bool
     >      > +ofproto_try_ref(struct ofproto *ofproto)
     >      > +{
     >      > +    return ovs_refcount_try_ref_rcu(&ofproto->refcount);
     >      > +}
     >      > +
     >      > +void
     >      > +ofproto_unref(struct ofproto *ofproto)
     >      > +{
     >      > +    if (ofproto && ovs_refcount_unref(&ofproto->refcount) == 1) {
     >      > +        ovsrcu_postpone(ofproto_destroy_defer__, ofproto);
     >      > +    }
     >      > +}
     >      > +
     >      >   void
     >      >   ofproto_destroy(struct ofproto *p, bool del)
     >      >       OVS_EXCLUDED(ofproto_mutex)
     >      > @@ -1736,8 +1772,7 @@ ofproto_destroy(struct ofproto *p, bool del)
     >      >       p->connmgr = NULL;
     >      >       ovs_mutex_unlock(&ofproto_mutex);
     >      >
     >      > -    /* Destroying rules is deferred, must have 'ofproto' around
    for them. */
     >      > -    ovsrcu_postpone(ofproto_destroy_defer__, p);
     >      > +    ofproto_unref(p);
     >      >   }
     >      >

> > /* Destroys the datapath with the respective 'name' and 'type'.With

     >     the Linux
     >      > @@ -2929,6 +2964,10 @@ ofproto_rule_destroy__(struct rule *rule)
     >      >       cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr));
     >      >       rule_actions_destroy(rule_get_actions(rule));
     >      >       ovs_mutex_destroy(&rule->mutex);
     >      > +    /* we need to call ofproto_unref first, and thanks to rcu,
    ofproto
     >     is alive
     >      > +     * otherwise, group is freed, group->ofproto is invalid
     >      > +     */
     >      > +    ofproto_unref(rule->ofproto);
     >      >       rule->ofproto->ofproto_class->rule_dealloc(rule);
     >      >   }
     >      >
     >      > @@ -3069,6 +3108,10 @@ group_destroy_cb(struct ofgroup *group)
     >      >                                                   &group->props));
     >      >       ofputil_bucket_list_destroy(CONST_CAST(struct ovs_list *,
     >      >                                              &group->buckets));
     >      > +    /* we need to call ofproto_unref first, and thanks to rcu,
    ofproto
     >     is alive
     >      > +     * otherwise, group is freed, group->ofproto is invalid
     >      > +     */
     >      > +    ofproto_unref(group->ofproto);
     >      >       group->ofproto->ofproto_class->group_dealloc(group);
     >      >   }
     >      >
     >      > @@ -5279,6 +5322,11 @@ ofproto_rule_create(struct ofproto 
*ofproto,
     >     struct cls_rule *cr,
     >      >           return OFPERR_OFPFMFC_UNKNOWN;
     >      >       }
     >      >
     >      > +    if (!ofproto_try_ref(ofproto)) {
     >      > +        cls_rule_destroy(cr);
     >      > +        return OFPERR_OFPFMFC_UNKNOWN;
     >      > +    }
     >      > +
     >      >       /* Initialize base state. */
     >      >       *CONST_CAST(struct ofproto **, &rule->ofproto) = ofproto;
     >      >       cls_rule_move(CONST_CAST(struct cls_rule *, &rule->cr), cr);
     >      > @@ -7345,6 +7393,10 @@ init_group(struct ofproto *ofproto, const
    struct
     >     ofputil_group_mod *gm,
     >      >           return OFPERR_OFPGMFC_OUT_OF_GROUPS;
     >      >       }
     >      >
     >      > +    if (!ofproto_try_ref(ofproto)) {
     >      > +        return OFPERR_OFPFMFC_UNKNOWN;
     >      > +    }
     >      > +
     >      >       *CONST_CAST(struct ofproto **, &(*ofgroup)->ofproto) = 
ofproto;
     >      >       *CONST_CAST(uint32_t *, &((*ofgroup)->group_id)) = 
gm->group_id;
     >      >       *CONST_CAST(enum ofp11_group_type *, &(*ofgroup)->type) =
    gm->type;
     >      > diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
     >      > index b0262da2d..4e15167ab 100644
     >      > --- a/ofproto/ofproto.h
     >      > +++ b/ofproto/ofproto.h
     >      > @@ -563,6 +563,10 @@ int ofproto_port_get_cfm_status(const struct
    ofproto *,
     >      >   enum ofputil_table_miss ofproto_table_get_miss_config(const 
struct
     >     ofproto *,
     >      >                                                         uint8_t
    table_id);
     >      >
     >      > +void ofproto_ref(struct ofproto *);
     >      > +void ofproto_unref(struct ofproto *);
     >      > +bool ofproto_try_ref(struct ofproto *);
     >      > +
     >      >   #ifdef  __cplusplus
     >      >   }
     >      >   #endif
     >
     >
     >     Hi Peng,
     >
     >     Do we still to protect the reference that "struct upcall" holds? i.e:
     >
    
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
    
<https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>

> <https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ <https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/>>

     >
     >
     > there are two code path which leads to the call to "upcall_receive", the
     > userspace uses upcall_cb and the kernel
     > datapath uses recv_upcalls.  In both places, the upcall struct is an
     > on-stack-allocated value.
     >
     > For userspace, the upcall_cb is called directly from pmd thread, which I
    guess
     > inside the upcall_cb, no places will sleep or call  rcu_wait, meaning
     > that this pointer to the ofproto will not be used for too long, i.e.
    longer than
     > one grace period.
     >
     > The fact of upcall_cb calling from pmd thread can be used to prove that
     > upcall_receive contains no rcu_wait, so also for the kernel datapath,
     > I guess no need to protect the upcall ref in "struct upcall".

    Maybe I'm missing something, but what I see is that the main thread can call
    ofproto_destroy at any time which will call ofproto_dpif->destruct().
    The ofproto_dpif destruct() will run rcu_synchronize() which will block 
until
    all handler threads have finished processing a batch.

    But if a hanlder thread wakes up before the main thread does, it could pick 
up
    the "ofproto_dpif" reference again from "all_ofproto_dpifs_by_uuid" and use 
it
    while the main thread continues destruct()ing it. The result could very 
well be
    a ovs crash with the stack trace reported by Yunjian, i.e:

If I understand correctly your problem, I think ovs will not crash afterapplying Yunjian's patch.

Right. So we agree we need Yunjian as well. I thought you were suggesting todrop Yunjian's and only apply yours. I think we need both.

if a handler wakes up and gets the "ofproto" ref before the main thread removesthe "ofproto" ref from"all_ofproto_dpifs_by_uuid", it can use it, but since the handler is using theofproto, itwill not enter the quiescent state, and if at the same time the main thread isdestroying ofproto, asit will run rcu_synchronize(), it will wait until the handler finishes usingofproto, then destroying it.
    (gdb) bt
        hmap_next (hmap.h:398)
        seq_wake_waiters (seq.c:326)
        seq_change_protected (seq.c:134)
        seq_change (seq.c:144)
        ofproto_dpif_send_async_msg (ofproto_dpif.c:263)
        process_upcall (ofproto_dpif_upcall.c:1782)
        recv_upcalls (ofproto_dpif_upcall.c:1026)
        udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945)
        ovsthread_wrapper (ovs_thread.c:734)


    Maybe I've misinterpreted you're reference to Yunjian's patch so just 
wanted to
    make sure.




    Thanks
--Adrián Moreno
--
hepeng


--
Adrián Moreno

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [ovs-dev v4] ofproto: add refcount to ofproto to fix ofproto use-after-free

Reply via email to