Re: [ovs-dev] [PATCH 2/2] ofproto: Add ref counting for variable length mf_fields.

Yi-Hung Wei Mon, 13 Mar 2017 11:25:45 -0700

On Fri, Mar 10, 2017 at 11:34 AM, Joe Stringer <[email protected]> wrote:

> On 9 March 2017 at 10:22, Yi-Hung Wei <[email protected]> wrote:
> > On Wed, Mar 8, 2017 at 11:40 AM, Joe Stringer <[email protected]> wrote:
> >> > @@ -145,6 +150,7 @@ learn_execute(const struct ofpact_learn *learn,
> >> > const struct flow *flow,
> >> >                           spec->n_bits);
> >> >              bitwise_one(ofpact_set_field_mask(sf),
> >> > spec->dst.field->n_bytes,
> >> >                          spec->dst.ofs, spec->n_bits);
> >> > +            mf_vl_mff_set_tlv_bitmap(spec->dst.field,
> >> > &fm->ofpacts_tlv_bitmap);
> >> >              break;
> >> >
> >> >          case NX_LEARN_DST_OUTPUT:
> >>
> >> I think we should also set the tlv bitmap in this LEARN_DST_OUTPUT
> >> case? IIUC this uses an OXM field to determine the output port. I
> >> believe it's possible for someone to use a TLV to specify an output
> >> action using LEARN - so it would set the bitmap in the same way that
> >> the NX_LEARN_DST_LOAD case does.
> >
> > Yes, for output action in learn, we can specify a TLV to be the SRC
> field,
> > but the DST field will eventually be an immediate value that we derive
> from
> > the SRC field. So the ref count for a TLV in LEARN_OUTPUT is incresed
> when
> > we decode the learn flow indecode_NXAST_RAW_LEARN(), but not when we
> create
> > the "learned" flow in learn_execute().
>
> If I follow, you're saying that the learn flow (original) takes a
> reference on the field, so when we create the "learned" flow in
> learn_execute, we don't take a reference on the field. But could the
> original learn flow be deleted and leave the learned flow behind
> without a reference?
>
Yes, this patch does account the ref counting for learned flows. There are
four types of mf_fields usage in the learn action,  as in learn_execute(),
they are NX_LEARN_SRC_FIELD, NX_LEARN_DST_MATCH, NX_LEARN_DST_LOAD, and
NX_LEARN_DST_OUTPUT. In the "learned" flow, we need to account the ref
counting of vl_mff_field onDST_MATCH and DST_LOAD, since we need to read
values from vl_mf_field in the learned flow. For SRC_FIELD, and DST_OUTPUT,
it is translated into immediate value when we create the learned flow.



>
> >>
> >>
> >> > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> >> > index 40704e628aaa..7807b3055f5b 100644
> >> > --- a/lib/meta-flow.c
> >> > +++ b/lib/meta-flow.c
> >> > @@ -27,6 +27,8 @@
> >> >  #include "openvswitch/dynamic-string.h"
> >> >  #include "nx-match.h"
> >> >  #include "openvswitch/ofp-util.h"
> >> > +#include "ofproto/ofproto-provider.h"
> >> > +#include "ovs-atomic.h"
> >> >  #include "ovs-rcu.h"
> >> >  #include "ovs-thread.h"
> >> >  #include "packets.h"
> >> > @@ -2646,6 +2648,7 @@ field_array_set(enum mf_field_id id, const union
> >> > mf_value *value,
> >> >   * struct vl_mff_map.*/
> >> >  struct vl_mf_field {
> >> >      struct mf_field mf;
> >> > +    struct ovs_refcount ref_cnt;
> >> >      struct cmap_node cmap_node; /* In ofproto->vl_mff_map->cmap. */
> >> >  };
> >> >
> >> > @@ -2655,17 +2658,25 @@ mf_field_hash(uint32_t key)
> >> >      return hash_int(key, 0);
> >> >  }
> >> >
> >> > -void
> >> > -mf_vl_mff_map_clear(struct vl_mff_map *vl_mff_map)
> >> > +enum ofperr
> >> > +mf_vl_mff_map_clear(struct vl_mff_map *vl_mff_map, bool clear)
> >> >      OVS_REQUIRES(vl_mff_map->mutex)
> >> >  {
> >> >      struct vl_mf_field *vmf;
> >> >
> >> >      CMAP_FOR_EACH (vmf, cmap_node, &vl_mff_map->cmap) {
> >> > -        cmap_remove(&vl_mff_map->cmap, &vmf->cmap_node,
> >> > -                    mf_field_hash(vmf->mf.id));
> >> > -        ovsrcu_postpone(free, vmf);
> >> > +        if (clear) {
> >> > +            cmap_remove(&vl_mff_map->cmap, &vmf->cmap_node,
> >> > +                        mf_field_hash(vmf->mf.id));
> >> > +            ovsrcu_postpone(free, vmf);
> >> > +        } else {
> >> > +            if (ovs_refcount_read(&vmf->ref_cnt) != 1) {
> >> > +                return OFPERR_NXTTMFC_INVALID_TLV_DEL;
> >> > +            }
> >> > +        }
> >> >      }
> >> > +
> >> > +    return 0;
> >> >  }
> >>
> >> I found this a bit confusing. mf_vl_mff_map_clear() has a boolean,
> >> also named 'clear' that determines whether it will actually clear or
> >> not. If you call a function like "clear(clear=false)", what would it
> >> intuitively do?
> >>
> >> I see two calling conventions - first, for TLV clear command from
> >> controller:
> >> error = mf_vl_mff_map_clear(vl_mff_map, false);
> >> if (error)
> >>     return error;
> >> error = mf_vl_mff_map_clear(vl_mff_map, true);
> >>
> >> Second, from ofproto cleanup code:
> >> mf_vl_mff_map_clear(vl_mff_map, true);
> >>
> >> I suggest that this would be easier to understand if the function were
> >> structured with the the bool to be 'force':
> >>
> >> enum ofperr
> >> mf_vl_mff_map_clear(struct vl_mff_map *vl_mff_map, bool force)
> >>    OVS_REQUIRES(vl_mff_map->mutex)
> >> {
> >>    struct vl_mf_field *vmf;
> >>
> >>     /* First, check whether any flows still refer to the current TLV
> >> map. */
> >>    if (!force) {
> >>        CMAP_FOR_EACH (vmf, cmap_node, &vl_mff_map->cmap) {
> >>            if (ovs_refcount_read(&vmf->ref_cnt) != 1) {
> >>                return OFPERR_NXTTMFC_INVALID_TLV_DEL;
> >>            }
> >>        }
> >>    }
> >>
> >>    CMAP_FOR_EACH (vmf, cmap_node, &vl_mff_map->cmap) {
> >>        cmap_remove(&vl_mff_map->cmap, &vmf->cmap_node,
> >>                    mf_field_hash(vmf->mf.id));
> >>        ovsrcu_postpone(free, vmf);
> >>    }
> >>
> >>    return 0;
> >> }
> >
> > Thanks for suggestion, it is much clear after modification.
> >
> >>
> >>
> >> You may also consider refactoring the ovsrcu_postpone(free, vmf) into
> >> a separate, vmf_delete() function which looks something like this:
> >>
> >> static void vmf_delete(struct vl_mf_field *vmf)
> >> {
> >>    if (ovs_refcount_unref(&vmf->ref_cnt) == 1) {
> >>         /* Postpone as this function is typically called immediately
> >> after removing from cmap. */
> >>        ovsrcu_postpone(free, vmf);
> >>    } else {
> >>        VLOG_WARN_RL(&rl, "Attempted to delete VMF %s but refcount is
> >> nonzero!",
> >>                  vmf->mf.name);
> >>    }
> >> }
> >>
> >> This could be reused from the other delete locations as well.
> >
> > I do not factor out the ovsrcu_postpone(), becuase the only case that we
> > force to clear the vl_mff_map is when we destory ofproto. Otherwise, we
> are
> > not going to remove vmf from vl_mff_map if ref_count > 1.
>
> I guess my point was partly that both mf_vl_mff_map_del() and
> mf_vl_mff_map_clear() can potentially RCU-defer freeing of the VMF,
> and if we push that out to a separate function here that ensures that
> the refcount is what we expect it to be, then it's more defensively
> coded against potential misuse - attempting to free when it's still
> being used, for instance.
>
Thanks, I applied this to v3.


>
> >> > +static void
> >> > +mf_vl_mff_ref_cnt_mod(const struct vl_mff_map *map, uint64_t
> >> > tlv_bitmap,
> >> > +                      bool ref)
> >> > +{
> >> > +    struct vl_mf_field *vmf;
> >> > +    int i;
> >> > +
> >> > +    if (map) {
> >> > +        ULLONG_FOR_EACH_1 (i, tlv_bitmap) {
> >> > +            vmf = mf_get_vl_mff__(i + MFF_TUN_METADATA0, map);
> >> > +            if (vmf) {
> >> > +                if (ref) {
> >> > +                    ovs_refcount_ref(&vmf->ref_cnt);
> >> > +                } else {
> >> > +                    ovs_refcount_unref(&vmf->ref_cnt);
> >>
> >> We should warn if we're trying to unref and this is the final ref,
> >> because then we're not freeing the object.
> >
> > It could be the case that no active flow is currently use this tlv, but
> the
> > controller is going to use this tlv entry for some new flows. In this
> case,
> > it is not necessary to free vmf since it will be useful.
>
> Typically when refcounts are used, the last one to decrement it -> 0
> is responsible for freeing it. Also, if it goes to 0 and something is
> able to find the element, then it probably shouldn't increment the
> refcount because most likely the piece of code that decremented it to
> 0 also rcu-deferred the free. It's easier to reason about how we
> ensure that the memory a) valid to access and b) is eventually freed
> if we stick to this idiom.
>
> For most cases, this is hit from the main thread which handles
> OpenFlow messages; if it's decoding a message and acquiring references
> to fields, then by definition it can't also be removing the fields
> that we're trying to reference, so I don't think this should ever
> decrement the refcount -> 0; therefore, if we warn, we could easily
> see that we have some bad assumption in how we understand this code.
>
> I will note though, I also see some learn_execute() callers for this,
> which means that another thread is getting involved - perhaps an
> upcall handler or perhaps a revalidator. This makes me a little more
> nervous about ensuring that references to the fields are being
> correctly counted and whether we may be trying to retain access to a
> field which was recently freed. Most likely it's fine, since that
> other thread would have to find a flow which references these fields
> to be able to take another reference (eg for the "learned" flow); but
> warning something like this would be more explicit in this function:
>
> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> index e844008f6294..bef5aad768a3 100644
> --- a/lib/meta-flow.c
> +++ b/lib/meta-flow.c
> @@ -2836,9 +2836,13 @@ mf_vl_mff_ref_cnt_mod(const struct vl_mff_map
> *map, uint64_t tlv_bitmap,
>             vmf = mf_get_vl_mff__(i + MFF_TUN_METADATA0, map);
>             if (vmf) {
>                 if (ref) {
> -                    ovs_refcount_ref(&vmf->ref_cnt);
> +                    if (ovs_refcount_ref(&vmf->ref_cnt) == 0) {
> +                        VLOG_WARN("Taking reference on freed VMF %d", i);
> +                    }
>                 } else {
> -                    ovs_refcount_unref(&vmf->ref_cnt);
> +                    if (ovs_refcount_unref(&vmf->ref_cnt) == 1) {
> +                        VLOG_WARN("Dereferencing VMF %d without free", i);
> +                    }
>                 }
>             } else {
>                 VLOG_WARN("Invalid TLV index %d.", i);
>
> Maybe it's over the top. But it would be nicer to know through a log
> message than just have some crazy bug show up later on and try to
> figure out that this is what is happening.
>
Thanks, as you pointed out on in the following e-mail, since
ovs_refcount_ref() already asserts this, we are not going to vlog here.


> >> > @@ -1164,6 +1169,8 @@ decode_NXAST_RAW_OUTPUT_REG2(const struct
> >> > nx_action_output_reg2 *naor,
> >> >      if (error) {
> >> >          return error;
> >> >      }
> >> > +    mf_vl_mff_set_tlv_bitmap(output_reg->src.field, tlv_bitmap);
> >> > +
> >>
> >> I wonder if it's cleaner to fold this into nx_pull_header().
> >>
> >> It's not so obvious why the ofpacts decode functions in this file
> >> inconsistently need to call the mf_check_vl_mff() and/or
> >> mf_vl_mff_set_tlv_bitmap(), which makes it a little more difficult to
> >> verify (and easier to mess up if new actions are added). It'd be nice
> >> to make this more consistent.
> >
> >
> > Thanks, it is hard to follow as in v1. Actually, we use nx_pull_header(),
> > nx_pull_entry(), and mf_from_nxm_header() to get a mf_field in
> ofp-action.c
> > and those three functions are heavily use in other part of the code base
> > that may not need to update the bitmap. So instead to fold
> > mf_vl_mff_set_tlv_bitmap() directly into the three functions, three
> wrapper
> > functions mf_vl_mff_nx_pull_header(), mf_vl_mff_nx_pull_entry(), and
> > mf_vl_mff_mf_from_nxm_header() are created that handle mf_field operation
> > and set the tlv_bitmap, and it should be more easier to follow now.
>
> OK, fair enough, thanks.
>
> >> > @@ -1603,6 +1599,12 @@ ofproto_destroy__(struct ofproto *ofproto)
> >> >      }
> >> >      free(ofproto->tables);
> >> >
> >> > +    ovs_mutex_lock(&ofproto->vl_mff_map.mutex);
> >> > +    mf_vl_mff_map_clear(&ofproto->vl_mff_map, true);
> >> > +    ovs_mutex_unlock(&ofproto->vl_mff_map.mutex);
> >> > +    cmap_destroy(&ofproto->vl_mff_map.cmap);
> >> > +    ovs_mutex_destroy(&ofproto->vl_mff_map.mutex);
> >> > +
> >> >      ovs_assert(hindex_is_empty(&ofproto->cookies));
> >> >      hindex_destroy(&ofproto->cookies);
> >> >
> >>
> >> Is there a reason for the above two hunks to shift this logic later?
> >> Should this be in a separate patch to explain why, or could it be
> >> dropped?
> >>
> > Cause I feel more comfortable to remove the vl_mf_field after all the
> rules
> > are deleted. In practice, it is invoked when ofproto is destroyed, so it
> > probabaly does not matter?
>
> I think it should be another patch to move it, and describe this
> concern in the commit message.
>
Thanks, I moved it to anther patch in v3.


>
> >> > @@ -7784,14 +7802,15 @@ handle_tlv_table_mod(struct ofconn *ofconn,
> >> > const struct ofp_header *oh)
> >> >      old_tab = ovsrcu_get_protected(struct tun_table *,
> >> > &ofproto->metadata_tab);
> >> >      error = tun_metadata_table_mod(&ttm, old_tab, &new_tab);
> >> >      if (!error) {
> >> > -        ovsrcu_set(&ofproto->metadata_tab, new_tab);
> >> > -        tun_metadata_postpone_free(old_tab);
> >> > -
> >> >          ovs_mutex_lock(&ofproto->vl_mff_map.mutex);
> >> >          error =
> >> > mf_vl_mff_map_mod_from_tun_metadata(&ofproto->vl_mff_map,
> >> >                                                      &ttm);
> >> >          ovs_mutex_unlock(&ofproto->vl_mff_map.mutex);
> >> >      }
> >> > +    if (!error) {
> >> > +        ovsrcu_set(&ofproto->metadata_tab, new_tab);
> >> > +        tun_metadata_postpone_free(old_tab);
> >> > +    }
> >>
> >> Maybe I didn't quite dig deep enough, but what's the implications of
> >> moving this? Could it fix another bug, should it be a separate patch?
> >
> >
> > I'm not sure if it be a separate patch? What this change is that we can
> only
> > modify update the TLV mapping table if all the table modification
> operations
> > are valid. May it makes more sense like this
> >
> >     if (!error) {
> >         ovs_mutex_lock(&ofproto->vl_mff_map.mutex);
> >         error = mf_vl_mff_map_mod_from_tun_
> metadata(&ofproto->vl_mff_map,
> >                                                     &ttm);
> >         ovs_mutex_unlock(&ofproto->vl_mff_map.mutex);
> >         if (!error) {
> >             ovsrcu_set(&ofproto->metadata_tab, new_tab);
> >             tun_metadata_postpone_free(old_tab);
> >         }
> >     }
>
> LGTM.
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH 2/2] ofproto: Add ref counting for variable length mf_fields.

Reply via email to