On 23 June 2017 at 18:57, Darrell Ball <[email protected]> wrote:
>
>
> On 6/23/17, 4:08 PM, "[email protected] on behalf of Joe 
> Stringer" <[email protected] on behalf of [email protected]> wrote:
>
>     On 17 June 2017 at 15:53, Darrell Ball <[email protected]> wrote:
>     > ALG infra and FTP (both V4 and V6) support is added to the userspace
>     > datapath.  Also, NAT support is included.
>     >
>     > Signed-off-by: Darrell Ball <[email protected]>
>     > ---
>
>     Hi Darrell, thanks for the patch.
>
>     I wasn't able to test this out, because my build and test environments
>     depend on sparse correctly running and it wouldn't compile. Please
>     consider setting it up, or using Travis-CI to validate builds before
>     v3.
>
> I see Travis found a sparse warning; strange, I always look for these
> and still see none, I even changed the line you saw flagged
>     > +    return v4_addr >> (index * 8) & 0xff;
> and still no warning :-(. I usually even get extra spurious warnings.
> However, from simple inspection right now, I would expect a “restricted type”
> warning; it is not a real problem, but, of course I’ll fix it up

OK, thanks!

>     I didn't do a thorough review, but I scanned through and pointed out a
>     couple of parts I have questions about.
>
>     >  lib/conntrack-private.h |   17 +
>     >  lib/conntrack.c         | 1016 
> +++++++++++++++++++++++++++++++++++++++++++----
>     >  lib/conntrack.h         |    4 +-
>     >  3 files changed, 957 insertions(+), 80 deletions(-)
>     >
>     > diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
>     > index 55084d3..993af33 100644
>     > --- a/lib/conntrack-private.h
>     > +++ b/lib/conntrack-private.h
>     > @@ -62,17 +62,34 @@ struct nat_conn_key_node {
>     >      struct conn_key value;
>     >  };
>     >
>     > +struct alg_exp_node {
>     > +    struct hmap_node node;
>     > +    struct ovs_list exp_node;
>     > +    long long expiration;
>     > +    struct conn_key key;
>     > +    struct conn_key master_key;
>     > +    struct ct_addr alg_nat_repl_addr;
>     > +    ovs_u128 master_label;
>     > +    uint32_t master_mark;
>     > +};
>     > +
>     >  struct conn {
>     >      struct conn_key key;
>     >      struct conn_key rev_key;
>     > +    /* Only used for orig_tuple support. */
>     > +    struct conn_key master_key;
>
>     I guess no-one's looked at this structure from a cacheline
>     perspective, but if we think it's worth optimizing then perhaps a
>     followup patch should rearrange these. You might consider starting
>     here by just placing all of the ALG-related fields together at the
>     end.
>
> I intend some optimizations later, but it is outside the scope of this patch.
> I placed the fields presently based on similar function for one field - 
> master_conn
> and the other fields based on padding considerations.
> I leave as is for the time being and revisit later.

Sure, sounds good.

>     > @@ -554,34 +681,50 @@ conn_not_found(struct conntrack *ct, struct 
> dp_packet *pkt,
>     >          nc->rev_key = nc->key;
>     >          conn_key_reverse(&nc->rev_key);
>     >
>     > +        if (helper) {
>     > +            nc->alg = xstrdup(helper);
>     > +        }
>     > +
>     > +        if (alg_exp) {
>     > +            nc->alg_related = true;
>     > +            nc->mark = alg_exp->master_mark;
>     > +            nc->label = alg_exp->master_label;
>     > +            nc->master_key = alg_exp->master_key;
>     > +        }
>     > +
>     >          if (nat_action_info) {
>     >              nc->nat_info = xmemdup(nat_action_info, sizeof 
> *nc->nat_info);
>     > -            ct_rwlock_wrlock(&ct->nat_resources_lock);
>     > -
>     > -            bool nat_res = nat_select_range_tuple(ct, nc,
>     > -                                                  
> conn_for_un_nat_copy);
>     >
>     > -            if (!nat_res) {
>     > -                free(nc->nat_info);
>     > -                nc->nat_info = NULL;
>     > -                free (nc);
>     > -                ct_rwlock_unlock(&ct->nat_resources_lock);
>     > -                return NULL;
>     > -            }
>     > +            if (alg_exp) {
>     > +                nc->rev_key.src.addr = alg_nat_repl_addr;
>     > +                nc->nat_info->nat_action = NAT_ACTION_DST;
>     > +                *conn_for_un_nat_copy = *nc;
>     > +            } else {
>     > +                ct_rwlock_wrlock(&ct->resources_lock);
>     > +                bool nat_res = nat_select_range_tuple(
>     > +                                   ct, nc, conn_for_un_nat_copy);
>     > +
>     > +                if (!nat_res) {
>     > +                    free(nc->nat_info);
>     > +                    nc->nat_info = NULL;
>     > +                    free (nc);
>
>     I think that nc->alg may be leaked here? any reason it doesn't use
>     delete_conn()?
>
> Good
> Yes, alg will leak in this rare error case and yes, delete_conn() should be 
> used
> here, as everywhere.

OK.

>     > +                    ct_rwlock_unlock(&ct->resources_lock);
>     > +                    return NULL;
>     > +                }
>     >
>     > -            if (conn_for_un_nat_copy &&
>     > -                nc->conn_type == CT_CONN_TYPE_DEFAULT) {
>     >                  *nc = *conn_for_un_nat_copy;
>
>     Perhaps nc->alg and/or nc->nat_info may be leaked here?
>
> No, the un_nat conn has no such allocations, so there is nothing to leak.

I don't mean conn_for_un_nat_copy, I mean *nc which could have had an
xstrdup()'d 'alg' attached. Won't this overwrite all fields in 'nc'?

>     <snip>
>
>     > @@ -923,6 +1126,28 @@ sweep_bucket(struct conntrack *ct, struct 
> conntrack_bucket *ctb, long long now,
>     >          }
>     >      }
>     >
>     > +#define MAX_ALG_EXP_TO_EXPIRE 1000
>     > +    size_t alg_exp_count = hmap_count(&ct->alg_expectations);
>     > +    /* XXX: revisit this. */
>     > +    size_t max_to_expire =
>     > +        MAX(alg_exp_count/10, MAX_ALG_EXP_TO_EXPIRE);
>
>     I would have thought MAX_ALG_EXP_TO_EXPIRE would be the upper bound,
>     not the lower bound.
>
> If the number of expectations is not large (common case), then this number 
> represents the lower bound.
> If the number of expectations is large, then it represents an upper bound.

alg_exp_count=10, then MAX(1, 1000) -> expire up to 1000 entries.

alg_exp_count=100000, then MAX(10000,1000) -> expire up to 10000 entries.

So, MAX_ALG_EXP_TO_EXPIRE isn't a maximum?

In practice it seems unlikely to end up in a situation with so many
expectations, but still I found this a bit confusing.

>     <snip>
>
>     > @@ -1819,6 +2045,88 @@ nat_conn_keys_remove(struct hmap *nat_conn_keys, 
> const struct conn_key *key,
>     >      }
>     >  }
>     >
>     > +static struct alg_exp_node *
>     > +expectation_lookup(struct hmap *alg_expectations,
>     > +                   const struct conn_key *key,
>     > +                   uint32_t basis)
>
>     OVS_REQ_RDLOCK(...) ?
>
> Yep, missed this one

There might be others, worth keeping an eye out.

>     > +{
>     > +    struct conn_key check_key = *key;
>     > +    check_key.src.port = ALG_WC_SRC_PORT;
>     > +    struct alg_exp_node *alg_exp_node;
>     > +    uint32_t alg_exp_conn_key_hash = conn_key_hash(&check_key, basis);
>     > +    HMAP_FOR_EACH_WITH_HASH (alg_exp_node, node,
>     > +                             alg_exp_conn_key_hash,
>     > +                             alg_expectations) {
>     > +        if (!memcmp(&alg_exp_node->key, &check_key,
>     > +                    sizeof alg_exp_node->key)) {
>     > +            return alg_exp_node;
>     > +        }
>     > +    }
>     > +
>     > +    return NULL;
>     > +}
>     > +
>     > +static void
>     > +expectation_create(struct conntrack *ct,
>     > +                   ovs_be16 dst_port,
>     > +                   const long long now,
>     > +                   enum ct_alg_mode mode,
>     > +                   const struct conn *master_conn)
>     > +{
>     > +    struct ct_addr src_addr;
>     > +    struct ct_addr dst_addr;
>     > +    struct ct_addr alg_nat_repl_addr;
>     > +
>     > +    switch (mode) {
>     > +    case CT_FTP_MODE_ACTIVE:
>     > +        src_addr = master_conn->rev_key.src.addr;
>     > +        dst_addr = master_conn->rev_key.dst.addr;
>     > +        alg_nat_repl_addr = master_conn->key.src.addr;
>     > +        break;
>     > +    case CT_FTP_MODE_PASSIVE:
>     > +        src_addr = master_conn->key.src.addr;
>     > +        dst_addr = master_conn->key.dst.addr;
>     > +        alg_nat_repl_addr = master_conn->rev_key.src.addr;
>     > +        break;
>     > +    default:
>     > +        OVS_NOT_REACHED();
>     > +    }
>     > +
>     > +    struct alg_exp_node *alg_exp_node =
>     > +        xzalloc(sizeof *alg_exp_node);
>     > +    alg_exp_node->key.dl_type = master_conn->key.dl_type;
>     > +    alg_exp_node->key.nw_proto = master_conn->key.nw_proto;
>     > +    alg_exp_node->key.zone = master_conn->key.zone;
>     > +    alg_exp_node->key.src.addr = src_addr;
>     > +    alg_exp_node->key.dst.addr = dst_addr;
>     > +    alg_exp_node->key.src.port = ALG_WC_SRC_PORT;
>     > +    alg_exp_node->key.dst.port = dst_port;
>     > +    alg_exp_node->master_mark = master_conn->mark;
>     > +    alg_exp_node->master_label = master_conn->label;
>     > +    alg_exp_node->master_key = master_conn->key;
>     > +    ct_rwlock_rdlock(&ct->resources_lock);
>     > +    struct alg_exp_node *alg_exp = expectation_lookup(
>     > +        &ct->alg_expectations, &alg_exp_node->key, ct->hash_basis);
>     > +    ct_rwlock_unlock(&ct->resources_lock);
>     > +    if (alg_exp) {
>     > +        free(alg_exp_node);
>     > +        return;
>     > +    }
>     > +
>     > +    alg_exp_node->expiration = now + CT_ALG_EXP_TIMEOUT;
>     > +    alg_exp_node->alg_nat_repl_addr = alg_nat_repl_addr;
>     > +    uint32_t alg_exp_conn_key_hash =
>     > +        conn_key_hash(&alg_exp_node->key,
>     > +                      ct->hash_basis);
>     > +    ct_rwlock_wrlock(&ct->resources_lock);
>     > +    hmap_insert(&ct->alg_expectations,
>     > +                &alg_exp_node->node,
>     > +                alg_exp_conn_key_hash);
>     > +
>     > +    alg_exp_init_expiration(ct, alg_exp_node, now);
>     > +    ct_rwlock_unlock(&ct->resources_lock);
>
>     Is there a race where another thread inserts an equivalent expectation
>     between grabbing the readlock above and the writelock here?
>
> This can’t happen practically because of how packets are assigned to threads
> and how the protocols work.
> It is theoretically possible with a very sophisticated exploit; but this 
> would also be a weak exploit.
> I had some other reasons to combine the critical sections however: the usual 
> case is that the write
> lock will be needed anyways; might as well take it even though this code only 
> handles a tiny
> subset of packets; also, readability is better.

OK.

>     >  int
>     > @@ -2020,7 +2333,7 @@ conntrack_flush(struct conntrack *ct, const 
> uint16_t *zone)
>     >          struct conn *conn, *next;
>     >
>     >          ct_lock_lock(&ct->buckets[i].lock);
>     > -        HMAP_FOR_EACH_SAFE(conn, next, node, 
> &ct->buckets[i].connections) {
>     > +        HMAP_FOR_EACH_SAFE (conn, next, node, 
> &ct->buckets[i].connections) {
>     >              if ((!zone || *zone == conn->key.zone) &&
>     >                  (conn->conn_type == CT_CONN_TYPE_DEFAULT)) {
>     >                  conn_clean(ct, conn, &ct->buckets[i]);
>     > @@ -2028,5 +2341,550 @@ conntrack_flush(struct conntrack *ct, const 
> uint16_t *zone)
>     >          }
>     >          ct_lock_unlock(&ct->buckets[i].lock);
>     >      }
>     > +
>     > +    ct_rwlock_wrlock(&ct->resources_lock);
>
>     Should this be nested within the bucket lock?
>
> No, expectations are protected by the resource lock. You
> later correctly pointed out that I should update the comment about
> the locking – I will do that.

Right. So long as there's no chance for something to acquire
ct_resources_lock then subsequently try to grab a bucket lock.

Thanks.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to