On 6/26/17, 10:22 AM, "Joe Stringer" <[email protected]> wrote:

    On 23 June 2017 at 18:57, Darrell Ball <[email protected]> wrote:
    >
    >
    > On 6/23/17, 4:08 PM, "[email protected] on behalf of Joe 
Stringer" <[email protected] on behalf of [email protected]> wrote:
    >
    >     On 17 June 2017 at 15:53, Darrell Ball <[email protected]> wrote:
    >     > ALG infra and FTP (both V4 and V6) support is added to the userspace
    >     > datapath.  Also, NAT support is included.
    >     >
    >     > Signed-off-by: Darrell Ball <[email protected]>
    >     > ---
    >
    >     Hi Darrell, thanks for the patch.
    >
    >     I wasn't able to test this out, because my build and test environments
    >     depend on sparse correctly running and it wouldn't compile. Please
    >     consider setting it up, or using Travis-CI to validate builds before
    >     v3.
    >
    > I see Travis found a sparse warning; strange, I always look for these
    > and still see none, I even changed the line you saw flagged
    >     > +    return v4_addr >> (index * 8) & 0xff;
    > and still no warning :-(. I usually even get extra spurious warnings.
    > However, from simple inspection right now, I would expect a “restricted 
type”
    > warning; it is not a real problem, but, of course I’ll fix it up
    
    OK, thanks!
    
    >     I didn't do a thorough review, but I scanned through and pointed out a
    >     couple of parts I have questions about.
    >
    >     >  lib/conntrack-private.h |   17 +
    >     >  lib/conntrack.c         | 1016 
+++++++++++++++++++++++++++++++++++++++++++----
    >     >  lib/conntrack.h         |    4 +-
    >     >  3 files changed, 957 insertions(+), 80 deletions(-)
    >     >
    >     > diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
    >     > index 55084d3..993af33 100644
    >     > --- a/lib/conntrack-private.h
    >     > +++ b/lib/conntrack-private.h
    >     > @@ -62,17 +62,34 @@ struct nat_conn_key_node {
    >     >      struct conn_key value;
    >     >  };
    >     >
    >     > +struct alg_exp_node {
    >     > +    struct hmap_node node;
    >     > +    struct ovs_list exp_node;
    >     > +    long long expiration;
    >     > +    struct conn_key key;
    >     > +    struct conn_key master_key;
    >     > +    struct ct_addr alg_nat_repl_addr;
    >     > +    ovs_u128 master_label;
    >     > +    uint32_t master_mark;
    >     > +};
    >     > +
    >     >  struct conn {
    >     >      struct conn_key key;
    >     >      struct conn_key rev_key;
    >     > +    /* Only used for orig_tuple support. */
    >     > +    struct conn_key master_key;
    >
    >     I guess no-one's looked at this structure from a cacheline
    >     perspective, but if we think it's worth optimizing then perhaps a
    >     followup patch should rearrange these. You might consider starting
    >     here by just placing all of the ALG-related fields together at the
    >     end.
    >
    > I intend some optimizations later, but it is outside the scope of this 
patch.
    > I placed the fields presently based on similar function for one field - 
master_conn
    > and the other fields based on padding considerations.
    > I leave as is for the time being and revisit later.
    
    Sure, sounds good.
    
    >     > @@ -554,34 +681,50 @@ conn_not_found(struct conntrack *ct, struct 
dp_packet *pkt,
    >     >          nc->rev_key = nc->key;
    >     >          conn_key_reverse(&nc->rev_key);
    >     >
    >     > +        if (helper) {
    >     > +            nc->alg = xstrdup(helper);
    >     > +        }
    >     > +
    >     > +        if (alg_exp) {
    >     > +            nc->alg_related = true;
    >     > +            nc->mark = alg_exp->master_mark;
    >     > +            nc->label = alg_exp->master_label;
    >     > +            nc->master_key = alg_exp->master_key;
    >     > +        }
    >     > +
    >     >          if (nat_action_info) {
    >     >              nc->nat_info = xmemdup(nat_action_info, sizeof 
*nc->nat_info);
    >     > -            ct_rwlock_wrlock(&ct->nat_resources_lock);
    >     > -
    >     > -            bool nat_res = nat_select_range_tuple(ct, nc,
    >     > -                                                  
conn_for_un_nat_copy);
    >     >
    >     > -            if (!nat_res) {
    >     > -                free(nc->nat_info);
    >     > -                nc->nat_info = NULL;
    >     > -                free (nc);
    >     > -                ct_rwlock_unlock(&ct->nat_resources_lock);
    >     > -                return NULL;
    >     > -            }
    >     > +            if (alg_exp) {
    >     > +                nc->rev_key.src.addr = alg_nat_repl_addr;
    >     > +                nc->nat_info->nat_action = NAT_ACTION_DST;
    >     > +                *conn_for_un_nat_copy = *nc;
    >     > +            } else {
    >     > +                ct_rwlock_wrlock(&ct->resources_lock);
    >     > +                bool nat_res = nat_select_range_tuple(
    >     > +                                   ct, nc, conn_for_un_nat_copy);
    >     > +
    >     > +                if (!nat_res) {
    >     > +                    free(nc->nat_info);
    >     > +                    nc->nat_info = NULL;
    >     > +                    free (nc);
    >
    >     I think that nc->alg may be leaked here? any reason it doesn't use
    >     delete_conn()?
    >
    > Good
    > Yes, alg will leak in this rare error case and yes, delete_conn() should 
be used
    > here, as everywhere.
    
    OK.
    
    >     > +                    ct_rwlock_unlock(&ct->resources_lock);
    >     > +                    return NULL;
    >     > +                }
    >     >
    >     > -            if (conn_for_un_nat_copy &&
    >     > -                nc->conn_type == CT_CONN_TYPE_DEFAULT) {
    >     >                  *nc = *conn_for_un_nat_copy;
    >
    >     Perhaps nc->alg and/or nc->nat_info may be leaked here?
    >
    > No, the un_nat conn has no such allocations, so there is nothing to leak.
    
    I don't mean conn_for_un_nat_copy, I mean *nc which could have had an
    xstrdup()'d 'alg' attached. Won't this overwrite all fields in 'nc'?

I see your question now.
No, at this point, the copy gets the same pointers to the alg string and 
nat_info.
Only nc needs them and the un_nat copy ptrs are nulled.
There is only one allocation set.

    
    >     <snip>
    >
    >     > @@ -923,6 +1126,28 @@ sweep_bucket(struct conntrack *ct, struct 
conntrack_bucket *ctb, long long now,
    >     >          }
    >     >      }
    >     >
    >     > +#define MAX_ALG_EXP_TO_EXPIRE 1000
    >     > +    size_t alg_exp_count = hmap_count(&ct->alg_expectations);
    >     > +    /* XXX: revisit this. */
    >     > +    size_t max_to_expire =
    >     > +        MAX(alg_exp_count/10, MAX_ALG_EXP_TO_EXPIRE);
    >
    >     I would have thought MAX_ALG_EXP_TO_EXPIRE would be the upper bound,
    >     not the lower bound.
    >
    > If the number of expectations is not large (common case), then this 
number represents the lower bound.
    > If the number of expectations is large, then it represents an upper bound.
    
    alg_exp_count=10, then MAX(1, 1000) -> expire up to 1000 entries.
    
    alg_exp_count=100000, then MAX(10000,1000) -> expire up to 10000 entries.
    
    So, MAX_ALG_EXP_TO_EXPIRE isn't a maximum?

I agree and was mulling over the same a few times; I will change
MAX_ALG_EXP_TO_EXPIRE
to 
DFT_ALG_EXP_TO_EXPIRE
    
    In practice it seems unlikely to end up in a situation with so many
    expectations, but still I found this a bit confusing.

    
    >     <snip>
    >
    >     > @@ -1819,6 +2045,88 @@ nat_conn_keys_remove(struct hmap 
*nat_conn_keys, const struct conn_key *key,
    >     >      }
    >     >  }
    >     >
    >     > +static struct alg_exp_node *
    >     > +expectation_lookup(struct hmap *alg_expectations,
    >     > +                   const struct conn_key *key,
    >     > +                   uint32_t basis)
    >
    >     OVS_REQ_RDLOCK(...) ?
    >
    > Yep, missed this one
    
    There might be others, worth keeping an eye out.

Yes, there are a few.
The reason I did not add them is that it requires an extra parameter for the 
lock
that is unused so the OVS_REQ can reference it; I might just add a comment and 
skip adding
the extra dummy parameter.
    
    >     > +{
    >     > +    struct conn_key check_key = *key;
    >     > +    check_key.src.port = ALG_WC_SRC_PORT;
    >     > +    struct alg_exp_node *alg_exp_node;
    >     > +    uint32_t alg_exp_conn_key_hash = conn_key_hash(&check_key, 
basis);
    >     > +    HMAP_FOR_EACH_WITH_HASH (alg_exp_node, node,
    >     > +                             alg_exp_conn_key_hash,
    >     > +                             alg_expectations) {
    >     > +        if (!memcmp(&alg_exp_node->key, &check_key,
    >     > +                    sizeof alg_exp_node->key)) {
    >     > +            return alg_exp_node;
    >     > +        }
    >     > +    }
    >     > +
    >     > +    return NULL;
    >     > +}
    >     > +
    >     > +static void
    >     > +expectation_create(struct conntrack *ct,
    >     > +                   ovs_be16 dst_port,
    >     > +                   const long long now,
    >     > +                   enum ct_alg_mode mode,
    >     > +                   const struct conn *master_conn)
    >     > +{
    >     > +    struct ct_addr src_addr;
    >     > +    struct ct_addr dst_addr;
    >     > +    struct ct_addr alg_nat_repl_addr;
    >     > +
    >     > +    switch (mode) {
    >     > +    case CT_FTP_MODE_ACTIVE:
    >     > +        src_addr = master_conn->rev_key.src.addr;
    >     > +        dst_addr = master_conn->rev_key.dst.addr;
    >     > +        alg_nat_repl_addr = master_conn->key.src.addr;
    >     > +        break;
    >     > +    case CT_FTP_MODE_PASSIVE:
    >     > +        src_addr = master_conn->key.src.addr;
    >     > +        dst_addr = master_conn->key.dst.addr;
    >     > +        alg_nat_repl_addr = master_conn->rev_key.src.addr;
    >     > +        break;
    >     > +    default:
    >     > +        OVS_NOT_REACHED();
    >     > +    }
    >     > +
    >     > +    struct alg_exp_node *alg_exp_node =
    >     > +        xzalloc(sizeof *alg_exp_node);
    >     > +    alg_exp_node->key.dl_type = master_conn->key.dl_type;
    >     > +    alg_exp_node->key.nw_proto = master_conn->key.nw_proto;
    >     > +    alg_exp_node->key.zone = master_conn->key.zone;
    >     > +    alg_exp_node->key.src.addr = src_addr;
    >     > +    alg_exp_node->key.dst.addr = dst_addr;
    >     > +    alg_exp_node->key.src.port = ALG_WC_SRC_PORT;
    >     > +    alg_exp_node->key.dst.port = dst_port;
    >     > +    alg_exp_node->master_mark = master_conn->mark;
    >     > +    alg_exp_node->master_label = master_conn->label;
    >     > +    alg_exp_node->master_key = master_conn->key;
    >     > +    ct_rwlock_rdlock(&ct->resources_lock);
    >     > +    struct alg_exp_node *alg_exp = expectation_lookup(
    >     > +        &ct->alg_expectations, &alg_exp_node->key, ct->hash_basis);
    >     > +    ct_rwlock_unlock(&ct->resources_lock);
    >     > +    if (alg_exp) {
    >     > +        free(alg_exp_node);
    >     > +        return;
    >     > +    }
    >     > +
    >     > +    alg_exp_node->expiration = now + CT_ALG_EXP_TIMEOUT;
    >     > +    alg_exp_node->alg_nat_repl_addr = alg_nat_repl_addr;
    >     > +    uint32_t alg_exp_conn_key_hash =
    >     > +        conn_key_hash(&alg_exp_node->key,
    >     > +                      ct->hash_basis);
    >     > +    ct_rwlock_wrlock(&ct->resources_lock);
    >     > +    hmap_insert(&ct->alg_expectations,
    >     > +                &alg_exp_node->node,
    >     > +                alg_exp_conn_key_hash);
    >     > +
    >     > +    alg_exp_init_expiration(ct, alg_exp_node, now);
    >     > +    ct_rwlock_unlock(&ct->resources_lock);
    >
    >     Is there a race where another thread inserts an equivalent expectation
    >     between grabbing the readlock above and the writelock here?
    >
    > This can’t happen practically because of how packets are assigned to 
threads
    > and how the protocols work.
    > It is theoretically possible with a very sophisticated exploit; but this 
would also be a weak exploit.
    > I had some other reasons to combine the critical sections however: the 
usual case is that the write
    > lock will be needed anyways; might as well take it even though this code 
only handles a tiny
    > subset of packets; also, readability is better.
    
    OK.
    
    >     >  int
    >     > @@ -2020,7 +2333,7 @@ conntrack_flush(struct conntrack *ct, const 
uint16_t *zone)
    >     >          struct conn *conn, *next;
    >     >
    >     >          ct_lock_lock(&ct->buckets[i].lock);
    >     > -        HMAP_FOR_EACH_SAFE(conn, next, node, 
&ct->buckets[i].connections) {
    >     > +        HMAP_FOR_EACH_SAFE (conn, next, node, 
&ct->buckets[i].connections) {
    >     >              if ((!zone || *zone == conn->key.zone) &&
    >     >                  (conn->conn_type == CT_CONN_TYPE_DEFAULT)) {
    >     >                  conn_clean(ct, conn, &ct->buckets[i]);
    >     > @@ -2028,5 +2341,550 @@ conntrack_flush(struct conntrack *ct, const 
uint16_t *zone)
    >     >          }
    >     >          ct_lock_unlock(&ct->buckets[i].lock);
    >     >      }
    >     > +
    >     > +    ct_rwlock_wrlock(&ct->resources_lock);
    >
    >     Should this be nested within the bucket lock?
    >
    > No, expectations are protected by the resource lock. You
    > later correctly pointed out that I should update the comment about
    > the locking – I will do that.
    
    Right. So long as there's no chance for something to acquire
    ct_resources_lock then subsequently try to grab a bucket lock.
    
    Thanks.
    

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to