On 23 June 2017 at 18:57, Darrell Ball <[email protected]> wrote:
>
>
> On 6/23/17, 4:08 PM, "[email protected] on behalf of Joe
> Stringer" <[email protected] on behalf of [email protected]> wrote:
>
> On 17 June 2017 at 15:53, Darrell Ball <[email protected]> wrote:
> > ALG infra and FTP (both V4 and V6) support is added to the userspace
> > datapath. Also, NAT support is included.
> >
> > Signed-off-by: Darrell Ball <[email protected]>
> > ---
>
> Hi Darrell, thanks for the patch.
>
> I wasn't able to test this out, because my build and test environments
> depend on sparse correctly running and it wouldn't compile. Please
> consider setting it up, or using Travis-CI to validate builds before
> v3.
>
> I see Travis found a sparse warning; strange, I always look for these
> and still see none, I even changed the line you saw flagged
> > + return v4_addr >> (index * 8) & 0xff;
> and still no warning :-(. I usually even get extra spurious warnings.
> However, from simple inspection right now, I would expect a “restricted type”
> warning; it is not a real problem, but, of course I’ll fix it up
OK, thanks!
> I didn't do a thorough review, but I scanned through and pointed out a
> couple of parts I have questions about.
>
> > lib/conntrack-private.h | 17 +
> > lib/conntrack.c | 1016
> +++++++++++++++++++++++++++++++++++++++++++----
> > lib/conntrack.h | 4 +-
> > 3 files changed, 957 insertions(+), 80 deletions(-)
> >
> > diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> > index 55084d3..993af33 100644
> > --- a/lib/conntrack-private.h
> > +++ b/lib/conntrack-private.h
> > @@ -62,17 +62,34 @@ struct nat_conn_key_node {
> > struct conn_key value;
> > };
> >
> > +struct alg_exp_node {
> > + struct hmap_node node;
> > + struct ovs_list exp_node;
> > + long long expiration;
> > + struct conn_key key;
> > + struct conn_key master_key;
> > + struct ct_addr alg_nat_repl_addr;
> > + ovs_u128 master_label;
> > + uint32_t master_mark;
> > +};
> > +
> > struct conn {
> > struct conn_key key;
> > struct conn_key rev_key;
> > + /* Only used for orig_tuple support. */
> > + struct conn_key master_key;
>
> I guess no-one's looked at this structure from a cacheline
> perspective, but if we think it's worth optimizing then perhaps a
> followup patch should rearrange these. You might consider starting
> here by just placing all of the ALG-related fields together at the
> end.
>
> I intend some optimizations later, but it is outside the scope of this patch.
> I placed the fields presently based on similar function for one field -
> master_conn
> and the other fields based on padding considerations.
> I leave as is for the time being and revisit later.
Sure, sounds good.
> > @@ -554,34 +681,50 @@ conn_not_found(struct conntrack *ct, struct
> dp_packet *pkt,
> > nc->rev_key = nc->key;
> > conn_key_reverse(&nc->rev_key);
> >
> > + if (helper) {
> > + nc->alg = xstrdup(helper);
> > + }
> > +
> > + if (alg_exp) {
> > + nc->alg_related = true;
> > + nc->mark = alg_exp->master_mark;
> > + nc->label = alg_exp->master_label;
> > + nc->master_key = alg_exp->master_key;
> > + }
> > +
> > if (nat_action_info) {
> > nc->nat_info = xmemdup(nat_action_info, sizeof
> *nc->nat_info);
> > - ct_rwlock_wrlock(&ct->nat_resources_lock);
> > -
> > - bool nat_res = nat_select_range_tuple(ct, nc,
> > -
> conn_for_un_nat_copy);
> >
> > - if (!nat_res) {
> > - free(nc->nat_info);
> > - nc->nat_info = NULL;
> > - free (nc);
> > - ct_rwlock_unlock(&ct->nat_resources_lock);
> > - return NULL;
> > - }
> > + if (alg_exp) {
> > + nc->rev_key.src.addr = alg_nat_repl_addr;
> > + nc->nat_info->nat_action = NAT_ACTION_DST;
> > + *conn_for_un_nat_copy = *nc;
> > + } else {
> > + ct_rwlock_wrlock(&ct->resources_lock);
> > + bool nat_res = nat_select_range_tuple(
> > + ct, nc, conn_for_un_nat_copy);
> > +
> > + if (!nat_res) {
> > + free(nc->nat_info);
> > + nc->nat_info = NULL;
> > + free (nc);
>
> I think that nc->alg may be leaked here? any reason it doesn't use
> delete_conn()?
>
> Good
> Yes, alg will leak in this rare error case and yes, delete_conn() should be
> used
> here, as everywhere.
OK.
> > + ct_rwlock_unlock(&ct->resources_lock);
> > + return NULL;
> > + }
> >
> > - if (conn_for_un_nat_copy &&
> > - nc->conn_type == CT_CONN_TYPE_DEFAULT) {
> > *nc = *conn_for_un_nat_copy;
>
> Perhaps nc->alg and/or nc->nat_info may be leaked here?
>
> No, the un_nat conn has no such allocations, so there is nothing to leak.
I don't mean conn_for_un_nat_copy, I mean *nc which could have had an
xstrdup()'d 'alg' attached. Won't this overwrite all fields in 'nc'?
> <snip>
>
> > @@ -923,6 +1126,28 @@ sweep_bucket(struct conntrack *ct, struct
> conntrack_bucket *ctb, long long now,
> > }
> > }
> >
> > +#define MAX_ALG_EXP_TO_EXPIRE 1000
> > + size_t alg_exp_count = hmap_count(&ct->alg_expectations);
> > + /* XXX: revisit this. */
> > + size_t max_to_expire =
> > + MAX(alg_exp_count/10, MAX_ALG_EXP_TO_EXPIRE);
>
> I would have thought MAX_ALG_EXP_TO_EXPIRE would be the upper bound,
> not the lower bound.
>
> If the number of expectations is not large (common case), then this number
> represents the lower bound.
> If the number of expectations is large, then it represents an upper bound.
alg_exp_count=10, then MAX(1, 1000) -> expire up to 1000 entries.
alg_exp_count=100000, then MAX(10000,1000) -> expire up to 10000 entries.
So, MAX_ALG_EXP_TO_EXPIRE isn't a maximum?
In practice it seems unlikely to end up in a situation with so many
expectations, but still I found this a bit confusing.
> <snip>
>
> > @@ -1819,6 +2045,88 @@ nat_conn_keys_remove(struct hmap *nat_conn_keys,
> const struct conn_key *key,
> > }
> > }
> >
> > +static struct alg_exp_node *
> > +expectation_lookup(struct hmap *alg_expectations,
> > + const struct conn_key *key,
> > + uint32_t basis)
>
> OVS_REQ_RDLOCK(...) ?
>
> Yep, missed this one
There might be others, worth keeping an eye out.
> > +{
> > + struct conn_key check_key = *key;
> > + check_key.src.port = ALG_WC_SRC_PORT;
> > + struct alg_exp_node *alg_exp_node;
> > + uint32_t alg_exp_conn_key_hash = conn_key_hash(&check_key, basis);
> > + HMAP_FOR_EACH_WITH_HASH (alg_exp_node, node,
> > + alg_exp_conn_key_hash,
> > + alg_expectations) {
> > + if (!memcmp(&alg_exp_node->key, &check_key,
> > + sizeof alg_exp_node->key)) {
> > + return alg_exp_node;
> > + }
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > +static void
> > +expectation_create(struct conntrack *ct,
> > + ovs_be16 dst_port,
> > + const long long now,
> > + enum ct_alg_mode mode,
> > + const struct conn *master_conn)
> > +{
> > + struct ct_addr src_addr;
> > + struct ct_addr dst_addr;
> > + struct ct_addr alg_nat_repl_addr;
> > +
> > + switch (mode) {
> > + case CT_FTP_MODE_ACTIVE:
> > + src_addr = master_conn->rev_key.src.addr;
> > + dst_addr = master_conn->rev_key.dst.addr;
> > + alg_nat_repl_addr = master_conn->key.src.addr;
> > + break;
> > + case CT_FTP_MODE_PASSIVE:
> > + src_addr = master_conn->key.src.addr;
> > + dst_addr = master_conn->key.dst.addr;
> > + alg_nat_repl_addr = master_conn->rev_key.src.addr;
> > + break;
> > + default:
> > + OVS_NOT_REACHED();
> > + }
> > +
> > + struct alg_exp_node *alg_exp_node =
> > + xzalloc(sizeof *alg_exp_node);
> > + alg_exp_node->key.dl_type = master_conn->key.dl_type;
> > + alg_exp_node->key.nw_proto = master_conn->key.nw_proto;
> > + alg_exp_node->key.zone = master_conn->key.zone;
> > + alg_exp_node->key.src.addr = src_addr;
> > + alg_exp_node->key.dst.addr = dst_addr;
> > + alg_exp_node->key.src.port = ALG_WC_SRC_PORT;
> > + alg_exp_node->key.dst.port = dst_port;
> > + alg_exp_node->master_mark = master_conn->mark;
> > + alg_exp_node->master_label = master_conn->label;
> > + alg_exp_node->master_key = master_conn->key;
> > + ct_rwlock_rdlock(&ct->resources_lock);
> > + struct alg_exp_node *alg_exp = expectation_lookup(
> > + &ct->alg_expectations, &alg_exp_node->key, ct->hash_basis);
> > + ct_rwlock_unlock(&ct->resources_lock);
> > + if (alg_exp) {
> > + free(alg_exp_node);
> > + return;
> > + }
> > +
> > + alg_exp_node->expiration = now + CT_ALG_EXP_TIMEOUT;
> > + alg_exp_node->alg_nat_repl_addr = alg_nat_repl_addr;
> > + uint32_t alg_exp_conn_key_hash =
> > + conn_key_hash(&alg_exp_node->key,
> > + ct->hash_basis);
> > + ct_rwlock_wrlock(&ct->resources_lock);
> > + hmap_insert(&ct->alg_expectations,
> > + &alg_exp_node->node,
> > + alg_exp_conn_key_hash);
> > +
> > + alg_exp_init_expiration(ct, alg_exp_node, now);
> > + ct_rwlock_unlock(&ct->resources_lock);
>
> Is there a race where another thread inserts an equivalent expectation
> between grabbing the readlock above and the writelock here?
>
> This can’t happen practically because of how packets are assigned to threads
> and how the protocols work.
> It is theoretically possible with a very sophisticated exploit; but this
> would also be a weak exploit.
> I had some other reasons to combine the critical sections however: the usual
> case is that the write
> lock will be needed anyways; might as well take it even though this code only
> handles a tiny
> subset of packets; also, readability is better.
OK.
> > int
> > @@ -2020,7 +2333,7 @@ conntrack_flush(struct conntrack *ct, const
> uint16_t *zone)
> > struct conn *conn, *next;
> >
> > ct_lock_lock(&ct->buckets[i].lock);
> > - HMAP_FOR_EACH_SAFE(conn, next, node,
> &ct->buckets[i].connections) {
> > + HMAP_FOR_EACH_SAFE (conn, next, node,
> &ct->buckets[i].connections) {
> > if ((!zone || *zone == conn->key.zone) &&
> > (conn->conn_type == CT_CONN_TYPE_DEFAULT)) {
> > conn_clean(ct, conn, &ct->buckets[i]);
> > @@ -2028,5 +2341,550 @@ conntrack_flush(struct conntrack *ct, const
> uint16_t *zone)
> > }
> > ct_lock_unlock(&ct->buckets[i].lock);
> > }
> > +
> > + ct_rwlock_wrlock(&ct->resources_lock);
>
> Should this be nested within the bucket lock?
>
> No, expectations are protected by the resource lock. You
> later correctly pointed out that I should update the comment about
> the locking – I will do that.
Right. So long as there's no chance for something to acquire
ct_resources_lock then subsequently try to grab a bucket lock.
Thanks.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev