[ovs-dev] [PATCH] lib: fix cmap_find_protected

2018-12-24 Thread Zang MingJie
cmap_find_protected calculated wrong h2 hash which causing entries with
duplicated id inserted into the cmap.

Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-December/047945.html
Signed-off-by: Zang MingJie 
---
 lib/cmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/cmap.c b/lib/cmap.c
index cb9cd32ab..c9eef3f4a 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -568,7 +568,7 @@ cmap_find_protected(const struct cmap *cmap, uint32_t hash)
 {
 struct cmap_impl *impl = cmap_get_impl(cmap);
 uint32_t h1 = rehash(impl, hash);
-uint32_t h2 = other_hash(hash);
+uint32_t h2 = other_hash(h1);
 struct cmap_node *node;
 
 node = cmap_find_bucket_protected(impl, hash, h1);
-- 
2.20.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] lib: fix cmap_find_protected

2018-12-24 Thread Zang MingJie
We have found a bug where recirc node with duplicate id has been inserted
to the cmap, details of the bug in the following email:

https://mail.openvswitch.org/pipermail/ovs-discuss/2018-December/047945.html

Finally we found that cmap_find_protected calculated wrong h2:

struct cmap_impl *impl = cmap_get_impl(cmap);
uint32_t h1 = rehash(impl, hash);
uint32_t h2 = other_hash(hash);

Here is how h2 is calculated in other functions:

const struct cmap_impl *impl = cmap_get_impl(cmap);
uint32_t h1 = rehash(impl, hash);
uint32_t h2 = other_hash(h1);


On Mon, Dec 24, 2018 at 4:25 PM Zang MingJie  wrote:

> ---
>  lib/cmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/cmap.c b/lib/cmap.c
> index cb9cd32ab..c9eef3f4a 100644
> --- a/lib/cmap.c
> +++ b/lib/cmap.c
> @@ -568,7 +568,7 @@ cmap_find_protected(const struct cmap *cmap, uint32_t
> hash)
>  {
>  struct cmap_impl *impl = cmap_get_impl(cmap);
>  uint32_t h1 = rehash(impl, hash);
> -uint32_t h2 = other_hash(hash);
> +uint32_t h2 = other_hash(h1);
>  struct cmap_node *node;
>
>  node = cmap_find_bucket_protected(impl, hash, h1);
> --
> 2.20.1
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] lib: fix cmap_find_protected

2018-12-24 Thread Zang MingJie
---
 lib/cmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/cmap.c b/lib/cmap.c
index cb9cd32ab..c9eef3f4a 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -568,7 +568,7 @@ cmap_find_protected(const struct cmap *cmap, uint32_t hash)
 {
 struct cmap_impl *impl = cmap_get_impl(cmap);
 uint32_t h1 = rehash(impl, hash);
-uint32_t h2 = other_hash(hash);
+uint32_t h2 = other_hash(h1);
 struct cmap_node *node;
 
 node = cmap_find_bucket_protected(impl, hash, h1);
-- 
2.20.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Bug: select group with dp_hash causing recursive recirculation

2018-09-25 Thread Zang MingJie
Hi, we found a serious problem where one pmd is stop working, I want to
share the problem and find solution here.

vswitchd log:

  2018-09-13T23:36:44.377Z|40269235|dpif_netdev(pmd45)|WARN|Packet dropped.
Max recirculation depth exceeded.
  2018-09-13T23:36:44.387Z|40269236|dpif_netdev(pmd45)|WARN|Packet dropped.
Max recirculation depth exceeded.
  2018-09-13T23:36:44.391Z|40269237|dpif_netdev(pmd45)|WARN|Packet dropped.
Max recirculation depth exceeded.

problematic datapath flows:


ct_state(+new-est),recirc_id(0x143c893),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),tcp(dst=443),
packets:84573093, bytes:6308807903, used:0.009s,
flags:SFPRU.ECN[200][400][800],
actions:meter(306),hash(hash_l4(0)),recirc(0x237b09d)


recirc_id(0x237b09d),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
packets:279713339, bytes:20890205186, used:0.007s,
flags:SFPRU.ECN[200][400][800], actions:hash(hash_l4(0)),recirc(0x237b09d)

corresponding openflow:

  cookie=0x5b5ab65e000f0101, duration=4848269.642s, table=40,
n_packets=974343805, n_bytes=72484367083,
priority=10,tcp,metadata=0xf0100/0xff00,tp_dst=443
actions=group:983297


group_id=983297,type=select,selection_method=dp_hash,bucket=bucket_id:3057033848,weight:100,actions=ct(commit,table=70,zone=15,exec(nat(dst=10.177.251.203:443))),...``lots
of buckets``...



Following explains how select group with dp_hash works.

To implement select group with dp_hash, two datapath flows are needed:

 1. calculate dp_hash, recirculate to second one
 2. select group bucket by dp_hash

When encounter a datapath miss, openflow doesn't know which one is missing,
so it depends on dp_hash value of the packet:

 if dp_hash == 0 generate first dp flow.
if dp_hash != 0 generate second dp flow.


Back to the problem.

Notice that second datapath flow is a dead loop, it recirculate to itself.
The cause of the problem is here ofproto/ofproto-dpif-xlate.c#L4429[1]:

/* dp_hash value 0 is special since it means that the dp_hash has not
been
 * computed, as all computed dp_hash values are non-zero.  Therefore
 * compare to zero can be used to decide if the dp_hash value is valid
 * without masking the dp_hash field. */
if (!dp_hash) {

The comment saying that `dp_hash` shouldn't be zero, but under DPDK, it can
be zero, at lib/odp-execute.c#L747[2]

   /* RSS hash can be used here instead of 5tuple for
* performance reasons. */
   if (dp_packet_rss_valid(packet)) {
   hash = dp_packet_get_rss_hash(packet);
   hash = hash_int(hash, hash_act->hash_basis);
   } else {
   flow_extract(packet, );
   hash = flow_hash_5tuple(, hash_act->hash_basis);
   }
   packet->md.dp_hash = hash;

I don't know how small chance that `hash_int` returns 0, we have tested
that if the final hash is 0, will definitely trigger the same bug. And due
to the chance is extremely low, I'm also investigation that if there are
other situation that will pass 0 hash to ofp.



IMO, it is silly to depends on dp_hash value, maybe we need a new mechanism
which can pass data between ofp and odp freely. And a quick solution could
be just change the 0 hash to 1.


[1]
https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-xlate.c#L4429
[2] https://github.com/openvswitch/ovs/blob/master/lib/odp-execute.c#L747
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] conntrack: invalid packet should not modify ct state

2018-09-25 Thread Zang MingJie
On Thu, Sep 20, 2018 at 8:47 AM Darrell Ball  wrote:

>
>
> On Fri, Sep 14, 2018 at 1:46 AM, Zang MingJie 
> wrote:
>
>> >> > Did you notice this check ?
>> >> >
>> >> > if (src->state < CT_DPIF_TCPS_SYN_SENT) {
>> >> > /* First packet from this end. Set its state */
>> >>
>> >> Yes, this is exactly where we found the problem. If first reply packet
>> >> is invalid, it masses all following packets.
>> >
>> >
>> >
>> > Based on your description in the commit message:
>> > " Currently an invalid packet can change the seq number while the
>> connection is in SEQ_RECV state."
>> > we don't fall into this condition since SEQ_RECV  >
>> CT_DPIF_TCPS_SYN_SENT, right ?
>> >
>> > If you did not mean "SEQ_RECV", maybe you meant something else ?
>> > What the value of "src->state" where you saw an issue ?
>>
>> SYN_RECV is our server side tcp state,
>>
>
> Thanks for clarifying what you referring to.
>
>
>
>> ct table track either side tcp state separately, the problem happens in
>> this situation:
>>
>>client   |  ct.src   ct.dst  |  server
>> packet: syn ->
>> state :SYN_SEND | SYN_SEND  CLOSED  | SYN_RECV
>> packet:  <- malformed packet
>> state: SYN_SEND | SYN_SEND SYN_SEND | SYN_RECV  <-updated to wrong
>> state
>> packet: <- syn+ack  <-invalid
>>
>
> That's all fine, but details are needed.
>
> Is the second packet crafted to be invalid ?
> For that matter, is the first packet crafted to be bogus as well ?
>
> Pls send along:
>
> 1/ First packet tcp fields
>

first packet is normal tcp syn packet, nothing special


> 2/ Second packet tcp fields
>

second invalid packet is triggered due to a previous kernel bug, in general
it is packet of previous connection with same 5-tuple, everything is wrong
including flags, seq and ack number. the actual value doesn't matter as
long as it is invalid, as I have already shown, the CT state of this peer
will go to SYN_SEND even that no syn flag was set, and the wrong seq number
in the invalid packet will be tracked.


> 3/ The tcp lengths if the response is a crafted packet with unexpected
> data.
>

it doesn't matter at all. In the situation where we found the bug, it's a
fin packet of the last connection, no data.


The major problem is following checking you have posted:

if (src->state < CT_DPIF_TCPS_SYN_SENT) {

It is too loose, and if meet, the peer state will be changed, it is
unexpected.



>
> Thanks Darrell
>
>
>>
>> malformed packet will set wrong seq_lo and seq_hi to the state,
>> preventing following packets pass ct.
>>
>> >
>> >>
>> >>
>> >> >
>> >> >
>> >> >
>> >> >>
>> >> >>
>> >> >> Signed-off-by: Zang MingJie 
>> >> >> ---
>> >> >>  lib/conntrack-tcp.c | 10 --
>> >> >>  1 file changed, 8 insertions(+), 2 deletions(-)
>> >> >>
>> >> >> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
>> >> >> index 86d313d28..7443a3293 100644
>> >> >> --- a/lib/conntrack-tcp.c
>> >> >> +++ b/lib/conntrack-tcp.c
>> >> >> @@ -151,9 +151,9 @@ tcp_conn_update(struct conn *conn_, struct
>> conntrack_bucket *ctb,
>> >> >>  struct conn_tcp *conn = conn_tcp_cast(conn_);
>> >> >>  struct tcp_header *tcp = dp_packet_l4(pkt);
>> >> >>  /* The peer that sent 'pkt' */
>> >> >> -struct tcp_peer *src = >peer[reply ? 1 : 0];
>> >> >> +struct tcp_peer orig_src, *src = >peer[reply ? 1 : 0];
>> >> >>  /* The peer that should receive 'pkt' */
>> >> >> -struct tcp_peer *dst = >peer[reply ? 0 : 1];
>> >> >> +struct tcp_peer orig_dst, *dst = >peer[reply ? 0 : 1];
>> >> >>  uint8_t sws = 0, dws = 0;
>> >> >>  uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl);
>> >> >>
>> >> >> @@ -187,6 +187,10 @@ tcp_conn_update(struct conn *conn_, struct
>> conntrack_bucket *ctb,
>> >> >>  dws = TCP_MAX_WSCALE;
>> >> >>  }
>> >> >>
>> >> >> +
>> >> >> +orig_src = *src;
>> >> >> +orig_dst = *dst;
>> >> >> +
>> >> >>  /*
>> >> >>   * Sequence tracking algorithm from Guido van Rooij's paper:
>> >> >>   *   http://www.madison-gurkha.com/publications/tcp_filtering/
>> >> >> @@ -385,6 +389,8 @@ tcp_conn_update(struct conn *conn_, struct
>> conntrack_bucket *ctb,
>> >> >>  src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT;
>> >> >>  }
>> >> >>  } else {
>> >> >> +*src = orig_src;
>> >> >> +*dst = orig_dst;
>> >> >>  return CT_UPDATE_INVALID;
>> >> >>  }
>> >> >>
>> >> >> --
>> >> >> 2.19.0.rc1
>> >> >>
>> >> >> ___
>> >> >> dev mailing list
>> >> >> d...@openvswitch.org
>> >> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> >> >
>> >> >
>> >
>> >
>>
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] conntrack: invalid packet should not modify ct state

2018-09-14 Thread Zang MingJie
>> > Did you notice this check ?
>> >
>> > if (src->state < CT_DPIF_TCPS_SYN_SENT) {
>> > /* First packet from this end. Set its state */
>>
>> Yes, this is exactly where we found the problem. If first reply packet
>> is invalid, it masses all following packets.
>
>
>
> Based on your description in the commit message:
> " Currently an invalid packet can change the seq number while the
connection is in SEQ_RECV state."
> we don't fall into this condition since SEQ_RECV  >
CT_DPIF_TCPS_SYN_SENT, right ?
>
> If you did not mean "SEQ_RECV", maybe you meant something else ?
> What the value of "src->state" where you saw an issue ?

SYN_RECV is our server side tcp state, ct table track either side tcp state
separately, the problem happens in this situation:

   client   |  ct.src   ct.dst  |  server
packet: syn ->
state :SYN_SEND | SYN_SEND  CLOSED  | SYN_RECV
packet:  <- malformed packet
state: SYN_SEND | SYN_SEND SYN_SEND | SYN_RECV  <-updated to wrong state
packet: <- syn+ack  <-invalid

malformed packet will set wrong seq_lo and seq_hi to the state, preventing
following packets pass ct.

>
>>
>>
>> >
>> >
>> >
>> >>
>> >>
>> >> Signed-off-by: Zang MingJie 
>> >> ---
>> >>  lib/conntrack-tcp.c | 10 --
>> >>  1 file changed, 8 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
>> >> index 86d313d28..7443a3293 100644
>> >> --- a/lib/conntrack-tcp.c
>> >> +++ b/lib/conntrack-tcp.c
>> >> @@ -151,9 +151,9 @@ tcp_conn_update(struct conn *conn_, struct
conntrack_bucket *ctb,
>> >>  struct conn_tcp *conn = conn_tcp_cast(conn_);
>> >>  struct tcp_header *tcp = dp_packet_l4(pkt);
>> >>  /* The peer that sent 'pkt' */
>> >> -struct tcp_peer *src = >peer[reply ? 1 : 0];
>> >> +struct tcp_peer orig_src, *src = >peer[reply ? 1 : 0];
>> >>  /* The peer that should receive 'pkt' */
>> >> -struct tcp_peer *dst = >peer[reply ? 0 : 1];
>> >> +struct tcp_peer orig_dst, *dst = >peer[reply ? 0 : 1];
>> >>  uint8_t sws = 0, dws = 0;
>> >>  uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl);
>> >>
>> >> @@ -187,6 +187,10 @@ tcp_conn_update(struct conn *conn_, struct
conntrack_bucket *ctb,
>> >>  dws = TCP_MAX_WSCALE;
>> >>  }
>> >>
>> >> +
>> >> +orig_src = *src;
>> >> +orig_dst = *dst;
>> >> +
>> >>  /*
>> >>   * Sequence tracking algorithm from Guido van Rooij's paper:
>> >>   *   http://www.madison-gurkha.com/publications/tcp_filtering/
>> >> @@ -385,6 +389,8 @@ tcp_conn_update(struct conn *conn_, struct
conntrack_bucket *ctb,
>> >>  src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT;
>> >>  }
>> >>  } else {
>> >> +*src = orig_src;
>> >> +*dst = orig_dst;
>> >>  return CT_UPDATE_INVALID;
>> >>  }
>> >>
>> >> --
>> >> 2.19.0.rc1
>> >>
>> >> ___
>> >> dev mailing list
>> >> d...@openvswitch.org
>> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> >
>> >
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] conntrack: invalid packet should not modify ct state

2018-09-13 Thread Zang MingJie
On Thu, Sep 13, 2018 at 2:55 AM Darrell Ball  wrote:
>
> Thanks for looking MingJie
>
>
> On Wed, Sep 12, 2018 at 2:16 AM, Zang MingJie  wrote:
>>
>> When encounter an invalid packet, all changes made by the packet should
>> be reverted. Currently an invalid packet can change the seq number while
>> the connection is in SEQ_RECV state.
>
>
>
> Did you notice this check ?
>
> if (src->state < CT_DPIF_TCPS_SYN_SENT) {
> /* First packet from this end. Set its state */

Yes, this is exactly where we found the problem. If first reply packet
is invalid, it masses all following packets.

>
>
>
>>
>>
>> Signed-off-by: Zang MingJie 
>> ---
>>  lib/conntrack-tcp.c | 10 --
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
>> index 86d313d28..7443a3293 100644
>> --- a/lib/conntrack-tcp.c
>> +++ b/lib/conntrack-tcp.c
>> @@ -151,9 +151,9 @@ tcp_conn_update(struct conn *conn_, struct 
>> conntrack_bucket *ctb,
>>  struct conn_tcp *conn = conn_tcp_cast(conn_);
>>  struct tcp_header *tcp = dp_packet_l4(pkt);
>>  /* The peer that sent 'pkt' */
>> -struct tcp_peer *src = >peer[reply ? 1 : 0];
>> +struct tcp_peer orig_src, *src = >peer[reply ? 1 : 0];
>>  /* The peer that should receive 'pkt' */
>> -struct tcp_peer *dst = >peer[reply ? 0 : 1];
>> +struct tcp_peer orig_dst, *dst = >peer[reply ? 0 : 1];
>>  uint8_t sws = 0, dws = 0;
>>  uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl);
>>
>> @@ -187,6 +187,10 @@ tcp_conn_update(struct conn *conn_, struct 
>> conntrack_bucket *ctb,
>>  dws = TCP_MAX_WSCALE;
>>  }
>>
>> +
>> +orig_src = *src;
>> +orig_dst = *dst;
>> +
>>  /*
>>   * Sequence tracking algorithm from Guido van Rooij's paper:
>>   *   http://www.madison-gurkha.com/publications/tcp_filtering/
>> @@ -385,6 +389,8 @@ tcp_conn_update(struct conn *conn_, struct 
>> conntrack_bucket *ctb,
>>  src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT;
>>  }
>>  } else {
>> +*src = orig_src;
>> +*dst = orig_dst;
>>  return CT_UPDATE_INVALID;
>>  }
>>
>> --
>> 2.19.0.rc1
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] conntrack: invalid packet should not modify ct state

2018-09-12 Thread Zang MingJie
When encounter an invalid packet, all changes made by the packet should
be reverted. Currently an invalid packet can change the seq number while
the connection is in SEQ_RECV state.

Signed-off-by: Zang MingJie 
---
 lib/conntrack-tcp.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
index 86d313d28..7443a3293 100644
--- a/lib/conntrack-tcp.c
+++ b/lib/conntrack-tcp.c
@@ -151,9 +151,9 @@ tcp_conn_update(struct conn *conn_, struct conntrack_bucket 
*ctb,
 struct conn_tcp *conn = conn_tcp_cast(conn_);
 struct tcp_header *tcp = dp_packet_l4(pkt);
 /* The peer that sent 'pkt' */
-struct tcp_peer *src = >peer[reply ? 1 : 0];
+struct tcp_peer orig_src, *src = >peer[reply ? 1 : 0];
 /* The peer that should receive 'pkt' */
-struct tcp_peer *dst = >peer[reply ? 0 : 1];
+struct tcp_peer orig_dst, *dst = >peer[reply ? 0 : 1];
 uint8_t sws = 0, dws = 0;
 uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl);
 
@@ -187,6 +187,10 @@ tcp_conn_update(struct conn *conn_, struct 
conntrack_bucket *ctb,
 dws = TCP_MAX_WSCALE;
 }
 
+
+orig_src = *src;
+orig_dst = *dst;
+
 /*
  * Sequence tracking algorithm from Guido van Rooij's paper:
  *   http://www.madison-gurkha.com/publications/tcp_filtering/
@@ -385,6 +389,8 @@ tcp_conn_update(struct conn *conn_, struct conntrack_bucket 
*ctb,
 src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT;
 }
 } else {
+*src = orig_src;
+*dst = orig_dst;
 return CT_UPDATE_INVALID;
 }
 
-- 
2.19.0.rc1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS-DPDK: fast host internal interface

2018-08-30 Thread Zang MingJie
Hi:

Currently with user space datapath, only internal tap device can be
used to communicate with kernel. Because packets must be copied from
user space to kernel, the throughput of tap implementation is very
slow, not suitable for data plane. Now DPDK provides some alternative
solutions.

1. KNI interface. https://doc.dpdk.org/guides/nics/kni.html

It requires a kernel module, and seems not well maintained.

2. Tap/Tun PMD. https://doc.dpdk.org/guides/nics/tap.html

Instead of using ovs main thread to communicate with tap device, it
use its own pmd thread, which may increase performance.

3. vhost+virtio adapter.
https://doc.dpdk.org/guides/howto/virtio_user_as_exceptional_path.html

It utilizes existing vhost and vhost-net kernel module, emulate a
virtio device, communicate directly to vhost ring. This looks like a
good solution.


Looking for suggestions
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v2 1/2] conntrack: Fix nat_clean.

2018-08-30 Thread Zang MingJie
I don't think the patch will resolve the problem. Once ctb->lock is
released, other thread may have chance to acquire the lock and modify
ctb. In general, ctb->lock can not be released in this function,
another approach is needed.

On Wed, Aug 29, 2018 at 3:31 PM Darrell Ball  wrote:
>
> nat_clean has a defunct optimization for calculating a hash outside the
> scope of a bucket lock which can lead to a race in referencing a freed
> conntrack entry.  Adjust to avoid this.  Needs backporting to 2.8.
>
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-August/351629.html
> Fixes: 286de2729955 ("dpdk: Userspace Datapath: Introduce NAT Support.")
> Signed-off-by: Darrell Ball 
> ---
>  lib/conntrack.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index be8debb..692f2b8 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -778,20 +778,22 @@ nat_clean(struct conntrack *ct, struct conn *conn,
>  {
>  ct_rwlock_wrlock(>resources_lock);
>  nat_conn_keys_remove(>nat_conn_keys, >rev_key, ct->hash_basis);
> -ct_rwlock_unlock(>resources_lock);
> -ct_lock_unlock(>lock);
>  unsigned bucket_rev_conn =
>  hash_to_bucket(conn_key_hash(>rev_key, ct->hash_basis));
> +struct conn_key rev_key = conn->rev_key;
> +ct_rwlock_unlock(>resources_lock);
> +ct_lock_unlock(>lock);
> +
>  ct_lock_lock(>buckets[bucket_rev_conn].lock);
>  ct_rwlock_wrlock(>resources_lock);
>  long long now = time_msec();
> -struct conn *rev_conn = conn_lookup(ct, >rev_key, now);
> +struct conn *rev_conn = conn_lookup(ct, _key, now);
>  struct nat_conn_key_node *nat_conn_key_node =
> -nat_conn_keys_lookup(>nat_conn_keys, >rev_key,
> +nat_conn_keys_lookup(>nat_conn_keys, _key,
>   ct->hash_basis);
>
> -/* In the unlikely event, rev conn was recreated, then skip
> - * rev_conn cleanup. */
> +/* In the unlikely event, 'rev_conn' was recreated, then skip
> + * 'rev_conn' cleanup. */
>  if (rev_conn && (!nat_conn_key_node ||
>   conn_key_cmp(_conn_key_node->value,
>_conn->rev_key))) {
> --
> 1.9.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Several conntrack problems, including some critical bugs.

2018-08-27 Thread Zang MingJie
While developing application using ovs userspace conntrack, we found
some bugs worth mention here.

1. conntrack_clean may causes ovs crash.

conntrack_clean function iterators through all buckets, and free
entries in the bucket with bucket lock, but when releasing a NAT
connection, inside nat_clean function, the bucket lock is temporarily
released, if other PMD acquires the lock and modifies the bucket,
further loop may causing invalid memory access inside sweep_bucket
function.

2. occasionally incorrectly DNAT to 1024 port, despite whatever port specified.

We found 2 scenarios, both leads to this result.

First, consider there are two virtual server share the same backend,
which are implemented by DNAT, both V1 and V2 are DNAT to R. While
there is already a connection C->V1 which is DNAT as C->R, if there is
another incoming connection C->V2, will also DNAT as C->R, causing
conntrack table conflict, but instead of dropping the packet, the
connection is DNAT to port 1024. Because the NAT function search
through port 1024 - 65535 when conflict occurred.

Second, if a conntrack entry is expired but not yet released, mostly
in TIMEWAIT state, the client may reuse the same port to establish a
new connection, when this condition is met, will also cause a
conflict, the connection will DNAT to port 1024 if DNAT is used.

There are also some other problems under investigation, and I'll post
them when we find the cause.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Upper limit of QoS burst

2018-04-23 Thread Zang MingJie
Hi:

I found that in struct ofputil_meter_band, rate and burst are all 32 bits,
the max burst is:

for bandwidth: 4GB = 32Gb
for pps: 4G packets/1000 = 4M packets

In our high performance setup, 4M packets burst is easily achievable, so I
suggest change these variable to uint64_t
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-05-10 Thread Zang MingJie

Confirmed, can be easily reproduced using described method.

Using ovs 2.6.2

On 01/23/2017 11:58 PM, Vidyasagara Guntaka via dev wrote:

Hi Ben,

We could reproduce this with the latest version 2.6.1.  When we compiled the 
code, we removed -O2 from CFLAGS.  This seems to make it happen more 
frequently.  With the following script running, the error starts happening 
within a few seconds and then continues to happen after every few seconds.  In 
summary, our suspicion is that having no controller set and having no NORMAL 
processing flow seems to trigger the stack that was pointed out in the gdb 
session more often and hence we hit this race condition very easily using the 
following script.  (Even if there is a default NORMAL processing flow entry, 
after the first iteration of the script below, that will be deleted).

Also, a few things about the setup - just in case:
  * enp5s0 belongs to the physical interface on this hypervisor.
  * vport0 and vport1 belong to tap interfaces corresponding to two VMs running 
on this hypervisor.
  * When the script was running, we had pings issued from the VMs so that 
packets make it to the bridge br0.

Here is a small script that makes it happen on multiple of our hypervisors:

while true
do
ovs-ofctl add-flow br0 
"priority=200,table=123,idle_timeout=1,in_port=1,actions=controller"
ovs-ofctl add-flow br0 
"priority=200,table=123,idle_timeout=1,in_port=2,actions=controller"
ovs-ofctl add-flow br0 
"priority=200,table=123,idle_timeout=1,in_port=3,actions=controller"
ovs-ofctl add-flow br0 
"priority=200,table=123,idle_timeout=1,in_port=4,actions=controller"
ovs-ofctl del-flows br0
done

Here is our bridge br0 setup:

[root@deepspace ~]# ifconfig br0
br0: flags=4163  mtu 1500
inet 192.168.2.142  netmask 255.255.255.0  broadcast 192.168.2.255
inet6 fe80::213:3bff:fe0f:1301  prefixlen 64  scopeid 0x20
ether 00:13:3b:0f:13:01  txqueuelen 1000  (Ethernet)
RX packets 89417814  bytes 12088012200 (11.2 GiB)
RX errors 0  dropped 82  overruns 0  frame 0
TX packets 32330647  bytes 3168352394 (2.9 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@deepspace ~]# ovs-vsctl show
54f89e00-edd2-486e-9626-6d11c7d8b0b6
Bridge "br0"
Port "vport1"
Interface "vport1"
Port "br0"
Interface "br0"
type: internal
Port vtep
Interface vtep
type: vxlan
options: {key=flow, remote_ip="192.168.1.141"}
Port "vport0"
Interface "vport0"
Port "enp5s0"
Interface "enp5s0"
ovs_version: “2.6.1"

[root@deepspace ~]# ovs-ofctl show br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:00133b0f1301
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src 
mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(enp5s0): addr:00:13:3b:0f:13:01
 config: 0
 state:  0
 current:1GB-FD AUTO_NEG
 advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER 
AUTO_NEG AUTO_PAUSE AUTO_PAUSE_ASYM
 supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER AUTO_NEG
 speed: 1000 Mbps now, 1000 Mbps max
 2(vport0): addr:fe:00:00:00:00:03
 config: 0
 state:  0
 current:10MB-FD COPPER
 speed: 10 Mbps now, 0 Mbps max
 3(vport1): addr:fe:00:00:00:00:04
 config: 0
 state:  0
 current:10MB-FD COPPER
 speed: 10 Mbps now, 0 Mbps max
 4(vtep): addr:aa:97:d2:a9:19:ed
 config: 0
 state:  0
 speed: 0 Mbps now, 0 Mbps max
 LOCAL(br0): addr:00:13:3b:0f:13:01
 config: 0
 state:  0
 speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

[root@deepspace ~]# ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.6.1
OpenFlow versions 0x1:0x4

Please let us know if you need anything else to reproduce this.

Thanks,
Sagar.


On Jan 18, 2017, at 1:19 PM, Ben Pfaff  wrote:

If you can come up with simple reproduction instructions that work for
me, I'm happy to track this down.  It's probably something very simple.

On Tue, Jan 17, 2017 at 08:50:20AM -0800, Vidyasagara Guntaka wrote:

This issue happened on our in-use systems and we were trying to find a way
to move forward avoiding this issue so that we do not have to upgrade OVS
on thousands of our hypervisors causing down time. Our debugging did help
us avoid the issue for now by installing an explicit rule to to drop
packets when there is no match and this issue is not seen over many hours
of test runs.

We will definitely run this test with latest version.  But, will need more
time since we are busy with our release related activities.

Regards,
Sagar.

On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff 

Re: [ovs-dev] [PATCH] netdev-dpdk: add dpdk interface strip_vlan option

2017-04-18 Thread Zang MingJie
Yes, It is better to use the bit-field. I'll update the patch after
the refereed patch has been applied

On Tue, Apr 4, 2017 at 10:03 PM, Kevin Traynor <ktray...@redhat.com> wrote:
> On 03/15/2017 08:46 AM, Zang MingJie wrote:
>> dpdk-strip-vlan option specifies whether strip vlan for the dpdk interface.
>>
>> Signed-off-by: Zang MingJie <zealot0...@gmail.com>
>> ---
>>  lib/netdev-dpdk.c| 23 ++-
>>  vswitchd/vswitch.xml |  7 +++
>>  2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index ddc651bf9..ea49adf3e 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -373,6 +373,7 @@ struct netdev_dpdk {
>>  int requested_n_rxq;
>>  int requested_rxq_size;
>>  int requested_txq_size;
>> +bool requested_strip_vlan;
>>
>>  /* Number of rx/tx descriptors for physical devices */
>>  int rxq_size;
>> @@ -395,6 +396,8 @@ struct netdev_dpdk {
>>  /* DPDK-ETH hardware offload features,
>>   * from the enum set 'dpdk_hw_ol_features' */
>>  uint32_t hw_ol_features;
>> +
>> +bool strip_vlan;
>
> I don't think this should have it's own fields in the data struct as
> there is a hw_ol_feature bitmask intended for that. At the moment it is
> only used for rx checksum offload and could easily be extended for strip
> vlan if people think it's a good feature to expose. Maybe the reason you
> didn't use it in this patch is because it's currently broken and does
> nothing on reconfiguration :|
>
> I've sent a patch to fix that here
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/330481.html
>
>>  };
>>
>>  struct netdev_rxq_dpdk {
>> @@ -646,6 +649,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int 
>> n_rxq, int n_txq)
>>  conf.rxmode.jumbo_frame = 0;
>>  conf.rxmode.max_rx_pkt_len = 0;
>>  }
>> +conf.rxmode.hw_vlan_strip = dev->strip_vlan;
>>  conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
>>NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
>>  /* A device may report more queues than it makes available (this has
>> @@ -1133,6 +1137,19 @@ netdev_dpdk_process_devargs(const char *devargs, char 
>> **errp)
>>  }
>>
>>  static void
>> +dpdk_set_strip_vlan_config(struct netdev_dpdk *dev, const struct smap *args)
>> +OVS_REQUIRES(dev->mutex)
>> +{
>> +bool strip_vlan;
>> +
>> +strip_vlan = smap_get_bool(args, "dpdk-strip-vlan", false);
>> +if (strip_vlan != dev->requested_strip_vlan) {
>> +dev->requested_strip_vlan = strip_vlan;
>> +netdev_request_reconfigure(>up);
>> +}
>> +}
>> +
>> +static void
>>  dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
>>  OVS_REQUIRES(dev->mutex)
>>  {
>> @@ -1182,6 +1199,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
>> struct smap *args,
>>  ovs_mutex_lock(>mutex);
>>
>>  dpdk_set_rxq_config(dev, args);
>> +dpdk_set_strip_vlan_config(dev, args);
>>
>>  dpdk_process_queue_size(netdev, args, "n_rxq_desc",
>>  NIC_PORT_DEFAULT_RXQ_SIZE,
>> @@ -3123,7 +3141,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>  && dev->mtu == dev->requested_mtu
>>  && dev->rxq_size == dev->requested_rxq_size
>>  && dev->txq_size == dev->requested_txq_size
>> -&& dev->socket_id == dev->requested_socket_id) {
>> +&& dev->socket_id == dev->requested_socket_id
>> +&& dev->strip_vlan == dev->requested_strip_vlan) {
>>  /* Reconfiguration is unnecessary */
>>
>>  goto out;
>> @@ -3145,6 +3164,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>  dev->rxq_size = dev->requested_rxq_size;
>>  dev->txq_size = dev->requested_txq_size;
>>
>> +dev->strip_vlan = dev->requested_strip_vlan;
>> +
>>  rte_free(dev->tx_q);
>>  err = dpdk_eth_dev_init(dev);
>>  dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index a91be59b0..5c0083188 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -2360,6 +2360,13 @@
>>  
>>
>>
>> +  
>> +
>> +  Specifies whether strip vlan for the dpdk interface. It is useful
>> +  when using ixgbevf interface with a vlan filter.
>> +
>> +  
>> +
>>>type='{"type": "string"}'>
>>  
>>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] netdev-dpdk: add dpdk interface strip_vlan option

2017-03-15 Thread Zang MingJie
dpdk-strip-vlan option specifies whether strip vlan for the dpdk interface.

Signed-off-by: Zang MingJie <zealot0...@gmail.com>
---
 lib/netdev-dpdk.c| 23 ++-
 vswitchd/vswitch.xml |  7 +++
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ddc651bf9..ea49adf3e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -373,6 +373,7 @@ struct netdev_dpdk {
 int requested_n_rxq;
 int requested_rxq_size;
 int requested_txq_size;
+bool requested_strip_vlan;
 
 /* Number of rx/tx descriptors for physical devices */
 int rxq_size;
@@ -395,6 +396,8 @@ struct netdev_dpdk {
 /* DPDK-ETH hardware offload features,
  * from the enum set 'dpdk_hw_ol_features' */
 uint32_t hw_ol_features;
+
+bool strip_vlan;
 };
 
 struct netdev_rxq_dpdk {
@@ -646,6 +649,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int 
n_rxq, int n_txq)
 conf.rxmode.jumbo_frame = 0;
 conf.rxmode.max_rx_pkt_len = 0;
 }
+conf.rxmode.hw_vlan_strip = dev->strip_vlan;
 conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
   NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
 /* A device may report more queues than it makes available (this has
@@ -1133,6 +1137,19 @@ netdev_dpdk_process_devargs(const char *devargs, char 
**errp)
 }
 
 static void
+dpdk_set_strip_vlan_config(struct netdev_dpdk *dev, const struct smap *args)
+OVS_REQUIRES(dev->mutex)
+{
+bool strip_vlan;
+
+strip_vlan = smap_get_bool(args, "dpdk-strip-vlan", false);
+if (strip_vlan != dev->requested_strip_vlan) {
+dev->requested_strip_vlan = strip_vlan;
+netdev_request_reconfigure(>up);
+}
+}
+
+static void
 dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
 OVS_REQUIRES(dev->mutex)
 {
@@ -1182,6 +1199,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const 
struct smap *args,
 ovs_mutex_lock(>mutex);
 
 dpdk_set_rxq_config(dev, args);
+dpdk_set_strip_vlan_config(dev, args);
 
 dpdk_process_queue_size(netdev, args, "n_rxq_desc",
 NIC_PORT_DEFAULT_RXQ_SIZE,
@@ -3123,7 +3141,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
 && dev->mtu == dev->requested_mtu
 && dev->rxq_size == dev->requested_rxq_size
 && dev->txq_size == dev->requested_txq_size
-&& dev->socket_id == dev->requested_socket_id) {
+&& dev->socket_id == dev->requested_socket_id
+&& dev->strip_vlan == dev->requested_strip_vlan) {
 /* Reconfiguration is unnecessary */
 
 goto out;
@@ -3145,6 +3164,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
 dev->rxq_size = dev->requested_rxq_size;
 dev->txq_size = dev->requested_txq_size;
 
+dev->strip_vlan = dev->requested_strip_vlan;
+
 rte_free(dev->tx_q);
 err = dpdk_eth_dev_init(dev);
 dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index a91be59b0..5c0083188 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2360,6 +2360,13 @@
 
   
 
+  
+
+  Specifies whether strip vlan for the dpdk interface. It is useful
+  when using ixgbevf interface with a vlan filter.
+
+  
+
   
 
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] netdev-dpdk: fix flow based tunnel

2017-03-13 Thread Zang MingJie
The push/pop header of flow based tunnel is generated dynamically by flows, so
there won't be any tunnel header associated to netdev.

The patch fixes that tnl_port_cache doesn't contain any flow based tunnel.

Signed-off-by: Zang MingJie <zealot0...@gmail.com>
---
 lib/dpif-netdev.c | 2 +-
 lib/netdev.c  | 7 ---
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7d53a8def..383001e6c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3599,7 +3599,7 @@ pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
 hmap_shrink(>tnl_port_cache);
 
 HMAP_FOR_EACH (tx_port, node, >tx_ports) {
-if (netdev_has_tunnel_push_pop(tx_port->port->netdev)) {
+if (netdev_get_tunnel_config(tx_port->port->netdev) != NULL) {
 tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached);
 hmap_insert(>tnl_port_cache, _port_cached->node,
 hash_port_no(tx_port_cached->port->port_no));
diff --git a/lib/netdev.c b/lib/netdev.c
index a8d8edad7..c10c7760c 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -117,13 +117,6 @@ netdev_is_pmd(const struct netdev *netdev)
 return netdev->netdev_class->is_pmd;
 }
 
-bool
-netdev_has_tunnel_push_pop(const struct netdev *netdev)
-{
-return netdev->netdev_class->push_header
-   && netdev->netdev_class->pop_header;
-}
-
 static void
 netdev_initialize(void)
 OVS_EXCLUDED(netdev_mutex)
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev