Re: [ovs-dev] [PATCH] conntrack: Fix conntrack new state

2019-12-26 Thread Darrell Ball
Sure Ben, I will take a look.

Thanks Darrell

On Fri, Dec 20, 2019 at 1:16 PM Ben Pfaff  wrote:

> On Fri, Dec 20, 2019 at 09:51:08AM -0800, Yi-Hung Wei wrote:
> > In a connection tracking system, a connection is established only if we
> > see packets from both directions.  However, in the userspace datapath's
> > conntrack, sending a connection setup packet in one direction
> > twice puts the connection in the established state.
> >
> > This patch fixes the aforementioned issue, and adds a system traffic
> > test for UDP and TCP traffic to avoid regression.
> >
> > Fixes: a489b16854b59 ("conntrack: New userspace connection tracker.")
> > Signed-off-by: Yi-Hung Wei 
> > ---
> > Travis CI: https://travis-ci.org/YiHungWei/ovs/builds/627518780
>
> Good catch!
>
> Darrell, will you review this?
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
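
The rule quoted in the commit message above (a connection counts as
established only once packets have been seen in both directions) can be
sketched as follows. This is a minimal illustration, not the OVS code:
'struct sketch_conn', 'enum sketch_dir' and sketch_conn_update() are
hypothetical names, and real conntrack also tracks protocol state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; these are NOT the OVS 'struct conn'. */
enum sketch_dir { SKETCH_DIR_ORIG, SKETCH_DIR_REPLY };

struct sketch_conn {
    bool seen_orig;    /* A packet was seen in the original direction. */
    bool seen_reply;   /* A packet was seen in the reply direction. */
};

/* Records one packet and returns true if the connection should now be
 * reported as established, i.e. traffic was seen in both directions.
 * Repeating a packet in the same direction never establishes it. */
static bool
sketch_conn_update(struct sketch_conn *conn, enum sketch_dir dir)
{
    assert(conn != NULL);
    if (dir == SKETCH_DIR_ORIG) {
        conn->seen_orig = true;
    } else {
        conn->seen_reply = true;
    }
    return conn->seen_orig && conn->seen_reply;
}
```

With this shape, two setup packets in the original direction leave the
connection in the new state; only a reply moves it to established.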


[ovs-dev] [patch v4] conntrack: Support zone limits.

2019-12-03 Thread Darrell Ball
Signed-off-by: Darrell Ball 
---

v4: Add zone_limit_seq; /* Used to disambiguate zone limit counts. */

v3: recent merge conflict.

v2: Address review comments from Ben; one involves adding an
admit zone to the connection entry to record the zone used
for zone limit accounting when the entry was created, and using
that zone for accounting at cleanup time.

Updated dpctl.man.
Fixed a bug in zone_limit_get() to return the default zone by default.
Fixed a parameter in zone_limit_delete().
Some small cleanups.

 Documentation/faq/releases.rst   |   2 +-
 NEWS |   1 +
 lib/conntrack-private.h  |   8 +++
 lib/conntrack.c  | 137 +++
 lib/conntrack.h  |  18 +
 lib/dpctl.man|   8 +--
 lib/dpif-netdev.c|  88 -
 tests/system-kmod-macros.at  |   7 --
 tests/system-traffic.at  |   1 -
 tests/system-userspace-macros.at |   9 ---
 10 files changed, 253 insertions(+), 26 deletions(-)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 9c5ee03..6702c58 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -118,7 +118,7 @@ Q: Are all features available with all datapaths?
 Connection tracking             4.3            2.5          2.6      YES
 Conntrack Fragment Reass.       4.3            2.6          2.12     YES
 Conntrack Timeout Policies      5.2            2.12         NO       NO
-Conntrack Zone Limit            4.18           2.10         NO       YES
+Conntrack Zone Limit            4.18           2.10         2.13     YES
 Conntrack NAT                   4.6            2.6          2.8      YES
 Tunnel - LISP                   NO             2.11         NO       NO
 Tunnel - STT                    NO             2.4          NO       YES
diff --git a/NEWS b/NEWS
index 80b..17f92ba 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,7 @@ Post-v2.12.0
- Userspace datapath:
  * Add option to enable, disable and query TCP sequence checking in
conntrack.
+ * Add support for conntrack zone limits.
- AF_XDP:
  * New option 'use-need-wakeup' for netdev-afxdp to control enabling
of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by default
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 590f139..b04e4cd 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -105,6 +105,12 @@ struct conn {
 long long expiration;
 uint32_t mark;
 int seq_skew;
+
+/* Immutable data. */
+int32_t admit_zone; /* The zone for managing zone limit counts. */
+uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
+
+/* Mutable data. */
 bool seq_skew_dir; /* TCP sequence skew direction due to NATTing of FTP
 * control messages; true if reply direction. */
 bool cleaned; /* True if cleaned from expiry lists. */
@@ -155,6 +161,7 @@ struct conntrack {
 struct ovs_mutex ct_lock; /* Protects 2 following fields. */
 struct cmap conns OVS_GUARDED;
 struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
+struct hmap zone_limits OVS_GUARDED;
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
 struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */
@@ -172,6 +179,7 @@ struct conntrack {
  * control context.  */
 
 struct ipf *ipf; /* Fragmentation handling context. */
+uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */
 atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
 };
 
diff --git a/lib/conntrack.c b/lib/conntrack.c
index df7b9fa..b80080e 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -76,6 +76,11 @@ enum ct_alg_ctl_type {
 CT_ALG_CTL_SIP,
 };
 
+struct zone_limit {
+struct hmap_node node;
+struct conntrack_zone_limit czl;
+};
+
 static bool conn_key_extract(struct conntrack *, struct dp_packet *,
  ovs_be16 dl_type, struct conn_lookup_ctx *,
  uint16_t zone);
@@ -305,6 +310,8 @@ conntrack_init(void)
 for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) {
 ovs_list_init(&ct->exp_lists[i]);
 }
+hmap_init(&ct->zone_limits);
+ct->zone_limit_seq = 0;
 ovs_mutex_unlock(&ct->ct_lock);
 
 ct->hash_basis = random_uint32();
@@ -318,6 +325,111 @@ conntrack_init(void)
 return ct;
 }
 
+static uint32_t
+zone_key_hash(int32_t zone, uint32_t basis)
+{
+size_t hash = hash_int((OVS_FORCE uint32_t) zone, basis);
+return hash;
+}
+
+static struct zone_limit *
+zone_limit_lookup(struct conntrack *ct, int32_t zone)
+OVS_REQUIRES(ct->ct_lock)
+{
+uint32_t hash = zone_key_h

Re: [ovs-dev] [patch v3] conntrack: Support zone limits.

2019-12-03 Thread Darrell Ball
Please ignore v3; sending v4.


On Mon, Dec 2, 2019 at 9:43 PM Darrell Ball  wrote:

> Signed-off-by: Darrell Ball 
> ---
>
> v3: recent merge conflict.
>
> v2: Address review comments from Ben; one involves adding an
> admit zone to the connection entry to record the zone used
> for zone limit accounting when the entry was created, and using
> that zone for accounting at cleanup time.
>
> Updated dpctl.man.
> Fixed a bug in zone_limit_get() to return the default zone by default.
> Fixed a parameter in zone_limit_delete().
> Some small cleanups.
>
>  Documentation/faq/releases.rst   |   2 +-
>  NEWS |   1 +
>  lib/conntrack-private.h  |   2 +
>  lib/conntrack.c  | 134
> +++
>  lib/conntrack.h  |  17 +
>  lib/dpctl.man|   8 +--
>  lib/dpif-netdev.c|  88 -
>  tests/system-kmod-macros.at  |   7 --
>  tests/system-traffic.at  |   1 -
>  tests/system-userspace-macros.at |   9 ---
>  10 files changed, 243 insertions(+), 26 deletions(-)
>
> diff --git a/Documentation/faq/releases.rst
> b/Documentation/faq/releases.rst
> index 9c5ee03..6702c58 100644
> --- a/Documentation/faq/releases.rst
> +++ b/Documentation/faq/releases.rst
> @@ -118,7 +118,7 @@ Q: Are all features available with all datapaths?
>  Connection tracking             4.3            2.5          2.6      YES
>  Conntrack Fragment Reass.       4.3            2.6          2.12     YES
>  Conntrack Timeout Policies      5.2            2.12         NO       NO
> -Conntrack Zone Limit            4.18           2.10         NO       YES
> +Conntrack Zone Limit            4.18           2.10         2.13     YES
>  Conntrack NAT                   4.6            2.6          2.8      YES
>  Tunnel - LISP                   NO             2.11         NO       NO
>  Tunnel - STT                    NO             2.4          NO       YES
> diff --git a/NEWS b/NEWS
> index 80b..17f92ba 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -7,6 +7,7 @@ Post-v2.12.0
> - Userspace datapath:
>   * Add option to enable, disable and query TCP sequence checking in
> conntrack.
> + * Add support for conntrack zone limits.
> - AF_XDP:
>   * New option 'use-need-wakeup' for netdev-afxdp to control enabling
> of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by
> default
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index 590f139..7d36eac 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -105,6 +105,7 @@ struct conn {
>  long long expiration;
>  uint32_t mark;
>  int seq_skew;
> +int32_t admit_zone; /* The zone for managing zone limit counts. */
>  bool seq_skew_dir; /* TCP sequence skew direction due to NATTing of
> FTP
>  * control messages; true if reply direction. */
>  bool cleaned; /* True if cleaned from expiry lists. */
> @@ -155,6 +156,7 @@ struct conntrack {
>  struct ovs_mutex ct_lock; /* Protects 2 following fields. */
>  struct cmap conns OVS_GUARDED;
>  struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
> +struct hmap zone_limits OVS_GUARDED;
>  uint32_t hash_basis; /* Salt for hashing a connection key. */
>  pthread_t clean_thread; /* Periodically cleans up connection tracker.
> */
>  struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index df7b9fa..33d540a 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -76,6 +76,11 @@ enum ct_alg_ctl_type {
>  CT_ALG_CTL_SIP,
>  };
>
> +struct zone_limit {
> +struct hmap_node node;
> +struct conntrack_zone_limit czl;
> +};
> +
>  static bool conn_key_extract(struct conntrack *, struct dp_packet *,
>   ovs_be16 dl_type, struct conn_lookup_ctx *,
>   uint16_t zone);
> @@ -305,6 +310,7 @@ conntrack_init(void)
>  for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) {
>  ovs_list_init(&ct->exp_lists[i]);
>  }
> +hmap_init(&ct->zone_limits);
>  ovs_mutex_unlock(&ct->ct_lock);
>
>  ct->hash_basis = random_uint32();
> @@ -318,6 +324,110 @@ conntrack_init(void)
>  return ct;
>  }
>
> +static uint32_t
> +zone_key_hash(int32_t zone, uint32_t basis)
> +{
> +size_t hash = hash_int((OVS_FORCE uint32_t) zone, basis);
> +return hash;
> +}
> +
> +static struct zone_limit *
> +zone_limit_lookup(struct conntrack *ct, int32_

[ovs-dev] [patch v3] conntrack: Support zone limits.

2019-12-02 Thread Darrell Ball
Signed-off-by: Darrell Ball 
---

v3: recent merge conflict.

v2: Address review comments from Ben; one involves adding an
admit zone to the connection entry to record the zone used
for zone limit accounting when the entry was created, and using
that zone for accounting at cleanup time.

Updated dpctl.man.
Fixed a bug in zone_limit_get() to return the default zone by default.
Fixed a parameter in zone_limit_delete().
Some small cleanups.

 Documentation/faq/releases.rst   |   2 +-
 NEWS |   1 +
 lib/conntrack-private.h  |   2 +
 lib/conntrack.c  | 134 +++
 lib/conntrack.h  |  17 +
 lib/dpctl.man|   8 +--
 lib/dpif-netdev.c|  88 -
 tests/system-kmod-macros.at  |   7 --
 tests/system-traffic.at  |   1 -
 tests/system-userspace-macros.at |   9 ---
 10 files changed, 243 insertions(+), 26 deletions(-)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 9c5ee03..6702c58 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -118,7 +118,7 @@ Q: Are all features available with all datapaths?
 Connection tracking             4.3            2.5          2.6      YES
 Conntrack Fragment Reass.       4.3            2.6          2.12     YES
 Conntrack Timeout Policies      5.2            2.12         NO       NO
-Conntrack Zone Limit            4.18           2.10         NO       YES
+Conntrack Zone Limit            4.18           2.10         2.13     YES
 Conntrack NAT                   4.6            2.6          2.8      YES
 Tunnel - LISP                   NO             2.11         NO       NO
 Tunnel - STT                    NO             2.4          NO       YES
diff --git a/NEWS b/NEWS
index 80b..17f92ba 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,7 @@ Post-v2.12.0
- Userspace datapath:
  * Add option to enable, disable and query TCP sequence checking in
conntrack.
+ * Add support for conntrack zone limits.
- AF_XDP:
  * New option 'use-need-wakeup' for netdev-afxdp to control enabling
of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by default
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 590f139..7d36eac 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -105,6 +105,7 @@ struct conn {
 long long expiration;
 uint32_t mark;
 int seq_skew;
+int32_t admit_zone; /* The zone for managing zone limit counts. */
 bool seq_skew_dir; /* TCP sequence skew direction due to NATTing of FTP
 * control messages; true if reply direction. */
 bool cleaned; /* True if cleaned from expiry lists. */
@@ -155,6 +156,7 @@ struct conntrack {
 struct ovs_mutex ct_lock; /* Protects 2 following fields. */
 struct cmap conns OVS_GUARDED;
 struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
+struct hmap zone_limits OVS_GUARDED;
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
 struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */
diff --git a/lib/conntrack.c b/lib/conntrack.c
index df7b9fa..33d540a 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -76,6 +76,11 @@ enum ct_alg_ctl_type {
 CT_ALG_CTL_SIP,
 };
 
+struct zone_limit {
+struct hmap_node node;
+struct conntrack_zone_limit czl;
+};
+
 static bool conn_key_extract(struct conntrack *, struct dp_packet *,
  ovs_be16 dl_type, struct conn_lookup_ctx *,
  uint16_t zone);
@@ -305,6 +310,7 @@ conntrack_init(void)
 for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) {
 ovs_list_init(&ct->exp_lists[i]);
 }
+hmap_init(&ct->zone_limits);
 ovs_mutex_unlock(&ct->ct_lock);
 
 ct->hash_basis = random_uint32();
@@ -318,6 +324,110 @@ conntrack_init(void)
 return ct;
 }
 
+static uint32_t
+zone_key_hash(int32_t zone, uint32_t basis)
+{
+size_t hash = hash_int((OVS_FORCE uint32_t) zone, basis);
+return hash;
+}
+
+static struct zone_limit *
+zone_limit_lookup(struct conntrack *ct, int32_t zone)
+OVS_REQUIRES(ct->ct_lock)
+{
+uint32_t hash = zone_key_hash(zone, ct->hash_basis);
+struct zone_limit *zl;
+HMAP_FOR_EACH_IN_BUCKET (zl, node, hash, &ct->zone_limits) {
+if (zl->czl.zone == zone) {
+return zl;
+}
+}
+return NULL;
+}
+
+static struct zone_limit *
+zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone)
+OVS_REQUIRES(ct->ct_lock)
+{
+struct zone_limit *zl = zone_limit_lookup(ct, zone);
+return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE);
+}
+
+struct conntrack_zone_limit
+zone_limit_get(struct conntrack *ct, int32_t zone)
+{
+ovs_

[ovs-dev] [patch v2] conntrack: Support zone limits.

2019-12-02 Thread Darrell Ball
Signed-off-by: Darrell Ball 
---

v2: Address review comments from Ben; one involves adding an
admit zone to the connection entry to record the zone used
for zone limit accounting when the entry was created, and using
that zone for accounting at cleanup time.

Updated dpctl.man.
Fixed a bug in zone_limit_get() to return the default zone by default.
Fixed a parameter in zone_limit_delete().
Some small cleanups.

 Documentation/faq/releases.rst   |   2 +-
 NEWS |   1 +
 lib/conntrack-private.h  |   2 +
 lib/conntrack.c  | 134 +++
 lib/conntrack.h  |  17 +
 lib/dpctl.man|   8 +--
 lib/dpif-netdev.c|  88 -
 tests/system-kmod-macros.at  |   7 --
 tests/system-traffic.at  |   1 -
 tests/system-userspace-macros.at |   9 ---
 10 files changed, 243 insertions(+), 26 deletions(-)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index e02dda1..4072c99 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -118,7 +118,7 @@ Q: Are all features available with all datapaths?
 Connection tracking             4.3            2.5          2.6      YES
 Conntrack Fragment Reass.       4.3            2.6          2.10     YES
 Conntrack Timeout Policies      5.2            2.12         NO       NO
-Conntrack Zone Limit            4.18           2.10         NO       YES
+Conntrack Zone Limit            4.18           2.10         2.13     YES
 Conntrack NAT                   4.6            2.6          2.8      YES
 Tunnel - LISP                   NO             2.11         NO       NO
 Tunnel - STT                    NO             2.4          NO       YES
diff --git a/NEWS b/NEWS
index 80b..17f92ba 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,7 @@ Post-v2.12.0
- Userspace datapath:
  * Add option to enable, disable and query TCP sequence checking in
conntrack.
+ * Add support for conntrack zone limits.
- AF_XDP:
  * New option 'use-need-wakeup' for netdev-afxdp to control enabling
of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by default
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 590f139..7d36eac 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -105,6 +105,7 @@ struct conn {
 long long expiration;
 uint32_t mark;
 int seq_skew;
+int32_t admit_zone; /* The zone for managing zone limit counts. */
 bool seq_skew_dir; /* TCP sequence skew direction due to NATTing of FTP
 * control messages; true if reply direction. */
 bool cleaned; /* True if cleaned from expiry lists. */
@@ -155,6 +156,7 @@ struct conntrack {
 struct ovs_mutex ct_lock; /* Protects 2 following fields. */
 struct cmap conns OVS_GUARDED;
 struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
+struct hmap zone_limits OVS_GUARDED;
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
 struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */
diff --git a/lib/conntrack.c b/lib/conntrack.c
index df7b9fa..33d540a 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -76,6 +76,11 @@ enum ct_alg_ctl_type {
 CT_ALG_CTL_SIP,
 };
 
+struct zone_limit {
+struct hmap_node node;
+struct conntrack_zone_limit czl;
+};
+
 static bool conn_key_extract(struct conntrack *, struct dp_packet *,
  ovs_be16 dl_type, struct conn_lookup_ctx *,
  uint16_t zone);
@@ -305,6 +310,7 @@ conntrack_init(void)
 for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) {
 ovs_list_init(&ct->exp_lists[i]);
 }
+hmap_init(&ct->zone_limits);
 ovs_mutex_unlock(&ct->ct_lock);
 
 ct->hash_basis = random_uint32();
@@ -318,6 +324,110 @@ conntrack_init(void)
 return ct;
 }
 
+static uint32_t
+zone_key_hash(int32_t zone, uint32_t basis)
+{
+size_t hash = hash_int((OVS_FORCE uint32_t) zone, basis);
+return hash;
+}
+
+static struct zone_limit *
+zone_limit_lookup(struct conntrack *ct, int32_t zone)
+OVS_REQUIRES(ct->ct_lock)
+{
+uint32_t hash = zone_key_hash(zone, ct->hash_basis);
+struct zone_limit *zl;
+HMAP_FOR_EACH_IN_BUCKET (zl, node, hash, &ct->zone_limits) {
+if (zl->czl.zone == zone) {
+return zl;
+}
+}
+return NULL;
+}
+
+static struct zone_limit *
+zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone)
+OVS_REQUIRES(ct->ct_lock)
+{
+struct zone_limit *zl = zone_limit_lookup(ct, zone);
+return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE);
+}
+
+struct conntrack_zone_limit
+zone_limit_get(struct conntrack *ct, int32_t zone)
+{
+ovs_mutex_lock(&ct->ct_lock);

Re: [ovs-dev] [patch v1] conntrack: Support zone limits.

2019-12-02 Thread Darrell Ball
Thanks for the review Ben

On Mon, Dec 2, 2019 at 12:19 PM Ben Pfaff  wrote:

> On Mon, Dec 02, 2019 at 11:41:27AM -0800, Darrell Ball wrote:
> > Signed-off-by: Darrell Ball 
>
> Thanks.  I'm glad to see this code growing closer to parity with the
> kernel implementation.  The implementation also looks pretty clean.
>
> I'm appending some style suggestions.  They should not change behavior.
>

those are all fine - thanks


>
> I do have one concern about correctness.  It looks to me that there is a
> separate lookup for the zone at the time that a connection is created
> (to increment the counter) and at the time it is destroyed (to decrement
> the counter).  I believe that this can lead to inconsistencies.  For
> example, suppose that initially a connection has no associated zone, but
> at time of destruction a zone does exist.  In that case, I believe that
> a counter would get decremented that was never incremented.


Good catch!
I agree; I added a conn 'admit zone' to keep track of the zone whose limit
was charged at creation time.
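
A minimal sketch of the admit-zone idea discussed above, under stated
assumptions: the connection records which zone limit (if any) was charged
when it was created, and cleanup decrements only that recorded zone instead
of doing a fresh lookup. All 'sketch_' names and the fixed-size table are
hypothetical; the actual patch uses an hmap and also handles the default
zone, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NO_ZONE (-1)   /* Hypothetical marker: no limit was charged. */
#define SKETCH_N_ZONES 4      /* Tiny table, for illustration only. */

struct sketch_zone_limit {
    int configured;           /* Nonzero if a limit exists for this zone. */
    uint32_t limit;           /* Maximum number of connections. */
    uint32_t count;           /* Connections currently charged. */
};

struct sketch_conn {
    int32_t zone;             /* Zone from the connection key. */
    int32_t admit_zone;       /* Zone charged at creation, or SKETCH_NO_ZONE. */
};

/* Charges 'conn' to its zone limit, if one exists right now, and remembers
 * which zone was charged.  Returns 0 on success, -1 if the zone is full. */
static int
sketch_conn_admit(struct sketch_zone_limit *zl, struct sketch_conn *conn)
{
    assert(conn->zone >= 0 && conn->zone < SKETCH_N_ZONES);
    struct sketch_zone_limit *z = &zl[conn->zone];

    conn->admit_zone = SKETCH_NO_ZONE;
    if (z->configured) {
        if (z->count >= z->limit) {
            return -1;
        }
        z->count++;
        conn->admit_zone = conn->zone;
    }
    return 0;
}

/* Undoes only what sketch_conn_admit() did, using the recorded zone, so a
 * limit configured after creation is never decremented by mistake. */
static void
sketch_conn_clean(struct sketch_zone_limit *zl, struct sketch_conn *conn)
{
    if (conn->admit_zone != SKETCH_NO_ZONE) {
        struct sketch_zone_limit *z = &zl[conn->admit_zone];

        assert(z->count > 0);
        z->count--;
        conn->admit_zone = SKETCH_NO_ZONE;
    }
}
```

In the scenario Ben describes (no limit at creation, a limit present at
destruction), sketch_conn_clean() leaves the counter untouched because
'admit_zone' was never set.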


> I think
> that there are similar potential issues related to finding a specific
> zone versus the default zone.
>

IIUC, in the other places where a zone is looked up, we specify explicitly
whether we are looking for the default zone or another zone. If I missed
your meaning, please comment on this aspect in v2. Thanks


>
> -8<--cut here-->8--
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 59e1c51c0389..c90da2b4e32e 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -78,9 +78,7 @@ enum ct_alg_ctl_type {
>
>  struct zone_limit {
>  struct hmap_node node;
> -int32_t zone;
> -uint32_t limit;
> -uint32_t count;
> +struct conntrack_zone_limit czl;
>

might as well use the struct created


>  };
>
>  static bool conn_key_extract(struct conntrack *, struct dp_packet *,
> @@ -339,31 +337,30 @@ zone_limit_lookup(struct conntrack *ct, int32_t zone)
>  {
>  uint32_t hash = zone_key_hash(zone, ct->hash_basis);
>  struct zone_limit *zl;
> -HMAP_FOR_EACH_WITH_HASH (zl, node, hash, &ct->zone_limits) {
> -if (zl->zone == zone) {
> +HMAP_FOR_EACH_IN_BUCKET (zl, node, hash, &ct->zone_limits) {
>

sure, slightly faster


> +if (zl->czl.zone == zone) {
>  return zl;
>  }
>  }
>  return NULL;
>  }
>
> +static struct zone_limit *
> +zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone)
> +OVS_REQUIRES(ct->ct_lock)
> +{
> +struct zone_limit *zl = zone_limit_lookup(ct, zone);
> +return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE);
> +}
> +
>

this is useful


>  struct conntrack_zone_limit
>  zone_limit_get(struct conntrack *ct, int32_t zone)
>  {
>  struct conntrack_zone_limit czl = {INVALID_ZONE, 0, 0};
>  ovs_mutex_lock(&ct->ct_lock);
> -struct zone_limit *zl = zone_limit_lookup(ct, zone);
> +struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone);
>  if (zl) {
> -czl.zone = zl->zone;
> -czl.limit = zl->limit;
> -czl.count = zl->count;
> -} else {
> -zl = zone_limit_lookup(ct, DEFAULT_ZONE);
> -if (zl) {
> -czl.zone = zl->zone;
> -czl.limit = zl->limit;
> -czl.count = zl->count;
> -}
> +czl = zl->czl;
>  }
>  ovs_mutex_unlock(&ct->ct_lock);
>  return czl;
> @@ -375,8 +372,8 @@ zone_limit_create(struct conntrack *ct, int32_t zone,
> uint32_t limit)
>  {
>  if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) {
>  struct zone_limit *zl = xzalloc(sizeof *zl);
> -zl->limit = limit;
> -zl->zone = zone;
> +zl->czl.limit = limit;
> +zl->czl.zone = zone;
>  uint32_t hash = zone_key_hash(zone, ct->hash_basis);
>  hmap_insert(&ct->zone_limits, &zl->node, hash);
>  return 0;
> @@ -392,7 +389,7 @@ zone_limit_update(struct conntrack *ct, int32_t zone,
> uint32_t limit)
>  ovs_mutex_lock(&ct->ct_lock);
>  struct zone_limit *zl = zone_limit_lookup(ct, zone);
>  if (zl) {
> -zl->limit = limit;
> +zl->czl.limit = limit;
>  VLOG_INFO("Changed zone limit of %u for zone %d", limit, zone);
>  } else {
>  err = zone_limit_create(ct, zone, limit);
> @@ -444,7 +441,7 @@ conn_clean_cmn(struct conntrack *ct, struct conn *conn)
>
>  struct zone_limit *zl = zone_limit_lookup(ct, conn->key.zone);
>  if (zl) {
> -zl->count--;
> +zl->czl.count--;
>  }
>  }
>
> @@ -98

[ovs-dev] [patch v1] conntrack: Support zone limits.

2019-12-02 Thread Darrell Ball
Signed-off-by: Darrell Ball 
---
 Documentation/faq/releases.rst   |   4 +-
 NEWS |   1 +
 lib/conntrack-private.h  |   1 +
 lib/conntrack.c  | 142 +++
 lib/conntrack.h  |  17 +
 lib/dpif-netdev.c|  88 +++-
 tests/system-kmod-macros.at  |   7 --
 tests/system-traffic.at  |   1 -
 tests/system-userspace-macros.at |   9 ---
 9 files changed, 248 insertions(+), 22 deletions(-)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index e02dda1..6702c58 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -116,9 +116,9 @@ Q: Are all features available with all datapaths?
 Feature                    Linux upstream Linux OVS tree Userspace Hyper-V
 ========================== ============== ============== ========= =======
 Connection tracking             4.3            2.5          2.6      YES
-Conntrack Fragment Reass.       4.3            2.6          2.10     YES
+Conntrack Fragment Reass.       4.3            2.6          2.12     YES
 Conntrack Timeout Policies      5.2            2.12         NO       NO
-Conntrack Zone Limit            4.18           2.10         NO       YES
+Conntrack Zone Limit            4.18           2.10         2.13     YES
 Conntrack NAT                   4.6            2.6          2.8      YES
 Tunnel - LISP                   NO             2.11         NO       NO
 Tunnel - STT                    NO             2.4          NO       YES
diff --git a/NEWS b/NEWS
index 80b..17f92ba 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,7 @@ Post-v2.12.0
- Userspace datapath:
  * Add option to enable, disable and query TCP sequence checking in
conntrack.
+ * Add support for conntrack zone limits.
- AF_XDP:
  * New option 'use-need-wakeup' for netdev-afxdp to control enabling
of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by default
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 590f139..22823cb 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -155,6 +155,7 @@ struct conntrack {
 struct ovs_mutex ct_lock; /* Protects 2 following fields. */
 struct cmap conns OVS_GUARDED;
 struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
+struct hmap zone_limits OVS_GUARDED;
 uint32_t hash_basis; /* Salt for hashing a connection key. */
 pthread_t clean_thread; /* Periodically cleans up connection tracker. */
 struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */
diff --git a/lib/conntrack.c b/lib/conntrack.c
index df7b9fa..59e1c51 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -76,6 +76,13 @@ enum ct_alg_ctl_type {
 CT_ALG_CTL_SIP,
 };
 
+struct zone_limit {
+struct hmap_node node;
+int32_t zone;
+uint32_t limit;
+uint32_t count;
+};
+
 static bool conn_key_extract(struct conntrack *, struct dp_packet *,
  ovs_be16 dl_type, struct conn_lookup_ctx *,
  uint16_t zone);
@@ -305,6 +312,7 @@ conntrack_init(void)
 for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) {
 ovs_list_init(&ct->exp_lists[i]);
 }
+hmap_init(&ct->zone_limits);
 ovs_mutex_unlock(&ct->ct_lock);
 
 ct->hash_basis = random_uint32();
@@ -318,6 +326,111 @@ conntrack_init(void)
 return ct;
 }
 
+static uint32_t
+zone_key_hash(int32_t zone, uint32_t basis)
+{
+size_t hash = hash_int((OVS_FORCE uint32_t) zone, basis);
+return hash;
+}
+
+static struct zone_limit *
+zone_limit_lookup(struct conntrack *ct, int32_t zone)
+OVS_REQUIRES(ct->ct_lock)
+{
+uint32_t hash = zone_key_hash(zone, ct->hash_basis);
+struct zone_limit *zl;
+HMAP_FOR_EACH_WITH_HASH (zl, node, hash, &ct->zone_limits) {
+if (zl->zone == zone) {
+return zl;
+}
+}
+return NULL;
+}
+
+struct conntrack_zone_limit
+zone_limit_get(struct conntrack *ct, int32_t zone)
+{
+struct conntrack_zone_limit czl = {INVALID_ZONE, 0, 0};
+ovs_mutex_lock(&ct->ct_lock);
+struct zone_limit *zl = zone_limit_lookup(ct, zone);
+if (zl) {
+czl.zone = zl->zone;
+czl.limit = zl->limit;
+czl.count = zl->count;
+} else {
+zl = zone_limit_lookup(ct, DEFAULT_ZONE);
+if (zl) {
+czl.zone = zl->zone;
+czl.limit = zl->limit;
+czl.count = zl->count;
+}
+}
+ovs_mutex_unlock(&ct->ct_lock);
+return czl;
+}
+
+static int
+zone_limit_create(struct conntrack *ct, int32_t zone, uint32_t limit)
+OVS_REQUIRES(ct->ct_lock)
+{
+if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) {
+struct zone_limit *zl = xzalloc(sizeof *zl);
+zl->limit = limit;
+zl->zone = zone;
+

Re: [ovs-dev] ipf question

2019-11-29 Thread Darrell Ball
>
>
> -- Forwarded message -
> From: Li,Rongqing 
> Date: Thu, Nov 28, 2019 at 12:17 AM
> Subject: Re: ipf question
> To: Darrell Ball 
> Cc: ovs dev 
>
>
> It does not work for me
>
>
>
> Normally it should look like below
>
>
>
>
> But when the size is 5,
>
>
>
>
> 50% is lost
>
>
>
>
>
>
> Thanks
>
>
>
> -RongQing
>

Hi Li,Rongqing

This is what I see

ping -s 5 -q -c 20 -i 0.1 -w 2

20 packets transmitted, 17 received, 15% packet loss, time 0ms

Note that the fragments of a given packet do not all arrive at OVS in
sequence; they are interleaved with fragments of other packets. Hence the
fragments of more than one packet are still waiting to be sent back out
from ipf when the test ends, so this result is about what I expect.

A couple of things to watch for in your testing:

1/ Make sure you are not filling up the maximum number of fragments stored
(a limit that exists for DoS protection). Hitting it might lead to long RTTs
until the fragments are flushed out; otherwise you have something else going
on in reassembly in the src or dest VMs.
i.e. check
ovs-appctl dpctl/ipf-get-status
these fields:
max num frags (v4/v6):
num frag:

2/ Watch for datapath flows being revalidated out due to inactivity and
set up again; packets for a stream are not normally spaced 1 sec apart, so
revalidation may kick in; use a short '-i'; see above.
New flows will change the recirc id and we are now doing strict checking
for resume flows.

Thanks Darrell




>
>
> *From:* Darrell Ball [mailto:dlu...@gmail.com]
> *Sent:* 2019-11-26 10:38
> *To:* Li,Rongqing 
> *Cc:* ovs dev 
> *Subject:* Re: ipf question
>
>
>
> Thanks Li,Rongqing
>
>
>
> On Mon, Nov 18, 2019 at 9:35 PM Li,Rongqing  wrote:
>
> Thanks, Darrell
>
>
>
> I can try to test it
>
>
>
> Thanks! Can you try this 2-patch series?
>
>
>
> dball@ubuntu:~/openvswitch/ovs$ cat
> outgoing2/0001-dp-packet-Cache-batch-action-list-in-batch.patch
> From 80436fb57a54ecfd532e99086dbb0e5142415070 Mon Sep 17 00:00:00 2001
> From: Darrell Ball 
> Date: Sun, 24 Nov 2019 17:29:07 -0800
> Subject: [patch v1 1/2] dp-packet: Cache batch action list in batch.
> To: d...@openvswitch.org
>
> Cache the batch action list in the batch itself. This will be
> used in a subsequent patch.
>
> Signed-off-by: Darrell Ball 
> ---
>  lib/dp-packet.h | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 14f0897..77df801 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -775,9 +775,11 @@ enum { NETDEV_MAX_BURST = 32 }; /* Maximum number
> packets in a batch. */
>
>  struct dp_packet_batch {
>  size_t count;
> +size_t actions_len;
> +const struct nlattr *actions;
> +struct dp_packet *packets[NETDEV_MAX_BURST];
>  bool trunc; /* true if the batch needs truncate. */
>  bool do_not_steal; /* Indicate that the packets should not be stolen.
> */
> -struct dp_packet *packets[NETDEV_MAX_BURST];
>  };
>
>  static inline void
> @@ -786,6 +788,8 @@ dp_packet_batch_init(struct dp_packet_batch *batch)
>  batch->count = 0;
>  batch->trunc = false;
>  batch->do_not_steal = false;
> +batch->actions = NULL;
> +batch->actions_len = 0;
>  }
>
>  static inline void
> @@ -930,6 +934,27 @@ dp_packet_batch_reset_cutlen(struct dp_packet_batch
> *batch)
>  }
>  }
>
> +static inline void
> +dp_packet_batch_set_action_ctx(struct dp_packet_batch *batch,
> +   const struct nlattr *actions,
> +   size_t actions_len)
> +{
> +batch->actions = actions;
> +batch->actions_len = actions_len;
> +}
> +
> +static inline const struct nlattr *
> +dp_packet_batch_get_actions(struct dp_packet_batch *batch)
> +{
> +return batch->actions;
> +}
> +
> +static inline size_t
> +dp_packet_batch_get_action_len(struct dp_packet_batch *batch)
> +{
> +return batch->actions_len;
> +}
> +
>  #ifdef  __cplusplus
>  }
>  #endif
> --
> 1.9.1
>
>
>
> dball@ubuntu:~/openvswitch/ovs$ cat
> outgoing2/0002-ipf-Resume-fragments-in-same-action-list.patch
> From 4aefd4e791f9d7b6ef916e41136cb59e6020bb22 Mon Sep 17 00:00:00 2001
> From: Darrell Ball 
> Date: Sun, 24 Nov 2019 17:33:58 -0800
> Subject: [patch v1 2/2] ipf: Resume fragments in same action list.
> To: d...@openvswitch.org
>
> Once fragments are reassembled and go through conntrack, the fragments
> need to resume processing in the same action list.  There is a
> semantic requirement that the list have, at most, one conntrack action,
> so this specifies the 

[ovs-dev] [patch v1] faq: Correct fragment reassembly release.

2019-11-25 Thread Darrell Ball
Correct fragment reassembly release for the userspace datapath.

Signed-off-by: Darrell Ball 
---
 Documentation/faq/releases.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 4152345..3b2da08 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -116,7 +116,7 @@ Q: Are all features available with all datapaths?
 Feature                    Linux upstream Linux OVS tree Userspace Hyper-V
 ========================== ============== ============== ========= =======
 Connection tracking             4.3            2.5          2.6      YES
-Conntrack Fragment Reass.       4.3            2.6          2.10     YES
+Conntrack Fragment Reass.       4.3            2.6          2.12     YES
 Conntrack Timeout Policies      5.2            2.12         NO       NO
 Conntrack Zone Limit            4.18           2.10         NO       YES
 Conntrack NAT                   4.6            2.6          2.8      YES
-- 
1.9.1



Re: [ovs-dev] ipf question

2019-11-25 Thread Darrell Ball
Thanks Li,Rongqing

On Mon, Nov 18, 2019 at 9:35 PM Li,Rongqing  wrote:

> Thanks, Darrell
>
>
>
> I can try to test it
>

Thanks! Can you try this 2-patch series?

dball@ubuntu:~/openvswitch/ovs$ cat
outgoing2/0001-dp-packet-Cache-batch-action-list-in-batch.patch
From 80436fb57a54ecfd532e99086dbb0e5142415070 Mon Sep 17 00:00:00 2001
From: Darrell Ball 
Date: Sun, 24 Nov 2019 17:29:07 -0800
Subject: [patch v1 1/2] dp-packet: Cache batch action list in batch.
To: d...@openvswitch.org

Cache the batch action list in the batch itself. This will be
used in a subsequent patch.

Signed-off-by: Darrell Ball 
---
 lib/dp-packet.h | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 14f0897..77df801 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -775,9 +775,11 @@ enum { NETDEV_MAX_BURST = 32 }; /* Maximum number
packets in a batch. */

 struct dp_packet_batch {
 size_t count;
+size_t actions_len;
+const struct nlattr *actions;
+struct dp_packet *packets[NETDEV_MAX_BURST];
 bool trunc; /* true if the batch needs truncate. */
 bool do_not_steal; /* Indicate that the packets should not be stolen.
*/
-struct dp_packet *packets[NETDEV_MAX_BURST];
 };

 static inline void
@@ -786,6 +788,8 @@ dp_packet_batch_init(struct dp_packet_batch *batch)
 batch->count = 0;
 batch->trunc = false;
 batch->do_not_steal = false;
+batch->actions = NULL;
+batch->actions_len = 0;
 }

 static inline void
@@ -930,6 +934,27 @@ dp_packet_batch_reset_cutlen(struct dp_packet_batch
*batch)
 }
 }

+static inline void
+dp_packet_batch_set_action_ctx(struct dp_packet_batch *batch,
+   const struct nlattr *actions,
+   size_t actions_len)
+{
+batch->actions = actions;
+batch->actions_len = actions_len;
+}
+
+static inline const struct nlattr *
+dp_packet_batch_get_actions(struct dp_packet_batch *batch)
+{
+return batch->actions;
+}
+
+static inline size_t
+dp_packet_batch_get_action_len(struct dp_packet_batch *batch)
+{
+return batch->actions_len;
+}
+
 #ifdef  __cplusplus
 }
 #endif
-- 
1.9.1

dball@ubuntu:~/openvswitch/ovs$ cat
outgoing2/0002-ipf-Resume-fragments-in-same-action-list.patch
From 4aefd4e791f9d7b6ef916e41136cb59e6020bb22 Mon Sep 17 00:00:00 2001
From: Darrell Ball 
Date: Sun, 24 Nov 2019 17:33:58 -0800
Subject: [patch v1 2/2] ipf: Resume fragments in same action list.
To: d...@openvswitch.org

Once fragments are reassembled and go through conntrack, the fragments
need to resume processing in the same action list.  There is a
semantic requirement that the list have, at most, one conntrack action,
so this specifies the resume point in the list as well.  A memcmp is
used to compare pre and post processing lists as padding is zeroed out,
therefore having predictable values.

Signed-off-by: Darrell Ball 
---
 lib/dpif-netdev.c |  1 +
 lib/ipf.c | 49 -
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5142bad..3ddca5b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7363,6 +7363,7 @@ dp_netdev_execute_actions(struct dp_netdev_pmd_thread
*pmd,
 {
 struct dp_netdev_execute_aux aux = { pmd, flow };

+dp_packet_batch_set_action_ctx(packets, actions, actions_len);
 odp_execute_actions(&aux, packets, should_steal, actions,
 actions_len, dp_execute_cb);
 }
diff --git a/lib/ipf.c b/lib/ipf.c
index 45c4891..ad82620 100644
--- a/lib/ipf.c
+++ b/lib/ipf.c
@@ -91,6 +91,8 @@ union ipf_addr {
 /* Represents a single fragment; part of a list of fragments. */
 struct ipf_frag {
 struct dp_packet *pkt;
+struct nlattr *actions;
+size_t actions_len;
 uint16_t start_data_byte;
 uint16_t end_data_byte;
 bool dnsteal; /* 'do not steal': if true, ipf should not free packet.
*/
@@ -261,7 +263,12 @@ ipf_list_clean(struct hmap *frag_lists,
 {
 ovs_list_remove(&ipf_list->list_node);
 hmap_remove(frag_lists, &ipf_list->node);
-free(ipf_list->frag_list);
+struct ipf_frag *frag_list = ipf_list->frag_list;
+ovs_assert(frag_list);
+for (int i = 0; i <= ipf_list->last_inuse_idx; i++) {
+free(frag_list[i].actions);
+}
+free(frag_list);
 free(ipf_list);
 }

@@ -793,7 +800,7 @@ static bool
 ipf_process_frag(struct ipf *ipf, struct ipf_list *ipf_list,
  struct dp_packet *pkt, uint16_t start_data_byte,
  uint16_t end_data_byte, bool ff, bool lf, bool v6,
- bool dnsteal)
+ struct dp_packet_batch *pb)
 OVS_REQUIRES(ipf->ipf_lock)
 {
 bool duped_frag = ipf_is_frag_duped(ipf_list->frag_list,
@@ -811,7 +818,11 @@ ipf_process_frag(struct ipf *ipf, struct ipf_list
*ipf_list,
 frag->pkt = pkt;
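
The idea described in the two commit messages above (remember the action
list a fragment arrived with, then compare it byte-for-byte with the list
that is active later) can be sketched in isolation. This is only a minimal
sketch: `struct frag_ctx` and the `frag_ctx_*` helpers are hypothetical names
that do not appear in the patches, and the action list is treated as an
opaque byte buffer rather than as `struct nlattr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-fragment context kept by ipf. */
struct frag_ctx {
    void *actions;       /* Private copy of the action list bytes. */
    size_t actions_len;  /* Length of that copy, in bytes. */
};

/* Saves a private copy of the action list that was active when the
 * fragment was received. */
static void
frag_ctx_save(struct frag_ctx *ctx, const void *actions, size_t actions_len)
{
    ctx->actions = NULL;
    ctx->actions_len = 0;
    if (actions && actions_len) {
        ctx->actions = malloc(actions_len);
        assert(ctx->actions);
        memcpy(ctx->actions, actions, actions_len);
        ctx->actions_len = actions_len;
    }
}

/* Returns true if 'actions' is byte-identical to the saved list.  A plain
 * memcmp() is only meaningful if padding bytes are zeroed, as the commit
 * message states. */
static bool
frag_ctx_matches(const struct frag_ctx *ctx, const void *actions,
                 size_t actions_len)
{
    return ctx->actions_len == actions_len
           && (actions_len == 0
               || !memcmp(ctx->actions, actions, actions_len));
}

/* Releases the saved copy. */
static void
frag_ctx_free(struct frag_ctx *ctx)
{
    free(ctx->actions);
    ctx->actions = NULL;
    ctx->actions_len = 0;
}
```

Keeping a copy rather than a pointer means the comparison stays valid even
if the original action buffer is freed before the fragment resumes, which
appears to be why the patch frees `frag_list[i].actions` on cleanup.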

Re: [ovs-dev] ipf question

2019-11-18 Thread Darrell Ball
Thanks; I had a look and I noticed ipf does not keep all the context it
needs to properly resume fragment processing in the general case. I have a
potential fix, but won't get to it this week.


On Sun, Nov 17, 2019 at 11:08 PM Darrell Ball  wrote:

>
>
> On Fri, Nov 15, 2019 at 6:03 PM Li,Rongqing  wrote:
>
>> From: Darrell Ball
>> Sent: November 15, 2019 22:58
>> To: Li,Rongqing
>> Cc: ovs dev
>> Subject: Re: ipf question
>>
>>
>> >Let me paraphrase, just to confirm we are on the same page.
>> >IIUC, for example, in the case of a 33 fragment packet, in the first
>> >pass all 33 fragments enter ipf, are reassembled, pass thru conntrack,
>> >and then the frags are sent out, while in the second pass only 32
>> >fragments enter conntrack/ipf and the fragment at index 32 is forwarded
>> >out without going thru conntrack/ipf ?
>>
>> True.
>> The second pass is recirculation, and the fragment at index 32 does not
>> enter conntrack/ipf; it is sent out to the VM directly.
>>
>
> can you check what rule is being hit by that fragment packet vs others and
> then compare the pkt metadata
>
>
>> if  I change  NETDEV_MAX_BURST to 64, it works
>>
>
> good test
>
>
>>
>> thanks
>>
>> -RongQing
>>
>>
>>
>>
>>
>
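
The numbers in this report line up with the batch size limit. As a rough
sketch, assuming packets are carried in batches capped at NETDEV_MAX_BURST
(an assumption used here only to illustrate the arithmetic, not a confirmed
root cause), a 33 fragment packet spills into a second batch, and the
fragment at index 32 is the only one in it. The helper names below are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_BURST = 32 };  /* Mirrors NETDEV_MAX_BURST from the thread. */

/* Number of batches needed to carry 'n_pkts' packets when each batch holds
 * at most 'max_burst' packets (ceiling division). */
static size_t
n_batches(size_t n_pkts, size_t max_burst)
{
    assert(max_burst > 0);
    return (n_pkts + max_burst - 1) / max_burst;
}

/* Zero-based index of the batch that carries packet 'pkt_idx'. */
static size_t
batch_of(size_t pkt_idx, size_t max_burst)
{
    assert(max_burst > 0);
    return pkt_idx / max_burst;
}
```

With a cap of 32, fragments 0 through 31 share batch 0 and fragment 32 is
alone in batch 1; with a cap of 64 all 33 fragments fit in one batch, which
is consistent with the observation that raising NETDEV_MAX_BURST to 64 hides
the problem.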


Re: [ovs-dev] ipf question

2019-11-17 Thread Darrell Ball
On Fri, Nov 15, 2019 at 6:03 PM Li,Rongqing  wrote:

> From: Darrell Ball
> Sent: November 15, 2019 22:58
> To: Li,Rongqing
> Cc: ovs dev
> Subject: Re: ipf question
>
>
> >Let me paraphrase, just to confirm we are on the same page.
> >IIUC, for example, in the case of a 33 fragment packet, in the first
> >pass all 33 fragments enter ipf, are reassembled, pass thru conntrack,
> >and then the frags are sent out, while in the second pass only 32
> >fragments enter conntrack/ipf and the fragment at index 32 is forwarded
> >out without going thru conntrack/ipf ?
>
> True.
> The second pass is recirculation, and the fragment at index 32 does not
> enter conntrack/ipf; it is sent out to the VM directly.
>

can you check what rule is being hit by that fragment packet vs others and
then compare the pkt metadata


> if  I change  NETDEV_MAX_BURST to 64, it works
>

good test


>
> thanks
>
> -RongQing
>
>
>
>
>


Re: [ovs-dev] [PATCH] ipf: bail out when ipf state is COMPLETED

2019-11-14 Thread Darrell Ball
Hi RongQing/Wang

On Thu, Nov 14, 2019 at 1:26 AM Li RongQing  wrote:

> It is easy to crash OVS when a packet with the same ID
> hits a list that has already been completely reassembled
> but has not been sent out yet, and this packet is
> not a duplicate of any fragment in that ipf list because
> it has a bigger offset.
>
> 1  0x7f9fef0ae2d9 in __GI_abort () at abort.c:89
>

Good DOS test.
Fix is correct.

This needs a 'Fixes' tag.
Fixes: 4ea96698f667 ("Userspace datapath: Add fragmentation handling.")

This will need to be backported to 2.10.

Thanks Darrell

> 2  0x00464042 in ipf_list_state_transition at lib/ipf.c:545
>
> Co-authored-by: Wang Li 
> Signed-off-by: Wang Li 
> Signed-off-by: Li RongQing 
> ---
>  lib/ipf.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/ipf.c b/lib/ipf.c
> index 4cc0f2df6..45c489122 100644
> --- a/lib/ipf.c
> +++ b/lib/ipf.c
> @@ -899,7 +899,8 @@ ipf_handle_frag(struct ipf *ipf, struct dp_packet
> *pkt, ovs_be16 dl_type,
>MIN(max_frag_list_size,
> IPF_FRAG_LIST_MIN_INCREMENT));
>  hmap_insert(&ipf->frag_lists, &ipf_list->node, hash);
>  ipf_expiry_list_add(&ipf->frag_exp_list, ipf_list, now);
> -} else if (ipf_list->state == IPF_LIST_STATE_REASS_FAIL) {
> +} else if (ipf_list->state == IPF_LIST_STATE_REASS_FAIL ||
> +   ipf_list->state == IPF_LIST_STATE_COMPLETED) {
>  /* Bail out as early as possible. */
>  return false;
>  } else if (ipf_list->last_inuse_idx + 1 >= ipf_list->size) {
> --
> 2.16.2
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
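
The effect of the fix can be shown with a small state check. This is a
simplified sketch, not the code in lib/ipf.c: the enum below is hypothetical,
and only the REASS_FAIL and COMPLETED members mirror state names that appear
in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified list states.  IN_PROGRESS is a hypothetical catch-all for every
 * state in which reassembly is still under way. */
enum frag_list_state {
    FRAG_LIST_IN_PROGRESS,
    FRAG_LIST_REASS_FAIL,
    FRAG_LIST_COMPLETED,
};

/* Returns true if a newly arrived fragment may still be added to a list in
 * 'state'.  Failed lists and lists that are already completely reassembled
 * both reject further fragments, so the caller can bail out early. */
static bool
frag_list_accepts_fragment(enum frag_list_state state)
{
    switch (state) {
    case FRAG_LIST_REASS_FAIL:
    case FRAG_LIST_COMPLETED:
        return false;
    case FRAG_LIST_IN_PROGRESS:
        return true;
    }
    assert(0 && "unknown fragment list state");
    return false;
}
```

Before the patch only the REASS_FAIL case was rejected, so a late fragment
with a larger offset could reach the state transition code for a list that
was already completed.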


Re: [ovs-dev] [PATCH] conntrack: Fix tcp payload length in case multi-segments.

2019-11-11 Thread Darrell Ball
Hi Zhike

Thanks for clarifying.
There is presently no support for multi-segment mbufs in OVS, so the patch
would not be needed; in general, patches are only proposed at the time they
are relevant.

Also note that in the hypothetical case of multi-segment mbufs, the packet
would be linearized at this point.

Also note that this patch still relies on pointers.

As Ilya already mentioned, there is no need to use multi-segment mbufs in
OVS, as large buffers can be used instead.

On Sun, Nov 10, 2019 at 9:50 PM 王志克  wrote:

> Hi Darrell,
>
> In the TSO case, the packet may use a multi-segment mbuf, and I do not
> think we need to linearize it. In this case, we can NOT use pointers to
> calculate the TCP length.
>
> Br,
>
> Zhike Wang
> JDCloud, Product Development, IaaS
>
>
>
> From: Darrell Ball [mailto:dlu...@gmail.com]
> Sent: Saturday, November 09, 2019 8:12 AM
> To: Zhike Wang
> Cc: ovs dev; 王志克
> Subject: Re: [ovs-dev] [PATCH] conntrack: Fix tcp payload length in case
> multi-segments.
>
> Thanks for the patch
>
> Would you mind describing the use case that this patch is aiming to
> support ?
>
> On Fri, Nov 8, 2019 at 1:23 AM Zhike Wang  wrote:
> Signed-off-by: Zhike Wang 
> ---
>  lib/conntrack-private.h | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index 590f139..1d21f6e 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -233,13 +233,17 @@ conn_update_expiration(struct conntrack *ct, struct
> conn *conn,
>  static inline uint32_t
>  tcp_payload_length(struct dp_packet *pkt)
>  {
> -const char *tcp_payload = dp_packet_get_tcp_payload(pkt);
> -if (tcp_payload) {
> -return ((char *) dp_packet_tail(pkt) - dp_packet_l2_pad_size(pkt)
> -- tcp_payload);
> -} else {
> -return 0;
> +size_t l4_size = dp_packet_l4_size(pkt);
> +
> +if (OVS_LIKELY(l4_size >= TCP_HEADER_LEN)) {
> +struct tcp_header *tcp = dp_packet_l4(pkt);
> +int tcp_len = TCP_OFFSET(tcp->tcp_ctl) * 4;
> +
> +if (OVS_LIKELY(tcp_len >= TCP_HEADER_LEN && tcp_len <= l4_size)) {
> +return (l4_size - tcp_len);
> +}
>
> Maybe I missed something, but it looks like the same calculation is
> arrived at.
>
>  }
> +return 0;
>  }
>
>  #endif /* conntrack-private.h */
> --
> 1.8.3.1
>
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
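
The remark that both versions arrive at the same calculation can be checked
on a linear packet. The sketch below assumes a single contiguous buffer and
uses a hypothetical `struct pkt_view` in place of `struct dp_packet`; it
also assumes both variants apply the same header validation. It does not
model multi-segment buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TCP_HDR_MIN = 20 };  /* Minimum TCP header length, in bytes. */

/* Hypothetical flat view of a linear packet. */
struct pkt_view {
    const uint8_t *base;  /* Start of the packet buffer. */
    size_t size;          /* Total bytes, including trailing L2 padding. */
    size_t l4_ofs;        /* Offset of the TCP header from 'base'. */
    size_t l2_pad;        /* Number of trailing L2 pad bytes. */
};

/* Bytes from the start of the TCP header to the end of the real data. */
static size_t
l4_size(const struct pkt_view *p)
{
    return p->size - p->l2_pad - p->l4_ofs;
}

/* TCP header length from the data offset field (high nibble of byte 12). */
static size_t
tcp_hdr_len(const struct pkt_view *p)
{
    return (size_t) (p->base[p->l4_ofs + 12] >> 4) * 4;
}

/* Pointer style: tail minus padding minus start of payload. */
static size_t
payload_len_ptr(const struct pkt_view *p)
{
    size_t l4 = l4_size(p);
    if (l4 < TCP_HDR_MIN) {
        return 0;
    }
    size_t hdr = tcp_hdr_len(p);
    if (hdr < TCP_HDR_MIN || hdr > l4) {
        return 0;
    }
    const uint8_t *payload = p->base + p->l4_ofs + hdr;
    const uint8_t *tail = p->base + p->size;
    return (size_t) (tail - p->l2_pad - payload);
}

/* Size style: L4 size minus TCP header length. */
static size_t
payload_len_size(const struct pkt_view *p)
{
    size_t l4 = l4_size(p);
    if (l4 < TCP_HDR_MIN) {
        return 0;
    }
    size_t hdr = tcp_hdr_len(p);
    if (hdr < TCP_HDR_MIN || hdr > l4) {
        return 0;
    }
    return l4 - hdr;
}
```

For a contiguous buffer the two functions are algebraically the same, so a
difference could only show up if the data were split across segments.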


Re: [ovs-dev] [PATCH] conntrack: Fix tcp payload length in case multi-segments.

2019-11-08 Thread Darrell Ball
Thanks for the patch

Would you mind describing the use case that this patch is aiming to support?

On Fri, Nov 8, 2019 at 1:23 AM Zhike Wang  wrote:

> Signed-off-by: Zhike Wang 
> ---
>  lib/conntrack-private.h | 16 ++--
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index 590f139..1d21f6e 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -233,13 +233,17 @@ conn_update_expiration(struct conntrack *ct, struct
> conn *conn,
>  static inline uint32_t
>  tcp_payload_length(struct dp_packet *pkt)
>  {
> -const char *tcp_payload = dp_packet_get_tcp_payload(pkt);
> -if (tcp_payload) {
> -return ((char *) dp_packet_tail(pkt) - dp_packet_l2_pad_size(pkt)
> -- tcp_payload);
> -} else {
> -return 0;
> +size_t l4_size = dp_packet_l4_size(pkt);
> +
> +if (OVS_LIKELY(l4_size >= TCP_HEADER_LEN)) {
> +struct tcp_header *tcp = dp_packet_l4(pkt);
> +int tcp_len = TCP_OFFSET(tcp->tcp_ctl) * 4;
> +
> +if (OVS_LIKELY(tcp_len >= TCP_HEADER_LEN && tcp_len <= l4_size)) {
> +return (l4_size - tcp_len);
> +}
>

Maybe I missed something, but it looks like the same calculation is arrived at.


>  }
> +return 0;
>  }
>
>  #endif /* conntrack-private.h */
> --
> 1.8.3.1
>
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


Re: [ovs-dev] can OVS conntrack support IP list like this: actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7, 220.0.0.123))?

2019-11-07 Thread Darrell Ball
On Thu, Nov 7, 2019 at 8:05 AM Darrell Ball  wrote:

>
>
> On Wed, Nov 6, 2019 at 5:01 PM Yi Yang (杨燚)-云服务集团 
> wrote:
>
>> Thanks Darrell, I didn’t receive your second reply, I saw it in
>> mail.openvswitch.org.
>>
>>
>>
>> “
>>
>> probably, you should give an example of what you mean by above
>>
>> I am not sure you are meaning to say that you want to specify an L4 port
>> in
>>
>> your
>>
>> snat action rule or not; you will want to use ephemeral ports by not
>>
>> specifying a
>>
>> specific port in most cases
>>
>> “
>>
>>
>>
>> For SNAT, we don’t specify port, just use default port range
>> “1024-65535”), but for internal source IPs, i.e. floating IPs, they are
>> discrete in most cases because some floating IPs needn’t access Internet,
>> for public IPs, so are they. For public IPs, maybe they are from different
>> telecom carriers, we prefer egress traffic can be distributed on several
>> BGP lines.
>>
>
>>
>>
>> table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=220.0.0.3,230.0.0.7,240.0.0.123))
>>
>>
>> table=0,ip,nw_src=172.18.0.80,…,actions=ct(commit,table=0,zone=1,nat(src=220.0.0.3,230.0.0.7,240.0.0.123))
>>
>>
>> table=0,ip,nw_src=172.19.0.23,…,actions=ct(commit,table=0,zone=1,nat(src=220.0.0.3,230.0.0.7,240.0.0.123))
>>
>>
>>
>> Ideally, we hope, for different traffic types from the same internal IP
>> (say 172.18.0.67), some can SNAT to 220.0.0.3, some can SNAT to 230.0.0.7,
>> some can SNAT to
>>
>> 240.0.0.123, that way, they can leverage total bandwidth of several BGP
>> lines.
>>
>>
>>
>> I know current OVS can’t support the above IP list for snat, but it is
>> indeed required in reality, I don’t understand why OVS can’t do in this
>> way, is it linux conntrack limitation or what else reason? I think it is
>> similar to IP range which can be supported.
>>
>
> Presently, the limitation is both at the OpenFlow layer and in implementation
> details at the datapath.
> The layer above (a controller or even a script) can do the mapping, taking
> into account the desired distribution.
> A controller can, and often will, do this and similar types of configuration
> specification.
>

btw, in terms of controller config support, Openflow Groups with select
type may help here


>
>
>>
>>
>>
>> *From:* Darrell Ball [mailto:dlu...@gmail.com]
>> *Sent:* November 6, 2019 9:38
>> *To:* Yi Yang (杨燚)-云服务集团
>> *Cc:* ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
>> *Subject:* Re: [ovs-dev] can OVS conntrack support IP list like this:
>> actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7,
>> 220.0.0.123))?
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Nov 5, 2019 at 4:32 PM Yi Yang (杨燚)-云服务集团 
>> wrote:
>>
>> Hi, folks
>>
>>
>>
>> We need to do SNAT for many internal IPs by just using several public IPs,
>> we also need to do DNAT by some other public IPs for exposing webservice,
>> openflow rules look like the below:
>>
>>
>>
>> table=0,ip,nw_src=172.17.0.0/16,
>> …,actions=ct(commit,table=0,zone=1,nat(src=
>> 220.0.0.3,220.0.0.7,220.0.0.123))
>>
>>
>> table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=22
>> 0.0.0.3,220.0.0.7,220.0.0.123))
>>
>>
>>
>> for snat, you can map some subset of private IPs to a given public IP and
>> so on
>>
>>
>>
>>
>>
>>
>> table=0,ip,tcp,nw_dst=220.0.0.11,tp_dst=80,…,actions=ct(commit,table=0,zone
>> =2,nat(dst=172.16.0.100:80))
>>
>> table=0,ip,tcp,nw_dst=220.0.0.11,
>> tp_dst=443,…,actions=ct(commit,table=0,zone=2,nat(dst=172.16.0.100:443))
>>
>>
>>
>> you are mapping 'to' private IPs, so you have control over the range
>>
>>
>>
>>
>>
>>
>>
>>
>> From ct document, it seems it can’t support IP list for nat, anybody knows
>> how we can handle such cases in some kind feasible way?
>>
>>
>>
>> In addition, is it ok if multiple openflow rules use the same NAT IP:PORT
>> combination? I’m not sure if it will result in some conflicts for SNAT,
>> because all of them need to do dynamic source port mapping, per my test,
>> it
>> seems this isn’t a problem.
>>
>>
>>
>> IIUC, as long as tuples are unique, it should be fine
>>
>>
>>
>>
>>
>>
>> Thank you all in advance and appreciate your help sincerely.
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>


Re: [ovs-dev] can OVS conntrack support IP list like this: actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7, 220.0.0.123))?

2019-11-07 Thread Darrell Ball
On Wed, Nov 6, 2019 at 5:01 PM Yi Yang (杨燚)-云服务集团 
wrote:

> Thanks Darrell, I didn’t receive your second reply, I saw it in
> mail.openvswitch.org.
>
>
>
> “
>
> probably, you should give an example of what you mean by above
>
> I am not sure you are meaning to say that you want to specify an L4 port in
>
> your
>
> snat action rule or not; you will want to use ephemeral ports by not
>
> specifying a
>
> specific port in most cases
>
> “
>
>
>
> For SNAT, we don’t specify port, just use default port range
> “1024-65535”), but for internal source IPs, i.e. floating IPs, they are
> discrete in most cases because some floating IPs needn’t access Internet,
> for public IPs, so are they. For public IPs, maybe they are from different
> telecom carriers, we prefer egress traffic can be distributed on several
> BGP lines.
>

>
>
> table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=220.0.0.3,230.0.0.7,240.0.0.123))
>
>
> table=0,ip,nw_src=172.18.0.80,…,actions=ct(commit,table=0,zone=1,nat(src=220.0.0.3,230.0.0.7,240.0.0.123))
>
>
> table=0,ip,nw_src=172.19.0.23,…,actions=ct(commit,table=0,zone=1,nat(src=220.0.0.3,230.0.0.7,240.0.0.123))
>
>
>
> Ideally, we hope, for different traffic types from the same internal IP
> (say 172.18.0.67), some can SNAT to 220.0.0.3, some can SNAT to 230.0.0.7,
> some can SNAT to
>
> 240.0.0.123, that way, they can leverage total bandwidth of several BGP
> lines.
>
>
>
> I know current OVS can’t support the above IP list for snat, but it is
> indeed required in reality, I don’t understand why OVS can’t do in this
> way, is it linux conntrack limitation or what else reason? I think it is
> similar to IP range which can be supported.
>

Presently, the limitation is both at the OpenFlow layer and in implementation
details at the datapath.
The layer above (a controller or even a script) can do the mapping, taking
into account the desired distribution.
A controller can, and often will, do this and similar types of configuration
specification.


>
>
>
> *From:* Darrell Ball [mailto:dlu...@gmail.com]
> *Sent:* November 6, 2019 9:38
> *To:* Yi Yang (杨燚)-云服务集团
> *Cc:* ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
> *Subject:* Re: [ovs-dev] can OVS conntrack support IP list like this:
> actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7,
> 220.0.0.123))?
>
>
>
>
>
>
>
> On Tue, Nov 5, 2019 at 4:32 PM Yi Yang (杨燚)-云服务集团 
> wrote:
>
> Hi, folks
>
>
>
> We need to do SNAT for many internal IPs by just using several public IPs,
> we also need to do DNAT by some other public IPs for exposing webservice,
> openflow rules look like the below:
>
>
>
> table=0,ip,nw_src=172.17.0.0/16,
> …,actions=ct(commit,table=0,zone=1,nat(src=
> 220.0.0.3,220.0.0.7,220.0.0.123))
>
> table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=22
> 0.0.0.3,220.0.0.7,220.0.0.123))
>
>
>
> for snat, you can map some subset of private IPs to a given public IP and
> so on
>
>
>
>
>
> table=0,ip,tcp,nw_dst=220.0.0.11,tp_dst=80,…,actions=ct(commit,table=0,zone
> =2,nat(dst=172.16.0.100:80))
>
> table=0,ip,tcp,nw_dst=220.0.0.11,
> tp_dst=443,…,actions=ct(commit,table=0,zone=2,nat(dst=172.16.0.100:443))
>
>
>
> you are mapping 'to' private IPs, so you have control over the range
>
>
>
>
>
>
>
>
> From ct document, it seems it can’t support IP list for nat, anybody knows
> how we can handle such cases in some kind feasible way?
>
>
>
> In addition, is it ok if multiple openflow rules use the same NAT IP:PORT
> combination? I’m not sure if it will result in some conflicts for SNAT,
> because all of them need to do dynamic source port mapping, per my test, it
> seems this isn’t a problem.
>
>
>
> IIUC, as long as tuples are unique, it should be fine
>
>
>
>
>
>
> Thank you all in advance and appreciate your help sincerely.
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
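
The suggestion that a controller or script does the mapping could look
roughly like the sketch below. It assumes one flow per private source
address, each with a single nat(src=...) address picked by a stable hash.
The function names and the choice of FNV-1a are illustrative only, and the
generated flow omits the match fields and table= argument that are elided
in the rules quoted above. It spreads different private addresses across
the public ones; it does not split traffic from one private address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Public SNAT addresses taken from the example rules in the thread. */
static const char *const public_ips[] = {
    "220.0.0.3", "230.0.0.7", "240.0.0.123",
};
enum { N_PUBLIC = sizeof public_ips / sizeof public_ips[0] };

/* 32-bit FNV-1a hash of a string; any stable hash would do here. */
static uint32_t
fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (uint8_t) *s;
        h *= 16777619u;
    }
    return h;
}

/* Picks one public address for 'private_ip'.  The same input always yields
 * the same output, so regenerated flows stay stable. */
static const char *
pick_public_ip(const char *private_ip)
{
    return public_ips[fnv1a(private_ip) % N_PUBLIC];
}

/* Formats a flow that SNATs 'private_ip' to its chosen public address.
 * Returns the snprintf() result. */
static int
format_snat_flow(char *buf, size_t size, const char *private_ip)
{
    return snprintf(buf, size,
                    "ip,nw_src=%s,actions=ct(commit,zone=1,nat(src=%s))",
                    private_ip, pick_public_ip(private_ip));
}
```

A weighted choice could replace the modulo if the BGP lines have different
capacities.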


Re: [ovs-dev] can OVS conntrack support IP list like this: actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7, 220.0.0.123))?

2019-11-05 Thread Darrell Ball
On Tue, Nov 5, 2019 at 5:37 PM Darrell Ball  wrote:

>
>
> On Tue, Nov 5, 2019 at 4:32 PM Yi Yang (杨燚)-云服务集团 
> wrote:
>
>> Hi, folks
>>
>>
>>
>> We need to do SNAT for many internal IPs by just using several public IPs,
>> we also need to do DNAT by some other public IPs for exposing webservice,
>> openflow rules look like the below:
>>
>>
>>
>> table=0,ip,nw_src=172.17.0.0/16,
>> …,actions=ct(commit,table=0,zone=1,nat(src=
>> 220.0.0.3,220.0.0.7,220.0.0.123))
>>
>>
>> table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=22
>> 0.0.0.3,220.0.0.7,220.0.0.123))
>>
>>
> for snat, you can map some subset of private IPs to a given public IP and
> so on
>
>
>
>>
>> table=0,ip,tcp,nw_dst=220.0.0.11,tp_dst=80,…,actions=ct(commit,table=0,zone
>> =2,nat(dst=172.16.0.100:80))
>>
>> table=0,ip,tcp,nw_dst=220.0.0.11,
>> tp_dst=443,…,actions=ct(commit,table=0,zone=2,nat(dst=172.16.0.100:443))
>>
>
> you are mapping 'to' private IPs, so you have control over the range
>

>
>>
>>
>>
>>
>> From ct document, it seems it can’t support IP list for nat, anybody knows
>> how we can handle such cases in some kind feasible way?
>>
>>
>>
>> In addition, is it ok if multiple openflow rules use the same NAT IP:PORT
>> combination? I’m not sure if it will result in some conflicts for SNAT,
>> because all of them need to do dynamic source port mapping, per my test,
>> it
>> seems this isn’t a problem.
>>
>
> IIUC, as long as tuples are unique, it should be fine
>

Probably, you should give an example of what you mean by the above.
I am not sure whether you mean that you want to specify an L4 port in your
snat action rule; in most cases you will want to use ephemeral ports by not
specifying a specific port.



>
>
>>
>>
>>
>> Thank you all in advance and appreciate your help sincerely.
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>


Re: [ovs-dev] can OVS conntrack support IP list like this: actions=ct(commit, table=0, zone=1, nat(dst=220.0.0.3, 220.0.0.7, 220.0.0.123))?

2019-11-05 Thread Darrell Ball
On Tue, Nov 5, 2019 at 4:32 PM Yi Yang (杨燚)-云服务集团 
wrote:

> Hi, folks
>
>
>
> We need to do SNAT for many internal IPs by just using several public IPs,
> we also need to do DNAT by some other public IPs for exposing webservice,
> openflow rules look like the below:
>
>
>
> table=0,ip,nw_src=172.17.0.0/16,
> …,actions=ct(commit,table=0,zone=1,nat(src=
> 220.0.0.3,220.0.0.7,220.0.0.123))
>
> table=0,ip,nw_src=172.18.0.67,…,actions=ct(commit,table=0,zone=1,nat(src=22
> 0.0.0.3,220.0.0.7,220.0.0.123))
>
>
for snat, you can map some subset of private IPs to a given public IP and
so on



> table=0,ip,tcp,nw_dst=220.0.0.11,tp_dst=80,…,actions=ct(commit,table=0,zone
> =2,nat(dst=172.16.0.100:80))
>
> table=0,ip,tcp,nw_dst=220.0.0.11,
> tp_dst=443,…,actions=ct(commit,table=0,zone=2,nat(dst=172.16.0.100:443))
>

you are mapping 'to' private IPs, so you have control over the range


>
>
>
>
> From ct document, it seems it can’t support IP list for nat, anybody knows
> how we can handle such cases in some kind feasible way?
>
>
>
> In addition, is it ok if multiple openflow rules use the same NAT IP:PORT
> combination? I’m not sure if it will result in some conflicts for SNAT,
> because all of them need to do dynamic source port mapping, per my test, it
> seems this isn’t a problem.
>

IIUC, as long as tuples are unique, it should be fine


>
>
>
> Thank you all in advance and appreciate your help sincerely.
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


[ovs-dev] [patch v1] faq: Fix meter action releases.

2019-11-02 Thread Darrell Ball
At the same time disambiguate some feature descriptions.
'Meters' is changed to 'Meter action' to clarify that the entry
describes the Openflow meter action rather than port based meters.
'NAT' is changed to 'Conntrack NAT' to indicate that this entry
represents NAT done in 'conntrack', rather than basic Openflow
IP address and L4 port modifications.

Signed-off-by: Darrell Ball 
---
 Documentation/faq/releases.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 1dd4912..4152345 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -119,7 +119,7 @@ Q: Are all features available with all datapaths?
     Conntrack Fragment Reass.       4.3            2.6          2.10     YES
     Conntrack Timeout Policies      5.2            2.12         NO       NO
     Conntrack Zone Limit            4.18           2.10         NO       YES
-    NAT                             4.6            2.6          2.8      YES
+    Conntrack NAT                   4.6            2.6          2.8      YES
     Tunnel - LISP                   NO             2.11         NO       NO
     Tunnel - STT                    NO             2.4          NO       YES
     Tunnel - GRE                    3.11           1.0          2.4      YES
@@ -137,7 +137,7 @@ Q: Are all features available with all datapaths?
     Set action                      YES            1.0          1.0    PARTIAL
     NIC Bonding                     YES            1.0          1.0      YES
     Multiple VTEPs                  YES            1.10         1.10     YES
-    Meters                          4.15           1.10         1.10     NO
+    Meter action                    4.15           2.10         2.7      NO
     ========================== ============== ============== ========= =======
 
 Do note, however:
-- 
1.9.1



Re: [ovs-dev] [PATCH] tests: Only run test on kernel datapath

2019-10-04 Thread Darrell Ball
On Fri, Oct 4, 2019 at 11:03 AM Gregory Rose  wrote:

>
> On 10/4/2019 10:52 AM, Ben Pfaff wrote:
> > On Fri, Oct 04, 2019 at 10:43:02AM -0700, Gregory Rose wrote:
> >> On 10/4/2019 10:30 AM, Ben Pfaff wrote:
> >>> On Fri, Oct 04, 2019 at 09:49:57AM -0700, Greg Rose wrote:
> >>>> The recently added test to check for the correct L3 L4 protocol
> >>>> information after conntrack reassembles a packet should not run
> >>>> in the userspace datapath.  It is specific to a kernel datapath
> >>>> regression.
> >>>>
> >>>> Also change the name of the test to make it more informative and
> >>>> less redundant and add comments with a short explanation.
> >>>>
> >>>> Fixes: d7fd61a ("tests: Add check for correct l3l4 conntrack frag
> reassembly")
> >>>> Suggested-by: Darrell Ball 
> >>>> Signed-off-by: Greg Rose 
> >>> Applied to master, thanks Greg and Darrell!
> >>>
> >>> Greg, I noticed that you didn't CC Darrell on the patch.  You might
> find
> >>> it helpful to add the following to your ~/.gitconfig
> >>>
> >>> [sendemail]
> >>> ccCmd = extract-ccs
> >>>
> >>> and install the following in your $PATH as extract-ccs (marked
> >>> executable):
> >>>
> >>> #! /bin/sed -nf
> >>> s/^[A-Z][-a-z]*-by: \(.*@.*\)/\1/p
> >> My git send-email will extract CC's from 'Cc:' type fields but not
> >> Suggested-by as you say.  However, I did CC Darrell in the git
> >> send-email command itself:
> >>
> >>   git send-email --to=d...@openvswitch.org --cc=dlu...@gmail.com
> >> 0001-tests-Only-run-test-on-kernel-datapath.patch
> > Oh, that's odd, the CC does not show in the copy that reached me.
> > Puzzling.
> Really?  I see he's not on the replies in this thread.  That is weird
> but I do see him on the CC line in my email.  Of course, I sent it...
>
> ???
>
> I've added Darrell back in here.  Darrell, did you get copied?
>

No, but it does not matter since I would not respond while driving anyways :-)



>
> - Greg
>


Re: [ovs-dev] [PATCH v2] tests: Add check for correct l3l4 conntrack frag reassembly

2019-10-04 Thread Darrell Ball
On Fri, Oct 4, 2019 at 7:05 AM Gregory Rose  wrote:

>
> On 10/3/2019 6:56 PM, Darrell Ball wrote:
> > Thanks for the patch
> >
> > This approach will not work for the userspace datapath
> >
> > Few issues off the top of my head:
> >
> > 1/ packet-out frees the packet (which is a fragment in this case)
> > after completion
> >hence multiple packet-outs need to be part of a single Openflow
> > bundle command as in other similar tests
> >This test involves 2 fragments for the first 2 packets.
> >
> > 2/ Userspace datapath checks UDP checksums; for V6 packets, they need
> > to always be correct and they are not presently
> >
> > 3/ UDP header lengths cannot be larger than the memory allocated for
> > the packet, else sanity checking will filter
> > out the packet
> >
> > Alternatively, if you want to use this simplified approach, you can
> > disable the test for the userspace datapath.
>
> There's no reason that I know of to run this test in the userspace
> datapath.


yep


> It should be disabled.


> How do you do that?
>

You need to add a macro to check applicability.
For kernel, it will do nothing in the check-kmod case; for full check-kernel
support, 'if desired', you need to check for versions that have the fix.
For userspace, it will unconditionally skip the test, although a comment
explaining why would be helpful.
See the example for CHECK_CT_DPIF_PER_ZONE_LIMIT, which handles check-kmod
vs check-system-userspace.

I noticed a couple other issues:

The test is labeled "fragment reassembly test":

a/ All the conntrack fragmentation tests include a reassembly aspect, so a
better name describing the special purpose of the test might be helpful,
along with a comment explaining that purpose.

b/ The "test" part of the name is redundant


>
> - Greg
>
>
>
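
Issue 2 above concerns UDP checksums on IPv6 test packets. As a reference
for building such packets by hand, here is a minimal sketch of the standard
UDP-over-IPv6 checksum: a ones' complement sum over the IPv6 pseudo-header
followed by the UDP header and payload. The function names are hypothetical
and this is not the code the userspace datapath uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Adds 'len' bytes to a running sum as big-endian 16-bit words.  An odd
 * trailing byte is padded with a zero byte on the right. */
static uint32_t
csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (i < len) {
        sum += (uint32_t) data[i] << 8;
    }
    return sum;
}

/* Folds the carries back into 16 bits and returns the complement. */
static uint16_t
csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Folded, complemented sum over the pseudo-header (source, destination,
 * 32-bit upper-layer length, 3 zero bytes, next header 17) and 'udp'. */
static uint16_t
udp6_csum_raw(const uint8_t src[16], const uint8_t dst[16],
              const uint8_t *udp, size_t udp_len)
{
    uint8_t tail[8] = {
        (uint8_t) (udp_len >> 24), (uint8_t) (udp_len >> 16),
        (uint8_t) (udp_len >> 8), (uint8_t) udp_len,
        0, 0, 0, 17,
    };
    uint32_t sum = 0;

    sum = csum_add(sum, src, 16);
    sum = csum_add(sum, dst, 16);
    sum = csum_add(sum, tail, sizeof tail);
    sum = csum_add(sum, udp, udp_len);
    return csum_fold(sum);
}

/* Computes the checksum to store in bytes 6-7 of the UDP header.  Those
 * bytes must be zero in 'udp' on entry.  A result of zero is sent as
 * 0xffff, since a zero checksum is not valid for UDP over IPv6. */
static uint16_t
udp6_csum_compute(const uint8_t src[16], const uint8_t dst[16],
                  const uint8_t *udp, size_t udp_len)
{
    uint16_t csum = udp6_csum_raw(src, dst, udp, udp_len);
    return csum ? csum : 0xffff;
}

/* Verifies a datagram whose checksum field is already filled in. */
static bool
udp6_csum_ok(const uint8_t src[16], const uint8_t dst[16],
             const uint8_t *udp, size_t udp_len)
{
    return udp6_csum_raw(src, dst, udp, udp_len) == 0;
}
```

Running a hand-built packet through `udp6_csum_ok()` before pasting it into
a test is a quick way to catch a stale checksum.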


Re: [ovs-dev] [PATCH v2] tests: Add check for correct l3l4 conntrack frag reassembly

2019-10-03 Thread Darrell Ball
On Thu, Oct 3, 2019 at 6:56 PM Darrell Ball  wrote:

> Thanks for the patch
>
> This approach will not work for the userspace datapath
>
> Few issues off the top of my head:
>
> 1/ packet-out frees the packet (which is a fragment in this case) after
> completion
>hence multiple packet-outs need to be part of a single Openflow bundle
> command as in other similar tests
>This test involves 2 fragments for the first 2 packets.
>
> 2/ Userspace datapath checks UDP checksums; for V6 packets, they need to
> always be correct and they are not presently
>
> 3/ UDP header lengths cannot be larger than the memory allocated for the
> packet, else sanity checking will filter
> out the packet
>

'3' seems ok upon recalculation

So, the issues are '1' and '2'.



> Alternatively, if you want to use this simplified approach, you can
> disable the test for the userspace datapath.
>
> Darrell
>
>
> On Thu, Oct 3, 2019 at 10:46 AM Greg Rose  wrote:
>
>> Two commits recently fixed an issue with setting the correct l3 and l4
>> flow information when conntrack reassembles packet fragments.
>>
>> c98f776 datapath: Clear the L4 portion of the key for "later" fragments
>> 2609173 datapath: Properly set L4 keys on "later" IP fragments
>>
>> This test checks for regressions that might break this feature.  It
>> counts on the fact that when the bug is present the udp src port
>> will not be correct.  It will either be zero or else some other
>> garbage value.  So the test feeds some fragments through for
>> reassembly and then checks to make sure that the udp src port
>> is actually the correct value of 5001.
>>
>> Tested by reverting the above commits and observing that the test
>> then fails.
>>
>> Signed-off-by: Greg Rose 
>> ---
>>
>> V2 - Break up long lines with dnl to help with email formatting
>> ---
>>  tests/system-traffic.at | 26 ++
>>  1 file changed, 26 insertions(+)
>>
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index bfc6bb5..9afd818 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -3245,6 +3245,32 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack |
>> FORMAT_CT(10.1.1.2)], [0], [dnl
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>
>> +AT_SETUP([conntrack - fragment reassembly test])
>> +CHECK_CONNTRACK()
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +AT_DATA([flows.txt], [dnl
>> +action=normal
>> +])
>> +
>> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
>> +
>> +AT_CHECK([ovs-ofctl packet-out br0
>> "packet=52540003287c525400444ab586dd6006f70605b02c40200100010020200100010010110134e88deb13891389080803136161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616"dnl
>>
>> +"16161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161"dnl
>>
>> +"616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616

Re: [ovs-dev] [PATCH v2] tests: Add check for correct l3l4 conntrack frag reassembly

2019-10-03 Thread Darrell Ball
Thanks for the patch

This approach will not work for the userspace datapath

A few issues off the top of my head:

1/ packet-out frees the packet (which is a fragment in this case) after
   completion, hence multiple packet-outs need to be part of a single
   OpenFlow bundle command, as in other similar tests.
   This test involves 2 fragments for the first 2 packets.

2/ The userspace datapath checks UDP checksums; for V6 packets, they need to
   always be correct, and presently they are not.

3/ UDP header lengths cannot be larger than the memory allocated for the
   packet, else sanity checking will filter out the packet.

Alternatively, if you want to use this simplified approach, you can disable
the test for the userspace datapath.

Darrell


On Thu, Oct 3, 2019 at 10:46 AM Greg Rose  wrote:

> Two commits recently fixed an issue with setting the correct l3 and l4
> flow information when conntrack reassembles packet fragments.
>
> c98f776 datapath: Clear the L4 portion of the key for "later" fragments
> 2609173 datapath: Properly set L4 keys on "later" IP fragments
>
> This test checks for regressions that might break this feature.  It
> counts on the fact that when the bug is present the udp src port
> will not be correct.  It will either be zero or else some other
> garbage value.  So the test feeds some fragments through for
> reassembly and then checks to make sure that the udp src port
> is actually the correct value of 5001.
>
> Tested by reverting the above commits and observing that the test
> then fails.
>
> Signed-off-by: Greg Rose 
> ---
>
> V2 - Break up long lines with dnl to help with email formatting
> ---
>  tests/system-traffic.at | 26 ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index bfc6bb5..9afd818 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -3245,6 +3245,32 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack |
> FORMAT_CT(10.1.1.2)], [0], [dnl
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([conntrack - fragment reassembly test])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_DATA([flows.txt], [dnl
> +action=normal
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +AT_CHECK([ovs-ofctl packet-out br0
> "packet=52540003287c525400444ab586dd6006f70605b02c40200100010020200100010010110134e88deb13891389080803136161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616"dnl
>
> +"16161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161"dnl
>
> +"61616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616"dnl
> 

Re: [ovs-dev] [PATCH] datapath: Fix conntrack cache with timeout

2019-09-29 Thread Darrell Ball
Thanks for the patch

Looks good and matches the upstream version, including the RCU dereference
fixup.
Thanks for remembering to add the requested incremental test after the fix.

Darrell
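
As a rough illustration of the check the patch quoted below adds to
skb_nfct_cached(): a cached conntrack entry is only reused when the timeout
policy requested by the ct action matches the one attached to the entry. The
sketch is a simplification under stated assumptions. The struct and function
names are hypothetical stand-ins rather than kernel types, the timeout
extension and its policy pointer are collapsed into one field, and RCU
handling is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a timeout policy object. */
struct timeout_policy {
    int id;
};

/* Hypothetical stand-in for a cached conntrack entry.  'timeout' is NULL
 * when the entry has no timeout policy attached. */
struct cached_ct {
    const struct timeout_policy *timeout;
};

/* Hypothetical stand-in for the ct action parameters.  'timeout' is NULL
 * when the action does not request a timeout policy. */
struct ct_action_info {
    const struct timeout_policy *timeout;
};

/* Returns true if the cached entry may be reused for this ct action.  When
 * the action requests a policy, the entry must carry that same policy;
 * otherwise the caller should fall back to a fresh conntrack lookup. */
static bool
cached_entry_usable(const struct cached_ct *ct,
                    const struct ct_action_info *info)
{
    assert(ct != NULL && info != NULL);

    if (info->timeout) {
        if (!ct->timeout || ct->timeout != info->timeout) {
            return false;
        }
    }
    return true;
}
```

Comparing pointers rather than policy contents mirrors the shape of the check
in the patch, where the policy pointer stored at action setup time is compared
with the one found on the entry.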

On Fri, Sep 27, 2019 at 2:14 PM Yi-Hung Wei  wrote:

> This patch is from the following upstream net-next commit along with
> an updated system traffic test to avoid regression.
>
> Upstream commit:
> commit 7177895154e6a35179d332f4a584d396c50d0612
> Author: Yi-Hung Wei 
> Date:   Thu Aug 22 13:17:50 2019 -0700
>
> openvswitch: Fix conntrack cache with timeout
>
> This patch addresses a conntrack cache issue with timeout policy.
> Currently, we do not check if the timeout extension is set
> properly in the
> cached conntrack entry.  Thus, after packet recirculate from
> conntrack
> action, the timeout policy is not applied properly.  This patch
> fixes the
> aforementioned issue.
>
> Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct
> action")
> Reported-by: kbuild test robot 
> Signed-off-by: Yi-Hung Wei 
> Acked-by: Pravin B Shelar 
> Signed-off-by: David S. Miller 
>
> Signed-off-by: Yi-Hung Wei 
> ---
>  datapath/conntrack.c| 13 +
>  tests/system-traffic.at | 18 ++
>  2 files changed, 31 insertions(+)
>
> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
> index 35a183aeb33a..c6d523758ff1 100644
> --- a/datapath/conntrack.c
> +++ b/datapath/conntrack.c
> @@ -88,6 +88,7 @@ struct ovs_conntrack_info {
> struct md_mark mark;
> struct md_labels labels;
> char timeout[CTNL_TIMEOUT_NAME_MAX];
> +   struct nf_ct_timeout *nf_ct_timeout;
>  #ifdef CONFIG_NF_NAT_NEEDED
> struct nf_nat_range2 range;  /* Only present for SRC NAT and DST
> NAT. */
>  #endif
> @@ -750,6 +751,14 @@ static bool skb_nfct_cached(struct net *net,
> if (help && rcu_access_pointer(help->helper) !=
> info->helper)
> return false;
> }
> +   if (info->nf_ct_timeout) {
> +   struct nf_conn_timeout *timeout_ext;
> +
> +   timeout_ext = nf_ct_timeout_find(ct);
> +   if (!timeout_ext || info->nf_ct_timeout !=
> +   rcu_dereference(timeout_ext->timeout))
> +   return false;
> +   }
> /* Force conntrack entry direction to the current packet? */
> if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
> /* Delete the conntrack entry if confirmed, else just
> release
> @@ -1709,6 +1718,10 @@ int ovs_ct_copy_action(struct net *net, const
> struct nlattr *attr,
>   ct_info.timeout))
> pr_info_ratelimited("Failed to associated timeout "
> "policy `%s'\n",
> ct_info.timeout);
> +   else
> +   ct_info.nf_ct_timeout = rcu_dereference(
> +   nf_ct_timeout_find(ct_info.ct)->timeout);
> +
> }
>
> if (helper) {
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index bfc6bb5b47c7..3d4e365764b5 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -3242,6 +3242,24 @@ sleep 4
>  AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
>  ])
>
> +dnl Re-send ICMP and UDP traffic to test conntrack cache
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=5054000a505400090800451c0011a4cd0a0101010a010102000100020008
> actions=resubmit(,0)"])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sort],
> [0], [dnl
>
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0),zone=5
>
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5
> +])
> +
> +dnl Wait until the timeout expire.
> +dnl We intend to wait a bit longer, because conntrack does not recycle
> the entry right after it is expired.
> +sleep 4
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> +])
> +
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> --
> 2.7.4
>


Re: [ovs-dev] [PATCH branch-2.6] conntrack: Fix ICMPv4 error data L4 length check.

2019-09-28 Thread Darrell Ball
Thanks for doing this Vishal !

Except for minor patch formatting issues (inline), this is fine and also
tests fine.

On Fri, Sep 27, 2019 at 10:54 PM Vishal Deep Ajmera <
vishal.deep.ajm...@ericsson.com> wrote:

> From: Darrell Ball 
>
> The ICMPv4 error data L4 length check was found to be too strict for TCP,
> expecting a minimum of 20 rather than 8 bytes.  This worked by
> happenstance for other inner protocols.  The approach is to explicitly
> handle the ICMPv4 error data L4 length check and to do this for all
> supported inner protocols in the same way.  Making the code common
> between protocols also allows the existing ICMPv4 related UDP tests to
> cover TCP and ICMP inner protocol cases.
> Note that ICMPv6 does not have an 8 byte limit for error L4 data.
>
> Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
> CC: Daniele Di Proietto 
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
> Reported-by: Vishal Deep Ajmera 
> Signed-off-by: Vishal Deep Ajmera 
> Co-authored-by: Vishal Deep Ajmera 
> Signed-off-by: Darrell Ball 
>
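
The length rule described in the commit message above can be sketched in
isolation. This is a simplified illustration, not the code from
lib/conntrack.c: the helper l4_len_ok is hypothetical, and the constant
values are assumptions based on the standard minimum header sizes and on the
8-byte figure given in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values: minimum header sizes, plus the 8 bytes of inner L4 data
 * mentioned in the commit message for ICMPv4 errors. */
#define TCP_HEADER_LEN 20
#define UDP_HEADER_LEN 8
#define ICMP_HEADER_LEN 8
#define ICMP_ERROR_DATA_L4_LEN 8

/* Returns true if 'size' bytes of L4 data are enough to parse.
 *
 * 'chk_len' is non-NULL only when the L4 data is the payload of an ICMPv4
 * error, in which case the override applies to every inner protocol alike.
 * Otherwise the per-protocol 'default_len' is required. */
static bool
l4_len_ok(size_t size, size_t default_len, const size_t *chk_len)
{
    return size >= (chk_len ? *chk_len : default_len);
}
```

With this shape, an ICMPv4 error carrying only 8 bytes of an inner TCP header
passes when the override is supplied, while an ordinary TCP packet still has
to provide the full 20-byte header.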




> Signed-off-by: Ben Pfaff 
>

remove extra signoff when submitting the patch


> (cherry picked from commit 6c2a93064afe8d812e4506880d1fd8f96108f92a)
>
> Conflicts:
> lib/conntrack.c
>

move part starting from 'cherry' below the '---', although this info is not
really needed




> ---
>  lib/conntrack.c | 35 ---
>  lib/packets.h   |  3 +++
>  2 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 8abaf7e..d59083e 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -664,11 +664,12 @@ check_l4_icmp6(const struct conn_key *key, const
> void *data, size_t size,
>  }
>
>  static inline bool
> -extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
> +extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
> +   size_t *chk_len)
>  {
>  const struct tcp_header *tcp = data;
>
> -if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
> +if (OVS_UNLIKELY(size < (chk_len ? *chk_len : TCP_HEADER_LEN))) {
>  return false;
>  }
>
> @@ -680,11 +681,12 @@ extract_l4_tcp(struct conn_key *key, const void
> *data, size_t size)
>  }
>
>  static inline bool
> -extract_l4_udp(struct conn_key *key, const void *data, size_t size)
> +extract_l4_udp(struct conn_key *key, const void *data, size_t size,
> +   size_t *chk_len)
>  {
>  const struct udp_header *udp = data;
>
> -if (OVS_UNLIKELY(size < UDP_HEADER_LEN)) {
> +if (OVS_UNLIKELY(size < (chk_len ? *chk_len : UDP_HEADER_LEN))) {
>  return false;
>  }
>
> @@ -696,7 +698,8 @@ extract_l4_udp(struct conn_key *key, const void *data,
> size_t size)
>  }
>
>  static inline bool extract_l4(struct conn_key *key, const void *data,
> -  size_t size, bool *related, const void *l3);
> +  size_t size, bool *related, const void *l3,
> +  size_t *chk_len);
>
>  static uint8_t
>  reverse_icmp_type(uint8_t type)
> @@ -728,11 +731,11 @@ reverse_icmp_type(uint8_t type)
>   * possible */
>  static inline int
>  extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
> -bool *related)
> +bool *related, size_t *chk_len)
>  {
>  const struct icmp_header *icmp = data;
>
> -if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
> +if (OVS_UNLIKELY(size < (chk_len ? *chk_len : ICMP_HEADER_LEN))) {
>  return false;
>  }
>
> @@ -783,8 +786,9 @@ extract_l4_icmp(struct conn_key *key, const void
> *data, size_t size,
>  key->src = inner_key.src;
>  key->dst = inner_key.dst;
>  key->nw_proto = inner_key.nw_proto;
> +size_t check_len = ICMP_ERROR_DATA_L4_LEN;
>
> -ok = extract_l4(key, l4, tail - l4, NULL, l3);
> +ok = extract_l4(key, l4, tail - l4, NULL, l3, &check_len);
>  if (ok) {
>  conn_key_reverse(key);
>  *related = true;
> @@ -872,7 +876,7 @@ extract_l4_icmp6(struct conn_key *key, const void
> *data, size_t size,
>  key->dst = inner_key.dst;
>  key->nw_proto = inner_key.nw_proto;
>
> -ok = extract_l4(key, l4, tail - l4, NULL, l3);
> +ok = extract_l4(key, l4, tail - l4, NULL, l3, NULL);
>  if (ok) {
>  conn_key_reverse(key);
>  *related = true;
> @@ -897,21 +901,22 @@ extract_l4_icmp6(struct conn_key *key, const void
> *data, size_t size,
>   

Re: [ovs-dev] [patch v3] conntrack: Add option to disable TCP sequence checking.

2019-09-25 Thread Darrell Ball
On Wed, Sep 25, 2019 at 1:46 PM Ben Pfaff  wrote:

> On September 25, 2019 1:42:36 PM PDT, Darrell Ball 
> wrote:
>>
>> Thank you
>>
>> Pls see inline
>>
>> On Wed, Sep 25, 2019 at 10:26 AM Ben Pfaff  wrote:
>>
>>> On Tue, Sep 24, 2019 at 03:47:35PM -0700, Darrell Ball wrote:
>>> > This may be needed in some special cases, such as to support some
>>> hardware
>>> > offload implementations.  Note that disabling TCP sequence number
>>> > verification is not an optimization in itself, but supporting some
>>> > hardware offload implementations may offer better performance.  TCP
>>> > sequence number verification is enabled by default.  This option is
>>> only
>>> > available for the userspace datapath.  Access to this option is
>>> presently
>>> > provided via 'dpctl' commands as the need for this option is quite node
>>> > specific, by virtual
>>
>>
>> I changed 'virtual' to 'virtue'
>>
>>
>>> of which nics are in use on a given node.  A test is
>>> > added to verify this option.
>>> >
>>> > Reported-at:
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359188.html
>>> > Signed-off-by: Darrell Ball 
>>>
>>> Thanks a lot.
>>>
>>> It sounds like there are a couple of things you're planning to update
>>> here in any case, so I'll expect to see v4 sometime soon.
>>>
>>> This comment seems rather verbose, I'd probably just write "Check
>>> sequence numbers?" or similar:
>>
>>
>> How about ?
>>
>> /* Check TCP sequence numbers. */
>>
>>
>>> > +atomic_bool tcp_seq_ckk; /* TCP sequence number verification; when
>>> > +enabled, this enables sequence number
>>> > +verification; enabled by default. */
>>>
>>> The change to tcp_conn_update() is a bit obscure with side effects in
>>> ?:.  It might be clarified somewhat.  I'm appending a suggestion.
>>
>>
>> Better
>>
>>
>>>
>>
>>
>>> The text in dpctl.man could be wordsmithed a little.  Also appending a
>>> suggestion for that.
>>>
>>
>> Looks good
>>
>>
>>>
>>> The text in dpctl.man mentions 'be_liberal' mode but the tree doesn't
>>> have any other mention of that anywhere.
>>>
>>
>> Good point, I added some context wording.
>>
>> 'but not the same as 'be_liberal' mode, as in Netfilter.'
>>
>>
>>>
>>> Thanks again,
>>>
>>> Ben.
>>>
>>> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
>>> index 1e843f337f8a..1dc7ead3b233 100644
>>> --- a/lib/conntrack-tcp.c
>>> +++ b/lib/conntrack-tcp.c
>>> @@ -149,6 +149,16 @@ tcp_get_wscale(const struct tcp_header *tcp)
>>>  return wscale;
>>>  }
>>>
>>> +static inline bool
>>>
>>
>> I only dropped the 'inline' specifier since it's implied.
>>
>>
>>> +tcp_bypass_seq_chk(struct conntrack *ct)
>>> +{
>>> +if (!conntrack_get_tcp_seq_chk(ct)) {
>>> +COVERAGE_INC(conntrack_tcp_seq_chk_bypass);
>>> +return true;
>>> +}
>>> +return false;
>>> +}
>>> +
>>>  static enum ct_update_res
>>>  tcp_conn_update(struct conntrack *ct, struct conn *conn_,
>>>  struct dp_packet *pkt, bool reply, long long now)
>>> @@ -288,8 +298,7 @@ tcp_conn_update(struct conntrack *ct, struct conn
>>> *conn_,
>>>  /* Acking not more than one window forward */
>>>  && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
>>>  || (orig_seq == src->seqlo + 1) || (orig_seq + 1 ==
>>> src->seqlo)))
>>> -|| (!conntrack_get_tcp_seq_chk(ct)
>>> -? COVERAGE_INC(conntrack_tcp_seq_chk_bypass), 1 : 0)) {
>>> +|| tcp_bypass_seq_chk(ct)) {
>>>  /* Require an exact/+1 sequence match on resets when possible */
>>>
>>
>> Better; thank you
>>
>>
>>>
>>>  /* update max window */
>>>
>>> diff --git a/lib/dpctl.man b/lib/dpctl.man
>>> index 806e5d8e840d..53b7368e3b77 100644
>>> --- a/lib/dpctl.man
>>> +++ b/lib/dpctl.man
>>> @@ -324,12 +324,13 @@ Only supported for userspace datapath.
>&

[ovs-dev] [patch v4] conntrack: Add option to disable TCP sequence checking.

2019-09-25 Thread Darrell Ball
This may be needed in some special cases, such as to support some hardware
offload implementations.  Note that disabling TCP sequence number
verification is not an optimization in itself, but supporting some
hardware offload implementations may offer better performance.  TCP
sequence number verification is enabled by default.  This option is only
available for the userspace datapath.  Access to this option is presently
provided via 'dpctl' commands as the need for this option is quite node
specific, by virtue of which nics are in use on a given node.  A test is
added to verify this option.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359188.html
Signed-off-by: Darrell Ball 
---

v4: Splice tcp sequence number check in tcp_conn_update() out as a function
for clarity (Ben).

Fix up some dpctl man page comments (Ben).

s/ckk/chk/ for 'tcp_seq_chk'.

Shorten inline comment for 'tcp_seq_chk' variable (Ben).

Fix commit message spelling error.

v3: Make manpage comments more verbose.
Expand commit message comments.

v2: Per particular requirement, support 'no-tcp-seq-chk' rather than
'liberal' mode.

 NEWS|   3 ++
 lib/conntrack-private.h |   4 +-
 lib/conntrack-tcp.c |  22 +++-
 lib/conntrack.c |  16 ++
 lib/conntrack.h |   2 +
 lib/ct-dpif.c   |  16 ++
 lib/ct-dpif.h   |   2 +
 lib/dpctl.c |  64 +-
 lib/dpctl.man   |  18 +++
 lib/dpif-netdev.c   |  18 +++
 lib/dpif-netlink.c  |   2 +
 lib/dpif-provider.h |   5 ++
 tests/ofproto-dpif.at   | 138 
 13 files changed, 305 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index aba2e4a..3b72d1b 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ Post-v2.12.0
  * OVN has been removed from this repository. It now exists as a
separate project. You can find it at
https://github.com/ovn-org/ovn.git
+   - Userspace datapath:
+ * Add option to enable, disable and query TCP sequence checking in
+   conntrack.
 
 v2.12.0 - 03 Sep 2019
 -
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index bcfbe10..590f139 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -171,8 +171,8 @@ struct conntrack {
 struct hindex alg_expectation_refs OVS_GUARDED; /* For lookup from
  * control context.  */
 
-/* Fragmentation handling context. */
-struct ipf *ipf;
+struct ipf *ipf; /* Fragmentation handling context. */
+atomic_bool tcp_seq_chk; /* Check TCP sequence numbers. */
 };
 
 /* Lock acquisition order:
diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
index 397aca1..47eb8e2 100644
--- a/lib/conntrack-tcp.c
+++ b/lib/conntrack-tcp.c
@@ -39,10 +39,15 @@
 #include 
 
 #include "conntrack-private.h"
+#include "coverage.h"
 #include "ct-dpif.h"
 #include "dp-packet.h"
 #include "util.h"
 
+COVERAGE_DEFINE(conntrack_tcp_seq_chk_bypass);
+COVERAGE_DEFINE(conntrack_tcp_seq_chk_failed);
+COVERAGE_DEFINE(conntrack_invalid_tcp_flags);
+
 struct tcp_peer {
 uint32_t   seqlo;  /* Max sequence number sent */
 uint32_t   seqhi;  /* Max the other end ACKd + win */
@@ -144,6 +149,16 @@ tcp_get_wscale(const struct tcp_header *tcp)
 return wscale;
 }
 
+static bool
+tcp_bypass_seq_chk(struct conntrack *ct)
+{
+if (!conntrack_get_tcp_seq_chk(ct)) {
+COVERAGE_INC(conntrack_tcp_seq_chk_bypass);
+return true;
+}
+return false;
+}
+
 static enum ct_update_res
 tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 struct dp_packet *pkt, bool reply, long long now)
@@ -162,6 +177,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 uint32_t p_len = tcp_payload_length(pkt);
 
 if (tcp_invalid_flags(tcp_flags)) {
+COVERAGE_INC(conntrack_invalid_tcp_flags);
 return CT_UPDATE_INVALID;
 }
 
@@ -272,7 +288,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 
 int ackskew = check_ackskew ? dst->seqlo - ack : 0;
 #define MAXACKWINDOW (0xffff + 1500) /* 1500 is an arbitrary fudge factor */
-if (SEQ_GEQ(src->seqhi, end)
+if ((SEQ_GEQ(src->seqhi, end)
 /* Last octet inside other's window space */
 && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
 /* Retrans: not more than one window back */
@@ -281,7 +297,8 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 && (ackskew <= (MAXACKWINDOW << sws))
 /* Acking not more than one window forward */
 && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
-|| (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {
+|| (or

Re: [ovs-dev] [patch v3] conntrack: Add option to disable TCP sequence checking.

2019-09-25 Thread Darrell Ball
Thank you

Pls see inline

On Wed, Sep 25, 2019 at 10:26 AM Ben Pfaff  wrote:

> On Tue, Sep 24, 2019 at 03:47:35PM -0700, Darrell Ball wrote:
> > This may be needed in some special cases, such as to support some
> hardware
> > offload implementations.  Note that disabling TCP sequence number
> > verification is not an optimization in itself, but supporting some
> > hardware offload implementations may offer better performance.  TCP
> > sequence number verification is enabled by default.  This option is only
> > available for the userspace datapath.  Access to this option is presently
> > provided via 'dpctl' commands as the need for this option is quite node
> > specific, by virtual


I changed 'virtual' to 'virtue'


> of which nics are in use on a given node.  A test is
> > added to verify this option.
> >
> > Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359188.html
> > Signed-off-by: Darrell Ball 
>
> Thanks a lot.
>
> It sounds like there are a couple of things you're planning to update
> here in any case, so I'll expect to see v4 sometime soon.
>
> This comment seems rather verbose, I'd probably just write "Check
> sequence numbers?" or similar:


How about ?

/* Check TCP sequence numbers. */


> > +atomic_bool tcp_seq_ckk; /* TCP sequence number verification; when
> > +enabled, this enables sequence number
> > +verification; enabled by default. */
>
> The change to tcp_conn_update() is a bit obscure with side effects in
> ?:.  It might be clarified somewhat.  I'm appending a suggestion.


Better


>


> The text in dpctl.man could be wordsmithed a little.  Also appending a
> suggestion for that.
>

Looks good


>
> The text in dpctl.man mentions 'be_liberal' mode but the tree doesn't
> have any other mention of that anywhere.
>

Good point, I added some context wording.

'but not the same as 'be_liberal' mode, as in Netfilter.'


>
> Thanks again,
>
> Ben.
>
> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
> index 1e843f337f8a..1dc7ead3b233 100644
> --- a/lib/conntrack-tcp.c
> +++ b/lib/conntrack-tcp.c
> @@ -149,6 +149,16 @@ tcp_get_wscale(const struct tcp_header *tcp)
>  return wscale;
>  }
>
> +static inline bool
>

I only dropped the 'inline' specifier since it's implied.


> +tcp_bypass_seq_chk(struct conntrack *ct)
> +{
> +if (!conntrack_get_tcp_seq_chk(ct)) {
> +COVERAGE_INC(conntrack_tcp_seq_chk_bypass);
> +return true;
> +}
> +return false;
> +}
> +
>  static enum ct_update_res
>  tcp_conn_update(struct conntrack *ct, struct conn *conn_,
>  struct dp_packet *pkt, bool reply, long long now)
> @@ -288,8 +298,7 @@ tcp_conn_update(struct conntrack *ct, struct conn
> *conn_,
>  /* Acking not more than one window forward */
>  && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
>  || (orig_seq == src->seqlo + 1) || (orig_seq + 1 ==
> src->seqlo)))
> -|| (!conntrack_get_tcp_seq_chk(ct)
> -? COVERAGE_INC(conntrack_tcp_seq_chk_bypass), 1 : 0)) {
> +|| tcp_bypass_seq_chk(ct)) {
>  /* Require an exact/+1 sequence match on resets when possible */
>

Better; thank you
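
For readers skimming the thread, the refactoring agreed on here can be shown
in a reduced form: the side effect (bumping a counter) moves out of a ?:
expression inside the condition and into a small helper that is only reached
when the window checks fail. This is a sketch under simplifying assumptions.
The struct, the plain bool field and the counter are hypothetical stand-ins
for the conntrack instance, its atomic flag and the coverage counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the conntrack instance.  The real field is an
 * atomic_bool read through conntrack_get_tcp_seq_chk(). */
struct ct_sketch {
    bool tcp_seq_chk;
};

/* Stand-in for COVERAGE_INC(conntrack_tcp_seq_chk_bypass). */
static unsigned long long seq_chk_bypass_count;

/* Returns true, and records the bypass, when sequence checking is off. */
static bool
bypass_seq_chk(const struct ct_sketch *ct)
{
    assert(ct != NULL);

    if (!ct->tcp_seq_chk) {
        seq_chk_bypass_count++;
        return true;
    }
    return false;
}

/* 'seq_in_window' stands for the combined result of the window checks.
 * Because || short-circuits, the bypass is counted only for segments that
 * would otherwise have been rejected. */
static bool
segment_accepted(const struct ct_sketch *ct, bool seq_in_window)
{
    return seq_in_window || bypass_seq_chk(ct);
}
```

The behavior matches the ?: form, but the counter increment is now visible as
an ordinary statement rather than hidden in a comma expression.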


>
>  /* update max window */
>
> diff --git a/lib/dpctl.man b/lib/dpctl.man
> index 806e5d8e840d..53b7368e3b77 100644
> --- a/lib/dpctl.man
> +++ b/lib/dpctl.man
> @@ -324,12 +324,13 @@ Only supported for userspace datapath.
>  .TQ
>  \*(DX\fBct\-disable\-tcp\-seq\-chk\fR [\fIdp\fR]
>  Enables or disables TCP sequence checking.  When set to disabled, all
> sequence
> -number verification is disabled, including for TCP resets and hence this
> is
> -similar, but not equivalent to 'be_liberal' mode.  Disabling sequence
> number
> +number verification is disabled, including for TCP resets.  This is
> +similar, but not the same as, 'be_liberal' mode.  Disabling sequence
> number
>  verification is not an optimization in itself, but is needed for some
> hardware
> -offload support which might offer some performance advantage,  This is
> enabled
> +offload support which might offer some performance advantage.  Sequence
> +number checking is enabled
>  by default to enforce better security and should only be disabled if
> -absolutely required. This command is only supported for the userspace
> +required for hardware offload support. This command is only supported for
> the userspace
>  datapath.
>

Looks good; thank you
I ended up with:

Enables or disables TCP sequence checking.  When set to disabled, all
sequence
number verificati

Re: [ovs-dev] [patch v6] conntrack: Optimize recirculations.

2019-09-25 Thread Darrell Ball
On Wed, Sep 25, 2019 at 10:51 AM Ben Pfaff  wrote:

> On Mon, Aug 26, 2019 at 09:05:44AM -0700, Darrell Ball wrote:
> > Cache the 'conn' context and use it when it is valid.  The cached 'conn'
> > context will get reset if it is not expected to be valid; the cost to do
> > this is negligible.  Besides being most optimal, this also handles corner
> > cases, such as decapsulation leading to the same tuple, as in tunnel VPN
> > cases.  A negative test is added to check the resetting of the cached
> > 'conn'.
> >
> > Signed-off-by: Darrell Ball 
>
> Applied to master.  Thank you!
>

cool, thank you


[ovs-dev] [patch v3] conntrack: Add option to disable TCP sequence checking.

2019-09-24 Thread Darrell Ball
This may be needed in some special cases, such as to support some hardware
offload implementations.  Note that disabling TCP sequence number
verification is not an optimization in itself, but supporting some
hardware offload implementations may offer better performance.  TCP
sequence number verification is enabled by default.  This option is only
available for the userspace datapath.  Access to this option is presently
provided via 'dpctl' commands as the need for this option is quite node
specific, by virtual of which nics are in use on a given node.  A test is
added to verify this option.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359188.html
Signed-off-by: Darrell Ball 
---

v3: Make manpage comments more verbose.
Expand commit message comments.

v2: Per particular requirement, support 'no-tcp-seq-chk' rather than
'liberal' mode. 

 NEWS|   3 ++
 lib/conntrack-private.h |   6 ++-
 lib/conntrack-tcp.c |  13 -
 lib/conntrack.c |  16 ++
 lib/conntrack.h |   2 +
 lib/ct-dpif.c   |  16 ++
 lib/ct-dpif.h   |   2 +
 lib/dpctl.c |  64 +-
 lib/dpctl.man   |  18 +++
 lib/dpif-netdev.c   |  18 +++
 lib/dpif-netlink.c  |   2 +
 lib/dpif-provider.h |   5 ++
 tests/ofproto-dpif.at   | 138 
 13 files changed, 298 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index aba2e4a..3b72d1b 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ Post-v2.12.0
  * OVN has been removed from this repository. It now exists as a
separate project. You can find it at
https://github.com/ovn-org/ovn.git
+   - Userspace datapath:
+ * Add option to enable, disable and query TCP sequence checking in
+   conntrack.
 
 v2.12.0 - 03 Sep 2019
 -
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index bcfbe10..7cdc2a4 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -171,8 +171,10 @@ struct conntrack {
 struct hindex alg_expectation_refs OVS_GUARDED; /* For lookup from
  * control context.  */
 
-/* Fragmentation handling context. */
-struct ipf *ipf;
+struct ipf *ipf; /* Fragmentation handling context. */
+atomic_bool tcp_seq_ckk; /* TCP sequence number verification; when
+enabled, this enables sequence number
+verification; enabled by default. */
 };
 
 /* Lock acquisition order:
diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
index 397aca1..1e843f3 100644
--- a/lib/conntrack-tcp.c
+++ b/lib/conntrack-tcp.c
@@ -39,10 +39,15 @@
 #include 
 
 #include "conntrack-private.h"
+#include "coverage.h"
 #include "ct-dpif.h"
 #include "dp-packet.h"
 #include "util.h"
 
+COVERAGE_DEFINE(conntrack_tcp_seq_chk_bypass);
+COVERAGE_DEFINE(conntrack_tcp_seq_chk_failed);
+COVERAGE_DEFINE(conntrack_invalid_tcp_flags);
+
 struct tcp_peer {
 uint32_t   seqlo;  /* Max sequence number sent */
 uint32_t   seqhi;  /* Max the other end ACKd + win */
@@ -162,6 +167,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 uint32_t p_len = tcp_payload_length(pkt);
 
 if (tcp_invalid_flags(tcp_flags)) {
+COVERAGE_INC(conntrack_invalid_tcp_flags);
 return CT_UPDATE_INVALID;
 }
 
@@ -272,7 +278,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 
 int ackskew = check_ackskew ? dst->seqlo - ack : 0;
#define MAXACKWINDOW (0xffff + 1500)    /* 1500 is an arbitrary fudge factor */
-if (SEQ_GEQ(src->seqhi, end)
+if ((SEQ_GEQ(src->seqhi, end)
 /* Last octet inside other's window space */
 && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
 /* Retrans: not more than one window back */
@@ -281,7 +287,9 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 && (ackskew <= (MAXACKWINDOW << sws))
 /* Acking not more than one window forward */
 && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
-|| (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {
+|| (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo)))
+|| (!conntrack_get_tcp_seq_chk(ct)
+? COVERAGE_INC(conntrack_tcp_seq_chk_bypass), 1 : 0)) {
 /* Require an exact/+1 sequence match on resets when possible */
 
 /* update max window */
@@ -385,6 +393,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT;
 }
 } else {
+COVERAGE_INC(conntrack_tcp_seq_chk_failed);
 return CT_UPDATE_INVALID;
 }
 
diff --git a/lib/conntrack.c b/

Re: [ovs-dev] [patch v2] conntrack: Add option to disable TCP sequence checking.

2019-09-24 Thread Darrell Ball
Thanks Ben

I just noticed I sent another version here
https://patchwork.ozlabs.org/patch/1153283/
but there were a couple of issues that deserve a resend.
It might need a rebase as well.

Darrell

On Tue, Sep 24, 2019 at 3:18 PM Ben Pfaff  wrote:

> On Wed, Jun 12, 2019 at 12:44:18PM -0700, Darrell Ball wrote:
> > On Wed, Jun 12, 2019 at 10:58 AM Ben Pfaff  wrote:
> >
> > > On Wed, Jun 12, 2019 at 10:31:17AM -0700, Darrell Ball wrote:
> > > > On Wed, Jun 12, 2019 at 10:09 AM Ben Pfaff  wrote:
> > > >
> > > > > On Wed, Jun 12, 2019 at 08:46:06AM -0700, Darrell Ball wrote:
> > > > > > On Mon, Jun 10, 2019 at 9:51 AM Ben Pfaff  wrote:
> > > > > >
> > > > > > > On Sun, Jun 09, 2019 at 07:35:09AM -0700, Darrell Ball wrote:
> > > > > > > > This may be needed in some special cases, such as to support
> some
> > > > > > > > hardware offload implementations.
> > > > > > > >
> > > > > > > > Reported-at:
> > > > > > >
> > > https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359188.html
> > > > > > > > Signed-off-by: Darrell Ball 
> > > > > > > > ---
> > > > > > > >
> > > > > > > > v2: Per particular requirement, support  'no-tcp-seq-chk'
> rather
> > > than
> > > > > > > > 'liberal' mode.
> > > > > > > >
> > > > > > > > Add some debug counters.
> > > > > > >
> > > > > > > I'm not sure whether an ovs-appctl command is the best way for
> > > users to
> > > > > > > enable and disable this.  It means that it is difficult for an
> > > OpenFlow
> > > > > > > controller to do it, since those commands aren't exposed via
> > > OpenFlow
> > > > > or
> > > > > > > OVSDB.
> > > > > > >
> > > > > >
> > > > > > Thanks for your comments
> > > > > >
> > > > > > For local controller usage, we are using ovs-appctl today in
> similar
> > > > > cases
> > > > > > for existing products.
> > > > > >
> > > > > > In the case of non-local controller usage, the remote controller
> > > would
> > > > > need
> > > > > > remote access.
> > > > > >
> > > > > > However, in this case,  I don't expect the remote controller to
> be
> > > > > > involved; I was assuming
> > > > > > that a deployment script would be used to set the value to
> > > non-default
> > > > > > value (in needed cases)
> > > > > > when ovs-vswitchd is (re)started only. If this assumption cannot
> be
> > > > > > satisfied then we would
> > > > > > > have to introduce a dependency on the database for these
> > > types of
> > > > > > commands.
> > > > >
> > > > > This seems to be teetering toward the pre-SDN model of having to
> > > > > separately configure each switch.  Do you have some rationale in
> mind
> > > > > why this should be a per-node decision rather than one made by the
> > > > > controller?
> > > >
> > > >
> > > > 1/ Because of the reduced security implications vs higher performance
> > > > advantage, it would be a per node (or per node role) decision of
> whether
> > > > to use it or not.
> > >
> > > Are you saying that the only advantage of disabling TCP sequence
> > > checking is performance, and only in the presence of hardware for
> > > offloading that requires it?
> >
> >
> > Some HWOL implementations would be the most common 'recommended' usage.
> > I will be adding a general statement to the documentation and will echo
> it
> > in the commit
> > message.
>
> Is there a v3 with that change?  I haven't been able to find it.
>


Re: [ovs-dev] [patch v1] conntrack: Fix 'check_orig_tuple()' Valgrind false positive.

2019-09-24 Thread Darrell Ball
Thanks Ben

Would you mind applying to 2.12 as well?

Darrell

On Tue, Sep 24, 2019 at 2:34 PM Ben Pfaff  wrote:

> On Mon, Sep 23, 2019 at 04:44:33PM -0700, Darrell Ball wrote:
> > Valgrind reported that 'pkt->md.ct_orig_tuple.ipv4.ipv4_proto' is
> > uninitialized in 'check_orig_tuple()', if 'ct_state' is zero.  Although
> > this is true, the check is superseded, as even if it succeeds the check
> > for natted packets based on 'ct_state' is an ORed condition and is
> intended
> > to catch this case.
> > The check is '!(pkt->md.ct_state & (CS_SRC_NAT | CS_DST_NAT))' which
> > filters out all packets except natted ones.  Move this check up to
> > prevent the Valgrind complaint, which also helps performance and also
> remove
> > recently added redundant check adding extra cycles.
> >
> > Fixes: f44733c527da ("conntrack: Validate accessing of conntrack data in
> pkt_metadata.")
> > CC: Yifeng Sun 
> > Signed-off-by: Darrell Ball 
>
> Thanks, applied to master.
>


[ovs-dev] [patch v1] conntrack: Fix 'check_orig_tuple()' Valgrind false positive.

2019-09-23 Thread Darrell Ball
Valgrind reported that 'pkt->md.ct_orig_tuple.ipv4.ipv4_proto' is
uninitialized in 'check_orig_tuple()', if 'ct_state' is zero.  Although
this is true, the check is superseded, as even if it succeeds the check
for natted packets based on 'ct_state' is an ORed condition and is intended
to catch this case.
The check is '!(pkt->md.ct_state & (CS_SRC_NAT | CS_DST_NAT))' which
filters out all packets except natted ones.  Move this check up to
prevent the Valgrind complaint, which also helps performance, and also remove
a recently added redundant check that adds extra cycles.
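
The reordering relies on C evaluating '||' left to right and stopping at the
first true operand.  The sketch below is an illustration only: the flag
values, 'struct sketch_md' and the counting accessor are hypothetical and are
not OVS definitions.  It shows that once the state test comes first, the
possibly-uninitialized field is never read when the state has no NAT bits
set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values and metadata layout, chosen for the sketch. */
#define SKETCH_SRC_NAT 0x1
#define SKETCH_DST_NAT 0x2

struct sketch_md {
    uint32_t ct_state;     /* Zero when the packet has no conntrack state. */
    uint8_t orig_proto;    /* Only meaningful when 'ct_state' is nonzero. */
};

/* Counts reads of 'orig_proto' so the short-circuit can be observed. */
static unsigned int sketch_proto_reads;

static uint8_t
sketch_read_orig_proto(const struct sketch_md *md)
{
    sketch_proto_reads++;
    return md->orig_proto;
}

/* Returns true if the original-tuple lookup should be skipped.  The state
 * test is first, so 'orig_proto' is only read for natted packets. */
static bool
sketch_skip_lookup(const struct sketch_md *md)
{
    assert(md != NULL);
    return !(md->ct_state & (SKETCH_SRC_NAT | SKETCH_DST_NAT))
           || !sketch_read_orig_proto(md);
}
```

Calling 'sketch_skip_lookup()' on a packet with 'ct_state' of zero leaves
'sketch_proto_reads' unchanged, which is the property that silences the
Valgrind report.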

Fixes: f44733c527da ("conntrack: Validate accessing of conntrack data in pkt_metadata.")
CC: Yifeng Sun 
Signed-off-by: Darrell Ball 
---
 lib/conntrack.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index fd71e6c..b56ef06 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1005,11 +1005,11 @@ check_orig_tuple(struct conntrack *ct, struct dp_packet *pkt,
  struct conn **conn,
  const struct nat_action_info_t *nat_action_info)
 {
-if ((ctx_in->key.dl_type == htons(ETH_TYPE_IP) &&
+if (!(pkt->md.ct_state & (CS_SRC_NAT | CS_DST_NAT)) ||
+(ctx_in->key.dl_type == htons(ETH_TYPE_IP) &&
  !pkt->md.ct_orig_tuple.ipv4.ipv4_proto) ||
 (ctx_in->key.dl_type == htons(ETH_TYPE_IPV6) &&
  !pkt->md.ct_orig_tuple.ipv6.ipv6_proto) ||
-!(pkt->md.ct_state & (CS_SRC_NAT | CS_DST_NAT)) ||
 nat_action_info) {
 return false;
 }
@@ -1142,8 +1142,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 handle_nat(pkt, conn, zone, ctx->reply, ctx->icmp_related);
 }
 
-} else if (pkt->md.ct_state
-   && check_orig_tuple(ct, pkt, ctx, now, &conn, nat_action_info)) {
+} else if (check_orig_tuple(ct, pkt, ctx, now, &conn, nat_action_info)) {
 create_new_conn = conn_update_state(ct, pkt, ctx, conn, now);
 } else {
 if (ctx->icmp_related) {
-- 
1.9.1



[ovs-dev] [patch v2] conntrack: Fix 'reverse_nat_packet()' variable datatype.

2019-08-30 Thread Darrell Ball
The variable 'pad' in the function 'reverse_nat_packet()' was incorrectly
declared as 'char' instead of 'uint8_t'. This can affect reverse natting
of icmpX packets with padding > 127 bytes.  At the same time, add some
comments regarding 'extract_l3_ipvX' usage in this function.  Found by
inspection.
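
To make the "> 127 bytes" remark concrete, here is a small sketch of the
arithmetic involved.  It is not OVS code: both helper names are hypothetical,
and 'signed char' is used explicitly because the signedness of plain 'char'
is implementation-defined.  The conversion of a value above 127 to
'signed char' is itself implementation-defined; on common two's-complement
compilers it wraps to a negative number, so subtracting the pad ends up
adding to the length instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes between 'inner' and 'tail' after excluding
 * 'pad_size' trailing pad bytes, with the pad held in a signed 8-bit
 * variable as in the buggy declaration. */
static ptrdiff_t
sketch_len_signed_pad(const char *inner, const char *tail, uint8_t pad_size)
{
    signed char pad = (signed char) pad_size;   /* 200 typically becomes -56. */
    return tail - inner - pad;
}

/* Same computation with the pad held in an unsigned 8-bit variable. */
static ptrdiff_t
sketch_len_unsigned_pad(const char *inner, const char *tail, uint8_t pad_size)
{
    assert(tail - inner >= pad_size);
    uint8_t pad = pad_size;
    return tail - inner - pad;
}
```

For a 300-byte span with 200 bytes of padding, the unsigned version yields
100, while the signed version typically yields a length larger than the span
itself.  For pad sizes up to 127 the two agree.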

Fixes: edd1bef468c0 ("dpdk: Add more ICMP Related NAT support.")
Signed-off-by: Darrell Ball 
---

v2: Elaborate added comments.

 lib/conntrack.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index e5266e5..6452d82 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -688,7 +688,7 @@ static void
 reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
 {
 char *tail = dp_packet_tail(pkt);
-char pad = dp_packet_l2_pad_size(pkt);
+uint8_t pad = dp_packet_l2_pad_size(pkt);
 struct conn_key inner_key;
 const char *inner_l4 = NULL;
 uint16_t orig_l3_ofs = pkt->l3_ofs;
@@ -698,6 +698,8 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
 struct ip_header *nh = dp_packet_l3(pkt);
 struct icmp_header *icmp = dp_packet_l4(pkt);
 struct ip_header *inner_l3 = (struct ip_header *) (icmp + 1);
+/* This call is already verified to succeed during the code path from
+ * 'conn_key_extract()' which calls 'extract_l4_icmp()'. */
 extract_l3_ipv4(&inner_key, inner_l3, tail - ((char *)inner_l3) - pad,
 &inner_l4, false);
 pkt->l3_ofs += (char *) inner_l3 - (char *) nh;
@@ -719,6 +721,8 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
 struct icmp6_error_header *icmp6 = dp_packet_l4(pkt);
 struct ovs_16aligned_ip6_hdr *inner_l3_6 =
 (struct ovs_16aligned_ip6_hdr *) (icmp6 + 1);
+/* This call is already verified to succeed during the code path from
+ * 'conn_key_extract()' which calls 'extract_l4_icmp6()'. */
 extract_l3_ipv6(&inner_key, inner_l3_6,
 tail - ((char *)inner_l3_6) - pad,
 &inner_l4);
-- 
1.9.1



[ovs-dev] [patch v1] conntrack: Fix 'reverse_nat_packet()' variable datatype.

2019-08-29 Thread Darrell Ball
The variable 'pad' in the function 'reverse_nat_packet()' was incorrectly
declared as 'char' instead of 'uint8_t'. This can affect reverse natting
of icmpX packets with padding > 127 bytes.  At the same time, add some
comments regarding 'extract_l3_ipvX' usage in this function.  Found by
inspection.

Fixes: edd1bef468c0 ("dpdk: Add more ICMP Related NAT support.")
Signed-off-by: Darrell Ball 
---
 lib/conntrack.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index e5266e5..79d2711 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -688,7 +688,7 @@ static void
 reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
 {
 char *tail = dp_packet_tail(pkt);
-char pad = dp_packet_l2_pad_size(pkt);
+uint8_t pad = dp_packet_l2_pad_size(pkt);
 struct conn_key inner_key;
 const char *inner_l4 = NULL;
 uint16_t orig_l3_ofs = pkt->l3_ofs;
@@ -698,6 +698,7 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
 struct ip_header *nh = dp_packet_l3(pkt);
 struct icmp_header *icmp = dp_packet_l4(pkt);
 struct ip_header *inner_l3 = (struct ip_header *) (icmp + 1);
+/* conn_key_extract() usage already verified extraction success. */
 extract_l3_ipv4(&inner_key, inner_l3, tail - ((char *)inner_l3) - pad,
 &inner_l4, false);
 pkt->l3_ofs += (char *) inner_l3 - (char *) nh;
@@ -719,6 +720,7 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
 struct icmp6_error_header *icmp6 = dp_packet_l4(pkt);
 struct ovs_16aligned_ip6_hdr *inner_l3_6 =
 (struct ovs_16aligned_ip6_hdr *) (icmp6 + 1);
+/* conn_key_extract() usage already verified extraction success. */
 extract_l3_ipv6(&inner_key, inner_l3_6,
 tail - ((char *)inner_l3_6) - pad,
 &inner_l4);
-- 
1.9.1



Re: [ovs-dev] [patch v2] conntrack: Fix ICMPv4 error data L4 length check.

2019-08-29 Thread Darrell Ball
On Thu, Aug 29, 2019 at 7:37 AM Ben Pfaff  wrote:

> On Tue, Aug 27, 2019 at 04:59:02PM -0700, Darrell Ball wrote:
> > The ICMPv4 error data L4 length check was found to be too strict for TCP,
> > expecting a minimum of 20 rather than 8 bytes.  This worked by
> > happenstance for other inner protocols.  The approach is to explicitly
> > handle the ICMPv4 error data L4 length check and to do this for all
> > supported inner protocols in the same way.  Making the code common
> > between protocols also allows the existing ICMPv4 related UDP tests to
> > cover TCP and ICMP inner protocol cases.
> > Note that ICMPv6 does not have an 8 byte limit for error L4 data.
> >
> > Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
> > CC: Daniele Di Proietto 
> > Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
> > Reported-by: Vishal Deep Ajmera 
> > Signed-off-by: Vishal Deep Ajmera 
> > Co-authored-by: Vishal Deep Ajmera 
> > Signed-off-by: Darrell Ball 
>
> Thanks, applied to master and branch-2.12.
>

Thanks
This is eligible to go back to 2.6; it should apply cleanly back to 2.9; I
can look into the remaining ones,
unless Vishal would like to do those.


Re: [ovs-dev] [PATCH v3] userspace: Enable non-bridge port as tunnel endpoint.

2019-08-29 Thread Darrell Ball
On Thu, Aug 29, 2019 at 8:40 AM Yifeng Sun  wrote:

> Thanks Ben and Darrell, let me check it out.
>

Can you also add a use case description?

From a controller POV, defining which interface is a vtep can be done in one
place and distributed to the rest of the system; it is not much code.  There
is more work associated with handling the difference between datapath types
(kernel vs userspace), plus other things.

Using the bridge internal interface for the vtep, as is already supported,
has some advantages
including that it is already there.


>
> On Wed, Aug 28, 2019 at 12:46 PM Darrell Ball  wrote:
> >
> > Thanks for the patch
> >
> > How about writing a system test?
> >
> > Darrell
> >
> > On Wed, Aug 28, 2019 at 10:50 AM Yifeng Sun 
> wrote:
> >>
> >> For userspace datapath, currently only the bridge itself, the LOCAL
> port,
> >> can be the tunnel endpoint to encap/decap tunnel packets.  This patch
> >> enables non-bridge port as tunnel endpoint.  One use case is for users
> to
> >> create a bridge and a vtep port as tap, and configure underlay IP at
> vtep
> >> port as the tunnel endpoint.
> >>
> >> This patch causes failure for test "ptap - L3 over patch port". This is
> >> because this test is already using non-bridge port gre1 as tunnel
> endpoint.
> >> In this test, an extra flow is added to support this, as shown below:
> >>   ovs-ofctl add-flow br1 in_port=p1,actions=output=gre1
> >>
> >> It later generates a datapath flow which matches an extra eth field:
> >>   - recirc_id(0),...,eth_type(0x0800),...
> >>   + recirc_id(0),...,eth(dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),...
> >>
> >> With this patch, the above flow is no longer needed.
> >>
> >> Signed-off-by: William Tu 
> >> Co-authored-by: William Tu 
> >> Signed-off-by: Yifeng Sun 
> >> ---
> >> v1->v2: Fixed an error pointed out by Ben.
> >> v2->v3: Fixed a test failure, thanks Ben for review and testing!
> >>  ofproto/ofproto-dpif-xlate.c | 56
> +++-
> >>  tests/packet-type-aware.at   |  1 -
> >>  tests/tunnel-push-pop.at | 55
> +++
> >>  3 files changed, 100 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >> index 02a2a4535542..290924634f36 100644
> >> --- a/ofproto/ofproto-dpif-xlate.c
> >> +++ b/ofproto/ofproto-dpif-xlate.c
> >> @@ -3410,6 +3410,19 @@ tnl_route_lookup_flow(const struct xlate_ctx
> *ctx,
> >>  }
> >>  }
> >>  }
> >> +
> >> +/* If tunnel IP isn't configured on bridges, then we search all
> ports. */
> >> +HMAP_FOR_EACH (xbridge, hmap_node, &ctx->xcfg->xbridges) {
> >> +struct xport *port;
> >> +
> >> +HMAP_FOR_EACH (port, ofp_node, &xbridge->xports) {
> >> +if (!strncmp(netdev_get_name(port->netdev),
> >> + out_dev, IFNAMSIZ)) {
> >> +*out_port = port;
> >> +return 0;
> >> +}
> >> +}
> >> +}
> >>  return -ENOENT;
> >>  }
> >>
> >> @@ -3972,6 +3985,16 @@ is_nd_dst_correct(const struct flow *flow, const
> struct in6_addr *ipv6_addr)
> >>  IN6_ARE_ADDR_EQUAL(&flow->ipv6_dst, ipv6_addr);
> >>  }
> >>
> >> +static bool
> >> +is_neighbor_reply_matched(const struct flow *flow, struct in6_addr
> *ip_addr)
> >> +{
> >> +return ((IN6_IS_ADDR_V4MAPPED(ip_addr) &&
> >> + flow->dl_type == htons(ETH_TYPE_ARP) &&
> >> + in6_addr_get_mapped_ipv4(ip_addr) == flow->nw_dst) ||
> >> +(!IN6_IS_ADDR_V4MAPPED(ip_addr) &&
> >> +  is_nd_dst_correct(flow, ip_addr)));
> >> +}
> >> +
> >>  /* Function verifies if the ARP reply or Neighbor Advertisement
> represented by
> >>   * 'flow' addresses the 'xbridge' of 'ctx'. Returns true if the ARP TA
> or
> >>   * neighbor discovery destination is in the list of configured IP
> addresses of
> >> @@ -3986,11 +4009,7 @@ is_neighbor_reply_correct(const struct xlate_ctx
> *ctx, const struct flow *flow)
> >>  /* Verify if 'nw_dst' of ARP or 'ipv6_dst' of ICMPV6 is in the
> list. */
> >>  for (i = 0; xbridge_addr && i < xbridge_a

Re: [ovs-dev] [PATCH v3] userspace: Enable non-bridge port as tunnel endpoint.

2019-08-28 Thread Darrell Ball
Thanks for the patch

How about writing a system test?

Darrell

On Wed, Aug 28, 2019 at 10:50 AM Yifeng Sun  wrote:

> For userspace datapath, currently only the bridge itself, the LOCAL port,
> can be the tunnel endpoint to encap/decap tunnel packets.  This patch
> enables non-bridge port as tunnel endpoint.  One use case is for users to
> create a bridge and a vtep port as tap, and configure underlay IP at vtep
> port as the tunnel endpoint.
>
> This patch causes failure for test "ptap - L3 over patch port". This is
> because this test is already using non-bridge port gre1 as tunnel endpoint.
> In this test, an extra flow is added to support this, as shown below:
>   ovs-ofctl add-flow br1 in_port=p1,actions=output=gre1
>
> It later generates a datapath flow which matches an extra eth field:
>   - recirc_id(0),...,eth_type(0x0800),...
>   + recirc_id(0),...,eth(dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),...
>
> With this patch, the above flow is no longer needed.
>
> Signed-off-by: William Tu 
> Co-authored-by: William Tu 
> Signed-off-by: Yifeng Sun 
> ---
> v1->v2: Fixed an error pointed out by Ben.
> v2->v3: Fixed a test failure, thanks Ben for review and testing!
>  ofproto/ofproto-dpif-xlate.c | 56
> +++-
>  tests/packet-type-aware.at   |  1 -
>  tests/tunnel-push-pop.at | 55
> +++
>  3 files changed, 100 insertions(+), 12 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 02a2a4535542..290924634f36 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3410,6 +3410,19 @@ tnl_route_lookup_flow(const struct xlate_ctx *ctx,
>  }
>  }
>  }
> +
> +/* If tunnel IP isn't configured on bridges, then we search all
> ports. */
> +HMAP_FOR_EACH (xbridge, hmap_node, &ctx->xcfg->xbridges) {
> +struct xport *port;
> +
> +HMAP_FOR_EACH (port, ofp_node, &xbridge->xports) {
> +if (!strncmp(netdev_get_name(port->netdev),
> + out_dev, IFNAMSIZ)) {
> +*out_port = port;
> +return 0;
> +}
> +}
> +}
>  return -ENOENT;
>  }
>
> @@ -3972,6 +3985,16 @@ is_nd_dst_correct(const struct flow *flow, const
> struct in6_addr *ipv6_addr)
>  IN6_ARE_ADDR_EQUAL(&flow->ipv6_dst, ipv6_addr);
>  }
>
> +static bool
> +is_neighbor_reply_matched(const struct flow *flow, struct in6_addr
> *ip_addr)
> +{
> +return ((IN6_IS_ADDR_V4MAPPED(ip_addr) &&
> + flow->dl_type == htons(ETH_TYPE_ARP) &&
> + in6_addr_get_mapped_ipv4(ip_addr) == flow->nw_dst) ||
> +(!IN6_IS_ADDR_V4MAPPED(ip_addr) &&
> +  is_nd_dst_correct(flow, ip_addr)));
> +}
> +
>  /* Function verifies if the ARP reply or Neighbor Advertisement
> represented by
>   * 'flow' addresses the 'xbridge' of 'ctx'. Returns true if the ARP TA or
>   * neighbor discovery destination is in the list of configured IP
> addresses of
> @@ -3986,11 +4009,7 @@ is_neighbor_reply_correct(const struct xlate_ctx
> *ctx, const struct flow *flow)
>  /* Verify if 'nw_dst' of ARP or 'ipv6_dst' of ICMPV6 is in the list.
> */
>  for (i = 0; xbridge_addr && i < xbridge_addr->n_addr; i++) {
>  struct in6_addr *ip_addr = &xbridge_addr->addr[i];
> -if ((IN6_IS_ADDR_V4MAPPED(ip_addr) &&
> - flow->dl_type == htons(ETH_TYPE_ARP) &&
> - in6_addr_get_mapped_ipv4(ip_addr) == flow->nw_dst) ||
> -(!IN6_IS_ADDR_V4MAPPED(ip_addr) &&
> -  is_nd_dst_correct(flow, ip_addr))) {
> +if (is_neighbor_reply_matched(flow, ip_addr)) {
>  /* Found a match. */
>  ret = true;
>  break;
> @@ -3998,20 +4017,35 @@ is_neighbor_reply_correct(const struct xlate_ctx
> *ctx, const struct flow *flow)
>  }
>
>  xbridge_addr_unref(xbridge_addr);
> +
> +/* If not found in bridge's IPs, search in its ports. */
> +if (!ret) {
> +struct in6_addr *ip_addr, *mask;
> +struct xport *port;
> +int error, n_in6;
> +
> +HMAP_FOR_EACH (port, ofp_node, &ctx->xbridge->xports) {
> +error = netdev_get_addr_list(port->netdev, &ip_addr,
> + &mask, &n_in6);
> +if (!error && is_neighbor_reply_matched(flow, ip_addr)) {
> +/* Found a match. */
> +ret = true;
> +break;
> +}
> +}
> +}
>  return ret;
>  }
>
>  static bool
> -terminate_native_tunnel(struct xlate_ctx *ctx, ofp_port_t ofp_port,
> -struct flow *flow, struct flow_wildcards *wc,
> -odp_port_t *tnl_port)
> +terminate_native_tunnel(struct xlate_ctx *ctx, struct flow *flow,
> +struct flow_wildcards *wc, odp_port_t *tnl_port)
>  {
>  *tnl_port = ODPP_NONE;
>
>  /* XXX: Write better Filter for tunnel port. We can use 

Re: [ovs-dev] [patch v1] conntrack: Fix ICMPV4 error data L4 length check.

2019-08-28 Thread Darrell Ball
On Wed, Aug 28, 2019 at 1:43 AM Vishal Deep Ajmera <
vishal.deep.ajm...@ericsson.com> wrote:

> That is interesting
>
> i just tried applying on top of tree and I see that the git applies some
> changes (2 lines)
>
> in extract_l4_icmp6() rather the intended extract_l4_icmp() as in the
> patch I sent out.
>
> My guess is that the surrounding lines are identical in the 2 functions
> and I had other
>
> patches in the same branch shifting the patch downward, hence git applied
> the changes
>
> to extract_l4_icmp6() rather than extract_l4_icmp()
>
>
>
> I'll make the changes on a clean branch and resend.
>
>
>
> Thanks. I applied this patch and looks ok to me.
>

Thanks for confirming


>
>
> JTBC, the 8 byte ICMP error data L4 length restriction is only for V4.
>
> ICMP6 does not have this restriction; see
> https://tools.ietf.org/html/rfc4443
>
>
>
> In my opinion, we should limit the check to < 8 bytes even in case of
> ICMPv6 as that is all
>
> is required from the TCP header to extract port numbers and aligns it with
> ICMPv4.
>
> Specially because RFC is not mandating minimum size for L4 header in case
> of ICMPv6.
>

For V6, the ICMP error L4 length can legitimately even be zero when a large
extension header is present, at least in theory; in practice, such packets
are almost certainly crafted.  The existing check for ICMP6 is mostly
permissive but reasonable.

The ICMP6 sanity check can be enhanced.  For the most part, it can be made
more strict while still handling the above corner case correctly.

I don't think we should mess with enhancing ICMP6 as part of this bug fix
for ICMPv4.  That's why this patch leaves ICMP6 unmodified.


>
>


[ovs-dev] [patch v2] conntrack: Fix ICMPv4 error data L4 length check.

2019-08-27 Thread Darrell Ball
The ICMPv4 error data L4 length check was found to be too strict for TCP,
expecting a minimum of 20 rather than 8 bytes.  This worked by
happenstance for other inner protocols.  The approach is to explicitly
handle the ICMPv4 error data L4 length check and to do this for all
supported inner protocols in the same way.  Making the code common
between protocols also allows the existing ICMPv4 related UDP tests to
cover TCP and ICMP inner protocol cases.
Note that ICMPv6 does not have an 8 byte limit for error L4 data.
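
The optional minimum-length override described above can be sketched
independently of the patch.  This is an illustration under stated
assumptions, not the code being submitted: the 'SKETCH_' constants,
'struct sketch_ports' and 'sketch_extract_tcp_ports()' are hypothetical
names.  The point is that the TCP ports sit in the first four bytes of the
header, so an 8-byte fragment carried in an ICMPv4 error is enough to extract
them, whereas the default 20-byte minimum would reject it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for the sketch. */
#define SKETCH_TCP_HEADER_LEN 20
#define SKETCH_ICMP_ERROR_DATA_L4_LEN 8

struct sketch_ports {
    uint16_t src;
    uint16_t dst;
};

/* Extracts the TCP ports from 'data'.  If 'chk_len' is nonnull, '*chk_len'
 * replaces the default minimum length, which is how a caller parsing ICMPv4
 * error data can ask for the relaxed 8-byte check. */
static bool
sketch_extract_tcp_ports(struct sketch_ports *out, const uint8_t *data,
                         size_t size, const size_t *chk_len)
{
    size_t min_len = chk_len ? *chk_len : SKETCH_TCP_HEADER_LEN;

    assert(min_len >= 4);   /* The ports occupy the first four bytes. */
    if (size < min_len) {
        return false;
    }
    out->src = (uint16_t) ((data[0] << 8) | data[1]);
    out->dst = (uint16_t) ((data[2] << 8) | data[3]);
    return true;
}
```

Passing NULL for 'chk_len' keeps the full-header check for ordinary packets;
only the nested-in-ICMPv4 caller supplies the shorter length.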

Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
CC: Daniele Di Proietto 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
Reported-by: Vishal Deep Ajmera 
Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Vishal Deep Ajmera 
Signed-off-by: Darrell Ball 
---

v2: Rebase to fix git applying to wrong function (extract_l4_icmp6 vs
extract_l4_icmp) with same match signature.

Minor fix to related comment in extract_l4().


 lib/conntrack.c | 41 -
 lib/packets.h   |  3 +++
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5f60fea..e5266e5 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1514,9 +1514,10 @@ check_l4_icmp6(const struct conn_key *key, const void *data, size_t size,
 }
 
 static inline bool
-extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
+extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
-if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : TCP_HEADER_LEN))) {
 return false;
 }
 
@@ -1529,9 +1530,10 @@ extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
 }
 
 static inline bool
-extract_l4_udp(struct conn_key *key, const void *data, size_t size)
+extract_l4_udp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
-if (OVS_UNLIKELY(size < UDP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : UDP_HEADER_LEN))) {
 return false;
 }
 
@@ -1545,7 +1547,7 @@ extract_l4_udp(struct conn_key *key, const void *data, size_t size)
 
 static inline bool extract_l4(struct conn_key *key, const void *data,
   size_t size, bool *related, const void *l3,
-  bool validate_checksum);
+  bool validate_checksum, size_t *chk_len);
 
 static uint8_t
 reverse_icmp_type(uint8_t type)
@@ -1577,9 +1579,9 @@ reverse_icmp_type(uint8_t type)
  * possible */
 static inline int
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
-bool *related)
+bool *related, size_t *chk_len)
 {
-if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : ICMP_HEADER_LEN))) {
 return false;
 }
 
@@ -1630,8 +1632,9 @@ extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
 key->src = inner_key.src;
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
+size_t check_len = ICMP_ERROR_DATA_L4_LEN;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, false, &check_len);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -1718,7 +1721,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, false, NULL);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -1743,26 +1746,29 @@ extract_l4_icmp6(struct conn_key *key, const void *data, size_t size,
  * an ICMP or ICMP6 header.
  *
  * If 'related' is NULL, it means that we're already parsing a header nested
- * in an ICMP error.  In this case, we skip checksum and length validation. */
+ * in an ICMP error.  In this case, we skip the checksum and some length
+ * validations. */
 static inline bool
 extract_l4(struct conn_key *key, const void *data, size_t size, bool *related,
-   const void *l3, bool validate_checksum)
+   const void *l3, bool validate_checksum, size_t *chk_len)
 {
 if (key->nw_proto == IPPROTO_TCP) {
 return (!related || check_l4_tcp(key, data, size, l3,
-validate_checksum)) && extract_l4_tcp(key, data, size);
+validate_checksum))
+   && extract_l4_tcp(key, data, size, chk_len);
 } else if (key->nw_proto == IPPROTO_UDP) {
 return (!related || check_l4_udp(key, data, size, l3,
-validate_checksum)) && extract_l4_udp(key, data, size);
+  

Re: [ovs-dev] [patch v1] conntrack: Fix ICMPV4 error data L4 length check.

2019-08-27 Thread Darrell Ball
On Tue, Aug 27, 2019 at 2:02 AM Vishal Deep Ajmera <
vishal.deep.ajm...@ericsson.com> wrote:

> Hi Darrell,
>
> Thanks for the patch. When I applied the patch to latest master,
> I see that we take care of length check (< 8) only for ICMPv6 and
> not for ICMPv4.


That is interesting.
I just tried applying on top of tree and I see that git applies some
changes (2 lines) in extract_l4_icmp6() rather than the intended
extract_l4_icmp(), as in the patch I sent out.
My guess is that the surrounding lines are identical in the 2 functions and
I had other patches in the same branch shifting the patch downward, hence
git applied the changes to extract_l4_icmp6() rather than extract_l4_icmp().

I'll make the changes on a clean branch and resend.

JTBC, the 8 byte ICMP error data L4 length restriction is only for V4.
ICMP6 does not have this restriction; see
https://tools.ietf.org/html/rfc4443



> We need to do it for ICMPv4 as well.
>
>
The intention is only for ICMPv4; see above.


> Also, we are already using 'related' to skip or not to skip length check.
>

We cannot use 'related' for this as ICMPv6 is different than ICMPv4


>  * If 'related' is NULL, it means that we're already parsing a header
> nested
>  * in an ICMP error.  In this case, we skip checksum and length validation.
>
> However we continue to validate length in extract_l4_tcp (<8 or <20).
> I understand that check for minimum 8 bytes header is needed to make
> sure we can extract tcp port numbers.
>

No, it is a sanity check like other sanity checks.
The length check varies depending on the context.


> Can we instead try to converge all checks at one place and still take care
> of nested header? In my opinion it will simplify the code.
>

See the information above


>
> Warm Regards,
> Vishal Ajmera
>


Re: [ovs-dev] [PATCH] conntrack: Correct length check for tcp packet inside ICMP data.

2019-08-26 Thread Darrell Ball
On Fri, Aug 23, 2019 at 9:09 AM Darrell Ball  wrote:

> Thanks for the patch
>
> Goes back to release 2.6/day one :-).
>
> I'll provide more feedback after today.
>

I sent an alternative patch here

https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/362013.html

Please take a look.


>
> On Fri, Aug 23, 2019 at 6:20 AM Vishal Deep Ajmera <
> vishal.deep.ajm...@ericsson.com> wrote:
>
>> An ICMP packet with type destination or host not reachable also carries
>> 28 bytes of ICMP data field. This data field contains IP header and TCP
>> header (partial first 8 bytes) of the original packet for which ICMP
>> is being generated.
>>
>> Conntrack module when processing these ICMP packets checks for TCP header
>> length (20 bytes). Since TCP header is partial the length check fails and
>> packet is erroneously dropped.
>>
>> This patch fixes length check for TCP header when processing ICMP data
>> fields.
>>
>> Signed-off-by: Vishal Deep Ajmera 
>> ---
>>  lib/conntrack.c | 14 +++---
>>  lib/packets.h   |  1 +
>>  2 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 5f60fea..0618fdd 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -1513,10 +1513,18 @@ check_l4_icmp6(const struct conn_key *key, const
>> void *data, size_t size,
>>  return validate_checksum ? checksum_valid(key, data, size, l3) :
>> true;
>>  }
>>
>> +/* If related is NULL, we are parsing nested TCP header  inside ICMP
>> packet.
>> + * Only 8 bytes of TCP header is required by RFC to be present in such
>> case.
>> + */
>>  static inline bool
>> -extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
>> +extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
>> +   bool *related)
>>  {
>> -if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
>> +if (!related) {
>> +if (size < ICMP_L4_DATA_LEN) {
>> +return false;
>> +}
>> +} else if (size < TCP_HEADER_LEN) {
>>  return false;
>>  }
>>
>> @@ -1750,7 +1758,7 @@ extract_l4(struct conn_key *key, const void *data,
>> size_t size, bool *related,
>>  {
>>  if (key->nw_proto == IPPROTO_TCP) {
>>  return (!related || check_l4_tcp(key, data, size, l3,
>> -validate_checksum)) && extract_l4_tcp(key, data, size);
>> +  validate_checksum)) && extract_l4_tcp(key, data, size,
>> related);
>>  } else if (key->nw_proto == IPPROTO_UDP) {
>>  return (!related || check_l4_udp(key, data, size, l3,
>>  validate_checksum)) && extract_l4_udp(key, data, size);
>> diff --git a/lib/packets.h b/lib/packets.h
>> index a4bee38..2bc65c9 100644
>> --- a/lib/packets.h
>> +++ b/lib/packets.h
>> @@ -886,6 +886,7 @@ struct tcp_header {
>>  ovs_be16 tcp_urg;
>>  };
>>  BUILD_ASSERT_DECL(TCP_HEADER_LEN == sizeof(struct tcp_header));
>> +#define ICMP_L4_DATA_LEN 8
>>
>>  /* Connection states.
>>   *
>> --
>> 1.9.1
>>
>>
>


Re: [ovs-dev] [patch v1] conntrack: Fix ICMPV4 error data L4 length check.

2019-08-26 Thread Darrell Ball
Resent this patch, as it had a bad e-mail address

Darrell

On Mon, Aug 26, 2019 at 9:06 AM Darrell Ball  wrote:

> The ICMP error data L4 length check was found to be too strict for TCP,
> expecting a minimum of 20 rather than 8 bytes.  This worked by
> happenstance for other inner protocols.  The approach is to explicitly
> handle the ICMPV4 error data L4 length check and to do this for all
> supported inner protocols in the same way.  Making the code common
> between protocols also allows the existing ICMP related UDP tests to
> cover TCP and ICMP cases.
>
> Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
> CC: Daniele Di Proietto 
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
> Reported-by: Vishal Deep Ajmera 
> Signed-off-by: Vishal Deep Ajmera 
> Co-authored-by: Vishal Deep Ajmera 
> Signed-off-by: Darrell Ball 
> ---
>  lib/conntrack.c | 38 ++
>  lib/packets.h   |  3 +++
>  2 files changed, 25 insertions(+), 16 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index ecf3bcc..de0ab9b 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1565,9 +1565,10 @@ check_l4_icmp6(const struct conn_key *key, const
> void *data, size_t size,
>  }
>
>  static inline bool
> -extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
> +extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
> +   size_t *chk_len)
>  {
> -if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
> +if (OVS_UNLIKELY(size < (chk_len ? *chk_len : TCP_HEADER_LEN))) {
>  return false;
>  }
>
> @@ -1580,9 +1581,10 @@ extract_l4_tcp(struct conn_key *key, const void
> *data, size_t size)
>  }
>
>  static inline bool
> -extract_l4_udp(struct conn_key *key, const void *data, size_t size)
> +extract_l4_udp(struct conn_key *key, const void *data, size_t size,
> +   size_t *chk_len)
>  {
> -if (OVS_UNLIKELY(size < UDP_HEADER_LEN)) {
> +if (OVS_UNLIKELY(size < (chk_len ? *chk_len : UDP_HEADER_LEN))) {
>  return false;
>  }
>
> @@ -1596,7 +1598,7 @@ extract_l4_udp(struct conn_key *key, const void
> *data, size_t size)
>
>  static inline bool extract_l4(struct conn_key *key, const void *data,
>size_t size, bool *related, const void *l3,
> -  bool validate_checksum);
> +  bool validate_checksum, size_t *chk_len);
>
>  static uint8_t
>  reverse_icmp_type(uint8_t type)
> @@ -1628,9 +1630,9 @@ reverse_icmp_type(uint8_t type)
>   * possible */
>  static inline int
>  extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
> -bool *related)
> +bool *related, size_t *chk_len)
>  {
> -if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
> +if (OVS_UNLIKELY(size < (chk_len ? *chk_len : ICMP_HEADER_LEN))) {
>  return false;
>  }
>
> @@ -1681,8 +1683,9 @@ extract_l4_icmp(struct conn_key *key, const void
> *data, size_t size,
>  key->src = inner_key.src;
>  key->dst = inner_key.dst;
>  key->nw_proto = inner_key.nw_proto;
> +size_t check_len = ICMP_ERROR_DATA_L4_LEN;
>
> -ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
> +ok = extract_l4(key, l4, tail - l4, NULL, l3, false, &check_len);
>  if (ok) {
>  conn_key_reverse(key);
>  *related = true;
> @@ -1769,7 +1772,7 @@ extract_l4_icmp6(struct conn_key *key, const void
> *data, size_t size,
>  key->dst = inner_key.dst;
>  key->nw_proto = inner_key.nw_proto;
>
> -ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
> +ok = extract_l4(key, l4, tail - l4, NULL, l3, false, NULL);
>  if (ok) {
>  conn_key_reverse(key);
>  *related = true;
> @@ -1797,23 +1800,25 @@ extract_l4_icmp6(struct conn_key *key, const void
> *data, size_t size,
>   * in an ICMP error.  In this case, we skip checksum and length
> validation. */
>  static inline bool
>  extract_l4(struct conn_key *key, const void *data, size_t size, bool
> *related,
> -   const void *l3, bool validate_checksum)
> +   const void *l3, bool validate_checksum, size_t *chk_len)
>  {
>  if (key->nw_proto == IPPROTO_TCP) {
>  return (!related || check_l4_tcp(key, data, size, l3,
> -validate_checksum)) && extract_l4_tcp(key, data, size);
> +validate_checksum))
> +   && extract_l4_tcp(key, data, size, chk_len);
>  } else i

[ovs-dev] [patch v1] conntrack: Fix ICMPV4 error data L4 length check.

2019-08-26 Thread Darrell Ball
The ICMP error data L4 length check was found to be too strict for TCP,
expecting a minimum of 20 rather than 8 bytes.  This worked by
happenstance for other inner protocols.  The approach is to explicitly
handle the ICMPV4 error data L4 length check and to do this for all
supported inner protocols in the same way.  Making the code common
between protocols also allows the existing ICMP related UDP tests to
cover TCP and ICMP cases.
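
As a rough illustration of the approach (a sketch only, not the patch itself):
the extractor takes an optional minimum length and falls back to the full
header length when none is given. The sketch_ names and the host-byte-order
port struct are assumptions made for this example; the real code stores ports
in the connection key.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TCP_HEADER_LEN 20
#define SKETCH_ICMP_ERROR_DATA_L4_LEN 8

/* Hypothetical result type; ports are converted to host byte order here. */
struct sketch_ports {
    uint16_t src;
    uint16_t dst;
};

/* Extracts the TCP ports from 'data'.  If 'chk_len' is non-NULL, '*chk_len'
 * replaces the default minimum length of a full TCP header. */
static bool
sketch_extract_tcp_ports(struct sketch_ports *out, const void *data,
                         size_t size, const size_t *chk_len)
{
    const uint8_t *p = data;

    if (size < (chk_len ? *chk_len : SKETCH_TCP_HEADER_LEN)) {
        return false;
    }
    if (size < 4) {
        /* Guard against an override smaller than the two port fields. */
        return false;
    }

    /* Source and destination ports are the first four bytes of the header,
     * so they are present even in an 8-byte partial header. */
    out->src = (uint16_t) ((p[0] << 8) | p[1]);
    out->dst = (uint16_t) ((p[2] << 8) | p[3]);
    return true;
}
```

A caller parsing ICMPv4 error data would pass a pointer to a variable holding
SKETCH_ICMP_ERROR_DATA_L4_LEN; every other caller would pass NULL and keep the
stricter default.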

Fixes: a489b16854b5 ("conntrack: New userspace connection tracker.")
CC: Daniele Di Proietto 
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2019-August/361949.html
Reported-by: Vishal Deep Ajmera 
Signed-off-by: Vishal Deep Ajmera 
Co-authored-by: Vishal Deep Ajmera 
Signed-off-by: Darrell Ball 
---
 lib/conntrack.c | 38 ++
 lib/packets.h   |  3 +++
 2 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index ecf3bcc..de0ab9b 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1565,9 +1565,10 @@ check_l4_icmp6(const struct conn_key *key, const void 
*data, size_t size,
 }
 
 static inline bool
-extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
+extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
-if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : TCP_HEADER_LEN))) {
 return false;
 }
 
@@ -1580,9 +1581,10 @@ extract_l4_tcp(struct conn_key *key, const void *data, 
size_t size)
 }
 
 static inline bool
-extract_l4_udp(struct conn_key *key, const void *data, size_t size)
+extract_l4_udp(struct conn_key *key, const void *data, size_t size,
+   size_t *chk_len)
 {
-if (OVS_UNLIKELY(size < UDP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : UDP_HEADER_LEN))) {
 return false;
 }
 
@@ -1596,7 +1598,7 @@ extract_l4_udp(struct conn_key *key, const void *data, 
size_t size)
 
 static inline bool extract_l4(struct conn_key *key, const void *data,
   size_t size, bool *related, const void *l3,
-  bool validate_checksum);
+  bool validate_checksum, size_t *chk_len);
 
 static uint8_t
 reverse_icmp_type(uint8_t type)
@@ -1628,9 +1630,9 @@ reverse_icmp_type(uint8_t type)
  * possible */
 static inline int
 extract_l4_icmp(struct conn_key *key, const void *data, size_t size,
-bool *related)
+bool *related, size_t *chk_len)
 {
-if (OVS_UNLIKELY(size < ICMP_HEADER_LEN)) {
+if (OVS_UNLIKELY(size < (chk_len ? *chk_len : ICMP_HEADER_LEN))) {
 return false;
 }
 
@@ -1681,8 +1683,9 @@ extract_l4_icmp(struct conn_key *key, const void *data, 
size_t size,
 key->src = inner_key.src;
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
+size_t check_len = ICMP_ERROR_DATA_L4_LEN;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, false, &check_len);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -1769,7 +1772,7 @@ extract_l4_icmp6(struct conn_key *key, const void *data, 
size_t size,
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
 
-ok = extract_l4(key, l4, tail - l4, NULL, l3, false);
+ok = extract_l4(key, l4, tail - l4, NULL, l3, false, NULL);
 if (ok) {
 conn_key_reverse(key);
 *related = true;
@@ -1797,23 +1800,25 @@ extract_l4_icmp6(struct conn_key *key, const void 
*data, size_t size,
  * in an ICMP error.  In this case, we skip checksum and length validation. */
 static inline bool
 extract_l4(struct conn_key *key, const void *data, size_t size, bool *related,
-   const void *l3, bool validate_checksum)
+   const void *l3, bool validate_checksum, size_t *chk_len)
 {
 if (key->nw_proto == IPPROTO_TCP) {
 return (!related || check_l4_tcp(key, data, size, l3,
-validate_checksum)) && extract_l4_tcp(key, data, size);
+validate_checksum))
+   && extract_l4_tcp(key, data, size, chk_len);
 } else if (key->nw_proto == IPPROTO_UDP) {
 return (!related || check_l4_udp(key, data, size, l3,
-validate_checksum)) && extract_l4_udp(key, data, size);
+validate_checksum))
+   && extract_l4_udp(key, data, size, chk_len);
 } else if (key->dl_type == htons(ETH_TYPE_IP)
&& key->nw_proto == IPPROTO_ICMP) {
 return (!related || check_l4_icmp(data, size, validate_checksum))
-   && extract_l4_icmp(key, data, size, related);
+   && extract_l4_icmp(key, data, size, related, chk_len);
 } else if (key-&

[ovs-dev] [patch v2] conntrack: Add option to disable TCP sequence checking.

2019-08-26 Thread Darrell Ball
This may be needed in some special cases, such as to support some hardware
offload implementations.  Note that disabling TCP sequence number
verification is not an optimization in itself, but supporting some
hardware offload implementations may offer better performance.  TCP
sequence number verification is enabled by default.  This option is only
available for the userspace datapath.  Access to this option is presently
provided via 'dpctl' commands as the need for this option is quite node
specific, by virtue of which NICs are in use on a given node.  A test is
added to verify this option.
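
A minimal sketch of the idea, assuming C11 atomics rather than the OVS atomic
wrappers: the flag defaults to enabled, and a packet that fails the window
checks is still accepted when the flag is cleared. The sketch_ names are
hypothetical, and 'in_window' stands in for the full set of sequence and ACK
window comparisons in tcp_conn_update().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of 'struct conntrack' that matters. */
struct sketch_ct {
    atomic_bool tcp_seq_chk;    /* Sequence checking; enabled by default. */
};

static void
sketch_ct_init(struct sketch_ct *ct)
{
    atomic_init(&ct->tcp_seq_chk, true);
}

static void
sketch_set_tcp_seq_chk(struct sketch_ct *ct, bool enabled)
{
    atomic_store_explicit(&ct->tcp_seq_chk, enabled, memory_order_relaxed);
}

static bool
sketch_get_tcp_seq_chk(struct sketch_ct *ct)
{
    return atomic_load_explicit(&ct->tcp_seq_chk, memory_order_relaxed);
}

/* Accepts the packet if it passed the window checks, or if sequence
 * checking has been disabled by the operator. */
static bool
sketch_tcp_seq_accept(struct sketch_ct *ct, bool in_window)
{
    return in_window || !sketch_get_tcp_seq_chk(ct);
}
```

Relaxed ordering is used in the sketch because the flag is an independent
setting that does not guard other data; whether that matches the real
implementation is an assumption.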

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359188.html
Signed-off-by: Darrell Ball 
---

v2: Per particular requirement, support 'no-tcp-seq-chk' rather than
'liberal' mode. 
Make manpage comments more verbose.
Expand commit message comments.

 lib/conntrack-private.h |   6 ++-
 lib/conntrack-tcp.c |  13 -
 lib/conntrack.c |  16 ++
 lib/conntrack.h |   2 +
 lib/ct-dpif.c   |  16 ++
 lib/ct-dpif.h   |   2 +
 lib/dpctl.c |  64 +-
 lib/dpctl.man   |  18 +++
 lib/dpif-netdev.c   |  18 +++
 lib/dpif-netlink.c  |   2 +
 lib/dpif-provider.h |   5 ++
 tests/ofproto-dpif.at   | 138 
 12 files changed, 295 insertions(+), 5 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index bcfbe10..7cdc2a4 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -171,8 +171,10 @@ struct conntrack {
 struct hindex alg_expectation_refs OVS_GUARDED; /* For lookup from
  * control context.  */
 
-/* Fragmentation handling context. */
-struct ipf *ipf;
+struct ipf *ipf; /* Fragmentation handling context. */
+atomic_bool tcp_seq_ckk; /* TCP sequence number verification; when
+enabled, this enables sequence number
+verification; enabled by default. */
 };
 
 /* Lock acquisition order:
diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
index 397aca1..1e843f3 100644
--- a/lib/conntrack-tcp.c
+++ b/lib/conntrack-tcp.c
@@ -39,10 +39,15 @@
 #include 
 
 #include "conntrack-private.h"
+#include "coverage.h"
 #include "ct-dpif.h"
 #include "dp-packet.h"
 #include "util.h"
 
+COVERAGE_DEFINE(conntrack_tcp_seq_chk_bypass);
+COVERAGE_DEFINE(conntrack_tcp_seq_chk_failed);
+COVERAGE_DEFINE(conntrack_invalid_tcp_flags);
+
 struct tcp_peer {
 uint32_t   seqlo;  /* Max sequence number sent */
 uint32_t   seqhi;  /* Max the other end ACKd + win */
@@ -162,6 +167,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 uint32_t p_len = tcp_payload_length(pkt);
 
 if (tcp_invalid_flags(tcp_flags)) {
+COVERAGE_INC(conntrack_invalid_tcp_flags);
 return CT_UPDATE_INVALID;
 }
 
@@ -272,7 +278,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 
 int ackskew = check_ackskew ? dst->seqlo - ack : 0;
#define MAXACKWINDOW (0xffff + 1500)/* 1500 is an arbitrary fudge factor */
-if (SEQ_GEQ(src->seqhi, end)
+if ((SEQ_GEQ(src->seqhi, end)
 /* Last octet inside other's window space */
 && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
 /* Retrans: not more than one window back */
@@ -281,7 +287,9 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 && (ackskew <= (MAXACKWINDOW << sws))
 /* Acking not more than one window forward */
 && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
-|| (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {
+|| (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo)))
+|| (!conntrack_get_tcp_seq_chk(ct)
+? COVERAGE_INC(conntrack_tcp_seq_chk_bypass), 1 : 0)) {
 /* Require an exact/+1 sequence match on resets when possible */
 
 /* update max window */
@@ -385,6 +393,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_,
 src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT;
 }
 } else {
+COVERAGE_INC(conntrack_tcp_seq_chk_failed);
 return CT_UPDATE_INVALID;
 }
 
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 5f60fea..0125bb4 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -310,6 +310,7 @@ conntrack_init(void)
 ct->hash_basis = random_uint32();
atomic_count_init(&ct->n_conn, 0);
atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
+atomic_init(&ct->tcp_seq_ckk, true);
latch_init(&ct->clean_thread_exit);
 ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
 ct->ipf = ipf_init(

[ovs-dev] [patch v6] conntrack: Optimize recirculations.

2019-08-26 Thread Darrell Ball
Cache the 'conn' context and use it when it is valid.  The cached 'conn'
context will get reset if it is not expected to be valid; the cost to do
this is negligible.  Besides being most optimal, this also handles corner
cases, such as decapsulation leading to the same tuple, as in tunnel VPN
cases.  A negative test is added to check the resetting of the cached
'conn'.
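
The fast-path decision can be sketched roughly as follows. This is a
simplified illustration with hypothetical sketch_ types, not the code in the
patch: the cached pointer is used only when it is present, the zone matches,
'force' is not set, and no ALG handling is needed; anything that changes the
tuple resets the cache so the next pass does a full lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced versions of the connection and packet metadata. */
struct sketch_conn {
    uint16_t zone;
    uint32_t mark;
};

struct sketch_pkt_md {
    struct sketch_conn *conn;   /* Cached connection, or NULL. */
    uint16_t ct_zone;
    uint32_t ct_mark;
};

enum sketch_path { SKETCH_FAST, SKETCH_SLOW };

/* Decides which path a recirculated packet takes.  On the fast path the
 * metadata is filled in straight from the cached connection. */
static enum sketch_path
sketch_ct_execute(struct sketch_pkt_md *md, uint16_t zone, bool force,
                  bool needs_alg)
{
    struct sketch_conn *conn = md->conn;

    if (conn && conn->zone == zone && !force && !needs_alg) {
        md->ct_zone = zone;
        md->ct_mark = conn->mark;
        return SKETCH_FAST;
    }

    /* A real implementation would extract the key, look the connection up
     * and cache the result here. */
    return SKETCH_SLOW;
}

/* Called whenever the packet's tuple may have changed, for example after
 * decapsulation. */
static void
sketch_reset_cached_conn(struct sketch_pkt_md *md)
{
    md->conn = NULL;
}
```

Resetting is a single pointer store, which is why the cost of invalidating
the cache is described as negligible.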

Signed-off-by: Darrell Ball 
---

v6: a/ Added 'conn' reset for mpls push case to force 'invalid', for
   consistency reasons.
b/ Add missed lock around 'conn->mark' and 'conn->label' access in
   'process_one_fast()'.
c/ Use opaque vs void 'conn' ptr (Ben).
d/ Remove inline specifier on 2 static functions in source file. (Ben).
e/ Remove 3 'continue' instructions in lieu of 'else' clause. (Ben).

v5: Check for alg ctl on recirculation to handle potential corner case.
Remove unnecessary 'commit' filter.

v4: Reset 'conn' cache context automatically when tuple changes, rather than
needing an explicit 'ct_clear' action.  Need to check if all cases are
handled.
v3: Remove unneeded 'NAT' field added to 'struct conn_lookup_ctx'.

 lib/conntrack.c | 60 ++
 lib/netdev.c|  1 +
 lib/packets.c   |  9 +++
 lib/packets.h   | 11 +
 tests/system-traffic.at | 63 +
 5 files changed, 139 insertions(+), 5 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 0125bb4..ecf3bcc 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1086,6 +1086,46 @@ conn_update_state_alg(struct conntrack *ct, struct 
dp_packet *pkt,
 }
 
 static void
+set_cached_conn(const struct nat_action_info_t *nat_action_info,
+const struct conn_lookup_ctx *ctx, struct conn *conn,
+struct dp_packet *pkt)
+{
+if (OVS_LIKELY(!nat_action_info)) {
+pkt->md.conn = conn;
+pkt->md.reply = ctx->reply;
+pkt->md.icmp_related = ctx->icmp_related;
+} else {
+pkt->md.conn = NULL;
+}
+}
+
+static void
+process_one_fast(uint16_t zone, const uint32_t *setmark,
+ const struct ovs_key_ct_labels *setlabel,
+ const struct nat_action_info_t *nat_action_info,
+ struct conn *conn, struct dp_packet *pkt)
+{
+if (nat_action_info) {
+handle_nat(pkt, conn, zone, pkt->md.reply, pkt->md.icmp_related);
+pkt->md.conn = NULL;
+}
+
+pkt->md.ct_zone = zone;
+ovs_mutex_lock(&conn->lock);
+pkt->md.ct_mark = conn->mark;
+pkt->md.ct_label = conn->label;
+ovs_mutex_unlock(&conn->lock);
+
+if (setmark) {
+set_mark(pkt, conn, setmark[0], setmark[1]);
+}
+
+if (setlabel) {
+set_label(pkt, conn, &setlabel[0], &setlabel[1]);
+}
+}
+
+static void
 process_one(struct conntrack *ct, struct dp_packet *pkt,
 struct conn_lookup_ctx *ctx, uint16_t zone,
 bool force, bool commit, long long now, const uint32_t *setmark,
@@ -1185,6 +1225,8 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 }
 
 handle_alg_ctl(ct, ctx, pkt, ct_alg_ctl, conn, now, !!nat_action_info);
+
+set_cached_conn(nat_action_info, ctx, conn, pkt);
 }
 
 /* Sends the packets in '*pkt_batch' through the connection tracker 'ct'.  All
@@ -1212,14 +1254,21 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
 struct conn_lookup_ctx ctx;
 
 DP_PACKET_BATCH_FOR_EACH (i, packet, pkt_batch) {
-if (packet->md.ct_state == CS_INVALID
-|| !conn_key_extract(ct, packet, dl_type, &ctx, zone)) {
+struct conn *conn = packet->md.conn;
+if (OVS_UNLIKELY(packet->md.ct_state == CS_INVALID)) {
+write_ct_md(packet, zone, NULL, NULL, NULL);
+} else if (conn && conn->key.zone == zone && !force
+   && !get_alg_ctl_type(packet, tp_src, tp_dst, helper)) {
+process_one_fast(zone, setmark, setlabel, nat_action_info,
+ conn, packet);
+} else if (OVS_UNLIKELY(!conn_key_extract(ct, packet, dl_type, &ctx,
+zone))) {
 packet->md.ct_state = CS_INVALID;
 write_ct_md(packet, zone, NULL, NULL, NULL);
-continue;
+} else {
+process_one(ct, packet, &ctx, zone, force, commit, now, setmark,
+setlabel, nat_action_info, tp_src, tp_dst, helper);
 }
-process_one(ct, packet, &ctx, zone, force, commit, now, setmark,
-setlabel, nat_action_info, tp_src, tp_dst, helper);
 }
 
 ipf_postprocess_conntrack(ct->ipf, pkt_batch, now, dl_type);
@@ -1233,6 +1282,7 @@ conntrack_clear(struct dp_packet *packet)
 /* According to pkt_metadata_init(), ct_state == 0 is enough to make all of
  * the conntrack fields invalid. */
 packet->md.ct_stat

Re: [ovs-dev] [PATCH] conntrack: Correct length check for tcp packet inside ICMP data.

2019-08-23 Thread Darrell Ball
Thanks for the patch

Goes back to release 2.6/day one :-).

I'll provide more feedback after today.

On Fri, Aug 23, 2019 at 6:20 AM Vishal Deep Ajmera <
vishal.deep.ajm...@ericsson.com> wrote:

> An ICMP packet with type destination or host not reachable also carries
> 28 bytes of ICMP data field. This data field contains IP header and TCP
> header (partial first 8 bytes) of the original packet for which ICMP
> is being generated.
>
> Conntrack module when processing these ICMP packets checks for TCP header
> length (20 bytes). Since TCP header is partial the length check fails and
> packet is erroneously dropped.
>
> This patch fixes length check for TCP header when processing ICMP data
> fields.
>
> Signed-off-by: Vishal Deep Ajmera 
> ---
>  lib/conntrack.c | 14 +++---
>  lib/packets.h   |  1 +
>  2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 5f60fea..0618fdd 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -1513,10 +1513,18 @@ check_l4_icmp6(const struct conn_key *key, const
> void *data, size_t size,
>  return validate_checksum ? checksum_valid(key, data, size, l3) : true;
>  }
>
> +/* If related is NULL, we are parsing nested TCP header  inside ICMP
> packet.
> + * Only 8 bytes of TCP header is required by RFC to be present in such
> case.
> + */
>  static inline bool
> -extract_l4_tcp(struct conn_key *key, const void *data, size_t size)
> +extract_l4_tcp(struct conn_key *key, const void *data, size_t size,
> +   bool *related)
>  {
> -if (OVS_UNLIKELY(size < TCP_HEADER_LEN)) {
> +if (!related) {
> +if (size < ICMP_L4_DATA_LEN) {
> +return false;
> +}
> +} else if (size < TCP_HEADER_LEN) {
>  return false;
>  }
>
> @@ -1750,7 +1758,7 @@ extract_l4(struct conn_key *key, const void *data,
> size_t size, bool *related,
>  {
>  if (key->nw_proto == IPPROTO_TCP) {
>  return (!related || check_l4_tcp(key, data, size, l3,
> -validate_checksum)) && extract_l4_tcp(key, data, size);
> +  validate_checksum)) && extract_l4_tcp(key, data, size,
> related);
>  } else if (key->nw_proto == IPPROTO_UDP) {
>  return (!related || check_l4_udp(key, data, size, l3,
>  validate_checksum)) && extract_l4_udp(key, data, size);
> diff --git a/lib/packets.h b/lib/packets.h
> index a4bee38..2bc65c9 100644
> --- a/lib/packets.h
> +++ b/lib/packets.h
> @@ -886,6 +886,7 @@ struct tcp_header {
>  ovs_be16 tcp_urg;
>  };
>  BUILD_ASSERT_DECL(TCP_HEADER_LEN == sizeof(struct tcp_header));
> +#define ICMP_L4_DATA_LEN 8
>
>  /* Connection states.
>   *
> --
> 1.9.1
>
>


Re: [ovs-dev] [PATCH] conntrack: check the result of extract_l3_ipv4/6

2019-08-21 Thread Darrell Ball
On Wed, Aug 21, 2019 at 3:13 PM Ben Pfaff  wrote:

> On Mon, Aug 19, 2019 at 08:35:11AM -0700, Darrell Ball wrote:
> > Thanks for the patch
> >
> > On Sun, Aug 18, 2019 at 11:01 PM Li RongQing 
> wrote:
> >
> > > The result of extract_l3_ipv4/6 should be checked in reverse_nat_packet();
> > > when it is false, the packet is malformed and should not be processed
> > > any further.
> > >
> > > Signed-off-by: Li RongQing 
> > > ---
> > >  lib/conntrack.c | 17 +++--
> > >  1 file changed, 15 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > > index 5f60fea18..c26d5438c 100644
> > > --- a/lib/conntrack.c
> > > +++ b/lib/conntrack.c
> > > @@ -695,11 +695,18 @@ reverse_nat_packet(struct dp_packet *pkt, const
> > > struct conn *conn)
> > >  uint16_t orig_l4_ofs = pkt->l4_ofs;
> > >
> > >  if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
> > > +bool ok;
> > >  struct ip_header *nh = dp_packet_l3(pkt);
> > >  struct icmp_header *icmp = dp_packet_l4(pkt);
> > >  struct ip_header *inner_l3 = (struct ip_header *) (icmp + 1);
> > > > -extract_l3_ipv4(&inner_key, inner_l3, tail - ((char
> *)inner_l3) -
> > > pad,
> > >
> >
> > There is intentionally no checking for success/fail here because the packet
> > has already been parsed and found to be ok during the conn_key_extract()
> > code path.  Reusing the same API here is just convenient.
>
> Maybe a comment would be warranted to make that clear.
>

That's reasonable; maybe RongQing would like to submit a patch?


Re: [ovs-dev] [PATCH v2 9/9] system-traffic: Add zone-based conntrack timeout policy test

2019-08-20 Thread Darrell Ball
On Mon, Aug 12, 2019 at 5:22 PM Yi-Hung Wei  wrote:

> On Sun, Aug 11, 2019 at 12:30 PM Darrell Ball  wrote:
> >
> > I did some further testing and ran into another issue; in this case,
> one, I did not expect.
> >
> > I added an additional sending of packets at the end of the test after
> this check:
> >
> > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> > ])
> >
> > Below is new code
> >
> > dnl Do it again
> > dnl Send ICMP and UDP traffic
> > NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > 3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > ])
> > AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=5054000a505400090800451c0011a4cd0a0101010a010102000100020008
> actions=resubmit(,0)"])
> >
> > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sort],
> [0], [dnl
> >
> icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0),zone=5
> >
> udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5
> > ])
> >
> > dnl Wait until the timeout expire.
> > dnl We intend to wait a bit longer, because conntrack does not recycle
> the entry right after it is expired.
> > sleep 5
> >
> > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> > ])
> >
> > The test fails because the second time, with short timeouts, the conntrack
> > entries are not cleaned up quickly.
> >
> > @@ -0,0 +1,2 @@
> >
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0),zone=5
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5
>
>
> Thanks for testing!  This test actually catches a kernel bug in how the OVS
> kernel module handles the conntrack cache.  It works for me on my Ubuntu
> Xenial VM with the 4.4 kernel.
>
> Since this requires upstream kernel change, it will be backported to
> OVS once the fix gets upstream.
>
> Thanks,
>
> -Yi-Hung
>
> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
> index f85d0a2572f6..ad48b559bcde 100644
> --- a/datapath/conntrack.c
> +++ b/datapath/conntrack.c
> @@ -76,6 +76,7 @@ enum ovs_ct_nat {
>  /* Conntrack action context for execution. */
>  struct ovs_conntrack_info {
> struct nf_conntrack_helper *helper;
> +   struct nf_ct_timeout *nf_ct_timeout;
> struct nf_conntrack_zone zone;
> struct nf_conn *ct;
> u8 commit : 1;
> @@ -745,6 +746,13 @@ static bool skb_nfct_cached(struct net *net,
> if (help && rcu_access_pointer(help->helper) !=
> info->helper)
> return false;
> }
> +   if (info->nf_ct_timeout) {
> +   struct nf_conn_timeout *timeout_ext;
> +
> +   timeout_ext = nf_ct_timeout_find(ct);
> +   if (!timeout_ext || info->nf_ct_timeout !=
> timeout_ext->timeout)
> +   return false;
> +   }
> /* Force conntrack entry direction to the current packet? */
> if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
> /* Delete the conntrack entry if confirmed, else just
> release
> @@ -1704,6 +1712,8 @@ int ovs_ct_copy_action(struct net *net, const
> struct nlattr *attr,
>   ct_info.timeout))
> pr_info_ratelimited("Failed to associated timeout "
> "policy `%s'\n",
> ct_info.timeout);
> +   else
> +   ct_info.nf_ct_timeout =
> nf_ct_timeout_find(ct_info.ct)->timeout;
> }
>

Forgot to respond to this one earlier.
I did review, unit test and system test these changes and they are fine.

Thanks Darrell



>
> if (helper) {
>


Re: [ovs-dev] [PATCH v4 4/9] ct-dpif, dpif-netlink: Add conntrack timeout policy support

2019-08-20 Thread Darrell Ball
On Tue, Aug 20, 2019 at 12:30 PM Yi-Hung Wei  wrote:

> On Tue, Aug 20, 2019 at 12:46 AM Darrell Ball  wrote:
> > After fixing a bug in my proposed incremental and adding tracking of an
> already removed sub timeout policy:
> > Pls double check.
>
> Thanks for the proposed incremental.
>
> I checked all the other logging places in dpif-netlink; we usually do
> not log the successful cases at the INFO level.  As discussed in the
> e-mail thread, I think the successful cases do not provide much useful
> information, so I made some minor changes based on the proposed
> incremental.  I will fold in the following diff.
>

Looks good

As mentioned earlier, tracking the timeout profile deletion timing at INFO
level is not that important in general. So, as long as we don't spam the
log, this part should be fine.


>
> Thanks,
>
> -Yi-Hung
>
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 1d4ee60bd199..85827cd65503 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -2822,11 +2822,10 @@ dpif_netlink_ct_dump_done(struct dpif *dpif
> OVS_UNUSED,
>struct ct_dpif_dump_state *dump_)
>  {
>  struct dpif_netlink_ct_dump_state *dump;
> -int err;
>
>  INIT_CONTAINER(dump, dump_, up);
>
> -err = nl_ct_dump_done(dump->nl_ct_dump);
> +int err = nl_ct_dump_done(dump->nl_ct_dump);
>  free(dump);
>  return err;
>  }
> @@ -3318,32 +3317,32 @@ out:
>  return err;
>  }
>
> -/* Returns 0 if all the sub timeout policies are deleted or
> - * not exist in the kernel. */
> +/* Returns 0 if all the sub timeout policies are deleted or not exist in
> the
> + * kernel.  Returns 1 if any sub timeout policy deletion failed. */
>  static int
>  dpif_netlink_ct_del_timeout_policy(struct dpif *dpif OVS_UNUSED,
> uint32_t tp_id)
>  {
>  struct ds nl_tp_name = DS_EMPTY_INITIALIZER;
> -int err = 0;
> +int ret = 0;
>
>  for (int i = 0; i < ARRAY_SIZE(tp_protos); ++i) {
>  dpif_netlink_format_tp_name(tp_id, tp_protos[i].l3num,
>  tp_protos[i].l4num, _tp_name);
> -err = nl_ct_del_timeout_policy(ds_cstr(_tp_name));
> +int err = nl_ct_del_timeout_policy(ds_cstr(_tp_name));
>  if (err == ENOENT) {
>  err = 0;
>  }
>  if (err) {
> -VLOG_WARN_RL(_rl, "failed to delete timeout policy %s
> (%s)",
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(6, 6);
> +VLOG_INFO_RL(, "failed to delete timeout policy %s (%s)",
>   ds_cstr(_tp_name), ovs_strerror(err));
> -goto out;
> +ret = 1;
>  }
>  }
>
> -out:
>  ds_destroy(_tp_name);
> -return err;
> +return ret;
>  }
>
>  struct dpif_netlink_ct_timeout_policy_dump_state {
> @@ -3392,10 +3391,9 @@
> dpif_netlink_ct_timeout_policy_dump_start(struct dpif *dpif
> OVS_UNUSED,
>void **statep)
>  {
>  struct dpif_netlink_ct_timeout_policy_dump_state *dump_state;
> -int err;
>
>  *statep = dump_state = xzalloc(sizeof *dump_state);
> -err = nl_ct_timeout_policy_dump_start(_state->nl_dump_state);
> +int err = nl_ct_timeout_policy_dump_start(_state->nl_dump_state);
>  if (err) {
>  free(dump_state);
>  return err;
>
> <- end of diff
> -->
>
> >
> > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> > index 1d4ee60..cba4432 100644
> > --- a/lib/dpif-netlink.c
> > +++ b/lib/dpif-netlink.c
> > @@ -2822,11 +2822,10 @@ dpif_netlink_ct_dump_done(struct dpif *dpif
> OVS_UNUSED,
> >struct ct_dpif_dump_state *dump_)
> >  {
> >  struct dpif_netlink_ct_dump_state *dump;
> > -int err;
> >
> >  INIT_CONTAINER(dump, dump_, up);
> >
> > -err = nl_ct_dump_done(dump->nl_ct_dump);
> > +int err = nl_ct_dump_done(dump->nl_ct_dump);
> >  free(dump);
> >  return err;
> >  }
> > @@ -3319,7 +3318,8 @@ out:
> >  }
> >
> >  /* Returns 0 if all the sub timeout policies are deleted or
> > - * not exist in the kernel. */
> > + * not exist in the kernel; returns 1 if any sub timeout policy deletion
> > + * failed. */
> >  static int
> >  dpif_netlink_ct_del_timeout_policy(struct dpif *dpif OVS_UNUSED,
> > uint32_t tp_id)
> > @@ -3330

Re: [ovs-dev] [PATCH v4 4/9] ct-dpif, dpif-netlink: Add conntrack timeout policy support

2019-08-20 Thread Darrell Ball
On Mon, Aug 19, 2019 at 7:41 PM Darrell Ball  wrote:

>
>
> On Mon, Aug 19, 2019 at 12:42 PM Darrell Ball  wrote:
>
>>
>>
>> On Mon, Aug 19, 2019 at 10:52 AM Yi-Hung Wei 
>> wrote:
>>
>>> On Fri, Aug 16, 2019 at 5:07 PM Darrell Ball  wrote:
>>> >
>>> > Thanks for the patch
>>> >
>>> > Pls let me know if this incremental works for you.
>>> > Main change is logging fix for timeout policy deletion.
>>> >
>>> > Darrell
>>> >
>>> > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>>> > index 1d4ee60..00d957b 100644
>>> > --- a/lib/dpif-netlink.c
>>> > +++ b/lib/dpif-netlink.c
>>> > @@ -2822,11 +2822,10 @@ dpif_netlink_ct_dump_done(struct dpif *dpif
>>> OVS_UNUSED,
>>> >struct ct_dpif_dump_state *dump_)
>>> >  {
>>> >  struct dpif_netlink_ct_dump_state *dump;
>>> > -int err;
>>> >
>>> >  INIT_CONTAINER(dump, dump_, up);
>>> >
>>> > -err = nl_ct_dump_done(dump->nl_ct_dump);
>>> > +int err = nl_ct_dump_done(dump->nl_ct_dump);
>>> >  free(dump);
>>> >  return err;
>>> >  }
>>> > @@ -3335,7 +3334,8 @@ dpif_netlink_ct_del_timeout_policy(struct dpif
>>> *dpif OVS_UNUSED,
>>> >  err = 0;
>>> >  }
>>> >  if (err) {
>>> > -VLOG_WARN_RL(_rl, "failed to delete timeout policy
>>> %s (%s)",
>>> > +static struct vlog_rate_limit rl =
>>> VLOG_RATE_LIMIT_INIT(1, 1);
>>>
>>> Thanks for the diff.  It looks good in general.
>>>
>>> I agree with the main concern of the proposed diff, which is that the
>>> original rate limit in dpif-netlink (VLOG_RATE_LIMIT_INIT(, 5)) may log
>>> too much duplicated information.  However, since we may delete more than
>>> one timeout policy in a minute, lowering the rate limit to
>>> VLOG_RATE_LIMIT_INIT(1, 1) may miss some useful information.   I would
>>> use something in between (VLOG_RATE_LIMIT_INIT(5, 5)) in the next
>>> version.
>>>
>>
>> TBH, I am not sure we care lots about this information. I was even
>> debating changing it debug level.
>> We have 4 billion datapath timeout profile IDs, so it is unlikely we will
>> run out.
>> Eventually, they will get cleaned up by the retry thingy.
>>
>> Also, I am not sure what action we will take by seeing these logs anyhow.
>>
>> Spamming the log is more of a concern.
>>
>
> After more testing, I noticed there are a couple other aspects we might
> have overlooked.
>
> 1/ We don't have a log for a successful deletion attempt.
>
> 2/ When we try to delete 1 of the 6 associated Netfilter timeout policies
> and it fails, we don't try to delete
>  the remaining ones and bail out of the deletion loop early.
>
> Keeping the INFO level change (from WARN), addressing '1' and '2' above
> and also folding in
> your idea to keep the overall log generation rate a little higher, I ended
> up with:
>
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 1d4ee60..3926cfd 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -2822,11 +2822,10 @@ dpif_netlink_ct_dump_done(struct dpif *dpif
> OVS_UNUSED,
>struct ct_dpif_dump_state *dump_)
>  {
>  struct dpif_netlink_ct_dump_state *dump;
> -int err;
>
>  INIT_CONTAINER(dump, dump_, up);
>
> -err = nl_ct_dump_done(dump->nl_ct_dump);
> +int err = nl_ct_dump_done(dump->nl_ct_dump);
>  free(dump);
>  return err;
>  }
> @@ -3334,14 +,12 @@ dpif_netlink_ct_del_timeout_policy(struct dpif
> *dpif OVS_UNUSED,
>  if (err == ENOENT) {
>  err = 0;
>  }
> -if (err) {
> -VLOG_WARN_RL(_rl, "failed to delete timeout policy %s
> (%s)",
> - ds_cstr(_tp_name), ovs_strerror(err));
> -goto out;
> -}
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(6, 6);
> +VLOG_INFO_RL(, err ? "failed to delete timeout policy %s (%s)"
> +  : "deleted timeout policy %s (%s)",
> + ds_cstr(_tp_name), ovs_strerror(err));
>  }
>
> -out:
>  ds_destroy(_tp_name);
>  return err;
>  }
> @@ -3392,10 +3389,9 @@ dpif_netlink_ct_timeout_policy_dump_start(struct
> dpif 

Re: [ovs-dev] [PATCH v4 4/9] ct-dpif, dpif-netlink: Add conntrack timeout policy support

2019-08-19 Thread Darrell Ball
On Mon, Aug 19, 2019 at 12:42 PM Darrell Ball  wrote:

>
>
> On Mon, Aug 19, 2019 at 10:52 AM Yi-Hung Wei  wrote:
>
>> On Fri, Aug 16, 2019 at 5:07 PM Darrell Ball  wrote:
>> >
>> > Thanks for the patch
>> >
>> > Pls let me know if this incremental works for you.
>> > Main change is logging fix for timeout policy deletion.
>> >
>> > Darrell
>> >
>> > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>> > index 1d4ee60..00d957b 100644
>> > --- a/lib/dpif-netlink.c
>> > +++ b/lib/dpif-netlink.c
>> > @@ -2822,11 +2822,10 @@ dpif_netlink_ct_dump_done(struct dpif *dpif
>> OVS_UNUSED,
>> >struct ct_dpif_dump_state *dump_)
>> >  {
>> >  struct dpif_netlink_ct_dump_state *dump;
>> > -int err;
>> >
>> >  INIT_CONTAINER(dump, dump_, up);
>> >
>> > -err = nl_ct_dump_done(dump->nl_ct_dump);
>> > +int err = nl_ct_dump_done(dump->nl_ct_dump);
>> >  free(dump);
>> >  return err;
>> >  }
>> > @@ -3335,7 +3334,8 @@ dpif_netlink_ct_del_timeout_policy(struct dpif
>> *dpif OVS_UNUSED,
>> >  err = 0;
>> >  }
>> >  if (err) {
>> > -VLOG_WARN_RL(_rl, "failed to delete timeout policy
>> %s (%s)",
>> > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1,
>> 1);
>>
>> Thanks for the diff.  It looks good in general.
>>
>> I agree with the main concern of the proposed diff, which is that the
>> original rate limit in dpif-netlink (VLOG_RATE_LIMIT_INIT(, 5)) may log
>> too much duplicated information.  However, since we may delete more than
>> one timeout policy in a minute, lowering the rate limit to
>> VLOG_RATE_LIMIT_INIT(1, 1) may miss some useful information.   I would
>> use something in between (VLOG_RATE_LIMIT_INIT(5, 5)) in the next
>> version.
>>
>
> TBH, I am not sure we care lots about this information. I was even
> debating changing it debug level.
> We have 4 billion datapath timeout profile IDs, so it is unlikely we will
> run out.
> Eventually, they will get cleaned up by the retry thingy.
>
> Also, I am not sure what action we will take by seeing these logs anyhow.
>
> Spamming the log is more of a concern.
>

After more testing, I noticed a couple of other aspects we might have
overlooked.

1/ We don't have a log for a successful deletion attempt.

2/ When we try to delete 1 of the 6 associated Netfilter timeout policies
and it fails, we bail out of the deletion loop early and don't try to
delete the remaining ones.

Keeping the INFO level change (from WARN), addressing '1' and '2' above, and
also folding in your idea to keep the overall log generation rate a little
higher, I ended up with:

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 1d4ee60..3926cfd 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2822,11 +2822,10 @@ dpif_netlink_ct_dump_done(struct dpif *dpif
OVS_UNUSED,
   struct ct_dpif_dump_state *dump_)
 {
 struct dpif_netlink_ct_dump_state *dump;
-int err;

 INIT_CONTAINER(dump, dump_, up);

-err = nl_ct_dump_done(dump->nl_ct_dump);
+int err = nl_ct_dump_done(dump->nl_ct_dump);
 free(dump);
 return err;
 }
@@ -3334,14 +,12 @@ dpif_netlink_ct_del_timeout_policy(struct dpif
*dpif OVS_UNUSED,
 if (err == ENOENT) {
 err = 0;
 }
-if (err) {
-VLOG_WARN_RL(_rl, "failed to delete timeout policy %s
(%s)",
- ds_cstr(&nl_tp_name), ovs_strerror(err));
-goto out;
-}
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(6, 6);
+VLOG_INFO_RL(&rl, err ? "failed to delete timeout policy %s (%s)"
+  : "deleted timeout policy %s (%s)",
+ ds_cstr(&nl_tp_name), ovs_strerror(err));
 }

-out:
 ds_destroy(&nl_tp_name);
 return err;
 }
@@ -3392,10 +3389,9 @@ dpif_netlink_ct_timeout_policy_dump_start(struct
dpif *dpif OVS_UNUSED,
   void **statep)
 {
 struct dpif_netlink_ct_timeout_policy_dump_state *dump_state;
-int err;

 *statep = dump_state = xzalloc(sizeof *dump_state);
-err = nl_ct_timeout_policy_dump_start(&dump_state->nl_dump_state);
+int err = nl_ct_timeout_policy_dump_start(&dump_state->nl_dump_state);
 if (err) {
 free(dump_state);
 return err;
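
For reference, the "keep going after a failure" behaviour from item 2 can be
sketched in isolation. This is only a minimal illustration, not the
dpif-netlink code: delete_all_sub_policies(), fake_delete() and the policy
names "busy", "gone" and "ok" are hypothetical stand-ins, and returning 1 on
any failure is one possible convention rather than what the diff above
returns.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a per-sub-policy delete call.  Returns 0 on
 * success or a positive errno value on failure. */
typedef int (*delete_fn)(const char *name);

static int fake_delete_calls;

/* Illustrative fake: "busy" is still referenced, "gone" no longer exists,
 * and every other name deletes cleanly. */
static int
fake_delete(const char *name)
{
    fake_delete_calls++;
    if (!strcmp(name, "busy")) {
        return EBUSY;
    }
    return !strcmp(name, "gone") ? ENOENT : 0;
}

/* Tries to delete every policy in 'names' and never stops early.  ENOENT
 * counts as success because the policy is already gone.  Returns 0 if
 * nothing failed, 1 if at least one deletion failed. */
static int
delete_all_sub_policies(const char *const names[], size_t n, delete_fn del)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        int err = del(names[i]);
        if (err == ENOENT) {
            err = 0;
        }
        if (err) {
            fprintf(stderr, "failed to delete timeout policy %s (%s)\n",
                    names[i], strerror(err));
            ret = 1;
        }
    }
    return ret;
}
```

Calling delete_all_sub_policies() on {"busy", "gone", "ok"} with fake_delete
still visits all three names and reports failure once at the end, so a
single busy sub policy no longer prevents cleanup of the others.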

Running the system test (from Patch 9) and adding a deletion request for
the in-use timeout policy yields

> 2019-08-20T01:28:41.004Z|00203|dpif_netlink|INFO|deleted timeout

Re: [ovs-dev] [PATCH v4 6/9] ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables

2019-08-19 Thread Darrell Ball
On Mon, Aug 19, 2019 at 11:12 AM Yi-Hung Wei  wrote:

> On Fri, Aug 16, 2019 at 5:10 PM Darrell Ball  wrote:
> >
> > Thanks for the patch
> >
> > Pls let me know if the following incremental works for you.
> >
> > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> > index 244155a..cb8b51e 100644
> > --- a/ofproto/ofproto-dpif.c
> > +++ b/ofproto/ofproto-dpif.c
> > @@ -168,6 +168,12 @@ struct ct_timeout_policy {
> >   * "ct_tp_kill_list" list. */
> >  };
> >
> > +/* Periodically try to purge deleted timeout policies from the
> datapath. Retry
> > + * may be necessary if the kernel datapath has a non-zero datapath flow
> > + * reference count for the timeout policy. */
> > +#define TIMEOUT_POLICY_CLEANUP_INTERVAL (30) /* 5 minutes. */
> > +static long long int timeout_policy_cleanup_timer;
>
> Thanks for adding the cleanup debounce and making the comment clearer.
>
> I will fold in your diff and the following minor change in the next
> version.
>
> #define TIMEOUT_POLICY_CLEANUP_INTERVAL (2) /* 20 seconds. */
> static long long int timeout_policy_cleanup_timer = LLONG_MIN;
>

looks fine


>
> I changed the interval to twice the revalidation cycle, because we
> should be able to remove the unused timeout policies in the kernel
> datapath after the next flow revalidation cycle.


> Thanks,
>
> -Yi-Hung
>


Re: [ovs-dev] [PATCH v4 4/9] ct-dpif, dpif-netlink: Add conntrack timeout policy support

2019-08-19 Thread Darrell Ball
On Mon, Aug 19, 2019 at 10:52 AM Yi-Hung Wei  wrote:

> On Fri, Aug 16, 2019 at 5:07 PM Darrell Ball  wrote:
> >
> > Thanks for the patch
> >
> > Pls let me know if this incremental works for you.
> > Main change is logging fix for timeout policy deletion.
> >
> > Darrell
> >
> > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> > index 1d4ee60..00d957b 100644
> > --- a/lib/dpif-netlink.c
> > +++ b/lib/dpif-netlink.c
> > @@ -2822,11 +2822,10 @@ dpif_netlink_ct_dump_done(struct dpif *dpif
> OVS_UNUSED,
> >struct ct_dpif_dump_state *dump_)
> >  {
> >  struct dpif_netlink_ct_dump_state *dump;
> > -int err;
> >
> >  INIT_CONTAINER(dump, dump_, up);
> >
> > -err = nl_ct_dump_done(dump->nl_ct_dump);
> > +int err = nl_ct_dump_done(dump->nl_ct_dump);
> >  free(dump);
> >  return err;
> >  }
> > @@ -3335,7 +3334,8 @@ dpif_netlink_ct_del_timeout_policy(struct dpif
> *dpif OVS_UNUSED,
> >  err = 0;
> >  }
> >  if (err) {
> > -VLOG_WARN_RL(_rl, "failed to delete timeout policy %s
> (%s)",
> > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1,
> 1);
>
> Thanks for the diff.  It looks good in general.
>
> I agree with the main concern of the proposed diff, which is that the
> original rate limit in dpif-netlink (VLOG_RATE_LIMIT_INIT(, 5)) may log
> too much duplicated information.  However, since we may delete more than
> one timeout policy in a minute, lowering the rate limit to
> VLOG_RATE_LIMIT_INIT(1, 1) may miss some useful information.   I would
> use something in between (VLOG_RATE_LIMIT_INIT(5, 5)) in the next
> version.
>

TBH, I am not sure we care much about this information. I was even debating
changing it to debug level.
We have 4 billion datapath timeout profile IDs, so it is unlikely we will
run out.
Eventually, they will get cleaned up by the retry logic.

Also, I am not sure what action we will take by seeing these logs anyhow.

Spamming the log is more of a concern.
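
To make the trade-off between the (1, 1) and (5, 5) style settings concrete,
here is a rough token-bucket sketch. It assumes the first argument is an
average rate in messages per minute and the second is a burst size, which is
how this thread treats them. It is not the vlog implementation; struct
rate_limit and both functions are hypothetical names used only for
illustration.

```c
#include <stdbool.h>

/* Simplified token bucket: 'rate' tokens are added per minute, and at most
 * 'burst' tokens can be saved up.  Each logged message costs one token. */
struct rate_limit {
    unsigned int rate;      /* Messages allowed per minute on average. */
    unsigned int burst;     /* Maximum number of saved-up messages. */
    double tokens;          /* Tokens currently available. */
    long long int last_ms;  /* Time of the last refill, in milliseconds. */
};

static void
rate_limit_init(struct rate_limit *rl, unsigned int rate,
                unsigned int burst, long long int now_ms)
{
    rl->rate = rate;
    rl->burst = burst;
    rl->tokens = burst;     /* Start with a full bucket. */
    rl->last_ms = now_ms;
}

/* Returns true if a message may be logged at time 'now_ms'. */
static bool
rate_limit_allow(struct rate_limit *rl, long long int now_ms)
{
    /* Refill in proportion to the elapsed time, capped at 'burst'. */
    rl->tokens += (double) (now_ms - rl->last_ms) * rl->rate / 60000.0;
    if (rl->tokens > rl->burst) {
        rl->tokens = rl->burst;
    }
    rl->last_ms = now_ms;

    if (rl->tokens >= 1.0) {
        rl->tokens -= 1.0;
        return true;
    }
    return false;
}
```

Under this model, (1, 1) drops a second failure that arrives within the same
minute, while (6, 6) lets a burst of six messages through before throttling.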


>
> Thanks,
>
> -Yi-Hung
>


Re: [ovs-dev] [PATCH] conntrack: check the result of extract_l3_ipv4/6

2019-08-19 Thread Darrell Ball
Thanks for the patch

On Sun, Aug 18, 2019 at 11:01 PM Li RongQing  wrote:

> The result of extract_l3_ipv4/6 should be checked in reverse_nat_packet().
> When it is false, the packet is malformed and should not be processed
> any further.
>
> Signed-off-by: Li RongQing 
> ---
>  lib/conntrack.c | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 5f60fea18..c26d5438c 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -695,11 +695,18 @@ reverse_nat_packet(struct dp_packet *pkt, const
> struct conn *conn)
>  uint16_t orig_l4_ofs = pkt->l4_ofs;
>
>  if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
> +bool ok;
>  struct ip_header *nh = dp_packet_l3(pkt);
>  struct icmp_header *icmp = dp_packet_l4(pkt);
>  struct ip_header *inner_l3 = (struct ip_header *) (icmp + 1);
> -extract_l3_ipv4(_key, inner_l3, tail - ((char *)inner_l3) -
> pad,
>

There is intentionally no check for success/failure here, because the packet
has already been parsed and found to be OK in the conn_key_extract() code
path. Reusing the same API here is just convenient.


> +
> +ok = extract_l3_ipv4(_key, inner_l3,
> +tail - ((char *)inner_l3) - pad,
>  _l4, false);
> +if (!ok) {
> +return;
> +}
> +
>  pkt->l3_ofs += (char *) inner_l3 - (char *) nh;
>  pkt->l4_ofs += inner_l4 - (char *) icmp;
>
> @@ -715,13 +722,19 @@ reverse_nat_packet(struct dp_packet *pkt, const
> struct conn *conn)
>  icmp->icmp_csum = 0;
>  icmp->icmp_csum = csum(icmp, tail - (char *) icmp - pad);
>  } else {
> +bool ok;
>  struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt);
>  struct icmp6_error_header *icmp6 = dp_packet_l4(pkt);
>  struct ovs_16aligned_ip6_hdr *inner_l3_6 =
>  (struct ovs_16aligned_ip6_hdr *) (icmp6 + 1);
> -extract_l3_ipv6(_key, inner_l3_6,
>

There is intentionally no check for success/failure here, because the packet
has already been parsed and found to be OK in the conn_key_extract() code
path. Reusing the same API here is just convenient.


> +
> +ok = extract_l3_ipv6(_key, inner_l3_6,
>  tail - ((char *)inner_l3_6) - pad,
>  _l4);
> +
> +if (!ok) {
> +return;
> +}
>  pkt->l3_ofs += (char *) inner_l3_6 - (char *) nh6;
>  pkt->l4_ofs += inner_l4 - (char *) icmp6;
>
> --
> 2.16.2
>


Re: [ovs-dev] [PATCH v4 6/9] ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables

2019-08-16 Thread Darrell Ball
Thanks for the patch

Pls let me know if the following incremental works for you.

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 244155a..cb8b51e 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -168,6 +168,12 @@ struct ct_timeout_policy {
  * "ct_tp_kill_list" list. */
 };

+/* Periodically try to purge deleted timeout policies from the datapath.
Retry
+ * may be necessary if the kernel datapath has a non-zero datapath flow
+ * reference count for the timeout policy. */
+#define TIMEOUT_POLICY_CLEANUP_INTERVAL (30) /* 5 minutes. */
+static long long int timeout_policy_cleanup_timer;
+
 struct ct_zone {
 uint16_t zone_id;
 struct ct_timeout_policy *ct_tp;
@@ -5294,19 +5300,20 @@ ct_zone_config_uninit(struct dpif_backer *backer)
 static void
 ct_zone_timeout_policy_sweep(struct dpif_backer *backer)
 {
-if (!ovs_list_is_empty(&backer->ct_tp_kill_list)) {
+if (!ovs_list_is_empty(&backer->ct_tp_kill_list)
+&& time_msec() >= timeout_policy_cleanup_timer) {
 struct ct_timeout_policy *ct_tp, *next;

 LIST_FOR_EACH_SAFE (ct_tp, next, list_node, &backer->ct_tp_kill_list) {
-int err = ct_dpif_del_timeout_policy(backer->dpif, ct_tp->tp_id);
-if (!err) {
+if (!ct_dpif_del_timeout_policy(backer->dpif, ct_tp->tp_id)) {
 ovs_list_remove(&ct_tp->list_node);
 ct_timeout_policy_destroy(ct_tp, backer->tp_ids);
 } else {
-VLOG_INFO_RL(, "failed to delete timeout policy id = "
- "%"PRIu32" %s", ct_tp->tp_id,
ovs_strerror(err));
+/* INFO log raised by 'dpif' layer. */
 }
 }
+timeout_policy_cleanup_timer = time_msec() +
+TIMEOUT_POLICY_CLEANUP_INTERVAL;
 }
 }

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 72c9297..56f42c0 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -158,22 +158,23 @@ struct ct_zone {
 uint16_t zone;
 struct simap tp;/* A map from timeout policy attribute to
  * timeout value. */
-unsigned int last_used; /* The last idl_seqno that this struct is
used
+unsigned int last_used; /* The last idl_seqno that this 'ct_zone'
used
  * in OVSDB. This number is used for
garbage
  * collection. */
-struct hmap_node node;  /* Element in struct datapath_cfgs's
- * "ct_zone_timeout_policies" hmap. */
+struct hmap_node node;  /* Node in 'struct datapath' 'ct_zones'
+ * hmap. */
 };

 /* Internal representation of datapath configuration table in OVSDB. */
 struct datapath {
 char *type; /* Datapath type. */
-struct hmap ct_zones;   /* "struct ct_zone"s indexed by zone id. */
-struct hmap_node node;  /* In 'all_datapath_cfgs'. */
+struct hmap ct_zones;   /* Map of 'struct ct_zone' elements,
indexed
+ * by 'zone'. */
+struct hmap_node node;  /* Node in 'all_datapaths' hmap. */
 const struct ovsrec_datapath *dp_cfg;
-unsigned int last_used; /* The last idl_seqno that this struct is
used
- * in OVSDB. This number is used for
garbage
- * collection. */
+unsigned int last_used; /* The last idl_seqno that this 'datapath'
+ * used in OVSDB. This number is used for
+ * garbage collection. */
 };

 /* All bridges, indexed by name. */
@@ -712,10 +713,9 @@ static void
 update_datapath_cfgs(const struct ovsrec_open_vswitch *cfg)
 {
 struct datapath *dp, *next;
-size_t i;

-/* Add new datapath configs. */
-for (i = 0; i < cfg->n_datapaths; i++) {
+/* Add new 'datapath's or update existing ones. */
+for (size_t i = 0; i < cfg->n_datapaths; i++) {
 const struct ovsrec_datapath *dp_cfg = cfg->value_datapaths[i];
 char *dp_name = cfg->key_datapaths[i];

@@ -726,7 +726,7 @@ update_datapath_cfgs(const struct ovsrec_open_vswitch
*cfg)
 dp->last_used = idl_seqno;
 }

-/* Get rid of deleted datapath configs. */
+/* Purge deleted 'datapath's. */
 HMAP_FOR_EACH_SAFE (dp, next, node, _datapaths) {
 if (dp->last_used != idl_seqno) {
 datapath_destroy(dp);
@@ -740,7 +740,8 @@ reconfigure_ct_zones(struct datapath *dp)
 const struct ovsrec_datapath *dp_cfg = dp->dp_cfg;
 struct ct_zone *ct_zone, *next;

-/* Loop through all zones. Add or update configs. */
+/* Add new 'ct_zone's or update existing 'ct_zone's based on the
database
+ * state. */
 for (size_t i = 0; i < dp_cfg->n_ct_zones; i++) {
 uint16_t zone = dp_cfg->key_ct_zones[i];
 struct ovsrec_ct_zone *zone_cfg = dp_cfg->value_ct_zones[i];
@@ -763,7 

Re: [ovs-dev] [PATCH v4 4/9] ct-dpif, dpif-netlink: Add conntrack timeout policy support

2019-08-16 Thread Darrell Ball
Thanks for the patch

Pls let me know if this incremental works for you.
Main change is logging fix for timeout policy deletion.

Darrell

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 1d4ee60..00d957b 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2822,11 +2822,10 @@ dpif_netlink_ct_dump_done(struct dpif *dpif
OVS_UNUSED,
   struct ct_dpif_dump_state *dump_)
 {
 struct dpif_netlink_ct_dump_state *dump;
-int err;

 INIT_CONTAINER(dump, dump_, up);

-err = nl_ct_dump_done(dump->nl_ct_dump);
+int err = nl_ct_dump_done(dump->nl_ct_dump);
 free(dump);
 return err;
 }
@@ -3335,7 +3334,8 @@ dpif_netlink_ct_del_timeout_policy(struct dpif *dpif
OVS_UNUSED,
 err = 0;
 }
 if (err) {
-VLOG_WARN_RL(_rl, "failed to delete timeout policy %s
(%s)",
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+VLOG_INFO_RL(&rl, "failed to delete timeout policy %s (%s)",
  ds_cstr(&nl_tp_name), ovs_strerror(err));
 goto out;
 }
@@ -3392,10 +3392,9 @@ dpif_netlink_ct_timeout_policy_dump_start(struct
dpif *dpif OVS_UNUSED,
   void **statep)
 {
 struct dpif_netlink_ct_timeout_policy_dump_state *dump_state;
-int err;

 *statep = dump_state = xzalloc(sizeof *dump_state);
-err = nl_ct_timeout_policy_dump_start(&dump_state->nl_dump_state);
+int err = nl_ct_timeout_policy_dump_start(&dump_state->nl_dump_state);
 if (err) {
 free(dump_state);
 return err;

On Thu, Aug 15, 2019 at 12:35 PM Yi-Hung Wei  wrote:

> This patch first defines the dpif interface for a datapath to support
> adding, deleting, getting and dumping conntrack timeout policies.
> The timeout policy is identified by a 4-byte unsigned integer in the
> datapath, and it currently supports timeouts for the TCP, UDP, and ICMP
> protocols.
>
> Moreover, this patch provides the implementation for Linux kernel
> datapath in dpif-netlink.
>
> In the Linux kernel, the timeout policy is maintained per L3/L4 protocol,
> and it is identified by a 32-byte null-terminated string.  On the other
> hand, in vswitchd, the timeout policy is a generic one that consists of
> all the supported L4 protocols.  Therefore, one of the main tasks in
> dpif-netlink is to break down the generic timeout policy into 6
> sub policies (ipv4 tcp, udp, icmp, and ipv6 tcp, udp, icmp),
> and push down the configuration using the netlink API in
> netlink-conntrack.c.
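
The breakdown described above can be sketched as a small name formatter. The
"ovs_tp_<id>_<proto>" shape is inferred from the ovs_tp_0_udp4 name that
shows up in the logs elsewhere in this thread. Every suffix other than "udp4"
is an assumption made for illustration, and format_tp_name() itself is a
hypothetical helper rather than the function in dpif-netlink.c.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Name buffer size, including the terminating NUL, following the
 * 32-byte limit mentioned in the patch description. */
#define TP_NAME_MAX 32

/* One suffix per L3/L4 combination.  Only "udp4" is confirmed by the logs
 * in this thread; the other five are assumed for illustration. */
static const char *const tp_suffixes[] = {
    "tcp4", "udp4", "icmp4", "tcp6", "udp6", "icmp6",
};
#define N_TP_SUFFIXES (sizeof tp_suffixes / sizeof tp_suffixes[0])

/* Writes the sub policy name for 'tp_id' and suffix index 'idx' into
 * 'name'.  Returns false if 'idx' is out of range or the name would not
 * fit in TP_NAME_MAX bytes. */
static bool
format_tp_name(uint32_t tp_id, size_t idx, char name[TP_NAME_MAX])
{
    if (idx >= N_TP_SUFFIXES) {
        return false;
    }
    int n = snprintf(name, TP_NAME_MAX, "ovs_tp_%" PRIu32 "_%s",
                     tp_id, tp_suffixes[idx]);
    return n > 0 && n < TP_NAME_MAX;
}
```

Looping idx from 0 to N_TP_SUFFIXES - 1 for a single tp_id gives the six
per-protocol names that one generic policy would expand into.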
>
> This patch also adds missing symbols in the windows datapath so
> that the build on windows can pass.
>
> Appveyor CI:
> * https://ci.appveyor.com/project/YiHungWei/ovs/builds/26387754
>
> Signed-off-by: Yi-Hung Wei 
> Acked-by: Alin Gabriel Serdean 
> ---
>  Documentation/faq/releases.rst |   3 +-
>  datapath-windows/include/OvsDpInterfaceCtExt.h | 114 +
>  datapath-windows/ovsext/Netlink/NetlinkProto.h |   8 +-
>  include/windows/automake.mk|   1 +
>  .../windows/linux/netfilter/nfnetlink_cttimeout.h  |   0
>  lib/ct-dpif.c  | 102 +
>  lib/ct-dpif.h  |  56 +++
>  lib/dpif-netdev.c  |   6 +
>  lib/dpif-netlink.c | 478
> +
>  lib/dpif-netlink.h |   1 -
>  lib/dpif-provider.h|  44 ++
>  lib/netlink-conntrack.c| 301 +
>  lib/netlink-conntrack.h|  27 +-
>  lib/netlink-protocol.h |   8 +-
>  14 files changed, 1142 insertions(+), 7 deletions(-)
>  create mode 100644 include/windows/linux/netfilter/nfnetlink_cttimeout.h
>
> diff --git a/Documentation/faq/releases.rst
> b/Documentation/faq/releases.rst
> index 8daa23bb2d0c..0b7eaab1b143 100644
> --- a/Documentation/faq/releases.rst
> +++ b/Documentation/faq/releases.rst
> @@ -110,8 +110,9 @@ Q: Are all features available with all datapaths?
>  == == == =
> ===
>  Connection tracking 4.3YES  YES
> YES
>  Conntrack Fragment Reass.   4.3YES  YES
> YES
> +Conntrack Timeout Policies  5.2YES  NO
>  NO
> +Conntrack Zone Limit4.18   YES  NO
>  YES
>  NAT 4.6YES  YES
> YES
> -Conntrack zone limit4.18   YES  NO
>  YES
>  Tunnel - LISP   NO YES  NO
>  NO
>  Tunnel - STTNO YES  NO
>  YES
>  Tunnel - GRE3.11   YES  YES
> YES
> diff --git a/datapath-windows/include/OvsDpInterfaceCtExt.h
> 

Re: [ovs-dev] [PATCH v4 0/9] Support zone-based conntrack timeout policy

2019-08-15 Thread Darrell Ball
Thanks for the patches

I did some quick ad hoc testing with 4.4.0-119 and 5.0.0-23.

Some results worth noting:

1/ Failing to delete timeout policies with non-zero refcounts results in
WARN logging

> 2019-08-16T00:21:32.240Z|00153|dpif_netlink|WARN|failed to delete timeout policy ovs_tp_0_udp4 (Device or resource busy)
> 2019-08-16T00:21:32.240Z|00154|ofproto_dpif|INFO|failed to delete timeout policy id = 0 Device or resource busy
> 2019-08-16T00:21:32.241Z|00155|dpif_netlink|WARN|failed to delete timeout policy ovs_tp_0_udp4 (Device or resource busy)
> 2019-08-16T00:21:32.241Z|00156|ofproto_dpif|INFO|failed to delete timeout policy id = 0 Device or resource busy
> 2019-08-16T00:21:32.243Z|00157|dpif_netlink|WARN|failed to delete timeout policy ovs_tp_0_udp4 (Device or resource busy)
> 2019-08-16T00:21:32.243Z|00158|ofproto_dpif|INFO|failed to delete timeout policy id = 0 Device or resource busy
> 2019-08-16T00:21:32.243Z|00159|dpif_netlink|WARN|failed to delete timeout policy ovs_tp_0_udp4 (Device or resource busy)
> 2019-08-16T00:21:32.243Z|00160|ofproto_dpif|INFO|failed to delete timeout policy id = 0 Device or resource busy

Let's change this to INFO level, since it can legitimately and expectedly
occur.

Also, eliminate one of the redundant logs (dpif_netlink or ofproto_dpif):
> 2019-08-16T00:21:32.243Z|00159|dpif_netlink|WARN|failed to delete timeout policy ovs_tp_0_udp4 (Device or resource busy)
> 2019-08-16T00:21:32.243Z|00160|ofproto_dpif|INFO|failed to delete timeout policy id = 0 Device or resource busy
while still maintaining the full profile name (e.g., ovs_tp_0_udp4) used in
the kernel.

2/ Also, the cleanup attempts are too aggressive:
bridge_run() -> ... -> type_run() -> ct_zone_timeout_policy_sweep()
IIRC, we had a debounce in previous version(s); maybe I missed it in my
quick check of v4; I'll check more later.
Anyways, we need some debouncing.
Also the vlog rate limiting may need adjustment to throttle more.
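
A debounce of the kind meant here could look roughly like the sketch below.
It is only an illustration under stated assumptions: struct cleanup_timer and
cleanup_due() are hypothetical names, time is passed in as plain
milliseconds, and the interval is left to the caller.

```c
#include <limits.h>
#include <stdbool.h>

/* Debounce state for a periodic sweep. */
struct cleanup_timer {
    long long int interval_ms;  /* Minimum spacing between sweeps. */
    long long int next_ms;      /* Earliest time the next sweep may run;
                                 * LLONG_MIN means "run as soon as there
                                 * is work". */
};

/* Returns true if a sweep should run at 'now_ms'.  A sweep only runs when
 * there is pending work and the previous sweep is at least 'interval_ms'
 * old.  Running a sweep re-arms the timer. */
static bool
cleanup_due(struct cleanup_timer *t, bool work_pending, long long int now_ms)
{
    if (!work_pending || now_ms < t->next_ms) {
        return false;
    }
    t->next_ms = now_ms + t->interval_ms;
    return true;
}
```

With next_ms starting at LLONG_MIN, the first sweep runs immediately. Later
calls from the main loop return false until the interval has elapsed, so the
kill list is not retried on every pass.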

Thanks Darrell

On Thu, Aug 15, 2019 at 12:31 PM Yi-Hung Wei  wrote:

> This patch series enables zone-based conntrack timeout policy support in
> OVS.
> A timeout policy is a set of timeout attributes that can be associated
> with a connection when it is committed.  The connection tracking system
> will then expire a connection based on its connection state.  For example,
> one use case would be to extend the timeout of a TCP connection in the
> established state to avoid reconnect overhead.  Another use case may be to
> shorten the connection timeout so that the system can reclaim resources
> faster.
> The idea of zone-based conntrack timeout policy is to group connections
> with similar characteristics in a conntrack zone, and assign a timeout
> policy to the conntrack zone.  In this way, all the connections in that
> zone will share the same timeout policy.
>
> For zone-based timeout policy configuration, the association of conntrack
> zone and conntrack timeout policy is defined per datapath in the vswitchd
> ovsdb schema.  Users can program the database through ovs-vsctl or by
> using the ovsdb protocol directly.  Once the zone-based timeout policy
> configuration is in the database, vswitchd will read the configuration,
> organize it in an internal datapath structure, and push the timeout policy
> into the datapath.
> Currently, only the kernel datapath supports customized timeout policies.
>
> When a packet is committed to the connection tracking system, during flow
> translation in ofproto-dpif-xlate, vswitchd will look up the internal
> data structure to figure out which timeout policy to associate with
> the connection.  If a timeout policy is not specified for the committed
> zone, it defaults to the timeout policy in the default zone (zone 0).
> If the timeout policy is not specified in the default zone, it defaults
> to the system default timeouts.
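
The fallback order in the paragraph above (committed zone, then zone 0, then
system defaults) can be sketched as a small lookup. This is illustrative
only: struct zone_tp, the flat array and both functions are hypothetical and
say nothing about the data structure ofproto-dpif actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* One configured association between a conntrack zone and a timeout
 * policy ID. */
struct zone_tp {
    uint16_t zone;
    uint32_t tp_id;
};

/* Returns the entry for exactly 'zone', or NULL if none is configured. */
static const struct zone_tp *
find_zone(const struct zone_tp *table, size_t n, uint16_t zone)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].zone == zone) {
            return &table[i];
        }
    }
    return NULL;
}

/* Resolves the timeout policy for 'zone': the zone's own policy if set,
 * otherwise the default zone's (zone 0) policy.  Returns NULL when neither
 * is configured, meaning the system default timeouts apply. */
static const uint32_t *
lookup_timeout_policy(const struct zone_tp *table, size_t n, uint16_t zone)
{
    const struct zone_tp *e = find_zone(table, n, zone);

    if (!e && zone != 0) {
        e = find_zone(table, n, 0);
    }
    return e ? &e->tp_id : NULL;
}
```

A NULL result is the signal to attach no policy at all and leave the
connection on the system default timeouts.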
>
> Here are some more details about each patch
> * p01: Introduce ovsdb schema for ct timeout policy.
> * p02: ovs-vsctl commands to configure zone-based ct timeout policy.
> * p03: Expose a utility function.
> * p04: dpif interface along with dpif-netlink implementation to support
>ct timeout policy.
> * p05: Consume ct timeout policy configuration from ovsdb server,
>keep it in internal data structure, and push configuration to
>datapath.
> * p06: Add utility function to help compare two simaps.
> * p07-08: Kernel datapath support for the new ct action attribute.
> * p09: Translate timeout policy in ofproto-dpif-xlate and system traffic
> test.
>
> v3->v4:
> * ofproto-dpif
> - Probe datapath for timeout policy support.
> - With the probing information only translate timeout policy when
>   the datapath is supported.
> - Resolve the old kernel compatibility issue reported by Darrell.
> * system-traffic
> - Simplify the testing script (diff from Darrell).
> * Address various code changes as in the mailing list discussion.
>
> v2->v3
> * ovsdb schema
> - Fold in changes from Justin.
> - Make ct timeout policy 

Re: [ovs-dev] [PATCH v3 2/9] ovs-vsctl: Add conntrack zone commands.

2019-08-15 Thread Darrell Ball
On Thu, Aug 15, 2019 at 10:53 AM William Tu  wrote:

> Thanks for the review.
>
> On Mon, Aug 12, 2019 at 09:54:42PM -0700, Darrell Ball wrote:
> > Thanks for the patch
> >
> > Thanks for the fixups; mostly minor comments inline.
> >
> > On Mon, Aug 12, 2019 at 5:53 PM Yi-Hung Wei 
> wrote:
> >
> > > From: William Tu 
> > >
> > > The patch adds commands creating/deleting/listing conntrack zone
> > > timeout policies:
> > >   $ ovs-vsctl {add,del,list}-zone-tp dp zone=zone_id ...
> > >
> > > Signed-off-by: William Tu 
> > > ---
> > >  tests/ovs-vsctl.at   |  34 -
> > >  utilities/ovs-vsctl.8.in |  26 +++
> > >  utilities/ovs-vsctl.c| 194
> > > +++
> > >  3 files changed, 252 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
> > > index 46fa3c5b1a33..df15fb6901a0 100644
> > > --- a/tests/ovs-vsctl.at
> > > +++ b/tests/ovs-vsctl.at
> > > @@ -805,6 +805,20 @@ AT_CHECK(
> > >[RUN_OVS_VSCTL([--if-exists remove netflow x targets '"1.2.3.4:567
> > > "'])])
> > >  AT_CHECK(
> > >[RUN_OVS_VSCTL([--if-exists clear netflow x targets])])
> > > +
> > > +AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath
> datapath_version=0 --
> > > set Open_vSwitch . datapaths:"netdev"=@m])], [0], [stdout])
> > > +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=1 icmp_first=1
> > > icmp_reply=2])])
> > > +AT_CHECK([RUN_OVS_VSCTL([--may-exist add-zone-tp netdev zone=1
> > > icmp_first=1 icmp_reply=2])])
> > > +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
> > > Policies: icmp_first=1 icmp_reply=2
> > > +])
> > > +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
> > > icmp_reply=3])])
> > > +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
> > > Policies: icmp_first=1 icmp_reply=2
> > > +Zone:2, Timeout Policies: icmp_first=2 icmp_reply=3
> > > +])
> > > +AT_CHECK([RUN_OVS_VSCTL([del-zone-tp netdev zone=1])])
> > > +AT_CHECK([RUN_OVS_VSCTL([--if-exists del-zone-tp netdev zone=1])])
> > > +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:2, Timeout
> > > Policies: icmp_first=2 icmp_reply=3
> > > +])
> > >  OVS_VSCTL_CLEANUP
> > >  AT_CLEANUP
> > >
> > > @@ -890,10 +904,10 @@ AT_CHECK([RUN_OVS_VSCTL([set bridge br0
> > > flood_vlans=-1])],
> > >  AT_CHECK([RUN_OVS_VSCTL([set bridge br0 flood_vlans=4096])],
> > >[1], [], [ovs-vsctl: constraint violation: 4096 is not in the valid
> > > range 0 to 4095 (inclusive)
> > >  ])
> > > -AT_CHECK([RUN_OVS_VSCTL([set c br1 'connection-mode=xyz'])],
> > > +AT_CHECK([RUN_OVS_VSCTL([set controller br1 'connection-mode=xyz'])],
> > >[1], [], [[ovs-vsctl: constraint violation: xyz is not one of the
> > > allowed values ([in-band, out-of-band])
> > >  ]])
> > > -AT_CHECK([RUN_OVS_VSCTL([set c br1 connection-mode:x=y])],
> > > +AT_CHECK([RUN_OVS_VSCTL([set controller br1 connection-mode:x=y])],
> > >[1], [], [ovs-vsctl: cannot specify key to set for non-map column
> > > connection_mode
> > >  ])
> > >  AT_CHECK([RUN_OVS_VSCTL([add bridge br1 datapath_id x y])],
> > > @@ -929,6 +943,22 @@ AT_CHECK([RUN_OVS_VSCTL([remove bridge br1
> > > flood-vlans true])],
> > >  AT_CHECK([RUN_OVS_VSCTL([clear bridge br1 name])],
> > >[1], [], [ovs-vsctl: cannot modify read-only column name in table
> Bridge
> > >  ])
> > > +
> > > +AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath
> datapath_version=0 --
> > > set Open_vSwitch . datapaths:"netdev"=@m])], [0], [stdout])
> > >
> >
> > If I execute
> >
> > AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath datapath_version=0 --
> > set Open_vSwitch . datapaths:"netdevvv"=@m])], [0], [stdout])
> >
> > it works, but there is no such datapath type 'netdevvv'
> >
> This is using a database command; I don't think we should enforce it.
>

Just to be clear, there is nothing to do for this aspect in terms of the
ovs-vsctl command support.
I mentioned it here for context, because the tests are in this patch.
The paragraph below has more context, but don't worry about that either.


>
> > I think it would be better to enforce an enum here as well thru the
> schema,
> > as I mentioned in V2,

Re: [ovs-dev] [PATCH v3 4/9] ct-dpif, dpif-netlink: Add conntrack timeout policy support

2019-08-14 Thread Darrell Ball
Thanks for the patch

mostly minor comments

On Mon, Aug 12, 2019 at 5:54 PM Yi-Hung Wei  wrote:

> This patch first defines the dpif interface for a datapath to support
> adding, deleting, getting and dumping conntrack timeout policies.
> The timeout policy is identified by a 4-byte unsigned integer in the
> datapath, and it currently supports timeouts for the TCP, UDP, and ICMP
> protocols.
>
> Moreover, this patch provides the implementation for Linux kernel
> datapath in dpif-netlink.
>
> In the Linux kernel, the timeout policy is maintained per L3/L4 protocol,
> and it is identified by a 32-byte null-terminated string.  On the other
> hand, in vswitchd, the timeout policy is a generic one that consists of
> all the supported L4 protocols.  Therefore, one of the main tasks in
> dpif-netlink is to break down the generic timeout policy into 6
> sub policies (ipv4 tcp, udp, icmp, and ipv6 tcp, udp, icmp),
> and push down the configuration using the netlink API in
> netlink-conntrack.c.
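
As a rough sketch of the breakdown described above, one generic policy id can be expanded into six per-protocol names that each fit in a 32-byte buffer. The `ovs_tp_<id>_icmp4` shape is taken from the `timeout=ovs_tp_0_icmp4` action that shows up in the logs elsewhere in this thread; the other five suffixes, and the names `sub_policy_suffix` and `format_sub_policy_names()`, are assumptions made for illustration and may not match what dpif-netlink actually emits.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define TP_NAME_MAX 32    /* Buffer size, including the terminating NUL. */
#define N_SUB_POLICIES 6

/* Hypothetical suffixes, one per (L3, L4) pair.  Only "icmp4" is confirmed
 * by the thread; the rest are guesses that follow the same pattern. */
static const char *const sub_policy_suffix[N_SUB_POLICIES] = {
    "tcp4", "udp4", "icmp4", "tcp6", "udp6", "icmp6",
};

/* Fills 'names' with the six sub policy names derived from the generic
 * policy id 'tp_id'.  Returns 0 on success, -1 if a name does not fit. */
static int
format_sub_policy_names(uint32_t tp_id,
                        char names[N_SUB_POLICIES][TP_NAME_MAX])
{
    for (int i = 0; i < N_SUB_POLICIES; i++) {
        int n = snprintf(names[i], TP_NAME_MAX, "ovs_tp_%" PRIu32 "_%s",
                         tp_id, sub_policy_suffix[i]);
        if (n < 0 || n >= TP_NAME_MAX) {
            return -1;
        }
    }
    return 0;
}
```

Even the largest 32-bit id leaves room in the buffer, so the length check only guards against future changes to the prefix or suffixes.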
>
> This patch also adds missing symbols in the windows datapath so
> that the build on windows can pass.
>
> Appveyor CI:
> * https://ci.appveyor.com/project/YiHungWei/ovs/builds/26387754
>
> Signed-off-by: Yi-Hung Wei 
> ---
>  Documentation/faq/releases.rst |   3 +-
>  datapath-windows/include/OvsDpInterfaceCtExt.h | 114 +
>  datapath-windows/ovsext/Netlink/NetlinkProto.h |   8 +-
>  include/windows/automake.mk|   1 +
>  .../windows/linux/netfilter/nfnetlink_cttimeout.h  |   0
>  lib/ct-dpif.c  | 104 +
>  lib/ct-dpif.h  |  56 +++
>  lib/dpif-netdev.c  |   6 +
>  lib/dpif-netlink.c | 469
> +
>  lib/dpif-netlink.h |   1 -
>  lib/dpif-provider.h|  44 ++
>  lib/netlink-conntrack.c| 308 ++
>  lib/netlink-conntrack.h|  27 +-
>  lib/netlink-protocol.h |   8 +-
>  14 files changed, 1142 insertions(+), 7 deletions(-)
>  create mode 100644 include/windows/linux/netfilter/nfnetlink_cttimeout.h
>
> diff --git a/Documentation/faq/releases.rst
> b/Documentation/faq/releases.rst
> index 8daa23bb2d0c..0b7eaab1b143 100644
> --- a/Documentation/faq/releases.rst
> +++ b/Documentation/faq/releases.rst
> @@ -110,8 +110,9 @@ Q: Are all features available with all datapaths?
>  == == == = ===
>  Connection tracking         4.3    YES  YES  YES
>  Conntrack Fragment Reass.   4.3    YES  YES  YES
> +Conntrack Timeout Policies  5.2    YES  NO   NO
> +Conntrack Zone Limit        4.18   YES  NO   YES
>  NAT                         4.6    YES  YES  YES
> -Conntrack zone limit        4.18   YES  NO   YES
>  Tunnel - LISP               NO     YES  NO   NO
>  Tunnel - STT                NO     YES  NO   YES
>  Tunnel - GRE                3.11   YES  YES  YES
> diff --git a/datapath-windows/include/OvsDpInterfaceCtExt.h
> b/datapath-windows/include/OvsDpInterfaceCtExt.h
> index 3b947782e90c..4379855bb8dd 100644
> --- a/datapath-windows/include/OvsDpInterfaceCtExt.h
> +++ b/datapath-windows/include/OvsDpInterfaceCtExt.h
> @@ -421,4 +421,118 @@ struct nf_ct_tcp_flags {
>  UINT8 mask;
>  };
>
> +/* File: nfnetlink_cttimeout.h */
> +enum ctnl_timeout_msg_types {
> +IPCTNL_MSG_TIMEOUT_NEW,
> +IPCTNL_MSG_TIMEOUT_GET,
> +IPCTNL_MSG_TIMEOUT_DELETE,
> +IPCTNL_MSG_TIMEOUT_DEFAULT_SET,
> +IPCTNL_MSG_TIMEOUT_DEFAULT_GET,
> +
> +IPCTNL_MSG_TIMEOUT_MAX
> +};
> +
> +enum ctattr_timeout {
> +CTA_TIMEOUT_UNSPEC,
> +CTA_TIMEOUT_NAME,
> +CTA_TIMEOUT_L3PROTO,
> +CTA_TIMEOUT_L4PROTO,
> +CTA_TIMEOUT_DATA,
> +CTA_TIMEOUT_USE,
> +__CTA_TIMEOUT_MAX
> +};
> +#define CTA_TIMEOUT_MAX (__CTA_TIMEOUT_MAX - 1)
> +
> +enum ctattr_timeout_generic {
> +CTA_TIMEOUT_GENERIC_UNSPEC,
> +CTA_TIMEOUT_GENERIC_TIMEOUT,
> +__CTA_TIMEOUT_GENERIC_MAX
> +};
> +#define CTA_TIMEOUT_GENERIC_MAX (__CTA_TIMEOUT_GENERIC_MAX - 1)
> +
> +enum ctattr_timeout_tcp {
> +CTA_TIMEOUT_TCP_UNSPEC,
> +CTA_TIMEOUT_TCP_SYN_SENT,
> +CTA_TIMEOUT_TCP_SYN_RECV,
> +CTA_TIMEOUT_TCP_ESTABLISHED,
> +CTA_TIMEOUT_TCP_FIN_WAIT,
> +CTA_TIMEOUT_TCP_CLOSE_WAIT,
> +CTA_TIMEOUT_TCP_LAST_ACK,
> +CTA_TIMEOUT_TCP_TIME_WAIT,
> +CTA_TIMEOUT_TCP_CLOSE,
> +CTA_TIMEOUT_TCP_SYN_SENT2,
> +CTA_TIMEOUT_TCP_RETRANS,
> +CTA_TIMEOUT_TCP_UNACK,
> +__CTA_TIMEOUT_TCP_MAX
> +};
> +#define CTA_TIMEOUT_TCP_MAX (__CTA_TIMEOUT_TCP_MAX - 1)
> +
> +enum ctattr_timeout_udp {
> +

Re: [ovs-dev] [PATCH v3 6/9] ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables

2019-08-14 Thread Darrell Ball
On Wed, Aug 14, 2019 at 1:28 PM Yi-Hung Wei  wrote:

> On Tue, Aug 13, 2019 at 7:46 PM Darrell Ball  wrote:
> >
> > Thanks for the patch
> >
> > Some high level comments:
> >
> > 1/ The ct_tp_kill_list code is still in common code.
> > I think we discussed moving that to the dpif backer code.
> > ct_timeout_policy_unref() is adding to this deferred kill list, which is
> > not needed for the userspace datapath.
> > 2/ clear_existing_ct_timeout_policies() is in common code, but only does
> > something if ct_dpif_timeout_policy_dump_start/next/done are realized in
> > the datapath-type-specific code (which is only for the kernel code, which
> > is correct). I think it would be cleaner and less confusing just to make
> > the API clear_existing_ct_timeout_policies() kernel specific; i.e. in
> > dpif-netlink.
>
>
> Thanks for the review. I address most of the code changes in the
> detailed inline code review.
>
> The two high-level concerns arise mainly because ct_tp_kill_list is
> currently maintained in ofproto-dpif.c.  I thought about moving the
> ct_tp_kill_list implementation from dpif_backer (in ofproto-dpif.c) to
> dpif-netlink.c, and here is why I still keep it in ofproto-dpif.c in the
> dpif_backer layer in this version.
>
> AFAIK, we currently do not have a proper place to store ct_tp_kill_list
> in dpif-netlink.c in userspace.  dpif-netlink is for the kernel datapath
> implementation, and all the information that we configure in dpif-netlink
> is currently passed directly down into the kernel.  In the userspace
> datapath, we can store userspace-specific information in
> "struct dp_netdev", but there is no such place in dpif-netlink for now.
> In this case, it is natural to maintain the ct_tp_kill_list one level up
> in the dpif_backer layer.
>
> Anyhow, we can always change the way we maintain timeout policies in the
> ofproto-dpif layer when the dpif-netdev implementation is introduced.
>

As discussed, let's defer.


>
>
> > On Mon, Aug 12, 2019 at 5:55 PM Yi-Hung Wei 
> wrote:
> >> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> >> index 751535249e21..3013d83e96a0 100644
> >> --- a/ofproto/ofproto-dpif.c
> >> +++ b/ofproto/ofproto-dpif.c
> >> @@ -694,6 +718,8 @@ struct odp_garbage {
> >>
> >>  static void check_support(struct dpif_backer *backer);
> >>
> >> +#define MAX_TIMEOUT_POLICY_ID UINT32_MAX
> >
> >
> > seems like random placement; could be moved where it is used.
> >
>
> ok, I will move it right before ct_zone_config_init().
>
>
> >> +static struct ct_timeout_policy *
> >> +ct_timeout_policy_alloc__(void)
> >> +{
> >> +struct ct_timeout_policy *ct_tp = xzalloc(sizeof *ct_tp);
> >> +simap_init(&ct_tp->tp);
> >> +return ct_tp;
> >> +}
> >
> >
> > by using the above API, you are not saving any code, and it may be more
> > error prone
> >
>
> This function is used in ct_timeout_policy_alloc() and
> clear_existing_ct_timeout_policies(). So are you suggesting to expand
> it in these two functions?
>

ohh. I missed the other usage; I don't feel strongly either way.


>
> >>
> >> +
> >> +static struct ct_timeout_policy *
> >> +ct_timeout_policy_alloc(struct simap *tp, struct id_pool *tp_ids)
> >> +{
> >> +struct simap_node *node;
> >> +
> >> +struct ct_timeout_policy *ct_tp = ct_timeout_policy_alloc__();
> >> +SIMAP_FOR_EACH (node, tp) {
> >> +simap_put(&ct_tp->tp, node->name, node->data);
> >> +}
> >> +
> >> +if (!id_pool_alloc_id(tp_ids, &ct_tp->tp_id)) {
> >> +VLOG_ERR_RL(&rl, "failed to allocate timeout policy id.");
> >> +simap_destroy(&ct_tp->tp);
> >> +free(tp);
> >
> >
> > I think you rather need to free 'ct_tp'; i.e. free(ct_tp)
>
> Yes, thanks for spotting this bug.
>
>
> >> +static void
> >> +clear_existing_ct_timeout_policies(struct dpif_backer *backer)
> >> +{
> >> +/* In kernel datapath, when OVS starts, there may be some
> pre-existing
> >> + * timeout policies in the kernel.  To avoid reassign the same
> timeout
> >> + * policy ids, we dump all the pre-existing timeout policies and
> keep
> >> + * the ids in the pool.  Since OVS will not use those timeout
> policies
> >> + * for new datapath flow, we add them to the kill list and remove
> >> + * them later o

Re: [ovs-dev] [PATCH v3 1/9] ovs-vswitchd: Add Datapath, CT_Zone, and CT_Zone_Policy tables.

2019-08-14 Thread Darrell Ball
On Wed, Aug 14, 2019 at 9:47 AM Yi-Hung Wei  wrote:

> On Mon, Aug 12, 2019 at 7:46 PM Darrell Ball  wrote:
> >> diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
> >> index f7c6eb8983cd..c0a2242ad345 100644
> >> --- a/vswitchd/vswitch.ovsschema
> >> +++ b/vswitchd/vswitch.ovsschema
> >> @@ -1,9 +1,14 @@
> >>  {"name": "Open_vSwitch",
> >> - "version": "8.0.0",
> >> - "cksum": "3962141869 23978",
> >> + "version": "8.1.0",
> >> + "cksum": "1635647160 26090",
> >>   "tables": {
> >> "Open_vSwitch": {
> >>   "columns": {
> >> +   "datapaths": {
> >> + "type": {"key": {"type": "string"},
> >
> >
> > I had a minor comment in V2 about using an enum here for key - 'system'
> or 'netdev'
> > Does it work or is there worry that other datapath types will likely
> develop
> > and we will need to update the enum ?
>
> Thanks for the review.
>
> I discussed this with Justin.  We currently do not limit
> the datapath type in the Bridge table in the ovsdb schema, so we might
> just keep it as is to be consistent with the Bridge table.
>

Let's defer for now.
It can be treated as a separate issue and fixed later in one shot.


>
> >> +  "value": {"type": "uuid",
> >> +"refTable": "Datapath"},
> >> +  "min": 0, "max": "unlimited"}},
> >>
> >> "bridges": {
> >>   "type": {"key": {"type": "uuid",
> >>"refTable": "Bridge"},
> >> @@ -629,6 +634,48 @@
> >>"min": 0, "max": "unlimited"},
> >>   "ephemeral": true}},
> >>   "indexes": [["target"]]},
> >> +   "Datapath": {
> >> + "columns": {
> >> +   "datapath_version": {
> >> + "type": "string"},
> >> +   "ct_zones": {
> >> + "type": {"key": {"type": "integer",
> >> +  "minInteger": 0,
> >> +  "maxInteger": 65535},
> >> +  "value": {"type": "uuid",
> >> +"refTable": "CT_Zone"},
> >> +  "min": 0, "max": "unlimited"}},
> >
> >
> > minor comment from V2; I think
> > +  "min": 0, "max": "unlimited"}},
> > should be
> > +  "min": 0, "max": "65536"}},
>
> Since ct_zones is a map, the maximum size is already limited by the key
> range.
> + "type": {"key": {"type": "integer",
> +  "minInteger": 0,
> +  "maxInteger": 65535},
>
> Keeping "max" as "unlimited" also has the benefit that we do not need to
> update it when the range of values is changed.  There are other cases
> in the ovsdb schema that have similar behavior, for example:


>"queues": {
>  "type": {"key": {"type": "integer",
>   "minInteger": 0,
>   "maxInteger": 4294967295},
>   "value": {"type": "uuid",
> "refTable": "Queue"},
>   "min": 0, "max": "unlimited"}},
>
>"mappings": {
>  "type": {"key": {"type": "integer",
>   "minInteger": 0,
>   "maxInteger": 16777215},
>   "value": {"type": "integer",
>   "minInteger": 0,
>   "maxInteger": 4095},
>   "min": 0, "max": "unlimited"}}
>

Let's defer.


>
> >> +   "

Re: [ovs-dev] [PATCH v3 9/9] ofproto-dpif-xlate: Translate timeout policy in ct action

2019-08-13 Thread Darrell Ball
On Tue, Aug 13, 2019 at 8:03 PM Darrell Ball  wrote:

> Thanks for the patch
>
> few more comments
>
> On Mon, Aug 12, 2019 at 5:57 PM Yi-Hung Wei  wrote:
>
>> This patch derives the timeout policy based on ct zone from the
>> internal data structure that we maintain on dpif layer.
>>
>> It also adds a system traffic test to verify the zone-based conntrack
>> timeout feature.  The test uses ovs-vsctl commands to configure
>> the customized ICMP and UDP timeout on zone 5 to a shorter period.
>> It then injects ICMP and UDP traffic to conntrack, and checks if the
>> corresponding conntrack entry expires after the predefined timeout.
>>
>> Signed-off-by: Yi-Hung Wei 
>> ---
>>  NEWS |  1 +
>>  lib/ct-dpif.c| 11 +++
>>  lib/ct-dpif.h|  3 ++
>>  lib/dpif-netdev.c|  1 +
>>  lib/dpif-netlink.c   | 12 
>>  lib/dpif-provider.h  | 10 ++
>>  ofproto/ofproto-dpif-xlate.c | 23 ++
>>  ofproto/ofproto-dpif.c   | 27 
>>  ofproto/ofproto-dpif.h   |  4 +++
>>  tests/system-kmod-macros.at  | 27 
>>  tests/system-traffic.at  | 66
>> 
>>  tests/system-userspace-macros.at | 26 
>>  12 files changed, 211 insertions(+)
>>
>> diff --git a/NEWS b/NEWS
>> index c5caa13d6374..9f7fbb852e08 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -69,6 +69,7 @@ v2.12.0 - xx xxx 
>> - Linux datapath:
>>   * Support for the kernel versions 4.19.x and 4.20.x.
>>   * Support for the kernel version 5.0.x.
>> + * Add support for conntrack zone-based timeout policy.
>> - 'ovs-dpctl dump-flows' is no longer suitable for dumping offloaded
>> flows.
>>   'ovs-appctl dpctl/dump-flows' should be used instead.
>> - Add L2 GRE tunnel over IPv6 support.
>> diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
>> index 7f9ce0a561f7..f3bd71b5769d 100644
>> --- a/lib/ct-dpif.c
>> +++ b/lib/ct-dpif.c
>> @@ -864,3 +864,14 @@ ct_dpif_timeout_policy_dump_done(struct dpif *dpif,
>> void *state)
>>  ? dpif->dpif_class->ct_timeout_policy_dump_done(dpif, state)
>>  : EOPNOTSUPP);
>>  }
>> +
>> +int
>> +ct_dpif_get_timeout_policy_name(struct dpif *dpif, uint32_t tp_id,
>> +uint16_t dl_type, uint8_t nw_proto,
>> +struct ds *tp_name, bool *unwildcard)
>> +{
>> +return (dpif->dpif_class->ct_get_timeout_policy_name
>> +? dpif->dpif_class->ct_get_timeout_policy_name(
>> +dpif, tp_id, dl_type, nw_proto, tp_name, unwildcard)
>> +: EOPNOTSUPP);
>> +}
>> diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
>> index aabd6962f2c0..786dc6d2c474 100644
>> --- a/lib/ct-dpif.h
>> +++ b/lib/ct-dpif.h
>> @@ -318,5 +318,8 @@ int ct_dpif_timeout_policy_dump_start(struct dpif
>> *dpif, void **statep);
>>  int ct_dpif_timeout_policy_dump_next(struct dpif *dpif, void *state,
>>   struct ct_dpif_timeout_policy *tp);
>>  int ct_dpif_timeout_policy_dump_done(struct dpif *dpif, void *state);
>> +int ct_dpif_get_timeout_policy_name(struct dpif *dpif, uint32_t tp_id,
>> +uint16_t dl_type, uint8_t nw_proto,
>> +struct ds *tp_name, bool
>> *unwildcard);
>>
>>  #endif /* CT_DPIF_H */
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 7240a3e6f3c8..36637052e598 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -7539,6 +7539,7 @@ const struct dpif_class dpif_netdev_class = {
>>  NULL,   /* ct_timeout_policy_dump_start */
>>  NULL,   /* ct_timeout_policy_dump_next */
>>  NULL,   /* ct_timeout_policy_dump_done */
>> +NULL,   /* ct_get_timeout_policy_name */
>>  dpif_netdev_ipf_set_enabled,
>>  dpif_netdev_ipf_set_min_frag,
>>  dpif_netdev_ipf_set_max_nfrags,
>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>> index c2ac19dff887..c306242984ae 100644
>> --- a/lib/dpif-netlink.c
>> +++ b/lib/dpif-netlink.c
>> @@ -3072,6 +3072,17 @@ dpif_netlink_format_tp_name(uint32_t id, uint16_t
>> l3num, uint8_t l4num,
>>  ovs_assert(tp_name->length < CTNL_TIMEOUT_NAME_MAX);
>>  }
>>

Re: [ovs-dev] [PATCH v3 9/9] ofproto-dpif-xlate: Translate timeout policy in ct action

2019-08-13 Thread Darrell Ball
Thanks for the patch

few more comments

On Mon, Aug 12, 2019 at 5:57 PM Yi-Hung Wei  wrote:

> This patch derives the timeout policy based on ct zone from the
> internal data structure that we maintain on dpif layer.
>
> It also adds a system traffic test to verify the zone-based conntrack
> timeout feature.  The test uses ovs-vsctl commands to configure
> the customized ICMP and UDP timeout on zone 5 to a shorter period.
> It then injects ICMP and UDP traffic to conntrack, and checks if the
> corresponding conntrack entry expires after the predefined timeout.
>
> Signed-off-by: Yi-Hung Wei 
> ---
>  NEWS |  1 +
>  lib/ct-dpif.c| 11 +++
>  lib/ct-dpif.h|  3 ++
>  lib/dpif-netdev.c|  1 +
>  lib/dpif-netlink.c   | 12 
>  lib/dpif-provider.h  | 10 ++
>  ofproto/ofproto-dpif-xlate.c | 23 ++
>  ofproto/ofproto-dpif.c   | 27 
>  ofproto/ofproto-dpif.h   |  4 +++
>  tests/system-kmod-macros.at  | 27 
>  tests/system-traffic.at  | 66
> 
>  tests/system-userspace-macros.at | 26 
>  12 files changed, 211 insertions(+)
>
> diff --git a/NEWS b/NEWS
> index c5caa13d6374..9f7fbb852e08 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -69,6 +69,7 @@ v2.12.0 - xx xxx 
> - Linux datapath:
>   * Support for the kernel versions 4.19.x and 4.20.x.
>   * Support for the kernel version 5.0.x.
> + * Add support for conntrack zone-based timeout policy.
> - 'ovs-dpctl dump-flows' is no longer suitable for dumping offloaded
> flows.
>   'ovs-appctl dpctl/dump-flows' should be used instead.
> - Add L2 GRE tunnel over IPv6 support.
> diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> index 7f9ce0a561f7..f3bd71b5769d 100644
> --- a/lib/ct-dpif.c
> +++ b/lib/ct-dpif.c
> @@ -864,3 +864,14 @@ ct_dpif_timeout_policy_dump_done(struct dpif *dpif,
> void *state)
>  ? dpif->dpif_class->ct_timeout_policy_dump_done(dpif, state)
>  : EOPNOTSUPP);
>  }
> +
> +int
> +ct_dpif_get_timeout_policy_name(struct dpif *dpif, uint32_t tp_id,
> +uint16_t dl_type, uint8_t nw_proto,
> +struct ds *tp_name, bool *unwildcard)
> +{
> +return (dpif->dpif_class->ct_get_timeout_policy_name
> +? dpif->dpif_class->ct_get_timeout_policy_name(
> +dpif, tp_id, dl_type, nw_proto, tp_name, unwildcard)
> +: EOPNOTSUPP);
> +}
> diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
> index aabd6962f2c0..786dc6d2c474 100644
> --- a/lib/ct-dpif.h
> +++ b/lib/ct-dpif.h
> @@ -318,5 +318,8 @@ int ct_dpif_timeout_policy_dump_start(struct dpif
> *dpif, void **statep);
>  int ct_dpif_timeout_policy_dump_next(struct dpif *dpif, void *state,
>   struct ct_dpif_timeout_policy *tp);
>  int ct_dpif_timeout_policy_dump_done(struct dpif *dpif, void *state);
> +int ct_dpif_get_timeout_policy_name(struct dpif *dpif, uint32_t tp_id,
> +uint16_t dl_type, uint8_t nw_proto,
> +struct ds *tp_name, bool *unwildcard);
>
>  #endif /* CT_DPIF_H */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 7240a3e6f3c8..36637052e598 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7539,6 +7539,7 @@ const struct dpif_class dpif_netdev_class = {
>  NULL,   /* ct_timeout_policy_dump_start */
>  NULL,   /* ct_timeout_policy_dump_next */
>  NULL,   /* ct_timeout_policy_dump_done */
> +NULL,   /* ct_get_timeout_policy_name */
>  dpif_netdev_ipf_set_enabled,
>  dpif_netdev_ipf_set_min_frag,
>  dpif_netdev_ipf_set_max_nfrags,
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index c2ac19dff887..c306242984ae 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -3072,6 +3072,17 @@ dpif_netlink_format_tp_name(uint32_t id, uint16_t
> l3num, uint8_t l4num,
>  ovs_assert(tp_name->length < CTNL_TIMEOUT_NAME_MAX);
>  }
>
> +static int
> +dpif_netlink_ct_get_timeout_policy_name(struct dpif *dpif OVS_UNUSED,
> +uint32_t tp_id, uint16_t dl_type, uint8_t nw_proto, struct ds
> *tp_name,
> +bool *unwildcard)
> +{
> +dpif_netlink_format_tp_name(tp_id,
> +dl_type == ETH_TYPE_IP ? AF_INET : AF_INET6, nw_proto, tp_name);
> +*unwildcard = true;
> +return 0;
> +}
> +
>  #define CT_DPIF_NL_TP_TCP_MAPPINGS  \
>  CT_DPIF_NL_TP_MAPPING(TCP, TCP, SYN_SENT, SYN_SENT) \
>  CT_DPIF_NL_TP_MAPPING(TCP, TCP, SYN_RECV, SYN_RECV) \
> @@ -3898,6 +3909,7 @@ const struct dpif_class dpif_netlink_class = {
>  dpif_netlink_ct_timeout_policy_dump_start,
>  dpif_netlink_ct_timeout_policy_dump_next,
>  

Re: [ovs-dev] [PATCH v3 6/9] ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables

2019-08-13 Thread Darrell Ball
Thanks for the patch

Some high level comments:

1/ The ct_tp_kill_list code is still in common code.
I think we discussed moving that to the dpif backer code.
ct_timeout_policy_unref() is adding to this deferred kill list, which is
not needed for the userspace datapath.

2/ clear_existing_ct_timeout_policies() is in common code, but only does
something if ct_dpif_timeout_policy_dump_start/next/done are realized in
the datapath-type-specific code (which is only for the kernel code, which
is correct). I think it would be cleaner and less confusing just to make
the API clear_existing_ct_timeout_policies() kernel specific; i.e. in
dpif-netlink.

Some other comments inline


On Mon, Aug 12, 2019 at 5:55 PM Yi-Hung Wei  wrote:

> This patch consumes the CT_Zone and CT_Timeout_Policy tables and maintains
> the zone-based configuration in vswitchd.  Whenever there is a
> database change, vswitchd reads the Datapath, CT_Zone, and
> CT_Timeout_Policy tables from ovsdb, builds an internal snapshot of the
> database configuration in bridge.c, and pushes down the change into
> the ofproto and dpif layers.
>
> If a new zone-based timeout policy is added, it updates the zone to
> timeout policy mapping in the per datapath type datapath structure in
> dpif-backer, and pushes down the timeout policy into the datapath via
> dpif interface.
>
> If a timeout policy is no longer used, for the kernel datapath, vswitchd
> may not be able to remove it from the datapath immediately, since
> datapath flows can still reference the to-be-deleted timeout policies.
> Thus, we keep a timeout policy kill list, and vswitchd goes back to
> the list periodically and tries to kill the unused timeout policies.
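
The deferred deletion described above can be sketched roughly as below. This is not the patch's implementation (which uses an ovs_list inside struct dpif_backer); `struct kill_list`, `kill_list_sweep()`, `busy_tp_id` and `fake_datapath_delete()` are hypothetical names, and the fake callback only stands in for a datapath that refuses to delete a policy still referenced by flows.

```c
#include <stddef.h>
#include <stdint.h>

#define KILL_LIST_MAX 64

/* Hypothetical kill list: ids of policies that no zone refers to any more. */
struct kill_list {
    uint32_t ids[KILL_LIST_MAX];
    size_t n;
};

/* Tries to delete one policy from the datapath.  Returns 0 on success and
 * nonzero while datapath flows still reference the policy. */
typedef int tp_delete_fn(uint32_t tp_id);

/* One periodic pass over the list: ids that the datapath agreed to delete
 * are dropped, the others are kept for the next pass.  Returns the number
 * of ids still pending. */
static size_t
kill_list_sweep(struct kill_list *kl, tp_delete_fn *try_delete)
{
    size_t kept = 0;

    for (size_t i = 0; i < kl->n; i++) {
        if (try_delete(kl->ids[i]) != 0) {
            kl->ids[kept++] = kl->ids[i];   /* Still in use; retry later. */
        }
    }
    kl->n = kept;
    return kept;
}

/* Stand-in for the datapath, for illustration only: it refuses to delete
 * the policy whose id equals 'busy_tp_id'. */
static uint32_t busy_tp_id;

static int
fake_datapath_delete(uint32_t tp_id)
{
    return tp_id == busy_tp_id ? 1 : 0;
}
```

Calling the sweep from a periodic hook keeps retrying the busy entries until the flows that reference them have gone away.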
>
> Signed-off-by: Yi-Hung Wei 
> ---
>  ofproto/ofproto-dpif.c | 293
> +
>  ofproto/ofproto-dpif.h |  10 ++
>  ofproto/ofproto-provider.h |  10 ++
>  ofproto/ofproto.c  |  30 +
>  ofproto/ofproto.h  |   5 +
>  vswitchd/bridge.c  | 202 +++
>  6 files changed, 550 insertions(+)
>
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 751535249e21..3013d83e96a0 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -156,6 +156,25 @@ struct ofport_dpif {
>  size_t n_qdscp;
>  };
>
> +struct ct_timeout_policy {
> +int ref_count;  /* The number of ct zones that use this
> + * timeout policy. */
> +uint32_t tp_id; /* Timeout policy id in the datapath. */
> +struct simap tp;/* A map from timeout policy attribute to
> + * timeout value. */
> +struct hmap_node node;  /* Element in struct dpif_backer's
> "ct_tps"
> + * cmap. */
> +struct ovs_list list_node;  /* Element in struct dpif_backer's
> + * "ct_tp_kill_list" list. */
> +};
> +
> +struct ct_zone {
> +uint16_t zone_id;
> +struct ct_timeout_policy *ct_tp;
> +struct cmap_node node;  /* Element in struct dpif_backer's
> + * "ct_zones" cmap. */
> +};
> +
>  static odp_port_t ofp_port_to_odp_port(const struct ofproto_dpif *,
> ofp_port_t);
>
> @@ -196,6 +215,9 @@ static struct hmap all_ofproto_dpifs_by_uuid =
>
>  static bool ofproto_use_tnl_push_pop = true;
>  static void ofproto_unixctl_init(void);
> +static void ct_zone_config_init(struct dpif_backer *backer);
> +static void ct_zone_config_uninit(struct dpif_backer *backer);
> +static void ct_zone_timeout_policy_sweep(struct dpif_backer *backer);
>
>  static inline struct ofproto_dpif *
>  ofproto_dpif_cast(const struct ofproto *ofproto)
> @@ -488,6 +510,7 @@ type_run(const char *type)
>  }
>
>  process_dpif_port_changes(backer);
> +ct_zone_timeout_policy_sweep(backer);
>
>  return 0;
>  }
> @@ -683,6 +706,7 @@ close_dpif_backer(struct dpif_backer *backer, bool del)
>  }
>  dpif_close(backer->dpif);
>  id_pool_destroy(backer->meter_ids);
> +ct_zone_config_uninit(backer);
>  free(backer);
>  }
>
> @@ -694,6 +718,8 @@ struct odp_garbage {
>
>  static void check_support(struct dpif_backer *backer);
>
> +#define MAX_TIMEOUT_POLICY_ID UINT32_MAX
>

seems like random placement; could be moved where it is used.


> +
>  static int
>  open_dpif_backer(const char *type, struct dpif_backer **backerp)
>  {
> @@ -811,6 +837,8 @@ open_dpif_backer(const char *type, struct dpif_backer
> **backerp)
>  backer->meter_ids = NULL;
>  }
>
> +ct_zone_config_init(backer);
> +
>  /* Make a pristine snapshot of 'support' into 'boottime_support'.
>   * 'boottime_support' can be checked to prevent 'support' to be
> changed
>   * beyond the datapath capabilities. In case 'support' is changed by
> @@ -5086,6 +5114,269 @@ ct_flush(const struct ofproto *ofproto_, const
> uint16_t *zone)

Re: [ovs-dev] [PATCH v3 9/9] ofproto-dpif-xlate: Translate timeout policy in ct action

2019-08-13 Thread Darrell Ball
On Tue, Aug 13, 2019 at 2:33 PM Yi-Hung Wei  wrote:

> On Tue, Aug 13, 2019 at 11:43 AM Darrell Ball  wrote:
> > Sure, circling back to this part
> >
> > yep, it is the Linux In-tree kernel module rather than OVS tree module
> >
> > dball@ubuntu:~/ovs$ modinfo openvswitch
> > filename:       /lib/modules/5.0.0-23-generic/kernel/net/openvswitch/openvswitch.ko
> > alias:  net-pf-16-proto-16-family-ovs_ct_limit
> > alias:  net-pf-16-proto-16-family-ovs_meter
> > alias:  net-pf-16-proto-16-family-ovs_packet
> > alias:  net-pf-16-proto-16-family-ovs_flow
> > alias:  net-pf-16-proto-16-family-ovs_vport
> > alias:  net-pf-16-proto-16-family-ovs_datapath
> > license:GPL
> > description:Open vSwitch switching datapath
> > srcversion: 12850657561FB87D174A001
> > depends:        nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_nat_ipv6,nf_nat_ipv4,nf_defrag_ipv6,nsh
> > retpoline:  Y
> > intree: Y
> > name:   openvswitch
> > vermagic:   5.0.0-23-generic SMP mod_unload
> > signat: PKCS#7
> > signer:
> > sig_key:
> > sig_hashalgo:   md4
> >
> > btw, similarly, 'make check-kernel' fails for the same reasons.
> >
> > Ostensibly, I would have expected 5.0 to be ok.
> > I can dig more on this part later if you wish.
>
> The ct timeout feature was introduced in the 5.2 kernel, so 'make
> check-kernel' is expected to fail on a 5.0 kernel.  The upstream kernel
> support for the ct timeout feature is documented in
> "Documentation/faq/releases.rst" in patch 4.
>

sure, I had another version in mind for some reason


>
>
> > btw, I think a timeout policy not being applied should not result in
> > packet blackholing.
> > I think we need to make this better.
>
> Sure, we can definitely make it better. I am focusing on some other
> issues now, but I will have a follow-up patch that only translates the
> ct timeout attribute when the datapath supports it.
>

I had a brief look at the incremental, but probing for the support is the
standard approach.
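
A minimal sketch of what gating on a probe result could look like, so that a datapath without the feature still gets an installable action instead of a failing one. The names `struct dp_support`, `ct_timeout` and `format_ct_action()` are hypothetical and are not taken from ofproto-dpif; the action strings mirror the `ct(commit,zone=5,timeout=ovs_tp_0_icmp4)` form seen in the warnings in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cache of what the datapath was found to support when it was
 * probed at startup. */
struct dp_support {
    bool ct_timeout;
};

/* Formats a ct() commit action for 'zone' into 'buf'.  The timeout
 * attribute is added only when the probe succeeded and a policy name is
 * available; otherwise the action is emitted without it.  Returns the
 * snprintf() result. */
static int
format_ct_action(const struct dp_support *support, unsigned int zone,
                 const char *tp_name, char *buf, size_t size)
{
    if (support->ct_timeout && tp_name) {
        return snprintf(buf, size, "ct(commit,zone=%u,timeout=%s)",
                        zone, tp_name);
    }
    return snprintf(buf, size, "ct(commit,zone=%u)", zone);
}
```

On a kernel that lacks the feature, the connection is then still committed, just with the default timeouts.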



>
> Thanks,
>
> -Yi-Hung
>
>
> > A timeout policy is just a nice to have 'thingy' after all.
> >
> > That being said, I would like to see Xenial working (with OVS in-tree
> module) with higher priority.
> >
> > Thanks Darrell
>


Re: [ovs-dev] [PATCH v3 9/9] ofproto-dpif-xlate: Translate timeout policy in ct action

2019-08-13 Thread Darrell Ball
On Tue, Aug 13, 2019 at 11:01 AM Yi-Hung Wei  wrote:

> On Mon, Aug 12, 2019 at 7:35 PM Darrell Ball  wrote:
> >
> > Thanks for the patch
> >
> > Not a full review; I just did a quick run of the test using a more
> recent kernel version
> >
> > dball@ubuntu:~/ovs$ uname -r
> > 5.0.0-23-generic
> > dball@ubuntu:~/ovs$ lsb_release -a
> > No LSB modules are available.
> > Distributor ID: Ubuntu
> > Description: Ubuntu 18.04.3 LTS
> > Release: 18.04
> > Codename: bionic
> >
> > The test is no longer blocked on subsequent runs, at least with this
> kernel version (others: TBD) - cool !
> >
> > However
> >
> > ## --- ##
> > ## openvswitch 2.12.90 test suite. ##
> > ## --- ##
> >  75: conntrack - zone-based timeout policy   FAILED (
> system-traffic.at:3228)
> >
> > .
> > .
> > .
> > VSCTL_ADD_ZONE_TIMEOUT_POLICY([zone=5 udp_single=3 icmp_first=3])
> >
> > dnl Send ICMP and UDP traffic
> > NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl   <<<<<<<<<<<<<<<<<<<<< FAILS HERE
> > 3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > ])
> > .
> > .
> > .
> >
> > -3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> >
> > warnings:
> >
> > > 2019-08-13T02:19:06.674Z|1|dpif(handler1)|WARN|system@ovs-system:
> failed to put[create] (Invalid argument)
> ufid:55d8603a-729c-43d7-9612-b54553e46299
> recirc_id(0x2),dp_hash(0/0),skb_priority(0/0),in_port(2),skb_mark(0/0),ct_state(0x21/0x23),ct_zone(0x5/0),ct_mark(0/0),ct_label(0/0),ct_tuple4(src=
> 10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1/0,tp_src=8/0,tp_dst=0/0
> ),eth(src=8a:ea:c3:02:6f:94/00:00:00:00:00:00,dst=92:48:5b:47:e2:63/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=
> 10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no),icmp(type=8/0,code=0/0),
> actions:ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3
> > > 2019-08-13T02:19:06.674Z|2|dpif(handler1)|WARN|system@ovs-system:
> execute ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3 failed (Invalid
> argument) on packet
> icmp,vlan_tci=0x,dl_src=8a:ea:c3:02:6f:94,dl_dst=92:48:5b:47:e2:63,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
> icmp_csum:4d0a
> > >  with metadata
> skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x5),ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=1,tp_src=8,tp_dst=0),in_port(2)
> mtu 0
> > > 2019-08-13T02:19:06.999Z|3|dpif(handler1)|WARN|system@ovs-system:
> failed to put[create] (Invalid argument)
> ufid:55d8603a-729c-43d7-9612-b54553e46299
> recirc_id(0x2),dp_hash(0/0),skb_priority(0/0),in_port(2),skb_mark(0/0),ct_state(0x21/0x23),ct_zone(0x5/0),ct_mark(0/0),ct_label(0/0),ct_tuple4(src=
> 10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1/0,tp_src=8/0,tp_dst=0/0
> ),eth(src=8a:ea:c3:02:6f:94/00:00:00:00:00:00,dst=92:48:5b:47:e2:63/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=
> 10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no),icmp(type=8/0,code=0/0),
> actions:ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3
> > > 2019-08-13T02:19:06.999Z|4|dpif(handler1)|WARN|system@ovs-system:
> execute ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3 failed (Invalid
> argument) on packet
> icmp,vlan_tci=0x,dl_src=8a:ea:c3:02:6f:94,dl_dst=92:48:5b:47:e2:63,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
> icmp_csum:2f10
> > >  with metadata
> skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x5),ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=1,tp_src=8,tp_dst=0),in_port(2)
> mtu 0
>
> Thanks for trying this test out on the other setup.
>
> The warning messages indicate that the kernel module does not
> understand the new added ct timeout action attribute.  I am wondering
> if the system used the correct kernel module?  Can you check 'modinfo
> openvswitch' and 'dmesg' to make sure the correct kernel module is
> loaded in the system?
>
> Thanks,
>
> -Yi-Hung
>

Sure, circling back to this part

yep, it is the Linux In-tree kernel module rather than OVS tree module

dball@ubuntu:~/ovs$ modinfo openvswitch
filename:       /lib/modules/5.0.0-23-generic/kernel/net/openvswitch/openvswitch.ko
alias:  net-pf-16-proto-16-family-ovs_ct_limit
alias:  net-pf-16-proto-16-family-ovs_meter
alias:  net-pf-16-proto-16-family-ovs_packet
alias:  net-pf-16-proto-16-family-ovs_flow
alias:  net-pf-16-prot
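
For anyone scripting this check, a minimal sketch in Python follows. It is a heuristic only: it assumes the in-tree module sits under kernel/net/openvswitch/ (as in the modinfo output above) and treats any other location as "not in-tree". The function names are made up for illustration.

```python
def parse_modinfo_filename(modinfo_output: str) -> str:
    """Extract the module path from `modinfo` output, tolerating a wrapped line."""
    tokens = modinfo_output.split()
    # The path is the first token after the "filename:" label.
    return tokens[tokens.index("filename:") + 1]


def is_in_tree_module(filename: str) -> bool:
    """Heuristic: the in-tree openvswitch.ko lives under kernel/net/openvswitch/."""
    return "/kernel/net/openvswitch/" in filename
```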

Re: [ovs-dev] [PATCH v3 2/9] ovs-vsctl: Add conntrack zone commands.

2019-08-12 Thread Darrell Ball
Thanks for the patch

Thanks for the fixups; mostly minor comments inline.

On Mon, Aug 12, 2019 at 5:53 PM Yi-Hung Wei  wrote:

> From: William Tu 
>
> The patch adds commands creating/deleting/listing conntrack zone
> timeout policies:
>   $ ovs-vsctl {add,del,list}-zone-tp dp zone=zone_id ...
>
> Signed-off-by: William Tu 
> ---
>  tests/ovs-vsctl.at   |  34 -
>  utilities/ovs-vsctl.8.in |  26 +++
>  utilities/ovs-vsctl.c| 194
> +++
>  3 files changed, 252 insertions(+), 2 deletions(-)
>
> diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
> index 46fa3c5b1a33..df15fb6901a0 100644
> --- a/tests/ovs-vsctl.at
> +++ b/tests/ovs-vsctl.at
> @@ -805,6 +805,20 @@ AT_CHECK(
>[RUN_OVS_VSCTL([--if-exists remove netflow x targets '"1.2.3.4:567
> "'])])
>  AT_CHECK(
>[RUN_OVS_VSCTL([--if-exists clear netflow x targets])])
> +
> +AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath datapath_version=0 --
> set Open_vSwitch . datapaths:"netdev"=@m])], [0], [stdout])
> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=1 icmp_first=1
> icmp_reply=2])])
> +AT_CHECK([RUN_OVS_VSCTL([--may-exist add-zone-tp netdev zone=1
> icmp_first=1 icmp_reply=2])])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
> Policies: icmp_first=1 icmp_reply=2
> +])
> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
> icmp_reply=3])])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
> Policies: icmp_first=1 icmp_reply=2
> +Zone:2, Timeout Policies: icmp_first=2 icmp_reply=3
> +])
> +AT_CHECK([RUN_OVS_VSCTL([del-zone-tp netdev zone=1])])
> +AT_CHECK([RUN_OVS_VSCTL([--if-exists del-zone-tp netdev zone=1])])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:2, Timeout
> Policies: icmp_first=2 icmp_reply=3
> +])
>  OVS_VSCTL_CLEANUP
>  AT_CLEANUP
>
> @@ -890,10 +904,10 @@ AT_CHECK([RUN_OVS_VSCTL([set bridge br0
> flood_vlans=-1])],
>  AT_CHECK([RUN_OVS_VSCTL([set bridge br0 flood_vlans=4096])],
>[1], [], [ovs-vsctl: constraint violation: 4096 is not in the valid
> range 0 to 4095 (inclusive)
>  ])
> -AT_CHECK([RUN_OVS_VSCTL([set c br1 'connection-mode=xyz'])],
> +AT_CHECK([RUN_OVS_VSCTL([set controller br1 'connection-mode=xyz'])],
>[1], [], [[ovs-vsctl: constraint violation: xyz is not one of the
> allowed values ([in-band, out-of-band])
>  ]])
> -AT_CHECK([RUN_OVS_VSCTL([set c br1 connection-mode:x=y])],
> +AT_CHECK([RUN_OVS_VSCTL([set controller br1 connection-mode:x=y])],
>[1], [], [ovs-vsctl: cannot specify key to set for non-map column
> connection_mode
>  ])
>  AT_CHECK([RUN_OVS_VSCTL([add bridge br1 datapath_id x y])],
> @@ -929,6 +943,22 @@ AT_CHECK([RUN_OVS_VSCTL([remove bridge br1
> flood-vlans true])],
>  AT_CHECK([RUN_OVS_VSCTL([clear bridge br1 name])],
>[1], [], [ovs-vsctl: cannot modify read-only column name in table Bridge
>  ])
> +
> +AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath datapath_version=0 --
> set Open_vSwitch . datapaths:"netdev"=@m])], [0], [stdout])
>

If I execute

AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath datapath_version=0 --
set Open_vSwitch . datapaths:"netdevvv"=@m])], [0], [stdout])

it works, but there is no such datapath type 'netdevvv'

I think it would be better to enforce an enum here as well, through the schema,
as I mentioned in V2, since this handles errors better.
This is the same idea as the enum enforced for timeout keys in V3
to block bad timeout keys like "foo_bar=3".
I commented on this in patch 1 as well.
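
As a rough illustration of what the enum would buy us, here is a small Python sketch. It is not OVSDB code; `ALLOWED_DATAPATH_TYPES` and `check_datapath_key` are hypothetical, and the allowed set simply assumes the two types discussed here ('system' and 'netdev'). The error text imitates the constraint-violation messages already seen in the ovs-vsctl tests.

```python
# Hypothetical allowed set; the thread suggests 'system' and 'netdev'.
ALLOWED_DATAPATH_TYPES = frozenset({"system", "netdev"})


def check_datapath_key(key: str) -> None:
    """Raise ValueError for an unknown datapath type, like an enum constraint."""
    if key not in ALLOWED_DATAPATH_TYPES:
        allowed = ", ".join(sorted(ALLOWED_DATAPATH_TYPES))
        raise ValueError(
            f"constraint violation: {key} is not one of the allowed "
            f"values ([{allowed}])"
        )
```

With a check like this, a key such as "netdevvv" is rejected at insertion time instead of being silently stored.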


> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdevxx zone=1 icmp_first=1
> icmp_reply=2])],
> +  [1], [], [ovs-vsctl: datapath netdevxx does not exist
> +])
> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
> icmp_reply=3])])
> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
> icmp_reply=3])],
> +  [1], [], [ovs-vsctl: zone id 2 already exists
> +])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:2, Timeout
> Policies: icmp_first=2 icmp_reply=3
> +])
> +AT_CHECK([RUN_OVS_VSCTL([del-zone-tp netdev zone=11])],
> +  [1], [], [ovs-vsctl: zone id 11 does not exist
> +])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:2, Timeout
> Policies: icmp_first=2 icmp_reply=3
> +])
>  OVS_VSCTL_CLEANUP
>  AT_CLEANUP
>
> diff --git a/utilities/ovs-vsctl.8.in b/utilities/ovs-vsctl.8.in
> index 7c09df79bd29..5b9883ae1c3d 100644
> --- a/utilities/ovs-vsctl.8.in
> +++ b/utilities/ovs-vsctl.8.in
> @@ -353,6 +353,32 @@ list.
>  Prints the name of the bridge that contains \fIiface\fR on standard
>  output.
>  .
> +.SS "Conntrack Zone Commands"
> +These commands query and modify datapath CT zones and Timeout Policies.
> +.
> +.IP "[\fB\-\-may\-exist\fR] \fBadd\-zone\-tp \fIdatapath
> \fBzone=\fIzone_id \fIpolicies\fR"
> +Creates a conntrack zone timeout policy with \fIzone_id\fR in
> +\fIdatapath\fR.  The \fIpolicies\fR consist of \fIkey\fB=\fIvalue\fR
> +pairs, separated by 

Re: [ovs-dev] [PATCH v3 1/9] ovs-vswitchd: Add Datapath, CT_Zone, and CT_Zone_Policy tables.

2019-08-12 Thread Darrell Ball
Thanks for the patch

On Mon, Aug 12, 2019 at 5:52 PM Yi-Hung Wei  wrote:

> From: Justin Pettit 
>
> Signed-off-by: Justin Pettit 
> Signed-off-by: Yi-Hung Wei 
> Co-authored-by: Yi-Hung Wei 
> ---
>  vswitchd/vswitch.ovsschema |  51 -
>  vswitchd/vswitch.xml   | 275
> +
>  2 files changed, 277 insertions(+), 49 deletions(-)
>
> diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
> index f7c6eb8983cd..c0a2242ad345 100644
> --- a/vswitchd/vswitch.ovsschema
> +++ b/vswitchd/vswitch.ovsschema
> @@ -1,9 +1,14 @@
>  {"name": "Open_vSwitch",
> - "version": "8.0.0",
> - "cksum": "3962141869 23978",
> + "version": "8.1.0",
> + "cksum": "1635647160 26090",
>   "tables": {
> "Open_vSwitch": {
>   "columns": {
> +   "datapaths": {
> + "type": {"key": {"type": "string"},
>

I had a minor comment in V2 about using an enum here for key - 'system' or
'netdev'
Does it work or is there worry that other datapath types will likely develop
and we will need to update the enum ?


> +  "value": {"type": "uuid",
> +"refTable": "Datapath"},
> +  "min": 0, "max": "unlimited"}},

>     "bridges": {
>   "type": {"key": {"type": "uuid",
>"refTable": "Bridge"},
> @@ -629,6 +634,48 @@
>"min": 0, "max": "unlimited"},
>   "ephemeral": true}},
>   "indexes": [["target"]]},
> +   "Datapath": {
> + "columns": {
> +   "datapath_version": {
> + "type": "string"},
> +   "ct_zones": {
> + "type": {"key": {"type": "integer",
> +  "minInteger": 0,
> +  "maxInteger": 65535},
> +  "value": {"type": "uuid",
> +"refTable": "CT_Zone"},
> +  "min": 0, "max": "unlimited"}},
>

minor comment from V2; I think
+  "min": 0, "max": "unlimited"}},
should be
+  "min": 0, "max": "65536"}},



> +   "external_ids": {
> + "type": {"key": "string", "value": "string",
> +  "min": 0, "max": "unlimited",
> +   "CT_Zone": {
> + "columns": {
> +   "timeout_policy": {
> + "type": {"key": {"type": "uuid",
> +  "refTable": "CT_Timeout_Policy"},
> +  "min": 0, "max": 1}},
> +   "external_ids": {
> + "type": {"key": "string", "value": "string",
> +  "min": 0, "max": "unlimited",
> +   "CT_Timeout_Policy": {
> + "columns": {
> +   "timeouts": {
> + "type": {"key": {"type" : "string",
> +  "enum": ["set", ["tcp_syn_sent", "tcp_syn_recv",
> +   "tcp_established",
> "tcp_fin_wait",
> +   "tcp_close_wait",
> "tcp_last_ack",
> +   "tcp_time_wait", "tcp_close",
> +   "tcp_syn_sent2",
> "tcp_retransmit",
> +   "tcp_unack", "udp_first",
> +   "udp_single", "udp_multiple",
> +   "icmp_first", "icmp_reply"]]},
> +  "value": {"type" : "integer",
> +"minInteger" : 0,
> +"maxInteger" : 4294967295},
> +  "min": 0, "max": "unlimited"}},
>

Should it be ?
+  "min": 0, "max": "16"}},

or will this create upgrade issues ?
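
For reference, the two bounds I am suggesting (65536 for ct_zones, 16 here) follow directly from the key constraints in the quoted schema. A quick Python check, with the key list copied from the enum above:

```python
# Timeout keys copied from the "enum" in the quoted CT_Timeout_Policy schema.
TIMEOUT_KEYS = (
    "tcp_syn_sent", "tcp_syn_recv", "tcp_established", "tcp_fin_wait",
    "tcp_close_wait", "tcp_last_ack", "tcp_time_wait", "tcp_close",
    "tcp_syn_sent2", "tcp_retransmit", "tcp_unack",
    "udp_first", "udp_single", "udp_multiple",
    "icmp_first", "icmp_reply",
)

# Zone key range from the quoted ct_zones column (minInteger/maxInteger).
ZONE_MIN, ZONE_MAX = 0, 65535

# A map cannot hold more entries than it has distinct keys.
max_timeouts = len(set(TIMEOUT_KEYS))
max_zones = ZONE_MAX - ZONE_MIN + 1
```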



> +   "external_ids": {
> + "type": {"key": "string", "value": "string",
> +  "min": 0, "max": "unlimited",
> "SSL": {
>   "columns": {
> "private_key": {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 027aee2f523b..495f0acad842 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -52,6 +52,13 @@
>  one record in the  table.
>
>  
> +  
> +Map of datapath types to datapaths.  The
> + column of the 
> +table is used as a key for this map.  The value points to a row in
> +the  table.
> +  
> +
>
>  Set of bridges managed by the daemon.
>
> @@ -1192,53 +1199,11 @@
>
>
>
> -
> -  Reports the version number of the Open vSwitch datapath in use.
> -  This allows management software to detect and report
> discrepancies
> -  between Open vSwitch userspace and datapath versions.  (The <ref
> -  column="ovs_version" table="Open_vSwitch"/> column in the <ref
> -  table="Open_vSwitch"/> reports the Open vSwitch userspace
> version.)
> -  The version reported depends on the datapath in use:
> -
> -
> -
> -  
> -When the kernel module included in the Open vSwitch source
> tree is
> -used, this column reports 

Re: [ovs-dev] [PATCH v3 9/9] ofproto-dpif-xlate: Translate timeout policy in ct action

2019-08-12 Thread Darrell Ball
Thanks for the patch

Not a full review; I just did a quick run of the test using a more recent
kernel version

dball@ubuntu:~/ovs$ uname -r
5.0.0-23-generic
dball@ubuntu:~/ovs$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic

The test is no longer blocked on subsequent runs, at least with this kernel
version (others: TBD) - cool !

However

## --- ##
## openvswitch 2.12.90 test suite. ##
## --- ##
 75: conntrack - zone-based timeout policy   FAILED (
system-traffic.at:3228)

.
.
.
VSCTL_ADD_ZONE_TIMEOUT_POLICY([zone=5 udp_single=3 icmp_first=3])

dnl Send ICMP and UDP traffic
NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING],
[0], [dnl   <<<<<<<<<<<<<<<<<<<<< FAILS HERE
3 packets transmitted, 3 received, 0% packet loss, time 0ms
])
.
.
.

-3 packets transmitted, 3 received, 0% packet loss, time 0ms
+7 packets transmitted, 0 received, 100% packet loss, time 0ms

warnings:

> 2019-08-13T02:19:06.674Z|1|dpif(handler1)|WARN|system@ovs-system:
failed to put[create] (Invalid argument)
ufid:55d8603a-729c-43d7-9612-b54553e46299
recirc_id(0x2),dp_hash(0/0),skb_priority(0/0),in_port(2),skb_mark(0/0),ct_state(0x21/0x23),ct_zone(0x5/0),ct_mark(0/0),ct_label(0/0),ct_tuple4(src=
10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1/0,tp_src=8/0,tp_dst=0/0
),eth(src=8a:ea:c3:02:6f:94/00:00:00:00:00:00,dst=92:48:5b:47:e2:63/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=
10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no),icmp(type=8/0,code=0/0),
actions:ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3
> 2019-08-13T02:19:06.674Z|2|dpif(handler1)|WARN|system@ovs-system:
execute ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3 failed (Invalid
argument) on packet
icmp,vlan_tci=0x,dl_src=8a:ea:c3:02:6f:94,dl_dst=92:48:5b:47:e2:63,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
icmp_csum:4d0a
>  with metadata
skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x5),ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=1,tp_src=8,tp_dst=0),in_port(2)
mtu 0
> 2019-08-13T02:19:06.999Z|3|dpif(handler1)|WARN|system@ovs-system:
failed to put[create] (Invalid argument)
ufid:55d8603a-729c-43d7-9612-b54553e46299
recirc_id(0x2),dp_hash(0/0),skb_priority(0/0),in_port(2),skb_mark(0/0),ct_state(0x21/0x23),ct_zone(0x5/0),ct_mark(0/0),ct_label(0/0),ct_tuple4(src=
10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1/0,tp_src=8/0,tp_dst=0/0
),eth(src=8a:ea:c3:02:6f:94/00:00:00:00:00:00,dst=92:48:5b:47:e2:63/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=
10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no),icmp(type=8/0,code=0/0),
actions:ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3
> 2019-08-13T02:19:06.999Z|4|dpif(handler1)|WARN|system@ovs-system:
execute ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3 failed (Invalid
argument) on packet
icmp,vlan_tci=0x,dl_src=8a:ea:c3:02:6f:94,dl_dst=92:48:5b:47:e2:63,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
icmp_csum:2f10
>  with metadata
skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x5),ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=1,tp_src=8,tp_dst=0),in_port(2)
mtu 0
> 2019-08-13T02:19:07.319Z|5|dpif(handler1)|WARN|system@ovs-system:
failed to put[create] (Invalid argument)
ufid:55d8603a-729c-43d7-9612-b54553e46299
recirc_id(0x2),dp_hash(0/0),skb_priority(0/0),in_port(2),skb_mark(0/0),ct_state(0x21/0x23),ct_zone(0x5/0),ct_mark(0/0),ct_label(0/0),ct_tuple4(src=
10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1/0,tp_src=8/0,tp_dst=0/0
),eth(src=8a:ea:c3:02:6f:94/00:00:00:00:00:00,dst=92:48:5b:47:e2:63/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=
10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no),icmp(type=8/0,code=0/0),
actions:ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3
> 2019-08-13T02:19:07.320Z|6|dpif(handler1)|WARN|system@ovs-system:
execute ct(commit,zone=5,timeout=ovs_tp_0_icmp4),3 failed (Invalid
argument) on packet
icmp,vlan_tci=0x,dl_src=8a:ea:c3:02:6f:94,dl_dst=92:48:5b:47:e2:63,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
icmp_csum:906c
>  with metadata
skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x5),ct_tuple4(src=10.1.1.1,dst=10.1.1.2,proto=1,tp_src=8,tp_dst=0),in_port(2)
mtu 0
> 2019-08-13T02:19:07.639Z|7|dpif(handler1)|WARN|system@ovs-system:
failed to put[create] (Invalid argument)
ufid:55d8603a-729c-43d7-9612-b54553e46299
recirc_id(0x2),dp_hash(0/0),skb_priority(0/0),in_port(2),skb_mark(0/0),ct_state(0x21/0x23),ct_zone(0x5/0),ct_mark(0/0),ct_label(0/0),ct_tuple4(src=
10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1/0,tp_src=8/0,tp_dst=0/0
),eth(src=8a:ea:c3:02:6f:94/00:00:00:00:00:00,dst=92:48:5b:47:e2:63/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=
10.1.1.1/0.0.0.0,dst=10.1.1.2/0.0.0.0,proto=1,tos=0/0,ttl=64/0,frag=no),icmp(type=8/0,code=0/0),

Re: [ovs-dev] [PATCH v3 0/9] Support zone-based conntrack timeout policy

2019-08-12 Thread Darrell Ball
Thanks for the patches

On Mon, Aug 12, 2019 at 5:51 PM Yi-Hung Wei  wrote:

> This patch series enables zone-based conntrack timeout policy support in
> OVS.
> Timeout policy is a set of timeout attributes that can be associated with a
> connection when it is committed.  Then, the connection tracking system will
> expire a connection based on its connection state.  For example, one use
> case would be to extend the timeout of TCP connection in the established
> state to avoid re-connect overhead. Another use case may be to shorten the
> connection timeout so that the system can reclaim resources faster.
> The idea of zone-based conntrack timeout policy is to group connections
> with similar characteristics in a conntrack zone, and assign timeout policy
> to the conntrack zone.  In this way, all the connections in that zone will
> share
> the same timeout policy.
>
> For zone-based timeout policy configuration, the association of conntrack
> zone and conntrack timeout policy is defined per datapath in vswitch ovsdb
> schema.  User can program the database through ovs-vsctl or using ovsdb
> protocol directly.  Once the zone-based timeout policy configuration is
> in the database, vswitchd will read those configuration and organize it
> in internal datapath structure, and push the timeout policy into datapath.
> Currently, only the kernel datapath supports customized timeout policy.
>
> When a packet is committed to connection tracking system, during flow
> translation in ofproto-dpif-xlate, vswitchd will lookup the internal
> data structure to figure out which timeout policy to associate with
> the connection.





> If timeout policy is not specified to the committed
> zone, it defaults to the timeout policy in the default zone (zone 0).
>

The above is no longer true since we removed the default zone code.
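
To spell out the difference, here is a small Python sketch of the lookup as I understand it. It is only a model: `lookup_timeout_policy` and the `fallback_to_zone0` switch are hypothetical, with the switch standing in for the earlier default-zone behaviour that was removed. Returning None means "use the system default timeouts".

```python
from typing import Dict, Optional


def lookup_timeout_policy(zone: int, zone_policies: Dict[int, str],
                          fallback_to_zone0: bool = False) -> Optional[str]:
    """Return the policy name for a zone, or None for system defaults."""
    policy = zone_policies.get(zone)
    # Earlier behaviour described in the cover letter: fall back to zone 0.
    if policy is None and fallback_to_zone0:
        policy = zone_policies.get(0)
    return policy
```

With the default-zone code gone, only the first lookup applies, and a zone without its own policy goes straight to the system defaults.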



> If the timeout policy is not specified in the default zone, it defaults
> to the system default timeouts.
>
> Here are some more details about each patch
> * p01: Introduce ovsdb schema for ct timeout policy.
> * p02: ovs-vsctl commands to configure zone-based ct timeout policy.
> * p03: Expose a utility function.
> * p04: dpif interface along with dpif-netlink implementation to support
>ct timeout policy.
> * p05: Consume ct timeout policy configuration from ovsdb server,
>keep it in internal data structure, and push configuration to
>datapath.
> * p06: Add utility function to help compare two simaps.
> * p07-08: Kernel datapath support for the new ct action attribute.
> * p09: Translate timeout policy in ofproto-dpif-xlate and system traffic
> test.
>
> v2->v3
> * ovsdb schema
> - Fold in changes from Justin.
> - Make ct timeout policy key to be in a pre-defined set.
>

pre-defined set helps address a bug, I assume


> * ovs-vsctl
> - Bug fixes.
> * ct-dpif
> - Fold in diff suggestion from Justin.
> * bridge, ofproto-dpif
> - Restructure the ofproto and dpif layer support for zone based timeout
>   policy.
> * system traffic test
> - Fix bug reported by Darrell.
>

which one ?



> * Address review comments from Justin and Darrell.
>

The above history is a bit incomplete and vague, given the number of comments
addressed. It makes it hard to track which comments were addressed or missed,
because the reviewer needs to go back to previous versions and correlate. If we
do another version, please try to expand on the review comments addressed in
each version.



>
> v1->v2
>
> * ovs-vsctl
> - Remove add-dp,del-dp,list-dp ovs-vsctl commands.
> - Add --may-exist and --if-exists to ovs-vsctl add-zone-tp command.
> - Improve ovs-vsctl test.
> * ct-dpif, dpif-netlink
> - Remove support to change default timeout policy in the datapath.
> - Squash ct-dpif and dpif-netlink layer implementation altogether.
> - Address review comments from William.
> * ofproto-dpif
> - Remove changes from datapath-config module to ofproto-dpif layer.
> - Maintain zone-based timeout policy in dpif-backer since this is
>   per datapath type configuration. This will not break the OVS
>   hierarchy as Ilya suggested.
> - Allocate timeout policy id using id_pool instead of idl_seqno
>   as Darrell suggested.
> - Add a timeout policy sweep function that clean up unnecessary
>   timeout policy regularly in the datapath.
> * ofproto-dpif-xlate
> - Only translate ct timeout policy if it is a ct commit action
>   in kernel datapath.
> * system-traffic test
> - Update system traffic test with low level ovs-vsctl command.
> - Make system traffic test to be datapath type agnostic.
> - Improve system traffic test as Darrell suggested.
> * Rebase to master
>
>
> Ben Pfaff (1):
>   simap: Add utility function to help compare two simaps.
>
> Justin Pettit (1):
>   ovs-vswitchd: Add Datapath, CT_Zone, and CT_Zone_Policy tables.
>
> William Tu (1):
>   ovs-vsctl: Add conntrack zone commands.
>
> Yi-Hung Wei (6):
>   ct-dpif: Export 

Re: [ovs-dev] [PATCH v2 9/9] system-traffic: Add zone-based conntrack timeout policy test

2019-08-12 Thread Darrell Ball
On Mon, Aug 12, 2019 at 5:15 PM Yi-Hung Wei  wrote:

> On Sun, Aug 11, 2019 at 12:30 PM Darrell Ball  wrote:
> >
> > I did some further testing and ran into another issue; in this case,
> one, I did not expect.
> >
> > I added an additional sending of packets at the end of the test after
> this check:
> >
> > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> > ])
> >
> > Below is new code
> >
> > dnl Do it again
> > dnl Send ICMP and UDP traffic
> > NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > 3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > ])
> > AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=5054000a505400090800451c0011a4cd0a0101010a010102000100020008
> actions=resubmit(,0)"])
> >
> > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sort],
> [0], [dnl
> >
> icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5
> >
> udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5
> > ])
> >
> > dnl Wait until the timeout expire.
> > dnl We intend to wait a bit longer, because conntrack does not recycle
> the entry right after it is expired.
> > sleep 5
> >
> > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> > ])
> >
> > The test fails because, the second time with short timeouts, the conntrack
> entries are not cleaned up quickly
> >
> > @@ -0,0 +1,2 @@
> >
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5
> >
> >
> >
> > On Tue, Aug 6, 2019 at 12:16 PM Darrell Ball  wrote:
> >>
> >>
> >>
> >> On Tue, Aug 6, 2019 at 11:07 AM Yi-Hung Wei 
> wrote:
> >>>
> >>> On Tue, Aug 6, 2019 at 10:21 AM Darrell Ball  wrote:
> >>> >
> >>> >
> >>> > I did some more testing and found a similar problem as in V1.
> >>> >
> >>> > This test can be run successfully once and then fails after that.
> >>> > Maybe you want to look into that. It is probably related to:
> >>> >
> >>> > dball@ubuntu:~/openvswitch/ovs$ lsmod | grep nf
> >>> > .
> >>> > nfnetlink_cttimeout16384  1
> >>> > .
> >>> >
> >>> > Darrell
> >>> >
> >>>
> >>> Thanks for trying out the test.  I can not reproduce the issue that
> >>> you mentioned on my local VM.
> >>>
> >>> Can you provide your kernel version and system-kmod-testsuite.log?
> >>>
> >>> Thanks,
> >>>
> >>> -Yi-Hung
> >>
> >>
> >>
> >> Here it is:
> >>
> >> dball@ubuntu:~/ovs$ uname -a
> >> Linux ubuntu 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC
> 2018 x86_64 x86_64 x86_64 GNU/Linux
> >>
> Thanks for reporting the issue.  I am able to reproduce in similar set
> up.  It should be resolved in v3.
>

What is the fix you want to use for this bug? I assume it is different from
the fix for the second bug, for which you proposed a patch in another
response?

Thanks Darrell


>
> Thanks,
>
> -Yi-Hung
>


Re: [ovs-dev] [PATCH v2 9/9] system-traffic: Add zone-based conntrack timeout policy test

2019-08-12 Thread Darrell Ball
On Mon, Aug 12, 2019 at 5:22 PM Yi-Hung Wei  wrote:

> On Sun, Aug 11, 2019 at 12:30 PM Darrell Ball  wrote:
> >
> > I did some further testing and ran into another issue; in this case,
> one, I did not expect.
> >
> > I added an additional sending of packets at the end of the test after
> this check:
> >
> > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> > ])
> >
> > Below is new code
> >
> > dnl Do it again
> > dnl Send ICMP and UDP traffic
> > NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > 3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > ])
> > AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=5054000a505400090800451c0011a4cd0a0101010a010102000100020008
> actions=resubmit(,0)"])
> >
> > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sort],
> [0], [dnl
> >
> icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5
> >
> udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5
> > ])
> >
> > dnl Wait until the timeout expire.
> > dnl We intend to wait a bit longer, because conntrack does not recycle
> the entry right after it is expired.
> > sleep 5
> >
> > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> > ])
> >
> > The test fails because, the second time with short timeouts, the conntrack
> entries are not cleaned up quickly
> >
> > @@ -0,0 +1,2 @@
> >
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5
>
>
> Thanks for testing!   This test actually catch a kernel bug when ovs
> kernel handles conntrack cache.  It works for me on my ubuntu xenial
> VM with 4.4 kernel.
>
> Since this requires upstream kernel change, it will be backported to
> OVS once the fix gets upstream.
>
> Thanks,
>
> -Yi-Hung
>


Does the patch below fix only the timeout policy not being applied when packets
are sent a second and subsequent time, or does it also fix the test failing on
runs after the first one?

If the subsequent failed test runs are a different issue, what fix do you want
to use for it?



>
> diff --git a/datapath/conntrack.c b/datapath/conntrack.c
> index f85d0a2572f6..ad48b559bcde 100644
> --- a/datapath/conntrack.c
> +++ b/datapath/conntrack.c
> @@ -76,6 +76,7 @@ enum ovs_ct_nat {
>  /* Conntrack action context for execution. */
>  struct ovs_conntrack_info {
> struct nf_conntrack_helper *helper;
> +   struct nf_ct_timeout *nf_ct_timeout;
> struct nf_conntrack_zone zone;
> struct nf_conn *ct;
> u8 commit : 1;
> @@ -745,6 +746,13 @@ static bool skb_nfct_cached(struct net *net,
> if (help && rcu_access_pointer(help->helper) !=
> info->helper)
> return false;
> }
> +   if (info->nf_ct_timeout) {
> +   struct nf_conn_timeout *timeout_ext;
> +
> +   timeout_ext = nf_ct_timeout_find(ct);
> +   if (!timeout_ext || info->nf_ct_timeout !=
> timeout_ext->timeout)
> +   return false;
> +   }
> /* Force conntrack entry direction to the current packet? */
> if (info->force && CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
> /* Delete the conntrack entry if confirmed, else just
> release
> @@ -1704,6 +1712,8 @@ int ovs_ct_copy_action(struct net *net, const
> struct nlattr *attr,
>   ct_info.timeout))
> pr_info_ratelimited("Failed to associated timeout "
> "policy `%s'\n",
> ct_info.timeout);
> +   else
> +   ct_info.nf_ct_timeout =
> nf_ct_timeout_find(ct_info.ct)->timeout;
> }
>
> if (helper) {
>
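
As I read the quoted patch, the cached conntrack entry is only reused when its timeout policy matches the one requested by the action. A simplified Python model of that check follows. It is not the kernel code: `CachedConn`, `CtActionInfo`, and `cached_entry_usable` are hypothetical stand-ins, and only the zone and timeout comparisons are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedConn:
    """Stand-in for the conntrack entry already attached to a packet."""
    zone: int
    timeout: Optional[str]  # timeout policy attached to the entry, if any


@dataclass
class CtActionInfo:
    """Stand-in for the parameters of the ct action being executed."""
    zone: int
    timeout: Optional[str]  # timeout policy requested by the action, if any


def cached_entry_usable(conn: Optional[CachedConn], info: CtActionInfo) -> bool:
    """Return True if the cached entry can be reused for this ct action."""
    if conn is None or conn.zone != info.zone:
        return False
    # Mirrors the added check: when the action requests a timeout policy,
    # the cached entry must carry that same policy, otherwise look up again.
    if info.timeout is not None and conn.timeout != info.timeout:
        return False
    return True
```

An entry cached without any timeout policy is therefore not reused by an action that requests one.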


Re: [ovs-dev] [PATCH v2 9/9] system-traffic: Add zone-based conntrack timeout policy test

2019-08-11 Thread Darrell Ball
I did some further testing and ran into another issue; in this case, one I
did not expect.

I added an additional sending of packets at the end of the test after this
check:

AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
])

Below is new code

dnl Do it again
dnl Send ICMP and UDP traffic
NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING],
[0], [dnl
3 packets transmitted, 3 received, 0% packet loss, time 0ms
])
AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
packet=5054000a505400090800451c0011a4cd0a0101010a010102000100020008
actions=resubmit(,0)"])

AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sort],
[0], [dnl
icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5
udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5
])

dnl Wait until the timeout expire.
dnl We intend to wait a bit longer, because conntrack does not recycle the
entry right after it is expired.
sleep 5

AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
])

The test fails because, the second time with short timeouts, the conntrack
entries are not cleaned up quickly

@@ -0,0 +1,2 @@
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0),zone=5
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=5



On Tue, Aug 6, 2019 at 12:16 PM Darrell Ball  wrote:

>
>
> On Tue, Aug 6, 2019 at 11:07 AM Yi-Hung Wei  wrote:
>
>> On Tue, Aug 6, 2019 at 10:21 AM Darrell Ball  wrote:
>> >
>> >
>> > I did some more testing and found a similar problem as in V1.
>> >
>> > This test can be run successfully once and then fails after that.
>> > Maybe you want to look into that. It is probably related to:
>> >
>> > dball@ubuntu:~/openvswitch/ovs$ lsmod | grep nf
>> > .
>> > nfnetlink_cttimeout16384  1
>> > .
>> >
>> > Darrell
>> >
>>
>> Thanks for trying out the test.  I can not reproduce the issue that
>> you mentioned on my local VM.
>>
>> Can you provide your kernel version and system-kmod-testsuite.log?
>>
>> Thanks,
>>
>> -Yi-Hung
>>
>
>
> Here it is:
>
> dball@ubuntu:~/ovs$ uname -a
> Linux ubuntu 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018
> x86_64 x86_64 x86_64 GNU/Linux
>
> dball@ubuntu:~/ovs$ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 16.04.2 LTS
> Release: 16.04
> Codename: xenial
>
> # -*- compilation -*-
> 75. system-traffic.at:3182: testing conntrack - zone-based timeout policy
> ...
> net.netfilter.nf_conntrack_helper = 0
> ../../tests/system-traffic.at:3185: modprobe openvswitch
> ../../tests/system-traffic.at:3185: ovsdb-tool create conf.db
> $abs_top_srcdir/vswitchd/vswitch.ovsschema
> ../../tests/system-traffic.at:3185: ovsdb-server --detach --no-chdir
> --pidfile --log-file --remote=punix:$OVS_RUNDIR/db.sock
> stderr:
> 2019-08-06T19:11:47Z|1|vlog|INFO|opened log file
> /home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/ovsdb-server.log
> ../../tests/system-traffic.at:3185: sed < stderr '
> /vlog|INFO|opened log file/d
> /ovsdb_server|INFO|ovsdb-server (Open vSwitch)/d'
> ../../tests/system-traffic.at:3185: ovs-vsctl --no-wait init
> ../../tests/system-traffic.at:3185: ovs-vswitchd  --detach --no-chdir
> --pidfile --log-file -vvconn -vofproto_dpif -vunixctl
> stderr:
> 2019-08-06T19:11:47Z|1|vlog|INFO|opened log file
> /home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/ovs-vswitchd.log
> 2019-08-06T19:11:47Z|2|ovs_numa|INFO|Discovered 1 CPU cores on NUMA
> node 0
> 2019-08-06T19:11:47Z|3|ovs_numa|INFO|Discovered 1 NUMA nodes and 1 CPU
> cores
> 2019-08-06T19:11:47Z|4|reconnect|INFO|unix:/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/db.sock:
> connecting...
> 2019-08-06T19:11:47Z|5|reconnect|INFO|unix:/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/db.sock:
> connected
> ../../tests/system-traffic.at:3185: sed < stderr '
> /ovs_numa|INFO|Discovered /d
> /vlog|INFO|opened log file/d
> /vswitchd|INFO|ovs-vswitchd (Open vSwitch)/d
> /reconnect|INFO|/d
> /dpif_netlink|INFO|Generic Netlink family .ovs_datapath. does not exist/d
> /ofproto|INFO|using datapath ID/d
> /netdev_linux|INFO|.*device has unknown hardware address family/d
> /ofproto|INFO|datapath ID changed to fedcba9876543210/d
> /dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable/d
> /netlink_socket|INF

Re: [ovs-dev] [PATCH v2 8/9] ofproto-dpif-xlate: Translate timeout policy in ct action

2019-08-11 Thread Darrell Ball
On Fri, Aug 9, 2019 at 1:10 PM Justin Pettit  wrote:

>
> > On Aug 1, 2019, at 3:07 PM, Yi-Hung Wei  wrote:
> >
> > diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
> > index 79a2314500cf..57b32ccb610f 100644
> > --- a/lib/dpif-provider.h
> > +++ b/lib/dpif-provider.h
> > @@ -536,6 +536,11 @@ struct dpif_class {
> >struct ct_dpif_timeout_policy
> *tp);
> > int (*ct_timeout_policy_dump_done)(struct dpif *, void *state);
> >
> > +/* Get timeout policy name (OVS_CT_ATTR_TIMEOUT) from datapath. */
> > +int (*ct_format_timeout_policy_name)(struct dpif *, uint32_t tp_id,
> > + uint16_t dl_type, uint8_t
> nw_proto,
> > + struct ds *ds);
>
> To Darrell's point about wanting the ability to implement timeout policies
> in the userspace implementation without unwildcarding the dl_type and
> nw_proto, I'd suggest adding a "bool *unwildcard" argument that the
> provider can indicate whether those bits should be unwildcarded or not.
>

Sounds good
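
A rough sketch of how such an out-parameter might look, just to make the
suggestion concrete. Everything here is hypothetical: the function names, the
buffer-based signature and the name format are made up for illustration and
are not the dpif provider API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Kernel-style provider (hypothetical): the profile name depends on the
 * L3/L4 protocol, so the caller must unwildcard dl_type and nw_proto. */
int
kernel_format_tp_name(uint32_t tp_id, uint16_t dl_type, uint8_t nw_proto,
                      char *buf, size_t size, bool *unwildcard)
{
    snprintf(buf, size, "ovs_tp_%u_%s_%u", (unsigned int) tp_id,
             dl_type == 0x0800 ? "ip4" : "ip6", (unsigned int) nw_proto);
    *unwildcard = true;
    return 0;
}

/* Userspace-style provider (hypothetical): one shared profile for all
 * protocols, so dl_type and nw_proto can stay wildcarded. */
int
userspace_format_tp_name(uint32_t tp_id, uint16_t dl_type, uint8_t nw_proto,
                         char *buf, size_t size, bool *unwildcard)
{
    (void) dl_type;
    (void) nw_proto;
    snprintf(buf, size, "ovs_tp_%u", (unsigned int) tp_id);
    *unwildcard = false;
    return 0;
}
```

The translation code would then only touch the wildcard bits when the provider
sets '*unwildcard' to true.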

A related point is that some timeout profile expansion supporting code is
presently located in common code in ct-dpif.*
This is the code that helps convert a profile from the database form and
expands it to 6 Netfilter profiles -
(TCP4, TCP6, UDP4, UDP6, ICMP4, ICMP6).
I did mention this in some other comments, but I wanted to reiterate to
save some of the further churn.
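
For reference, the expansion I am referring to amounts to something like the
sketch below. The struct and function names are invented for illustration;
only the six L3/L4 combinations come from the discussion above.

```c
#include <stddef.h>
#include <stdint.h>

#define TP_N_VARIANTS 6

/* One per-protocol variant of a timeout profile (illustrative type). */
struct tp_variant {
    uint32_t tp_id;       /* Profile ID shared by all six variants. */
    uint8_t l3_version;   /* 4 or 6. */
    uint8_t l4_proto;     /* IP protocol number. */
    const char *label;
};

/* Expands profile 'tp_id' into the six per-protocol variants.  Fills 'out'
 * and returns the number of entries written. */
size_t
tp_expand(uint32_t tp_id, struct tp_variant out[TP_N_VARIANTS])
{
    static const struct tp_variant templ[TP_N_VARIANTS] = {
        { 0, 4, 6,  "TCP4"  },
        { 0, 6, 6,  "TCP6"  },
        { 0, 4, 17, "UDP4"  },
        { 0, 6, 17, "UDP6"  },
        { 0, 4, 1,  "ICMP4" },
        { 0, 6, 58, "ICMP6" },   /* ICMPv6 is IP protocol 58. */
    };

    for (size_t i = 0; i < TP_N_VARIANTS; i++) {
        out[i] = templ[i];
        out[i].tp_id = tp_id;
    }
    return TP_N_VARIANTS;
}
```

Only a Netfilter-backed datapath needs this step, which is why I would rather
see it live with the kernel-specific code.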

One other thing that concerns me is that datapath rules referencing timeout
profiles can linger in the fast path after the associated
timeout profiles are supposed to be gone (e.g., short timers/ct-flush).
Although the communication of timeout profiles via datapath rules is
presently needed for Netfilter support, this lingering of timeout profile
context is not ideal. It would be good to document it.

The above being said, I do think that, for the userspace datapath, timeout
profiles could be communicated more simply to userspace conntrack.
After reading ct-zone/timeout profile info from the database, the
ct-zone/timeout profile delta can be sent via _type_set_config() API to
the userspace datapath/conntrack.
conntrack can associate timeout profiles to ct/commit related rules via
ct-zone. This would also satisfy the present requirements.
This would mean the timeout profile specifiers would not need to be
directly associated with datapath rules.
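
To illustrate the idea, conntrack could keep a small zone-indexed table along
the lines below. This is only a sketch under my own assumptions: the names are
invented, it is single-threaded, and it is not existing OVS code.

```c
#include <stdbool.h>
#include <stdint.h>

#define N_ZONES 65536   /* Zone IDs are 16 bits wide. */

/* Per-zone timeout profile association (hypothetical). */
static struct {
    bool configured;
    uint32_t tp_id;
} zone_tp[N_ZONES];

/* Associates timeout profile 'tp_id' with 'zone'. */
void
zone_tp_set(uint16_t zone, uint32_t tp_id)
{
    zone_tp[zone].configured = true;
    zone_tp[zone].tp_id = tp_id;
}

/* Removes any timeout profile association from 'zone'. */
void
zone_tp_clear(uint16_t zone)
{
    zone_tp[zone].configured = false;
}

/* Returns true if a timeout profile is configured for 'zone' and stores its
 * ID in '*tp_id'. */
bool
zone_tp_get(uint16_t zone, uint32_t *tp_id)
{
    if (!zone_tp[zone].configured) {
        return false;
    }
    *tp_id = zone_tp[zone].tp_id;
    return true;
}
```

A commit in zone 5 would then call zone_tp_get(5, &tp_id) at connection
creation time, with no profile reference carried in the datapath flow itself.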


>
> > diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
> > index 41e07f0ee23e..1a2fc4a6a084 100644
> > --- a/ofproto/ofproto-provider.h
> > +++ b/ofproto/ofproto-provider.h
> > @@ -1880,6 +1880,11 @@ struct ofproto_class {
> > const struct ovsrec_datapath *dp_cfg, unsigned int
> idl_seqno);
> > /* Cleans up the to be deleted timeout policy in the pending kill
> list. */
> > void (*ct_zone_timeout_policy_sweep)(const struct ofproto *ofproto_);
> > +
> > +/* Returns true if timeout policy for 'zone' is configured and
> stores the
> > + * timeout policy id in '*tp_id'. */
> > +bool (*ct_zone_timeout_policy_get)(const struct ofproto *ofproto_,
> > +   uint16_t zone, uint32_t *tp_id);
> > };
>
> As we discussed off-line, by pulling the OVSDB information out of the dpif
> layer, I think we'll want a more traditional get/set ofproto interface.  I
> believe that will be done in v3.
>
> Thanks,
>
> --Justin
>
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


Re: [ovs-dev] [PATCH v2 5/9] ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables

2019-08-11 Thread Darrell Ball
On Fri, Aug 9, 2019 at 1:23 PM Justin Pettit  wrote:

>
> > On Aug 7, 2019, at 11:40 AM, Darrell Ball  wrote:
> >
> > There are 3 behaviors with the patchset that are datapath specific
> >
> > 1/ Unwildcarding of commit flows with timeout policies
> > As we discussed, the userspace conntrack does not need to do this
> and would not since it is suboptimal
> > since unnecessary flows are generated.
> > This is because userspace conntrack would use a single shared
> profile across all dl_types and ip_proto
> > rather than expanding to 6 profiles as in the case of kernel across
> dl_types and ip_protos.
>
> That makes sense.  Based on this feedback, Yi-Hung and I came up with a
> design that I think will address this in v3.
>

I spotted the other e-mail.


>
> > 2/ Userspace datapath/conntrack can easily manage cleanup of deleted
> profiles using a refcount approach.
> > For userspace conntrack, we don't need to read all the timeouts
> profiles continually and to continually try to
> > delete them from top down hoping to catch a window when the profile
> is not referenced by a flow.
>
> Yes, I think we don't need to introduce ct_zone_timeout_policy_sweep(),
> and it can just be implemented in the existing dpif_netlink_run().  That
> should be in v3.
>

yep


>
> > 3/ In terms of timeout profile naming in userspace conntrack, we don't
> need to manage a separate profile ID space for
> > userspace conntrack. We can simply use the uuid directly. This
> simplifies the management of profiles and always
> >  keeps knowledge of the profile name in sync across layers.
>
> I'm hesitant to use the OVSDB row UUIDs in the dpif layer.  v3 is going to
> restructure the handling of timeout policies pretty significantly, so let's
> discuss the particulars once that's out.
>

I am not overly concerned about '3' in general for the userspace datapath
since vswitchd restart will wipe out the dynamic u32 profile IDs anyways
and they will need to be re-associated, so they should be tightly coupled
to the database uuids.


>
> Thanks,
>
> --Justin
>
>
>


Re: [ovs-dev] [PATCH v2 2/9] ovs-vsctl: Add conntrack zone commands.

2019-08-07 Thread Darrell Ball
On Wed, Aug 7, 2019 at 1:37 PM William Tu  wrote:

> Thanks for the review.
>
> On Mon, Aug 05, 2019 at 04:12:02PM -0700, Darrell Ball wrote:
> > Thanks for the patch
> >
> > I noticed '--may-exist' and '--if-exists' are supported now for
> > add-zone-tp/del-zone-tp - thanks
> > The check for duplicate timeout policies now correctly checks all key and
> > values - thanks
>
> yes, thanks. Will do it in next version.
> >
> > Some more comments inline
> > I am trying to avoid duplicate comment from Justin, so I just won't
> comment
> > on some parts in this version
> > to avoid confusion.
> >
> >
> > On Thu, Aug 1, 2019 at 3:09 PM Yi-Hung Wei  wrote:
> >
> > > From: William Tu 
> > >
> > > The patch adds commands creating/deleting/listing conntrack zone
> > > timeout policies:
> > >   $ ovs-vsctl {add,del,list}-zone-tp zone=zone_id ...
> > >
> > > Signed-off-by: William Tu 
> > > ---
> > >  tests/ovs-vsctl.at   |  34 +++-
> > >  utilities/ovs-vsctl.8.in |  25 ++
> > >  utilities/ovs-vsctl.c| 204
> > > +++
> > >  3 files changed, 261 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
> > > index 46fa3c5b1a33..f0c5975edd0e 100644
> > > --- a/tests/ovs-vsctl.at
> > > +++ b/tests/ovs-vsctl.at
> > > @@ -805,6 +805,20 @@ AT_CHECK(
> > >[RUN_OVS_VSCTL([--if-exists remove netflow x targets '"1.2.3.4:567
> > > "'])])
> > >  AT_CHECK(
> > >[RUN_OVS_VSCTL([--if-exists clear netflow x targets])])
> > > +
> > > +AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath
> datapath_version=0 --
> > > set Open_vSwitch . datapaths:"netdev"=@m])], [0], [stdout])
> > >
> >
> > What happens if we add a datapath type here and there are no bridges of
> > that type; meaning the datapath of that type does not even exist.
> > Seems like a contradiction.
> > Maybe we should check for that at least and raise an error.
> > Ideally, it is better if these 'datapaths' are auto-managed by bridge
> > creation/deletion with given datapath types,
> > but we can certainly defer that.
> >
> >
> > > +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=1 icmp_first=1
> > > icmp_reply=2])])
> > >
> >
> > I mentioned this in V1:
> > There is no filtering of bad timeout key; for example
> >
> > AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=1 icmp_first=1
> > foo_bar=2])])
> >
> > is accepted as valid
> >
> > Even worse, a minor typo will go unnoticed - missing 'y' in 'icmp_reply'.
> >
> > AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=1 icmp_first=1
> > icmp_repl=2])])
>
> agree.
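
A minimal sketch of the kind of key filtering being asked for, assuming each
argument arrives as a "key=value" string. The key table lists only the names
that appear in this thread and is not the complete set; the function name is
made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Partial list of timeout keys, taken from this thread only. */
static const char *const known_tp_keys[] = {
    "icmp_first",
    "icmp_reply",
    "udp_single",
};

/* Returns true if 'arg' has the form "key=value" and 'key' exactly matches
 * one of the known timeout keys. */
bool
tp_key_is_valid(const char *arg)
{
    const char *eq = strchr(arg, '=');

    if (!eq || eq == arg) {
        return false;             /* No '=' or empty key. */
    }

    size_t len = (size_t) (eq - arg);
    size_t n = sizeof known_tp_keys / sizeof known_tp_keys[0];
    for (size_t i = 0; i < n; i++) {
        if (strlen(known_tp_keys[i]) == len
            && !strncmp(arg, known_tp_keys[i], len)) {
            return true;
        }
    }
    return false;
}
```

With a check like this, both 'foo_bar=2' and the 'icmp_repl=2' typo would be
rejected instead of being stored silently.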
> >
> >
> > > +AT_CHECK([RUN_OVS_VSCTL([--may-exist add-zone-tp netdev zone=1
> > > icmp_first=1 icmp_reply=2])])
> > > +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
> > > Policies: icmp_first=1 icmp_reply=2
> > >
> >
> > I mentioned in V1
> > We should check all possible timeout keys to make sure they work.
>
> OK.
> >
> >
> > > +])
> > > +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
> > > icmp_reply=3])])
> > > +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
> > > Policies: icmp_first=1 icmp_reply=2
> > > +Zone:2, Timeout Policies: icmp_first=2 icmp_reply=3
> > > +])
> > > +AT_CHECK([RUN_OVS_VSCTL([del-zone-tp netdev zone=1])])
> > > +AT_CHECK([RUN_OVS_VSCTL([--if-exists del-zone-tp netdev zone=1])])
> > > +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:2, Timeout
> > > Policies: icmp_first=2 icmp_reply=3
> > > +])
> > >  OVS_VSCTL_CLEANUP
> > >  AT_CLEANUP
> > >
> > > @@ -890,10 +904,10 @@ AT_CHECK([RUN_OVS_VSCTL([set bridge br0
> > > flood_vlans=-1])],
> > >  AT_CHECK([RUN_OVS_VSCTL([set bridge br0 flood_vlans=4096])],
> > >[1], [], [ovs-vsctl: constraint violation: 4096 is not in the valid
> > > range 0 to 4095 (inclusive)
> > >  ])
> > > -AT_CHECK([RUN_OVS_VSCTL([set c br1 'connection-mode=xyz'])],
> > > +AT_CHECK([RUN_OVS_VSCTL([set controller br1 'connection-mode=xyz'])],
> > >
> >
> > I mentioned in V1.
> > I don't think we should make unrelated changes in a feature patch
> > especially since it seems the aut

Re: [ovs-dev] [PATCH v2 5/9] ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables

2019-08-07 Thread Darrell Ball
On Wed, Aug 7, 2019 at 11:51 AM Darrell Ball  wrote:

>
>
> On Wed, Aug 7, 2019 at 11:40 AM Darrell Ball  wrote:
>
>>
>>
>> On Tue, Aug 6, 2019 at 9:57 PM Justin Pettit  wrote:
>>
>>>
>>> > On Aug 5, 2019, at 8:07 PM, Darrell Ball  wrote:
>>> >
>>> > On Thu, Aug 1, 2019 at 3:10 PM Yi-Hung Wei 
>>> wrote:
>>> >
>>> >> +struct ct_timeout_policy {
>>> >> +struct uuid uuid;
>>> >> +unsigned int last_used_seqno;
>>> >> +struct ct_dpif_timeout_policy cdtp;
>>> >> +struct cmap_node node;  /* Element in struct
>>> dpif_backer's
>>> >> + * "ct_tps" cmap. */
>>> >>
>>> >
>>> >
>>> > This looks like a layering violation
>>> > should be in dpif-netlink or netlink-conntrack for kernel side
>>>
>>> Hi, Darrell.  As I mentioned in my code review, I had my own concerns
>>> about layering, but mine were from the top-down.  Yi-Hung and I didn't
>>> understand your concern here, since these seem to be structures that would
>>> be useful regardless of the implementation.  Can you explain a bit more
>>> about your layering concerns?
>>>
>>
>>
>> I was off yesterday afternoon.
>>
>> There are 3 behaviors with the patchset that are datapath specific
>>
>> 1/ Unwildcarding of commit flows with timeout policies
>> As we discussed, the userspace conntrack does not need to do this and
>> would not since it is suboptimal
>> since unnecessary flows are generated.
>> This is because userspace conntrack would use a single shared profile
>> across all dl_types and ip_proto
>> rather than expanding to 6 profiles as in the case of kernel across
>> dl_types and ip_protos.
>>
>> 2/ Userspace datapath/conntrack can easily manage cleanup of deleted
>> profiles using a refcount approach.
>> For userspace conntrack, we don't need to read all the timeouts
>> profiles continually and to continually try to
>> delete them from top down hoping to catch a window when the profile
>> is not referenced by a flow.
>>
>> 3/ In terms of timeout profile naming in userspace conntrack, we don't
>> need to manage a separate profile ID space for
>> userspace conntrack. We can simply use the uuid directly. This
>> simplifies the management of profiles and always
>>  keeps knowledge of the profile name in sync across layers.
>>
>
I think '3' is not that important for the userspace datapath, since when
vswitchd restarts we lose everything in the datapath
and will need to reallocate u32 profile IDs and re-associate them, anyways.
So we never lose the association per se.
Also, we would need to increase the size of a data structure field to use
the uuid directly.

Also, '1' and '2' can be implemented later if it delays things too much. I
can submit a followup patch(es) if needed.


>
> Hence, the comments for this patch center around where the implementation
> code is.
> I think the code should live in datapath type specific code/files.
>
> Of course, wrappers are needed at higher layers to call the datapath
> specific implementations.
>
>
>>
>> Thanks Darrell
>>
>>
>>
>>
>>>
>>> Thanks,
>>>
>>> --Justin
>>>
>>>
>>>


Re: [ovs-dev] [PATCH v2 5/9] ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables

2019-08-07 Thread Darrell Ball
On Wed, Aug 7, 2019 at 11:40 AM Darrell Ball  wrote:

>
>
> On Tue, Aug 6, 2019 at 9:57 PM Justin Pettit  wrote:
>
>>
>> > On Aug 5, 2019, at 8:07 PM, Darrell Ball  wrote:
>> >
>> > On Thu, Aug 1, 2019 at 3:10 PM Yi-Hung Wei 
>> wrote:
>> >
>> >> +struct ct_timeout_policy {
>> >> +struct uuid uuid;
>> >> +unsigned int last_used_seqno;
>> >> +struct ct_dpif_timeout_policy cdtp;
>> >> +struct cmap_node node;  /* Element in struct dpif_backer's
>> >> + * "ct_tps" cmap. */
>> >>
>> >
>> >
>> > This looks like a layering violation
>> > should be in dpif-netlink or netlink-conntrack for kernel side
>>
>> Hi, Darrell.  As I mentioned in my code review, I had my own concerns
>> about layering, but mine were from the top-down.  Yi-Hung and I didn't
>> understand your concern here, since these seem to be structures that would
>> be useful regardless of the implementation.  Can you explain a bit more
>> about your layering concerns?
>>
>
>
> I was off yesterday afternoon.
>
> There are 3 behaviors with the patchset that are datapath specific
>
> 1/ Unwildcarding of commit flows with timeout policies
> As we discussed, the userspace conntrack does not need to do this and
> would not since it is suboptimal
> since unnecessary flows are generated.
> This is because userspace conntrack would use a single shared profile
> across all dl_types and ip_proto
> rather than expanding to 6 profiles as in the case of kernel across
> dl_types and ip_protos.
>
> 2/ Userspace datapath/conntrack can easily manage cleanup of deleted
> profiles using a refcount approach.
> For userspace conntrack, we don't need to read all the timeouts
> profiles continually and to continually try to
> delete them from top down hoping to catch a window when the profile is
> not referenced by a flow.
>
> 3/ In terms of timeout profile naming in userspace conntrack, we don't
> need to manage a separate profile ID space for
> userspace conntrack. We can simply use the uuid directly. This
> simplifies the management of profiles and always
>  keeps knowledge of the profile name in sync across layers.
>

Hence, the comments for this patch center around where the implementation
code is.
I think the code should live in datapath type specific code/files.

Of course, wrappers are needed at higher layers to call the datapath
specific implementations.


>
> Thanks Darrell
>
>
>
>
>>
>> Thanks,
>>
>> --Justin
>>
>>
>>


Re: [ovs-dev] [PATCH v2 5/9] ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables

2019-08-07 Thread Darrell Ball
On Tue, Aug 6, 2019 at 9:57 PM Justin Pettit  wrote:

>
> > On Aug 5, 2019, at 8:07 PM, Darrell Ball  wrote:
> >
> > On Thu, Aug 1, 2019 at 3:10 PM Yi-Hung Wei  wrote:
> >
> >> +struct ct_timeout_policy {
> >> +struct uuid uuid;
> >> +unsigned int last_used_seqno;
> >> +struct ct_dpif_timeout_policy cdtp;
> >> +struct cmap_node node;  /* Element in struct dpif_backer's
> >> + * "ct_tps" cmap. */
> >>
> >
> >
> > This looks like a layering violation
> > should be in dpif-netlink or netlink-conntrack for kernel side
>
> Hi, Darrell.  As I mentioned in my code review, I had my own concerns
> about layering, but mine were from the top-down.  Yi-Hung and I didn't
> understand your concern here, since these seem to be structures that would
> be useful regardless of the implementation.  Can you explain a bit more
> about your layering concerns?
>


I was off yesterday afternoon.

There are 3 behaviors with the patchset that are datapath specific

1/ Unwildcarding of commit flows with timeout policies
As we discussed, the userspace conntrack does not need to do this and
would not since it is suboptimal
since unnecessary flows are generated.
This is because userspace conntrack would use a single shared profile
across all dl_types and ip_proto
rather than expanding to 6 profiles as in the case of kernel across
dl_types and ip_protos.

2/ Userspace datapath/conntrack can easily manage cleanup of deleted
profiles using a refcount approach.
For userspace conntrack, we don't need to read all the timeouts
profiles continually and to continually try to
delete them from top down hoping to catch a window when the profile is
not referenced by a flow.

3/ In terms of timeout profile naming in userspace conntrack, we don't need
to manage a separate profile ID space for
userspace conntrack. We can simply use the uuid directly. This
simplifies the management of profiles and always
 keeps knowledge of the profile name in sync across layers.
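
To make '2' more concrete, the refcount approach could look roughly like the
sketch below. It is an assumption-laden illustration: the struct and function
names are invented, and real code would need atomic counters since conntrack
is multi-threaded.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* A timeout profile shared by configuration and by connections
 * (hypothetical type; single-threaded sketch). */
struct tp_profile {
    uint32_t id;
    unsigned int refcnt;
};

/* Creates a profile holding one reference on behalf of the configuration.
 * Returns NULL on allocation failure. */
struct tp_profile *
tp_profile_create(uint32_t id)
{
    struct tp_profile *p = malloc(sizeof *p);

    if (p) {
        p->id = id;
        p->refcnt = 1;
    }
    return p;
}

/* Takes a reference, e.g. when a connection starts using the profile. */
void
tp_profile_ref(struct tp_profile *p)
{
    p->refcnt++;
}

/* Drops a reference.  Frees the profile and returns true once the last
 * reference is gone. */
bool
tp_profile_unref(struct tp_profile *p)
{
    if (--p->refcnt == 0) {
        free(p);
        return true;
    }
    return false;
}
```

Deleting a profile from the database just drops the configuration's
reference; the memory goes away when the last connection using it expires, so
no periodic top-down sweep is required.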

Thanks Darrell




>
> Thanks,
>
> --Justin
>
>
>


Re: [ovs-dev] [PATCH v2 9/9] system-traffic: Add zone-based conntrack timeout policy test

2019-08-06 Thread Darrell Ball
On Tue, Aug 6, 2019 at 11:07 AM Yi-Hung Wei  wrote:

> On Tue, Aug 6, 2019 at 10:21 AM Darrell Ball  wrote:
> >
> >
> > I did some more testing and found a similar problem as in V1.
> >
> > This test can be run successfully once and then fails after that.
> > Maybe you want to look into that. It is probably related to:
> >
> > dball@ubuntu:~/openvswitch/ovs$ lsmod | grep nf
> > .
> > nfnetlink_cttimeout16384  1
> > .
> >
> > Darrell
> >
>
> Thanks for trying out the test.  I can not reproduce the issue that
> you mentioned on my local VM.
>
> Can you provide your kernel version and system-kmod-testsuite.log?
>
> Thanks,
>
> -Yi-Hung
>


Here it is:

dball@ubuntu:~/ovs$ uname -a
Linux ubuntu 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018
x86_64 x86_64 x86_64 GNU/Linux

dball@ubuntu:~/ovs$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial

# -*- compilation -*-
75. system-traffic.at:3182: testing conntrack - zone-based timeout policy
...
net.netfilter.nf_conntrack_helper = 0
../../tests/system-traffic.at:3185: modprobe openvswitch
../../tests/system-traffic.at:3185: ovsdb-tool create conf.db
$abs_top_srcdir/vswitchd/vswitch.ovsschema
../../tests/system-traffic.at:3185: ovsdb-server --detach --no-chdir
--pidfile --log-file --remote=punix:$OVS_RUNDIR/db.sock
stderr:
2019-08-06T19:11:47Z|1|vlog|INFO|opened log file
/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/ovsdb-server.log
../../tests/system-traffic.at:3185: sed < stderr '
/vlog|INFO|opened log file/d
/ovsdb_server|INFO|ovsdb-server (Open vSwitch)/d'
../../tests/system-traffic.at:3185: ovs-vsctl --no-wait init
../../tests/system-traffic.at:3185: ovs-vswitchd  --detach --no-chdir
--pidfile --log-file -vvconn -vofproto_dpif -vunixctl
stderr:
2019-08-06T19:11:47Z|1|vlog|INFO|opened log file
/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/ovs-vswitchd.log
2019-08-06T19:11:47Z|2|ovs_numa|INFO|Discovered 1 CPU cores on NUMA
node 0
2019-08-06T19:11:47Z|3|ovs_numa|INFO|Discovered 1 NUMA nodes and 1 CPU
cores
2019-08-06T19:11:47Z|4|reconnect|INFO|unix:/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/db.sock:
connecting...
2019-08-06T19:11:47Z|5|reconnect|INFO|unix:/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/db.sock:
connected
../../tests/system-traffic.at:3185: sed < stderr '
/ovs_numa|INFO|Discovered /d
/vlog|INFO|opened log file/d
/vswitchd|INFO|ovs-vswitchd (Open vSwitch)/d
/reconnect|INFO|/d
/dpif_netlink|INFO|Generic Netlink family .ovs_datapath. does not exist/d
/ofproto|INFO|using datapath ID/d
/netdev_linux|INFO|.*device has unknown hardware address family/d
/ofproto|INFO|datapath ID changed to fedcba9876543210/d
/dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable/d
/netlink_socket|INFO|netlink: could not enable listening to all nsid/d
/probe tc:/d
/tc: Using policy/d'
../../tests/system-traffic.at:3185: ovs-vsctl -- add-br br0 -- set Bridge
br0
protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
fail-mode=secure  --
--- /dev/null 2019-02-26 18:50:08.04306 -0800
+++
/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/at-groups/75/stderr
2019-08-06 12:12:17.489401899 -0700
@@ -0,0 +1,2 @@
+2019-08-06T19:12:17Z|2|fatal_signal|WARN|terminating with signal 14
(Alarm clock)
+/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/at-groups/75/test-source:
line 874: 58958 Alarm clock ovs-vsctl -- add-br br0 -- set
Bridge br0
protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
fail-mode=secure --
../../tests/system-traffic.at:3185: exit code was 142, expected 0
ovsdb-server.log:
> 2019-08-06T19:11:47.418Z|1|vlog|INFO|opened log file
/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/ovsdb-server.log
> 2019-08-06T19:11:47.420Z|2|ovsdb_server|INFO|ovsdb-server (Open
vSwitch) 2.12.90
> 2019-08-06T19:11:57.433Z|3|memory|INFO|4504 kB peak resident set size
after 10.0 seconds
> 2019-08-06T19:11:57.433Z|4|memory|INFO|cells:122 monitors:3 sessions:2
ovs-vswitchd.log:
> 2019-08-06T19:11:47.449Z|1|vlog|INFO|opened log file
/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/ovs-vswitchd.log
> 2019-08-06T19:11:47.449Z|2|ovs_numa|INFO|Discovered 1 CPU cores on
NUMA node 0
> 2019-08-06T19:11:47.449Z|3|ovs_numa|INFO|Discovered 1 NUMA nodes and
1 CPU cores
>
2019-08-06T19:11:47.450Z|4|reconnect|INFO|unix:/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/db.sock:
connecting...
>
2019-08-06T19:11:47.450Z|5|reconnect|INFO|unix:/home/dball/ovs/_gcc/tests/system-kmod-testsuite.dir/075/db.sock:
connected
> 2019-08-06T19:11:47.473Z|6|bridge|INFO|ovs-vswitchd (Open vSwitch)
2.12.90
> 2019-08-06T19:11:47.492Z|

Re: [ovs-dev] [PATCH v2 8/9] ofproto-dpif-xlate: Translate timeout policy in ct action

2019-08-06 Thread Darrell Ball
On Mon, Aug 5, 2019 at 8:51 PM Darrell Ball  wrote:

> Thanks for the patch
>
> The main comment I had from the V1 patch was adding the check
>
> +if (ofc->flags & NX_CT_F_COMMIT) {
>
> in compose_conntrack_action()
>
> I see that was done.
>
> After a quick scan, I had one minor comment inline.
>

I wanted to reiterate one more comment that I added to V1 regarding the
unwildcarding (inline).


>
> On Thu, Aug 1, 2019 at 3:12 PM Yi-Hung Wei  wrote:
>
>> This patch derives the timeout policy based on ct zone from the
>> internal data structure that reads the configuration from ovsdb.
>>
>> Signed-off-by: Yi-Hung Wei 
>> ---
>>  lib/ct-dpif.c| 10 ++
>>  lib/ct-dpif.h|  3 +++
>>  lib/dpif-netdev.c|  1 +
>>  lib/dpif-netlink.c   | 10 ++
>>  lib/dpif-provider.h  |  5 +
>>  ofproto/ofproto-dpif-xlate.c | 29 +
>>  ofproto/ofproto-dpif.c   | 24 
>>  ofproto/ofproto-provider.h   |  5 +
>>  ofproto/ofproto.c| 11 +++
>>  ofproto/ofproto.h|  2 ++
>>  10 files changed, 100 insertions(+)
>>
>> diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
>> index 7f9ce0a561f7..5d2acfd7810b 100644
>> --- a/lib/ct-dpif.c
>> +++ b/lib/ct-dpif.c
>> @@ -864,3 +864,13 @@ ct_dpif_timeout_policy_dump_done(struct dpif *dpif,
>> void *state)
>>  ? dpif->dpif_class->ct_timeout_policy_dump_done(dpif, state)
>>  : EOPNOTSUPP);
>>  }
>> +
>> +int
>> +ct_dpif_format_timeout_policy_name(struct dpif *dpif, uint32_t tp_id,
>> +   uint16_t dl_type, uint8_t nw_proto,
>> +   struct ds *ds)
>> +{
>> +return (dpif->dpif_class->ct_format_timeout_policy_name
>> +? dpif->dpif_class->ct_format_timeout_policy_name(
>> +dpif, tp_id, dl_type, nw_proto, ds) : EOPNOTSUPP);
>> +}
>> diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
>> index 8dacb1c7c253..0a27568880c0 100644
>> --- a/lib/ct-dpif.h
>> +++ b/lib/ct-dpif.h
>> @@ -318,5 +318,8 @@ int ct_dpif_timeout_policy_dump_start(struct dpif
>> *dpif, void **statep);
>>  int ct_dpif_timeout_policy_dump_next(struct dpif *dpif, void *state,
>>   struct ct_dpif_timeout_policy *tp);
>>  int ct_dpif_timeout_policy_dump_done(struct dpif *dpif, void *state);
>> +int ct_dpif_format_timeout_policy_name(struct dpif *dpif, uint32_t tp_id,
>> +   uint16_t dl_type, uint8_t
>> nw_proto,
>> +   struct ds *ds);
>>
>>  #endif /* CT_DPIF_H */
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 7240a3e6f3c8..19cf9f21ec85 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -7539,6 +7539,7 @@ const struct dpif_class dpif_netdev_class = {
>>  NULL,   /* ct_timeout_policy_dump_start */
>>  NULL,   /* ct_timeout_policy_dump_next */
>>  NULL,   /* ct_timeout_policy_dump_done */
>> +NULL,   /* ct_format_timeout_policy_name */
>>  dpif_netdev_ipf_set_enabled,
>>  dpif_netdev_ipf_set_min_frag,
>>  dpif_netdev_ipf_set_max_nfrags,
>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>> index b859508f718a..92da87027c58 100644
>> --- a/lib/dpif-netlink.c
>> +++ b/lib/dpif-netlink.c
>> @@ -3071,6 +3071,15 @@ dpif_netlink_format_tp_name(uint32_t id, uint16_t
>> l3num, uint8_t l4num,
>>  ovs_assert(tp_name->length < CTNL_TIMEOUT_NAME_MAX);
>>  }
>>
>> +static int
>> +dpif_netlink_ct_format_timeout_policy_name(struct dpif *dpif OVS_UNUSED,
>> +uint32_t tp_id, uint16_t dl_type, uint8_t nw_proto, struct ds *ds)
>> +{
>> +dpif_netlink_format_tp_name(tp_id,
>> +dl_type == ETH_TYPE_IP ? AF_INET : AF_INET6, nw_proto, ds);
>> +return 0;
>> +}
>> +
>>  #define CT_DPIF_NL_TP_TCP_MAPPINGS  \
>>  CT_DPIF_NL_TP_MAPPING(TCP, TCP, SYN_SENT, SYN_SENT) \
>>  CT_DPIF_NL_TP_MAPPING(TCP, TCP, SYN_RECV, SYN_RECV) \
>> @@ -3891,6 +3900,7 @@ const struct dpif_class dpif_netlink_class = {
>>  dpif_netlink_ct_timeout_policy_dump_start,
>>  dpif_netlink_ct_timeout_policy_dump_next,
>>  dpif_netlink_ct_timeout_policy_dump_done,
>> +dpif_netlink_ct_format_timeout_po

Re: [ovs-dev] [PATCH v2 9/9] system-traffic: Add zone-based conntrack timeout policy test

2019-08-06 Thread Darrell Ball
On Mon, Aug 5, 2019 at 9:03 PM Darrell Ball  wrote:

> Thanks for the patch
>
> I see the test is much improved now from V1 and passes - thanks
>
> Ideally, tests should be associated with some code for context
> It could be folded into patch 8
>

I did some more testing and found a similar problem as in V1.

This test can be run successfully once and then fails after that.
Maybe you want to look into that. It is probably related to:

dball@ubuntu:~/openvswitch/ovs$ lsmod | grep nf
.
nfnetlink_cttimeout16384  1
.

Darrell


>
>
> On Thu, Aug 1, 2019 at 3:12 PM Yi-Hung Wei  wrote:
>
>> This patch adds a system traffic test to verify the zone-based conntrack
>> timeout feature.  The test uses ovs-vsctl commands to configure
>> the customized ICMP and UDP timeout on zone 5 to a shorter period.
>> It then injects ICMP and UDP traffic to conntrack, and checks if the
>> corresponding conntrack entry expires after the predefined timeout.
>>
>> Signed-off-by: Yi-Hung Wei 
>> ---
>>  tests/system-kmod-macros.at  | 25 +++
>>  tests/system-traffic.at  | 66
>> 
>>  tests/system-userspace-macros.at | 26 
>>  3 files changed, 117 insertions(+)
>>
>> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
>> index 554a61e9bd95..1bc6f246f426 100644
>> --- a/tests/system-kmod-macros.at
>> +++ b/tests/system-kmod-macros.at
>> @@ -100,6 +100,15 @@ m4_define([CHECK_CONNTRACK_FRAG_OVERLAP],
>>  #
>>  m4_define([CHECK_CONNTRACK_NAT])
>>
>> +# CHECK_CONNTRACK_TIMEOUT()
>> +#
>> +# Perform requirements checks for running conntrack customized timeout
>> tests.
>> +#
>> +m4_define([CHECK_CONNTRACK_TIMEOUT],
>> +[
>> +AT_SKIP_IF([! cat /boot/config-$(uname -r) | grep
>> NF_CONNTRACK_TIMEOUT | grep '=y' > /dev/null])
>> +])
>> +
>>  # CHECK_CT_DPIF_PER_ZONE_LIMIT()
>>  #
>>  # Perform requirements checks for running ovs-dpctl
>> ct-[set|get|del]-limits per
>> @@ -185,3 +194,19 @@ m4_define([OVS_CHECK_KERNEL_EXCL],
>>  sublevel=$(uname -r | sed -e 's/\./ /g' | awk '{print $ 2}')
>>  AT_SKIP_IF([ ! ( test $version -lt $1 || ( test $version -eq $1 &&
>> test $sublevel -lt $2 ) || test $version -gt $3 || ( test $version -eq $3
>> && test $sublevel -gt $4 ) ) ])
>>  ])
>> +
>> +# VSCTL_ADD_DATAPATH_TABLE()
>> +#
>> +# Create system datapath table "system" for kernel tests in ovsdb
>> +m4_define([VSCTL_ADD_DATAPATH_TABLE],
>> +[
>> +AT_CHECK([ovs-vsctl -- --id=@m create Datapath datapath_version=0 --
>> set Open_vSwitch . datapaths:"system"=@m], [0], [stdout])
>> +])
>> +
>> +# VSCTL_ADD_ZONE_TIMEOUT_POLICY([parameters])
>> +#
>> +# Add zone based timeout policy to kernel datapath
>> +m4_define([VSCTL_ADD_ZONE_TIMEOUT_POLICY],
>> +[
>> +AT_CHECK([ovs-vsctl add-zone-tp system $1], [0], [stdout])
>> +])
>> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
>> index 1a04199dcfe9..f4ac8a8f2c06 100644
>> --- a/tests/system-traffic.at
>> +++ b/tests/system-traffic.at
>> @@ -3179,6 +3179,72 @@ NXST_FLOW reply:
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>
>> +AT_SETUP([conntrack - zone-based timeout policy])
>> +CHECK_CONNTRACK()
>> +CHECK_CONNTRACK_TIMEOUT()
>> +OVS_TRAFFIC_VSWITCHD_START()
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
>> +
>> +AT_DATA([flows.txt], [dnl
>> +priority=1,action=drop
>> +priority=10,arp,action=normal
>> +priority=100,in_port=1,ip,action=ct(zone=5, table=1)
>> +priority=100,in_port=2,ip,action=ct(zone=5, table=1)
>> +table=1,in_port=2,ip,ct_state=+trk+est,action=1
>> +table=1,in_port=1,ip,ct_state=+trk+new,action=ct(commit,zone=5),2
>> +table=1,in_port=1,ip,ct_state=+trk+est,action=2
>> +])
>> +
>> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
>> +
>> +dnl Test with default timeout
>> +dnl The default udp_single and icmp_first timeouts are 30 seconds in
>> +dnl kernel DP, and 60 seconds in userspace DP.
>> +
>> +dnl Send ICMP and UDP traffic
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
>> FORMAT_PING], [0], [dnl
>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
>> +])
>> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
>> packet=5054000a505400090

Re: [ovs-dev] [PATCH v2 9/9] system-traffic: Add zone-based conntrack timeout policy test

2019-08-05 Thread Darrell Ball
Thanks for the patch

I see the test is much improved from V1 and now passes - thanks

Ideally, tests should be associated with some code for context;
this one could be folded into patch 8


On Thu, Aug 1, 2019 at 3:12 PM Yi-Hung Wei  wrote:

> This patch adds a system traffic test to verify the zone-based conntrack
> timeout feature.  The test uses ovs-vsctl commands to configure
> the customized ICMP and UDP timeout on zone 5 to a shorter period.
> It then injects ICMP and UDP traffic to conntrack, and checks if the
> corresponding conntrack entry expires after the predefined timeout.
>
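For readers following along, the behaviour the test checks can be modelled in a few lines. This is only a toy sketch in Python, not OVS code: the class, method names, and the 30-second defaults (taken from the kernel DP figure in the test comments) are assumptions for illustration.

```python
# Toy model only: names and structure are illustrative, not OVS code.
DEFAULT_TIMEOUT_S = {"icmp_first": 30, "udp_single": 30}  # kernel DP figure quoted in the test


class ToyConntrack:
    def __init__(self):
        self.zone_policy = {}  # zone id -> {timeout key: seconds}
        self.entries = []      # list of (zone, timeout key, expires_at)

    def add_zone_tp(self, zone, **timeouts):
        """Attach a customized timeout policy to a zone."""
        self.zone_policy[zone] = dict(timeouts)

    def commit(self, zone, key, now):
        """Create an entry; the zone policy, if any, overrides the default."""
        timeout = self.zone_policy.get(zone, {}).get(key, DEFAULT_TIMEOUT_S[key])
        self.entries.append((zone, key, now + timeout))

    def dump(self, now):
        """Return the entries that have not expired at time 'now'."""
        return sorted((z, k) for z, k, exp in self.entries if now < exp)

    def flush(self):
        self.entries.clear()


ct = ToyConntrack()
ct.commit(5, "icmp_first", now=0)
ct.commit(5, "udp_single", now=0)
default_dump = ct.dump(now=4)          # defaults: still present after 4 s

ct.flush()
ct.add_zone_tp(5, udp_single=3, icmp_first=3)
ct.commit(5, "icmp_first", now=10)
ct.commit(5, "udp_single", now=10)
fresh_dump = ct.dump(now=10)           # present right after the traffic
late_dump = ct.dump(now=14)            # gone once the 3 s timeout has passed
```

The real test does the same thing with `sleep` and `dpctl/dump-conntrack`; the model just makes the expected before/after states explicit.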
> Signed-off-by: Yi-Hung Wei 
> ---
>  tests/system-kmod-macros.at  | 25 +++
>  tests/system-traffic.at  | 66
> 
>  tests/system-userspace-macros.at | 26 
>  3 files changed, 117 insertions(+)
>
> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
> index 554a61e9bd95..1bc6f246f426 100644
> --- a/tests/system-kmod-macros.at
> +++ b/tests/system-kmod-macros.at
> @@ -100,6 +100,15 @@ m4_define([CHECK_CONNTRACK_FRAG_OVERLAP],
>  #
>  m4_define([CHECK_CONNTRACK_NAT])
>
> +# CHECK_CONNTRACK_TIMEOUT()
> +#
> +# Perform requirements checks for running conntrack customized timeout
> tests.
> +#
> +m4_define([CHECK_CONNTRACK_TIMEOUT],
> +[
> +AT_SKIP_IF([! cat /boot/config-$(uname -r) | grep
> NF_CONNTRACK_TIMEOUT | grep '=y' > /dev/null])
> +])
> +
>  # CHECK_CT_DPIF_PER_ZONE_LIMIT()
>  #
>  # Perform requirements checks for running ovs-dpctl
> ct-[set|get|del]-limits per
> @@ -185,3 +194,19 @@ m4_define([OVS_CHECK_KERNEL_EXCL],
>  sublevel=$(uname -r | sed -e 's/\./ /g' | awk '{print $ 2}')
>  AT_SKIP_IF([ ! ( test $version -lt $1 || ( test $version -eq $1 &&
> test $sublevel -lt $2 ) || test $version -gt $3 || ( test $version -eq $3
> && test $sublevel -gt $4 ) ) ])
>  ])
> +
> +# VSCTL_ADD_DATAPATH_TABLE()
> +#
> +# Create system datapath table "system" for kernel tests in ovsdb
> +m4_define([VSCTL_ADD_DATAPATH_TABLE],
> +[
> +AT_CHECK([ovs-vsctl -- --id=@m create Datapath datapath_version=0 --
> set Open_vSwitch . datapaths:"system"=@m], [0], [stdout])
> +])
> +
> +# VSCTL_ADD_ZONE_TIMEOUT_POLICY([parameters])
> +#
> +# Add zone based timeout policy to kernel datapath
> +m4_define([VSCTL_ADD_ZONE_TIMEOUT_POLICY],
> +[
> +AT_CHECK([ovs-vsctl add-zone-tp system $1], [0], [stdout])
> +])
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 1a04199dcfe9..f4ac8a8f2c06 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -3179,6 +3179,72 @@ NXST_FLOW reply:
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([conntrack - zone-based timeout policy])
> +CHECK_CONNTRACK()
> +CHECK_CONNTRACK_TIMEOUT()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,ip,action=ct(zone=5, table=1)
> +priority=100,in_port=2,ip,action=ct(zone=5, table=1)
> +table=1,in_port=2,ip,ct_state=+trk+est,action=1
> +table=1,in_port=1,ip,ct_state=+trk+new,action=ct(commit,zone=5),2
> +table=1,in_port=1,ip,ct_state=+trk+est,action=2
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl Test with default timeout
> +dnl The default udp_single and icmp_first timeouts are 30 seconds in
> +dnl kernel DP, and 60 seconds in userspace DP.
> +
> +dnl Send ICMP and UDP traffic
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=5054000a505400090800451c0011a4cd0a0101010a010102000100020008
> actions=resubmit(,0)"])
> +
> +sleep 4
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sort],
> [0], [dnl
>
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0),zone=5
>
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +dnl Shorten the udp_single and icmp_first timeout in zone 5
> +VSCTL_ADD_DATAPATH_TABLE()
> +VSCTL_ADD_ZONE_TIMEOUT_POLICY([zone=5 udp_single=3 icmp_first=3])
> +
> +dnl Send ICMP and UDP traffic
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=5054000a505400090800451c0011a4cd0a0101010a010102000100020008
> actions=resubmit(,0)"])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sort],
> [0], [dnl
>
> 

Re: [ovs-dev] [PATCH v2 8/9] ofproto-dpif-xlate: Translate timeout policy in ct action

2019-08-05 Thread Darrell Ball
Thanks for the patch

The main comment I had from the V1 patch was adding the check

+if (ofc->flags & NX_CT_F_COMMIT) {

in compose_conntrack_action()

I see that was done.
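
To make the intent of that check concrete, here is a rough Python sketch of the gating, not the actual xlate code. The flag value, the dictionary-based action, and the `zone_to_tp_name` lookup are all assumptions for illustration.

```python
NX_CT_F_COMMIT = 1 << 0  # assumed bit value, for illustration only


def compose_ct_action(flags, zone, zone_to_tp_name):
    """Build a toy ct() action; attach a timeout policy only on commit."""
    action = {"zone": zone, "commit": bool(flags & NX_CT_F_COMMIT)}
    if flags & NX_CT_F_COMMIT:
        tp_name = zone_to_tp_name.get(zone)  # hypothetical zone -> policy name map
        if tp_name is not None:
            action["timeout"] = tp_name
    return action


committed = compose_ct_action(NX_CT_F_COMMIT, 5, {5: "policy-for-zone-5"})
uncommitted = compose_ct_action(0, 5, {5: "policy-for-zone-5"})
```

A ct() action without commit does not create a connection entry, so there is nothing for a timeout policy to attach to; that is why the attribute is only emitted in the commit case.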

After a quick scan, I had one minor comment inline.

On Thu, Aug 1, 2019 at 3:12 PM Yi-Hung Wei  wrote:

> This patch derives the timeout policy based on ct zone from the
> internal data structure that reads the configuration from ovsdb.
>
> Signed-off-by: Yi-Hung Wei 
> ---
>  lib/ct-dpif.c| 10 ++
>  lib/ct-dpif.h|  3 +++
>  lib/dpif-netdev.c|  1 +
>  lib/dpif-netlink.c   | 10 ++
>  lib/dpif-provider.h  |  5 +
>  ofproto/ofproto-dpif-xlate.c | 29 +
>  ofproto/ofproto-dpif.c   | 24 
>  ofproto/ofproto-provider.h   |  5 +
>  ofproto/ofproto.c| 11 +++
>  ofproto/ofproto.h|  2 ++
>  10 files changed, 100 insertions(+)
>
> diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
> index 7f9ce0a561f7..5d2acfd7810b 100644
> --- a/lib/ct-dpif.c
> +++ b/lib/ct-dpif.c
> @@ -864,3 +864,13 @@ ct_dpif_timeout_policy_dump_done(struct dpif *dpif,
> void *state)
>  ? dpif->dpif_class->ct_timeout_policy_dump_done(dpif, state)
>  : EOPNOTSUPP);
>  }
> +
> +int
> +ct_dpif_format_timeout_policy_name(struct dpif *dpif, uint32_t tp_id,
> +   uint16_t dl_type, uint8_t nw_proto,
> +   struct ds *ds)
> +{
> +return (dpif->dpif_class->ct_format_timeout_policy_name
> +? dpif->dpif_class->ct_format_timeout_policy_name(
> +dpif, tp_id, dl_type, nw_proto, ds) : EOPNOTSUPP);
> +}
> diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
> index 8dacb1c7c253..0a27568880c0 100644
> --- a/lib/ct-dpif.h
> +++ b/lib/ct-dpif.h
> @@ -318,5 +318,8 @@ int ct_dpif_timeout_policy_dump_start(struct dpif
> *dpif, void **statep);
>  int ct_dpif_timeout_policy_dump_next(struct dpif *dpif, void *state,
>   struct ct_dpif_timeout_policy *tp);
>  int ct_dpif_timeout_policy_dump_done(struct dpif *dpif, void *state);
> +int ct_dpif_format_timeout_policy_name(struct dpif *dpif, uint32_t tp_id,
> +   uint16_t dl_type, uint8_t nw_proto,
> +   struct ds *ds);
>
>  #endif /* CT_DPIF_H */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 7240a3e6f3c8..19cf9f21ec85 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7539,6 +7539,7 @@ const struct dpif_class dpif_netdev_class = {
>  NULL,   /* ct_timeout_policy_dump_start */
>  NULL,   /* ct_timeout_policy_dump_next */
>  NULL,   /* ct_timeout_policy_dump_done */
> +NULL,   /* ct_format_timeout_policy_name */
>  dpif_netdev_ipf_set_enabled,
>  dpif_netdev_ipf_set_min_frag,
>  dpif_netdev_ipf_set_max_nfrags,
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index b859508f718a..92da87027c58 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -3071,6 +3071,15 @@ dpif_netlink_format_tp_name(uint32_t id, uint16_t
> l3num, uint8_t l4num,
>  ovs_assert(tp_name->length < CTNL_TIMEOUT_NAME_MAX);
>  }
>
> +static int
> +dpif_netlink_ct_format_timeout_policy_name(struct dpif *dpif OVS_UNUSED,
> +uint32_t tp_id, uint16_t dl_type, uint8_t nw_proto, struct ds *ds)
> +{
> +dpif_netlink_format_tp_name(tp_id,
> +dl_type == ETH_TYPE_IP ? AF_INET : AF_INET6, nw_proto, ds);
> +return 0;
> +}
> +
>  #define CT_DPIF_NL_TP_TCP_MAPPINGS  \
>  CT_DPIF_NL_TP_MAPPING(TCP, TCP, SYN_SENT, SYN_SENT) \
>  CT_DPIF_NL_TP_MAPPING(TCP, TCP, SYN_RECV, SYN_RECV) \
> @@ -3891,6 +3900,7 @@ const struct dpif_class dpif_netlink_class = {
>  dpif_netlink_ct_timeout_policy_dump_start,
>  dpif_netlink_ct_timeout_policy_dump_next,
>  dpif_netlink_ct_timeout_policy_dump_done,
> +dpif_netlink_ct_format_timeout_policy_name,
>  NULL,   /* ipf_set_enabled */
>  NULL,   /* ipf_set_min_frag */
>  NULL,   /* ipf_set_max_nfrags */
> diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
> index 79a2314500cf..57b32ccb610f 100644
> --- a/lib/dpif-provider.h
> +++ b/lib/dpif-provider.h
> @@ -536,6 +536,11 @@ struct dpif_class {
> struct ct_dpif_timeout_policy *tp);
>  int (*ct_timeout_policy_dump_done)(struct dpif *, void *state);
>
> +/* Get timeout policy name (OVS_CT_ATTR_TIMEOUT) from datapath. */
> +int (*ct_format_timeout_policy_name)(struct dpif *, uint32_t tp_id,
> + uint16_t dl_type, uint8_t
> nw_proto,
> + struct ds *ds);
> +
>  /* IP Fragmentation. */
>
>  /* Disables 

Re: [ovs-dev] [PATCH v2 5/9] ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables

2019-08-05 Thread Darrell Ball
Thanks for the patch

comments inline

On Thu, Aug 1, 2019 at 3:10 PM Yi-Hung Wei  wrote:

> This patch consumes the CT_Zone and CT_Timeout_Policy tables, maintains
> the zone-based timeout policy in the vswitchd. Whenever there is a
> database change, vswitchd will read the datapath, CT_Zone, and
> CT_Timeout_Policy tables from ovsdb to detect if there are any timeout
> policy changes.
>
> If a new timeout policy is added, it stores the information in
> per datapath type internal datapath structure in struct dpif-backer,
> and pushes down the conntrack timeout policy into the datapath via dpif
> interface.
>
> If a timeout policy is no longer used, vswitchd may not be able to
> remove it from datapath immediately since the datapath flow may still
> reference that. Instead, we keep a timeout policy kill list;
> vswitchd goes back to the list regularly and tries to kill the
> unused timeout policies.
>
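The kill-list idea described above can be sketched roughly as follows. This is a toy Python model under stated assumptions: `ToyDatapath`, `delete_policy`, and the use of `EBUSY` for "still referenced by a flow" are hypothetical stand-ins, not the dpif interface.

```python
import errno


class ToyDatapath:
    """Stand-in for the datapath; 'in_use' holds policy IDs still referenced by flows."""

    def __init__(self, policies, in_use):
        self.policies = set(policies)
        self.in_use = set(in_use)

    def delete_policy(self, tp_id):
        if tp_id in self.in_use:
            return errno.EBUSY      # a datapath flow still references the policy
        self.policies.discard(tp_id)
        return 0


def sweep_kill_list(dp, kill_list):
    """Try to delete every policy on the kill list; return the ones to retry later."""
    remaining = []
    for tp_id in kill_list:
        if dp.delete_policy(tp_id) != 0:
            remaining.append(tp_id)
    return remaining


dp = ToyDatapath(policies={1, 2, 3}, in_use={2})
first_pass = sweep_kill_list(dp, [1, 2])     # policy 2 is still referenced
dp.in_use.clear()                            # the flow goes away
second_pass = sweep_kill_list(dp, first_pass)
```

The point of the design is that a failed delete is not an error; the policy simply stays on the list until a later sweep succeeds.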
> Signed-off-by: Yi-Hung Wei 
> ---
>  ofproto/ofproto-dpif.c | 266
> +
>  ofproto/ofproto-dpif.h |  10 ++
>  ofproto/ofproto-provider.h |   8 ++
>  ofproto/ofproto.c  |  20 
>  ofproto/ofproto.h  |   4 +
>  vswitchd/bridge.c  |  41 +++
>  6 files changed, 349 insertions(+)
>
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 751535249e21..6336494e0bc8 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -156,6 +156,25 @@ struct ofport_dpif {
>  size_t n_qdscp;
>  };
>
> +struct ct_timeout_policy {
> +struct uuid uuid;
> +unsigned int last_used_seqno;
> +struct ct_dpif_timeout_policy cdtp;
> +struct cmap_node node;  /* Element in struct dpif_backer's
> + * "ct_tps" cmap. */
>


This looks like a layering violation;
it should be in dpif-netlink or netlink-conntrack for the kernel side


> +struct ovs_list list_node;  /* Element in struct dpif_backer's
> + * "ct_tp_kill_list" list. */





> +};
> +
> +struct ct_zone {
> +uint16_t id;
> +unsigned int last_used_seqno;
> +struct uuid tp_uuid;/* uuid that identifies a timeout
> policy in
> + * struct dpif_backer's "ct_tps"
> cmap. */
> +struct cmap_node node;  /* Element in struct dpif_backer's
> + * "ct_zones" cmap. */
> +};
> +
>  static odp_port_t ofp_port_to_odp_port(const struct ofproto_dpif *,
> ofp_port_t);
>
> @@ -196,6 +215,8 @@ static struct hmap all_ofproto_dpifs_by_uuid =
>
>  static bool ofproto_use_tnl_push_pop = true;
>  static void ofproto_unixctl_init(void);
> +static void destroy_ct_zone_timeout_policy(struct dpif_backer *backer);
> +static void init_ct_zone_timeout_policy(struct dpif_backer *backer);
>
>  static inline struct ofproto_dpif *
>  ofproto_dpif_cast(const struct ofproto *ofproto)
> @@ -683,6 +704,7 @@ close_dpif_backer(struct dpif_backer *backer, bool del)
>  }
>  dpif_close(backer->dpif);
>  id_pool_destroy(backer->meter_ids);
> +destroy_ct_zone_timeout_policy(backer);
>  free(backer);
>  }
>
> @@ -694,6 +716,8 @@ struct odp_garbage {
>
>  static void check_support(struct dpif_backer *backer);
>
> +#define MAX_TIMEOUT_POLICY_ID UINT32_MAX
> +
>  static int
>  open_dpif_backer(const char *type, struct dpif_backer **backerp)
>  {
> @@ -811,6 +835,8 @@ open_dpif_backer(const char *type, struct dpif_backer
> **backerp)
>  backer->meter_ids = NULL;
>  }
>
> +init_ct_zone_timeout_policy(backer);
> +
>  /* Make a pristine snapshot of 'support' into 'boottime_support'.
>   * 'boottime_support' can be checked to prevent 'support' to be
> changed
>   * beyond the datapath capabilities. In case 'support' is changed by
> @@ -5086,6 +5112,244 @@ ct_flush(const struct ofproto *ofproto_, const
> uint16_t *zone)
>  ct_dpif_flush(ofproto->backer->dpif, zone, NULL);
>  }
>
> +static struct ct_zone *
> +ct_zone_lookup(struct cmap *ct_zones, uint16_t zone_id)
> +{
> +struct ct_zone *zone;
> +
> +CMAP_FOR_EACH_WITH_HASH (zone, node, hash_int(zone_id, 0), ct_zones) {
> +if (zone->id == zone_id) {
> +return zone;
> +}
> +}
> +return NULL;
> +}
> +
> +static struct ct_zone *
> +ct_zone_alloc(uint16_t zone_id)
> +{
> +struct ct_zone *zone;
> +
> +zone = xzalloc(sizeof *zone);
> +zone->id = zone_id;
> +
> +return zone;
> +}
> +
> +static void
> +ct_zone_remove_and_destroy(struct dpif_backer *backer, struct ct_zone
> *zone)
> +{
> +cmap_remove(&backer->ct_zones, &zone->node, hash_int(zone->id, 0));
> +ovsrcu_postpone(free, zone);
> +}
> +
> +static struct ct_timeout_policy *
> +ct_timeout_policy_lookup(struct cmap *ct_tps, struct uuid *uuid)
> +{
> +struct ct_timeout_policy *tp;
> +
> +CMAP_FOR_EACH_WITH_HASH (tp, node, uuid_hash(uuid), ct_tps) {
> +if 

Re: [ovs-dev] [PATCH v2 4/9] ct-dpif, dpif-netlink: Add conntrack timeout policy support

2019-08-05 Thread Darrell Ball
Thanks for the patch

I am going to avoid commenting on style or code conciseness in the
interests of time

On Thu, Aug 1, 2019 at 3:10 PM Yi-Hung Wei  wrote:

> This patch first defines the dpif interface for a datapath to support
> adding, deleting, getting and dumping conntrack timeout policy.
> The timeout policy is identified by a 4-byte unsigned integer in
> the datapath, and it currently supports timeouts for the TCP, UDP, and ICMP
> protocols.
>

Using a 4-byte integer to identify a timeout policy is needed to address a
Linux restriction.
In general, however, policies could be identified by a generic string or a UUID.
We can add that flexibility later.


>
> Moreover, this patch provides the implementation for Linux kernel
> datapath in dpif-netlink.
>
> In Linux kernel, the timeout policy is maintained per L3/L4 protocol,
> and it is identified by a 32-byte null-terminated string.  On the other
> hand, in vswitchd, the timeout policy is a generic one that consists of
> all the supported L4 protocols.  Therefore, one of the main tasks in
> dpif-netlink is to break down the generic timeout policy into 6
> sub policies (ipv4 tcp, udp, icmp, and ipv6 tcp, udp, icmp),
> and push down the configuration using the netlink API in
> netlink-conntrack.c.
>
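As a rough illustration of the split described above, the sketch below breaks one generic policy into six per-L3/L4 sub-policies. It is written in Python for brevity and is not the dpif-netlink code: the `tp<id>_<l4><4|6>` naming scheme and the prefix-based attribute selection are assumptions made up for this example.

```python
import itertools

L3_PROTOS = ("ipv4", "ipv6")
L4_PROTOS = ("tcp", "udp", "icmp")
NAME_MAX = 32  # kernel names are 32-byte null-terminated strings, per the description


def split_policy(tp_id, timeouts):
    """Split a generic policy into one sub-policy per (L3, L4) pair."""
    subs = {}
    for l3, l4 in itertools.product(L3_PROTOS, L4_PROTOS):
        name = f"tp{tp_id}_{l4}{l3[-1]}"           # hypothetical naming scheme
        assert len(name) < NAME_MAX                # leave room for the terminator
        attrs = {k: v for k, v in timeouts.items() if k.startswith(l4 + "_")}
        subs[(l3, l4)] = {"name": name, "attrs": attrs}
    return subs


subs = split_policy(7, {"udp_single": 3, "icmp_first": 3})
```

Each sub-policy only carries the attributes of its own L4 protocol, so the TCP sub-policies here end up with no overrides at all.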
> This patch also adds missing symbols in the windows datapath so
> that the build on windows can pass.
>
> Appveyor CI:
> * https://ci.appveyor.com/project/YiHungWei/ovs/builds/26387754
>
> Signed-off-by: Yi-Hung Wei 
> ---
>  datapath-windows/include/OvsDpInterfaceCtExt.h | 114 +
>  datapath-windows/ovsext/Netlink/NetlinkProto.h |   1 +
>  include/windows/automake.mk|   1 +
>  .../windows/linux/netfilter/nfnetlink_cttimeout.h  |   0
>  lib/ct-dpif.c  | 104 +
>  lib/ct-dpif.h  |  56 +++
>  lib/dpif-netdev.c  |   6 +
>  lib/dpif-netlink.c | 462
> +
>  lib/dpif-netlink.h |   1 +
>  lib/dpif-provider.h|  38 ++
>  lib/netlink-conntrack.c| 363 
>  lib/netlink-conntrack.h|  29 ++
>  lib/netlink-protocol.h |   1 +
>  13 files changed, 1176 insertions(+)
>  create mode 100644 include/windows/linux/netfilter/nfnetlink_cttimeout.h
>
> diff --git a/datapath-windows/include/OvsDpInterfaceCtExt.h
> b/datapath-windows/include/OvsDpInterfaceCtExt.h
> index 3b947782e90c..4379855bb8dd 100644
> --- a/datapath-windows/include/OvsDpInterfaceCtExt.h
> +++ b/datapath-windows/include/OvsDpInterfaceCtExt.h
> @@ -421,4 +421,118 @@ struct nf_ct_tcp_flags {
>  UINT8 mask;
>  };
>
> +/* File: nfnetlink_cttimeout.h */
> +enum ctnl_timeout_msg_types {
> +IPCTNL_MSG_TIMEOUT_NEW,
> +IPCTNL_MSG_TIMEOUT_GET,
> +IPCTNL_MSG_TIMEOUT_DELETE,
> +IPCTNL_MSG_TIMEOUT_DEFAULT_SET,
> +IPCTNL_MSG_TIMEOUT_DEFAULT_GET,
> +
> +IPCTNL_MSG_TIMEOUT_MAX
> +};
> +
> +enum ctattr_timeout {
> +CTA_TIMEOUT_UNSPEC,
> +CTA_TIMEOUT_NAME,
> +CTA_TIMEOUT_L3PROTO,
> +CTA_TIMEOUT_L4PROTO,
> +CTA_TIMEOUT_DATA,
> +CTA_TIMEOUT_USE,
> +__CTA_TIMEOUT_MAX
> +};
> +#define CTA_TIMEOUT_MAX (__CTA_TIMEOUT_MAX - 1)
> +
> +enum ctattr_timeout_generic {
> +CTA_TIMEOUT_GENERIC_UNSPEC,
> +CTA_TIMEOUT_GENERIC_TIMEOUT,
> +__CTA_TIMEOUT_GENERIC_MAX
> +};
> +#define CTA_TIMEOUT_GENERIC_MAX (__CTA_TIMEOUT_GENERIC_MAX - 1)
> +
> +enum ctattr_timeout_tcp {
> +CTA_TIMEOUT_TCP_UNSPEC,
> +CTA_TIMEOUT_TCP_SYN_SENT,
> +CTA_TIMEOUT_TCP_SYN_RECV,
> +CTA_TIMEOUT_TCP_ESTABLISHED,
> +CTA_TIMEOUT_TCP_FIN_WAIT,
> +CTA_TIMEOUT_TCP_CLOSE_WAIT,
> +CTA_TIMEOUT_TCP_LAST_ACK,
> +CTA_TIMEOUT_TCP_TIME_WAIT,
> +CTA_TIMEOUT_TCP_CLOSE,
> +CTA_TIMEOUT_TCP_SYN_SENT2,
> +CTA_TIMEOUT_TCP_RETRANS,
> +CTA_TIMEOUT_TCP_UNACK,
> +__CTA_TIMEOUT_TCP_MAX
> +};
> +#define CTA_TIMEOUT_TCP_MAX (__CTA_TIMEOUT_TCP_MAX - 1)
> +
> +enum ctattr_timeout_udp {
> +CTA_TIMEOUT_UDP_UNSPEC,
> +CTA_TIMEOUT_UDP_UNREPLIED,
> +CTA_TIMEOUT_UDP_REPLIED,
> +__CTA_TIMEOUT_UDP_MAX
> +};
> +#define CTA_TIMEOUT_UDP_MAX (__CTA_TIMEOUT_UDP_MAX - 1)
> +
> +enum ctattr_timeout_udplite {
> +CTA_TIMEOUT_UDPLITE_UNSPEC,
> +CTA_TIMEOUT_UDPLITE_UNREPLIED,
> +CTA_TIMEOUT_UDPLITE_REPLIED,
> +__CTA_TIMEOUT_UDPLITE_MAX
> +};
> +#define CTA_TIMEOUT_UDPLITE_MAX (__CTA_TIMEOUT_UDPLITE_MAX - 1)
> +
> +enum ctattr_timeout_icmp {
> +CTA_TIMEOUT_ICMP_UNSPEC,
> +CTA_TIMEOUT_ICMP_TIMEOUT,
> +__CTA_TIMEOUT_ICMP_MAX
> +};
> +#define CTA_TIMEOUT_ICMP_MAX (__CTA_TIMEOUT_ICMP_MAX - 1)
> +
> +enum ctattr_timeout_dccp {
> +CTA_TIMEOUT_DCCP_UNSPEC,
> +CTA_TIMEOUT_DCCP_REQUEST,
> +CTA_TIMEOUT_DCCP_RESPOND,
> +CTA_TIMEOUT_DCCP_PARTOPEN,
> +CTA_TIMEOUT_DCCP_OPEN,
> +CTA_TIMEOUT_DCCP_CLOSEREQ,
> +

Re: [ovs-dev] [PATCH v2 1/9] ovs-vswitchd: Add Datapath, CT_Zone, and CT_Zone_Policy tables.

2019-08-05 Thread Darrell Ball
One comment fix:

s/ "min": 0, "max": "65535"}},/ "min": 0, "max": "65536"}},/
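
To spell out the off-by-one: zone IDs run from 0 to 65535 inclusive, so a map with one entry per zone can hold 65536 entries. A trivial Python check, assuming only the 16-bit zone ID from the schema:

```python
ZONE_ID_BITS = 16                      # zone IDs are 16-bit, per minInteger/maxInteger
min_zone = 0
max_zone = (1 << ZONE_ID_BITS) - 1     # largest valid zone ID
distinct_zones = max_zone - min_zone + 1
```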

On Mon, Aug 5, 2019 at 4:09 PM Darrell Ball  wrote:

> Thanks for the patch
>
> I avoided duplicate comments from what Justin suggested
>
> comments inline
>
> On Thu, Aug 1, 2019 at 3:08 PM Yi-Hung Wei  wrote:
>
>> From: Justin Pettit 
>>
>> From: Justin Pettit 
>>
>> Signed-off-by: Justin Pettit 
>> ---
>>  vswitchd/vswitch.ovsschema |  43 +++-
>>  vswitchd/vswitch.xml   | 252
>> -
>>  2 files changed, 246 insertions(+), 49 deletions(-)
>>
>> diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
>> index f7c6eb8983cd..d215f4edfefa 100644
>> --- a/vswitchd/vswitch.ovsschema
>> +++ b/vswitchd/vswitch.ovsschema
>> @@ -1,9 +1,14 @@
>>  {"name": "Open_vSwitch",
>> - "version": "8.0.0",
>> - "cksum": "3962141869 23978",
>> + "version": "8.1.0",
>> + "cksum": "1566974404 25483",
>>   "tables": {
>> "Open_vSwitch": {
>>   "columns": {
>> +   "datapaths": {
>> + "type": {"key": {"type": "string"},
>>
>
> Should 'type' be an enum
> something like:
>
>  "type": {"key": {"type": "string",
>   "enum": ["set", ["system", "netdev"]]}},
>
> The schema can still be upgraded by adding new datapath types should more
> ever arise.
>
>
>
>> +  "value": {"type": "uuid",
>> +"refTable": "Datapath"},
>> +  "min": 0, "max": "unlimited"}},
>>
>
> accordingly:
>
> "min": 0, "max": "2"}},
>
>
>
>> "bridges": {
>>   "type": {"key": {"type": "uuid",
>>"refTable": "Bridge"},
>> @@ -629,6 +634,40 @@
>>"min": 0, "max": "unlimited"},
>>   "ephemeral": true}},
>>   "indexes": [["target"]]},
>> +   "Datapath": {
>> + "columns": {
>> +   "datapath_version": {
>> + "type": "string"},
>> +   "ct_zones": {
>> + "type": {"key": {"type": "integer",
>> +  "minInteger": 0,
>> +  "maxInteger": 65535},
>> +  "value": {"type": "uuid",
>> +"refTable": "CT_Zone"},
>> +  "min": 0, "max": "unlimited"}},
>>
>
>
> How about ?
>
>  "min": 0, "max": "65535"}},
>

s/ "min": 0, "max": "65535"}},/ "min": 0, "max": "65536"}},/


>
> I don't think we can have multiple entries for the same zone and if we
> did, we don't
> handle it.
>
>
>
>> +   "external_ids": {
>> + "type": {"key": "string", "value": "string",
>> +  "min": 0, "max": "unlimited",
>> +   "CT_Zone": {
>> + "columns": {
>> +   "timeout_policy": {
>> + "type": {"key": {"type": "uuid",
>> +  "refTable": "CT_Timeout_Policy"},
>> +  "min": 0, "max": 1}},
>> +   "external_ids": {
>> + "type": {"key": "string", "value": "string",
>> +  "min": 0, "max": "unlimited",
>> +   "CT_Timeout_Policy": {
>> + "columns": {
>> +   "timeouts": {
>> + "type": {"key": "string",
>> +  "value": {"type" : "integer",
>> +"minInteger" : 0,
>> +  

Re: [ovs-dev] [PATCH v2 2/9] ovs-vsctl: Add conntrack zone commands.

2019-08-05 Thread Darrell Ball
Thanks for the patch

I noticed '--may-exist' and '--if-exists' are supported now for
add-zone-tp/del-zone-tp - thanks
The check for duplicate timeout policies now correctly checks all keys and
values - thanks

Some more comments inline
I am trying to avoid duplicating Justin's comments, so I just won't comment
on some parts in this version
to avoid confusion.


On Thu, Aug 1, 2019 at 3:09 PM Yi-Hung Wei  wrote:

> From: William Tu 
>
> The patch adds commands creating/deleting/listing conntrack zone
> timeout policies:
>   $ ovs-vsctl {add,del,list}-zone-tp zone=zone_id ...
>
> Signed-off-by: William Tu 
> ---
>  tests/ovs-vsctl.at   |  34 +++-
>  utilities/ovs-vsctl.8.in |  25 ++
>  utilities/ovs-vsctl.c| 204
> +++
>  3 files changed, 261 insertions(+), 2 deletions(-)
>
> diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
> index 46fa3c5b1a33..f0c5975edd0e 100644
> --- a/tests/ovs-vsctl.at
> +++ b/tests/ovs-vsctl.at
> @@ -805,6 +805,20 @@ AT_CHECK(
>[RUN_OVS_VSCTL([--if-exists remove netflow x targets '"1.2.3.4:567
> "'])])
>  AT_CHECK(
>[RUN_OVS_VSCTL([--if-exists clear netflow x targets])])
> +
> +AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath datapath_version=0 --
> set Open_vSwitch . datapaths:"netdev"=@m])], [0], [stdout])
>

What happens if we add a datapath type here and there are no bridges of
that type; meaning the datapath of that type does not even exist.
Seems like a contradiction.
Maybe we should check for that at least and raise an error.
Ideally, it is better if these 'datapaths' are auto-managed by bridge
creation/deletion with given datapath types,
but we can certainly defer that.
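
A minimal sketch of the kind of check I have in mind, in Python rather than ovs-vsctl C, with a made-up `(bridge name, datapath type)` list standing in for the Bridge table:

```python
def check_datapath_type(dp_type, bridges):
    """Return the bridges using 'dp_type'; raise if no bridge uses it.

    'bridges' is an iterable of (name, datapath_type) pairs, a hypothetical
    stand-in for rows of the Bridge table.
    """
    users = [name for name, bridge_type in bridges if bridge_type == dp_type]
    if not users:
        raise ValueError(f"no bridge uses datapath type {dp_type!r}")
    return users


bridges = [("br0", "system"), ("br1", "system")]
system_users = check_datapath_type("system", bridges)

try:
    check_datapath_type("netdev", bridges)
    netdev_error = None
except ValueError as e:
    netdev_error = str(e)
```

Whether this should be a hard error or just a warning is open; the sketch only shows where the check would sit.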


> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=1 icmp_first=1
> icmp_reply=2])])
>

I mentioned this in V1:
There is no filtering of bad timeout keys; for example

AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=1 icmp_first=1
foo_bar=2])])

is accepted as valid

Even worse, a minor typo will go unnoticed - missing 'y' in 'icmp_reply'.

AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=1 icmp_first=1
icmp_repl=2])])


> +AT_CHECK([RUN_OVS_VSCTL([--may-exist add-zone-tp netdev zone=1
> icmp_first=1 icmp_reply=2])])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
> Policies: icmp_first=1 icmp_reply=2
>

As I mentioned in V1,
we should check all possible timeout keys to make sure they work.


> +])
> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
> icmp_reply=3])])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
> Policies: icmp_first=1 icmp_reply=2
> +Zone:2, Timeout Policies: icmp_first=2 icmp_reply=3
> +])
> +AT_CHECK([RUN_OVS_VSCTL([del-zone-tp netdev zone=1])])
> +AT_CHECK([RUN_OVS_VSCTL([--if-exists del-zone-tp netdev zone=1])])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:2, Timeout
> Policies: icmp_first=2 icmp_reply=3
> +])
>  OVS_VSCTL_CLEANUP
>  AT_CLEANUP
>
> @@ -890,10 +904,10 @@ AT_CHECK([RUN_OVS_VSCTL([set bridge br0
> flood_vlans=-1])],
>  AT_CHECK([RUN_OVS_VSCTL([set bridge br0 flood_vlans=4096])],
>[1], [], [ovs-vsctl: constraint violation: 4096 is not in the valid
> range 0 to 4095 (inclusive)
>  ])
> -AT_CHECK([RUN_OVS_VSCTL([set c br1 'connection-mode=xyz'])],
> +AT_CHECK([RUN_OVS_VSCTL([set controller br1 'connection-mode=xyz'])],
>

As I mentioned in V1,
I don't think we should make unrelated changes in a feature patch,
especially since it seems the author wanted to convey that the
short-form syntax is valid


>[1], [], [[ovs-vsctl: constraint violation: xyz is not one of the
> allowed values ([in-band, out-of-band])
>  ]])
> -AT_CHECK([RUN_OVS_VSCTL([set c br1 connection-mode:x=y])],
> +AT_CHECK([RUN_OVS_VSCTL([set controller br1 connection-mode:x=y])],
>[1], [], [ovs-vsctl: cannot specify key to set for non-map column
> connection_mode
>  ])
>  AT_CHECK([RUN_OVS_VSCTL([add bridge br1 datapath_id x y])],
> @@ -929,6 +943,22 @@ AT_CHECK([RUN_OVS_VSCTL([remove bridge br1
> flood-vlans true])],
>  AT_CHECK([RUN_OVS_VSCTL([clear bridge br1 name])],
>[1], [], [ovs-vsctl: cannot modify read-only column name in table Bridge
>  ])
> +
> +AT_CHECK([RUN_OVS_VSCTL([-- --id=@m create Datapath datapath_version=0 --
> set Open_vSwitch . datapaths:"netdev"=@m])], [0], [stdout])
> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdevxx zone=1 icmp_first=1
> icmp_reply=2])],
> +  [1], [], [ovs-vsctl: datapath: netdevxx record not found
> +])
> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
> icmp_reply=3])])
> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
> icmp_reply=3])],
> +  [1], [], [ovs-vsctl: zone id 2 alread exists
> +])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:2, Timeout
> Policies: icmp_first=2 icmp_reply=3
> +])
> +AT_CHECK([RUN_OVS_VSCTL([del-zone-tp netdev zone=11])],
> +  [1], [], [ovs-vsctl: zone id 11 not exists.
> +])
> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], 

Re: [ovs-dev] [PATCH v2 1/9] ovs-vswitchd: Add Datapath, CT_Zone, and CT_Zone_Policy tables.

2019-08-05 Thread Darrell Ball
Thanks for the patch

I avoided duplicate comments from what Justin suggested

comments inline

On Thu, Aug 1, 2019 at 3:08 PM Yi-Hung Wei  wrote:

> From: Justin Pettit 
>
> From: Justin Pettit 
>
> Signed-off-by: Justin Pettit 
> ---
>  vswitchd/vswitch.ovsschema |  43 +++-
>  vswitchd/vswitch.xml   | 252
> -
>  2 files changed, 246 insertions(+), 49 deletions(-)
>
> diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
> index f7c6eb8983cd..d215f4edfefa 100644
> --- a/vswitchd/vswitch.ovsschema
> +++ b/vswitchd/vswitch.ovsschema
> @@ -1,9 +1,14 @@
>  {"name": "Open_vSwitch",
> - "version": "8.0.0",
> - "cksum": "3962141869 23978",
> + "version": "8.1.0",
> + "cksum": "1566974404 25483",
>   "tables": {
> "Open_vSwitch": {
>   "columns": {
> +   "datapaths": {
> + "type": {"key": {"type": "string"},
>

Should 'type' be an enum?
Something like:

 "type": {"key": {"type": "string",
  "enum": ["set", ["system", "netdev"]]}},

The schema can still be upgraded by adding new datapath types should more
ever arise.
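
The effect of the enum can be sketched as below. This is a Python model of the constraint, not ovsdb-server code; the function and the dictionary standing in for the 'datapaths' map are made up for illustration.

```python
DATAPATH_TYPES = ("system", "netdev")  # the two values proposed for the enum


def validate_datapaths(datapaths):
    """Check a toy 'datapaths' map (type -> Datapath row UUID) against the enum."""
    bad = sorted(set(datapaths) - set(DATAPATH_TYPES))
    if bad:
        raise ValueError(f"invalid datapath type(s): {', '.join(bad)}")
    # Map keys are unique, so the enum also bounds the number of entries.
    assert len(datapaths) <= len(DATAPATH_TYPES)
    return True


valid = validate_datapaths({"system": "uuid-a", "netdev": "uuid-b"})

try:
    validate_datapaths({"netdevxx": "uuid-c"})
    error = None
except ValueError as e:
    error = str(e)
```

Because keys in a map are unique, restricting the key to two values also caps the map at two entries, which is why a matching upper bound makes sense.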



> +  "value": {"type": "uuid",
> +"refTable": "Datapath"},
> +  "min": 0, "max": "unlimited"}},
>

accordingly:

"min": 0, "max": "2"}},



> "bridges": {
>   "type": {"key": {"type": "uuid",
>"refTable": "Bridge"},
> @@ -629,6 +634,40 @@
>"min": 0, "max": "unlimited"},
>   "ephemeral": true}},
>   "indexes": [["target"]]},
> +   "Datapath": {
> + "columns": {
> +   "datapath_version": {
> + "type": "string"},
> +   "ct_zones": {
> + "type": {"key": {"type": "integer",
> +  "minInteger": 0,
> +  "maxInteger": 65535},
> +  "value": {"type": "uuid",
> +"refTable": "CT_Zone"},
> +  "min": 0, "max": "unlimited"}},
>


How about ?

 "min": 0, "max": "65535"}},

I don't think we can have multiple entries for the same zone, and if we did,
we don't handle it.



> +   "external_ids": {
> + "type": {"key": "string", "value": "string",
> +  "min": 0, "max": "unlimited",
> +   "CT_Zone": {
> + "columns": {
> +   "timeout_policy": {
> + "type": {"key": {"type": "uuid",
> +  "refTable": "CT_Timeout_Policy"},
> +  "min": 0, "max": 1}},
> +   "external_ids": {
> + "type": {"key": "string", "value": "string",
> +  "min": 0, "max": "unlimited",
> +   "CT_Timeout_Policy": {
> + "columns": {
> +   "timeouts": {
> + "type": {"key": "string",
> +  "value": {"type" : "integer",
> +"minInteger" : 0,
> +"maxInteger" : 4294967295},
> +  "min": 0, "max": "unlimited"}},
> +   "external_ids": {
> + "type": {"key": "string", "value": "string",
> +  "min": 0, "max": "unlimited",
> "SSL": {
>   "columns": {
> "private_key": {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 027aee2f523b..a0706c9c0fc1 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -52,6 +52,13 @@
>  one record in the  table.
>
>  
> +  
> +Map of datapath types to datapaths.  The
> + column of the 
> +table is used as a key for this map.  The value points to a row in
> +the  table.
> +  
> +
>
>  Set of bridges managed by the daemon.
>
> @@ -1192,53 +1199,11 @@
>
>
>
> -
> -  Reports the version number of the Open vSwitch datapath in use.
> -  This allows management software to detect and report
> discrepancies
> -  between Open vSwitch userspace and datapath versions.  (The  -  column="ovs_version" table="Open_vSwitch"/> column in the  -  table="Open_vSwitch"/> reports the Open vSwitch userspace
> version.)
> -  The version reported depends on the datapath in use:
> -
> -
> -
> -  
> -When the kernel module included in the Open vSwitch source
> tree is
> -used, this column reports the Open vSwitch version from which
> the
> -module was taken.
> -  
> -
> -  
> -When the kernel module that is part of the upstream Linux
> kernel is
> -used, this column reports unknown.
> -  
> -
> -  
> -When the datapath is built into the ovs-vswitchd
> -binary, this column reports built-in.  A
> -built-in datapath is by definition the same version as the
> rest of
> -the Open VSwitch userspace.
> -  
> -
> -  
> -Other datapaths (such as the Hyper-V kernel datapath)
> 

Re: [ovs-dev] [PATCH 00/12] Support zone-based conntrack timeout policy

2019-07-31 Thread Darrell Ball
On Wed, Jul 31, 2019 at 1:25 AM Ilya Maximets 
wrote:

> On 29.07.2019 21:53, Yi-Hung Wei wrote:
> > Hi Ilya,
> >
> > Thanks for your comment.
> >
> > On Mon, Jul 29, 2019 at 2:22 AM Ilya Maximets 
> wrote:
> >>
> >> Hi everyone,
> >>
> >> My 2 cents for the feature design:
> >>
> >> From the user's perspective:
> >>
> >> * 'add-dp'/'del-dp' commands look very strange.
> >>   "I didn't add datapath into ovsdb, why it exists and switches
> packets?"
> >>   "I deleted the datapath from the OVS, why it still exists and
> switches packets?"
> >>
> >>   If you're implementing the configuration like this, 'datapath' should
> >>   own the bridges and interfaces, i.e. datapath should be created
> manually
> >>   on 'add-dp' and automatically on adding the first bridge on that
> datapath.
> >>   All the bridges and interfaces must be deleted/destroyed on 'del-dp'.
> >>
> >>   Or you need to rename your tables and commands to not look like this.
> >>
> >> From the developer's perspective:
> >>
> >> * Right now 'ofproto-dpif' is the only module that manages datapath
> interfaces
> >>   and it knows that (there are specific comments in the code). 'dpif's
> has
> >>   no reference counts and it's slightly unsafe to manage them outside of
> >>   'ofproto-dpif'.
> >>   You're adding a side module that is allowed to open a dpif (and is
> >>   not able to delete it, which is a possible cause of issues) and use it
> >>   without notifying any other modules. This breaks the hierarchical
> >>   structure of OVS.
> >>
> >> * Right now most of the datapath configuration is done via
> 'other_config'
> >>   and corresponding dpif_set_config() callback. Since you're introducing
> >>   datapath-config module, it should take care of all of this stuff. And
> >>   this will require significant rework of the current datapath
> >>   configuration scheme.
> >>
> >> * 'reconfigure_datapath' is an ambiguous name.
> >>
> >>   Solution for above issues might be not introducing the new modules at
> all.
> >>   Everything could be handled like we're handling meters, but with
> OVSDB as the
> >>   configuration source. On configuration change bridge layer will call
> ofproto
> >>   layer that will pass configuration to ofproto-dpif and, finally, dpif
> layer.
> >>   Inside 'struct dpif' in dpif.c module you could track all the
> configuration
> >>   and pass all the required changes to the dpif-provider via callbacks.
> >>   This way everything will work fine without breaking current OVS
> hierarchy.
> >
> > Thanks for your suggestion about the datapath-config part. I think it
> > makes sense to implement what datapath-config does inside dpif.c
> > rather than introduce a new module.  I will make proper change for
> > that in v2.
> >
> >
> >>
> >> * DB schema looks overcomplicated: 3 additional tables that reference
> >>   others just to store a string-to-integer map.
> >>   I think that it might be much easier to create a single 'CT_Zones'
> table
> >>   with all the required columns:
> >>   'id', 'tcp_syn_sent', 'tcp_syn_recv', ..., 'icmp_reply'.
> >>   This is not a big deal since you need to describe every single field
> in
> >>   vswitch.xml anyway. This will also allow you to check support of
> particular
> >>   field on the stage of adding value to the database.
> >>   If you really need to distinguish zones by the datapath type (which
> is not
> >>   obvious), you may add 'datapath_type' column, just like we have in a
> 'Bridge'
> >>   table.
> >
> > As for the database schema, we intend to make CT_Zone table references
> > to CT_Timeout_Policy table because some other zone-based feature can
> > be configured through ovsdb later on. For example, we can have a new
> > column in CT_Zone table that stores 'limit' as an integer to support
> > the zone limit feature (limiting number of connection in a zone).  It
> > is currently configured through dpctl commands.
>
> At least, since each zone could have only one timeout policy it's easy to
> just
> inline CT_Timeout_Policy into CT_Zone like this:
>
> diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
> index 17aed1fc3..8ee860f19 100644
> --- a/vswitchd/vswitch.ovsschema
> +++ b/vswitchd/vswitch.ovsschema
> @@ -649,15 +649,6 @@
>   "type": {"key": "string", "value": "string",
>"min": 0, "max": "unlimited",
> "CT_Zone": {
> - "columns": {
> -   "timeout_policy": {
> - "type": {"key": {"type": "uuid",
> -  "refTable": "CT_Timeout_Policy"},
> -  "min": 0, "max": 1}},
> -   "external_ids": {
> - "type": {"key": "string", "value": "string",
> -  "min": 0, "max": "unlimited",
> -   "CT_Timeout_Policy": {
>   "columns": {
> "timeouts": {
>   "type": {"key": "string",
> @@ -667,8 +658,7 @@
>"min": 0, "max": "unlimited"}},
> "external_ids": {
>   "type": {"key": "string", "value": "string",
> -  

Re: [ovs-dev] [PATCH] OVN: fix DNAT/SNAT system-ovn unit tests

2019-07-30 Thread Darrell Ball
Thanks Lorenzo

Looks good.
I did not retest this patch, but it is the same code you sent earlier,
which I had tested.

On Mon, Jul 29, 2019 at 4:41 AM Lorenzo Bianconi <
lorenzo.bianc...@redhat.com> wrote:

> Fix conntrack checks in the following tests in tests/system-ovn.at:
> - ovn -- DNAT and SNAT on distributed router - N/S
> - ovn -- DNAT and SNAT on distributed router - E/W
>
> Fixes: a6ee09882283 ("OVN: run local logical flows first in
> S_ROUTER_OUT_SNAT table")
> Signed-off-by: Lorenzo Bianconi 
> ---
>  tests/system-ovn.at | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 10fbd2649..f88ad31e4 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -1334,11 +1334,13 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2
> 172.16.1.2 | FORMAT_PING], \
>  ])
>
>  # We verify that SNAT indeed happened via 'dump-conntrack' command.
> -AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.4) | \
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.1) | \
>  sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl
>
> -icmp,orig=(src=192.168.1.3,dst=172.16.1.2,id=,type=8,code=0),reply=(src=172.16.1.2,dst=172.16.1.4,id=,type=0,code=0),zone=
>
> +icmp,orig=(src=192.168.1.3,dst=172.16.1.2,id=,type=8,code=0),reply=(src=172.16.1.2,dst=172.16.1.1,id=,type=0,code=0),zone=
>  ])
>
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
>  # South-North SNAT: 'bar1' pings 'alice1'. But 'alice1' receives traffic
>  # from 172.16.1.1
>  NS_CHECK_EXEC([bar1], [ping -q -c 3 -i 0.3 -w 2 172.16.1.2 |
> FORMAT_PING], \
> @@ -1507,12 +1509,14 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2
> 172.16.1.4 | FORMAT_PING], \
>
>  # Check conntrack entries.  First SNAT of 'foo1' address happens.
>  # Then DNAT of 'bar1' address happens (listed first below).
> -AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.3) | \
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(172.16.1.4) | \
>  sed -e 's/zone=[[0-9]]*/zone=/'], [0], [dnl
>
> -icmp,orig=(src=172.16.1.3,dst=172.16.1.4,id=,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.3,id=,type=0,code=0),zone=
>
> -icmp,orig=(src=192.168.1.2,dst=172.16.1.4,id=,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.3,id=,type=0,code=0),zone=
>
> +icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=,type=0,code=0),zone=
>
> +icmp,orig=(src=192.168.1.2,dst=172.16.1.4,id=,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=,type=0,code=0),zone=
>  ])
>
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
>  # East-West NAT: 'foo2' pings 'bar1' using 172.16.1.4.
>  NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 172.16.1.4 |
> FORMAT_PING], \
>  [0], [dnl
> --
> 2.21.0
>
>


Re: [ovs-dev] [PATCH] test: do not require python2 for CHECK_CONNTRACK macro

2019-07-30 Thread Darrell Ball
On Tue, Jul 30, 2019 at 10:41 AM Mark Michelson  wrote:

> Why do these macros require python at all?
>

The ALG test tools use Python.



>
> On 7/29/19 7:48 AM, Lorenzo Bianconi wrote:
> > Do not strictly require python2 for CHECK_CONNTRACK macro definitions in
> > system-{kmod,userspace}-macros.at
> >
> > Signed-off-by: Lorenzo Bianconi 
> > ---
> >   tests/system-kmod-macros.at  | 2 +-
> >   tests/system-userspace-macros.at | 2 +-
> >   2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
> > index 554a61e9b..48e94642b 100644
> > --- a/tests/system-kmod-macros.at
> > +++ b/tests/system-kmod-macros.at
> > @@ -59,7 +59,7 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
> >   # kernel conntrack tables when the test is finished.
> >   #
> >   m4_define([CHECK_CONNTRACK],
> > -[AT_SKIP_IF([test $HAVE_PYTHON2 = no])
> > +[AT_SKIP_IF([test $HAVE_PYTHON = no])
> >m4_foreach([mod], [[nf_conntrack_ipv4], [nf_conntrack_ipv6],
> [nf_nat_ftp],
> >   [nf_nat_tftp]],
> >   [modprobe mod || echo "Module mod not loaded."
> > diff --git a/tests/system-userspace-macros.at b/tests/
> system-userspace-macros.at
> > index 9d5f3bf41..a411e3d89 100644
> > --- a/tests/system-userspace-macros.at
> > +++ b/tests/system-userspace-macros.at
> > @@ -65,7 +65,7 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
> >   # Perform requirements checks for running conntrack tests.
> >   #
> >   m4_define([CHECK_CONNTRACK],
> > -[AT_SKIP_IF([test $HAVE_PYTHON2 = no])]
> > +[AT_SKIP_IF([test $HAVE_PYTHON = no])]
> >   )
> >
> >   # CHECK_CONNTRACK_ALG()
> >
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


Re: [ovs-dev] [PATCH] test: do not require python2 for CHECK_CONNTRACK macro

2019-07-30 Thread Darrell Ball
Thanks Lorenzo

Looks good



On Mon, Jul 29, 2019 at 4:48 AM Lorenzo Bianconi <
lorenzo.bianc...@redhat.com> wrote:

> Do not strictly require python2 for CHECK_CONNTRACK macro definitions in
> system-{kmod,userspace}-macros.at
>
> Signed-off-by: Lorenzo Bianconi 
> ---
>  tests/system-kmod-macros.at  | 2 +-
>  tests/system-userspace-macros.at | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tests/system-kmod-macros.at b/tests/system-kmod-macros.at
> index 554a61e9b..48e94642b 100644
> --- a/tests/system-kmod-macros.at
> +++ b/tests/system-kmod-macros.at
> @@ -59,7 +59,7 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
>  # kernel conntrack tables when the test is finished.
>  #
>  m4_define([CHECK_CONNTRACK],
> -[AT_SKIP_IF([test $HAVE_PYTHON2 = no])
> +[AT_SKIP_IF([test $HAVE_PYTHON = no])
>   m4_foreach([mod], [[nf_conntrack_ipv4], [nf_conntrack_ipv6],
> [nf_nat_ftp],
>  [nf_nat_tftp]],
>  [modprobe mod || echo "Module mod not loaded."
> diff --git a/tests/system-userspace-macros.at b/tests/
> system-userspace-macros.at
> index 9d5f3bf41..a411e3d89 100644
> --- a/tests/system-userspace-macros.at
> +++ b/tests/system-userspace-macros.at
> @@ -65,7 +65,7 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
>  # Perform requirements checks for running conntrack tests.
>  #
>  m4_define([CHECK_CONNTRACK],
> -[AT_SKIP_IF([test $HAVE_PYTHON2 = no])]
> +[AT_SKIP_IF([test $HAVE_PYTHON = no])]
>  )
>
>  # CHECK_CONNTRACK_ALG()
> --
> 2.21.0
>
>


Re: [ovs-dev] [PATCH 05/12] ct-dpif: Add conntrack timeout policy support in dpif layer

2019-07-29 Thread Darrell Ball
On Mon, Jul 29, 2019 at 3:51 PM Yi-Hung Wei  wrote:

> On Mon, Jul 29, 2019 at 1:12 PM Darrell Ball  wrote:
> >> > "is_default" - can you explain this one ?
> >>
> >> This flag is used to configure the default timeout policy in the
> >> datapath.  If 'is_default' is true, it will set the provided timeout
> >> policy to be the default timeout policy. The default timeout policy is
> >> documented in vswitchd/vswitch.xml
> >>
> >
> > Below is what I see in the schema XML file under timeout policy.
> > Please add a description of the 'default' timeout policy under
> > "CT_Timeout_Policy" - let me know if you need help, as I can submit a
> > patch.
>
> It is a few lines before what you pasted below.
>
> 
>   Connection tracking zone configuration
>
>   
> Connection tracking timeout policy for this zone. If timeout policy is
> not specified, defaults to the timeout policy in the default zone.  If
> the timeout policy in default zone is not specified, defaults to the
> default timeouts in the system.
>   
>
>   
> The overall purpose of these columns is described under Common
> Columns at the beginning of this document.
>
> 
>   
> 
>

Since we decided to remove the default zone timeout policy defaulting,
I think this all becomes moot.



>
>
> -Yi-Hung
> >
> >   
> > Connection tracking timeout policy configuration
> >
> > 
> >   
> >   The timeouts column contains key-value pairs used
> >   to configure connection tracking timeouts in a datapath.
> >   Key-value pairs that are not supported by a datapath are
> >   ignored.
> >   
> >
> >   
> > 
> >   TCP SYN sent timeout.
> > 
> >
> > 
> >   TCP SYN receive timeout.
> > 
> >
> > 
> >   TCP established timeout.
> > 
> >
> > 
> >   TCP FIN wait timeout.
> > 
> >
> > 
> >   TCP close wait timeout.
> > 
> >
> > 
> >   TCP last ACK timeout.
> > 
> >
> > 
> >   TCP time wait timeout.
> > 
> >
> > 
> >   TCP close timeout.
> > 
> >
> > 
> >   TCP syn sent2 timeout.
> > 
> >
> > 
> >   TCP retransmit timeout.
> > 
> >
> > 
> >   TCP unacknowledgment timeout.
> > 
> >   
> >
> >   
> > 
> >   First UDP packet timeout.
> > 
> >
> > 
> >   The timeout for the state in which the source host has sent more than
> >   one packet but the destination host has never sent one back.
> > 
> >
> > 
> >   UDP packets seen in both directions timeout.
> > 
> >   
> >
> >   
> > 
> >   First ICMP timeout.
> > 
> >
> > 
> >   ICMP reply timeout.
> > 
> >   
> > 
>


Re: [ovs-dev] [PATCH 03/12] ovs-vsctl: Add datapath and CT zone commands.

2019-07-29 Thread Darrell Ball
added one more comment.

On Fri, Jul 26, 2019 at 4:10 PM Darrell Ball  wrote:

> added one more comment for now
>
>
> On Fri, Jul 26, 2019 at 11:13 AM Darrell Ball  wrote:
>
>> Thanks for the patch
>>
>> Not a full review; just some initial testing
>>
>>
>> 1/ AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
>> icmp_reply_blah=3])])
>>
>> The above syntax is NOT flagged as an error
>>
>>
>> 2/ AT_CHECK([RUN_OVS_VSCTL([--may-exist add-zone-tp netdev zone=2
>> icmp_first=2 icmp_reply=3])])
>>
>> The above "--may-exist" option fails with
>> +ovs-vsctl: 'add-zone-tp' command has no '--may-exist' option
>>
>> AT_CHECK([RUN_OVS_VSCTL([--if-exists del-zone-tp netdev zone=1])])
>> is also failing
>> +ovs-vsctl: 'del-zone-tp' command has no '--if-exists' option
>>
>> Please support both "--may-exist" and "--if-exists"
>>
>>
>> 3/ The below should fail, but it is accepted.
>>
>> AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
>> icmp_reply=3])])
>> AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
>> icmp_reply=3])])
>>
>>
>> 4/ The below fails (which is good), but the error is in idl, rather than
>> the 'del-zone-tp' command
>>
>> AT_CHECK([RUN_OVS_VSCTL([del-zone-tp netdev zone=1])])
>> AT_CHECK([RUN_OVS_VSCTL([del-zone-tp netdev zone=1])])
>> fails with
>> +2019-07-26T17:56:10Z|2|ovsdb_idl|WARN|Trying to delete a key that
>> doesn't exist in the map.
>>
>>
>>
>> 5/ Please support --may-exist for add-dp
>>
>> 6/ Please support --if-exists for del-dp
>>
>>
>> 7/ Few comments below
>>
>>
>> Thanks Darrell
>>
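To make items 1/ through 4/ concrete, here is a minimal C sketch of the
behavior being asked for. It is not the ovs-vsctl implementation. The key list
holds only the names that appear in this thread (the full set comes from the
patch series), and tp_key_is_valid(), zone_add(), zone_del() and
struct zone_table_sketch are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Partial key list: only the names mentioned in this thread.  The real
 * set is defined by the patch series and is not reproduced here. */
static const char *const known_tp_keys[] = {
    "tcp_syn_sent", "tcp_syn_recv", "icmp_first", "icmp_reply",
};

/* Returns true if 'arg' has the form KEY=VALUE with a known KEY and a
 * non-empty VALUE.  An unknown key such as "icmp_reply_blah" is rejected. */
bool
tp_key_is_valid(const char *arg)
{
    const char *eq = strchr(arg, '=');
    if (!eq || eq == arg || eq[1] == '\0') {
        return false;
    }

    size_t key_len = (size_t) (eq - arg);
    size_t n_keys = sizeof known_tp_keys / sizeof known_tp_keys[0];
    for (size_t i = 0; i < n_keys; i++) {
        if (strlen(known_tp_keys[i]) == key_len
            && !strncmp(known_tp_keys[i], arg, key_len)) {
            return true;
        }
    }
    return false;
}

#define SKETCH_MAX_ZONES 8

/* Hypothetical in-memory stand-in for the per-datapath zone map. */
struct zone_table_sketch {
    bool used[SKETCH_MAX_ZONES];
};

enum cmd_result { CMD_OK, CMD_ERROR };

/* Adds 'zone'.  Adding a zone that already exists is an error unless
 * 'may_exist' (the "--may-exist" option) is set. */
enum cmd_result
zone_add(struct zone_table_sketch *t, unsigned int zone, bool may_exist)
{
    assert(zone < SKETCH_MAX_ZONES);
    if (t->used[zone]) {
        return may_exist ? CMD_OK : CMD_ERROR;
    }
    t->used[zone] = true;
    return CMD_OK;
}

/* Deletes 'zone'.  Deleting a missing zone is an error unless
 * 'if_exists' (the "--if-exists" option) is set. */
enum cmd_result
zone_del(struct zone_table_sketch *t, unsigned int zone, bool if_exists)
{
    assert(zone < SKETCH_MAX_ZONES);
    if (!t->used[zone]) {
        return if_exists ? CMD_OK : CMD_ERROR;
    }
    t->used[zone] = false;
    return CMD_OK;
}
```

With this shape, the error for a repeated delete would come from the command
itself rather than from a warning in the IDL, which is the point of item 4/.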
>>
>> On Thu, Jul 25, 2019 at 4:26 PM Yi-Hung Wei  wrote:
>>
>>> From: William Tu 
>>>
>>> The patch adds the following commands
>>>   $ ovs-vsctl {add,del,list}-dp
>>> for creating/deleting/listing the datapath, and
>>>   $ ovs-vsctl {add,del,list}-zone-tp
>>> for conntrack zones and timeout policies.
>>>
>>> Signed-off-by: William Tu 
>>> ---
>>>  tests/ovs-vsctl.at   |  20 +++-
>>>  utilities/ovs-vsctl.8.in |  29 ++
>>>  utilities/ovs-vsctl.c| 245
>>> +++
>>>  3 files changed, 292 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
>>> index 77604c58a2bc..8854138ecb1e 100644
>>> --- a/tests/ovs-vsctl.at
>>> +++ b/tests/ovs-vsctl.at
>>> @@ -805,6 +805,22 @@ AT_CHECK(
>>>[RUN_OVS_VSCTL([--if-exists remove netflow x targets '"1.2.3.4:567
>>> "'])])
>>>  AT_CHECK(
>>>[RUN_OVS_VSCTL([--if-exists clear netflow x targets])])
>>> +
>>> +AT_CHECK([RUN_OVS_VSCTL([add-dp netdev])])
>>> +AT_CHECK([RUN_OVS_VSCTL([add-dp system])])
>>> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=1 icmp_first=1
>>> icmp_reply=2])])
>>> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
>>> Policies: icmp_first=1 icmp_reply=2
>>> +])
>>> +AT_CHECK([RUN_OVS_VSCTL([add-zone-tp netdev zone=2 icmp_first=2
>>> icmp_reply=3])])
>>>
>>
>> Add all possible keys as part of positive tests so we know they work
>>
>>
>>
>>
>>> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:1, Timeout
>>> Policies: icmp_first=1 icmp_reply=2
>>> +Zone:2, Timeout Policies: icmp_first=2 icmp_reply=3
>>> +])
>>> +AT_CHECK([RUN_OVS_VSCTL([del-zone-tp netdev zone=1])])
>>> +AT_CHECK([RUN_OVS_VSCTL([list-zone-tp netdev])], [0], [Zone:2, Timeout
>>> Policies: icmp_first=2 icmp_reply=3
>>> +])
>>> +AT_CHECK([RUN_OVS_VSCTL([del-dp netdev])])
>>> +AT_CHECK([RUN_OVS_VSCTL([list-dp | sed 's/ uuid.*$//'])], [0], [system
>>> +])
>>>  OVS_VSCTL_CLEANUP
>>>  AT_CLEANUP
>>>
>>> @@ -890,10 +906,10 @@ AT_CHECK([RUN_OVS_VSCTL([set bridge br0
>>> flood_vlans=-1])],
>>>  AT_CHECK([RUN_OVS_VSCTL([set bridge br0 flood_vlans=4096])],
>>>[1], [], [ovs-vsctl: constraint violation: 4096 is not in the valid
>>> range 0 to 4095 (inclusive)
>>>  ])
>>> -AT_CHECK([RUN_OVS_VSCTL([set c br1 'connection-mode=xyz'])],
>>> +AT_CHECK([RUN_OVS_VSCTL([set controller br1 'connection-mode=xyz'])],
>>>
>>
>> unrelated change
>>
>>
>>
>>>  

Re: [ovs-dev] [PATCH 00/12] Support zone-based conntrack timeout policy

2019-07-29 Thread Darrell Ball
On Mon, Jul 29, 2019 at 11:53 AM Yi-Hung Wei  wrote:

> Hi Ilya,
>
> Thanks for your comment.
>
> On Mon, Jul 29, 2019 at 2:22 AM Ilya Maximets 
> wrote:
> >
> > Hi everyone,
> >
> > My 2 cents for the feature design:
> >
> > From the user's perspective:
> >
> > * 'add-dp'/'del-dp' commands look very strange.
> >   "I didn't add datapath into ovsdb, why it exists and switches packets?"
> >   "I deleted the datapath from the OVS, why it still exists and switches
> packets?"
> >
> >   If you're implementing the configuration like this, 'datapath' should
> >   own the bridges and interfaces, i.e. datapath should be created
> manually
> >   on 'add-dp' and automatically on adding the first bridge on that
> datapath.
> >   All the bridges and interfaces must be deleted/destroyed on 'del-dp'.

>
> >   Or you need to rename your tables and commands to not look like this.
>

I agree that a "datapath" should be auto-created when a bridge of that type
is created.
This came up in internal e-mails and I lost track of it.

It is a bit strange to call this column 'datapath'; 'datapath-config' might
be a bit better and reflects what it really is. I think renaming would clear
things up considerably and avoid confusion.



> >
> > From the developer's perspective:
> >
> > * Right now 'ofproto-dpif' is the only module that manages datapath
> interfaces
> >   and it knows that (there are specific comments in the code). 'dpif's
> has
> >   no reference counts and it's slightly unsafe to manage them outside of
> >   'ofproto-dpif'.
> >   You're adding a side module that is allowed to open a dpif (and is
> >   not able to delete it, which is a possible cause of issues) and use it
> >   without notifying any other modules. This breaks the hierarchical
> >   structure of OVS.
> >
> > * Right now most of the datapath configuration is done via 'other_config'
> >   and corresponding dpif_set_config() callback. Since you're introducing
> >   datapath-config module, it should take care of all of this stuff. And
> >   this will require significant rework of the current datapath
> >   configuration scheme.
> >
> > * 'reconfigure_datapath' is an ambiguous name.
> >
> >   Solution for above issues might be not introducing the new modules at
> all.
> >   Everything could be handled like we're handling meters, but with OVSDB
> as the
> >   configuration source. On configuration change bridge layer will call
> ofproto
> >   layer that will pass configuration to ofproto-dpif and, finally, dpif
> layer.
> >   Inside 'struct dpif' in dpif.c module you could track all the
> configuration
> >   and pass all the required changes to the dpif-provider via callbacks.
> >   This way everything will work fine without breaking current OVS
> hierarchy.
>
> Thanks for your suggestion about the datapath-config part. I think it
> makes sense to implement what datapath-config does inside dpif.c
> rather than introduce a new module.  I will make proper change for
> that in v2.
>
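For what it is worth, the "track the configuration in dpif.c and pass only the
required changes to the provider via callbacks" idea could be sketched roughly
as below. This is an illustration under assumptions, not OVS code:
struct provider_sketch stands in for the provider callback table,
struct dpif_sketch stands in for 'struct dpif', and every name in the block is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ZONES 4

/* Hypothetical provider callback table (stand-in for 'struct dpif_class'). */
struct provider_sketch {
    int (*set_zone_timeout)(void *aux, uint16_t zone, uint32_t timeout);
};

/* Hypothetical stand-in for 'struct dpif'.  It remembers what was last
 * pushed to the provider so that unchanged values are not pushed again. */
struct dpif_sketch {
    const struct provider_sketch *provider;
    void *aux;                              /* Opaque provider state. */
    bool configured[SKETCH_MAX_ZONES];
    uint32_t timeout[SKETCH_MAX_ZONES];
};

/* Called by the upper layers on every configuration pass.  Forwards the
 * value to the provider only when it differs from the tracked one.
 * Returns 0 on success, otherwise the provider's error code. */
int
dpif_sketch_set_zone_timeout(struct dpif_sketch *dpif, uint16_t zone,
                             uint32_t timeout)
{
    assert(zone < SKETCH_MAX_ZONES);
    if (dpif->configured[zone] && dpif->timeout[zone] == timeout) {
        return 0;   /* Nothing changed; skip the provider call. */
    }

    int error = dpif->provider->set_zone_timeout(dpif->aux, zone, timeout);
    if (!error) {
        dpif->configured[zone] = true;
        dpif->timeout[zone] = timeout;
    }
    return error;
}

/* Example provider callback that only counts how often it is invoked. */
int
counting_set_zone_timeout(void *aux, uint16_t zone, uint32_t timeout)
{
    (void) zone;
    (void) timeout;
    ++*(int *) aux;
    return 0;
}
```

The tracked state lives in one place, so a repeated reconfiguration with the
same values never reaches the provider.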
>
> >
> > * DB scheme looks just overcomplicated. 3 additional tables which
> references
> >   others just to store a string to integer map.
> >   I think that it might be much easier to create a single 'CT_Zones'
> table
> >   with all the required columns:
> >   'id', 'tcp_syn_sent', 'tcp_syn_recv', ..., 'icmp_reply'.
> >   This is not a big deal since you need to describe every single field in
> >   vswitch.xml anyway. This will also allow you to check support of
> particular
> >   field on the stage of adding value to the database.
> >   If you really need to distinguish zones by the datapath type (which is
> not
> >   obvious), you may add 'datapath_type' column, just like we have in a
> 'Bridge'
> >   table.
>
> As for the database schema, we intend to make CT_Zone table references
> to CT_Timeout_Policy table because some other zone-based feature can
> be configured through ovsdb later on. For example, we can have a new
> column in CT_Zone table that stores 'limit' as an integer to support
> the zone limit feature (limiting number of connection in a zone).  It
> is currently configured through dpctl commands.
>
> I understand your concern about the complication introduced by the
> datapath table.  Let me think about it more carefully and get back to
> you later.
>
> Thanks,
>
> -Yi-Hung
>


Re: [ovs-dev] [PATCH 05/12] ct-dpif: Add conntrack timeout policy support in dpif layer

2019-07-29 Thread Darrell Ball
On Mon, Jul 29, 2019 at 12:37 PM Yi-Hung Wei  wrote:

> On Fri, Jul 26, 2019 at 11:41 AM Darrell Ball  wrote:
> >
> > Thanks for the patch
> >
> > I found this patch hard to review since it does not contain
> implementations
> > The same comment applies to Patch 6
> > I think Patches 5-7 can be combined into one patch, which will make
> review easier.
>
> Thanks Darrell for the review. Sure, I can squash patches 5-7 together in
> v2.
>
>
> >> +struct ct_dpif_timeout_policy {
> >> +    uint32_t id;        /* id that uniquely identify a timeout policy. */
> >> +    uint32_t present;   /* If a timeout attribute is present set the
> >> +                         * corresponding bit. */
> >>
> >> +    uint32_t attrs[CT_DPIF_TP_ATTR_MAX];  /* An array that specifies
> >> +                                           * timeout attribute values */
> >
> >
> > I think you can make attrs of type 'int32_t' and use '-1' timeout for
> 'not present' and then
> > remove the 'present' field
>
> The timeout value is uint32_t in the kernel, so I will keep it as
> uint32_t.  I find the present flag to be handy when doing conversion
> from ct-dpif layer to dpif-netlink layer, and as you mentioned in the
> following e-mail, I will keep it as is.
>
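As a rough illustration of the trade-off discussed here, the sketch below keeps
the values as uint32_t and uses one 'present' bit per attribute instead of a
-1 sentinel. It is a simplified stand-in, not the patch: the enum values and
the tp_sketch_* names are hypothetical, and the sketch assumes there are at
most 32 attributes so that one uint32_t bitmap is enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute indexes; the real enum is in the patch and is
 * not shown in this thread.  Must stay at or below 32 entries. */
enum tp_attr_sketch {
    TP_ATTR_ICMP_FIRST,
    TP_ATTR_ICMP_REPLY,
    TP_ATTR_MAX
};

struct tp_sketch {
    uint32_t id;                    /* Identifies the timeout policy. */
    uint32_t present;               /* Bit N set means attrs[N] is valid. */
    uint32_t attrs[TP_ATTR_MAX];    /* Full uint32_t range stays usable. */
};

/* Stores 'value' for 'attr' and marks the attribute as present. */
void
tp_sketch_set(struct tp_sketch *tp, enum tp_attr_sketch attr, uint32_t value)
{
    assert(attr < TP_ATTR_MAX);
    tp->attrs[attr] = value;
    tp->present |= UINT32_C(1) << attr;
}

/* Returns true and stores the timeout in '*value' if 'attr' is present.
 * Returns false and leaves '*value' untouched otherwise. */
bool
tp_sketch_get(const struct tp_sketch *tp, enum tp_attr_sketch attr,
              uint32_t *value)
{
    assert(attr < TP_ATTR_MAX);
    if (!(tp->present & (UINT32_C(1) << attr))) {
        return false;
    }
    *value = tp->attrs[attr];
    return true;
}
```

With an int32_t array and -1 as "not present", timeouts above INT32_MAX could
not be represented, which is one reason to keep the separate bitmap when the
kernel side uses uint32_t.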
>
> >> --- a/lib/dpif-netlink.c
> >> +++ b/lib/dpif-netlink.c
> >> @@ -3434,6 +3434,12 @@ const struct dpif_class dpif_netlink_class = {
> >>  dpif_netlink_ct_set_limits,
> >>  dpif_netlink_ct_get_limits,
> >>  dpif_netlink_ct_del_limits,
> >> +NULL,   /* ct_set_timeout_policy */
> >> +NULL,   /* ct_get_timeout_policy */
> >> +NULL,   /* ct_del_timeout_policy */
> >> +NULL,   /* ct_timeout_policy_dump_start */
> >> +NULL,   /* ct_timeout_policy_dump_next */
> >> +NULL,   /* ct_timeout_policy_dump_done */
> >
> >
> > I found this patch hard to review since it does not contain
> implementations
> > The same comment applies to Patch 6
> > I think Patches 5-7 can be combined into one patch, which will make
> review easier.
> >
> >
> >>
> >>  NULL,   /* ipf_set_enabled */
> >>  NULL,   /* ipf_set_min_frag */
> >>  NULL,   /* ipf_set_max_nfrags */
> >> diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
> >> index 12898b9e3c6d..3460ef8aa98d 100644
> >> --- a/lib/dpif-provider.h
> >> +++ b/lib/dpif-provider.h
> >> @@ -80,6 +80,7 @@ dpif_flow_dump_thread_init(struct
> dpif_flow_dump_thread *thread,
> >>  struct ct_dpif_dump_state;
> >>  struct ct_dpif_entry;
> >>  struct ct_dpif_tuple;
> >> +struct ct_dpif_timeout_policy;
> >>
> >>  /* 'dpif_ipf_proto_status' and 'dpif_ipf_status' are presently in
> >>   * sync with 'ipf_proto_status' and 'ipf_status', but more
> >> @@ -498,6 +499,48 @@ struct dpif_class {
> >>   * list of 'struct ct_dpif_zone_limit' entries. */
> >>  int (*ct_del_limits)(struct dpif *, const struct ovs_list
> *zone_limits);
> >>
> >> +/* Connection tracking timeout policy */
> >> +
> >> +/* A connection tracking timeout policy contains a list of timeout
> >> + * attributes that specifies timeout values on various connection
> states.
> >> + * In a datapath, the timeout policy is identified by a 4 bytes
> unsigned
> >> + * integer, and the unsupported timeout attributes are ignored.
> >> + * When a connection is committed it can be associated with a
> timeout
> >> + * policy, or it defaults to the default timeout policy. */
> >> +
> >> +/* Add timeout policy '*tp' into the datapath.  If 'is_default' is
> true
> >
> >
> > "is_default" - can you explain this one ?
>
> This flag is used to configure the default timeout policy in the
> datapath.  If 'is_default' is true, it will set the provided timeout
> policy to be the default timeout policy. The default timeout policy is
> documented in vswitchd/vswitch.xml
>
>
Below is what I see in the schema XML file under timeout policy.
Please add a description of the 'default' timeout policy under
"CT_Timeout_Policy" - let me know if you need help, as I can submit a patch.

  
Connection tracking timeout policy configuration


   
