Re: [ovs-dev] dpif-netdev:fix reload pmd's dead lock

2019-02-06 Thread Ilya Maximets
On 31.01.2019 11:48, Lilijun wrote:
> This patch fix the dead lock when using dpdk userspace datapath. The
> problem is described as follows:
> 1) when add or delete port, the main thread will call
> reconfigure_datapath() in the function dpif_netdev_run()
> 2) Here the dp->port_mutex is locked. In dp_netdev_reload_pmd__(), it
> will notify each pmd to reload.
> 3) If pmd is doing packet upcall in fast_path_processing() and try to get
> dp->port_mutex in
> do_xlate_actions()-->tnl_route_lookup_flow()-->dpif_netdev_port_query_by_name().
> Here pmd get lock failed because the main thread has got the lock in
> step 2.
> 4) So the main thread was stuck for waiting pmd to reload done. Now we
> got a dead lock.
> 
> Here reload_affected_pmds() may not need to lock dp->port_mutex. So we
> release the lock temporarily when calling  reload_affected_pmds().
> 
> Signed-off-by: Lilijun 

Replying just to keep answers on a list/patchwork consistent.
The deadlock caused by some local changes done by the user.
Not possible in upstream master. See discussion for the previous
version of this patch that is missing in patchwork for some reason:

   https://mail.openvswitch.org/pipermail/ovs-dev/2019-January/355756.html

Best regards, Ilya Maximets.

> ---
>  lib/dpif-netdev.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 0f57e3f8a..fc3bdae66 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4813,8 +4813,11 @@ reconfigure_datapath(struct dp_netdev *dp)
>  
>  /* Reload affected pmd threads.  We must wait for the pmd threads before
>   * reconfiguring the ports, because a port cannot be reconfigured while
> - * it's being used. */
> + * it's being used. We need release dp->port_mutex to make sure that pmds
> + * don't wait for getting the mutex when handling packet upcalls */
> +ovs_mutex_unlock(>port_mutex);
>  reload_affected_pmds(dp);
> +ovs_mutex_lock(>port_mutex);
>  
>  /* Step 3: Reconfigure ports. */
>  
> @@ -4877,7 +4880,9 @@ reconfigure_datapath(struct dp_netdev *dp)
>  /* Reload affected pmd threads.  We must wait for the pmd threads to 
> remove
>   * the old queues before readding them, otherwise a queue can be polled 
> by
>   * two threads at the same time. */
> +ovs_mutex_unlock(>port_mutex);
>  reload_affected_pmds(dp);
> +ovs_mutex_lock(>port_mutex);
>  
>  /* Step 6: Add queues from scheduling, if they're not there already. */
>  HMAP_FOR_EACH (port, node, >ports) {
> @@ -4909,7 +4914,9 @@ reconfigure_datapath(struct dp_netdev *dp)
>  }
>  
>  /* Reload affected pmd threads. */
> +ovs_mutex_unlock(>port_mutex);
>  reload_affected_pmds(dp);
> +ovs_mutex_lock(>port_mutex);
>  
>  /* Check if PMD Auto LB is to be enabled */
>  set_pmd_auto_lb(dp);
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V3 2/3] acinclude: Include libverbs and libmlx4 when needed

2019-02-06 Thread Eli Britstein
DPDK 18.11 uses libverbs and libmlx4 when MLX4 PMD is enabled.

This commit makes OVS to link to libverbs and libmlx4 when MLX4 PMD is
enabled on DPDK.

Signed-off-by: Eli Britstein 
Reviewed-by: Asaf Penso 
---
 acinclude.m4 | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/acinclude.m4 b/acinclude.m4
index e47ae8b91..ec0c235cc 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -312,12 +312,27 @@ AC_DEFUN([OVS_CHECK_DPDK], [
   [AC_SEARCH_LIBS([mlx5dv_create_wq],[mlx5],[],[AC_MSG_ERROR([unable to 
find libmlx5, install the dependency package])])
DPDK_EXTRA_LIB="-lmlx5"])
 
+AC_COMPILE_IFELSE([
+  AC_LANG_PROGRAM(
+[
+  #include 
+#if defined(RTE_LIBRTE_MLX4_PMD) && !defined(RTE_LIBRTE_MLX4_DLOPEN_DEPS)
+#error
+#endif
+], [])
+  ], [],
+  [AC_SEARCH_LIBS([mlx4dv_init_obj],[mlx4],[],[AC_MSG_ERROR([unable to 
find libmlx4, install the dependency package])])
+   DPDK_EXTRA_LIB="-lmlx4"])
+
 AC_COMPILE_IFELSE([
   AC_LANG_PROGRAM(
 [
   #include 
 #if defined(RTE_LIBRTE_MLX5_PMD) && !defined(RTE_LIBRTE_MLX5_DLOPEN_DEPS)
 #error
+#endif
+#if defined(RTE_LIBRTE_MLX4_PMD) && !defined(RTE_LIBRTE_MLX4_DLOPEN_DEPS)
+#error
 #endif
 ], [])
   ], [],
-- 
2.14.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V3 0/3] Include libverbs and libmlx4/5 when needed

2019-02-06 Thread Eli Britstein
This patch set automatically links with the necessary libraries for
MLX4/5 DPDK PMDs.

Patch 1 automatically links with the necessary libraries for MLX5 PMD

Patch 2 automatically links with the necessary libraries for MLX4 PMD

Patch 3 removes unnecessary define previously done for automatically link libmnl

Eli Britstein (3):
  acinclude: Include libverbs and libmlx5 when needed
  acinclude: Include libverbs and libmlx4 when needed
  acinclude: Omit unnecessary define

 acinclude.m4 | 42 --
 1 file changed, 40 insertions(+), 2 deletions(-)

-- 
2.14.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V3 3/3] acinclude: Omit unnecessary define

2019-02-06 Thread Eli Britstein
Commit fc3b425fa02f ("acinclude: Include libmnl when needed") added
unnecessary include of DPDK_MNL. Omit it.

Fixes: fc3b425fa02f ("acinclude: Include libmnl when needed")
Signed-off-by: Eli Britstein 
---
 acinclude.m4 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index ec0c235cc..44e9770d2 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -297,8 +297,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
 ], [])
   ], [],
   [AC_SEARCH_LIBS([mnl_attr_put],[mnl],[],[AC_MSG_ERROR([unable to find 
libmnl, install the dependency package])])
-   DPDK_EXTRA_LIB="-lmnl"
-   AC_DEFINE([DPDK_MNL], [1], [MLX5 PMD detected in DPDK.])])
+   DPDK_EXTRA_LIB="-lmnl"])
 
 AC_COMPILE_IFELSE([
   AC_LANG_PROGRAM(
-- 
2.14.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V3 1/3] acinclude: Include libverbs and libmlx5 when needed

2019-02-06 Thread Eli Britstein
DPDK 18.11 uses libverbs and libmlx5 when MLX5 PMD is enabled.

This commit makes OVS to link to libverbs and libmlx5 when MLX5 PMD is
enabled on DPDK.

Signed-off-by: Eli Britstein 
Reviewed-by: Shahaf Shuler 
Reviewed-by: Asaf Penso 
---
 acinclude.m4 | 24 
 1 file changed, 24 insertions(+)

diff --git a/acinclude.m4 b/acinclude.m4
index f038fd457..e47ae8b91 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -300,6 +300,30 @@ AC_DEFUN([OVS_CHECK_DPDK], [
DPDK_EXTRA_LIB="-lmnl"
AC_DEFINE([DPDK_MNL], [1], [MLX5 PMD detected in DPDK.])])
 
+AC_COMPILE_IFELSE([
+  AC_LANG_PROGRAM(
+[
+  #include 
+#if defined(RTE_LIBRTE_MLX5_PMD) && !defined(RTE_LIBRTE_MLX5_DLOPEN_DEPS)
+#error
+#endif
+], [])
+  ], [],
+  [AC_SEARCH_LIBS([mlx5dv_create_wq],[mlx5],[],[AC_MSG_ERROR([unable to 
find libmlx5, install the dependency package])])
+   DPDK_EXTRA_LIB="-lmlx5"])
+
+AC_COMPILE_IFELSE([
+  AC_LANG_PROGRAM(
+[
+  #include 
+#if defined(RTE_LIBRTE_MLX5_PMD) && !defined(RTE_LIBRTE_MLX5_DLOPEN_DEPS)
+#error
+#endif
+], [])
+  ], [],
+  [AC_SEARCH_LIBS([verbs_init_cq],[ibverbs],[],[AC_MSG_ERROR([unable to 
find libibverbs, install the dependency package])])
+   DPDK_EXTRA_LIB="-libverbs"])
+
 # On some systems we have to add -ldl to link with dpdk
 #
 # This code, at first, tries to link without -ldl (""),
-- 
2.14.5

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next V2 1/1] openvswitch: Declare ovs key structures using macros

2019-02-06 Thread Eli Britstein

On 2/5/2019 10:23 PM, Ben Pfaff wrote:
> On Tue, Feb 05, 2019 at 10:22:09AM -0800, Gregory Rose wrote:
>> On 2/5/2019 4:02 AM, Eli Britstein wrote:
>>> On 2/4/2019 10:07 PM, David Miller wrote:
 From: Yi-Hung Wei 
 Date: Mon, 4 Feb 2019 10:47:18 -0800

> For example, to see how 'struct ovs_key_ipv6' is defined, now we need
> to trace how OVS_KEY_IPV6_FIELDS is defined, and how OVS_KEY_FIELD_ARR
> and OVS_KEY_FIELD defined.  I think it makes the header file to be
> more complicated.
 I completely agree.

 Unless this is totally unavoidable, I do not want to apply a patch
 which makes reading and auditing the networking code more difficult.
>>> This technique is discussed for example in
>>> https://stackoverflow.com/questions/6635851/real-world-use-of-x-macros,
>>> and I found existing examples of using it in the kernel tree:
>>>
>>> __BPF_FUNC_MAPPER in commit ebb676daa1a34 ("bpf: Print function name in
>>> addition to function id")
>>>
>>> __AAL_STAT_ITEMS and __SONET_ITEMS in commit 607ca46e97a1b ("UAPI:
>>> (Scripted) Disintegrate include/linux"), the successor of commit
>>> 1da177e4c3f4 ("Linux-2.6.12-rc2"). I didn't dig deeper to the past.
>>>
>>> I can agree it makes that H file a bit more complicated, but for sure
>>> less than ## macros that are widely used.
>>>
>>> However, I think the alternatives of generating such defines by some
>>> scripts, or having the fields in more than one place are even worse, so
>>> it is a kind of unavoidable.
>> Why is using a script to generate defines for the requirements of your
>> application or driver so bad?  Your patch
>> turns openvswitch.h into some hardly readable code - using a script and
>> having a machine output the macros
>> your application or driver needs seems like a much better alternative to me.
OK, let's abandon this patch. I'll go with the script alternative.
> In addition, one of the reasons that developers prefer to avoid
> duplication is because of the need to synchronize copies when one of
> them changes.  This doesn't apply in the same way to these structures,
> because they are ABIs that will not change.
This is correct, but still, though it is not likely to change, it might 
will, so I think we must avoid duplication.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v5 2/2] conntrack: Fix max size for inet_ntop() call.

2019-02-06 Thread Darrell Ball
On Wed, Feb 6, 2019 at 1:22 AM David Marchand 
wrote:

> On Mon, Feb 4, 2019 at 6:56 PM Ben Pfaff  wrote:
>
>> On Thu, Jan 31, 2019 at 11:35:41PM -0800, Darrell Ball wrote:
>> > The call to inet_ntop() in repl_ftp_v6_addr() is 1 short to handle
>> > the maximum possible V6 address size for v4 mapping case.
>> >
>> > Found by inspection.
>> >
>> > Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
>> > Signed-off-by: Darrell Ball 
>>
>> Thanks for the bug fix patches.  I applied these to master and
>> backported to 2.10 and 2.11.  If they need to be backported further,
>> please submit backported versions.
>>
>
> Thanks Ben.
>
> Darrell, can these two fixes be backported to 2.9 ?
>

These will go back to 2.8
I am in the middle of some easier post-2.8 backports now



> I can send the backports if you don't have the bandwidth.
>
>
> --
> David Marchand
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [patch v10 6/6] Userspace datapath: Add fragmentation handling.

2019-02-06 Thread Darrell Ball
Fragmentation handling is added for supporting conntrack.
Both v4 and v6 are supported.

After discussion with several people, I decided to not store
configuration state in the database to be more consistent with
the kernel in future, similarity with other conntrack configuration
which will not be in the database as well and overall simplicity.
Accordingly, fragmentation handling is enabled by default.

This patch enables fragmentation tests for the userspace datapath.

Signed-off-by: Darrell Ball 
---
 Documentation/faq/releases.rst   |   49 +-
 NEWS |   10 +
 include/sparse/netinet/ip6.h |1 +
 lib/automake.mk  |4 +-
 lib/conntrack.c  |   20 +-
 lib/conntrack.h  |4 +
 lib/ct-dpif.c|   58 +-
 lib/ct-dpif.h|   12 +-
 lib/dp-packet.h  |6 +
 lib/dpctl.c  |  215 +-
 lib/dpctl.man|   36 +
 lib/dpif-netdev.c|   65 +-
 lib/dpif-netlink.c   |9 +-
 lib/dpif-provider.h  |   53 +-
 lib/ipf.c| 1571 ++
 lib/ipf.h|   60 ++
 tests/system-kmod-macros.at  |   46 +-
 tests/system-traffic.at  |   51 +-
 tests/system-userspace-macros.at |  186 -
 19 files changed, 2376 insertions(+), 80 deletions(-)
 create mode 100644 lib/ipf.c
 create mode 100644 lib/ipf.h

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index 96da23c..1fb3b1c 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -104,31 +104,30 @@ Q: Are all features available with all datapaths?
 The following table lists the datapath supported features from an Open
 vSwitch user's perspective.
 
-= == == = ===
-Feature   Linux upstream Linux OVS tree Userspace Hyper-V
-= == == = ===
-NAT   4.6YESYes   NO
-Connection tracking   4.3YESPARTIAL   PARTIAL
-Tunnel - LISP NO YESNONO
-Tunnel - STT  NO YESNOYES
-Tunnel - GRE  3.11   YESYES   YES
-Tunnel - VXLAN3.12   YESYES   YES
-Tunnel - Geneve   3.18   YESYES   YES
-Tunnel - GRE-IPv6 4.18   YESYES   NO
-Tunnel - VXLAN-IPv6   4.3YESYES   NO
-Tunnel - Geneve-IPv6  4.4YESYES   NO
-Tunnel - ERSPAN   4.18   YESYES   NO
-Tunnel - ERSPAN-IPv6  4.18   YESYES   NO
-QoS - PolicingYESYESYES   NO
-QoS - Shaping YESYESNONO
-sFlow YESYESYES   NO
-IPFIX 3.10   YESYES   NO
-Set actionYESYESYES   PARTIAL
-NIC Bonding   YESYESYES   YES
-Multiple VTEPsYESYESYES   YES
-Meters4.15   YESYES   NO
-Conntrack zone limit  4.18   YESNONO
-= == == = ===
+== == == = ===
+FeatureLinux upstream Linux OVS tree Userspace Hyper-V
+== == == = ===
+Connection tracking 4.3YES  YES  YES
+Conntrack Fragment Reass.   4.3YES  YES  YES
+NAT 4.6YES  YES  NO
+Conntrack zone limit4.18   YES  NO   NO
+Tunnel - LISP   NO YES  NO   NO
+Tunnel - STTNO YES  NO   YES
+Tunnel - GRE3.11   YES  YES  YES
+Tunnel - VXLAN  3.12   YES  YES  YES
+Tunnel - Geneve 3.18   YES  YES  YES
+Tunnel - GRE-IPv6   NO NO   YES  NO
+Tunnel - VXLAN-IPv6 4.3YES  YES  NO
+Tunnel - Geneve-IPv64.4YES  YES  NO
+QoS - Policing  YESYES  YES  NO
+QoS - Shaping   YESYES  NO   NO
+sFlow 

[ovs-dev] [patch v10 5/6] ovs-atomic: Add 64 bit apis.

2019-02-06 Thread Darrell Ball
Signed-off-by: Darrell Ball 
---
 lib/ovs-atomic.h | 36 
 1 file changed, 36 insertions(+)

diff --git a/lib/ovs-atomic.h b/lib/ovs-atomic.h
index 21e230e..11fa192 100644
--- a/lib/ovs-atomic.h
+++ b/lib/ovs-atomic.h
@@ -479,6 +479,42 @@ atomic_count_set(atomic_count *count, unsigned int value)
 atomic_store_relaxed(>count, value);
 }
 
+static inline uint64_t
+atomic_count_inc64(atomic_uint64_t *counter)
+{
+uint64_t old;
+
+atomic_add_relaxed(counter, 1ull, );
+
+return old;
+}
+
+static inline uint64_t
+atomic_count_dec64(atomic_uint64_t *counter)
+{
+uint64_t old;
+
+atomic_sub_relaxed(counter, 1ull, );
+
+return old;
+}
+
+static inline uint64_t
+atomic_count_get64(atomic_uint64_t *counter)
+{
+uint64_t value;
+
+atomic_read_relaxed(counter, );
+
+return value;
+}
+
+static inline void
+atomic_count_set64(atomic_uint64_t *counter, uint64_t value)
+{
+atomic_store_relaxed(counter, value);
+}
+
 /* Reference count. */
 struct ovs_refcount {
 atomic_uint count;
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [patch v10 4/6] conntrack: Reword conntrack_execute() description.

2019-02-06 Thread Darrell Ball
Use 'must' instead of 'should'.

Suggested-by: Justin Pettit 
Signed-off-by: Darrell Ball 
---
 lib/conntrack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index bd5b7a6..b92dee5 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1297,7 +1297,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
 }
 
 /* Sends the packets in '*pkt_batch' through the connection tracker 'ct'.  All
- * the packets should have the same 'dl_type' (IPv4 or IPv6) and should have
+ * the packets must have the same 'dl_type' (IPv4 or IPv6) and should have
  * the l3 and and l4 offset properly set.
  *
  * If 'commit' is true, the packets are allowed to create new entries in the
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [patch v10 3/6] tests: Add missed local stack checks.

2019-02-06 Thread Darrell Ball
Acked-by: Justin Pettit 
Signed-off-by: Darrell Ball 
---
 tests/system-traffic.at | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 3a62e17..6e66946 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2978,6 +2978,7 @@ AT_SETUP([conntrack - Fragmentation over vxlan])
 OVS_CHECK_VXLAN()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_FRAG()
+CHECK_CONNTRACK_LOCAL_STACK()
 
 OVS_TRAFFIC_VSWITCHD_START()
 ADD_BR([br-underlay])
@@ -3030,6 +3031,7 @@ AT_SETUP([conntrack - IPv6 Fragmentation over vxlan])
 OVS_CHECK_VXLAN()
 CHECK_CONNTRACK()
 CHECK_CONNTRACK_FRAG()
+CHECK_CONNTRACK_LOCAL_STACK()
 
 OVS_TRAFFIC_VSWITCHD_START()
 ADD_BR([br-underlay])
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [patch v10 2/6] flow: Enhance parse_ipv6_ext_hdrs.

2019-02-06 Thread Darrell Ball
Acked-by: Justin Pettit 
Signed-off-by: Darrell Ball 
---
 lib/conntrack.c |  4 ++--
 lib/flow.c  | 44 ++--
 lib/flow.h  |  3 ++-
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index f732b9e..bd5b7a6 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1313,7 +1313,6 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
   const struct nat_action_info_t *nat_action_info,
   long long now)
 {
-
 struct dp_packet *packet;
 struct conn_lookup_ctx ctx;
 
@@ -1561,7 +1560,8 @@ extract_l3_ipv6(struct conn_key *key, const void *data, 
size_t size,
 uint8_t nw_proto = ip6->ip6_nxt;
 uint8_t nw_frag = 0;
 
-if (!parse_ipv6_ext_hdrs(, , _proto, _frag)) {
+const struct ovs_16aligned_ip6_frag *frag_hdr;
+if (!parse_ipv6_ext_hdrs(, , _proto, _frag, _hdr)) {
 return false;
 }
 
diff --git a/lib/flow.c b/lib/flow.c
index c6e4778..faf7c0f 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -458,8 +458,10 @@ invalid:
 
 static inline bool
 parse_ipv6_ext_hdrs__(const void **datap, size_t *sizep, uint8_t *nw_proto,
-  uint8_t *nw_frag)
+  uint8_t *nw_frag,
+  const struct ovs_16aligned_ip6_frag **frag_hdr)
 {
+*frag_hdr = NULL;
 while (1) {
 if (OVS_LIKELY((*nw_proto != IPPROTO_HOPOPTS)
&& (*nw_proto != IPPROTO_ROUTING)
@@ -505,17 +507,17 @@ parse_ipv6_ext_hdrs__(const void **datap, size_t *sizep, 
uint8_t *nw_proto,
 return false;
 }
 } else if (*nw_proto == IPPROTO_FRAGMENT) {
-const struct ovs_16aligned_ip6_frag *frag_hdr = *datap;
+*frag_hdr = *datap;
 
-*nw_proto = frag_hdr->ip6f_nxt;
-if (!data_try_pull(datap, sizep, sizeof *frag_hdr)) {
+*nw_proto = (*frag_hdr)->ip6f_nxt;
+if (!data_try_pull(datap, sizep, sizeof **frag_hdr)) {
 return false;
 }
 
 /* We only process the first fragment. */
-if (frag_hdr->ip6f_offlg != htons(0)) {
+if ((*frag_hdr)->ip6f_offlg != htons(0)) {
 *nw_frag = FLOW_NW_FRAG_ANY;
-if ((frag_hdr->ip6f_offlg & IP6F_OFF_MASK) != htons(0)) {
+if (((*frag_hdr)->ip6f_offlg & IP6F_OFF_MASK) != htons(0)) {
 *nw_frag |= FLOW_NW_FRAG_LATER;
 *nw_proto = IPPROTO_FRAGMENT;
 return true;
@@ -525,11 +527,29 @@ parse_ipv6_ext_hdrs__(const void **datap, size_t *sizep, 
uint8_t *nw_proto,
 }
 }
 
+/* Parses IPv6 extension headers until a terminal header (or header we
+ * don't understand) is found.  'datap' points to the first extension
+ * header and advances as parsing occurs; 'sizep' is the remaining size
+ * and is decreased accordingly.  'nw_proto' starts as the first
+ * extension header to process and is updated as the extension headers
+ * are parsed.
+ *
+ * If a fragment header is found, '*frag_hdr' is set to the fragment
+ * header and otherwise set to NULL.  If it is the first fragment,
+ * extension header parsing otherwise continues as usual.  If it's not
+ * the first fragment, 'nw_proto' is set to IPPROTO_FRAGMENT and 'nw_frag'
+ * has FLOW_NW_FRAG_LATER set.  Both first and later fragments have
+ * FLOW_NW_FRAG_ANY set in 'nw_frag'.
+ *
+ * A return value of false indicates that there was a problem parsing
+ * the extension headers.*/
 bool
 parse_ipv6_ext_hdrs(const void **datap, size_t *sizep, uint8_t *nw_proto,
-uint8_t *nw_frag)
+uint8_t *nw_frag,
+const struct ovs_16aligned_ip6_frag **frag_hdr)
 {
-return parse_ipv6_ext_hdrs__(datap, sizep, nw_proto, nw_frag);
+return parse_ipv6_ext_hdrs__(datap, sizep, nw_proto, nw_frag,
+ frag_hdr);
 }
 
 bool
@@ -876,7 +896,9 @@ miniflow_extract(struct dp_packet *packet, struct miniflow 
*dst)
 nw_ttl = nh->ip6_hlim;
 nw_proto = nh->ip6_nxt;
 
-if (!parse_ipv6_ext_hdrs__(, , _proto, _frag)) {
+const struct ovs_16aligned_ip6_frag *frag_hdr;
+if (!parse_ipv6_ext_hdrs__(, , _proto, _frag,
+   _hdr)) {
 goto out;
 }
 
@@ -1077,7 +1099,9 @@ parse_tcp_flags(struct dp_packet *packet)
 plen = ntohs(nh->ip6_plen); /* Never pull padding. */
 dp_packet_set_l2_pad_size(packet, size - plen);
 size = plen;
-if (!parse_ipv6_ext_hdrs__(, , _proto, _frag)) {
+const struct ovs_16aligned_ip6_frag *frag_hdr;
+if (!parse_ipv6_ext_hdrs__(, , _proto, _frag,
+_hdr)) {
 return 0;
 }
 nw_proto = nh->ip6_nxt;
diff --git a/lib/flow.h b/lib/flow.h
index 5ebdb1f..7298c71 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -131,7 +131,8 @@ void 

[ovs-dev] [patch v10 1/6] dp-packet: Add const qualifiers for checksum apis.

2019-02-06 Thread Darrell Ball
Acked-by: Justin Pettit 
Signed-off-by: Darrell Ball 
---
 lib/dp-packet.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 7b85dd9..a6e2a4f 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -509,28 +509,28 @@ dp_packet_mbuf_init(struct dp_packet *p)
 }
 
 static inline bool
-dp_packet_ip_checksum_valid(struct dp_packet *p)
+dp_packet_ip_checksum_valid(const struct dp_packet *p)
 {
 return (p->mbuf.ol_flags & PKT_RX_IP_CKSUM_MASK) ==
 PKT_RX_IP_CKSUM_GOOD;
 }
 
 static inline bool
-dp_packet_ip_checksum_bad(struct dp_packet *p)
+dp_packet_ip_checksum_bad(const struct dp_packet *p)
 {
 return (p->mbuf.ol_flags & PKT_RX_IP_CKSUM_MASK) ==
 PKT_RX_IP_CKSUM_BAD;
 }
 
 static inline bool
-dp_packet_l4_checksum_valid(struct dp_packet *p)
+dp_packet_l4_checksum_valid(const struct dp_packet *p)
 {
 return (p->mbuf.ol_flags & PKT_RX_L4_CKSUM_MASK) ==
 PKT_RX_L4_CKSUM_GOOD;
 }
 
 static inline bool
-dp_packet_l4_checksum_bad(struct dp_packet *p)
+dp_packet_l4_checksum_bad(const struct dp_packet *p)
 {
 return (p->mbuf.ol_flags & PKT_RX_L4_CKSUM_MASK) ==
 PKT_RX_L4_CKSUM_BAD;
@@ -641,25 +641,25 @@ dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
 }
 
 static inline bool
-dp_packet_ip_checksum_valid(struct dp_packet *p OVS_UNUSED)
+dp_packet_ip_checksum_valid(const struct dp_packet *p OVS_UNUSED)
 {
 return false;
 }
 
 static inline bool
-dp_packet_ip_checksum_bad(struct dp_packet *p OVS_UNUSED)
+dp_packet_ip_checksum_bad(const struct dp_packet *p OVS_UNUSED)
 {
 return false;
 }
 
 static inline bool
-dp_packet_l4_checksum_valid(struct dp_packet *p OVS_UNUSED)
+dp_packet_l4_checksum_valid(const struct dp_packet *p OVS_UNUSED)
 {
 return false;
 }
 
 static inline bool
-dp_packet_l4_checksum_bad(struct dp_packet *p OVS_UNUSED)
+dp_packet_l4_checksum_bad(const struct dp_packet *p OVS_UNUSED)
 {
 return false;
 }
-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [patch v10 0/6] Userspace datapath: Add fragmentation support.

2019-02-06 Thread Darrell Ball
Fragmentation support for userspace datapath conntrack is added; both
v4 and v6 are supported. See the patches for additional details.

Fragmentation tests for the userspace datapath are enabled
by the patches and other test enhancements are added.

v10: Addressed Ben's review comments.
 Merged patches 6 and onwards per request.
 Note 4 tests are flagged by address santizer, but are an artifact
 of the packet-out test methodology.

v9: Exported ipf status type to dpif-provider.h for code maintenance
reasons vs datatype info. hiding (code review suggestion; 
Thanks Justin).

Changed counters to 64 bit, per intention (oops) and added a patch
to support 64 bit atomics (code review; good catch Justin).

Merged code for cleanup thread into patches.

Cleanup dpctl_ct_ipf_get_status() usage; 'verbose' usage vs '-m'.

Added a patch to cleanup opt_dpif_open() and callers.

Enhanced comment for 'ipf-set-min-frag'

Minor cleanups.

Rebase.

v8: Fix argument index (-1 vs -2) for recently added function
ipf_set_enabled__().

Eliminate spaces around '|' for 'v4 | v6'.

Reduce performance impact for non-fragments to approx zero.

v7: Address review comments (Thanks Justin).
Rebase.
Fix a couple bugs.
Some enhancements.

v6: Rebase
Folded patch 4 and some test enablement into patch 3 and brought
an earlier patch forward in sequence
Enable fragmentation by default
Cleaup

v5: Added a sub-feature to optionally dump fragmentation lists.
This is useful for DOS forensics and debugging.

The testing coverage was also extended including checking
more counters and frag list occupancies.

Fixed a few bugs:
1/ Handle dpdk mempool source restrictions for a batch of
   packets from multiple sources; this also brings in a purge
   frag list function to handle pathological cases of stuck frags.
2/ ipf_destroy was missing packet frees for frag lists.
3/ A setting of CS_INVALID was missing for expired packets -
   I mentioned this earlier for version 4.

Some enhancements and coding standards changes for Patch 3.

v4: Add V6 support to the patches.
Fix possible race cleanup bug when the user disables
   fragmentation and there are list occupancies, not cleaned up
   yet.
Add missed orig tuple fields for copy from reassembled packet
to fragments.
Fix an fragment list increment check - shoiuld have been "> 0"
rather then "!= 0".
Fix max frags calculation in case of theoretical corner case.
Add proper lock annotations.
Made some other improvements while adding V6 support.

v3: Patch 2 was updated:
Remove "XXX" todo items by implementing the ones needed,
including realloc frag_list contexts to save memory.
Fix related bug with max_frag_list_size when min_frag_size is
reconfigured.

Tighten ip_tot_len sanity check for reassembled packets which
was more loose than intended.

Add another sanity check for fragment ip_tot_len; even though
it be redundant, add for completeness.

v2: Few fixes, improvements and cleanups.

Darrell Ball (6):
  dp-packet: Add const qualifiers for checksum apis.
  flow: Enhance parse_ipv6_ext_hdrs.
  tests: Add missed local stack checks.
  conntrack: Reword conntrack_execute() description.
  ovs-atomic: Add 64 bit apis.
  Userspace datapath: Add fragmentation handling.

 Documentation/faq/releases.rst   |   49 +-
 NEWS |   10 +
 include/sparse/netinet/ip6.h |1 +
 lib/automake.mk  |4 +-
 lib/conntrack.c  |   24 +-
 lib/conntrack.h  |4 +
 lib/ct-dpif.c|   58 +-
 lib/ct-dpif.h|   12 +-
 lib/dp-packet.h  |   22 +-
 lib/dpctl.c  |  215 +-
 lib/dpctl.man|   36 +
 lib/dpif-netdev.c|   65 +-
 lib/dpif-netlink.c   |9 +-
 lib/dpif-provider.h  |   53 +-
 lib/flow.c   |   44 +-
 lib/flow.h   |3 +-
 lib/ipf.c| 1571 ++
 lib/ipf.h|   60 ++
 lib/ovs-atomic.h |   36 +
 tests/system-kmod-macros.at  |   46 +-
 tests/system-traffic.at  |   53 +-
 tests/system-userspace-macros.at |  186 -
 22 files changed, 2460 insertions(+), 101 deletions(-)
 create mode 100644 lib/ipf.c
 create mode 100644 lib/ipf.h

-- 
1.9.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v9 06/11] Userspace datapath: Add fragmentation handling.

2019-02-06 Thread Darrell Ball
Thanks very much for the thorough review
sorry; I made the changes last year and then ran into vacation and internal
priorities.

On Tue, Dec 11, 2018 at 8:15 AM Ben Pfaff  wrote:

> On Mon, Nov 19, 2018 at 11:09:25AM -0800, Darrell Ball wrote:
> > Fragmentation handling is added for supporting conntrack.
> > Both v4 and v6 are supported.
> >
> > After discussion with several people, I decided to not store
> > configuration state in the database to be more consistent with
> > the kernel in future, similarity with other conntrack configuration
> > which will not be in the database as well and overall simplicity.
> > Accordingly, fragmentation handling is enabled by default.
> >
> > This patch enables fragmentation tests for the userspace datapath.
> >
> > Signed-off-by: Darrell Ball 
>
> Thanks for implementing this.
>
> This could use more comments, especially on the data structures and
> file-level variables and at the level of an individual function.
>

Added more comments.


>
> Please don't invent yet another mutex.
>

yep


>
> ipf_print_reass_packet() seems like it could use ds_put_hex_dump() and
> thereby be more readable (I don't really want to read 182 hex digits in
> a row without any white space).
>

ds_put_hex_dump() is better and I will use it – thanks.


>
> Why 91 bytes in ipf_print_reass_packet()?
>

Just a number with more than enough context


>
> All the counters seem to be write-only.
>

Subsequent patches provide that support, which as folded in now.


>
> struct ipf_addr seems odd to me.  It has both aligned and unaligned
> versions of addresses, which means that the overall struct needs to be
> aligned, and it's a union nested in a struct instead of just a union.
>

The struct was never extended; it is trivially converted to a plain union.


> ipf_addr_hash_add() implies that all the bytes in the struct need to be
> initialized even if only some of them are used.
>

The whole key that gets hashed is initialized with a memset.

The performance aspect is a rounding error.


>
> ipf_list_key_hash() seems to be at risk of hashing trailing padding at
> the end of struct ipf_list_key.
>

ipf_list_key_hash() always operates on a memset zeroed key with fields
later set.

The performance aspect is a rounding error.


>
> Does ipf_list_complete() run one past the end of the array?  Naively, it
> seems like it might.
>

No, it uses array indices with adds 1 to the penultimate one for last iter.

I made it more obvious though.


>
> I found ipf_sort() a little hard to read mostly due to the long variable
> names.  Here's an alternate form that you can accept or reject as you
> like (I have not tested it):
>
> static void
> ipf_sort(struct ipf_frag *frags, size_t last_idx)
> OVS_REQUIRES(ipf_lock)
> {
> for (int i = 1; i <= last_idx; i++) {
> struct ipf_frag ipf_frag = frags[i];
> int j = i - 1;
> while (j >= 0 && frags[j].start_data_byte >
> ipf_frag.start_data_byte) {
> frags[j + 1] = frags[j];
> j--;
> }
> frags[j + 1] = ipf_frag;
> }
> }
>

One day I decided to write only while loops to see if you would notice and
you did.
It is trivially equivalent and a for loop where applicable is always more
compact and easier to understand. Also, the longer names were not for my
benefit, so I shortened them.


>
> ipf_reassemble_v4_frags() and ipf_reassemble_v6_frags() know the final
> length of the packet they're assembling, but it doesn't look to me like
> they preallocate the space for it.
>

Sure, even though the performance aspect is a rounding error and it is just

 a couple extra lines of code and looks nicer.


>
> ipf_reassemble_v6_frags() and ipf_reassemble_v6_frags() calculate the
> length of the L3 header manually and skip past it.  Couldn't they just
> use the L4 header pointer?
>

I am using the L4 pointer elsewhere; it is weird that I didn’t just use it
here as well.

Converted now.


>
> The ipf_list_key_cmp() interface is a little confusing because the
> return value convention and the name is a little like a strcmp()ish
> function, but it doesn't use a -1/+1 return value.  I'd rename it to
> ipf_list_key_equal()s, change the return type to "bool", and make true
> mean equal, false mean unequal.
>

yep; ‘eq’ is simpler semantics for the API, so I used it.


>
> Processing a fragment is O(n) in the number of existing fragments due to
> the dup check.  I don't know whether this is important.
>

This processing is fast and a fraction of the total per fragment.

It is also practically a constant because of bounding.


>
> It looks like packet duplication can cause a fraglist to be marked
> invalid.  I wonder whether this is good behavior.
>

Only the packet is marked invalid and it is just an optimization because
conntrack

will do the same otherwise with unnecessary work.


>
> It seems like we could estimate the number of fragments needed by
> dividing the total size by the size of the first fragment received.
>


Re: [ovs-dev] [RFC PATCH v1 3/3] OVN ACL: Allow a user to input ct.label value for an acl

2019-02-06 Thread Ben Pfaff
I'd like to hear some kind of overall use case here.  Sure, you can use
it to identify an OVN ACL, or a security group, or anything else.  How
does that contribute to a larger system?  There should be a hint to the
reader about how and why to use it.

On Wed, Feb 06, 2019 at 10:06:46PM +, Ankur Sharma wrote:
> Reason for using 128 bits:
> a. Connection tracker has only 2 fields for metadata, ct.mark(32 bits) and 
> ct.label(128 bits).
> b. Purpose of this series is to ensure that we use smaller field, i.e  
> ct.mark for flags and use the bigger field, i.e ct.label for associating 
> metadata with the ct entry.
> c. An example of metadata could be a value which maps ct entry to 
> corresponding OVN ACL or Security Group or both.
> d. Yes, I agree that 128 could more than sufficient for most of the cases, 
> but unless we see a use case of dividing ct.label in subfields, i thought we 
> can leverage on full 128 bits.
> This keeps implementation simple and  also keeps the interpretation of a 
> connection tracking entry simple.
> 
> Please let me know, if it sounds reasonable.
> 
> Thanks
> 
> Regards,
> Ankur
> 
> -Original Message-
> From: Ben Pfaff  
> Sent: Tuesday, February 5, 2019 1:40 PM
> To: Ankur Sharma 
> Cc: ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] [RFC PATCH v1 3/3] OVN ACL: Allow a user to input 
> ct.label value for an acl
> 
> On Fri, Jan 11, 2019 at 12:16:39AM +, Ankur Sharma wrote:
> > This patch allows user to associate a value with acl, which will be 
> > assigned to ct.label of the corresponding connection tracking entry.
> > 
> > This value can be used to map a ct entry with corresponding OVN ACL or 
> > higher level constructs like security group.
> > 
> > signed-off-by: Ankur Sharma 
> 
> Thanks for the patch!
> 
> Please capitalize the "S" in "Signed-off-by".
> 
> This adds a column in ovn-sb.ovsschema, so it should increment the minor 
> version (the y in x.y.z).
> 
> The documentation for the new column explains what it does, but it does not 
> explain the purpose.  Why would a user set this column?  What are its effects?
> 
> The column is a string, but its value is an integer.  Maybe that is because 
> OVSDB integer columns are limited to 64 bits, but this value can be 128 bits. 
>  That is a very large space.  What is the reason that so much space should be 
> dedicated to this identifier?  Even 64 bits is more identifiers than any 
> deployment will ever use, so there must be some other reason.
> 
> Please do not use // comments.
> 
> Please document the new option in the ovn-sbctl manpage.
> 
> Please add a NEWS item for the new feature.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ofp-monitor: Added support for OpenFlow 1.4+ Flow Monitor

2019-02-06 Thread Ashish Varma
On Tue, Feb 5, 2019 at 4:41 PM Ben Pfaff  wrote:

> On Tue, Feb 05, 2019 at 04:26:56PM -0800, Ashish Varma wrote:
> > In the parse_flow_monitor_request(), usable_protocols is an out
> parameter.
> > (and is set in parse_flow_monitor_request() only if a match field is
> > provided. )
> > allowed_versions on the other hand is used only in the ofctl_monitor
> > function. That is the reason for the check in ofctl_monitor.
>
> The usual way we handle this sort of version dependency is a bitmap like
> 'usable_protocols'.  It works well for flows, for example.  You can see
> lots of examples in ovs-ofctl.c if you search for the identifier
> usable_protocols.
>
> Looking closer at this instance, the existing code is buggy.  Currently,
> only OpenFlow 1.0 should be supported, because that's the only protocol
> where OVS actually supports flow monitoring.
> parse_flow_monitor_request() should be returning that consistently as
> the out parameter (as you noted).  Or it should be returning 0 if it's
> impossible to request a flow monitor at all (in the case where OF1.0
> can't support whatever field is specified when
> parse_flow_monitor_request__ parses a field).  But it's buggy and
> nothing ever properly initializes it.  That bug is masked by another
> bug: nothing ever checks it, either!  And, finally, there is a third
> bug: ovs-ofctl calls ofputil_append_flow_monitor_request() and blindly
> always uses OF1.0, which might not be the protocol actually in use on
> the vconn.
>
> The manpage for the ovs-ofctl monitor command doesn't say that "watch:"
> is OF1.0 only or that it is an Open vSwitch extension, either, although
> it should.
>
> It would be for the best if we can fix all these bugs before we add
> support for OF1.4+ flow monitor, and then backport the bug fixes.  Does
> my description of the problems make sense?  Can you tackle these
> problems too?
>
>
Sure. I will try and fix this before the support for OF1.4+ flow monitor.
Thanks for explaining the issue.



> Thanks,
>
> Ben.
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH v1 3/3] OVN ACL: Allow a user to input ct.label value for an acl

2019-02-06 Thread Ankur Sharma
Hi Ben,

Thanks a lot for review.
Sure, V2 will have all the comments addressed.

Reason for using 128 bits:
a. Connection tracker has only 2 fields for metadata, ct.mark(32 bits) and 
ct.label(128 bits).
b. Purpose of this series is to ensure that we use smaller field, i.e  ct.mark 
for flags and use the bigger field, i.e ct.label for associating metadata with 
the ct entry.
c. An example of metadata could be a value which maps ct entry to corresponding 
OVN ACL or Security Group or both.
d. Yes, I agree that 128 could more than sufficient for most of the cases, but 
unless we see a use case of dividing ct.label in subfields, i thought we can 
leverage on full 128 bits.
This keeps implementation simple and  also keeps the interpretation of a 
connection tracking entry simple.

Please let me know, if it sounds reasonable.

Thanks

Regards,
Ankur

-Original Message-
From: Ben Pfaff  
Sent: Tuesday, February 5, 2019 1:40 PM
To: Ankur Sharma 
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [RFC PATCH v1 3/3] OVN ACL: Allow a user to input 
ct.label value for an acl

On Fri, Jan 11, 2019 at 12:16:39AM +, Ankur Sharma wrote:
> This patch allows user to associate a value with acl, which will be 
> assigned to ct.label of the corresponding connection tracking entry.
> 
> This value can be used to map a ct entry with corresponding OVN ACL or 
> higher level constructs like security group.
> 
> signed-off-by: Ankur Sharma 

Thanks for the patch!

Please capitalize the "S" in "Signed-off-by".

This adds a column in ovn-sb.ovsschema, so it should increment the minor 
version (the y in x.y.z).

The documentation for the new column explains what it does, but it does not 
explain the purpose.  Why would a user set this column?  What are its effects?

The column is a string, but its value is an integer.  Maybe that is because 
OVSDB integer columns are limited to 64 bits, but this value can be 128 bits.  
That is a very large space.  What is the reason that so much space should be 
dedicated to this identifier?  Even 64 bits is more identifiers than any 
deployment will ever use, so there must be some other reason.

Please do not use // comments.

Please document the new option in the ovn-sbctl manpage.

Please add a NEWS item for the new feature.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH v1 2/3] OVN ACL: Allow ct_mark and ct_label values to be set from register as well

2019-02-06 Thread Ankur Sharma
Hi Ben,

Thanks for the review.
a. My bad on the Signed off by, will take care of it in V2.
b. Sure, I will add the test case as well.
c. I will update the ovn-sb.xml as well.
d. Yes, openflow does not restrict the usage of registers for connection 
tracker, I think probably because of lack of use case thus far, it was missing 
in OVN.


Thanks again for review, V2 will have all the comments addressed.

Regards,
Ankur

-Original Message-
From: Ben Pfaff  
Sent: Tuesday, February 5, 2019 1:30 PM
To: Ankur Sharma 
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [RFC PATCH v1 2/3] OVN ACL: Allow ct_mark and ct_label 
values to be set from register as well

On Fri, Jan 11, 2019 at 12:16:38AM +, Ankur Sharma wrote:
> OVN allows only an integer (or masked integer) to be assigned to 
> ct_mark and ct_label.
> 
> This patch, enhances the parser code to allow ct_mark and ct_label to 
> be assigned from 32 bit registers (MFF_REG0 - MFF_REG15) and  128 bit 
> registers (MFF_XXREG0 - MFF_XXREG3) respectively.
> 
> signed-off-by: Ankur Sharma 

Thanks for the patch!

Please capitalize "Signed-off-by" <-- just like that.

When we add new support for OVN actions, we also add tests to the "ovn -- 
action parsing" test in ovn.at.  Please add a test for a correctly formed 
action as well as at least one negative form that exercises each of the new 
error messages.

Please also document the new form of the action in ovn-sb.xml.

The error messages don't seem all that user friendly.  Make sure that they 
clearly point out the problem.  (That's probably easier when you add the tests.)

The new forms break the level of abstraction that OVN fields are supposed to 
provide, that is, please use OVN field names rather than OpenFlow ones.

I don't know why only registers are allowed.  I don't believe that such a 
restriction exists at the OpenFlow level.  I also don't know why only full, 
aligned registers are allowed.  I don't think that restriction exists at the 
OpenFlow level either.

It looks like the indentation is wrong here in encode_CT_COMMIT().

Thanks,

Ben.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH v1 1/3] OVN ACL: Replace the usage of ct_label with ct_mark

2019-02-06 Thread Ankur Sharma
Hi Ben,

Thanks a lot for the review.
Sure, I will add comments in logical-fields.c explaining the reason for 
retaining ct_label.blocked.
I will rename ct_mark.blocked to ct.blocked as well.

V2 will have all these changes.

Thanks again for review.

Regards,
Ankur

-Original Message-
From: Ben Pfaff  
Sent: Tuesday, February 5, 2019 1:21 PM
To: Ankur Sharma 
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] [RFC PATCH v1 1/3] OVN ACL: Replace the usage of 
ct_label with ct_mark

On Fri, Jan 11, 2019 at 12:16:35AM +, Ankur Sharma wrote:
> OVN ACL implementation used ct_label to indicate if a previosuly 
> allowed connection shoudl not be allowed anymore and vice versa.
> 
> However, ct_label is a 128 bit value and we should rather leverage on 
> ct_mark which is a 32 bit value.
> 
> Using ct_mark for this purpose, allows us to use ct_label for storing 
> other values like, identifier for corresponidng OVN ACL/Security group etc.
> 
> signed-off-by: Ankur Sharma 

Thanks for the patch.

I think that the idea here is to retain the existing ct_label.blocked for 
compatibility with older ovn-northd, so that during an upgrade the older 
logical flows continue to work.  That is a good idea.  I think that there 
should be a comment in logical-fields.c explaining why ct_label.blocked is 
still there.  Then, someday in the future, when we think that upgrades from 
such an old version is no longer important, we will have an idea why it is 
there and that we can now to remove it.

I find myself wondering, though, why we have ct_label.blocked at all.
In some other cases where ovn-northd uses a bit specifically, it has a macro 
for it, e.g. REGBIT_CONNTRACK_COMMIT.  Another option would be to have better 
abstraction, i.e. to name the bit "ct.blocked" instead of ct_mark.blocked or 
ct_label.blocked.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/3] acinclude: Also use LIBS from dpkg pkg-config

2019-02-06 Thread Aaron Conole
"Stokes, Ian"  writes:

>> DPDK 18.11 builds using the more modern meson build system no more provide
>> the -ldpdk linker script. Instead it is expected to use pkgconfig for
>> linker options as well.
>> 
>> This change will set DPDK_LIB from pkg-config (if pkg-config was
>> available) and since that already carries the whole-archive flags around
>> the PMDs skips the further wrapping in more whole-archive if that is
>> already part of DPDK_LIB.
>> 
>> Signed-off-by: Christian Ehrhardt 
>> ---
>
> Hi Christian,
>
> Thanks for this, this patch will cause the travis build to fail, I'd
> suggest it be rolled into patch 2 of the series as its expected for
> all patches to pass the build individually rather than being corrected
> by a later patch in the series.
>
> https://travis-ci.org/istokes/ovs/builds/488462598
>
> Just a query for the RHEL folks, I remember there being some concern
> as the DPDK userspace conference with regards to dependencies required
> for the meson build of DPDK not being supported in RHEL distros, is
> this still the case?

I think this change is independent of the meson build.  It merely takes
the meson build into account, from what I can tell.

> Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/1] acinclude: Also use LIBS from dpkg pkg-config

2019-02-06 Thread Aaron Conole
Christian Ehrhardt  writes:

> DPDK 18.11 builds using the more modern meson build system no more
> provide the -ldpdk linker script. Instead it is expected to use
> pkgconfig for linker options as well.
>
> This change will set DPDK_LIB from pkg-config (if pkg-config was
> available) and since that already carries the whole-archive flags
> around the PMDs skips the further wrapping in more whole-archive
> if that is already part of DPDK_LIB.
>
> To work reliable in all environments this needs pkg-config 0.29.1.
> We want to be able to use PKG_CHECK_MODULES_STATIC which
> is not yet available in 0.24. Therefore update pkg.m4
> to pkg-config 0.29.1.
>
> This should be backport-safe as these macro files are all versioned.
> autoconf is smart enough to check the version if you have it locally,
> and if the system's is higher, it will use that one instead.
>
> Finally make configure.ac use the locally provided pkg.m4 before
> calling the PKG_PROG_PKG_CONFIG macro.
>
> Signed-off-by: Christian Ehrhardt 
> ---

LGTM

Acked-by: Aaron Conole 

I think Timothy should give it a quick glance over, if possible.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-discuss] Geneve remote_ip as flow for OVN hosts

2019-02-06 Thread Ben Pfaff
It looks like we fell off ovs-dev somehow.  I've added it back.

On Wed, Feb 06, 2019 at 05:50:39PM +, Venugopal Iyer wrote:
> Thanks, Ben!
> 
> I missed and {} for an if; have fixed it; thanks for catching it!
> 
> As for the TODOS:
>  - ovn/controller/chassis.c was something I just wanted to check, but not 
> needed, so have
> removed it.
> 
> - ovn/controller/physical.c is not relevant, so removed it.
> 
> - ovn/utitilities/ovnn-sbctl.c was to add a cmd to bind a encap to  a port. I 
> am not
>   planning to do that now since it will work even if we don't explicitly bind 
> (it will select
>   the preferred encap to the chassis it is bound to). If we see a need to 
> provide a
>   lsp-encap-bind (or somesuch) we can add it later. Let me know if that works.

OK, thanks.

> BTW, I wanted to check if it is possible to rename "ovn-chassis-id" to 
> "ovn-tunnel-id"
> since that is more appropriate. Not sure if that'll cause any grief to 
> existing tools/scripts
> that specifically look for ovn-chassis-id.

Good names are important but we don't want to change names if it causes
compatibility problems.  However...

> Also,  as I mentioned the changes will mean that the ovn-controller will need 
> the ovn-central
> to be updated to the changed version as well (i.e. if someone just installs 
> ovs and ovn-host
> s/he can't expect it to be backward compatible with the older version of 
> ovn-sb). Is that
> acceptable?

That's not what we usually want.  The OVN upgrade process expects the
HVs to be upgraded before the central nodes.  If that breaks things,
especially in the case where the deployment is only using a single
interface per HV, it's a problem.

What would it take to retain compatibility?

> I haven't updated the test suite (to add a couple of tests for m-vtep); i.e 
> to make sure the
> port binding is done correctly. What is the process to proceed with 
> integrating the changes;
> is it required that the tests be committed at the same time the changes are?
> 
> I have  pushed the changes to https://github.com/iyervl/nv-ovs after cleaning 
> up the
> comments and adding the missing {}.
> 
> Once the changes look fine, I'll squash the commits and also add the detailed 
> commit
> and let you know.
> 
> thanks for your time and help!
> 
> -venu
> 
> 
> From: Ben Pfaff 
> Sent: Tuesday, February 5, 2019 12:55 PM
> To: Venugopal Iyer
> Cc: Guru Shetty; Leonid Grossman
> Subject: Re: [ovs-discuss] [ovs-dev] Geneve remote_ip as flow for OVN hosts
> 
> Thanks.
> 
> I looked over this again now.  There are multiple TODO items still in
> it.  Do you intend to fix them?
> 
> The version in ovn-sb.ovsschema should be updated, probably to 2.1.0.
> 
> I noticed one missing {} around an 'if' block.
> 
> I'd appreciate a detailed commit message for the series (which I imagine
> will ultimately be squashed into one patch for commit).
> 
> Thanks again!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 1/3] dpdk: Unify passing of DPDK EAL arguments.

2019-02-06 Thread Ilya Maximets
On 06.02.2019 18:05, Aaron Conole wrote:
> Ilya Maximets  writes:
> 
>> On 05.02.2019 23:19, Flavio Leitner wrote:
>>> On Tue, Feb 05, 2019 at 06:43:08PM +0300, Ilya Maximets wrote:
 Hi Flavio.
 Thanks for taking a look.

 On 05.02.2019 15:38, Flavio Leitner wrote:
>
> Hi Ilya,
>
> Thanks for the patch and I think we knew about that pain when we
> exposed the very first parameter. I still remember Aaron struggling
> to find the best path forward. Time flies and here we are.
>
> The problem is that this change is not friendly from the community
> perspective to its users. That is like an exposed API which we should
> avoid break as much as possible.
>
> For instance, there are users (OpenStack) with support life cycle of
> 5+ years that cannot keep up with this kind of change but they expect
> to be able to update to newer OVS.

 Sure, there are users that wants stable API that will never change.
 But this is really impossible in practice. I'm working with OpenStack
 too and will have to fixup deployment tools with these changes. BTW, from
 the whole OpenStack only few deployment projects (like TripleO) will need
 to make changes and these changes are not very big.
>>>
>>> That's only part of the work. There will be work on QA, documentation
>>> and even migration path from one to another. And we can't change the
>>> past for existing deployments.
>>
>> Sure. But incompatible API changes are almost unavoidable for a young
>> projects that wants to be better.
>>
>>>
> One idea is to freeze whatever we have now and update the documentation
> to recommend the new way. We give like a couple OVS releases and only
> then ignore (or remove?) those parameters.

 Yes, In cover letter I proposed these patches to be applied one per 
 release.
 And current (first) patch does not remove the functionality, only docs.
 Users still will be able to use old interface, but will have warnings
 in the log. In the next release cycle we'll start ignore the values
 while still printing the warnings. This should give enough time for 
 adaptation.
 If you feel that we need more time, we could apply the second patch to 2.14
 (or whatever number will be in 2 releases from now).
>>>
>>> I don't think we should remove the docs if the parameters are there as
>>> a first step. I mean, assume an existing deployment, there is a parameter
>>> which might be in use but there is no documentation available. That
>>> doesn't sound like a good user experience to me.
>>
>> Maybe we could save a man pages while removing the guides. There is no much
>> information in Documentation/intro/install/dpdk.rst anyway.
>>
>>>
>>> On another hand, you could introduce the new interface and update the
>>> docs to recommend using the new one because the old one will be removed
>>> in the future. Warning messages come next, and then finally its removal.
>>
>> I'd prefer to have warning messages to be there right from the start to
>> push users to migrate to the new interfaces as early as possible.
>>
>> How about this:
>>
>> First stage (apply now, release in 2.12):
>>   - Introduce new interface 'dpdk-options'.
>>   - Rewrite installation guide with new interface fully removing the old one.
>>   - Add new interface to man pages (vswitch.xml) and mark all the old options
>> as deprecated by adding something like: "Deprecated. 'dpdk-options' 
>> should
>> be used instead. Will be ignored in the future."
>>   - Add a runtime deprecation warning if old interface is in use.
>>   - Ignore values of old knobs if 'dpdk-options' provided.
>>
>> Second stage (release in 2.13 or 2.14, could wait longer if required):
>>   - Remove old interfaces wile keeping the warnings. (i.e. values always 
>> ignored)
>>   - Remove old knobs from the man pages.
>>
>> Third stage (optional):
>>   - Remove warnings.
>>
>> So the main difference from the current patches is delaying removal of the 
>> man
>> pages to the second stage.
>>
>>>
>>>
> IMO in the end we are moving the problem from one place to another
> because even with a single string, OVS users will be caught off guard
> when DPDK changes. Yes, less pain to OVS community in the sense that
> we don't have to add/remove/deprecate stuff, but it is still a bad
> user experience regardless, which is not what OVS is known for.

 Unfortunately, DPDK was never user-friendly enough. It tries, but still 
 not.
>>>
>>> Agreed.
>>>
 We're keeping few sane defaults like providing default lcore and setting 
 the
 socket-limit if needed. And we'll try to do that in the future. The thing
 this patch tries to eliminate is the dependency tracking between different
 EAL arguments, i.e. duplicating the work with rte_eal_init() and having too
 many configuration knobs with similar meanings what does not work at the
 same time.


[ovs-dev] [PATCH v2] netdev-dpdk: Use single struct/union for flow offload items.

2019-02-06 Thread Ilya Maximets
Having a single structure allows to simplify the code path and
clear all the items at once (probably faster). This does not
increase stack memory usage because all the L4 related items
grouped in a union.

Changes:
  - Memsets combined.
  - 'ipv4_next_proto_mask' dropped as we already know the address
and able to use 'mask.ipv4.hdr.next_proto_id' directly.
  - Group of 'if' statements for L4 protocols turned to a 'switch'.
We can do that, because we don't have semi-local variables anymore.
  - Eliminated 'end_proto_check' label. Not needed with 'switch'.

Additionally 'rte_memcpy' replaced with simple 'memcpy' as it makes no
sense to use 'rte_memcpy' for 6 bytes.

Signed-off-by: Ilya Maximets 
---

Version 2:
* Dropped 'ipv4_next_proto_mask' pointer as we able to use
  'mask.ipv4.hdr.next_proto_id' directly.

 lib/netdev-dpdk.c | 189 +++---
 1 file changed, 78 insertions(+), 111 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 26022a59c..d18dd1b6c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -4564,28 +4564,36 @@ netdev_dpdk_add_rte_flow_offload(struct netdev *netdev,
 struct flow_actions actions = { .actions = NULL, .cnt = 0 };
 struct rte_flow *flow;
 struct rte_flow_error error;
-uint8_t *ipv4_next_proto_mask = NULL;
+uint8_t proto = 0;
 int ret = 0;
+struct flow_items {
+struct rte_flow_item_eth  eth;
+struct rte_flow_item_vlan vlan;
+struct rte_flow_item_ipv4 ipv4;
+union {
+struct rte_flow_item_tcp  tcp;
+struct rte_flow_item_udp  udp;
+struct rte_flow_item_sctp sctp;
+struct rte_flow_item_icmp icmp;
+};
+} spec, mask;
+
+memset(, 0, sizeof spec);
+memset(, 0, sizeof mask);
 
 /* Eth */
-struct rte_flow_item_eth eth_spec;
-struct rte_flow_item_eth eth_mask;
 if (!eth_addr_is_zero(match->wc.masks.dl_src) ||
 !eth_addr_is_zero(match->wc.masks.dl_dst)) {
-memset(_spec, 0, sizeof eth_spec);
-memset(_mask, 0, sizeof eth_mask);
-rte_memcpy(_spec.dst, >flow.dl_dst, sizeof eth_spec.dst);
-rte_memcpy(_spec.src, >flow.dl_src, sizeof eth_spec.src);
-eth_spec.type = match->flow.dl_type;
-
-rte_memcpy(_mask.dst, >wc.masks.dl_dst,
-   sizeof eth_mask.dst);
-rte_memcpy(_mask.src, >wc.masks.dl_src,
-   sizeof eth_mask.src);
-eth_mask.type = match->wc.masks.dl_type;
+memcpy(, >flow.dl_dst, sizeof spec.eth.dst);
+memcpy(, >flow.dl_src, sizeof spec.eth.src);
+spec.eth.type = match->flow.dl_type;
+
+memcpy(, >wc.masks.dl_dst, sizeof mask.eth.dst);
+memcpy(, >wc.masks.dl_src, sizeof mask.eth.src);
+mask.eth.type = match->wc.masks.dl_type;
 
 add_flow_pattern(, RTE_FLOW_ITEM_TYPE_ETH,
- _spec, _mask);
+ , );
 } else {
 /*
  * If user specifies a flow (like UDP flow) without L2 patterns,
@@ -4598,50 +4606,37 @@ netdev_dpdk_add_rte_flow_offload(struct netdev *netdev,
 }
 
 /* VLAN */
-struct rte_flow_item_vlan vlan_spec;
-struct rte_flow_item_vlan vlan_mask;
 if (match->wc.masks.vlans[0].tci && match->flow.vlans[0].tci) {
-memset(_spec, 0, sizeof vlan_spec);
-memset(_mask, 0, sizeof vlan_mask);
-vlan_spec.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
-vlan_mask.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
+spec.vlan.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
+mask.vlan.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
 
 /* match any protocols */
-vlan_mask.inner_type = 0;
+mask.vlan.inner_type = 0;
 
 add_flow_pattern(, RTE_FLOW_ITEM_TYPE_VLAN,
- _spec, _mask);
+ , );
 }
 
 /* IP v4 */
-uint8_t proto = 0;
-struct rte_flow_item_ipv4 ipv4_spec;
-struct rte_flow_item_ipv4 ipv4_mask;
 if (match->flow.dl_type == htons(ETH_TYPE_IP)) {
-memset(_spec, 0, sizeof ipv4_spec);
-memset(_mask, 0, sizeof ipv4_mask);
-
-ipv4_spec.hdr.type_of_service = match->flow.nw_tos;
-ipv4_spec.hdr.time_to_live= match->flow.nw_ttl;
-ipv4_spec.hdr.next_proto_id   = match->flow.nw_proto;
-ipv4_spec.hdr.src_addr= match->flow.nw_src;
-ipv4_spec.hdr.dst_addr= match->flow.nw_dst;
-
-ipv4_mask.hdr.type_of_service = match->wc.masks.nw_tos;
-ipv4_mask.hdr.time_to_live= match->wc.masks.nw_ttl;
-ipv4_mask.hdr.next_proto_id   = match->wc.masks.nw_proto;
-ipv4_mask.hdr.src_addr= match->wc.masks.nw_src;
-ipv4_mask.hdr.dst_addr= match->wc.masks.nw_dst;
+spec.ipv4.hdr.type_of_service = match->flow.nw_tos;
+spec.ipv4.hdr.time_to_live= match->flow.nw_ttl;
+  

Re: [ovs-dev] [PATCH] netdev-dpdk: shrink critical region under tx_q[qid].tx_lock

2019-02-06 Thread Ian Stokes

On 1/31/2019 2:47 AM, Li RongQing wrote:

netdev_dpdk_filter_packet_len() does not need to be protected
by tx_q[].tx_lock, and tx_q[].tx_lock can not protect it too,
same to netdev_dpdk_qos_run

so move them out of this lock to improve the scalability



Thanks for the Patch Li, can you give more details by what you mean in 
terms of scalability? The changes are small beow so I'm curious to as to 
the usecase you have where your seeing an improvement?



Signed-off-by: Li RongQing 
---
  lib/netdev-dpdk.c | 33 ++---
  1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4bf0ca9e8..bf4918e2c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2333,15 +2333,15 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
  goto out;
  }
  
-rte_spinlock_lock(>tx_q[qid].tx_lock);

-
  cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
  /* Check has QoS has been configured for the netdev */
  cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true);
  dropped = total_pkts - cnt;
  
+rte_spinlock_lock(>tx_q[qid].tx_lock);

+
+int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
  do {
-int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
  unsigned int tx_pkts;
  
  tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);

@@ -2462,15 +2462,20 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
  return;
  }
  
-if (OVS_UNLIKELY(concurrent_txq)) {

-qid = qid % dev->up.n_txq;
-rte_spinlock_lock(>tx_q[qid].tx_lock);
-}
-
  if (OVS_UNLIKELY(batch->packets[0]->source != DPBUF_DPDK)) {
  struct netdev *netdev = >up;
  


The change below introduces code duplication in both the if and else 
statements specifically for the unlikely case. I'm slow to introduce 
this change as it seems the key benefit is in the case where concurrent 
txqs are used which to date has not been the common use case for the 
wider community. I take it here this case is the beneficiary?


Ian


+if (OVS_UNLIKELY(concurrent_txq)) {
+qid = qid % dev->up.n_txq;
+rte_spinlock_lock(>tx_q[qid].tx_lock);
+}
+
  dpdk_do_tx_copy(netdev, qid, batch);
+
+if (OVS_UNLIKELY(concurrent_txq)) {
+rte_spinlock_unlock(>tx_q[qid].tx_lock);
+}
+
  dp_packet_delete_batch(batch, true);
  } else {
  int tx_cnt, dropped;
@@ -2481,8 +2486,17 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
  tx_cnt = netdev_dpdk_qos_run(dev, pkts, tx_cnt, true);
  dropped = batch_cnt - tx_cnt;
  
+if (OVS_UNLIKELY(concurrent_txq)) {

+qid = qid % dev->up.n_txq;
+rte_spinlock_lock(>tx_q[qid].tx_lock);
+}
+
  dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt);
  
+if (OVS_UNLIKELY(concurrent_txq)) {

+rte_spinlock_unlock(>tx_q[qid].tx_lock);
+}
+
  if (OVS_UNLIKELY(dropped)) {
  rte_spinlock_lock(>stats_lock);
  dev->stats.tx_dropped += dropped;
@@ -2490,9 +2504,6 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
  }
  }
  
-if (OVS_UNLIKELY(concurrent_txq)) {

-rte_spinlock_unlock(>tx_q[qid].tx_lock);
-}
  }
  
  static int




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-dpdk: Use single struct/union for flow offload items.

2019-02-06 Thread Ilya Maximets
On 06.02.2019 18:16, Ophir Munk wrote:
> Hi,
> Please find comments inline.
> 
>> -Original Message-
>> From: ovs-dev-boun...@openvswitch.org > boun...@openvswitch.org> On Behalf Of Ilya Maximets
>> Sent: Wednesday, February 6, 2019 3:19 PM
>> To: ovs-dev@openvswitch.org; Ian Stokes 
>> Cc: Ilya Maximets ; Roni Bar Yanai
>> 
>> Subject: [ovs-dev] [PATCH] netdev-dpdk: Use single struct/union for flow
>> offload items.
>>
>> Having a single structure allows to simplify the code path and clear all the
>> items at once (probably faster). This does not increase stack memory usage
>> because all the L4 related items grouped in a union.
>>
>> Changes:
>>   - Memsets combined.
>>   - 'ipv4_next_proto_mask' calculated at the top as we already know
>> the address. We also do not need to check it before clearing.
>>   - Group of 'if' statements for L4 protocols turned to a 'switch'.
>> We can do that, because we don't have semi-local variables anymore.
>>   - Eliminated 'end_proto_check' label. Not needed with 'switch'.
>>
>> Additionally 'rte_memcpy' replaced with simple 'memcpy' as it makes no
>> sense to use 'rte_memcpy' for 6 bytes.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  lib/netdev-dpdk.c | 190 +++---
>>  1 file changed, 79 insertions(+), 111 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>> 26022a59c..79219f6ef 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -4564,28 +4564,37 @@ netdev_dpdk_add_rte_flow_offload(struct
>> netdev *netdev,
>>  struct flow_actions actions = { .actions = NULL, .cnt = 0 };
>>  struct rte_flow *flow;
>>  struct rte_flow_error error;
>> -uint8_t *ipv4_next_proto_mask = NULL;
>> +uint8_t proto = 0;
>>  int ret = 0;
>> +struct flow_items {
>> +struct rte_flow_item_eth  eth;
>> +struct rte_flow_item_vlan vlan;
>> +struct rte_flow_item_ipv4 ipv4;
>> +union {
>> +struct rte_flow_item_tcp  tcp;
>> +struct rte_flow_item_udp  udp;
>> +struct rte_flow_item_sctp sctp;
>> +struct rte_flow_item_icmp icmp;
>> +};
>> +} spec, mask;
>> +uint8_t *ipv4_next_proto_mask = _proto_id;
>> +
> 
> Please consider removing ipv4_next_porto_mask pointer assignment and later 
> use directly mask.ipv4.hdr.next_proto_id.

Good point. Thanks.

> 
>> +memset(, 0, sizeof spec);
>> +memset(, 0, sizeof mask);
> 
> For your consideration: we can save some memset() calls if we moved them 
> inside the Eth case. 
> I like their current position for better readability.

Single place for clearing the whole structure seems better for me because
it protects from possible issues with uninitialized data.
Also clearing big place at once could allow compiler to choose vectorized
implementations. Most of structures are too small.

> 
>>
>>  /* Eth */
>> -struct rte_flow_item_eth eth_spec;
>> -struct rte_flow_item_eth eth_mask;
>>  if (!eth_addr_is_zero(match->wc.masks.dl_src) ||
>>  !eth_addr_is_zero(match->wc.masks.dl_dst)) {
>> -memset(_spec, 0, sizeof eth_spec);
>> -memset(_mask, 0, sizeof eth_mask);
>> -rte_memcpy(_spec.dst, >flow.dl_dst, sizeof
>> eth_spec.dst);
>> -rte_memcpy(_spec.src, >flow.dl_src, sizeof eth_spec.src);
>> -eth_spec.type = match->flow.dl_type;
>> -
>> -rte_memcpy(_mask.dst, >wc.masks.dl_dst,
>> -   sizeof eth_mask.dst);
>> -rte_memcpy(_mask.src, >wc.masks.dl_src,
>> -   sizeof eth_mask.src);
>> -eth_mask.type = match->wc.masks.dl_type;
>> +memcpy(, >flow.dl_dst, sizeof spec.eth.dst);
>> +memcpy(, >flow.dl_src, sizeof spec.eth.src);
>> +spec.eth.type = match->flow.dl_type;
>> +
>> +memcpy(, >wc.masks.dl_dst, sizeof
>> mask.eth.dst);
>> +memcpy(, >wc.masks.dl_src, sizeof
>> mask.eth.src);
>> +mask.eth.type = match->wc.masks.dl_type;
>>
>>  add_flow_pattern(, RTE_FLOW_ITEM_TYPE_ETH,
>> - _spec, _mask);
>> + , );
>>  } else {
>>  /*
>>   * If user specifies a flow (like UDP flow) without L2 patterns, @@ 
>> -
>> 4598,50 +4607,37 @@ netdev_dpdk_add_rte_flow_offload(struct netdev
>> *netdev,
>>  }
>>
>>  /* VLAN */
>> -struct rte_flow_item_vlan vlan_spec;
>> -struct rte_flow_item_vlan vlan_mask;
>>  if (match->wc.masks.vlans[0].tci && match->flow.vlans[0].tci) {
>> -memset(_spec, 0, sizeof vlan_spec);
>> -memset(_mask, 0, sizeof vlan_mask);
>> -vlan_spec.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
>> -vlan_mask.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
>> +spec.vlan.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
>> +mask.vlan.tci  = match->wc.masks.vlans[0].tci &
>> + ~htons(VLAN_CFI);
>>
>>  /* match any protocols */
>> -

Re: [ovs-dev] [PATCHv7 1/3] Improved Packet Drop Statistics in OVS

2019-02-06 Thread Ilya Maximets
Hi.
See comments inline.

Best regards, Ilya Maximets.

On 06.02.2019 7:01, Anju Thomas wrote:
> 
> Hi Ben/Ilya,
> 
> I have addressed the comments in the below patch. Can you tell me if this is 
> fin> 
> Regards
> Anju
> -Original Message-
> From: Anju Thomas [mailto:anju.tho...@ericsson.com] 
> Sent: Tuesday, January 29, 2019 5:21 PM
> To: d...@openvswitch.org
> Cc: Anju Thomas 
> Subject: [PATCHv7 1/3] Improved Packet Drop Statistics in OVS
> 
>Currently OVS maintains explicit packet drop/error counters only on port
>level. Packets that are dropped as part of normal OpenFlow processing are
>counted in flow stats of “drop” flows or as table misses in table stats.
>These can only be interpreted by controllers that know the semantics of
>the configured OpenFlow pipeline. Without that knowledge, it is impossible
>for an OVS user to obtain e.g. the total number of packets dropped due to
>OpenFlow rules.
> 
>Furthermore, there are numerous other reasons for which packets can be
>dropped by OVS slow path that are not related to the OpenFlow pipeline.
>The generated datapath flow entries include a drop action to avoid further
>expensive upcalls to the slow path, but subsequent packets dropped by the
>datapath are not accounted anywhere.
> 
>Finally, the datapath itself drops packets in certain error situations.
>Also, these drops are today not accounted for.
> 
>This makes it difficult for OVS users to monitor packet drop in an OVS
>instance and to alert a management system in case of a unexpected increase
>of such drops. Also OVS trouble-shooters face difficulties in analysing
>packet drops.
> 
>With this patch we implement following changes to address the issues
>mentioned above.
> 
>1. Identify and account all the silent packet drop scenarios
> 
>2. Display these drops in ovs-appctl coverage/show
> 
>A detailed presentation on this was presented at OvS conference 2017 and
>link for the corresponding presentation is available at:
> 
>
> https://www.slideshare.net/LF_OpenvSwitch/lfovs17troubleshooting-the-data-plane-in-ovs-82280329
> 
>Co-authored-by: Rohith Basavaraja 
>Co-authored-by: Keshav Gupta 
>Signed-off-by: Anju Thomas 
>Signed-off-by: Rohith Basavaraja 
>Signed-off-by: Keshav Gupta 


You still have this strange shift to the right by 3 spaces in commit-message.

> ---
>  datapath/linux/compat/include/linux/openvswitch.h |  1 +
>  lib/dpif-netdev.c | 39 ++-
>  lib/dpif.c|  7 ++
>  lib/dpif.h|  3 +
>  lib/netdev-dpdk.c |  4 ++
>  lib/odp-execute.c | 81 
> +++
>  lib/odp-execute.h |  2 +
>  lib/odp-util.c| 10 ++-
>  ofproto/ofproto-dpif-ipfix.c  |  1 +
>  ofproto/ofproto-dpif-sflow.c  |  1 +
>  ofproto/ofproto-dpif-xlate.c  | 31 +
>  ofproto/ofproto-dpif-xlate.h  |  4 ++
>  ofproto/ofproto-dpif.c|  8 +++
>  ofproto/ofproto-dpif.h|  8 ++-
>  tests/automake.mk |  3 +-
>  tests/dpif-netdev.at  | 10 +++
>  tests/ofproto-dpif.at |  4 +-
>  tests/testsuite.at|  1 +
>  tests/tunnel-push-pop.at  | 28 +++-
>  tests/tunnel.at   | 14 +++-
>  20 files changed, 248 insertions(+), 12 deletions(-)
> 
> diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
> b/datapath/linux/compat/include/linux/openvswitch.h
> index 9b087f1..92db378 100644
> --- a/datapath/linux/compat/include/linux/openvswitch.h
> +++ b/datapath/linux/compat/include/linux/openvswitch.h
> @@ -938,6 +938,7 @@ enum ovs_action_attr {
>   OVS_ACTION_ATTR_POP_NSH,  /* No argument. */
>   OVS_ACTION_ATTR_METER,/* u32 meter number. */
>   OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
> + OVS_ACTION_ATTR_DROP,

Comment about argument type required.

>  
>  #ifndef __KERNEL__
>   OVS_ACTION_ATTR_TUNNEL_PUSH,   /* struct ovs_action_push_tnl*/
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0f57e3f..c726463 
> 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -77,6 +77,7 @@
>  #include "unixctl.h"
>  #include "util.h"
>  #include "uuid.h"
> +#include "ofproto/ofproto-dpif-xlate.h"

This header not needed there.

>  
>  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>  
> @@ -100,6 +101,17 @@ enum { MAX_METERS = 65536 };/* Maximum number of 
> meters. */
>  enum { MAX_BANDS = 8 }; /* Maximum number of bands / meter. */
>  enum { N_METER_LOCKS = 64 };/* Maximum 

Re: [ovs-dev] [PATCH] netdev-dpdk: Use single struct/union for flow offload items.

2019-02-06 Thread Ophir Munk
Hi,
Please find comments inline.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org  boun...@openvswitch.org> On Behalf Of Ilya Maximets
> Sent: Wednesday, February 6, 2019 3:19 PM
> To: ovs-dev@openvswitch.org; Ian Stokes 
> Cc: Ilya Maximets ; Roni Bar Yanai
> 
> Subject: [ovs-dev] [PATCH] netdev-dpdk: Use single struct/union for flow
> offload items.
> 
> Having a single structure allows to simplify the code path and clear all the
> items at once (probably faster). This does not increase stack memory usage
> because all the L4 related items grouped in a union.
> 
> Changes:
>   - Memsets combined.
>   - 'ipv4_next_proto_mask' calculated at the top as we already know
> the address. We also do not need to check it before clearing.
>   - Group of 'if' statements for L4 protocols turned to a 'switch'.
> We can do that, because we don't have semi-local variables anymore.
>   - Eliminated 'end_proto_check' label. Not needed with 'switch'.
> 
> Additionally 'rte_memcpy' replaced with simple 'memcpy' as it makes no
> sense to use 'rte_memcpy' for 6 bytes.
> 
> Signed-off-by: Ilya Maximets 
> ---
>  lib/netdev-dpdk.c | 190 +++---
>  1 file changed, 79 insertions(+), 111 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> 26022a59c..79219f6ef 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -4564,28 +4564,37 @@ netdev_dpdk_add_rte_flow_offload(struct
> netdev *netdev,
>  struct flow_actions actions = { .actions = NULL, .cnt = 0 };
>  struct rte_flow *flow;
>  struct rte_flow_error error;
> -uint8_t *ipv4_next_proto_mask = NULL;
> +uint8_t proto = 0;
>  int ret = 0;
> +struct flow_items {
> +struct rte_flow_item_eth  eth;
> +struct rte_flow_item_vlan vlan;
> +struct rte_flow_item_ipv4 ipv4;
> +union {
> +struct rte_flow_item_tcp  tcp;
> +struct rte_flow_item_udp  udp;
> +struct rte_flow_item_sctp sctp;
> +struct rte_flow_item_icmp icmp;
> +};
> +} spec, mask;
> +uint8_t *ipv4_next_proto_mask = _proto_id;
> +

Please consider removing ipv4_next_porto_mask pointer assignment and later use 
directly mask.ipv4.hdr.next_proto_id.

> +memset(, 0, sizeof spec);
> +memset(, 0, sizeof mask);

For your consideration: we can save some memset() calls if we moved them inside 
the Eth case. 
I like their current position for better readability.

> 
>  /* Eth */
> -struct rte_flow_item_eth eth_spec;
> -struct rte_flow_item_eth eth_mask;
>  if (!eth_addr_is_zero(match->wc.masks.dl_src) ||
>  !eth_addr_is_zero(match->wc.masks.dl_dst)) {
> -memset(_spec, 0, sizeof eth_spec);
> -memset(_mask, 0, sizeof eth_mask);
> -rte_memcpy(_spec.dst, >flow.dl_dst, sizeof
> eth_spec.dst);
> -rte_memcpy(_spec.src, >flow.dl_src, sizeof eth_spec.src);
> -eth_spec.type = match->flow.dl_type;
> -
> -rte_memcpy(_mask.dst, >wc.masks.dl_dst,
> -   sizeof eth_mask.dst);
> -rte_memcpy(_mask.src, >wc.masks.dl_src,
> -   sizeof eth_mask.src);
> -eth_mask.type = match->wc.masks.dl_type;
> +memcpy(, >flow.dl_dst, sizeof spec.eth.dst);
> +memcpy(, >flow.dl_src, sizeof spec.eth.src);
> +spec.eth.type = match->flow.dl_type;
> +
> +memcpy(, >wc.masks.dl_dst, sizeof
> mask.eth.dst);
> +memcpy(, >wc.masks.dl_src, sizeof
> mask.eth.src);
> +mask.eth.type = match->wc.masks.dl_type;
> 
>  add_flow_pattern(, RTE_FLOW_ITEM_TYPE_ETH,
> - _spec, _mask);
> + , );
>  } else {
>  /*
>   * If user specifies a flow (like UDP flow) without L2 patterns, @@ -
> 4598,50 +4607,37 @@ netdev_dpdk_add_rte_flow_offload(struct netdev
> *netdev,
>  }
> 
>  /* VLAN */
> -struct rte_flow_item_vlan vlan_spec;
> -struct rte_flow_item_vlan vlan_mask;
>  if (match->wc.masks.vlans[0].tci && match->flow.vlans[0].tci) {
> -memset(_spec, 0, sizeof vlan_spec);
> -memset(_mask, 0, sizeof vlan_mask);
> -vlan_spec.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
> -vlan_mask.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
> +spec.vlan.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
> +mask.vlan.tci  = match->wc.masks.vlans[0].tci &
> + ~htons(VLAN_CFI);
> 
>  /* match any protocols */
> -vlan_mask.inner_type = 0;
> +mask.vlan.inner_type = 0;
> 
>  add_flow_pattern(, RTE_FLOW_ITEM_TYPE_VLAN,
> - _spec, _mask);
> + , );
>  }
> 
>  /* IP v4 */
> -uint8_t proto = 0;
> -struct rte_flow_item_ipv4 ipv4_spec;
> -struct rte_flow_item_ipv4 ipv4_mask;
>  if (match->flow.dl_type == htons(ETH_TYPE_IP)) {
> -memset(_spec, 0, sizeof ipv4_spec);
> -

Re: [ovs-dev] [RFC 1/3] dpdk: Unify passing of DPDK EAL arguments.

2019-02-06 Thread Aaron Conole
Ilya Maximets  writes:

> On 05.02.2019 23:19, Flavio Leitner wrote:
>> On Tue, Feb 05, 2019 at 06:43:08PM +0300, Ilya Maximets wrote:
>>> Hi Flavio.
>>> Thanks for taking a look.
>>>
>>> On 05.02.2019 15:38, Flavio Leitner wrote:

 Hi Ilya,

 Thanks for the patch and I think we knew about that pain when we
 exposed the very first parameter. I still remember Aaron struggling
 to find the best path forward. Time flies and here we are.

 The problem is that this change is not friendly from the community
 perspective to its users. That is like an exposed API which we should
 avoid break as much as possible.

 For instance, there are users (OpenStack) with support life cycle of
 5+ years that cannot keep up with this kind of change but they expect
 to be able to update to newer OVS.
>>>
>>> Sure, there are users that wants stable API that will never change.
>>> But this is really impossible in practice. I'm working with OpenStack
>>> too and will have to fixup deployment tools with these changes. BTW, from
>>> the whole OpenStack only few deployment projects (like TripleO) will need
>>> to make changes and these changes are not very big.
>> 
>> That's only part of the work. There will be work on QA, documentation
>> and even migration path from one to another. And we can't change the
>> past for existing deployments.
>
> Sure. But incompatible API changes are almost unavoidable for a young
> projects that wants to be better.
>
>> 
 One idea is to freeze whatever we have now and update the documentation
 to recommend the new way. We give like a couple OVS releases and only
 then ignore (or remove?) those parameters.
>>>
>>> Yes, In cover letter I proposed these patches to be applied one per release.
>>> And current (first) patch does not remove the functionality, only docs.
>>> Users still will be able to use old interface, but will have warnings
>>> in the log. In the next release cycle we'll start ignore the values
>>> while still printing the warnings. This should give enough time for 
>>> adaptation.
>>> If you feel that we need more time, we could apply the second patch to 2.14
>>> (or whatever number will be in 2 releases from now).
>> 
>> I don't think we should remove the docs if the parameters are there as
>> a first step. I mean, assume an existing deployment, there is a parameter
>> which might be in use but there is no documentation available. That
>> doesn't sound like a good user experience to me.
>
> Maybe we could save a man pages while removing the guides. There is no much
> information in Documentation/intro/install/dpdk.rst anyway.
>
>> 
>> On another hand, you could introduce the new interface and update the
>> docs to recommend using the new one because the old one will be removed
>> in the future. Warning messages come next, and then finally its removal.
>
> I'd prefer to have warning messages to be there right from the start to
> push users to migrate to the new interfaces as early as possible.
>
> How about this:
>
> First stage (apply now, release in 2.12):
>   - Introduce new interface 'dpdk-options'.
>   - Rewrite installation guide with new interface fully removing the old one.
>   - Add new interface to man pages (vswitch.xml) and mark all the old options
> as deprecated by adding something like: "Deprecated. 'dpdk-options' should
> be used instead. Will be ignored in the future."
>   - Add a runtime deprecation warning if old interface is in use.
>   - Ignore values of old knobs if 'dpdk-options' provided.
>
> Second stage (release in 2.13 or 2.14, could wait longer if required):
>   - Remove old interfaces wile keeping the warnings. (i.e. values always 
> ignored)
>   - Remove old knobs from the man pages.
>
> Third stage (optional):
>   - Remove warnings.
>
> So the main difference from the current patches is delaying removal of the man
> pages to the second stage.
>
>> 
>> 
 IMO in the end we are moving the problem from one place to another
 because even with a single string, OVS users will be caught off guard
 when DPDK changes. Yes, less pain to OVS community in the sense that
 we don't have to add/remove/deprecate stuff, but it is still a bad
 user experience regardless, which is not what OVS is known for.
>>>
>>> Unfortunately, DPDK was never user-friendly enough. It tries, but still not.
>> 
>> Agreed.
>> 
>>> We're keeping few sane defaults like providing default lcore and setting the
>>> socket-limit if needed. And we'll try to do that in the future. The thing
>>> this patch tries to eliminate is the dependency tracking between different
>>> EAL arguments, i.e. duplicating the work with rte_eal_init() and having too
>>> many configuration knobs with similar meanings what does not work at the
>>> same time.
>>>
>>> Right now there are no critical arguments that user must provide.
>>> So, IMHO, having special configs for them is really unnecessary.
>> 
>> Do you think the 

[ovs-dev] [PATCH] netdev-dpdk: Use single struct/union for flow offload items.

2019-02-06 Thread Ilya Maximets
Having a single structure allows to simplify the code path and
clear all the items at once (probably faster). This does not
increase stack memory usage because all the L4 related items
grouped in a union.

Changes:
  - Memsets combined.
  - 'ipv4_next_proto_mask' calculated at the top as we already know
the address. We also do not need to check it before clearing.
  - Group of 'if' statements for L4 protocols turned to a 'switch'.
We can do that, because we don't have semi-local variables anymore.
  - Eliminated 'end_proto_check' label. Not needed with 'switch'.

Additionally 'rte_memcpy' replaced with simple 'memcpy' as it makes no
sense to use 'rte_memcpy' for 6 bytes.

Signed-off-by: Ilya Maximets 
---
 lib/netdev-dpdk.c | 190 +++---
 1 file changed, 79 insertions(+), 111 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 26022a59c..79219f6ef 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -4564,28 +4564,37 @@ netdev_dpdk_add_rte_flow_offload(struct netdev *netdev,
 struct flow_actions actions = { .actions = NULL, .cnt = 0 };
 struct rte_flow *flow;
 struct rte_flow_error error;
-uint8_t *ipv4_next_proto_mask = NULL;
+uint8_t proto = 0;
 int ret = 0;
+struct flow_items {
+struct rte_flow_item_eth  eth;
+struct rte_flow_item_vlan vlan;
+struct rte_flow_item_ipv4 ipv4;
+union {
+struct rte_flow_item_tcp  tcp;
+struct rte_flow_item_udp  udp;
+struct rte_flow_item_sctp sctp;
+struct rte_flow_item_icmp icmp;
+};
+} spec, mask;
+uint8_t *ipv4_next_proto_mask = _proto_id;
+
+memset(, 0, sizeof spec);
+memset(, 0, sizeof mask);
 
 /* Eth */
-struct rte_flow_item_eth eth_spec;
-struct rte_flow_item_eth eth_mask;
 if (!eth_addr_is_zero(match->wc.masks.dl_src) ||
 !eth_addr_is_zero(match->wc.masks.dl_dst)) {
-memset(_spec, 0, sizeof eth_spec);
-memset(_mask, 0, sizeof eth_mask);
-rte_memcpy(_spec.dst, >flow.dl_dst, sizeof eth_spec.dst);
-rte_memcpy(_spec.src, >flow.dl_src, sizeof eth_spec.src);
-eth_spec.type = match->flow.dl_type;
-
-rte_memcpy(_mask.dst, >wc.masks.dl_dst,
-   sizeof eth_mask.dst);
-rte_memcpy(_mask.src, >wc.masks.dl_src,
-   sizeof eth_mask.src);
-eth_mask.type = match->wc.masks.dl_type;
+memcpy(, >flow.dl_dst, sizeof spec.eth.dst);
+memcpy(, >flow.dl_src, sizeof spec.eth.src);
+spec.eth.type = match->flow.dl_type;
+
+memcpy(, >wc.masks.dl_dst, sizeof mask.eth.dst);
+memcpy(, >wc.masks.dl_src, sizeof mask.eth.src);
+mask.eth.type = match->wc.masks.dl_type;
 
 add_flow_pattern(, RTE_FLOW_ITEM_TYPE_ETH,
- _spec, _mask);
+ , );
 } else {
 /*
  * If user specifies a flow (like UDP flow) without L2 patterns,
@@ -4598,50 +4607,37 @@ netdev_dpdk_add_rte_flow_offload(struct netdev *netdev,
 }
 
 /* VLAN */
-struct rte_flow_item_vlan vlan_spec;
-struct rte_flow_item_vlan vlan_mask;
 if (match->wc.masks.vlans[0].tci && match->flow.vlans[0].tci) {
-memset(_spec, 0, sizeof vlan_spec);
-memset(_mask, 0, sizeof vlan_mask);
-vlan_spec.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
-vlan_mask.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
+spec.vlan.tci  = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
+mask.vlan.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
 
 /* match any protocols */
-vlan_mask.inner_type = 0;
+mask.vlan.inner_type = 0;
 
 add_flow_pattern(, RTE_FLOW_ITEM_TYPE_VLAN,
- _spec, _mask);
+ , );
 }
 
 /* IP v4 */
-uint8_t proto = 0;
-struct rte_flow_item_ipv4 ipv4_spec;
-struct rte_flow_item_ipv4 ipv4_mask;
 if (match->flow.dl_type == htons(ETH_TYPE_IP)) {
-memset(_spec, 0, sizeof ipv4_spec);
-memset(_mask, 0, sizeof ipv4_mask);
-
-ipv4_spec.hdr.type_of_service = match->flow.nw_tos;
-ipv4_spec.hdr.time_to_live= match->flow.nw_ttl;
-ipv4_spec.hdr.next_proto_id   = match->flow.nw_proto;
-ipv4_spec.hdr.src_addr= match->flow.nw_src;
-ipv4_spec.hdr.dst_addr= match->flow.nw_dst;
-
-ipv4_mask.hdr.type_of_service = match->wc.masks.nw_tos;
-ipv4_mask.hdr.time_to_live= match->wc.masks.nw_ttl;
-ipv4_mask.hdr.next_proto_id   = match->wc.masks.nw_proto;
-ipv4_mask.hdr.src_addr= match->wc.masks.nw_src;
-ipv4_mask.hdr.dst_addr= match->wc.masks.nw_dst;
+spec.ipv4.hdr.type_of_service = match->flow.nw_tos;
+spec.ipv4.hdr.time_to_live= match->flow.nw_ttl;
+spec.ipv4.hdr.next_proto_id   = match->flow.nw_proto;
+

Re: [ovs-dev] [ovs-dev, v2] netdev-dpdk: Memset rte_flow_item on a need basis.

2019-02-06 Thread Ian Stokes

On 2/5/2019 1:18 PM, Ilya Maximets wrote:

On 05.02.2019 16:09, Asaf Penso wrote:



Regards,
Asaf Penso


-Original Message-
From: Ilya Maximets 
Sent: Tuesday, February 5, 2019 1:46 PM
To: Asaf Penso ; ovs-dev@openvswitch.org
Cc: Roni Bar Yanai ; Stokes, Ian
; Finn Christensen 
Subject: Re: [ovs-dev,v2] netdev-dpdk: Memset rte_flow_item on a need
basis.

On 04.02.2019 19:14, Asaf Penso wrote:

In netdev_dpdk_add_rte_flow_offload function different rte_flow_item
are created as part of the pattern matching.

For most of them, there is a check whether the wildcards are not zero.
In case of zero, nothing is being done with the rte_flow_item.

Befor the wildcard check, and regardless of the result, the
rte_flow_item is being memset.

The patch moves the memset function only if the condition of the
wildcard is true, thus saving cycles of memset if not needed.

Signed-off-by: Asaf Penso 
---


I thought about this part of code a bit. IMHO, we could create a union from
the L4 items and clear them all at once. Like this:

 uint8_t proto = 0;
 struct flow_items {
 struct rte_flow_item_eth  eth;
 struct rte_flow_item_vlan vlan;
 struct rte_flow_item_ipv4 ipv4;
 union {
 struct rte_flow_item_tcp  tcp;
 struct rte_flow_item_udp  udp;
 struct rte_flow_item_sctp sctp;
 struct rte_flow_item_icmp icmp;
 };
 } spec, mask;
 uint8_t *ipv4_next_proto_mask = _proto_id;

 memset(, 0, sizeof spec);
 memset(, 0, sizeof mask);


Ethernet is a common case, userspace datapath always has exact match on
vlan,
IPv4 is also the common case, I think. So current code in most cases we'll not
call memset only for few of L4 protocols which are in the union here and
does not eat extra memory.
Anyway we're not on a very hot path.

With that you'll be able to easily convert L4 proto checking 'if's to a single
'switch (proto)' statement and also clear the *ipv4_next_proto_mask
unconditionally.

What do you think ?


I fully agree and if you can have a separate patch it would be great.


OK. Good.
I'll prepare the patch as soon as this one will be merged.



Thanks for the patch Asaf, I've pushed this to master, I'll also add 
your name to the AUTHORS doc.


Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v5 2/2] conntrack: Fix max size for inet_ntop() call.

2019-02-06 Thread David Marchand
On Mon, Feb 4, 2019 at 6:56 PM Ben Pfaff  wrote:

> On Thu, Jan 31, 2019 at 11:35:41PM -0800, Darrell Ball wrote:
> > The call to inet_ntop() in repl_ftp_v6_addr() is 1 short to handle
> > the maximum possible V6 address size for v4 mapping case.
> >
> > Found by inspection.
> >
> > Fixes: bd5e81a0e596 ("Userspace Datapath: Add ALG infra and FTP.")
> > Signed-off-by: Darrell Ball 
>
> Thanks for the bug fix patches.  I applied these to master and
> backported to 2.10 and 2.11.  If they need to be backported further,
> please submit backported versions.
>

Thanks Ben.

Darrell, can these two fixes be backported to 2.9 ?
I can send the backports if you don't have the bandwidth.


-- 
David Marchand
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC 1/3] dpdk: Unify passing of DPDK EAL arguments.

2019-02-06 Thread Ilya Maximets
On 05.02.2019 23:19, Flavio Leitner wrote:
> On Tue, Feb 05, 2019 at 06:43:08PM +0300, Ilya Maximets wrote:
>> Hi Flavio.
>> Thanks for taking a look.
>>
>> On 05.02.2019 15:38, Flavio Leitner wrote:
>>>
>>> Hi Ilya,
>>>
>>> Thanks for the patch and I think we knew about that pain when we
>>> exposed the very first parameter. I still remember Aaron struggling
>>> to find the best path forward. Time flies and here we are.
>>>
>>> The problem is that this change is not friendly from the community
>>> perspective to its users. That is like an exposed API which we should
>>> avoid break as much as possible.
>>>
>>> For instance, there are users (OpenStack) with support life cycle of
>>> 5+ years that cannot keep up with this kind of change but they expect
>>> to be able to update to newer OVS.
>>
>> Sure, there are users that wants stable API that will never change.
>> But this is really impossible in practice. I'm working with OpenStack
>> too and will have to fixup deployment tools with these changes. BTW, from
>> the whole OpenStack only few deployment projects (like TripleO) will need
>> to make changes and these changes are not very big.
> 
> That's only part of the work. There will be work on QA, documentation
> and even migration path from one to another. And we can't change the
> past for existing deployments.

Sure. But incompatible API changes are almost unavoidable for a young
projects that wants to be better.

> 
>>> One idea is to freeze whatever we have now and update the documentation
>>> to recommend the new way. We give like a couple OVS releases and only
>>> then ignore (or remove?) those parameters.
>>
>> Yes, In cover letter I proposed these patches to be applied one per release.
>> And current (first) patch does not remove the functionality, only docs.
>> Users still will be able to use old interface, but will have warnings
>> in the log. In the next release cycle we'll start ignore the values
>> while still printing the warnings. This should give enough time for 
>> adaptation.
>> If you feel that we need more time, we could apply the second patch to 2.14
>> (or whatever number will be in 2 releases from now).
> 
> I don't think we should remove the docs if the parameters are there as
> a first step. I mean, assume an existing deployment, there is a parameter
> which might be in use but there is no documentation available. That
> doesn't sound like a good user experience to me.

Maybe we could save a man pages while removing the guides. There is no much
information in Documentation/intro/install/dpdk.rst anyway.

> 
> On another hand, you could introduce the new interface and update the
> docs to recommend using the new one because the old one will be removed
> in the future. Warning messages come next, and then finally its removal.

I'd prefer to have warning messages to be there right from the start to
push users to migrate to the new interfaces as early as possible.

How about this:

First stage (apply now, release in 2.12):
  - Introduce new interface 'dpdk-options'.
  - Rewrite installation guide with new interface fully removing the old one.
  - Add new interface to man pages (vswitch.xml) and mark all the old options
as deprecated by adding something like: "Deprecated. 'dpdk-options' should
be used instead. Will be ignored in the future."
  - Add a runtime deprecation warning if old interface is in use.
  - Ignore values of old knobs if 'dpdk-options' provided.

Second stage (release in 2.13 or 2.14, could wait longer if required):
  - Remove old interfaces wile keeping the warnings. (i.e. values always 
ignored)
  - Remove old knobs from the man pages.

Third stage (optional):
  - Remove warnings.

So the main difference from the current patches is delaying removal of the man
pages to the second stage.

> 
> 
>>> IMO in the end we are moving the problem from one place to another
>>> because even with a single string, OVS users will be caught off guard
>>> when DPDK changes. Yes, less pain to OVS community in the sense that
>>> we don't have to add/remove/deprecate stuff, but it is still a bad
>>> user experience regardless, which is not what OVS is known for.
>>
>> Unfortunately, DPDK was never user-friendly enough. It tries, but still not.
> 
> Agreed.
> 
>> We're keeping few sane defaults like providing default lcore and setting the
>> socket-limit if needed. And we'll try to do that in the future. The thing
>> this patch tries to eliminate is the dependency tracking between different
>> EAL arguments, i.e. duplicating the work with rte_eal_init() and having too
>> many configuration knobs with similar meanings what does not work at the
>> same time.
>>
>> Right now there are no critical arguments that user must provide.
>> So, IMHO, having special configs for them is really unnecessary.
> 
> Do you think the defaults work for the majority of DPDK users?

My personal experience is opposite. Most of my deployments and deployments that
I had to work with had massive