Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch

2021-07-14 Thread Pravin Shelar
On Wed, Jun 30, 2021 at 2:53 AM Mark Gray  wrote:
>
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
>
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
>
> * On systems with a large number of vports, there is a correspondingly
> large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
>
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
>
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
>
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
>
> The corresponding user space code can be found at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html
>
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
>
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * Fixed handling of userspace action case
>  * Renamed 'struct dp_portids'
>  * Fixed handling of return from kmalloc()
>  * Removed check for dispatch type from ovs_dp_get_upcall_portid()
>- Reworked based on Dan's comments:
>  * Fixed handling of return from kmalloc()
>- Reworked based on Pravin's comments:
>  * Fixed handling of userspace action case
>- Added kfree() in destroy_dp_rcu() to cleanup netlink port ids
>
Patch looks good to me. I have the following minor comments.

>  include/uapi/linux/openvswitch.h |  8 
>  net/openvswitch/actions.c|  6 ++-
>  net/openvswitch/datapath.c   | 70 +++-
>  net/openvswitch/datapath.h   | 20 +
>  4 files changed, 101 insertions(+), 3 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 8d16744edc31..6571b57b2268 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -70,6 +70,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * @OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
> * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
> @@ -87,6 +89,9 @@ enum ovs_datapath_attr {
> OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
> OVS_DP_ATTR_PAD,
> OVS_DP_ATTR_MASKS_CACHE_SIZE,
> +   OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls in per-cpu
> +* dispatch mode
> +*/
> __OVS_DP_ATTR_MAX
>  };
>
> @@ -127,6 +132,9 @@ struct ovs_vport_stats {
>  /* Allow tc offload recirc sharing */
>  #define OVS_DP_F_TC_RECIRC_SHARING (1 << 2)
>
> +/* Allow per-cpu dispatch of upcalls */
> +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU   (1 << 3)
> +
>  /* Fixed logical ports. */
>  #define OVSP_LOCAL  ((__u32)0)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index ef15d9eb4774..f79679746c62 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -924,7 +924,11 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
> break;
>
> case OVS_USERSPACE_ATTR_PID:
> -   upcall.portid = nla_get_u32(a);
> +   if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
> +   upcall.portid =
> +  ovs_dp_get_upcall_portid(dp, smp_processor_id());
> +   else
> +   upcall.portid = nla_get_u32(a);
>

Re: [ovs-dev] [PATCH v13 7/7] netdev-offload-tc: Add offload support for sFlow

2021-07-14 Thread 0-day Robot
Bleep bloop.  Greetings Chris Mi, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 81 characters long (recommended limit is 79)
#56 FILE: lib/netdev-offload-tc.c:1100:
nl_msg_put(buf, node->sflow.action, node->sflow.action->nla_len);

Lines checked: 503, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v13 5/7] dpif-offload-netlink: Implement dpif-offload-provider API

2021-07-14 Thread Chris Mi via dev
Implement dpif-offload API for netlink datapath.

Signed-off-by: Chris Mi 
Reviewed-by: Eli Britstein 
---
 lib/automake.mk |   1 +
 lib/dpif-netlink.c  |   2 +-
 lib/dpif-offload-netlink.c  | 210 
 lib/dpif-offload-provider.h |  12 +++
 4 files changed, 224 insertions(+), 1 deletion(-)
 create mode 100644 lib/dpif-offload-netlink.c

diff --git a/lib/automake.mk b/lib/automake.mk
index dc865b0ef..daa60c784 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -442,6 +442,7 @@ lib_libopenvswitch_la_SOURCES += \
lib/dpif-netlink.h \
lib/dpif-netlink-rtnl.c \
lib/dpif-netlink-rtnl.h \
+   lib/dpif-offload-netlink.c \
lib/if-notifier.c \
lib/netdev-linux.c \
lib/netdev-linux.h \
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 6a7defb95..676934ef1 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -4037,7 +4037,7 @@ const struct dpif_class dpif_netlink_class = {
 NULL,   /* bond_add */
 NULL,   /* bond_del */
 NULL,   /* bond_stats_get */
-NULL,   /* dpif_offload_api */
+&dpif_offload_netlink,
 };
 
 static int
diff --git a/lib/dpif-offload-netlink.c b/lib/dpif-offload-netlink.c
new file mode 100644
index 0..f02a6b0eb
--- /dev/null
+++ b/lib/dpif-offload-netlink.c
@@ -0,0 +1,210 @@
+/*
+ * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "dpif-offload-provider.h"
+#include "netdev-offload.h"
+#include "netlink-protocol.h"
+#include "netlink-socket.h"
+#include "openvswitch/vlog.h"
+
+VLOG_DEFINE_THIS_MODULE(dpif_offload_netlink);
+
+static struct nl_sock *psample_sock;
+static int psample_family;
+
+/* Receive psample netlink message and save the attributes. */
+struct offload_psample {
+struct nlattr *packet;  /* packet data */
+int dp_group_id;/* mapping id for sFlow offload */
+int iifindex;   /* input ifindex */
+int group_seq;  /* group sequence */
+};
+
+static void
+dpif_offload_netlink_init(void)
+{
+unsigned int psample_mcgroup;
+int err;
+
+if (!netdev_is_flow_api_enabled()) {
+return;
+}
+
+if (psample_sock) {
+return;
+}
+
+err = nl_lookup_genl_family(PSAMPLE_GENL_NAME,
+&psample_family);
+if (err) {
+VLOG_INFO("%s: Generic Netlink family '%s' does not exist. "
+  "Please make sure the kernel module psample is loaded",
+  __func__, PSAMPLE_GENL_NAME);
+return;
+}
+
+err = nl_lookup_genl_mcgroup(PSAMPLE_GENL_NAME,
+ PSAMPLE_NL_MCGRP_SAMPLE_NAME,
+ &psample_mcgroup);
+if (err) {
+VLOG_INFO("%s: Failed to join multicast group '%s' for Generic "
+  "Netlink family '%s'", __func__,
+  PSAMPLE_NL_MCGRP_SAMPLE_NAME,
+  PSAMPLE_GENL_NAME);
+return;
+}
+
+err = nl_sock_create(NETLINK_GENERIC, &psample_sock);
+if (err) {
+VLOG_INFO("%s: Failed to create psample socket", __func__);
+return;
+}
+
+err = nl_sock_join_mcgroup(psample_sock, psample_mcgroup);
+if (err) {
+VLOG_INFO("%s: Failed to join psample mcgroup", __func__);
+nl_sock_destroy(psample_sock);
+return;
+}
+}
+
+static void
+dpif_offload_netlink_uninit(void)
+{
+if (!netdev_is_flow_api_enabled()) {
+return;
+}
+
+if (!psample_sock) {
+return;
+}
+
+nl_sock_destroy(psample_sock);
+psample_sock = NULL;
+}
+
+static void
+dpif_offload_netlink_sflow_recv_wait(void)
+{
+if (psample_sock) {
+nl_sock_wait(psample_sock, POLLIN);
+}
+}
+
+static int
+psample_from_ofpbuf(struct offload_psample *psample,
+const struct ofpbuf *buf)
+{
+static const struct nl_policy ovs_psample_policy[] = {
+[PSAMPLE_ATTR_IIFINDEX] = { .type = NL_A_U16 },
+[PSAMPLE_ATTR_SAMPLE_GROUP] = { .type = NL_A_U32 },
+[PSAMPLE_ATTR_GROUP_SEQ] = { .type = NL_A_U32 },
+[PSAMPLE_ATTR_DATA] = { .type = NL_A_UNSPEC },
+};
+struct nlattr *a[ARRAY_SIZE(ovs_psample_policy)];
+struct genlmsghdr *genl;
+struct nlmsghdr *nlmsg;
+struct 

[ovs-dev] [PATCH v13 7/7] netdev-offload-tc: Add offload support for sFlow

2021-07-14 Thread Chris Mi via dev
Create a unique group ID to map the sFlow info when offloading sFlow
action to TC. When showing the offloaded datapath flows, translate the
group ID from TC sample action to sFlow info using the mapping.

Signed-off-by: Chris Mi 
Reviewed-by: Eli Britstein 
---
 NEWS|   1 +
 lib/netdev-offload-tc.c | 211 +---
 lib/tc.c|  61 +++-
 lib/tc.h|  15 ++-
 4 files changed, 271 insertions(+), 17 deletions(-)

diff --git a/NEWS b/NEWS
index 6cdccc715..7c0361e18 100644
--- a/NEWS
+++ b/NEWS
@@ -50,6 +50,7 @@ Post-v2.15.0
- OVS now reports the datapath capability 'ct_zero_snat', which reflects
  whether the SNAT with all-zero IP address is supported.
  See ovs-vswitchd.conf.db(5) for details.
+   - Add sFlow offload support for kernel (netlink) datapath.
 
 
 v2.15.0 - 15 Feb 2021
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 2f16cf279..b68b8df28 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -20,6 +20,7 @@
 #include 
 
 #include "dpif.h"
+#include "dpif-offload-provider.h"
 #include "hash.h"
 #include "openvswitch/hmap.h"
 #include "openvswitch/match.h"
@@ -1087,6 +1088,18 @@ parse_tc_flower_to_match(struct tc_flower *flower,
 action = flower->actions;
 for (i = 0; i < flower->action_count; i++, action++) {
 switch (action->type) {
+case TC_ACT_SAMPLE: {
+const struct sgid_node *node;
+
+node = sgid_find(action->sample.group_id);
+if (!node) {
+VLOG_ERR_RL(&error_rl, "%s: sgid node is NULL, sgid: %d",
+__func__, action->sample.group_id);
+return ENOENT;
+}
+nl_msg_put(buf, node->sflow.action, node->sflow.action->nla_len);
+}
+break;
 case TC_ACT_VLAN_POP: {
 nl_msg_put_flag(buf, OVS_ACTION_ATTR_POP_VLAN);
 }
@@ -1825,6 +1838,156 @@ parse_match_ct_state_to_flower(struct tc_flower *flower, struct match *match)
 }
 }
 
+static int
+parse_userspace_attributes(const struct nlattr *actions,
+   struct dpif_sflow_attr *sflow_attr)
+{
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+const struct nlattr *nla;
+unsigned int left;
+
+NL_NESTED_FOR_EACH_UNSAFE (nla, left, actions) {
+if (nl_attr_type(nla) == OVS_USERSPACE_ATTR_USERDATA) {
+struct user_action_cookie *cookie;
+
+cookie = CONST_CAST(struct user_action_cookie *, nl_attr_get(nla));
+if (cookie->type == USER_ACTION_COOKIE_SFLOW) {
+sflow_attr->userdata = CONST_CAST(void *, nl_attr_get(nla));
+sflow_attr->userdata_len = nl_attr_get_size(nla);
+return 0;
+}
+}
+}
+
+VLOG_DBG_RL(&rl, "%s: cannot offload userspace action other than sFlow",
+__func__);
+return EOPNOTSUPP;
+}
+
+static int
+parse_sample_actions_attribute(const struct nlattr *actions,
+   struct dpif_sflow_attr *sflow_attr)
+{
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+const struct nlattr *nla;
+unsigned int left;
+int err = EINVAL;
+
+NL_NESTED_FOR_EACH_UNSAFE (nla, left, actions) {
+if (nl_attr_type(nla) == OVS_ACTION_ATTR_USERSPACE) {
+err = parse_userspace_attributes(nla, sflow_attr);
+} else {
+/* We can't offload other nested actions */
+VLOG_DBG_RL(&rl, "%s: can only offload "
+"OVS_ACTION_ATTR_USERSPACE attribute", __func__);
+return EINVAL;
+}
+}
+
+if (err) {
+VLOG_ERR_RL(&error_rl, "%s: no OVS_ACTION_ATTR_USERSPACE attribute",
+__func__);
+}
+return err;
+}
+
+static int
+parse_sample_action(struct tc_flower *flower, struct tc_action *tc_action,
+const struct nlattr *sample_action,
+const struct flow_tnl *tnl, uint32_t *group_id,
+const ovs_u128 *ufid)
+{
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+struct dpif_sflow_attr sflow_attr;
+const struct nlattr *nla;
+unsigned int left;
+int ret = EINVAL;
+
+if (*group_id) {
+VLOG_ERR_RL(&error_rl, "%s: Only a single TC_SAMPLE action "
+"per flow is supported", __func__);
+return EOPNOTSUPP;
+}
+
+memset(&sflow_attr, 0, sizeof sflow_attr);
+sflow_attr.ufid = *ufid;
+sflow_attr.action = sample_action;
+
+if (flower->tunnel) {
+sflow_attr.tunnel = CONST_CAST(struct flow_tnl *, tnl);
+}
+
+NL_NESTED_FOR_EACH_UNSAFE (nla, left, sample_action) {
+if (nl_attr_type(nla) == OVS_SAMPLE_ATTR_ACTIONS) {
+ret = parse_sample_actions_attribute(nla, &sflow_attr);
+} else if (nl_attr_type(nla) 

[ovs-dev] [PATCH v13 3/7] dpif-offload-provider: Introduce dpif-offload-provider layer

2021-07-14 Thread Chris Mi via dev
Some offload actions require functionality that is dpif based rather
than netdev based. For example, the sFlow action requires creating a
psample netlink socket to receive the sampled packets from TC or the
kernel driver.

Create dpif-offload-provider layer to support such actions.

Signed-off-by: Chris Mi 
Reviewed-by: Eli Britstein 
---
 lib/automake.mk |  2 ++
 lib/dpif-netdev.c   |  1 +
 lib/dpif-netlink.c  |  2 ++
 lib/dpif-offload-provider.h | 34 +
 lib/dpif-offload.c  | 43 +
 lib/dpif-provider.h |  8 ++-
 lib/dpif.c  | 10 +
 7 files changed, 99 insertions(+), 1 deletion(-)
 create mode 100644 lib/dpif-offload-provider.h
 create mode 100644 lib/dpif-offload.c

diff --git a/lib/automake.mk b/lib/automake.mk
index 3c9523c1a..dc865b0ef 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -123,6 +123,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev-private.h \
lib/dpif-netdev-perf.c \
lib/dpif-netdev-perf.h \
+   lib/dpif-offload.c \
+   lib/dpif-offload-provider.h \
lib/dpif-provider.h \
lib/dpif.c \
lib/dpif.h \
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 610949f36..35d73542b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -8269,6 +8269,7 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_bond_add,
 dpif_netdev_bond_del,
 dpif_netdev_bond_stats_get,
+NULL,   /* dpif_offload_api */
 };
 
 static void
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 39dc8300e..6a7defb95 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -34,6 +34,7 @@
 
 #include "bitmap.h"
 #include "dpif-netlink-rtnl.h"
+#include "dpif-offload-provider.h"
 #include "dpif-provider.h"
 #include "fat-rwlock.h"
 #include "flow.h"
@@ -4036,6 +4037,7 @@ const struct dpif_class dpif_netlink_class = {
 NULL,   /* bond_add */
 NULL,   /* bond_del */
 NULL,   /* bond_stats_get */
+NULL,   /* dpif_offload_api */
 };
 
 static int
diff --git a/lib/dpif-offload-provider.h b/lib/dpif-offload-provider.h
new file mode 100644
index 0..97108402a
--- /dev/null
+++ b/lib/dpif-offload-provider.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef DPIF_OFFLOAD_PROVIDER_H
+#define DPIF_OFFLOAD_PROVIDER_H
+
+struct dpif;
+struct dpif_offload_sflow;
+
+struct dpif_offload_api {
+void (*init)(void);
+void (*uninit)(void);
+void (*sflow_recv_wait)(void);
+int (*sflow_recv)(struct dpif_offload_sflow *sflow);
+};
+
+void dpif_offload_sflow_recv_wait(const struct dpif *dpif);
+int dpif_offload_sflow_recv(const struct dpif *dpif,
+struct dpif_offload_sflow *sflow);
+
+#endif /* DPIF_OFFLOAD_PROVIDER_H */
diff --git a/lib/dpif-offload.c b/lib/dpif-offload.c
new file mode 100644
index 0..842e05798
--- /dev/null
+++ b/lib/dpif-offload.c
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+
+#include "dpif-provider.h"
+
+void
+dpif_offload_sflow_recv_wait(const struct dpif *dpif)
+{
+const struct dpif_offload_api *offload_api = dpif->dpif_class->offload_api;
+
+if (offload_api && offload_api->sflow_recv_wait) {
+offload_api->sflow_recv_wait();
+}
+}
+
+int
+dpif_offload_sflow_recv(const struct dpif *dpif,
+struct dpif_offload_sflow *sflow)
+{
+const struct dpif_offload_api *offload_api = dpif->dpif_class->offload_api;
+
+if (offload_api && offload_api->sflow_recv) {
+return offload_api->sflow_recv(sflow);
+}
+
+return 

[ovs-dev] [PATCH v13 6/7] ofproto: Introduce API to process sFlow offload packet

2021-07-14 Thread Chris Mi via dev
Process sFlow offload packet in handler thread if handler id is 0.

Signed-off-by: Chris Mi 
Reviewed-by: Eli Britstein 
---
 ofproto/ofproto-dpif-upcall.c | 57 +++
 1 file changed, 57 insertions(+)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index ccf97266c..7e934614d 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -22,6 +22,7 @@
 #include "connmgr.h"
 #include "coverage.h"
 #include "cmap.h"
+#include "lib/dpif-offload-provider.h"
 #include "lib/dpif-provider.h"
 #include "dpif.h"
 #include "openvswitch/dynamic-string.h"
@@ -742,6 +743,51 @@ udpif_get_n_flows(struct udpif *udpif)
 return flow_count;
 }
 
+static void
+process_offload_sflow(struct udpif *udpif, struct dpif_offload_sflow *sflow)
+{
+const struct dpif_sflow_attr *attr = sflow->attr;
+struct user_action_cookie *cookie;
+struct dpif_sflow *dpif_sflow;
+struct ofproto_dpif *ofproto;
+struct upcall upcall;
+uint32_t iifindex;
+struct flow flow;
+
+if (!attr) {
+VLOG_WARN_RL(&rl, "%s: dpif_sflow_attr is NULL", __func__);
+return;
+}
+
+cookie = attr->userdata;
+ofproto = ofproto_dpif_lookup_by_uuid(&cookie->ofproto_uuid);
+if (!ofproto) {
+VLOG_WARN_RL(&rl, "%s: could not find ofproto", __func__);
+return;
+}
+
+dpif_sflow = ofproto->sflow;
+if (!dpif_sflow) {
+VLOG_WARN_RL(&rl, "%s: could not find dpif_sflow", __func__);
+return;
+}
+
+memset(&flow, 0, sizeof flow);
+if (attr->tunnel) {
+memcpy(&flow.tunnel, attr->tunnel, sizeof flow.tunnel);
+}
+iifindex = sflow->iifindex;
+flow.in_port.odp_port = netdev_ifindex_to_odp_port(iifindex);
+memset(&upcall, 0, sizeof upcall);
+upcall.flow = &flow;
+upcall.cookie = *cookie;
+upcall.packet = &sflow->packet;
+upcall.sflow = dpif_sflow;
+upcall.ufid = &sflow->attr->ufid;
+upcall.type = SFLOW_UPCALL;
+process_upcall(udpif, &upcall, NULL, NULL);
+}
+
 /* The upcall handler thread tries to read a batch of UPCALL_MAX_BATCH
  * upcalls from dpif, processes the batch and installs corresponding flows
  * in dpif. */
@@ -756,8 +802,19 @@ udpif_upcall_handler(void *arg)
 poll_immediate_wake();
 } else {
 dpif_recv_wait(udpif->dpif, handler->handler_id);
+dpif_offload_sflow_recv_wait(udpif->dpif);
latch_wait(&udpif->exit_latch);
 }
+/* Only handler id 0 thread process sFlow offload packet. */
+if (handler->handler_id == 0) {
+struct dpif_offload_sflow sflow;
+int err;
+
+err = dpif_offload_sflow_recv(udpif->dpif, &sflow);
+if (!err) {
+process_offload_sflow(udpif, &sflow);
+}
+}
 poll_block();
 }
 
-- 
2.21.0



[ovs-dev] [PATCH v13 4/7] netdev-offload-tc: Introduce group ID management API

2021-07-14 Thread Chris Mi via dev
When offloading sample action to TC, userspace creates a unique ID
to map sFlow action and tunnel info and passes this ID to kernel instead
of the sFlow info. psample will send this ID and sampled packet to
userspace. Using the ID, userspace can recover the sFlow info and send
sampled packet to the right sFlow monitoring host.

Signed-off-by: Chris Mi 
Reviewed-by: Eli Britstein 
---
 lib/dpif-offload-provider.h |  18 +++
 lib/netdev-offload-tc.c | 272 +++-
 lib/netdev-offload.h|   1 +
 lib/tc.h|   1 +
 4 files changed, 290 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-offload-provider.h b/lib/dpif-offload-provider.h
index 97108402a..b765eb9a2 100644
--- a/lib/dpif-offload-provider.h
+++ b/lib/dpif-offload-provider.h
@@ -17,9 +17,27 @@
 #ifndef DPIF_OFFLOAD_PROVIDER_H
 #define DPIF_OFFLOAD_PROVIDER_H
 
+#include "netlink-protocol.h"
+#include "openvswitch/packets.h"
+#include "openvswitch/types.h"
+
 struct dpif;
 struct dpif_offload_sflow;
 
+/* When offloading sample action, userspace creates a unique ID to map
+ * sFlow action and tunnel info and passes this ID to datapath instead
+ * of the sFlow info. Datapath will send this ID and sampled packet to
+ * userspace. Using the ID, userspace can recover the sFlow info and send
+ * sampled packet to the right sFlow monitoring host.
+ */
+struct dpif_sflow_attr {
+const struct nlattr *action; /* sFlow action */
+void *userdata;  /* struct user_action_cookie */
+size_t userdata_len; /* struct user_action_cookie length */
+struct flow_tnl *tunnel; /* tunnel info */
+ovs_u128 ufid;   /* flow ufid */
+};
+
 struct dpif_offload_api {
 void (*init)(void);
 void (*uninit)(void);
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 9845e8d3f..2f16cf279 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -40,6 +40,7 @@
 #include "unaligned.h"
 #include "util.h"
 #include "dpif-provider.h"
+#include "cmap.h"
 
 VLOG_DEFINE_THIS_MODULE(netdev_offload_tc);
 
@@ -62,6 +63,262 @@ struct chain_node {
 uint32_t chain;
 };
 
+/* This maps a psample group ID to struct dpif_sflow_attr for sFlow */
+struct sgid_node {
+struct ovs_list exp_node OVS_GUARDED;
+struct cmap_node metadata_node;
+struct cmap_node id_node;
+struct ovs_refcount refcount;
+uint32_t hash;
+uint32_t id;
+const struct dpif_sflow_attr sflow;
+};
+
+static struct ovs_rwlock sgid_rwlock = OVS_RWLOCK_INITIALIZER;
+
+static long long int sgid_last_run OVS_GUARDED_BY(sgid_rwlock);
+
+static struct cmap sgid_map = CMAP_INITIALIZER;
+static struct cmap sgid_metadata_map = CMAP_INITIALIZER;
+
+static struct ovs_list sgid_expiring OVS_GUARDED_BY(sgid_rwlock)
+= OVS_LIST_INITIALIZER(&sgid_expiring);
+static struct ovs_list sgid_expired OVS_GUARDED_BY(sgid_rwlock)
+= OVS_LIST_INITIALIZER(&sgid_expired);
+
+static uint32_t next_sample_group_id OVS_GUARDED_BY(sgid_rwlock) = 1;
+
+#define SGID_RUN_INTERVAL   250 /* msec */
+
+static void
+sgid_node_free(struct sgid_node *node)
+{
+free(node->sflow.tunnel);
+free(CONST_CAST(void *, node->sflow.action));
+free(node->sflow.userdata);
+free(node);
+}
+
+static void
+sgid_cleanup(void)
+{
+long long int now = time_msec();
+struct sgid_node *node;
+
+/* Do maintenance at most 4 times / sec. */
+ovs_rwlock_rdlock(&sgid_rwlock);
+if (now - sgid_last_run < SGID_RUN_INTERVAL) {
+ovs_rwlock_unlock(&sgid_rwlock);
+return;
+}
+ovs_rwlock_unlock(&sgid_rwlock);
+
+ovs_rwlock_wrlock(&sgid_rwlock);
+sgid_last_run = now;
+
+LIST_FOR_EACH_POP (node, exp_node, &sgid_expired) {
+cmap_remove(&sgid_map, &node->id_node, node->id);
+ovsrcu_postpone(sgid_node_free, node);
+}
+
+if (!ovs_list_is_empty(&sgid_expiring)) {
+/* 'sgid_expired' is now empty, move nodes in
+ * 'sgid_expiring' to it. */
+ovs_list_splice(&sgid_expired,
+ovs_list_front(&sgid_expiring),
+&sgid_expiring);
+}
+ovs_rwlock_unlock(&sgid_rwlock);
+}
+
+/* Lockless RCU protected lookup.  If node is needed across RCU quiescent
+ * state, caller should copy the contents. */
+static const struct sgid_node *
+sgid_find(uint32_t id)
+{
+const struct cmap_node *node = cmap_find(&sgid_map, id);
+
+return node ? CONTAINER_OF(node, const struct sgid_node, id_node) : NULL;
+}
+
+const struct dpif_sflow_attr *
+dpif_offload_sflow_attr_find(uint32_t id)
+{
+const struct sgid_node *node;
+
+node = sgid_find(id);
+if (!node) {
+return NULL;
+}
+
+return &node->sflow;
+}
+
+static uint32_t
+dpif_sflow_attr_hash(const struct dpif_sflow_attr *sflow)
+{
+return hash_bytes(&sflow->ufid, sizeof sflow->ufid, 0);
+}
+
+static bool
+dpif_sflow_attr_equal(const struct dpif_sflow_attr *a,
+  const struct dpif_sflow_attr *b)
+{
+return ovs_u128_equals(a->ufid, b->ufid);
+}
+
+/* Lockless RCU 

[ovs-dev] [PATCH v13 1/7] compat: Add psample and tc sample action defines for older kernels

2021-07-14 Thread Chris Mi via dev
Update kernel UAPI to support psample and the tc sample action.

Signed-off-by: Chris Mi 
Reviewed-by: Eli Britstein 
Acked-by: Eelco Chaudron 
---
 include/linux/automake.mk|  4 ++-
 include/linux/psample.h  | 62 
 include/linux/tc_act/tc_sample.h | 25 +
 3 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/psample.h
 create mode 100644 include/linux/tc_act/tc_sample.h

diff --git a/include/linux/automake.mk b/include/linux/automake.mk
index 8f063f482..c48d9699a 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automake.mk
@@ -7,4 +7,6 @@ noinst_HEADERS += \
include/linux/tc_act/tc_skbedit.h \
include/linux/tc_act/tc_tunnel_key.h \
include/linux/tc_act/tc_vlan.h \
-   include/linux/tc_act/tc_ct.h
+   include/linux/tc_act/tc_ct.h \
+   include/linux/tc_act/tc_sample.h \
+   include/linux/psample.h
diff --git a/include/linux/psample.h b/include/linux/psample.h
new file mode 100644
index 0..e585db5bf
--- /dev/null
+++ b/include/linux/psample.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_PSAMPLE_H
+#define __UAPI_PSAMPLE_H
+
+enum {
+   PSAMPLE_ATTR_IIFINDEX,
+   PSAMPLE_ATTR_OIFINDEX,
+   PSAMPLE_ATTR_ORIGSIZE,
+   PSAMPLE_ATTR_SAMPLE_GROUP,
+   PSAMPLE_ATTR_GROUP_SEQ,
+   PSAMPLE_ATTR_SAMPLE_RATE,
+   PSAMPLE_ATTR_DATA,
+   PSAMPLE_ATTR_GROUP_REFCOUNT,
+   PSAMPLE_ATTR_TUNNEL,
+
+   PSAMPLE_ATTR_PAD,
+   PSAMPLE_ATTR_OUT_TC,/* u16 */
+   PSAMPLE_ATTR_OUT_TC_OCC,/* u64, bytes */
+   PSAMPLE_ATTR_LATENCY,   /* u64, nanoseconds */
+   PSAMPLE_ATTR_TIMESTAMP, /* u64, nanoseconds */
+   PSAMPLE_ATTR_PROTO, /* u16 */
+
+   __PSAMPLE_ATTR_MAX
+};
+
+enum psample_command {
+   PSAMPLE_CMD_SAMPLE,
+   PSAMPLE_CMD_GET_GROUP,
+   PSAMPLE_CMD_NEW_GROUP,
+   PSAMPLE_CMD_DEL_GROUP,
+};
+
+enum psample_tunnel_key_attr {
+   PSAMPLE_TUNNEL_KEY_ATTR_ID, /* be64 Tunnel ID */
+   PSAMPLE_TUNNEL_KEY_ATTR_IPV4_SRC,   /* be32 src IP address. */
+   PSAMPLE_TUNNEL_KEY_ATTR_IPV4_DST,   /* be32 dst IP address. */
+   PSAMPLE_TUNNEL_KEY_ATTR_TOS,/* u8 Tunnel IP ToS. */
+   PSAMPLE_TUNNEL_KEY_ATTR_TTL,/* u8 Tunnel IP TTL. */
+   PSAMPLE_TUNNEL_KEY_ATTR_DONT_FRAGMENT,  /* No argument, set DF. */
+   PSAMPLE_TUNNEL_KEY_ATTR_CSUM,   /* No argument. CSUM packet. */
+   PSAMPLE_TUNNEL_KEY_ATTR_OAM,/* No argument. OAM frame. */
+   PSAMPLE_TUNNEL_KEY_ATTR_GENEVE_OPTS,/* Array of Geneve options. */
+   PSAMPLE_TUNNEL_KEY_ATTR_TP_SRC, /* be16 src Transport Port. */
+   PSAMPLE_TUNNEL_KEY_ATTR_TP_DST, /* be16 dst Transport Port. */
+   PSAMPLE_TUNNEL_KEY_ATTR_VXLAN_OPTS, /* Nested VXLAN opts* */
+   PSAMPLE_TUNNEL_KEY_ATTR_IPV6_SRC,   /* struct in6_addr src IPv6 address. */
+   PSAMPLE_TUNNEL_KEY_ATTR_IPV6_DST,   /* struct in6_addr dst IPv6 address. */
+   PSAMPLE_TUNNEL_KEY_ATTR_PAD,
+   PSAMPLE_TUNNEL_KEY_ATTR_ERSPAN_OPTS,/* struct erspan_metadata */
+   PSAMPLE_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE,   /* No argument. IPV4_INFO_BRIDGE mode. */
+   __PSAMPLE_TUNNEL_KEY_ATTR_MAX
+};
+
+/* Can be overridden at runtime by module option */
+#define PSAMPLE_ATTR_MAX (__PSAMPLE_ATTR_MAX - 1)
+
+#define PSAMPLE_NL_MCGRP_CONFIG_NAME "config"
+#define PSAMPLE_NL_MCGRP_SAMPLE_NAME "packets"
+#define PSAMPLE_GENL_NAME "psample"
+#define PSAMPLE_GENL_VERSION 1
+
+#endif
diff --git a/include/linux/tc_act/tc_sample.h b/include/linux/tc_act/tc_sample.h
new file mode 100644
index 0..fee1bcc20
--- /dev/null
+++ b/include/linux/tc_act/tc_sample.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __LINUX_TC_SAMPLE_H
+#define __LINUX_TC_SAMPLE_H
+
+#include 
+#include 
+#include 
+
+struct tc_sample {
+   tc_gen;
+};
+
+enum {
+   TCA_SAMPLE_UNSPEC,
+   TCA_SAMPLE_TM,
+   TCA_SAMPLE_PARMS,
+   TCA_SAMPLE_RATE,
+   TCA_SAMPLE_TRUNC_SIZE,
+   TCA_SAMPLE_PSAMPLE_GROUP,
+   TCA_SAMPLE_PAD,
+   __TCA_SAMPLE_MAX
+};
+#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
+
+#endif
-- 
2.21.0



[ovs-dev] [PATCH v13 0/7] Add offload support for sFlow

2021-07-14 Thread Chris Mi via dev
This patch set adds offload support for sFlow.

Psample is a genetlink channel for packet sampling. TC action act_sample
uses psample to send sampled packets to userspace.

When offloading sample action to TC, userspace creates a unique ID to
map sFlow action and tunnel info and passes this ID to kernel instead
of the sFlow info. psample will send this ID and sampled packet to
userspace. Using the ID, userspace can recover the sFlow info and send
sampled packet to the right sFlow monitoring host.

v2-v1:
- Fix robot errors.
v3-v2:
- Remove Gerrit Change-Id.
- Add patch #9 to fix older kernels build issue.
- Add travis test result.
v4-v3:
- Fix offload issue when sampling rate is 1.
v5-v4:
- Move polling thread from ofproto to netdev-offload-tc.
v6-v5:
- Rebase.
- Add GitHub Actions test result.
v7-v6:
- Remove Gerrit Change-Id.
- Fix "ERROR: Inappropriate spacing around cast"
v8-v7
- Address Eelco Chaudron's comment for patch #11.
v9-v8
- Remove sflow_len from struct dpif_sflow_attr.
- Log a debug message for other userspace actions.
v10-v9
- Address Eelco Chaudron's comments on v9.
v11-v10
- Fix a bracing error.
v12-v11
- Add duplicate sample group id check.
v13-v12
- Remove the psample poll thread from netdev-offload-tc and reuse
  ofproto handler thread according to Ilya's new design.
- Add dpif-offload-provider layer according to Eli's suggestion.

Chris Mi (7):
  compat: Add psample and tc sample action defines for older kernels
  ovs-kmod-ctl: Load kernel module psample
  dpif-offload-provider: Introduce dpif-offload-provider layer
  netdev-offload-tc: Introduce group ID management API
  dpif-offload-netlink: Implement dpif-offload-provider API
  ofproto: Introduce API to process sFlow offload packet
  netdev-offload-tc: Add offload support for sFlow

 NEWS |   1 +
 include/linux/automake.mk|   4 +-
 include/linux/psample.h  |  62 
 include/linux/tc_act/tc_sample.h |  25 ++
 lib/automake.mk  |   3 +
 lib/dpif-netdev.c|   1 +
 lib/dpif-netlink.c   |   2 +
 lib/dpif-offload-netlink.c   | 210 ++
 lib/dpif-offload-provider.h  |  64 +
 lib/dpif-offload.c   |  43 +++
 lib/dpif-provider.h  |   8 +-
 lib/dpif.c   |  10 +
 lib/netdev-offload-tc.c  | 475 ++-
 lib/netdev-offload.h |   1 +
 lib/tc.c |  61 +++-
 lib/tc.h |  16 +-
 ofproto/ofproto-dpif-upcall.c|  57 
 utilities/ovs-kmod-ctl.in|  14 +
 18 files changed, 1040 insertions(+), 17 deletions(-)
 create mode 100644 include/linux/psample.h
 create mode 100644 include/linux/tc_act/tc_sample.h
 create mode 100644 lib/dpif-offload-netlink.c
 create mode 100644 lib/dpif-offload-provider.h
 create mode 100644 lib/dpif-offload.c

-- 
2.21.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v13 2/7] ovs-kmod-ctl: Load kernel module psample

2021-07-14 Thread Chris Mi via dev
Load kernel module psample to receive sampled packets from TC.
Before removing kernel module psample, remove act_sample first.

Signed-off-by: Chris Mi 
Reviewed-by: Eli Britstein 
Acked-by: Eelco Chaudron 
---
 utilities/ovs-kmod-ctl.in | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/utilities/ovs-kmod-ctl.in b/utilities/ovs-kmod-ctl.in
index 19f100964..6fa945a83 100644
--- a/utilities/ovs-kmod-ctl.in
+++ b/utilities/ovs-kmod-ctl.in
@@ -28,6 +28,14 @@ for dir in "$sbindir" "$bindir" /sbin /bin /usr/sbin /usr/bin; do
 done
 
 insert_mods () {
+# Try loading psample kernel module.
+modinfo psample > /dev/null 2>&1
+if test $? = 0; then
+action "Inserting psample module" modprobe psample
+else
+log_warning_msg "No psample module, can't offload sFlow action"
+fi
+
 # Try loading openvswitch kernel module.
 action "Inserting openvswitch module" modprobe openvswitch
 }
@@ -95,6 +103,12 @@ remove_kmods() {
 if test -e /sys/module/vxlan; then
 action "Forcing removal of vxlan module" rmmod vxlan
 fi
+if test -e /sys/module/act_sample; then
+action "Forcing removal of act_sample module" rmmod act_sample
+fi
+if test -e /sys/module/psample; then
+action "Forcing removal of psample module" rmmod psample
+fi
 }
 
 usage () {
-- 
2.21.0
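
The removal ordering enforced above matters because act_sample holds a
reference on psample, so act_sample must be removed first. A minimal Python
sketch of that dependency-ordered unload logic (hypothetical helper names,
not part of OVS):

```python
import os

# act_sample takes a reference on psample, so it must be removed first.
UNLOAD_ORDER = ["act_sample", "psample"]

def module_loaded(name, sysfs="/sys/module"):
    """Mirror the shell's `test -e /sys/module/<name>` check."""
    return os.path.isdir(os.path.join(sysfs, name))

def plan_removals(loaded):
    """Return the rmmod commands to run, respecting dependency order."""
    return ["rmmod " + m for m in UNLOAD_ORDER if m in loaded]

removals = plan_removals({"act_sample", "psample", "openvswitch"})
```

Running the plan for a host with both modules loaded yields the same order
the shell script above uses: act_sample, then psample.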



Re: [ovs-dev] [v12 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Amber, Kumar
Hi Eelco,

Yeah, I missed one comment below, but all have been fixed; waiting for more
reviews on this patch before sending a new one.

> -Original Message-
> From: Eelco Chaudron 
> Sent: Wednesday, July 14, 2021 8:43 PM
> To: Amber, Kumar 
> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
> Haaren, Harry ; Ferriter, Cian
> ; Stokes, Ian 
> Subject: Re: [v12 06/11] dpif-netdev: Add packet count and core id paramters
> for study
> 
> 
> 
> On 14 Jul 2021, at 16:14, kumar Amber wrote:
> 
> > From: Kumar Amber 
> >
> > This commit introduces an additional command line parameter for the mfex
> > study function. If the user provides an additional packet count, it is used
> > in study as the minimum number of packets which must be processed; otherwise
> > a default value is chosen.
> > Also introduces a third parameter for choosing a particular pmd core.
> >
> > $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
> >
> > Signed-off-by: Kumar Amber 
> >
> > ---
> > v12:
> > - re-work the set command to sweep
> > - include fixes to study.c and doc changes
> > v11:
> > - include comments from Eelco
> > - reworked set command as per discussion
> > v10:
> > - fix review comments Eelco
> > v9:
> > - fix review comments Flavio
> > v7:
> > - change the command parameters for core_id and study_pkt_cnt
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > - introduce pmd core id parameter
> > ---
> > ---
> >  Documentation/topics/dpdk/bridge.rst |  38 +-
> >  lib/dpif-netdev-extract-study.c  |  30 -
> >  lib/dpif-netdev-private-extract.h|   9 ++
> >  lib/dpif-netdev.c| 178 ++-
> >  4 files changed, 222 insertions(+), 33 deletions(-)
> >
> > diff --git a/Documentation/topics/dpdk/bridge.rst
> > b/Documentation/topics/dpdk/bridge.rst
> > index a47153495..8c500c504 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -284,12 +284,46 @@ command also shows whether the CPU supports
> each implementation ::
> >
> >  An implementation can be selected manually by the following command ::
> >
> > -$ ovs-appctl dpif-netdev/miniflow-parser-set study
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
> > + [study_cnt]
> > +
> > +The above command has two optional parameters: study_cnt and core_id.
> > > +The core_id parameter pins a particular miniflow extract function to the
> > > +pmd thread running on that core. The study_cnt parameter, which is
> > > +specific to study and ignored by other implementations, specifies how
> > > +many packets must be studied before choosing the best implementation.
> >
> >  Also user can select the study implementation which studies the
> > traffic for  a specific number of packets by applying all available
> > implementations of  miniflow extract and then chooses the one with the
> > most optimal result for -that traffic pattern.
> > > +that traffic pattern. The user can optionally provide a packet count
> > +[study_cnt] parameter which is the minimum number of packets that OVS
> > +must study before choosing an optimal implementation. If no packet
> > +count is provided, then the default value, 128 is chosen. Also, as
> > +there is no synchronization point between threads, one PMD thread
> > +might still be running a previous round, and can now decide on earlier 
> > data.
> > +
> > > +The packet count is a global value, and parallel study executions
> > +with differing packet counts will use the most recent count value provided 
> > by
> user.
> > +
> > +Study can be selected with packet count by the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> > +
> > +Study can be selected with packet count and explicit PMD selection by
> > +the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
> > +
> > +In the above command the first parameter is the CORE ID of the PMD
> > +thread and this can also be used to explicitly set the miniflow
> > +extraction function pointer on different PMD threads.
> > +
> > +Scalar can be selected on core 3 by the following command where study
> > +count should not be provided for any implementation other than study
> > +::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
> >
> >  Miniflow Extract Validation
> >  ~~~
> > diff --git a/lib/dpif-netdev-extract-study.c
> > b/lib/dpif-netdev-extract-study.c index 02b709f8b..083f940c2 100644
> > --- a/lib/dpif-netdev-extract-study.c
> > +++ b/lib/dpif-netdev-extract-study.c
> > @@ -25,8 +25,7 @@
> >
> >  VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
> >
> > -static atomic_uint32_t mfex_study_pkts_count = 0;
> > -
> > +static atomic_uint32_t  mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
> 
> You still did not remove the space you introduced (see last couple of patches
> with the same request)?
> 
> You accidentally 

Re: [ovs-dev] [PATCH ovn branch-21.06] Don't suppress localport traffic directed to external port

2021-07-14 Thread 0-day Robot
Bleep bloop.  Greetings Ihar Hrachyshka, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Numan Siddique 
Lines checked: 355, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


Re: [ovs-dev] [v12 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-14 Thread Amber, Kumar
Hi Eelco,




From: Eelco Chaudron 
Sent: Wednesday, July 14, 2021 8:26 PM
To: Amber, Kumar 
Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van Haaren, 
Harry ; Ferriter, Cian ; 
Stokes, Ian 
Subject: Re: [v12 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator


On 14 Jul 2021, at 16:14, kumar Amber wrote:

From: Kumar Amber <kumar.am...@intel.com>

Tests:
6: OVS-DPDK - MFEX Autovalidator
7: OVS-DPDK - MFEX Autovalidator Fuzzy

Added a new directory to store the PCAP file used
in the tests and a script to generate the fuzzy traffic
type pcap to be used in fuzzy unit test.

Signed-off-by: Kumar Amber <kumar.am...@intel.com>
Acked-by: Flavio Leitner <f...@sysclose.org>
---
v12:
- change skip parameter for unit test
v11:
- fix comments from Eelco
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove sleep from first test and added minor 5 sec sleep to fuzzy
---
---
Documentation/topics/dpdk/bridge.rst | 48 
tests/.gitignore | 1 +
tests/automake.mk | 6 +++
tests/mfex_fuzzy.py | 33 +
tests/pcap/mfex_test.pcap | Bin 0 -> 416 bytes
tests/system-dpdk.at | 53 +++
6 files changed, 141 insertions(+)
create mode 100755 tests/mfex_fuzzy.py
create mode 100644 tests/pcap/mfex_test.pcap

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 8c500c504..2f065422b 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -346,3 +346,51 @@ A compile time option is available in order to test it 
with the OVS unit
test suite. Use the following configure option ::

$ ./configure --enable-mfex-default-autovalidator
+
+Unit Test Miniflow Extract
+++++++++++++++++++++++++++
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+ make check-dpdk TESTSUITEFLAGS='-k MFEX'
+ OVS-DPDK - MFEX Autovalidator
+
+The unit test uses multiple traffic types to test the correctness of the
+implementations.
+
+Running Fuzzy test with Autovalidator
++++++++++++++++++++++++++++++++++++++
+
+Fuzzy tests can also be done on miniflow extract with the help of
+auto-validator and Scapy. The steps below describe how to reproduce
+the setup, with IP fields being fuzzed to generate packets.
+
+Scapy is used to create fuzzy IP packets and save them into a PCAP ::
+
+ pkt = fuzz(Ether()/IP()/TCP())
+
+Set the miniflow extract to autovalidator using ::
+
+ $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+OVS is configured to receive the generated packets ::
+
+ $ ovs-vsctl add-port br0 pcap0 -- \
+ set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
+ "rx_pcap=fuzzy.pcap"
+
+With this workflow, the autovalidator will ensure that all MFEX
+implementations are classifying each packet in exactly the same way.
+If an optimized MFEX implementation causes a different miniflow to be
+generated, the autovalidator has ovs_assert and logging statements that
+will inform about the issue.
+
+Unit Fuzzy test with Autovalidator
+++++++++++++++++++++++++++++++++++
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+ make check-dpdk TESTSUITEFLAGS='-k MFEX'
+ OVS-DPDK - MFEX Autovalidator Fuzzy
diff --git a/tests/.gitignore b/tests/.gitignore
index 45b4f67b2..a3d927e5d 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -11,6 +11,7 @@
/ovsdb-cluster-testsuite
/ovsdb-cluster-testsuite.dir/
/ovsdb-cluster-testsuite.log
+/pcap/
/pki/
/system-afxdp-testsuite
/system-afxdp-testsuite.dir/
diff --git a/tests/automake.mk b/tests/automake.mk
index f45f8d76c..a6c15ba55 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
echo "TEST_FUZZ_REGRESSION([$$basename])"; \
done > $@.tmp && mv $@.tmp $@

+EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
+MFEX_AUTOVALIDATOR_TESTS = \
+ tests/pcap/mfex_test.pcap \
+ tests/mfex_fuzzy.py
+
OVSDB_CLUSTER_TESTSUITE_AT = \
tests/ovsdb-cluster-testsuite.at \
tests/ovsdb-execution.at \
@@ -512,6 +517,7 @@ tests_test_type_props_SOURCES = tests/test-type-props.c
CHECK_PYFILES = \
tests/appctl.py \
tests/flowgen.py \
+ tests/mfex_fuzzy.py \
tests/ovsdb-monitor-sort.py \
tests/test-daemon.py \
tests/test-json.py \
diff --git a/tests/mfex_fuzzy.py b/tests/mfex_fuzzy.py
new file mode 100755
index 0..5b056bb48
--- /dev/null
+++ b/tests/mfex_fuzzy.py
@@ -0,0 +1,33 @@
+#!/usr/bin/python3
+try:
+ from scapy.all import RandMAC, RandIP, PcapWriter, RandIP6, RandShort, fuzz
+ from scapy.all import IPv6, Dot1Q, IP, Ether, UDP, TCP
+except ModuleNotFoundError as err:
+ print(str(err) + ": Scapy")
+import sys
+
+path = str(sys.argv[1]) + "/pcap/fuzzy.pcap"
+pktdump = PcapWriter(path, append=False, sync=True)
+
+for i in range(0, 2000):
+
+ # Generate 
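
The script above is truncated in the archive and depends on Scapy. As a
Scapy-free illustration of the same idea, the following stdlib-only Python
sketch (hypothetical, not part of the patch) writes randomized Ethernet/IPv4
frames into a pcap file by hand:

```python
import random
import struct
import tempfile

def random_eth_ipv4_frame():
    """Build a minimal Ethernet + IPv4 header with randomized addresses."""
    src_mac = bytes(random.randrange(256) for _ in range(6))
    dst_mac = bytes(random.randrange(256) for _ in range(6))
    eth = dst_mac + src_mac + struct.pack("!H", 0x0800)  # EtherType: IPv4
    # Version/IHL, TOS, total length, ID, flags/frag, TTL, proto (UDP),
    # checksum (left 0 for the sketch), source IP, destination IP.
    ip = struct.pack("!BBHHHBBHII", 0x45, 0, 20, random.getrandbits(16),
                     0, 64, 17, 0, random.getrandbits(32),
                     random.getrandbits(32))
    return eth + ip  # 14 + 20 = 34 bytes

def write_pcap(path, frames):
    """Write frames in the classic libpcap savefile format."""
    with open(path, "wb") as f:
        # magic, version 2.4, tz offset, sigfigs, snaplen, linktype Ethernet
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
        for ts, frame in enumerate(frames):
            # per-record header: ts_sec, ts_usec, incl_len, orig_len
            f.write(struct.pack("<IIII", ts, 0, len(frame), len(frame)))
            f.write(frame)

frames = [random_eth_ipv4_frame() for _ in range(100)]
out = tempfile.NamedTemporaryFile(suffix=".pcap", delete=False).name
write_pcap(out, frames)
```

The resulting file can be fed to a pcap-reading vdev just like the Scapy
output, though real fuzzing would also randomize ports and payloads.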

Re: [ovs-dev] [PATCH v4 ovn] Don't suppress localport traffic directed to external port

2021-07-14 Thread Ihar Hrachyshka
On Wed, Jul 14, 2021 at 8:07 PM Numan Siddique  wrote:
>
> On Wed, Jul 14, 2021 at 4:59 PM Ihar Hrachyshka  wrote:
> >
> > Recently, we stopped leaking localport traffic through localnet ports
> > into fabric to avoid unnecessary flipping between chassis hosting the
> > same localport.
> >
> > Despite the type name, in some scenarios localports are supposed to
> > talk outside the hosting chassis. Specifically, in OpenStack [1]
> > metadata service for SR-IOV ports is implemented as a localport hosted
> > on another chassis that is exposed to the chassis owning the SR-IOV
> > port through an "external" port. In this case, "leaking" localport
> > traffic into fabric is desirable.
> >
> > This patch inserts a higher priority flow per external port on the
> > same datapath that avoids dropping localport traffic.
> >
> > Fixes: 96959e56d634 ("physical: do not forward traffic from localport
> > to a localnet one")
> >
> > [1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html
> >
> > Signed-off-by: Ihar Hrachyshka 
>
> Thanks Ihar for v4.
>
> All the tests pass now.  The patch LGTM.  I applied to the main branch.
> I also added the "Reported-at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1974062" tag in
> the commit message.
>
> Does this need a backport ?  Looks like to me.  I tried backporting
> but the added test case fails.
> On 21.06 we don't have the pflow_output and lflow_output separation.
> That could be the reason
> it is failing.  If the fix is required, can you please take a look and
> submit the patch for branch-21.06.
>

I've sent a backport to 21.06 that returns false from
binding_handle_port_binding_changes when an external port is deleted. This
fixes the test failure.

Let me know if it's too hacky.

Thanks.
Ihar



[ovs-dev] [PATCH ovn branch-21.06] Don't suppress localport traffic directed to external port

2021-07-14 Thread Ihar Hrachyshka
Recently, we stopped leaking localport traffic through localnet ports
into fabric to avoid unnecessary flipping between chassis hosting the
same localport.

Despite the type name, in some scenarios localports are supposed to
talk outside the hosting chassis. Specifically, in OpenStack [1]
metadata service for SR-IOV ports is implemented as a localport hosted
on another chassis that is exposed to the chassis owning the SR-IOV
port through an "external" port. In this case, "leaking" localport
traffic into fabric is desirable.

This patch inserts a higher priority flow per external port on the
same datapath that avoids dropping localport traffic.

This backport returns false from binding_handle_port_binding_changes
on external port delete to enforce physical flow recalculation. This
fixes the test case.

Fixes: 96959e56d634 ("physical: do not forward traffic from localport
to a localnet one")

[1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1974062

Signed-off-by: Ihar Hrachyshka 
Signed-off-by: Numan Siddique 
(cherry picked from commit 1148580290d0ace803f20aeaa0241dd51c100630)
---
 controller/binding.c| 39 +++--
 controller/ovn-controller.c |  2 +
 controller/ovn-controller.h |  2 +
 controller/physical.c   | 46 
 tests/ovn.at| 85 +
 5 files changed, 170 insertions(+), 4 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index 594babc98..1c648fc17 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -108,6 +108,7 @@ add_local_datapath__(struct ovsdb_idl_index 
*sbrec_datapath_binding_by_key,
 hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
 ld->datapath = datapath;
 ld->localnet_port = NULL;
+shash_init(&ld->external_ports);
 ld->has_local_l3gateway = has_local_l3gateway;
 
 if (tracked_datapaths) {
@@ -474,6 +475,18 @@ is_network_plugged(const struct sbrec_port_binding 
*binding_rec,
 return network ? !!shash_find_data(bridge_mappings, network) : false;
 }
 
+static void
+update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
+ struct hmap *local_datapaths)
+{
+struct local_datapath *ld = get_local_datapath(
+local_datapaths, binding_rec->datapath->tunnel_key);
+if (ld) {
+shash_replace(&ld->external_ports, binding_rec->logical_port,
+  binding_rec);
+}
+}
+
 static void
 update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
 struct shash *bridge_mappings,
@@ -1631,8 +1644,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
 !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
 
 struct ovs_list localnet_lports = OVS_LIST_INITIALIZER(&localnet_lports);
+struct ovs_list external_lports = OVS_LIST_INITIALIZER(&external_lports);
 
-struct localnet_lport {
+struct lport {
 struct ovs_list list_node;
 const struct sbrec_port_binding *pb;
 };
@@ -1680,11 +1694,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
 
 case LP_EXTERNAL:
 consider_external_lport(pb, b_ctx_in, b_ctx_out);
+struct lport *ext_lport = xmalloc(sizeof *ext_lport);
+ext_lport->pb = pb;
+ovs_list_push_back(&external_lports, &ext_lport->list_node);
 break;
 
 case LP_LOCALNET: {
 consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
-struct localnet_lport *lnet_lport = xmalloc(sizeof *lnet_lport);
+struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
 lnet_lport->pb = pb;
 ovs_list_push_back(&localnet_lports, &lnet_lport->list_node);
 break;
@@ -1711,7 +1728,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
 /* Run through each localnet lport list to see if it is a localnet port
  * on local datapaths discovered from above loop, and update the
  * corresponding local datapath accordingly. */
-struct localnet_lport *lnet_lport;
+struct lport *lnet_lport;
 LIST_FOR_EACH_POP (lnet_lport, list_node, &localnet_lports) {
 update_ld_localnet_port(lnet_lport->pb, &bridge_mappings,
 b_ctx_out->egress_ifaces,
@@ -1719,6 +1736,15 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
 free(lnet_lport);
 }
 
+/* Run through external lport list to see if these are external ports
+ * on local datapaths discovered from above loop, and update the
+ * corresponding local datapath accordingly. */
+struct lport *ext_lport;
+LIST_FOR_EACH_POP (ext_lport, list_node, &external_lports) {
+update_ld_external_ports(ext_lport->pb, b_ctx_out->local_datapaths);
+free(ext_lport);
+}
+
 shash_destroy(&bridge_mappings);
 
 if 
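
The bookkeeping added to binding.c above keeps, for each local datapath, a
map from logical port name to its external port binding, replacing stale
entries as bindings change. A minimal Python sketch of that bookkeeping
(illustrative names only; the real code uses OVN's shash and hmap
structures):

```python
class LocalDatapath:
    """Stand-in for OVN's local_datapath: tracks external ports by name."""

    def __init__(self, tunnel_key):
        self.tunnel_key = tunnel_key
        self.external_ports = {}   # logical port name -> port binding

local_datapaths = {}  # datapath tunnel key -> LocalDatapath

def update_ld_external_ports(binding, datapath_key):
    """Mirror of update_ld_external_ports(): record the binding only if the
    datapath is local to this chassis, replacing any stale entry."""
    ld = local_datapaths.get(datapath_key)
    if ld is not None:
        ld.external_ports[binding["logical_port"]] = binding

local_datapaths[7] = LocalDatapath(7)
update_ld_external_ports({"logical_port": "ext0", "mac": "aa:bb:cc:dd:ee:ff"}, 7)
update_ld_external_ports({"logical_port": "ext1"}, 42)  # not local: ignored
```

physical.c can then iterate each local datapath's `external_ports` to insert
the higher-priority flows that keep localport traffic flowing toward them.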

Re: [ovs-dev] [PATCH v4 ovn] Don't suppress localport traffic directed to external port

2021-07-14 Thread Ihar Hrachyshka
On Wed, Jul 14, 2021 at 8:07 PM Numan Siddique  wrote:
>
> On Wed, Jul 14, 2021 at 4:59 PM Ihar Hrachyshka  wrote:
> >
> > Recently, we stopped leaking localport traffic through localnet ports
> > into fabric to avoid unnecessary flipping between chassis hosting the
> > same localport.
> >
> > Despite the type name, in some scenarios localports are supposed to
> > talk outside the hosting chassis. Specifically, in OpenStack [1]
> > metadata service for SR-IOV ports is implemented as a localport hosted
> > on another chassis that is exposed to the chassis owning the SR-IOV
> > port through an "external" port. In this case, "leaking" localport
> > traffic into fabric is desirable.
> >
> > This patch inserts a higher priority flow per external port on the
> > same datapath that avoids dropping localport traffic.
> >
> > Fixes: 96959e56d634 ("physical: do not forward traffic from localport
> > to a localnet one")
> >
> > [1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html
> >
> > Signed-off-by: Ihar Hrachyshka 
>
> Thanks Ihar for v4.
>
> All the tests pass now.  The patch LGTM.  I applied to the main branch.
> I also added the "Reported-at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1974062" tag in
> the commit message.

Thanks!!

>
> Does this need a backport ?  Looks like to me.  I tried backporting
> but the added test case fails.
> On 21.06 we don't have the pflow_output and lflow_output separation.
> That could be the reason
> it is failing.  If the fix is required, can you please take a look and
> submit the patch for branch-21.06.

I'll take a look and propose a backport, hopefully today.

>
> Regards
> Numan
>
> >
> > --
> >
> > v1: initial version.
> > v2: fixed code for unbound external ports.
> > v2: rebased.
> > v3: optimize external ports iteration.
> > v3: rate limit error message on mac address parse failure.
> > v4: fixed several memory leaks on local_datapaths cleanup.
> > v4: properly clean up flows for deleted external ports.
> > v4: test that external port created after localnet works.
> > v4: test that external port deleted doesn't pass traffic.
> > ---
> >  controller/binding.c| 35 +--
> >  controller/ovn-controller.c |  2 +
> >  controller/ovn-controller.h |  2 +
> >  controller/physical.c   | 46 
> >  tests/ovn.at| 85 +
> >  5 files changed, 167 insertions(+), 3 deletions(-)
> >
> > diff --git a/controller/binding.c b/controller/binding.c
> > index 70bf13390..d50f3affa 100644
> > --- a/controller/binding.c
> > +++ b/controller/binding.c
> > @@ -108,6 +108,7 @@ add_local_datapath__(struct ovsdb_idl_index 
> > *sbrec_datapath_binding_by_key,
> >  hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
> >  ld->datapath = datapath;
> >  ld->localnet_port = NULL;
> > +shash_init(&ld->external_ports);
> >  ld->has_local_l3gateway = has_local_l3gateway;
> >
> >  if (tracked_datapaths) {
> > @@ -474,6 +475,18 @@ is_network_plugged(const struct sbrec_port_binding 
> > *binding_rec,
> >  return network ? !!shash_find_data(bridge_mappings, network) : false;
> >  }
> >
> > +static void
> > +update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
> > + struct hmap *local_datapaths)
> > +{
> > +struct local_datapath *ld = get_local_datapath(
> > +local_datapaths, binding_rec->datapath->tunnel_key);
> > +if (ld) {
> > +shash_replace(&ld->external_ports, binding_rec->logical_port,
> > +  binding_rec);
> > +}
> > +}
> > +
> >  static void
> >  update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
> >  struct shash *bridge_mappings,
> > @@ -1657,8 +1670,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> > binding_ctx_out *b_ctx_out)
> >  !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
> >
> >  struct ovs_list localnet_lports = OVS_LIST_INITIALIZER(&localnet_lports);
> > +struct ovs_list external_lports = OVS_LIST_INITIALIZER(&external_lports);
> >
> > -struct localnet_lport {
> > +struct lport {
> >  struct ovs_list list_node;
> >  const struct sbrec_port_binding *pb;
> >  };
> > @@ -1713,11 +1727,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> > binding_ctx_out *b_ctx_out)
> >
> >  case LP_EXTERNAL:
> >  consider_external_lport(pb, b_ctx_in, b_ctx_out);
> > +struct lport *ext_lport = xmalloc(sizeof *ext_lport);
> > +ext_lport->pb = pb;
> > +ovs_list_push_back(&external_lports, &ext_lport->list_node);
> >  break;
> >
> >  case LP_LOCALNET: {
> >  consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
> > -struct localnet_lport *lnet_lport = xmalloc(sizeof 
> > *lnet_lport);
> > +struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
> >  lnet_lport->pb = pb;
> >  

Re: [ovs-dev] [PATCH ovn] Fix compilation error for m32.

2021-07-14 Thread Numan Siddique
On Wed, Jul 14, 2021 at 7:47 PM Ben Pfaff  wrote:
>
> On Wed, Jul 14, 2021 at 06:23:15PM -0400, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > Fixes: 895e02ec0be6("ovn-sbctl.c Add logical flows count numbers")
> > Signed-off-by: Numan Siddique 
>
> Acked-by: Ben Pfaff 

Thanks for the review.  I applied the patch.

Numan



Re: [ovs-dev] [PATCH v4 ovn] Don't suppress localport traffic directed to external port

2021-07-14 Thread Numan Siddique
On Wed, Jul 14, 2021 at 4:59 PM Ihar Hrachyshka  wrote:
>
> Recently, we stopped leaking localport traffic through localnet ports
> into fabric to avoid unnecessary flipping between chassis hosting the
> same localport.
>
> Despite the type name, in some scenarios localports are supposed to
> talk outside the hosting chassis. Specifically, in OpenStack [1]
> metadata service for SR-IOV ports is implemented as a localport hosted
> on another chassis that is exposed to the chassis owning the SR-IOV
> port through an "external" port. In this case, "leaking" localport
> traffic into fabric is desirable.
>
> This patch inserts a higher priority flow per external port on the
> same datapath that avoids dropping localport traffic.
>
> Fixes: 96959e56d634 ("physical: do not forward traffic from localport
> to a localnet one")
>
> [1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html
>
> Signed-off-by: Ihar Hrachyshka 

Thanks Ihar for v4.

All the tests pass now.  The patch LGTM.  I applied to the main branch.
I also added the "Reported-at:
https://bugzilla.redhat.com/show_bug.cgi?id=1974062" tag in
the commit message.

Does this need a backport ?  Looks like to me.  I tried backporting
but the added test case fails.
On 21.06 we don't have the pflow_output and lflow_output separation.
That could be the reason
it is failing.  If the fix is required, can you please take a look and
submit the patch for branch-21.06.

Regards
Numan

>
> --
>
> v1: initial version.
> v2: fixed code for unbound external ports.
> v2: rebased.
> v3: optimize external ports iteration.
> v3: rate limit error message on mac address parse failure.
> v4: fixed several memory leaks on local_datapaths cleanup.
> v4: properly clean up flows for deleted external ports.
> v4: test that external port created after localnet works.
> v4: test that external port deleted doesn't pass traffic.
> ---
>  controller/binding.c| 35 +--
>  controller/ovn-controller.c |  2 +
>  controller/ovn-controller.h |  2 +
>  controller/physical.c   | 46 
>  tests/ovn.at| 85 +
>  5 files changed, 167 insertions(+), 3 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 70bf13390..d50f3affa 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -108,6 +108,7 @@ add_local_datapath__(struct ovsdb_idl_index 
> *sbrec_datapath_binding_by_key,
>  hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
>  ld->datapath = datapath;
>  ld->localnet_port = NULL;
> +shash_init(&ld->external_ports);
>  ld->has_local_l3gateway = has_local_l3gateway;
>
>  if (tracked_datapaths) {
> @@ -474,6 +475,18 @@ is_network_plugged(const struct sbrec_port_binding 
> *binding_rec,
>  return network ? !!shash_find_data(bridge_mappings, network) : false;
>  }
>
> +static void
> +update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
> + struct hmap *local_datapaths)
> +{
> +struct local_datapath *ld = get_local_datapath(
> +local_datapaths, binding_rec->datapath->tunnel_key);
> +if (ld) {
> +shash_replace(&ld->external_ports, binding_rec->logical_port,
> +  binding_rec);
> +}
> +}
> +
>  static void
>  update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
>  struct shash *bridge_mappings,
> @@ -1657,8 +1670,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> binding_ctx_out *b_ctx_out)
>  !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
>
>  struct ovs_list localnet_lports = OVS_LIST_INITIALIZER(&localnet_lports);
> +struct ovs_list external_lports = OVS_LIST_INITIALIZER(&external_lports);
>
> -struct localnet_lport {
> +struct lport {
>  struct ovs_list list_node;
>  const struct sbrec_port_binding *pb;
>  };
> @@ -1713,11 +1727,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> binding_ctx_out *b_ctx_out)
>
>  case LP_EXTERNAL:
>  consider_external_lport(pb, b_ctx_in, b_ctx_out);
> +struct lport *ext_lport = xmalloc(sizeof *ext_lport);
> +ext_lport->pb = pb;
> +ovs_list_push_back(&external_lports, &ext_lport->list_node);
>  break;
>
>  case LP_LOCALNET: {
>  consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
> -struct localnet_lport *lnet_lport = xmalloc(sizeof *lnet_lport);
> +struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
>  lnet_lport->pb = pb;
>  ovs_list_push_back(&localnet_lports, &lnet_lport->list_node);
>  break;
> @@ -1744,7 +1761,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> binding_ctx_out *b_ctx_out)
>  /* Run through each localnet lport list to see if it is a localnet port
>   * on local datapaths discovered from above loop, and update the
>   * corresponding local datapath accordingly. 

Re: [ovs-dev] [PATCH v3 ovn] Don't suppress localport traffic directed to external port

2021-07-14 Thread Numan Siddique
On Wed, Jul 14, 2021 at 4:59 PM Ihar Hrachyshka  wrote:
>
> On Wed, Jul 14, 2021 at 11:21 AM Numan Siddique  wrote:
> >
> > On Tue, Jul 13, 2021 at 8:40 PM Ihar Hrachyshka  wrote:
> > >
> > > Recently, we stopped leaking localport traffic through localnet ports
> > > into fabric to avoid unnecessary flipping between chassis hosting the
> > > same localport.
> > >
> > > Despite the type name, in some scenarios localports are supposed to
> > > talk outside the hosting chassis. Specifically, in OpenStack [1]
> > > metadata service for SR-IOV ports is implemented as a localport hosted
> > > on another chassis that is exposed to the chassis owning the SR-IOV
> > > port through an "external" port. In this case, "leaking" localport
> > > traffic into fabric is desirable.
> > >
> > > This patch inserts a higher priority flow per external port on the
> > > same datapath that avoids dropping localport traffic.
> > >
> > > Fixes: 96959e56d634 ("physical: do not forward traffic from localport
> > > to a localnet one")
> > >
> > > [1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html
> > >
> > > Signed-off-by: Ihar Hrachyshka 
> >
> > Hi Ihar,
> >
> > Thanks for the patch.  There are a few memory leaks which can be easily 
> > fixed.
> >
>
> Thanks, I addressed all the comments in the next v4 patch. Sorry for
> leaks and other issues, I am still puzzled by all the code paths
> triggered by db state change.
>
> > Please see  below for a few comments.
> >
> >
> > >
> > > --
> > >
> > > v1: initial version.
> > > v2: fixed code for unbound external ports.
> > > v2: rebased.
> > > v3: optimize external ports iteration.
> > > v3: rate limit error message on mac address parse failure.
> > > ---
> > >  controller/binding.c| 33 +---
> > >  controller/ovn-controller.c |  1 +
> > >  controller/ovn-controller.h |  2 ++
> > >  controller/physical.c   | 46 +
> > >  tests/ovn.at| 51 +
> > >  5 files changed, 130 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/controller/binding.c b/controller/binding.c
> > > index 70bf13390..87195e5fc 100644
> > > --- a/controller/binding.c
> > > +++ b/controller/binding.c
> > > @@ -108,6 +108,7 @@ add_local_datapath__(struct ovsdb_idl_index 
> > > *sbrec_datapath_binding_by_key,
> > >  hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
> > >  ld->datapath = datapath;
> > >  ld->localnet_port = NULL;
> > > +shash_init(>external_ports);
> > >  ld->has_local_l3gateway = has_local_l3gateway;
> > >
> > >  if (tracked_datapaths) {
> > > @@ -474,6 +475,18 @@ is_network_plugged(const struct sbrec_port_binding 
> > > *binding_rec,
> > >  return network ? !!shash_find_data(bridge_mappings, network) : false;
> > >  }
> > >
> > > +static void
> > > +update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
> > > + struct hmap *local_datapaths)
> > > +{
> > > +struct local_datapath *ld = get_local_datapath(
> > > +local_datapaths, binding_rec->datapath->tunnel_key);
> > > +if (ld) {
> > > +shash_replace(>external_ports, binding_rec->logical_port,
> > > +  binding_rec);
> > > +}
> > > +}
> > > +
> > >  static void
> > >  update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
> > >  struct shash *bridge_mappings,
> > > @@ -1657,8 +1670,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> > > binding_ctx_out *b_ctx_out)
> > >  !sset_is_empty(b_ctx_out->egress_ifaces) ? _map : NULL;
> > >
> > >  struct ovs_list localnet_lports = 
> > > OVS_LIST_INITIALIZER(_lports);
> > > +struct ovs_list external_lports = 
> > > OVS_LIST_INITIALIZER(_lports);
> > >
> > > -struct localnet_lport {
> > > +struct lport {
> > >  struct ovs_list list_node;
> > >  const struct sbrec_port_binding *pb;
> > >  };
> > > @@ -1713,11 +1727,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, 
> > > struct binding_ctx_out *b_ctx_out)
> > >
> > >  case LP_EXTERNAL:
> > >  consider_external_lport(pb, b_ctx_in, b_ctx_out);
> > > +struct lport *ext_lport = xmalloc(sizeof *ext_lport);
> > > +ext_lport->pb = pb;
> > > +ovs_list_push_back(_lports, _lport->list_node);
> > >  break;
> > >
> > >  case LP_LOCALNET: {
> > >  consider_localnet_lport(pb, b_ctx_in, b_ctx_out, _map);
> > > -struct localnet_lport *lnet_lport = xmalloc(sizeof 
> > > *lnet_lport);
> > > +struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
> > >  lnet_lport->pb = pb;
> > >  ovs_list_push_back(_lports, _lport->list_node);
> > >  break;
> > > @@ -1744,7 +1761,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> > > binding_ctx_out *b_ctx_out)
> > >  /* Run through each localnet lport list to 

Re: [ovs-dev] [PATCH ovn] Fix compilation error for m32.

2021-07-14 Thread Ben Pfaff
On Wed, Jul 14, 2021 at 06:23:15PM -0400, num...@ovn.org wrote:
> From: Numan Siddique 
> 
> Fixes: 895e02ec0be6 ("ovn-sbctl.c Add logical flows count numbers")
> Signed-off-by: Numan Siddique 

Acked-by: Ben Pfaff 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] latch-unix: Make the latch read buffer shared

2021-07-14 Thread Ben Pfaff
On Wed, Jul 14, 2021 at 09:12:10PM +0100, Anton Ivanov wrote:
> On 14/07/2021 19:33, Ben Pfaff wrote:
> > On Wed, Jul 14, 2021 at 05:36:36PM +0100, anton.iva...@cambridgegreys.com 
> > wrote:
> > > From: Anton Ivanov 
> > > 
> > > There is no point in adding 512 bytes on the stack
> > > every time the latch is polled. Alignment, cache line thrashing,
> > > etc. - you name it.
> > Do you have evidence this is a real problem?
> 
> I played a bit with it using the ovn-heater benchmark, difference was
> marginal.
> 
> IMHO it will result in a difference only on a bigger setup which I cannot
> simulate.
> 
> > 
> > > The result of the read is discarded anyway so the buffer
> > > can be shared by all latches.
> > > 
> > > Signed-off-by: Anton Ivanov 
> > > +/* All writes to latch are zero sized. Even 16 bytes are an overkill */
> > > +static char latch_buffer[16];
> > This comment is wrong.  Writes to a latch are 1 byte.
> 
> Me bad - I saw the "" in write() and ignored the 1 passed as length.
> 
> > 
> > latch_poll() is supposed to fully clear any buffered data.  It shouldn't
> > cause behavioral problems if it doesn't, and I imagine that it's rare
> > that there'd be more than 16 queued notifications, but it seems
> > regressive to just clear some of them.
> 
> The read can be looped. In fact, for full correctness it should be looped
> regardless of the size of the read buffer.

Couldn't hurt.

> So maybe 16 local looped?

I'd be OK with that.
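As an illustration of the looped approach agreed on above, a drained latch_poll() might look like the sketch below. This is only a sketch: the struct and field names follow OVS's latch-unix.c, but it is not the committed implementation, and it assumes the read end of the pipe is nonblocking (as OVS sets it up).

```c
#include <assert.h>  /* for callers/tests */
#include <errno.h>
#include <fcntl.h>   /* O_NONBLOCK, for callers/tests */
#include <unistd.h>

struct latch {
    int fds[2];   /* fds[0] is the (nonblocking) read end of the pipe. */
};

/* Drain every queued 1-byte notification using a small stack buffer,
 * looping so that more than 16 pending notifications are also cleared. */
void
latch_poll(struct latch *latch)
{
    char buffer[16];
    ssize_t n;

    do {
        n = read(latch->fds[0], buffer, sizeof buffer);
    } while (n > 0 || (n < 0 && errno == EINTR));
}
```

With a nonblocking pipe, the loop exits on the first EAGAIN, so the latch is fully cleared regardless of how many notifications were queued.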


Re: [ovs-dev] [v12 00/11] MFEX Infrastructure + Optimizations

2021-07-14 Thread Flavio Leitner


Hi,

I have reviewed more patches, but there are some that Eelco
requested to follow up with fixes, so I am going to wait as well.

Thanks,
fbl

On Wed, Jul 14, 2021 at 07:44:33PM +0530, kumar Amber wrote:
> v12:
> - re-worked the set command to the sweep method
> - changed skip for unit tests to true from not-available
> - added acks from Eelco
> - minor doc fixes and typos
> v11:
> - reworked set command in alignment with Eelco and Harry
> - added Acks from Eelco.
> - added skip to unit test if other implementations not available
> - minor typos and fixes
> - clang build fixes
> - removed patch with Scalar DPIF, will send separately
> v10 update:
> - re-worked the default implementation
> - fix comments from Flavio and Eelco
> - Include Acks from Eelco in study
> v9 update:
> - Include review comments from Flavio
> - Rebase onto Master
> - Include Acks from Flavio
> v8 updates:
> - Include documentation on AVX512 MFEX as per Eelco's suggestion on list
> v7 updates:
> - Rebase onto DPIF v15
> - Changed commands to get and set MFEX
> - Fixed comments from Flavio, Eelco
> - Segregated addition of MFEX options to separate patch 12 for Scalar DPIF
> - Removed sleep from auto-validator and added frame counter check
> - Documentation updates
> - Minor bug fixes
> v6 updates:
> - Fix non-ssl build
> v5 updates:
> - rebase onto latest DPIF v14
> - use Enum for mfex impls
> - add pmd core id set parameter in set command
> - get command modified to display the pmd thread for individual mfex functions
> - resolved comments from Eelco, Ian, Flavio
> - Use Atomic to get and set miniflow implementations
> - removed and reduced sleep in unit tests
> - fixed scalar miniflow perf degradation
> v4 updates:
> - rebase on to latest DPIF v13
> - fix fuzzy.py script with random mac/ip
> v3 updates:
> - rebase on to latest DPIF v12
> - add additional AVX512 traffic profiles for tcp and vlan
> - add new command line for study function to add packet count
> - add unit tests for fuzzy testing and auto-validation of mfex
> - add mfex option hit stats to perf-show command
> v2 updates:
> - rebase on to latest DPIF v11
> This patchset introduces miniflow extract infrastructure changes
> which allow the user to choose between different ISA-optimized
> miniflow extract variants; a variant can be chosen by the user or
> selected automatically by OVS based on packet studies, using the
> corresponding commands.
> The infrastructure also provides a way to check the correctness of
> the different ISA-optimized miniflow extract variants against the
> scalar version.
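The correctness check described in the cover letter can be sketched as follows. Note that the structs and function names here are simplified stand-ins, not the real OVS MFEX API: the point is only to show the autovalidator pattern of running every optimized variant and comparing its output against the scalar reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a miniflow: the digest an extract function produces
 * for one packet.  The real OVS structures are richer. */
struct mf {
    uint32_t hash;
    size_t len;
};

typedef struct mf (*mfex_fn)(const uint8_t *pkt, size_t len);

/* Scalar reference implementation (FNV-1a over the packet bytes). */
static struct mf
mfex_scalar(const uint8_t *pkt, size_t len)
{
    struct mf mf = { 2166136261u, len };
    for (size_t i = 0; i < len; i++) {
        mf.hash = (mf.hash ^ pkt[i]) * 16777619u;
    }
    return mf;
}

/* An "optimized" variant: two bytes per iteration, but it must produce
 * exactly the same result as the scalar version. */
static struct mf
mfex_unrolled(const uint8_t *pkt, size_t len)
{
    struct mf mf = { 2166136261u, len };
    size_t i = 0;
    for (; i + 1 < len; i += 2) {
        mf.hash = (mf.hash ^ pkt[i]) * 16777619u;
        mf.hash = (mf.hash ^ pkt[i + 1]) * 16777619u;
    }
    for (; i < len; i++) {
        mf.hash = (mf.hash ^ pkt[i]) * 16777619u;
    }
    return mf;
}

/* Autovalidator: run each variant and compare against the scalar
 * reference; returns -1 on the first mismatch, 0 if all agree. */
static int
mfex_autovalidate(const mfex_fn *variants, size_t n_variants,
                  const uint8_t *pkt, size_t len)
{
    struct mf ref = mfex_scalar(pkt, len);
    for (size_t i = 0; i < n_variants; i++) {
        struct mf got = variants[i](pkt, len);
        if (got.hash != ref.hash || got.len != ref.len) {
            return -1;
        }
    }
    return 0;
}
```

In the patchset, the same idea is applied per packet batch, and the fuzzing test feeds randomized packets through this comparison.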
> 
> Harry van Haaren (4):
>   dpif/stats: add miniflow extract opt hits counter
>   dpdk: add additional CPU ISA detection strings
>   dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
>   dpif-netdev/mfex: add more AVX512 traffic profiles
> 
> Kumar Amber (7):
>   dpif-netdev: Add command line and function pointer for miniflow
> extract
>   dpif-netdev: Add auto validation function for miniflow extract
>   dpif-netdev: Add study function to select the best mfex function
>   docs/dpdk/bridge: add miniflow extract section.
>   dpif-netdev: Add configure to enable autovalidator at build time.
>   dpif-netdev: Add packet count and core id paramters for study
>   test/sytem-dpdk: Add unit test for mfex autovalidator
> 
>  Documentation/topics/dpdk/bridge.rst | 138 ++
>  NEWS |  11 +
>  acinclude.m4 |  16 +
>  configure.ac |   1 +
>  lib/automake.mk  |   4 +
>  lib/dpdk.c   |   2 +
>  lib/dpif-netdev-avx512.c |  34 +-
>  lib/dpif-netdev-extract-avx512.c | 630 +++
>  lib/dpif-netdev-extract-study.c  | 160 +++
>  lib/dpif-netdev-perf.c   |   3 +
>  lib/dpif-netdev-perf.h   |   1 +
>  lib/dpif-netdev-private-extract.c| 371 
>  lib/dpif-netdev-private-extract.h| 203 +
>  lib/dpif-netdev-private-thread.h |   8 +
>  lib/dpif-netdev-unixctl.man  |   4 +
>  lib/dpif-netdev.c| 241 +-
>  tests/.gitignore |   1 +
>  tests/automake.mk|   6 +
>  tests/mfex_fuzzy.py  |  33 ++
>  tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
>  tests/pmd.at |   6 +-
>  tests/system-dpdk.at |  53 +++
>  22 files changed, 1914 insertions(+), 12 deletions(-)
>  create mode 100644 lib/dpif-netdev-extract-avx512.c
>  create mode 100644 lib/dpif-netdev-extract-study.c
>  create mode 100644 lib/dpif-netdev-private-extract.c
>  create mode 100644 lib/dpif-netdev-private-extract.h
>  create mode 100755 tests/mfex_fuzzy.py
>  create mode 100644 tests/pcap/mfex_test.pcap
> 
> -- 
> 2.25.1
> 

-- 
fbl

[ovs-dev] [PATCH ovn] Fix compilation error for m32.

2021-07-14 Thread numans
From: Numan Siddique 

Fixes: 895e02ec0be6 ("ovn-sbctl.c Add logical flows count numbers")
Signed-off-by: Numan Siddique 
---
 utilities/ovn-sbctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utilities/ovn-sbctl.c b/utilities/ovn-sbctl.c
index 1ab148a67b..4d7e7d6702 100644
--- a/utilities/ovn-sbctl.c
+++ b/utilities/ovn-sbctl.c
@@ -983,7 +983,7 @@ print_lflow_counters(size_t n_flows, struct sbctl_lflow *lflows, struct ds *s)
prev->lflow->pipeline, dp_lflows, s);
 
 }
-ds_put_format(s, "Total number of logical flows = %ld\n", n_flows);
+ds_put_format(s, "Total number of logical flows = %"PRIuSIZE"\n", n_flows);
 }
 
 static void
-- 
2.31.1
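For context on why only the m32 build caught this: on a 32-bit target, size_t is 32-bit while "%ld" expects a long, so the format/argument mismatch trips -Werror. The sketch below uses plain C99 "%zu", which is the portable equivalent of the PRIuSIZE macro the patch uses (PRIuSIZE is OVS's own wrapper, assumed here rather than shown).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format the flow count with a conversion that matches size_t on every
 * target.  "%ld" only happens to work where long and size_t are both
 * 64-bit, which is why the 32-bit build broke. */
static void
format_total(char *buf, size_t bufsz, size_t n_flows)
{
    snprintf(buf, bufsz, "Total number of logical flows = %zu", n_flows);
}
```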



Re: [ovs-dev] [PATCH ovn] northd: Add config option to specify # of threads

2021-07-14 Thread Numan Siddique
On Wed, Jul 7, 2021 at 3:15 AM Fabrizio D'Angelo  wrote:
>
> Uses the northd database to specify the number of threads that should
> be used when parallel lflow computation is enabled.
>
> Example:
> ovn-nbctl set NB_Global . options:num_parallel_threads=16
>
> Reported at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1975345
>
> Signed-off-by: Fabrizio D'Angelo 

Hi Fabrizio,

Thanks for the patch.

I tested this patch and I don't think it is working as expected.
Mainly because of the way ovn-parallel-hmap.c
sets up the pools.

When ovn-northd is started, it calls ovn_can_parallelize_hashes()
first and this sets up the worker pools
by calling setup_worker_pools().

The function setup_worker_pools() is never called later because of
atomic_compare_exchange_strong()
present in ovn_can_parallelize_hashes() and ovn_add_worker_pool().

I think, unfortunately, it requires some fixes there so that when the
number of threads is configured later, it takes effect.  Right now it
doesn't.

I think ovn_can_parallelize_hashes() should not try to setup the pool
size, instead it should check if ovn-northd
can do parallelization or not based on the ovs_numa_get_n_cores() and
ovs_numa_get_n_numas().
And if force is set, it should enable parallel processing.

Would you mind taking another look at the functions -
setup_worker_pools(), ovn_can_parallelize_hashes()
and ovn_add_worker_pool()  ?  So that we can enable or disable
parallelization dynamically
and also override the pool size with your newly added option -
num_parallel_threads.

Thanks
Numan
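The behavior Numan describes can be illustrated with a simplified model of the one-shot guard (all names here are stand-ins, not OVN's real code): the compare-exchange lets setup run only on the very first call, so a thread count configured afterwards never reaches the pool setup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool initial_pool_setup = false;
static unsigned int pool_size;

static void
setup_worker_pools(unsigned int num_threads)
{
    /* Fall back to a default of 4 when no count is configured. */
    pool_size = num_threads ? num_threads : 4;
}

static void
add_worker_pool(unsigned int num_threads)
{
    bool expected = false;

    /* Succeeds exactly once; every later call sees 'true' and skips
     * setup, so a changed num_threads is silently ignored. */
    if (atomic_compare_exchange_strong(&initial_pool_setup, &expected,
                                       true)) {
        setup_worker_pools(num_threads);
    }
}
```

Making the option dynamic means either re-running the setup when the configured count changes, or moving the count check out from behind the one-shot guard.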

> ---
>  lib/ovn-parallel-hmap.c | 12 ++--
>  lib/ovn-parallel-hmap.h |  5 +++--
>  northd/ovn-northd.c |  7 ++-
>  ovn-nb.xml  | 10 ++
>  4 files changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
> index b8c7ac786..cae0b3110 100644
> --- a/lib/ovn-parallel-hmap.c
> +++ b/lib/ovn-parallel-hmap.c
> @@ -62,7 +62,7 @@ static int pool_size;
>  static int sembase;
>
>  static void worker_pool_hook(void *aux OVS_UNUSED);
> -static void setup_worker_pools(bool force);
> +static void setup_worker_pools(bool force, unsigned int thread_num);
>  static void merge_list_results(struct worker_pool *pool OVS_UNUSED,
> void *fin_result, void *result_frags,
> int index);
> @@ -86,14 +86,14 @@ ovn_can_parallelize_hashes(bool force_parallel)
>  &test,
>  true)) {
>  ovs_mutex_lock(&init_mutex);
> -setup_worker_pools(force_parallel);
> +setup_worker_pools(force_parallel, 0);
>  ovs_mutex_unlock(&init_mutex);
>  }
>  return can_parallelize;
>  }
>
>  struct worker_pool *
> -ovn_add_worker_pool(void *(*start)(void *))
> +ovn_add_worker_pool(void *(*start)(void *), unsigned int thread_num)

I'd suggest renaming the parameter from 'thread_num' to 'num_threads'.

Thanks
Numan

>  {
>  struct worker_pool *new_pool = NULL;
>  struct worker_control *new_control;
> @@ -109,7 +109,7 @@ ovn_add_worker_pool(void *(*start)(void *))
>  &test,
>  true)) {
>  ovs_mutex_lock(&init_mutex);
> -setup_worker_pools(false);
> +setup_worker_pools(false, thread_num);
>  ovs_mutex_unlock(&init_mutex);
>  }
>
> @@ -401,14 +401,14 @@ worker_pool_hook(void *aux OVS_UNUSED) {
>  }
>
>  static void
> -setup_worker_pools(bool force) {
> +setup_worker_pools(bool force, unsigned int thread_num) {
>  int cores, nodes;
>
>  nodes = ovs_numa_get_n_numas();
>  if (nodes == OVS_NUMA_UNSPEC || nodes <= 0) {
>  nodes = 1;
>  }
> -cores = ovs_numa_get_n_cores();
> +cores = thread_num ? thread_num : ovs_numa_get_n_cores();
>
>  /* If there is no NUMA config, use 4 cores.
>   * If there is NUMA config use half the cores on
> diff --git a/lib/ovn-parallel-hmap.h b/lib/ovn-parallel-hmap.h
> index 0af8914c4..9637a273d 100644
> --- a/lib/ovn-parallel-hmap.h
> +++ b/lib/ovn-parallel-hmap.h
> @@ -95,7 +95,8 @@ struct worker_pool {
>  /* Add a worker pool for thread function start() which expects a pointer to
>   * a worker_control structure as an argument. */
>
> -struct worker_pool *ovn_add_worker_pool(void *(*start)(void *));
> +struct worker_pool *ovn_add_worker_pool(void *(*start)(void *),
> +unsigned int thread_num);
>
>  /* Setting this to true will make all processing threads exit */
>
> @@ -265,7 +266,7 @@ bool ovn_can_parallelize_hashes(bool force_parallel);
>
>  #define stop_parallel_processing() ovn_stop_parallel_processing()
>
> -#define add_worker_pool(start) ovn_add_worker_pool(start)
> +#define add_worker_pool(start, thread_num) ovn_add_worker_pool(start, 
> thread_num)
>
>  #define fast_hmap_size_for(hmap, size) ovn_fast_hmap_size_for(hmap, size)
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 570c6a3ef..ffefac361 100644
> --- a/northd/ovn-northd.c
> +++ 

Re: [ovs-dev] [PATCH ovn v8] ovn-sbctl.c Add logical flows count numbers

2021-07-14 Thread Numan Siddique
On Wed, Jul 14, 2021 at 8:23 AM Mark Michelson  wrote:
>
> Looks good to me Alexey, thanks!
>
> Acked-by: Mark Michelson 

Thanks Alexey and Mark for the reviews.

I applied this patch to the main branch.

Numan

>
> On 6/19/21 3:38 PM, Alexey Roytman wrote:
> > From: Alexey Roytman 
> >
> > For large-scale deployments, where the number of logical flows can be 2M+,
> > sometimes users just need to know the total number of logical flows
> > and numbers of logical flows per table/per datapath.
> >
> > New command output
> > Datapath: "sw1" (4b1e53d8-9f0f-4768-b4a6-6cbc58a4bfda)  Pipeline: ingress
> >table=0 (ls_in_port_sec_l2  ) lflows=2
> >table=1 (ls_in_port_sec_ip  ) lflows=1
> >table=2 (ls_in_port_sec_nd  ) lflows=1
> >table=3 (ls_in_lookup_fdb   ) lflows=1
> >table=4 (ls_in_put_fdb  ) lflows=1
> >table=5 (ls_in_pre_acl  ) lflows=2
> >table=6 (ls_in_pre_lb   ) lflows=3
> >table=7 (ls_in_pre_stateful ) lflows=2
> >table=8 (ls_in_acl_hint ) lflows=1
> >table=9 (ls_in_acl  ) lflows=2
> >table=10(ls_in_qos_mark ) lflows=1
> >table=11(ls_in_qos_meter) lflows=1
> >table=12(ls_in_lb   ) lflows=1
> >table=13(ls_in_stateful ) lflows=8
> >table=14(ls_in_pre_hairpin  ) lflows=1
> >table=15(ls_in_nat_hairpin  ) lflows=1
> >table=16(ls_in_hairpin  ) lflows=1
> >table=17(ls_in_arp_rsp  ) lflows=1
> >table=18(ls_in_dhcp_options ) lflows=1
> >table=19(ls_in_dhcp_response) lflows=1
> >table=20(ls_in_dns_lookup   ) lflows=1
> >table=21(ls_in_dns_response ) lflows=1
> >table=22(ls_in_external_port) lflows=1
> >table=23(ls_in_l2_lkup  ) lflows=3
> >table=24(ls_in_l2_unknown   ) lflows=2
> > Total number of logical flows in the datapath "sw1" 
> > (4b1e53d8-9f0f-4768-b4a6-6cbc58a4bfda) Pipeline: ingress = 41
> >
> > Datapath: "sw1" (4b1e53d8-9f0f-4768-b4a6-6cbc58a4bfda)  Pipeline: egress
> >table=0 (ls_out_pre_lb  ) lflows=3
> >table=1 (ls_out_pre_acl ) lflows=2
> >table=2 (ls_out_pre_stateful) lflows=2
> >table=3 (ls_out_lb  ) lflows=1
> >table=4 (ls_out_acl_hint) lflows=1
> >table=5 (ls_out_acl ) lflows=2
> >table=6 (ls_out_qos_mark) lflows=1
> >table=7 (ls_out_qos_meter   ) lflows=1
> >table=8 (ls_out_stateful) lflows=3
> >table=9 (ls_out_port_sec_ip ) lflows=1
> >table=10(ls_out_port_sec_l2 ) lflows=1
> > Total number of logical flows in the datapath "sw1" 
> > (4b1e53d8-9f0f-4768-b4a6-6cbc58a4bfda) Pipeline: egress = 18
> >
> > Total number of logical flows = 59
> >
> > Signed-off-by: Alexey Roytman 
> >
> > ---
> > V6 -> V7
> >   * Addressed commit b6f0e51d8b52cf2381503c3c1c5c2a0d6bd7afa6 and Matk's 
> > comments
> > v5 -> v6
> >   * Addressed Ben's comments about replacemen the --count flag of 
> > lflow-list/dump-flows by a a "count-flows" command.
> > v3 -> v4
> >   * Addressed review comments from Mark
> >
> > ---
> >   tests/ovn-sbctl.at|  69 -
> >   utilities/ovn-sbctl.8.xml |   3 ++
> >   utilities/ovn-sbctl.c | 106 +++---
> >   3 files changed, 169 insertions(+), 9 deletions(-)
> >
> > diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at
> > index f49134381..16f5dabcc 100644
> > --- a/tests/ovn-sbctl.at
> > +++ b/tests/ovn-sbctl.at
> > @@ -175,4 +175,71 @@ inactivity_probe: 3
> >
> >   OVN_SBCTL_TEST([ovn_sbctl_invalid_0x_flow], [invalid 0x flow], [
> >   check ovn-sbctl lflow-list 0x12345678
> > -])
> > \ No newline at end of file
> > +])
> > +
> > +dnl -
> > +
> > +OVN_SBCTL_TEST([ovn_sbctl_count_flows], [ovn-sbctl - count-flows], [
> > +
> > +count_entries() {
> > +ovn-sbctl --column=_uuid list Logical_Flow | sed -r '/^\s*$/d' | wc -l
> > +}
> > +
> > +count_pipeline() {
> > +ovn-sbctl --column=pipeline list Logical_Flow | grep $1 | sed -r '/^\s*$/d' | wc -l
> > +}
> > +
> > +# we start with empty Logical_Flow table
> > +# validate that the table is indeed empty
> > +AT_CHECK([count_entries], [0], [dnl
> > +0
> > +])
> > +
> > +AT_CHECK([ovn-sbctl count-flows], [0], [dnl
> > +Total number of logical flows = 0
> > +])
> > +
> > +# create some logical flows
> > +check ovn-nbctl ls-add count-test
> > +
> > +OVS_WAIT_UNTIL([total_lflows=`count_entries`; test $total_lflows -ne 0])
> > +
> > +total_lflows=`count_entries`
> > +egress_lflows=`count_pipeline egress`
> > +ingress_lflows=`count_pipeline ingress`
> > +
> > +AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep "flows =" | awk 'NF>1{print $NF}'], [0], [dnl
> > +$total_lflows
> > +])
> > +AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep Total | grep egress | awk 'NF>1{print $NF}'], [0], [dnl
> > +$egress_lflows
> > +])
> > +AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep Total | grep ingress | awk 'NF>1{print $NF}'], [0], [dnl
> > +$ingress_lflows
> > +])
> > +
> > 

Re: [ovs-dev] [PATCH v3 ovn] Don't suppress localport traffic directed to external port

2021-07-14 Thread Ihar Hrachyshka
On Wed, Jul 14, 2021 at 11:21 AM Numan Siddique  wrote:
>
> On Tue, Jul 13, 2021 at 8:40 PM Ihar Hrachyshka  wrote:
> >
> > Recently, we stopped leaking localport traffic through localnet ports
> > into fabric to avoid unnecessary flipping between chassis hosting the
> > same localport.
> >
> > Despite the type name, in some scenarios localports are supposed to
> > talk outside the hosting chassis. Specifically, in OpenStack [1]
> > metadata service for SR-IOV ports is implemented as a localport hosted
> > on another chassis that is exposed to the chassis owning the SR-IOV
> > port through an "external" port. In this case, "leaking" localport
> > traffic into fabric is desirable.
> >
> > This patch inserts a higher priority flow per external port on the
> > same datapath that avoids dropping localport traffic.
> >
> > Fixes: 96959e56d634 ("physical: do not forward traffic from localport
> > to a localnet one")
> >
> > [1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html
> >
> > Signed-off-by: Ihar Hrachyshka 
>
> Hi Ihar,
>
> Thanks for the patch.  There are a few memory leaks which can be easily fixed.
>

Thanks, I addressed all the comments in the next v4 patch. Sorry for
leaks and other issues, I am still puzzled by all the code paths
triggered by db state change.

> Please see  below for a few comments.
>
>
> >
> > --
> >
> > v1: initial version.
> > v2: fixed code for unbound external ports.
> > v2: rebased.
> > v3: optimize external ports iteration.
> > v3: rate limit error message on mac address parse failure.
> > ---
> >  controller/binding.c| 33 +---
> >  controller/ovn-controller.c |  1 +
> >  controller/ovn-controller.h |  2 ++
> >  controller/physical.c   | 46 +
> >  tests/ovn.at| 51 +
> >  5 files changed, 130 insertions(+), 3 deletions(-)
> >
> > diff --git a/controller/binding.c b/controller/binding.c
> > index 70bf13390..87195e5fc 100644
> > --- a/controller/binding.c
> > +++ b/controller/binding.c
> > @@ -108,6 +108,7 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
> >  hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
> >  ld->datapath = datapath;
> >  ld->localnet_port = NULL;
> > +shash_init(&ld->external_ports);
> >  ld->has_local_l3gateway = has_local_l3gateway;
> >
> >  if (tracked_datapaths) {
> > @@ -474,6 +475,18 @@ is_network_plugged(const struct sbrec_port_binding *binding_rec,
> >  return network ? !!shash_find_data(bridge_mappings, network) : false;
> >  }
> >
> > +static void
> > +update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
> > + struct hmap *local_datapaths)
> > +{
> > +struct local_datapath *ld = get_local_datapath(
> > +local_datapaths, binding_rec->datapath->tunnel_key);
> > +if (ld) {
> > +shash_replace(&ld->external_ports, binding_rec->logical_port,
> > +  binding_rec);
> > +}
> > +}
> > +
> >  static void
> >  update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
> >  struct shash *bridge_mappings,
> > @@ -1657,8 +1670,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
> >  !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
> >
> >  struct ovs_list localnet_lports = OVS_LIST_INITIALIZER(&localnet_lports);
> > +struct ovs_list external_lports = OVS_LIST_INITIALIZER(&external_lports);
> >
> > -struct localnet_lport {
> > +struct lport {
> >  struct ovs_list list_node;
> >  const struct sbrec_port_binding *pb;
> >  };
> > @@ -1713,11 +1727,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
> >
> >  case LP_EXTERNAL:
> >  consider_external_lport(pb, b_ctx_in, b_ctx_out);
> > +struct lport *ext_lport = xmalloc(sizeof *ext_lport);
> > +ext_lport->pb = pb;
> > +ovs_list_push_back(&external_lports, &ext_lport->list_node);
> >  break;
> >
> >  case LP_LOCALNET: {
> >  consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
> > -struct localnet_lport *lnet_lport = xmalloc(sizeof *lnet_lport);
> > +struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
> >  lnet_lport->pb = pb;
> >  ovs_list_push_back(&localnet_lports, &lnet_lport->list_node);
> >  break;
> > @@ -1744,7 +1761,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
> >  /* Run through each localnet lport list to see if it is a localnet port
> >   * on local datapaths discovered from above loop, and update the
> >   * corresponding local datapath accordingly. */
> > -struct localnet_lport *lnet_lport;
> > +struct lport *lnet_lport;
> >  LIST_FOR_EACH_POP (lnet_lport, list_node, &localnet_lports) {
> 

[ovs-dev] [PATCH v4 ovn] Don't suppress localport traffic directed to external port

2021-07-14 Thread Ihar Hrachyshka
Recently, we stopped leaking localport traffic through localnet ports
into fabric to avoid unnecessary flipping between chassis hosting the
same localport.

Despite the type name, in some scenarios localports are supposed to
talk outside the hosting chassis. Specifically, in OpenStack [1]
metadata service for SR-IOV ports is implemented as a localport hosted
on another chassis that is exposed to the chassis owning the SR-IOV
port through an "external" port. In this case, "leaking" localport
traffic into fabric is desirable.

This patch inserts a higher priority flow per external port on the
same datapath that avoids dropping localport traffic.

Fixes: 96959e56d634 ("physical: do not forward traffic from localport
to a localnet one")

[1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html

Signed-off-by: Ihar Hrachyshka 

--

v1: initial version.
v2: fixed code for unbound external ports.
v2: rebased.
v3: optimize external ports iteration.
v3: rate limit error message on mac address parse failure.
v4: fixed several memory leaks on local_datapaths cleanup.
v4: properly clean up flows for deleted external ports.
v4: test that external port created after localnet works.
v4: test that external port deleted doesn't pass traffic.
---
 controller/binding.c| 35 +--
 controller/ovn-controller.c |  2 +
 controller/ovn-controller.h |  2 +
 controller/physical.c   | 46 
 tests/ovn.at| 85 +
 5 files changed, 167 insertions(+), 3 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index 70bf13390..d50f3affa 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -108,6 +108,7 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
 hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
 ld->datapath = datapath;
 ld->localnet_port = NULL;
+shash_init(&ld->external_ports);
 ld->has_local_l3gateway = has_local_l3gateway;
 
 if (tracked_datapaths) {
@@ -474,6 +475,18 @@ is_network_plugged(const struct sbrec_port_binding *binding_rec,
 return network ? !!shash_find_data(bridge_mappings, network) : false;
 }
 
+static void
+update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
+ struct hmap *local_datapaths)
+{
+struct local_datapath *ld = get_local_datapath(
+local_datapaths, binding_rec->datapath->tunnel_key);
+if (ld) {
+shash_replace(&ld->external_ports, binding_rec->logical_port,
+  binding_rec);
+}
+}
+
 static void
 update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
 struct shash *bridge_mappings,
@@ -1657,8 +1670,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
 !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
 
 struct ovs_list localnet_lports = OVS_LIST_INITIALIZER(&localnet_lports);
+struct ovs_list external_lports = OVS_LIST_INITIALIZER(&external_lports);
 
-struct localnet_lport {
+struct lport {
 struct ovs_list list_node;
 const struct sbrec_port_binding *pb;
 };
@@ -1713,11 +1727,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
 
 case LP_EXTERNAL:
 consider_external_lport(pb, b_ctx_in, b_ctx_out);
+struct lport *ext_lport = xmalloc(sizeof *ext_lport);
+ext_lport->pb = pb;
+ovs_list_push_back(&external_lports, &ext_lport->list_node);
 break;
 
 case LP_LOCALNET: {
 consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
-struct localnet_lport *lnet_lport = xmalloc(sizeof *lnet_lport);
+struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
 lnet_lport->pb = pb;
 ovs_list_push_back(&localnet_lports, &lnet_lport->list_node);
 break;
@@ -1744,7 +1761,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
 /* Run through each localnet lport list to see if it is a localnet port
  * on local datapaths discovered from above loop, and update the
  * corresponding local datapath accordingly. */
-struct localnet_lport *lnet_lport;
+struct lport *lnet_lport;
 LIST_FOR_EACH_POP (lnet_lport, list_node, &localnet_lports) {
 update_ld_localnet_port(lnet_lport->pb, &bridge_mappings,
 b_ctx_out->egress_ifaces,
@@ -1752,6 +1769,15 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
 free(lnet_lport);
 }
 
+/* Run through external lport list to see if these are external ports
+ * on local datapaths discovered from above loop, and update the
+ * corresponding local datapath accordingly. */
+struct lport *ext_lport;
+LIST_FOR_EACH_POP (ext_lport, list_node, &external_lports) {
+update_ld_external_ports(ext_lport->pb, b_ctx_out->local_datapaths);
+

Re: [ovs-dev] [v12 05/11] dpif-netdev: Add configure to enable autovalidator at build time.

2021-07-14 Thread Flavio Leitner
On Wed, Jul 14, 2021 at 07:44:38PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This commit adds a new configure option to allow the user to enable
> the autovalidator by default at build time, thus allowing the unit
> tests to run by default.
> 
>  $ ./configure --enable-mfex-default-autovalidator
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] [v12 03/11] dpif-netdev: Add study function to select the best mfex function

2021-07-14 Thread Flavio Leitner
On Wed, Jul 14, 2021 at 07:44:36PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> The study function runs all the available implementations
> of miniflow_extract and makes a choice whose hitmask has
> maximum hits and sets the mfex to that function.
> 
> Study can be run at runtime using the following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set study
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] [PATCH v3 ovn] Don't suppress localport traffic directed to external port

2021-07-14 Thread Numan Siddique
On Wed, Jul 14, 2021 at 11:21 AM Numan Siddique  wrote:
>
> On Tue, Jul 13, 2021 at 8:40 PM Ihar Hrachyshka  wrote:
> >
> > Recently, we stopped leaking localport traffic through localnet ports
> > into fabric to avoid unnecessary flipping between chassis hosting the
> > same localport.
> >
> > Despite the type name, in some scenarios localports are supposed to
> > talk outside the hosting chassis. Specifically, in OpenStack [1]
> > metadata service for SR-IOV ports is implemented as a localport hosted
> > on another chassis that is exposed to the chassis owning the SR-IOV
> > port through an "external" port. In this case, "leaking" localport
> > traffic into fabric is desirable.
> >
> > This patch inserts a higher priority flow per external port on the
> > same datapath that avoids dropping localport traffic.
> >
> > Fixes: 96959e56d634 ("physical: do not forward traffic from localport
> > to a localnet one")
> >
> > [1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html
> >
> > Signed-off-by: Ihar Hrachyshka 
>
> Hi Ihar,
>
> Thanks for the patch.  There are a few memory leaks which can be easily fixed.
>
> Please see  below for a few comments.
>
>
> >
> > --
> >
> > v1: initial version.
> > v2: fixed code for unbound external ports.
> > v2: rebased.
> > v3: optimize external ports iteration.
> > v3: rate limit error message on mac address parse failure.
> > ---
> >  controller/binding.c| 33 +---
> >  controller/ovn-controller.c |  1 +
> >  controller/ovn-controller.h |  2 ++
> >  controller/physical.c   | 46 +
> >  tests/ovn.at| 51 +
> >  5 files changed, 130 insertions(+), 3 deletions(-)
> >
> > diff --git a/controller/binding.c b/controller/binding.c
> > index 70bf13390..87195e5fc 100644
> > --- a/controller/binding.c
> > +++ b/controller/binding.c
> > @@ -108,6 +108,7 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
> >  hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
> >  ld->datapath = datapath;
> >  ld->localnet_port = NULL;
> > +shash_init(>external_ports);
> >  ld->has_local_l3gateway = has_local_l3gateway;
> >
> >  if (tracked_datapaths) {
> > @@ -474,6 +475,18 @@ is_network_plugged(const struct sbrec_port_binding 
> > *binding_rec,
> >  return network ? !!shash_find_data(bridge_mappings, network) : false;
> >  }
> >
> > +static void
> > +update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
> > + struct hmap *local_datapaths)
> > +{
> > +struct local_datapath *ld = get_local_datapath(
> > +local_datapaths, binding_rec->datapath->tunnel_key);
> > +if (ld) {
> > +shash_replace(&ld->external_ports, binding_rec->logical_port,
> > +  binding_rec);
> > +}
> > +}
> > +
> >  static void
> >  update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
> >  struct shash *bridge_mappings,
> > @@ -1657,8 +1670,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> > binding_ctx_out *b_ctx_out)
> >  !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
> >
> >  struct ovs_list localnet_lports = 
> > OVS_LIST_INITIALIZER(&localnet_lports);
> > +struct ovs_list external_lports = 
> > OVS_LIST_INITIALIZER(&external_lports);
> >
> > -struct localnet_lport {
> > +struct lport {
> >  struct ovs_list list_node;
> >  const struct sbrec_port_binding *pb;
> >  };
> > @@ -1713,11 +1727,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> > binding_ctx_out *b_ctx_out)
> >
> >  case LP_EXTERNAL:
> >  consider_external_lport(pb, b_ctx_in, b_ctx_out);
> > +struct lport *ext_lport = xmalloc(sizeof *ext_lport);
> > +ext_lport->pb = pb;
> > +ovs_list_push_back(&external_lports, &ext_lport->list_node);
> >  break;
> >
> >  case LP_LOCALNET: {
> >  consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
> > -struct localnet_lport *lnet_lport = xmalloc(sizeof 
> > *lnet_lport);
> > +struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
> >  lnet_lport->pb = pb;
> >  ovs_list_push_back(&localnet_lports, &lnet_lport->list_node);
> >  break;
> > @@ -1744,7 +1761,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> > binding_ctx_out *b_ctx_out)
> >  /* Run through each localnet lport list to see if it is a localnet port
> >   * on local datapaths discovered from above loop, and update the
> >   * corresponding local datapath accordingly. */
> > -struct localnet_lport *lnet_lport;
> > +struct lport *lnet_lport;
> >  LIST_FOR_EACH_POP (lnet_lport, list_node, &localnet_lports) {
> >  update_ld_localnet_port(lnet_lport->pb, &bridge_mappings,
> >  b_ctx_out->egress_ifaces,
> > @@ -1752,6 +1769,15 @@ 

Re: [ovs-dev] [PATCH] latch-unix: Make the latch read buffer shared

2021-07-14 Thread Anton Ivanov

On 14/07/2021 19:33, Ben Pfaff wrote:

On Wed, Jul 14, 2021 at 05:36:36PM +0100, anton.iva...@cambridgegreys.com wrote:

From: Anton Ivanov 

There is no point in adding 512 bytes on the stack
every time the latch is polled. Alignment, cache line thrashing,
etc - you name it.

Do you have evidence this is a real problem?


I played a bit with it using the ovn-heater benchmark, difference was 
marginal.


IMHO it will result in a difference only on a bigger setup which I 
cannot simulate.





The result of the read is discarded anyway so the buffer
can be shared by all latches.

Signed-off-by: Anton Ivanov 
+/* All writes to latch are zero sized. Even 16 bytes are an overkill */
+static char latch_buffer[16];

This comment is wrong.  Writes to a latch are 1 byte.


My bad - I saw the "" in write() and ignored the 1 passed as length.



latch_poll() is supposed to fully clear any buffered data.  It shouldn't
cause behavioral problems if it doesn't, and I imagine that it's rare
that there'd be more than 16 queued notifications, but it seems
regressive to just clear some of them.


The read can be looped. In fact, for full correctness it should be 
looped regardless of the size of the read buffer.


So maybe a 16-byte local buffer, read in a loop?



It's silly to use static data for 16 bytes.  If you're going to reduce
the size, just keep it as local.


Fair point.




  /* Initializes 'latch' as initially unset. */
  void
  latch_init(struct latch *latch)
@@ -43,9 +46,7 @@ latch_destroy(struct latch *latch)
  bool
  latch_poll(struct latch *latch)
  {
-char buffer[_POSIX_PIPE_BUF];
-
-return read(latch->fds[0], buffer, sizeof buffer) > 0;
+return read(latch->fds[0], &latch_buffer, sizeof latch_buffer) > 0;
  }
  
  /* Sets 'latch'.

--
2.20.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev



--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/



Re: [ovs-dev] [v12 02/11] dpif-netdev: Add auto validation function for miniflow extract

2021-07-14 Thread Flavio Leitner
On Wed, Jul 14, 2021 at 07:44:35PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This patch introduces the auto-validation function, which
> allows users to compare the batch of packets obtained from
> different miniflow implementations against the linear
> miniflow extract and return a hitmask.
> 
> The autovalidator function can be triggered at runtime using the
> following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
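The core idea can be sketched as follows (hypothetical names and a trivial stand-in "scalar" implementation, not the real OVS symbols): run every optimized miniflow-extract implementation over the same packet batch, compare each result against the reference linear implementation, and always return the reference result so datapath behaviour is unchanged while validating.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BATCH_SIZE 32

typedef uint32_t (*mfex_fn)(const uint8_t *pkts[], size_t n,
                            uint64_t keys[]);

/* Trivial stand-in "scalar" reference: the key is the packet's first
 * byte, and the hitmask has one bit per processed packet. */
static uint32_t
mfex_scalar(const uint8_t *pkts[], size_t n, uint64_t keys[])
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        keys[i] = pkts[i][0];
        mask |= 1u << i;
    }
    return mask;
}

/* Runs every implementation, flags any divergence from the reference,
 * and returns the reference hitmask and keys. */
static uint32_t
mfex_autovalidator(mfex_fn *impls, size_t n_impls,
                   const uint8_t *pkts[], size_t n, uint64_t keys[])
{
    uint64_t good_keys[BATCH_SIZE];
    uint32_t good_mask = mfex_scalar(pkts, n, good_keys);

    for (size_t i = 0; i < n_impls; i++) {
        uint64_t test_keys[BATCH_SIZE];
        uint32_t mask = impls[i](pkts, n, test_keys);

        if (mask != good_mask
            || memcmp(test_keys, good_keys, n * sizeof *test_keys)) {
            fprintf(stderr, "mfex impl %zu disagrees with scalar\n", i);
        }
    }
    memcpy(keys, good_keys, n * sizeof *keys);
    return good_mask;
}
```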
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] [v12 01/11] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-14 Thread Flavio Leitner
On Wed, Jul 14, 2021 at 07:44:34PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This patch introduces the MFEX function pointers, which allow
> the user to switch between the different miniflow extract
> implementations provided by OVS, each optimized for a CPU ISA.
> 
> The user can query the miniflow extract variants available
> for that CPU with the following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-get
> 
> Similarly, a user can set the miniflow implementation with the
> following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set name
> 
> This gives the user more performance and flexibility by allowing them
> to choose the miniflow implementation according to their needs.
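The dispatch mechanism behind miniflow-parser-set/-get can be sketched like this (hypothetical names, not the real OVS symbols): a table of named implementations and a setter that switches the active function pointer by name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*mfex_func)(void);

/* Stand-in implementations for illustration only. */
static int scalar_extract(void) { return 0; }
static int optimized_extract(void) { return 1; }

static const struct {
    const char *name;
    mfex_func fn;
} mfex_impls[] = {
    { "scalar",    scalar_extract },
    { "optimized", optimized_extract },
};

/* The currently active miniflow-extract implementation. */
static mfex_func active_mfex = scalar_extract;

/* Switches the active implementation by name; returns false (and keeps
 * the current implementation) if the name is unknown. */
static bool
mfex_set_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof mfex_impls / sizeof mfex_impls[0]; i++) {
        if (!strcmp(mfex_impls[i].name, name)) {
            active_mfex = mfex_impls[i].fn;
            return true;
        }
    }
    return false;
}
```

The hot path then calls through `active_mfex` without any per-packet branching on the implementation name.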
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] [PATCH] latch-unix: Make the latch read buffer shared

2021-07-14 Thread Ben Pfaff
On Wed, Jul 14, 2021 at 05:36:36PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> There is no point in adding 512 bytes on the stack
> every time the latch is polled. Alignment, cache line thrashing,
> etc - you name it.

Do you have evidence this is a real problem?

> The result of the read is discarded anyway so the buffer
> can be shared by all latches.
> 
> Signed-off-by: Anton Ivanov 

> +/* All writes to latch are zero sized. Even 16 bytes are an overkill */
> +static char latch_buffer[16];

This comment is wrong.  Writes to a latch are 1 byte.

latch_poll() is supposed to fully clear any buffered data.  It shouldn't
cause behavioral problems if it doesn't, and I imagine that it's rare
that there'd be more than 16 queued notifications, but it seems
regressive to just clear some of them.

It's silly to use static data for 16 bytes.  If you're going to reduce
the size, just keep it as local.

>  /* Initializes 'latch' as initially unset. */
>  void
>  latch_init(struct latch *latch)
> @@ -43,9 +46,7 @@ latch_destroy(struct latch *latch)
>  bool
>  latch_poll(struct latch *latch)
>  {
> -char buffer[_POSIX_PIPE_BUF];
> -
> -return read(latch->fds[0], buffer, sizeof buffer) > 0;
> +return read(latch->fds[0], &latch_buffer, sizeof latch_buffer) > 0;
>  }
>  
>  /* Sets 'latch'.
> -- 
> 2.20.1
> 


Re: [ovs-dev] [PATCH v3 0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

2021-07-14 Thread Ilya Maximets
On 7/14/21 3:50 PM, Ilya Maximets wrote:
> Replication can be used to scale out read-only access to the database.
> But there are clients that are not read-only, but read-mostly.
> One of the main examples is ovn-controller that mostly monitors
> updates from the Southbound DB, but needs to claim ports by sending
> transactions that changes some database tables.
> 
> Southbound database serves lots of connections: all connections
> from ovn-controllers and some service connections from cloud
> infrastructure, e.g. some OpenStack agents are monitoring updates.
> At a high scale and with a big size of the database ovsdb-server
> spends too much time processing monitor updates and it's required
> to move this load somewhere else.  This patch-set aims to introduce
> required functionality to scale out read-mostly connections by
> introducing a new OVSDB 'relay' service model.
> 
> In this new service model ovsdb-server connects to existing OVSDB
> server and maintains in-memory copy of the database.  It serves
> read-only transactions and monitor requests by its own, but forwards
> write transactions to the relay source.
> 
> Key differences from the active-backup replication:
> - support for "write" transactions.
> - no on-disk storage. (probably, faster operation)
> - support for multiple remotes (connect to the clustered db).
> - doesn't try to keep connection as long as possible, but
>   faster reconnects to other remotes to avoid missing updates.
> - No need to know the complete database schema beforehand,
>   only the schema name.
> - can be used along with other standalone and clustered databases
>   by the same ovsdb-server process. (doesn't turn the whole
>   jsonrpc server to read-only mode)
> - supports modern version of monitors (monitor_cond_since),
>   because based on ovsdb-cs.
> - could be chained, i.e. multiple relays could be connected
>   one to another in a row or in a tree-like form.
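The relay's core routing decision can be sketched as follows (a hypothetical helper, not the real ovsdb-server code): a transaction whose operations are all read-only is served from the local in-memory copy, and anything that writes is forwarded to the relay source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Per RFC 7047, "select" and "wait" do not modify the database;
 * insert/update/mutate/delete do. (Simplified for illustration.) */
static bool
op_is_read_only(const char *op)
{
    return !strcmp(op, "select") || !strcmp(op, "wait");
}

/* Returns true if the whole transaction can be served from the local
 * in-memory copy; false means it must be forwarded to the source. */
static bool
txn_served_locally(const char *ops[], size_t n_ops)
{
    for (size_t i = 0; i < n_ops; i++) {
        if (!op_is_read_only(ops[i])) {
            return false;
        }
    }
    return true;
}
```

This is why the relay scales read-mostly clients such as ovn-controller: the common monitor/read load stays on the relay, and only the occasional port-claim style write reaches the central server.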
> 
> Bringing all above functionality to the existing active-backup
> replication doesn't look right as it will make it less reliable
> for the actual backup use case, and this also would be much
> harder from the implementation point of view, because current
> replication code is not based on ovsdb-cs or idl and all the required
> features would be likely duplicated or replication would be fully
> re-written on top of ovsdb-cs with severe modifications of the former.
> 
> Relay is somewhere in the middle between active-backup replication and
> the clustered model taking a lot from both, therefore is hard to
> implement on top of any of them.
> 
> To run ovsdb-server in relay mode, user need to simply run:
> 
>   ovsdb-server --remote=punix:db.sock relay:<SCHEMA_NAME>:<relay source>
> 
> e.g.
> 
>   ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
> 
> More details and examples in the documentation in the last patch
> of the series.
> 
> I actually tried to implement transaction forwarding on top of
> active-backup replication in v1 of this series, but it required
> a lot of tricky changes, including schema format changes in order
> to bring required information to the end clients, so I decided
> to fully rewrite the functionality in v2 with a different approach.
> 
> 
>  Testing
>  ===
> 
> Some scale tests were performed with OVSDB Relays that mimics OVN
> workloads with ovn-kubernetes.
> Tests performed with ovn-heater (https://github.com/dceara/ovn-heater)
> on scenario ocp-120-density-heavy:
>  
> https://github.com/dceara/ovn-heater/blob/master/test-scenarios/ocp-120-density-heavy.yml
> In short, the test gradually creates a lot of OVN resources and
> checks that the network is configured correctly (by pinging different
> namespaces).  The test includes 120 chassis (created by
> ovn-fake-multinode), 31250 LSPs spread evenly across 120 LSes, 3 LBs
> with 15625 VIPs each, attached to all node LSes, etc.  Test performed
> with monitor-all=true.
> 
> Note 1:
>  - Memory consumption is checked at the end of a test in a following
>way: 1) check RSS 2) compact database 3) check RSS again.
>It's observed that ovn-controllers in this test are fairly slow
>and backlog builds up on monitors, because ovn-controllers are
>not able to receive updates fast enough.  This contributes to
>RSS of the process, especially in combination of glibc bug (glibc
>doesn't free fastbins back to the system).  Memory trimming on
>compaction is enabled in the test, so after compaction we can
>see more or less real value of the RSS at the end of the test
>without backlog noise. (Compaction on relay in this case is
>just plain malloc_trim()).
> 
> Note 2:
>  - I didn't collect memory consumption (RSS) after compaction for a
>test with 10 relays, because I got the idea only after the test
>was finished and another one already started.  And run takes
>significant amount of time.  So, values marked with a star (*)
>are an approximation based on results form other tests, hence
>

[ovs-dev] [PATCH v5 7/7] tests: Add new test for cross-numa pmd rxq assignments.

2021-07-14 Thread Kevin Traynor
Add some tests to ensure that if there are numa local
PMDs they are used for polling an rxq.

Also check that if there are only numa non-local PMDs they
will be used to poll the rxq, but the user will be warned.

Signed-off-by: Kevin Traynor 
Acked-by: Sunil Pai G 
Acked-by: David Marchand 
---
 tests/pmd.at | 154 +++
 1 file changed, 154 insertions(+)

diff --git a/tests/pmd.at b/tests/pmd.at
index 5ab4415b0..1df35057f 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -196,4 +196,158 @@ OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([PMD - pmd-cpu-mask - NUMA])
+OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy-pmd 
options:n_rxq=8 options:numa_id=1 -- set Open_vSwitch . 
other_config:pmd-cpu-mask=1],
+   [], [], [--dummy-numa 1,1,0,0])
+
+TMP=$(($(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1))
+CHECK_CPU_DISCOVERED(4)
+CHECK_PMD_THREADS_CREATED()
+
+AT_CHECK([ovs-appctl dpif/show | sed 
's/\(tx_queues=\)[[0-9]]*/\1<cleared>/g'], [0], [dnl
+dummy@ovs-dummy: hit:0 missed:0
+  br0:
+br0 65534/100: (dummy-internal)
+p0 1/1: (dummy-pmd: configured_rx_queues=8, 
configured_tx_queues=<cleared>, requested_rx_queues=8, 
requested_tx_queues=<cleared>)
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], 
[0], [dnl
+pmd thread numa_id  core_id :
+  isolated : false
+  port: p0queue-id:  0 (enabled)   pmd usage: NOT AVAIL
+  port: p0queue-id:  1 (enabled)   pmd usage: NOT AVAIL
+  port: p0queue-id:  2 (enabled)   pmd usage: NOT AVAIL
+  port: p0queue-id:  3 (enabled)   pmd usage: NOT AVAIL
+  port: p0queue-id:  4 (enabled)   pmd usage: NOT AVAIL
+  port: p0queue-id:  5 (enabled)   pmd usage: NOT AVAIL
+  port: p0queue-id:  6 (enabled)   pmd usage: NOT AVAIL
+  port: p0queue-id:  7 (enabled)   pmd usage: NOT AVAIL
+])
+
+# Force cross-numa polling
+TMP=$(($(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1))
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xc])
+CHECK_PMD_THREADS_CREATED([2], [0], [+$TMP])
+OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Performing pmd to rx 
queue assignment using cycles algorithm"])
+OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "There's no available 
(non-isolated) pmd thread on numa node 1. Port 'p0' rx queue 7 will be assigned 
to a pmd on numa node 0. This may lead to reduced performance."])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | awk '/AVAIL$/ { printf("%s\t", 
$0); next } 1' | parse_pmd_rxq_show_group | sort], [0], [dnl
+port: p0 queue-id: 0 3 4 7
+port: p0 queue-id: 1 2 5 6
+])
+
+# Check other assignment types for cross-numa polling
+TMP=$(($(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1))
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=roundrobin])
+OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Performing pmd to rx 
queue assignment using roundrobin algorithm"])
+OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "There's no available 
(non-isolated) pmd thread on numa node 1. Port 'p0' rx queue 7 will be assigned 
to a pmd on numa node 0. This may lead to reduced performance."])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | awk '/AVAIL$/ { printf("%s\t", 
$0); next } 1' | parse_pmd_rxq_show_group | sort], [0], [dnl
+port: p0 queue-id: 0 2 4 6
+port: p0 queue-id: 1 3 5 7
+])
+
+TMP=$(($(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1))
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=group])
+OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Performing pmd to rx 
queue assignment using group algorithm"])
+OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "There's no available 
(non-isolated) pmd thread on numa node 1. Port 'p0' rx queue 7 will be assigned 
to a pmd on numa node 0. This may lead to reduced performance."])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | awk '/AVAIL$/ { printf("%s\t", 
$0); next } 1' | parse_pmd_rxq_show_group | sort], [0], [dnl
+port: p0 queue-id: 0 2 4 6
+port: p0 queue-id: 1 3 5 7
+])
+
+# Switch back to same numa
+TMP=$(($(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1))
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3])
+CHECK_PMD_THREADS_CREATED([2], [1], [+$TMP])
+OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Performing pmd to rx 
queue assignment using group algorithm"])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | awk '/AVAIL$/ { printf("%s\t", 
$0); next } 1' | parse_pmd_rxq_show_group | sort], [0], [dnl
+port: p0 queue-id: 0 2 4 6
+port: p0 queue-id: 1 3 5 7
+])
+
+# Check local numa is only used if available
+TMP=$(($(cat ovs-vswitchd.log | wc -l | tr -d [[:blank:]])+1))
+AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6])
+CHECK_PMD_THREADS_CREATED([1], [0], [+$TMP])
+CHECK_PMD_THREADS_CREATED([1], [1], [+$TMP])

[ovs-dev] [PATCH v5 6/7] dpif-netdev: Allow pin rxq and non-isolate PMD.

2021-07-14 Thread Kevin Traynor
Pinning an rxq to a PMD with pmd-rxq-affinity may be done for
various reasons such as reserving a full PMD for an rxq, or to
ensure that multiple rxqs from a port are handled on different PMDs.

Previously pmd-rxq-affinity always isolated the PMD so no other rxqs
could be assigned to it by OVS. There may be cases where there are
unused cycles on those PMDs and the user would like other rxqs to
also be assignable to them by OVS.

Add an option to pin the rxq and non-isolate the PMD. The default
behaviour is unchanged, which is to pin and isolate the PMD.

In order to pin and non-isolate:
ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-isolate=false

Note this is available only with group assignment type, as pinning
conflicts with the operation of the other rxq assignment algorithms.

Signed-off-by: Kevin Traynor 
Acked-by: Sunil Pai G 
Acked-by: David Marchand 
---
 Documentation/topics/dpdk/pmd.rst |   9 ++-
 NEWS  |   3 +
 lib/dpif-netdev.c |  34 --
 tests/pmd.at  | 105 ++
 vswitchd/vswitch.xml  |  19 ++
 5 files changed, 162 insertions(+), 8 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index 29ba53954..30040d703 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -102,6 +102,11 @@ like so:
 - Queue #3 pinned to core 8
 
-PMD threads on cores where Rx queues are *pinned* will become *isolated*. This
-means that this thread will only poll the *pinned* Rx queues.
+PMD threads on cores where Rx queues are *pinned* will become *isolated* by
+default. This means that this thread will only poll the *pinned* Rx queues.
+
+If using ``pmd-rxq-assign=group`` PMD threads with *pinned* Rxqs can be
+*non-isolated* by setting::
+
+  $ ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-isolate=false
 
 .. warning::
diff --git a/NEWS b/NEWS
index be825150b..0cf61a7f7 100644
--- a/NEWS
+++ b/NEWS
@@ -35,4 +35,7 @@ Post-v2.15.0
  * Added new 'group' option to pmd-rxq-assign. This will assign rxq to pmds
purely based on rxq and pmd load.
+ * Add new 'pmd-rxq-isolate' option that can be set to 'false' in order
+   that pmd cores which are pinned with rxqs using 'pmd-rxq-affinity'
+   are available for assigning other non-pinned rxqs.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e62535e67..47bed2736 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -291,4 +291,5 @@ struct dp_netdev {
 /* Rxq to pmd assignment type. */
 enum sched_assignment_type pmd_rxq_assign_type;
+bool pmd_iso;
 
 /* Protects the access of the 'struct dp_netdev_pmd_thread'
@@ -4012,4 +4013,22 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 }
 
+bool pmd_iso = smap_get_bool(other_config, "pmd-rxq-isolate", true);
+
+if (pmd_rxq_assign_type != SCHED_GROUP && pmd_iso == false) {
+/* Invalid combination. */
+VLOG_WARN("pmd-rxq-isolate can only be set false "
+  "when using pmd-rxq-assign=group");
+pmd_iso = true;
+}
+if (dp->pmd_iso != pmd_iso) {
+dp->pmd_iso = pmd_iso;
+if (pmd_iso) {
+VLOG_INFO("pmd-rxq-affinity isolates PMD core");
+} else {
+VLOG_INFO("pmd-rxq-affinity does not isolate PMD core");
+}
+dp_netdev_request_reconfigure(dp);
+}
+
 struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
 bool cur_rebalance_requested = pmd_alb->auto_lb_requested;
@@ -4741,5 +4760,5 @@ sched_numa_list_assignments(struct sched_numa_list 
*numa_list,
 sched_pmd = sched_pmd_find_by_pmd(numa_list, rxq->pmd);
 if (sched_pmd) {
-if (rxq->core_id != OVS_CORE_UNSPEC) {
+if (rxq->core_id != OVS_CORE_UNSPEC && dp->pmd_iso) {
 sched_pmd->isolated = true;
 }
@@ -5008,4 +5027,5 @@ sched_numa_list_schedule(struct sched_numa_list 
*numa_list,
 struct dp_netdev_pmd_thread *pmd;
 struct sched_numa *numa;
+bool iso = dp->pmd_iso;
 uint64_t proc_cycles;
 char rxq_cyc_log[MAX_RXQ_CYC_STRLEN];
@@ -5030,9 +5050,11 @@ sched_numa_list_schedule(struct sched_numa_list 
*numa_list,
 continue;
 }
-/* Mark PMD as isolated if not done already. */
-if (sched_pmd->isolated == false) {
-sched_pmd->isolated = true;
-numa = sched_pmd->numa;
-numa->n_isolated++;
+if (iso) {
+/* Mark PMD as isolated if not done already. */
+if (sched_pmd->isolated == false) {
+sched_pmd->isolated = true;
+numa = 

[ovs-dev] [PATCH v5 1/7] dpif-netdev: Rework rxq scheduling code.

2021-07-14 Thread Kevin Traynor
This reworks the current rxq scheduling code to break it into more
generic and reusable pieces.

The behaviour does not change from a user perspective, except the logs
are updated to be more consistent.

From an implementation point of view, there are some changes made with
a mind to extending functionality.

The high level reusable functions added in this patch are:
- Generate a list of current numas and pmds
- Perform rxq scheduling assignments into that list
- Effect the rxq scheduling assignments so they are used

Signed-off-by: Kevin Traynor 
Acked-by: Sunil Pai G 
Acked-by: David Marchand 
---
 lib/dpif-netdev.c | 557 +-
 tests/pmd.at  |   2 +-
 2 files changed, 450 insertions(+), 109 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 610949f36..5268640a3 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -222,4 +222,10 @@ struct pmd_auto_lb {
 };
 
+enum sched_assignment_type {
+SCHED_ROUNDROBIN,
+SCHED_CYCLES, /* Default.*/
+SCHED_MAX
+};
+
 /* Datapath based on the network device interface from netdev.h.
  *
@@ -282,6 +288,6 @@ struct dp_netdev {
 struct id_pool *tx_qid_pool;
 struct ovs_mutex tx_qid_pool_mutex;
-/* Use measured cycles for rxq to pmd assignment. */
-bool pmd_rxq_assign_cyc;
+/* Rxq to pmd assignment type. */
+enum sched_assignment_type pmd_rxq_assign_type;
 
 /* Protects the access of the 'struct dp_netdev_pmd_thread'
@@ -1485,5 +1491,5 @@ create_dp_netdev(const char *name, const struct 
dpif_class *class,
 
 cmap_init(&dp->poll_threads);
-dp->pmd_rxq_assign_cyc = true;
+dp->pmd_rxq_assign_type = SCHED_CYCLES;
 
 ovs_mutex_init(&dp->tx_qid_pool_mutex);
@@ -3865,5 +3871,5 @@ set_pmd_auto_lb(struct dp_netdev *dp, bool always_log)
 bool enable_alb = false;
 bool multi_rxq = false;
-bool pmd_rxq_assign_cyc = dp->pmd_rxq_assign_cyc;
+enum sched_assignment_type pmd_rxq_assign_type = dp->pmd_rxq_assign_type;
 
 /* Ensure that there is at least 2 non-isolated PMDs and
@@ -3884,6 +3890,6 @@ set_pmd_auto_lb(struct dp_netdev *dp, bool always_log)
 }
 
-/* Enable auto LB if it is requested and cycle based assignment is true. */
-enable_alb = enable_alb && pmd_rxq_assign_cyc &&
+/* Enable auto LB if requested and not using roundrobin assignment. */
+enable_alb = enable_alb && pmd_rxq_assign_type != SCHED_ROUNDROBIN &&
 pmd_alb->auto_lb_requested;
 
@@ -3926,4 +3932,5 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 uint8_t rebalance_improve;
 bool log_autolb = false;
+enum sched_assignment_type pmd_rxq_assign_type;
 
 tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
@@ -3984,13 +3991,17 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 }
 
-bool pmd_rxq_assign_cyc = !strcmp(pmd_rxq_assign, "cycles");
-if (!pmd_rxq_assign_cyc && strcmp(pmd_rxq_assign, "roundrobin")) {
-VLOG_WARN("Unsupported Rxq to PMD assignment mode in pmd-rxq-assign. "
-  "Defaulting to 'cycles'.");
-pmd_rxq_assign_cyc = true;
+if (!strcmp(pmd_rxq_assign, "roundrobin")) {
+pmd_rxq_assign_type = SCHED_ROUNDROBIN;
+} else if (!strcmp(pmd_rxq_assign, "cycles")) {
+pmd_rxq_assign_type = SCHED_CYCLES;
+} else {
+/* Default. */
+VLOG_WARN("Unsupported rx queue to PMD assignment mode in "
+  "pmd-rxq-assign. Defaulting to 'cycles'.");
+pmd_rxq_assign_type = SCHED_CYCLES;
 pmd_rxq_assign = "cycles";
 }
-if (dp->pmd_rxq_assign_cyc != pmd_rxq_assign_cyc) {
-dp->pmd_rxq_assign_cyc = pmd_rxq_assign_cyc;
+if (dp->pmd_rxq_assign_type != pmd_rxq_assign_type) {
+dp->pmd_rxq_assign_type = pmd_rxq_assign_type;
 VLOG_INFO("Rxq to PMD assignment mode changed to: \'%s\'.",
   pmd_rxq_assign);
@@ -4652,4 +4663,196 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
 }
 
+struct sched_numa_list {
+struct hmap numas;  /* Contains 'struct sched_numa'. */
+};
+
+/* Meta data for out-of-place pmd rxq assignments. */
+struct sched_pmd {
+struct sched_numa *numa;
+/* Associated PMD thread. */
+struct dp_netdev_pmd_thread *pmd;
+uint64_t pmd_proc_cycles;
+struct dp_netdev_rxq **rxqs;
+unsigned n_rxq;
+bool isolated;
+};
+
+struct sched_numa {
+struct hmap_node node;
+int numa_id;
+/* PMDs on numa node. */
+struct sched_pmd *pmds;
+/* Num of PMDs on numa node. */
+unsigned n_pmds;
+/* Num of isolated PMDs on numa node. */
+unsigned n_isolated;
+int rr_cur_index;
+bool rr_idx_inc;
+};
+
+static size_t
+sched_numa_list_count(struct sched_numa_list *numa_list)
+{
+return hmap_count(_list->numas);
+}
+
+static struct sched_numa *
+sched_numa_list_next(struct sched_numa_list *numa_list,
+ const struct sched_numa *numa)
+{
+struct 

[ovs-dev] [PATCH v5 5/7] dpif-netdev: Add group rxq scheduling assignment type.

2021-07-14 Thread Kevin Traynor
Add an rxq scheduling option that allows rxqs to be grouped
on a pmd based purely on their load.

The current default 'cycles' assignment sorts rxqs by measured
processing load and then assigns them to a list of round robin PMDs.
This helps to keep the rxqs that require most processing on different
cores but as it selects the PMDs in round robin order, it equally
distributes rxqs to PMDs.

'cycles' assignment has the advantage in that it separates the most
loaded rxqs from being on the same core but maintains the rxqs being
spread across a broad range of PMDs to mitigate against changes to
traffic pattern.

'cycles' assignment has the disadvantage that in order to make the
trade off between optimising for current traffic load and mitigating
against future changes, it tries to assign and equal amount of rxqs
per PMD in a round robin manner and this can lead to a less than optimal
balance of the processing load.

Now that PMD auto load balance can help mitigate future changes in
traffic patterns, a 'group' assignment can be used to assign rxqs based
on their measured cycles and the estimated running total of the PMDs.

In this case, there is no restriction about keeping equal number of
rxqs per PMD as it is purely load based.

This means that one PMD may have a group of low load rxqs assigned to it
while another PMD has one high load rxq assigned to it, as that is the
best balance of their measured loads across the PMDs.
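The greedy idea can be sketched as follows (an illustration of the algorithm described above, not the actual dpif-netdev code): sort the rxqs by measured processing cycles, descending, then give each rxq to the PMD with the lowest running total of assigned cycles. With loads of 80/70/50/10/10 across three PMDs, one PMD ends up with only the 80% rxq while another takes the 50% rxq plus both 10% rxqs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct rxq {
    int id;
    uint64_t cycles;    /* Measured processing cycles for this rxq. */
};

/* Sort comparator: heaviest rxq first. */
static int
cmp_rxq_cycles_desc(const void *a_, const void *b_)
{
    const struct rxq *a = a_, *b = b_;
    return (a->cycles < b->cycles) - (a->cycles > b->cycles);
}

/* Sorts 'rxqs' in place and fills out[i] with the PMD index chosen for
 * rxqs[i] after sorting; 'pmd_load' accumulates the running totals. */
static void
group_assign(struct rxq *rxqs, size_t n_rxqs,
             uint64_t *pmd_load, size_t n_pmds, size_t *out)
{
    qsort(rxqs, n_rxqs, sizeof *rxqs, cmp_rxq_cycles_desc);
    for (size_t i = 0; i < n_rxqs; i++) {
        size_t lowest = 0;

        for (size_t p = 1; p < n_pmds; p++) {
            if (pmd_load[p] < pmd_load[lowest]) {
                lowest = p;
            }
        }
        pmd_load[lowest] += rxqs[i].cycles;
        out[i] = lowest;
    }
}
```

Because there is no round-robin constraint, the resulting per-PMD totals are as even as the measured loads allow, which is the property the auto load balancer then maintains over time.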

Signed-off-by: Kevin Traynor 
Acked-by: Sunil Pai G 
Acked-by: David Marchand 
---
 Documentation/topics/dpdk/pmd.rst | 26 +++
 NEWS  |  2 ++
 lib/dpif-netdev.c | 42 +--
 tests/pmd.at  | 19 --
 vswitchd/vswitch.xml  |  5 +++-
 5 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index 065bd16ef..29ba53954 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -137,4 +137,30 @@ The Rx queues will be assigned to the cores in the 
following order::
 Core 8: Q3 (60%) | Q0 (30%)
 
+``group`` assignment is similar to ``cycles`` in that the Rxqs will be
+ordered by their measured processing cycles before being assigned to PMDs.
+It differs from ``cycles`` in that it uses a running estimate of the cycles
+that will be on each PMD to select the PMD with the lowest load for each Rxq.
+
+This means that there can be a group of low traffic Rxqs on one PMD, while a
+high traffic Rxq may have a PMD to itself. Where ``cycles`` keeps as close to
+the same number of Rxqs per PMD as possible, with ``group`` this restriction is
+removed for a better balance of the workload across PMDs.
+
+For example, where there are five Rx queues and three cores - 3, 7, and 8 -
+available and the measured usage of core cycles per Rx queue over the last
+interval is seen to be:
+
+- Queue #0: 10%
+- Queue #1: 80%
+- Queue #3: 50%
+- Queue #4: 70%
+- Queue #5: 10%
+
+The Rx queues will be assigned to the cores in the following order::
+
+Core 3: Q1 (80%) |
+Core 7: Q4 (70%) |
+Core 8: Q3 (50%) | Q0 (10%) | Q5 (10%)
+
 Alternatively, ``roundrobin`` assignment can be used, where the Rxqs are
 assigned to PMDs in a round-robined fashion. This algorithm was used by
diff --git a/NEWS b/NEWS
index 6cdccc715..be825150b 100644
--- a/NEWS
+++ b/NEWS
@@ -33,4 +33,6 @@ Post-v2.15.0
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
+ * Added new 'group' option to pmd-rxq-assign. This will assign rxq to pmds
+   purely based on rxq and pmd load.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 53d5f57d0..e62535e67 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -225,4 +225,5 @@ enum sched_assignment_type {
 SCHED_ROUNDROBIN,
 SCHED_CYCLES, /* Default.*/
+SCHED_GROUP,
 SCHED_MAX
 };
@@ -3995,4 +3996,6 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 } else if (!strcmp(pmd_rxq_assign, "cycles")) {
 pmd_rxq_assign_type = SCHED_CYCLES;
+} else if (!strcmp(pmd_rxq_assign, "group")) {
+pmd_rxq_assign_type = SCHED_GROUP;
 } else {
 /* Default. */
@@ -4837,4 +4840,32 @@ compare_rxq_cycles(const void *a, const void *b)
 }
 
+static struct sched_pmd *
+sched_pmd_get_lowest(struct sched_numa *numa, bool has_cyc)
+{
+struct sched_pmd *lowest_sched_pmd = NULL;
+uint64_t lowest_num = UINT64_MAX;
+
+for (unsigned i = 0; i < numa->n_pmds; i++) {
+struct sched_pmd *sched_pmd;
+uint64_t pmd_num;
+
+sched_pmd = &numa->pmds[i];
+if (sched_pmd->isolated) {
+continue;
+}
+if (has_cyc) {
+pmd_num = sched_pmd->pmd_proc_cycles;

[ovs-dev] [PATCH v5 4/7] dpif-netdev: Assign PMD for failed pinned rxqs.

2021-07-14 Thread Kevin Traynor
Previously, if pmd-rxq-affinity was used to pin an rxq to
a core that was not in pmd-cpu-mask the rxq was not polled
for and the user received a warning. This meant that no traffic
would be received from that rxq.

Now that pinned and non-pinned rxqs are assigned to PMDs in
a common call to rxq scheduling, if an invalid core is
selected in pmd-rxq-affinity the rxq can be assigned an
available PMD (if any).

A warning will still be logged as the requested core could
not be used.
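The fallback described above can be modelled with a short sketch. This is illustrative only, with hypothetical names; the real code simply re-queues the failed rxq into the common scheduler's list, as the diff below shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if 'core' is one of the 'n_cores' cores enabled by
 * pmd-cpu-mask. */
static bool
core_available(const int *cores, size_t n_cores, int core)
{
    for (size_t i = 0; i < n_cores; i++) {
        if (cores[i] == core) {
            return true;
        }
    }
    return false;
}

/* Returns the core that will poll a pinned rxq: the requested core when it
 * is available, otherwise the (hypothetical) fallback core picked by the
 * common scheduler instead of leaving the rxq unpolled. */
static int
resolve_pinned_core(const int *cores, size_t n_cores,
                    int requested, int fallback)
{
    return core_available(cores, n_cores, requested) ? requested : fallback;
}
```

The key behavioural change is only the `fallback` branch: previously an unavailable core meant the rxq was dropped from polling entirely.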

Signed-off-by: Kevin Traynor 
Acked-by: Sunil Pai G 
Acked-by: David Marchand 
---
 Documentation/topics/dpdk/pmd.rst | 6 +++---
 lib/dpif-netdev.c | 5 -
 tests/pmd.at  | 5 -
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
index e481e7941..065bd16ef 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -108,7 +108,7 @@ means that this thread will only poll the *pinned* Rx queues.
 
If there are no *non-isolated* PMD threads, *non-pinned* RX queues will not
-   be polled. Also, if the provided <core-id> is not available (e.g. the
-   <core-id> is not in ``pmd-cpu-mask``), the RX queue will not be polled
-   by any PMD thread.
+   be polled. If the provided <core-id> is not available (e.g. the
+   <core-id> is not in ``pmd-cpu-mask``), the RX queue will be assigned to
+   a *non-isolated* PMD, that will remain *non-isolated*.
 
 If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7fa7c2a9d..53d5f57d0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4982,9 +4982,12 @@ sched_numa_list_schedule(struct sched_numa_list *numa_list,
 "Core %2u cannot be pinned with "
 "port \'%s\' rx queue %d. Use pmd-cpu-mask to "
-"enable a pmd on core %u.",
+"enable a pmd on core %u. An alternative core "
+"will be assigned.",
 rxq->core_id,
 netdev_rxq_get_name(rxq->rx),
 netdev_rxq_get_queue_id(rxq->rx),
 rxq->core_id);
+rxqs = xrealloc(rxqs, (n_rxqs + 1) * sizeof *rxqs);
+rxqs[n_rxqs++] = rxq;
 continue;
 }
diff --git a/tests/pmd.at b/tests/pmd.at
index dbf8952c4..112b7f869 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6])
 
 dnl We removed the cores requested by some queues from pmd-cpu-mask.
-dnl Those queues will not be polled.
+dnl Those queues will be polled by remaining non-isolated pmds.
 AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | parse_pmd_rxq_show], [0], [dnl
+p1 0 0 1
+p1 1 0 1
 p1 2 0 2
+p1 3 0 1
 ])
 
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v5 3/7] dpif-netdev: Sort PMD list by core id for rxq scheduling.

2021-07-14 Thread Kevin Traynor
The list of PMDs is round robined through for the selection
when assigning an rxq to a PMD. The list is based on a
hash map, so there is no defined order.

It means the same set of PMDs may get assigned different rxqs
on different runs for no reason other than how the PMDs are stored
in the hash map.

This can be easily changed by sorting the PMDs by core id after
they are extracted, so the PMDs will be used in a consistent order.
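The sorting step is a plain `qsort` by core id. A minimal stand-alone sketch (the real comparator delegates to `compare_poll_thread_list`; the `struct pmd` here is illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the per-NUMA PMD entries. */
struct pmd {
    unsigned core_id;
};

/* qsort comparator: order PMDs by ascending core id so scheduling walks
 * them in the same order on every run. */
static int
compare_pmd_core_id(const void *a_, const void *b_)
{
    const struct pmd *a = a_;
    const struct pmd *b = b_;

    return a->core_id < b->core_id ? -1 : a->core_id > b->core_id;
}

static void
sort_pmds(struct pmd *pmds, size_t n_pmds)
{
    if (n_pmds > 1) {
        qsort(pmds, n_pmds, sizeof *pmds, compare_pmd_core_id);
    }
}
```

With a deterministic order, repeated runs of the scheduler (or dry runs by the auto load balancer) produce identical assignments given identical inputs.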

Signed-off-by: Kevin Traynor 
Acked-by: Sunil Pai G 
Acked-by: David Marchand 
---
 lib/dpif-netdev.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 76667b5f1..7fa7c2a9d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4593,4 +4593,28 @@ sched_numa_list_lookup(struct sched_numa_list *numa_list, int numa_id)
 }
 
+static int
+compare_sched_pmd_list(const void *a_, const void *b_)
+{
+    struct sched_pmd *a, *b;
+
+    a = (struct sched_pmd *) a_;
+    b = (struct sched_pmd *) b_;
+
+    return compare_poll_thread_list(&a->pmd, &b->pmd);
+}
+
+static void
+sort_numa_list_pmds(struct sched_numa_list *numa_list)
+{
+    struct sched_numa *numa;
+
+    HMAP_FOR_EACH (numa, node, &numa_list->numas) {
+        if (numa->n_pmds > 1) {
+            qsort(numa->pmds, numa->n_pmds, sizeof *numa->pmds,
+                  compare_sched_pmd_list);
+        }
+    }
+}
+
 /* Populate numas and pmds on those numas. */
 static void
@@ -4631,4 +4655,5 @@ sched_numa_list_populate(struct sched_numa_list *numa_list,
 numa->rr_idx_inc = true;
 }
+sort_numa_list_pmds(numa_list);
 }
 
-- 
2.31.1



[ovs-dev] [PATCH v5 2/7] dpif-netdev: Make PMD auto load balance use common rxq scheduling.

2021-07-14 Thread Kevin Traynor
PMD auto load balance had its own separate implementation of the
rxq scheduling that it used for dry runs. This was done because
previously the rxq scheduling was not made reusable for a dry run.

Apart from the code duplication (which is a good enough reason
to replace it alone) this meant that if any further rxq scheduling
changes or assignment types were added they would also have to be
duplicated in the auto load balance code too.

This patch replaces the current PMD auto load balance rxq scheduling
code to reuse the common rxq scheduling code.

The behaviour does not change from a user perspective, except the logs
are updated to be more consistent.

As the dry run will compare the pmd load variances for current and
estimated assignments, new functions are added to populate the current
assignments and use the rxq scheduling data structs for variance
calculations.

Now that the new rxq scheduling data structures are being used in
PMD auto load balance, the older rr_* data structs and associated
functions can be removed.

Signed-off-by: Kevin Traynor 
Acked-by: Sunil Pai G 
Acked-by: David Marchand 
---
 lib/dpif-netdev.c | 508 +++---
 1 file changed, 161 insertions(+), 347 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5268640a3..76667b5f1 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4529,138 +4529,4 @@ port_reconfigure(struct dp_netdev_port *port)
 }
 
-struct rr_numa_list {
-    struct hmap numas;  /* Contains 'struct rr_numa' */
-};
-
-struct rr_numa {
-    struct hmap_node node;
-
-    int numa_id;
-
-    /* Non isolated pmds on numa node 'numa_id' */
-    struct dp_netdev_pmd_thread **pmds;
-    int n_pmds;
-
-    int cur_index;
-    bool idx_inc;
-};
-
-static size_t
-rr_numa_list_count(struct rr_numa_list *rr)
-{
-    return hmap_count(&rr->numas);
-}
-
-static struct rr_numa *
-rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
-{
-    struct rr_numa *numa;
-
-    HMAP_FOR_EACH_WITH_HASH (numa, node, hash_int(numa_id, 0), &rr->numas) {
-        if (numa->numa_id == numa_id) {
-            return numa;
-        }
-    }
-
-    return NULL;
-}
-
-/* Returns the next node in numa list following 'numa' in round-robin fashion.
- * Returns first node if 'numa' is a null pointer or the last node in 'rr'.
- * Returns NULL if 'rr' numa list is empty. */
-static struct rr_numa *
-rr_numa_list_next(struct rr_numa_list *rr, const struct rr_numa *numa)
-{
-    struct hmap_node *node = NULL;
-
-    if (numa) {
-        node = hmap_next(&rr->numas, &numa->node);
-    }
-    if (!node) {
-        node = hmap_first(&rr->numas);
-    }
-
-    return (node) ? CONTAINER_OF(node, struct rr_numa, node) : NULL;
-}
-
-static void
-rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
-{
-    struct dp_netdev_pmd_thread *pmd;
-    struct rr_numa *numa;
-
-    hmap_init(&rr->numas);
-
-    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
-        if (pmd->core_id == NON_PMD_CORE_ID || pmd->isolated) {
-            continue;
-        }
-
-        numa = rr_numa_list_lookup(rr, pmd->numa_id);
-        if (!numa) {
-            numa = xzalloc(sizeof *numa);
-            numa->numa_id = pmd->numa_id;
-            hmap_insert(&rr->numas, &numa->node, hash_int(pmd->numa_id, 0));
-        }
-        numa->n_pmds++;
-        numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
-        numa->pmds[numa->n_pmds - 1] = pmd;
-        /* At least one pmd so initialise curr_idx and idx_inc. */
-        numa->cur_index = 0;
-        numa->idx_inc = true;
-    }
-}
-
-/*
- * Returns the next pmd from the numa node.
- *
- * If 'updown' is 'true' it will alternate between selecting the next pmd in
- * either an up or down walk, switching between up/down when the first or last
- * core is reached. e.g. 1,2,3,3,2,1,1,2...
- *
- * If 'updown' is 'false' it will select the next pmd wrapping around when last
- * core reached. e.g. 1,2,3,1,2,3,1,2...
- */
-static struct dp_netdev_pmd_thread *
-rr_numa_get_pmd(struct rr_numa *numa, bool updown)
-{
-    int numa_idx = numa->cur_index;
-
-    if (numa->idx_inc == true) {
-        /* Incrementing through list of pmds. */
-        if (numa->cur_index == numa->n_pmds-1) {
-            /* Reached the last pmd. */
-            if (updown) {
-                numa->idx_inc = false;
-            } else {
-                numa->cur_index = 0;
-            }
-        } else {
-            numa->cur_index++;
-        }
-    } else {
-        /* Decrementing through list of pmds. */
-        if (numa->cur_index == 0) {
-            /* Reached the first pmd. */
-            numa->idx_inc = true;
-        } else {
-            numa->cur_index--;
-        }
-    }
-    return numa->pmds[numa_idx];
-}
-
-static void
-rr_numa_list_destroy(struct rr_numa_list *rr)
-{
-    struct rr_numa *numa;
-
-    HMAP_FOR_EACH_POP (numa, node, &rr->numas) {
-        free(numa->pmds);
-        free(numa);
-    }
-    hmap_destroy(&rr->numas);
-}
-
 struct 

[ovs-dev] [PATCH v5 0/7] Rxq scheduling updates.

2021-07-14 Thread Kevin Traynor
The first two patches do not provide new functionality for the user
(except the logs are reworked). They are reworking to make the
rxq scheduling and PMD auto load balance code more modular for cleanup
and to be used by subsequent patches. They are also removing the code
duplication between them by having some common functions they can both use.

The other patches are new functionality and unit tests.

github actions passing:
https://github.com/kevintraynor/ovs/actions/runs/1011364932

v5:
- Rebased NEWS file update and added Acks. No code changes.

v4:
- Fixed NEWS file conflict from (almost) mid-air collision of other patch merging

v3:
-  fixed asan enabled unit tests

v2:
- added unit tests
- rework from comments
- much renaming and minor fixes
- reordered the patches and added 2 more

---

1/7 reworks the current rxq scheduling code to make it more modular
and reusable. No functional change.
v3:
- fixed asan enabled unit tests
- minor comment syntax
v2:
- renamed functions on David's suggestions
- used enum instead of bool for assignment type from the start
- fixed mem leak
- removed/simplified some redundant code

2/7 makes PMD auto load balance reuse the common rxq scheduling code
and removes the duplication of the rxq scheduling code in PMD auto load
balance for making a dry run. No functional change.
v3:
- added missing clang annotation
v2: minor changes

3/7 new in v2. This is a small patch to make the pmd list used
for rxq scheduling ordered by core id. This is just to add some consistency
between schedules/test runs/pmd-cpu-mask changes.

4/7 provides a fallback for if the user tries to pin an rxq to a PMD with
pmd-rxq-affinity but the PMD is not in the pmd-cpu-mask. Previously it was
not polled. Now it will be polled by an available core.
v3:
- minor comment syntax
v2:
- removed some unneeded code by David's suggestion here and in 1/7 of not
  post-processing rxqs that have been already pinned

5/7 adds a new option to assign rxqs to pmds that incorporates the
estimated load of the PMD and removes the restriction for trying to
equally distribute the number of rxqs across the PMDs. This means it
is solely load based so will help optimize balancing the processing
load across the PMDs. With this method, a group of low loaded rxqs
may be on one PMD, while another PMD could have just one highly loaded
rxq.
v4/5:
- fixed NEWS conflict
v2:
- combined the lowest_* functions on Sunil's suggestion
- simplified some code
- added unit tests

6/7 adds an option to non-isolate the PMD when it is pinned with an rxq
using pmd-rxq-affinity.
v2:
- added unit tests

7/7 new in v2. There was no unit tests testing cross-numa assignments.
i.e. what happens when there is no numa local pmds for an rxq. Aside
from using the new logs, these tests are relevant regardless of this patchset.

Kevin Traynor (7):
  dpif-netdev: Rework rxq scheduling code.
  dpif-netdev: Make PMD auto load balance use common rxq scheduling.
  dpif-netdev: Sort PMD list by core id for rxq scheduling.
  dpif-netdev: Assign PMD for failed pinned rxqs.
  dpif-netdev: Add group rxq scheduling assignment type.
  dpif-netdev: Allow pin rxq and non-isolate PMD.
  tests: Add new test for cross-numa pmd rxq assignments.

 Documentation/topics/dpdk/pmd.rst |   41 +-
 NEWS  |5 +
 lib/dpif-netdev.c | 1057 ++---
 tests/pmd.at  |  285 +++-
 vswitchd/vswitch.xml  |   24 +-
 5 files changed, 995 insertions(+), 417 deletions(-)

-- 
2.31.1



[ovs-dev] [PATCH] latch-unix: Make the latch read buffer shared

2021-07-14 Thread anton . ivanov
From: Anton Ivanov 

There is no point in adding 512 bytes to the stack
every time the latch is polled: alignment, cache line
thrashing, and so on.

The result of the read is discarded anyway so the buffer
can be shared by all latches.
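A minimal stand-alone sketch of the idea, assuming POSIX, is below. It is not the OVS code itself (OVS's latch also adds poll-loop integration and error handling); it only shows why one shared static buffer is enough when the read data is discarded.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* All reads only drain the pipe; their contents are discarded, so one
 * small static buffer can serve every latch in the process. */
static char latch_buffer[16];

struct latch {
    int fds[2];                 /* fds[0]: read end, fds[1]: write end. */
};

static void
latch_init(struct latch *latch)
{
    pipe(latch->fds);
    /* Nonblocking read end: polling an unset latch must not block. */
    fcntl(latch->fds[0], F_SETFL, O_NONBLOCK);
}

static void
latch_set(struct latch *latch)
{
    write(latch->fds[1], "", 1);  /* Any single byte wakes the reader. */
}

/* Returns nonzero if the latch was set; drains it as a side effect. */
static int
latch_poll(struct latch *latch)
{
    return read(latch->fds[0], &latch_buffer, sizeof latch_buffer) > 0;
}
```

Because every caller discards the drained bytes, concurrent polls scribbling over the same `latch_buffer` is harmless, which is the premise of the patch.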

Signed-off-by: Anton Ivanov 
---
 lib/latch-unix.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/latch-unix.c b/lib/latch-unix.c
index 2995076d6..e43fab770 100644
--- a/lib/latch-unix.c
+++ b/lib/latch-unix.c
@@ -23,6 +23,9 @@
 #include "openvswitch/poll-loop.h"
 #include "socket-util.h"
 
+/* All writes to latch are zero sized. Even 16 bytes are an overkill */
+static char latch_buffer[16];
+
 /* Initializes 'latch' as initially unset. */
 void
 latch_init(struct latch *latch)
@@ -43,9 +46,7 @@ latch_destroy(struct latch *latch)
 bool
 latch_poll(struct latch *latch)
 {
-    char buffer[_POSIX_PIPE_BUF];
-
-    return read(latch->fds[0], buffer, sizeof buffer) > 0;
+    return read(latch->fds[0], &latch_buffer, sizeof latch_buffer) > 0;
 }
 
 /* Sets 'latch'.
-- 
2.20.1



Re: [ovs-dev] [PATCH v4 2/7] dpif-netdev: Make PMD auto load balance use common rxq scheduling.

2021-07-14 Thread Kevin Traynor
On 14/07/2021 10:05, Jan Scheurich wrote:
>>> In our patch series we decided to skip the check on cross-numa polling 
>>> during
>> auto-load balancing. The rationale is as follows:
>>>
>>> If the estimated PMD-rxq distribution includes cross-NUMA rxq assignments,
>> the same must apply for the current distribution, as none of the scheduling
>> algorithms would voluntarily assign rxqs across NUMA nodes. So, current and
>> estimated rxq assignments are comparable and it makes sense to consider
>> rebalancing when the variance improves.
>>>
>>> Please consider removing this check.
>>>
>>
>> The first thing is that this patch is not changing any behaviour, just re-
>> implementing to reuse the common code, so it would not be the place to
>> change this functionality.
> 
> Fair enough. We should address this in a separate patch.
> 
>> About the proposed change itself, just to be clear what is allowed 
>> currently. It
>> will allow rebalance when there are local pmds, OR there are no local pmds
>> and there is one other NUMA node with pmds available for cross-numa polling.
>>
>> The rationale of not doing a rebalance when there are no local pmds but
>> multiple other NUMAs available for cross-NUMA polling is that the estimate
>> may be incorrect due a different cross-NUMA being choosen for an Rxq than is
>> currently used.
>>
>> I thought about some things like making an Rxq sticky with a particular 
>> cross-
>> NUMA etc for this case but that brings a whole new set of problems, e.g. what
>> happens if that NUMA gets overloaded, reduced cores, how can it ever be reset
>> etc. so I decided not to pursue it as I think it is probably a corner case 
>> (at least
>> for now).
> 
> We currently don't see any scenarios with more than two NUMA nodes, but 
> different CPU/server architectures may perhaps have more NUMA nodes than CPU 
> sockets. 
> 
>> I know the case of no local pmd and one NUMA with pmds is not a corner case
>> as I'm aware of users doing that.
> 
> Agree such configurations are a must to support with auto-lb.
> 
>> We can discuss further about the multiple non-local NUMA case and maybe
>> there's some improvements we can think of, or maybe I've made some wrong
>> assumptions but it would be a follow on from the current patchset.
> 
> Our main use case for cross-NUMA balancing comes with the additional freedom 
> to allow cross-NUMA polling for selected ports that we introduce with the
> fourth patch:
> 
> dpif-netdev: Allow cross-NUMA polling on selected ports
> 
> Today dpif-netdev considers PMD threads on a non-local NUMA node for
> automatic assignment of the rxqs of a port only if there are no local,
> non-isolated PMDs.
> 
> On typical servers with both physical ports on one NUMA node, this often
> leaves the PMDs on the other NUMA node under-utilized, wasting CPU
> resources. The alternative, to manually pin the rxqs to PMDs on remote
> NUMA nodes, also has drawbacks as it limits OVS' ability to auto
> load-balance the rxqs.
> 
> This patch introduces a new interface configuration option to allow
> ports to be automatically polled by PMDs on any NUMA node:
> 
> ovs-vsctl set interface  other_config:cross-numa-polling=true
> 
> If this option is not present or set to false, legacy behaviour applies.
> 
> We indeed use this for our physical ports to be polled by non-isolated PMDs 
> on both NUMAs. The observed capacity improvement is very substantial, so we 
> plan to port this feature on top of your patches once they are merged. 
> 
> This can only fly if the auto-load balancing is allowed to activate rxq 
> assignments with cross-numa polling also in the case there are local 
> non-isolated PMDs.
> 
> Anyway, we can take this up later in our upcoming patch that introduces this 
> option.
> 

Sure. I agree there might be an opportunity for using some unused
resources there. The challenge is to have somewhat predictable estimates
etc. but yes, let's discuss later with that patch.
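The variance comparison this thread keeps referring to is straightforward to sketch. The following is a simplified integer-math model, not the actual `sched_numa_list` code (which works on measured cycle counts per PMD): the balancer computes the load variance of the current and the estimated assignments and only rebalances when the estimate is lower by the configured improvement threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the (integer) variance of per-PMD load percentages.  A lower
 * value means the load is spread more evenly across PMDs. */
static uint64_t
pmd_load_variance(const uint64_t *loads, size_t n)
{
    uint64_t sum = 0, var = 0;

    for (size_t i = 0; i < n; i++) {
        sum += loads[i];
    }

    uint64_t mean = sum / n;

    for (size_t i = 0; i < n; i++) {
        uint64_t d = loads[i] > mean ? loads[i] - mean : mean - loads[i];
        var += d * d;
    }
    return var / n;
}
```

The point raised in the thread is that this comparison is only meaningful when current and estimated assignments were produced under the same cross-NUMA rules; otherwise the dry-run estimate may pick a different remote NUMA than the one currently in use and the two variances are not comparable.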

thanks,
Kevin.

> BR, Jan
> 



Re: [ovs-dev] [PATCH v3 ovn] Don't suppress localport traffic directed to external port

2021-07-14 Thread Numan Siddique
On Tue, Jul 13, 2021 at 8:40 PM Ihar Hrachyshka  wrote:
>
> Recently, we stopped leaking localport traffic through localnet ports
> into fabric to avoid unnecessary flipping between chassis hosting the
> same localport.
>
> Despite the type name, in some scenarios localports are supposed to
> talk outside the hosting chassis. Specifically, in OpenStack [1]
> metadata service for SR-IOV ports is implemented as a localport hosted
> on another chassis that is exposed to the chassis owning the SR-IOV
> port through an "external" port. In this case, "leaking" localport
> traffic into fabric is desirable.
>
> This patch inserts a higher priority flow per external port on the
> same datapath that avoids dropping localport traffic.
>
> Fixes: 96959e56d634 ("physical: do not forward traffic from localport
> to a localnet one")
>
> [1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html
>
> Signed-off-by: Ihar Hrachyshka 

Hi Ihar,

Thanks for the patch.  There are a few memory leaks which can be easily fixed.

Please see  below for a few comments.


>
> --
>
> v1: initial version.
> v2: fixed code for unbound external ports.
> v2: rebased.
> v3: optimize external ports iteration.
> v3: rate limit error message on mac address parse failure.
> ---
>  controller/binding.c| 33 +---
>  controller/ovn-controller.c |  1 +
>  controller/ovn-controller.h |  2 ++
>  controller/physical.c   | 46 +
>  tests/ovn.at| 51 +
>  5 files changed, 130 insertions(+), 3 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 70bf13390..87195e5fc 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -108,6 +108,7 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
>      hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
>      ld->datapath = datapath;
>      ld->localnet_port = NULL;
> +    shash_init(&ld->external_ports);
>      ld->has_local_l3gateway = has_local_l3gateway;
>
>  if (tracked_datapaths) {
> @@ -474,6 +475,18 @@ is_network_plugged(const struct sbrec_port_binding *binding_rec,
>      return network ? !!shash_find_data(bridge_mappings, network) : false;
>  }
>
> +static void
> +update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
> +                         struct hmap *local_datapaths)
> +{
> +    struct local_datapath *ld = get_local_datapath(
> +        local_datapaths, binding_rec->datapath->tunnel_key);
> +    if (ld) {
> +        shash_replace(&ld->external_ports, binding_rec->logical_port,
> +                      binding_rec);
> +    }
> +}
> +
>  static void
>  update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
>  struct shash *bridge_mappings,
> @@ -1657,8 +1670,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
>          !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
>
>      struct ovs_list localnet_lports = OVS_LIST_INITIALIZER(&localnet_lports);
> +    struct ovs_list external_lports = OVS_LIST_INITIALIZER(&external_lports);
>
> -    struct localnet_lport {
> +    struct lport {
>          struct ovs_list list_node;
>          const struct sbrec_port_binding *pb;
>      };
> @@ -1713,11 +1727,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
>
>          case LP_EXTERNAL:
>              consider_external_lport(pb, b_ctx_in, b_ctx_out);
> +            struct lport *ext_lport = xmalloc(sizeof *ext_lport);
> +            ext_lport->pb = pb;
> +            ovs_list_push_back(&external_lports, &ext_lport->list_node);
>              break;
>
>          case LP_LOCALNET: {
>              consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
> -            struct localnet_lport *lnet_lport = xmalloc(sizeof *lnet_lport);
> +            struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
>              lnet_lport->pb = pb;
>              ovs_list_push_back(&localnet_lports, &lnet_lport->list_node);
>              break;
> @@ -1744,7 +1761,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> binding_ctx_out *b_ctx_out)
>  /* Run through each localnet lport list to see if it is a localnet port
>   * on local datapaths discovered from above loop, and update the
>   * corresponding local datapath accordingly. */
> -    struct localnet_lport *lnet_lport;
> +    struct lport *lnet_lport;
>      LIST_FOR_EACH_POP (lnet_lport, list_node, &localnet_lports) {
>          update_ld_localnet_port(lnet_lport->pb, &bridge_mappings,
>                                  b_ctx_out->egress_ifaces,
> @@ -1752,6 +1769,15 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
>          free(lnet_lport);
>      }
>
> +/* Run through external lport list to see if these are external ports
> + * on local datapaths discovered from above loop, and update the
> + * corresponding local datapath 

Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

2021-07-14 Thread Eli Britstein


On 7/14/2021 5:58 PM, Ferriter, Cian wrote:

External email: Use caution opening links or attachments



-Original Message-
From: Ilya Maximets 
Sent: Friday 9 July 2021 21:53
To: Ferriter, Cian ; Gaëtan Rivet ; 
Eli Britstein
; d...@openvswitch.org; Van Haaren, Harry 

Cc: Majd Dibbiny ; Ilya Maximets ; Stokes, 
Ian
; Flavio Leitner 
Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

On 7/8/21 6:43 PM, Ferriter, Cian wrote:

Hi Gaetan, Eli and all,

Thanks for the patch and the info on how it affects performance in your case. I 
just wanted to post

the performance we are seeing.

I've posted the numbers inline. Please note, I'll be away on leave till Tuesday.
Thanks,
Cian


-Original Message-
From: Gaëtan Rivet 
Sent: Wednesday 7 July 2021 17:36
To: Eli Britstein ;  
; Van Haaren,

Harry

; Ferriter, Cian 
Cc: Majd Dibbiny ; Ilya Maximets 
Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

On Wed, Jul 7, 2021, at 17:05, Eli Britstein wrote:

Port numbers are usually small. Maintain an array of netdev handles indexed
by port numbers. It accelerates looking them up for
netdev_hw_miss_packet_recover().

Reported-by: Cian Ferriter 
Signed-off-by: Eli Britstein 
Reviewed-by: Gaetan Rivet 
---
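The per-port cache idea in the commit message can be sketched as follows. Names and the cache size are illustrative assumptions, not the actual dpif-netdev API: the point is only that a flat array indexed by port number replaces a hash lookup on the fast path, with out-of-range ports falling back to the slow path.

```c
#include <assert.h>
#include <stddef.h>

/* Port numbers are usually small, so a flat array indexed by port number
 * avoids a hash lookup on the fast path.  The size is an assumption. */
#define PORT_CACHE_SIZE 1024

struct netdev;                  /* Opaque handle, as in OVS. */

struct port_cache {
    struct netdev *entries[PORT_CACHE_SIZE];
};

/* Records 'dev' for 'port_no'; ports beyond the array are simply not
 * cached and would be resolved by the slow lookup instead. */
static void
port_cache_set(struct port_cache *pc, unsigned port_no, struct netdev *dev)
{
    if (port_no < PORT_CACHE_SIZE) {
        pc->entries[port_no] = dev;
    }
}

/* Returns the cached handle, or NULL when the caller must fall back to
 * the slow (hash-based) lookup. */
static struct netdev *
port_cache_get(const struct port_cache *pc, unsigned port_no)
{
    return port_no < PORT_CACHE_SIZE ? pc->entries[port_no] : NULL;
}
```

The trade-off is a fixed, per-datapath array in exchange for an O(1) branch-plus-load on every upcall into `netdev_hw_miss_packet_recover()`.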




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Hello,

I tested the performance impact of this patch with a partial offload setup.
As reported by pmd-stats-show, in average cycles per packet:

Before vxlan-decap: 525 c/p
After vxlan-decap: 542 c/p
After this fix: 530 c/p

Without those fixes, vxlan-decap has a 3.2% negative impact on cycles;
with the fixes, the impact is reduced to 0.95%.

As I had to force partial offloads for our hardware, it would be better
with an outside confirmation on a proper setup.

Kind regards,
--
Gaetan Rivet

I'm showing the performance relative to what we measured on OVS master directly 
before the VXLAN

HWOL changes went in. All of the below results are using the scalar DPIF and 
partial HWOL.

Link to "Fixup patches": http://patchwork.ozlabs.org/project/openvswitch/list/?series=252356

Master before VXLAN HWOL changes (f0e4a73)
1.000x

Latest master after VXLAN HWOL changes (b780911)
0.918x (-8.2%)

After fixup patches on OVS ML are applied (with ALLOW_EXPERIMENTAL_API=off)
0.973x (-2.7%)

After fixup patches on OVS ML are applied and after ALLOW_EXPERIMENTAL_API is 
removed.
0.938x (-6.2%)

I ran the last set of results by applying the below diff. I did this because 
I'm assuming the plan

is to remove the ALLOW_EXPERIMENTAL_API '#ifdef's at some point?

Yes, that is the plan.


Thanks for confirming this.


And thanks for testing, Gaetan and Cian!

Could you also provide more details on your test environment,
so someone else can reproduce?


Good idea, I'll add the details inline below. These details apply to the 
performance measured previously by me, and the performance in this mail.


What is important to know:
- Test configuration: P2P, V2V, PVP, etc.


P2P
1 PHY port
1 RXQ


- Test type: max. throughput, zero packet loss.

Max throughput.


- OVS config: EMC, SMC, HWOL, AVX512 - on/off/type

In all tests, all packets hit a single datapath flow with "offloaded:partial". 
So all packets are partially offloaded, skipping miniflow_extract() and EMC/SMC/DPCLS 
lookups.

AVX512 is off.


- Installed OF rules.

$ $OVS_DIR/utilities/ovs-ofctl dump-flows br0
  cookie=0x0, duration=253.691s, table=0, n_packets=2993867136, n_bytes=179632028160, in_port=phy0 actions=IN_PORT


- Traffic pattern: Packet size, number of flows, packet type.

64B, 1 flow, ETH/IP packets.


This tests also didn't include the fix from Balazs, IIUC, because
they were performed a bit before that patch got accepted.


Correct, the above tests didn't include the optimization from Balazs.


And Flavio reported what seems to be noticeable performance
drop due to just accepted AVX512 DPIF implementation for the
non-HWOL non-AVX512 setup:
   

Re: [ovs-dev] [v12 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Eelco Chaudron


On 14 Jul 2021, at 16:14, kumar Amber wrote:

> From: Kumar Amber 
>
> This commit introduces an additional command line parameter
> for the mfex study function. If the user provides an additional packet
> count, it is used in study as the minimum number of packets which must
> be processed, else a default value is chosen.
> Also introduces a third parameter for choosing a particular pmd core.
>
> $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
>
> Signed-off-by: Kumar Amber 
>
> ---
> v12:
> - re-work the set command to sweep
> - include fixes to study.c and doc changes
> v11:
> - include comments from Eelco
> - reworked set command as per discussion
> v10:
> - fix review comments Eelco
> v9:
> - fix review comments Flavio
> v7:
> - change the command paramters for core_id and study_pkt_cnt
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - introduce pmd core id parameter
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  38 +-
>  lib/dpif-netdev-extract-study.c  |  30 -
>  lib/dpif-netdev-private-extract.h|   9 ++
>  lib/dpif-netdev.c| 178 ++-
>  4 files changed, 222 insertions(+), 33 deletions(-)
>
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index a47153495..8c500c504 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -284,12 +284,46 @@ command also shows whether the CPU supports each 
> implementation ::
>
>  An implementation can be selected manually by the following command ::
>
> -$ ovs-appctl dpif-netdev/miniflow-parser-set study
> +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
> + [study_cnt]
> +
> +The above command has two optional parameters: study_cnt and core_id.
> +The core_id sets a particular miniflow extract function to a specific
> +pmd thread on the core. The third parameter study_cnt, which is specific
> +to study and ignored by other implementations, specifies how many packets
> +are needed to choose the best implementation.
>
>  Also user can select the study implementation which studies the traffic for
>  a specific number of packets by applying all available implementations of
>  miniflow extract and then chooses the one with the most optimal result for
> -that traffic pattern.
> +that traffic pattern. The user can optionally provide a packet count
> +[study_cnt] parameter which is the minimum number of packets that OVS must
> +study before choosing an optimal implementation. If no packet count is
> +provided, then the default value, 128 is chosen. Also, as there is no
> +synchronization point between threads, one PMD thread might still be running
> +a previous round, and can now decide on earlier data.
> +
> +The per packet count is a global value, and parallel study executions with
> +differing packet counts will use the most recent count value provided by user.
> +
> +Study can be selected with packet count by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> +
> +Study can be selected with packet count and explicit PMD selection
> +by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
> +
> +In the above command the first parameter is the CORE ID of the PMD
> +thread and this can also be used to explicitly set the miniflow
> +extraction function pointer on different PMD threads.
> +
> +Scalar can be selected on core 3 by the following command where
> +study count should not be provided for any implementation other
> +than study ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
>
>  Miniflow Extract Validation
>  ~~~
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> index 02b709f8b..083f940c2 100644
> --- a/lib/dpif-netdev-extract-study.c
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -25,8 +25,7 @@
>
>  VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
>
> -static atomic_uint32_t mfex_study_pkts_count = 0;
> -
> +static atomic_uint32_t  mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;

You still did not remove the space you introduced (see last couple of patches 
with the same request)?

You accidentally removed the new line here.

>  /* Struct to hold miniflow study stats. */
>  struct study_stats {
>  uint32_t pkt_count;
> @@ -48,6 +47,28 @@ mfex_study_get_study_stats_ptr(void)
>  return stats;
>  }
>
> +int
> +mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
> +{
> +struct dpif_miniflow_extract_impl *miniflow_funcs;
> +miniflow_funcs = dpif_mfex_impl_info_get();
> +
> +/* If the implementation requested is study, set the packet counter to
> + * the requested number (or to the default if none was given); otherwise
> + * return -EINVAL.
> + */
> +if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) &&
> + 

Re: [ovs-dev] [PATCH] [branch-2.13] conntrack: Document all-zero IP SNAT behavior and add a test case.

2021-07-14 Thread 0-day Robot
Bleep bloop.  Greetings Eelco Chaudron, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.
Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v12 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-14 Thread Eelco Chaudron



On 14 Jul 2021, at 16:14, kumar Amber wrote:

> From: Harry van Haaren 
>
> This commit adds AVX512 implementations of miniflow extract.
> By using the 64 bytes available in an AVX512 register, it is
> possible to convert a packet to a miniflow data-structure in
> a small number of instructions.
>
> The implementation here probes for Ether()/IP()/UDP() traffic,
> and builds the appropriate miniflow data-structure for packets
> that match the probe.
>
> The implementation here is auto-validated by the miniflow
> extract autovalidator, hence its correctness can be easily
> tested and verified.
>
> Note that this commit is designed to easily allow addition of new
> traffic profiles in a scalable way, without code duplication for
> each traffic profile.
>
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 

ACKing again as the code has changed in this revision. I think the official 
approach should be to remove all previous acks unless it was agreed upon to 
keep them.

Acked-by: Eelco Chaudron 

> ---
> v12:
> - fix one static code warning
> v9:
> - include comments from flavio
> v8:
> - include documentation on AVX512 MFEX as per Eelco's suggestion
> v7:
> - fix minor review sentences (Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - include assert for flow ABI change
> - include assert for offset changes
> ---
> ---
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev-extract-avx512.c  | 478 ++
>  lib/dpif-netdev-private-extract.c |  13 +
>  lib/dpif-netdev-private-extract.h |  30 ++
>  4 files changed, 522 insertions(+)
>  create mode 100644 lib/dpif-netdev-extract-avx512.c
>
> diff --git a/lib/automake.mk b/lib/automake.mk
> index f4f36325e..299f81939 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
>   $(AM_CFLAGS)
>  lib_libopenvswitchavx512_la_SOURCES = \
>   lib/dpif-netdev-lookup-avx512-gather.c \
> + lib/dpif-netdev-extract-avx512.c \
>   lib/dpif-netdev-avx512.c
>  lib_libopenvswitchavx512_la_LDFLAGS = \
>   -static
> diff --git a/lib/dpif-netdev-extract-avx512.c 
> b/lib/dpif-netdev-extract-avx512.c
> new file mode 100644
> index 0..ac253fa1e
> --- /dev/null
> +++ b/lib/dpif-netdev-extract-avx512.c
> @@ -0,0 +1,478 @@
> +/*
> + * Copyright (c) 2021 Intel.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +/*
> + * AVX512 Miniflow Extract.
> + *
> + * This file contains optimized implementations of miniflow_extract()
> + * for specific common traffic patterns. The optimizations allow for
> + * quick probing of a specific packet type, and if a match with a specific
> + * type is found, a shuffle like procedure builds up the required miniflow.
> + *
> + * Process
> + * -------
> + *
> + * The procedure is to classify the packet based on the traffic type
> + * using predefined bit-masks and arrange the packet header data using
> + * shuffle instructions to a pre-defined place as required by the miniflow.
> + * This eliminates the if-else ladder used to identify the packet data and
> + * add data for each protocol that is present.
> + */
> +
> +#ifdef __x86_64__
> +/* Sparse cannot handle the AVX512 instructions. */
> +#if !defined(__CHECKER__)
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "flow.h"
> +#include "dpdk.h"
> +
> +#include "dpif-netdev-private-dpcls.h"
> +#include "dpif-netdev-private-extract.h"
> +#include "dpif-netdev-private-flow.h"
> +
> +/* AVX512-BW level permutex2var_epi8 emulation. */
> +static inline __m512i
> +__attribute__((target("avx512bw")))
> +_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
> +   __m512i v_data_0,
> +   __m512i v_shuf_idxs,
> +   __m512i v_data_1)
> +{
> +/* Manipulate shuffle indexes for u16 size. */
> +__mmask64 k_mask_odd_lanes = 0x;
> +/* Clear away ODD lane bytes. Cannot be done above due to no u8 shift. */
> +__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
> +v_shuf_idxs,
> +_mm512_setzero_si512());
> +v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
> +
> +__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
> +
> +/* Shuffle each half at 

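The classify-and-shuffle procedure described in the patch comment above (probe with a bit-mask, then gather header bytes into miniflow positions, zeroing unused lanes) can be sketched byte-wise in Python. This is not vectorized and the constants are invented for the example, not the actual patterns from the patch:

```python
NU = None  # "not used": the k-mask zeroes these result bytes.

def classify(pkt, mask, pattern):
    """Probe: the packet matches the traffic profile if (pkt & mask) == pattern."""
    return all((b & m) == p for b, m, p in zip(pkt, mask, pattern))

def shuffle(pkt, idxs):
    """Build the miniflow buffer by gathering packet bytes per index,
    emitting zero where the index is NU (masked out)."""
    return bytes(pkt[i] if i is not None else 0 for i in idxs)

# Toy 4-byte "packet" and profile: first byte must be 0x08.
pkt = bytes([0x08, 0xAA, 0xBB, 0xCC])
assert classify(pkt, mask=[0xFF, 0, 0, 0], pattern=[0x08, 0, 0, 0])
out = shuffle(pkt, [3, 2, NU, 0])
```

The AVX512 code performs the same two steps with a single compare-under-mask for classification and a permutex/shuffle instruction for the gather, which is what removes the if-else ladder.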
Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

2021-07-14 Thread Ferriter, Cian


> -Original Message-
> From: Ilya Maximets 
> Sent: Friday 9 July 2021 21:53
> To: Ferriter, Cian ; Gaëtan Rivet ; 
> Eli Britstein
> ; d...@openvswitch.org; Van Haaren, Harry 
> 
> Cc: Majd Dibbiny ; Ilya Maximets ; 
> Stokes, Ian
> ; Flavio Leitner 
> Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache
> 
> On 7/8/21 6:43 PM, Ferriter, Cian wrote:
> > Hi Gaetan, Eli and all,
> >
> > Thanks for the patch and the info on how it affects performance in your 
> > case. I just wanted to post
> the performance we are seeing.
> >
> > I've posted the numbers inline. Please note, I'll be away on leave till 
> > Tuesday.
> > Thanks,
> > Cian
> >
> >> -Original Message-
> >> From: Gaëtan Rivet 
> >> Sent: Wednesday 7 July 2021 17:36
> >> To: Eli Britstein ;  
> >> ; Van Haaren,
> Harry
> >> ; Ferriter, Cian 
> >> Cc: Majd Dibbiny ; Ilya Maximets 
> >> Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array 
> >> cache
> >>
> >> On Wed, Jul 7, 2021, at 17:05, Eli Britstein wrote:
> >>> Port numbers are usually small. Maintain an array of netdev handles 
> >>> indexed
> >>> by port numbers. It accelerates looking up for them for
> >>> netdev_hw_miss_packet_recover().
> >>>
> >>> Reported-by: Cian Ferriter 
> >>> Signed-off-by: Eli Britstein 
> >>> Reviewed-by: Gaetan Rivet 
> >>> ---
> >
> > 
> >
> >>> ___
> >>> dev mailing list
> >>> d...@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>
> >>
> >> Hello,
> >>
> >> I tested the performance impact of this patch with a partial offload setup.
> >> As reported by pmd-stats-show, in average cycles per packet:
> >>
> >> Before vxlan-decap: 525 c/p
> >> After vxlan-decap: 542 c/p
> >> After this fix: 530 c/p
> >>
> >> Without those fixes, vxlan-decap has a 3.2% negative impact on cycles,
> >> with the fixes, the impact is reduced to 0.95%.
> >>
> >> As I had to force partial offloads for our hardware, it would be better
> >> with an outside confirmation on a proper setup.
> >>
> >> Kind regards,
> >> --
> >> Gaetan Rivet
> >
> > I'm showing the performance relative to what we measured on OVS master 
> > directly before the VXLAN
> HWOL changes went in. All of the below results are using the scalar DPIF and 
> partial HWOL.
> >
> > Link to "Fixup patches": 
> > http://patchwork.ozlabs.org/project/openvswitch/list/?series=252356
> >
> > Master before VXLAN HWOL changes (f0e4a73)
> > 1.000x
> >
> > Latest master after VXLAN HWOL changes (b780911)
> > 0.918x (-8.2%)
> >
> > After fixup patches on OVS ML are applied (with ALLOW_EXPERIMENTAL_API=off)
> > 0.973x (-2.7%)
> >
> > After fixup patches on OVS ML are applied and after ALLOW_EXPERIMENTAL_API 
> > is removed.
> > 0.938x (-6.2%)
> >
> > I ran the last set of results by applying the below diff. I did this 
> > because I'm assuming the plan
> is to remove the ALLOW_EXPERIMENTAL_API '#ifdef's at some point?
> 
> Yes, that is the plan.
> 

Thanks for confirming this.

> And thanks for testing, Gaetan and Cian!
> 
> Could you also provide more details on your test environment,
> so someone else can reproduce?
> 

Good idea, I'll add the details inline below. These details apply to the 
performance measured previously by me, and the performance in this mail.

> What is important to know:
> - Test configuration: P2P, V2V, PVP, etc.


P2P
1 PHY port
1 RXQ

> - Test type: max. throughput, zero packet loss.

Max throughput.

> - OVS config: EMC, SMC, HWOL, AVX512 - on/off/type

In all tests, all packets hit a single datapath flow with "offloaded:partial". 
So all packets are partially offloaded, skipping miniflow_extract() and 
EMC/SMC/DPCLS lookups.

AVX512 is off.

> - Installed OF rules.

$ $OVS_DIR/utilities/ovs-ofctl dump-flows br0
 cookie=0x0, duration=253.691s, table=0, n_packets=2993867136, 
n_bytes=179632028160, in_port=phy0 actions=IN_PORT

> - Traffic pattern: Packet size, number of flows, packet type.

64B, 1 flow, ETH/IP packets.

> 
> This tests also didn't include the fix from Balazs, IIUC, because
> they were performed a bit before that patch got accepted.
> 

Correct, the above tests didn't include the optimization from Balazs.

> And Flavio reported what seems to be noticeable performance
> drop due to just accepted AVX512 DPIF implementation for the
> non-HWOL non-AVX512 setup:
>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385448.html
> 

We are testing partial HWOL setups, so I don't think Flavio's mail is relevant 
to this. 

> So, it seems that everything will need to be re-tested anyway
> in order to understand what is the current situation.
> 

Agreed, let's retest the performance. I've included the new numbers below:

I'm showing the performance relative to what we measured on OVS master directly 
before the VXLAN HWOL changes went in. All of the below results are using the 
scalar DPIF and partial HWOL.
Link to "Fixup patches" (v2): 

Re: [ovs-dev] [v12 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-14 Thread Eelco Chaudron



On 14 Jul 2021, at 16:14, kumar Amber wrote:


From: Kumar Amber 

Tests:
  6: OVS-DPDK - MFEX Autovalidator
  7: OVS-DPDK - MFEX Autovalidator Fuzzy

Added a new directory to store the PCAP file used
in the tests and a script to generate the fuzzy traffic
type pcap to be used in fuzzy unit test.

Signed-off-by: Kumar Amber 
Acked-by: Flavio Leitner 
---
v12:
- change skip parameter for unit test
v11:
- fix comments from Eelco
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove sleep from first test and added minor 5 sec sleep to fuzzy
---
---
 Documentation/topics/dpdk/bridge.rst |  48 
 tests/.gitignore |   1 +
 tests/automake.mk|   6 +++
 tests/mfex_fuzzy.py  |  33 +
 tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
 tests/system-dpdk.at |  53 
+++

 6 files changed, 141 insertions(+)
 create mode 100755 tests/mfex_fuzzy.py
 create mode 100644 tests/pcap/mfex_test.pcap

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst

index 8c500c504..2f065422b 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -346,3 +346,51 @@ A compile time option is available in order to 
test it with the OVS unit

 test suite. Use the following configure option ::

 $ ./configure --enable-mfex-default-autovalidator
+
+Unit Test Miniflow Extract
+++++++++++++++++++++++++++
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator
+
+The unit test uses multiple traffic types to test the correctness of the
+implementations.
+
+Running Fuzzy test with Autovalidator
++++++++++++++++++++++++++++++++++++++
+
+Fuzzy tests can also be done on miniflow extract with the help of
+auto-validator and Scapy. The steps below describe how to
+reproduce the setup with the IP layer being fuzzed to generate packets.
+
+Scapy is used to create fuzzy IP packets and save them into a PCAP ::
+
+pkt = fuzz(Ether()/IP()/TCP())
+
+Set the miniflow extract to autovalidator using ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+OVS is configured to receive the generated packets ::
+
+$ ovs-vsctl add-port br0 pcap0 -- \
+set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
+"rx_pcap=fuzzy.pcap"
+
+With this workflow, the autovalidator will ensure that all MFEX
+implementations are classifying each packet in exactly the same way.
+If an optimized MFEX implementation causes a different miniflow to be
+generated, the autovalidator has ovs_assert and logging statements that
+will inform about the issue.
+
+Unit Fuzzy test with Autovalidator
+++++++++++++++++++++++++++++++++++
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator Fuzzy
diff --git a/tests/.gitignore b/tests/.gitignore
index 45b4f67b2..a3d927e5d 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -11,6 +11,7 @@
 /ovsdb-cluster-testsuite
 /ovsdb-cluster-testsuite.dir/
 /ovsdb-cluster-testsuite.log
+/pcap/
 /pki/
 /system-afxdp-testsuite
 /system-afxdp-testsuite.dir/
diff --git a/tests/automake.mk b/tests/automake.mk
index f45f8d76c..a6c15ba55 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: 
tests/automake.mk

echo "TEST_FUZZ_REGRESSION([$$basename])"; \
done > $@.tmp && mv $@.tmp $@

+EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
+MFEX_AUTOVALIDATOR_TESTS = \
+   tests/pcap/mfex_test.pcap \
+   tests/mfex_fuzzy.py
+
 OVSDB_CLUSTER_TESTSUITE_AT = \
tests/ovsdb-cluster-testsuite.at \
tests/ovsdb-execution.at \
@@ -512,6 +517,7 @@ tests_test_type_props_SOURCES = 
tests/test-type-props.c

 CHECK_PYFILES = \
tests/appctl.py \
tests/flowgen.py \
+   tests/mfex_fuzzy.py \
tests/ovsdb-monitor-sort.py \
tests/test-daemon.py \
tests/test-json.py \
diff --git a/tests/mfex_fuzzy.py b/tests/mfex_fuzzy.py
new file mode 100755
index 0..5b056bb48
--- /dev/null
+++ b/tests/mfex_fuzzy.py
@@ -0,0 +1,33 @@
+#!/usr/bin/python3
+try:
+from scapy.all import RandMAC, RandIP, PcapWriter, RandIP6, RandShort, fuzz
+from scapy.all import IPv6, Dot1Q, IP, Ether, UDP, TCP
+except ModuleNotFoundError as err:
+print(str(err) + ": Scapy")
+import sys
+
+path = str(sys.argv[1]) + "/pcap/fuzzy.pcap"
+pktdump = PcapWriter(path, append=False, sync=True)
+
+for i in range(0, 2000):
+
+# Generate random protocol bases, use a fuzz() over the combined packet
+# for full fuzzing.
+eth = Ether(src=RandMAC(), dst=RandMAC())

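The autovalidator workflow described in the documentation hunk above amounts to running every MFEX implementation on the same packet and asserting the resulting miniflows are identical. A minimal Python sketch of that pattern (function names are illustrative, not the OVS API):

```python
def scalar_mfex(pkt):
    # Reference implementation: for this sketch, the "miniflow" is just
    # the first 4 bytes of the packet.
    return bytes(pkt[:4])

def optimized_mfex(pkt):
    # Candidate implementation under test.
    return bytes(pkt[:4])

def autovalidate(pkt, impls, reference):
    """Run each implementation and compare against the reference result,
    mirroring the ovs_assert + logging behavior described above."""
    expected = reference(pkt)
    for name, impl in impls.items():
        got = impl(pkt)
        assert got == expected, f"{name} diverged: {got!r} != {expected!r}"
    return expected

pkt = bytes(range(16))
result = autovalidate(pkt, {"opt": optimized_mfex}, scalar_mfex)
```

Feeding fuzzed packets (as the Scapy script does) through such a comparator is what gives confidence that an optimized implementation classifies every packet exactly like the scalar reference.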
Re: [ovs-dev] [PATCH v5] conntrack: document all-zero IP SNAT behavior and add a test case

2021-07-14 Thread Eelco Chaudron


On 9 Jul 2021, at 16:07, Ilya Maximets wrote:

> On 7/9/21 10:29 AM, Eelco Chaudron wrote:
>>
>>
>> On 8 Jul 2021, at 21:23, Ilya Maximets wrote:
>>
>>> On 6/10/21 11:24 AM, Eelco Chaudron wrote:
 Currently, conntrack in the kernel has an undocumented feature referred
 to as all-zero IP address SNAT. Basically, when a source port
 collision is detected during the commit, the source port will be
 translated to an ephemeral port. If there is no collision, no SNAT is
 performed.

 This patchset documents this behavior and adds a self-test to verify
 it's not changing. In addition, a datapath feature flag is added for
 the all-zero IP SNAT case. This will help applications on top of OVS,
 like OVN, to determine this feature can be used.

 Signed-off-by: Eelco Chaudron 
 Acked-by: Aaron Conole 
 Acked-by: Dumitru Ceara 
 ---

 v5: Windows datapath does not support all-zero SNAT, add checks.
 v4: Added datapath support flag for all-zero SNAT.
 v3: Renamed NULL SNAT to all-zero IP SNAT.
 v2: Fixed NULL SNAT to only work in the -rpl state to be inline with
 OpenShift-SDN's behavior.
>>>
>>>
>>> Thanks, everyone!  I added a NEWS entry and applied to master.
>>
>> Can we also backport this patch? It’s not adding any new features, just the 
>> datapath support flag, and a unit test?
>
> OK.  That makes sense, since it's not really a new feature, but
> a documentation for an always existed behavior.
>
> I backported it to 2.15.  2.13 has some conflicts, if you think
> that it's needed there, please, send a backport with branch-2.13
> subject prefix.
>

Just now sent you a patch that will apply to 2.13. Let me know if that would be 
enough?

//Eelco



[ovs-dev] [PATCH] [branch-2.13] conntrack: Document all-zero IP SNAT behavior and add a test case.

2021-07-14 Thread Eelco Chaudron
Currently, conntrack in the kernel has an undocumented feature referred
to as all-zero IP address SNAT. Basically, when a source port
collision is detected during the commit, the source port will be
translated to an ephemeral port. If there is no collision, no SNAT is
performed.

This patchset documents this behavior and adds a self-test to verify
it's not changing. In addition, a datapath feature flag is added for
the all-zero IP SNAT case. This will help applications on top of OVS,
like OVN, to determine this feature can be used.
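The all-zero IP SNAT behavior described above can be modeled as: keep the source tuple unchanged unless it collides with an existing connection, in which case only the source port is rewritten to an ephemeral one. A hedged Python sketch of that semantics (greatly simplified, not the kernel's algorithm; the ephemeral range start is illustrative):

```python
EPHEMERAL_START = 32768  # illustrative start of the ephemeral port range

def commit(conn, table):
    """Commit a connection with all-zero SNAT semantics: no collision means
    no translation; on collision, pick a free ephemeral source port."""
    src_ip, src_port, dst = conn
    if (src_ip, src_port, dst) not in table:
        table.add((src_ip, src_port, dst))
        return src_port  # no SNAT performed
    port = EPHEMERAL_START
    while (src_ip, port, dst) in table:
        port += 1
    table.add((src_ip, port, dst))
    return port  # source port translated to an ephemeral port

table = set()
p1 = commit(("10.0.0.1", 5000, "10.0.0.2:80"), table)
p2 = commit(("10.0.0.1", 5000, "10.0.0.2:80"), table)
```

The first commit keeps port 5000 untouched; the second, colliding commit gets an ephemeral port, which is the behavior the patch documents and tests.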

Signed-off-by: Eelco Chaudron 
---
 NEWS |6 +
 lib/ct-dpif.c|8 +++
 lib/ct-dpif.h|6 +
 lib/dpif-netdev.c|1 +
 lib/dpif-netlink.c   |   15 
 lib/dpif-provider.h  |5 
 lib/ovs-actions.xml  |   10 
 ofproto/ofproto-dpif.c   |   20 +
 ofproto/ofproto-dpif.h   |5 +++-
 tests/system-kmod-macros.at  |   11 +
 tests/system-traffic.at  |   46 ++
 tests/system-userspace-macros.at |   10 
 vswitchd/vswitch.xml |9 +++
 13 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index fea854d78..5dbde2a63 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,9 @@
+Post- v2.13.4
+-
+   - OVS now reports the datapath capability 'ct_zero_snat', which reflects
+ whether the SNAT with all-zero IP address is supported.
+ See ovs-vswitchd.conf.db(5) for details.
+
 v2.13.4 - 01 Jul 2021
 -
- Bug fixes
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index 8c2480e7a..880352376 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -889,3 +889,11 @@ ct_dpif_get_timeout_policy_name(struct dpif *dpif, 
uint32_t tp_id,
 dpif, tp_id, dl_type, nw_proto, tp_name, is_generic)
 : EOPNOTSUPP);
 }
+
+int
+ct_dpif_get_features(struct dpif *dpif, enum ct_features *features)
+{
+return (dpif->dpif_class->ct_get_features
+? dpif->dpif_class->ct_get_features(dpif, features)
+: EOPNOTSUPP);
+}
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index 3e227d9e3..ebd9ac9be 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -269,6 +269,11 @@ struct ct_dpif_timeout_policy {
  * timeout attribute values */
 };
 
+/* Conntrack Features. */
+enum ct_features {
+CONNTRACK_F_ZERO_SNAT = 1 << 0,  /* All-zero SNAT support. */
+};
+
 int ct_dpif_dump_start(struct dpif *, struct ct_dpif_dump_state **,
const uint16_t *zone, int *);
 int ct_dpif_dump_next(struct ct_dpif_dump_state *, struct ct_dpif_entry *);
@@ -323,5 +328,6 @@ int ct_dpif_timeout_policy_dump_done(struct dpif *dpif, 
void *state);
 int ct_dpif_get_timeout_policy_name(struct dpif *dpif, uint32_t tp_id,
 uint16_t dl_type, uint8_t nw_proto,
 char **tp_name, bool *is_generic);
+int ct_dpif_get_features(struct dpif *dpif, enum ct_features *features);
 
 #endif /* CT_DPIF_H */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a73c4469d..9e0ba4fc4 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7910,6 +7910,7 @@ const struct dpif_class dpif_netdev_class = {
 NULL,   /* ct_timeout_policy_dump_next */
 NULL,   /* ct_timeout_policy_dump_done */
 NULL,   /* ct_get_timeout_policy_name */
+NULL,   /* ct_get_features */
 dpif_netdev_ipf_set_enabled,
 dpif_netdev_ipf_set_min_frag,
 dpif_netdev_ipf_set_max_nfrags,
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 0b40bb083..95507f395 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -3150,6 +3150,20 @@ dpif_netlink_ct_get_timeout_policy_name(struct dpif 
*dpif OVS_UNUSED,
 return 0;
 }
 
+static int
+dpif_netlink_ct_get_features(struct dpif *dpif OVS_UNUSED,
+ enum ct_features *features)
+{
+if (features != NULL) {
+#ifndef _WIN32
+*features = CONNTRACK_F_ZERO_SNAT;
+#else
+*features = 0;
+#endif
+}
+return 0;
+}
+
 #define CT_DPIF_NL_TP_TCP_MAPPINGS  \
 CT_DPIF_NL_TP_MAPPING(TCP, TCP, SYN_SENT, SYN_SENT) \
 CT_DPIF_NL_TP_MAPPING(TCP, TCP, SYN_RECV, SYN_RECV) \
@@ -3992,6 +4006,7 @@ const struct dpif_class dpif_netlink_class = {
 dpif_netlink_ct_timeout_policy_dump_next,
 dpif_netlink_ct_timeout_policy_dump_done,
 dpif_netlink_ct_get_timeout_policy_name,
+dpif_netlink_ct_get_features,
 NULL,   /* ipf_set_enabled */
 NULL,   /* ipf_set_min_frag */
 NULL,   /* ipf_set_max_nfrags */
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index b77317bca..9941c9ba8 

Re: [ovs-dev] [v10 06/12] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Amber, Kumar
Hi Eelco,

The requested changes for the rework of MFEX-set commands are now available in 
v12 

Br Amber

> -Original Message-
> From: Eelco Chaudron 
> Sent: Wednesday, July 14, 2021 4:04 PM
> To: Van Haaren, Harry 
> Cc: Amber, Kumar ; ovs-dev@openvswitch.org;
> f...@sysclose.org; i.maxim...@ovn.org; Ferriter, Cian
> ; Stokes, Ian 
> Subject: Re: [v10 06/12] dpif-netdev: Add packet count and core id paramters
> for study
> 
> 
> 
> On 13 Jul 2021, at 18:11, Van Haaren, Harry wrote:
> 
> >> -Original Message-
> >> From: Eelco Chaudron 
> >> Sent: Tuesday, July 13, 2021 3:26 PM
> >> To: Van Haaren, Harry 
> >> Cc: Amber, Kumar ; ovs-dev@openvswitch.org;
> >> f...@sysclose.org; i.maxim...@ovn.org; Ferriter, Cian
> >> ; Stokes, Ian 
> >> Subject: Re: [v10 06/12] dpif-netdev: Add packet count and core id
> >> paramters for study
> >>
> >>
> >>
> >> On 13 Jul 2021, at 16:09, Van Haaren, Harry wrote:
> >>
> >>> (Off Topic; Eelco, I think your email client is sending HTML
> >>> replies, perhaps
> >> check settings to disable?)
> >>>
> >>> Eelco Wrote;
>   If you ask for a specific existing pmd to run a specific
>  implementation, it
> >> should NOT affect any existing or newly created pmd.
> >>>
> >>> OK, thanks for explaining it clearly with examples below!
> >>>
> >>> In general the behavior below is OK, except that we set the default
> >>> when a
> >> specific PMD is requested.
> >>> As a solution, we can set the default behavior only when "-pmd" is
> >>> NOT
> >> provided.
> >>>
> >>> 1) "all pmds" (no args) mode: we set the default, update all current
> >>> PMD
> >> threads, and will update future PMD threads via default.
> >>> 2) "-pmd " mode: we set only for that PMD thread, and if the
> >>> PMD
> >> thread isn't active, existing code will provide good error to user.
> >>>
> >>> Does that seem a pragmatic solution? -Harry
> >>
> >> Yes, this is exactly what I had in mind when I added the comment.
> >
> > Ah great, good stuff.
> >
> > I've looked at the enabling code & patchset, the command parsing is getting
> complex.
> > I tend to agree with Eelco's review, that a single sweep of the args
> > would be a better and more readable implementation, however today that
> > command is built in multiple patchs, and each change would cause a
> > rebase-conflict, so rework would cost more time than we would like,
> particularly given the upcoming deadlines.
> >
> > Here a suggestion to pragmatically move forward:
> > - Merge the code approximately as in this patchset, but with the 1) and 2)
> suggestions applied above to fixup functional correctness.
> > Amber has a v11 almost ready to go that addresses the main issues.
> >
> > - Commit to reworking the command in a follow-up patch next week. This
> refactor would use the "sweep argc" method,
> >and will continue to have the same user-visible 
> > functionality/error-messages,
> just a cleaner implementation.
> >(This avoids the rebase-heavy workflow of refactoring now, as the
> > command is enabled over multiple commits.)
> >
> > Hope that's a workable solution for all?
> 
> Just reviewed the v11, and I agree it looks even messier. I think it needs
> a proper rewrite and, contrary to your statement above, all changes are
> concentrated in patch 6, so I think it needs to be done in the v12 as we are
> almost there.
> 
> Cheers,
> 
> Eelco



[ovs-dev] [v12 11/11] dpif-netdev/mfex: add more AVX512 traffic profiles

2021-07-14 Thread kumar Amber
From: Harry van Haaren 

This commit adds 3 new traffic profile implementations to the
existing avx512 miniflow extract infrastructure. The profiles added are:
- Ether()/IP()/TCP()
- Ether()/Dot1Q()/IP()/UDP()
- Ether()/Dot1Q()/IP()/TCP()

The design of the avx512 code here is for scalability to add more
traffic profiles, as well as enabling CPU ISA. Note that an implementation
is primarily adding static const data, which the compiler then specializes
away when the profile specific function is declared below.

As a result, the code is relatively maintainable, and scalable for new
traffic profiles as well as new ISA, and does not lower performance
compared with manually written code for each profile/ISA.

Note that confidence in the correctness of each implementation is
achieved through autovalidation, unit tests with known packets, and
fuzz tested packets.

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---

Hi Readers,

If you have a traffic profile you'd like to see accelerated using
avx512 code, please send me an email and we can collaborate on adding
support for it!

Regards, -Harry

---

v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 NEWS  |   2 +
 lib/dpif-netdev-extract-avx512.c  | 152 ++
 lib/dpif-netdev-private-extract.c |  30 ++
 lib/dpif-netdev-private-extract.h |  10 ++
 4 files changed, 194 insertions(+)

diff --git a/NEWS b/NEWS
index 99baca706..ea55805e8 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,8 @@ Post-v2.15.0
 * Add build time configure command to enable auto-validator as default
miniflow implementation at build time.
  * Cache results for CPU ISA checks, reduces overhead on repeated lookups.
+ * Add AVX512 based optimized miniflow extract function for traffic type
+   IPv4/UDP, IPv4/TCP, Vlan/IPv4/UDP and Vlan/IPv4/TCP.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
index ac253fa1e..ec64419e3 100644
--- a/lib/dpif-netdev-extract-avx512.c
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -136,6 +136,13 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
 
 #define PATTERN_ETHERTYPE_MASK PATTERN_ETHERTYPE_GEN(0xFF, 0xFF)
 #define PATTERN_ETHERTYPE_IPV4 PATTERN_ETHERTYPE_GEN(0x08, 0x00)
+#define PATTERN_ETHERTYPE_DT1Q PATTERN_ETHERTYPE_GEN(0x81, 0x00)
+
+/* VLAN (Dot1Q) patterns and masks. */
+#define PATTERN_DT1Q_MASK   \
+  0x00, 0x00, 0xFF, 0xFF,
+#define PATTERN_DT1Q_IPV4   \
+  0x00, 0x00, 0x08, 0x00,
 
 /* Generator for checking IPv4 ver, ihl, and proto */
 #define PATTERN_IPV4_GEN(VER_IHL, FLAG_OFF_B0, FLAG_OFF_B1, PROTO) \
@@ -161,6 +168,29 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
   34, 35, 36, 37, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* UDP */   \
   NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
+/* TCP shuffle: tcp_ctl bits require mask/processing, not included here. */
+#define PATTERN_IPV4_TCP_SHUFFLE \
+   0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, NU, NU, /* Ether */ \
+  26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 34, 35, 36, 37, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
+
+#define PATTERN_DT1Q_IPV4_UDP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  38, 39, 40, 41, NU, NU, NU, NU, /* UDP */
+
+#define PATTERN_DT1Q_IPV4_TCP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 38, 39, 40, 41, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
 /* Generation of K-mask bitmask values, to zero out data in result. Note that
  * these correspond 1:1 to the above "*_SHUFFLE" values, and bit used must be
@@ -170,12 +200,22 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
  * 

[ovs-dev] [v12 09/11] dpdk: add additional CPU ISA detection strings

2021-07-14 Thread kumar Amber
From: Harry van Haaren 

This commit enables OVS to at runtime check for more detailed
AVX512 capabilities, specifically Byte and Word (BW) extensions,
and Vector Bit Manipulation Instructions (VBMI).

These instructions will be used in the CPU ISA optimized
implementations of traffic profile aware miniflow extract.

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---
 NEWS   | 1 +
 lib/dpdk.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/NEWS b/NEWS
index 225eb445c..99baca706 100644
--- a/NEWS
+++ b/NEWS
@@ -40,6 +40,7 @@ Post-v2.15.0
traffic.
 * Add build time configure command to enable auto-validator as default
miniflow implementation at build time.
+ * Cache results for CPU ISA checks, reduces overhead on repeated lookups.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 9de2af58e..1b8f8e55b 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -706,6 +706,8 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature)
 #if __x86_64__
 /* CPU flags only defined for the architecture that support it. */
 CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F);
+CHECK_CPU_FEATURE(feature, "avx512bw", RTE_CPUFLAG_AVX512BW);
+CHECK_CPU_FEATURE(feature, "avx512vbmi", RTE_CPUFLAG_AVX512VBMI);
 CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ);
 CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2);
 #endif
-- 
2.25.1
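The "cache results for CPU ISA checks" behavior mentioned in the NEWS hunk above can be sketched as a memoized lookup: probe the CPU flag once per feature, then serve repeated queries from a cache. A Python model (illustrative only; the real code queries DPDK's rte_cpu_get_flag_enabled() from dpdk_get_cpu_has_isa()):

```python
# Simulated CPU flag table and a counter for the "expensive" probe; both
# are stand-ins for the DPDK runtime query, not real OVS/DPDK state.
_CPU_FLAGS = {"avx512f": True, "avx512bw": True, "avx512vbmi": False}
_isa_cache = {}
calls = {"count": 0}

def cpu_has_isa(feature):
    """Return the cached result; probe the (simulated) CPU only on first use."""
    if feature not in _isa_cache:
        calls["count"] += 1  # simulate the expensive probe
        _isa_cache[feature] = _CPU_FLAGS.get(feature, False)
    return _isa_cache[feature]

r1 = cpu_has_isa("avx512bw")
r2 = cpu_has_isa("avx512bw")  # served from cache, no second probe
```

This is why repeated ISA lookups (e.g. when probing each MFEX implementation) add negligible overhead after the first query.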



[ovs-dev] [v12 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-14 Thread kumar Amber
From: Harry van Haaren 

This commit adds AVX512 implementations of miniflow extract.
By using the 64 bytes available in an AVX512 register, it is
possible to convert a packet to a miniflow data-structure in
a small number of instructions.

The implementation here probes for Ether()/IP()/UDP() traffic,
and builds the appropriate miniflow data-structure for packets
that match the probe.

The implementation here is auto-validated by the miniflow
extract autovalidator, hence its correctness can be easily
tested and verified.

Note that this commit is designed to easily allow addition of new
traffic profiles in a scalable way, without code duplication for
each traffic profile.
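The probe-then-shuffle idea can be modeled scalar-wise as below. This is a simplified Python illustration, not the AVX512 code: the probe byte positions and the shuffle table layout are hypothetical, whereas the real implementation compares and permutes 64 packet bytes at once in a single AVX512 register.

```python
# Byte positions checked by the probe: EtherType == 0x0800 (IPv4) and
# IPv4 protocol == 17 (UDP). Offsets assume an untagged Ethernet frame.
ETH_IP_UDP_PROBE = {12: 0x08, 13: 0x00, 23: 0x11}

# Shuffle table: for each miniflow output byte, the packet byte to copy.
# This block layout is illustrative, not the real miniflow layout.
SHUFFLE_ETH_IP_UDP = [
    26, 27, 28, 29,   # IPv4 source address
    30, 31, 32, 33,   # IPv4 destination address
    34, 35, 36, 37,   # UDP source/destination ports
]

def probe(pkt, pattern):
    return all(pkt[i] == v for i, v in pattern.items())

def extract(pkt):
    if not probe(pkt, ETH_IP_UDP_PROBE):
        return None                      # no match: fall back to scalar MFEX
    return bytes(pkt[i] for i in SHUFFLE_ETH_IP_UDP)

pkt = bytearray(42)
pkt[12], pkt[13], pkt[23] = 0x08, 0x00, 0x11  # IPv4 over Ethernet, UDP
pkt[26:30] = b"\x0a\x00\x00\x01"              # source IP 10.0.0.1
blocks = extract(bytes(pkt))
```

Adding a new traffic profile then amounts to adding one probe pattern and one shuffle table, which is the scalability property the commit message describes.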

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 

---
v12:
- fix one static code warning
v9:
- include comments from flavio
v8:
- include documentation on AVX512 MFEX as per Eelco's suggestion
v7:
- fix minor review sentences (Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- inlcude assert for flow abi change
- include assert for offset changes
---
---
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-avx512.c  | 478 ++
 lib/dpif-netdev-private-extract.c |  13 +
 lib/dpif-netdev-private-extract.h |  30 ++
 4 files changed, 522 insertions(+)
 create mode 100644 lib/dpif-netdev-extract-avx512.c

diff --git a/lib/automake.mk b/lib/automake.mk
index f4f36325e..299f81939 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
$(AM_CFLAGS)
 lib_libopenvswitchavx512_la_SOURCES = \
lib/dpif-netdev-lookup-avx512-gather.c \
+   lib/dpif-netdev-extract-avx512.c \
lib/dpif-netdev-avx512.c
 lib_libopenvswitchavx512_la_LDFLAGS = \
-static
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
new file mode 100644
index 0..ac253fa1e
--- /dev/null
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -0,0 +1,478 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * AVX512 Miniflow Extract.
+ *
+ * This file contains optimized implementations of miniflow_extract()
+ * for specific common traffic patterns. The optimizations allow for
+ * quick probing of a specific packet type, and if a match with a specific
+ * type is found, a shuffle like procedure builds up the required miniflow.
+ *
+ * Process
+ * -------
+ *
+ * The procedure is to classify the packet based on its traffic type,
+ * using predefined bit-masks, and then to arrange the packet header data
+ * into the pre-defined positions required by the miniflow using shuffle
+ * instructions. This eliminates the if-else ladder that would otherwise
+ * be needed to identify each protocol present and add its data
+ * field by field.
+
+#ifdef __x86_64__
+/* Sparse cannot handle the AVX512 instructions. */
+#if !defined(__CHECKER__)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "flow.h"
+#include "dpdk.h"
+
+#include "dpif-netdev-private-dpcls.h"
+#include "dpif-netdev-private-extract.h"
+#include "dpif-netdev-private-flow.h"
+
+/* AVX512-BW level permutex2var_epi8 emulation. */
+static inline __m512i
+__attribute__((target("avx512bw")))
+_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
+   __m512i v_data_0,
+   __m512i v_shuf_idxs,
+   __m512i v_data_1)
+{
+/* Manipulate shuffle indexes for u16 size. */
+__mmask64 k_mask_odd_lanes = 0xAAAAAAAAAAAAAAAA;
+/* Clear away ODD lane bytes. Cannot be done above due to no u8 shift. */
+__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
+v_shuf_idxs,
+_mm512_setzero_si512());
+v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
+
+__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
+
+/* Shuffle each half at 16-bit width. */
+__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn,
+v_data_1);
+__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd,
+v_data_1);
+
+/* Find if the shuffle index was odd, via mask and compare. */
+uint16_t index_odd_mask = 0x1;
+const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask);
+
+/* EVEN lanes, find if 

[ovs-dev] [v12 08/11] dpif/stats: add miniflow extract opt hits counter

2021-07-14 Thread kumar Amber
From: Harry van Haaren 

This commit adds a new counter to be displayed to the user when
requesting datapath packet statistics. It counts the number of
packets that are parsed, and have a miniflow built for them, by the
optimized miniflow extract parsers.

The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
extra entry indicating if the optimized MFEX was hit:

  - MFEX Opt hits:6786432  (100.0 %)
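The optimized MFEX functions return a per-batch hitmask with one bit per matched packet, so the new counter is bumped by the population count of that mask (mirroring the `__builtin_popcountll(mf_mask)` call in the patch). A small illustrative sketch, with a hypothetical stats dictionary standing in for `pmd_perf_stats`:

```python
PMD_STAT_MFEX_OPT_HIT = "MFEX_OPT_HIT"

def update_mfex_opt_hits(stats, mf_mask):
    # One bit per packet in the batch: popcount == packets hit.
    stats[PMD_STAT_MFEX_OPT_HIT] = (
        stats.get(PMD_STAT_MFEX_OPT_HIT, 0) + bin(mf_mask).count("1"))

stats = {}
update_mfex_opt_hits(stats, 0b1011)   # 3 of 4 packets matched the profile
update_mfex_opt_hits(stats, 0b0001)   # 1 more hit in the next batch
```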

Signed-off-by: Harry van Haaren 
Acked-by: Flavio Leitner 
Acked-by: Eelco Chaudron 
---
v11:
- fix review comments from Eelco
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 lib/dpif-netdev-avx512.c|  3 +++
 lib/dpif-netdev-perf.c  |  3 +++
 lib/dpif-netdev-perf.h  |  1 +
 lib/dpif-netdev-unixctl.man |  4 
 lib/dpif-netdev.c   | 12 +++-
 tests/pmd.at|  6 --
 6 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 7772b7abf..544d36903 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -310,8 +310,11 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 }
 
 /* At this point we don't return error anymore, so commit stats here. */
+uint32_t mfex_hit_cnt = __builtin_popcountll(mf_mask);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RECV, batch_size);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_PHWOL_HIT, phwol_hits);
+pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MFEX_OPT_HIT,
+mfex_hit_cnt);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, emc_hits);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, smc_hits);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 7103a2d4d..d7676ea2b 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -247,6 +247,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 "  Rx packets:%12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
 "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
 "  - PHWOL hits:  %12"PRIu64"  (%5.1f %%)\n"
+"  - MFEX Opt hits:   %12"PRIu64"  (%5.1f %%)\n"
 "  - EMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - SMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
@@ -258,6 +259,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 passes, rx_packets ? 1.0 * passes / rx_packets : 0,
 stats[PMD_STAT_PHWOL_HIT],
 100.0 * stats[PMD_STAT_PHWOL_HIT] / passes,
+stats[PMD_STAT_MFEX_OPT_HIT],
+100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes,
 stats[PMD_STAT_EXACT_HIT],
 100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
 stats[PMD_STAT_SMC_HIT],
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 8b1a52387..834c26260 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -57,6 +57,7 @@ extern "C" {
 
 enum pmd_stat_type {
 PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */
+PMD_STAT_MFEX_OPT_HIT,  /* Packets that had miniflow optimized match. */
 PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
 PMD_STAT_SMC_HIT,   /* Packets that had a sig match hit (SMC). */
 PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. */
diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index 83ce4f1c5..80304ad35 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -16,6 +16,9 @@ packet lookups performed by the datapath. Beware that a 
recirculated packet
 experiences one additional lookup per recirculation, so there may be
 more lookups than forwarded packets in the datapath.
 
+The MFEX Opt hits counter displays the number of packets that are processed
+by the optimized miniflow extract implementations.
+
 Cycles are counted using the TSC or similar facilities (when available on
 the platform). The duration of one cycle depends on the processing platform.
 
@@ -136,6 +139,7 @@ pmd thread numa_id 0 core_id 1:
   Rx packets: 2399607  (2381 Kpps, 848 cycles/pkt)
   Datapath passes:3599415  (1.50 passes/pkt)
   - PHWOL hits: 0  (  0.0 %)
+  - MFEX Opt hits:3570133  ( 99.2 %)
   - EMC hits:  336472  (  9.3 %)
   - SMC hits:   0  ( 0.0 %)
   - Megaflow hits:3262943  ( 90.7 %, 1.00 subtbl lookups/hit)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a03a98fd7..ba94d9c5c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -648,6 +648,7 @@ pmd_info_show_stats(struct ds *reply,
   "  packet recirculations: %"PRIu64"\n"
   "  avg. datapath passes per packet: %.02f\n"
   "  

[ovs-dev] [v12 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-14 Thread kumar Amber
From: Kumar Amber 

Tests:
  6: OVS-DPDK - MFEX Autovalidator
  7: OVS-DPDK - MFEX Autovalidator Fuzzy

Added a new directory to store the PCAP file used
in the tests and a script to generate the fuzzy traffic
type pcap to be used in fuzzy unit test.
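The fuzzy pcap generation can be approximated without Scapy as below. This is an illustrative, Scapy-free Python sketch of what `tests/mfex_fuzzy.py` does conceptually: emit a batch of packets whose address fields are randomized so every MFEX implementation is exercised on varied inputs. The field layout is a minimal stub, not a full IPv4 header, and the helper name is hypothetical.

```python
import random
import struct

random.seed(1234)   # deterministic, so a run is reproducible

def fuzzy_ipv4_packet():
    # Random source/destination MACs followed by the IPv4 EtherType.
    eth = bytes(random.getrandbits(8) for _ in range(12)) + b"\x08\x00"
    # Minimal IPv4 header stub with randomized addresses; scapy's
    # fuzz() randomizes far more fields than this sketch does.
    ver_ihl = b"\x45\x00" + bytes(10)
    src = struct.pack("!I", random.getrandbits(32))
    dst = struct.pack("!I", random.getrandbits(32))
    return eth + ver_ihl + src + dst

pkts = [fuzzy_ipv4_packet() for _ in range(2000)]
```

The real test instead writes Scapy's `fuzz(Ether()/IP()/TCP())` packets to a pcap with `PcapWriter`, as shown later in this series' documentation.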

Signed-off-by: Kumar Amber 
Acked-by: Flavio Leitner 
---
v12:
- change skip parameter for unit test
v11:
- fix comments from Eelco
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove sleep from first test and added minor 5 sec sleep to fuzzy
---
---
 Documentation/topics/dpdk/bridge.rst |  48 
 tests/.gitignore |   1 +
 tests/automake.mk|   6 +++
 tests/mfex_fuzzy.py  |  33 +
 tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
 tests/system-dpdk.at |  53 +++
 6 files changed, 141 insertions(+)
 create mode 100755 tests/mfex_fuzzy.py
 create mode 100644 tests/pcap/mfex_test.pcap

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 8c500c504..2f065422b 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -346,3 +346,51 @@ A compile time option is available in order to test it 
with the OVS unit
 test suite. Use the following configure option ::
 
 $ ./configure --enable-mfex-default-autovalidator
+
+Unit Test Miniflow Extract
+++
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator
+
+The unit test uses multiple traffic types to test the correctness of the
+implementations.
+
+Running Fuzzy test with Autovalidator
++
+
+Fuzzy tests can also be done on miniflow extract with the help of the
+auto-validator and Scapy. The steps below describe how to reproduce the
+setup, with the IP addresses being fuzzed to generate packets.
+
+Scapy is used to create fuzzy IP packets and save them into a PCAP ::
+
+pkt = fuzz(Ether()/IP()/TCP())
+
+Set the miniflow extract to autovalidator using ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+OVS is configured to receive the generated packets ::
+
+$ ovs-vsctl add-port br0 pcap0 -- \
+set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
+"rx_pcap=fuzzy.pcap"
+
+With this workflow, the autovalidator will ensure that all MFEX
+implementations are classifying each packet in exactly the same way.
+If an optimized MFEX implementation causes a different miniflow to be
+generated, the autovalidator has ovs_assert and logging statements that
+will inform about the issue.
+
+Unit Fuzzy test with Autovalidator
++
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator Fuzzy
diff --git a/tests/.gitignore b/tests/.gitignore
index 45b4f67b2..a3d927e5d 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -11,6 +11,7 @@
 /ovsdb-cluster-testsuite
 /ovsdb-cluster-testsuite.dir/
 /ovsdb-cluster-testsuite.log
+/pcap/
 /pki/
 /system-afxdp-testsuite
 /system-afxdp-testsuite.dir/
diff --git a/tests/automake.mk b/tests/automake.mk
index f45f8d76c..a6c15ba55 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
echo "TEST_FUZZ_REGRESSION([$$basename])"; \
done > $@.tmp && mv $@.tmp $@
 
+EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
+MFEX_AUTOVALIDATOR_TESTS = \
+   tests/pcap/mfex_test.pcap \
+   tests/mfex_fuzzy.py
+
 OVSDB_CLUSTER_TESTSUITE_AT = \
tests/ovsdb-cluster-testsuite.at \
tests/ovsdb-execution.at \
@@ -512,6 +517,7 @@ tests_test_type_props_SOURCES = tests/test-type-props.c
 CHECK_PYFILES = \
tests/appctl.py \
tests/flowgen.py \
+   tests/mfex_fuzzy.py \
tests/ovsdb-monitor-sort.py \
tests/test-daemon.py \
tests/test-json.py \
diff --git a/tests/mfex_fuzzy.py b/tests/mfex_fuzzy.py
new file mode 100755
index 0..5b056bb48
--- /dev/null
+++ b/tests/mfex_fuzzy.py
@@ -0,0 +1,33 @@
+#!/usr/bin/python3
+try:
+from scapy.all import RandMAC, RandIP, PcapWriter, RandIP6, RandShort, fuzz
+from scapy.all import IPv6, Dot1Q, IP, Ether, UDP, TCP
+except ModuleNotFoundError as err:
+print(str(err) + ": Scapy")
+import sys
+
+path = str(sys.argv[1]) + "/pcap/fuzzy.pcap"
+pktdump = PcapWriter(path, append=False, sync=True)
+
+for i in range(0, 2000):
+
+# Generate random protocol bases, use a fuzz() over the combined packet
+# for full fuzzing.
+eth = Ether(src=RandMAC(), dst=RandMAC())
+vlan = Dot1Q()
+ipv4 = IP(src=RandIP(), dst=RandIP())
+

[ovs-dev] [v12 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread kumar Amber
From: Kumar Amber 

This commit introduces additional command line parameters
for the mfex study function. If the user provides a packet count,
study uses it as the minimum number of packets which must be processed
before an implementation is chosen; otherwise a default value is used.
It also introduces a third parameter for choosing a particular pmd core.

$ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
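The rule the patch enforces for the packet-count argument can be sketched as below. This is a simplified Python model of `mfex_set_study_pkt_cnt()`: a packet count is only meaningful for the "study" implementation, and combining it with any other name is rejected with `-EINVAL`. The tuple return shape is an illustrative choice, not the C signature.

```python
EINVAL = 22
MFEX_DEFAULT_STUDY_PKT_CNT = 128   # default used when no count is given

def set_study_pkt_cnt(name, pkt_cmp_count):
    # A non-zero packet count is only accepted for "study"; any other
    # implementation name (or a zero count) is an invalid combination.
    if name == "study" and pkt_cmp_count != 0:
        return 0, pkt_cmp_count
    return -EINVAL, MFEX_DEFAULT_STUDY_PKT_CNT

rc, cnt = set_study_pkt_cnt("study", 500)   # accepted: study 500 packets
```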

Signed-off-by: Kumar Amber 

---
v12:
- re-work the set command to sweep
- include fixes to study.c and doc changes
v11:
- include comments from Eelco
- reworked set command as per discussion
v10:
- fix review comments Eelco
v9:
- fix review comments Flavio
v7:
- change the command paramters for core_id and study_pkt_cnt
v5:
- fix review comments(Ian, Flavio, Eelco)
- introduce pmd core id parameter
---
---
 Documentation/topics/dpdk/bridge.rst |  38 +-
 lib/dpif-netdev-extract-study.c  |  30 -
 lib/dpif-netdev-private-extract.h|   9 ++
 lib/dpif-netdev.c| 178 ++-
 4 files changed, 222 insertions(+), 33 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index a47153495..8c500c504 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -284,12 +284,46 @@ command also shows whether the CPU supports each 
implementation ::
 
 An implementation can be selected manually by the following command ::
 
-$ ovs-appctl dpif-netdev/miniflow-parser-set study
+$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
+ [study_cnt]
+
+The above command has two optional parameters: core_id and study_cnt.
+The core_id parameter applies the chosen miniflow extract function only
+to the pmd thread running on that core. The study_cnt parameter, which
+is specific to study and ignored by other implementations, sets how many
+packets are needed to choose the best implementation.
 
 Also user can select the study implementation which studies the traffic for
 a specific number of packets by applying all available implementations of
 miniflow extract and then chooses the one with the most optimal result for
-that traffic pattern.
+that traffic pattern. The user can optionally provide a packet count
+[study_cnt] parameter which is the minimum number of packets that OVS must
+study before choosing an optimal implementation. If no packet count is
+provided, then the default value, 128, is chosen. Also, as there is no
+synchronization point between threads, one PMD thread might still be running
+a previous round, and can now decide on earlier data.
+
+The packet count is a global value, and parallel study executions with
+differing packet counts will use the most recent count value provided by user.
+
+Study can be selected with packet count by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
+
+Study can be selected with packet count and explicit PMD selection
+by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
+
+In the above command the first parameter is the CORE ID of the PMD
+thread and this can also be used to explicitly set the miniflow
+extraction function pointer on different PMD threads.
+
+Scalar can be selected on core 3 by the following command. Note that a
+study count should not be provided for any implementation other
+than study ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
 
 Miniflow Extract Validation
 ~~~
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
index 02b709f8b..083f940c2 100644
--- a/lib/dpif-netdev-extract-study.c
+++ b/lib/dpif-netdev-extract-study.c
@@ -25,8 +25,7 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
 
-static atomic_uint32_t mfex_study_pkts_count = 0;
-
+static atomic_uint32_t  mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
 /* Struct to hold miniflow study stats. */
 struct study_stats {
 uint32_t pkt_count;
@@ -48,6 +47,28 @@ mfex_study_get_study_stats_ptr(void)
 return stats;
 }
 
+int
+mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
+{
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* If the requested implementation is study and a non-zero packet count
+ * is given, set the packet counter to the requested number; otherwise
+ * return -EINVAL.
+ */
+if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) &&
+(pkt_cmp_count != 0)) {
+
+atomic_uintptr_t *study_pck_cnt = (void *)&mfex_study_pkts_count;
+atomic_store_relaxed(study_pck_cnt, (uintptr_t) pkt_cmp_count);
+atomic_store_relaxed(&mfex_study_pkts_count, pkt_cmp_count);
+return 0;
+}
+
+return -EINVAL;
+}
+
 uint32_t
 mfex_study_traffic(struct dp_packet_batch *packets,
struct netdev_flow_key *keys,
@@ -86,7 +107,10 

[ovs-dev] [v12 05/11] dpif-netdev: Add configure to enable autovalidator at build time.

2021-07-14 Thread kumar Amber
From: Kumar Amber 

This commit adds a new configure option to allow the user to enable
the autovalidator by default at build time, thus allowing unit tests
to be run against it by default.

 $ ./configure --enable-mfex-default-autovalidator

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 

---
v11:
- fix NEWS for blank line addition
v10:
- rework default set
v9:
- fix review comments Flavio
v7:
- fix review comments (Eelco, Flavio)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 Documentation/topics/dpdk/bridge.rst |  5 +
 NEWS |  2 ++
 acinclude.m4 | 16 
 configure.ac |  1 +
 lib/dpif-netdev-private-extract.c|  4 
 5 files changed, 28 insertions(+)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 7c96f4d5e..a47153495 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -307,3 +307,8 @@ implementations provide the same results.
 To set the Miniflow autovalidator, use this command ::
 
 $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+A compile time option is available in order to test it with the OVS unit
+test suite. Use the following configure option ::
+
+$ ./configure --enable-mfex-default-autovalidator
diff --git a/NEWS b/NEWS
index 4a7b89409..225eb445c 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,8 @@ Post-v2.15.0
  * Add study function to miniflow function table which studies packet
and automatically chooses the best miniflow implementation for that
traffic.
+ * Add build time configure command to enable auto-validator as default
+   miniflow implementation at build time.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/acinclude.m4 b/acinclude.m4
index 343303447..5a48f0335 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -14,6 +14,22 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time?
+dnl This enables automatically running all unit tests with all MFEX
+dnl implementations.
+AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [
+  AC_ARG_ENABLE([mfex-default-autovalidator],
+[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable 
MFEX autovalidator as default miniflow_extract implementation.])],
+[autovalidator=yes],[autovalidator=no])
+  AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation])
+  if test "$autovalidator" != yes; then
+AC_MSG_RESULT([no])
+  else
+OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT"
+AC_MSG_RESULT([yes])
+  fi
+])
+
 dnl Set OVS DPCLS Autovalidator as default subtable search at compile time?
 dnl This enables automatically running all unit tests with all DPCLS
 dnl implementations.
diff --git a/configure.ac b/configure.ac
index e45685a6c..46c402892 100644
--- a/configure.ac
+++ b/configure.ac
@@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
 OVS_CHECK_DPIF_AVX512_DEFAULT
+OVS_CHECK_MFEX_AUTOVALIDATOR
 OVS_CHECK_BINUTILS_AVX512
 
 AC_ARG_VAR(KARCH, [Kernel Architecture String])
diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index 5158f5c90..b4d538b92 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -60,7 +60,11 @@ void
 dpif_miniflow_extract_init(void)
 {
 atomic_uintptr_t *mfex_func = (void *)&default_mfex_func;
+#ifdef MFEX_AUTOVALIDATOR_DEFAULT
+int mfex_idx = MFEX_IMPL_AUTOVALIDATOR;
+#else
 int mfex_idx = MFEX_IMPL_SCALAR;
+#endif
 
 /* Call probe on each impl, and cache the result. */
 for (int i = 0; i < MFEX_IMPL_MAX; i++) {
-- 
2.25.1



[ovs-dev] [v12 04/11] docs/dpdk/bridge: add miniflow extract section.

2021-07-14 Thread kumar Amber
From: Kumar Amber 

This commit adds a section to the dpdk/bridge.rst netdev documentation,
detailing the added miniflow functionality. The newly added commands are
documented, and sample output is provided.

The use of auto-validator and special study function is also described
in detail as well as running fuzzy tests.
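The auto-validation approach documented here can be sketched as follows. This is an illustrative Python model, not the OVS code: both "implementations" are trivial stubs of a parser, and `autovalidate` plays the role of the autovalidator that runs every available implementation on the same packet and flags any that disagree with the reference result.

```python
def scalar_mfex(pkt):
    return pkt[26:34]        # stub reference parser (byte range is arbitrary)

def avx512_mfex(pkt):
    return pkt[26:34]        # stub optimized parser (agrees with scalar)

IMPLS = {"scalar": scalar_mfex, "avx512": avx512_mfex}

def autovalidate(pkt):
    # Run all implementations and report any that diverge from the
    # scalar baseline; an empty list means every parser agrees.
    baseline = IMPLS["scalar"](pkt)
    return [name for name, fn in IMPLS.items() if fn(pkt) != baseline]

mismatches = autovalidate(bytes(range(64)))
```

In OVS the real autovalidator additionally logs and asserts on a mismatch, rather than returning it to the caller.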

Signed-off-by: Kumar Amber 
Co-authored-by: Cian Ferriter 
Signed-off-by: Cian Ferriter 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Flavio Leitner 
Acked-by: Eelco Chaudron 

---
v11:
- fix minor typos.
v10:
- fix minor typos.
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 Documentation/topics/dpdk/bridge.rst | 51 
 1 file changed, 51 insertions(+)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 2d0850836..7c96f4d5e 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -256,3 +256,54 @@ The following line should be seen in the configure output 
when the above option
 is used ::
 
 checking whether DPIF AVX512 is default implementation... yes
+
+Miniflow Extract
+
+
+Miniflow extract (MFEX) performs parsing of the raw packets and extracts the
+important header information into a compressed miniflow. This miniflow is
+composed of bits and blocks where the bits signify which blocks are set or
+have values, whereas the blocks hold the metadata, ip, udp, vlan, etc. These
+values are used by the datapath for switching decisions later. The optimized
+miniflow extract is traffic specific to speed up the lookup, whereas the
+scalar version works for all traffic patterns.
+
+Most modern CPUs have SIMD capabilities. These SIMD instructions are able
+to process a vector rather than act on one variable. OVS provides multiple
+implementations of miniflow extract. This allows the user to take advantage
+of SIMD instructions like AVX512 to gain additional performance.
+
+A list of implementations can be obtained by the following command. The
+command also shows whether the CPU supports each implementation ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-get
+Available Optimized Miniflow Extracts:
+autovalidator (available: True, pmds: none)
+scalar (available: True, pmds: 1,15)
+study (available: True, pmds: none)
+
+An implementation can be selected manually by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study
+
+Also, the user can select the study implementation, which studies the
+traffic for a specific number of packets by applying all available
+implementations of miniflow extract, and then chooses the one with the
+most optimal result for that traffic pattern.
+
+Miniflow Extract Validation
+~~~
+
+As multiple versions of miniflow extract can co-exist, each with different
+CPU ISA optimizations, it is important to validate that they all give the
+exact same results. To easily test all miniflow implementations, an
+``autovalidator`` implementation of the miniflow exists. This implementation
+runs all other available miniflow extract implementations, and verifies that
+the results are identical.
+
+Running the OVS unit tests with the autovalidator enabled ensures all
+implementations provide the same results.
+
+To set the Miniflow autovalidator, use this command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
-- 
2.25.1



[ovs-dev] [v12 03/11] dpif-netdev: Add study function to select the best mfex function

2021-07-14 Thread kumar Amber
From: Kumar Amber 

The study function runs all the available implementations
of miniflow_extract, chooses the one whose hitmask has the
most hits, and sets the mfex function pointer to that implementation.

Study can be run at runtime using the following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set study
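The selection step can be modeled as below. This is an illustrative Python sketch, assuming hypothetical names: per-implementation hit counts accumulate while study runs every parser on each batch, and once enough packets have been observed, the implementation with the most hits wins.

```python
MFEX_STUDY_PKT_CNT = 128   # packets to observe before deciding

def choose_best(impl_hitcount, pkt_count):
    # Keep studying until enough packets have been seen, then pick the
    # implementation whose hitmask matched the most packets.
    if pkt_count < MFEX_STUDY_PKT_CNT:
        return None
    return max(impl_hitcount, key=impl_hitcount.get)

hits = {"scalar": 40, "avx512_ip_udp": 120, "avx512_dot1q_ip_udp": 10}
best = choose_best(hits, pkt_count=128)
```

In the real code the chosen function pointer is then installed atomically on the PMD thread, so later batches skip the study overhead entirely.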

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 

---
v11:
- fix minor comments from Eelco
v10:
- fix minor comments from Eelco
v9:
- fix comments Flavio
v8:
- fix review comments Flavio
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- add Atomic set in study
---
---
 NEWS  |   3 +
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-study.c   | 136 ++
 lib/dpif-netdev-private-extract.c |  12 +++
 lib/dpif-netdev-private-extract.h |  19 +
 5 files changed, 171 insertions(+)
 create mode 100644 lib/dpif-netdev-extract-study.c

diff --git a/NEWS b/NEWS
index cf254bcfe..4a7b89409 100644
--- a/NEWS
+++ b/NEWS
@@ -35,6 +35,9 @@ Post-v2.15.0
  * Add command line option to switch between MFEX function pointers.
  * Add miniflow extract auto-validator function to compare different
miniflow extract implementations against default implementation.
+ * Add study function to miniflow function table which studies packet
+   and automatically chooses the best miniflow implementation for that
+   traffic.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index 53b8abc0f..f4f36325e 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -107,6 +107,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/dp-packet.h \
lib/dp-packet.c \
lib/dpdk.h \
+   lib/dpif-netdev-extract-study.c \
lib/dpif-netdev-lookup.h \
lib/dpif-netdev-lookup.c \
lib/dpif-netdev-lookup-autovalidator.c \
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
new file mode 100644
index 0..02b709f8b
--- /dev/null
+++ b/lib/dpif-netdev-extract-study.c
@@ -0,0 +1,136 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "dpif-netdev-private-thread.h"
+#include "openvswitch/vlog.h"
+#include "ovs-thread.h"
+
+VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
+
+static atomic_uint32_t mfex_study_pkts_count = 0;
+
+/* Struct to hold miniflow study stats. */
+struct study_stats {
+uint32_t pkt_count;
+uint32_t impl_hitcount[MFEX_IMPL_MAX];
+};
+
+/* Define per thread data to hold the study stats. */
+DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
+
+/* Allocate per thread PMD pointer space for study_stats. */
+static inline struct study_stats *
+mfex_study_get_study_stats_ptr(void)
+{
+struct study_stats *stats = study_stats_get();
+if (OVS_UNLIKELY(!stats)) {
+   stats = xzalloc(sizeof *stats);
+   study_stats_set_unsafe(stats);
+}
+return stats;
+}
+
+uint32_t
+mfex_study_traffic(struct dp_packet_batch *packets,
+   struct netdev_flow_key *keys,
+   uint32_t keys_size, odp_port_t in_port,
+   struct dp_netdev_pmd_thread *pmd_handle)
+{
+uint32_t hitmask = 0;
+uint32_t mask = 0;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+struct study_stats *stats = mfex_study_get_study_stats_ptr();
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* Run traffic optimized miniflow_extract to collect the hitmask
+ * to be compared after certain packets have been hit to choose
+ * the best miniflow_extract version for that traffic.
+ */
+for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) {
+if (!miniflow_funcs[i].available) {
+continue;
+}
+
+hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size,
+ in_port, pmd_handle);
+stats->impl_hitcount[i] += count_1bits(hitmask);
+
+/* If traffic is not classified then we don't overwrite the keys
+ * array in miniflow implementations, so it's safe to create a
+ * mask for all those packets whose miniflows have been created.
+

[ovs-dev] [v12 01/11] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-14 Thread kumar Amber
From: Kumar Amber 

This patch introduces the MFEX function pointers which allows
the user to switch between different miniflow extract implementations
which are provided by the OVS based on optimized ISA CPU.

The user can query the available miniflow extract variants for
that CPU with the following command:

$ovs-appctl dpif-netdev/miniflow-parser-get

Similarly, the user can set the miniflow implementation with the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set name

This gives the user more performance and the flexibility to choose
the miniflow implementation according to their needs.
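The switchable function pointer can be sketched as below. This is an illustrative Python model of the mechanism, not the OVS code: the PMD reads one parser reference per batch, and the `miniflow-parser-set` command simply swaps that reference. The parser stubs and names are hypothetical.

```python
def scalar_parse(batch):
    return ["miniflow:" + p for p in batch]   # stub scalar parser

def avx512_parse(batch):
    return ["miniflow:" + p for p in batch]   # stub optimized parser

MFEX_IMPLS = {"scalar": scalar_parse, "avx512": avx512_parse}
active_mfex = MFEX_IMPLS["scalar"]            # default selected at startup

def miniflow_parser_set(name):
    # Swap the per-thread function pointer, as the appctl command does
    # (the C code performs this swap with an atomic store).
    global active_mfex
    active_mfex = MFEX_IMPLS[name]

miniflow_parser_set("avx512")
out = active_mfex(["pkt0", "pkt1"])
```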

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
---
v11:
- fix Eelco comments
v10:
- fix build errors
- rework default set and atomic global variable
v9:
- fix review comments from Flavio
v7:
- fix review comments(Eelco, Flavio)
v5:
- fix review comments(Ian, Flavio, Eelco)
- add enum to hold mfex indexes
- add new get and set implementations
- add Atomic set and get
---
---
 NEWS  |   1 +
 lib/automake.mk   |   2 +
 lib/dpif-netdev-avx512.c  |  31 +-
 lib/dpif-netdev-private-extract.c | 162 ++
 lib/dpif-netdev-private-extract.h | 113 +
 lib/dpif-netdev-private-thread.h  |   8 ++
 lib/dpif-netdev.c | 107 +++-
 7 files changed, 419 insertions(+), 5 deletions(-)
 create mode 100644 lib/dpif-netdev-private-extract.c
 create mode 100644 lib/dpif-netdev-private-extract.h

diff --git a/NEWS b/NEWS
index 6cdccc715..b0f08e96d 100644
--- a/NEWS
+++ b/NEWS
@@ -32,6 +32,7 @@ Post-v2.15.0
  * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
+ * Add command line option to switch between MFEX function pointers.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index 3c9523c1a..53b8abc0f 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev-private-dpcls.h \
lib/dpif-netdev-private-dpif.c \
lib/dpif-netdev-private-dpif.h \
+   lib/dpif-netdev-private-extract.c \
+   lib/dpif-netdev-private-extract.h \
lib/dpif-netdev-private-flow.h \
lib/dpif-netdev-private-thread.h \
lib/dpif-netdev-private.h \
diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 6f9aa8284..7772b7abf 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
  * // do all processing (HWOL->MFEX->EMC->SMC)
  * }
  */
+
+/* Do a batch miniflow extract into keys. */
+uint32_t mf_mask = 0;
+miniflow_extract_func mfex_func;
atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
+if (mfex_func) {
+mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd);
+}
+
 uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
 uint32_t iter = lookup_pkts_bitmask;
 while (iter) {
@@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
pkt_metadata_init(&packet->md, in_port);
 
 struct dp_netdev_flow *f = NULL;
+struct netdev_flow_key *key = &keys[i];
+
+/* Check the miniflow mask to see if the packet was correctly
+ * classified by the vector mfex; else do a scalar miniflow extract
+ * for that packet.
+ */
+bool mfex_hit = !!(mf_mask & (1 << i));
 
 /* Check for a partial hardware offload match. */
 if (hwol_enabled) {
@@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 }
 if (f) {
rules[i] = &f->cr;
-pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+/* If AVX512 MFEX already classified the packet, use it. */
+if (mfex_hit) {
+pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
+} else {
+pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+}
+
 pkt_meta[i].bytes = dp_packet_size(packet);
 phwol_hits++;
 hwol_emc_smc_hitmask |= (1 << i);
@@ -185,9 +207,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 }
 }
 
-/* Do miniflow extract into keys. */
-struct netdev_flow_key *key = &keys[i];
-miniflow_extract(packet, &key->mf);
+if (!mfex_hit) {
+/* Do a scalar miniflow extract into keys. */
+miniflow_extract(packet, &key->mf);
+}
 
 /* Cache TCP and byte 

[ovs-dev] [v12 02/11] dpif-netdev: Add auto validation function for miniflow extract

2021-07-14 Thread kumar Amber
From: Kumar Amber 

This patch introduces the auto-validation function, which
allows users to compare batches of packets obtained from
different miniflow implementations against the scalar
miniflow extract, and returns a hitmask.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 

---
v9:
- fix review comments Flavio
v6:
-fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove ovs assert and switch to default after a batch of packets
  is processed
- Atomic set and get introduced
- fix raw_ctz for windows build
---
---
 NEWS  |   2 +
 lib/dpif-netdev-private-extract.c | 150 ++
 lib/dpif-netdev-private-extract.h |  22 +
 lib/dpif-netdev.c |   2 +-
 4 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index b0f08e96d..cf254bcfe 100644
--- a/NEWS
+++ b/NEWS
@@ -33,6 +33,8 @@ Post-v2.15.0
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
  * Add command line option to switch between MFEX function pointers.
+ * Add miniflow extract auto-validator function to compare different
+   miniflow extract implementations against default implementation.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index 688190e70..a9ca56bd0 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -38,6 +38,11 @@ static ATOMIC(miniflow_extract_func) default_mfex_func;
  */
 static struct dpif_miniflow_extract_impl mfex_impls[] = {
 
+[MFEX_IMPL_AUTOVALIDATOR] = {
+.probe = NULL,
+.extract_func = dpif_miniflow_extract_autovalidator,
+.name = "autovalidator", },
+
 [MFEX_IMPL_SCALAR] = {
 .probe = NULL,
 .extract_func = NULL,
@@ -160,3 +165,148 @@ dp_mfex_impl_get_by_name(const char *name, 
miniflow_extract_func *out_func)
 
 return -ENOENT;
 }
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+struct netdev_flow_key *keys,
+uint32_t keys_size, odp_port_t in_port,
+struct dp_netdev_pmd_thread *pmd_handle)
+{
+const size_t cnt = dp_packet_batch_size(packets);
+uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+struct dp_packet *packet;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+if (keys_size < cnt) {
+miniflow_extract_func default_func = NULL;
+atomic_uintptr_t *pmd_func = (void *) &pmd->miniflow_extract_opt;
+atomic_store_relaxed(pmd_func, (uintptr_t) default_func);
+VLOG_ERR("Invalid key size supplied, key_size: %d less than"
+ " batch_size: %" PRIuSIZE "\n", keys_size, cnt);
+VLOG_ERR("Autovalidator is disabled.\n");
+return 0;
+}
+
+/* Run scalar miniflow_extract to get default result. */
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+pkt_metadata_init(&packet->md, in_port);
+miniflow_extract(packet, &keys[i].mf);
+
+/* Store known good metadata to compare with optimized metadata. */
+good_l2_5_ofs[i] = packet->l2_5_ofs;
+good_l3_ofs[i] = packet->l3_ofs;
+good_l4_ofs[i] = packet->l4_ofs;
+good_l2_pad_size[i] = packet->l2_pad_size;
+}
+
+uint32_t batch_failed = 0;
+/* Iterate through each version of miniflow implementations. */
+for (int j = MFEX_IMPL_START_IDX; j < MFEX_IMPL_MAX; j++) {
+if (!mfex_impls[j].available) {
+continue;
+}
+/* Reset keys and offsets before each implementation. */
+memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+dp_packet_reset_offsets(packet);
+}
+/* Call optimized miniflow for each batch of packet. */
+uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys,
+   keys_size, in_port,
+   pmd_handle);
+
+/* Do a miniflow compare for bits, blocks and offsets for all the
+ * classified packets in the hitmask marked by set bits. */
+while (hit_mask) {
+/* Index for the set bit. */
+uint32_t i = raw_ctz(hit_mask);
+/* Set the index 

[ovs-dev] [v12 00/11] MFEX Infrastructure + Optimizations

2021-07-14 Thread kumar Amber
v12:
- re-work the set command to a sweep method
- change skip for unit test to true from not-available
- added acks from Eelco
- minor doc fixes and typos
v11:
- reworked set command in alignment with Eelco and Harry
- added Acks from Eelco.
- added skip to unit test if other implementations not available
- minor typos and fixes
- clang build fixes
- removed patch with Scalar DPIF, will send separately
v10 update:
- re-worked the default implementation
- fix comments from Flavio and Eelco
- Include Acks from Eelco in study
v9 update:
- Include review comments from Flavio
- Rebase onto Master
- Include Acks from Flavio
v8 updates:
- Include documentation on AVX512 MFEX as per Eelco's suggestion on list
v7 updates:
- Rebase onto DPIF v15
- Changed commands to get and set MFEX
- Fixed comments from Flavio, Eelco
- Segregated addition of MFEX options to separate patch 12 for Scalar DPIF
- Removed sleep from auto-validator and added frame counter check
- Documentation updates
- Minor bug fixes
v6 updates:
- Fix non-ssl build
v5 updates:
- rebase onto latest DPIF v14
- use Enum for mfex impls
- add pmd core id set parameter in set command
- get command modified to display the pmd thread for individual mfex functions
- resolved comments from Eelco, Ian, Flavio
- Use Atomic to get and set miniflow implementations
- removed and reduced sleep in unit tests
- fixed scalar miniflow perf degradation
v4 updates:
- rebase on to latest DPIF v13
- fix fuzzy.py script with random mac/ip
v3 updates:
- rebase on to latest DPIF v12
- add additional AVX512 traffic profiles for tcp and vlan
- add new command line for study function to add packet count
- add unit tests for fuzzy testing and auto-validation of mfex
- add mfex option hit stats to perf-show command
v2 updates:
- rebase on to latest DPIF v11
This patchset introduces miniflow extract infrastructure changes
which allow the user to choose among different ISA-optimized
miniflow extract variants, either chosen explicitly by the user or
set automatically by OVS based on packet studies, using different
commands.
The infrastructure also provides a way to check the correctness of
the different ISA-optimized miniflow extract variants against the
scalar version.

Harry van Haaren (4):
  dpif/stats: add miniflow extract opt hits counter
  dpdk: add additional CPU ISA detection strings
  dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
  dpif-netdev/mfex: add more AVX512 traffic profiles

Kumar Amber (7):
  dpif-netdev: Add command line and function pointer for miniflow
extract
  dpif-netdev: Add auto validation function for miniflow extract
  dpif-netdev: Add study function to select the best mfex function
  docs/dpdk/bridge: add miniflow extract section.
  dpif-netdev: Add configure to enable autovalidator at build time.
  dpif-netdev: Add packet count and core id parameters for study
  test/system-dpdk: Add unit test for mfex autovalidator

 Documentation/topics/dpdk/bridge.rst | 138 ++
 NEWS |  11 +
 acinclude.m4 |  16 +
 configure.ac |   1 +
 lib/automake.mk  |   4 +
 lib/dpdk.c   |   2 +
 lib/dpif-netdev-avx512.c |  34 +-
 lib/dpif-netdev-extract-avx512.c | 630 +++
 lib/dpif-netdev-extract-study.c  | 160 +++
 lib/dpif-netdev-perf.c   |   3 +
 lib/dpif-netdev-perf.h   |   1 +
 lib/dpif-netdev-private-extract.c| 371 
 lib/dpif-netdev-private-extract.h| 203 +
 lib/dpif-netdev-private-thread.h |   8 +
 lib/dpif-netdev-unixctl.man  |   4 +
 lib/dpif-netdev.c| 241 +-
 tests/.gitignore |   1 +
 tests/automake.mk|   6 +
 tests/mfex_fuzzy.py  |  33 ++
 tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
 tests/pmd.at |   6 +-
 tests/system-dpdk.at |  53 +++
 22 files changed, 1914 insertions(+), 12 deletions(-)
 create mode 100644 lib/dpif-netdev-extract-avx512.c
 create mode 100644 lib/dpif-netdev-extract-study.c
 create mode 100644 lib/dpif-netdev-private-extract.c
 create mode 100644 lib/dpif-netdev-private-extract.h
 create mode 100755 tests/mfex_fuzzy.py
 create mode 100644 tests/pcap/mfex_test.pcap

-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 5/9] ovsdb: New ovsdb 'relay' service model.

2021-07-14 Thread 0-day Robot
Bleep bloop.  Greetings Ilya Maximets, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Comment with 'xxx' marker
#525 FILE: ovsdb/relay.c:158:
/* XXX: ovsdb-cs module returns shash which was previously part of a json

Lines checked: 753, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


[ovs-dev] [PATCH v3 9/9] docs: Add documentation for ovsdb relay mode.

2021-07-14 Thread Ilya Maximets
Main documentation for the service model and tutorial with the use case
and configuration examples.

Acked-by: Dumitru Ceara 
Signed-off-by: Ilya Maximets 
---
 Documentation/automake.mk|   1 +
 Documentation/ref/ovsdb.7.rst|  62 --
 Documentation/topics/index.rst   |   1 +
 Documentation/topics/ovsdb-relay.rst | 124 +++
 NEWS |   3 +
 ovsdb/ovsdb-server.1.in  |  27 +++---
 6 files changed, 200 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/topics/ovsdb-relay.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index bc30f94c5..213d9c867 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -52,6 +52,7 @@ DOC_SOURCE = \
Documentation/topics/networking-namespaces.rst \
Documentation/topics/openflow.rst \
Documentation/topics/ovs-extensions.rst \
+   Documentation/topics/ovsdb-relay.rst \
Documentation/topics/ovsdb-replication.rst \
Documentation/topics/porting.rst \
Documentation/topics/record-replay.rst \
diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
index e4f1bf766..980ba29e7 100644
--- a/Documentation/ref/ovsdb.7.rst
+++ b/Documentation/ref/ovsdb.7.rst
@@ -121,13 +121,14 @@ schema checksum from a schema or database file, 
respectively.
 Service Models
 ==
 
-OVSDB supports three service models for databases: **standalone**,
-**active-backup**, and **clustered**.  The service models provide different
-compromises among consistency, availability, and partition tolerance.  They
-also differ in the number of servers required and in terms of performance.  The
-standalone and active-backup database service models share one on-disk format,
-and clustered databases use a different format, but the OVSDB programs work
-with both formats.  ``ovsdb(5)`` documents these file formats.
+OVSDB supports four service models for databases: **standalone**,
+**active-backup**, **relay** and **clustered**.  The service models provide
+different compromises among consistency, availability, and partition tolerance.
+They also differ in the number of servers required and in terms of performance.
+The standalone and active-backup database service models share one on-disk
+format, and clustered databases use a different format, but the OVSDB programs
+work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
+databases have no on-disk storage.
 
 RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
 any particular service model.
@@ -406,6 +407,50 @@ following consequences:
   that the client previously read.  The OVSDB client library in Open vSwitch
   uses this feature to avoid servers with stale data.
 
+Relay Service Model
+---
+
+A **relay** database is a way to scale out read-mostly access to an
+existing database running in any service model, including relay.
+
+A relay database creates and maintains an OVSDB connection with another OVSDB
+server.  It uses this connection to maintain an in-memory copy of the remote
+database (a.k.a. the ``relay source``), keeping the copy up-to-date in real
+time as the database content changes on the relay source.
+
+The purpose of a relay server is to scale out the number of database clients.
+Read-only transactions and monitor requests are fully handled by the relay
+server itself.  For the transactions that request database modifications,
+relay works as a proxy between the client and the relay source, i.e. it
+forwards transactions and replies between them.
+
+Compared to the clustered and active-backup models, the relay service model
+provides read and write access to the database similarly to a clustered
+database (and is even more scalable), but with the generally insignificant
+performance overhead of an active-backup model.  At the same time, it does
+not increase availability; that needs to be covered by the service model of
+the relay source.
+
+A relay database has no on-disk storage and therefore cannot be converted to
+any other service model.
+
+If there is already a database started in any service model, to start a relay
+database server use ``ovsdb-server relay:<DB_NAME>:<relay source>``, where
+<DB_NAME> is the database name as specified in the schema of the database
+that the existing server runs, and <relay source> is an OVSDB connection method
+(see `Connection Methods`_ below) that connects to the existing database
+server.  <relay source> could contain a comma-separated list of connection
+methods, e.g. to connect to any server of the clustered database.
+Multiple relay servers could be started for the same relay source.
+
+Since the way relays handle read and write transactions is very similar
+to the clustered model, where "cluster" means "set of relay servers connected
+to the same relay source", "follower" means "relay server" and "leader"
+means "relay source", the same consistency consequences as for the clustered
+model apply 

[ovs-dev] [PATCH v3 8/9] ovsdb: Make clients aware of relay service model.

2021-07-14 Thread Ilya Maximets
Clients need to re-connect from a relay that has no connection
with the database source.  Also, a relay acts similarly to a follower
in the clustered model from the consistency point of view, so it is
not suitable for leader-only connections.

Acked-by: Mark D. Gray 
Acked-by: Dumitru Ceara 
Signed-off-by: Ilya Maximets 
---
 lib/ovsdb-cs.c   | 15 ++-
 ovsdb/ovsdb-client.c |  2 +-
 python/ovs/db/idl.py | 16 
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
index 911b71dd4..fb3fdd006 100644
--- a/lib/ovsdb-cs.c
+++ b/lib/ovsdb-cs.c
@@ -1897,8 +1897,8 @@ ovsdb_cs_check_server_db__(struct ovsdb_cs *cs)
 bool ok = false;
 const char *model = server_column_get_string(db_row, COL_MODEL, "");
 const char *schema = server_column_get_string(db_row, COL_SCHEMA, NULL);
+bool connected = server_column_get_bool(db_row, COL_CONNECTED, false);
 if (!strcmp(model, "clustered")) {
-bool connected = server_column_get_bool(db_row, COL_CONNECTED, false);
 bool leader = server_column_get_bool(db_row, COL_LEADER, false);
 uint64_t index = server_column_get_int(db_row, COL_INDEX, 0);
 
@@ -1918,6 +1918,19 @@ ovsdb_cs_check_server_db__(struct ovsdb_cs *cs)
 cs->min_index = index;
 ok = true;
 }
+} else if (!strcmp(model, "relay")) {
+if (!schema) {
+VLOG_INFO("%s: relay database server has not yet connected to the "
+  "relay source; trying another server", server_name);
+} else if (!connected) {
+VLOG_INFO("%s: relay database server is disconnected from the "
+  "relay source; trying another server", server_name);
+} else if (cs->leader_only) {
+VLOG_INFO("%s: relay database server cannot be a leader; "
+  "trying another server", server_name);
+} else {
+ok = true;
+}
 } else {
 if (!schema) {
 VLOG_INFO("%s: missing database schema", server_name);
diff --git a/ovsdb/ovsdb-client.c b/ovsdb/ovsdb-client.c
index ffa8f8df2..f1b8d6491 100644
--- a/ovsdb/ovsdb-client.c
+++ b/ovsdb/ovsdb-client.c
@@ -716,7 +716,7 @@ should_stay_connected(const char *server, const char 
*database,
 return false;
 }
 
-if (strcmp(parse_string_column(row, "model"), "clustered")) {
+if (!strcmp(parse_string_column(row, "model"), "standalone")) {
 /* Always accept standalone databases. */
 return true;
 }
diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index 0fc2af3c2..1bf9b1e09 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -38,6 +38,7 @@ OVSDB_UPDATE = 0
 OVSDB_UPDATE2 = 1
 
 CLUSTERED = "clustered"
+RELAY = "relay"
 
 
 Notice = collections.namedtuple('Notice', ('event', 'row', 'updates'))
@@ -798,6 +799,21 @@ class Idl(object):
   'trying another server' % session_name)
 return False
 self._min_index = database.index[0]
+elif database.model == RELAY:
+if not database.schema:
+vlog.info('%s: relay database server has not yet connected '
+  'to the relay source; trying another server'
+  % session_name)
+return False
+if not database.connected:
+vlog.info('%s: relay database server is disconnected '
+  'from the relay source; trying another server'
+  % session_name)
+return False
+if self.leader_only:
+vlog.info('%s: relay database server cannot be a leader; '
+  'trying another server' % session_name)
+return False
 
 return True
 
-- 
2.31.1



[ovs-dev] [PATCH v3 6/9] ovsdb: relay: Add support for transaction forwarding.

2021-07-14 Thread Ilya Maximets
The current version of ovsdb relay allows scaling out read-only
access to the primary database.  However, many clients are not
read-only but read-mostly, for example ovn-controller.

In order to scale out database access for this case, ovsdb-server
needs to process transactions that are not read-only.  A relay is not
allowed to do that, i.e. not allowed to modify the database, but it
can act like a proxy and forward transactions that include database
modifications to the primary server and forward replies back to the
client.  At the same time it may serve read-only transactions and
monitor requests by itself, greatly reducing the load on the primary
server.

This configuration will slightly increase transaction latency, but
it's not very important for read-mostly use cases.

Implementation details:
With this change, instead of creating a trigger to commit the
transaction, ovsdb-server will create a trigger for transaction
forwarding.  Later, ovsdb_relay_run() will send all new transactions
to the relay source.  Once a transaction reply is received from the
relay source, the ovsdb-relay module will update the state of the
transaction forwarding with the reply.  After that, trigger_run()
will complete the trigger and jsonrpc_server_run() will send the
reply back to the client.  Since the transaction reply from the relay
source is received after all the updates, the client will receive
all the updates before receiving the transaction reply, as in
a normal scenario with other database models.

Acked-by: Dumitru Ceara 
Signed-off-by: Ilya Maximets 
---
 ovsdb/automake.mk   |   2 +
 ovsdb/execution.c   |  18 ++--
 ovsdb/ovsdb.c   |   9 ++
 ovsdb/ovsdb.h   |   8 +-
 ovsdb/relay.c   |  12 ++-
 ovsdb/transaction-forward.c | 182 
 ovsdb/transaction-forward.h |  44 +
 ovsdb/trigger.c |  49 --
 ovsdb/trigger.h |  41 
 tests/ovsdb-server.at   |  85 -
 10 files changed, 411 insertions(+), 39 deletions(-)
 create mode 100644 ovsdb/transaction-forward.c
 create mode 100644 ovsdb/transaction-forward.h

diff --git a/ovsdb/automake.mk b/ovsdb/automake.mk
index 05c8ebbdf..62cc02686 100644
--- a/ovsdb/automake.mk
+++ b/ovsdb/automake.mk
@@ -48,6 +48,8 @@ ovsdb_libovsdb_la_SOURCES = \
ovsdb/trigger.h \
ovsdb/transaction.c \
ovsdb/transaction.h \
+   ovsdb/transaction-forward.c \
+   ovsdb/transaction-forward.h \
ovsdb/ovsdb-util.c \
ovsdb/ovsdb-util.h
 ovsdb_libovsdb_la_CFLAGS = $(AM_CFLAGS)
diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index dd2569055..f9b8067d0 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -99,7 +99,8 @@ lookup_executor(const char *name, bool *read_only)
 }
 
 /* On success, returns a transaction and stores the results to return to the
- * client in '*resultsp'.
+ * client in '*resultsp'.  If 'forwarding_needed' is nonnull and the transaction
+ * needs to be forwarded (in relay mode), sets '*forwarding_needed' to true.
  *
  * On failure, returns NULL.  If '*resultsp' is nonnull, then it is the results
  * to return to the client.  If '*resultsp' is null, then the execution failed
@@ -111,7 +112,8 @@ ovsdb_execute_compose(struct ovsdb *db, const struct 
ovsdb_session *session,
   const struct json *params, bool read_only,
   const char *role, const char *id,
   long long int elapsed_msec, long long int *timeout_msec,
-  bool *durable, struct json **resultsp)
+  bool *durable, bool *forwarding_needed,
+  struct json **resultsp)
 {
 struct ovsdb_execution x;
 struct ovsdb_error *error;
@@ -120,6 +122,9 @@ ovsdb_execute_compose(struct ovsdb *db, const struct 
ovsdb_session *session,
 size_t i;
 
 *durable = false;
+if (forwarding_needed) {
+*forwarding_needed = false;
+}
 if (params->type != JSON_ARRAY
 || !params->array.n
 || params->array.elems[0]->type != JSON_STRING
@@ -196,11 +201,8 @@ ovsdb_execute_compose(struct ovsdb *db, const struct 
ovsdb_session *session,
 "%s operation not allowed on "
 "table in reserved database %s",
 op_name, db->schema->name);
-} else if (db->is_relay) {
-error = ovsdb_error("not allowed",
-"%s operation not allowed when "
-"database server is in relay mode",
-op_name);
+} else if (db->is_relay && forwarding_needed) {
+*forwarding_needed = true;
 }
 }
 if (error) {
@@ -245,7 +247,7 @@ ovsdb_execute(struct ovsdb *db, const struct ovsdb_session 
*session,
 struct json *results;
 struct ovsdb_txn *txn = 

[ovs-dev] [PATCH v3 7/9] ovsdb: relay: Reflect connection status in _Server database.

2021-07-14 Thread Ilya Maximets
It might be important for clients to know that a relay lost its
connection with the relay source, so that they can re-connect to
another relay.

Signed-off-by: Ilya Maximets 
---
 ovsdb/_server.xml| 17 +
 ovsdb/ovsdb-server.c |  3 ++-
 ovsdb/relay.c| 34 ++
 ovsdb/relay.h|  4 
 4 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml
index 37297da73..7866f134f 100644
--- a/ovsdb/_server.xml
+++ b/ovsdb/_server.xml
@@ -71,6 +71,15 @@
   source.
 
 
+
+  True if the database is connected to its storage.  A standalone database
+  is always connected.  A clustered database is connected if the server is
+  in contact with a majority of its cluster.  A relay database is connected
+  if the server is in contact with the relay source, i.e. is connected to
+  the server it syncs from.  An unconnected database cannot be modified and
+  its data might be unavailable or stale.
+
+
 
   
 These columns are most interesting and in some cases only relevant for
@@ -78,14 +87,6 @@
 column is clustered.
   
 
-  
-True if the database is connected to its storage.  A standalone or
-active-backup database is always connected.  A clustered database is
-connected if the server is in contact with a majority of its cluster.
-An unconnected database cannot be modified and its data might be
-unavailable or stale.
-  
-
   
 True if the database is the leader in its cluster.  For a standalone or
 active-backup database, this is always true.  For a relay database,
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index ddf868d16..0181fe987 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -1193,7 +1193,8 @@ update_database_status(struct ovsdb_row *row, struct db 
*db)
 ovsdb_util_write_string_column(row, "model",
 db->db->is_relay ? "relay" : ovsdb_storage_get_model(db->db->storage));
 ovsdb_util_write_bool_column(row, "connected",
- ovsdb_storage_is_connected(db->db->storage));
+db->db->is_relay ? ovsdb_relay_is_connected(db->db)
+ : ovsdb_storage_is_connected(db->db->storage));
 ovsdb_util_write_bool_column(row, "leader",
 db->db->is_relay ? false : ovsdb_storage_is_leader(db->db->storage));
 ovsdb_util_write_uuid_column(row, "cid",
diff --git a/ovsdb/relay.c b/ovsdb/relay.c
index df9906bda..fb16ce35c 100644
--- a/ovsdb/relay.c
+++ b/ovsdb/relay.c
@@ -31,6 +31,7 @@
 #include "ovsdb-error.h"
 #include "row.h"
 #include "table.h"
+#include "timeval.h"
 #include "transaction.h"
 #include "transaction-forward.h"
 #include "util.h"
@@ -47,8 +48,36 @@ struct relay_ctx {
 struct ovsdb_schema *new_schema;
 schema_change_callback schema_change_cb;
 void *schema_change_aux;
+
+long long int last_connected;
 };
 
+#define RELAY_MAX_RECONNECTION_MS 3
+
+/* Reports if the database is connected to the relay source and functional,
+ * i.e. it actively monitors the source and is able to forward transactions. */
+bool
+ovsdb_relay_is_connected(struct ovsdb *db)
+{
+struct relay_ctx *ctx = shash_find_data(&relay_dbs, db->name);
+
+if (!ctx || !ovsdb_cs_is_alive(ctx->cs)) {
+return false;
+}
+
+if (ovsdb_cs_may_send_transaction(ctx->cs)) {
+return true;
+}
+
+/* Trying to avoid connection state flapping by delaying report for
+ * upper layer and giving ovsdb-cs some time to reconnect. */
+if (time_msec() - ctx->last_connected < RELAY_MAX_RECONNECTION_MS) {
+return true;
+}
+
+return false;
+}
+
 static struct json *
 ovsdb_relay_compose_monitor_request(const struct json *schema_json, void *ctx_)
 {
@@ -119,6 +148,7 @@ ovsdb_relay_add_db(struct ovsdb *db, const char *remote,
 ctx->schema_change_aux = schema_change_aux;
 ctx->db = db;
+ctx->cs = ovsdb_cs_create(db->name, 3, &relay_cs_ops, ctx);
+ctx->last_connected = 0;
+shash_add(&relay_dbs, db->name, ctx);
 ovsdb_cs_set_leader_only(ctx->cs, false);
 ovsdb_cs_set_remote(ctx->cs, remote, true);
@@ -306,6 +336,10 @@ ovsdb_relay_run(void)
 ovsdb_txn_forward_run(ctx->db, ctx->cs);
+ovsdb_cs_run(ctx->cs, &events);
 
+if (ovsdb_cs_may_send_transaction(ctx->cs)) {
+ctx->last_connected = time_msec();
+}
+
 struct ovsdb_cs_event *event;
+LIST_FOR_EACH_POP (event, list_node, &events) {
 if (!ctx->db) {
diff --git a/ovsdb/relay.h b/ovsdb/relay.h
index 68586e9db..390ea70c8 100644
--- a/ovsdb/relay.h
+++ b/ovsdb/relay.h
@@ -17,6 +17,8 @@
 #ifndef OVSDB_RELAY_H
 #define OVSDB_RELAY_H 1
 
+#include 
+
 struct json;
 struct ovsdb;
 struct ovsdb_schema;
@@ -31,4 +33,6 @@ void ovsdb_relay_del_db(struct ovsdb *);
 void ovsdb_relay_run(void);
 void ovsdb_relay_wait(void);
 
+bool ovsdb_relay_is_connected(struct ovsdb 

[ovs-dev] [PATCH v3 5/9] ovsdb: New ovsdb 'relay' service model.

2021-07-14 Thread Ilya Maximets
A new database service model, 'relay', is needed to scale out
read-mostly database access, e.g. ovn-controller connections to
OVN_Southbound.

In this service model, ovsdb-server connects to an existing OVSDB
server and maintains an in-memory copy of the database.  It serves
read-only transactions and monitor requests on its own, but
forwards write transactions to the relay source.

Key differences from the active-backup replication:
- support for "write" transactions (next commit).
- no on-disk storage (probably faster operation).
- support for multiple remotes (connect to the clustered db).
- doesn't try to keep a connection open as long as possible, but
  reconnects faster to other remotes to avoid missing updates.
- no need to know the complete database schema beforehand,
  only the schema name.
- can be used along with other standalone and clustered databases
  by the same ovsdb-server process (doesn't turn the whole
  jsonrpc server into read-only mode).
- supports the modern version of monitors (monitor_cond_since),
  because it is based on ovsdb-cs.
- can be chained, i.e. multiple relays can be connected
  one to another in a row or in a tree-like form.
- doesn't increase availability.
- cannot be converted to other service models or become a main
  active server.
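
As a usage sketch, a relay for a clustered OVN_Southbound could be started roughly as follows (addresses and port numbers are illustrative; the exact syntax is described in the ovsdb.7 documentation in this series):

```shell
# Start a relay that syncs OVN_Southbound from a 3-server clustered
# source and serves clients on its own passive remote:
ovsdb-server \
    --remote=ptcp:6642 \
    relay:OVN_Southbound:tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642
```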

Acked-by: Dumitru Ceara 
Signed-off-by: Ilya Maximets 
---
 ovsdb/_server.ovsschema |   7 +-
 ovsdb/_server.xml   |  18 ++-
 ovsdb/automake.mk   |   2 +
 ovsdb/execution.c   |   5 +
 ovsdb/ovsdb-server.c| 100 
 ovsdb/ovsdb.c   |   2 +
 ovsdb/ovsdb.h   |   3 +
 ovsdb/relay.c   | 343 
 ovsdb/relay.h   |  34 
 9 files changed, 473 insertions(+), 41 deletions(-)
 create mode 100644 ovsdb/relay.c
 create mode 100644 ovsdb/relay.h

diff --git a/ovsdb/_server.ovsschema b/ovsdb/_server.ovsschema
index a867e5cbf..e3d9d893b 100644
--- a/ovsdb/_server.ovsschema
+++ b/ovsdb/_server.ovsschema
@@ -1,13 +1,14 @@
 {"name": "_Server",
- "version": "1.1.0",
- "cksum": "3236486585 698",
+ "version": "1.2.0",
+ "cksum": "3009684573 744",
  "tables": {
"Database": {
  "columns": {
"name": {"type": "string"},
"model": {
  "type": {"key": {"type": "string",
-  "enum": ["set", ["standalone", "clustered"]]}}},
+  "enum": ["set",
+ ["standalone", "clustered", "relay"]]}}},
"connected": {"type": "boolean"},
"leader": {"type": "boolean"},
"schema": {
diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml
index 70cd22db7..37297da73 100644
--- a/ovsdb/_server.xml
+++ b/ovsdb/_server.xml
@@ -60,12 +60,15 @@
 
 
   The storage model: standalone for a standalone or
-  active-backup database, clustered for a clustered database.
+  active-backup database, clustered for a clustered database,
+  relay for a relay database.
 
 
 
   The database schema, as a JSON string.  In the case of a clustered
-  database, this is empty until it finishes joining its cluster.
+  database, this is empty until it finishes joining its cluster.  In the
+  case of a relay database, this is empty until it connects to the relay
+  source.
 
 
 
@@ -85,20 +88,21 @@
 
   
 True if the database is the leader in its cluster.  For a standalone or
-active-backup database, this is always true.
+active-backup database, this is always true.  For a relay database,
+this is always false.
   
 
   
 The cluster ID for this database, which is the same for all of the
-servers that host this particular clustered database.  For a standalone
-or active-backup database, this is empty.
+servers that host this particular clustered database.  For a
+standalone, active-backup or relay database, this is empty.
   
 
   
 The server ID for this database, different for each server that hosts a
 particular clustered database.  A server that hosts more than one
 clustered database will have a different sid in each one.
-For a standalone or active-backup database, this is empty.
+For a standalone, active-backup or relay database, this is empty.
   
 
   
@@ -112,7 +116,7 @@
 
 
 
-  For a standalone or active-backup database, this is empty.
+  For a standalone, active-backup or relay database, this is empty.
 
   
 
diff --git a/ovsdb/automake.mk b/ovsdb/automake.mk
index 446d6c136..05c8ebbdf 100644
--- a/ovsdb/automake.mk
+++ b/ovsdb/automake.mk
@@ -34,6 +34,8 @@ ovsdb_libovsdb_la_SOURCES = \
ovsdb/rbac.h \
ovsdb/replication.c \
ovsdb/replication.h \
+   ovsdb/relay.c \
+   ovsdb/relay.h \
ovsdb/row.c \
ovsdb/row.h \
ovsdb/server.c \
diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index f6150e944..dd2569055 

[ovs-dev] [PATCH v3 4/9] ovsdb: row: Add support for xor-based row updates.

2021-07-14 Thread Ilya Maximets
This will be used to apply update3-type updates to ovsdb tables
while processing updates for the future ovsdb 'relay' service model.

'ovsdb_datum_apply_diff' is allowed to fail, so add support for
returning this error.
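For intuition about what "xor" means here: an update3-style row update carries a diff instead of a full new value, and applying the diff toggles membership for the listed elements, so applying the same diff twice restores the original value.  A toy sketch of that semantics on a plain integer set (illustrative only, not the ovsdb datum code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toggle-style "apply diff" on a tiny integer set, mimicking the
 * semantics of update3/xor row updates: every element listed in the
 * diff is removed from the set if present, or added if absent. */
#define MAX_ELEMS 16

struct int_set {
    int elems[MAX_ELEMS];
    size_t n;
};

static bool
set_contains(const struct int_set *s, int x)
{
    for (size_t i = 0; i < s->n; i++) {
        if (s->elems[i] == x) {
            return true;
        }
    }
    return false;
}

static void
set_toggle(struct int_set *s, int x)
{
    for (size_t i = 0; i < s->n; i++) {
        if (s->elems[i] == x) {
            s->elems[i] = s->elems[--s->n];   /* Remove (swap with last). */
            return;
        }
    }
    s->elems[s->n++] = x;                     /* Add. */
}

static void
set_apply_diff(struct int_set *dst, const struct int_set *diff)
{
    for (size_t i = 0; i < diff->n; i++) {
        set_toggle(dst, diff->elems[i]);
    }
}
```

Applying `{2, 4}` to `{1, 2, 3}` yields `{1, 3, 4}`; applying it again restores `{1, 2, 3}`.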

Acked-by: Dumitru Ceara 
Signed-off-by: Ilya Maximets 
---
 ovsdb/execution.c   |  5 +++--
 ovsdb/replication.c |  2 +-
 ovsdb/row.c | 30 +-
 ovsdb/row.h |  6 --
 ovsdb/table.c   |  9 +
 ovsdb/table.h   |  2 +-
 6 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/ovsdb/execution.c b/ovsdb/execution.c
index 3a0dad5d0..f6150e944 100644
--- a/ovsdb/execution.c
+++ b/ovsdb/execution.c
@@ -483,8 +483,9 @@ update_row_cb(const struct ovsdb_row *row, void *ur_)
 
 ur->n_matches++;
 if (!ovsdb_row_equal_columns(row, ur->row, ur->columns)) {
-ovsdb_row_update_columns(ovsdb_txn_row_modify(ur->txn, row),
- ur->row, ur->columns);
+ovsdb_error_assert(ovsdb_row_update_columns(
+   ovsdb_txn_row_modify(ur->txn, row),
+   ur->row, ur->columns, false));
 }
 
 return true;
diff --git a/ovsdb/replication.c b/ovsdb/replication.c
index b755976b0..d8b56d813 100644
--- a/ovsdb/replication.c
+++ b/ovsdb/replication.c
@@ -677,7 +677,7 @@ process_table_update(struct json *table_update, const char *table_name,
 struct ovsdb_error *error;
 error = (!new ? ovsdb_table_execute_delete(txn, &uuid, table)
  : !old ? ovsdb_table_execute_insert(txn, &uuid, table, new)
- : ovsdb_table_execute_update(txn, &uuid, table, new));
+ : ovsdb_table_execute_update(txn, &uuid, table, new, false));
 if (error) {
 if (!strcmp(ovsdb_error_get_tag(error), "consistency violation")) {
 ovsdb_error_assert(error);
diff --git a/ovsdb/row.c b/ovsdb/row.c
index 755ab91a8..65a054621 100644
--- a/ovsdb/row.c
+++ b/ovsdb/row.c
@@ -163,20 +163,40 @@ ovsdb_row_equal_columns(const struct ovsdb_row *a,
 return true;
 }
 
-void
+struct ovsdb_error *
 ovsdb_row_update_columns(struct ovsdb_row *dst,
  const struct ovsdb_row *src,
- const struct ovsdb_column_set *columns)
+ const struct ovsdb_column_set *columns,
+ bool xor)
 {
 size_t i;
 
 for (i = 0; i < columns->n_columns; i++) {
 const struct ovsdb_column *column = columns->columns[i];
+struct ovsdb_datum xor_datum;
+struct ovsdb_error *error;
+
+if (xor) {
+error = ovsdb_datum_apply_diff(&xor_datum,
+   &dst->fields[column->index],
+   &src->fields[column->index],
+   &column->type);
+if (error) {
+return error;
+}
+}
+
 ovsdb_datum_destroy(&dst->fields[column->index], &column->type);
-ovsdb_datum_clone(&dst->fields[column->index],
-  &src->fields[column->index],
-  &column->type);
+
+if (xor) {
+ovsdb_datum_swap(&dst->fields[column->index], &xor_datum);
+} else {
+ovsdb_datum_clone(&dst->fields[column->index],
+  &src->fields[column->index],
+  &column->type);
+}
 }
+return NULL;
 }
 
 /* Appends the string form of the value in 'row' of each of the columns in
diff --git a/ovsdb/row.h b/ovsdb/row.h
index 2c441b5a4..394ac8eb4 100644
--- a/ovsdb/row.h
+++ b/ovsdb/row.h
@@ -82,8 +82,10 @@ bool ovsdb_row_equal_columns(const struct ovsdb_row *,
 int ovsdb_row_compare_columns_3way(const struct ovsdb_row *,
const struct ovsdb_row *,
const struct ovsdb_column_set *);
-void ovsdb_row_update_columns(struct ovsdb_row *, const struct ovsdb_row *,
-  const struct ovsdb_column_set *);
+struct ovsdb_error *ovsdb_row_update_columns(struct ovsdb_row *,
+ const struct ovsdb_row *,
+ const struct ovsdb_column_set *,
+ bool xor);
 void ovsdb_row_columns_to_string(const struct ovsdb_row *,
  const struct ovsdb_column_set *, struct ds *);
 struct ovsdb_error *ovsdb_row_from_json(struct ovsdb_row *,
diff --git a/ovsdb/table.c b/ovsdb/table.c
index 2935bd897..455a3663f 100644
--- a/ovsdb/table.c
+++ b/ovsdb/table.c
@@ -384,7 +384,8 @@ ovsdb_table_execute_delete(struct ovsdb_txn *txn, const struct uuid *row_uuid,
 
 struct ovsdb_error *
 ovsdb_table_execute_update(struct ovsdb_txn *txn, const struct uuid *row_uuid,
-   struct ovsdb_table *table, struct json *json_row)
+   struct ovsdb_table *table, struct json *json_row,
+ 

[ovs-dev] [PATCH v3 3/9] ovsdb: table: Expose functions to execute operations on ovsdb tables.

2021-07-14 Thread Ilya Maximets
These functions will be used later for the ovsdb 'relay' service
model, so move them to common code.

Warnings are translated to ovsdb errors; the caller in replication.c
only printed inconsistency warnings and mostly ignored them.  The
same logic is implemented by checking the error tag.

Also, ovsdb_execute_insert() previously printed an incorrect warning
about a duplicate row when the problem was actually a syntax error in
the JSON.  Fix that by actually checking for a duplicate and
reporting the correct ovsdb error.
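The insert fix boils down to: look the row up by its UUID first and report a specific "duplicate uuid" error rather than a misleading one.  A toy sketch of that check (illustrative only, using plain strings rather than the ovsdb API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Schematic insert-with-duplicate-check: the real code looks the row
 * up in the table before inserting and reports a "duplicate uuid"
 * error; this toy version keys rows by plain strings. */
#define MAX_ROWS 8

struct toy_table {
    const char *uuids[MAX_ROWS];
    size_t n;
};

/* Returns NULL on success, or a static error tag on failure. */
static const char *
toy_insert(struct toy_table *t, const char *uuid)
{
    for (size_t i = 0; i < t->n; i++) {
        if (!strcmp(t->uuids[i], uuid)) {
            return "duplicate uuid";      /* Correct, specific error. */
        }
    }
    if (t->n == MAX_ROWS) {
        return "table full";
    }
    t->uuids[t->n++] = uuid;
    return NULL;
}
```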

Acked-by: Mark D. Gray 
Acked-by: Dumitru Ceara 
Signed-off-by: Ilya Maximets 
---
 ovsdb/replication.c | 83 -
 ovsdb/table.c   | 69 +
 ovsdb/table.h   | 14 
 3 files changed, 90 insertions(+), 76 deletions(-)

diff --git a/ovsdb/replication.c b/ovsdb/replication.c
index bb1bd4250..b755976b0 100644
--- a/ovsdb/replication.c
+++ b/ovsdb/replication.c
@@ -54,18 +54,6 @@ static struct ovsdb_error *process_table_update(struct json *table_update,
 const char *table_name,
 struct ovsdb *database,
 struct ovsdb_txn *txn);
-
-static struct ovsdb_error *execute_insert(struct ovsdb_txn *txn,
-  const struct uuid *row_uuid,
-  struct ovsdb_table *table,
-  struct json *new);
-static struct ovsdb_error *execute_delete(struct ovsdb_txn *txn,
-  const struct uuid *row_uuid,
-  struct ovsdb_table *table);
-static struct ovsdb_error *execute_update(struct ovsdb_txn *txn,
-  const struct uuid *row_uuid,
-  struct ovsdb_table *table,
-  struct json *new);
 
 /* Maps from db name to sset of table names. */
 static struct shash excluded_tables = SHASH_INITIALIZER(&excluded_tables);
@@ -687,77 +675,20 @@ process_table_update(struct json *table_update, const char *table_name,
 new = shash_find_data(json_object(row_update), "new");
 
 struct ovsdb_error *error;
-error = (!new ? execute_delete(txn, &uuid, table)
- : !old ? execute_insert(txn, &uuid, table, new)
- : execute_update(txn, &uuid, table, new));
+error = (!new ? ovsdb_table_execute_delete(txn, &uuid, table)
+ : !old ? ovsdb_table_execute_insert(txn, &uuid, table, new)
+ : ovsdb_table_execute_update(txn, &uuid, table, new));
 if (error) {
+if (!strcmp(ovsdb_error_get_tag(error), "consistency violation")) {
+ovsdb_error_assert(error);
+error = NULL;
+}
 return error;
 }
 }
 return NULL;
 }
 
-static struct ovsdb_error *
-execute_insert(struct ovsdb_txn *txn, const struct uuid *row_uuid,
-   struct ovsdb_table *table, struct json *json_row)
-{
-struct ovsdb_row *row = ovsdb_row_create(table);
-struct ovsdb_error *error = ovsdb_row_from_json(row, json_row, NULL, NULL);
-if (!error) {
-*ovsdb_row_get_uuid_rw(row) = *row_uuid;
-ovsdb_txn_row_insert(txn, row);
-} else {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
-VLOG_WARN_RL(&rl, "cannot add existing row "UUID_FMT" to table %s",
- UUID_ARGS(row_uuid), table->schema->name);
-ovsdb_row_destroy(row);
-}
-
-return error;
-}
-
-static struct ovsdb_error *
-execute_delete(struct ovsdb_txn *txn, const struct uuid *row_uuid,
-   struct ovsdb_table *table)
-{
-const struct ovsdb_row *row = ovsdb_table_get_row(table, row_uuid);
-if (row) {
-ovsdb_txn_row_delete(txn, row);
-} else {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
-VLOG_WARN_RL(&rl, "cannot delete missing row "UUID_FMT" from table %s",
- UUID_ARGS(row_uuid), table->schema->name);
-}
-return NULL;
-}
-
-static struct ovsdb_error *
-execute_update(struct ovsdb_txn *txn, const struct uuid *row_uuid,
-   struct ovsdb_table *table, struct json *json_row)
-{
-const struct ovsdb_row *row = ovsdb_table_get_row(table, row_uuid);
-if (!row) {
-static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
-VLOG_WARN_RL(&rl, "cannot modify missing row "UUID_FMT" in table %s",
- UUID_ARGS(row_uuid), table->schema->name);
-return NULL;
-}
-
-struct ovsdb_column_set columns = OVSDB_COLUMN_SET_INITIALIZER;
-struct ovsdb_row *update = ovsdb_row_create(table);
-struct ovsdb_error *error = ovsdb_row_from_json(update, json_row,
-NULL, &columns);
-
-if (!error && 

[ovs-dev] [PATCH v3 2/9] ovsdb: storage: Allow setting the name for the unbacked storage.

2021-07-14 Thread Ilya Maximets
ovsdb_create() requires the schema or storage to be nonnull, but in
practice it requires a schema name or a storage name to use as the
database name.  Only clustered storage has a name.  This means that
only a clustered database can be created without a schema.  Change
that by allowing unbacked storage to have a name.  This way we can
create a database with unbacked storage and without a schema.  This
will be used in the next commits to create the database for the
ovsdb 'relay' service model.
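The resulting name lookup (see ovsdb_storage_get_name() in the diff below) is a simple fallback chain; schematically:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Schematic version of the name resolution after this change: an
 * explicitly named unbacked storage wins, then a clustered (raft)
 * storage's name, otherwise no name at all. */
struct toy_storage {
    const char *unbacked_name;  /* Nonnull for named unbacked storage. */
    const char *raft_name;      /* Nonnull for clustered storage. */
};

static const char *
toy_storage_get_name(const struct toy_storage *s)
{
    return s->unbacked_name ? s->unbacked_name
           : s->raft_name   ? s->raft_name
           : NULL;
}
```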

Acked-by: Dumitru Ceara 
Signed-off-by: Ilya Maximets 
---
 ovsdb/file.c |  2 +-
 ovsdb/ovsdb-server.c |  2 +-
 ovsdb/storage.c  | 13 ++---
 ovsdb/storage.h  |  2 +-
 tests/test-ovsdb.c   |  6 +++---
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/ovsdb/file.c b/ovsdb/file.c
index 0b8bdfe37..59220824f 100644
--- a/ovsdb/file.c
+++ b/ovsdb/file.c
@@ -318,7 +318,7 @@ ovsdb_convert(const struct ovsdb *src, const struct ovsdb_schema *new_schema,
   struct ovsdb **dstp)
 {
 struct ovsdb *dst = ovsdb_create(ovsdb_schema_clone(new_schema),
- ovsdb_storage_create_unbacked());
+ ovsdb_storage_create_unbacked(NULL));
 struct ovsdb_txn *txn = ovsdb_txn_create(dst);
 struct ovsdb_error *error = NULL;
 
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index b09232c65..23bd226a3 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -738,7 +738,7 @@ add_server_db(struct server_config *config)
 /* We don't need txn_history for server_db. */
 
 db->filename = xstrdup("<internal>");
-db->db = ovsdb_create(schema, ovsdb_storage_create_unbacked());
+db->db = ovsdb_create(schema, ovsdb_storage_create_unbacked(NULL));
 bool ok OVS_UNUSED = ovsdb_jsonrpc_server_add_db(config->jsonrpc, db->db);
 ovs_assert(ok);
 add_db(config, db);
diff --git a/ovsdb/storage.c b/ovsdb/storage.c
index 40415fcf6..d727b1eac 100644
--- a/ovsdb/storage.c
+++ b/ovsdb/storage.c
@@ -45,6 +45,8 @@ struct ovsdb_storage {
 struct ovsdb_log *log;
 struct raft *raft;
 
+char *unbacked_name; /* Name of the unbacked storage. */
+
 /* All kinds of storage. */
 struct ovsdb_error *error;  /* If nonnull, a permanent error. */
 long long next_snapshot_min; /* Earliest time to take next snapshot. */
@@ -121,12 +123,14 @@ ovsdb_storage_open_standalone(const char *filename, bool rw)
 }
 
 /* Creates and returns new storage without any backing.  Nothing will be read
- * from the storage, and writes are discarded. */
+ * from the storage, and writes are discarded.  If 'name' is nonnull, it will
+ * be used as a storage name. */
 struct ovsdb_storage *
-ovsdb_storage_create_unbacked(void)
+ovsdb_storage_create_unbacked(const char *name)
 {
 struct ovsdb_storage *storage = xzalloc(sizeof *storage);
 schedule_next_snapshot(storage, false);
+storage->unbacked_name = nullable_xstrdup(name);
 return storage;
 }
 
@@ -137,6 +141,7 @@ ovsdb_storage_close(struct ovsdb_storage *storage)
 ovsdb_log_close(storage->log);
 raft_close(storage->raft);
 ovsdb_error_destroy(storage->error);
+free(storage->unbacked_name);
 free(storage);
 }
 }
@@ -230,7 +235,9 @@ ovsdb_storage_wait(struct ovsdb_storage *storage)
 const char *
 ovsdb_storage_get_name(const struct ovsdb_storage *storage)
 {
-return storage->raft ? raft_get_name(storage->raft) : NULL;
+return storage->unbacked_name ? storage->unbacked_name
+   : storage->raft ? raft_get_name(storage->raft)
+   : NULL;
 }
 
 /* Attempts to read a log record from 'storage'.
diff --git a/ovsdb/storage.h b/ovsdb/storage.h
index 02b6e7e6c..e120094d7 100644
--- a/ovsdb/storage.h
+++ b/ovsdb/storage.h
@@ -29,7 +29,7 @@ struct uuid;
 struct ovsdb_error *ovsdb_storage_open(const char *filename, bool rw,
struct ovsdb_storage **)
 OVS_WARN_UNUSED_RESULT;
-struct ovsdb_storage *ovsdb_storage_create_unbacked(void);
+struct ovsdb_storage *ovsdb_storage_create_unbacked(const char *name);
 void ovsdb_storage_close(struct ovsdb_storage *);
 
 const char *ovsdb_storage_get_model(const struct ovsdb_storage *);
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index a886f971e..fb6b3acca 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -1485,7 +1485,7 @@ do_execute__(struct ovs_cmdl_context *ctx, bool ro)
 json = parse_json(ctx->argv[1]);
 check_ovsdb_error(ovsdb_schema_from_json(json, ));
 json_destroy(json);
-db = ovsdb_create(schema, ovsdb_storage_create_unbacked());
+db = ovsdb_create(schema, ovsdb_storage_create_unbacked(NULL));
 
 for (i = 2; i < ctx->argc; i++) {
 struct json *params, *result;
@@ -1551,7 +1551,7 @@ do_trigger(struct ovs_cmdl_context *ctx)
 json = parse_json(ctx->argv[1]);
 check_ovsdb_error(ovsdb_schema_from_json(json, ));
 json_destroy(json);
-db = ovsdb_create(schema, 

[ovs-dev] [PATCH v3 1/9] jsonrpc-server: Wake up jsonrpc session if there are completed triggers.

2021-07-14 Thread Ilya Maximets
If there are completed triggers, the jsonrpc server should wake up
and update clients with the new data, but there is no such condition
in ovsdb_jsonrpc_session_wait().  For some reason this doesn't result
in any processing delays in the current code, probably because there
are always some other types of events in this case that could wake
the ovsdb server up.  But it will become a problem in the upcoming
ovsdb 'relay' service model because triggers could be completed from
a different place, i.e. after receiving a transaction reply from the
relay source.

Fix that by waking up ovsdb-server in case there are completed
triggers that need to be handled.
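The shape of the fix is the usual poll-loop rule: a wait() function must request an immediate wakeup whenever work is already queued, or the process can sleep on completed work.  A distilled sketch of that decision (hypothetical names, not the ovs poll-loop API):

```c
#include <assert.h>
#include <stdbool.h>

/* Distilled version of the wait-side decision in
 * ovsdb_jsonrpc_session_wait(): if any already-completed work is
 * pending (monitor updates to flush or completed triggers), the
 * session must ask for an immediate wakeup instead of sleeping
 * until new I/O arrives. */
struct toy_session {
    bool backlog;             /* Output backlog still being sent. */
    bool monitor_needs_flush; /* Monitor updates ready for the client. */
    int n_completed_triggers; /* Completed but unhandled triggers. */
};

/* Returns true if the event loop should wake immediately. */
static bool
session_should_wake(const struct toy_session *s)
{
    if (s->backlog) {
        return false;  /* Still draining output; wait for send space. */
    }
    return s->monitor_needs_flush || s->n_completed_triggers > 0;
}
```

The patch adds exactly the `n_completed_triggers > 0` half of this condition to the real wait path.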

Acked-by: Mark D. Gray 
Acked-by: Dumitru Ceara 
Signed-off-by: Ilya Maximets 
---
 ovsdb/jsonrpc-server.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
index 4e2dfc3d7..351c39d8a 100644
--- a/ovsdb/jsonrpc-server.c
+++ b/ovsdb/jsonrpc-server.c
@@ -600,7 +600,8 @@ ovsdb_jsonrpc_session_wait(struct ovsdb_jsonrpc_session *s)
 {
 jsonrpc_session_wait(s->js);
 if (!jsonrpc_session_get_backlog(s->js)) {
-if (ovsdb_jsonrpc_monitor_needs_flush(s)) {
+if (ovsdb_jsonrpc_monitor_needs_flush(s)
+|| !ovs_list_is_empty(&s->up.completions)) {
 poll_immediate_wake();
 } else {
 jsonrpc_session_recv_wait(s->js);
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

2021-07-14 Thread Ilya Maximets
Replication can be used to scale out read-only access to the database.
But there are clients that are not read-only, but read-mostly.
One of the main examples is ovn-controller that mostly monitors
updates from the Southbound DB, but needs to claim ports by sending
transactions that changes some database tables.

Southbound database serves lots of connections: all connections
from ovn-controllers and some service connections from cloud
infrastructure, e.g. some OpenStack agents are monitoring updates.
At a high scale and with a big database, ovsdb-server spends too
much time processing monitor updates, and it's required to move this
load somewhere else.  This patch set aims to introduce the
functionality required to scale out read-mostly connections by
introducing a new OVSDB 'relay' service model.

In this new service model ovsdb-server connects to an existing OVSDB
server and maintains an in-memory copy of the database.  It serves
read-only transactions and monitor requests on its own, but forwards
write transactions to the relay source.

Key differences from the active-backup replication:
- support for "write" transactions.
- no on-disk storage. (probably faster operation)
- support for multiple remotes (connect to the clustered db).
- doesn't try to keep the connection alive as long as possible, but
  reconnects to other remotes faster to avoid missing updates.
- No need to know the complete database schema beforehand,
  only the schema name.
- can be used along with other standalone and clustered databases
  by the same ovsdb-server process. (doesn't turn the whole
  jsonrpc server to read-only mode)
- supports the modern version of monitors (monitor_cond_since),
  because it is based on ovsdb-cs.
- could be chained, i.e. multiple relays could be connected
  one to another in a row or in a tree-like form.

Bringing all above functionality to the existing active-backup
replication doesn't look right as it will make it less reliable
for the actual backup use case, and this also would be much
harder from the implementation point of view, because current
replication code is not based on ovsdb-cs or idl and all the required
features would be likely duplicated or replication would be fully
re-written on top of ovsdb-cs with severe modifications of the former.

Relay is somewhere in the middle between active-backup replication and
the clustered model taking a lot from both, therefore is hard to
implement on top of any of them.

To run ovsdb-server in relay mode, a user simply needs to run:

  ovsdb-server --remote=punix:db.sock relay:<DB_NAME>:<relay source>

e.g.

  ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642

More details and examples in the documentation in the last patch
of the series.
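To illustrate the chaining mentioned above, a second relay can simply use the first one as its relay source.  A sketch with made-up socket paths and addresses (the comma-separated remote list for a clustered source assumes the standard OVSDB remote syntax):

```shell
# First-tier relay: mirrors OVN_Southbound from the clustered source
# (hypothetical addresses) and serves clients on a local unix socket.
ovsdb-server --remote=punix:/var/run/ovn/sb_relay.sock \
    relay:OVN_Southbound:tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642

# Second-tier relay: uses the first relay as its own relay source.
ovsdb-server --remote=punix:/var/run/ovn/sb_relay2.sock \
    relay:OVN_Southbound:unix:/var/run/ovn/sb_relay.sock
```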

I actually tried to implement transaction forwarding on top of
active-backup replication in v1 of this series, but it required
a lot of tricky changes, including schema format changes, in order
to bring the required information to the end clients, so I decided
to fully rewrite the functionality in v2 with a different approach.


 Testing
 ===

Some scale tests were performed with OVSDB Relays that mimics OVN
workloads with ovn-kubernetes.
Tests performed with ovn-heater (https://github.com/dceara/ovn-heater)
on scenario ocp-120-density-heavy:
 
https://github.com/dceara/ovn-heater/blob/master/test-scenarios/ocp-120-density-heavy.yml
In short, the test gradually creates a lot of OVN resources and
checks that the network is configured correctly (by pinging different
namespaces).  The test includes 120 chassis (created by
ovn-fake-multinode), 31250 LSPs spread evenly across 120 LSes, 3 LBs
with 15625 VIPs each, attached to all node LSes, etc.  Test performed
with monitor-all=true.

Note 1:
 - Memory consumption is checked at the end of a test in a following
   way: 1) check RSS 2) compact database 3) check RSS again.
   It's observed that ovn-controllers in this test are fairly slow
   and backlog builds up on monitors, because ovn-controllers are
   not able to receive updates fast enough.  This contributes to
   the RSS of the process, especially in combination with a glibc
   bug (glibc doesn't free fastbins back to the system).  Memory
   trimming on compaction is enabled in the test, so after compaction
   we can see a more or less real value of the RSS at the end of the
   test without backlog noise.  (Compaction on relay in this case is
   just plain malloc_trim()).
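As a reference point for the malloc_trim() remark above: glibc's malloc_trim(3) releases free heap memory back to the kernel.  A minimal glibc-specific sketch (hypothetical helper name, not code from this series):

```c
#include <malloc.h>   /* malloc_trim() is a glibc extension. */
#include <stdlib.h>

/* Allocate and free a burst of memory, then ask glibc to return any
 * releasable free heap pages to the kernel.  For a database with no
 * on-disk storage, "compaction" amounts to exactly this trim.
 * Returns malloc_trim()'s result: 1 if memory was released to the
 * system, 0 if not, or -1 if the allocations failed. */
static int
alloc_free_and_trim(void)
{
    enum { N = 64, SZ = 1 << 20 };
    void *blocks[N];

    for (int i = 0; i < N; i++) {
        blocks[i] = malloc(SZ);
        if (!blocks[i]) {
            return -1;
        }
    }
    for (int i = 0; i < N; i++) {
        free(blocks[i]);
    }
    return malloc_trim(0);
}
```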

Note 2:
 - I didn't collect memory consumption (RSS) after compaction for the
   test with 10 relays, because I got the idea only after that test
   finished and another one had already started, and a run takes a
   significant amount of time.  So, values marked with a star (*)
   are an approximation based on results from other tests, and hence
   might not be fully correct.

Note 3:
 - 'Max. poll' is a maximum of the 'long poll intervals' logged by
   ovsdb-server during the test.  Poll intervals that involved database
   compaction (huge disk writes) are the same in all tests and excluded
   

Re: [ovs-dev] [PATCH ovn v5 2/2] tests: Add check-perf target

2021-07-14 Thread Dumitru Ceara
On 6/30/21 3:15 PM, Mark Gray wrote:
> Add a suite of micro-benchmarks to aid a developer in understanding the
> performance impact of any changes that they are making. They can be used to
> help to understand the relative performance between two test runs on the same
> test machine, but are not intended to give the absolute performance of OVN.
> 
> To invoke the performance testsuite, run:
> 
> $ make check-perf
> 
> This will run all available performance tests.
> 
> Additional metrics (e.g. memory, coverage, perf counters) may be added
> in the future. Additional tests (e.g. additional topologies,  ovn-controller
> tests) may be added in the future.
> 
> Signed-off-by: Mark Gray 
> ---
> 
> Notes:
> v2:  create results directory to fix build error
> v3:  forgot to commit, create results directory to fix build error
> v4:  fix 0-day issues
>  remove `sudo` in Makefile
>  updated documentation
> 
>  Documentation/topics/testing.rst |  50 
>  tests/.gitignore |   3 +
>  tests/automake.mk|  27 
>  tests/perf-northd.at | 207 +++
>  tests/perf-testsuite.at  |  26 
>  5 files changed, 313 insertions(+)
>  create mode 100644 tests/perf-northd.at
>  create mode 100644 tests/perf-testsuite.at
> 
> diff --git a/Documentation/topics/testing.rst 
> b/Documentation/topics/testing.rst
> index be9e7c57331c..db265344a507 100644
> --- a/Documentation/topics/testing.rst
> +++ b/Documentation/topics/testing.rst
> @@ -256,3 +256,53 @@ the following::
>  All the features documented under `Unit Tests`_ are available for the
>  datapath testsuites, except that the datapath testsuites do not
>  support running tests in parallel.
> +
> +Performance testing
> +~~~
> +
> +OVN includes a suite of micro-benchmarks to aid a developer in understanding
> +the performance impact of any changes that they are making. They can be used 
> to
> +help to understand the relative performance between two test runs on the same
> +test machine, but are not intended to give the absolute performance of OVN.
> +
> +To invoke the performance testsuite, run::
> +
> +$ make check-perf
> +
> +This will run all available performance tests. Some of these tests may be
> +long-running as they need to build complex logical network topologies. In 
> order
> +to speed up subsequent test runs, some objects (e.g. the Northbound DB) may 
> be
> +cached. In order to force the tests to rebuild all these objects, run::
> +
> +$ make check-perf TESTSUITEFLAGS="--rebuild"
> +
> +A typical workflow for a developer trying to improve the performance of OVN
> +would be the following:
> +
> +0. Optional: Modify/add a performance test to build the topology that you are
> +   benchmarking, if required.
> +1. Run ``make check-perf TESTSUITEFLAGS="--rebuild"`` to generate cached
> +   databases (and complete a test run). The results of each test run are
> +   displayed on the screen at the end of the test run but are also saved in 
> the
> +   file ``tests/perf-testsuite.dir/results``.
> +
> +.. note::
> +   This step may take some time depending on the number of tests that are 
> being
> +   rebuilt, the complexity of the tests and the performance of the test
> +   machine. If you are only using one test, you can specify the test to run 
> by
> +   adding the test number to the ``make`` command.
> +   (e.g. ``make check-perf TESTSUITEFLAGS="--rebuild "``)
> +
> +2. Run ``make check-perf`` to measure the performance metric that you are
> +   benchmarking against. If you are only using one test, you can specify the
> +   test to run by adding the test number to the ``make`` command.
> +   (e.g. ``make check-perf TESTSUITEFLAGS="--rebuild "``)
> +3. Modify OVN code to implement the change that you believe will improve the
> +   performance.
> +4. Go to Step 2. to continue making improvements.
> +
> +If, as a developer, you modify a performance test in a way that may change 
> one
> +of these cached objects, be sure to rebuild the test.
> +
> +The cached objects are stored under the relevant folder in
> +``tests/perf-testsuite.dir/cached``.
> diff --git a/tests/.gitignore b/tests/.gitignore
> index 8479f9bb0f8f..65cb1c6e4fad 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -22,6 +22,9 @@
>  /system-offloads-testsuite
>  /system-offloads-testsuite.dir/
>  /system-offloads-testsuite.log
> +/perf-testsuite
> +/perf-testsuite.dir/
> +/perf-testsuite.log
>  /test-aes128
>  /test-atomic
>  /test-bundle
> diff --git a/tests/automake.mk b/tests/automake.mk
> index a8ec64212791..5b890d644eeb 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -4,9 +4,11 @@ EXTRA_DIST += \
>   $(SYSTEM_TESTSUITE_AT) \
>   $(SYSTEM_KMOD_TESTSUITE_AT) \
>   $(SYSTEM_USERSPACE_TESTSUITE_AT) \
> + $(PERF_TESTSUITE_AT) \
>   $(TESTSUITE) \
>   $(SYSTEM_KMOD_TESTSUITE) \
>   $(SYSTEM_USERSPACE_TESTSUITE) \
> +   

Re: [ovs-dev] [v11 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Eelco Chaudron



On 14 Jul 2021, at 14:25, Van Haaren, Harry wrote:

>> -Original Message-
>> From: Eelco Chaudron 
>> Sent: Wednesday, July 14, 2021 12:57 PM
>> To: Amber, Kumar 
>> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van 
>> Haaren,
>> Harry ; Ferriter, Cian ;
>> Stokes, Ian 
>> Subject: Re: [v11 06/11] dpif-netdev: Add packet count and core id paramters 
>> for
>> study
>>
>>
>>
>> On 14 Jul 2021, at 13:33, Amber, Kumar wrote:
>>
>>> Hi Eelco,
>>>
 -Original Message-
 From: Eelco Chaudron 
 Sent: Wednesday, July 14, 2021 4:21 PM
 To: Amber, Kumar 
 Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
 Haaren, Harry ; Ferriter, Cian
 ; Stokes, Ian 
 Subject: Re: [v11 06/11] dpif-netdev: Add packet count and core id 
 paramters
 for study



 On 14 Jul 2021, at 12:30, Eelco Chaudron wrote:

> On 14 Jul 2021, at 4:02, kumar Amber wrote:
>
>> From: Kumar Amber 
>>
>> This commit introduces additional command line paramter for mfex
>> study function. If user provides additional packet out it is used in
>> study to compare minimum packets which must be processed else a
>> default value is choosen.
>> Also introduces a third paramter for choosing a particular pmd core.
>>
>> $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
>>
>> Signed-off-by: Kumar Amber 

 One additional comment, please add some (negative) test cases for the
 command line options, so we know your changes work. Rather than me having
 to do this manually every revision.

>>>
>>> Yes, we did think about that, but we cannot, as whenever a command fails,
>>> ovs flags it as an error in the server, which automatically fails the test.
>>>
>>> Ex:
 2021-07-14T11:16:30.194Z|00082|unixctl|DBG|received request dpif-
>> netdev/miniflow-parser-set["-pmd","0","scalar"], id=0
 2021-07-14T11:16:30.194Z|00083|dpif_netdev|ERR|Error: Miniflow parser not
>> changed, PMD thread 0 not in use, pass a valid pmd thread ID.
 2021-07-14T11:16:30.194Z|00084|unixctl|DBG|replying with error, id=0: 
 "Error:
>> Miniflow parser not changed, PMD thread 0 not in use, pass a valid pmd 
>> thread ID.
 "
>>> 8. system-dpdk.at:291: 8. OVS-DPDK - MFEX Commands (system-dpdk.at:291):
>> FAILED (system-dpdk.at:308)
>>>
>>> And Hence we cannot add the command test-case.
>>
>> Not sure how you added it, but here is a quick try that works for me:
>>
>> [wsfd-netdev64:~/..._v20.11.1/ovs_github]$ git diff
>> diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
>> index 96072e646..5f039dec5 100644
>> --- a/tests/system-dpdk.at
>> +++ b/tests/system-dpdk.at
>> @@ -285,3 +285,26 @@ dnl Clean up
>>  AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
>>  AT_CLEANUP
>>  dnl 
>> --
>> +
>> +dnl 
>> --
>> +AT_SETUP([OVS-DPDK - MFEX Configuration])
>> +AT_KEYWORDS([dpdk])
>> +OVS_DPDK_START()
>> +
>> +dnl Add userspace bridge and attach it to OVS
>> +AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
>> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk 
>> options:dpdk-
>> devargs=net_pcap1,rx_pcap=$srcdir/pcap/mfex_test.pcap,infinite_rx=1], [],
>> [stdout], [stderr])
>> +AT_CHECK([ovs-vsctl show], [], [stdout])
>> +
>> +
>> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set scalar 1], [2], [], 
>> [dnl
>> +The study_pkt_cnt option is not valid for the scalar implementation.
>> +ovs-appctl: ovs-vswitchd: server returned an error
>> +])
>> +
>> +dnl Clean up
>> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
>> +AT_CLEANUP
>> +dnl 
>> --
>
> Hi All,
>
> I think the above point was that if the ovs-vswitchd does a VLOG_ERR, (as it 
> does when
> bad arguments like the above test are hit), then the unit-test automatically 
> fails.
> It's not about adding the test code, but about how the unit test infra handles 
> VLOG_ERR output.

The above test ran fine in my environment.  Maybe other checks generate other
logs that cause problems.

> As a result, we cannot have unit-tests that actually hit errors it seems, as 
> they cause
> the unit-test to report a failure, instead of "negative testing" and passing 
> the unit test.


You can override this log checking to exclude specific errors; see check_logs()
and OVS_VSWITCHD_STOP.

> I'm not very familiar with the wider AT_* based unit testing, but that's my 
> understanding
> of the infrastructure?


>>> 
>>>
>>> Br
>>> Amber
>
> Regards, -Harry



Re: [ovs-dev] [PATCH ovn v5 1/2] ovn-northd: Add useful stopwatches

2021-07-14 Thread Dumitru Ceara
On 6/30/21 3:15 PM, Mark Gray wrote:
> For performance measurement, it is useful to understand the
> length of time required to complete a number of key code paths
> in ovn-northd.c. Add stopwatches to measure these timings.
> 
> Signed-off-by: Mark Gray 
> Acked-by: Dumitru Ceara 
> ---
> 
> Notes:
> v4:  Add common header file for stopwatch names
> v5:  Forgot to `git add` a new file. Added this file.
> 
>  lib/automake.mk   |  3 ++-
>  lib/stopwatch-names.h | 25 +
>  northd/ovn-northd-ddlog.c | 12 
>  northd/ovn-northd.c   | 17 +
>  4 files changed, 56 insertions(+), 1 deletion(-)
>  create mode 100644 lib/stopwatch-names.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 917b28e1edf7..f668b791bb81 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -29,7 +29,8 @@ lib_libovn_la_SOURCES = \
>   lib/inc-proc-eng.c \
>   lib/inc-proc-eng.h \
>   lib/lb.c \
> - lib/lb.h
> + lib/lb.h \
> + lib/stopwatch-names.h
>  nodist_lib_libovn_la_SOURCES = \
>   lib/ovn-dirs.c \
>   lib/ovn-nb-idl.c \
> diff --git a/lib/stopwatch-names.h b/lib/stopwatch-names.h
> new file mode 100644
> index ..06b20272e8cf
> --- /dev/null
> +++ b/lib/stopwatch-names.h
> @@ -0,0 +1,25 @@
> +/* Copyright (c) 2021 Red Hat, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +#ifndef STOPWATCH_NAMES_H
> +#define STOPWATCH_NAMES_H 1
> +
> +/* In order to not duplicate names for stopwatches between ddlog and 
> non-ddlog
> + * we define them in a common header file.
> + */
> +#define NORTHD_LOOP_STOPWATCH_NAME "ovn-northd-loop"
> +#define OVNNB_DB_RUN_STOPWATCH_NAME "ovnnb_db_run"
> +#define OVNSB_DB_RUN_STOPWATCH_NAME "ovnsb_db_run"
> +
> +#endif
> \ No newline at end of file

No newline at end of file.  With this addressed:

Acked-by: Dumitru Ceara 



[ovs-dev] [PATCH V2 2/2] dpif-netdev: Introduce netdev array cache

2021-07-14 Thread Eli Britstein
Port numbers are usually small. Maintain an array of netdev handles indexed
by port number. This accelerates looking them up for
netdev_hw_miss_packet_recover().
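
The idea can be sketched independently of the OVS data structures. All names
below are illustrative, not the patch's API: a bounded, directly-indexed array
serves as the fast path, with a fallback lookup (a hash map in the real code)
for port numbers that do not fit in the array.

```c
#include <assert.h>
#include <stddef.h>

#define NETDEV_CACHE_SIZE 1024

/* Stand-in for the opaque device handle; illustrative only. */
struct netdev { int id; };

/* Fast path: handles indexed directly by port number. */
static struct netdev *netdev_cache[NETDEV_CACHE_SIZE];

/* Hypothetical slow path, standing in for the hash-map lookup
 * (send_port_cache in the patch); not populated in this sketch. */
static struct netdev *
slow_port_lookup(unsigned int port_no)
{
    (void) port_no;
    return NULL;
}

static void
cache_port(unsigned int port_no, struct netdev *dev)
{
    if (port_no < NETDEV_CACHE_SIZE) {
        netdev_cache[port_no] = dev;
    }
}

static struct netdev *
lookup_port(unsigned int port_no)
{
    if (port_no < NETDEV_CACHE_SIZE) {
        return netdev_cache[port_no];   /* O(1): no hash, no probing. */
    }
    return slow_port_lookup(port_no);   /* Rare: port number too large. */
}
```

The trade-off is a fixed, mostly-empty array per thread in exchange for
removing a hash computation and bucket walk from the per-packet path.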

Reported-by: Cian Ferriter 
Signed-off-by: Eli Britstein 
Reviewed-by: Gaetan Rivet 
---
 lib/dpif-netdev-private-thread.h |  4 +++
 lib/dpif-netdev.c| 43 +---
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
index ba79c4a0a..52755fbae 100644
--- a/lib/dpif-netdev-private-thread.h
+++ b/lib/dpif-netdev-private-thread.h
@@ -50,6 +50,9 @@ struct dp_netdev_pmd_thread_ctx {
 bool smc_enable_db;
 };
 
+/* Size of netdev's cache. */
+#define DP_PMD_NETDEV_CACHE_SIZE 1024
+
 /* PMD: Poll modes drivers.  PMD accesses devices via polling to eliminate
  * the performance overhead of interrupt processing.  Therefore netdev can
  * not implement rx-wait for these devices.  dpif-netdev needs to poll
@@ -192,6 +195,7 @@ struct dp_netdev_pmd_thread {
  * other instance will only be accessed by its own pmd thread. */
 struct hmap tnl_port_cache;
 struct hmap send_port_cache;
+struct netdev *send_netdev_cache[DP_PMD_NETDEV_CACHE_SIZE];
 
 /* Keep track of detailed PMD performance statistics. */
 struct pmd_perf_stats perf_stats;
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1823bf565..50ea85d48 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -5540,6 +5540,12 @@ pmd_free_cached_ports(struct dp_netdev_pmd_thread *pmd)
 free(tx_port_cached);
 }
 HMAP_FOR_EACH_POP (tx_port_cached, node, &pmd->send_port_cache) {
+uint32_t port_no_ind;
+
+port_no_ind = odp_to_u32(tx_port_cached->port->port_no);
+if (port_no_ind < ARRAY_SIZE(pmd->send_netdev_cache)) {
+pmd->send_netdev_cache[port_no_ind] = NULL;
+}
 free(tx_port_cached);
 }
 }
@@ -5566,9 +5572,16 @@ pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
 }
 
 if (netdev_n_txq(tx_port->port->netdev)) {
+uint32_t port_no_ind;
+
 tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached);
 hmap_insert(&pmd->send_port_cache, &tx_port_cached->node,
 hash_port_no(tx_port_cached->port->port_no));
+port_no_ind = odp_to_u32(tx_port_cached->port->port_no);
+if (port_no_ind < ARRAY_SIZE(pmd->send_netdev_cache)) {
+pmd->send_netdev_cache[port_no_ind] =
+tx_port_cached->port->netdev;
+}
 }
 }
 }
@@ -6217,6 +6230,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, 
struct dp_netdev *dp,
 hmap_init(&pmd->tx_ports);
 hmap_init(&pmd->tnl_port_cache);
 hmap_init(&pmd->send_port_cache);
+memset(pmd->send_netdev_cache, 0, sizeof pmd->send_netdev_cache);
 cmap_init(&pmd->tx_bonds);
 
 /* Initialize DPIF function pointer to the default configured version. */
@@ -6241,6 +6255,7 @@ dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)
 struct dpcls *cls;
 
 dp_netdev_pmd_flow_flush(pmd);
+memset(pmd->send_netdev_cache, 0, sizeof pmd->send_netdev_cache);
 hmap_destroy(&pmd->send_port_cache);
 hmap_destroy(&pmd->tnl_port_cache);
 hmap_destroy(&pmd->tx_ports);
@@ -6754,20 +6769,40 @@ smc_lookup_single(struct dp_netdev_pmd_thread *pmd,
 static struct tx_port * pmd_send_port_cache_lookup(
 const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
 
+OVS_UNUSED
+static inline struct netdev *
+pmd_netdev_cache_lookup(const struct dp_netdev_pmd_thread *pmd,
+odp_port_t port_no)
+{
+uint32_t port_no_ind;
+struct tx_port *p;
+
+port_no_ind = odp_to_u32(port_no);
+if (port_no_ind < ARRAY_SIZE(pmd->send_netdev_cache)) {
+return pmd->send_netdev_cache[port_no_ind];
+}
+
+p = pmd_send_port_cache_lookup(pmd, port_no);
+if (p) {
+return p->port->netdev;
+}
+return NULL;
+}
+
 inline int
 dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
   odp_port_t port_no OVS_UNUSED,
   struct dp_packet *packet,
   struct dp_netdev_flow **flow)
 {
-struct tx_port *p OVS_UNUSED;
+struct netdev *netdev OVS_UNUSED;
 uint32_t mark;
 
 #ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
 /* Restore the packet if HW processing was terminated before completion. */
-p = pmd_send_port_cache_lookup(pmd, port_no);
-if (OVS_LIKELY(p)) {
-int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
+netdev = pmd_netdev_cache_lookup(pmd, port_no);
+if (OVS_LIKELY(netdev)) {
+int err = netdev_hw_miss_packet_recover(netdev, packet);
 
 if (err && err != EOPNOTSUPP) {
 COVERAGE_INC(datapath_drop_hw_miss_recover);
-- 
2.28.0.2311.g225365fb51


[ovs-dev] [PATCH V2 1/2] dpif-netdev: Do not execute packet recovery without experimental support

2021-07-14 Thread Eli Britstein
The rte_flow_get_restore_info() API is marked experimental. Using it
has a performance impact that can be avoided in non-experimental builds.

Do not call it without experimental support.
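
The guard pattern can be sketched generically. ALLOW_EXPERIMENTAL_API follows
DPDK's convention; recover_packet() is a hypothetical stand-in for the
experimental call (netdev_hw_miss_packet_recover() in the real code), not the
patch's API:

```c
#include <assert.h>

#ifdef ALLOW_EXPERIMENTAL_API
/* Hypothetical stand-in for the experimental recovery call; only
 * compiled in when the experimental API is explicitly enabled. */
static int
recover_packet(void)
{
    return 0;   /* 0: recovered, or nothing to do. */
}
#endif

/* In regular builds the experimental call (and its per-packet cost)
 * disappears entirely at compile time. */
static int
process_packet(void)
{
#ifdef ALLOW_EXPERIMENTAL_API
    if (recover_packet() != 0) {
        return -1;  /* Recovery failed: drop the packet. */
    }
#endif
    return 1;       /* Continue on the normal fast path. */
}
```

Guarding both the definition and the call site keeps regular builds free of
dead code and unused-symbol warnings.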

Reported-by: Cian Ferriter 
Signed-off-by: Eli Britstein 
Reviewed-by: Gaetan Rivet 
---
 lib/dpif-netdev.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 610949f36..1823bf565 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -114,7 +114,9 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
 COVERAGE_DEFINE(datapath_drop_invalid_bond);
 COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
+#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
 COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
+#endif
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -6754,13 +6756,14 @@ static struct tx_port * pmd_send_port_cache_lookup(
 
 inline int
 dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
-  odp_port_t port_no,
+  odp_port_t port_no OVS_UNUSED,
   struct dp_packet *packet,
   struct dp_netdev_flow **flow)
 {
-struct tx_port *p;
+struct tx_port *p OVS_UNUSED;
 uint32_t mark;
 
+#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
 /* Restore the packet if HW processing was terminated before completion. */
 p = pmd_send_port_cache_lookup(pmd, port_no);
 if (OVS_LIKELY(p)) {
@@ -6771,6 +6774,7 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
 return -1;
 }
 }
+#endif
 
 /* If no mark, no flow to find. */
 if (!dp_packet_has_flow_mark(packet, &mark)) {
-- 
2.28.0.2311.g225365fb51



Re: [ovs-dev] [v11 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-14 Thread Aaron Conole
"Van Haaren, Harry"  writes:

>> -Original Message-
>> From: Eelco Chaudron 
>> Sent: Wednesday, July 14, 2021 10:12 AM
>> To: Amber, Kumar ; Aaron Conole
>> 
>> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van 
>> Haaren,
>> Harry ; Ferriter, Cian ;
>> Stokes, Ian 
>> Subject: Re: [v11 10/11] dpif-netdev/mfex: Add AVX512 based optimized 
>> miniflow
>> extract
>
> 
>
>> > +/* Permute the packet layout into miniflow blocks shape.
>> > + * As different AVX512 ISA levels have different implementations,
>> > + * this specializes on the "use_vbmi" attribute passed in.
>> > + */
>> > +__m512i v512_zeros = _mm512_setzero_si512();
>> > +__m512i v_blk0 = v512_zeros;
>> 
>> Although I did ACK this patchset, running make clang-analyzer, gave me the
>> following warning, which should be fixed:
>> 
>> lib/dpif-netdev-extract-avx512.c:476:17: warning: Value stored to 'v_blk0' 
>> during its
>> initialization is never read
>> __m512i v_blk0 = v512_zeros;
>> ^~   ~~
>> 1 warning generated.
>
> Ah interesting, indeed it's never read. It's also a "zeroed" register, so
> it has no runtime performance impact. (Magic of OoO execution, combined
> with register renaming :)
>
> The fix is simply to remove the " = v512_zeros;" part, resulting in this decl 
> of the variable from
> __m512i v_blk0 = v512_zeros;
> to
> __m512i v_blk0;
>
> Will be included in v11.
>
>
>> Aaron, would it be possible to add a clang-analyzer run to the zero robot to 
>> catch
>> newly introduced warnings?
>
> That'd be cool. I presume that a "scan-build" prefix to the build command
> is all that's needed, as is usually the case with the clang analyzer. I
> reproduced the above error using scan-build, so that seems to work :)

If it's simple enough, sure.  Patch inc.

> Regards, -Harry
>
> 



Re: [ovs-dev] [PATCH v2] ofproto-dpif-xlate: avoid successive ct_clear datapath actions

2021-07-14 Thread Eelco Chaudron



On 8 Jul 2021, at 21:21, Ilya Maximets wrote:

> On 5/24/21 2:39 PM, Timothy Redaelli wrote:
>> On Tue, 18 May 2021 06:17:48 -0400
>> Eelco Chaudron  wrote:
>>
>>> Due to flow lookup optimizations, especially in the resubmit/clone cases,
>>> we might end up with multiple ct_clear actions, which are not necessary.
>>>
>>> This patch only adds the ct_clear action to the datapath if any ct state
>>> is tracked.
>>>
>>> Signed-off-by: Eelco Chaudron 
>>> ---
>>> v2: Insert ct_clear only when ct information is tracked vs tracking 
>>> successive
>>> ct_clear actions.
>>>
>>> ofproto/ofproto-dpif-xlate.c |4 +++-
>>>  tests/ofproto-dpif.at|   25 +
>>>  2 files changed, 28 insertions(+), 1 deletion(-)
>>
>>
>> Acked-By: Timothy Redaelli 
>>
>
> Thanks!  Applied.

Can this also be backported to the stable releases?



Re: [ovs-dev] [v11 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Van Haaren, Harry
> -Original Message-
> From: Eelco Chaudron 
> Sent: Wednesday, July 14, 2021 12:57 PM
> To: Amber, Kumar 
> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van 
> Haaren,
> Harry ; Ferriter, Cian ;
> Stokes, Ian 
> Subject: Re: [v11 06/11] dpif-netdev: Add packet count and core id paramters 
> for
> study
> 
> 
> 
> On 14 Jul 2021, at 13:33, Amber, Kumar wrote:
> 
> > Hi Eelco,
> >
> >> -Original Message-
> >> From: Eelco Chaudron 
> >> Sent: Wednesday, July 14, 2021 4:21 PM
> >> To: Amber, Kumar 
> >> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
> >> Haaren, Harry ; Ferriter, Cian
> >> ; Stokes, Ian 
> >> Subject: Re: [v11 06/11] dpif-netdev: Add packet count and core id 
> >> paramters
> >> for study
> >>
> >>
> >>
> >> On 14 Jul 2021, at 12:30, Eelco Chaudron wrote:
> >>
> >>> On 14 Jul 2021, at 4:02, kumar Amber wrote:
> >>>
>  From: Kumar Amber 
> 
 This commit introduces an additional command line parameter for the mfex
 study function. If the user provides an additional packet count, it is
 used in study as the minimum number of packets which must be processed;
 else a default value is chosen.
 Also introduces a third parameter for choosing a particular PMD core.
> 
>  $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
> 
>  Signed-off-by: Kumar Amber 
> >>
> >> One additional comment, please add some (negative) test cases for the
> >> command line options, so we know your changes work. Rather than me having
> >> to do this manually every revision.
> >>
> >
> > Yes, we did think about that, but we cannot: whenever a command fails,
> > OVS flags it as an error in the server log, which automatically fails
> > the test.
> >
> > Ex:
> >> 2021-07-14T11:16:30.194Z|00082|unixctl|DBG|received request dpif-
> netdev/miniflow-parser-set["-pmd","0","scalar"], id=0
> >> 2021-07-14T11:16:30.194Z|00083|dpif_netdev|ERR|Error: Miniflow parser not
> changed, PMD thread 0 not in use, pass a valid pmd thread ID.
> >> 2021-07-14T11:16:30.194Z|00084|unixctl|DBG|replying with error, id=0: 
> >> "Error:
> Miniflow parser not changed, PMD thread 0 not in use, pass a valid pmd thread 
> ID.
> >> "
> > 8. system-dpdk.at:291: 8. OVS-DPDK - MFEX Commands (system-dpdk.at:291):
> FAILED (system-dpdk.at:308)
> >
> > And hence we cannot add the command test case.
> 
> Not sure how you added it, but here is a quick try that works for me:
> 
> [wsfd-netdev64:~/..._v20.11.1/ovs_github]$ git diff
> diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
> index 96072e646..5f039dec5 100644
> --- a/tests/system-dpdk.at
> +++ b/tests/system-dpdk.at
> @@ -285,3 +285,26 @@ dnl Clean up
>  AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
>  AT_CLEANUP
>  dnl 
> --
> +
> +dnl 
> --
> +AT_SETUP([OVS-DPDK - MFEX Configuration])
> +AT_KEYWORDS([dpdk])
> +OVS_DPDK_START()
> +
> +dnl Add userspace bridge and attach it to OVS
> +AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk 
> options:dpdk-
> devargs=net_pcap1,rx_pcap=$srcdir/pcap/mfex_test.pcap,infinite_rx=1], [],
> [stdout], [stderr])
> +AT_CHECK([ovs-vsctl show], [], [stdout])
> +
> +
> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set scalar 1], [2], [], [dnl
> +The study_pkt_cnt option is not valid for the scalar implementation.
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +dnl Clean up
> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
> +AT_CLEANUP
> +dnl 
> --

Hi All,

I think the above point was that if ovs-vswitchd does a VLOG_ERR (as it
does when bad arguments like the above test are hit), then the unit test
automatically fails.
It's not about adding the test code, but about how the unit test infra
handles VLOG_ERR output.

As a result, it seems we cannot have unit tests that actually hit errors,
as they cause the unit test to report a failure, instead of performing
"negative testing" and passing the unit test.

I'm not very familiar with the wider AT_* based unit testing, but that's my 
understanding 
of the infrastructure?


> > 
> >
> > Br
> > Amber

Regards, -Harry


Re: [ovs-dev] [PATCH ovn v8] ovn-sbctl.c Add logical flows count numbers

2021-07-14 Thread Mark Michelson

Looks good to me Alexey, thanks!

Acked-by: Mark Michelson 

On 6/19/21 3:38 PM, Alexey Roytman wrote:

From: Alexey Roytman 

For big scale deployments, when number of logical flows can be 2M+,
sometimes users just need to know the total number of logical flows
and numbers of logical flows per table/per datapath.

New command output
Datapath: "sw1" (4b1e53d8-9f0f-4768-b4a6-6cbc58a4bfda)  Pipeline: ingress
   table=0 (ls_in_port_sec_l2  ) lflows=2
   table=1 (ls_in_port_sec_ip  ) lflows=1
   table=2 (ls_in_port_sec_nd  ) lflows=1
   table=3 (ls_in_lookup_fdb   ) lflows=1
   table=4 (ls_in_put_fdb  ) lflows=1
   table=5 (ls_in_pre_acl  ) lflows=2
   table=6 (ls_in_pre_lb   ) lflows=3
   table=7 (ls_in_pre_stateful ) lflows=2
   table=8 (ls_in_acl_hint ) lflows=1
   table=9 (ls_in_acl  ) lflows=2
   table=10(ls_in_qos_mark ) lflows=1
   table=11(ls_in_qos_meter) lflows=1
   table=12(ls_in_lb   ) lflows=1
   table=13(ls_in_stateful ) lflows=8
   table=14(ls_in_pre_hairpin  ) lflows=1
   table=15(ls_in_nat_hairpin  ) lflows=1
   table=16(ls_in_hairpin  ) lflows=1
   table=17(ls_in_arp_rsp  ) lflows=1
   table=18(ls_in_dhcp_options ) lflows=1
   table=19(ls_in_dhcp_response) lflows=1
   table=20(ls_in_dns_lookup   ) lflows=1
   table=21(ls_in_dns_response ) lflows=1
   table=22(ls_in_external_port) lflows=1
   table=23(ls_in_l2_lkup  ) lflows=3
   table=24(ls_in_l2_unknown   ) lflows=2
Total number of logical flows in the datapath "sw1" 
(4b1e53d8-9f0f-4768-b4a6-6cbc58a4bfda) Pipeline: ingress = 41

Datapath: "sw1" (4b1e53d8-9f0f-4768-b4a6-6cbc58a4bfda)  Pipeline: egress
   table=0 (ls_out_pre_lb  ) lflows=3
   table=1 (ls_out_pre_acl ) lflows=2
   table=2 (ls_out_pre_stateful) lflows=2
   table=3 (ls_out_lb  ) lflows=1
   table=4 (ls_out_acl_hint) lflows=1
   table=5 (ls_out_acl ) lflows=2
   table=6 (ls_out_qos_mark) lflows=1
   table=7 (ls_out_qos_meter   ) lflows=1
   table=8 (ls_out_stateful) lflows=3
   table=9 (ls_out_port_sec_ip ) lflows=1
   table=10(ls_out_port_sec_l2 ) lflows=1
Total number of logical flows in the datapath "sw1" 
(4b1e53d8-9f0f-4768-b4a6-6cbc58a4bfda) Pipeline: egress = 18

Total number of logical flows = 59

Signed-off-by: Alexey Roytman 

---
V6 -> V7
  * Addressed commit b6f0e51d8b52cf2381503c3c1c5c2a0d6bd7afa6 and Mark's
comments
v5 -> v6
  * Addressed Ben's comments about replacing the --count flag of
lflow-list/dump-flows by a "count-flows" command.
v3 -> v4
  * Addressed review comments from Mark

---
  tests/ovn-sbctl.at|  69 -
  utilities/ovn-sbctl.8.xml |   3 ++
  utilities/ovn-sbctl.c | 106 +++---
  3 files changed, 169 insertions(+), 9 deletions(-)

diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at
index f49134381..16f5dabcc 100644
--- a/tests/ovn-sbctl.at
+++ b/tests/ovn-sbctl.at
@@ -175,4 +175,71 @@ inactivity_probe: 3
  
  OVN_SBCTL_TEST([ovn_sbctl_invalid_0x_flow], [invalid 0x flow], [

  check ovn-sbctl lflow-list 0x12345678
-])
\ No newline at end of file
+])
+
+dnl -
+
+OVN_SBCTL_TEST([ovn_sbctl_count_flows], [ovn-sbctl - count-flows], [
+
+count_entries() {
+ovn-sbctl --column=_uuid list Logical_Flow | sed -r '/^\s*$/d' | wc -l
+}
+
+count_pipeline() {
+ovn-sbctl  --column=pipeline list Logical_Flow | grep $1 | sed -r 
'/^\s*$/d' | wc -l
+}
+
+# we start with empty Logical_Flow table
+# validate that the table is indeed empty
+AT_CHECK([count_entries], [0], [dnl
+0
+])
+
+AT_CHECK([ovn-sbctl count-flows], [0], [dnl
+Total number of logical flows = 0
+])
+
+# create some logical flows
+check ovn-nbctl ls-add count-test
+
+OVS_WAIT_UNTIL([total_lflows=`count_entries`; test $total_lflows -ne 0])
+
+total_lflows=`count_entries`
+egress_lflows=`count_pipeline egress`
+ingress_lflows=`count_pipeline ingress`
+
+AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep "flows =" | awk 'NF>1{print 
$NF}'], [0], [dnl
+$total_lflows
+])
+AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep Total | grep egress | awk 
'NF>1{print $NF}'], [0], [dnl
+$egress_lflows
+])
+AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep Total | grep ingress | awk 
'NF>1{print $NF}'], [0], [dnl
+$ingress_lflows
+])
+
+# add another datapath
+check ovn-nbctl ls-add count-test2
+
+# check total logical flows in 2 datapathes
+AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep "flows =" | awk 'NF>1{print 
$NF}'], [0], [dnl
+$(($total_lflows * 2))
+])
+# check total logical flows in a specific datapath
+AT_CHECK_UNQUOTED([ovn-sbctl count-flows count-test | grep "flows =" | awk 
'NF>1{print $NF}'], [0], [dnl
+$total_lflows
+])
+
+AT_CHECK_UNQUOTED([ovn-sbctl count-flows count-test | grep Total | grep egress | 
awk 'NF>1{print $NF}'], [0], [dnl
+$egress_lflows
+])
+AT_CHECK_UNQUOTED([ovn-sbctl count-flows count-test | grep Total | grep ingress | 
awk 'NF>1{print $NF}'], 

Re: [ovs-dev] [v11 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Eelco Chaudron



On 14 Jul 2021, at 13:33, Amber, Kumar wrote:

> Hi Eelco,
>
>> -Original Message-
>> From: Eelco Chaudron 
>> Sent: Wednesday, July 14, 2021 4:21 PM
>> To: Amber, Kumar 
>> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
>> Haaren, Harry ; Ferriter, Cian
>> ; Stokes, Ian 
>> Subject: Re: [v11 06/11] dpif-netdev: Add packet count and core id paramters
>> for study
>>
>>
>>
>> On 14 Jul 2021, at 12:30, Eelco Chaudron wrote:
>>
>>> On 14 Jul 2021, at 4:02, kumar Amber wrote:
>>>
 From: Kumar Amber 

 This commit introduces an additional command line parameter for the mfex
 study function. If the user provides an additional packet count, it is
 used in study as the minimum number of packets which must be processed;
 else a default value is chosen.
 Also introduces a third parameter for choosing a particular PMD core.

 $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3

 Signed-off-by: Kumar Amber 
>>
>> One additional comment, please add some (negative) test cases for the
>> command line options, so we know your changes work. Rather than me having
>> to do this manually every revision.
>>
>
> Yes, we did think about that, but we cannot: whenever a command fails,
> OVS flags it as an error in the server log, which automatically fails
> the test.
>
> Ex:
>> 2021-07-14T11:16:30.194Z|00082|unixctl|DBG|received request 
>> dpif-netdev/miniflow-parser-set["-pmd","0","scalar"], id=0
>> 2021-07-14T11:16:30.194Z|00083|dpif_netdev|ERR|Error: Miniflow parser not 
>> changed, PMD thread 0 not in use, pass a valid pmd thread ID.
>> 2021-07-14T11:16:30.194Z|00084|unixctl|DBG|replying with error, id=0: 
>> "Error: Miniflow parser not changed, PMD thread 0 not in use, pass a valid 
>> pmd thread ID.
>> "
> 8. system-dpdk.at:291: 8. OVS-DPDK - MFEX Commands (system-dpdk.at:291): 
> FAILED (system-dpdk.at:308)
>
> And hence we cannot add the command test case.

Not sure how you added it, but here is a quick try that works for me:

[wsfd-netdev64:~/..._v20.11.1/ovs_github]$ git diff
diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index 96072e646..5f039dec5 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -285,3 +285,26 @@ dnl Clean up
 AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
 AT_CLEANUP
 dnl --
+
+dnl --
+AT_SETUP([OVS-DPDK - MFEX Configuration])
+AT_KEYWORDS([dpdk])
+OVS_DPDK_START()
+
+dnl Add userspace bridge and attach it to OVS
+AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk 
options:dpdk-devargs=net_pcap1,rx_pcap=$srcdir/pcap/mfex_test.pcap,infinite_rx=1],
 [], [stdout], [stderr])
+AT_CHECK([ovs-vsctl show], [], [stdout])
+
+
+AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set scalar 1], [2], [], [dnl
+The study_pkt_cnt option is not valid for the scalar implementation.
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+dnl Clean up
+AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
+AT_CLEANUP
+dnl --



> 
>
> Br
> Amber



Re: [ovs-dev] [v11 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Amber, Kumar
Hi Eelco,

> -Original Message-
> From: Eelco Chaudron 
> Sent: Wednesday, July 14, 2021 4:21 PM
> To: Amber, Kumar 
> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
> Haaren, Harry ; Ferriter, Cian
> ; Stokes, Ian 
> Subject: Re: [v11 06/11] dpif-netdev: Add packet count and core id paramters
> for study
> 
> 
> 
> On 14 Jul 2021, at 12:30, Eelco Chaudron wrote:
> 
> > On 14 Jul 2021, at 4:02, kumar Amber wrote:
> >
> >> From: Kumar Amber 
> >>
> >> This commit introduces an additional command line parameter for the mfex
> >> study function. If the user provides an additional packet count, it is
> >> used in study as the minimum number of packets which must be processed;
> >> else a default value is chosen.
> >> Also introduces a third parameter for choosing a particular PMD core.
> >>
> >> $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
> >>
> >> Signed-off-by: Kumar Amber 
> 
> One additional comment, please add some (negative) test cases for the
> command line options, so we know your changes work. Rather than me having
> to do this manually every revision.
> 

Yes, we did think about that, but we cannot: whenever a command fails,
OVS flags it as an error in the server log, which automatically fails
the test.

Ex: 
> 2021-07-14T11:16:30.194Z|00082|unixctl|DBG|received request 
> dpif-netdev/miniflow-parser-set["-pmd","0","scalar"], id=0
> 2021-07-14T11:16:30.194Z|00083|dpif_netdev|ERR|Error: Miniflow parser not 
> changed, PMD thread 0 not in use, pass a valid pmd thread ID.
> 2021-07-14T11:16:30.194Z|00084|unixctl|DBG|replying with error, id=0: "Error: 
> Miniflow parser not changed, PMD thread 0 not in use, pass a valid pmd thread 
> ID.
> "
8. system-dpdk.at:291: 8. OVS-DPDK - MFEX Commands (system-dpdk.at:291): FAILED 
(system-dpdk.at:308)

And hence we cannot add the command test case.



Br 
Amber


Re: [ovs-dev] [PATCH] netdev-offload-tc: verify the flower rule installed

2021-07-14 Thread Eelco Chaudron



On 12 Jul 2021, at 14:54, Marcelo Ricardo Leitner wrote:

> On Mon, Jul 12, 2021 at 10:28:15AM +0200, Eelco Chaudron wrote:
>>
>>
>> On 9 Jul 2021, at 20:23, Ilya Maximets wrote:
>>
>>> On 7/9/21 10:35 AM, Eelco Chaudron wrote:


 On 8 Jul 2021, at 22:18, Ilya Maximets wrote:

> On 5/17/21 3:20 PM, Eelco Chaudron wrote:
>> When OVS installs the flower rule, it only checks for the OK from the
>> kernel. It does not check if the rule requested matches the one
>> actually programmed. This change will add this check and warns the
>> user if this is not the case.
>>
>> Signed-off-by: Eelco Chaudron 
>> ---
>>  lib/tc.c |   59 
>> +++
>>  1 file changed, 59 insertions(+)
>>
>> diff --git a/lib/tc.c b/lib/tc.c
>> index a27cca2cc..e134f6a06 100644
>> --- a/lib/tc.c
>> +++ b/lib/tc.c
>> @@ -2979,6 +2979,50 @@ nl_msg_put_flower_options(struct ofpbuf *request, 
>> struct tc_flower *flower)
>>  return 0;
>>  }
>>
>> +static bool
>> +cmp_tc_flower_match_action(const struct tc_flower *a,
>> +   const struct tc_flower *b)
>> +{
>> +if (memcmp(&a->mask, &b->mask, sizeof a->mask)) {
>> +VLOG_DBG_RL(&error_rl, "tc flower compare failed mask compare");
>> +return false;
>> +}
>> +
>> +/* We can not memcmp() the key as some keys might be set while the
>> + * mask is not. */
>> +
>> +for (int i = 0; i < sizeof a->key; i++) {
>> +uint8_t mask = ((uint8_t *)&a->mask)[i];
>> +uint8_t key_a = ((uint8_t *)&a->key)[i] & mask;
>> +uint8_t key_b = ((uint8_t *)&b->key)[i] & mask;
>> +
>> +if (key_a != key_b) {
>> +VLOG_DBG_RL(&error_rl, "tc flower compare failed key compare "
>> +"at %d", i);
>> +return false;
>> +}
>> +}
>> +
>> +/* Compare the actions. */
>> +const struct tc_action *action_a = a->actions;
>> +const struct tc_action *action_b = b->actions;
>> +
>> +if (a->action_count != b->action_count) {
>> +VLOG_DBG_RL(&error_rl, "tc flower compare failed action length check");
>> +return false;
>> +}
>> +
>> +for (int i = 0; i < a->action_count; i++, action_a++, action_b++) {
>> +if (memcmp(action_a, action_b, sizeof *action_a)) {
>> +VLOG_DBG_RL(&error_rl, "tc flower compare failed action compare "
>> +"for %d", i);
>> +return false;
>> +}
>> +}
>> +
>> +return true;
>> +}
>> +
>>  int
>>  tc_replace_flower(struct tcf_id *id, struct tc_flower *flower)
>>  {
>> @@ -3010,6 +3054,21 @@ tc_replace_flower(struct tcf_id *id, struct 
>> tc_flower *flower)
>>
>>  id->prio = tc_get_major(tc->tcm_info);
>>  id->handle = tc->tcm_handle;
>> +
>> +if (id->prio != TC_RESERVED_PRIORITY_POLICE) {
>> +struct tc_flower flower_out;
>> +struct tcf_id id_out;
>> +int ret;
>> +
>> +ret = parse_netlink_to_tc_flower(reply, &flower_out, &id_out,
>> + false);
>> +
>> +if (ret || !cmp_tc_flower_match_action(flower, &flower_out)) {
>> +VLOG_WARN_RL(&error_rl, "Kernel flower acknowledgment does "
>> + "not match request!\n Set dpif_netlink to dbg to "
>> + "see which rule caused this error.");
>
> So we're only printing the warning and not reverting the change
> and not returning an error, right?  So, OVS will continue to
> work with the incorrect rule installed?
> I think, we should revert the incorrect change and return the
> error, so the flow could be installed to the OVS kernel datapath,
> but maybe this is a task for a separate change.
>
> What do you think?

 The goal was to make sure we do not break anything in case there is an
 existing kernel bug, since, unfortunately, we are missing a good set of
 TC unit tests.

 With the "warning only" option, we can backport this. And if in the field 
 we do not see any (false) reports, a follow-up patch can do as you 
 suggested.
>>>
>>> Make sense.  I removed '\n' from a warning (these doesn't look good in the 
>>> log)
>>> and applied to master.
>>
>> Thanks!
>>
>>> You and Marcelo are talking about backporting, do you think it make sense to
>>> backport to stable branches?
>>
>> If it applies cleanly, I would suggest backporting it all the way to 2.13. 
>> Marcelo?
>
> I don't know how different is the support for 2.13 and 2.15. I mean,
> if 2.13 is only for critical 

Re: [ovs-dev] [v11 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-14 Thread Eelco Chaudron


On 14 Jul 2021, at 13:02, Amber, Kumar wrote:

> Hi Eelco,
>
>> -Original Message-
>> From: Eelco Chaudron 
>> Sent: Wednesday, July 14, 2021 4:08 PM
>> To: Amber, Kumar 
>> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
>> Haaren, Harry ; Ferriter, Cian
>> ; Stokes, Ian 
>> Subject: Re: [v11 07/11] test/sytem-dpdk: Add unit test for mfex 
>> autovalidator
>>
>>
>>
>> On 14 Jul 2021, at 12:27, Amber, Kumar wrote:
>>
>>> Hi Eelco,
>>>
>>>
> +
> +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d
> +| grep -v "not available"], [], [dnl
> +])

 Please, if you make changes, test them, as this has never worked, as
 you changed this to True/False.
 Here is a working example:

 AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d |
 grep "True"], [], [dnl
 ])

 Also, make sure you test it with this patch only, and the full patch series
>> applied.

>>>
>>> I tested the patch with just patch 7 that should skip it and it does :
>>>
>>>   6: OVS-DPDK - MFEX Autovalidator   skipped 
>>> (system-dpdk.at:248)
>>>   7: OVS-DPDK - MFEX Autovalidator Fuzzy skipped 
>>> (system-dpdk.at:275)
>>
>> Yes, but it was executing the test fine with all patches when no AVX was
>> supported. It should have skipped the tests also in that case.
>>
>>> But checking true is more logical so will take in v12.
>>>
> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set
> +autovalidator], [0], [dnl Miniflow Extract implementation set to
>> autovalidator.
> +])
> +
> +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep
> +-oP 'rx_packets=\s*\K\d+'` -ge 1000])
> +
> +dnl Clean up
> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
> +AT_CLEANUP dnl
> +---
> +--
> +-
> +
> +dnl
> +---
> +--
> +-
> +dnl Add standard DPDK PHY port
> +AT_SETUP([OVS-DPDK - MFEX Autovalidator Fuzzy])
> +AT_KEYWORDS([dpdk])
> +AT_SKIP_IF([! pip3 list | grep scapy], [], [])
> +AT_CHECK([$PYTHON3 $srcdir/mfex_fuzzy.py $srcdir], [], [stdout])
> +OVS_DPDK_START()
> +
> +dnl Add userspace bridge and attach it to OVS AT_CHECK([ovs-vsctl
> +add-br br0 -- set bridge br0 datapath_type=netdev])
> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk
> +options:dpdk-devargs=net_pcap1,rx_pcap=$srcdir/pcap/fuzzy.pcap,infi
> +ni te_rx=1], [], [stdout], [stderr]) AT_CHECK([ovs-vsctl show], [],
> +[stdout])
> +
> +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d
> +| grep -v "not available"], [], [dnl
> +])

 This does not work, see above, but also move it up right after
 AT_SKIP_IF([! pip3 list | grep scapy], [], []) to speed up the process if 
 it’s
>> skipped.

>>>
>>> Cannot move there as for the command to work we need the OVS to start first
>> to accept the get command .
>>
>> You are right, forgot about that. I was suggesting this to avoid taking a 
>> long
>> time delay to create the fuzzy.pcap when the test does not need to run. So
>> maybe we can move the creation of this pcap below the check.
>>
>
> I did try that, but once OVS has started we cannot make the test case wait
> for a Python script to run; it will continue, and hence I kept it like this.
> I know it wastes 5 seconds and I will look to improve it, but otherwise this
> is the order we would have to keep.

Don’t understand why it can’t wait, but as this is more of a nit, ACK as is.

> +
> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set
> +autovalidator], [0], [dnl Miniflow Extract implementation set to
>> autovalidator.
> +])
> +
> +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep
> +-oP 'rx_packets=\s*\K\d+'` -ge 10])
> +
> +dnl Clean up
> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
> +AT_CLEANUP dnl
> +---
> +--
> +-
> --
> 2.25.1
>>>
>>> BR
>>> Amber
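
The difference between the two grep checks discussed above can be reproduced without OVS. The `miniflow-parser-get` output below is a made-up sample (the real format may differ), trimmed the same way with `sed 1,4d`:

```shell
# Made-up sample of `ovs-appctl dpif-netdev/miniflow-parser-get` output;
# only the last line survives the `sed 1,4d` used in the test.
get_output() {
    printf '%s\n' \
        'header 1' 'header 2' 'header 3' 'header 4' \
        '  avx512_vbmi (available: False, pmds: none)'
}

# Original check: `grep -v "not available"` matches every line that lacks
# that literal phrase, so it succeeds even when no implementation works.
get_output | sed 1,4d | grep -vq "not available" && echo "old check: passes"

# Suggested check: `grep "True"` only succeeds when an implementation really
# is available, so AT_SKIP_IF correctly skips the test in this case.
get_output | sed 1,4d | grep -q "True" || echo "new check: test skipped"
```

This is why the `grep -v` form never triggers the skip: any remaining line that merely lacks the phrase "not available" makes it succeed.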

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [v11 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-14 Thread Amber, Kumar
Hi Eelco,

> -Original Message-
> From: Eelco Chaudron 
> Sent: Wednesday, July 14, 2021 4:08 PM
> To: Amber, Kumar 
> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
> Haaren, Harry ; Ferriter, Cian
> ; Stokes, Ian 
> Subject: Re: [v11 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator
> 
> 
> 
> On 14 Jul 2021, at 12:27, Amber, Kumar wrote:
> 
> > Hi Eelco,
> >
> >
> >>> +
> >>> +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d
> >>> +| grep -v "not available"], [], [dnl
> >>> +])
> >>
> >> Please, if you make changes, test them, as this has never worked, as
> >> you changed this to True/False.
> >> Here is a working example:
> >>
> >> AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d |
> >> grep "True"], [], [dnl
> >> ])
> >>
> >> Also, make sure you test it with this patch only, and the full patch series
> applied.
> >>
> >
> > I tested the patch with just patch 7 that should skip it and it does :
> >
> >   6: OVS-DPDK - MFEX Autovalidator   skipped 
> > (system-dpdk.at:248)
> >   7: OVS-DPDK - MFEX Autovalidator Fuzzy skipped 
> > (system-dpdk.at:275)
> 
> Yes, but it was executing the test fine with all patches when no AVX was
> supported. It should have skipped the tests also in that case.
> 
> > But checking true is more logical so will take in v12.
> >
> >>> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set
> >>> +autovalidator], [0], [dnl Miniflow Extract implementation set to
> autovalidator.
> >>> +])
> >>> +
> >>> +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep
> >>> +-oP 'rx_packets=\s*\K\d+'` -ge 1000])
> >>> +
> >>> +dnl Clean up
> >>> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
> >>> +AT_CLEANUP dnl
> >>> +---
> >>> +--
> >>> +-
> >>> +
> >>> +dnl
> >>> +---
> >>> +--
> >>> +-
> >>> +dnl Add standard DPDK PHY port
> >>> +AT_SETUP([OVS-DPDK - MFEX Autovalidator Fuzzy])
> >>> +AT_KEYWORDS([dpdk])
> >>> +AT_SKIP_IF([! pip3 list | grep scapy], [], [])
> >>> +AT_CHECK([$PYTHON3 $srcdir/mfex_fuzzy.py $srcdir], [], [stdout])
> >>> +OVS_DPDK_START()
> >>> +
> >>> +dnl Add userspace bridge and attach it to OVS AT_CHECK([ovs-vsctl
> >>> +add-br br0 -- set bridge br0 datapath_type=netdev])
> >>> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk
> >>> +options:dpdk-devargs=net_pcap1,rx_pcap=$srcdir/pcap/fuzzy.pcap,infi
> >>> +ni te_rx=1], [], [stdout], [stderr]) AT_CHECK([ovs-vsctl show], [],
> >>> +[stdout])
> >>> +
> >>> +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d
> >>> +| grep -v "not available"], [], [dnl
> >>> +])
> >>
> >> This does not work, see above, but also move it up right after
> >> AT_SKIP_IF([! pip3 list | grep scapy], [], []) to speed up the process if 
> >> it’s
> skipped.
> >>
> >
> > Cannot move there as for the command to work we need the OVS to start first
> to accept the get command .
> 
> You are right, forgot about that. I was suggesting this to avoid taking a long
> time delay to create the fuzzy.pcap when the test does not need to run. So
> maybe we can move the creation of this pcap below the check.
> 

I did try that, but once OVS has started we cannot make the test case wait
for a Python script to run; it will continue, and hence I kept it like this.
I know it wastes 5 seconds and I will look to improve it, but otherwise this
is the order we would have to keep.
> >>> +
> >>> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set
> >>> +autovalidator], [0], [dnl Miniflow Extract implementation set to
> autovalidator.
> >>> +])
> >>> +
> >>> +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep
> >>> +-oP 'rx_packets=\s*\K\d+'` -ge 10])
> >>> +
> >>> +dnl Clean up
> >>> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
> >>> +AT_CLEANUP dnl
> >>> +---
> >>> +--
> >>> +-
> >>> --
> >>> 2.25.1
> >
> > BR
> > Amber
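
The `OVS_WAIT_UNTIL` condition quoted above hinges on the PCRE `\K` operator. Here is a standalone sketch with an invented statistics string; the real `ovs-vsctl` output shape may differ, and `grep -P` requires GNU grep:

```shell
# Invented statistics map, roughly in the ovs-vsctl key=value shape.
stats='{rx_bytes=128000, rx_packets=1024, tx_errors=0}'

# \K discards everything matched up to that point, so -o prints only the
# digits, giving a bare number that `test -ge` can compare.
count=$(printf '%s\n' "$stats" | grep -oP 'rx_packets=\s*\K\d+')
echo "rx_packets=$count"
test "$count" -ge 1000 && echo "wait condition satisfied"
```

So the wait loop polls the interface statistics until at least the requested number of packets has been received from the looping pcap.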



Re: [ovs-dev] [v11 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Eelco Chaudron


On 14 Jul 2021, at 12:30, Eelco Chaudron wrote:

> On 14 Jul 2021, at 4:02, kumar Amber wrote:
>
>> From: Kumar Amber 
>>
>> This commit introduces an additional command line parameter
>> for the mfex study function. If the user provides a packet count,
>> it is used by study as the minimum number of packets that must be
>> processed; otherwise a default value is chosen.
>> It also introduces a third parameter for choosing a particular pmd core.
>>
>> $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
>>
>> Signed-off-by: Kumar Amber 

One additional comment: please add some (negative) test cases for the command
line options, so we know your changes work, rather than me having to verify
this manually every revision.

>> ---
>> v11:
>> - include comments from Eelco
>> - reworked set command as per discussion
>> v10:
>> - fix review comments Eelco
>> v9:
>> - fix review comments Flavio
>> v7:
>> - change the command parameters for core_id and study_pkt_cnt
>> v5:
>> - fix review comments(Ian, Flavio, Eelco)
>> - introduce pmd core id parameter
>> ---
>> ---
>>  Documentation/topics/dpdk/bridge.rst |  37 ++-
>>  lib/dpif-netdev-extract-study.c  |  28 -
>>  lib/dpif-netdev-private-extract.h|   9 ++
>>  lib/dpif-netdev.c| 156 ++-
>>  4 files changed, 201 insertions(+), 29 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/bridge.rst 
>> b/Documentation/topics/dpdk/bridge.rst
>> index a47153495..7860d6173 100644
>> --- a/Documentation/topics/dpdk/bridge.rst
>> +++ b/Documentation/topics/dpdk/bridge.rst
>> @@ -284,12 +284,45 @@ command also shows whether the CPU supports each 
>> implementation ::
>>
>>  An implementation can be selected manually by the following command ::
>>
>> -$ ovs-appctl dpif-netdev/miniflow-parser-set study
>> +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
>> + [study_cnt]
>> +
>> +The above command has two optional parameters: study_cnt and core_id.
>> +The core_id sets a particular miniflow extract function to a specific
>> +pmd thread on the core.The third parameter study_cnt, which is specific
>
> Add space after period.
>
>> +to study and ignored by other implementations, means how many packets
>> +are needed to choose the best implementation.
>>
>>  Also user can select the study implementation which studies the traffic for
>>  a specific number of packets by applying all available implementations of
>>  miniflow extract and then chooses the one with the most optimal result for
>> -that traffic pattern.
>> +that traffic pattern. The user can optionally provide a packet count
>> +[study_cnt] parameter which is the minimum number of packets that OVS must
>> +study before choosing an optimal implementation. If no packet count is
>> +provided, then the default value, 128 is chosen. Also, as there is no
>> +synchronization point between threads, one PMD thread might still be running
>> +a previous round, and can now decide on earlier data.
>> +
>> +The per packet count is a global value, and parallel study executions with
>> +differing packet counts will use the most recent count value provided by 
>> usser.
>
> usser -> user
>
>> +
>> +Study can be selected with packet count by the following command ::
>> +
>> +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
>> +
>> +Study can be selected with packet count and explicit PMD selection
>> +by the following command ::
>> +
>> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
>> +
>> +In the above command the last parameter is the CORE ID of the PMD
>
> This needs a re-write as this is no longer the last parameter.
>
>> +thread and this can also be used to explicitly set the miniflow
>> +extraction function pointer on different PMD threads.
>
>> +Scalar can be selected on core 3 by the following command where
>> +study count can be put as any arbitrary number or left blank ::
>
> This is also no longer correct, i.e., study count must NOT be provided.
>
>> +
>> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
>>
>>  Miniflow Extract Validation
>>  ~~~
>> diff --git a/lib/dpif-netdev-extract-study.c 
>> b/lib/dpif-netdev-extract-study.c
>> index 02b709f8b..0ed31aa46 100644
>> --- a/lib/dpif-netdev-extract-study.c
>> +++ b/lib/dpif-netdev-extract-study.c
>> @@ -25,7 +25,7 @@
>>
>>  VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
>>
>> -static atomic_uint32_t mfex_study_pkts_count = 0;
>> +static atomic_uint32_t  mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
>
> This extra space is still present, see previous review.
>
>>
>>  /* Struct to hold miniflow study stats. */
>>  struct study_stats {
>> @@ -48,6 +48,27 @@ mfex_study_get_study_stats_ptr(void)
>>  return stats;
>>  }
>>
>> +int
>> +mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
>> +{
>> +struct dpif_miniflow_extract_impl *miniflow_funcs;
>> +miniflow_funcs = 

Re: [ovs-dev] [v11 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-14 Thread Eelco Chaudron


On 14 Jul 2021, at 12:27, Amber, Kumar wrote:

> Hi Eelco,
>
>
>>> +
>>> +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d |
>>> +grep -v "not available"], [], [dnl
>>> +])
>>
>> Please, if you make changes, test them, as this has never worked, as you
>> changed this to True/False.
>> Here is a working example:
>>
>> AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d | grep
>> "True"], [], [dnl
>> ])
>>
>> Also, make sure you test it with this patch only, and the full patch series 
>> applied.
>>
>
> I tested the patch with just patch 7 that should skip it and it does :
>
>   6: OVS-DPDK - MFEX Autovalidator   skipped 
> (system-dpdk.at:248)
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy skipped 
> (system-dpdk.at:275)

Yes, but it was executing the test fine with all patches when no AVX was 
supported. It should have skipped the tests also in that case.

> But checking true is more logical so will take in v12.
>
>>> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator],
>>> +[0], [dnl Miniflow Extract implementation set to autovalidator.
>>> +])
>>> +
>>> +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep
>>> +-oP 'rx_packets=\s*\K\d+'` -ge 1000])
>>> +
>>> +dnl Clean up
>>> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
>>> +AT_CLEANUP dnl
>>> +-
>>> +-
>>> +
>>> +dnl
>>> +-
>>> +-
>>> +dnl Add standard DPDK PHY port
>>> +AT_SETUP([OVS-DPDK - MFEX Autovalidator Fuzzy])
>>> +AT_KEYWORDS([dpdk])
>>> +AT_SKIP_IF([! pip3 list | grep scapy], [], [])
>>> +AT_CHECK([$PYTHON3 $srcdir/mfex_fuzzy.py $srcdir], [], [stdout])
>>> +OVS_DPDK_START()
>>> +
>>> +dnl Add userspace bridge and attach it to OVS AT_CHECK([ovs-vsctl
>>> +add-br br0 -- set bridge br0 datapath_type=netdev])
>>> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk
>>> +options:dpdk-devargs=net_pcap1,rx_pcap=$srcdir/pcap/fuzzy.pcap,infini
>>> +te_rx=1], [], [stdout], [stderr]) AT_CHECK([ovs-vsctl show], [],
>>> +[stdout])
>>> +
>>> +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d |
>>> +grep -v "not available"], [], [dnl
>>> +])
>>
>> This does not work, see above, but also move it up right after AT_SKIP_IF([! 
>> pip3
>> list | grep scapy], [], []) to speed up the process if it’s skipped.
>>
>
> Cannot move there as for the command to work we need the OVS to start first 
> to accept the get command .

You are right, forgot about that. I was suggesting this to avoid taking a long 
time delay to create the fuzzy.pcap when the test does not need to run. So 
maybe we can move the creation of this pcap below the check.

>>> +
>>> +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator],
>>> +[0], [dnl Miniflow Extract implementation set to autovalidator.
>>> +])
>>> +
>>> +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep
>>> +-oP 'rx_packets=\s*\K\d+'` -ge 10])
>>> +
>>> +dnl Clean up
>>> +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
>>> +AT_CLEANUP dnl
>>> +-
>>> +-
>>> --
>>> 2.25.1
>
> BR
> Amber



Re: [ovs-dev] [v10 06/12] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Eelco Chaudron



On 13 Jul 2021, at 18:11, Van Haaren, Harry wrote:

>> -Original Message-
>> From: Eelco Chaudron 
>> Sent: Tuesday, July 13, 2021 3:26 PM
>> To: Van Haaren, Harry 
>> Cc: Amber, Kumar ; ovs-dev@openvswitch.org;
>> f...@sysclose.org; i.maxim...@ovn.org; Ferriter, Cian 
>> ;
>> Stokes, Ian 
>> Subject: Re: [v10 06/12] dpif-netdev: Add packet count and core id paramters 
>> for
>> study
>>
>>
>>
>> On 13 Jul 2021, at 16:09, Van Haaren, Harry wrote:
>>
>>> (Off Topic; Eelco, I think your email client is sending HTML replies, 
>>> perhaps
>> check settings to disable?)
>>>
>>> Eelco Wrote;
>>>> If you ask for a specific existing pmd to run a specific implementation,
>>>> it should NOT affect any existing or newly created pmd.
>>>
>>> OK, thanks for explaining it clearly with examples below!
>>>
>>> In general the behavior below is OK, except that we set the default when a
>> specific PMD is requested.
>>> As a solution, we can set the default behavior only when "-pmd" is NOT
>> provided.
>>>
>>> 1) "all pmds" (no args) mode: we set the default, update all current PMD
>> threads, and will update future PMD threads via default.
>>> 2) "-pmd " mode: we set only for that PMD thread, and if the PMD
>> thread isn't active, existing code will provide good error to user.
>>>
>>> Does that seem a pragmatic solution? -Harry
>>
>> Yes, this is exactly what I had in mind when I added the comment.
>
> Ah great, good stuff.
>
> I've looked at the enabling code & patchset; the command parsing is getting
> complex. I tend to agree with Eelco's review that a single sweep of the args
> would be a better and more readable implementation. However, today that
> command is built up over multiple patches, and each change would cause a
> rebase conflict, so rework would cost more time than we would like,
> particularly given the upcoming deadlines.
>
> Here is a suggestion to pragmatically move forward:
> - Merge the code approximately as in this patchset, but with the 1) and 2) 
> suggestions applied above to fixup functional correctness.
> Amber has a v11 almost ready to go that addresses the main issues.
>
> - Commit to reworking the command in a follow-up patch next week. This 
> refactor would use the "sweep argc" method,
>and will continue to have the same user-visible 
> functionality/error-messages, just a cleaner implementation.
>(This avoids the rebase-heavy workflow of refactoring now, as the command 
> is enabled over multiple commits.)
>
> Hope that's a workable solution for all?

Just reviewed the v11, and I agree it looks even messier. I think it needs a
proper rewrite, and contrary to your statement above, all changes are
concentrated in patch 6, so I think it needs to be done in the v12 as we are
almost there.

Cheers,

Eelco
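
The "single sweep of the args" refactor discussed in this thread can be sketched roughly as follows. The `-pmd` flag, the default study count of 128, and the `-1` "all PMDs" sentinel are assumptions taken from the quoted documentation, not the actual OVS implementation:

```shell
# Parse: miniflow-parser-set [-pmd core_id] name [study_cnt]
# in a single left-to-right sweep over the arguments.
parse_set_cmd() {
    pmd_id=-1           # -1 means "apply to all PMD threads" (assumed sentinel)
    study_cnt=128       # assumed default study packet count
    if [ "$1" = "-pmd" ]; then
        pmd_id=$2
        shift 2
    fi
    name=$1
    if [ -n "$2" ]; then
        study_cnt=$2
    fi
    echo "name=$name pmd=$pmd_id study_cnt=$study_cnt"
}

parse_set_cmd -pmd 3 study 1024   # explicit PMD and packet count
parse_set_cmd scalar              # no options: all PMDs, default count
```

The point of the sweep is that each optional token is consumed exactly once, so the two modes ("all pmds" when `-pmd` is absent, single-PMD otherwise) fall out of the parse rather than needing special-cased branches.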



Re: [ovs-dev] [v11 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-14 Thread Eelco Chaudron



On 14 Jul 2021, at 4:02, kumar Amber wrote:


From: Kumar Amber 

This commit introduces an additional command line parameter
for the mfex study function. If the user provides a packet count,
it is used by study as the minimum number of packets that must be
processed; otherwise a default value is chosen.
It also introduces a third parameter for choosing a particular pmd core.

$ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3

Signed-off-by: Kumar Amber 

---
v11:
- include comments from Eelco
- reworked set command as per discussion
v10:
- fix review comments Eelco
v9:
- fix review comments Flavio
v7:
- change the command parameters for core_id and study_pkt_cnt
v5:
- fix review comments(Ian, Flavio, Eelco)
- introduce pmd core id parameter
---
---
 Documentation/topics/dpdk/bridge.rst |  37 ++-
 lib/dpif-netdev-extract-study.c  |  28 -
 lib/dpif-netdev-private-extract.h|   9 ++
 lib/dpif-netdev.c| 156 
++-

 4 files changed, 201 insertions(+), 29 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst

index a47153495..7860d6173 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -284,12 +284,45 @@ command also shows whether the CPU supports each 
implementation ::


 An implementation can be selected manually by the following command 
::


-$ ovs-appctl dpif-netdev/miniflow-parser-set study
+$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] 
[name]

+ [study_cnt]
+
+The above command has two optional parameters: study_cnt and core_id.
+The core_id sets a particular miniflow extract function to a specific
+pmd thread on the core.The third parameter study_cnt, which is 
specific


Add space after period.


+to study and ignored by other implementations, means how many packets
+are needed to choose the best implementation.

 Also user can select the study implementation which studies the 
traffic for
 a specific number of packets by applying all available 
implementations of
 miniflow extract and then chooses the one with the most optimal 
result for

-that traffic pattern.
+that traffic pattern. The user can optionally provide a packet count
+[study_cnt] parameter which is the minimum number of packets that OVS 
must
+study before choosing an optimal implementation. If no packet count 
is

+provided, then the default value, 128 is chosen. Also, as there is no
+synchronization point between threads, one PMD thread might still be 
running

+a previous round, and can now decide on earlier data.
+
+The per packet count is a global value, and parallel study executions 
with
+differing packet counts will use the most recent count value provided 
by usser.


usser -> user


+
+Study can be selected with packet count by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
+
+Study can be selected with packet count and explicit PMD selection
+by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
+
+In the above command the last parameter is the CORE ID of the PMD


This needs a re-write as this is no longer the last parameter.


+thread and this can also be used to explicitly set the miniflow
+extraction function pointer on different PMD threads.



+Scalar can be selected on core 3 by the following command where
+study count can be put as any arbitrary number or left blank ::


This is also no longer correct, i.e., study count must NOT be provided.


+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar

 Miniflow Extract Validation
 ~~~
diff --git a/lib/dpif-netdev-extract-study.c 
b/lib/dpif-netdev-extract-study.c

index 02b709f8b..0ed31aa46 100644
--- a/lib/dpif-netdev-extract-study.c
+++ b/lib/dpif-netdev-extract-study.c
@@ -25,7 +25,7 @@

 VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);

-static atomic_uint32_t mfex_study_pkts_count = 0;
+static atomic_uint32_t  mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;


This extra space is still present, see previous review.



 /* Struct to hold miniflow study stats. */
 struct study_stats {
@@ -48,6 +48,27 @@ mfex_study_get_study_stats_ptr(void)
 return stats;
 }

+int
+mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
+{
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* If the packet count is set and implementation called is study 
then
+ * set packet counter to requested number else set the packet 
counter

+ * to default number.


Also see previous review comment!

“””
Guess this comment is not correct, it’s just not set.
“””


+ */
+if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) &&
+(pkt_cmp_count != 0)) {
+
+atomic_uintptr_t *study_pck_cnt = (void 
*)_study_pkts_count;
+atomic_store_relaxed(study_pck_cnt, 

Re: [ovs-dev] [v11 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-14 Thread Amber, Kumar
Hi Eelco,


> > +
> > +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d |
> > +grep -v "not available"], [], [dnl
> > +])
> 
> Please, if you make changes, test them, as this has never worked, as you
> changed this to True/False.
> Here is a working example:
> 
> AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d | grep
> "True"], [], [dnl
> ])
> 
> Also, make sure you test it with this patch only, and the full patch series 
> applied.
>

I tested the patch with just patch 7 that should skip it and it does :

  6: OVS-DPDK - MFEX Autovalidator   skipped 
(system-dpdk.at:248)
  7: OVS-DPDK - MFEX Autovalidator Fuzzy skipped 
(system-dpdk.at:275)
 
But checking true is more logical so will take in v12.

> > +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator],
> > +[0], [dnl Miniflow Extract implementation set to autovalidator.
> > +])
> > +
> > +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep
> > +-oP 'rx_packets=\s*\K\d+'` -ge 1000])
> > +
> > +dnl Clean up
> > +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
> > +AT_CLEANUP dnl
> > +-
> > +-
> > +
> > +dnl
> > +-
> > +-
> > +dnl Add standard DPDK PHY port
> > +AT_SETUP([OVS-DPDK - MFEX Autovalidator Fuzzy])
> > +AT_KEYWORDS([dpdk])
> > +AT_SKIP_IF([! pip3 list | grep scapy], [], [])
> > +AT_CHECK([$PYTHON3 $srcdir/mfex_fuzzy.py $srcdir], [], [stdout])
> > +OVS_DPDK_START()
> > +
> > +dnl Add userspace bridge and attach it to OVS AT_CHECK([ovs-vsctl
> > +add-br br0 -- set bridge br0 datapath_type=netdev])
> > +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=dpdk
> > +options:dpdk-devargs=net_pcap1,rx_pcap=$srcdir/pcap/fuzzy.pcap,infini
> > +te_rx=1], [], [stdout], [stderr]) AT_CHECK([ovs-vsctl show], [],
> > +[stdout])
> > +
> > +AT_SKIP_IF([! ovs-appctl dpif-netdev/miniflow-parser-get | sed 1,4d |
> > +grep -v "not available"], [], [dnl
> > +])
> 
> This does not work, see above, but also move it up right after AT_SKIP_IF([! 
> pip3
> list | grep scapy], [], []) to speed up the process if it’s skipped.
> 

Cannot move there as for the command to work we need the OVS to start first to 
accept the get command .

> > +
> > +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set autovalidator],
> > +[0], [dnl Miniflow Extract implementation set to autovalidator.
> > +])
> > +
> > +OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics | grep
> > +-oP 'rx_packets=\s*\K\d+'` -ge 10])
> > +
> > +dnl Clean up
> > +AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
> > +AT_CLEANUP dnl
> > +-
> > +-
> > --
> > 2.25.1

BR
Amber
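
The ordering constraint argued over above can be made explicit with stub functions; all names and behaviors here are hypothetical, chosen only to show the sequencing, not to run against a real OVS:

```shell
# Stubs standing in for the real test steps (all hypothetical).
have_scapy() { return 0; }                  # pip3 list | grep scapy
make_fuzzy_pcap() { echo "pcap created"; }  # mfex_fuzzy.py, the ~5 s step
start_ovs_dpdk() { echo "ovs started"; }    # OVS_DPDK_START()
parser_available() { return 1; }            # miniflow-parser-get check;
                                            # pretend no impl is available
run_autovalidator() { echo "validated"; }

# Ordering from the thread: the cheap scapy check runs first, the slow pcap
# generation must happen before OVS starts, and the parser check can only run
# once OVS is up -- so its skip comes after the 5 s cost is already paid.
run_fuzzy_test() {
    have_scapy || { echo "skip: no scapy"; return 0; }
    make_fuzzy_pcap
    start_ovs_dpdk
    parser_available || { echo "skip: parser not available"; return 0; }
    run_autovalidator
}

run_fuzzy_test
```

With `parser_available` failing, the run still pays for pcap creation and OVS startup before skipping, which is the waste Eelco points out and Amber says is hard to avoid with the current step ordering.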


Re: [ovs-dev] [v11 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-14 Thread Van Haaren, Harry
> -Original Message-
> From: Eelco Chaudron 
> Sent: Wednesday, July 14, 2021 10:12 AM
> To: Amber, Kumar ; Aaron Conole
> 
> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van 
> Haaren,
> Harry ; Ferriter, Cian ;
> Stokes, Ian 
> Subject: Re: [v11 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow
> extract



> > +/* Permute the packet layout into miniflow blocks shape.
> > + * As different AVX512 ISA levels have different implementations,
> > + * this specializes on the "use_vbmi" attribute passed in.
> > + */
> > +__m512i v512_zeros = _mm512_setzero_si512();
> > +__m512i v_blk0 = v512_zeros;
> 
> Although I did ACK this patchset, running make clang-analyzer, gave me the
> following warning, which should be fixed:
> 
> lib/dpif-netdev-extract-avx512.c:476:17: warning: Value stored to 'v_blk0' 
> during its
> initialization is never read
> __m512i v_blk0 = v512_zeros;
> ^~   ~~
> 1 warning generated.

Ah interesting, indeed it's never read. It's also a "zeroed" register, so it
has no runtime performance impact. (Magic of OoO execution, combined with
register renaming :)

The fix is simply to remove the " = v512_zeros;" part, changing the
declaration of the variable from
__m512i v_blk0 = v512_zeros;
to
__m512i v_blk0;

Will be included in v11.


> Aaron, would it be possible to add a clang-analyzer run to the zero robot to 
> catch
> newly introduced warnings?

That'd be cool. I presume that a "scan-build" prefix to the build command is
all that's needed, as is usually the case with the clang analyzer. I reproduced
the above error using scan-build, so that seems to work :)

Regards, -Harry




Re: [ovs-dev] [v11 08/11] dpif/stats: add miniflow extract opt hits counter

2021-07-14 Thread Eelco Chaudron



On 14 Jul 2021, at 4:02, kumar Amber wrote:

> From: Harry van Haaren 
>
> This commit adds a new counter to be displayed to the user when
> requesting datapath packet statistics. It counts the number of
> packets that are parsed and a miniflow built up from it by the
> optimized miniflow extract parsers.
>
> The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
> extra entry indicating if the optimized MFEX was hit:
>
>   - MFEX Opt hits:6786432  (100.0 %)
>
> Signed-off-by: Harry van Haaren 
> Acked-by: Flavio Leitner 

Acked-by: Eelco Chaudron 



Re: [ovs-dev] [v11 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-14 Thread Eelco Chaudron


On 14 Jul 2021, at 4:02, kumar Amber wrote:

> From: Kumar Amber 
>
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
>
> Added a new directory to store the PCAP file used
> in the tests and a script to generate the fuzzy traffic
> type pcap to be used in fuzzy unit test.
>
> Signed-off-by: Kumar Amber 
> Acked-by: Flavio Leitner 
> ---
> v11:
> - fix comments from Eelco
> v7:
> - fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - remove sleep from first test and added minor 5 sec sleep to fuzzy
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  48 
>  tests/.gitignore |   1 +
>  tests/automake.mk|   6 +++
>  tests/mfex_fuzzy.py  |  33 +
>  tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
>  tests/system-dpdk.at |  53 +++
>  6 files changed, 141 insertions(+)
>  create mode 100755 tests/mfex_fuzzy.py
>  create mode 100644 tests/pcap/mfex_test.pcap
>
> diff --git a/Documentation/topics/dpdk/bridge.rst 
> b/Documentation/topics/dpdk/bridge.rst
> index 7860d6173..7a8983ff4 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -345,3 +345,51 @@ A compile time option is available in order to test it 
> with the OVS unit
>  test suite. Use the following configure option ::
>
>  $ ./configure --enable-mfex-default-autovalidator
> +
> +Unit Test Miniflow Extract
> +++
> +
> +Unit test can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator
> +
> +The unit test uses multiple traffic types to test the correctness of the
> +implementations.
> +
> +Running Fuzzy test with Autovalidator
> ++
> +
> +Fuzzy tests can also be done on miniflow extract with the help of
> +the auto-validator and Scapy. The steps below describe how to
> +reproduce the setup, with the IP being fuzzed to generate packets.
> +
> +Scapy is used to create fuzzy IP packets and save them into a PCAP ::
> +
> +pkt = fuzz(Ether()/IP()/TCP())
> +
> +Set the miniflow extract to autovalidator using ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> +
> +OVS is configured to receive the generated packets ::
> +
> +$ ovs-vsctl add-port br0 pcap0 -- \
> +set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
> +"rx_pcap=fuzzy.pcap"
> +
> +With this workflow, the autovalidator will ensure that all MFEX
> +implementations are classifying each packet in exactly the same way.
> +If an optimized MFEX implementation causes a different miniflow to be
> +generated, the autovalidator has ovs_assert and logging statements that
> +will inform about the issue.
> +
> +Unit Fuzzy test with Autovalidator
> ++
> +
> +Unit test can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator Fuzzy
> diff --git a/tests/.gitignore b/tests/.gitignore
> index 45b4f67b2..a3d927e5d 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -11,6 +11,7 @@
>  /ovsdb-cluster-testsuite
>  /ovsdb-cluster-testsuite.dir/
>  /ovsdb-cluster-testsuite.log
> +/pcap/
>  /pki/
>  /system-afxdp-testsuite
>  /system-afxdp-testsuite.dir/
> diff --git a/tests/automake.mk b/tests/automake.mk
> index f45f8d76c..a6c15ba55 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: 
> tests/automake.mk
>   echo "TEST_FUZZ_REGRESSION([$$basename])"; \
>   done > $@.tmp && mv $@.tmp $@
>
> +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
> +MFEX_AUTOVALIDATOR_TESTS = \
> + tests/pcap/mfex_test.pcap \
> + tests/mfex_fuzzy.py
> +
>  OVSDB_CLUSTER_TESTSUITE_AT = \
>   tests/ovsdb-cluster-testsuite.at \
>   tests/ovsdb-execution.at \
> @@ -512,6 +517,7 @@ tests_test_type_props_SOURCES = tests/test-type-props.c
>  CHECK_PYFILES = \
>   tests/appctl.py \
>   tests/flowgen.py \
> + tests/mfex_fuzzy.py \
>   tests/ovsdb-monitor-sort.py \
>   tests/test-daemon.py \
>   tests/test-json.py \
> diff --git a/tests/mfex_fuzzy.py b/tests/mfex_fuzzy.py
> new file mode 100755
> index 0..5b056bb48
> --- /dev/null
> +++ b/tests/mfex_fuzzy.py
> @@ -0,0 +1,33 @@
> +#!/usr/bin/python3
> +try:
> +from scapy.all import RandMAC, RandIP, PcapWriter, RandIP6, RandShort, 
> fuzz
> +from scapy.all import IPv6, Dot1Q, IP, Ether, UDP, TCP
> +except ModuleNotFoundError as err:
> +print(err + ": Scapy")
> +import sys
> +
> +path = str(sys.argv[1]) + "/pcap/fuzzy.pcap"
> +pktdump = PcapWriter(path, 
