Re: [ovs-dev] [PATCH] netdev-offload-dpdk: fix a crash in dump_flow_attr()

2021-08-10 Thread Sriharsha Basavapatna via dev
On Wed, Aug 11, 2021 at 6:21 AM Gaëtan Rivet  wrote:
>
> On Wed, Aug 4, 2021, at 14:37, Sriharsha Basavapatna via dev wrote:
> > In netdev_offload_dpdk_flow_create() when an offload request fails,
> > dump_flow() is called to log a warning message. The 's_tnl' string
> > in flow_patterns gets initialized in vport_to_rte_tunnel() conditionally
> > via ds_put_format(). If it is not initialized, it crashes later in
> > dump_flow_attr()->ds_clone()->memcpy() while dereferencing this string.
> >
> > Fix this by initializing s_tnl using ds_cstr(). Fix a similar
> > issue with actions->s_tnl.
> >
> > Signed-off-by: Sriharsha Basavapatna 
>
> Hello Harsha,
>
> Thanks for the fix. I have a remark below.

Thanks Gaetan, please see my response inline.

>
> > ---
> >  lib/netdev-offload-dpdk.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index f6706ee0c..243e2c430 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -788,6 +788,7 @@ free_flow_patterns(struct flow_patterns *patterns)
> >  free(CONST_CAST(void *, patterns->items[i].mask));
> >  }
> >  }
> > +ds_destroy(>s_tnl);
> >  free(patterns->items);
> >  patterns->items = NULL;
> >  patterns->cnt = 0;
> > @@ -1334,6 +1335,7 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns
> > *patterns,
> >  struct rte_flow_error error;
> >  struct rte_flow *flow;
> >
> > +ds_cstr(_tnl);
>
> You are forced to use 'ds_cstr' because 'ds_clone' crashes on properly 
> initialized
> ds. I think this is an issue with 'ds_clone'.
>
> I would expect all ds_ functions to work with strings that were initialized 
> with
> 'ds_init' or set to 'DS_EMPTY_INITIALIZER'.
>
> In this case, I think it's better to use .s_tnl = DS_EMPTY_INITIALIZER in the 
> actions
> initializer list and have a fix for 'ds_clone' so that it won't crash with 
> correct strings.

Are you suggesting something like this in ds_clone():

if (source->string)
 memcpy(dst->string, source->string, dst->allocated + 1);

I suspect that's inconsistent with other functions in the ds library.
The expectation seems to be that callers initialize their strings with
a call to ds_cstr() before using a ds string.

Here's the comments from the header file for reference:

 * The 'string' member does not always point to a null-terminated string.
 * Initially it is NULL, and even when it is nonnull, some operations do not
 * ensure that it is null-terminated.  Use ds_cstr() to ensure that memory is
 * allocated for the string and that it is null-terminated. */
struct ds {
char *string;   /* Null-terminated string. */

Let me know what you think.

Thanks,
-Harsha
>
> --
> Gaetan Rivet

-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] conntrack: remove the nat_action_info from the conn

2021-08-10 Thread wenxu
From: wenxu 

Only 'nat_action_info->nat_action' is used for packet forwarding.
Other items such as min/max_ip/port are used only when creating
new connections. No need to store the whole nat_action_info in conn.

Signed-off-by: wenxu 
Acked-by: Gaetan Rivet 
Acked-By Michael Santana 
---
v2: rewrite the commit log

 lib/conntrack-private.h |   2 +-
 lib/conntrack.c | 101 +---
 2 files changed, 53 insertions(+), 50 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 169ace5..dfdf4e6 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -98,7 +98,7 @@ struct conn {
 struct conn_key parent_key; /* Only used for orig_tuple support. */
 struct ovs_list exp_node;
 struct cmap_node cm_node;
-struct nat_action_info_t *nat_info;
+uint16_t nat_action;
 char *alg;
 struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */
 
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 551c206..33a1a92 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -111,7 +111,8 @@ static void *clean_thread_main(void *f_);
 
 static bool
 nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
- struct conn *nat_conn);
+ struct conn *nat_conn,
+ const struct nat_action_info_t *nat_info);
 
 static uint8_t
 reverse_icmp_type(uint8_t type);
@@ -724,7 +725,7 @@ handle_alg_ctl(struct conntrack *ct, const struct 
conn_lookup_ctx *ctx,
 static void
 pat_packet(struct dp_packet *pkt, const struct conn *conn)
 {
-if (conn->nat_info->nat_action & NAT_ACTION_SRC) {
+if (conn->nat_action & NAT_ACTION_SRC) {
 if (conn->key.nw_proto == IPPROTO_TCP) {
 struct tcp_header *th = dp_packet_l4(pkt);
 packet_set_tcp_port(pkt, conn->rev_key.dst.port, th->tcp_dst);
@@ -732,7 +733,7 @@ pat_packet(struct dp_packet *pkt, const struct conn *conn)
 struct udp_header *uh = dp_packet_l4(pkt);
 packet_set_udp_port(pkt, conn->rev_key.dst.port, uh->udp_dst);
 }
-} else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
+} else if (conn->nat_action & NAT_ACTION_DST) {
 if (conn->key.nw_proto == IPPROTO_TCP) {
 packet_set_tcp_port(pkt, conn->rev_key.dst.port,
 conn->rev_key.src.port);
@@ -746,7 +747,7 @@ pat_packet(struct dp_packet *pkt, const struct conn *conn)
 static void
 nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related)
 {
-if (conn->nat_info->nat_action & NAT_ACTION_SRC) {
+if (conn->nat_action & NAT_ACTION_SRC) {
 pkt->md.ct_state |= CS_SRC_NAT;
 if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
 struct ip_header *nh = dp_packet_l3(pkt);
@@ -761,7 +762,7 @@ nat_packet(struct dp_packet *pkt, const struct conn *conn, 
bool related)
 if (!related) {
 pat_packet(pkt, conn);
 }
-} else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
+} else if (conn->nat_action & NAT_ACTION_DST) {
 pkt->md.ct_state |= CS_DST_NAT;
 if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
 struct ip_header *nh = dp_packet_l3(pkt);
@@ -782,7 +783,7 @@ nat_packet(struct dp_packet *pkt, const struct conn *conn, 
bool related)
 static void
 un_pat_packet(struct dp_packet *pkt, const struct conn *conn)
 {
-if (conn->nat_info->nat_action & NAT_ACTION_SRC) {
+if (conn->nat_action & NAT_ACTION_SRC) {
 if (conn->key.nw_proto == IPPROTO_TCP) {
 struct tcp_header *th = dp_packet_l4(pkt);
 packet_set_tcp_port(pkt, th->tcp_src, conn->key.src.port);
@@ -790,7 +791,7 @@ un_pat_packet(struct dp_packet *pkt, const struct conn 
*conn)
 struct udp_header *uh = dp_packet_l4(pkt);
 packet_set_udp_port(pkt, uh->udp_src, conn->key.src.port);
 }
-} else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
+} else if (conn->nat_action & NAT_ACTION_DST) {
 if (conn->key.nw_proto == IPPROTO_TCP) {
 packet_set_tcp_port(pkt, conn->key.dst.port, conn->key.src.port);
 } else if (conn->key.nw_proto == IPPROTO_UDP) {
@@ -802,7 +803,7 @@ un_pat_packet(struct dp_packet *pkt, const struct conn 
*conn)
 static void
 reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn)
 {
-if (conn->nat_info->nat_action & NAT_ACTION_SRC) {
+if (conn->nat_action & NAT_ACTION_SRC) {
 if (conn->key.nw_proto == IPPROTO_TCP) {
 struct tcp_header *th_in = dp_packet_l4(pkt);
 packet_set_tcp_port(pkt, conn->key.src.port,
@@ -812,7 +813,7 @@ reverse_pat_packet(struct dp_packet *pkt, const struct conn 
*conn)
 packet_set_udp_port(pkt, conn->key.src.port,
 uh_in->udp_dst);
 }
-} else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
+} else if (conn->nat_action & NAT_ACTION_DST) {
 

Re: [ovs-dev] Execute the DPIF_OP_FLOW_PUT action after the slowpath packet is sent

2021-08-10 Thread 0-day Robot
Bleep bloop.  Greetings liyang_12921, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: sha1 information is lacking or useless (ofproto/ofproto-dpif-upcall.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 Execute the DPIF_OP_FLOW_PUT action after the slowpath 
packet is sent
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Patch skipped due to previous failure.

Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Execute the DPIF_OP_FLOW_PUT action after the slowpath packet is sent

2021-08-10 Thread liyang_12921
We encountered a problem when using ovs to provide forwarding for vm. A process 
in the vm frequently uploads files to the NOS server, and the tcp sessons of 
uploading is often reset by NOS server. Through packet capture, we found that 
when the session is reset, the packets sent by the vm are always out of order, 
and reset packet is sent only when the ack packet of the three-way handshake is 
out of order. Note that the ack packet of the three-way handshake is the first 
packet of ct table with ct_state is -new.


We use skb_tracer to trace the sending order of packets in client host, then we 
find that after DPIF_OP_FLOW_PUT action and before the slow path packets were 
sent, some subsequent packets directly match the datapath flow and were sent.


After discovering the above problems, we made a simple change to the code,make 
DPIF_OP_FLOW_PUT action be executed after the slowpath packets were sent. The 
reset never happened again after this.


However, I am not sure whether this change will bring other problems. Who can 
help give an accurate assessment:
1) What was the reason for this design that execute the DPIF_OP_FLOW_PUT action 
first?
2) Whether this change will cause other problems?If not, can we merge this 
change into the master? In this way, the out-of-order packets can be 
effectively avoided.






diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 05ff8cd72..63214def6 100755
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -1586,15 +1586,6 @@ handle_upcalls(struct udpif *udpif, struct upcall 
*upcalls,
 const struct dp_packet *packet = upcall->packet;
 struct ukey_op *op;


-if (should_install_flow(udpif, upcall)) {
-struct udpif_key *ukey = upcall->ukey;
-
-if (ukey_install(udpif, ukey)) {
-upcall->ukey_persists = true;
-put_op_init([n_ops++], ukey, DPIF_FP_CREATE);
-}
-}
-
 if (upcall->odp_actions.size) {
 op = [n_ops++];
 op->ukey = NULL;
@@ -1610,6 +1601,15 @@ handle_upcalls(struct udpif *udpif, struct upcall 
*upcalls,
 op->dop.execute.mtu = upcall->mru;
 op->dop.execute.skb_hash = upcall->skb_hash;
 }
+
+if (should_install_flow(udpif, upcall)) {
+struct udpif_key *ukey = upcall->ukey;
+
+if (ukey_install(udpif, ukey)) {
+upcall->ukey_persists = true;
+put_op_init([n_ops++], ukey, DPIF_FP_CREATE);
+}
+}
 }


 /* Execute batch. */


| |
liyang_12921
|
|
liyang_12...@163.com
|
签名由网易邮箱大师定制
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-offload-dpdk: fix a crash in dump_flow_attr()

2021-08-10 Thread Gaëtan Rivet
On Wed, Aug 4, 2021, at 14:37, Sriharsha Basavapatna via dev wrote:
> In netdev_offload_dpdk_flow_create() when an offload request fails,
> dump_flow() is called to log a warning message. The 's_tnl' string
> in flow_patterns gets initialized in vport_to_rte_tunnel() conditionally
> via ds_put_format(). If it is not initialized, it crashes later in
> dump_flow_attr()->ds_clone()->memcpy() while dereferencing this string.
> 
> Fix this by initializing s_tnl using ds_cstr(). Fix a similar
> issue with actions->s_tnl.
> 
> Signed-off-by: Sriharsha Basavapatna 

Hello Harsha,

Thanks for the fix. I have a remark below.

> ---
>  lib/netdev-offload-dpdk.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index f6706ee0c..243e2c430 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -788,6 +788,7 @@ free_flow_patterns(struct flow_patterns *patterns)
>  free(CONST_CAST(void *, patterns->items[i].mask));
>  }
>  }
> +ds_destroy(>s_tnl);
>  free(patterns->items);
>  patterns->items = NULL;
>  patterns->cnt = 0;
> @@ -1334,6 +1335,7 @@ netdev_offload_dpdk_mark_rss(struct flow_patterns 
> *patterns,
>  struct rte_flow_error error;
>  struct rte_flow *flow;
>  
> +ds_cstr(_tnl);

You are forced to use 'ds_cstr' because 'ds_clone' crashes on properly 
initialized
ds. I think this is an issue with 'ds_clone'.

I would expect all ds_ functions to work with strings that were initialized with
'ds_init' or set to 'DS_EMPTY_INITIALIZER'.

In this case, I think it's better to use .s_tnl = DS_EMPTY_INITIALIZER in the 
actions
initializer list and have a fix for 'ds_clone' so that it won't crash with 
correct strings.

-- 
Gaetan Rivet
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] conntrack: remove the nat_action_info from the conn

2021-08-10 Thread Gaëtan Rivet
On Thu, Aug 5, 2021, at 05:11, we...@ucloud.cn wrote:
> From: wenxu 

Hi Wenxu,

A few nits in the commit log, otherwise the change looks good to me.

> 
> Only the nat_action in the nat_action_info is used for conn
> packet forward and other item such as min/max_ip/port only 
> used for the new conn operation. So the whole nat_ction_info
> no need store in conn. This will also avoid unnecessary memory
> allocation.

Suggested rewrite:

Only 'nat_action_info->nat_action' is used for packet forwarding.
Other items such as min/max_ip/port are used only when creating
new connections. No need to store the whole nat_action_info in conn.

> 
> Signed-off-by: wenxu 

Acked-by: Gaetan Rivet 

Thanks,
-- 
Gaetan Rivet
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/1] include/openvswitch/compiler.h: check existence of __builtin_prefetch using __has_builtin

2021-08-10 Thread Sergey Madaminov
Checking if '__has_builtin' is defined and then defining OVS_PREFETCH to
be '__builtin_prefetch' if it is available. To preserve backwards
compatibility, the previous way to define OVS_PREFETCH macro is also
there.

Signed-off-by: Sergey Madaminov 

---
 include/openvswitch/compiler.h | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
index cf009f826..684d45d12 100644
--- a/include/openvswitch/compiler.h
+++ b/include/openvswitch/compiler.h
@@ -53,7 +53,7 @@
 #define OVS_UNLIKELY(CONDITION) (!!(CONDITION))
 #endif
 
-#if __has_feature(c_thread_safety_attributes)
+#if __has_feature(c_thread_safety_attributes) && defined(HAVE_THREAD_SAFETY)
 /* "clang" annotations for thread safety check.
  *
  * OVS_LOCKABLE indicates that the struct contains mutex element
@@ -229,13 +229,22 @@
  * OVS_PREFETCH_WRITE() should be used when the memory is going to be
  * written to.  Depending on the target CPU, this can generate the same
  * instruction as OVS_PREFETCH(), or bring the data into the cache in an
- * exclusive state. */
-#if __GNUC__
-#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
-#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
+ * exclusive state.
+ *
+ * GCC 10 introduced support for __has_builtin preprocessor operator,
+ * however, the old way to define OVS_PREFETCH remains to allow for backwards
+ * compatibility. */
+#if defined __has_builtin
+#  if __has_builtin (__builtin_prefetch)
+#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
+#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
+#  endif
+#elif __GNUC__
+#  define OVS_PREFETCH(addr) __builtin_prefetch((addr))
+#  define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
 #else
-#define OVS_PREFETCH(addr)
-#define OVS_PREFETCH_WRITE(addr)
+#  define OVS_PREFETCH(addr)
+#  define OVS_PREFETCH_WRITE(addr)
 #endif
 
 /* Since Visual Studio 2015 there has been an effort to make offsetof a
@@ -244,11 +253,13 @@
  * the C compiler.
  * e.g.: https://bit.ly/2UvWwti
  */
+ /*
 #if _MSC_VER >= 1900
 #undef offsetof
 #define offsetof(type, member) \
 ((size_t)((char *)&(((type *)0)->member) - (char *)0))
 #endif
+*/
 
 /* Build assertions.
  *
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 0/1] include/openvswitch/compiler.h: check existence of __builtin_prefetch using __has_builtin

2021-08-10 Thread Sergey Madaminov
Currently, '__builtin_prefetch' is defined for OVS_PREFETCH macro only
if '__GNUC__' is defined. However, it would make sense to use a
'__has_builtin' preprocessor operator to check if '__builtin_prefetch'
is available and then define the OVS_PREFETCH macro.

Doing so will allow to use prefetching on Windows too. This operator is
supported by Clang and is supported in GCC starting GCC 10. To preserve
backwards compatibility, the old way to enable prefetching remains if
'__has_builtin' is not defined.

Signed-off-by: Sergey Madaminov 

Sergey Madaminov (1):
  check existence of __builtin_prefetch using __has_builtin operator

 include/openvswitch/compiler.h | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7] ovsdb: provide raft and command interfaces with priority

2021-08-10 Thread Ilya Maximets
On 7/27/21 10:21 AM, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> Set a soft time limit of "raft election timer"/2 on ovsdb
> processing.
> 
> This improves behaviour in large heavily loaded clusters.
> While it cannot fully eliminate spurious raft elections
> under heavy load, it significantly decreases their number.
> 
> Processing is (to the extent possible) restarted where it
> stopped on the previous iteration to ensure that sessions
> towards the tail of the session list are not starved.
> 
> Signed-off-by: Anton Ivanov 
> ---
>  ovsdb/jsonrpc-server.c | 90 --
>  ovsdb/jsonrpc-server.h |  2 +-
>  ovsdb/ovsdb-server.c   | 16 +++-
>  ovsdb/raft.c   |  6 +++
>  ovsdb/raft.h   |  3 ++
>  ovsdb/storage.c| 12 ++
>  ovsdb/storage.h|  2 +
>  7 files changed, 125 insertions(+), 6 deletions(-)
> 
> diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
> index 351c39d8a..ea82d4161 100644
> --- a/ovsdb/jsonrpc-server.c
> +++ b/ovsdb/jsonrpc-server.c
> @@ -60,7 +60,8 @@ static struct ovsdb_jsonrpc_session 
> *ovsdb_jsonrpc_session_create(
>  struct ovsdb_jsonrpc_remote *, struct jsonrpc_session *, bool);
>  static void ovsdb_jsonrpc_session_preremove_db(struct ovsdb_jsonrpc_remote *,
> struct ovsdb *);
> -static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *);
> +static void ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *,
> +  uint64_t limit);
>  static void ovsdb_jsonrpc_session_wait_all(struct ovsdb_jsonrpc_remote *);
>  static void ovsdb_jsonrpc_session_get_memory_usage_all(
>  const struct ovsdb_jsonrpc_remote *, struct simap *usage);
> @@ -128,6 +129,11 @@ struct ovsdb_jsonrpc_server {
>  bool read_only;/* This server is does not accept any
>transactions that can modify the database. 
> */
>  struct shash remotes;  /* Contains "struct ovsdb_jsonrpc_remote *"s. 
> */
> +struct ovsdb_jsonrpc_remote *skip_to; /* Pointer to remote where 
> processing
> + should restart after a time
> + constraint interruption. */
> +bool must_wake_up; /* The processing loop must be re-run. It was
> +  interrupted due to exceeding a time constraint. */
>  };
>  
>  /* A configured remote.  This is either a passive stream listener plus a list
> @@ -137,6 +143,9 @@ struct ovsdb_jsonrpc_remote {
>  struct ovsdb_jsonrpc_server *server;
>  struct pstream *listener;   /* Listener, if passive. */
>  struct ovs_list sessions;   /* List of "struct ovsdb_jsonrpc_session"s. 
> */
> +struct ovsdb_jsonrpc_session *skip_to; /* Session at which processing
> +  should restart after an
> +  interruption. */
>  uint8_t dscp;
>  bool read_only;
>  char *role;
> @@ -279,6 +288,7 @@ ovsdb_jsonrpc_server_add_remote(struct 
> ovsdb_jsonrpc_server *svr,
>  remote->dscp = options->dscp;
>  remote->read_only = options->read_only;
>  remote->role = nullable_xstrdup(options->role);
> +remote->skip_to = NULL;
>  shash_add(>remotes, name, remote);
>  
>  if (!listener) {
> @@ -293,6 +303,9 @@ ovsdb_jsonrpc_server_del_remote(struct shash_node *node)
>  {
>  struct ovsdb_jsonrpc_remote *remote = node->data;
>  
> +/* The safest option is to rerun all remotes. */
> +remote->server->skip_to = NULL;
> +
>  ovsdb_jsonrpc_session_close_all(remote);
>  pstream_close(remote->listener);
>  shash_delete(>server->remotes, node);
> @@ -378,12 +391,24 @@ ovsdb_jsonrpc_server_set_read_only(struct 
> ovsdb_jsonrpc_server *svr,
>  }
>  
>  void
> -ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr)
> +ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr, uint64_t limit)
>  {
>  struct shash_node *node;
> +uint64_t elapsed = 0;
> +uint64_t start_time = time_msec();
> +
> +svr->must_wake_up = false;
>  
>  SHASH_FOR_EACH (node, >remotes) {
>  struct ovsdb_jsonrpc_remote *remote = node->data;
> +if (svr->skip_to) {
> +if (remote != svr->skip_to) {
> +continue;
> +} else {
> +svr->skip_to = NULL;
> +svr->must_wake_up = true;
> +}
> +}
>  
>  if (remote->listener) {
>  struct stream *stream;
> @@ -403,7 +428,17 @@ ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server 
> *svr)
>  }
>  }
>  
> -ovsdb_jsonrpc_session_run_all(remote);
> +/* We assume accept and session creation time to be
> + * negligible for the purposes of computing timeouts.
> + */
> +ovsdb_jsonrpc_session_run_all(remote, limit - 

Re: [ovs-dev] [PATCH RFC 0/1] use meson and ninja as a build system for ovs

2021-08-10 Thread Ilya Maximets
On 8/10/21 4:43 PM, Van Haaren, Harry wrote:
>> -Original Message-
>> From: dev  On Behalf Of William Tu
>> Sent: Tuesday, August 10, 2021 1:00 AM
>> To: Ilya Maximets 
>> Cc:  ; Sergey Madaminov
>> 
>> Subject: Re: [ovs-dev] [PATCH RFC 0/1] use meson and ninja as a build system 
>> for ovs
>>
>> Hi Ilya,
>>
>> Thanks for your feedback!
>>
>>> Hi, Sergey and William.  Thanks for working on this.
> 
> Hey All,
> 
> Nice RFC! Re cloning from github & running meson commands, I had to run
> the following as the file isn't renamed in upstream commit (for future 
> readers):
> $ git mv lib/dirs.c.in lib/dirs.c.in.meson
> 
> I manually decreased the required Meson version, all compiled fine after that!
> 
> 
>>> I think that it might be a good idea to move to a different build system
>>> that will not be that painful to run on Windows.  I'm not working on
>>> Windows parts, but it would be great to have a fast CI that can confirm
>>> that everything still working fine.
>>
>> Yes, and avoiding the msys/mingw makes coding and debugging on windows
>> much easier.
> 
> Generally +1 for Meson from me, my DPDK experience has been positive, as
> has Meson for various smaller open source projects I've worked on.
> 
> I'm happy to contribute to the Meson build-system enabling as far as the  
> CPU-ISA
> enabling/AVX512 library code goes - although I think Sergey has a (commented)
> implementation of the "openvswitchavx512" library in the RFC patchset.
> 
> 
> 
>>> However, as I already said in the discussion on GitHub, it seems to be
>>> very hard to migrate our testsuite that heavily depends on autotest.
>>> And without migrating the testsuite we will, probably, have to maintain
>>> two different build systems to be able to run tests.  This, kind of,
>>> defeats the whole purpose of the change.
>>
>> Why is it very hard?
>> I thought once we find a way to add tests to meson, it's just
>> "mechanically" copying all these scripts into a new test framework.
>> It's a lot of tedious work, but hopefully not a difficult or impossible task.
>>
>> Or does current OVS autotests heavily depend on autotool/autoconf?
>> I see the test cases are pretty independent.
> 
> I'm not very familiar with the tests, but today the AT_CHECK() syntax used to
> write the tests is pretty autotools specific.

Yep.  The way how it works is that Autoconf takes the whole bunch of
./tests/*.at files and generates a giant shell script ./tests/testsuite
out of them by replacing all the AT_CHECK/AT_SETUP/AT_DATA/AT_WHATEVER
with a corresponding shell code snippet, adds a bunch of a system
environment checks and defines, performs substitution of all the user-defined
macros and creates shell functions out of user-defined functions.
Basically, that is what we will need to re-implement as meson doesn't provide
replacement for most of these things.  We will, probably, need to make every
test a separate shell script or make a giant script that can run one requested
test, and maybe, create a library of helper scripts to avoid massive code
duplication.

Another point is that if these are still shell scripts, we still need msys2
or whatever to run a shell, right?

> Agree that in theory running the
> commands from different test runner infrastructure shouldn't be an issue.

I agree that once we have a test framework it should be just a matter of
time to migrate all the tests.  The problem is that I'm afraid we will
have to create one as the test framework in meson is just a way to run
a single binary/script, but it doesn't provide any instruments to actually
write these binaries/scripts.

> 
> As I'm not familiar as others here, I'll just be quiet on the topic :)
> 
> 
>>> Meson is a nice system in many aspects, but its support for tests is very
>>> limited.  IIUC, it can only run a single binary and check the error codes.
>>> Most of our tests starts several daemons and performs several fairly complex
>>> operations and checks.  I'm afraid that we will end up writing our own
>>> separate testing framework that will mimic autotest in order to be able to
>>> run our tests from meson.
>>>
>>> Did you think about this problem?  Do you have a solution in mind?
>>
>> Thanks, we thought about it but I don't have a solution in mind at this 
>> moment.

PoC of the build process is fine and I had no doubts that it can be done.
I'd suggest to make a PoC for at least one unit test that we have (e.g.
"ofproto-dpif - MPLS handling with goto_table") or two tests to actually
figure out what we need to do with them.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] checkpatch: check if some tags are wrongly written

2021-08-10 Thread Timothy Redaelli
Currently, there are some patches with the tags wrongly written (with
space instead of dash ) and this may prevent some automatic system or CI
to detect them correctly.

This commit adds a check in checkpatch to be sure the tag is written
correctly with dash and not with space.

The tags supported by the commit are:
Reported-by, Requested-by, Suggested-by, Reported-at, and Submitted-at

It's not necessary to add "Signed-off-by" since it's already checked in
checkpatch.

Signed-off-by: Timothy Redaelli 
---
 tests/checkpatch.at | 44 +
 utilities/checkpatch.py | 13 
 2 files changed, 57 insertions(+)

diff --git a/tests/checkpatch.at b/tests/checkpatch.at
index 0718acd99..8eb6a7558 100755
--- a/tests/checkpatch.at
+++ b/tests/checkpatch.at
@@ -348,3 +348,47 @@ try_checkpatch \
 "
 
 AT_CLEANUP
+
+AT_SETUP([checkpatch - malformed tags])
+try_checkpatch \
+   "Author: A
+
+Reported by: foo...
+Signed-off-by: A" \
+"ERROR: Reported-by tag is malformed.
+1: Reported by: foo...
+"
+try_checkpatch \
+   "Author: A
+
+Requested by: foo...
+Signed-off-by: A" \
+"ERROR: Requested-by tag is malformed.
+1: Requested by: foo...
+"
+try_checkpatch \
+   "Author: A
+
+Suggested by: foo...
+Signed-off-by: A" \
+"ERROR: Suggested-by tag is malformed.
+1: Suggested by: foo...
+"
+try_checkpatch \
+   "Author: A
+
+Reported at: foo...
+Signed-off-by: A" \
+"ERROR: Reported-at tag is malformed.
+1: Reported at: foo...
+"
+try_checkpatch \
+   "Author: A
+
+Submitted at: foo...
+Signed-off-by: A" \
+"ERROR: Submitted-at tag is malformed.
+1: Submitted at: foo...
+"
+
+AT_CLEANUP
diff --git a/utilities/checkpatch.py b/utilities/checkpatch.py
index 699fb4b02..97e21eeaa 100755
--- a/utilities/checkpatch.py
+++ b/utilities/checkpatch.py
@@ -749,6 +749,14 @@ def ovs_checkpatch_parse(text, filename, author=None, 
committer=None):
 is_gerrit_change_id = re.compile(r'(\s*(change-id: )(.*))$',
  re.I | re.M | re.S)
 
+tags_typos = {
+r'^Reported by:':  'Reported-by:',
+r'^Requested by:': 'Requested-by:',
+r'^Suggested by:': 'Suggested-by:',
+r'^Reported at:':  'Reported-at:',
+r'^Submitted at:': 'Submitted-at:'
+}
+
 reset_counters()
 
 for line in text.splitlines():
@@ -838,6 +846,11 @@ def ovs_checkpatch_parse(text, filename, author=None, 
committer=None):
 print("%d: %s\n" % (lineno, line))
 elif spellcheck:
 check_spelling(line, False)
+for typo, correct in tags_typos.items():
+m = re.match(typo, line, re.I)
+if m:
+print_error("%s tag is malformed." % (correct[:-1]))
+print("%d: %s\n" % (lineno, line))
 
 elif parse == PARSE_STATE_CHANGE_BODY:
 newfile = hunks.match(line)
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH ovn 0/4] Use Distributed Gateway Port for ovn-controller scalability.

2021-08-10 Thread Mark Gray
On 03/08/2021 19:33, Han Zhou wrote:
> On Tue, Aug 3, 2021 at 11:09 AM Numan Siddique  wrote:
>>
>> On Fri, Jul 30, 2021 at 3:22 AM Han Zhou  wrote:
>>>
>>> Note: This patch series is on top of a pending patch that is still under
>>> review:
> http://patchwork.ozlabs.org/project/ovn/patch/20210729080218.1235041-1-hz...@ovn.org/
>>>
>>> It is RFC because: a) it is based on the unmerged patch. b) DDlog
>>> changes are not done yet. Below is a copy of the commit message of the
> last
>>> patch in this series:
>>>
>>> For a fully distributed virtual network dataplane, ovn-controller
>>> flood-fills datapaths that are connected through patch ports. This
>>> creates scale problems in ovn-controller when the connected datapaths
>>> are too many.
>>>
>>> In a particular situation, when distributed gateway ports are used to
>>> connect logical routers to logical switches, when there is no need for
>>> distributed processing of those gateway ports (e.g. no dnat_and_snat
>>> configured), the datapaths on the other side of the gateway ports are
>>> not needed locally on the current chassis. This patch avoids pulling
>>> those datapaths to local in those scenarios.
>>>
>>> There are two scenarios that can greatly benefit from this optimization.
>>>
>>> 1) When there are multiple tenants, each has its own logical topology,
>>>but sharing the same external/provider networks, connected to their
>>>own logical routers with DGPs. Without this optimization, each
>>>ovn-controller would process all logical topology of all tenants and
>>>program flows for all of them, even if there are only workloads of a
>>>very few number of tenants on the node where the ovn-controller is
>>>running, because the shared external network connects all tenants.
>>>With this change, only the logical topologies relevant to the node
>>>are processed and programmed on the node.
>>>
>>> 2) In some deployments, such as ovn-kubernetes, logical switches are
>>>bound to chassises instead of distributed, because each chassis is
>>>assigned dedicated subnets. With the current implementation,
>>>ovn-controller on each node processes all logical switches and all
>>>ports on them, without knowing that they are not distributed at all.
>>>At large scale with N nodes (N = hundreds or even more), there are
>>>roughly N times processing power wasted for the logical connectivity
>>>related flows. With this change, those depolyments can utilize DGP
>>>to connect the node level logical switches to distributed router(s),
>>>with gateway chassis (or HA chassis without really HA) of the DGP
>>>set to the chassis where the logical switch is bound. This inherently
>>>tells OVN the mapping between logical switch and chassis, and
>>>ovn-controller would smartly avoid processing topologies of other
> node
>>>level logical switches, which would hugely save compute cost of each
>>>ovn-controller.
>>>
>>> For 2), test result for an ovn-kubernetes alike deployment shows
>>> signficant improvement of ovn-controller, both CPU (>90% reduced) and
> memory.
>>>
>>> Topology:
>>>
>>> - 1000 nodes, 1 LS with 10 LSPs per node, connected to a distributed
>>>   router.
>>>
>>> - 2 large port-groups PG1 and PG2, each with 2000 LSPs
>>>
>>> - 10 stateful ACLs: 5 from PG1 to PG2, 5 from PG2 to PG1
>>>
>>> - 1 GR per node, connected to the distributed router through a join
>>>   switch. Each GR also connects to an external logical switch per node.
>>>   (This part is to keep the test environment close to a real
>>>ovn-kubernetes setup but shouldn't make much difference for the
>>>comparison)
>>>
>>>  Before the change 
>>> OVS flows per node: 297408
>>> ovn-controller memory: 772696 KB
>>> ovn-controller recompute: 13s
>>> ovn-controller restart (recompute + reinstall OVS flows): 63s
>>>
>>>  After the change (also use DGP to connect node level LSes) 
>>> OVS flows per node: 81139 (~70% reduced)
>>> ovn-controller memory: 163464 KB (~80% reduced)
>>> ovn-controller recompute: 0.86s (>90% reduced)
>>> ovn-controller restart (recompute + reinstall OVS flows): 5s (>90%
> reduced)
>>
>> Hi Han,
>>
>> Thanks for these RFC patches.  The improvements are significant.
>> That's awesome.
>>
>> If I understand this RFC correctly, ovn-k8s will set the
>> gateway_chassis for each logical
>> router port of the cluster router (ovn_cluster_router) connecting to
>> the node logical switch right ?
>>
>> If so, instead of using the multiple gw port feature, why can't
>> ovn-k8s just set the chassis=
>> in the logical switch other_config option ?
>>
>> ovn-controllers can exclude the logical switches from the
>> local_datapaths if they don't belong to the local chassis.
>>
>> I'm not entirely sure if this would work.  Any thoughts ?  If the same
>> can be achieved using the chassis option
>> instead of multiple gw router ports, perhaps the former seems better
>> to me as it would be less work for 

Re: [ovs-dev] [ovn] Need advice for multiple routing tables support in LR

2021-08-10 Thread Numan Siddique
On Tue, Aug 10, 2021 at 12:22 PM Vladislav Odintsov  wrote:
>
> Thanks Numan for your help!
>
> I’ve sent patch, which fixes documentation:
> https://patchwork.ozlabs.org/project/ovn/patch/20210810161537.71692-1-odiv...@gmail.com/
>
> Since lr_in_defrag stage first writes xxreg0 register only in master branch 
> after
> https://github.com/ovn-org/ovn/commit/fb6de664f061c51040a4aae39049aa3cd4a0b1fa
> and prior this commit it was lr_in_ip_routing for a long time, should I make 
> a separate
> backport patch to branch-21.06 in which lr_in_defrag replaced with 
> lr_in_ip_routing
> for futher backport down to branch-20.06?
> I think it’s reasonable to have actual documentation in old branches too.

Thanks.  I'll take a look at the patch.  I think you can submit
backport patches once the main patch
is applied to the main branch.

Numan

>
> Regards,
> Vladislav Odintsov
>
> > On 10 Aug 2021, at 18:28, Numan Siddique  wrote:
> >
> > On Tue, Aug 10, 2021 at 8:00 AM Vladislav Odintsov  > > wrote:
> >>
> >> Hi Numan,
> >>
> >> thanks for the answer.
> >>
> >> You said that it’s possible to use reg2-reg7 for ipv4.
> >> I’ve checked usage of xxreg0 and xxreg1 to find out where they’re used in 
> >> router pipeline.
> >>
> >> In ovn-northd.c it’s written that those ipv6-related registers are used in 
> >> >= IP_INPUT. Am I right, its about lrouter stage "lr_in_ip_input"?
> >
> > That's correct.
> >
> >> If yes, I’ve analyzed ovn-northd code and it looks like first time when 
> >> xxreg0 is used to write is lr_in_defrag.
> >
> > I think you're referring to this code right
> > https://github.com/ovn-org/ovn/blob/master/northd/ovn-northd.c#L9288 
> >  ?
> >
> > If so, yes you'r right and we should definitely update the documentation.
>
> Yes, I looked at this place.
>
> >
> > And xxreg1 is used in write mode first time in lr_in_ip_routing.
> >>
> >> Can you please cofirm I am right, that I can use xxreg1’s subregisters 
> >> (for instance, reg7) in one stage prior to lr_in_ip_routing?
> >
> > I think you can use xxxreg1's sub registers  prior to
> > lr_in_ip_routing.  If you do this, then the usage can be for both IPv4
> > and IPv6.
> >
> > If you want to write xxreg1's sub registers after ip_routing, then
> > your feature would be restricted to just IPv4.
> >
> >
> >> If yes, I think we should fix router’s registers usage documentation:
> >> xxreg0: ">= IP_INPUT" -> ">= DEFRAG"
> >> xxreg1: ">= IP_INPUT" -> ">= IP_ROUTING"
> >
> > Agree.
> >
> > Thanks
> > Numan
> >
> >
> >>
> >> Right?
> >>
> >> Regards,
> >> Vladislav Odintsov
> >>
> >>> On 9 Aug 2021, at 00:09, Numan Siddique  >>> > wrote:
> >>>
> >>> On Thu, Aug 5, 2021 at 1:29 PM Vladislav Odintsov  >>>   >>> >> wrote:
> 
>  Hi,
> 
>  I’m trying to implement multiple routing tables support for Logical 
>  Routers and met some difficulties, need help/advice.
> 
>  How I see it can be used by administrators:
> 
>  1. In LRP’s options field is added key route_table with the routing 
>  table’s name - any string.
>  2. When adding new logical router static route record, supply optional 
>  parameter - route_table.
>  3. If route_table values in LRP and LR route match, packets coming from 
>  this particular LRP are looked up against routes with mentioned 
>  route_table value.
>  4. Routes to directly-connected networks of LRs have higher priority 
>  that routes from route_tables.
> 
>  So, this functionality can help achieve different routes loockup for 
>  different LR’s subnets.
> 
>  Planned changes:
>  - add route_table field to Logical_Router_Static_Route table’s schema
>  - add new optional field to Logical_Router_Static_Routes table, which 
>  when absent indicates that route is global for LR (current behaviour), 
>  and when there is a string with route_table name
> 
>  Regards,
>  Vladislav Odintsov
>  - add new stage in logical router ingress pipeline right before 
>  lr_in_ip_routing - lr_in_ip_routing_pre. In this stage packets are 
>  checked against inport and route_table’s id (computed somehow, uniq per 
>  route_table of LR) writes to some of OVS registers and then passes to 
>  next table - lr_in_ip_routing. Currently in my dev env it looks like 
>  this:
> 
>  table=9 (lr_in_ip_routing_pre), priority=100  , match=(inport == 
>  "subnet-8E3CDCC0"), action=(reg7 = 110; next;)
>  table=9 (lr_in_ip_routing_pre), priority=100  , match=(inport == 
>  "subnet-E99CDA60"), action=(reg7 = 100; next;)
>  table=9 (lr_in_ip_routing_pre), priority=0, match=(1), action=(next;)
> 
>  - routes, which have route_table set, are added to lflow with match on 
>  previously set value of computed 

Re: [ovs-dev] [ovn] Need advice for multiple routing tables support in LR

2021-08-10 Thread Vladislav Odintsov
Thanks Numan for your help!

I’ve sent patch, which fixes documentation:
https://patchwork.ozlabs.org/project/ovn/patch/20210810161537.71692-1-odiv...@gmail.com/

Since lr_in_defrag stage first writes xxreg0 register only in master branch 
after
https://github.com/ovn-org/ovn/commit/fb6de664f061c51040a4aae39049aa3cd4a0b1fa
and prior this commit it was lr_in_ip_routing for a long time, should I make a 
separate
backport patch to branch-21.06 in which lr_in_defrag replaced with 
lr_in_ip_routing
for futher backport down to branch-20.06?
I think it’s reasonable to have actual documentation in old branches too.

Regards,
Vladislav Odintsov

> On 10 Aug 2021, at 18:28, Numan Siddique  wrote:
> 
> On Tue, Aug 10, 2021 at 8:00 AM Vladislav Odintsov  > wrote:
>> 
>> Hi Numan,
>> 
>> thanks for the answer.
>> 
>> You said that it’s possible to use reg2-reg7 for ipv4.
>> I’ve checked usage of xxreg0 and xxreg1 to find out where they’re used in 
>> router pipeline.
>> 
>> In ovn-northd.c it’s written that those ipv6-related registers are used in 
>> >= IP_INPUT. Am I right, its about lrouter stage "lr_in_ip_input"?
> 
> That's correct.
> 
>> If yes, I’ve analyzed ovn-northd code and it looks like first time when 
>> xxreg0 is used to write is lr_in_defrag.
> 
> I think you're referring to this code right
> https://github.com/ovn-org/ovn/blob/master/northd/ovn-northd.c#L9288 
>  ?
> 
> If so, yes you'r right and we should definitely update the documentation.

Yes, I looked at this place.

> 
> And xxreg1 is used in write mode first time in lr_in_ip_routing.
>> 
>> Can you please cofirm I am right, that I can use xxreg1’s subregisters (for 
>> instance, reg7) in one stage prior to lr_in_ip_routing?
> 
> I think you can use xxxreg1's sub registers  prior to
> lr_in_ip_routing.  If you do this, then the usage can be for both IPv4
> and IPv6.
> 
> If you want to write xxreg1's sub registers after ip_routing, then
> your feature would be restricted to just IPv4.
> 
> 
>> If yes, I think we should fix router’s registers usage documentation:
>> xxreg0: ">= IP_INPUT" -> ">= DEFRAG"
>> xxreg1: ">= IP_INPUT" -> ">= IP_ROUTING"
> 
> Agree.
> 
> Thanks
> Numan
> 
> 
>> 
>> Right?
>> 
>> Regards,
>> Vladislav Odintsov
>> 
>>> On 9 Aug 2021, at 00:09, Numan Siddique >> > wrote:
>>> 
>>> On Thu, Aug 5, 2021 at 1:29 PM Vladislav Odintsov >>  >> >> wrote:
 
 Hi,
 
 I’m trying to implement multiple routing tables support for Logical 
 Routers and met some difficulties, need help/advice.
 
 How I see it can be used by administrators:
 
 1. In LRP’s options field is added key route_table with the routing 
 table’s name - any string.
 2. When adding new logical router static route record, supply optional 
 parameter - route_table.
 3. If route_table values in LRP and LR route match, packets coming from 
 this particular LRP are looked up against routes with mentioned 
 route_table value.
 4. Routes to directly-connected networks of LRs have higher priority that 
 routes from route_tables.
 
 So, this functionality can help achieve different routes loockup for 
 different LR’s subnets.
 
 Planned changes:
 - add route_table field to Logical_Router_Static_Route table’s schema
 - add new optional field to Logical_Router_Static_Routes table, which when 
 absent indicates that route is global for LR (current behaviour), and when 
 there is a string with route_table name
 
 Regards,
 Vladislav Odintsov
 - add new stage in logical router ingress pipeline right before 
 lr_in_ip_routing - lr_in_ip_routing_pre. In this stage packets are checked 
 against inport and route_table’s id (computed somehow, uniq per 
 route_table of LR) writes to some of OVS registers and then passes to next 
 table - lr_in_ip_routing. Currently in my dev env it looks like this:
 
 table=9 (lr_in_ip_routing_pre), priority=100  , match=(inport == 
 "subnet-8E3CDCC0"), action=(reg7 = 110; next;)
 table=9 (lr_in_ip_routing_pre), priority=100  , match=(inport == 
 "subnet-E99CDA60"), action=(reg7 = 100; next;)
 table=9 (lr_in_ip_routing_pre), priority=0, match=(1), action=(next;)
 
 - routes, which have route_table set, are added to lflow with match on 
 previously set value of computed uniq route_table id, stored in OVS 
 register. Like this:
 
 table=10(lr_in_ip_routing   ), priority=1, match=(reg7 == 100 && 
 ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 
 169.254.0.1; reg1 = 169.254.0.2; eth.src = 06:00:b1:5f:12:77; outport = 
 "vpc-B15F1277-up"; flags.loopback = 1; next;)
 table=10(lr_in_ip_routing   ), priority=1, match=(reg7 == 110 && 
 

[ovs-dev] [PATCH ovn] northd: fix xxreg{0, 1} registers usage description in router pipeline

2021-08-10 Thread Vladislav Odintsov
XXREG0 and XXREG1 (128-bit registers), are currently used in
router pipeline to store NEXT_HOP_IPV6 and SRC_IPV6 respectively.

First time XXREG0 register is written - in stage lr_in_defrag.
XXREG1 register is written first in lr_in_ip_routing.
This patch updates documentation, where it was wrongly pointed to
lr_in_ip_input router stage.

Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-dev/2021-August/386607.html
Signed-off-by: Vladislav Odintsov 
---
 northd/ovn-northd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 4c164a744..713d81e4c 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -312,7 +312,7 @@ enum ovn_stage {
  * +-+--+ G |   (< IP_INPUT)  | X |   |
  * | R1  |   SRC_IPV4 for ARP-REQ   | 0 | | R |   |
  * | |  (>= IP_INPUT)   |   | | E | NEXT_HOP_IPV6 |
- * +-+--+---+-+ G | (>= IP_INPUT) |
+ * +-+--+---+-+ G | ( >= DEFRAG ) |
  * | R2  |UNUSED| X | | 0 |   |
  * | |  | R | |   |   |
  * +-+--+ E | UNUSED  |   |   |
@@ -324,8 +324,8 @@ enum ovn_stage {
  * +-+--+ E | UNUSED  | X |   |
  * | R5  |UNUSED| G | | X |   |
  * | |  | 2 | | R |SRC_IPV6 for NS|
- * +-+--+---+-+ E | (>= IP_INPUT) |
- * | R6  |UNUSED| X | | G |   |
+ * +-+--+---+-+ E | ( >=  |
+ * | R6  |UNUSED| X | | G | IN_IP_ROUTING)|
  * | |  | R | | 1 |   |
  * +-+--+ E | UNUSED  |   |   |
  * | R7  |UNUSED| G | |   |   |
-- 
2.30.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovn] Need advice for multiple routing tables support in LR

2021-08-10 Thread Numan Siddique
On Tue, Aug 10, 2021 at 8:00 AM Vladislav Odintsov  wrote:
>
> Hi Numan,
>
> thanks for the answer.
>
> You said that it’s possible to use reg2-reg7 for ipv4.
> I’ve checked usage of xxreg0 and xxreg1 to find out where they’re used in 
> router pipeline.
>
> In ovn-northd.c it’s written that those ipv6-related registers are used in >= 
> IP_INPUT. Am I right, its about lrouter stage "lr_in_ip_input"?

That's correct.

> If yes, I’ve analyzed ovn-northd code and it looks like first time when 
> xxreg0 is used to write is lr_in_defrag.

I think you're referring to this code right
https://github.com/ovn-org/ovn/blob/master/northd/ovn-northd.c#L9288 ?

If so, yes you'r right and we should definitely update the documentation.

 And xxreg1 is used in write mode first time in lr_in_ip_routing.
>
> Can you please cofirm I am right, that I can use xxreg1’s subregisters (for 
> instance, reg7) in one stage prior to lr_in_ip_routing?

I think you can use xxxreg1's sub registers  prior to
lr_in_ip_routing.  If you do this, then the usage can be for both IPv4
and IPv6.

If you want to write xxreg1's sub registers after ip_routing, then
your feature would be restricted to just IPv4.


> If yes, I think we should fix router’s registers usage documentation:
> xxreg0: ">= IP_INPUT" -> ">= DEFRAG"
> xxreg1: ">= IP_INPUT" -> ">= IP_ROUTING"

Agree.

Thanks
Numan


>
> Right?
>
> Regards,
> Vladislav Odintsov
>
> > On 9 Aug 2021, at 00:09, Numan Siddique  wrote:
> >
> > On Thu, Aug 5, 2021 at 1:29 PM Vladislav Odintsov  > > wrote:
> >>
> >> Hi,
> >>
> >> I’m trying to implement multiple routing tables support for Logical 
> >> Routers and met some difficulties, need help/advice.
> >>
> >> How I see it can be used by administrators:
> >>
> >> 1. In LRP’s options field is added key route_table with the routing 
> >> table’s name - any string.
> >> 2. When adding new logical router static route record, supply optional 
> >> parameter - route_table.
> >> 3. If route_table values in LRP and LR route match, packets coming from 
> >> this particular LRP are looked up against routes with mentioned 
> >> route_table value.
> >> 4. Routes to directly-connected networks of LRs have higher priority that 
> >> routes from route_tables.
> >>
> >> So, this functionality can help achieve different routes loockup for 
> >> different LR’s subnets.
> >>
> >> Planned changes:
> >> - add route_table field to Logical_Router_Static_Route table’s schema
> >> - add new optional field to Logical_Router_Static_Routes table, which when 
> >> absent indicates that route is global for LR (current behaviour), and when 
> >> there is a string with route_table name
> >>
> >> Regards,
> >> Vladislav Odintsov
> >> - add new stage in logical router ingress pipeline right before 
> >> lr_in_ip_routing - lr_in_ip_routing_pre. In this stage packets are checked 
> >> against inport and route_table’s id (computed somehow, uniq per 
> >> route_table of LR) writes to some of OVS registers and then passes to next 
> >> table - lr_in_ip_routing. Currently in my dev env it looks like this:
> >>
> >>  table=9 (lr_in_ip_routing_pre), priority=100  , match=(inport == 
> >> "subnet-8E3CDCC0"), action=(reg7 = 110; next;)
> >>  table=9 (lr_in_ip_routing_pre), priority=100  , match=(inport == 
> >> "subnet-E99CDA60"), action=(reg7 = 100; next;)
> >>  table=9 (lr_in_ip_routing_pre), priority=0, match=(1), action=(next;)
> >>
> >> - routes, which have route_table set, are added to lflow with match on 
> >> previously set value of computed uniq route_table id, stored in OVS 
> >> register. Like this:
> >>
> >>  table=10(lr_in_ip_routing   ), priority=1, match=(reg7 == 100 && 
> >> ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 
> >> 169.254.0.1; reg1 = 169.254.0.2; eth.src = 06:00:b1:5f:12:77; outport = 
> >> "vpc-B15F1277-up"; flags.loopback = 1; next;)
> >>  table=10(lr_in_ip_routing   ), priority=1, match=(reg7 == 110 && 
> >> ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 
> >> 169.254.0.1; reg1 = 169.254.0.2; eth.src = 06:00:b1:5f:12:77; outport = 
> >> "vpc-B15F1277-up"; flags.loopback = 1; next;)
> >>
> >>
> >> My question is about OVS register choice.
> >>
> >
> >> I see in ovn-northd.c described OVS fields and (please correct me, if I’m 
> >> wrong) it looks like there are no registers available to store route_table 
> >> id? It looks for me that all registers are busy. What should I do in this 
> >> particular situation? If I set reg10, for instance, ovn-controller can’t 
> >> process such match and action, thus it’s an unknown symbol. If I increase 
> >> the MFF_N_LOG_REGS from 10 to 12, ovn connectivity tests fail and even 
> >> arp/ip connectivity drops. (see commit: 
> >> https://github.com/ovn-org/ovn/commit/18e71b5240009c519dc44eb06d101b3edd9dc103
> >>  
> >> 
> >>  
> >> 

Re: [ovs-dev] [PATCH RFC 0/1] use meson and ninja as a build system for ovs

2021-08-10 Thread Van Haaren, Harry
> -Original Message-
> From: dev  On Behalf Of William Tu
> Sent: Tuesday, August 10, 2021 1:00 AM
> To: Ilya Maximets 
> Cc:  ; Sergey Madaminov
> 
> Subject: Re: [ovs-dev] [PATCH RFC 0/1] use meson and ninja as a build system 
> for ovs
> 
> Hi Ilya,
> 
> Thanks for your feedback!
> 
> > Hi, Sergey and William.  Thanks for working on this.

Hey All,

Nice RFC! Re cloning from github & running meson commands, I had to run
the following as the file isn't renamed in upstream commit (for future readers):
$ git mv lib/dirs.c.in lib/dirs.c.in.meson

I manually decreased the required Meson version, all compiled fine after that!


> > I think that it might be a good idea to move to a different build system
> > that will not be that painful to run on Windows.  I'm not working on
> > Windows parts, but it would be great to have a fast CI that can confirm
> > that everything still working fine.
> 
> Yes, and avoiding the msys/mingw makes coding and debugging on windows
> much easier.

Generally +1 for Meson from me, my DPDK experience has been positive, as
has Meson for various smaller open source projects I've worked on.

I'm happy to contribute to the Meson build-system enabling as far as the  
CPU-ISA
enabling/AVX512 library code goes - although I think Sergey has a (commented)
implementation of the "openvswitchavx512" library in the RFC patchset.



> > However, as I already said in the discussion on GitHub, it seems to be
> > very hard to migrate our testsuite that heavily depends on autotest.
> > And without migrating the testsuite we will, probably, have to maintain
> > two different build systems to be able to run tests.  This, kind of,
> > defeats the whole purpose of the change.
> 
> Why is it very hard?
> I thought once we find a way to add tests to meson, it's just
> "mechanically" copying all these scripts into a new test framework.
> It's a lot of tedious work, but hopefully not a difficult or impossible task.
> 
> Or does current OVS autotests heavily depend on autotool/autoconf?
> I see the test cases are pretty independent.

I'm not very familiar with the tests, but today the AT_CHECK() syntax used to
write the tests is pretty autotools specific. Agree that in theory running the
commands from different test runner infrastructure shouldn't be an issue.

As I'm not familiar as others here, I'll just be quiet on the topic :)


> > Meson is a nice system in many aspects, but its support for tests is very
> > limited.  IIUC, it can only run a single binary and check the error codes.
> > Most of our tests starts several daemons and performs several fairly complex
> > operations and checks.  I'm afraid that we will end up writing our own
> > separate testing framework that will mimic autotest in order to be able to
> > run our tests from meson.
> >
> > Did you think about this problem?  Do you have a solution in mind?
> 
> Thanks, we thought about it but I don't have a solution in mind at this 
> moment.
> 
> Regards,
> William
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] conntrack: remove the nat_action_info from the conn

2021-08-10 Thread Michael Santana
On Mon, Aug 9, 2021 at 11:41 PM wenxu  wrote:
>
>
>
>
>
>
>
>
> From: Michael Santana 
> Date: 2021-08-09 23:17:06
> To:  we...@ucloud.cn
> Cc:  b...@ovn.org,Ilya Maximets 
> ,dlu...@gmail.com,d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] conntrack: remove the nat_action_info from the 
> conn>nat_info->nat_action
> >
> >On Wed, Aug 4, 2021 at 11:20 PM  wrote:
> >>
> >> From: wenxu 
> >>
> >> Only the nat_action in the nat_action_info is used for conn
> >> packet forward and other item such as min/max_ip/port only
> >> used for the new conn operation. So the whole nat_ction_info
> >> no need store in conn. This will also avoid unnecessary memory
> >> allocation.
> >This seems like a good change. There are like 20 or so instances of
> >nat_info->nat_action compared to the 6 or so instances of
> >min/max/ip/port that I could find.
> >Seems you covered all the instances that needed to be changed which is
> >good. The one thing that is not clear to me is why you changed certain
> >variable declarations to const.
>
> The nat_action_info pass to the nat_get_unique_tuple is const one, The 
> functions
> get the data from nat_action_info also should be const one. Or there are some 
> compile
> warnning.

Ok, makes sense

Acked-By Michael Santana 
>
> >
> >Otherwise LGTM
> >>
> >> Signed-off-by: wenxu 
> >> ---
> >>  lib/conntrack-private.h |   2 +-
> >>  lib/conntrack.c | 101 
> >> +---
> >>  2 files changed, 53 insertions(+), 50 deletions(-)
> >>
> >> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> >> index 169ace5..dfdf4e6 100644
> >> --- a/lib/conntrack-private.h
> >> +++ b/lib/conntrack-private.h
> >> @@ -98,7 +98,7 @@ struct conn {
> >>  struct conn_key parent_key; /* Only used for orig_tuple support. */
> >>  struct ovs_list exp_node;
> >>  struct cmap_node cm_node;
> >> -struct nat_action_info_t *nat_info;
> >> +uint16_t nat_action;
> >>  char *alg;
> >>  struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */
> >>
> >> diff --git a/lib/conntrack.c b/lib/conntrack.c
> >> index 551c206..33a1a92 100644
> >> --- a/lib/conntrack.c
> >> +++ b/lib/conntrack.c
> >> @@ -111,7 +111,8 @@ static void *clean_thread_main(void *f_);
> >>
> >>  static bool
> >>  nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
> >> - struct conn *nat_conn);
> >> + struct conn *nat_conn,
> >> + const struct nat_action_info_t *nat_info);
> >>
> >>  static uint8_t
> >>  reverse_icmp_type(uint8_t type);
> >> @@ -724,7 +725,7 @@ handle_alg_ctl(struct conntrack *ct, const struct 
> >> conn_lookup_ctx *ctx,
> >>  static void
> >>  pat_packet(struct dp_packet *pkt, const struct conn *conn)
> >>  {
> >> -if (conn->nat_info->nat_action & NAT_ACTION_SRC) {
> >> +if (conn->nat_action & NAT_ACTION_SRC) {
> >>  if (conn->key.nw_proto == IPPROTO_TCP) {
> >>  struct tcp_header *th = dp_packet_l4(pkt);
> >>  packet_set_tcp_port(pkt, conn->rev_key.dst.port, th->tcp_dst);
> >> @@ -732,7 +733,7 @@ pat_packet(struct dp_packet *pkt, const struct conn 
> >> *conn)
> >>  struct udp_header *uh = dp_packet_l4(pkt);
> >>  packet_set_udp_port(pkt, conn->rev_key.dst.port, uh->udp_dst);
> >>  }
> >> -} else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
> >> +} else if (conn->nat_action & NAT_ACTION_DST) {
> >>  if (conn->key.nw_proto == IPPROTO_TCP) {
> >>  packet_set_tcp_port(pkt, conn->rev_key.dst.port,
> >>  conn->rev_key.src.port);
> >> @@ -746,7 +747,7 @@ pat_packet(struct dp_packet *pkt, const struct conn 
> >> *conn)
> >>  static void
> >>  nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related)
> >>  {
> >> -if (conn->nat_info->nat_action & NAT_ACTION_SRC) {
> >> +if (conn->nat_action & NAT_ACTION_SRC) {
> >>  pkt->md.ct_state |= CS_SRC_NAT;
> >>  if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
> >>  struct ip_header *nh = dp_packet_l3(pkt);
> >> @@ -761,7 +762,7 @@ nat_packet(struct dp_packet *pkt, const struct conn 
> >> *conn, bool related)
> >>  if (!related) {
> >>  pat_packet(pkt, conn);
> >>  }
> >> -} else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
> >> +} else if (conn->nat_action & NAT_ACTION_DST) {
> >>  pkt->md.ct_state |= CS_DST_NAT;
> >>  if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
> >>  struct ip_header *nh = dp_packet_l3(pkt);
> >> @@ -782,7 +783,7 @@ nat_packet(struct dp_packet *pkt, const struct conn 
> >> *conn, bool related)
> >>  static void
> >>  un_pat_packet(struct dp_packet *pkt, const struct conn *conn)
> >>  {
> >> -if (conn->nat_info->nat_action & NAT_ACTION_SRC) {
> >> +if (conn->nat_action & NAT_ACTION_SRC) {
> >>  if (conn->key.nw_proto == IPPROTO_TCP) {
> >>   

Re: [ovs-dev] [PATCH] netlink-socket: Log extack error messages in netlink transactions.

2021-08-10 Thread Paolo Valerio
Hi Marcelo,

thanks for your feedback.

Marcelo Ricardo Leitner  writes:

> Hi,
>
> On Thu, Aug 05, 2021 at 11:55:51PM +0200, Paolo Valerio wrote:
>
> Now that you put these two together:
>
>> As an example, with this patch applied, the following generic message:
>> 
>> ovs|00239|netlink_socket|DBG|received NAK error=0 (Operation not supported)
>  ^^^
>> 
>> becomes:
>> 
>> ovs|00239|netlink_socket|DBG|received NAK error=0 - Conntrack isn't enabled
>  ^^^
> (more on the error below)
>
> Much better the msg!
>
> ...
>> @@ -910,15 +916,17 @@ nl_sock_transact_multiple__(struct nl_sock *sock,
>>  i = seq - base_seq;
>>  txn = transactions[i];
>>  
>> +const char *err_msg = NULL;
>>  /* Fill in the results for 'txn'. */
>> -if (nl_msg_nlmsgerr(buf_txn->reply, >error)) {
>> +if (nl_msg_nlmsgerr(buf_txn->reply, >error, _msg)) {
>> +if (txn->error) {
>> +VLOG_DBG_RL(, "received NAK error=%d - %s",
>> +error,
>
> Sounds like 'error' here should have been 'txn->error' ?
>
> error here is the return of nl_sock_recv__(). It's something else.
>
>> +err_msg ? err_msg : ovs_strerror(txn->error));
>> +}
>>  if (txn->reply) {
>>  ofpbuf_clear(txn->reply);
>>  }
>> -if (txn->error) {
>> -VLOG_DBG_RL(, "received NAK error=%d (%s)",
>> -error, ovs_strerror(txn->error));
>^
>

You're right, I think it should have been txn->error.
Thanks for noticing.

I'll resend.

>> -}
>>  } else {
>>  txn->error = 0;
>>  if (txn->reply && txn != buf_txn) {
>
> Thanks,
> Marcelo

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] Documentation: Cleanup PMD information.

2021-08-10 Thread Kevin Traynor
The 'Port/Rx Queue Assigment to PMD Threads' section has
expanded over time and now includes info about stats/commands,
manual pinning and different options for OVS assigning Rxqs to
PMDs.

Split them into different sections with sub-headings and move
the two similar paragraphs about stats together.

Rename 'Automatic assignment of Port/Rx Queue to PMD Threads'
section to 'PMD Automatic Load Balance'.

A few other minor cleanups as I was reading.

Signed-off-by: Kevin Traynor 
Acked-by: Adrian Moreno 
---

v2:
- a couple of small fixes as per Adrian's comments
- remove duplicate PMD ALB conditions paragraph
---
 Documentation/topics/dpdk/pmd.rst | 104 --
 1 file changed, 54 insertions(+), 50 deletions(-)

diff --git a/Documentation/topics/dpdk/pmd.rst 
b/Documentation/topics/dpdk/pmd.rst
index 95fa7af12..b0e2419c2 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -75,8 +75,47 @@ for enabling things like multiqueue for :ref:`physical 
`
 and :ref:`vhost-user ` interfaces.
 
-To show port/Rx queue assignment::
+Rx queues will be assigned to PMD threads by OVS, or they can be manually
+pinned to PMD threads by the user.
+
+To see the port/Rx queue assignment and current measured usage history of PMD
+core cycles for each Rx queue::
 
 $ ovs-appctl dpif-netdev/pmd-rxq-show
 
+.. note::
+
+   A history of one minute is recorded and shown for each Rx queue to allow for
+   traffic pattern spikes. Any changes in the Rx queue's PMD core cycles usage,
+   due to traffic pattern or reconfig changes, will take one minute to be fully
+   reflected in the stats.
+
+.. versionchanged:: 2.16.0
+
+   A ``overhead`` statistics is shown per PMD: it represents the number of
+   cycles inherently consumed by the OVS PMD processing loop.
+
+Rx queue to PMD assignment takes place whenever there are configuration changes
+or can be triggered by using::
+
+$ ovs-appctl dpif-netdev/pmd-rxq-rebalance
+
+.. versionchanged:: 2.6.0
+
+   The ``pmd-rxq-show`` command was added in OVS 2.6.0.
+
+.. versionchanged:: 2.9.0
+
+   Utilization-based allocation of Rx queues to PMDs and the
+   ``pmd-rxq-rebalance`` command were added in OVS 2.9.0. Prior to this,
+   allocation was round-robin and processing cycles were not taken into
+   consideration.
+
+   In addition, the output of ``pmd-rxq-show`` was modified to include
+   Rx queue utilization of the PMD as a percentage. Prior to this, tracking of
+   stats was not available.
+
+
+Port/Rx Queue assignment to PMD threads by manual pinning
+~
 Rx queues may be manually pinned to cores. This will change the default Rx
 queue assignment to PMD threads::
@@ -117,4 +156,6 @@ If using ``pmd-rxq-assign=group`` PMD threads with *pinned* 
Rxqs can be
a *non-isolated* PMD, that will remain *non-isolated*.
 
+Automatic Port/Rx Queue assignment to PMD threads
+~
 If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to PMDs
 (cores) automatically.
@@ -126,7 +167,9 @@ The algorithm used to automatically assign Rxqs to PMDs can 
be set by::
 By default, ``cycles`` assignment is used where the Rxqs will be ordered by
 their measured processing cycles, and then be evenly assigned in descending
-order to PMDs based on an up/down walk of the PMDs. For example, where there
-are five Rx queues and three cores - 3, 7, and 8 - available and the measured
-usage of core cycles per Rx queue over the last interval is seen to be:
+order to PMDs. The PMD that will be selected for a given Rxq will be the next
+one in alternating ascending/descending order based on core id. For example,
+where there are five Rx queues and three cores - 3, 7, and 8 - available and
+the measured usage of core cycles per Rx queue over the last interval is seen
+to be:
 
 - Queue #0: 30%
@@ -184,48 +227,11 @@ The Rx queues may be assigned to the cores in the 
following order::
 Core 8: P1Q0 |
 
-To see the current measured usage history of PMD core cycles for each Rx
-queue::
-
-$ ovs-appctl dpif-netdev/pmd-rxq-show
-
-.. note::
-
-   A history of one minute is recorded and shown for each Rx queue to allow for
-   traffic pattern spikes. Any changes in the Rx queue's PMD core cycles usage,
-   due to traffic pattern or reconfig changes, will take one minute to be fully
-   reflected in the stats.
-
-.. versionchanged:: 2.16.0
-
-   A ``overhead`` statistics is shown per PMD: it represents the number of
-   cycles inherently consumed by the OVS PMD processing loop.
-
-Rx queue to PMD assignment takes place whenever there are configuration changes
-or can be triggered by using::
-
-$ ovs-appctl dpif-netdev/pmd-rxq-rebalance
-
-.. versionchanged:: 2.6.0
-
-   The ``pmd-rxq-show`` command was added in OVS 2.6.0.
-
-.. versionchanged:: 2.9.0
-
-   Utilization-based allocation of Rx queues to PMDs and the
-   ``pmd-rxq-rebalance`` 

Re: [ovs-dev] [PATCH] Documentation: Cleanup PMD information.

2021-08-10 Thread Kevin Traynor
On 05/08/2021 09:32, Adrian Moreno wrote:
> 
> On 7/22/21 3:25 PM, Kevin Traynor wrote:
>> The 'Port/Rx Queue Assigment to PMD Threads' section has
>> expanded over time and now includes info about stats/commands,
>> manual pinning and different options for OVS assigning Rxqs to
>> PMDs.
>>
>> Split them into different sections with sub-headings and move
>> the two similar paragraphs about stats together.
>>
>> Rename 'Automatic assignment of Port/Rx Queue to PMD Threads'
>> section to 'PMD Automatic Load Balance'.
>>
>> A few other minor cleanups as I was reading.
>>
>> Signed-off-by: Kevin Traynor 
>> ---
>>  Documentation/topics/dpdk/pmd.rst | 101 --
>>  1 file changed, 54 insertions(+), 47 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/pmd.rst 
>> b/Documentation/topics/dpdk/pmd.rst
>> index 95fa7af12..45c24d100 100644
>> --- a/Documentation/topics/dpdk/pmd.rst
>> +++ b/Documentation/topics/dpdk/pmd.rst
>> @@ -75,8 +75,47 @@ for enabling things like multiqueue for :ref:`physical 
>> `
>>  and :ref:`vhost-user ` interfaces.
>>  
>> -To show port/Rx queue assignment::
>> +Rx queues will be assigned to PMD threads by OVS, or they can be  manually
>> +pinned to PMD threads by the user.
>> +
> 
> Nit: double space between "be" and "manually"

fixed.

> 
>> +To see the port/Rx queue assignment and current measured usage history of 
>> PMD
>> +core cycles for each Rx queue::
>>  
>>  $ ovs-appctl dpif-netdev/pmd-rxq-show
>>  
>> +.. note::
>> +
>> +   A history of one minute is recorded and shown for each Rx queue to allow 
>> for
>> +   traffic pattern spikes. Any changes in the Rx queue's PMD core cycles 
>> usage,
>> +   due to traffic pattern or reconfig changes, will take one minute to be 
>> fully
>> +   reflected in the stats.
>> +
>> +.. versionchanged:: 2.16.0
>> +
>> +   A ``overhead`` statistics is shown per PMD: it represents the number of
>> +   cycles inherently consumed by the OVS PMD processing loop.
>> +
>> +Rx queue to PMD assignment takes place whenever there are configuration 
>> changes
>> +or can be triggered by using::
>> +
>> +$ ovs-appctl dpif-netdev/pmd-rxq-rebalance
>> +
>> +.. versionchanged:: 2.6.0
>> +
>> +   The ``pmd-rxq-show`` command was added in OVS 2.6.0.
>> +
>> +.. versionchanged:: 2.9.0
>> +
>> +   Utilization-based allocation of Rx queues to PMDs and the
>> +   ``pmd-rxq-rebalance`` command were added in OVS 2.9.0. Prior to this,
>> +   allocation was round-robin and processing cycles were not taken into
>> +   consideration.
>> +
>> +   In addition, the output of ``pmd-rxq-show`` was modified to include
>> +   Rx queue utilization of the PMD as a percentage. Prior to this, tracking 
>> of
>> +   stats was not available.
>> +
>> +
>> +Port/Rx Queue assignment to PMD threads by manual pinning
>> +~
>>  Rx queues may be manually pinned to cores. This will change the default Rx
>>  queue assignment to PMD threads::
>> @@ -117,4 +156,6 @@ If using ``pmd-rxq-assign=group`` PMD threads with 
>> *pinned* Rxqs can be
>> a *non-isolated* PMD, that will remain *non-isolated*.
>>  
>> +Automatic Port/Rx Queue assignment to PMD threads
>> +~
>>  If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to 
>> PMDs
>>  (cores) automatically.
>> @@ -126,7 +167,9 @@ The algorithm used to automatically assign Rxqs to PMDs 
>> can be set by::
>>  By default, ``cycles`` assignment is used where the Rxqs will be ordered by
>>  their measured processing cycles, and then be evenly assigned in descending
>> -order to PMDs based on an up/down walk of the PMDs. For example, where there
>> -are five Rx queues and three cores - 3, 7, and 8 - available and the 
>> measured
>> -usage of core cycles per Rx queue over the last interval is seen to be:
>> +order to PMDs. The PMD that will be selected for a given Rxq will be the 
>> next
>> +one in alternating ascending/descending order based on core id. For example,
>> +where there are five Rx queues and three cores - 3, 7, and 8 - available and
>> +the measured usage of core cycles per Rx queue over the last interval is 
>> seen
>> +to be:
>>  
>>  - Queue #0: 30%
>> @@ -184,43 +227,6 @@ The Rx queues may be assigned to the cores in the 
>> following order::
>>  Core 8: P1Q0 |
>>  
>> -To see the current measured usage history of PMD core cycles for each Rx
>> -queue::
>> -
>> -$ ovs-appctl dpif-netdev/pmd-rxq-show
>> -
>> -.. note::
>> -
>> -   A history of one minute is recorded and shown for each Rx queue to allow 
>> for
>> -   traffic pattern spikes. Any changes in the Rx queue's PMD core cycles 
>> usage,
>> -   due to traffic pattern or reconfig changes, will take one minute to be 
>> fully
>> -   reflected in the stats.
>> -
>> -.. versionchanged:: 2.16.0
>> -
>> -   A ``overhead`` statistics is shown per PMD: it represents the number of
>> -   cycles inherently 

Re: [ovs-dev] [ovn] Need advice for multiple routing tables support in LR

2021-08-10 Thread Vladislav Odintsov
Hi Numan,

thanks for the answer.

You said that it’s possible to use reg2-reg7 for ipv4.
I’ve checked usage of xxreg0 and xxreg1 to find out where they’re used in 
router pipeline.

In ovn-northd.c it’s written that those ipv6-related registers are used in >= 
IP_INPUT. Am I right, its about lrouter stage "lr_in_ip_input"?
If yes, I’ve analyzed ovn-northd code and it looks like first time when xxreg0 
is used to write is lr_in_defrag. And xxreg1 is used in write mode first time 
in lr_in_ip_routing.

Can you please cofirm I am right, that I can use xxreg1’s subregisters (for 
instance, reg7) in one stage prior to lr_in_ip_routing?
If yes, I think we should fix router’s registers usage documentation:
xxreg0: ">= IP_INPUT" -> ">= DEFRAG"
xxreg1: ">= IP_INPUT" -> ">= IP_ROUTING"

Right?

Regards,
Vladislav Odintsov

> On 9 Aug 2021, at 00:09, Numan Siddique  wrote:
> 
> On Thu, Aug 5, 2021 at 1:29 PM Vladislav Odintsov  > wrote:
>> 
>> Hi,
>> 
>> I’m trying to implement multiple routing tables support for Logical Routers 
>> and met some difficulties, need help/advice.
>> 
>> How I see it can be used by administrators:
>> 
>> 1. In LRP’s options field is added key route_table with the routing table’s 
>> name - any string.
>> 2. When adding new logical router static route record, supply optional 
>> parameter - route_table.
>> 3. If route_table values in LRP and LR route match, packets coming from this 
>> particular LRP are looked up against routes with mentioned route_table value.
>> 4. Routes to directly-connected networks of LRs have higher priority that 
>> routes from route_tables.
>> 
>> So, this functionality can help achieve different routes loockup for 
>> different LR’s subnets.
>> 
>> Planned changes:
>> - add route_table field to Logical_Router_Static_Route table’s schema
>> - add new optional field to Logical_Router_Static_Routes table, which when 
>> absent indicates that route is global for LR (current behaviour), and when 
>> there is a string with route_table name
>> 
>> Regards,
>> Vladislav Odintsov
>> - add new stage in logical router ingress pipeline right before 
>> lr_in_ip_routing - lr_in_ip_routing_pre. In this stage packets are checked 
>> against inport and route_table’s id (computed somehow, uniq per route_table 
>> of LR) writes to some of OVS registers and then passes to next table - 
>> lr_in_ip_routing. Currently in my dev env it looks like this:
>> 
>>  table=9 (lr_in_ip_routing_pre), priority=100  , match=(inport == 
>> "subnet-8E3CDCC0"), action=(reg7 = 110; next;)
>>  table=9 (lr_in_ip_routing_pre), priority=100  , match=(inport == 
>> "subnet-E99CDA60"), action=(reg7 = 100; next;)
>>  table=9 (lr_in_ip_routing_pre), priority=0, match=(1), action=(next;)
>> 
>> - routes, which have route_table set, are added to lflow with match on 
>> previously set value of computed uniq route_table id, stored in OVS 
>> register. Like this:
>> 
>>  table=10(lr_in_ip_routing   ), priority=1, match=(reg7 == 100 && 
>> ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 
>> 169.254.0.1; reg1 = 169.254.0.2; eth.src = 06:00:b1:5f:12:77; outport = 
>> "vpc-B15F1277-up"; flags.loopback = 1; next;)
>>  table=10(lr_in_ip_routing   ), priority=1, match=(reg7 == 110 && 
>> ip4.dst == 0.0.0.0/0), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 
>> 169.254.0.1; reg1 = 169.254.0.2; eth.src = 06:00:b1:5f:12:77; outport = 
>> "vpc-B15F1277-up"; flags.loopback = 1; next;)
>> 
>> 
>> My question is about OVS register choice.
>> 
> 
>> I see in ovn-northd.c described OVS fields and (please correct me, if I’m 
>> wrong) it looks like there are no registers available to store route_table 
>> id? It looks for me that all registers are busy. What should I do in this 
>> particular situation? If I set reg10, for instance, ovn-controller can’t 
>> process such match and action, thus it’s an unknown symbol. If I increase 
>> the MFF_N_LOG_REGS from 10 to 12, ovn connectivity tests fail and even 
>> arp/ip connectivity drops. (see commit: 
>> https://github.com/ovn-org/ovn/commit/18e71b5240009c519dc44eb06d101b3edd9dc103
>>  
>> 
>>  
>> >  
>> >)
>> 
>> I’m new to OVN development, please help :)
>> Thanks in advance!
> 
> From your email,  I'm not clear about the new feature you're planning
> to add.  But since your question is about registers,  I guess I can
> answer your question.
> 
> Registers R0 - R9 are used by ovn-northd  and regsiters R10 - 15 are
> used by ovn-controller -
> https://github.com/ovn-org/ovn/blob/master/include/ovn/logical-fields.h#L34 
> 
> The reason why the tests failed when your increared the MFF_N_LOG_REGS
> to 12 is because R10 

Re: [ovs-dev] [PATCH] dpcls: fix build on compilers without AVX512-VPOPCNT

2021-08-10 Thread Stokes, Ian
> > This commit adds extra checks around the AVX-512 vpopcnt instruction
> > enabling, ensuring that in the function where the ISA is enabled the
> > compiler has also indicated its support for the ISA. This is achieved
> > by checking the __AVX512VPOPCNTDQ__ define, which the compiler sets if
> > it is capable of handling the vpopcnt instruction.
> >
> > If the compiler is not capable of handling vpopcnt, we fall back to
> > the emulated vpopcnt implementation.
> >
> > Reported-by: Ian Stokes 
> > Signed-off-by: Harry van Haaren 
> >
> > ---
> >
> > Based on a very old system with GCC 7, an issue was identified
> > where the compiler doesn't support the vpopcnt ISA, and resulted
> > in compilation failures.
> 
> Ping on this patch, would be good to get integrated on 2.16 and master to
> ensure Gcc7 builds correctly.

Thanks, tested OK, applied to mast and branch 2.16

Regards
Ian

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpcls: fix build on compilers without AVX512-VPOPCNT

2021-08-10 Thread Stokes, Ian
> On 8/10/21 12:11 PM, Stokes, Ian wrote:
> >>> -Original Message-
> >>> From: Van Haaren, Harry 
> >>> Sent: Thursday, July 29, 2021 5:55 PM
> >>> To: ovs-dev@openvswitch.org
> >>> Cc: Stokes, Ian ; Van Haaren, Harry
> >>> 
> >>> Subject: [PATCH] dpcls: fix build on compilers without AVX512-VPOPCNT
> >>>
> >>> This commit adds extra checks around the AVX-512 vpopcnt instruction
> >>> enabling, ensuring that in the function where the ISA is enabled the
> >>> compiler has also indicated its support for the ISA. This is achieved
> >>> by checking the __AVX512VPOPCNTDQ__ define, which the compiler sets if
> >>> it is capable of handling the vpopcnt instruction.
> >>>
> >>> If the compiler is not capable of handling vpopcnt, we fall back to
> >>> the emulated vpopcnt implementation.
> >>>
> >>> Reported-by: Ian Stokes 
> >>> Signed-off-by: Harry van Haaren 
> >>>
> >>> ---
> >>>
> >>> Based on a very old system with GCC 7, an issue was identified
> >>> where the compiler doesn't support the vpopcnt ISA, and resulted
> >>> in compilation failures.
> >>
> >> Ping on this patch, would be good to get integrated on 2.16 and master to
> >> ensure Gcc7 builds correctly.
> >
> > HI Harry,
> >
> > Just testing this now.
> >
> > Regards
> > Ian
> 
> FWIW,
> I tried to test this yesterday, but I realized that you need gcc 7.0.0
> for this, because AVX512VPOPCNTDQ is supported starting 7.0.1.
> That's a very weird system you have. :)
> 
> Unfortunately, I realized this too late when I already built 7.5.0 from
> sources and I didn't want to waste another 40 mins building 7.0.0.
> So, I carved out support for AVX512VPOPCNTDQ from gcc 7.5.0.
> In this configuration it failed to build OVS with the following error:
> 
> lib/dpif-netdev-lookup-avx512-gather.c: In function
> ‘_mm512_popcnt_epi64_wrapper’:
> lib/dpif-netdev-lookup-avx512-gather.c:62:12: error: implicit declaration of
> function ‘_mm512_popcnt_epi64’; did you mean ‘_mm512_lzcnt_epi64’? [-
> Werror=implicit-function-declaration]
>  return _mm512_popcnt_epi64(v_in);
> ^~~
> _mm512_lzcnt_epi64
> 
> With the patch applied, this modified gcc was able to build OVS
> successfully.

Thanks for checking Ilya.

I've an older system in our lab that I was able to check and reproduce the 
issue and confirm it resolves the issue.

I'll add a fixes tag and apply to master and branch 2.16.

Thanks
Ian

> 
> Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpcls: fix build on compilers without AVX512-VPOPCNT

2021-08-10 Thread Ilya Maximets
On 8/10/21 12:11 PM, Stokes, Ian wrote:
>>> -Original Message-
>>> From: Van Haaren, Harry 
>>> Sent: Thursday, July 29, 2021 5:55 PM
>>> To: ovs-dev@openvswitch.org
>>> Cc: Stokes, Ian ; Van Haaren, Harry
>>> 
>>> Subject: [PATCH] dpcls: fix build on compilers without AVX512-VPOPCNT
>>>
>>> This commit adds extra checks around the AVX-512 vpopcnt instruction
>>> enabling, ensuring that in the function where the ISA is enabled the
>>> compiler has also indicated its support for the ISA. This is achieved
>>> by checking the __AVX512VPOPCNTDQ__ define, which the compiler sets if
>>> it is capable of handling the vpopcnt instruction.
>>>
>>> If the compiler is not capable of handling vpopcnt, we fall back to
>>> the emulated vpopcnt implementation.
>>>
>>> Reported-by: Ian Stokes 
>>> Signed-off-by: Harry van Haaren 
>>>
>>> ---
>>>
>>> Based on a very old system with GCC 7, an issue was identified
>>> where the compiler doesn't support the vpopcnt ISA, and resulted
>>> in compilation failures.
>>
>> Ping on this patch, would be good to get integrated on 2.16 and master to
>> ensure Gcc7 builds correctly.
> 
> HI Harry,
> 
> Just testing this now.
> 
> Regards
> Ian

FWIW,
I tried to test this yesterday, but I realized that you need gcc 7.0.0
for this, because AVX512VPOPCNTDQ is supported starting 7.0.1.
That's a very weird system you have. :)

Unfortunately, I realized this too late when I already built 7.5.0 from
sources and I didn't want to waste another 40 mins building 7.0.0.
So, I carved out support for AVX512VPOPCNTDQ from gcc 7.5.0.
In this configuration it failed to build OVS with the following error:

lib/dpif-netdev-lookup-avx512-gather.c: In function 
‘_mm512_popcnt_epi64_wrapper’:
lib/dpif-netdev-lookup-avx512-gather.c:62:12: error: implicit declaration of 
function ‘_mm512_popcnt_epi64’; did you mean ‘_mm512_lzcnt_epi64’? 
[-Werror=implicit-function-declaration]
 return _mm512_popcnt_epi64(v_in);
^~~
_mm512_lzcnt_epi64

With the patch applied, this modified gcc was able to build OVS
successfully.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpcls: fix build on compilers without AVX512-VPOPCNT

2021-08-10 Thread Stokes, Ian
> > -Original Message-
> > From: Van Haaren, Harry 
> > Sent: Thursday, July 29, 2021 5:55 PM
> > To: ovs-dev@openvswitch.org
> > Cc: Stokes, Ian ; Van Haaren, Harry
> > 
> > Subject: [PATCH] dpcls: fix build on compilers without AVX512-VPOPCNT
> >
> > This commit adds extra checks around the AVX-512 vpopcnt instruction
> > enabling, ensuring that in the function where the ISA is enabled the
> > compiler has also indicated its support for the ISA. This is achieved
> > by checking the __AVX512VPOPCNTDQ__ define, which the compiler sets if
> > it is capable of handling the vpopcnt instruction.
> >
> > If the compiler is not capable of handling vpopcnt, we fall back to
> > the emulated vpopcnt implementation.
> >
> > Reported-by: Ian Stokes 
> > Signed-off-by: Harry van Haaren 
> >
> > ---
> >
> > Based on a very old system with GCC 7, an issue was identified
> > where the compiler doesn't support the vpopcnt ISA, and resulted
> > in compilation failures.
> 
> Ping on this patch, would be good to get integrated on 2.16 and master to
> ensure Gcc7 builds correctly.

HI Harry,

Just testing this now.

Regards
Ian

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpcls: fix build on compilers without AVX512-VPOPCNT

2021-08-10 Thread Van Haaren, Harry
> -Original Message-
> From: Van Haaren, Harry 
> Sent: Thursday, July 29, 2021 5:55 PM
> To: ovs-dev@openvswitch.org
> Cc: Stokes, Ian ; Van Haaren, Harry
> 
> Subject: [PATCH] dpcls: fix build on compilers without AVX512-VPOPCNT
> 
> This commit adds extra checks around the AVX-512 vpopcnt instruction
> enabling, ensuring that in the function where the ISA is enabled the
> compiler has also indicated its support for the ISA. This is achieved
> by checking the __AVX512VPOPCNTDQ__ define, which the compiler sets if
> it is capable of handling the vpopcnt instruction.
> 
> If the compiler is not capable of handling vpopcnt, we fall back to
> the emulated vpopcnt implementation.
> 
> Reported-by: Ian Stokes 
> Signed-off-by: Harry van Haaren 
> 
> ---
> 
> Based on a very old system with GCC 7, an issue was identified
> where the compiler doesn't support the vpopcnt ISA, and resulted
> in compilation failures.

Ping on this patch, would be good to get integrated on 2.16 and master to
ensure Gcc7 builds correctly.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev