Re: [ovs-dev] [PATCH ovn] ovn-sbctl.at: Fix timing problem of count-flows test.

2021-07-15 Thread Han Zhou
On Thu, Jul 15, 2021 at 4:57 PM Ben Pfaff  wrote:

> On Thu, Jul 15, 2021 at 04:34:51PM -0700, Han Zhou wrote:
> > Fixes: 895e02ec0be6 ("ovn-sbctl.c Add logical flows count numbers")
> > Signed-off-by: Han Zhou 
> > ---
> >  tests/ovn-sbctl.at | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at
> > index 16f5dabcc..dabf62d2b 100644
> > --- a/tests/ovn-sbctl.at
> > +++ b/tests/ovn-sbctl.at
> > @@ -200,7 +200,7 @@ Total number of logical flows = 0
> >  ])
> >
> >  # create some logical flows
> > -check ovn-nbctl ls-add count-test
> > +check ovn-nbctl --wait=sb ls-add count-test
> >
> >  OVS_WAIT_UNTIL([total_lflows=`count_entries`; test $total_lflows -ne 0])
>
> I'm surprised that the above change is needed, since OVS_WAIT_UNTIL
> should wait until the condition is true.


Yes, you are right. This is not needed, but I think it’s not harmful either.
Do you want me to send a v2 with this removed?

The second place is where this test occasionally fails.


>
> > @@ -219,7 +219,7 @@ $ingress_lflows
> >  ])
> >
> >  # add another datapath
> > -check ovn-nbctl ls-add count-test2
> > +check ovn-nbctl --wait=sb ls-add count-test2
> >
> >  # check total logical flows in 2 datapaths
> >  AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep "flows =" | awk
> 'NF>1{print $NF}'], [0], [dnl
>
> The one above makes sense to me.
>
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] controller: Avoid unnecessary load balancer flow processing.

2021-07-15 Thread Han Zhou
On Thu, Jul 15, 2021 at 5:03 PM Ben Pfaff  wrote:

> On Mon, Jul 12, 2021 at 10:08:10AM +0200, Dumitru Ceara wrote:
> > Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
> > sequence of events happens:
> >
> > 1. The Southbound Load_Balancer record is updated.
> > 2. The Southbound Datapath_Binding records on which the Load_Balancer is
> >applied are updated.
> > 3. Southbound ovsdb-server sends updates about the Load_Balancer and
> >Datapath_Binding records to ovn-controller.
> > 4. The IDL layer in ovn-controller processes the updates at #3, but
> >because of the SB schema references between tables [0] all logical
> >flows referencing the updated Datapath_Binding are marked as
> >"updated".  The same is true for Logical_DP_Group records
> >referencing the Datapath_Binding, and also for all logical flows
> >pointing to the new "updated" datapath groups.
> > 5. ovn-controller ends up recomputing (removing/readding) all flows for
> >all these tracked updates.
>
> This is kind of a weird change from my perspective.  It allows for
> broken referential integrity in the database to work around a
> performance bug in the IDL.


Yes, it did look weird and there were detailed discussions about it in the v1
reviews. Some options that would require much bigger changes were discussed
for the longer term, unless ddlog is used.




[ovs-dev] [PATCH ovn 3/4] ovn-northd: Populate in_out_port in logical_flow table's tags.

2021-07-15 Thread Han Zhou
Populate the in_out_port tag for logical switch pipeline flows wherever
possible.

Signed-off-by: Han Zhou 
---
 northd/ovn-northd.c  | 272 ++--
 northd/ovn_northd.dl | 495 +--
 tests/ovn-northd.at  |  21 ++
 3 files changed, 470 insertions(+), 318 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index ff81bf540..534bb9f97 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -4213,6 +4213,7 @@ struct ovn_lflow {
 uint16_t priority;
 char *match;
 char *actions;
+char *io_port;
 char *stage_hint;
 const char *where;
 };
@@ -4248,7 +4249,7 @@ ovn_lflow_equal(const struct ovn_lflow *a, const struct 
ovn_datapath *od,
 static void
 ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
enum ovn_stage stage, uint16_t priority,
-   char *match, char *actions, char *stage_hint,
+   char *match, char *actions, char *io_port, char *stage_hint,
const char *where)
 {
hmapx_init(&lflow->od_group);
@@ -4257,6 +4258,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, struct 
ovn_datapath *od,
 lflow->priority = priority;
 lflow->match = match;
 lflow->actions = actions;
+lflow->io_port = io_port;
 lflow->stage_hint = stage_hint;
 lflow->where = where;
 }
@@ -4274,7 +4276,7 @@ static struct hashrow_locks lflow_locks;
 static void
 do_ovn_lflow_add(struct hmap *lflow_map, struct ovn_datapath *od,
  uint32_t hash, enum ovn_stage stage, uint16_t priority,
- const char *match, const char *actions,
+ const char *match, const char *actions, const char *io_port,
  const struct ovsdb_idl_row *stage_hint,
  const char *where)
 {
@@ -4297,6 +4299,7 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct 
ovn_datapath *od,
  * one datapath in a group, so it could be hashed correctly. */
 ovn_lflow_init(lflow, NULL, stage, priority,
xstrdup(match), xstrdup(actions),
+   io_port ? xstrdup(io_port) : NULL,
ovn_lflow_hint(stage_hint), where);
hmapx_add(&lflow->od_group, od);
hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
@@ -4306,7 +4309,7 @@ do_ovn_lflow_add(struct hmap *lflow_map, struct 
ovn_datapath *od,
 static void
 ovn_lflow_add_at(struct hmap *lflow_map, struct ovn_datapath *od,
  enum ovn_stage stage, uint16_t priority,
- const char *match, const char *actions,
+ const char *match, const char *actions, const char *io_port,
  const struct ovsdb_idl_row *stage_hint, const char *where)
 {
 ovs_assert(ovn_stage_to_datapath_type(stage) == ovn_datapath_get_type(od));
@@ -4321,11 +4324,11 @@ ovn_lflow_add_at(struct hmap *lflow_map, struct 
ovn_datapath *od,
 if (use_logical_dp_groups && use_parallel_build) {
lock_hash_row(&lflow_locks, hash);
 do_ovn_lflow_add(lflow_map, od, hash, stage, priority, match,
- actions, stage_hint, where);
+ actions, io_port, stage_hint, where);
unlock_hash_row(&lflow_locks, hash);
 } else {
 do_ovn_lflow_add(lflow_map, od, hash, stage, priority, match,
- actions, stage_hint, where);
+ actions, io_port, stage_hint, where);
 }
 }
 
@@ -4333,11 +4336,27 @@ ovn_lflow_add_at(struct hmap *lflow_map, struct 
ovn_datapath *od,
 #define ovn_lflow_add_with_hint(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, \
 ACTIONS, STAGE_HINT) \
 ovn_lflow_add_at(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS, \
- STAGE_HINT, OVS_SOURCE_LOCATOR)
+ NULL, STAGE_HINT, OVS_SOURCE_LOCATOR)
+
+/* This macro is similar to ovn_lflow_add_with_hint, except that it requires
+ * the IN_OUT_PORT argument, which gives the name of the lport that appears in
+ * the MATCH, helping ovn-controller to bypass lflow parsing when the lport is
+ * not local to the chassis. The criteria for the lport to be added using this
+ * argument:
+ *
+ * - For ingress pipeline, the lport that is used to match "inport".
+ * - For egress pipeline, the lport that is used to match "outport".
+ *
+ * For now, only LS pipelines should use this macro.  */
+#define ovn_lflow_add_with_lport_and_hint(LFLOW_MAP, OD, STAGE, PRIORITY, \
+  MATCH, ACTIONS, IN_OUT_PORT, \
+  STAGE_HINT) \
+ovn_lflow_add_at(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS, \
+ IN_OUT_PORT, STAGE_HINT, OVS_SOURCE_LOCATOR)
 
 #define ovn_lflow_add(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS) \
 ovn_lflow_add_at(LFLOW_MAP, OD, STAGE, PRIORITY, MATCH, ACTIONS, \
- NULL, OVS_SOURCE_LOCATOR)
+ NULL, NULL, OVS_SOURCE_LOCATOR)
 
 static struct ovn_lflow *
 

[ovs-dev] [PATCH ovn 4/4] ovn-controller: Skip non-local lflows in ovn-controller before parsing.

2021-07-15 Thread Han Zhou
With the help of logical_flow's in_out_port tag, we can skip parsing a
big portion of the logical flows in the SB DB, which can greatly improve
ovn-controller's performance whenever a full recompute is required.

With a scale test topology of 1000 chassis, 20 LSPs per chassis, 20k
lports in total spread across 200 logical switches, connected by a
logical router, the test results before & after this change:

Before:
- lflow-cache disabled:
- ovn-controller recompute: 2.7 sec
- lflow-cache enabled:
- ovn-controller recompute: 2.1 sec
- lflow cache memory: 622103 KB

After:
- lflow-cache disabled:
- ovn-controller recompute: 0.83 sec
- lflow-cache enabled:
- ovn-controller recompute: 0.71 sec
- lflow cache memory: 123641 KB

(note: DP group enabled for both)

So for this test scenario, when lflow cache is disabled, latency is reduced
by ~70%; when lflow cache is enabled, latency is reduced by ~65%, and lflow
cache memory is reduced by ~80%.

Signed-off-by: Han Zhou 
---
 controller/lflow.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/controller/lflow.c b/controller/lflow.c
index c58c4f25c..871d3c54d 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -740,6 +740,27 @@ consider_logical_flow__(const struct sbrec_logical_flow 
*lflow,
 return true;
 }
 
+const char *io_port = smap_get(&lflow->tags, "in_out_port");
+if (io_port) {
+lflow_resource_add(l_ctx_out->lfrr, REF_TYPE_PORTBINDING, io_port,
+   &lflow->header_.uuid);
+const struct sbrec_port_binding *pb
+= lport_lookup_by_name(l_ctx_in->sbrec_port_binding_by_name,
+   io_port);
+if (!pb) {
+VLOG_DBG("lflow "UUID_FMT" matches inport/outport %s that's not "
+ "found, skip", UUID_ARGS(>header_.uuid), io_port);
+return true;
+}
+char buf[16];
+get_unique_lport_key(dp->tunnel_key, pb->tunnel_key, buf, sizeof buf);
+if (!sset_contains(l_ctx_in->related_lport_ids, buf)) {
+VLOG_DBG("lflow "UUID_FMT" matches inport/outport %s that's not "
+ "local, skip", UUID_ARGS(>header_.uuid), io_port);
+return true;
+}
+}
+
 /* Determine translation of logical table IDs to physical table IDs. */
 bool ingress = !strcmp(lflow->pipeline, "ingress");
 
-- 
2.30.2
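
The early-return logic added to consider_logical_flow__() above can be summarized in a short sketch. The tag semantics follow the hunk, but the helper names and the "dpkey_portkey" string format (mimicking get_unique_lport_key()) are assumptions:

```python
# Hedged sketch of the new skip decision in consider_logical_flow__().
# Data structures are plain dicts/sets standing in for the IDL rows and
# ssets; the unique-key format is assumed, not taken from the source.

def should_skip(tags, port_bindings, related_lport_ids, dp_key):
    io_port = tags.get("in_out_port")
    if io_port is None:
        return False                    # untagged flows are always parsed
    pb = port_bindings.get(io_port)
    if pb is None:
        return True                     # port unknown: flow cannot be local
    unique_key = "%d_%d" % (dp_key, pb["tunnel_key"])
    return unique_key not in related_lport_ids   # skip if lport not local

bindings = {"lsp1": {"tunnel_key": 5}}
print(should_skip({"in_out_port": "lsp1"}, bindings, {"3_5"}, 3))  # prints False
```

Note that the real code also registers the port-binding reference before skipping, so the flow is reprocessed if the lport later becomes local.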



[ovs-dev] [PATCH ovn 2/4] ovn-sb: Add tags column to logical_flow table of the SB DB.

2021-07-15 Thread Han Zhou
The column will provide information to help improve the efficiency of
ovn-controller's lflow parsing.

Signed-off-by: Han Zhou 
---
 ovn-sb.ovsschema |  7 +--
 ovn-sb.xml   | 23 +++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index 73ef33467..40db4bba2 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -1,7 +1,7 @@
 {
 "name": "OVN_Southbound",
-"version": "20.19.0",
-"cksum": "4027775051 26386",
+"version": "20.20.0",
+"cksum": "2652993555 26538",
 "tables": {
 "SB_Global": {
 "columns": {
@@ -109,6 +109,9 @@
   "maxInteger": 65535}}},
 "match": {"type": "string"},
 "actions": {"type": "string"},
+"tags": {
+"type": {"key": "string", "value": "string",
+ "min": 0, "max": "unlimited"}},
 "external_ids": {
 "type": {"key": "string", "value": "string",
  "min": 0, "max": "unlimited"}}},
diff --git a/ovn-sb.xml b/ovn-sb.xml
index 69de4551b..a39778ee0 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -2441,6 +2441,29 @@ tcp.flags = RST;
   
 
 
+
+  Key-value pairs that provide additional information to help
+  ovn-controller process the logical flow. Below are the tags used
+  by ovn-controller.
+
+  
+in_out_port
+
+  In the logical flow's "match" column, if a logical port P is
+  compared with "inport" and the logical flow is on a logical switch
+  ingress pipeline, or if P is compared with "outport" and the
+  logical flow is on a logical switch egress pipeline, and the
+  expression is combined with other expressions (if any) using the
+  operator &&, then the port P should be added as the value in
+  this tag. If there are multiple logical ports meeting these criteria,
+  one of them can be added. ovn-controller uses this information to
+  skip parsing flows that are not needed on the chassis. Failing to add
+  the tag will affect efficiency, while adding a wrong value will affect
+  correctness.
+
+  
+
+
 
   Human-readable name for this flow's stage in the pipeline.
 
-- 
2.30.2



[ovs-dev] [PATCH ovn 1/4] ovn-northd.at: Minor improvement for the dp group test case.

2021-07-15 Thread Han Zhou
When counting lsp-specific flows, use the "table" output format for
ovn-sbctl to make sure each record is counted at most once.

Signed-off-by: Han Zhou 
---
 tests/ovn-northd.at | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 11461d3f4..5dc910f13 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -2486,7 +2486,7 @@ check_row_count Logical_DP_Group 0
 
 dnl Number of logical flows that depends on logical switch or multicast group.
 dnl These will not be combined.
-n_flows_specific=$(ovn-sbctl --bare find Logical_Flow | grep -cE 'swp')
+n_flows_specific=$(ovn-sbctl -f table find Logical_Flow | grep -cE 'swp')
 echo "Number of specific flows: "${n_flows_specific}
 
 dnl Both logical switches configured identically, so there should be same
-- 
2.30.2



[ovs-dev] [PATCH ovn 0/4] Avoid parsing non-local lflows with the help of tags in SB.

2021-07-15 Thread Han Zhou
With the help of logical_flow's in_out_port tag, we can skip parsing a
big portion of the logical flows in the SB DB, which can greatly improve
ovn-controller's performance whenever a full recompute is required.

With a scale test topology of 1000 chassis, 20 LSPs per chassis, 20k
lports in total spread across 200 logical switches, connected by a
logical router, the test results before & after this change:

Before:
- lflow-cache disabled:
- ovn-controller recompute: 2.7 sec
- lflow-cache enabled:
- ovn-controller recompute: 2.1 sec
- lflow cache memory: 622103 KB

After:
- lflow-cache disabled:
- ovn-controller recompute: 0.83 sec
- lflow-cache enabled:
- ovn-controller recompute: 0.71 sec
- lflow cache memory: 123641 KB

(note: DP group enabled for both)

So for this test scenario, when lflow cache is disabled, latency is reduced
by ~70%; when lflow cache is enabled, latency is reduced by ~65%, and lflow
cache memory is reduced by ~80%.

Changes after the RFC patch:
- Rebase on master
- Add ddlog changes
- Add an ovn-northd test case to make sure the tags are added

Han Zhou (4):
  ovn-northd.at: Minor improvement for the dp group test case.
  ovn-sb: Add tags column to logical_flow table of the SB DB.
  ovn-northd: Populate in_out_port in logical_flow table's tags.
  ovn-controller: Skip non-local lflows in ovn-controller before
parsing.

 controller/lflow.c   |  21 ++
 northd/ovn-northd.c  | 272 ++--
 northd/ovn_northd.dl | 495 +--
 ovn-sb.ovsschema |   7 +-
 ovn-sb.xml   |  23 ++
 tests/ovn-northd.at  |  23 +-
 6 files changed, 520 insertions(+), 321 deletions(-)

-- 
2.30.2



Re: [ovs-dev] [PATCH ovn v2] controller: Avoid unnecessary load balancer flow processing.

2021-07-15 Thread Ben Pfaff
On Mon, Jul 12, 2021 at 10:08:10AM +0200, Dumitru Ceara wrote:
> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
> sequence of events happens:
> 
> 1. The Southbound Load_Balancer record is updated.
> 2. The Southbound Datapath_Binding records on which the Load_Balancer is
>applied are updated.
> 3. Southbound ovsdb-server sends updates about the Load_Balancer and
>Datapath_Binding records to ovn-controller.
> 4. The IDL layer in ovn-controller processes the updates at #3, but
>because of the SB schema references between tables [0] all logical
>flows referencing the updated Datapath_Binding are marked as
>"updated".  The same is true for Logical_DP_Group records
>referencing the Datapath_Binding, and also for all logical flows
>pointing to the new "updated" datapath groups.
> 5. ovn-controller ends up recomputing (removing/readding) all flows for
>all these tracked updates.

This is kind of a weird change from my perspective.  It allows for
broken referential integrity in the database to work around a
performance bug in the IDL.


Re: [ovs-dev] [PATCH] Optimize the poll loop for poll_immediate_wake()

2021-07-15 Thread Ben Pfaff
I'm impressed by the improvement.  I didn't really expect any.

On Tue, Jul 13, 2021 at 09:19:34AM +0100, Anton Ivanov wrote:
> I ran the revised patch series (v2) which addresses your comments on the 
> ovn-heater benchmark.
> 
> With 9 fake nodes, 50 fake pods per node I get ~ 2% reproducible improvement. 
> IMHO it should be even more noticeable at high scale.
> 
> Best regards,
> 
> A
> 
> On 09/07/2021 19:45, Ben Pfaff wrote:
> > On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.iva...@cambridgegreys.com 
> > wrote:
> > > From: Anton Ivanov 
> > > 
> > > If we are not obtaining any useful information out of the poll(),
> > > such as is a fd busy or not, we do not need to do a poll() if
> > > an immediate_wake() has been requested.
> > > 
> > > This cuts out all the pollfd hash additions, forming the poll
> > > arguments and the actual poll() after a call to
> > > poll_immediate_wake()
> > > 
> > > Signed-off-by: Anton Ivanov 
> > Thanks for the patch.
> > 
> > I think that this will have some undesirable side effects because it
> > avoids calling time_poll() if the wakeup should happen immediately, and
> > time_poll() does some thing that we always want to happen between one
> > main loop and another.  In particular the following calls from
> > time_poll() are important:
> > 
> >  coverage_clear();
> >  coverage_run();
> >  if (*last_wakeup && !thread_is_pmd()) {
> >  log_poll_interval(*last_wakeup);
> >  }
> > 
> > ...
> > 
> >  if (!time_left) {
> >  ovsrcu_quiesce();
> >  } else {
> >  ovsrcu_quiesce_start();
> >  }
> > 
> > ...
> > 
> >  if (deadline <= time_msec()) {
> > #ifndef _WIN32
> >  fatal_signal_handler(SIGALRM);
> > #else
> >  VLOG_ERR("wake up from WaitForMultipleObjects after deadline");
> >  fatal_signal_handler(SIGTERM);
> > #endif
> >  if (retval < 0) {
> >  retval = 0;
> >  }
> >  break;
> >  }
> > 
> > ...
> > 
> >  *last_wakeup = time_msec();
> >  refresh_rusage();
> > 
> > Instead of this change, I'd suggest something more like the following,
> > along with the changes you made to suppress the file descriptors if the
> > timeout is already zero:
> > 
> > diff --git a/lib/timeval.c b/lib/timeval.c
> > index 193c7bab1781..f080a742 100644
> > --- a/lib/timeval.c
> > +++ b/lib/timeval.c
> > @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE 
> > *handles OVS_UNUSED,
> >   }
> >   #ifndef _WIN32
> > -retval = poll(pollfds, n_pollfds, time_left);
> > +retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0;
> >   if (retval < 0) {
> >   retval = -errno;
> >   }
> > 
> > 
> -- 
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/
> 
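
Ben's suggestion boils down to: keep all of time_poll()'s per-iteration bookkeeping (coverage, RCU quiescing, rusage), but skip the poll() syscall, and the work of building its pollfd set, when the timeout is already zero. A toy model with invented names:

```python
# Hedged sketch of the suggested change: the bookkeeping always runs, and
# poll() is elided when an immediate wake made the timeout zero.  A real
# poll() with timeout 0 returns immediately anyway; the saving is the
# syscall and the pollfd-array construction that precede it.

def time_poll(poll_fn, time_left, bookkeeping):
    bookkeeping("coverage/RCU/rusage")        # always runs, wake or not
    return poll_fn(time_left) if time_left else 0

calls = []
r = time_poll(lambda t: calls.append(t) or 1, 0, lambda tag: calls.append(tag))
print(r, calls)   # prints: 0 ['coverage/RCU/rusage']  (poll_fn never invoked)
```

This preserves the side effects Ben lists (coverage_clear/run, ovsrcu_quiesce, the SIGALRM deadline check, last_wakeup/rusage refresh) while still cutting the poll overhead that the original patch targeted.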


Re: [ovs-dev] [PATCH ovn] ovn-sbctl.at: Fix timing problem of count-flows test.

2021-07-15 Thread Ben Pfaff
On Thu, Jul 15, 2021 at 04:34:51PM -0700, Han Zhou wrote:
> Fixes: 895e02ec0be6 ("ovn-sbctl.c Add logical flows count numbers")
> Signed-off-by: Han Zhou 
> ---
>  tests/ovn-sbctl.at | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at
> index 16f5dabcc..dabf62d2b 100644
> --- a/tests/ovn-sbctl.at
> +++ b/tests/ovn-sbctl.at
> @@ -200,7 +200,7 @@ Total number of logical flows = 0
>  ])
>  
>  # create some logical flows
> -check ovn-nbctl ls-add count-test
> +check ovn-nbctl --wait=sb ls-add count-test
>  
>  OVS_WAIT_UNTIL([total_lflows=`count_entries`; test $total_lflows -ne 0])

I'm surprised that the above change is needed, since OVS_WAIT_UNTIL
should wait until the condition is true.

> @@ -219,7 +219,7 @@ $ingress_lflows
>  ])
>  
>  # add another datapath
> -check ovn-nbctl ls-add count-test2
> +check ovn-nbctl --wait=sb ls-add count-test2
>  
>  # check total logical flows in 2 datapaths
>  AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep "flows =" | awk 'NF>1{print 
> $NF}'], [0], [dnl

The one above makes sense to me.




Re: [ovs-dev] ovn-northd-ddlog - high mem and cpu usage when started with an existing DB

2021-07-15 Thread Ben Pfaff
On Mon, Jul 12, 2021 at 04:42:27PM -0700, Ben Pfaff wrote:
> On Thu, Jul 08, 2021 at 08:59:24PM +0200, Dumitru Ceara wrote:
> > Hi Ben,
> > 
> > As discussed earlier, during the OVN meeting, I've noticed a new
> > performance issue with ovn-northd-ddlog when running it against a
> > database from one of our more recent scale tests:
> > 
> > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210708/ovnnb_db.db
> > 
> > ovn-northd-ddlog uses 100% CPU and never really reaches the point to
> > perform the first transaction to the Southbound.  Memory usage is also
> > very high, I stopped it at 45GB RSS.
> > 
> > To test I did:
> > SANDBOXFLAGS="--nbdb-source=/tmp/ovnnb_db.db --ddlog" make sandbox
> 
> Thanks.  I've been spending a lot of time with this Friday and today.
> It is a bit different from the other issues I've looked at.  The
> previous ones were inefficient production of relatively small output.
> This one is inefficient production (and storage) of rather large output
> (millions of flows).  I'm trying to get help from Leonid on how to
> reduce the memory usage.

Leonid has been looking into this and we're going to talk through a
solution tomorrow.  With luck, I'll have some patches soon after that.


Re: [ovs-dev] [PATCH ovn v2 branch-21.06] Don't suppress localport traffic directed to external port

2021-07-15 Thread Numan Siddique
On Thu, Jul 15, 2021 at 4:34 PM Ihar Hrachyshka  wrote:
>
> Recently, we stopped leaking localport traffic through localnet ports
> into fabric to avoid unnecessary flipping between chassis hosting the
> same localport.
>
> Despite the type name, in some scenarios localports are supposed to
> talk outside the hosting chassis. Specifically, in OpenStack [1]
> metadata service for SR-IOV ports is implemented as a localport hosted
> on another chassis that is exposed to the chassis owning the SR-IOV
> port through an "external" port. In this case, "leaking" localport
> traffic into fabric is desirable.
>
> This patch inserts a higher priority flow per external port on the
> same datapath that avoids dropping localport traffic.
>
> The backport includes custom branch-21.06 specific physical flows
> cleanup for localnet ports when an external port is modified / deleted.
> This was not needed in master branch because of separated pflow
> management.
>
> Fixes: 96959e56d634 ("physical: do not forward traffic from localport
> to a localnet one")
>
> [1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1974062
>
> Signed-off-by: Ihar Hrachyshka 
> Signed-off-by: Numan Siddique 
> (cherry picked from commit 1148580290d0ace803f20aeaa0241dd51c100630)

Thanks Ihar.  I applied this patch to branch-21.06.

Regards
Numan

>
> ---
>
> v1: initial backport.
> v2: properly handle binding updates through inc engine instead of
> forcing full recompute.
>
> new test case
> ---
>  controller/binding.c| 40 +++--
>  controller/binding.h|  2 +
>  controller/ovn-controller.c | 11 +
>  controller/ovn-controller.h |  2 +
>  controller/physical.c   | 66 +---
>  controller/physical.h   |  2 +
>  tests/ovn.at| 85 +
>  7 files changed, 198 insertions(+), 10 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 594babc98..ba558efdb 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -22,6 +22,7 @@
>  #include "patch.h"
>
>  #include "lib/bitmap.h"
> +#include "lib/hmapx.h"
>  #include "openvswitch/poll-loop.h"
>  #include "lib/sset.h"
>  #include "lib/util.h"
> @@ -108,6 +109,7 @@ add_local_datapath__(struct ovsdb_idl_index 
> *sbrec_datapath_binding_by_key,
>  hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
>  ld->datapath = datapath;
>  ld->localnet_port = NULL;
> +shash_init(&ld->external_ports);
>  ld->has_local_l3gateway = has_local_l3gateway;
>
>  if (tracked_datapaths) {
> @@ -474,6 +476,18 @@ is_network_plugged(const struct sbrec_port_binding 
> *binding_rec,
>  return network ? !!shash_find_data(bridge_mappings, network) : false;
>  }
>
> +static void
> +update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
> + struct hmap *local_datapaths)
> +{
> +struct local_datapath *ld = get_local_datapath(
> +local_datapaths, binding_rec->datapath->tunnel_key);
> +if (ld) {
> +shash_replace(&ld->external_ports, binding_rec->logical_port,
> +  binding_rec);
> +}
> +}
> +
>  static void
>  update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
>  struct shash *bridge_mappings,
> @@ -1631,8 +1645,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> binding_ctx_out *b_ctx_out)
>  !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
>
>  struct ovs_list localnet_lports = OVS_LIST_INITIALIZER(&localnet_lports);
> +struct ovs_list external_lports = OVS_LIST_INITIALIZER(&external_lports);
>
> -struct localnet_lport {
> +struct lport {
>  struct ovs_list list_node;
>  const struct sbrec_port_binding *pb;
>  };
> @@ -1680,11 +1695,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> binding_ctx_out *b_ctx_out)
>
>  case LP_EXTERNAL:
>  consider_external_lport(pb, b_ctx_in, b_ctx_out);
> +struct lport *ext_lport = xmalloc(sizeof *ext_lport);
> +ext_lport->pb = pb;
> +ovs_list_push_back(&external_lports, &ext_lport->list_node);
>  break;
>
>  case LP_LOCALNET: {
>  consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
> -struct localnet_lport *lnet_lport = xmalloc(sizeof *lnet_lport);
> +struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
>  lnet_lport->pb = pb;
>  ovs_list_push_back(&localnet_lports, &lnet_lport->list_node);
>  break;
> @@ -1711,7 +1729,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
> binding_ctx_out *b_ctx_out)
>  /* Run through each localnet lport list to see if it is a localnet port
>   * on local datapaths discovered from above loop, and update the
>   * corresponding local datapath accordingly. */
> -struct localnet_lport *lnet_lport;
> +struct 

Re: [ovs-dev] [PATCH v2] bond: Fix broken rebalancing after link state changes.

2021-07-15 Thread Ben Pfaff
On Tue, Jul 13, 2021 at 05:32:06PM +0200, Ilya Maximets wrote:
> There are 3 constraints for moving hashes from one member to another:
> 
>   1. The load difference exceeds ~ 3% of the load of one member.
>   2. The difference in load between members exceeds 100,000 bytes.
>   3. Moving the hash reduces the load difference by more than 10%.
> 
> In the current implementation, if one of the members transitions to
> the DOWN state, all hashes assigned to it will be moved to the other
> members.  After that, if the member goes UP, it will wait for
> rebalancing to get hashes.  But in case we have more than 10 equally
> loaded hashes, it will never meet constraint # 3, because each hash
> will handle less than 10% of the load.  The situation gets worse when
> the number of flows grows and it is almost impossible to transfer any
> hash when all 256 hash records are used, which is very likely when we
> have few hundred/thousand flows.
> 
> As a result, if one of the members goes down and back up while traffic
> flows, it will never be used to transmit packets again.  This will not
> be fixed even if we completely stop the traffic and start it again,
> because the first two constraints will block rebalancing in the
> earlier stages, while we have low traffic volume.
> 
> Moving a single hash if the destination does not have any hashes,
> as it was before commit c460a6a7bc75 ("ofproto/bond: simplifying the
> rebalancing logic"), will not help, because a single hash is not
> enough to make the difference in load less than 10% of the total load,
> and this member will handle only that one hash forever.
> 
> To fix this, let's try to move multiple hashes at the same time to
> meet constraint # 3.
> 
> The implementation includes sorting the "records" to be able to
> collect records with a cumulative load close enough to the ideal value.
> 
> Signed-off-by: Ilya Maximets 

I reread this and it still looks good to me.

I spotted one typo in a comment:

diff --git a/ofproto/bond.c b/ofproto/bond.c
index b9bfa45493b8..c3e2083575b0 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -1216,7 +1216,7 @@ choose_entries_to_migrate(const struct bond_member *from, 
uint64_t to_tx_bytes,
 }
 
 if (!cnt) {
-/* There is no entry which load less than or equal to 'ideal_delta'.
+/* There is no entry with load less than or equal to 'ideal_delta'.
  * Lets try closest one. The closest is the last in sorted list. */
 struct bond_entry *closest;
 

Acked-by: Ben Pfaff 
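
The multi-hash selection the commit message describes, collecting records whose cumulative load lands close to the ideal delta so that constraint #3 can be met, can be approximated with a greedy sketch. This is an illustration, not the bond.c algorithm verbatim:

```python
# Hedged sketch: pick a set of hash entries whose combined load approaches
# "ideal_delta" (the load we want to move), instead of requiring a single
# hash to reduce the imbalance by >10% on its own.

def choose_entries_to_migrate(loads, ideal_delta):
    """loads: per-hash tx byte counts on the overloaded member."""
    chosen, total = [], 0
    # Walk entries from heaviest to lightest, greedily filling the budget.
    for i, load in sorted(enumerate(loads), key=lambda x: -x[1]):
        if total + load <= ideal_delta:
            chosen.append(i)
            total += load
    return chosen, total

# Ten equally loaded hashes, each only 5% of the target delta of 500:
print(choose_entries_to_migrate([50] * 10, 500))
```

With ten equal 50-byte hashes and a 500-byte target, all ten are selected, which is exactly the case the commit message says a single-hash move could never fix.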


[ovs-dev] [PATCH ovn] ovn-sbctl.at: Fix timing problem of count-flows test.

2021-07-15 Thread Han Zhou
Fixes: 895e02ec0be6 ("ovn-sbctl.c Add logical flows count numbers")
Signed-off-by: Han Zhou 
---
 tests/ovn-sbctl.at | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at
index 16f5dabcc..dabf62d2b 100644
--- a/tests/ovn-sbctl.at
+++ b/tests/ovn-sbctl.at
@@ -200,7 +200,7 @@ Total number of logical flows = 0
 ])
 
 # create some logical flows
-check ovn-nbctl ls-add count-test
+check ovn-nbctl --wait=sb ls-add count-test
 
 OVS_WAIT_UNTIL([total_lflows=`count_entries`; test $total_lflows -ne 0])
 
@@ -219,7 +219,7 @@ $ingress_lflows
 ])
 
 # add another datapath
-check ovn-nbctl ls-add count-test2
+check ovn-nbctl --wait=sb ls-add count-test2
 
# check total logical flows in 2 datapaths
 AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep "flows =" | awk 'NF>1{print 
$NF}'], [0], [dnl
-- 
2.30.2
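
For context on the review discussion: OVS_WAIT_UNTIL retries its condition until a timeout, which is why the first hunk's `--wait=sb` turned out to be unnecessary, while the second check (AT_CHECK_UNQUOTED) runs exactly once and genuinely needs northd to have committed to the SB DB first. A rough Python analog of the retry helper, with invented names:

```python
# Hedged sketch of an OVS_WAIT_UNTIL-style helper: poll a condition until
# it holds or a deadline passes.  Timings here are arbitrary.

import time

def wait_until(condition, timeout_s=5.0, interval_s=0.01):
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval_s)
    return False

flows = []
flows.extend(["lflow"] * 3)                 # stand-in for northd committing
print(wait_until(lambda: len(flows) > 0))   # prints True
```

A one-shot check has no such loop, so it races with northd unless the preceding command waits for the SB commit.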



Re: [ovs-dev] [PATCH ovn 2/2] tutorial: Remove OVS-specific files.

2021-07-15 Thread Ben Pfaff
On Tue, Jul 13, 2021 at 12:17:48PM -0400, Numan Siddique wrote:
> On Fri, Jul 2, 2021 at 5:36 PM Ben Pfaff  wrote:
> >
> > These were part of the OVS tutorial, which isn't relevant for OVN.
> >
> > Signed-off-by: Ben Pfaff 
> 
> Acked-by: Numan Siddique 

Thanks for the reviews.  I pushed these commits.


Re: [ovs-dev] [PATCH net-next v2] openvswitch: Introduce per-cpu upcall dispatch

2021-07-15 Thread Pravin Shelar
On Thu, Jul 15, 2021 at 5:28 AM Mark Gray  wrote:
>
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
>
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
>
> * On systems with a large number of vports, there is a correspondingly
> large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
>
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
>
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
>
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
>
> The corresponding user space code can be found at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html
>
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> Acked-by: Flavio Leitner 

Acked-by: Pravin B Shelar 

Thanks,
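A minimal sketch of the per-CPU dispatch described in this patch: user space negotiates one Netlink PID per CPU, and the upcall path indexes that array with the CPU executing the upcall. The names and types below are illustrative assumptions, not the actual kernel or OVS symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* One Netlink PID per CPU, negotiated by user space (the kernel side
 * receives these via OVS_DP_ATTR_PER_CPU_PIDS in the real patch). */
struct per_cpu_pids {
    size_t n_pids;          /* number of negotiated PIDs */
    const uint32_t *pids;   /* pids[i] serves upcalls executed on CPU i */
};

/* Pick the Netlink socket (identified by PID) for the executing CPU. */
static uint32_t
upcall_portid_for_cpu(const struct per_cpu_pids *p, unsigned int cpu)
{
    if (!p || !p->n_pids) {
        return 0;           /* no socket registered: upcall cannot be sent */
    }
    /* Wrap around in case there are more CPUs than negotiated PIDs. */
    return p->pids[cpu % p->n_pids];
}
```

Because the socket is chosen by executing CPU rather than by vport, socket count scales with CPUs, and packets of one flow — steered to a single CPU by RSS — keep their order and wake only one handler thread, which is exactly points a)–c) above.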


Re: [ovs-dev] [PATCH v3 0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

2021-07-15 Thread Ilya Maximets
On 7/15/21 3:32 PM, Dumitru Ceara wrote:
> Hi Ilya,
> 
> On 7/14/21 6:52 PM, Ilya Maximets wrote:
>> On 7/14/21 3:50 PM, Ilya Maximets wrote:
>>> Replication can be used to scale out read-only access to the database.
>>> But there are clients that are not read-only, but read-mostly.
>>> One of the main examples is ovn-controller that mostly monitors
>>> updates from the Southbound DB, but needs to claim ports by sending
>>> transactions that changes some database tables.
>>>
>>> Southbound database serves lots of connections: all connections
>>> from ovn-controllers and some service connections from cloud
>>> infrastructure, e.g. some OpenStack agents are monitoring updates.
>>> At a high scale and with a big size of the database ovsdb-server
>>> spends too much time processing monitor updates and it's required
>>> to move this load somewhere else.  This patch-set aims to introduce
>>> required functionality to scale out read-mostly connections by
>>> introducing a new OVSDB 'relay' service model.
>>>
>>> In this new service model ovsdb-server connects to existing OVSDB
>>> server and maintains in-memory copy of the database.  It serves
>>> read-only transactions and monitor requests by its own, but forwards
>>> write transactions to the relay source.
>>>
>>> Key differences from the active-backup replication:
>>> - support for "write" transactions.
>>> - no on-disk storage. (probably, faster operation)
>>> - support for multiple remotes (connect to the clustered db).
>>> - doesn't try to keep connection as long as possible, but
>>>   faster reconnects to other remotes to avoid missing updates.
>>> - No need to know the complete database schema beforehand,
>>>   only the schema name.
>>> - can be used along with other standalone and clustered databases
>>>   by the same ovsdb-server process. (doesn't turn the whole
>>>   jsonrpc server to read-only mode)
>>> - supports modern version of monitors (monitor_cond_since),
>>>   because based on ovsdb-cs.
>>> - could be chained, i.e. multiple relays could be connected
>>>   one to another in a row or in a tree-like form.
>>>
>>> Bringing all above functionality to the existing active-backup
>>> replication doesn't look right as it will make it less reliable
>>> for the actual backup use case, and this also would be much
>>> harder from the implementation point of view, because current
>>> replication code is not based on ovsdb-cs or idl and all the required
>>> features would be likely duplicated or replication would be fully
>>> re-written on top of ovsdb-cs with severe modifications of the former.
>>>
>>> Relay is somewhere in the middle between active-backup replication and
>>> the clustered model taking a lot from both, therefore is hard to
>>> implement on top of any of them.
>>>
>>> To run ovsdb-server in relay mode, user need to simply run:
>>>
>>>   ovsdb-server --remote=punix:db.sock relay::
>>>
>>> e.g.
>>>
>>>   ovsdb-server --remote=punix:db.sock 
>>> relay:OVN_Southbound:tcp:127.0.0.1:6642
>>>
>>> More details and examples in the documentation in the last patch
>>> of the series.
>>>
>>> I actually tried to implement transaction forwarding on top of
>>> active-backup replication in v1 of this seies, but it required
>>> a lot of tricky changes, including schema format changes in order
>>> to bring required information to the end clients, so I decided
>>> to fully rewrite the functionality in v2 with a different approach.
>>>
>>>
>>>  Testing
>>>  ===
>>>
>>> Some scale tests were performed with OVSDB Relays that mimics OVN
>>> workloads with ovn-kubernetes.
>>> Tests performed with ovn-heater (https://github.com/dceara/ovn-heater)
>>> on scenario ocp-120-density-heavy:
>>>  
>>> https://github.com/dceara/ovn-heater/blob/master/test-scenarios/ocp-120-density-heavy.yml
>>> In short, the test gradually creates a lot of OVN resources and
>>> checks that the network is configured correctly (by pinging different
>>> namespaces).  The test includes 120 chassis (created by
>>> ovn-fake-multinode), 31250 LSPs spread evenly across 120 LSes, 3 LBs
>>> with 15625 VIPs each, attached to all node LSes, etc.  Test performed
>>> with monitor-all=true.
>>>
>>> Note 1:
>>>  - Memory consumption is checked at the end of a test in a following
>>>way: 1) check RSS 2) compact database 3) check RSS again.
>>>It's observed that ovn-controllers in this test are fairly slow
>>>and backlog builds up on monitors, because ovn-controllers are
>>>not able to receive updates fast enough.  This contributes to
>>>RSS of the process, especially in combination of glibc bug (glibc
>>>doesn't free fastbins back to the system).  Memory trimming on
>>>compaction is enabled in the test, so after compaction we can
>>>see more or less real value of the RSS at the end of the test
>>>without backlog noise. (Compaction on relay in this case is
>>>just plain malloc_trim()).
>>>
>>> Note 2:
>>>  - I didn't collect memory consumption (RSS) after compaction 
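A minimal sketch of the relay's core routing decision described earlier in this thread: monitors and read-only transactions are answered from the relay's in-memory copy, while mutating transactions are forwarded to the relay source. All names are invented for illustration; this is not the actual ovsdb-server code.

```c
#include <stdbool.h>

/* Sketch only: the relay keeps an in-memory copy of the database, so it
 * can answer read-only transactions and monitor requests itself, but any
 * transaction that mutates data must reach the relay source, which
 * remains the single place where writes are committed. */
enum txn_route {
    TXN_SERVE_LOCAL,    /* answer from the relay's in-memory copy */
    TXN_FORWARD         /* send to the relay source, relay the reply back */
};

static enum txn_route
relay_route_txn(bool txn_has_writes, bool is_monitor_request)
{
    if (is_monitor_request || !txn_has_writes) {
        return TXN_SERVE_LOCAL;
    }
    return TXN_FORWARD;
}
```

Because a relay can itself be another relay's source ("could be chained", as noted above), this decision composes: a write entering a chain of relays is forwarded hop by hop until it reaches the actual database.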

Re: [ovs-dev] [PATCH v3 2/3] dpdk: Remove default values for socket-mem and limit.

2021-07-15 Thread 0-day Robot
Bleep bloop.  Greetings Rosemarie O'Riorden, I am a robot and I have tried out 
your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line has trailing whitespace
#201 FILE: vswitchd/vswitch.xml:367:
  DPDK defaults will be used instead. If dpdk-socket-mem and 

Lines checked: 220, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


Re: [ovs-dev] [PATCH v3 1/3] dpdk: Logs to announce removal of defaults for socket-mem and limit.

2021-07-15 Thread 0-day Robot
Bleep bloop.  Greetings Rosemarie O'Riorden, I am a robot and I have tried out 
your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 80 characters long (recommended limit is 79)
#85 FILE: lib/dpdk.c:495:
  "from 2.17 release. DPDK defaults will be used instead.");

Lines checked: 117, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


[ovs-dev] [PATCH v3 2/3] dpdk: Remove default values for socket-mem and limit.

2021-07-15 Thread Rosemarie O'Riorden
This change removes the default values for EAL args socket-mem and
socket-limit. As DPDK supports dynamic memory allocation, there is no
need to allocate a certain amount of memory on start-up, nor limit the
amount of memory available, if not requested.

Currently, socket-mem has a default value of 1024 when it is not
configured by the user, and socket-limit takes on the value of socket-mem,
1024, by default. With this change, socket-mem is not configured by default,
meaning that socket-limit is not either. Either option, both, or neither can be
set.

Removed extra logs that announce this change and fixed documentation.

Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
Signed-off-by: Rosemarie O'Riorden 
---
Version 2:
 - Removes logs added in patch 1 that were not in v1.
 - Removes code added to lib/dpdk.c since v1 that conflicts with this patch 
series.

Version 3:
 - Fixed reference to "patch 1" in commit message.

 Documentation/intro/install/dpdk.rst |  5 +-
 NEWS |  4 +-
 lib/dpdk.c   | 74 +---
 vswitchd/vswitch.xml | 16 +++---
 4 files changed, 11 insertions(+), 88 deletions(-)

diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index d8fa931fa..96843af73 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -290,9 +290,8 @@ listed below. Defaults will be provided for all values not 
explicitly set.
 
 ``dpdk-socket-mem``
   Comma separated list of memory to pre-allocate from hugepages on specific
-  sockets. If not specified, 1024 MB will be set for each numa node by
-  default. This behavior will change with the 2.17 release, with no default
-  value from OVS. Instead, DPDK default will be used.
+  sockets. If not specified, this option will not be set by default. DPDK
+  default will be used instead.
 
 ``dpdk-hugepage-dir``
   Directory where hugetlbfs is mounted
diff --git a/NEWS b/NEWS
index 126f5a927..948f68283 100644
--- a/NEWS
+++ b/NEWS
@@ -29,8 +29,8 @@ Post-v2.15.0
Available only if DPDK experimantal APIs enabled during the build.
  * Add hardware offload support for VXLAN flows (experimental).
Available only if DPDK experimantal APIs enabled during the build.
- * EAL options --socket-mem and --socket-limit to have default values
-   removed with 2.17 release. Logging added to alert users.
+ * EAL option --socket-mem is no longer configured by default upon
+   start-up.
- ovsdb-tool:
  * New option '--election-timer' to the 'create-cluster' command to set the
leader election timer during cluster creation.
diff --git a/lib/dpdk.c b/lib/dpdk.c
index b70c01cf4..3a6990e2f 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -130,74 +130,12 @@ construct_dpdk_options(const struct smap 
*ovs_other_config, struct svec *args)
 }
 }
 
-static int
-compare_numa_node_list(const void *a_, const void *b_)
-{
-int a = *(const int *) a_;
-int b = *(const int *) b_;
-
-if (a < b) {
-return -1;
-}
-if (a > b) {
-return 1;
-}
-return 0;
-}
-
-static char *
-construct_dpdk_socket_mem(void)
-{
-const char *def_value = "1024";
-struct ds dpdk_socket_mem = DS_EMPTY_INITIALIZER;
-
-/* Build a list of all numa nodes with at least one core. */
-struct ovs_numa_dump *dump = ovs_numa_dump_n_cores_per_numa(1);
-size_t n_numa_nodes = hmap_count(&dump->numas);
-int *numa_node_list = xcalloc(n_numa_nodes, sizeof *numa_node_list);
-
-const struct ovs_numa_info_numa *node;
-int k = 0, last_node = 0;
-
-FOR_EACH_NUMA_ON_DUMP(node, dump) {
-if (k >= n_numa_nodes) {
-break;
-}
-numa_node_list[k++] = node->numa_id;
-}
-qsort(numa_node_list, k, sizeof *numa_node_list, compare_numa_node_list);
-
-for (int i = 0; i < n_numa_nodes; i++) {
-while (numa_node_list[i] > last_node &&
-   numa_node_list[i] != OVS_NUMA_UNSPEC &&
-   numa_node_list[i] <= MAX_NUMA_NODES) {
-if (last_node == 0) {
-ds_put_format(&dpdk_socket_mem, "%s", "0");
-} else {
-ds_put_format(&dpdk_socket_mem, ",%s", "0");
-}
-last_node++;
-}
-if (numa_node_list[i] == 0) {
-ds_put_format(&dpdk_socket_mem, "%s", def_value);
-} else {
-ds_put_format(&dpdk_socket_mem, ",%s", def_value);
-}
-last_node++;
-}
-free(numa_node_list);
-ovs_numa_dump_destroy(dump);
-return ds_cstr(&dpdk_socket_mem);
-}
-
 #define MAX_DPDK_EXCL_OPTS 10
 
 static void
 construct_dpdk_mutex_options(const struct smap *ovs_other_config,
  struct svec *args)
 {
-char *default_dpdk_socket_mem = construct_dpdk_socket_mem();
-
 struct dpdk_exclusive_options_map {
 const char *category;
 const char 
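For reference, the construct_dpdk_socket_mem() removed above built OVS's old default --socket-mem string: "1024" for every NUMA node that has at least one core, with "0" fillers for gaps in the node numbering. A standalone sketch of that behavior, using plain C string handling instead of OVS's ds/numa helpers (the function name is invented):

```c
#include <string.h>

/* Sketch of the removed default: given a sorted list of NUMA node ids
 * that have at least one core, emit "1024" per present node and "0" for
 * each gap, e.g. nodes {0, 2} -> "1024,0,1024".  Buffer handling is kept
 * deliberately simple for illustration. */
static void
default_socket_mem(const int *nodes, int n, char *buf, size_t buflen)
{
    int last = 0;

    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        /* Fill holes in the numbering with "0" entries. */
        for (; last < nodes[i]; last++) {
            strncat(buf, buf[0] ? ",0" : "0", buflen - strlen(buf) - 1);
        }
        strncat(buf, buf[0] ? ",1024" : "1024", buflen - strlen(buf) - 1);
        last = nodes[i] + 1;
    }
}
```

With the patch applied, none of this runs by default: DPDK's own dynamic allocation decides how much hugepage memory to reserve.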

[ovs-dev] [PATCH v3 3/3] dpdk: Stop configuring socket-limit with the value of socket-mem.

2021-07-15 Thread Rosemarie O'Riorden
This change removes the automatic memory limit on start-up of OVS with
DPDK. As DPDK supports dynamic memory allocation, there is no
need to limit the amount of memory available, if not requested.

Currently, if socket-limit is not configured, it is set to the value of
socket-mem. With this change, the user can decide to set it or have no
memory limit.

Removed logs that announce this change and fixed documentation.

Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
Signed-off-by: Rosemarie O'Riorden 
---
Version 1:
 - Removes logs added in patch 1 that were not in v1.

Version 2:
 - Removed reference to "patch 1"
 - Removed additional code that was unnecessary in lib/dpdk.c
 - Updated documentation

 NEWS |  2 ++
 lib/dpdk.c   | 19 ---
 vswitchd/vswitch.xml |  8 +---
 3 files changed, 3 insertions(+), 26 deletions(-)

diff --git a/NEWS b/NEWS
index 948f68283..99b8b9fce 100644
--- a/NEWS
+++ b/NEWS
@@ -31,6 +31,8 @@ Post-v2.15.0
Available only if DPDK experimantal APIs enabled during the build.
  * EAL option --socket-mem is no longer configured by default upon
start-up.
+ * EAL option --socket-limit no longer takes on the value of --socket-mem
+   by default.
- ovsdb-tool:
  * New option '--election-timer' to the 'create-cluster' command to set the
leader election timer during cluster creation.
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 3a6990e2f..e88183236 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -405,25 +405,6 @@ dpdk_init__(const struct smap *ovs_other_config)
 svec_add(&args, ovs_get_program_name());
 construct_dpdk_args(ovs_other_config, &args);
 
-if (!args_contains(&args, "--legacy-mem")
-&& !args_contains(&args, "--socket-limit")) {
-const char *arg;
-size_t i;
-
-SVEC_FOR_EACH (i, arg, &args) {
-if (!strcmp(arg, "--socket-mem")) {
-break;
-}
-}
-if (i < args.n - 1) {
-svec_add(&args, "--socket-limit");
-svec_add(&args, args.names[i + 1]);
-VLOG_INFO("Using default value for '--socket-limit. OVS will no "
-  "longer provide a default for this argument starting "
-  "from 2.17 release. DPDK defaults will be used 
instead.");
-}
-}
-
 if (args_contains(&args, "-c") || args_contains(&args, "-l")) {
 auto_determine = false;
 }
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 190e377d3..10eecfb09 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -381,13 +381,7 @@
   0 will disable the limit for a particular socket.
 
 
-  If not specified, OVS will configure limits equal to the amount of
-  preallocated memory specified by  or --socket-mem in
-  . If none of the above
-  options specified or --legacy-mem provided in
-  , limits will not be
-  applied. There is no default value from OVS.
+  If not specified, OVS will not configure limits by default.
   Changing this value requires restarting the daemon.
 
   
-- 
2.31.1



[ovs-dev] [PATCH v3 1/3] dpdk: Logs to announce removal of defaults for socket-mem and limit.

2021-07-15 Thread Rosemarie O'Riorden
Deprecate current OVS provided defaults for DPDK socket-mem and
socket-limit that are planned to be removed in OVS 2.17. At that point
DPDK defaults will be used instead. Warnings have been added to alert
users in advance.

Signed-off-by: Rosemarie O'Riorden 
---
Version 3:
 - Fixed typo and edited commit message.

 Documentation/intro/install/dpdk.rst |  3 ++-
 NEWS |  2 ++
 lib/dpdk.c   | 11 +++
 vswitchd/vswitch.xml |  8 ++--
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/Documentation/intro/install/dpdk.rst 
b/Documentation/intro/install/dpdk.rst
index 612f2fdbc..d8fa931fa 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -291,7 +291,8 @@ listed below. Defaults will be provided for all values not 
explicitly set.
 ``dpdk-socket-mem``
   Comma separated list of memory to pre-allocate from hugepages on specific
   sockets. If not specified, 1024 MB will be set for each numa node by
-  default.
+  default. This behavior will change with the 2.17 release, with no default
+  value from OVS. Instead, DPDK default will be used.
 
 ``dpdk-hugepage-dir``
   Directory where hugetlbfs is mounted
diff --git a/NEWS b/NEWS
index 57fc2..126f5a927 100644
--- a/NEWS
+++ b/NEWS
@@ -29,6 +29,8 @@ Post-v2.15.0
Available only if DPDK experimantal APIs enabled during the build.
  * Add hardware offload support for VXLAN flows (experimental).
Available only if DPDK experimantal APIs enabled during the build.
+ * EAL options --socket-mem and --socket-limit to have default values
+   removed with 2.17 release. Logging added to alert users.
- ovsdb-tool:
  * New option '--election-timer' to the 'create-cluster' command to set the
leader election timer during cluster creation.
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 0c910092c..b70c01cf4 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -217,6 +217,7 @@ construct_dpdk_mutex_options(const struct smap 
*ovs_other_config,
 int found_opts = 0, scan, found_pos = -1;
 const char *found_value;
 struct dpdk_exclusive_options_map *popt = &excl_opts[i];
+bool using_default = false;
 
 for (scan = 0; scan < MAX_DPDK_EXCL_OPTS
  && popt->ovs_dpdk_options[scan]; ++scan) {
@@ -233,6 +234,7 @@ construct_dpdk_mutex_options(const struct smap 
*ovs_other_config,
 if (popt->default_option) {
 found_pos = popt->default_option;
 found_value = popt->default_value;
+using_default = true;
 } else {
 continue;
 }
@@ -245,6 +247,12 @@ construct_dpdk_mutex_options(const struct smap 
*ovs_other_config,
 }
 
 if (!args_contains(args, popt->eal_dpdk_options[found_pos])) {
+if (using_default) {
+VLOG_INFO("Using default value for '%s'. OVS will no longer "
+  "provide a default for this argument starting "
+  "from 2.17 release. DPDK defaults will be used "
+  "instead.", popt->eal_dpdk_options[found_pos]);
+}
 svec_add(args, popt->eal_dpdk_options[found_pos]);
 svec_add(args, found_value);
 } else {
@@ -482,6 +490,9 @@ dpdk_init__(const struct smap *ovs_other_config)
 if (i < args.n - 1) {
 svec_add(&args, "--socket-limit");
 svec_add(&args, args.names[i + 1]);
+VLOG_INFO("Using default value for '--socket-limit. OVS will no "
+  "longer provide a default for this argument starting "
+  "from 2.17 release. DPDK defaults will be used 
instead.");
 }
 }
 
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 3522b2497..c26ebb796 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -365,8 +365,10 @@
   If dpdk-socket-mem and dpdk-alloc-mem are not specified, 
dpdk-socket-mem
   will be used and the default value is 1024 for each numa node. If
   dpdk-socket-mem and dpdk-alloc-mem are specified at same time,
-  dpdk-socket-mem will be used as default. Changing this value
-  requires restarting the daemon.
+  dpdk-socket-mem will be used as default. With the 2.17 release,
+  dpdk-socket-mem will no longer be used by default. DPDK defaults will
+  be used instead.
+  Changing this value requires restarting the daemon.
 
   
 
@@ -388,6 +390,8 @@
   options specified or --legacy-mem provided in
   , limits will not be
   applied.
+  With the 2.17 release, the OVS default value will no longer be
+  provided, and DPDK defaults will be used instead.
   Changing this value requires restarting the daemon.
 
   
-- 
2.31.1


[ovs-dev] [PATCH v3 0/3] Stop configuring '--socket-mem'/'--socket-limit' by default for DPDK if not requested.

2021-07-15 Thread Rosemarie O'Riorden
Currently, there is a default value of 1024 for socket-mem if not
configured. socket-limit automatically takes on the value of socket-mem
unless otherwise specified. With these changes, memory allocation will
be dynamically managed by DPDK, meaning that by default, no memory will
be pre-allocated on startup, and there will be no limit to how much
memory can be used. Either or both of these values can be set by the
user.

The EAL arguments will look like this:

- dpdk-socket-mem=<not set>, dpdk-socket-limit=<not set>
  current: "--socket-mem=1024,1024 --socket-limit=1024,1024"
  patch 1: ""
  patch 2: ""

- dpdk-socket-mem=<MEM>, dpdk-socket-limit=<not set>
  current: "--socket-mem=MEM --socket-limit=MEM"
  patch 1: "--socket-mem=MEM --socket-limit=MEM"
  patch 2: "--socket-mem=MEM"

- dpdk-socket-mem=<not set>, dpdk-socket-limit=<LIMIT>
  current: "--socket-mem=1024,1024 --socket-limit=LIMIT"
  patch 1: "--socket-limit=LIMIT"
  patch 2: "--socket-limit=LIMIT"

- dpdk-socket-mem=<MEM>, dpdk-socket-limit=<LIMIT>
  current: "--socket-mem=MEM --socket-limit=LIMIT"
  patch 1: "--socket-mem=MEM --socket-limit=LIMIT"
  patch 2: "--socket-mem=MEM --socket-limit=LIMIT"

Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850

Rosemarie O'Riorden (3):
  dpdk: Logs to announce removal of defaults for socket-mem and limit.
  dpdk: Remove default values for socket-mem and limit.
  dpdk: Stop configuring socket-limit with the value of socket-mem.

 Documentation/intro/install/dpdk.rst |  4 +-
 NEWS |  4 ++
 lib/dpdk.c   | 82 +---
 vswitchd/vswitch.xml | 18 ++
 4 files changed, 13 insertions(+), 95 deletions(-)

-- 
2.31.1



Re: [ovs-dev] [PATCH v2 1/2] Optimize the poll loop for poll_immediate_wake()

2021-07-15 Thread Anton Ivanov

On 15/07/2021 22:17, Ben Pfaff wrote:

On Mon, Jul 12, 2021 at 06:15:28PM +0100, anton.iva...@cambridgegreys.com wrote:

From: Anton Ivanov 

If we are not obtaining any useful information out of the poll(),
such as whether an fd is busy, we do not need to do a poll() if
an immediate_wake() has been requested.

This cuts out all the pollfd hash additions, forming the poll
arguments and the actual poll() after a call to
poll_immediate_wake()

Signed-off-by: Anton Ivanov 

I don't think we need the new 'immediate_wake' member of struct
poll_loop.  A 'timeout_when' of LLONG_MIN already means the same thing
(it's even documented in the comment).


OK. Cool.

I did not want to touch any of the timeout logic, so I added a member.

I will send a v3 which uses timeout == zero for this on Monday.

A.



Thanks,

Ben.



--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
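The optimization being discussed — skipping the poll() syscall entirely when poll_immediate_wake() was requested and the caller learns nothing from fd status — can be sketched as follows, using a LLONG_MIN timeout_when per Ben's suggestion rather than a new member. Names and signatures are illustrative assumptions, not the actual OVS internals.

```c
#include <limits.h>
#include <poll.h>

/* Sketch: poll_immediate_wake() records a timeout_when of LLONG_MIN.
 * If nothing useful would come back from poll() (no fds to report on),
 * the syscall — and all the pollfd setup that feeds it — can be skipped. */
static int
poll_block_sketch(struct pollfd *fds, nfds_t n, long long timeout_when,
                  int timeout_ms)
{
    if (timeout_when == LLONG_MIN) {
        if (n == 0) {
            /* Immediate wake and no fd status wanted: skip the syscall. */
            return 0;
        }
        timeout_ms = 0;     /* still poll for fd status, but never block */
    }
    return poll(fds, n, timeout_ms);
}
```

The savings in the real patch come less from the poll() call itself than from avoiding the pollfd hash additions and argument marshalling leading up to it.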



Re: [ovs-dev] [PATCH v2 2/2] Minimize the number of time calls in time_poll()

2021-07-15 Thread Ben Pfaff
On Mon, Jul 12, 2021 at 06:15:29PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> time_poll() makes an excessive number of time_msec() calls
> which incur a performance penalty.
> 
> 1. Avoid time_msec() call for timeout calculation when time_poll()
> is asked to skip poll()
> 
> 2. Reuse the time_msec() result from deadline calculation for
> last_wakeup and timeout calculation.
> 
> Signed-off-by: Anton Ivanov 

I'd like another look at that after patch 1 is revised.


Re: [ovs-dev] [PATCH v2 1/2] Optimize the poll loop for poll_immediate_wake()

2021-07-15 Thread Ben Pfaff
On Mon, Jul 12, 2021 at 06:15:28PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> If we are not obtaining any useful information out of the poll(),
> such as whether an fd is busy, we do not need to do a poll() if
> an immediate_wake() has been requested.
> 
> This cuts out all the pollfd hash additions, forming the poll
> arguments and the actual poll() after a call to
> poll_immediate_wake()
> 
> Signed-off-by: Anton Ivanov 

I don't think we need the new 'immediate_wake' member of struct
poll_loop.  A 'timeout_when' of LLONG_MIN already means the same thing
(it's even documented in the comment).

Thanks,

Ben.


[ovs-dev] [PATCH v2 ovn] tests: check localport->localnet->external flows cleared

2021-07-15 Thread Ihar Hrachyshka
In addition to the external-port-deleted scenario already covered by the
test, also validate that changing or unsetting an external port's HA
chassis group behaves properly (the rules allowing external port traffic
to leak into localnet are gone).

Related: 1148580290d0a ("Don't suppress localport traffic directed to
external port")

Signed-off-by: Ihar Hrachyshka 

v1: initial version
v2: fix ddlog test failure by waiting on hv to sync
---
 tests/ovn.at | 43 ---
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index 93e1a0267..32efc054f 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -12198,24 +12198,28 @@ send_garp() {
 ovs-appctl netdev-dummy/receive $inport $request
 }
 
-spa=$(ip_to_hex 10 0 0 1)
-tpa=$(ip_to_hex 10 0 0 2)
-send_garp lp 0001 0002 $spa $tpa
-
-spa=$(ip_to_hex 10 0 0 1)
-tpa=$(ip_to_hex 10 0 0 10)
-send_garp lp 0001 0010 $spa $tpa
+send_frames() {
+spa=$(ip_to_hex 10 0 0 1)
+tpa=$(ip_to_hex 10 0 0 2)
+send_garp lp 0001 0002 $spa $tpa
+
+spa=$(ip_to_hex 10 0 0 1)
+tpa=$(ip_to_hex 10 0 0 10)
+send_garp lp 0001 0010 $spa $tpa
+
+spa=$(ip_to_hex 10 0 0 1)
+tpa=$(ip_to_hex 10 0 0 3)
+send_garp lp 0001 0003 $spa $tpa
+}
 
-spa=$(ip_to_hex 10 0 0 1)
-tpa=$(ip_to_hex 10 0 0 3)
-send_garp lp 0001 0003 $spa $tpa
+send_frames
 
 dnl external traffic from localport should be sent to localnet
 AT_CHECK([tcpdump -r main/br-phys_n1-tx.pcap arp[[24:4]]=0x0a000002 | wc 
-l],[0],[dnl
 1
 ],[ignore])
 
-#dnl ...regardless of localnet / external ports creation order
+dnl ...regardless of localnet / external ports creation order
 AT_CHECK([tcpdump -r main/br-phys_n1-tx.pcap arp[[24:4]]=0x0a00000a | wc 
-l],[0],[dnl
 1
 ],[ignore])
@@ -12225,6 +12229,23 @@ AT_CHECK([tcpdump -r main/br-phys_n1-tx.pcap 
arp[[24:4]]=0x0a000003 | wc -l],[0]
 0
 ],[ignore])
 
+# now disown both external ports, one by moving to another (non-existing)
+# chassis, another by removing the port from any ha groups
+check ovn-nbctl --wait=sb ha-chassis-group-add fake_hagrp
+fake_hagrp_uuid=`ovn-nbctl --bare --columns _uuid find ha_chassis_group 
name=fake_hagrp`
+check ovn-nbctl set logical_switch_port lext ha_chassis_group=$fake_hagrp_uuid
+check ovn-nbctl clear logical_switch_port lext2 ha_chassis_group
+check ovn-nbctl --wait=hv sync
+
+# check that traffic no longer leaks into localnet
+send_frames
+
+for suffix in 2 a; do
+AT_CHECK([tcpdump -r main/br-phys_n1-tx.pcap 
arp[[24:4]]=0x0a00000${suffix} | wc -l],[0],[dnl
+1
+],[ignore])
+done
+
 AT_CLEANUP
 ])
 
-- 
2.31.1



Re: [ovs-dev] [PATCH ovn v2 branch-21.06] Don't suppress localport traffic directed to external port

2021-07-15 Thread 0-day Robot
Bleep bloop.  Greetings Ihar Hrachyshka, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Numan Siddique 
Lines checked: 474, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


Re: [ovs-dev] [PATCH v2 ovn] Don't suppress localport traffic directed to external port

2021-07-15 Thread 0-day Robot
Bleep bloop.  Greetings Ihar Hrachyshka, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 Don't suppress localport traffic directed to external port
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


Re: [ovs-dev] [PATCH v5 3/3] dpif-netlink: Introduce per-cpu upcall dispatch

2021-07-15 Thread Aaron Conole
Mark Gray  writes:

> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
>
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
>
> * On systems with a large number of vports, there is correspondingly
> a large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
>
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
>
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
>
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
>
> Reported-at: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
>
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * change DISPATCH_MODE_PER_CPU() to inline function
>  * add `ovs-appctl` command to check dispatch mode for datapaths
>  * fixed issue with userspace actions (tested using `ovs-ofctl 
> monitor br0 65534 -P nxt_packet_in`)
>  * update documentation as requested
> v2 - Reworked based on Flavio's comments:
>  * Used dpif_netlink_upcall_per_cpu() for check in 
> dpif_netlink_set_handler_pids()
>  * Added macro for (ignored) Netlink PID
>  * Fixed indentation issue
>  * Added NEWS entry
>  * Added section to ovs-vswitchd.8 man page
> v4 - Reworked based on Flavio's comments:
>  * Cleaned up log message when dispatch mode is set
> v5 - Reworked based on Flavio's comments:
>  * Added macros to remove functions for Window's build
>  Reworked based on David's comments:
>  * Updated the NEWS file
>
>  NEWS  |   6 +
>  .../linux/compat/include/linux/openvswitch.h  |   7 +
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev.c |   1 +
>  lib/dpif-netlink-unixctl.man  |   6 +
>  lib/dpif-netlink.c| 463 --
>  lib/dpif-provider.h   |  32 +-
>  lib/dpif.c|  17 +
>  lib/dpif.h|   1 +
>  ofproto/ofproto-dpif-upcall.c |  51 +-
>  ofproto/ofproto.c |  12 -
>  vswitchd/ovs-vswitchd.8.in|   1 +
>  vswitchd/vswitch.xml  |  23 +-
>  13 files changed, 526 insertions(+), 95 deletions(-)
>  create mode 100644 lib/dpif-netlink-unixctl.man
>



> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index f92905dd83fd..255aab8ee513 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -84,6 +84,9 @@ enum { MAX_PORTS = USHRT_MAX };
>  #define EPOLLEXCLUSIVE (1u << 28)
>  #endif
>  
> +/* This PID is not used by the kernel datapath when using dispatch per CPU,
> + * but it is required to be set (not zero). */
> +#define DPIF_NETLINK_PER_CPU_PID UINT32_MAX
>  struct dpif_netlink_dp {
>  /* Generic Netlink header. */
>  uint8_t cmd;
> @@ -98,6 +101,8 @@ struct dpif_netlink_dp {
>  const struct ovs_dp_stats *stats;  /* OVS_DP_ATTR_STATS. */
>  const struct ovs_dp_megaflow_stats *megaflow_stats;
> /* OVS_DP_ATTR_MEGAFLOW_STATS.*/
> +const uint32_t *upcall_pids;   /* OVS_DP_ATTR_PER_CPU_PIDS */
> +uint32_t n_upcall_pids;
>  };
>  
>  static void dpif_netlink_dp_init(struct dpif_netlink_dp *);
> @@ -113,6 +118,10 @@ static int dpif_netlink_dp_get(const struct dpif *,
>  static int
>  dpif_netlink_set_features(struct dpif *dpif_, uint32_t new_features);
>  
> +static void
> +dpif_netlink_unixctl_dispatch_mode(struct unixctl_conn *conn, int argc,
> +   const char *argv[], void *aux);
> +
>  struct dpif_netlink_flow {
>  /* Generic Netlink header. */
>  uint8_t cmd;
> @@ -178,11 +187,16 @@ struct dpif_windows_vport_sock {
>  #endif
>  
> 
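The per-CPU dispatch model above pairs each receiving CPU with one handler socket, which is what guarantees that packets from a flow wake at most one user space thread. A minimal sketch of that cpu-to-PID lookup (the names here are illustrative, not the dpif-netlink API):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-CPU dispatch table: one Netlink socket PID per handler
 * thread, indexed by the CPU on which the upcall was generated. */
struct upcall_dispatch {
    const uint32_t *handler_pids;   /* One PID per handler thread. */
    size_t n_handlers;
};

/* Packets received on the same CPU always map to the same handler, so a
 * flow pinned to a CPU by RSS wakes at most one thread.  The modulo keeps
 * the lookup valid when there are fewer handlers than CPUs. */
static uint32_t
upcall_pid_for_cpu(const struct upcall_dispatch *d, unsigned int cpu)
{
    return d->n_handlers ? d->handler_pids[cpu % d->n_handlers] : 0;
}
```

The actual mapping policy in the patch may differ; this only illustrates why the dispatch choice is a pure function of the receiving CPU.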

Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.

2021-07-15 Thread Lorenzo Bianconi
> On Thu, Jul 15, 2021 at 08:45:00PM +0200, Lorenzo Bianconi wrote:
> > > On Thu, Jul 15, 2021 at 11:18:12AM -0700, Han Zhou wrote:
> > > > On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi <
> > > > lorenzo.bianc...@redhat.com> wrote:
> > > > > This should avoid some work by doing the cheapest check (the one on
> > > > > UseLogicalDatapathGroups) before any joins.  DDlog is probably
> > > > > factoring out the reference to the Flow relation, which is identical
> > > > > in both, but this ought to avoid the group_by aggregation (which is
> > > > > relatively expensive) in the case where UseLogicalDatapathGroups is
> > > > > not enabled.
> > > > 
> > > > Thanks! I didn't know that the order matters. (not sure if there is
> > > > documentation that I missed)
> > > 
> > > In general, DDlog executes each rule in the order given, so if you have
> > > a series of joins in a rule, then it's a good idea to order them, if you
> > > can, to keep the number of records small at each join step.  It won't
> > > affect the correctness, but it will affect performance.
> > > 
> > > This might not be documented well.  I do occasionally work on the DDlog
> > > documentation, so I've made a note to try to see whether the effect of
> > > ordering is mentioned and improve it if I can.
> > > 
> > > In this particular case, I talked to Leonid about it during a discussion
> > > of how to improve performance and memory use in the benchmark currently
> > > at issue, and Leonid says that the ordering doesn't actually matter in
> > > this case because both of the possibilities (the ones for
> > > UseLogicalDatapathGroups[true] and UseLogicalDatapathGroups[false]) had
> > > an identical clause at the beginning.  DDlog optimizes identical
> > > prefixes, so there was no real difference in performance.
> > > 
> > > This patch could, therefore, be dropped, but it should also not be
> > > harmful.  Dropping it would require some changes to the later patch that
> > > updates the ovn-northd ddlog code, so folding it into that one would
> > > also be an option.
> > > 
> > 
> > Do you want me to repost dropping this patch or is the series fine as it is?
> 
> The series is fine as is.
> 

ack, thx for the clarification.

Regards,
Lorenzo
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] Minimize the number of time calls in time_poll()

2021-07-15 Thread Anton Ivanov

On 15/07/2021 20:49, Mark Michelson wrote:

Hi Anton,

Out of curiosity, has this change made a noticeable impact on 
performance?


I can't pick it up on OVN heater.

That is not unexpected. It is "noise" compared to the compute time and 
the number of syscalls used in one iteration in northd or ovsdb.


I suspect it will be more useful in places like the vswitch itself where 
the loops are tighter, processing per loop is less and lower latency is 
the key. There, one syscall less per iteration may show up as a 
noticeable difference.


I do not have a benchmark set up for those.

A.


On 7/12/21 1:15 PM, anton.iva...@cambridgegreys.com wrote:

From: Anton Ivanov 

time_poll() makes an excessive number of time_msec() calls
which incur a performance penalty.

1. Avoid time_msec() call for timeout calculation when time_poll()
is asked to skip poll()

2. Reuse the time_msec() result from deadline calculation for
last_wakeup and timeout calculation.
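The reuse described in point 2 — taking one clock sample per loop pass and feeding the same value to every consumer — can be sketched independently of OVS's timeval code (function names here are illustrative):

```c
#include <limits.h>

/* Clamped timeout computation from one cached clock sample, mirroring the
 * patched loop: the caller reads the clock once per iteration and passes
 * the same 'now' to the timeout math and to the wakeup bookkeeping,
 * instead of calling the clock at each use site. */
static int
compute_time_left(long long now, long long timeout_when)
{
    if (now >= timeout_when) {
        return 0;                       /* Deadline already passed. */
    } else if ((unsigned long long) (timeout_when - now) > INT_MAX) {
        return INT_MAX;                 /* Clamp for poll()'s int arg. */
    }
    return (int) (timeout_when - now);
}
```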

Signed-off-by: Anton Ivanov 
---
  lib/timeval.c | 36 +---
  1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/timeval.c b/lib/timeval.c
index c6ac87376..64ab22e05 100644
--- a/lib/timeval.c
+++ b/lib/timeval.c
@@ -287,7 +287,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
   long long int timeout_when, int *elapsed)
  {
  long long int *last_wakeup = last_wakeup_get();
-    long long int start;
+    long long int start, now;
  bool quiescent;
  int retval = 0;
  @@ -297,28 +297,31 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
  if (*last_wakeup && !thread_is_pmd()) {
  log_poll_interval(*last_wakeup);
  }
-    start = time_msec();
+    now = start = time_msec();
    timeout_when = MIN(timeout_when, deadline);
  quiescent = ovsrcu_is_quiescent();
    for (;;) {
-    long long int now = time_msec();
  int time_left;
  -    if (now >= timeout_when) {
+    if (n_pollfds == 0) {
  time_left = 0;
-    } else if ((unsigned long long int) timeout_when - now > INT_MAX) {
-    time_left = INT_MAX;
  } else {
-    time_left = timeout_when - now;
-    }
-
-    if (!quiescent) {
-    if (!time_left) {
-    ovsrcu_quiesce();
+    if (now >= timeout_when) {
+    time_left = 0;
+    } else if ((unsigned long long int) timeout_when - now > INT_MAX) {
+    time_left = INT_MAX;
  } else {
-    ovsrcu_quiesce_start();
+    time_left = timeout_when - now;
+    }
+
+    if (!quiescent) {
+    if (!time_left) {
+    ovsrcu_quiesce();
+    } else {
+    ovsrcu_quiesce_start();
+    }
  }
  }
  @@ -329,6 +332,8 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
   */
  if (n_pollfds != 0) {
  retval = poll(pollfds, n_pollfds, time_left);
+    } else {
+    retval = 0;
  }
  if (retval < 0) {
  retval = -errno;
@@ -355,7 +360,8 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
  ovsrcu_quiesce_end();
  }
  -    if (deadline <= time_msec()) {
+    now = time_msec();
+    if (deadline <= now) {
  #ifndef _WIN32
  fatal_signal_handler(SIGALRM);
  #else
@@ -372,7 +378,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
  break;
  }
  }
-    *last_wakeup = time_msec();
+    *last_wakeup = now;
  refresh_rusage();
  *elapsed = *last_wakeup - start;
  return retval;






--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/



[ovs-dev] [PATCH ovn] tests: check localport->localnet->external flows cleared

2021-07-15 Thread Ihar Hrachyshka
In addition to the external-port-deletion scenario already covered by the
test, also validate that changing or unsetting the HA chassis group
behaves properly (the rules allowing external port traffic to leak into
localnet are gone).

Related: 1148580290d0a ("Don't suppress localport traffic directed to
external port")

Signed-off-by: Ihar Hrachyshka 
---
 tests/ovn.at | 42 +++---
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index 93e1a0267..9cdf130e9 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -12198,24 +12198,28 @@ send_garp() {
 ovs-appctl netdev-dummy/receive $inport $request
 }
 
-spa=$(ip_to_hex 10 0 0 1)
-tpa=$(ip_to_hex 10 0 0 2)
-send_garp lp 0001 0002 $spa $tpa
-
-spa=$(ip_to_hex 10 0 0 1)
-tpa=$(ip_to_hex 10 0 0 10)
-send_garp lp 0001 0010 $spa $tpa
+send_frames() {
+spa=$(ip_to_hex 10 0 0 1)
+tpa=$(ip_to_hex 10 0 0 2)
+send_garp lp 0001 0002 $spa $tpa
+
+spa=$(ip_to_hex 10 0 0 1)
+tpa=$(ip_to_hex 10 0 0 10)
+send_garp lp 0001 0010 $spa $tpa
+
+spa=$(ip_to_hex 10 0 0 1)
+tpa=$(ip_to_hex 10 0 0 3)
+send_garp lp 0001 0003 $spa $tpa
+}
 
-spa=$(ip_to_hex 10 0 0 1)
-tpa=$(ip_to_hex 10 0 0 3)
-send_garp lp 0001 0003 $spa $tpa
+send_frames
 
 dnl external traffic from localport should be sent to localnet
AT_CHECK([tcpdump -r main/br-phys_n1-tx.pcap arp[[24:4]]=0x0a02 | wc -l],[0],[dnl
 1
 ],[ignore])
 
-#dnl ...regardless of localnet / external ports creation order
+dnl ...regardless of localnet / external ports creation order
AT_CHECK([tcpdump -r main/br-phys_n1-tx.pcap arp[[24:4]]=0x0a0a | wc -l],[0],[dnl
 1
 ],[ignore])
@@ -12225,6 +12229,22 @@ AT_CHECK([tcpdump -r main/br-phys_n1-tx.pcap arp[[24:4]]=0x0a03 | wc -l],[0]
 0
 ],[ignore])
 
+# now disown both external ports, one by moving to another (non-existing)
+# chassis, another by removing the port from any ha groups
+check ovn-nbctl --wait=sb ha-chassis-group-add fake_hagrp
+fake_hagrp_uuid=`ovn-nbctl --bare --columns _uuid find ha_chassis_group name=fake_hagrp`
+check ovn-nbctl set logical_switch_port lext ha_chassis_group=$fake_hagrp_uuid
+check ovn-nbctl clear logical_switch_port lext2 ha_chassis_group
+
+# check that traffic no longer leaks into localnet
+send_frames
+
+for suffix in 2 a; do
+AT_CHECK([tcpdump -r main/br-phys_n1-tx.pcap arp[[24:4]]=0x0a0${suffix} | wc -l],[0],[dnl
+1
+],[ignore])
+done
+
 AT_CLEANUP
 ])
 
-- 
2.31.1
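The send_garp helper exercised by this test assembles a gratuitous ARP request as a raw hex string for netdev-dummy/receive. One plausible C transcription of that framing (assuming full 12-hex-digit MAC strings and 8-hex-digit IPv4 strings; the exact shell helper in ovn.at may build the fields slightly differently):

```c
#include <stdio.h>

/* Broadcast Ethernet header followed by an ARP request: ethertype 0806,
 * htype 0001, ptype 0800, hlen 06, plen 04, oper 0001, then the sender
 * and target hardware/protocol addresses.  All arguments are hex strings;
 * the result is 84 hex digits (42 bytes on the wire). */
static int
build_garp_hex(char *buf, size_t len,
               const char *eth_src,  /* 12 hex chars, also the ARP SHA. */
               const char *eth_dst,  /* 12 hex chars, the ARP THA. */
               const char *spa,      /* 8 hex chars. */
               const char *tpa)      /* 8 hex chars. */
{
    return snprintf(buf, len,
                    "ffffffffffff%s08060001080006040001%s%s%s%s",
                    eth_src, eth_src, spa, eth_dst, tpa);
}
```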



Re: [ovs-dev] [PATCH RFC ovn 0/4] Avoid parsing non-local lflows with the help of tags in SB.

2021-07-15 Thread Mark Michelson

Hi Han,

I finally got around to having a look at this, and honestly I'm really 
happy at how simple the series is. For now, I'm not giving individual 
notes on patches, but I'll comment on the series as a whole.


It seems that this is targeted at deployments where logical switches 
have their ports distributed across multiple HVs. Something like the 
OpenShift/ovn-kubernetes model of having one logical switch per node is 
not going to see much benefit from this series. However, this also isn't 
likely to add any extra overhead to that sort of deployment either.


I think this is a good basis for an optimization. The biggest 
improvement I can think of is to be able to apply the port hint to more 
flows than just the ones that explicitly reference the inport or 
outport. But I think that could be an incremental improvement over this 
initial patch series.


The only other criticism is the lack of DDLog, but as you noted in the 
description, that's a known shortcoming.


On 7/1/21 1:45 AM, Han Zhou wrote:

With the help of a new column in Logical_Flow table that stores ingress/egress
lport information, ovn-controller can avoid parsing a big portion of the
logical flows in SB DB, which can largely improve ovn-controller's performance
whenever a full recompute is required.

With a scale test topology of 1000 chassises, 20 LSPs per chassis, 20k
lports in total spread across 200 logical switches, connected by a
logical router, the test result before & after this change:

Before:
- lflow-cache disabled:
 - ovn-controller recompute: 2.7 sec
- lflow-cache enabled:
 - ovn-controller recompute: 2.1 sec
 - lflow cache memory: 622103 KB

After:
- lflow-cache disabled:
 - ovn-controller recompute: 0.83 sec
- lflow-cache enabled:
 - ovn-controller recompute: 0.71 sec
 - lflow cache memory: 123641 KB

(note: DP group enabled for both)

So for this test scenario, when lflow cache is disabled, latency reduced
~70%; when lflow cache is enabled, latency reduced ~65% and lflow cache
memory reduced ~80%.
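The mechanism behind these numbers — checking a flow's ingress/egress port tag against the set of chassis-local ports before doing any expression parsing — can be sketched as follows (illustrative names, not the lflow.c API):

```c
#include <string.h>
#include <stddef.h>

/* Chassis-local port names.  The real code would use an index; a linear
 * scan keeps the sketch short. */
struct local_ports {
    const char **names;
    size_t n;
};

static int
is_local_port(const struct local_ports *lp, const char *name)
{
    for (size_t i = 0; i < lp->n; i++) {
        if (!strcmp(lp->names[i], name)) {
            return 1;
        }
    }
    return 0;
}

/* A NULL tag means the flow carries no port hint and must be parsed;
 * otherwise parsing can be skipped when the hinted port is not local. */
static int
should_parse_lflow(const struct local_ports *lp, const char *in_out_port_tag)
{
    return !in_out_port_tag || is_local_port(lp, in_out_port_tag);
}
```

The win comes from rejecting non-local flows with a cheap string lookup instead of a full expression parse.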

TODO: DDlog change for ovn-northd.

Note that this series applies on top of a pending patch:
https://patchwork.ozlabs.org/project/ovn/patch/20210629192257.1699504-1-hz...@ovn.org/

Han Zhou (4):
   ovn-northd.at: Minor improvement for the dp group test case.
   ovn-sb: Add tags column to logical_flow table of the SB DB.
   ovn-northd: Populate in_out_port in logical_flow table's tags.
   ovn-controller: Skip non-local lflows in ovn-controller before
 parsing.

  controller/lflow.c  |  21 +++
  controller/lflow.h  |   1 +
  controller/ovn-controller.c |   1 +
  northd/ovn-northd.c | 272 
  ovn-sb.ovsschema|   7 +-
  ovn-sb.xml  |  23 +++
  tests/ovn-northd.at |   2 +-
  7 files changed, 207 insertions(+), 120 deletions(-)





[ovs-dev] [PATCH ovn v2 branch-21.06] Don't suppress localport traffic directed to external port

2021-07-15 Thread Ihar Hrachyshka
Recently, we stopped leaking localport traffic through localnet ports
into fabric to avoid unnecessary flipping between chassis hosting the
same localport.

Despite the type name, in some scenarios localports are supposed to
talk outside the hosting chassis. Specifically, in OpenStack [1]
metadata service for SR-IOV ports is implemented as a localport hosted
on another chassis that is exposed to the chassis owning the SR-IOV
port through an "external" port. In this case, "leaking" localport
traffic into fabric is desirable.

This patch inserts a higher priority flow per external port on the
same datapath that avoids dropping localport traffic.

The backport includes custom branch-21.06 specific physical flows
cleanup for localnet ports when an external port is modified / deleted.
This was not needed in master branch because of separated pflow
management.
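The fix relies on standard flow-table semantics: among the flows a packet matches, the highest-priority one wins, so a per-external-port higher-priority "allow" flow overrides the generic localport-to-localnet drop without removing it. A toy illustration of that resolution:

```c
#include <stddef.h>

/* Minimal single-table flow model: 'match' says whether the packet hits
 * the flow; 'allow' is the action taken when the flow wins. */
struct toy_flow {
    int priority;
    int match;
    int allow;
};

/* The highest-priority matching flow decides; default-drop otherwise. */
static int
packet_allowed(const struct toy_flow *flows, size_t n)
{
    const struct toy_flow *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (flows[i].match
            && (!best || flows[i].priority > best->priority)) {
            best = &flows[i];
        }
    }
    return best ? best->allow : 0;
}
```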

Fixes: 96959e56d634 ("physical: do not forward traffic from localport
to a localnet one")

[1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1974062

Signed-off-by: Ihar Hrachyshka 
Signed-off-by: Numan Siddique 
(cherry picked from commit 1148580290d0ace803f20aeaa0241dd51c100630)

---

v1: initial backport.
v2: properly handle binding updates through inc engine instead of
forcing full recompute.

new test case
---
 controller/binding.c| 40 +++--
 controller/binding.h|  2 +
 controller/ovn-controller.c | 11 +
 controller/ovn-controller.h |  2 +
 controller/physical.c   | 66 +---
 controller/physical.h   |  2 +
 tests/ovn.at| 85 +
 7 files changed, 198 insertions(+), 10 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index 594babc98..ba558efdb 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -22,6 +22,7 @@
 #include "patch.h"
 
 #include "lib/bitmap.h"
+#include "lib/hmapx.h"
 #include "openvswitch/poll-loop.h"
 #include "lib/sset.h"
 #include "lib/util.h"
@@ -108,6 +109,7 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
 hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
 ld->datapath = datapath;
 ld->localnet_port = NULL;
+shash_init(&ld->external_ports);
 ld->has_local_l3gateway = has_local_l3gateway;
 
 if (tracked_datapaths) {
@@ -474,6 +476,18 @@ is_network_plugged(const struct sbrec_port_binding *binding_rec,
 return network ? !!shash_find_data(bridge_mappings, network) : false;
 }
 
+static void
+update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
+ struct hmap *local_datapaths)
+{
+struct local_datapath *ld = get_local_datapath(
+local_datapaths, binding_rec->datapath->tunnel_key);
+if (ld) {
+shash_replace(&ld->external_ports, binding_rec->logical_port,
+  binding_rec);
+}
+}
+
 static void
 update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
 struct shash *bridge_mappings,
@@ -1631,8 +1645,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
 !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
 
 struct ovs_list localnet_lports = OVS_LIST_INITIALIZER(&localnet_lports);
+struct ovs_list external_lports = OVS_LIST_INITIALIZER(&external_lports);
 
-struct localnet_lport {
+struct lport {
 struct ovs_list list_node;
 const struct sbrec_port_binding *pb;
 };
@@ -1680,11 +1695,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
 
 case LP_EXTERNAL:
 consider_external_lport(pb, b_ctx_in, b_ctx_out);
+struct lport *ext_lport = xmalloc(sizeof *ext_lport);
+ext_lport->pb = pb;
+ovs_list_push_back(&external_lports, &ext_lport->list_node);
 break;
 
 case LP_LOCALNET: {
 consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
-struct localnet_lport *lnet_lport = xmalloc(sizeof *lnet_lport);
+struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
 lnet_lport->pb = pb;
 ovs_list_push_back(&localnet_lports, &lnet_lport->list_node);
 break;
@@ -1711,7 +1729,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
 /* Run through each localnet lport list to see if it is a localnet port
  * on local datapaths discovered from above loop, and update the
  * corresponding local datapath accordingly. */
-struct localnet_lport *lnet_lport;
+struct lport *lnet_lport;
 LIST_FOR_EACH_POP (lnet_lport, list_node, &localnet_lports) {
 update_ld_localnet_port(lnet_lport->pb, &bridge_mappings,
 b_ctx_out->egress_ifaces,
@@ -1719,6 +1737,15 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
 free(lnet_lport);
 }
 
+/* Run 

Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-07-15 Thread Ilya Maximets
On 7/15/21 8:31 PM, Dumitru Ceara wrote:
> On 7/15/21 7:37 PM, Han Zhou wrote:
>> On Thu, Jul 15, 2021 at 9:17 AM Ilya Maximets  wrote:
>>>
>>> On 6/29/21 9:57 PM, Ilya Maximets wrote:
>> Regarding the current patch, I think it's better to add a test case to
>> cover the scenario and confirm that existing connections didn't
>> reset. With
>> that:
>> Acked-by: Han Zhou 

 I'll work on a unit test for this.
>>>
>>> Hi.  Here is a unit test that I came up with:
>>>
>>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>>> index 62181dd4d..e32f9ec89 100644
>>> --- a/tests/ovsdb-idl.at
>>> +++ b/tests/ovsdb-idl.at
>>> @@ -2282,3 +2282,27 @@ OVSDB_CHECK_CLUSTER_IDL_C([simple idl, monitor_cond_since, cluster disconnect],
>>>  008: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
>>>  009: done
>>>  ]])
>>> +
>>> +dnl This test checks that IDL keeps the existing connection to the server if
>>> +dnl it's still on a list of remotes after update.
>>> +OVSDB_CHECK_IDL_C([simple idl, initially empty, set remotes],
>>> +  [],
>>> +  [['set-remote unix:socket' \
>>> +'+set-remote unix:bad_socket,unix:socket' \
>>> +'+set-remote unix:bad_socket' \
>>> +'+set-remote unix:socket' \
>>> +'set-remote unix:bad_socket,unix:socket' \
>>> +'+set-remote unix:socket' \
>>> +'+reconnect']],
>>> +  [[000: empty
>>> +001: new remotes: unix:socket, is connected: true
>>> +002: new remotes: unix:bad_socket,unix:socket, is connected: true
>>> +003: new remotes: unix:bad_socket, is connected: false
>>> +004: new remotes: unix:socket, is connected: false
>>> +005: empty
>>> +006: new remotes: unix:bad_socket,unix:socket, is connected: true
>>> +007: new remotes: unix:socket, is connected: true
>>> +008: reconnect
>>> +009: empty
>>> +010: done
>>> +]])
>>> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
>>> index a886f971e..93329cd4c 100644
>>> --- a/tests/test-ovsdb.c
>>> +++ b/tests/test-ovsdb.c
>>> @@ -2621,6 +2621,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>>>  setvbuf(stdout, NULL, _IONBF, 0);
>>>
>>>  symtab = ovsdb_symbol_table_create();
>>> +const char remote_s[] = "set-remote ";
>>>  const char cond_s[] = "condition ";
>>>  if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
>>>  update_conditions(idl, ctx->argv[2] + strlen(cond_s));
>>> @@ -2664,6 +2665,11 @@ do_idl(struct ovs_cmdl_context *ctx)
>>>  if (!strcmp(arg, "reconnect")) {
>>>  print_and_log("%03d: reconnect", step++);
>>>  ovsdb_idl_force_reconnect(idl);
>>> +}  else if (!strncmp(arg, remote_s, strlen(remote_s))) {
>>> +ovsdb_idl_set_remote(idl, arg + strlen(remote_s), true);
>>> +print_and_log("%03d: new remotes: %s, is connected: %s", step++,
>>> +  arg + strlen(remote_s),
>>> +  ovsdb_idl_is_connected(idl) ? "true" : "false");
>>>  }  else if (!strncmp(arg, cond_s, strlen(cond_s))) {
>>>  update_conditions(idl, arg + strlen(cond_s));
>>>  print_and_log("%03d: change conditions", step++);
>>> ---
>>>
>>> Dumitru, Han, if it looks good to you, I can squash it in before
>>> applying the patch.   What do you think?
>>
>> Thanks Ilya. LGTM.
> 
> Looks good to me too, thanks!

Applied.  Thanks!

Best regards, Ilya Maximets.
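The reconnect-avoidance the new test exercises boils down to one check when the remote list changes: is the currently connected remote still one element of the new comma-separated list? A sketch of that membership test (not the ovsdb-cs implementation):

```c
#include <string.h>

/* Return 1 if 'current' appears as one element of the comma-separated
 * 'remotes' string, in which case the session can be kept instead of
 * forcing a reconnection. */
static int
remote_still_listed(const char *current, const char *remotes)
{
    size_t want = strlen(current);
    const char *p = remotes;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t n = end ? (size_t) (end - p) : strlen(p);

        if (n == want && !strncmp(p, current, n)) {
            return 1;
        }
        p = end ? end + 1 : p + n;
    }
    return 0;
}
```

This matches the test's expectations: the session survives `unix:socket` → `unix:bad_socket,unix:socket`, and drops only when `unix:socket` leaves the list.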


Re: [ovs-dev] OVS "soft freeze" for 2.16 is in effect.

2021-07-15 Thread Stokes, Ian
> On 7/15/21 7:43 PM, Stokes, Ian wrote:
> >> Hi.  As described in Documentation/internals/release-process.rst, we are
> >> in a "soft freeze" state:
> >>
> >>During the freeze, we ask committers to refrain from applying patches 
> >> that
> >>add new features unless those patches were already being publicly
> discussed
> >>and reviewed before the freeze began.  Bug fixes are welcome at any 
> >> time.
> >>Please propose and discuss exceptions on ovs-dev.
> >>
> >> We should branch for version 2.16 in two weeks from now, on Friday, July 
> >> 16.
> >>
> >> Current known exception that will likely be accepted during soft-freeze, 
> >> but
> >> not prepared/reviewed yet:
> >>   - We're awaiting patches to fix performance issue in VXLAN decap
> offloading
> >> implementation in userspace datapath, but this is more a bug fix than a
> >> new feature.
> >>
> > Hi Ilya,
> 
> Hi, Ian.
> 
> >
> > With branching for 2.16 being planned tomorrow, the other exceptions that I
> am aware of are as follows
> >
> > MFEX Optimiziation (v14 sent, awaiting ACK on minor rework)
> > RXQ Scheduling (Acked by RH and Intel already)
> > dpdk: Logs to announce removal of defaults for socket-mem and limit.
> 
> These are not really exceptions for a "soft freeze" stage, because
> they were submitted and discussed/reviewed before the soft freeze
> was announced.  But for the branching, it would be better to have
> all features (actual code changes) merged before the creation
> of branch-2.16, which I plan to do tomorrow at the end of the day.
> 
> Or do you expect these patches to not be merged before branching?
> 

I was hoping to merge tomorrow; MFEX has received the required acks for its 
patches (although I'd like an additional ack on one of the patches from one 
of the reviewers, hopefully tomorrow morning).

I'm finishing some testing on the RXQ scheduling but again should be finished 
by tomorrow before EOD.


> >
> > Is there anything else on your side that you are aware of that is in a
> > position to be applied?
> 
> From my side I'm going to merge OVSDB Relay patch-set, as it was
> reviewed and tested.  Another big change is
>   "dpif-netlink: Introduce per-cpu upcall dispatching"
> it's reviewed and tested by Flavio, Aaron is also reviewing.
> Kernel code is reviewed by Flavio; Pravin had only a couple of
> small comments.  We're expecting kernel part be accepted as soon
> as net-next is open (next week?), unfortunately this will be after
> branching, but taking into account that there were no objections
> during review and the backward compatibility of the feature, it
> should be fine to accept userspace parts sooner.  I'm tracking this.
> 
> And there is a bunch of small changes/features that was reviewed
> and tested for long time already.  So, I'm going to look through
> them and apply ones that are in a good shape, e.g.:
> - https://patchwork.ozlabs.org/project/openvswitch/patch/20210701181933.9440-2-twil...@redhat.com/
> - https://patchwork.ozlabs.org/project/openvswitch/patch/20210416120631.4584-1-david.march...@redhat.com/
> - https://patchwork.ozlabs.org/project/openvswitch/patch/20210629204339.39758-1-vdas...@gmail.com/
> 
> Would be great if you can proof-read/review the "bring your own lab"
> documentation update from Aaron:
> 

Will do.

> https://patchwork.ozlabs.org/project/openvswitch/patch/20210628180028.581283-1-acon...@redhat.com/
> 
> We discussed previously the "direct output optimization" patch,
> but I don't think we have enough time now for another round of
> reviews and thorough testing.  This will go to the next release.
> BTW, I'm still waiting for comments on it regarding integration
> into AVX512 code.  Some testing also would be great, but it, likely,
> needs rebase now.
+1

Thanks
Ian
> 
> Best regards, Ilya Maximets.


Re: [ovs-dev] OVS "soft freeze" for 2.16 is in effect.

2021-07-15 Thread Ilya Maximets
On 7/15/21 7:43 PM, Stokes, Ian wrote:
>> Hi.  As described in Documentation/internals/release-process.rst, we are
>> in a "soft freeze" state:
>>
>>During the freeze, we ask committers to refrain from applying patches that
>>add new features unless those patches were already being publicly 
>> discussed
>>and reviewed before the freeze began.  Bug fixes are welcome at any time.
>>Please propose and discuss exceptions on ovs-dev.
>>
>> We should branch for version 2.16 in two weeks from now, on Friday, July 16.
>>
>> Current known exception that will likely be accepted during soft-freeze, but
>> not prepared/reviewed yet:
>>   - We're awaiting patches to fix performance issue in VXLAN decap offloading
>> implementation in userspace datapath, but this is more a bug fix than a
>> new feature.
>>
> Hi Ilya,

Hi, Ian.

> 
> With branching for 2.16 being planned tomorrow, the other exceptions that I 
> am aware of are as follows
> 
> MFEX Optimiziation (v14 sent, awaiting ACK on minor rework)
> RXQ Scheduling (Acked by RH and Intel already)
> dpdk: Logs to announce removal of defaults for socket-mem and limit.

These are not really exceptions for a "soft freeze" stage, because
they were submitted and discussed/reviewed before the soft freeze
was announced.  But for the branching, it would be better to have
all features (actual code changes) merged before the creation
of branch-2.16, which I plan to do tomorrow at the end of the day.

Or do you expect these patches to not be merged before branching?

> 
> Is there anything else on your side that you are aware of that is in a position 
> to be applied?

From my side I'm going to merge OVSDB Relay patch-set, as it was
reviewed and tested.  Another big change is
  "dpif-netlink: Introduce per-cpu upcall dispatching"
it's reviewed and tested by Flavio, Aaron is also reviewing.
Kernel code is reviewed by Flavio; Pravin had only a couple of
small comments.  We're expecting kernel part be accepted as soon
as net-next is open (next week?), unfortunately this will be after
branching, but taking into account that there were no objections
during review and the backward compatibility of the feature, it
should be fine to accept userspace parts sooner.  I'm tracking this.

And there is a bunch of small changes/features that was reviewed
and tested for long time already.  So, I'm going to look through
them and apply ones that are in a good shape, e.g.:
- https://patchwork.ozlabs.org/project/openvswitch/patch/20210701181933.9440-2-twil...@redhat.com/
- https://patchwork.ozlabs.org/project/openvswitch/patch/20210416120631.4584-1-david.march...@redhat.com/
- https://patchwork.ozlabs.org/project/openvswitch/patch/20210629204339.39758-1-vdas...@gmail.com/

Would be great if you can proof-read/review the "bring your own lab"
documentation update from Aaron:
  
https://patchwork.ozlabs.org/project/openvswitch/patch/20210628180028.581283-1-acon...@redhat.com/

We discussed previously the "direct output optimization" patch,
but I don't think we have enough time now for another round of
reviews and thorough testing.  This will go to the next release.
BTW, I'm still waiting for comments on it regarding integration
into AVX512 code.  Some testing also would be great, but it, likely,
needs rebase now.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 2/2] Minimize the number of time calls in time_poll()

2021-07-15 Thread Mark Michelson

Hi Anton,

Out of curiosity, has this change made a noticeable impact on performance?

On 7/12/21 1:15 PM, anton.iva...@cambridgegreys.com wrote:

From: Anton Ivanov 

time_poll() makes an excessive number of time_msec() calls
which incur a performance penalty.

1. Avoid time_msec() call for timeout calculation when time_poll()
is asked to skip poll()

2. Reuse the time_msec() result from deadline calculation for
last_wakeup and timeout calculation.

Signed-off-by: Anton Ivanov 
---
  lib/timeval.c | 36 +---
  1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/timeval.c b/lib/timeval.c
index c6ac87376..64ab22e05 100644
--- a/lib/timeval.c
+++ b/lib/timeval.c
@@ -287,7 +287,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
long long int timeout_when, int *elapsed)
  {
  long long int *last_wakeup = last_wakeup_get();
-long long int start;
+long long int start, now;
  bool quiescent;
  int retval = 0;
  
@@ -297,28 +297,31 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
  if (*last_wakeup && !thread_is_pmd()) {
  log_poll_interval(*last_wakeup);
  }
-start = time_msec();
+now = start = time_msec();
  
  timeout_when = MIN(timeout_when, deadline);

  quiescent = ovsrcu_is_quiescent();
  
   for (;;) {
-long long int now = time_msec();
  int time_left;
  
-if (now >= timeout_when) {
+if (n_pollfds == 0) {
  time_left = 0;
-} else if ((unsigned long long int) timeout_when - now > INT_MAX) {
-time_left = INT_MAX;
  } else {
-time_left = timeout_when - now;
-}
-
-if (!quiescent) {
-if (!time_left) {
-ovsrcu_quiesce();
+if (now >= timeout_when) {
+time_left = 0;
+} else if ((unsigned long long int) timeout_when - now > INT_MAX) {
+time_left = INT_MAX;
  } else {
-ovsrcu_quiesce_start();
+time_left = timeout_when - now;
+}
+
+if (!quiescent) {
+if (!time_left) {
+ovsrcu_quiesce();
+} else {
+ovsrcu_quiesce_start();
+}
  }
  }
  
@@ -329,6 +332,8 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,

   */
  if (n_pollfds != 0) {
  retval = poll(pollfds, n_pollfds, time_left);
+} else {
+retval = 0;
  }
  if (retval < 0) {
  retval = -errno;
@@ -355,7 +360,8 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
  ovsrcu_quiesce_end();
  }
  
-if (deadline <= time_msec()) {
+now = time_msec();
+if (deadline <= now) {
  #ifndef _WIN32
  fatal_signal_handler(SIGALRM);
  #else
@@ -372,7 +378,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
  break;
  }
  }
-*last_wakeup = time_msec();
+*last_wakeup = now;
  refresh_rusage();
  *elapsed = *last_wakeup - start;
  return retval;





Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.

2021-07-15 Thread Han Zhou
On Thu, Jul 15, 2021 at 11:29 AM Ben Pfaff  wrote:
>
> On Thu, Jul 15, 2021 at 11:18:12AM -0700, Han Zhou wrote:
> > On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi <
> > lorenzo.bianc...@redhat.com> wrote:
> > > This should avoid some work by doing the cheapest check (the one on
> > > UseLogicalDatapathGroups) before any joins.  DDlog is probably
> > > factoring out the reference to the Flow relation, which is identical
> > > in both, but this ought to avoid the group_by aggregation (which is
> > > relatively expensive) in the case where UseLogicalDatapathGroups is
> > > not enabled.
> >
> > Thanks! I didn't know that the order matters. (not sure if there is
> > documentation that I missed)
>
> In general, DDlog executes each rule in the order given, so if you have
> a series of joins in a rule, then it's a good idea to order them, if you
> can, to keep the number of records small at each join step.  It won't
> affect the correctness, but it will affect performance.
>
> This might not be documented well.  I do occasionally work on the DDlog
> documentation, so I've made a note to try to see whether the effect of
> ordering is mentioned and improve it if I can.
>
> In this particular case, I talked to Leonid about it during a discussion
> of how to improve performance and memory use in the benchmark currently
> at issue, and Leonid says that the ordering doesn't actually matter in
> this case because both of the possibilities (the ones for
> UseLogicalDatapathGroups[true] and UseLogicalDatapathGroups[false]) had
> an identical clause at the beginning.  DDlog optimizes identical
> prefixes, so there was no real difference in performance.
>
Thanks for the explanation. However, the group_by part is not common for
these two, would this reordering still save that cost when
UseLogicalDatapathGroups is disabled?

> This patch could, therefore, be dropped, but it should also not be
> harmful.  Dropping it would require some changes to the later patch that
> updates the ovn-northd ddlog code, so folding it into that one would
> also be an option.


Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 01:39:04PM +, Ferriter, Cian wrote:
> 
> 
> > -Original Message-
> > From: Flavio Leitner 
> > Sent: Friday 9 July 2021 18:54
> > To: Ferriter, Cian 
> > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.
> > 
> > 
> > 
> > Hi,
> > 
> > After rebasing, the performance of branch master boosted in my env
> > from 12Mpps to 13Mpps. However, this specific patch brings down
> > to 12Mpps. I am using dpif_scalar and generic lookup (no AVX512).
> > 
> 
> Thanks for the investigation. Always great seeing perf numbers and details!
> 
> I just want to check my understanding here with what you're seeing:
> 
> Performance before DPIF patchset
> 12Mpps
> 
> Performance at this patch
> 12Mpps
> 
> Performance after DPIF patchset
> 13Mpps
> 
> So the performance recovers somewhere else in the patchset?


Interesting, which flags are you passing to build OVS?

Thanks for following up!
fbl


> 
> I've checked the performance behaviour in my case. I'm going to report 
> relative performance numbers. They are relative to master branch before 
> AVX512 DPIF was applied (c36c8e3).
> I tried to run a similar testcase, I can see you are using EMC from the 
> memcmp in perf top output. I am also using the scalar DPIF in all the below 
> testcases.
> 
> Master before AVX512 DPIF (c36c8e3)
> 1.000x (0.0%)
> DPIF patch 3 - dpif-avx512: Add ISA implementation of dpif.
> 1.010x (1.0%)
> DPIF patch 4 - dpif-netdev: Add command to switch dpif implementation.
> 1.042x (4.2%)
> DPIF patch 5 - dpif-netdev: Add command to get dpif implementations.
> 1.063x (6.3%)
> DPIF patch 6 - dpif-netdev: Add a partial HWOL PMD statistic.
> 1.069x (6.9%)
> Latest master which has AVX512 DPIF patches (d2e9703)
> 1.075x (7.5%)
> Master before AVX512 DPIF (c36c8e3), with prefetch change
> 0.983x (-1.7%)
> Latest master which has AVX512 DPIF patches (d2e9703), with prefetch change
> 1.080x (8.0%)
> 
> > (I don't think this report should block the patch because the
> > counter are interesting and the analysis below doesn't point
> > directly to the proposed changes.)
> > 
> > This is a diff using all patches applied versus this patch reverted:
> > 21.44% +6.08%  ovs-vswitchd[.] miniflow_extract
> >  8.94% -1.92%  libc-2.28.so[.] __memcmp_avx2_movbe
> > 14.62% +1.44%  ovs-vswitchd[.] dp_netdev_input__
>  2.80% -1.08%  ovs-vswitchd[.] dp_netdev_pmd_flush_output_on_port
> >  3.44% -0.91%  ovs-vswitchd[.] netdev_send
> > 
> > This is the code side by side, patch applied on the right side:
> > (sorry, long lines)
> > 
> 
> My mail client has wrapped the below lines, sorry for mangling the output!
> 
> 
> Please find it here:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385448.html
> 
> > 
> > 
> > I don't see any relevant optimization difference in the code
> > above, but the "mov %r15w,-0x2(%r13)" on the right side accounts
> > for almost all the difference, though on the left side it seems
> > a bit more spread.
> > 
> > I applied the patch below and it helped to get to 12.7Mpps, so
> > almost at the same levels. I wonder if you see the same result.
> > 
> 
> Since I don't see the drop that you see with this patch, when I apply the 
> below patch to the latest master, I see a smaller benefit.
> The relative performance after adding the below prefetch compared to before 
> (latest master):
> 1.005x (0.5%)
> 
> When I compare before/after performance (including the prefetch code, on 
> latest master), the overall performance difference is 0.5% here.
> 
> > diff --git a/lib/flow.c b/lib/flow.c
> > index 729d59b1b..4572e356b 100644
> > --- a/lib/flow.c
> > +++ b/lib/flow.c
> > @@ -746,6 +746,9 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> >  uint8_t *ct_nw_proto_p = NULL;
> >  ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
> > 
> > +/* dltype will be updated later. */
> > +OVS_PREFETCH_WRITE(miniflow_pointer(mf, dl_type));
> > +
> >  /* Metadata. */
> >  if (flow_tnl_dst_is_set(>tunnel)) {
> >  miniflow_push_words(mf, tunnel, >tunnel,
> > 
> > 
> > fbl
> > 
> 
> 
> 
> Thanks,
> Cian

-- 
fbl


Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.

2021-07-15 Thread Ben Pfaff
On Thu, Jul 15, 2021 at 08:45:00PM +0200, Lorenzo Bianconi wrote:
> > On Thu, Jul 15, 2021 at 11:18:12AM -0700, Han Zhou wrote:
> > > On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi <
> > > lorenzo.bianc...@redhat.com> wrote:
> > > > This should avoid some work by doing the cheapest check (the one on
> > > > UseLogicalDatapathGroups) before any joins.  DDlog is probably
> > > > factoring out the reference to the Flow relation, which is identical
> > > > in both, but this ought to avoid the group_by aggregation (which is
> > > > relatively expensive) in the case where UseLogicalDatapathGroups is
> > > > not enabled.
> > > 
> > > Thanks! I didn't know that the order matters. (not sure if there is
> > > documentation that I missed)
> > 
> > In general, DDlog executes each rule in the order given, so if you have
> > a series of joins in a rule, then it's a good idea to order them, if you
> > can, to keep the number of records small at each join step.  It won't
> > affect the correctness, but it will affect performance.
> > 
> > This might not be documented well.  I do occasionally work on the DDlog
> > documentation, so I've made a note to try to see whether the effect of
> > ordering is mentioned and improve it if I can.
> > 
> > In this particular case, I talked to Leonid about it during a discussion
> > of how to improve performance and memory use in the benchmark currently
> > at issue, and Leonid says that the ordering doesn't actually matter in
> > this case because both of the possibilities (the ones for
> > UseLogicalDatapathGroups[true] and UseLogicalDatapathGroups[false]) had
> > an identical clause at the beginning.  DDlog optimizes identical
> > prefixes, so there was no real difference in performance.
> > 
> > This patch could, therefore, be dropped, but it should also not be
> > harmful.  Dropping it would require some changes to the later patch that
> > updates the ovn-northd ddlog code, so folding it into that one would
> > also be an option.
> > 
> 
> Do you want me to repost dropping this patch or is the series fine as it is?

The series is fine as is.


Re: [ovs-dev] [PATCH] ovsdb-server: Fix memleak when failing to rea storage.

2021-07-15 Thread Ben Pfaff
On Wed, Jul 14, 2021 at 09:21:19AM +0200, Dumitru Ceara wrote:
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
> Signed-off-by: Dumitru Ceara 

Thanks!  I pushed this.  I noticed that the declaration could be moved
down to the first assignment, so I actually pushed the following.

I backported to all affected branches.

-8<--cut here-->8--

From: Dumitru Ceara 
Date: Wed, 14 Jul 2021 09:21:19 +0200
Subject: [PATCH] ovsdb-server: Fix memleak when failing to read storage.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Signed-off-by: Dumitru Ceara 
Signed-off-by: Ben Pfaff 
---
 ovsdb/ovsdb-server.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index b09232c654ab..f4a0fac99259 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -660,8 +660,6 @@ add_db(struct server_config *config, struct db *db)
 static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
 open_db(struct server_config *config, const char *filename)
 {
-struct db *db;
-
 /* If we know that the file is already open, return a good error message.
  * Otherwise, if the file is open, we'll fail later on with a harder to
  * interpret file locking error. */
@@ -676,9 +674,6 @@ open_db(struct server_config *config, const char *filename)
 return error;
 }
 
-db = xzalloc(sizeof *db);
-db->filename = xstrdup(filename);
-
 struct ovsdb_schema *schema;
 if (ovsdb_storage_is_clustered(storage)) {
 schema = NULL;
@@ -691,6 +686,9 @@ open_db(struct server_config *config, const char *filename)
 }
 ovs_assert(schema && !txn_json);
 }
+
+struct db *db = xzalloc(sizeof *db);
+db->filename = xstrdup(filename);
 db->db = ovsdb_create(schema, storage);
 ovsdb_jsonrpc_server_add_db(config->jsonrpc, db->db);
 
-- 
2.31.1



Re: [ovs-dev] [PATCH v14 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 09:36:13PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
>   8: OVS-DPDK - MFEX Configuration
> 
> Added a new directory to store the PCAP file used
> in the tests and a script to generate the fuzzy traffic
> type pcap to be used in fuzzy unit test.
> 
> Signed-off-by: Kumar Amber 
> Acked-by: Flavio Leitner 
> 
> ---

It looks good and works for me.

Acked-by: Flavio Leitner 

Without sse4.2:
OVS-DPDK unit tests

  6: OVS-DPDK - MFEX Autovalidator   skipped (system-dpdk.at:248)
  7: OVS-DPDK - MFEX Autovalidator Fuzzy skipped (system-dpdk.at:275)
  8: OVS-DPDK - MFEX Configuration   ok

With sse4.2:
OVS-DPDK unit tests

  6: OVS-DPDK - MFEX Autovalidator   ok
  7: OVS-DPDK - MFEX Autovalidator Fuzzy ok
  8: OVS-DPDK - MFEX Configuration   ok

fbl


Re: [ovs-dev] [PATCH v14 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 09:36:16PM +0530, kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit adds AVX512 implementations of miniflow extract.
> By using the 64 bytes available in an AVX512 register, it is
> possible to convert a packet to a miniflow data-structure in
> a small number of instructions.
> 
> The implementation here probes for Ether()/IP()/UDP() traffic,
> and builds the appropriate miniflow data-structure for packets
> that match the probe.
> 
> The implementation here is auto-validated by the miniflow
> extract autovalidator, hence its correctness can be easily
> tested and verified.
> 
> Note that this commit is designed to easily allow addition of new
> traffic profiles in a scalable way, without code duplication for
> each traffic profile.
> 
> Signed-off-by: Harry van Haaren 
> Acked-by: Eelco Chaudron 
> 
> ---

Acked-by: Flavio Leitner 



Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.

2021-07-15 Thread Lorenzo Bianconi
> On Thu, Jul 15, 2021 at 11:18:12AM -0700, Han Zhou wrote:
> > On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi <
> > lorenzo.bianc...@redhat.com> wrote:
> > > This should avoid some work by doing the cheapest check (the one on
> > > UseLogicalDatapathGroups) before any joins.  DDlog is probably
> > > factoring out the reference to the Flow relation, which is identical
> > > in both, but this ought to avoid the group_by aggregation (which is
> > > relatively expensive) in the case where UseLogicalDatapathGroups is
> > > not enabled.
> > 
> > Thanks! I didn't know that the order matters. (not sure if there is
> > documentation that I missed)
> 
> In general, DDlog executes each rule in the order given, so if you have
> a series of joins in a rule, then it's a good idea to order them, if you
> can, to keep the number of records small at each join step.  It won't
> affect the correctness, but it will affect performance.
> 
> This might not be documented well.  I do occasionally work on the DDlog
> documentation, so I've made a note to try to see whether the effect of
> ordering is mentioned and improve it if I can.
> 
> In this particular case, I talked to Leonid about it during a discussion
> of how to improve performance and memory use in the benchmark currently
> at issue, and Leonid says that the ordering doesn't actually matter in
> this case because both of the possibilities (the ones for
> UseLogicalDatapathGroups[true] and UseLogicalDatapathGroups[false]) had
> an identical clause at the beginning.  DDlog optimizes identical
> prefixes, so there was no real difference in performance.
> 
> This patch could, therefore, be dropped, but it should also not be
> harmful.  Dropping it would require some changes to the later patch that
> updates the ovn-northd ddlog code, so folding it into that one would
> also be an option.
> 

Do you want me to repost dropping this patch or is the series fine as it is?

Regards,
Lorenzo


Re: [ovs-dev] [PATCH v2] latch-unix: Decrease the stack usage in latch

2021-07-15 Thread Ben Pfaff
On Thu, Jul 15, 2021 at 04:28:12PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> 1. Make latch behave as described and documented - clear all
> outstanding latch writes when invoking latch_poll().
> 2. Decrease the size of the latch buffer. Less stack usage,
> less cache thrashing.
> 
> Signed-off-by: Anton Ivanov 

Applied, thanks!  I dropped the following hunk:

> --- a/lib/latch-unix.c
> +++ b/lib/latch-unix.c
> @@ -23,6 +23,7 @@
>  #include "openvswitch/poll-loop.h"
>  #include "socket-util.h"
>  
> +
>  /* Initializes 'latch' as initially unset. */
>  void
>  latch_init(struct latch *latch)


[ovs-dev] [PATCH ovn v2] northd: Fix defrag flows for duplicate vips

2021-07-15 Thread Mark Gray
When adding two SB flows with the same vip but different protocols, only
the most recent flow will be added due to the `if` statement:

if (!sset_contains(&all_ips, lb_vip->vip_str)) {
    sset_add(&all_ips, lb_vip->vip_str);

This can cause unexpected behaviour when two load balancers with
the same VIP (and different protocols) are added to a logical router.

This is due to the addition of "protocol" to the match in
defrag table flows in a previous commit.

Add a flow to the defrag table for every load balancer in order to resolve this.
Flows for Load Balancers without a port specified are added with priority 100.
Flows for Load Balancers with a port specified are added with priority 110.

Add a test to check behaviour of Logical Flows when two load balancers
of the same VIP are added.

This bug was discovered through the OVN CI (ovn-kubernetes.yml).

Fixes: 384a7c6237da ("northd: Refactor Logical Flows for routers with DNAT/Load Balancers")
Signed-off-by: Mark Gray 
---

Notes:
v2 - Address Mark M.'s comments
 * Add flows to defrag table for every LB VIP and every LB VIP + proto
   rather than just every unique LB VIP
 * Change priority of flows in LB VIP + proto case in order to not clash
   with flows in LB VIP case.
 * Add additional tests.

 northd/ovn-northd.8.xml | 35 +++-
 northd/ovn-northd.c | 64 ++--
 northd/ovn_northd.dl|  8 ++--
 tests/ovn-northd.at | 93 +++--
 4 files changed, 130 insertions(+), 70 deletions(-)

diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index a20c5b90dc66..6599ba194fe5 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -2773,17 +2773,32 @@ icmp6 {
 
 
 
-  If load balancing rules with virtual IP addresses (and ports) are
-  configured in OVN_Northbound database for a Gateway router,
+  If load balancing rules with only virtual IP addresses are configured in
+  OVN_Northbound database for a Gateway router,
   a priority-100 flow is added for each configured virtual IP address
-  VIP. For IPv4 VIPs the flow matches ip
-  && ip4.dst == VIP.  For IPv6 VIPs,
-  the flow matches ip && ip6.dst == VIP.
-  The flow applies the action reg0 = VIP
-  && ct_dnat; to send IP packets to the
-  connection tracker for packet de-fragmentation and to dnat the
-  destination IP for the committed connection before sending it to the
-  next table.
+  VIP. For IPv4 VIPs the flow matches
+  ip && ip4.dst == VIP.  For IPv6
+  VIPs, the flow matches ip && ip6.dst ==
+  VIP. The flow applies the action reg0 =
+  VIP; ct_dnat;  (or xxreg0 for IPv6) to
+  send IP packets to the connection tracker for packet de-fragmentation and
+  to dnat the destination IP for the committed connection before sending it
+  to the next table.
+
+
+
+  If load balancing rules with virtual IP addresses and ports are
+  configured in OVN_Northbound database for a Gateway router,
+  a priority-110 flow is added for each configured virtual IP address
+  VIP and protocol PROTO. For IPv4 VIPs
+  the flow matches ip && ip4.dst == VIP &&
+  PROTO. For IPv6 VIPs, the flow matches
+  ip && ip6.dst == VIP &&
+  PROTO. The flow applies the action reg0 =
+  VIP; ct_dnat; (or xxreg0 for IPv6) to send
+  IP packets to the connection tracker for packet de-fragmentation and to
+  dnat the destination IP for the committed connection before sending it to
+  the next table.
 
 
 
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 999c3f482c29..2a11172d94b6 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -9207,8 +9207,6 @@ static void
 build_lrouter_lb_flows(struct hmap *lflows, struct ovn_datapath *od,
struct hmap *lbs, struct ds *match)
 {
-/* A set to hold all ips that need defragmentation and tracking. */
-struct sset all_ips = SSET_INITIALIZER(_ips);
 
 for (int i = 0; i < od->nbr->n_load_balancer; i++) {
 struct nbrec_load_balancer *nb_lb = od->nbr->load_balancer[i];
@@ -9217,6 +9215,7 @@ build_lrouter_lb_flows(struct hmap *lflows, struct ovn_datapath *od,
 ovs_assert(lb);
 
 for (size_t j = 0; j < lb->n_vips; j++) {
+int prio = 100;
 struct ovn_lb_vip *lb_vip = >vips[j];
 
 bool is_udp = nullable_string_is_equal(nb_lb->protocol, "udp");
@@ -9225,42 +9224,41 @@ build_lrouter_lb_flows(struct hmap *lflows, struct ovn_datapath *od,
 const char *proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp";
 
 struct ds defrag_actions = DS_EMPTY_INITIALIZER;
-if (!sset_contains(&all_ips, lb_vip->vip_str)) {
-sset_add(&all_ips, lb_vip->vip_str);
-/* If there are any load balancing rules, we should send
- * the packet to conntrack for 

Re: [ovs-dev] [PATCH v2 1/2] Optimize the poll loop for poll_immediate_wake()

2021-07-15 Thread Mark Michelson

Hi Anton,

From my perspective (i.e. a user of poll_immediate_wake()), this looks 
fine by me. It results in the same return values and logging, and it 
removes unnecessary overhead. It's probably best for someone on the OVS 
team to give the final approval, but for me,


Acked-by: Mark Michelson 

On 7/12/21 1:15 PM, anton.iva...@cambridgegreys.com wrote:

From: Anton Ivanov 

If we are not obtaining any useful information out of the poll(),
such as is a fd busy or not, we do not need to do a poll() if
an immediate_wake() has been requested.

This cuts out all the pollfd hash additions, the forming of the poll
arguments, and the actual poll() after a call to
poll_immediate_wake().

Signed-off-by: Anton Ivanov 
---
  lib/poll-loop.c | 69 -
  lib/timeval.c   | 11 +++-
  2 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/lib/poll-loop.c b/lib/poll-loop.c
index 4e751ff2c..09bc4f5c4 100644
--- a/lib/poll-loop.c
+++ b/lib/poll-loop.c
@@ -53,6 +53,7 @@ struct poll_loop {
   * wake up immediately, or LLONG_MAX to wait forever. */
  long long int timeout_when; /* In msecs as returned by time_msec(). */
  const char *timeout_where;  /* Where 'timeout_when' was set. */
+bool immediate_wake;
  };
  
  static struct poll_loop *poll_loop(void);

@@ -107,6 +108,13 @@ poll_create_node(int fd, HANDLE wevent, short int events, const char *where)
  
  COVERAGE_INC(poll_create_node);
  
+if (loop->immediate_wake) {

+/* We have been asked to bail out of this poll loop.
+ * There is no point in yak-shaving a poll hmap.
+ */
+return;
+}
+
  /* Both 'fd' and 'wevent' cannot be set. */
  ovs_assert(!fd != !wevent);
  
@@ -181,8 +189,15 @@ poll_wevent_wait_at(HANDLE wevent, const char *where)

  void
  poll_timer_wait_at(long long int msec, const char *where)
  {
-long long int now = time_msec();
+long long int now;
  long long int when;
+struct poll_loop *loop = poll_loop();
+
+if (loop->immediate_wake) {
+return;
+}
+
+now = time_msec();
  
  if (msec <= 0) {

  /* Wake up immediately. */
@@ -229,7 +244,9 @@ poll_timer_wait_until_at(long long int when, const char *where)
  void
  poll_immediate_wake_at(const char *where)
  {
+struct poll_loop *loop = poll_loop();
  poll_timer_wait_at(0, where);
+loop->immediate_wake = true;
  }
  
  /* Logs, if appropriate, that the poll loop was awakened by an event

@@ -320,10 +337,10 @@ poll_block(void)
  {
  struct poll_loop *loop = poll_loop();
  struct poll_node *node;
-struct pollfd *pollfds;
+struct pollfd *pollfds = NULL;
  HANDLE *wevents = NULL;
  int elapsed;
-int retval;
+int retval = 0;
  int i;
  
  /* Register fatal signal events before actually doing any real work for

@@ -335,34 +352,38 @@ poll_block(void)
  }
  
  timewarp_run();

-pollfds = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *pollfds);
+if (!loop->immediate_wake) {
+pollfds = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *pollfds);
  
  #ifdef _WIN32

-wevents = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *wevents);
+wevents = xmalloc(hmap_count(&loop->poll_nodes) * sizeof *wevents);
  #endif
  
-/* Populate with all the fds and events. */

-i = 0;
-HMAP_FOR_EACH (node, hmap_node, &loop->poll_nodes) {
-pollfds[i] = node->pollfd;
+/* Populate with all the fds and events. */
+i = 0;
+HMAP_FOR_EACH (node, hmap_node, &loop->poll_nodes) {
+pollfds[i] = node->pollfd;
  #ifdef _WIN32
-wevents[i] = node->wevent;
-if (node->pollfd.fd && node->wevent) {
-short int wsa_events = 0;
-if (node->pollfd.events & POLLIN) {
-wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE;
-}
-if (node->pollfd.events & POLLOUT) {
-wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE;
+wevents[i] = node->wevent;
+if (node->pollfd.fd && node->wevent) {
+short int wsa_events = 0;
+if (node->pollfd.events & POLLIN) {
+wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE;
+}
+if (node->pollfd.events & POLLOUT) {
+wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE;
+}
+WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events);
  }
-WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events);
-}
  #endif
-i++;
-}
+i++;
+}
  
-retval = time_poll(pollfds, hmap_count(&loop->poll_nodes), wevents,
-   loop->timeout_when, &elapsed);
+retval = time_poll(pollfds, hmap_count(&loop->poll_nodes), wevents,
+   loop->timeout_when, &elapsed);
+} else {
+retval = time_poll(NULL, 0, NULL, loop->timeout_when, &elapsed);
+}
  if (retval < 0) {
   

Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-07-15 Thread Dumitru Ceara
On 7/15/21 7:37 PM, Han Zhou wrote:
> On Thu, Jul 15, 2021 at 9:17 AM Ilya Maximets  wrote:
>>
>> On 6/29/21 9:57 PM, Ilya Maximets wrote:
> Regarding the current patch, I think it's better to add a test case to
> cover the scenario and confirm that existing connections didn't
> reset. With that:
> Acked-by: Han Zhou 
>>>
>>> I'll work on a unit test for this.
>>
>> Hi.  Here is a unit test that I came up with:
>>
>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>> index 62181dd4d..e32f9ec89 100644
>> --- a/tests/ovsdb-idl.at
>> +++ b/tests/ovsdb-idl.at
>> @@ -2282,3 +2282,27 @@ OVSDB_CHECK_CLUSTER_IDL_C([simple idl, monitor_cond_since, cluster disconnect],
>>  008: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[]
> uuid=<2>
>>  009: done
>>  ]])
>> +
>> +dnl This test checks that IDL keeps the existing connection to the
> server if
>> +dnl it's still on a list of remotes after update.
>> +OVSDB_CHECK_IDL_C([simple idl, initially empty, set remotes],
>> +  [],
>> +  [['set-remote unix:socket' \
>> +'+set-remote unix:bad_socket,unix:socket' \
>> +'+set-remote unix:bad_socket' \
>> +'+set-remote unix:socket' \
>> +'set-remote unix:bad_socket,unix:socket' \
>> +'+set-remote unix:socket' \
>> +'+reconnect']],
>> +  [[000: empty
>> +001: new remotes: unix:socket, is connected: true
>> +002: new remotes: unix:bad_socket,unix:socket, is connected: true
>> +003: new remotes: unix:bad_socket, is connected: false
>> +004: new remotes: unix:socket, is connected: false
>> +005: empty
>> +006: new remotes: unix:bad_socket,unix:socket, is connected: true
>> +007: new remotes: unix:socket, is connected: true
>> +008: reconnect
>> +009: empty
>> +010: done
>> +]])
>> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
>> index a886f971e..93329cd4c 100644
>> --- a/tests/test-ovsdb.c
>> +++ b/tests/test-ovsdb.c
>> @@ -2621,6 +2621,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>>  setvbuf(stdout, NULL, _IONBF, 0);
>>
>>  symtab = ovsdb_symbol_table_create();
>> +const char remote_s[] = "set-remote ";
>>  const char cond_s[] = "condition ";
>>  if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
>>  update_conditions(idl, ctx->argv[2] + strlen(cond_s));
>> @@ -2664,6 +2665,11 @@ do_idl(struct ovs_cmdl_context *ctx)
>>  if (!strcmp(arg, "reconnect")) {
>>  print_and_log("%03d: reconnect", step++);
>>  ovsdb_idl_force_reconnect(idl);
>> +}  else if (!strncmp(arg, remote_s, strlen(remote_s))) {
>> +ovsdb_idl_set_remote(idl, arg + strlen(remote_s), true);
>> +print_and_log("%03d: new remotes: %s, is connected: %s", step++,
>> +  arg + strlen(remote_s),
>> +  ovsdb_idl_is_connected(idl) ? "true" : "false");
>>  }  else if (!strncmp(arg, cond_s, strlen(cond_s))) {
>>  update_conditions(idl, arg + strlen(cond_s));
>>  print_and_log("%03d: change conditions", step++);
>> ---
>>
>> Dumitru, Han, if it looks good to you, I can squash it in before
>> applying the patch.   What do you think?
> 
> Thanks Ilya. LGTM.

Looks good to me too, thanks!

> 
>>
>> Best regards, Ilya Maximets.
> 



Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.

2021-07-15 Thread Ben Pfaff
On Thu, Jul 15, 2021 at 11:18:12AM -0700, Han Zhou wrote:
> On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi <
> lorenzo.bianc...@redhat.com> wrote:
> > This should avoid some work by doing the cheapest check (the one on
> > UseLogicalDatapathGroups) before any joins.  DDlog is probably
> > factoring out the reference to the Flow relation, which is identical
> > in both, but this ought to avoid the group_by aggregation (which is
> > relatively expensive) in the case where UseLogicalDatapathGroups is
> > not enabled.
> 
> Thanks! I didn't know that the order matters. (not sure if there is
> documentation that I missed)

In general, DDlog executes each rule in the order given, so if you have
a series of joins in a rule, then it's a good idea to order them, if you
can, to keep the number of records small at each join step.  It won't
affect the correctness, but it will affect performance.
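To make the effect concrete, here is a toy Python model (not DDlog) of why evaluating a cheap, highly selective clause before a large join reduces work; the relation sizes and the `work` counter are purely illustrative:

```python
def join_then_filter(flows, flags):
    """Visit the big relation first, test the cheap clause last."""
    work, out = 0, []
    for f in flows:            # every tuple of the big relation is visited
        for g in flags:
            work += 1
            if g:
                out.append(f)
    return out, work

def filter_then_join(flows, flags):
    """Test the cheap single-row relation first, as the patch does."""
    work, out = 0, []
    for g in flags:
        work += 1
        if not g:              # bail out before touching the big relation
            continue
        for f in flows:
            work += 1
            out.append(f)
    return out, work

flows = list(range(1000))      # stands in for the Flow relation
flags = [False]                # stands in for UseLogicalDatapathGroups[false]
```

With `flags = [False]`, the first variant inspects all 1000 tuples while the second stops after a single check; both produce the same result, mirroring the point above that ordering affects performance, not correctness.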

This might not be documented well.  I do occasionally work on the DDlog
documentation, so I've made a note to try to see whether the effect of
ordering is mentioned and improve it if I can.

In this particular case, I talked to Leonid about it during a discussion
of how to improve performance and memory use in the benchmark currently
at issue, and Leonid says that the ordering doesn't actually matter in
this case because both of the possibilities (the ones for
UseLogicalDatapathGroups[true] and UseLogicalDatapathGroups[false]) had
an identical clause at the beginning.  DDlog optimizes identical
prefixes, so there was no real difference in performance.

This patch could, therefore, be dropped, but it should also not be
harmful.  Dropping it would require some changes to the later patch that
updates the ovn-northd ddlog code, so folding it into that one would
also be an option.


Re: [ovs-dev] [PATCH v14 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 09:36:12PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> This commit introduces an additional command line parameter
> for the mfex study function. If the user provides a packet count,
> it is used by study as the minimum number of packets that must be
> processed; otherwise a default value is chosen.
> Also introduces a third parameter for choosing a particular pmd core.
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
> 
> Signed-off-by: Kumar Amber 
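As a rough Python sketch of the behaviour the commit message describes (names are hypothetical; the real selection logic lives in lib/dpif-netdev-extract-study.c and counts miniflow hits per implementation), study could be modeled as:

```python
def study(packets, impls, study_cnt=128):
    """Run every candidate implementation on each packet; after study_cnt
    packets, pick the implementation that matched (hit) most often."""
    hits = {name: 0 for name in impls}
    for i, pkt in enumerate(packets, 1):
        for name, matcher in impls.items():
            if matcher(pkt):
                hits[name] += 1
        if i >= study_cnt:     # stop once the configured packet count is reached
            break
    return max(hits, key=hits.get)
```

The `study_cnt` argument plays the role of the optional packet count parameter, defaulting here as in the patch's documented default.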
> 
> ---
> v14:
> - fix logging format and xmas ordering
> v13:
> - reworked the set command as per discussion
> - fixed the atomic set in study
> - added bool for handling study mfex to simplify logic and command output
> - fixed double space in variable declaration and removed static
> v12:
> - reworked the set command to sweep
> - include fixes to study.c and doc changes
> v11:
> - include comments from Eelco
> - reworked set command as per discussion
> v10:
> - fix review comments Eelco
> v9:
> - fix review comments Flavio
> v7:
> - change the command parameters for core_id and study_pkt_cnt
> v5:
> - fix review comments (Ian, Flavio, Eelco)
> - introduce pmd core id parameter
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  38 +-
>  lib/dpif-netdev-extract-study.c  |  26 +++-
>  lib/dpif-netdev-private-extract.h|   9 ++
>  lib/dpif-netdev.c| 175 ++-
>  4 files changed, 216 insertions(+), 32 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst 
> b/Documentation/topics/dpdk/bridge.rst
> index a47153495..8c500c504 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -284,12 +284,46 @@ command also shows whether the CPU supports each 
> implementation ::
>  
>  An implementation can be selected manually by the following command ::
>  
> -$ ovs-appctl dpif-netdev/miniflow-parser-set study
> +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
> + [study_cnt]
> +
> +The above command has two optional parameters: study_cnt and core_id.
> +The core_id parameter assigns a particular miniflow extract function to
> +the pmd thread on that core. The study_cnt parameter, which is specific
> +to study and ignored by other implementations, specifies how many packets
> +are needed to choose the best implementation.
>  
>  Also user can select the study implementation which studies the traffic for
>  a specific number of packets by applying all available implementations of
>  miniflow extract and then chooses the one with the most optimal result for
> -that traffic pattern.
> +that traffic pattern. The user can optionally provide a packet count
> +[study_cnt] parameter which is the minimum number of packets that OVS must
> +study before choosing an optimal implementation. If no packet count is
> +provided, then the default value of 128 is chosen. Also, as there is no
> +synchronization point between threads, one PMD thread might still be running
> +a previous round, and can now decide on earlier data.
> +
> +The per packet count is a global value, and parallel study executions with
> +differing packet counts will use the most recent count value provided by 
> user.
> +
> +Study can be selected with packet count by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> +
> +Study can be selected with packet count and explicit PMD selection
> +by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
> +
> +In the above command the first parameter is the CORE ID of the PMD
> +thread and this can also be used to explicitly set the miniflow
> +extraction function pointer on different PMD threads.
> +
> +Scalar can be selected on core 3 by the following command where
> +study count should not be provided for any implementation other
> +than study ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
>  
>  Miniflow Extract Validation
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> index 02b709f8b..4340c9eee 100644
> --- a/lib/dpif-netdev-extract-study.c
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -25,7 +25,7 @@
>  
>  VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
>  
> -static atomic_uint32_t mfex_study_pkts_count = 0;
> +static atomic_uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
>  
>  /* Struct to hold miniflow study stats. */
>  struct study_stats {
> @@ -48,6 +48,25 @@ mfex_study_get_study_stats_ptr(void)
>  return stats;
>  }
>  
> +int
> +mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
> +{
> +struct dpif_miniflow_extract_impl *miniflow_funcs;
> +miniflow_funcs = dpif_mfex_impl_info_get();
> +
> +/* If the packet count is set and implementation called is study then
> + * set packet counter to requested number else return -EINVAL.
> + */
> 

Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.

2021-07-15 Thread Han Zhou
On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi <
lorenzo.bianc...@redhat.com> wrote:
>
> From: Ben Pfaff 
>
> This should avoid some work by doing the cheapest check (the one on
> UseLogicalDatapathGroups) before any joins.  DDlog is probably
> factoring out the reference to the Flow relation, which is identical
> in both, but this ought to avoid the group_by aggregation (which is
> relatively expensive) in the case where UseLogicalDatapathGroups is
> not enabled.
>
> Signed-off-by: Ben Pfaff 
> Signed-off-by: Lorenzo Bianconi 
> ---
>  northd/ovn_northd.dl | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> index ceeabe6f3..46280e22e 100644
> --- a/northd/ovn_northd.dl
> +++ b/northd/ovn_northd.dl
> @@ -1659,17 +1659,17 @@ AggregatedFlow(.logical_datapaths = g.to_set(),
> .__match = __match,
> .actions = actions,
> .external_ids = external_ids) :-
> +UseLogicalDatapathGroups[true],
>  Flow(logical_datapath, stage, priority, __match, actions,
external_ids),
> -var g = logical_datapath.group_by((stage, priority, __match,
actions, external_ids)),
> -UseLogicalDatapathGroups[true].
> +var g = logical_datapath.group_by((stage, priority, __match,
actions, external_ids)).
>  AggregatedFlow(.logical_datapaths = set_singleton(logical_datapath),
> .stage = stage,
> .priority = priority,
> .__match = __match,
> .actions = actions,
> .external_ids = external_ids) :-
> -Flow(logical_datapath, stage, priority, __match, actions,
external_ids),
> -UseLogicalDatapathGroups[false].
> +UseLogicalDatapathGroups[false],
> +Flow(logical_datapath, stage, priority, __match, actions,
external_ids).
>
>  for (f in AggregatedFlow()) {
>  var pipeline = if (f.stage.pipeline == Ingress) "ingress" else
"egress" in
> --
> 2.31.1
>

Thanks! I didn't know that the order matters. (not sure if there is
documentation that I missed)

Acked-by: Han Zhou 


Re: [ovs-dev] [PATCH ovn] northd: Fix defrag flows for duplicate vips

2021-07-15 Thread Dumitru Ceara
On 7/15/21 7:04 PM, Mark Gray wrote:
> On 15/07/2021 15:29, Dumitru Ceara wrote:
>> On 7/15/21 3:54 PM, Mark Gray wrote:
>>> On 15/07/2021 14:16, Mark Michelson wrote:
 Hi Mark,
>>
>> Hi Mark, Mark,
>>

 I'm a bit curious about this change. Does the removal of the protocol 
 from the match mean that traffic that is not of the protocol specified 
 in the load balancer will be ct_dnat()'ed? Does that constitute 
 unexpected behavior?

>>>
>>> Yes, this is the case. It's a tradeoff between number of flows and
>>> recirculations but thinking about it again, it may be better to have more
>>> flows. I will create a v2.
>>>
>>
>> Unless we match on proto *and* L4 port I don't think it's worth adding
>> per proto flows.  Assuming a TCP load balancer, all TCP traffic with
>> destination VIP will still be ct_dnat()'ed, even if the TCP destination
>> port is not the one defined in the load balancer VIP.
>>
>> On the other hand, using the same VIP for multiple ports is probably a
>> common use case so if we add the L4 port to the match the number of
>> logical flows might increase significantly.
> 
> I don't think we can match on L4 port AFAIK, this could cause misses
> with fragmented packets (which is the whole point of the defrag table).

You're right, sorry about the noise.

> 
> I guess it depends on the use case as that will determine the number of
> vips (which will generate a certain number of flows) and the traffic
> pattern (which will determine the average number of ct_dnat()s). In a
> system with a handful of VIPs for TCP traffic but mostly hosting UDP
> traffic, it may be better to do as Mark M. suggests.
> 
> The number of flows may not be that high as we only add one flow per
> protocol rather than one per port. So I guess in the worst case, this
> could be 4x the number of load balancer VIPs associated with the logical
> router.
> 

Ok, it doesn't sound that bad indeed.



Re: [ovs-dev] OVS "soft freeze" for 2.16 is in effect.

2021-07-15 Thread Stokes, Ian
> Hi.  As described in Documentation/internals/release-process.rst, we are
> in a "soft freeze" state:
> 
>During the freeze, we ask committers to refrain from applying patches that
>add new features unless those patches were already being publicly discussed
>and reviewed before the freeze began.  Bug fixes are welcome at any time.
>Please propose and discuss exceptions on ovs-dev.
> 
> We should branch for version 2.16 in two weeks from now, on Friday, July 16.
> 
> Current known exception that will likely be accepted during soft-freeze, but
> not prepared/reviewed yet:
>   - We're awaiting patches to fix performance issue in VXLAN decap offloading
> implementation in userspace datapath, but this is more a bug fix than a
> new feature.
> 
Hi Ilya,

With branching for 2.16 being planned tomorrow, the other exceptions that I am 
aware of are as follows

MFEX Optimization (v14 sent, awaiting ACK on minor rework)
RXQ Scheduling (Acked by RH and Intel already)
dpdk: Logs to announce removal of defaults for socket-mem and limit.

Is there anything else on your side that you are aware that is in a position to 
be applied?

Thanks
Ian


Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-07-15 Thread Han Zhou
On Thu, Jul 15, 2021 at 9:17 AM Ilya Maximets  wrote:
>
> On 6/29/21 9:57 PM, Ilya Maximets wrote:
> >>> Regarding the current patch, I think it's better to add a test case to
> >>> cover the scenario and confirm that existing connections didn't
reset. With
> >>> that:
> >>> Acked-by: Han Zhou 
> >
> > I'll work on a unit test for this.
>
> Hi.  Here is a unit test that I came up with:
>
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index 62181dd4d..e32f9ec89 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -2282,3 +2282,27 @@ OVSDB_CHECK_CLUSTER_IDL_C([simple idl,
monitor_cond_since, cluster disconnect],
>  008: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[]
uuid=<2>
>  009: done
>  ]])
> +
> +dnl This test checks that IDL keeps the existing connection to the
server if
> +dnl it's still on a list of remotes after update.
> +OVSDB_CHECK_IDL_C([simple idl, initially empty, set remotes],
> +  [],
> +  [['set-remote unix:socket' \
> +'+set-remote unix:bad_socket,unix:socket' \
> +'+set-remote unix:bad_socket' \
> +'+set-remote unix:socket' \
> +'set-remote unix:bad_socket,unix:socket' \
> +'+set-remote unix:socket' \
> +'+reconnect']],
> +  [[000: empty
> +001: new remotes: unix:socket, is connected: true
> +002: new remotes: unix:bad_socket,unix:socket, is connected: true
> +003: new remotes: unix:bad_socket, is connected: false
> +004: new remotes: unix:socket, is connected: false
> +005: empty
> +006: new remotes: unix:bad_socket,unix:socket, is connected: true
> +007: new remotes: unix:socket, is connected: true
> +008: reconnect
> +009: empty
> +010: done
> +]])
> diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
> index a886f971e..93329cd4c 100644
> --- a/tests/test-ovsdb.c
> +++ b/tests/test-ovsdb.c
> @@ -2621,6 +2621,7 @@ do_idl(struct ovs_cmdl_context *ctx)
>  setvbuf(stdout, NULL, _IONBF, 0);
>
>  symtab = ovsdb_symbol_table_create();
> +const char remote_s[] = "set-remote ";
>  const char cond_s[] = "condition ";
>  if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
>  update_conditions(idl, ctx->argv[2] + strlen(cond_s));
> @@ -2664,6 +2665,11 @@ do_idl(struct ovs_cmdl_context *ctx)
>  if (!strcmp(arg, "reconnect")) {
>  print_and_log("%03d: reconnect", step++);
>  ovsdb_idl_force_reconnect(idl);
> +}  else if (!strncmp(arg, remote_s, strlen(remote_s))) {
> +ovsdb_idl_set_remote(idl, arg + strlen(remote_s), true);
> +print_and_log("%03d: new remotes: %s, is connected: %s",
step++,
> +  arg + strlen(remote_s),
> +  ovsdb_idl_is_connected(idl) ? "true" :
"false");
>  }  else if (!strncmp(arg, cond_s, strlen(cond_s))) {
>  update_conditions(idl, arg + strlen(cond_s));
>  print_and_log("%03d: change conditions", step++);
> ---
>
> Dumitru, Han, if it looks good to you, I can squash it in before
> applying the patch.   What do you think?

Thanks Ilya. LGTM.

>
> Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v5 2/3] dpif-netlink: fix report_loss() message

2021-07-15 Thread Aaron Conole
Mark Gray  writes:

> Fixes: 1579cf677fcb ("dpif-linux: Implement the API functions to allow 
> multiple handler threads read upcall.")
> Signed-off-by: Mark Gray 
> Acked-by: Flavio Leitner 
> ---
>
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * Added "Fixes" tag
>
>  lib/dpif-netlink.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>

WOW!  This is an old fix.  I guess this must have been flagged by some
compiler update.  I don't think it's really part of 'per-cpu support' -
it should definitely be merged, though.

Acked-by: Aaron Conole 



Re: [ovs-dev] [PATCH v5 1/3] ofproto: change type of n_handlers and n_revalidators

2021-07-15 Thread Aaron Conole
Mark Gray  writes:

> 'n_handlers' and 'n_revalidators' are declared as type 'size_t'.
> However, dpif_handlers_set() requires parameter 'n_handlers' as
> type 'uint32_t'. This patch fixes this type mismatch.
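The mismatch can be illustrated with a small Python model of C's unsigned narrowing: an implicit size_t-to-uint32_t conversion reduces the value modulo 2**32, which silently truncates values that exceed UINT32_MAX on LP64 platforms (names here are illustrative, not from the patch):

```python
UINT32_MAX = 0xFFFFFFFF

def as_uint32(n):
    """Model C's implicit conversion to uint32_t: reduce modulo 2**32."""
    return n & UINT32_MAX

# A size_t on a 64-bit platform can hold this value; a uint32_t cannot,
# so passing it to a uint32_t parameter silently becomes 1.
n_handlers = UINT32_MAX + 2
truncated = as_uint32(n_handlers)
```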
>
> Signed-off-by: Mark Gray 
> Acked-by: Flavio Leitner 
> ---
>
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * fixed inconsistency with change of size_t -> uint32_t
>
>  ofproto/ofproto-dpif-upcall.c | 20 ++--
>  ofproto/ofproto-dpif-upcall.h |  5 +++--
>  ofproto/ofproto-provider.h|  2 +-
>  ofproto/ofproto.c |  2 +-
>  4 files changed, 15 insertions(+), 14 deletions(-)
>

I guess this is more of a cleanup - did it fix anything in practice?

Either way,

Acked-by: Aaron Conole 



Re: [ovs-dev] [PATCH ovn] northd: Fix defrag flows for duplicate vips

2021-07-15 Thread Mark Gray
On 15/07/2021 15:29, Dumitru Ceara wrote:
> On 7/15/21 3:54 PM, Mark Gray wrote:
>> On 15/07/2021 14:16, Mark Michelson wrote:
>>> Hi Mark,
> 
> Hi Mark, Mark,
> 
>>>
>>> I'm a bit curious about this change. Does the removal of the protocol 
>>> from the match mean that traffic that is not of the protocol specified 
>>> in the load balancer will be ct_dnat()'ed? Does that constitute 
>>> unexpected behavior?
>>>
>>
>> Yes, this is the case. It's a tradeoff between number of flows and
>> recirculations but thinking about it again, it may be better to have more
>> flows. I will create a v2.
>>
> 
> Unless we match on proto *and* L4 port I don't think it's worth adding
> per proto flows.  Assuming a TCP load balancer, all TCP traffic with
> destination VIP will still be ct_dnat()'ed, even if the TCP destination
> port is not the one defined in the load balancer VIP.
> 
> On the other hand, using the same VIP for multiple ports is probably a
> common use case so if we add the L4 port to the match the number of
> logical flows might increase significantly.

I don't think we can match on L4 port AFAIK, this could cause misses
with fragmented packets (which is the whole point of the defrag table).

I guess it depends on the use case as that will determine the number of
vips (which will generate a certain number of flows) and the traffic
pattern (which will determine the average number of ct_dnat()s). In a
system with a handful of VIPs for TCP traffic but mostly hosting UDP
traffic, it may be better to do as Mark M. suggests.

The number of flows may not be that high as we only add one flow per
protocol rather than one per port. So I guess in the worst case, this
could be 4x the number of load balancer VIPs associated with the logical
router.
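A quick sketch of the counting argument with made-up load-balancer data: matching per (VIP, protocol) bounds the number of defrag flows at a handful per VIP, while matching per (VIP, protocol, port) would grow with every load-balancer entry (and is ruled out anyway by fragmentation):

```python
lb_entries = [
    ("10.0.0.1", "tcp", 80), ("10.0.0.1", "tcp", 443),
    ("10.0.0.1", "udp", 53), ("10.0.0.2", "tcp", 80),
]

# One defrag flow per (vip, protocol)...
per_proto_flows = {(vip, proto) for vip, proto, _port in lb_entries}

# ...versus one per (vip, protocol, port), which grows with each LB entry.
per_port_flows = {(vip, proto, port) for vip, proto, port in lb_entries}
```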

> 
> Regards,
> Dumitru
> 



Re: [ovs-dev] [PATCH v3 4/9] ovsdb: row: Add support for xor-based row updates.

2021-07-15 Thread Mark Gray
On 14/07/2021 14:50, Ilya Maximets wrote:
> This will be used to apply update3 type updates to ovsdb tables
> while processing updates for future ovsdb 'relay' service model.
> 
> 'ovsdb_datum_apply_diff' is allowed to fail, so adding support
> to return this error.
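For set-typed columns, the update3-style "xor" diff described above toggles membership rather than replacing the value outright; a minimal Python sketch of the two behaviours (assumed semantics, simplified from the ovsdb datum code):

```python
def apply_xor_diff(current: set, diff: set) -> set:
    """update3-style 'xor': each element of the diff is added if absent
    from the current value and removed if present."""
    return current ^ diff          # symmetric difference

def apply_replace(current: set, new: set) -> set:
    """Plain update: the new value simply replaces the old one."""
    return set(new)
```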
> 
> Acked-by: Dumitru Ceara 
> Signed-off-by: Ilya Maximets 
> ---
>  ovsdb/execution.c   |  5 +++--
>  ovsdb/replication.c |  2 +-
>  ovsdb/row.c | 30 +-
>  ovsdb/row.h |  6 --
>  ovsdb/table.c   |  9 +
>  ovsdb/table.h   |  2 +-
>  6 files changed, 39 insertions(+), 15 deletions(-)
> 
> diff --git a/ovsdb/execution.c b/ovsdb/execution.c
> index 3a0dad5d0..f6150e944 100644
> --- a/ovsdb/execution.c
> +++ b/ovsdb/execution.c
> @@ -483,8 +483,9 @@ update_row_cb(const struct ovsdb_row *row, void *ur_)
>  
>  ur->n_matches++;
>  if (!ovsdb_row_equal_columns(row, ur->row, ur->columns)) {
> -ovsdb_row_update_columns(ovsdb_txn_row_modify(ur->txn, row),
> - ur->row, ur->columns);
> +ovsdb_error_assert(ovsdb_row_update_columns(
> +   ovsdb_txn_row_modify(ur->txn, row),
> +   ur->row, ur->columns, false));
>  }
>  
>  return true;
> diff --git a/ovsdb/replication.c b/ovsdb/replication.c
> index b755976b0..d8b56d813 100644
> --- a/ovsdb/replication.c
> +++ b/ovsdb/replication.c
> @@ -677,7 +677,7 @@ process_table_update(struct json *table_update, const 
> char *table_name,
>  struct ovsdb_error *error;
>  error = (!new ? ovsdb_table_execute_delete(txn, , table)
>   : !old ? ovsdb_table_execute_insert(txn, , table, new)
> - : ovsdb_table_execute_update(txn, , table, new));
> + : ovsdb_table_execute_update(txn, , table, new, 
> false));
>  if (error) {
>  if (!strcmp(ovsdb_error_get_tag(error), "consistency 
> violation")) {
>  ovsdb_error_assert(error);
> diff --git a/ovsdb/row.c b/ovsdb/row.c
> index 755ab91a8..65a054621 100644
> --- a/ovsdb/row.c
> +++ b/ovsdb/row.c
> @@ -163,20 +163,40 @@ ovsdb_row_equal_columns(const struct ovsdb_row *a,
>  return true;
>  }
>  
> -void
> +struct ovsdb_error *
>  ovsdb_row_update_columns(struct ovsdb_row *dst,
>   const struct ovsdb_row *src,
> - const struct ovsdb_column_set *columns)
> + const struct ovsdb_column_set *columns,
> + bool xor)
>  {
>  size_t i;
>  
>  for (i = 0; i < columns->n_columns; i++) {
>  const struct ovsdb_column *column = columns->columns[i];
> +struct ovsdb_datum xor_datum;
> +struct ovsdb_error *error;
> +
> +if (xor) {
> +error = ovsdb_datum_apply_diff(_datum,
> +   >fields[column->index],
> +   >fields[column->index],
> +   >type);
> +if (error) {
> +return error;
> +}
> +}
> +
>  ovsdb_datum_destroy(>fields[column->index], >type);
> -ovsdb_datum_clone(>fields[column->index],
> -  >fields[column->index],
> -  >type);
> +
> +if (xor) {
> +ovsdb_datum_swap(>fields[column->index], _datum);
> +} else {
> +ovsdb_datum_clone(>fields[column->index],
> +  >fields[column->index],
> +  >type);
> +}
>  }
> +return NULL;
>  }
>  
>  /* Appends the string form of the value in 'row' of each of the columns in
> diff --git a/ovsdb/row.h b/ovsdb/row.h
> index 2c441b5a4..394ac8eb4 100644
> --- a/ovsdb/row.h
> +++ b/ovsdb/row.h
> @@ -82,8 +82,10 @@ bool ovsdb_row_equal_columns(const struct ovsdb_row *,
>  int ovsdb_row_compare_columns_3way(const struct ovsdb_row *,
> const struct ovsdb_row *,
> const struct ovsdb_column_set *);
> -void ovsdb_row_update_columns(struct ovsdb_row *, const struct ovsdb_row *,
> -  const struct ovsdb_column_set *);
> +struct ovsdb_error *ovsdb_row_update_columns(struct ovsdb_row *,
> + const struct ovsdb_row *,
> + const struct ovsdb_column_set *,
> + bool xor);
>  void ovsdb_row_columns_to_string(const struct ovsdb_row *,
>   const struct ovsdb_column_set *, struct ds 
> *);
>  struct ovsdb_error *ovsdb_row_from_json(struct ovsdb_row *,
> diff --git a/ovsdb/table.c b/ovsdb/table.c
> index 2935bd897..455a3663f 100644
> --- a/ovsdb/table.c
> +++ b/ovsdb/table.c
> @@ -384,7 +384,8 @@ ovsdb_table_execute_delete(struct ovsdb_txn *txn, const 
> struct uuid 

Re: [ovs-dev] [PATCH v3 9/9] docs: Add documentation for ovsdb relay mode.

2021-07-15 Thread Mark Gray
On 14/07/2021 14:50, Ilya Maximets wrote:
> Main documentation for the service model and tutorial with the use case
> and configuration examples.
> 
> Acked-by: Dumitru Ceara 
> Signed-off-by: Ilya Maximets 
> ---
>  Documentation/automake.mk|   1 +
>  Documentation/ref/ovsdb.7.rst|  62 --
>  Documentation/topics/index.rst   |   1 +
>  Documentation/topics/ovsdb-relay.rst | 124 +++
>  NEWS |   3 +
>  ovsdb/ovsdb-server.1.in  |  27 +++---
>  6 files changed, 200 insertions(+), 18 deletions(-)
>  create mode 100644 Documentation/topics/ovsdb-relay.rst
> 
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index bc30f94c5..213d9c867 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -52,6 +52,7 @@ DOC_SOURCE = \
>   Documentation/topics/networking-namespaces.rst \
>   Documentation/topics/openflow.rst \
>   Documentation/topics/ovs-extensions.rst \
> + Documentation/topics/ovsdb-relay.rst \
>   Documentation/topics/ovsdb-replication.rst \
>   Documentation/topics/porting.rst \
>   Documentation/topics/record-replay.rst \
> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
> index e4f1bf766..980ba29e7 100644
> --- a/Documentation/ref/ovsdb.7.rst
> +++ b/Documentation/ref/ovsdb.7.rst
> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, 
> respectively.
>  Service Models
>  ==
>  
> -OVSDB supports three service models for databases: **standalone**,
> -**active-backup**, and **clustered**.  The service models provide different
> -compromises among consistency, availability, and partition tolerance.  They
> -also differ in the number of servers required and in terms of performance.  
> The
> -standalone and active-backup database service models share one on-disk 
> format,
> -and clustered databases use a different format, but the OVSDB programs work
> -with both formats.  ``ovsdb(5)`` documents these file formats.
> +OVSDB supports four service models for databases: **standalone**,
> +**active-backup**, **relay** and **clustered**.  The service models provide
> +different compromises among consistency, availability, and partition 
> tolerance.
> +They also differ in the number of servers required and in terms of 
> performance.
> +The standalone and active-backup database service models share one on-disk
> +format, and clustered databases use a different format, but the OVSDB 
> programs
> +work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
> +databases have no on-disk storage.
>  
>  RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
>  any particular service model.
> @@ -406,6 +407,50 @@ following consequences:
>that the client previously read.  The OVSDB client library in Open vSwitch
>uses this feature to avoid servers with stale data.
>  
> +Relay Service Model
> +---
> +
> +A **relay** database is a way to scale out read-mostly access to the
> +existing database working in any service model including relay.
> +
> +Relay database creates and maintains an OVSDB connection with another OVSDB
> +server.  It uses this connection to maintain an in-memory copy of the remote
> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the
> +database content changes on the relay source in real time.
> +
> +The purpose of a relay server is to scale out the number of database clients.
> +Read-only transactions and monitor requests are fully handled by the relay
> +server itself.  For the transactions that request database modifications,
> +relay works as a proxy between the client and the relay source, i.e. it
> +forwards transactions and replies between them.
> +
> +Compared to the clustered and active-backup models, relay service model
> +provides read and write access to the database similarly to a clustered
> +database (and even more scalable), but with generally insignificant 
> performance
> +overhead of an active-backup model.  At the same time it doesn't increase
> +availability that needs to be covered by the service model of the relay 
> source.
> +
> +Relay database has no on-disk storage and therefore cannot be converted to
> +any other service model.
> +
> +If there is already a database started in any service model, to start a relay
> +database server use ``ovsdb-server relay:<db name>:<relay source>``, where
> +<db name> is the database name as specified in the schema of the database
> +that the existing server runs, and <relay source> is an OVSDB connection method
> +(see `Connection Methods`_ below) that connects to the existing database
> +server.  <relay source> could contain a comma-separated list of connection
> +methods, e.g. to connect to any server of the clustered database.
> +Multiple relay servers could be started for the same relay source.
> +
> +Since the way relays handle read and write transactions is very similar
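As a concrete illustration of the relay startup syntax, a relay serving the OVN_Southbound database from a three-node clustered relay source might be started as follows (database name, addresses, and ports are hypothetical):

```shell
$ ovsdb-server --remote=ptcp:6642 \
      relay:OVN_Southbound:tcp:10.0.0.2:6642,tcp:10.0.0.3:6642,tcp:10.0.0.4:6642
```

The comma-separated list after the database name lets the relay fail over to any member of the source cluster.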

Re: [ovs-dev] [PATCH v3 7/9] ovsdb: relay: Reflect connection status in _Server database.

2021-07-15 Thread Mark Gray
On 14/07/2021 14:50, Ilya Maximets wrote:
> It might be important for clients to know that a relay lost connection
> with the relay remote, so they could re-connect to another relay.
> 
> Signed-off-by: Ilya Maximets 
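The flap-suppression idea in ovsdb_relay_is_connected() can be sketched in Python as follows (RECONNECT_GRACE_MS stands in for RELAY_MAX_RECONNECTION_MS; the value used here is an assumption for illustration):

```python
RECONNECT_GRACE_MS = 30_000   # assumed grace period for reconnection

class ConnState:
    """Debounce 'connected' reports to avoid status flapping while the
    relay briefly reconnects to its source."""
    def __init__(self):
        self.last_connected_ms = 0

    def note_connected(self, now_ms):
        self.last_connected_ms = now_ms

    def is_connected(self, actually_up, now_ms):
        if actually_up:
            return True
        # Keep reporting True briefly, giving the connection layer time
        # to reconnect before the status flips for upper layers.
        return now_ms - self.last_connected_ms < RECONNECT_GRACE_MS
```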
> ---
>  ovsdb/_server.xml| 17 +
>  ovsdb/ovsdb-server.c |  3 ++-
>  ovsdb/relay.c| 34 ++
>  ovsdb/relay.h|  4 
>  4 files changed, 49 insertions(+), 9 deletions(-)
> 
> diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml
> index 37297da73..7866f134f 100644
> --- a/ovsdb/_server.xml
> +++ b/ovsdb/_server.xml
> @@ -71,6 +71,15 @@
>source.
>  
>  
> +
> +  True if the database is connected to its storage.  A standalone 
> database
> +  is always connected.  A clustered database is connected if the server 
> is
> +  in contact with a majority of its cluster.  A relay database is 
> connected
> +  if the server is in contact with the relay source, i.e. is connected to
> +  the server it syncs from.  An unconnected database cannot be modified 
> and
> +  its data might be unavailable or stale.
> +
> +
>  
>
>  These columns are most interesting and in some cases only relevant 
> for
> @@ -78,14 +87,6 @@
>  column is clustered.
>
>  
> -  
> -True if the database is connected to its storage.  A standalone or
> -active-backup database is always connected.  A clustered database is
> -connected if the server is in contact with a majority of its cluster.
> -An unconnected database cannot be modified and its data might be
> -unavailable or stale.
> -  
> -
>
>  True if the database is the leader in its cluster.  For a standalone 
> or
>  active-backup database, this is always true.  For a relay database,
> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
> index ddf868d16..0181fe987 100644
> --- a/ovsdb/ovsdb-server.c
> +++ b/ovsdb/ovsdb-server.c
> @@ -1193,7 +1193,8 @@ update_database_status(struct ovsdb_row *row, struct db 
> *db)
>  ovsdb_util_write_string_column(row, "model",
>  db->db->is_relay ? "relay" : 
> ovsdb_storage_get_model(db->db->storage));
>  ovsdb_util_write_bool_column(row, "connected",
> - 
> ovsdb_storage_is_connected(db->db->storage));
> +db->db->is_relay ? ovsdb_relay_is_connected(db->db)
> + : ovsdb_storage_is_connected(db->db->storage));
>  ovsdb_util_write_bool_column(row, "leader",
>  db->db->is_relay ? false : ovsdb_storage_is_leader(db->db->storage));
>  ovsdb_util_write_uuid_column(row, "cid",
> diff --git a/ovsdb/relay.c b/ovsdb/relay.c
> index df9906bda..fb16ce35c 100644
> --- a/ovsdb/relay.c
> +++ b/ovsdb/relay.c
> @@ -31,6 +31,7 @@
>  #include "ovsdb-error.h"
>  #include "row.h"
>  #include "table.h"
> +#include "timeval.h"
>  #include "transaction.h"
>  #include "transaction-forward.h"
>  #include "util.h"
> @@ -47,8 +48,36 @@ struct relay_ctx {
>  struct ovsdb_schema *new_schema;
>  schema_change_callback schema_change_cb;
>  void *schema_change_aux;
> +
> +long long int last_connected;
>  };
>  
> +#define RELAY_MAX_RECONNECTION_MS 3
> +
> +/* Reports if the database is connected to the relay source and functional,
> + * i.e. it actively monitors the source and is able to forward transactions. 
> */
> +bool
> +ovsdb_relay_is_connected(struct ovsdb *db)
> +{
> +struct relay_ctx *ctx = shash_find_data(_dbs, db->name);
> +
> +if (!ctx || !ovsdb_cs_is_alive(ctx->cs)) {
> +return false;
> +}
> +
> +if (ovsdb_cs_may_send_transaction(ctx->cs)) {
> +return true;
> +}
> +
> +/* Trying to avoid connection state flapping by delaying report for
> + * upper layer and giving ovsdb-cs some time to reconnect. */
> +if (time_msec() - ctx->last_connected < RELAY_MAX_RECONNECTION_MS) {
> +return true;
> +}
> +
> +return false;
> +}
> +
>  static struct json *
>  ovsdb_relay_compose_monitor_request(const struct json *schema_json, void 
> *ctx_)
>  {
> @@ -119,6 +148,7 @@ ovsdb_relay_add_db(struct ovsdb *db, const char *remote,
>  ctx->schema_change_aux = schema_change_aux;
>  ctx->db = db;
> +ctx->cs = ovsdb_cs_create(db->name, 3, &relay_cs_ops, ctx);
> +ctx->last_connected = 0;
> +shash_add(&relay_dbs, db->name, ctx);
>  ovsdb_cs_set_leader_only(ctx->cs, false);
>  ovsdb_cs_set_remote(ctx->cs, remote, true);
> @@ -306,6 +336,10 @@ ovsdb_relay_run(void)
>  ovsdb_txn_forward_run(ctx->db, ctx->cs);
>  ovsdb_cs_run(ctx->cs, &events);
>  
> +if (ovsdb_cs_may_send_transaction(ctx->cs)) {
> +ctx->last_connected = time_msec();
> +}
> +
>  struct ovsdb_cs_event *event;
>  LIST_FOR_EACH_POP (event, list_node, &events) {
>  if (!ctx->db) {
> diff --git a/ovsdb/relay.h b/ovsdb/relay.h
> index 68586e9db..390ea70c8 100644
> 
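The grace-period debounce this patch adds, i.e. remember the last time the relay could send transactions and keep reporting "connected" for a while after that ability is lost, can be modeled standalone. Below is a minimal Python sketch with an injectable clock in place of time_msec(); the class name, method names, and the grace value are illustrative, not the patch's:

```python
class RelayStatus:
    """Debounced 'connected' status for a relay: after the relay loses
    the ability to forward transactions, keep reporting connected for a
    grace period so quick reconnects do not make the status flap."""

    def __init__(self, grace_ms):
        self.grace_ms = grace_ms
        self.last_connected = 0

    def run(self, now_ms, may_send_txn):
        # Mirrors the ovsdb_relay_run() hunk: record the last healthy time.
        if may_send_txn:
            self.last_connected = now_ms

    def is_connected(self, now_ms, alive, may_send_txn):
        if not alive:                      # connection is gone for good
            return False
        if may_send_txn:                   # fully functional
            return True
        # Recently healthy: still report connected while reconnecting.
        return now_ms - self.last_connected < self.grace_ms


status = RelayStatus(grace_ms=1000)
status.run(0, may_send_txn=True)
assert status.is_connected(0, alive=True, may_send_txn=True)
# Shortly after losing the source, status is still reported as connected.
assert status.is_connected(500, alive=True, may_send_txn=False)
# Once the grace period expires, the disconnect becomes visible.
assert not status.is_connected(5000, alive=True, may_send_txn=False)
```

The design choice is the same as in the patch: the upper layer polls a boolean, so a short reconnect window is hidden rather than surfaced as a connected/disconnected flap.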

Re: [ovs-dev] [PATCH v3 6/9] ovsdb: relay: Add support for transaction forwarding.

2021-07-15 Thread Mark Gray
On 14/07/2021 14:50, Ilya Maximets wrote:
> The current version of ovsdb relay allows scaling out read-only
> access to the primary database.  However, many clients are not
> read-only but read-mostly.  For example, ovn-controller.
> 
> In order to scale out database access for this case, ovsdb-server
> needs to process transactions that are not read-only.  Relay is not
> allowed to do that, i.e. not allowed to modify the database, but it
> can act like a proxy and forward transactions that include database
> modifications to the primary server and forward the replies back to the
> client.  At the same time it may serve read-only transactions and
> monitor requests by itself greatly reducing the load on primary
> server.
> 
> This configuration will slightly increase transaction latency, but
> it's not very important for read-mostly use cases.
> 
> Implementation details:
> With this change instead of creating a trigger to commit the
> transaction, ovsdb-server will create a trigger for transaction
> forwarding.  Later, ovsdb_relay_run() will send all new transactions
> to the relay source.  Once transaction reply received from the
> relay source, ovsdb-relay module will update the state of the
> transaction forwarding with the reply.  After that, trigger_run()
> will complete the trigger and jsonrpc_server_run() will send the
> reply back to the client.  Since the transaction reply from the relay
> source will be received after all the updates, the client will receive
> all the updates before receiving the transaction reply, as it does in
> a normal scenario with other database models.
> 
> Acked-by: Dumitru Ceara 
> Signed-off-by: Ilya Maximets 
> ---
>  ovsdb/automake.mk   |   2 +
>  ovsdb/execution.c   |  18 ++--
>  ovsdb/ovsdb.c   |   9 ++
>  ovsdb/ovsdb.h   |   8 +-
>  ovsdb/relay.c   |  12 ++-
>  ovsdb/transaction-forward.c | 182 
>  ovsdb/transaction-forward.h |  44 +
>  ovsdb/trigger.c |  49 --
>  ovsdb/trigger.h |  41 
>  tests/ovsdb-server.at   |  85 -
>  10 files changed, 411 insertions(+), 39 deletions(-)
>  create mode 100644 ovsdb/transaction-forward.c
>  create mode 100644 ovsdb/transaction-forward.h
> 
> diff --git a/ovsdb/automake.mk b/ovsdb/automake.mk
> index 05c8ebbdf..62cc02686 100644
> --- a/ovsdb/automake.mk
> +++ b/ovsdb/automake.mk
> @@ -48,6 +48,8 @@ ovsdb_libovsdb_la_SOURCES = \
>   ovsdb/trigger.h \
>   ovsdb/transaction.c \
>   ovsdb/transaction.h \
> + ovsdb/transaction-forward.c \
> + ovsdb/transaction-forward.h \
>   ovsdb/ovsdb-util.c \
>   ovsdb/ovsdb-util.h
>  ovsdb_libovsdb_la_CFLAGS = $(AM_CFLAGS)
> diff --git a/ovsdb/execution.c b/ovsdb/execution.c
> index dd2569055..f9b8067d0 100644
> --- a/ovsdb/execution.c
> +++ b/ovsdb/execution.c
> @@ -99,7 +99,8 @@ lookup_executor(const char *name, bool *read_only)
>  }
>  
>  /* On success, returns a transaction and stores the results to return to the
> - * client in '*resultsp'.
> + * client in '*resultsp'.  If 'forwarding_needed' is nonnull and transaction
> + * needs to be forwarded (in relay mode), sets '*forwarding_needed' to true.
>   *
>   * On failure, returns NULL.  If '*resultsp' is nonnull, then it is the 
> results
>   * to return to the client.  If '*resultsp' is null, then the execution 
> failed
> @@ -111,7 +112,8 @@ ovsdb_execute_compose(struct ovsdb *db, const struct 
> ovsdb_session *session,
>const struct json *params, bool read_only,
>const char *role, const char *id,
>long long int elapsed_msec, long long int 
> *timeout_msec,
> -  bool *durable, struct json **resultsp)
> +  bool *durable, bool *forwarding_needed,
> +  struct json **resultsp)
>  {
>  struct ovsdb_execution x;
>  struct ovsdb_error *error;
> @@ -120,6 +122,9 @@ ovsdb_execute_compose(struct ovsdb *db, const struct 
> ovsdb_session *session,
>  size_t i;
>  
>  *durable = false;
> +if (forwarding_needed) {
> +*forwarding_needed = false;
> +}
>  if (params->type != JSON_ARRAY
>  || !params->array.n
>  || params->array.elems[0]->type != JSON_STRING
> @@ -196,11 +201,8 @@ ovsdb_execute_compose(struct ovsdb *db, const struct 
> ovsdb_session *session,
>  "%s operation not allowed on "
>  "table in reserved database %s",
>  op_name, db->schema->name);
> -} else if (db->is_relay) {
> -error = ovsdb_error("not allowed",
> -"%s operation not allowed when "
> -"database server is in relay mode",
> -op_name);
> +} else if (db->is_relay && forwarding_needed) {
> 
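The behavioral change in ovsdb_execute_compose() shown above, where write operations on a relay are no longer rejected with "not allowed" but instead set a forwarding flag, can be sketched with a toy dispatcher. The operation grouping below is an illustrative subset of the OVSDB operations, not the server's actual executor table:

```python
READ_ONLY_OPS = {"select", "wait", "comment", "assert"}
WRITE_OPS = {"insert", "update", "mutate", "delete"}

def dispatch(ops, is_relay):
    """Decide whether a transaction can be executed locally or, on a
    relay, must be forwarded to the relay source."""
    forwarding_needed = False
    for op in ops:
        if op not in READ_ONLY_OPS and op not in WRITE_OPS:
            raise ValueError("unknown operation: " + op)
        if is_relay and op in WRITE_OPS:
            # Before the patch this was a hard "not allowed" error.
            forwarding_needed = True
    return "forward" if forwarding_needed else "execute"

assert dispatch(["select"], is_relay=True) == "execute"
assert dispatch(["select", "update"], is_relay=True) == "forward"
assert dispatch(["update"], is_relay=False) == "execute"
```

As in the patch, a single write operation anywhere in the transaction is enough to forward the whole transaction; read-only transactions are still served by the relay itself.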

Re: [ovs-dev] [PATCH v3 5/9] ovsdb: New ovsdb 'relay' service model.

2021-07-15 Thread Mark Gray
On 14/07/2021 14:50, Ilya Maximets wrote:
> New database service model 'relay' that is needed to scale out
> read-mostly database access, e.g. ovn-controller connections to
> OVN_Southbound.
> 
> In this service model ovsdb-server connects to existing OVSDB
> server and maintains in-memory copy of the database.  It serves
> read-only transactions and monitor requests by its own, but
> forwards write transactions to the relay source.
> 
> Key differences from the active-backup replication:
> - support for "write" transactions (next commit).
> - no on-disk storage. (probably, faster operation)
> - support for multiple remotes (connect to the clustered db).
> - doesn't try to keep the connection as long as possible, but
>   reconnects faster to other remotes to avoid missing updates.
> - No need to know the complete database schema beforehand,
>   only the schema name.
> - can be used along with other standalone and clustered databases
>   by the same ovsdb-server process. (doesn't turn the whole
>   jsonrpc server to read-only mode)
> - supports modern version of monitors (monitor_cond_since),
>   because it is based on ovsdb-cs.
> - could be chained, i.e. multiple relays could be connected
>   one to another in a row or in a tree-like form.
> - doesn't increase availability.
> - cannot be converted to other service models or become a main
>   active server.
> 
> Acked-by: Dumitru Ceara 
> Signed-off-by: Ilya Maximets 
> ---
>  ovsdb/_server.ovsschema |   7 +-
>  ovsdb/_server.xml   |  18 ++-
>  ovsdb/automake.mk   |   2 +
>  ovsdb/execution.c   |   5 +
>  ovsdb/ovsdb-server.c| 100 
>  ovsdb/ovsdb.c   |   2 +
>  ovsdb/ovsdb.h   |   3 +
>  ovsdb/relay.c   | 343 
>  ovsdb/relay.h   |  34 
>  9 files changed, 473 insertions(+), 41 deletions(-)
>  create mode 100644 ovsdb/relay.c
>  create mode 100644 ovsdb/relay.h
> 
> diff --git a/ovsdb/_server.ovsschema b/ovsdb/_server.ovsschema
> index a867e5cbf..e3d9d893b 100644
> --- a/ovsdb/_server.ovsschema
> +++ b/ovsdb/_server.ovsschema
> @@ -1,13 +1,14 @@
>  {"name": "_Server",
> - "version": "1.1.0",
> - "cksum": "3236486585 698",
> + "version": "1.2.0",
> + "cksum": "3009684573 744",
>   "tables": {
> "Database": {
>   "columns": {
> "name": {"type": "string"},
> "model": {
>   "type": {"key": {"type": "string",
> -  "enum": ["set", ["standalone", "clustered"]]}}},
> +  "enum": ["set",
> + ["standalone", "clustered", 
> "relay"]]}}},
> "connected": {"type": "boolean"},
> "leader": {"type": "boolean"},
> "schema": {
> diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml
> index 70cd22db7..37297da73 100644
> --- a/ovsdb/_server.xml
> +++ b/ovsdb/_server.xml
> @@ -60,12 +60,15 @@
>  
>  
>The storage model: standalone for a standalone or
> -  active-backup database, clustered for a clustered 
> database.
> +  active-backup database, clustered for a clustered 
> database,
> +  relay for a relay database.
>  
>  
>  
>The database schema, as a JSON string.  In the case of a clustered
> -  database, this is empty until it finishes joining its cluster.
> +  database, this is empty until it finishes joining its cluster.  In the
> +  case of a relay database, this is empty until it connects to the relay
> +  source.
>  
>  
>  
> @@ -85,20 +88,21 @@
>  
>
>  True if the database is the leader in its cluster.  For a standalone 
> or
> -active-backup database, this is always true.
> +active-backup database, this is always true.  For a relay database,
> +this is always false.
>
>  
>
>  The cluster ID for this database, which is the same for all of the
> -servers that host this particular clustered database.  For a 
> standalone
> -or active-backup database, this is empty.
> +servers that host this particular clustered database.  For a
> +standalone, active-backup or relay database, this is empty.
>
>  
>
>  The server ID for this database, different for each server that 
> hosts a
>  particular clustered database.  A server that hosts more than one
>  clustered database will have a different sid in each 
> one.
> -For a standalone or active-backup database, this is empty.
> +For a standalone, active-backup or relay database, this is empty.
>
>  
>
> @@ -112,7 +116,7 @@
>  
>  
>  
> -  For a standalone or active-backup database, this is empty.
> +  For a standalone, active-backup or relay database, this is empty.
>  
>
>  
> diff --git a/ovsdb/automake.mk b/ovsdb/automake.mk
> index 446d6c136..05c8ebbdf 100644
> --- a/ovsdb/automake.mk
> +++ b/ovsdb/automake.mk
> @@ 

Re: [ovs-dev] [PATCH v3 0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

2021-07-15 Thread Mark Gray
On 14/07/2021 14:50, Ilya Maximets wrote:
> Replication can be used to scale out read-only access to the database.
> But there are clients that are not read-only, but read-mostly.
> One of the main examples is ovn-controller that mostly monitors
> updates from the Southbound DB, but needs to claim ports by sending
> transactions that changes some database tables.
> 
> Southbound database serves lots of connections: all connections
> from ovn-controllers and some service connections from cloud
> infrastructure, e.g. some OpenStack agents are monitoring updates.
> At a high scale and with a big size of the database ovsdb-server
> spends too much time processing monitor updates and it's required
> to move this load somewhere else.  This patch-set aims to introduce
> required functionality to scale out read-mostly connections by
> introducing a new OVSDB 'relay' service model .
> 
> In this new service model ovsdb-server connects to existing OVSDB
> server and maintains in-memory copy of the database.  It serves
> read-only transactions and monitor requests by its own, but forwards
> write transactions to the relay source.
> 
> Key differences from the active-backup replication:
> - support for "write" transactions.
> - no on-disk storage. (probably, faster operation)
> - support for multiple remotes (connect to the clustered db).
> - doesn't try to keep the connection as long as possible, but
>   reconnects faster to other remotes to avoid missing updates.
> - No need to know the complete database schema beforehand,
>   only the schema name.
> - can be used along with other standalone and clustered databases
>   by the same ovsdb-server process. (doesn't turn the whole
>   jsonrpc server to read-only mode)
> - supports modern version of monitors (monitor_cond_since),
>   because it is based on ovsdb-cs.
> - could be chained, i.e. multiple relays could be connected
>   one to another in a row or in a tree-like form.
> 
> Bringing all above functionality to the existing active-backup
> replication doesn't look right as it will make it less reliable
> for the actual backup use case, and this also would be much
> harder from the implementation point of view, because current
> replication code is not based on ovsdb-cs or idl and all the required
> features would be likely duplicated or replication would be fully
> re-written on top of ovsdb-cs with severe modifications of the former.
> 
> Relay is somewhere in the middle between active-backup replication and
> the clustered model taking a lot from both, therefore is hard to
> implement on top of any of them.
> 
> To run ovsdb-server in relay mode, the user simply needs to run:
> 
>   ovsdb-server --remote=punix:db.sock relay::
> 
> e.g.
> 
>   ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
> 
> More details and examples in the documentation in the last patch
> of the series.
> 
> I actually tried to implement transaction forwarding on top of
> active-backup replication in v1 of this seies, but it required
> a lot of tricky changes, including schema format changes in order
> to bring required information to the end clients, so I decided
> to fully rewrite the functionality in v2 with a different approach.
> 
> 
>  Testing
>  ===
> 
> Some scale tests were performed with OVSDB Relays that mimic OVN
> workloads with ovn-kubernetes.
> Tests performed with ovn-heater (https://github.com/dceara/ovn-heater)
> on scenario ocp-120-density-heavy:
>  
> https://github.com/dceara/ovn-heater/blob/master/test-scenarios/ocp-120-density-heavy.yml
> In short, the test gradually creates a lot of OVN resources and
> checks that the network is configured correctly (by pinging different
> namespaces).  The test includes 120 chassis (created by
> ovn-fake-multinode), 31250 LSPs spread evenly across 120 LSes, 3 LBs
> with 15625 VIPs each, attached to all node LSes, etc.  Test performed
> with monitor-all=true.
> 
> Note 1:
>  - Memory consumption is checked at the end of a test in a following
>way: 1) check RSS 2) compact database 3) check RSS again.
>It's observed that ovn-controllers in this test are fairly slow
>and backlog builds up on monitors, because ovn-controllers are
>not able to receive updates fast enough.  This contributes to
>RSS of the process, especially in combination of glibc bug (glibc
>doesn't free fastbins back to the system).  Memory trimming on
>compaction is enabled in the test, so after compaction we can
>see more or less real value of the RSS at the end of the test
>without backlog noise. (Compaction on relay in this case is
>just plain malloc_trim()).
> 
> Note 2:
>  - I didn't collect memory consumption (RSS) after compaction for a
>test with 10 relays, because I got the idea only after the test
>was finished and another one already started.  And run takes
>significant amount of time.  So, values marked with a star (*)
>are an approximation based on results form other tests, hence
>

Re: [ovs-dev] [PATCH v14 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread Stokes, Ian
> From: Kumar Amber 
> 
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
>   8: OVS-DPDK - MFEX Configuration
> 
> Added a new directory to store the PCAP file used
> in the tests and a script to generate the fuzzy traffic
> type pcap to be used in the fuzzy unit test.
> 
> Signed-off-by: Kumar Amber 
> Acked-by: Flavio Leitner 

Hi Amber,

I see Flavio's ack here is from a previous revision; we should wait for
Flavio to ack the changes that fix the coremask issue before including
the patch.

Also I think it makes sense to add Eelco as co-author as he has provided extra 
negative test use cases.

These are minor so can be done on merge if there are no objections and the 
patch is acked by Flavio/Eelco.

Regards
Ian
> 
> ---
> v14:
- include more negative tests in configuration
> - added core mask for the test
> v13:
> - fix -v in the command
> - added the configuration test case and supporting doc update
> v12:
- change skip parameter for unit test
> v11:
> - fix comments from Eelco
> v7:
> - fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - remove sleep from first test and added minor 5 sec sleep to fuzzy
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  56 ++
>  tests/.gitignore |   1 +
>  tests/automake.mk|   6 +
>  tests/mfex_fuzzy.py  |  33 ++
>  tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
>  tests/system-dpdk.at | 160 +++
>  6 files changed, 256 insertions(+)
>  create mode 100755 tests/mfex_fuzzy.py
>  create mode 100644 tests/pcap/mfex_test.pcap
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst
> b/Documentation/topics/dpdk/bridge.rst
> index 8c500c504..913b3e6f6 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -346,3 +346,59 @@ A compile time option is available in order to test it
> with the OVS unit
>  test suite. Use the following configure option ::
> 
>  $ ./configure --enable-mfex-default-autovalidator
> +
> +Unit Test Miniflow Extract
> +++
> +
> +Unit test can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator
> +
> +The unit test uses multiple traffic types to test the correctness of the
> +implementations.
> +
> +The MFEX commands can also be tested for negative and positive cases to
> +verify that the MFEX set command does not allow for incorrect parameters.
> +A user can directly run the following configuration test case in
> +tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Configuration
> +
> +Running Fuzzy test with Autovalidator
> ++
> +
> +Fuzzy tests can also be done on miniflow extract with the help of
> +auto-validator and Scapy. The steps below describe how to
> +reproduce the setup with IP being fuzzed to generate packets.
> +
> +Scapy is used to create fuzzy IP packets and save them into a PCAP ::
> +
> +pkt = fuzz(Ether()/IP()/TCP())
> +
> +Set the miniflow extract to autovalidator using ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> +
> +OVS is configured to receive the generated packets ::
> +
> +$ ovs-vsctl add-port br0 pcap0 -- \
> +set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
> +"rx_pcap=fuzzy.pcap"
> +
> +With this workflow, the autovalidator will ensure that all MFEX
> +implementations are classifying each packet in exactly the same way.
> +If an optimized MFEX implementation causes a different miniflow to be
> +generated, the autovalidator has ovs_assert and logging statements that
> +will inform about the issue.
> +
> +Unit Fuzzy test with Autovalidator
> ++
> +
> +Unit test can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator Fuzzy
> diff --git a/tests/.gitignore b/tests/.gitignore
> index 45b4f67b2..a3d927e5d 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -11,6 +11,7 @@
>  /ovsdb-cluster-testsuite
>  /ovsdb-cluster-testsuite.dir/
>  /ovsdb-cluster-testsuite.log
> +/pcap/
>  /pki/
>  /system-afxdp-testsuite
>  /system-afxdp-testsuite.dir/
> diff --git a/tests/automake.mk b/tests/automake.mk
> index f45f8d76c..a6c15ba55 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at:
> tests/automake.mk
>   echo "TEST_FUZZ_REGRESSION([$$basename])"; \
>   done > $@.tmp && mv $@.tmp $@
> 
> +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
> +MFEX_AUTOVALIDATOR_TESTS = \
> + 
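The autovalidator workflow these tests exercise can be modeled in a few lines: run every available miniflow-extract implementation on the same packet and require that all of them produce an identical miniflow. A toy Python model follows; the field selection and offsets are hypothetical stand-ins, not the real miniflow layout:

```python
def scalar_extract(pkt: bytes):
    """Reference parser: build a tiny 'miniflow' dict for an
    Ether()/IPv4() frame (hypothetical fields, not the real layout)."""
    if len(pkt) < 34 or pkt[12:14] != b"\x08\x00":
        return None                      # not plain IPv4: no match
    return {"eth_type": 0x0800,
            "ip_src": pkt[26:30], "ip_dst": pkt[30:34]}

def optimized_extract(pkt: bytes):
    # Stand-in for an ISA-optimized implementation under test.
    return scalar_extract(pkt)

def autovalidate(pkt: bytes):
    """Run every implementation on the same packet and require that all
    produced miniflows are identical, as the autovalidator does."""
    results = [impl(pkt) for impl in (scalar_extract, optimized_extract)]
    assert all(r == results[0] for r in results), \
        "MFEX implementations disagree"
    return results[0]

ipv4 = (bytes(12) + b"\x08\x00" + bytes(12)
        + b"\x0a\x00\x00\x01" + b"\x0a\x00\x00\x02")
assert autovalidate(ipv4)["ip_src"] == b"\x0a\x00\x00\x01"
assert autovalidate(bytes(34)) is None   # zeroed frame: no profile match
```

Feeding such a comparator a stream of fuzzed PCAP packets is exactly what the "MFEX Autovalidator Fuzzy" test does at full scale.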

Re: [ovs-dev] [PATCH v14 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-15 Thread Stokes, Ian
> From: Kumar Amber 
> 
> This commit introduces an additional command line parameter
> for the mfex study function. If the user provides an additional packet
> count, it is used in study to compare the minimum number of packets that
> must be processed; otherwise a default value is chosen.
> It also introduces a third parameter for choosing a particular pmd core.
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
> 
> Signed-off-by: Kumar Amber 
> 

Hi Amber, one minor comment below



> +/* If name is study and more args exist, parse study_count value. */
> +} else if (mfex_name && mfex_name_is_study) {
> +if (!str_to_uint(argv[1], 10, _count) ||
> +(study_count == 0)) {
> +ds_put_format(&reply,
Minor: lower-case "Invalid" below; this can be fixed upon commit.

Thanks
Ian
> +"Error: Invalid study_pkt_cnt value: %s.\n", argv[1]);
> +goto error;
> +}
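The argument handling under discussion, an optional packet count that must be a non-zero unsigned integer plus an optional pmd core id, can be sketched as follows. The default value and the return convention are assumptions for illustration, not the patch's actual definitions:

```python
MFEX_DEFAULT_STUDY_CNT = 128     # assumed default, not taken from the patch

def parse_study_args(args):
    """Parse the optional [study_pkt_cnt [core_id]] arguments of
    'dpif-netdev/miniflow-parser-set study'.  A zero or non-numeric
    packet count is rejected, matching the patch's str_to_uint() check."""
    study_cnt = MFEX_DEFAULT_STUDY_CNT
    core_id = None                        # None: apply to every pmd thread
    if args:
        if not args[0].isdigit() or int(args[0]) == 0:
            raise ValueError("invalid study_pkt_cnt value: %s" % args[0])
        study_cnt = int(args[0])
    if len(args) > 1:
        if not args[1].isdigit():
            raise ValueError("invalid pmd core id: %s" % args[1])
        core_id = int(args[1])
    return study_cnt, core_id

# Mirrors "ovs-appctl dpif-netdev/miniflow-parser-set study 500 3".
assert parse_study_args(["500", "3"]) == (500, 3)
assert parse_study_args([]) == (MFEX_DEFAULT_STUDY_CNT, None)
```

Rejecting zero early is the point of the review comment: a study count of zero would let study conclude immediately without ever measuring a parser.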


[ovs-dev] [PATCH v14 11/11] dpif-netdev/mfex: Add more AVX512 traffic profiles

2021-07-15 Thread kumar Amber
From: Harry van Haaren 

This commit adds 3 new traffic profile implementations to the
existing avx512 miniflow extract infrastructure. The profiles added are:
- Ether()/IP()/TCP()
- Ether()/Dot1Q()/IP()/UDP()
- Ether()/Dot1Q()/IP()/TCP()

The design of the avx512 code here is for scalability, allowing more
traffic profiles as well as new CPU ISAs to be added. Note that an implementation
is primarily adding static const data, which the compiler then specializes
away when the profile specific function is declared below.

As a result, the code is relatively maintainable, and scalable for new
traffic profiles as well as new ISA, and does not lower performance
compared with manually written code for each profile/ISA.

Note that confidence in the correctness of each implementation is
achieved through autovalidation, unit tests with known packets, and
fuzz tested packets.

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---

Hi Readers,

If you have a traffic profile you'd like to see accelerated using
avx512 code, please send me an email and we can collaborate on adding
support for it!

Regards, -Harry

---

v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 NEWS  |   2 +
 lib/dpif-netdev-extract-avx512.c  | 152 ++
 lib/dpif-netdev-private-extract.c |  30 ++
 lib/dpif-netdev-private-extract.h |  10 ++
 4 files changed, 194 insertions(+)

diff --git a/NEWS b/NEWS
index 99baca706..ea55805e8 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,8 @@ Post-v2.15.0
  * Add build time configure command to enable auto-validatior as default
miniflow implementation at build time.
  * Cache results for CPU ISA checks, reduces overhead on repeated lookups.
+ * Add AVX512 based optimized miniflow extract function for traffic type
+   IPv4/UDP, IPv4/TCP, Vlan/IPv4/UDP and Vlan/Ipv4/TCP.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
index ac253fa1e..ec64419e3 100644
--- a/lib/dpif-netdev-extract-avx512.c
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -136,6 +136,13 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
 
 #define PATTERN_ETHERTYPE_MASK PATTERN_ETHERTYPE_GEN(0xFF, 0xFF)
 #define PATTERN_ETHERTYPE_IPV4 PATTERN_ETHERTYPE_GEN(0x08, 0x00)
+#define PATTERN_ETHERTYPE_DT1Q PATTERN_ETHERTYPE_GEN(0x81, 0x00)
+
+/* VLAN (Dot1Q) patterns and masks. */
+#define PATTERN_DT1Q_MASK   \
+  0x00, 0x00, 0xFF, 0xFF,
+#define PATTERN_DT1Q_IPV4   \
+  0x00, 0x00, 0x08, 0x00,
 
 /* Generator for checking IPv4 ver, ihl, and proto */
 #define PATTERN_IPV4_GEN(VER_IHL, FLAG_OFF_B0, FLAG_OFF_B1, PROTO) \
@@ -161,6 +168,29 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
   34, 35, 36, 37, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* UDP */   \
   NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
+/* TCP shuffle: tcp_ctl bits require mask/processing, not included here. */
+#define PATTERN_IPV4_TCP_SHUFFLE \
+   0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, NU, NU, /* Ether */ \
+  26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 34, 35, 36, 37, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
+
+#define PATTERN_DT1Q_IPV4_UDP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  38, 39, 40, 41, NU, NU, NU, NU, /* UDP */
+
+#define PATTERN_DT1Q_IPV4_TCP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 38, 39, 40, 41, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
 /* Generation of K-mask bitmask values, to zero out data in result. Note that
  * these correspond 1:1 to the above "*_SHUFFLE" values, and bit used must be
@@ -170,12 +200,22 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
  * 

[ovs-dev] [PATCH v14 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-15 Thread kumar Amber
From: Harry van Haaren 

This commit adds AVX512 implementations of miniflow extract.
By using the 64 bytes available in an AVX512 register, it is
possible to convert a packet to a miniflow data-structure in
a small number of instructions.

The implementation here probes for Ether()/IP()/UDP() traffic,
and builds the appropriate miniflow data-structure for packets
that match the probe.

The implementation here is auto-validated by the miniflow
extract autovalidator, hence its correctness can be easily
tested and verified.

Note that this commit is designed to easily allow addition of new
traffic profiles in a scalable way, without code duplication for
each traffic profile.

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 

---
v12:
- fix one static code warning
v9:
- include comments from flavio
v8:
- include documentation on AVX512 MFEX as per Eelco's suggestion
v7:
- fix minor review sentences (Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- include assert for flow ABI change
- include assert for offset changes
---
---
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-avx512.c  | 478 ++
 lib/dpif-netdev-private-extract.c |  13 +
 lib/dpif-netdev-private-extract.h |  30 ++
 4 files changed, 522 insertions(+)
 create mode 100644 lib/dpif-netdev-extract-avx512.c

diff --git a/lib/automake.mk b/lib/automake.mk
index f4f36325e..299f81939 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
$(AM_CFLAGS)
 lib_libopenvswitchavx512_la_SOURCES = \
lib/dpif-netdev-lookup-avx512-gather.c \
+   lib/dpif-netdev-extract-avx512.c \
lib/dpif-netdev-avx512.c
 lib_libopenvswitchavx512_la_LDFLAGS = \
-static
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
new file mode 100644
index 0..ac253fa1e
--- /dev/null
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -0,0 +1,478 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * AVX512 Miniflow Extract.
+ *
+ * This file contains optimized implementations of miniflow_extract()
+ * for specific common traffic patterns. The optimizations allow for
+ * quick probing of a specific packet type, and if a match with a specific
+ * type is found, a shuffle like procedure builds up the required miniflow.
+ *
+ * Process
+ * -
+ *
+ * The procedure is to classify the packet based on the traffic type
+ * using predefined bit-masks and arrange the packet header data using shuffle
+ * instructions to a pre-defined place as required by the miniflow.
+ * This eliminates the if-else ladder to identify the packet data and add data
+ * as per protocol which is present.
+ */
+
+#ifdef __x86_64__
+/* Sparse cannot handle the AVX512 instructions. */
+#if !defined(__CHECKER__)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "flow.h"
+#include "dpdk.h"
+
+#include "dpif-netdev-private-dpcls.h"
+#include "dpif-netdev-private-extract.h"
+#include "dpif-netdev-private-flow.h"
+
+/* AVX512-BW level permutex2var_epi8 emulation. */
+static inline __m512i
+__attribute__((target("avx512bw")))
+_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
+   __m512i v_data_0,
+   __m512i v_shuf_idxs,
+   __m512i v_data_1)
+{
+/* Manipulate shuffle indexes for u16 size. */
+__mmask64 k_mask_odd_lanes = 0x;
+/* Clear away ODD lane bytes. Cannot be done above due to no u8 shift. */
+__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
+v_shuf_idxs,
+_mm512_setzero_si512());
+v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
+
+__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
+
+/* Shuffle each half at 16-bit width. */
+__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn,
+v_data_1);
+__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd,
+v_data_1);
+
+/* Find if the shuffle index was odd, via mask and compare. */
+uint16_t index_odd_mask = 0x1;
+const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask);
+
+/* EVEN lanes, find if 
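The trick this emulation relies on, since there is no u8-granularity permute below AVX512-VBMI, is to shuffle twice at u16 granularity (once for even, once for odd byte indexes) and then select the correct byte of each 16-bit word. The same decomposition can be shown with a scalar model; no SIMD, purely illustrative:

```python
def permute_bytes_via_words(data: bytes, idxs):
    """Emulate a byte-wise permute using only 16-bit word accesses:
    idx >> 1 selects the word, and idx & 1 selects the odd (high) or
    even (low) byte of that little-endian word."""
    assert len(data) % 2 == 0
    words = [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]
    out = bytearray()
    for idx in idxs:
        w = words[idx >> 1]            # 16-bit granularity lookup
        # Odd index -> high byte of the word, even index -> low byte.
        out.append((w >> 8) & 0xFF if idx & 1 else w & 0xFF)
    return bytes(out)

# Reversing four bytes exercises both the even and odd index paths.
assert permute_bytes_via_words(b"\x00\x01\x02\x03",
                               [3, 2, 1, 0]) == b"\x03\x02\x01\x00"
```

In the AVX512-BW code the two word-granularity shuffles run in parallel over whole registers and a mask compare on the low index bit merges the results; the scalar loop above does the same selection one byte at a time.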

[ovs-dev] [PATCH v14 08/11] dpif/stats: Add miniflow extract opt hits counter

2021-07-15 Thread kumar Amber
From: Harry van Haaren 

This commit adds a new counter to be displayed to the user when
requesting datapath packet statistics. It counts the number of
packets that are parsed and have a miniflow built up from them by the
optimized miniflow extract parsers.

The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
extra entry indicating if the optimized MFEX was hit:

  - MFEX Opt hits:6786432  (100.0 %)

Signed-off-by: Harry van Haaren 
Acked-by: Flavio Leitner 
Acked-by: Eelco Chaudron 
---
v11:
- fix review comments from Eelco
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 lib/dpif-netdev-avx512.c|  3 +++
 lib/dpif-netdev-perf.c  |  3 +++
 lib/dpif-netdev-perf.h  |  1 +
 lib/dpif-netdev-unixctl.man |  4 
 lib/dpif-netdev.c   | 12 +++-
 tests/pmd.at|  6 --
 6 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 7772b7abf..544d36903 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -310,8 +310,11 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 }
 
 /* At this point we don't return error anymore, so commit stats here. */
+uint32_t mfex_hit_cnt = __builtin_popcountll(mf_mask);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_RECV, batch_size);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_PHWOL_HIT, phwol_hits);
+pmd_perf_update_counter(>perf_stats, PMD_STAT_MFEX_OPT_HIT,
+mfex_hit_cnt);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_EXACT_HIT, emc_hits);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_SMC_HIT, smc_hits);
 pmd_perf_update_counter(>perf_stats, PMD_STAT_MASKED_HIT,
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 7103a2d4d..d7676ea2b 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -247,6 +247,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 "  Rx packets:%12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
 "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
 "  - PHWOL hits:  %12"PRIu64"  (%5.1f %%)\n"
+"  - MFEX Opt hits:   %12"PRIu64"  (%5.1f %%)\n"
 "  - EMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - SMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
@@ -258,6 +259,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 passes, rx_packets ? 1.0 * passes / rx_packets : 0,
 stats[PMD_STAT_PHWOL_HIT],
 100.0 * stats[PMD_STAT_PHWOL_HIT] / passes,
+stats[PMD_STAT_MFEX_OPT_HIT],
+100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes,
 stats[PMD_STAT_EXACT_HIT],
 100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
 stats[PMD_STAT_SMC_HIT],
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 8b1a52387..834c26260 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -57,6 +57,7 @@ extern "C" {
 
 enum pmd_stat_type {
 PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */
+PMD_STAT_MFEX_OPT_HIT,  /* Packets that had miniflow optimized match. */
 PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
 PMD_STAT_SMC_HIT,   /* Packets that had a sig match hit (SMC). */
 PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. */
diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index 83ce4f1c5..80304ad35 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -16,6 +16,9 @@ packet lookups performed by the datapath. Beware that a 
recirculated packet
 experiences one additional lookup per recirculation, so there may be
 more lookups than forwarded packets in the datapath.
 
+The MFEX Opt hits counter displays the number of packets that are processed
+by the optimized miniflow extract implementations.
+
 Cycles are counted using the TSC or similar facilities (when available on
 the platform). The duration of one cycle depends on the processing platform.
 
@@ -136,6 +139,7 @@ pmd thread numa_id 0 core_id 1:
   Rx packets: 2399607  (2381 Kpps, 848 cycles/pkt)
   Datapath passes:3599415  (1.50 passes/pkt)
   - PHWOL hits: 0  (  0.0 %)
+  - MFEX Opt hits:3570133  ( 99.2 %)
   - EMC hits:  336472  (  9.3 %)
   - SMC hits:   0  ( 0.0 %)
   - Megaflow hits:3262943  ( 90.7 %, 1.00 subtbl lookups/hit)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 585b9500c..e54f9e617 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -648,6 +648,7 @@ pmd_info_show_stats(struct ds *reply,
   "  packet recirculations: %"PRIu64"\n"
   "  avg. datapath passes per packet: %.02f\n"
   "  

[ovs-dev] [PATCH v14 09/11] dpdk: Add additional CPU ISA detection strings

2021-07-15 Thread kumar Amber
From: Harry van Haaren 

This commit enables OVS to check at runtime for more detailed
AVX512 capabilities, specifically the Byte and Word (BW) extensions
and the Vector Bit Manipulation Instructions (VBMI).

These instructions will be used in the CPU ISA optimized
implementations of traffic profile aware miniflow extract.

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---
 NEWS   | 1 +
 lib/dpdk.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/NEWS b/NEWS
index 225eb445c..99baca706 100644
--- a/NEWS
+++ b/NEWS
@@ -40,6 +40,7 @@ Post-v2.15.0
traffic.
  * Add build time configure command to enable auto-validatior as default
miniflow implementation at build time.
+ * Cache results for CPU ISA checks, reducing overhead on repeated lookups.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 9de2af58e..1b8f8e55b 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -706,6 +706,8 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature)
 #if __x86_64__
 /* CPU flags only defined for the architecture that support it. */
 CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F);
+CHECK_CPU_FEATURE(feature, "avx512bw", RTE_CPUFLAG_AVX512BW);
+CHECK_CPU_FEATURE(feature, "avx512vbmi", RTE_CPUFLAG_AVX512VBMI);
 CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ);
 CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2);
 #endif
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v14 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread kumar Amber
From: Kumar Amber 

Tests:
  6: OVS-DPDK - MFEX Autovalidator
  7: OVS-DPDK - MFEX Autovalidator Fuzzy
  8: OVS-DPDK - MFEX Configuration

Added a new directory to store the PCAP file used
in the tests, and a script to generate the fuzzy-traffic
pcap used by the fuzzy unit test.

Signed-off-by: Kumar Amber 
Acked-by: Flavio Leitner 

---
v14:
- include more negative tests in configuration
- added core mask for the test
v13:
- fix -v in the command
- added the configuration test case and supporting doc update
v12:
- change skip parameter for unit test
v11:
- fix comments from Eelco
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove sleep from first test and added minor 5 sec sleep to fuzzy
---
---
 Documentation/topics/dpdk/bridge.rst |  56 ++
 tests/.gitignore |   1 +
 tests/automake.mk|   6 +
 tests/mfex_fuzzy.py  |  33 ++
 tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
 tests/system-dpdk.at | 160 +++
 6 files changed, 256 insertions(+)
 create mode 100755 tests/mfex_fuzzy.py
 create mode 100644 tests/pcap/mfex_test.pcap

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 8c500c504..913b3e6f6 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -346,3 +346,59 @@ A compile time option is available in order to test it 
with the OVS unit
 test suite. Use the following configure option ::
 
 $ ./configure --enable-mfex-default-autovalidator
+
+Unit Test Miniflow Extract
+++
+
+A unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator
+
+The unit test uses multiple traffic types to test the correctness of the
+implementations.
+
+The MFEX commands can also be tested for negative and positive cases to
+verify that the MFEX set command does not allow for incorrect parameters.
+A user can directly run the following configuration test case in
+tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Configuration
+
+Running Fuzzy test with Autovalidator
++
+
+Fuzzy tests can also be done on miniflow extract with the help of the
+auto-validator and Scapy. The steps below describe how to reproduce the
+setup, with the IP fields being fuzzed to generate packets.
+
+Scapy is used to create fuzzy IP packets and save them into a PCAP ::
+
+pkt = fuzz(Ether()/IP()/TCP())
+
+Set the miniflow extract to autovalidator using ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+OVS is configured to receive the generated packets ::
+
+$ ovs-vsctl add-port br0 pcap0 -- \
+set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
+"rx_pcap=fuzzy.pcap"
+
+With this workflow, the autovalidator will ensure that all MFEX
+implementations are classifying each packet in exactly the same way.
+If an optimized MFEX implementation causes a different miniflow to be
+generated, the autovalidator has ovs_assert and logging statements that
+will inform about the issue.
+
+Unit Fuzzy test with Autovalidator
++
+
+A unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator Fuzzy
diff --git a/tests/.gitignore b/tests/.gitignore
index 45b4f67b2..a3d927e5d 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -11,6 +11,7 @@
 /ovsdb-cluster-testsuite
 /ovsdb-cluster-testsuite.dir/
 /ovsdb-cluster-testsuite.log
+/pcap/
 /pki/
 /system-afxdp-testsuite
 /system-afxdp-testsuite.dir/
diff --git a/tests/automake.mk b/tests/automake.mk
index f45f8d76c..a6c15ba55 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
echo "TEST_FUZZ_REGRESSION([$$basename])"; \
done > $@.tmp && mv $@.tmp $@
 
+EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
+MFEX_AUTOVALIDATOR_TESTS = \
+   tests/pcap/mfex_test.pcap \
+   tests/mfex_fuzzy.py
+
 OVSDB_CLUSTER_TESTSUITE_AT = \
tests/ovsdb-cluster-testsuite.at \
tests/ovsdb-execution.at \
@@ -512,6 +517,7 @@ tests_test_type_props_SOURCES = tests/test-type-props.c
 CHECK_PYFILES = \
tests/appctl.py \
tests/flowgen.py \
+   tests/mfex_fuzzy.py \
tests/ovsdb-monitor-sort.py \
tests/test-daemon.py \
tests/test-json.py \
diff --git a/tests/mfex_fuzzy.py b/tests/mfex_fuzzy.py
new file mode 100755
index 0..5b056bb48
--- /dev/null
+++ b/tests/mfex_fuzzy.py
@@ -0,0 +1,33 @@
+#!/usr/bin/python3
+try:
+from scapy.all import RandMAC, RandIP, PcapWriter, 

[ovs-dev] [PATCH v14 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-15 Thread kumar Amber
From: Kumar Amber 

This commit introduces an additional command line parameter
for the mfex study function. If the user provides an additional packet
count, it is used in study as the minimum number of packets which must
be processed before choosing an implementation; otherwise a default
value is chosen.
Also introduces a third parameter for choosing a particular pmd core.

$ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3

Signed-off-by: Kumar Amber 

---
v14:
- fix logging format and xmas ordering
v13:
- reworked the set command as per discussion
- fixed the atomic set in study
- added bool for handling study mfex to simplify logic and command output
- fixed double space in variable declaration and removed static
v12:
- re-work the set command to sweep
- include fixes to study.c and doc changes
v11:
- include comments from Eelco
- reworked set command as per discussion
v10:
- fix review comments Eelco
v9:
- fix review comments Flavio
v7:
- change the command paramters for core_id and study_pkt_cnt
v5:
- fix review comments(Ian, Flavio, Eelco)
- introduce pmd core id parameter
---
---
 Documentation/topics/dpdk/bridge.rst |  38 +-
 lib/dpif-netdev-extract-study.c  |  26 +++-
 lib/dpif-netdev-private-extract.h|   9 ++
 lib/dpif-netdev.c| 175 ++-
 4 files changed, 216 insertions(+), 32 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index a47153495..8c500c504 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -284,12 +284,46 @@ command also shows whether the CPU supports each 
implementation ::
 
 An implementation can be selected manually by the following command ::
 
-$ ovs-appctl dpif-netdev/miniflow-parser-set study
+$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
+ [study_cnt]
+
+The above command has two optional parameters: study_cnt and core_id.
+The core_id pins a particular miniflow extract function to a specific
+pmd thread on that core. The third parameter, study_cnt, which is
+specific to study and ignored by other implementations, specifies how
+many packets are needed to choose the best implementation.
 
 Also user can select the study implementation which studies the traffic for
 a specific number of packets by applying all available implementations of
 miniflow extract and then chooses the one with the most optimal result for
-that traffic pattern.
+that traffic pattern. The user can optionally provide a packet count
+[study_cnt] parameter, which is the minimum number of packets that OVS must
+study before choosing an optimal implementation. If no packet count is
+provided, then the default value, 128, is chosen. Also, as there is no
+synchronization point between threads, one PMD thread might still be running
+a previous round, and can now decide on earlier data.
+
+The packet count is a global value, and parallel study executions with
+differing packet counts will use the most recent count value provided by user.
+
+Study can be selected with packet count by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
+
+Study can be selected with packet count and explicit PMD selection
+by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
+
+In the above command the first parameter is the CORE ID of the PMD
+thread and this can also be used to explicitly set the miniflow
+extraction function pointer on different PMD threads.
+
+Scalar can be selected on core 3 by the following command where
+study count should not be provided for any implementation other
+than study ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
 
 Miniflow Extract Validation
 ~~~
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
index 02b709f8b..4340c9eee 100644
--- a/lib/dpif-netdev-extract-study.c
+++ b/lib/dpif-netdev-extract-study.c
@@ -25,7 +25,7 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
 
-static atomic_uint32_t mfex_study_pkts_count = 0;
+static atomic_uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
 
 /* Struct to hold miniflow study stats. */
 struct study_stats {
@@ -48,6 +48,25 @@ mfex_study_get_study_stats_ptr(void)
 return stats;
 }
 
+int
+mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
+{
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* If the packet count is set and implementation called is study then
+ * set packet counter to requested number else return -EINVAL.
+ */
+if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) &&
+(pkt_cmp_count != 0)) {
+
+atomic_store_relaxed(_study_pkts_count, pkt_cmp_count);
+return 0;
+}
+
+return -EINVAL;
+}
+
 uint32_t
 mfex_study_traffic(struct dp_packet_batch *packets,
struct 

[ovs-dev] [PATCH v14 03/11] dpif-netdev: Add study function to select the best mfex function

2021-07-15 Thread kumar Amber
From: Kumar Amber 

The study function runs all the available implementations
of miniflow_extract, picks the implementation whose hitmask has
the most hits, and sets the mfex function pointer to that one.

Study can be run at runtime using the following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set study

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---
v11:
- fix minor comments from Eelco
v10:
- fix minor comments from Eelco
v9:
- fix comments Flavio
v8:
- fix review comments Flavio
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- add Atomic set in study
---
---
 NEWS  |   3 +
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-study.c   | 136 ++
 lib/dpif-netdev-private-extract.c |  12 +++
 lib/dpif-netdev-private-extract.h |  19 +
 5 files changed, 171 insertions(+)
 create mode 100644 lib/dpif-netdev-extract-study.c

diff --git a/NEWS b/NEWS
index cf254bcfe..4a7b89409 100644
--- a/NEWS
+++ b/NEWS
@@ -35,6 +35,9 @@ Post-v2.15.0
  * Add command line option to switch between MFEX function pointers.
  * Add miniflow extract auto-validator function to compare different
miniflow extract implementations against default implementation.
+ * Add study function to miniflow function table which studies packets
+   and automatically chooses the best miniflow implementation for that
+   traffic.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index 53b8abc0f..f4f36325e 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -107,6 +107,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/dp-packet.h \
lib/dp-packet.c \
lib/dpdk.h \
+   lib/dpif-netdev-extract-study.c \
lib/dpif-netdev-lookup.h \
lib/dpif-netdev-lookup.c \
lib/dpif-netdev-lookup-autovalidator.c \
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
new file mode 100644
index 0..02b709f8b
--- /dev/null
+++ b/lib/dpif-netdev-extract-study.c
@@ -0,0 +1,136 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "dpif-netdev-private-thread.h"
+#include "openvswitch/vlog.h"
+#include "ovs-thread.h"
+
+VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
+
+static atomic_uint32_t mfex_study_pkts_count = 0;
+
+/* Struct to hold miniflow study stats. */
+struct study_stats {
+uint32_t pkt_count;
+uint32_t impl_hitcount[MFEX_IMPL_MAX];
+};
+
+/* Define per thread data to hold the study stats. */
+DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
+
+/* Allocate per thread PMD pointer space for study_stats. */
+static inline struct study_stats *
+mfex_study_get_study_stats_ptr(void)
+{
+struct study_stats *stats = study_stats_get();
+if (OVS_UNLIKELY(!stats)) {
+   stats = xzalloc(sizeof *stats);
+   study_stats_set_unsafe(stats);
+}
+return stats;
+}
+
+uint32_t
+mfex_study_traffic(struct dp_packet_batch *packets,
+   struct netdev_flow_key *keys,
+   uint32_t keys_size, odp_port_t in_port,
+   struct dp_netdev_pmd_thread *pmd_handle)
+{
+uint32_t hitmask = 0;
+uint32_t mask = 0;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+struct study_stats *stats = mfex_study_get_study_stats_ptr();
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* Run traffic optimized miniflow_extract to collect the hitmask
+ * to be compared after certain packets have been hit to choose
+ * the best miniflow_extract version for that traffic.
+ */
+for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) {
+if (!miniflow_funcs[i].available) {
+continue;
+}
+
+hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size,
+ in_port, pmd_handle);
+stats->impl_hitcount[i] += count_1bits(hitmask);
+
+/* If traffic is not classified then we dont overwrite the keys
+ * array in minfiflow implementations so its safe to create a
+ * mask for all those packets whose 

[ovs-dev] [PATCH v14 00/11] MFEX Infrastructure + Optimizations

2021-07-15 Thread kumar Amber
v14:
- fixed format and xmas order in patch 6
- added additional negative test-cases for mfex commands
- added core mask for configuration test case
v13:
- add Acks from Flavio
- fixed atomic set and static var in study
- shortened null check in first patch
- added improvements to set command as per discussion
- added test-case to test for set commands both negative and positive
v12:
- re-work the set command to sweep method
- changes skip for unit-test to true from not-available
- added acks from Eelco
- minor doc fixed and typos
v11:
- reworked set command in alignment with Eelco and Harry
- added Acks from Eelco.
- added skip to unit test if other implementations not available
- minor typos and fixes
- clang build fixes
- removed patch whith Scalar DPIF, will send separately
v10 update:
- re-worked the default implementation
- fix comments from Flavio and Eelco
- Include Acks from Eelco in study
v9 update:
- Include review comments from Flavio
- Rebase onto Master
- Include Acks from Flavio
v8 updates:
- Include documentation on AVX512 MFEX as per Eelco's suggestion on list
v7 updates:
- Rebase onto DPIF v15
- Changed commands to get and set MFEX
- Fixed comments from Flavio, Eelco
- Separated addition of MFEX options into separate patch 12 for Scalar DPIF
- Removed sleep from auto-validator and added frame counter check
- Documentation updates
- Minor bug fixes
v6 updates:
- Fix non-ssl build
v5 updates:
- rebase onto latest DPIF v14
- use Enum for mfex impls
- add pmd core id set parameter in set command
- get command modified to display the pmd thread for individual mfex functions
- resolved comments from Eelco, Ian, Flavio
- Use Atomic to get and set miniflow implementations
- removed and reduced sleep in unit tests
- fixed scalar miniflow perf degradation
v4 updates:
- rebase on to latest DPIF v13
- fix fuzzy.py script with random mac/ip
v3 updates:
- rebase on to latest DPIF v12
- add additonal AVX512 traffic profiles for tcp and vlan
- add new command line for study function to add packet count
- add unit tests for fuzzy testing and auto-validation of mfex
- add mfex option hit stats to perf-show command
v2 updates:
- rebase on to latest DPIF v11
This patchset introduces miniflow extract infrastructure changes
which allow the user to choose between different ISA-optimized
miniflow extract variants; a variant can be chosen explicitly by the
user, or set automatically by OVS based on packet studies, using
different commands.
The infrastructure also provides a way to check the correctness of the
different ISA-optimized miniflow extract variants against the scalar
version.

Harry van Haaren (4):
  dpif/stats: Add miniflow extract opt hits counter
  dpdk: Add additional CPU ISA detection strings
  dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
  dpif-netdev/mfex: Add more AVX512 traffic profiles

Kumar Amber (7):
  dpif-netdev: Add command line and function pointer for miniflow
extract
  dpif-netdev: Add auto validation function for miniflow extract
  dpif-netdev: Add study function to select the best mfex function
  docs/dpdk/bridge: Add miniflow extract section.
  dpif-netdev: Add configure to enable autovalidator at build time.
  dpif-netdev: Add packet count and core id paramters for study
  test/sytem-dpdk: Add unit test for mfex autovalidator

 Documentation/topics/dpdk/bridge.rst | 146 +++
 NEWS |  11 +
 acinclude.m4 |  16 +
 configure.ac |   1 +
 lib/automake.mk  |   4 +
 lib/dpdk.c   |   2 +
 lib/dpif-netdev-avx512.c |  34 +-
 lib/dpif-netdev-extract-avx512.c | 630 +++
 lib/dpif-netdev-extract-study.c  | 158 +++
 lib/dpif-netdev-perf.c   |   3 +
 lib/dpif-netdev-perf.h   |   1 +
 lib/dpif-netdev-private-extract.c| 371 
 lib/dpif-netdev-private-extract.h| 203 +
 lib/dpif-netdev-private-thread.h |   8 +
 lib/dpif-netdev-unixctl.man  |   4 +
 lib/dpif-netdev.c| 238 +-
 tests/.gitignore |   1 +
 tests/automake.mk|   6 +
 tests/mfex_fuzzy.py  |  33 ++
 tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
 tests/pmd.at |   6 +-
 tests/system-dpdk.at | 160 +++
 22 files changed, 2024 insertions(+), 12 deletions(-)
 create mode 100644 lib/dpif-netdev-extract-avx512.c
 create mode 100644 lib/dpif-netdev-extract-study.c
 create mode 100644 lib/dpif-netdev-private-extract.c
 create mode 100644 lib/dpif-netdev-private-extract.h
 create mode 100755 tests/mfex_fuzzy.py
 create mode 100644 tests/pcap/mfex_test.pcap

-- 
2.25.1



[ovs-dev] [PATCH v14 02/11] dpif-netdev: Add auto validation function for miniflow extract

2021-07-15 Thread kumar Amber
From: Kumar Amber 

This patch introduces the auto-validation function, which
allows users to compare the batches of packets obtained from
different miniflow implementations against the linear
miniflow extract, and returns a hitmask.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---
v9:
- fix review comments Flavio
v6:
-fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove ovs assert and switch to default after a batch of packets
  is processed
- Atomic set and get introduced
- fix raw_ctz for windows build
---
---
 NEWS  |   2 +
 lib/dpif-netdev-private-extract.c | 150 ++
 lib/dpif-netdev-private-extract.h |  22 +
 lib/dpif-netdev.c |   2 +-
 4 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index b0f08e96d..cf254bcfe 100644
--- a/NEWS
+++ b/NEWS
@@ -33,6 +33,8 @@ Post-v2.15.0
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
  * Add command line option to switch between MFEX function pointers.
+ * Add miniflow extract auto-validator function to compare different
+   miniflow extract implementations against default implementation.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index 584b110ca..c283ff3e1 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -38,6 +38,11 @@ static ATOMIC(miniflow_extract_func) default_mfex_func;
  */
 static struct dpif_miniflow_extract_impl mfex_impls[] = {
 
+[MFEX_IMPL_AUTOVALIDATOR] = {
+.probe = NULL,
+.extract_func = dpif_miniflow_extract_autovalidator,
+.name = "autovalidator", },
+
 [MFEX_IMPL_SCALAR] = {
 .probe = NULL,
 .extract_func = NULL,
@@ -160,3 +165,148 @@ dp_mfex_impl_get_by_name(const char *name, 
miniflow_extract_func *out_func)
 
 return -ENOENT;
 }
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+struct netdev_flow_key *keys,
+uint32_t keys_size, odp_port_t in_port,
+struct dp_netdev_pmd_thread *pmd_handle)
+{
+const size_t cnt = dp_packet_batch_size(packets);
+uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+struct dp_packet *packet;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+if (keys_size < cnt) {
+miniflow_extract_func default_func = NULL;
+atomic_uintptr_t *pmd_func = (void *)>miniflow_extract_opt;
+atomic_store_relaxed(pmd_func, (uintptr_t) default_func);
+VLOG_ERR("Invalid key size supplied, Key_size: %d less than"
+ "batch_size:  %" PRIuSIZE"\n", keys_size, cnt);
+VLOG_ERR("Autovalidatior is disabled.\n");
+return 0;
+}
+
+/* Run scalar miniflow_extract to get default result. */
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+pkt_metadata_init(>md, in_port);
+miniflow_extract(packet, [i].mf);
+
+/* Store known good metadata to compare with optimized metadata. */
+good_l2_5_ofs[i] = packet->l2_5_ofs;
+good_l3_ofs[i] = packet->l3_ofs;
+good_l4_ofs[i] = packet->l4_ofs;
+good_l2_pad_size[i] = packet->l2_pad_size;
+}
+
+uint32_t batch_failed = 0;
+/* Iterate through each version of miniflow implementations. */
+for (int j = MFEX_IMPL_START_IDX; j < MFEX_IMPL_MAX; j++) {
+if (!mfex_impls[j].available) {
+continue;
+}
+/* Reset keys and offsets before each implementation. */
+memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+dp_packet_reset_offsets(packet);
+}
+/* Call optimized miniflow for each batch of packet. */
+uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys,
+   keys_size, in_port,
+   pmd_handle);
+
+/* Do a miniflow compare for bits, blocks and offsets for all the
+ * classified packets in the hitmask marked by set bits. */
+while (hit_mask) {
+/* Index for the set bit. */
+uint32_t i = raw_ctz(hit_mask);
+ 

[ovs-dev] [PATCH v14 05/11] dpif-netdev: Add configure to enable autovalidator at build time.

2021-07-15 Thread kumar Amber
From: Kumar Amber 

This commit adds a new configure option to allow the user to enable
the autovalidator by default at build time, thus allowing for
running unit tests against it by default.

 $ ./configure --enable-mfex-default-autovalidator

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---
v11:
- fix NEWS for blank line addition
v10:
- rework default set
v9:
- fix review comments Flavio
v7:
- fix review commens(Eelco, Flavio)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 Documentation/topics/dpdk/bridge.rst |  5 +
 NEWS |  2 ++
 acinclude.m4 | 16 
 configure.ac |  1 +
 lib/dpif-netdev-private-extract.c|  4 
 5 files changed, 28 insertions(+)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 7c96f4d5e..a47153495 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -307,3 +307,8 @@ implementations provide the same results.
 To set the Miniflow autovalidator, use this command ::
 
 $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+A compile time option is available in order to test it with the OVS unit
+test suite. Use the following configure option ::
+
+$ ./configure --enable-mfex-default-autovalidator
diff --git a/NEWS b/NEWS
index 4a7b89409..225eb445c 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,8 @@ Post-v2.15.0
  * Add study function to miniflow function table which studies packet
and automatically chooses the best miniflow implementation for that
traffic.
+ * Add configure option to enable autovalidator as the default
+   miniflow implementation at build time.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/acinclude.m4 b/acinclude.m4
index 343303447..5a48f0335 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -14,6 +14,22 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time?
+dnl This enables automatically running all unit tests with all MFEX
+dnl implementations.
+AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [
+  AC_ARG_ENABLE([mfex-default-autovalidator],
+[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable 
MFEX autovalidator as default miniflow_extract implementation.])],
+[autovalidator=yes],[autovalidator=no])
+  AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation])
+  if test "$autovalidator" != yes; then
+AC_MSG_RESULT([no])
+  else
+OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT"
+AC_MSG_RESULT([yes])
+  fi
+])
+
 dnl Set OVS DPCLS Autovalidator as default subtable search at compile time?
 dnl This enables automatically running all unit tests with all DPCLS
 dnl implementations.
diff --git a/configure.ac b/configure.ac
index e45685a6c..46c402892 100644
--- a/configure.ac
+++ b/configure.ac
@@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
 OVS_CHECK_DPIF_AVX512_DEFAULT
+OVS_CHECK_MFEX_AUTOVALIDATOR
 OVS_CHECK_BINUTILS_AVX512
 
 AC_ARG_VAR(KARCH, [Kernel Architecture String])
diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index f1e81a451..ceb6d1084 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -60,7 +60,11 @@ void
 dpif_miniflow_extract_init(void)
 {
 atomic_uintptr_t *mfex_func = (void *)_mfex_func;
+#ifdef MFEX_AUTOVALIDATOR_DEFAULT
+int mfex_idx = MFEX_IMPL_AUTOVALIDATOR;
+#else
 int mfex_idx = MFEX_IMPL_SCALAR;
+#endif
 
 /* Call probe on each impl, and cache the result. */
 for (int i = 0; i < MFEX_IMPL_MAX; i++) {
-- 
2.25.1



[ovs-dev] [PATCH v14 01/11] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-15 Thread kumar Amber
From: Kumar Amber 

This patch introduces the MFEX function pointers, which allow the user
to switch between the different miniflow extract implementations that
OVS provides, each optimized for a CPU ISA.

The user can query for the available minflow extract variants available
for that CPU by following commands:

$ovs-appctl dpif-netdev/miniflow-parser-get

Similarly an user can set the miniflow implementation by the following
command :

$ ovs-appctl dpif-netdev/miniflow-parser-set name

This allows for more performance and flexibility to the user to choose
the miniflow implementation according to the needs.

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---
v13:
- fix a minor comment
v11:
- fix Eelco comments
v10:
- fix build errors
- rework default set and atomic global variable
v9:
- fix review comments from Flavio
v7:
- fix review comments(Eelco, Flavio)
v5:
- fix review comments(Ian, Flavio, Eelco)
- add enum to hold mfex indexes
- add new get and set implementations
- add Atomic set and get
---
---
 NEWS  |   1 +
 lib/automake.mk   |   2 +
 lib/dpif-netdev-avx512.c  |  31 +-
 lib/dpif-netdev-private-extract.c | 162 ++
 lib/dpif-netdev-private-extract.h | 113 +
 lib/dpif-netdev-private-thread.h  |   8 ++
 lib/dpif-netdev.c | 107 +++-
 7 files changed, 419 insertions(+), 5 deletions(-)
 create mode 100644 lib/dpif-netdev-private-extract.c
 create mode 100644 lib/dpif-netdev-private-extract.h

diff --git a/NEWS b/NEWS
index 6cdccc715..b0f08e96d 100644
--- a/NEWS
+++ b/NEWS
@@ -32,6 +32,7 @@ Post-v2.15.0
  * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
+ * Add command line option to switch between MFEX function pointers.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index 3c9523c1a..53b8abc0f 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev-private-dpcls.h \
lib/dpif-netdev-private-dpif.c \
lib/dpif-netdev-private-dpif.h \
+   lib/dpif-netdev-private-extract.c \
+   lib/dpif-netdev-private-extract.h \
lib/dpif-netdev-private-flow.h \
lib/dpif-netdev-private-thread.h \
lib/dpif-netdev-private.h \
diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 6f9aa8284..7772b7abf 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
  * // do all processing (HWOL->MFEX->EMC->SMC)
  * }
  */
+
+/* Do a batch miniflow extract into keys. */
+uint32_t mf_mask = 0;
+miniflow_extract_func mfex_func;
+atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
+if (mfex_func) {
+mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd);
+}
+
 uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
 uint32_t iter = lookup_pkts_bitmask;
 while (iter) {
@@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
 pkt_metadata_init(>md, in_port);
 
 struct dp_netdev_flow *f = NULL;
+struct netdev_flow_key *key = &keys[i];
+
+/* Check the miniflow mask to see if the packet was correctly
+ * classified by the vector mfex, else do a scalar miniflow extract
+ * for that packet.
+ */
+bool mfex_hit = !!(mf_mask & (1 << i));
 
 /* Check for a partial hardware offload match. */
 if (hwol_enabled) {
@@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
 }
 if (f) {
rules[i] = &f->cr;
-pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+/* If AVX512 MFEX already classified the packet, use it. */
+if (mfex_hit) {
+pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
+} else {
+pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+}
+
 pkt_meta[i].bytes = dp_packet_size(packet);
 phwol_hits++;
 hwol_emc_smc_hitmask |= (1 << i);
@@ -185,9 +207,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
 }
 }
 
-/* Do miniflow extract into keys. */
-struct netdev_flow_key *key = &keys[i];
-miniflow_extract(packet, &key->mf);
+if (!mfex_hit) {
+/* Do a scalar miniflow extract into keys. */
+

[ovs-dev] [PATCH v14 04/11] docs/dpdk/bridge: Add miniflow extract section.

2021-07-15 Thread kumar Amber
From: Kumar Amber 

This commit adds a section to the dpdk/bridge.rst netdev documentation,
detailing the added miniflow functionality. The newly added commands are
documented, and sample output is provided.

The use of the autovalidator and the special study function is also
described in detail, as well as how to run fuzzy tests.

Signed-off-by: Kumar Amber 
Co-authored-by: Cian Ferriter 
Signed-off-by: Cian Ferriter 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Flavio Leitner 
Acked-by: Eelco Chaudron 

---
v11:
- fix minor typos.
v10:
- fix minor typos.
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 Documentation/topics/dpdk/bridge.rst | 51 
 1 file changed, 51 insertions(+)

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 2d0850836..7c96f4d5e 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -256,3 +256,54 @@ The following line should be seen in the configure output when the above option
 is used ::
 
 checking whether DPIF AVX512 is default implementation... yes
+
+Miniflow Extract
+----------------
+
+Miniflow extract (MFEX) performs parsing of the raw packets and extracts the
+important header information into a compressed miniflow. This miniflow is
+composed of bits and blocks, where the bits signify which blocks are set or
+have values, whereas the blocks hold the metadata, ip, udp, vlan, etc. These
+values are used by the datapath for switching decisions later. An optimized
+miniflow extract is traffic specific to speed up the lookup, whereas the
+scalar version works for ALL traffic patterns.
+
+Most modern CPUs have SIMD capabilities. These SIMD instructions are able
+to process a vector rather than act on one variable. OVS provides multiple
+implementations of miniflow extract. This allows the user to take advantage
+of SIMD instructions like AVX512 to gain additional performance.
+
+A list of implementations can be obtained by the following command. The
+command also shows whether the CPU supports each implementation ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-get
+Available Optimized Miniflow Extracts:
+autovalidator (available: True, pmds: none)
+scalar (available: True, pmds: 1,15)
+study (available: True, pmds: none)
+
+An implementation can be selected manually by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study
+
+The user can also select the study implementation, which studies the traffic
+for a specific number of packets by applying all available implementations
+of miniflow extract, and then chooses the one with the most optimal result
+for that traffic pattern.
+
+Miniflow Extract Validation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As multiple versions of miniflow extract can co-exist, each with different
+CPU ISA optimizations, it is important to validate that they all give the
+exact same results. To easily test all miniflow implementations, an
+``autovalidator`` implementation of the miniflow exists. This implementation
+runs all other available miniflow extract implementations, and verifies that
+the results are identical.
+
+Running the OVS unit tests with the autovalidator enabled ensures all
+implementations provide the same results.
+
+To set the Miniflow autovalidator, use this command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
-- 
2.25.1



Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.

2021-07-15 Thread Ilya Maximets
On 6/29/21 9:57 PM, Ilya Maximets wrote:
>>> Regarding the current patch, I think it's better to add a test case to
>>> cover the scenario and confirm that existing connections didn't reset. With
>>> that:
>>> Acked-by: Han Zhou 
> 
> I'll work on a unit test for this.

Hi.  Here is a unit test that I came up with:

diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 62181dd4d..e32f9ec89 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -2282,3 +2282,27 @@ OVSDB_CHECK_CLUSTER_IDL_C([simple idl, monitor_cond_since, cluster disconnect],
 008: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
 009: done
 ]])
+
+dnl This test checks that IDL keeps the existing connection to the server if
+dnl it's still on a list of remotes after update.
+OVSDB_CHECK_IDL_C([simple idl, initially empty, set remotes],
+  [],
+  [['set-remote unix:socket' \
+'+set-remote unix:bad_socket,unix:socket' \
+'+set-remote unix:bad_socket' \
+'+set-remote unix:socket' \
+'set-remote unix:bad_socket,unix:socket' \
+'+set-remote unix:socket' \
+'+reconnect']],
+  [[000: empty
+001: new remotes: unix:socket, is connected: true
+002: new remotes: unix:bad_socket,unix:socket, is connected: true
+003: new remotes: unix:bad_socket, is connected: false
+004: new remotes: unix:socket, is connected: false
+005: empty
+006: new remotes: unix:bad_socket,unix:socket, is connected: true
+007: new remotes: unix:socket, is connected: true
+008: reconnect
+009: empty
+010: done
+]])
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index a886f971e..93329cd4c 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -2621,6 +2621,7 @@ do_idl(struct ovs_cmdl_context *ctx)
 setvbuf(stdout, NULL, _IONBF, 0);
 
 symtab = ovsdb_symbol_table_create();
+const char remote_s[] = "set-remote ";
 const char cond_s[] = "condition ";
 if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
 update_conditions(idl, ctx->argv[2] + strlen(cond_s));
@@ -2664,6 +2665,11 @@ do_idl(struct ovs_cmdl_context *ctx)
 if (!strcmp(arg, "reconnect")) {
 print_and_log("%03d: reconnect", step++);
 ovsdb_idl_force_reconnect(idl);
+}  else if (!strncmp(arg, remote_s, strlen(remote_s))) {
+ovsdb_idl_set_remote(idl, arg + strlen(remote_s), true);
+print_and_log("%03d: new remotes: %s, is connected: %s", step++,
+  arg + strlen(remote_s),
+  ovsdb_idl_is_connected(idl) ? "true" : "false");
 }  else if (!strncmp(arg, cond_s, strlen(cond_s))) {
 update_conditions(idl, arg + strlen(cond_s));
 print_and_log("%03d: change conditions", step++);
---

Dumitru, Han, if it looks good to you, I can squash it in before
applying the patch.   What do you think?

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH ovn branch-21.06] Don't suppress localport traffic directed to external port

2021-07-15 Thread Ihar Hrachyshka
For the record, there are some integration issues with the backport
that I am working on now; please don't merge this version as-is.

Thanks,
Ihar

On Wed, Jul 14, 2021 at 10:07 PM Ihar Hrachyshka  wrote:
>
> Recently, we stopped leaking localport traffic through localnet ports
> into fabric to avoid unnecessary flipping between chassis hosting the
> same localport.
>
> Despite the type name, in some scenarios localports are supposed to
> talk outside the hosting chassis. Specifically, in OpenStack [1]
> metadata service for SR-IOV ports is implemented as a localport hosted
> on another chassis that is exposed to the chassis owning the SR-IOV
> port through an "external" port. In this case, "leaking" localport
> traffic into fabric is desirable.
>
> This patch inserts a higher priority flow per external port on the
> same datapath that avoids dropping localport traffic.
>
> This backport returns false from binding_handle_port_binding_changes
> on external port delete to enforce physical flow recalculation. This
> fixes the test case.
>
> Fixes: 96959e56d634 ("physical: do not forward traffic from localport
> to a localnet one")
>
> [1] https://docs.openstack.org/neutron/latest/admin/ovn/sriov.html
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1974062
>
> Signed-off-by: Ihar Hrachyshka 
> Signed-off-by: Numan Siddique 
> (cherry picked from commit 1148580290d0ace803f20aeaa0241dd51c100630)
> ---
>  controller/binding.c| 39 +++--
>  controller/ovn-controller.c |  2 +
>  controller/ovn-controller.h |  2 +
>  controller/physical.c   | 46 
>  tests/ovn.at| 85 +
>  5 files changed, 170 insertions(+), 4 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 594babc98..1c648fc17 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -108,6 +108,7 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
>  hmap_insert(local_datapaths, &ld->hmap_node, dp_key);
>  ld->datapath = datapath;
>  ld->localnet_port = NULL;
> +shash_init(&ld->external_ports);
>  ld->has_local_l3gateway = has_local_l3gateway;
>
>  if (tracked_datapaths) {
> @@ -474,6 +475,18 @@ is_network_plugged(const struct sbrec_port_binding *binding_rec,
>  return network ? !!shash_find_data(bridge_mappings, network) : false;
>  }
>
> +static void
> +update_ld_external_ports(const struct sbrec_port_binding *binding_rec,
> + struct hmap *local_datapaths)
> +{
> +struct local_datapath *ld = get_local_datapath(
> +local_datapaths, binding_rec->datapath->tunnel_key);
> +if (ld) {
> +shash_replace(&ld->external_ports, binding_rec->logical_port,
> +  binding_rec);
> +}
> +}
> +
>  static void
>  update_ld_localnet_port(const struct sbrec_port_binding *binding_rec,
>  struct shash *bridge_mappings,
> @@ -1631,8 +1644,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
>  !sset_is_empty(b_ctx_out->egress_ifaces) ? &qos_map : NULL;
>
>  struct ovs_list localnet_lports = OVS_LIST_INITIALIZER(&localnet_lports);
> +struct ovs_list external_lports = OVS_LIST_INITIALIZER(&external_lports);
>
> -struct localnet_lport {
> +struct lport {
>  struct ovs_list list_node;
>  const struct sbrec_port_binding *pb;
>  };
> @@ -1680,11 +1694,14 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
>
>  case LP_EXTERNAL:
>  consider_external_lport(pb, b_ctx_in, b_ctx_out);
> +struct lport *ext_lport = xmalloc(sizeof *ext_lport);
> +ext_lport->pb = pb;
> +ovs_list_push_back(&external_lports, &ext_lport->list_node);
>  break;
>
>  case LP_LOCALNET: {
>  consider_localnet_lport(pb, b_ctx_in, b_ctx_out, &qos_map);
> -struct localnet_lport *lnet_lport = xmalloc(sizeof *lnet_lport);
> +struct lport *lnet_lport = xmalloc(sizeof *lnet_lport);
>  lnet_lport->pb = pb;
>  ovs_list_push_back(&localnet_lports, &lnet_lport->list_node);
>  break;
> @@ -1711,7 +1728,7 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
>  /* Run through each localnet lport list to see if it is a localnet port
>   * on local datapaths discovered from above loop, and update the
>   * corresponding local datapath accordingly. */
> -struct localnet_lport *lnet_lport;
> +struct lport *lnet_lport;
>  LIST_FOR_EACH_POP (lnet_lport, list_node, &localnet_lports) {
>  update_ld_localnet_port(lnet_lport->pb, &bridge_mappings,
>  b_ctx_out->egress_ifaces,
> @@ -1719,6 +1736,15 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out)
>  free(lnet_lport);
>  }
>
> +/* Run through external lport list to see if these 

Re: [ovs-dev] [PATCH v13 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread Van Haaren, Harry
> -Original Message-
> From: Flavio Leitner 
> Sent: Thursday, July 15, 2021 4:26 PM
> To: Amber, Kumar 
> Cc: ovs-dev@openvswitch.org; echau...@redhat.com; i.maxim...@ovn.org; Van
> Haaren, Harry ; Ferriter, Cian
> ; Stokes, Ian 
> Subject: Re: [PATCH v13 07/11] test/sytem-dpdk: Add unit test for mfex
> autovalidator
> 
> On Thu, Jul 15, 2021 at 06:12:13PM +0530, kumar Amber wrote:



> > +
> > +AT_CHECK([ovs-appctl dpif-netdev/miniflow-parser-set -pmd 21 study], [0], [dnl
> > +Miniflow extract implementation set to study, on pmd thread 21, studying 128 packets.
> > +])
> 
> This one actually fails for me because the pmd id is not valid:

Ah ok.

> --- /dev/null   2021-07-14 14:56:03.508411934 -0400
> +++
> /ovs/tests/system-dpdk-testsuite.dir/at-groups/8/stderr
> 2021-07-15 10:59:47.441060921 -0400
> @@ -0,0 +1,2 @@
> +Error: Miniflow parser not changed, PMD thread 21 not in use, pass a valid pmd thread ID.
> +ovs-appctl: ovs-vswitchd: server returned an error
> 
> Most probably the other ones relying on -pmd 21 will fail as well.

Interesting, same as before.  

> The valid PMD id here was 11. I think the test needs to find out the
> PMD id first, then issue the command.

Yes, this can be resolved by just setting the OVS PMD mask on startup.
(By choosing that startup coremask to be 0xC, we require a minimum 4-core machine.)
The valid tests core ids have been updated to "-pmd 3" as included by 0xC mask.

+AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=0xC])

Fix included in v14.

> fbl

Thanks for the feedback. Regards, -Harry




[ovs-dev] [PATCH v2] latch-unix: Decrease the stack usage in latch

2021-07-15 Thread anton . ivanov
From: Anton Ivanov 

1. Make latch behave as described and documented - clear all
outstanding latch writes when invoking latch_poll().
2. Decrease the size of the latch buffer. Less stack usage,
less cache thrashing.

Signed-off-by: Anton Ivanov 
---
 lib/latch-unix.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/lib/latch-unix.c b/lib/latch-unix.c
index 2995076d6..80115fe6d 100644
--- a/lib/latch-unix.c
+++ b/lib/latch-unix.c
@@ -23,6 +23,7 @@
 #include "openvswitch/poll-loop.h"
 #include "socket-util.h"
 
+
 /* Initializes 'latch' as initially unset. */
 void
 latch_init(struct latch *latch)
@@ -43,9 +44,17 @@ latch_destroy(struct latch *latch)
 bool
 latch_poll(struct latch *latch)
 {
-char buffer[_POSIX_PIPE_BUF];
+char latch_buffer[16];
+bool result = false;
+int ret;
+
+do {
+ret = read(latch->fds[0], &latch_buffer, sizeof latch_buffer);
+result |= ret > 0;
+/* Repeat as long as read() reads a full buffer. */
+} while (ret == sizeof latch_buffer);
 
-return read(latch->fds[0], buffer, sizeof buffer) > 0;
+return result;
 }
 
 /* Sets 'latch'.
-- 
2.20.1



Re: [ovs-dev] [PATCH v13 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 06:12:13PM +0530, kumar Amber wrote:
> From: Kumar Amber 
> 
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
>   8: OVS-DPDK - MFEX Configuration
> 
> Added a new directory to store the PCAP file used
> in the tests and a script to generate the fuzzy traffic
> type pcap to be used in fuzzy unit test.
> 
> Signed-off-by: Kumar Amber 
> Acked-by: Flavio Leitner 
> 
> ---
> v13:
> - fix -v in the command
> - added the configuration test case and supporting doc update
> v12:
> - change skip parameter for unit test
> v11:
> - fix comments from Eelco
> v7:
> - fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - remove sleep from first test and added minor 5 sec sleep to fuzzy
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  56 
>  tests/.gitignore |   1 +
>  tests/automake.mk|   6 ++
>  tests/mfex_fuzzy.py  |  33 +++
>  tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
>  tests/system-dpdk.at | 129 +++
>  6 files changed, 225 insertions(+)
>  create mode 100755 tests/mfex_fuzzy.py
>  create mode 100644 tests/pcap/mfex_test.pcap
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 8c500c504..913b3e6f6 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -346,3 +346,59 @@ A compile time option is available in order to test it with the OVS unit
>  test suite. Use the following configure option ::
>  
>  $ ./configure --enable-mfex-default-autovalidator
> +
> +Unit Test Miniflow Extract
> +++
> +
> +Unit test can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator
> +
> +The unit test uses multiple traffic types to test the correctness of the
> +implementations.
> +
> +The MFEX commands can also be tested for negative and positive cases to
> +verify that the MFEX set command does not allow for incorrect parameters.
> +A user can directly run the following configuration test case in
> +tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Configuration
> +
> +Running Fuzzy test with Autovalidator
> ++
> +
> +Fuzzy tests can also be done on miniflow extract with the help of
> +the autovalidator and Scapy. The steps below describe how to
> +reproduce the setup with IP being fuzzed to generate packets.
> +
> +Scapy is used to create fuzzy IP packets and save them into a PCAP ::
> +
> +pkt = fuzz(Ether()/IP()/TCP())
> +
> +Set the miniflow extract to autovalidator using ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> +
> +OVS is configured to receive the generated packets ::
> +
> +$ ovs-vsctl add-port br0 pcap0 -- \
> +set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
> +"rx_pcap=fuzzy.pcap"
> +
> +With this workflow, the autovalidator will ensure that all MFEX
> +implementations are classifying each packet in exactly the same way.
> +If an optimized MFEX implementation causes a different miniflow to be
> +generated, the autovalidator has ovs_assert and logging statements that
> +will inform about the issue.
> +
> +Unit Fuzzy test with Autovalidator
> ++
> +
> +Unit test can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator Fuzzy
> diff --git a/tests/.gitignore b/tests/.gitignore
> index 45b4f67b2..a3d927e5d 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -11,6 +11,7 @@
>  /ovsdb-cluster-testsuite
>  /ovsdb-cluster-testsuite.dir/
>  /ovsdb-cluster-testsuite.log
> +/pcap/
>  /pki/
>  /system-afxdp-testsuite
>  /system-afxdp-testsuite.dir/
> diff --git a/tests/automake.mk b/tests/automake.mk
> index f45f8d76c..a6c15ba55 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
>   echo "TEST_FUZZ_REGRESSION([$$basename])"; \
>   done > $@.tmp && mv $@.tmp $@
>  
> +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
> +MFEX_AUTOVALIDATOR_TESTS = \
> + tests/pcap/mfex_test.pcap \
> + tests/mfex_fuzzy.py
> +
>  OVSDB_CLUSTER_TESTSUITE_AT = \
>   tests/ovsdb-cluster-testsuite.at \
>   tests/ovsdb-execution.at \
> @@ -512,6 +517,7 @@ tests_test_type_props_SOURCES = tests/test-type-props.c
>  CHECK_PYFILES = \
>   tests/appctl.py \
>   tests/flowgen.py \
> + tests/mfex_fuzzy.py \
>   tests/ovsdb-monitor-sort.py \
>   tests/test-daemon.py \
>   tests/test-json.py \

Re: [ovs-dev] [PATCH v13 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread Eelco Chaudron


On 15 Jul 2021, at 17:06, Stokes, Ian wrote:

>> Added some missing negative test cases, see below.
>>
>
> Thanks for these Eelco, would you like me to add you as a co-author for this 
> patch, as you have code that is part of the next revision?

Whatever is easiest for you, I’m fine either way.

> Regards
> Ian
>> On 15 Jul 2021, at 14:42, kumar Amber wrote:
>>
>>> From: Kumar Amber 
>>>
>>> Tests:
>>>   6: OVS-DPDK - MFEX Autovalidator
>>>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
>>>   8: OVS-DPDK - MFEX Configuration
>>>
>>> Added a new directory to store the PCAP file used
>>> in the tests and a script to generate the fuzzy traffic
>>> type pcap to be used in fuzzy unit test.
>>>
>>> Signed-off-by: Kumar Amber 
>>> Acked-by: Flavio Leitner 
>>>
>>> ---
>>> v13:
>>> - fix -v in the command
>>> - added the configuration test case and supporting doc update
>>> v12:
>>> - change skip parameter for unit test
>>> v11:
>>> - fix comments from Eelco
>>> v7:
>>> - fix review comments(Eelco)
>>> v5:
>>> - fix review comments(Ian, Flavio, Eelco)
>>> - remove sleep from first test and added minor 5 sec sleep to fuzzy
>>> ---
>>> ---
>>>  Documentation/topics/dpdk/bridge.rst |  56 
>>>  tests/.gitignore |   1 +
>>>  tests/automake.mk|   6 ++
>>>  tests/mfex_fuzzy.py  |  33 +++
>>>  tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
>>>  tests/system-dpdk.at | 129 +++
>>>  6 files changed, 225 insertions(+)
>>>  create mode 100755 tests/mfex_fuzzy.py
>>>  create mode 100644 tests/pcap/mfex_test.pcap
>>>
>>> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
>>> index 8c500c504..913b3e6f6 100644
>>> --- a/Documentation/topics/dpdk/bridge.rst
>>> +++ b/Documentation/topics/dpdk/bridge.rst
>>> @@ -346,3 +346,59 @@ A compile time option is available in order to test it with the OVS unit
>>>  test suite. Use the following configure option ::
>>>
>>>  $ ./configure --enable-mfex-default-autovalidator
>>> +
>>> +Unit Test Miniflow Extract
>>> +++
>>> +
>>> +Unit test can also be used to test the workflow mentioned above by running
>>> +the following test-case in tests/system-dpdk.at ::
>>> +
>>> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
>>> +OVS-DPDK - MFEX Autovalidator
>>> +
>>> +The unit test uses multiple traffic types to test the correctness of the
>>> +implementations.
>>> +
>>> +The MFEX commands can also be tested for negative and positive cases to
>>> +verify that the MFEX set command does not allow for incorrect parameters.
>>> +A user can directly run the following configuration test case in
>>> +tests/system-dpdk.at ::
>>> +
>>> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
>>> +OVS-DPDK - MFEX Configuration
>>> +
>>> +Running Fuzzy test with Autovalidator
>>> ++
>>> +
>>> +Fuzzy tests can also be done on miniflow extract with the help of
>>> +the autovalidator and Scapy. The steps below describe how to
>>> +reproduce the setup with IP being fuzzed to generate packets.
>>> +
>>> +Scapy is used to create fuzzy IP packets and save them into a PCAP ::
>>> +
>>> +pkt = fuzz(Ether()/IP()/TCP())
>>> +
>>> +Set the miniflow extract to autovalidator using ::
>>> +
>>> +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
>>> +
>>> +OVS is configured to receive the generated packets ::
>>> +
>>> +$ ovs-vsctl add-port br0 pcap0 -- \
>>> +set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
>>> +"rx_pcap=fuzzy.pcap"
>>> +
>>> +With this workflow, the autovalidator will ensure that all MFEX
>>> +implementations are classifying each packet in exactly the same way.
>>> +If an optimized MFEX implementation causes a different miniflow to be
>>> +generated, the autovalidator has ovs_assert and logging statements that
>>> +will inform about the issue.
>>> +
>>> +Unit Fuzzy test with Autovalidator
>>> ++
>>> +
>>> +Unit test can also be used to test the workflow mentioned above by running
>>> +the following test-case in tests/system-dpdk.at ::
>>> +
>>> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
>>> +OVS-DPDK - MFEX Autovalidator Fuzzy
>>> diff --git a/tests/.gitignore b/tests/.gitignore
>>> index 45b4f67b2..a3d927e5d 100644
>>> --- a/tests/.gitignore
>>> +++ b/tests/.gitignore
>>> @@ -11,6 +11,7 @@
>>>  /ovsdb-cluster-testsuite
>>>  /ovsdb-cluster-testsuite.dir/
>>>  /ovsdb-cluster-testsuite.log
>>> +/pcap/
>>>  /pki/
>>>  /system-afxdp-testsuite
>>>  /system-afxdp-testsuite.dir/
>>> diff --git a/tests/automake.mk b/tests/automake.mk
>>> index f45f8d76c..a6c15ba55 100644
>>> --- a/tests/automake.mk
>>> +++ b/tests/automake.mk
>>> @@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
>>> echo "TEST_FUZZ_REGRESSION([$$basename])"; \
>>> done > $@.tmp && mv $@.tmp $@
>>>

Re: [ovs-dev] [PATCH v13 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-15 Thread Amber, Kumar
Hi Flavio,

> -Original Message-
> From: Flavio Leitner 
> Sent: Thursday, July 15, 2021 8:28 PM
> To: Eelco Chaudron 
> Cc: Amber, Kumar ; ovs-dev@openvswitch.org;
> i.maxim...@ovn.org; Van Haaren, Harry ;
> Ferriter, Cian ; Stokes, Ian 
> Subject: Re: [PATCH v13 06/11] dpif-netdev: Add packet count and core id
> paramters for study
> 
> On Thu, Jul 15, 2021 at 04:15:13PM +0200, Eelco Chaudron wrote:
> >
> > Some minor changes in output, maybe they can be done during the commit?
> >
> > On 15 Jul 2021, at 14:42, kumar Amber wrote:
> >
> > > From: Kumar Amber 
> > >
> > > This commit introduces an additional command line parameter for the
> > > mfex study function. If the user provides an additional packet count,
> > > it is used in study as the minimum number of packets which must be
> > > processed, else a default value is chosen.
> > > Also introduces a third parameter for choosing a particular pmd core.
> > >
> > > $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
> > >
> > > Signed-off-by: Kumar Amber 
> > >
> > > ---
> > > v13:
> > > - reworked the set command as per discussion
> > > - fixed the atomic set in study
> > > - added bool for handling study mfex to simplify logic and command
> > > output
> > > - fixed double space in variable declaration and removed static
> > > v12:
> > > - re-work the set command to sweep
> > > - include fixes to study.c and doc changes
> > > v11:
> > > - include comments from Eelco
> > > - reworked set command as per discussion
> > > v10:
> > > - fix review comments Eelco
> > > v9:
> > > - fix review comments Flavio
> > > v7:
> > > - change the command paramters for core_id and study_pkt_cnt
> > > v5:
> > > - fix review comments(Ian, Flavio, Eelco)
> > > - introduce pmd core id parameter
> > > ---
> > > ---
> > >  Documentation/topics/dpdk/bridge.rst |  38 +-
> > >  lib/dpif-netdev-extract-study.c  |  27 -
> > >  lib/dpif-netdev-private-extract.h|   9 ++
> > >  lib/dpif-netdev.c| 173 ++-
> > >  4 files changed, 215 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> > > index a47153495..8c500c504 100644
> > > --- a/Documentation/topics/dpdk/bridge.rst
> > > +++ b/Documentation/topics/dpdk/bridge.rst
> > > @@ -284,12 +284,46 @@ command also shows whether the CPU supports each implementation ::
> > >
> > >  An implementation can be selected manually by the following command ::
> > >
> > > -$ ovs-appctl dpif-netdev/miniflow-parser-set study
> > > +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
> > > + [study_cnt]
> > > +
> > > +The above command has two optional parameters: study_cnt and core_id.
> > > +The core_id sets a particular miniflow extract function to a
> > > +specific pmd thread on the core. The third parameter study_cnt,
> > > +which is specific
> > > +to study and ignored by other implementations, means how many
> > > +packets are needed to choose the best implementation.
> > >
> > >  Also user can select the study implementation which studies the
> > > traffic for  a specific number of packets by applying all available
> > > implementations of  miniflow extract and then chooses the one with
> > > the most optimal result for -that traffic pattern.
> > > +that traffic pattern. The user can optionally provide a packet
> > > +count [study_cnt] parameter which is the minimum number of packets
> > > +that OVS must
> > > +study before choosing an optimal implementation. If no packet count
> > > +is provided, then the default value, 128 is chosen. Also, as there
> > > +is no synchronization point between threads, one PMD thread might
> > > +still be running
> > > +a previous round, and can now decide on earlier data.
> > > +
> > > +The per packet count is a global value, and parallel study
> > > +executions
> > > with
> > > +differing packet counts will use the most recent count value
> > > +provided
> > > by user.
> > > +
> > > +Study can be selected with packet count by the following command ::
> > > +
> > > +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> > > +
> > > +Study can be selected with packet count and explicit PMD selection
> > > +by the following command ::
> > > +
> > > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
> > > +
> > > +In the above command the first parameter is the CORE ID of the PMD
> > > +thread and this can also be used to explicitly set the miniflow
> > > +extraction function pointer on different PMD threads.
> > > +
> > > +Scalar can be selected on core 3 by the following command where
> > > +study count should not be provided for any implementation other
> > > +than study ::
> > > +
> > > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
> > >
> > >  Miniflow Extract Validation
> > >  ~~~
> > > diff --git 

Re: [ovs-dev] [PATCH v2 3/3] dpdk: Stop configuring socket-limit with the value of socket-mem.

2021-07-15 Thread Kevin Traynor
On 13/07/2021 20:15, Rosemarie O'Riorden wrote:
> This change removes the automatic memory limit on start-up of OVS with
> DPDK. As DPDK supports dynamic memory allocation, there is no
> need to limit the amount of memory available, if not requested.
> 
> Currently, if socket-limit is not configured, it is set to the value of
> socket-mem. With this change, the user can decide to set it or have no
> memory limit.
> 
> Removed logs added in patch 1 that announce this change.
> 

Can drop the reference to 'patch 1'. When the time is right you can add
the commit as a reference.

> Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
> Signed-off-by: Rosemarie O'Riorden 
> ---
> Removes logs added in patch 1 that were not in v1.
> 
>  NEWS | 2 ++
>  lib/dpdk.c   | 7 ---
>  vswitchd/vswitch.xml | 6 +++---
>  3 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 948f68283..99b8b9fce 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -31,6 +31,8 @@ Post-v2.15.0
> Available only if DPDK experimantal APIs enabled during the build.
>   * EAL option --socket-mem is no longer configured by default upon
> start-up.
> + * EAL option --socket-limit no longer takes on the value of --socket-mem
> +   by default.
> - ovsdb-tool:
>   * New option '--election-timer' to the 'create-cluster' command to set 
> the
> leader election timer during cluster creation.
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 3a6990e2f..266ef20dc 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -415,13 +415,6 @@ dpdk_init__(const struct smap *ovs_other_config)
>  break;
>  }
>  }
> -if (i < args.n - 1) {
> -svec_add(, "--socket-limit");
> -svec_add(, args.names[i + 1]);
> -VLOG_INFO("Using default value for '--socket-limit. OVS will no "
> -  "longer provide a default for this argument starting "
> -  "from 2.17 release. DPDK defaults will be used 
> instead.");
> -}

Looks like the entire 'if (!args_contains(, "--legacy-mem")..'
block can be removed as it is not serving a purpose anymore.

>  }
>  
>  if (args_contains(, "-c") || args_contains(, "-l")) {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index c64be6c22..c8d61332d 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -381,11 +381,11 @@
>0 will disable the limit for a particular socket.
>  
>  
> -  If not specified, OVS will configure limits equal to the amount of
> -  preallocated memory specified by  +  If not specified, OVS will not configure limits by default.
> +  Limits can be configured with key="dpdk-socket-mem"/> or --socket-mem in
>.If none of the above
> -  options specified or --legacy-mem provided in
> +  options are specified or --legacy-mem is provided in
>, limits will not be
>applied.

It's not caused by your patch, but this sentence is unclear. It seems to
be referring to 'default limits' as opposed to 'user-set limits'.

As there is no default OVS set limits in any circumstance with your
patch, I think the sentence can be removed.

> There is no default value from OVS.
>Changing this value requires restarting the daemon.
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
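To make the removed behavior concrete, here is a stdlib-only Python sketch (the function name is invented; the real logic is the C block quoted above): unless the user configured --legacy-mem or an explicit --socket-limit, OVS appended a --socket-limit equal to the value it found after --socket-mem.

```python
# Hypothetical sketch (invented names; not the OVS C code) of the
# legacy default being removed by this patch.
def legacy_apply_socket_limit(args):
    args = list(args)
    if ("--legacy-mem" not in args and "--socket-limit" not in args
            and "--socket-mem" in args):
        # Copy the value that follows --socket-mem into --socket-limit.
        i = args.index("--socket-mem")
        args += ["--socket-limit", args[i + 1]]
    return args

print(legacy_apply_socket_limit(["--socket-mem", "1024,1024"]))
# -> ['--socket-mem', '1024,1024', '--socket-limit', '1024,1024']
```

With the patch applied, no such copying happens and the DPDK default (no limit) is used.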


Re: [ovs-dev] [PATCH v13 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread Stokes, Ian
> Added some missing negative test cases, see below.
> 

Thanks for these Eelco, would you like me to add you as a co-author for this
patch, as you have code that is part of the next revision?

Regards
Ian
> On 15 Jul 2021, at 14:42, kumar Amber wrote:
> 
> > From: Kumar Amber 
> >
> > Tests:
> >   6: OVS-DPDK - MFEX Autovalidator
> >   7: OVS-DPDK - MFEX Autovalidator Fuzzy
> >   8: OVS-DPDK - MFEX Configuration
> >
> > Added a new directory to store the PCAP file used
> > in the tests and a script to generate the fuzzy traffic
> > type pcap to be used in fuzzy unit test.
> >
> > Signed-off-by: Kumar Amber 
> > Acked-by: Flavio Leitner 
> >
> > ---
> > v13:
> > - fix -v in the command
> > - added the configuration test case and supporting doc update
> > v12:
> - change skip parameter for unit test
> > v11:
> > - fix comments from Eelco
> > v7:
> > - fix review comments(Eelco)
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > - remove sleep from first test and added minor 5 sec sleep to fuzzy
> > ---
> > ---
> >  Documentation/topics/dpdk/bridge.rst |  56 
> >  tests/.gitignore |   1 +
> >  tests/automake.mk|   6 ++
> >  tests/mfex_fuzzy.py  |  33 +++
> >  tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
> >  tests/system-dpdk.at | 129 +++
> >  6 files changed, 225 insertions(+)
> >  create mode 100755 tests/mfex_fuzzy.py
> >  create mode 100644 tests/pcap/mfex_test.pcap
> >
> > diff --git a/Documentation/topics/dpdk/bridge.rst
> b/Documentation/topics/dpdk/bridge.rst
> > index 8c500c504..913b3e6f6 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -346,3 +346,59 @@ A compile time option is available in order to test it
> with the OVS unit
> >  test suite. Use the following configure option ::
> >
> >  $ ./configure --enable-mfex-default-autovalidator
> > +
> > +Unit Test Miniflow Extract
> > +++
> > +
> > +Unit test can also be used to test the workflow mentioned above by running
> > +the following test-case in tests/system-dpdk.at ::
> > +
> > +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> > +OVS-DPDK - MFEX Autovalidator
> > +
> > +The unit test uses multiple traffic types to test the correctness of the
> > +implementations.
> > +
> > +The MFEX commands can also be tested for negative and positive cases to
> > +verify that the MFEX set command does not allow for incorrect parameters.
> > +A user can directly run the following configuration test case in
> > +tests/system-dpdk.at ::
> > +
> > +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> > +OVS-DPDK - MFEX Configuration
> > +
> > +Running Fuzzy test with Autovalidator
> > ++
> > +
> > +Fuzzy tests can also be done on miniflow extract with the help of
> > +auto-validator and Scapy. The steps below describe how to
> > +reproduce the setup with IP being fuzzed to generate packets.
> > +
> > +Scapy is used to create fuzzy IP packets and save them into a PCAP ::
> > +
> > +pkt = fuzz(Ether()/IP()/TCP())
> > +
> > +Set the miniflow extract to autovalidator using ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> > +
> > +OVS is configured to receive the generated packets ::
> > +
> > +$ ovs-vsctl add-port br0 pcap0 -- \
> > +set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
> > +"rx_pcap=fuzzy.pcap"
> > +
> > +With this workflow, the autovalidator will ensure that all MFEX
> > +implementations are classifying each packet in exactly the same way.
> > +If an optimized MFEX implementation causes a different miniflow to be
> > +generated, the autovalidator has ovs_assert and logging statements that
> > +will inform about the issue.
> > +
> > +Unit Fuzzy test with Autovalidator
> > ++
> > +
> > +Unit test can also be used to test the workflow mentioned above by running
> > +the following test-case in tests/system-dpdk.at ::
> > +
> > +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> > +OVS-DPDK - MFEX Autovalidator Fuzzy
> > diff --git a/tests/.gitignore b/tests/.gitignore
> > index 45b4f67b2..a3d927e5d 100644
> > --- a/tests/.gitignore
> > +++ b/tests/.gitignore
> > @@ -11,6 +11,7 @@
> >  /ovsdb-cluster-testsuite
> >  /ovsdb-cluster-testsuite.dir/
> >  /ovsdb-cluster-testsuite.log
> > +/pcap/
> >  /pki/
> >  /system-afxdp-testsuite
> >  /system-afxdp-testsuite.dir/
> > diff --git a/tests/automake.mk b/tests/automake.mk
> > index f45f8d76c..a6c15ba55 100644
> > --- a/tests/automake.mk
> > +++ b/tests/automake.mk
> > @@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at:
> tests/automake.mk
> > echo "TEST_FUZZ_REGRESSION([$$basename])"; \
> > done > $@.tmp && mv $@.tmp $@
> >
> > +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
> > +MFEX_AUTOVALIDATOR_TESTS = \
> > +   

Re: [ovs-dev] [PATCH v13 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-15 Thread Flavio Leitner
On Thu, Jul 15, 2021 at 04:15:13PM +0200, Eelco Chaudron wrote:
> 
> Some minor changes in output, maybe they can be done during the commit?
> 
> On 15 Jul 2021, at 14:42, kumar Amber wrote:
> 
> > From: Kumar Amber 
> > 
> > This commit introduces an additional command line parameter
> > for the mfex study function. If the user provides a packet count,
> > study uses it as the minimum number of packets that must be
> > processed before choosing an implementation; otherwise a default
> > value is chosen.
> > Also introduces a third parameter for choosing a particular pmd core.
> > 
> > $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
> > 
> > Signed-off-by: Kumar Amber 
> > 
> > ---
> > v13:
> > - reworked the set command as per discussion
> > - fixed the atomic set in study
> > - added bool for handling study mfex to simplify logic and command
> > output
> > - fixed double space in variable declaration and removed static
> > v12:
> > - re-work the set command to sweep
> > - include fixes to study.c and doc changes
> > v11:
> > - include comments from Eelco
> > - reworked set command as per discussion
> > v10:
> > - fix review comments Eelco
> > v9:
> > - fix review comments Flavio
> > v7:
> > - change the command paramters for core_id and study_pkt_cnt
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > - introduce pmd core id parameter
> > ---
> > ---
> >  Documentation/topics/dpdk/bridge.rst |  38 +-
> >  lib/dpif-netdev-extract-study.c  |  27 -
> >  lib/dpif-netdev-private-extract.h|   9 ++
> >  lib/dpif-netdev.c| 173 ++-
> >  4 files changed, 215 insertions(+), 32 deletions(-)
> > 
> > diff --git a/Documentation/topics/dpdk/bridge.rst
> > b/Documentation/topics/dpdk/bridge.rst
> > index a47153495..8c500c504 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -284,12 +284,46 @@ command also shows whether the CPU supports each
> > implementation ::
> > 
> >  An implementation can be selected manually by the following command ::
> > 
> > -$ ovs-appctl dpif-netdev/miniflow-parser-set study
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
> > + [study_cnt]
> > +
> > +The above command has two optional parameters: study_cnt and core_id.
> > +The core_id sets a particular miniflow extract function to a specific
> > +pmd thread on the core. The third parameter study_cnt, which is
> > specific
> > +to study and ignored by other implementations, means how many packets
> > +are needed to choose the best implementation.
> > 
> >  Also user can select the study implementation which studies the traffic
> > for
> >  a specific number of packets by applying all available implementations
> > of
> >  miniflow extract and then chooses the one with the most optimal result
> > for
> > -that traffic pattern.
> > +that traffic pattern. The user can optionally provide a packet count
> > +[study_cnt] parameter which is the minimum number of packets that OVS
> > must
> > +study before choosing an optimal implementation. If no packet count is
> > +provided, then the default value, 128 is chosen. Also, as there is no
> > +synchronization point between threads, one PMD thread might still be
> > running
> > +a previous round, and can now decide on earlier data.
> > +
> > +The per packet count is a global value, and parallel study executions
> > with
> > +differing packet counts will use the most recent count value provided
> > by user.
> > +
> > +Study can be selected with packet count by the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> > +
> > +Study can be selected with packet count and explicit PMD selection
> > +by the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
> > +
> > +In the above command the first parameter is the CORE ID of the PMD
> > +thread and this can also be used to explicitly set the miniflow
> > +extraction function pointer on different PMD threads.
> > +
> > +Scalar can be selected on core 3 by the following command where
> > +study count should not be provided for any implementation other
> > +than study ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
> > 
> >  Miniflow Extract Validation
> >  ~~~
> > diff --git a/lib/dpif-netdev-extract-study.c
> > b/lib/dpif-netdev-extract-study.c
> > index 02b709f8b..7725c8f6e 100644
> > --- a/lib/dpif-netdev-extract-study.c
> > +++ b/lib/dpif-netdev-extract-study.c
> > @@ -25,7 +25,7 @@
> > 
> >  VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
> > 
> > -static atomic_uint32_t mfex_study_pkts_count = 0;
> > +static atomic_uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
> > 
> >  /* Struct to hold miniflow study stats. */
> >  struct study_stats {
> > @@ -48,6 +48,26 @@ mfex_study_get_study_stats_ptr(void)
> >  return stats;
> >  }
> > 
> > +int
> > 

Re: [ovs-dev] [PATCH v13 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-15 Thread Stokes, Ian
Hi Eelco,

These seem straightforward and I agree. Let us re-spin with these changes;
I think this patch will be in a good-to-go state once they are addressed.

Reverse Xmas tree order for variables.
Standardize log errors to ‘miniflow extract’ (note lower case m and the word 
extract).
Add ‘Error:’ to string in error messages.

I think we can get these changes made and a new v14 out soon.

Thanks
Ian

From: Eelco Chaudron 
Sent: Thursday, July 15, 2021 3:15 PM
To: Amber, Kumar ; f...@sysclose.org
Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org; Van Haaren, Harry 
; Ferriter, Cian ; Stokes, 
Ian 
Subject: Re: [PATCH v13 06/11] dpif-netdev: Add packet count and core id 
paramters for study


Some minor changes in output, maybe they can be done during the commit?

On 15 Jul 2021, at 14:42, kumar Amber wrote:

From: Kumar Amber <kumar.am...@intel.com>

This commit introduces an additional command line parameter
for the mfex study function. If the user provides a packet count,
study uses it as the minimum number of packets that must be
processed before choosing an implementation; otherwise a default
value is chosen.
Also introduces a third parameter for choosing a particular pmd core.

$ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3

Signed-off-by: Kumar Amber <kumar.am...@intel.com>

---
v13:
- reworked the set command as per discussion
- fixed the atomic set in study
- added bool for handling study mfex to simplify logic and command output
- fixed double space in variable declaration and removed static
v12:
- re-work the set command to sweep
- include fixes to study.c and doc changes
v11:
- include comments from Eelco
- reworked set command as per discussion
v10:
- fix review comments Eelco
v9:
- fix review comments Flavio
v7:
- change the command paramters for core_id and study_pkt_cnt
v5:
- fix review comments(Ian, Flavio, Eelco)
- introduce pmd core id parameter
---
---
Documentation/topics/dpdk/bridge.rst | 38 +-
lib/dpif-netdev-extract-study.c | 27 -
lib/dpif-netdev-private-extract.h | 9 ++
lib/dpif-netdev.c | 173 ++-
4 files changed, 215 insertions(+), 32 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index a47153495..8c500c504 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -284,12 +284,46 @@ command also shows whether the CPU supports each 
implementation ::

An implementation can be selected manually by the following command ::

- $ ovs-appctl dpif-netdev/miniflow-parser-set study
+ $ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
+ [study_cnt]
+
+The above command has two optional parameters: study_cnt and core_id.
+The core_id sets a particular miniflow extract function to a specific
+pmd thread on the core. The third parameter study_cnt, which is specific
+to study and ignored by other implementations, means how many packets
+are needed to choose the best implementation.

Also user can select the study implementation which studies the traffic for
a specific number of packets by applying all available implementations of
miniflow extract and then chooses the one with the most optimal result for
-that traffic pattern.
+that traffic pattern. The user can optionally provide a packet count
+[study_cnt] parameter which is the minimum number of packets that OVS must
+study before choosing an optimal implementation. If no packet count is
+provided, then the default value, 128 is chosen. Also, as there is no
+synchronization point between threads, one PMD thread might still be running
+a previous round, and can now decide on earlier data.
+
+The per packet count is a global value, and parallel study executions with
+differing packet counts will use the most recent count value provided by user.
+
+Study can be selected with packet count by the following command ::
+
+ $ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
+
+Study can be selected with packet count and explicit PMD selection
+by the following command ::
+
+ $ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
+
+In the above command the first parameter is the CORE ID of the PMD
+thread and this can also be used to explicitly set the miniflow
+extraction function pointer on different PMD threads.
+
+Scalar can be selected on core 3 by the following command where
+study count should not be provided for any implementation other
+than study ::
+
+ $ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar

Miniflow Extract Validation
~~~
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
index 02b709f8b..7725c8f6e 100644
--- a/lib/dpif-netdev-extract-study.c
+++ b/lib/dpif-netdev-extract-study.c
@@ -25,7 +25,7 @@

VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);

-static atomic_uint32_t mfex_study_pkts_count = 0;
+static atomic_uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;

/* Struct to hold miniflow study stats. */
struct study_stats {
@@ 

Re: [ovs-dev] [PATCH v2 2/3] dpdk: Remove default values for socket-mem and limit.

2021-07-15 Thread Kevin Traynor
On 13/07/2021 20:15, Rosemarie O'Riorden wrote:
> This change removes the default values for EAL args socket-mem and
> socket-limit. As DPDK supports dynamic memory allocation, there is no
> need to allocate a certain amount of memory on start-up, nor limit the
> amount of memory available, if not requested.
> 
> Currently, socket-mem has a default value of 1024 when it is not
> configured by the user, and socket-limit takes on the value of socket-mem,
> 1024, by default. With this change, socket-mem is not configured by default,
> meaning that socket-limit is not either. Neither, either or both options can 
> be set.
> 
> Removed extra logs added in patch 1 that announce this change.
> 
> Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1949850
> Signed-off-by: Rosemarie O'Riorden 
> ---
> Removes logs added in patch 1 that were not in v1.
> Removes code added to lib/dpdk.c since v1 that conflicts with this patch 
> series.
> 
>  Documentation/intro/install/dpdk.rst |  5 +-
>  NEWS |  4 +-
>  lib/dpdk.c   | 74 +---
>  vswitchd/vswitch.xml | 16 +++---
>  4 files changed, 11 insertions(+), 88 deletions(-)
> 
> diff --git a/Documentation/intro/install/dpdk.rst 
> b/Documentation/intro/install/dpdk.rst
> index d8fa931fa..96843af73 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -290,9 +290,8 @@ listed below. Defaults will be provided for all values 
> not explicitly set.
>  
>  ``dpdk-socket-mem``
>Comma separated list of memory to pre-allocate from hugepages on specific
> -  sockets. If not specified, 1024 MB will be set for each numa node by
> -  default. This behavior will change with the 2.17 release, with no default
> -  value from OVS. Instead, DPDK default will be used.
> +  sockets. If not specified, this option will not be set by default. DPDK
> +  default will be used instead.
>  
>  ``dpdk-hugepage-dir``
>Directory where hugetlbfs is mounted
> diff --git a/NEWS b/NEWS
> index 126f5a927..948f68283 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -29,8 +29,8 @@ Post-v2.15.0
> Available only if DPDK experimantal APIs enabled during the build.
>   * Add hardware offload support for VXLAN flows (experimental).
> Available only if DPDK experimantal APIs enabled during the build.
> - * EAL options --socket-mem and --socket-limit to have default values
> -   removed with 2.17 release. Logging added to alert users.
> + * EAL option --socket-mem is no longer configured by default upon
> +   start-up.

Fine for now, but the NEWS entries in patches 2 and 3 will need to be
rebased onto the post-2.16 changes when the time is right.

> - ovsdb-tool:
>   * New option '--election-timer' to the 'create-cluster' command to set 
> the
> leader election timer during cluster creation.
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index ed57067ee..3a6990e2f 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -130,74 +130,12 @@ construct_dpdk_options(const struct smap 
> *ovs_other_config, struct svec *args)
>  }
>  }
>  
> -static int
> -compare_numa_node_list(const void *a_, const void *b_)
> -{
> -int a = *(const int *) a_;
> -int b = *(const int *) b_;
> -
> -if (a < b) {
> -return -1;
> -}
> -if (a > b) {
> -return 1;
> -}
> -return 0;
> -}
> -
> -static char *
> -construct_dpdk_socket_mem(void)
> -{
> -const char *def_value = "1024";
> -struct ds dpdk_socket_mem = DS_EMPTY_INITIALIZER;
> -
> -/* Build a list of all numa nodes with at least one core. */
> -struct ovs_numa_dump *dump = ovs_numa_dump_n_cores_per_numa(1);
> -size_t n_numa_nodes = hmap_count(>numas);
> -int *numa_node_list = xcalloc(n_numa_nodes, sizeof *numa_node_list);
> -
> -const struct ovs_numa_info_numa *node;
> -int k = 0, last_node = 0;
> -
> -FOR_EACH_NUMA_ON_DUMP(node, dump) {
> -if (k >= n_numa_nodes) {
> -break;
> -}
> -numa_node_list[k++] = node->numa_id;
> -}
> -qsort(numa_node_list, k, sizeof *numa_node_list, compare_numa_node_list);
> -
> -for (int i = 0; i < n_numa_nodes; i++) {
> -while (numa_node_list[i] > last_node &&
> -   numa_node_list[i] != OVS_NUMA_UNSPEC &&
> -   numa_node_list[i] <= MAX_NUMA_NODES) {
> -if (last_node == 0) {
> -ds_put_format(_socket_mem, "%s", "0");
> -} else {
> -ds_put_format(_socket_mem, ",%s", "0");
> -}
> -last_node++;
> -}
> -if (numa_node_list[i] == 0) {
> -ds_put_format(_socket_mem, "%s", def_value);
> -} else {
> -ds_put_format(_socket_mem, ",%s", def_value);
> -}
> -last_node++;
> -}

This code had a short stay :'|

> -free(numa_node_list);
> -ovs_numa_dump_destroy(dump);
> -return ds_cstr(_socket_mem);
> -}
> -
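To make the removed default concrete, here is a hedged Python re-sketch (invented names) of what construct_dpdk_socket_mem() computed: "1024" for every NUMA node that has at least one core, padding gaps in the node numbering with "0".

```python
# Illustration only; the removed code is the C function quoted above.
def default_socket_mem(numa_nodes_with_cores, per_node="1024"):
    vals = []
    next_node = 0
    for node in sorted(set(numa_nodes_with_cores)):
        while node > next_node:      # nodes without cores get "0"
            vals.append("0")
            next_node += 1
        vals.append(per_node)
        next_node += 1
    return ",".join(vals)

print(default_socket_mem([0, 1]))    # -> 1024,1024
print(default_socket_mem([0, 2]))    # -> 1024,0,1024
```

With the patch, no such string is built and --socket-mem is simply left to DPDK's dynamic allocation.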

Re: [ovs-dev] [PATCH ovn] northd: Fix defrag flows for duplicate vips

2021-07-15 Thread Dumitru Ceara
On 7/15/21 3:54 PM, Mark Gray wrote:
> On 15/07/2021 14:16, Mark Michelson wrote:
>> Hi Mark,

Hi Mark, Mark,

>>
>> I'm a bit curious about this change. Does the removal of the protocol 
>> from the match mean that traffic that is not of the protocol specified 
>> in the load balancer will be ct_dnat()'ed? Does that constitute 
>> unexpected behavior?
>>
> 
> Yes, this is the case. It's a tradeoff between the number of flows and
> recirculations, but thinking about it again, it may be better to have more
> flows. I will create a v2.
> 

Unless we match on proto *and* L4 port I don't think it's worth adding
per proto flows.  Assuming a TCP load balancer, all TCP traffic with
destination VIP will still be ct_dnat()'ed, even if the TCP destination
port is not the one defined in the load balancer VIP.

On the other hand, using the same VIP for multiple ports is probably a
common use case so if we add the L4 port to the match the number of
logical flows might increase significantly.

Regards,
Dumitru

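A toy Python illustration (hypothetical data, not OVN code) of the tradeoff discussed in this subthread: matching defrag flows on VIP only versus on (VIP, protocol, L4 port).

```python
# VIP -> (proto, port) pairs; the same VIP reused for several ports,
# which the thread calls a common use case.
vips = {
    "10.0.0.10": [("tcp", 80), ("tcp", 443), ("udp", 53)],
    "10.0.0.11": [("tcp", 8080)],
}

# Strategy 1: one defrag flow per VIP. Fewer flows, but all traffic
# to the VIP is ct_dnat()'ed regardless of proto/port.
per_vip_flows = {vip for vip in vips}

# Strategy 2: one flow per (VIP, proto, port). Precise match, but
# the flow count grows with every port defined on a VIP.
per_vip_proto_port_flows = {(vip, proto, port)
                            for vip, pps in vips.items()
                            for proto, port in pps}

print(len(per_vip_flows))              # -> 2
print(len(per_vip_proto_port_flows))   # -> 4
```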


Re: [ovs-dev] [PATCH v13 07/11] test/sytem-dpdk: Add unit test for mfex autovalidator

2021-07-15 Thread Eelco Chaudron
Added some missing negative test cases, see below.

On 15 Jul 2021, at 14:42, kumar Amber wrote:

> From: Kumar Amber 
>
> Tests:
>   6: OVS-DPDK - MFEX Autovalidator
>   7: OVS-DPDK - MFEX Autovalidator Fuzzy
>   8: OVS-DPDK - MFEX Configuration
>
> Added a new directory to store the PCAP file used
> in the tests and a script to generate the fuzzy traffic
> type pcap to be used in fuzzy unit test.
>
> Signed-off-by: Kumar Amber 
> Acked-by: Flavio Leitner 
>
> ---
> v13:
> - fix -v in the command
> - added the configuration test case and supporting doc update
> v12:
> - change skip parameter for unit test
> v11:
> - fix comments from Eelco
> v7:
> - fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - remove sleep from first test and added minor 5 sec sleep to fuzzy
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  56 
>  tests/.gitignore |   1 +
>  tests/automake.mk|   6 ++
>  tests/mfex_fuzzy.py  |  33 +++
>  tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
>  tests/system-dpdk.at | 129 +++
>  6 files changed, 225 insertions(+)
>  create mode 100755 tests/mfex_fuzzy.py
>  create mode 100644 tests/pcap/mfex_test.pcap
>
> diff --git a/Documentation/topics/dpdk/bridge.rst 
> b/Documentation/topics/dpdk/bridge.rst
> index 8c500c504..913b3e6f6 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -346,3 +346,59 @@ A compile time option is available in order to test it 
> with the OVS unit
>  test suite. Use the following configure option ::
>
>  $ ./configure --enable-mfex-default-autovalidator
> +
> +Unit Test Miniflow Extract
> +++
> +
> +Unit test can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator
> +
> +The unit test uses multiple traffic types to test the correctness of the
> +implementations.
> +
> +The MFEX commands can also be tested for negative and positive cases to
> +verify that the MFEX set command does not allow for incorrect parameters.
> +A user can directly run the following configuration test case in
> +tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Configuration
> +
> +Running Fuzzy test with Autovalidator
> ++
> +
> +Fuzzy tests can also be done on miniflow extract with the help of
> +auto-validator and Scapy. The steps below describe how to
> +reproduce the setup with IP being fuzzed to generate packets.
> +
> +Scapy is used to create fuzzy IP packets and save them into a PCAP ::
> +
> +pkt = fuzz(Ether()/IP()/TCP())
> +
> +Set the miniflow extract to autovalidator using ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> +
> +OVS is configured to receive the generated packets ::
> +
> +$ ovs-vsctl add-port br0 pcap0 -- \
> +set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
> +"rx_pcap=fuzzy.pcap"
> +
> +With this workflow, the autovalidator will ensure that all MFEX
> +implementations are classifying each packet in exactly the same way.
> +If an optimized MFEX implementation causes a different miniflow to be
> +generated, the autovalidator has ovs_assert and logging statements that
> +will inform about the issue.
> +
> +Unit Fuzzy test with Autovalidator
> ++
> +
> +Unit test can also be used to test the workflow mentioned above by running
> +the following test-case in tests/system-dpdk.at ::
> +
> +make check-dpdk TESTSUITEFLAGS='-k MFEX'
> +OVS-DPDK - MFEX Autovalidator Fuzzy
> diff --git a/tests/.gitignore b/tests/.gitignore
> index 45b4f67b2..a3d927e5d 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -11,6 +11,7 @@
>  /ovsdb-cluster-testsuite
>  /ovsdb-cluster-testsuite.dir/
>  /ovsdb-cluster-testsuite.log
> +/pcap/
>  /pki/
>  /system-afxdp-testsuite
>  /system-afxdp-testsuite.dir/
> diff --git a/tests/automake.mk b/tests/automake.mk
> index f45f8d76c..a6c15ba55 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: 
> tests/automake.mk
>   echo "TEST_FUZZ_REGRESSION([$$basename])"; \
>   done > $@.tmp && mv $@.tmp $@
>
> +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
> +MFEX_AUTOVALIDATOR_TESTS = \
> + tests/pcap/mfex_test.pcap \
> + tests/mfex_fuzzy.py
> +
>  OVSDB_CLUSTER_TESTSUITE_AT = \
>   tests/ovsdb-cluster-testsuite.at \
>   tests/ovsdb-execution.at \
> @@ -512,6 +517,7 @@ tests_test_type_props_SOURCES = tests/test-type-props.c
>  CHECK_PYFILES = \
>   tests/appctl.py \
>   tests/flowgen.py \
> + tests/mfex_fuzzy.py \
>   tests/ovsdb-monitor-sort.py \
>   tests/test-daemon.py \
> 
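The quoted doc builds the fuzzy pcap with Scapy (`pkt = fuzz(Ether()/IP()/TCP())` written out with `wrpcap()`). As a stdlib-only stand-in showing the same idea, the sketch below (all names invented) writes a classic little-endian pcap whose IP and TCP header fields are randomized; checksums are left zero, which is fine for fuzzing input.

```python
import random
import struct

def fuzzy_tcp_packet(rng):
    # Ethernet: zeroed MACs + EtherType IPv4.
    eth = bytes(12) + struct.pack(">H", 0x0800)
    ip = struct.pack(">BBHHHBBH4s4s",
                     0x45,                    # version 4, IHL 5
                     rng.randrange(256),      # fuzzed TOS
                     40,                      # total length (IP + TCP)
                     rng.randrange(65536),    # fuzzed IP id
                     0, 64, 6, 0,             # frag, TTL, proto=TCP, csum 0
                     rng.randbytes(4),        # fuzzed src IP
                     rng.randbytes(4))        # fuzzed dst IP
    tcp = struct.pack(">HHIIBBHHH",
                      rng.randrange(65536),   # fuzzed src port
                      rng.randrange(65536),   # fuzzed dst port
                      rng.randrange(2 ** 32), # fuzzed seq
                      0, 0x50, 0x02,          # ack, data offset 5, SYN
                      8192, 0, 0)             # window, csum 0, urgent
    return eth + ip + tcp

def write_pcap(path, pkts):
    # Global header: magic, v2.4, tz, sigfigs, snaplen, LINKTYPE_EN10MB.
    hdr = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    with open(path, "wb") as f:
        f.write(hdr)
        for p in pkts:
            # Per-record header: ts_sec, ts_usec, incl_len, orig_len.
            f.write(struct.pack("<IIII", 0, 0, len(p), len(p)))
            f.write(p)

rng = random.Random(0)
write_pcap("fuzzy.pcap", [fuzzy_tcp_packet(rng) for _ in range(2000)])
```

The resulting file can be fed to the net_pcap vdev exactly like the Scapy-generated one in the quoted workflow.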

Re: [ovs-dev] [PATCH v13 06/11] dpif-netdev: Add packet count and core id paramters for study

2021-07-15 Thread Eelco Chaudron


Some minor changes in output, maybe they can be done during the commit?

On 15 Jul 2021, at 14:42, kumar Amber wrote:


From: Kumar Amber 

This commit introduces an additional command line parameter
for the mfex study function. If the user provides a packet count,
study uses it as the minimum number of packets that must be
processed before choosing an implementation; otherwise a default
value is chosen.
Also introduces a third parameter for choosing a particular pmd core.

$ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3

Signed-off-by: Kumar Amber 

---
v13:
- reworked the set command as per discussion
- fixed the atomic set in study
- added bool for handling study mfex to simplify logic and command output
- fixed double space in variable declaration and removed static
v12:
- re-work the set command to sweep
- include fixes to study.c and doc changes
v11:
- include comments from Eelco
- reworked set command as per discussion
v10:
- fix review comments Eelco
v9:
- fix review comments Flavio
v7:
- change the command paramters for core_id and study_pkt_cnt
v5:
- fix review comments(Ian, Flavio, Eelco)
- introduce pmd core id parameter
---
---
 Documentation/topics/dpdk/bridge.rst |  38 +-
 lib/dpif-netdev-extract-study.c  |  27 -
 lib/dpif-netdev-private-extract.h|   9 ++
 lib/dpif-netdev.c| 173 ++-
 4 files changed, 215 insertions(+), 32 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst

index a47153495..8c500c504 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -284,12 +284,46 @@ command also shows whether the CPU supports each 
implementation ::


 An implementation can be selected manually by the following command 
::


-$ ovs-appctl dpif-netdev/miniflow-parser-set study
+$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] 
[name]

+ [study_cnt]
+
+The above command has two optional parameters: study_cnt and core_id.
+The core_id sets a particular miniflow extract function to a specific
+pmd thread on the core. The third parameter study_cnt, which is specific
+to study and ignored by other implementations, means how many packets
+are needed to choose the best implementation.

 Also user can select the study implementation which studies the traffic for
 a specific number of packets by applying all available implementations of
 miniflow extract and then chooses the one with the most optimal result for
-that traffic pattern.
+that traffic pattern. The user can optionally provide a packet count
+[study_cnt] parameter which is the minimum number of packets that OVS must
+study before choosing an optimal implementation. If no packet count is
+provided, then the default value, 128, is chosen. Also, as there is no
+synchronization point between threads, one PMD thread might still be running
+a previous round, and can now decide on earlier data.
+
+The per packet count is a global value, and parallel study executions with
+differing packet counts will use the most recent count value provided by user.
+
+Study can be selected with packet count by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
+
+Study can be selected with packet count and explicit PMD selection
+by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
+
+In the above command the first parameter is the CORE ID of the PMD
+thread and this can also be used to explicitly set the miniflow
+extraction function pointer on different PMD threads.
+
+Scalar can be selected on core 3 by the following command where
+study count should not be provided for any implementation other
+than study ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar

 Miniflow Extract Validation
 ~~~
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c

index 02b709f8b..7725c8f6e 100644
--- a/lib/dpif-netdev-extract-study.c
+++ b/lib/dpif-netdev-extract-study.c
@@ -25,7 +25,7 @@

 VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);

-static atomic_uint32_t mfex_study_pkts_count = 0;
+static atomic_uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;

 /* Struct to hold miniflow study stats. */
 struct study_stats {
@@ -48,6 +48,26 @@ mfex_study_get_study_stats_ptr(void)
 return stats;
 }

+int
+mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
+{
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* If the packet count is set and implementation called is study then
+ * set packet counter to requested number or set the packet counter
+ * to default number else return -EINVAL.
+ */
+if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) &&
+(pkt_cmp_count != 0)) {
+
+atomic_store_relaxed(&mfex_study_pkts_count,
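The hunk above is cut off mid-statement by the archive. For readers, here is a minimal self-contained sketch of what such a setter does, using standard C11 atomics in place of OVS's atomic_store_relaxed() wrapper. The default constant, the "study" name check, and the -EINVAL contract are taken from the visible context; the exact control flow is an assumption, not the patch's literal code.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Assumed default; the documentation hunk above says 128 packets. */
#define MFEX_MAX_PKT_COUNT 128

static _Atomic uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;

/* Sketch of the setter: accept a packet count only for the "study"
 * implementation; a count of 0 restores the default; a non-zero count
 * given with any other implementation name is rejected with -EINVAL. */
static int
mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
{
    if (strcmp(name, "study") == 0) {
        atomic_store_explicit(&mfex_study_pkts_count,
                              pkt_cmp_count ? pkt_cmp_count
                                            : MFEX_MAX_PKT_COUNT,
                              memory_order_relaxed);
        return 0;
    }
    return pkt_cmp_count ? -EINVAL : 0;
}
```

A relaxed store is enough here because, as the documentation text notes, there is no synchronization point between PMD threads anyway: a thread mid-round may finish on the old count.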

Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

2021-07-15 Thread Eli Britstein via dev


On 7/15/2021 4:35 PM, Ferriter, Cian wrote:

External email: Use caution opening links or attachments



-Original Message-
From: Eli Britstein 
Sent: Wednesday 14 July 2021 16:21
To: Ferriter, Cian ; Ilya Maximets 
; Gaëtan Rivet
; d...@openvswitch.org; Van Haaren, Harry 

Cc: Majd Dibbiny ; Stokes, Ian ; Flavio 
Leitner

Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache


On 7/14/2021 5:58 PM, Ferriter, Cian wrote:

External email: Use caution opening links or attachments



-Original Message-
From: Ilya Maximets 
Sent: Friday 9 July 2021 21:53
To: Ferriter, Cian ; Gaëtan Rivet ; 
Eli Britstein
; d...@openvswitch.org; Van Haaren, Harry 

Cc: Majd Dibbiny ; Ilya Maximets ; Stokes, 
Ian
; Flavio Leitner 
Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

On 7/8/21 6:43 PM, Ferriter, Cian wrote:

Hi Gaetan, Eli and all,

Thanks for the patch and the info on how it affects performance in your case. I 
just wanted to

post

the performance we are seeing.

I've posted the numbers inline. Please note, I'll be away on leave till Tuesday.
Thanks,
Cian


-Original Message-
From: Gaëtan Rivet 
Sent: Wednesday 7 July 2021 17:36
To: Eli Britstein ;  
; Van Haaren,

Harry

; Ferriter, Cian 
Cc: Majd Dibbiny ; Ilya Maximets 
Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

On Wed, Jul 7, 2021, at 17:05, Eli Britstein wrote:

Port numbers are usually small. Maintain an array of netdev handles indexed
by port numbers. It accelerates looking up for them for
netdev_hw_miss_packet_recover().

Reported-by: Cian Ferriter 
Signed-off-by: Eli Britstein 
Reviewed-by: Gaetan Rivet 
---




___
dev mailing list
d...@openvswitch.org


https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Hello,

I tested the performance impact of this patch with a partial offload setup.
As reported by pmd-stats-show, in average cycles per packet:

Before vxlan-decap: 525 c/p
After vxlan-decap: 542 c/p
After this fix: 530 c/p

Without those fixes, vxlan-decap has a 3.2% negative impact on cycles,
with the fixes, the impact is reduced to 0.95%.

As I had to force partial offloads for our hardware, it would be better
with an outside confirmation on a proper setup.

Kind regards,
--
Gaetan Rivet

I'm showing the performance relative to what we measured on OVS master directly before the VXLAN
HWOL changes went in. All of the below results are using the scalar DPIF and partial HWOL.

Link to "Fixup patches":

http://patchwork.ozlabs.org/project/openvswitch/list/?series=252356

Master before VXLAN HWOL changes (f0e4a73)
1.000x

Latest master after VXLAN HWOL changes (b780911)
0.918x (-8.2%)

After fixup patches on OVS ML are applied (with ALLOW_EXPERIMENTAL_API=off)
0.973x (-2.7%)

After fixup patches on OVS ML are applied and after ALLOW_EXPERIMENTAL_API is removed.
0.938x (-6.2%)

I ran the last set of results by applying the below diff. I did this because I'm assuming the plan is to remove the ALLOW_EXPERIMENTAL_API '#ifdef's at some point?

Yes, that is the plan.


Thanks for confirming this.


And thanks for testing, Gaetan and Cian!

Could you also provide more details on your test environment,
so someone else can reproduce?


Good idea, I'll add the details inline below. These details apply to the performance measured previously by me, and the performance in this mail.

What is important to know:
- Test configuration: P2P, V2V, PVP, etc.

P2P
1 PHY port
1 RXQ


- Test type: max. throughput, zero packet loss.

Max throughput.


- OVS config: EMC, SMC, HWOL, AVX512 - on/off/type

In all tests, all packets hit a single datapath flow with "offloaded:partial". So all packets are partially offloaded, skipping miniflow_extract() and EMC/SMC/DPCLS lookups.

AVX512 is off.


- Installed OF rules.

$ $OVS_DIR/utilities/ovs-ofctl dump-flows br0
   cookie=0x0, duration=253.691s, table=0, n_packets=2993867136, n_bytes=179632028160, in_port=phy0 actions=IN_PORT

- Traffic pattern: Packet size, number of flows, packet type.

64B, 1 flow, ETH/IP packets.


These tests also didn't include the fix from Balazs, IIUC, because
they were performed a bit before that patch got accepted.
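For anyone trying to reproduce the numbers above, the configuration Cian describes maps to roughly the following commands. The bridge name, PCI address, and DPDK options are illustrative placeholders, not taken from the thread:

```shell
# Single-phy-port P2P loopback: one DPDK port, one RX queue.
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 phy0 -- set interface phy0 type=dpdk \
    options:dpdk-devargs=0000:01:00.0 options:n_rxq=1

# The single OpenFlow rule from the dump-flows output above:
# hairpin every packet back out the port it arrived on.
ovs-ofctl add-flow br0 "in_port=phy0 actions=IN_PORT"
```

With a 64B single-flow stream and partial offload enabled, every packet should then hit the one datapath flow marked "offloaded:partial".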



Re: [ovs-dev] [PATCH ovn] northd: Fix defrag flows for duplicate vips

2021-07-15 Thread Mark Gray
On 15/07/2021 14:16, Mark Michelson wrote:
> Hi Mark,
> 
> I'm a bit curious about this change. Does the removal of the protocol 
> from the match mean that traffic that is not of the protocol specified 
> in the load balancer will be ct_dnat()'ed? Does that constitute 
> unexpected behavior?
> 

Yes, this is the case. It's a tradeoff between the number of flows and
recirculations but, thinking about it again, it may be better to have more
flows. I will create a v2.

> On 7/15/21 8:14 AM, Mark Gray wrote:
>> When adding two SB flows with the same vip but different protocols, only
>> the most recent flow will be added due to the `if` statement:
>>
>>  if (!sset_contains(_ips, lb_vip->vip_str)) {
>>  sset_add(_ips, lb_vip->vip_str);
>>
>> This can cause unexpected behaviour when two load balancers with
>> the same VIP (and different protocols) are added to a logical router.
>>
>> This is due to the addition of "protocol" to the match in
>> defrag table flows in a previous commit. Revert that change.
>>
>> This bug was discovered through the OVN CI (ovn-kubernetes.yml).
>>
>> Fixes: 384a7c6237da ("northd: Refactor Logical Flows for routers with 
>> DNAT/Load Balancers")
>> Signed-off-by: Mark Gray 
>> ---
>>   northd/ovn-northd.c  |  8 
>>   northd/ovn_northd.dl |  9 +
>>   tests/ovn-northd.at  | 46 ++--
>>   3 files changed, 24 insertions(+), 39 deletions(-)
>>
>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>> index 999c3f482c29..5fab62c0fcf7 100644
>> --- a/northd/ovn-northd.c
>> +++ b/northd/ovn-northd.c
>> @@ -9219,11 +9219,6 @@ build_lrouter_lb_flows(struct hmap *lflows, struct 
>> ovn_datapath *od,
>>   for (size_t j = 0; j < lb->n_vips; j++) {
>>   struct ovn_lb_vip *lb_vip = >vips[j];
>>   
>> -bool is_udp = nullable_string_is_equal(nb_lb->protocol, "udp");
>> -bool is_sctp = nullable_string_is_equal(nb_lb->protocol,
>> -"sctp");
>> -const char *proto = is_udp ? "udp" : is_sctp ? "sctp" : "tcp";
>> -
>>   struct ds defrag_actions = DS_EMPTY_INITIALIZER;
>>   if (!sset_contains(_ips, lb_vip->vip_str)) {
>>   sset_add(_ips, lb_vip->vip_str);
>> @@ -9249,9 +9244,6 @@ build_lrouter_lb_flows(struct hmap *lflows, struct 
>> ovn_datapath *od,
>> lb_vip->vip_str);
>>   }
>>   
>> -if (lb_vip->vip_port) {
>> -ds_put_format(match, " && %s", proto);
>> -}
>>   ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG,
>>   100, ds_cstr(match),
>>   ds_cstr(_actions),
>> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
>> index ceeabe6f384e..b37da86f76aa 100644
>> --- a/northd/ovn_northd.dl
>> +++ b/northd/ovn_northd.dl
>> @@ -6167,14 +6167,7 @@ for (RouterLBVIP(
>>*pick a DNAT ip address from a group.
>>* 2. If there are L4 ports in load balancing rules, we
>>*need the defragmentation to match on L4 ports. */
>> -var match1 = "ip && ${ipX}.dst == ${ip_address}" in
>> -var match2 =
>> -if (port != 0) {
>> -" && ${proto}"
>> -} else {
>> -""
>> -} in
>> -var __match = match1 ++ match2 in
>> +var __match = "ip && ${ipX}.dst == ${ip_address}" in
>>   var xx = ip_address.xxreg() in
>>   var __actions = "${xx}${rEG_NEXT_HOP()} = ${ip_address}; ct_dnat;" 
>> in
>>   /* One of these flows must be created for each unique LB VIP 
>> address.
>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>> index 11461d3f4c2a..072616898d63 100644
>> --- a/tests/ovn-northd.at
>> +++ b/tests/ovn-northd.at
>> @@ -3167,7 +3167,7 @@ AT_CHECK([grep "lr_in_unsnat" lr0flows | sort], [0], 
>> [dnl
>>   
>>   AT_CHECK([grep "lr_in_defrag" lr0flows | sort], [0], [dnl
>> table=5 (lr_in_defrag   ), priority=0, match=(1), action=(next;)
>> -  table=5 (lr_in_defrag   ), priority=100  , match=(ip && ip4.dst == 
>> 10.0.0.10 && tcp), action=(reg0 = 10.0.0.10; ct_dnat;)
>> +  table=5 (lr_in_defrag   ), priority=100  , match=(ip && ip4.dst == 
>> 10.0.0.10), action=(reg0 = 10.0.0.10; ct_dnat;)
>>   ])
>>   
>>   AT_CHECK([grep "lr_in_dnat" lr0flows | sort], [0], [dnl
>> @@ -3200,7 +3200,7 @@ AT_CHECK([grep "lr_in_unsnat" lr0flows | sort], [0], 
>> [dnl
>>   
>>   AT_CHECK([grep "lr_in_defrag" lr0flows | sort], [0], [dnl
>> table=5 (lr_in_defrag   ), priority=0, match=(1), action=(next;)
>> -  table=5 (lr_in_defrag   ), priority=100  , match=(ip && ip4.dst == 
>> 10.0.0.10 && tcp), action=(reg0 = 10.0.0.10; ct_dnat;)
>> +  table=5 (lr_in_defrag   ), priority=100  , match=(ip && ip4.dst == 
>> 10.0.0.10), 

Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.

2021-07-15 Thread Ferriter, Cian



> -Original Message-
> From: Flavio Leitner 
> Sent: Friday 9 July 2021 18:54
> To: Ferriter, Cian 
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD 
> statistic.
> 
> 
> 
> Hi,
> 
> After rebasing, the performance of branch master boosted in my env
> from 12Mpps to 13Mpps. However, this specific patch brings down
> to 12Mpps. I am using dpif_scalar and generic lookup (no AVX512).
> 

Thanks for the investigation. Always great seeing perf numbers and details!

I just want to check my understanding here with what you're seeing:

Performance before DPIF patchset
12Mpps

Performance at this patch
12Mpps

Performance after DPIF patchset
13Mpps

So the performance recovers somewhere else in the patchset?

I've checked the performance behaviour in my case. I'm going to report relative 
performance numbers. They are relative to master branch before AVX512 DPIF was 
applied (c36c8e3).
I tried to run a similar testcase, I can see you are using EMC from the memcmp 
in perf top output. I am also using the scalar DPIF in all the below testcases.

Master before AVX512 DPIF (c36c8e3)
1.000x (0.0%)
DPIF patch 3 - dpif-avx512: Add ISA implementation of dpif.
1.010x (1.0%)
DPIF patch 4 - dpif-netdev: Add command to switch dpif implementation.
1.042x (4.2%)
DPIF patch 5 - dpif-netdev: Add command to get dpif implementations.
1.063x (6.3%)
DPIF patch 6 - dpif-netdev: Add a partial HWOL PMD statistic.
1.069x (6.9%)
Latest master which has AVX512 DPIF patches (d2e9703)
1.075x (7.5%)
Master before AVX512 DPIF (c36c8e3), with prefetch change
0.983x (-1.7%)
Latest master which has AVX512 DPIF patches (d2e9703), with prefetch change
1.080x (8.0%)

> (I don't think this report should block the patch because the
> counter are interesting and the analysis below doesn't point
> directly to the proposed changes.)
> 
> This is a diff using all patches applied versus this patch reverted:
> 21.44% +6.08%  ovs-vswitchd[.] miniflow_extract
>  8.94% -1.92%  libc-2.28.so[.] __memcmp_avx2_movbe
> 14.62% +1.44%  ovs-vswitchd[.] dp_netdev_input__
>  2.80% -1.08%  ovs-vswitchd[.] 
> dp_netdev_pmd_flush_output_on_port
>  3.44% -0.91%  ovs-vswitchd[.] netdev_send
> 
> This is the code side by side, patch applied on the right side:
> (sorry, long lines)
> 

My mail client has wrapped the below lines, sorry for mangling the output!


Please find it here:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385448.html

> 
> 
> I don't see any relevant optimization difference in the code
> above, but the "mov %r15w,-0x2(%r13)" on the right side accounts
> for almost all the difference, though on the left side it seems
> a bit more spread.
> 
> I applied the patch below and it helped to get to 12.7Mpps, so
> almost at the same levels. I wonder if you see the same result.
> 

Since I don't see the drop that you see with this patch, when I apply the below 
patch to the latest master, I see a smaller benefit.
The relative performance after adding the below prefetch compared to before 
(latest master):
1.005x (0.5%)

When I compare before/after performance (including the prefetch code, on latest 
master), the overall performance difference is 0.5% here.

> diff --git a/lib/flow.c b/lib/flow.c
> index 729d59b1b..4572e356b 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -746,6 +746,9 @@ miniflow_extract(struct dp_packet *packet, struct 
> miniflow *dst)
>  uint8_t *ct_nw_proto_p = NULL;
>  ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
> 
> +/* dltype will be updated later. */
> +OVS_PREFETCH_WRITE(miniflow_pointer(mf, dl_type));
> +
>  /* Metadata. */
>  if (flow_tnl_dst_is_set(>tunnel)) {
>  miniflow_push_words(mf, tunnel, >tunnel,
> 
> 
> fbl
> 



Thanks,
Cian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
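The prefetch idea Flavio's diff above explores — issuing a write hint for the miniflow's `dl_type` slot before the rest of extraction runs — can be illustrated in isolation. This sketch uses the GCC/Clang `__builtin_prefetch` directly; treat the exact mapping to OVS's `OVS_PREFETCH_WRITE` macro, and the toy structure, as assumptions:

```c
#include <stdint.h>

/* Rough stand-in for OVS_PREFETCH_WRITE(): prefetch for write (arg 1),
 * high temporal locality (arg 3). */
#define PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1, 3)

/* Toy structure standing in for struct miniflow: one field written late
 * (dl_type) after a run of earlier stores (words). */
struct miniflow_like {
    uint16_t dl_type;
    uint64_t words[8];
};

static uint16_t
extract_like(struct miniflow_like *mf)
{
    /* Issue the write hint early, before unrelated work, so the cache
     * line is already owned when the late dl_type store lands. */
    PREFETCH_WRITE(&mf->dl_type);

    for (int i = 0; i < 8; i++) {
        mf->words[i] = (uint64_t) i;   /* stands in for miniflow_push_words() */
    }
    mf->dl_type = 0x0800;              /* the late store the hint targets */
    return mf->dl_type;
}
```

The prefetch is purely a performance hint: it changes no observable behavior, which is why its benefit can only be judged by measurements like the Mpps numbers traded in this thread.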


Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

2021-07-15 Thread Ferriter, Cian


> -Original Message-
> From: Eli Britstein 
> Sent: Wednesday 14 July 2021 16:21
> To: Ferriter, Cian ; Ilya Maximets 
> ; Gaëtan Rivet
> ; d...@openvswitch.org; Van Haaren, Harry 
> 
> Cc: Majd Dibbiny ; Stokes, Ian ; 
> Flavio Leitner
> 
> Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache
> 
> 
> On 7/14/2021 5:58 PM, Ferriter, Cian wrote:
> > External email: Use caution opening links or attachments
> >
> >
> >> -Original Message-
> >> From: Ilya Maximets 
> >> Sent: Friday 9 July 2021 21:53
> >> To: Ferriter, Cian ; Gaëtan Rivet 
> >> ; Eli Britstein
> >> ; d...@openvswitch.org; Van Haaren, Harry 
> >> 
> >> Cc: Majd Dibbiny ; Ilya Maximets ; 
> >> Stokes, Ian
> >> ; Flavio Leitner 
> >> Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array 
> >> cache
> >>
> >> On 7/8/21 6:43 PM, Ferriter, Cian wrote:
> >>> Hi Gaetan, Eli and all,
> >>>
> >>> Thanks for the patch and the info on how it affects performance in your 
> >>> case. I just wanted to
> post
> >> the performance we are seeing.
> >>> I've posted the numbers inline. Please note, I'll be away on leave till 
> >>> Tuesday.
> >>> Thanks,
> >>> Cian
> >>>
>  -Original Message-
>  From: Gaëtan Rivet 
>  Sent: Wednesday 7 July 2021 17:36
>  To: Eli Britstein ;  
>  ; Van Haaren,
> >> Harry
>  ; Ferriter, Cian 
>  Cc: Majd Dibbiny ; Ilya Maximets 
>  Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array 
>  cache
> 
>  On Wed, Jul 7, 2021, at 17:05, Eli Britstein wrote:
> > Port numbers are usually small. Maintain an array of netdev handles 
> > indexed
> > by port numbers. It accelerates looking up for them for
> > netdev_hw_miss_packet_recover().
> >
> > Reported-by: Cian Ferriter 
> > Signed-off-by: Eli Britstein 
> > Reviewed-by: Gaetan Rivet 
> > ---
> >>> 
> >>>
> > ___
> > dev mailing list
> > d...@openvswitch.org
> >
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>  Hello,
> 
>  I tested the performance impact of this patch with a partial offload 
>  setup.
>  As reported by pmd-stats-show, in average cycles per packet:
> 
>  Before vxlan-decap: 525 c/p
>  After vxlan-decap: 542 c/p
>  After this fix: 530 c/p
> 
>  Without those fixes, vxlan-decap has a 3.2% negative impact on cycles,
>  with the fixes, the impact is reduced to 0.95%.
> 
>  As I had to force partial offloads for our hardware, it would be better
>  with an outside confirmation on a proper setup.
> 
>  Kind regards,
>  --
>  Gaetan Rivet
> >>> I'm showing the performance relative to what we measured on OVS master 
> >>> directly before the VXLAN
> >> HWOL changes went in. All of the below results are using the scalar DPIF 
> >> and partial HWOL.
> >>> Link to "Fixup patches":
> http://patchwork.ozlabs.org/project/openvswitch/list/?series=252356
> >>>
> >>> Master before VXLAN HWOL changes (f0e4a73)
> >>> 1.000x
> >>>
> >>> Latest master after VXLAN HWOL changes (b780911)
> >>> 0.918x (-8.2%)
> >>>
> >>> After fixup patches on OVS ML are applied (with 
> >>> ALLOW_EXPERIMENTAL_API=off)
> >>> 0.973x (-2.7%)
> >>>
> >>> After fixup patches on OVS ML are applied and after 
> >>> ALLOW_EXPERIMENTAL_API is removed.
> >>> 0.938x (-6.2%)
> >>>
> >>> I ran the last set of results by applying the below diff. I did this 
> >>> because I'm assuming the plan
> >> is to remove the ALLOW_EXPERIMENTAL_API '#ifdef's at some point?
> >>
> >> Yes, that is the plan.
> >>
> > Thanks for confirming this.
> >
> >> And thanks for testing, Gaetan and Cian!
> >>
> >> Could you also provide more details on your test environment,
> >> so someone else can reproduce?
> >>
> > Good idea, I'll add the details inline below. These details apply to the 
> > performance measured
> previously by me, and the performance in this mail.
> >
> >> What is important to know:
> >> - Test configuration: P2P, V2V, PVP, etc.
> >
> > P2P
> > 1 PHY port
> > 1 RXQ
> >
> >> - Test type: max. throughput, zero packet loss.
> > Max throughput.
> >
> >> - OVS config: EMC, SMC, HWOL, 

Re: [ovs-dev] [PATCH v3 0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

2021-07-15 Thread Dumitru Ceara
Hi Ilya,

On 7/14/21 6:52 PM, Ilya Maximets wrote:
> On 7/14/21 3:50 PM, Ilya Maximets wrote:
>> Replication can be used to scale out read-only access to the database.
>> But there are clients that are not read-only, but read-mostly.
>> One of the main examples is ovn-controller that mostly monitors
>> updates from the Southbound DB, but needs to claim ports by sending
>> transactions that changes some database tables.
>>
>> Southbound database serves lots of connections: all connections
>> from ovn-controllers and some service connections from cloud
>> infrastructure, e.g. some OpenStack agents are monitoring updates.
>> At a high scale and with a big size of the database ovsdb-server
>> spends too much time processing monitor updates and it's required
>> to move this load somewhere else.  This patch-set aims to introduce
>> required functionality to scale out read-mostly connections by
>> introducing a new OVSDB 'relay' service model.
>>
>> In this new service model ovsdb-server connects to existing OVSDB
>> server and maintains in-memory copy of the database.  It serves
>> read-only transactions and monitor requests by its own, but forwards
>> write transactions to the relay source.
>>
>> Key differences from the active-backup replication:
>> - support for "write" transactions.
>> - no on-disk storage. (probably, faster operation)
>> - support for multiple remotes (connect to the clustered db).
>> - doesn't try to keep connection as long as possible, but
>>   faster reconnects to other remotes to avoid missing updates.
>> - No need to know the complete database schema beforehand,
>>   only the schema name.
>> - can be used along with other standalone and clustered databases
>>   by the same ovsdb-server process. (doesn't turn the whole
>>   jsonrpc server to read-only mode)
>> - supports modern version of monitors (monitor_cond_since),
>>   because based on ovsdb-cs.
>> - could be chained, i.e. multiple relays could be connected
>>   one to another in a row or in a tree-like form.
>>
>> Bringing all above functionality to the existing active-backup
>> replication doesn't look right as it will make it less reliable
>> for the actual backup use case, and this also would be much
>> harder from the implementation point of view, because current
>> replication code is not based on ovsdb-cs or idl and all the required
>> features would be likely duplicated or replication would be fully
>> re-written on top of ovsdb-cs with severe modifications of the former.
>>
>> Relay is somewhere in the middle between active-backup replication and
>> the clustered model taking a lot from both, therefore is hard to
>> implement on top of any of them.
>>
>> To run ovsdb-server in relay mode, the user simply needs to run:
>>
>>   ovsdb-server --remote=punix:db.sock relay:<DB_NAME>:<relay source>
>>
>> e.g.
>>
>>   ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
>>
>> More details and examples in the documentation in the last patch
>> of the series.
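The chaining mentioned above (relays connected one to another in a row or tree) can be sketched as follows. Only the `relay:OVN_Southbound:tcp:127.0.0.1:6642` form is confirmed by the cover letter; the comma-separated multi-remote syntax, socket paths, and ports here are illustrative assumptions:

```shell
# First-level relay: follows the Southbound cluster directly
# (multiple remotes, so it can fail over between cluster members).
ovsdb-server --remote=punix:relay1.sock \
    relay:OVN_Southbound:tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642

# Second-level relay: chained off the first relay instead of the
# cluster, so clients pointed here never load the main servers.
ovsdb-server --remote=ptcp:16642 \
    relay:OVN_Southbound:unix:relay1.sock
```

Read-only transactions and monitors are answered by the relay a client connects to; writes are forwarded up the chain to the actual database.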
>>
>> I actually tried to implement transaction forwarding on top of
>> active-backup replication in v1 of this seies, but it required
>> a lot of tricky changes, including schema format changes in order
>> to bring required information to the end clients, so I decided
>> to fully rewrite the functionality in v2 with a different approach.
>>
>>
>>  Testing
>>  ===
>>
>> Some scale tests were performed with OVSDB Relays that mimics OVN
>> workloads with ovn-kubernetes.
>> Tests performed with ovn-heater (https://github.com/dceara/ovn-heater)
>> on scenario ocp-120-density-heavy:
>>  
>> https://github.com/dceara/ovn-heater/blob/master/test-scenarios/ocp-120-density-heavy.yml
>> In short, the test gradually creates a lot of OVN resources and
>> checks that the network is configured correctly (by pinging different
>> namespaces).  The test includes 120 chassis (created by
>> ovn-fake-multinode), 31250 LSPs spread evenly across 120 LSes, 3 LBs
>> with 15625 VIPs each, attached to all node LSes, etc.  Test performed
>> with monitor-all=true.
>>
>> Note 1:
>>  - Memory consumption is checked at the end of a test in a following
>>way: 1) check RSS 2) compact database 3) check RSS again.
>>It's observed that ovn-controllers in this test are fairly slow
>>and backlog builds up on monitors, because ovn-controllers are
>>not able to receive updates fast enough.  This contributes to
>>RSS of the process, especially in combination of glibc bug (glibc
>>doesn't free fastbins back to the system).  Memory trimming on
>>compaction is enabled in the test, so after compaction we can
>>see more or less real value of the RSS at the end of the test
>>without backlog noise. (Compaction on relay in this case is
>>just plain malloc_trim()).
>>
>> Note 2:
>>  - I didn't collect memory consumption (RSS) after compaction for a
>>test with 10 relays, because I got the idea only after the test
>>was finished and another one already started.  And run takes
>>
