Re: [ovs-dev] Question about moving the uuidset implementation from OVN to OVS

2022-09-16 Thread Ben Pfaff
On Fri, Sep 16, 2022 at 5:30 AM Dumitru Ceara  wrote:
> I'd like to post an OVS patch to add this along with some more helpers
> (see below) but I have a couple of questions:
>
> 1. What's the correct/desired way of maintaining authorship in this
> case?  I can keep all original authors and add my own sign-off too,
> but is that ok?

Seems OK to me.

> 2. It seems we're missing a copyright notice in ovn-northd-ddlog.c.  Would it
> be ok if I add the following?
>
>  * Copyright (c) 2020, 2022 VMware, Inc.
>  * Copyright (c) 2022 Red Hat, Inc.

That also seems OK to me.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/2] Patches to branch for 3.0.

2022-07-15 Thread Ben Pfaff
On Fri, Jul 15, 2022 at 6:10 AM Ilya Maximets  wrote:
> Hey!  It's been about 9 years since the last major version change.
> And there were lots of large and important changes in OVS in that
> long time frame.  Nevertheless, removal of the kernel datapath
> sources from the OVS's git tree seems like a notable enough point
> in time to mark it with a version bump.

I support the bump to 3.0! Great idea!


Re: [ovs-dev] [PATCH 0/6] Remove OVS kernel driver

2022-07-08 Thread Ben Pfaff
On Fri, Jul 8, 2022 at 12:00 PM Ilya Maximets  wrote:

> On 7/8/22 17:37, Ben Pfaff wrote:
> > On Fri, Jul 8, 2022 at 1:53 AM David Marchand  wrote:
> >
> > On Thu, Jul 7, 2022 at 11:21 PM Gregory Rose  wrote:
> > > > xenserver/openvswitch-xen.spec.in
> > >
> > > This is a bit of an issue.  Does anyone even use xenserver anymore?
> > > All of the documentation and build instructions are really old and
> > > I wonder if they work anymore.  I have no xenserver build environment
> > > to test any changes.
> >
> > Same for me.
> >
> > Ben, Ilya, do you know if this xenserver packaging still has users?
> > The last comment about it was in 2017.
> > fefb757ce408 ("debian, xenserver: Update logrotate config to match
> RHEL.")
> >
> > Should we keep on updating it? or can we simply drop support for
> > xenserver packaging?
> >
> > I'd be inclined to drop it.  I have heard so little about xenserver
> > (or XCP, which I think is its successor) over the last few years.
> > It makes me a bit sad, since that's where Open vSwitch started out
> > (in a few cases I literally had to disassemble bits of its binaries
> > to understand what was going on), but the world moves on.
>
> It seems like there is a newer open-source XCP-ng project that is
> more or less active.  There is also the proprietary Citrix Hypervisor,
> of course.  Both seem to have OVS as the default networking solution
> and both seem to use some version of OVS 2.5.3.  Here is the repo
> for XCP-ng one:
>   https://github.com/xcp-ng-rpms/openvswitch
>
> They are not using our spec file, though, and they have some custom
> patches on top.  Not sure about Citrix Hypervisor, but I found some
> more or less recent security hotfixes referencing the openvswitch
> package, like this one:
>
> https://support.citrix.com/article/CTX306423/hotfix-xs82e022-for-citrix-hypervisor-82
>
> So, yes, someone is still using it, but they are on a roughly
> 5-year-old version at this point, and they are not using the upstream
> spec file.


I guess they're not really using anything we're working on now. I think
this still supports dropping it. I'm sure we'd welcome them back if they
asked.


Re: [ovs-dev] [PATCH 0/6] Remove OVS kernel driver

2022-07-08 Thread Ben Pfaff
On Fri, Jul 8, 2022 at 1:53 AM David Marchand  wrote:

> On Thu, Jul 7, 2022 at 11:21 PM Gregory Rose  wrote:
> > > xenserver/openvswitch-xen.spec.in
> >
> > This is a bit of an issue.  Does anyone even use xenserver anymore?
> > All of the documentation and build instructions are really old and
> > I wonder if they work anymore.  I have no xenserver build environment
> > to test any changes.
>
> Same for me.
>
> Ben, Ilya, do you know if this xenserver packaging still has users?
> The last comment about it was in 2017.
> fefb757ce408 ("debian, xenserver: Update logrotate config to match RHEL.")
>
> Should we keep on updating it? or can we simply drop support for
> xenserver packaging?
>
I'd be inclined to drop it.  I have heard so little about xenserver (or
XCP, which I think is its successor) over the last few years.  It makes
me a bit sad, since that's where Open vSwitch started out (in a few
cases I literally had to disassemble bits of its binaries to understand
what was going on), but the world moves on.


Re: [ovs-dev] [RFC] classifier: change segment 3 boundary to defer dependent processing

2022-05-12 Thread Ben Pfaff
On Thu, May 12, 2022 at 8:55 AM Aaron Conole  wrote:
>
> Ben Pfaff  writes:
>
> > Hi Aaron!  This will move the following fields from the L4 segment to
> > the L3 segment:
> >
> > struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
> > struct eth_addr arp_sha;/* ARP/ND source hardware address. */
> > struct eth_addr arp_tha;/* ARP/ND target hardware address. */
> > ovs_be16 tcp_flags; /* TCP flags/ICMPv6 ND options type.
> >  * With L3 to avoid matching L4. */
> > ovs_be16 pad2;  /* Pad to 64 bits. */
> > struct ovs_key_nsh nsh; /* Network Service Header keys */
> >
> >
> > The one that stands out to me is tcp_flags.  I remember that we had a
> > reason that we wanted to include this in the L3 segment.  I think it was
> > because we might want to match on flags without matching on anything
> > else in L4, but I don't remember the exact details.  There is a comment
> > to that effect above.
>
> I guess the reason that would be done would be to implement a stateless
> firewall using the openflow rules.  I haven't seen many flow pipelines
> recently that use tcp_flags, but I can see it might help those kinds of
> scenarios where we want to allow/drop packets based on some kinds of
> flags (or drop xmas packets before they get processed).
>
> Do you think it will be a problem?  For now, I assume most folks are
> using the ct() action and that should be based on ct_state field (which
> isn't impacted).

I don't know whether it will be a problem.  Everything like this always
comes back to "how well do we know how our users use Open vSwitch?" and
the answer is always "not very well" :-(

Since this is an important correctness issue, I think we should fix it and, if
performance problems arise in cases we're not sure about, we can address
them.


Re: [ovs-dev] [RFC] classifier: change segment 3 boundary to defer dependent processing

2022-05-11 Thread Ben Pfaff
Hi Aaron!  This will move the following fields from the L4 segment to
the L3 segment:

struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
struct eth_addr arp_sha;/* ARP/ND source hardware address. */
struct eth_addr arp_tha;/* ARP/ND target hardware address. */
ovs_be16 tcp_flags; /* TCP flags/ICMPv6 ND options type.
 * With L3 to avoid matching L4. */
ovs_be16 pad2;  /* Pad to 64 bits. */
struct ovs_key_nsh nsh; /* Network Service Header keys */


The one that stands out to me is tcp_flags.  I remember that we had a
reason that we wanted to include this in the L3 segment.  I think it was
because we might want to match on flags without matching on anything
else in L4, but I don't remember the exact details.  There is a comment
to that effect above.

I suggest including the list of fields being moved in the commit message
(i.e. the code excerpt above) so that it's clearer what's changing.

Some more below:

On Wed, May 11, 2022 at 10:52:41AM -0400, Aaron Conole wrote:
> During the sweep stage, revalidator will compare the dumped WC with a
> generated WC - these will mismatch because the generated WC will include
> such dependent ND fields.  We will then invalidate the flow and as a side
> effect force an upcall.

I find "include" (or "exclude") a little ambiguous here.  I think that
saying that the generated or dumped WC "matches" or "does not match" on
a particular field would be clearer.  Some more like this below:

> By redefining the boundary, we shift these fields to the l4 subtable, and
> cause masks to be generated by including the requisite fields.  Two
> other approaches were considered - moving the nd_target field alone
> (which leaves arp_sha/nd_sll, arp_tha/nd_tll, and tcp_flags still as
> issues so this was avoided) and collapsing the l3/l4 segments into one
> (which would result in flow explosion in real scenarios, so it was also
> discarded as a solution).
> 
> Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.")
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773
> Reported-by: Numan Siddique 
> Suggested-by: Ilya Maximets 
> Signed-off-by: Aaron Conole 

Thanks for adding a test!


Re: [ovs-dev] [PATCH ovn] MAINTAINERS: Move myself to emeritus status.

2022-05-03 Thread Ben Pfaff
On Tue, May 3, 2022 at 3:15 PM Han Zhou  wrote:
> I believe this has been applied already last year.

Oh, I see, we also transitioned to have the main branch be "main", but
I was looking in the wrong place.

No need to do anything.

Thanks!


[ovs-dev] [PATCH ovn] MAINTAINERS: Move myself to emeritus status.

2022-04-28 Thread Ben Pfaff
I am no longer actively working on OVN, so it's appropriate to move myself
to emeritus status.

Signed-off-by: Ben Pfaff 
---
 MAINTAINERS.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
index 0d19bd622..fff2cd977 100644
--- a/MAINTAINERS.rst
+++ b/MAINTAINERS.rst
@@ -41,8 +41,6 @@ This is the current list of active OVN committers:
 
* - Name
  - Email
-   * - Ben Pfaff
- - b...@ovn.org
* - Gurucharan Shetty
  - g...@ovn.org
* - Han Zhou
@@ -67,3 +65,5 @@ More information about Emeritus Committers can be found
 
* - Name
  - Email
+   * - Ben Pfaff
+ - b...@ovn.org
-- 
2.35.1



Re: [ovs-dev] OVN weekly meeting time change

2022-04-05 Thread Ben Pfaff
On Thu, Mar 31, 2022 at 02:03:41PM -0400, Mark Michelson wrote:
> @Ben
> Can you please change the channel topic message of #openvswitch so that the
> new meeting time is reflected? Thanks!

I've done this now.

Does anyone know how I would give someone else permission to do this
kind of thing?  I don't think I should be the only one who can do this.


Re: [ovs-dev] [PATCH] daemon-unix: Close log file in monitor process while waiting on child.

2022-01-29 Thread Ben Pfaff
On Wed, Jan 26, 2022 at 06:39:52PM -0800, William Tu wrote:
> On Wed, Jan 26, 2022 at 8:56 AM Ben Pfaff  wrote:
> >
> > Until now, the monitor process held its log file open while it waited for
> > the child to exit, and then it reopened it after the child exited.  Holding
> > the log file open this way prevented log rotation from freeing disk space.
> > This commit avoids the problem by closing the log file before waiting, then
> > reopening it afterward.
> >
> > Signed-off-by: Ben Pfaff 
> > Reported-by: Antonin Bas 
> > VMware-BZ: #2743409
> > ---
> 
> Hi Ben,
> Thanks, ASAN reports a memory leak,

[...]

> I think we should "free(log_file)" after vlog_set_log_file(log_file)?

You are right.  Thank you for the report.

I posted v2:
https://mail.openvswitch.org/pipermail/ovs-dev/2022-January/391119.html


[ovs-dev] [PATCH v2] daemon-unix: Close log file in monitor process while waiting on child.

2022-01-29 Thread Ben Pfaff
Until now, the monitor process held its log file open while it waited for
the child to exit, and then it reopened it after the child exited.  Holding
the log file open this way prevented log rotation from freeing disk space.
This commit avoids the problem by closing the log file before waiting, then
reopening it afterward.

Signed-off-by: Ben Pfaff 
Reported-by: Antonin Bas 
VMware-BZ: #2743409
---
v1->v2: Fix memory leak in monitor_daemon() reported by William.

 include/openvswitch/vlog.h |  2 +
 lib/daemon-unix.c  |  5 +-
 lib/vlog.c | 93 ++
 3 files changed, 71 insertions(+), 29 deletions(-)

diff --git a/include/openvswitch/vlog.h b/include/openvswitch/vlog.h
index 886fce5e0..e53ce6d81 100644
--- a/include/openvswitch/vlog.h
+++ b/include/openvswitch/vlog.h
@@ -131,7 +131,9 @@ void vlog_set_verbosity(const char *arg);
 
 /* Configuring log destinations. */
 void vlog_set_pattern(enum vlog_destination, const char *pattern);
+char *vlog_get_log_file(void);
 int vlog_set_log_file(const char *file_name);
+void vlog_close_log_file(void);
 int vlog_reopen_log_file(void);
 #ifndef _WIN32
 void vlog_change_owner_unix(uid_t, gid_t);
diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
index 34d45b82a..a02fd984b 100644
--- a/lib/daemon-unix.c
+++ b/lib/daemon-unix.c
@@ -360,12 +360,15 @@ monitor_daemon(pid_t daemon_pid)
(unsigned long int) daemon_pid, status_msg);
 
 if (child_ready) {
+char *log_file = vlog_get_log_file();
+vlog_close_log_file();
 int error;
 do {
retval = waitpid(daemon_pid, &status, 0);
 error = retval == -1 ? errno : 0;
 } while (error == EINTR);
-vlog_reopen_log_file();
+vlog_set_log_file(log_file);
+free(log_file);
 if (error) {
 VLOG_FATAL("waitpid failed (%s)", ovs_strerror(error));
 }
diff --git a/lib/vlog.c b/lib/vlog.c
index 533f93755..0a615bb66 100644
--- a/lib/vlog.c
+++ b/lib/vlog.c
@@ -343,13 +343,25 @@ vlog_set_pattern(enum vlog_destination destination, const char *pattern)
 }
 }
 
-/* Sets the name of the log file used by VLF_FILE to 'file_name', or to the
- * default file name if 'file_name' is null.  Returns 0 if successful,
- * otherwise a positive errno value. */
-int
-vlog_set_log_file(const char *file_name)
+/* Returns a copy of the name of the log file used by VLF_FILE, or NULL if none
+ * is configured.  The caller must eventually free the returned string. */
+char *
+vlog_get_log_file(void)
+{
+ovs_mutex_lock(&log_file_mutex);
+char *fn = nullable_xstrdup(log_file_name);
+ovs_mutex_unlock(&log_file_mutex);
+
+return fn;
+}
+
+/* Sets the name of the log file used by VLF_FILE to 'new_log_file_name', or
+ * closes the current log file if 'new_log_file_name' is NULL.  Takes ownership
+ * of 'new_log_file_name'.  Returns 0 if successful, otherwise a positive errno
+ * value. */
+static int
+vlog_set_log_file__(char *new_log_file_name)
 {
-char *new_log_file_name;
 struct vlog_module *mp;
 struct stat old_stat;
 struct stat new_stat;
@@ -358,25 +370,29 @@ vlog_set_log_file(const char *file_name)
 bool log_close;
 
 /* Open new log file. */
-new_log_file_name = (file_name
- ? xstrdup(file_name)
- : xasprintf("%s/%s.log", ovs_logdir(), program_name));
-new_log_fd = open(new_log_file_name, O_WRONLY | O_CREAT | O_APPEND, 0660);
-if (new_log_fd < 0) {
-VLOG_WARN("failed to open %s for logging: %s",
-  new_log_file_name, ovs_strerror(errno));
-free(new_log_file_name);
-return errno;
+if (new_log_file_name) {
+new_log_fd = open(new_log_file_name, O_WRONLY | O_CREAT | O_APPEND,
+  0660);
+if (new_log_fd < 0) {
+VLOG_WARN("failed to open %s for logging: %s",
+  new_log_file_name, ovs_strerror(errno));
+free(new_log_file_name);
+return errno;
+}
+} else {
+new_log_fd = -1;
 }
 
 /* If the new log file is the same one we already have open, bail out. */
ovs_mutex_lock(&log_file_mutex);
-same_file = (log_fd >= 0
- && new_log_fd >= 0
- && !fstat(log_fd, &old_stat)
- && !fstat(new_log_fd, &new_stat)
- && old_stat.st_dev == new_stat.st_dev
- && old_stat.st_ino == new_stat.st_ino);
+same_file = ((log_fd < 0
+  && new_log_fd < 0) ||
+ (log_fd >= 0
+  && new_log_fd >= 0
+  && !fstat(log_fd, &old_stat)
+  && !fstat(new_log_fd, &new_stat)
+  && old_stat.st_dev == n

[ovs-dev] [PATCH] daemon-unix: Close log file in monitor process while waiting on child.

2022-01-26 Thread Ben Pfaff
Until now, the monitor process held its log file open while it waited for
the child to exit, and then it reopened it after the child exited.  Holding
the log file open this way prevented log rotation from freeing disk space.
This commit avoids the problem by closing the log file before waiting, then
reopening it afterward.

Signed-off-by: Ben Pfaff 
Reported-by: Antonin Bas 
VMware-BZ: #2743409
---
 include/openvswitch/vlog.h |  2 +
 lib/daemon-unix.c  |  4 +-
 lib/vlog.c | 93 ++
 3 files changed, 70 insertions(+), 29 deletions(-)

diff --git a/include/openvswitch/vlog.h b/include/openvswitch/vlog.h
index 886fce5e0..e53ce6d81 100644
--- a/include/openvswitch/vlog.h
+++ b/include/openvswitch/vlog.h
@@ -131,7 +131,9 @@ void vlog_set_verbosity(const char *arg);
 
 /* Configuring log destinations. */
 void vlog_set_pattern(enum vlog_destination, const char *pattern);
+char *vlog_get_log_file(void);
 int vlog_set_log_file(const char *file_name);
+void vlog_close_log_file(void);
 int vlog_reopen_log_file(void);
 #ifndef _WIN32
 void vlog_change_owner_unix(uid_t, gid_t);
diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
index 34d45b82a..10806bf81 100644
--- a/lib/daemon-unix.c
+++ b/lib/daemon-unix.c
@@ -360,12 +360,14 @@ monitor_daemon(pid_t daemon_pid)
(unsigned long int) daemon_pid, status_msg);
 
 if (child_ready) {
+char *log_file = vlog_get_log_file();
+vlog_close_log_file();
 int error;
 do {
retval = waitpid(daemon_pid, &status, 0);
 error = retval == -1 ? errno : 0;
 } while (error == EINTR);
-vlog_reopen_log_file();
+vlog_set_log_file(log_file);
 if (error) {
 VLOG_FATAL("waitpid failed (%s)", ovs_strerror(error));
 }
diff --git a/lib/vlog.c b/lib/vlog.c
index 533f93755..0a615bb66 100644
--- a/lib/vlog.c
+++ b/lib/vlog.c
@@ -343,13 +343,25 @@ vlog_set_pattern(enum vlog_destination destination, const char *pattern)
 }
 }
 
-/* Sets the name of the log file used by VLF_FILE to 'file_name', or to the
- * default file name if 'file_name' is null.  Returns 0 if successful,
- * otherwise a positive errno value. */
-int
-vlog_set_log_file(const char *file_name)
+/* Returns a copy of the name of the log file used by VLF_FILE, or NULL if none
+ * is configured.  The caller must eventually free the returned string. */
+char *
+vlog_get_log_file(void)
+{
+ovs_mutex_lock(&log_file_mutex);
+char *fn = nullable_xstrdup(log_file_name);
+ovs_mutex_unlock(&log_file_mutex);
+
+return fn;
+}
+
+/* Sets the name of the log file used by VLF_FILE to 'new_log_file_name', or
+ * closes the current log file if 'new_log_file_name' is NULL.  Takes ownership
+ * of 'new_log_file_name'.  Returns 0 if successful, otherwise a positive errno
+ * value. */
+static int
+vlog_set_log_file__(char *new_log_file_name)
 {
-char *new_log_file_name;
 struct vlog_module *mp;
 struct stat old_stat;
 struct stat new_stat;
@@ -358,25 +370,29 @@ vlog_set_log_file(const char *file_name)
 bool log_close;
 
 /* Open new log file. */
-new_log_file_name = (file_name
- ? xstrdup(file_name)
- : xasprintf("%s/%s.log", ovs_logdir(), program_name));
-new_log_fd = open(new_log_file_name, O_WRONLY | O_CREAT | O_APPEND, 0660);
-if (new_log_fd < 0) {
-VLOG_WARN("failed to open %s for logging: %s",
-  new_log_file_name, ovs_strerror(errno));
-free(new_log_file_name);
-return errno;
+if (new_log_file_name) {
+new_log_fd = open(new_log_file_name, O_WRONLY | O_CREAT | O_APPEND,
+  0660);
+if (new_log_fd < 0) {
+VLOG_WARN("failed to open %s for logging: %s",
+  new_log_file_name, ovs_strerror(errno));
+free(new_log_file_name);
+return errno;
+}
+} else {
+new_log_fd = -1;
 }
 
 /* If the new log file is the same one we already have open, bail out. */
ovs_mutex_lock(&log_file_mutex);
-same_file = (log_fd >= 0
- && new_log_fd >= 0
- && !fstat(log_fd, &old_stat)
- && !fstat(new_log_fd, &new_stat)
- && old_stat.st_dev == new_stat.st_dev
- && old_stat.st_ino == new_stat.st_ino);
+same_file = ((log_fd < 0
+  && new_log_fd < 0) ||
+ (log_fd >= 0
+  && new_log_fd >= 0
+  && !fstat(log_fd, &old_stat)
+  && !fstat(new_log_fd, &new_stat)
+  && old_stat.st_dev == new_stat.st_dev
+  && old_stat.st_ino == new_stat.st_ino));
 ovs_mutex_

Re: [ovs-dev] [nore...@github.com: [GitHub] Third-party application approval request for "Open Virtual Network"]

2021-12-25 Thread Ben Pfaff
On Fri, Dec 24, 2021 at 10:32:23AM -0500, Numan Siddique wrote:
> On Thu, Dec 23, 2021 at 6:51 PM Ben Pfaff  wrote:
> >
> > This came to me.  I don't know whether this is something we/I should
> > approve.
> 
> This request came from Andrew (CCed) and I approved it.

Thanks for taking care of it!


[ovs-dev] [nore...@github.com: [GitHub] Third-party application approval request for "Open Virtual Network"]

2021-12-23 Thread Ben Pfaff
This came to me.  I don't know whether this is something we/I should
approve.


Re: [ovs-dev] [EXTERNAL] Re: [PATCH] RFC: netdev-dpdk: Add Windows support.

2021-11-04 Thread Ben Pfaff
The news is good, thanks!

On Thu, Nov 4, 2021 at 12:09 PM Omar Cardona  wrote:

> Hi Ben,
>
> - Yes, definitely.  C11/C17 stdatomic.h is high on the list.
> - Our current critical path item for MSVC DPDK support is TLS.
>
> Hi Thomas,
>
> - I will set up some time next week to discuss alignment and share our
>   proposed MSVC plan.
>
>
>
>
>
> From: Ben Pfaff
> Sent: Thursday, November 4, 2021 11:15 AM
> To: Omar Cardona
> Cc: William Tu; thomas; d...@openvswitch.org; ovs-dev; Sergey Madaminov;
> Kadam, Pallavi; Ilya Maximets; Test; Dmitry Kozlyuk; talshn
> Subject: Re: [EXTERNAL] Re: [ovs-dev] [PATCH] RFC: netdev-dpdk: Add
> Windows support.
>
> On Thu, Nov 4, 2021 at 8:37 AM Omar Cardona  wrote:
>
> > > f. With clang support on Windows, can we remove MSVC?
>
> Please no.  We are actively working with the MSVC team to extend
> support for DPDK's dependencies.
>
> Could you get them to implement C11 atomics?  The MSVC support for
> atomic operations is poor, unless it's changed since I last looked.
> We could remove quite a bit of rather awful stuff if either MSVC
> supported the C standard or if OVS dropped support for MSVC.


Re: [ovs-dev] [EXTERNAL] Re: [PATCH] RFC: netdev-dpdk: Add Windows support.

2021-11-04 Thread Ben Pfaff
On Thu, Nov 4, 2021 at 8:37 AM Omar Cardona  wrote:

> > > f. With clang support on Windows, can we remove MSVC?
>
> Please no.  We are actively working with the MSVC team to extend support
> for DPDK's dependencies.
>

Could you get them to implement C11 atomics?  The MSVC support for atomic
operations is poor, unless it's changed since I last looked.  We could
remove quite a bit of rather awful stuff if either MSVC supported the C
standard or if OVS dropped support for MSVC.


Re: [ovs-dev] northd-ddlog slowness when adding the first regular LSP to a LS full of router ports

2021-10-06 Thread Ben Pfaff
On Thu, Sep 30, 2021 at 12:23:57AM -0700, Han Zhou wrote:
> Hi Ben,
> 
> I understand that we have difficulties for northd-ddlog progress, but I
> still want to try it before it goes away. I tested with the latest version,
> and it is super fast for most of the operations. With a large NB & SB
> created, the C northd takes ~8 seconds for any change computation. For the
> same DB, northd usually takes less than 1 sec for most operations.
> 
> However, I did find an interesting problem. I tried to create one more LSP
> on a LS that already has 800 gateway-routers connected to it, which means
> there are already 800 LSPs on the LS of the type "router". Creating the
> extra LSP (without type) took 12 sec, which is even longer than a full
> compute of the C version. What's more interesting is, when I create another
> LSP on the same LS, it takes only 100+ms, and same for the 3rd, 4th LSPs,
> etc. When I remove any one of the extra LSPs I created, it is also fast,
> just 100+ms. But when I remove the last LSP that I just created it takes 12
> sec again. Then I tried creating a LSP with type=router on the same LS, it
> is very fast, less than 100ms. Basically, only creating the first
> non-router LSP or removing the last non-router LSP takes a very long time.
> 
> I haven't debugged yet (and not sure if I am capable of), but I think it
> might be useful to report it first.

This kind of thing usually happens because there is a lot of
intermediate data being built up in internal relations that changes when
the input changes.  My usual strategy for figuring it out is to
configure with --output-internal-relations (that might not be exactly
the correct option; the documentation should say) and then replay
using the CLI to see what's changing before and after the changes in
question.  It's usually big.


Re: [ovs-dev] [PATCH] RFC: netdev-dpdk: Add Windows support.

2021-10-06 Thread Ben Pfaff
On Wed, Oct 6, 2021 at 10:01 AM William Tu  wrote:
> The patch adds OVS-DPDK supports on Windows.

Fantastic! I'm happy to see this work in progress. OVS(-DPDK) everywhere!


Re: [ovs-dev] [PATCH ovn] MAINTAINERS: Transition myself to emeritus status.

2021-10-06 Thread Ben Pfaff
On Tue, Oct 05, 2021 at 08:06:45PM -0700, William Tu wrote:
> On Tue, Oct 5, 2021 at 8:05 PM William Tu  wrote:
> >
> > On Tue, Oct 5, 2021 at 11:33 AM Numan Siddique  wrote:
> > >
> > > On Tue, Oct 5, 2021 at 1:46 PM Han Zhou  wrote:
> > > >
> > > > On Tue, Oct 5, 2021 at 9:39 AM Ben Pfaff  wrote:
> > > > >
> > > > > I have not been actively maintaining OVN for some time now.  I
> > > > > don't expect that to change.  I think that the responsible thing
> > > > > to do is to acknowledge this properly by transitioning to
> > > > > emeritus status.
> > > > >
> > > > > Signed-off-by: Ben Pfaff 
> > > > > ---
> > > > >  MAINTAINERS.rst | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
> > > > > index 0d19bd622..fff2cd977 100644
> > > > > --- a/MAINTAINERS.rst
> > > > > +++ b/MAINTAINERS.rst
> > > > > @@ -41,8 +41,6 @@ This is the current list of active OVN committers:
> > > > >
> > > > > * - Name
> > > > >   - Email
> > > > > -   * - Ben Pfaff
> > > > > - - b...@ovn.org
> > > > > * - Gurucharan Shetty
> > > > >   - g...@ovn.org
> > > > > * - Han Zhou
> > > > > @@ -67,3 +65,5 @@ More information about Emeritus Committers can be found
> > > > >
> > > > > * - Name
> > > > >   - Email
> > > > > +   * - Ben Pfaff
> > > > > + - b...@ovn.org
> > > > > --
> > > > > 2.31.1
> > > > >
> > > > > ___
> > > > > dev mailing list
> > > > > d...@openvswitch.org
> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > >
> > > > Hi Ben,
> > > >
> > > > The patch LGTM, although I really don't like the solution :(
> > > > Thank you so much for building the amazing stuff from scratch!
> > > >
> > >
> > > Agree, Thank you for all your amazing work!
> > >
> > > Thanks
> > > Numan
> > >
> > >
> > > > Acked-by: Han Zhou 
> >
> > Thanks Ben!
> > I applied to master.
> > William
> 
> and feel free to revert the commit if you change your mind :)

Thanks :-)


Re: [ovs-dev] [PATCH] MAINTAINERS: Transition myself to emeritus status.

2021-10-06 Thread Ben Pfaff
On Wed, Oct 06, 2021 at 11:20:48AM +0200, Simon Horman wrote:
> On Tue, Oct 05, 2021 at 09:32:41AM -0700, Ben Pfaff wrote:
> > It has been a long time since I've been able to devote significant time
> > to OVS work.  I don't expect that to change.  I think that the responsible
> > thing to do is to transition myself to emeritus status, with this patch.
> > 
> > Signed-off-by: Ben Pfaff 
> 
> Thanks for all your hard work Ben.

You're welcome!


[ovs-dev] [PATCH ovn] MAINTAINERS: Transition myself to emeritus status.

2021-10-05 Thread Ben Pfaff
I have not been actively maintaining OVN for some time now.  I don't expect
that to change.  I think that the responsible thing to do is to
acknowledge this properly by transitioning to emeritus status.

Signed-off-by: Ben Pfaff 
---
 MAINTAINERS.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
index 0d19bd622..fff2cd977 100644
--- a/MAINTAINERS.rst
+++ b/MAINTAINERS.rst
@@ -41,8 +41,6 @@ This is the current list of active OVN committers:
 
* - Name
  - Email
-   * - Ben Pfaff
- - b...@ovn.org
* - Gurucharan Shetty
  - g...@ovn.org
* - Han Zhou
@@ -67,3 +65,5 @@ More information about Emeritus Committers can be found
 
* - Name
  - Email
+   * - Ben Pfaff
+ - b...@ovn.org
-- 
2.31.1



[ovs-dev] [PATCH] MAINTAINERS: Transition myself to emeritus status.

2021-10-05 Thread Ben Pfaff
It has been a long time since I've been able to devote significant time
to OVS work.  I don't expect that to change.  I think that the responsible
thing to do is to transition myself to emeritus status, with this patch.

Signed-off-by: Ben Pfaff 
---
 MAINTAINERS.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
index 0c035d940..f4f6188c8 100644
--- a/MAINTAINERS.rst
+++ b/MAINTAINERS.rst
@@ -49,8 +49,6 @@ This is the current list of active Open vSwitch committers:
  - az...@ovn.org
* - Ansis Atteka
  - aatt...@nicira.com
-   * - Ben Pfaff
- - b...@ovn.org
* - Daniele Di Proietto
  - daniele.di.proie...@gmail.com
* - Gurucharan Shetty
@@ -89,5 +87,7 @@ More information about Emeritus Committers can be found
 
* - Name
  - Email
+   * - Ben Pfaff
+ - b...@ovn.org
* - Ethan J. Jackson
  - e...@eecs.berkeley.edu
-- 
2.31.1



Re: [ovs-dev] Issue with one of your old patches, commit c645550b

2021-10-04 Thread Ben Pfaff
This code in odp-util is basically unmaintainable.  I've never been able
to figure out how to properly test or reason about it.  It's a huge
reason that I want to get rid of the kernel module.

On Mon, Oct 04, 2021 at 11:49:41AM +0200, Eelco Chaudron wrote:
> Hi Ben,
> 
> Any chance you can take a peek at the email below? I hope you can remember 
> this from 2018 :)
> 
> Thanks,
> 
> Eelco
> 
> On 15 Sep 2021, at 15:30, Eelco Chaudron wrote:
> 
> > Hi Ben,
> >
> > The mentioned fix below is causing a regression, actually two related ones:
> >
> > 1) If IGMP snooping is disabled, IGMP related traffic is still sent slow 
> > path for processing with the NORMAL rule. In addition, even without the 
> > NORMAL rule, the DP rule gets modified.
> >
> > Here are some examples:
> >
> > a) default config, one bridge (kernel dp), two ports (i.e. only a single 
> > NORMAL rule, and mcast_snooping_enable: false), send an IGMP join:
> >
> > - With your fix the dp rule looks like this:
> >
> >   
> > recirc_id(0),in_port(2),eth(src=04:f4:bc:28:57:01,dst=01:00:5e:00:00:16),eth_type(0x0800),ipv4(frag=no),
> >  packets:8, bytes:576, used:0.169s, 
> > actions:userspace(pid=4220974525,slow_path(match))
> >
> > - Without the fix, which is correct:
> >
> >   
> > recirc_id(0),in_port(2),eth(src=04:f4:bc:28:57:01,dst=01:00:5e:00:00:16),eth_type(0x0800),ipv4(frag=no),
> >  packets:8, bytes:576, used:0.004s, actions:1
> >
> >
> > b) with a more specific OpenFlow entry, no DP entry gets created:
> >
> >   ovs-ofctl del-flows ovs_pvp_br0 && ovs-ofctl add-flow ovs_pvp_br0 
> > "in_port=enp5s0f0,eth_type(0x800),ip_proto=2,action=drop"
> >   ovs-ofctl del-flows ovs_pvp_br0 && ovs-ofctl add-flow ovs_pvp_br0 
> > "in_port=enp5s0f0,eth_type(0x800),ip_proto=2,action=vnet6"
> >
> >
> > 2) With the netdev (DPDK) interface all seems to work as expected, i.e., 
> > the flow gets installed correctly without the userspace upcall in both 
> > cases above. However, the verifier generates an error as it would like to 
> > update the dp rule with a slow_path variant. Which is failing as netdev is 
> > using the odp_flow_key_to_flow() utility function, which has the IGMP check.
> >
> > Looking at the change it assumes you filter on igmp type/code when you 
> > filter on the protocol type igmp. But this is not true, you can just filter 
> > on the protocol type only, and this is what is actually causing the 
> > problems above. So I think your patch needs to be reversed!
> >
> > However, looking at fixing Huanle's original problem, 
> > https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343665.html, 
> > looking at his patch, this might be the right solution. All IGMP packets 
> > independent of their type need to be processed by a SLOW_ACTION upcall.
> >
> >
> > I've copied in Flavio, who worked on the IGMP snooping part before, as he 
> > might still remember what I argue above is true :)
> >
> >
> > If you have any other ideas, thoughts please let me know.
> >
> >
> > Cheers,
> >
> > Eelco
> >
> > 
> >
> > commit c645550bb2498fb3816b6a39b22bffeb3154dca3
> > Author: Ben Pfaff 
> > Date:   Wed Jan 24 11:40:20 2018 -0800
> >
> > odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP.
> >
> > OVS datapaths don't understand or parse IGMP fields, but OVS userspace
> > does, so this commit updates odp_flow_key_to_flow() to report that 
> > properly
> > to the caller.
> >
> > Reported-by: Huanle Han 
> > Reported-at: 
> > https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343665.html
> > Signed-off-by: Ben Pfaff 
> >
> > diff --git a/lib/odp-util.c b/lib/odp-util.c
> > index 14d5af097..5da83b4c6 100644
> > --- a/lib/odp-util.c
> > +++ b/lib/odp-util.c
> > @@ -6210,6 +6210,11 @@ parse_l2_5_onward(const struct nlattr 
> > *attrs[OVS_KEY_ATTR_MAX + 1],
> >  }
> >  }
> >  }
> > +} else if (src_flow->nw_proto == IPPROTO_IGMP
> > +   && src_flow->dl_type == htons(ETH_TYPE_IP)) {
> > +/* OVS userspace parses the IGMP type, code, and group, but its
> > + * datapaths do not, so there is always missing information. */
> > +return ODP_FIT_TOO_LITTLE;
> >  }
> >  if (is_mask && expected_bit != OVS_KEY_ATTR_UNSPEC) {
> >  if ((flow->tp_src || flow->tp_dst) && flow->nw_proto != 0xff) {
> 
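The quoted check is small but its effect is easy to miss, so here is an illustrative Python model (not the actual OVS C code) of what Eelco is pointing out: userspace parses IGMP type/code/group but the kernel datapath does not, so commit c645550b flags *every* IPv4 flow with `nw_proto == IGMP` as fitting "too little", even when the OpenFlow pipeline only matches on the protocol number and never touches the IGMP fields.

```python
# Conceptual sketch only: odp_flow_key_to_flow() reports how well a
# datapath flow key "fits" the richer userspace flow structure.  The
# c645550b change makes any IGMP-over-IPv4 key report TOO_LITTLE,
# which forces slow-path handling regardless of what the rules match on.
from enum import Enum

class OdpFit(Enum):
    PERFECT = "perfect"
    TOO_LITTLE = "too_little"   # key lacks fields userspace understands

IPPROTO_IGMP = 2
ETH_TYPE_IP = 0x0800

def classify_fit(dl_type: int, nw_proto: int) -> OdpFit:
    """Mirror of the check Eelco reports as too broad."""
    if dl_type == ETH_TYPE_IP and nw_proto == IPPROTO_IGMP:
        return OdpFit.TOO_LITTLE
    return OdpFit.PERFECT

# A TCP flow fits fine; an IGMP flow is always flagged, even for a rule
# like "ip_proto=2,action=drop" that needs no IGMP type/code information.
assert classify_fit(ETH_TYPE_IP, 6) is OdpFit.PERFECT
assert classify_fit(ETH_TYPE_IP, IPPROTO_IGMP) is OdpFit.TOO_LITTLE
```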


Re: [ovs-dev] [PATCH ovn v2 00/10] 3x performance improvement for ddlog with load balancer benchmark

2021-09-16 Thread Ben Pfaff
Thanks.  I rebased and applied this series to master.

On Mon, Sep 13, 2021 at 02:13:28PM -0400, Mark Michelson wrote:
> Hi Ben, for the series:
> 
> Acked-by: Mark Michelson 
> 
> I had a question on patch 1, but it's minor and otherwise shouldn't block an
> ACK.
> 
> On 9/7/21 6:45 PM, Ben Pfaff wrote:
> > With and without these patches, I see the following performance when I
> > run the load-balancer heavy benchmark.  The measurements include cold
> > start with all the load balancers, then deleting the sctp load balancer
> > and waiting for it to finish, then the same with the tcp load balancer,
> > then the same with the udp load balancer.  The measurements only include
> > ddlog time; there is additional overhead talking to the database, but
> > that's a constant factor.
> > 
> > Without patches:   783 CPU seconds, 23.5 GB RAM, 13:06 elapsed time.
> > With patches:  224 CPU seconds, 14.3 GB RAM,  3:45 elapsed time.
> > 
> > I am continuing to work on performance.
> > 
> > These patches are also available in:
> > https://github.com/blp/ovs-reviews/tree/ovn-memory-8
> > 
> > v1->v2:
> >- Add two new patches.
> >- Rebase and rerun benchmarks (revisions reported above).
> > 
> > Ben Pfaff (9):
> >ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient.
> >ovn-northd-ddlog: Derive load balancer IP addresses in new
> >  LoadBalancer.
> >ovn-northd-ddlog: Reverse order of joins for connection tracking
> >  flows.
> >ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports.
> >ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends
> >  string.
> >ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router.
> >ovn-northd-ddlog: Intern strings before joining when possible.
> >ovn-northd-ddlog: Avoid map(ival) for ARP flows.
> >ovn-northd-ddlog: Avoid unnecessary joins for SwitchPortARPForwards.
> > 
> > Leonid Ryzhyk (1):
> >ovn-northd-ddlog: Intern all strings in OVSDB tables.
> > 
> >   configure.ac |2 +-
> >   manpages.mk  |1 -
> >   northd/copp.dl   |   32 +-
> >   northd/helpers.dl|   14 +-
> >   northd/ipam.dl   |   17 +-
> >   northd/lrouter.dl|  213 +++--
> >   northd/lswitch.dl|  177 ++--
> >   northd/multicast.dl  |   44 +-
> >   northd/ovn-nb.dlopts |1 +
> >   northd/ovn-sb.dlopts |1 +
> >   northd/ovn.dl|7 +
> >   northd/ovn_northd.dl | 2021 --
> >   northd/ovsdb2ddlog2c |6 +-
> >   tests/ovn-ic.at  |8 +-
> >   tests/ovn-northd.at  |   20 +-
> >   tests/ovn.at |6 +-
> >   16 files changed, 1280 insertions(+), 1290 deletions(-)
> > 
> 


[ovs-dev] [PATCH ovn v2 10/10] ovn-northd-ddlog: Avoid unnecessary joins for SwitchPortARPForwards.

2021-09-07 Thread Ben Pfaff
SwitchPortARPForwards already has all the necessary data in it and joining
with SwitchPort just wastes time.
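As a rough illustration of the idea (a hypothetical Python miniature, not the DDlog evaluator): each `SwitchPortARPForwards` row already embeds its port record, switch included, so joining back against `SwitchPort` adds cost without adding information.

```python
# Sketch: the "before" query joins against the port relation; the "after"
# query reads the switch straight off the embedded port.  Same output,
# one fewer join.
from dataclasses import dataclass

@dataclass(frozen=True)
class SwitchPort:
    name: str
    sw: str                      # owning switch uuid

@dataclass(frozen=True)
class ArpForwards:
    port: SwitchPort
    reachable_ips_v4: tuple

ports = [SwitchPort("p1", "sw1"), SwitchPort("p2", "sw2")]
arp = [ArpForwards(ports[0], ("10.0.0.1",)),
       ArpForwards(ports[1], ("10.0.0.2",))]

# Before: redundant join with the port relation.
before = {(a.port.sw, ip)
          for a in arp for p in ports if a.port == p
          for ip in a.reachable_ips_v4}
# After: the row is self-contained.
after = {(a.port.sw, ip) for a in arp for ip in a.reachable_ips_v4}
assert before == after
```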

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 573c2b392..201c08a24 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -4612,8 +4612,7 @@ Flow(.logical_datapath = sw._uuid,
  .io_port  = None,
  .controller_meter = None) :-
 var mc_flood_l2 = json_escape(mC_FLOOD_L2().0),
-sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
-(.port = sp, .reachable_ips_v4 = ips_v4),
+(.port = sp@{.sw = sw}, .reachable_ips_v4 
= ips_v4),
 var ipv4 = FlatMap(ips_v4).
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
@@ -4629,8 +4628,7 @@ Flow(.logical_datapath = sw._uuid,
  .io_port  = None,
  .controller_meter = None) :-
 var mc_flood_l2 = json_escape(mC_FLOOD_L2().0),
-sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
-(.port = sp, .reachable_ips_v6 = ips_v6),
+(.port = sp@{.sw = sw}, .reachable_ips_v6 
= ips_v6),
 var ipv6 = FlatMap(ips_v6).
 
 Flow(.logical_datapath = sw._uuid,
@@ -4642,8 +4640,7 @@ Flow(.logical_datapath = sw._uuid,
  .io_port  = None,
  .controller_meter = None) :-
 var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;",
-sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
-(.port = sp, .unreachable_ips_v4 = ips_v4),
+(.port = sp@{.sw = sw}, 
.unreachable_ips_v4 = ips_v4),
 var ipv4 = FlatMap(ips_v4).
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
@@ -4654,8 +4651,7 @@ Flow(.logical_datapath = sw._uuid,
  .io_port  = None,
  .controller_meter = None) :-
 var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;",
-sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
-(.port = sp, .unreachable_ips_v6 = ips_v6),
+(.port = sp@{.sw = sw}, 
.unreachable_ips_v6 = ips_v6),
 var ipv6 = FlatMap(ips_v6).
 
 for (SwitchPortNewDynamicAddress(.port = {.lsp = lsp, .json_name = 
json_name, .sw = sw},
-- 
2.31.1



[ovs-dev] [PATCH ovn v2 08/10] ovn-northd-ddlog: Intern strings before joining when possible.

2021-09-07 Thread Ben Pfaff
By interning earlier, we do less copying across the joins with
Router and Switch, because DDlog always copies by value and
copying an interned string is cheap.
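For readers less familiar with DDlog's `Intern<>`: a loose Python analogy (illustrative only, the mechanics in DDlog differ) is `sys.intern`, where "copying" an interned string just shares one canonical object instead of duplicating the bytes — which is why interning *before* the joins pays off.

```python
# Sketch: two independently built, equal strings collapse to one shared
# object once interned, so passing them through many joins copies a
# cheap handle rather than the full match/actions text.
import sys

raw = "ct.new && ip4.dst == 10.0.0.1 && tcp.dst == 80"
a = sys.intern(raw)
b = sys.intern("ct.new && ip4.dst == 10.0.0.1 && tcp.dst == 80")
assert a is b        # one canonical copy; further "copies" are references
```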

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 57 +++-
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 5af41fa22..1bf1e5333 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3216,14 +3216,14 @@ function build_lb_vip_actions(lbvip: Intern,
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_STATEFUL(),
  .priority = priority,
- .__match  = __match.intern(),
- .actions  = actions.intern(),
+ .__match  = __match,
+ .actions  = actions,
  .io_port  = None,
  .controller_meter = meter,
  .stage_hint   = 0) :-
 LBVIPWithStatus(lbvip@{.lb = lb}, up_backends),
 var priority = if (lbvip.vip_port != 0) { 120 } else { 110 },
-(var actions, var reject) = {
+(var actions0, var reject) = {
 /* Store the original destination IP to be used when generating
  * hairpin flows.
  */
@@ -3243,7 +3243,8 @@ Flow(.logical_datapath = sw._uuid,
 
 build_lb_vip_actions(lbvip, up_backends, s_SWITCH_OUT_QOS_MARK(), 
actions0 ++ actions1)
 },
-var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, 
lbvip.vip_port, lb.protocol, false, false, false),
+var actions = actions0.intern(),
+var __match = ("ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, 
lbvip.vip_port, lb.protocol, false, false, false)).intern(),
 sw in (),
 sw.load_balancer.contains(lb._uuid),
 var meter = if (reject) {
@@ -4610,10 +4611,10 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
+var mc_flood_l2 = json_escape(mC_FLOOD_L2().0),
 sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .reachable_ips_v4 = ips_v4),
-var ipv4 = FlatMap(ips_v4),
-var mc_flood_l2 = json_escape(mC_FLOOD_L2().0).
+var ipv4 = FlatMap(ips_v4).
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
  .priority = 80,
@@ -4627,35 +4628,35 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
+var mc_flood_l2 = json_escape(mC_FLOOD_L2().0),
 sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .reachable_ips_v6 = ips_v6),
-var ipv6 = FlatMap(ips_v6),
-var mc_flood_l2 = json_escape(mC_FLOOD_L2().0).
+var ipv6 = FlatMap(ips_v6).
 
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
  .priority = 90,
  .__match  = i"${fLAGBIT_NOT_VXLAN()} && arp.op == 1 && arp.tpa == 
${ipv4}",
- .actions  = i"outport = ${flood}; output;",
+ .actions  = actions,
  .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
+var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;",
 sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .unreachable_ips_v4 = ips_v4),
-var ipv4 = FlatMap(ips_v4),
-var flood = json_escape(mC_FLOOD().0).
+var ipv4 = FlatMap(ips_v4).
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
  .priority = 90,
  .__match  = i"${fLAGBIT_NOT_VXLAN()} && nd_ns && nd.target == 
${ipv6}",
- .actions  = i"outport = ${flood}; output;",
+ .actions  = actions,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
+var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;",
 sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .unreachable_ips_v6 = ips_v6),
-var ipv6 = FlatMap(ips_v6),
-var flood = json_escape(mC_FLOOD().0).
+var ipv6 = FlatMap(ips_v6).
 
 for (SwitchPortNewDynamicAddress(.port = {.lsp = lsp, .json_name = 
json_name, .sw = sw},
  .address = Some{addrs})
@@ -6701,7 +6702,7 @@ Flow(.logical_datapath = r._uuid,
 Flow(.logical_datapath = r._uuid,
  .stage= s_ROUTER_IN_DNAT(),
  .priority = prio,
- .__match  = __match.intern(),
+ .__match  = __match,
  .actions  = actions,
  .stage_hint   = 0,
  .io_port  = None,
@@ -6718,13 +6719,13 @@ Flow(.logical_datapath = r._uuid,
 ""
 },
 var prio = if (port != 0) { 120 } else { 110 },
-var match0 = "ct.est && &qu

[ovs-dev] [PATCH ovn v2 09/10] ovn-northd-ddlog: Avoid map(ival) for ARP flows.

2021-09-07 Thread Ben Pfaff
It's expensive for a long list.  All it buys us is sorting the list in
alphabetical order (rather than in order of Intern hash value), which isn't
that valuable.

This also updates a test that depended on the sort order.
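To make the tradeoff concrete, here is an illustrative Python sketch (not the DDlog code): the `arp.tpa == { ... }` match is a set, so the old pass that resolved every interned handle (`map(ival)`) just to emit the addresses alphabetically bought nothing semantically.

```python
# Sketch: the alphabetically ordered and hash-ordered renderings of the
# same address set describe identical matches.
handles = [("h3", "192.168.2.5"), ("h1", "192.168.2.1"), ("h2", "192.168.2.4")]

ordered = "{ " + ", ".join(sorted(v for _, v in handles)) + " }"     # old
unordered = "{ " + ", ".join(v for _, v in handles) + " }"           # new

def as_set(match):
    return set(match.strip("{} ").split(", "))

assert as_set(ordered) == as_set(unordered)   # semantically equivalent
```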

Suggested-by: Leonid Ryzhyk 
Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl |  2 +-
 tests/ovn-northd.at  | 20 ++--
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 1bf1e5333..573c2b392 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -5679,7 +5679,7 @@ var residence_check = match (is_redirect) {
 not all_ipv4s.is_empty() in
 LogicalRouterArpFlow(.lr = router,
  .lrp = Some{lrp},
- .ip = i"{ ${all_ipv4s.map(ival).to_vec().join(\", 
\")} }",
+ .ip = i"{ ${all_ipv4s.to_vec().join(\", \")} }",
  .mac = rEG_INPORT_ETH_ADDR(),
  .extra_match = residence_check,
  .drop = false,
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 11886b94e..c2db8f53a 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -1685,8 +1685,16 @@ match=(eth.mcast && inport == "lrp-public"), dnl
 action=(xreg0[[0..47]] = 00:00:00:00:01:00; next;)
 ])
 
+# The order of the VIP addresses in the flow table entries doesn't
+# matter, so just replace each of them with a generic $vip for
+# testing.  It would be better if we could ensure each one appeared
+# exactly once, but that's hard with sed.
+sed_vips() {
+sed 's/192\.168\.2\.[[1456]]/$vip/g'
+}
+
 # Ingress router port ETH address is used for ARP reply/NA in lr_in_ip_input.
-AT_CHECK([ovn-sbctl lflow-list | grep -E "lr_in_ip_input.*priority=90" | grep 
"arp\|nd" | sort], [0], [dnl
+AT_CHECK([ovn-sbctl lflow-list | grep -E "lr_in_ip_input.*priority=90" | grep 
"arp\|nd" | sed_vips | sort], [0], [dnl
   table=3 (lr_in_ip_input ), priority=90   , dnl
 match=(arp.op == 1 && arp.tpa == 43.43.43.150), dnl
 action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply 
*/ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = 
inport; flags.loopback = 1; output;)
@@ -1703,7 +1711,7 @@ action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; 
arp.op = 2; /* ARP reply */
 match=(inport == "lrp" && arp.op == 1 && arp.tpa == 42.42.42.1 && arp.spa == 
42.42.42.0/24), dnl
 action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply 
*/ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = 
inport; flags.loopback = 1; output;)
   table=3 (lr_in_ip_input ), priority=90   , dnl
-match=(inport == "lrp" && arp.op == 1 && arp.tpa == { 192.168.2.1, 
192.168.2.4, 192.168.2.5, 192.168.2.6 }), dnl
+match=(inport == "lrp" && arp.op == 1 && arp.tpa == { $vip, $vip, $vip, $vip 
}), dnl
 action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply 
*/ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = 
inport; flags.loopback = 1; output;)
   table=3 (lr_in_ip_input ), priority=90   , dnl
 match=(inport == "lrp" && ip6.dst == {fe80::200:ff:fe00:1, ff02::1:ff00:1} && 
nd_ns && nd.target == fe80::200:ff:fe00:1), dnl
@@ -1718,7 +1726,7 @@ action=(nd_na { eth.src = xreg0[[0..47]]; ip6.src = 
fe80::200:ff:fe00:102:8080;
 match=(inport == "lrp-public" && arp.op == 1 && arp.tpa == 43.43.43.1 && 
arp.spa == 43.43.43.0/24), dnl
 action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply 
*/ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = 
inport; flags.loopback = 1; output;)
   table=3 (lr_in_ip_input ), priority=90   , dnl
-match=(inport == "lrp-public" && arp.op == 1 && arp.tpa == { 192.168.2.1, 
192.168.2.4, 192.168.2.5, 192.168.2.6 }), dnl
+match=(inport == "lrp-public" && arp.op == 1 && arp.tpa == { $vip, $vip, $vip, 
$vip }), dnl
 action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply 
*/ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = 
inport; flags.loopback = 1; output;)
   table=3 (lr_in_ip_input ), priority=90   , dnl
 match=(inport == "lrp-public" && ip6.dst == {fe80::200:ff:fe00:100, 
ff02::1:ff00:100} && nd_ns && nd.target == fe80::200:ff:fe00:100), dnl
@@ -1761,7 +1769,7 @@ action=(xreg0[[0..47]] = 00:00:00:00:01:00; next;)
 # Ingress router port is used for ARP reply/NA in lr_in_ip_input.
 # xxreg0[0..47] is used unless external_mac is set.
 # Priority 90 flows (per router).
-AT_CHECK([ovn-sbctl lflow-list | grep

[ovs-dev] [PATCH ovn v2 06/10] ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends string.

2021-09-07 Thread Ben Pfaff
There was only one use for the 'backends' map, which was to be converted
to a string that listed all the backends that were up, so we might as
well do that at its point of origination.  At the same time, everything
else in LBVIPWithStatus was just a copy of the underlying LBVIP, so we
might as well just reference it.

Only slight improvement to performance.
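A loose Python rendering of the new rule's string construction (illustrative, assuming OVN's `ip[:port]` / `[ipv6]:port` backend syntax): fold the (backend, up) pairs into the comma-separated backends string at the point of origin instead of carrying a map downstream.

```python
# Sketch: keep only the backends that are up and render them in the
# ct_lb backends format, mirroring the filter_map/join in the DDlog rule.
def up_backends(pairs):
    out = []
    for (ip, port), up in pairs:
        if not up:
            continue
        host = f"[{ip}]" if ":" in ip else ip    # bracket IPv6 literals
        out.append(f"{host}:{port}" if port else host)
    return ",".join(out)

assert up_backends([(("10.0.0.2", 80), True),
                    (("10.0.0.3", 80), False),
                    (("fd00::2", 0), True)]) == "10.0.0.2:80,[fd00::2]"
```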

Signed-off-by: Ben Pfaff 
---
 northd/lswitch.dl| 32 ++--
 northd/ovn_northd.dl | 29 +
 2 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/northd/lswitch.dl b/northd/lswitch.dl
index ad6475a91..69b1c6eb5 100644
--- a/northd/lswitch.dl
+++ b/northd/lswitch.dl
@@ -445,25 +445,21 @@ function default_protocol(protocol: Option): 
istring = {
 }
 }
 
-typedef LBVIPWithStatus = LBVIPWithStatus {
-lb: Intern,
-vip_key: istring,
-backend_ips: istring,
-health_check: Option>,
-vip_addr: v46_ip,
-vip_port: bit<16>,
-backends: Map
-}
-relation LBVIPWithStatus[Intern]
-
-LBVIPWithStatus[LBVIPWithStatus{lb, vip_key, backend_ips, health_check, 
vip_addr, vip_port, map_empty()}.intern()] :-
-(lb, vip_key, backend_ips, health_check, vip_addr, vip_port, 
vec_empty()).
-LBVIPWithStatus[LBVIPWithStatus{lb, vip_key, backend_ips, health_check, 
vip_addr, vip_port, backends_with_status}.intern()] :-
-(lb, vip_key, backend_ips, health_check, vip_addr, vip_port, 
backends),
-var backend = FlatMap(backends),
+relation LBVIPWithStatus(
+lbvip: Intern,
+up_backends: istring)
+LBVIPWithStatus(lbvip, i"") :-
+lbvip in (.backends = vec_empty()).
+LBVIPWithStatus(lbvip, up_backends) :-
 LBVIPBackendStatus(lbvip, backend, up),
-var backends_with_status = ((backend, up)).group_by((lb, vip_key, 
backend_ips, health_check,
- vip_addr, 
vip_port)).to_map().
+var up_backends = ((backend, up)).group_by(lbvip).to_vec().filter_map(|x| {
+(LBVIPBackend{var ip, var port, _}, var up) = x;
+match ((up, port)) {
+(true, 0) -> Some{"${ip.to_bracketed_string()}"},
+(true, _) -> Some{"${ip.to_bracketed_string()}:${port}"},
+_ -> None
+}
+}).join(",").intern().
 
 /* Maps from a load-balancer virtual IP backend to whether it's up or not.
  *
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index bf2192f7e..5af41fa22 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3177,7 +3177,7 @@ function get_match_for_lb_key(ip_address: v46_ip,
 }
 /* New connections in Ingress table. */
 
-function ct_lb(backends: string,
+function ct_lb(backends: istring,
selection_fields: Set, protocol: Option): 
string {
 var args = vec_with_capacity(2);
 args.push("backends=${backends}");
@@ -3198,18 +3198,11 @@ function ct_lb(backends: string,
 
 "ct_lb(" ++ args.join("; ") ++ ");"
 }
-function build_lb_vip_actions(lbvip: Intern,
+function build_lb_vip_actions(lbvip: Intern,
+  up_backends: istring,
   stage: Intern,
   actions0: string): (string, bool) {
-var up_backends = vec_with_capacity(lbvip.backends.size());
-for (pair in lbvip.backends) {
-(var backend, var up) = pair;
-if (up) {
-up_backends.push((backend.ip, backend.port))
-}
-};
-
-if (up_backends.is_empty()) {
+if (up_backends == i"") {
 if (lbvip.lb.options.get_bool_def(i"reject", false)) {
 return ("reg0 = 0; reject { outport <-> inport; 
${next_to_stage(stage)};};", true)
 } else if (lbvip.health_check.is_some()) {
@@ -3217,11 +3210,7 @@ function build_lb_vip_actions(lbvip: 
Intern,
 } // else fall through
 };
 
-var up_backends_s = up_backends.sort_imm().map(|x| match (x) {
-(ip, 0) -> "${ip.to_bracketed_string()}",
-(ip, port) -> "${ip.to_bracketed_string()}:${port}"
-}).join(",");
-var actions = ct_lb(up_backends_s, lbvip.lb.selection_fields, 
lbvip.lb.protocol);
+var actions = ct_lb(up_backends, lbvip.lb.selection_fields, 
lbvip.lb.protocol);
 (actions0 ++ actions, false)
 }
 Flow(.logical_datapath = sw._uuid,
@@ -3232,7 +3221,7 @@ Flow(.logical_datapath = sw._uuid,
  .io_port  = None,
  .controller_meter = meter,
  .stage_hint   = 0) :-
-LBVIPWithStatus[lbvip@{.lb = lb}],
+LBVIPWithStatus(lbvip@{.lb = lb}, up_backends),
 var priority = if (lbvip.vip_port != 0) { 120 } else { 110 },
 (var actions, var reject) = {
 /* Store the original destination IP to be used when generating
@@ -3252,7 +3241,7 @@ Flow(.logical_datapath = sw._uuid,
 ""
 };
 
-build_lb_vip_actions(lbvip, 

[ovs-dev] [PATCH ovn v2 07/10] ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router.

2021-09-07 Thread Ben Pfaff
Signed-off-by: Ben Pfaff 
---
 northd/lrouter.dl | 9 -
 1 file changed, 9 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index c0ec6be47..582e07654 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -488,7 +488,6 @@ typedef Router = Router {
 is_gateway: bool,
 nats:   Vec,
 snat_ips:   Map>,
-lbs:Vec>,
 lb_ipv4s_routable:  Set,
 lb_ipv4s_unroutable: Set,
 lb_ipv6s_routable:  Set,
@@ -515,7 +514,6 @@ Router[Router{
 .is_gateway  = lr.options.contains_key(i"chassis"),
 .nats= nats,
 .snat_ips= snat_ips,
-.lbs = lbs,
 .lb_ipv4s_routable = lb_ipv4s_routable,
 .lb_ipv4s_unroutable = lb_ipv4s_unroutable,
 .lb_ipv6s_routable = lb_ipv6s_routable,
@@ -553,13 +551,6 @@ Router[Router{
  lb_ipv6s_routable, lb_ipv6s_unroutable)
 }.
 
-/* RouterLB: many-to-many relation between logical routers and nb::LB */
-relation RouterLB(router: Intern, lb: Intern)
-
-RouterLB(router, lb) :-
-router in (.lbs = lbs),
-var lb = FlatMap(lbs).
-
 /* Router-to-router logical port connections */
 relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: istring)
 
-- 
2.31.1



[ovs-dev] [PATCH ovn v2 04/10] ovn-northd-ddlog: Reverse order of joins for connection tracking flows.

2021-09-07 Thread Ben Pfaff
DDlog evaluates rules in the order given by syntax.  The rules for
load balancers all first evaluated a Router or Switch, then joined
that against the load balancers.  However, the expensive logic was
all on the load balancers.  This meant that the load balancer logic
was happening many times, once per switch or router that contained
it.  When a single load balancer was part of many switches or
routers, this did a lot of redundant processing.  This commit
reverses the join order, which speeds up processing a lot.

This commit looks big because it also converts a lot of rules from
the FLWOR syntax to traditional Datalog-style syntax.  This is not
strictly necessary, but the Datalog-style syntax is more versatile
(it supports FlatMap and aggregation), so I tend to make that sort
of change as I refactor things.

Signed-off-by: Ben Pfaff 
---
 northd/lrouter.dl|  11 --
 northd/ovn_northd.dl | 419 +--
 2 files changed, 209 insertions(+), 221 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index 17d803292..c0ec6be47 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -560,17 +560,6 @@ RouterLB(router, lb) :-
 router in (.lbs = lbs),
 var lb = FlatMap(lbs).
 
-/* Load balancer VIPs associated with routers */
-relation RouterLBVIP(
-router: Intern,
-lb: Intern,
-vip: istring,
-backends: istring)
-
-RouterLBVIP(router, lb, vip, backends) :-
-RouterLB(router, lb@({.lb = ::Load_Balancer{.vips = 
vips}})),
-(var vip, var backends) = FlatMap(vips).
-
 /* Router-to-router logical port connections */
 relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: istring)
 
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 4348171ba..f678a4f50 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3232,9 +3232,7 @@ Flow(.logical_datapath = sw._uuid,
  .io_port  = None,
  .controller_meter = meter,
  .stage_hint   = 0) :-
-sw in (),
 LBVIPWithStatus[lbvip@{.lb = lb}],
-sw.load_balancer.contains(lb._uuid),
 var priority = if (lbvip.vip_port != 0) { 120 } else { 110 },
 (var actions, var reject) = {
 /* Store the original destination IP to be used when generating
@@ -3256,12 +3254,14 @@ Flow(.logical_datapath = sw._uuid,
 
 build_lb_vip_actions(lbvip, s_SWITCH_OUT_QOS_MARK(), actions0 ++ 
actions1)
 },
+var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, 
lbvip.vip_port, lb.protocol, false, false, false),
+sw in (),
+sw.load_balancer.contains(lb._uuid),
 var meter = if (reject) {
 sw.copp.get(cOPP_REJECT())
 } else {
 None
-},
-var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, 
lbvip.vip_port, lb.protocol, false, false, false).
+}.
 
 /* Ingress Pre-Hairpin/Nat-Hairpin/Hairpin tabled (Priority 0).
  * Packets that don't need hairpinning should continue processing.
@@ -5657,6 +5657,26 @@ for (RouterPortNetworksIPv4Addr(.port = {.lrp 
= lrp,
 }
 }
 
+LogicalRouterNdFlow(.lr = r,
+.lrp = Some{lrp},
+.action = i"nd_na",
+.ip = ip,
+.sn_ip = false,
+.mac = rEG_INPORT_ETH_ADDR(),
+.extra_match = residence_check,
+.drop = false,
+.priority = 90,
+.stage_hint = 0) :-
+LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
+Some{(IPv6{var ip}, _)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+r in (),
+r.load_balancer.contains(lb._uuid),
+(.router = r, .lrp = lrp, .is_redirect = is_redirect),
+var residence_check = match (is_redirect) {
+true -> 
Some{i"is_chassis_resident(${json_escape(chassis_redirect_name(lrp.name))})"},
+false -> None
+}.
+
 for ((.lrp = lrp,
  .router = router@{._uuid = lr_uuid},
  .json_name = json_name,
@@ -5675,22 +5695,7 @@ var residence_check = match (is_redirect) {
  .extra_match = residence_check,
  .drop = false,
  .priority = 90,
- .stage_hint = 0);
-for (RouterLBVIP(.router = {._uuid= lr_uuid}, .vip = vip)) {
-Some{(var ip_address, _)} = 
ip_address_and_port_from_lb_key(vip.ival()) in {
-IPv6{var ipv6} = ip_address in
-LogicalRouterNdFlow(.lr = router,
-.lrp = Some{lrp},
-.action = i"nd_na",
-.ip = ipv6,
-.sn_ip = false,
-.mac = rEG_INPORT_ETH_ADDR(),
-.extra_match = residence_check,
-.drop = false,
-

[ovs-dev] [PATCH ovn v2 05/10] ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports.

2021-09-07 Thread Ben Pfaff
The LBVIPs already contain parsed versions of the load balancer
keys, but the code was parsing it redundantly.

Slight performance improvement but much cleaner.
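The pattern is parse-once-at-origination; a hypothetical Python sketch (the real `ip_address_and_port_from_lb_key` also handles bracketed IPv6, omitted here): store the parsed fields in the record so consumers pattern-match them instead of re-running the parser.

```python
# Sketch: LBVIP carries vip_addr/vip_port alongside the raw key, parsed
# exactly once when the record is created.
from dataclasses import dataclass

def parse_lb_key(key):       # simplified stand-in, IPv4/hostname keys only
    host, _, port = key.rpartition(":")
    return (host, int(port)) if host else (key, 0)

@dataclass(frozen=True)
class LBVIP:
    vip_key: str
    vip_addr: str
    vip_port: int

def make_lbvip(key):
    addr, port = parse_lb_key(key)    # the only parse site
    return LBVIP(key, addr, port)

vip = make_lbvip("10.0.0.1:443")
assert (vip.vip_addr, vip.vip_port) == ("10.0.0.1", 443)
assert make_lbvip("10.0.0.1") == LBVIP("10.0.0.1", "10.0.0.1", 0)
```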

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 37 -
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index f678a4f50..bf2192f7e 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -5667,8 +5667,7 @@ LogicalRouterNdFlow(.lr = r,
 .drop = false,
 .priority = 90,
 .stage_hint = 0) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
-Some{(IPv6{var ip}, _)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = IPv6{ip}, .lb = lb}],
 r in (),
 r.load_balancer.contains(lb._uuid),
 (.router = r, .lrp = lrp, .is_redirect = is_redirect),
@@ -6685,8 +6684,7 @@ Flow(.logical_datapath = r._uuid,
  .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
-Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}],
 var prio = if (port != 0) { 110 } else { 100 },
 var proto = match (lb.protocol) {
 Some{proto} -> proto,
@@ -6719,8 +6717,7 @@ Flow(.logical_datapath = r._uuid,
  .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
-Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}],
 var proto = match (lb.protocol) {
 Some{proto} -> proto,
 _ -> i"tcp"
@@ -6765,8 +6762,7 @@ Flow(.logical_datapath = r._uuid,
  .stage_hint   = stage_hint(lb._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
-Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}],
 var proto = match (lb.protocol) {
 Some{proto} -> proto,
 _ -> i"tcp"
@@ -6791,28 +6787,19 @@ Flow(.logical_datapath = r._uuid,
  .stage_hint   = stage_hint(lb._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb, .backend_ips = backends}],
-Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb, .backends = 
backends}],
 var proto = match (lb.protocol) {
 Some{proto} -> proto,
 _ -> i"tcp"
 },
-var conds = {
-var conds = vec_empty();
-for (ip_str in backends.split(",")) {
-match (ip_address_and_port_from_lb_key(ip_str)) {
-None -> () /* FIXME: put a break here */,
-Some{(ip_, port_)} -> conds.push(
-"(${ip_.ipX()}.src == ${ip_}" ++
-if (port_ != 0) {
-" && ${proto}.src == ${port_})"
-} else {
-")"
-})
-}
+var conds = backends.map(|b| {
+var port_match = if (b.port != 0) {
+" && ${proto}.src == ${b.port}"
+} else {
+""
 };
-conds.join(" || ")
-},
+"(${b.ip.ipX()}.src == ${b.ip}" ++ port_match ++ ")"
+}).join(" || "),
 conds != "",
 r in (),
 Some{var gwport} = r.l3dgw_ports.nth(0),
-- 
2.31.1



[ovs-dev] [PATCH ovn v2 03/10] ovn-northd-ddlog: Derive load balancer IP addresses in new LoadBalancer.

2021-09-07 Thread Ben Pfaff
This makes the get_router_load_balancer_ips() function much faster.
This function is a hot path for the use of load balancers, so it
improves performance overall when they are in use.
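An approximate Python analogue of the new `LoadBalancer` relation (illustrative; it assumes OVN's convention that IPv6 VIP keys are bracketed, as in `[fd00::1]:80`): walk the VIP map once and bucket addresses into IPv4/IPv6 sets, so the hot path becomes set unions instead of repeated parsing.

```python
# Sketch: classify each VIP key ("IP", "IP:port", or "[IPv6]:port") once.
import ipaddress

def lb_ip_sets(vips):
    ipv4s, ipv6s = set(), set()
    for key in vips:
        if key.startswith("["):                 # "[IPv6]:port"
            addr = key[1:key.index("]")]
        else:                                   # "IP" or "IP:port"
            addr = key.rpartition(":")[0] or key
        ip = ipaddress.ip_address(addr)
        (ipv4s if ip.version == 4 else ipv6s).add(str(ip))
    return ipv4s, ipv6s

v4, v6 = lb_ip_sets({"192.168.2.1:80": "", "[fd00::1]:80": "", "10.0.0.5": ""})
assert v4 == {"192.168.2.1", "10.0.0.5"} and v6 == {"fd00::1"}
```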

Signed-off-by: Ben Pfaff 
---
 northd/lrouter.dl| 68 +--
 northd/ovn_northd.dl | 76 +++-
 2 files changed, 91 insertions(+), 53 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index b6e752f7c..17d803292 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -402,14 +402,13 @@ LogicalRouterSnatIPs(lr._uuid, map_empty()) :-
 lr in nb::Logical_Router(),
 not LogicalRouterSnatIP(.lr = lr._uuid).
 
-relation LogicalRouterLB(lr: uuid, nat: Intern)
-
+relation LogicalRouterLB(lr: uuid, nat: Intern)
 LogicalRouterLB(lr, lb) :-
 nb::Logical_Router(._uuid = lr, .load_balancer = lbs),
 var lb_uuid = FlatMap(lbs),
-lb in ::Load_Balancer(._uuid = lb_uuid).
+lb in (.lb = ::Load_Balancer{._uuid = lb_uuid}).
 
-relation LogicalRouterLBs(lr: uuid, nat: Vec>)
+relation LogicalRouterLBs(lr: uuid, nat: Vec>)
 
 LogicalRouterLBs(lr, lbs) :-
  LogicalRouterLB(lr, lb),
@@ -448,6 +447,31 @@ LogicalRouterCopp0(lr, meters) :-
 
 function chassis_redirect_name(port_name: istring): string = "cr-${port_name}"
 
+typedef LoadBalancer = LoadBalancer {
+lb: Intern,
+ipv4s: Set,
+ipv6s: Set,
+routable: bool
+}
+
+relation LoadBalancer[Intern]
+LoadBalancer[LoadBalancer{lb, ipv4s, ipv6s, routable}.intern()] :-
+nb::Load_Balancer[lb],
+var routable = lb.options.get_bool_def(i"add_route", false),
+(var ipv4s, var ipv6s) = {
+var ipv4s = set_empty();
+var ipv6s = set_empty();
+for ((vip, _) in lb.vips) {
+/* node->key contains IP:port or just IP. */
+match (ip_address_and_port_from_lb_key(vip.ival())) {
+None -> (),
+Some{(IPv4{ipv4}, _)} -> ipv4s.insert(i"${ipv4}"),
+Some{(IPv6{ipv6}, _)} -> ipv6s.insert(i"${ipv6}"),
+}
+};
+(ipv4s, ipv6s)
+}.
+
 typedef Router = Router {
 /* Fields copied from nb::Logical_Router. */
 _uuid:  uuid,
@@ -464,7 +488,11 @@ typedef Router = Router {
 is_gateway: bool,
 nats:   Vec,
 snat_ips:   Map>,
-lbs:Vec>,
+lbs:Vec>,
+lb_ipv4s_routable:  Set,
+lb_ipv4s_unroutable: Set,
+lb_ipv6s_routable:  Set,
+lb_ipv6s_unroutable: Set,
 mcast_cfg:  Intern,
 learn_from_arp_request: bool,
 force_lb_snat: bool,
@@ -488,6 +516,10 @@ Router[Router{
 .nats= nats,
 .snat_ips= snat_ips,
 .lbs = lbs,
+.lb_ipv4s_routable = lb_ipv4s_routable,
+.lb_ipv4s_unroutable = lb_ipv4s_unroutable,
+.lb_ipv6s_routable = lb_ipv6s_routable,
+.lb_ipv6s_unroutable = lb_ipv6s_unroutable,
 .mcast_cfg   = mcast_cfg,
 .learn_from_arp_request = learn_from_arp_request,
 .force_lb_snat = force_lb_snat,
@@ -501,10 +533,28 @@ Router[Router{
 LogicalRouterCopp(lr._uuid, copp),
 mcast_cfg in (.datapath = lr._uuid),
 var learn_from_arp_request = 
lr.options.get_bool_def(i"always_learn_from_arp_request", true),
-var force_lb_snat = lb_force_snat_router_ip(lr.options).
+var force_lb_snat = lb_force_snat_router_ip(lr.options),
+(var lb_ipv4s_routable, var lb_ipv4s_unroutable,
+ var lb_ipv6s_routable, var lb_ipv6s_unroutable) = {
+var lb_ipv4s_routable = set_empty();
+var lb_ipv4s_unroutable = set_empty();
+var lb_ipv6s_routable = set_empty();
+var lb_ipv6s_unroutable = set_empty();
+for (lb in lbs) {
+if (lb.routable) {
+lb_ipv4s_routable = lb_ipv4s_routable.union(lb.ipv4s);
+lb_ipv6s_routable = lb_ipv6s_routable.union(lb.ipv6s);
+} else {
+lb_ipv4s_unroutable = lb_ipv4s_unroutable.union(lb.ipv4s);
+lb_ipv6s_unroutable = lb_ipv6s_unroutable.union(lb.ipv6s);
+}
+};
+(lb_ipv4s_routable, lb_ipv4s_unroutable,
+ lb_ipv6s_routable, lb_ipv6s_unroutable)
+}.
 
 /* RouterLB: many-to-many relation between logical routers and nb::LB */
-relation RouterLB(router: Intern, lb: Intern)
+relation RouterLB(router: Intern, lb: Intern)
 
 RouterLB(router, lb) :-
 router in (.lbs = lbs),
@@ -513,12 +563,12 @@ RouterLB(router, lb) :-
 /* Load balancer VIPs associated with routers */
 relation RouterLBVIP(
 router: Intern,
-lb: Intern,
+lb: Intern,
 vip: istring,
 backends: istring)
 
 RouterLBVIP(router, lb, vip, backends) :-
-RouterLB(router, lb@(::Load_Balancer{.vips = vips})),
+RouterLB(router, lb@({.lb = ::Load_Balancer{.vips = 
vips}})),
 (var vip, var backends) = FlatMap(

[ovs-dev] [PATCH ovn v2 02/10] ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient.

2021-09-07 Thread Ben Pfaff
DDlog can index equality joins within an expression like this, but not
a conditional expression that follows such an expression.  This doesn't
actually matter much in this case because ordinarily the expression
will be true (most router ports are enabled).
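The distinction the commit message draws can be illustrated outside DDlog. In this hedged Python sketch (illustrative only), an equality condition folded into the join pattern is served by a hash index on the joined-on value, while a predicate evaluated after the join degenerates into filtering every matched row.

```python
# Hedged illustration: pattern-style equality join (indexed lookup)
# versus a post-join predicate (scan and filter).
from collections import defaultdict

ports = [("lrp1", True), ("lrp2", False), ("lrp3", True)]  # (name, enabled)

# Pattern-style: index once by the value the rule matches on.
by_enabled = defaultdict(list)
for name, enabled in ports:
    by_enabled[enabled].append(name)
enabled_ports = by_enabled[True]       # single indexed lookup

# Filter-style: touch every row, then discard the disabled ones.
also_enabled = [name for name, enabled in ports if enabled]
```

Both produce the same rows; only the amount of work per lookup differs.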

Signed-off-by: Ben Pfaff 
---
 northd/lrouter.dl|  6 --
 northd/ovn_northd.dl | 16 +---
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index cc3dced5f..b6e752f7c 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -596,7 +596,8 @@ typedef RouterPort = RouterPort {
 peer: RouterPeer,
 mcast_cfg:Intern,
 sb_options:   Map,
-has_bfd:  bool
+has_bfd:  bool,
+enabled:  bool
 }
 
 relation RouterPort[Intern]
@@ -610,7 +611,8 @@ RouterPort[RouterPort{
.peer   = peer,
.mcast_cfg  = mcast_cfg,
.sb_options = sb_options,
-   .has_bfd= has_bfd
+   .has_bfd= has_bfd,
+   .enabled= lrp.is_enabled()
}.intern()] :-
 lrp in ::Logical_Router_Port(),
 Some{var networks} = extract_lrp_networks(lrp.mac.ival(), 
lrp.networks.map(ival)),
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 4402b17d2..c41a79b84 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -4609,8 +4609,7 @@ relation (
.reachable_ips_v6 = reachable_ips_v6,
.unreachable_ips_v4 = unreachable_ips_v4,
.unreachable_ips_v6 = unreachable_ips_v6) :-
-port in (.peer = Some{rp}),
-rp.is_enabled(),
+port in (.peer = Some{rp@{.enabled = true}}),
 (var reachable_ips_v4, var reachable_ips_v6, var unreachable_ips_v4, var 
unreachable_ips_v6) = get_arp_forward_ips(rp).
 
 /* Packets received from VXLAN tunnels have already been through the
@@ -4632,8 +4631,7 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-sp in (.sw = sw, .peer = Some{rp}),
-rp.is_enabled(),
+sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .reachable_ips_v4 = ips_v4),
 var ipv4 = FlatMap(ips_v4),
 var mc_flood_l2 = json_escape(mC_FLOOD_L2().0).
@@ -4650,8 +4648,7 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-sp in (.sw = sw, .peer = Some{rp}),
-rp.is_enabled(),
+sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .reachable_ips_v6 = ips_v6),
 var ipv6 = FlatMap(ips_v6),
 var mc_flood_l2 = json_escape(mC_FLOOD_L2().0).
@@ -4664,12 +4661,10 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
-sp in (.sw = sw, .peer = Some{rp}),
-rp.is_enabled(),
+sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .unreachable_ips_v4 = ips_v4),
 var ipv4 = FlatMap(ips_v4),
 var flood = json_escape(mC_FLOOD().0).
-
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
  .priority = 90,
@@ -4678,8 +4673,7 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-sp in (.sw = sw, .peer = Some{rp}),
-rp.is_enabled(),
+sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .unreachable_ips_v6 = ips_v6),
 var ipv6 = FlatMap(ips_v6),
 var flood = json_escape(mC_FLOOD().0).
-- 
2.31.1



[ovs-dev] [PATCH ovn v2 00/10] 3x performance improvement for ddlog with load balancer benchmark

2021-09-07 Thread Ben Pfaff
With and without these patches, I see the following performance when I
run the load-balancer-heavy benchmark.  The measurements include cold
start with all the load balancers, then deleting the sctp load balancer
and waiting for it to finish, then the same with the tcp load balancer,
then the same with the udp load balancer.  The measurements only include
ddlog time; there is additional overhead talking to the database, but
that's a constant factor.

Without patches:   783 CPU seconds, 23.5 GB RAM, 13:06 elapsed time.
With patches:  224 CPU seconds, 14.3 GB RAM,  3:45 elapsed time.

I am continuing to work on performance.

These patches are also available in:
https://github.com/blp/ovs-reviews/tree/ovn-memory-8

v1->v2:
  - Add two new patches.
  - Rebase and rerun benchmarks (revisions reported above).

Ben Pfaff (9):
  ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient.
  ovn-northd-ddlog: Derive load balancer IP addresses in new
LoadBalancer.
  ovn-northd-ddlog: Reverse order of joins for connection tracking
flows.
  ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports.
  ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends
string.
  ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router.
  ovn-northd-ddlog: Intern strings before joining when possible.
  ovn-northd-ddlog: Avoid map(ival) for ARP flows.
  ovn-northd-ddlog: Avoid unnecessary joins for SwitchPortARPForwards.

Leonid Ryzhyk (1):
  ovn-northd-ddlog: Intern all strings in OVSDB tables.

 configure.ac |2 +-
 manpages.mk  |1 -
 northd/copp.dl   |   32 +-
 northd/helpers.dl|   14 +-
 northd/ipam.dl   |   17 +-
 northd/lrouter.dl|  213 +++--
 northd/lswitch.dl|  177 ++--
 northd/multicast.dl  |   44 +-
 northd/ovn-nb.dlopts |1 +
 northd/ovn-sb.dlopts |1 +
 northd/ovn.dl|7 +
 northd/ovn_northd.dl | 2021 --
 northd/ovsdb2ddlog2c |6 +-
 tests/ovn-ic.at  |8 +-
 tests/ovn-northd.at  |   20 +-
 tests/ovn.at |6 +-
 16 files changed, 1280 insertions(+), 1290 deletions(-)

-- 
2.31.1



Re: [ovs-dev] [PATCH] json: Optimize string serialization.

2021-08-30 Thread Ben Pfaff
On Tue, Aug 24, 2021 at 11:07:22PM +0200, Ilya Maximets wrote:
> +count = 0;
> +start = string;
>  while ((c = *string++) != '\0') {
> -escape = chars_escaping[c];
> -while ((c2 = *escape++) != '\0') {
> -ds_put_char(ds, c2);
> +if (c >= ' ' && c != '\"' && c != '\\') {
> +count++;
> +continue;
> +} else {
> +if (count) {
> +ds_put_buffer(ds, start, count);
> +count = 0;
> +}
> +start = string;
> +escape = chars_escaping[c];
> +while ((c2 = *escape++) != '\0') {
> +ds_put_char(ds, c2);
> +}

The "continue" above isn't needed.
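For readers following the optimization under review: the C patch replaces character-by-character escaping with batched copies of maximal unescaped runs. A hedged Python rendering of the same idea (illustrative only; the authoritative version is the C being reviewed):

```python
# Hedged sketch of the batching idea: flush maximal runs of characters
# that need no escaping in one append, instead of one char at a time.
ESCAPES = {'"': '\\"', '\\': '\\\\', '\n': '\\n', '\t': '\\t'}

def json_escape(s):
    out, start = [], 0
    for i, c in enumerate(s):
        if c in ESCAPES or c < ' ':
            if i > start:
                out.append(s[start:i])      # flush the unescaped run
            out.append(ESCAPES.get(c, '\\u%04x' % ord(c)))
            start = i + 1
    out.append(s[start:])
    return ''.join(out)
```

The common case (no escapes at all) reduces to a single append of the whole string.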


[ovs-dev] [PATCH ovn 8/8] ovn-northd-ddlog: Intern strings before joining when possible.

2021-08-27 Thread Ben Pfaff
By interning earlier, we do less copying across the joins with
Router and Switch, because DDlog always copies by value and
copying an interned string is cheap.
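The cost argument here maps directly onto Python's string interning, which gives a convenient hedged sketch (illustrative only): equal interned strings share one object, so the copies made while joining are reference copies and equality can be an identity check.

```python
# Hedged sketch of interning's effect: sys.intern returns one canonical
# object per distinct string value, so copies are reference copies.
import sys

prefix = "outport = "
a = sys.intern(prefix + "flood; output;")
b = sys.intern(prefix + "flood; output;")
same_object = a is b    # both names point at one interned string
```

DDlog's `Intern<>` plays the same role: interning before the join means the join copies cheap references instead of whole strings.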

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 57 +++-
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 8831da442..84d1fde0d 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3216,14 +3216,14 @@ function build_lb_vip_actions(lbvip: Intern,
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_STATEFUL(),
  .priority = priority,
- .__match  = __match.intern(),
- .actions  = actions.intern(),
+ .__match  = __match,
+ .actions  = actions,
  .io_port  = None,
  .controller_meter = meter,
  .stage_hint   = 0) :-
 LBVIPWithStatus(lbvip@{.lb = lb}, up_backends),
 var priority = if (lbvip.vip_port != 0) { 120 } else { 110 },
-(var actions, var reject) = {
+(var actions0, var reject) = {
 /* Store the original destination IP to be used when generating
  * hairpin flows.
  */
@@ -3243,7 +3243,8 @@ Flow(.logical_datapath = sw._uuid,
 
 build_lb_vip_actions(lbvip, up_backends, s_SWITCH_OUT_QOS_MARK(), 
actions0 ++ actions1)
 },
-var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, 
lbvip.vip_port, lb.protocol, false, false, false),
+var actions = actions0.intern(),
+var __match = ("ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, 
lbvip.vip_port, lb.protocol, false, false, false)).intern(),
 sw in (),
 sw.load_balancer.contains(lb._uuid),
 var meter = if (reject) {
@@ -4610,10 +4611,10 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
+var mc_flood_l2 = json_escape(mC_FLOOD_L2().0),
 sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .reachable_ips_v4 = ips_v4),
-var ipv4 = FlatMap(ips_v4),
-var mc_flood_l2 = json_escape(mC_FLOOD_L2().0).
+var ipv4 = FlatMap(ips_v4).
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
  .priority = 80,
@@ -4627,35 +4628,35 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
+var mc_flood_l2 = json_escape(mC_FLOOD_L2().0),
 sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .reachable_ips_v6 = ips_v6),
-var ipv6 = FlatMap(ips_v6),
-var mc_flood_l2 = json_escape(mC_FLOOD_L2().0).
+var ipv6 = FlatMap(ips_v6).
 
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
  .priority = 90,
  .__match  = i"${fLAGBIT_NOT_VXLAN()} && arp.op == 1 && arp.tpa == 
${ipv4}",
- .actions  = i"outport = ${flood}; output;",
+ .actions  = actions,
  .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
+var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;",
 sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .unreachable_ips_v4 = ips_v4),
-var ipv4 = FlatMap(ips_v4),
-var flood = json_escape(mC_FLOOD().0).
+var ipv4 = FlatMap(ips_v4).
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
  .priority = 90,
  .__match  = i"${fLAGBIT_NOT_VXLAN()} && nd_ns && nd.target == 
${ipv6}",
- .actions  = i"outport = ${flood}; output;",
+ .actions  = actions,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
+var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;",
 sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .unreachable_ips_v6 = ips_v6),
-var ipv6 = FlatMap(ips_v6),
-var flood = json_escape(mC_FLOOD().0).
+var ipv6 = FlatMap(ips_v6).
 
 for (SwitchPortNewDynamicAddress(.port = {.lsp = lsp, .json_name = 
json_name, .sw = sw},
  .address = Some{addrs})
@@ -6701,7 +6702,7 @@ Flow(.logical_datapath = r._uuid,
 Flow(.logical_datapath = r._uuid,
  .stage= s_ROUTER_IN_DNAT(),
  .priority = prio,
- .__match  = __match.intern(),
+ .__match  = __match,
  .actions  = actions,
  .stage_hint   = 0,
  .io_port  = None,
@@ -6718,13 +6719,13 @@ Flow(.logical_datapath = r._uuid,
 ""
 },
 var prio = if (port != 0) { 120 } else { 110 },
-var match0 = "ct.est && &qu

[ovs-dev] [PATCH ovn 6/8] ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends string.

2021-08-27 Thread Ben Pfaff
There was only one use for the 'backends' map, which was to be converted
to a string that listed all the backends that were up, so we might as
well do that at its point of origination.  At the same time, everything
else in LBVIPWithStatus was just a copy of the underlying LBVIP, so we
might as well just reference it.

Only a slight improvement to performance.
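The "do it at the point of origination" idea can be sketched in Python (hedged, illustrative only): format the live backends into their final `ip[:port]` comma-separated string once, where backend status is computed, instead of carrying a map around and re-deriving the string at every use site.

```python
# Hedged sketch: build the up-backends string once at its origin.
def up_backends(backends):
    """backends: iterable of (ip, port, up) tuples."""
    parts = []
    for ip, port, up in backends:
        if not up:
            continue
        parts.append(ip if port == 0 else f"{ip}:{port}")
    return ",".join(parts)
```

Downstream consumers then test `up_backends == ""` instead of re-walking a backend map.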

Signed-off-by: Ben Pfaff 
---
 northd/lswitch.dl| 32 ++--
 northd/ovn_northd.dl | 29 +
 2 files changed, 23 insertions(+), 38 deletions(-)

diff --git a/northd/lswitch.dl b/northd/lswitch.dl
index ad6475a91..69b1c6eb5 100644
--- a/northd/lswitch.dl
+++ b/northd/lswitch.dl
@@ -445,25 +445,21 @@ function default_protocol(protocol: Option): 
istring = {
 }
 }
 
-typedef LBVIPWithStatus = LBVIPWithStatus {
-lb: Intern,
-vip_key: istring,
-backend_ips: istring,
-health_check: Option>,
-vip_addr: v46_ip,
-vip_port: bit<16>,
-backends: Map
-}
-relation LBVIPWithStatus[Intern]
-
-LBVIPWithStatus[LBVIPWithStatus{lb, vip_key, backend_ips, health_check, 
vip_addr, vip_port, map_empty()}.intern()] :-
-(lb, vip_key, backend_ips, health_check, vip_addr, vip_port, 
vec_empty()).
-LBVIPWithStatus[LBVIPWithStatus{lb, vip_key, backend_ips, health_check, 
vip_addr, vip_port, backends_with_status}.intern()] :-
-(lb, vip_key, backend_ips, health_check, vip_addr, vip_port, 
backends),
-var backend = FlatMap(backends),
+relation LBVIPWithStatus(
+lbvip: Intern,
+up_backends: istring)
+LBVIPWithStatus(lbvip, i"") :-
+lbvip in (.backends = vec_empty()).
+LBVIPWithStatus(lbvip, up_backends) :-
 LBVIPBackendStatus(lbvip, backend, up),
-var backends_with_status = ((backend, up)).group_by((lb, vip_key, 
backend_ips, health_check,
- vip_addr, 
vip_port)).to_map().
+var up_backends = ((backend, up)).group_by(lbvip).to_vec().filter_map(|x| {
+(LBVIPBackend{var ip, var port, _}, var up) = x;
+match ((up, port)) {
+(true, 0) -> Some{"${ip.to_bracketed_string()}"},
+(true, _) -> Some{"${ip.to_bracketed_string()}:${port}"},
+_ -> None
+}
+}).join(",").intern().
 
 /* Maps from a load-balancer virtual IP backend to whether it's up or not.
  *
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index de2b0718c..8831da442 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3177,7 +3177,7 @@ function get_match_for_lb_key(ip_address: v46_ip,
 }
 /* New connections in Ingress table. */
 
-function ct_lb(backends: string,
+function ct_lb(backends: istring,
selection_fields: Set, protocol: Option): 
string {
 var args = vec_with_capacity(2);
 args.push("backends=${backends}");
@@ -3198,18 +3198,11 @@ function ct_lb(backends: string,
 
 "ct_lb(" ++ args.join("; ") ++ ");"
 }
-function build_lb_vip_actions(lbvip: Intern,
+function build_lb_vip_actions(lbvip: Intern,
+  up_backends: istring,
   stage: Intern,
   actions0: string): (string, bool) {
-var up_backends = vec_with_capacity(lbvip.backends.size());
-for (pair in lbvip.backends) {
-(var backend, var up) = pair;
-if (up) {
-up_backends.push((backend.ip, backend.port))
-}
-};
-
-if (up_backends.is_empty()) {
+if (up_backends == i"") {
 if (lbvip.lb.options.get_bool_def(i"reject", false)) {
 return ("reg0 = 0; reject { outport <-> inport; 
${next_to_stage(stage)};};", true)
 } else if (lbvip.health_check.is_some()) {
@@ -3217,11 +3210,7 @@ function build_lb_vip_actions(lbvip: 
Intern,
 } // else fall through
 };
 
-var up_backends_s = up_backends.sort_imm().map(|x| match (x) {
-(ip, 0) -> "${ip.to_bracketed_string()}",
-(ip, port) -> "${ip.to_bracketed_string()}:${port}"
-}).join(",");
-var actions = ct_lb(up_backends_s, lbvip.lb.selection_fields, 
lbvip.lb.protocol);
+var actions = ct_lb(up_backends, lbvip.lb.selection_fields, 
lbvip.lb.protocol);
 (actions0 ++ actions, false)
 }
 Flow(.logical_datapath = sw._uuid,
@@ -3232,7 +3221,7 @@ Flow(.logical_datapath = sw._uuid,
  .io_port  = None,
  .controller_meter = meter,
  .stage_hint   = 0) :-
-LBVIPWithStatus[lbvip@{.lb = lb}],
+LBVIPWithStatus(lbvip@{.lb = lb}, up_backends),
 var priority = if (lbvip.vip_port != 0) { 120 } else { 110 },
 (var actions, var reject) = {
 /* Store the original destination IP to be used when generating
@@ -3252,7 +3241,7 @@ Flow(.logical_datapath = sw._uuid,
 ""
 };
 
-build_lb_vip_actions(lbvip, 

[ovs-dev] [PATCH ovn 7/8] ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router.

2021-08-27 Thread Ben Pfaff
Signed-off-by: Ben Pfaff 
---
 northd/lrouter.dl | 9 -
 1 file changed, 9 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index c0ec6be47..582e07654 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -488,7 +488,6 @@ typedef Router = Router {
 is_gateway: bool,
 nats:   Vec,
 snat_ips:   Map>,
-lbs:Vec>,
 lb_ipv4s_routable:  Set,
 lb_ipv4s_unroutable: Set,
 lb_ipv6s_routable:  Set,
@@ -515,7 +514,6 @@ Router[Router{
 .is_gateway  = lr.options.contains_key(i"chassis"),
 .nats= nats,
 .snat_ips= snat_ips,
-.lbs = lbs,
 .lb_ipv4s_routable = lb_ipv4s_routable,
 .lb_ipv4s_unroutable = lb_ipv4s_unroutable,
 .lb_ipv6s_routable = lb_ipv6s_routable,
@@ -553,13 +551,6 @@ Router[Router{
  lb_ipv6s_routable, lb_ipv6s_unroutable)
 }.
 
-/* RouterLB: many-to-many relation between logical routers and nb::LB */
-relation RouterLB(router: Intern, lb: Intern)
-
-RouterLB(router, lb) :-
-router in (.lbs = lbs),
-var lb = FlatMap(lbs).
-
 /* Router-to-router logical port connections */
 relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: istring)
 
-- 
2.31.1



[ovs-dev] [PATCH ovn 5/8] ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports.

2021-08-27 Thread Ben Pfaff
The LBVIPs already contain parsed versions of the load balancer
keys, but the code was parsing them redundantly.

A slight performance improvement, but much cleaner.
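The parse-once pattern the commit applies can be sketched in Python (hedged, illustrative only; field names mirror the DDlog record but the helper is an assumption): keep the parsed address and port next to the raw VIP key, so downstream rules pattern-match the parsed fields instead of re-running the parser per rule.

```python
# Hedged sketch: parse the VIP key once at the source record.
# Handles only the "IPv4[:port]" form, for brevity.
from typing import NamedTuple

class LBVip(NamedTuple):
    vip_key: str      # raw "IP[:port]" key
    vip_addr: str     # parsed once
    vip_port: int

def make_vip(key):
    host, _, port = key.partition(":")
    return LBVip(key, host, int(port) if port else 0)
```

Every rule that previously called the parser now just reads `vip_addr` and `vip_port`.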

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 37 -
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index b72914d73..de2b0718c 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -5667,8 +5667,7 @@ LogicalRouterNdFlow(.lr = r,
 .drop = false,
 .priority = 90,
 .stage_hint = 0) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
-Some{(IPv6{var ip}, _)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = IPv6{ip}, .lb = lb}],
 r in (),
 r.load_balancer.contains(lb._uuid),
 (.router = r, .lrp = lrp, .is_redirect = is_redirect),
@@ -6685,8 +6684,7 @@ Flow(.logical_datapath = r._uuid,
  .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
-Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}],
 var prio = if (port != 0) { 110 } else { 100 },
 var proto = match (lb.protocol) {
 Some{proto} -> proto,
@@ -6719,8 +6717,7 @@ Flow(.logical_datapath = r._uuid,
  .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
-Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}],
 var proto = match (lb.protocol) {
 Some{proto} -> proto,
 _ -> i"tcp"
@@ -6765,8 +6762,7 @@ Flow(.logical_datapath = r._uuid,
  .stage_hint   = stage_hint(lb._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
-Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}],
 var proto = match (lb.protocol) {
 Some{proto} -> proto,
 _ -> i"tcp"
@@ -6791,28 +6787,19 @@ Flow(.logical_datapath = r._uuid,
  .stage_hint   = stage_hint(lb._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-LBVIP[lbvip@{.vip_key = lb_key, .lb = lb, .backend_ips = backends}],
-Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb, .backends = 
backends}],
 var proto = match (lb.protocol) {
 Some{proto} -> proto,
 _ -> i"tcp"
 },
-var conds = {
-var conds = vec_empty();
-for (ip_str in backends.split(",")) {
-match (ip_address_and_port_from_lb_key(ip_str)) {
-None -> () /* FIXME: put a break here */,
-Some{(ip_, port_)} -> conds.push(
-"(${ip_.ipX()}.src == ${ip_}" ++
-if (port_ != 0) {
-" && ${proto}.src == ${port_})"
-} else {
-")"
-})
-}
+var conds = backends.map(|b| {
+var port_match = if (b.port != 0) {
+" && ${proto}.src == ${b.port}"
+} else {
+""
 };
-conds.join(" || ")
-},
+"(${b.ip.ipX()}.src == ${b.ip}" ++ port_match ++ ")"
+}).join(" || "),
 conds != "",
 r in (),
 Some{var gwport} = r.l3dgw_ports.nth(0),
-- 
2.31.1



[ovs-dev] [PATCH ovn 4/8] ovn-northd-ddlog: Reverse order of joins for connection tracking flows.

2021-08-27 Thread Ben Pfaff
DDlog evaluates rules in the order given by syntax.  The rules for
load balancers all first evaluated a Router or Switch, then joined
that against the load balancers.  However, the expensive logic was
all on the load balancers.  This meant that the load balancer logic
was happening many times, once per switch or router that contained
it.  When a single load balancer was part of many switches or
routers, this did a lot of redundant processing.  This commit
reverses the join order, which speeds up processing a lot.

This commit looks big because it also converts a lot of rules from
the FLWOR syntax to traditional Datalog-style syntax.  This is not
completely needed, but the Datalog-style syntax is more versatile
(it supports FlatMap and aggregation), so I tend to make that sort
of change as I refactor things.
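The join-order argument above can be modeled concretely (hedged Python sketch, illustrative only): pay the expensive per-load-balancer cost once, then fan the result out to the N datapaths that reference the LB, instead of redoing it inside each reference.

```python
# Hedged model of the join-order fix: expensive work joined first,
# cheap fan-out to referencing datapaths second.
def expensive_lb_actions(lb, cost_counter):
    cost_counter[0] += 1               # count how often the cost is paid
    return f"ct_lb(backends={lb});"

def flows_lb_first(lbs, refs, cost_counter):
    """refs: (datapath, lb) pairs; the lb relation is joined first."""
    actions = {lb: expensive_lb_actions(lb, cost_counter) for lb in lbs}
    return [(dp, actions[lb]) for dp, lb in refs]
```

With the old order the counter would advance once per reference; with the LB joined first it advances once per load balancer.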

Signed-off-by: Ben Pfaff 
---
 northd/lrouter.dl|  11 --
 northd/ovn_northd.dl | 419 +--
 2 files changed, 209 insertions(+), 221 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index 17d803292..c0ec6be47 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -560,17 +560,6 @@ RouterLB(router, lb) :-
 router in (.lbs = lbs),
 var lb = FlatMap(lbs).
 
-/* Load balancer VIPs associated with routers */
-relation RouterLBVIP(
-router: Intern,
-lb: Intern,
-vip: istring,
-backends: istring)
-
-RouterLBVIP(router, lb, vip, backends) :-
-RouterLB(router, lb@({.lb = ::Load_Balancer{.vips = 
vips}})),
-(var vip, var backends) = FlatMap(vips).
-
 /* Router-to-router logical port connections */
 relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: istring)
 
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index e69ce6958..b72914d73 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3232,9 +3232,7 @@ Flow(.logical_datapath = sw._uuid,
  .io_port  = None,
  .controller_meter = meter,
  .stage_hint   = 0) :-
-sw in (),
 LBVIPWithStatus[lbvip@{.lb = lb}],
-sw.load_balancer.contains(lb._uuid),
 var priority = if (lbvip.vip_port != 0) { 120 } else { 110 },
 (var actions, var reject) = {
 /* Store the original destination IP to be used when generating
@@ -3256,12 +3254,14 @@ Flow(.logical_datapath = sw._uuid,
 
 build_lb_vip_actions(lbvip, s_SWITCH_OUT_QOS_MARK(), actions0 ++ 
actions1)
 },
+var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, 
lbvip.vip_port, lb.protocol, false, false, false),
+sw in (),
+sw.load_balancer.contains(lb._uuid),
 var meter = if (reject) {
 sw.copp.get(cOPP_REJECT())
 } else {
 None
-},
-var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, 
lbvip.vip_port, lb.protocol, false, false, false).
+}.
 
 /* Ingress Pre-Hairpin/Nat-Hairpin/Hairpin tabled (Priority 0).
  * Packets that don't need hairpinning should continue processing.
@@ -5657,6 +5657,26 @@ for (RouterPortNetworksIPv4Addr(.port = {.lrp 
= lrp,
 }
 }
 
+LogicalRouterNdFlow(.lr = r,
+.lrp = Some{lrp},
+.action = i"nd_na",
+.ip = ip,
+.sn_ip = false,
+.mac = rEG_INPORT_ETH_ADDR(),
+.extra_match = residence_check,
+.drop = false,
+.priority = 90,
+.stage_hint = 0) :-
+LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}],
+Some{(IPv6{var ip}, _)} = ip_address_and_port_from_lb_key(lb_key.ival()),
+r in (),
+r.load_balancer.contains(lb._uuid),
+(.router = r, .lrp = lrp, .is_redirect = is_redirect),
+var residence_check = match (is_redirect) {
+true -> 
Some{i"is_chassis_resident(${json_escape(chassis_redirect_name(lrp.name))})"},
+false -> None
+}.
+
 for ((.lrp = lrp,
  .router = router@{._uuid = lr_uuid},
  .json_name = json_name,
@@ -5675,22 +5695,7 @@ var residence_check = match (is_redirect) {
  .extra_match = residence_check,
  .drop = false,
  .priority = 90,
- .stage_hint = 0);
-for (RouterLBVIP(.router = {._uuid= lr_uuid}, .vip = vip)) {
-Some{(var ip_address, _)} = 
ip_address_and_port_from_lb_key(vip.ival()) in {
-IPv6{var ipv6} = ip_address in
-LogicalRouterNdFlow(.lr = router,
-.lrp = Some{lrp},
-.action = i"nd_na",
-.ip = ipv6,
-.sn_ip = false,
-.mac = rEG_INPORT_ETH_ADDR(),
-.extra_match = residence_check,
-.drop = false,
-

[ovs-dev] [PATCH ovn 2/8] ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient.

2021-08-27 Thread Ben Pfaff
DDlog can index equality joins within an expression like this, but not
a conditional expression that follows such an expression.  This doesn't
actually matter much in this case because ordinarily the expression
will be true (most router ports are enabled).

Signed-off-by: Ben Pfaff 
---
 northd/lrouter.dl|  6 --
 northd/ovn_northd.dl | 16 +---
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index cc3dced5f..b6e752f7c 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -596,7 +596,8 @@ typedef RouterPort = RouterPort {
 peer: RouterPeer,
 mcast_cfg:Intern,
 sb_options:   Map,
-has_bfd:  bool
+has_bfd:  bool,
+enabled:  bool
 }
 
 relation RouterPort[Intern]
@@ -610,7 +611,8 @@ RouterPort[RouterPort{
.peer   = peer,
.mcast_cfg  = mcast_cfg,
.sb_options = sb_options,
-   .has_bfd= has_bfd
+   .has_bfd= has_bfd,
+   .enabled= lrp.is_enabled()
}.intern()] :-
 lrp in ::Logical_Router_Port(),
 Some{var networks} = extract_lrp_networks(lrp.mac.ival(), 
lrp.networks.map(ival)),
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 61d7d53ce..fd40b637e 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -4609,8 +4609,7 @@ relation (
.reachable_ips_v6 = reachable_ips_v6,
.unreachable_ips_v4 = unreachable_ips_v4,
.unreachable_ips_v6 = unreachable_ips_v6) :-
-port in (.peer = Some{rp}),
-rp.is_enabled(),
+port in (.peer = Some{rp@{.enabled = true}}),
 (var reachable_ips_v4, var reachable_ips_v6, var unreachable_ips_v4, var 
unreachable_ips_v6) = get_arp_forward_ips(rp).
 
 /* Packets received from VXLAN tunnels have already been through the
@@ -4632,8 +4631,7 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-sp in (.sw = sw, .peer = Some{rp}),
-rp.is_enabled(),
+sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .reachable_ips_v4 = ips_v4),
 var ipv4 = FlatMap(ips_v4),
 var mc_flood_l2 = json_escape(mC_FLOOD_L2().0).
@@ -4650,8 +4648,7 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-sp in (.sw = sw, .peer = Some{rp}),
-rp.is_enabled(),
+sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .reachable_ips_v6 = ips_v6),
 var ipv6 = FlatMap(ips_v6),
 var mc_flood_l2 = json_escape(mC_FLOOD_L2().0).
@@ -4664,12 +4661,10 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
-sp in (.sw = sw, .peer = Some{rp}),
-rp.is_enabled(),
+sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .unreachable_ips_v4 = ips_v4),
 var ipv4 = FlatMap(ips_v4),
 var flood = json_escape(mC_FLOOD().0).
-
 Flow(.logical_datapath = sw._uuid,
  .stage= s_SWITCH_IN_L2_LKUP(),
  .priority = 90,
@@ -4678,8 +4673,7 @@ Flow(.logical_datapath = sw._uuid,
  .stage_hint   = stage_hint(sp.lsp._uuid),
  .io_port  = None,
  .controller_meter = None) :-
-sp in (.sw = sw, .peer = Some{rp}),
-rp.is_enabled(),
+sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}),
 (.port = sp, .unreachable_ips_v6 = ips_v6),
 var ipv6 = FlatMap(ips_v6),
 var flood = json_escape(mC_FLOOD().0).
-- 
2.31.1



[ovs-dev] [PATCH ovn 3/8] ovn-northd-ddlog: Derive load balancer IP addresses in new LoadBalancer.

2021-08-27 Thread Ben Pfaff
This makes the get_router_load_balancer_ips() function much faster.
This function is a hot path for the use of load balancers, so it
improves performance overall when they are in use.

Signed-off-by: Ben Pfaff 
---
 northd/lrouter.dl| 68 +--
 northd/ovn_northd.dl | 76 +++-
 2 files changed, 91 insertions(+), 53 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index b6e752f7c..17d803292 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -402,14 +402,13 @@ LogicalRouterSnatIPs(lr._uuid, map_empty()) :-
 lr in nb::Logical_Router(),
 not LogicalRouterSnatIP(.lr = lr._uuid).
 
-relation LogicalRouterLB(lr: uuid, nat: Intern)
-
+relation LogicalRouterLB(lr: uuid, nat: Intern)
 LogicalRouterLB(lr, lb) :-
 nb::Logical_Router(._uuid = lr, .load_balancer = lbs),
 var lb_uuid = FlatMap(lbs),
-lb in ::Load_Balancer(._uuid = lb_uuid).
+lb in (.lb = ::Load_Balancer{._uuid = lb_uuid}).
 
-relation LogicalRouterLBs(lr: uuid, nat: Vec>)
+relation LogicalRouterLBs(lr: uuid, nat: Vec>)
 
 LogicalRouterLBs(lr, lbs) :-
  LogicalRouterLB(lr, lb),
@@ -448,6 +447,31 @@ LogicalRouterCopp0(lr, meters) :-
 
 function chassis_redirect_name(port_name: istring): string = "cr-${port_name}"
 
+typedef LoadBalancer = LoadBalancer {
+lb: Intern,
+ipv4s: Set,
+ipv6s: Set,
+routable: bool
+}
+
+relation LoadBalancer[Intern]
+LoadBalancer[LoadBalancer{lb, ipv4s, ipv6s, routable}.intern()] :-
+nb::Load_Balancer[lb],
+var routable = lb.options.get_bool_def(i"add_route", false),
+(var ipv4s, var ipv6s) = {
+var ipv4s = set_empty();
+var ipv6s = set_empty();
+for ((vip, _) in lb.vips) {
+/* node->key contains IP:port or just IP. */
+match (ip_address_and_port_from_lb_key(vip.ival())) {
+None -> (),
+Some{(IPv4{ipv4}, _)} -> ipv4s.insert(i"${ipv4}"),
+Some{(IPv6{ipv6}, _)} -> ipv6s.insert(i"${ipv6}"),
+}
+};
+(ipv4s, ipv6s)
+}.
+
 typedef Router = Router {
 /* Fields copied from nb::Logical_Router. */
 _uuid:  uuid,
@@ -464,7 +488,11 @@ typedef Router = Router {
 is_gateway: bool,
 nats:   Vec,
 snat_ips:   Map>,
-lbs:Vec>,
+lbs:Vec>,
+lb_ipv4s_routable:  Set,
+lb_ipv4s_unroutable: Set,
+lb_ipv6s_routable:  Set,
+lb_ipv6s_unroutable: Set,
 mcast_cfg:  Intern,
 learn_from_arp_request: bool,
 force_lb_snat: bool,
@@ -488,6 +516,10 @@ Router[Router{
 .nats= nats,
 .snat_ips= snat_ips,
 .lbs = lbs,
+.lb_ipv4s_routable = lb_ipv4s_routable,
+.lb_ipv4s_unroutable = lb_ipv4s_unroutable,
+.lb_ipv6s_routable = lb_ipv6s_routable,
+.lb_ipv6s_unroutable = lb_ipv6s_unroutable,
 .mcast_cfg   = mcast_cfg,
 .learn_from_arp_request = learn_from_arp_request,
 .force_lb_snat = force_lb_snat,
@@ -501,10 +533,28 @@ Router[Router{
 LogicalRouterCopp(lr._uuid, copp),
 mcast_cfg in (.datapath = lr._uuid),
    var learn_from_arp_request = lr.options.get_bool_def(i"always_learn_from_arp_request", true),
-var force_lb_snat = lb_force_snat_router_ip(lr.options).
+var force_lb_snat = lb_force_snat_router_ip(lr.options),
+(var lb_ipv4s_routable, var lb_ipv4s_unroutable,
+ var lb_ipv6s_routable, var lb_ipv6s_unroutable) = {
+var lb_ipv4s_routable = set_empty();
+var lb_ipv4s_unroutable = set_empty();
+var lb_ipv6s_routable = set_empty();
+var lb_ipv6s_unroutable = set_empty();
+for (lb in lbs) {
+if (lb.routable) {
+lb_ipv4s_routable = lb_ipv4s_routable.union(lb.ipv4s);
+lb_ipv6s_routable = lb_ipv6s_routable.union(lb.ipv6s);
+} else {
+lb_ipv4s_unroutable = lb_ipv4s_unroutable.union(lb.ipv4s);
+lb_ipv6s_unroutable = lb_ipv6s_unroutable.union(lb.ipv6s);
+}
+};
+(lb_ipv4s_routable, lb_ipv4s_unroutable,
+ lb_ipv6s_routable, lb_ipv6s_unroutable)
+}.
 
 /* RouterLB: many-to-many relation between logical routers and nb::LB */
-relation RouterLB(router: Intern, lb: Intern)
+relation RouterLB(router: Intern, lb: Intern)
 
 RouterLB(router, lb) :-
 router in (.lbs = lbs),
@@ -513,12 +563,12 @@ RouterLB(router, lb) :-
 /* Load balancer VIPs associated with routers */
 relation RouterLBVIP(
 router: Intern,
-lb: Intern,
+lb: Intern,
 vip: istring,
 backends: istring)
 
 RouterLBVIP(router, lb, vip, backends) :-
-RouterLB(router, lb@(::Load_Balancer{.vips = vips})),
-RouterLB(router, lb@({.lb = ::Load_Balancer{.vips = vips}})),
 (var vip, var backends) = FlatMap(

[ovs-dev] [PATCH ovn 0/8] 3x performance improvement for ddlog with load balancer benchmark

2021-08-27 Thread Ben Pfaff
With and without these patches, I see the following performance when I
run the load-balancer heavy benchmark.  The measurements include cold
start with all the load balancers, then deleting the sctp load balancer
and waiting for it to finish, then the same with the tcp load balancer,
then the same with the udp load balancer.  The measurements only include
ddlog time; there is additional overhead talking to the database, but
that's a constant factor.

Without patches:   777 CPU seconds, 23.5 GB RAM, 12:59 elapsed time.
With patches:  261 CPU seconds, 14.4 GB RAM,  4:23 elapsed time.

I am continuing to work on performance.

Ben Pfaff (7):
  ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient.
  ovn-northd-ddlog: Derive load balancer IP addresses in new
LoadBalancer.
  ovn-northd-ddlog: Reverse order of joins for connection tracking
flows.
  ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports.
  ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends
string.
  ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router.
  ovn-northd-ddlog: Intern strings before joining when possible.

Leonid Ryzhyk (1):
  ovn-northd-ddlog: Intern all strings in OVSDB tables.

 configure.ac |2 +-
 manpages.mk  |1 -
 northd/copp.dl   |   32 +-
 northd/helpers.dl|   14 +-
 northd/ipam.dl   |   17 +-
 northd/lrouter.dl|  213 +++--
 northd/lswitch.dl|  177 ++--
 northd/multicast.dl  |   44 +-
 northd/ovn-nb.dlopts |1 +
 northd/ovn-sb.dlopts |1 +
 northd/ovn.dl|7 +
 northd/ovn_northd.dl | 2017 --
 northd/ovsdb2ddlog2c |6 +-
 tests/ovn-ic.at  |8 +-
 tests/ovn.at |6 +-
 15 files changed, 1266 insertions(+), 1280 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netlink: Add support for parsing link layer address.

2021-08-20 Thread Ben Pfaff
On Fri, Aug 20, 2021 at 03:43:15PM +0200, Frode Nordahl wrote:
> Data retrieved from netlink and friends may include link layer
> address.  Add type to nl_attr_type and min/max functions to allow
> use of nl_policy_parse with this type of data.
> 
> While this will not be used by Open vSwitch itself at this time,
> sibling and derived projects want to use the great netlink library
> that OVS provides, and it is not possible to safely override the
> global nl_attr_type symbol at link time.
> 
> Signed-off-by: Frode Nordahl 

Thanks, I applied this.
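The point of the patch is that a link-layer address attribute has a range of valid payload sizes rather than one fixed size. A self-contained sketch of that min/max check; the bounds and names here are illustrative assumptions, not the signatures of OVS's actual netlink library:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Valid payload sizes for a link-layer address attribute: Ethernet
 * MACs are 6 bytes, InfiniBand hardware addresses are 20 bytes.
 * These bounds are assumptions for illustration. */
enum { LL_ADDR_MIN_LEN = 6, LL_ADDR_MAX_LEN = 20 };

/* Returns true if 'len' is an acceptable length for a link-layer
 * address payload, mimicking how a netlink policy validates each
 * attribute against its type's minimum and maximum size. */
static bool
ll_addr_len_is_valid(size_t len)
{
    return len >= LL_ADDR_MIN_LEN && len <= LL_ADDR_MAX_LEN;
}
```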


Re: [ovs-dev] [PATCH] ovsdb-idl : Add APIs to query if a table and a column is present or not.

2021-08-20 Thread Ben Pfaff
On Wed, Aug 18, 2021 at 08:42:47PM -0400, num...@ovn.org wrote:
> From: Numan Siddique 
> 
> This patch adds 2 new APIs in the ovsdb-idl client library
>  - ovsdb_idl_has_table() and ovsdb_idl_has_column_in_table() to
> query whether a table or a column is present in the IDL.  This
> is required for scenarios where the server schema is old and
> missing a table or column and the client (built with a new schema
> version) does a transaction with the missing table or column.  This
> results in a continuous loop of transaction failures.
> 
> OVN would require the API - ovsdb_idl_has_table() to address this issue
> when an old ovsdb-server is used (OVS 2.11) which has the 'datapath'
> table missing.  A recent commit in OVN creates a 'datapath' row
> in the local ovsdb-server.  ovsdb_idl_has_column_in_table() would be
> useful to have.
> 
> Related issue: https://bugzilla.redhat.com/show_bug.cgi?id=1992705
> Signed-off-by: Numan Siddique 

In the OVN meeting, I suggested that this is a bug in the ovsdb_idl code
for transactions: it shouldn't try to transact on a column that it knows
doesn't exist.  It might still make sense to add this API, though.
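A client-side guard along the lines being proposed might look like the following. The mock below only models the proposed ovsdb_idl_has_table(); it is not the real ovsdb-idl API or implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the IDL's view of the server schema: just a list
 * of table names.  The real IDL tracks columns per table as well. */
struct mock_idl {
    const char **tables;
    size_t n_tables;
};

/* Models the proposed ovsdb_idl_has_table(): true if the connected
 * server's schema contains 'table'. */
static bool
mock_idl_has_table(const struct mock_idl *idl, const char *table)
{
    for (size_t i = 0; i < idl->n_tables; i++) {
        if (!strcmp(idl->tables[i], table)) {
            return true;
        }
    }
    return false;
}

/* With the guard, a client skips the insert entirely against an old
 * server instead of retrying a transaction that can never succeed. */
static bool
maybe_create_datapath_row(const struct mock_idl *idl)
{
    if (!mock_idl_has_table(idl, "Datapath")) {
        return false;   /* Old schema: nothing to do. */
    }
    /* ... build and commit the transaction here ... */
    return true;
}
```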


Re: [ovs-dev] ideas for improving the OVS review process

2021-08-17 Thread Ben Pfaff
On Mon, Aug 16, 2021 at 03:52:49PM -0300, Flavio Leitner wrote:
> On Tue, Jul 20, 2021 at 11:41:37AM -0700, Ben Pfaff wrote:
> > The OVS review process has greatly slowed over the last few years.  This
> > is partly because I haven't been able to spend as much time on review,
> > since I was once the most productive reviewer.  Ilya has been able to
> > step up the amount of review he does, but that still isn't enough to
> > keep up with the workload.
> > 
> > We need to come up with some way to improve things.  Here are a few
> > ideas, mostly from a call earlier today (that was mainly about the OVS
> > conference).  I hope they will prompt a discussion.
> > 
> > * Since patches are coming in, we have people who are knowledgable about
> >   the code.  Those people should be pitching in with reviews as well.
> >   It doesn't seem like they or their managers have the right incentives
> >   to do that.  Maybe there is some way to improve the incentives.
> 
> There is that, and also that even with others' review, a maintainer
> still does a careful review, which puts off reviewers' work.
> "The perfect is the enemy of the good."
> 
> We have several emeritus maintainers but only a couple of them active
> at the moment. It doesn't seem fair to ask them to do more work.  So,
> I agree with the other reply to this thread saying that we need more
> maintainers.
> 
> Although I think we could grow the number of maintainers, it doesn't
> seem reasonable to ask each one of them to do an extensive review
> by themselves on every patch.  My point is that there should be a
> "chain of trust", or a rule that allows a patch to be "blindly
> committed" if that patch is reviewed by N different members, for example.
> The intention is not to forbid maintainers to review, but to alleviate
> the load or pressure on them when the community already helped.
> 
> As a side effect, more people could become maintainers/committers
> because we don't require them to know OVS deeply or to commit huge
> amounts of time to careful review others work.
> 
> This brings up another point: Unpredictability. It is not possible
> today to tell what is going to happen with a patch after it gets
> posted to the mailing list. It can be silently ignored for many
> months, or be reviewed by others and still not accepted, etc.  We
> should have a way to prevent that; otherwise it is very difficult
> to align companies' or other upstream projects' schedules. 
> 
> For example, a company investing in a feature "X" most probably
> has a delivery date. Even if the work is posted during the development
> phase, that doesn't mean the patch will be reviewed, accepted, or
> rejected. It's kind of a chicken-and-egg issue. If the process is not
> clear, why should managers provide more incentives to participate?
> 
> 
> > * The Linux kernel uses something like a "default accept" policy for
> >   patches that superficially look good and compile and boot, if there is
> >   no review from a specific maintainer within a few days.  The patches
> >   do get some small amount of review from a subsystem maintainer.  OVS
> >   could adopt a similar policy.
> 
> I agree. Aaron is working to improve patchwork's status report for
> each patch. Hopefully each community member could report test status
> there.  It seems that if the community decides to go with the "default
> accept" rule, there will be more incentive to connect tests to OVS
> patchwork. We get more automation as a side effect.
> 
> However, we would still need to define who is the sub-maintainer
> or perhaps think on the "chain of trust"/"blindy committed" idea
> mentioned above.

It seems like making the list of sub-maintainers is an important step
either way.  What process should we use for that?


Re: [ovs-dev] [PATCH ovn repost 0/7] Reduce memory consumption and time for Load_Balancer benchmark

2021-08-17 Thread Ben Pfaff
On Tue, Aug 17, 2021 at 11:47:50AM -0400, Numan Siddique wrote:
> On Thu, Aug 12, 2021 at 1:45 PM Ben Pfaff  wrote:
> >
> > Apologies for repost; I forgot to tag this as "ovn" in the first
> > posting.
> >
> > For the benchmark at
> > https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
> > this reduces memory consumption from 115 GB to 17 GB and elapsed time
> > from 19 minutes to 5 minutes.
> >
> > I think there's headroom for more improvement, because Leonid has some
> > cleverness that I haven't been able to successfully work in yet.
> >
> > This series is also available here:
> > https://github.com/blp/ovs-reviews/tree/ovn-memory-4
> 
> Thanks for the improvements.
> 
> For the entire series:
> 
> Acked-by: Numan Siddique 

Thanks!  I applied this series.


Re: [ovs-dev] ideas for improving the OVS review process

2021-08-17 Thread Ben Pfaff
There's another perspective from the PostgreSQL community detailed at
LWN.  Here's a "guest link" to the article, titled "PostgreSQL's
commitfest clog":
https://lwn.net/SubscriberLink/865754/6c0eb771f8436789/

The paragraph that starts out "There is a longstanding expectation
within the PostgreSQL project that anybody submitting a patch for
consideration in a given commitfest should take the time to review
somebody else's patch, preferably one of similar complexity." is
particularly interesting.


[ovs-dev] [PATCH ovn repost 7/7] ovn-northd-ddlog: Postpone expanding external_ids, stage_hint, tags.

2021-08-12 Thread Ben Pfaff
This has little effect on performance in the current benchmark, but it
seems like the cleanest approach to me.

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index dadf33ab6..f0730d40b 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -1667,9 +1667,9 @@ relation AggregatedFlow (
 priority:  integer,
 __match:   istring,
 actions:   istring,
-tags:  Map,
+io_port:   Option,
 controller_meter:  Option,
-external_ids:  Map
+stage_hint:bit<32>
 )
 function make_flow_tags(io_port: Option): Map {
 match (io_port) {
@@ -1690,9 +1690,9 @@ AggregatedFlow(.logical_datapaths = g.to_set(),
.priority = priority,
.__match = __match,
.actions = actions,
-   .tags = make_flow_tags(io_port),
+   .io_port = io_port,
.controller_meter = controller_meter,
-   .external_ids = make_flow_external_ids(stage_hint, stage)) :-
+   .stage_hint = stage_hint) :-
 UseLogicalDatapathGroups[true],
     Flow(logical_datapath, stage, priority, __match, actions, io_port, controller_meter, stage_hint),
     var g = logical_datapath.group_by((stage, priority, __match, actions, io_port, controller_meter, stage_hint)).
 @@ -1701,43 +1701,49 @@ AggregatedFlow(.logical_datapaths = set_singleton(logical_datapath),
.priority = priority,
.__match = __match,
.actions = actions,
-   .tags = make_flow_tags(io_port),
+   .io_port = io_port,
.controller_meter = controller_meter,
-   .external_ids = make_flow_external_ids(stage_hint, stage)) :-
+   .stage_hint = stage_hint) :-
 UseLogicalDatapathGroups[false],
     Flow(logical_datapath, stage, priority, __match, actions, io_port, controller_meter, stage_hint).
 
+function to_string(pipeline: Pipeline): string {
+if (pipeline == Ingress) {
+"ingress"
+} else {
+"egress"
+}
+}
+
 for (f in AggregatedFlow()) {
-var pipeline = if (f.stage.pipeline == Ingress) "ingress" else "egress" in
-var external_ids = f.external_ids.insert_imm("stage-name", f.stage.table_name) in
 if (f.logical_datapaths.size() == 1) {
 Some{var dp} = f.logical_datapaths.nth(0) in
 sb::Out_Logical_Flow(
-._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)),
+._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.io_port, f.stage_hint)),
 .logical_datapath = Some{dp},
 .logical_dp_group = None,
-.pipeline = pipeline,
+.pipeline = f.stage.pipeline.to_string(),
 .table_id = f.stage.table_id as integer,
 .priority = f.priority,
 .controller_meter = f.controller_meter,
 .__match  = f.__match.ival(),
 .actions  = f.actions.ival(),
-.tags = f.tags,
-.external_ids = external_ids)
+.tags = make_flow_tags(f.io_port),
+.external_ids = make_flow_external_ids(f.stage_hint, f.stage))
 } else {
 var group_uuid = hash128(f.logical_datapaths) in {
 sb::Out_Logical_Flow(
-._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)),
+._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.io_port, f.stage_hint)),
 .logical_datapath = None,
 .logical_dp_group = Some{group_uuid},
-.pipeline = pipeline,
+.pipeline = f.stage.pipeline.to_string(),
 .table_id = f.stage.table_id as integer,
 .priority = f.priority,
 .controller_meter = f.controller_meter,
 .__match  = f.__match.ival(),
 .actions  = f.actions.ival(),
-.tags = f.tags,
-.external_ids = external_ids);
+.tags = make_flow_tags(f.io_port),
+.external_ids = make_flow_external_ids(f.stage_hint, f.stage));
 sb::Out_Logical_DP_Group(._uuid = group_uuid, .datapaths = f.logical_datapaths)
 }
 }
-- 
2.31.1



[ovs-dev] [PATCH ovn repost 6/7] ovn-northd-ddlog: Intern nb::Logical_Switch.

2021-08-12 Thread Ben Pfaff
With the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption by 300 MB and elapsed time by a few
seconds.

Signed-off-by: Ben Pfaff 
---
 northd/ipam.dl   |  2 +-
 northd/lswitch.dl| 36 ++--
 northd/multicast.dl  |  2 +-
 northd/ovn-nb.dlopts |  1 +
 northd/ovn_northd.dl | 16 
 5 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/northd/ipam.dl b/northd/ipam.dl
index da71b2872..4665a28cb 100644
--- a/northd/ipam.dl
+++ b/northd/ipam.dl
@@ -187,7 +187,7 @@ SwitchIPv4ReservedAddresses(lswitch, addrs) :-
 var addrs = addr.group_by(lswitch).to_set().
 
 SwitchIPv4ReservedAddresses(lswitch_uuid, set_empty()) :-
-nb::Logical_Switch(._uuid = lswitch_uuid),
+::Logical_Switch(._uuid = lswitch_uuid),
 not SwitchIPv4ReservedAddress(lswitch_uuid, _).
 
 /* Allocate dynamic IP addresses for ports that require them:
diff --git a/northd/lswitch.dl b/northd/lswitch.dl
index 7e7b62a4d..868ae115f 100644
--- a/northd/lswitch.dl
+++ b/northd/lswitch.dl
@@ -44,7 +44,7 @@ SwitchRouterPeerRef(lsp, None) :-
  * (Use LogicalSwitchPort instead, which guarantees uniqueness.) */
 relation LogicalSwitchPortCandidate(lsp_uuid: uuid, ls_uuid: uuid)
 LogicalSwitchPortCandidate(lsp_uuid, ls_uuid) :-
-nb::Logical_Switch(._uuid = ls_uuid, .ports = ports),
+::Logical_Switch(._uuid = ls_uuid, .ports = ports),
 var lsp_uuid = FlatMap(ports).
 Warning[message] :-
 LogicalSwitchPortCandidate(lsp_uuid, ls_uuid),
@@ -74,7 +74,7 @@ LogicalSwitchPortWithUnknownAddress(ls_uuid, lsp_uuid) :-
 output relation LogicalSwitchHasUnknownPorts(ls: uuid, has_unknown: bool)
 LogicalSwitchHasUnknownPorts(ls, true) :- LogicalSwitchPortWithUnknownAddress(ls, _).
 LogicalSwitchHasUnknownPorts(ls, false) :-
-nb::Logical_Switch(._uuid = ls),
+::Logical_Switch(._uuid = ls),
 not LogicalSwitchPortWithUnknownAddress(ls, _).
 
 /* PortStaticAddresses: static IP addresses associated with each Logical_Switch_Port */
@@ -101,11 +101,11 @@ PortInGroup(port, group) :-
 relation LogicalSwitchACL(ls: uuid, acl: uuid)
 
 LogicalSwitchACL(ls, acl) :-
-nb::Logical_Switch(._uuid = ls, .acls = acls),
+::Logical_Switch(._uuid = ls, .acls = acls),
 var acl = FlatMap(acls).
 
 LogicalSwitchACL(ls, acl) :-
-nb::Logical_Switch(._uuid = ls, .ports = ports),
+::Logical_Switch(._uuid = ls, .ports = ports),
 var port_id = FlatMap(ports),
 PortInGroup(port_id, group_id),
 nb::Port_Group(._uuid = group_id, .acls = acls),
@@ -125,7 +125,7 @@ LogicalSwitchHasStatefulACL(ls, true) :-
 LogicalSwitchStatefulACL(ls, _).
 
 LogicalSwitchHasStatefulACL(ls, false) :-
-nb::Logical_Switch(._uuid = ls),
+::Logical_Switch(._uuid = ls),
 not LogicalSwitchStatefulACL(ls, _).
 
 // "Pitfalls of projections" in ddlog-new-feature.rst explains why this
@@ -136,7 +136,7 @@ LogicalSwitchHasACLs(ls, true) :-
 LogicalSwitchACL(ls, _).
 
 LogicalSwitchHasACLs(ls, false) :-
-nb::Logical_Switch(._uuid = ls),
+::Logical_Switch(._uuid = ls),
 not LogicalSwitchACL(ls, _).
 
 /*
@@ -146,7 +146,7 @@ LogicalSwitchHasACLs(ls, false) :-
  */
 relation LogicalSwitchLocalnetPort0(ls_uuid: uuid, lsp: (uuid, string))
 LogicalSwitchLocalnetPort0(ls_uuid, (lsp_uuid, lsp.name)) :-
-ls in nb::Logical_Switch(._uuid = ls_uuid),
+ls in ::Logical_Switch(._uuid = ls_uuid),
 var lsp_uuid = FlatMap(ls.ports),
 lsp in ::Logical_Switch_Port(._uuid = lsp_uuid),
 lsp.__type == "localnet".
@@ -156,7 +156,7 @@ LogicalSwitchLocalnetPorts(ls_uuid, localnet_ports) :-
 LogicalSwitchLocalnetPort0(ls_uuid, lsp),
 var localnet_ports = lsp.group_by(ls_uuid).to_vec().
 LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :-
-ls in nb::Logical_Switch(),
+ls in ::Logical_Switch(),
 var ls_uuid = ls._uuid,
 not LogicalSwitchLocalnetPort0(ls_uuid, _).
 
@@ -164,7 +164,7 @@ LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :-
 relation LogicalSwitchDNS(ls_uuid: uuid, dns_uuid: uuid)
 
 LogicalSwitchDNS(ls._uuid, dns_uuid) :-
-nb::Logical_Switch[ls],
+::Logical_Switch[ls],
 var dns_uuid = FlatMap(ls.dns_records),
 nb::DNS(._uuid = dns_uuid).
 
@@ -183,12 +183,12 @@ LogicalSwitchHasDNSRecords(ls, true) :-
 LogicalSwitchWithDNSRecords(ls).
 
 LogicalSwitchHasDNSRecords(ls, false) :-
-nb::Logical_Switch(._uuid = ls),
+::Logical_Switch(._uuid = ls),
 not LogicalSwitchWithDNSRecords(ls).
 
 relation LogicalSwitchHasNonRouterPort0(ls: uuid)
 LogicalSwitchHasNonRouterPort0(ls_uuid) :-
-ls in nb::Logical_Switch(._uuid = ls_uuid),
+ls in ::Logical_Switch(._uuid = ls_uuid),
 var lsp_uuid = FlatMap(ls.ports),
 lsp in ::Logical_Switch_Port(._uuid = lsp_uuid),
 lsp.__type != "router".
 output relation LogicalSwitchHasNonRouterPort(ls: uuid, has_non_router_port: bool)
 LogicalS

[ovs-dev] [PATCH ovn repost 2/7] ovn-northd-ddlog: Use cheaper representation for stage_hint.

2021-08-12 Thread Ben Pfaff
The stage_hint only shows 32 bits of the uuid, so it's cheaper to omit
the rest for the internal representation.  Also, this is just a hint, so
we might as well use zero to mean None and save the cost of the Option
wrapper.
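In C terms, `_uuid[127:96]` is just the most significant 32 bits of the 128-bit UUID. A sketch with the UUID held as two 64-bit halves (an assumed layout for illustration, not OVS's actual struct uuid):

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit UUID as two 64-bit halves; 'hi' holds bits 127..64. */
struct uuid128 {
    uint64_t hi;
    uint64_t lo;
};

/* Equivalent of the DDlog expression _uuid[127:96]: keep only the
 * top 32 bits as the stage hint.  Zero is reserved to mean "no
 * hint", which avoids paying for an Option wrapper per flow. */
static uint32_t
stage_hint(struct uuid128 u)
{
    return (uint32_t) (u.hi >> 32);
}
```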

With the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption by 1.3 GB.

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 579 ++-
 1 file changed, 292 insertions(+), 287 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 844add024..2365372fb 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -1644,9 +1644,13 @@ relation Flow(
 actions:  string,
 io_port:  Option,
 controller_meter: Option,
-stage_hint:   Option
+stage_hint:   bit<32>
 )
 
+function stage_hint(_uuid: uuid): bit<32> {
+_uuid[127:96]
+}
+
 /* If this option is 'true' northd will combine logical flows that differ by
  * logical datapath only by creating a datapath group. */
 relation UseLogicalDatapathGroups[bool]
 @@ -1673,11 +1677,12 @@ function make_flow_tags(io_port: Option): Map {
 Some{s} -> [ "in_out_port" -> s ]
 }
 }
-function make_flow_external_ids(stage_hint: Option, stage: Stage): Map {
-match (stage_hint) {
-None -> ["stage-name" -> stage.table_name],
-Some{uuid} -> ["stage-name" -> stage.table_name,
-   "stage-hint" -> "${hex(uuid[127:96])}"]
+function make_flow_external_ids(stage_hint: bit<32>, stage: Stage): Map {
+if (stage_hint == 0) {
+["stage-name" -> stage.table_name]
+} else {
+["stage-name" -> stage.table_name,
+ "stage-hint" -> "${hex(stage_hint)}"]
 }
 }
 AggregatedFlow(.logical_datapaths = g.to_set(),
@@ -1744,7 +1749,7 @@ Flow(.logical_datapath = sw._uuid,
  .priority = 50,
  .__match  = __match,
  .actions  = actions,
- .stage_hint   = Some{fg_uuid},
+ .stage_hint   = stage_hint(fg_uuid),
  .io_port  = None,
  .controller_meter = None) :-
 sw in (),
@@ -1776,7 +1781,7 @@ Flow(.logical_datapath = sw._uuid,
  .priority = 50,
  .__match  = __match,
  .actions  = actions,
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
 sw in (),
@@ -1799,7 +1804,7 @@ for (sw in ()) {
  .priority = 100,
  .__match  = "vlan.present",
  .actions  = "drop;",
- .stage_hint   = None /*TODO: check*/,
+ .stage_hint   = 0 /*TODO: check*/,
  .io_port  = None,
  .controller_meter = None)
 };
@@ -1810,7 +1815,7 @@ for (sw in ()) {
  .priority = 100,
  .__match  = "eth.src[40]",
  .actions  = "drop;",
- .stage_hint   = None /*TODO: check*/,
+ .stage_hint   = 0 /*TODO: check*/,
  .io_port  = None,
  .controller_meter = None)
 /* Port security flows have priority 50 (see below) and will continue to the next table
@@ -1872,7 +1877,7 @@ for ((._uuid =ls_uuid)) {
  .priority = 0,
  .__match  = "1",
  .actions  = "next;",
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 Flow(.logical_datapath = ls_uuid,
@@ -1880,7 +1885,7 @@ for ((._uuid =ls_uuid)) {
  .priority = 0,
  .__match  = "1",
  .actions  = "next;",
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 
@@ -1889,7 +1894,7 @@ for ((._uuid =ls_uuid)) {
  .priority = 110,
  .__match  = "eth.dst == $svc_monitor_mac",
  .actions  = "next;",
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 Flow(.logical_datapath = ls_uuid,
@@ -1897,7 +1902,7 @@ for ((._uuid =ls_uuid)) {
  .priority = 110,
  .__match  = "eth.src == $svc_monitor_mac",
  .actions  = "next;",
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None)
 }
@@ -1912,7 +1917,7 @@ for ((.sw = sw@{._uuid = ls_uuid}, .acl 
= , .has_fair_meter
  .priority = acl.priority + oVN_ACL_PRI_OFFSET(),
  

[ovs-dev] [PATCH ovn repost 5/7] ovn-northd-ddlog: Get rid of duplicate flows caused by stage_hint.

2021-08-12 Thread Ben Pfaff
It was possible for these rules to generate multiple Flow records that
differed only in their stage_hint.  This caused a lot of duplication for
the load balancer benchmark.

With the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption from 53 GB to 17 GB and elapsed time
from 14 minutes to 5 minutes.

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index dd69126ff..38b97aa6b 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3163,7 +3163,7 @@ Flow(.logical_datapath = sw._uuid,
  .actions  = actions.intern(),
  .io_port  = None,
  .controller_meter = meter,
- .stage_hint   = stage_hint(lb._uuid)) :-
+ .stage_hint   = 0) :-
 sw in (),
 LBVIPWithStatus[lbvip@{.lb = lb}],
 sw.load_balancer.contains(lb._uuid),
@@ -4587,7 +4587,7 @@ Flow(.logical_datapath = sw._uuid,
  .priority = 90,
  .__match  = i"${fLAGBIT_NOT_VXLAN()} && arp.op == 1 && arp.tpa == ${ipv4}",
  .actions  = i"outport = ${flood}; output;",
- .stage_hint   = stage_hint(sp.lsp._uuid),
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
 sp in (.sw = sw, .peer = Some{rp}),
@@ -,7 +,7 @@ for (RouterLBVIP(
  .priority = prio,
  .__match  = __match.intern(),
  .actions  = __actions,
- .stage_hint   = stage_hint(lb._uuid),
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 
@@ -6712,7 +6712,7 @@ for (RouterLBVIP(
  .priority = prio,
  .__match  = est_match.intern(),
  .actions  = actions,
- .stage_hint   = stage_hint(lb._uuid),
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 
@@ -6797,7 +6797,7 @@ Flow(.logical_datapath = r._uuid,
  .actions  = actions.intern(),
  .io_port  = None,
  .controller_meter = meter,
- .stage_hint   = stage_hint(lb._uuid)) :-
+ .stage_hint   = 0) :-
 r in (),
 r.l3dgw_ports.len() > 0 or r.is_gateway,
 LBVIPWithStatus[lbvip@{.lb = lb}],
-- 
2.31.1



[ovs-dev] [PATCH ovn repost 3/7] ovn-northd-ddlog: Intern Stage.

2021-08-12 Thread Ben Pfaff
With the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption from 72 GB to 66 GB and elapsed time by
a few seconds.
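Interning (hash-consing) replaces repeated copies of a value with one shared canonical copy, so equality checks and hashing touch a pointer instead of the whole record. A minimal sketch of the idea for strings; the DDlog runtime's actual Intern<> is reference-counted and considerably more involved:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A tiny intern table: a linked list of unique strings.  A real
 * implementation would use a hash table and reference counts. */
struct intern_entry {
    struct intern_entry *next;
    char *str;
};
static struct intern_entry *intern_table;

/* Returns the canonical copy of 's', adding it on first use.  After
 * interning, equal strings have equal pointers, so comparisons
 * become O(1). */
static const char *
intern(const char *s)
{
    struct intern_entry *e;
    for (e = intern_table; e; e = e->next) {
        if (!strcmp(e->str, s)) {
            return e->str;
        }
    }
    e = malloc(sizeof *e);
    e->str = malloc(strlen(s) + 1);
    strcpy(e->str, s);
    e->next = intern_table;
    intern_table = e;
    return e->str;
}
```

Interning each Stage once, as the patch below does, means the many flows referencing a stage share one record instead of each carrying its own copy of the stage name.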

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 140 +--
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 2365372fb..f1b7471f2 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -1434,77 +1434,77 @@ nb::Out_Logical_Router_Port(._uuid = _uuid,
 
 typedef Pipeline = Ingress | Egress
 
-typedef Stage = Stage{
+typedef Stage = Stage {
 pipeline: Pipeline,
-table_id: integer,
+table_id: bit<8>,
 table_name  : string
 }
 
 /* Logical switch ingress stages. */
-function s_SWITCH_IN_PORT_SEC_L2(): Stage { Stage{Ingress,  0, 
"ls_in_port_sec_l2"} }
-function s_SWITCH_IN_PORT_SEC_IP(): Stage { Stage{Ingress,  1, 
"ls_in_port_sec_ip"} }
-function s_SWITCH_IN_PORT_SEC_ND(): Stage { Stage{Ingress,  2, 
"ls_in_port_sec_nd"} }
-function s_SWITCH_IN_LOOKUP_FDB():  Stage { Stage{Ingress,  3, 
"ls_in_lookup_fdb"} }
-function s_SWITCH_IN_PUT_FDB(): Stage { Stage{Ingress,  4, 
"ls_in_put_fdb"} }
-function s_SWITCH_IN_PRE_ACL(): Stage { Stage{Ingress,  5, 
"ls_in_pre_acl"} }
-function s_SWITCH_IN_PRE_LB():  Stage { Stage{Ingress,  6, 
"ls_in_pre_lb"} }
-function s_SWITCH_IN_PRE_STATEFUL():Stage { Stage{Ingress,  7, 
"ls_in_pre_stateful"} }
-function s_SWITCH_IN_ACL_HINT():Stage { Stage{Ingress,  8, 
"ls_in_acl_hint"} }
-function s_SWITCH_IN_ACL(): Stage { Stage{Ingress,  9, 
"ls_in_acl"} }
-function s_SWITCH_IN_QOS_MARK():Stage { Stage{Ingress, 10, 
"ls_in_qos_mark"} }
-function s_SWITCH_IN_QOS_METER():   Stage { Stage{Ingress, 11, 
"ls_in_qos_meter"} }
-function s_SWITCH_IN_STATEFUL():Stage { Stage{Ingress, 12, 
"ls_in_stateful"} }
-function s_SWITCH_IN_PRE_HAIRPIN(): Stage { Stage{Ingress, 13, 
"ls_in_pre_hairpin"} }
-function s_SWITCH_IN_NAT_HAIRPIN(): Stage { Stage{Ingress, 14, 
"ls_in_nat_hairpin"} }
-function s_SWITCH_IN_HAIRPIN(): Stage { Stage{Ingress, 15, 
"ls_in_hairpin"} }
-function s_SWITCH_IN_ARP_ND_RSP():  Stage { Stage{Ingress, 16, 
"ls_in_arp_rsp"} }
-function s_SWITCH_IN_DHCP_OPTIONS():Stage { Stage{Ingress, 17, 
"ls_in_dhcp_options"} }
-function s_SWITCH_IN_DHCP_RESPONSE():   Stage { Stage{Ingress, 18, 
"ls_in_dhcp_response"} }
-function s_SWITCH_IN_DNS_LOOKUP():  Stage { Stage{Ingress, 19, 
"ls_in_dns_lookup"} }
-function s_SWITCH_IN_DNS_RESPONSE():Stage { Stage{Ingress, 20, 
"ls_in_dns_response"} }
-function s_SWITCH_IN_EXTERNAL_PORT():   Stage { Stage{Ingress, 21, 
"ls_in_external_port"} }
-function s_SWITCH_IN_L2_LKUP(): Stage { Stage{Ingress, 22, 
"ls_in_l2_lkup"} }
-function s_SWITCH_IN_L2_UNKNOWN():  Stage { Stage{Ingress, 23, 
"ls_in_l2_unknown"} }
+function s_SWITCH_IN_PORT_SEC_L2(): Intern { Stage{Ingress,  0, 
"ls_in_port_sec_l2"}.intern() }
+function s_SWITCH_IN_PORT_SEC_IP(): Intern { Stage{Ingress,  1, 
"ls_in_port_sec_ip"}.intern() }
+function s_SWITCH_IN_PORT_SEC_ND(): Intern { Stage{Ingress,  2, 
"ls_in_port_sec_nd"}.intern() }
+function s_SWITCH_IN_LOOKUP_FDB():  Intern { Stage{Ingress,  3, 
"ls_in_lookup_fdb"}.intern() }
+function s_SWITCH_IN_PUT_FDB(): Intern { Stage{Ingress,  4, 
"ls_in_put_fdb"}.intern() }
+function s_SWITCH_IN_PRE_ACL(): Intern { Stage{Ingress,  5, 
"ls_in_pre_acl"}.intern() }
+function s_SWITCH_IN_PRE_LB():  Intern { Stage{Ingress,  6, 
"ls_in_pre_lb"}.intern() }
+function s_SWITCH_IN_PRE_STATEFUL():Intern { Stage{Ingress,  7, 
"ls_in_pre_stateful"}.intern() }
+function s_SWITCH_IN_ACL_HINT():Intern { Stage{Ingress,  8, 
"ls_in_acl_hint"}.intern() }
+function s_SWITCH_IN_ACL(): Intern { Stage{Ingress,  9, 
"ls_in_acl"}.intern() }
+function s_SWITCH_IN_QOS_MARK():Intern { Stage{Ingress, 10, 
"ls_in_qos_mark"}.intern() }
+function s_SWITCH_IN_QOS_METER():   Intern { Stage{Ingress, 11, 
"ls_in_qos_meter"}.intern() }
+function s_SWITCH_IN_STATEFUL():Intern { Stage{Ingress, 12, 
"ls_in_stateful"}.intern() }
+function s_SWITCH_IN_PRE_HAIRPIN(): Intern { Stage{Ingress, 13, 
"ls_in_pre_hairpin"}.intern() }
+function s_SWITCH_IN_NAT_HAIRPIN(): Intern { Stage{Ingress, 14, 
"ls_in_nat_hairpin"}.intern() }
+function s_SWITCH_IN_HAIRPIN(): Intern { Stage{Ingress, 15, 
"ls_in_hairpin"}.intern() }
+function s_SWITCH_IN_ARP_ND_RSP()

[ovs-dev] [PATCH ovn repost 0/7] Reduce memory consumption and time for Load_Balancer benchmark

2021-08-12 Thread Ben Pfaff
Apologies for repost; I forgot to tag this as "ovn" in the first
posting.

For the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption from 115 GB to 17 GB and elapsed time
from 19 minutes to 5 minutes.

I think there's headroom for more improvement, because Leonid has some
cleverness that I haven't been able to successfully work in yet.

This series is also available here:
https://github.com/blp/ovs-reviews/tree/ovn-memory-4

Ben Pfaff (7):
  ovn-northd-ddlog: Merge TaggedFlow and MeteredFlow into Flow.
  ovn-northd-ddlog: Use cheaper representation for stage_hint.
  ovn-northd-ddlog: Intern Stage.
  Intern all the matches and actions.
  ovn-northd-ddlog: Get rid of duplicate flows caused by stage_hint.
  ovn-northd-ddlog: Intern nb::Logical_Switch.
  ovn-northd-ddlog: Postpone expanding external_ids, stage_hint, tags.

 northd/ipam.dl   |2 +-
 northd/lswitch.dl|   36 +-
 northd/multicast.dl  |2 +-
 northd/ovn-nb.dlopts |1 +
 northd/ovn_northd.dl | 3459 +++---
 5 files changed, 1949 insertions(+), 1551 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/1] include/openvswitch/compiler.h: check existence of __builtin_prefetch using __has_builtin

2021-08-12 Thread Ben Pfaff
On Tue, Aug 10, 2021 at 05:17:56PM -0700, Sergey Madaminov wrote:
> Checking if '__has_builtin' is defined and then defining OVS_PREFETCH to
> be '__builtin_prefetch' if it is available. To preserve backwards
> compatibility, the previous way to define OVS_PREFETCH macro is also
> there.

Thanks for working on OVS!

> +#if defined __has_builtin
> +#  if __has_builtin (__builtin_prefetch)
> +#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
> +#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
> +#  endif
> +#elif __GNUC__
> +#  define OVS_PREFETCH(addr) __builtin_prefetch((addr))
> +#  define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
>  #else
> -#define OVS_PREFETCH(addr)
> -#define OVS_PREFETCH_WRITE(addr)
> +#  define OVS_PREFETCH(addr)
> +#  define OVS_PREFETCH_WRITE(addr)
>  #endif

The above has some redundancy: two of the forks of the definitions are
identical.

I think a common way to deal with this is something like:

#ifndef __has_builtin
#define __has_builtin(X) 0
#endif

#if __has_builtin (__builtin_prefetch) || __GNUC__
/* definition 1 */
#else
/* definition 2 */
#endif
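Spelled out as a self-contained sketch (illustrative only: the macro names mirror OVS's OVS_PREFETCH, but sum_with_prefetch() is a made-up helper, not code from include/openvswitch/compiler.h):

```c
#include <assert.h>
#include <stddef.h>

/* Shape of the pattern suggested above: define __has_builtin away on
 * compilers that lack it, then one #if covers both the __has_builtin
 * and the plain-GCC cases. */
#ifndef __has_builtin
#define __has_builtin(x) 0      /* Compilers without __has_builtin. */
#endif

#if __has_builtin(__builtin_prefetch) || __GNUC__
#define OVS_PREFETCH(addr)       __builtin_prefetch((addr))
#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
#else
#define OVS_PREFETCH(addr)       /* No-op fallback. */
#define OVS_PREFETCH_WRITE(addr)
#endif

/* Walk an array, hinting the next element into cache before using it. */
static int
sum_with_prefetch(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            OVS_PREFETCH(&a[i + 1]);
        }
        total += a[i];
    }
    return total;
}
```

Either way the macros expand to nothing on compilers with no prefetch support, so callers need no #ifdefs of their own.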

>  /* Since Visual Studio 2015 there has been an effort to make offsetof a
> @@ -244,11 +253,13 @@
>   * the C compiler.
>   * e.g.: https://bit.ly/2UvWwti
>   */
> + /*
>  #if _MSC_VER >= 1900
>  #undef offsetof
>  #define offsetof(type, member) \
>  ((size_t)((char *)&(((type *)0)->member) - (char *)0))
>  #endif
> +*/

The above seems unrelated.  If it's needed, I'd put it in a separate
patch.  I'd also remove the code rather than commenting it out.


Re: [ovs-dev] [PATCH 0/7] Reduce memory consumption and time for Load_Balancer benchmark

2021-08-12 Thread Ben Pfaff
On Thu, Aug 12, 2021 at 08:53:51AM -0700, Ben Pfaff wrote:
> For the benchmark at
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
> this reduces memory consumption from 115 GB to 17 GB and elapsed time
> from 19 minutes to 5 minutes.
> 
> I think there's headroom for more improvement, because Leonid has some
> cleverness that I haven't been able to successfully work in yet.

I should add that this series is also available here:
https://github.com/blp/ovs-reviews/tree/ovn-memory-4


[ovs-dev] [PATCH 7/7] ovn-northd-ddlog: Postpone expanding external_ids, stage_hint, tags.

2021-08-12 Thread Ben Pfaff
From: Ben Pfaff 

This has little effect on performance in the current benchmark, but it
seems like the cleanest approach to me.

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index dadf33ab6..f0730d40b 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -1667,9 +1667,9 @@ relation AggregatedFlow (
 priority:  integer,
 __match:   istring,
 actions:   istring,
-tags:  Map,
+io_port:   Option,
 controller_meter:  Option,
-external_ids:  Map
+stage_hint:bit<32>
 )
 function make_flow_tags(io_port: Option): Map {
 match (io_port) {
@@ -1690,9 +1690,9 @@ AggregatedFlow(.logical_datapaths = g.to_set(),
.priority = priority,
.__match = __match,
.actions = actions,
-   .tags = make_flow_tags(io_port),
+   .io_port = io_port,
.controller_meter = controller_meter,
-   .external_ids = make_flow_external_ids(stage_hint, stage)) :-
+   .stage_hint = stage_hint) :-
 UseLogicalDatapathGroups[true],
 Flow(logical_datapath, stage, priority, __match, actions, io_port, 
controller_meter, stage_hint),
 var g = logical_datapath.group_by((stage, priority, __match, actions, 
io_port, controller_meter, stage_hint)).
@@ -1701,43 +1701,49 @@ AggregatedFlow(.logical_datapaths = 
set_singleton(logical_datapath),
.priority = priority,
.__match = __match,
.actions = actions,
-   .tags = make_flow_tags(io_port),
+   .io_port = io_port,
.controller_meter = controller_meter,
-   .external_ids = make_flow_external_ids(stage_hint, stage)) :-
+   .stage_hint = stage_hint) :-
 UseLogicalDatapathGroups[false],
 Flow(logical_datapath, stage, priority, __match, actions, io_port, 
controller_meter, stage_hint).
 
+function to_string(pipeline: Pipeline): string {
+if (pipeline == Ingress) {
+"ingress"
+} else {
+"egress"
+}
+}
+
 for (f in AggregatedFlow()) {
-var pipeline = if (f.stage.pipeline == Ingress) "ingress" else "egress" in
-var external_ids = f.external_ids.insert_imm("stage-name", 
f.stage.table_name) in
 if (f.logical_datapaths.size() == 1) {
 Some{var dp} = f.logical_datapaths.nth(0) in
 sb::Out_Logical_Flow(
-._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, 
f.controller_meter, f.external_ids)),
+._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, 
f.controller_meter, f.io_port, f.stage_hint)),
 .logical_datapath = Some{dp},
 .logical_dp_group = None,
-.pipeline = pipeline,
+.pipeline = f.stage.pipeline.to_string(),
 .table_id = f.stage.table_id as integer,
 .priority = f.priority,
 .controller_meter = f.controller_meter,
 .__match  = f.__match.ival(),
 .actions  = f.actions.ival(),
-.tags = f.tags,
-.external_ids = external_ids)
+.tags = make_flow_tags(f.io_port),
+.external_ids = make_flow_external_ids(f.stage_hint, f.stage))
 } else {
 var group_uuid = hash128(f.logical_datapaths) in {
 sb::Out_Logical_Flow(
-._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, 
f.actions, f.controller_meter, f.external_ids)),
+._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, 
f.actions, f.controller_meter, f.io_port, f.stage_hint)),
 .logical_datapath = None,
 .logical_dp_group = Some{group_uuid},
-.pipeline = pipeline,
+.pipeline = f.stage.pipeline.to_string(),
 .table_id = f.stage.table_id as integer,
 .priority = f.priority,
 .controller_meter = f.controller_meter,
 .__match  = f.__match.ival(),
 .actions  = f.actions.ival(),
-.tags = f.tags,
-.external_ids = external_ids);
+.tags = make_flow_tags(f.io_port),
+.external_ids = make_flow_external_ids(f.stage_hint, 
f.stage));
 sb::Out_Logical_DP_Group(._uuid = group_uuid, .datapaths = 
f.logical_datapaths)
 }
 }
-- 
2.31.1



[ovs-dev] [PATCH 6/7] ovn-northd-ddlog: Intern nb::Logical_Switch.

2021-08-12 Thread Ben Pfaff
From: Ben Pfaff 

With the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption by 300 MB and elapsed time by a few
seconds.

Signed-off-by: Ben Pfaff 
---
 northd/ipam.dl   |  2 +-
 northd/lswitch.dl| 36 ++--
 northd/multicast.dl  |  2 +-
 northd/ovn-nb.dlopts |  1 +
 northd/ovn_northd.dl | 16 
 5 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/northd/ipam.dl b/northd/ipam.dl
index da71b2872..4665a28cb 100644
--- a/northd/ipam.dl
+++ b/northd/ipam.dl
@@ -187,7 +187,7 @@ SwitchIPv4ReservedAddresses(lswitch, addrs) :-
 var addrs = addr.group_by(lswitch).to_set().
 
 SwitchIPv4ReservedAddresses(lswitch_uuid, set_empty()) :-
-nb::Logical_Switch(._uuid = lswitch_uuid),
+::Logical_Switch(._uuid = lswitch_uuid),
 not SwitchIPv4ReservedAddress(lswitch_uuid, _).
 
 /* Allocate dynamic IP addresses for ports that require them:
diff --git a/northd/lswitch.dl b/northd/lswitch.dl
index 7e7b62a4d..868ae115f 100644
--- a/northd/lswitch.dl
+++ b/northd/lswitch.dl
@@ -44,7 +44,7 @@ SwitchRouterPeerRef(lsp, None) :-
  * (Use LogicalSwitchPort instead, which guarantees uniqueness.) */
 relation LogicalSwitchPortCandidate(lsp_uuid: uuid, ls_uuid: uuid)
 LogicalSwitchPortCandidate(lsp_uuid, ls_uuid) :-
-nb::Logical_Switch(._uuid = ls_uuid, .ports = ports),
+::Logical_Switch(._uuid = ls_uuid, .ports = ports),
 var lsp_uuid = FlatMap(ports).
 Warning[message] :-
 LogicalSwitchPortCandidate(lsp_uuid, ls_uuid),
@@ -74,7 +74,7 @@ LogicalSwitchPortWithUnknownAddress(ls_uuid, lsp_uuid) :-
 output relation LogicalSwitchHasUnknownPorts(ls: uuid, has_unknown: bool)
 LogicalSwitchHasUnknownPorts(ls, true) :- 
LogicalSwitchPortWithUnknownAddress(ls, _).
 LogicalSwitchHasUnknownPorts(ls, false) :-
-nb::Logical_Switch(._uuid = ls),
+::Logical_Switch(._uuid = ls),
 not LogicalSwitchPortWithUnknownAddress(ls, _).
 
 /* PortStaticAddresses: static IP addresses associated with each 
Logical_Switch_Port */
@@ -101,11 +101,11 @@ PortInGroup(port, group) :-
 relation LogicalSwitchACL(ls: uuid, acl: uuid)
 
 LogicalSwitchACL(ls, acl) :-
-nb::Logical_Switch(._uuid = ls, .acls = acls),
+::Logical_Switch(._uuid = ls, .acls = acls),
 var acl = FlatMap(acls).
 
 LogicalSwitchACL(ls, acl) :-
-nb::Logical_Switch(._uuid = ls, .ports = ports),
+::Logical_Switch(._uuid = ls, .ports = ports),
 var port_id = FlatMap(ports),
 PortInGroup(port_id, group_id),
 nb::Port_Group(._uuid = group_id, .acls = acls),
@@ -125,7 +125,7 @@ LogicalSwitchHasStatefulACL(ls, true) :-
 LogicalSwitchStatefulACL(ls, _).
 
 LogicalSwitchHasStatefulACL(ls, false) :-
-nb::Logical_Switch(._uuid = ls),
+::Logical_Switch(._uuid = ls),
 not LogicalSwitchStatefulACL(ls, _).
 
 // "Pitfalls of projections" in ddlog-new-feature.rst explains why this
@@ -136,7 +136,7 @@ LogicalSwitchHasACLs(ls, true) :-
 LogicalSwitchACL(ls, _).
 
 LogicalSwitchHasACLs(ls, false) :-
-nb::Logical_Switch(._uuid = ls),
+::Logical_Switch(._uuid = ls),
 not LogicalSwitchACL(ls, _).
 
 /*
@@ -146,7 +146,7 @@ LogicalSwitchHasACLs(ls, false) :-
  */
 relation LogicalSwitchLocalnetPort0(ls_uuid: uuid, lsp: (uuid, string))
 LogicalSwitchLocalnetPort0(ls_uuid, (lsp_uuid, lsp.name)) :-
-ls in nb::Logical_Switch(._uuid = ls_uuid),
+ls in ::Logical_Switch(._uuid = ls_uuid),
 var lsp_uuid = FlatMap(ls.ports),
 lsp in ::Logical_Switch_Port(._uuid = lsp_uuid),
 lsp.__type == "localnet".
@@ -156,7 +156,7 @@ LogicalSwitchLocalnetPorts(ls_uuid, localnet_ports) :-
 LogicalSwitchLocalnetPort0(ls_uuid, lsp),
 var localnet_ports = lsp.group_by(ls_uuid).to_vec().
 LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :-
-ls in nb::Logical_Switch(),
+ls in ::Logical_Switch(),
 var ls_uuid = ls._uuid,
 not LogicalSwitchLocalnetPort0(ls_uuid, _).
 
@@ -164,7 +164,7 @@ LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :-
 relation LogicalSwitchDNS(ls_uuid: uuid, dns_uuid: uuid)
 
 LogicalSwitchDNS(ls._uuid, dns_uuid) :-
-nb::Logical_Switch[ls],
+::Logical_Switch[ls],
 var dns_uuid = FlatMap(ls.dns_records),
 nb::DNS(._uuid = dns_uuid).
 
@@ -183,12 +183,12 @@ LogicalSwitchHasDNSRecords(ls, true) :-
 LogicalSwitchWithDNSRecords(ls).
 
 LogicalSwitchHasDNSRecords(ls, false) :-
-nb::Logical_Switch(._uuid = ls),
+::Logical_Switch(._uuid = ls),
 not LogicalSwitchWithDNSRecords(ls).
 
 relation LogicalSwitchHasNonRouterPort0(ls: uuid)
 LogicalSwitchHasNonRouterPort0(ls_uuid) :-
-ls in nb::Logical_Switch(._uuid = ls_uuid),
+ls in ::Logical_Switch(._uuid = ls_uuid),
 var lsp_uuid = FlatMap(ls.ports),
 lsp in ::Logical_Switch_Port(._uuid = lsp_uuid),
 lsp.__type != "router".
@@ -199,7 +199,7 @@ output relation LogicalSwitchHasNonRouterPort(ls: uuid, 
has_non_router_port: boo
 LogicalS

[ovs-dev] [PATCH 3/7] ovn-northd-ddlog: Intern Stage.

2021-08-12 Thread Ben Pfaff
From: Ben Pfaff 

With the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption from 72 GB to 66 GB and elapsed time by
a few seconds.

Signed-off-by: Ben Pfaff 
Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 140 +--
 1 file changed, 70 insertions(+), 70 deletions(-)
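The interning idea itself is hash-consing: keep one shared copy of each distinct value and hand out references to it, so equality becomes pointer comparison and duplicates stop costing memory. A minimal C sketch of the concept (a toy linear-scan string table; ddlog's Intern is a real hash-consed type, and none of the names below come from it):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64

static const char *table[TABLE_SIZE];
static size_t n_interned;

/* Return the canonical shared copy of 's', storing it on first use. */
static const char *
intern(const char *s)
{
    for (size_t i = 0; i < n_interned; i++) {
        if (!strcmp(table[i], s)) {
            return table[i];            /* Already interned: share it. */
        }
    }
    assert(n_interned < TABLE_SIZE);
    char *copy = malloc(strlen(s) + 1); /* First occurrence: keep a copy. */
    assert(copy);
    strcpy(copy, s);
    table[n_interned] = copy;
    return table[n_interned++];
}
```

With only 48 distinct Stage values repeated across millions of flows, sharing one copy per stage is where the savings come from.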

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 2365372fb..f1b7471f2 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -1434,77 +1434,77 @@ nb::Out_Logical_Router_Port(._uuid = _uuid,
 
 typedef Pipeline = Ingress | Egress
 
-typedef Stage = Stage{
+typedef Stage = Stage {
 pipeline: Pipeline,
-table_id: integer,
+table_id: bit<8>,
 table_name  : string
 }
 
 /* Logical switch ingress stages. */
-function s_SWITCH_IN_PORT_SEC_L2(): Stage { Stage{Ingress,  0, 
"ls_in_port_sec_l2"} }
-function s_SWITCH_IN_PORT_SEC_IP(): Stage { Stage{Ingress,  1, 
"ls_in_port_sec_ip"} }
-function s_SWITCH_IN_PORT_SEC_ND(): Stage { Stage{Ingress,  2, 
"ls_in_port_sec_nd"} }
-function s_SWITCH_IN_LOOKUP_FDB():  Stage { Stage{Ingress,  3, 
"ls_in_lookup_fdb"} }
-function s_SWITCH_IN_PUT_FDB(): Stage { Stage{Ingress,  4, 
"ls_in_put_fdb"} }
-function s_SWITCH_IN_PRE_ACL(): Stage { Stage{Ingress,  5, 
"ls_in_pre_acl"} }
-function s_SWITCH_IN_PRE_LB():  Stage { Stage{Ingress,  6, 
"ls_in_pre_lb"} }
-function s_SWITCH_IN_PRE_STATEFUL():Stage { Stage{Ingress,  7, 
"ls_in_pre_stateful"} }
-function s_SWITCH_IN_ACL_HINT():Stage { Stage{Ingress,  8, 
"ls_in_acl_hint"} }
-function s_SWITCH_IN_ACL(): Stage { Stage{Ingress,  9, 
"ls_in_acl"} }
-function s_SWITCH_IN_QOS_MARK():Stage { Stage{Ingress, 10, 
"ls_in_qos_mark"} }
-function s_SWITCH_IN_QOS_METER():   Stage { Stage{Ingress, 11, 
"ls_in_qos_meter"} }
-function s_SWITCH_IN_STATEFUL():Stage { Stage{Ingress, 12, 
"ls_in_stateful"} }
-function s_SWITCH_IN_PRE_HAIRPIN(): Stage { Stage{Ingress, 13, 
"ls_in_pre_hairpin"} }
-function s_SWITCH_IN_NAT_HAIRPIN(): Stage { Stage{Ingress, 14, 
"ls_in_nat_hairpin"} }
-function s_SWITCH_IN_HAIRPIN(): Stage { Stage{Ingress, 15, 
"ls_in_hairpin"} }
-function s_SWITCH_IN_ARP_ND_RSP():  Stage { Stage{Ingress, 16, 
"ls_in_arp_rsp"} }
-function s_SWITCH_IN_DHCP_OPTIONS():Stage { Stage{Ingress, 17, 
"ls_in_dhcp_options"} }
-function s_SWITCH_IN_DHCP_RESPONSE():   Stage { Stage{Ingress, 18, 
"ls_in_dhcp_response"} }
-function s_SWITCH_IN_DNS_LOOKUP():  Stage { Stage{Ingress, 19, 
"ls_in_dns_lookup"} }
-function s_SWITCH_IN_DNS_RESPONSE():Stage { Stage{Ingress, 20, 
"ls_in_dns_response"} }
-function s_SWITCH_IN_EXTERNAL_PORT():   Stage { Stage{Ingress, 21, 
"ls_in_external_port"} }
-function s_SWITCH_IN_L2_LKUP(): Stage { Stage{Ingress, 22, 
"ls_in_l2_lkup"} }
-function s_SWITCH_IN_L2_UNKNOWN():  Stage { Stage{Ingress, 23, 
"ls_in_l2_unknown"} }
+function s_SWITCH_IN_PORT_SEC_L2(): Intern { Stage{Ingress,  0, 
"ls_in_port_sec_l2"}.intern() }
+function s_SWITCH_IN_PORT_SEC_IP(): Intern { Stage{Ingress,  1, 
"ls_in_port_sec_ip"}.intern() }
+function s_SWITCH_IN_PORT_SEC_ND(): Intern { Stage{Ingress,  2, 
"ls_in_port_sec_nd"}.intern() }
+function s_SWITCH_IN_LOOKUP_FDB():  Intern { Stage{Ingress,  3, 
"ls_in_lookup_fdb"}.intern() }
+function s_SWITCH_IN_PUT_FDB(): Intern { Stage{Ingress,  4, 
"ls_in_put_fdb"}.intern() }
+function s_SWITCH_IN_PRE_ACL(): Intern { Stage{Ingress,  5, 
"ls_in_pre_acl"}.intern() }
+function s_SWITCH_IN_PRE_LB():  Intern { Stage{Ingress,  6, 
"ls_in_pre_lb"}.intern() }
+function s_SWITCH_IN_PRE_STATEFUL():Intern { Stage{Ingress,  7, 
"ls_in_pre_stateful"}.intern() }
+function s_SWITCH_IN_ACL_HINT():Intern { Stage{Ingress,  8, 
"ls_in_acl_hint"}.intern() }
+function s_SWITCH_IN_ACL(): Intern { Stage{Ingress,  9, 
"ls_in_acl"}.intern() }
+function s_SWITCH_IN_QOS_MARK():Intern { Stage{Ingress, 10, 
"ls_in_qos_mark"}.intern() }
+function s_SWITCH_IN_QOS_METER():   Intern { Stage{Ingress, 11, 
"ls_in_qos_meter"}.intern() }
+function s_SWITCH_IN_STATEFUL():Intern { Stage{Ingress, 12, 
"ls_in_stateful"}.intern() }
+function s_SWITCH_IN_PRE_HAIRPIN(): Intern { Stage{Ingress, 13, 
"ls_in_pre_hairpin"}.intern() }
+function s_SWITCH_IN_NAT_HAIRPIN(): Intern { Stage{Ingress, 14, 
"ls_in_nat_hairpin"}.intern() }
+function s_SWITCH_IN_HAIRPIN(): Intern { Stage{Ingress, 15, 
"ls_in_hairpin"

[ovs-dev] [PATCH 5/7] ovn-northd-ddlog: Get rid of duplicate flows caused by stage_hint.

2021-08-12 Thread Ben Pfaff
From: Ben Pfaff 

It was possible for these rules to generate multiple Flow records that
differed only in their stage_hint.  This caused a lot of duplication for
the load balancer benchmark.

With the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption from 53 GB to 17 GB and elapsed time
from 14 minutes to 5 minutes.
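The effect being fixed can be sketched in C: two flows identical except for their stage_hint compare unequal, so they survive as separate rows, and forcing the hint to zero makes them collapse. (struct flow and flow_equal below are illustrative stand-ins, not northd code.)

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-in for northd's Flow relation: equality over all
 * fields, including stage_hint, is what lets near-duplicates survive. */
struct flow {
    const char *match;
    const char *actions;
    unsigned int stage_hint;    /* 0 means "no hint". */
};

static int
flow_equal(const struct flow *a, const struct flow *b)
{
    return !strcmp(a->match, b->match)
           && !strcmp(a->actions, b->actions)
           && a->stage_hint == b->stage_hint;
}
```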

Signed-off-by: Ben Pfaff 
Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index dd69126ff..38b97aa6b 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -3163,7 +3163,7 @@ Flow(.logical_datapath = sw._uuid,
  .actions  = actions.intern(),
  .io_port  = None,
  .controller_meter = meter,
- .stage_hint   = stage_hint(lb._uuid)) :-
+ .stage_hint   = 0) :-
 sw in (),
 LBVIPWithStatus[lbvip@{.lb = lb}],
 sw.load_balancer.contains(lb._uuid),
@@ -4587,7 +4587,7 @@ Flow(.logical_datapath = sw._uuid,
  .priority = 90,
  .__match  = i"${fLAGBIT_NOT_VXLAN()} && arp.op == 1 && arp.tpa == 
${ipv4}",
  .actions  = i"outport = ${flood}; output;",
- .stage_hint   = stage_hint(sp.lsp._uuid),
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
 sp in (.sw = sw, .peer = Some{rp}),
@@ -,7 +,7 @@ for (RouterLBVIP(
  .priority = prio,
  .__match  = __match.intern(),
  .actions  = __actions,
- .stage_hint   = stage_hint(lb._uuid),
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 
@@ -6712,7 +6712,7 @@ for (RouterLBVIP(
  .priority = prio,
  .__match  = est_match.intern(),
  .actions  = actions,
- .stage_hint   = stage_hint(lb._uuid),
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 
@@ -6797,7 +6797,7 @@ Flow(.logical_datapath = r._uuid,
  .actions  = actions.intern(),
  .io_port  = None,
  .controller_meter = meter,
- .stage_hint   = stage_hint(lb._uuid)) :-
+ .stage_hint   = 0) :-
 r in (),
 r.l3dgw_ports.len() > 0 or r.is_gateway,
 LBVIPWithStatus[lbvip@{.lb = lb}],
-- 
2.31.1



[ovs-dev] [PATCH 2/7] ovn-northd-ddlog: Use cheaper representation for stage_hint.

2021-08-12 Thread Ben Pfaff
From: Ben Pfaff 

The stage_hint only shows 32 bits of the uuid, so it's cheaper to omit
the rest for the internal representation.  Also, this is just a hint, so
we might as well use zero to mean None and save the cost of the Option
wrapper.
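For reference, ddlog's `_uuid[127:96]` takes the most-significant 32 bits of the 128-bit UUID. A hedged C illustration of the same extraction (assuming the UUID is held as 16 big-endian bytes; OVS's real struct uuid is laid out differently):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: mimic ddlog's `_uuid[127:96]`, i.e. the top 32
 * bits of a 128-bit UUID, here assumed stored as 16 big-endian bytes.
 * A hint value of 0 plays the role of None. */
static uint32_t
stage_hint_from_uuid(const uint8_t uuid[16])
{
    return ((uint32_t) uuid[0] << 24) | ((uint32_t) uuid[1] << 16)
         | ((uint32_t) uuid[2] << 8) | (uint32_t) uuid[3];
}
```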

With the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption by 1.3 GB.

Signed-off-by: Ben Pfaff 
---
 northd/ovn_northd.dl | 579 ++-
 1 file changed, 292 insertions(+), 287 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 844add024..2365372fb 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -1644,9 +1644,13 @@ relation Flow(
 actions:  string,
 io_port:  Option,
 controller_meter: Option,
-stage_hint:   Option
+stage_hint:   bit<32>
 )
 
+function stage_hint(_uuid: uuid): bit<32> {
+_uuid[127:96]
+}
+
 /* If this option is 'true' northd will combine logical flows that differ by
  * logical datapath only by creating a datapath group. */
 relation UseLogicalDatapathGroups[bool]
@@ -1673,11 +1677,12 @@ function make_flow_tags(io_port: Option): 
Map {
 Some{s} -> [ "in_out_port" -> s ]
 }
 }
-function make_flow_external_ids(stage_hint: Option, stage: Stage): 
Map {
-match (stage_hint) {
-None -> ["stage-name" -> stage.table_name],
-Some{uuid} -> ["stage-name" -> stage.table_name,
-   "stage-hint" -> "${hex(uuid[127:96])}"]
+function make_flow_external_ids(stage_hint: bit<32>, stage: Stage): 
Map {
+if (stage_hint == 0) {
+["stage-name" -> stage.table_name]
+} else {
+["stage-name" -> stage.table_name,
+ "stage-hint" -> "${hex(stage_hint)}"]
 }
 }
 AggregatedFlow(.logical_datapaths = g.to_set(),
@@ -1744,7 +1749,7 @@ Flow(.logical_datapath = sw._uuid,
  .priority = 50,
  .__match  = __match,
  .actions  = actions,
- .stage_hint   = Some{fg_uuid},
+ .stage_hint   = stage_hint(fg_uuid),
  .io_port  = None,
  .controller_meter = None) :-
 sw in (),
@@ -1776,7 +1781,7 @@ Flow(.logical_datapath = sw._uuid,
  .priority = 50,
  .__match  = __match,
  .actions  = actions,
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None) :-
 sw in (),
@@ -1799,7 +1804,7 @@ for (sw in ()) {
  .priority = 100,
  .__match  = "vlan.present",
  .actions  = "drop;",
- .stage_hint   = None /*TODO: check*/,
+ .stage_hint   = 0 /*TODO: check*/,
  .io_port  = None,
  .controller_meter = None)
 };
@@ -1810,7 +1815,7 @@ for (sw in ()) {
  .priority = 100,
  .__match  = "eth.src[40]",
  .actions  = "drop;",
- .stage_hint   = None /*TODO: check*/,
+ .stage_hint   = 0 /*TODO: check*/,
  .io_port  = None,
  .controller_meter = None)
 /* Port security flows have priority 50 (see below) and will continue to 
the next table
@@ -1872,7 +1877,7 @@ for ((._uuid =ls_uuid)) {
  .priority = 0,
  .__match  = "1",
  .actions  = "next;",
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 Flow(.logical_datapath = ls_uuid,
@@ -1880,7 +1885,7 @@ for ((._uuid =ls_uuid)) {
  .priority = 0,
  .__match  = "1",
  .actions  = "next;",
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 
@@ -1889,7 +1894,7 @@ for ((._uuid =ls_uuid)) {
  .priority = 110,
  .__match  = "eth.dst == $svc_monitor_mac",
  .actions  = "next;",
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None);
 Flow(.logical_datapath = ls_uuid,
@@ -1897,7 +1902,7 @@ for ((._uuid =ls_uuid)) {
  .priority = 110,
  .__match  = "eth.src == $svc_monitor_mac",
  .actions  = "next;",
- .stage_hint   = None,
+ .stage_hint   = 0,
  .io_port  = None,
  .controller_meter = None)
 }
@@ -1912,7 +1917,7 @@ for ((.sw = sw@{._uuid = ls_uuid}, .acl 
= , .has_fair_meter
  .priority = acl.pr

[ovs-dev] [PATCH 0/7] Reduce memory consumption and time for Load_Balancer benchmark

2021-08-12 Thread Ben Pfaff
For the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html,
this reduces memory consumption from 115 GB to 17 GB and elapsed time
from 19 minutes to 5 minutes.

I think there's headroom for more improvement, because Leonid has some
cleverness that I haven't been able to successfully work in yet.

Ben Pfaff (7):
  ovn-northd-ddlog: Merge TaggedFlow and MeteredFlow into Flow.
  ovn-northd-ddlog: Use cheaper representation for stage_hint.
  ovn-northd-ddlog: Intern Stage.
  Intern all the matches and actions.
  ovn-northd-ddlog: Get rid of duplicate flows caused by stage_hint.
  ovn-northd-ddlog: Intern nb::Logical_Switch.
  ovn-northd-ddlog: Postpone expanding external_ids, stage_hint, tags.

 northd/ipam.dl   |2 +-
 northd/lswitch.dl|   36 +-
 northd/multicast.dl  |2 +-
 northd/ovn-nb.dlopts |1 +
 northd/ovn_northd.dl | 3459 +++---
 5 files changed, 1949 insertions(+), 1551 deletions(-)

-- 
2.31.1



Re: [ovs-dev] ideas for improving the OVS review process

2021-07-23 Thread Ben Pfaff
On Fri, Jul 23, 2021 at 08:26:51PM +0800, Tonghao Zhang wrote:
> On Wed, Jul 21, 2021 at 2:41 AM Ben Pfaff  wrote:
> >
> > The OVS review process has greatly slowed over the last few years.  This
> > is partly because I haven't been able to spend as much time on review,
> > since I was once the most productive reviewer.  Ilya has been able to
> Thanks for your work!
> > step up the amount of review he does, but that still isn't enough to
> > keep up with the workload.
> >
> > We need to come up with some way to improve things.  Here are a few
> > ideas, mostly from a call earlier today (that was mainly about the OVS
> > conference).  I hope they will prompt a discussion.
> >
> > * Since patches are coming in, we have people who are knowledgable about
> >   the code.  Those people should be pitching in with reviews as well.
> >   It doesn't seem like they or their managers have the right incentives
> >   to do that.  Maybe there is some way to improve the incentives.
> Do we need more maintainers? Even though there are many maintainers, only
> a few people are active. Am I wrong?

That reminds me of another point brought up in discussion in the call.
We have some people who are active and valuable contributors, but who
have not been officially appointed as maintainers.  We should nominate
and confirm them, because this also removes a bottleneck from the
process for getting patches in.


Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.

2021-07-22 Thread Ben Pfaff
On Thu, Jul 22, 2021 at 12:14:23PM -0700, Ben Pfaff wrote:
> On Wed, Jul 21, 2021 at 01:22:19PM +0200, Ilya Maximets wrote:
> > On 7/21/21 12:52 AM, Ben Pfaff wrote:
> > > On Tue, Jul 20, 2021 at 08:29:14PM +0200, Ilya Maximets wrote:
> > >> On 7/7/21 8:51 PM, Ben Pfaff wrote:
> > >>> +/* From an OpenFlow point of view, goto_table and 
> > >>> write_metadata are
> > >>> + * instructions, not actions.  This means that to use them, 
> > >>> we'd have
> > >>> + * to reformulate the actions as instructions, which is 
> > >>> possible, and
> > >>> + * we'd have to slot them into the frozen actions in a specific 
> > >>> order,
> > >>> + * which doesn't seem practical.  Instead, we translate these
> > >>> + * instructions into equivalent actions. */
> [...]
> > >> I have very little knowledge in this area, so this might not be an issue.
> > >> However, 'OpenFlow "instructions", which were introduced in OpenFlow 1.1'
> > >> doesn't match with OFPACT_SET_FIELD that is marked as OpenFlow 1.2+
> > >> action.  I see that all tests for continuation have default bridge
> > >> configuration.  And monitors are OF10 and OF13 in different cases, but
> > >> I'm not  sure if these ovs-ofctl monitors are checking that received
> > >> actions for resume matches the OF version they are using.  And for
> > >> the bridge itself, it supports all versions of OF, so it can interpret
> > >> actions.  But the question is: will that work if the bridge only
> > >> supports OpenFlow 1.1 ?  It should fail to parse OFPACT_SET_FIELD, or
> > >> am I missing something here?
> > > 
> > > Thanks a lot for the review!
> > > 
> > > OVS can serialize most actions and instructions to most OpenFlow
> > > versions, even if they are not natively supported in that particular
> > > OpenFlow version.  For example, in the case of OFPACT_SET_FIELD (a
> > > rather important action), the code in ofp-actions.c has complete
> > > fallbacks:
> [...]
> 
> > > What is lacking is a fallback for OF1.1+, which does have instructions,
> > > in the case where we're encoding something that can't use the
> > > instructions but must use actions instead.  That's the case here.  It's
> > > a weird case and I don't know about it elsewhere.
> > 
> > OK.  Thanks for the explanation!  That makes sense.  For some reason I
> > mixed up ofpacts with OpenFlow actions in my head and so I missed the
> > translation layer that handles conversion to the appropriate OpenFlow
> > version.
> > 
> > TBH, it's really hard to track the call chain that connects the
> > freeze_unroll_actions() and encode_SET_FIELD().  I still didn't figure
> > that out by reading the code.  I guess, the only option for me is to
> > attach debugger, break the execution and look at the stack, unless you
> > can draw a call tree for me.
> 
> It's a little indirect.
> 
> If we're in freeze_unroll_actions(), it's because we're freezing.  If
> we're freezing, we'll unwind the call stack a bit and end up in
> finish_freezing__().  That function will stick the actions in a struct
> frozen_state, and then attach it to a newly allocated recirc_id via
> recirc_alloc_id_ctx().  That recirc_id gets stuck in an action that gets
> appended to the flow we're currently translating.  In the case I'm
> looking at here, that's an action to send the packet to userspace.
> 
> Later on, the datapath sends a packet to userspace.
> process_upcall() in userspace hits the CONTROLLER_UPCALL case, which
> grabs the recirc_id out of the upcall cookie, gets the frozen state
> using that, gets the ofpacts out of it and stuffs them into the private
> part of the packet-in message it constructs, and then sends an async msg
> (a packet-in with private data attached) to the controller.  Eventually
> that async message gets serialized using
> ofputil_encode_packet_in_private(), which calls
> ofputil_put_packet_in_private(), which encodes the actions via calls to
> put_actions_property().
> 
> I wish it was more direct.  The real problem here is the fixed datapath
> interface.  We can't change that unless and until we drop support for
> the Linux kernel module and its frozen ABI.
> 
> > In the end, it's hard for me to review the logic behind this change,
> > but the patch seems to do what is described in the commit message and
> > the implementation looks correct, tests work fine.  With that in mind:
> > 
> > Acked-by: Ilya Maximets 
> 
> Thanks a lot!  I'll recheck this and push it soon.

I applied this to master.  Grayson, do you want this backported?  What
version?


Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.

2021-07-22 Thread Ben Pfaff
On Wed, Jul 21, 2021 at 01:22:19PM +0200, Ilya Maximets wrote:
> On 7/21/21 12:52 AM, Ben Pfaff wrote:
> > On Tue, Jul 20, 2021 at 08:29:14PM +0200, Ilya Maximets wrote:
> >> On 7/7/21 8:51 PM, Ben Pfaff wrote:
> >>> +/* From an OpenFlow point of view, goto_table and write_metadata 
> >>> are
> >>> + * instructions, not actions.  This means that to use them, we'd 
> >>> have
> >>> + * to reformulate the actions as instructions, which is 
> >>> possible, and
> >>> + * we'd have to slot them into the frozen actions in a specific 
> >>> order,
> >>> + * which doesn't seem practical.  Instead, we translate these
> >>> + * instructions into equivalent actions. */
[...]
> >> I have very little knowledge in this area, so this might not be an issue.
> >> However, 'OpenFlow "instructions", which were introduced in OpenFlow 1.1'
> >> doesn't match with OFPACT_SET_FIELD that is marked as OpenFlow 1.2+
> >> action.  I see that all tests for continuation have default bridge
> >> configuration.  And monitors are OF10 and OF13 in different cases, but
> >> I'm not  sure if these ovs-ofctl monitors are checking that received
> >> actions for resume matches the OF version they are using.  And for
> >> the bridge itself, it supports all versions of OF, so it can interpret
> >> actions.  But the question is: will that work if the bridge only
> >> supports OpenFlow 1.1 ?  It should fail to parse OFPACT_SET_FIELD, or
> >> am I missing something here?
> > 
> > Thanks a lot for the review!
> > 
> > OVS can serialize most actions and instructions to most OpenFlow
> > versions, even if they are not natively supported in that particular
> > OpenFlow version.  For example, in the case of OFPACT_SET_FIELD (a
> > rather important action), the code in ofp-actions.c has complete
> > fallbacks:
[...]

> > What is lacking is a fallback for OF1.1+, which does have instructions,
> > in the case where we're encoding something that can't use the
> > instructions but must use actions instead.  That's the case here.  It's
> > a weird case and I don't know about it elsewhere.
> 
> OK.  Thanks for the explanation!  That makes sense.  For some reason I
> mixed up ofpacts with OpenFlow actions in my head and so I missed the
> translation layer that handles conversion to the appropriate OpenFlow
> version.
> 
> TBH, it's really hard to track the call chain that connects the
> freeze_unroll_actions() and encode_SET_FIELD().  I still didn't figure
> that out by reading the code.  I guess, the only option for me is to
> attach debugger, break the execution and look at the stack, unless you
> can draw a call tree for me.

It's a little indirect.

If we're in freeze_unroll_actions(), it's because we're freezing.  If
we're freezing, we'll unwind the call stack a bit and end up in
finish_freezing__().  That function will stick the actions in a struct
frozen_state, and then attach it to a newly allocated recirc_id via
recirc_alloc_id_ctx().  That recirc_id gets stuck in an action that gets
appended to the flow we're currently translating.  In the case I'm
looking at here, that's an action to send the packet to userspace.

Later on, the datapath sends a packet up to userspace.
process_upcall() in userspace hits the CONTROLLER_UPCALL case, which
grabs the recirc_id out of the upcall cookie, gets the frozen state
using that, gets the ofpacts out of it and stuffs them into the private
part of the packet-in message it constructs, and then sends an async msg
(a packet-in with private data attached) to the controller.  Eventually
that async message gets serialized using
ofputil_encode_packet_in_private(), which calls
ofputil_put_packet_in_private(), which encodes the actions via calls to
put_actions_property().

I wish it was more direct.  The real problem here is the fixed datapath
interface.  We can't change that unless and until we drop support for
the Linux kernel module and its frozen ABI.

> In the end, it's hard for me to review the logic behind this change,
> but the patch seems to do what is described in the commit message and
> the implementation looks correct, tests work fine.  With that in mind:
> 
> Acked-by: Ilya Maximets 

Thanks a lot!  I'll recheck this and push it soon.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-dummy: Silence the 'may be uninitialized' warning.

2021-07-22 Thread Ben Pfaff
On Thu, Jul 22, 2021 at 01:33:37PM +0200, Ilya Maximets wrote:
> GCC 11 on Fedora 34 emits a false-positive warning like this:
> 
>  lib/netdev-dummy.c: In function ‘dummy_packet_stream_run’:
>  lib/netdev-dummy.c:284:16: error: ‘n’ may be used uninitialized in this
>function [-Werror=maybe-uninitialized]
>  284 | if (retval == n && dp_packet_size(&s->rxbuf) > 2) {
>   |^
> 
> This breaks the build with --enable-Werror.  Initializing 'n' to
> avoid the warning.
> 
> Signed-off-by: Ilya Maximets 

Acked-by: Ben Pfaff 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ideas for improving the OVS review process

2021-07-21 Thread Ben Pfaff
On Wed, Jul 21, 2021 at 10:37:18AM +0200, Eelco Chaudron wrote:
> 
> 
> On 20 Jul 2021, at 20:41, Ben Pfaff wrote:
> 
> > The OVS review process has greatly slowed over the last few years.  This
> > is partly because I haven't been able to spend as much time on review,
> > since I was once the most productive reviewer.  Ilya has been able to
> > step up the amount of review he does, but that still isn't enough to
> > keep up with the workload.
> >
> > We need to come up with some way to improve things.  Here are a few
> > ideas, mostly from a call earlier today (that was mainly about the OVS
> > conference).  I hope they will prompt a discussion.
> >
> > * Since patches are coming in, we have people who are knowledgeable about
> >   the code.  Those people should be pitching in with reviews as well.
> >   It doesn't seem like they or their managers have the right incentives
> >   to do that.  Maybe there is some way to improve the incentives.
> 
> I do agree that it takes (very) long sometimes to get a patch
> reviewed/accepted, and I do see people complain about it. However,
> some of the people who do complain have not done a single
> review. Maybe we can ask people who send in a patch, to review at
> least one patch while they are waiting for theirs to be reviewed?

Hey, that's a great idea.  Maybe I will start looking for review
requests from people who do not do many reviews and suggest that they
should look at other patches out for reviews.

There may be some difficulty, though, that people new to OVS are less
qualified to review other patches.  However, trying to review is also a
good way to learn.

> Maybe have the zero-day robot sent them a thank you email for the
> patch with a list of patches that did not yet receive a single review
> comment?

That's a great idea!  I don't know whether it's practical, but I like
the idea.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.

2021-07-20 Thread Ben Pfaff
On Tue, Jul 20, 2021 at 08:29:14PM +0200, Ilya Maximets wrote:
> On 7/7/21 8:51 PM, Ben Pfaff wrote:
> > +/* From an OpenFlow point of view, goto_table and write_metadata are
> > + * instructions, not actions.  This means that to use them, we'd have
> > + * to reformulate the actions as instructions, which is possible, and
> > + * we'd have to slot them into the frozen actions in a specific order,
> > + * which doesn't seem practical.  Instead, we translate these
> > + * instructions into equivalent actions. */
> > +case OFPACT_GOTO_TABLE: {
> > +struct ofpact_resubmit *resubmit
> > += ofpact_put_RESUBMIT(&ctx->frozen_actions);
> > +resubmit->in_port = OFPP_IN_PORT;
> > +resubmit->table_id = ofpact_get_GOTO_TABLE(a)->table_id;
> > +resubmit->with_ct_orig = false;
> > +}
> > +continue;
> > +case OFPACT_WRITE_METADATA: {
> > +const struct ofpact_metadata *md = ofpact_get_WRITE_METADATA(a);
> > +const struct mf_field *mf = mf_from_id(MFF_METADATA);
> > +ovs_assert(mf->n_bytes == sizeof md->metadata);
> > +ovs_assert(mf->n_bytes == sizeof md->mask);
> > +ofpact_put_set_field(&ctx->frozen_actions, mf,
> > + &md->metadata, &md->mask);
> 
> I have very little knowledge in this area, so this might be not issue.
> However, 'OpenFlow "instructions", which were introduced in OpenFlow 1.1'
> doesn't match with OFPACT_SET_FIELD that is marked as OpenFlow 1.2+
> action.  I see that all tests for continuation have default bridge
> configuration.  And monitors are OF10 and OF13 in different cases, but
> I'm not  sure if these ovs-ofctl monitors are checking that received
> actions for resume matches the OF version they are using.  And for
> the bridge itself, it supports all versions of OF, so it can interpret
> actions.  But the question is: will that work if the bridge only
> supports OpenFlow 1.1 ?  It should fail to parse OFPACT_SET_FIELD, or
> am I missing something here?

Thanks a lot for the review!

OVS can serialize most actions and instructions to most OpenFlow
versions, even if they are not natively supported in that particular
OpenFlow version.  For example, in the case of OFPACT_SET_FIELD (a
rather important action), the code in ofp-actions.c has complete
fallbacks:

static void
encode_SET_FIELD(const struct ofpact_set_field *sf,
 enum ofp_version ofp_version, struct ofpbuf *out)
{
if (ofp_version >= OFP15_VERSION) {
/* OF1.5+ only has Set-Field (reg_load is redundant so we drop it
 * entirely). */
set_field_to_set_field(sf, ofp_version, out);
} else if (sf->ofpact.raw == NXAST_RAW_REG_LOAD ||
   sf->ofpact.raw == NXAST_RAW_REG_LOAD2) {
/* It came in as reg_load, send it out the same way. */
set_field_to_nxast(sf, out);
} else if (ofp_version < OFP12_VERSION) {
/* OpenFlow 1.0 and 1.1 don't have Set-Field. */
set_field_to_legacy_openflow(sf, ofp_version, out);
} else if (is_all_ones(ofpact_set_field_mask(sf), sf->field->n_bytes)) {
/* We're encoding to OpenFlow 1.2, 1.3, or 1.4.  The action sets an
 * entire field, so encode it as OFPAT_SET_FIELD. */
set_field_to_set_field(sf, ofp_version, out);
} else {
        /* We're encoding to OpenFlow 1.2, 1.3, or 1.4.  The action cannot be
         * encoded as OFPAT_SET_FIELD because it does not set an entire field,
 * so encode it as reg_load. */
set_field_to_nxast(sf, out);
}
}

And even for the instructions we're talking about now, there are
fallbacks for OF1.0, which doesn't have instructions, so that we can
encode them as OF1.0 extension actions instead:

static void
encode_WRITE_METADATA(const struct ofpact_metadata *metadata,
  enum ofp_version ofp_version, struct ofpbuf *out)
{
if (ofp_version == OFP10_VERSION) {
struct nx_action_write_metadata *nawm;

nawm = put_NXAST_WRITE_METADATA(out);
nawm->metadata = metadata->metadata;
nawm->mask = metadata->mask;
} else {
struct ofp11_instruction_write_metadata *oiwm;

oiwm = instruction_put_OFPIT11_WRITE_METADATA(out);
oiwm->metadata = metadata->metadata;
oiwm->metadata_mask = metadata->mask;
}
}

...

static void
encode_GOTO_TABLE(const struct ofpact_goto_table *goto_table,
  

[ovs-dev] ideas for improving the OVS review process

2021-07-20 Thread Ben Pfaff
The OVS review process has greatly slowed over the last few years.  This
is partly because I haven't been able to spend as much time on review,
since I was once the most productive reviewer.  Ilya has been able to
step up the amount of review he does, but that still isn't enough to
keep up with the workload.

We need to come up with some way to improve things.  Here are a few
ideas, mostly from a call earlier today (that was mainly about the OVS
conference).  I hope they will prompt a discussion.

* Since patches are coming in, we have people who are knowledgeable about
  the code.  Those people should be pitching in with reviews as well.
  It doesn't seem like they or their managers have the right incentives
  to do that.  Maybe there is some way to improve the incentives.

* The Linux kernel uses something like a "default accept" policy for
  patches that superficially look good and compile and boot, if there is
  no review from a specific maintainer within a few days.  The patches
  do get some small amount of review from a subsystem maintainer.  OVS
  could adopt a similar policy.

* Some lack of review can be attributed to a reluctance to accept a
  review from a reviewer who is at the same company as the patch
  submitter.  There is good reason for this, but it is certainly
  possible to get quality reviews from a coworker, and perhaps we should
  relax the convention.

* A flip side of the above could be to codify the requirement for review
  from a non-coworker.  This would have the benefit of being able to
  push back against requests to commit unreviewed long series on the
  basis that it hasn't been reviewed by a third party.

I hope that there can be some discussion here.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/2] Optimize the poll loop for poll_immediate_wake()

2021-07-20 Thread Ben Pfaff
On Tue, Jul 20, 2021 at 06:26:48PM +0100, Anton Ivanov wrote:
> 
> On 19/07/2021 21:52, Ben Pfaff wrote:
> > On Fri, Jul 16, 2021 at 02:42:32PM +0100, anton.iva...@cambridgegreys.com wrote:
> > > From: Anton Ivanov 
> > > 
> > > If we are not obtaining any useful information out of the poll(),
> > > such as is a fd busy or not, we do not need to do a poll() if
> > > an immediate_wake() has been requested.
> > > 
> > > This cuts out all the pollfd hash additions, forming the poll
> > > arguments and the actual poll() after a call to
> > > poll_immediate_wake()
> > > 
> > > Signed-off-by: Anton Ivanov 
> > Thanks for v3.
> > 
> > I believe that the existing code here properly handles the pathological
> > case where the current time is less than 0.  This is a case that I have
> > seen happen on real systems that have a real-time clock with a bad
> > current time and do not have proper support for a monotonic clock.  I
> > believe that your new code does not properly handle this case, because
> > it treats all times less than 0 as immediate.  I think that you can fix
> > it by comparing against LLONG_MIN rather than zero.
> > 
> > In poll-loop, I would move
> >  COVERAGE_INC(poll_zero_timeout);
> > inside the new statement that is already conditional on timeout_when ==
> > LLONG_MIN.
> > 
> > I don't like the new assumption in time_poll() that n_pollfds == 0 means
> > we're waking immediately.
> > 
> > Thanks,
> > 
> > Ben.
> > 
> Updated, all logic is now strictly based on timeout == LLONG_MIN.

Thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/2] Optimize the poll loop for poll_immediate_wake()

2021-07-20 Thread Ben Pfaff
On Tue, Jul 20, 2021 at 08:19:35AM +0100, Anton Ivanov wrote:
> 
> On 19/07/2021 21:52, Ben Pfaff wrote:
> > On Fri, Jul 16, 2021 at 02:42:32PM +0100, anton.iva...@cambridgegreys.com wrote:
> > > From: Anton Ivanov 
> > > 
> > > If we are not obtaining any useful information out of the poll(),
> > > such as is a fd busy or not, we do not need to do a poll() if
> > > an immediate_wake() has been requested.
> > > 
> > > This cuts out all the pollfd hash additions, forming the poll
> > > arguments and the actual poll() after a call to
> > > poll_immediate_wake()
> > > 
> > > Signed-off-by: Anton Ivanov 
> > Thanks for v3.
> > 
> > I believe that the existing code here properly handles the pathological
> > case where the current time is less than 0.  This is a case that I have
> > seen happen on real systems that have a real-time clock with a bad
> > current time and do not have proper support for a monotonic clock.  I
> > believe that your new code does not properly handle this case, because
> > it treats all times less than 0 as immediate.  I think that you can fix
> > it by comparing against LLONG_MIN rather than zero.
> In that case, we should set poll_immediate_wake to be LLONG_MIN as well, 
> right?

It does that, even though it's a bit indirect, since
poll_immediate_wake_at() passes 0 to poll_timer_wait_at(), which
translates 0 to LLONG_MIN.

> > In poll-loop, I would move
> >  COVERAGE_INC(poll_zero_timeout);
> > inside the new statement that is already conditional on timeout_when ==
> > LLONG_MIN.
> > 
> > I don't like the new assumption in time_poll() that n_pollfds == 0 means
> > we're waking immediately.
> 
> We can do everything on timeout. That is more consistent.

Thanks.

> As far as assumptions go, it is equivalent on Unixes, because the
> number of fds is always > 1. This is because you always get 1 fd from
> the signal pipe.

That's true if poll_block() is the only caller of time_poll(), which is
currently true, as long as we continue to deal with signals this way.
But we don't document or enforce that assumption and I'd rather not rely
on it.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.

2021-07-19 Thread Ben Pfaff
On Wed, Jul 07, 2021 at 11:51:50AM -0700, Ben Pfaff wrote:
> Open vSwitch supports OpenFlow "instructions", which were introduced in
> OpenFlow 1.1 and act like restricted kinds of actions that can only
> appear in a particular order and particular circumstances.  OVS did
> not support two of these instructions, "write_metadata" and
> "goto_table", properly in the case where they appeared in a flow that
> needed to be frozen for continuations.

I would appreciate a review for this bug fix.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 2/2] Minimize the number of time calls in time_poll()

2021-07-19 Thread Ben Pfaff
On Fri, Jul 16, 2021 at 02:42:33PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> time_poll() makes an excessive number of time_msec() calls
> which incur a performance penalty.
> 
> 1. Avoid time_msec() call for timeout calculation when time_poll()
> is asked to skip poll()
> 
> 2. Reuse the time_msec() result from deadline calculation for
> last_wakeup and timeout calculation.
> 
> Signed-off-by: Anton Ivanov 

I'll take another look at this in v4.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/2] Optimize the poll loop for poll_immediate_wake()

2021-07-19 Thread Ben Pfaff
On Fri, Jul 16, 2021 at 02:42:32PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> If we are not obtaining any useful information out of the poll(),
> such as is a fd busy or not, we do not need to do a poll() if
> an immediate_wake() has been requested.
> 
> This cuts out all the pollfd hash additions, forming the poll
> arguments and the actual poll() after a call to
> poll_immediate_wake()
> 
> Signed-off-by: Anton Ivanov 

Thanks for v3.

I believe that the existing code here properly handles the pathological
case where the current time is less than 0.  This is a case that I have
seen happen on real systems that have a real-time clock with a bad
current time and do not have proper support for a monotonic clock.  I
believe that your new code does not properly handle this case, because
it treats all times less than 0 as immediate.  I think that you can fix
it by comparing against LLONG_MIN rather than zero.

In poll-loop, I would move
COVERAGE_INC(poll_zero_timeout);
inside the new statement that is already conditional on timeout_when ==
LLONG_MIN.

I don't like the new assumption in time_poll() that n_pollfds == 0 means
we're waking immediately.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-sbctl.at: Fix timing problem of count-flows test.

2021-07-19 Thread Ben Pfaff
On Thu, Jul 15, 2021 at 06:10:40PM -0700, Han Zhou wrote:
> On Thu, Jul 15, 2021 at 4:57 PM Ben Pfaff  wrote:
> 
> > On Thu, Jul 15, 2021 at 04:34:51PM -0700, Han Zhou wrote:
> > > Fixes: 895e02ec0be6 ("ovn-sbctl.c Add logical flows count numbers")
> > > Signed-off-by: Han Zhou 
> > > ---
> > >  tests/ovn-sbctl.at | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at
> > > index 16f5dabcc..dabf62d2b 100644
> > > --- a/tests/ovn-sbctl.at
> > > +++ b/tests/ovn-sbctl.at
> > > @@ -200,7 +200,7 @@ Total number of logical flows = 0
> > >  ])
> > >
> > >  # create some logical flows
> > > -check ovn-nbctl ls-add count-test
> > > +check ovn-nbctl --wait=sb ls-add count-test
> > >
> > >  OVS_WAIT_UNTIL([total_lflows=`count_entries`; test $total_lflows -ne 0])
> >
> > I'm surprised that the above change is needed, since OVS_WAIT_UNTIL
> > should wait until the condition is true.
> 
> 
> Yes you are right. This is not needed, but I think it’s not harmful either.
> Do you want me to send a v2 with this removed?

I don't think you need to send a v2, but I'd prefer to not check in that
part of the change.

> The second place is where this test occasionally fails.

That one makes sense!  For that part:
Acked-by: Ben Pfaff 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] FW: [ovs-dev v1 1-1] datapath-windows-Correct checksum for DNAT action

2021-07-19 Thread Ben Pfaff
On Sun, Jul 18, 2021 at 04:22:02AM +, Wilson Peng wrote:
> Hi,  d...@openvswitch.org ,
> 
> I would like to send out a patch for an issue found in datapath-windows in
> the OVS repo.
> 
> I have sent the patch via email, following the patch submission guide at
> the link below:
>  
> https://docs.openvswitch.org/en/latest/internals/contributing/submitting-patches/
> 
> I have also subscribed to the ovs-dev mailing list, but I could not see
> the patch I submitted.
> 
> This is the first time I have sent a patch to the OVS community, and I am
> not sure how to proceed with the patch submission.

I don't think I saw a patch get caught in the spam filters (I would have
released it if I had), so this is pretty odd.

We have several submitters who submit from vmware.com email addresses,
so I don't think that's the problem.

If you can't figure it out, you could always use a different email
account to submit the patches (you can still use the vmware.com address
as the commit author), or you could submit PRs against github instead.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Documentation: Remove duplicate words.

2021-07-19 Thread Ben Pfaff
On Mon, Jul 19, 2021 at 01:27:16PM +0200, David Marchand wrote:
> This is a simple cleanup with a script of mine.
> 
> Signed-off-by: David Marchand 
> ---
> This script is a bit too silly/simplistic and requires manual review
> (too many false positives), but I don't think it is worth investing too
> much time into integrating this kind of check in utilities/checkpatch.py.

Thanks, pushed!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS has now branched for the 2.16 release.

2021-07-16 Thread Ben Pfaff
On Fri, Jul 16, 2021 at 10:42:01PM +0200, Ilya Maximets wrote:
> Hi, everyone.
> 
> branch-2.16 was just created.  Please, test it and report issues!
> 
> Official release date according to our release process should be on
> Monday, August 16th (1 month from now).
> 
> Best regards, Ilya Maximets.

Thanks for doing it!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] controller: Avoid unnecessary load balancer flow processing.

2021-07-16 Thread Ben Pfaff
On Fri, Jul 16, 2021 at 09:43:17AM +0200, Dumitru Ceara wrote:
> On 7/16/21 3:05 AM, Han Zhou wrote:
> > On Thu, Jul 15, 2021 at 5:03 PM Ben Pfaff  wrote:
> > 
> >> On Mon, Jul 12, 2021 at 10:08:10AM +0200, Dumitru Ceara wrote:
> >>> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
> >>> sequence of events happens:
> >>>
> >>> 1. The Southbound Load_Balancer record is updated.
> >>> 2. The Southbound Datapath_Binding records on which the Load_Balancer is
> >>>applied are updated.
> >>> 3. Southbound ovsdb-server sends updates about the Load_Balancer and
> >>>Datapath_Binding records to ovn-controller.
> >>> 4. The IDL layer in ovn-controller processes the updates at #3, but
> >>>because of the SB schema references between tables [0] all logical
> >>>flows referencing the updated Datapath_Binding are marked as
> >>>"updated".  The same is true for Logical_DP_Group records
> >>>referencing the Datapath_Binding, and also for all logical flows
> >>>pointing to the new "updated" datapath groups.
> >>> 5. ovn-controller ends up recomputing (removing/readding) all flows for
> >>>all these tracked updates.
> >>
> >> This is kind of a weird change from my perspective.  It allows for
> >> broken referential integrity in the database to work around a
> >> performance bug in the IDL.
> 
> Thanks for looking at this change!
> 
> I guess the description in the commit message is a bit unclear.  Let me
> try to fix that here.  There is no bug in the IDL; it's doing what it's
> supposed to do given the Southbound database schema.
> 
> The problem is with the schema itself.  When flow generation for load
> balancers was moved to ovn-controller an additional reference (from
> Datapath_Binding to Load_Balancer) was added in the SB schema to avoid
> performance issues or extremely complex code in ovn-controller to handle
> Load_Balancer updates incrementally.
> 
> My first attempt at the fix was to just remove that additional
> reference.  That is fine, and doesn't affect correctness but:
> 1. makes the upgrade scenario complex because ovn-controllers running
> old versions would expect that reference to be there.
> 2. adds lots of complicated code in ovn-controller incremental-processing.
> 
> > 
> > 
> > Yes, it did look weird and there were detailed discussions on it in the v1
> > reviews. Some options that require much bigger changes were discussed for
> > longer term, unless ddlog is used.
> > 
> 
> This would be the longer term solution, along with removing the
> reference from Datapath_Binding to Load_Balancers completely.

OK.  In context, it makes more sense.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] controller: Avoid unnecessary load balancer flow processing.

2021-07-15 Thread Ben Pfaff
On Mon, Jul 12, 2021 at 10:08:10AM +0200, Dumitru Ceara wrote:
> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
> sequence of events happens:
> 
> 1. The Southbound Load_Balancer record is updated.
> 2. The Southbound Datapath_Binding records on which the Load_Balancer is
>applied are updated.
> 3. Southbound ovsdb-server sends updates about the Load_Balancer and
>Datapath_Binding records to ovn-controller.
> 4. The IDL layer in ovn-controller processes the updates at #3, but
>because of the SB schema references between tables [0] all logical
>flows referencing the updated Datapath_Binding are marked as
>"updated".  The same is true for Logical_DP_Group records
>referencing the Datapath_Binding, and also for all logical flows
>pointing to the new "updated" datapath groups.
> 5. ovn-controller ends up recomputing (removing/readding) all flows for
>all these tracked updates.

This is kind of a weird change from my perspective.  It allows for
broken referential integrity in the database to work around a
performance bug in the IDL.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Optimize the poll loop for poll_immediate_wake()

2021-07-15 Thread Ben Pfaff
I'm impressed by the improvement.  I didn't really expect any.

On Tue, Jul 13, 2021 at 09:19:34AM +0100, Anton Ivanov wrote:
> I ran the revised patch series (v2) which addresses your comments on the 
> ovn-heater benchmark.
> 
> With 9 fake nodes, 50 fake pods per node I get ~ 2% reproducible improvement. 
> IMHO it should be even more noticeable at high scale.
> 
> Best regards,
> 
> A
> 
> On 09/07/2021 19:45, Ben Pfaff wrote:
> > On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.iva...@cambridgegreys.com wrote:
> > > From: Anton Ivanov 
> > > 
> > > If we are not obtaining any useful information out of the poll(),
> > > such as is a fd busy or not, we do not need to do a poll() if
> > > an immediate_wake() has been requested.
> > > 
> > > This cuts out all the pollfd hash additions, forming the poll
> > > arguments and the actual poll() after a call to
> > > poll_immediate_wake()
> > > 
> > > Signed-off-by: Anton Ivanov 
> > Thanks for the patch.
> > 
> > I think that this will have some undesirable side effects because it
> > avoids calling time_poll() if the wakeup should happen immediately, and
> > time_poll() does some thing that we always want to happen between one
> > main loop and another.  In particular the following calls from
> > time_poll() are important:
> > 
> >  coverage_clear();
> >  coverage_run();
> >  if (*last_wakeup && !thread_is_pmd()) {
> >  log_poll_interval(*last_wakeup);
> >  }
> > 
> > ...
> > 
> >  if (!time_left) {
> >  ovsrcu_quiesce();
> >  } else {
> >  ovsrcu_quiesce_start();
> >  }
> > 
> > ...
> > 
> >  if (deadline <= time_msec()) {
> > #ifndef _WIN32
> >  fatal_signal_handler(SIGALRM);
> > #else
> >  VLOG_ERR("wake up from WaitForMultipleObjects after deadline");
> >  fatal_signal_handler(SIGTERM);
> > #endif
> >  if (retval < 0) {
> >  retval = 0;
> >  }
> >  break;
> >  }
> > 
> > ...
> > 
> >  *last_wakeup = time_msec();
> >  refresh_rusage();
> > 
> > Instead of this change, I'd suggest something more like the following,
> > along with the changes you made to suppress the file descriptors if the
> > timeout is already zero:
> > 
> > diff --git a/lib/timeval.c b/lib/timeval.c
> > index 193c7bab1781..f080a742 100644
> > --- a/lib/timeval.c
> > +++ b/lib/timeval.c
> > @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
> >   }
> >   #ifndef _WIN32
> > -retval = poll(pollfds, n_pollfds, time_left);
> > +retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0;
> >   if (retval < 0) {
> >   retval = -errno;
> >   }
> > 
> > 
> -- 
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ovn-sbctl.at: Fix timing problem of count-flows test.

2021-07-15 Thread Ben Pfaff
On Thu, Jul 15, 2021 at 04:34:51PM -0700, Han Zhou wrote:
> Fixes: 895e02ec0be6 ("ovn-sbctl.c Add logical flows count numbers")
> Signed-off-by: Han Zhou 
> ---
>  tests/ovn-sbctl.at | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at
> index 16f5dabcc..dabf62d2b 100644
> --- a/tests/ovn-sbctl.at
> +++ b/tests/ovn-sbctl.at
> @@ -200,7 +200,7 @@ Total number of logical flows = 0
>  ])
>  
>  # create some logical flows
> -check ovn-nbctl ls-add count-test
> +check ovn-nbctl --wait=sb ls-add count-test
>  
>  OVS_WAIT_UNTIL([total_lflows=`count_entries`; test $total_lflows -ne 0])

I'm surprised that the above change is needed, since OVS_WAIT_UNTIL
should wait until the condition is true.

> @@ -219,7 +219,7 @@ $ingress_lflows
>  ])
>  
>  # add another datapath
> -check ovn-nbctl ls-add count-test2
> +check ovn-nbctl --wait=sb ls-add count-test2
>  
>  # check total logical flows in 2 datapathes
>  AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep "flows =" | awk 'NF>1{print $NF}'], [0], [dnl

The one above makes sense to me.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ovn-northd-ddlog - high mem and cpu usage when started with an existing DB

2021-07-15 Thread Ben Pfaff
On Mon, Jul 12, 2021 at 04:42:27PM -0700, Ben Pfaff wrote:
> On Thu, Jul 08, 2021 at 08:59:24PM +0200, Dumitru Ceara wrote:
> > Hi Ben,
> > 
> > As discussed earlier, during the OVN meeting, I've noticed a new
> > performance issue with ovn-northd-ddlog when running it against a
> > database from one of our more recent scale tests:
> > 
> > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210708/ovnnb_db.db
> > 
> > ovn-northd-ddlog uses 100% CPU and never really reaches the point to
> > perform the first transaction to the Southbound.  Memory usage is also
> > very high, I stopped it at 45GB RSS.
> > 
> > To test I did:
> > SANDBOXFLAGS="--nbdb-source=/tmp/ovnnb_db.db --ddlog" make sandbox
> 
> Thanks.  I've been spending a lot of time with this Friday and today.
> It is a bit different from the other issues I've looked at.  The
> previous ones were inefficient production of relatively small output.
> This one is inefficient production (and storage) of rather large output
> (millions of flows).  I'm trying to get help from Leonid on how to
> reduce the memory usage.

Leonid has been looking into this and we're going to talk through a
solution tomorrow.  With luck, I'll have some patches soon after that.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] bond: Fix broken rebalancing after link state changes.

2021-07-15 Thread Ben Pfaff
On Tue, Jul 13, 2021 at 05:32:06PM +0200, Ilya Maximets wrote:
> There are 3 constraints for moving hashes from one member to another:
> 
>   1. The load difference exceeds ~ 3% of the load of one member.
>   2. The difference in load between members exceeds 100,000 bytes.
>   3. Moving the hash reduces the load difference by more than 10%.
> 
> In the current implementation, if one of the members transitions to
> the DOWN state, all hashes assigned to it will be moved to the other
> members.  After that, if the member goes UP, it will wait for
> rebalancing to get hashes.  But in case we have more than 10 equally
> loaded hashes, it will never meet constraint # 3, because each hash
> will handle less than 10% of the load.  The situation gets worse when
> the number of flows grows and it is almost impossible to transfer any
> hash when all 256 hash records are used, which is very likely when we
> have few hundred/thousand flows.
> 
> As a result, if one of the members goes down and back up while traffic
> flows, it will never be used to transmit packets again.  This will not
> be fixed even if we completely stop the traffic and start it again,
> because the first two constraints will block rebalancing in the
> earlier stages, while we have low traffic volume.
> 
> Moving a single hash if the destination does not have any hashes,
> as it was before commit c460a6a7bc75 ("ofproto/bond: simplifying the
> rebalancing logic"), will not help, because a single hash is not
> enough to make the difference in load less than 10% of the total load,
> and this member will handle only that one hash forever.
> 
> To fix this, let's try to move multiple hashes at the same time to
> meet constraint # 3.
> 
> The implementation includes sorting the "records" to be able to
> collect records with a cumulative load close enough to the ideal value.
> 
> Signed-off-by: Ilya Maximets 

I reread this and it still looks good to me.

I spotted one typo in a comment:

diff --git a/ofproto/bond.c b/ofproto/bond.c
index b9bfa45493b8..c3e2083575b0 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -1216,7 +1216,7 @@ choose_entries_to_migrate(const struct bond_member *from, uint64_t to_tx_bytes,
 }
 
 if (!cnt) {
-/* There is no entry which load less than or equal to 'ideal_delta'.
+/* There is no entry with load less than or equal to 'ideal_delta'.
      * Lets try closest one. The closest is the last in sorted list. */
 struct bond_entry *closest;
 

Acked-by: Ben Pfaff 
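
For readers following along, the three constraints described above can be sketched as follows. This is an illustration only: the thresholds mirror the commit message's description (~3%, 100,000 bytes, 10%), and the function names `load_delta_significant` and `move_reduces_delta_enough` are hypothetical, not the actual ofproto/bond.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constraints 1 and 2: the load difference must exceed ~3% of the
 * busier member's load (approximated here as 1/32) and must exceed
 * 100,000 bytes.  Assumes from_tx_bytes >= to_tx_bytes. */
static bool
load_delta_significant(uint64_t from_tx_bytes, uint64_t to_tx_bytes)
{
    uint64_t delta = from_tx_bytes - to_tx_bytes;

    return delta > from_tx_bytes / 32 && delta > 100000;
}

/* Constraint 3: moving hashes carrying 'moved' bytes from the busier
 * member to the idler one changes the difference from 'delta' to
 * |delta - 2 * moved|; the move is worthwhile only if that shrinks
 * the difference by more than 10%. */
static bool
move_reduces_delta_enough(uint64_t delta, uint64_t moved)
{
    uint64_t new_delta = (delta > 2 * moved
                          ? delta - 2 * moved
                          : 2 * moved - delta);

    return new_delta < delta && delta - new_delta > delta / 10;
}
```

The `new_delta < delta` guard also rejects moves that would overshoot and make the imbalance worse, which is why collecting several hashes with a cumulative load near the ideal value helps.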


Re: [ovs-dev] [PATCH ovn 2/2] tutorial: Remove OVS-specific files.

2021-07-15 Thread Ben Pfaff
On Tue, Jul 13, 2021 at 12:17:48PM -0400, Numan Siddique wrote:
> On Fri, Jul 2, 2021 at 5:36 PM Ben Pfaff  wrote:
> >
> > These were part of the OVS tutorial, which isn't relevant for OVN.
> >
> > Signed-off-by: Ben Pfaff 
> 
> Acked-by: Numan Siddique 

Thanks for the reviews.  I pushed these commits.


Re: [ovs-dev] [PATCH v2 2/2] Minimize the number of time calls in time_poll()

2021-07-15 Thread Ben Pfaff
On Mon, Jul 12, 2021 at 06:15:29PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> time_poll() makes an excessive number of time_msec() calls
> which incur a performance penalty.
> 
> 1. Avoid time_msec() call for timeout calculation when time_poll()
> is asked to skip poll()
> 
> 2. Reuse the time_msec() result from deadline calculation for
> last_wakeup and timeout calculation.
> 
> Signed-off-by: Anton Ivanov 

I'd like another look at that after patch 1 is revised.


Re: [ovs-dev] [PATCH v2 1/2] Optimize the poll loop for poll_immediate_wake()

2021-07-15 Thread Ben Pfaff
On Mon, Jul 12, 2021 at 06:15:28PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> If we are not obtaining any useful information out of the poll(),
> such as is a fd busy or not, we do not need to do a poll() if
> an immediate_wake() has been requested.
> 
> This cuts out all the pollfd hash additions, forming the poll
> arguments and the actual poll() after a call to
> poll_immediate_wake()
> 
> Signed-off-by: Anton Ivanov 

I don't think we need the new 'immediate_wake' member of struct
poll_loop.  A 'timeout_when' of LLONG_MIN already means the same thing
(it's even documented in the comment).

Thanks,

Ben.


Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.

2021-07-15 Thread Ben Pfaff
On Thu, Jul 15, 2021 at 08:45:00PM +0200, Lorenzo Bianconi wrote:
> > On Thu, Jul 15, 2021 at 11:18:12AM -0700, Han Zhou wrote:
> > > On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi <
> > > lorenzo.bianc...@redhat.com> wrote:
> > > > This should avoid some work by doing the cheapest check (the one on
> > > > UseLogicalDatapathGroups) before any joins.  DDlog is probably
> > > > factoring out the reference to the Flow relation, which is identical
> > > > in both, but this ought to avoid the group_by aggregation (which is
> > > > relatively expensive) in the case where UseLogicalDatapathGroups is
> > > > not enabled.
> > > 
> > > Thanks! I didn't know that the order matters. (not sure if there is
> > > documentation that I missed)
> > 
> > In general, DDlog executes each rule in the order given, so if you have
> > a series of joins in a rule, then it's a good idea to order them, if you
> > can, to keep the number of records small at each join step.  It won't
> > affect the correctness, but it will affect performance.
> > 
> > This might not be documented well.  I do occasionally work on the DDlog
> > documentation, so I've made a note to try to see whether the effect of
> > ordering is mentioned and improve it if I can.
> > 
> > In this particular case, I talked to Leonid about it during a discussion
> > of how to improve performance and memory use in the benchmark currently
> > at issue, and Leonid says that the ordering doesn't actually matter in
> > this case because both of the possibilities (the ones for
> > UseLogicalDatapathGroups[true] and UseLogicalDatapathGroups[false]) had
> > an identical clause at the beginning.  DDlog optimizes identical
> > prefixes, so there was no real difference in performance.
> > 
> > This patch could, therefore, be dropped, but it should also not be
> > harmful.  Dropping it would require some changes to the later patch that
> > updates the ovn-northd ddlog code, so folding it into that one would
> > also be an option.
> > 
> 
> Do you want me to repost dropping this patch or is the series fine as it is?

The series is fine as is.


Re: [ovs-dev] [PATCH] ovsdb-server: Fix memleak when failing to read storage.

2021-07-15 Thread Ben Pfaff
On Wed, Jul 14, 2021 at 09:21:19AM +0200, Dumitru Ceara wrote:
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
> Signed-off-by: Dumitru Ceara 

Thanks!  I pushed this.  I noticed that the declaration could be moved
down to the first assignment, so I actually pushed the following.

I backported to all affected branches.

-8<--cut here-->8--

From: Dumitru Ceara 
Date: Wed, 14 Jul 2021 09:21:19 +0200
Subject: [PATCH] ovsdb-server: Fix memleak when failing to read storage.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Signed-off-by: Dumitru Ceara 
Signed-off-by: Ben Pfaff 
---
 ovsdb/ovsdb-server.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index b09232c654ab..f4a0fac99259 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -660,8 +660,6 @@ add_db(struct server_config *config, struct db *db)
 static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
 open_db(struct server_config *config, const char *filename)
 {
-struct db *db;
-
 /* If we know that the file is already open, return a good error message.
  * Otherwise, if the file is open, we'll fail later on with a harder to
  * interpret file locking error. */
@@ -676,9 +674,6 @@ open_db(struct server_config *config, const char *filename)
 return error;
 }
 
-db = xzalloc(sizeof *db);
-db->filename = xstrdup(filename);
-
 struct ovsdb_schema *schema;
 if (ovsdb_storage_is_clustered(storage)) {
 schema = NULL;
@@ -691,6 +686,9 @@ open_db(struct server_config *config, const char *filename)
 }
 ovs_assert(schema && !txn_json);
 }
+
+struct db *db = xzalloc(sizeof *db);
+db->filename = xstrdup(filename);
 db->db = ovsdb_create(schema, storage);
 ovsdb_jsonrpc_server_add_db(config->jsonrpc, db->db);
 
-- 
2.31.1



Re: [ovs-dev] [PATCH v2] latch-unix: Decrease the stack usage in latch

2021-07-15 Thread Ben Pfaff
On Thu, Jul 15, 2021 at 04:28:12PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> 1. Make latch behave as described and documented - clear all
> outstanding latch writes when invoking latch_poll().
> 2. Decrease the size of the latch buffer. Less stack usage,
> less cache thrashing.
> 
> Signed-off-by: Anton Ivanov 

Applied, thanks!  I dropped the following hunk:

> --- a/lib/latch-unix.c
> +++ b/lib/latch-unix.c
> @@ -23,6 +23,7 @@
>  #include "openvswitch/poll-loop.h"
>  #include "socket-util.h"
>  
> +
>  /* Initializes 'latch' as initially unset. */
>  void
>  latch_init(struct latch *latch)


Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.

2021-07-15 Thread Ben Pfaff
On Thu, Jul 15, 2021 at 11:18:12AM -0700, Han Zhou wrote:
> On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi <
> lorenzo.bianc...@redhat.com> wrote:
> > This should avoid some work by doing the cheapest check (the one on
> > UseLogicalDatapathGroups) before any joins.  DDlog is probably
> > factoring out the reference to the Flow relation, which is identical
> > in both, but this ought to avoid the group_by aggregation (which is
> > relatively expensive) in the case where UseLogicalDatapathGroups is
> > not enabled.
> 
> Thanks! I didn't know that the order matters. (not sure if there is
> documentation that I missed)

In general, DDlog executes each rule in the order given, so if you have
a series of joins in a rule, then it's a good idea to order them, if you
can, to keep the number of records small at each join step.  It won't
affect the correctness, but it will affect performance.

This might not be documented well.  I do occasionally work on the DDlog
documentation, so I've made a note to try to see whether the effect of
ordering is mentioned and improve it if I can.

In this particular case, I talked to Leonid about it during a discussion
of how to improve performance and memory use in the benchmark currently
at issue, and Leonid says that the ordering doesn't actually matter in
this case because both of the possibilities (the ones for
UseLogicalDatapathGroups[true] and UseLogicalDatapathGroups[false]) had
an identical clause at the beginning.  DDlog optimizes identical
prefixes, so there was no real difference in performance.

This patch could, therefore, be dropped, but it should also not be
harmful.  Dropping it would require some changes to the later patch that
updates the ovn-northd ddlog code, so folding it into that one would
also be an option.
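
A toy model of why clause ordering can matter (leaving aside the identical-prefix optimization described above): when a one-row guard relation is joined first, a false guard means no per-row work at all, whereas scanning the large relation first materializes every row before the guard can filter them. The functions below count "tuples materialized" under that simplified cost model; the names are hypothetical and this is an illustration, not DDlog internals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Guard-first plan: evaluate the cheap, selective clause before the
 * join, so a false guard short-circuits all per-row work. */
static size_t
tuples_guard_first(bool guard_matches, size_t flow_rows)
{
    return guard_matches ? flow_rows : 0;
}

/* Big-relation-first plan: every row is scanned and materialized
 * before the guard filters the result. */
static size_t
tuples_flows_first(bool guard_matches, size_t flow_rows)
{
    (void) guard_matches;       /* applied only after the full scan */
    return flow_rows;
}
```

Both plans produce the same final relation; only the intermediate work differs, which is why ordering affects performance but not correctness.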


Re: [ovs-dev] [PATCH ovn] Fix compilation error for m32.

2021-07-14 Thread Ben Pfaff
On Wed, Jul 14, 2021 at 06:23:15PM -0400, num...@ovn.org wrote:
> From: Numan Siddique 
> 
> Fixes: 895e02ec0be6("ovn-sbctl.c Add logical flows count numbers")
> Signed-off-by: Numan Siddique 

Acked-by: Ben Pfaff 


Re: [ovs-dev] [PATCH] latch-unix: Make the latch read buffer shared

2021-07-14 Thread Ben Pfaff
On Wed, Jul 14, 2021 at 09:12:10PM +0100, Anton Ivanov wrote:
> On 14/07/2021 19:33, Ben Pfaff wrote:
> > On Wed, Jul 14, 2021 at 05:36:36PM +0100, anton.iva...@cambridgegreys.com 
> > wrote:
> > > From: Anton Ivanov 
> > > 
> > > There is no point to add 512 bytes on the stack
> > > every time latch is polled. Alignment, cache line thrashing,
> > > etc - you name it.
> > Do you have evidence this is a real problem?
> 
> I played a bit with it using the ovn-heater benchmark, difference was
> marginal.
> 
> IMHO it will result in a difference only on a bigger setup which I cannot
> simulate.
> 
> > 
> > > The result of the read is discarded anyway so the buffer
> > > can be shared by all latches.
> > > 
> > > Signed-off-by: Anton Ivanov 
> > > +/* All writes to latch are zero sized. Even 16 bytes are an overkill */
> > > +static char latch_buffer[16];
> > This comment is wrong.  Writes to a latch are 1 byte.
> 
> Me bad - I saw the "" in write() and ignored the 1 passed as length.
> 
> > 
> > latch_poll() is supposed to fully clear any buffered data.  It shouldn't
> > cause behavioral problems if it doesn't, and I imagine that it's rare
> > that there'd be more than 16 queued notifications, but it seems
> > regressive to just clear some of them.
> 
> The read can be looped. In fact, for full correctness it should be looped
> regardless of the size of the read buffer.

Couldn't hurt.

> So maybe 16 local looped?

I'd be OK with that.


Re: [ovs-dev] [PATCH] latch-unix: Make the latch read buffer shared

2021-07-14 Thread Ben Pfaff
On Wed, Jul 14, 2021 at 05:36:36PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> There is no point to add 512 bytes on the stack
> every time latch is polled. Alignment, cache line thrashing,
> etc - you name it.

Do you have evidence this is a real problem?

> The result of the read is discarded anyway so the buffer
> can be shared by all latches.
> 
> Signed-off-by: Anton Ivanov 

> +/* All writes to latch are zero sized. Even 16 bytes are an overkill */
> +static char latch_buffer[16];

This comment is wrong.  Writes to a latch are 1 byte.

latch_poll() is supposed to fully clear any buffered data.  It shouldn't
cause behavioral problems if it doesn't, and I imagine that it's rare
that there'd be more than 16 queued notifications, but it seems
regressive to just clear some of them.

It's silly to use static data for 16 bytes.  If you're going to reduce
the size, just keep it as local.

>  /* Initializes 'latch' as initially unset. */
>  void
>  latch_init(struct latch *latch)
> @@ -43,9 +46,7 @@ latch_destroy(struct latch *latch)
>  bool
>  latch_poll(struct latch *latch)
>  {
> -char buffer[_POSIX_PIPE_BUF];
> -
> -return read(latch->fds[0], buffer, sizeof buffer) > 0;
> +return read(latch->fds[0], latch_buffer, sizeof latch_buffer) > 0;
>  }
>  
>  /* Sets 'latch'.
> -- 
> 2.20.1
> 
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v6 ovn 0/4] respin CoPP series

2021-07-13 Thread Ben Pfaff
On Tue, Jul 13, 2021 at 12:00:13PM -0700, Ben Pfaff wrote:
> On Sat, Jul 10, 2021 at 02:56:53PM +0200, Lorenzo Bianconi wrote:
> > ack, no worries. I folded the patch here:
> > https://github.com/LorenzoBianconi/ovn/tree/CoPP-v7
> > 
> > however CoPP tests available in system-ovn.at and for 
> > ovn-northd/ovn-controller
> > are failing with DDlog. Can you please double check if you are facing the 
> > same
> > issues?
> 
> I didn't run the system-ovn.at tests, but the ones for ovn-northd and
> ovn-controller are embarrassing.  Sorry about that.  I'll fix them, and
> I'll see if I can get the system-ovn tests set up and run here.

Here's a fix to fold in.  With this, I get failures in the following
tests.  I'm going to assume that this isn't related since all the
versions of the same test fail:

539: Symmetric IPv6 ECMP reply flows -- ovn-northd -- dp-groups=yes FAILED (ovn.at:23254)
540: Symmetric IPv6 ECMP reply flows -- ovn-northd -- dp-groups=no FAILED (ovn.at:23254)
541: Symmetric IPv6 ECMP reply flows -- ovn-northd-ddlog -- dp-groups=yes FAILED (ovn.at:23254)
542: Symmetric IPv6 ECMP reply flows -- ovn-northd-ddlog -- dp-groups=no FAILED (ovn.at:23254)

I still haven't set up the system-ovn tests.  It's on my to-do list.

-8<--cut here-->8--

ovn-northd-ddlog: Fix CoPP behavior.

In ovn_northd.dl, the new 'controller_meter' column needs to be
included in the row UUID, otherwise replacing a row with one that is
identical except for controller_meter will fail with an ovsdb-server
transaction error.

In ovn-controller.at, it's necessary to wait for flow table updates.
---
 northd/ovn_northd.dl| 4 ++--
 tests/ovn-controller.at | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index d7b54be85644..71e868f05db2 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -1697,7 +1697,7 @@ for (f in AggregatedFlow()) {
 if (f.logical_datapaths.size() == 1) {
 Some{var dp} = f.logical_datapaths.nth(0) in
 sb::Out_Logical_Flow(
-._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.external_ids)),
+._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)),
 .logical_datapath = Some{dp},
 .logical_dp_group = None,
 .pipeline = pipeline,
@@ -1710,7 +1710,7 @@ for (f in AggregatedFlow()) {
 } else {
 var group_uuid = hash128(f.logical_datapaths) in {
 sb::Out_Logical_Flow(
-._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.external_ids)),
+._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)),
 .logical_datapath = None,
 .logical_dp_group = Some{group_uuid},
 .pipeline = pipeline,
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 475528623b5c..e317b70aa55f 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -632,19 +632,19 @@ check ovn-nbctl ls-lb-add ls1 lb1
 
 # controller-event metering
 check ovn-nbctl meter-add event-elb drop 100 pktps 10
-check ovn-nbctl ls-copp-add ls1 event-elb event-elb
+check ovn-nbctl --wait=hv ls-copp-add ls1 event-elb event-elb
 
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep controller | grep userdata=00.00.00.0f | grep -q meter_id=1])
 
 # reject metering
 check ovn-nbctl meter-add acl-meter drop 1 pktps 0
 check ovn-nbctl ls-copp-add ls1 reject acl-meter
-check ovn-nbctl acl-add ls1 from-lport 1002 'inport == "lsp1" && ip && udp' reject
+check ovn-nbctl --wait=hv acl-add ls1 from-lport 1002 'inport == "lsp1" && ip && udp' reject
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep controller | grep userdata=00.00.00.16 | grep -q meter_id=2])
 
 # arp metering
 check ovn-nbctl meter-add arp-meter drop 200 pktps 0
-check ovn-nbctl lr-copp-add lr1 arp-resolve arp-meter
+check ovn-nbctl --wait=hv lr-copp-add lr1 arp-resolve arp-meter
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep controller | grep userdata=00.00.00.00 | grep -q meter_id=3])
 
 OVN_CLEANUP([hv1])
-- 
2.31.1



Re: [ovs-dev] [PATCH v6 ovn 0/4] respin CoPP series

2021-07-13 Thread Ben Pfaff
On Sat, Jul 10, 2021 at 02:56:53PM +0200, Lorenzo Bianconi wrote:
> ack, no worries. I folded the patch here:
> https://github.com/LorenzoBianconi/ovn/tree/CoPP-v7
> 
> however CoPP tests available in system-ovn.at and for 
> ovn-northd/ovn-controller
> are failing with DDlog. Can you please double check if you are facing the same
> issues?

I didn't run the system-ovn.at tests, but the ones for ovn-northd and
ovn-controller are embarrassing.  Sorry about that.  I'll fix them, and
I'll see if I can get the system-ovn tests set up and run here.


Re: [ovs-dev] ovn-northd-ddlog - high mem and cpu usage when started with an existing DB

2021-07-12 Thread Ben Pfaff
On Thu, Jul 08, 2021 at 08:59:24PM +0200, Dumitru Ceara wrote:
> Hi Ben,
> 
> As discussed earlier, during the OVN meeting, I've noticed a new
> performance issue with ovn-northd-ddlog when running it against a
> database from one of our more recent scale tests:
> 
> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210708/ovnnb_db.db
> 
> ovn-northd-ddlog uses 100% CPU and never really reaches the point to
> perform the first transaction to the Southbound.  Memory usage is also
> very high, I stopped it at 45GB RSS.
> 
> To test I did:
> SANDBOXFLAGS="--nbdb-source=/tmp/ovnnb_db.db --ddlog" make sandbox

Thanks.  I've been spending a lot of time with this Friday and today.
It is a bit different from the other issues I've looked at.  The
previous ones were inefficient production of relatively small output.
This one is inefficient production (and storage) of rather large output
(millions of flows).  I'm trying to get help from Leonid on how to
reduce the memory usage.

Thanks,

Ben.


Re: [ovs-dev] [PATCH ovn] ovn-nbctl: Fix ovn-nbctl --print-wait-time.

2021-07-09 Thread Ben Pfaff
On Fri, Jul 09, 2021 at 12:23:49PM -0700, Han Zhou wrote:
> Recently ovn-nbctl was changed to partially monitor NB DB, which
> revealed the missing column monitoring for NB_Global:nb_cfg_timestamp.
> 
> Without the fix, ovn-nbctl --wait=hv --print-wait-time would print
> something like:
> 
> Time spent on processing nb_cfg 39:
>   ovn-northd delay before processing: -1623916000627ms
>   ovn-northd completion:  14408ms
>   ovn-controller(s) completion:   36204ms
> 
> Signed-off-by: Han Zhou 

Acked-by: Ben Pfaff 


Re: [ovs-dev] [PATCH] Optimize the poll loop for poll_immediate_wake()

2021-07-09 Thread Ben Pfaff
On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
> 
> If we are not obtaining any useful information out of the poll(),
> such as is a fd busy or not, we do not need to do a poll() if
> an immediate_wake() has been requested.
> 
> This cuts out all the pollfd hash additions, forming the poll
> arguments and the actual poll() after a call to
> poll_immediate_wake()
> 
> Signed-off-by: Anton Ivanov 

Thanks for the patch.

I think that this will have some undesirable side effects because it
avoids calling time_poll() if the wakeup should happen immediately, and
time_poll() does some things that we always want to happen between one
main loop iteration and another.  In particular the following calls from
time_poll() are important:

coverage_clear();
coverage_run();
if (*last_wakeup && !thread_is_pmd()) {
log_poll_interval(*last_wakeup);
}

...

if (!time_left) {
ovsrcu_quiesce();
} else {
ovsrcu_quiesce_start();
}

...

if (deadline <= time_msec()) {
#ifndef _WIN32
fatal_signal_handler(SIGALRM);
#else
VLOG_ERR("wake up from WaitForMultipleObjects after deadline");
fatal_signal_handler(SIGTERM);
#endif
if (retval < 0) {
retval = 0;
}
break;
}

...

*last_wakeup = time_msec();
refresh_rusage();

Instead of this change, I'd suggest something more like the following,
along with the changes you made to suppress the file descriptors if the
timeout is already zero:

diff --git a/lib/timeval.c b/lib/timeval.c
index 193c7bab1781..f080a742 100644
--- a/lib/timeval.c
+++ b/lib/timeval.c
@@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
 }
 
 #ifndef _WIN32
-retval = poll(pollfds, n_pollfds, time_left);
+retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0;
 if (retval < 0) {
 retval = -errno;
 }



Re: [ovs-dev] is the OVN load balancer also intended to be a firewall?

2021-07-09 Thread Ben Pfaff
On Wed, Jul 07, 2021 at 05:17:17PM -0700, Han Zhou wrote:
> On Wed, Jul 7, 2021 at 3:39 PM Ben Pfaff  wrote:
> >
> > Hi, I've been talking to Shay Vargaftik (CC'd), also a researcher at
> > VMware, about some work he's done on optimizing load balancers.  What
> > he's come up with is a technique that in many cases avoids putting
> > connections into the connection-tracking table, because it can achieve
> > per-connection consistency without needing to do that.  This improves
> > performance by reducing the size of the connection-tracking table, which
> > is therefore more likely to fit inside a CPU cache and cheaper to
> > search.
> >
> > I'm trying to determine whether this technique would apply to OVN's load
> > balancer.  There would be challenges in any case, but one fundamental
> > question I have is: is the OVN load balancer also supposed to be a
> > firewall?  If it's not, then it's worth continuing to look to see if the
> > technique is applicable.  On the other hand, if it is, then every
> > connection needs to be tracked in any case, so the technique can't be
> > useful.
> >
> > Anyone's thoughts would be welcome.
> >
> For my understanding OVN LB doesn't directly relate to FW (OVN ACLs),
> although they both use conntrack. For LB, we use conntrack for NAT (convert
> the client IP to an LB owned IP) purposes. Does this technique support NAT
> without using conntrack?

Some kinds of NAT just change IP addresses.  These kinds of NAT don't
fundamentally need to track per-connection state, so this kind of NAT
would not demand conntrack.  Other kinds of NAT pull a port number from
a pool and assign it to each connection.  The latter kind does need to
track per-connection state, so it does need conntrack.  I'm not sure
which category the OVN LB is in.

> Moreover, maybe for the future, we also need to consider the cases when a
> LB is applied on a OVN gateway, for HA purposes the NAT tracking entries
> need to be able to be replicated across nodes, so that when failover
> happens the existing connections can continue working through another
> gateway node.

I don't think this is a problem for Shay's technique, which is called
JET (for Just Enough Tracking).  JET is designed to gracefully cope with
failover.  In fact, this is possibly an even better case for JET,
because fewer entries mean less replication.

> There are also OVN LB use cases that don't require NAT. If this technique
> doesn't support NAT, it is probably still useful for those scenarios.

That makes sense.

Do you have any idea which of the various cases that OVN supports are
commonly used?  Or whether performance is an issue for them in practice?
It would be silly to spend a lot of time optimizing the performance of
something that is already fast enough.

