Re: [ovs-dev] Question about moving the uuidset implementation from OVN to OVS
On Fri, Sep 16, 2022 at 5:30 AM Dumitru Ceara wrote:
> I'd like to post an OVS patch to add this along with some more helpers
> (see below) but I have a couple of questions:
>
> 1. What's the correct/desired way of maintaining authorship in this
> case?  I can keep all original authors and add my own sign off to it
> too but is that ok?

Seems OK to me.

> 2. It seems we miss a copyright notice in ovn-northd-ddlog.c.  Would it
> be ok if I add the following?
>
>  * Copyright (c) 2020, 2022 VMware, Inc.
>  * Copyright (c) 2022 Red Hat, Inc.

That also seems OK to me.

Thanks,

Ben.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 0/2] Patches to branch for 3.0.
On Fri, Jul 15, 2022 at 6:10 AM Ilya Maximets wrote:
> Hey!  It's been about 9 years since the last major version change.
> And there were lots of large and important changes in OVS in that
> long time frame.  Nevertheless, removal of the kernel datapath
> sources from the OVS's git tree seems like a notable enough point
> in time to mark it with a version bump.

I support the bump to 3.0!  Great idea!
Re: [ovs-dev] [PATCH 0/6] Remove OVS kernel driver
On Fri, Jul 8, 2022 at 12:00 PM Ilya Maximets wrote:
> On 7/8/22 17:37, Ben Pfaff wrote:
> > On Fri, Jul 8, 2022 at 1:53 AM David Marchand wrote:
> > > On Thu, Jul 7, 2022 at 11:21 PM Gregory Rose wrote:
> > > > > xenserver/openvswitch-xen.spec.in
> > > >
> > > > This is a bit of an issue.  Does anyone even use xenserver anymore?
> > > > All of the documentation and build instructions are really old and
> > > > I wonder if they work anymore.  I have no xenserver build
> > > > environment to test any changes.
> > >
> > > Same for me.
> > >
> > > Ben, Ilya, do you know if this xenserver packaging still has users?
> > > The last comment about it was in 2017.
> > > fefb757ce408 ("debian, xenserver: Update logrotate config to match
> > > RHEL.")
> > >
> > > Should we keep on updating it? or can we simply drop support for
> > > xenserver packaging?
> >
> > I'd be inclined to drop it.  I have heard so little about xenserver
> > (or XCP, which I think is its successor) over the last few years.  It
> > makes me a bit sad, since that's where Open vSwitch started out (in a
> > few cases I literally had to disassemble bits of its binaries to
> > understand what was going on), but the world moves on.
>
> It seems like there is a newer open-source XCP-ng project that is
> more or less active.  There is also proprietary Citrix Hypervisor,
> of course.  Both seem to have OVS as a default networking solution
> and both seem to use some version of OVS 2.5.3.  Here is a repo
> for the XCP-ng one:
> https://github.com/xcp-ng-rpms/openvswitch
>
> They are not using our spec file though.  And have some custom
> patches on top.  Not sure about Citrix Hypervisor, but I found some
> more or less recent security hotfixes referencing the openvswitch
> package like this one:
> https://support.citrix.com/article/CTX306423/hotfix-xs82e022-for-citrix-hypervisor-82
>
> So, yes, someone is still using it, but they are using an about 5 year
> old version at this point and they are not using the upstream spec
> file.

I guess they're not really using anything we're working on now.  I think
this still supports dropping it.  I'm sure we'd welcome them back if
they asked.
Re: [ovs-dev] [PATCH 0/6] Remove OVS kernel driver
On Fri, Jul 8, 2022 at 1:53 AM David Marchand wrote:
> On Thu, Jul 7, 2022 at 11:21 PM Gregory Rose wrote:
> > > xenserver/openvswitch-xen.spec.in
> >
> > This is a bit of an issue.  Does anyone even use xenserver anymore?
> > All of the documentation and build instructions are really old and
> > I wonder if they work anymore.  I have no xenserver build environment
> > to test any changes.
>
> Same for me.
>
> Ben, Ilya, do you know if this xenserver packaging still has users?
> The last comment about it was in 2017.
> fefb757ce408 ("debian, xenserver: Update logrotate config to match
> RHEL.")
>
> Should we keep on updating it? or can we simply drop support for
> xenserver packaging?

I'd be inclined to drop it.  I have heard so little about xenserver (or
XCP, which I think is its successor) over the last few years.  It makes
me a bit sad, since that's where Open vSwitch started out (in a few
cases I literally had to disassemble bits of its binaries to understand
what was going on), but the world moves on.
Re: [ovs-dev] [RFC] classifier: change segment 3 boundary to defer dependant processing
On Thu, May 12, 2022 at 8:55 AM Aaron Conole wrote:
> Ben Pfaff writes:
>
> > Hi Aaron!  This will move the following fields from the L4 segment to
> > the L3 segment:
> >
> >     struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
> >     struct eth_addr arp_sha;    /* ARP/ND source hardware address. */
> >     struct eth_addr arp_tha;    /* ARP/ND target hardware address. */
> >     ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type.
> >                                  * With L3 to avoid matching L4. */
> >     ovs_be16 pad2;              /* Pad to 64 bits. */
> >     struct ovs_key_nsh nsh;     /* Network Service Header keys */
> >
> > The one that stands out to me is tcp_flags.  I remember that we had a
> > reason that we wanted to include this in the L3 segment.  I think it
> > was because we might want to match on flags without matching on
> > anything else in L4, but I don't remember the exact details.  There is
> > a comment to that effect above.
>
> I guess the reason that would be done would be to implement a stateless
> firewall using the openflow rules.  I haven't seen many flow pipelines
> recently that use tcp_flags, but I can see it might help those kinds of
> scenarios where we want to allow/drop packets based on some kinds of
> flags (or drop xmas packets before they get processed).
>
> Do you think it will be a problem?  For now, I assume most folks are
> using the ct() action and that should be based on ct_state field (which
> isn't impacted).

I don't know whether it will be a problem.  Everything like this always
comes back to "how well do we know how our users use Open vSwitch?" and
the answer is always "not very well" :-(

Since this is an important correctness issue, I think we should fix it
and, if performance problems arise in cases we're not sure about, we can
address them.
Re: [ovs-dev] [RFC] classifier: change segment 3 boundary to defer dependant processing
Hi Aaron!  This will move the following fields from the L4 segment to
the L3 segment:

    struct in6_addr nd_target;  /* IPv6 neighbor discovery (ND) target. */
    struct eth_addr arp_sha;    /* ARP/ND source hardware address. */
    struct eth_addr arp_tha;    /* ARP/ND target hardware address. */
    ovs_be16 tcp_flags;         /* TCP flags/ICMPv6 ND options type.
                                 * With L3 to avoid matching L4. */
    ovs_be16 pad2;              /* Pad to 64 bits. */
    struct ovs_key_nsh nsh;     /* Network Service Header keys */

The one that stands out to me is tcp_flags.  I remember that we had a
reason that we wanted to include this in the L3 segment.  I think it was
because we might want to match on flags without matching on anything
else in L4, but I don't remember the exact details.  There is a comment
to that effect above.

I suggest including the list of fields being moved in the commit message
(i.e. the code excerpt above) so that it's clearer what's changing.

Some more below:

On Wed, May 11, 2022 at 10:52:41AM -0400, Aaron Conole wrote:
> During the sweep stage, revalidator will compare the dumped WC with a
> generated WC - these will mismatch because the generated WC will
> include such dependent ND fields.  We will then invalidate the flow and
> as a side effect force an upcall.

I find "include" (or "exclude") a little ambiguous here.  I think that
saying that the generated or dumped WC "matches" or "does not match" on
a particular field would be clearer.  Some more like this below:

> By redefining the boundary, we shift these fields to the l4 subtable,
> and cause masks to be generated by including the requisite fields.  Two
> other approaches were considered - moving the nd_target field alone
> (which leaves arp_sha/nd_sll, arp_tha/nd_tll, and tcp_flags still as
> issues so this was avoided) and collapsing the l3/l4 segments into one
> (which would result in flow explosion in real scenarios, so it was also
> discarded as a solution).
>
> Fixes: 476f36e83bc5 ("Classifier: Staged subtable matching.")
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2081773
> Reported-by: Numan Siddique
> Suggested-by: Ilya Maximets
> Signed-off-by: Aaron Conole

Thanks for adding a test!
Re: [ovs-dev] [PATCH ovn] MAINTAINERS: Move myself to emeritus status.
On Tue, May 3, 2022 at 3:15 PM Han Zhou wrote:
> I believe this has been applied already last year.

Oh, I see, we also transitioned to have the main branch be "main" but I
was looking in the wrong place.  No need to do anything.  Thanks!
[ovs-dev] [PATCH ovn] MAINTAINERS: Move myself to emeritus status.
I am no longer actively working on OVN, so it's appropriate to move
myself to emeritus status.

Signed-off-by: Ben Pfaff
---
 MAINTAINERS.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
index 0d19bd622..fff2cd977 100644
--- a/MAINTAINERS.rst
+++ b/MAINTAINERS.rst
@@ -41,8 +41,6 @@ This is the current list of active OVN committers:

    * - Name
      - Email
-   * - Ben Pfaff
-     - b...@ovn.org
    * - Gurucharan Shetty
      - g...@ovn.org
    * - Han Zhou
@@ -67,3 +65,5 @@ More information about Emeritus Committers can be found

    * - Name
      - Email
+   * - Ben Pfaff
+     - b...@ovn.org
--
2.35.1
Re: [ovs-dev] OVN weekly meeting time change
On Thu, Mar 31, 2022 at 02:03:41PM -0400, Mark Michelson wrote:
> @Ben
> Can you please change the channel topic message of #openvswitch so that
> the new meeting time is reflected?  Thanks!

I've done this now.

Does anyone know how I would give someone else permission to do this
kind of thing?  I don't think I should be the only one who can do this.
Re: [ovs-dev] [PATCH] daemon-unix: Close log file in monitor process while waiting on child.
On Wed, Jan 26, 2022 at 06:39:52PM -0800, William Tu wrote:
> On Wed, Jan 26, 2022 at 8:56 AM Ben Pfaff wrote:
> >
> > Until now, the monitor process held its log file open while it waited
> > for the child to exit, and then it reopened it after the child
> > exited.  Holding the log file open this way prevented log rotation
> > from freeing disk space.  This commit avoids the problem by closing
> > the log file before waiting, then reopening it afterward.
> >
> > Signed-off-by: Ben Pfaff
> > Reported-by: Antonin Bas
> > VMware-BZ: #2743409
> > ---
>
> Hi Ben,
> Thanks, ASAN reports a memory leak, [...]
> I think we should "free(log_file)" after vlog_set_log_file(log_file)?

You are right.  Thank you for the report.  I posted v2:
https://mail.openvswitch.org/pipermail/ovs-dev/2022-January/391119.html
[ovs-dev] [PATCH v2] daemon-unix: Close log file in monitor process while waiting on child.
Until now, the monitor process held its log file open while it waited
for the child to exit, and then it reopened it after the child exited.
Holding the log file open this way prevented log rotation from freeing
disk space.  This commit avoids the problem by closing the log file
before waiting, then reopening it afterward.

Signed-off-by: Ben Pfaff
Reported-by: Antonin Bas
VMware-BZ: #2743409
---
v1->v2: Fix memory leak in monitor_daemon() reported by William.

 include/openvswitch/vlog.h |  2 +
 lib/daemon-unix.c          |  5 +-
 lib/vlog.c                 | 93 ++
 3 files changed, 71 insertions(+), 29 deletions(-)

diff --git a/include/openvswitch/vlog.h b/include/openvswitch/vlog.h
index 886fce5e0..e53ce6d81 100644
--- a/include/openvswitch/vlog.h
+++ b/include/openvswitch/vlog.h
@@ -131,7 +131,9 @@ void vlog_set_verbosity(const char *arg);

 /* Configuring log destinations. */
 void vlog_set_pattern(enum vlog_destination, const char *pattern);
+char *vlog_get_log_file(void);
 int vlog_set_log_file(const char *file_name);
+void vlog_close_log_file(void);
 int vlog_reopen_log_file(void);
 #ifndef _WIN32
 void vlog_change_owner_unix(uid_t, gid_t);
diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
index 34d45b82a..a02fd984b 100644
--- a/lib/daemon-unix.c
+++ b/lib/daemon-unix.c
@@ -360,12 +360,15 @@ monitor_daemon(pid_t daemon_pid)
                        (unsigned long int) daemon_pid, status_msg);

         if (child_ready) {
+            char *log_file = vlog_get_log_file();
+            vlog_close_log_file();
             int error;
             do {
                 retval = waitpid(daemon_pid, &status, 0);
                 error = retval == -1 ? errno : 0;
             } while (error == EINTR);
-            vlog_reopen_log_file();
+            vlog_set_log_file(log_file);
+            free(log_file);
             if (error) {
                 VLOG_FATAL("waitpid failed (%s)", ovs_strerror(error));
             }
diff --git a/lib/vlog.c b/lib/vlog.c
index 533f93755..0a615bb66 100644
--- a/lib/vlog.c
+++ b/lib/vlog.c
@@ -343,13 +343,25 @@ vlog_set_pattern(enum vlog_destination destination, const char *pattern)
     }
 }

-/* Sets the name of the log file used by VLF_FILE to 'file_name', or to the
- * default file name if 'file_name' is null.  Returns 0 if successful,
- * otherwise a positive errno value. */
-int
-vlog_set_log_file(const char *file_name)
+/* Returns a copy of the name of the log file used by VLF_FILE, or NULL if
+ * none is configured.  The caller must eventually free the returned
+ * string. */
+char *
+vlog_get_log_file(void)
+{
+    ovs_mutex_lock(&log_file_mutex);
+    char *fn = nullable_xstrdup(log_file_name);
+    ovs_mutex_unlock(&log_file_mutex);
+
+    return fn;
+}
+
+/* Sets the name of the log file used by VLF_FILE to 'new_log_file_name', or
+ * closes the current log file if 'new_log_file_name' is NULL.  Takes
+ * ownership of 'new_log_file_name'.  Returns 0 if successful, otherwise a
+ * positive errno value. */
+static int
+vlog_set_log_file__(char *new_log_file_name)
 {
-    char *new_log_file_name;
     struct vlog_module *mp;
     struct stat old_stat;
     struct stat new_stat;
@@ -358,25 +370,29 @@ vlog_set_log_file(const char *file_name)
     bool log_close;

     /* Open new log file. */
-    new_log_file_name = (file_name
-                         ? xstrdup(file_name)
-                         : xasprintf("%s/%s.log", ovs_logdir(), program_name));
-    new_log_fd = open(new_log_file_name, O_WRONLY | O_CREAT | O_APPEND, 0660);
-    if (new_log_fd < 0) {
-        VLOG_WARN("failed to open %s for logging: %s",
-                  new_log_file_name, ovs_strerror(errno));
-        free(new_log_file_name);
-        return errno;
+    if (new_log_file_name) {
+        new_log_fd = open(new_log_file_name, O_WRONLY | O_CREAT | O_APPEND,
+                          0660);
+        if (new_log_fd < 0) {
+            VLOG_WARN("failed to open %s for logging: %s",
+                      new_log_file_name, ovs_strerror(errno));
+            free(new_log_file_name);
+            return errno;
+        }
+    } else {
+        new_log_fd = -1;
+    }

     /* If the new log file is the same one we already have open, bail out. */
     ovs_mutex_lock(&log_file_mutex);
-    same_file = (log_fd >= 0
-                 && new_log_fd >= 0
-                 && !fstat(log_fd, &old_stat)
-                 && !fstat(new_log_fd, &new_stat)
-                 && old_stat.st_dev == new_stat.st_dev
-                 && old_stat.st_ino == new_stat.st_ino);
+    same_file = ((log_fd < 0
+                  && new_log_fd < 0) ||
+                 (log_fd >= 0
+                  && new_log_fd >= 0
+                  && !fstat(log_fd, &old_stat)
+                  && !fstat(new_log_fd, &new_stat)
+                  && old_stat.st_dev == n
[ovs-dev] [PATCH] daemon-unix: Close log file in monitor process while waiting on child.
Until now, the monitor process held its log file open while it waited
for the child to exit, and then it reopened it after the child exited.
Holding the log file open this way prevented log rotation from freeing
disk space.  This commit avoids the problem by closing the log file
before waiting, then reopening it afterward.

Signed-off-by: Ben Pfaff
Reported-by: Antonin Bas
VMware-BZ: #2743409
---
 include/openvswitch/vlog.h |  2 +
 lib/daemon-unix.c          |  4 +-
 lib/vlog.c                 | 93 ++
 3 files changed, 70 insertions(+), 29 deletions(-)

diff --git a/include/openvswitch/vlog.h b/include/openvswitch/vlog.h
index 886fce5e0..e53ce6d81 100644
--- a/include/openvswitch/vlog.h
+++ b/include/openvswitch/vlog.h
@@ -131,7 +131,9 @@ void vlog_set_verbosity(const char *arg);

 /* Configuring log destinations. */
 void vlog_set_pattern(enum vlog_destination, const char *pattern);
+char *vlog_get_log_file(void);
 int vlog_set_log_file(const char *file_name);
+void vlog_close_log_file(void);
 int vlog_reopen_log_file(void);
 #ifndef _WIN32
 void vlog_change_owner_unix(uid_t, gid_t);
diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
index 34d45b82a..10806bf81 100644
--- a/lib/daemon-unix.c
+++ b/lib/daemon-unix.c
@@ -360,12 +360,14 @@ monitor_daemon(pid_t daemon_pid)
                        (unsigned long int) daemon_pid, status_msg);

         if (child_ready) {
+            char *log_file = vlog_get_log_file();
+            vlog_close_log_file();
             int error;
             do {
                 retval = waitpid(daemon_pid, &status, 0);
                 error = retval == -1 ? errno : 0;
             } while (error == EINTR);
-            vlog_reopen_log_file();
+            vlog_set_log_file(log_file);
             if (error) {
                 VLOG_FATAL("waitpid failed (%s)", ovs_strerror(error));
             }
diff --git a/lib/vlog.c b/lib/vlog.c
index 533f93755..0a615bb66 100644
--- a/lib/vlog.c
+++ b/lib/vlog.c
@@ -343,13 +343,25 @@ vlog_set_pattern(enum vlog_destination destination, const char *pattern)
     }
 }

-/* Sets the name of the log file used by VLF_FILE to 'file_name', or to the
- * default file name if 'file_name' is null.  Returns 0 if successful,
- * otherwise a positive errno value. */
-int
-vlog_set_log_file(const char *file_name)
+/* Returns a copy of the name of the log file used by VLF_FILE, or NULL if
+ * none is configured.  The caller must eventually free the returned
+ * string. */
+char *
+vlog_get_log_file(void)
+{
+    ovs_mutex_lock(&log_file_mutex);
+    char *fn = nullable_xstrdup(log_file_name);
+    ovs_mutex_unlock(&log_file_mutex);
+
+    return fn;
+}
+
+/* Sets the name of the log file used by VLF_FILE to 'new_log_file_name', or
+ * closes the current log file if 'new_log_file_name' is NULL.  Takes
+ * ownership of 'new_log_file_name'.  Returns 0 if successful, otherwise a
+ * positive errno value. */
+static int
+vlog_set_log_file__(char *new_log_file_name)
 {
-    char *new_log_file_name;
     struct vlog_module *mp;
     struct stat old_stat;
     struct stat new_stat;
@@ -358,25 +370,29 @@ vlog_set_log_file(const char *file_name)
     bool log_close;

     /* Open new log file. */
-    new_log_file_name = (file_name
-                         ? xstrdup(file_name)
-                         : xasprintf("%s/%s.log", ovs_logdir(), program_name));
-    new_log_fd = open(new_log_file_name, O_WRONLY | O_CREAT | O_APPEND, 0660);
-    if (new_log_fd < 0) {
-        VLOG_WARN("failed to open %s for logging: %s",
-                  new_log_file_name, ovs_strerror(errno));
-        free(new_log_file_name);
-        return errno;
+    if (new_log_file_name) {
+        new_log_fd = open(new_log_file_name, O_WRONLY | O_CREAT | O_APPEND,
+                          0660);
+        if (new_log_fd < 0) {
+            VLOG_WARN("failed to open %s for logging: %s",
+                      new_log_file_name, ovs_strerror(errno));
+            free(new_log_file_name);
+            return errno;
+        }
+    } else {
+        new_log_fd = -1;
+    }

     /* If the new log file is the same one we already have open, bail out. */
     ovs_mutex_lock(&log_file_mutex);
-    same_file = (log_fd >= 0
-                 && new_log_fd >= 0
-                 && !fstat(log_fd, &old_stat)
-                 && !fstat(new_log_fd, &new_stat)
-                 && old_stat.st_dev == new_stat.st_dev
-                 && old_stat.st_ino == new_stat.st_ino);
+    same_file = ((log_fd < 0
+                  && new_log_fd < 0) ||
+                 (log_fd >= 0
+                  && new_log_fd >= 0
+                  && !fstat(log_fd, &old_stat)
+                  && !fstat(new_log_fd, &new_stat)
+                  && old_stat.st_dev == new_stat.st_dev
+                  && old_stat.st_ino == new_stat.st_ino));
     ovs_mutex_
Re: [ovs-dev] [nore...@github.com: [GitHub] Third-party application approval request for "Open Virtual Network"]
On Fri, Dec 24, 2021 at 10:32:23AM -0500, Numan Siddique wrote:
> On Thu, Dec 23, 2021 at 6:51 PM Ben Pfaff wrote:
> >
> > This came to me.  I don't know whether this is something we/I should
> > approve.
>
> This request came from Andrew (CCed) and I approved it.

Thanks for taking care of it!
[ovs-dev] [nore...@github.com: [GitHub] Third-party application approval request for "Open Virtual Network"]
This came to me.  I don't know whether this is something we/I should
approve.
Re: [ovs-dev] [EXTERNAL] Re: [PATCH] RFC: netdev-dpdk: Add Windows support.
The news is good, thanks!

On Thu, Nov 4, 2021 at 12:09 PM Omar Cardona wrote:
> Hi Ben,
> - Yes, definitely.  C11/C17 stdatomic.h is high on the list.
> - Our current critical path item for MSVC DPDK support is TLS.
>
> Hi Thomas,
> - I will setup some time next week to discuss alignment and share our
>   proposed MSVC plan.
>
> From: Ben Pfaff
> Sent: Thursday, November 4, 2021 11:15 AM
> To: Omar Cardona
> Subject: Re: [EXTERNAL] Re: [ovs-dev] [PATCH] RFC: netdev-dpdk: Add
> Windows support.
>
> On Thu, Nov 4, 2021 at 8:37 AM Omar Cardona wrote:
> > > f. With clang support on Windows, can we remove MSVC?
> >
> > Please no.  We are actively working with the MSVC team to extend
> > support for DPDK's dependencies
>
> Could you get them to implement C11 atomics?  The MSVC support for
> atomic operations is poor, unless it's changed since I last looked.  We
> could remove quite a bit of rather awful stuff if either MSVC supported
> the C standard or if OVS dropped support for MSVC.
Re: [ovs-dev] [EXTERNAL] Re: [PATCH] RFC: netdev-dpdk: Add Windows support.
On Thu, Nov 4, 2021 at 8:37 AM Omar Cardona wrote:
> > f. With clang support on Windows, can we remove MSVC?
>
> Please no.  We are actively working with the MSVC team to extend
> support for DPDK's dependencies

Could you get them to implement C11 atomics?  The MSVC support for
atomic operations is poor, unless it's changed since I last looked.  We
could remove quite a bit of rather awful stuff if either MSVC supported
the C standard or if OVS dropped support for MSVC.
Re: [ovs-dev] northd-ddlog slowness when adding the first regular LSP to a LS full of router ports
On Thu, Sep 30, 2021 at 12:23:57AM -0700, Han Zhou wrote:
> Hi Ben,
>
> I understand that we have difficulties for northd-ddlog progress, but I
> still want to try it before it goes away.  I tested with the latest
> version, and it is super fast for most of the operations.  With a large
> NB & SB created, the C northd takes ~8 seconds for any change
> computation.  For the same DB, northd usually takes less than 1 sec for
> most operations.
>
> However, I did find an interesting problem.  I tried to create one more
> LSP on a LS that already has 800 gateway-routers connected to it, which
> means there are already 800 LSPs on the LS of the type "router".
> Creating the extra LSP (without type) took 12 sec, which is even longer
> than a full compute of the C version.  What's more interesting is, when
> I create another LSP on the same LS, it takes only 100+ms, and same for
> the 3rd, 4th LSPs, etc.  When I remove any one of the extra LSPs I
> created, it is also fast, just 100+ms.  But when I remove the last LSP
> that I just created it takes 12 sec again.  Then I tried creating a LSP
> with type=router on the same LS, it is very fast, less than 100ms.
> Basically, only creating the first non-router LSP or removing the last
> non-router LSP takes a very long time.
>
> I haven't debugged yet (and not sure if I am capable of), but I think
> it might be useful to report it first.

This kind of thing usually happens because there is a lot of
intermediate data being built up in internal relations that changes when
the input changes.  My usual strategy for figuring it out is to
configure with --output-internal-relations (that might not be the
exactly correct option; the documentation should say) and then replay
using the CLI to see what's changing before and after the changes in
question.  It's usually big.
Re: [ovs-dev] [PATCH] RFC: netdev-dpdk: Add Windows support.
On Wed, Oct 6, 2021 at 10:01 AM William Tu wrote:
> The patch adds OVS-DPDK supports on Windows.

Fantastic!  I'm happy to see this work in progress.  OVS(-DPDK)
everywhere!
Re: [ovs-dev] [PATCH ovn] MAINTAINERS: Transition myself to emeritus status.
On Tue, Oct 05, 2021 at 08:06:45PM -0700, William Tu wrote:
> On Tue, Oct 5, 2021 at 8:05 PM William Tu wrote:
> >
> > On Tue, Oct 5, 2021 at 11:33 AM Numan Siddique wrote:
> > >
> > > On Tue, Oct 5, 2021 at 1:46 PM Han Zhou wrote:
> > > >
> > > > On Tue, Oct 5, 2021 at 9:39 AM Ben Pfaff wrote:
> > > > >
> > > > > I have not been actively maintaining OVN for some time now.  I
> > > > > don't expect that to change.  I think that the responsible
> > > > > thing to do is to acknowledge this properly by transitioning to
> > > > > emeritus status.
> > > > >
> > > > > Signed-off-by: Ben Pfaff
> > > > > ---
> > > > >  MAINTAINERS.rst | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
> > > > > index 0d19bd622..fff2cd977 100644
> > > > > --- a/MAINTAINERS.rst
> > > > > +++ b/MAINTAINERS.rst
> > > > > @@ -41,8 +41,6 @@ This is the current list of active OVN committers:
> > > > >
> > > > >     * - Name
> > > > >       - Email
> > > > > -   * - Ben Pfaff
> > > > > -     - b...@ovn.org
> > > > >     * - Gurucharan Shetty
> > > > >       - g...@ovn.org
> > > > >     * - Han Zhou
> > > > > @@ -67,3 +65,5 @@ More information about Emeritus Committers can be found
> > > > >
> > > > >     * - Name
> > > > >       - Email
> > > > > +   * - Ben Pfaff
> > > > > +     - b...@ovn.org
> > > > > --
> > > > > 2.31.1
> > > >
> > > > Hi Ben,
> > > >
> > > > The patch LGTM, although I really don't like the solution :(
> > > > Thank you so much for building the amazing stuff from scratch!
> > >
> > > Agree, Thank you for all your amazing work!
> > >
> > > Thanks
> > > Numan
> > >
> > > > Acked-by: Han Zhou
> >
> > Thanks Ben!
> > I applied to master.
> > William
>
> and feel free to revert the commit if you change your mind :)

Thanks :-)
Re: [ovs-dev] [PATCH] MAINTAINERS: Transition myself to emeritus status.
On Wed, Oct 06, 2021 at 11:20:48AM +0200, Simon Horman wrote:
> On Tue, Oct 05, 2021 at 09:32:41AM -0700, Ben Pfaff wrote:
> > It has been a long time since I've been able to devote significant
> > time to OVS work.  I don't expect that to change.  I think that the
> > responsible thing to do is to transition myself to emeritus status,
> > with this patch.
> >
> > Signed-off-by: Ben Pfaff
>
> Thanks for all your hard work Ben.

You're welcome!
[ovs-dev] [PATCH ovn] MAINTAINERS: Transition myself to emeritus status.
I have not been actively maintaining OVN for some time now.  I don't
expect that to change.  I think that the responsible thing to do is to
acknowledge this properly by transitioning to emeritus status.

Signed-off-by: Ben Pfaff
---
 MAINTAINERS.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
index 0d19bd622..fff2cd977 100644
--- a/MAINTAINERS.rst
+++ b/MAINTAINERS.rst
@@ -41,8 +41,6 @@ This is the current list of active OVN committers:

    * - Name
      - Email
-   * - Ben Pfaff
-     - b...@ovn.org
    * - Gurucharan Shetty
      - g...@ovn.org
    * - Han Zhou
@@ -67,3 +65,5 @@ More information about Emeritus Committers can be found

    * - Name
      - Email
+   * - Ben Pfaff
+     - b...@ovn.org
--
2.31.1
[ovs-dev] [PATCH] MAINTAINERS: Transition myself to emeritus status.
It has been a long time since I've been able to devote significant time
to OVS work.  I don't expect that to change.  I think that the
responsible thing to do is to transition myself to emeritus status, with
this patch.

Signed-off-by: Ben Pfaff
---
 MAINTAINERS.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS.rst b/MAINTAINERS.rst
index 0c035d940..f4f6188c8 100644
--- a/MAINTAINERS.rst
+++ b/MAINTAINERS.rst
@@ -49,8 +49,6 @@ This is the current list of active Open vSwitch committers:
      - az...@ovn.org
    * - Ansis Atteka
      - aatt...@nicira.com
-   * - Ben Pfaff
-     - b...@ovn.org
    * - Daniele Di Proietto
      - daniele.di.proie...@gmail.com
    * - Gurucharan Shetty
@@ -89,5 +87,7 @@ More information about Emeritus Committers can be found

    * - Name
      - Email
+   * - Ben Pfaff
+     - b...@ovn.org
    * - Ethan J. Jackson
      - e...@eecs.berkeley.edu
--
2.31.1
Re: [ovs-dev] Issue with one of your old patches, commit c645550b
This code in odp-util is basically unmaintainable. I've never been able to figure out how to properly test or reason about it. It's a huge reason that I want to get rid of the kernel module. On Mon, Oct 04, 2021 at 11:49:41AM +0200, Eelco Chaudron wrote: > Hi Ben, > > Any chance you can take a peek at the email below? I hope you can remember > this from 2018 :) > > Thanks, > > Eelco > > On 15 Sep 2021, at 15:30, Eelco Chaudron wrote: > > > Hi Ben, > > > > The mentioned fix below is causing a regression, actually two related ones: > > > > 1) If IGMP snooping is disabled, IGMP related traffic is still sent slow > > path for processing with the NORMAL rule. In addition, even without the > > NORMAL rule, the DP rule gets modified. > > > > Here are some examples: > > > > a) default config, one bridge (kernel dp), two ports (i.e. only a single > > NORMAL rule, and mcast_snooping_enable: false), send an IGMP join: > > > > - With your fix the dp rule looks like this: > > > > > > recirc_id(0),in_port(2),eth(src=04:f4:bc:28:57:01,dst=01:00:5e:00:00:16),eth_type(0x0800),ipv4(frag=no), > > packets:8, bytes:576, used:0.169s, > > actions:userspace(pid=4220974525,slow_path(match)) > > > > - Without the fix, which is correct: > > > > > > recirc_id(0),in_port(2),eth(src=04:f4:bc:28:57:01,dst=01:00:5e:00:00:16),eth_type(0x0800),ipv4(frag=no), > > packets:8, bytes:576, used:0.004s, actions:1 > > > > > > b) with a more specific OpenFlow entry, no DP entry gets created: > > > > ovs-ofctl del-flows ovs_pvp_br0 && ovs-ofctl add-flow ovs_pvp_br0 > > "in_port=enp5s0f0,eth_type(0x800),ip_proto=2,action=drop" > > ovs-ofctl del-flows ovs_pvp_br0 && ovs-ofctl add-flow ovs_pvp_br0 > > "in_port=enp5s0f0,eth_type(0x800),ip_proto=2,action=vnet6" > > > > > > 2) With the netdev (DPDK) interface all seems to work as expected, i.e., > > the flow gets installed correctly without the userspace upcall in both > > cases above. 
However, the verifier generates an error as it would like to > > update the dp rule with a slow_path variant. Which is failing as netdev is > > using the odp_flow_key_to_flow() utility function, which has the IGMP check. > > > > Looking at the change it assumes you filter on igmp type/code when you > > filter on the protocol type igmp. But this is not true, you can just filter > > on the protocol type only, and this is what is actually causing the > > problems above. So I think your patch needs to be reversed! > > > > However, looking at fixing Huanle's original problem, > > https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343665.html, > > looking at his patch, this might be the right solution. All IGMP packets > > independent of their type need to be processed by a SLOW_ACTION upcall. > > > > > > I've copied in Flavio, who worked on the IGMP snooping part before, as he > > might still remember what I argue above is true :) > > > > > > If you have any other ideas, thoughts please let me know. > > > > > > Cheers, > > > > Eelco > > > > > > > > commit c645550bb2498fb3816b6a39b22bffeb3154dca3 > > Author: Ben Pfaff > > Date: Wed Jan 24 11:40:20 2018 -0800 > > > > odp-util: Always report ODP_FIT_TOO_LITTLE for IGMP. > > > > OVS datapaths don't understand or parse IGMP fields, but OVS userspace > > does, so this commit updates odp_flow_key_to_flow() to report that > > properly > > to the caller. 
> > > > Reported-by: Huanle Han > > Reported-at: > > https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343665.html > > Signed-off-by: Ben Pfaff > > > > diff --git a/lib/odp-util.c b/lib/odp-util.c > > index 14d5af097..5da83b4c6 100644 > > --- a/lib/odp-util.c > > +++ b/lib/odp-util.c > > @@ -6210,6 +6210,11 @@ parse_l2_5_onward(const struct nlattr > > *attrs[OVS_KEY_ATTR_MAX + 1], > > } > > } > > } > > +} else if (src_flow->nw_proto == IPPROTO_IGMP > > + && src_flow->dl_type == htons(ETH_TYPE_IP)) { > > +/* OVS userspace parses the IGMP type, code, and group, but its > > + * datapaths do not, so there is always missing information. */ > > +return ODP_FIT_TOO_LITTLE; > > } > > if (is_mask && expected_bit != OVS_KEY_ATTR_UNSPEC) { > > if ((flow->tp_src || flow->tp_dst) && flow->nw_proto != 0xff) { > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
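The fit-reporting behavior under discussion can be modeled in miniature. Below is a purely illustrative Python sketch (the real logic is C in lib/odp-util.c, and the function names here are made up): commit c645550 makes every IGMP flow report ODP_FIT_TOO_LITTLE, which forces slow-path handling even when the OpenFlow pipeline never matches on IGMP type/code — the regression Eelco reports. A hypothetical alternative would report a poor fit only when the finer-grained IGMP fields are actually matched.

```python
# Illustrative model of the fit logic discussed above (not OVS code).
# ODP_FIT_PERFECT: the datapath key carries everything userspace needs;
# ODP_FIT_TOO_LITTLE: information is missing, so the flow is slow-pathed.
ODP_FIT_PERFECT = "perfect"
ODP_FIT_TOO_LITTLE = "too_little"

IPPROTO_IGMP = 2

def fit_always_slow(flow):
    """Behavior of commit c645550: every IGMP flow fits 'too little'."""
    if flow.get("nw_proto") == IPPROTO_IGMP:
        return ODP_FIT_TOO_LITTLE
    return ODP_FIT_PERFECT

def fit_only_when_needed(flow, matches_igmp_fields):
    """Hypothetical alternative: report a poor fit only when the
    OpenFlow pipeline actually matches on IGMP type/code/group."""
    if flow.get("nw_proto") == IPPROTO_IGMP and matches_igmp_fields:
        return ODP_FIT_TOO_LITTLE
    return ODP_FIT_PERFECT

igmp_join = {"nw_proto": IPPROTO_IGMP}
# With the committed behavior, even a plain NORMAL pipeline slow-paths IGMP:
assert fit_always_slow(igmp_join) == ODP_FIT_TOO_LITTLE
# The alternative keeps the fast path when IGMP fields are not matched:
assert fit_only_when_needed(igmp_join, matches_igmp_fields=False) == ODP_FIT_PERFECT
```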
Re: [ovs-dev] [PATCH ovn v2 00/10] 3x performance improvement for ddlog with load balancer benchmark
Thanks. I rebased and applied this series to master. On Mon, Sep 13, 2021 at 02:13:28PM -0400, Mark Michelson wrote: > Hi Ben, for the series: > > Acked-by: Mark Michelson > > I had a question on patch 1, but it's minor and otherwise shouldn't block an > ACK. > > On 9/7/21 6:45 PM, Ben Pfaff wrote: > > With and without these patches, I see the following performance when I > > run the load-balancer heavy benchmark. The measurements include cold > > start with all the load balancers, then deleting the sctp load balancer > > and waiting for it to finish, then the same with the tcp load balancer, > > then the same with the udp load balancer. The measurements only include > > ddlog time; there is additional overhead talking to the database, but > > that's a constant factor. > > > > Without patches: 783 CPU seconds, 23.5 GB RAM, 13:06 elapsed time. > > With patches: 224 CPU seconds, 14.3 GB RAM, 3:45 elapsed time. > > > > I am continuing to work on performance. > > > > These patches are also available in: > > https://github.com/blp/ovs-reviews/tree/ovn-memory-8 > > > > v1->v2: > >- Add two new patches. > >- Rebase and rerun benchmarks (revisions reported above). > > > > Ben Pfaff (9): > >ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient. > >ovn-northd-ddlog: Derive load balancer IP addresses in new > > LoadBalancer. > >ovn-northd-ddlog: Reverse order of joins for connection tracking > > flows. > >ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports. > >ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends > > string. > >ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router. > >ovn-northd-ddlog: Intern strings before joining when possible. > >ovn-northd-ddlog: Avoid map(ival) for ARP flows. > >ovn-northd-ddlog: Avoid unnecessary joins for SwitchPortARPForwards. > > > > Leonid Ryzhyk (1): > >ovn-northd-ddlog: Intern all strings in OVSDB tables. 
> > > > configure.ac |2 +- > > manpages.mk |1 - > > northd/copp.dl | 32 +- > > northd/helpers.dl| 14 +- > > northd/ipam.dl | 17 +- > > northd/lrouter.dl| 213 +++-- > > northd/lswitch.dl| 177 ++-- > > northd/multicast.dl | 44 +- > > northd/ovn-nb.dlopts |1 + > > northd/ovn-sb.dlopts |1 + > > northd/ovn.dl|7 + > > northd/ovn_northd.dl | 2021 -- > > northd/ovsdb2ddlog2c |6 +- > > tests/ovn-ic.at |8 +- > > tests/ovn-northd.at | 20 +- > > tests/ovn.at |6 +- > > 16 files changed, 1280 insertions(+), 1290 deletions(-) > > > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 10/10] ovn-northd-ddlog: Avoid unnecessary joins for SwitchPortARPForwards.
SwitchPortARPForwards already has all the necessary data in it and joining with SwitchPort just wastes time. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 573c2b392..201c08a24 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -4612,8 +4612,7 @@ Flow(.logical_datapath = sw._uuid, .io_port = None, .controller_meter = None) :- var mc_flood_l2 = json_escape(mC_FLOOD_L2().0), -sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), -(.port = sp, .reachable_ips_v4 = ips_v4), +(.port = sp@{.sw = sw}, .reachable_ips_v4 = ips_v4), var ipv4 = FlatMap(ips_v4). Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), @@ -4629,8 +4628,7 @@ Flow(.logical_datapath = sw._uuid, .io_port = None, .controller_meter = None) :- var mc_flood_l2 = json_escape(mC_FLOOD_L2().0), -sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), -(.port = sp, .reachable_ips_v6 = ips_v6), +(.port = sp@{.sw = sw}, .reachable_ips_v6 = ips_v6), var ipv6 = FlatMap(ips_v6). Flow(.logical_datapath = sw._uuid, @@ -4642,8 +4640,7 @@ Flow(.logical_datapath = sw._uuid, .io_port = None, .controller_meter = None) :- var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;", -sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), -(.port = sp, .unreachable_ips_v4 = ips_v4), +(.port = sp@{.sw = sw}, .unreachable_ips_v4 = ips_v4), var ipv4 = FlatMap(ips_v4). Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), @@ -4654,8 +4651,7 @@ Flow(.logical_datapath = sw._uuid, .io_port = None, .controller_meter = None) :- var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;", -sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), -(.port = sp, .unreachable_ips_v6 = ips_v6), +(.port = sp@{.sw = sw}, .unreachable_ips_v6 = ips_v6), var ipv6 = FlatMap(ips_v6). 
for (SwitchPortNewDynamicAddress(.port = {.lsp = lsp, .json_name = json_name, .sw = sw}, -- 2.31.1
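The idea in this commit — drop a join when the relation already embeds the data — can be sketched in Python (hypothetical rows, not DDlog semantics): each SwitchPortARPForwards row embeds the port, which embeds the switch, so the extra join against SwitchPort only re-derives what is already there.

```python
# Sketch of the join elimination in this patch (illustrative data only).
arp_forwards = [
    {"port": {"name": "sp1", "sw": "sw0"}, "reachable_ips_v4": ["10.0.0.1"]},
    {"port": {"name": "sp2", "sw": "sw1"}, "reachable_ips_v4": ["10.0.1.1"]},
]
switch_ports = [{"name": "sp1", "sw": "sw0"}, {"name": "sp2", "sw": "sw1"}]

def flows_with_join(arp_forwards, switch_ports):
    # Old shape: join on port identity, then read the switch from the
    # SwitchPort side -- an extra scan for data we already have.
    return [(sp["sw"], ip)
            for row in arp_forwards
            for sp in switch_ports if sp["name"] == row["port"]["name"]
            for ip in row["reachable_ips_v4"]]

def flows_without_join(arp_forwards):
    # New shape: read the switch straight out of the embedded port.
    return [(row["port"]["sw"], ip)
            for row in arp_forwards
            for ip in row["reachable_ips_v4"]]

# Same output, one fewer relation scanned per rule:
assert flows_with_join(arp_forwards, switch_ports) == flows_without_join(arp_forwards)
```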
[ovs-dev] [PATCH ovn v2 08/10] ovn-northd-ddlog: Intern strings before joining when possible.
By interning earlier, we do less copying across the joins with Router and Switch, because DDlog always copies by value and copying an interned string is cheap. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 57 +++- 1 file changed, 30 insertions(+), 27 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 5af41fa22..1bf1e5333 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -3216,14 +3216,14 @@ function build_lb_vip_actions(lbvip: Intern, Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_STATEFUL(), .priority = priority, - .__match = __match.intern(), - .actions = actions.intern(), + .__match = __match, + .actions = actions, .io_port = None, .controller_meter = meter, .stage_hint = 0) :- LBVIPWithStatus(lbvip@{.lb = lb}, up_backends), var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, -(var actions, var reject) = { +(var actions0, var reject) = { /* Store the original destination IP to be used when generating * hairpin flows. */ @@ -3243,7 +3243,8 @@ Flow(.logical_datapath = sw._uuid, build_lb_vip_actions(lbvip, up_backends, s_SWITCH_OUT_QOS_MARK(), actions0 ++ actions1) }, -var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false, false, false), +var actions = actions0.intern(), +var __match = ("ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false, false, false)).intern(), sw in (), sw.load_balancer.contains(lb._uuid), var meter = if (reject) { @@ -4610,10 +4611,10 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- +var mc_flood_l2 = json_escape(mC_FLOOD_L2().0), sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .reachable_ips_v4 = ips_v4), -var ipv4 = FlatMap(ips_v4), -var mc_flood_l2 = json_escape(mC_FLOOD_L2().0). +var ipv4 = FlatMap(ips_v4). 
Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), .priority = 80, @@ -4627,35 +4628,35 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- +var mc_flood_l2 = json_escape(mC_FLOOD_L2().0), sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .reachable_ips_v6 = ips_v6), -var ipv6 = FlatMap(ips_v6), -var mc_flood_l2 = json_escape(mC_FLOOD_L2().0). +var ipv6 = FlatMap(ips_v6). Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), .priority = 90, .__match = i"${fLAGBIT_NOT_VXLAN()} && arp.op == 1 && arp.tpa == ${ipv4}", - .actions = i"outport = ${flood}; output;", + .actions = actions, .stage_hint = 0, .io_port = None, .controller_meter = None) :- +var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;", sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .unreachable_ips_v4 = ips_v4), -var ipv4 = FlatMap(ips_v4), -var flood = json_escape(mC_FLOOD().0). +var ipv4 = FlatMap(ips_v4). Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), .priority = 90, .__match = i"${fLAGBIT_NOT_VXLAN()} && nd_ns && nd.target == ${ipv6}", - .actions = i"outport = ${flood}; output;", + .actions = actions, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- +var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;", sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .unreachable_ips_v6 = ips_v6), -var ipv6 = FlatMap(ips_v6), -var flood = json_escape(mC_FLOOD().0). +var ipv6 = FlatMap(ips_v6). 
for (SwitchPortNewDynamicAddress(.port = {.lsp = lsp, .json_name = json_name, .sw = sw}, .address = Some{addrs}) @@ -6701,7 +6702,7 @@ Flow(.logical_datapath = r._uuid, Flow(.logical_datapath = r._uuid, .stage= s_ROUTER_IN_DNAT(), .priority = prio, - .__match = __match.intern(), + .__match = __match, .actions = actions, .stage_hint = 0, .io_port = None, @@ -6718,13 +6719,13 @@ Flow(.logical_datapath = r._uuid, "" }, var prio = if (port != 0) { 120 } else { 110 }, -var match0 = "ct.est && &qu
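The payoff claimed in the commit message — copying an interned string is cheap — has a close analogue in Python's `sys.intern`, which is only an analogy for DDlog's `Intern` (sketch below, not OVN code): interning deduplicates the string, so "copying" it across a fan-out join is just another reference, and comparisons become identity checks.

```python
import sys

# Interned strings are deduplicated: equal contents map to one object,
# so copies are references and equality can be checked by identity.
a = sys.intern("ct.new && ip4.dst == 10.0.0.1")
b = sys.intern("ct.new && " + "ip4.dst == 10.0.0.1")
assert a is b  # one shared object, O(1) comparison

# Interning *before* the fan-out mirrors the patch: build the string
# once, then hand the same cheap reference to every joined row.
match = sys.intern("ct.new && ip4.dst == 10.0.0.1")
rows = [{"sw": n, "match": match} for n in range(1000)]
assert all(r["match"] is match for r in rows)
```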
[ovs-dev] [PATCH ovn v2 09/10] ovn-northd-ddlog: Avoid map(ival) for ARP flows.
It's expensive for a long list. All it buys us is sorting the list in alphabetical order (rather than in order of Intern hash value), which isn't that valuable. This also updates a test that depended on the sort order. Suggested-by: Leonid Ryzhyk Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 2 +- tests/ovn-northd.at | 20 ++-- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 1bf1e5333..573c2b392 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -5679,7 +5679,7 @@ var residence_check = match (is_redirect) { not all_ipv4s.is_empty() in LogicalRouterArpFlow(.lr = router, .lrp = Some{lrp}, - .ip = i"{ ${all_ipv4s.map(ival).to_vec().join(\", \")} }", + .ip = i"{ ${all_ipv4s.to_vec().join(\", \")} }", .mac = rEG_INPORT_ETH_ADDR(), .extra_match = residence_check, .drop = false, diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 11886b94e..c2db8f53a 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -1685,8 +1685,16 @@ match=(eth.mcast && inport == "lrp-public"), dnl action=(xreg0[[0..47]] = 00:00:00:00:01:00; next;) ]) +# The order of the VIP addresses in the flow table entries doesn't +# matter, so just replace each of them with a generic $vip for +# testing. It would be better if we could ensure each one appeared +# exactly once, but that's hard with sed. +sed_vips() { +sed 's/192\.168\.2\.[[1456]]/$vip/g' +} + # Ingress router port ETH address is used for ARP reply/NA in lr_in_ip_input. 
-AT_CHECK([ovn-sbctl lflow-list | grep -E "lr_in_ip_input.*priority=90" | grep "arp\|nd" | sort], [0], [dnl +AT_CHECK([ovn-sbctl lflow-list | grep -E "lr_in_ip_input.*priority=90" | grep "arp\|nd" | sed_vips | sort], [0], [dnl table=3 (lr_in_ip_input ), priority=90 , dnl match=(arp.op == 1 && arp.tpa == 43.43.43.150), dnl action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = inport; flags.loopback = 1; output;) @@ -1703,7 +1711,7 @@ action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply */ match=(inport == "lrp" && arp.op == 1 && arp.tpa == 42.42.42.1 && arp.spa == 42.42.42.0/24), dnl action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = inport; flags.loopback = 1; output;) table=3 (lr_in_ip_input ), priority=90 , dnl -match=(inport == "lrp" && arp.op == 1 && arp.tpa == { 192.168.2.1, 192.168.2.4, 192.168.2.5, 192.168.2.6 }), dnl +match=(inport == "lrp" && arp.op == 1 && arp.tpa == { $vip, $vip, $vip, $vip }), dnl action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = inport; flags.loopback = 1; output;) table=3 (lr_in_ip_input ), priority=90 , dnl match=(inport == "lrp" && ip6.dst == {fe80::200:ff:fe00:1, ff02::1:ff00:1} && nd_ns && nd.target == fe80::200:ff:fe00:1), dnl @@ -1718,7 +1726,7 @@ action=(nd_na { eth.src = xreg0[[0..47]]; ip6.src = fe80::200:ff:fe00:102:8080; match=(inport == "lrp-public" && arp.op == 1 && arp.tpa == 43.43.43.1 && arp.spa == 43.43.43.0/24), dnl action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = inport; flags.loopback = 1; output;) table=3 (lr_in_ip_input ), priority=90 , dnl -match=(inport == "lrp-public" 
&& arp.op == 1 && arp.tpa == { 192.168.2.1, 192.168.2.4, 192.168.2.5, 192.168.2.6 }), dnl +match=(inport == "lrp-public" && arp.op == 1 && arp.tpa == { $vip, $vip, $vip, $vip }), dnl action=(eth.dst = eth.src; eth.src = xreg0[[0..47]]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[[0..47]]; arp.tpa <-> arp.spa; outport = inport; flags.loopback = 1; output;) table=3 (lr_in_ip_input ), priority=90 , dnl match=(inport == "lrp-public" && ip6.dst == {fe80::200:ff:fe00:100, ff02::1:ff00:100} && nd_ns && nd.target == fe80::200:ff:fe00:100), dnl @@ -1761,7 +1769,7 @@ action=(xreg0[[0..47]] = 00:00:00:00:01:00; next;) # Ingress router port is used for ARP reply/NA in lr_in_ip_input. # xxreg0[0..47] is used unless external_mac is set. # Priority 90 flows (per router). -AT_CHECK([ovn-sbctl lflow-list | grep
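The `sed_vips` helper added by this patch is a common trick: when output order becomes nondeterministic (here, hash order instead of alphabetical), the test masks each order-dependent element rather than pinning an order. A Python equivalent with `re.sub` (the doubled brackets in the test file are Autotest/M4 quoting, not part of the regex):

```python
import re

# Equivalent of the test's sed_vips helper: mask the VIP addresses so
# the expected output no longer depends on their (now unsorted) order.
def sed_vips(text):
    return re.sub(r"192\.168\.2\.[1456]", "$vip", text)

line = "arp.tpa == { 192.168.2.4, 192.168.2.1, 192.168.2.6, 192.168.2.5 }"
assert sed_vips(line) == "arp.tpa == { $vip, $vip, $vip, $vip }"
```

As the commit message notes, this is weaker than checking that each VIP appears exactly once, but it is easy to express with sed.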
[ovs-dev] [PATCH ovn v2 06/10] ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends string.
There was only one use for the 'backends' map, which was to be converted to a string that listed all the backends that were up, so we might as well do that at its point of origination. At the same time, everything else in LBVIPWithStatus was just a copy of the underlying LBVIP, so we might as well just reference it. Only slight improvement to performance. Signed-off-by: Ben Pfaff --- northd/lswitch.dl| 32 ++-- northd/ovn_northd.dl | 29 + 2 files changed, 23 insertions(+), 38 deletions(-) diff --git a/northd/lswitch.dl b/northd/lswitch.dl index ad6475a91..69b1c6eb5 100644 --- a/northd/lswitch.dl +++ b/northd/lswitch.dl @@ -445,25 +445,21 @@ function default_protocol(protocol: Option): istring = { } } -typedef LBVIPWithStatus = LBVIPWithStatus { -lb: Intern, -vip_key: istring, -backend_ips: istring, -health_check: Option>, -vip_addr: v46_ip, -vip_port: bit<16>, -backends: Map -} -relation LBVIPWithStatus[Intern] - -LBVIPWithStatus[LBVIPWithStatus{lb, vip_key, backend_ips, health_check, vip_addr, vip_port, map_empty()}.intern()] :- -(lb, vip_key, backend_ips, health_check, vip_addr, vip_port, vec_empty()). -LBVIPWithStatus[LBVIPWithStatus{lb, vip_key, backend_ips, health_check, vip_addr, vip_port, backends_with_status}.intern()] :- -(lb, vip_key, backend_ips, health_check, vip_addr, vip_port, backends), -var backend = FlatMap(backends), +relation LBVIPWithStatus( +lbvip: Intern, +up_backends: istring) +LBVIPWithStatus(lbvip, i"") :- +lbvip in (.backends = vec_empty()). +LBVIPWithStatus(lbvip, up_backends) :- LBVIPBackendStatus(lbvip, backend, up), -var backends_with_status = ((backend, up)).group_by((lb, vip_key, backend_ips, health_check, - vip_addr, vip_port)).to_map(). +var up_backends = ((backend, up)).group_by(lbvip).to_vec().filter_map(|x| { +(LBVIPBackend{var ip, var port, _}, var up) = x; +match ((up, port)) { +(true, 0) -> Some{"${ip.to_bracketed_string()}"}, +(true, _) -> Some{"${ip.to_bracketed_string()}:${port}"}, +_ -> None +} +}).join(",").intern(). 
/* Maps from a load-balancer virtual IP backend to whether it's up or not. * diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index bf2192f7e..5af41fa22 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -3177,7 +3177,7 @@ function get_match_for_lb_key(ip_address: v46_ip, } /* New connections in Ingress table. */ -function ct_lb(backends: string, +function ct_lb(backends: istring, selection_fields: Set, protocol: Option): string { var args = vec_with_capacity(2); args.push("backends=${backends}"); @@ -3198,18 +3198,11 @@ function ct_lb(backends: string, "ct_lb(" ++ args.join("; ") ++ ");" } -function build_lb_vip_actions(lbvip: Intern, +function build_lb_vip_actions(lbvip: Intern, + up_backends: istring, stage: Intern, actions0: string): (string, bool) { -var up_backends = vec_with_capacity(lbvip.backends.size()); -for (pair in lbvip.backends) { -(var backend, var up) = pair; -if (up) { -up_backends.push((backend.ip, backend.port)) -} -}; - -if (up_backends.is_empty()) { +if (up_backends == i"") { if (lbvip.lb.options.get_bool_def(i"reject", false)) { return ("reg0 = 0; reject { outport <-> inport; ${next_to_stage(stage)};};", true) } else if (lbvip.health_check.is_some()) { @@ -3217,11 +3210,7 @@ function build_lb_vip_actions(lbvip: Intern, } // else fall through }; -var up_backends_s = up_backends.sort_imm().map(|x| match (x) { -(ip, 0) -> "${ip.to_bracketed_string()}", -(ip, port) -> "${ip.to_bracketed_string()}:${port}" -}).join(","); -var actions = ct_lb(up_backends_s, lbvip.lb.selection_fields, lbvip.lb.protocol); +var actions = ct_lb(up_backends, lbvip.lb.selection_fields, lbvip.lb.protocol); (actions0 ++ actions, false) } Flow(.logical_datapath = sw._uuid, @@ -3232,7 +3221,7 @@ Flow(.logical_datapath = sw._uuid, .io_port = None, .controller_meter = meter, .stage_hint = 0) :- -LBVIPWithStatus[lbvip@{.lb = lb}], +LBVIPWithStatus(lbvip@{.lb = lb}, up_backends), var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, (var actions, var 
reject) = { /* Store the original destination IP to be used when generating @@ -3252,7 +3241,7 @@ Flow(.logical_datapath = sw._uuid, "" }; -build_lb_vip_actions(lbvip,
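The "build the string at its point of origination" idea from this commit can be sketched in Python (hypothetical backend tuples mirroring the DDlog `filter_map`/`join`): backends that are down are dropped, a `:port` suffix is added only for nonzero ports, and the result is a single comma-joined string stored in the relation.

```python
# Sketch of deriving up_backends where the data originates
# (illustrative; the real code also brackets IPv6 addresses).
def up_backends_string(backends):
    """backends: iterable of (ip, port, up) tuples."""
    parts = []
    for ip, port, up in backends:
        if not up:
            continue  # only backends that are up are advertised
        parts.append(ip if port == 0 else f"{ip}:{port}")
    return ",".join(parts)

backends = [("10.0.0.2", 80, True), ("10.0.0.3", 80, False), ("10.0.0.4", 0, True)]
assert up_backends_string(backends) == "10.0.0.2:80,10.0.0.4"
# An empty string then naturally encodes "no backends up", as in the
# patch's LBVIPWithStatus(lbvip, i"") rule.
assert up_backends_string([]) == ""
```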
[ovs-dev] [PATCH ovn v2 07/10] ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router.
Signed-off-by: Ben Pfaff --- northd/lrouter.dl | 9 - 1 file changed, 9 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index c0ec6be47..582e07654 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -488,7 +488,6 @@ typedef Router = Router { is_gateway: bool, nats: Vec, snat_ips: Map>, -lbs:Vec>, lb_ipv4s_routable: Set, lb_ipv4s_unroutable: Set, lb_ipv6s_routable: Set, @@ -515,7 +514,6 @@ Router[Router{ .is_gateway = lr.options.contains_key(i"chassis"), .nats= nats, .snat_ips= snat_ips, -.lbs = lbs, .lb_ipv4s_routable = lb_ipv4s_routable, .lb_ipv4s_unroutable = lb_ipv4s_unroutable, .lb_ipv6s_routable = lb_ipv6s_routable, @@ -553,13 +551,6 @@ Router[Router{ lb_ipv6s_routable, lb_ipv6s_unroutable) }. -/* RouterLB: many-to-many relation between logical routers and nb::LB */ -relation RouterLB(router: Intern, lb: Intern) - -RouterLB(router, lb) :- -router in (.lbs = lbs), -var lb = FlatMap(lbs). - /* Router-to-router logical port connections */ relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: istring) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 04/10] ovn-northd-ddlog: Reverse order of joins for connection tracking flows.
DDlog evaluates rules in the order given by syntax. The rules for load balancers all first evaluated a Router or Switch, then joined that against the load balancers. However, the expensive logic was all on the load balancers. This meant that the load balancer logic was happening many times, once per switch or router that contained it. When a single load balancer was part of many switches or routers, this did a lot of redundant processing. This commit reverses the join order, which speeds up processing a lot. This commit looks big because it also converts a lot of rules from the FLWOR syntax to traditional Datalog-style syntax. This is not completely needed, but the Datalog-style syntax is more versatile (it supports FlatMap and aggregation), so I tend to make that sort of change as I refactor things. Signed-off-by: Ben Pfaff --- northd/lrouter.dl| 11 -- northd/ovn_northd.dl | 419 +-- 2 files changed, 209 insertions(+), 221 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index 17d803292..c0ec6be47 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -560,17 +560,6 @@ RouterLB(router, lb) :- router in (.lbs = lbs), var lb = FlatMap(lbs). -/* Load balancer VIPs associated with routers */ -relation RouterLBVIP( -router: Intern, -lb: Intern, -vip: istring, -backends: istring) - -RouterLBVIP(router, lb, vip, backends) :- -RouterLB(router, lb@({.lb = ::Load_Balancer{.vips = vips}})), -(var vip, var backends) = FlatMap(vips). 
- /* Router-to-router logical port connections */ relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: istring) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 4348171ba..f678a4f50 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -3232,9 +3232,7 @@ Flow(.logical_datapath = sw._uuid, .io_port = None, .controller_meter = meter, .stage_hint = 0) :- -sw in (), LBVIPWithStatus[lbvip@{.lb = lb}], -sw.load_balancer.contains(lb._uuid), var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, (var actions, var reject) = { /* Store the original destination IP to be used when generating @@ -3256,12 +3254,14 @@ Flow(.logical_datapath = sw._uuid, build_lb_vip_actions(lbvip, s_SWITCH_OUT_QOS_MARK(), actions0 ++ actions1) }, +var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false, false, false), +sw in (), +sw.load_balancer.contains(lb._uuid), var meter = if (reject) { sw.copp.get(cOPP_REJECT()) } else { None -}, -var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false, false, false). +}. /* Ingress Pre-Hairpin/Nat-Hairpin/Hairpin tabled (Priority 0). * Packets that don't need hairpinning should continue processing. @@ -5657,6 +5657,26 @@ for (RouterPortNetworksIPv4Addr(.port = {.lrp = lrp, } } +LogicalRouterNdFlow(.lr = r, +.lrp = Some{lrp}, +.action = i"nd_na", +.ip = ip, +.sn_ip = false, +.mac = rEG_INPORT_ETH_ADDR(), +.extra_match = residence_check, +.drop = false, +.priority = 90, +.stage_hint = 0) :- +LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], +Some{(IPv6{var ip}, _)} = ip_address_and_port_from_lb_key(lb_key.ival()), +r in (), +r.load_balancer.contains(lb._uuid), +(.router = r, .lrp = lrp, .is_redirect = is_redirect), +var residence_check = match (is_redirect) { +true -> Some{i"is_chassis_resident(${json_escape(chassis_redirect_name(lrp.name))})"}, +false -> None +}. 
+ for ((.lrp = lrp, .router = router@{._uuid = lr_uuid}, .json_name = json_name, @@ -5675,22 +5695,7 @@ var residence_check = match (is_redirect) { .extra_match = residence_check, .drop = false, .priority = 90, - .stage_hint = 0); -for (RouterLBVIP(.router = {._uuid= lr_uuid}, .vip = vip)) { -Some{(var ip_address, _)} = ip_address_and_port_from_lb_key(vip.ival()) in { -IPv6{var ipv6} = ip_address in -LogicalRouterNdFlow(.lr = router, -.lrp = Some{lrp}, -.action = i"nd_na", -.ip = ipv6, -.sn_ip = false, -.mac = rEG_INPORT_ETH_ADDR(), -.extra_match = residence_check, -.drop = false, -
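The join-order effect this commit message describes can be sketched with a toy cost model in Python (illustrative only — the counter stands in for building match/action strings, and none of this is DDlog semantics): evaluating the expensive per-load-balancer work first means it runs once per LB instead of once per (switch, LB) pair.

```python
# Why join order matters: do the expensive per-LB work once, then fan
# out to the switches that reference it.
work_done = 0

def expensive_lb_actions(lb):
    global work_done
    work_done += 1          # stands in for building match/action strings
    return f"ct_lb(backends={lb})"

lbs = ["lb0", "lb1"]
switches = ["sw0", "sw1", "sw2"]   # every switch uses every LB here

# Old order: per switch, per LB -> len(switches) * len(lbs) evaluations.
work_done = 0
flows_old = [(sw, expensive_lb_actions(lb)) for sw in switches for lb in lbs]
assert work_done == 6

# New order: per LB once, then a cheap join against the switches.
work_done = 0
actions = {lb: expensive_lb_actions(lb) for lb in lbs}
flows_new = [(sw, actions[lb]) for lb in lbs for sw in switches]
assert work_done == 2

assert sorted(flows_old) == sorted(flows_new)  # same flows either way
```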
[ovs-dev] [PATCH ovn v2 05/10] ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports.
The LBVIPs already contain parsed versions of the load balancer keys, but the code was parsing it redundantly. Slight performance improvement but much cleaner. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 37 - 1 file changed, 12 insertions(+), 25 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index f678a4f50..bf2192f7e 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -5667,8 +5667,7 @@ LogicalRouterNdFlow(.lr = r, .drop = false, .priority = 90, .stage_hint = 0) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], -Some{(IPv6{var ip}, _)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = IPv6{ip}, .lb = lb}], r in (), r.load_balancer.contains(lb._uuid), (.router = r, .lrp = lrp, .is_redirect = is_redirect), @@ -6685,8 +6684,7 @@ Flow(.logical_datapath = r._uuid, .stage_hint = 0, .io_port = None, .controller_meter = None) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], -Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}], var prio = if (port != 0) { 110 } else { 100 }, var proto = match (lb.protocol) { Some{proto} -> proto, @@ -6719,8 +6717,7 @@ Flow(.logical_datapath = r._uuid, .stage_hint = 0, .io_port = None, .controller_meter = None) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], -Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}], var proto = match (lb.protocol) { Some{proto} -> proto, _ -> i"tcp" @@ -6765,8 +6762,7 @@ Flow(.logical_datapath = r._uuid, .stage_hint = stage_hint(lb._uuid), .io_port = None, .controller_meter = None) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], -Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}], var proto = match (lb.protocol) { Some{proto} -> proto, _ -> i"tcp" @@ -6791,28 +6787,19 @@ Flow(.logical_datapath = r._uuid, 
.stage_hint = stage_hint(lb._uuid), .io_port = None, .controller_meter = None) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb, .backend_ips = backends}], -Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb, .backends = backends}], var proto = match (lb.protocol) { Some{proto} -> proto, _ -> i"tcp" }, -var conds = { -var conds = vec_empty(); -for (ip_str in backends.split(",")) { -match (ip_address_and_port_from_lb_key(ip_str)) { -None -> () /* FIXME: put a break here */, -Some{(ip_, port_)} -> conds.push( -"(${ip_.ipX()}.src == ${ip_}" ++ -if (port_ != 0) { -" && ${proto}.src == ${port_})" -} else { -")" -}) -} +var conds = backends.map(|b| { +var port_match = if (b.port != 0) { +" && ${proto}.src == ${b.port}" +} else { +"" }; -conds.join(" || ") -}, +"(${b.ip.ipX()}.src == ${b.ip}" ++ port_match ++ ")" +}).join(" || "), conds != "", r in (), Some{var gwport} = r.l3dgw_ports.nth(0), -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
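The parse-once pattern behind this commit can be sketched in Python (hypothetical helper; the real `ip_address_and_port_from_lb_key()` also handles bracketed IPv6 `[addr]:port` forms): parse the `"IP[:port]"` key a single time when the LBVIP row is built, store the parsed fields, and have every later rule read those fields instead of re-splitting the string.

```python
# Sketch of storing the parsed LB key instead of re-parsing it per rule.
def parse_lb_key(key):
    """Split 'IP' or 'IP:port' (IPv4-only sketch) into (ip, port)."""
    if ":" in key:
        ip, _, port = key.rpartition(":")
        return ip, int(port)
    return key, 0

class LBVIP:
    def __init__(self, key):
        # Parsed exactly once, at row-construction time.
        self.vip_addr, self.vip_port = parse_lb_key(key)

vip = LBVIP("192.168.2.1:8080")
# Every later rule reads the stored fields instead of re-parsing:
assert (vip.vip_addr, vip.vip_port) == ("192.168.2.1", 8080)
assert LBVIP("192.168.2.1").vip_port == 0  # port 0 means "no port"
```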
[ovs-dev] [PATCH ovn v2 03/10] ovn-northd-ddlog: Derive load balancer IP addresses in new LoadBalancer.
This makes the get_router_load_balancer_ips() function much faster. This function is a hot path for the use of load balancers, so it improves performance overall when they are in use. Signed-off-by: Ben Pfaff --- northd/lrouter.dl| 68 +-- northd/ovn_northd.dl | 76 +++- 2 files changed, 91 insertions(+), 53 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index b6e752f7c..17d803292 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -402,14 +402,13 @@ LogicalRouterSnatIPs(lr._uuid, map_empty()) :- lr in nb::Logical_Router(), not LogicalRouterSnatIP(.lr = lr._uuid). -relation LogicalRouterLB(lr: uuid, nat: Intern) - +relation LogicalRouterLB(lr: uuid, nat: Intern) LogicalRouterLB(lr, lb) :- nb::Logical_Router(._uuid = lr, .load_balancer = lbs), var lb_uuid = FlatMap(lbs), -lb in ::Load_Balancer(._uuid = lb_uuid). +lb in (.lb = ::Load_Balancer{._uuid = lb_uuid}). -relation LogicalRouterLBs(lr: uuid, nat: Vec>) +relation LogicalRouterLBs(lr: uuid, nat: Vec>) LogicalRouterLBs(lr, lbs) :- LogicalRouterLB(lr, lb), @@ -448,6 +447,31 @@ LogicalRouterCopp0(lr, meters) :- function chassis_redirect_name(port_name: istring): string = "cr-${port_name}" +typedef LoadBalancer = LoadBalancer { +lb: Intern, +ipv4s: Set, +ipv6s: Set, +routable: bool +} + +relation LoadBalancer[Intern] +LoadBalancer[LoadBalancer{lb, ipv4s, ipv6s, routable}.intern()] :- +nb::Load_Balancer[lb], +var routable = lb.options.get_bool_def(i"add_route", false), +(var ipv4s, var ipv6s) = { +var ipv4s = set_empty(); +var ipv6s = set_empty(); +for ((vip, _) in lb.vips) { +/* node->key contains IP:port or just IP. */ +match (ip_address_and_port_from_lb_key(vip.ival())) { +None -> (), +Some{(IPv4{ipv4}, _)} -> ipv4s.insert(i"${ipv4}"), +Some{(IPv6{ipv6}, _)} -> ipv6s.insert(i"${ipv6}"), +} +}; +(ipv4s, ipv6s) +}. + typedef Router = Router { /* Fields copied from nb::Logical_Router. 
*/ _uuid: uuid, @@ -464,7 +488,11 @@ typedef Router = Router { is_gateway: bool, nats: Vec, snat_ips: Map>, -lbs:Vec>, +lbs:Vec>, +lb_ipv4s_routable: Set, +lb_ipv4s_unroutable: Set, +lb_ipv6s_routable: Set, +lb_ipv6s_unroutable: Set, mcast_cfg: Intern, learn_from_arp_request: bool, force_lb_snat: bool, @@ -488,6 +516,10 @@ Router[Router{ .nats= nats, .snat_ips= snat_ips, .lbs = lbs, +.lb_ipv4s_routable = lb_ipv4s_routable, +.lb_ipv4s_unroutable = lb_ipv4s_unroutable, +.lb_ipv6s_routable = lb_ipv6s_routable, +.lb_ipv6s_unroutable = lb_ipv6s_unroutable, .mcast_cfg = mcast_cfg, .learn_from_arp_request = learn_from_arp_request, .force_lb_snat = force_lb_snat, @@ -501,10 +533,28 @@ Router[Router{ LogicalRouterCopp(lr._uuid, copp), mcast_cfg in (.datapath = lr._uuid), var learn_from_arp_request = lr.options.get_bool_def(i"always_learn_from_arp_request", true), -var force_lb_snat = lb_force_snat_router_ip(lr.options). +var force_lb_snat = lb_force_snat_router_ip(lr.options), +(var lb_ipv4s_routable, var lb_ipv4s_unroutable, + var lb_ipv6s_routable, var lb_ipv6s_unroutable) = { +var lb_ipv4s_routable = set_empty(); +var lb_ipv4s_unroutable = set_empty(); +var lb_ipv6s_routable = set_empty(); +var lb_ipv6s_unroutable = set_empty(); +for (lb in lbs) { +if (lb.routable) { +lb_ipv4s_routable = lb_ipv4s_routable.union(lb.ipv4s); +lb_ipv6s_routable = lb_ipv6s_routable.union(lb.ipv6s); +} else { +lb_ipv4s_unroutable = lb_ipv4s_unroutable.union(lb.ipv4s); +lb_ipv6s_unroutable = lb_ipv6s_unroutable.union(lb.ipv6s); +} +}; +(lb_ipv4s_routable, lb_ipv4s_unroutable, + lb_ipv6s_routable, lb_ipv6s_unroutable) +}. 
/* RouterLB: many-to-many relation between logical routers and nb::LB */ -relation RouterLB(router: Intern, lb: Intern) +relation RouterLB(router: Intern, lb: Intern) RouterLB(router, lb) :- router in (.lbs = lbs), @@ -513,12 +563,12 @@ RouterLB(router, lb) :- /* Load balancer VIPs associated with routers */ relation RouterLBVIP( router: Intern, -lb: Intern, +lb: Intern, vip: istring, backends: istring) RouterLBVIP(router, lb, vip, backends) :- -RouterLB(router, lb@(::Load_Balancer{.vips = vips})), +RouterLB(router, lb@({.lb = ::Load_Balancer{.vips = vips}})), (var vip, var backends) = FlatMap(
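The precomputation this patch introduces — derive each load balancer's IPv4/IPv6 address sets once, then let every router just union the precomputed sets, split by routability — can be sketched in Python (illustrative data; `ipaddress` stands in for the DDlog v4/v6 match):

```python
import ipaddress

def lb_ip_sets(vip_addrs):
    """Split one LB's VIP addresses into (ipv4s, ipv6s) -- done once per LB."""
    ipv4s, ipv6s = set(), set()
    for addr in vip_addrs:
        ip = ipaddress.ip_address(addr)
        (ipv4s if ip.version == 4 else ipv6s).add(addr)
    return ipv4s, ipv6s

assert lb_ip_sets(["10.0.0.1", "fd00::1"]) == ({"10.0.0.1"}, {"fd00::1"})

# Router side: union the precomputed per-LB sets, keyed on routability
# (the LB's add_route option), instead of re-parsing every VIP key.
lbs = [
    {"ipv4s": {"10.0.0.1"}, "ipv6s": {"fd00::1"}, "routable": True},
    {"ipv4s": {"10.0.0.2"}, "ipv6s": set(),       "routable": False},
]
routable_v4 = set().union(*(lb["ipv4s"] for lb in lbs if lb["routable"]))
unroutable_v4 = set().union(*(lb["ipv4s"] for lb in lbs if not lb["routable"]))
assert routable_v4 == {"10.0.0.1"} and unroutable_v4 == {"10.0.0.2"}
```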
[ovs-dev] [PATCH ovn v2 02/10] ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient.
DDlog can index equality joins within a expression like this, but not conditional expression that follow such an expression. This doesn't actually matter much in this case because ordinarily the expression will be true (most router ports are enabled). Signed-off-by: Ben Pfaff --- northd/lrouter.dl| 6 -- northd/ovn_northd.dl | 16 +--- 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index cc3dced5f..b6e752f7c 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -596,7 +596,8 @@ typedef RouterPort = RouterPort { peer: RouterPeer, mcast_cfg:Intern, sb_options: Map, -has_bfd: bool +has_bfd: bool, +enabled: bool } relation RouterPort[Intern] @@ -610,7 +611,8 @@ RouterPort[RouterPort{ .peer = peer, .mcast_cfg = mcast_cfg, .sb_options = sb_options, - .has_bfd= has_bfd + .has_bfd= has_bfd, + .enabled= lrp.is_enabled() }.intern()] :- lrp in ::Logical_Router_Port(), Some{var networks} = extract_lrp_networks(lrp.mac.ival(), lrp.networks.map(ival)), diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 4402b17d2..c41a79b84 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -4609,8 +4609,7 @@ relation ( .reachable_ips_v6 = reachable_ips_v6, .unreachable_ips_v4 = unreachable_ips_v4, .unreachable_ips_v6 = unreachable_ips_v6) :- -port in (.peer = Some{rp}), -rp.is_enabled(), +port in (.peer = Some{rp@{.enabled = true}}), (var reachable_ips_v4, var reachable_ips_v6, var unreachable_ips_v4, var unreachable_ips_v6) = get_arp_forward_ips(rp). /* Packets received from VXLAN tunnels have already been through the @@ -4632,8 +4631,7 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- -sp in (.sw = sw, .peer = Some{rp}), -rp.is_enabled(), +sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .reachable_ips_v4 = ips_v4), var ipv4 = FlatMap(ips_v4), var mc_flood_l2 = json_escape(mC_FLOOD_L2().0). 
@@ -4650,8 +4648,7 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- -sp in (.sw = sw, .peer = Some{rp}), -rp.is_enabled(), +sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .reachable_ips_v6 = ips_v6), var ipv6 = FlatMap(ips_v6), var mc_flood_l2 = json_escape(mC_FLOOD_L2().0). @@ -4664,12 +4661,10 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = 0, .io_port = None, .controller_meter = None) :- -sp in (.sw = sw, .peer = Some{rp}), -rp.is_enabled(), +sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .unreachable_ips_v4 = ips_v4), var ipv4 = FlatMap(ips_v4), var flood = json_escape(mC_FLOOD().0). - Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), .priority = 90, @@ -4678,8 +4673,7 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- -sp in (.sw = sw, .peer = Some{rp}), -rp.is_enabled(), +sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .unreachable_ips_v6 = ips_v6), var ipv6 = FlatMap(ips_v6), var flood = json_escape(mC_FLOOD().0). -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn v2 00/10] 3x performance improvement for ddlog with load balancer benchmark
With and without these patches, I see the following performance when I run the load-balancer heavy benchmark. The measurements include cold start with all the load balancers, then deleting the sctp load balancer and waiting for it to finish, then the same with the tcp load balancer, then the same with the udp load balancer. The measurements only include ddlog time; there is additional overhead talking to the database, but that's a constant factor. Without patches: 783 CPU seconds, 23.5 GB RAM, 13:06 elapsed time. With patches: 224 CPU seconds, 14.3 GB RAM, 3:45 elapsed time. I am continuing to work on performance. These patches are also available in: https://github.com/blp/ovs-reviews/tree/ovn-memory-8 v1->v2: - Add two new patches. - Rebase and rerun benchmarks (revisions reported above). Ben Pfaff (9): ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient. ovn-northd-ddlog: Derive load balancer IP addresses in new LoadBalancer. ovn-northd-ddlog: Reverse order of joins for connection tracking flows. ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports. ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends string. ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router. ovn-northd-ddlog: Intern strings before joining when possible. ovn-northd-ddlog: Avoid map(ival) for ARP flows. ovn-northd-ddlog: Avoid unnecessary joins for SwitchPortARPForwards. Leonid Ryzhyk (1): ovn-northd-ddlog: Intern all strings in OVSDB tables. 
configure.ac |2 +- manpages.mk |1 - northd/copp.dl | 32 +- northd/helpers.dl| 14 +- northd/ipam.dl | 17 +- northd/lrouter.dl| 213 +++-- northd/lswitch.dl| 177 ++-- northd/multicast.dl | 44 +- northd/ovn-nb.dlopts |1 + northd/ovn-sb.dlopts |1 + northd/ovn.dl|7 + northd/ovn_northd.dl | 2021 -- northd/ovsdb2ddlog2c |6 +- tests/ovn-ic.at |8 +- tests/ovn-northd.at | 20 +- tests/ovn.at |6 +- 16 files changed, 1280 insertions(+), 1290 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] json: Optimize string serialization.
On Tue, Aug 24, 2021 at 11:07:22PM +0200, Ilya Maximets wrote: > +count = 0; > +start = string; > while ((c = *string++) != '\0') { > -escape = chars_escaping[c]; > -while ((c2 = *escape++) != '\0') { > -ds_put_char(ds, c2); > +if (c >= ' ' && c != '\"' && c != '\\') { > +count++; > +continue; > +} else { > +if (count) { > +ds_put_buffer(ds, start, count); > +count = 0; > +} > +start = string; > +escape = chars_escaping[c]; > +while ((c2 = *escape++) != '\0') { > +ds_put_char(ds, c2); > +} The "continue" above isn't needed. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
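For readers outside the thread, the run-batching idea in Ilya's patch is easy to demonstrate standalone. Below is a minimal C sketch of the same technique, not the OVS code: escape_of() stands in for OVS's chars_escaping[] table, and a caller-supplied buffer stands in for struct ds. Unescaped characters are counted into a run and flushed with one bulk append instead of being emitted one at a time.

```c
#include <string.h>

/* Stand-in for OVS's chars_escaping[] table: returns the escape sequence
 * for 'c', or NULL if 'c' can be copied verbatim.  (Real JSON also escapes
 * other control characters; this sketch handles only a few.) */
static const char *
escape_of(unsigned char c)
{
    switch (c) {
    case '"':  return "\\\"";
    case '\\': return "\\\\";
    case '\n': return "\\n";
    case '\t': return "\\t";
    default:   return NULL;
    }
}

/* Appends the escaped form of 'in' to 'out' (which must start empty and be
 * large enough).  Instead of appending each ordinary character separately,
 * it tracks a run of 'count' unescaped characters starting at 'start' and
 * flushes the whole run at once, as the patch does. */
static void
escape_string(const char *in, char *out)
{
    const char *start = in;
    size_t count = 0;
    unsigned char c;

    while ((c = *in++) != '\0') {
        const char *esc = escape_of(c);
        if (!esc) {
            count++;                     /* Extend the current run. */
            continue;
        }
        if (count) {
            strncat(out, start, count);  /* Flush the run in one append. */
            count = 0;
        }
        start = in;          /* Next run begins after the escaped char. */
        strcat(out, esc);
    }
    if (count) {
        strncat(out, start, count);      /* Flush the trailing run. */
    }
}
```

On input with few escapes this replaces per-character appends with a handful of bulk copies, which is where the reported speedup comes from.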
[ovs-dev] [PATCH ovn 8/8] ovn-northd-ddlog: Intern strings before joining when possible.
By interning earlier, we do less copying across the joins with Router and Switch, because DDlog always copies by value and copying an interned string is cheap. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 57 +++- 1 file changed, 30 insertions(+), 27 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 8831da442..84d1fde0d 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -3216,14 +3216,14 @@ function build_lb_vip_actions(lbvip: Intern, Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_STATEFUL(), .priority = priority, - .__match = __match.intern(), - .actions = actions.intern(), + .__match = __match, + .actions = actions, .io_port = None, .controller_meter = meter, .stage_hint = 0) :- LBVIPWithStatus(lbvip@{.lb = lb}, up_backends), var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, -(var actions, var reject) = { +(var actions0, var reject) = { /* Store the original destination IP to be used when generating * hairpin flows. */ @@ -3243,7 +3243,8 @@ Flow(.logical_datapath = sw._uuid, build_lb_vip_actions(lbvip, up_backends, s_SWITCH_OUT_QOS_MARK(), actions0 ++ actions1) }, -var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false, false, false), +var actions = actions0.intern(), +var __match = ("ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false, false, false)).intern(), sw in (), sw.load_balancer.contains(lb._uuid), var meter = if (reject) { @@ -4610,10 +4611,10 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- +var mc_flood_l2 = json_escape(mC_FLOOD_L2().0), sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .reachable_ips_v4 = ips_v4), -var ipv4 = FlatMap(ips_v4), -var mc_flood_l2 = json_escape(mC_FLOOD_L2().0). +var ipv4 = FlatMap(ips_v4). 
Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), .priority = 80, @@ -4627,35 +4628,35 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- +var mc_flood_l2 = json_escape(mC_FLOOD_L2().0), sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .reachable_ips_v6 = ips_v6), -var ipv6 = FlatMap(ips_v6), -var mc_flood_l2 = json_escape(mC_FLOOD_L2().0). +var ipv6 = FlatMap(ips_v6). Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), .priority = 90, .__match = i"${fLAGBIT_NOT_VXLAN()} && arp.op == 1 && arp.tpa == ${ipv4}", - .actions = i"outport = ${flood}; output;", + .actions = actions, .stage_hint = 0, .io_port = None, .controller_meter = None) :- +var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;", sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .unreachable_ips_v4 = ips_v4), -var ipv4 = FlatMap(ips_v4), -var flood = json_escape(mC_FLOOD().0). +var ipv4 = FlatMap(ips_v4). Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), .priority = 90, .__match = i"${fLAGBIT_NOT_VXLAN()} && nd_ns && nd.target == ${ipv6}", - .actions = i"outport = ${flood}; output;", + .actions = actions, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- +var actions = i"outport = ${json_escape(mC_FLOOD().0)}; output;", sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .unreachable_ips_v6 = ips_v6), -var ipv6 = FlatMap(ips_v6), -var flood = json_escape(mC_FLOOD().0). +var ipv6 = FlatMap(ips_v6). 
for (SwitchPortNewDynamicAddress(.port = {.lsp = lsp, .json_name = json_name, .sw = sw}, .address = Some{addrs}) @@ -6701,7 +6702,7 @@ Flow(.logical_datapath = r._uuid, Flow(.logical_datapath = r._uuid, .stage= s_ROUTER_IN_DNAT(), .priority = prio, - .__match = __match.intern(), + .__match = __match, .actions = actions, .stage_hint = 0, .io_port = None, @@ -6718,13 +6719,13 @@ Flow(.logical_datapath = r._uuid, "" }, var prio = if (port != 0) { 120 } else { 110 }, -var match0 = "ct.est && &qu
[ovs-dev] [PATCH ovn 6/8] ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends string.
There was only one use for the 'backends' map, which was to be converted to a string that listed all the backends that were up, so we might as well do that at its point of origination. At the same time, everything else in LBVIPWithStatus was just a copy of the underlying LBVIP, so we might as well just reference it. Only slight improvement to performance. Signed-off-by: Ben Pfaff --- northd/lswitch.dl| 32 ++-- northd/ovn_northd.dl | 29 + 2 files changed, 23 insertions(+), 38 deletions(-) diff --git a/northd/lswitch.dl b/northd/lswitch.dl index ad6475a91..69b1c6eb5 100644 --- a/northd/lswitch.dl +++ b/northd/lswitch.dl @@ -445,25 +445,21 @@ function default_protocol(protocol: Option): istring = { } } -typedef LBVIPWithStatus = LBVIPWithStatus { -lb: Intern, -vip_key: istring, -backend_ips: istring, -health_check: Option>, -vip_addr: v46_ip, -vip_port: bit<16>, -backends: Map -} -relation LBVIPWithStatus[Intern] - -LBVIPWithStatus[LBVIPWithStatus{lb, vip_key, backend_ips, health_check, vip_addr, vip_port, map_empty()}.intern()] :- -(lb, vip_key, backend_ips, health_check, vip_addr, vip_port, vec_empty()). -LBVIPWithStatus[LBVIPWithStatus{lb, vip_key, backend_ips, health_check, vip_addr, vip_port, backends_with_status}.intern()] :- -(lb, vip_key, backend_ips, health_check, vip_addr, vip_port, backends), -var backend = FlatMap(backends), +relation LBVIPWithStatus( +lbvip: Intern, +up_backends: istring) +LBVIPWithStatus(lbvip, i"") :- +lbvip in (.backends = vec_empty()). +LBVIPWithStatus(lbvip, up_backends) :- LBVIPBackendStatus(lbvip, backend, up), -var backends_with_status = ((backend, up)).group_by((lb, vip_key, backend_ips, health_check, - vip_addr, vip_port)).to_map(). +var up_backends = ((backend, up)).group_by(lbvip).to_vec().filter_map(|x| { +(LBVIPBackend{var ip, var port, _}, var up) = x; +match ((up, port)) { +(true, 0) -> Some{"${ip.to_bracketed_string()}"}, +(true, _) -> Some{"${ip.to_bracketed_string()}:${port}"}, +_ -> None +} +}).join(",").intern(). 
/* Maps from a load-balancer virtual IP backend to whether it's up or not. * diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index de2b0718c..8831da442 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -3177,7 +3177,7 @@ function get_match_for_lb_key(ip_address: v46_ip, } /* New connections in Ingress table. */ -function ct_lb(backends: string, +function ct_lb(backends: istring, selection_fields: Set, protocol: Option): string { var args = vec_with_capacity(2); args.push("backends=${backends}"); @@ -3198,18 +3198,11 @@ function ct_lb(backends: string, "ct_lb(" ++ args.join("; ") ++ ");" } -function build_lb_vip_actions(lbvip: Intern, +function build_lb_vip_actions(lbvip: Intern, + up_backends: istring, stage: Intern, actions0: string): (string, bool) { -var up_backends = vec_with_capacity(lbvip.backends.size()); -for (pair in lbvip.backends) { -(var backend, var up) = pair; -if (up) { -up_backends.push((backend.ip, backend.port)) -} -}; - -if (up_backends.is_empty()) { +if (up_backends == i"") { if (lbvip.lb.options.get_bool_def(i"reject", false)) { return ("reg0 = 0; reject { outport <-> inport; ${next_to_stage(stage)};};", true) } else if (lbvip.health_check.is_some()) { @@ -3217,11 +3210,7 @@ function build_lb_vip_actions(lbvip: Intern, } // else fall through }; -var up_backends_s = up_backends.sort_imm().map(|x| match (x) { -(ip, 0) -> "${ip.to_bracketed_string()}", -(ip, port) -> "${ip.to_bracketed_string()}:${port}" -}).join(","); -var actions = ct_lb(up_backends_s, lbvip.lb.selection_fields, lbvip.lb.protocol); +var actions = ct_lb(up_backends, lbvip.lb.selection_fields, lbvip.lb.protocol); (actions0 ++ actions, false) } Flow(.logical_datapath = sw._uuid, @@ -3232,7 +3221,7 @@ Flow(.logical_datapath = sw._uuid, .io_port = None, .controller_meter = meter, .stage_hint = 0) :- -LBVIPWithStatus[lbvip@{.lb = lb}], +LBVIPWithStatus(lbvip@{.lb = lb}, up_backends), var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, (var actions, var 
reject) = { /* Store the original destination IP to be used when generating @@ -3252,7 +3241,7 @@ Flow(.logical_datapath = sw._uuid, "" }; -build_lb_vip_actions(lbvip,
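To make the new relation's shape concrete, here is a hedged C sketch of what the precomputed up_backends string contains. struct backend and the buffer sizes are illustrative, not OVN types: each backend that is up contributes "ip" or "ip:port", comma-joined, which is exactly the list ct_lb() needs.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for an LB backend plus its health status. */
struct backend {
    const char *ip;   /* Already bracketed for IPv6 in real OVN. */
    int port;         /* 0 means "no port". */
    bool up;
};

/* Builds the comma-joined "ip[:port]" list of backends that are up,
 * i.e. the string LBVIPWithStatus now carries as up_backends. */
static void
format_up_backends(const struct backend *b, size_t n, char *out, size_t outsz)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char one[80];

        if (!b[i].up) {
            continue;                    /* Down backends are omitted. */
        }
        if (b[i].port) {
            snprintf(one, sizeof one, "%s:%d", b[i].ip, b[i].port);
        } else {
            snprintf(one, sizeof one, "%s", b[i].ip);
        }
        if (out[0]) {
            strncat(out, ",", outsz - strlen(out) - 1);
        }
        strncat(out, one, outsz - strlen(out) - 1);
    }
}
```

Precomputing this string once per VIP means the per-switch rule only concatenates it into the actions, instead of re-filtering the backend map for every datapath.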
[ovs-dev] [PATCH ovn 7/8] ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router.
Signed-off-by: Ben Pfaff --- northd/lrouter.dl | 9 - 1 file changed, 9 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index c0ec6be47..582e07654 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -488,7 +488,6 @@ typedef Router = Router { is_gateway: bool, nats: Vec, snat_ips: Map>, -lbs:Vec>, lb_ipv4s_routable: Set, lb_ipv4s_unroutable: Set, lb_ipv6s_routable: Set, @@ -515,7 +514,6 @@ Router[Router{ .is_gateway = lr.options.contains_key(i"chassis"), .nats= nats, .snat_ips= snat_ips, -.lbs = lbs, .lb_ipv4s_routable = lb_ipv4s_routable, .lb_ipv4s_unroutable = lb_ipv4s_unroutable, .lb_ipv6s_routable = lb_ipv6s_routable, @@ -553,13 +551,6 @@ Router[Router{ lb_ipv6s_routable, lb_ipv6s_unroutable) }. -/* RouterLB: many-to-many relation between logical routers and nb::LB */ -relation RouterLB(router: Intern, lb: Intern) - -RouterLB(router, lb) :- -router in (.lbs = lbs), -var lb = FlatMap(lbs). - /* Router-to-router logical port connections */ relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: istring) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn 5/8] ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports.
The LBVIPs already contain parsed versions of the load balancer keys, but the code was parsing it redundantly. Slight performance improvement but much cleaner. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 37 - 1 file changed, 12 insertions(+), 25 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index b72914d73..de2b0718c 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -5667,8 +5667,7 @@ LogicalRouterNdFlow(.lr = r, .drop = false, .priority = 90, .stage_hint = 0) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], -Some{(IPv6{var ip}, _)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = IPv6{ip}, .lb = lb}], r in (), r.load_balancer.contains(lb._uuid), (.router = r, .lrp = lrp, .is_redirect = is_redirect), @@ -6685,8 +6684,7 @@ Flow(.logical_datapath = r._uuid, .stage_hint = 0, .io_port = None, .controller_meter = None) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], -Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}], var prio = if (port != 0) { 110 } else { 100 }, var proto = match (lb.protocol) { Some{proto} -> proto, @@ -6719,8 +6717,7 @@ Flow(.logical_datapath = r._uuid, .stage_hint = 0, .io_port = None, .controller_meter = None) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], -Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}], var proto = match (lb.protocol) { Some{proto} -> proto, _ -> i"tcp" @@ -6765,8 +6762,7 @@ Flow(.logical_datapath = r._uuid, .stage_hint = stage_hint(lb._uuid), .io_port = None, .controller_meter = None) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], -Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb}], var proto = match (lb.protocol) { Some{proto} -> proto, _ -> i"tcp" @@ -6791,28 +6787,19 @@ Flow(.logical_datapath = r._uuid, 
.stage_hint = stage_hint(lb._uuid), .io_port = None, .controller_meter = None) :- -LBVIP[lbvip@{.vip_key = lb_key, .lb = lb, .backend_ips = backends}], -Some{(var ip, var port)} = ip_address_and_port_from_lb_key(lb_key.ival()), +LBVIP[lbvip@{.vip_addr = ip, .vip_port = port, .lb = lb, .backends = backends}], var proto = match (lb.protocol) { Some{proto} -> proto, _ -> i"tcp" }, -var conds = { -var conds = vec_empty(); -for (ip_str in backends.split(",")) { -match (ip_address_and_port_from_lb_key(ip_str)) { -None -> () /* FIXME: put a break here */, -Some{(ip_, port_)} -> conds.push( -"(${ip_.ipX()}.src == ${ip_}" ++ -if (port_ != 0) { -" && ${proto}.src == ${port_})" -} else { -")" -}) -} +var conds = backends.map(|b| { +var port_match = if (b.port != 0) { +" && ${proto}.src == ${b.port}" +} else { +"" }; -conds.join(" || ") -}, +"(${b.ip.ipX()}.src == ${b.ip}" ++ port_match ++ ")" +}).join(" || "), conds != "", r in (), Some{var gwport} = r.l3dgw_ports.nth(0), -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
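The refactored conds expression maps each already-parsed backend straight to a match clause. A hedged C equivalent of that string construction is below; ipX() is mimicked with a hard-coded "ip4" and the names are illustrative, not OVN code.

```c
#include <stdio.h>

struct lb_backend {
    const char *ip;
    int port;       /* 0 means "no port". */
};

/* Builds the " || "-joined match over backend sources, e.g.
 * "(ip4.src == 10.0.0.1 && tcp.src == 80) || (ip4.src == 10.0.0.2)".
 * Returns the number of characters written. */
static size_t
build_backend_match(const struct lb_backend *b, size_t n, const char *proto,
                    char *out, size_t outsz)
{
    size_t off = 0;

    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        off += snprintf(out + off, outsz - off, "%s(ip4.src == %s",
                        i ? " || " : "", b[i].ip);
        if (b[i].port) {
            off += snprintf(out + off, outsz - off, " && %s.src == %d",
                            proto, b[i].port);
        }
        off += snprintf(out + off, outsz - off, ")");
    }
    return off;
}
```

Working from parsed backends also removes the "FIXME: put a break here" error path entirely, since there is no per-backend re-parse that can fail.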
[ovs-dev] [PATCH ovn 4/8] ovn-northd-ddlog: Reverse order of joins for connection tracking flows.
DDlog evaluates rules in the order given by syntax. The rules for load balancers all first evaluated a Router or Switch, then joined that against the load balancers. However, the expensive logic was all on the load balancers. This meant that the load balancer logic was happening many times, once per switch or router that contained it. When a single load balancer was part of many switches or routers, this did a lot of redundant processing. This commit reverses the join order, which speeds up processing a lot. This commit looks big because it also converts a lot of rules from the FLWOR syntax to traditional Datalog-style syntax. This is not completely needed, but the Datalog-style syntax is more versatile (it supports FlatMap and aggregation), so I tend to make that sort of change as I refactor things. Signed-off-by: Ben Pfaff --- northd/lrouter.dl| 11 -- northd/ovn_northd.dl | 419 +-- 2 files changed, 209 insertions(+), 221 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index 17d803292..c0ec6be47 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -560,17 +560,6 @@ RouterLB(router, lb) :- router in (.lbs = lbs), var lb = FlatMap(lbs). -/* Load balancer VIPs associated with routers */ -relation RouterLBVIP( -router: Intern, -lb: Intern, -vip: istring, -backends: istring) - -RouterLBVIP(router, lb, vip, backends) :- -RouterLB(router, lb@({.lb = ::Load_Balancer{.vips = vips}})), -(var vip, var backends) = FlatMap(vips). 
- /* Router-to-router logical port connections */ relation RouterRouterPeer(rport1: uuid, rport2: uuid, rport2_name: istring) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index e69ce6958..b72914d73 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -3232,9 +3232,7 @@ Flow(.logical_datapath = sw._uuid, .io_port = None, .controller_meter = meter, .stage_hint = 0) :- -sw in (), LBVIPWithStatus[lbvip@{.lb = lb}], -sw.load_balancer.contains(lb._uuid), var priority = if (lbvip.vip_port != 0) { 120 } else { 110 }, (var actions, var reject) = { /* Store the original destination IP to be used when generating @@ -3256,12 +3254,14 @@ Flow(.logical_datapath = sw._uuid, build_lb_vip_actions(lbvip, s_SWITCH_OUT_QOS_MARK(), actions0 ++ actions1) }, +var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false, false, false), +sw in (), +sw.load_balancer.contains(lb._uuid), var meter = if (reject) { sw.copp.get(cOPP_REJECT()) } else { None -}, -var __match = "ct.new && " ++ get_match_for_lb_key(lbvip.vip_addr, lbvip.vip_port, lb.protocol, false, false, false). +}. /* Ingress Pre-Hairpin/Nat-Hairpin/Hairpin tabled (Priority 0). * Packets that don't need hairpinning should continue processing. @@ -5657,6 +5657,26 @@ for (RouterPortNetworksIPv4Addr(.port = {.lrp = lrp, } } +LogicalRouterNdFlow(.lr = r, +.lrp = Some{lrp}, +.action = i"nd_na", +.ip = ip, +.sn_ip = false, +.mac = rEG_INPORT_ETH_ADDR(), +.extra_match = residence_check, +.drop = false, +.priority = 90, +.stage_hint = 0) :- +LBVIP[lbvip@{.vip_key = lb_key, .lb = lb}], +Some{(IPv6{var ip}, _)} = ip_address_and_port_from_lb_key(lb_key.ival()), +r in (), +r.load_balancer.contains(lb._uuid), +(.router = r, .lrp = lrp, .is_redirect = is_redirect), +var residence_check = match (is_redirect) { +true -> Some{i"is_chassis_resident(${json_escape(chassis_redirect_name(lrp.name))})"}, +false -> None +}. 
+ for ((.lrp = lrp, .router = router@{._uuid = lr_uuid}, .json_name = json_name, @@ -5675,22 +5695,7 @@ var residence_check = match (is_redirect) { .extra_match = residence_check, .drop = false, .priority = 90, - .stage_hint = 0); -for (RouterLBVIP(.router = {._uuid= lr_uuid}, .vip = vip)) { -Some{(var ip_address, _)} = ip_address_and_port_from_lb_key(vip.ival()) in { -IPv6{var ipv6} = ip_address in -LogicalRouterNdFlow(.lr = router, -.lrp = Some{lrp}, -.action = i"nd_na", -.ip = ipv6, -.sn_ip = false, -.mac = rEG_INPORT_ETH_ADDR(), -.extra_match = residence_check, -.drop = false, -
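The effect of the reversed join order in this patch can be shown with an imperative analogy (a sketch of the idea, not DDlog semantics): evaluating the expensive per-load-balancer work before joining with switches or routers is like hoisting a loop-invariant computation out of an inner loop.

```c
static int n_expensive_calls;

/* Stand-in for the costly per-load-balancer action building. */
static int
expensive_lb_work(int lb)
{
    n_expensive_calls++;
    return lb * 2;
}

/* Old join order: starting from switches means the per-LB work runs
 * once per (switch, lb) pair. */
static int
old_order(int n_switches, int n_lbs)
{
    int total = 0;
    for (int sw = 0; sw < n_switches; sw++) {
        for (int lb = 0; lb < n_lbs; lb++) {
            total += expensive_lb_work(lb);
        }
    }
    return total;
}

/* New join order: the per-LB work runs once, and only its (cheap)
 * result is joined against every switch. */
static int
new_order(int n_switches, int n_lbs)
{
    int total = 0;
    for (int lb = 0; lb < n_lbs; lb++) {
        int result = expensive_lb_work(lb);
        for (int sw = 0; sw < n_switches; sw++) {
            total += result;
        }
    }
    return total;
}
```

With a few load balancers shared by many switches, the expensive work drops from O(switches × lbs) evaluations to O(lbs), matching the speedup described in the commit message.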
[ovs-dev] [PATCH ovn 2/8] ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient.
DDlog can index equality joins within an expression like this, but not conditional expressions that follow such an expression. This doesn't actually matter much in this case because ordinarily the expression will be true (most router ports are enabled). Signed-off-by: Ben Pfaff --- northd/lrouter.dl| 6 -- northd/ovn_northd.dl | 16 +--- 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index cc3dced5f..b6e752f7c 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -596,7 +596,8 @@ typedef RouterPort = RouterPort { peer: RouterPeer, mcast_cfg:Intern, sb_options: Map, -has_bfd: bool +has_bfd: bool, +enabled: bool } relation RouterPort[Intern] @@ -610,7 +611,8 @@ RouterPort[RouterPort{ .peer = peer, .mcast_cfg = mcast_cfg, .sb_options = sb_options, - .has_bfd= has_bfd + .has_bfd= has_bfd, + .enabled= lrp.is_enabled() }.intern()] :- lrp in ::Logical_Router_Port(), Some{var networks} = extract_lrp_networks(lrp.mac.ival(), lrp.networks.map(ival)), diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 61d7d53ce..fd40b637e 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -4609,8 +4609,7 @@ relation ( .reachable_ips_v6 = reachable_ips_v6, .unreachable_ips_v4 = unreachable_ips_v4, .unreachable_ips_v6 = unreachable_ips_v6) :- -port in (.peer = Some{rp}), -rp.is_enabled(), +port in (.peer = Some{rp@{.enabled = true}}), (var reachable_ips_v4, var reachable_ips_v6, var unreachable_ips_v4, var unreachable_ips_v6) = get_arp_forward_ips(rp). /* Packets received from VXLAN tunnels have already been through the @@ -4632,8 +4631,7 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- -sp in (.sw = sw, .peer = Some{rp}), -rp.is_enabled(), +sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .reachable_ips_v4 = ips_v4), var ipv4 = FlatMap(ips_v4), var mc_flood_l2 = json_escape(mC_FLOOD_L2().0). 
@@ -4650,8 +4648,7 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- -sp in (.sw = sw, .peer = Some{rp}), -rp.is_enabled(), +sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .reachable_ips_v6 = ips_v6), var ipv6 = FlatMap(ips_v6), var mc_flood_l2 = json_escape(mC_FLOOD_L2().0). @@ -4664,12 +4661,10 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = 0, .io_port = None, .controller_meter = None) :- -sp in (.sw = sw, .peer = Some{rp}), -rp.is_enabled(), +sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .unreachable_ips_v4 = ips_v4), var ipv4 = FlatMap(ips_v4), var flood = json_escape(mC_FLOOD().0). - Flow(.logical_datapath = sw._uuid, .stage= s_SWITCH_IN_L2_LKUP(), .priority = 90, @@ -4678,8 +4673,7 @@ Flow(.logical_datapath = sw._uuid, .stage_hint = stage_hint(sp.lsp._uuid), .io_port = None, .controller_meter = None) :- -sp in (.sw = sw, .peer = Some{rp}), -rp.is_enabled(), +sp in (.sw = sw, .peer = Some{rp@{.enabled = true}}), (.port = sp, .unreachable_ips_v6 = ips_v6), var ipv6 = FlatMap(ips_v6), var flood = json_escape(mC_FLOOD().0). -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn 3/8] ovn-northd-ddlog: Derive load balancer IP addresses in new LoadBalancer.
This makes the get_router_load_balancer_ips() function much faster. This function is a hot path for the use of load balancers, so it improves performance overall when they are in use. Signed-off-by: Ben Pfaff --- northd/lrouter.dl| 68 +-- northd/ovn_northd.dl | 76 +++- 2 files changed, 91 insertions(+), 53 deletions(-) diff --git a/northd/lrouter.dl b/northd/lrouter.dl index b6e752f7c..17d803292 100644 --- a/northd/lrouter.dl +++ b/northd/lrouter.dl @@ -402,14 +402,13 @@ LogicalRouterSnatIPs(lr._uuid, map_empty()) :- lr in nb::Logical_Router(), not LogicalRouterSnatIP(.lr = lr._uuid). -relation LogicalRouterLB(lr: uuid, nat: Intern) - +relation LogicalRouterLB(lr: uuid, nat: Intern) LogicalRouterLB(lr, lb) :- nb::Logical_Router(._uuid = lr, .load_balancer = lbs), var lb_uuid = FlatMap(lbs), -lb in ::Load_Balancer(._uuid = lb_uuid). +lb in (.lb = ::Load_Balancer{._uuid = lb_uuid}). -relation LogicalRouterLBs(lr: uuid, nat: Vec>) +relation LogicalRouterLBs(lr: uuid, nat: Vec>) LogicalRouterLBs(lr, lbs) :- LogicalRouterLB(lr, lb), @@ -448,6 +447,31 @@ LogicalRouterCopp0(lr, meters) :- function chassis_redirect_name(port_name: istring): string = "cr-${port_name}" +typedef LoadBalancer = LoadBalancer { +lb: Intern, +ipv4s: Set, +ipv6s: Set, +routable: bool +} + +relation LoadBalancer[Intern] +LoadBalancer[LoadBalancer{lb, ipv4s, ipv6s, routable}.intern()] :- +nb::Load_Balancer[lb], +var routable = lb.options.get_bool_def(i"add_route", false), +(var ipv4s, var ipv6s) = { +var ipv4s = set_empty(); +var ipv6s = set_empty(); +for ((vip, _) in lb.vips) { +/* node->key contains IP:port or just IP. */ +match (ip_address_and_port_from_lb_key(vip.ival())) { +None -> (), +Some{(IPv4{ipv4}, _)} -> ipv4s.insert(i"${ipv4}"), +Some{(IPv6{ipv6}, _)} -> ipv6s.insert(i"${ipv6}"), +} +}; +(ipv4s, ipv6s) +}. + typedef Router = Router { /* Fields copied from nb::Logical_Router. 
*/ _uuid: uuid, @@ -464,7 +488,11 @@ typedef Router = Router { is_gateway: bool, nats: Vec, snat_ips: Map>, -lbs:Vec>, +lbs:Vec>, +lb_ipv4s_routable: Set, +lb_ipv4s_unroutable: Set, +lb_ipv6s_routable: Set, +lb_ipv6s_unroutable: Set, mcast_cfg: Intern, learn_from_arp_request: bool, force_lb_snat: bool, @@ -488,6 +516,10 @@ Router[Router{ .nats= nats, .snat_ips= snat_ips, .lbs = lbs, +.lb_ipv4s_routable = lb_ipv4s_routable, +.lb_ipv4s_unroutable = lb_ipv4s_unroutable, +.lb_ipv6s_routable = lb_ipv6s_routable, +.lb_ipv6s_unroutable = lb_ipv6s_unroutable, .mcast_cfg = mcast_cfg, .learn_from_arp_request = learn_from_arp_request, .force_lb_snat = force_lb_snat, @@ -501,10 +533,28 @@ Router[Router{ LogicalRouterCopp(lr._uuid, copp), mcast_cfg in (.datapath = lr._uuid), var learn_from_arp_request = lr.options.get_bool_def(i"always_learn_from_arp_request", true), -var force_lb_snat = lb_force_snat_router_ip(lr.options). +var force_lb_snat = lb_force_snat_router_ip(lr.options), +(var lb_ipv4s_routable, var lb_ipv4s_unroutable, + var lb_ipv6s_routable, var lb_ipv6s_unroutable) = { +var lb_ipv4s_routable = set_empty(); +var lb_ipv4s_unroutable = set_empty(); +var lb_ipv6s_routable = set_empty(); +var lb_ipv6s_unroutable = set_empty(); +for (lb in lbs) { +if (lb.routable) { +lb_ipv4s_routable = lb_ipv4s_routable.union(lb.ipv4s); +lb_ipv6s_routable = lb_ipv6s_routable.union(lb.ipv6s); +} else { +lb_ipv4s_unroutable = lb_ipv4s_unroutable.union(lb.ipv4s); +lb_ipv6s_unroutable = lb_ipv6s_unroutable.union(lb.ipv6s); +} +}; +(lb_ipv4s_routable, lb_ipv4s_unroutable, + lb_ipv6s_routable, lb_ipv6s_unroutable) +}. 
/* RouterLB: many-to-many relation between logical routers and nb::LB */ -relation RouterLB(router: Intern, lb: Intern) +relation RouterLB(router: Intern, lb: Intern) RouterLB(router, lb) :- router in (.lbs = lbs), @@ -513,12 +563,12 @@ RouterLB(router, lb) :- /* Load balancer VIPs associated with routers */ relation RouterLBVIP( router: Intern, -lb: Intern, +lb: Intern, vip: istring, backends: istring) RouterLBVIP(router, lb, vip, backends) :- -RouterLB(router, lb@(::Load_Balancer{.vips = vips})), +RouterLB(router, lb@({.lb = ::Load_Balancer{.vips = vips}})), (var vip, var backends) = FlatMap(
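The hot-path win here is that each router's load-balancer IPs are split into routable and unroutable sets once, up front, instead of being derived on every lookup. A hedged C sketch of that partitioning step (struct toy_lb and the array-based "sets" are illustrative stand-ins for nb::Load_Balancer and DDlog sets):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative load balancer: one IP and the add_route flag, standing in
 * for nb::Load_Balancer and its options.get_bool_def("add_route") lookup. */
struct toy_lb {
    const char *ip;
    bool routable;
};

/* Splits a router's load-balancer IPs into routable and unroutable lists,
 * in the spirit of the four precomputed Router fields.  Returns the
 * routable count and stores the unroutable count in '*n_un'.  The output
 * arrays must each hold at least 'n' entries. */
static size_t
partition_lb_ips(const struct toy_lb *lbs, size_t n,
                 const char **routable, const char **unroutable, size_t *n_un)
{
    size_t nr = 0, nu = 0;

    for (size_t i = 0; i < n; i++) {
        if (lbs[i].routable) {
            routable[nr++] = lbs[i].ip;
        } else {
            unroutable[nu++] = lbs[i].ip;
        }
    }
    *n_un = nu;
    return nr;
}
```

Because the partition is computed when the Router record is built, rules that previously called get_router_load_balancer_ips() per flow now just read a precomputed set.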
[ovs-dev] [PATCH ovn 0/8] 3x performance improvement for ddlog with load balancer benchmark
With and without these patches, I see the following performance when I run the load-balancer heavy benchmark. The measurements include cold start with all the load balancers, then deleting the sctp load balancer and waiting for it to finish, then the same with the tcp load balancer, then the same with the udp load balancer. The measurements only include ddlog time; there is additional overhead talking to the database, but that's a constant factor. Without patches: 777 CPU seconds, 23.5 GB RAM, 12:59 elapsed time. With patches: 261 CPU seconds, 14.4 GB RAM, 4:23 elapsed time. I am continuing to work on performance. Ben Pfaff (7): ovn-northd-ddlog: Make joins for ARP/ND flows slightly more efficient. ovn-northd-ddlog: Derive load balancer IP addresses in new LoadBalancer. ovn-northd-ddlog: Reverse order of joins for connection tracking flows. ovn-northd-ddlog: Avoid re-parsing LB IP addresses and ports. ovn-northd-ddlog: Simplify LBVIPWithStatus to include up_backends string. ovn-northd-ddlog: Avoid storing unused 'lbs' field in Router. ovn-northd-ddlog: Intern strings before joining when possible. Leonid Ryzhyk (1): ovn-northd-ddlog: Intern all strings in OVSDB tables. configure.ac |2 +- manpages.mk |1 - northd/copp.dl | 32 +- northd/helpers.dl| 14 +- northd/ipam.dl | 17 +- northd/lrouter.dl| 213 +++-- northd/lswitch.dl| 177 ++-- northd/multicast.dl | 44 +- northd/ovn-nb.dlopts |1 + northd/ovn-sb.dlopts |1 + northd/ovn.dl|7 + northd/ovn_northd.dl | 2017 -- northd/ovsdb2ddlog2c |6 +- tests/ovn-ic.at |8 +- tests/ovn.at |6 +- 15 files changed, 1266 insertions(+), 1280 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netlink: Add support for parsing link layer address.
On Fri, Aug 20, 2021 at 03:43:15PM +0200, Frode Nordahl wrote: > Data retrieved from netlink and friends may include a link layer > address. Add a type to nl_attr_type and min/max functions to allow > use of nl_policy_parse with this type of data. > > While this will not be used by Open vSwitch itself at this time, > sibling and derived projects want to use the great netlink library > that OVS provides, and it is not possible to safely override the > global nl_attr_type symbol at link time. > > Signed-off-by: Frode Nordahl Thanks, I applied this. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovsdb-idl : Add APIs to query if a table and a column is present or not.
On Wed, Aug 18, 2021 at 08:42:47PM -0400, num...@ovn.org wrote: > From: Numan Siddique > > This patch adds 2 new APIs in the ovsdb-idl client library > - ovsdb_idl_has_table() and ovsdb_idl_has_column_in_table() to > query whether a table or a column is present in the IDL or not. This > is required for scenarios where the server schema is old and > missing a table or column and the client (built with a new schema > version) does a transaction with the missing table or column. This > results in a continuous loop of transaction failures. > > OVN would require the API - ovsdb_idl_has_table() to address this issue > when an old ovsdb-server is used (OVS 2.11) which has the 'datapath' > table missing. A recent commit in OVN creates a 'datapath' row > in the local ovsdb-server. ovsdb_idl_has_column_in_table() would be > useful to have. > > Related issue: https://bugzilla.redhat.com/show_bug.cgi?id=1992705 > Signed-off-by: Numan Siddique In the OVN meeting, I suggested that this is a bug in the ovsdb_idl code for transactions: it shouldn't try to transact on a column that it knows doesn't exist. It might still make sense to add this API, though. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] ideas for improving the OVS review process
On Mon, Aug 16, 2021 at 03:52:49PM -0300, Flavio Leitner wrote: > On Tue, Jul 20, 2021 at 11:41:37AM -0700, Ben Pfaff wrote: > > The OVS review process has greatly slowed over the last few years. This > > is partly because I haven't been able to spend as much time on review, > > since I was once the most productive reviewer. Ilya has been able to > > step up the amount of review he does, but that still isn't enough to > > keep up with the workload. > > > > We need to come up with some way to improve things. Here are a few > > ideas, mostly from a call earlier today (that was mainly about the OVS > > conference). I hope they will prompt a discussion. > > > > * Since patches are coming in, we have people who are knowledgeable about > > the code. Those people should be pitching in with reviews as well. > > It doesn't seem like they or their managers have the right incentives > > to do that. Maybe there is some way to improve the incentives. > > There is that, and also that even with others' reviews, a maintainer > still does a careful review, which duplicates the reviewers' work. > "The perfect is the enemy of the good." > > We have several emeritus maintainers but only a couple of them active > at the moment. It doesn't seem fair to ask them to do more work. So, > I agree with the other reply to this thread saying that we need more > maintainers. > > Although I think we could grow the number of maintainers, it doesn't > seem reasonable to ask each one of them to do an extensive review > by themselves on every patch. My point is that there should be a > "chain of trust", or a rule that allows a patch to be "blindly > committed" if that patch is reviewed by N different members, for example. > The intention is not to forbid maintainers from reviewing, but to alleviate > the load or pressure on them when the community already helped. 
> > As a side effect, more people could become maintainers/committers > because we don't require them to know OVS deeply or to commit huge > amounts of time to carefully review others' work. > > This brings up another point: unpredictability. It is not possible > today to tell what is going to happen with a patch after it gets > posted to the mailing list. It can be silently ignored for many months, > or be reviewed by others and still not accepted, etc. We should > have a way to prevent that; otherwise it makes it very difficult to > align companies' or other upstream projects' schedules. > > For example, a company investing in a feature "X" most probably > has a delivery date. Even if the work is posted during the development > phase, that doesn't mean the patch will be reviewed or accepted or > rejected. It's kind of a chicken-and-egg issue. If the process is not > clear, why should managers provide more incentives to participate? > > > > * The Linux kernel uses something like a "default accept" policy for > > patches that superficially look good and compile and boot, if there is > > no review from a specific maintainer within a few days. The patches > > do get some small amount of review from a subsystem maintainer. OVS > > could adopt a similar policy. > > I agree. Aaron is working to improve patchwork's status report for > each patch. Hopefully each community member could report test status > there. It seems that if the community decides to go with the "default > accept" rule, there will be more incentive to connect tests to OVS > patchwork. We get more automation as a side effect. > > However, we would still need to define who the sub-maintainer is, > or perhaps think about the "chain of trust"/"blindly committed" idea > mentioned above. It seems like making the list of sub-maintainers is an important step either way. What process should we use for that? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn repost 0/7] Reduce memory consumption and time for Load_Balancer benchmark
On Tue, Aug 17, 2021 at 11:47:50AM -0400, Numan Siddique wrote: > On Thu, Aug 12, 2021 at 1:45 PM Ben Pfaff wrote: > > > > Apologies for repost; I forgot to tag this as "ovn" in the first > > posting. > > > > For the benchmark at > > https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, > > this reduces memory consumption from 115 GB to 17 GB and elapsed time > > from 19 minutes to 5 minutes. > > > > I think there's headroom for more improvement, because Leonid has some > > cleverness that I haven't been able to successfully work in yet. > > > > This series is also available here: > > https://github.com/blp/ovs-reviews/tree/ovn-memory-4 > > Thanks for the improvements. > > For the entire series: > > Acked-by: Numan Siddique Thanks! I applied this series. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] ideas for improving the OVS review process
There's another perspective from the PostgreSQL community detailed at LWN. Here's a "guest link" to the article, titled "PostgreSQL's commitfest clog": https://lwn.net/SubscriberLink/865754/6c0eb771f8436789/ The paragraph that starts out "There is a longstanding expectation within the PostgreSQL project that anybody submitting a patch for consideration in a given commitfest should take the time to review somebody else's patch, preferably one of similar complexity." is particularly interesting. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH ovn repost 7/7] ovn-northd-ddlog: Postpone expanding external_ids, stage_hint, tags.
This has little effect on performance in the current benchmark, but it seems like the cleanest approach to me. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 38 ++ 1 file changed, 22 insertions(+), 16 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index dadf33ab6..f0730d40b 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -1667,9 +1667,9 @@ relation AggregatedFlow ( priority: integer, __match: istring, actions: istring, -tags: Map, +io_port: Option, controller_meter: Option, -external_ids: Map +stage_hint:bit<32> ) function make_flow_tags(io_port: Option): Map { match (io_port) { @@ -1690,9 +1690,9 @@ AggregatedFlow(.logical_datapaths = g.to_set(), .priority = priority, .__match = __match, .actions = actions, - .tags = make_flow_tags(io_port), + .io_port = io_port, .controller_meter = controller_meter, - .external_ids = make_flow_external_ids(stage_hint, stage)) :- + .stage_hint = stage_hint) :- UseLogicalDatapathGroups[true], Flow(logical_datapath, stage, priority, __match, actions, io_port, controller_meter, stage_hint), var g = logical_datapath.group_by((stage, priority, __match, actions, io_port, controller_meter, stage_hint)). @@ -1701,43 +1701,49 @@ AggregatedFlow(.logical_datapaths = set_singleton(logical_datapath), .priority = priority, .__match = __match, .actions = actions, - .tags = make_flow_tags(io_port), + .io_port = io_port, .controller_meter = controller_meter, - .external_ids = make_flow_external_ids(stage_hint, stage)) :- + .stage_hint = stage_hint) :- UseLogicalDatapathGroups[false], Flow(logical_datapath, stage, priority, __match, actions, io_port, controller_meter, stage_hint). 
+function to_string(pipeline: Pipeline): string { +if (pipeline == Ingress) { +"ingress" +} else { +"egress" +} +} + for (f in AggregatedFlow()) { -var pipeline = if (f.stage.pipeline == Ingress) "ingress" else "egress" in -var external_ids = f.external_ids.insert_imm("stage-name", f.stage.table_name) in if (f.logical_datapaths.size() == 1) { Some{var dp} = f.logical_datapaths.nth(0) in sb::Out_Logical_Flow( -._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)), +._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.io_port, f.stage_hint)), .logical_datapath = Some{dp}, .logical_dp_group = None, -.pipeline = pipeline, +.pipeline = f.stage.pipeline.to_string(), .table_id = f.stage.table_id as integer, .priority = f.priority, .controller_meter = f.controller_meter, .__match = f.__match.ival(), .actions = f.actions.ival(), -.tags = f.tags, -.external_ids = external_ids) +.tags = make_flow_tags(f.io_port), +.external_ids = make_flow_external_ids(f.stage_hint, f.stage)) } else { var group_uuid = hash128(f.logical_datapaths) in { sb::Out_Logical_Flow( -._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)), +._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.io_port, f.stage_hint)), .logical_datapath = None, .logical_dp_group = Some{group_uuid}, -.pipeline = pipeline, +.pipeline = f.stage.pipeline.to_string(), .table_id = f.stage.table_id as integer, .priority = f.priority, .controller_meter = f.controller_meter, .__match = f.__match.ival(), .actions = f.actions.ival(), -.tags = f.tags, -.external_ids = external_ids); +.tags = make_flow_tags(f.io_port), +.external_ids = make_flow_external_ids(f.stage_hint, f.stage)); sb::Out_Logical_DP_Group(._uuid = group_uuid, .datapaths = f.logical_datapaths) } } -- 2.31.1 ___ dev mailing list d...@openvswitch.org 
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
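The core of this patch is to carry the small raw fields (io_port, stage_hint) through the AggregatedFlow relation and expand them into the tags and external_ids maps only when emitting output rows, so the grouping keys stay cheap. The sketch below models that "postpone expansion" step in simplified Python; it mirrors make_flow_tags() and make_flow_external_ids() from ovn_northd.dl, but the types and field layout here are illustrative, not the DDlog code itself.

```python
# Simplified model of the refactor above: aggregate on raw fields,
# materialize the maps only at output time.

def make_flow_tags(io_port):
    # No io_port -> empty tags map, mirroring the DDlog match on Option.
    return {} if io_port is None else {"in_out_port": io_port}

def make_flow_external_ids(stage_hint, stage_name):
    # 0 means "no hint" (see the stage_hint patch in this series).
    ids = {"stage-name": stage_name}
    if stage_hint:
        ids["stage-hint"] = format(stage_hint, "x")
    return ids

def emit(aggregated_flow):
    """Expand an aggregated flow (raw fields) into an output row."""
    dp, stage_name, io_port, stage_hint = aggregated_flow
    return {
        "tags": make_flow_tags(io_port),
        "external_ids": make_flow_external_ids(stage_hint, stage_name),
    }
```

Keeping only (io_port, stage_hint) in the intermediate relation means equal flows stay equal through grouping, and the map construction cost is paid once per output row rather than once per Flow fact.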
[ovs-dev] [PATCH ovn repost 6/7] ovn-northd-ddlog: Intern nb::Logical_Switch.
With the benchmark at https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this reduces memory consumption by 300 MB and elapsed time by a few seconds. Signed-off-by: Ben Pfaff --- northd/ipam.dl | 2 +- northd/lswitch.dl| 36 ++-- northd/multicast.dl | 2 +- northd/ovn-nb.dlopts | 1 + northd/ovn_northd.dl | 16 5 files changed, 29 insertions(+), 28 deletions(-) diff --git a/northd/ipam.dl b/northd/ipam.dl index da71b2872..4665a28cb 100644 --- a/northd/ipam.dl +++ b/northd/ipam.dl @@ -187,7 +187,7 @@ SwitchIPv4ReservedAddresses(lswitch, addrs) :- var addrs = addr.group_by(lswitch).to_set(). SwitchIPv4ReservedAddresses(lswitch_uuid, set_empty()) :- -nb::Logical_Switch(._uuid = lswitch_uuid), +::Logical_Switch(._uuid = lswitch_uuid), not SwitchIPv4ReservedAddress(lswitch_uuid, _). /* Allocate dynamic IP addresses for ports that require them: diff --git a/northd/lswitch.dl b/northd/lswitch.dl index 7e7b62a4d..868ae115f 100644 --- a/northd/lswitch.dl +++ b/northd/lswitch.dl @@ -44,7 +44,7 @@ SwitchRouterPeerRef(lsp, None) :- * (Use LogicalSwitchPort instead, which guarantees uniqueness.) */ relation LogicalSwitchPortCandidate(lsp_uuid: uuid, ls_uuid: uuid) LogicalSwitchPortCandidate(lsp_uuid, ls_uuid) :- -nb::Logical_Switch(._uuid = ls_uuid, .ports = ports), +::Logical_Switch(._uuid = ls_uuid, .ports = ports), var lsp_uuid = FlatMap(ports). Warning[message] :- LogicalSwitchPortCandidate(lsp_uuid, ls_uuid), @@ -74,7 +74,7 @@ LogicalSwitchPortWithUnknownAddress(ls_uuid, lsp_uuid) :- output relation LogicalSwitchHasUnknownPorts(ls: uuid, has_unknown: bool) LogicalSwitchHasUnknownPorts(ls, true) :- LogicalSwitchPortWithUnknownAddress(ls, _). LogicalSwitchHasUnknownPorts(ls, false) :- -nb::Logical_Switch(._uuid = ls), +::Logical_Switch(._uuid = ls), not LogicalSwitchPortWithUnknownAddress(ls, _). 
/* PortStaticAddresses: static IP addresses associated with each Logical_Switch_Port */ @@ -101,11 +101,11 @@ PortInGroup(port, group) :- relation LogicalSwitchACL(ls: uuid, acl: uuid) LogicalSwitchACL(ls, acl) :- -nb::Logical_Switch(._uuid = ls, .acls = acls), +::Logical_Switch(._uuid = ls, .acls = acls), var acl = FlatMap(acls). LogicalSwitchACL(ls, acl) :- -nb::Logical_Switch(._uuid = ls, .ports = ports), +::Logical_Switch(._uuid = ls, .ports = ports), var port_id = FlatMap(ports), PortInGroup(port_id, group_id), nb::Port_Group(._uuid = group_id, .acls = acls), @@ -125,7 +125,7 @@ LogicalSwitchHasStatefulACL(ls, true) :- LogicalSwitchStatefulACL(ls, _). LogicalSwitchHasStatefulACL(ls, false) :- -nb::Logical_Switch(._uuid = ls), +::Logical_Switch(._uuid = ls), not LogicalSwitchStatefulACL(ls, _). // "Pitfalls of projections" in ddlog-new-feature.rst explains why this @@ -136,7 +136,7 @@ LogicalSwitchHasACLs(ls, true) :- LogicalSwitchACL(ls, _). LogicalSwitchHasACLs(ls, false) :- -nb::Logical_Switch(._uuid = ls), +::Logical_Switch(._uuid = ls), not LogicalSwitchACL(ls, _). /* @@ -146,7 +146,7 @@ LogicalSwitchHasACLs(ls, false) :- */ relation LogicalSwitchLocalnetPort0(ls_uuid: uuid, lsp: (uuid, string)) LogicalSwitchLocalnetPort0(ls_uuid, (lsp_uuid, lsp.name)) :- -ls in nb::Logical_Switch(._uuid = ls_uuid), +ls in ::Logical_Switch(._uuid = ls_uuid), var lsp_uuid = FlatMap(ls.ports), lsp in ::Logical_Switch_Port(._uuid = lsp_uuid), lsp.__type == "localnet". @@ -156,7 +156,7 @@ LogicalSwitchLocalnetPorts(ls_uuid, localnet_ports) :- LogicalSwitchLocalnetPort0(ls_uuid, lsp), var localnet_ports = lsp.group_by(ls_uuid).to_vec(). LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :- -ls in nb::Logical_Switch(), +ls in ::Logical_Switch(), var ls_uuid = ls._uuid, not LogicalSwitchLocalnetPort0(ls_uuid, _). 
@@ -164,7 +164,7 @@ LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :- relation LogicalSwitchDNS(ls_uuid: uuid, dns_uuid: uuid) LogicalSwitchDNS(ls._uuid, dns_uuid) :- -nb::Logical_Switch[ls], +::Logical_Switch[ls], var dns_uuid = FlatMap(ls.dns_records), nb::DNS(._uuid = dns_uuid). @@ -183,12 +183,12 @@ LogicalSwitchHasDNSRecords(ls, true) :- LogicalSwitchWithDNSRecords(ls). LogicalSwitchHasDNSRecords(ls, false) :- -nb::Logical_Switch(._uuid = ls), +::Logical_Switch(._uuid = ls), not LogicalSwitchWithDNSRecords(ls). relation LogicalSwitchHasNonRouterPort0(ls: uuid) LogicalSwitchHasNonRouterPort0(ls_uuid) :- -ls in nb::Logical_Switch(._uuid = ls_uuid), +ls in ::Logical_Switch(._uuid = ls_uuid), var lsp_uuid = FlatMap(ls.ports), lsp in ::Logical_Switch_Port(._uuid = lsp_uuid), lsp.__type != "router". @@ -199,7 +199,7 @@ output relation LogicalSwitchHasNonRouterPort(ls: uuid, has_non_router_port: boo LogicalS
[ovs-dev] [PATCH ovn repost 2/7] ovn-northd-ddlog: Use cheaper representation for stage_hint.
The stage_hint only shows 32 bits of the uuid, so it's cheaper to omit the rest for the internal representation. Also, this is just a hint, so we might as well use zero to mean None and save the cost of the Option wrapper. With the benchmark at https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this reduces memory consumption by 1.3 GB. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 579 ++- 1 file changed, 292 insertions(+), 287 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 844add024..2365372fb 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -1644,9 +1644,13 @@ relation Flow( actions: string, io_port: Option, controller_meter: Option, -stage_hint: Option +stage_hint: bit<32> ) +function stage_hint(_uuid: uuid): bit<32> { +_uuid[127:96] +} + /* If this option is 'true' northd will combine logical flows that differ by * logical datapath only by creating a datapath group. */ relation UseLogicalDatapathGroups[bool] @@ -1673,11 +1677,12 @@ function make_flow_tags(io_port: Option): Map { Some{s} -> [ "in_out_port" -> s ] } } -function make_flow_external_ids(stage_hint: Option, stage: Stage): Map { -match (stage_hint) { -None -> ["stage-name" -> stage.table_name], -Some{uuid} -> ["stage-name" -> stage.table_name, - "stage-hint" -> "${hex(uuid[127:96])}"] +function make_flow_external_ids(stage_hint: bit<32>, stage: Stage): Map { +if (stage_hint == 0) { +["stage-name" -> stage.table_name] +} else { +["stage-name" -> stage.table_name, + "stage-hint" -> "${hex(stage_hint)}"] } } AggregatedFlow(.logical_datapaths = g.to_set(), @@ -1744,7 +1749,7 @@ Flow(.logical_datapath = sw._uuid, .priority = 50, .__match = __match, .actions = actions, - .stage_hint = Some{fg_uuid}, + .stage_hint = stage_hint(fg_uuid), .io_port = None, .controller_meter = None) :- sw in (), @@ -1776,7 +1781,7 @@ Flow(.logical_datapath = sw._uuid, .priority = 50, .__match = __match, .actions = actions, - .stage_hint = None, + .stage_hint = 
0, .io_port = None, .controller_meter = None) :- sw in (), @@ -1799,7 +1804,7 @@ for (sw in ()) { .priority = 100, .__match = "vlan.present", .actions = "drop;", - .stage_hint = None /*TODO: check*/, + .stage_hint = 0 /*TODO: check*/, .io_port = None, .controller_meter = None) }; @@ -1810,7 +1815,7 @@ for (sw in ()) { .priority = 100, .__match = "eth.src[40]", .actions = "drop;", - .stage_hint = None /*TODO: check*/, + .stage_hint = 0 /*TODO: check*/, .io_port = None, .controller_meter = None) /* Port security flows have priority 50 (see below) and will continue to the next table @@ -1872,7 +1877,7 @@ for ((._uuid =ls_uuid)) { .priority = 0, .__match = "1", .actions = "next;", - .stage_hint = None, + .stage_hint = 0, .io_port = None, .controller_meter = None); Flow(.logical_datapath = ls_uuid, @@ -1880,7 +1885,7 @@ for ((._uuid =ls_uuid)) { .priority = 0, .__match = "1", .actions = "next;", - .stage_hint = None, + .stage_hint = 0, .io_port = None, .controller_meter = None); @@ -1889,7 +1894,7 @@ for ((._uuid =ls_uuid)) { .priority = 110, .__match = "eth.dst == $svc_monitor_mac", .actions = "next;", - .stage_hint = None, + .stage_hint = 0, .io_port = None, .controller_meter = None); Flow(.logical_datapath = ls_uuid, @@ -1897,7 +1902,7 @@ for ((._uuid =ls_uuid)) { .priority = 110, .__match = "eth.src == $svc_monitor_mac", .actions = "next;", - .stage_hint = None, + .stage_hint = 0, .io_port = None, .controller_meter = None) } @@ -1912,7 +1917,7 @@ for ((.sw = sw@{._uuid = ls_uuid}, .acl = , .has_fair_meter .priority = acl.priority + oVN_ACL_PRI_OFFSET(),
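The representation change in this patch is easy to model: keep only the top 32 bits of the 128-bit row UUID as the stage hint, and let 0 stand in for "no hint" so no Option wrapper is needed. The sketch below mirrors the DDlog stage_hint() function (`_uuid[127:96]`) and the `hex()` formatting in simplified Python; names are illustrative.

```python
# Model of the cheaper stage_hint representation above.

def stage_hint(uuid128):
    """DDlog's _uuid[127:96]: the most significant 32 bits of the UUID."""
    return (uuid128 >> 96) & 0xFFFFFFFF

def stage_hint_external_id(hint):
    """Render the hint as the 'stage-hint' external-id value.
    0 means 'no hint', so nothing is emitted for it."""
    return None if hint == 0 else format(hint, "x")

# A 128-bit UUID whose top 32 bits are 0xDEADBEEF.
u = 0xDEADBEEF_00000000_00000000_00000001
```

Since the hint is only ever shown as these 32 hex digits, dropping the low 96 bits loses nothing visible while shrinking every Flow fact.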
[ovs-dev] [PATCH ovn repost 5/7] ovn-northd-ddlog: Get rid of duplicate flows caused by stage_hint.
It was possible for these rules to generate multiple Flow records that differed only in their stage_hint. This caused a lot of duplication for the load balancer benchmark. With the benchmark at https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this reduces memory consumption from 53 GB to 17 GB and elapsed time from 14 minutes to 5 minutes. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index dd69126ff..38b97aa6b 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -3163,7 +3163,7 @@ Flow(.logical_datapath = sw._uuid, .actions = actions.intern(), .io_port = None, .controller_meter = meter, - .stage_hint = stage_hint(lb._uuid)) :- + .stage_hint = 0) :- sw in (), LBVIPWithStatus[lbvip@{.lb = lb}], sw.load_balancer.contains(lb._uuid), @@ -4587,7 +4587,7 @@ Flow(.logical_datapath = sw._uuid, .priority = 90, .__match = i"${fLAGBIT_NOT_VXLAN()} && arp.op == 1 && arp.tpa == ${ipv4}", .actions = i"outport = ${flood}; output;", - .stage_hint = stage_hint(sp.lsp._uuid), + .stage_hint = 0, .io_port = None, .controller_meter = None) :- sp in (.sw = sw, .peer = Some{rp}), @@ -,7 +,7 @@ for (RouterLBVIP( .priority = prio, .__match = __match.intern(), .actions = __actions, - .stage_hint = stage_hint(lb._uuid), + .stage_hint = 0, .io_port = None, .controller_meter = None); @@ -6712,7 +6712,7 @@ for (RouterLBVIP( .priority = prio, .__match = est_match.intern(), .actions = actions, - .stage_hint = stage_hint(lb._uuid), + .stage_hint = 0, .io_port = None, .controller_meter = None); @@ -6797,7 +6797,7 @@ Flow(.logical_datapath = r._uuid, .actions = actions.intern(), .io_port = None, .controller_meter = meter, - .stage_hint = stage_hint(lb._uuid)) :- + .stage_hint = 0) :- r in (), r.l3dgw_ports.len() > 0 or r.is_gateway, LBVIPWithStatus[lbvip@{.lb = lb}], -- 2.31.1 ___ dev mailing list d...@openvswitch.org 
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
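The deduplication this patch achieves can be shown in miniature: two flows that differ only in stage_hint are distinct facts, but once the hint is dropped (set to 0) before grouping, they collapse into one. The field layout below is illustrative, not the real Flow relation.

```python
# Model of the duplicate-flow collapse above: identical flows that
# differ only in the trailing stage_hint field.

def flow_key(dp, stage, priority, match, actions, stage_hint):
    return (dp, stage, priority, match, actions, stage_hint)

with_hints = {
    flow_key("dp1", "lr_in_dnat", 120, "ip4", "ct_lb;", 0x1111),
    flow_key("dp1", "lr_in_dnat", 120, "ip4", "ct_lb;", 0x2222),
}

# Zeroing the hint before grouping merges the duplicates.
without_hints = {k[:5] + (0,) for k in with_hints}
```

For the load-balancer benchmark, the same per-VIP flow was being generated once per load balancer UUID, so this collapse is what cuts memory from 53 GB to 17 GB.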
[ovs-dev] [PATCH ovn repost 3/7] ovn-northd-ddlog: Intern Stage.
With the benchmark at https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this reduces memory consumption from 72 GB to 66 GB and elapsed time by a few seconds. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 140 +-- 1 file changed, 70 insertions(+), 70 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 2365372fb..f1b7471f2 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -1434,77 +1434,77 @@ nb::Out_Logical_Router_Port(._uuid = _uuid, typedef Pipeline = Ingress | Egress -typedef Stage = Stage{ +typedef Stage = Stage { pipeline: Pipeline, -table_id: integer, +table_id: bit<8>, table_name : string } /* Logical switch ingress stages. */ -function s_SWITCH_IN_PORT_SEC_L2(): Stage { Stage{Ingress, 0, "ls_in_port_sec_l2"} } -function s_SWITCH_IN_PORT_SEC_IP(): Stage { Stage{Ingress, 1, "ls_in_port_sec_ip"} } -function s_SWITCH_IN_PORT_SEC_ND(): Stage { Stage{Ingress, 2, "ls_in_port_sec_nd"} } -function s_SWITCH_IN_LOOKUP_FDB(): Stage { Stage{Ingress, 3, "ls_in_lookup_fdb"} } -function s_SWITCH_IN_PUT_FDB(): Stage { Stage{Ingress, 4, "ls_in_put_fdb"} } -function s_SWITCH_IN_PRE_ACL(): Stage { Stage{Ingress, 5, "ls_in_pre_acl"} } -function s_SWITCH_IN_PRE_LB(): Stage { Stage{Ingress, 6, "ls_in_pre_lb"} } -function s_SWITCH_IN_PRE_STATEFUL():Stage { Stage{Ingress, 7, "ls_in_pre_stateful"} } -function s_SWITCH_IN_ACL_HINT():Stage { Stage{Ingress, 8, "ls_in_acl_hint"} } -function s_SWITCH_IN_ACL(): Stage { Stage{Ingress, 9, "ls_in_acl"} } -function s_SWITCH_IN_QOS_MARK():Stage { Stage{Ingress, 10, "ls_in_qos_mark"} } -function s_SWITCH_IN_QOS_METER(): Stage { Stage{Ingress, 11, "ls_in_qos_meter"} } -function s_SWITCH_IN_STATEFUL():Stage { Stage{Ingress, 12, "ls_in_stateful"} } -function s_SWITCH_IN_PRE_HAIRPIN(): Stage { Stage{Ingress, 13, "ls_in_pre_hairpin"} } -function s_SWITCH_IN_NAT_HAIRPIN(): Stage { Stage{Ingress, 14, "ls_in_nat_hairpin"} } -function s_SWITCH_IN_HAIRPIN(): Stage { Stage{Ingress, 15, 
"ls_in_hairpin"} } -function s_SWITCH_IN_ARP_ND_RSP(): Stage { Stage{Ingress, 16, "ls_in_arp_rsp"} } -function s_SWITCH_IN_DHCP_OPTIONS():Stage { Stage{Ingress, 17, "ls_in_dhcp_options"} } -function s_SWITCH_IN_DHCP_RESPONSE(): Stage { Stage{Ingress, 18, "ls_in_dhcp_response"} } -function s_SWITCH_IN_DNS_LOOKUP(): Stage { Stage{Ingress, 19, "ls_in_dns_lookup"} } -function s_SWITCH_IN_DNS_RESPONSE():Stage { Stage{Ingress, 20, "ls_in_dns_response"} } -function s_SWITCH_IN_EXTERNAL_PORT(): Stage { Stage{Ingress, 21, "ls_in_external_port"} } -function s_SWITCH_IN_L2_LKUP(): Stage { Stage{Ingress, 22, "ls_in_l2_lkup"} } -function s_SWITCH_IN_L2_UNKNOWN(): Stage { Stage{Ingress, 23, "ls_in_l2_unknown"} } +function s_SWITCH_IN_PORT_SEC_L2(): Intern { Stage{Ingress, 0, "ls_in_port_sec_l2"}.intern() } +function s_SWITCH_IN_PORT_SEC_IP(): Intern { Stage{Ingress, 1, "ls_in_port_sec_ip"}.intern() } +function s_SWITCH_IN_PORT_SEC_ND(): Intern { Stage{Ingress, 2, "ls_in_port_sec_nd"}.intern() } +function s_SWITCH_IN_LOOKUP_FDB(): Intern { Stage{Ingress, 3, "ls_in_lookup_fdb"}.intern() } +function s_SWITCH_IN_PUT_FDB(): Intern { Stage{Ingress, 4, "ls_in_put_fdb"}.intern() } +function s_SWITCH_IN_PRE_ACL(): Intern { Stage{Ingress, 5, "ls_in_pre_acl"}.intern() } +function s_SWITCH_IN_PRE_LB(): Intern { Stage{Ingress, 6, "ls_in_pre_lb"}.intern() } +function s_SWITCH_IN_PRE_STATEFUL():Intern { Stage{Ingress, 7, "ls_in_pre_stateful"}.intern() } +function s_SWITCH_IN_ACL_HINT():Intern { Stage{Ingress, 8, "ls_in_acl_hint"}.intern() } +function s_SWITCH_IN_ACL(): Intern { Stage{Ingress, 9, "ls_in_acl"}.intern() } +function s_SWITCH_IN_QOS_MARK():Intern { Stage{Ingress, 10, "ls_in_qos_mark"}.intern() } +function s_SWITCH_IN_QOS_METER(): Intern { Stage{Ingress, 11, "ls_in_qos_meter"}.intern() } +function s_SWITCH_IN_STATEFUL():Intern { Stage{Ingress, 12, "ls_in_stateful"}.intern() } +function s_SWITCH_IN_PRE_HAIRPIN(): Intern { Stage{Ingress, 13, "ls_in_pre_hairpin"}.intern() } +function 
s_SWITCH_IN_NAT_HAIRPIN(): Intern { Stage{Ingress, 14, "ls_in_nat_hairpin"}.intern() } +function s_SWITCH_IN_HAIRPIN(): Intern { Stage{Ingress, 15, "ls_in_hairpin"}.intern() } +function s_SWITCH_IN_ARP_ND_RSP()
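Interning (hash-consing), as applied to Stage in this patch, stores each distinct value once in a table and shares it by reference, so copies are pointer-sized and equality checks can compare identities. The snippet below is a simplified, illustrative model of what DDlog's `.intern()` and `Intern` wrapper provide, not the DDlog runtime itself.

```python
# Minimal sketch of an intern table, assuming immutable (hashable) values.

_intern_table = {}

def intern(value):
    """Return the canonical shared instance of an immutable value.
    The first caller's object becomes the canonical copy."""
    return _intern_table.setdefault(value, value)

# Two structurally equal Stage-like tuples intern to the same object.
a = intern(("Ingress", 9, "ls_in_acl"))
b = intern(("Ingress", 9, "ls_in_acl"))
```

After interning, every Flow fact that references a stage holds the same shared instance, which is why the patch shrinks the benchmark from 72 GB to 66 GB without changing any output.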
[ovs-dev] [PATCH ovn repost 0/7] Reduce memory consumption and time for Load_Balancer benchmark
Apologies for repost; I forgot to tag this as "ovn" in the first posting. For the benchmark at https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this reduces memory consumption from 115 GB to 17 GB and elapsed time from 19 minutes to 5 minutes. I think there's headroom for more improvement, because Leonid has some cleverness that I haven't been able to successfully work in yet. This series is also available here: https://github.com/blp/ovs-reviews/tree/ovn-memory-4 Ben Pfaff (7): ovn-northd-ddlog: Merge TaggedFlow and MeteredFlow into Flow. ovn-northd-ddlog: Use cheaper representation for stage_hint. ovn-northd-ddlog: Intern Stage. Intern all the matches and actions. ovn-northd-ddlog: Get rid of duplicate flows caused by stage_hint. ovn-northd-ddlog: Intern nb::Logical_Switch. ovn-northd-ddlog: Postpone expanding external_ids, stage_hint, tags. northd/ipam.dl |2 +- northd/lswitch.dl| 36 +- northd/multicast.dl |2 +- northd/ovn-nb.dlopts |1 + northd/ovn_northd.dl | 3459 +++--- 5 files changed, 1949 insertions(+), 1551 deletions(-) -- 2.31.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 1/1] include/openvswitch/compiler.h: check existence of __builtin_prefetch using __has_builtin
On Tue, Aug 10, 2021 at 05:17:56PM -0700, Sergey Madaminov wrote: > Checking if '__has_builtin' is defined and then defining OVS_PREFETCH to > be '__builtin_prefetch' if it is available. To preserve backwards > compatibility, the previous way to define OVS_PREFETCH macro is also > there. Thanks for working on OVS! > +#if defined __has_builtin > +# if __has_builtin (__builtin_prefetch) > +#define OVS_PREFETCH(addr) __builtin_prefetch((addr)) > +#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1) > +# endif > +#elif __GNUC__ > +# define OVS_PREFETCH(addr) __builtin_prefetch((addr)) > +# define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1) > #else > -#define OVS_PREFETCH(addr) > -#define OVS_PREFETCH_WRITE(addr) > +# define OVS_PREFETCH(addr) > +# define OVS_PREFETCH_WRITE(addr) > #endif The above has some redundancy: two of the forks of the definitions are identical. I think a common way to deal with this is something like: #ifndef __has_builtin #define __has_builtin(X) 0 #endif #if __has_builtin (__builtin_prefetch) || __GNUC__ /* definition 1 */ #else /* definition 2 */ #endif > /* Since Visual Studio 2015 there has been an effort to make offsetof a > @@ -244,11 +253,13 @@ > * the C compiler. > * e.g.: https://bit.ly/2UvWwti > */ > + /* > #if _MSC_VER >= 1900 > #undef offsetof > #define offsetof(type, member) \ > ((size_t)((char *)&(((type *)0)->member) - (char *)0)) > #endif > +*/ The above seems unrelated. If it's needed, I'd put it in a separate patch. I'd also remove the code rather than commenting it out. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 0/7] Reduce memory consumption and time for Load_Balancer benchmark
On Thu, Aug 12, 2021 at 08:53:51AM -0700, Ben Pfaff wrote: > For the benchmark at > https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, > this reduces memory consumption from 115 GB to 17 GB and elapsed time > from 19 minutes to 5 minutes. > > I think there's headroom for more improvement, because Leonid has some > cleverness that I haven't been able to successfully work in yet. I should add that this series is also available here: https://github.com/blp/ovs-reviews/tree/ovn-memory-4 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 7/7] ovn-northd-ddlog: Postpone expanding external_ids, stage_hint, tags.
From: Ben Pfaff This has little effect on performance in the current benchmark, but it seems like the cleanest approach to me. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 38 ++ 1 file changed, 22 insertions(+), 16 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index dadf33ab6..f0730d40b 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -1667,9 +1667,9 @@ relation AggregatedFlow ( priority: integer, __match: istring, actions: istring, -tags: Map, +io_port: Option, controller_meter: Option, -external_ids: Map +stage_hint:bit<32> ) function make_flow_tags(io_port: Option): Map { match (io_port) { @@ -1690,9 +1690,9 @@ AggregatedFlow(.logical_datapaths = g.to_set(), .priority = priority, .__match = __match, .actions = actions, - .tags = make_flow_tags(io_port), + .io_port = io_port, .controller_meter = controller_meter, - .external_ids = make_flow_external_ids(stage_hint, stage)) :- + .stage_hint = stage_hint) :- UseLogicalDatapathGroups[true], Flow(logical_datapath, stage, priority, __match, actions, io_port, controller_meter, stage_hint), var g = logical_datapath.group_by((stage, priority, __match, actions, io_port, controller_meter, stage_hint)). @@ -1701,43 +1701,49 @@ AggregatedFlow(.logical_datapaths = set_singleton(logical_datapath), .priority = priority, .__match = __match, .actions = actions, - .tags = make_flow_tags(io_port), + .io_port = io_port, .controller_meter = controller_meter, - .external_ids = make_flow_external_ids(stage_hint, stage)) :- + .stage_hint = stage_hint) :- UseLogicalDatapathGroups[false], Flow(logical_datapath, stage, priority, __match, actions, io_port, controller_meter, stage_hint). 
+function to_string(pipeline: Pipeline): string { +if (pipeline == Ingress) { +"ingress" +} else { +"egress" +} +} + for (f in AggregatedFlow()) { -var pipeline = if (f.stage.pipeline == Ingress) "ingress" else "egress" in -var external_ids = f.external_ids.insert_imm("stage-name", f.stage.table_name) in if (f.logical_datapaths.size() == 1) { Some{var dp} = f.logical_datapaths.nth(0) in sb::Out_Logical_Flow( -._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)), +._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.io_port, f.stage_hint)), .logical_datapath = Some{dp}, .logical_dp_group = None, -.pipeline = pipeline, +.pipeline = f.stage.pipeline.to_string(), .table_id = f.stage.table_id as integer, .priority = f.priority, .controller_meter = f.controller_meter, .__match = f.__match.ival(), .actions = f.actions.ival(), -.tags = f.tags, -.external_ids = external_ids) +.tags = make_flow_tags(f.io_port), +.external_ids = make_flow_external_ids(f.stage_hint, f.stage)) } else { var group_uuid = hash128(f.logical_datapaths) in { sb::Out_Logical_Flow( -._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)), +._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.io_port, f.stage_hint)), .logical_datapath = None, .logical_dp_group = Some{group_uuid}, -.pipeline = pipeline, +.pipeline = f.stage.pipeline.to_string(), .table_id = f.stage.table_id as integer, .priority = f.priority, .controller_meter = f.controller_meter, .__match = f.__match.ival(), .actions = f.actions.ival(), -.tags = f.tags, -.external_ids = external_ids); +.tags = make_flow_tags(f.io_port), +.external_ids = make_flow_external_ids(f.stage_hint, f.stage)); sb::Out_Logical_DP_Group(._uuid = group_uuid, .datapaths = f.logical_datapaths) } } -- 2.31.1 ___ dev mailing list d...@openvswitch.org 
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH 6/7] ovn-northd-ddlog: Intern nb::Logical_Switch.
From: Ben Pfaff With the benchmark at https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this reduces memory consumption by 300 MB and elapsed time by a few seconds. Signed-off-by: Ben Pfaff --- northd/ipam.dl | 2 +- northd/lswitch.dl| 36 ++-- northd/multicast.dl | 2 +- northd/ovn-nb.dlopts | 1 + northd/ovn_northd.dl | 16 5 files changed, 29 insertions(+), 28 deletions(-) diff --git a/northd/ipam.dl b/northd/ipam.dl index da71b2872..4665a28cb 100644 --- a/northd/ipam.dl +++ b/northd/ipam.dl @@ -187,7 +187,7 @@ SwitchIPv4ReservedAddresses(lswitch, addrs) :- var addrs = addr.group_by(lswitch).to_set(). SwitchIPv4ReservedAddresses(lswitch_uuid, set_empty()) :- -nb::Logical_Switch(._uuid = lswitch_uuid), +::Logical_Switch(._uuid = lswitch_uuid), not SwitchIPv4ReservedAddress(lswitch_uuid, _). /* Allocate dynamic IP addresses for ports that require them: diff --git a/northd/lswitch.dl b/northd/lswitch.dl index 7e7b62a4d..868ae115f 100644 --- a/northd/lswitch.dl +++ b/northd/lswitch.dl @@ -44,7 +44,7 @@ SwitchRouterPeerRef(lsp, None) :- * (Use LogicalSwitchPort instead, which guarantees uniqueness.) */ relation LogicalSwitchPortCandidate(lsp_uuid: uuid, ls_uuid: uuid) LogicalSwitchPortCandidate(lsp_uuid, ls_uuid) :- -nb::Logical_Switch(._uuid = ls_uuid, .ports = ports), +::Logical_Switch(._uuid = ls_uuid, .ports = ports), var lsp_uuid = FlatMap(ports). Warning[message] :- LogicalSwitchPortCandidate(lsp_uuid, ls_uuid), @@ -74,7 +74,7 @@ LogicalSwitchPortWithUnknownAddress(ls_uuid, lsp_uuid) :- output relation LogicalSwitchHasUnknownPorts(ls: uuid, has_unknown: bool) LogicalSwitchHasUnknownPorts(ls, true) :- LogicalSwitchPortWithUnknownAddress(ls, _). LogicalSwitchHasUnknownPorts(ls, false) :- -nb::Logical_Switch(._uuid = ls), +::Logical_Switch(._uuid = ls), not LogicalSwitchPortWithUnknownAddress(ls, _). 
/* PortStaticAddresses: static IP addresses associated with each Logical_Switch_Port */ @@ -101,11 +101,11 @@ PortInGroup(port, group) :- relation LogicalSwitchACL(ls: uuid, acl: uuid) LogicalSwitchACL(ls, acl) :- -nb::Logical_Switch(._uuid = ls, .acls = acls), +::Logical_Switch(._uuid = ls, .acls = acls), var acl = FlatMap(acls). LogicalSwitchACL(ls, acl) :- -nb::Logical_Switch(._uuid = ls, .ports = ports), +::Logical_Switch(._uuid = ls, .ports = ports), var port_id = FlatMap(ports), PortInGroup(port_id, group_id), nb::Port_Group(._uuid = group_id, .acls = acls), @@ -125,7 +125,7 @@ LogicalSwitchHasStatefulACL(ls, true) :- LogicalSwitchStatefulACL(ls, _). LogicalSwitchHasStatefulACL(ls, false) :- -nb::Logical_Switch(._uuid = ls), +::Logical_Switch(._uuid = ls), not LogicalSwitchStatefulACL(ls, _). // "Pitfalls of projections" in ddlog-new-feature.rst explains why this @@ -136,7 +136,7 @@ LogicalSwitchHasACLs(ls, true) :- LogicalSwitchACL(ls, _). LogicalSwitchHasACLs(ls, false) :- -nb::Logical_Switch(._uuid = ls), +::Logical_Switch(._uuid = ls), not LogicalSwitchACL(ls, _). /* @@ -146,7 +146,7 @@ LogicalSwitchHasACLs(ls, false) :- */ relation LogicalSwitchLocalnetPort0(ls_uuid: uuid, lsp: (uuid, string)) LogicalSwitchLocalnetPort0(ls_uuid, (lsp_uuid, lsp.name)) :- -ls in nb::Logical_Switch(._uuid = ls_uuid), +ls in ::Logical_Switch(._uuid = ls_uuid), var lsp_uuid = FlatMap(ls.ports), lsp in ::Logical_Switch_Port(._uuid = lsp_uuid), lsp.__type == "localnet". @@ -156,7 +156,7 @@ LogicalSwitchLocalnetPorts(ls_uuid, localnet_ports) :- LogicalSwitchLocalnetPort0(ls_uuid, lsp), var localnet_ports = lsp.group_by(ls_uuid).to_vec(). LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :- -ls in nb::Logical_Switch(), +ls in ::Logical_Switch(), var ls_uuid = ls._uuid, not LogicalSwitchLocalnetPort0(ls_uuid, _). 
@@ -164,7 +164,7 @@ LogicalSwitchLocalnetPorts(ls_uuid, vec_empty()) :- relation LogicalSwitchDNS(ls_uuid: uuid, dns_uuid: uuid) LogicalSwitchDNS(ls._uuid, dns_uuid) :- -nb::Logical_Switch[ls], +::Logical_Switch[ls], var dns_uuid = FlatMap(ls.dns_records), nb::DNS(._uuid = dns_uuid). @@ -183,12 +183,12 @@ LogicalSwitchHasDNSRecords(ls, true) :- LogicalSwitchWithDNSRecords(ls). LogicalSwitchHasDNSRecords(ls, false) :- -nb::Logical_Switch(._uuid = ls), +::Logical_Switch(._uuid = ls), not LogicalSwitchWithDNSRecords(ls). relation LogicalSwitchHasNonRouterPort0(ls: uuid) LogicalSwitchHasNonRouterPort0(ls_uuid) :- -ls in nb::Logical_Switch(._uuid = ls_uuid), +ls in ::Logical_Switch(._uuid = ls_uuid), var lsp_uuid = FlatMap(ls.ports), lsp in ::Logical_Switch_Port(._uuid = lsp_uuid), lsp.__type != "router". @@ -199,7 +199,7 @@ output relation LogicalSwitchHasNonRouterPort(ls: uuid, has_non_router_port: boo LogicalS
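The patch above replaces direct references to `nb::Logical_Switch` rows with interned ones. Interning (hash-consing) stores one canonical copy of each distinct value and shares references to it, which is why it saves memory when a large row is referenced from many rules. A rough Python sketch of the idea (not ddlog's actual `Intern<T>`/`.intern()` machinery):

```python
# Hash-consing ("interning"): keep one canonical copy of each distinct
# value and hand out shared references to it.
_intern_table = {}

def intern(value):
    """Return the canonical shared instance equal to `value`."""
    return _intern_table.setdefault(value, value)

# Two structurally equal values collapse to one object, so a large row
# referenced from many places is stored only once.
s1 = intern(("ls1", ("lsp1", "lsp2")))
s2 = intern(("ls1", ("lsp1", "lsp2")))
```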
[ovs-dev] [PATCH 3/7] ovn-northd-ddlog: Intern Stage.
From: Ben Pfaff With the benchmark at https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this reduces memory consumption from 72 GB to 66 GB and elapsed time by a few seconds. Signed-off-by: Ben Pfaff Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 140 +-- 1 file changed, 70 insertions(+), 70 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 2365372fb..f1b7471f2 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -1434,77 +1434,77 @@ nb::Out_Logical_Router_Port(._uuid = _uuid, typedef Pipeline = Ingress | Egress -typedef Stage = Stage{ +typedef Stage = Stage { pipeline: Pipeline, -table_id: integer, +table_id: bit<8>, table_name : string } /* Logical switch ingress stages. */ -function s_SWITCH_IN_PORT_SEC_L2(): Stage { Stage{Ingress, 0, "ls_in_port_sec_l2"} } -function s_SWITCH_IN_PORT_SEC_IP(): Stage { Stage{Ingress, 1, "ls_in_port_sec_ip"} } -function s_SWITCH_IN_PORT_SEC_ND(): Stage { Stage{Ingress, 2, "ls_in_port_sec_nd"} } -function s_SWITCH_IN_LOOKUP_FDB(): Stage { Stage{Ingress, 3, "ls_in_lookup_fdb"} } -function s_SWITCH_IN_PUT_FDB(): Stage { Stage{Ingress, 4, "ls_in_put_fdb"} } -function s_SWITCH_IN_PRE_ACL(): Stage { Stage{Ingress, 5, "ls_in_pre_acl"} } -function s_SWITCH_IN_PRE_LB(): Stage { Stage{Ingress, 6, "ls_in_pre_lb"} } -function s_SWITCH_IN_PRE_STATEFUL():Stage { Stage{Ingress, 7, "ls_in_pre_stateful"} } -function s_SWITCH_IN_ACL_HINT():Stage { Stage{Ingress, 8, "ls_in_acl_hint"} } -function s_SWITCH_IN_ACL(): Stage { Stage{Ingress, 9, "ls_in_acl"} } -function s_SWITCH_IN_QOS_MARK():Stage { Stage{Ingress, 10, "ls_in_qos_mark"} } -function s_SWITCH_IN_QOS_METER(): Stage { Stage{Ingress, 11, "ls_in_qos_meter"} } -function s_SWITCH_IN_STATEFUL():Stage { Stage{Ingress, 12, "ls_in_stateful"} } -function s_SWITCH_IN_PRE_HAIRPIN(): Stage { Stage{Ingress, 13, "ls_in_pre_hairpin"} } -function s_SWITCH_IN_NAT_HAIRPIN(): Stage { Stage{Ingress, 14, "ls_in_nat_hairpin"} } -function 
s_SWITCH_IN_HAIRPIN(): Stage { Stage{Ingress, 15, "ls_in_hairpin"} } -function s_SWITCH_IN_ARP_ND_RSP(): Stage { Stage{Ingress, 16, "ls_in_arp_rsp"} } -function s_SWITCH_IN_DHCP_OPTIONS():Stage { Stage{Ingress, 17, "ls_in_dhcp_options"} } -function s_SWITCH_IN_DHCP_RESPONSE(): Stage { Stage{Ingress, 18, "ls_in_dhcp_response"} } -function s_SWITCH_IN_DNS_LOOKUP(): Stage { Stage{Ingress, 19, "ls_in_dns_lookup"} } -function s_SWITCH_IN_DNS_RESPONSE():Stage { Stage{Ingress, 20, "ls_in_dns_response"} } -function s_SWITCH_IN_EXTERNAL_PORT(): Stage { Stage{Ingress, 21, "ls_in_external_port"} } -function s_SWITCH_IN_L2_LKUP(): Stage { Stage{Ingress, 22, "ls_in_l2_lkup"} } -function s_SWITCH_IN_L2_UNKNOWN(): Stage { Stage{Ingress, 23, "ls_in_l2_unknown"} } +function s_SWITCH_IN_PORT_SEC_L2(): Intern { Stage{Ingress, 0, "ls_in_port_sec_l2"}.intern() } +function s_SWITCH_IN_PORT_SEC_IP(): Intern { Stage{Ingress, 1, "ls_in_port_sec_ip"}.intern() } +function s_SWITCH_IN_PORT_SEC_ND(): Intern { Stage{Ingress, 2, "ls_in_port_sec_nd"}.intern() } +function s_SWITCH_IN_LOOKUP_FDB(): Intern { Stage{Ingress, 3, "ls_in_lookup_fdb"}.intern() } +function s_SWITCH_IN_PUT_FDB(): Intern { Stage{Ingress, 4, "ls_in_put_fdb"}.intern() } +function s_SWITCH_IN_PRE_ACL(): Intern { Stage{Ingress, 5, "ls_in_pre_acl"}.intern() } +function s_SWITCH_IN_PRE_LB(): Intern { Stage{Ingress, 6, "ls_in_pre_lb"}.intern() } +function s_SWITCH_IN_PRE_STATEFUL():Intern { Stage{Ingress, 7, "ls_in_pre_stateful"}.intern() } +function s_SWITCH_IN_ACL_HINT():Intern { Stage{Ingress, 8, "ls_in_acl_hint"}.intern() } +function s_SWITCH_IN_ACL(): Intern { Stage{Ingress, 9, "ls_in_acl"}.intern() } +function s_SWITCH_IN_QOS_MARK():Intern { Stage{Ingress, 10, "ls_in_qos_mark"}.intern() } +function s_SWITCH_IN_QOS_METER(): Intern { Stage{Ingress, 11, "ls_in_qos_meter"}.intern() } +function s_SWITCH_IN_STATEFUL():Intern { Stage{Ingress, 12, "ls_in_stateful"}.intern() } +function s_SWITCH_IN_PRE_HAIRPIN(): Intern { 
Stage{Ingress, 13, "ls_in_pre_hairpin"}.intern() } +function s_SWITCH_IN_NAT_HAIRPIN(): Intern { Stage{Ingress, 14, "ls_in_nat_hairpin"}.intern() } +function s_SWITCH_IN_HAIRPIN(): Intern { Stage{Ingress, 15, "ls_in_hairpin"
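Each `s_SWITCH_IN_*` function above now returns an `Intern<Stage>`, so every rule that mentions a stage shares one interned value instead of constructing a fresh record on each call. A memoized constructor gives a similar effect in this hypothetical Python sketch:

```python
# Sketch of interned stage constants: memoizing the constructor means
# every call site shares a single Stage object, mirroring the change
# from Stage{...} to Stage{...}.intern() in the patch.
from collections import namedtuple
from functools import lru_cache

Stage = namedtuple("Stage", ["pipeline", "table_id", "table_name"])

@lru_cache(maxsize=None)
def s_SWITCH_IN_ACL():
    return Stage("Ingress", 9, "ls_in_acl")
```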
[ovs-dev] [PATCH 5/7] ovn-northd-ddlog: Get rid of duplicate flows caused by stage_hint.
From: Ben Pfaff It was possible for these rules to generate multiple Flow records that differed only in their stage_hint. This caused a lot of duplication for the load balancer benchmark. With the benchmark at https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this reduces memory consumption from 53 GB to 17 GB and elapsed time from 14 minutes to 5 minutes. Signed-off-by: Ben Pfaff Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index dd69126ff..38b97aa6b 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -3163,7 +3163,7 @@ Flow(.logical_datapath = sw._uuid, .actions = actions.intern(), .io_port = None, .controller_meter = meter, - .stage_hint = stage_hint(lb._uuid)) :- + .stage_hint = 0) :- sw in (), LBVIPWithStatus[lbvip@{.lb = lb}], sw.load_balancer.contains(lb._uuid), @@ -4587,7 +4587,7 @@ Flow(.logical_datapath = sw._uuid, .priority = 90, .__match = i"${fLAGBIT_NOT_VXLAN()} && arp.op == 1 && arp.tpa == ${ipv4}", .actions = i"outport = ${flood}; output;", - .stage_hint = stage_hint(sp.lsp._uuid), + .stage_hint = 0, .io_port = None, .controller_meter = None) :- sp in (.sw = sw, .peer = Some{rp}), @@ -,7 +,7 @@ for (RouterLBVIP( .priority = prio, .__match = __match.intern(), .actions = __actions, - .stage_hint = stage_hint(lb._uuid), + .stage_hint = 0, .io_port = None, .controller_meter = None); @@ -6712,7 +6712,7 @@ for (RouterLBVIP( .priority = prio, .__match = est_match.intern(), .actions = actions, - .stage_hint = stage_hint(lb._uuid), + .stage_hint = 0, .io_port = None, .controller_meter = None); @@ -6797,7 +6797,7 @@ Flow(.logical_datapath = r._uuid, .actions = actions.intern(), .io_port = None, .controller_meter = meter, - .stage_hint = stage_hint(lb._uuid)) :- + .stage_hint = 0) :- r in (), r.l3dgw_ports.len() > 0 or r.is_gateway, LBVIPWithStatus[lbvip@{.lb = lb}], -- 2.31.1 ___ dev mailing list 
d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
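The reason zeroing `stage_hint` helps is that the hint participates in the `Flow` record's identity: flows identical in every meaningful way, but produced from different load-balancer rows, stayed distinct. A small, hypothetical Python sketch of the effect:

```python
# Sketch: if a varying hint is part of a flow's identity, otherwise
# identical flows do not collapse; dropping the hint deduplicates them.
def dedup(flows, ignore_hint):
    """Collapse flows whose identity key matches."""
    seen = {}
    for f in flows:
        if ignore_hint:
            key = (f["match"], f["actions"])
        else:
            key = (f["match"], f["actions"], f["stage_hint"])
        seen.setdefault(key, f)
    return list(seen.values())

# Two flows that differ only in the hint they inherited:
flows = [
    {"match": "ip4", "actions": "next;", "stage_hint": 0x1111},
    {"match": "ip4", "actions": "next;", "stage_hint": 0x2222},
]
```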
[ovs-dev] [PATCH 2/7] ovn-northd-ddlog: Use cheaper representation for stage_hint.
From: Ben Pfaff The stage_hint only shows 32 bits of the uuid, so it's cheaper to omit the rest for the internal representation. Also, this is just a hint, so we might as well use zero to mean None and save the cost of the Option wrapper. With the benchmark at https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this reduces memory consumption by 1.3 GB. Signed-off-by: Ben Pfaff --- northd/ovn_northd.dl | 579 ++- 1 file changed, 292 insertions(+), 287 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index 844add024..2365372fb 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -1644,9 +1644,13 @@ relation Flow( actions: string, io_port: Option, controller_meter: Option, -stage_hint: Option +stage_hint: bit<32> ) +function stage_hint(_uuid: uuid): bit<32> { +_uuid[127:96] +} + /* If this option is 'true' northd will combine logical flows that differ by * logical datapath only by creating a datapath group. */ relation UseLogicalDatapathGroups[bool] @@ -1673,11 +1677,12 @@ function make_flow_tags(io_port: Option): Map { Some{s} -> [ "in_out_port" -> s ] } } -function make_flow_external_ids(stage_hint: Option, stage: Stage): Map { -match (stage_hint) { -None -> ["stage-name" -> stage.table_name], -Some{uuid} -> ["stage-name" -> stage.table_name, - "stage-hint" -> "${hex(uuid[127:96])}"] +function make_flow_external_ids(stage_hint: bit<32>, stage: Stage): Map { +if (stage_hint == 0) { +["stage-name" -> stage.table_name] +} else { +["stage-name" -> stage.table_name, + "stage-hint" -> "${hex(stage_hint)}"] } } AggregatedFlow(.logical_datapaths = g.to_set(), @@ -1744,7 +1749,7 @@ Flow(.logical_datapath = sw._uuid, .priority = 50, .__match = __match, .actions = actions, - .stage_hint = Some{fg_uuid}, + .stage_hint = stage_hint(fg_uuid), .io_port = None, .controller_meter = None) :- sw in (), @@ -1776,7 +1781,7 @@ Flow(.logical_datapath = sw._uuid, .priority = 50, .__match = __match, .actions = actions, - .stage_hint = None, 
+ .stage_hint = 0, .io_port = None, .controller_meter = None) :- sw in (), @@ -1799,7 +1804,7 @@ for (sw in ()) { .priority = 100, .__match = "vlan.present", .actions = "drop;", - .stage_hint = None /*TODO: check*/, + .stage_hint = 0 /*TODO: check*/, .io_port = None, .controller_meter = None) }; @@ -1810,7 +1815,7 @@ for (sw in ()) { .priority = 100, .__match = "eth.src[40]", .actions = "drop;", - .stage_hint = None /*TODO: check*/, + .stage_hint = 0 /*TODO: check*/, .io_port = None, .controller_meter = None) /* Port security flows have priority 50 (see below) and will continue to the next table @@ -1872,7 +1877,7 @@ for ((._uuid =ls_uuid)) { .priority = 0, .__match = "1", .actions = "next;", - .stage_hint = None, + .stage_hint = 0, .io_port = None, .controller_meter = None); Flow(.logical_datapath = ls_uuid, @@ -1880,7 +1885,7 @@ for ((._uuid =ls_uuid)) { .priority = 0, .__match = "1", .actions = "next;", - .stage_hint = None, + .stage_hint = 0, .io_port = None, .controller_meter = None); @@ -1889,7 +1894,7 @@ for ((._uuid =ls_uuid)) { .priority = 110, .__match = "eth.dst == $svc_monitor_mac", .actions = "next;", - .stage_hint = None, + .stage_hint = 0, .io_port = None, .controller_meter = None); Flow(.logical_datapath = ls_uuid, @@ -1897,7 +1902,7 @@ for ((._uuid =ls_uuid)) { .priority = 110, .__match = "eth.src == $svc_monitor_mac", .actions = "next;", - .stage_hint = None, + .stage_hint = 0, .io_port = None, .controller_meter = None) } @@ -1912,7 +1917,7 @@ for ((.sw = sw@{._uuid = ls_uuid}, .acl = , .has_fair_meter .priority = acl.pr
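The two ideas in this patch, keeping only `_uuid[127:96]` and letting zero mean "no hint", can be sketched in Python as follows (illustrative helpers mirroring the `stage_hint()` and `make_flow_external_ids()` functions above):

```python
# Sketch of the cheaper stage_hint representation: store only the top
# 32 bits of the 128-bit uuid, with 0 standing in for "no hint".
def stage_hint(uuid128):
    """Top 32 bits of a 128-bit uuid, matching _uuid[127:96] above."""
    return (uuid128 >> 96) & 0xFFFFFFFF

def hint_external_id(hint):
    """0 means 'no hint'; otherwise render it as hex for external_ids."""
    return None if hint == 0 else format(hint, "x")
```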
[ovs-dev] [PATCH 0/7] Reduce memory consumption and time for Load_Balancer benchmark
For the benchmark at
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385333.html, this
reduces memory consumption from 115 GB to 17 GB and elapsed time from 19
minutes to 5 minutes.  I think there's headroom for more improvement, because
Leonid has some cleverness that I haven't been able to successfully work in
yet.

Ben Pfaff (7):
  ovn-northd-ddlog: Merge TaggedFlow and MeteredFlow into Flow.
  ovn-northd-ddlog: Use cheaper representation for stage_hint.
  ovn-northd-ddlog: Intern Stage.
  ovn-northd-ddlog: Intern all the matches and actions.
  ovn-northd-ddlog: Get rid of duplicate flows caused by stage_hint.
  ovn-northd-ddlog: Intern nb::Logical_Switch.
  ovn-northd-ddlog: Postpone expanding external_ids, stage_hint, tags.

 northd/ipam.dl       |    2 +-
 northd/lswitch.dl    |   36 +-
 northd/multicast.dl  |    2 +-
 northd/ovn-nb.dlopts |    1 +
 northd/ovn_northd.dl | 3459 +++---
 5 files changed, 1949 insertions(+), 1551 deletions(-)

-- 
2.31.1
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] ideas for improving the OVS review process
On Fri, Jul 23, 2021 at 08:26:51PM +0800, Tonghao Zhang wrote:
> On Wed, Jul 21, 2021 at 2:41 AM Ben Pfaff wrote:
> >
> > The OVS review process has greatly slowed over the last few years.  This
> > is partly because I haven't been able to spend as much time on review,
> > since I was once the most productive reviewer.  Ilya has been able to
> Thanks for your work!
> > step up the amount of review he does, but that still isn't enough to
> > keep up with the workload.
> >
> > We need to come up with some way to improve things.  Here are a few
> > ideas, mostly from a call earlier today (that was mainly about the OVS
> > conference).  I hope they will prompt a discussion.
> >
> > * Since patches are coming in, we have people who are knowledgeable about
> >   the code.  Those people should be pitching in with reviews as well.
> >   It doesn't seem like they or their managers have the right incentives
> >   to do that.  Maybe there is some way to improve the incentives.
> Do we need more maintainers?  Even though there are many maintainers, only
> a few people are active.  Am I wrong?

That reminds me of another point brought up in discussion in the call.
We have some people who are active and valuable contributors, but who
have not been officially appointed as maintainers.  We should nominate
and confirm them, because this also removes a bottleneck from the
process for getting patches in.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.
On Thu, Jul 22, 2021 at 12:14:23PM -0700, Ben Pfaff wrote: > On Wed, Jul 21, 2021 at 01:22:19PM +0200, Ilya Maximets wrote: > > On 7/21/21 12:52 AM, Ben Pfaff wrote: > > > On Tue, Jul 20, 2021 at 08:29:14PM +0200, Ilya Maximets wrote: > > >> On 7/7/21 8:51 PM, Ben Pfaff wrote: > > >>> +/* From an OpenFlow point of view, goto_table and > > >>> write_metadata are > > >>> + * instructions, not actions. This means that to use them, > > >>> we'd have > > >>> + * to reformulate the actions as instructions, which is > > >>> possible, and > > >>> + * we'd have slot them into the frozen actions in a specific > > >>> order, > > >>> + * which doesn't seem practical. Instead, we translate these > > >>> + * instructions into equivalent actions. */ > [...] > > >> I have very little knowledge in this area, so this might be not issue. > > >> However, 'OpenFlow "instructions", which were introduced in OpenFlow 1.1' > > >> doesn't match with OFPACT_SET_FIELD that is marked as OpenFlow 1.2+ > > >> action. I see that all tests for continuation have default bridge > > >> configuration. And monitors are OF10 and OF13 in different cases, but > > >> I'm not sure if these ovs-ofctl monitors are checking that received > > >> actions for resume matches the OF version they are using. And for > > >> the bridge itself, it supports all versions of OF, so it can interpret > > >> actions. But the question is: will that work if the bridge only > > >> supports OpenFlow 1.1 ? It should fail to parse OFPACT_SET_FIELD, or > > >> am I missing something here? > > > > > > Thanks a lot for the review! > > > > > > OVS can serialize most actions and instructions to most OpenFlow > > > versions, even if they are not natively supported in that particular > > > OpenFlow version. For example, in the case of OFPACT_SET_FIELD (a > > > rather important action), the code in ofp-actions.c has complete > > > fallbacks: > [...] 
> > > > What is lacking is a fallback for OF1.1+, which does have instructions, > > > in the case where we're encoding something that can't use the > > > instructions but must use actions instead. That's the case here. It's > > > a weird case and I don't know about it elsewhere. > > > > OK. Thanks for the explanation! That make sense. For some reason I > > mixed up ofpacts with OpenFlow actions in my head and so I missed the > > translation layer that handles conversion to the appropriate OpenFlow > > version. > > > > TBH, it's really hard to track the call chain that connects the > > freeze_unroll_actions() and encode_SET_FIELD(). I still didn't figure > > that out by reading the code. I guess, the only option for me is to > > attach debugger, break the execution and look at the stack, unless you > > can draw a call tree for me. > > It's a little indirect. > > If we're in freeze_unroll_actions(), it's because we're freezing. If > we're freezing, we'll unwind the call stack a bit and end up in > finish_freezing__(). That function will stick the actions in a struct > frozen_state, and then attach it to a newly allocated recirc_id via > recirc_alloc_id_ctx(). That recirc_id gets stuck in an action that gets > appended to the flow we're currently translating. In the case I'm > looking at here, that's an action to send the packet to userspace. > > Later on, the datapath sends a packet sent to userspace. > process_upcall() in userspace hits the CONTROLLER_UPCALL case, which > grabs the recirc_id out of the upcall cookie, gets the frozen state > using that, gets the ofpacts out of it and stuffs them into the private > part of the packet-in message it constructs, and then sends an async msg > (a packet-in with private data attached) to the controller. Eventually > that async message gets serialized using > ofputil_encode_packet_in_private(), which calls > ofputil_put_packet_in_private(), which encodes the actions via calls to > put_actions_property(). 
> > I wish it was more direct. The real problem here is the fixed datapath > interface. We can't change that unless and until we drop support for > the Linux kernel module and its frozen ABI. > > > In the end, it's hard for me to review the logic behind this change, > > but the patch seems to do what described in the commit message and > > the implementation looks correct, tests work fine. With that in mind: > > > > Acked-by: Ilya Maximets > > Thanks a lot! I'll recheck this and push it soon. I applied this to master. Grayson, do you want this backported? What version? ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.
On Wed, Jul 21, 2021 at 01:22:19PM +0200, Ilya Maximets wrote: > On 7/21/21 12:52 AM, Ben Pfaff wrote: > > On Tue, Jul 20, 2021 at 08:29:14PM +0200, Ilya Maximets wrote: > >> On 7/7/21 8:51 PM, Ben Pfaff wrote: > >>> +/* From an OpenFlow point of view, goto_table and write_metadata > >>> are > >>> + * instructions, not actions. This means that to use them, we'd > >>> have > >>> + * to reformulate the actions as instructions, which is > >>> possible, and > >>> + * we'd have slot them into the frozen actions in a specific > >>> order, > >>> + * which doesn't seem practical. Instead, we translate these > >>> + * instructions into equivalent actions. */ [...] > >> I have very little knowledge in this area, so this might be not issue. > >> However, 'OpenFlow "instructions", which were introduced in OpenFlow 1.1' > >> doesn't match with OFPACT_SET_FIELD that is marked as OpenFlow 1.2+ > >> action. I see that all tests for continuation have default bridge > >> configuration. And monitors are OF10 and OF13 in different cases, but > >> I'm not sure if these ovs-ofctl monitors are checking that received > >> actions for resume matches the OF version they are using. And for > >> the bridge itself, it supports all versions of OF, so it can interpret > >> actions. But the question is: will that work if the bridge only > >> supports OpenFlow 1.1 ? It should fail to parse OFPACT_SET_FIELD, or > >> am I missing something here? > > > > Thanks a lot for the review! > > > > OVS can serialize most actions and instructions to most OpenFlow > > versions, even if they are not natively supported in that particular > > OpenFlow version. For example, in the case of OFPACT_SET_FIELD (a > > rather important action), the code in ofp-actions.c has complete > > fallbacks: [...] > > What is lacking is a fallback for OF1.1+, which does have instructions, > > in the case where we're encoding something that can't use the > > instructions but must use actions instead. That's the case here. 
It's > > a weird case and I don't know about it elsewhere. > > OK. Thanks for the explanation! That make sense. For some reason I > mixed up ofpacts with OpenFlow actions in my head and so I missed the > translation layer that handles conversion to the appropriate OpenFlow > version. > > TBH, it's really hard to track the call chain that connects the > freeze_unroll_actions() and encode_SET_FIELD(). I still didn't figure > that out by reading the code. I guess, the only option for me is to > attach debugger, break the execution and look at the stack, unless you > can draw a call tree for me. It's a little indirect. If we're in freeze_unroll_actions(), it's because we're freezing. If we're freezing, we'll unwind the call stack a bit and end up in finish_freezing__(). That function will stick the actions in a struct frozen_state, and then attach it to a newly allocated recirc_id via recirc_alloc_id_ctx(). That recirc_id gets stuck in an action that gets appended to the flow we're currently translating. In the case I'm looking at here, that's an action to send the packet to userspace. Later on, the datapath sends a packet sent to userspace. process_upcall() in userspace hits the CONTROLLER_UPCALL case, which grabs the recirc_id out of the upcall cookie, gets the frozen state using that, gets the ofpacts out of it and stuffs them into the private part of the packet-in message it constructs, and then sends an async msg (a packet-in with private data attached) to the controller. Eventually that async message gets serialized using ofputil_encode_packet_in_private(), which calls ofputil_put_packet_in_private(), which encodes the actions via calls to put_actions_property(). I wish it was more direct. The real problem here is the fixed datapath interface. We can't change that unless and until we drop support for the Linux kernel module and its frozen ABI. 
> In the end, it's hard for me to review the logic behind this change, > but the patch seems to do what described in the commit message and > the implementation looks correct, tests work fine. With that in mind: > > Acked-by: Ilya Maximets Thanks a lot! I'll recheck this and push it soon. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
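The call chain Ben walks through boils down to a registry: freezing stores the untranslated remainder of the actions under a fresh recirculation id, and the later upcall uses that id to recover them for the packet-in message. A conceptual Python sketch of that bookkeeping (function names are illustrative, not the real OVS API):

```python
# Conceptual model of freeze/resume: frozen actions are parked under a
# fresh recirc id at freeze time and recovered by id at upcall time.
_frozen = {}
_next_id = 1

def recirc_alloc(frozen_actions):
    """Roughly finish_freezing__() + recirc_alloc_id_ctx(): park the
    remaining actions and return the id that names them."""
    global _next_id
    rid, _next_id = _next_id, _next_id + 1
    _frozen[rid] = frozen_actions
    return rid

def resume(recirc_id):
    """Roughly the process_upcall() side: the upcall cookie carries the
    id, which recovers the frozen actions to encode for the controller."""
    return _frozen[recirc_id]

rid = recirc_alloc(["resubmit(,8)", "set_field:0x1->metadata"])
```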
Re: [ovs-dev] [PATCH] netdev-dummy: Silence the 'may be uninitialized' warning.
On Thu, Jul 22, 2021 at 01:33:37PM +0200, Ilya Maximets wrote:
> GCC 11 on Fedora 34 emits a false-positive warning like this:
>
>   lib/netdev-dummy.c: In function ‘dummy_packet_stream_run’:
>   lib/netdev-dummy.c:284:16: error: ‘n’ may be used uninitialized in this
>      function [-Werror=maybe-uninitialized]
>     284 |     if (retval == n && dp_packet_size(&s->rxbuf) > 2) {
>         |                ^
>
> This breaks the build with --enable-Werror.  Initializing 'n' to
> avoid the warning.
>
> Signed-off-by: Ilya Maximets

Acked-by: Ben Pfaff
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] ideas for improving the OVS review process
On Wed, Jul 21, 2021 at 10:37:18AM +0200, Eelco Chaudron wrote:
> On 20 Jul 2021, at 20:41, Ben Pfaff wrote:
>
> > The OVS review process has greatly slowed over the last few years.  This
> > is partly because I haven't been able to spend as much time on review,
> > since I was once the most productive reviewer.  Ilya has been able to
> > step up the amount of review he does, but that still isn't enough to
> > keep up with the workload.
> >
> > We need to come up with some way to improve things.  Here are a few
> > ideas, mostly from a call earlier today (that was mainly about the OVS
> > conference).  I hope they will prompt a discussion.
> >
> > * Since patches are coming in, we have people who are knowledgeable about
> >   the code.  Those people should be pitching in with reviews as well.
> >   It doesn't seem like they or their managers have the right incentives
> >   to do that.  Maybe there is some way to improve the incentives.
>
> I do agree that it takes (very) long sometimes to get a patch
> reviewed/accepted, and I do see people complain about it.  However,
> some of the people who do complain have not done a single review.
> Maybe we can ask people who send in a patch to review at least one
> patch while they are waiting for theirs to be reviewed?

Hey, that's a great idea.  Maybe I will start looking for review
requests from people who do not do many reviews and suggest that they
should look at other patches out for review.

There may be some difficulty, though, in that people new to OVS are less
qualified to review other patches.  However, trying to review is also a
good way to learn.

> Maybe have the zero-day robot send them a thank-you email for the
> patch with a list of patches that did not yet receive a single review
> comment?

That's a great idea!  I don't know whether it's practical, but I like
the idea.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.
On Tue, Jul 20, 2021 at 08:29:14PM +0200, Ilya Maximets wrote: > On 7/7/21 8:51 PM, Ben Pfaff wrote: > > +/* From an OpenFlow point of view, goto_table and write_metadata > > are > > + * instructions, not actions. This means that to use them, we'd > > have > > + * to reformulate the actions as instructions, which is possible, > > and > > + * we'd have slot them into the frozen actions in a specific order, > > + * which doesn't seem practical. Instead, we translate these > > + * instructions into equivalent actions. */ > > +case OFPACT_GOTO_TABLE: { > > +struct ofpact_resubmit *resubmit > > += ofpact_put_RESUBMIT(>frozen_actions); > > +resubmit->in_port = OFPP_IN_PORT; > > +resubmit->table_id = ofpact_get_GOTO_TABLE(a)->table_id; > > +resubmit->with_ct_orig = false; > > +} > > +continue; > > +case OFPACT_WRITE_METADATA: { > > +const struct ofpact_metadata *md = > > ofpact_get_WRITE_METADATA(a); > > +const struct mf_field *mf = mf_from_id(MFF_METADATA); > > +ovs_assert(mf->n_bytes == sizeof md->metadata); > > +ovs_assert(mf->n_bytes == sizeof md->mask); > > +ofpact_put_set_field(>frozen_actions, mf, > > + >metadata, >mask); > > I have very little knowledge in this area, so this might be not issue. > However, 'OpenFlow "instructions", which were introduced in OpenFlow 1.1' > doesn't match with OFPACT_SET_FIELD that is marked as OpenFlow 1.2+ > action. I see that all tests for continuation have default bridge > configuration. And monitors are OF10 and OF13 in different cases, but > I'm not sure if these ovs-ofctl monitors are checking that received > actions for resume matches the OF version they are using. And for > the bridge itself, it supports all versions of OF, so it can interpret > actions. But the question is: will that work if the bridge only > supports OpenFlow 1.1 ? It should fail to parse OFPACT_SET_FIELD, or > am I missing something here? Thanks a lot for the review! 
OVS can serialize most actions and instructions to most OpenFlow versions, even if they are not natively supported in that particular OpenFlow version. For example, in the case of OFPACT_SET_FIELD (a rather important action), the code in ofp-actions.c has complete fallbacks: static void encode_SET_FIELD(const struct ofpact_set_field *sf, enum ofp_version ofp_version, struct ofpbuf *out) { if (ofp_version >= OFP15_VERSION) { /* OF1.5+ only has Set-Field (reg_load is redundant so we drop it * entirely). */ set_field_to_set_field(sf, ofp_version, out); } else if (sf->ofpact.raw == NXAST_RAW_REG_LOAD || sf->ofpact.raw == NXAST_RAW_REG_LOAD2) { /* It came in as reg_load, send it out the same way. */ set_field_to_nxast(sf, out); } else if (ofp_version < OFP12_VERSION) { /* OpenFlow 1.0 and 1.1 don't have Set-Field. */ set_field_to_legacy_openflow(sf, ofp_version, out); } else if (is_all_ones(ofpact_set_field_mask(sf), sf->field->n_bytes)) { /* We're encoding to OpenFlow 1.2, 1.3, or 1.4. The action sets an * entire field, so encode it as OFPAT_SET_FIELD. */ set_field_to_set_field(sf, ofp_version, out); } else { /* We're encoding to OpenFlow 1.2, 1.3, or 1.4. The action cannot be * encoded as OFPAT_SET_FIELD because it does not set an entire field, * so encode it as reg_load. 
*/ set_field_to_nxast(sf, out); } } And even for the instructions we're talking about now, there are fallbacks for OF1.0, which doesn't have instructions, so that we can encode them as OF1.0 extension actions instead: static void encode_WRITE_METADATA(const struct ofpact_metadata *metadata, enum ofp_version ofp_version, struct ofpbuf *out) { if (ofp_version == OFP10_VERSION) { struct nx_action_write_metadata *nawm; nawm = put_NXAST_WRITE_METADATA(out); nawm->metadata = metadata->metadata; nawm->mask = metadata->mask; } else { struct ofp11_instruction_write_metadata *oiwm; oiwm = instruction_put_OFPIT11_WRITE_METADATA(out); oiwm->metadata = metadata->metadata; oiwm->metadata_mask = metadata->mask; } } ... static void encode_GOTO_TABLE(const struct ofpact_goto_table *goto_table,
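The patch under discussion applies the same fallback philosophy in the other direction: `goto_table` and `write_metadata` are OF1.1+ instructions, but frozen continuation state holds actions, so they are rewritten as equivalent actions (`resubmit` and `set_field`). An illustrative Python model of that translation (simplified tuples, not OVS's real ofpact structures):

```python
# Toy model of the patch's core idea: rewrite OF1.1+ instructions as
# equivalent actions so they can live in a frozen action list.
def instruction_to_action(instr):
    kind = instr[0]
    if kind == "goto_table":
        # goto_table N -> resubmit(in_port, N), as in the quoted hunk.
        _, table_id = instr
        return ("resubmit", "IN_PORT", table_id)
    if kind == "write_metadata":
        # write_metadata value/mask -> set_field on the metadata field.
        _, value, mask = instr
        return ("set_field", "metadata", value, mask)
    return instr  # already an action; keep it unchanged
```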
[ovs-dev] ideas for improving the OVS review process
The OVS review process has greatly slowed over the last few years.  This
is partly because I haven't been able to spend as much time on review,
since I was once the most productive reviewer.  Ilya has been able to
step up the amount of review he does, but that still isn't enough to
keep up with the workload.

We need to come up with some way to improve things.  Here are a few
ideas, mostly from a call earlier today (that was mainly about the OVS
conference).  I hope they will prompt a discussion.

* Since patches are coming in, we have people who are knowledgeable about
  the code.  Those people should be pitching in with reviews as well.
  It doesn't seem like they or their managers have the right incentives
  to do that.  Maybe there is some way to improve the incentives.

* The Linux kernel uses something like a "default accept" policy for
  patches that superficially look good and compile and boot, if there is
  no review from a specific maintainer within a few days.  The patches
  do get some small amount of review from a subsystem maintainer.  OVS
  could adopt a similar policy.

* Some lack of review can be attributed to a reluctance to accept a
  review from a reviewer who is at the same company as the patch
  submitter.  There is good reason for this, but it is certainly
  possible to get quality reviews from a coworker, and perhaps we should
  relax the convention.

* A flip side of the above could be to codify the requirement for review
  from a non-coworker.  This would have the benefit of being able to
  push back against requests to commit unreviewed long series on the
  basis that it hasn't been reviewed by a third party.

I hope that there can be some discussion here.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] Optimize the poll loop for poll_immediate_wake()
On Tue, Jul 20, 2021 at 06:26:48PM +0100, Anton Ivanov wrote: > > On 19/07/2021 21:52, Ben Pfaff wrote: > > On Fri, Jul 16, 2021 at 02:42:32PM +0100, anton.iva...@cambridgegreys.com > > wrote: > > > From: Anton Ivanov > > > > > > If we are not obtaining any useful information out of the poll(), > > > such as is a fd busy or not, we do not need to do a poll() if > > > an immediate_wake() has been requested. > > > > > > This cuts out all the pollfd hash additions, forming the poll > > > arguments and the actual poll() after a call to > > > poll_immediate_wake() > > > > > > Signed-off-by: Anton Ivanov > > Thanks for v3. > > > > I believe that the existing code here properly handles the pathological > > case where the current time is less than 0. This is a case that I have > > seen happen on real systems that have a real-time clock with a bad > > current time and do not have proper support for a monotonic clock. I > > believe that your new code does not properly handle this case, because > > it treats all times less than 0 as immediate. I think that you can fix > > it by comparing against LLONG_MIN rather than zero. > > > > In poll-loop, I would move > > COVERAGE_INC(poll_zero_timeout); > > inside the new statement that is already conditional on timeout_when == > > LLONG_MIN. > > > > I don't like the new assumption in time_poll() that n_pollfds == 0 means > > we're waking immediately. > > > > Thanks, > > > > Ben. > > > Updated, all logic is now strictly based on timeout == LLONG_MIN. Thanks! ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] Optimize the poll loop for poll_immediate_wake()
On Tue, Jul 20, 2021 at 08:19:35AM +0100, Anton Ivanov wrote: > > On 19/07/2021 21:52, Ben Pfaff wrote: > > On Fri, Jul 16, 2021 at 02:42:32PM +0100, anton.iva...@cambridgegreys.com > > wrote: > > > From: Anton Ivanov > > > > > > If we are not obtaining any useful information out of the poll(), > > > such as is a fd busy or not, we do not need to do a poll() if > > > an immediate_wake() has been requested. > > > > > > This cuts out all the pollfd hash additions, forming the poll > > > arguments and the actual poll() after a call to > > > poll_immediate_wake() > > > > > > Signed-off-by: Anton Ivanov > > Thanks for v3. > > > > I believe that the existing code here properly handles the pathological > > case where the current time is less than 0. This is a case that I have > > seen happen on real systems that have a real-time clock with a bad > > current time and do not have proper support for a monotonic clock. I > > believe that your new code does not properly handle this case, because > > it treats all times less than 0 as immediate. I think that you can fix > > it by comparing against LLONG_MIN rather than zero. > In that case, we should set poll_immediate_wake to be LLONG_MIN as well, > right? It does that, even though it's a bit indirect, since poll_immediate_wake_at() passes 0 to poll_timer_wait_at(), which translates 0 to LLONG_MIN. > > In poll-loop, I would move > > COVERAGE_INC(poll_zero_timeout); > > inside the new statement that is already conditional on timeout_when == > > LLONG_MIN. > > > > I don't like the new assumption in time_poll() that n_pollfds == 0 means > > we're waking immediately. > > We can do everything on timeout. That is more consistent. Thanks. > As far as assumptions go, it is equivalent on Unixes, because the > number of fds is always > 1. This is because you always get 1 fd from > the signal pipe. That's true if poll_block() is the only caller of time_poll(), which is currently true, as long as we continue to deal with signals this way. 
But we don't document or enforce that assumption and I'd rather not rely on it. Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Fix continuations with OF instructions in OF1.1+.
On Wed, Jul 07, 2021 at 11:51:50AM -0700, Ben Pfaff wrote: > Open vSwitch supports OpenFlow "instructions", which were introduced in > OpenFlow 1.1 and act like restricted kinds of actions that can only > appear in a particular order and particular circumstances. OVS did > not support two of these instructions, "write_metadata" and > "goto_table", properly in the case where they appeared in a flow that > needed to be frozen for continuations. I would appreciate a review for this bug fix. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 2/2] Minimize the number of time calls in time_poll()
On Fri, Jul 16, 2021 at 02:42:33PM +0100, anton.iva...@cambridgegreys.com wrote: > From: Anton Ivanov > > time_poll() makes an excessive number of time_msec() calls > which incur a performance penalty. > > 1. Avoid time_msec() call for timeout calculation when time_poll() > is asked to skip poll() > > 2. Reuse the time_msec() result from deadline calculation for > last_wakeup and timeout calculation. > > Signed-off-by: Anton Ivanov I'll take another look at this in v4. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v3 1/2] Optimize the poll loop for poll_immediate_wake()
On Fri, Jul 16, 2021 at 02:42:32PM +0100, anton.iva...@cambridgegreys.com wrote: > From: Anton Ivanov > > If we are not obtaining any useful information out of the poll(), > such as is a fd busy or not, we do not need to do a poll() if > an immediate_wake() has been requested. > > This cuts out all the pollfd hash additions, forming the poll > arguments and the actual poll() after a call to > poll_immediate_wake() > > Signed-off-by: Anton Ivanov Thanks for v3. I believe that the existing code here properly handles the pathological case where the current time is less than 0. This is a case that I have seen happen on real systems that have a real-time clock with a bad current time and do not have proper support for a monotonic clock. I believe that your new code does not properly handle this case, because it treats all times less than 0 as immediate. I think that you can fix it by comparing against LLONG_MIN rather than zero. In poll-loop, I would move COVERAGE_INC(poll_zero_timeout); inside the new statement that is already conditional on timeout_when == LLONG_MIN. I don't like the new assumption in time_poll() that n_pollfds == 0 means we're waking immediately. Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] ovn-sbctl.at: Fix timing problem of count-flows test.
On Thu, Jul 15, 2021 at 06:10:40PM -0700, Han Zhou wrote: > On Thu, Jul 15, 2021 at 4:57 PM Ben Pfaff wrote: > > > On Thu, Jul 15, 2021 at 04:34:51PM -0700, Han Zhou wrote: > > > Fixes: 895e02ec0be6("ovn-sbctl.c Add logical flows count numbers") > > > Signed-off-by: Han Zhou > > > --- > > > tests/ovn-sbctl.at | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at > > > index 16f5dabcc..dabf62d2b 100644 > > > --- a/tests/ovn-sbctl.at > > > +++ b/tests/ovn-sbctl.at > > > @@ -200,7 +200,7 @@ Total number of logical flows = 0 > > > ]) > > > > > > # create some logical flows > > > -check ovn-nbctl ls-add count-test > > > +check ovn-nbctl --wait=sb ls-add count-test > > > > > > OVS_WAIT_UNTIL([total_lflows=`count_entries`; test $total_lflows -ne 0]) > > > > I'm surprised that the above change is needed, since OVS_WAIT_UNTIL > > should wait until the condition is true. > > > Yes you are right. This is not needed, but I think it’s not harmful either. > Do you want me to send a v2 with this removed? I don't think you need to send a v2, but I'd prefer to not check in that part of the change. > The second place is where this test occasionally fails. That one makes sense! For that part: Acked-by: Ben Pfaff ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] FW: [ovs-dev v1 1-1] datapath-windows-Correct checksum for DNAT action
On Sun, Jul 18, 2021 at 04:22:02AM +, Wilson Peng wrote:
> Hi, d...@openvswitch.org,
>
> I would like to send out a patch for an issue found in datapath-windows in the OVS repo.
>
> I have sent the patch as an attachment to this mail; the guide I followed for submitting patches is at the link below:
>
> https://docs.openvswitch.org/en/latest/internals/contributing/submitting-patches/
>
> I also subscribed to the ovs-dev mailing list, but I could not see the patch I submitted.
>
> This is the first time I have sent a patch to the OVS community, and I am not sure how to proceed with submitting it.

I don't think I saw a patch get caught in the spam filters (I would have released it if I had), so this is pretty odd. We have several submitters who submit from vmware.com email addresses, so I don't think that's the problem. If you can't figure it out, you could always use a different email account to submit the patches (you can still use the vmware.com address as the commit author), or you could submit PRs against github instead. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] Documentation: Remove duplicate words.
On Mon, Jul 19, 2021 at 01:27:16PM +0200, David Marchand wrote: > This is a simple cleanup with a script of mine. > > Signed-off-by: David Marchand > --- > This script is a bit too silly/simplistic and requires manual review > (too many false positives), but I don't think it is worth investing too > much time into integrating this kind of check in utilities/checkpatch.py. Thanks, pushed! ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] OVS has now branched for the 2.16 release.
On Fri, Jul 16, 2021 at 10:42:01PM +0200, Ilya Maximets wrote: > Hi, everyone. > > branch-2.16 was just created. Please, test it and report issues! > > Official release date according to our release process should be on > Monday, August 16th (1 month from now). > > Best regards, Ilya Maximets. Thanks for doing it! ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2] controller: Avoid unnecessary load balancer flow processing.
On Fri, Jul 16, 2021 at 09:43:17AM +0200, Dumitru Ceara wrote: > On 7/16/21 3:05 AM, Han Zhou wrote: > > On Thu, Jul 15, 2021 at 5:03 PM Ben Pfaff wrote: > > > >> On Mon, Jul 12, 2021 at 10:08:10AM +0200, Dumitru Ceara wrote: > >>> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following > >>> sequence of events happens: > >>> > >>> 1. The Southbound Load_Balancer record is updated. > >>> 2. The Southbound Datapath_Binding records on which the Load_Balancer is > >>>applied are updated. > >>> 3. Southbound ovsdb-server sends updates about the Load_Balancer and > >>>Datapath_Binding records to ovn-controller. > >>> 4. The IDL layer in ovn-controller processes the updates at #3, but > >>>because of the SB schema references between tables [0] all logical > >>>flows referencing the updated Datapath_Binding are marked as > >>>"updated". The same is true for Logical_DP_Group records > >>>referencing the Datapath_Binding, and also for all logical flows > >>>pointing to the new "updated" datapath groups. > >>> 5. ovn-controller ends up recomputing (removing/readding) all flows for > >>>all these tracked updates. > >> > >> This is kind of a weird change from my perspective. It allows for > >> broken referential integrity in the database to work around a > >> performance bug in the IDL. > > Thanks for looking at this change! > > I guess the description in the commit message is a bit unclear. Let me > try to fix that here. There is no bug in the IDL; it's doing what it's > supposed to do given the Southbound database schema. > > The problem is with the schema itself. When flow generation for load > balancers was moved to ovn-controller an additional reference (from > Datapath_Binding to Load_Balancer) was added in the SB schema to avoid > performance issues or extremely complex code in ovn-controller to handle > Load_Balancer updates incrementally. > > My first attempt at the fix was to just remove that additional > reference. 
That is fine, and doesn't affect correctness but: > 1. makes the upgrade scenario complex because ovn-controllers running > old versions would expect that reference to be there. > 2. adds lots of complicated code in ovn-controller incremental-processing. > > > > > > > Yes, it did look weird and there were detailed discussions on it in the v1 > > reviews. Some options that require much bigger changes were discussed for > > longer term, unless ddlog is used. > > > > This would be the longer term solution, along with removing the > reference from Datapath_Binding to Load_Balancers completely. OK. In context, it makes more sense. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn v2] controller: Avoid unnecessary load balancer flow processing.
On Mon, Jul 12, 2021 at 10:08:10AM +0200, Dumitru Ceara wrote: > Whenever a Load_Balancer is updated, e.g., a VIP is added, the following > sequence of events happens: > > 1. The Southbound Load_Balancer record is updated. > 2. The Southbound Datapath_Binding records on which the Load_Balancer is >applied are updated. > 3. Southbound ovsdb-server sends updates about the Load_Balancer and >Datapath_Binding records to ovn-controller. > 4. The IDL layer in ovn-controller processes the updates at #3, but >because of the SB schema references between tables [0] all logical >flows referencing the updated Datapath_Binding are marked as >"updated". The same is true for Logical_DP_Group records >referencing the Datapath_Binding, and also for all logical flows >pointing to the new "updated" datapath groups. > 5. ovn-controller ends up recomputing (removing/readding) all flows for >all these tracked updates. This is kind of a weird change from my perspective. It allows for broken referential integrity in the database to work around a performance bug in the IDL. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] Optimize the poll loop for poll_immediate_wake()
I'm impressed by the improvement. I didn't really expect any. On Tue, Jul 13, 2021 at 09:19:34AM +0100, Anton Ivanov wrote: > I ran the revised patch series (v2) which addresses your comments on the > ovn-heater benchmark. > > With 9 fake nodes, 50 fake pods per node I get ~ 2% reproducible improvement. > IMHO it should be even more noticeable at high scale. > > Best regards, > > A > > On 09/07/2021 19:45, Ben Pfaff wrote: > > On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.iva...@cambridgegreys.com > > wrote: > > > From: Anton Ivanov > > > > > > If we are not obtaining any useful information out of the poll(), > > > such as is a fd busy or not, we do not need to do a poll() if > > > an immediate_wake() has been requested. > > > > > > This cuts out all the pollfd hash additions, forming the poll > > > arguments and the actual poll() after a call to > > > poll_immediate_wake() > > > > > > Signed-off-by: Anton Ivanov > > Thanks for the patch. > > > > I think that this will have some undesirable side effects because it > > avoids calling time_poll() if the wakeup should happen immediately, and > > time_poll() does some thing that we always want to happen between one > > main loop and another. In particular the following calls from > > time_poll() are important: > > > > coverage_clear(); > > coverage_run(); > > if (*last_wakeup && !thread_is_pmd()) { > > log_poll_interval(*last_wakeup); > > } > > > > ... > > > > if (!time_left) { > > ovsrcu_quiesce(); > > } else { > > ovsrcu_quiesce_start(); > > } > > > > ... > > > > if (deadline <= time_msec()) { > > #ifndef _WIN32 > > fatal_signal_handler(SIGALRM); > > #else > > VLOG_ERR("wake up from WaitForMultipleObjects after deadline"); > > fatal_signal_handler(SIGTERM); > > #endif > > if (retval < 0) { > > retval = 0; > > } > > break; > > } > > > > ... 
> > > > *last_wakeup = time_msec(); > > refresh_rusage(); > > > > Instead of this change, I'd suggest something more like the following, > > along with the changes you made to suppress the file descriptors if the > > timeout is already zero: > > > > diff --git a/lib/timeval.c b/lib/timeval.c > > index 193c7bab1781..f080a742 100644 > > --- a/lib/timeval.c > > +++ b/lib/timeval.c > > @@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE > > *handles OVS_UNUSED, > > } > > #ifndef _WIN32 > > -retval = poll(pollfds, n_pollfds, time_left); > > +retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0; > > if (retval < 0) { > > retval = -errno; > > } > > > > > -- > Anton R. Ivanov > Cambridgegreys Limited. Registered in England. Company Number 10273661 > https://www.cambridgegreys.com/ > ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] ovn-sbctl.at: Fix timing problem of count-flows test.
On Thu, Jul 15, 2021 at 04:34:51PM -0700, Han Zhou wrote:
> Fixes: 895e02ec0be6 ("ovn-sbctl.c Add logical flows count numbers")
> Signed-off-by: Han Zhou 
> ---
>  tests/ovn-sbctl.at | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tests/ovn-sbctl.at b/tests/ovn-sbctl.at
> index 16f5dabcc..dabf62d2b 100644
> --- a/tests/ovn-sbctl.at
> +++ b/tests/ovn-sbctl.at
> @@ -200,7 +200,7 @@ Total number of logical flows = 0
>  ])
>
>  # create some logical flows
> -check ovn-nbctl ls-add count-test
> +check ovn-nbctl --wait=sb ls-add count-test
>
>  OVS_WAIT_UNTIL([total_lflows=`count_entries`; test $total_lflows -ne 0])

I'm surprised that the above change is needed, since OVS_WAIT_UNTIL should wait until the condition is true.

> @@ -219,7 +219,7 @@ $ingress_lflows
>  ])
>
>  # add another datapath
> -check ovn-nbctl ls-add count-test2
> +check ovn-nbctl --wait=sb ls-add count-test2
>
>  # check total logical flows in 2 datapathes
>  AT_CHECK_UNQUOTED([ovn-sbctl count-flows | grep "flows =" | awk 'NF>1{print $NF}'], [0], [dnl

The one above makes sense to me. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] ovn-northd-ddlog - high mem and cpu usage when started with an existing DB
On Mon, Jul 12, 2021 at 04:42:27PM -0700, Ben Pfaff wrote: > On Thu, Jul 08, 2021 at 08:59:24PM +0200, Dumitru Ceara wrote: > > Hi Ben, > > > > As discussed earlier, during the OVN meeting, I've noticed a new > > performance issue with ovn-northd-ddlog when running it against a > > database from one of our more recent scale tests: > > > > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210708/ovnnb_db.db > > > > ovn-northd-ddlog uses 100% CPU and never really reaches the point to > > perform the first transaction to the Southbound. Memory usage is also > > very high, I stopped it at 45GB RSS. > > > > To test I did: > > SANDBOXFLAGS="--nbdb-source=/tmp/ovnnb_db.db --ddlog" make sandbox > > Thanks. I've been spending a lot of time with this Friday and today. > It is a bit different from the other issues I've looked at. The > previous ones were inefficient production of relatively small output. > This one is inefficient production (and storage) of rather large output > (millions of flows). I'm trying to get help from Leonid on how to > reduce the memory usage. Leonid has been looking into this and we're going to talk through a solution tomorrow. With luck, I'll have some patches soon after that. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] bond: Fix broken rebalancing after link state changes.
On Tue, Jul 13, 2021 at 05:32:06PM +0200, Ilya Maximets wrote: > There are 3 constraints for moving hashes from one member to another: > > 1. The load difference exceeds ~ 3% of the load of one member. > 2. The difference in load between members exceeds 100,000 bytes. > 3. Moving the hash reduces the load difference by more than 10%. > > In the current implementation, if one of the members transitions to > the DOWN state, all hashes assigned to it will be moved to the other > members. After that, if the member goes UP, it will wait for > rebalancing to get hashes. But in case we have more than 10 equally > loaded hashes, it will never meet constraint # 3, because each hash > will handle less than 10% of the load. The situation gets worse when > the number of flows grows and it is almost impossible to transfer any > hash when all 256 hash records are used, which is very likely when we > have few hundred/thousand flows. > > As a result, if one of the members goes down and back up while traffic > flows, it will never be used to transmit packets again. This will not > be fixed even if we completely stop the traffic and start it again, > because the first two constraints will block rebalancing in the > earlier stages, while we have low traffic volume. > > Moving a single hash if the destination does not have any hashes, > as it was before commit c460a6a7bc75 ("ofproto/bond: simplifying the > rebalancing logic"), will not help, because a single hash is not > enough to make the difference in load less than 10% of the total load, > and this member will handle only that one hash forever. > > To fix this, let's try to move multiple hashes at the same time to > meet constraint # 3. > > The implementation includes sorting the "records" to be able to > collect records with a cumulative load close enough to the ideal value. > > Signed-off-by: Ilya Maximets I reread this and it still looks good to me. 
I spotted one typo in a comment:

diff --git a/ofproto/bond.c b/ofproto/bond.c
index b9bfa45493b8..c3e2083575b0 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -1216,7 +1216,7 @@ choose_entries_to_migrate(const struct bond_member *from, uint64_t to_tx_bytes,
     }
 
     if (!cnt) {
-        /* There is no entry which load less than or equal to 'ideal_delta'.
+        /* There is no entry with load less than or equal to 'ideal_delta'.
          * Lets try closest one. The closest is the last in sorted list. */
         struct bond_entry *closest;

Acked-by: Ben Pfaff 

___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn 2/2] tutorial: Remove OVS-specific files.
On Tue, Jul 13, 2021 at 12:17:48PM -0400, Numan Siddique wrote: > On Fri, Jul 2, 2021 at 5:36 PM Ben Pfaff wrote: > > > > These were part of the OVS tutorial, which isn't relevant for OVN. > > > > Signed-off-by: Ben Pfaff > > Acked-by: Numan Siddique Thanks for the reviews. I pushed these commits. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 2/2] Minimize the number of time calls in time_poll()
On Mon, Jul 12, 2021 at 06:15:29PM +0100, anton.iva...@cambridgegreys.com wrote: > From: Anton Ivanov > > time_poll() makes an excessive number of time_msec() calls > which incur a performance penalty. > > 1. Avoid time_msec() call for timeout calculation when time_poll() > is asked to skip poll() > > 2. Reuse the time_msec() result from deadline calculation for > last_wakeup and timeout calculation. > > Signed-off-by: Anton Ivanov I'd like another look at that after patch 1 is revised. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2 1/2] Optimize the poll loop for poll_immediate_wake()
On Mon, Jul 12, 2021 at 06:15:28PM +0100, anton.iva...@cambridgegreys.com wrote: > From: Anton Ivanov > > If we are not obtaining any useful information out of the poll(), > such as is a fd busy or not, we do not need to do a poll() if > an immediate_wake() has been requested. > > This cuts out all the pollfd hash additions, forming the poll > arguments and the actual poll() after a call to > poll_immediate_wake() > > Signed-off-by: Anton Ivanov I don't think we need the new 'immediate_wake' member of struct poll_loop. A 'timeout_when' of LLONG_MIN already means the same thing (it's even documented in the comment). Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.
On Thu, Jul 15, 2021 at 08:45:00PM +0200, Lorenzo Bianconi wrote: > > On Thu, Jul 15, 2021 at 11:18:12AM -0700, Han Zhou wrote: > > > On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi < > > > lorenzo.bianc...@redhat.com> wrote: > > > > This should avoid some work by doing the cheapest check (the one on > > > > UseLogicalDatapathGroups) before any joins. DDlog is probably > > > > factoring out the reference to the Flow relation, which is identical > > > > in both, but this ought to avoid the group_by aggregation (which is > > > > relatively expensive) in the case where UseLogicalDatapathGroups is > > > > not enabled. > > > > > > Thanks! I didn't know that the order matters. (not sure if there is > > > documentation that I missed) > > > > In general, DDlog executes each rule in the order given, so if you have > > a series of joins in a rule, then it's a good idea to order them, if you > > can, to keep the number of records small at each join step. It won't > > affect the correctness, but it will affect performance. > > > > This might not be documented well. I do occasionally work on the DDlog > > documentation, so I've made a note to try to see whether the effect of > > ordering is mentioned and improve it if I can. > > > > In this particular case, I talked to Leonid about it during a discussion > > of how to improve performance and memory use in the benchmark currently > > at issue, and Leonid says that the ordering doesn't actually matter in > > this case because both of the possibilities (the ones for > > UseLogicalDatapathGroups[true] and UseLogicalDatapathGroups[false]) had > > an identical clause at the beginning. DDlog optimizes identical > > prefixes, so there was no real difference in performance. > > > > This patch could, therefore, be dropped, but it should also not be > > harmful. Dropping it would require some changes to the later patch that > > updates the ovn-northd ddlog code, so folding it into that one would > > also be an option. 
> > > > Do you want me to repost dropping this patch or is the series fine as it is? The series is fine as is. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovsdb-server: Fix memleak when failing to read storage.
On Wed, Jul 14, 2021 at 09:21:19AM +0200, Dumitru Ceara wrote:
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
> Signed-off-by: Dumitru Ceara 

Thanks! I pushed this. I noticed that the declaration could be moved down to the first assignment, so I actually pushed the following. I backported to all affected branches.

-8<--cut here-->8--

From: Dumitru Ceara 
Date: Wed, 14 Jul 2021 09:21:19 +0200
Subject: [PATCH] ovsdb-server: Fix memleak when failing to read storage.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Signed-off-by: Dumitru Ceara 
Signed-off-by: Ben Pfaff 
---
 ovsdb/ovsdb-server.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index b09232c654ab..f4a0fac99259 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -660,8 +660,6 @@ add_db(struct server_config *config, struct db *db)
 static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
 open_db(struct server_config *config, const char *filename)
 {
-    struct db *db;
-
     /* If we know that the file is already open, return a good error message.
      * Otherwise, if the file is open, we'll fail later on with a harder to
      * interpret file locking error. */
@@ -676,9 +674,6 @@ open_db(struct server_config *config, const char *filename)
         return error;
     }
 
-    db = xzalloc(sizeof *db);
-    db->filename = xstrdup(filename);
-
     struct ovsdb_schema *schema;
     if (ovsdb_storage_is_clustered(storage)) {
         schema = NULL;
@@ -691,6 +686,9 @@ open_db(struct server_config *config, const char *filename)
         }
         ovs_assert(schema && !txn_json);
     }
+
+    struct db *db = xzalloc(sizeof *db);
+    db->filename = xstrdup(filename);
 
     db->db = ovsdb_create(schema, storage);
     ovsdb_jsonrpc_server_add_db(config->jsonrpc, db->db);
-- 
2.31.1

___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v2] latch-unix: Decrease the stack usage in latch
On Thu, Jul 15, 2021 at 04:28:12PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov 
>
> 1. Make latch behave as described and documented - clear all
>    outstanding latch writes when invoking latch_poll().
> 2. Decrease the size of the latch buffer. Less stack usage,
>    less cache thrashing.
>
> Signed-off-by: Anton Ivanov 

Applied, thanks! I dropped the following hunk:

> --- a/lib/latch-unix.c
> +++ b/lib/latch-unix.c
> @@ -23,6 +23,7 @@
>  #include "openvswitch/poll-loop.h"
>  #include "socket-util.h"
>  
> +
>  /* Initializes 'latch' as initially unset. */
>  void
>  latch_init(struct latch *latch)

___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v7 ovn 1/5] ovn-northd-ddlog: Optimize AggregatedFlow rules.
On Thu, Jul 15, 2021 at 11:18:12AM -0700, Han Zhou wrote: > On Wed, Jul 14, 2021 at 1:34 AM Lorenzo Bianconi < > lorenzo.bianc...@redhat.com> wrote: > > This should avoid some work by doing the cheapest check (the one on > > UseLogicalDatapathGroups) before any joins. DDlog is probably > > factoring out the reference to the Flow relation, which is identical > > in both, but this ought to avoid the group_by aggregation (which is > > relatively expensive) in the case where UseLogicalDatapathGroups is > > not enabled. > > Thanks! I didn't know that the order matters. (not sure if there is > documentation that I missed) In general, DDlog executes each rule in the order given, so if you have a series of joins in a rule, then it's a good idea to order them, if you can, to keep the number of records small at each join step. It won't affect the correctness, but it will affect performance. This might not be documented well. I do occasionally work on the DDlog documentation, so I've made a note to try to see whether the effect of ordering is mentioned and improve it if I can. In this particular case, I talked to Leonid about it during a discussion of how to improve performance and memory use in the benchmark currently at issue, and Leonid says that the ordering doesn't actually matter in this case because both of the possibilities (the ones for UseLogicalDatapathGroups[true] and UseLogicalDatapathGroups[false]) had an identical clause at the beginning. DDlog optimizes identical prefixes, so there was no real difference in performance. This patch could, therefore, be dropped, but it should also not be harmful. Dropping it would require some changes to the later patch that updates the ovn-northd ddlog code, so folding it into that one would also be an option. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] Fix compilation error for m32.
On Wed, Jul 14, 2021 at 06:23:15PM -0400, num...@ovn.org wrote: > From: Numan Siddique > > Fixes: 895e02ec0be6("ovn-sbctl.c Add logical flows count numbers") > Signed-off-by: Numan Siddique Acked-by: Ben Pfaff ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] latch-unix: Make the latch read buffer shared
On Wed, Jul 14, 2021 at 09:12:10PM +0100, Anton Ivanov wrote: > On 14/07/2021 19:33, Ben Pfaff wrote: > > On Wed, Jul 14, 2021 at 05:36:36PM +0100, anton.iva...@cambridgegreys.com > > wrote: > > > From: Anton Ivanov > > > > > > There is no point to add 512 bytes on the stack > > > every time latch is polled. Alignment, cache line thrashing, > > > etc - you name it. > > Do you have evidence this is a real problem? > > I played a bit with it using the ovn-heater benchmark, difference was > marginal. > > IMHO it will result in a difference only on a bigger setup which I cannot > simulate. > > > > > > The result of the read is discarded anyway so the buffer > > > can be shared by all latches. > > > > > > Signed-off-by: Anton Ivanov > > > +/* All writes to latch are zero sized. Even 16 bytes are an overkill */ > > > +static char latch_buffer[16]; > > This comment is wrong. Writes to a latch are 1 byte. > > Me bad - I saw the "" in write() and ignored the 1 passed as length. > > > > > latch_poll() is supposed to fully clear any buffered data. It shouldn't > > cause behavioral problems if it doesn't, and I imagine that it's rare > > that there'd be more than 16 queued notifications, but it seems > > regressive to just clear some of them. > > The read can be looped. In fact, for full correctness it should be looped > regardless of the size of the read buffer. Couldn't hurt. > So maybe 16 local looped? I'd be OK with that. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] latch-unix: Make the latch read buffer shared
On Wed, Jul 14, 2021 at 05:36:36PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov
>
> There is no point to add 512 bytes on the stack
> every time latch is polled. Alignment, cache line thrashing,
> etc - you name it.

Do you have evidence this is a real problem?

> The result of the read is discarded anyway so the buffer
> can be shared by all latches.
>
> Signed-off-by: Anton Ivanov

> +/* All writes to latch are zero sized. Even 16 bytes are an overkill */
> +static char latch_buffer[16];

This comment is wrong. Writes to a latch are 1 byte.

latch_poll() is supposed to fully clear any buffered data. It shouldn't cause behavioral problems if it doesn't, and I imagine that it's rare that there'd be more than 16 queued notifications, but it seems regressive to just clear some of them.

It's silly to use static data for 16 bytes. If you're going to reduce the size, just keep it as local.

> /* Initializes 'latch' as initially unset. */
> void
> latch_init(struct latch *latch)
> @@ -43,9 +46,7 @@ latch_destroy(struct latch *latch)
>  bool
>  latch_poll(struct latch *latch)
>  {
> -    char buffer[_POSIX_PIPE_BUF];
> -
> -    return read(latch->fds[0], buffer, sizeof buffer) > 0;
> +    return read(latch->fds[0], latch_buffer, sizeof latch_buffer) > 0;
>  }
>
>  /* Sets 'latch'.
> --
> 2.20.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6 ovn 0/4] respin CoPP series
On Tue, Jul 13, 2021 at 12:00:13PM -0700, Ben Pfaff wrote:
> On Sat, Jul 10, 2021 at 02:56:53PM +0200, Lorenzo Bianconi wrote:
> > ack, no worries. I folded the patch here:
> > https://github.com/LorenzoBianconi/ovn/tree/CoPP-v7
> >
> > however CoPP tests available in system-ovn.at and for ovn-northd/ovn-controller
> > are failing with DDlog. Can you please double check if you are facing the same
> > issues?
>
> I didn't run the system-ovn.at tests, but the ones for ovn-northd and
> ovn-controller are embarrassing. Sorry about that. I'll fix them, and
> I'll see if I can get the system-ovn tests set up and run here.

Here's a fix to fold in. With this, I get failures in the following tests. I'm going to assume that this isn't related since all the versions of the same test fail:

539: Symmetric IPv6 ECMP reply flows -- ovn-northd -- dp-groups=yes FAILED (ovn.at:23254)
540: Symmetric IPv6 ECMP reply flows -- ovn-northd -- dp-groups=no FAILED (ovn.at:23254)
541: Symmetric IPv6 ECMP reply flows -- ovn-northd-ddlog -- dp-groups=yes FAILED (ovn.at:23254)
542: Symmetric IPv6 ECMP reply flows -- ovn-northd-ddlog -- dp-groups=no FAILED (ovn.at:23254)

I still haven't set up the system-ovn tests. It's on my to-do list.

-8<--cut here-->8--

ovn-northd-ddlog: Fix CoPP behavior.

In ovn_northd.dl, the new 'controller_meter' column needs to be included in the row UUID, otherwise replacing a row with one that is identical except for controller_meter will fail with an ovsdb-server transaction error.

In ovn-controller.at, it's necessary to wait for flow table updates.
---
 northd/ovn_northd.dl    | 4 ++--
 tests/ovn-controller.at | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index d7b54be85644..71e868f05db2 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -1697,7 +1697,7 @@ for (f in AggregatedFlow()) {
     if (f.logical_datapaths.size() == 1) {
         Some{var dp} = f.logical_datapaths.nth(0) in
         sb::Out_Logical_Flow(
-            ._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.external_ids)),
+            ._uuid = hash128((dp, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)),
             .logical_datapath = Some{dp},
             .logical_dp_group = None,
             .pipeline = pipeline,
@@ -1710,7 +1710,7 @@ for (f in AggregatedFlow()) {
     } else {
         var group_uuid = hash128(f.logical_datapaths) in {
             sb::Out_Logical_Flow(
-                ._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.external_ids)),
+                ._uuid = hash128((group_uuid, f.stage, f.priority, f.__match, f.actions, f.controller_meter, f.external_ids)),
                 .logical_datapath = None,
                 .logical_dp_group = Some{group_uuid},
                 .pipeline = pipeline,
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 475528623b5c..e317b70aa55f 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -632,19 +632,19 @@ check ovn-nbctl ls-lb-add ls1 lb1

 # controller-event metering
 check ovn-nbctl meter-add event-elb drop 100 pktps 10
-check ovn-nbctl ls-copp-add ls1 event-elb event-elb
+check ovn-nbctl --wait=hv ls-copp-add ls1 event-elb event-elb
 AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep controller | grep userdata=00.00.00.0f | grep -q meter_id=1])

 # reject metering
 check ovn-nbctl meter-add acl-meter drop 1 pktps 0
 check ovn-nbctl ls-copp-add ls1 reject acl-meter
-check ovn-nbctl acl-add ls1 from-lport 1002 'inport == "lsp1" && ip && udp' reject
+check ovn-nbctl --wait=hv acl-add ls1 from-lport 1002 'inport == "lsp1" && ip && udp' reject
 AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep controller | grep userdata=00.00.00.16 | grep -q meter_id=2])

 # arp metering
 check ovn-nbctl meter-add arp-meter drop 200 pktps 0
-check ovn-nbctl lr-copp-add lr1 arp-resolve arp-meter
+check ovn-nbctl --wait=hv lr-copp-add lr1 arp-resolve arp-meter
 AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep controller | grep userdata=00.00.00.00 | grep -q meter_id=3])

 OVN_CLEANUP([hv1])
--
2.31.1

___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH v6 ovn 0/4] respin CoPP series
On Sat, Jul 10, 2021 at 02:56:53PM +0200, Lorenzo Bianconi wrote: > ack, no worries. I folded the patch here: > https://github.com/LorenzoBianconi/ovn/tree/CoPP-v7 > > however CoPP tests available in system-ovn.at and for > ovn-northd/ovn-controller > are failing with DDlog. Can you please double check if you are facing the same > issues? I didn't run the system-ovn.at tests, but the ones for ovn-northd and ovn-controller are embarrassing. Sorry about that. I'll fix them, and I'll see if I can get the system-ovn tests set up and run here. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] ovn-northd-ddlog - high mem and cpu usage when started with an existing DB
On Thu, Jul 08, 2021 at 08:59:24PM +0200, Dumitru Ceara wrote: > Hi Ben, > > As discussed earlier, during the OVN meeting, I've noticed a new > performance issue with ovn-northd-ddlog when running it against a > database from one of our more recent scale tests: > > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210708/ovnnb_db.db > > ovn-northd-ddlog uses 100% CPU and never really reaches the point to > perform the first transaction to the Southbound. Memory usage is also > very high, I stopped it at 45GB RSS. > > To test I did: > SANDBOXFLAGS="--nbdb-source=/tmp/ovnnb_db.db --ddlog" make sandbox Thanks. I've been spending a lot of time with this Friday and today. It is a bit different from the other issues I've looked at. The previous ones were inefficient production of relatively small output. This one is inefficient production (and storage) of rather large output (millions of flows). I'm trying to get help from Leonid on how to reduce the memory usage. Thanks, Ben. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] ovn-nbctl: Fix ovn-nbctl --print-wait-time.
On Fri, Jul 09, 2021 at 12:23:49PM -0700, Han Zhou wrote: > Recently ovn-nbctl was changed to partially monitor NB DB, which > revealed the missing column monitoring for NB_Global:nb_cfg_timestamp. > > Without the fix, ovn-nbctl --wait=hv --print-wait-time would print > something like: > > Time spent on processing nb_cfg 39: > ovn-northd delay before processing: -1623916000627ms > ovn-northd completion: 14408ms > ovn-controller(s) completion: 36204ms > > Signed-off-by: Han Zhou Acked-by: Ben Pfaff ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] Optimize the poll loop for poll_immediate_wake()
On Fri, Jul 09, 2021 at 06:19:06PM +0100, anton.iva...@cambridgegreys.com wrote:
> From: Anton Ivanov
>
> If we are not obtaining any useful information out of the poll(),
> such as is a fd busy or not, we do not need to do a poll() if
> an immediate_wake() has been requested.
>
> This cuts out all the pollfd hash additions, forming the poll
> arguments and the actual poll() after a call to
> poll_immediate_wake()
>
> Signed-off-by: Anton Ivanov

Thanks for the patch.

I think that this will have some undesirable side effects because it avoids calling time_poll() if the wakeup should happen immediately, and time_poll() does some things that we always want to happen between one main loop iteration and another. In particular the following calls from time_poll() are important:

    coverage_clear();
    coverage_run();
    if (*last_wakeup && !thread_is_pmd()) {
        log_poll_interval(*last_wakeup);
    }
    ...
    if (!time_left) {
        ovsrcu_quiesce();
    } else {
        ovsrcu_quiesce_start();
    }
    ...
    if (deadline <= time_msec()) {
#ifndef _WIN32
        fatal_signal_handler(SIGALRM);
#else
        VLOG_ERR("wake up from WaitForMultipleObjects after deadline");
        fatal_signal_handler(SIGTERM);
#endif
        if (retval < 0) {
            retval = 0;
        }
        break;
    }
    ...
    *last_wakeup = time_msec();
    refresh_rusage();

Instead of this change, I'd suggest something more like the following, along with the changes you made to suppress the file descriptors if the timeout is already zero:

diff --git a/lib/timeval.c b/lib/timeval.c
index 193c7bab1781..f080a742 100644
--- a/lib/timeval.c
+++ b/lib/timeval.c
@@ -323,7 +323,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED,
     }

 #ifndef _WIN32
-    retval = poll(pollfds, n_pollfds, time_left);
+    retval = time_left ? poll(pollfds, n_pollfds, time_left) : 0;
     if (retval < 0) {
         retval = -errno;
     }

___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] is the OVN load balancer also intended to be a firewall?
On Wed, Jul 07, 2021 at 05:17:17PM -0700, Han Zhou wrote: > On Wed, Jul 7, 2021 at 3:39 PM Ben Pfaff wrote: > > > > Hi, I've been talking to Shay Vargaftik (CC'd), also a researcher at > > VMware, about some work he's done on optimizing load balancers. What > > he's come up with is a technique that in many cases avoids putting > > connections into the connection-tracking table, because it can achieve > > per-connection consistency without needing to do that. This improves > > performance by reducing the size of the connection-tracking table, which > > is therefore more likely to fit inside a CPU cache and cheaper to > > search. > > > > I'm trying to determine whether this technique would apply to OVN's load > > balancer. There would be challenges in any case, but one fundamental > > question I have is: is the OVN load balancer also supposed to be a > > firewall? If it's not, then it's worth continuing to look to see if the > > technique is applicable. On the other hand, if it is, then every > > connection needs to be tracked in any case, so the technique can't be > > useful. > > > > Anyone's thoughts would be welcome. > > > For my understanding OVN LB doesn't directly relate to FW (OVN ACLs), > although they both use conntrack. For LB, we use conntrack for NAT (convert > the client IP to an LB owned IP) purposes. Does this technique support NAT > without using conntrack? Some kinds of NAT just change IP addresses. These kinds of NAT don't fundamentally need to track per-connection state, so this kind of NAT would not demand conntrack. Other kinds of NAT pull a port number from a pool and assign it to each connection. The latter kind does need to track per-connection state, so it does need conntrack. I'm not sure which category the OVN LB is in. 
> Moreover, maybe for the future, we also need to consider the cases when a > LB is applied on a OVN gateway, for HA purposes the NAT tracking entries > need to be able to be replicated across nodes, so that when failover > happens the existing connections can continue working through another > gateway node. I don't think this is a problem for Shay's technique, which is called JET (for Just Enough Tracking). JET is designed to gracefully cope with failover. In fact, this is possibly an even better case for JET, because fewer entries mean less replication. > There are also OVN LB use cases that don't require NAT. If this technique > doesn't support NAT, it is probably still useful for those scenarios. That makes sense. Do you have any idea which of the various cases that OVN supports are commonly used? Or whether performance is an issue for them in practice? It would be silly to spend a lot of time optimizing the performance of something that is already fast enough. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev