Re: [ovs-dev] ovn-northd-ddlog scale issues
On Thu, Jul 01, 2021 at 12:18:12AM -0700, Ben Pfaff wrote:
> On Tue, Jun 29, 2021 at 03:32:26PM -0700, Ben Pfaff wrote:
> > On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
> > > On 5/20/21 5:50 PM, Ben Pfaff wrote:
> > > > On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> > > >> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> > > >>
> > > >> [...]
> > > >>
> > > >>>> Thanks! I can download them now. It's back on my to-do list.
> > > >>>
> > > >>> I can reproduce the problem now. I haven't fixed it yet, but I did fix
> > > >>> a nasty performance problem in ovn-nbctl that became really apparent
> > > >>> when working with your databases:
> > > >>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> > > >>
> > > >> I was wondering if you had a chance to look at this since.
> > > >
> > > > I haven't kept going. I consider my series that gives a 5x performance
> > > > improvement a kind of checkpoint along the way. I assumed at first that
> > > > it would get reviewed quickly so I could move on to other things, but no
> > > > one has looked at it yet.
> > > >
> > >
> > > Hi Ben,
> > >
> > > Just a note, I've tried this again with ovn-northd-ddlog built from
> > > current OVN master branch, running against the same DBs:
> >
> > I've identified the problem. It's because of the ReachableLogicalRouter
> > relation, which holds all pairs of routers (A,B) such that a packet at
> > router A can transit switches and routers to arrive at router B. This
> > is inherently O(n**2) and in this example n is about 8,000.
> >
> > I'll fix it.
>
> The following is pretty close, but I see three test failures I need to
> investigate:
>
> 273: ovn.at:11253 ovn -- vlan traffic for external network with distributed router gateway port -- ovn-northd-ddlog -- dp-groups=yes
> 641: ovn.at:26903 ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=yes
>      proxy-arp
> 642: ovn.at:26903 ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=no
>      proxy-arp

These test failures were red herrings. I posted a formal version of
this patch for review:
https://patchwork.ozlabs.org/project/ovn/patch/20210702005640.1627098-1-...@ovn.org/
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] ovn-northd-ddlog scale issues
On Thu, Jul 1, 2021 at 4:31 AM Dumitru Ceara wrote:
>
> On 7/1/21 9:18 AM, Ben Pfaff wrote:
> > On Tue, Jun 29, 2021 at 03:32:26PM -0700, Ben Pfaff wrote:
> >> On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
> >>> On 5/20/21 5:50 PM, Ben Pfaff wrote:
> >>>> On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> >>>>> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>>> Thanks! I can download them now. It's back on my to-do list.
> >>>>>>
> >>>>>> I can reproduce the problem now. I haven't fixed it yet, but I did fix
> >>>>>> a nasty performance problem in ovn-nbctl that became really apparent
> >>>>>> when working with your databases:
> >>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> >>>>>
> >>>>> I was wondering if you had a chance to look at this since.
> >>>>
> >>>> I haven't kept going. I consider my series that gives a 5x performance
> >>>> improvement a kind of checkpoint along the way. I assumed at first that
> >>>> it would get reviewed quickly so I could move on to other things, but no
> >>>> one has looked at it yet.
> >>>>
> >>>
> >>> Hi Ben,
> >>>
> >>> Just a note, I've tried this again with ovn-northd-ddlog built from
> >>> current OVN master branch, running against the same DBs:
> >>
> >> I've identified the problem. It's because of the ReachableLogicalRouter
> >> relation, which holds all pairs of routers (A,B) such that a packet at
> >> router A can transit switches and routers to arrive at router B. This
> >> is inherently O(n**2) and in this example n is about 8,000.
> >>
> >> I'll fix it.
> >
> > The following is pretty close, but I see three test failures I need to
> > investigate:
> >
> > 273: ovn.at:11253 ovn -- vlan traffic for external network with distributed router gateway port -- ovn-northd-ddlog -- dp-groups=yes
> > 641: ovn.at:26903 ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=yes
> >      proxy-arp
> > 642: ovn.at:26903 ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=no
> >      proxy-arp
>
> Thanks Ben!
>
> The two proxy-arp failures are likely because the proxy-arp
> implementation is missing in ovn-northd-ddlog. Numan started working on it:
>
> http://patchwork.ozlabs.org/project/ovn/patch/20210629160849.4130753-1-num...@ovn.org/

v2 of this patch, which fixes these test failures, is posted at
https://patchwork.ozlabs.org/project/ovn/patch/20210701124521.2095748-1-num...@ovn.org/

Please take a look.

Thanks
Numan

>
> I'll try out your patch as soon as I get the chance.
>
> Regards,
> Dumitru
Re: [ovs-dev] ovn-northd-ddlog scale issues
On 7/1/21 9:18 AM, Ben Pfaff wrote:
> On Tue, Jun 29, 2021 at 03:32:26PM -0700, Ben Pfaff wrote:
>> On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
>>> On 5/20/21 5:50 PM, Ben Pfaff wrote:
>>>> On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
>>>>> On 4/7/21 6:49 PM, Ben Pfaff wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>> Thanks! I can download them now. It's back on my to-do list.
>>>>>>
>>>>>> I can reproduce the problem now. I haven't fixed it yet, but I did fix
>>>>>> a nasty performance problem in ovn-nbctl that became really apparent
>>>>>> when working with your databases:
>>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
>>>>>
>>>>> I was wondering if you had a chance to look at this since.
>>>>
>>>> I haven't kept going. I consider my series that gives a 5x performance
>>>> improvement a kind of checkpoint along the way. I assumed at first that
>>>> it would get reviewed quickly so I could move on to other things, but no
>>>> one has looked at it yet.
>>>>
>>>
>>> Hi Ben,
>>>
>>> Just a note, I've tried this again with ovn-northd-ddlog built from
>>> current OVN master branch, running against the same DBs:
>>
>> I've identified the problem. It's because of the ReachableLogicalRouter
>> relation, which holds all pairs of routers (A,B) such that a packet at
>> router A can transit switches and routers to arrive at router B. This
>> is inherently O(n**2) and in this example n is about 8,000.
>>
>> I'll fix it.
>
> The following is pretty close, but I see three test failures I need to
> investigate:
>
> 273: ovn.at:11253 ovn -- vlan traffic for external network with distributed router gateway port -- ovn-northd-ddlog -- dp-groups=yes
> 641: ovn.at:26903 ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=yes
>      proxy-arp
> 642: ovn.at:26903 ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=no
>      proxy-arp

Thanks Ben!

The two proxy-arp failures are likely because the proxy-arp
implementation is missing in ovn-northd-ddlog. Numan started working on it:

http://patchwork.ozlabs.org/project/ovn/patch/20210629160849.4130753-1-num...@ovn.org/

I'll try out your patch as soon as I get the chance.

Regards,
Dumitru
Re: [ovs-dev] ovn-northd-ddlog scale issues
On Tue, Jun 29, 2021 at 03:32:26PM -0700, Ben Pfaff wrote:
> On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
> > On 5/20/21 5:50 PM, Ben Pfaff wrote:
> > > On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> > >> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> > >>
> > >> [...]
> > >>
> > >>>> Thanks! I can download them now. It's back on my to-do list.
> > >>>
> > >>> I can reproduce the problem now. I haven't fixed it yet, but I did fix
> > >>> a nasty performance problem in ovn-nbctl that became really apparent
> > >>> when working with your databases:
> > >>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> > >>
> > >> I was wondering if you had a chance to look at this since.
> > >
> > > I haven't kept going. I consider my series that gives a 5x performance
> > > improvement a kind of checkpoint along the way. I assumed at first that
> > > it would get reviewed quickly so I could move on to other things, but no
> > > one has looked at it yet.
> > >
> >
> > Hi Ben,
> >
> > Just a note, I've tried this again with ovn-northd-ddlog built from
> > current OVN master branch, running against the same DBs:
>
> I've identified the problem. It's because of the ReachableLogicalRouter
> relation, which holds all pairs of routers (A,B) such that a packet at
> router A can transit switches and routers to arrive at router B. This
> is inherently O(n**2) and in this example n is about 8,000.
>
> I'll fix it.

The following is pretty close, but I see three test failures I need to
investigate:

273: ovn.at:11253 ovn -- vlan traffic for external network with distributed router gateway port -- ovn-northd-ddlog -- dp-groups=yes
641: ovn.at:26903 ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=yes
     proxy-arp
642: ovn.at:26903 ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=no
     proxy-arp

-8<--cut here-->8--
From: Ben Pfaff
Date: Wed, 30 Jun 2021 11:57:46 -0700
Subject: [PATCH ovn] ovn-northd-ddlog: Avoid N**2 blowup for N connected
 logical routers.

It's easy to implement "connected components" in raw DDlog, but it takes
N**2 time and space in the number of elements in a component. This was
a huge waste for a test case supplied by Dumitru Ceara that had over
8000 logical routers. This commit solves the problem by using the
"graph" transformer built in DDlog, which efficiently implements
connected components.

Signed-off-by: Ben Pfaff
Reported-by: Dumitru Ceara
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384519.html
---
 northd/lrouter.dl    | 38 ++
 northd/ovn_northd.dl |  4 +++-
 2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index ff9dc9844ffc..85c07716bb12 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -14,6 +14,7 @@
 import OVN_Northbound as nb
 import OVN_Southbound as sb
+import graph as graph
 import multicast
 import ovsdb
 import ovn
@@ -95,26 +96,31 @@ LogicalSwitchRouterPort(lsp, lsp_router_port, ls) :-
     ::Logical_Switch_Port(._uuid = lsp, .__type = "router", .options = options),
     Some{var lsp_router_port} = options.get("router-port").
 
+/* Undirected edges connecting one router and another.
+ * This is a building block for ConnectedLogicalRouter. */
+relation LogicalRouterEdge(a: uuid, b: uuid)
+LogicalRouterEdge(a, b) :-
+    FirstHopLogicalRouter(a, ls),
+    FirstHopLogicalRouter(b, ls),
+    a <= b.
+LogicalRouterEdge(a, b) :- PeerLogicalRouter(a, b).
+function edge_from(e: LogicalRouterEdge): uuid = { e.a }
+function edge_to(e: LogicalRouterEdge): uuid = { e.b }
+
 /*
- * Reachable routers.
+ * Sets of routers such that packets can transit directly or indirectly among
+ * any of the routers in a set. Any given router is in exactly one set.
  *
- * Each row in the relation indicates that routers 'a' and 'b' can reach each
- * other directly or indirectly through any chain of logical routers and
- * switches.
+ * Each row (set, elem) identifies the membership of router with UUID 'elem' in
+ * set 'set', where 'set' is the minimum UUID across all its elements.
  *
- * This relation is symmetric: if (a,b) then (b,a).
- * This relation is reflexive: (a,a) is always true.
+ * We implement this using the graph transformer because there is no
+ * way to implement "connected components" in raw DDlog that avoids O(n**2)
+ * blowup in the number of nodes in a component.
  */
-relation ReachableLogicalRouter(a: uuid, b: uuid)
-ReachableLogicalRouter(a, b), ReachableLogicalRouter(a, a), ReachableLogicalRouter(b, b) :-
-    PeerLogicalRouter(a, b).
-ReachableLogicalRouter(a, b), ReachableLogicalRouter(a, a), ReachableLogicalRouter(b, b) :-
-    FirstHopLogicalRouter(a, ls),
-    FirstHopLogicalRouter(b, ls),
-    a != b.
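
Editorial illustration, not part of the patch: the (set, elem) representation described in the patch comment can be sketched in Python. Each router is labeled with the minimum identifier in its connected component, so a component of N routers needs N rows instead of N**2 pairs. This is a sketch under assumptions: it uses a plain union-find, which is not claimed to be how DDlog's graph transformer works internally and which, unlike DDlog, is not incremental. The function name and the string router IDs (stand-ins for UUIDs) are hypothetical.

```python
def connected_components(nodes, edges):
    """Map each node to the smallest node in its connected component.

    `nodes` is an iterable of comparable IDs (hypothetical stand-ins for
    router UUIDs); `edges` is an iterable of undirected (a, b) pairs.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Always keep the smaller root, so the root of every set is
            # the minimum ID in that set.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    return {n: find(n) for n in parent}


if __name__ == "__main__":
    routers = ["r1", "r2", "r3", "r4", "r5"]
    links = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
    for elem, set_id in sorted(connected_components(routers, links).items()):
        print(set_id, elem)
```

With the sample input, r1, r2, and r3 share the label r1, while r4 and r5 share the label r4: five rows in total, where the pairwise relation would need 3*3 + 2*2 = 13.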
Re: [ovs-dev] ovn-northd-ddlog scale issues
On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
> On 5/20/21 5:50 PM, Ben Pfaff wrote:
> > On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> >> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> >>
> >> [...]
> >>
> >>>> Thanks! I can download them now. It's back on my to-do list.
> >>>
> >>> I can reproduce the problem now. I haven't fixed it yet, but I did fix
> >>> a nasty performance problem in ovn-nbctl that became really apparent
> >>> when working with your databases:
> >>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> >>
> >> I was wondering if you had a chance to look at this since.
> >
> > I haven't kept going. I consider my series that gives a 5x performance
> > improvement a kind of checkpoint along the way. I assumed at first that
> > it would get reviewed quickly so I could move on to other things, but no
> > one has looked at it yet.
> >
>
> Hi Ben,
>
> Just a note, I've tried this again with ovn-northd-ddlog built from
> current OVN master branch, running against the same DBs:

I've identified the problem. It's because of the ReachableLogicalRouter
relation, which holds all pairs of routers (A,B) such that a packet at
router A can transit switches and routers to arrive at router B. This
is inherently O(n**2) and in this example n is about 8,000.

I'll fix it.
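
Editorial illustration: to put a number on the O(n**2) remark, the short Python sketch below counts rows for a relation that stores every ordered pair of mutually reachable routers, against one row per router. It assumes, as a worst case that the thread does not confirm, that all of the roughly 8,000 routers sit in a single connected component; both function names are hypothetical.

```python
def pair_rows(component_sizes):
    """Rows needed to store every ordered pair (a, b) within each component,
    including (a, a), as a symmetric and reflexive relation does."""
    return sum(size * size for size in component_sizes)


def membership_rows(component_sizes):
    """Rows needed to store one (set, elem) row per router."""
    return sum(component_sizes)


if __name__ == "__main__":
    sizes = [8000]  # assumption: one component holding all ~8,000 routers
    print(pair_rows(sizes))        # 64000000
    print(membership_rows(sizes))  # 8000
```

Under that assumption the pairwise relation holds 64 million rows, which is the kind of growth that could plausibly explain the memory and startup time reported earlier in the thread.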
Re: [ovs-dev] ovn-northd-ddlog scale issues
On 5/20/21 5:50 PM, Ben Pfaff wrote:
> On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
>> On 4/7/21 6:49 PM, Ben Pfaff wrote:
>>
>> [...]
>>
>>>> Thanks! I can download them now. It's back on my to-do list.
>>>
>>> I can reproduce the problem now. I haven't fixed it yet, but I did fix
>>> a nasty performance problem in ovn-nbctl that became really apparent
>>> when working with your databases:
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
>>
>> I was wondering if you had a chance to look at this since.
>
> I haven't kept going. I consider my series that gives a 5x performance
> improvement a kind of checkpoint along the way. I assumed at first that
> it would get reviewed quickly so I could move on to other things, but no
> one has looked at it yet.
>

Hi Ben,

Just a note, I've tried this again with ovn-northd-ddlog built from
current OVN master branch, running against the same DBs:

http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/

I no longer see the huge memory usage initially reported;
ovn-northd-ddlog is now stable at ~5GB, but it still seems to spin
indefinitely (infinite loop?) when starting up.

Attaching GDB to it I see:

(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x011bd9fa in std::thread::park ()
#2 0x00f824e3 in crossbeam_channel::context::Context::wait_until ()
#3 0x00f82621 in crossbeam_channel::context::Context::with::{{closure}} ()
#4 0x00f83468 in crossbeam_channel::flavors::list::Channel::recv ()
#5 0x0149b60d in crossbeam_channel::channel::Receiver::recv ()
#6 0x00f4f2d0 in differential_datalog::program::RunningProgram::await_flush_ack ()
#7 0x00f4af4f in differential_datalog::program::RunningProgram::transaction_commit ()
#8 0x00507610 in ::transaction_commit_dump_changes ()
#9 0x00528bb0 in ddlog_transaction_commit_dump_changes ()
#10 0x0040a874 in ddlog_commit (delta=0x44ba7e0, ddlog=<optimized out>) at northd/ovn-northd-ddlog.c:335
#11 northd_parse_updates (updates=0x7ffd77a00aa0, ctx=0x444e690) at northd/ovn-northd-ddlog.c:464
#12 northd_run (ctx=0x444e690) at northd/ovn-northd-ddlog.c:566
#13 0x00408efc in main (argc=<optimized out>, argv=<optimized out>) at northd/ovn-northd-ddlog.c:1296

Regards,
Dumitru
Re: [ovs-dev] ovn-northd-ddlog scale issues
On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> On 4/7/21 6:49 PM, Ben Pfaff wrote:
>
> [...]
>
> >>
> >> Thanks! I can download them now. It's back on my to-do list.
> >
> > I can reproduce the problem now. I haven't fixed it yet, but I did fix
> > a nasty performance problem in ovn-nbctl that became really apparent
> > when working with your databases:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
>
> I was wondering if you had a chance to look at this since.

I haven't kept going. I consider my series that gives a 5x performance
improvement a kind of checkpoint along the way. I assumed at first that
it would get reviewed quickly so I could move on to other things, but no
one has looked at it yet.
Re: [ovs-dev] ovn-northd-ddlog scale issues
On 4/7/21 6:49 PM, Ben Pfaff wrote:

[...]

>>
>> Thanks! I can download them now. It's back on my to-do list.
>
> I can reproduce the problem now. I haven't fixed it yet, but I did fix
> a nasty performance problem in ovn-nbctl that became really apparent
> when working with your databases:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
>

Hi Ben,

I was wondering if you had a chance to look at this since.

Thanks,
Dumitru
Re: [ovs-dev] ovn-northd-ddlog scale issues
On Thu, Apr 01, 2021 at 09:54:10AM -0700, Ben Pfaff wrote:
> On Thu, Apr 01, 2021 at 06:33:21PM +0200, Dumitru Ceara wrote:
> > On 4/1/21 5:48 PM, Ben Pfaff wrote:
> > > On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> > >> Hi Ben,
> > >>
> > >> We discussed a bit about this during one of the recent IRC OVN meetings,
> > >> but I didn't get around to properly reporting this until now.
> > >>
> > >> I've tried running ovn-northd-ddlog against some large OVN NB/DB
> > >> databases extracted from one of our scale testing runs:
> > >>
> > >> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> > >>
> > >> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> > >> of memory:
> > >>
> > >> 775734 root 10 -10 81.6g 80.8g 22396 S 99.7 64.2 3:50.79 ovn-northd-ddlog
> > >
> > > I am game to try to reproduce and fix this. I haven't tried reproducing
> > > from a database snapshot before, so it'll be a new adventure.
> > >
> > > But those files are 403 Forbidden, even though the directory they're in
> > > comes up fine.
> > >
> >
> > Oops, really sorry about that, my bad.
> >
> > Can you, please, try again now?
>
> Thanks! I can download them now. It's back on my to-do list.

I can reproduce the problem now. I haven't fixed it yet, but I did fix
a nasty performance problem in ovn-nbctl that became really apparent
when working with your databases:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
Re: [ovs-dev] ovn-northd-ddlog scale issues
On Thu, Apr 01, 2021 at 06:33:21PM +0200, Dumitru Ceara wrote:
> On 4/1/21 5:48 PM, Ben Pfaff wrote:
> > On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> >> Hi Ben,
> >>
> >> We discussed a bit about this during one of the recent IRC OVN meetings,
> >> but I didn't get around to properly reporting this until now.
> >>
> >> I've tried running ovn-northd-ddlog against some large OVN NB/DB
> >> databases extracted from one of our scale testing runs:
> >>
> >> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> >>
> >> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> >> of memory:
> >>
> >> 775734 root 10 -10 81.6g 80.8g 22396 S 99.7 64.2 3:50.79 ovn-northd-ddlog
> >
> > I am game to try to reproduce and fix this. I haven't tried reproducing
> > from a database snapshot before, so it'll be a new adventure.
> >
> > But those files are 403 Forbidden, even though the directory they're in
> > comes up fine.
> >
>
> Oops, really sorry about that, my bad.
>
> Can you, please, try again now?

Thanks! I can download them now. It's back on my to-do list.
Re: [ovs-dev] ovn-northd-ddlog scale issues
On 4/1/21 5:48 PM, Ben Pfaff wrote:
> On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
>> Hi Ben,
>>
>> We discussed a bit about this during one of the recent IRC OVN meetings,
>> but I didn't get around to properly reporting this until now.
>>
>> I've tried running ovn-northd-ddlog against some large OVN NB/DB
>> databases extracted from one of our scale testing runs:
>>
>> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
>>
>> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
>> of memory:
>>
>> 775734 root 10 -10 81.6g 80.8g 22396 S 99.7 64.2 3:50.79 ovn-northd-ddlog
>
> I am game to try to reproduce and fix this. I haven't tried reproducing
> from a database snapshot before, so it'll be a new adventure.
>
> But those files are 403 Forbidden, even though the directory they're in
> comes up fine.
>

Oops, really sorry about that, my bad.

Can you, please, try again now?

Thanks,
Dumitru
Re: [ovs-dev] ovn-northd-ddlog scale issues
On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> Hi Ben,
>
> We discussed a bit about this during one of the recent IRC OVN meetings,
> but I didn't get around to properly reporting this until now.
>
> I've tried running ovn-northd-ddlog against some large OVN NB/DB
> databases extracted from one of our scale testing runs:
>
> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
>
> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> of memory:
>
> 775734 root 10 -10 81.6g 80.8g 22396 S 99.7 64.2 3:50.79 ovn-northd-ddlog

I am game to try to reproduce and fix this. I haven't tried reproducing
from a database snapshot before, so it'll be a new adventure.

But those files are 403 Forbidden, even though the directory they're in
comes up fine.
Re: [ovs-dev] ovn-northd-ddlog scale issues
I found the leaks and the fixes will be in v2 of the series.

On Thu, Mar 25, 2021 at 01:14:37PM -0700, Ben Pfaff wrote:
> OK. I guess I'll have to try it myself.
>
> On Thu, Mar 25, 2021 at 12:59:00PM +0530, Numan Siddique wrote:
> > Hi Ben,
> >
> > With your memory fixes applied, I still see many test failures due to
> > memory leaks
> > when configured as:
> >
> > ./configure --enable-Werror --enable-sparse CFLAGS="-g -fsanitize=address"
> >
> > You can find more details about the memory leaks here -
> > https://gist.github.com/numansiddique/e701777da977a89e24b49f159bb30d5d
> >
> > Thanks
> > Numan
> >
> >
> >
> > On Wed, Mar 24, 2021 at 9:19 PM Ben Pfaff wrote:
> > >
> > > Thanks a lot for the report! I need a new test case so I'll give this
> > > one a shot when I can.
> > >
> > > On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> > > > Hi Ben,
> > > >
> > > > We discussed a bit about this during one of the recent IRC OVN meetings,
> > > > but I didn't get around to properly reporting this until now.
> > > >
> > > > I've tried running ovn-northd-ddlog against some large OVN NB/DB
> > > > databases extracted from one of our scale testing runs:
> > > >
> > > > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> > > >
> > > > It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> > > > of memory:
> > > >
> > > > 775734 root 10 -10 81.6g 80.8g 22396 S 99.7 64.2 3:50.79 ovn-northd-ddlog
> > > >
> > > > ovn-northd-ddlog is stuck in an (infinite?) loop in
> > > > ddlog_transaction_commit_dump_changes():
> > > >
> > > > (gdb) bt
> > > > #0 0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > > > #1 0x00891eb3 in std::thread::park ()
> > > > #2 0x00530963 in crossbeam_channel::context::Context::wait_until ()
> > > > #3 0x00530aa1 in crossbeam_channel::context::Context::with::{{closure}} ()
> > > > #4 0x00531f5c in crossbeam_channel::flavors::list::Channel::recv ()
> > > > #5 0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
> > > > #6 0x00512156 in differential_datalog::program::RunningProgram::flush ()
> > > > #7 0x0050de67 in differential_datalog::program::RunningProgram::transaction_commit ()
> > > > #8 0x0093f17d in differential_datalog::ddlog::DDlog>::transaction_commit_dump_changes ()
> > > > #9 0x0040cc40 in ddlog_transaction_commit_dump_changes ()
> > > > #10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at northd/ovn-northd-ddlog.c:435
> > > > #11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at northd/ovn-northd-ddlog.c:435
> > > > #12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
> > > > #13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>) at northd/ovn-northd-ddlog.c:1203
> > > >
> > > > For comparison, running the C version of ovn-northd I get:
> > > >
> > > > 2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll interval (10916ms user, 334ms system)
> > > > 2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll interval (8046ms user, 48ms system)
> > > >
> > > > But after the northd iteration ends, memory usage is OK:
> > > >
> > > > 777567 root 10 -10 1657308 1.6g 3496 S 0.0 1.2 0:49.08 ovn-northd
> > > >
> > > > The behavior above is consistent both with current OVN master and also
> > > > when cherry-picking the ddlog-related patches that are pending in
> > > > patchwork:
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=233075
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=232480
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=233079
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=233080
> > > >
> > > > I didn't try out the changes from the following series though as I
> > > > understand they need a v2:
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=232040
> > > >
> > > > Regards,
> > > > Dumitru
> > > >
Re: [ovs-dev] ovn-northd-ddlog scale issues
OK. I guess I'll have to try it myself.

On Thu, Mar 25, 2021 at 12:59:00PM +0530, Numan Siddique wrote:
> Hi Ben,
>
> With your memory fixes applied, I still see many test failures due to
> memory leaks
> when configured as:
>
> ./configure --enable-Werror --enable-sparse CFLAGS="-g -fsanitize=address"
>
> You can find more details about the memory leaks here -
> https://gist.github.com/numansiddique/e701777da977a89e24b49f159bb30d5d
>
> Thanks
> Numan
>
>
>
> On Wed, Mar 24, 2021 at 9:19 PM Ben Pfaff wrote:
> >
> > Thanks a lot for the report! I need a new test case so I'll give this
> > one a shot when I can.
> >
> > On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> > > Hi Ben,
> > >
> > > We discussed a bit about this during one of the recent IRC OVN meetings,
> > > but I didn't get around to properly reporting this until now.
> > >
> > > I've tried running ovn-northd-ddlog against some large OVN NB/DB
> > > databases extracted from one of our scale testing runs:
> > >
> > > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> > >
> > > It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> > > of memory:
> > >
> > > 775734 root 10 -10 81.6g 80.8g 22396 S 99.7 64.2 3:50.79 ovn-northd-ddlog
> > >
> > > ovn-northd-ddlog is stuck in an (infinite?) loop in
> > > ddlog_transaction_commit_dump_changes():
> > >
> > > (gdb) bt
> > > #0 0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > > #1 0x00891eb3 in std::thread::park ()
> > > #2 0x00530963 in crossbeam_channel::context::Context::wait_until ()
> > > #3 0x00530aa1 in crossbeam_channel::context::Context::with::{{closure}} ()
> > > #4 0x00531f5c in crossbeam_channel::flavors::list::Channel::recv ()
> > > #5 0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
> > > #6 0x00512156 in differential_datalog::program::RunningProgram::flush ()
> > > #7 0x0050de67 in differential_datalog::program::RunningProgram::transaction_commit ()
> > > #8 0x0093f17d in differential_datalog::ddlog::DDlog>::transaction_commit_dump_changes ()
> > > #9 0x0040cc40 in ddlog_transaction_commit_dump_changes ()
> > > #10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at northd/ovn-northd-ddlog.c:435
> > > #11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at northd/ovn-northd-ddlog.c:435
> > > #12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
> > > #13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>) at northd/ovn-northd-ddlog.c:1203
> > >
> > > For comparison, running the C version of ovn-northd I get:
> > >
> > > 2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll interval (10916ms user, 334ms system)
> > > 2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll interval (8046ms user, 48ms system)
> > >
> > > But after the northd iteration ends, memory usage is OK:
> > >
> > > 777567 root 10 -10 1657308 1.6g 3496 S 0.0 1.2 0:49.08 ovn-northd
> > >
> > > The behavior above is consistent both with current OVN master and also
> > > when cherry-picking the ddlog-related patches that are pending in
> > > patchwork:
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=233075
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=232480
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=233079
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=233080
> > >
> > > I didn't try out the changes from the following series though as I
> > > understand they need a v2:
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=232040
> > >
> > > Regards,
> > > Dumitru
> > >
Re: [ovs-dev] ovn-northd-ddlog scale issues
Hi Ben,

With your memory fixes applied, I still see many test failures due to
memory leaks when configured as:

./configure --enable-Werror --enable-sparse CFLAGS="-g -fsanitize=address"

You can find more details about the memory leaks here -
https://gist.github.com/numansiddique/e701777da977a89e24b49f159bb30d5d

Thanks
Numan

On Wed, Mar 24, 2021 at 9:19 PM Ben Pfaff wrote:
>
> Thanks a lot for the report! I need a new test case so I'll give this
> one a shot when I can.
>
> On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> > Hi Ben,
> >
> > We discussed a bit about this during one of the recent IRC OVN meetings,
> > but I didn't get around to properly reporting this until now.
> >
> > I've tried running ovn-northd-ddlog against some large OVN NB/DB
> > databases extracted from one of our scale testing runs:
> >
> > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> >
> > It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> > of memory:
> >
> > 775734 root 10 -10 81.6g 80.8g 22396 S 99.7 64.2 3:50.79
> > ovn-northd-ddlog
> >
> > ovn-northd-ddlog is stuck in an (infinite?) loop in
> > ddlog_transaction_commit_dump_changes():
> >
> > (gdb) bt
> > #0 0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1 0x00891eb3 in std::thread::park ()
> > #2 0x00530963 in crossbeam_channel::context::Context::wait_until ()
> > #3 0x00530aa1 in
> > crossbeam_channel::context::Context::with::{{closure}} ()
> > #4 0x00531f5c in
> > crossbeam_channel::flavors::list::Channel::recv ()
> > #5 0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
> > #6 0x00512156 in
> > differential_datalog::program::RunningProgram::flush ()
> > #7 0x0050de67 in
> > differential_datalog::program::RunningProgram::transaction_commit ()
> > #8 0x0093f17d in
> > differential_datalog::ddlog::DDlog>::transaction_commit_dump_changes ()
> > #9 0x0040cc40 in ddlog_transaction_commit_dump_changes ()
> > #10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at
> > northd/ovn-northd-ddlog.c:435
> > #11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at
> > northd/ovn-northd-ddlog.c:435
> > #12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
> > #13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>)
> > at northd/ovn-northd-ddlog.c:1203
> >
> > For comparison, running the C version of ovn-northd I get:
> >
> > 2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll
> > interval (10916ms user, 334ms system)
> > 2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll
> > interval (8046ms user, 48ms system)
> >
> > But after the northd iteration ends, memory usage is OK:
> >
> > 777567 root 10 -10 1657308 1.6g 3496 S 0.0 1.2 0:49.08
> > ovn-northd
> >
> > The behavior above is consistent both with current OVN master and also
> > when cherry-picking the ddlog-related patches that are pending in
> > patchwork:
> > http://patchwork.ozlabs.org/project/ovn/list/?series=233075
> > http://patchwork.ozlabs.org/project/ovn/list/?series=232480
> > http://patchwork.ozlabs.org/project/ovn/list/?series=233079
> > http://patchwork.ozlabs.org/project/ovn/list/?series=233080
> >
> > I didn't try out the changes from the following series though as I
> > understand they need a v2:
> > http://patchwork.ozlabs.org/project/ovn/list/?series=232040
> >
> > Regards,
> > Dumitru
Re: [ovs-dev] ovn-northd-ddlog scale issues
Thanks a lot for the report! I need a new test case so I'll give this
one a shot when I can.

On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> Hi Ben,
>
> We discussed a bit about this during one of the recent IRC OVN meetings,
> but I didn't get around to properly reporting this until now.
>
> I've tried running ovn-northd-ddlog against some large OVN NB/DB
> databases extracted from one of our scale testing runs:
>
> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
>
> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> of memory:
>
> 775734 root 10 -10 81.6g 80.8g 22396 S 99.7 64.2 3:50.79
> ovn-northd-ddlog
>
> ovn-northd-ddlog is stuck in an (infinite?) loop in
> ddlog_transaction_commit_dump_changes():
>
> (gdb) bt
> #0 0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x00891eb3 in std::thread::park ()
> #2 0x00530963 in crossbeam_channel::context::Context::wait_until ()
> #3 0x00530aa1 in
> crossbeam_channel::context::Context::with::{{closure}} ()
> #4 0x00531f5c in crossbeam_channel::flavors::list::Channel::recv
> ()
> #5 0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
> #6 0x00512156 in
> differential_datalog::program::RunningProgram::flush ()
> #7 0x0050de67 in
> differential_datalog::program::RunningProgram::transaction_commit ()
> #8 0x0093f17d in
> differential_datalog::ddlog::DDlog>::transaction_commit_dump_changes ()
> #9 0x0040cc40 in ddlog_transaction_commit_dump_changes ()
> #10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at
> northd/ovn-northd-ddlog.c:435
> #11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at
> northd/ovn-northd-ddlog.c:435
> #12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
> #13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>)
> at northd/ovn-northd-ddlog.c:1203
>
> For comparison, running the C version of ovn-northd I get:
>
> 2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll
> interval (10916ms user, 334ms system)
> 2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll
> interval (8046ms user, 48ms system)
>
> But after the northd iteration ends, memory usage is OK:
>
> 777567 root 10 -10 1657308 1.6g 3496 S 0.0 1.2 0:49.08
> ovn-northd
>
> The behavior above is consistent both with current OVN master and also
> when cherry-picking the ddlog-related patches that are pending in
> patchwork:
> http://patchwork.ozlabs.org/project/ovn/list/?series=233075
> http://patchwork.ozlabs.org/project/ovn/list/?series=232480
> http://patchwork.ozlabs.org/project/ovn/list/?series=233079
> http://patchwork.ozlabs.org/project/ovn/list/?series=233080
>
> I didn't try out the changes from the following series though as I
> understand they need a v2:
> http://patchwork.ozlabs.org/project/ovn/list/?series=232040
>
> Regards,
> Dumitru
[ovs-dev] ovn-northd-ddlog scale issues
Hi Ben,

We discussed a bit about this during one of the recent IRC OVN meetings,
but I didn't get around to properly reporting this until now.

I've tried running ovn-northd-ddlog against some large OVN NB/DB
databases extracted from one of our scale testing runs:

http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/

It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
of memory:

775734 root 10 -10 81.6g 80.8g 22396 S 99.7 64.2 3:50.79 ovn-northd-ddlog

ovn-northd-ddlog is stuck in an (infinite?) loop in
ddlog_transaction_commit_dump_changes():

(gdb) bt
#0 0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00891eb3 in std::thread::park ()
#2 0x00530963 in crossbeam_channel::context::Context::wait_until ()
#3 0x00530aa1 in crossbeam_channel::context::Context::with::{{closure}} ()
#4 0x00531f5c in crossbeam_channel::flavors::list::Channel::recv ()
#5 0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
#6 0x00512156 in differential_datalog::program::RunningProgram::flush ()
#7 0x0050de67 in differential_datalog::program::RunningProgram::transaction_commit ()
#8 0x0093f17d in ::transaction_commit_dump_changes ()
#9 0x0040cc40 in ddlog_transaction_commit_dump_changes ()
#10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at northd/ovn-northd-ddlog.c:435
#11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at northd/ovn-northd-ddlog.c:435
#12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
#13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>) at northd/ovn-northd-ddlog.c:1203

For comparison, running the C version of ovn-northd I get:

2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll interval (10916ms user, 334ms system)
2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll interval (8046ms user, 48ms system)

But after the northd iteration ends, memory usage is OK:

777567 root 10 -10 1657308 1.6g 3496 S 0.0 1.2 0:49.08 ovn-northd

The behavior above is consistent both with current OVN master and also
when cherry-picking the ddlog-related patches that are pending in
patchwork:
http://patchwork.ozlabs.org/project/ovn/list/?series=233075
http://patchwork.ozlabs.org/project/ovn/list/?series=232480
http://patchwork.ozlabs.org/project/ovn/list/?series=233079
http://patchwork.ozlabs.org/project/ovn/list/?series=233080

I didn't try out the changes from the following series though as I
understand they need a v2:
http://patchwork.ozlabs.org/project/ovn/list/?series=232040

Regards,
Dumitru