Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-07-01 Thread Ben Pfaff
On Thu, Jul 01, 2021 at 12:18:12AM -0700, Ben Pfaff wrote:
> On Tue, Jun 29, 2021 at 03:32:26PM -0700, Ben Pfaff wrote:
> > On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
> > > On 5/20/21 5:50 PM, Ben Pfaff wrote:
> > > > On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> > > >> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> > > >>
> > > >> [...]
> > > >>
> > > >>>>
> > > >>>> Thanks!  I can download them now.  It's back on my to-do list.
> > > >>>
> > > >>> I can reproduce the problem now.  I haven't fixed it yet, but I did fix
> > > >>> a nasty performance problem in ovn-nbctl that became really apparent
> > > >>> when working with your databases:
> > > >>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> > > >>
> > > >> I was wondering if you had a chance to look at this since.
> > > > 
> > > > I haven't kept going.  I consider my series that gives a 5x performance
> > > > improvement a kind of checkpoint along the way.  I assumed at first that
> > > > it would get reviewed quickly so I could move on to other things, but no
> > > > one has looked at it yet.
> > > > 
> > > 
> > > 
> > > Hi Ben,
> > > 
> > > Just a note, I've tried this again with ovn-northd-ddlog built from
> > > current OVN master branch, running against the same DBs:
> > 
> > I've identified the problem.  It's because of the ReachableLogicalRouter
> > relation, which holds all pairs of routers (A,B) such that a packet at
> > router A can transit switches and routers to arrive at router B.  This
> > is inherently O(n**2) and in this example n is about 8,000.
> > 
> > I'll fix it.
> 
> The following is pretty close, but I see three test failures I need to
> investigate:
> 
>  273: ovn.at:11253   ovn -- vlan traffic for external network with distributed router gateway port -- ovn-northd-ddlog -- dp-groups=yes
>  641: ovn.at:26903   ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=yes
>   proxy-arp
>  642: ovn.at:26903   ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=no
>   proxy-arp

These test failures were red herrings.  I posted a formal version of
this patch for review:
https://patchwork.ozlabs.org/project/ovn/patch/20210702005640.1627098-1-...@ovn.org/
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-07-01 Thread Numan Siddique
On Thu, Jul 1, 2021 at 4:31 AM Dumitru Ceara  wrote:
>
> On 7/1/21 9:18 AM, Ben Pfaff wrote:
> > On Tue, Jun 29, 2021 at 03:32:26PM -0700, Ben Pfaff wrote:
> >> On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
> >>> On 5/20/21 5:50 PM, Ben Pfaff wrote:
> >>>> On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> >>>>> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>>>
> >>>>>>> Thanks!  I can download them now.  It's back on my to-do list.
> >>>>>>
> >>>>>> I can reproduce the problem now.  I haven't fixed it yet, but I did fix
> >>>>>> a nasty performance problem in ovn-nbctl that became really apparent
> >>>>>> when working with your databases:
> >>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> >>>>>
> >>>>> I was wondering if you had a chance to look at this since.
> >>>>
> >>>> I haven't kept going.  I consider my series that gives a 5x performance
> >>>> improvement a kind of checkpoint along the way.  I assumed at first that
> >>>> it would get reviewed quickly so I could move on to other things, but no
> >>>> one has looked at it yet.
> >>>>
> >>>
> >>>
> >>> Hi Ben,
> >>>
> >>> Just a note, I've tried this again with ovn-northd-ddlog built from
> >>> current OVN master branch, running against the same DBs:
> >>
> >> I've identified the problem.  It's because of the ReachableLogicalRouter
> >> relation, which holds all pairs of routers (A,B) such that a packet at
> >> router A can transit switches and routers to arrive at router B.  This
> >> is inherently O(n**2) and in this example n is about 8,000.
> >>
> >> I'll fix it.
> >
> > The following is pretty close, but I see three test failures I need to
> > investigate:
> >
> >  273: ovn.at:11253   ovn -- vlan traffic for external network with distributed router gateway port -- ovn-northd-ddlog -- dp-groups=yes
> >  641: ovn.at:26903   ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=yes
> >   proxy-arp
> >  642: ovn.at:26903   ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=no
> >   proxy-arp
>
> Thanks Ben!
>
> The two proxy-arp failures are likely because the proxy-arp
> implementation is missing in ovn-northd ddlog.  Numan started working on it:
>
> http://patchwork.ozlabs.org/project/ovn/patch/20210629160849.4130753-1-num...@ovn.org/

v2 of this patch is posted at
https://patchwork.ozlabs.org/project/ovn/patch/20210701124521.2095748-1-num...@ovn.org/
and it fixes these test failures.  Please take a look.

Thanks
Numan

>
> I'll try out your patch as soon as I get the chance.
>
> Regards,
> Dumitru
>


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-07-01 Thread Dumitru Ceara
On 7/1/21 9:18 AM, Ben Pfaff wrote:
> On Tue, Jun 29, 2021 at 03:32:26PM -0700, Ben Pfaff wrote:
>> On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
>>> On 5/20/21 5:50 PM, Ben Pfaff wrote:
>>>> On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
>>>>> On 4/7/21 6:49 PM, Ben Pfaff wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>>
>>>>>>> Thanks!  I can download them now.  It's back on my to-do list.
>>>>>>
>>>>>> I can reproduce the problem now.  I haven't fixed it yet, but I did fix
>>>>>> a nasty performance problem in ovn-nbctl that became really apparent
>>>>>> when working with your databases:
>>>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
>>>>>
>>>>> I was wondering if you had a chance to look at this since.
>>>>
>>>> I haven't kept going.  I consider my series that gives a 5x performance
>>>> improvement a kind of checkpoint along the way.  I assumed at first that
>>>> it would get reviewed quickly so I could move on to other things, but no
>>>> one has looked at it yet.
>>>>
>>>
>>>
>>> Hi Ben,
>>>
>>> Just a note, I've tried this again with ovn-northd-ddlog built from
>>> current OVN master branch, running against the same DBs:
>>
>> I've identified the problem.  It's because of the ReachableLogicalRouter
>> relation, which holds all pairs of routers (A,B) such that a packet at
>> router A can transit switches and routers to arrive at router B.  This
>> is inherently O(n**2) and in this example n is about 8,000.
>>
>> I'll fix it.
> 
> The following is pretty close, but I see three test failures I need to
> investigate:
> 
>  273: ovn.at:11253   ovn -- vlan traffic for external network with distributed router gateway port -- ovn-northd-ddlog -- dp-groups=yes
>  641: ovn.at:26903   ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=yes
>   proxy-arp
>  642: ovn.at:26903   ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=no
>   proxy-arp

Thanks Ben!

The two proxy-arp failures are likely because the proxy-arp
implementation is missing in ovn-northd ddlog.  Numan started working on it:

http://patchwork.ozlabs.org/project/ovn/patch/20210629160849.4130753-1-num...@ovn.org/

I'll try out your patch as soon as I get the chance.

Regards,
Dumitru



Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-07-01 Thread Ben Pfaff
On Tue, Jun 29, 2021 at 03:32:26PM -0700, Ben Pfaff wrote:
> On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
> > On 5/20/21 5:50 PM, Ben Pfaff wrote:
> > > On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> > >> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> > >>
> > >> [...]
> > >>
> > >>>>
> > >>>> Thanks!  I can download them now.  It's back on my to-do list.
> > >>>
> > >>> I can reproduce the problem now.  I haven't fixed it yet, but I did fix
> > >>> a nasty performance problem in ovn-nbctl that became really apparent
> > >>> when working with your databases:
> > >>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> > >>
> > >> I was wondering if you had a chance to look at this since.
> > > 
> > > I haven't kept going.  I consider my series that gives a 5x performance
> > > improvement a kind of checkpoint along the way.  I assumed at first that
> > > it would get reviewed quickly so I could move on to other things, but no
> > > one has looked at it yet.
> > > 
> > 
> > 
> > Hi Ben,
> > 
> > Just a note, I've tried this again with ovn-northd-ddlog built from
> > current OVN master branch, running against the same DBs:
> 
> I've identified the problem.  It's because of the ReachableLogicalRouter
> relation, which holds all pairs of routers (A,B) such that a packet at
> router A can transit switches and routers to arrive at router B.  This
> is inherently O(n**2) and in this example n is about 8,000.
> 
> I'll fix it.

The following is pretty close, but I see three test failures I need to
investigate:

 273: ovn.at:11253   ovn -- vlan traffic for external network with distributed router gateway port -- ovn-northd-ddlog -- dp-groups=yes
 641: ovn.at:26903   ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=yes
  proxy-arp
 642: ovn.at:26903   ovn -- proxy-arp: 1 HVs, 1 LSs, 1 lport/LS, 1 LR -- ovn-northd-ddlog -- dp-groups=no
  proxy-arp

-8<--cut here-->8--

From: Ben Pfaff 
Date: Wed, 30 Jun 2021 11:57:46 -0700
Subject: [PATCH ovn] ovn-northd-ddlog: Avoid N**2 blowup for N connected
 logical routers.

It's easy to implement "connected components" in raw DDlog, but it
takes N**2 time and space in the number of elements in a component.
This was a huge waste for a test case supplied by Dumitru Ceara that
had over 8000 logical routers.  This commit solves the problem by using
the "graph" transformer built in DDlog, which efficiently implements
connected components.

Signed-off-by: Ben Pfaff 
Reported-by: Dumitru Ceara 
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384519.html
---
 northd/lrouter.dl| 38 ++
 northd/ovn_northd.dl |  4 +++-
 2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/northd/lrouter.dl b/northd/lrouter.dl
index ff9dc9844ffc..85c07716bb12 100644
--- a/northd/lrouter.dl
+++ b/northd/lrouter.dl
@@ -14,6 +14,7 @@
 
 import OVN_Northbound as nb
 import OVN_Southbound as sb
+import graph as graph
 import multicast
 import ovsdb
 import ovn
@@ -95,26 +96,31 @@ LogicalSwitchRouterPort(lsp, lsp_router_port, ls) :-
   ::Logical_Switch_Port(._uuid = lsp, .__type = "router", .options = options),
   Some{var lsp_router_port} = options.get("router-port").
 
+/* Undirected edges connecting one router and another.
+ * This is a building block for ConnectedLogicalRouter. */
+relation LogicalRouterEdge(a: uuid, b: uuid)
+LogicalRouterEdge(a, b) :-
+FirstHopLogicalRouter(a, ls),
+FirstHopLogicalRouter(b, ls),
+a <= b.
+LogicalRouterEdge(a, b) :- PeerLogicalRouter(a, b).
+function edge_from(e: LogicalRouterEdge): uuid = { e.a }
+function edge_to(e: LogicalRouterEdge): uuid = { e.b }
+
 /*
- * Reachable routers.
+ * Sets of routers such that packets can transit directly or indirectly among
+ * any of the routers in a set.  Any given router is in exactly one set.
  *
- * Each row in the relation indicates that routers 'a' and 'b' can reach each
- * other directly or indirectly through any chain of logical routers and
- * switches.
+ * Each row (set, elem) identifies the membership of router with UUID 'elem' in
+ * set 'set', where 'set' is the minimum UUID across all its elements.
  *
- * This relation is symmetric: if (a,b) then (b,a).
- * This relation is reflexive: (a,a) is always true.
+ * We implement this using the graph transformer because there is no
+ * way to implement "connected components" in raw DDlog that avoids O(n**2)
+ * blowup in the number of nodes in a component.
  */
-relation ReachableLogicalRouter(a: uuid, b: uuid)
-ReachableLogicalRouter(a, b), ReachableLogicalRouter(a, a), ReachableLogicalRouter(b, b) :-
-PeerLogicalRouter(a, b).
-ReachableLogicalRouter(a, b), ReachableLogicalRouter(a, a), ReachableLogicalRouter(b, b) :-
-FirstHopLogicalRouter(a, ls),
-FirstHopLogicalRouter(b, ls),
-a != b.

Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-06-29 Thread Ben Pfaff
On Mon, Jun 28, 2021 at 05:40:53PM +0200, Dumitru Ceara wrote:
> On 5/20/21 5:50 PM, Ben Pfaff wrote:
> > On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> >> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> >>
> >> [...]
> >>
> >>>>
> >>>> Thanks!  I can download them now.  It's back on my to-do list.
> >>>
> >>> I can reproduce the problem now.  I haven't fixed it yet, but I did fix
> >>> a nasty performance problem in ovn-nbctl that became really apparent
> >>> when working with your databases:
> >>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> >>
> >> I was wondering if you had a chance to look at this since.
> > 
> > I haven't kept going.  I consider my series that gives a 5x performance
> > improvement a kind of checkpoint along the way.  I assumed at first that
> > it would get reviewed quickly so I could move on to other things, but no
> > one has looked at it yet.
> > 
> 
> 
> Hi Ben,
> 
> Just a note, I've tried this again with ovn-northd-ddlog built from
> current OVN master branch, running against the same DBs:

I've identified the problem.  It's because of the ReachableLogicalRouter
relation, which holds all pairs of routers (A,B) such that a packet at
router A can transit switches and routers to arrive at router B.  This
is inherently O(n**2) and in this example n is about 8,000.

I'll fix it.
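
As an illustrative aside (not code from ovn-northd-ddlog or from any patch
in this thread), the sketch below contrasts the two representations for a
single connected group of routers: one row per router naming its component,
versus one row per ordered pair (A,B).  It is written in Python with a plain
union-find, which is an assumption made for brevity and not how DDlog
computes anything.  The names components() and reachable_pairs() and the
six-router chain are hypothetical.

```python
from collections import defaultdict


def components(nodes, edges):
    """Map each node to the smallest node in its connected component."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller root so every root is its component's minimum.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb
    return {n: find(n) for n in nodes}


def reachable_pairs(labels):
    """Expand component labels into every ordered pair (a, b), including (a, a)."""
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return {(a, b) for members in groups.values() for a in members for b in members}


# Hypothetical topology: six routers connected in a chain, 0-1-2-3-4-5.
routers = list(range(6))
edges = [(i, i + 1) for i in routers[:-1]]

labels = components(routers, edges)   # one row per router
pairs = reachable_pairs(labels)       # one row per ordered pair
print(len(labels), len(pairs))        # 6 rows versus 36 rows
print(8000 * 8000)                    # 64000000 pairs for one 8,000-router component
```

The per-router labelling grows linearly with the number of routers, while the
pairwise form grows with its square, which is where the cost comes from once
n is about 8,000.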


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-06-28 Thread Dumitru Ceara
On 5/20/21 5:50 PM, Ben Pfaff wrote:
> On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
>> On 4/7/21 6:49 PM, Ben Pfaff wrote:
>>
>> [...]
>>

>>>> Thanks!  I can download them now.  It's back on my to-do list.
>>>
>>> I can reproduce the problem now.  I haven't fixed it yet, but I did fix
>>> a nasty performance problem in ovn-nbctl that became really apparent
>>> when working with your databases:
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
>>
>> I was wondering if you had a chance to look at this since.
> 
> I haven't kept going.  I consider my series that gives a 5x performance
> improvement a kind of checkpoint along the way.  I assumed at first that
> it would get reviewed quickly so I could move on to other things, but no
> one has looked at it yet.
> 


Hi Ben,

Just a note, I've tried this again with ovn-northd-ddlog built from
current OVN master branch, running against the same DBs:

http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/

I don't see the huge memory usage initially reported; ovn-northd-ddlog is
now stable at ~5GB, but it still seems to spin indefinitely (infinite
loop?) when starting up.

Attaching GDB to it I see:

(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x011bd9fa in std::thread::park ()
#2  0x00f824e3 in crossbeam_channel::context::Context::wait_until ()
#3  0x00f82621 in crossbeam_channel::context::Context::with::{{closure}} ()
#4  0x00f83468 in crossbeam_channel::flavors::list::Channel::recv ()
#5  0x0149b60d in crossbeam_channel::channel::Receiver::recv ()
#6  0x00f4f2d0 in differential_datalog::program::RunningProgram::await_flush_ack ()
#7  0x00f4af4f in differential_datalog::program::RunningProgram::transaction_commit ()
#8  0x00507610 in ::transaction_commit_dump_changes ()
#9  0x00528bb0 in ddlog_transaction_commit_dump_changes ()
#10 0x0040a874 in ddlog_commit (delta=0x44ba7e0, ddlog=<optimized out>) at northd/ovn-northd-ddlog.c:335
#11 northd_parse_updates (updates=0x7ffd77a00aa0, ctx=0x444e690) at northd/ovn-northd-ddlog.c:464
#12 northd_run (ctx=0x444e690) at northd/ovn-northd-ddlog.c:566
#13 0x00408efc in main (argc=<optimized out>, argv=<optimized out>) at northd/ovn-northd-ddlog.c:1296

Regards,
Dumitru



Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-05-20 Thread Ben Pfaff
On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> 
> [...]
> 
> >>
> >> Thanks!  I can download them now.  It's back on my to-do list.
> > 
> > I can reproduce the problem now.  I haven't fixed it yet, but I did fix
> > a nasty performance problem in ovn-nbctl that became really apparent
> > when working with your databases:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> 
> I was wondering if you had a chance to look at this since.

I haven't kept going.  I consider my series that gives a 5x performance
improvement a kind of checkpoint along the way.  I assumed at first that
it would get reviewed quickly so I could move on to other things, but no
one has looked at it yet.


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-05-20 Thread Dumitru Ceara
On 4/7/21 6:49 PM, Ben Pfaff wrote:

[...]

>>
>> Thanks!  I can download them now.  It's back on my to-do list.
> 
> I can reproduce the problem now.  I haven't fixed it yet, but I did fix
> a nasty performance problem in ovn-nbctl that became really apparent
> when working with your databases:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> 

Hi Ben,

I was wondering if you had a chance to look at this since.

Thanks,
Dumitru



Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-04-07 Thread Ben Pfaff
On Thu, Apr 01, 2021 at 09:54:10AM -0700, Ben Pfaff wrote:
> On Thu, Apr 01, 2021 at 06:33:21PM +0200, Dumitru Ceara wrote:
> > On 4/1/21 5:48 PM, Ben Pfaff wrote:
> > > On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> > >> Hi Ben,
> > >>
> > >> We discussed a bit about this during one of the recent IRC OVN meetings,
> > >> but I didn't get around to properly reporting this until now.
> > >>
> > >> I've tried running ovn-northd-ddlog against some large OVN NB/DB
> > >> databases extracted from one of our scale testing runs:
> > >>
> > >> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> > >>
> > >> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> > >> of memory:
> > >>
> > > >> 775734 root  10 -10   81.6g  80.8g  22396 S  99.7  64.2   3:50.79 ovn-northd-ddlog
> > > 
> > > I am game to try to reproduce and fix this.  I haven't tried reproducing
> > > from a database snapshot before, so it'll be a new adventure.
> > > 
> > > But those files are 403 Forbidden, even though the directory they're in
> > > comes up fine.
> > > 
> > 
> > Oops, really sorry about that, my bad.
> > 
> > Can you, please, try again now?
> 
> Thanks!  I can download them now.  It's back on my to-do list.

I can reproduce the problem now.  I haven't fixed it yet, but I did fix
a nasty performance problem in ovn-nbctl that became really apparent
when working with your databases:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-04-01 Thread Ben Pfaff
On Thu, Apr 01, 2021 at 06:33:21PM +0200, Dumitru Ceara wrote:
> On 4/1/21 5:48 PM, Ben Pfaff wrote:
> > On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> >> Hi Ben,
> >>
> >> We discussed a bit about this during one of the recent IRC OVN meetings,
> >> but I didn't get around to properly reporting this until now.
> >>
> >> I've tried running ovn-northd-ddlog against some large OVN NB/DB
> >> databases extracted from one of our scale testing runs:
> >>
> >> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> >>
> >> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> >> of memory:
> >>
> > >> 775734 root  10 -10   81.6g  80.8g  22396 S  99.7  64.2   3:50.79 ovn-northd-ddlog
> > 
> > I am game to try to reproduce and fix this.  I haven't tried reproducing
> > from a database snapshot before, so it'll be a new adventure.
> > 
> > But those files are 403 Forbidden, even though the directory they're in
> > comes up fine.
> > 
> 
> Oops, really sorry about that, my bad.
> 
> Can you, please, try again now?

Thanks!  I can download them now.  It's back on my to-do list.


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-04-01 Thread Dumitru Ceara
On 4/1/21 5:48 PM, Ben Pfaff wrote:
> On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
>> Hi Ben,
>>
>> We discussed a bit about this during one of the recent IRC OVN meetings,
>> but I didn't get around to properly reporting this until now.
>>
>> I've tried running ovn-northd-ddlog against some large OVN NB/DB
>> databases extracted from one of our scale testing runs:
>>
>> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
>>
>> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
>> of memory:
>>
>> 775734 root  10 -10   81.6g  80.8g  22396 S  99.7  64.2   3:50.79 ovn-northd-ddlog
> 
> I am game to try to reproduce and fix this.  I haven't tried reproducing
> from a database snapshot before, so it'll be a new adventure.
> 
> But those files are 403 Forbidden, even though the directory they're in
> comes up fine.
> 

Oops, really sorry about that, my bad.

Can you, please, try again now?

Thanks,
Dumitru



Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-04-01 Thread Ben Pfaff
On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> Hi Ben,
> 
> We discussed a bit about this during one of the recent IRC OVN meetings,
> but I didn't get around to properly reporting this until now.
> 
> I've tried running ovn-northd-ddlog against some large OVN NB/DB
> databases extracted from one of our scale testing runs:
> 
> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> 
> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> of memory:
> 
> 775734 root  10 -10   81.6g  80.8g  22396 S  99.7  64.2   3:50.79 ovn-northd-ddlog

I am game to try to reproduce and fix this.  I haven't tried reproducing
from a database snapshot before, so it'll be a new adventure.

But those files are 403 Forbidden, even though the directory they're in
comes up fine.


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-03-31 Thread Ben Pfaff
I found the leaks and the fixes will be in v2 of the series.

On Thu, Mar 25, 2021 at 01:14:37PM -0700, Ben Pfaff wrote:
> OK.  I guess I'll have to try it myself.
> 
> On Thu, Mar 25, 2021 at 12:59:00PM +0530, Numan Siddique wrote:
> > Hi Ben,
> > 
> > With your memory fixes applied, I still see many test failures due to
> > memory leaks
> > when configured as:
> > 
> > ./configure --enable-Werror --enable-sparse CFLAGS="-g -fsanitize=address"
> > 
> > You can find more details about the memory leaks here -
> > https://gist.github.com/numansiddique/e701777da977a89e24b49f159bb30d5d
> > 
> > Thanks
> > Numan
> > 
> > 
> > 
> > On Wed, Mar 24, 2021 at 9:19 PM Ben Pfaff  wrote:
> > >
> > > Thanks a lot for the report!  I need a new test case so I'll give this
> > > one a shot when I can.
> > >
> > > On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> > > > Hi Ben,
> > > >
> > > > We discussed a bit about this during one of the recent IRC OVN meetings,
> > > > but I didn't get around to properly reporting this until now.
> > > >
> > > > I've tried running ovn-northd-ddlog against some large OVN NB/DB
> > > > databases extracted from one of our scale testing runs:
> > > >
> > > > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> > > >
> > > > It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> > > > of memory:
> > > >
> > > > 775734 root  10 -10   81.6g  80.8g  22396 S  99.7  64.2   3:50.79 ovn-northd-ddlog
> > > >
> > > > ovn-northd-ddlog is stuck in an (infinite?) loop in
> > > > ddlog_transaction_commit_dump_changes():
> > > >
> > > > (gdb) bt
> > > > #0  0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > > > #1  0x00891eb3 in std::thread::park ()
> > > > #2  0x00530963 in crossbeam_channel::context::Context::wait_until ()
> > > > #3  0x00530aa1 in crossbeam_channel::context::Context::with::{{closure}} ()
> > > > #4  0x00531f5c in crossbeam_channel::flavors::list::Channel::recv ()
> > > > #5  0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
> > > > #6  0x00512156 in differential_datalog::program::RunningProgram::flush ()
> > > > #7  0x0050de67 in differential_datalog::program::RunningProgram::transaction_commit ()
> > > > #8  0x0093f17d in <[...] as differential_datalog::ddlog::DDlog>::transaction_commit_dump_changes ()
> > > > #9  0x0040cc40 in ddlog_transaction_commit_dump_changes ()
> > > > #10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at northd/ovn-northd-ddlog.c:435
> > > > #11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at northd/ovn-northd-ddlog.c:435
> > > > #12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
> > > > #13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>) at northd/ovn-northd-ddlog.c:1203
> > > >
> > > > For comparison, running the C version of ovn-northd I get:
> > > >
> > > > 2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll interval (10916ms user, 334ms system)
> > > > 2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll interval (8046ms user, 48ms system)
> > > >
> > > > But after the northd iteration ends, memory usage is OK:
> > > >
> > > > 777567 root  10 -10 1657308   1.6g   3496 S   0.0   1.2   0:49.08 ovn-northd
> > > >
> > > > The behavior above is consistent both with current OVN master and also
> > > > when cherry-picking the ddlog-related patches that are pending in
> > > > patchwork:
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=233075
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=232480
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=233079
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=233080
> > > >
> > > > I didn't try out the changes from the following series though as I
> > > > understand they need a v2:
> > > > http://patchwork.ozlabs.org/project/ovn/list/?series=232040
> > > >
> > > > Regards,
> > > > Dumitru
> > > >


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-03-25 Thread Ben Pfaff
OK.  I guess I'll have to try it myself.

On Thu, Mar 25, 2021 at 12:59:00PM +0530, Numan Siddique wrote:
> Hi Ben,
> 
> With your memory fixes applied, I still see many test failures due to
> memory leaks
> when configured as:
> 
> ./configure --enable-Werror --enable-sparse CFLAGS="-g -fsanitize=address"
> 
> You can find more details about the memory leaks here -
> https://gist.github.com/numansiddique/e701777da977a89e24b49f159bb30d5d
> 
> Thanks
> Numan
> 
> 
> 
> On Wed, Mar 24, 2021 at 9:19 PM Ben Pfaff  wrote:
> >
> > Thanks a lot for the report!  I need a new test case so I'll give this
> > one a shot when I can.
> >
> > On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> > > Hi Ben,
> > >
> > > We discussed a bit about this during one of the recent IRC OVN meetings,
> > > but I didn't get around to properly reporting this until now.
> > >
> > > I've tried running ovn-northd-ddlog against some large OVN NB/DB
> > > databases extracted from one of our scale testing runs:
> > >
> > > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> > >
> > > It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> > > of memory:
> > >
> > > 775734 root  10 -10   81.6g  80.8g  22396 S  99.7  64.2   3:50.79 ovn-northd-ddlog
> > >
> > > ovn-northd-ddlog is stuck in an (infinite?) loop in
> > > ddlog_transaction_commit_dump_changes():
> > >
> > > (gdb) bt
> > > #0  0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > > #1  0x00891eb3 in std::thread::park ()
> > > #2  0x00530963 in crossbeam_channel::context::Context::wait_until ()
> > > #3  0x00530aa1 in crossbeam_channel::context::Context::with::{{closure}} ()
> > > #4  0x00531f5c in crossbeam_channel::flavors::list::Channel::recv ()
> > > #5  0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
> > > #6  0x00512156 in differential_datalog::program::RunningProgram::flush ()
> > > #7  0x0050de67 in differential_datalog::program::RunningProgram::transaction_commit ()
> > > #8  0x0093f17d in <[...] as differential_datalog::ddlog::DDlog>::transaction_commit_dump_changes ()
> > > #9  0x0040cc40 in ddlog_transaction_commit_dump_changes ()
> > > #10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at northd/ovn-northd-ddlog.c:435
> > > #11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at northd/ovn-northd-ddlog.c:435
> > > #12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
> > > #13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>) at northd/ovn-northd-ddlog.c:1203
> > >
> > > For comparison, running the C version of ovn-northd I get:
> > >
> > > 2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll interval (10916ms user, 334ms system)
> > > 2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll interval (8046ms user, 48ms system)
> > >
> > > But after the northd iteration ends, memory usage is OK:
> > >
> > > 777567 root  10 -10 1657308   1.6g   3496 S   0.0   1.2   0:49.08 ovn-northd
> > >
> > > The behavior above is consistent both with current OVN master and also
> > > when cherry-picking the ddlog-related patches that are pending in
> > > patchwork:
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=233075
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=232480
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=233079
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=233080
> > >
> > > I didn't try out the changes from the following series though as I
> > > understand they need a v2:
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=232040
> > >
> > > Regards,
> > > Dumitru
> > >


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-03-25 Thread Numan Siddique
Hi Ben,

With your memory fixes applied, I still see many test failures due to
memory leaks when configured as:

./configure --enable-Werror --enable-sparse CFLAGS="-g -fsanitize=address"

More details about the memory leaks are here:
https://gist.github.com/numansiddique/e701777da977a89e24b49f159bb30d5d

Thanks
Numan



On Wed, Mar 24, 2021 at 9:19 PM Ben Pfaff wrote:
>
> Thanks a lot for the report!  I need a new test case so I'll give this
> one a shot when I can.
>
> On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> > Hi Ben,
> >
> > We discussed a bit about this during one of the recent IRC OVN meetings,
> > but I didn't get around to properly reporting this until now.
> >
> > I've tried running ovn-northd-ddlog against some large OVN NB/SB
> > databases extracted from one of our scale testing runs:
> >
> > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> >
> > It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> > of memory:
> >
> > 775734 root  10 -10   81.6g  80.8g  22396 S  99.7  64.2   3:50.79 
> > ovn-northd-ddlog
> >
> > ovn-northd-ddlog is stuck in an (infinite?) loop in
> > ddlog_transaction_commit_dump_changes():
> >
> > (gdb) bt
> > #0  0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> > /lib64/libpthread.so.0
> > #1  0x00891eb3 in std::thread::park ()
> > #2  0x00530963 in crossbeam_channel::context::Context::wait_until ()
> > #3  0x00530aa1 in 
> > crossbeam_channel::context::Context::with::{{closure}} ()
> > #4  0x00531f5c in 
> > crossbeam_channel::flavors::list::Channel::recv ()
> > #5  0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
> > #6  0x00512156 in 
> > differential_datalog::program::RunningProgram::flush ()
> > #7  0x0050de67 in 
> > differential_datalog::program::RunningProgram::transaction_commit ()
> > #8  0x0093f17d in  > differential_datalog::ddlog::DDlog>::transaction_commit_dump_changes ()
> > #9  0x0040cc40 in ddlog_transaction_commit_dump_changes ()
> > #10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at
> > northd/ovn-northd-ddlog.c:435
> > #11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at 
> > northd/ovn-northd-ddlog.c:435
> > #12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
> > #13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>)
> > at northd/ovn-northd-ddlog.c:1203
> >
> > For comparison, running the C version of ovn-northd I get:
> >
> > 2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll 
> > interval (10916ms user, 334ms system)
> > 2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll 
> > interval (8046ms user, 48ms system)
> >
> > But after the northd iteration ends, memory usage is OK:
> >
> > 777567 root  10 -10 1657308   1.6g   3496 S   0.0   1.2   0:49.08 
> > ovn-northd
> >
> > The behavior above is consistent both with current OVN master and also
> > when cherry-picking the ddlog-related patches that are pending in
> > patchwork:
> > http://patchwork.ozlabs.org/project/ovn/list/?series=233075
> > http://patchwork.ozlabs.org/project/ovn/list/?series=232480
> > http://patchwork.ozlabs.org/project/ovn/list/?series=233079
> > http://patchwork.ozlabs.org/project/ovn/list/?series=233080
> >
> > I didn't try out the changes from the following series though as I
> > understand they need a v2:
> > http://patchwork.ozlabs.org/project/ovn/list/?series=232040
> >
> > Regards,
> > Dumitru
> >


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-03-24 Thread Ben Pfaff
Thanks a lot for the report!  I need a new test case so I'll give this
one a shot when I can.

On Wed, Mar 24, 2021 at 04:03:07PM +0100, Dumitru Ceara wrote:
> Hi Ben,
> 
> We discussed a bit about this during one of the recent IRC OVN meetings,
> but I didn't get around to properly reporting this until now.
> 
> I've tried running ovn-northd-ddlog against some large OVN NB/SB
> databases extracted from one of our scale testing runs:
> 
> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/
> 
> It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
> of memory:
> 
> 775734 root  10 -10   81.6g  80.8g  22396 S  99.7  64.2   3:50.79 
> ovn-northd-ddlog
> 
> ovn-northd-ddlog is stuck in an (infinite?) loop in
> ddlog_transaction_commit_dump_changes():
> 
> (gdb) bt
> #0  0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x00891eb3 in std::thread::park ()
> #2  0x00530963 in crossbeam_channel::context::Context::wait_until ()
> #3  0x00530aa1 in 
> crossbeam_channel::context::Context::with::{{closure}} ()
> #4  0x00531f5c in crossbeam_channel::flavors::list::Channel::recv 
> ()
> #5  0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
> #6  0x00512156 in 
> differential_datalog::program::RunningProgram::flush ()
> #7  0x0050de67 in 
> differential_datalog::program::RunningProgram::transaction_commit ()
> #8  0x0093f17d in  differential_datalog::ddlog::DDlog>::transaction_commit_dump_changes ()
> #9  0x0040cc40 in ddlog_transaction_commit_dump_changes ()
> #10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at
> northd/ovn-northd-ddlog.c:435
> #11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at 
> northd/ovn-northd-ddlog.c:435
> #12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
> #13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>)
> at northd/ovn-northd-ddlog.c:1203
> 
> For comparison, running the C version of ovn-northd I get:
> 
> 2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll 
> interval (10916ms user, 334ms system)
> 2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll 
> interval (8046ms user, 48ms system)
> 
> But after the northd iteration ends, memory usage is OK:
> 
> 777567 root  10 -10 1657308   1.6g   3496 S   0.0   1.2   0:49.08 
> ovn-northd
> 
> The behavior above is consistent both with current OVN master and also
> when cherry-picking the ddlog-related patches that are pending in
> patchwork:
> http://patchwork.ozlabs.org/project/ovn/list/?series=233075
> http://patchwork.ozlabs.org/project/ovn/list/?series=232480
> http://patchwork.ozlabs.org/project/ovn/list/?series=233079
> http://patchwork.ozlabs.org/project/ovn/list/?series=233080
> 
> I didn't try out the changes from the following series though as I
> understand they need a v2:
> http://patchwork.ozlabs.org/project/ovn/list/?series=232040
> 
> Regards,
> Dumitru
> 


[ovs-dev] ovn-northd-ddlog scale issues

2021-03-24 Thread Dumitru Ceara
Hi Ben,

We discussed this briefly during one of the recent OVN IRC meetings,
but I didn't get around to reporting it properly until now.

I've tried running ovn-northd-ddlog against some large OVN NB/SB
databases extracted from one of our scale testing runs:

http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210324/existing-nb-sb/

It seems that ovn-northd-ddlog gets stuck in a busy loop and uses a lot
of memory:

775734 root  10 -10   81.6g  80.8g  22396 S  99.7  64.2   3:50.79 ovn-northd-ddlog

ovn-northd-ddlog is stuck in an (infinite?) loop in
ddlog_transaction_commit_dump_changes():

(gdb) bt
#0  0x7f2762167e92 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00891eb3 in std::thread::park ()
#2  0x00530963 in crossbeam_channel::context::Context::wait_until ()
#3  0x00530aa1 in crossbeam_channel::context::Context::with::{{closure}} ()
#4  0x00531f5c in crossbeam_channel::flavors::list::Channel::recv ()
#5  0x0075b35d in crossbeam_channel::channel::Receiver::recv ()
#6  0x00512156 in differential_datalog::program::RunningProgram::flush ()
#7  0x0050de67 in differential_datalog::program::RunningProgram::transaction_commit ()
#8  0x0093f17d in ::transaction_commit_dump_changes ()
#9  0x0040cc40 in ddlog_transaction_commit_dump_changes ()
#10 0x0040a758 in ddlog_commit (ddlog=<optimized out>) at northd/ovn-northd-ddlog.c:435
#11 northd_parse_update (update=0x3dc0598, ctx=0x3d71880) at northd/ovn-northd-ddlog.c:435
#12 northd_run (ctx=0x3d71880) at northd/ovn-northd-ddlog.c:512
#13 0x00408edd in main (argc=<optimized out>, argv=<optimized out>) at northd/ovn-northd-ddlog.c:1203

For comparison, running the C version of ovn-northd I get:

2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll interval (10916ms user, 334ms system)
2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll interval (8046ms user, 48ms system)
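
As an aside for anyone comparing runs, here is a minimal Python sketch
for pulling the poll-interval figures out of warnings like the two
above.  The helper name and the regular expression are assumptions
inferred from those two log lines; neither is part of OVN or OVS.

```python
import re

# Hypothetical pattern, inferred only from the two warnings quoted above.
POLL_RE = re.compile(
    r"Unreasonably long (\d+)ms\s+poll\s+interval "
    r"\((\d+)ms user, (\d+)ms system\)"
)


def poll_intervals(log_text):
    """Return one (total_ms, user_ms, system_ms) tuple per warning."""
    # Collapse all whitespace so warnings that a mail client wrapped
    # across two lines still match as a single record.
    flat = " ".join(log_text.split())
    return [tuple(int(g) for g in m.groups()) for m in POLL_RE.finditer(flat)]


sample = """\
2021-03-24T14:48:06.556Z|00033|timeval|WARN|Unreasonably long 11290ms poll
interval (10916ms user, 334ms system)
2021-03-24T14:48:14.678Z|00050|timeval|WARN|Unreasonably long 8122ms poll
interval (8046ms user, 48ms system)
"""

for total, user, system in poll_intervals(sample):
    print(f"total={total}ms user={user}ms system={system}ms")
```

Because whitespace is collapsed first, the same helper works on the
wrapped and unwrapped forms of the log.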

But after the northd iteration ends, memory usage is OK:

777567 root  10 -10 1657308   1.6g   3496 S   0.0   1.2   0:49.08 ovn-northd

The behavior is the same with current OVN master and with the pending
ddlog-related patches from patchwork cherry-picked on top:
http://patchwork.ozlabs.org/project/ovn/list/?series=233075
http://patchwork.ozlabs.org/project/ovn/list/?series=232480
http://patchwork.ozlabs.org/project/ovn/list/?series=233079
http://patchwork.ozlabs.org/project/ovn/list/?series=233080

I didn't try the following series, though, since I understand it
needs a v2:
http://patchwork.ozlabs.org/project/ovn/list/?series=232040

Regards,
Dumitru
