On Fri, Aug 7, 2020 at 12:57 PM Tony Liu <tonyliu0...@hotmail.com> wrote:

> Enabled debug logging, there are tons of messages.
> Note there are 4353 datapath bindings and 13078 port bindings in SB.
> 4097 LS, 8470 LSP, 256 LR and 4352 LRP in NB. Every 16 LS connect to
> a router. All routers connect to the external network.
>
> ovn-controller on compute node is good. The ovn-controller on gateway
> node is taking 100% cpu. It's probably related to the ports on the
> external network? Any specific messages I need to check?
>
> Any hint to look into it is appreciated!
>
>
If it happens only on the gateway node, it may be related to ARP handling.
Could you share the output of ovn-appctl -t ovn-controller coverage/show,
from 2 - 3 runs taken at 5s intervals?
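
In case it helps to automate that, here is a rough Python sketch, not an
official tool. It runs the command above three times, five seconds apart, and
lists which counters moved between runs. The helper names (parse_totals,
deltas, sample, report) are made up for this example, and it assumes each
counter line in the coverage/show output ends with "total: <N>"; adjust the
regex if your build prints something different.

```python
import re
import subprocess
import time

# Assumption: counter lines look like "<name> <rates...> total: <N>".
# Header and summary lines have no "total:" field and are skipped.
COUNTER_RE = re.compile(r"^(\S+)\s.*\btotal:\s*(\d+)\s*$")


def parse_totals(text):
    """Map counter name -> cumulative total for one coverage/show output."""
    totals = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line)
        if m:
            totals[m.group(1)] = int(m.group(2))
    return totals


def deltas(before, after):
    """Return only the counters whose total changed between two samples."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


def run_coverage_show():
    """Run the command from this mail and return its stdout."""
    result = subprocess.run(
        ["ovn-appctl", "-t", "ovn-controller", "coverage/show"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def sample(runs=3, interval=5.0, fetch=run_coverage_show, sleep=time.sleep):
    """Collect `runs` samples, sleeping `interval` seconds between them."""
    samples = []
    for i in range(runs):
        if i:
            sleep(interval)
        samples.append(parse_totals(fetch()))
    return samples


def report(samples):
    """Format the per-interval increases, largest first."""
    lines = []
    for i, (prev, cur) in enumerate(zip(samples, samples[1:]), start=1):
        lines.append(f"run {i} -> run {i + 1}")
        moved = sorted(deltas(prev, cur).items(),
                       key=lambda kv: kv[1], reverse=True)
        for name, inc in moved:
            lines.append(f"  {name:<32} {inc:+d}")
    return lines
```

On the gateway node, print("\n".join(report(sample()))) would show the
counters that increase while the CPU is pegged, which is usually enough to
tell which code path is busy.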
For the debug log, I'd first check whether there are any OVSDB notifications
from the SB DB and, if so, what the changes are.
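
One way to pull those notifications out of a large debug log is sketched
below. It relies on the "timestamp|seq|module|level|message" layout visible in
the log lines quoted in this thread, and it assumes (please verify against
your own log) that the jsonrpc module logs incoming notifications as
"<remote>: received notification, method="...", ...". The function names are
hypothetical.

```python
import re

METHOD_RE = re.compile(r'method="([^"]+)"')


def parse_record(line):
    """Split one log line into its five '|'-separated fields, or None."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    ts, seq, module, level, message = parts
    return {"ts": ts, "seq": seq, "module": module,
            "level": level, "message": message}


def sb_notifications(lines, sb_remote):
    """Yield (timestamp, method) for notifications received from sb_remote.

    Assumption: the message text contains the remote address and the words
    "received notification"; this wording is not taken from the thread.
    """
    for line in lines:
        rec = parse_record(line)
        if rec is None or rec["module"] != "jsonrpc":
            continue
        msg = rec["message"]
        if sb_remote not in msg or "received notification" not in msg:
            continue
        m = METHOD_RE.search(msg)
        method = m.group(1) if m else "?"
        yield (rec["ts"], method)
```

For example, list(sb_notifications(open(path), "10.6.20.86:6642")), with path
pointing at the ovn-controller log. If the list is empty while the CPU stays
at 100%, the load is probably not coming from SB updates.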

>
> Thanks!
>
> Tony
> > -----Original Message-----
> > From: Han Zhou <zhou...@gmail.com>
> > Sent: Friday, August 7, 2020 12:39 PM
> > To: Tony Liu <tonyliu0...@hotmail.com>
> > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
> > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
> >
> >
> >
> > On Fri, Aug 7, 2020 at 12:35 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
> >
> >
> >       Inline...
> >
> >       Thanks!
> >
> >       Tony
> >       > -----Original Message-----
> >       > From: Han Zhou <zhou...@gmail.com>
> >       > Sent: Friday, August 7, 2020 12:29 PM
> >       > To: Tony Liu <tonyliu0...@hotmail.com>
> >       > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
> >       > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
> >       >
> >       >
> >       >
> >       > On Fri, Aug 7, 2020 at 12:19 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
> >       >
> >       >
> >       >       ovn-controller is using UNIX socket connecting to local ovsdb-server.
> >       >
> >       > From the log you were showing, you were using tcp:127.0.0.1:6640
> >
> >       Sorry, what I meant was, given your advice, I just made the change for
> >       ovn-controller to use UNIX socket.
> >
> >
> >
> > Oh, I see, no worries.
> >
> >
> >       > to connect the local ovsdb.
> >       > >       2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640: connection dropped (Broken pipe)
> >       >
> >       >
> >       >       Inactivity probe doesn't seem to be the cause of high cpu usage.
> >       >
> >       >       The wakeup on connection to sb-db is always followed by an
> >       >       "unreasonably long" warning. I guess the pollin event loop is
> >       >       stuck for too long, like 10s as below.
> >       >       ========
> >       >       2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU usage)
> >       >       2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms poll interval (10075ms user, 1ms system)
> >       >       ========
> >       >
> >       >       Could that stuck loop be the cause of high cpu usage?
> >       >       What is it polling in?
> >       >       Why is it stuck, waiting for message from sb-db?
> >       >       Isn't it supposed to release the cpu while waiting?
> >       >
> >       >
> >       >
> >       > This log means there are messages received from 10.6.20.86:6642
> >       > (the SB DB). Is there any SB change? The CPU is spent on handling
> >       > the SB change. Some types of SB changes are not handled incrementally.
> >
> >       SB updates are driven by ovn-northd when anything changes in NB,
> >       and by ovn-controller when anything changes on a chassis. No,
> >       nothing has changed in NB or on any chassis.
> >
> >       Should I bump logging level up to dbg? Is that going to show me
> >       what messages ovn-controller is handling?
> >
> >
> >
> > Yes, debug log should show the details.
> >
> >
> >
> >       >
> >       >       Thanks!
> >       >
> >       >       Tony
> >       >
> >       >       > -----Original Message-----
> >       >       > From: Han Zhou <zhou...@gmail.com>
> >       >       > Sent: Friday, August 7, 2020 10:32 AM
> >       >       > To: Tony Liu <tonyliu0...@hotmail.com>
> >       >       > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs-d...@openvswitch.org>
> >       >       > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
> >       >       >
> >       >       >
> >       >       >
> >       >       > On Fri, Aug 7, 2020 at 10:05 AM Tony Liu <tonyliu0...@hotmail.com> wrote:
> >       >       >
> >       >       >
> >       >       >       Hi,
> >       >       >
> >       >       >       Here are some logging snippets from ovn-controller.
> >       >       >       ========
> >       >       >       2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably long 8954ms poll interval (8895ms user, 0ms system)
> >       >       >       ========
> >       >       >       What's that mean? Is it harmless?
> >       >       >
> >       >       >       ========
> >       >       >       2020-08-07T16:38:04.021Z|29251|timeval|WARN|context switches: 0 voluntary, 6 involuntary
> >       >       >       2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (10.6.20.91:60398<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU usage)
> >       >       >       ========
> >       >       >       Is this wakeup caused by changes in sb-db?
> >       >       >       Why is ovn-controller so busy?
> >       >       >
> >       >       >       ========
> >       >       >       2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640: connection dropped (Broken pipe)
> >       >       >       ========
> >       >       >       Connection to local ovsdb-server is dropped.
> >       >       >       Is this caused by the timeout of inactivity probe?
> >       >       >
> >       >       >       ========
> >       >       >       2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to [POLLIN] on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (99% CPU usage)
> >       >       >       ========
> >       >       >       What causes this wakeup?
> >       >       >
> >       >       >       ========
> >       >       >       2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to 0-ms timeout at lib/ovsdb-idl.c:5391 (99% CPU usage)
> >       >       >       ========
> >       >       >       What's this 0-ms wakeup mean?
> >       >       >
> >       >       >       ========
> >       >       >       2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to 962-ms timeout at lib/reconnect.c:643 (99% CPU usage)
> >       >       >       2020-08-07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640: connecting...
> >       >       >       2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to [POLLOUT] on fd 14 (127.0.0.1:51478<->127.0.0.1:6640) at lib/stream-fd.c:153 (99% CPU usage)
> >       >       >       2020-08-07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640: connected
> >       >       >       ================
> >       >       >       Retry to connect to local ovsdb-server. A pollout event is
> >       >       >       triggered right after connection is established. What's
> >       >       >       pollout?
> >       >       >
> >       >       >       ovn-controller is taking 100% CPU now, and there are no
> >       >       >       changes in sb-db (not busy). It seems that it's busy with
> >       >       >       local ovsdb-server or vswitchd. I'd like to understand
> >       >       >       why ovn-controller is so busy?
> >       >       >       All inactivity probe intervals are set to 30s.
> >       >       >
> >       >       >
> >       >       >
> >       >       >
> >       >       > Is there change from the local ovsdb? You can enable dbg log to see
> >       >       > what is happening.
> >       >       > For the local ovsdb probe, I have mentioned in the other thread that
> >       >       > UNIX socket is recommended (instead of tcp 127.0.0.1). Using UNIX
> >       >       > socket disables probe by default.
> >       >       >
> >       >       > Thanks,
> >       >       > Han
> >       >
> >       >
> >
> >
>
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
