On Fri, Aug 7, 2020 at 12:35 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
> Inline... > > Thanks! > > Tony > > -----Original Message----- > > From: Han Zhou <zhou...@gmail.com> > > Sent: Friday, August 7, 2020 12:29 PM > > To: Tony Liu <tonyliu0...@hotmail.com> > > Cc: ovs-discuss <ovs-discuss@openvswitch.org>; ovs-dev <ovs- > > d...@openvswitch.org> > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no > > changes in sb-db > > > > > > > > On Fri, Aug 7, 2020 at 12:19 PM Tony Liu <tonyliu0...@hotmail.com > > <mailto:tonyliu0...@hotmail.com> > wrote: > > > > > > ovn-controller is using UNIX socket connecting to local ovsdb- > > server. > > > > From the log you were showing, you were using tcp:127.0.0.1:6640 > > Sorry, what I meant was, given your advice, I just made the change for > ovn-controller to use UNIX socket. > > Oh, I see, no worries. > <http://127.0.0.1:6640> to connect the local ovsdb. > > > 2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640 > > > <http://127.0.0.1:6640> <http://127.0.0.1:6640> : connection dropped > > > (Broken pipe) > > > > > > Inactivity probe doesn't seem to be the cause of high cpu usage. > > > > The wakeup on connection to sb-db is always followed by a > > "unreasonably > > long" warning. I guess the pollin event loop is stuck for too long, > > like > > 10s as below. > > ======== > > 2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to > [POLLIN] > > on fd 19 (10.6.20.91:60712 <http://10.6.20.91:60712> <->10.6.20.86:6642 > > <http://10.6.20.86:6642> ) at lib/stream-fd.c:157 (99% CPU usage) > > 2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long > > 10153ms poll interval (10075ms user, 1ms system) > > ======== > > > > Could that stuck loop be the cause of high cpu usage? > > What is it polling in? > > Why is it stuck, waiting for message from sb-db? > > Isn't it supposed to release the cpu while waiting? > > > > > > > > This log means there are messages received from 10.6.20.86:6642 > > <http://10.6.20.86:6642> (the SB DB). Is there SB change? The CPU is > > spent on handling the SB change. Some type of SB changes are not handled > > incrementally. > > SB update is driven by ovn-northd in case anything changed in NB, > and ovn-controller in case anything changed on chassis. No, there > is nothing changed in NB, neither chassis. > > Should I bump logging level up to dbg? Is that going to show me > what messages ovn-controller is handling? > > Yes, debug log should show the details. > > > > Thanks! > > > > Tony > > > > > -----Original Message----- > > > From: Han Zhou <zhou...@gmail.com <mailto:zhou...@gmail.com> > > > > Sent: Friday, August 7, 2020 10:32 AM > > > To: Tony Liu <tonyliu0...@hotmail.com > > <mailto:tonyliu0...@hotmail.com> > > > > Cc: ovs-discuss <ovs-discuss@openvswitch.org <mailto:ovs- > > disc...@openvswitch.org> >; ovs-dev <ovs- > > > d...@openvswitch.org <mailto:d...@openvswitch.org> > > > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu > > while no > > > changes in sb-db > > > > > > > > > > > > On Fri, Aug 7, 2020 at 10:05 AM Tony Liu < > tonyliu0...@hotmail.com > > <mailto:tonyliu0...@hotmail.com> > > > <mailto:tonyliu0...@hotmail.com > > <mailto:tonyliu0...@hotmail.com> > > wrote: > > > > > > > > > Hi, > > > > > > Here are some logging snippets from ovn-controller. > > > ======== > > > 2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably > > long > > > 8954ms poll interval (8895ms user, 0ms system) > > > ======== > > > What's that mean? Is it harmless? > > > > > > ======== > > > 2020-08-07T16:38:04.021Z|29251|timeval|WARN|context > > switches: 0 > > > voluntary, 6 involuntary > > > 2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to > > [POLLIN] > > > on fd 19 (10.6.20.91:60398 <http://10.6.20.91:60398> > > <http://10.6.20.91:60398> <->10.6.20.86:6642 <http://10.6.20.86:6642> > > > <http://10.6.20.86:6642> ) at lib/stream-fd.c:157 (99% CPU > usage) > > > ======== > > > Is this wakeup caused by changes in sb-db? > > > Why is ovn-controller so busy? > > > > > > ======== > > > 2020-08- > > 07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640 > > <http://127.0.0.1:6640> > > > <http://127.0.0.1:6640> : connection dropped (Broken pipe) > > > ======== > > > Connection to local ovsdb-server is dropped. > > > Is this caused by the timeout of inactivity probe? > > > > > > ======== > > > 2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to > > [POLLIN] > > > on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream- > > fd.c:157 > > > (99% CPU usage) > > > ======== > > > What causes this wakeup? > > > > > > ======== > > > 2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to > > 0-ms > > > timeout at lib/ovsdb-idl.c:5391 (99% CPU usage) > > > ======== > > > What's this 0-ms wakeup mean? > > > > > > ======== > > > 2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to > > 962-ms > > > timeout at lib/reconnect.c:643 (99% CPU usage) > > > 2020-08- > > 07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640 > > <http://127.0.0.1:6640> > > > <http://127.0.0.1:6640> : connecting... > > > 2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to > > > [POLLOUT] on fd 14 (127.0.0.1:51478 <http://127.0.0.1:51478> > > <http://127.0.0.1:51478> <- > > > >127.0.0.1:6640 <http://127.0.0.1:6640> <http://127.0.0.1:6640> > ) > > at lib/stream-fd.c:153 (99% > > > CPU usage) > > > 2020-08- > > 07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640 > > <http://127.0.0.1:6640> > > > <http://127.0.0.1:6640> : connected > > > ================ > > > Retry to connect to local ovsdb-server. A pollout event is > > > triggered > > > right after connection is established. What's poolout? > > > > > > ovn-controller is taking 100% CPU now, and there is no > > changes in > > > sb-db (not busy). It seems that it's busy with local ovsdb- > > server > > > or vswitchd. I'd like to understand why ovn-controller is > > so busy? > > > All inactivity probe intervals are set to 30s. > > > > > > > > > > > > > > > Is there change from the local ovsdb? You can enable dbg log to > > see what > > > is happening. > > > For the local ovsdb probe, I have mentioned in the other thread > > that > > > UNIX socket is recommended (instead of tcp 127.0.0.1). Using UNIX > > socket > > > disables probe by default. > > > > > > Thanks, > > > Han > > > > > >
_______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss