Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
Good to know. Yes, I am using 20.03. Will try to upgrade.

Thanks!
Tony

> -----Original Message-----
> From: Han Zhou
> Sent: Friday, August 7, 2020 3:39 PM
> To: Tony Liu
> Cc: ovs-dev; ovs-discuss
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
>
> The change in external-ids is not monitored by ovn-controller any more
> after version 20.06. Probably you are still using an older version?
>
> On Fri, Aug 7, 2020 at 3:11 PM Tony Liu wrote:
> > Are the changes made by the Neutron liveness check in the SB DB
> > properly handled by inc_proc_eng?
> >
> > Given my testing scale, even with the update frequency lowered by that
> > fix, ovn-controller still takes quite a lot of CPU and time to compute
> > the SB DB contents.
> >
> > Thanks!
> > Tony
> >
> > > -----Original Message-----
> > > From: Tony Liu
> > > Sent: Friday, August 7, 2020 1:25 PM
> > > To: Tony Liu; Han Zhou
> > > Cc: ovs-dev; ovs-discuss
> > > Subject: RE: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while
> > > no changes in sb-db
> > >
> > > Thanks for the hint!
> > >
> > > 2020-08-07T19:44:18.019Z|616614|jsonrpc|DBG|tcp:10.6.20.84:6642:
> > > received notification, method="update3",
> > > params=[["monid","OVN_Southbound"],"5002cb22-13e5-490a-9a64-
> > > 5d48914138ca",{"Chassis":{"0390b621-152b-48a0-a3d1-
> > > 2973c0b823cc":{"modify":{"external_ids":["map",[["neutron:liveness_check
> > > _at","2020-08-07T19:44:07.130233+00:00"]]]]
> > >
> > > Nailed it...
> > > https://bugs.launchpad.net/neutron/+bug/1883554
> > >
> > > Tony
> > >
> > > > -----Original Message-----
> > > > From: dev On Behalf Of Tony Liu
> > > > Sent: Friday, August 7, 2020 1:14 PM
> > > > To: Han Zhou
> > > > Cc: ovs-dev; ovs-discuss
> > > > Subject: Re: [ovs-dev] [ovs-discuss] [OVN] ovn-controller takes
> > > > 100% cpu while no changes in sb-db
> > > >
> > > > Here is the output.
> > > >
> > > > [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
> > > >     /run/ovn/ovn-controller.6.ctl coverage/show
> > > > Event coverage, avg rate over last: 5 seconds, last minute, last
> > > > hour, hash=e70a83c8:
> > > > lflow_run             0.0/sec     0.083/sec      0.0725/sec  total: 295
> > > > miniflow_malloc       0.0/sec 44356.817/sec  44527.3975/sec  total: 180635403
> > > > hindex_pathological   0.0/sec     0.000/sec      0.0000/sec  total: 7187
> > > > hindex_expand         0.0/sec     0.000/sec      0.0000/sec  total: 17
> > > > hmap_pathological     0.0/sec     4.167/sec      4.1806/sec  total: 25091
> > > > hmap_expand           0.0/sec  5366.500/sec   5390.0800/sec  total: 23680738
> > > > txn_unchanged         0.0/sec     0.300/sec      0.3175/sec  total: 11024
> > > > txn_incomplete        0.0/sec     0.100/sec      0.0836/sec  total: 974
> > > > txn_success           0.0/sec     0.033/sec      0.0308/sec  total: 129
> > > > txn_try_again         0.0/sec     0.000/sec      0.0003/sec  total: 1
> > > > poll_create_node      0.4/sec     1.933/sec      1.9575/sec  total: 55611
> > > > poll_zero_timeout     0.0/sec     0.067/sec      0.0556/sec  total: 241
> > > > rconn_queued          0.0/sec     0.050/sec      0.0594/sec  total: 1208720
> > > > rconn_sent
> > > > [remainder of the quoted coverage output truncated in the archive]
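The debug-log notification quoted above is the whole story: the only SB DB change is a `Chassis` row whose `external_ids` gained a new `neutron:liveness_check_at` stamp. A small sketch of how one might test an "update3" notification for exactly this pattern; the JSON below is reconstructed from the log line in the thread (the log truncates the closing brackets, so they are re-balanced here), and the helper name is ours, not part of any OVS/OVN API:

```python
import json

# Reconstructed "update3" params from the debug log in this thread;
# UUIDs and timestamp copied verbatim, brackets re-balanced.
params = json.loads("""
[["monid", "OVN_Southbound"],
 "5002cb22-13e5-490a-9a64-5d48914138ca",
 {"Chassis":
  {"0390b621-152b-48a0-a3d1-2973c0b823cc":
   {"modify":
    {"external_ids":
     ["map", [["neutron:liveness_check_at",
               "2020-08-07T19:44:07.130233+00:00"]]]}}}}]
""")

def only_external_ids_changed(update):
    """Return True when every modified row in the table-updates object
    (update[2]) touches nothing but the external_ids column."""
    table_updates = update[2]
    for rows in table_updates.values():
        for row_update in rows.values():
            changed = row_update.get("modify", {})
            if set(changed) - {"external_ids"}:
                return False
    return True

print(only_external_ids_changed(params))  # True: only the liveness stamp moved
```

If this returns True for every notification seen during the CPU spike, the recomputation is being triggered by bookkeeping writes rather than real topology changes.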
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
The change in external-ids is not monitored by ovn-controller any more
after version 20.06. Probably you are still using an older version?

On Fri, Aug 7, 2020 at 3:11 PM Tony Liu wrote:
> Are the changes made by the Neutron liveness check in the SB DB properly
> handled by inc_proc_eng?
>
> Given my testing scale, even with the update frequency lowered by that
> fix, ovn-controller still takes quite a lot of CPU and time to compute
> the SB DB contents.
>
> Thanks!
> Tony
>
> > -----Original Message-----
> > From: Tony Liu
> > Sent: Friday, August 7, 2020 1:25 PM
> > To: Tony Liu; Han Zhou
> > Cc: ovs-dev; ovs-discuss
> > Subject: RE: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while
> > no changes in sb-db
> >
> > Thanks for the hint!
> >
> > 2020-08-07T19:44:18.019Z|616614|jsonrpc|DBG|tcp:10.6.20.84:6642:
> > received notification, method="update3",
> > params=[["monid","OVN_Southbound"],"5002cb22-13e5-490a-9a64-
> > 5d48914138ca",{"Chassis":{"0390b621-152b-48a0-a3d1-
> > 2973c0b823cc":{"modify":{"external_ids":["map",[["neutron:liveness_check
> > _at","2020-08-07T19:44:07.130233+00:00"]]]]
> >
> > Nailed it...
> > https://bugs.launchpad.net/neutron/+bug/1883554
> >
> > Tony
> >
> > > -----Original Message-----
> > > From: dev On Behalf Of Tony Liu
> > > Sent: Friday, August 7, 2020 1:14 PM
> > > To: Han Zhou
> > > Cc: ovs-dev; ovs-discuss
> > > Subject: Re: [ovs-dev] [ovs-discuss] [OVN] ovn-controller takes
> > > 100% cpu while no changes in sb-db
> > >
> > > Here is the output.
> > >
> > > [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
> > >     /run/ovn/ovn-controller.6.ctl coverage/show
> > > [quoted coverage/show output (hash=e70a83c8) elided; the same
> > > snapshot appears in full elsewhere in the thread]
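Han's point about 20.06 rests on OVSDB monitor requests being able to name the exact columns a client cares about: a client that omits `external_ids` from its `Chassis` monitor is simply never woken by liveness stamps. A hedged sketch of such a request body; the column list is an illustrative subset of the ovn-sb Chassis schema, and this is not the exact wire payload ovn-controller emits:

```python
import json

# Columns of the SB "Chassis" table we still want to watch; "external_ids"
# is deliberately omitted, loosely mirroring what ovn-controller >= 20.06
# does so Neutron's liveness stamps no longer wake it up.
WATCHED_CHASSIS_COLUMNS = ["name", "hostname", "encaps", "nb_cfg"]

def make_monitor_request():
    """Build a JSON-RPC "monitor_cond" request body (a sketch of the
    shape, not the exact request ovn-controller sends)."""
    return {
        "method": "monitor_cond",
        "params": [
            "OVN_Southbound",
            ["monid", "OVN_Southbound"],
            {"Chassis": [{"columns": WATCHED_CHASSIS_COLUMNS}]},
        ],
        "id": 1,
    }

req = make_monitor_request()
# The server will now suppress updates that touch only unwatched columns.
print("external_ids" in req["params"][2]["Chassis"][0]["columns"])  # False
```

On pre-20.06 controllers the equivalent monitor includes `external_ids`, which is why every liveness write triggers a recompute there.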
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
Are the changes made by the Neutron liveness check in the SB DB properly
handled by inc_proc_eng?

Given my testing scale, even with the update frequency lowered by that fix,
ovn-controller still takes quite a lot of CPU and time to compute the SB DB
contents.

Thanks!
Tony

> -----Original Message-----
> From: Tony Liu
> Sent: Friday, August 7, 2020 1:25 PM
> To: Tony Liu; Han Zhou
> Cc: ovs-dev; ovs-discuss
> Subject: RE: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
>
> Thanks for the hint!
>
> 2020-08-07T19:44:18.019Z|616614|jsonrpc|DBG|tcp:10.6.20.84:6642:
> received notification, method="update3",
> params=[["monid","OVN_Southbound"],"5002cb22-13e5-490a-9a64-
> 5d48914138ca",{"Chassis":{"0390b621-152b-48a0-a3d1-
> 2973c0b823cc":{"modify":{"external_ids":["map",[["neutron:liveness_check
> _at","2020-08-07T19:44:07.130233+00:00"]]]]
>
> Nailed it...
> https://bugs.launchpad.net/neutron/+bug/1883554
>
> Tony
>
> > -----Original Message-----
> > From: dev On Behalf Of Tony Liu
> > Sent: Friday, August 7, 2020 1:14 PM
> > To: Han Zhou
> > Cc: ovs-dev; ovs-discuss
> > Subject: Re: [ovs-dev] [ovs-discuss] [OVN] ovn-controller takes 100%
> > cpu while no changes in sb-db
> >
> > Here is the output.
> >
> > [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
> >     /run/ovn/ovn-controller.6.ctl coverage/show
> > Event coverage, avg rate over last: 5 seconds, last minute, last hour,
> > hash=e70a83c8:
> > lflow_run             0.0/sec     0.083/sec      0.0725/sec  total: 295
> > miniflow_malloc       0.0/sec 44356.817/sec  44527.3975/sec  total: 180635403
> > hindex_pathological   0.0/sec     0.000/sec      0.0000/sec  total: 7187
> > hindex_expand         0.0/sec     0.000/sec      0.0000/sec  total: 17
> > hmap_pathological     0.0/sec     4.167/sec      4.1806/sec  total: 25091
> > hmap_expand           0.0/sec  5366.500/sec   5390.0800/sec  total: 23680738
> > txn_unchanged         0.0/sec     0.300/sec      0.3175/sec  total: 11024
> > txn_incomplete        0.0/sec     0.100/sec      0.0836/sec  total: 974
> > txn_success           0.0/sec     0.033/sec      0.0308/sec  total: 129
> > txn_try_again         0.0/sec     0.000/sec      0.0003/sec  total: 1
> > poll_create_node      0.4/sec     1.933/sec      1.9575/sec  total: 55611
> > poll_zero_timeout     0.0/sec     0.067/sec      0.0556/sec  total: 241
> > rconn_queued          0.0/sec     0.050/sec      0.0594/sec  total: 1208720
> > rconn_sent            0.0/sec     0.050/sec      0.0594/sec  total: 1208720
> > seq_change            0.2/sec     0.783/sec      0.7492/sec  total: 13962
> > pstream_open          0.0/sec     0.000/sec      0.0000/sec  total: 1
> > stream_open           0.0/sec     0.000/sec      0.0003/sec  total: 5
> > unixctl_received      0.0/sec     0.000/sec      0.0011/sec  total: 4
> > unixctl_replied       0.0/sec     0.000/sec      0.0011/sec  total: 4
> > util_xalloc           0.8/sec 1396586.967/sec 240916.6047/sec  total: 5834154064
> > vconn_open            0.0/sec     0.000/sec      0.0000/sec  total: 2
> > vconn_received        0.0/sec     0.050/sec      0.0594/sec  total: 632
> > vconn_sent            0.0/sec     0.050/sec      0.0494/sec  total: 1213248
> > netlink_received      0.0/sec     0.300/sec      0.2900/sec  total: 1188
> > netlink_recv_jumbo    0.0/sec     0.083/sec      0.0725/sec  total: 296
> > netlink_sent          0.0/sec     0.300/sec      0.2900/sec  total: 1188
> > cmap_expand           0.0/sec     0.000/sec      0.0000/sec  total: 3
> > 82 events never hit
> > [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
> >     /run/ovn/ovn-controller.6.ctl coverage/show
> > Event coverage, avg rate over last: 5 seconds, last minute, last hour,
> > hash=d0107601:
> > lflow_run             0.2/sec     0.083/sec      0.0717/sec  total: 296
> > miniflow_malloc  122834.2/sec 51180.917/sec  43930.2869/sec  total: 181249574
> > hindex_pathological   0.0/sec     0.0
> > [remainder of the quoted coverage output truncated in the archive]
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
Thanks for the hint!

2020-08-07T19:44:18.019Z|616614|jsonrpc|DBG|tcp:10.6.20.84:6642:
received notification, method="update3",
params=[["monid","OVN_Southbound"],"5002cb22-13e5-490a-9a64-
5d48914138ca",{"Chassis":{"0390b621-152b-48a0-a3d1-
2973c0b823cc":{"modify":{"external_ids":["map",[["neutron:liveness_check
_at","2020-08-07T19:44:07.130233+00:00"]]]]

Nailed it...
https://bugs.launchpad.net/neutron/+bug/1883554

Tony

> -----Original Message-----
> From: dev On Behalf Of Tony Liu
> Sent: Friday, August 7, 2020 1:14 PM
> To: Han Zhou
> Cc: ovs-dev; ovs-discuss
> Subject: Re: [ovs-dev] [ovs-discuss] [OVN] ovn-controller takes 100% cpu
> while no changes in sb-db
>
> Here is the output.
>
> [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
>     /run/ovn/ovn-controller.6.ctl coverage/show
> [first quoted coverage/show snapshot (hash=e70a83c8) elided; it appears
> in full elsewhere in the thread]
> [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
>     /run/ovn/ovn-controller.6.ctl coverage/show
> Event coverage, avg rate over last: 5 seconds, last minute, last hour,
> hash=d0107601:
> lflow_run             0.2/sec     0.083/sec      0.0717/sec  total: 296
> miniflow_malloc  122834.2/sec 51180.917/sec  43930.2869/sec  total: 181249574
> hindex_pathological   0.0/sec     0.000/sec      0.0000/sec  total: 7187
> hindex_expand         0.0/sec     0.000/sec      0.0000/sec  total: 17
> hmap_pathological    13.2/sec     4.967/sec      4.1264/sec  total: 25157
> hmap_expand       14982.2/sec  6205.067/sec   5317.9547/sec  total: 23755649
> txn_unchanged         1.4/sec     0.400/sec      0.3144/sec  total: 11031
> txn_incomplete        0.4/sec     0.117/sec      0.0825/sec  total: 976
> txn_success           0.2/sec     0.050/sec      0.0306/sec  total: 130
> txn_try_again         0.0/sec     0.000/sec      0.0003/sec  total: 1
> poll_create_node      7.6/sec     2.467/sec      1.9353/sec  total: 55649
> poll_zero_timeout     0.4/sec     0.100/sec      0.0547/sec  total: 243
> rconn_queued          0.4/sec     0.083/sec      0.0592/sec  total: 1208722
> rconn_sent            0.4/sec     0.083/sec      0.0592/sec
> [remainder of the quoted coverage output truncated in the archive]
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
rate over last: 5 seconds, last minute, last hour, hash=069e0b25:
lflow_run             0.2/sec     0.100/sec      0.0719/sec  total: 297
miniflow_malloc  122834.2/sec 61417.100/sec  44100.8900/sec  total: 181863745
hindex_pathological   0.0/sec     0.000/sec      0.0000/sec  total: 7187
hindex_expand         0.0/sec     0.000/sec      0.0000/sec  total: 17
hmap_pathological    10.0/sec     5.800/sec      4.1403/sec  total: 25207
hmap_expand       14756.6/sec  7434.783/sec   5338.4500/sec  total: 23829432
txn_unchanged         0.4/sec     0.433/sec      0.3150/sec  total: 11033
txn_incomplete        0.0/sec     0.117/sec      0.0825/sec  total: 976
txn_success           0.0/sec     0.050/sec      0.0306/sec  total: 130
txn_try_again         0.0/sec     0.000/sec      0.0003/sec  total: 1
poll_create_node      2.6/sec     2.650/sec      1.9389/sec  total: 55662
poll_zero_timeout     0.0/sec     0.100/sec      0.0547/sec  total: 243
rconn_queued          0.0/sec     0.083/sec      0.0592/sec  total: 1208722
rconn_sent            0.0/sec     0.083/sec      0.0592/sec  total: 1208722
seq_change            1.4/sec     1.000/sec      0.7414/sec  total: 13980
pstream_open          0.0/sec     0.000/sec      0.0000/sec  total: 1
stream_open           0.0/sec     0.000/sec      0.0003/sec  total: 5
unixctl_received      0.2/sec     0.033/sec      0.0017/sec  total: 6
unixctl_replied       0.2/sec     0.033/sec      0.0017/sec  total: 6
util_xalloc     3864890.0/sec 1933841.933/sec 227487.1978/sec  total: 5872830887
vconn_open            0.0/sec     0.000/sec      0.0000/sec  total: 2
vconn_received        0.0/sec     0.083/sec      0.0592/sec  total: 634
vconn_sent            0.0/sec     0.083/sec      0.0492/sec  total: 1213250
netlink_received      0.8/sec     0.400/sec      0.2872/sec  total: 1196
netlink_recv_jumbo    0.2/sec     0.100/sec      0.0719/sec  total: 298
netlink_sent          0.8/sec     0.400/sec      0.2872/sec  total: 1196
cmap_expand           0.0/sec     0.000/sec      0.0000/sec  total: 3
82 events never hit
[root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
    /run/ovn/ovn-controller.6.ctl coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour,
hash=069e0b25:
lflow_run             0.0/sec     0.083/sec      0.0719/sec  total: 297
miniflow_malloc       0.0/sec 51180.917/sec  44100.8900/sec  total: 181863745
hindex_pathological   0.0/sec     0.000/sec      0.0000/sec  total: 7187
hindex_expand         0.0/sec     0.000/sec      0.0000/sec  total: 17
hmap_pathological     2.4/sec     4.967/sec      4.1436/sec  total: 25219
hmap_expand         171.8/sec  6205.350/sec   5338.6886/sec  total: 23830291
txn_unchanged         1.2/sec     0.433/sec      0.3167/sec  total: 11039
txn_incomplete        0.0/sec     0.100/sec      0.0825/sec  total: 976
txn_success           0.0/sec     0.033/sec      0.0306/sec  total: 130
txn_try_again         0.0/sec     0.000/sec      0.0003/sec  total: 1
poll_create_node      4.6/sec     2.583/sec      1.9453/sec  total: 55685
poll_zero_timeout     0.0/sec     0.067/sec      0.0547/sec  total: 243
rconn_queued          0.2/sec     0.083/sec      0.0594/sec  total: 1208723
rconn_sent            0.2/sec     0.083/sec      0.0594/sec  total: 1208723
seq_change            1.0/sec     0.933/sec      0.7428/sec  total: 13985
pstream_open          0.0/sec     0.000/sec      0.0000/sec  total: 1
stream_open           0.0/sec     0.000/sec      0.0003/sec  total: 5
unixctl_received      0.2/sec     0.050/sec      0.0019/sec  total: 7
unixctl_replied       0.2/sec     0.050/sec      0.0019/sec  total: 7
util_xalloc        4345.0/sec 1611785.933/sec 227493.2325/sec  total: 5872852612
vconn_open            0.0/sec     0.000/sec      0.0000/sec  total: 2
vconn_received        0.2/sec     0.083/sec      0.0594/sec  total: 635
vconn_sent            0.2/sec     0.083/sec      0.0494/sec  total: 1213251
netlink_received      0.0/sec     0.333/sec      0.2872/sec  total: 1196
netlink_recv_jumbo    0.0/sec     0.083/sec      0.0719/sec  total: 298
netlink_sent          0.0/sec     0.333/sec      0.2872/sec  total: 1196
cmap_expand           0.0/sec     0.000/sec      0.0000/sec  total: 3
82 events never hit

Thanks!
Tony

> -----Original Message-----
> From: Han Zhou
> Sent: Friday, August 7, 2020 1:09 PM
> To: Tony Liu
> Cc: ovs-discuss; ovs-dev
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> change[truncated in the archive]
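What Han is looking for in these repeated coverage/show runs is which counter totals grow between snapshots. A small sketch of that comparison, using a few totals quoted in this thread (the hash=e70a83c8 snapshot versus the first hash=069e0b25 one); the parsing of real coverage/show output is omitted and the dicts are filled in by hand:

```python
# Counter totals taken from two coverage/show snapshots in this thread:
# hash=e70a83c8 (earlier) and hash=069e0b25 (later). Only a handful of
# counters are included for brevity.
SNAP_A = {"miniflow_malloc": 180635403, "hmap_expand": 23680738,
          "util_xalloc": 5834154064, "lflow_run": 295}
SNAP_B = {"miniflow_malloc": 181863745, "hmap_expand": 23829432,
          "util_xalloc": 5872830887, "lflow_run": 297}

def hot_counters(a, b, top=3):
    """Rank counters by how much their totals grew between snapshots."""
    deltas = {k: b[k] - a[k] for k in a}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]

for name, delta in hot_counters(SNAP_A, SNAP_B):
    print(f"{name}: +{delta}")
# util_xalloc and miniflow_malloc dominate: tens of millions of
# allocations between snapshots while lflow_run barely moved, i.e. the
# controller is recomputing flows internally, not reinstalling them.
```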
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
On Fri, Aug 7, 2020 at 12:57 PM Tony Liu wrote:
> Enabled debug logging; there are tons of messages.
> Note there are 4353 datapath bindings and 13078 port bindings in the SB
> DB, and 4097 LS, 8470 LSP, 256 LR and 4352 LRP in the NB DB. Every 16
> LSes connect to a router. All routers connect to the external network.
>
> ovn-controller on the compute node is fine; the ovn-controller on the
> gateway node is taking 100% CPU. Is it perhaps related to the ports on
> the external network? Any specific messages I need to check?
>
> Any hint on how to look into it is appreciated!

If it happens only on the gateway, it may be related to ARP handling.
Could you share the output of "ovn-appctl -t ovn-controller
coverage/show", with 2 - 3 runs at a 5-second interval? For the debug log,
I'd first check whether there is any OVSDB notification from the SB DB,
and if yes, what the changes are.

> Thanks!
> Tony
>
> > -----Original Message-----
> > From: Han Zhou
> > Sent: Friday, August 7, 2020 12:39 PM
> > To: Tony Liu
> > Cc: ovs-discuss; ovs-dev
> > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while
> > no changes in sb-db
> >
> > [quoted exchange trimmed; the same messages appear in full later in
> > the thread]
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
Enabled debug logging; there are tons of messages.
Note there are 4353 datapath bindings and 13078 port bindings in the SB DB,
and 4097 LS, 8470 LSP, 256 LR and 4352 LRP in the NB DB. Every 16 LSes
connect to a router. All routers connect to the external network.

ovn-controller on the compute node is fine; the ovn-controller on the
gateway node is taking 100% CPU. Is it perhaps related to the ports on the
external network? Any specific messages I need to check?

Any hint on how to look into it is appreciated!

Thanks!
Tony

> -----Original Message-----
> From: Han Zhou
> Sent: Friday, August 7, 2020 12:39 PM
> To: Tony Liu
> Cc: ovs-discuss; ovs-dev
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
>
> [quoted exchange trimmed; the same messages appear in full later in the
> thread]
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
Inline...

Thanks!
Tony

> -----Original Message-----
> From: Han Zhou
> Sent: Friday, August 7, 2020 12:29 PM
> To: Tony Liu
> Cc: ovs-discuss; ovs-dev
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
>
> On Fri, Aug 7, 2020 at 12:19 PM Tony Liu wrote:
> >
> > ovn-controller is using a UNIX socket to connect to the local
> > ovsdb-server.
>
> From the log you were showing, you were using tcp:127.0.0.1:6640

Sorry, what I meant was, given your advice, I just made the change for
ovn-controller to use a UNIX socket.

> to connect to the local ovsdb.
>
> > 2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
> > connection dropped (Broken pipe)
> >
> > The inactivity probe doesn't seem to be the cause of the high CPU
> > usage.
> >
> > The wakeup on the connection to the SB DB is always followed by an
> > "unreasonably long" warning. I guess the POLLIN event loop is stuck
> > for too long, like 10 s as below.
> >
> > 2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN]
> > on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157
> > (99% CPU usage)
> > 2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms
> > poll interval (10075ms user, 1ms system)
> >
> > Could that stuck loop be the cause of the high CPU usage?
> > What is it polling on?
> > Why is it stuck, waiting for a message from the SB DB?
> > Isn't it supposed to release the CPU while waiting?
>
> This log means there are messages received from 10.6.20.86:6642 (the SB
> DB). Is there an SB change? The CPU is spent on handling the SB change.
> Some types of SB changes are not handled incrementally.

SB updates are driven by ovn-northd in case anything changed in the NB DB,
and by ovn-controller in case anything changed on the chassis. No, there is
nothing changed in the NB DB, nor on the chassis.

Should I bump the logging level up to dbg? Is that going to show me what
messages ovn-controller is handling?

> > -----Original Message-----
> > From: Han Zhou
> > Sent: Friday, August 7, 2020 10:32 AM
> > [quoted earlier messages trimmed; they appear in full later in the
> > thread]
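The "Broken pipe" reconnects and the long poll intervals discussed above are linked: ovn-controller's loop is single-threaded, so while it spends 10+ seconds recomputing, it cannot answer the peer's inactivity (echo) probe. A minimal arithmetic sketch, assuming the common 5-second default probe interval (it is configurable per connection, so treat the constant as an assumption):

```python
# Assumed default OVSDB inactivity-probe interval of 5 s; it is
# configurable per connection, so this is an illustration, not a fixed
# protocol constant.
PROBE_INTERVAL_MS = 5000
observed_poll_ms = 10153  # "Unreasonably long 10153ms poll interval"

def probe_missed(poll_ms, probe_ms=PROBE_INTERVAL_MS):
    """A single-threaded loop blocked longer than the probe interval
    cannot reply to the peer's echo, so the peer may drop the session."""
    return poll_ms > probe_ms

print(probe_missed(observed_poll_ms))  # True: the stall outlives the probe
```

This is why raising the probe interval (or fixing the recompute) usually makes the spurious disconnects disappear together with the CPU spikes.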
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
On Fri, Aug 7, 2020 at 12:35 PM Tony Liu wrote:
> Inline...
>
> Thanks!
> Tony
>
> > -----Original Message-----
> > From: Han Zhou
> > Sent: Friday, August 7, 2020 12:29 PM
> > To: Tony Liu
> > Cc: ovs-discuss; ovs-dev
> > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while
> > no changes in sb-db
> >
> > On Fri, Aug 7, 2020 at 12:19 PM Tony Liu wrote:
> > >
> > > ovn-controller is using a UNIX socket to connect to the local
> > > ovsdb-server.
> >
> > From the log you were showing, you were using tcp:127.0.0.1:6640
>
> Sorry, what I meant was, given your advice, I just made the change for
> ovn-controller to use a UNIX socket.

Oh, I see, no worries.

> > to connect to the local ovsdb.
> >
> > > 2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
> > > connection dropped (Broken pipe)
> > >
> > > The inactivity probe doesn't seem to be the cause of the high CPU
> > > usage.
> > >
> > > The wakeup on the connection to the SB DB is always followed by an
> > > "unreasonably long" warning. I guess the POLLIN event loop is stuck
> > > for too long, like 10 s as below.
> > >
> > > 2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to
> > > [POLLIN] on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at
> > > lib/stream-fd.c:157 (99% CPU usage)
> > > 2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long
> > > 10153ms poll interval (10075ms user, 1ms system)
> > >
> > > Could that stuck loop be the cause of the high CPU usage?
> > > What is it polling on?
> > > Why is it stuck, waiting for a message from the SB DB?
> > > Isn't it supposed to release the CPU while waiting?
> >
> > This log means there are messages received from 10.6.20.86:6642 (the
> > SB DB). Is there an SB change? The CPU is spent on handling the SB
> > change. Some types of SB changes are not handled incrementally.
>
> SB updates are driven by ovn-northd in case anything changed in the NB
> DB, and by ovn-controller in case anything changed on the chassis. No,
> there is nothing changed in the NB DB, nor on the chassis.
>
> Should I bump the logging level up to dbg? Is that going to show me
> what messages ovn-controller is handling?

Yes, debug log should show the details.

> > Thanks!
> > Tony
> >
> > > -----Original Message-----
> > > From: Han Zhou
> > > Sent: Friday, August 7, 2020 10:32 AM
> > > [quoted earlier messages trimmed; they appear in full later in the
> > > thread]
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
On Fri, Aug 7, 2020 at 12:19 PM Tony Liu wrote:
> ovn-controller is using a UNIX socket connecting to the local
> ovsdb-server.

From the log you were showing, you were using tcp:127.0.0.1:6640 to
connect to the local ovsdb.

> 2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
> connection dropped (Broken pipe)
>
> Inactivity probe doesn't seem to be the cause of high cpu usage.
>
> The wakeup on the connection to sb-db is always followed by an
> "unreasonably long" warning. I guess the pollin event loop is stuck for
> too long, like 10s as below.
>
> 2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN] on
> fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157 (99%
> CPU usage)
> 2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms
> poll interval (10075ms user, 1ms system)
>
> Could that stuck loop be the cause of high cpu usage?
> What is it polling in?
> Why is it stuck, waiting for a message from sb-db?
> Isn't it supposed to release the cpu while waiting?

This log means there are messages received from 10.6.20.86:6642 (the SB
DB). Is there an SB change? The CPU is spent on handling the SB change.
Some types of SB changes are not handled incrementally.

> Thanks!
> Tony
>
> > [...]

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
ovn-controller is using a UNIX socket connecting to the local
ovsdb-server.

Inactivity probe doesn't seem to be the cause of high cpu usage.

The wakeup on the connection to sb-db is always followed by an
"unreasonably long" warning. I guess the pollin event loop is stuck for
too long, like 10s as below.

2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN] on fd
19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU
usage)
2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms poll
interval (10075ms user, 1ms system)

Could that stuck loop be the cause of high cpu usage?
What is it polling in?
Why is it stuck, waiting for a message from sb-db?
Isn't it supposed to release the cpu while waiting?

Thanks!
Tony

> -----Original Message-----
> From: Han Zhou
> Sent: Friday, August 7, 2020 10:32 AM
> To: Tony Liu
> Cc: ovs-discuss <disc...@openvswitch.org>; ovs-dev <d...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
>
> [...]
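On the "unreasonably long poll interval" question above, a minimal sketch may help (plain Python with hypothetical names, not the actual lib/timeval.c logic): the interval being warned about is not time spent blocked in poll(), which releases the CPU, but the time the main loop spent busy between returning from one poll() and entering the next. That is why a long SB-change computation shows up as both 99% CPU and a long "poll interval".

```python
import time

# Illustrative: the "poll interval" is the busy stretch between two
# poll() calls, i.e. the time spent handling a wakeup.
WARN_THRESHOLD_MS = 1000  # illustrative; the real threshold adapts

def loop_iteration(handle_wakeup):
    # ... poll() would block here, using no CPU while waiting ...
    start = time.monotonic()
    handle_wakeup()  # e.g. recomputing flows after an SB update
    busy_ms = (time.monotonic() - start) * 1000
    if busy_ms > WARN_THRESHOLD_MS:
        return f"Unreasonably long {busy_ms:.0f}ms poll interval"
    return None

print(loop_iteration(lambda: None))             # fast handler: no warning
print(loop_iteration(lambda: time.sleep(1.1)))  # slow handler: warning
```

So the loop is not "stuck waiting"; it woke up on POLLIN and then spent the next ~10s of user time computing, which is consistent with the 10075ms-user figure in the warning.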
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
On Fri, Aug 7, 2020 at 10:05 AM Tony Liu wrote:
> Hi,
>
> Here are some logging snippets from ovn-controller.
>
> [...]
>
> ovn-controller is taking 100% CPU now, and there are no changes in
> sb-db (not busy). It seems that it's busy with the local ovsdb-server
> or vswitchd. I'd like to understand why ovn-controller is so busy?
> All inactivity probe intervals are set to 30s.

Is there a change from the local ovsdb? You can enable dbg log to see
what is happening.

For the local ovsdb probe, I have mentioned in the other thread that a
UNIX socket is recommended (instead of tcp 127.0.0.1). Using a UNIX
socket disables the probe by default.

Thanks,
Han
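The probe point above relates directly to the "connection dropped (Broken pipe)" and reconnect lines earlier in the thread. A rough sketch of the inactivity-probe behavior (illustrative Python; the class and method names are made up, not the lib/reconnect.c API):

```python
# If nothing is received for a probe interval, the client sends an echo
# probe; if the peer stays silent for another interval, the connection
# is declared dead and reconnection starts.
PROBE_INTERVAL = 30.0  # seconds, matching "all probe intervals set to 30s"

class ProbeState:
    def __init__(self):
        self.idle = 0.0
        self.sent_probe = False

    def on_received(self):
        """Any received message resets the idle timer."""
        self.idle = 0.0
        self.sent_probe = False

    def tick(self, dt):
        """Advance time by dt seconds; return 'probe', 'disconnect' or None."""
        self.idle += dt
        if self.idle >= PROBE_INTERVAL:
            if not self.sent_probe:
                self.sent_probe = True
                self.idle = 0.0
                return "probe"      # send echo request
            return "disconnect"     # no echo reply either: drop, reconnect
        return None

s = ProbeState()
assert s.tick(31) == "probe"       # idle too long: probe the peer
assert s.tick(31) == "disconnect"  # still silent: drop the connection
```

The same mechanism works in reverse: if ovn-controller's main loop is stuck computing for longer than the interval, it cannot answer the server's probe, and the server drops the connection, which would match the Broken pipe seen while CPU is at 99%. A UNIX socket connection disables the probe by default, avoiding this.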
[ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
Hi,

Here are some logging snippets from ovn-controller.

2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably long 8954ms poll
interval (8895ms user, 0ms system)

What does that mean? Is it harmless?

2020-08-07T16:38:04.021Z|29251|timeval|WARN|context switches: 0 voluntary,
6 involuntary
2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to [POLLIN] on fd
19 (10.6.20.91:60398<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU
usage)

Is this wakeup caused by changes in sb-db?
Why is ovn-controller so busy?

2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
connection dropped (Broken pipe)

The connection to the local ovsdb-server is dropped.
Is this caused by the timeout of the inactivity probe?

2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to [POLLIN] on fd
20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (99% CPU
usage)

What causes this wakeup?

2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to 0-ms timeout
at lib/ovsdb-idl.c:5391 (99% CPU usage)

What does this 0-ms wakeup mean?

2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to 962-ms timeout
at lib/reconnect.c:643 (99% CPU usage)
2020-08-07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640:
connecting...
2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to [POLLOUT] on
fd 14 (127.0.0.1:51478<->127.0.0.1:6640) at lib/stream-fd.c:153 (99% CPU
usage)
2020-08-07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640:
connected

It retries connecting to the local ovsdb-server. A POLLOUT event is
triggered right after the connection is established. What's POLLOUT?

ovn-controller is taking 100% CPU now, and there are no changes in
sb-db (not busy). It seems that it's busy with the local ovsdb-server
or vswitchd. I'd like to understand why ovn-controller is so busy?
All inactivity probe intervals are set to 30s.

Thanks!
Tony
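On the "What's POLLOUT?" question: POLLIN means the fd has data ready to read, while POLLOUT means the fd is ready to be written to; a freshly connected socket with an empty send buffer is immediately writable, which is why the POLLOUT wakeup appears right after "connected". A minimal, illustrative sketch of these semantics using Python's select.poll (not OVS's own poll loop):

```python
import select
import socket

# POLLIN fires when there is data to read; POLLOUT fires when the fd is
# writable, which a just-connected socket always is.
a, b = socket.socketpair()
p = select.poll()
p.register(a, select.POLLIN | select.POLLOUT)

# A fresh socket with an empty send buffer is writable: [POLLOUT].
events = dict(p.poll(0))
assert events[a.fileno()] & select.POLLOUT

b.sendall(b"update")  # the peer sends data, e.g. an SB-DB update
events = dict(p.poll(1000))
assert events[a.fileno()] & select.POLLIN  # "wakeup due to [POLLIN]"

a.close()
b.close()
```

The "wakeup due to 0-ms timeout" lines fit the same pattern: a module asked to be woken immediately (a zero timeout) so it can finish pending work on the next loop iteration, rather than waiting for an fd event.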