Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Tony Liu
Good to know. Yes, I am using 20.03.
Will try to upgrade.


Thanks!

Tony
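
For reference, a quick way to confirm which version is actually running,
assuming the same kolla-style "ovn_controller" container used elsewhere in
this thread:

    # Prints the OVN/OVS version of the running ovn-controller binary.
    docker exec ovn_controller ovn-controller --version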
> -----Original Message-----
> From: Han Zhou
> Sent: Friday, August 7, 2020 3:39 PM
> To: Tony Liu
> Cc: ovs-dev; ovs-discuss <disc...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
> 
> The change in external-ids is no longer monitored by ovn-controller
> after version 20.06. Are you perhaps still using an older version?

Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Han Zhou
The change in external-ids is no longer monitored by ovn-controller
after version 20.06. Are you perhaps still using an older version?

On Fri, Aug 7, 2020 at 3:11 PM Tony Liu wrote:

> Are the changes made by the Neutron liveness check in SB properly handled
> by inc_proc_eng?
>
> Given my testing scale, even with the frequency lowered by that fix,
> ovn-controller still takes quite a lot of CPU and time to compute SB.
>
>
> Thanks!
>
> Tony

Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Tony Liu
Are the changes made by the Neutron liveness check in SB properly handled
by inc_proc_eng?

Given my testing scale, even with the frequency lowered by that fix,
ovn-controller still takes quite a lot of CPU and time to compute SB.


Thanks!

Tony

Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Tony Liu
Thanks for the hint!

2020-08-07T19:44:18.019Z|616614|jsonrpc|DBG|tcp:10.6.20.84:6642: received 
notification, method="update3", 
params=[["monid","OVN_Southbound"],"5002cb22-13e5-490a-9a64-5d48914138ca",{"Chassis":{"0390b621-152b-48a0-a3d1-2973c0b823cc":{"modify":{"external_ids":["map",[["neutron:liveness_check_at","2020-08-07T19:44:07.130233+00:00"]]]]


Nailed it...
https://bugs.launchpad.net/neutron/+bug/1883554
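
For anyone debugging the same symptom: a minimal way to watch this Chassis
churn directly, assuming the SB DB address from the log above:

    # Prints a row update for every change to Chassis.external_ids.
    ovsdb-client monitor tcp:10.6.20.84:6642 OVN_Southbound Chassis external_ids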


Tony
> -----Original Message-----
> From: dev On Behalf Of Tony Liu
> Sent: Friday, August 7, 2020 1:14 PM
> To: Han Zhou
> Cc: ovs-dev; ovs-discuss <disc...@openvswitch.org>
> Subject: Re: [ovs-dev] [ovs-discuss] [OVN] ovn-controller takes 100% cpu
> while no changes in sb-db
> 
> 
> Here is the output.
> 
> [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
> Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=e70a83c8:
> lflow_run                 0.0/sec        0.083/sec        0.0725/sec   total: 295
> miniflow_malloc           0.0/sec    44356.817/sec    44527.3975/sec   total: 180635403
> hindex_pathological       0.0/sec        0.000/sec        0./sec       total: 7187
> hindex_expand             0.0/sec        0.000/sec        0./sec       total: 17
> hmap_pathological         0.0/sec        4.167/sec        4.1806/sec   total: 25091
> hmap_expand               0.0/sec     5366.500/sec     5390.0800/sec   total: 23680738
> txn_unchanged             0.0/sec        0.300/sec        0.3175/sec   total: 11024
> txn_incomplete            0.0/sec        0.100/sec        0.0836/sec   total: 974
> txn_success               0.0/sec        0.033/sec        0.0308/sec   total: 129
> txn_try_again             0.0/sec        0.000/sec        0.0003/sec   total: 1
> poll_create_node          0.4/sec        1.933/sec        1.9575/sec   total: 55611
> poll_zero_timeout         0.0/sec        0.067/sec        0.0556/sec   total: 241
> rconn_queued              0.0/sec        0.050/sec        0.0594/sec   total: 1208720
> rconn_sent                0.0/sec        0.050/sec        0.0594/sec   total: 1208720
> seq_change                0.2/sec        0.783/sec        0.7492/sec   total: 13962
> pstream_open              0.0/sec        0.000/sec        0./sec       total: 1
> stream_open               0.0/sec        0.000/sec        0.0003/sec   total: 5
> unixctl_received          0.0/sec        0.000/sec        0.0011/sec   total: 4
> unixctl_replied           0.0/sec        0.000/sec        0.0011/sec   total: 4
> util_xalloc               0.8/sec  1396586.967/sec   240916.6047/sec   total: 5834154064
> vconn_open                0.0/sec        0.000/sec        0./sec       total: 2
> vconn_received            0.0/sec        0.050/sec        0.0594/sec   total: 632
> vconn_sent                0.0/sec        0.050/sec        0.0494/sec   total: 1213248
> netlink_received          0.0/sec        0.300/sec        0.2900/sec   total: 1188
> netlink_recv_jumbo        0.0/sec        0.083/sec        0.0725/sec   total: 296
> netlink_sent              0.0/sec        0.300/sec        0.2900/sec   total: 1188
> cmap_expand               0.0/sec        0.000/sec        0./sec       total: 3
> 82 events never hit
> [root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
> Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=d0107601:
> lflow_run                 0.2/sec        0.083/sec        0.0717/sec   total: 296
> miniflow_malloc      122834.2/sec    51180.917/sec    43930.2869/sec   total: 181249574
> hindex_pathological       0.0/sec        0.000/sec        0./sec       total: 7187
> hindex_expand             0.0/sec        0.000/sec        0./sec       total: 17
> hmap_pathological        13.2/sec        4.967/sec        4.1264/sec   total: 25157
> hmap_expand           14982.2/sec     6205.067/sec     5317.9547/sec   total: 23755649
> txn_unchanged             1.4/sec        0.400/sec        0.3144/sec   total: 11031
> txn_incomplete            0.4/sec        0.117/sec        0.0825/sec   total: 976
> txn_success               0.2/sec        0.050/sec        0.0306/sec   total: 130
> txn_try_again             0.0/sec        0.000/sec        0.0003/sec   total: 1
> poll_create_node          7.6/sec        2.467/sec        1.9353/sec   total: 55649
> poll_zero_timeout         0.4/sec        0.100/sec        0.0547/sec   total: 243
> rconn_queued              0.4/sec        0.083/sec        0.0592/sec   total: 1208722
> rconn_sent                0.4/sec        0.083/sec        0.0592/sec

Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Tony Liu
Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=069e0b25:
lflow_run                 0.2/sec        0.100/sec        0.0719/sec   total: 297
miniflow_malloc      122834.2/sec    61417.100/sec    44100.8900/sec   total: 181863745
hindex_pathological       0.0/sec        0.000/sec        0./sec       total: 7187
hindex_expand             0.0/sec        0.000/sec        0./sec       total: 17
hmap_pathological        10.0/sec        5.800/sec        4.1403/sec   total: 25207
hmap_expand           14756.6/sec     7434.783/sec     5338.4500/sec   total: 23829432
txn_unchanged             0.4/sec        0.433/sec        0.3150/sec   total: 11033
txn_incomplete            0.0/sec        0.117/sec        0.0825/sec   total: 976
txn_success               0.0/sec        0.050/sec        0.0306/sec   total: 130
txn_try_again             0.0/sec        0.000/sec        0.0003/sec   total: 1
poll_create_node          2.6/sec        2.650/sec        1.9389/sec   total: 55662
poll_zero_timeout         0.0/sec        0.100/sec        0.0547/sec   total: 243
rconn_queued              0.0/sec        0.083/sec        0.0592/sec   total: 1208722
rconn_sent                0.0/sec        0.083/sec        0.0592/sec   total: 1208722
seq_change                1.4/sec        1.000/sec        0.7414/sec   total: 13980
pstream_open              0.0/sec        0.000/sec        0./sec       total: 1
stream_open               0.0/sec        0.000/sec        0.0003/sec   total: 5
unixctl_received          0.2/sec        0.033/sec        0.0017/sec   total: 6
unixctl_replied           0.2/sec        0.033/sec        0.0017/sec   total: 6
util_xalloc         3864890.0/sec  1933841.933/sec   227487.1978/sec   total: 5872830887
vconn_open                0.0/sec        0.000/sec        0./sec       total: 2
vconn_received            0.0/sec        0.083/sec        0.0592/sec   total: 634
vconn_sent                0.0/sec        0.083/sec        0.0492/sec   total: 1213250
netlink_received          0.8/sec        0.400/sec        0.2872/sec   total: 1196
netlink_recv_jumbo        0.2/sec        0.100/sec        0.0719/sec   total: 298
netlink_sent              0.8/sec        0.400/sec        0.2872/sec   total: 1196
cmap_expand               0.0/sec        0.000/sec        0./sec       total: 3
82 events never hit
[root@gateway-1 ~]# docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=069e0b25:
lflow_run                 0.0/sec        0.083/sec        0.0719/sec   total: 297
miniflow_malloc           0.0/sec    51180.917/sec    44100.8900/sec   total: 181863745
hindex_pathological       0.0/sec        0.000/sec        0./sec       total: 7187
hindex_expand             0.0/sec        0.000/sec        0./sec       total: 17
hmap_pathological         2.4/sec        4.967/sec        4.1436/sec   total: 25219
hmap_expand             171.8/sec     6205.350/sec     5338.6886/sec   total: 23830291
txn_unchanged             1.2/sec        0.433/sec        0.3167/sec   total: 11039
txn_incomplete            0.0/sec        0.100/sec        0.0825/sec   total: 976
txn_success               0.0/sec        0.033/sec        0.0306/sec   total: 130
txn_try_again             0.0/sec        0.000/sec        0.0003/sec   total: 1
poll_create_node          4.6/sec        2.583/sec        1.9453/sec   total: 55685
poll_zero_timeout         0.0/sec        0.067/sec        0.0547/sec   total: 243
rconn_queued              0.2/sec        0.083/sec        0.0594/sec   total: 1208723
rconn_sent                0.2/sec        0.083/sec        0.0594/sec   total: 1208723
seq_change                1.0/sec        0.933/sec        0.7428/sec   total: 13985
pstream_open              0.0/sec        0.000/sec        0./sec       total: 1
stream_open               0.0/sec        0.000/sec        0.0003/sec   total: 5
unixctl_received          0.2/sec        0.050/sec        0.0019/sec   total: 7
unixctl_replied           0.2/sec        0.050/sec        0.0019/sec   total: 7
util_xalloc            4345.0/sec  1611785.933/sec   227493.2325/sec   total: 5872852612
vconn_open                0.0/sec        0.000/sec        0./sec       total: 2
vconn_received            0.2/sec        0.083/sec        0.0594/sec   total: 635
vconn_sent                0.2/sec        0.083/sec        0.0494/sec   total: 1213251
netlink_received          0.0/sec        0.333/sec        0.2872/sec   total: 1196
netlink_recv_jumbo        0.0/sec        0.083/sec        0.0719/sec   total: 298
netlink_sent              0.0/sec        0.333/sec        0.2872/sec   total: 1196
cmap_expand               0.0/sec        0.000/sec        0./sec       total: 3
82 events never hit



Thanks!

Tony
> -----Original Message-----
> From: Han Zhou
> Sent: Friday, August 7, 2020 1:09 PM
> To: Tony Liu
> Cc: ovs-discuss; ovs-dev <d...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db

Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Han Zhou
On Fri, Aug 7, 2020 at 12:57 PM Tony Liu wrote:

> Enabled debug logging, there are tons of messages.
> Note there are 4353 datapath bindings and 13078 port bindings in SB.
> 4097 LS, 8470 LSP, 256 LR and 4352 LRP in NB. Every 16 LS connect to
> a router. All routers connect to the external network.
>
> The ovn-controller on the compute node is fine, but the ovn-controller on
> the gateway node is taking 100% cpu. Is it perhaps related to the ports on
> the external network? Any specific messages I should check?
>
> Any hint to look into it is appreciated!
>
>
If it happens only on the gateway, it may be related to ARP handling. Could you
share the output of ovn-appctl -t ovn-controller coverage/show, with 2-3
runs at a 5s interval?
For the debug log, I'd first check whether there is any OVSDB notification from
the SB DB, and if yes, what the changes are.
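
A minimal sketch of how to capture those samples, assuming the same
containerized setup shown earlier in the thread:

    # Three coverage snapshots, 5 seconds apart, so the rates can be compared.
    for i in 1 2 3; do
        docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl coverage/show
        sleep 5
    done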

>
> Thanks!
>
> Tony
Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Tony Liu
Enabled debug logging, there are tons of messages.
Note there are 4353 datapath bindings and 13078 port bindings in SB.
4097 LS, 8470 LSP, 256 LR and 4352 LRP in NB. Every 16 LS connect to
a router. All routers connect to the external network.

The ovn-controller on the compute node is fine, but the ovn-controller on
the gateway node is taking 100% cpu. Is it perhaps related to the ports on
the external network? Any specific messages I should check?

Any hint to look into it is appreciated!
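
For context, scale numbers like the above can be reproduced with the standard
CLI tools; a sketch, assuming direct access to the NB and SB databases:

    ovn-nbctl ls-list | wc -l    # logical switches in NB
    ovn-nbctl lr-list | wc -l    # logical routers in NB
    # One _uuid line is printed per Port_Binding record in SB.
    ovn-sbctl --columns=_uuid list Port_Binding | grep -c _uuid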


Thanks!

Tony

Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Tony Liu
Inline...

Thanks!

Tony
> -----Original Message-----
> From: Han Zhou
> Sent: Friday, August 7, 2020 12:29 PM
> To: Tony Liu
> Cc: ovs-discuss; ovs-dev <d...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
> 
> 
> 
> On Fri, Aug 7, 2020 at 12:19 PM Tony Liu wrote:
>
>   ovn-controller is using UNIX socket connecting to local ovsdb-server.
>
> From the log you were showing, you were using tcp:127.0.0.1:6640

Sorry, what I meant was, given your advice, I just made the change for
ovn-controller to use UNIX socket.

> to connect the local ovsdb.
> >   2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
> >   connection dropped (Broken pipe)
>
>   Inactivity probe doesn't seem to be the cause of high cpu usage.
>
>   The wakeup on connection to sb-db is always followed by an "unreasonably
>   long" warning. I guess the pollin event loop is stuck for too long, like
>   10s as below.
>   
>   2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN]
>   on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU usage)
>   2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long
>   10153ms poll interval (10075ms user, 1ms system)
>   
>
>   Could that stuck loop be the cause of high cpu usage?
>   What is it polling in?
>   Why is it stuck, waiting for message from sb-db?
>   Isn't it supposed to release the cpu while waiting?
>
> This log means there are messages received from 10.6.20.86:6642 (the SB
> DB). Is there SB change? The CPU is spent on handling the SB change. Some
> type of SB changes are not handled incrementally.

SB update is driven by ovn-northd in case anything changed in NB,
and by ovn-controller in case anything changed on the chassis. No, there
is nothing changed in NB, nor on the chassis.

Should I bump logging level up to dbg? Is that going to show me
what messages ovn-controller is handling?

Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Han Zhou
On Fri, Aug 7, 2020 at 12:35 PM Tony Liu wrote:

> Inline...
>
> Thanks!
>
> Tony
> > From the log you were showing, you were using tcp:127.0.0.1:6640
> > to connect the local ovsdb.
>
> Sorry, what I meant was, given your advice, I just made the change for
> ovn-controller to use UNIX socket.
>
Oh, I see, no worries.

> > This log means there are messages received from 10.6.20.86:6642 (the SB
> > DB). Is there SB change? The CPU is spent on handling the SB change.
> > Some type of SB changes are not handled incrementally.
>
> SB update is driven by ovn-northd in case anything changed in NB,
> and by ovn-controller in case anything changed on the chassis. No, there
> is nothing changed in NB, nor on the chassis.
>
> Should I bump logging level up to dbg? Is that going to show me
> what messages ovn-controller is handling?

Yes, debug log should show the details.


Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Han Zhou
On Fri, Aug 7, 2020 at 12:19 PM Tony Liu wrote:

> ovn-controller is using UNIX socket connecting to local ovsdb-server.
>

From the log you were showing, you were using tcp:127.0.0.1:6640 to connect
the local ovsdb.
>   2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
>   connection dropped (Broken pipe)


> Inactivity probe doesn't seem to be the cause of high cpu usage.
>
> The wakeup on connection to sb-db is always followed by an "unreasonably
> long" warning. I guess the pollin event loop is stuck for too long, like
> 10s as below.
> 
> 2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN] on fd
> 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU
> usage)
> 2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms poll
> interval (10075ms user, 1ms system)
> 
>
> Could that stuck loop be the cause of high cpu usage?
> What is it polling in?
> Why is it stuck, waiting for message from sb-db?
> Isn't it supposed to release the cpu while waiting?
>
This log means there are messages received from 10.6.20.86:6642 (the SB
DB). Is there SB change? The CPU is spent on handling the SB change. Some
type of SB changes are not handled incrementally.
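
One way to see whether such full recomputes are happening is to watch the
lflow_run counter, which increments on each full logical-flow computation;
a sketch, assuming the appctl socket used earlier in the thread:

    docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl \
        coverage/show | grep lflow_run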


> Thanks!
>
> Tony


Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Tony Liu
ovn-controller is using UNIX socket connecting to local ovsdb-server.
Inactivity probe doesn't seem to be the cause of high cpu usage.

The wakeup on connection to sb-db is always followed by an "unreasonably
long" warning. I guess the pollin event loop is stuck for too long, like
10s as below.

2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU usage)
2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms poll interval (10075ms user, 1ms system)


Could that stuck loop be the cause of high cpu usage?
What is it polling in?
Why is it stuck, waiting for message from sb-db?
Isn't it supposed to release the cpu while waiting?


Thanks!

Tony



Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Han Zhou
On Fri, Aug 7, 2020 at 10:05 AM Tony Liu wrote:

> Hi,
>
> Here are some logging snippets from ovn-controller.
> 
> 2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably long 8954ms poll
> interval (8895ms user, 0ms system)
> 
> What does that mean? Is it harmless?
>
> 
> 2020-08-07T16:38:04.021Z|29251|timeval|WARN|context switches: 0 voluntary,
> 6 involuntary
> 2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to [POLLIN] on fd
> 19 (10.6.20.91:60398<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU
> usage)
> 
> Is this wakeup caused by changes in sb-db?
> Why is ovn-controller so busy?
>
> 
> 2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
> connection dropped (Broken pipe)
> 
> Connection to local ovsdb-server is dropped.
> Is this caused by the timeout of inactivity probe?
>
> 
> 2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to [POLLIN] on fd
> 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (99% CPU
> usage)
> 
> What causes this wakeup?
>
> 
> 2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to 0-ms timeout
> at lib/ovsdb-idl.c:5391 (99% CPU usage)
> 
> What does this 0-ms wakeup mean?
>
> 
> 2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to 962-ms timeout
> at lib/reconnect.c:643 (99% CPU usage)
> 2020-08-07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640:
> connecting...
> 2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to [POLLOUT] on
> fd 14 (127.0.0.1:51478<->127.0.0.1:6640) at lib/stream-fd.c:153 (99% CPU
> usage)
> 2020-08-07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640:
> connected
> 
> It retries connecting to the local ovsdb-server. A POLLOUT event is
> triggered right after the connection is established. What is POLLOUT?
>
> ovn-controller is taking 100% CPU now, and there are no changes in
> sb-db (it is not busy). It seems to be busy with the local ovsdb-server
> or vswitchd. I'd like to understand why ovn-controller is so busy.
> All inactivity probe intervals are set to 30s.
>
>
Is there any change from the local ovsdb? You can enable debug logging to see
what is happening.
For the local ovsdb probe, I mentioned in the other thread that a UNIX
socket is recommended (instead of tcp 127.0.0.1). Using a UNIX socket
disables the probe by default.

Thanks,
Han


[ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

2020-08-07 Thread Tony Liu
Hi,

Here are some logging snippets from ovn-controller.

2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably long 8954ms poll interval (8895ms user, 0ms system)

What does that mean? Is it harmless?


2020-08-07T16:38:04.021Z|29251|timeval|WARN|context switches: 0 voluntary, 6 involuntary
2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (10.6.20.91:60398<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU usage)

Is this wakeup caused by changes in sb-db?
Why is ovn-controller so busy?


2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640: connection dropped (Broken pipe)

Connection to local ovsdb-server is dropped.
Is this caused by the timeout of inactivity probe?


2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to [POLLIN] on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (99% CPU usage)

What causes this wakeup?


2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to 0-ms timeout at lib/ovsdb-idl.c:5391 (99% CPU usage)

What does this 0-ms wakeup mean?


2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to 962-ms timeout at lib/reconnect.c:643 (99% CPU usage)
2020-08-07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640: connecting...
2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to [POLLOUT] on fd 14 (127.0.0.1:51478<->127.0.0.1:6640) at lib/stream-fd.c:153 (99% CPU usage)
2020-08-07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640: connected

It retries connecting to the local ovsdb-server. A POLLOUT event is triggered
right after the connection is established. What is POLLOUT?

ovn-controller is taking 100% CPU now, and there are no changes in
sb-db (it is not busy). It seems to be busy with the local ovsdb-server
or vswitchd. I'd like to understand why ovn-controller is so busy.
All inactivity probe intervals are set to 30s.
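
For reference, a sketch of where those probe intervals are typically set on
the ovn-controller side (key names are the documented ones; note the units):

    # ovn-controller -> SB DB inactivity probe, in milliseconds:
    ovs-vsctl set open_vswitch . external_ids:ovn-remote-probe-interval=30000
    # ovn-controller -> br-int OpenFlow connection probe, in seconds:
    ovs-vsctl set open_vswitch . external_ids:ovn-openflow-probe-interval=30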


Thanks!

Tony

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss