On Thu, Feb 20, 2020 at 1:58 AM Dumitru Ceara <[email protected]> wrote:
>
> On 2/19/20 6:36 PM, Han Zhou wrote:
> >
> > On Wed, Feb 19, 2020 at 4:50 AM Dumitru Ceara <[email protected]> wrote:
> >>
> >> On 2/19/20 12:32 AM, Han Zhou wrote:
> >> > This is useful when external_ids:ovn-monitor-all is set to true.
> >> >
> >> > Signed-off-by: Han Zhou <[email protected]>
> >>
> >> Hi Han,
> >>
> >> Looks good to me.
> >>
> >> Acked-by: Dumitru Ceara <[email protected]>
> >>
> >> I also tested this (together with your previous patch) on a scaled setup
> >> with 150 ovn-fake-multinode nodes and ovn-monitor-all enabled.
> >>
> >> With OVN master I see high CPU usage on ovn-controllers from time to time:
> >>
> >> ovn-netlab1-64/ovn-controller.log:2020-02-19T12:14:11.896Z|00017|timeval|WARN|Unreasonably long 1087ms poll interval (224ms user, 14ms system)
> >> ovn-netlab1-140/ovn-controller.log:2020-02-19T12:14:12.030Z|00017|timeval|WARN|Unreasonably long 1055ms poll interval (241ms user, 11ms system)
> >> ovn-netlab1-69/ovn-controller.log:2020-02-19T12:14:11.856Z|00017|timeval|WARN|Unreasonably long 1019ms poll interval (221ms user, 1ms system)
> >> ovn-netlab1-25/ovn-controller.log:2020-02-19T12:14:11.857Z|00017|timeval|WARN|Unreasonably long 1053ms poll interval (230ms user, 9ms system)
> >> ovn-netlab1-48/ovn-controller.log:2020-02-19T12:14:11.827Z|00017|timeval|WARN|Unreasonably long 1005ms poll interval (245ms user, 22ms system)
> >> ovn-netlab1-80/ovn-controller.log:2020-02-19T12:14:11.936Z|00017|timeval|WARN|Unreasonably long 1127ms poll interval (218ms user, 2ms system)
> >> ovn-netlab1-56/ovn-controller.log:2020-02-19T12:14:01.202Z|00017|timeval|WARN|Unreasonably long 1016ms poll interval (224ms user, 0ms system)
> >> ovn-netlab1-24/ovn-controller.log:2020-02-19T12:14:22.623Z|00017|timeval|WARN|Unreasonably long 1022ms poll interval (227ms user, 1ms system)
> >> ovn-netlab1-65/ovn-controller.log:2020-02-19T12:13:19.585Z|00017|timeval|WARN|Unreasonably long 1012ms poll interval (213ms user, 1ms system)
> >> ovn-netlab1-46/ovn-controller.log:2020-02-19T12:14:11.893Z|00017|timeval|WARN|Unreasonably long 1086ms poll interval (225ms user, 0ms system)
> >> ovn-netlab1-21/ovn-controller.log:2020-02-19T12:13:19.586Z|00017|timeval|WARN|Unreasonably long 1031ms poll interval (222ms user, 0ms system)
> >>
> >> With your changes this happens less often:
> >>
> >> ./localhost/ovn-netlab1-63/ovn-controller.log:2020-02-19T12:46:10.204Z|00017|timeval|WARN|Unreasonably long 1038ms poll interval (223ms user, 1ms system)
> >> ./localhost/ovn-netlab1-67/ovn-controller.log:2020-02-19T12:45:59.677Z|00017|timeval|WARN|Unreasonably long 1033ms poll interval (215ms user, 0ms system)
> >> ./localhost/ovn-netlab1-96/ovn-controller.log:2020-02-19T12:46:10.261Z|00017|timeval|WARN|Unreasonably long 1009ms poll interval (219ms user, 1ms system)
> >> ./localhost/ovn-netlab1-43/ovn-controller.log:2020-02-19T12:46:10.194Z|00017|timeval|WARN|Unreasonably long 1044ms poll interval (222ms user, 0ms system)
> >> ./localhost/ovn-netlab1-58/ovn-controller.log:2020-02-19T12:46:10.253Z|00017|timeval|WARN|Unreasonably long 1091ms poll interval (225ms user, 12ms system)
> >> ./localhost/ovn-netlab1-95/ovn-controller.log:2020-02-19T12:46:10.246Z|00017|timeval|WARN|Unreasonably long 1031ms poll interval (216ms user, 16ms system)
> >>
> >> Regards,
> >> Dumitru
> >>
> >
> > Thanks Dumitru for reviewing and testing it out.
> > Are you seeing high CPU only after applying this patch? In theory I
> > think this patch should not contribute to CPU spike.
> > Enabling ovn-monitor-all can result in higher CPU in ovn-controller in
> > circumstances when not all datapaths are local. In your test case, is
> > the topology ideal for ovn-monitor-all? I.e. does each node care about
> > all datapaths? If the answer is yes, then could you try enabling
> > ovn-monitor-all only on half of the nodes, and see if the nodes with
> > ovn-monitor-all enabled are with higher CPU than others?
> >
>
> Hi Han,
>
> In my test topology all datapaths are local (i.e., all logical switches
> are connected to a single cluster logical router).
>
> The test machine I used initially was oversubscribed so I ran the tests
> again on a setup with more physical machines:
>
> 1. With OVN master, ovn-monitor-all=false, bringing up 300 nodes (300
> logical switches + one VIF per switch):
> - SB DB CPU usage is high after a certain number of nodes come up.
> Running perf on the setup points to ovsdb_monitor_get_update that takes
> up to 70% CPU time (including children). This is due to each
> ovn-controller subscribing to OVSDB updates for all datapaths
> individually.
> - ovn-controller CPU usage is normal, i.e., no visible CPU spikes.
>
> 2. With OVN master, ovn-monitor-all=true, bringing up 300 nodes:
> - SB DB CPU usage is low, no visible CPU spikes.
> - ovn-controller CPU usage is normal as well.
>
> 3. With OVN master + your patches, ovn-monitor-all=true, bringing up 300
> nodes:
> - SB DB CPU usage is low, no visible CPU spikes.
> - ovn-controller CPU usage is normal as well.
>
> In conclusion all seems fine to me and even in the worst case scenario,
> when all datapaths are local, ovn-controller cpu usage is not affected
> by the extra datapath lookups introduced by your changes.
>
> Thanks,
> Dumitru
>
> > In addition, did you see any difference of CPU usage on SB DB?
> >
> > Thanks,
> > Han
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Hi Dumitru, thanks for the testing and sharing! I applied the patch to master.
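
For anyone repeating this comparison, the "Unreasonably long ... poll interval" warnings quoted in the thread can be tallied with a small script instead of being read by eye. The following is only a minimal sketch: the helper names (`PollWarning`, `parse_poll_warnings`, `summarize`) are hypothetical and not part of OVN or OVS, and the pattern assumes the grep-style `<node>/ovn-controller.log:<timestamp>|...` layout shown in the thread. The sample input reuses four of the quoted log lines.

```python
import re
from typing import NamedTuple


class PollWarning(NamedTuple):
    """One 'Unreasonably long poll interval' warning (hypothetical helper type)."""
    node: str
    timestamp: str
    total_ms: int
    user_ms: int
    system_ms: int


# Assumes the "<node>/ovn-controller.log:<timestamp>|<seq>|timeval|WARN|..."
# layout quoted in the thread. \s+ tolerates a plain line wrap inside the
# message, but not mail quote markers such as ">>".
POLL_WARNING = re.compile(
    r"(?P<node>[^/\s]+)/ovn-controller\.log:"
    r"(?P<timestamp>[^|\s]+)\|\d+\|timeval\|WARN\|"
    r"Unreasonably\s+long\s+(?P<total>\d+)ms\s+poll\s+interval\s+"
    r"\((?P<user>\d+)ms\s+user,\s+(?P<system>\d+)ms\s+system\)"
)


def parse_poll_warnings(text):
    """Return every poll-interval warning found in `text`, in order."""
    return [
        PollWarning(
            node=m["node"],
            timestamp=m["timestamp"],
            total_ms=int(m["total"]),
            user_ms=int(m["user"]),
            system_ms=int(m["system"]),
        )
        for m in POLL_WARNING.finditer(text)
    ]


def summarize(warnings):
    """Count the warnings and report the longest interval and mean user time."""
    if not warnings:
        return {"count": 0, "max_total_ms": None, "mean_user_ms": None}
    return {
        "count": len(warnings),
        "max_total_ms": max(w.total_ms for w in warnings),
        "mean_user_ms": sum(w.user_ms for w in warnings) / len(warnings),
    }


# Two lines from the "OVN master" run and two from the patched run.
MASTER_SAMPLE = """\
ovn-netlab1-64/ovn-controller.log:2020-02-19T12:14:11.896Z|00017|timeval|WARN|Unreasonably long 1087ms poll interval (224ms user, 14ms system)
ovn-netlab1-140/ovn-controller.log:2020-02-19T12:14:12.030Z|00017|timeval|WARN|Unreasonably long 1055ms poll interval (241ms user, 11ms system)
"""

PATCHED_SAMPLE = """\
./localhost/ovn-netlab1-63/ovn-controller.log:2020-02-19T12:46:10.204Z|00017|timeval|WARN|Unreasonably long 1038ms poll interval (223ms user, 1ms system)
./localhost/ovn-netlab1-67/ovn-controller.log:2020-02-19T12:45:59.677Z|00017|timeval|WARN|Unreasonably long 1033ms poll interval (215ms user, 0ms system)
"""

if __name__ == "__main__":
    for label, sample in (("master", MASTER_SAMPLE), ("patched", PATCHED_SAMPLE)):
        print(label, summarize(parse_poll_warnings(sample)))
```

In every quoted warning the user and system times are a small fraction of the total interval, so a tally like this only shows how often the warning fired. It does not by itself say where the time went.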
