Hi Cory,

With 4000 networks all connected to one router with an external GW, all networks and the router were created and connected. I launched a few VMs on some of the networks; they are connected and all have external connectivity. When running ping from a VM, roughly one ping takes a few seconds for every 10+ normal pings (< 1 ms).

When checking CPU usage, I see the Neutron server, OVN DB, OVN controller and ovs-vswitchd all taking almost 100% CPU. It has been like that for hours already. Since everything has been created and some of the networks work fine (I didn't validate all of them), I'm not sure what those services are busy with. I checked the logs: ovn-controller keeps switching between ovn-sb-db servers because of heartbeat timeouts.
I'd like to know if that's expected, or whether there is something I can tune to fix the problem. If it is expected, I can't think of anything other than building multiple clusters to support that kind of scale. I am now running a test with 4000 networks and 50 routers, 80 networks on each router, and wondering whether that is going to help. The goal is to have thousands of networks connected to the external network. I'd like to know what scale the current OVN is expected to support. Any comment is welcome.

Thanks!
Tony

________________________________
From: Cory Hawkless <[email protected]>
Sent: July 20, 2020 10:04 PM
To: Tony Liu <[email protected]>; [email protected] <[email protected]>
Subject: RE: OVN scale

I would expect to see 100% CPU utilisation on anything involved in the process of creating 4000 networks and routers, but the question is: for how long do you see high utilisation? Does it last for seconds, minutes, hours? Do the resources actually get created after some period of time, or is the process failing?

From: discuss [mailto:[email protected]] On Behalf Of Tony Liu
Sent: Tuesday, 21 July 2020 1:53 PM
To: [email protected]
Subject: [ovs-discuss] OVN scale

Hi folks,

This is my first email here. Please let me know if there is any rule or convention I need to follow; I don't want to break it.

I started with OpenStack Ussuri and OVN 20.03.0 recently and am currently running some scaling tests. I searched around for scaling info and noticed that some improvements have already been presented, which is pretty cool. Is the "incremental" processing by DDlog implemented yet?

With a 3-node OVN DB cluster and 3 compute nodes (with OVN controller), I created 4000 networks from OpenStack and 4000 logical routers with external GW, and added one network to each LR. Port security is disabled on all networks. Then I see ovn-northd, ovn-controller and ovs-vswitchd all take almost 100% CPU. Is this expected?

I revised the solution and am running a test with 4000 networks and 20 LRs, 200 networks on each LR. I will see if this makes any difference.

Is there any scaling and performance report for the latest OVN release that I can use as a reference?

Thanks!
Tony
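
For comparing the layouts mentioned in this thread (4000 networks spread over 4000, 50, 20 or 1 logical routers), a rough object count can be sketched in Python. This is only a back-of-the-envelope illustration, not a measurement: the function name `count_objects` is hypothetical, and it assumes one logical switch per network, one logical router port per attached network, and one extra gateway port per router. Real deployments create further rows (switch ports, NAT entries, logical flows) that are not modelled here.

```python
def count_objects(networks: int, routers: int) -> dict:
    """Rough count of logical objects for one test layout.

    Assumptions (not taken from OVN itself): every network maps to one
    logical switch, every network attachment adds one logical router
    port, and every router has exactly one external gateway port.
    """
    if routers <= 0 or networks % routers != 0:
        raise ValueError("networks must divide evenly across routers")
    return {
        "logical_switches": networks,
        "logical_routers": routers,
        "networks_per_router": networks // routers,
        # one port per attached network plus one gateway port per router
        "router_ports": networks + routers,
    }


if __name__ == "__main__":
    # Layouts described in the thread: (networks, routers)
    for networks, routers in [(4000, 4000), (4000, 50), (4000, 20), (4000, 1)]:
        print(routers, "routers:", count_objects(networks, routers))
```

Under these assumptions the number of logical switches stays at 4000 in every layout, so consolidating routers mainly reduces the router and gateway-port count rather than the total number of networks the databases have to track.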
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
