Hi Cory,

With 4000 networks all connected to one router with an external GW, all
networks and the router are created and connected. I launched a few VMs on
some of the networks; they are connected and all have external connectivity.
When running ping on a VM, there is roughly one slow ping (a few seconds) for
every 10+ normal pings (< 1 ms). When checking CPU usage, I see Neutron
server, OVN DB, OVN controller and ovs-vswitchd all taking almost 100% CPU.
It's been like that for hours already. Since everything is created and some
of the networks work fine (I didn't validate all of them), I'm not sure what
those services are busy with. Checking the logs, ovn-controller keeps
switching between ovn-sb-db servers because of heartbeat timeouts.

I'd like to know if that's expected, or if there is something I can tune to
fix the problem. If that's expected, I can't think of anything other than
building multiple clusters to support that kind of scale.

I am now running a test with 4000 networks and 50 routers, 80 networks on
each router, and wondering if that's going to help.

The goal is to have thousands of networks connected to the external network.
I'd like to know what scale current OVN is expected to support.

Any comments are welcome.


Thanks!

Tony

________________________________
From: Cory Hawkless <[email protected]>
Sent: July 20, 2020 10:04 PM
To: Tony Liu <[email protected]>; [email protected] 
<[email protected]>
Subject: RE: OVN scale


I would expect to see 100% CPU utilisation on anything involved in the process
of creating 4000 networks and routers, but the question is how long you see
the high utilisation for. Does it last for seconds, minutes, hours?

Do the resources actually get created after some period of time, or is the
process failing?



From: discuss [mailto:[email protected]] On Behalf Of Tony Liu
Sent: Tuesday, 21 July 2020 1:53 PM
To: [email protected]
Subject: [ovs-discuss] OVN scale



Hi folks,

This is my first email here. Please let me know if there is any rule
or convention I need to follow; I don't want to break it.

I started with OpenStack Ussuri and OVN 20.03.0 recently and am currently
running some scaling tests. I searched around for scaling info and noticed
some improvements that have already been presented, which is pretty cool.
Is the "incremental" processing by DDlog implemented yet?

With a 3-node OVN DB cluster and 3 compute nodes (with OVN controller),
I created 4000 networks from OpenStack and 4000 logical routers (LRs) with
an external GW, and added one network to each LR. Port security is disabled
on all networks. Then I see ovn-northd, ovn-controller and ovs-vswitchd all
taking almost 100% CPU. Is this expected?

I revised the setup and am running a test with 4000 networks and 20 LRs,
200 networks on each LR. I will see if this makes any difference.

Is there any scaling and performance report for the latest OVN release
that I can use as a reference?

Thanks!

Tony


_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
