On Thu, Feb 28, 2019 at 09:15:15AM -0800, Han Zhou wrote:
> In scalability tests with ovn-scale-test, ovsdb-server SB load is not a
> problem, at least with 1k HVs. However, if we restart ovsdb-server,
> then depending on the number of HVs and the scale of logical objects,
> e.g. the number of logical ports, the SB ovsdb-server becomes an
> obvious bottleneck.
>
> In our test with 1k HVs and 20k logical ports (200 lports * 100
> lswitches, connected by one single logical router), restarting the SB
> ovsdb-server resulted in 100% CPU usage on ovsdb-server for more than
> 1 hour, while all HVs (and northd) reconnected and resynced the large
> amount of data at the same time.
>
> A similar problem would happen in a failover scenario. With an
> active-active cluster the problem is alleviated slightly, because only
> 1/3 of the HVs (assuming a 3-node cluster) will need to resync data
> from the new servers, but it is still a serious problem.
>
> For a detailed discussion of the problem and solutions, see:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
>
> These patches implement the proposal in that discussion. They
> introduce a new method, monitor_cond_since, which enables a client to
> request only the changes that happened after a specific point, so that
> data already cached on the client is not re-transferred. Scalability
> tests show a dramatic improvement: all HVs finish syncing as soon as
> they reconnect, since there is no new data to be transferred.
Thanks a lot. I applied this to master. I want to encourage you to send another patch adding a NEWS item.
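
In case it helps anyone reading the archives, here is a minimal Python
sketch of the new exchange as I understand it from the series. It talks
JSON-RPC to the SB server directly rather than going through the OVS
IDL; the address, the monitored table/columns, and the use of an
all-zeros UUID as the initial last-txn-id are illustrative assumptions,
not taken from the patches:

    import json
    import socket
    import uuid

    SB_REMOTE = ("127.0.0.1", 6642)   # assumed SB ovsdb-server address
    # Transaction id cached from the previous session; a client with no
    # cache would start from the all-zeros UUID.
    LAST_TXN_ID = "00000000-0000-0000-0000-000000000000"

    request = {
        "id": 1,
        "method": "monitor_cond_since",
        "params": [
            "OVN_Southbound",           # database name
            str(uuid.uuid4()),          # monitor id (any unique JSON value)
            # one monitor-cond-request per table; illustrative columns
            {"Chassis": [{"columns": ["name", "hostname"]}]},
            LAST_TXN_ID,                # last transaction id the client saw
        ],
    }

    sock = socket.create_connection(SB_REMOTE)
    sock.sendall(json.dumps(request).encode())

    # Toy read: a real client would frame/stream the JSON instead of
    # assuming one recv() returns the whole reply.
    reply = json.loads(sock.recv(1 << 20).decode())

    # The result is [found, last_txn_id, updates]: when "found" is true,
    # "updates" carries only the changes after LAST_TXN_ID; when false,
    # the client's cache is stale and "updates" is a full dump.
    found, new_last_txn_id, updates = reply["result"]
    print("found:", found, "new last txn id:", new_last_txn_id)

The last-txn-id parameter is what makes the restart case cheap: a
reconnecting client whose cache is still valid gets back only a small
(often empty) delta instead of a full dump of the database.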
