Hello OVN community,

I'm glad the subject of this message has caught your attention :-)
I would like to start a discussion about how we could improve OVN on the following topics:

* Reduce the memory and CPU footprint of ovn-controller and ovn-northd.
* Support scaling of L2 connectivity across larger clusters.
* Simplify CMS interoperability.
* Allow support for alternative datapath implementations.

This first email will focus on the current issues that (in my view) are preventing OVN from scaling L2 networks on larger clusters. I will send another message with some change proposals to remove or fix these issues.

Disclaimer: I am fairly new to this project and my perception and understanding may be incorrect in some aspects. Please forgive me in advance if I use the wrong terms and/or make invalid statements. My intent is only to make things better and not to put the blame on anyone for the current design choices.

Southbound Design
=================

In the current architecture, both databases contain a mix of state and configuration. While this does not seem to cause any scaling issues for the northbound DB, it can become a bottleneck for the southbound DB with large numbers of chassis and logical network constructs.

The southbound database contains a mix of configuration (logical flows transformed from the logical network topology) and state (chassis, port bindings, mac bindings, FDB entries, etc.).

The "configuration" part is consumed by ovn-controller to implement the network on every chassis, and the "state" part is consumed by ovn-northd to update the northbound "state" entries and to update logical flows. Some CMSs [1] also depend on the southbound "state" in order to function properly.

[1] https://opendev.org/openstack/neutron/src/tag/22.0.0/neutron/agent/ovn/metadata/ovsdb.py#L39-L40

Centralized decisions
=====================

Every chassis needs to be "aware" of all other chassis in the cluster.
This requirement mainly comes from overlay networks that are implemented over a full mesh of point-to-point GENEVE tunnels (or VXLAN with some limitations). It is not a scaling issue by itself, but it implies a centralized decision which in turn puts pressure on the central node at scale.

Due to ovsdb monitoring and caching, any change in the southbound DB (either by northd or by any of the chassis controllers) is replicated on every chassis. The monitor_all option is often enabled on large clusters to avoid the conditional monitoring CPU cost on the central node.

This leads to high memory usage on all chassis, control plane traffic and possible disruptions in the ovs-vswitchd datapath flow cache. Unfortunately, I don't have any hard data to back this claim. It mainly comes from discussions I had with Neutron contributors and from brainstorming sessions with colleagues.

I hope that the current work on OVN heater to integrate OpenStack support [2] will provide more insight.

[2] https://github.com/ovn-org/ovn-heater/pull/179

Dynamic mac learning
====================

Logical switch ports on a given chassis are all connected to the same OVS bridge, in the same VLAN. This prevents the use of local mac address learning and shifts the responsibility to a centralized ovn-northd to create all the logical flows required to properly segment the network.

When using mac_address=unknown ports, centralized mac learning is enabled: when a new address is seen entering a port, OVS sends it to the local controller, which updates the FDB table and recomputes flow rules accordingly. With logical switches spanning a large number of chassis, this centralized mac address learning and aging can have an impact on control plane and dataplane performance.

Closing thoughts
================

My understanding of the L3 and L4 capabilities of OVN is too limited to discuss whether there are other issues that would prevent scaling to thousands of nodes.
My point was mainly focused on L2 network scaling. I would love to get other opinions on these statements.

Cheers!

--
Robin Jarry
Red Hat, Telco/NFV