Hello OVN community,

I'm glad the subject of this message has caught your attention :-)

I would like to start a discussion about how we could improve OVN on the
following topics:

* Reduce the memory and CPU footprint of ovn-controller and ovn-northd.
* Support scaling of L2 connectivity across larger clusters.
* Simplify CMS interoperability.
* Allow support for alternative datapath implementations.

This first email will focus on the current issues that (in my view) are
preventing OVN from scaling L2 networks on larger clusters. I will send
another message with some change proposals to remove or fix these
issues.

Disclaimer:

I am fairly new to this project and my perception and understanding may
be incorrect in some aspects. Please forgive me in advance if I use the
wrong terms and/or make invalid statements. My intent is only to make
things better and not to put the blame on anyone for the current design
choices.

Southbound Design
=================

In the current architecture, both databases contain a mix of state and
configuration. While this does not seem to cause any scaling issues for
the northbound DB, it can become a bottleneck for the southbound with
large numbers of chassis and logical network constructs.

The southbound database contains a mix of configuration (logical flows
transformed from the logical network topology) and state (chassis, port
bindings, MAC bindings, FDB entries, etc.).

The "configuration" part is consumed by ovn-controller to implement the
network on every chassis and the "state" part is consumed by ovn-northd
to update the northbound "state" entries and to update logical flows.
Some CMSs [1] also depend on the southbound "state" in order to
function properly.

[1] https://opendev.org/openstack/neutron/src/tag/22.0.0/neutron/agent/ovn/metadata/ovsdb.py#L39-L40

Centralized decisions
=====================

Every chassis needs to be "aware" of all other chassis in the cluster.
This requirement mainly comes from overlay networks that are implemented
over a full mesh of point-to-point GENEVE tunnels (or VXLAN with some
limitations). It is not a scaling issue by itself, but it implies
a centralized decision which in turn puts pressure on the central node
at scale.
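
As a rough illustration of how a full mesh grows, here is a minimal
sketch that counts point-to-point tunnels for a given number of
chassis. It only does the arithmetic (N - 1 tunnels per chassis,
N * (N - 1) / 2 in total); the function name is hypothetical and
nothing here is taken from the OVN code base.

```python
def full_mesh_tunnels(num_chassis: int) -> tuple[int, int]:
    """Return (tunnels per chassis, total tunnels) for a full mesh.

    Each chassis needs one tunnel to every other chassis, and each
    tunnel is shared by the two chassis it connects.
    """
    if num_chassis < 0:
        raise ValueError("num_chassis must not be negative")
    per_chassis = max(num_chassis - 1, 0)
    total = num_chassis * per_chassis // 2
    return per_chassis, total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        per_chassis, total = full_mesh_tunnels(n)
        print(f"{n:>5} chassis: {per_chassis:>4} tunnels each, {total:>7} in total")
```

With 1000 chassis this gives 999 tunnels per chassis and 499500 in
total, which is why every chassis has to know about every other one.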

Due to OVSDB monitoring and caching, any change in the southbound DB
(either by northd or by any of the chassis controllers) is replicated on
every chassis. The monitor_all option is often enabled on large clusters
to avoid the conditional monitoring CPU cost on the central node.

This leads to high memory usage on all chassis, control plane traffic
and possible disruptions in the ovs-vswitchd datapath flow cache.
Unfortunately, I don't have any hard data to back this claim. This is
mainly coming from discussions I had with neutron contributors and from
brainstorming sessions with colleagues.
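
To make the replication fan-out described above more concrete, here is
a toy model, not a measurement. It assumes that with conditional
monitoring a chassis only receives changes for logical switches it
hosts ports of, and that with monitor_all it receives everything. The
class, the function names and the cluster layout (1000 chassis spread
over 10 logical switches) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chassis:
    name: str
    local_datapaths: set[str] = field(default_factory=set)
    monitor_all: bool = False

    def receives(self, datapath: str) -> bool:
        # Assumption: conditional monitoring limits updates to local datapaths.
        return self.monitor_all or datapath in self.local_datapaths


def fan_out(cluster: list[Chassis], changed_datapath: str) -> int:
    """Count the chassis that get a copy of one southbound change."""
    return sum(1 for chassis in cluster if chassis.receives(changed_datapath))


def build_cluster(num_chassis: int, monitor_all: bool) -> list[Chassis]:
    # Hypothetical layout: chassis i hosts ports of logical switch ls<i % 10>.
    return [
        Chassis(f"ch{i}", {f"ls{i % 10}"}, monitor_all)
        for i in range(num_chassis)
    ]


if __name__ == "__main__":
    print(fan_out(build_cluster(1000, monitor_all=False), "ls3"))  # 100
    print(fan_out(build_cluster(1000, monitor_all=True), "ls3"))   # 1000
```

The model ignores the CPU cost of evaluating the monitor conditions on
the central node, which is the reason monitor_all gets enabled in the
first place.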

I hope that the current work on ovn-heater to integrate OpenStack
support [2] will provide more insight.

[2] https://github.com/ovn-org/ovn-heater/pull/179

Dynamic MAC learning
====================

Logical switch ports on a given chassis are all connected to the same
OVS bridge, in the same VLAN. This prevents the use of local MAC address
learning and shifts the responsibility to a centralized ovn-northd to
create all the required logical flows to properly segment the network.

When ports with mac_address=unknown are used, centralized MAC learning
is enabled: when a new address is seen entering a port, OVS sends it to
the local controller, which updates the FDB table and recomputes flow
rules accordingly. With logical switches spanning a large number of
chassis, this centralized MAC address learning and aging can have an
impact on control plane and dataplane performance.

Closing thoughts
================

My understanding of the L3 and L4 capabilities of OVN is too limited to
discuss whether there are other issues that would prevent scaling to
thousands of nodes. My point was mainly focused on L2 network scaling.

I would love to get other opinions on these statements.

Cheers!

-- 
Robin Jarry
Red Hat, Telco/NFV
