Warning: Very long email ahead
ovn-controller has seen numerous drastic improvements over the past
several years. The focus has been mostly on performance and adding new
features. This, however, has resulted in a code base that has become
difficult to work with. The problems it has can be summed up as follows:
* It is not always obvious how certain pieces of code interact with each
other.
* The architecture comes across as monolithic. That is, the
ovn-controller application does not have individual components or layers
that have isolated duties.
* Southbound database IDL structures are used directly throughout the
code, rather than structures that may better represent the network
makeup or that take advantage of types and other benefits of C.
* There are a lot of special-cases. In other words, circumstances can
result in different code paths running.
These have led to problems, such as:
* Unit testing is difficult.
* System tests can be flaky. We might see some developers run tests
successfully 90% of the time, while other devs run the same tests with
90% failure.
* Attempting to replace a section of code with something new is difficult.
* Modifying the code can result in unexpected errors in what seems at
first glance to be an unrelated section.
* The incremental engine sometimes encounters situations that require
full recomputes of all southbound data. Many of these situations could
be avoided if the focus of the incremental engine were narrowed.
To fix these issues, we suggest the following ideas:
* Isolate database IDL usage to a data translation layer. That is,
database-specific structures from the IDL will be translated into local
structures. Any lower layers would act upon the local structures only,
never seeing the database data directly.
* This allows lower layers to be more easily unit tested.
* This allows changing database technologies more easily.
* This potentially allows for more natural representations of data.
In other words, we are not restricted to the relations and limited types
that the databases use.
* Isolate the incremental engine to determine only on which data is
relevant to be processed by lower layers. The lower layers should then
take whatever data is relevant and perform any actions that should be
performed (e.g. claiming/releasing port bindings, writing openflow
flows, etc.)
* This reduces most special-casing of code in lower layers.
* This makes it easier to reason about what effect the incremental
engine will have on what will be processed.
* This reduces the overall stretch of the incremental engine on the
rest of the code.
* Separate unrelated units from each other. For instance, OpenFlow
generation should be separate from packet-in processing, and those
should be separate from the code that claims port bindings. [1]
* This allows for easier unit testing of these specific components.
* This allows for potential parallelization (or pipelining) of tasks.
* Isolate the writing of OpenFlow to its own unit.
Together, these changes have some net positive effects on the code:
* There will be fewer conditions that require the incremental engine to
perform full recomputes. This should lead to better performance.
* Adding additional incremental processing should be easier and less
error prone than it currently is.
* Changing technologies that ovn-controller interacts with (e.g. ovsdb,
OpenFlow) is now at least within the realm of possibility.
* Adding new features should be less error-prone. Adding new features
should also be easier, since it should hopefully be more obvious how a
new feature should slot into ovn-controller.
* As has been noted above, several of the changes allow for more unit
tests to be written, allowing for better code coverage, and better
confidence in the code.
* With the isolation of code sections, it should not be as easy to
accidentally break an unrelated code section with a bug fix or new feature.
* With the layering of code sections, it should be easier to profile the
code and find where the problem areas are. Adjusting their performance
should be safe as long as the inputs and outputs from those layers
remains the same.
Hypothetical layering system
Below is a possible layering of parts that ovn-controller could use.
Note that this chart is not meant to be prescriptive: this is only a
suggestion, and it likely is missing important details.
+-------------------------------------------------------------+
| |
| OVSDB (Southbound and OVS) |
| |
+-------------------------------------------------------------+
| |
| Data Translation/Validation (read) |
| |
+-------------------------------------------------------------+
| |
| Data Filter |
| |
+-------------------------------------------------------------+
| |
| Incremental Engine |
| |
+---------------+----------------+----------------------------+
| | | |
| Logical Flow | Physical Flow | Other Processing |
| | | |
+---------------+---------------------------------------------+
| | |
| Flow Writing | Data Translation (write) |
| | |
+-------------------------------------------------------------+
| | |
| vswitchd | OVSDB (Southbound and OVS) |
| | |
+--------------------------------+----------------------------+
The topmost and bottommost rows represent entities outside of OVN. The
top row represents the source of all data that OVN processes. The bottom
row represents the final destinations for data that OVN has processed.
The inner blocks are layers within ovn-controller. Each layer reads data
from the layer(s) directly above it and writes data to the layer(s)
directly below it.
Data Translation/Validation (read)
This layer is responsible for all reads from the OVSDBs that OVN
interacts with. It translates the structures from the OVSDB IDL into
local ovn-controller formats, possibly performing validation if
necessary. The output of this layer to lower layers is a snapshot of the
database. This layer and the Data Translation (write) layer are the only
layers that use OVSDB IDL types and transactions.
Data Filter
This layer takes the data from the Data Translation layer and determines
what, if any, is relevant to the local chassis. This layer is
responsible for filtering out non-local data before it reaches any lower
layers.
Incremental Engine
This portion of the code is responsible for keeping caches of data, and
taking the snapshots of data from the layers above to determine what has
changed. The output of this layer is the set of changes: data that has
been added, updated, or deleted.
Logical Flow
This layer is responsible for generating the OpenFlow flows based on
changed logical flow data. This layer is where all logical flow
expression parsing is performed. It takes the changes from the
incremental engine and outputs a set of OpenFlow changes.
Physical Flow
This layer is responsible for generating the OpenFlow flows based on
physical data, such as input/output to VIF ports, logical patch ports,
etc. Unlike the Logical Flow layer, this creates OpenFlow flows directly
without the need for any expression parsing, since the flows originate
from the network structure rather than logical rules. Its output is the
same as the Logical Flow layer: the set of OpenFlow changes that should
be written.
Flow Writing
This layer writes the flows generated from the Logical Flow and Physical
Flow layers. This layer is responsible for writing flow_mod, group_mod,
meter_mod, etc. messages from OVN to OVS. it also maintains the state
machine between OVN and OVS, leaving the upper layers from having to
bother with it.
Other processing
What a horrible name! This essentially describes any entity in OVN that
needs to consume the changes from the incremental engine and write
changes to a database as a result.
General notes:
* Though the layers describe the overall flow of data and subdivide the
duties, it does not necessarily represent how code might end up being
grouped. For instance, we might write a code unit that takes on the
duties of multiple layers by specializing in a particular data type. As
an example, consider a port binding unit that does the following:
* Translates southbound port binding related data into local data
structures (Data Translation/Validation (read) layer)
* Determines which port bindings are considered local to the
hypervisor (data filter layer)
* Maintains a cache of current port binding data to determine what
has changed (incremental engine layer)
* Acts upon the changes to port binding data (e.g. "claims" local
port bindings) (other processing layer)
* Writes updates to the southbound database (e.g. updates the chassis
field on a port binding that it has claimed) (data translation (write)
layer)
As long as the layering is adhered to, it should not necessarily matter
how the code is arranged.
* If the layering model is adhered to, it should be possible to replace
a layer transparently, so long as the new code provides the same outputs
based on the inputs it receives.
* If we decided to change database technologies, the two data
translation layers (read and write) would need to be replaced, but
nothing else would need to be touched.
* What previously was called a recompute in the incremental engine is
now essentially replaced with a cache clear. By clearing the cache, all
data from the upper layers will be seen as new. Based on what was
discussed in a higher bullet point, if the data caches are farmed out to
separate code units, you can have localized cache clears that do not
affect other units. Thus one unit may need a "recompute" while others
may be able to continue without a need to clear their caches.
* pinctrl is not represented in this diagram. pinctrl sources its data
differently from the rest of ovn-controller. pinctrl does not use the
incremental engine. The pinctrl thread does not use OVSDB structures
directly. In other words, pinctrl is not in need of the same type of
refactoring compared with the rest of ovn-controller [2]
* The top three layers (Data translation (read), data locality, and
incremental engine) can be grouped together as a larger data acquisition
layer. The layers below that can be grouped together as a data
transformation layer. This was difficult to represent in an ASCII
diagram. The reason for making this distinction is that it provides a
clean separation point that could be considered when rethinking
threading/program control of ovn-controller.
Possible Transition Plan
If all of the above were submitted as a single patch series it would be
nearly impossible to review, and it likely would introduce unforeseen
bugs. So let's think about how best to go from current ovn-controller to
a more structured ovn-controller.
Phase 1: Add scaffolding for the layers alongside the existing
ovn-controller code.
1. Add stub function calls to ovn-controller's main loop that each
correspond to one of the data processing layers.
2. Move the current ovn-controller loop logic into one of these stubs.
If using the layering system suggested earlier in this document, the
"other processing" layer would be a good place to put it.
This reformulates ovn-controller to call into the individual layers.
From here, we can fill out the stubs with the appropriate logic, and we
can remove "old" code as necessary.
Phase 2: Isolate IDL data type usage
1. Determine internal data types to be used.
2. Write data translation layers so that IDL types can be translated
into internal types and vice-versa. Call into these functions from the
appropriate stub functions created in phase 1.
3. Add unit tests that ensure data translation works as expected between
the types.
4. Replace the use of IDL data types in the code with the internal types.
At this point, you will have the data translation layers written. Using
internal data types will make it easier to create unit tests during
later phases.
Phase 3: Incremental engine redefinition
1. Determine the data types for which incremental processing is desired.
2. Pick one of those data types and create the following:
a. A data cache for the type.
b. A method that takes the current data as input and compares it to
the cache, outputting the new, deleted, and updated items.
3. Write unit tests
4. Write the layers below the incremental engine to use this "diff" data.
5. Ensure the stub from phase 1 that calls out to the incremental engine
is filled in for this data type.
6. Delete any old incremental engine code that might conflict with this
new data handling.
7. Repeat steps 2 - 6 for each data type that the incremental engine
will handle.
Phase 3 is by far the hardest phase, since it likely will not be a very
clean process. By dividing the work by data type, it at least makes it
slightly easier to try to break the work up into more manageable chunks.
Phase 4: Add processing between data translation and the incremental
engine. If using the proposed layering scheme above, this would be the
data filter layer.
1. Determine the data types that need to be filtered for the local
environment.
2. Pick one of those data types and create the following:
a. A function that will take in the current data set and will output
a subset that includes only data that is pertinent to the local hypervisor.
3. Call the functions from step 2 in the appropriate stub from
ovn-controller.
4. Write unit tests.
5. Repeat steps 2 - 4 for each data type from step 1.
Phase 4 is a bit simpler, assuming that data filtering is the only
processing that occurs between the data translation layer and
incremental engine.
At this point we essentially have the new code written. There may be
some additional work that is needed below the incremental engine layer,
but a lot of the incremental engine rework in phase 3 is going to touch
those lower layers as well. Whatever is left of the original
ovn-controller logic should be examined and determined if it should be
moved into a different layer, removed altogether, or left where it is.
For the code that gets left where it is, we can examine what is left and
determine if it can or should be cleaned up/refactored.
[1] To some extent this is already done, since code is separated into
lflow.c, ofctrl.c, binding.c, physical.c, etc. However, those all have
cross-dependencies taht can result in unexpected interactions.
[2] This is not to imply that pinctrl is perfect. It is potentially in
need of its own refactoring. It also doesn't mean pinctrl would not be
touched by the ovn-controller refactor, since it makes use of structures
like local datapaths, which may be altered by the refactor.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev