[ovs-dev] ovn-controller refactor doc (LONG)

Mark Michelson Tue, 26 Jan 2021 13:04:47 -0800

Warning: Very long email ahead

ovn-controller has seen numerous drastic improvements over the pastseveral years. The focus has been mostly on performance and adding newfeatures. This, however, has resulted in a code base that has becomedifficult to work with. The problems it has can be summed up as follows:* It is not always obvious how certain pieces of code interact with eachother.* The architecture comes across as monolithic. That is, theovn-controller application does not have individual components or layersthat have isolated duties.* Southbound database IDL structures are used directly throughout thecode, rather than structures that may better represent the networkmakeup or that take advantage of types and other benefits of C.* There are a lot of special-cases. In other words, circumstances canresult in different code paths running.


These have led to problems, such as:
* Unit testing is difficult.

* System tests can be flaky. We might see some developers run testssuccessfully 90% of the time, while other devs run the same tests with90% failure.

* Attempting to replace a section of code with something new is difficult.

* Modifying the code can result in unexpected errors in what seems atfirst glance to be an unrelated section.* The incremental engine sometimes encounters situations that requirefull recomputes of all southbound data. Many of these situations couldbe avoided if the focus of the incremental engine were narrowed.


To fix these issues, we suggest the following ideas:

* Isolate database IDL usage to a data translation layer. That is,database-specific structures from the IDL will be translated into localstructures. Any lower layers would act upon the local structures only,never seeing the database data directly.

  * This allows lower layers to be more easily unit tested.
  * This allows changing database technologies more easily.

* This potentially allows for more natural representations of data.In other words, we are not restricted to the relations and limited typesthat the databases use.* Isolate the incremental engine to determine only on which data isrelevant to be processed by lower layers. The lower layers should thentake whatever data is relevant and perform any actions that should beperformed (e.g. claiming/releasing port bindings, writing openflowflows, etc.)

  * This reduces most special-casing of code in lower layers.

* This makes it easier to reason about what effect the incrementalengine will have on what will be processed.* This reduces the overall stretch of the incremental engine on therest of the code.* Separate unrelated units from each other. For instance, OpenFlowgeneration should be separate from packet-in processing, and thoseshould be separate from the code that claims port bindings. [1]

  * This allows for easier unit testing of these specific components.
  * This allows for potential parallelization (or pipelining) of tasks.
* Isolate the writing of OpenFlow to its own unit.

Together, these changes have some net positive effects on the code:

* There will be fewer conditions that require the incremental engine toperform full recomputes. This should lead to better performance.* Adding additional incremental processing should be easier and lesserror prone than it currently is.* Changing technologies that ovn-controller interacts with (e.g. ovsdb,OpenFlow) is now at least within the realm of possibility.* Adding new features should be less error-prone. Adding new featuresshould also be easier, since it should hopefully be more obvious how anew feature should slot into ovn-controller.* As has been noted above, several of the changes allow for more unittests to be written, allowing for better code coverage, and betterconfidence in the code.* With the isolation of code sections, it should not be as easy toaccidentally break an unrelated code section with a bug fix or new feature.* With the layering of code sections, it should be easier to profile thecode and find where the problem areas are. Adjusting their performanceshould be safe as long as the inputs and outputs from those layersremains the same.


Hypothetical layering system

Below is a possible layering of parts that ovn-controller could use.Note that this chart is not meant to be prescriptive: this is only asuggestion, and it likely is missing important details.


+-------------------------------------------------------------+
|                                                             |
|                  OVSDB (Southbound and OVS)                 |
|                                                             |
+-------------------------------------------------------------+
|                                                             |
|               Data Translation/Validation (read)            |
|                                                             |
+-------------------------------------------------------------+
|                                                             |
|                         Data Filter                         |
|                                                             |
+-------------------------------------------------------------+
|                                                             |
|                     Incremental Engine                      |
|                                                             |
+---------------+----------------+----------------------------+
|               |                |                            |
| Logical Flow  | Physical Flow  |      Other Processing      |
|               |                |                            |
+---------------+---------------------------------------------+
|                                |                            |
|          Flow Writing          |  Data Translation (write)  |
|                                |                            |
+-------------------------------------------------------------+
|                                |                            |
|           vswitchd             | OVSDB (Southbound and OVS) |
|                                |                            |
+--------------------------------+----------------------------+

The topmost and bottommost rows represent entities outside of OVN. Thetop row represents the source of all data that OVN processes. The bottomrow represents the final destinations for data that OVN has processed.The inner blocks are layers within ovn-controller. Each layer reads datafrom the layer(s) directly above it and writes data to the layer(s)directly below it.


Data Translation/Validation (read)

This layer is responsible for all reads from the OVSDBs that OVNinteracts with. It translates the structures from the OVSDB IDL intolocal ovn-controller formats, possibly performing validation ifnecessary. The output of this layer to lower layers is a snapshot of thedatabase. This layer and the Data Translation (write) layer are the onlylayers that use OVSDB IDL types and transactions.


Data Filter

This layer takes the data from the Data Translation layer and determineswhat, if any, is relevant to the local chassis. This layer isresponsible for filtering out non-local data before it reaches any lowerlayers.


Incremental Engine

This portion of the code is responsible for keeping caches of data, andtaking the snapshots of data from the layers above to determine what haschanged. The output of this layer is the set of changes: data that hasbeen added, updated, or deleted.


Logical Flow

This layer is responsible for generating the OpenFlow flows based onchanged logical flow data. This layer is where all logical flowexpression parsing is performed. It takes the changes from theincremental engine and outputs a set of OpenFlow changes.


Physical Flow

This layer is responsible for generating the OpenFlow flows based onphysical data, such as input/output to VIF ports, logical patch ports,etc. Unlike the Logical Flow layer, this creates OpenFlow flows directlywithout the need for any expression parsing, since the flows originatefrom the network structure rather than logical rules. Its output is thesame as the Logical Flow layer: the set of OpenFlow changes that shouldbe written.


Flow Writing

This layer writes the flows generated from the Logical Flow and PhysicalFlow layers. This layer is responsible for writing flow_mod, group_mod,meter_mod, etc. messages from OVN to OVS. it also maintains the statemachine between OVN and OVS, leaving the upper layers from having tobother with it.


Other processing

What a horrible name! This essentially describes any entity in OVN thatneeds to consume the changes from the incremental engine and writechanges to a database as a result.


General notes:

* Though the layers describe the overall flow of data and subdivide theduties, it does not necessarily represent how code might end up beinggrouped. For instance, we might write a code unit that takes on theduties of multiple layers by specializing in a particular data type. Asan example, consider a port binding unit that does the following:* Translates southbound port binding related data into local datastructures (Data Translation/Validation (read) layer)* Determines which port bindings are considered local to thehypervisor (data filter layer)* Maintains a cache of current port binding data to determine whathas changed (incremental engine layer)* Acts upon the changes to port binding data (e.g. "claims" localport bindings) (other processing layer)* Writes updates to the southbound database (e.g. updates the chassisfield on a port binding that it has claimed) (data translation (write)layer)As long as the layering is adhered to, it should not necessarily matterhow the code is arranged.* If the layering model is adhered to, it should be possible to replacea layer transparently, so long as the new code provides the same outputsbased on the inputs it receives.* If we decided to change database technologies, the two datatranslation layers (read and write) would need to be replaced, butnothing else would need to be touched.* What previously was called a recompute in the incremental engine isnow essentially replaced with a cache clear. By clearing the cache, alldata from the upper layers will be seen as new. Based on what wasdiscussed in a higher bullet point, if the data caches are farmed out toseparate code units, you can have localized cache clears that do notaffect other units. Thus one unit may need a "recompute" while othersmay be able to continue without a need to clear their caches.* pinctrl is not represented in this diagram. pinctrl sources its datadifferently from the rest of ovn-controller. pinctrl does not use theincremental engine. The pinctrl thread does not use OVSDB structuresdirectly. In other words, pinctrl is not in need of the same type ofrefactoring compared with the rest of ovn-controller [2]* The top three layers (Data translation (read), data locality, andincremental engine) can be grouped together as a larger data acquisitionlayer. The layers below that can be grouped together as a datatransformation layer. This was difficult to represent in an ASCIIdiagram. The reason for making this distinction is that it provides aclean separation point that could be considered when rethinkingthreading/program control of ovn-controller.


Possible Transition Plan

If all of the above were submitted as a single patch series it would benearly impossible to review, and it likely would introduce unforeseenbugs. So let's think about how best to go from current ovn-controller toa more structured ovn-controller.

Phase 1: Add scaffolding for the layers alongside the existingovn-controller code.1. Add stub function calls to ovn-controller's main loop that eachcorrespond to one of the data processing layers.2. Move the current ovn-controller loop logic into one of these stubs.If using the layering system suggested earlier in this document, the"other processing" layer would be a good place to put it.

This reformulates ovn-controller to call into the individual layers.From here, we can fill out the stubs with the appropriate logic, and wecan remove "old" code as necessary.


Phase 2: Isolate IDL data type usage
1. Determine internal data types to be used.

2. Write data translation layers so that IDL types can be translatedinto internal types and vice-versa. Call into these functions from theappropriate stub functions created in phase 1.3. Add unit tests that ensure data translation works as expected betweenthe types.

4. Replace the use of IDL data types in the code with the internal types.

At this point, you will have the data translation layers written. Usinginternal data types will make it easier to create unit tests duringlater phases.


Phase 3: Incremental engine redefinition
1. Determine the data types for which incremental processing is desired.
2. Pick one of those data types and create the following:
   a. A data cache for the type.

b. A method that takes the current data as input and compares it tothe cache, outputting the new, deleted, and updated items.

3. Write unit tests
4. Write the layers below the incremental engine to use this "diff" data.

5. Ensure the stub from phase 1 that calls out to the incremental engineis filled in for this data type.6. Delete any old incremental engine code that might conflict with thisnew data handling.7. Repeat steps 2 - 6 for each data type that the incremental enginewill handle.

Phase 3 is by far the hardest phase, since it likely will not be a veryclean process. By dividing the work by data type, it at least makes itslightly easier to try to break the work up into more manageable chunks.

Phase 4: Add processing between data translation and the incrementalengine. If using the proposed layering scheme above, this would be thedata filter layer.1. Determine the data types that need to be filtered for the localenvironment.

2. Pick one of those data types and create the following:

a. A function that will take in the current data set and will outputa subset that includes only data that is pertinent to the local hypervisor.3. Call the functions from step 2 in the appropriate stub fromovn-controller.

4. Write unit tests.
5. Repeat steps 2 - 4 for each data type from step 1.

Phase 4 is a bit simpler, assuming that data filtering is the onlyprocessing that occurs between the data translation layer andincremental engine.

At this point we essentially have the new code written. There may besome additional work that is needed below the incremental engine layer,but a lot of the incremental engine rework in phase 3 is going to touchthose lower layers as well. Whatever is left of the originalovn-controller logic should be examined and determined if it should bemoved into a different layer, removed altogether, or left where it is.For the code that gets left where it is, we can examine what is left anddetermine if it can or should be cleaned up/refactored.

[1] To some extent this is already done, since code is separated intolflow.c, ofctrl.c, binding.c, physical.c, etc. However, those all havecross-dependencies taht can result in unexpected interactions.[2] This is not to imply that pinctrl is perfect. It is potentially inneed of its own refactoring. It also doesn't mean pinctrl would not betouched by the ovn-controller refactor, since it makes use of structureslike local datapaths, which may be altered by the refactor.


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] ovn-controller refactor doc (LONG)

Reply via email to