Hi Ilya, First off, what a great change! I love seeing this sort of improvement!
I had a look through the patches, and I couldn't find any technical faults with them. From that point of view: Acked-by: Mark Michelson <[email protected]> When it comes to backporting the change to 26.03, I'm a bit hesitant. My instinct is to say that we should absolutely backport it since it is a tremendous performance improvement. I also share your view that people that follow the LTS would miss out on this for two years. We only just released 26.03.0 less than a week ago. This may also help with deployments that see unreasonably large polling intervals in ovn-controller, since this presumably would be a way to trim that down. However, when I think of times we have made exceptions in the past for performance improvement or new feature backports, this patch series is missing some of the typical elements we would normally see. 1. I mentioned that this could be helpful with unreasonably long polling intervals. However, I don't know that we actually have data to show that handling of conjunctive flows is a common cause of this problem amongst large deployments. Typically, if we are going to backport a performance patch, we need to have some evidence that without the performance improvement, reasonably-sized deployments have difficulty functioning. I don't know that we have that for this patch series. 2. It affects the critical path for ovn-controller's functionality, rather than affecting something supplemental or optional. 3. There is no way to opt in or out of this change. 4. The change adds no new tests. In the context of adding this to "main" this is fine, since we rely on existing tests to cover the changes. We have a long testing period where we can plug gaps in coverage in case this change ends up causing an issue. With a backport, we don't have the same luxury. It may seem strange, but maybe adding opt-in semantics to the backport would help here. This way, if it turns out there is some sort of error in the code that we did not see during review, it will only affect those who opt into the performance enhancement, and they can easily disable it without the need for any emergency code fixes. For main (i.e. 26.09 and beyond) we would ignore whatever option is added for 26.03 and always use the optimized code. What do you think? Does anyone else want to weigh in on this? On Fri, Mar 13, 2026 at 12:38 PM Ilya Maximets <[email protected]> wrote: > > When a user creates a lot of ACLs referencing large overlapping address > sets and port groups, ovn-controller takes a lot of time to process > them. Recompute times observed in real world setups go up to a minute > and beyond. > > A lot of time is spent in the following places while merging a new > desired flow with an existing one: > > 1. ovn-controller iterates over all the referenced by the existing > desired flow Sb flows and untracks address sets from them. This > has a linear complexity per merge, and touches a lot of different > dynamically allocated chunks of memory. Takes a lot of time as a > number of SB flows per desired flow goes up to hundreds and beyond, > while in practice after the first merge there are no longer any > address sets referenced in most cases. > > 2. During the same iteration, ovn-controller checks if the same Sb > flow is already referenced. This is fine while we're doing a > linear walk over the list anyway, but it can be done with a hash > map or in some other way if we get rid of the linear lookup. > > 3. ovn-controller constructs a hash map from all the existing > conjunctive actions to later check for duplicates. This is > inefficient when there is only one conjunction to add, which is how > it is always in the current implementation. > > 4. Recompute frequently re-installs all the conjunctive flows in OVS, > because the order of conjunctions in the actions depends on the > order in which incremental processing is happening, and recompute > has it's own order in which it processes Sb flows and merges them > together into desired flows. > > This set is trying to address all the issues above by converting > 'references' into a hash map, counting the number of references with > address sets and performing the duplicate conjunction checks on the > original ofpacts instead of building ad-hoc hash maps, while keeping > the action sorted solving the problem with recompute re-creating all > of them. > > In a setup with 3.5K ACLs referencing the same-ish 450 IP addresses and > a port group with ~20 ports, this set reduces the average full recompute > time from 48 to 18 seconds, which is about 2.7x speed up. > > > As per usual, this is a performance fix, so it's on the edge between > a bug fix and an enhancement. It doesn't make sense to backport below > 26.03, as older branches do not have the previous commit: > 9a4035402982 ("ofctrl: Optimize and specialize the > ofctrl_add_or_append_flow function.") > But if we could consider a backport to 26.03, I think, it might be a > good thing, as people will get much better performance in ACL-heavy > setups in the upcoming LTS. Otherwise, they may need to wait for > another 2 years, if they are following the LTS releases. > > Patch #4 is not a performance fix, so should be backported regardless. > > > Ilya Maximets (5): > ofctrl: Track Sb flow references in a hash map. > ofctrl: Use hash map lookup for duplicate references check. > ofctrl: Count the number of referenced flows with address sets. > ofctrl: Fix removing wrong conjunction during the duplicate check. > ofctrl: Look for duplicated conjunctions directly in the ofpacts. > > controller/ofctrl.c | 245 +++++++++++++++++++++++++++----------------- > tests/ovn-macros.at | 1 - > 2 files changed, 152 insertions(+), 94 deletions(-) > > -- > 2.53.0 > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
