On 3/25/26 8:51 AM, Ales Musil via dev wrote: > On Tue, Mar 24, 2026 at 9:04 PM Ilya Maximets <[email protected]> wrote: > >> On 3/24/26 6:32 PM, Mark Michelson wrote: >>> Hi Ilya, >>> >>> First off, what a great change! I love seeing this sort of improvement! >>> >>> I had a look through the patches, and I couldn't find any technical >>> faults with them. From that point of view: >>> >>> Acked-by: Mark Michelson <[email protected]> >> >> Thanks! >> >>> >>> When it comes to backporting the change to 26.03, I'm a bit hesitant. >>> My instinct is to say that we should absolutely backport it since it >>> is a tremendous performance improvement. I also share your view that >>> people that follow the LTS would miss out on this for two years. We >>> only just released 26.03.0 less than a week ago. This may also help >>> with deployments that see unreasonably large polling intervals in >>> ovn-controller, since this presumably would be a way to trim that >>> down. >>> >>> However, when I think of times we have made exceptions in the past for >>> performance improvement or new feature backports, this patch series is >>> missing some of the typical elements we would normally see. >>> 1. I mentioned that this could be helpful with unreasonably long >>> polling intervals. However, I don't know that we actually have data to >>> show that handling of conjunctive flows is a common cause of this >>> problem amongst large deployments. Typically, if we are going to >>> backport a performance patch, we need to have some evidence that >>> without the performance improvement, reasonably-sized deployments have >>> difficulty functioning. I don't know that we have that for this patch >>> series. >> >> I'm aware of at least a few real world deployments that would have much >> easier time with these changes. But I also agree that those setups are >> a little unreasonable in the way the ACLs are configured. At the same >> time, it is often not particularly easy to persuade users to do things >> differently. :) >> >>> 2. It affects the critical path for ovn-controller's functionality, >>> rather than affecting something supplemental or optional. >>> 3. There is no way to opt in or out of this change. >>> 4. The change adds no new tests. In the context of adding this to >>> "main" this is fine, since we rely on existing tests to cover the >>> changes. We have a long testing period where we can plug gaps in >>> coverage in case this change ends up causing an issue. With a >>> backport, we don't have the same luxury. >> >> This set does technically enable extra testing, since the masking of >> conjunction ids is removed for the I-P vs recompute comparison. >> It's not a dedicated test for sure, but we have pre-existing ones for >> merging the conjunctions. Just wanted to point out the technicality. :) >> >>> >>> It may seem strange, but maybe adding opt-in semantics to the backport >>> would help here. This way, if it turns out there is some sort of error >>> in the code that we did not see during review, it will only affect >>> those who opt into the performance enhancement, and they can easily >>> disable it without the need for any emergency code fixes. For main >>> (i.e. 26.09 and beyond) we would ignore whatever option is added for >>> 26.03 and always use the optimized code. >> >> It doesn't sound right to duplicate the code just to be able to have >> the old slow way of doing things, so I'd vote for not backporting in >> case backporting means an extra knob. >> >>> >>> What do you think? Does anyone else want to weigh in on this? >> >> As I said originally, it's a tricky one. And it's an enhancement for >> the most part, so the default answer is "no" for backports. So, I'm >> leaving this entirely for OVN maintainers to decide if it's worth the >> risk. >> >> The last patch in the set is the largest one and the most complicated. >> So, one more option to consider might be to backport only the first 3. >> This would bring the majority of performance of this set to 26.03. >> >> Best regards, Ilya Maximets. >> >>> >>> >>> >>> >>> >>> On Fri, Mar 13, 2026 at 12:38 PM Ilya Maximets <[email protected]> >> wrote: >>>> >>>> When a user creates a lot of ACLs referencing large overlapping address >>>> sets and port groups, ovn-controller takes a lot of time to process >>>> them. Recompute times observed in real world setups go up to a minute >>>> and beyond. >>>> >>>> A lot of time is spent in the following places while merging a new >>>> desired flow with an existing one: >>>> >>>> 1. ovn-controller iterates over all the referenced by the existing >>>> desired flow Sb flows and untracks address sets from them. This >>>> has a linear complexity per merge, and touches a lot of different >>>> dynamically allocated chunks of memory. Takes a lot of time as a >>>> number of SB flows per desired flow goes up to hundreds and beyond, >>>> while in practice after the first merge there are no longer any >>>> address sets referenced in most cases. >>>> >>>> 2. During the same iteration, ovn-controller checks if the same Sb >>>> flow is already referenced. This is fine while we're doing a >>>> linear walk over the list anyway, but it can be done with a hash >>>> map or in some other way if we get rid of the linear lookup. >>>> >>>> 3. ovn-controller constructs a hash map from all the existing >>>> conjunctive actions to later check for duplicates. This is >>>> inefficient when there is only one conjunction to add, which is how >>>> it is always in the current implementation. >>>> >>>> 4. Recompute frequently re-installs all the conjunctive flows in OVS, >>>> because the order of conjunctions in the actions depends on the >>>> order in which incremental processing is happening, and recompute >>>> has it's own order in which it processes Sb flows and merges them >>>> together into desired flows. >>>> >>>> This set is trying to address all the issues above by converting >>>> 'references' into a hash map, counting the number of references with >>>> address sets and performing the duplicate conjunction checks on the >>>> original ofpacts instead of building ad-hoc hash maps, while keeping >>>> the action sorted solving the problem with recompute re-creating all >>>> of them. >>>> >>>> In a setup with 3.5K ACLs referencing the same-ish 450 IP addresses and >>>> a port group with ~20 ports, this set reduces the average full recompute >>>> time from 48 to 18 seconds, which is about 2.7x speed up. >>>> >>>> >>>> As per usual, this is a performance fix, so it's on the edge between >>>> a bug fix and an enhancement. It doesn't make sense to backport below >>>> 26.03, as older branches do not have the previous commit: >>>> 9a4035402982 ("ofctrl: Optimize and specialize the >> ofctrl_add_or_append_flow function.") >>>> But if we could consider a backport to 26.03, I think, it might be a >>>> good thing, as people will get much better performance in ACL-heavy >>>> setups in the upcoming LTS. Otherwise, they may need to wait for >>>> another 2 years, if they are following the LTS releases. >>>> >>>> Patch #4 is not a performance fix, so should be backported regardless. >>>> >>>> >>>> Ilya Maximets (5): >>>> ofctrl: Track Sb flow references in a hash map. >>>> ofctrl: Use hash map lookup for duplicate references check. >>>> ofctrl: Count the number of referenced flows with address sets. >>>> ofctrl: Fix removing wrong conjunction during the duplicate check. >>>> ofctrl: Look for duplicated conjunctions directly in the ofpacts. >>>> >>>> controller/ofctrl.c | 245 +++++++++++++++++++++++++++----------------- >>>> tests/ovn-macros.at | 1 - >>>> 2 files changed, 152 insertions(+), 94 deletions(-) >>>> >>>> -- >>>> 2.53.0 >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> [email protected] >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>> >> >> _______________________________________________ >> dev mailing list >> [email protected] >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > Thank you Ilya and Mark, >
Hi all, > first things first, I applied this to main while we continue the > backport discussion. To be brief, it's true that we usually don't > backport performance improvements. However, two main points lead me > to lean toward backporting to 26.03: > > 1) 26.03 was just released with 26.03.0 so there is still a space to > fix anything that this might broke (not that I believe it would). > > 2) 26.03 is LTS which means many CMSs might stay with this version > and the benefit of having this is undeniable. > > Also, since this is being merged to main we can take the extra step > to perform d/s testing to ensure it's safe to backport. Thoughts? > I definitely vote for backporting this to 26.03 as is. I had a closer look at the applied patches and they seem safe. We also most likely have no users on 26.03 yet. Regards, Dumitru > Regards, > Ales > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
