On 3/25/26 8:51 AM, Ales Musil via dev wrote:
> On Tue, Mar 24, 2026 at 9:04 PM Ilya Maximets <[email protected]> wrote:
> 
>> On 3/24/26 6:32 PM, Mark Michelson wrote:
>>> Hi Ilya,
>>>
>>> First off, what a great change! I love seeing this sort of improvement!
>>>
>>> I had a look through the patches, and I couldn't find any technical
>>> faults with them. From that point of view:
>>>
>>> Acked-by: Mark Michelson <[email protected]>
>>
>> Thanks!
>>
>>>
>>> When it comes to backporting the change to 26.03, I'm a bit hesitant.
>>> My instinct is to say that we should absolutely backport it since it
>>> is a tremendous performance improvement. I also share your view that
>>> people that follow the LTS would miss out on this for two years. We
>>> only just released 26.03.0 less than a week ago. This may also help
>>> with deployments that see unreasonably large polling intervals in
>>> ovn-controller, since this presumably would be a way to trim that
>>> down.
>>>
>>> However, when I think of times we have made exceptions in the past for
>>> performance improvement or new feature backports, this patch series is
>>> missing some of the typical elements we would normally see.
>>> 1. I mentioned that this could be helpful with unreasonably long
>>> polling intervals. However, I don't know that we actually have data to
>>> show that handling of conjunctive flows is a common cause of this
>>> problem amongst large deployments. Typically, if we are going to
>>> backport a performance patch, we need to have some evidence that
>>> without the performance improvement, reasonably-sized deployments have
>>> difficulty functioning. I don't know that we have that for this patch
>>> series.
>>
>> I'm aware of at least a few real world deployments that would have much
>> easier time with these changes.  But I also agree that those setups are
>> a little unreasonable in the way the ACLs are configured.  At the same
>> time, it is often not particularly easy to persuade users to do things
>> differently. :)
>>
>>> 2. It affects the critical path for ovn-controller's functionality,
>>> rather than affecting something supplemental or optional.
>>> 3. There is no way to opt in or out of this change.
>>> 4. The change adds no new tests. In the context of adding this to
>>> "main" this is fine, since we rely on existing tests to cover the
>>> changes. We have a long testing period where we can plug gaps in
>>> coverage in case this change ends up causing an issue. With a
>>> backport, we don't have the same luxury.
>>
>> This set does technically enable extra testing, since the masking of
>> conjunction ids is removed for the I-P vs recompute comparison.
>> It's not a dedicated test for sure, but we have pre-existing ones for
>> merging the conjunctions.  Just wanted to point out the technicality. :)
>>
>>>
>>> It may seem strange, but maybe adding opt-in semantics to the backport
>>> would help here. This way, if it turns out there is some sort of error
>>> in the code that we did not see during review, it will only affect
>>> those who opt into the performance enhancement, and they can easily
>>> disable it without the need for any emergency code fixes. For main
>>> (i.e. 26.09 and beyond) we would ignore whatever option is added for
>>> 26.03 and always use the optimized code.
>>
>> It doesn't sound right to duplicate the code just to be able to have
>> the old slow way of doing things, so I'd vote for not backporting in
>> case backporting means an extra knob.
>>
>>>
>>> What do you think? Does anyone else want to weigh in on this?
>>
>> As I said originally, it's a tricky one.  And it's an enhancement for
>> the most part, so the default answer is "no" for backports.  So, I'm
>> leaving this entirely for OVN maintainers to decide if it's worth the
>> risk.
>>
>> The last patch in the set is the largest one and the most complicated.
>> So, one more option to consider might be to backport only the first 3.
>> This would bring the majority of performance of this set to 26.03.
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Mar 13, 2026 at 12:38 PM Ilya Maximets <[email protected]>
>> wrote:
>>>>
>>>> When a user creates a lot of ACLs referencing large overlapping address
>>>> sets and port groups, ovn-controller takes a lot of time to process
>>>> them.  Recompute times observed in real world setups go up to a minute
>>>> and beyond.
>>>>
>>>> A lot of time is spent in the following places while merging a new
>>>> desired flow with an existing one:
>>>>
>>>> 1. ovn-controller iterates over all the referenced by the existing
>>>>    desired flow Sb flows and untracks address sets from them.  This
>>>>    has a linear complexity per merge, and touches a lot of different
>>>>    dynamically allocated chunks of memory.  Takes a lot of time as a
>>>>    number of SB flows per desired flow goes up to hundreds and beyond,
>>>>    while in practice after the first merge there are no longer any
>>>>    address sets referenced in most cases.
>>>>
>>>> 2. During the same iteration, ovn-controller checks if the same Sb
>>>>    flow is already referenced.  This is fine while we're doing a
>>>>    linear walk over the list anyway, but it can be done with a hash
>>>>    map or in some other way if we get rid of the linear lookup.
>>>>
>>>> 3. ovn-controller constructs a hash map from all the existing
>>>>    conjunctive actions to later check for duplicates.  This is
>>>>    inefficient when there is only one conjunction to add, which is how
>>>>    it is always in the current implementation.
>>>>
>>>> 4. Recompute frequently re-installs all the conjunctive flows in OVS,
>>>>    because the order of conjunctions in the actions depends on the
>>>>    order in which incremental processing is happening, and recompute
>>>>    has it's own order in which it processes Sb flows and merges them
>>>>    together into desired flows.
>>>>
>>>> This set is trying to address all the issues above by converting
>>>> 'references' into a hash map, counting the number of references with
>>>> address sets and performing the duplicate conjunction checks on the
>>>> original ofpacts instead of building ad-hoc hash maps, while keeping
>>>> the action sorted solving the problem with recompute re-creating all
>>>> of them.
>>>>
>>>> In a setup with 3.5K ACLs referencing the same-ish 450 IP addresses and
>>>> a port group with ~20 ports, this set reduces the average full recompute
>>>> time from 48 to 18 seconds, which is about 2.7x speed up.
>>>>
>>>>
>>>> As per usual, this is a performance fix, so it's on the edge between
>>>> a bug fix and an enhancement.  It doesn't make sense to backport below
>>>> 26.03, as older branches do not have the previous commit:
>>>>   9a4035402982 ("ofctrl: Optimize and specialize the
>> ofctrl_add_or_append_flow function.")
>>>> But if we could consider a backport to 26.03, I think, it might be a
>>>> good thing, as people will get much better performance in ACL-heavy
>>>> setups in the upcoming LTS.  Otherwise, they may need to wait for
>>>> another 2 years, if they are following the LTS releases.
>>>>
>>>> Patch #4 is not a performance fix, so should be backported regardless.
>>>>
>>>>
>>>> Ilya Maximets (5):
>>>>   ofctrl: Track Sb flow references in a hash map.
>>>>   ofctrl: Use hash map lookup for duplicate references check.
>>>>   ofctrl: Count the number of referenced flows with address sets.
>>>>   ofctrl: Fix removing wrong conjunction during the duplicate check.
>>>>   ofctrl: Look for duplicated conjunctions directly in the ofpacts.
>>>>
>>>>  controller/ofctrl.c | 245 +++++++++++++++++++++++++++-----------------
>>>>  tests/ovn-macros.at |   1 -
>>>>  2 files changed, 152 insertions(+), 94 deletions(-)
>>>>
>>>> --
>>>> 2.53.0
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> [email protected]
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>
>>>
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> 
> Thank you Ilya and Mark,
> 

Hi all,

> first things first, I applied this to main while we continue the
> backport discussion. To be brief, it's true that we usually don't
> backport performance improvements. However, two main points lead me
> to lean toward backporting to 26.03:
> 
> 1) 26.03 was just released with 26.03.0 so there is still a space to
> fix anything that this might broke (not that I believe it would).
> 
> 2) 26.03 is LTS which means many CMSs might stay with this version
> and the benefit of having this is undeniable.
> 
> Also, since this is being merged to main we can take the extra step
> to perform d/s testing to ensure it's safe to backport. Thoughts?
> 

I definitely vote for backporting this to 26.03 as is.  I had a closer
look at the applied patches and they seem safe.  We also most likely
have no users on 26.03 yet.

Regards,
Dumitru

> Regards,
> Ales
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to