On 5/20/26 12:28 PM, Dumitru Ceara wrote:
> Hi Ales,
> 
> +Lorenzo & Numan in CC
> 
> On 5/19/26 11:21 PM, Dumitru Ceara wrote:
>> On 5/14/26 10:13 AM, Ales Musil wrote:
>>> On Wed, May 13, 2026 at 11:15 AM Dumitru Ceara via dev <
>>> [email protected]> wrote:
>>>
>>>> The pflow_output SB_port_binding handler triggers a full
>>>> recompute when the type column is updated on a port binding.
>>>> However, for newly created port bindings, the OVSDB IDL
>>>> marks all non-default columns as "updated", even though no
>>>> actual update occurred.  This caused every new port binding
>>>> with a non-default type (e.g., remote, patch, localnet,
>>>> router) to unnecessarily trigger a full pflow_output
>>>> recompute, severely impacting ovn-controller performance
>>>> at scale.
>>>>
>>>> This is particularly problematic in deployments that use
>>>> remote LSPs, such as ovn-kubernetes with L2 UDNs, where
>>>> frequent creation of remote port bindings leads to
>>>> continuous full recomputes and high CPU usage.
>>>>
>>>> Guard the type-update check with sbrec_port_binding_is_new()
>>>> and sbrec_port_binding_is_deleted() so that only genuine
>>>> type changes on existing port bindings trigger a recompute.
>>>> This matches the pattern already used in binding.c for the
>>>> tunnel_key column.
>>>>
>>>> Also fix a typo in the test name ("path" -> "patch").
>>>>
>>>> Fixes: 73a10345a29c ("controller: Update physical flows for peer port when
>>>> the patch port is removed.")
>>>> Reported-at: https://redhat.atlassian.net/browse/FDP-3819
>>>> Reported-by: Patryk Diak <[email protected]>
>>>> Assisted-by: Claude Opus 4.6, Claude Code
>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>> ---
> 
> [...]
>>>>
>>> Thank you Dumitru,
>>>
>>> applied to main and backported down to 25.09.
>>>
>>
>> Hi Ales,
>>
>> Thanks for that!  I actually tried to cherry-pick this to 25.03 too
>> today, because Red Hat's ovn-kubernetes needs it there as well, but
>> I'm hitting a test failure:
>>
>> rcv_n=0 exp_n=1
>> ovn-macros.at:12: wait failed after 30 seconds
>> Expected:
>> f0f00000001100000101020708004500001c000000003e111726ac1f0064ac1f000a0035111100080000
>> Received:
>> Diff:
>> --- vif-north.expected.sorted        2026-05-19 09:38:11.881677567 +0000
>> +++ hv4/vif-north-tx.packets.sorted  2026-05-19 09:38:11.883375243 +0000
>> @@ -1 +0,0 @@
>> -f0f00000001100000101020708004500001c000000003e111726ac1f0064ac1f000a0035111100080000
>> ../../../tests/ovs-macros.at:256: hard failure
>> 327. ovn.at:26002: 327. Stateless Floating IP -- parallelization=yes --
>> ovn_monitor_all=no (ovn.at:26002): FAILED (ovs-macros.at:256)
>>
>> After quite some debugging, I think this might be an actual incremental
>> processing bug that we now hit more often.  There's a dependency between
>> localnet ports and LP_CHASSISREDIRECT ports when redirect-type=bridged.
>>
>> If the localnet port binding creation is received in a later iteration
>> of ovn-controller, after the chassis-redirect port was created, we now
>> fail to create the redirect flows for the CR port (i.e., we don't call
>> put_remote_port_redirect_bridged() anymore).
>>
>> I'll try to come up with a simpler test that proves the bug (I think it
>> was present before 73a10345a29c ("controller: Update physical flows for
>> peer port when the patch port is removed.")) and a fix for it.
>>
> 
> I went ahead and posted a fix for this:
> https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
> 

Since that patch was reviewed, I also included a 25.03 backport of
b408eedf6d9d ("ovn-controller: Skip type-update check for new port
bindings.") in the same batch.
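
For anyone following along, the guard described in the quoted commit
message can be sketched roughly as below.  This is a self-contained
mock, not the actual pflow_output handler: struct pb_row_state and
type_change_needs_recompute() are hypothetical names, and the three
flags only stand in for sbrec_port_binding_is_new(),
sbrec_port_binding_is_deleted() and the IDL's "type column updated"
state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the IDL tracking state of one port binding
 * row.  In ovn-controller this information comes from the generated
 * sbrec_port_binding_*() helpers, not from a struct like this one. */
struct pb_row_state {
    bool is_new;        /* Models sbrec_port_binding_is_new(). */
    bool is_deleted;    /* Models sbrec_port_binding_is_deleted(). */
    bool type_updated;  /* Models "type column marked as updated". */
};

/* Returns true only for a genuine type change on an existing row.
 * New rows have every non-default column marked as updated, so the
 * flag says nothing about an actual change; deleted rows are excluded
 * from this check as well. */
bool
type_change_needs_recompute(const struct pb_row_state *pb)
{
    if (pb->is_new || pb->is_deleted) {
        return false;
    }
    return pb->type_updated;
}
```

With this shape, a newly created remote port binding (is_new and
type_updated both set) no longer forces a full recompute through this
check, while a type change on an existing row still does.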

Regards,
Dumitru
