On 5/14/26 10:13 AM, Ales Musil wrote:
> On Wed, May 13, 2026 at 11:15 AM Dumitru Ceara via dev <
> [email protected]> wrote:
> 
>> The pflow_output SB_port_binding handler triggers a full
>> recompute when the type column is updated on a port binding.
>> However, for newly created port bindings, the OVSDB IDL
>> marks all non-default columns as "updated", even though no
>> actual update occurred.  This caused every new port binding
>> with a non-default type (e.g., remote, patch, localnet,
>> router) to unnecessarily trigger a full pflow_output
>> recompute, severely impacting ovn-controller performance
>> at scale.
>>
>> This is particularly problematic in deployments that use
>> remote LSPs, such as ovn-kubernetes with L2 UDNs, where
>> frequent creation of remote port bindings leads to
>> continuous full recomputes and high CPU usage.
>>
>> Guard the type-update check with sbrec_port_binding_is_new()
>> and sbrec_port_binding_is_deleted() so that only genuine
>> type changes on existing port bindings trigger a recompute.
>> This matches the pattern already used in binding.c for the
>> tunnel_key column.
>>
>> Also fix a typo in the test name ("path" -> "patch").
>>
>> Fixes: 73a10345a29c ("controller: Update physical flows for peer port when
>> the patch port is removed.")
>> Reported-at: https://redhat.atlassian.net/browse/FDP-3819
>> Reported-by: Patryk Diak <[email protected]>
>> Assisted-by: Claude Opus 4.6, Claude Code
>> Signed-off-by: Dumitru Ceara <[email protected]>
>> ---
>>  controller/ovn-controller.c |  6 ++++--
>>  tests/ovn-controller.at     | 14 +++++++++++++-
>>  tests/ovn-performance.at    |  7 +++++++
>>  3 files changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 3024399f36..b26ccfbd96 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -5081,12 +5081,14 @@ pflow_output_sb_port_binding_handler(struct
>> engine_node *node,
>>       */
>>      const struct sbrec_port_binding *pb;
>>      SBREC_PORT_BINDING_TABLE_FOR_EACH_TRACKED (pb,
>> p_ctx.port_binding_table) {
>> +        bool removed = sbrec_port_binding_is_deleted(pb);
>> +
>>          /* Trigger a full recompute if type column is updated. */
>> -        if (sbrec_port_binding_is_updated(pb,
>> SBREC_PORT_BINDING_COL_TYPE)) {
>> +        if (!removed && !sbrec_port_binding_is_new(pb) &&
>> +            sbrec_port_binding_is_updated(pb,
>> SBREC_PORT_BINDING_COL_TYPE)) {
>>              destroy_physical_ctx(&p_ctx);
>>              return EN_UNHANDLED;
>>          }
>> -        bool removed = sbrec_port_binding_is_deleted(pb);
>>          if (!physical_handle_flows_for_lport(pb, removed, &p_ctx,
>>                                               &pfo->flow_table)) {
>>              destroy_physical_ctx(&p_ctx);
>> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
>> index b1814dfb74..f9f64d691c 100644
>> --- a/tests/ovn-controller.at
>> +++ b/tests/ovn-controller.at
>> @@ -962,7 +962,7 @@ AT_CLEANUP
>>  ])
>>
>>  OVN_FOR_EACH_NORTHD([
>> -AT_SETUP([ovn-controller - ovn IP check path ports])
>> +AT_SETUP([ovn-controller - ovn IP check patch ports])
>>  AT_KEYWORDS([ovn-ip-patch-ports])
>>
>>  ovn_start
>> @@ -1003,6 +1003,18 @@ check as hv1 ovn-appctl -t ovn-controller
>> inc-engine/clear-stats
>>  check ovn-nbctl --wait=hv lsp-set-type ls0-rp localnet
>>  check_controller_engine_stats hv1 pflow_output recompute nocompute
>>
>> +# Check that adding a new port with a non-default type does not trigger
>> +# a pflow_output recompute.
>> +check as hv1 ovn-appctl -t ovn-controller inc-engine/clear-stats
>> +check ovn-nbctl --wait=hv lsp-add ls0 lsp-remote -- lsp-set-type
>> lsp-remote remote
>> +check_controller_engine_stats hv1 pflow_output norecompute compute
>> +
>> +# Check that deleting a port with a non-default type does not trigger
>> +# a pflow_output recompute.
>> +check as hv1 ovn-appctl -t ovn-controller inc-engine/clear-stats
>> +check ovn-nbctl --wait=hv lsp-del lsp-remote
>> +check_controller_engine_stats hv1 pflow_output norecompute compute
>> +
>>  OVN_CLEANUP([hv1])
>>  AT_CLEANUP
>>  ])
>> diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at
>> index 2bccbb06dd..114917832b 100644
>> --- a/tests/ovn-performance.at
>> +++ b/tests/ovn-performance.at
>> @@ -386,6 +386,13 @@ for i in 1 2; do
>>      done
>>  done
>>
>> +# Check that adding a new port with a non-default type does not trigger
>> +# a physical_run.
>> +OVN_CONTROLLER_EXPECT_NO_HIT(
>> +    [hv1 hv2], [physical_run],
>> +    [ovn-nbctl --wait=hv lsp-add ls1 lsp-remote -- lsp-set-type
>> lsp-remote remote]
>> +)
>> +
>>  for i in 1 2; do
>>      j=$((i%2 + 1))
>>      as=as$i
>> --
>> 2.54.0
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Thank you Dumitru,
> 
> applied to main and backported down to 25.09.
> 

Hi Ales,

Thanks for that!  I actually tried to cherry pick this on 25.03 too
today because Red Hat's ovn-kubernetes needs it there too but I'm
hitting a test failure:

rcv_n=0 exp_n=1
ovn-macros.at:12: wait failed after 30 seconds
Expected:
f0f00000001100000101020708004500001c000000003e111726ac1f0064ac1f000a0035111100080000
Received:
Diff:
--- vif-north.expected.sorted   2026-05-19 09:38:11.881677567 +0000
+++ hv4/vif-north-tx.packets.sorted     2026-05-19 09:38:11.883375243 +0000
@@ -1 +0,0 @@
-f0f00000001100000101020708004500001c000000003e111726ac1f0064ac1f000a0035111100080000
../../../tests/ovs-macros.at:256: hard failure
327. ovn.at:26002: 327. Stateless Floating IP -- parallelization=yes --
ovn_monitor_all=no (ovn.at:26002): FAILED (ovs-macros.at:256)

After quite some debugging, I think this might be an actual incremental
processing bug that we now hit more often.  There's a dependency between
localnet ports and LP_CHASSISREDIRECT ports when redirect-type=bridged.

If the localnet port binding creation is received in a later iteration
of ovn-controller, after the chassis-redirect port was created, we now
fail to create the redirect flows for the CR port (i.e., we don't call
put_remote_port_redirect_bridged() anymore).

I'll try to come up with a simpler test that proves the bug (I think it
was present before 73a10345a29c ("controller: Update physical flows for
peer port when the patch port is removed.")) and a fix for it.

Regards,
Dumitru

> Regards,
> Ales
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to