On 5/7/26 4:31 PM, Dumitru Ceara wrote:
> Hi all,
>
> It seems that the test added by this patch of mine sometimes fails in
> GitHub CI when ovs-vswitchd is stopped; the failure is due to:
>
> ./ovn.at:10193: check_logs "
> $error
> /connection failed (No such file or directory)/d
> /has no network name*/d
> /receive tunnel port not found*/d
> /Failed to locate tunnel to reach main chassis/d
> /Transaction causes multiple rows.*MAC_Binding/d
> /Transaction causes multiple rows.*FDB/d
> " $sbox
> --- /dev/null 2026-05-06 14:47:18.250105001 +0000
> +++ /workspace/ovn-tmp/tests/testsuite.dir/at-groups/155/stdout
> 2026-05-06 14:54:35.770811381 +0000
> @@ -0,0 +1,2 @@
> +2026-05-06T14:54:35.548Z|00471|ofproto_dpif_rid|ERR|recirc_id 4 left
> allocated when ofproto (br-int) is destructed
> +2026-05-06T14:54:35.548Z|00472|ofproto_dpif_rid|ERR|recirc_id 2 left
> allocated when ofproto (br-int) is destructed
>
> https://github.com/ovsrobot/ovn/actions/runs/25442425151/job/74637759422#step:12:5664
>
> I'm failing to reproduce the issue locally but I'll keep investigating.
>
+Eelco, as we were discussing this offline earlier.
So, if I understand correctly, it's just a race when ovs-vswitchd
exits: ofproto-dpif.c:destruct() is called, it removes all OF rules and
then calls recirc_free_ofproto(), which assumes there are no remaining
recirc_ids. However, close_dpif_backer(), which in turn calls
udpif_destroy() and deletes all the ukeys (releasing any recirc_ids
they might hold), is only called at the end of destruct():
static void
destruct(struct ofproto *ofproto_, bool del)
{
    ...
    ofproto->backer->need_revalidate = REV_RECONFIGURE;
    xlate_txn_start();
    xlate_remove_ofproto(ofproto);
    xlate_txn_commit();

    hmap_remove(&all_ofproto_dpifs_by_name,
                &ofproto->all_ofproto_dpifs_by_name_node);

    OFPROTO_FOR_EACH_TABLE (table, &ofproto->up) {
        CLS_FOR_EACH (rule, up.cr, &table->cls) {
            ofproto_rule_delete(&ofproto->up, &rule->up);
        }
    }
    ...
    recirc_free_ofproto(ofproto, ofproto->up.name);
    ...
    /* Wait for all the meter destroy work to finish. */
    ovsrcu_barrier();

    close_dpif_backer(ofproto->backer, del);
}
Is this something that's worth addressing in OVS? I understand it's
benign because we're exiting anyway, but I'm a bit surprised we don't
hit it more often in tests (OVN tests too).
Alternatively, would it be an option to change OVN's OVN_CLEANUP_VSWITCH
macro to call revalidator/purge before stopping vswitchd? I.e.:
diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index 39f03ba62b..b0b22e2fe6 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -223,6 +223,7 @@ m4_define([OVN_CLEANUP_VSWITCH],[
     echo
     echo "$1: clean up vswitch"
     as $1
+    ovs-appctl revalidator/purge
     OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
     OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 ])
Thanks,
Dumitru
> Regards,
> Dumitru
>
> On 5/6/26 1:49 PM, Dumitru Ceara wrote:
>> On 5/6/26 10:33 AM, Mairtin O'Loingsigh wrote:
>>> On Mon, May 04, 2026 at 09:05:40AM +0200, Dumitru Ceara wrote:
>>>> Hi Mairtin,
>>>>
>>>> Thanks for the review!
>>>>
>>>> On 4/29/26 10:28 AM, Mairtin O'Loingsigh wrote:
>>>>> On Fri, Apr 24, 2026 at 05:35:58PM +0200, Dumitru Ceara via dev wrote:
>>>>>> The ARP/ND responder stage (ls_in_arp_rsp) unconditionally
>>>>>> bypassed all traffic arriving from localnet ports via a
>>>>>> priority-100 "next;" flow. This caused broadcast ARP/ND
>>>>>> requests from the physical network to be flooded to every
>>>>>> logical switch port instead of being handled by proxy
>>>>>> ARP/ND. On switches with ~200+ ports the resulting
>>>>>> multicast replication exceeded the OVS 4K resubmit limit,
>>>>>> dropping the packets and breaking connectivity.
>>>>>>
>>>>>> Replace the bypass with a targeted mechanism:
>>>>>>
>>>>>> - In ls_in_lookup_fdb, set flags.localnet = 1 for
>>>>>> packets arriving from localnet ports (P50 fallback;
>>>>>> the existing P100 FDB-learning flow already sets this
>>>>>> flag when FDB learning is enabled).
>>>>>>
>>>>>> - In the P50 ARP/ND reply flows, append the condition
>>>>>> "((flags.localnet == 1 && is_chassis_resident(port))
>>>>>> || flags.localnet == 0)" on switches that have
>>>>>> localnet ports.
>>>>>>
>>>>>> This ensures that ARP/ND requests from localnet are only
>>>>>> answered on the chassis hosting the target VIF, preventing
>>>>>> both the flood and duplicate replies from multiple
>>>>>> hypervisors. VIF-to-VIF proxy ARP/ND is unchanged because
>>>>>> flags.localnet is 0 for non-localnet-sourced traffic.
>>>>>>
>>>>>> Fixes: f763a3273b84 ("ovn: Avoid ARP responder for packets from localnet
>>>>>> port")
>>>>>> Reported-at: https://redhat.atlassian.net/browse/FDP-3436
>>>>>> Assisted-by: Claude Opus 4.6, Claude Code
>>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>>>
>>>>>> +/* On switches with localnet ports, restrict ARP/ND replies for
>>>>>> + * localnet-sourced requests to the chassis hosting the target VIF
>>>>>> + * (preventing duplicate replies from every hypervisor). Non-localnet
>>>>>> + * requests (VIF-to-VIF) are answered unconditionally as before. */
>>>>>> +static void
>>>>>> +build_lswitch_arp_nd_local_resp_match(struct ds *match,
>>>>>> + const struct ovn_port *op)
>>>>>> +{
>>>>>> + if (!ls_has_localnet_port(op->od)) {
>>>>>> + return;
>>>>>> + }
>>>>>> +
>>>>>> + ds_put_format(match,
>>>>>> + " && ((flags.localnet == 1 && is_chassis_resident(%s))"
>>>>>> + " || flags.localnet == 0)", op->json_key);
>>>>> nit: spacing
>>>>
>>>> I had actually done this on purpose to make it a bit more visible that "
>>>> || flags.localnet == 0" is part of the condition in parenthesis. But I
>>>> have no strong preference in the end. Please let me know if you still
>>>> would like me to change it.
>>>>
>>>>>> +}
>>>>>> +
>>>>
>>>> [...]
>>>>
>>>>>>
>>>>> LGTM. Just one small nit.
>>>>>
>>>>> Acked-by: Mairtin O'Loingsigh <[email protected]>
>>>>>
>>>>
>>>> Regards,
>>>> Dumitru
>>>>
>>>
>>> This spacing does look more readable. No need to change.
>>>
>>
>> Hi Mairtin,
>>
>> Thanks for the confirmation! Applied to main and all stable branches
>> down to 24.03.
>>
>> Regards,
>> Dumitru
>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev