Hi all,

It seems that the test added by this patch of mine sometimes fails in
GitHub CI when ovs-vswitchd is stopped; the failure is due to:

./ovn.at:10193: check_logs "
        $error
        /connection failed (No such file or directory)/d
        /has no network name*/d
        /receive tunnel port not found*/d
        /Failed to locate tunnel to reach main chassis/d
        /Transaction causes multiple rows.*MAC_Binding/d
        /Transaction causes multiple rows.*FDB/d
    " $sbox
--- /dev/null   2026-05-06 14:47:18.250105001 +0000
+++ /workspace/ovn-tmp/tests/testsuite.dir/at-groups/155/stdout   2026-05-06 14:54:35.770811381 +0000
@@ -0,0 +1,2 @@
+2026-05-06T14:54:35.548Z|00471|ofproto_dpif_rid|ERR|recirc_id 4 left allocated when ofproto (br-int) is destructed
+2026-05-06T14:54:35.548Z|00472|ofproto_dpif_rid|ERR|recirc_id 2 left allocated when ofproto (br-int) is destructed

https://github.com/ovsrobot/ovn/actions/runs/25442425151/job/74637759422#step:12:5664
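
For reference, a rough sketch of why those two lines trip the check.  It
assumes check_logs drops every log line matching one of the sed delete
expressions listed above and then reports whatever WARN/ERR/EMER lines
remain; that is my reading of the output, not a copy of the macro.  The
patterns are hand-translated from sed syntax to Python, the contents of
$error are not shown above and are left out, and the names IGNORED,
REPORTED and unexpected_lines are made up for illustration.

```python
import re

# Python translations of the sed delete patterns passed to check_logs above.
# sed uses basic regular expressions, where "(" and ")" are literal, so they
# are escaped here.  The patterns from $error are unknown and omitted.
IGNORED = [
    r"connection failed \(No such file or directory\)",
    r"has no network name*",
    r"receive tunnel port not found*",
    r"Failed to locate tunnel to reach main chassis",
    r"Transaction causes multiple rows.*MAC_Binding",
    r"Transaction causes multiple rows.*FDB",
]

# Assumption: only lines logged at these severities are reported.
REPORTED = re.compile(r"\|(WARN|ERR|EMER)\|")


def unexpected_lines(log_lines):
    """Return reported-severity lines that no ignore pattern matches."""
    return [
        line for line in log_lines
        if REPORTED.search(line)
        and not any(re.search(pattern, line) for pattern in IGNORED)
    ]


# The two lines from the failing CI run quoted above.
log = [
    "2026-05-06T14:54:35.548Z|00471|ofproto_dpif_rid|ERR|recirc_id 4 left "
    "allocated when ofproto (br-int) is destructed",
    "2026-05-06T14:54:35.548Z|00472|ofproto_dpif_rid|ERR|recirc_id 2 left "
    "allocated when ofproto (br-int) is destructed",
]

leftover = unexpected_lines(log)
for line in leftover:
    print(line)
```

Neither recirc_id line matches any of the listed patterns, so both survive
the filter and show up as unexpected output in the test.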

I haven't been able to reproduce the issue locally yet, but I'll keep
investigating.

Regards,
Dumitru

On 5/6/26 1:49 PM, Dumitru Ceara wrote:
> On 5/6/26 10:33 AM, Mairtin O'Loingsigh wrote:
>> On Mon, May 04, 2026 at 09:05:40AM +0200, Dumitru Ceara wrote:
>>> Hi Mairtin,
>>>
>>> Thanks for the review!
>>>
>>> On 4/29/26 10:28 AM, Mairtin O'Loingsigh wrote:
>>>> On Fri, Apr 24, 2026 at 05:35:58PM +0200, Dumitru Ceara via dev wrote:
>>>>> The ARP/ND responder stage (ls_in_arp_rsp) unconditionally
>>>>> bypassed all traffic arriving from localnet ports via a
>>>>> priority-100 "next;" flow.  This caused broadcast ARP/ND
>>>>> requests from the physical network to be flooded to every
>>>>> logical switch port instead of being handled by proxy
>>>>> ARP/ND.  On switches with ~200+ ports the resulting
>>>>> multicast replication exceeded the OVS 4K resubmit limit,
>>>>> dropping the packets and breaking connectivity.
>>>>>
>>>>> Replace the bypass with a targeted mechanism:
>>>>>
>>>>>   - In ls_in_lookup_fdb, set flags.localnet = 1 for
>>>>>     packets arriving from localnet ports (P50 fallback;
>>>>>     the existing P100 FDB-learning flow already sets this
>>>>>     flag when FDB learning is enabled).
>>>>>
>>>>>   - In the P50 ARP/ND reply flows, append the condition
>>>>>     "((flags.localnet == 1 && is_chassis_resident(port))
>>>>>      || flags.localnet == 0)" on switches that have
>>>>>     localnet ports.
>>>>>
>>>>> This ensures that ARP/ND requests from localnet are only
>>>>> answered on the chassis hosting the target VIF, preventing
>>>>> both the flood and duplicate replies from multiple
>>>>> hypervisors.  VIF-to-VIF proxy ARP/ND is unchanged because
>>>>> flags.localnet is 0 for non-localnet-sourced traffic.
>>>>>
>>>>> Fixes: f763a3273b84 ("ovn: Avoid ARP responder for packets from localnet port")
>>>>> Reported-at: https://redhat.atlassian.net/browse/FDP-3436
>>>>> Assisted-by: Claude Opus 4.6, Claude Code
>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>>> ---
>>>
>>> [...]
>>>
>>>>>  
>>>>> +/* On switches with localnet ports, restrict ARP/ND replies for
>>>>> + * localnet-sourced requests to the chassis hosting the target VIF
>>>>> + * (preventing duplicate replies from every hypervisor).  Non-localnet
>>>>> + * requests (VIF-to-VIF) are answered unconditionally as before. */
>>>>> +static void
>>>>> +build_lswitch_arp_nd_local_resp_match(struct ds *match,
>>>>> +                                      const struct ovn_port *op)
>>>>> +{
>>>>> +    if (!ls_has_localnet_port(op->od)) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    ds_put_format(match,
>>>>> +        " && ((flags.localnet == 1 && is_chassis_resident(%s))"
>>>>> +            " || flags.localnet == 0)", op->json_key);
>>>> nit: spacing
>>>
>>> I had actually done this on purpose to make it a bit more visible that
>>> " || flags.localnet == 0" is part of the condition in parentheses.  But I
>>> have no strong preference in the end.  Please let me know if you still
>>> would like me to change it.
>>>
>>>>> +}
>>>>> +
>>>
>>> [...]
>>>
>>>>>
>>>> LGTM. Just one small nit.
>>>>
>>>> Acked-by: Mairtin O'Loingsigh <[email protected]>
>>>>
>>>
>>> Regards,
>>> Dumitru
>>>
>>
>> This spacing does look more readable. No need to change.
>>
> 
> Hi Mairtin,
> 
> Thanks for the confirmation!  Applied to main and all stable branches
> down to 24.03.
> 
> Regards,
> Dumitru
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
