>
> Hi, Salem. This patch looks clearly AI-generated. You need to disclose
> that fact by providing the Assisted-by tag, see the contribution guide.
>
> Also, while code seems fine at the first glance, most of the comments are
> (as typical with LLMs) way too long, some only make sense in the context
> of the patch and not in the context of the file they are in, a lot of
> comments, especially in the test, are obvious. Please, clean this up.
>
> I didn't review the code in details, but see a couple comments below.
Hi Ilya, thank you for the review. I gave addressed all your comments and
adapted accordingly in V3 for your review at your convenience.
>
>>
>> However, OVN also produces OFPP_NONE in the frozen state at logical
>> datapath crossings by explicitly clearing in_port mid-pipeline
>> (load:0xffff->in_port) before a ct() freeze. In this case the
>> megaflow still has a real datapath in_port.
>>
>> When that port is deleted, the revalidator re-translates these
>> megaflows. The shortcut bypasses port validation, the actions look
>> unchanged, and the megaflow is kept.
>> Stale megaflows then accumulate until max-idle fires,
>
> The commeit message and the comments fixate on the word 'megaflow'
> which is a wrong word to use in this context, IMO. These are datapath
> flows. Megaflow is the match criteria with the mask, which is not
> really related to what is going on here.
>
>> which is problematic in high-churn environments like Kubernetes and
>> especially in offload setups where matching datapath flows are held
>> in hardware.
>>
>> Fix this by verifying that the megaflow's datapath in_port still maps
>> to an OF port via odp_port_to_ofport() in the shortcut path. Skip
>> the check when in_port.odp_port == ODPP_NONE so the original
>> controller packet-out scenarios continue to work.
>>
>> A regression test reproducing the bug on the dummy datapath is added.
>
>
>
>>
>> [1]: 323ae1e808e6 ("ofproto-dpif-xlate: Fix recirculation when in_port is
>> OFPP_CONTROLLER.")
>
> Don't duplicate the commit line, if you need to reference the commit in
> the fixes tag, just say so or reference it as 'cited commit'.
>
>>
>> Fixes: 323ae1e808e6 ("ofproto-dpif-xlate: Fix recirculation when in_port is
>> OFPP_CONTROLLER.")
>> Signed-off-by: Salem Sol <[email protected]>
>> ---
>> ofproto/ofproto-dpif-xlate.c | 33 ++++++++++++++++++++-
>> tests/ofproto-dpif.at | 56 ++++++++++++++++++++++++++++++++++++
>> 2 files changed, 88 insertions(+), 1 deletion(-)
>>
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index c1ba447e93..972df39935 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -1689,7 +1689,10 @@ xlate_lookup_ofproto_(const struct dpif_backer
>> *backer,
>> * the packet originated from OFPP_CONTROLLER passed
>> * through a patch port.
>> *
>> - * OFPP_NONE can also indicate that a bond caused
>> recirculation. */
>> + * OFPP_NONE can also indicate that a bond caused recirculation,
>> + * or that an OpenFlow action (e.g.
>> load:0xffff->NXM_OF_IN_PORT[])
>> + * cleared the OF in_port mid-pipeline before a freeze (as OVN
>> + * does at logical-datapath crossings). */
>> struct uuid uuid = recirc_id_node->state.ofproto_uuid;
>> const struct xbridge *bridge = xbridge_lookup_by_uuid(xcfg,
>> &uuid);
>>
>> @@ -1698,6 +1701,34 @@ xlate_lookup_ofproto_(const struct dpif_backer
>> *backer,
>> !get_ofp_port(bridge, in_port)) {
>> goto xport_lookup;
>> }
>> +
>> + /* Even when the frozen state's OF in_port is OFPP_NONE or
>> + * OFPP_CONTROLLER, the megaflow under revalidation still
>> has
>> + * a concrete datapath in_port in its key
>> + * (flow->in_port.odp_port). If that ingress can no longer
>> + * be resolved -- e.g. the underlying port has been deleted,
>> + * or the tunnel removed -- then this megaflow is
>> unreachable
>> + * (no future packet can match a non-existent ingress) and
>> + * must be evicted. Mirror the xport_lookup logic below so
>> + * tunnel arrivals are handled the same way, and fall
>> through
>> + * to xport_lookup when the ingress is gone so we return
>> + * NULL/ENODEV.
>> + *
>> + * ODPP_NONE means there is no datapath ingress at all
>> + * (legitimate for controller-originated packet-outs); skip
>> + * the check there. */
>> + if (flow->in_port.odp_port != ODPP_NONE) {
>> + const struct ofport_dpif *ingress;
>> +
>> + ingress = tnl_port_should_receive(flow)
>> + ? tnl_port_receive(flow)
>> + : odp_port_to_ofport(backer,
>> + flow->in_port.odp_port);
>> + if (!ingress) {
>> + goto xport_lookup;
>> + }
>> + }
>> +
>> if (errorp) {
>> *errorp = NULL;
>> }
>> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
>> index dcab7bba45..b15457b06f 100644
>> --- a/tests/ofproto-dpif.at
>> +++ b/tests/ofproto-dpif.at
>> @@ -9010,6 +9010,62 @@ packets:2, bytes:68, used:0.001s, actions:drop
>> OVS_VSWITCHD_STOP(["/sending to collector failed/d"])
>> AT_CLEANUP
>>
>> +dnl Reproducer for stale recirculated megaflows that survive port deletion
>> +dnl when the OpenFlow pipeline clears in_port
>> (load:0xffff->NXM_OF_IN_PORT[])
>> +dnl before triggering a freeze with ct(table=N). This is the exact pattern
>> +dnl OVN uses at logical-datapath crossings, and the resulting frozen state
>> +dnl with metadata.in_port == OFPP_NONE makes xlate_lookup_ofproto_() bypass
>> +dnl any validation of the datapath in_port that the megaflow keys on, so the
>> +dnl megaflow is revalidated as UKEY_KEEP for max-idle even
>> +dnl though the underlying port is gone.
>> +AT_SETUP([ofproto-dpif - stale recirc megaflow after in_port deletion
>> (OFPP_NONE frozen in_port)])
>
> The name is too long.
>
>> +AT_KEYWORDS([megaflow revalidator recirc])
>> +OVS_VSWITCHD_START
>> +add_of_ports br0 1 2 3
>
> Why 3 ports?
>
>> +
>> +dnl Pipeline mirrors OVN's logical-datapath-crossing pattern:
>> +dnl table=0: receive on p3, clear OF in_port, then ct(table=1) which
>> freezes
>> +dnl table=1: post-recirc, output to p2
>> +AT_DATA([flows.txt], [dnl
>> +table=0,in_port=3,tcp
>> actions=load:0xffff->NXM_OF_IN_PORT[[]],ct(zone=1,table=1)
>> +table=1,tcp,ct_state=+trk actions=output:2
>> +])
>> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
>> +
>> +dnl Send a packet to install both the recirc_id=0 and the recirc_id=0x1
>> ukeys.
>> +AT_CHECK([ovs-appctl netdev-dummy/receive p3 \
>> +
>> 'in_port(3),eth(src=50:54:00:00:00:08,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=192.168.0.2,dst=192.168.0.1,proto=6,tos=0,ttl=64,frag=no),tcp(src=1234,dst=80)'])
>> +
>
> Use an m4_define for a more compact definition for the packet, see TRACE_PKT
> as an example.
>
>> +dnl Sanity: both the pre-CT (recirc_id=0,in_port=3) and the post-CT
>> +dnl (recirc_id=0x1,in_port=3) megaflows must be installed. Use
>> +dnl OVS_WAIT_UNTIL so the test does not depend on revalidator exec speed.
>> +OVS_WAIT_UNTIL([test $(ovs-appctl dpctl/dump-flows | dnl
>> + grep -c 'recirc_id(0),in_port(3)') -eq 1], [ovs-appctl dpctl/dump-flows])
>> +OVS_WAIT_UNTIL([test $(ovs-appctl dpctl/dump-flows | dnl
>> + grep -c 'recirc_id(0x1),in_port(3)') -eq 1], [ovs-appctl
>> dpctl/dump-flows])
>
> Don't use OVS_WAIT_UNTIL. Use revalidator/wait (when needed) + AC_CHECK.
> Also,
> we know exactly how all the datapath flows here should look like, so just
> match
> on that (with all the appropriate strip_* options).
>
>> +
>> +dnl Delete the ingress port. This triggers REV_RECONFIGURE, after which
>> +dnl the revalidator must drop any datapath megaflows that key on the
>> +dnl now-deleted port.
>> +
>> +AT_CHECK([ovs-vsctl del-port br0 p3])
>> +
>> +dnl The recirc_id=0 flow is correctly deleted (no frozen state involved).
>> +OVS_WAIT_UNTIL([test $(ovs-appctl dpctl/dump-flows | dnl
>> + grep -c 'recirc_id(0),in_port(3)') -eq 0], [ovs-appctl dpctl/dump-flows])
>> +
>> +dnl The recirc_id>0 megaflow MUST also be deleted: its datapath key is
>> +dnl in_port(3) which no longer exists. Without the fix this wait times
>> +dnl out, because xlate_lookup_ofproto_() takes the frozen-in_port=OFPP_NONE
>> +dnl shortcut and returns SUCCESS without verifying the datapath in_port,
>> +dnl so revalidate_ukey__() decides UKEY_KEEP and the megaflow only goes
>> +dnl away when max-idle fires.
>> +OVS_WAIT_UNTIL([test $(ovs-appctl dpctl/dump-flows | dnl
>> + grep -c 'recirc_id(0x1),in_port(3)') -eq 0], [ovs-appctl
>> dpctl/dump-flows])
>> +
>> +OVS_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>> dnl Bridge IPFIX statistics check
>> AT_SETUP([ofproto-dpif - Bridge IPFIX statistics check])
>> OVS_VSWITCHD_START
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev