On 9/9/25 3:56 PM, Reuven Plevinsky wrote: > On 08/09/2025 15:24, Ilya Maximets wrote: >> External email: Use caution opening links or attachments >> >> >> On 9/7/25 1:33 PM, Eli Britstein wrote: >>> >>>> -----Original Message----- >>>> From: Ilya Maximets <i.maxim...@ovn.org> >>>> Sent: Thursday, 21 August 2025 13:32 >>>> To: Reuven Plevinsky <rplevin...@nvidia.com>; d...@openvswitch.org >>>> Cc: i.maxim...@ovn.org; Eli Britstein <el...@nvidia.com>; Maor Dickman >>>> <ma...@nvidia.com> >>>> Subject: Re: [ovs-dev] [PATCH v2 1/1] ofproto-dpif-xlate: Clear tunnel >>>> wildcard >>>> bits properly. >>>> >>>> External email: Use caution opening links or attachments >>>> >>>> >>>> On 8/19/25 11:01 AM, Reuven Plevinsky wrote: >>>>> On 06/08/2025 14:26, Ilya Maximets wrote: >>>>>> External email: Use caution opening links or attachments >>>>>> >>>>>> >>>>>> On 8/4/25 4:29 PM, Reuven Plevinsky via dev wrote: >>>>>>> For example, consider the following OpenFlow rule: >>>>>>> - IPv6 -> IPv4 tunnel translation: >>>>>>> add-flow br-int 'in_port=vxlan_br-int_0, \ >>>>>>> actions=set_field:8.8.8.7->tun_src, \ >>>>>>> set_field:8.8.8.8->tun_dst,set_field:43->tun_id,geneve_br-int_1' >>>>>>> >>>>>>> As any set_field action, flow wildcards are set to the tun_src/dst. >>>>>>> The offload layer fails if not all matches are handled ("consumed"). >>>>>>> Clear the irrelevant wildcards. >>>>>>> >>>>>>> Signed-off-by: Reuven Plevinsky <rplevin...@nvidia.com> >>>>>>> --- >>>>>>> ofproto/ofproto-dpif-xlate.c | 17 +++++++++++++ >>>>>>> tests/tunnel-push-pop.at | 48 >>>> ++++++++++++++++++++++++++++++++++++ >>>>>>> 2 files changed, 65 insertions(+) >>>>>>> >>>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c >>>>>>> b/ofproto/ofproto-dpif-xlate.c index 2c8197fb7307..e2ba7e6b0250 >>>>>>> 100644 >>>>>>> --- a/ofproto/ofproto-dpif-xlate.c >>>>>>> +++ b/ofproto/ofproto-dpif-xlate.c >>>>>>> @@ -8078,6 +8078,20 @@ xlate_wc_init(struct xlate_ctx *ctx) >>>>>>> tnl_wc_init(&ctx->xin->flow, ctx->wc); >>>>>>> } >>>>>>> >>>>>>> +static void >>>>>>> +clear_tunnel_wc(const struct flow_tnl *spec, struct flow_tnl *mask) >>>>>>> +{ >>>>>>> + if (!spec->ip_src && !spec->ip_dst) { >>>>>>> + mask->ip_src = 0; >>>>>>> + mask->ip_dst = 0; >>>>>>> + } >>>>>>> + if (!ipv6_addr_is_set(&spec->ipv6_src) && >>>>>>> + !ipv6_addr_is_set(&spec->ipv6_dst)) { >>>>>>> + mask->ipv6_src = in6addr_any; >>>>>>> + mask->ipv6_dst = in6addr_any; >>>>>>> + } >>>>>>> +} >>>>>>> + >>>>>>> static void >>>>>>> xlate_wc_finish(struct xlate_ctx *ctx) >>>>>>> { >>>>>>> @@ -8152,6 +8166,9 @@ xlate_wc_finish(struct xlate_ctx *ctx) >>>>>>> if (!flow_tnl_dst_is_set(&ctx->xin->upcall_flow->tunnel)) { >>>>>>> memset(&ctx->wc->masks.tunnel, 0, sizeof >>>>>>> ctx->wc->masks.tunnel); >>>>>>> } >>>>>>> + /* Both IPv4/IPv6 tunnel wc may be set because of set_fields >>>>>>> action. >>>>>>> + * Clear the irrelevant one. */ >>>>>>> + clear_tunnel_wc(&ctx->xin->upcall_flow->tunnel, >>>>>>> + &ctx->wc->masks.tunnel); >>>>>>> } >>>>>>> >>>>>>> /* This will tweak the odp actions generated. For now, it will: >>>>>>> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at >>>>>>> index 64c41460bbb2..9b46d249fe9f 100644 >>>>>>> --- a/tests/tunnel-push-pop.at >>>>>>> +++ b/tests/tunnel-push-pop.at >>>>>>> @@ -1440,3 +1440,51 @@ AT_CHECK([tail -1 stdout], [0], >>>>>>> >>>>>>> OVS_VSWITCHD_STOP >>>>>>> AT_CLEANUP >>>>>>> + >>>>>>> +AT_SETUP([tunnel - decap and re-encap with outer IP version >>>>>>> +conversion]) OVS_VSWITCHD_START([add-port br0 p0 -- set Interface >>>>>>> +p0 type=dummy]) AT_CHECK([ovs-vsctl add-br int-br -- set bridge >>>>>>> +int-br datapath_type=dummy], [0]) AT_CHECK([ovs-vsctl add-port int-br >>>> t1 -- set Interface t1 type=vxlan \ >>>>>>> + options:key=flow options:remote_ip=flow \ >>>>>>> + -- add-port int-br t2 -- set Interface t2 >>>>>>> type=geneve \ >>>>>>> + options:key=flow options:remote_ip=flow \ >>>>>>> + ], [0]) >>>>>>> + >>>>>>> +AT_CHECK([ovs-ofctl add-flow br0 action=normal]) dnl Setup dummy >>>>>>> +interface IP addresses. >>>>>>> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.1.1/24], [0], [OK >>>>>>> +]) >>>>>>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::1/24], >>>>>>> +[0], [OK >>>>>>> +]) >>>>>>> +dnl Checking that a local routes for added IPs were successfully >>>>>>> installed. >>>>>>> +AT_CHECK([ovs-appctl ovs/route/show | grep Cached | sort], [0], >>>>>>> +[dnl >>>>>>> +Cached: 1.1.1.0/24 dev br0 SRC 1.1.1.1 local >>>>>>> +Cached: 2001:ca00::/24 dev br0 SRC 2001:cafe::1 local >>>>>>> +]) >>>>>>> +dnl Add entries >>>>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 1.1.1.2 aa:bb:cc:00:00:01], >>>>>>> +[0], [OK >>>>>>> +]) >>>>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::2 >>>>>>> +aa:bb:cc:00:00:01], [0], [OK >>>>>>> +]) >>>>>>> +AT_CHECK([ovs-appctl tnl/neigh/show | grep br0 | sort], [0], [dnl >>>>>>> +1.1.1.2 aa:bb:cc:00:00:01 br0 >>>>>>> +2001:cafe::2 aa:bb:cc:00:00:01 br0 >>>>>>> +]) >>>>>>> + >>>>>>> +AT_CHECK([ovs-ofctl add-flow int-br >>>>>>> +'in_port=t1,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=2001:cafe::2,tun >>>>>>> +_id=0x7a,actions=set_field:1.1.1.1->tun_src,set_field:1.1.1.2->tun_ >>>>>>> +dst,set_field:0x7b->tun_id,t2']) AT_CHECK([ovs-ofctl add-flow >>>>>>> +int-br >>>>>>> +'in_port=t2,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_id=0x7b,actions=set >>>>>>> +_field:2001:cafe::1->tun_ipv6_src,set_field:2001:cafe::2->tun_ipv6_ >>>>>>> +dst,set_field:0x7a->tun_id,set_field:0.0.0.0->tun_src,set_field:0.0 >>>>>>> +.0.0->tun_dst,t1']) >>>>>>> + >>>>>>> +AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy >>>>>>> +'tunnel(tun_id=0x7a,ipv6_src=2001:cafe::1,ipv6_dst=2001:cafe::2,ttl >>>>>>> +=64),in_port(4789)'], [0], [stdout]) AT_CHECK([tail -2 stdout], >>>>>>> +[0], >>>>>>> + [Megaflow: >>>>>>> +recirc_id=0,eth,tun_id=0x7a,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst= >>>>>>> +2001:cafe::2,tun_tos=0,tun_flags=-df-csum+key,in_port=t1,dl_type=0x >>>>>>> +05ff Datapath actions: >>>>>>> +tnl_push(tnl_port(genev_sys_6081),header(size=50,type=5,eth(dst=aa: >>>>>>> +bb:cc:00:00:01,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.1 >>>>>>> +.1,dst=1.1.1.2,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=608 >>>>>>> +1,csum=0x0),geneve(vni=0x7b)),out_port(br0)),p0 >>>>>>> +]) >>>>>>> + >>>>>>> +AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy >>>>>>> +'tunnel(tun_id=0x7b,src=1.1.1.1,dst=1.1.1.2,ttl=64),in_port(6081)'] >>>>>>> +, [0], [stdout]) AT_CHECK([tail -2 stdout], [0], >>>>>>> + [Megaflow: >>>>>>> +recirc_id=0,eth,tun_id=0x7b,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_tos >>>>>>> +=0,tun_flags=-df-csum+key,in_port=t2,dl_type=0x05ff >>>>>>> +Datapath actions: >>>>>>> +tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,eth(dst=aa: >>>>>>> +bb:cc:00:00:01,src=aa:55:aa:55:00:00,dl_type=0x86dd),ipv6(src=2001: >>>>>>> +cafe::1,dst=2001:cafe::2,label=0,proto=17,tclass=0x0,hlimit=64),udp >>>>>>> +(src=0,dst=4789,csum=0xffff),vxlan(flags=0x8000000,vni=0x7a)),out_p >>>>>>> +ort(br0)),p0 >>>>>>> +]) >>>>>>> + >>>>>>> +OVS_VSWITCHD_STOP >>>>>>> +AT_CLEANUP >>>>>> Hi, Reuven. Thanks for the update and the test. I see what you're >>>>>> trying to achive now. >>>>>> >>>>>> However, I don't think this is a right approach as we can't actually >>>>>> clear tunnel metadata at xlate_wc_finish() stage. This is because we >>>>>> must have prerequisites unwildcarded in order to clear the masks that >>>>>> we think are unnecessary. And tunnel metadata doesn't have any >>>>>> prerequisites, so there is no way to tell if it is necessary to match >>>>>> on or not. The main problem is a match on empty metadata that the >>>>>> user can add into OpenFlow rules. We must never clear those as it >>>>>> may be necessary to distinguish tunneled and non-tunneled packets. >>>>>> E.g. match on all-zero tun_dst signifies that the packet is not >>>>>> coming from an ipv4 tunnel. Note that input port match is not enough >>>>>> as a prerequisite as controller can inject packets from any port. >>>>>> >>>>>> For example, if you add the following change to your test: >>>>>> >>>>>> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at >>>>>> index 9b46d249f..1d19e7e5b 100644 >>>>>> --- a/tests/tunnel-push-pop.at >>>>>> +++ b/tests/tunnel-push-pop.at >>>>>> @@ -1473,6 +1473,13 @@ AT_CHECK([ovs-appctl tnl/neigh/show | grep >>>> br0 >>>>>> | sort], [0], [dnl >>>>>> >>>>>> AT_CHECK([ovs-ofctl add-flow int-br >>>> 'in_port=t1,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=2001:cafe::2,tun_id=0x7a >>>> ,actions=set_field:1.1.1.1->tun_src,set_field:1.1.1.2->tun_dst,set_field:0x7b- >>>>> tun_id,t2']) >>>>>> AT_CHECK([ovs-ofctl add-flow int-br >>>>>> 'in_port=t2,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_id=0x7b,actions=set_f >>>>>> ield:2001:cafe::1->tun_ipv6_src,set_field:2001:cafe::2->tun_ipv6_dst, >>>>>> set_field:0x7a->tun_id,set_field:0.0.0.0->tun_src,set_field:0.0.0.0-> >>>>>> tun_dst,t1']) >>>>>> +AT_CHECK([ovs-ofctl add-flow int-br >>>>>> +'tun_dst=0.0.0.0,tun_ipv6_src=::,actions=drop']) >>>>>> + >>>>>> +AT_CHECK([ovs-appctl ofproto/trace --names int-br in_port=t1], [0], >>>>>> +[stdout]) AT_CHECK([tail -2 stdout], [0], >>>>>> + [Megaflow: >>>>>> +recirc_id=0,eth,tun_dst=0.0.0.0,tun_ipv6_src=::,in_port=t1,dl_type=0 >>>>>> +x0000 >>>>>> +Datapath actions: drop >>>>>> +]) >>>>>> >>>>>> AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy >>>> 'tunnel(tun_id=0x7a,ipv6_src=2001:cafe::1,ipv6_dst=2001:cafe::2,ttl=64),in_p >>>> ort(4789)'], [0], [stdout]) >>>>>> AT_CHECK([tail -2 stdout], [0], >>>>>> --- >>>>>> >>>>>> The test will fail and the actual datapath flow will look like this: >>>>>> >>>>>> Megaflow: recirc_id=0,eth,in_port=t1,dl_type=0x0000 >>>>>> >>>>>> Such a flow will drop all the traffic and not just non-tunneled one >>>>>> as it was intended by the OpenFlow rules. >>>>>> >>>>>> This issue actually already exist for a while and need to also revert >>>>>> the previous commit: >>>>>> d046453b56a9 ("ofproto-dpif-xlate: Clear tunnel wc bits if >>>>>> original packet is non-tunnel.") Though, I think, we need a better >>>>>> solution for this before reverting as conditions where it would be a >>>>>> problem >>>> are kind of unlikely. >>>>>> A proper solution would be to avoid unwildcarding these fields while >>>>>> setting the metadata through varois OpenFlow actions (set_field, >>>>>> load, stack pop, etc.) but preserve the masks coming from the classifier. >>>>>> The reason why we can actually avoid unwildcarding while setting the >>>>>> fileds is that unlike other fileds, tunnel metadata is not a subject >>>>>> for the optimization where we skip datapath actions if the value is >>>>>> already the same. So, there is no real reason to unwildcard them. >>>>>> >>>>>> Do you want to try and work on this kind of change? >>>>>> >>>>>> BTW, I didn't test, but it seems like IPV4/IPv6 mix is a problem not >>>>>> only for the hardware offloading code, but also for the kernel >>>>>> datapath in general as it will reject to install the flow with mixed >>>>>> metadata types. Userspace works fine in that case though. So, >>>>>> technicaly, we also need to probe the datapath implementation for >>>>>> support and slowpath such traffic, if it doesn't, i.e. set the SLOW_MATCH >>>> flag. >>>>>> Best regards, Ilya Maximets. >>>>> Hi Ilya, >>>>> >>>>> Thanks a lot for your review and comments. Since this issue is >>>>> specific to offload, and is not supported by the current upstream DPDK >>>>> implementation (while it is supported by our DOCA implementation), >>>> It should be supported by TC offload implementation, AFAIU. It also may >>>> be a >>>> problem for the kernel datapath without offload as I mentioned above. >>>> >>>>> we’ve >>>>> decided to take a different approach and handle it within the DOCA >>>>> offload provider instead. For now, we will abandon this change. >>>> The same as we can't simply clear these bits at the wc_finish stage, the >>>> datapath or the offload implementaion can't make this decision either >>>> without >>>> breaking user-specified OpenFlow logic. >>> Hi Ilya, >>> Could you please elaborate on your concern about a potential breakage? >>> Could you please provide an example? >> See the test case I provided in my previous reply above. It shows that flow >> with an overly broad match is getting generated and it will match traffic >> that was not intended to be matched by the provided set of OpenFlow rules. >> >>> I'd like to refer you to 2 other commits that are doing something like what >>> we want to do: >>> 262eded5fbd7 ("netdev-offload-tc: Fix ignoring unknown tunnel keys.") >> This was a temporary workaround and it is now removed for kernels that >> support matching on tunnel flags. You can see a lot of "This is wrong!" >> comments in this change. >> >> The commit message of this patch also describes other issues like such >> flows will be frequently removed and re-added during revalidation, i.e. >> un-offloaded and offlaoded back, because ofproto layer will see the overly >> broad match and try to re-generate the narrower one on every revalidation >> cycle. >> >>> 0ef70536eb41 ("netdev-offload-dpdk: Support vxlan encap offload with load >>> actions.") >> This only affects the case where the original packet wasn't from a tunnel, >> which is not a case here. You're trying to solve the case where the packet >> is actually from a tunnel. And we'll need to revert 0ef70536eb41 along with >> d046453b56a9 when we have a proper solution that doesn't unmask tunnel >> metadata fields on 'set'-style actions, but preserves values comming from >> the classifier, as both of these commits are practically the same and so >> have the same issue with potentially installing overly broad datapath flows. >> (I provided reasoning on why we can avoid unwildcarding on actions earlier >> in this thread.) >> >> All in all, the code on current main is not really correct. But the point >> is to avoid making it even less correct. Instead, work towards a proper >> solution and revert all the incorrect parts. >> >> Best regards, Ilya Maximets. > > Hi Ilya, > > I tried to follow your proposal: > > Added a new function mf_is_payload_field() that categorizes OpenFlow > match fields into two types: > > Non-payload fields: Metadata, pipeline state, tunnel info, registers, > etc. (returns false). > > Payload fields: Actual packet header fields like Ethernet, IP, TCP, UDP, > etc. (returns true).
I'm not sure all of these are safe. Should be limited to tunnel metadata for now, I think. Most of non-packet fields are getting cleared anyway, there is no need for extra checking. > > Modified wildcard mask behavior in three places: > > mf_mask_field_masked() - only applies masks to payload fields. > > nxm_reg_load() - only writes wildcard masks for payload fields. > > nxm_execute_stack_push/pop() - only writes wildcard masks for payload > fields. > > The problem with this solution, for example in the following flows: > > ovs-ofctl add-flow br-int > 'in_port=vxlan_br-int_0,tun_ipv6_dst=2001:192:168:222::64,actions=set_field:8.8.8.7->tun_src,set_field:8.8.8.8->tun_dst,set_field:43->tun_id,geneve_br-int_1' > > ovs-ofctl add-flow br-int > 'in_port=geneve_br-int_1,tun_dst=8.8.8.7,actions=set_field:2001:192:168:222::64->tun_ipv6_src,set_field:2001:192:168:222::65->tun_ipv6_dst,set_field:0.0.0.0->tun_src,set_field:0.0.0.0->tun_dst,set_field:42->tun_id,vxlan_br-int_0' > > The classifier masks both IP versions, as looked up in all subtables. As > we "preserve the masks coming from the classifier", the issue still exists. Yes, as long as we actually have the matches on both in the same table, we can't guarantee that both will not be matched by the classifier. It all goes back to detecting that the packet doesn't match the rule. We can only detect that if we match on the fields being different from any of the rules. And these two rules are not mutually exclusive. > > There is a workaround for this problem, for the above example: > > ovs-ofctl add-flow br-int > "table=0,in_port=vxlan_br-int_0,actions=resubmit(,1)" > > ovs-ofctl add-flow br-int > "table=0,in_port=geneve_br-int_1,actions=resubmit(,2)" > > ovs-ofctl add-flow br-int > "table=1,tun_ipv6_dst=2001:192:168:222::64,actions=resubmit(,3)" > > ovs-ofctl add-flow br-int "table=2,tun_dst=8.8.8.7,actions=resubmit(,4)" > > ovs-ofctl add-flow br-int > "table=3,actions=set_field:8.8.8.7->tun_src,set_field:8.8.8.8->tun_dst,set_field:43->tun_id,geneve_br-int_1" > > ovs-ofctl add-flow br-int > "table=4,actions=set_field:2001:192:168:222::64->tun_ipv6_src,set_field:2001:192:168:222::65->tun_ipv6_dst,set_field:0.0.0.0->tun_src,set_field:0.0.0.0->tun_dst,set_field:42->tun_id,vxlan_br-int_0" > > Now every match lives in its own table. You don't really need tables 1 and 2, matches from them can probably be moved to 3 and 4. > > Alternatively, we have another proposal: Introduce tunnel eth-type field. > > With this we have a robust way to know if the match should be on IPv4 or > IPv6. > > This also enables us to clear the code from those "bad" commits we have. > > This might also enable us to change flow_tnl_{src/dst}_is_set functions > (that currently assumes zeros means not set, e.g. your counter example). > > Note this approach has limited scope: it only addresses IP version > disambiguation. It wouldn't be helpful for anything else like stack pop etc. > > What do you think? Having such a filed on OpenFlow level will imply that both v4 and v6 fileds cannot be set at the same time, which is not how it works today and will be a backward incompatible change. It will also need to become a dependency for those fields if we want to make sure that it is always matched whenever any of the addresses matched. Which will also not be backward compatible. We may need a new set of fields in this case, which will be hard to support. So, while the tunnel eth-type might make sense from the datapath perspective, it's complicated from the perspective of the OpenFlow pipeline, as all fields can be populated at the same time today. The current design of these fields is obviously not great, but it's also the reason why it's not easy to change it, unfortunately. Can you match on tunnel id instead? If all the tunnels are keyed (vxlan and geneve always are), and you know the ids, and if those ids are unique, then you may never need to match on the tunnel addresses at all. In combination with not unwildcarding bits on actions, this should give a desired result. Best regards, Ilya Maximets. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev