On 13/09/2025 2:26, Ilya Maximets wrote: > External email: Use caution opening links or attachments > > > On 9/9/25 3:56 PM, Reuven Plevinsky wrote: >> On 08/09/2025 15:24, Ilya Maximets wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> On 9/7/25 1:33 PM, Eli Britstein wrote: >>>>> -----Original Message----- >>>>> From: Ilya Maximets <i.maxim...@ovn.org> >>>>> Sent: Thursday, 21 August 2025 13:32 >>>>> To: Reuven Plevinsky <rplevin...@nvidia.com>; d...@openvswitch.org >>>>> Cc: i.maxim...@ovn.org; Eli Britstein <el...@nvidia.com>; Maor Dickman >>>>> <ma...@nvidia.com> >>>>> Subject: Re: [ovs-dev] [PATCH v2 1/1] ofproto-dpif-xlate: Clear tunnel >>>>> wildcard >>>>> bits properly. >>>>> >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> On 8/19/25 11:01 AM, Reuven Plevinsky wrote: >>>>>> On 06/08/2025 14:26, Ilya Maximets wrote: >>>>>>> External email: Use caution opening links or attachments >>>>>>> >>>>>>> >>>>>>> On 8/4/25 4:29 PM, Reuven Plevinsky via dev wrote: >>>>>>>> For example, consider the following OpenFlow rule: >>>>>>>> - IPv6 -> IPv4 tunnel translation: >>>>>>>> add-flow br-int 'in_port=vxlan_br-int_0, \ >>>>>>>> actions=set_field:8.8.8.7->tun_src, \ >>>>>>>> set_field:8.8.8.8->tun_dst,set_field:43->tun_id,geneve_br-int_1' >>>>>>>> >>>>>>>> As any set_field action, flow wildcards are set to the tun_src/dst. >>>>>>>> The offload layer fails if not all matches are handled ("consumed"). >>>>>>>> Clear the irrelevant wildcards. >>>>>>>> >>>>>>>> Signed-off-by: Reuven Plevinsky <rplevin...@nvidia.com> >>>>>>>> --- >>>>>>>> ofproto/ofproto-dpif-xlate.c | 17 +++++++++++++ >>>>>>>> tests/tunnel-push-pop.at | 48 >>>>> ++++++++++++++++++++++++++++++++++++ >>>>>>>> 2 files changed, 65 insertions(+) >>>>>>>> >>>>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c >>>>>>>> b/ofproto/ofproto-dpif-xlate.c index 2c8197fb7307..e2ba7e6b0250 >>>>>>>> 100644 >>>>>>>> --- a/ofproto/ofproto-dpif-xlate.c >>>>>>>> +++ b/ofproto/ofproto-dpif-xlate.c >>>>>>>> @@ -8078,6 +8078,20 @@ xlate_wc_init(struct xlate_ctx *ctx) >>>>>>>> tnl_wc_init(&ctx->xin->flow, ctx->wc); >>>>>>>> } >>>>>>>> >>>>>>>> +static void >>>>>>>> +clear_tunnel_wc(const struct flow_tnl *spec, struct flow_tnl *mask) >>>>>>>> +{ >>>>>>>> + if (!spec->ip_src && !spec->ip_dst) { >>>>>>>> + mask->ip_src = 0; >>>>>>>> + mask->ip_dst = 0; >>>>>>>> + } >>>>>>>> + if (!ipv6_addr_is_set(&spec->ipv6_src) && >>>>>>>> + !ipv6_addr_is_set(&spec->ipv6_dst)) { >>>>>>>> + mask->ipv6_src = in6addr_any; >>>>>>>> + mask->ipv6_dst = in6addr_any; >>>>>>>> + } >>>>>>>> +} >>>>>>>> + >>>>>>>> static void >>>>>>>> xlate_wc_finish(struct xlate_ctx *ctx) >>>>>>>> { >>>>>>>> @@ -8152,6 +8166,9 @@ xlate_wc_finish(struct xlate_ctx *ctx) >>>>>>>> if (!flow_tnl_dst_is_set(&ctx->xin->upcall_flow->tunnel)) { >>>>>>>> memset(&ctx->wc->masks.tunnel, 0, sizeof >>>>>>>> ctx->wc->masks.tunnel); >>>>>>>> } >>>>>>>> + /* Both IPv4/IPv6 tunnel wc may be set because of set_fields >>>>>>>> action. >>>>>>>> + * Clear the irrelevant one. */ >>>>>>>> + clear_tunnel_wc(&ctx->xin->upcall_flow->tunnel, >>>>>>>> + &ctx->wc->masks.tunnel); >>>>>>>> } >>>>>>>> >>>>>>>> /* This will tweak the odp actions generated. For now, it will: >>>>>>>> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at >>>>>>>> index 64c41460bbb2..9b46d249fe9f 100644 >>>>>>>> --- a/tests/tunnel-push-pop.at >>>>>>>> +++ b/tests/tunnel-push-pop.at >>>>>>>> @@ -1440,3 +1440,51 @@ AT_CHECK([tail -1 stdout], [0], >>>>>>>> >>>>>>>> OVS_VSWITCHD_STOP >>>>>>>> AT_CLEANUP >>>>>>>> + >>>>>>>> +AT_SETUP([tunnel - decap and re-encap with outer IP version >>>>>>>> +conversion]) OVS_VSWITCHD_START([add-port br0 p0 -- set Interface >>>>>>>> +p0 type=dummy]) AT_CHECK([ovs-vsctl add-br int-br -- set bridge >>>>>>>> +int-br datapath_type=dummy], [0]) AT_CHECK([ovs-vsctl add-port int-br >>>>> t1 -- set Interface t1 type=vxlan \ >>>>>>>> + options:key=flow options:remote_ip=flow \ >>>>>>>> + -- add-port int-br t2 -- set Interface t2 >>>>>>>> type=geneve \ >>>>>>>> + options:key=flow options:remote_ip=flow \ >>>>>>>> + ], [0]) >>>>>>>> + >>>>>>>> +AT_CHECK([ovs-ofctl add-flow br0 action=normal]) dnl Setup dummy >>>>>>>> +interface IP addresses. >>>>>>>> +AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0 1.1.1.1/24], [0], [OK >>>>>>>> +]) >>>>>>>> +AT_CHECK([ovs-appctl netdev-dummy/ip6addr br0 2001:cafe::1/24], >>>>>>>> +[0], [OK >>>>>>>> +]) >>>>>>>> +dnl Checking that a local routes for added IPs were successfully >>>>>>>> installed. >>>>>>>> +AT_CHECK([ovs-appctl ovs/route/show | grep Cached | sort], [0], >>>>>>>> +[dnl >>>>>>>> +Cached: 1.1.1.0/24 dev br0 SRC 1.1.1.1 local >>>>>>>> +Cached: 2001:ca00::/24 dev br0 SRC 2001:cafe::1 local >>>>>>>> +]) >>>>>>>> +dnl Add entries >>>>>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 1.1.1.2 aa:bb:cc:00:00:01], >>>>>>>> +[0], [OK >>>>>>>> +]) >>>>>>>> +AT_CHECK([ovs-appctl tnl/neigh/set br0 2001:cafe::2 >>>>>>>> +aa:bb:cc:00:00:01], [0], [OK >>>>>>>> +]) >>>>>>>> +AT_CHECK([ovs-appctl tnl/neigh/show | grep br0 | sort], [0], [dnl >>>>>>>> +1.1.1.2 aa:bb:cc:00:00:01 br0 >>>>>>>> +2001:cafe::2 aa:bb:cc:00:00:01 br0 >>>>>>>> +]) >>>>>>>> + >>>>>>>> +AT_CHECK([ovs-ofctl add-flow int-br >>>>>>>> +'in_port=t1,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=2001:cafe::2,tun >>>>>>>> +_id=0x7a,actions=set_field:1.1.1.1->tun_src,set_field:1.1.1.2->tun_ >>>>>>>> +dst,set_field:0x7b->tun_id,t2']) AT_CHECK([ovs-ofctl add-flow >>>>>>>> +int-br >>>>>>>> +'in_port=t2,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_id=0x7b,actions=set >>>>>>>> +_field:2001:cafe::1->tun_ipv6_src,set_field:2001:cafe::2->tun_ipv6_ >>>>>>>> +dst,set_field:0x7a->tun_id,set_field:0.0.0.0->tun_src,set_field:0.0 >>>>>>>> +.0.0->tun_dst,t1']) >>>>>>>> + >>>>>>>> +AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy >>>>>>>> +'tunnel(tun_id=0x7a,ipv6_src=2001:cafe::1,ipv6_dst=2001:cafe::2,ttl >>>>>>>> +=64),in_port(4789)'], [0], [stdout]) AT_CHECK([tail -2 stdout], >>>>>>>> +[0], >>>>>>>> + [Megaflow: >>>>>>>> +recirc_id=0,eth,tun_id=0x7a,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst= >>>>>>>> +2001:cafe::2,tun_tos=0,tun_flags=-df-csum+key,in_port=t1,dl_type=0x >>>>>>>> +05ff Datapath actions: >>>>>>>> +tnl_push(tnl_port(genev_sys_6081),header(size=50,type=5,eth(dst=aa: >>>>>>>> +bb:cc:00:00:01,src=aa:55:aa:55:00:00,dl_type=0x0800),ipv4(src=1.1.1 >>>>>>>> +.1,dst=1.1.1.2,proto=17,tos=0,ttl=64,frag=0x4000),udp(src=0,dst=608 >>>>>>>> +1,csum=0x0),geneve(vni=0x7b)),out_port(br0)),p0 >>>>>>>> +]) >>>>>>>> + >>>>>>>> +AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy >>>>>>>> +'tunnel(tun_id=0x7b,src=1.1.1.1,dst=1.1.1.2,ttl=64),in_port(6081)'] >>>>>>>> +, [0], [stdout]) AT_CHECK([tail -2 stdout], [0], >>>>>>>> + [Megaflow: >>>>>>>> +recirc_id=0,eth,tun_id=0x7b,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_tos >>>>>>>> +=0,tun_flags=-df-csum+key,in_port=t2,dl_type=0x05ff >>>>>>>> +Datapath actions: >>>>>>>> +tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,eth(dst=aa: >>>>>>>> +bb:cc:00:00:01,src=aa:55:aa:55:00:00,dl_type=0x86dd),ipv6(src=2001: >>>>>>>> +cafe::1,dst=2001:cafe::2,label=0,proto=17,tclass=0x0,hlimit=64),udp >>>>>>>> +(src=0,dst=4789,csum=0xffff),vxlan(flags=0x8000000,vni=0x7a)),out_p >>>>>>>> +ort(br0)),p0 >>>>>>>> +]) >>>>>>>> + >>>>>>>> +OVS_VSWITCHD_STOP >>>>>>>> +AT_CLEANUP >>>>>>> Hi, Reuven. Thanks for the update and the test. I see what you're >>>>>>> trying to achive now. >>>>>>> >>>>>>> However, I don't think this is a right approach as we can't actually >>>>>>> clear tunnel metadata at xlate_wc_finish() stage. This is because we >>>>>>> must have prerequisites unwildcarded in order to clear the masks that >>>>>>> we think are unnecessary. And tunnel metadata doesn't have any >>>>>>> prerequisites, so there is no way to tell if it is necessary to match >>>>>>> on or not. The main problem is a match on empty metadata that the >>>>>>> user can add into OpenFlow rules. We must never clear those as it >>>>>>> may be necessary to distinguish tunneled and non-tunneled packets. >>>>>>> E.g. match on all-zero tun_dst signifies that the packet is not >>>>>>> coming from an ipv4 tunnel. Note that input port match is not enough >>>>>>> as a prerequisite as controller can inject packets from any port. >>>>>>> >>>>>>> For example, if you add the following change to your test: >>>>>>> >>>>>>> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at >>>>>>> index 9b46d249f..1d19e7e5b 100644 >>>>>>> --- a/tests/tunnel-push-pop.at >>>>>>> +++ b/tests/tunnel-push-pop.at >>>>>>> @@ -1473,6 +1473,13 @@ AT_CHECK([ovs-appctl tnl/neigh/show | grep >>>>> br0 >>>>>>> | sort], [0], [dnl >>>>>>> >>>>>>> AT_CHECK([ovs-ofctl add-flow int-br >>>>> 'in_port=t1,tun_ipv6_src=2001:cafe::1,tun_ipv6_dst=2001:cafe::2,tun_id=0x7a >>>>> ,actions=set_field:1.1.1.1->tun_src,set_field:1.1.1.2->tun_dst,set_field:0x7b- >>>>>> tun_id,t2']) >>>>>>> AT_CHECK([ovs-ofctl add-flow int-br >>>>>>> 'in_port=t2,tun_src=1.1.1.1,tun_dst=1.1.1.2,tun_id=0x7b,actions=set_f >>>>>>> ield:2001:cafe::1->tun_ipv6_src,set_field:2001:cafe::2->tun_ipv6_dst, >>>>>>> set_field:0x7a->tun_id,set_field:0.0.0.0->tun_src,set_field:0.0.0.0-> >>>>>>> tun_dst,t1']) >>>>>>> +AT_CHECK([ovs-ofctl add-flow int-br >>>>>>> +'tun_dst=0.0.0.0,tun_ipv6_src=::,actions=drop']) >>>>>>> + >>>>>>> +AT_CHECK([ovs-appctl ofproto/trace --names int-br in_port=t1], [0], >>>>>>> +[stdout]) AT_CHECK([tail -2 stdout], [0], >>>>>>> + [Megaflow: >>>>>>> +recirc_id=0,eth,tun_dst=0.0.0.0,tun_ipv6_src=::,in_port=t1,dl_type=0 >>>>>>> +x0000 >>>>>>> +Datapath actions: drop >>>>>>> +]) >>>>>>> >>>>>>> AT_CHECK([ovs-appctl ofproto/trace --names ovs-dummy >>>>> 'tunnel(tun_id=0x7a,ipv6_src=2001:cafe::1,ipv6_dst=2001:cafe::2,ttl=64),in_p >>>>> ort(4789)'], [0], [stdout]) >>>>>>> AT_CHECK([tail -2 stdout], [0], >>>>>>> --- >>>>>>> >>>>>>> The test will fail and the actual datapath flow will look like this: >>>>>>> >>>>>>> Megaflow: recirc_id=0,eth,in_port=t1,dl_type=0x0000 >>>>>>> >>>>>>> Such a flow will drop all the traffic and not just non-tunneled one >>>>>>> as it was intended by the OpenFlow rules. >>>>>>> >>>>>>> This issue actually already exist for a while and need to also revert >>>>>>> the previous commit: >>>>>>> d046453b56a9 ("ofproto-dpif-xlate: Clear tunnel wc bits if >>>>>>> original packet is non-tunnel.") Though, I think, we need a better >>>>>>> solution for this before reverting as conditions where it would be a >>>>>>> problem >>>>> are kind of unlikely. >>>>>>> A proper solution would be to avoid unwildcarding these fields while >>>>>>> setting the metadata through varois OpenFlow actions (set_field, >>>>>>> load, stack pop, etc.) but preserve the masks coming from the >>>>>>> classifier. >>>>>>> The reason why we can actually avoid unwildcarding while setting the >>>>>>> fileds is that unlike other fileds, tunnel metadata is not a subject >>>>>>> for the optimization where we skip datapath actions if the value is >>>>>>> already the same. So, there is no real reason to unwildcard them. >>>>>>> >>>>>>> Do you want to try and work on this kind of change? >>>>>>> >>>>>>> BTW, I didn't test, but it seems like IPV4/IPv6 mix is a problem not >>>>>>> only for the hardware offloading code, but also for the kernel >>>>>>> datapath in general as it will reject to install the flow with mixed >>>>>>> metadata types. Userspace works fine in that case though. So, >>>>>>> technicaly, we also need to probe the datapath implementation for >>>>>>> support and slowpath such traffic, if it doesn't, i.e. set the >>>>>>> SLOW_MATCH >>>>> flag. >>>>>>> Best regards, Ilya Maximets. >>>>>> Hi Ilya, >>>>>> >>>>>> Thanks a lot for your review and comments. Since this issue is >>>>>> specific to offload, and is not supported by the current upstream DPDK >>>>>> implementation (while it is supported by our DOCA implementation), >>>>> It should be supported by TC offload implementation, AFAIU. It also may >>>>> be a >>>>> problem for the kernel datapath without offload as I mentioned above. >>>>> >>>>>> we’ve >>>>>> decided to take a different approach and handle it within the DOCA >>>>>> offload provider instead. For now, we will abandon this change. >>>>> The same as we can't simply clear these bits at the wc_finish stage, the >>>>> datapath or the offload implementaion can't make this decision either >>>>> without >>>>> breaking user-specified OpenFlow logic. >>>> Hi Ilya, >>>> Could you please elaborate on your concern about a potential breakage? >>>> Could you please provide an example? >>> See the test case I provided in my previous reply above. It shows that flow >>> with an overly broad match is getting generated and it will match traffic >>> that was not intended to be matched by the provided set of OpenFlow rules. >>> >>>> I'd like to refer you to 2 other commits that are doing something like what >>>> we want to do: >>>> 262eded5fbd7 ("netdev-offload-tc: Fix ignoring unknown tunnel keys.") >>> This was a temporary workaround and it is now removed for kernels that >>> support matching on tunnel flags. You can see a lot of "This is wrong!" >>> comments in this change. >>> >>> The commit message of this patch also describes other issues like such >>> flows will be frequently removed and re-added during revalidation, i.e. >>> un-offloaded and offlaoded back, because ofproto layer will see the overly >>> broad match and try to re-generate the narrower one on every revalidation >>> cycle. >>> >>>> 0ef70536eb41 ("netdev-offload-dpdk: Support vxlan encap offload with load >>>> actions.") >>> This only affects the case where the original packet wasn't from a tunnel, >>> which is not a case here. You're trying to solve the case where the packet >>> is actually from a tunnel. And we'll need to revert 0ef70536eb41 along with >>> d046453b56a9 when we have a proper solution that doesn't unmask tunnel >>> metadata fields on 'set'-style actions, but preserves values comming from >>> the classifier, as both of these commits are practically the same and so >>> have the same issue with potentially installing overly broad datapath flows. >>> (I provided reasoning on why we can avoid unwildcarding on actions earlier >>> in this thread.) >>> >>> All in all, the code on current main is not really correct. But the point >>> is to avoid making it even less correct. Instead, work towards a proper >>> solution and revert all the incorrect parts. >>> >>> Best regards, Ilya Maximets. >> Hi Ilya, >> >> I tried to follow your proposal: >> >> Added a new function mf_is_payload_field() that categorizes OpenFlow >> match fields into two types: >> >> Non-payload fields: Metadata, pipeline state, tunnel info, registers, >> etc. (returns false). >> >> Payload fields: Actual packet header fields like Ethernet, IP, TCP, UDP, >> etc. (returns true). > I'm not sure all of these are safe. Should be limited to tunnel metadata > for now, I think. Most of non-packet fields are getting cleared anyway, > there is no need for extra checking. > >> Modified wildcard mask behavior in three places: >> >> mf_mask_field_masked() - only applies masks to payload fields. >> >> nxm_reg_load() - only writes wildcard masks for payload fields. >> >> nxm_execute_stack_push/pop() - only writes wildcard masks for payload >> fields. >> >> The problem with this solution, for example in the following flows: >> >> ovs-ofctl add-flow br-int >> 'in_port=vxlan_br-int_0,tun_ipv6_dst=2001:192:168:222::64,actions=set_field:8.8.8.7->tun_src,set_field:8.8.8.8->tun_dst,set_field:43->tun_id,geneve_br-int_1' >> >> ovs-ofctl add-flow br-int >> 'in_port=geneve_br-int_1,tun_dst=8.8.8.7,actions=set_field:2001:192:168:222::64->tun_ipv6_src,set_field:2001:192:168:222::65->tun_ipv6_dst,set_field:0.0.0.0->tun_src,set_field:0.0.0.0->tun_dst,set_field:42->tun_id,vxlan_br-int_0' >> >> The classifier masks both IP versions, as looked up in all subtables. As >> we "preserve the masks coming from the classifier", the issue still exists. > Yes, as long as we actually have the matches on both in the same table, > we can't guarantee that both will not be matched by the classifier. > It all goes back to detecting that the packet doesn't match the rule. > We can only detect that if we match on the fields being different from > any of the rules. And these two rules are not mutually exclusive. > >> There is a workaround for this problem, for the above example: >> >> ovs-ofctl add-flow br-int >> "table=0,in_port=vxlan_br-int_0,actions=resubmit(,1)" >> >> ovs-ofctl add-flow br-int >> "table=0,in_port=geneve_br-int_1,actions=resubmit(,2)" >> >> ovs-ofctl add-flow br-int >> "table=1,tun_ipv6_dst=2001:192:168:222::64,actions=resubmit(,3)" >> >> ovs-ofctl add-flow br-int "table=2,tun_dst=8.8.8.7,actions=resubmit(,4)" >> >> ovs-ofctl add-flow br-int >> "table=3,actions=set_field:8.8.8.7->tun_src,set_field:8.8.8.8->tun_dst,set_field:43->tun_id,geneve_br-int_1" >> >> ovs-ofctl add-flow br-int >> "table=4,actions=set_field:2001:192:168:222::64->tun_ipv6_src,set_field:2001:192:168:222::65->tun_ipv6_dst,set_field:0.0.0.0->tun_src,set_field:0.0.0.0->tun_dst,set_field:42->tun_id,vxlan_br-int_0" >> >> Now every match lives in its own table. > You don't really need tables 1 and 2, matches from them can probably be > moved to 3 and 4. > >> Alternatively, we have another proposal: Introduce tunnel eth-type field. >> >> With this we have a robust way to know if the match should be on IPv4 or >> IPv6. >> >> This also enables us to clear the code from those "bad" commits we have. >> >> This might also enable us to change flow_tnl_{src/dst}_is_set functions >> (that currently assumes zeros means not set, e.g. your counter example). >> >> Note this approach has limited scope: it only addresses IP version >> disambiguation. It wouldn't be helpful for anything else like stack pop etc. >> >> What do you think? > Having such a filed on OpenFlow level will imply that both v4 and v6 fileds > cannot be set at the same time, which is not how it works today and will > be a backward incompatible change. It will also need to become a dependency > for those fields if we want to make sure that it is always matched whenever > any of the addresses matched. Which will also not be backward compatible. > We may need a new set of fields in this case, which will be hard to support. > > So, while the tunnel eth-type might make sense from the datapath perspective, > it's complicated from the perspective of the OpenFlow pipeline, as all fields > can be populated at the same time today. The current design of these fields > is obviously not great, but it's also the reason why it's not easy to change > it, unfortunately. > > Can you match on tunnel id instead? If all the tunnels are keyed (vxlan and > geneve always are), and you know the ids, and if those ids are unique, then > you may never need to match on the tunnel addresses at all. In combination > with not unwildcarding bits on actions, this should give a desired result. > > Best regards, Ilya Maximets.
Hi Ilya, Thanks again for your feedback. I uploaded a draft of the proposed change here: https://mail.openvswitch.org/pipermail/ovs-dev/2025-September/426299.html In that message I also addressed your concerns in more detail. Best, Reuven _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev