On Wed, Feb 14, 2024 at 12:09 PM David Marchand
<[email protected]> wrote:
>
> Hello Mike,
>
> On Mon, Feb 12, 2024 at 8:50 PM Mike Pattrick <[email protected]> wrote:
> >
> > Previously if an OVS configuration nested multiple layers of UDP tunnels
> > like VXLAN or GENEVE on top of each other through netdev-linux
> > interfaces, the vnet header would be incorrectly set to the outermost
> > UDP tunnel layer instead of the intermediary tunnel layer.
> >
> > This resulted in the middle UDP tunnel not checksum offloading properly.
> >
> > Fixes: 3337e6d91c5b ("userspace: Enable L4 checksum offloading by default.")
> > Reported-by: David Marchand <[email protected]>
> > Signed-off-by: Mike Pattrick <[email protected]>
>
> I have some trouble relating this patch to the issue I faced :-).
> Could you detail a test that shows the issue you fix here?

I made a slight modification to the test suite you provided adding a
UDP traffic test to your test; which unveiled this issue. But you are
correct, I got so distracted by one issue that I missed other issues.

>
> After applying (only this patch), I still reproduce an issue with
> inner checksums.
> As I reported this issue to you offlist, let me put the details in public 
> here.
>
> I wrote a system-traffic.at unit test that stacks 3 vxlan tunnels
> (separate topic, but for the context, my goal was to stress DPDK
> dp-packets wrt headroom).
> If I try this unit test before commit 084c8087292c ("userspace:
> Support VXLAN and GENEVE TSO."), I have no issue.
>
> The topology is as follows:
> ##############################################################
> #
> # at_ns0                    . init_net
> #                           .
> # at_vxlan1 (10.1.1.1/24)   . br0 (10.1.1.100/24)
> # (remote 172.31.1.100)     . |
> #                           . at_vxlan0
> #                           . (remote 172.31.1.1)
> #                           .
> # at_vxlan3 (172.31.1.1/24) . br-underlay0 (172.31.1.100/24)
> # (remote 172.31.2.100)     . |
> #                           . at_vxlan2
> #                           . (remote 172.31.2.1)
> #                           .
> # at_vxlan5 (172.31.2.1/24) . br-underlay1 (172.31.2.100/24)
> # (remote 172.31.3.100)     . |
> #                           . at_vxlan4
> #                           . (remote 172.31.3.1)
> #                           .
> # p0 (172.31.3.1/24)        . br-underlay2 (172.31.3.100/24)
> # |                         . |
> # \-------------------------.-ovs-p0
> #
> ##############################################################
>
> (gmail will probably bust this copy/paste, so putting a link to the
> actual test: 
> https://github.com/david-marchand/ovs/commit/manyvxlan~2#diff-45a77f85f9679bc66ac97300392c0d5d9f5c53264fa8a82d735a553246e71faeR400)
>
> With this setup, I try to ping, from at_ns0 netns, the ip address of
> the br tap iface plugged with the other side of each tunnel:
>
> - Most outter level, no encapsulation, all good:
> 16:24:51.590966 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4
> (0x0800), length 98: (tos 0x0, ttl 64, id 63550, offset 0, flags [DF],
> proto ICMP (1), length 84)
>     172.31.3.1 > 172.31.3.100: ICMP echo request, id 26707, seq 1, length 64
>
> 16:24:51.591084 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4
> (0x0800), length 98: (tos 0x0, ttl 64, id 28720, offset 0, flags
> [none], proto ICMP (1), length 84)
>     172.31.3.100 > 172.31.3.1: ICMP echo reply, id 26707, seq 1, length 64
>
> - One tunnel encap all good:
> 16:24:54.140629 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4
> (0x0800), length 148: (tos 0x0, ttl 64, id 61052, offset 0, flags
> [none], proto UDP (17), length 134)
>     172.31.3.1.36831 > 172.31.3.100.vxlan: [udp sum ok] VXLAN, flags
> [I] (0x08), vni 0
> 1e:db:ec:e5:28:6d > 9a:39:be:e8:18:4b, ethertype IPv4 (0x0800), length
> 98: (tos 0x0, ttl 64, id 54399, offset 0, flags [DF], proto ICMP (1),
> length 84)
>     172.31.2.1 > 172.31.2.100: ICMP echo request, id 51488, seq 1, length 64
>
> 16:24:54.140772 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4
> (0x0800), length 148: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> proto UDP (17), length 134)
>     172.31.3.100.39912 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I]
> (0x08), vni 0
> 9a:39:be:e8:18:4b > 1e:db:ec:e5:28:6d, ethertype IPv4 (0x0800), length
> 98: (tos 0x0, ttl 64, id 29701, offset 0, flags [none], proto ICMP
> (1), length 84)
>     172.31.2.100 > 172.31.2.1: ICMP echo reply, id 51488, seq 1, length 64
>
> - Two tunnels encap:
> 16:24:58.578900 a6:0a:bf:e2:f3:f2 > 82:cf:78:de:ed:46, ethertype IPv4
> (0x0800), length 142: (tos 0x0, ttl 64, id 61719, offset 0, flags
> [none], proto UDP (17), length 128)
>     172.31.3.1.50673 > 172.31.3.100.vxlan: [udp sum ok] VXLAN, flags
> [I] (0x08), vni 0
> 1e:db:ec:e5:28:6d > 9a:39:be:e8:18:4b, ethertype IPv4 (0x0800), length
> 92: (tos 0x0, ttl 64, id 35175, offset 0, flags [none], proto UDP
> (17), length 78)
>     172.31.2.1.44060 > 172.31.2.100.vxlan: [udp sum ok] VXLAN, flags
> [I] (0x08), vni 1
> 62:53:3f:82:da:56 > Broadcast, ethertype ARP (0x0806), length 42:
> Ethernet (len 6), IPv4 (len 4), Request who-has 172.31.1.100 tell
> 172.31.1.1, length 28
>
> 16:24:58.579021 82:cf:78:de:ed:46 > a6:0a:bf:e2:f3:f2, ethertype IPv4
> (0x0800), length 142: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> proto UDP (17), length 128)
>     172.31.3.100.56325 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I]
> (0x08), vni 0
> 9a:39:be:e8:18:4b > 1e:db:ec:e5:28:6d, ethertype IPv4 (0x0800), length
> 92: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17),
> length 78, bad cksum 0 (->ddfb)!)
>     172.31.2.100.56325 > 172.31.2.1.vxlan: [no cksum] VXLAN, flags [I]
> (0x08), vni 1
> 26:8d:e7:0e:41:45 > 62:53:3f:82:da:56, ethertype ARP (0x0806), length
> 42: Ethernet (len 6), IPv4 (len 4), Reply 172.31.1.100 is-at
> 26:8d:e7:0e:41:45, length 28
>
> In this last packet capture, the inner vxlan tunnel header has wrong
> (well no) IP checksum.
>
> If I set the tunnel endpoint neighbors in each netns, the same issue
> can be seen for icmp encapsulated traffic:
>
> 17:41:39.303170 32:a4:65:eb:df:49 > a6:0a:bf:e2:f3:f2, ethertype IPv4
> (0x0800), length 198: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
> proto UDP (17), length 184)
>     172.31.3.100.59550 > 172.31.3.1.vxlan: [no cksum] VXLAN, flags [I]
> (0x08), vni 0
> fe:72:7b:39:24:49 > 7e:9d:b3:a5:a2:f8, ethertype IPv4 (0x0800), length
> 148: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17),
> length 134, bad cksum 0 (->ddc3)!)
>     172.31.2.100.59550 > 172.31.2.1.vxlan: [no cksum] VXLAN, flags [I]
> (0x08), vni 1
> 56:21:48:f3:f3:42 > 72:64:65:f9:38:0a, ethertype IPv4 (0x0800), length
> 98: (tos 0x0, ttl 64, id 49609, offset 0, flags [none], proto ICMP
> (1), length 84)
>     172.31.1.100 > 172.31.1.1: ICMP echo reply, id 35387, seq 1, length 64
>
>
> Running in GHA:
> https://github.com/david-marchand/ovs/actions/runs/7903675971/job/21572151269
>
> Afaics, this issue is not reproduced when using the kernel datapath
> (check-kernel) but it is seen in GHA for check-system-userspace /
> check-dpdk / check-afxdp.
> I can reproduce those locally.
>
> There is also a failure with check-offloads in GHA which I can't
> reproduce locally (for the latter, it may be a kernel related issue
> (and I think there were some known issues..)).
>
>
> Now, if I apply below diff, my issue seems to go away for
> check-system-userspace / check-dpdk / check-afxdp.
> $ git diff -b HEAD^
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index 0d6d803fe4..7f3e2c0ae4 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -280,6 +280,7 @@ dp_packet_tnl_ol_process(struct dp_packet *packet,
>                  dp_packet_hwol_set_tx_ipv6(packet);
>              }
>          }
> +    }
>
>      /* Attention please, tunnel inner l2 len is consist of udp header
>       * len and tunnel header len and inner l2 len. */
> @@ -299,7 +300,6 @@ dp_packet_tnl_ol_process(struct dp_packet *packet,
>                                       (char *) dp_packet_eth(packet) +
>                                       VXLAN_HLEN);
>      }
> -    }
>  }
>
>  void
>
> But IPv6 tso tests have some failures, and some nsh test fails too
> regardless of which datapath is used, so this is not a right fix.

One of the problems here is when the tunnel offload flag is set, we
set dp_packet_hwol_set_tx_outer_ipv4_csum() instead of
dp_packet_hwol_set_tx_ipv4_csum(). But when modifying IP addresses, we
check for dp_packet_hwol_tx_ip_csum() not the outer ip csum flag.
Likewise, in netdev_tnl_ip_extract_tnl_md we also only check for ip
csum, not the outer flag.

Since dp_packet_l3*() functions always refer to the outermost packet
but the ipv4_csum() functions refer to the inner packet there is some
tension here, and some incorrect behaviour.

I've made a branch where we properly account for outer and inner
checksums, and it passes the tests mostly, except for afxdp.

For afxdp we crash in dp_packet_prealloc_headroom(). netdev-afxdp has
a hardcoded OVS_XDP_HEADROOM=128 bytes and the multiple layers of
tunneling exceeds that. I ran a test where I set this to 256 and the
test passes, but that seems like a non-ideal solution. We probably
shouldn't abort() in dp_packet_resize(), as it could be possible to
accidentally run into this.

Dropping the packet is probably preferable IMO, but that is also a
very large change, as none of the calling functions have return codes
themselves and some of the 2rd degree call backs don't either, so many
functions will need to change.

You can see the branch here: https://github.com/mkp-rh/ovs/tree/multitun
And the test run here: https://github.com/mkp-rh/ovs/actions/runs/7911539363

I'll clean up this a bit and address some of the other things
mentioned, like the incorrect Fixes tag.


Cheers,
M

>
> Actual commit:
> https://github.com/david-marchand/ovs/commit/manyvxlan
> https://github.com/david-marchand/ovs/actions/runs/7904287294
>
>
>
> --
> David Marchand
>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to