Re: Mixed MTU hosts on a network
Hey Roman,

I've just tried a few ways of replicating your setup, and I can't seem to reproduce the bug, either with the new code or the old. The results you mention are surprising too, since, WireGuard or not, TCP is supposed to negotiate the lowest common MSS. I wonder if some strange iptables rules are getting in the way and confusing things?

Jason
___
WireGuard mailing list
WireGuard@lists.zx2c4.com
https://lists.zx2c4.com/mailman/listinfo/wireguard
Re: Mixed MTU hosts on a network
On Sat, 14 Apr 2018 16:45:32 +0200 "Jason A. Donenfeld" wrote:

> In this case, WireGuard seems to be doing the right thing. Think you
> could come up with some minimal test that exhibits the behavior you're
> seeing?

I now remember in more detail what the problem was. It was not with MTU 1412 on both sides; it happened when I tried to mix WG MTU 1412 on the PPPoE-connected machine with WG MTU 1420 on the other side (which uses the full 1500 underlying MTU). I posted about it here, with some tcpdumps included:
https://lists.zx2c4.com/pipermail/wireguard/2018-March/002537.html

With 1420 on the "full MTU" side, the "PPPoE" side had to set a WG MTU of 1408 for things to work properly, not 1412 as would theoretically fit into its PPPoE link.

I'll post an update if I come up with a short and simple reproducer sequence. From more testing just now, setting 1412 on both sides seems to work fine.

--
With respect,
Roman
Re: Mixed MTU hosts on a network
Hi Roman,

That's strange; I'm unable to reproduce what you've described:

[+] NS1: ip link set wg0 mtu 1412
[+] NS2: ip link set wg0 mtu 1412
[+] NS1: wg set wg0 peer QXloTaPOwUTzqFElVLSD0vBc4sxjyoKtPBSaTkZHokY= endpoint 127.0.0.1:2
[+] NS2: wg set wg0 peer X0p7+UWc4wjaAmT73xAEuXLY80I6Gv8vTg6KwFHCPGs= endpoint 127.0.0.1:1
[+] NS0: iptables -A INPUT -m length --length 1473 -j DROP
[+] NS2: ping -c 1 -W 1 -s 1384 192.168.241.1
PING 192.168.241.1 (192.168.241.1) 1384(1412) bytes of data.
1392 bytes from 192.168.241.1: icmp_seq=1 ttl=64 time=0.752 ms

--- 192.168.241.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.752/0.752/0.752/0.000 ms

In this case, WireGuard seems to be doing the right thing. Think you could come up with some minimal test that exhibits the behavior you're seeing?

Jason
Re: Mixed MTU hosts on a network
On Sat, 14 Apr 2018 16:15:07 +0200 "Jason A. Donenfeld" wrote:

> Hi Roman,
>
> I answered this in my first email to you, which perhaps got lost in
> the mix of emails, so I'll quote the relevant part:
>
> > 2) When we pad the packet payload. In this case, we pad it to the
> > nearest multiple of 16, but we don't let it exceed the device MTU.
> > This is skb_padding in send.c. This behavior seems like the bug in
> > your particular case, since what matters here is the route's MTU, not
> > the device MTU. For full 1412 size packets, the payload is presumably
> > being padded to 1424, since that's still less than the device MTU. In
> > order to test this theory, try setting your route MTU, as you've
> > described in your first email, to 1408 (which is a multiple of 16). If
> > this works, let me know, as it will be good motivation for fixing
> > skb_padding. If not, then it means there's a problem elsewhere to
> > investigate too.
>
> In short, because 1408 is a multiple of 16 so it didn't get rounded
> up, whereas 1412 got rounded up to 1424.

I got that, but it still seemed to be about the problem with route MTUs. What if I don't touch any route MTUs at all, but set the WG device MTU to 1412? In my further experiments that didn't work well either, causing weird one-directional issues; only 1408 worked.

So, is it possible to fix the padding so that 1412 can be used as the WG device MTU on an underlying MTU of 1492? Otherwise, shouldn't there be a warning somewhere in the docs not to just choose the largest fitting MTU according to [1], but also to round the result down to the nearest multiple of 16?

[1] https://www.mail-archive.com/wireguard@lists.zx2c4.com/msg01856.html

--
With respect,
Roman
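The sizing Roman refers to can be written out as a short C sketch. It assumes a per-packet overhead of 80 bytes when the outer header is IPv6 (40-byte IPv6 header, 8-byte UDP header, and 32 bytes for the WireGuard data-message header plus authentication tag), or 60 bytes when it is IPv4. With those figures a 1500-byte link gives the 1420 default and a 1492-byte PPPoE link gives 1412; rounding down to a multiple of 16, as suggested above, turns 1412 into the 1408 that worked in practice. The function names are hypothetical and not part of WireGuard.

```c
#include <assert.h>

/* Assumed per-packet overhead figures (not taken from WireGuard source):
 * the WireGuard data message adds a 16-byte header and a 16-byte tag. */
enum {
	WG_DATA_OVERHEAD = 32,
	UDP_HEADER_LEN = 8,
	IPV4_HEADER_LEN = 20,
	IPV6_HEADER_LEN = 40,
	PADDING_MULTIPLE = 16
};

/* Largest inner (tunnel) MTU whose encapsulated packet still fits into
 * `underlay_mtu`; `ipv6` selects an IPv6 (non-zero) or IPv4 outer header. */
static unsigned int wg_max_mtu(unsigned int underlay_mtu, int ipv6)
{
	unsigned int overhead = WG_DATA_OVERHEAD + UDP_HEADER_LEN +
				(ipv6 ? IPV6_HEADER_LEN : IPV4_HEADER_LEN);

	assert(underlay_mtu > overhead);
	return underlay_mtu - overhead;
}

/* Round down to a multiple of 16. A full-sized packet of this length is
 * already a multiple of 16, so the padding step leaves it unchanged. */
static unsigned int round_down_16(unsigned int mtu)
{
	return mtu & ~(unsigned int)(PADDING_MULTIPLE - 1);
}
```

Under these assumptions, wg_max_mtu(1492, 1) returns 1412 and round_down_16(1412) returns 1408.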
Re: Mixed MTU hosts on a network
Hi Roman,

I answered this in my first email to you, which perhaps got lost in the mix of emails, so I'll quote the relevant part:

> 2) When we pad the packet payload. In this case, we pad it to the
> nearest multiple of 16, but we don't let it exceed the device MTU.
> This is skb_padding in send.c. This behavior seems like the bug in
> your particular case, since what matters here is the route's MTU, not
> the device MTU. For full 1412 size packets, the payload is presumably
> being padded to 1424, since that's still less than the device MTU. In
> order to test this theory, try setting your route MTU, as you've
> described in your first email, to 1408 (which is a multiple of 16). If
> this works, let me know, as it will be good motivation for fixing
> skb_padding. If not, then it means there's a problem elsewhere to
> investigate too.

In short, because 1408 is a multiple of 16 so it didn't get rounded up, whereas 1412 got rounded up to 1424.

Jason
Re: Mixed MTU hosts on a network
On Sat, 14 Apr 2018 15:16:56 +0200 "Jason A. Donenfeld" wrote:

> Hi Roman,
>
> This commit should fix it. It now has a unit test too so that we don't
> hit this issue again. Thanks for reporting it in such detail.
>
> https://git.zx2c4.com/WireGuard/commit/?id=a88a067d5477f877003d3703bb3b95cb4e94bc46
>
> Let me know if that fixes it on your end.
>
> Jason

Thanks! I haven't had a chance to test it yet.

Leaving route MTUs aside, did you look into why an interface MTU of 1412 behaves erratically (while by all calculations it should just fit into the 1492 underlying PPPoE MTU), with only 1408 working reliably? Is that also because of the padding?

--
With respect,
Roman
Re: Mixed MTU hosts on a network
Hi Roman,

This commit should fix it. It now has a unit test too so that we don't hit this issue again. Thanks for reporting it in such detail.

https://git.zx2c4.com/WireGuard/commit/?id=a88a067d5477f877003d3703bb3b95cb4e94bc46

Let me know if that fixes it on your end.

Jason
Re: Mixed MTU hosts on a network
On Sat, Apr 14, 2018 at 03:38:46AM +0200, Jason A. Donenfeld wrote:
> 2) When we pad the packet payload. In this case, we pad it to the
> nearest multiple of 16, but we don't let it exceed the device MTU.
> This is skb_padding in send.c. This behavior seems like the bug in
> your particular case, since what matters here is the route's MTU, not
> the device MTU. For full 1412 size packets, the payload is presumably
> being padded to 1424, since that's still less than the device MTU. In
> order to test this theory, try setting your route MTU, as you've
> described in your first email, to 1408 (which is a multiple of 16). If
> this works, let me know, as it will be good motivation for fixing
> skb_padding. If not, then it means there's a problem elsewhere to
> investigate too.
>
> I'm CC'ing Luis on this email, as he was working on the MTU code a while back.

I'm still playing with this, but something like the following might fix the issue, if you're interested in playing with it a bit.

=~=~=~=~=~=~=
diff --git a/src/device.c b/src/device.c
index 1614d61..3d18368 100644
--- a/src/device.c
+++ b/src/device.c
@@ -120,6 +120,7 @@ static netdev_tx_t xmit(struct sk_buff *skb, struct net_device *dev)
 	struct sk_buff *next;
 	struct sk_buff_head packets;
 	sa_family_t family;
+	u32 mtu;
 	int ret;
 
 	if (unlikely(skb_examine_untrusted_ip_hdr(skb) != skb->protocol)) {
@@ -142,6 +143,8 @@ static netdev_tx_t xmit(struct sk_buff *skb, struct net_device *dev)
 		goto err_peer;
 	}
 
+	mtu = dst_mtu(skb_dst(skb)) ?: skb->dev->mtu;
+
 	__skb_queue_head_init(&packets);
 	if (!skb_is_gso(skb))
 		skb->next = NULL;
@@ -168,6 +171,8 @@ static netdev_tx_t xmit(struct sk_buff *skb, struct net_device *dev)
 		 */
 		skb_dst_drop(skb);
 
+		PACKET_CB(skb)->mtu = mtu;
+
 		__skb_queue_tail(&packets, skb);
 	} while ((skb = next) != NULL);
 
diff --git a/src/queueing.h b/src/queueing.h
index d5948f3..c507536 100644
--- a/src/queueing.h
+++ b/src/queueing.h
@@ -46,6 +46,7 @@ struct packet_cb {
 	u64 nonce;
 	struct noise_keypair *keypair;
 	atomic_t state;
+	u32 mtu;
 	u8 ds;
 };
 #define PACKET_PEER(skb) (((struct packet_cb *)skb->cb)->keypair->entry.peer)
diff --git a/src/send.c b/src/send.c
index dddcc0b..e3b1ffd 100644
--- a/src/send.c
+++ b/src/send.c
@@ -116,11 +116,11 @@ static inline unsigned int skb_padding(struct sk_buff *skb)
 	 * isn't strictly neccessary, but it's better to be cautious here, especially
 	 * if that code ever changes.
 	 */
-	unsigned int last_unit = skb->len % skb->dev->mtu;
+	unsigned int last_unit = skb->len % PACKET_CB(skb)->mtu;
 	unsigned int padded_size = (last_unit + MESSAGE_PADDING_MULTIPLE - 1) & ~(MESSAGE_PADDING_MULTIPLE - 1);
 
-	if (padded_size > skb->dev->mtu)
-		padded_size = skb->dev->mtu;
+	if (padded_size > PACKET_CB(skb)->mtu)
+		padded_size = PACKET_CB(skb)->mtu;
 	return padded_size - last_unit;
 }
 
Re: Mixed MTU hosts on a network
Hi Roman,

I think that your idea of setting a route-based MTU _should_ work, and it seems like a bug if it isn't working. There are two places in WireGuard which directly touch the MTU:

1) When we split GSO superpackets up into normal sized packets. This code is supposed to be aware of the per-route MTU you've set, so it shouldn't be a problem. This is the call to skb_gso_segment in device.c.

2) When we pad the packet payload. In this case, we pad it to the nearest multiple of 16, but we don't let it exceed the device MTU. This is skb_padding in send.c. This behavior seems like the bug in your particular case, since what matters here is the route's MTU, not the device MTU. For full 1412 size packets, the payload is presumably being padded to 1424, since that's still less than the device MTU. In order to test this theory, try setting your route MTU, as you've described in your first email, to 1408 (which is a multiple of 16). If this works, let me know, as it will be good motivation for fixing skb_padding. If not, then it means there's a problem elsewhere to investigate too.

I'm CC'ing Luis on this email, as he was working on the MTU code a while back.

Regards,
Jason
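The padding rule in point 2 can be modelled in a few lines of user-space C. This is a sketch, not WireGuard code: padding_for() and effective_mtu() are hypothetical helpers. padding_for() copies the arithmetic of skb_padding() as it appears in the send.c hunk of Luis's patch, with the cap passed in as a parameter so that the device MTU and a route MTU can be compared. One detail worth noting: with a 1420-byte device MTU, a full 1412-byte packet first rounds up to 1424, but the cap then clamps it to 1420, so the packet still ends up 8 bytes larger than the 1412 route MTU.

```c
#include <assert.h>

#define MESSAGE_PADDING_MULTIPLE 16u

/* Number of padding bytes added to a packet of `len` bytes when padding is
 * capped at `cap_mtu`: round the last MTU-sized unit up to a multiple of 16,
 * but never past the cap (same arithmetic as skb_padding()). */
static unsigned int padding_for(unsigned int len, unsigned int cap_mtu)
{
	assert(cap_mtu > 0);
	unsigned int last_unit = len % cap_mtu;
	unsigned int padded_size = (last_unit + MESSAGE_PADDING_MULTIPLE - 1) &
				   ~(MESSAGE_PADDING_MULTIPLE - 1);

	if (padded_size > cap_mtu)
		padded_size = cap_mtu;
	return padded_size - last_unit;
}

/* Route-aware cap in the spirit of the proposed fix: use the route MTU when
 * one is set (non-zero), otherwise fall back to the device MTU. */
static unsigned int effective_mtu(unsigned int route_mtu, unsigned int dev_mtu)
{
	return route_mtu ? route_mtu : dev_mtu;
}
```

With the device MTU as the cap, padding_for(1412, 1420) returns 8 while padding_for(1408, 1420) returns 0, consistent with 1408 working where 1412 did not. With the route-aware cap, padding_for(1412, effective_mtu(1412, 1420)) returns 0.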
Re: Mixed MTU hosts on a network
On Fri, 16 Mar 2018 14:25:47 +0500 Roman Mamedov wrote:

> What helps, is only reducing MTU of the entire wg0 interface to 1412.
> Then everything works fine. But it doesn't feel optimal to reduce MTU
> of the entire network just because of 1 or 2 hosts. I would rather
> use a couple of those mtu-override routes, if they worked.

Unfortunately, lowering the MTU of the whole tunnel interface is the only reliable solution right now. Per-peer configurability of MTUs has been on the project TODO list for a while, so there will be a better solution some day. I even started to work on this a few months back, but got sidetracked.

Cheers,
Luis
Re: Mixed MTU hosts on a network
On Fri, 16 Mar 2018 15:53:43 +0500 Roman Mamedov wrote:

> But guess what, turns out that didn't work either. Tried both OUTPUT and
> POSTROUTING chains on the "mangle" table, and set-mss all the way down to
> 1220, no matter what, the iperf3 output looked the same as before.

Actually, the iptables bit is easy to explain. Even if the initial MSS is forced to a low value on the sender, it gets negotiated back up to the maximum value according to the MTU on the receiver (I've changed both IPs since then):

21:13:38.641531 IP6 fd39:30::f5a8:e923:f8cd:24b5.40052 > fd39:30::e84f:942d:7f93:ddc1.5001: Flags [S], seq 2397878391, win 27200, options [mss 1220,sackOK,TS val 566161815 ecr 0,nop,wscale 9], length 0
21:13:38.641574 IP6 fd39:30::e84f:942d:7f93:ddc1.5001 > fd39:30::f5a8:e923:f8cd:24b5.40052: Flags [S.], seq 1221117548, ack 2397878392, win 26800, options [mss 1352,sackOK,TS val 2726162536 ecr 566161815,nop,wscale 9], length 0
21:13:38.716047 IP6 fd39:30::f5a8:e923:f8cd:24b5.40052 > fd39:30::e84f:942d:7f93:ddc1.5001: Flags [.], ack 1, win 54, options [nop,nop,TS val 566161889 ecr 2726162536], length 0
21:13:38.716444 IP6 fd39:30::f5a8:e923:f8cd:24b5.40052 > fd39:30::e84f:942d:7f93:ddc1.5001: Flags [P.], seq 1341:1605, ack 1, win 54, options [nop,nop,TS val 566161889 ecr 2726162536], length 264
21:13:38.716458 IP6 fd39:30::e84f:942d:7f93:ddc1.5001 > fd39:30::f5a8:e923:f8cd:24b5.40052: Flags [.], ack 1, win 55, options [nop,nop,TS val 2726162611 ecr 566161889,nop,nop,sack 1 {1341:1605}], length 0

So the other side really needs to have a proper MTU set. And the highest working wg0 MTU on PPPoE turned out to be 1408, not 1412 as I assumed. As for why 1412 also works, but only if set on the sender side, I have no explanation for that yet.

--
With respect,
Roman
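The MSS values in the captures in this thread follow directly from each side's wg0 MTU. The C sketch below assumes plain 40-byte IPv6 and 20-byte TCP headers (the MSS advertised in a SYN excludes TCP options), plus the 12 bytes that the timestamp option takes from every data segment. Under those assumptions it reproduces the tcpdump numbers: an MTU of 1412 advertises MSS 1352, 1420 advertises 1360, and 1408 advertises 1348, while data segments carry 12 bytes less. The helper names are hypothetical.

```c
#include <assert.h>

enum {
	IPV6_HEADER_LEN = 40,
	TCP_HEADER_LEN = 20,    /* without options */
	TCP_TIMESTAMP_OPT = 12  /* NOP, NOP, 10-byte timestamp option */
};

/* MSS a host advertises in its SYN for a given IPv6 interface MTU. */
static unsigned int ipv6_mss_for_mtu(unsigned int mtu)
{
	assert(mtu > IPV6_HEADER_LEN + TCP_HEADER_LEN);
	return mtu - IPV6_HEADER_LEN - TCP_HEADER_LEN;
}

/* Largest data payload per segment when both sides use TCP timestamps:
 * the smaller of the local MSS and the peer's advertised MSS, minus the
 * timestamp option that rides in every segment. */
static unsigned int segment_payload(unsigned int local_mtu, unsigned int peer_mss)
{
	unsigned int mss = ipv6_mss_for_mtu(local_mtu);

	if (peer_mss < mss)
		mss = peer_mss;
	return mss - TCP_TIMESTAMP_OPT;
}
```

With these assumptions, segment_payload(1412, 1360) is 1340 and segment_payload(1420, 1348) is 1336, the first data-segment lengths seen in the two receiver-side captures.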
Re: Mixed MTU hosts on a network
On Fri, 16 Mar 2018 10:35:18 +0100 Matthias Ordner wrote:

> If you only care about TCP connections you could set a different TCP-MSS
> with an iptables rule.

On Fri, 16 Mar 2018 11:01:51 +0100 Kalin KOZHUHAROV wrote:

> You may need to pre-shape the packets for the "offenders", e.g.
>
> ip6tables -t mangle -A POSTROUTING -o wg0 -d WHATEVERHOST -p tcp -m
> tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1352
>
> https://www.netfilter.org/documentation/HOWTO/netfilter-extensions-HOWTO-4.html#ss4.7
>
> O, wait! You talk IPv6...
>
> ip6tables -t mangle -A POSTROUTING -o wg0 -d fd39:30::250/128 -p tcp
> -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1372

I knew about this option, but wanted to avoid it because it would incur more overhead (going through iptables for this) and a bit more complexity.

But guess what, turns out that didn't work either. Tried both OUTPUT and POSTROUTING chains on the "mangle" table, and set-mss all the way down to 1220, no matter what, the iperf3 output looked the same as before. At this point I thought I was going crazy or something. :) It's not just iperf either; trying to send a file with "netcat6" into a running listener on the other side also failed to transfer data.

Then, almost by accident, I discovered what also helps: reducing the interface MTU only on the receiver, but by a bit more, to 1408.

So what makes it work is EITHER:

a) set MTU 1412 on wg0 at sender; OR
b) set MTU 1408 on wg0 at receiver.

...doing both at the same time is not even necessary.

Some tcpdumps from the receiver host are attached to demonstrate (in case anyone else thinks I am crazy :).

Now, I can live with just the impacted (PPPoE) hosts having a lower MTU on wg0. But still, the whole thing seems rather weird.
--
With respect,
Roman

Receiver mtu 1420, sender mtu 1412, successful transfer:

# tcpdump -i wg0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
15:42:35.027995 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [S], seq 4148302601, win 27040, options [mss 1352,sackOK,TS val 2239613851 ecr 0,nop,wscale 9], length 0
15:42:35.028026 IP6 fd39:30::250.5001 > fd39:30::2.42414: Flags [S.], seq 505975510, ack 4148302602, win 26960, options [mss 1360,sackOK,TS val 1473426057 ecr 2239613851,nop,wscale 9], length 0
15:42:35.102517 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [.], ack 1, win 53, options [nop,nop,TS val 2239613925 ecr 1473426057], length 0
15:42:35.102772 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [.], seq 1:1341, ack 1, win 53, options [nop,nop,TS val 2239613925 ecr 1473426057], length 1340
15:42:35.102785 IP6 fd39:30::250.5001 > fd39:30::2.42414: Flags [.], ack 1341, win 58, options [nop,nop,TS val 1473426131 ecr 2239613925], length 0
15:42:35.102810 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [P.], seq 1341:2145, ack 1, win 53, options [nop,nop,TS val 2239613925 ecr 1473426057], length 804
15:42:35.102818 IP6 fd39:30::250.5001 > fd39:30::2.42414: Flags [.], ack 2145, win 64, options [nop,nop,TS val 1473426131 ecr 2239613925], length 0
15:42:35.729846 IP6 fd39:30::250.5001 > fd39:30::2.42162: Flags [F.], seq 1811803733, ack 3749581328, win 56, options [nop,nop,TS val 1473426758 ecr 2239251660,nop,nop,sack 1 {1341:2145}], length 0
15:42:35.804023 IP6 fd39:30::2.42162 > fd39:30::250.5001: Flags [.], ack 1, win 54, options [nop,nop,TS val 2239614627 ecr 1473426758,nop,nop,sack 1 {0:1}], length 0
15:42:36.939584 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [F.], seq 2145, ack 1, win 53, options [nop,nop,TS val 2239615763 ecr 1473426131], length 0
15:42:36.939723 IP6 fd39:30::250.5001 > fd39:30::2.42414: Flags [F.], seq 1, ack 2146, win 64, options [nop,nop,TS val 1473427968 ecr 2239615763], length 0
15:42:37.014143 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [.], ack 2, win 53, options [nop,nop,TS val 2239615837 ecr 1473427968], length 0
^C
12 packets captured
12 packets received by filter
0 packets dropped by kernel

===

Receiver mtu 1408, sender mtu 1420, successful transfer:

# tcpdump -i wg0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
15:43:23.935508 IP6 fd39:30::2.42442 > fd39:30::250.5001: Flags [S], seq 1011924297, win 27200, options [mss 1360,sackOK,TS val 2239662759 ecr 0,nop,wscale 9], length 0
15:43:23.935541 IP6 fd39:30::250.5001 > fd39:30::2.42442: Flags [S.], seq 1735470303, ack 1011924298, win 26720, options [mss 1348,sackOK,TS val 1473474964 ecr 2239662759,nop,wscale 9], length 0
15:43:24.009867 IP6 fd39:30::2.42442 > fd39:30::250.5001: Flags [.], ack 1, win 54, options [nop,nop,TS val 2239662834 ecr 1473474964], length 0
15:43:24.010192 IP6 fd39:30::2.42442 > fd39:30::250.5001: Flags [.], seq 1:1337, ack 1, win 54, options
Re: Mixed MTU hosts on a network
On Fri, Mar 16, 2018 at 10:25 AM, Roman Mamedov wrote:
> Hello,
>
> I have a host which is on PPPoE and has 1492 as underlying MTU.
>
> When WireGuard starts by default, it sets MTU of its interface to 1420. All
> TCP connections trying to send a stream of data over the WG interface to that
> host, hang up (I test with iperf3).
>
> My first idea was to override the MTU for this specific host via adding a
> route:
>
> # ip -6 route add fd39:30::250/128 dev wg0 mtu 1412 metric 1
>
> # ip -6 route | grep ^fd39:30
> fd39:30::250 dev wg0 metric 1 mtu 1412
> fd39:30::/64 dev wg0 proto kernel metric 256
>
> # ip route get fd39:30::250
> fd39:30::250 from :: dev wg0 src fd39:30::2 metric 1 mtu 1412
>
> However, this does not help at all. Even adding the corresponding route on the
> other side. Even using the "mtu lock" keyword instead of just "mtu". I am still
> puzzled why. Any ideas?
>

Isn't it because routing is done by WG itself, based on AllowedIPs, so that the routing table is not considered at all after the packet is given to WG? Those are assumptions about how things work; I haven't looked at the code.

> What helps, is only reducing MTU of the entire wg0 interface to 1412. Then
> everything works fine. But it doesn't feel optimal to reduce MTU of the entire
> network just because of 1 or 2 hosts. I would rather use a couple of those
> mtu-override routes, if they worked.
>

You may need to pre-shape the packets for the "offenders", e.g.

ip6tables -t mangle -A POSTROUTING -o wg0 -d WHATEVERHOST -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1352

https://www.netfilter.org/documentation/HOWTO/netfilter-extensions-HOWTO-4.html#ss4.7

O, wait! You talk IPv6...

ip6tables -t mangle -A POSTROUTING -o wg0 -d fd39:30::250/128 -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1372

You can also try setting the route MTU as above and then use "... -j TCPMSS --clamp-mss-to-pmtu", although it may be more work and/or might not work.

Cheers,
Kalin.