I am working on a VPN solution connecting some appliances on two different networks. I'm using an x86 OpenWrt router with strongSwan 5.9.2 and kernel 5.4.154. The systems I am connecting exhibit non-compliant TCP MSS behaviour. For unknown reasons, they ignore the MSS advertised by their peers and send oversized packets. They also ignore ICMP unreachable messages indicating the path MTU. I have confirmed that the ICMP unreachable messages are not blocked: they have been captured directly on the system sending the problematic traffic. I do not have control over the appliances and need to solve the issue at the network level.
I'm using a modern IKEv2 / XFRM based configuration for this VPN. I would like to ignore the DF bit and fragment traffic passing through the VPN tunnel. The fragmentation could occur before or after encapsulation; either is fine for me. If I were using a GRE tunnel I could use the ignore-df option [1], but there doesn't appear to be an equivalent for an xfrm interface.

I have managed to "solve" my problem, though I do not understand the solution or how it works. If I create the following iptables rules to adjust the MSS on traffic traversing the xfrm interface:

    iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -o xfrm0 -j TCPMSS --set-mss 1240
    iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -i xfrm0 -j TCPMSS --set-mss 1240

then, in addition to the expected modification of the MSS field, my TCP traffic is fragmented, ignoring the DF bit.

Here's an excerpt of the traffic on ingress to the router:

    09:23:56.103022 IP 10.1.34.10.5060 > 10.1.61.20.25578: Flags [P.], seq 883:1906, ack 1760, win 260, length 1023
    09:23:56.119864 IP 10.1.61.20.25578 > 10.1.34.10.5060: Flags [.], ack 1906, win 501, length 0
    09:24:01.448960 IP 10.1.34.10.5060 > 10.1.61.20.25578: Flags [P.], seq 1906:3271, ack 1760, win 260, length 1365
    09:24:01.467771 IP 10.1.61.20.25578 > 10.1.34.10.5060: Flags [.], ack 3148, win 501, length 0
    09:24:01.467810 IP 10.1.61.20.25578 > 10.1.34.10.5060: Flags [.], ack 3271, win 501, length 0

And on egress on the xfrm interface (in addition to being sent over the VPN connection, the traffic is also being NATed by the VPN router):

    09:23:56.103150 IP 10.2.30.1.5060 > 10.2.2.6.25578: Flags [P.], seq 881:1902, ack 1750, win 260, length 1021
    09:23:56.119828 IP 10.2.2.6.25578 > 10.2.30.1.5060: Flags [.], ack 1902, win 501, length 0
    09:24:01.449067 IP 10.2.30.1.5060 > 10.2.2.6.25578: Flags [.], seq 1902:3142, ack 1750, win 260, length 1240
    09:24:01.449135 IP 10.2.30.1.5060 > 10.2.2.6.25578: Flags [P.], seq 3142:3265, ack 1750, win 260, length 123
    09:24:01.467724 IP 10.2.2.6.25578 > 10.2.30.1.5060: Flags [.], ack 3142, win 501, length 0
    09:24:01.467725 IP 10.2.2.6.25578 > 10.2.30.1.5060: Flags [.], ack 3265, win 501, length 0

The packet with length 1365 has been split into a packet of 1240 bytes and a second of 123. Without these rules I see the expected behaviour: the packets are dropped and ICMP unreachable messages are sent indicating the path MTU.

Is anyone able to explain why, in addition to adjusting the MSS, this mangle configuration allows fragmentation that ignores the DF bit? While the solution is working as I need it to, I'm concerned that it may be extremely fragile. Is there a better way to solve this problem?

Thanks in advance for any help you can offer,
-JohnF

[1] https://man7.org/linux/man-pages/man8/ip-tunnel.8.html
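
P.S. For anyone checking the numbers in the captures, here is a small Python sketch of the arithmetic. The sequence ranges are copied from the excerpts above; the variable and function names are only illustrative. It shows that the first egress piece is exactly the 1240 bytes configured with --set-mss, and that each data packet is 2 bytes shorter on egress than on ingress. I assume that 2-byte difference comes from the NAT on the router rewriting the payload, but I have not verified this. Both egress pieces also carry their own TCP sequence range, which may mean the split happens at the TCP segment level rather than as IP fragments; that is only my reading of the tcpdump output.

```python
# TCP sequence ranges (start, end) of the data packets, copied from the captures.
ingress = [(883, 1906), (1906, 3271)]                 # ingress to the router
egress = [(881, 1902), (1902, 3142), (3142, 3265)]    # egress on xfrm0

def lengths(segments):
    """Payload length of each segment, derived from its sequence range."""
    return [end - start for start, end in segments]

ingress_len = lengths(ingress)
egress_len = lengths(egress)

# The oversized ingress packet corresponds to the last two egress packets.
split_total = egress_len[1] + egress_len[2]

# Per-packet size difference between ingress and egress
# (assumed, not verified, to be caused by NAT payload rewriting).
shrink_first = ingress_len[0] - egress_len[0]
shrink_second = ingress_len[1] - split_total

print("ingress lengths:", ingress_len)
print("egress lengths: ", egress_len)
print("split total:    ", split_total)
print("shrink per packet:", shrink_first, shrink_second)
```

The computed lengths match the "length" fields tcpdump printed, so the excerpts are at least internally consistent.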