Hi,

You can look at traffic in the tunnel by using the NFLOG target in iptables. Read the CorrectTrafficDump page on the wiki.
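
As a rough illustration of the NFLOG approach, here is a minimal sketch. It only prints the commands so they can be reviewed before being run as root. The group number 5 and the helper name `nflog_rules` are arbitrary choices for this example, and the rules are written from memory of the CorrectTrafficDump page, so check them against the wiki before relying on them.

```shell
#!/bin/sh
# Print (do not execute) NFLOG rules that copy IPsec-protected traffic
# to a netlink group, plus the matching tcpdump invocation.
# nflog_rules is a hypothetical helper; group 5 is an arbitrary default.
nflog_rules() {
    group="${1:-5}"
    # Inbound packets that matched an IPsec policy (seen after decryption).
    echo "iptables -t mangle -I PREROUTING -m policy --pol ipsec --dir in -j NFLOG --nflog-group ${group}"
    # Outbound packets that will match an IPsec policy (seen before encryption).
    echo "iptables -t mangle -I POSTROUTING -m policy --pol ipsec --dir out -j NFLOG --nflog-group ${group}"
    # Read the logged packets from the same NFLOG group.
    echo "tcpdump -s 0 -n -i nflog:${group}"
}

nflog_rules 5
```

Once the capture is done, the two rules can be removed again by repeating the same iptables commands with `-D` in place of `-I`.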
Kind regards

Noel

On 29.05.2018 18:05, Arzhel Younsi wrote:
> Hello!
>
> I started to troubleshoot intermittent but large spikes of ICMP "packet too
> big" messages on our servers running IPsec in transport mode with StrongSwan.
>
> We're tracking that issue "internally" on
> https://phabricator.wikimedia.org/T195365 with many digressions and real
> data, but here is a summarized version:
>
> hostA and hostB have IPsec configured such that all traffic between the two
> hosts is encrypted. Traffic is relatively steady.
>
> At (so far) random times, a packet capture on hostA's loopback shows large
> spikes of ICMP "packet too big" from and to hostA's interface IP.
> The payload (detailed in the phabricator task) says: hostA tried to send a
> 1516-byte packet to hostB while hostA's interface MTU is 1500.
>
> During that spike of ICMP, running:
> "ip -s route get hostB" on hostA shows "mtu 1500".
> This mtu mention is absent during "quiet time" (default value?).
> The ICMP spike stops before the end of the "cache" countdown. But if the ICMP
> spike happens again, the "cache" countdown gets re-initialized.
>
> Locking the MTU with:
> "ip route add hostB via xxx mtu lock 1400" seems to fix the issue.
>
> Our current guess is something along the lines of:
> 1/ An unknown event (e.g. congestion) triggers MTU probing from the kernel
> (we have tcp_mtu_probing set to 1)
> (As it's all in IPsec, we can't inspect the traffic and see what is flowing
> and how)
> 2/ The kernel sets a temporary PMTU value based on the interface (and maybe
> hostB) without taking the ESP overhead into consideration
> 3/ Traffic uses that MTU of 1500, but can't get past the interface after
> being encrypted because it is too big.
>
> But as this is still quite speculative, and for Occam's razor's sake, I'd
> expect a misconfiguration on our side instead of a bug in the
> kernel/StrongSwan :)
>
> How to figure out what creates that cache entry?
> Is our guess plausible?
> How to troubleshoot it more?
> Any help welcome.
>
> As we have many-to-many IPsec links, I would rather avoid deploying the mtu
> lock everywhere. This also doesn't help understanding and nailing the root of
> the issue.
>
> Cheers
>
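
On the quoted question of how to troubleshoot further: one option is to log exactly when the "mtu" attribute appears in and disappears from the `ip -s route get` output, so those timestamps can be correlated with the ICMP spikes. The sketch below is only an illustration. The helper names `extract_mtu` and `watch_pmtu` and the one-second polling interval are assumptions, not something from the thread.

```shell
#!/bin/sh
# Extract the "mtu" value from `ip route get` output read on stdin.
# Prints nothing when no mtu attribute is present (the "quiet time" case).
# extract_mtu and watch_pmtu are hypothetical helper names.
extract_mtu() {
    sed -n 's/.*[[:space:]]mtu \(lock \)\{0,1\}\([0-9][0-9]*\).*/\2/p' | head -n 1
}

# Poll the route lookup for a host and print a timestamped line whenever
# the reported mtu value changes. The 1-second interval is an assumption.
watch_pmtu() {
    host="$1"
    last=""
    while :; do
        cur="$(ip -s route get "$host" 2>/dev/null | extract_mtu)"
        if [ "$cur" != "$last" ]; then
            echo "$(date '+%F %T') ${host} mtu=${cur:-none}"
            last="$cur"
        fi
        sleep 1
    done
}
```

Running `watch_pmtu hostB` on hostA (stop it with Ctrl-C) prints one line per change, which can then be lined up with the packet capture.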
