Package: linux
Version: 3.2.60-1+deb7u1
Severity: important
Dear Maintainer,
the kernel upgrade delivered via Debian Security on Friday 2014-07-04 partially
broke packet forwarding (in this case combined with NAT).
*High level description*
Users experience this as very slow network access to some servers, and only
from some client computers.
*Setup*
A gateway running Debian stable (amd64), using Linux for routing and NAT, was
updated from linux-image-3.2.0-4-amd64:amd64 version 3.2.57-3+deb7u2 to version
3.2.60-1+deb7u1.
The gateway is supposed to route and NAT traffic from a private network to the
public internet, translating RFC1918 client source addresses to public
addresses.
In the following tcpdumps I have replaced the RFC1918 address of a client with
CLIENT, the public IP address of the relevant gateway with NAT and the
public IP address of a server in the Internet with SERVER for reasons of
privacy, as they were gathered in a LIVE environment.
*Problem description*
After the update, the kernel chokes on IP packets that appear to be too big to
fit the MTU:
17:11:00.917355 IP SERVER.80 > NAT.44991: Flags [.], seq 1:2921, ack 98, win
5840, length 2920
17:11:00.917384 IP NAT > SERVER: ICMP NAT unreachable - need to frag (mtu
1500), length 556
Note the large IP packet (2920 bytes of TCP payload plus 40 bytes of IP and TCP
headers = 2960 bytes > MTU of 1500). It has the DF bit set. The packet cannot
have arrived over the network in this form, though, as it is an Ethernet with
an MTU of 1500, so this is odd. *1
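The size comparison above can be checked mechanically. The following is only a
minimal sketch, assuming 20-byte IPv4 and 20-byte TCP headers without options
(the tcpdump line shows no options); the function names and the regular
expression are illustrative and not part of any existing tool.

```python
import re

IPV4_HEADER = 20  # assumption: IPv4 header without options
TCP_HEADER = 20   # assumption: TCP header without options
MTU = 1500        # taken from the ICMP "need to frag" message

def tcp_payload_length(tcpdump_line):
    """Return the TCP payload length printed at the end of a tcpdump line."""
    match = re.search(r"length (\d+)\s*$", tcpdump_line)
    if match is None:
        raise ValueError("no length field found")
    return int(match.group(1))

def ip_packet_size(payload_length):
    """IP total length under the header-size assumptions above."""
    return payload_length + IPV4_HEADER + TCP_HEADER

line = ("17:11:00.917355 IP SERVER.80 > NAT.44991: Flags [.], "
        "seq 1:2921, ack 98, win 5840, length 2920")
size = ip_packet_size(tcp_payload_length(line))
print(size, "exceeds MTU" if size > MTU else "fits MTU")
```

With a payload of 1460 bytes, as in the captures taken with GRO off, the same
calculation gives exactly 1500 bytes, which fits the MTU.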
*Workaround*
The problem disappears when GRO is deactivated:
ethtool -K eth0 gro off
The kernel then receives only valid packets of up to MTU in size:
17:14:53.288712 IP SERVER.80 > NAT.44996: Flags [.], seq 1:1461, ack 98, win
5840, length 1460
17:14:53.288730 IP SERVER.80 > CLIENT.44996: Flags [.], seq 1:1461, ack 98, win
5840, length 1460
17:14:53.288735 IP SERVER.80 > NAT.44996: Flags [.], seq 1461:2921, ack 98, win
5840, length 1460
17:14:53.27 IP SERVER.80 > CLIENT.44996: Flags [.], seq 1461:2921, ack 98,
win 5840, length 1460
GRO (generic receive offload) is a performance optimization in which the
receive path (the NIC driver together with the kernel) merges consecutive
packets of a flow into larger packets to reduce per-packet processing overhead.
GRO defaults to on (on this hardware).
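To confirm whether the workaround is in effect, the current GRO state can be
read from the output of "ethtool -k eth0" (lower-case -k lists the offload
settings). The following is a minimal sketch of such a check; the function name
is hypothetical, and the sample text is an illustrative excerpt in the format
ethtool prints, not output captured on the affected gateway.

```python
def gro_enabled(ethtool_output):
    """Return True/False for the generic-receive-offload line, None if absent."""
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name == "generic-receive-offload":
            words = value.split()
            # The first word is "on" or "off"; a "[fixed]" suffix may follow.
            return words[0] == "on" if words else None
    return None

# Illustrative sample only (assumption), shortened to a few feature lines.
sample = """Features for eth0:
rx-checksumming: on
generic-segmentation-offload: on
generic-receive-offload: off
"""
print(gro_enabled(sample))
```

On a live system the text could be obtained with, for example,
subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout.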
*Regression*
The problem did not exist in 3.2.57-3+deb7u2. In that version the kernel
forwards those big packets as several smaller packets of up to MTU size:
16:23:01.394351 IP SERVER.80 > NAT.44943: Flags [.], seq 1:2921, ack 98, win
5840, length 2920
16:23:01.394375 IP SERVER.80 > CLIENT.44943: Flags [.], seq 1:1461, ack 98, win
5840, length 1460
16:23:01.394525 IP SERVER.80 > CLIENT.44943: Flags [.], seq 1461:2921, ack 98,
win 5840, length 1460
Note this is not IP fragmentation, as the smaller packets contain one TCP
segment each.
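The difference from IP fragmentation is that the merged packet is cut along TCP
sequence boundaries, so every resulting packet carries its own TCP header. The
following is a minimal sketch that models only the resulting sequence ranges;
it is not the kernel's segmentation code, and the MSS of 1460 is an assumption
derived from the 1500-byte MTU minus 40 bytes of headers.

```python
def segment(seq_start, seq_end, mss):
    """Split the TCP sequence range [seq_start, seq_end) into MSS-sized pieces."""
    segments = []
    start = seq_start
    while start < seq_end:
        end = min(start + mss, seq_end)
        segments.append((start, end, end - start))
        start = end
    return segments

MSS = 1460  # assumption: 1500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers

# The merged packet from the capture above covers seq 1:2921.
for start, end, length in segment(1, 2921, MSS):
    print(f"seq {start}:{end}, length {length}")
```

For the range 1:2921 this yields the two segments 1:1461 and 1461:2921 of 1460
bytes each, matching the packets forwarded to CLIENT by the old kernel.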
*Possible causes*
I suspect the reason for how the error manifests to end users (very slow
network access to some servers, and only by some client computers) is that the
actual operation of GRO is influenced by the NIC/driver, timing of packet flow,
and IP/TCP options used (which depend on client OS and configuration and server
OS and configuration). Then, the server's retransmit behaviour may cause single
packets to be transmitted, which are then not mangled by GRO and can be
successfully forwarded to clients, although that is very slow.
There are two changes between 3.2.57-3+deb7u2 and 3.2.60-1+deb7u1 that look
related, because they were supposed to fix a similar issue with IP packets that
arrive fragmented but have the DF bit set:
In the Debian-specific patch set,
patches/bugfix/all/netfilter-ipv4-defrag-set-local_df-flag-on-defragmen.patch:
[quote]
From: Florian Westphal <f...@strlen.de>
Date: Fri, 2 May 2014 15:32:16 +0200
Subject: netfilter: ipv4: defrag: set local_df flag on defragmented skb
Origin: https://git.kernel.org/linus/895162b1101b3ea5db08ca6822ae9672717efec0
else we may fail to forward skb even if original fragments do fit
outgoing link mtu:
1. remote sends 2k packets in two 1000 byte frags, DF set
2. we want to forward but only see '2k mtu and DF set'
3. we then send icmp error saying that outgoing link is 1500
But original sender never sent a packet that would not fit
the outgoing link.
Setting local_df makes outgoing path test size vs.
IPCB(skb)->frag_max_size, so we will still send the correct
error in case the largest original size did not fit
outgoing link mtu.
Reported-by: Maxime Bizon <mbi...@freebox.fr>
Suggested-by: Maxime Bizon <mbi...@freebox.fr>
Fixes: 5f2d04f1f9 (ipv4: fix path MTU discovery with connection tracking)
Signed-off-by: Florian Westphal <f...@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pa...@netfilter.org>
---
net/ipv4/netfilter/nf_defrag_ipv4.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c
b/net/ipv4/netfilter/nf_defrag_ipv4.c
index 12e13bd..f40f321 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -22,7 +22,6 @@
#endif
#include <net/netfilter/nf_conntrack_zones.h>
-/* Returns