Bug#646063: net: fix route cache rebuilds

2012-11-24 Thread Jonathan Nieder
Hi Ben,

In October 2011, Florian Fuessl wrote:
 Eric Dumazet eric.duma...@gmail.com Fri, Oct 21, 2011 7:44 AM
 On Friday, 21 October 2011 at 01:07 +0100, Ben Hutchings wrote:
 On Fri, 2011-10-21 at 00:40 +0200, Florian Fuessl wrote:

 http://patchwork.ozlabs.org/patch/47114/raw/
[...]
 Eric, do you see any problems with this?  Would we need any more
 follow-up fixes?
 [...]
 This patch is probably safe, [...]

 But I believe another bug was fixed in 6a2bad70d546cf30
 (ipv4: Restart rt_intern_hash after emergency rebuild)

 At least the Debian router now runs stably after applying Eric's patch. :)

 There is still one message in the dmesg log after ~3 days of uptime, though:
 spozerl:~# dmesg
 [...]
 [58018.930367] Route hash chain too long!
 [58018.930371] Adjust your secret_interval!

This is all beyond my depth.  What's the next step?  Is there some
further test Florian should try?

Thanks,
Jonathan


--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20121125004159.GA2244@elie.Belkin



Bug#646063: net: fix route cache rebuilds

2011-11-07 Thread Florian Fuessl
Hi Eric,
 hi Ben,

this is an update regarding this case:

Eric Dumazet eric.duma...@gmail.com wrote Fri, Oct 21, 2011 7:44 AM
 On Friday, 21 October 2011 at 01:07 +0100, Ben Hutchings wrote:
 
  Eric, do you see any problems with this?  Would we need any more
  follow-up fixes?
 
 Hi Ben
 
 This patch is probably safe; it should prevent the emergency rebuild from
 being triggered by one long chain [different TOS values being mapped to the
 same slot], even when the cache holds few entries.
 
 But I believe another bug was fixed in 6a2bad70d546cf30
 (ipv4: Restart rt_intern_hash after emergency rebuild)
 
 If Florian's route cache use is light/normal, this second commit is
 probably not needed.

Unfortunately, the system still suffered from two network disconnects,
starting with the following messages in the kernel log:
Nov  7 06:38:41 spozerl kernel: [ 9025.854230] Route hash chain too long!
Nov  7 06:38:41 spozerl kernel: [ 9025.854237] Adjust your secret_interval!
Nov  7 07:10:53 spozerl kernel: [10953.398869] eth0: 5 rebuilds is over limit, route caching disabled
Nov  7 07:10:53 spozerl kernel: [10953.398876] Route hash chain too long!
Nov  7 07:10:53 spozerl kernel: [10953.398878] Adjust your secret_interval!
Nov  7 07:12:59 spozerl kernel: [11080.006209] dst cache overflow
Nov  7 07:12:59 spozerl kernel: [11080.012829] dst cache overflow
Nov  7 07:12:59 spozerl kernel: [11080.019653] dst cache overflow
Nov  7 07:12:59 spozerl kernel: [11080.019704] dst cache overflow
Nov  7 07:12:59 spozerl kernel: [11080.022230] dst cache overflow
Nov  7 07:12:59 spozerl kernel: [11080.023285] dst cache overflow
Nov  7 07:12:59 spozerl kernel: [11080.023680] dst cache overflow
Nov  7 07:12:59 spozerl kernel: [11080.023731] dst cache overflow
Nov  7 07:12:59 spozerl kernel: [11080.024538] dst cache overflow
Nov  7 07:12:59 spozerl kernel: [11080.026248] dst cache overflow
Nov  7 07:13:04 spozerl kernel: [11085.007358] __ratelimit: 595 callbacks suppressed
Nov  7 07:13:04 spozerl kernel: [11085.007362] dst cache overflow
Nov  7 07:13:04 spozerl kernel: [11085.009144] dst cache overflow
[...]

-Florian




--
Archive: http://lists.debian.org/002101cc9dae$d9ed7530$8dc85f90$@de



Bug#646063: net: fix route cache rebuilds

2011-11-07 Thread Eric Dumazet
On Tuesday, 08 November 2011 at 01:39 +0100, Florian Fuessl wrote:

 Unfortunately, the system still suffered from two network disconnects,
 starting with the following messages in the kernel log:
 Nov  7 06:38:41 spozerl kernel: [ 9025.854230] Route hash chain too long!
 Nov  7 06:38:41 spozerl kernel: [ 9025.854237] Adjust your secret_interval!
 Nov  7 07:10:53 spozerl kernel: [10953.398869] eth0: 5 rebuilds is over limit, route caching disabled
 Nov  7 07:10:53 spozerl kernel: [10953.398876] Route hash chain too long!
 Nov  7 07:10:53 spozerl kernel: [10953.398878] Adjust your secret_interval!
 Nov  7 07:12:59 spozerl kernel: [11080.006209] dst cache overflow
 ...

 Nov  7 07:13:04 spozerl kernel: [11085.007358] __ratelimit: 595 callbacks suppressed
 Nov  7 07:13:04 spozerl kernel: [11085.007362] dst cache overflow
 Nov  7 07:13:04 spozerl kernel: [11085.009144] dst cache overflow
 [...]
 

If the patch is already in your kernel, then your machine is under stress
and the route cache has been disabled.
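The "eth0: 5 rebuilds is over limit, route caching disabled" line in Florian's log comes from a per-namespace counter of emergency rebuilds. A minimal sketch of that bookkeeping, assuming that the limit is the net.ipv4.rt_cache_rebuild_count sysctl with a default of 4 (consistent with the fifth rebuild being the one reported) and that caching is disabled once the counter exceeds the limit; treat both as assumptions to check against net/ipv4/route.c of the running kernel. struct toy_netns and the toy_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-namespace state (loosely named after net->ipv4 fields). */
struct toy_netns {
	int current_rt_cache_rebuild_count; /* emergency rebuilds so far */
	int sysctl_rt_cache_rebuild_count;  /* limit; assumed default 4 */
};

/* Caching stays enabled while the rebuild count has not exceeded the limit. */
static bool toy_rt_caching(const struct toy_netns *net)
{
	return net->current_rt_cache_rebuild_count <=
	       net->sysctl_rt_cache_rebuild_count;
}

/* Called when a hash chain is judged too long: count one emergency rebuild. */
static void toy_emergency_rebuild(struct toy_netns *net, const char *dev)
{
	int num;

	assert(net != NULL && dev != NULL);
	num = ++net->current_rt_cache_rebuild_count;
	if (!toy_rt_caching(net))
		printf("%s: %d rebuilds is over limit, route caching disabled\n",
		       dev, num);
	/* A real kernel would also rebuild (rehash) the cache here; omitted. */
}
```

In this model the counter is never reset, so after the fifth rebuild toy_rt_caching() stays false; whether and when the real kernel resets it is not covered in this thread.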

You probably need to adjust the route cache hash size.

grep . /proc/sys/net/ipv4/route/*
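The hash size matters because other route cache limits are derived from it at boot. The sketch below assumes, as in ip_rt_init() of 2.6.32-era kernels, that gc_thresh equals the number of buckets and max_size is 16 entries per bucket, and that "dst cache overflow" is logged when garbage collection cannot bring the cache back under max_size; verify these ratios against the running kernel's net/ipv4/route.c. It only does the arithmetic; struct toy_rt_limits and toy_limits_for_hash_log() are hypothetical.

```c
#include <assert.h>

/* Hypothetical bundle of limits derived from the route cache bucket count. */
struct toy_rt_limits {
	unsigned long buckets;   /* rt_hash_mask + 1, a power of two */
	unsigned long gc_thresh; /* assumed: one entry per bucket */
	unsigned long max_size;  /* assumed: 16 entries per bucket */
};

static struct toy_rt_limits toy_limits_for_hash_log(unsigned int hash_log)
{
	struct toy_rt_limits l;

	/* keep 1UL << hash_log and the * 16 below within unsigned long */
	assert(hash_log < 8 * sizeof(unsigned long) - 4);
	l.buckets = 1UL << hash_log;
	l.gc_thresh = l.buckets;
	l.max_size = l.buckets * 16;
	return l;
}
```

For example, a hash_log of 15 gives 32768 buckets, gc_thresh 32768 and max_size 524288. On kernels of this era the bucket count can be set at boot with the rhash_entries= kernel parameter; the values actually in effect are the gc_thresh and max_size files printed by the grep command above.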






-- 
Archive: http://lists.debian.org/1320734944.8976.3.camel@edumazet-laptop



Bug#646063: net: fix route cache rebuilds

2011-10-23 Thread Florian Fuessl
Hi Eric,
 hi Ben,

Eric Dumazet eric.duma...@gmail.com Fri, Oct 21, 2011 7:44 AM
 On Friday, 21 October 2011 at 01:07 +0100, Ben Hutchings wrote:
 
  Eric, do you see any problems with this?  Would we need any more
  follow-up fixes?
 [...]
 This patch is probably safe, [...]
 
 But I believe another bug was fixed in 6a2bad70d546cf30
 (ipv4: Restart rt_intern_hash after emergency rebuild)

At least the Debian router now runs stably after applying Eric's patch. :)

There is still one message in the dmesg log after ~3 days of uptime, though:
spozerl:~# dmesg
[...]
[58018.930367] Route hash chain too long!
[58018.930371] Adjust your secret_interval!
spozerl:~# uptime
 15:26:02 up 2 days, 19:24,  2 users,  load average: 0.00, 0.00, 0.00
spozerl:~# uname -a
Linux spozerl 2.6.32-fix-route-cache-rebuilds #1 SMP Sat Oct 15 23:06:34 CEST 2011 i686 GNU/Linux

 
 If Florian's route cache use is light/normal, this second commit is
 probably not needed.

spozerl:~# while true; do echo -n $(date): ; ip route show cache | wc -l; sleep 10; done
So 23. Okt 15:15:20 CEST 2011: 2470
So 23. Okt 15:15:30 CEST 2011: 2496
So 23. Okt 15:15:40 CEST 2011: 2692
So 23. Okt 15:15:50 CEST 2011: 2716
So 23. Okt 15:16:00 CEST 2011: 2728
So 23. Okt 15:16:10 CEST 2011: 2774
So 23. Okt 15:16:20 CEST 2011: 3588
So 23. Okt 15:16:30 CEST 2011: 3774
So 23. Okt 15:16:40 CEST 2011: 3784
So 23. Okt 15:16:51 CEST 2011: 3788
So 23. Okt 15:17:01 CEST 2011: 3804
So 23. Okt 15:17:11 CEST 2011: 4568
[...]
So 23. Okt 15:23:02 CEST 2011: 3718
So 23. Okt 15:23:12 CEST 2011: 3844
So 23. Okt 15:23:22 CEST 2011: 360
So 23. Okt 15:23:32 CEST 2011: 838
So 23. Okt 15:23:42 CEST 2011: 1176
So 23. Okt 15:23:52 CEST 2011: 1406
So 23. Okt 15:24:02 CEST 2011: 1798
So 23. Okt 15:24:12 CEST 2011: 2028
[... attached: complete log file covering a few more minutes]
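To spot collapses like the drop from 3844 to 360 entries between 15:23:12 and 15:23:22 in sampled counts, a small hypothetical helper can flag samples that fall below half of the previous one. The 50% threshold and the name toy_sharp_drops() are arbitrary choices for illustration; the helper only locates such drops and says nothing about their cause.

```c
#include <assert.h>
#include <stddef.h>

/* Counts samples that fall below half of the previous sample (an arbitrary
 * "sharp drop" threshold) and stores the index of the first such sample. */
static size_t toy_sharp_drops(const unsigned *counts, size_t n, size_t *first)
{
	size_t drops = 0;

	assert(counts != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		if (2UL * counts[i] < counts[i - 1]) {
			if (drops == 0 && first != NULL)
				*first = i;
			drops++;
		}
	}
	return drops;
}
```

Fed the eight samples from 15:23:02 to 15:24:12 above, it reports exactly one sharp drop, at the 15:23:22 sample (index 2).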

-Florian
spozerl:~# while true; do echo -n $(date): ; ip route show cache | wc -l; sleep 10; done
So 23. Okt 15:15:20 CEST 2011: 2470
So 23. Okt 15:15:30 CEST 2011: 2496
So 23. Okt 15:15:40 CEST 2011: 2692
So 23. Okt 15:15:50 CEST 2011: 2716
So 23. Okt 15:16:00 CEST 2011: 2728
So 23. Okt 15:16:10 CEST 2011: 2774
So 23. Okt 15:16:20 CEST 2011: 3588
So 23. Okt 15:16:30 CEST 2011: 3774
So 23. Okt 15:16:40 CEST 2011: 3784
So 23. Okt 15:16:51 CEST 2011: 3788
So 23. Okt 15:17:01 CEST 2011: 3804
So 23. Okt 15:17:11 CEST 2011: 4568
So 23. Okt 15:17:21 CEST 2011: 4502
So 23. Okt 15:17:31 CEST 2011: 4508
So 23. Okt 15:17:41 CEST 2011: 4554
So 23. Okt 15:17:51 CEST 2011: 4542
So 23. Okt 15:18:01 CEST 2011: 4546
So 23. Okt 15:18:11 CEST 2011: 4554
So 23. Okt 15:18:21 CEST 2011: 4364
So 23. Okt 15:18:31 CEST 2011: 4368
So 23. Okt 15:18:41 CEST 2011: 4372
So 23. Okt 15:18:51 CEST 2011: 4384
So 23. Okt 15:19:01 CEST 2011: 4384
So 23. Okt 15:19:11 CEST 2011: 4392
So 23. Okt 15:19:21 CEST 2011: 4024
So 23. Okt 15:19:31 CEST 2011: 4040
So 23. Okt 15:19:41 CEST 2011: 4096
So 23. Okt 15:19:51 CEST 2011: 4106
So 23. Okt 15:20:01 CEST 2011: 4112
So 23. Okt 15:20:11 CEST 2011: 4118
So 23. Okt 15:20:21 CEST 2011: 3796
So 23. Okt 15:20:31 CEST 2011: 3804
So 23. Okt 15:20:41 CEST 2011: 3810
So 23. Okt 15:20:51 CEST 2011: 3854
So 23. Okt 15:21:01 CEST 2011: 3870
So 23. Okt 15:21:11 CEST 2011: 3902
So 23. Okt 15:21:21 CEST 2011: 3484
So 23. Okt 15:21:31 CEST 2011: 3530
So 23. Okt 15:21:41 CEST 2011: 3564
So 23. Okt 15:21:51 CEST 2011: 3620
So 23. Okt 15:22:02 CEST 2011: 3670
So 23. Okt 15:22:12 CEST 2011: 3876
So 23. Okt 15:22:22 CEST 2011: 3282
So 23. Okt 15:22:32 CEST 2011: 3378
So 23. Okt 15:22:42 CEST 2011: 3540
So 23. Okt 15:22:52 CEST 2011: 3664
So 23. Okt 15:23:02 CEST 2011: 3718
So 23. Okt 15:23:12 CEST 2011: 3844
So 23. Okt 15:23:22 CEST 2011: 360
So 23. Okt 15:23:32 CEST 2011: 838
So 23. Okt 15:23:42 CEST 2011: 1176
So 23. Okt 15:23:52 CEST 2011: 1406
So 23. Okt 15:24:02 CEST 2011: 1798
So 23. Okt 15:24:12 CEST 2011: 2028
So 23. Okt 15:24:22 CEST 2011: 2380
So 23. Okt 15:24:32 CEST 2011: 3488
So 23. Okt 15:24:42 CEST 2011: 3768
So 23. Okt 15:24:52 CEST 2011: 3984
So 23. Okt 15:25:02 CEST 2011: 4266
So 23. Okt 15:25:12 CEST 2011: 4388
So 23. Okt 15:25:22 CEST 2011: 4424
So 23. Okt 15:25:32 CEST 2011: 4556
So 23. Okt 15:25:42 CEST 2011: 4632
So 23. Okt 15:25:52 CEST 2011: 4736
So 23. Okt 15:26:02 CEST 2011: 4806
So 23. Okt 15:26:12 CEST 2011: 4908
So 23. Okt 15:26:22 CEST 2011: 5012
So 23. Okt 15:26:32 CEST 2011: 5028
So 23. Okt 15:26:42 CEST 2011: 5102
So 23. Okt 15:26:52 CEST 2011: 5144
So 23. Okt 15:27:02 CEST 2011: 5222
So 23. Okt 15:27:12 CEST 2011: 5298
So 23. Okt 15:27:22 CEST 2011: 5338
So 23. Okt 15:27:33 CEST 2011: 5340
So 23. Okt 15:27:43 CEST 2011: 5368
So 23. Okt 15:27:53 CEST 2011: 5390
So 23. Okt 15:28:03 CEST 2011: 5454
So 23. Okt 15:28:13 CEST 2011: 5534
So 23. Okt 15:28:23 CEST 2011: 5586
So 23. Okt 15:28:33 CEST 2011: 5632
So 23. Okt 15:28:43 CEST 2011: 5718
So 23. Okt 15:28:53 CEST 2011: 5750
So 23. Okt 15:29:03 CEST 2011: 5784
So 23. Okt 15:29:13 CEST 2011: 5838
So 23. Okt 15:29:23 CEST 2011: 5886
So 23. Okt 

Bug#646063: net: fix route cache rebuilds

2011-10-20 Thread Florian Fuessl
Package: linux-source-2.6.32
Version: 2.6.32-38
Severity: critical
Tags: squeeze patch
Justification: breaks the whole system


Hi,

Debian Squeeze running kernel 2.6.32 suffers from the following bug, discussed
on the kernel mailing list net...@vger.kernel.org:
http://kerneltrap.org/mailarchive/linux-netdev/2010/3/8/6271476

In detail: [...]
Oct 12 11:54:28 spozerl kernel: [180385.555758] Route hash chain too long!
Oct 12 11:54:28 spozerl kernel: [180385.555760] Adjust your secret_interval!
Oct 12 12:01:52 spozerl kernel: [180829.114321] dst cache overflow
Oct 12 12:01:52 spozerl kernel: [180829.129033] dst cache overflow
Oct 12 12:01:52 spozerl kernel: [180829.130873] dst cache overflow
Oct 12 12:01:52 spozerl kernel: [180829.139006] dst cache overflow
[...] until the kernel network stack freezes after a while.

To resolve the network connectivity hangups, I applied Eric Dumazet's patch to
the linux-source-2.6.32 package; it resolved the issue on my Debian Squeeze
router, which is under load. The patch is here:
http://patchwork.ozlabs.org/patch/47114/raw/

It would be great if this patch could be included in the official Debian
Squeeze kernel. It may also resolve some other strange network hangups
reported by other users.

-Florian

-- System Information:
Debian Release: 6.0.3
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-fix-route-cache-rebuilds (SMP w/1 CPU core)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-source-2.6.32 depends on:
ii  binutils                  2.20.1-16       The GNU assembler, linker and bina
ii  bzip2                     1.0.5-6         high-quality block-sorting file co

Versions of packages linux-source-2.6.32 recommends:
ii  gcc                       4:4.4.5-1       The GNU C compiler
ii  libc6-dev [libc-dev]      2.11.2-10       Embedded GNU C Library: Developmen
ii  make                      3.81-8          An utility for Directing compilati

Versions of packages linux-source-2.6.32 suggests:
ii  kernel-package            12.036+nmu1     A utility for building Linux kerne
ii  libncurses5-dev [ncurses- 5.7+20100313-5  developer's libraries and docs for
pn  libqt3-mt-dev             <none>          (no description available)

-- no debconf information
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b2ba558..d9b4024 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -146,7 +146,6 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
 static void		 ipv4_link_failure(struct sk_buff *skb);
 static void		 ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu);
 static int rt_garbage_collect(struct dst_ops *ops);
-static void rt_emergency_hash_rebuild(struct net *net);
 
 
 static struct dst_ops ipv4_dst_ops = {
@@ -780,11 +779,30 @@ static void rt_do_flush(int process_context)
 #define FRACT_BITS 3
 #define ONE (1UL << FRACT_BITS)
 
+/*
+ * Given a hash chain and an item in this hash chain,
+ * find if a previous entry has the same hash_inputs
+ * (but differs on tos, mark or oif)
+ * Returns 0 if an alias is found.
+ * Returns ONE if rth has no alias before itself.
+ */
+static int has_noalias(const struct rtable *head, const struct rtable *rth)
+{
+	const struct rtable *aux = head;
+
+	while (aux != rth) {
+		if (compare_hash_inputs(&aux->fl, &rth->fl))
+			return 0;
+		aux = aux->u.dst.rt_next;
+	}
+	return ONE;
+}
+
 static void rt_check_expire(void)
 {
 	static unsigned int rover;
 	unsigned int i = rover, goal;
-	struct rtable *rth, *aux, **rthp;
+	struct rtable *rth, **rthp;
 	unsigned long samples = 0;
 	unsigned long sum = 0, sum2 = 0;
 	unsigned long delta;
@@ -835,15 +853,7 @@ nofree:
 	 * attributes don't unfairly skew
 	 * the length computation
 	 */
-	for (aux = rt_hash_table[i].chain;;) {
-		if (aux == rth) {
-			length += ONE;
-			break;
-		}
-		if (compare_hash_inputs(&aux->fl, &rth->fl))
-			break;
-		aux = aux->u.dst.rt_next;
-	}
+	length += has_noalias(rt_hash_table[i].chain, rth);
 	continue;
 }
 			} else if (!rt_may_expire(rth, tmo, ip_rt_gc_timeout))
@@ -1073,6 +1083,21 @@ work_done:
 out:	return 0;
 }
 
+/*
+ * Returns number of entries in a hash chain that have different hash_inputs
+ */
+static int slow_chain_length(const struct rtable *head)
+{
+	int length = 0;
+	const struct rtable *rth = head;
+
+	while (rth) {
+		length += has_noalias(head, rth);
+		rth = rth->u.dst.rt_next;
+	}
+	return length >> FRACT_BITS;
+}
+
 static int rt_intern_hash(unsigned hash, struct rtable *rt,
 			  struct rtable **rp, struct sk_buff *skb)
 {
@@ -1185,7 +1210,8 @@ restart:
 			rt_free(cand);
 		}
 	} else {
-		if (chain_length > rt_chain_length_max) {
+		if (chain_length > rt_chain_length_max &&
+		slow_chain_length(rt_hash_table[hash].chain) > rt_chain_length_max) {
 			struct net *net = dev_net(rt->u.dst.dev);
 			int num = ++net->ipv4.current_rt_cache_rebuild_count;
 			if 

Bug#646063: net: fix route cache rebuilds

2011-10-20 Thread Ben Hutchings
On Fri, 2011-10-21 at 00:40 +0200, Florian Fuessl wrote:
 Debian Squeeze running kernel 2.6.32 suffers from the following bug,
 discussed on the kernel mailing list net...@vger.kernel.org:
 http://kerneltrap.org/mailarchive/linux-netdev/2010/3/8/6271476
 
 In detail: [...]
 Oct 12 11:54:28 spozerl kernel: [180385.555758] Route hash chain too long!
 Oct 12 11:54:28 spozerl kernel: [180385.555760] Adjust your secret_interval!
 Oct 12 12:01:52 spozerl kernel: [180829.114321] dst cache overflow
 Oct 12 12:01:52 spozerl kernel: [180829.129033] dst cache overflow
 Oct 12 12:01:52 spozerl kernel: [180829.130873] dst cache overflow
 Oct 12 12:01:52 spozerl kernel: [180829.139006] dst cache overflow
 [...] until the kernel network stack freezes after a while.
 
 To resolve the network connectivity hangups, I applied Eric Dumazet's
 patch to the linux-source-2.6.32 package; it resolved the issue on my
 Debian Squeeze router, which is under load. The patch is here:
 http://patchwork.ozlabs.org/patch/47114/raw/
 
 It would be great if this patch could be included in the official
 Debian Squeeze kernel. It may also resolve some other strange
 network hangups reported by other users.

Eric, do you see any problems with this?  Would we need any more
follow-up fixes?

Ben.

-- 
Ben Hutchings
Unix is many things to many people,
but it's never been everything to anybody.




Bug#646063: net: fix route cache rebuilds

2011-10-20 Thread Eric Dumazet
On Friday, 21 October 2011 at 01:07 +0100, Ben Hutchings wrote:

 Eric, do you see any problems with this?  Would we need any more
 follow-up fixes?

Hi Ben

This patch is probably safe; it should prevent the emergency rebuild from
being triggered by one long chain [different TOS values being mapped to the
same slot], even when the cache holds few entries.
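The idea is that entries sharing the same hash inputs and differing only in TOS (or mark/oif) should count once when judging whether a chain is too long. A minimal, self-contained sketch of that counting, modeled on has_noalias() and slow_chain_length() from the patch; struct toy_rt, its fields, and toy_same_hash_inputs() are hypothetical simplifications, not the kernel's struct rtable or compare_hash_inputs().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-point scale, mirroring FRACT_BITS and ONE in the patch. */
#define FRACT_BITS 3
#define ONE (1UL << FRACT_BITS)

/* Hypothetical, simplified route cache entry (not the kernel's struct rtable). */
struct toy_rt {
	uint32_t daddr;            /* hash input */
	uint32_t saddr;            /* hash input */
	int iif;                   /* hash input */
	uint8_t tos;               /* not compared: aliases differ only here */
	const struct toy_rt *next; /* next entry in the same hash chain */
};

/* Hypothetical stand-in for compare_hash_inputs(): nonzero if same hash inputs. */
static int toy_same_hash_inputs(const struct toy_rt *a, const struct toy_rt *b)
{
	return a->daddr == b->daddr && a->saddr == b->saddr && a->iif == b->iif;
}

/* Returns ONE if no earlier entry in the chain shares rth's hash inputs, else 0. */
static unsigned long toy_has_noalias(const struct toy_rt *head,
				     const struct toy_rt *rth)
{
	for (const struct toy_rt *aux = head; aux != rth; aux = aux->next) {
		assert(aux != NULL); /* rth must be a member of the chain */
		if (toy_same_hash_inputs(aux, rth))
			return 0;
	}
	return ONE;
}

/* Counts entries with distinct hash inputs; O(n^2), like slow_chain_length(). */
static int toy_slow_chain_length(const struct toy_rt *head)
{
	unsigned long length = 0;

	for (const struct toy_rt *rth = head; rth != NULL; rth = rth->next)
		length += toy_has_noalias(head, rth);
	return (int)(length >> FRACT_BITS);
}
```

For example, four entries with identical daddr/saddr/iif and TOS values 0x00, 0x08, 0x10 and 0x18 form a raw chain of length 4, but toy_slow_chain_length() reports 1. In the patch, the quadratic walk only runs when the cheap raw chain_length already exceeds rt_chain_length_max, because of the short-circuiting &&. The ONE/FRACT_BITS scaling is kept only to mirror the patch; with per-entry weights of 0 or ONE it is equivalent to counting 1 per distinct entry.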

But I believe another bug was fixed in 6a2bad70d546cf30
(ipv4: Restart rt_intern_hash after emergency rebuild)

If Florian's route cache use is light/normal, this second commit is
probably not needed.





-- 
Archive: http://lists.debian.org/1319175853.2854.52.camel@edumazet-laptop