Bug#594845: Acknowledgement (linux-image-2.6.32-5-amd64: kernel BUG at /build/buildd-linux-2.6_2.6.32-20-amd64-lNUT1p/..../fs/sysfs/file.c:539)

2010-09-07 Thread Russell Stuart
> You are editing in the wrong place.  The patch needs to be applied in
> debian/build/source_amd64_none.

Ta.  I applied the patch to every copy of tun.c other than the one in
source_amd64_openvz_amd64, and now my trace is printed as it should
be.  

And with the patch applied properly the problem disappears, so it does
indeed fix the problem.

> The debian/bin/test-patches script can handle this all for you.

Unfortunately the patch doesn't apply to source_amd64_openvz_amd64, and
test-patches dies as soon as that fails.  That is why I was doing it
manually.




-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/1283844141.4098.113.ca...@russell-laptop




2010-09-06 Thread Russell Stuart
On Sun, 2010-09-05 at 01:45 +0100, Ben Hutchings wrote:
> Are you quite sure you used the modified kernel?

Nope. 

> 'cat /proc/version' will tell you for sure which version you are
> running.

$ cat /proc/version 
Linux version 2.6.32-5-amd64 (Debian 2.6.32-20.2) (russell-deb...@stuart.id.au) (gcc version 4.3.5 (Debian 4.3.5-2) ) #1 SMP Mon Sep 6 09:14:09 EST 2010

So that looks like I am running the right kernel.  But because the
symptoms didn't change overly I had my doubts from the beginning, and
thus have been trying to prove I was running the kernel with your patch
applied.  Doing that in various ways is why it took me a while to
respond to your request to test the patch.

One way I tried to confirm it was by adding a trace:

--- x/linux-2.6-2.6.32/drivers/net/tun.c  2009-12-03 13:51:21.0 +1000
+++ linux-2.6-2.6.32/drivers/net/tun.c  2010-09-06 08:09:44.068458190 +1000
@@ -36,7 +36,7 @@
 
 #define DRV_NAME	"tun"
 #define DRV_VERSION	"1.6"
-#define DRV_DESCRIPTION	"Universal TUN/TAP device driver"
+#define DRV_DESCRIPTION	"Universal TUN/TAP device driver + 0001-tun-Don-t-add-sysfs-attributes-to-devices-without-sy.patch applied"
 #define DRV_COPYRIGHT	"(C) 1999-2004 Max Krasnyansky <m...@qualcomm.com>"
 
 #include <linux/module.h>
@@ -1006,7 +1006,9 @@
 		if (err < 0)
 			goto err_free_sk;
 
-		if (device_create_file(&tun->dev->dev, &dev_attr_tun_flags) ||
+		printk(KERN_INFO "0001-tun-Don-t-add-sysfs-attributes-to-devices-without-sy.patch\n");
+		if (!net_eq(dev_net(tun->dev), &init_net) ||
+		    device_create_file(&tun->dev->dev, &dev_attr_tun_flags) ||
 		    device_create_file(&tun->dev->dev, &dev_attr_owner) ||
 		    device_create_file(&tun->dev->dev, &dev_attr_group))
 			printk(KERN_ERR "Failed to create tun sysfs files\n");

None of the trace ever appears in /var/log/kern.log, which is to say
"grep 0001-tun /var/log/kern.log" prints nothing. Yet I am evidently
running a different kernel as I don't ever hit the BUG the buildd kernel
generates.  I don't have a clue what is going on.

The steps I used to generate the kernel are:

  $ apt-get source linux-2.6
  $ cd linux-2.6-2.6.32
  $ fakeroot debian/rules source
  $ fakeroot debian/rules setup
  $ patch -p1 < .../0001-tun-Don-t-add-sysfs-attributes-to-devices-without-sy.patch
  $ ed drivers/net/tun.c  # add trace
  $ ed debian/changelog   # add rev 2.6.32-20.2
  $ fakeroot debian/rules binary
  $ sudo dpkg -i ../linux-image-2.6.32-5-amd64_2.6.32-20.2_amd64.deb
  $ sudo ed /boot/grub/grub.cfg   # set the default kernel
  $ sudo reboot -f





Archive: http://lists.debian.org/1283773622.4264.65.ca...@russell-laptop




2010-09-06 Thread Ben Hutchings
On Mon, 2010-09-06 at 21:47 +1000, Russell Stuart wrote:
[...]
> The steps I used to generate the kernel are:
> 
>   $ apt-get source linux-2.6
>   $ cd linux-2.6-2.6.32
>   $ fakeroot debian/rules source
>   $ fakeroot debian/rules setup
>   $ patch -p1 < .../0001-tun-Don-t-add-sysfs-attributes-to-devices-without-sy.patch
>   $ ed drivers/net/tun.c  # add trace
[...]

You are editing in the wrong place.  The patch needs to be applied in
debian/build/source_amd64_none.  The debian/bin/test-patches script can
handle this all for you.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


signature.asc
Description: This is a digitally signed message part



2010-09-04 Thread Ben Hutchings
On Sat, 2010-09-04 at 09:27 +1000, Russell Stuart wrote:
> On Mon, 2010-08-30 at 14:48 +0100, Ben Hutchings wrote:
> > On Mon, 2010-08-30 at 17:34 +1000, Russell Stuart wrote:
> > > The problem disappears in
> > > linux-image-2.6.35-trunk-amd64_2.6.35-1~experimental.2.
> >
> > Yes, as I expected.
> >
> > Can you please test the attached patch against the version in unstable?
> > Directions for rebuilding an official kernel package are at
> > http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.
>
> Applied that.  It changed the problem.
>
> Before I got a nice repeatable BUG.  Now the openvpn instance
> unconditionally segfaults and normally nothing appears on the console or
> in kern.log.  Once I got lucky and this appeared on the console:
[...]

Are you quite sure you used the modified kernel?  This message matches
your original report:

>  kernel:[52062.330671] Code: 74 0f 48 89 ef e8 24 07 00 00 eb 05
> bb fe ff ff ff 89 d8 5b 5d 41 5c c3 48 85 ff 74 0e 48 8b 7f 30
> 48 85 ff 74 05 48 85 f6 75 04 0f 0b eb fe ba 02 00 00 00 e9 5d
> ff ff ff 55 53 48 89 fb 48 c7

'cat /proc/version' will tell you for sure which version you are
running.

> The machine appears to freeze in various ways - e.g. you can't get a login
> prompt to have a sniff around and the first command you type at an
> existing shell prompt that requires disk IO freezes, and a "sleep 300;
> sudo reboot -f" doesn't do anything.  On the other hand a "for f in
> $(seq 1000); do echo $f; sleep 1; done" continues on as though nothing
> has happened.  Disk IO is probably borked.

The crash occurs at a point where the tun driver is holding the 'RTNL'
lock which controls access to network device configuration.  Any
operation that must acquire that lock (and it's surprising how many
operations do) will hang.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.





2010-09-03 Thread Russell Stuart
On Mon, 2010-08-30 at 14:48 +0100, Ben Hutchings wrote: 
> On Mon, 2010-08-30 at 17:34 +1000, Russell Stuart wrote:
> > The problem disappears in
> > linux-image-2.6.35-trunk-amd64_2.6.35-1~experimental.2.
>
> Yes, as I expected.
>
> Can you please test the attached patch against the version in unstable?
> Directions for rebuilding an official kernel package are at
> http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Applied that.  It changed the problem.

Before I got a nice repeatable BUG.  Now the openvpn instance
unconditionally segfaults and normally nothing appears on the console or
in kern.log.  Once I got lucky and this appeared on the console:

Message from sysl...@toby at Sep  4 09:11:57 ...
 kernel:[52062.327222] ------------[ cut here ]------------

Message from sysl...@toby at Sep  4 09:11:57 ...
 kernel:[52062.329577] invalid opcode:  [#1] SMP 

Message from sysl...@toby at Sep  4 09:11:57 ...
 kernel:[52062.330671] last sysfs file: /sys/devices/virtual/misc/tun/uevent

Message from sysl...@toby at Sep  4 09:11:57 ...
 kernel:[52062.330671] Stack:

Message from sysl...@toby at Sep  4 09:11:57 ...
 kernel:[52062.330671] Call Trace:

Message from sysl...@toby at Sep  4 09:11:57 ...
 kernel:[52062.330671] Code: 74 0f 48 89 ef e8 24 07 00 00 eb 05
bb fe ff ff ff 89 d8 5b 5d 41 5c c3 48 85 ff 74 0e 48 8b 7f 30
48 85 ff 74 05 48 85 f6 75 04 0f 0b eb fe ba 02 00 00 00 e9 5d
ff ff ff 55 53 48 89 fb 48 c7 

The machine appears to freeze in various ways - e.g. you can't get a login
prompt to have a sniff around and the first command you type at an
existing shell prompt that requires disk IO freezes, and a "sleep 300;
sudo reboot -f" doesn't do anything.  On the other hand a "for f in
$(seq 1000); do echo $f; sleep 1; done" continues on as though nothing
has happened.  Disk IO is probably borked.




Archive: http://lists.debian.org/1283556425.23992.14.ca...@russell-laptop




2010-08-30 Thread Russell Stuart
The problem disappears in
linux-image-2.6.35-trunk-amd64_2.6.35-1~experimental.2.




Archive: http://lists.debian.org/1283153685.4366.2.ca...@russell-laptop




2010-08-30 Thread Ben Hutchings
On Mon, 2010-08-30 at 17:34 +1000, Russell Stuart wrote:
> The problem disappears in
> linux-image-2.6.35-trunk-amd64_2.6.35-1~experimental.2.

Yes, as I expected.

Can you please test the attached patch against the version in unstable?
Directions for rebuilding an official kernel package are at
http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
From e5e7a7a14da22681abd96a305753e8cdcf898d40 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <b...@decadent.org.uk>
Date: Mon, 30 Aug 2010 14:38:14 +0100
Subject: [PATCH] tun: Don't add sysfs attributes to devices without sysfs directories

Prior to Linux 2.6.35, net devices outside the initial net namespace
did not have sysfs directories.  Attempting to add attributes to
them will trigger a BUG().

Reported-by: Russell Stuart <russell-deb...@stuart.id.au>
Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
---
 drivers/net/tun.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4fdfa2a..0f77aca 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1006,7 +1006,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		if (err < 0)
 			goto err_free_sk;
 
-		if (device_create_file(&tun->dev->dev, &dev_attr_tun_flags) ||
+		if (!net_eq(dev_net(tun->dev), &init_net) ||
+		    device_create_file(&tun->dev->dev, &dev_attr_tun_flags) ||
 		    device_create_file(&tun->dev->dev, &dev_attr_owner) ||
 		    device_create_file(&tun->dev->dev, &dev_attr_group))
 			printk(KERN_ERR "Failed to create tun sysfs files\n");
-- 
1.7.1


