Bug#690814: [squeeze-wheezy regression] disk activity provokes lockups on VIA EPIA CL-6000

2012-10-25 Thread Frank Lenaerts
--- On Wed, 10/17/12, Jonathan Nieder jrnie...@gmail.com wrote:

From: Jonathan Nieder jrnie...@gmail.com
Subject: Re: [squeeze-wheezy regression] disk activity provokes lockups on VIA 
EPIA CL-6000
To: Frank Lenaerts frank.lenae...@yahoo.com
Cc: 690...@bugs.debian.org
Date: Wednesday, October 17, 2012, 11:19 PM

# regression
severity 690814 important
quit

Hi Frank,

Frank Lenaerts wrote:

 I had to use the power button to restart the machine. The first time
 I rebooted, the system got stuck when trying to mount one of the
 filesystems. Rebooted again and got the login prompt. Did some more
 reboots and found out that the system hangs during more or less 50%
 of the reboots. Since the disk activity LED was always on, I tried
 to provoke some disk activity e.g. by installing some packages. This
 effectively locked up the system. I only once got the system in a
 locked up state without having the disk activity LED turned on.
[...]
 Since this machine had been running Lenny just fine, and Squeeze
 also worked fine, I decided to install a 2.6 kernel.
[...]
                          Note that it took several reboots to get
 the deb file on the system and to install it. With this kernel, the
 box runs just fine.

Thanks for reporting it.

A few suggestions for moving forward:
[...snipped...]

 * if you have time to run a bisection search through the pre-compiled
   kernels at http://snapshot.debian.org/package/linux-2.6/ to find
   the first broken one, that could help narrow down things quite a bit.

Tested some of the linux images:

- 2.6.39 seems ok
- 3.0.0-1 seems ok
- 3.1.0-1 seems ok
- 3.2.0-1 does not seem to be ok:
  - first boot: system hung at Loading, please wait...; note that the
    LED indicating disk activity was not burning
  - second boot: ok
  - third boot: ok

Had it running and used it for a few days. Then decided to give 3.2.0-3 a
try again.

- 3.2.0-3 is not ok:
  - first boot: crash during the boot process i.e. detected the external
    USB disk (which I had connected again while working with 3.2.0-1),
    makefile style concurrent boot, hotplug, udev... stacktrace (not sure
    if it was after or before the hotplug stuff; unfortunately didn't have
    logging turned on and keyboard was stuck)... boot process continued a
    bit and then hung; note that the LED was not on; since I've now seen
    quite a few lockups without this LED being on, I think it does not
    necessarily have anything to do with disk activity
  - second boot: stuck in Configuring network interfaces; note that I've
    seen this more than once already

So, it looks like we'll have to investigate the differences between -1 and -3.

Question: 3.2.0-3 was the one installed during the installation. 3.2.0-1 was
installed by me for testing purposes and came from 3.2.1-1. I see that the
snapshot directory contains other 3.2.0-1 e.g. under 3.2.2-1. Why is it like
that? I see that the 3.2.19-1 directory contains 3.2.0-2 for instance and it
seems that 3.2.0-3 is not in the snapshot directory...

Hope that helps, and sorry I have no better ideas,
Jonathan

[1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt


2012-10-25 Thread Jonathan Nieder
Frank Lenaerts wrote:

 Tested some of the linux images:

 - 2.6.39 seems ok
 - 3.0.0-1 seems ok
 - 3.1.0-1 seems ok
 - 3.2.0-1 does not seem to be ok:
   - first boot: system hung at Loading, please wait...; note that
     the LED indicating disk activity was not burning
   - second boot: ok
   - third boot: ok

 Had it running and used it for a few days. Then decided to give
 3.2.0-3 a try again.

 - 3.2.0-3 is not ok:

Thanks much for this.

[...]
 - first boot: crash during the boot process i.e. detected the
   external USB disk (which I had connected again while working with
   3.2.0-1), makefile style concurrent boot, hotplug, udev...
   stacktrace (not sure if it was after or before the hotplug stuff;
   unfortunately didn't have logging turned on and keyboard was
   stuck)...

For the future, photographs work fine if you can catch the machine at
the right moment.

[...]
 Question: 3.2.0-3 was the one installed during the installation.
 3.2.0-1 was installed by me for testing purposes and came from
 3.2.1-1. I see that the snapshot directory contains other 3.2.0-1
 e.g. under 3.2.2-1. Why is it like that?

Package names like linux-image-3.2.0-1-486 describe the kernel's
ABI, not the package version.  The package version is something like
3.2.1-1.  See http://kernel-handbook.alioth.debian.org/ch-versions.html
for more details.

You can get the version of the currently running kernel by running
cat /proc/version (it will be in parentheses).  The currently
installed kernel's version number can be retrieved with
dpkg-query -W linux-image-$(uname -r).
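
The difference between the ABI name and the package version can be checked on a running system. The following is a minimal sketch, assuming the /proc/version line carries the package version as "(Debian <version>)"; other kernel builds may format that line differently. The helper name kernel_pkg_version and the sample line are illustrative and were not taken from the affected machine.

```shell
# Extract the Debian package version from a /proc/version-style line on stdin.
# Assumes the "(Debian <version>)" form; prints nothing if that form is absent.
kernel_pkg_version() {
    sed -n 's/^Linux version [^ ]* (Debian \([^)]*\)).*/\1/p'
}

# Illustrative sample line (hypothetical, not from the reporter's box).
sample='Linux version 3.2.0-3-486 (Debian 3.2.23-1) (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1 Mon Jul 23 02:18:32 UTC 2012'
printf '%s\n' "$sample" | kernel_pkg_version

# On a live system:
#   uname -r                                  # ABI name of the running kernel
#   kernel_pkg_version < /proc/version        # package version of the running kernel
#   dpkg-query -W "linux-image-$(uname -r)"   # package version currently installed
```

If the last two commands disagree, the installed package has been upgraded since the machine was booted.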

Which versions were the 3.2.0-1, 3.2.0-2, and 3.2.0-3 kernels
you mentioned testing above?

Hope that helps,
Jonathan


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20121025203420.GF30334@elie.Belkin



2012-10-18 Thread Frank Lenaerts


--- On Wed, 10/17/12, Jonathan Nieder jrnie...@gmail.com wrote:

From: Jonathan Nieder jrnie...@gmail.com
Subject: Re: [squeeze-wheezy regression] disk activity provokes lockups on VIA 
EPIA CL-6000
To: Frank Lenaerts frank.lenae...@yahoo.com
Cc: 690...@bugs.debian.org
Date: Wednesday, October 17, 2012, 11:19 PM

# regression
severity 690814 important
quit

Hi Frank,

Frank Lenaerts wrote:

 I had to use the power button to restart the machine. The first time
 I rebooted, the system got stuck when trying to mount one of the
 filesystems. Rebooted again and got the login prompt. Did some more
 reboots and found out that the system hangs during more or less 50%
 of the reboots. Since the disk activity LED was always on, I tried
 to provoke some disk activity e.g. by installing some packages. This
 effectively locked up the system. I only once got the system in a
 locked up state without having the disk activity LED turned on.
[...]
 Since this machine had been running Lenny just fine, and Squeeze
 also worked fine, I decided to install a 2.6 kernel.
[...]
                          Note that it took several reboots to get
 the deb file on the system and to install it. With this kernel, the
 box runs just fine.

Thanks for reporting it.

A few suggestions for moving forward:

 * please attach full dmesg output from a normal boot (with the
   2.6.32-based kernel)

The tarball I attached to this e-mail contains the dmesg output from
2.6.32-5-486 and 3.2.0-3-486 (sometimes I can log in and even do some things ;-))

 * could you also get a kernel log from booting the 3.2-based kernel?
   A full log including the lockup would be ideal --- netconsole[1]
   might help here.

The attached tarball contains logs from several boot attempts. Some more
information:

- netconsole_1.out: netconsole only shows output up to 'loop: module loaded',
  while the console also showed 'Loading kernel module loop.' (after the
  above 'loop: module loaded'), 'Activating lvm and md swap... done' and
  'Checking filesystems... fsck from util-linux 2.20.1'. I don't know why
  these 3 lines are not visible via netconsole. It looks like the box cannot
  write to the network anymore.

- netconsole_2.out: netconsole only shows output up to 'eth0: no IPv6 routers
  present', while the console also showed 'Activating swap...'

- netconsole_3_A-B-C.out: A was like netconsole_1.out, B was like
  netconsole_2.out, C was like netconsole_1.out
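
For reference, a netconsole setup of the kind that produces such logs might look like the sketch below. It only assembles the module parameter in the form described in the netconsole documentation [1]; the helper name netconsole_param and all ports, addresses and the MAC address are hypothetical placeholders, not the values used for the logs in the tarball.

```shell
# Build a netconsole= parameter in the documented form:
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Arguments: $1 src-port  $2 src-ip  $3 dev  $4 tgt-port  $5 tgt-ip  $6 tgt-mac
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Hypothetical addresses; substitute those of the test box and the log host.
netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.2 00:11:22:33:44:55

# On the test box (as root), assuming netconsole is built as a module:
#   modprobe netconsole "$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.2 00:11:22:33:44:55)"
# On the log host (option syntax differs between netcat variants):
#   nc -u -l -p 6666 | tee netconsole.out
```

Since netconsole depends on the network driver still being able to transmit, messages printed shortly before a hang may never reach the log host, which would be consistent with the missing lines described above.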

After this, I booted with 2.6 because I could not get 3.2 far enough to be able 
to login. When I booted with 2.6 it wanted to fsck an external USB disk. Since 
this would take too long, I unplugged it (so further logs won't show /dev/sdc 
anymore).

Instead of using netconsole, I used screen on ttyS0. Finally, I could get to
the login prompt of 3.2. When I issued a find /usr -ls over an ssh connection,
the system locked up (no keyboard interaction possible anymore). Note that
screenlog_3.2.0-3-486.0 doesn't show anything useful (on the linux
commandline, I specified console=ttyS0 and debug).

Since the successful 3.2 boot was without (a) the USB disk and (b) 
netconsole, I decided to try to boot with netconsole (and without the USB 
disk). It booted a little bit further than before but I still couldn't get to 
the login prompt. Rebooted several times, see netconsole_4_a-b-c.out. Some 
extra notes:

- a: hung at 90-second grace period

- b: hung at loop: module loaded

- c: login was possible; netconsole also ended with 90-second grace period 
like in case a; to see if the network is the showstopper, ran find /usr -ls on 
the console; this worked, just like find /var -ls; when I ran an apt-get update 
however, the system locked up (all files were downloaded, but it failed at the 
end (percentage sign stuck)).

 * if you have time to run a bisection search through the pre-compiled
   kernels at http://snapshot.debian.org/package/linux-2.6/ to find
   the first broken one, that could help narrow down things quite a bit.

I put it on my TODO list.

Hope that helps, and sorry I have no better ideas,
Jonathan

[1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt


dbts-690814.tar.gz
Description: GNU Zip compressed data


2012-10-17 Thread Jonathan Nieder
# regression
severity 690814 important
quit

Hi Frank,

Frank Lenaerts wrote:

 I had to use the power button to restart the machine. The first time
 I rebooted, the system got stuck when trying to mount one of the
 filesystems. Rebooted again and got the login prompt. Did some more
 reboots and found out that the system hangs during more or less 50%
 of the reboots. Since the disk activity LED was always on, I tried
 to provoke some disk activity e.g. by installing some packages. This
 effectively locked up the system. I only once got the system in a
 locked up state without having the disk activity LED turned on.
[...]
 Since this machine had been running Lenny just fine, and Squeeze
 also worked fine, I decided to install a 2.6 kernel.
[...]
  Note that it took several reboots to get
 the deb file on the system and to install it. With this kernel, the
 box runs just fine.

Thanks for reporting it.

A few suggestions for moving forward:

 * please attach full dmesg output from a normal boot (with the
   2.6.32-based kernel)

 * could you also get a kernel log from booting the 3.2-based kernel?
   A full log including the lockup would be ideal --- netconsole[1]
   might help here.

 * if you have time to run a bisection search through the pre-compiled
   kernels at http://snapshot.debian.org/package/linux-2.6/ to find
   the first broken one, that could help narrow down things quite a bit.
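
The bisection suggested in the last item can be sketched as a binary search over an ordered list of kernel versions. This is a minimal sketch, not a tested procedure: is_good is a hypothetical predicate standing in for the manual step of installing a kernel and booting it, and the stub below simply encodes the verdicts reported elsewhere in this thread. The labels are the ABI names used in those reports; the entries on snapshot.debian.org are package versions. Because the hang shows up on only about half of the boots, a kernel should be booted several times before it is counted as good.

```shell
# first_bad VERSION... : print the first version for which is_good fails.
# Assumes the versions are ordered oldest to newest, the first one is known
# good and the last one is known bad.
first_bad() {
    lo=1      # index of a known-good version
    hi=$#     # index of a known-bad version
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"
        if is_good "$candidate"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    eval "candidate=\${$hi}"
    printf '%s\n' "$candidate"
}

# Hypothetical stand-in for manual testing: exit 0 means "no hang seen".
# The verdicts mirror the results reported in this thread.
is_good() {
    case "$1" in
        2.6.39|3.0.0-1|3.1.0-1) return 0 ;;
        *) return 1 ;;
    esac
}

first_bad 2.6.39 3.0.0-1 3.1.0-1 3.2.0-1 3.2.0-3
```

With n candidate versions this needs about log2(n) test rounds instead of n, which matters when each round means several reboots.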

Hope that helps, and sorry I have no better ideas,
Jonathan

[1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt


-- 
Archive: http://lists.debian.org/20121017211906.GG12456@elie.Belkin