Bug#690814: [squeeze-wheezy regression] disk activity provokes lockups on VIA EPIA CL-6000
--- On Wed, 10/17/12, Jonathan Nieder jrnie...@gmail.com wrote:

> From: Jonathan Nieder jrnie...@gmail.com
> Subject: Re: [squeeze-wheezy regression] disk activity provokes lockups on VIA EPIA CL-6000
> To: Frank Lenaerts frank.lenae...@yahoo.com
> Cc: 690...@bugs.debian.org
> Date: Wednesday, October 17, 2012, 11:19 PM
>
> # regression
> severity 690814 important
> quit
>
> Hi Frank,
>
> Frank Lenaerts wrote:
>
>> I had to use the power button to restart the machine. The first time I rebooted, the system got stuck when trying to mount one of the filesystems. Rebooted again and got the login prompt. Did some more reboots and found out that the system hangs during more or less 50% of the reboots. Since the disk activity LED was always on, I tried to provoke some disk activity e.g. by installing some packages. This effectively locked up the system. I only once got the system in a locked up state without having the disk activity LED turned on.
> [...]
>> Since this machine had been running Lenny just fine, and Squeeze also worked fine, I decided to install a 2.6 kernel.
> [...]
>> Note that it took several reboots to get the deb file on the system and to install it. With this kernel, the box runs just fine.
>
> Thanks for reporting it. A few suggestions for moving forward:

[...snipped...]

>  * if you have time to run a bisection search through the pre-compiled kernels at http://snapshot.debian.org/package/linux-2.6/ to find the first broken one, that could help narrow down things quite a bit.

Tested some of the linux images:

- 2.6.39 seems ok
- 3.0.0-1 seems ok
- 3.1.0-1 seems ok
- 3.2.0-1 does not seem to be ok:
  - first boot: system hung at "Loading, please wait..."; note that the LED indicating disk activity was not lit
  - second boot: ok
  - third boot: ok

  Had it running and used it for a few days. Then decided to give 3.2.0-3 a try again.

- 3.2.0-3 is not ok:
  - first boot: crash during the boot process, i.e. it detected the external USB disk (which I had connected again while working with 3.2.0-1), makefile-style concurrent boot, hotplug, udev... stack trace (not sure if it was after or before the hotplug stuff; unfortunately I didn't have logging turned on and the keyboard was stuck)... the boot process continued a bit and then hung; note that the LED was not on; since I've now seen quite a few lockups without this LED being on, I think the problem does not necessarily have anything to do with disk activity
  - second boot: stuck in "Configuring network interfaces"; note that I've seen this more than once already

So, it looks like we'll have to investigate the differences between -1 and -3.

Question: 3.2.0-3 was the one installed during the installation. 3.2.0-1 was installed by me for testing purposes and came from 3.2.1-1. I see that the snapshot directory contains other 3.2.0-1 packages, e.g. under 3.2.2-1. Why is it like that? I see that the 3.2.19-1 directory contains 3.2.0-2, for instance, and it seems that 3.2.0-3 is not in the snapshot directory...

> Hope that helps, and sorry I have no better ideas,
> Jonathan
>
> [1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt
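The bisection suggested above can be sketched roughly as follows. This is only an illustration, not a tool anyone in this thread used: the function names `first_bad` and `boots_needed` are made up here, the kernel labels are copied as Frank wrote them, and the sketch assumes that every kernel after the first broken one is broken too. Because the hangs are intermittent (about 50% of reboots were reported to hang on the installed kernel), a single clean boot is weak evidence, so the second helper estimates how many clean boots are needed before calling a kernel good, assuming boots fail independently at a fixed rate.

```python
from typing import Callable, Optional, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Return the oldest version for which is_bad() is true, or None.

    Assumes versions are ordered oldest to newest and that every version
    after the first bad one is bad as well.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid          # first bad version is at mid or earlier
        else:
            lo = mid + 1      # first bad version is after mid
    return versions[lo] if lo < len(versions) else None


def boots_needed(hang_rate: float, max_false_good: float) -> int:
    """Clean boots required so that a broken kernel passes by luck
    with probability at most max_false_good.

    Assumes each boot of a broken kernel hangs independently with
    probability hang_rate.
    """
    if not 0.0 < hang_rate <= 1.0 or not 0.0 < max_false_good < 1.0:
        raise ValueError("hang_rate must be in (0, 1], max_false_good in (0, 1)")
    boots = 1
    while (1.0 - hang_rate) ** boots > max_false_good:
        boots += 1
    return boots


# Results as reported in this message (True means lockups were seen).
REPORTED = {
    "2.6.39": False,
    "3.0.0-1": False,
    "3.1.0-1": False,
    "3.2.0-1": True,
    "3.2.0-3": True,
}
ORDERED = list(REPORTED)  # insertion order is oldest to newest
```

With the reported results, `first_bad(ORDERED, REPORTED.__getitem__)` lands on `3.2.0-1`, and `boots_needed(0.5, 0.05)` suggests five clean boots before trusting a kernel, under the independence assumption stated above.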
Bug#690814: [squeeze-wheezy regression] disk activity provokes lockups on VIA EPIA CL-6000
Frank Lenaerts wrote:

> Tested some of the linux images:
>
> - 2.6.39 seems ok
> - 3.0.0-1 seems ok
> - 3.1.0-1 seems ok
> - 3.2.0-1 does not seem to be ok:
>   - first boot: system hung at "Loading, please wait..."; note that the LED indicating disk activity was not lit
>   - second boot: ok
>   - third boot: ok
>
>   Had it running and used it for a few days. Then decided to give 3.2.0-3 a try again.
>
> - 3.2.0-3 is not ok:

Thanks much for this.

[...]
>   - first boot: crash during the boot process, i.e. it detected the external USB disk (which I had connected again while working with 3.2.0-1), makefile-style concurrent boot, hotplug, udev... stack trace (not sure if it was after or before the hotplug stuff; unfortunately I didn't have logging turned on and the keyboard was stuck)...

For the future, photographs work fine if you can catch the machine at the right moment.

[...]
> Question: 3.2.0-3 was the one installed during the installation. 3.2.0-1 was installed by me for testing purposes and came from 3.2.1-1. I see that the snapshot directory contains other 3.2.0-1 packages, e.g. under 3.2.2-1. Why is it like that?

Package names like linux-image-3.2.0-1-486 describe the kernel's ABI, not the package version. The package version is something like 3.2.1-1. See http://kernel-handbook.alioth.debian.org/ch-versions.html for more details.

You can get the version of the currently running kernel by running

    cat /proc/version

(it will be in parentheses). The version number of the installed package for the running kernel can be retrieved with

    dpkg-query -W linux-image-$(uname -r)

Which versions were the 3.2.0-1, 3.2.0-2, and 3.2.0-3 kernels you mentioned testing above?

Hope that helps,
Jonathan

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20121025203420.GF30334@elie.Belkin
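To illustrate the distinction between the ABI name and the package version, here is a minimal sketch that splits a package name like linux-image-3.2.0-1-486 into its parts. The helper `parse_image_name` and the `KernelPackage` type are hypothetical names invented for this example, and the pattern only covers names of the shape mentioned in this thread; it is not an official Debian parser.

```python
import re
from typing import NamedTuple


class KernelPackage(NamedTuple):
    upstream: str  # upstream series encoded in the name, e.g. "3.2.0"
    abi: str       # ABI number, e.g. "1"
    flavour: str   # flavour, e.g. "486"


# Assumed shape: linux-image-<upstream>-<abi>-<flavour>
_NAME_RE = re.compile(r"^linux-image-(\d+(?:\.\d+)+)-(\d+)-(.+)$")


def parse_image_name(name: str) -> KernelPackage:
    """Split a linux-image package name into upstream, ABI and flavour."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised package name: {name!r}")
    return KernelPackage(*match.groups())
```

For example, `parse_image_name("linux-image-3.2.0-1-486")` yields upstream `3.2.0`, ABI `1` and flavour `486`. None of these fields is the package version (such as 3.2.1-1): that cannot be derived from the name and has to be looked up, for instance with the dpkg-query command quoted above.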
Bug#690814: [squeeze-wheezy regression] disk activity provokes lockups on VIA EPIA CL-6000
--- On Wed, 10/17/12, Jonathan Nieder jrnie...@gmail.com wrote:

> From: Jonathan Nieder jrnie...@gmail.com
> Subject: Re: [squeeze-wheezy regression] disk activity provokes lockups on VIA EPIA CL-6000
> To: Frank Lenaerts frank.lenae...@yahoo.com
> Cc: 690...@bugs.debian.org
> Date: Wednesday, October 17, 2012, 11:19 PM
>
> # regression
> severity 690814 important
> quit
>
> Hi Frank,
>
> Frank Lenaerts wrote:
>
>> I had to use the power button to restart the machine. The first time I rebooted, the system got stuck when trying to mount one of the filesystems. Rebooted again and got the login prompt. Did some more reboots and found out that the system hangs during more or less 50% of the reboots. Since the disk activity LED was always on, I tried to provoke some disk activity e.g. by installing some packages. This effectively locked up the system. I only once got the system in a locked up state without having the disk activity LED turned on.
> [...]
>> Since this machine had been running Lenny just fine, and Squeeze also worked fine, I decided to install a 2.6 kernel.
> [...]
>> Note that it took several reboots to get the deb file on the system and to install it. With this kernel, the box runs just fine.
>
> Thanks for reporting it. A few suggestions for moving forward:
>
>  * please attach full dmesg output from a normal boot (with the 2.6.32-based kernel)

The tarball I attached to this e-mail contains the dmesg output from 2.6.32-5-486 and 3.2.0-3-486 (sometimes I can log in and even do some things ;-))

>  * could you also get a kernel log from booting the 3.2-based kernel? A full log including the lockup would be ideal --- netconsole[1] might help here.

The attached tarball contains logs from several boot attempts. Some more information:

- netconsole_1.out: netconsole only shows output up to 'loop: module loaded', while the console also showed 'Loading kernel module loop.' (after the above 'loop: module loaded'), 'Activating lvm and md swap... done' and 'Checking filesystems... fsck from util-linux 2.20.1'. I don't know why these 3 lines are not visible via netconsole. It looks like the box cannot write to the network anymore.
- netconsole_2.out: netconsole only shows output up to 'eth0: no IPv6 routers present', while the console also showed 'Activating swap...'
- netconsole_3_A-B-C.out: A was like netconsole_1.out, B was like netconsole_2.out, C was like netconsole_1.out

After this, I booted with 2.6 because I could not get 3.2 far enough to be able to log in. When I booted with 2.6 it wanted to fsck an external USB disk. Since this would take too long, I unplugged it (so further logs won't show /dev/sdc anymore).

Instead of using netconsole, I used screen on ttyS0. Finally, I could get to the login prompt of 3.2. When I issued a find /usr -ls over an ssh connection, the system locked up (no keyboard interaction possible anymore). Note that screenlog_3.2.0-3-486.0 doesn't show anything useful (on the kernel command line, I specified console=ttyS0 and debug).

Since the successful 3.2 boot was without (a) the USB disk and (b) netconsole, I decided to try to boot with netconsole (and without the USB disk). It booted a little bit further than before but I still couldn't get to the login prompt. Rebooted several times, see netconsole_4_a-b-c.out. Some extra notes:

- a: hung at 90-second grace period
- b: hung at loop: module loaded
- c: login was possible; netconsole also ended with 90-second grace period like in case a; to see if the network is the showstopper, I ran find /usr -ls on the console; this worked, just like find /var -ls; when I ran an apt-get update, however, the system locked up (all files were downloaded, but it failed at the end (percentage sign stuck)).

>  * if you have time to run a bisection search through the pre-compiled kernels at http://snapshot.debian.org/package/linux-2.6/ to find the first broken one, that could help narrow down things quite a bit.

I put it on my TODO list.

> Hope that helps, and sorry I have no better ideas,
> Jonathan
>
> [1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt

dbts-690814.tar.gz
Description: GNU Zip compressed data
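For readers who want to capture netconsole output themselves, the following is a minimal sketch of a receiver on the logging machine. It relies on netconsole sending kernel messages as UDP datagrams; the target port 6666 is the documented default, while the function names `open_listener` and `collect` are made up for this example. The tools actually used to produce the logs in the attached tarball are not stated in this thread.

```python
import socket
from typing import List


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Open a UDP socket on which netconsole datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect(sock: socket.socket, max_datagrams: int, timeout: float = 5.0) -> List[str]:
    """Read up to max_datagrams messages, stopping early once no
    datagram arrives within timeout seconds."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(max_datagrams):
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break  # sender went quiet, e.g. because the box locked up
        messages.append(data.decode("utf-8", errors="replace"))
    return messages
```

A caller would typically run `collect(open_listener(), 1000)` before booting the test machine and write the returned messages to a file. As the notes above show, the last message received is not necessarily the last thing the kernel printed, so comparing against the local or serial console remains useful.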
Bug#690814: [squeeze-wheezy regression] disk activity provokes lockups on VIA EPIA CL-6000
# regression
severity 690814 important
quit

Hi Frank,

Frank Lenaerts wrote:

> I had to use the power button to restart the machine. The first time I rebooted, the system got stuck when trying to mount one of the filesystems. Rebooted again and got the login prompt. Did some more reboots and found out that the system hangs during more or less 50% of the reboots. Since the disk activity LED was always on, I tried to provoke some disk activity e.g. by installing some packages. This effectively locked up the system. I only once got the system in a locked up state without having the disk activity LED turned on.
[...]
> Since this machine had been running Lenny just fine, and Squeeze also worked fine, I decided to install a 2.6 kernel.
[...]
> Note that it took several reboots to get the deb file on the system and to install it. With this kernel, the box runs just fine.

Thanks for reporting it. A few suggestions for moving forward:

 * please attach full dmesg output from a normal boot (with the 2.6.32-based kernel)

 * could you also get a kernel log from booting the 3.2-based kernel? A full log including the lockup would be ideal --- netconsole[1] might help here.

 * if you have time to run a bisection search through the pre-compiled kernels at http://snapshot.debian.org/package/linux-2.6/ to find the first broken one, that could help narrow down things quite a bit.

Hope that helps, and sorry I have no better ideas,
Jonathan

[1] http://www.kernel.org/doc/Documentation/networking/netconsole.txt

Archive: http://lists.debian.org/20121017211906.GG12456@elie.Belkin