Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
On 04/09/06, Eito Tamura [EMAIL PROTECTED] wrote:
> Hi George,
> I am having the same problem as you did. So installing the apmd package didn't fix the problem? Is it just those crappy HDDs? Might upgrading the kernel fix this problem?

Hello,

The problem is still there, although it does not occur very often (once every few months or so). The few machines that still have those HDDs are running the latest Sarge 2.6.8-3 kernel and acpid, with DMA and drive power management disabled (through hdparm). To be honest though, we have swapped out the HDDs on most of them for Compact Flash cards, and those machines have been rock solid since.

I don't know if the Etch / Sid kernels will fix this, as I never got around to trying them out (these being firewalls, running Stable is really a requirement). I probably should, since Etch is due in a few months. If you try this, feel free to let me know how it went.

HTH,
George.

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
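George does not say which hdparm options he used to disable drive power management. As a sketch, assuming the -B (Advanced Power Management) option that comes up elsewhere in this thread, the hypothetical helpers below describe a -B level using the ranges given in the hdparm(8) manual page and build the matching command line. They only build the argument list; nothing is executed.

```python
def describe_apm_level(level):
    """Describe an `hdparm -B` Advanced Power Management level.

    Ranges as documented in hdparm(8): 1-127 permit spin-down,
    128-254 do not, and 255 asks the drive to turn APM off
    (not every drive supports that).
    """
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError("APM level must be an integer")
    if not 1 <= level <= 255:
        raise ValueError("hdparm -B accepts levels 1 to 255")
    if level == 255:
        return "APM disabled"
    if level >= 128:
        return "APM enabled, spin-down not permitted"
    return "APM enabled, spin-down permitted"


def apm_command(device, level):
    """Build (but do not run) the hdparm argument list, e.g. for subprocess.run."""
    describe_apm_level(level)  # validate the level before building the command
    return ["hdparm", "-B", str(level), device]
```

For example, apm_command("/dev/hdc", 255) gives ["hdparm", "-B", "255", "/dev/hdc"]. Note that hdparm(8) documents the standby (spin-down) timeout as a separate option, -S, so a -B setting alone does not cover it.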
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
Just to let you know that the 'fix' did not work. :-( The hard drives will still randomly go to sleep and not wake up. What's more, even a sysrq forced reboot does not wake the drive up - only a cold boot works! The end result is that we are gradually replacing these crappy disks in our firewalls with CompactFlash disks.
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
I think I finally fixed the problem by installing the apmd package. :-\

The problem, I am guessing, was that the hard drive went to sleep at some random time (despite being explicitly told not to in the BIOS), regardless of disk activity (so even my 15 min dd cron job did not prevent it), and nothing was there to detect this and tell the system that the drive was not dead, just sleeping. I'm not sure what apmd does when the drive goes to sleep, but it seems to do something useful.

Anyway, it's not a kernel bug (unless you think it should have managed without apmd), so please close this, and sorry to have wasted your time.

George.
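The thread does not say what apmd actually does here. Separately, one way to ask a drive whether it reports itself as asleep rather than dead is `hdparm -C`, which prints a line such as " drive state is:  active/idle". The sketch below assumes that output format and the state names active/idle, standby, sleeping and unknown; parse_power_state and is_spun_down are hypothetical helpers, not part of apmd or hdparm.

```python
import re

# hdparm -C prints e.g. "/dev/hdc:\n drive state is:  active/idle"
STATE_RE = re.compile(r"drive state is:\s*(\S+)")


def parse_power_state(output):
    """Extract the state word from `hdparm -C` output."""
    match = STATE_RE.search(output)
    if match is None:
        raise ValueError("no 'drive state is:' line in hdparm output")
    return match.group(1)


def is_spun_down(state):
    """True for the low-power states hdparm reports (standby or sleeping)."""
    return state in ("standby", "sleeping")
```

In a cron job, the input would be the captured output of `hdparm -C /dev/hdc`, for instance the `.stdout` of `subprocess.run([...], capture_output=True, text=True)`.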
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
On Tue, Nov 01, 2005 at 09:24:16AM +0000, George B. wrote:
> P.S. I could try unloading the ide-generic module etc. if you think that's a good idea.

I think it's certainly worth a try, though it's probably not going to make much difference.

-- 
Horms
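Before trying to unload ide-generic, it can help to check whether it is loaded and how many users it has. In 2.6 kernels each /proc/modules line starts with the module name, size, use count and the list of modules using it (a "-" in the use-count column when the kernel does not track it), and names are listed with underscores, so ide-generic appears as ide_generic. The helpers below are a hypothetical sketch under those assumptions; the unload itself would still be done with rmmod or modprobe -r.

```python
def parse_proc_modules(text):
    """Map module name -> (size, use_count, used_by) from /proc/modules text.

    use_count is None when the kernel prints "-" instead of a number.
    """
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, size, use_count, used_by = fields[:4]
        count = int(use_count) if use_count.isdigit() else None
        modules[name] = (int(size), count, used_by)
    return modules


def module_info(modules, name):
    """Look up a module, accepting either the ide-generic or ide_generic spelling."""
    return modules.get(name.replace("-", "_"))
```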
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
On 27/10/05, Horms [EMAIL PROTECTED] wrote:
> Ok, that does sound like fair reasoning, though I should say that almost always these kinds of errors show up faulty hardware. In this case, it's probably a bug.

Don't get me wrong, I think it may well be faulty hardware, but a general fault rather than one with a specific unit. Unfortunately, as such this would be way over my head.

Just for the record, the hdparm -B 255 hack did not work on at least one of the machines, and neither did a 15 min cron job of dd if=/dev/hdc of=/dev/null bs=1024 count=3000 seek=1000 > /dev/null :-(

One of the firewalls I still haven't rebooted; one died last night, and the other is still alive for the moment. The only thing all these boxes have in common (apart from identical HDDs, though one had a different one before) is a VIA chipset. How likely is it that there's a bug either in that, or in the driver for it? I am very confused :-(
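The cron job above is a plain dd read. In dd, skip= skips blocks of the input while seek= skips blocks of the output, so with of=/dev/null the seek=1000 operand does not change which part of /dev/hdc is read. The hypothetical Python sketch below performs an equivalent read with an explicit input offset. Reading the same region on every run may also be answered from the kernel's cache rather than the disk, which is one reason to keep the offset a parameter.

```python
def read_blocks(path, block_size=1024, count=3000, skip=0):
    """Read up to `count` blocks of `block_size` bytes from `path`, starting
    `skip` blocks into it, and throw the data away (like dd of=/dev/null).

    Returns the number of bytes actually read.
    """
    total = 0
    with open(path, "rb") as f:
        f.seek(skip * block_size)  # equivalent of dd's skip= (input side)
        for _ in range(count):
            chunk = f.read(block_size)
            if not chunk:
                break  # end of file or device
            total += len(chunk)
    return total
```

For example, read_blocks("/dev/hdc", skip=1000) would read 3000 KiB starting 1000 KiB into the disk; opening the device node normally requires root.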
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
Is there any chance that #336103 is related to this? http://bugs.debian.org/336103

-- 
Horms
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
On Mon, Oct 24, 2005 at 03:37:52PM +0100, George B. wrote:
> Package: kernel-image-2.6.8-2-386
> [...]
> I've been having the same problem on 3 firewall boxes: after a certain amount of time (days, weeks) the hard drives will either go into read-only mode or lock up for good (until a reboot) with I/O error messages.
> [...]

Your disks are failing, get new ones if you value your data.

-- 
Horms
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
> Your disks are failing, get new ones if you value your data.

I'm afraid I will have to disagree with you on this one. All of the disks were brand new, and the problem started showing up a few days or weeks after assembly. Also, one of the firewalls had a different (3.5") drive in it before, which displayed similar symptoms. Thinking the disk was dying, I swapped it, only to have the problem come back. (The 3.5" drive is being happily used by my brother now, with no problems whatsoever.) Also, 3 out of 3 is a bit of a high failure rate...

George B.
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
On Wed, Oct 26, 2005 at 10:42:46PM +0100, George B. wrote:
> > Your disks are failing, get new ones if you value your data.
> I'm afraid I will have to disagree with you on this one. All of the disks were brand new [...]
> Also, 3 out of 3 is a bit of a high failure rate...

Ok, that does sound like fair reasoning, though I should say that almost always these kinds of errors show up faulty hardware. In this case, it's probably a bug.

-- 
Horms
Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?
Package: kernel-image-2.6.8-2-386
Version: 2.6.8-16
Severity: important

Hello,

I've been having the same problem on 3 firewall boxes: after a certain amount of time (days, weeks) the hard drives will either go into read-only mode or lock up for good (until a reboot) with I/O error messages. I will report on this machine, as the others have not been rebooted yet, so they don't work properly yet (although they still forward/filter packets).

All the firewalls use Seagate ST92011A (20GB 2.5") drives and are based on VIA chipsets (I can't confirm whether these are identical, as one of the firewalls uses a different motherboard and is currently dead, with an input/output error on any command, until the next reboot).

This was a problem when I tried the latest 2.4 kernel in Sarge; it then seemed to go away when I switched to 2.6.8, but it is still there, it just takes much longer for the fault to occur. I have a feeling it is a Power Management problem, with the drive not waking up from deep sleep (this was proven experimentally with the 2.4 kernel). At the moment I am testing the hdparm -B 255 'solution'. Otherwise it's the 15 min ls -l / > /dev/null cron job :-S

The kernel is from APT and the modules are loaded by hotplug - no custom stuff. powermgmt-base is installed, but that's about it.

Here is some info:

lspci:
---
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia] (rev 05)
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia AGP]
0000:00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
0000:00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:07.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
0000:00:07.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1a)
0000:00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
0000:00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
0000:00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1 (rev 6a)
---

Dmesg error from this machine (with a futile attempt to force a remote reboot - is there a better way?):
---
eth1: no IPv6 routers present
eth0: no IPv6 routers present
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
HTB init, kernel part version 3.17
u32 classifier
    OLD policer on
hdc: dma_timer_expiry: dma status == 0x20
hdc: DMA timeout retry
hdc: timeout waiting for DMA
hdc: status timeout: status=0xd0 { Busy }
hdc: drive not ready for command
ide1: reset timed-out, status=0x80
hdc: status timeout: status=0x80 { Busy }
hdc: drive not ready for command
ide1: reset timed-out, status=0x80
end_request: I/O error, dev hdc, sector 33719
end_request: I/O error, dev hdc, sector 33727
end_request: I/O error, dev hdc, sector 33735
end_request: I/O error, dev hdc, sector 33743
end_request: I/O error, dev hdc, sector 33751
end_request: I/O error, dev hdc, sector 33759
end_request: I/O error, dev hdc, sector 33767
end_request: I/O error, dev hdc, sector 33775
end_request: I/O error, dev hdc, sector 33783
end_request: I/O error, dev hdc, sector 33791
end_request: I/O error, dev hdc, sector 33799
end_request: I/O error, dev hdc, sector 33807
end_request: I/O error, dev hdc, sector 33815
end_request: I/O error, dev hdc, sector 33823
end_request: I/O error, dev hdc, sector 33831
end_request: I/O error, dev hdc, sector 33839
end_request: I/O error, dev hdc, sector 33847
end_request: I/O error, dev hdc, sector 33855
end_request: I/O error, dev hdc, sector 33863
end_request: I/O error, dev hdc, sector 33871
end_request: I/O error, dev hdc, sector 33879
end_request: I/O error, dev hdc, sector 33887
end_request: I/O error, dev hdc, sector 33895
end_request: I/O error, dev hdc, sector 18982239
Buffer I/O error on device hdc5, logical block 782337
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250295
Buffer I/O error on device hdc5, logical block 565844
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250303
Buffer I/O error on device hdc5, logical block 565845
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250311
Buffer I/O error on device hdc5, logical block 565846
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17249919
Buffer I/O error on device hdc5, logical block 565797
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17245527
Buffer I/O error on device hdc5, logical block 565248
lost page write due to I/O error on hdc5
end_request: I/O
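To summarize a log like the one above, the end_request lines can be pulled out and grouped. This is a hypothetical Python sketch: failed_sectors collects the sector numbers per device, and group_runs collapses sectors that advance by a fixed step (8 in this log) into (first, last, count) runs. Because it scans the text with a regular expression rather than line by line, it also copes with a log that has been flattened onto one line, and it ignores a truncated final entry.

```python
import re
from collections import defaultdict

# Matches lines such as "end_request: I/O error, dev hdc, sector 33719"
END_REQUEST_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(log_text):
    """Return {device: [sector, ...]} in the order the errors appear."""
    result = defaultdict(list)
    for match in END_REQUEST_RE.finditer(log_text):
        result[match.group(1)].append(int(match.group(2)))
    return dict(result)


def group_runs(sectors, step=8):
    """Collapse sectors into (first, last, count) runs spaced `step` apart."""
    runs = []
    for sector in sorted(set(sectors)):
        if runs and sector - runs[-1][1] == step:
            first, _, count = runs[-1]
            runs[-1] = (first, sector, count + 1)
        else:
            runs.append((sector, sector, 1))
    return runs
```

Applied to the log above, the first 23 errors on hdc form the single run (33719, 33895, 23), which makes the scattered later sectors on hdc5 easier to spot.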