Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2006-09-05 Thread George B.

On 04/09/06, Eito Tamura [EMAIL PROTECTED] wrote:


Hi George,

I am having the same problem as you did. So installing the apmd package didn't
fix the problem?
Is it just those crappy HDDs? Might upgrading the kernel fix this problem?


Hello,

The problem is still there, although it does not occur very often
(once every few months or so). The few machines that still have those
HDDs have the latest Sarge 2.6.8-3 kernel, acpid, DMA disabled and drive
power management disabled (through hdparm).
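The message does not say exactly which hdparm options were used. As a minimal sketch, assuming -d0 (DMA off), -B255 (Advanced Power Management off; -B 255 is the setting mentioned elsewhere in this thread) and -S0 (no standby timeout), the settings could be applied from Python like this. The function names are hypothetical, and -B only has an effect on drives that support APM.

```python
import subprocess
from typing import List


def workaround_args(device: str) -> List[str]:
    """Build (but do not run) the hdparm command line for the workaround."""
    return [
        "hdparm",
        "-d0",    # turn off DMA for this drive
        "-B255",  # turn off Advanced Power Management (if the drive supports it)
        "-S0",    # turn off the standby (spin-down) timeout
        device,
    ]


def apply_workaround(device: str, dry_run: bool = True) -> List[str]:
    """Run hdparm (needs root) unless dry_run is set; return the command used."""
    args = workaround_args(device)
    if not dry_run:
        subprocess.run(args, check=True)
    return args
```

Settings like these are generally lost when the drive is power-cycled, so they would usually be reapplied at every boot, e.g. by calling apply_workaround("/dev/hdc", dry_run=False) as root from a boot script.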

To be honest though, we have swapped out the HDDs on most of them for
CompactFlash cards, and those machines have been rock solid
since.

I don't know if the Etch / Sid kernels will fix this, as I never got
around to trying this out (as these are firewalls, running Stable is
really a requirement). I probably should, since Etch is due in a few
months. If you try this, feel free to let me know how it went.


HTH,

George.


--
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2006-03-16 Thread George B.
Just to let you know that the 'fix' did not work. :-(

The hard drives will still randomly go to sleep and not wake up.
What's more, even a SysRq-forced reboot does not wake the drive up;
only a cold boot works!

The end result is that we are gradually replacing these crappy disks
in our firewalls with CompactFlash disks.



Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2005-11-20 Thread George B.
I think I finally fixed the problem by installing the apmd package. :-\

The problem, I am guessing, was that the hard drive went to sleep at
some random time (despite being explicitly told not to in the BIOS)
and regardless of disk activity (so even my 15 min dd cronjob did
not fix it), and nothing was there to detect this and tell the system
that the drive was not dead, just sleeping. I'm not sure what apmd
does when the drive goes to sleep, but it seems to do something
useful.
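One way to test the "sleeping, not dead" theory is to ask the drive for its power mode with hdparm -C, which reports one of unknown, active/idle, standby or sleeping. A minimal sketch, assuming hdparm prints a line of the form " drive state is:  standby"; the helper names are hypothetical:

```python
import subprocess
from typing import Optional

# States hdparm -C can report; standby and sleeping mean the drive has spun down.
ASLEEP_STATES = ("standby", "sleeping")


def parse_power_state(output: str) -> Optional[str]:
    """Return the state from hdparm -C output, or None if no state line is found."""
    for line in output.splitlines():
        if "drive state is:" in line:
            # " drive state is:  standby" -> "standby"
            return line.split(":", 1)[1].strip()
    return None


def is_asleep(state: Optional[str]) -> bool:
    """True if the reported state means the drive is not spinning."""
    return state in ASLEEP_STATES


def query_drive(device: str) -> Optional[str]:
    """Run hdparm -C on the device (normally needs root) and parse the result."""
    result = subprocess.run(["hdparm", "-C", device],
                            capture_output=True, text=True)
    return parse_power_state(result.stdout)
```

In ATA terms, a drive in sleep mode (as opposed to standby) normally needs a reset before it responds again, so hdparm -C itself may fail or time out on such a drive; a failed query is therefore also worth logging.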

Anyway, it's not a kernel bug (unless you think it should have managed
without apmd), so please close this, and sorry to have wasted your
time.


George.



Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2005-11-01 Thread Horms
On Tue, Nov 01, 2005 at 09:24:16AM +, George B. wrote:
 P.S. I could try unloading the ide-generic module etc. if you think
 that's a good idea.

I think it's certainly worth a try, though it's probably not going
to make much difference.

-- 
Horms





Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2005-10-31 Thread George B.
On 27/10/05, Horms [EMAIL PROTECTED] wrote:

 Ok, that does sound like fair reasoning, though I should
 say that these kinds of errors almost always turn out to be faulty hardware.
 In this case, it's probably a bug.

Don't get me wrong, I think it may well be faulty hardware, but a general
fault rather than one with a specific unit. Unfortunately, as such this
would be way over my head.

Just for the record, the hdparm -B 255 hack did not work on at least
one of the machines, and neither did a 15 min cronjob of dd if=/dev/hdc
of=/dev/null bs=1024 count=3000 seek=1000 > /dev/null :-(
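For reference, a minimal Python sketch of the same keep-alive read. It assumes the intent was to read from an offset into the disk: with dd that is skip=, since seek= positions the output file (here /dev/null). The function name and defaults are hypothetical, chosen to mirror bs=1024 count=3000 and an offset of 1000 blocks.

```python
# Hypothetical defaults mirroring the dd job: bs=1024, count=3000, offset 1000 blocks.
BLOCK_SIZE = 1024
COUNT = 3000
SKIP = 1000


def keepalive_read(path: str, block_size: int = BLOCK_SIZE,
                   count: int = COUNT, skip: int = SKIP) -> int:
    """Read `count` blocks of `block_size` bytes, starting `skip` blocks in.

    Returns the number of bytes actually read (fewer at end of device).
    """
    total = 0
    # buffering=0 turns off Python-level buffering only; the kernel
    # page cache still applies to these reads.
    with open(path, "rb", buffering=0) as dev:
        dev.seek(skip * block_size)
        for _ in range(count):
            chunk = dev.read(block_size)
            if not chunk:  # reached the end of the device or file
                break
            total += len(chunk)
    return total
```

It would be run as root from cron, e.g. keepalive_read("/dev/hdc"). One possible reason such a job fails to keep the drive busy: re-reading the same 3 MB every 15 minutes may well be served from the kernel page cache without touching the disk at all, so varying skip between runs would be a more reliable test.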

I still haven't rebooted one of the firewalls; another died last night,
and the third is still alive for the moment.

The only thing that all these boxes have in common (apart from
identical HDDs, though one had a different one before) is a VIA chipset.
How likely is it that there's a bug either in that or in the driver for
it?

I am very confused :-(



Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2005-10-31 Thread Horms
Is there any chance that #336103 is related to this?

http://bugs.debian.org/336103

-- 
Horms





Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2005-10-26 Thread Horms
On Mon, Oct 24, 2005 at 03:37:52PM +0100, George B. wrote:
 Package: kernel-image-2.6.8-2-386
 Version: 2.6.8-16
 Severity: important
 
 Hello,
 
 I've been having the same problem on 3 firewall boxes: after a certain
 amount of time (days, weeks) the hard drives will either go into read-only
 mode or lock up for good (until a reboot) with I/O error messages.
 
 I will report on this machine, as the others have not been
 rebooted yet, so they are still not working properly (although they still
 forward/filter packets).
 
 All the firewalls use Seagate ST92011A (20GB 2.5") drives and are based
 on VIA chipsets (I can't confirm that these are identical, as one of the
 firewalls uses a different motherboard and is currently dead, with an
 input/output error on any command, until the next reboot).
 
 This was a problem when I tried the latest 2.4 kernel in Sarge. It
 seemed to go away when I switched to 2.6.8, but it is still there; it
 just takes much longer for the fault to occur.
 
 I have a feeling it is a Power Management problem, with the drive not
 waking up from deep sleep (this was proven experimentally with the 2.4
 kernel). At the moment I am testing the hdparm -B 255 'solution'.
 Otherwise it's the 15 min ls -l / > /dev/null cron job :-S
 
 The kernel is from APT and the modules are loaded by hotplug - no custom
 stuff. powermgmt-base is installed, but that's about it.

Your disks are failing, get new ones if you value your data.

-- 
Horms





Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2005-10-26 Thread George B.

 Your disks are failing, get new ones if you value your data.

I'm afraid I will have to disagree with you on this one. All of the
disks were brand new, and the problem started showing up a few
days/weeks after assembly.

Also, one of the firewalls had a different (3.5") drive in it before
that displayed similar symptoms. Thinking the disk was dying, I swapped
it, only to have the problem come back. (The 3.5" drive is being
happily used by my brother now, with no problems whatsoever.)

Also, 3 out of 3 is a bit of a high failure rate...


George B.



Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2005-10-26 Thread Horms
On Wed, Oct 26, 2005 at 10:42:46PM +0100, George B. wrote:
 
  Your disks are failing, get new ones if you value your data.
 
 I'm afraid I will have to disagree with you on this one. All of the
 disks were brand new, and the problem started showing up a few
 days/weeks after assembly.
 
 Also, one of the firewalls had a different (3.5") drive in it before
 that displayed similar symptoms. Thinking the disk was dying, I swapped
 it, only to have the problem come back. (The 3.5" drive is being
 happily used by my brother now, with no problems whatsoever.)
 
 Also, 3 out of 3 is a bit of a high failure rate...

Ok, that does sound like fair reasoning, though I should
say that these kinds of errors almost always turn out to be faulty hardware.
In this case, it's probably a bug.

-- 
Horms





Bug#335538: kernel-image-2.6.8-2-386: Hard drive locks up - DMA or Power Management problem?

2005-10-24 Thread George B.
Package: kernel-image-2.6.8-2-386
Version: 2.6.8-16
Severity: important

Hello,

I've been having the same problem on 3 firewall boxes: after a certain
amount of time (days, weeks) the hard drives will either go into read-only
mode or lock up for good (until a reboot) with I/O error messages.
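To catch the lock-up early on the other boxes, the kernel log could be scanned for the end_request errors shown further down in this report. A minimal sketch, assuming the log lines have exactly the format seen there (e.g. "end_request: I/O error, dev hdc, sector 33719"); failed_sectors is a hypothetical helper:

```python
import re
from typing import Dict, Iterable, List

# Matches lines such as "end_request: I/O error, dev hdc, sector 33719".
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each device name to the sectors logged as failing, in log order."""
    result: Dict[str, List[int]] = {}
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            result.setdefault(match.group(1), []).append(int(match.group(2)))
    return result
```

It could be fed the output of dmesg, e.g. failed_sectors(subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()), and raise an alert as soon as any device appears in the result.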

I will report on this machine, as the others have not been
rebooted yet, so they are still not working properly (although they still
forward/filter packets).

All the firewalls use Seagate ST92011A (20GB 2.5") drives and are based
on VIA chipsets (I can't confirm that these are identical, as one of the
firewalls uses a different motherboard and is currently dead, with an
input/output error on any command, until the next reboot).

This was a problem when I tried the latest 2.4 kernel in Sarge. It
seemed to go away when I switched to 2.6.8, but it is still there; it
just takes much longer for the fault to occur.

I have a feeling it is a Power Management problem, with the drive not
waking up from deep sleep (this was proven experimentally with the 2.4
kernel). At the moment I am testing the hdparm -B 255 'solution'.
Otherwise it's the 15 min ls -l / > /dev/null cron job :-S

The kernel is from APT and the modules are loaded by hotplug - no custom
stuff. powermgmt-base is installed, but that's about it.

Here is some info:

lspci:
---
:00:00.0 Host bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia] (rev 05)
:00:01.0 PCI bridge: VIA Technologies, Inc. VT8601 [Apollo ProMedia AGP]
:00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
:00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
:00:07.2 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 1a)
:00:07.3 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 1a)
:00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
:00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
:00:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
:00:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
:00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
:01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1 (rev 6a)
---

dmesg errors from this machine (with a futile attempt to force a remote
reboot; is there a better way?):
---
eth1: no IPv6 routers present
eth0: no IPv6 routers present
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
HTB init, kernel part version 3.17
u32 classifier
OLD policer on
hdc: dma_timer_expiry: dma status == 0x20
hdc: DMA timeout retry
hdc: timeout waiting for DMA
hdc: status timeout: status=0xd0 { Busy }
hdc: drive not ready for command
ide1: reset timed-out, status=0x80
hdc: status timeout: status=0x80 { Busy }
hdc: drive not ready for command
ide1: reset timed-out, status=0x80
end_request: I/O error, dev hdc, sector 33719
end_request: I/O error, dev hdc, sector 33727
end_request: I/O error, dev hdc, sector 33735
end_request: I/O error, dev hdc, sector 33743
end_request: I/O error, dev hdc, sector 33751
end_request: I/O error, dev hdc, sector 33759
end_request: I/O error, dev hdc, sector 33767
end_request: I/O error, dev hdc, sector 33775
end_request: I/O error, dev hdc, sector 33783
end_request: I/O error, dev hdc, sector 33791
end_request: I/O error, dev hdc, sector 33799
end_request: I/O error, dev hdc, sector 33807
end_request: I/O error, dev hdc, sector 33815
end_request: I/O error, dev hdc, sector 33823
end_request: I/O error, dev hdc, sector 33831
end_request: I/O error, dev hdc, sector 33839
end_request: I/O error, dev hdc, sector 33847
end_request: I/O error, dev hdc, sector 33855
end_request: I/O error, dev hdc, sector 33863
end_request: I/O error, dev hdc, sector 33871
end_request: I/O error, dev hdc, sector 33879
end_request: I/O error, dev hdc, sector 33887
end_request: I/O error, dev hdc, sector 33895
end_request: I/O error, dev hdc, sector 18982239
Buffer I/O error on device hdc5, logical block 782337
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250295
Buffer I/O error on device hdc5, logical block 565844
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250303
Buffer I/O error on device hdc5, logical block 565845
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17250311
Buffer I/O error on device hdc5, logical block 565846
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17249919
Buffer I/O error on device hdc5, logical block 565797
lost page write due to I/O error on hdc5
end_request: I/O error, dev hdc, sector 17245527
Buffer I/O error on device hdc5, logical block 565248
lost page write due to I/O error on hdc5
end_request: I/O