Re: PROBLEM: Buffer I/O error on device hdg1, system freeze.

2005-03-18 Thread Robert Hancock
Nils Radtke wrote:
> Error 14 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
>   When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 f8 23 3e 56 e0  Error: UNC at LBA = 0x00563e23 = 5652003
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --
>   24 00 f8 07 3e 56 10 00  00:36:28.850  READ SECTOR(S) EXT
>   25 00 00 ff 3d 56 10 00  00:36:28.850  READ DMA EXT
>   25 00 00 ff 3c 56 10 00  00:36:28.850  READ DMA EXT
>   25 00 00 ff 3b 56 10 00  00:36:28.850  READ DMA EXT
>   25 00 00 ff 3a 56 10 00  00:36:28.850  READ DMA EXT
> Could you please explain what exactly these errors mean and what may
> have caused them?
> Is it possible that these transmission/xxx errors are caused
> by a bad card and/or driver?
>
> I'm asking because the disk never showed errors on the onboard IDE ports.
> Nils
This error is reported by the drive itself, indicating uncorrectable
errors when attempting to read data from the media. Unlike DMA timeout
errors, which can occasionally be caused by the controller or driver,
this sort of error is quite unlikely to be their fault. Almost certainly
the hard drive is failing.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PROBLEM: Buffer I/O error on device hdg1, system freeze.

2005-03-18 Thread Nils Radtke

Hi Bartlomiej,

Thanks for your link.

# >  hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
# >  hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=262311, high=0, low=262311, sector=262311
# >  ide: failed opcode was: unknown
# >  end_request: I/O error, dev hdg, sector 262311
# >  Buffer I/O error on device hdg1, logical block 131124
# > 
# >   fscking this disk freezes the entire system.
# > 
# >  The disk was remounted ro afterwards.
# >  The disk itself is OK. It is a new one.

# http://smartmontools.sf.net
Extract from /usr/share/doc/smartmontools/WARNINGS.gz:

SYSTEM:   Promise 20265 IDE-controller
PROBLEM:  Smartctl locks system solid when used on CDROM/DVD device
REPORTER: see link below
LINK: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=208964
NOTE: Problem seems to affect kernel 2.4.21 only.


SYSTEM:   Promise IDE-controllers and perhaps others also
PROBLEM:  System freezes under heavy load, perhaps when running SMART commands
REPORTER: Mario 'BitKoenig' Holbe [EMAIL PROTECTED]
LINK:     http://groups.google.de/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=1wUXW-2FA-9%40gated-at.bofh.it
NOTE: Before freezing, SYSLOG shows the following message(s)
  kernel: hdf: dma timer expiry: dma status == 0xXX
  where XX is two hexadecimal digits. This may be a kernel bug
  or an underlying hardware problem.  It's not clear if
  smartmontools plays a role in provoking this problem.
  FINAL NOTE: Problem was COMPLETELY resolved by replacing the power
  supply.  See URL above, entry on May 29, 2004 by Holbe.  Other
  things to try are exchanging cables, and cleaning PCI slots.


This sounds very familiar and suggests at least a potential hidden
correlation between this kind of error and the Promise PDC controller
drivers.
OK, maybe I'm prejudiced now. We'll see.
A year ago, other disks (IBM/WD) also had trouble on the PDC, but not on the
onboard controllers. And they are still spinning today (that is, they did not
have to be replaced because of hard disk errors).

The fact is, however, that, as I mailed last year, the problem persists
across every kernel version, even after a complete exchange of mainboard
and processor. Furthermore, countless posts report similar or identical
symptoms.

Nevertheless, I'll keep the list up to date in case of new info.

smartctl -a /dev/hdc gives:
Error 18 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 f8 a8 05 c3 e0  Error: UNC at LBA = 0x00c305a8 = 12780968

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --    
  24 00 f8 a7 05 c3 06 00  00:08:14.850  READ SECTOR(S) EXT
  25 00 00 9f 05 c3 06 00  00:08:14.850  READ DMA EXT
  25 00 00 9f 04 c3 06 00  00:08:14.850  READ DMA EXT
  25 00 00 9f 03 c3 06 00  00:08:14.850  READ DMA EXT
  25 00 00 9f 02 c3 06 00  00:08:14.850  READ DMA EXT

Error 17 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 47 06 c3 e0  Error: UNC at LBA = 0x00c30647 = 12781127

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --    
  25 00 00 9f 05 c3 06 00  00:07:48.550  READ DMA EXT
  25 00 00 9f 04 c3 06 00  00:07:48.550  READ DMA EXT
  25 00 00 9f 03 c3 06 00  00:07:48.550  READ DMA EXT
  25 00 00 9f 02 c3 06 00  00:07:48.550  READ DMA EXT
  25 00 00 9f 01 c3 06 00  00:07:48.550  READ DMA EXT

Error 16 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 20 b0 f2 57 e0  Error: UNC at LBA = 0x0057f2b0 = 5763760

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --    
  24 00 20 af f2 57 10 00  00:43:45.600  READ SECTOR(S) EXT
  25 00 28 a7 f2 57 10 00  00:43:45.600  READ DMA EXT
  25 00 18 77 f2 57 10 00  00:43:45.600  READ DMA EXT
  25 00 18 5f 28 57 11 00  00:43:45.600  READ DMA EXT
  25 00 08 7f 10 54 10 00  00:43:45.600  READ DMA EXT

Error 15 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
  When the command that caused the error occurred, the device was doing
SM

Re: PROBLEM: Buffer I/O error on device hdg1, system freeze.

2005-03-18 Thread Bartlomiej Zolnierkiewicz
On Fri, 18 Mar 2005 16:29:45 +0100, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> 
> 
> One line summary of the problem:
> Buffer I/O error on device hdg1, system freeze.
> 
> Full description of the problem/report:
>  the following error showed up in dmesg today:
> 
>  hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
>  hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=262311, high=0, low=262311, sector=262311
>  ide: failed opcode was: unknown
>  end_request: I/O error, dev hdg, sector 262311
>  Buffer I/O error on device hdg1, logical block 131124
> 
>   fscking this disk freezes the entire system.
> 
>  The disk was remounted ro afterwards.
>  The disk itself is OK. It is a new one.

I doubt it. You can verify this with:
http://smartmontools.sf.net


PROBLEM: Buffer I/O error on device hdg1, system freeze.

2005-03-18 Thread lkml


One line summary of the problem:
Buffer I/O error on device hdg1, system freeze.


Full description of the problem/report:
 the following error showed up in dmesg today:

 hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
 hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=262311, high=0, low=262311, sector=262311
 ide: failed opcode was: unknown
 end_request: I/O error, dev hdg, sector 262311
 Buffer I/O error on device hdg1, logical block 131124

  fscking this disk freezes the entire system.

 The disk was remounted ro afterwards.
 The disk itself is OK. It is a new one.

 Remark: the average temperature of the system rose from 21 deg C to
 23 deg C during the last 5 days, as spring is approaching.

 Last summer there were a lot of problems with the PDC at even
 higher temperatures, using kernels 2.4.26 to 2.4.xx.


Keywords (i.e., modules, networking, kernel):
PDC20269: IDE controller, CONFIG_BLK_DEV_PDC202XX_OLD=y, CONFIG_BLK_DEV_PDC202XX_NEW=y



/proc/version:
--

Linux version 2.6.11serviceservice ([EMAIL PROTECTED]) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 Sat Mar 5 16:31:18 CET 2005


Output of Oops.. message:
see above.


A small shell script or example program which triggers the problem:


/usr/src/linux/scripts/ver_linux:
-

If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux service 2.6.11serviceservice #1 Sat Mar 5 16:31:18 CET 2005 i686 GNU/Linux
 
Gnu C                  2.95.4
Gnu make               3.79.1
binutils               2.12.90.0.1
util-linux             2.11n
mount                  2.12a
module-init-tools      3.1
e2fsprogs              1.35
reiserfsprogs          reiserfsck:
reiser4progs           fsck.reiser4:
quota-tools            3.04.
PPP                    2.4.1
isdn4k-utils           3.5
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.2.4
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.2.1
Modules Loaded         usbcore 8250 serial_core parport_pc lp parport bridge dm_mod hisax_isac hisax isdn 8139too 3c59x mii


/proc/cpuinfo:
--

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 551.398
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1089.53



/proc/modules:
--

usbcore 114504 0 - Live 0xd098f000
8250 23200 2 - Live 0xd0933000
serial_core 21664 1 8250, Live 0xd092c000
parport_pc 39072 1 - Live 0xd08cd000
lp 12032 0 - Live 0xd0899000
parport 35776 2 parport_pc,lp, Live 0xd08fc000
bridge 50900 0 - Live 0xd091e000
dm_mod 57728 0 - Live 0xd090e000
hisax_isac 12372 0 - Live 0xd08c8000
hisax 198272 1 hisax_isac, Live 0xd093a000
isdn 135872 1 hisax, Live 0xd08d9000
8139too 25376 0 - Live 0xd08a8000
3c59x 40392 0 - Live 0xd089d000
mii 4992 2 8139too,3c59x, Live 0xd088c000


/proc/ioports:
--

0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0290-0297 : pnp 00:0f
02f8-02ff : serial
0376-0376 : ide1
0378-037a : parport0
037b-037f : parport0
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial
0778-077a : parport0
0cf8-0cff : PCI conf1
9400-9403 : 0000:00:0e.0
9800-987f : 0000:00:0e.0
a000-a0ff : 0000:00:0b.0
  a000-a0ff : 8139too
a400-a40f : 0000:00:0a.0
  a400-a407 : ide2
  a408-a40f : ide3
a800-a803 : 0000:00:0a.0
  a802-a802 : ide3
b000-b007 : 0000:00:0a.0
  b000-b007 : ide3
b400-b403 : 0000:00:0a.0
b800-b807 : 0000:00:0a.0
d000-d0ff : 0000:00:09.0
  d000-d0ff : 8139too
d400-d41f : 0000:00:04.2
d800-d80f : 0000:00:04.1
  d800-d807 : ide0
  d808-d80f : ide1
e400-e43f : 0000:00:04.3
  e400-e43f : motherboard
e400-e403 : PM1a_EVT_BLK
e404-e405 : PM1a_CNT_BLK
e408-e40b : PM_TMR
e40c-e40f : GPE0_BLK
e800-e81f : 0000:00:04.3
  e800-e80f : motherboard


/proc/iomem:


-0009e7ff : System RAM
0009e800-0009 : reserved
000a-000b : Video RAM area
000c-000c7fff : Video ROM
000cc000-000ce7ff : Adapter ROM
000f-000f : System ROM
0010-0fffbfff : System RAM
  0010-0035ecd9 : Kernel code
  0035ecda-004df71f : Kernel data
0fffc000-0fffefff : ACPI Tables
0000-0fff : ACPI Non-volatile Storage
e280-e280007f : :00:0e.0
e300-e3ff : :00:0c.0
e400-e4ff : :00:0b.0
  e400-e4ff : 8139too
e480-e4803fff : :00:0a.0
e500-e5ff : :00:09.0
  e500-e5ff