Re: PROBLEM: Buffer I/O error on device hdg1, system freeze.
Nils Radtke wrote:
> Error 14 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
>   When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 f8 23 3e 56 e0  Error: UNC at LBA = 0x00563e23 = 5652003
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --
>   24 00 f8 07 3e 56 10 00  00:36:28.850     READ SECTOR(S) EXT
>   25 00 00 ff 3d 56 10 00  00:36:28.850     READ DMA EXT
>   25 00 00 ff 3c 56 10 00  00:36:28.850     READ DMA EXT
>   25 00 00 ff 3b 56 10 00  00:36:28.850     READ DMA EXT
>   25 00 00 ff 3a 56 10 00  00:36:28.850     READ DMA EXT
>
> Could you please explain what these errors mean exactly and what may have caused them?
> Might it be possible that these transmission/xxx errors be caused by a bad card and/or driver? I'm asking this as the disk never showed errors on onboard IDE ports.
>
> Nils

This error is reported by the drive itself, indicating uncorrectable errors when attempting to read data from the media. It is quite unlikely that the controller or driver is responsible for this sort of error, as can occasionally be the case for DMA timeout errors. Almost certainly the hard drive is failing.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
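For anyone who wants to check the arithmetic in the quoted smartctl entry: the LBA it prints can be reproduced from the SN, CL, CH and DH register bytes. The following is only a minimal Python sketch, assuming the classic 28-bit task-file layout (the low nibble of DH carries address bits 24-27). The function name is made up for illustration, and for the EXT (48-bit) commands in the log the upper address bytes are not part of this seven-register summary, so the sketch only reproduces the value smartctl shows.

```python
def lba_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file bytes into a 28-bit LBA.

    Assumed layout (not stated in the thread): SN = bits 0-7,
    CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27.
    The high nibble of DH holds mode/drive flags and is masked off.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register dump from "Error 14": ER ST SC SN CL CH DH = 40 51 f8 23 3e 56 e0
lba = lba_from_registers(0x23, 0x3E, 0x56, 0xE0)
print(f"UNC at LBA = 0x{lba:08x} = {lba}")  # UNC at LBA = 0x00563e23 = 5652003
```

The result matches the "UNC at LBA" value in the log, which is consistent with the drive itself reporting the failing sector address rather than the controller or driver inventing it.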
Re: PROBLEM: Buffer I/O error on device hdg1, system freeze.
Hi Bartlomiej,

Thanks for your link.

# > hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
# > hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=262311, high=0, low=262311, sector=262311
# > ide: failed opcode was: unknown
# > end_request: I/O error, dev hdg, sector 262311
# > Buffer I/O error on device hdg1, logical block 131124
# >
# > fscking this disk freezes the entire system.
# >
# > The disk was remounted ro afterwards.
# > Disk itself is ok. Is a new one.
# http://smartmontools.sf.net

Extract from /usr/share/doc/smartmontools/WARNINGS.gz:

SYSTEM: Promise 20265 IDE-controller
PROBLEM: Smartctl locks system solid when used on CDROM/DVD device
REPORTER: see link below
LINK: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=208964
NOTE: Problem seems to affect kernel 2.4.21 only.

SYSTEM: Promise IDE-controllers and perhaps others also
PROBLEM: System freezes under heavy load, perhaps when running SMART commands
REPORTER: Mario 'BitKoenig' Holbe [EMAIL PROTECTED]
LINK: http://groups.google.de/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=1wUXW-2FA-9%40gated-at.bofh.it
NOTE: Before freezing, SYSLOG shows the following message(s)
      kernel: hdf: dma timer expiry: dma status == 0xXX
      where XX is two hexidecimal digits. This may be a kernel bug
      or an underlying hardware problem. It's not clear if smartmontools
      plays a role in provoking this problem.
      FINAL NOTE: Problem was COMPLETELY resolved by replacing the power
      supply. See URL above, entry on May 29, 2004 by Holbe. Other things
      to try are exchanging cables, and cleaning PCI slots.

This sounds highly familiar and suggests at least a hidden correlation between this kind of error and the Promise controller (PDC) drivers. OK, maybe I'm prejudiced now. We'll see.

A year ago, other disks (IBM/WD) also had trouble on the PDC, but not on the onboard controllers. And they are still spinning today (meaning they did not have to be replaced because of hard disk errors).

The fact is, however, that, as I mailed last year, even after a complete exchange of mainboard and processor, the problem persists through every kernel version. Furthermore, countless posts indicate similar or identical symptoms.

Nevertheless, I'll keep the list up to date in case of new info.

smartctl -a /dev/hdc gives:

Error 18 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 f8 a8 05 c3 e0  Error: UNC at LBA = 0x00c305a8 = 12780968

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --
  24 00 f8 a7 05 c3 06 00  00:08:14.850     READ SECTOR(S) EXT
  25 00 00 9f 05 c3 06 00  00:08:14.850     READ DMA EXT
  25 00 00 9f 04 c3 06 00  00:08:14.850     READ DMA EXT
  25 00 00 9f 03 c3 06 00  00:08:14.850     READ DMA EXT
  25 00 00 9f 02 c3 06 00  00:08:14.850     READ DMA EXT

Error 17 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 47 06 c3 e0  Error: UNC at LBA = 0x00c30647 = 12781127

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --
  25 00 00 9f 05 c3 06 00  00:07:48.550     READ DMA EXT
  25 00 00 9f 04 c3 06 00  00:07:48.550     READ DMA EXT
  25 00 00 9f 03 c3 06 00  00:07:48.550     READ DMA EXT
  25 00 00 9f 02 c3 06 00  00:07:48.550     READ DMA EXT
  25 00 00 9f 01 c3 06 00  00:07:48.550     READ DMA EXT

Error 16 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 20 b0 f2 57 e0  Error: UNC at LBA = 0x0057f2b0 = 5763760

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --
  24 00 20 af f2 57 10 00  00:43:45.600     READ SECTOR(S) EXT
  25 00 28 a7 f2 57 10 00  00:43:45.600     READ DMA EXT
  25 00 18 77 f2 57 10 00  00:43:45.600     READ DMA EXT
  25 00 18 5f 28 57 11 00  00:43:45.600     READ DMA EXT
  25 00 08 7f 10 54 10 00  00:43:45.600     READ DMA EXT

Error 15 occurred at disk power-on lifetime: 2249 hours (93 days + 17 hours)
  When the command that caused the error occurred, the device was doing SM
Re: PROBLEM: Buffer I/O error on device hdg1, system freeze.
On Fri, 18 Mar 2005 16:29:45 +0100, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> One line summary of the problem:
> Buffer I/O error on device hdg1, system freeze.
>
> Full description of the problem/report:
> the following error showed up in dmesg today:
>
> hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=262311, high=0, low=262311, sector=262311
> ide: failed opcode was: unknown
> end_request: I/O error, dev hdg, sector 262311
> Buffer I/O error on device hdg1, logical block 131124
>
> fscking this disk freezes the entire system.
>
> The disk was remounted ro afterwards.
> Disk itself is ok. Is a new one.

I doubt it, you can verify this with:
http://smartmontools.sf.net
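The quoted dmesg lines can be read fairly mechanically. The Python sketch below decodes the status and error bytes into the flag names the kernel printed, and shows one way the absolute sector (262311 on hdg) could map to the logical block (131124 on hdg1). Everything here is an assumption rather than something stated in the thread: the bit positions follow the classic ATA register layout, the helper names are invented, and the partition start of 63 sectors and the 1 KiB block size (2 sectors per block) are guesses that happen to reproduce the logged number.

```python
# Flag names as they appear in the log; bit positions are assumed from
# the classic ATA status/error register layout, not taken from the thread.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}
ERROR_BITS = {
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """Return the names of all flags in `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def partition_block(abs_sector, part_start=63, sectors_per_block=2):
    """Map a whole-disk sector to a block number inside a partition.

    part_start and sectors_per_block are hypothetical defaults
    (old DOS-style partition offset, 1 KiB blocks of 512-byte sectors).
    """
    return (abs_sector - part_start) // sectors_per_block


print(decode(0x51, STATUS_BITS))   # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))    # ['UncorrectableError']
print(partition_block(262311))     # 131124
```

If the guessed partition geometry is right, the "Buffer I/O error" line is just the same failing sector seen from the filesystem side, not a second, independent error.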
PROBLEM: Buffer I/O error on device hdg1, system freeze.
One line summary of the problem:
Buffer I/O error on device hdg1, system freeze.

Full description of the problem/report:
The following error showed up in dmesg today:

hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=262311, high=0, low=262311, sector=262311
ide: failed opcode was: unknown
end_request: I/O error, dev hdg, sector 262311
Buffer I/O error on device hdg1, logical block 131124

fscking this disk freezes the entire system.

The disk was remounted ro afterwards.
Disk itself is ok. Is a new one.

Remark: the average temperature of the system rose during the last 5 days from 21 deg C to 23 deg C as spring is approaching. Last summer there were a lot of problems with the PDC at even higher temperatures using kernels 2.4.26 to 2.4.xx.

Keywords (i.e., modules, networking, kernel):
PDC20269: IDE controller, CONFIG_BLK_DEV_PDC202XX_OLD=y, CONFIG_BLK_DEV_PDC202XX_NEW=y

/proc/version:
--
Linux version 2.6.11serviceservice ([EMAIL PROTECTED]) (gcc version 2.95.4 20011002 (Debian prerelease)) #1 Sat Mar 5 16:31:18 CET 2005

Output of Oops.. message:
see above.

A small shell script or example program which triggers the problem:

/usr/src/linux/scripts/ver_linux:
-
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux service 2.6.11serviceservice #1 Sat Mar 5 16:31:18 CET 2005 i686 GNU/Linux

Gnu C                  2.95.4
Gnu make               3.79.1
binutils               2.12.90.0.1
util-linux             2.11n
mount                  2.12a
module-init-tools      3.1
e2fsprogs              1.35
reiserfsprogs          reiserfsck:
reiser4progs           fsck.reiser4:
quota-tools            3.04.
PPP                    2.4.1
isdn4k-utils           3.5
Linux C Library        2.3.2
Dynamic linker (ldd)   2.3.2
Procps                 3.2.4
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.2.1
Modules Loaded         usbcore 8250 serial_core parport_pc lp parport bridge dm_mod hisax_isac hisax isdn 8139too 3c59x mii

/proc/cpuinfo:
--
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 7
model name      : Pentium III (Katmai)
stepping        : 3
cpu MHz         : 551.398
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1089.53

/proc/modules:
--
usbcore 114504 0 - Live 0xd098f000
8250 23200 2 - Live 0xd0933000
serial_core 21664 1 8250, Live 0xd092c000
parport_pc 39072 1 - Live 0xd08cd000
lp 12032 0 - Live 0xd0899000
parport 35776 2 parport_pc,lp, Live 0xd08fc000
bridge 50900 0 - Live 0xd091e000
dm_mod 57728 0 - Live 0xd090e000
hisax_isac 12372 0 - Live 0xd08c8000
hisax 198272 1 hisax_isac, Live 0xd093a000
isdn 135872 1 hisax, Live 0xd08d9000
8139too 25376 0 - Live 0xd08a8000
3c59x 40392 0 - Live 0xd089d000
mii 4992 2 8139too,3c59x, Live 0xd088c000

/proc/ioports:
--
-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0290-0297 : pnp 00:0f
02f8-02ff : serial
0376-0376 : ide1
0378-037a : parport0
037b-037f : parport0
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial
0778-077a : parport0
0cf8-0cff : PCI conf1
9400-9403 : :00:0e.0
9800-987f : :00:0e.0
a000-a0ff : :00:0b.0
a000-a0ff : 8139too
a400-a40f : :00:0a.0
a400-a407 : ide2
a408-a40f : ide3
a800-a803 : :00:0a.0
a802-a802 : ide3
b000-b007 : :00:0a.0
b000-b007 : ide3
b400-b403 : :00:0a.0
b800-b807 : :00:0a.0
d000-d0ff : :00:09.0
d000-d0ff : 8139too
d400-d41f : :00:04.2
d800-d80f : :00:04.1
d800-d807 : ide0
d808-d80f : ide1
e400-e43f : :00:04.3
e400-e43f : motherboard
e400-e403 : PM1a_EVT_BLK
e404-e405 : PM1a_CNT_BLK
e408-e40b : PM_TMR
e40c-e40f : GPE0_BLK
e800-e81f : :00:04.3
e800-e80f : motherboard

/proc/iomem:
-0009e7ff : System RAM
0009e800-0009 : reserved
000a-000b : Video RAM area
000c-000c7fff : Video ROM
000cc000-000ce7ff : Adapter ROM
000f-000f : System ROM
0010-0fffbfff : System RAM
0010-0035ecd9 : Kernel code
0035ecda-004df71f : Kernel data
0fffc000-0fffefff : ACPI Tables
0000-0fff : ACPI Non-volatile Storage
e280-e280007f : :00:0e.0
e300-e3ff : :00:0c.0
e400-e4ff : :00:0b.0
e400-e4ff : 8139too
e480-e4803fff : :00:0a.0
e500-e5ff : :00:09.0
e500-e5ff