Chris/Jeff, can you modify your code so that whenever it sees an I/O error, it says "I/O errors usually indicate bad hardware, not bad software. You probably need to get a new disk and use dd_rescue to copy everything to it."?
Thanks,
Hans

Linas Vepstas wrote:

>Hi,
>
>I've been experimenting with automatic bus error recovery in the
>2.6.11 kernel. During one of my failed experiments, I tripped over
>a Reiserfs bug, below. Basically, my error recovery failed, which
>means a SCSI disk went permanently offline, which, admittedly,
>is pretty catastrophic, but shouldn't be a kernel panic. It seems
>that reiser hits a 'BUG_ON' in this case.
>
>FWIW, in my limited experience with ext3 in the same exact situation,
>it seems that ext3 handles this gracefully, returning -EIO to all
>affected apps accessing the disk.
>
>Unfortunately, I don't know how to tell you how to reproduce this :)
>
>--linas
>
>
>Here's dmesg leading up to the failure, and the stack traces are shown below.
>
><4>sym0:8:0: HOST RESET operation timed-out.
><6>scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 8 lun 0
><3>scsi0 (8:0): rejecting I/O to offline device
><3>scsi0 (8:0): rejecting I/O to offline device
><3>Buffer I/O error on device sda3, logical block 8210
><4>lost page write due to I/O error on sda3
><4>ReiserFS: sda3: warning: journal-837: IO error during journal replay
><2>REISERFS: abort (device sda3): Write error while updating journal header in flush_journal_list
><2>REISERFS: Aborting journal for filesystem on sda3
><3>scsi0 (8:0): rejecting I/O to offline device
><3>Buffer I/O error on device sda3, logical block 741
><4>lost page write due to I/O error on sda3
><3>Buffer I/O error on device sda3, logical block 742
><4>lost page write due to I/O error on sda3
><3>Buffer I/O error on device sda3, logical block 743
><4>lost page write due to I/O error on sda3
><3>Buffer I/O error on device sda3, logical block 744
><4>lost page write due to I/O error on sda3
><3>Buffer I/O error on device sda3, logical block 745
><4>lost page write due to I/O error on sda3
><3>Buffer I/O error on device sda3, logical block 746
><4>lost page write due to I/O error on sda3
><3>Buffer I/O error on device sda3, logical block 747
><4>lost page write due to I/O error on sda3
><3>Buffer I/O error on device sda3, logical block 748
><4>lost page write due to I/O error on sda3
><3>Buffer I/O error on device sda3, logical block 749
><4>lost page write due to I/O error on sda3
><3>scsi0 (8:0): rejecting I/O to offline device
><4>ReiserFS: sda3: warning: clm-6006: writing inode 346759 on readonly FS
><4>ReiserFS: sda3: warning: clm-6006: writing inode 346759 on readonly FS
><4>ReiserFS: sda3: warning: clm-6006: writing inode 346759 on readonly FS
><4>ReiserFS: sda3: warning: clm-6006: writing inode 346759 on readonly FS
><4>ReiserFS: sda3: warning: clm-6006: writing inode 346759 on readonly FS
><4>ReiserFS: sda3: warning: clm-6006: writing inode 346759 on readonly FS
><2>kernel BUG in submit_ordered_buffer at fs/reiserfs/journal.c:616!
><3>scsi0 (8:0): rejecting I/O to offline device
>
>
>cpu 0x1: Vector: 700 (Program Check) at [c00000000fcef740]
> pc: c000000000132ac8: .write_ordered_chunk+0xa4/0x100
> lr: c000000000133274: .write_ordered_buffers+0x348/0x364
> sp: c00000000fcef9c0
> msr: 9000000000029032
> current = 0xc00000000fea87b0
> paca = 0xc00000000053b400
> pid = 953, comm = reiserfs/1
>kernel BUG in submit_ordered_buffer at fs/reiserfs/journal.c:616!
>enter ? for help
>1:mon> t
>[c00000000fcefa60] c000000000133274 .write_ordered_buffers+0x348/0x364
>[c00000000fcefc30] c000000000133af0 .flush_commit_list+0x80c/0x8cc
>[c00000000fcefd10] c000000000138ac0 .flush_async_commits+0xf0/0xf4
>[c00000000fcefdb0] c00000000006d2fc .worker_thread+0x258/0x32c
>[c00000000fcefee0] c000000000073d80 .kthread+0x174/0x1c8
>[c00000000fceff90] c000000000014240 .kernel_thread+0x4c/0x6c
>1:mon>
>1:mon> c
>cpus stopped: 0-3
>1:mon> c 0
>0:mon> t
>[c0000000004efdd0] c00000000000f948 .cpu_idle+0x3c/0x54
>[c0000000004efe50] c00000000000c188 .rest_init+0x3c/0x58
>[c0000000004efed0] c00000000049b7dc .start_kernel+0x27c/0x2fc
>[c0000000004eff90] c00000000000c000 .__setup_cpu_power3+0x0/0x4
>0:mon> c 2
>2:mon> t
>[c00000000424fe80] c00000000000f948 .cpu_idle+0x3c/0x54
>[c00000000424ff00] c00000000003a878 .start_secondary+0x108/0x148
>[c00000000424ff90] c00000000000bd28 .enable_64b_mode+0x0/0x28
>2:mon> c 3
>3:mon> t
>[c000000004253e80] c00000000000f948 .cpu_idle+0x3c/0x54
>[c000000004253f00] c00000000003a878 .start_secondary+0x108/0x148
>[c000000004253f90] c00000000000bd28 .enable_64b_mode+0x0/0x28
