Re: Journal-601 error on Redhat 7.3 / reiserfs / ext3 / raid 5
Hello!

On Thu, Jul 03, 2003 at 01:14:08AM +0300, Jussi Vainionpää wrote:
> >>Apr 27 20:18:06 un kernel: journal-601, buffer write failed
> >I do not know who to blame here. Try to heavily write to loop device
> >itself (without using reiserfs) to see if something will break?
> >Or better yet - upgrade to newer kernel and see if that
> >cures your problem?
> I tried the same operation using ext2 instead of reiserfs and at least that
> worked without any problems.

ext2 does not wait on buffers unless you operate in sync mode, so it won't notice. Try ext2 with -o sync then?

Bye,
    Oleg
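For anyone who wants to repeat the ext2 check in sync mode, the test might look roughly like the sketch below. The image path, mount point, and sizes are assumptions made for illustration and are not taken from the thread. The script only prints the commands unless RUN=1 is set, because the real run needs root.

```shell
#!/bin/sh
# Sketch only: IMG, MNT and the sizes are hypothetical, not from the thread.
# By default the commands are printed; set RUN=1 to execute them (needs root).
IMG=${IMG:-/tmp/ext2-sync-test.img}
MNT=${MNT:-/mnt/ext2-sync-test}

# Print a command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

ext2_sync_write_test() {
    # Backing file for the loop device.
    run dd if=/dev/zero of="$IMG" bs=1M count=256
    # -F lets mke2fs work on a regular file instead of a block device.
    run mke2fs -q -F "$IMG"
    run mkdir -p "$MNT"
    # "sync" makes ext2 wait for its writes, so a failed write becomes visible.
    run mount -t ext2 -o loop,sync "$IMG" "$MNT"
    # Heavy sequential write through the filesystem.
    run dd if=/dev/zero of="$MNT/fill" bs=1M count=200
    run umount "$MNT"
}

ext2_sync_write_test
```

If the write fails here as well, that would point at the loop device or the layers below it rather than at reiserfs.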
Re: reiserfs on removable media
Hello!

On Wed, Jul 02, 2003 at 02:23:13PM -0400, Zygo Blaxell wrote:
> - If the device is detached while a filesystem is mounted, reiserfs gets a
> whole lot of I/O errors (or worse) and immediately oopses. It would be
> nice if reiserfs would handle this a bit more gracefully--it should at
> least let me kill processes with open files and umount the filesystem.
> OTOH many other things also oops with the current USB/firewire/scsi device
> driver stack too. :-P

Write errors to data areas are mostly "safe". It's write errors into the journal area that kill the thing. Jeff Mahoney of SuSE has a patch that remounts the FS R/O in case of such an event (I think he even posted some preliminary patches here); it is most probably what you need in this case.

Bye,
    Oleg
Recipe for reiserfs oops on "removable" disks (was: Re: reiserfs on removable media)
On Wed, 02 Jul 2003 14:45:44 -0400, Andreas Dilger wrote:
> On Jul 02, 2003 14:23 -0400, Zygo Blaxell wrote:
> This is called ordered data mode, and exists on ext3 and also reiserfs
> with Chris Mason's patches.

Ah, thank you, I had forgotten that the feature had a name, and that ext3 can be configured to have the same behavior. ;-)

> Well, if something oopses you are pretty much stuck w.r.t. killing the
> process and unmounting the fs. So fix the oopses and the rest should
> come around as a result. Of course, the reiserfs folks can do a lot
> more with a specific oops report than just "it immediately oopses". ;-)

But it _does_ immediately oops. Actually that's not true, it BUG()s first. And depending on the device driver chain it may also oops in other places. I guess I forgot that "the oops I've seen about 20 times this morning" isn't really useful information to other people. ;-)

You can get a similar oops without any special hardware using the network block device. Here's the recipe:

Create a large-file on machine A. Run 'nbd-server 1 large-file' on machine A. The file has to be big enough that mkreiserfs can create a filesystem on it.

Run the following on machine B:

        nbd-client A 1 /dev/nbd/0
        mkreiserfs /dev/nbd/0
        mount /dev/nbd/0 /test
        ls -l / > /test/some-data

Then on machine A:

        killall nbd-server

Then do something on machine B with the /test filesystem, and watch the fireworks. It looks something like this:

        NBD: receive - sock=-1040559660 at buf=-1047896328, size=16 returned 0.
        NBD: Recv control failed.(result 0)
        req should never be null
        nbd: shutting down socket
        nbd: queue cleared
        Kernel call returned.
        Closing: que, sock, done
        NBD, minor 0: Request when not-ready.
        NBD, minor 0: Request when not-ready.
        NBD, minor 0: Request when not-ready.
        NBD, minor 0: Request when not-ready.
        NBD, minor 0: Request when not-ready.
        NBD, minor 0: Request when not-ready.
        NBD, minor 0: Request when not-ready.
        NBD, minor 0: Request when not-ready.
        NBD, minor 0: Request when not-ready.
        journal-601, buffer write failed

        [the rest is filtered through ksymoops]

        kernel BUG at prints.c:334!
        invalid operand:
        CPU:0
        EIP:0010:[] Not tainted
        Using defaults from ksymoops -t elf32-i386 -a i386
        EFLAGS: 0282
        eax: 0024 ebx: c02f2460 ecx: c034cd2c edx:
        esi: c133e000 edi: c133e000 ebp: 0004 esp: c1fedec8
        ds: 0018 es: 0018 ss: 0018
        Process kupdated (pid: 6, stackpage=c1fed000)
        Stack: c02f035a c0403740 c02f2460 c1fedeec c28540cc c01af619 c133e000 c02f2460
               0010 c2854100 c28540f4 0005 c1307814 c01b313e c133e000 c28540cc
               0001 c1fedf80 c133e000
        Call Trace:[] [] [] [] [] [] [] []
        Code: 0f 0b 4e 01 60 03 2f c0 68 40 37 40 c0 85 f6 74 16 31 c0 66

        >>EIP; c01a4fa9 <=
        >>ebx; c02f2460
        >>ecx; c034cd2c
        >>esi; c133e000 <_end+ef7548/3047d548>
        >>edi; c133e000 <_end+ef7548/3047d548>
        >>esp; c1fedec8 <_end+1ba7410/3047d548>

        Trace; c01af619
        Trace; c01b313e
        Trace; c01b230f
        Trace; c01a24be
        Trace; c014b508
        Trace; c014a267
        Trace; c014a736
        Trace; c0107420

        Code; c01a4fa9
        <_EIP>:
        Code; c01a4fa9 <=
           0: 0f 0b                   ud2a <=
        Code; c01a4fab
           2: 4e                      dec %esi
        Code; c01a4fac
           3: 01 60 03                add %esp,0x3(%eax)
        Code; c01a4faf
           6: 2f                      das
        Code; c01a4fb0
           7: c0 68 40 37             shrb $0x37,0x40(%eax)
        Code; c01a4fb4
           b: 40                      inc %eax
        Code; c01a4fb5
           c: c0 85 f6 74 16 31 c0    rolb $0xc0,0x311674f6(%ebp)
        Code; c01a4fbc
          13: 66                      data16

> Not much you can do about the IO
Re: reiserfsck 3.6.8 + corrupted filesystem
On Wed, 02 Jul 2003 14:31:36 -0400, Vitaly Fertman wrote:
> fsck should not abort if in memory data on pass1 (which were built on
> pass0 of fsck) match what they should be. Otherwise it looks like
> hardware problem with memory or something like that.

OK, that clears things up a bit. Basically something in the output of pass 0 is different from what is expected based on in-RAM data in pass 1. The diagnostic message could be (much!) clearer about that, rather than just guessing that I must have "bad memory."

In a nutshell, either the system RAM is bad, or the disks are mangling data without returning errors, or reiserfsck has a bug that causes it to expect to find that it has previously written something that it hasn't.

I'll exercise RAM and disks over the next N days to try to eliminate hardware as a possible cause.
Re: Journal-601 error on Redhat 7.3 / reiserfs / ext3 / raid 5
Oleg Drokin wrote:
>>Apr 27 20:18:06 un kernel: journal-601, buffer write failed
>I do not know who to blame here. Try to heavily write to loop device
>itself (without using reiserfs) to see if something will break?
>Or better yet - upgrade to newer kernel and see if that
>cures your problem?

I tried the same operation using ext2 instead of reiserfs and at least that worked without any problems.
Re: reiserfs on removable media
On Wed, 02 Jul 2003 14:53:39 -0400, Hans Reiser wrote:
> Remind me about removable media around January, and we'll write some
> code for reiser4 to make it more graceful for it (somehow prompt the
> user to insert disk, etc.)

Ow! Ow! Ow! Kernel prompting the user... Ow! ;-)

Now, "kernel notifying an automounter daemon process, which talks to the user in user-space" is somewhere in the realm of possibility...

Actually to be clear there are two topics here: removable _media_ and removable _drives_. e.g. a typical IDE disk is non-removable media in a non-removable drive. A floppy disk in a typical floppy drive is removable media in a non-removable drive, but a floppy disk in a USB floppy drive is removable media in a removable drive. Practically speaking there's not much difference--if the drive was removed, you'd have to assume the media was removed as well, if only because there's no way to receive media-change notifications if the drive isn't connected.

The USB drive I wrote about earlier is a desktop non-removable IDE disk in a removable drive. The difference is subtle but it does allow for some interesting stuff to happen in the block device layer. The hard drive has a serial number, which (in theory) could be queried by the USB storage layer and used as a unique identifier for the drive. This could e.g. suspend all read/write requests to the drive while it is disconnected, and resume said requests when it is reconnected. That's all I'd need for my laptop setup (assuming I don't connect the drive somewhere else, that is)...and it doesn't require changing one line of reiserfs.
Re: reiserfs on removable media
On Wed, 2003-07-02 at 15:08, Dieter Nützel wrote:
> On Wednesday, 2 July 2003 20:59, Chris Mason wrote:
> > On Wed, 2003-07-02 at 14:53, Hans Reiser wrote:
> > > >This is called ordered data mode, and exists on ext3 and also reiserfs
> > > > with Chris Mason's patches. Under normal usage it shouldn't change
> > > > performance compared to writeback data mode (which is what reiserfs
> > > > does by default).
>
> Chris,
>
> I thought data=ordered is the "new" default with your patch?
>

It is.

> > > It had some impact, I forget exactly how much, maybe Chris can
> > > resuscitate his benchmark of it?
> >
> > The major cost of data=ordered is that dirty blocks are flushed every 5
> > seconds instead of every 30. The journal header patch in my
> > experimental data logging directory changes things so that only new
> > bytes in the file are done in data=ordered mode (either adding a new
> > block or appending onto the end of the file).
> >
> > This helps a lot in the file rewrite tests.
>
> What's faster with your patches: ordered|journal|writeback?
>
> I thought the order is: writeback < ordered < journal ;-)

Usually ;-) ordered is faster in a few rare benchmarks because it helps keep the number of dirty buffers lower and generally sends the dirty buffers to the disk in a big flood. journal is faster for some fsync heavy benchmarks.

For practical desktop usage, data=ordered and writeback are usually close to each other.

-chris
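A rough way to compare the three modes on one's own disk is sketched below. It assumes that the patched reiserfs accepts a data= mount option, as the data=ordered notation in this thread suggests. The device, mount point, and file size are hypothetical. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: DEV, MNT and the file size are hypothetical.
# Assumes the data-logging patches accept "-o data=<mode>" on reiserfs.
# By default the commands are printed; set RUN=1 to execute them (needs root).
DEV=${DEV:-/dev/hdb1}
MNT=${MNT:-/mnt/bench}

# Print a command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

compare_data_modes() {
    for mode in writeback ordered journal; do
        run mount -t reiserfs -o "data=$mode" "$DEV" "$MNT"
        start=$(date +%s)
        # Creates the file on the first pass, then rewrites it in place
        # (conv=notrunc keeps the existing blocks allocated).
        run dd if=/dev/zero of="$MNT/rewrite.tmp" bs=1M count=512 conv=notrunc
        run sync
        end=$(date +%s)
        echo "data=$mode: $((end - start)) s"
        run umount "$MNT"
    done
}

compare_data_modes
```

One-second resolution is crude, so this can only show large differences. An fsync heavy workload would need a different test than a single sequential dd.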
Re: reiserfs on removable media
On Wednesday, 2 July 2003 20:59, Chris Mason wrote:
> On Wed, 2003-07-02 at 14:53, Hans Reiser wrote:
> > >This is called ordered data mode, and exists on ext3 and also reiserfs
> > > with Chris Mason's patches. Under normal usage it shouldn't change
> > > performance compared to writeback data mode (which is what reiserfs
> > > does by default).

Chris,

I thought data=ordered is the "new" default with your patch?

> > It had some impact, I forget exactly how much, maybe Chris can
> > resuscitate his benchmark of it?
>
> The major cost of data=ordered is that dirty blocks are flushed every 5
> seconds instead of every 30. The journal header patch in my
> experimental data logging directory changes things so that only new
> bytes in the file are done in data=ordered mode (either adding a new
> block or appending onto the end of the file).
>
> This helps a lot in the file rewrite tests.

What's faster with your patches: ordered|journal|writeback?

I thought the order is: writeback < ordered < journal ;-)

Thanks,
    Dieter
Re: reiserfs on removable media
On Wed, 2003-07-02 at 14:53, Hans Reiser wrote:
> >This is called ordered data mode, and exists on ext3 and also reiserfs with
> >Chris Mason's patches. Under normal usage it shouldn't change performance
> >compared to writeback data mode (which is what reiserfs does by default).
> >
> It had some impact, I forget exactly how much, maybe Chris can
> resuscitate his benchmark of it?
>

The major cost of data=ordered is that dirty blocks are flushed every 5 seconds instead of every 30. The journal header patch in my experimental data logging directory changes things so that only new bytes in the file are done in data=ordered mode (either adding a new block or appending onto the end of the file).

This helps a lot in the file rewrite tests.

-chris
Re: reiserfs on removable media
Andreas Dilger wrote:
>On Jul 02, 2003 14:23 -0400, Zygo Blaxell wrote:
>>Two reiserfs improvements come to mind:
>>
>>- There is a tendency for files that were being grown at crash time to
>>contain invalid data. It seems that the inodes are being updated before
>>the data blocks they refer to are written. It would be nice if the inode
>>writes were deferred (or at least made invisible) until after the data
>>blocks were written. I'd rather lose my data than possibly have random
>>garbage masquerading as my data.
>
>This is called ordered data mode, and exists on ext3 and also reiserfs with
>Chris Mason's patches. Under normal usage it shouldn't change performance
>compared to writeback data mode (which is what reiserfs does by default).

It had some impact, I forget exactly how much, maybe Chris can resuscitate his benchmark of it?

Remind me about removable media around January, and we'll write some code for reiser4 to make it more graceful for it (somehow prompt the user to insert disk, etc.)

--
Hans
Re: reiserfs on removable media
On Jul 02, 2003 14:23 -0400, Zygo Blaxell wrote:
> Two reiserfs improvements come to mind:
>
> - There is a tendency for files that were being grown at crash time to
> contain invalid data. It seems that the inodes are being updated before
> the data blocks they refer to are written. It would be nice if the inode
> writes were deferred (or at least made invisible) until after the data
> blocks were written. I'd rather lose my data than possibly have random
> garbage masquerading as my data.

This is called ordered data mode, and exists on ext3 and also reiserfs with Chris Mason's patches. Under normal usage it shouldn't change performance compared to writeback data mode (which is what reiserfs does by default).

> - If the device is detached while a filesystem is mounted, reiserfs gets a
> whole lot of I/O errors (or worse) and immediately oopses. It would be
> nice if reiserfs would handle this a bit more gracefully--it should at
> least let me kill processes with open files and umount the filesystem.
> OTOH many other things also oops with the current USB/firewire/scsi device
> driver stack too. :-P

Well, if something oopses you are pretty much stuck w.r.t. killing the process and unmounting the fs. So fix the oopses and the rest should come around as a result. Of course, the reiserfs folks can do a lot more with a specific oops report than just "it immediately oopses". ;-)

Not much you can do about the IO errors (i.e. working as designed). That's going to happen if you remove your device while writing to it.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: reiserfsck 3.6.8 + corrupted filesystem
Hi,

On Wednesday 02 July 2003 21:31, Zygo Blaxell wrote:
> I've been running reiserfsck over a corrupted filesystem (IDE disks, dead
> fans, overheating embedded controller RAM, smoke...you get the picture).
> The messages are...interesting.
>
> What is the meaning of the message "The problem has occurred looks like
> a hardware problem (perhaps memory)."? Is that referring to the memory
> of reiserfsck, or is it suggesting there is some kind of data consistency
> issue on the disk, or is it suggesting that the corruption it is seeing
> on the disk might have been the result of bad memory some time in the
> past?

Hardware problem means a problem with your hardware, not software. Perhaps you want to run memtest and check your memory, or perhaps it is something else, but the fsck data built in memory on pass 0 turned out to be wrong on pass 1.

> I've been running reiserfsck --rebuild-tree in a while loop until it fixes
> the FS. It seems that each time through it gets a little further along,
> then near the end of pass 1, reiserfsck complains that something wasn't
> done in pass 0 and aborts. Pass 0 runs again, and some additional changes
> are made which fix whatever pass 1 was complaining about. Pass 1 runs
> again, gets a little further than it did the previous run, then aborts
> a few thousand blocks later. The most recent run suggests that this
> might continue in pass 2 (complaining about things not done by both pass
> 1 and 0), but I've never gotten to pass 2 yet to find out.
>
> Here are parts of the three reiserfsck runs so far (actually I did some
> more earlier, but those were 3.6.6 not 3.6.8). Note I've left out
> several thousand lines of pass0 output, most of which involves deleting
> invalidly formatted nodes, directories with bad types, wrong order
> entries in directories...basically what you'd expect if one disk out of
> a RAID array was randomly corrupted.
>
> I realize that there is huge data loss here, but IMHO reiserfsck should at
> least salvage the FS without calling abort() on itself.

fsck should not abort if the in-memory data on pass 1 (which were built on pass 0 of fsck) match what they should be. Otherwise it looks like a hardware problem with memory or something like that.

> I also realize that these log sections are useless as a bug report.

Actually, these log sections were intended to explain that something unexpected happened which does not look like an fsck problem. So you should check all your hardware (the hint about what should be checked first is given) and not continue unless you are sure it is working properly. Only if the problem occurs again in the same place does it look like an fsck problem; in that case, report it.

--
Thanks,
Vitaly Fertman
reiserfs on removable media
I have a 120GB reiserfs in a portable disk enclosure with USB2.0 and IEEE1394 interfaces. Unfortunately the current Linux USB and firewire drivers in 2.4.21 still have nasty issues, with the result that I've had too many crashes to count while working out how to get the device drivers to talk to this disk reliably (probably 50 or more crashes so far).

Obviously these problems aren't reiserfs's fault, nor can reiserfs do anything about these problems, but it's nice to see that reiserfs survives as well as it does.

Two reiserfs improvements come to mind:

- There is a tendency for files that were being grown at crash time to contain invalid data. It seems that the inodes are being updated before the data blocks they refer to are written. It would be nice if the inode writes were deferred (or at least made invisible) until after the data blocks were written. I'd rather lose my data than possibly have random garbage masquerading as my data.

- If the device is detached while a filesystem is mounted, reiserfs gets a whole lot of I/O errors (or worse) and immediately oopses. It would be nice if reiserfs would handle this a bit more gracefully--it should at least let me kill processes with open files and umount the filesystem. OTOH many other things also oops with the current USB/firewire/scsi device driver stack too. :-P

Otherwise, this particular reiserfs has survived all of the crashes so far, even under the heavy I/O loads that seem to trigger the crashes. Cool.
a happier reiserfsck story
I have a USB2.0/firewire external disk drive which has a 120GB reiserfs filesystem on top of a loop-AES loopback FS. Often the nbd driver is involved too, as I'm testing various hardware configurations in order to isolate ieee1394 or USB in order to reliably talk to this disk from my laptop.

The current state of Linux USB and IEEE1394 drivers is such that this filesystem has endured _many_ crashes. One pass of reiserfsck 3.6.6 and one of 3.6.8 fixed the most recent crash, which was the only one that required a reiserfsck.

Why two passes? Well, it takes more than an hour to run reiserfsck through firewire, and while I was waiting I figured I might as well grab the latest version in case 3.6.6 didn't work (it didn't).
reiserfsck 3.6.8 + corrupted filesystem
I've been running reiserfsck over a corrupted filesystem (IDE disks, dead fans, overheating embedded controller RAM, smoke...you get the picture). The messages are...interesting.

What is the meaning of the message "The problem has occurred looks like a hardware problem (perhaps memory)."? Is that referring to the memory of reiserfsck, or is it suggesting there is some kind of data consistency issue on the disk, or is it suggesting that the corruption it is seeing on the disk might have been the result of bad memory some time in the past?

I've been running reiserfsck --rebuild-tree in a while loop until it fixes the FS. It seems that each time through it gets a little further along, then near the end of pass 1, reiserfsck complains that something wasn't done in pass 0 and aborts. Pass 0 runs again, and some additional changes are made which fix whatever pass 1 was complaining about. Pass 1 runs again, gets a little further than it did the previous run, then aborts a few thousand blocks later. The most recent run suggests that this might continue in pass 2 (complaining about things not done by both pass 1 and 0), but I've never gotten to pass 2 yet to find out.

Here are parts of the three reiserfsck runs so far (actually I did some more earlier, but those were 3.6.6 not 3.6.8). Note I've left out several thousand lines of pass0 output, most of which involves deleting invalidly formatted nodes, directories with bad types, wrong order entries in directories...basically what you'd expect if one disk out of a RAID array was randomly corrupted.

I realize that there is huge data loss here, but IMHO reiserfsck should at least salvage the FS without calling abort() on itself.

I also realize that these log sections are useless as a bug report. On the other hand, the messages keep changing anyway, so the state of the FS is a bit of a moving target. ;-)

        (pass 0 elided)
        18163 directory entries were hashed with not set hash.
        23916806 directory entries were hashed with "r5" hash.
        "r5" hash is selected
        Flushing..finished
        Read blocks (but not data blocks) 158433879
        Leaves among those 925148
        - corrected leaves 3613
        - leaves all contents of which could not be saved and deleted 1584
        pointers in indirect items to wrong area 23559 (zeroed)
        Objectids found 4953
        Pass 1 (will try to insert 923564 leaves):
        ### Pass 1 ###
        Looking for allocable blocks .. finished
        0%20%40% left 523692, 160 /sec
        The problem has occurred looks like a hardware problem (perhaps memory).
        Send us the bug report only if the second run dies at the same place with the same block number.
        build_the_tree: Nothing but leaves are expected. Block 59067053 - ??
        /root/bin/md0-fsck: line 7:   884 Done                    echo Yes
               885 Aborted                 | reiserfsck "$@" /dev/md0
        + mount /md0
        mount: Not a directory

        (pass 0 elided)
        pass0: vpf-10160: block 64352473: item 7: No "." entry found in the first item of a directory
        left 0, 3795 /sec
        793 directory entries were hashed with not set hash.
        23911554 directory entries were hashed with "r5" hash.
        "r5" hash is selected
        Flushing..finished
        Read blocks (but not data blocks) 158433879
        Leaves among those 922471
        - corrected leaves 899
        - leaves all contents of which could not be saved and deleted 1767
        pointers in indirect items to wrong area 16751 (zeroed)
        Objectids found 4942
        Pass 1 (will try to insert 920704 leaves):
        ### Pass 1 ###
        Looking for allocable blocks .. finished
        0%20%40%is_leaf_bad: block 59177036, item 0: The corrupted item found (845456 215423828 0xcd7e001 ??? (15), len 4048, location 48 entry count 0, fsck need 0, format new)
        is_leaf_bad: WARNING: The leaf (59177036) is formatted badly. Will be handled on the the pass2.
        left 520674, 166 /sec
        The problem has occurred looks like a hardware problem (perhaps memory).
        Send us the bug report only if the second run dies at the same place with the same block number.
        build_the_tree: Nothing but leaves are expected. Block 59373117 - ??
        /root/bin/md0-fsck: line 7: 21821 Done                    echo Yes
             21822 Aborted                 | reiserfsck "$@" /dev/md0

        (pass 0 elided)
        191 directory entries were hashed with not set hash.
        23911285 directory entries were hashed with "r5" hash.
        "r5" hash is selected
        Flushing..finished
        Read blocks (but not data blocks) 158433879
        Leaves among those 921865
        - corrected leaves 282
        - leaves all contents of which could not be saved and deleted 1821
        pointers in indirect items to wrong area 7191 (zeroed)
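The wrapper script that produced the md0-fsck lines in the log is not shown in the message. A minimal sketch of such a retry loop might look like the following. Only the `echo Yes | reiserfsck "$@" /dev/md0` pipeline comes from the log; the function name, the retry cap, and the treatment of any non-zero exit status as "try again" are assumptions. reiserfsck's real exit codes are more fine-grained than that.

```shell
#!/bin/sh
# Hypothetical reconstruction of a retry wrapper such as md0-fsck.
# Only the 'echo Yes | reiserfsck "$@" /dev/md0' pipeline is from the log;
# MAX_RUNS and the exit-status handling are assumptions.
FSCK=${FSCK:-reiserfsck}
DEV=${DEV:-/dev/md0}
MAX_RUNS=${MAX_RUNS:-5}

fsck_until_clean() {
    run=1
    while [ "$run" -le "$MAX_RUNS" ]; do
        echo "=== fsck run $run of $MAX_RUNS ==="
        # "Yes" answers the confirmation prompt; the pipeline's exit status
        # is that of the fsck command, so an abort() shows up as a failure.
        if echo Yes | "$FSCK" "$@" "$DEV"; then
            return 0
        fi
        run=$((run + 1))
    done
    echo "giving up after $MAX_RUNS runs" >&2
    return 1
}

# Do nothing unless options are given, e.g.: md0-fsck --rebuild-tree
[ "$#" -eq 0 ] || fsck_until_clean "$@"
```

Capping the number of runs keeps the loop from spinning forever when each run aborts at a new block, which is the situation described in this message. Note that rerunning after an abort is exactly what the reiserfsck message and Vitaly's reply advise against until the hardware has been checked.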
data-logging for 2.4.22-pre3
Hello!

Yes, I know that 2.4.22-pre3 is not out yet, but Marcelo has accepted our somewhat big patches, so you can get replacement patches from ftp://namesys.com/pub/reiserfs-for-2.4/testing/data-logging-and-quota-2.4.22-pre3 once 2.4.22-pre3 is out ;)

Also, starting from 2.4.22-pre3 you no longer need to apply the 03-relocation-8.diff.gz patch.

Bye,
    Oleg