Re: Journal-601 error on Redhat 7.3 / reiserfs / ext3 / raid 5

2003-07-02 Thread Oleg Drokin
Hello!

On Thu, Jul 03, 2003 at 01:14:08AM +0300, Jussi Vainionpää wrote:

> >>Apr 27 20:18:06 un kernel: journal-601, buffer write failed
> >I do not know who to blame here. Try to heavily write to loop device
> >itself (without using reiserfs) to see if something will break? Or better
> >yet - upgrade to newer kernel and see if that cures your problem?
> I tried the same operation using ext2 instead of reiserfs and at least that 
> worked without any problems.

ext2 does not wait on buffers unless you operate in sync mode, so it won't notice.
Try the ext2 with -o sync then?
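
A minimal sketch of that test, assuming the loop device is /dev/loop0 and a scratch mount point /mnt/test (both are placeholder names, not taken from the report). The helper only prints the commands so they can be reviewed before being run as root; note that mke2fs destroys whatever is on the device.

```shell
#!/bin/sh
# sync_write_test_cmds DEV MNT
# Print (do not run) the commands for a synchronous ext2 write test.
# DEV and MNT are hypothetical names supplied by the caller.
sync_write_test_cmds() {
    dev=$1
    mnt=$2
    # WARNING: mke2fs wipes the device when actually executed.
    printf 'mke2fs %s\n' "$dev"
    # -o sync makes ext2 wait on every write, so I/O errors get noticed.
    printf 'mount -t ext2 -o sync %s %s\n' "$dev" "$mnt"
    # Write 1 GB of zeroes to put load on the device.
    printf 'dd if=/dev/zero of=%s/bigfile bs=1024k count=1024\n' "$mnt"
    printf 'umount %s\n' "$mnt"
    # Look for write errors in the kernel log afterwards.
    printf 'dmesg | tail -n 20\n'
}

sync_write_test_cmds /dev/loop0 /mnt/test
```

If the underlying device is at fault, the synchronous writes would be expected to fail here too, and the kernel log should say so.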

Bye,
Oleg


Re: reiserfs on removable media

2003-07-02 Thread Oleg Drokin
Hello!

On Wed, Jul 02, 2003 at 02:23:13PM -0400, Zygo Blaxell wrote:

> - If the device is detached while a filesystem is mounted, reiserfs gets a
> whole lot of I/O errors (or worse) and immediately oopses.  It would be
> nice if reiserfs would handle this a bit more gracefully--it should at
> least let me kill processes with open files and umount the filesystem.
> OTOH many other things also oops with the current USB/firewire/scsi device
> driver stack too.  :-P

Write errors to data areas are mostly "safe".
It's write errors into the journal area that kill the thing.
Jeff Mahoney of SuSE has a patch that remounts the FS R/O in
case of such an event (I think he even posted some preliminary patches
here); it is most probably what you need in this case.

Bye,
Oleg


Recipe for reiserfs oops on "removable" disks (was: Re: reiserfs on removable media)

2003-07-02 Thread Zygo Blaxell
On Wed, 02 Jul 2003 14:45:44 -0400, Andreas Dilger wrote:
> On Jul 02, 2003  14:23 -0400, Zygo Blaxell wrote:

> This is called ordered data mode, and exists on ext3 and also reiserfs
> with Chris Mason's patches.

Ah, thank you, I had forgotten that the feature had a name, and that ext3
can be configured to have the same behavior.  ;-)

> Well, if something oopses you are pretty much stuck w.r.t. killing the
> process and unmounting the fs.  So fix the oopses and the rest should
> come around as a result.  Of course, the reiserfs folks can do a lot
> more with a specific oops report than just "it immediately oopses".  ;-)

But it _does_ immediately oops.  Actually that's not true, it BUG()s
first.  And depending on the device driver chain it may also oops in
other places.

I guess I forgot that "the oops I've seen about 20 times this morning"
isn't really useful information to other people.  ;-)

You can get a similar oops without any special hardware using the network
block device.  Here's the recipe:

Create a large-file on machine A.  Run 'nbd-server 1 large-file' on machine
A.  The file has to be big enough that mkreiserfs can create a filesystem
on it.

Run the following on machine B:

nbd-client A 1 /dev/nbd/0
mkreiserfs /dev/nbd/0
mount /dev/nbd/0 /test
ls -l / > /test/some-data

Then on machine A:

killall nbd-server

Then do something on machine B with the /test filesystem, and watch the
fireworks.  It looks something like this:

NBD: receive - sock=-1040559660 at buf=-1047896328, size=16 returned 0
.   
NBD: Recv control failed.(result 0) 
req should never be null
nbd: shutting down socket   
nbd: queue cleared  
Kernel call returned.Closing: que, sock, done   
NBD, minor 0: Request when not-ready.   
NBD, minor 0: Request when not-ready.   
NBD, minor 0: Request when not-ready.   
NBD, minor 0: Request when not-ready.   
NBD, minor 0: Request when not-ready.   
NBD, minor 0: Request when not-ready.   
NBD, minor 0: Request when not-ready.   
NBD, minor 0: Request when not-ready.   
NBD, minor 0: Request when not-ready.   
journal-601, buffer write failed  
[the rest is filtered through ksymoops]
kernel BUG at prints.c:334! 
invalid operand:    
CPU:0   
EIP:0010:[]Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 0282
eax: 0024   ebx: c02f2460   ecx: c034cd2c   edx:    
esi: c133e000   edi: c133e000   ebp: 0004   esp: c1fedec8   
ds: 0018   es: 0018   ss: 0018  
Process kupdated (pid: 6, stackpage=c1fed000)   
Stack: c02f035a c0403740 c02f2460 c1fedeec c28540cc  c01af619 c133e000  
   c02f2460   0010  c2854100 c28540f4 0005  
    c1307814 c01b313e c133e000 c28540cc 0001 c1fedf80 c133e000  
Call Trace:[] [] [] [] [] 
  [] [] []
Code: 0f 0b 4e 01 60 03 2f c0 68 40 37 40 c0 85 f6 74 16 31 c0 66   


>>EIP; c01a4fa9<=

>>ebx; c02f2460 
>>ecx; c034cd2c 
>>esi; c133e000 <_end+ef7548/3047d548>
>>edi; c133e000 <_end+ef7548/3047d548>
>>esp; c1fedec8 <_end+1ba7410/3047d548>

Trace; c01af619 
Trace; c01b313e 
Trace; c01b230f 
Trace; c01a24be 
Trace; c014b508 
Trace; c014a267 
Trace; c014a736 
Trace; c0107420 

Code;  c01a4fa9 
 <_EIP>:
Code;  c01a4fa9<=
   0:   0f 0b                     ud2a      <=
Code;  c01a4fab 
   2:   4e                        dec    %esi
Code;  c01a4fac 
   3:   01 60 03                  add    %esp,0x3(%eax)
Code;  c01a4faf 
   6:   2f                        das
Code;  c01a4fb0 
   7:   c0 68 40 37               shrb   $0x37,0x40(%eax)
Code;  c01a4fb4 
   b:   40                        inc    %eax
Code;  c01a4fb5 
   c:   c0 85 f6 74 16 31 c0      rolb   $0xc0,0x311674f6(%ebp)
Code;  c01a4fbc 
  13:   66                        data16


> Not much you can do about the IO

Re: reiserfsck 3.6.8 + corrupted filesystem

2003-07-02 Thread Zygo Blaxell
On Wed, 02 Jul 2003 14:31:36 -0400, Vitaly Fertman wrote:
> fsck should not abort if the in-memory data on pass1 (which were built on
> pass0 of fsck) match what they should be. Otherwise it looks like a
> hardware problem with memory or something like that.

OK, that clears things up a bit.

Basically something in the output of pass 0 is different from what is
expected based on in-RAM data in pass 1.  The diagnostic message could be
(much!) clearer about that, rather than just guessing that I must have "bad
memory."

In a nutshell, either the system RAM is bad, or the disks are mangling
data without returning errors, or reiserfsck has a bug that causes it to
expect to find that it has previously written something that it hasn't.

I'll exercise RAM and disks over the next N days to try to eliminate
hardware as a possible cause.


Re: Journal-601 error on Redhat 7.3 / reiserfs / ext3 / raid 5

2003-07-02 Thread Jussi Vainionpää
Oleg Drokin wrote:

>>Apr 27 20:18:06 un kernel: journal-601, buffer write failed
>
>I do not know who to blame here. Try to heavily write to loop device
>itself (without using reiserfs) to see if something will break? Or better
>yet - upgrade to newer kernel and see if that cures your problem?

I tried the same operation using ext2 instead of reiserfs and at least that
worked without any problems.




Re: reiserfs on removable media

2003-07-02 Thread Zygo Blaxell
On Wed, 02 Jul 2003 14:53:39 -0400, Hans Reiser wrote:
> Remind me about removable media around January, and we'll write some
> code for reiser4 to make it more graceful for it (somehow prompt the
> user to insert disk, etc.)

Ow!  Ow!  Ow!  Kernel prompting the user...  Ow!  ;-)

Now, "kernel notifying an automounter daemon process, which talks to the
user in user-space" is somewhere in the realm of possibility...

Actually to be clear there are two topics here:  removable _media_ and
removable _drives_.  e.g. a typical IDE disk is non-removable media in a
non-removable drive.

A floppy disk in a typical floppy drive is removable media in a
non-removable drive, but a floppy disk in a USB floppy drive is removable
media in a removable drive.  Practically speaking there's not much
difference--if the drive was removed, you'd have to assume the media was
removed as well, if only because there's no way to receive media-change
notifications if the drive isn't connected.

The USB drive I wrote about earlier is a desktop non-removable IDE disk in
a removable drive.

The difference is subtle but it does allow for some interesting stuff to
happen in the block device layer.  The hard drive has a serial number,
which (in theory) could be queried by the USB storage layer and used as a
unique identifier for the drive.  This could e.g. suspend all read/write
requests to the drive while it is disconnected, and resume said requests
when it is reconnected.  That's all I'd need for my laptop setup
(assuming I don't connect the drive somewhere else, that is)...and it
doesn't require changing one line of reiserfs.


Re: reiserfs on removable media

2003-07-02 Thread Chris Mason
On Wed, 2003-07-02 at 15:08, Dieter Nützel wrote:
> On Wednesday, 2 July 2003 at 20:59, Chris Mason wrote:
> > On Wed, 2003-07-02 at 14:53, Hans Reiser wrote:
> > > >This is called ordered data mode, and exists on ext3 and also reiserfs
> > > > with Chris Mason's patches.  Under normal usage it shouldn't change
> > > > performance compared to writeback data mode (which is what reiserfs
> > > > does by default).
> 
> Chris,
> 
> I thought data=ordered is the "new" default with your patch?
> 
It is.

> > > It had some impact, I forget exactly how much, maybe Chris can
> > > resuscitate his benchmark of it?
> >
> > The major cost of data=ordered is that dirty blocks are flushed every 5
> > seconds instead of every 30.  The journal header patch in my
> > experimental data logging directory changes things so that only new
> > bytes in the file are done in data=ordered mode (either adding a new
> > block or appending onto the end of the file).
> >
> > This helps a lot in the file rewrite tests.
> 
> What's faster with your patches? ordered|journal|writeback?
> 
> I thought the order is: writeback < ordered < journal ;-)

Usually ;-)  ordered is faster in a few rare benchmarks because it helps
keep the number of dirty buffers lower and generally sends the dirty
buffers to the disk in a big flood.

journal is faster for some fsync heavy benchmarks.

For practical desktop usage, data=ordered and writeback are usually
close to each other.
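
For anyone who wants to compare the modes themselves, here is a rough sketch of the mount commands involved. It uses the ext3 data= option; whether the patched reiserfs takes exactly the same option spelling is an assumption, and /dev/hdb1 and /mnt/test are placeholder names. The helper only prints the commands.

```shell
#!/bin/sh
# print_data_mode_mounts DEV MNT
# Print one mount command per journaling mode so each can be benchmarked.
# DEV and MNT are hypothetical names supplied by the caller.
print_data_mode_mounts() {
    dev=$1
    mnt=$2
    for mode in writeback ordered journal; do
        printf 'mount -t ext3 -o data=%s %s %s\n' "$mode" "$dev" "$mnt"
    done
}

print_data_mode_mounts /dev/hdb1 /mnt/test
```

Run the same benchmark after each mount (unmounting in between) to see how the modes rank for a given workload.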

-chris




Re: reiserfs on removable media

2003-07-02 Thread Dieter Nützel
On Wednesday, 2 July 2003 at 20:59, Chris Mason wrote:
> On Wed, 2003-07-02 at 14:53, Hans Reiser wrote:
> > >This is called ordered data mode, and exists on ext3 and also reiserfs
> > > with Chris Mason's patches.  Under normal usage it shouldn't change
> > > performance compared to writeback data mode (which is what reiserfs
> > > does by default).

Chris,

I thought data=ordered is the "new" default with your patch?

> > It had some impact, I forget exactly how much, maybe Chris can
> > resuscitate his benchmark of it?
>
> The major cost of data=ordered is that dirty blocks are flushed every 5
> seconds instead of every 30.  The journal header patch in my
> experimental data logging directory changes things so that only new
> bytes in the file are done in data=ordered mode (either adding a new
> block or appending onto the end of the file).
>
> This helps a lot in the file rewrite tests.

What's faster with your patches? ordered|journal|writeback?

I thought the order is: writeback < ordered < journal ;-)

Thanks,
Dieter



Re: reiserfs on removable media

2003-07-02 Thread Chris Mason
On Wed, 2003-07-02 at 14:53, Hans Reiser wrote:

> >This is called ordered data mode, and exists on ext3 and also reiserfs with
> >Chris Mason's patches.  Under normal usage it shouldn't change performance
> >compared to writeback data mode (which is what reiserfs does by default).
> >
> It had some impact, I forget exactly how much, maybe Chris can 
> resuscitate his benchmark of it?
> 

The major cost of data=ordered is that dirty blocks are flushed every 5
seconds instead of every 30.  The journal header patch in my
experimental data logging directory changes things so that only new
bytes in the file are done in data=ordered mode (either adding a new
block or appending onto the end of the file).

This helps a lot in the file rewrite tests.

-chris




Re: reiserfs on removable media

2003-07-02 Thread Hans Reiser
Andreas Dilger wrote:

>On Jul 02, 2003  14:23 -0400, Zygo Blaxell wrote:
>
>>Two reiserfs improvements come to mind:
>>
>>- There is a tendency for files that were being grown at crash time to
>>contain invalid data.  It seems that the inodes are being updated before
>>the data blocks they refer to are written.  It would be nice if the inode
>>writes were deferred (or at least made invisible) until after the data
>>blocks were written.  I'd rather lose my data than possibly have random
>>garbage masquerading as my data.
>
>This is called ordered data mode, and exists on ext3 and also reiserfs with
>Chris Mason's patches.  Under normal usage it shouldn't change performance
>compared to writeback data mode (which is what reiserfs does by default).

It had some impact, I forget exactly how much, maybe Chris can 
resuscitate his benchmark of it?

Remind me about removable media around January, and we'll write some 
code for reiser4 to make it more graceful for it (somehow prompt the 
user to insert disk, etc.)

--
Hans



Re: reiserfs on removable media

2003-07-02 Thread Andreas Dilger
On Jul 02, 2003  14:23 -0400, Zygo Blaxell wrote:
> Two reiserfs improvements come to mind:
> 
> - There is a tendency for files that were being grown at crash time to
> contain invalid data.  It seems that the inodes are being updated before
> the data blocks they refer to are written.  It would be nice if the inode
> writes were deferred (or at least made invisible) until after the data
> blocks were written.  I'd rather lose my data than possibly have random
> garbage masquerading as my data.

This is called ordered data mode, and exists on ext3 and also reiserfs with
Chris Mason's patches.  Under normal usage it shouldn't change performance
compared to writeback data mode (which is what reiserfs does by default).

> - If the device is detached while a filesystem is mounted, reiserfs gets a
> whole lot of I/O errors (or worse) and immediately oopses.  It would be
> nice if reiserfs would handle this a bit more gracefully--it should at
> least let me kill processes with open files and umount the filesystem.
> OTOH many other things also oops with the current USB/firewire/scsi device
> driver stack too.  :-P

Well, if something oopses you are pretty much stuck w.r.t. killing the
process and unmounting the fs.  So fix the oopses and the rest should
come around as a result.  Of course, the reiserfs folks can do a lot
more with a specific oops report than just "it immediately oopses".  ;-)

Not much you can do about the IO errors (i.e. working as designed).
That's going to happen if you remove your device while writing to it.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Re: reiserfsck 3.6.8 + corrupted filesystem

2003-07-02 Thread Vitaly Fertman
Hi, 

On Wednesday 02 July 2003 21:31, Zygo Blaxell wrote:
> I've been running reiserfsck over a corrupted filesystem (IDE disks, dead
> fans, overheating embedded controller RAM, smoke...you get the picture).
> The messages are...interesting.
>
> What is the meaning of the message "The problem has occurred looks like
> a hardware problem (perhaps memory)."?  Is that referring to the memory
> of reiserfsck, or is it suggesting there is some kind of data consistency
> issue on the disk, or is it suggesting that the corruption it is seeing
> on the disk might have been the result of bad memory some time in the
> past?

Hardware problem means a problem with your hardware, not software.
Perhaps you want to run memtest and check your memory, or perhaps it is
something else, but the data fsck built in memory on pass0 turned out to
be wrong on pass1.

> I've been running reiserfsck --rebuild-tree in a while loop until it fixes
> the FS.  It seems that each time through it gets a little further along,
> then near the end of pass 1, reiserfsck complains that something wasn't
> done in pass 0 and aborts.  Pass 0 runs again, and some additional changes
> are made which fix whatever pass 1 was complaining about.  Pass 1 runs
> again, gets a little further than it did the previous run, then aborts
> a few thousand blocks later.  The most recent run suggests that this
> might continue in pass 2 (complaining about things not done by both pass
> 1 and 0), but I've never gotten to pass 2 yet to find out.
>
> Here are parts of the three reiserfsck runs so far (actually I did some
> more earlier, but those were 3.6.6 not 3.6.8).  Note I've left out
> several thousand lines of pass0 output, most of which involves deleting
> invalidly formatted nodes, directories with bad types, wrong order
> entries in directories...basically what you'd expect if one disk out of
> a RAID array was randomly corrupted.
>
> I realize that there is huge data loss here, but IMHO reiserfsck should at
> least salvage the FS without calling abort() on itself.

fsck should not abort if the in-memory data on pass1 (which were built on
pass0 of fsck) match what they should be. Otherwise it looks like a
hardware problem with memory or something like that.

> I also realize that these log sections are useless as a bug report.

Actually, these log sections were intended to explain that something
unexpected happened that does not look like an fsck problem. So you should
check all your hardware (the hint about what should be checked first is
given) and not continue unless you are sure it is working properly. Only if
the problem occurs again in the same place -- which would then look like an
fsck problem -- report it.

-- 
Thanks,
Vitaly Fertman


reiserfs on removable media

2003-07-02 Thread Zygo Blaxell
I have a 120GB reiserfs in a portable disk enclosure with USB2.0 and
IEEE1394 interfaces.  Unfortunately the current Linux USB and firewire
drivers in 2.4.21 still have nasty issues, with the result that I've had
too many crashes to count while working out how to get the device drivers
to talk to this disk reliably (probably 50 or more crashes so far).

Obviously these problems aren't reiserfs's fault, nor can reiserfs do
anything about these problems, but it's nice to see that reiserfs survives
as well as it does.

Two reiserfs improvements come to mind:

- There is a tendency for files that were being grown at crash time to
contain invalid data.  It seems that the inodes are being updated before
the data blocks they refer to are written.  It would be nice if the inode
writes were deferred (or at least made invisible) until after the data
blocks were written.  I'd rather lose my data than possibly have random
garbage masquerading as my data.

- If the device is detached while a filesystem is mounted, reiserfs gets a
whole lot of I/O errors (or worse) and immediately oopses.  It would be
nice if reiserfs would handle this a bit more gracefully--it should at
least let me kill processes with open files and umount the filesystem.
OTOH many other things also oops with the current USB/firewire/scsi device
driver stack too.  :-P

Otherwise, this particular reiserfs has survived all of the crashes so
far, even under the heavy I/O loads that seem to trigger the crashes.
Cool.


a happier reiserfsck story

2003-07-02 Thread Zygo Blaxell
I have a USB2.0/firewire external disk drive which has a 120GB reiserfs
filesystem on top of a loop-AES loopback FS.  Often the nbd driver is
involved too, as I'm testing various hardware configurations to isolate
ieee1394 or USB so that I can reliably talk to this disk from my laptop.

The current state of Linux USB and IEEE1394 drivers is such that this
filesystem has endured _many_ crashes.  One pass of reiserfsck 3.6.6 and
one of 3.6.8 fixed the most recent crash, which was the only one that
required a reiserfsck.

Why two passes?  Well, it takes more than an hour to run reiserfsck
through firewire, and while I was waiting I figured I might as well
grab the latest version in case 3.6.6 didn't work (it didn't).


reiserfsck 3.6.8 + corrupted filesystem

2003-07-02 Thread Zygo Blaxell
I've been running reiserfsck over a corrupted filesystem (IDE disks, dead
fans, overheating embedded controller RAM, smoke...you get the picture).
The messages are...interesting.

What is the meaning of the message "The problem has occurred looks like
a hardware problem (perhaps memory)."?  Is that referring to the memory
of reiserfsck, or is it suggesting there is some kind of data consistency
issue on the disk, or is it suggesting that the corruption it is seeing
on the disk might have been the result of bad memory some time in the
past?

I've been running reiserfsck --rebuild-tree in a while loop until it fixes
the FS.  It seems that each time through it gets a little further along,
then near the end of pass 1, reiserfsck complains that something wasn't
done in pass 0 and aborts.  Pass 0 runs again, and some additional changes
are made which fix whatever pass 1 was complaining about.  Pass 1 runs
again, gets a little further than it did the previous run, then aborts
a few thousand blocks later.  The most recent run suggests that this
might continue in pass 2 (complaining about things not done by both pass
1 and 0), but I've never gotten to pass 2 yet to find out.
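
The loop described above can be sketched roughly as follows. This is not the actual /root/bin/md0-fsck script, only an illustration: retry_until_ok is a hypothetical helper, and it assumes the wrapped command exits with status 0 once it has finished cleanly, which may not match reiserfsck's real exit codes.

```shell
#!/bin/sh
# retry_until_ok MAX CMD [ARGS...]
# Rerun CMD until it exits with status 0 or MAX attempts have been made.
# Hypothetical helper for illustration; not part of reiserfsprogs.
retry_until_ok() {
    max=$1
    shift
    n=1
    while [ "$n" -le "$max" ]; do
        if "$@"; then
            echo "succeeded on attempt $n"
            return 0
        fi
        echo "attempt $n failed" >&2
        n=$((n + 1))
    done
    return 1
}

# Intended use (not executed here, needs root and rewrites the filesystem):
#   retry_until_ok 10 sh -c 'echo Yes | reiserfsck --rebuild-tree /dev/md0'
```

Capping the number of attempts keeps the loop from spinning forever if each run aborts at a new block, as in the logs below.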

Here are parts of the three reiserfsck runs so far (actually I did some
more earlier, but those were 3.6.6 not 3.6.8).  Note I've left out
several thousand lines of pass0 output, most of which involves deleting
invalidly formatted nodes, directories with bad types, wrong order
entries in directories...basically what you'd expect if one disk out of
a RAID array was randomly corrupted.

I realize that there is huge data loss here, but IMHO reiserfsck should at
least salvage the FS without calling abort() on itself.

I also realize that these log sections are useless as a bug report.
On the other hand, the messages keep changing anyway, so the state of the
FS is a bit of a moving target.  ;-)



(pass 0 elided)


18163 directory entries were hashed with not set hash.
23916806 directory entries were hashed with "r5" hash.
"r5" hash is selected
Flushing..finished
Read blocks (but not data blocks) 158433879
Leaves among those 925148
- corrected leaves 3613
- leaves all contents of which could not be saved and deleted 1584
pointers in indirect items to wrong area 23559 (zeroed)
Objectids found 4953

Pass 1 (will try to insert 923564 leaves):
### Pass 1 ###
Looking for allocable blocks .. finished
0%20%40%  left 523692, 160 /sec
The problem has occurred looks like a hardware problem (perhaps memory).
Send us the bug report only if the second run dies at the same place with
the same block number.

build_the_tree: Nothing but leaves are expected. Block 59067053 - ??

/root/bin/md0-fsck: line 7:   884 Done                    echo Yes
   885 Aborted                 | reiserfsck "$@" /dev/md0
+ mount /md0
mount: Not a directory



(pass 0 elided)



pass0: vpf-10160: block 64352473: item 7: No "." entry found in the first item of a directory
left 0, 3795 /sec
793 directory entries were hashed with not set hash.
23911554 directory entries were hashed with "r5" hash.
"r5" hash is selected
Flushing..finished
Read blocks (but not data blocks) 158433879
Leaves among those 922471
- corrected leaves 899
- leaves all contents of which could not be saved and deleted 1767
pointers in indirect items to wrong area 16751 (zeroed)
Objectids found 4942

Pass 1 (will try to insert 920704 leaves):
### Pass 1 ###
Looking for allocable blocks .. finished
0%20%40%is_leaf_bad: block 59177036, item 0: The corrupted item found (845456 215423828 0xcd7e001 ??? (15), len 4048, location 48 entry count 0, fsck need 0, format new)
is_leaf_bad: WARNING: The leaf (59177036) is formatted badly. Will be handled on the the pass2.
  left 520674, 166 /sec
The problem has occurred looks like a hardware problem (perhaps memory).
Send us the bug report only if the second run dies at the same place with
the same block number.

build_the_tree: Nothing but leaves are expected. Block 59373117 - ??

/root/bin/md0-fsck: line 7: 21821 Done                    echo Yes
 21822 Aborted                 | reiserfsck "$@" /dev/md0


(pass 0 elided)


191 directory entries were hashed with not set hash.
23911285 directory entries were hashed with "r5" hash.
"r5" hash is selected
Flushing..finished
Read blocks (but not data blocks) 158433879
Leaves among those 921865
- corrected leaves 282
- leaves all contents of which could not be saved and deleted 1821
pointers in indirect items to wrong area 7191 (zeroed)
   

data-logging for 2.4.22-pre3

2003-07-02 Thread Oleg Drokin
Hello!

   Yes, I know that 2.4.22-pre3 is not out yet, but Marcelo has accepted our
   somewhat big patches, so you can get replacement patches from
   ftp://namesys.com/pub/reiserfs-for-2.4/testing/data-logging-and-quota-2.4.22-pre3
   once 2.4.22-pre3 is out ;)
   Also, starting from 2.4.22-pre3 you no longer need to apply the
   03-relocation-8.diff.gz patch.

Bye,
Oleg