Re: Problems with "--rebuild-tree" on network (ENBD) storage

Vladimir V. Saveliev Mon, 09 Oct 2006 07:54:08 -0700

Hello

On Friday 06 October 2006 17:10, Bas van Schaik wrote:
> Hi Vladimir,
> >>> OK, may I ask you to run badblocks on that device? reiserfsck needs to be 
> >>> able to read and write the filesystem device.
> >>> badblocks will show us whether your device is in good shape. 
> >>>       
> >> Of course you may ask me this, but I really don't think it's relevant.
> >> ReiserFS is on top of (in this specific order) CryptoLoop, LVM, RAID5
> >> and ENBD. If there are bad blocks on one of the 12 (!) disks, then one
> >> of my storage servers in the ENBD-cluster would report a bunch of I/O
> >> errors, RAID5 would drop the device and ReiserFS wouldn't even notice that
> >> a hard drive failed.
> >> Furthermore, every RAID5 device has had a resync since the filesystem
> >> resize operation, which implies that every bit has been checked at least
> >> once.
> >>
> >> I think the problem lies within the way reiserfsck reads and writes to
> >> the underlying block device. Maybe reiserfsck isn't opening the device
> >> in direct I/O (O_DIRECT) mode? 
> >>     
> > Right, it does not. But why would it have to?
> >
> >   
> >> I think it should, because it's safer, 
> >> though slower. Maybe O_DIRECT could be made optional, enabled (or
> >> disabled) with a command-line switch?
> >>
> >>     
> > Maybe O_DIRECT should be used; I do not argue with that. But there is
> > nothing wrong with not using O_DIRECT.
> > Why would a userland application make a computer unusable?
> > reiserfsck uses the standard libc low-level I/O functions to read and
> > write a device; it also analyses and modifies the data it reads before
> > writing it back.
> > The worst thing reiserfsck can do is consume 100% CPU, and even that
> > should not hurt a system.
> >
> > I hope you understand what I mean: if a userland application makes a box
> > unusable, something is wrong in the kernel.
> > I have never dealt with a setup like yours. There are so many layers; why
> > couldn't there be errors in one of them?
> >   
> That's true, of course. But there's (at least) one place in the kernel
> where userland touches kernel space: buffering. In my case, I think
> reiserfsck is causing starvation of my TCP buffers, because it doesn't
> use direct I/O but buffered I/O. Of course, this is a normal (and maybe
> wise) thing to do when the bottom layer is ATA or SATA (or something
> like that), but in my case there's a network somewhere between
> reiserfsck and ATA/SATA. So, I don't expect reiserfsck to use direct I/O
> by default, but it would be a nice feature for me (and the few others
> with the same problem?) if direct I/O could be enabled with a command-line
> switch.
>


I am going to send you a patch to try later today (I hope to complete debugging 
by that time).
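For readers following the buffered vs. direct I/O question, below is a minimal sketch of how a userland tool could make O_DIRECT optional. It is not the patch mentioned above: the helper names (`device_open_flags`, `read_block`, `write_block`), the idea of a `--direct-io`-style switch, and the 4096-byte alignment are all hypothetical assumptions. O_DIRECT is Linux-specific and requires the buffer address, file offset and transfer length to be aligned to the device's logical block size; 4096 is assumed here because it covers both 512-byte and 4 KiB sectors.

```c
#define _GNU_SOURCE           /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0            /* assumption: without O_DIRECT the switch is a no-op */
#endif

/* Assumed alignment for O_DIRECT buffers, offsets and lengths. */
#define DIO_ALIGN 4096

/* Hypothetical helper: build open(2) flags, adding O_DIRECT only when a
 * (hypothetical) --direct-io command-line switch was given. */
static int device_open_flags(int use_direct)
{
    int flags = O_RDWR;
    if (use_direct)
        flags |= O_DIRECT;    /* ask the kernel to bypass the page cache */
    return flags;
}

/* Read one DIO_ALIGN-sized block at block index `blk`. The buffer comes
 * from posix_memalign so it also satisfies O_DIRECT alignment. Returns a
 * buffer the caller must free(), or NULL on a short read or error
 * (e.g. EINVAL for misaligned O_DIRECT I/O). */
static void *read_block(int fd, off_t blk)
{
    void *buf = NULL;
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0)
        return NULL;
    ssize_t n = pread(fd, buf, DIO_ALIGN, blk * (off_t)DIO_ALIGN);
    if (n != DIO_ALIGN) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Write an aligned DIO_ALIGN-sized buffer back at block index `blk`,
 * then fsync() so the data is pushed towards the device in both the
 * buffered and the direct case. Returns 0 on success, -1 on error. */
static int write_block(int fd, off_t blk, const void *buf)
{
    ssize_t n = pwrite(fd, buf, DIO_ALIGN, blk * (off_t)DIO_ALIGN);
    if (n != DIO_ALIGN)
        return -1;
    return fsync(fd);
}
```

A caller would open the device with `open(path, device_open_flags(use_direct))`, taking `use_direct` from whatever option parsing the tool already has. The aligned buffer is only strictly required in the O_DIRECT case, but allocating it the same way in both modes keeps a single code path. Whether direct I/O would actually avoid the TCP buffer starvation suspected here is not established by this sketch.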

> > Can you dd_rescue your filesystem to a spare device which has fewer
> > underlying layers (linear RAID or even a plain hard disk)
> > and try reiserfsck --rebuild-tree on it?
> I'm sorry, the system is built on 12 hard drives, with a total of more
> than 3TB of disk space. I don't have that many drives available for
> creating a backup!
> 
> Thanks for your thoughts,
> 
>   -- Bas
> 
> 
>
