Re: Which version will be merged into mainline kernel?

2006-11-08 Thread Andreas Dilger
On Nov 08, 2006  10:15 +0100, Francesco Biscani wrote:
 I think the slow performance you're experiencing is caused by the fsync()
 call not being well-optimized in reiser4. I've commented out the function in
 fs/buffer.c, and I'm having much better performance on my / partition.

I don't think this can be advocated as a real solution to performance
problems.  It can mean data loss for applications such as email that
expect a successful return from fsync() to mean the data is safely on
disk.  You may as well say "I improved the performance of my backups by
writing to /dev/null".
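
To make that contract concrete, here is a minimal Python sketch of what a
mail-delivery style application relies on: it acknowledges a message only
after fsync() has returned.  The function name durable_write() and the spool
file name are made up for illustration.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may still sit only in the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    spool = os.path.join(tempfile.mkdtemp(), "msg.0001")  # hypothetical spool file
    durable_write(spool, b"Subject: test\n\nbody\n")
    print("safe to acknowledge delivery of", spool)
```

If fsync() is stubbed out in the kernel, this code still "succeeds", which is
exactly why the application can no longer tell whether its data is safe.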

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: Problem with multiple mounts

2006-11-08 Thread Andreas Dilger
On Nov 08, 2006  14:38 -0800, Suzuki wrote:
 Lennart Sorensen wrote:
 ReiserFS: sda10: checking transaction log (sda10)
 
 Oops: Kernel access of bad area, sig: 11 [#1]
 
 Call Trace:
 [C00011333090] [C01EDB70] .journal_read+0x165c/0x1b6c 
 (unreliable)
 [C00011333410] [C01EF280] .journal_init+0xdc0/0xee8
 [C00011333530] [C01CDBD8] .reiserfs_fill_super+0xa90/0x1e40
 [C00011333790] [C011E988] .get_sb_bdev+0x208/0x31c
 [C00011333870] [C01CA00C] .get_super_block+0x38/0x60
 [C00011333900] [C011E260] .vfs_kern_mount+0xec/0x198
 [C000113339B0] [C011E3E0] .do_kern_mount+0x88/0xdc
 [C00011333A50] [C01532CC] .do_mount+0xd50/0xe08
 [C00011333D60] [C0175090] .compat_sys_mount+0x368/0x448
 [C00011333E30] [C000861C] syscall_exit+0x0/0x40
 
 My question is: is this supported? Mounting a filesystem which is
 already mounted and replaying the (possibly incomplete) journal.

 Thanks for the response. This problem was reported by our test
 team on 2.6.19. So, I wanted to confirm that what they are doing is not
 supported!

I would suggest that even though this is not supported, it would be prudent
to fix such a bug.  It might be possible to hit a similar problem if the
on-disk data in the journal is corrupted, and oopsing the kernel isn't a
graceful way to deal with bad data on disk.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: Better fsck.reiser4 needed

2006-09-06 Thread Andreas Dilger
On Sep 06, 2006  13:04 +0400, Vladimir V. Saveliev wrote:
 On Wednesday 06 September 2006 12:01, Joe Feise wrote:
  It seems that fsck.reiser4 -a doesn't do anything. It doesn't even detect
  corruption.
 
 I think it is made that way intentionally. If fsck.reiser4 -a did corruption
 detection, the bootup process would take a long time and users would not have
 the main advantage of journalled filesystems - quick recovery after an unclean
 shutdown.

In e2fsck the boot-time check (-a) of ext3 only does very minimal checking:
- valid superblock
- valid journal superblock
- recover journal and any errors stored in the journal, transfer to superblock
- reverify superblock
- check if superblock recorded any metadata errors previously

At this point less than a second has normally passed and the e2fsck is done.
The kernel ext3 code can also do journal recovery, but this doesn't allow
e2fsck the chance to verify the superblock after the journal is recovered.

If the journal or filesystem superblock recorded an error in the
filesystem during the previous run (generally corruption of the metadata),
or is itself corrupt, e2fsck will force a full check.  Otherwise,
this corruption may cause endless panic+reboot cycles, or may lead to
cascading corruption of the rest of the filesystem (e.g. if allocation
bitmaps are corrupted).
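
The sequence above can be sketched roughly as follows.  This only illustrates
the decision order described in this message and is not e2fsck's actual code;
the SuperblockState fields and preen_action() are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SuperblockState:
    # All field names here are hypothetical, chosen for this sketch.
    superblock_valid: bool
    journal_superblock_valid: bool
    needs_journal_recovery: bool
    error_recorded: bool  # set when a previous run hit metadata corruption


def preen_action(sb):
    """Return the list of steps a quick boot-time check would take."""
    if not (sb.superblock_valid and sb.journal_superblock_valid):
        # A corrupt superblock or journal superblock forces a full check.
        return ["full check"]
    steps = []
    if sb.needs_journal_recovery:
        steps.append("replay journal")
        steps.append("reverify superblock")
    if sb.error_recorded:
        # An error recorded during the previous run also forces a full check.
        steps.append("full check")
    return steps or ["nothing to do"]
```

In the common case (clean superblocks, at most a journal replay) no full check
is scheduled, which is why the boot-time run normally finishes so quickly.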

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: possible recursive locking detected - while running fs operations in loops - 2.6.18-rc2-git5

2006-07-25 Thread Andreas Dilger
On Jul 26, 2006  00:16 +0200, Jesper Juhl wrote:
 What I did to provoke it was to run 6 different xterms (with a bash
 shell) with the following loops in them in a test directory that was
 initially empty :
 
 xterm1:   while true; do mkdir a; done
 xterm2:   while true; do rmdir a; done
 xterm3:   while true; do touch a/foo; done
 xterm4:   while true; do find .; done
 xterm5:   while true; do sync; sleep 1; done
 xterm6:   while true; do rm -r a; done

See racer test at ftp.lustre.org/pub/benchmarks/racer-lustre.tar.gz

It does the above plus a bunch more things, and is a truly pathological
test script that does lots of stupid user tricks, unlike normal tests
which only do operations that are expected to succeed.

PS - during the racer.sh test run rm is known to segfault after hitting
 an internal assertion, nobody is sure why.
PPS- I don't know who wrote this program, it was originally posted by
 someone not the author to linux-fsdevel or something.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: create very large file system

2006-07-20 Thread Andreas Dilger
On Jul 19, 2006  16:57 +0400, Alexander Zarochentsev wrote:
 On Wednesday 19 July 2006 16:10, Mark F wrote:
  I've tried to create a large 5TB file system using both reiserfs and
  ext3 and both have failed.
 
 you might need to convert the partition table to GPT format for 
 supporting 2TB+ partitions.  it can be done by the gnu parted tool.

Or, for that matter, don't use a partition table at all, since this
adds an unhelpful offset to all the filesystem structures and can
hurt performance on RAID where the filesystem is trying to align IO
to RAID stripe boundaries.
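
As a worked example of the offset problem, assume the traditional DOS layout
where the first partition starts at sector 63, and an assumed 64 KiB RAID
stripe.  Both numbers are illustrative and not taken from the poster's system;
the helper names are made up.

```python
SECTOR = 512  # bytes


def misalignment(partition_start_sector, stripe_bytes):
    """Bytes by which the partition start overshoots a stripe boundary."""
    return (partition_start_sector * SECTOR) % stripe_bytes


def stripes_touched(offset_in_fs, length, partition_start_sector, stripe_bytes):
    """Number of RAID stripes one filesystem I/O spans on the raw device."""
    start = partition_start_sector * SECTOR + offset_in_fs
    end = start + length - 1
    return end // stripe_bytes - start // stripe_bytes + 1


STRIPE = 64 * 1024  # assumed stripe size

# Partition starting at sector 63: every "aligned" 64 KiB write straddles
# two stripes on the underlying device.
print(misalignment(63, STRIPE), stripes_touched(0, STRIPE, 63, STRIPE))

# No partition table (filesystem starts at sector 0): one stripe per write.
print(misalignment(0, STRIPE), stripes_touched(0, STRIPE, 0, STRIPE))
```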

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: create very large file system

2006-07-20 Thread Andreas Dilger
On Jul 20, 2006  13:17 +0200, Christian Iversen wrote:
 On Thursday 20 July 2006 08:26, Andreas Dilger wrote:
  On Jul 19, 2006  16:57 +0400, Alexander Zarochentsev wrote:
   On Wednesday 19 July 2006 16:10, Mark F wrote:
    I've tried to create a large 5TB file system using both reiserfs and
    ext3 and both have failed.
  
   you might need to convert the partition table to GPT format for
   supporting 2TB+ partitions.  it can be done by the gnu parted tool.
 
  Or, for that matter, don't use a partition table at all, since this
  adds an unhelpful offset to all the filesystem structures and can
  hurt performance on RAID where the filesystem is trying to align IO
  to RAID stripe boundaries.
 
 Can linux still auto-detect raid volumes if there's no partition table?

Hmm, that I'm not sure of - we mostly deal with external RAID devices.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: reiserfs performance on ssd

2006-04-28 Thread Andreas Dilger
On Apr 28, 2006  09:07 +0200, PFC wrote:
   While I like the idea, the iram implementation is horrible for 
   various  reasons :
 
   - no ECC

I don't know why people are so keen on ECC RAM.  Why not just put an extra
socket on the board and run the RAM in RAIM (RAID for Memory) mode?
The incremental cost of ECC vs. regular RAM is FAR more than the cost
of just getting the extra stick of RAM.  Also, with RAIM you could even
hot-swap a failing DIMM, while with ECC you have to take an outage to
get back to redundancy.

   - It uses SATA hence only a very little part of the RAM speed is 
   used,  and large latencies are introduced.

Ah, but if you can connect the RAM to multiple machines, you could at
least have the hope of hot failover for the storage to another server.
That isn't something you can do with a bus-attached device.

Even the chance of any recovery is far better with such a setup (i.e.
plug into another system after motherboard dies), since you can at
least get access from another machine.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: State of the Reiser4 FS

2006-03-15 Thread Andreas Dilger
On Mar 15, 2006  20:27 +0100, Andreas Schäfer wrote:
 If it was that easy... The problem for openMosix is that most devices
 fetch data in 4k blocks via copy_from_user(). For migrated processes,
 openMosix intercepts these calls and forwards them to the node which
 currently hosts the process. This forwarding yields a high latency
 penalty.
 
 Obviously there are two ways to get rid of this problem: 
 
 * modify _every_ Linux device driver to use a
   _a_lot_more_than_4k_at_a_time_ approach or
 
 * implement a second read ahead buffer which fetches large blocks via
   the network in the background and answers calls to copy_from_user()
   directly from the local buffer

Or you can use a network filesystem like Lustre that handles this
itself ;-).  Sadly, though, it has to do both of these to get
good performance, via {sub,per}version of the VFS/VM.

Clients do delayed writes (writeback cache, with write credits from
the server to account for space) to avoid small RPCs.  They also
do large amounts of readahead (in large chunks) to improve reads
for applications and for the VM, which breaks up all reads into 4kB chunks.
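
The readahead half of this can be illustrated with a small sketch: many 4kB
reads are answered from one large chunk fetched in a single request.  This is
not Lustre code; fetch() stands in for a hypothetical slow network read, and
the 1 MiB chunk size is an assumption.

```python
class ReadAhead:
    """Serve small reads from large chunks fetched from a slow backend."""

    def __init__(self, fetch, chunk_size=1 << 20):
        self.fetch = fetch            # fetch(offset, length) -> bytes (hypothetical)
        self.chunk_size = chunk_size
        self.chunk_start = None       # offset of the chunk currently cached
        self.chunk = b""
        self.fetches = 0              # number of backend requests issued

    def read(self, offset, length):
        out = bytearray()
        while length > 0:
            start = offset - offset % self.chunk_size
            if start != self.chunk_start:
                # One large request replaces many small ones.
                self.chunk = self.fetch(start, self.chunk_size)
                self.chunk_start = start
                self.fetches += 1
            piece = self.chunk[offset - start:offset - start + length]
            if not piece:
                break  # read past the end of the data
            out += piece
            offset += len(piece)
            length -= len(piece)
        return bytes(out)
```

Reading 2 MiB sequentially in 4kB pieces through this buffer issues two
backend requests instead of 512.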

Servers also do batch block allocation and then large direct writes
instead of going through the VFS/VM.  There are still a number of
device drivers that break up bios into chunks smaller than 1MB, and
that hurts performance.

Having a generic delayed/batch allocation mechanism is definitely
the right way to go, and from my reading of linux-fsdevel this is
underway by some folks at IBM.  Since we have to support customers
dating back to 2.4.21 it will be a while before we can move over to
the newer APIs, once they are available.

 BTW: how are you guys planning to solve this 4k issue? Will you revert
 to small blocks or will you pretend to perform 4k transfers and
 assemble those in the background to, again, process large chunks at
 once? If yes, wouldn't this seriously increase CPU usage due to
 (most likely) unnecessary data duplication?

It doesn't result in data duplication, per se, since the pages are
copied into kernel space only once.  What it does mean is that there
needs to be a duplication of infrastructure in order to reassemble
and track all of these pages.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: [PATCH 00/11] reiserfs: xattr rework

2006-03-08 Thread Andreas Dilger
On Mar 08, 2006  14:12 -0500, Jeff Mahoney wrote:
 Jan Kara wrote:
  The internal i/o patches don't support tails, and that's a silver bullet
  against this working for xattrs. Most xattrs, such as ACLs, are likely
  to be only a few tens of bytes long and allocating an entire block is
  extremely wasteful.
 
  Umm, that is really nasty. Ext3 solves this by sharing a block among
  several inodes but that's far too much work to fix this bug...
 
 I had considered sharing files, and the code knows to drop a link to a
 shared file when it's changed. That's one of the features I had wanted
 from the beginning but never got around to implementing.

Just FYI, ext3 has recently implemented support for larger inodes exactly
to store small EAs with the inode instead of an external block.  For ACLs
there is some benefit to sharing the block, because the overhead is
amortized over many inodes.  However, virtually all other EA data is
unique per inode and the ext3 EA block sharing only works if ALL the EAs
for an inode are identical, so that isn't very useful if you have anything
other than ACLs to store.

The performance of in-inode EAs is vastly better than external blocks
because of seeks and not wasting 4kB of disk/RAM for 10-100 bytes of EA.
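
A back-of-the-envelope sketch of the space side of that claim, using the 4kB
block and the 10-100 byte EA sizes quoted above.  It ignores any per-block
header, and external_block_waste() is a made-up helper.

```python
BLOCK = 4096  # bytes, the external EA block size mentioned above


def external_block_waste(ea_bytes, block=BLOCK):
    """Fraction of an external EA block left unused by a small attribute."""
    return (block - ea_bytes) / block


for size in (10, 100):
    print(f"{size:>3}-byte EA wastes {external_block_waste(size):.1%} of a 4 kB block")
```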

Not sure if this is useful (haven't been following discussion too closely),
but thought I would steer you away from the shared-block idea early.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: Small ZFS / Reiser4 / Ext 'benchmark'

2006-02-02 Thread Andreas Dilger
On Feb 02, 2006  22:59 +0100, Adrian Ulrich wrote:
 If anyone is interested:
  I ran a small filesystem benchmark on my x86 PC.
 
  It includes:
 
  On Linux:
   * Reiser4
   * ReiserFS
   * Ext3
 
  On Solaris (Using 'gnusolaris'[.org] - Alpha 2)
   * UFS
   * ZFS
 
 
  NetApp's 'Postmark' was used to perform the tests.
  (Postmark simulates something like Mail/NNTP-Server load)
 
 Results:
   http://spam.workaround.ch/dull/postmark.txt
 
 (I used the *default* mkfs/mount options for all filesystems.
  If you like, i can re-run the test with non-default parameters)

If you could format (or tune2fs) the ext3 filesystem with -O dir_index
this would likely improve performance if the test is creating many files
in the same dir.  Also, is the file size limit in bytes, or kilobytes?
Unfortunately, the canonical postmark URLs I can find are not useful.

What is interesting, though maybe not terribly surprising, is that ZFS
is doing so poorly in the second test.  I'd be extremely interested in
seeing the vmstat output while the tests are running, as I've heard that
ZFS is CPU hungry because of the checksumming.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: struct dirent *ent->d_type weirdness

2005-09-19 Thread Andreas Dilger
On Sep 19, 2005  18:33 +0400, Vladimir V. Saveliev wrote:
 Kristian Köhntopp wrote:
  My goal is to walk a spool directory recursively and deal with 
  all files inside directories that I encounter. For that, I check 
  struct dirent *ent->d_type and get unexpected results.
  
 Yes, unfortunately, reiserfs does not support entry types.

   /* disable actual test and print ent->d_type */
   /* ent->d_type==DT_DIR  ent->d_name[1]==0  ent->d_name[0]0
   ent->d_name[0]!='.'*/

You need to stat() the entry to find the type if d_type is DT_UNKNOWN.
I would continue to use d_type as an optimization, however, as this
reduces the number of syscalls needed when it is supported, and for
network filesystems a stat() may be an expensive operation.
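
For comparison, Python's os.scandir() follows the same pattern: per its
documentation, DirEntry.is_dir() uses the type returned by the directory read
when the filesystem supplies it and only falls back to a stat() call when the
type is unknown.  A minimal sketch of a recursive spool walk built on it
(walk_files() is a made-up name):

```python
import os


def walk_files(top):
    """Yield paths of all non-directory entries below top, recursively."""
    with os.scandir(top) as entries:
        for entry in entries:
            # is_dir() uses the type from the directory entry when the
            # filesystem supplies it and only calls stat() when it is unknown.
            if entry.is_dir(follow_symlinks=False):
                yield from walk_files(entry.path)
            else:
                yield entry.path
```

On a filesystem that always reports DT_UNKNOWN the walk still works; it just
costs one extra stat() per entry.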

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



Re: Fastest way to find / -mtime +7.....

2005-07-20 Thread Andreas Dilger
On Jul 19, 2005  16:00 -0600, Jonathan Briggs wrote:
 How about some kind of stat-data readahead logic?  If the first two or
 three directory entries are stat'd, queue up the rest (or next
 hundred/thousand) of them.  If the disk queue is given the whole pile of
 stat requests at once instead of one at a time, it should be able to
 sort them into a reasonable order.
 
 This might even be a VFS thing to do instead of per-FS.

This is something I would be very interested in.  Having a pipeline of
stats generated when an app does readdir + in-order stat would help
reduce latency a great deal for network filesystems.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.





Re: Creating large numbers of files

2005-01-10 Thread Andreas Dilger
On Jan 10, 2005  17:34 -0500, Dan Labute wrote:
 I'm a developer trying to create large numbers of *empty* files 
 (~10).  What is the fastest way to perform such an operation other 
 than a simple open().  Will using multiple threads to perform concurrent 
 operations help significantly, or am I just as well off using a single thread?

If you call 'mknod("/path/to/file", S_IFREG | 0666, 0)' from a program
it avoids the overhead of doing both an open and a close.
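
A minimal sketch of the same call from Python, where os.mknod() wraps the
system call (Unix only).  The function name and the file naming scheme are
made up for illustration, and no speedup figure is implied.

```python
import os
import stat


def create_empty_files(directory, count):
    """Create count empty regular files using one mknod() call per file."""
    for i in range(count):
        path = os.path.join(directory, f"f{i:07d}")  # hypothetical naming scheme
        # S_IFREG asks for a regular file; no descriptor is opened,
        # so no close() is needed afterwards.
        os.mknod(path, stat.S_IFREG | 0o666, 0)
```

Each file costs a single system call here, versus an open() plus a close()
with the usual approach.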

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/





Re: Bug in scandir d_type

2004-12-07 Thread Andreas Dilger
On Dec 07, 2004  12:54 -0500, Jeff Mahoney wrote:
 [EMAIL PROTECTED] wrote:
 | Hello.
 |
 | I am working with a program that uses scandir, using d_type to check if a
 | file has certain properties. It seems like that when used on reiserfs 3.6
 | (I haven't tried any other version), the d_type field is always 0 (zero).
 | When the program is moved onto an ext2 partition, it works. The example
 | program in man scandir works also the same way (just replace d_name with
 | d_type when printing out).
 |
 | Kernel:  2.6.8.1 with reiserfs compiled in.
 
 The d_type feature appears to be optional. ext[23] only supports it
 because the feature was tacked on later, it's protected by an
 incompatible feature bit. Most other Linux filesystems only bother
 returning something other than DT_UNKNOWN for . and .., which is kind of
 silly.
 
 In order to get the type information from the file/directory, it either
 needs to be stored with the directory entry (disk format change
 required), or readdir needs to load _every_ inode referenced by the
 directory which would be an immense performance hit for such a small
 corner case. ReiserFS has been in the mainline kernel for years now, and
 your message is the first complaint I've seen about this feature missing.

Doesn't reiserfs store the mode in the directory entry anyways, or is that
only reiser4?  So the overhead of returning the filetype is virtually nil.

 If you truly need the type information, a more portable solution would
 be to stat() each filename returned. You can generate the d_type value
 as follows:

Which obviates the whole point of d_type, which is to get the filetype
efficiently without hundreds of extra syscalls (which is at least part
of what Hans wants with sys_reiser4()).

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/





Re: [RFC] Pathname Semantics with //

2004-09-09 Thread Andreas Dilger
Christian Mayrhuber wrote:
 What about using // as some URI entry point?

One problem that using // may have (though it is personally my favourite
option right now) is that realpath(3) may cause the // to be eaten, and
this is used by many programs to resolve pathnames to remove symlinks,
bogus /./ etc.  This may need a small fix in glibc, but at least it is
still central instead of teaching a million apps about different semantics.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/





Re: reiserfs_create: no enough blocks on device

2004-08-02 Thread Andreas Dilger
On Aug 02, 2004  14:23 -0700, sankarshana rao wrote:
 I am trying to create a reiserfs filesystem using the
 command 
 'mkreiserfs  /dev/loop/0 100'.
 It always gives me the error 
 reiserfs_create: no enough blocks on device.
 
 I tried altering the block size, but it would not
 help..

You can't make a reiserfs filesystem smaller than about 40MB.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/





Re: reiser4 metas/bmap problem

2004-07-28 Thread Andreas Dilger
On Jul 28, 2004  18:26 -0500, Matt Stegman wrote:
 I'm finally taking the time to start testing reiser4, and I'm running into
 something odd.  Some of the time, reiser4 doesn't report a file's blocks
 until I run 'sync'.  This shows up in metas/bmap as many blocks are
 reported as 0.
 
 # dd if=/dev/zero of=/mnt/reiser4/file bs=1M count=50
 50+0 records in
 50+0 records out
 # chmod +x /mnt/reiser4/file
 # grep '^0$' /mnt/reiser4/file/metas/bmap | wc -l
 1792
 # sleep 100
 # grep '^0$' /mnt/reiser4/file/metas/bmap | wc -l
 1792
 # sync
 # grep '^0$' /mnt/reiser4/file/metas/bmap | wc -l
 0
 
 I tried waiting to see if it gets written out in the background
 asynchronously.  As you can see, it still shows up after a couple minutes,
 so it doesn't appear to get written until an actual 'sync' command is run.
 But it shows up inconsistently - many files don't show this.

I believe one of the speedups of reiser4 is that it doesn't actually write
data to disk for minutes at a time, vs. 5s or so for most other filesystems.

The ext3_bmap() function flushes all file data to disk when called, and it
would be prudent to do the same with reiser4, since bmap users tend to be
important and not speed critical (e.g. lilo) and failing to do so can mean
not booting later.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/





Re: 2 Terabyte install

2004-05-11 Thread Andreas Dilger
On May 11, 2004  19:51 -0700, Clifford Beshers wrote:
 Thanks for the help everyone.  In the end, it turned out that sfdisk is 
 indeed the culprit.  We created a 2.1T partition with fdisk, then asked 
 sfdisk for the size and it said: -92,xxx,xxx.  Unfortunately, this means 
 we have to either fix it or find an alternate solution to creating 
 partitions.  If the word ``parted'' is on the tip of your tongue  I'll 
 bet you haven't actually used the thing...
 
 There were a few bugs of ours that acted as red herrings, but Linspire 
 is now up and running on this system with ReiserFS 3 and kernel 2.6.5.
 
 While I'm here, I have some other questions:
 
 * What is the time complexity of mounting a ReiserFS partition?  It
   seems to be proportional to the size of the partition?  Is it
   different for Reiser4?

AFAIK, reiserfs will do the initial zeroing of the journal and filesystem
bitmaps at the first mount time instead of at mkreiserfs time.  I don't
know why it was done that way.

 * Is there a tool to determine the type of file system on a
   partition without mounting it?

Lots of them.  file -s /dev/foo or if you have a newer e2fsprogs (1.33
and newer I think) you can use blkid [dev ...] to tell you a bunch of
things about each device (LABEL, UUID, TYPE).
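
As a rough illustration of what such tools do under the hood, the sketch below
looks for a magic number at a fixed offset.  The offsets and magic values
reflect my understanding of the ext2/ext3 and reiserfs 3.x on-disk formats and
should be verified against the respective headers before relying on them;
guess_fs_type() is a made-up helper, and real tools check far more than this.

```python
import io

EXT_MAGIC_OFFSET = 1024 + 56            # superblock at 1 KiB, magic at byte 56 (assumed)
EXT_MAGIC = b"\x53\xef"                 # 0xEF53 stored little-endian
REISERFS_MAGIC_OFFSET = 64 * 1024 + 52  # superblock at 64 KiB, magic at byte 52 (assumed)
REISERFS_MAGICS = (b"ReIsErFs", b"ReIsEr2Fs")


def guess_fs_type(dev):
    """Return 'ext2/3', 'reiserfs' or None for a seekable binary stream."""
    dev.seek(EXT_MAGIC_OFFSET)
    if dev.read(2) == EXT_MAGIC:
        return "ext2/3"
    dev.seek(REISERFS_MAGIC_OFFSET)
    magic = dev.read(9)
    if any(magic.startswith(m) for m in REISERFS_MAGICS):
        return "reiserfs"
    return None


if __name__ == "__main__":
    # Demonstrate on an in-memory image instead of a real device.
    image = bytearray(128 * 1024)
    image[EXT_MAGIC_OFFSET:EXT_MAGIC_OFFSET + 2] = EXT_MAGIC
    print(guess_fs_type(io.BytesIO(bytes(image))))
```

On a real system the stream would come from something like
open("/dev/foo", "rb"), which needs read permission on the device.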

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Re: tests to see how ext3 reiserfs 3.6 and jfs survive disk errors.

2004-03-31 Thread Andreas Dilger
On Mar 31, 2004  16:00 +0400, Vladimir Saveliev wrote:
 On Wed, 2004-03-31 at 15:44, Ivan Ivanov wrote:
  I made some tests to see how ext3 reiserfs 3.6 and jfs survive disk
  errors.
  
  The test is simple:
  format a partition, copy the kernel source, unmount and do "dd
  if=/dev/zero of=/dev/hdd bs=512 count=10 seek=3" to simulate
  disk surface damage and then run fsck.
  
  seek=3 - this must be the second half of journal in reiserfs and
  ext3, for jfs I don't know
  
 
 Well, not that I defend reiserfs's i/o error handling
 
 But I do not think that your test is a fair one. You overwrote area
 where reiserfs stored metadata for data you copied into it. (not sure
 about jfs, it probably has the same problem). Do you want to try to
 overwrite ext3's inode tables?

Actually, with a 51MB write it is guaranteed to overwrite at least one
inode table somewhere in the filesystem (one inode table per 32MB of disk).
One of the reasons that ext2/ext3 can survive such actions is that the
metadata is in a fixed location, so even if everything is
overwritten it knows what is inode table, what is data blocks, etc.  This
makes ext3 less flexible (i.e. no dynamic inode allocation) but also more
robust.

  jfs:
  
  total data loss, can't mount, fsck didn't help
  
  reiserfs:
  
  doing "reiserfsck --rebuild-tree" moves all recovered data into lost+found,
  but the information is almost unusable
  
  ext3:
  
  after "fsck.ext3 -f -y" almost everything was usable, directory
  structure was untouched, some files were moved into lost+found, but in
  general everything was usable.
  
  My opinion:
  I can't use anything but ext2/3 in a system where there is no RAID - 99%
  of desktops and most of web and mail servers.

If you have time, you may want to try overwriting some other parts of the
filesystem, just to see if the results change.  I don't think it will make
a huge difference in the end, but it might.  Note that 51MB is a large
fraction of the size of a Linux kernel so you might end up overwriting 1/4
of all the data.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Re: Object Oriented FS

2003-10-27 Thread Andreas Dilger
On Oct 27, 2003  21:23 +0300, Hans Reiser wrote:
 darren wrote:
  allows very high throughput by scaling

 Do you know what that  means?  (Seriously, I don't)

Panasas (from what little I have seen of it) looks to be very similar to
Lustre.  Clients do not access disks directly, but rather have a mapping
layer between offsets and actual data on disk (similar to LVM and VM
abstractions of block devices/memory).  Once the client has this mapping
(very small in Lustre, on the order of a few hundred bytes at most), it
can do IO directly to one or more storage devices, hence throughput scales
in proportion to the number of storage targets.

In Lustre at least, the client knows nothing about the physical layout of
blocks in the file on disk, but rather just accesses one or more objects
via object identifier, offset, and length, so the actual layout of the on-disk
data can change.
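
To illustrate how small such a mapping can be, here is a sketch that assumes
simple round-robin (RAID-0 style) striping of a file over several objects.
This is an assumption made for the example, not Lustre's actual code or
layout format; locate() and its parameters are hypothetical.

```python
def locate(offset, stripe_size, stripe_count):
    """Map a file offset to (object index, offset inside that object)."""
    stripe_number = offset // stripe_size
    obj_index = stripe_number % stripe_count
    obj_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return obj_index, obj_offset


MiB = 1 << 20
# With 4 objects and 1 MiB stripes, consecutive megabytes of the file land
# on different objects, so they can be read from different targets in parallel.
for file_offset in (0, 1 * MiB, 2 * MiB, 4 * MiB + 5):
    print(file_offset, locate(file_offset, MiB, 4))
```

The whole mapping is two integers plus the list of object identifiers; the
client never needs to know where the blocks of each object live on disk.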

Lustre is GPL, don't know about Panasas.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Re: ReiserFS problems

2003-08-14 Thread Andreas Dilger
On Aug 06, 2003  19:18 +0200, Rogier Wolff wrote:
   later. So we hit control-C on the fsck.
  
  That was big mistake.
 
 It was only a couple of percent done. All we have to do now is run it
 again, and let it continue.

From a user-safety point of view, you should use isatty() to see if the program
is running interactively, and then trap CTRL-C and have it print a warning in
the signal handler that pressing CTRL-C again in the next second will kill it.
All you need then is to call time() and save it in a static, and exit only if
the signal handler is called more than once in the same second.
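
A minimal sketch of that idea in Python.  The names, the warning text and the
injectable clock are choices made for this example, and it uses a one-second
window rather than comparing whole-second time() values.

```python
import signal
import sys
import time


def make_sigint_handler(clock=time.time, window=1.0):
    """Return a SIGINT handler that exits only on a quick second CTRL-C."""
    last = [None]  # time of the previous CTRL-C, kept between calls

    def handler(signum, frame):
        now = clock()
        if last[0] is not None and now - last[0] <= window:
            raise SystemExit("interrupted")
        last[0] = now
        print("Interrupting a running check can be harmful; "
              "press CTRL-C again within a second to abort.", file=sys.stderr)

    return handler


def install_if_interactive():
    # Only trap CTRL-C when a person is at the terminal.
    if sys.stdin.isatty():
        signal.signal(signal.SIGINT, make_sigint_handler())
```

A single accidental CTRL-C then only prints the warning, while two presses in
quick succession still stop the program.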

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Re: ReiserFS problems

2003-08-06 Thread Andreas Dilger
On Aug 06, 2003  18:20 +0200, Rogier Wolff wrote:
 Question: If it is reading all datablocks, I'm guessing that it is
 looking for the magics that build up the filesystem. We're a
 datarecovery company. We probably don't have any current
 datarecoveries of people with Reiserfs on their disk. But if we had a
 disk-image with a valid (or not) Reiserfs on it, would it link that
 into our filesystem?

Correct.  I think it is mentioned somewhere in the reiserfsck docs
not to try this with an image of a reiserfs disk in the filesystem.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Re: reiserfs on removable media

2003-07-02 Thread Andreas Dilger
On Jul 02, 2003  14:23 -0400, Zygo Blaxell wrote:
 Two reiserfs improvements come to mind:
 
 - There is a tendency for files that were being grown at crash time to
 contain invalid data.  It seems that the inodes are being updated before
 the data blocks they refer to are written.  It would be nice if the inode
 writes were deferred (or at least made invisible) until after the data
 blocks were written.  I'd rather lose my data than possibly have random
 garbage masquerading as my data.

This is called ordered data mode, and exists on ext3 and also reiserfs with
Chris Mason's patches.  Under normal usage it shouldn't change performance
compared to writeback data mode (which is what reiserfs does by default).

 - If the device is detached while a filesystem is mounted, reiserfs gets a
 whole lot of I/O errors (or worse) and immediately oopses.  It would be
 nice if reiserfs would handle this a bit more gracefully--it should at
 least let me kill processes with open files and umount the filesystem.
 OTOH many other things also oops with the current USB/firewire/scsi device
 driver stack too.  :-P

Well, if something oopses you are pretty much stuck w.r.t. killing the
process and unmounting the fs.  So fix the oopses and the rest should
come around as a result.  Of course, the reiserfs folks can do a lot
more with a specific oops report than just "it immediately oopses".  ;-)

Not much you can do about the IO errors (i.e. working as designed).
That's going to happen if you remove your device while writing to it.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Re: Write-once file system

2003-06-27 Thread Andreas Dilger
On Jun 27, 2003  09:07 -0700, Fong Vang wrote:
 Once the write to the file is CLOSED the file should not be modifiable in
 any way.  It should not be writeable by root.  Ideally, this should be
 across reboot and across kernel.  The current requirement is that as long as
 the modified kernel/reiserfs is being used then it should NOT be modifiable
 (if a kernel allowing modification is used then it could allow
 modifications).

Sounds like immutable (chattr +i) support is what you want.  It looks
like reiserfs already supports this.  Even root can not overwrite or delete
an immutable file, but could disable the immutable flag first (chattr -i)
before doing so.  Regular users can never disable the immutable flag once
set without the CAP_LINUX_IMMUTABLE capability.  However, it looks like
the reiserfs code has a bug there - any user can clear the immutable flag
(see ext[23]_ioctl() for proper permission check).

In BSD (AFAIK), removing the immutable flag requires that you be booted
into runlevel 1 (single user) but in Linux it can currently be done at any
time, although I imagine it would be pretty easy to fix that.

You should be able to set the immutable flag on a directory and have it
inherited by all files created in that directory.

 Fong Vang wrote:
 We rely heavily on reiserfs for some of our critical file systems.  I'm
 wondering what work would be involved and how difficult it would be to add
 an option (perhaps at mount time) to reiserfs that will allow a file to be
 written only once, i.e. once a file is created it should not be allowed to
 be modified or deleted (including the inode).  We may consider paying for
 this modification.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Re: Write-once file system

2003-06-27 Thread 'Andreas Dilger'
On Jun 27, 2003  10:07 -0700, Fong Vang wrote:
 this doesn't seem to work on kernel 2.4.20.  I did a chattr +i on file but
 rm -rf (as root) on the file deletes it.

That is a reiserfs bug then...  I just tested it with ext3 and it worked as
expected.

[root]# chattr +i /tmp/ttt
[root]# echo foo > /tmp/ttt
bash: /tmp/ttt: Permission denied
[root]# cp /etc/hosts /tmp/ttt
cp: cannot create regular file `/tmp/ttt': Permission denied
[root]# mv /tmp/cvsErIatf /tmp/ttt
mv: cannot move `/tmp/cvsErIatf' to `/tmp/ttt': Operation not permitted
[root]# rm -f /tmp/ttt
rm: cannot unlink `/tmp/ttt': Operation not permitted
[root]# mv /tmp/ttt /tmp/foo
mv: cannot unlink `/tmp/ttt': Operation not permitted
mv: cannot remove `/tmp/ttt': Operation not permitted


Note however, that I now see that an immutable directory can not have new
files created in it, so there is no easy way for new files to inherit the
immutable flag.  That could probably be done on a per-filesystem basis by
mounting with a new option inherit=immutable or something like that.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-17 Thread Andreas Dilger
On Feb 14, 2003  22:19 +0300, Hans Reiser wrote:
 Andreas Dilger wrote:
  You are well aware
 that the e2fsck check intervals can be tuned per-filesystem and even
 disabled if desired (it prints options for how to do this at mke2fs time
 and is clearly documented for the experienced user).  For a boot-once-a-day
 machine, the default is to check about once a month (at most 6 months for
 the time check), and if machines are crashing more often, then they should
 probably be checked more often because _something_ has to be causing crashes.
 
 The idea that how often you boot determines how often it checks is just 
 silly, sorry.

I guess the shortcoming in the ext2 case is that it counts mounts and
not crashes.  If it were counting the number of times the filesystem
was uncleanly shut down instead of normal shutdowns, would that be more
acceptable?  The reason I'm still interested in crashes, even if they
are not filesystem-related crashes, is because there had to be _something_
which caused a crash (bad code, bad hardware, whatever), and once you have
any driver corrupting memory the chance that it is also corrupting filesystem
memory exists.

 Having reiserfsck just do read-only checks shouldn't force you to type
 yes (and we mean yes because this is so scary, mere mortals shouldn't
 be doing this).  Hans, you've always talked about making things easy for
 the average user (error messages and such), don't you think that making
 a data consistency check for the user a little less intimidating too?

 I think that you should have to agree that you have time to wait for 
 fsck before you get stuck with a 1 day large server fsck.

That is definitely true.  However, my assumption would be that if someone
is running a system with terabytes of data they will read the man page
after waiting a day for fsck to complete, or lose their job.  It is entirely
possible for administrators to disable the per-mount e2fsck checking, and
the time-based (6 months by default) checking too, and do fsck themselves.
My experience would be that, like backups, people don't do that, so leaving
the 6 month check in protects users from themselves.
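
The two triggers discussed above (mount count and elapsed time) can be
sketched roughly as follows.  This is a minimal model of the decision, not
the e2fsck source; the function and parameter names are assumptions made
for illustration, and a non-positive limit is taken to mean that the
trigger is disabled.

```python
DAY = 24 * 60 * 60  # seconds


def needs_full_check(mount_count, max_mount_count, last_check, check_interval, now):
    """Return True if a full fsck is due under either trigger.

    max_mount_count <= 0 disables the per-mount trigger and
    check_interval == 0 disables the time-based trigger (assumed convention).
    """
    by_count = max_mount_count > 0 and mount_count >= max_mount_count
    by_time = check_interval > 0 and (now - last_check) >= check_interval
    return by_count or by_time


# Example: 20-mount limit and a 180-day interval, checked on day 10.
print(needs_full_check(5, 20, 0, 180 * DAY, 10 * DAY))
```

With both limits disabled the function never asks for a check, which is
the "administrator does fsck by hand" case described above.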

The other thing to keep in mind is that you can have different levels of
automated fsck at boot time, depending on how long they take.  You never
necessarily have to try and fix anything with fsck -a, just detect errors
and leave it up to the user to decide what to do if you find a problem:
- always recover journal, validate superblock, error flag (< 1s)

Don't know how long it takes these things to run, so it is up to you to
trade off checks vs. speed, and you could even round-robin them (storing
the last checked item in the superblock or something):
- check block allocation bitmaps match superblock counts
- walk directory structure from root, checking for directory corruption
- check btree validity on inodes for up to 10 seconds (or whatever, storing
  last checked inode in superblock for restarting this test at next one)

By all means, don't run checks for an hour; alternatively, let users set the
maximum boot check duration in the superblock.  I'm sure users don't mind waiting
5s at boot time if it means they don't lose data.
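
A rough sketch of the round-robin idea, assuming a list of independent
checks and a saved "where did we stop last time" index (which would live in
the superblock).  Nothing here is reiserfsck code; the names and the
injectable clock are hypothetical and only there to make the sketch
testable.

```python
import time


def run_boot_checks(checks, start_index, budget_seconds, clock=time.monotonic):
    """Run checks round-robin until the time budget is used up.

    checks: list of (name, callable); each callable returns True if OK.
    start_index: position saved by the previous boot.
    Returns (failed_names, next_index); the caller persists next_index.
    """
    if not checks:
        return [], 0
    deadline = clock() + budget_seconds
    failed = []
    index = start_index % len(checks)
    for _ in range(len(checks)):
        if clock() >= deadline:
            break  # out of time, resume from index at the next boot
        name, check = checks[index]
        if not check():
            failed.append(name)  # detect only; repair is left to the admin
        index = (index + 1) % len(checks)
    return failed, index
```

A check that is already running is not interrupted here, so the budget is
a soft limit; a real tool would need each check to be restartable too.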

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: External journals and NVRAM devices

2002-10-31 Thread Andreas Dilger
On Nov 01, 2002  16:38 +1100, Jeremy Howard wrote:
 I'm looking at buying solid state drives / NVRAM drives for our servers
 to hold an external ReiserFS journal.
 
 We are using 2.4.20pre11, and Chris Mason's data logging patches.
 
 I'm looking for any tips on how large the journal is when using
 data=journal, and whether the external log patches are stable and work OK
 in data=journal mode. Is there a command to show the current journal
 size? Does the size vary over time? We need to ensure we buy a card with
 enough memory so this is important information for us.
 
 Is anyone currently using NVRAM for the journal? If so, how do you find
 the performance of this configuration?

When people were testing this with ext3 external journals, they just
used a RAMDISK for getting the performance measurements.  Obviously,
(I hope ;-) this is not something you can do in real life, but for
performance measurement it is OK.

Most people found that the ramdisk (and presumably the NVRAM device too)
didn't perform much, if any, better than having a separate fast disk for
the journal, because you are doing sequential I/O to the journal anyways.
If it is on a separate disk/controller from the filesystem you don't have
any seek or channel contention with the filesystem.  Of course, using a
regular disk for the journal is MUCH cheaper than an NVRAM card, so you
probably want to test this out before you go ahead and buy the NVRAM card.

NVRAM devices are great for disks you are doing a lot of random I/O
on (maybe database indexes or something), because there is zero seek
latency, but for sequential I/O (like the journal) it really isn't
anything special.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/




[reiserfs-list] Re: [ext3-users] To compare Linux journalised filesystem, part II.

2002-10-24 Thread Andreas Dilger
On Oct 24, 2002  18:45 +0200, Fabien Combernous wrote:
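 +--------+-----------------------------------------------------------------+
 | quotas | Again Y is not equal. ext3 accept quota only on data-journaled  |
 |        | filesystems, but all other journaled filesystem don't have data |
 +--------+-----------------------------------------------------------------+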

Granted that I have never used quotas, so it is possible that I
am incorrect.  However, my understanding is that yes, you do need
data-journaled quota files to ensure that your quota tables don't miss
some operations after a crash.  However, you can separately select
data journaling for files in ext3 (via chattr), even if the rest of
the filesystem is using data=ordered (the default).

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/




[reiserfs-list] [BUG][trivial] atime not set on directory reads

2002-06-13 Thread Andreas Dilger

Hello reiserfs folks,
I'm just in the process of running some POSIX conformance tests on
Linux (test available at http://www.opengroup.org/testing/lsb-fhs/)
and reiserfs fails one test where ext2/ext3 are passing.

The test checks whether atime is updated on a directory if you do a
directory lookup.  I imagine that it is as simple as adding a call
UPDATE_ATIME(inode) to the reiserfs directory lookup.  This can, of
course, be turned off at runtime via the noatime mount option (I
think newer kernels also have a newer nodiratime).

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/




Re: [reiserfs-list] 2GB limit won't go away

2002-06-10 Thread Andreas Dilger

On Jun 10, 2002  08:09 -0600, Chris Worley wrote:
 # rm test.bigfile ;dd if=/dev/zero of=./test.bigfile bs=1k seek=1023M
 dd: advancing past 1098437885952 bytes in output file `./test.bigfile':
 File too large

OK, my bad.  This should have been:
dd if=/dev/zero of=./test.bigfile bs=1k seek=2047k

Even so, this probably won't fix your problem.  I would suggest creating
a test 3.6 format filesystem (loopback or such) and see if that works.
With the seek= argument you don't need a very big filesystem.
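
For reference, the arithmetic behind the two seek values, as a small
sketch.  It assumes dd's usual suffixes (k = 1024, M = 1024 * 1024); the
helper name is made up for illustration.

```python
K = 1024
M = 1024 * K


def dd_seek_offset(block_size, seek_blocks):
    """Byte offset at which dd starts writing for a given bs= and seek=."""
    return block_size * seek_blocks


bad = dd_seek_offset(1 * K, 1023 * M)   # bs=1k seek=1023M
good = dd_seek_offset(1 * K, 2047 * K)  # bs=1k seek=2047k
two_gib = 2 ** 31

print(bad)             # the offset dd complained about
print(two_gib - good)  # distance below the 2GB boundary
```

The first value matches the 1098437885952 bytes in the error message
(about 1TB), while the corrected command starts 1MB short of 2GB, so the
write crosses the boundary after the first 1024 blocks.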

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/




Re: [reiserfs-list] Fsck over Telnet is slow.

2002-05-31 Thread Andreas Dilger

On May 31, 2002  10:21 +0200, Anders Widman wrote:
 Is it just me, but I think reiserfsck is slow over telnet, or even
 slower over ssh. It seems as it would be better to not print the
 progress as fast, if the connection to the terminal is slow.

e2fsck used to have the same problem when you enabled the progress
meter.  What it does now is only update the progress on the screen
every 10th of a second (or whatever) by checking how long ago the last 
update was.
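
A minimal sketch of that kind of rate limiting, assuming a 0.1s minimum
interval.  This is not the e2fsck code; the class name, parameters and the
injectable clock are hypothetical.

```python
import sys
import time


class ProgressMeter:
    """Print progress at most once per min_interval seconds."""

    def __init__(self, min_interval=0.1, clock=time.monotonic, out=sys.stderr):
        self._min_interval = min_interval
        self._clock = clock
        self._out = out
        self._last = None  # time of the last screen update

    def update(self, done, total, force=False):
        """Return True if the screen was updated (total must be > 0)."""
        now = self._clock()
        recent = self._last is not None and (now - self._last) < self._min_interval
        if recent and not force:
            return False  # too soon, skip the terminal write entirely
        self._last = now
        self._out.write("\r%5.1f%%" % (100.0 * done / total))
        self._out.flush()
        return True
```

The caller can still invoke update() for every block it processes; only
the terminal writes are throttled, and force=True lets the final 100% line
always appear.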

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/




Re: [reiserfs-list] Interesting reading that I agree with;-)

2002-05-22 Thread Andreas Dilger

On May 22, 2002  18:33 +0400, Hans Reiser wrote:
 It is a pity we don't have more folks like Rob working on Linux.

I haven't read the whole paper, but at first glance it would appear
to be trivial to do this under Linux, because the dentries maintain such
an absolute path to a file.  The only thing that appears to be needed
for this is the new syscall and the applications to actually use it.
The syscall just needs to walk the dentry tree upwards to generate the
full path, as is already done in the __d_path() function.
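
As a toy model of that upward walk (not the kernel code: the Dentry class
here is a stand-in, and mount point crossing, which the real __d_path()
has to handle, is left out):

```python
class Dentry:
    """Toy dentry: a name plus a pointer to the parent dentry."""

    def __init__(self, name, parent=None):
        self.name = name
        # Modelled on the kernel convention that the root is its own parent.
        self.parent = parent if parent is not None else self


def full_path(dentry):
    """Walk parent pointers up to the root and join the names."""
    parts = []
    while dentry.parent is not dentry:
        parts.append(dentry.name)
        dentry = dentry.parent
    return "/" + "/".join(reversed(parts))


root = Dentry("/")
etc = Dentry("etc", root)
print(full_path(Dentry("passwd", etc)))
```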

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/




Re: [reiserfs-list] postmark performance numbers for ReiserFS

2002-03-03 Thread Andreas Dilger

On Mar 04, 2002  00:32 -0500, Chris Mason wrote:
 Ok, I'm not going to be able to replicate the entire test, but I can
 at least demonstrate the high number of subdirectories is slowing down
 the creation time.  I'm guessing it is either caused by the
 subdirectory inodes not being in cache often enough, or increased
 log traffic.
 
 Try setting the number of subdirectories to 10.  If this fixes the
 reiserfs performance problem, we can look into solutions.

Hmm, interesting.  When Andrew was doing MTA performance testing on
ext3, he found that _increasing_ the number of directories improved
performance.  I don't have the thread handy, but I _think_ it had to
do with VFS locking on the directories - more directories means that
more operations can be done in parallel since Al made this part of
the VFS thread-safe.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: [reiserfs-list] lsattr/chattr and ReiserFS

2002-02-23 Thread Andreas Dilger

On Feb 23, 2002  15:01 +0100, Marek 'Marecki' Szuba wrote:
 On Fri, 22 Feb 2002, Andreas Dilger wrote:
  There is an attribute patch for reiserfs which is compatible with the
  ext2 attributes interface, and e2fsprogs 1.26+ has support for the
  no tail packing attribute

 Noted, thanks again. BTW. The one I was interested in was no atime
 update that I'd want on selected files rather than on the whole
 filesystem.

I haven't looked at the reiserfs patch in a while, but it _may_ support
the no atime flag.  It should be clear if you look at the patch.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: [reiserfs-list] Reiserfs ext2 disk repair

2002-02-23 Thread Andreas Dilger

On Feb 23, 2002  19:29 -0500, Harry Wert wrote:
 I recently, with the aid of several reiserfs experts from this group, added a 
 journal file system to my SuSE 7.3 ext2 file system which is now working very 
 well.  After research and curiosity on my part I now understand my changes 
 have not created a true reiserfs file system but retains the structure of the 
 original ext2,  adding a journaling file which facilitates a rapid recovery 
 from a damaged file system, plus probably some improved performance. 

So, it sounds like you actually have an ext3 filesystem (ext2 + journal) and
not a reiserfs filesystem at all.  While ext3 and reiserfs have some general
concepts in common (journaling, which allows quick recovery after a crash),
in fact the kernel code and on-disk data format have nothing in common between
ext3 and reiserfs.

 If my assumptions and conclusions are essentially correct, will my file 
 system be able to maintain its' integrity?

Yes, ext3 will maintain both metadata and data integrity after a crash.
In general (both ext3 and reiserfs have this property) you may lose a
small amount of changes which happened immediately before a system crash.
This is because any operations which were not 100% completed at the time
the system crashed have to be discarded in order to maintain filesystem
integrity.  Reiserfs has the added problem that you may have garbage in
the contents of your file if it was in the process of being modified at
the time of a crash.

 will I be able to use the original fsck structure to repair it

Yes, the ext3 format is 100% supported by e2fsck, so it can detect and
correct problems in the filesystem, just like with ext2 filesystems.
Since the journal removes the requirement to run e2fsck after each crash,
it does not do a full check very often (6 months or 20-40 crashes) unless
the kernel detects an error in the filesystem.  You can change these
intervals with the tune2fs program.  Even though the ext3 journal will
protect you from corruption due to interrupted filesystem operations,
it cannot protect you from disk, cable, RAM, CPU, or software errors, so
it is still a good idea to do a full filesystem check periodically.

Cheers, Andreas

PS - Since you do not actually have a reiserfs filesystem, I would suggest
 that you send any future questions to [EMAIL PROTECTED] instead
 of to the reiserfs list.
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: [reiserfs-list] Deleting and resetting a reiserfs journal file

2002-02-19 Thread Andreas Dilger

On Feb 19, 2002  20:22 -0500, Harry Wert wrote:
 I have SuSE 7.3 Professional installed on a HP Pavillion 750n with a 1.6Ghz 
 The system is operating and stable but due to the large hard drive, it takes 
 an agonizing 20 minutes to do a fsck every 26 mounts. SuSE recommended I 
 change to the reiser file system since they could not understand why the 
 tune2fs -c 1000  /dev/hda7 command refused to permanently change the max 
 mount from 26 to 1000.

That is because SuSE 7.3 shipped with a broken kernel (2.4.10) which has
the kernel accessing the disk via the block cache and user-space accessing
the disk via the page cache.  This leads to problems such as you describe,
where changes written from user-space are not seen by the kernel if the
filesystem is mounted or is not synced to disk before shutdown.  This
can include tune2fs and e2fsck.

 After repeated attempts to solve this problem I decided to delete the .journal 
 file and reissue the tune2fs -j  /dev/hda7  command but now receive a cannot 
 either delete or rewrite the .journal file due to a permissions error.

The file is set immutable to prevent accidental deletion of the journal,
if you create the journal on a mounted filesystem.  Again, a symptom of
the 2.4.10 kernel problems.

Cheers, Andreas

PS - you would probably be better off asking ext3 questions on
 [EMAIL PROTECTED] instead of reiserfs-list...
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: [reiserfs-list] resize_reiserfs problem

2002-01-07 Thread Andreas Dilger

On Jan 07, 2002  17:57 -0500, Ciro Vargas Clemow wrote:
 I'm trying to change size of my root reiserfs v3.5 partition, but i can't do 
 it.
 
 this is my partiton table.
 
                            cfdisk 2.11b
 
                        Disk Drive: /dev/hdc
                       Size: 3228696576 bytes
        Heads: 128   Sectors per Track: 63   Cylinders: 782
 
  Name   Flags   Part Type   FS Type       [Label]   Size (MB)
  ------------------------------------------------------------
  hdc1           Primary     Linux ext2                  20.65
  hdc2           Primary     Linux swap                 136.25
  hdc3   Boot    Primary     Linux                     2733.25
                 Pri/Log     Free Space                 338.56
 
 
 4) I run resize_reiserfs /dev/hdc3 -s +250M /dev/hdc3
 
 Resize_reiserfs responds that there is not enough space in this partition.
 Are 338.56 Mb free space.

Well, my Spanish isn't que bueno, but I think what you need to do is to
increase the size of /dev/hdc3 to include the libre space at the end
of the disk before you resize the filesystem.

The resize_reiserfs tool is only changing the _filesystem_ and not the
_partition_!

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: [reiserfs-list] resize_reiserfs problem

2002-01-07 Thread Andreas Dilger

On Jan 07, 2002  18:26 -0500, Ciro Vargas Clemow wrote:
 On Mon 07 Jan 2002 06:08PM, you wrote:
  Well, my Spanish isn't que bueno, but I think what you need to do is to
  increase the size of /dev/hdc3 to include the libre space at the end
  of the disk before you resize the filesystem.
 
  The resize_reiserfs tool is only changing the _filesystem_ and not the
  _partition_!
 
 I'm try this with the cfdisk program ?
 or  exist other program to make this easy?

Well, you can do it with cfdisk, it should be relatively easy to do in
your case.  GNU parted is probably the best tool for doing complex
partition resizing tasks, but it does not appear to support reiserfs.

 P.S. sorry for my english

Well, my espanol is worse, so don't worry about it.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: [reiserfs-list] What went into 2.4.18-pre1, what do we need? Again and again...;-)

2001-12-27 Thread Andreas Dilger

Dieter Nützel wrote:
Shouldn't you try to force Marcelo to integrate all pending stuff 
(O-inode-attrs.patch, quota)?

Well, my understanding is that the inode-attrs patch is still at the
experimental stage and has not actually been submitted to the kernel
maintainer for inclusion.  There are still unresolved issues like the
fact that an unpatched kernel puts garbage into the attributes area,
but a patched kernel respects the garbage attributes.

IMHO (very 'H' as I'm not about to work on this), it should be done in
several stages:

1) ASAP get a patch into mainline 2.2/2.4/2.5 which ensures that the
   attributes area is zero'd for new files.  This should be low risk
   and easy to do, and reduces the growth of the problem.
2) Ensure that reiserfsck will clear out garbage attributes*.
3) Ensure that mkreiserfs will not write garbage attributes*.
4) Ensure that there is a FAQ entry on how to recover from garbage
   attributes for all of the people that run a new kernel without first
   having run a new reiserfsck against their partition.
5) Add the attribute handling code into the mainline kernel after a
   suitable interval.

(*) My understanding is that (2) and (3) already exist, and it sets a
flag in the superblock which indicates that the attributes are clean. 
This is essentially like an ext2 feature compatibility flag, which I
have urged Hans to start using as well (especially v4).

If this was widely used for reiserfs, then you could do things like
ensure that an old reiserfsck that tries to run against a filesystem
with the attribute flag set would tell the user they need to upgrade
reiserfsck, and the kernel code would know which features it could
handle and which ones it could not.
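
A sketch of how such a feature flag check tends to work, modelled on the
ext2 split between incompatible and read-only-compatible feature sets.  The
bit value and the names are hypothetical; reiserfs did not have this
interface at the time of writing.

```python
ATTRS_CLEAN = 0x1  # hypothetical feature bit, for illustration only


def mount_allowed(incompat, ro_compat, known_incompat, known_ro_compat, read_only):
    """Decide whether code that knows a given feature set may mount.

    Unknown incompat bits always refuse the mount.  Unknown ro_compat bits
    refuse only a read-write mount.  Plain compat bits never block anything
    and so do not appear here.
    """
    if incompat & ~known_incompat:
        return False
    if (ro_compat & ~known_ro_compat) and not read_only:
        return False
    return True


# An old tool that knows no features refuses a filesystem that sets one.
print(mount_allowed(ATTRS_CLEAN, 0, 0, 0, read_only=True))
```

The same test in fsck is what would let an old reiserfsck tell the user to
upgrade instead of "fixing" a field it does not understand.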

It is also good practise that _any_ unused field in an on-disk struct
is set to zero at mkfs time and in all the kernel code, to avoid such
problems in the future.  It is always good to know when trying to add
a feature that this field will be zero, so I can use it like 

Has an audit been done of other unused fields in the reiserfs code, to
ensure that they are being zero'd properly?  If not, it should go in as
part of (1) above.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: [reiserfs-list] 2.4.15 (final) + RFS A-N (2.4.15-pre8) running fine, here.

2001-11-26 Thread Andreas Dilger

On Nov 26, 2001  20:25 +0100, Dieter Nützel wrote:
 SunWave1 /usr# chattr -R -SAadiscu .badattr/

Actually the better thing is chattr -R = .badattr

 SunWave1 /lib/modules# lsattr 2.4.15-pre7-preempt/
 lsattr: No such file or directory While reading flags on 2.4.15-pre7-preempt//build

The problem is that lsattr calls ioctl to get the attributes, and if
this is not a regular file or directory, then the ioctl will not necessarily
work.  On ext2, e2fsck will clear attributes on sockets, devices, etc, for
you, and I imagine that reiserfsck will eventually do the same.  As with
any new feature, there is some time between when it is added to the fs,
and when support for it appears in fsck.

 Erm, while disabling set and get as a nice idea, you only need to disable 
 get. Disabling set will mean you 'cannot' reset the file that is causing 
 the problem, because chattr won't actually be able to do what you want.

That is not necessarily true - what if the attribute is compressed and
you are randomly allowed to clear this feature?  Then when you read back
the file you will get compressed garbage that is not your file.  There is
a reason why not all attributes (under ext2 at least) are set/clearable
from the user tools.

 eg: Does it read in the value, change the bit and write it out again, or 
 does it just change that bit? If it reads and get is disabled, well then 
 you would want to make sure you set the 'full value' that needs to be 
 there. Not a bug so much as something that just needs to be documented.

Depends on whether you use +/- or = when using chattr.  See above why
chattr -R = .badattr _should_ remove all of the features on your fs.
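
The difference between the three operators, as a small model.  This is a
sketch of the semantics only, not the chattr source: "+" and "-" need the
current flags (read, modify, write), while "=" replaces them outright and
so does not depend on whatever garbage is there now.

```python
def apply_mode(current, op, mask):
    """Return the new flag word for a chattr-style +, - or = operation."""
    if op == "+":
        return current | mask   # needs the current value
    if op == "-":
        return current & ~mask  # needs the current value
    if op == "=":
        return mask             # ignores the current value
    raise ValueError("unknown operator: %r" % op)


garbage = 0x4300                     # arbitrary leftover bits
print(apply_mode(garbage, "=", 0))   # "chattr =" with no flags listed
```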

In the future (once reiserfsck and the kernel code are smarter about
clearing garbage attributes), reiserfs may want to restrict set/clear of
attributes to supported values, so that you don't accidentally clear
something like compressed and screw yourself.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: [reiserfs-list] Re: [REISERFS TESTING] new patches on ftp.namesys.com: 2.4.15-pre7

2001-11-21 Thread Andreas Dilger

On Nov 21, 2001  21:19 +0100, Dieter Nützel wrote:
 Some files are _NOT_ deleteable even as root, argh?

The normal ext2 solution in this case is move it all to a separate dir,
cp -a from the old dir, and then wait for e2fsck to clean up.  Since
reiserfsck won't do this yet, just consider it a few kB of lost space
in /dev/bad or whatever.  This _should_ be OK, because cp doesn't
know anything about attributes.

 SunWave1 /home/nuetzel# lsattr /
 BD--j //bin
 BD--j //dev
 -DX-j //dvd
 BD--j //etc
 BD--j //lib
 BD--j //mnt
 BDXE- //opt
 BDXE- //tmp
 BDXE- //var
 BDXE- //usr
 BD--j //boot

No, this is garbage.  Most of these are ext2-specific flags, and sadly
some of them are probably not even settable by chattr, I don't know,
but you could try.  I think only 'j' is supported by chattr (full data
journaling, which isn't implemented in reiserfs yet).

chattr -R -j /

The others are unused, undocumented attributes, such as:
B=compressed block
D=compressed dirty file
X=compression raw access
E=compression error
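
To see which letters a given garbage value would produce, here is a small
decoder.  The bit values and letters are as I recall them from the ext2
headers and lsattr of that era, so treat the table as an assumption and
check it against your own ext2_fs.h; the function names are made up.

```python
# (bit, lsattr letter, meaning); values assumed from ext2_fs.h
FLAGS = [
    (0x0001, "s", "secure deletion"),
    (0x0002, "u", "undelete"),
    (0x0004, "c", "compress file"),
    (0x0008, "S", "synchronous updates"),
    (0x0010, "i", "immutable"),
    (0x0020, "a", "append only"),
    (0x0040, "d", "no dump"),
    (0x0080, "A", "no atime updates"),
    (0x0100, "D", "compressed dirty file"),
    (0x0200, "B", "compressed block"),
    (0x0400, "X", "compression raw access"),
    (0x0800, "E", "compression error"),
    (0x4000, "j", "data journaling"),
]
KNOWN_MASK = sum(bit for bit, _, _ in FLAGS)


def decode_flags(value):
    """Return the lsattr letters for the known bits set in value."""
    return [letter for bit, letter, _ in FLAGS if value & bit]


def unknown_bits(value):
    """Return the bits of value that have no letter at all."""
    return value & ~KNOWN_MASK


print(decode_flags(0x4300))
```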

You can ignore these for now, they won't really hurt you, but maybe before
pushing the reiserfs attributes patch, there should be a feature flag
put in the superblock which means attributes are valid, and reiserfsck
will set this flag (if unset) after zeroing all of these fields.  If it
is an issue of the kernel not zeroing these fields for new inodes which
is fixed by the attribute patch, then an attribute-aware reiserfs should
refuse to mount with attributes enabled until reiserfsck --clear-attr
is run to clear the attributes and set the flag.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




Re: [reiserfs-list] Re: [REISERFS TESTING] new patches on ftp.namesys.com: 2.4.15-pre7

2001-11-21 Thread Andreas Dilger

On Nov 21, 2001  22:58 +0100, Dieter Nützel wrote:
 On Wednesday, 21 November 2001 22:20, Andreas Dilger wrote:
  The normal ext2 solution in this case is move it all to a separate dir,
  cp -a from the old dir, and then wait for e2fsck to clean up. 
 
 Sorry, I don't understand you, here.
 
 If I _first_ move all to a separate dir, the original dir is empty, no?
 
 You mean move it all to a separate dir, cp -a from the new to the old dir, 
 delete the new one, and then wait for e2fsck to clean up?

So, you have a lot of bad inodes in /dev, do this (untested, but easily
reversible):

mv /dev /.badattr
mkdir /dev
lsattr -d /dev

Hopefully /dev is created without any attributes.  If it is not, then you need
to find a directory which has no attribute bits set, create /dev there,
and mv it to the root directory.

cp -a /.badattr/* /dev
lsattr -R /dev

Hopefully all of the new inodes in /dev will not have attributes set.
Presumably, the reiserfs attribute code does not inherit attributes
for files which do not support them (e.g. special files), because
ioctls on these files will talk to the device/socket/etc instead of
to the filesystem.  This might need to be fixed in the reiserfs patch.

In the end, these attributes don't do anything bad for you, so they
could all just be ignored.  You can put other bad files into .badattr
until then also.

  Since reiserfsck won't do this yet, just consider it a few kB of lost space
  in /dev/bad or whatever.  This _should_ be OK, because cp doesn't
  know anything about attributes.

So, maybe you can delete some files in /.badattr, maybe not.  It is
only a few kB, and it will last until reiserfsck gets around to fixing it.

 SunWave1 /# time chattr -R -B /

Not supported.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/




[reiserfs-list] Re: 2.4.9-ac12 - problem mounting reiserfs (parse error?)

2001-09-19 Thread Andreas Dilger

On Sep 19, 2001  21:47 +0200, boris wrote:
 On Wed, Sep 19, 2001 at 12:49:54PM -0400, Fabian Arias wrote:
  - Debian Sid
  - mount 2.11h
  - gcc-2.95.4 (20010902 Debian prerelease) and 3.0.2pre010908.
 
 dito ...
  
  But in my case I don't have defaults on fstab on my reiserfs partitions:
  
  /dev/hdc1  /  ext2  defaults,errors=remount-ro  0 1
  /dev/hdc5  /home  reiserfs  rw  0 2
 
 but I can boot :
 
 /dev/scsi/host0/bus0/target5/lun0/part1   /   reiserfs
defaults,errors=remount-ro  0   0
 
 with:
 reiserfs: Unrecognized mount option errors
 
 everything is ok until I run lilo:
 
 
 Unable to handle kernel NULL pointer dereference at virtual address 
  printing eip:
 
 *pde = 
 
 Entering kdb (current=0xc58d8000, pid 738) on processor 0 Oops: Oops
 due to oops @ 0x0
 eax = 0x ebx = 0xc5a7ebd4 ecx = 0xc7bb8c9c edx = 0xc1179bf0 
 esi = 0xc1179bd4 edi = 0x esp = 0xc58d9f20 eip = 0x 
 ebp = 0xc58d9f54 xss = 0x0018 xcs = 0x0010 eflags = 0x00010246 
 xds = 0xc1170018 xes = 0x0018 origeax = 0x regs = 0xc58d9eec
 
 
 0xc5a5e000 0588 0578  0  001  stop  0xc5a5e370 bash
 0xc5a4c000 0737 0588  0  000  stop  0xc5a4c370 lilo
 0xc58d8000 0738 0737  1  000  run   0xc58d8370*lilo.real
 [0]kdb btp 738
 EBP   EIP Function(args)
 0xc012a310 0x unknown (0xc5a7ebd4, 0xc1179bd4)
kernel unknown 0x0 0x0 0x0
0xc012a310 do_generic_file_read+0x364 (0xc5a7ebd4, 0xc5a7ebf4, 
0xc58d9f88, 0xc012a6fc)
kernel .text 0xc010 0xc0129fac 0xc012a51c
 0xc58d9f98 0xc012a7d2 generic_file_read+0x7a (0xc5a7ebd4, 0x805a920, 0x200, 
0xc5a7ebf4)
kernel .text 0xc010 0xc012a758 0xc012a8dc
 0xc58d9fbc 0xc0138e28 sys_read+0x98 (0x4, 0x805a920, 0x200, 0x805c768, 0x3)
kernel .text 0xc010 0xc0138d90 0xc0138e64
0xc01071cb system_call+0x33
kernel .text 0xc010 0xc0107198 0xc01071d0

What version of LILO are you using?  Versions >= 21.6 _should_ automatically
do tail unpacking for mapped files via ioctl, but maybe it is not well tested.
Even so, it should NOT be possible to cause it to oops with bad data to the
ioctl, if this is the case.  I've CC'd reiserfs-list on this as well.

Cheers, Andreas
--
Andreas Dilger  \ If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert




Re: [reiserfs-list] Errors in a reiserfs that hagns out my machine

2001-08-31 Thread Andreas Dilger

On Aug 31, 2001  15:43 +0200, Groo, El Errante wrote:
 Apologize me about my ignorance, but with this procedure can be any data
 lost?.. I don't have any other 260GB to make a backup. If this procedure is
 safe, I will do it immediatly and send you the reports.

What???!!! You have 260GB of data and you DON'T have a backup?

Having a small bitmap error is the least of your worries then.
Just because you have RAID5 does not mean your data is safe.  Often, on
systems that have very long uptimes, you can have multiple-disk failures
if you have a long shutdown period.  Also, RAID doesn't protect against
user/software error that deletes some/all of your data.

If you DO have a backup, then the worst that can happen (not saying it WILL
happen) is that your filesystem is totally lost, so you create a new one
and restore the data.  An outage of a couple of hours to do a full restore
is nothing compared to trying to get all of your data back if you have
some sort of problem.

Cheers, Andreas
-- 
Andreas Dilger  \ If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert




Re: [reiserfs-list] Mount options

2001-08-31 Thread Andreas Dilger

On Aug 31, 2001  18:54 +0200, Rosaire AMORE wrote:
 /dev/hdb6 /ext reiserfs rw,suid,dev,exec,auto,user,async 1 2
 
 The second line contains options that i used with ext2fs. The result was
 that i was unable to execute nothing on the /ext filesystem (scripts or
 binaries).
 I rewrote the line like this :
 
 /dev/hdb6 /ext reiserfs notail 1 2

Well, most of those options are in the defaults flag, so you could use:

/dev/hdb6 /ext reiserfs defaults,user 1 2

and it should work for both filesystem types.  However, I don't think any
of them are specific to ext2, so reiserfs shouldn't have a problem if
they are specified explicitly.  I would _guess_ that all of these flags
are handled by mount/VFS and neither reiserfs nor ext2 actually do anything
with them, so it is strange that you would have such a problem.
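
One possible explanation, offered as an assumption rather than a
diagnosis: mount(8) documents that "user" implies noexec, nosuid and nodev
unless overridden by later options, so the order of the options would
matter.  The sketch below models only that left-to-right rule for three
flags; the function name is hypothetical and it is not the mount source.

```python
def resolve_mount_flags(option_string):
    """Resolve exec/suid/dev by applying fstab options left to right."""
    implied = {
        "defaults": ["rw", "suid", "dev", "exec", "auto", "nouser", "async"],
        "user": ["noexec", "nosuid", "nodev"],
    }
    flags = {"exec": True, "suid": True, "dev": True}
    for option in option_string.split(","):
        for name in implied.get(option, [option]):
            if name in flags:
                flags[name] = True
            elif name.startswith("no") and name[2:] in flags:
                flags[name[2:]] = False
            # anything else (rw, auto, async, notail, ...) is ignored here
    return flags


print(resolve_mount_flags("rw,suid,dev,exec,auto,user,async"))
```

Under that rule, putting exec after user (for example "user,exec") would
keep binaries executable, which is worth trying before suspecting the
filesystem.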

Cheers, Andreas
-- 
Andreas Dilger  \ If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert




Re: [reiserfs-list] ReiserFS at /

2001-08-29 Thread Andreas Dilger

On Aug 29, 2001  21:51 +0400, Hans Reiser wrote:
 How about sending Linus and lkm an email insisting that Xenix should get its
 attempt to mount last.

I _thought_ that a patch had gone into v7 which made it check some sanity
of the superblock before trying to mount it (e.g. blocksize/total blocks
make sense for device, root inode is a directory, has a size which is a
multiple of the block size, etc).  I haven't checked whether the patch
actually made it in, but since Linus was an advocate of it, I was pretty
sure it would.

Still, I don't think anyone would object to a patch which moved SYSV to
the end of the list in fs/Makefile, so that it is probed last.

Cheers, Andreas
-- 
Andreas Dilger  \ If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert




[reiserfs-list] [PATCH] incorrect byte swapping for statfs

2001-06-12 Thread Andreas Dilger

Hello,
while this is probably already addressed in the patch for big-endian
reiserfs support, I noticed a bug when looking at reiserfs_statfs.
You are byte-swapping the (generic) superblock s_blocksize field,
which is already in CPU byte order.  The other fields are from the
reiserfs on-disk superblock (rs) which need to be swapped, but this
one is the VFS superblock (s).
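
To illustrate why the extra swap matters, here is a small model of
le32_to_cpu() (a no-op on little-endian CPUs, a byte swap on big-endian
ones).  The helper is a stand-in written for this note, not kernel code,
and the 4096 byte blocksize is just an example value.

```python
import struct


def le32_to_cpu(value, cpu_is_big_endian):
    """Model of le32_to_cpu: swap bytes only on a big-endian CPU."""
    if not cpu_is_big_endian:
        return value
    return struct.unpack("<I", struct.pack(">I", value))[0]


blocksize = 4096  # VFS superblock field, already in CPU byte order

# Swapping a CPU-order value is harmless on little-endian...
print(le32_to_cpu(blocksize, cpu_is_big_endian=False))
# ...but reports a wrong f_bsize on big-endian.
print(le32_to_cpu(blocksize, cpu_is_big_endian=True))

# An on-disk little-endian field, as a big-endian CPU would load it raw,
# does need the swap to get back to 4096.
raw = struct.unpack(">I", struct.pack("<I", blocksize))[0]
print(le32_to_cpu(raw, cpu_is_big_endian=True))
```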

Cheers, Andreas
==
--- fs/reiserfs/super.c.orig	Tue May  8 17:08:12 2001
+++ fs/reiserfs/super.c	Tue Jun 12 11:20:30 2001
@@ -853,7 +853,7 @@
   
   /* changed to accomodate gcc folks.*/
   buf->f_type =  REISERFS_SUPER_MAGIC;
-  buf->f_bsize = le32_to_cpu (s->s_blocksize);
+  buf->f_bsize = s->s_blocksize;
   buf->f_blocks = le32_to_cpu (rs->s_block_count) - le16_to_cpu (rs->s_bmap_nr) - 1;
   buf->f_bfree = le32_to_cpu (rs->s_free_blocks);
   buf->f_bavail = buf->f_bfree;
-- 
Andreas Dilger  \ If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert