Re: Which version will be merged into mainline kernel?
On Nov 08, 2006 10:15 +0100, Francesco Biscani wrote: I think the slow performance you're experiencing is caused by the fsync() call not being well-optimized in reiser4. I've commented out the function in fs/buffer.c, and I'm having much better performance on my / partition. I don't think this can be advocated as a real solution to performance problems. It can mean data loss for applications, such as email, that expect a successful return from fsync() to mean the data is safely on disk. I may as well say I improved the performance of my backups by writing to /dev/null. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
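To make concrete what an application such as a mail server relies on, here is a minimal C sketch. The helper name write_durably() is hypothetical and not taken from any particular mailer: it reports success only after fsync() has returned 0. With fsync() stubbed out, the same code would still report success while the data might exist only in memory.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to path and return 0 only
 * after fsync() has reported success; on failure return -1 with errno set. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted; retry the write */
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* This is the guarantee a mail server depends on: once fsync() returns
     * 0, the kernel reports the data as written to stable storage.  If
     * fsync() is stubbed out, this function still "succeeds" but the data
     * may exist only in the page cache. */
    if (fsync(fd) < 0)
        goto fail;

    return close(fd);

fail:
    {
        int saved = errno;
        close(fd);
        errno = saved;
    }
    return -1;
}
```

For a newly created file, a careful application would also fsync() the containing directory, since fsync() on the file alone is not guaranteed to make the directory entry itself durable.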
Re: Problem with multiple mounts
On Nov 08, 2006 14:38 -0800, Suzuki wrote: Lennart Sorensen wrote:
ReiserFS: sda10: checking transaction log (sda10)
Oops: Kernel access of bad area, sig: 11 [#1]
Call Trace:
[C00011333090] [C01EDB70] .journal_read+0x165c/0x1b6c (unreliable)
[C00011333410] [C01EF280] .journal_init+0xdc0/0xee8
[C00011333530] [C01CDBD8] .reiserfs_fill_super+0xa90/0x1e40
[C00011333790] [C011E988] .get_sb_bdev+0x208/0x31c
[C00011333870] [C01CA00C] .get_super_block+0x38/0x60
[C00011333900] [C011E260] .vfs_kern_mount+0xec/0x198
[C000113339B0] [C011E3E0] .do_kern_mount+0x88/0xdc
[C00011333A50] [C01532CC] .do_mount+0xd50/0xe08
[C00011333D60] [C0175090] .compat_sys_mount+0x368/0x448
[C00011333E30] [C000861C] syscall_exit+0x0/0x40
My question is: Is this supported? Mounting a filesystem which is already mounted and replaying the (possibly incomplete) journal. Thanks for the response. This problem was reported by one of our test team on 2.6.19, so I wanted to confirm that what they are doing is not supported! I would suggest that even though this is not supported, it would be prudent to fix such a bug. It might be possible to hit a similar problem if there is corruption of the on-disk data in the journal, and oopsing the kernel isn't a graceful way to deal with bad data on disk. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
Re: Better fsck.reiser4 needed
On Sep 06, 2006 13:04 +0400, Vladimir V. Saveliev wrote: On Wednesday 06 September 2006 12:01, Joe Feise wrote: It seems that fsck.reiser4 -a doesn't do anything. It doesn't even detect corruption. I think it is made that way intentionally. If fsck.reiser4 -a did corruption detection, the boot process would take a long time and users would lose the main advantage of journalled filesystems: quick recovery after an unclean shutdown.
In e2fsck, the boot-time check (-a) of ext3 does only very minimal checking:
- valid superblock
- valid journal superblock
- recover the journal, and transfer any errors stored in the journal to the superblock
- reverify the superblock
- check whether the superblock recorded any metadata errors previously
At this point less than a second has normally passed and e2fsck is done. The kernel ext3 code can also do journal recovery, but this doesn't give e2fsck the chance to verify the superblock after the journal is recovered. If the journal or filesystem superblock recorded an error in the filesystem during the previous run (generally metadata corruption), or is itself corrupt, e2fsck will force a full check. Without that forced check, this corruption may cause endless panic+reboot cycles, or may lead to cascading corruption of the rest of the filesystem (e.g. if allocation bitmaps are corrupted). Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
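The decision logic in the steps above can be sketched as follows. This is a simplified model under stated assumptions: struct quick_check_state and its fields are hypothetical and do not correspond to e2fsprogs internals. The point is only that the expensive full check is triggered by detected or recorded errors, not run on every boot.

```c
#include <stdbool.h>

/* Hypothetical summary of what the cheap boot-time pass learns.  Field
 * names are illustrative only, not e2fsprogs' real data structures. */
struct quick_check_state {
    bool superblock_valid;        /* superblock passes sanity checks     */
    bool journal_sb_valid;        /* journal superblock passes checks    */
    bool journal_recorded_error;  /* journal carried an error record     */
    bool sb_error_flag;           /* superblock "errors seen" flag       */
};

enum fsck_action { FSCK_CLEAN, FSCK_FULL_CHECK };

enum fsck_action boot_time_check(struct quick_check_state s)
{
    /* Corrupt superblock or journal superblock: a full check is needed. */
    if (!s.superblock_valid || !s.journal_sb_valid)
        return FSCK_FULL_CHECK;

    /* Journal replay transfers any error stored in the journal to the
     * superblock; model that by combining the two flags. */
    bool error_recorded = s.sb_error_flag || s.journal_recorded_error;

    /* An error from the previous run forces a full check rather than risking
     * panic+reboot loops or cascading corruption.  Otherwise only the quick
     * pass runs, which normally takes less than a second. */
    return error_recorded ? FSCK_FULL_CHECK : FSCK_CLEAN;
}
```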
Re: possible recursive locking detected - while running fs operations in loops - 2.6.18-rc2-git5
On Jul 26, 2006 00:16 +0200, Jesper Juhl wrote: What I did to provoke it was to run 6 different xterms (with a bash shell) with the following loops in them in a test directory that was initially empty:
xterm1: while true; do mkdir a; done
xterm2: while true; do rmdir a; done
xterm3: while true; do touch a/foo; done
xterm4: while true; do find .; done
xterm5: while true; do sync; sleep 1; done
xterm6: while true; do rm -r a; done
See the racer test at ftp.lustre.org/pub/benchmarks/racer-lustre.tar.gz. It does the above and a bunch more things, and is a truly pathological test script that does lots of stupid user tricks, unlike normal tests, which only do operations that are expected to succeed. PS - during the racer.sh test run, rm is known to segfault after hitting an internal assertion; nobody is sure why. PPS - I don't know who wrote this program; it was originally posted to linux-fsdevel or something by someone other than the author. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
Re: create very large file system
On Jul 19, 2006 16:57 +0400, Alexander Zarochentsev wrote: On Wednesday 19 July 2006 16:10, Mark F wrote: I've tried to create a large 5TB file system using both reiserfs and ext3 and both have failed. You might need to convert the partition table to GPT format to support 2TB+ partitions. It can be done with the GNU parted tool. Or, for that matter, don't use a partition table at all, since this adds an unhelpful offset to all the filesystem structures and can hurt performance on RAID where the filesystem is trying to align IO to RAID stripe boundaries. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
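The 2TB figure follows from the MBR format storing each partition's start and length as 32-bit sector counts. With 512-byte sectors that caps a partition at just under 2 TiB. The sketch below computes that limit and checks stripe alignment. The 63-sector start is the traditional DOS partitioning default, the 64 KiB stripe is only an assumed example, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* An MBR partition entry stores its length as a 32-bit sector count, so the
 * largest partition it can describe is (2^32 - 1) sectors. */
uint64_t mbr_max_partition_bytes(uint32_t sector_size)
{
    return (uint64_t)UINT32_MAX * sector_size;
}

/* True if a partition starting at start_sector begins on a RAID stripe
 * boundary.  If it does not, a stripe-sized write that the filesystem
 * believes is aligned actually straddles two stripes on the array. */
bool is_stripe_aligned(uint64_t start_sector, uint32_t sector_size,
                       uint32_t stripe_bytes)
{
    return (start_sector * sector_size) % stripe_bytes == 0;
}
```

With 512-byte sectors, mbr_max_partition_bytes(512) is 2199023255040 bytes, and a partition starting at sector 63 (32256 bytes) is not aligned to a 64 KiB stripe, whereas using the whole device (start sector 0) trivially is.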
Re: create very large file system
On Jul 20, 2006 13:17 +0200, Christian Iversen wrote: On Thursday 20 July 2006 08:26, Andreas Dilger wrote: On Jul 19, 2006 16:57 +0400, Alexander Zarochentsev wrote: On Wednesday 19 July 2006 16:10, Mark F wrote: I've tried to create a large 5TB file system using both reiserfs and ext3 and both have failed. you might need to convert the partition table to GPT format for supporting 2TB+ partitions. it can be done by the gnu parted tool. Or, for that matter, don't use a partition table at all, since this adds an unhelpful offset to all the filesystem structures and can hurt performance on RAID where the filesystem is trying to align IO to RAID stripe boundaries. Can linux still auto-detect raid volumes if there's no partition table? Hmm, that I'm not sure of - we mostly deal with external RAID devices. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
Re: reiserfs performance on ssd
On Apr 28, 2006 09:07 +0200, PFC wrote: While I like the idea, the iram implementation is horrible for various reasons : - no ECC I don't know why people are so keen on ECC RAM. Why not just put an extra socket on the board and run the RAM in RAIM (RAID for Memory) mode? The incremental cost of ECC vs. regular RAM is FAR more than the cost of just getting the extra stick of RAM. Also, with RAIM you could even hot-swap a failing DIMM, while with ECC you have to take an outage to get back to redundancy. - It uses SATA hence only a very little part of the RAM speed is used, and large latencies are introduced. Ah, but if you can connect the RAM to multiple machines, you could at least have the hope of hot failover for the storage to another server. That isn't something you can do with a bus-attached device. Even the chance of any recovery is far better with such a setup (i.e. plug into another system after motherboard dies), since you can at least get access from another machine. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
Re: State of the Reiser4 FS
On Mar 15, 2006 20:27 +0100, Andreas Schäfer wrote: If it was that easy... The problem for openMosix is that most devices fetch data in 4k blocks via copy_from_user(). For migrated processes, openMosix intercepts these calls and forwards them to the node which currently hosts the process. This forwarding yields a high latency penalty. Obviously there are two ways to get rid of this problem:
* modify _every_ Linux device driver to use a _a_lot_more_than_4k_at_a_time_ approach, or
* implement a second read-ahead buffer which fetches large blocks via the network in the background and answers calls to copy_from_user() directly from the local buffer
Or you can use a network filesystem like Lustre that handles this itself ;-). Sadly, though, it has to do both of these to get good performance, via {sub,per}version of the VFS/VM. Clients do delayed-write (writeback cache, with write credits from the server to account for space) to avoid small RPCs. They also do large amounts of readahead (in large chunks) to improve reads for applications and the VM that breaks up all reads into 4kB chunks. Servers also do batch block allocation and then large direct writes instead of going through the VFS/VM. There are still a number of device drivers that break up bios into chunks smaller than 1MB, and that hurts performance. Having a generic delayed/batch allocation mechanism is definitely the right way to go, and from my reading of linux-fsdevel this is underway by some folks at IBM. Since we have to support customers dating back to 2.4.21 it will be a while before we can move over to the newer APIs, once they are available. BTW: how are you guys planning to solve this 4k issue? Will you revert to small blocks or will you pretend to perform 4k transfers and assemble those in the background to, again, process large chunks at once? If yes, wouldn't this seriously increase CPU usage due to (most likely) unnecessary data duplication?
It doesn't result in data duplication, per se, since the pages are copied into kernel space only once. What it does mean is that there needs to be a duplication of infrastructure in order to reassemble and track all of these pages. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
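The second approach (pretend to do 4k transfers, then assemble them into large chunks) can be sketched as a simple batching buffer. Everything here is hypothetical, not Lustre or openMosix code: the 4 KiB request size and 1 MiB batch size come from the discussion above, and the "RPC" is just a counter. It illustrates Andreas's point that each byte is copied into the batch buffer once, and that the extra cost is the bookkeeping that tracks how full each batch is.

```c
#include <stddef.h>
#include <string.h>

#define SMALL_IO   4096u              /* size of each caller request    */
#define BATCH_SIZE (1024u * 1024u)    /* size of each (hypothetical) RPC */

/* Hypothetical write-back buffer: callers still hand over 4 KiB pieces,
 * but only full 1 MiB batches are sent on. */
struct batcher {
    unsigned char buf[BATCH_SIZE];
    size_t used;
    unsigned rpcs_sent;
};

static void batcher_flush(struct batcher *b)
{
    if (b->used == 0)
        return;
    /* A real client would send b->buf[0 .. b->used) as one large RPC here. */
    b->rpcs_sent++;
    b->used = 0;
}

/* Copy a small piece into the batch buffer; each byte is copied in once,
 * and a full buffer is flushed before more data is accepted. */
void batcher_write(struct batcher *b, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len > 0) {
        size_t room = BATCH_SIZE - b->used;
        size_t n = len < room ? len : room;
        memcpy(b->buf + b->used, p, n);
        b->used += n;
        p += n;
        len -= n;
        if (b->used == BATCH_SIZE)
            batcher_flush(b);
    }
}

/* Flush whatever partial batch remains. */
void batcher_close(struct batcher *b)
{
    batcher_flush(b);
}
```

A real client would also need a timer or memory-pressure trigger to flush partial batches; batcher_close() stands in for that here.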
Re: [PATCH 00/11] reiserfs: xattr rework
On Mar 08, 2006 14:12 -0500, Jeff Mahoney wrote: Jan Kara wrote: The internal i/o patches don't support tails, and that's a silver bullet against this working for xattrs. Most xattrs, such as ACLs, are likely to be only a few tens of bytes long and allocating an entire block is extremely wasteful. Umm, that is really nasty. Ext3 solves this by sharing a block among several inodes but that's far too much work to fix this bug... I had considered sharing files, and the code knows to drop a link to a shared file when it's changed. That's one of the features I had wanted from the beginning but never got around to implementing. Just FYI, ext3 has recently implemented support for larger inodes exactly to store small EAs with the inode instead of an external block. For ACLs there is some benefit to sharing the block, because the overhead is amortized over many inodes. However, virtually all other EA data is unique per inode and the ext3 EA block sharing only works if ALL the EAs for an inode are identical, so that isn't very useful if you have anything other than ACLs to store. The performance of in-inode EAs is vastly better than external blocks because of seeks and not wasting 4kB of disk/RAM for 10-100 bytes of EA. Not sure if this is useful (haven't been following discussion too closely), but thought I would steer you away from the shared-block idea early. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
Re: Small ZFS / Reiser4 / Ext 'benchmark'
On Feb 02, 2006 22:59 +0100, Adrian Ulrich wrote: If anyone is interested: I ran a small filesystem benchmark on my x86 PC. It includes: On Linux: * Reiser4 * ReiserFS * Ext3 On Solaris (Using 'gnusolaris'[.org] - Alpha 2) * UFS * ZFS NetApp's 'Postmark' was used to perform the tests. (Postmark simulates something like Mail/NNTP-Server load) Results: http://spam.workaround.ch/dull/postmark.txt (I used the *default* mkfs/mount options for all filesystems. If you like, I can re-run the test with non-default parameters) If you could format (or tune2fs) the ext3 filesystem with -O dir_index this would likely improve performance if the test is creating many files in the same dir. Also, is the file size limit in bytes, or kilobytes? Unfortunately, the canonical postmark URLs I can find are not useful. What is interesting, though maybe not terribly surprising, is that ZFS is doing so poorly in the second test. I'd be extremely interested in seeing the vmstat output while the tests are running, as I've heard that ZFS is CPU hungry because of the checksumming. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
Re: struct dirent *ent->d_type weirdness
On Sep 19, 2005 18:33 +0400, Vladimir V. Saveliev wrote: Kristian Köhntopp wrote: My goal is to walk a spool directory recursively and deal with all files inside directories that I encounter. For that, I check struct dirent *ent->d_type and get unexpected results. Yes, unfortunately, reiserfs does not support entry types. /* disable actual test and print ent->d_type */ /* ent->d_type==DT_DIR ent->d_name[1]==0 ent->d_name[0]0 ent->d_name[0]!='.'*/ You need to stat() the entry to find the type if d_type is DT_UNKNOWN. I would continue to use d_type as an optimization, however, as this reduces the number of syscalls needed when it is supported, and for network filesystems a stat() may be an expensive operation. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
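A minimal sketch of the advice above: trust d_type when the filesystem fills it in, and fall back to lstat() only for DT_UNKNOWN entries, which is what reiserfs returns. count_subdirs() is a hypothetical helper. It uses lstat() rather than stat() so that a symlink to a directory is not counted, matching what d_type would report for it (DT_LNK).

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Count the subdirectories of path (excluding "." and ".."), using d_type
 * when available and lstat() only when it is DT_UNKNOWN.
 * Returns -1 if the directory cannot be opened. */
int count_subdirs(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        int is_dir;
        if (ent->d_type != DT_UNKNOWN) {
            is_dir = (ent->d_type == DT_DIR);   /* no extra syscall needed */
        } else {
            /* Fallback for filesystems without entry types: one lstat()
             * per entry, which is the cost d_type normally saves. */
            char full[4096];
            struct stat st;
            snprintf(full, sizeof full, "%s/%s", path, ent->d_name);
            is_dir = lstat(full, &st) == 0 && S_ISDIR(st.st_mode);
        }
        count += is_dir;
    }
    closedir(dir);
    return count;
}
```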
Re: Fastest way to find / -mtime +7.....
On Jul 19, 2005 16:00 -0600, Jonathan Briggs wrote: How about some kind of stat-data readahead logic? If the first two or three directory entries are stat'd, queue up the rest (or next hundred/thousand) of them. If the disk queue is given the whole pile of stat requests at once instead of one at a time, it should be able to sort them into a reasonable order. This might even be a VFS thing to do instead of per-FS. This is something I would be very interested in. Having a pipeline of stats generated when an app does readdir + in-order stat would help reduce latency a great deal for network filesystems. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
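Until the kernel does something like this, a userspace approximation sometimes used is to read all the names first and then stat them sorted by inode number (d_ino). On filesystems like ext2/ext3, where inode numbers map to fixed inode-table locations, the stats then walk the disk in roughly ascending order. Note that this is a different technique from the kernel-side stat readahead proposed above: it only reorders the requests and does not issue them in parallel. stat_dir_sorted() is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

struct entry {
    ino_t ino;          /* inode number reported by readdir() */
    char name[256];
};

static int by_inode(const void *a, const void *b)
{
    ino_t x = ((const struct entry *)a)->ino;
    ino_t y = ((const struct entry *)b)->ino;
    return (x > y) - (x < y);
}

/* Read every name first, then lstat() them in inode-number order.
 * Returns the number of entries stat'ed successfully, or -1 on error. */
int stat_dir_sorted(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    struct entry *v = NULL;
    size_t n = 0, cap = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (n == cap) {
            cap = cap ? 2 * cap : 64;
            struct entry *nv = realloc(v, cap * sizeof *v);
            if (nv == NULL) {
                free(v);
                closedir(dir);
                return -1;
            }
            v = nv;
        }
        v[n].ino = ent->d_ino;
        snprintf(v[n].name, sizeof v[n].name, "%s", ent->d_name);
        n++;
    }
    closedir(dir);

    if (n > 1)
        qsort(v, n, sizeof *v, by_inode);

    int ok = 0;
    for (size_t i = 0; i < n; i++) {
        char full[4096];
        struct stat st;
        snprintf(full, sizeof full, "%s/%s", path, v[i].name);
        if (lstat(full, &st) == 0)
            ok++;
    }
    free(v);
    return ok;
}
```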
Re: Creating large numbers of files
On Jan 10, 2005 17:34 -0500, Dan Labute wrote: I'm a developer trying to create large numbers of *empty* files (~10). What is the fastest way to perform such an operation other than a simple open()? Will using multiple threads to perform concurrent operations help significantly, or am I just as well off using a single thread? If you call 'mknod("/path/to/file", S_IFREG | 0666, 0)' from a program, it avoids the overhead of doing both an open and a close. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/
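A minimal sketch of that suggestion, creating n empty files in a directory. create_empty_files() and the f0, f1, ... naming are hypothetical, and the threading question is not addressed here. On Linux, mknod() with S_IFREG needs no special privilege (only device nodes do), and the requested mode is still filtered by the umask.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/stat.h>

/* Create n empty regular files dir/f0 .. dir/f<n-1> with mknod(), which
 * creates the inode and directory entry without opening a file
 * descriptor, so there is no matching close().  Returns the number created. */
int create_empty_files(const char *dir, int n)
{
    int created = 0;
    for (int i = 0; i < n; i++) {
        char path[4096];
        snprintf(path, sizeof path, "%s/f%d", dir, i);
        if (mknod(path, S_IFREG | 0666, 0) == 0)
            created++;
    }
    return created;
}
```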
Re: Bug in scandir d_type
On Dec 07, 2004 12:54 -0500, Jeff Mahoney wrote: [EMAIL PROTECTED] wrote: | Hello. | | I am working with a program that uses scandir, using d_type to check if a | file has certain properties. It seems like that when used on reiserfs 3.6 | (I haven't tried any other version), the d_type field is always 0 (zero). | When the program is moved onto an ext2 partition, it works. The example | program in man scandir works also the same way (just replace d_name with | d_type when printing out). | | Kernel: 2.6.8.1 with reiserfs compiled in. The d_type feature appears to be optional. ext[23] only supports it because the feature was tacked on later, it's protected by an incompatible feature bit. Most other Linux filesystems only bother returning something other than DT_UNKNOWN for . and .., which is kind of silly. In order to get the type information from the file/directory, it either needs to be stored with the directory entry (disk format change required), or readdir needs to load _every_ inode referenced by the directory which would be an immense performance hit for such a small corner case. ReiserFS has been in the mainline kernel for years now, and your message is the first complaint I've seen about this feature missing. Doesn't reiserfs store the mode in the directory entry anyways, or is that only reiser4? So the overhead of returning the filetype is virtually nil. If you truly need the type information, a more portable solution would be to stat() each filename returned. You can generate the d_type value as follows: Which obviates the whole point of d_type, which is to get the filetype efficiently without hundreds of extra syscalls (which is at least part of what Hans wants with sys_reiser4()). Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/
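Mapping stat() output to a d_type value is mechanical. The sketch below is an illustration, not the code from Jeff's message. It still costs one stat() or lstat() per entry, which is exactly the overhead d_type exists to avoid. glibc also provides an IFTODT() macro for the same conversion.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <sys/stat.h>

/* Map an st_mode from stat()/lstat() to the DT_* value readdir() would
 * have reported had the filesystem filled in d_type. */
unsigned char mode_to_dtype(mode_t mode)
{
    if (S_ISREG(mode))  return DT_REG;
    if (S_ISDIR(mode))  return DT_DIR;
    if (S_ISLNK(mode))  return DT_LNK;
    if (S_ISCHR(mode))  return DT_CHR;
    if (S_ISBLK(mode))  return DT_BLK;
    if (S_ISFIFO(mode)) return DT_FIFO;
    if (S_ISSOCK(mode)) return DT_SOCK;
    return DT_UNKNOWN;
}
```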
Re: [RFC] Pathname Semantics with //
Christian Mayrhuber wrote: What about using // as some URI entry point? One problem that using // may have (though it is personally my favourite option right now) is that realpath(3) may cause the // to be eaten, and this is used by many programs to resolve pathnames to remove symlinks, bogus /./ etc. This may need a small fix in glibc, but at least it is still central instead of teaching a million apps about different semantics. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/
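Whether realpath() eats a leading // can be checked directly. POSIX allows a pathname that begins with exactly two slashes to be interpreted in an implementation-defined way, and glibc on Linux treats it as /. This is a minimal probe, and realpath_eats_double_slash() is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if realpath() folds a leading "//" into "/", 0 if it keeps the
 * double slash, or -1 if realpath() fails. */
int realpath_eats_double_slash(void)
{
    char resolved[PATH_MAX];
    if (realpath("//", resolved) == NULL)
        return -1;
    return strcmp(resolved, "/") == 0;
}
```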
Re: reiserfs_create: no enough blocks on device
On Aug 02, 2004 14:23 -0700, sankarshana rao wrote: I am trying to create a reiserfs filesystem using the command 'mkreiserfs /dev/loop/0 100'. It always gives me the error reiserfs_create: no enough blocks on device. I tried altering the block size, but it would not help.. You can't make a reiserfs filesystem smaller than about 40MB. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/
Re: reiser4 metas/bmap problem
On Jul 28, 2004 18:26 -0500, Matt Stegman wrote: I'm finally taking the time to start testing reiser4, and I'm running into something odd. Some of the time, reiser4 doesn't report a file's blocks until I run 'sync'. This shows up in metas/bmap as many blocks are reported as 0.
# dd if=/dev/zero of=/mnt/reiser4/file bs=1M count=50
50+0 records in
50+0 records out
# chmod +x /mnt/reiser4/file
# grep '^0$' /mnt/reiser4/file/metas/bmap | wc -l
1792
# sleep 100
# grep '^0$' /mnt/reiser4/file/metas/bmap | wc -l
1792
# sync
# grep '^0$' /mnt/reiser4/file/metas/bmap | wc -l
0
I tried waiting to see if it gets written out in the background asynchronously. As you can see, it still shows up after a couple minutes, so it doesn't appear to get written until an actual 'sync' command is run. But it shows up inconsistently - many files don't show this. I believe one of the speedups of reiser4 is that it doesn't actually write data to disk for minutes at a time, vs. 5s or so for most other filesystems. The ext3_bmap() function flushes all file data to disk when called, and it would be prudent to do the same with reiser4, since bmap users tend to be important and not speed critical (e.g. lilo) and failing to do so can mean not booting later. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://members.shaw.ca/adilger/ http://members.shaw.ca/golinux/
Re: 2 Terabyte install
On May 11, 2004 19:51 -0700, Clifford Beshers wrote: Thanks for the help everyone. In the end, it turned out that sfdisk is indeed the culprit. We created a 2.1T partition with fdisk, then asked sfdisk for the size and it said: -92,xxx,xxx. Unfortunately, this means we have to either fix it or find an alternate solution to creating partitions. If the word ``parted'' is on the tip of your tongue I'll bet you haven't actually used the thing... There were a few bugs of ours that acted as red herrings, but Linspire is now up and running on this system with ReiserFS 3 and kernel 2.6.5. While I'm here, I have some other questions: * What is the time complexity of mounting a ReiserFS partition? It seems to be proportional to the size of the partition? Is it different for Reiser4? AFAIK, reiserfs will do the initial zeroing of the journal and filesystem bitmaps at the first mount time instead of at mkreiserfs time. I don't know why it was done that way. * Is there a tool to determine the type of file system on a partition without mounting it? Lots of them. file -s /dev/foo or if you have a newer e2fsprogs (1.33 and newer I think) you can use blkid [dev ...] to tell you a bunch of things about each device (LABEL, UUID, TYPE). Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
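Tools such as blkid and file -s identify a filesystem by reading on-disk magic numbers. As a minimal sketch covering only two filesystems (unlike the real tools), the C function below checks the ext2/ext3 magic 0xEF53 at byte offset 1080 and the reiserfs 3.x magic at 64 KiB + 52. Some very old reiserfs 3.5 layouts keep the superblock elsewhere, and reiser4 uses a different layout, so neither is detected here. guess_fs_type() is a hypothetical name, and reading a real device needs read permission on it.

```c
#include <stdio.h>
#include <string.h>

/* Identify ext2/ext3 or reiserfs 3.x from a device or image file by reading
 * on-disk magic numbers, without mounting anything.
 *  - ext2/3: 16-bit magic 0xEF53, little-endian, at byte offset 1080
 *    (superblock at 1024, s_magic at +56).
 *  - reiserfs 3.x: "ReIsErFs", "ReIsEr2Fs" or "ReIsEr3Fs" at 65536 + 52.
 * Returns a static string, or NULL if the file cannot be opened. */
const char *guess_fs_type(const char *dev)
{
    FILE *f = fopen(dev, "rb");
    if (f == NULL)
        return NULL;

    const char *type = "unknown";
    unsigned char m[10];

    if (fseek(f, 1080, SEEK_SET) == 0 && fread(m, 1, 2, f) == 2 &&
        m[0] == 0x53 && m[1] == 0xEF) {
        type = "ext2/ext3";
    } else if (fseek(f, 65536 + 52, SEEK_SET) == 0 &&
               fread(m, 1, sizeof m, f) == sizeof m &&
               memcmp(m, "ReIsEr", 6) == 0) {
        type = "reiserfs";
    }
    fclose(f);
    return type;
}
```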
Re: tests to see how ext3 reiserfs 3.6 and jfs survive disk errors.
On Mar 31, 2004 16:00 +0400, Vladimir Saveliev wrote: On Wed, 2004-03-31 at 15:44, Ivan Ivanov wrote: I made some tests to see how ext3, reiserfs 3.6 and jfs survive disk errors. The test is simple: format a partition, copy the kernel source, unmount and do "dd if=/dev/zero of=/dev/hdd bs=512 count=10 seek=3" to simulate disk surface damage, and then run fsck. seek=3: this must be the second half of the journal in reiserfs and ext3; for jfs I don't know. Well, not that I defend reiserfs's i/o error handling, but I do not think that your test is a fair one. You overwrote the area where reiserfs stored metadata for the data you copied into it (not sure about jfs, it probably has the same problem). Do you want to try to overwrite ext3's inode tables? Actually, with a 51MB write it is guaranteed to overwrite at least one inode table somewhere in the filesystem (one inode table per 32MB of disk). One of the reasons that ext2/ext3 can survive such actions is that the metadata is in a fixed location, so even if everything is overwritten it knows what is inode table, what is data blocks, etc. This makes ext3 less flexible (i.e. no dynamic inode allocation) but also more robust. jfs: total data loss, can't mount, fsck didn't help. reiserfs: doing "reiserfsck --rebuild-tree" moves all recovered data into lost+found, but the information is almost unusable. ext3: after "fsck.ext3 -f -y" almost everything was usable, the directory structure was untouched, some files were moved into lost+found, but in general everything was usable. My opinion: I can't use anything but ext2/3 in a system where there is no RAID (99% of desktops and most web and mail servers). If you have time, you may want to try overwriting some other parts of the filesystem, just to see if the results change. I don't think it will make a huge difference in the end, but it might. Note that 51MB is a large fraction of the size of a Linux kernel so you might end up overwriting 1/4 of all the data.
Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: Object Oriented FS
On Oct 27, 2003 21:23 +0300, Hans Reiser wrote: darren wrote: allows very high throughput by scaling Do you know what that means? (Seriously, I don't) Panasas (from what little I have seen of it) looks to be very similar to Lustre. Clients do not access disks directly, but rather have a mapping layer between offsets and actual data on disk (similar to LVM and VM abstractions of block devices/memory). Once the client has this mapping (very small in Lustre, on the order of a few hundred bytes at most), it can do IO directly to one or more storage devices, so throughput scales in proportion to the number of storage targets. In Lustre at least, the client knows nothing about the physical layout of blocks in the file on disk, but rather just accesses one or more objects via (object identifier, offset, length), so the actual layout of the on-disk data can change. Lustre is GPL; I don't know about Panasas. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
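The mapping layer described above can be illustrated with a RAID-0 style layout, in which a file is striped round-robin across stripe_count objects in stripe_size chunks. This is a simplified sketch of the idea, not Lustre's code, and struct obj_loc and map_offset() are hypothetical. The result is an (object, offset) pair, never a physical block number, which is why the storage servers remain free to change the on-disk layout.

```c
#include <stdint.h>

/* Where a byte of the file lives: which object, and where inside it. */
struct obj_loc {
    uint32_t object_index;
    uint64_t object_offset;
};

/* Map a file offset to (object, offset-in-object) for a file striped
 * round-robin across stripe_count objects in stripe_size chunks. */
struct obj_loc map_offset(uint64_t file_offset, uint64_t stripe_size,
                          uint32_t stripe_count)
{
    uint64_t stripe_no = file_offset / stripe_size;   /* global chunk no. */
    struct obj_loc loc;
    loc.object_index  = (uint32_t)(stripe_no % stripe_count);
    loc.object_offset = (stripe_no / stripe_count) * stripe_size
                        + file_offset % stripe_size;
    return loc;
}
```

For example, with 1 MiB stripes over 4 objects, file offset 5 MiB + 10 falls in chunk 5, which is the second chunk stored in object 1, so it maps to offset 1 MiB + 10 within object 1.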
Re: ReiserFS problems
On Aug 06, 2003 19:18 +0200, Rogier Wolff wrote: later. So we hit control-C on the fsck. That was a big mistake. It was only a couple of percent done. All we have to do now is run it again, and let it continue. From a user-safety point-of-view, you should use isatty() to see if the program is running interactively, and then trap CTRL-C and have it print a warning in the signal handler that pressing CTRL-C again in the next second will kill it. All you need then is to call time() and save it in a static, and only exit if the signal handler is called more than once in the same second. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
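A minimal C sketch of that suggestion, assuming a POSIX system. second_press() and install_ctrl_c_guard() are hypothetical names, and the decision logic is kept separate from the handler so it can be tested. time(), write() and _exit() are all on the POSIX async-signal-safe list, so the handler only calls safe functions.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Decision logic: returns 1 if this press comes within a second of the
 * previous one, otherwise records this press and returns 0. */
int second_press(time_t *last, time_t now)
{
    if (*last != 0 && now - *last <= 1)
        return 1;
    *last = now;
    return 0;
}

static time_t last_sigint;   /* touched only from the handler */

static void on_sigint(int sig)
{
    static const char msg[] =
        "\nInterrupting fsck now can leave the filesystem worse off.\n"
        "Press Ctrl-C again within a second to really abort.\n";
    (void)sig;

    if (second_press(&last_sigint, time(NULL)))
        _exit(130);          /* 128 + SIGINT, the usual shell convention */

    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;                 /* nothing useful to do if stderr is gone */
}

/* Install the guard only when a human is at the terminal.  Returns 1 if
 * installed, 0 if not interactive, -1 on error. */
int install_ctrl_c_guard(void)
{
    if (!isatty(STDIN_FILENO))
        return 0;            /* scripts keep the default Ctrl-C behavior */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL) == 0 ? 1 : -1;
}
```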
Re: ReiserFS problems
On Aug 06, 2003 18:20 +0200, Rogier Wolff wrote: Question: If it is reading all datablocks, I'm guessing that it is looking for the magics that build up the filesystem. We're a datarecovery company. We probably don't have any current datarecoveries of people with Reiserfs on their disk. But if we had a disk-image with a valid (or not) Reiserfs on it, would it link that into our filesystem? Correct. I think it is mentioned somewhere in the reiserfsck docs not to try this with an image of a reiserfs disk in the filesystem. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: reiserfs on removable media
On Jul 02, 2003 14:23 -0400, Zygo Blaxell wrote: Two reiserfs improvements come to mind: - There is a tendency for files that were being grown at crash time to contain invalid data. It seems that the inodes are being updated before the data blocks they refer to are written. It would be nice if the inode writes were deferred (or at least made invisible) until after the data blocks were written. I'd rather lose my data than possibly have random garbage masquerading as my data. This is called ordered data mode, and exists on ext3 and also reiserfs with Chris Mason's patches. Under normal usage it shouldn't change performance compared to writeback data mode (which is what reiserfs does by default). - If the device is detached while a filesystem is mounted, reiserfs gets a whole lot of I/O errors (or worse) and immediately oopses. It would be nice if reiserfs would handle this a bit more gracefully--it should at least let me kill processes with open files and umount the filesystem. OTOH many other things also oops with the current USB/firewire/scsi device driver stack too. :-P Well, if something oopses you are pretty much stuck w.r.t. killing the process and unmounting the fs. So fix the oopses and the rest should come around as a result. Of course, the reiserfs folks can do a lot more with a specific oops report than just "it immediately oopses". ;-) Not much you can do about the IO errors (i.e. working as designed). That's going to happen if you remove your device while writing to it. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: Write-once file system
On Jun 27, 2003 09:07 -0700, Fong Vang wrote: Once the write to the file is CLOSED the file should not be modifiable in any way. It should not be writeable by root. Ideally, this should be across reboot and across kernel. The current requirement is that as long as the modified kernel/reiserfs is being used then it should NOT be modifiable (if a kernel allowing modification is used then it could allow modifications). Sounds like immutable (chattr +i) support is what you want. It looks like reiserfs already supports this. Even root cannot overwrite or delete an immutable file, but could disable the immutable flag first (chattr -i) before doing so. Regular users can never disable the immutable flag once set without the CAP_LINUX_IMMUTABLE capability. However, it looks like the reiserfs code has a bug there - any user can clear the immutable flag (see ext[23]_ioctl() for proper permission check). In BSD (AFAIK), removing the immutable flag requires that you be booted into runlevel 1 (single user) but in Linux it can currently be done at any time, although I imagine it would be pretty easy to fix that. You should be able to set the immutable flag on a directory and have it inherited by all files created in that directory. Fong Vang wrote: We rely heavily on reiserfs for some of our critical file systems. I'm wondering what work would be involved and how difficult it would be to add an option (perhaps at mount time) to reiserfs that will allow a file to be written only once, i.e. once a file is created it should not be allowed to be modified or deleted (including the inode). We may consider paying for this modification. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
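To check whether the flag is actually set (what lsattr reports), Linux exposes the FS_IOC_GETFLAGS ioctl and the FS_IMMUTABLE_FL bit from <linux/fs.h>. The helper below is a minimal, Linux-specific sketch, and is_immutable() is a hypothetical name. Setting the flag goes through FS_IOC_SETFLAGS and, as noted above, requires CAP_LINUX_IMMUTABLE.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return 1 if path has the immutable attribute (what chattr +i sets),
 * 0 if not, or -1 if it cannot be opened or the filesystem does not
 * support the flags ioctl. */
int is_immutable(const char *path)
{
    /* O_NONBLOCK so that opening a FIFO does not hang. */
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    int flags = 0;
    int rc = ioctl(fd, FS_IOC_GETFLAGS, &flags);
    close(fd);
    if (rc < 0)
        return -1;
    return (flags & FS_IMMUTABLE_FL) != 0;
}
```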
Re: Write-once file system
On Jun 27, 2003 10:07 -0700, Fong Vang wrote: this doesn't seem to work on kernel 2.4.20. I did a chattr +i on file but rm -rf (as root) on the file deletes it. That is a reiserfs bug then... I just tested it with ext3 and it worked as expected.
[root]# chattr +i /tmp/ttt
[root]# echo foo > /tmp/ttt
bash: /tmp/ttt: Permission denied
[root]# cp /etc/hosts /tmp/ttt
cp: cannot create regular file `/tmp/ttt': Permission denied
[root]# mv /tmp/cvsErIatf /tmp/ttt
mv: cannot move `/tmp/cvsErIatf' to `/tmp/ttt': Operation not permitted
[root]# rm -f /tmp/ttt
rm: cannot unlink `/tmp/ttt': Operation not permitted
[root]# mv /tmp/ttt /tmp/foo
mv: cannot unlink `/tmp/ttt': Operation not permitted
mv: cannot remove `/tmp/ttt': Operation not permitted
Note, however, that I now see that an immutable directory cannot have new files created in it, so there is no easy way for new files to inherit the immutable flag. That could probably be done on a per-filesystem basis by mounting with a new option inherit=immutable or something like that. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: Corrupted/unreadable journal: reiser vs. ext3
On Feb 14, 2003 22:19 +0300, Hans Reiser wrote: Andreas Dilger wrote: You are well aware that the e2fsck check intervals can be tuned per-filesystem and even disabled if desired (it prints options for how to do this at mke2fs time and is clearly documented for the experienced user). For a boot-once-a-day machine, the default is to check about once a month (at most 6 months for the time check), and if machines are crashing more often, then they should probably be checked more often because _something_ has to be causing crashes. The idea that how often you boot determines how often it checks is just silly, sorry. I guess the shortcoming in the ext2 case is that it counts mounts and not crashes. If it were counting the number of times the filesystem was uncleanly shut down instead of normal shutdowns, would that be more acceptable? The reason I'm still interested in crashes, even if they are not filesystem-related crashes, is because there had to be _something_ which caused a crash (bad code, bad hardware, whatever), and once you have any driver corrupting memory the chance that it is also corrupting filesystem memory exists. Having reiserfsck just do read-only checks shouldn't force you to type yes (and we mean yes because this is so scary, mere mortals shouldn't be doing this). Hans, you've always talked about making things easy for the average user (error messages and such); don't you think that making a data consistency check a little less intimidating for the user would help too? I think that you should have to agree that you have time to wait for fsck before you get stuck with a 1 day large server fsck. That is definitely true. However, my assumption would be that if someone is running a system with terabytes of data they will read the man page after waiting a day for fsck to complete, or lose their job. It is entirely possible for administrators to disable the per-mount e2fsck checking, and the time-based (6 months by default) checking too, and do fsck themselves.
My experience would be that, like backups, people don't do that, so leaving the 6 month check in protects users from themselves. The other thing to keep in mind is that you can have different levels of automated fsck at boot time, depending on how long they take. You never necessarily have to try and fix anything with fsck -a, just detect errors and leave it up to the user to decide what to do if you find a problem:
- always recover journal, validate superblock, error flag (< 1s)
Don't know how long it takes these things to run, so it is up to you to trade off checks vs. speed, and you could even round-robin them (storing the last checked item in the superblock or something):
- check block allocation bitmaps match superblock counts
- walk directory structure from root, checking for directory corruption
- check btree validity on inodes for up to 10 seconds (or whatever, storing last checked inode in superblock for restarting this test at next one)
By all means, don't do checks for an hour, or allow users to set the maximum boot check duration in the superblock. I'm sure users don't mind waiting 5s at boot time if it means they don't lose data. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
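The round-robin, time-budgeted check can be sketched in C. This is a minimal sketch, assuming a hypothetical `struct check_state` kept in the superblock between boots and a hypothetical per-inode validator passed in as `check`; neither is an e2fsck or reiserfsck interface. It resumes where the previous boot stopped, stops when the time budget runs out, only counts problems rather than repairing them, and leaves the resume point in `next_inode`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical fields stored in the superblock between boots. */
struct check_state {
    uint32_t next_inode;   /* where the previous boot's scan stopped */
    uint32_t inode_count;  /* inodes numbered 0..inode_count-1 for simplicity */
};

/* Hypothetical per-inode validator: returns false if the inode looks bad. */
typedef bool (*inode_check_fn)(uint32_t ino, void *ctx);

static double elapsed_sec(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - start->tv_sec) +
           (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/* Check inodes for at most budget_sec seconds, wrapping at the end of the
 * inode table; returns how many looked bad (detection only, no repair). */
unsigned check_inodes_budgeted(struct check_state *st, double budget_sec,
                               inode_check_fn check, void *ctx)
{
    struct timespec start;
    unsigned bad = 0;
    uint32_t done = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (done < st->inode_count && elapsed_sec(&start) < budget_sec) {
        if (!check(st->next_inode, ctx))
            bad++;                      /* report only; the admin decides */
        st->next_inode = (st->next_inode + 1) % st->inode_count;
        done++;
    }
    return bad;
}

/* Demo validator for illustration: counts calls, treats inode 7 as bad. */
static bool demo_check(uint32_t ino, void *ctx)
{
    unsigned *calls = ctx;
    (*calls)++;
    return ino != 7;
}
```

With a generous budget a small table is scanned completely and `next_inode` wraps back to where it started; with a zero budget nothing is checked and the resume point is unchanged.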
Re: External journals and NVRAM devices
On Nov 01, 2002 16:38 +1100, Jeremy Howard wrote: I'm looking at buying solid state drives / NVRAM drives for our servers to hold an external ReiserFS journal. We are using 2.4.20pre11, and Chris Mason's data logging patches. I'm looking for any tips on how large the journal is when using data=journal, and whether the external log patches are stable and work OK in data=journal mode. Is there a command to show the current journal size? Does the size vary over time? We need to ensure we buy a card with enough memory so this is important information for us. Is anyone currently using NVRAM for the journal? If so, how do you find the performance of this configuration? When people were testing this with ext3 external journals, they just used a RAMDISK for getting the performance measurements. Obviously, (I hope ;-) this is not something you can do in real life, but for performance measurement it is OK. Most people found that the ramdisk (and presumably the NVRAM device too) didn't perform much, if any, better than having a separate fast disk for the journal, because you are doing sequential I/O to the journal anyways. If it is on a separate disk/controller from the filesystem you don't have any seek or channel contention with the filesystem. Of course, using a regular disk for the journal is MUCH cheaper than an NVRAM card, so you probably want to test this out before you go ahead and buy the NVRAM card. NVRAM devices are great for disks you are doing a lot of random I/O on (maybe database indexes or something), because there is zero seek latency, but for sequential I/O (like the journal) it really isn't anything special. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/
[reiserfs-list] Re: [ext3-users] To compare Linux journalised filesystem, part II.
On Oct 24, 2002 18:45 +0200, Fabien Combernous wrote:
| quotas | Again Y is not equal. ext3 accepts quotas only on data-journaled |
|        | filesystems, but all other journaled filesystems don't have data |
Granted, I have never used quotas, so it is possible that I am incorrect. However, my understanding is that yes, you do need data-journaled quota files to ensure that your quota tables don't miss some operations after a crash. However, you can separately select data journaling for files in ext3 (via chattr), even if the rest of the filesystem is using data=ordered (the default). Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/
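For reference, `chattr +j` amounts to a read-modify-write of the inode flags through the `FS_IOC_GETFLAGS`/`FS_IOC_SETFLAGS` ioctls and the `FS_JOURNAL_DATA_FL` bit from `<linux/fs.h>`. A minimal sketch, assuming a Linux system with those headers installed; `update_flag()` and `set_journal_data()` are illustrative names, and changing this flag may require privileges depending on the filesystem.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Return flags with `bit` set (on != 0) or cleared (on == 0). */
unsigned int update_flag(unsigned int flags, unsigned int bit, int on)
{
    return on ? (flags | bit) : (flags & ~bit);
}

/* Like "chattr +j path" (on=1) or "chattr -j path" (on=0).
 * Returns 0 on success, -1 on error with errno set. */
int set_journal_data(const char *path, int on)
{
    int fd = open(path, O_RDONLY);
    int flags;   /* the kernel copies an int here, despite the ioctl's "long" */

    if (fd < 0)
        return -1;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
        close(fd);
        return -1;
    }
    flags = (int)update_flag((unsigned int)flags, FS_JOURNAL_DATA_FL, on);
    int ret = ioctl(fd, FS_IOC_SETFLAGS, &flags);
    close(fd);
    return ret < 0 ? -1 : 0;
}
```

Only the journaling bit is touched; every other attribute read back from the inode is written out unchanged.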
[reiserfs-list] [BUG][trivial] atime not set on directory reads
Hello reiserfs folks, I'm just in the process of running some POSIX conformance tests on Linux (test available at http://www.opengroup.org/testing/lsb-fhs/) and reiserfs fails one test where ext2/ext3 are passing. The test checks whether atime is updated on a directory if you do a directory lookup. I imagine that it is as simple as adding a call to UPDATE_ATIME(inode) in the reiserfs directory lookup. This can, of course, be turned off at runtime via the noatime mount option (I think newer kernels also have a nodiratime option). Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/
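The behaviour the test looks for can be reproduced from user space by stat()ing a directory, reading it, and stat()ing it again. A minimal sketch; `dir_atime_advanced()` is a hypothetical helper, not part of the LSB test suite. A result of 0 is also expected when the filesystem is mounted noatime or nodiratime, or (on later kernels) relatime.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read a directory and report whether its atime moved forward.
 * Returns 1 if it advanced, 0 if not, -1 on error. */
int dir_atime_advanced(const char *path)
{
    struct stat before, after;
    DIR *d;

    if (stat(path, &before) < 0)
        return -1;
    sleep(1);                       /* atime may have 1-second granularity */
    if ((d = opendir(path)) == NULL)
        return -1;
    while (readdir(d) != NULL)
        ;                           /* reading the entries should touch atime */
    closedir(d);
    if (stat(path, &after) < 0)
        return -1;
    return after.st_atime > before.st_atime;
}
```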
Re: [reiserfs-list] 2GB limit won't go away
On Jun 10, 2002 08:09 -0600, Chris Worley wrote: # rm test.bigfile ;dd if=/dev/zero of=./test.bigfile bs=1k seek=1023M dd: advancing past 1098437885952 bytes in output file `./test.bigfile': File too large OK, my bad. This should have been: dd if=/dev/zero of=./test.bigfile bs=1k seek=2047k Even so, this probably won't fix your problem. I would suggest creating a test 3.6 format filesystem (loopback or such) and seeing whether that works. With the seek= argument you don't need a very big filesystem. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/
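The numbers in the error match the typo. `bs=1k seek=1023M` asks dd to skip 1023 × 2^30 = 1098437885952 bytes. `bs=1k seek=2047k` instead starts writing at 2047 MiB, just below the 2 GiB boundary, so the writes that follow cross it. The same probe can be written in C as a sketch; `write_block_at()` is a hypothetical helper. EFBIG ("File too large") can come from the on-disk format (for example the 2 GB file size limit of the 3.5 format), from a program built without large-file support, or from a file-size resource limit.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Write one zeroed 1 KiB block at byte `offset`, leaving a hole before it.
 * Returns 0 on success, 1 if the write fails with EFBIG, -1 otherwise. */
int write_block_at(const char *path, off_t offset)
{
    static const char block[1024];   /* zero-filled, like reading /dev/zero */
    ssize_t n;
    int fd, ret = 0;

    signal(SIGXFSZ, SIG_IGN);        /* process-wide: get EFBIG instead of dying */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    n = pwrite(fd, block, sizeof block, offset);
    if (n < 0)
        ret = (errno == EFBIG) ? 1 : -1;
    else if ((size_t)n != sizeof block)
        ret = -1;                    /* short write: treat as an error */
    close(fd);
    return ret;
}
```

Because the file is sparse, probing a 3 GiB offset only consumes a single block on filesystems that support holes.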
Re: [reiserfs-list] Fsck over Telnet is slow.
On May 31, 2002 10:21 +0200, Anders Widman wrote: Is it just me, but I think reiserfsck is slow over telnet, or even slower over ssh. It seems as it would be better to not print the progress as fast, if the connection to the terminal is slow. e2fsck used to have the same problem when you enabled the progress meter. What it does now is only update the progress on the screen every 10th of a second (or whatever) by checking how long ago the last update was. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/
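The throttling described above can be sketched as follows. This is a minimal sketch of the technique, not e2fsck's actual code; `progress_update()` and the 100 ms `PROGRESS_INTERVAL_NS` constant are assumptions. Every call reads a monotonic clock, but the terminal is only written to when at least a tenth of a second has passed since the previous redraw, plus once on the first call and once at completion.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define PROGRESS_INTERVAL_NS 100000000LL   /* at most ten redraws per second */

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* True if at least interval_ns has elapsed since the last redraw. */
bool progress_due(int64_t last_ns, int64_t now, int64_t interval_ns)
{
    return now - last_ns >= interval_ns;
}

/* Call once per unit of work; writes to a slow terminal only when due. */
void progress_update(unsigned long done, unsigned long total)
{
    static bool drawn = false;
    static int64_t last_ns;
    int64_t t = now_ns();

    if (drawn && done != total && !progress_due(last_ns, t, PROGRESS_INTERVAL_NS))
        return;
    drawn = true;
    last_ns = t;
    fprintf(stderr, "\r%lu/%lu (%3.0f%%)", done, total,
            total ? 100.0 * (double)done / (double)total : 100.0);
    if (done == total)
        fputc('\n', stderr);
}
```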
Re: [reiserfs-list] Interesting reading that I agree with;-)
On May 22, 2002 18:33 +0400, Hans Reiser wrote: It is a pity we don't have more folks like Rob working on Linux. I haven't read the whole paper, but at first glance it would appear to be trivial to do this under Linux, because the dentries maintain such an absolute path to a file. The only thing that appears to be needed for this is the new syscall and the applications to actually use it. The syscall just needs to walk the dentry tree upwards to generate the full path, as is already done in the __d_path() function. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/
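Walking parent pointers to build the path can be sketched in user space. This assumes a hypothetical, simplified `struct dnode` holding only a name and a parent pointer, with the root as its own parent as in the dcache. Like `__d_path()`, it fills the buffer from the end backwards, so no reversal is needed. Unlike the real function, it ignores mount points.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified dentry: the root points to itself. */
struct dnode {
    const char *name;
    struct dnode *parent;
};

/* Build the absolute path of d in buf, filling from the end.
 * Returns a pointer into buf where the path starts, or NULL if too small. */
char *dnode_path(const struct dnode *d, char *buf, size_t buflen)
{
    char *p = buf + buflen;

    if (buflen < 2)
        return NULL;
    *--p = '\0';
    while (d->parent != d) {              /* stop at the root */
        size_t len = strlen(d->name);
        if ((size_t)(p - buf) < len + 1)  /* room for "/name"? */
            return NULL;
        p -= len;
        memcpy(p, d->name, len);
        *--p = '/';
        d = d->parent;
    }
    if (p == buf + buflen - 1)            /* d was the root itself */
        *--p = '/';
    return p;
}
```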
Re: [reiserfs-list] postmark performance numbers for ReiserFS
On Mar 04, 2002 00:32 -0500, Chris Mason wrote: Ok, I'm not going to be able to replicate the entire test, but I can at least demonstrate the high number of subdirectories is slowing down the creation time. I'm guessing it is either caused by the subdirectory inodes not being in cache often enough, or increased log traffic. Try setting the number of subdirectories to 10. If this fixes the reiserfs performance problem, we can look into solutions. Hmm, interesting. When Andrew was doing MTA performance testing on ext3, he found that _increasing_ the number of directories improved performance. I don't have the thread handy, but I _think_ it had to do with VFS locking on the directories - more directories means that more operations can be done in parallel since Al made this part of the VFS thread-safe. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: [reiserfs-list] lsattr/chattr and ReiserFS
On Feb 23, 2002 15:01 +0100, Marek 'Marecki' Szuba wrote: On Fri, 22 Feb 2002, Andreas Dilger wrote: There is an attribute patch for reiserfs which is compatible with the ext2 attributes interface, and e2fsprogs 1.26+ has support for the no tail packing attribute Noted, thanks again. BTW. The one I was interested in was no atime update that I'd want on selected files rather than on the whole filesystem. I haven't looked at the reiserfs patch in a while, but it _may_ support the no atime flag. It should be clear if you look at the patch. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: [reiserfs-list] Reiserfs ext2 disk repair
On Feb 23, 2002 19:29 -0500, Harry Wert wrote: I recently, with the aid of several reiserfs experts from this group, added a journal file system to my SuSE 7.3 ext2 file system which is now working very well. After research and curiosity on my part I now understand my changes have not created a true reiserfs file system but retain the structure of the original ext2, adding a journaling file which facilitates a rapid recovery from a damaged file system, plus probably some improved performance. So, it sounds like you actually have an ext3 filesystem (ext2 + journal) and not a reiserfs filesystem at all. While ext3 and reiserfs have some general concepts in common (journaling, which allows quick recovery after a crash), in fact the kernel code and on-disk data format have nothing in common between ext3 and reiserfs. If my assumptions and conclusions are essentially correct, will my file system be able to maintain its integrity? Yes, ext3 will maintain both metadata and data integrity after a crash. In general (both ext3 and reiserfs have this property) you may lose a small amount of changes which happened immediately before a system crash. This is because any operations which were not 100% completed at the time the system crashed have to be discarded in order to maintain filesystem integrity. Reiserfs has the added problem that you may have garbage in the contents of your file if it was in the process of being modified at the time of a crash. Will I be able to use the original fsck structure to repair it? Yes, the ext3 format is 100% supported by e2fsck, so it can detect and correct problems in the filesystem, just like with ext2 filesystems. Since the journal removes the requirement to run e2fsck after each crash, it does not do a full check very often (6 months or 20-40 crashes) unless the kernel detects an error in the filesystem. You can change these intervals with the tune2fs program.
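As a rough illustration of the two limits tune2fs controls (-c for the maximum mount count, -i for the check interval), the sketch below reads them from a raw ext2/ext3 superblock and decides whether a full check is due. The offsets follow the standard ext2 superblock layout: s_mnt_count at byte 52, s_max_mnt_count at 54, s_magic at 56, s_lastcheck at 64, s_checkinterval at 68. `full_check_due()` is a hypothetical helper, and the real e2fsck logic also looks at the clean/error state of the filesystem.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define EXT2_SUPER_MAGIC 0xEF53

/* The on-disk superblock is little-endian regardless of the CPU. */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* sb points at the 1024-byte superblock (byte offset 1024 on the device). */
bool full_check_due(const unsigned char *sb, time_t now)
{
    uint16_t mnt_count     = get_le16(sb + 52);           /* s_mnt_count */
    int16_t  max_mnt_count = (int16_t)get_le16(sb + 54);  /* <= 0 disables */
    uint32_t lastcheck     = get_le32(sb + 64);           /* s_lastcheck */
    uint32_t interval      = get_le32(sb + 68);           /* 0 disables */

    if (get_le16(sb + 56) != EXT2_SUPER_MAGIC)            /* s_magic */
        return false;
    if (max_mnt_count > 0 && mnt_count >= (uint16_t)max_mnt_count)
        return true;
    return interval != 0 && (uint64_t)now >= (uint64_t)lastcheck + interval;
}
```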
Even though the ext3 journal will protect you from corruption due to interrupted filesystem operations, it cannot protect you from disk, cable, RAM, CPU, or software errors, so it is still a good idea to do a full filesystem check periodically. Cheers, Andreas PS - Since you do not actually have a reiserfs filesystem, I would suggest that you send any future questions to [EMAIL PROTECTED] instead of to the reiserfs list. -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: [reiserfs-list] Deleting and resetting a reiserfs journal file
On Feb 19, 2002 20:22 -0500, Harry Wert wrote: I have SuSE 7.3 Professional installed on an HP Pavilion 750n with a 1.6GHz. The system is operating and stable but due to the large hard drive, it takes an agonizing 20 minutes to do an fsck every 26 mounts. SuSE recommended I change to the reiser file system since they could not understand why the tune2fs -c 1000 /dev/hda7 command refused to permanently change the max mount count from 26 to 1000. That is because SuSE 7.3 shipped with a broken kernel (2.4.10) which has the kernel accessing the disk via the block cache and user-space accessing the disk via the page cache. This leads to problems such as you describe, where changes written from user-space are not seen by the kernel if the filesystem is mounted or is not synced to disk before shutdown. This can include tune2fs and e2fsck. After repeated attempts to solve this problem I decided to delete the .journal file and reissue the tune2fs -j /dev/hda7 command, but now I get an error that I can neither delete nor rewrite the .journal file due to a permissions error. The file is set immutable to prevent accidental deletion of the journal, if you create the journal on a mounted filesystem. Again, a symptom of the 2.4.10 kernel problems. Cheers, Andreas PS - you would probably be better off asking ext3 questions on [EMAIL PROTECTED] instead of reiserfs-list... -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: [reiserfs-list] resize_reiserfs problem
On Jan 07, 2002 17:57 -0500, Ciro Vargas Clemow wrote: I'm trying to change the size of my root reiserfs v3.5 partition, but I can't do it. This is my partition table:
cfdisk 2.11b
Disk drive: /dev/hdc
Size: 3228696576 bytes
Heads: 128   Sectors per track: 63   Cylinders: 782
Name   Flags   Part. type   FS type      [Label]   Size (MB)
hdc1           Primary      Linux ext2             20.65
hdc2           Primary      Linux swap             136.25
hdc3   Boot    Primary      Linux                  2733.25
               Pri/Log      Free space             338.56
4) I run resize_reiserfs /dev/hdc3 -s +250M /dev/hdc3 and resize_reiserfs responds that there is not enough space in this partition. There are 338.56 MB of free space. Well, my Spanish isn't que bueno, but I think what you need to do is to increase the size of /dev/hdc3 to include the libre space at the end of the disk before you resize the filesystem. The resize_reiserfs tool is only changing the _filesystem_ and not the _partition_! Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: [reiserfs-list] resize_reiserfs problem
On Jan 07, 2002 18:26 -0500, Ciro Vargas Clemow wrote: On Mon 07 Jan 2002 06:08PM, you wrote: Well, my Spanish isn't que bueno, but I think what you need to do is to increase the size of /dev/hdc3 to include the libre space at the end of the disk before you resize the filesystem. The resize_reiserfs tool is only changing the _filesystem_ and not the _partition_! Should I try this with the cfdisk program, or is there another program that makes this easier? Well, you can do it with cfdisk, it should be relatively easy to do in your case. GNU parted is probably the best tool for doing complex partition resizing tasks, but it does not appear to support reiserfs. P.S. Sorry for my English. Well, my espanol is worse, so don't worry about it. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: [reiserfs-list] What went into 2.4.18-pre1, what do we need? Again and again...;-)
Dieter Nützel wrote: Shouldn't you try to force Marcelo to integrate all pending stuff (O-inode-attrs.patch, quota)? Well, my understanding is that the inode-attrs patch is still at the experimental stage and has not actually been submitted to the kernel maintainer for inclusion. There are still unresolved issues like the fact that an unpatched kernel puts garbage into the attributes area, but a patched kernel respects the garbage attributes. IMHO (very 'H' as I'm not about to work on this), it should be done in several stages:
1) ASAP get a patch into mainline 2.2/2.4/2.5 which ensures that the attributes area is zero'd for new files. This should be low risk and easy to do, and reduces the growth of the problem.
2) Ensure that reiserfsck will clear out garbage attributes*.
3) Ensure that mkreiserfs will not write garbage attributes*.
4) Ensure that there is a FAQ entry on how to recover from garbage attributes for all of the people that run a new kernel without first having run a new reiserfsck against their partition.
5) Add the attribute handling code into the mainline kernel after a suitable interval.
(*) My understanding is that (2) and (3) already exist, and it sets a flag in the superblock which indicates that the attributes are clean. This is essentially like an ext2 feature compatibility flag, which I have urged Hans to start using as well (especially v4). If this was widely used for reiserfs, then you could do things like ensure that an old reiserfsck that tries to run against a filesystem with the attribute flag set would tell the user they need to upgrade reiserfsck, and the kernel code would know which features it could handle and which ones it could not. It is also good practise that _any_ unused field in an on-disk struct is set to zero at mkfs time and in all the kernel code, to avoid such problems in the future.
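The ext2-style feature scheme mentioned above can be sketched as follows. This is a minimal sketch, assuming a hypothetical `struct fs_features` that mirrors ext2's three feature words (compat, ro_compat, incompat); an "attributes are clean" flag would simply be one bit in one of them. Following the ext2 rules, unknown incompat bits refuse the mount, unknown ro_compat bits allow only a read-only mount, and unknown compat bits are ignored. Zeroing the struct at mkfs time is what lets a later version safely give an unused bit a meaning.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ext2-style feature words for an on-disk superblock. */
struct fs_features {
    uint32_t compat;
    uint32_t ro_compat;
    uint32_t incompat;
};

enum mount_verdict { MOUNT_OK, MOUNT_READONLY_ONLY, MOUNT_REFUSE };

/* mkfs side: zero everything, so unused bits read as 0 forever after. */
void init_features(struct fs_features *f)
{
    memset(f, 0, sizeof *f);
}

/* Kernel or fsck side: compare on-disk bits with what this code knows. */
enum mount_verdict check_features(const struct fs_features *ondisk,
                                  const struct fs_features *supported,
                                  bool want_readonly)
{
    if (ondisk->incompat & ~supported->incompat)
        return MOUNT_REFUSE;             /* cannot even read it safely */
    if ((ondisk->ro_compat & ~supported->ro_compat) && !want_readonly)
        return MOUNT_READONLY_ONLY;      /* safe to read, not to write */
    return MOUNT_OK;                     /* unknown compat bits are ignored */
}
```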
It is always good to know when trying to add a feature that this field will be zero, so I can use it like Has an audit been done of other unused fields in the reiserfs code, to ensure that they are being zero'd properly? If not, it should go in as part of (1) above. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: [reiserfs-list] 2.4.15 (final) + RFS A-N (2.4.15-pre8) running fine, here.
On Nov 26, 2001 20:25 +0100, Dieter Nützel wrote: SunWave1 /usr# chattr -R -SAadiscu .badattr/ Actually the better thing is chattr -R = .badattr SunWave1 /lib/modules# lsattr 2.4.15-pre7-preempt/ lsattr: No such file or directory While reading flags on 2.4.15-pre7-preempt//build The problem is that lsattr calls ioctl to get the attributes, and if this is not a regular file or directory, then the ioctl will not necessarily work. On ext2, e2fsck will clear attributes on sockets, devices, etc, for you, and I imagine that reiserfsck will eventually do the same. As with any new feature, there is some time between when it is added to the fs, and when support for it appears in fsck. Erm, while disabling set and get is a nice idea, you only need to disable get. Disabling set will mean you 'cannot' reset the file that is causing the problem, because chattr won't actually be able to do what you want. That is not necessarily true - what if the attribute is compressed and you are randomly allowed to clear this feature? Then when you read back the file you will get compressed garbage that is not your file. There is a reason why not all attributes (under ext2 at least) are set/clearable from the user tools. eg: Does it read in the value, change the bit and write it out again, or does it just change that bit? If it reads and get is disabled, well then you would want to make sure you set the 'full value' that needs to be there. Not a bug so much as something that just needs to be documented. Depends on whether you use +/- or = when using chattr. See above for why chattr -R = .badattr _should_ remove all of the features on your fs. In the future (once reiserfsck and the kernel code are smarter about clearing garbage attributes), reiserfs may want to restrict set/clear of attributes to supported values, so that you don't accidentally clear something like compressed and screw yourself.
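The difference between the chattr modes can be sketched as a pure function. This follows my understanding of how e2fsprogs' chattr of that era combined flags, not its actual source; `chattr_apply()` is a hypothetical name. '+' and '-' modify the current flags, so they depend on a working get, while '=' writes the requested set directly. That is why `chattr -R = .badattr` clears every attribute even when reading the old flags fails.

```c
/* Combine the requested attribute bits with the inode's current flags. */
unsigned long chattr_apply(unsigned long old, char mode, unsigned long bits)
{
    switch (mode) {
    case '+': return old | bits;    /* needs the old flags (get) */
    case '-': return old & ~bits;   /* needs the old flags (get) */
    case '=': return bits;          /* ignores old flags entirely */
    default:  return old;           /* unknown mode: leave flags alone */
    }
}
```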
Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: [reiserfs-list] Re: [REISERFS TESTING] new patches on ftp.namesys.com: 2.4.15-pre7
On Nov 21, 2001 21:19 +0100, Dieter Nützel wrote: Some files are _NOT_ deletable even as root, argh? The normal ext2 solution in this case is move it all to a separate dir, cp -a from the old dir, and then wait for e2fsck to clean up. Since reiserfsck won't do this yet, just consider it a few kB of lost space in /dev/bad or whatever. This _should_ be OK, because cp doesn't know anything about attributes.
SunWave1 /home/nuetzel# lsattr /
BD--j //bin
BD--j //dev
-DX-j //dvd
BD--j //etc
BD--j //lib
BD--j //mnt
BDXE- //opt
BDXE- //tmp
BDXE- //var
BDXE- //usr
BD--j //boot
No, this is garbage. Most of these are ext2-specific flags, and sadly some of them are probably not even settable by chattr. I don't know, but you could try. I think only 'j' is supported by chattr (full data journaling, which isn't implemented in reiserfs yet).
chattr -R -j /
The others are unused, undocumented attributes, such as: B=compressed block, D=compressed dirty file, X=compression raw access, E=compression error. You can ignore these for now, they won't really hurt you, but maybe before pushing the reiserfs attributes patch, there should be a feature flag put in the superblock which means attributes are valid, and reiserfsck will set this flag (if unset) after zeroing all of these fields. If it is an issue of the kernel not zeroing these fields for new inodes which is fixed by the attribute patch, then an attribute-aware reiserfs should refuse to mount with attributes enabled until reiserfsck --clear-attr is run to clear the attributes and set the flag. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
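The superblock-flag proposal can be sketched as follows. This assumes a hypothetical `SB_ATTRS_CLEAN` bit and an in-memory array standing in for the inodes' attribute fields; none of these names come from reiserfs or reiserfsck. The compression mask uses the four ext2 compression-related flag values (dirty 0x100, compressed block 0x200, raw access 0x400, compression error 0x800), which reiserfs never implements, so seeing them is a sign of garbage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical superblock flag: "every inode's attribute field is valid". */
#define SB_ATTRS_CLEAN 0x1u

/* ext2 compression-related flags that reiserfs does not implement. */
#define EXT2_COMPRESSION_BITS (0x100u | 0x200u | 0x400u | 0x800u)

/* Hypothetical fsck pass: zero every inode's attribute word, then mark the
 * superblock. Returns how many inodes carried compression garbage. */
size_t fsck_clear_attrs(uint32_t *attrs, size_t ninodes, uint32_t *sb_flags)
{
    size_t garbage = 0;

    for (size_t i = 0; i < ninodes; i++) {
        if (attrs[i] & EXT2_COMPRESSION_BITS)
            garbage++;
        attrs[i] = 0;
    }
    *sb_flags |= SB_ATTRS_CLEAN;
    return garbage;
}

/* Kernel side: only honour attributes once fsck has cleaned them. */
bool attrs_usable(uint32_t sb_flags)
{
    return (sb_flags & SB_ATTRS_CLEAN) != 0;
}
```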
Re: [reiserfs-list] Re: [REISERFS TESTING] new patches on ftp.namesys.com: 2.4.15-pre7
On Nov 21, 2001 22:58 +0100, Dieter Nützel wrote: On Wednesday, 21 November 2001 22:20, Andreas Dilger wrote: The normal ext2 solution in this case is move it all to a separate dir, cp -a from the old dir, and then wait for e2fsck to clean up. Sorry, I don't understand you here. If I _first_ move all to a separate dir, the original dir is empty, no? You mean move it all to a separate dir, cp -a from the new to the old dir, delete the new one, and then wait for e2fsck to clean up? So, you have a lot of bad inodes in /dev, do this (untested, but easily reversible):
mv /dev /.badattr
mkdir /dev
lsattr -d /dev
Hopefully /dev is created without any attributes. If it is not, then you need to find a directory which has no attribute bits set, create /dev there, and mv it to the root directory.
cp -a /.badattr/* /dev
lsattr -R /dev
Hopefully all of the new inodes in /dev will not have attributes set. Presumably, the reiserfs attribute code does not inherit attributes for files which do not support them (e.g. special files), because ioctls on these files will talk to the device/socket/etc instead of to the filesystem. This might need to be fixed in the reiserfs patch. In the end, these attributes don't do anything bad for you, so they could all just be ignored. You can put other bad files into .badattr until then also. Since reiserfsck won't do this yet, just consider it a few kB of lost space in /dev/bad or whatever. This _should_ be OK, because cp doesn't know anything about attributes. So, maybe you can delete some files in /.badattr, maybe not. It is only a few kB, and it will last until reiserfsck gets around to fixing it. SunWave1 /# time chattr -R -B / Not supported. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/
[reiserfs-list] Re: 2.4.9-ac12 - problem mounting reiserfs (parse error?)
On Sep 19, 2001 21:47 +0200, boris wrote: On Wed, Sep 19, 2001 at 12:49:54PM -0400, Fabian Arias wrote: - Debian Sid - mount 2.11h - gcc-2.95.4 (20010902 Debian prerelease) and 3.0.2pre010908. ditto ... But in my case I don't have defaults on fstab on my reiserfs partitions: /dev/hdc1 / ext2 defaults,errors=remount-ro 0 1 /dev/hdc5 /home reiserfs rw 0 2 but I can boot : /dev/scsi/host0/bus0/target5/lun0/part1 / reiserfs defaults,errors=remount-ro 0 0 with: reiserfs: Unrecognized mount option errors everything is ok until I run lilo: Unable to handle kernel NULL pointer dereference at virtual address printing eip: *pde = Entering kdb (current=0xc58d8000, pid 738) on processor 0 Oops: Oops due to oops @ 0x0 eax = 0x ebx = 0xc5a7ebd4 ecx = 0xc7bb8c9c edx = 0xc1179bf0 esi = 0xc1179bd4 edi = 0x esp = 0xc58d9f20 eip = 0x ebp = 0xc58d9f54 xss = 0x0018 xcs = 0x0010 eflags = 0x00010246 xds = 0xc1170018 xes = 0x0018 origeax = 0x regs = 0xc58d9eec 0xc5a5e000 0588 0578 0 001 stop 0xc5a5e370 bash 0xc5a4c000 0737 0588 0 000 stop 0xc5a4c370 lilo 0xc58d8000 0738 0737 1 000 run 0xc58d8370*lilo.real [0]kdb btp 738 EBP EIP Function(args) 0xc012a310 0x unknown (0xc5a7ebd4, 0xc1179bd4) kernel unknown 0x0 0x0 0x0 0xc012a310 do_generic_file_read+0x364 (0xc5a7ebd4, 0xc5a7ebf4, 0xc58d9f88, 0xc012a6fc) kernel .text 0xc010 0xc0129fac 0xc012a51c 0xc58d9f98 0xc012a7d2 generic_file_read+0x7a (0xc5a7ebd4, 0x805a920, 0x200, 0xc5a7ebf4) kernel .text 0xc010 0xc012a758 0xc012a8dc 0xc58d9fbc 0xc0138e28 sys_read+0x98 (0x4, 0x805a920, 0x200, 0x805c768, 0x3) kernel .text 0xc010 0xc0138d90 0xc0138e64 0xc01071cb system_call+0x33 kernel .text 0xc010 0xc0107198 0xc01071d0 What version of LILO are you using? Versions >= 21.6 _should_ automatically do tail unpacking for mapped files via ioctl, but maybe it is not well tested. Even so, it should NOT be possible to cause it to oops with bad data to the ioctl, if this is the case. I've CC'd reiserfs-list on this as well.
Cheers, Andreas -- Andreas Dilger \ If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry? http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
Re: [reiserfs-list] Errors in a reiserfs that hangs out my machine
On Aug 31, 2001 15:43 +0200, Groo, El Errante wrote: Excuse my ignorance, but can any data be lost with this procedure? I don't have another 260GB to make a backup. If this procedure is safe, I will do it immediately and send you the reports. What???!!! You have 260GB of data and you DON'T have a backup? Having a small bitmap error is the least of your worries then. Just because you have RAID5 does not mean your data is safe. Often, on systems that have very long uptimes, you can have multiple-disk failures if you have a long shutdown period. Also, RAID doesn't protect against user/software error that deletes some/all of your data. If you DO have a backup, then the worst that can happen (not saying it WILL happen) is that your filesystem is totally lost, so you create a new one and restore the data. An outage of a couple of hours to do a full restore is nothing compared to trying to get all of your data back if you have some sort of problem. Cheers, Andreas -- Andreas Dilger \ If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry? http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
Re: [reiserfs-list] Mount options
On Aug 31, 2001 18:54 +0200, Rosaire AMORE wrote: /dev/hdb6 /ext reiserfs rw,suid,dev,exec,auto,user,async 1 2 The second line contains options that I used with ext2fs. The result was that I was unable to execute anything on the /ext filesystem (scripts or binaries). I rewrote the line like this : /dev/hdb6 /ext reiserfs notail 1 2 Well, most of those options are in the defaults flag, so you could use: /dev/hdb6 /ext reiserfs defaults,user 1 2 and it should work for both filesystem types. However, I don't think any of them are specific to ext2, so reiserfs shouldn't have a problem if they are specified explicitly. I would _guess_ that all of these flags are handled by mount/VFS and neither reiserfs nor ext2 actually do anything with them, so it is strange that you would have such a problem. Cheers, Andreas -- Andreas Dilger \ If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry? http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
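Since these generic options are handled by mount(8) and the VFS rather than by the filesystem, it may help to see how an option list turns into MS_* flags. This is a simplified sketch, not util-linux's parser; `parse_mount_opts()` is a hypothetical helper that handles only the generic options in the lines above. Options are applied left to right. According to the mount(8) man page, `user` implies noexec, nosuid and nodev unless a later option turns them back on, so the position of `user` in the list matters.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mount.h>

/* Fold a comma-separated option list into MS_* flags, left to right.
 * Options not handled here (auto, async, notail, ...) are skipped. */
unsigned long parse_mount_opts(const char *opts)
{
    char buf[256];
    unsigned long flags = 0;

    strncpy(buf, opts, sizeof buf - 1);   /* longer lists are truncated */
    buf[sizeof buf - 1] = '\0';
    for (char *save, *o = strtok_r(buf, ",", &save); o; o = strtok_r(NULL, ",", &save)) {
        if (!strcmp(o, "defaults"))    flags &= ~(MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC);
        else if (!strcmp(o, "ro"))     flags |= MS_RDONLY;
        else if (!strcmp(o, "rw"))     flags &= ~MS_RDONLY;
        else if (!strcmp(o, "nosuid")) flags |= MS_NOSUID;
        else if (!strcmp(o, "suid"))   flags &= ~MS_NOSUID;
        else if (!strcmp(o, "nodev"))  flags |= MS_NODEV;
        else if (!strcmp(o, "dev"))    flags &= ~MS_NODEV;
        else if (!strcmp(o, "noexec")) flags |= MS_NOEXEC;
        else if (!strcmp(o, "exec"))   flags &= ~MS_NOEXEC;
        else if (!strcmp(o, "user"))   flags |= MS_NOEXEC | MS_NOSUID | MS_NODEV;
    }
    return flags;
}
```

Under these rules `user,exec` keeps exec permission, while an `exec` placed before `user` is overridden.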
Re: [reiserfs-list] ReiserFS at /
On Aug 29, 2001 21:51 +0400, Hans Reiser wrote: How about sending Linus and lkm an email insisting that Xenix should get its attempt to mount last. I _thought_ that a patch had gone into v7 which made it check some sanity of the superblock before trying to mount it (e.g. blocksize/total blocks make sense for device, root inode is a directory, has a size which is a multiple of the block size, etc). I haven't checked whether the patch actually made it in, but since Linus was an advocate of it, I was pretty sure it would. Still, I don't think anyone would object to a patch which moved SYSV to the end of the list in fs/Makefile, so that it is probed last. Cheers, Andreas -- Andreas Dilger \ If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry? http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
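The kind of superblock sanity check described above can be sketched in a few lines. It assumes a hypothetical, already-decoded `struct sb_candidate`; a real driver would read these values from its own on-disk format. The 512 to 65536 byte block size range is an assumption for the example. The other tests follow the list above: the filesystem must fit on the device, and the root inode must be a directory whose size is a whole number of blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a candidate superblock. */
struct sb_candidate {
    uint32_t block_size;     /* bytes */
    uint64_t total_blocks;
    bool     root_is_dir;    /* root inode has directory mode */
    uint64_t root_size;      /* bytes */
};

/* Reject superblocks that cannot belong to this device before mounting. */
bool sb_looks_sane(const struct sb_candidate *sb, uint64_t device_bytes)
{
    uint32_t bs = sb->block_size;

    if (bs < 512 || bs > 65536 || (bs & (bs - 1)) != 0)   /* power of two */
        return false;
    if (sb->total_blocks == 0 || sb->total_blocks > device_bytes / bs)
        return false;                                     /* bigger than device */
    if (!sb->root_is_dir || sb->root_size % bs != 0)
        return false;
    return true;
}
```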
[reiserfs-list] [PATCH] incorrect byte swapping for statfs
Hello, while this is probably already addressed in the patch for big-endian reiserfs support, I noticed a bug when looking at reiserfs_statfs. You are byte-swapping the (generic) superblock s_blocksize field, which is already in CPU byte order. The other fields are from the reiserfs on-disk superblock (rs) which need to be swapped, but this one is the VFS superblock (s). Cheers, Andreas
==
--- fs/reiserfs/super.c.orig	Tue May 8 17:08:12 2001
+++ fs/reiserfs/super.c	Tue Jun 12 11:20:30 2001
@@ -853,7 +853,7 @@
   /* changed to accomodate gcc folks.*/
   buf->f_type = REISERFS_SUPER_MAGIC;
-  buf->f_bsize = le32_to_cpu (s->s_blocksize);
+  buf->f_bsize = s->s_blocksize;
   buf->f_blocks = le32_to_cpu (rs->s_block_count) - le16_to_cpu (rs->s_bmap_nr) - 1;
   buf->f_bfree = le32_to_cpu (rs->s_free_blocks);
   buf->f_bavail = buf->f_bfree;
-- Andreas Dilger \ If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry? http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
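Why the bug only shows up on big-endian machines can be illustrated in a few lines. `le32_decode()` and `swab32()` are stand-ins written for this example, not the kernel's helpers. On little-endian CPUs `le32_to_cpu()` is a no-op, so swapping the already-native `s->s_blocksize` is harmless there. On big-endian CPUs it byte-swaps unconditionally, turning a 4096-byte block size (0x00001000) into 1048576 (0x00100000).

```c
#include <stdint.h>

/* Decode a 32-bit little-endian on-disk field independent of CPU order:
 * the effect le32_to_cpu() has on fields read from the disk. */
uint32_t le32_decode(const unsigned char b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Unconditional byte swap: what le32_to_cpu() does on a big-endian CPU,
 * and therefore what it wrongly does to an already-native value. */
uint32_t swab32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}
```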