Re: v3 experimental data=ordered and logging speedups for 2.6.1
Hello! On Mon, Jan 19, 2004 at 11:45:26AM -0500, Chris Mason wrote: I've got most of data=ordered finished, there are a few paths like writepage and O_DIRECT that need tweaking. Thanks to Oleg's file_write work in 2.6.x, the data=journal patch is much cleaner than 2.4, it is almost done but not included in the bunch of patches I just uploaded to ftp.suse.com. Oleg is cc'd in case he wants to look over the changes to reiserfs_file_write in reiserfs-jh-2. Cool. I'd certainly take a look at it, but maybe in February, as I am in the US right now and do not have a stable internet connection yet. Thank you. Bye, Oleg
Re: New reiser4 snapshot (2003.09.12) is out
Hello! On Sat, Sep 13, 2003 at 04:38:01AM -0400, Robert P. J. Day wrote: It is because of the paranoid -Werror flag. ./configure --disable-werror will help you. However, if your system has the readline headers (as on the systems used by reiser4progs developers :-) that warning disappears. my system definitely has readline installed, and i get the same error. On the other hand, you need not only readline itself, but also its header files. Lots of distributions (e.g. redhat-alike) ship another package, (usually) called readline-devel, that contains this necessary stuff. Bye, Oleg
Re: New reiser4 snapshot (2003.09.12) is out
Hello! On Sat, Sep 13, 2003 at 06:05:58PM +0400, Hans Reiser wrote: my system definitely has readline installed, and i get the same error. On the other hand, you need not only readline itself, but also its header files. Lots of distributions (e.g. redhat-alike) ship another package, (usually) called readline-devel, that contains this necessary stuff. did you adjust our headers to be redhat compatible? If not, please do. The problem turned out to be that RedHat's readline was compiled with ncurses dynamically, so when you link in readline, you need to link ncurses too. Vitaly will release a fix to the configure script shortly, I believe. Bye, Oleg
Re: New reiser4 snapshot (2003.09.12) is out
Hello! On Sat, Sep 13, 2003 at 10:41:47AM -0400, Robert P. J. Day wrote: did you adjust our headers to be redhat compatible? If not, please do. i didn't see anything in the READ.ME about adjusting headers, but would this also explain why, if i build reiser 4 support directly into the kernel, my make modules_install works fine, but if i make it modular, i get reiser4 is not supposed to be built as a module yet. Hm, I thought I had disabled this in Kconfig, but it seems I did not do so completely. Anyway, the list of unresolved symbols you provided is valuable and we will export those in the next snapshot. just curious. haven't rebooted under the new kernel yet, but i'd like to clear up the above before i do that. thanks. Bye, Oleg
Re: data-logging finally for 2.4.23?
Hello! On Wed, Sep 03, 2003 at 05:56:00PM +0200, Dieter Nützel wrote: What's up Chris? Your latest stuff working fine on 2.4.22-rc1-rl (pre-emption; haven't time for a newer version, yet). I forwarded the patch to Hans to propagate to Marcelo, but it has not gone through. I will check with Hans after he wakes up. I hope we will get 2.4.23 vanilla with the datalogging stuff. Bye, Oleg
Re: Fwd: Bad root block 0
Hello! On Sun, Sep 07, 2003 at 02:46:32PM +0200, Roland Häder wrote: another thing: I could extract the missing metadata as you described to a compressed file. But I do not mail it to you, because it contains security-breaking-data: My Online-Banking stuff. :-( If I do so, Hm. Can you tell us what kernel version you were using at the time of writing your files to the encrypted device? What do you mean by extracting missing metadata? which will never happen - you just need to crack my password for unlimited cash :-( unlimited cash? Wow! Bye, Oleg
Re: Fwd: Bad root block 0
Hello! On Mon, Sep 08, 2003 at 05:15:01PM +0200, Roland Häder wrote: Hm. Can you tell us what kernel version you were using at the time of writing your files to encrypted device? Oops: Vanilla 2.4.21 with cryptoloop (not patched, it's compiled alone and installed into /lib/modules/2.4.21/x) Ah, ok. What do you mean by extracting missing metadata? debugreiserfs can extract the metadata of a rfs-partition. With missing I mean that reiserfsck tells me that there's no metadata available. So here's a strange thing: debugreiserfs found it, but reiserfsck not... What debugfs command found the metadata? Was it just plain debugreiserfs -p? How many leaves/internal nodes were found? unlimited cash? Wow! Nope, not *really* unlimited cash... :) Ah, sigh. I thought Bill Gates started to use linux + reiserfs, but no... Bye, Oleg
Re: write barrier patches for 2.4.21
Hello! On Tue, Aug 26, 2003 at 05:46:24PM -0400, Tom Vier wrote: anyone working on scsi wb's? there was a long thread on l-k about wb's, but i wasn't aware what came of it. There was a discussion about that at Kernel Summit 2003 and the general opinion was that SCSI does not need the WB stuff at all, as it does the correct thing anyway. But since the barrier flags are visible in io requests, actual device drivers are free to do something when met with barrier requests or to ignore them. The only concern is probably raid cards that present a bunch of IDE drives as a SCSI device. Bye, Oleg
Re: 2.6.0-test4 reiserfs oops
Hello! On Tue, Aug 26, 2003 at 03:59:20PM +, Lorenzo Allegrucci wrote: I have got this oops running fsstress and fsx-linux on a 20Gb reiserfs partition. Fully reproducible. What are the options to fsx and fsstress? fsx-linux linux-2.5.0.tar.bz2 :) mkdir d fsstress -d d -n 100 -p 10 The oops follows immediately. works for me without any patches. This is easier to reproduce: touch file mkdir d fsstress -d d -n 100 -p 10 As soon as I run fsx-linux file on another console I get the oops. (However fsx-linux without fsstress is not sufficient) Well, actually I was able to reproduce it later yesterday, after replying to you. And we got some other similar reports under different workloads. My initial suspicion about the cause was right, and the fix I sent to you on the second try is the correct thing to do. The actual problem came from the AIO people merging an incorrect patch into the reiserfs code. Bye, Oleg
Re: reiser4 snapshot for August 26th.
Hello! On Tue, Aug 26, 2003 at 11:28:44PM +0200, Diego Calleja García wrote: btw, I suppose this feature will be removed if/when reiser4 is merged?: config REISER4_FS_SYSCALL bool Enable reiser4 system call No. It will be fixed. dmesg errors: (fs/ext3/inode.c, 2728): ext3_write_inode: called recursively, non-PF_MEMALLOC! Call Trace: [c018c715] write_inode+0x45/0x50 [c018c9af] __sync_single_inode+0x28f/0x310 [c018cd00] generic_sync_sb_inodes+0x1c0/0x2e0 Hm. Interesting. Thank you for the report. We will fix it. Bye, Oleg
Re: New reiser4 snapshot (as of August 22, 2003)
Hello! On Fri, Aug 22, 2003 at 06:16:35PM -0700, Tupshin Harper wrote: Are these patches available outside of bitkeeper, and if so, where are they located? Yes, they are at http://thebsh.namesys.com/snapshots/2003.08.22 , as somebody pointed out already. I just forgot to mention the URL. Bye, Oleg
Re: ReiserFS problems
Hello! On Wed, Aug 06, 2003 at 08:22:52PM +0200, Rogier Wolff wrote: Only list the file/directory that's being worked upon when explicitly requested. When not explicitly requested, set an alarm handler to print it every second (or so). Lots of time is now spent in writing to I think we already do something like this. Vitaly should know exact details. Bye, Oleg
Re: Filesystem corruption
Hello! On Thu, Aug 14, 2003 at 12:05:28AM +0800, Locke wrote: the files. I'm guessing the reason why it recovered so little was because I was running a 7.8GB+40GB LVM and the 40GB physical volume wasn't working and left it with only 7.8GB. Yes of course. is_tree_node: node level 0 does not match to the expected one 1 vs-5150: search_by_key: invalid format found in block 8838461. Fsck? So LVM substitutes zero-filled blocks instead of data if a physical volume is unavailable. Of course reiserfsck happily threw all of those blocks out of the tree. And also when rebooting after the corruption I saw several error messages for all drives, hda, hdb and hdg ** hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } hda: dma_intr: error=0x84 { DriveStatusError BadCRC } hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } hda: dma_intr: error=0x84 { DriveStatusError BadCRC } Also you should consider replacing the noisy IDE cable on your primary IDE controller with one that is not noisy. Or just run in a lower UDMA mode. **The messages are copied from the FAQ in namesys.com because they looked similar so I'm not sure if they're exactly the same. Well, if they are not the same, you'd better write them down on paper. Is there anything I can try to recover more data? You might try to get LVM up again and run reiserfsck --rebuild-tree. Some more stuff will be restored. Though you will still have lots of files' content lost and there is no way to restore it anymore. Also use reiserfsck 3.6.11. Bye, Oleg
Re: ReiserFS problems
Hello! On Thu, Aug 07, 2003 at 11:12:27AM -0700, Mike Fedyk wrote: Well. This is actually unfortunate, I agree. In such a case you'd better move your reiserfs images to some other place for the time of reiserfsck --rebuild-tree run. or compress them. But if there was at any time an uncompressed reiserfs image within the outer reiserfs filesystem you're fscking, won't that screw it up too? Yes. The fs in file will be completely destroyed. Some stuff from it may appear in outer fs. (possibly in lost + found, no actual file data, just the names and directory structure). So you can compress it, but if you uncompress it to work with it, it still fscks fsck... Right? :-/ Yes. Bye, Oleg
Re: ReiserFS problems
Hello! On Wed, Aug 06, 2003 at 06:20:55PM +0200, Rogier Wolff wrote: Reiserfs messed up our filesystem again (one file gives us permission And you use what kernel with what patches on what hardware? A surface scan needs to read all the datablocks. But an fsck doesn't. At least that's the normal case. reiserfsck --rebuild-tree is special, it actually reads in all the blocks on the device that are marked as used, to find metadata blocks and connect them to the tree (even if they were previously unconnected). Unlike many other filesystems out there, reiserfs does not have fixed metadata locations, hence we absolutely need this scan. later. So we hit control-C on the fsck. That was a big mistake. But now mounting the filesystem gives us: ReiserFS version 3.6.25 reiserfs: checking transaction log (device 09:00) ... is_tree_node: node level 0 does not match to the expected one 65534 vs-5150: search_by_key: invalid format found in block 0. Fsck? vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [1 2 0x0 SD] Using r5 hash to sort names is_tree_node: node level 0 does not match to the expected one 65534 vs-5150: search_by_key: invalid format found in block 0. Fsck? vs-2140: finish_unfinished: search_by_key returned -2 and fsck without --rebuild-tree gives us that an unfinished --rebuild-tree was in progress. So we've restarted the tree-rebuild. Yes. Once you run tree-rebuild, you must wait until it is completed. (Documentation update is scheduled just now. But in fact we mention this in our FAQ). Question: If it is reading all datablocks, I'm guessing that it is All the ones that are marked as occupied in the bitmaps. looking for the magics that build up the filesystem. We're a Yes. datarecovery company. We probably don't have any current datarecoveries of people with Reiserfs on their disk. But if we had a disk-image with a valid (or not) Reiserfs on it, would it link that into our filesystem? Yes, it will.
So basically speaking you do not want to run the rebuild-tree operation on an FS that contains files with reiserfs metadata embedded in them in the clear. This is also explained in our FAQ. Anyway, when I first started out with Reiserfs, it didn't support 2G files (or was it 4G?) I had to patch the kernel and (irreversibly!) upgrade the on-disk format. Yes. Linux by itself was not supporting 2G some time ago and people used patches and changed their on-disk formats even for other filesystems out there. We've noticed horrible slowdowns when the filesystem is 90% full. It turns out that when a block group is more than 90% full reiserfs will prefer a different block group. i.e. it is ALWAYS switching block groups when the whole disk is 90% full. Something like that. When we report something like that it's always: Ah, yes, that's an old bug we've fixed it. Use patch. In fact this is not exactly true, it only switches to another block group if you are creating a new file. Why do you think this is a problem? (of course I am speaking of 2.4.20+ kernels). Bye, Oleg
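To illustrate the point made in the reply above (that reiserfsck --rebuild-tree visits every block marked as occupied in the bitmaps), here is a toy sketch of walking an allocation bitmap. It is not reiserfsck code: the function name `used_blocks` and the bit order (bit 0 of byte 0 is block 0) are assumptions chosen for the illustration.

```python
def used_blocks(bitmap: bytes):
    """Yield the numbers of all blocks whose bit is set in an allocation bitmap.

    Assumption for this sketch: bit `i` of byte `n` describes block n*8 + i.
    """
    for byte_index, byte in enumerate(bitmap):
        for bit in range(8):
            if byte & (1 << bit):
                yield byte_index * 8 + bit


if __name__ == "__main__":
    # Two bitmap bytes: blocks 0 and 2 are used in the first, block 15 in the second.
    for block in used_blocks(bytes([0b00000101, 0b10000000])):
        print("would read and inspect block", block)
```

A scan of this kind has no way of telling whether a used block holds real metadata or a file that merely contains a reiserfs image, which is why the reply warns against keeping uncompressed images on a filesystem that is being rebuilt.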
Re: nfsd-fh: found a name that I didn't expect
Hello! On Wed, Aug 06, 2003 at 05:00:03PM -0400, John Dalbec wrote: I just got an nfsd-fh: found a name that I didn't expect yesterday. I'm using a Red Hat 2.4.20 RPM with 2.4.20-pending+data-logging+quota. Should I apply just this patch or both this patch and the iget5_locked_2.4.20 patch? You only need the patch below. iget5_locked_2.4.20 patch is broken. Bye, Oleg

===== fs/reiserfs/inode.c 1.42 vs edited =====
--- 1.42/fs/reiserfs/inode.c Thu Feb 13 15:42:42 2003
+++ edited/fs/reiserfs/inode.c Thu Feb 20 17:23:24 2003
@@ -20,6 +20,10 @@
 
 static int reiserfs_get_block (struct inode * inode, long block,
                                struct buffer_head * bh_result, int create);
+/* This spinlock guards inode pkey in private part of inode
+   against race between find_actor() vs reiserfs_read_inode2 */
+static spinlock_t keycopy_lock = SPIN_LOCK_UNLOCKED;
+
 void reiserfs_delete_inode (struct inode * inode)
 {
     int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2;
@@ -898,8 +902,9 @@
 
     bh = PATH_PLAST_BUFFER (path);
     ih = PATH_PITEM_HEAD (path);
-
+    spin_lock(&keycopy_lock);
     copy_key (INODE_PKEY (inode), &(ih->ih_key));
+    spin_unlock(&keycopy_lock);
     inode->i_blksize = PAGE_SIZE;
 
     INIT_LIST_HEAD(&inode->u.reiserfs_i.i_prealloc_list) ;
@@ -1220,10 +1225,27 @@
                                unsigned long inode_no, void *opaque )
 {
     struct reiserfs_iget4_args *args;
+    int retval;
 
     args = opaque;
+    /* We protect against possible parallel init_inode() on another CPU here. */
+    spin_lock(&keycopy_lock);
     /* args is already in CPU order */
-    return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
+    if (le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid)
+        retval = 1;
+    else
+        /* If The key does not match, lets see if we are racing
+           with another iget4, that already progressed so far
+           to reiserfs_read_inode2() and was preempted in
+           call to search_by_key(). The signs of that are:
+           Inode is locked
+           dirid and object id are zero (not yet initialized)*/
+        retval = (inode->i_state & I_LOCK) &&
+                 !INODE_PKEY(inode)->k_dir_id &&
+                 !INODE_PKEY(inode)->k_objectid;
+
+    spin_unlock(&keycopy_lock);
+    return retval;
 }
 
 struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)
Re: reiser4 snapshot
Hello! On Mon, Aug 11, 2003 at 05:32:25PM -0700, Boris Tschirschwitz wrote: I thought I'd give it a try on 2.6.0-test3-mm1. Even with 'make mrproper' before compiling, I get the following error message: (Is there any interest in such error reports?) Yes, there is. bobele linux # make bzImage CHK include/linux/version.h UPD include/linux/version.h Making asm-asm-i386 symlink CC scripts/empty.o MKELF scripts/elfconfig.h HOSTCC scripts/file2alias.o HOSTCC scripts/modpost.o HOSTLD scripts/modpost SPLIT include/linux/autoconf.h - include/config/* CC arch/i386/kernel/asm-offsets.s CHK include/asm-i386/asm_offsets.h UPD include/asm-i386/asm_offsets.h CC init/main.o In file included from include/linux/unistd.h:9, from init/main.c:18: include/asm/unistd.h: In function `reiser4': include/asm/unistd.h:400: error: `__NR_reiser4' undeclared (first use in this function) include/asm/unistd.h:400: error: (Each undeclared identifier is reported only once include/asm/unistd.h:400: error: for each function it appears in.) make[1]: *** [init/main.o] Error 1 make: *** [init] Error 2 Hm, this is strange. __NR_reiser4 is clearly defined in include/asm-i386/unistd.h Probably you had that part of the patch rejected? Can you please verify? Bye, Oleg
Re: un-long listable files
Hello! On Wed, Aug 06, 2003 at 02:31:39PM +1000, [EMAIL PROTECTED] wrote: getxattr(light_in_time_of_darkness__glad_to_see_you.wav, system.posix_acl_access and there it sits waiting for heat-death-of-universe. Hm, this code is by SuSE people (and it is only in suse kernels) and I have not even looked at it closely yet. Probably Jeff can comment on it. Looks like I'm falling foul of some strange ACLs. ls -l as root still hangs though. Bye, Oleg
Re: rebuild fs
Hello! On Tue, Aug 05, 2003 at 04:56:55PM +0400, Hans Reiser wrote: rephrase that as, use 3.6.11, if it still fails, tell us, the segfault will at least be fixed regardless of whether fsck has enough data to do its job. But it was not failing on the IDE drive anyway. I don't understand the relevance of your statement to mine. Since transferring the image to IDE made reiserfsck not fail (and it failed on raid5 due to raid errors, I think), your "if it still fails" statement did not apply. The current problem is that not everything is restored and some important files were lost. Now, I know that recently we introduced some serious changes in reiserfsck, and now if a block has some slight corruption, it is not immediately discarded; fsck actually tries to extract some useful data out of it if it thinks this really is a reiserfs metadata block. That's why a newer reiserfsck might achieve better results. Bye, Oleg
Re: rebuild fs
Hello! On Tue, Aug 05, 2003 at 04:28:15PM +0400, Hans Reiser wrote: But while rebuilding the tree, I got a segmentation fault. Because I didn't want to continue work on the original raid system, I copied the entire raid disk to the IDE disk. dd if=/dev/rd/c0d0 of=/dev/hda conv=noerror,sync I tried to rebuild the fs structure again, and I was able to access many files, but not those that are important to me :( Is there anything I can do in this state? Do tools other than reiserfsck exist? If you are not using the latest fsck version (3.6.11 as of now), try to use reiserfsprogs 3.6.11, as there is a slight chance it would do better. rephrase that as, use 3.6.11, if it still fails, tell us, the segfault will at least be fixed regardless of whether fsck has enough data to do its job. But it was not failing on the IDE drive anyway. Bye, Oleg
Re: is this a known bug?
Hello! On Thu, Jul 24, 2003 at 01:54:35AM +0200, [EMAIL PROTECTED] wrote: [usb-ohci:sohci_device_operations+781321/95136603] .LC62 [reiserfs] 0xc3 Jul 14 13:25:41 mai-stor2 kernel: Call Trace: [f8a456d5] .LC62 [reiserfs] 0xc3 Jul 14 13:25:41 mai-stor2 kernel: [f8a3ad1e] journal_mark_dirty [reiserfs] 0x13e Jul 14 13:25:41 mai-stor2 kernel: [f8a43d60] .LC93 [reiserfs] 0x27a0 Jul 14 13:25:41 mai-stor2 kernel: [f8a1f100] reiserfs_free_block [reiserfs] 0xa0 Jul 14 13:25:41 mai-stor2 kernel: [f8a34bb0] prepare_for_delete_or_cut [reiserfs] 0x760 Jul 14 13:25:41 mai-stor2 kernel: [f8a22417] free_thrown [reiserfs] 0x57 Jul 14 13:25:41 mai-stor2 kernel: [f8a22689] do_balance [reiserfs] 0xe9 Jul 14 13:25:41 mai-stor2 kernel: [f8a356ca] reiserfs_cut_from_item [reiserfs] Yes, I know this one. It is journal overflow we fixed in 2.4.21 or thereabout. If you use reiserfs, you really do not want to depend on RedHat's 2.4.9 kernel, you'd better get some recent stuff (or talk us into backporting fixes to 2.4.9? that might work if you have enough money ;) ) Bye, Oleg
Re: problem with overwriting large files
Hello! On Mon, Jul 21, 2003 at 08:30:19AM -0700, Suman Puthana wrote: We do not see any problem when we are writing into empty space (using the write call in a C program) as the file is extending (the write operation takes less than 3 ms), but for a certain part of the application we need to over-write these files and we find that the write operation is taking about 200-300 ms every few minutes, sometimes every few seconds depending on the system load. The description is very nice, but it would be even nicer if you could provide sample test code that we can run to just see the problem. 3.) Would writing in file system blocks (4096 bytes?) or multiples of blocks help this situation? From some basic tests it doesn't seem to help by much. From the file system performance point of view, is it better to write sixteen 4K chunks or one 64K chunk? Actually rewriting should be much faster, just because you are not allocating stuff and are only changing mtime. So... I'd really appreciate sample code that demonstrates the problem. Also tell us which kernel you are using that shows the problem, and stuff like that. Thank you. Bye, Oleg
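The kind of test program requested above could look roughly like the following. This is only a sketch, not the poster's application: the names `time_overwrites` and `demo`, the 1 MiB file size and the 4 KiB chunk size are all assumptions made for the illustration. It first extends a file, then overwrites it in place and records how long each write call takes, so that occasional slow overwrites would show up as outliers.

```python
import os
import tempfile
import time


def time_overwrites(path, file_size, chunk_size):
    """Overwrite an existing file chunk by chunk; return per-write latencies in seconds."""
    chunk = b"\xaa" * chunk_size
    latencies = []
    # "r+b" opens for update without truncating, so no new blocks are allocated.
    with open(path, "r+b", buffering=0) as f:
        for offset in range(0, file_size, chunk_size):
            f.seek(offset)
            start = time.perf_counter()
            f.write(chunk)
            latencies.append(time.perf_counter() - start)
    return latencies


def demo(file_size=1 << 20, chunk_size=4096):
    """Create a scratch file, overwrite it once, return (writes, worst latency, final size)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * file_size)  # first pass: the file is extended
        path = tmp.name
    try:
        latencies = time_overwrites(path, file_size, chunk_size)
        return len(latencies), max(latencies), os.path.getsize(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    count, worst, size = demo()
    print(f"{count} overwrites, worst latency {worst * 1000:.3f} ms, size {size}")
```

Calling `demo(chunk_size=65536)` instead gives a crude comparison for the 4K versus 64K question; a real reproduction would need much larger files on the affected filesystem and a loaded system.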
Re: bug report: attributes ( chattr +a ) not respected by reiserfs 3.6, but this isn't listed in man mkreiserfs
Hello! On Wed, Jul 16, 2003 at 04:29:38PM -0500, Matt Stegman wrote: I get the same behaviour, but it appears that *only* the append-only attribute is ignored. Other attributes are respected fine. However, there are still some wierd things with reiserfs and attributes. This is a pretty long email detailing what I found. Thanks a lot for a lot of details. # mkreiserfs /dev/hdc1 -mkreiserfs, 2003- reiserfsprogs 3.6.8 mkreiserfs: Guessing about desired format.. mkreiserfs: Kernel 2.4.20-xfs-r3 is running. Format 3.6 with standard journal ...snip... # mount -t reiserfs -o attrs /dev/hdc1 /mnt/reiser # cd /mnt/reiser # echo hello file -bash: file: Permission denied Huh? I'm root, this is a new filesystem, why would permission be denied? Wait a minute... # lsattr -d /mnt/reiser suS-iadAcjIt- /mnt/reiser Yes, this is a problem in mkreiserfs. Surprisingly 2.6.4 works ok. This of course will be fixed. # chattr +i file # echo line2 file -bash: file: Permission denied Append only (a) is not respected. # chattr -i +a file # lsattr file s-S--adAc--t- file # echo test file # ls -l file -rw-r--r--1 root root5 Jul 16 16:09 file Hm... Indeed. Sigh. a : file can only be opened in append mode Ignored by reiserfs. Yes, this is a bug. d : tell dump to ignore this file Does dump even work on reiserfs? We use this for marking the file as not needing tail packing. t : do not merge tails on this file I don't know if this is supported or not. Hm, my chattr documentation does not have this flag. Finally, 'reiserfsck --clean-attributes' will produce the following: # umount /mnt/reiser # reiserfsck --clean-attributes /dev/hdc1 ...snip... # mount -t reiserfs -o attrs /dev/hdc1 /mnt/reiser # lsattr -d /mnt/reiser - /mnt/reiser # lsattr /mnt/reiser/file - /mnt/reiser/file Yes, this is expected. Older kernels, that are unaware of reiserfs attributes (pre 2.4.17 ones) write various garbage in sd_attrs field in stat data. So this needs to be cleaned. 
And we even invented a superblock flag to indicate that such a cleaning was performed already. Now we also see that new mkreiserfs also writes garbage there. Bye, Oleg
Re: reiserfsprogs-3.6.9 release
Hello! BTW, it seems one other important change was omitted from the short summary below. The license on the reiserfsprogs package was changed from GPLv2 to GPLv2 with an additional restriction that I quote below: === ReiserFSprogs is hereby licensed under the GNU General Public License version 2 but with the following Anti-Plagiarism modification: You may not remove any credits or brand marks, or cause them to not display, unless you are an end user (that is, you are not redistributing to others). Yes, there really are people with the nerve to remove credits from software they did not write, or only wrote a small part of, and they are even frequently occurring sad to say. Credits are not ads, credits describe someone's contribution to the project (e.g. labor or money) whereas an ad says something else. === Bye, Oleg On Wed, Jul 16, 2003 at 09:03:29PM +0400, Vitaly Fertman wrote: Hi all, the new reiserfsprogs release is available on our ftp site (ftp.namesys.com). This release includes: - objectid handling was improved, significant speedup at pass0 and semantic/lost+found rebuild passes (was in last pre releases for some time); - improved leaves recovery on pass0 of rebuild-tree; - exit codes of reiserfsck were fixed; - reiserfsck --yes option was added; - mkreiserfs --quiet option was added; - fsck check on boot avoids another bitmap reading on the following mount; - credits were fixed. Some bugs were fixed: - fsck proceeds for the standard journal when wrong journal parameters in the journal header detected, fixing them with the warning; - a bug in journal replaying code when the only transaction exists was fixed; - a few not standard journal related bugs in mkreiserfs and reiserfstune were fixed; - a pair of bugs in rebuild-sb were fixed; -- Thanks, Vitaly Fertman
Re: Horrible ftruncate performance
Hello! On Tue, Jul 15, 2003 at 09:55:09PM +0200, Dieter Nützel wrote: Somewhat. Mouse movement is OK, now. But... 1+0 Records aus 0.000u 3.090s 0:16.81 18.3% 0+0k 0+0io 153pf+0w 0.000u 0.050s 0:00.27 18.5% 0+0k 0+0io 122pf+0w INSTALL/SOURCE time dd if=/dev/zero of=sparse1 bs=1 seek=200G count=1 ; time sync 1+0 Records ein 1+0 Records aus 0.000u 3.010s 0:15.27 19.7% 0+0k 0+0io 153pf+0w 0.000u 0.020s 0:01.01 1.9% 0+0k 0+0io 122pf+0w So you create a file in 15 seconds and remove it in 15 seconds. Kind of nothing changed except that the mouse now moves, or am I reading this wrong? INSTALL/SOURCE time rm sparse ; time sync 0.000u 14.990s 1:31.15 16.4%0+0k 0+0io 130pf+0w 0.000u 0.030s 0:00.22 13.6% 0+0k 0+0io 122pf+0w So the stuff fell out of cache and we need to read it again, hence the increased time. Hm, probably this case can be optimized if there is only one item in the leaf and this item should be removed. I need to take a closer look at the balancing code. Bye, Oleg
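For readers unfamiliar with the dd invocation quoted above: with bs=1 seek=200G count=1 it writes a single byte at a 200G offset, so the resulting file is almost entirely a hole. A minimal sketch of the same operation on a much smaller scale follows; the helper names `make_sparse` and `demo` and the 1 MiB hole are assumptions for the illustration, and the allocated size reported depends on the filesystem.

```python
import os
import tempfile


def make_sparse(path, hole_bytes):
    """Write one byte at offset hole_bytes, like dd bs=1 seek=N count=1.

    Returns (apparent size, bytes actually allocated).
    """
    with open(path, "wb") as f:
        f.seek(hole_bytes)   # skip over the hole without writing anything
        f.write(b"\0")       # the single byte that dd copies from /dev/zero
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512


def demo(hole_bytes=1 << 20):
    """Create a sparse scratch file, report its sizes, and clean up."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return make_sparse(path, hole_bytes)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    size, allocated = demo()
    print(f"apparent size {size} bytes, allocated {allocated} bytes")
```

The timings in the thread concern how long the filesystem takes to create and later remove such a file, not the amount of data written.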
Re: Horrible ftruncate performance
Hello! On Wed, Jul 16, 2003 at 12:47:53PM +0200, Dieter Nützel wrote: Somewhat. Mouse movement is OK, now. But... 1+0 Records aus 0.000u 3.090s 0:16.81 18.3% 0+0k 0+0io 153pf+0w 0.000u 0.050s 0:00.27 18.5% 0+0k 0+0io 122pf+0w INSTALL/SOURCE time dd if=/dev/zero of=sparse1 bs=1 seek=200G count=1 ; time sync 1+0 Records ein 1+0 Records aus 0.000u 3.010s 0:15.27 19.7% 0+0k 0+0io 153pf+0w 0.000u 0.020s 0:01.01 1.9% 0+0k 0+0io 122pf+0w So you create a file in 15 seconds Right. and remove it in 15 seconds. No. Normally ~5 seconds. Ah, yes. I was looking at the wrong timing info ;) I see that yesterday without the patch you had 1m, 9s, 5s, 2m times for 4 deletes... Kind of nothing changed except mouse now moves, INSTALL/SOURCE time rm sparse ; time sync 0.000u 14.990s 1:31.15 16.4%0+0k 0+0io 130pf+0w 0.000u 0.030s 0:00.22 13.6% 0+0k 0+0io 122pf+0w So the stuff fell out of cache and we need to read it again. Shouldn't this take only 15 seconds, then? Probably there was some seeking due to removal of lots of blocks. Worst case was ~5 minutes. Yeah, this is of course sad. BTW, is this with the search_reada patch? What if you try without it? Bye, Oleg
Re: Horrible ftruncate performance
Hello! On Fri, Jul 11, 2003 at 05:27:25PM +0200, Dieter Nützel wrote: Actually I did it already, as data-logging patches can be applied to 2.4.22-pre3 (where this truncate patch was included). No -aaX. Right. Maybe it _IS_ time for this _AND_ all the other data-logging patches? 2.4.22-pre5? It's Chris's turn. I thought it would be a good idea to test in -ac first, though (even taking into account that these patches are part of SuSE's stock kernels). I don't think -ac would make it = No big Reiser involved... Would make what? I think Alan has already agreed to put the data-logging code in. Bye, Oleg
Re: Horrible ftruncate performance
Hello! On Fri, Jul 11, 2003 at 05:32:49PM +0200, Dieter Nützel wrote: OK some hand work... Where comes this from? It has been there for a long time. Not less than 2 years, I'd say. I don't find it my tree: The reiserfs quota patch got rid of it. Here's the relevant part of my diff:

     if (retval) {
-        reiserfs_free_block (th, allocated_block_nr);
+        reiserfs_free_block (th, inode, allocated_block_nr, 1);
         goto failure;
     }
-    if (done) {
-        inode->i_blocks += inode->i_sb->s_blocksize / 512;
-    } else {
+    if (!done) {
         /* We need to mark new file size in case this function will be
            interrupted/aborted later on. And we may do this only for
            holes. */

Bye, Oleg
Re: Horrible ftruncate performance
Hello! On Fri, Jul 11, 2003 at 05:34:12PM +0200, Marc-Christian Petersen wrote: Actually I did it already, as data-logging patches can be applied to 2.4.22-pre3 (where this truncate patch was included). Maybe it _IS_ time for this _AND_ all the other data-logging patches? 2.4.22-pre5? It's Chris's turn. I thought it would be a good idea to test in -ac first, though (even taking into account that these patches are part of SuSE's stock kernels). Well, I don't think that testing in -ac is necessary at all in this case. Maybe not. But it is still useful ;) I am using WOLK on many production machines with ReiserFS mostly as Fileserver (hundreds of gigabytes) and proxy caches. I am using this code on my production server myself ;) If someone would ask me: Go for 2.4 mainline inclusion w/o going via -ac! :) Chris should decide (and Marcelo should agree) (Actually Chris thought it is a good idea to submit data-logging to Marcelo now, too). I have no objections. Also, now that the quota v2 code is in place, even the quota code can be included. Also it would be great to port this stuff to 2.5 (yes, I know Chris wants this to be in 2.4 first) Bye, Oleg
Re: df shows 172GB reiserFS partition as 109GB partition
Hello! On Mon, Jul 07, 2003 at 07:49:22PM +0200, Yasuo Iwakura wrote: I have a IBM Deskstar 180GB Harddrive and I created a 172GB Reiser FS partition, using drakconf (mdk-9.1-ger). The Problem is, df show only 109GB, same with windows(samba) - Windows says 108GB. (btw, the Bios thinks 130GB but i think thats not important) df (coreutils) 4.5.7 Linux version 2.4.21-0.13mdk /dev/hda1 - cylinder 1-22526 - 180940063+ blocks - ID 83 Linux Hm, strange. Looks like the fs was created with a smaller size for some unknown reason. Try to issue this command (once) while the fs is mounted: mount /dev/hda1 -t reiserfs -o resize=45235015 Then see if df now reports the correct number. Bye, Oleg
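The resize value in the reply above appears to be the partition size converted from the 1 KiB blocks reported by the partitioning tool into filesystem blocks. The sketch below reproduces that arithmetic; it assumes a 4096-byte reiserfs block size (the usual default, not stated in the thread) and uses a hypothetical helper name, `resize_blocks`.

```python
def resize_blocks(partition_kib: int, fs_block_size: int = 4096) -> int:
    """Whole filesystem blocks that fit in a partition given in 1 KiB units."""
    return (partition_kib * 1024) // fs_block_size


if __name__ == "__main__":
    # 180940063 is the block count quoted for /dev/hda1 in the message above.
    print(resize_blocks(180940063))
```

With the quoted 180940063 blocks this gives 45235015, the number used in the suggested mount option; the division rounds down so the filesystem never extends past the end of the partition.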
Re: add_save_link:search_by_key
Hello! On Mon, Jul 07, 2003 at 07:51:01AM +0200, Trond Hagen wrote: Thanks, but the Red Hat kernel is a 2.4.20 why isn't the fix merged in ? Oops. I meant that the fix was merged into 2.4.21. Bye, Oleg On Sat, 2003-07-05 at 13:46, Oleg Drokin wrote: Hello! On Fri, Jul 04, 2003 at 11:32:16PM +0200, Trond Hagen wrote: why I'm getting a lot of these ? Jul 4 22:24:59 db-http1 kernel: vs-2100: add_save_link:search_by_key ([-1 15610004 0x1001 DIRECT]) returned 1 I'm running: 2.4.20-13.7smp (Red Hat 7.3 kernel) Sounds like you've been bitten by infamous iget4() race. The fix is merged into 2.4.20 The individual fix can be obtained from ftp://ftp.namesys.com/pub/reiserfs-for-2.4/2.4.20-pending/07-race-fix.diff Don't forget to fsck your reiserfs filesystems after applying the fix (use latest reiserfsprogs from our ftp site, not the old stuff from RH 7.3!) Bye, Oleg
Re: reiserfs on removable media
Hello! On Wed, Jul 02, 2003 at 02:23:13PM -0400, Zygo Blaxell wrote: - If the device is detached while a filesystem is mounted, reiserfs gets a whole lot of I/O errors (or worse) and immediately oopses. It would be nice if reiserfs would handle this a bit more gracefully--it should at least let me kill processes with open files and umount the filesystem. OTOH many other things also oops with the current USB/firewire/scsi device driver stack too. :-P Write errors to data areas are not mostly safe. It's write errors into the journal area that kill the thing. Jeff Mahoney of SuSE has a patch that remounts the FS R/O in case of such an event (I think he even posted some preliminary patches here); it is most probably what you need in this case. Bye, Oleg
Re: Journal-601 error on Redhat 7.3 / reiserfs / ext3 / raid 5
Hello! On Thu, Jul 03, 2003 at 01:14:08AM +0300, Jussi Vainionpää wrote: Apr 27 20:18:06 un kernel: journal-601, buffer write failed I do not know who to blame here. Try writing heavily to the loop device itself (without using reiserfs) to see if something breaks? Or better yet, upgrade to a newer kernel and see if that cures your problem? I tried the same operation using ext2 instead of reiserfs and at least that worked without any problems. ext2 does not wait on buffers unless you operate in sync mode, so it won't notice. Try ext2 with -o sync then? Bye, Oleg
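The difference between buffered and synchronous writes mentioned above can also be tried per file rather than per mount. The sketch below is an analogy to mounting with -o sync, not the same thing: it opens a file with O_SYNC so that each write returns only after the data has been handed to the device, which is where a failing write would be reported. The names `write_sync` and `demo` are assumptions for the illustration.

```python
import os
import tempfile


def write_sync(path, data):
    """Write data through a descriptor opened with O_SYNC; return bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        # With O_SYNC a device-level write failure surfaces here as OSError
        # instead of being discovered later when buffers are flushed.
        return os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    """Write a few bytes synchronously to a scratch file and clean up."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return write_sync(path, b"hello")
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo(), "bytes written synchronously")
```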
data-logging for 2.4.22-pre3
Hello! Yes, I know that 2.4.22-pre3 is not out yet, but Marcelo has accepted our somewhat big patches and so you can get replacement patches from ftp://namesys.com/pub/reiserfs-for-2.4/testing/data-logging-and-quota-2.4.22-pre3 once 2.4.22-pre3 is out ;) Also, starting from 2.4.22-pre3 you no longer need to apply the 03-relocation-8.diff.gz patch. Bye, Oleg
Re: vpf-10680, minor corruptions
Hello! On Fri, Jun 27, 2003 at 04:38:00PM +0400, Oleg Drokin wrote: I was looking in the wrong direction, when I produced that patch, so it will produce zero output. I hope to come up with ultimate fix soon enough. ;) Well, there is a patch below that does *not* work for me ;) But it should work. I have traced the new problem to a cross compiler that compiles code in a different way than the native compiler for whatever reason (a demo is attached as the test.c program; it should print "result is 1" if it is compiled correctly and stuff about unknown uniqueness if it is miscompiled. In fact, maybe this is just correct compiler behaviour). I now think that when I compile a kernel with the native compiler, it should work with the patch below. But it seems I can verify that only tomorrow. You might try that patch as well to see if it helps you before I try it ;) The patch is obviously a correct one (except that it does not work with my cross compiler, and the kernel does work without the patch, which is really-really strange).

= fs/reiserfs/bitmap.c 1.26 vs edited =
--- 1.26/fs/reiserfs/bitmap.c Sun May 18 01:09:36 2003
+++ edited/fs/reiserfs/bitmap.c Fri Jun 27 16:58:44 2003
@@ -43,7 +43,7 @@
     test_bit(_ALLOC_ ## optname , &SB_ALLOC_OPTS(s))
 
 static inline void get_bit_address (struct super_block * s,
-                                    unsigned long block, int * bmap_nr, int * offset)
+                                    b_blocknr_t block, int * bmap_nr, int * offset)
 {
     /* It is in the bitmap block number equal to the block
      * number divided by the number of bits in a block. */
@@ -54,7 +54,7 @@
 }
 
 #ifdef CONFIG_REISERFS_CHECK
-int is_reusable (struct super_block * s, unsigned long block, int bit_value)
+int is_reusable (struct super_block * s, b_blocknr_t block, int bit_value)
 {
     int i, j;
 
@@ -107,7 +107,7 @@
 static inline int is_block_in_journal
     (struct super_block * s, int bmap, int off, int *next)
 {
-    unsigned long tmp;
+    b_blocknr_t tmp;
 
     if (reiserfs_in_journal (s, bmap, off, 1, &tmp)) {
         if (tmp) { /* hint supplied */
@@ -235,7 +235,7 @@
 /* Tries to find contiguous zero bit window (given size) in given region of
  * bitmap and place new blocks there. Returns number of allocated blocks. */
 static int scan_bitmap (struct reiserfs_transaction_handle *th,
-                        unsigned long *start, unsigned long finish,
+                        b_blocknr_t *start, b_blocknr_t finish,
                         int min, int max, int unfm, unsigned long file_block)
 {
     int nr_allocated=0;
@@ -281,7 +281,7 @@
 }
 
 static void _reiserfs_free_block (struct reiserfs_transaction_handle *th,
-                                  unsigned long block)
+                                  b_blocknr_t block)
 {
     struct super_block * s = th->t_super;
     struct reiserfs_super_block * rs;
@@ -327,7 +327,7 @@
 }
 
 void reiserfs_free_block (struct reiserfs_transaction_handle *th,
-                          unsigned long block)
+                          b_blocknr_t block)
 {
     struct super_block * s = th->t_super;
 
@@ -340,7 +340,7 @@
 /* preallocated blocks don't need to be run through journal_mark_freed */
 void reiserfs_free_prealloc_block (struct reiserfs_transaction_handle *th,
-                                   unsigned long block) {
+                                   b_blocknr_t block) {
     RFALSE(!th->t_super, "vs-4060: trying to free block on nonexistent device");
     RFALSE(is_reusable (th->t_super, block, 1) == 0, "vs-4070: can not free such block");
     _reiserfs_free_block(th, block) ;
@@ -589,15 +589,15 @@
 static inline int old_hashed_relocation (reiserfs_blocknr_hint_t * hint)
 {
-    unsigned long border;
-    unsigned long hash_in;
+    b_blocknr_t border;
+    u32 long hash_in;
 
     if (hint->formatted_node || hint->inode == NULL) {
         return 0;
     }
 
     hash_in = le32_to_cpu((INODE_PKEY(hint->inode))->k_dir_id);
-    border = hint->beg + (unsigned long) keyed_hash(((char *) (&hash_in)), 4) % (hint->end - hint->beg - 1);
+    border = hint->beg + (u32) keyed_hash(((char *) (&hash_in)), 4) % (hint->end - hint->beg - 1);
     if (border > hint->search_start)
         hint->search_start = border;
 
@@ -606,7 +606,7 @@
 static inline int old_way (reiserfs_blocknr_hint_t * hint)
 {
-    unsigned long border;
+    b_blocknr_t border;
 
     if (hint->formatted_node || hint->inode == NULL) {
         return 0;
@@ -622,7 +622,7 @@
 static inline void hundredth_slices (reiserfs_blocknr_hint_t * hint)
 {
     struct key * key = hint->key;
-    unsigned long slice_start;
+    b_blocknr_t slice_start;
 
     slice_start = (keyed_hash((char*)(&key->k_dir_id),4) % 100) * (hint->end / 100);
     if ( slice_start > hint->search_start || slice_start + (hint->end / 100) <= hint->search_start) {
@@ -910,7 +910,7 @@
 int reiserfs_can_fit_pages ( struct super_block *sb /* superblock of filesystem to estimate space
Re: Write-once file system
Hello! On Fri, Jun 27, 2003 at 09:07:05AM -0700, Fong Vang wrote: Once the write to the file is CLOSED the file should not be modifiable in any way. It should not be writeable by root. Ideally, this should be across reboot and across kernel. The current requirement is that as long as the modified kernel/reiserfs is being used then it should NOT be modifiable (if a kernel allowing modification is used then it could allow modifications). So basically, do you think it would be better for you to have a write-once flag in the superblock that makes all files unwritable (except newly created ones), as opposed to a simple mount option that you'd use for filesystems with non-changeable files? (You need to mark filesystems that are in write-once mode somehow, because I think you do not need all reiserfs filesystems to run in this mode, right?) Also, concerning the requirement that root should not be able to change the files: root will still be able to overwrite files through the block device if he wants to. Bye, Oleg
Re: vpf-10680, minor corruptions
Hello! On Fri, Jun 27, 2003 at 12:23:07PM -0400, Chris Mason wrote: Most of these changes are in 2.4.21, which I've been using on an AMD64 bit box for a while without any problems. Not the reiserfs_file_write() ones. The bug should be somewhere else, it looks to me like these spots aren't trying to send an unsigned long to disk. The reiserfs_file_write() code has an array of b_blocknr_t elements. It then submits this array to reiserfs_paste_into_item/reiserfs_insert_item, but b_blocknr_t is unsigned long (read: 64 bit on alpha - oops). The funny thing is that when I declare b_blocknr_t as u32, the kernel basically falls apart if cross compiled. E.g. key comparison does not work and all kinds of weird things start to happen. In short - if you want to make sure the bug is there - compile 2.5.70+ code on any 64 bit platform, write any file bigger than 2 blocks, unmount and remount the fs and see what's in the file. Bye, Oleg
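To make the failure mode described above concrete, here is a small Python sketch (not kernel code) of what happens when an array of 64-bit block numbers is copied into a place where 32-bit entries are expected. It assumes a little-endian machine (as alpha is under Linux) and assumes that only "number of entries times 4 bytes" is copied; the function name and the block numbers are made up for illustration.

```python
import struct

def as_disk_pointers(block_numbers, in_memory_fmt="Q"):
    """Pack block numbers as the in-memory array would hold them
    ("I" = 32-bit unsigned long, "Q" = 64-bit unsigned long), then
    reinterpret the same bytes as 32-bit on-disk block pointers."""
    n = len(block_numbers)
    raw = struct.pack("<%d%s" % (n, in_memory_fmt), *block_numbers)
    as_u32 = struct.unpack("<%dI" % (len(raw) // 4), raw)
    # Assumption: only n 32-bit entries' worth of bytes end up in the item.
    return as_u32[:n]

blocks = [8211, 8212, 8213]             # hypothetical block numbers
print(as_disk_pointers(blocks, "I"))    # 32-bit platform: pointers survive intact
print(as_disk_pointers(blocks, "Q"))    # 64-bit platform: zeros get interleaved
```

In this model the second pointer of the file becomes zero and the later ones are shifted, which is consistent with the advice to test with a file bigger than two blocks.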
Re: Write-once file system
Hello! On Fri, Jun 27, 2003 at 10:07:05AM -0700, Fong Vang wrote: this doesn't seem to work on kernel 2.4.20. I did a chattr +i on file but rm -rf (as root) on the file deletes it. You need to mount with the -o attrs mount option for extended attributes to work. Bye, Oleg
Re: Write-once file system
Hello! On Fri, Jun 27, 2003 at 11:27:22AM -0600, 'Andreas Dilger' wrote: this doesn't seem to work on kernel 2.4.20. I did a chattr +i on file but rm -rf (as root) on the file deletes it. That is a reiserfs bug then... I just tested it with ext3 and it worked as expected. No, it is a documented reiserfs feature. You must enable extended attributes support at mount time. Bye, Oleg
Re: Quota
Hello! On Wed, Jun 25, 2003 at 06:22:00PM +0800, SteelRat wrote: Can you help me. How can i set quotas to reiserfs? Patches for recent kernels (2.4.21+) are available at ftp://ftp.suse.com/pub/people/mason/patches/data-logging/2.4.21 Patches for 2.4.20 are available at ftp://namesys.com/pub/reiserfs-for-2.4/testing/quota-2.4.20 Apply the patches, recompile your kernel with quota support, upgrade your quota tools if needed and follow directions from Quota-HOWTO. Bye, Oleg
Re: vpf-10680, minor corruptions
Hello! On Mon, Jun 23, 2003 at 03:38:20PM +0200, Christian Kujau wrote: as stated before, the corruptions occur only on this very alpha machine, Well, I still cannot build the kernel myself and still working on it. (having make: *** [vmlinux] Error 139 and zero length vmlinux) BTW, I realised that I have not looked into your kernel config for that box, can you send it to me please? bread: Cannot read the block (523914): (Input/output error). Hm, but still it means kernel returned some error for read request. hah! i was not aware that the disk might have an hw problem, not a single error ever showed up in my logs. this was weird. so i re-partitioned the disk with a 10MB sde (to circumvent the bread error) on the beginning and a 2 GB sde2. now reiserfsck/cp/diff are all working fine under 2.4.21, but 2.5.72 is still erroneous. Sigh. btw: i am still using reiserfsprogs 3.6.8 now (since debian/testing has 3.6.6) and i have compiled these utils under a 2.5.72 kernel. is it safe to use them under 2.4 ? I see that you have used 2.5.70 and earlier kernels on alpha too. Do you have any idea of when stuff broke for you? Bye, Oleg
Re: vpf-10680, minor corruptions
Hello! On Wed, Jun 25, 2003 at 02:42:24AM +0200, Christian Kujau wrote: (/lib/modules/2.5.65/kernel/fs/reiserfs/reiserfs.ko): Invalid module format lila:~# uname -a Linux lila 2.5.65 #4 Wed Jun 25 00:48:46 CEST 2003 alpha GNU/Linux i compiled the module with CONFIG_REISERFS_CHECK=y. shall i go on with 2.5.64 or better 2.5.67 ? Try to compile with CONFIG_REISERFS_CHECK=y a kernel that is known-bad for you (e.g. 2.5.72/2.5.73). Bye, Oleg
Re: 2.4.21 reiserfs oops
Hello! On Mon, Jun 23, 2003 at 11:16:27PM +0100, Nix wrote: Jun 22 13:52:42 loki kernel: Unable to handle kernel NULL pointer dereference at virtual address 0001 This is a very strange address to oops on. I'll say! Looks almost like it JMPed to a null pointer or something. No, if it had jumped to a NULL pointer, we'd see 0 in EIP. Jun 22 13:52:43 loki kernel: EIP:0010:[c0092df4]Not tainted And the EIP is prior to kernel start, which is also very strange. On the other hand, the address c0192df4 is somewhere inside reiserfs code, so it looks like a single bit error, I'd say. I think it unlikely to be RAM problems given that the problem happened shortly after upgrading to 2.4.21; this was about half a day after I rebooted it because it threw a pile of never-seen-again, un-syslogged SCSI abort errors at me (sym53c875); and *that* was a few minutes after I rebooted into 2.4.21 for the first time. Hm, so first there were some scsi problems and then the reiserfs oops? Actually, if the RAM is good, I see no good reason for this to happen (and I see no good reason for valid code before _text, either). I wonder if 2.4.21 constantly crashes like that for you, then? Bye, Oleg
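The "single bit error" remark can be checked by XOR-ing the reported EIP with the nearby address inside reiserfs code. A minimal Python sketch; the helper name is made up, and the two addresses are the ones quoted in the message above.

```python
def differing_bits(a, b):
    """Return the positions (0 = least significant) of the bits where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]

reported_eip = 0xC0092DF4    # EIP from the oops
reiserfs_addr = 0xC0192DF4   # address inside reiserfs code
print(differing_bits(reported_eip, reiserfs_addr))
```

The two addresses differ only in bit 20, so one flipped bit is enough to turn one into the other.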
Re: illegal seek - unable to umount device
Hello! On Mon, Jun 23, 2003 at 07:44:57PM +0200, Gyimesi Akos wrote: I have just encountered a problem (?) with reiserfs on Linux 2.4.20. I wrote the following (shortened) python code which produced an unkillable process:

#!/usr/bin/python
file=open("terabyte_length_file", "w")
file.seek(1024*1024*1024*1024)
file.write("%c" % 0)
file.close()

On ext2, this code produced a file which had the (nominal) length of 1 terabyte and the real size of 16k. I ran this script on a reiserfs partition as a normal user, and it 1. produced an unkillable process with 100% CPU usage 2. i was unable to kill it anyway, and i was unable to unmount the filesystem. (Illegal seek, Device or resource busy). So finally i rebooted the system. Of course, the rebooting process also got stuck as init couldn't umount the device either. Naturally, i was not really surprised that this code didn't work on reiserfs, but i was quite astonished by the fact that this dirty program could practically kill the machine as a normal user - in a meaning that running it several times consumes its CPU resources and makes it impossible to unmount the filesystem. This is a known problem and we hope to push the fix to Marcelo soon. In short, the process is not stuck, it is busy creating the file hole (4k at a time). If you wait long enough, it will eventually finish. Thank you for the report. Bye, Oleg
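For anyone who wants to see the nominal-versus-real size difference the reporter describes without tying up a machine, here is a scaled-down Python sketch. It uses a 16 MiB hole instead of a terabyte and a temporary directory; the function name is made up, `st_blocks` is only available on POSIX systems, and how much space is really allocated depends on the filesystem.

```python
import os
import tempfile

def make_sparse_file(path, hole_size):
    """Seek past hole_size bytes, write one byte, and report
    (nominal size, bytes actually allocated) for the resulting file."""
    with open(path, "wb") as f:
        f.seek(hole_size)
        f.write(b"\0")
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as tmp:
    size, allocated = make_sparse_file(os.path.join(tmp, "sparse_file"), 16 * 1024 * 1024)
    print("nominal size:", size, "bytes; allocated:", allocated, "bytes")
```

On a filesystem that supports holes cheaply, the allocated figure stays far below the nominal size; the complaint in this thread is about how long creating the hole takes, not about the resulting file.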
Re: Problem mounting ReiserFS as root partition
Hello! On Sun, Jun 15, 2003 at 03:08:39PM +0200, Till Gerken wrote: I am not able to mount my ReiserFS partition as root partition with any kernel later than 2.4.20-pre11. When trying, I get read_super_block: can't find a reiserfs filesystem on (dev 03:00, block 64, size 1024) Are you sure you want to mount /dev/hda as opposed to some partition on /dev/hda? Please check that you are mounting correct thing (root= option in lilo.conf for example). Bye, Oleg
Re: Will Reisefs have undo?
Hello! On Sun, Jun 15, 2003 at 02:29:26PM -0500, Alex Malinovich wrote: I don't think snapshots are really needed. I would be perfectly content with a semi-intelligent filesystem that would mark files as deleted in the journal while leaving the file intact on the HD. As soon as the file has actually been overwritten, it is marked purged and cannot be recovered. But up to that point, it would be a relatively simple task to just tell the journal to mark the file as not-deleted again. Hm. You seem to confuse journal with log of operations. The journal just holds copies of the blocks we are going to overwrite to achieve atomic block overwrite. So we cannot mark something deleted in journal. Bye, Oleg
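The distinction drawn above (a journal of block copies versus a log of operations) can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: one transaction at a time, a dict standing in for the disk, and class and method names that are made up. It is not the reiserfs journal format.

```python
class ToyJournal:
    """Toy block journal: it records new contents for block numbers,
    never operations such as 'file X was deleted'."""

    def __init__(self, disk):
        self.disk = disk        # dict: block number -> bytes
        self.log = {}           # block number -> copy of the block to be written
        self.committed = False

    def log_block(self, blocknr, contents):
        self.log[blocknr] = contents

    def commit(self):
        self.committed = True

    def replay(self):
        """After a crash: apply a committed transaction in full,
        drop an uncommitted one, so blocks change atomically."""
        if self.committed:
            self.disk.update(self.log)
        self.log = {}
        self.committed = False

disk = {5: b"old"}
journal = ToyJournal(disk)
journal.log_block(5, b"new")
journal.replay()                # crash before commit: block 5 is untouched
print(disk[5])
journal.log_block(5, b"new")
journal.commit()
journal.replay()                # crash after commit: block 5 is fully updated
print(disk[5])
```

Because the log only maps block numbers to contents, there is nothing in it that could be flagged as "deleted" or "not deleted" per file.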
Re: Problem mounting ReiserFS as root partition
Hello! On Mon, Jun 16, 2003 at 11:16:52AM +0200, Till Gerken wrote: kernel (hd0,0)/boot/vmlinuz-2.4.21 root=/dev/hda1,rw hdc=ide-scsi pci=biosirq idebus=66 I suggest you replace the comma between root=/dev/hda1 and rw with a space. Bye, Oleg
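Why the comma matters can be shown with a simplified model of command-line splitting, written in Python. This is an illustration only: it assumes parameters are separated by whitespace and that everything after "=" belongs to that one parameter; the real kernel parser has more rules, and the function name is made up.

```python
def parse_cmdline(cmdline):
    """Split a boot command line on whitespace into name -> value pairs.
    Parameters without '=' are stored as True."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else True
    return params

broken = parse_cmdline("root=/dev/hda1,rw hdc=ide-scsi pci=biosirq idebus=66")
fixed = parse_cmdline("root=/dev/hda1 rw hdc=ide-scsi pci=biosirq idebus=66")
print(broken["root"], "rw" in broken)
print(fixed["root"], "rw" in fixed)
```

With the comma, the root device name in this model becomes "/dev/hda1,rw" and no separate rw parameter exists; with the space, root is "/dev/hda1" and rw is its own parameter.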
Re: Identifying files with badblocks
Hello! On Sun, Jun 08, 2003 at 11:16:05PM +0200, Felix E. Klee wrote: I am using a ReiserFS 3.6 formatted IBM-DJSA-220-ATA-harddisk with SuSE LINUX 8.2. Today, by using badblocks -s /dev/hda under LINUX and IBM/Hitachi's Drive Fitness under DOS, I found that the drive contains a continuous section of bad blocks. The Drive Fitness Utility has an option to repair the corresponding sector but this will destroy all data in it. This is OK, but I need to know what data is destroyed so that I can recreate it later. So, now my question: How do I find out which files correspond to certain bad blocks on my Reiser file system? Well, you can do debugreiserfs -d /dev/your_device > somefile, then look up the block number there as a text string. This will give you the file's key. Then look up the direntry by this key. Bye, Oleg
Re: Error In Dmesg
Hello! On Fri, May 30, 2003 at 08:56:32AM -0400, Bill Rees wrote: My application is running with the Sun jdk 1.4.1_02 under Red Hat 9.0 and I've received this error in dmesg: Do you have any way to reproduce? Unable to handle kernel NULL pointer dereference at virtual address 0018 printing eip: e092b263 *pde = Oops: CPU:0 EIP:0060:[e092b263]Not tainted EFLAGS: 00010282 EIP is at do_journal_end [reiserfs] 0x3b3 (2.4.20-8smp) So it died here:

/* for each real block, add it to the journal list hash,
** copy into real block index array in the commit or desc block */
for (i = 0, cn = SB_JOURNAL(p_s_sb)->j_first ; cn ; cn = cn->next, i++) {
    if (test_bit(BH_JDirty, &cn->bh->b_state) ) {

(in test_bit) because cn->bh is zero. Hm. Chris, do you have any ideas how that might have happened? Bye, Oleg
Re: About Reiser4 release date ...
Hello! On Thu, May 29, 2003 at 12:09:31AM +0200, Fred -- Speed Up -- wrote: How about the BitKeeper repository : does it contain the latest 2.5 kernel sources along with your Reiser4 development patches ? Yes, our reiser4-linux-2.5 bk repository contains the latest 2.5 kernel sources + patches to make the UML arch functional + the necessary reiser4 changes. And our reiser4 bk repository contains the current reiser4 code. Bye, Oleg
Re: disk or reiserfs problem?
Hello! On Wed, May 28, 2003 at 01:07:27PM -0700, Jeff Breidenbach wrote: This is after a hard (power switch) reboot (due to I/O errors). The disk in question has about 125 GB of data on a single 200GB reiserfs partition. Do people think the disk is toast, or is this possibly some correctable filesystem problem? The machine is remote, so I can't hdb1: bad access: block=35, count=5 end_request: I/O error, dev 03:41 (hdb), sector 35 Looks like the disk has gone bad. If you are lucky enough, some of the data can still be recovered. Try to copy the entire disk into a file or to another disk to see how many bad sectors there are. Bye, Oleg
Re: [PAID PRIORITY SUPPORT REQUEST] Quotas not fully working in 2.4.21-pre5
Hello! On Sat, Apr 05, 2003 at 01:11:18PM +0200, Philippe Gramoullé wrote: Object: Can't set quotas with 2.4.21-pre5 After we upgraded from 2.4.19-pre6 + quota patches , 2.4.21-pre5 + data-logging and quota patches, we can't set quotas anymore. Have you enabled all the compatibility stuff in kernel config? (show the relevant part of .config please). Well, i think so yes. You'll find attached the .config. # CONFIG_QIFACE_COMPAT is not set suggests that you do not have the compatibility stuff enabled. Please enable it and see if it cures the problem? At least that's how I forced my quota tools to work. Well, mine are 3.03 and yours are 3.07, and the help text says that this option is only required for quota tools = 3.04. But I still think you should try it. Bye, Oleg
fixes/changes to mount options parser
Hello! Ok, so after some silence on this front, here are the 2.4 and 2.5 versions of the mount options parser fixes I propose. These fixes consist of:
- When you pass some mount options at mount time, default mount options are not reset if what you pass does not change the defaults. (both in 2.4 and 2.5)
- If you are doing a remount and the parser detects an error, the remount fails. (2.4 only)
- If you pass more than one jdev= option, the parser spits out an error. (2.5 only, as 2.4 does not have this yet)
- Better support for remount options (2.5 did not have the ability to propagate mount options at all), by Jeff Mahoney.
What this patch does not do, but was supposed to do: Hans decided that conflicting mount options on one line (like tails=off,tails=small) should produce an error on mount/remount. After I implemented this, it turned out this does not work with remounting. If you mount with -o tails=off, and then later do a remount with -o tails=small, then the filesystem will be passed an options string like tails=off,tails=small. This seems to be a feature of mount(8). So the only option that cannot appear on the command line twice is jdev (resize can be met more than once if you enlarged the drive twice in a row). So I reverted to the old way: the last option takes effect. As I do not want to split the code to determine whether this is a mount or a remount, this behaviour will take place in case of both mount and remount. Attached are three patches:
- 2.4.20_parsefix.diff is the patch for 2.4.20
- 2.4_parsefix.diff is the patch against the latest Marcelo's bk tree. (2.4)
- 2.5_parsefix.diff is the patch against the latest Linus' bk tree. (2.5)
This code was only tested by me, and I want to hear any opinions on ways to improve it before I pass it to our tester and start trying to submit it upstream. So if you want to try the code, treat it as experimental. (BTW, I wonder how often people actually pass any mount options at all and how often a remount is made with any additional options?) Bye, Oleg

--- fs/reiserfs/super.c.1 Fri Apr 4 18:39:16 2003
+++ fs/reiserfs/super.c Fri Apr 4 19:31:39 2003
@@ -402,8 +402,11 @@
    mount options that have values rather than being toggles. */
 typedef struct {
     char * value;
-    int bitmask; /* bit which is to be set in mount_options bitmask when this
-                    value is found, 0 is no bits are to be set */
+    int setmask; /* bitmask which is to set on mount_options bitmask when this
+                    value is found, 0 is no bits are to be changed. */
+    int clrmask; /* bitmask which is to clear on mount_options bitmask when this
+                    value is found, 0 is no bits are to be changed. This is
+                    applied BEFORE setmask */
 } arg_desc_t;
 
 
@@ -413,37 +416,42 @@
     char * option_name;
     int arg_required; /* 0 is argument is not required, not 0 otherwise */
     const arg_desc_t * values; /* list of values accepted by an option */
-    int bitmask; /* bit which is to be set in mount_options bitmask when this
-                    option is selected, 0 is not bits are to be set */
+    int setmask; /* bitmask which is to set on mount_options bitmask when this
+                    value is found, 0 is no bits are to be changed. */
+    int clrmask; /* bitmask which is to clear on mount_options bitmask when this
+                    value is found, 0 is no bits are to be changed. This is
+                    applied BEFORE setmask */
 } opt_desc_t;
 
 /* possible values for -o hash= and bits which are to be set in s_mount_opt of
    reiserfs specific part of in-core super block */
 const arg_desc_t hash[] = {
-    {"rupasov", FORCE_RUPASOV_HASH},
-    {"tea", FORCE_TEA_HASH},
-    {"r5", FORCE_R5_HASH},
-    {"detect", FORCE_HASH_DETECT},
-    {NULL, 0}
+    {"rupasov", 1<<FORCE_RUPASOV_HASH,(1<<FORCE_TEA_HASH)|(1<<FORCE_R5_HASH)},
+    {"tea", 1<<FORCE_TEA_HASH,(1<<FORCE_RUPASOV_HASH)|(1<<FORCE_R5_HASH)},
+    {"r5", 1<<FORCE_R5_HASH,(1<<FORCE_RUPASOV_HASH)|(1<<FORCE_TEA_HASH)},
+    {"detect", 1<<FORCE_HASH_DETECT, (1<<FORCE_RUPASOV_HASH)|(1<<FORCE_TEA_HASH)|(1<<FORCE_R5_HASH)},
+    {NULL, 0, 0}
 };
 
 /* possible values for -o block-allocator= and bits which are to be set in
    s_mount_opt of reiserfs specific part of in-core super block */
 const arg_desc_t balloc[] = {
-    {"noborder", REISERFS_NO_BORDER},
-    {"no_unhashed_relocation", REISERFS_NO_UNHASHED_RELOCATION},
-    {"hashed_relocation", REISERFS_HASHED_RELOCATION},
-    {"test4", REISERFS_TEST4},
-    {NULL, 0}
+    {"noborder", 1<<REISERFS_NO_BORDER, 0},
+    {"border", 0, 1<<REISERFS_NO_BORDER},
+    {"no_unhashed_relocation", 1<<REISERFS_NO_UNHASHED_RELOCATION, 0},
+    {"hashed_relocation", 1<<REISERFS_HASHED_RELOCATION, 0},
+    {"test4", 1<<REISERFS_TEST4, 0},
+    {"notest4", 0, 1<<REISERFS_TEST4},
+    {NULL, 0, 0}
 };
 
 const arg_desc_t tails[] = {
-    {"on", REISERFS_LARGETAIL},
-    {"off", -1},
-    {"small", REISERFS_SMALLTAIL},
-    {NULL, 0}
+    {"on",
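The setmask/clrmask idea from the message above (clear first, then set, with the last option on the line winning) can be sketched in Python. The bit numbers below are hypothetical, only the hash= table is modelled, and the function name is made up; the table entries mirror the shape of the hash[] array in the patch.

```python
# Hypothetical bit numbers; the real values live in the kernel headers.
FORCE_TEA_HASH, FORCE_RUPASOV_HASH, FORCE_R5_HASH, FORCE_HASH_DETECT = range(4)

def bit(n):
    return 1 << n

# value -> (setmask, clrmask), following the hash[] table in the patch
HASH_VALUES = {
    "rupasov": (bit(FORCE_RUPASOV_HASH), bit(FORCE_TEA_HASH) | bit(FORCE_R5_HASH)),
    "tea": (bit(FORCE_TEA_HASH), bit(FORCE_RUPASOV_HASH) | bit(FORCE_R5_HASH)),
    "r5": (bit(FORCE_R5_HASH), bit(FORCE_RUPASOV_HASH) | bit(FORCE_TEA_HASH)),
    "detect": (bit(FORCE_HASH_DETECT),
               bit(FORCE_RUPASOV_HASH) | bit(FORCE_TEA_HASH) | bit(FORCE_R5_HASH)),
}

def apply_hash_options(mount_opts, option_string):
    """Apply 'hash=...' options left to right: clrmask first, then setmask."""
    for item in option_string.split(","):
        name, _, value = item.partition("=")
        if name != "hash" or value not in HASH_VALUES:
            raise ValueError("unknown option: %r" % item)
        setmask, clrmask = HASH_VALUES[value]
        mount_opts = (mount_opts & ~clrmask) | setmask
    return mount_opts

# A remount that arrives as "hash=tea,hash=r5" ends up with only r5 forced.
print(apply_hash_options(0, "hash=tea,hash=r5") == bit(FORCE_R5_HASH))
```

Because each value clears the bits of its competitors before setting its own, repeated options do not need to be rejected: the result is simply the state implied by the last one.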
Re: [PAID PRIORITY SUPPORT REQUEST] Quotas not fully working in 2.4.21-pre5
Hello! On Sat, Apr 05, 2003 at 02:45:29AM +0200, Philippe Gramoullé wrote: Object: Can't set quotas with 2.4.21-pre5 After we upgraded from 2.4.19-pre6 + quota patches , 2.4.21-pre5 + data-logging and quota patches, we can't set quotas anymore. Have you enabled all the compatibility stuff in kernel config? (show the relevant part of .config please). Everything worked for me last time I tried (at the time of creating those patches). I will do more tests now and see what happens. What quotatools version do you have? here is an extract from 2 straces: quotactl(Q_SETQLIM|GRPQUOTA, /dev/sdb1, 502242, {2048, 0, 0, 0, 0, 0, 0, 0}) = -1 EINVAL (Invalid argument) quotactl(Q_SETQLIM|GRPQUOTA, /dev/sdb1, 32066, {2048, 0, 0, 0, 0, 0, 0, 0}) = -1 EINVAL (Invalid argument) This problem is really nasty as this filer is used for the paying services and now clients can use as much space as they want. So, you mean that even old quota limits that were set previously are not enforced? Bye, Oleg
Re: possible bug - fsck shows perfect results, linux refuses to mount
Hello! On Sun, Mar 30, 2003 at 11:22:50PM -0800, Robin H. Johnson wrote: The system refused to mount it originally, so I ran just plain --fix-fixable. It showed nothing wrong at all. By a fluke of terminals, I have a copy of this first output [http://www.orbis-terrarum.net/~robbat2/reiserfs/hdb1.first]. However the system still refused to mount the drive, showing this in syslog: Mar 30 22:14:22 [kernel] read_super_block: can't find a reiserfs filesystem on (dev 03:41, block 64, size 1024) Mar 30 22:14:22 [kernel] read_super_block: can't find a reiserfs filesystem on (dev 03:41, block 8, size 1024) This is indeed strange. The drive still refused to mount. I dug in fsck.reiserfs --help, and saw '--scan-whole-partition'. Tried --rebuild-tree with that on. It showed a LOT of stuff about StatDatas, and completed successfully. Well, this is likely to destroy data, but still it should be mountable at the point of completion. Can you please make a metadata dump for us? debugreiserfs -p /dev/hdb1 | bzip2 -9c > metadata.bz2 and make this file available for us to download. I just find that there is something definitely wrong if fsck says the partition is fine, but Linux refuses to mount it. Either this is a bug in Linux, or the reiserfsprogs. Either way, somebody has a bug :-) Sure, and we are interested in resolving the problem. Thank you. Bye, Oleg
Re: filesystem corruption ?
Hello! On Fri, Mar 21, 2003 at 02:01:38PM +0100, Bernd Schubert wrote: So, the beam of X-rays run through the memory module corrupting some bits? ;) This stuff should not have been written to disk, so probably plain reboot should fix everything? Can you test that? indeed after rebooting everything is fine again. So on-disk corruption is out of the question. We will run another memtest86 during the weekend, though I really don't believe we will find a problem. Ask those physics guys to run some X-ray experiments while you are running memtest86 ;) Though this machine will be replaced by a real server in a few months, I'm still rather worried about what happened. Even if it's 'only' a hardware memory problem this means lots of trouble for us -- on the one hand it seems not to be memtest86 detectable and on the other hand our programs really do need working memory, but of course this is not of your concern. Well, it may be undetectable because no high-energy beams are running around at the time of the test. I learned in school that if you put a big amount of lead between some area and a source of radiation, chances are that the radiation reaching the protected area will be of much lesser strength. In fact you might go to those guys and ask them what material (and how much of it) is best suited to shield against the stuff they generate. Bye, Oleg
Re: badblocks blocksize setting
Hello! On Sat, Mar 15, 2003 at 01:09:19AM +0100, Marius Reiner wrote: not really a reiserfs issue: When following the FAQ everything is fine, I set blocksize to 4096 as debugreiserfs told me, and don't get any bad blocks. Good. Nevertheless I'm a bit concerned about the ones badblocks reports when being invoked without the -b option. How does this happen? Can you please show the command line and the resulting output of badblocks? Also, what is in the dmesg output after such a run? Or should I just ignore them? There is not enough info yet. Can you please tell us the whole story? Bye, Oleg
Re: Getopt improvements
Hello! On Fri, Feb 28, 2003 at 03:04:26PM +0300, Hans Reiser wrote: For the simple cases (which also happen to be all we have right now), yes, I think that my implementation is cleaner. It allows the simple use of mutually exclusive options, through the no prefix, and clearing of the other bits in a multivalue option. For now, that's all we need - and it's a valid argument for using my code. However, what I like about Oleg's implementation is that if you have an option that excludes other options (even when it's not multivalue), it can clear those bits as well. It clears them without failing, yes? Not sure I like that. Hm, why should it fail? Incompatible options should fail as they represent error. Feel free to argue with that. Hm, I am not going to argue. But we never had this kind of logic. Usually the latest-specified option was taking effect. Bye, Oleg
Re: reiserfsprogs 3.6.5-pre2 release.
Hello! On Wed, Feb 26, 2003 at 01:54:48AM +0100, Philippe Gramoullé wrote:

# time reiserfsck -a /dev/sdb1
Reiserfs super block in block 16 on 0x811 of format 3.6 with standard journal
Blocks (total/free): 143109020/59148009 by 4096 bytes
Filesystem is cleanly umounted
Replaying journal.. 0 transactions replayed
Checking internal tree..finished
real 0m47.890s
user 0m6.668s
sys 0m0.732s

Thanks for trying. 48 seconds is much longer than we expected such a test to take. Was the system loaded at the time of the test? Bye, Oleg
Re: Indicating filtered spam?
Hello! On Sat, Feb 22, 2003 at 02:12:29PM +0100, Szabolcs Szasz wrote: Wouldn't it be better to put (back? was it there? I can't recall) to the Subject header an indication for filtered spam? The fact that now there is SpamAssassin at work, actually changes the behavior of my organic brain-embedded spam filter so that I now find myself opening mails I had been deleting before. Seems our filter that directs spam to /dev/null has broken again. I'll see what can be done with it. Bye, Oleg
Re: [PATCH] new data logging and quota patches available
Hello! On Fri, Feb 21, 2003 at 06:32:11PM -0500, Chris Mason wrote: ftp.suse.com/pub/people/mason/patches/data-logging/2.4.21 will soon be updated with a new set of data logging and quota patches against 2.4.21-pre4 The data logging code is updated with another set of io stalling fixes, they should improve performance of data=ordered and data=writeback by being smarter about forcing commits under heavy write load and kicking kreiserfsd. Treat these with care, they've gotten a ton of testing under the suse kernel, but the port to vanilla was just done today. The quota patches include a fix for incorrect sd_block counts on symlinks. A replacement 05-data-logging-36.diff.gz file that applies to 2.4.21-pre4-ac5 is available at ftp://namesys.com/pub/reiserfs-for-2.4/testing/05-data-logging-36-ac5.diff.gz It compiles, boots, and survives my (simple) testing (I am writing this email from a patched 2.4.21-pre4-ac5, too). Quota works. Symlinks now have the correct blocks count too. The reason for the rejects is mostly the DIRECTIO fix that also went into the current bk snapshot, so it will probably apply to Marcelo's bk tree, too. Chris: Is it intended that directio only works on data=writeback mounted filesystems? Also, the following README file diff should be considered:

--- README.orig Sat Feb 22 16:44:34 2003
+++ README Sat Feb 22 16:44:49 2003
@@ -28,7 +28,7 @@
 These add reiserfs quota support
 
-07-quota-v2-2.4.21.diff.gz
+07-quota-v2-2.4.21.diff.gz # you don't need this on -ac, too
 08-reiserfs-quota-26.diff.gz
 09-kinoded-8.diff.gz
Re: reiserfs messages cleanup patch.
Hello! On Fri, Feb 21, 2003 at 09:22:16AM +0100, Manuel Krause wrote: It doesn't apply to my kernel setup [-pre4 + data-logging + preempt] -- too many hunks failing in my eyes. datalogging is just too big of a change. Bye, Oleg
About direntries pointing to nowhere on reiserfs problem in 2.4
Hello! Vladimir has finally tracked the problem down to a race between two iget4 calls running on the same file whose inode is not in the cache. The sequence of events is like this (UP case):

1st thread:
- take inode_lock
- search through the inode cache, but find nothing
- alloc a new inode, mark it as locked
- release inode_lock
- call reiserfs_read_inode2()
- do some stuff
- call search_by_key()
- schedule()

Now the 2nd thread comes in:
- take inode_lock
- search through the inode cache, find an inode with the same inode number
- check that there is a find_actor defined for reiserfs
- call find_actor()
- check that the inode's primary key's dir_id is equal to the expected one. But at that time this part of the inode is not initialized yet! So we return 0;
- ...
- And we create a second inode for the same file.

This scenario seems possible for any filesystem that stores some cookie in the private part of the inode and whose read_inode2 can schedule. We checked, and coda seems safe because they take a semaphore in iget(). So we solved that with the patch below (Zygo, others who think they have this problem, please check). But Vladimir is really unhappy with that comparison with zero and guessing (though he agrees it is correct, if the FS is undamaged). Andrew, Alan: Is there a possibility to have an iget5_locked() kind of interface in 2.4? We need some way to init parts of the inode under inode_lock to solve this problem in a more elegant way. (And inode_lock is not even exported, so I invented another spinlock to guard atomicity of the inode pkey update on SMP.) Bye, Oleg

= fs/reiserfs/inode.c 1.42 vs edited =
--- 1.42/fs/reiserfs/inode.c Thu Feb 13 15:42:42 2003
+++ edited/fs/reiserfs/inode.c Thu Feb 20 17:23:24 2003
@@ -20,6 +20,10 @@
 static int reiserfs_get_block (struct inode * inode, long block,
                                struct buffer_head * bh_result, int create);
 
+/* This spinlock guards inode pkey in private part of inode
+   against race between find_actor() vs reiserfs_read_inode2 */
+static spinlock_t keycopy_lock = SPIN_LOCK_UNLOCKED;
+
 void reiserfs_delete_inode (struct inode * inode)
 {
     int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2;
@@ -898,8 +902,9 @@
     bh = PATH_PLAST_BUFFER (path);
     ih = PATH_PITEM_HEAD (path);
 
-
+    spin_lock(&keycopy_lock);
     copy_key (INODE_PKEY (inode), &(ih->ih_key));
+    spin_unlock(&keycopy_lock);
     inode->i_blksize = PAGE_SIZE;
 
     INIT_LIST_HEAD(&inode->u.reiserfs_i.i_prealloc_list) ;
@@ -1220,10 +1225,27 @@
                                   unsigned long inode_no, void *opaque )
 {
     struct reiserfs_iget4_args *args;
+    int retval;
 
     args = opaque;
+    /* We protect against possible parallel init_inode() on another CPU here. */
+    spin_lock(&keycopy_lock);
     /* args is already in CPU order */
-    return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
+    if (le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid)
+        retval = 1;
+    else
+        /* If The key does not match, lets see if we are racing
+           with another iget4, that already progressed so far
+           to reiserfs_read_inode2() and was preempted in
+           call to search_by_key(). The signs of that are:
+             Inode is locked
+             dirid and object id are zero (not yet initialized)*/
+        retval = (inode->i_state & I_LOCK) &&
+                 !INODE_PKEY(inode)->k_dir_id &&
+                 !INODE_PKEY(inode)->k_objectid;
+
+    spin_unlock(&keycopy_lock);
+    return retval;
 }
 
 struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)
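The race and the effect of the changed find_actor can be replayed deterministically with a toy model in Python. This is a sketch only: there are no real threads or locks, "scheduling away" is modelled by simply not filling in the key before the second lookup, and all names are made up apart from find_actor and iget.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    ino: int
    dir_id: int = 0       # stays zero until read_inode2 fills in the key
    objectid: int = 0
    locked: bool = True   # stands in for the I_LOCK bit

def old_find_actor(inode, dir_id):
    return inode.dir_id == dir_id

def new_find_actor(inode, dir_id):
    if inode.dir_id == dir_id:
        return True
    # Key mismatch: treat a locked inode with an all-zero key as
    # "another iget4 is still reading this inode in".
    return inode.locked and inode.dir_id == 0 and inode.objectid == 0

def iget(cache, ino, dir_id, find_actor):
    """Return the cached inode accepted by find_actor, or allocate a new one."""
    for inode in cache:
        if inode.ino == ino and find_actor(inode, dir_id):
            return inode
    inode = Inode(ino)    # key not yet initialized, inode locked
    cache.append(inode)
    return inode

for actor in (old_find_actor, new_find_actor):
    cache = []
    first = iget(cache, 42, 7, actor)    # 1st thread, scheduled away before the key copy
    second = iget(cache, 42, 7, actor)   # 2nd thread looks up the same file
    print(actor.__name__, "inodes for one file:", len(cache))
```

With the old check the cache ends up with two inodes for the same file; with the new one the second lookup finds the half-initialized inode (in the real kernel it would then wait for the lock to be released).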
Re: About direntries pointing to nowhere on reiserfs problem in 2.4
Hello! On Fri, Feb 21, 2003 at 08:11:16AM +0100, Manuel Krause wrote: Is this fix safe for usage already? Works for me (tm) ;) It will most probably be replaced by iget5_locked, though. Mmmh. I have some hangs within KDE 2.2.2 Konqueror when copying over (existing) directory links for some weeks now. I don't copy often over links but when it stalls, a directory link is somewhere involved. (No crashes, no messages in the logs, just minutes for copies or deletes that happened in seconds usually.) Should I worry and use the patch -- or finally upgrade my KDE ;-)) This does not look like the reiserfs bug that the patch is supposed to fix. Bye, Oleg
Re: reiser4 and 2.5.60
Hello! On Sat, Feb 15, 2003 at 04:00:33PM +0100, Ookhoi wrote: I still get a Segmentation fault when I want to untar a kernel source on a fresh 512MB loop filesystem. Does this help you? Kind of. It seems that the inode_file_plugin(inode)->key_by_inode pointer is zero for one of the inodes. But I do not see how that can happen at all. I personally untarred (and then compiled) a kernel on this reiser4 snapshot without any problems more than once (in fact this is one of my basic tests). I tried both UP and SMP, block device and loop device with file. Can you describe your system in more detail? Bye, Oleg
Re: Error - Partition Correspondance [was Re: Corrupted/unreadable journal: reiser vs. ext3]
Hello! On Tue, Feb 18, 2003 at 12:35:23AM +0100, Manuel Krause wrote: BTW, do the ReiserFS errors nowadays print out a usable partition identification (like Chris' actual data-logging patches perform at mount, e.g.)? Sometimes they do. I mostly always have 2 partitions with ReiserFS mounted, so -- is it still meaningless to get an error message related to one of them in my logs? It depends on what the messages are. I posted this circumstance some 3.6-ReiserFS levels ago and someone of your team wanted to implement this after his task-list was done, IIRC. Yes. I have a patch dating back to May 7th, 2002. But it was never accepted, for a reason I no longer remember. I will dig through my email, though. Probably I will give it another try. Bye, Oleg
Re: rsync and memory leak on linux 2.2?
Hello! On Thu, Feb 13, 2003 at 12:11:44PM -0500, Patrick O'Rourke wrote: What happens is that we put both systems under a fixed work load and after many hours system B will start consuming lots of swap and suffer degraded performance. Through some kernel debugging we discovered that the 4K kmalloc slab is slowly growing to the point where there is enough memory pressure to start swapping. We observe that one of the heaviest users of the 4K slab is reiserfs_kmalloc(), and in particular, the calls issued from get_mem_for_virtual_node() and reiserfs_file_write(). As an experiment we ran the same work load, but this time disabled the rsync'ing of the log files and we no longer see the growth in the 4k slab. This leads us to believe that we have a memory leak somewhere in the reiserfs and was wondering if anyone else has seen this, and if so, if a patch exists. One question I have is that get_mem_for_virtual_node() will first attempt to allocate memory atomically, and if this fails, will re-try non atomically. In this case we return SCHEDULE_OCCURRED which results in fix_nodes() will also return to its caller. Is it possible for buffer allocated by get_mem_for_virtual_node() to be lost in this case? I did not see any path out of reiserfs_file_write() in which we could return w/o freeing the buffer. Perhaps this problem is triggered via memory pressure? Thanks for the report. I will investigate it tomorrow, when it is day again in Russia. Meanwhile, there is an easy way to find out whether some memory is leaked through reiserfs_kmalloc. If you enable the CONFIG_REISERFS_CHECK kernel option, then there is code in reiserfs_kmalloc that checks whether we allocate memory but do not free it. It will print a warning once in a while. But please note that CONFIG_REISERFS_CHECK will impose some (substantial?) slowdown on reiserfs operations, so you might just unconditionally enable the memleak check in there if you cannot afford the slowdown. Bye, Oleg
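A rough way to watch the slab growth described in the report is to sample the relevant row of /proc/slabinfo over time. The sketch below is only an illustration: the helper name slab_usage is made up, and the cache name size-4096 and the column layout (name, active objects, total objects) are assumptions about 2.2/2.4-era kernels that should be checked against the running system.

```shell
#!/bin/sh
# Sketch only: slab_usage is a hypothetical helper, not part of any tool.
# Reads slabinfo-style text on stdin and prints "active total" for one cache.
# $1 = cache name (assumed default: size-4096, the 4K kmalloc slab).
slab_usage() {
    awk -v name="${1:-size-4096}" '$1 == name { print $2, $3 }'
}

# Sample data in the assumed layout; on a real system one would run
#   slab_usage size-4096 < /proc/slabinfo
# every few minutes and compare the numbers.
printf 'slabinfo - version: 1.1\nsize-4096 120 130 4096 130 130 1\n' | slab_usage
```

If the first number keeps climbing under a steady workload and never comes back down, that is consistent with a leak, though it does not by itself say which caller is responsible.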
Re: Corrupted/unreadable journal: reiser vs. ext3
Hello! On Wed, Feb 12, 2003 at 05:56:58PM +0100, Anders Widman wrote: So it would be possible to do some actions to 1) get some blocks back in the described way, 1.1) write to really bad blocks should have remapped them already here if there is a space in remap area 2) save bad blocks to badblock list in fs if they are still bad - out of remap area. Would be not bad to try to recover in this way already remapped blocks - do not know how to get the list of them only. Ok, but what if the IO error you got is not a bad block, but a bad cable? Do you want the fs to work in the described way? Trying to fix all automatically? I am not sure. How about trial and (then) error? :) That might be suitable for fsck, but not for the kernel, I am sure. The kernel should probably just return an error or try to use a different block (if it was doing a write), and if a certain number of attempts fail, return an error too. It should also remount R/O if the write error is in a system area (journal, superblock, bitmaps) or if a special mount option was given that demands remounting R/O on IO errors. Bye, Oleg
Re: Error after rebuilding file system tree
Hello! On Tue, Feb 11, 2003 at 11:01:55AM +0100, Karl Mistelberger wrote: Feb 11 08:02:54 nnk kernel: 3a:04: rw=1, want=26537940, limit=26529792 Feb 11 08:02:54 nnk kernel: attempt to access beyond end of device What does debugreiserfs /dev/your_device say? Starting the system again resulted in: <4>reiserfs: found format 3.5 with standard journal <4>reiserfs: checking transaction log (lvm(58,4)) for (lvm(58,4)) <6>attempt to access beyond end of device <6>3a:04: rw=1, want=26532716, limit=26529792 <6>attempt to access beyond end of device Hm. 1. Please create a metadata dump: debugreiserfs -p /dev/yourdevice | bzip2 -9c > /path/metadata.bz2 Then make this metadata.bz2 file available for download. 2. Get newer reiserfsprogs from our ftp, and try to run rebuild-tree with reiserfsck from the latest reiserfsprogs. It seems that your kernel found non-existent transactions in the log. Where is your kernel from? (What distro/version? Any updates applied?) Thank you. Bye, Oleg
Re: reiserfsck --blame-it-on-the-hardware-yeah-yeah
Hello! On Tue, Feb 11, 2003 at 05:24:58AM +1300, Sam Vilain wrote: Therefore, your reiserfsck has a bug. The whole point of a fsck is Well, currently the logic is: if we cannot read some block, that usually means this is a bad block. And so it prints the message. Of course, more testing of whether the block is beyond the partition boundary should probably be added. The block is not bad, it's EINVAL :-). The block *number* is bad; you Sure. *could* add to your is_block_shagged() function a test for whether the block is out of bounds, but the point is that if it gets as far as that function, chances are that it is too late. This is being worked on currently. (In reiserfsck), you need to do the bounds check when the referring block/data structure is checked. Sure. We have some checks, though apparently not enough. [horrors about recompiling fsck with customly disabled stuff skipped]. If you really decided to shoot yourself in the foot, you might as well just fill the journal with zeroes. It would be much easier this way ;) - filesystem now mounts, however about the first 2 levels of directories, and many recently written files, have had their directory entries lost - lost+found contains roughly 11,000 entries (of 150,000 or so). Hm, probably the corresponding blocks (with names) were only present in the journal, and you erased that. - thankfully, I can locate the several hundred megabytes of .debs to save myself spending days re-downloading it all over 56k :-). Mission successful. At least you have not lost anything valuable. This is good. If reiserfsck was built with --no-journal-available in mind (that is, ignoring the data present in an in-partition journal with that switch), then I'm fairly sure that I wouldn't have suffered the last problem. How so? After the first scan, the journal would have been written back to an empty state. So what? If the directories' content was only present in the journal, you just lose that info. 
I'm going to try removing that test in the 3.x.1b version and see if the fsck completes. Well, 3.x.1b should not be actually used, lots of bugs were fixed since then. Vitaly: We need a check that journal target block is in range of filesystem. Please add this test. That is not all you must do! You need to do one, preferably both of the following: a) allow reiserfsck to ignore the in-partition journal, without producing an insane result (where the filesystem header says there is a journal, but the space where the journal is has filesystem data in it). This cannot happen in any sane way. (I mean root block just cannot live in journal). b) make reiserfsck validate the journal as well as the filesystem, probably playing them back itself rather than relying on a mount option that just does the playback for it. In theory you could decide whether to use the on-disk or the in-journal data structure, depending on which was more consistent! I was thinking about that already. May be we will do something like that in 2.7/2.8, but certainly not now. And it will make lots of complications, I fear. People who will forget to upgrade their reiserfsprogs will get in trouble when upgrading kernels and so on... Bye, Oleg
Re: reiserfsck --blame-it-on-the-hardware-yeah-yeah
Hello! On Sat, Feb 08, 2003 at 10:49:28PM +, [EMAIL PROTECTED] wrote: Ah, right, well that explains it. It complained about block 524111, which would be physical block number 2096444. This is off the end of the block device, which only has 261 blocks. Aha, so this is indeed the problem. I acknowledge that I used `hda' where I should have used `hda1' for the simple read-test with dd, but did you not see the `badblocks' program output in the same e-mail? `badblocks' read in the existing Yes, I saw it. then wrote the original data back. It detected no error anywhere in the block device. That's good, it means your hard drive is probably ok. Therefore, your reiserfsck has a bug. The whole point of a fsck is that any data, anywhere, can be corrupted - and reiserfsck should not fall over because of it. Well, currently the logic is: if we cannot read some block, that usually means this is a bad block. And so it prints the message. Of course, more testing of whether the block is beyond the partition boundary should probably be added. So, what you should do is carefully go through your filesystem data structure, insert garbage in at each unique structural location, and run `reiserfsck' on it to see if it handles the problem correctly. Sure, unfortunately the interactive part of reiserfsck is not very mature. And what do you think it should have done? Shrink the size of the FS to fit the changed (maybe because of corruption) partition size? Enlarge the partition? What else? Then I'd suggest following that up with some randomly corrupted filesystems. Yup, we are running such tests. But thanks for the suggestion. Looking at the source code, I now see why the --no-journal-available switch does not do anything if a `standard' journal is used rather than an off-device journal. 
However, I would suggest that this test is superfluous, and the tool has more benefit to the system administrator if the test for a `standard' journal with fsck_skip_journal is removed, or perhaps replaced with a warning or another prompt. We will think about it. Thanks for the idea. I'm going to try removing that test in the 3.x.1b version and see if the fsck completes. Well, 3.x.1b should not be actually used, lots of bugs were fixed since then. Thanks for the report. Vitaly: We need a check that journal target block is in range of filesystem. Please add this test. Bye, Oleg
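The range check requested above amounts to comparing a block number against the device size before trying to read it. The following is a minimal sketch of that idea, not the reiserfsprogs implementation: the helper names and the device size are hypothetical, and only the conversion 524111 -> 2096444 (4k filesystem blocks to 1k device blocks) comes from the thread.

```shell
#!/bin/sh
# Sketch only: helper names and dev_blocks are made up for illustration.

# Succeeds if block $1 lies inside a device of $2 blocks (same unit for both).
block_in_range() {
    [ "$1" -ge 0 ] && [ "$1" -lt "$2" ]
}

# Convert a 4k filesystem block number to a 1k device block number.
to_1k_blocks() {
    echo $(( $1 * 4 ))
}

dev_blocks=2000000              # hypothetical device size in 1k blocks
blk=$(to_1k_blocks 524111)      # 2096444, as computed in the message
if block_in_range "$blk" "$dev_blocks"; then
    echo "block $blk is inside the device"
else
    echo "block $blk is beyond the end of the device"
fi
```

A check of this kind lets the tool report an out-of-range block number as corruption in the referring structure instead of reporting a bad block after the read fails.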
Re: reiserfsck --blame-it-on-the-hardware-yeah-yeah
Hello! bread: Cannot read the block (524111). Aborted (none):~# dd if=/dev/hda of=/tmp/foo skip=524100 count=100 100+0 records in 100+0 records out (none):~# od -x /tmp/foo 000 6974 6e6f 7720 6c69 206c 6562 6920 636e 020 756c 6564 2064 6e69 7420 6568 6e20 7865 [... lots of very valid looking data snipped ...] This is the wrong block; try adding bs=4k to dd. Also, read not from /dev/hda but from your partition instead. Bye, Oleg
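To see why the dd above reads the wrong place: without bs=, dd counts in 512-byte units, so skip=524100 lands about 256MB into the whole disk, while filesystem block 524111 with a 4k block size sits roughly 2GB into the partition. A small sketch of the arithmetic follows; the helper names are made up, the 4k block size is taken from the bs=4k advice, and the dd command is only printed, not executed.

```shell
#!/bin/sh
# Sketch only: the helper names below are made up for illustration.

# Convert a filesystem block number to a 512-byte sector offset.
# $1 = block number, $2 = filesystem block size in bytes (default 4096).
fs_block_to_sector() {
    echo $(( $1 * ${2:-4096} / 512 ))
}

# Print (do not run) a dd command that reads exactly one filesystem block.
# $1 = partition device, $2 = block number, $3 = block size (default 4096).
fs_block_dd_cmd() {
    echo "dd if=$1 of=/tmp/block.$2 bs=${3:-4096} skip=$2 count=1"
}

fs_block_to_sector 524111            # prints 4192888
fs_block_dd_cmd /dev/hda1 524111
```

Printing the command rather than running it keeps the sketch harmless; the device name /dev/hda1 follows the later message in this thread and may differ on other systems.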
Re: kernel go-slow
Hello! On Thu, Feb 06, 2003 at 05:41:46PM +0100, Russell Coker wrote: but there is possible situations that will not generate disk activity, but may cause your system to go-slow, if there you have some unusual IO numbers while disk activity is moderate to low - most likely same sweet pair. The problem is that sar etc produce jumbled results. Profiling the kernel may help, but may also hide the error, and it's not something I can easily do. Well, you can do it very easily. Reboot with the profile=2 kernel option. When the 100% sys cpu situation starts, execute readprofile -r. When it is finished, execute readprofile -m /path/to/System.map > somefile. Then sort somefile and you are done: you now see where most of the time is spent. The servers are locked in a managed server room on the other side of the city so seeing the blinken lights is not an option. ;) <humour>webcam</humour> I've put the aa1 kernel on half the machines and now I'll wait to see what happens. If the aa1 machines don't have the problem but the others do then I'll go all aa1. Ah, if your problem was with highmem I/O not being present, then that might actually help. Bye, Oleg
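The profiling steps above can be sketched as follows. The readprofile invocations are shown as comments because they need root and a kernel booted with profile=2; the helper top_profile_entries is a made-up name, and it assumes the tick count is the first column of the dump.

```shell
#!/bin/sh
# Steps from the message (need root, kernel booted with profile=2):
#   readprofile -r                                  # reset counters when the stall starts
#   readprofile -m /path/to/System.map > somefile   # dump counters when it is over

# Sketch only: hypothetical helper that shows the busiest entries of a dump.
# $1 = dump file, $2 = number of entries to show (default 20).
top_profile_entries() {
    sort -nr "$1" | head -n "${2:-20}"
}

# usage: top_profile_entries somefile 10
```

Sorting numerically in reverse puts the functions with the most ticks first, which is usually all that is needed to see where the system time goes.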
Re: link/unlink problem gone?
Hello! On Thu, Feb 06, 2003 at 05:32:10PM -0500, Zygo Blaxell wrote: Sigh, these were false hopes indeed. I can reproduce it with 2.4.21-pre4, only it is now harder for some reason. I've seen times-to-failure ranging from 20 minutes to 20+ hours (!). Same here. Chris: My current idea is that it happens during low memory conditions, so I am actively running around prune_icache and its dcache equivalent. Probably you can easily reproduce that if you have no swap and not very much RAM. (Ok, I just checked: limited the RAM to 90M and turned off SWAP entirely, and reproduced the problem fairly quickly.) I have observed the problem on machines ranging in size from 96 to 512MB RAM. I haven't observed a correlation between swapping activity and failures but I haven't been looking for this either. I noticed that with newer 2.4.21-pre kernels I first see processes die because of OOM, and only after that do I see direntries pointing to nowhere. I reproduced this much more than once, so I believe there is some correlation between these. The machines that have problems are swapping at some time or another (they have several hundred MB of swap used). And they are just swapping all the time, so it may take a while before useful code runs and the problem happens, it seems. So far I have concluded that with SWAP turned off one can reproduce the problem more easily than with SWAP on (especially if swap is large). Bye, Oleg
Re: Fwd: Re: Segmentation Fault when mounting ataraid
Hello! On Wed, Jan 29, 2003 at 08:38:16PM +0300, Hans Reiser wrote: Well, that certainly looks like a bug in mount options parsing code. Edward and Oleg, please review and fix. There is another decoded output that makes much more sense to me. And it suggests something went wrong within the block layer. (And in the one you are referring to, parse_options's address is only stored in registers, not in the backtrace. Also, starting from 2.4.20, there is no function named parse_options in reiserfs code.) Bye, Oleg
Re: Segmentation Fault when mounting ataraid
Hello! On Wed, Jan 29, 2003 at 07:32:31PM +0100, Jochen Haemmerle wrote: So, here it comes again! Don't care about the warning, this is the machine the error occurs on! I hope someone understands that sh*** because I don't!!! Ok, looks like __make_request tries to call get_request, and there is something wrong with the request queue. Yesterday I've patched my Kernel to 2.4.21-pre3. The bug does not appear! It seems to be only a bug of the 2.4.20 (on 2.4.19 it works too...as I already mentioned here) Well, that probably means there was some bug in 2.4.20 that is now fixed in the ataraid code path in the later kernel, I presume. Though I quickly scanned the changelogs and see nothing related. Bye, Oleg
Re: Re: Segmentation Fault when mounting ataraid
Hello! On Tue, Jan 28, 2003 at 09:49:29PM +0100, Jochen Haemmerle wrote: Well, down there it is!! Hm. Strange stacktrace, I'd say. Please also decode the EIP line; maybe you need to get a newer ksymoops for that. (EIP 0010:[c01a62c0] Tainted: P) BTW, what proprietary modules do you have loaded?

guardian@viking:~$ cat segfault.txt | ksymoops -m /boot/System.map-2.4.20
ksymoops 2.4.6 on i686 2.4.20. Options used
  -V (default)
  -k /proc/ksyms (default)
  -l /proc/modules (default)
  -o /lib/modules/2.4.20/ (default)
  -m /boot/System.map-2.4.20 (specified)

Unable to handle kernel NULL pointer dereference at virtual address 0004
c01a62c0
*pde =
Oops: 0002
CPU: 0
EFLAGS: 00010012
eax: ebx: c03324f8 ecx: dff188a0 edx: c03324fc
esi: edi: c03324f8 ebp: 0008 esp: d7839d98
Process mount (pid: 327, stackpage=d7839000)
007c 098a9f2d c0332518 0800 c0332520 c0332518 0080 003e003f c01a6c6c c03324f8 d7cc54a0 0008
Call Trace: [c01a698d] [c01a6c6c] [c01a6ccc] [c01a6e27] [c0172671] [c0173188] [c013c712] [c014d546] [c013c8fd] [c014e589] [c014e842] [c014e6ad] [c014ec6f] [c0106f17]
Code: 89 50 04 89 02 c7 01 00 00 00 00 c7 41 04 00 00 00 00 ff 0b
Using defaults from ksymoops -t elf32-i386 -a i386

ebx; c03324f8 parse_options+124/1c8
edx; c03324fc parse_options+128/1c8
edi; c03324f8 parse_options+124/1c8

Trace; c01a698d reiserfs_super_in_proc+69/254
Trace; c01a6c6c reiserfs_per_level_in_proc+f4/134
Trace; c01a6ccc reiserfs_bitmap_in_proc+20/a0
Trace; c01a6e27 reiserfs_on_disk_super_in_proc+db/f0
Trace; c0172671 nlm_shutdown_hosts+e9/11c
Trace; c0173188 nlmsvc_lock+c8/340
Trace; c013c712 sys_getdents64+7e/b3
Trace; c014d546 notesize+1e/2c
Trace; c013c8fd max_select_fd+9d/a4
Trace; c014e589 handle_ide_mess+25/194
Trace; c014e842 msdos_partition+14a/2f8
Trace; c014e6ad handle_ide_mess+149/194
Trace; c014ec6f __load_block_bitmap+17f/198
Trace; c0106f17 show_stack+7/78

Code; Before first symbol
_EIP:
Code; Before first symbol
   0: 89 50 04                mov    %edx,0x4(%eax)
Code; 0003 Before first symbol
   3: 89 02                   mov    %eax,(%edx)
Code; 0005 Before first symbol
   5: c7 01 00 00 00 00       movl   $0x0,(%ecx)
Code; 000b Before first symbol
   b: c7 41 04 00 00 00 00    movl   $0x0,0x4(%ecx)
Code; 0012 Before first symbol
  12: ff 0b                   decl   (%ebx)

Bye, Oleg
Re: mkreiserfs -s 1024 makes unmountable partitions
Hello! On Sun, Jan 26, 2003 at 07:18:16PM +0100, Francois-Rene Rideau wrote: Hi! No hard disk crash today (I'm just disabling the DMA )- ) However, I've tried to make small reiserfs partitions, and was annoyed at the journal taking a significant size of the disk: 32MB is 50% of my 64MB /boot partition, and 40% of the whole of my server's 80MB harddisk. I saw that mkreiserfs had an option -s to select the size of the journal, and tried to use it to make a 4MB journal: mkreiserfs -s 1024 /dev/hdc1 However, whereas mkreiserfs didn't complain, the resulting partition was unmountable by linux. In the syslogs, the kernel complains: read_super_block: can't find a reiserfs filesystem on (dev 16:01, block 128, size 512) read_super_block: can't find a reiserfs filesystem on (dev 16:01, block 16, size 512) You need the journal relocation patches from Chris Mason: ftp://ftp.suse.com/pub/people/mason/patches/datalogging Is there a way to make a reiserfs partition with a small journal? Sure. You just did it. ;) Now you need in-kernel support to be able to mount it. Or you can use 2.5 kernels (note that these are not recommended for production environments, of course). Would a small kernel patch do it? Sure. In any case, I think that it is a bug that mkreiserfs doesn't check the consistency of its parameters with what the kernel is able to handle. No, that's not a bug. mkreiserfs cannot know if you are just making a filesystem and planning to reboot into a proper kernel later (or even move the disk to another system). Also, it cannot detect whether your current kernel has any patches applied or not. Bye, Oleg
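The journal sizes mentioned above follow from the -s argument being a block count. A small sketch of the arithmetic, assuming a 4096-byte filesystem block (which is what makes -s 1024 come out at 4MB); the helper name is made up, and the 8192-block figure is simply 32MB divided by 4096 bytes, not a value quoted in the thread.

```shell
#!/bin/sh
# Sketch only: journal_size_mb is a hypothetical helper.
# $1 = journal size in blocks, $2 = block size in bytes (assumed default 4096).
journal_size_mb() {
    echo $(( $1 * ${2:-4096} / 1024 / 1024 ))
}

journal_size_mb 1024    # prints 4  (the -s 1024 journal from the message)
journal_size_mb 8192    # prints 32 (the 32MB journal on the 64MB /boot)
```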
Re: Hard disk crash and solution
Hello! On Mon, Jan 27, 2003 at 05:53:31AM +0100, Ookhoi wrote: Title: IBM DTLA 307045 Hard disk crash I bought this disk (46 GB) about two years ago. One of the best they claimed. [...] What is the fucking MTBF of these drives?? Is it close to one year like I experienced? That is quite good for those drives :-) I bought an IBM DTLA-307030 made in Hungary 2 years ago. It is still working (though it already has ~1500 bad sectors remapped), aside from making unusual noises when remapping bad sectors ;) I may be just lucky. Also, I try to run it in a cool environment, so that may help it too. Bye, Oleg
Re: quotas in 2.4.20
Hello! On Fri, Jan 24, 2003 at 09:02:02PM -0600, José A. Guzmán wrote: I'm trying to get quotas working on 2.4.20. So far they seem to work ok with the dec-3-2002 patches from: ftp://ftp.namesys.com/pub/reiserfs-for-2.4/testing/quota-2.4.20/ with CONFIG_QFMT_V2. However the patch from jan-22-2003 in: ftp://ftp.namesys.com/pub/reiserfs-for-2.4/2.4.20-pending/01-iput-deadlock-fix.diff does not apply on top of the testing/quota-2.4.20 patches. Yup. You do not need that patch if you are using quotas, as it is already in ;) I will rediff the quota patches against the new 2.4.21-pre later. Also, when compiling a 2.4.20 kernel with testing/quota patches and the latest evms 1.2.1 patches, compilation stops with: fs/fs.o: In function `fsync_dev_lockfs': fs/fs.o(.text+0x31a8): undefined reference to `DQUOT_SYNC' Hm, that's strange. Do the evms guys touch quota code at all? Are the testing/quota patches recommended for a production box with 2.4.20? (Debian 3.0, quota 3.08) Well, they seem to work well enough. There is no bug report that I can associate with this quota code for sure. Is there a way to get EVMS working with quotas? Hm. What are the evms guys' thoughts on that issue? Bye, Oleg
Re: slightly [OT] highmem (was Re: 2.4.20 at kernel.org and data logging)
Hello! On Fri, Jan 24, 2003 at 06:00:19PM +0100, Dieter Nützel wrote: highmem4GB / highmem64GB with pae or does it produce more overhead that you mention below? You get no advantage of course. But lots of overhead. Rumours have it that 256M systems with highmem-enabled kernels (the default for the RedHat beta, it seems) are swapping much more than when the same kernel is built with highmem off. But that could be because they have forgotten to enable HIGHMEM IO? See Andrea Arcangeli's -aa kernels. What HIGHMEM IO? There is exactly NO highmem, so the highmem IO code won't be used. Bye, Oleg
Re: old block allocator found in 2.4.19
Hello! On Thu, Jan 23, 2003 at 11:30:07PM +0100, Newsmail wrote: Hi Oleg, as you remember I mentioned you about my loop-aes+lvm+reiserfs problem, that leaves hung processes after them, and only a cold reboot Yes. Unfortunately I only produced a 2.4.20+crypto stuff kernel image, and now other more important bugs and problems divert me from looking at your problem more, sorry. could save a solution. well these problems came (in my opinion) after the introduction of the new block allocator in 2.4.20-preX. in the first version the new block allocation wasn't the default one, we had to use some preallocmin= etc. flag in the mount process. well I would like to try with a new 2.4.20 kernel the settings for the old allocator. is there a way to use the old allocator with the new kernel? some mount option or any? Well, you can specify tails=large,alloc=old_way:concentrating_formatted_nodes=10. This way it will resemble the old block allocator pretty much. Also, you can remove the new block allocator patch from 2.4.20 (just apply it (and later fixes) with the -R patch option). It would also be interesting to know whether your hangs go away if you apply the iput-deadlock fix: ftp://ftp.namesys.com/pub/reiserfs-for-2.4/2.4.20-pending/01-iput-deadlock-fix.diff Thank you. Bye, Oleg
Re: ordered writes in 2.4.20?
Hello! On Thu, Jan 23, 2003 at 08:35:18PM -0500, Hubert Chan wrote: I'm currently using linux 2.4.20, and I'm wondering if it has support for ordered writes, or if I would have to apply a patch. You need to apply the patch. Bye, Oleg
Re: [reiserfs-dev] Re: [ANNOUNCE]: reiser4 snapshot
Hello! On Fri, Jan 17, 2003 at 01:09:20PM +0100, Ookhoi wrote: It is released as a patch against linux-2.5.58 kernel. It should also work with current (January 16th) bk snapshot at http://linux.bkbits.net/linux-2.5 This is mostly bug fixing release. READ.ME file contains changelog. Can you please have a look at the READ.ME? It contains old info from the former snapshot. Only the snapshot date was old, all other info was recent one. Thanks for noticing. Fixed. Bye, Oleg
Re: Quotas aand 2.5.x
Hello! On Wed, Jan 15, 2003 at 10:43:06AM +0100, Philippe Gramoullé wrote: Actually, right now, we still have that nasty bug every time we run quotacheck that prevents us from enabling them on several fileservers which is a big problem right now (that's on 2.4.x) Have you tried 2.4. without Chris' datalogging patches, but with the original short overflow fix? Bye, Oleg
Re: Stress test failure for reiserfs, but ext3 ok on Linux 2.4.20
Hello! On Wed, Jan 15, 2003 at 03:35:59AM +0100, Bernhard Sadlowski wrote: I am using the attached stress.sh script (probably from this mailinglist) for creating load on a reiserfs filesystem, which forks 100 (read,write,delete) processes: # mkreiserfs /dev/sda4 # mount /dev/sda4 /backup # stress.sh -c /usr -n 100 /backup Then wait until /backup fills up. Hm. This reminds me of something. Can you reproduce the same problem if you apply the patches from ftp://ftp.namesys.com/pub/reiserfs-for-2.4/testing/quota-2.4.20/ These patches add quota support to reiserfs, but also change some new inode-related operations to prevent deadlocks like the one you are seeing. Any I/O freezes and even after killing the script, the remaining cp and mv commands don't terminate. They are in status D. A simple ls /backup never comes back. Only a hard powerdown fixes this situation, because init 6 etc. doesn't work. I have even activated the reiserfs debug, but I don't see any additional info. Try executing sysrq-t after the lockup happens, then send us the decoded output please. Thank you. Bye, Oleg
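Before collecting a full task dump, it can help to list which processes are actually stuck in state D (uninterruptible sleep), as described in the report. A minimal sketch, assuming a procps-style ps that accepts -eo pid,stat,comm; the helper name filter_d_state is made up for illustration.

```shell
#!/bin/sh
# Sketch only: filter_d_state is a hypothetical helper.
# Reads "PID STAT COMMAND" rows on stdin, keeps the header line and every
# row whose STAT column starts with D.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Real usage would be:
#   ps -eo pid,stat,comm | filter_d_state
# Sample input so the sketch runs anywhere:
printf 'PID STAT COMMAND\n1 S init\n42 D cp\n43 D+ mv\n' | filter_d_state
```

On the sample input this keeps the header plus the cp and mv rows, which matches the hung commands mentioned in the message.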
Re: Quotas aand 2.5.x
Hello! On Wed, Jan 15, 2003 at 11:59:53AM +0100, Philippe Gramoullé wrote: | Have you tried 2.4. without Chris' datalogging patches, but with the original short | overflow fix? Well, my question was more like a Plan B. I think i did, and that it still crashed, always during the quotacheck, but i'll try it again to be 100% sure. The 2.4.19-presomething you had there before, with only the fix I sent the first time? Bye, Oleg
Re: Stress test failure for reiserfs, but ext3 ok on Linux 2.4.20
Hello! On Wed, Jan 15, 2003 at 12:51:00PM +0100, Bernhard Sadlowski wrote: Hm. This reminds me of something. Can you reproduce the same problem if you apply the patches from ftp://ftp.namesys.com/pub/reiserfs-for-2.4/testing/quota-2.4.20/ These patches add quota support to reiserfs, but also change some new inode-related operations to prevent deadlocks like the one you are seeing. The unpatched kernel shows the hangs much earlier, so I assume that the above patches solve the problem. With the patches the load goes up very slowly but steadily to 100 and I/O does not freeze anymore. vmstat and iostat still show activity. I assume you don't need any sysrq-t output now. Ok. That's a good sign. Will the patches be included in 2.4.21? No, they require quota support that won't be included in 2.4 because of new quota formats and stuff. I will extract the relevant bits from the patch, though. I will send you a short version without quota once it is ready. Thank you. Bye, Oleg
Re: Stress test failure for reiserfs, but ext3 ok on Linux 2.4.20
Hello! On Wed, Jan 15, 2003 at 02:58:04PM +0300, Oleg Drokin wrote: I will extract relevant bits from the patch though. I will send you short version without quota once it will be ready. Ok, here is the patch, can you give it a try and see if it also helps? I tested it locally and it works for me. If you confirm everything is ok, I will try to get it into 2.4.21 in time. Bye, Oleg

--- linux-2.4.20/fs/reiserfs/namei.c	Fri Nov 29 02:53:15 2002
+++ linux-2.4.20-t/fs/reiserfs/namei.c	Wed Jan 15 17:08:20 2003
@@ -488,27 +488,58 @@
     return 0;
 }
 
+/* quota utility function, call if you've had to abort after calling
+** new_inode_init, and have not called reiserfs_new_inode yet.
+** This should only be called on inodes that do not have stat data
+** inserted into the tree yet.
+*/
+static int drop_new_inode(struct inode *inode) {
+    make_bad_inode(inode) ;
+    iput(inode) ;
+    return 0 ;
+}
+
+/* utility function that does setup for reiserfs_new_inode.
+** DQUOT_ALLOC_INODE cannot be called inside a transaction, so we had
+** to pull some bits of reiserfs_new_inode out into this func.
+*/
+static int new_inode_init(struct inode *inode, struct inode *dir, int mode) {
+
+    /* the quota init calls have to know who to charge the quota to, so
+    ** we have to set uid and gid here
+    */
+    inode->i_uid = current->fsuid;
+    inode->i_mode = mode;
+
+    if (dir->i_mode & S_ISGID) {
+        inode->i_gid = dir->i_gid;
+        if (S_ISDIR(mode))
+            inode->i_mode |= S_ISGID;
+    } else
+        inode->i_gid = current->fsgid;
+    return 0 ;
+}
+
 static int reiserfs_create (struct inode * dir, struct dentry *dentry, int mode)
 {
     int retval;
     struct inode * inode;
-    int windex ;
     int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2 ;
     struct reiserfs_transaction_handle th ;
 
-
     if (!(inode = new_inode(dir->i_sb))) {
 	return -ENOMEM ;
     }
+    retval = new_inode_init(inode, dir, mode) ;
+    if (retval)
+	return retval ;
+
     journal_begin(&th, dir->i_sb, jbegin_count) ;
     th.t_caller = "create" ;
-    windex = push_journal_writer("reiserfs_create") ;
-    inode = reiserfs_new_inode (&th, dir, mode, 0, 0/*i_size*/, dentry, inode, &retval);
-    if (!inode) {
-	pop_journal_writer(windex) ;
-	journal_end(&th, dir->i_sb, jbegin_count) ;
-	return retval;
+    retval = reiserfs_new_inode (&th, dir, mode, 0, 0/*i_size*/, dentry, inode);
+    if (retval) {
+	goto out_failed ;
     }
 
     inode->i_op = &reiserfs_file_inode_operations;
@@ -520,20 +551,19 @@
     if (retval) {
 	inode->i_nlink--;
 	reiserfs_update_sd (&th, inode);
-	pop_journal_writer(windex) ;
-	// FIXME: should we put iput here and have stat data deleted
-	// in the same transactioin
 	journal_end(&th, dir->i_sb, jbegin_count) ;
-	iput (inode);
-	return retval;
+	iput(inode) ;
+	goto out_failed ;
     }
     reiserfs_update_inode_transaction(inode) ;
     reiserfs_update_inode_transaction(dir) ;
 
     d_instantiate(dentry, inode);
-    pop_journal_writer(windex) ;
     journal_end(&th, dir->i_sb, jbegin_count) ;
     return 0;
+
+out_failed:
+    return retval ;
 }
 
 
@@ -541,21 +571,21 @@
 {
     int retval;
     struct inode * inode;
-    int windex ;
     struct reiserfs_transaction_handle th ;
     int jbegin_count = JOURNAL_PER_BALANCE_CNT * 3;
 
     if (!(inode = new_inode(dir->i_sb))) {
 	return -ENOMEM ;
     }
+    retval = new_inode_init(inode, dir, mode) ;
+    if (retval)
+	return retval ;
+
     journal_begin(&th, dir->i_sb, jbegin_count) ;
-    windex = push_journal_writer("reiserfs_mknod") ;
 
-    inode = reiserfs_new_inode (&th, dir, mode, 0, 0/*i_size*/, dentry, inode, &retval);
-    if (!inode) {
-	pop_journal_writer(windex) ;
-	journal_end(&th, dir->i_sb, jbegin_count) ;
-	return retval;
+    retval = reiserfs_new_inode(&th, dir, mode, 0, 0/*i_size*/, dentry, inode);
+    if (retval) {
+	goto out_failed;
     }
 
     init_special_inode(inode, mode, rdev) ;
@@ -571,16 +601,17 @@
     if (retval) {
 	inode->i_nlink--;
 	reiserfs_update_sd (&th, inode);
-	pop_journal_writer(windex) ;
 	journal_end(&th, dir->i_sb, jbegin_count) ;
-	iput (inode);
-	return retval;
+	iput(inode) ;
+	goto out_failed;
     }
 
     d_instantiate(dentry, inode);
-    pop_journal_writer(windex) ;
     journal_end(&th, dir->i_sb, jbegin_count) ;
     return 0;
+
+out_failed:
+    return retval ;
 }
 
 
@@ -588,15 +619,18 @@
 {
     int retval;
     struct inode * inode;
-    int windex ;
     struct reiserfs_transaction_handle th ;
     int jbegin_count = JOURNAL_PER_BALANCE_CNT * 3;
 
+    mode = S_IFDIR | mode;
     if (!(inode = new_inode(dir->i_sb))) {
 	return -ENOMEM ;
     }
+    retval = new_inode_init(inode, dir, mode) ;
+    if (retval)
+	return retval ;
+
     journal_begin(&th, dir->i_sb, jbegin_count) ;
-    windex
Re: Stress test failure for reiserfs, but ext3 ok on Linux 2.4.20
Hello! On Wed, Jan 15, 2003 at 04:48:52PM +0100, Bernhard Sadlowski wrote: Ok, here is the patch, can you give it a try and see if it also helps? I tested it locally and it works for me. If you confirm everything is ok, I will try to get it into 2.4.21 in time. At first glance it seems to work. I will now run that script overnight and will tell you if any problems arise. Ok, thank you very much. Bye, Oleg
Re: How to break a reiserfs on Linux 2.4.20
Hello! On Wed, Jan 15, 2003 at 05:44:26PM -0500, Zygo Blaxell wrote: And now I can reliably reproduce it. It has nothing to do with MD, linear, raid, SMP, or unclean shutdowns. I can reproduce this bug on a plain IDE disk partition in about three hours on Linux 2.4.20 (compiled for SMP but running on UP, full .config and system details available on request). My test system has about 4 gigs under /etc, /usr, and /var, /dev/hdc2 is 25GB, and there is 1G of swap. Thanks for the report. We shall try to reproduce it tonight. Were you successful? If your experience is anything like mine, you should have hundreds if not thousands of broken files by now... Yes, we were able to reproduce the problem and now we are trying to fix it. Thanks a lot for your help and for the script. Bye, Oleg
Re: Core dump in reiserfsck
Hello! On Wed, Jan 15, 2003 at 01:08:26PM +0900, Vitaly Porotikov wrote: Perhaps 3.x.1c does not have this bug, but I can't make any changes to my system. I am sending it in the hope of uncovering some coding errors (if this has not been found before). Yes, you guessed right, this bug was fixed long ago. Actually, if you ever decide to upgrade your reiserfsprogs (which is recommended), pick 3.6.4 (or whatever is the latest at the time), not 3.x.1c. Bye, Oleg
Re: reiserfsck failure
Hello! On Tue, Jan 14, 2003 at 04:01:42PM -0500, Bill Schrier wrote: I am sending along both the --logfile and the core file from a recent reiserfsck we were running on our Redhat 7.2 raidzone machine. Can you say exactly which version that was? Also, just before dumping core it should have printed some more information on stderr about the assertion failure; we are interested in that message too. Thank you. Bye, Oleg
Re: kswapd CPU usage and heavy disk IO
Hello! On Thu, Jan 09, 2003 at 02:31:54PM +0100, Russell Coker wrote: I have a server with 4G of RAM running ReiserFS for everything that matters. It has 2G of swap space free, but so far I have not seen swap usage go above 1.6M (so in normal use I could turn off swap entirely and expect not to see much difference). When it's under really heavy load (when I have a maintenance task involving a find / and there are lots of POP/IMAP clients hitting the server as well as mail delivery) and the load average gets to about 40, the kswapd kernel thread starts using excessive CPU time. It will stay on ~4% but have spikes of up to 45%!!! This is a two-processor machine so 45% CPU reported by top means 90% of a single CPU I guess. 90% of a 1.8GHz P4 CPU is a lot of CPU and I think that something is wrong. Sounds exactly like yesterday's/today's topic on lkml. You have a highmem box: during heavy IO all of the lowmem pages are occupied with bounce buffers and bh's. The kernel needs more low memory and tries to free some, without much success though. This is a known problem that is not related to reiserfs. Unfortunately it is not easy to fix. The relevant lkml topic was "2.4.20, .text.lock.swap cpu usage? (ibm x440)", in the mail from Andrew Morton with msgid [EMAIL PROTECTED]. He recommended trying http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1.bz2 and sending a report on the outcome. Bye, Oleg
Re: !?!
Hello! On Wed, Jan 08, 2003 at 11:53:26AM +0500, Anton Erofeevskij wrote: in reiserfs filesystem time cat sd1 | ./a.out sd2 0.00user 0.05system 0:01.79elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (131major+43minor)pagefaults 0swaps in ext2 filesystem time cat sd1 | ./a.out sd2 0.00user 0.05system 0:00.95elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (131major+43minor)pagefaults 0swaps What is the reason?!? Generally reiserfs may have more CPU overhead per operation than ext2 due to its journaling and balanced-tree nature. For large operations this is outweighed by the speed of performing the operation itself, but when you write just four bytes at a time, each write involves a statdata update (size, possibly nlinks, times), possible rebalancing, and journal updates. Also, you have not said what kernel you are using and how it is configured. Bye, Oleg
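The overhead described above is paid once per write call, so it can be illustrated by counting calls. The following is only a minimal sketch in C, not the program from the original report: the 32-bit counter records, the 4096-byte buffer and all function names are assumptions made for illustration. The first function issues one write per four-byte record, the second collects records in a user-space buffer and hands them to the filesystem in larger chunks.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. Returns 0 or -1. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* One write per 4-byte record: every record pays the per-call cost.
 * Returns the number of write_all() calls issued, or -1 on error. */
long write_records_one_by_one(int fd, uint32_t count)
{
    long calls = 0;
    for (uint32_t i = 0; i < count; i++) {
        if (write_all(fd, &i, sizeof i) < 0)
            return -1;
        calls++;
    }
    return calls;
}

/* Same records, collected in a 4096-byte buffer (illustrative size)
 * and flushed only when the buffer is full or the data ends. */
long write_records_batched(int fd, uint32_t count)
{
    char buf[4096];
    size_t used = 0;
    long calls = 0;
    for (uint32_t i = 0; i < count; i++) {
        if (used + sizeof i > sizeof buf) {
            if (write_all(fd, buf, used) < 0)
                return -1;
            calls++;
            used = 0;
        }
        memcpy(buf + used, &i, sizeof i);
        used += sizeof i;
        assert(used <= sizeof buf);
    }
    if (used > 0) {
        if (write_all(fd, buf, used) < 0)
            return -1;
        calls++;
    }
    return calls;
}

/* Write the same data both ways to `path` and report the call counts. */
int compare_write_patterns(const char *path, uint32_t count,
                           long *single_calls, long *batched_calls)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    *single_calls = write_records_one_by_one(fd, count);
    close(fd);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    *batched_calls = write_records_batched(fd, count);
    close(fd);

    return (*single_calls < 0 || *batched_calls < 0) ? -1 : 0;
}
```

For 2048 records, compare_write_patterns() reports 2048 calls for the first pattern and 2 for the second. Using stdio (fwrite on a FILE) would give similar batching without a hand-written buffer.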
Re: How to use external journal?
Hello! On Mon, Dec 23, 2002 at 04:57:04PM +0100, Luis Gregorio Muniz Rodriguez wrote: I have recently discovered that the journal can be placed on an external device (i.e., `mkreiserfs --journal-device FILE'), but I haven't found any doc about it. If you do not use 2.5 kernels, you need to apply a separate patch to your tree. I'm currently using ReiserFS on top of LVM partitions, and I am wondering if I can use the same shared journal device for a number of small partitions. No, you cannot. Note that I'm not trying to move all the journals to the same device (this should be easy using `--journal-device' and `--journal-offset', shouldn't it?). Rather, I try to share the same 32MB journal between small filesystems and/or filesystems with infrequent writes (such as /usr, /usr/local, /var/www, and so on). Is that possible? And convenient? This is not possible. But you can divide this shared 32MB into four separate parts and use each of them with --journal-offset. Bye, Oleg
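The arithmetic behind dividing one journal device into four parts can be sketched as follows. This is only an illustration under stated assumptions: it assumes offsets are counted in filesystem blocks of 4096 bytes (please check the mkreiserfs man page for the units your version expects), and the struct and function names are hypothetical, not part of reiserfsprogs. Each filesystem's journal would also have to be small enough to fit into its part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one part of a shared journal device. */
struct journal_slice {
    uint64_t offset_blocks;  /* where this part starts, in blocks */
    uint64_t length_blocks;  /* how many blocks this part may use  */
};

/* Split a device of `device_bytes` into `parts` equal slices and
 * describe slice number `index` (0-based). Returns 0 on success,
 * -1 on invalid arguments. Block-based offsets are an assumption. */
int journal_slice_for(uint64_t device_bytes, uint32_t block_size,
                      uint32_t parts, uint32_t index,
                      struct journal_slice *out)
{
    if (block_size == 0 || parts == 0 || index >= parts || out == NULL)
        return -1;

    uint64_t total_blocks = device_bytes / block_size;
    uint64_t per_part = total_blocks / parts;
    if (per_part == 0)
        return -1;

    out->offset_blocks = per_part * index;
    out->length_blocks = per_part;

    /* A slice must never reach past the end of the device. */
    assert(out->offset_blocks + out->length_blocks <= total_blocks);
    return 0;
}
```

For a 32 MiB device with 4096-byte blocks and four parts, each part is 2048 blocks long and the parts start at blocks 0, 2048, 4096 and 6144.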
Re: BUG() in _get_block_create_0
Hello! On Mon, Dec 23, 2002 at 05:31:09PM +0100, Nick Wellnhofer wrote: I'm using ReiserFS with the old 3.5 format on a web server. The system has been running fine for 2 years. About 1 month ago I upgraded from Linux 2.2.16 to 2.4.18 (SuSE 8.1 default kernel). Some weeks ago I got reiserfs error messages in syslog suggesting a fsck and I had some files which couldn't be accessed or deleted. So last week I ran reiserfsck --rebuild-tree. At first everything worked fine. The problematic files could be accessed again. What was the reiserfsck version? After about 3 hours I got an oops report in my syslog, but the system kept running normally. Again 3 hours later the machine crashed with another oops. It turned out that the BUG() in _get_block_create_0 in fs/reiserfs/inode.c was hit both times. According to the value of EAX le_key_k_type (version, key) is TYPE_ANY (0x0f) but TYPE_DIRECT (0x02) is expected. Hm, sounds like FS corruption. The machine is a web server in production and I have only remote access, so I couldn't run reiserfsck again. Any suggestions? We'd be interested in a metadata snapshot (debugreiserfs -p /dev/your_device | bzip2 -9c > metadata.bz2). You can probably even do this on a device mounted read-only. It will probably even work on a device mounted read-write, but make sure there is not much write activity on that fs at the time of the snapshot. Also avoid writing the metadata to the same fs you are taking it from ;) Bye, Oleg
Re: 640.0 GB symlink
Hello! On Tue, Dec 03, 2002 at 09:02:38PM -0800, Jason Mancini wrote: Should I just erase and remake the symlink? Yes, that would be the simplest thing. It wasn't like this in July (my last backup, *cough*). Then somebody corrupted it, or maybe even the drive itself did. reiserfsck generally does not shorten file sizes, but a symlink is a really special file, so this will be fixed for the next release for sure. Maybe we will even include a check like that in the kernel. Thank you for your report. Bye, Oleg
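A check of the kind mentioned above can also be done from user space. The following is a minimal sketch in C, not the reiserfsck or kernel check itself: it assumes a POSIX system where lstat() reports the length of the link target in st_size, and the function name is made up for illustration. A symlink whose reported size does not match the target that readlink() returns, or exceeds PATH_MAX, is treated as suspicious.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the symlink's reported size looks sane,
 * 0 if it looks corrupted (e.g. a 640 GB "symlink"),
 * -1 if `path` cannot be examined or is not a symlink. */
int symlink_size_is_sane(const char *path)
{
    struct stat st;
    char target[PATH_MAX];

    assert(path != NULL);

    if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
        return -1;

    /* readlink() does not NUL-terminate; it returns the target length. */
    ssize_t len = readlink(path, target, sizeof target);
    if (len < 0)
        return -1;

    return st.st_size >= 0 &&
           st.st_size < PATH_MAX &&
           st.st_size == (off_t)len;
}
```

Some special filesystems report a size of 0 for every symlink, so a result of 0 from this function is only a hint to look closer, not proof of corruption.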
Re: journal relocation
Hello! On Wed, Dec 04, 2002 at 09:50:52PM -0600, Brian Tinsley wrote: Is there a patchset available for journal relocation on a 2.4 kernel (2.4.20 specifically)? I've seen reference to it in a few places but have been unable to locate it. Sure. Check out ftp://ftp.suse.com/pub/people/mason/patches/data-logging/2.4.20 Bye, Oleg
Re: non volatile ram devices
Hello! On Wed, Dec 04, 2002 at 08:59:35PM +0100, Russell Coker wrote: I have some servers that are giving inadequate disk performance for Maildir mail spools. They are running kernel 2.4.19 (2.4.20 upgrade is planned) and using ReiserFS for everything that's important. May I ask what kind of inadequacy you observe, and on what kinds of operations? Thank you. Bye, Oleg