Re: v3 experimental data=ordered and logging speedups for 2.6.1

2004-01-21 Thread Oleg Drokin
Hello!

On Mon, Jan 19, 2004 at 11:45:26AM -0500, Chris Mason wrote:

 I've got most of data=ordered finished, there are a few paths like
 writepage and O_DIRECT that need tweaking.  Thanks to Oleg's file_write
 work in 2.6.x, the data=journal patch is much cleaner than 2.4, it is
 almost done but not included in the bunch of patches I just uploaded to
 ftp.suse.com.  Oleg is cc'd in case he wants to look over the changes to
 reiserfs_file_write in reiserfs-jh-2.

Cool. I'll certainly take a look at it, but maybe in February, as I am in the US
right now and do not have a stable internet connection yet.

Thank you.

Bye,
Oleg


Re: New reiser4 snapshot (2003.09.12) is out

2003-09-13 Thread Oleg Drokin
Hello!

On Sat, Sep 13, 2003 at 04:38:01AM -0400, Robert P. J. Day wrote:

  It is because of paranoid -Werror flag.  
  ./configure --disable-werror will help you.
  However, if your system has the readline headers (as on systems used by
  reiser4progs developers :-) that warning disappears.
 my system definitely has readline installed, and i get the same error.

On the other hand, you need not only readline itself, but also its header files.
Lots of distributions (e.g. redhat-alike) form another package called (usually)
readline-devel, that contains this necessary stuff.

Bye,
Oleg


Re: New reiser4 snapshot (2003.09.12) is out

2003-09-13 Thread Oleg Drokin
Hello!

On Sat, Sep 13, 2003 at 06:05:58PM +0400, Hans Reiser wrote:
 my system definitely has readline installed, and i get the same error.
 On the other hand, you need not only readline itself, but also its header files.
 Lots of distributions (e.g. redhat-alike) form another package called (usually)
 readline-devel, that contains this necessary stuff.
 did you adjust our headers to be redhat compatible?  If not, please do.

The problem turned out to be that RedHat's readline was compiled with ncurses
dynamically, so when you link in readline, you need to link ncurses too.

Vitaly will release the fix to configure script shortly, I believe.

Bye,
Oleg


Re: New reiser4 snapshot (2003.09.12) is out

2003-09-13 Thread Oleg Drokin
Hello!

On Sat, Sep 13, 2003 at 10:41:47AM -0400, Robert P. J. Day wrote:
  did you adjust our headers to be redhat compatible?  If not, please do.
 i didn't see anything in the READ.ME about adjusting headers, but would
 this also explain why, if i build reiser 4 support directly into the 
 kernel, my make modules_install works fine, but if i make it modular,
 i get

reiser4 is not supposed to be built as a module yet.
Hm, I thought I disabled this in Kconfig, but it seems I did not do it completely.

Anyway, the list of unresolved symbols you provided is valuable and we will
export those in the next snapshot.

   just curious.  haven't rebooted under the new kernel yet, but i'd like 
 to clear up the above before i do that.  thanks.

Bye,
Oleg


Re: data-logging finally for 2.4.23?

2003-09-08 Thread Oleg Drokin
Hello!

On Wed, Sep 03, 2003 at 05:56:00PM +0200, Dieter Nützel wrote:

 What's up Chris?
 Your latest stuff working fine on 2.4.22-rc1-rl (pre-emption; haven't time for 
 a newer version, yet).

I forwarded the patch to Hans to propagate to Marcelo, but it has not gone through.
I will check with Hans after he wakes up.
I hope we will get a vanilla 2.4.23 with the datalogging stuff.

Bye,
Oleg


Re: Fwd: Bad root block 0

2003-09-08 Thread Oleg Drokin
Hello!

On Sun, Sep 07, 2003 at 02:46:32PM +0200, Roland Häder wrote:

 another thing: I could extract the missing metadata as you described 
 to a compressed file. But I do not mail it to you, because it contains 
 security-breaking-data: My Online-Banking stuff. :-( If I do so, 

Hm. Can you tell us what kernel version you were using at the time of writing
your files to the encrypted device?
What do you mean by extracting missing metadata?

 which will never happen - you just need to crack my password for 
 unlimited cash :-(

unlimited cash? Wow! 

Bye,
Oleg


Re: Fwd: Bad root block 0

2003-09-08 Thread Oleg Drokin
Hello!

On Mon, Sep 08, 2003 at 05:15:01PM +0200, Roland Häder wrote:

  Hm. Can you tell us what kernel version you were using at the time of
  writing your files to encrypted device?
 Opps: Vanilla 2.4.21 with cryptoloop (not patched, it's compiled alone 
 and installed into /lib/modules/2.4.21/x)

Ah, ok.

  What do you mean by extracting missing metadata?
 debugreiserfs can extract the metadata of a rfs-partition. With
 missing I mean that reiserfsck tells me that there's no metadata
 available. So here's a strange thing: debugreiserfs found it, but
 reiserfsck not...

Which debugreiserfs command found the metadata? Was it just plain debugreiserfs -p?
How many leaves/internal nodes were found?

  unlimited cash? Wow!
 Nope, not *really* unlimited cash... :)

Ah, sigh.
I thought Bill Gates started to use linux + reiserfs, but no...

Bye,
Oleg


Re: write barrier patches for 2.4.21

2003-08-27 Thread Oleg Drokin
Hello!

On Tue, Aug 26, 2003 at 05:46:24PM -0400, Tom Vier wrote:

 anyone working on scsi wb's? there was a long thread on l-k about wb's, but
 i wasn't aware what came of it.

There was a discussion about that at Kernel Summit 2003, and the general opinion was that
SCSI does not need the WB stuff at all, as it does the correct thing anyway.
But since the barrier flags are visible in I/O requests, actual device drivers
are free to do something when they meet barrier requests, or to ignore them.
The only concern is probably RAID cards that present a bunch of IDE drives as a SCSI device.

Bye,
Oleg


Re: 2.6.0-test4 reiserfs oops

2003-08-27 Thread Oleg Drokin
Hello!

On Tue, Aug 26, 2003 at 03:59:20PM +, Lorenzo Allegrucci wrote:
 I have got this oops running fsstress and fsx-linux
 on a 20Gb reiserfs partition.  Fully reproducible.
What are the options to fsx and fsstress?
   fsx-linux linux-2.5.0.tar.bz2 :)
   mkdir d
   fsstress -d d -n 100 -p 10
   The oops follows immediately.
  works for me without any patches.
 
 This is easier to reproduce:
 touch file
 mkdir d
 fsstress -d d -n 100 -p 10
 As soon as I run fsx-linux file on another console I get the oops.
 (However fsx-linux without fsstress is not sufficient)

Well, actually I was able to reproduce it yesterday, after replying to you.
And we got some other similar reports under different workloads.
My initial suspicion about the cause was right, and the fix I sent to you
on the second try is the correct thing to do.
The actual problem came from the AIO people merging an incorrect patch
into reiserfs code.

Bye,
Oleg


Re: reiser4 snapshot for August 26th.

2003-08-27 Thread Oleg Drokin
Hello!

On Tue, Aug 26, 2003 at 11:28:44PM +0200, Diego Calleja García wrote:

 btw, I suppose this feature will be removed if/when reiser4 is merged?:
 config REISER4_FS_SYSCALL
 bool Enable reiser4 system call

No. It will be fixed.

 dmesg errors:
 (fs/ext3/inode.c, 2728): ext3_write_inode: called recursively, non-PF_MEMALLOC!
 Call Trace:
  [c018c715] write_inode+0x45/0x50
  [c018c9af] __sync_single_inode+0x28f/0x310
  [c018cd00] generic_sync_sb_inodes+0x1c0/0x2e0

Hm. Interesting. Thank you for the report. We will fix it.

Bye,
Oleg


Re: New reiser4 snapshot (as of August 22, 2003)

2003-08-23 Thread Oleg Drokin
Hello!

On Fri, Aug 22, 2003 at 06:16:35PM -0700, Tupshin Harper wrote:

 Are these patches available outside of bitkeeper, and if so, where are 
 they located?

Yes, they are at http://thebsh.namesys.com/snapshots/2003.08.22 , as somebody
pointed out already.
I just forgot to mention the URL.

Bye,
Oleg


Re: ReiserFS problems

2003-08-14 Thread Oleg Drokin
Hello!

On Wed, Aug 06, 2003 at 08:22:52PM +0200, Rogier Wolff wrote:

 Only list the file/directory that's being worked upon when explicitly
 requested. When not explicitly requested, set an alarm handler to
 print it every second (or so). Lots of time is now spent in writing to

I think we already do something like this.
Vitaly should know exact details.

Bye,
Oleg


Re: Filesystem corruption

2003-08-14 Thread Oleg Drokin
Hello!

On Thu, Aug 14, 2003 at 12:05:28AM +0800, Locke wrote:
 the files. I'm guessing the reason why it recovered so little was
 because I was running a 7.8GB+40GB LVM and the 40GB
 physical volume wasn't working and left it with only 7.8GB.

Yes of course.

 is_tree_node: node level 0 does not match to the expected one 1
 vs-5150: search_by_key: invalid format found in block 8838461. Fsck?

So LVM substitutes zero-filled blocks for the data if a physical volume
is unavailable.
Of course reiserfsck happily threw all of those blocks out of the tree.
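
A minimal sketch of why a zero-filled block triggers the "node level 0" message quoted above. It assumes that a ReiserFS 3.x tree node begins with three little-endian 16-bit fields (level, item count, free space) and that leaf nodes carry level 1; the names BLOCK_HEAD, node_level and looks_like_leaf are hypothetical and are not taken from reiserfsck.

```python
import struct

# Assumption: a tree node starts with level, nr_items, free_space,
# each stored as a little-endian unsigned 16-bit integer.
BLOCK_HEAD = struct.Struct("<HHH")
LEAF_LEVEL = 1  # assumption: leaves are level 1, level 0 is not a tree node


def node_level(block: bytes) -> int:
    """Return the level field stored at the start of a block."""
    level, _nr_items, _free_space = BLOCK_HEAD.unpack_from(block, 0)
    return level


def looks_like_leaf(block: bytes) -> bool:
    """True if the block header claims to be a leaf node."""
    return node_level(block) == LEAF_LEVEL


zero_block = bytes(4096)        # what a missing physical volume reads back as
print(node_level(zero_block))   # 0
```

Under these assumptions a block of zeroes can never match the expected level of 1, so a tree rebuild has no reason to keep it.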

 And also when rebooting after the corruption I saw several error 
 messages for all drives, hda, hdb and hdg
 **
 hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
 hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
 hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
 hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Also you should consider replacing the noisy IDE cable on your primary IDE
controller with a less noisy one. Or just run in a lower UDMA mode.

 **The messages are copied from the FAQ in namesys.com because they 
 looked similar so I'm not sure if they're the exactly same.

Well, if they are not the same, you'd better write them down on paper.

 Is there anything I can try to recover more data?

You might try to get LVM up again and run reiserfsck --rebuild-tree.
Some more stuff will be restored.
Still, you will have lost the content of lots of files, and there is no way
to restore it anymore.
Also, use reiserfsck 3.6.11.

Bye,
Oleg


Re: ReiserFS problems

2003-08-14 Thread Oleg Drokin
Hello!

On Thu, Aug 07, 2003 at 11:12:27AM -0700, Mike Fedyk wrote:
  Well. This is actually unfortunate, I agree. In such a case you'd better
  move your reiserfs images to some other place for the time of reiserfsck 
  --rebuild-tree run.
  or compress them.
 But if there was at any time an uncompressed reiserfs image within the outer
 reiserfs filesystem you're fscking, won't that screw it up too?

Yes.
The fs in file will be completely destroyed.
Some stuff from it may appear in the outer fs (possibly in lost+found;
no actual file data, just the names and directory structure).

 So you can compress it, but if you uncompress it to work with it, it still
 fscks fsck...  Right? :-/

Yes.

Bye,
Oleg


Re: ReiserFS problems

2003-08-14 Thread Oleg Drokin
Hello!

On Wed, Aug 06, 2003 at 06:20:55PM +0200, Rogier Wolff wrote:

 Reiserfs messed up our filesystem again (one file gives us permission

And you use what kernel with what patches on what hardware?

 A surface scan needs to read all the datablocks. But an fsck
 doesn't. At least that's the normal case.

reiserfsck --rebuild-tree is special: it actually reads in all
the blocks on the device that are marked as used, to find metadata blocks and
connect them to the tree (even if they were previously unconnected).
Unlike many other filesystems out there, reiserfs does not have fixed metadata
locations, hence we absolutely need this scan.
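
The scan described above can be sketched as a walk over the allocation bitmap that yields every block number marked as used. This is only an illustration: the function name used_blocks is hypothetical, and the bit order (least significant bit first within each byte) is an assumption, not something stated in this thread.

```python
def used_blocks(bitmap: bytes, block_count: int):
    """Yield the numbers of all blocks whose bit is set in the bitmap.

    Assumption: bit N lives in byte N // 8, at position N % 8 counted
    from the least significant bit.
    """
    for block_nr in range(block_count):
        if bitmap[block_nr // 8] & (1 << (block_nr % 8)):
            yield block_nr


# Blocks 0, 2 and 15 are marked as used in this two-byte bitmap.
print(list(used_blocks(bytes([0b00000101, 0b10000000]), 16)))  # [0, 2, 15]
```

Every block returned this way has to be read and inspected, which is why the run time grows with the amount of used space rather than with the amount of metadata.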

 later. So we hit control-C on the fsck.

That was a big mistake.

 But now mounting the filesystem gives us: 
 ReiserFS version 3.6.25
 reiserfs: checking transaction log (device 09:00) ...
 is_tree_node: node level 0 does not match to the expected one 65534
 vs-5150: search_by_key: invalid format found in block 0. Fsck?
 vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
 Using r5 hash to sort names
 is_tree_node: node level 0 does not match to the expected one 65534
 vs-5150: search_by_key: invalid format found in block 0. Fsck?
 vs-2140: finish_unfinished: search_by_key returned -2
 and fsck without --rebuild-tree gives us that an unfinished
 --rebuild-tree was in progress. So we've restarted the tree-rebuild.

Yes. Once you run tree-rebuild, you must wait until it is completed.
(Documentation update is scheduled just now. But in fact we mention this in our FAQ).

 Question: If it is reading all datablocks, I'm guessing that it is

All the ones that are marked as occupied in the bitmaps.

 looking for the magics that build up the filesystem. We're a

Yes.

 datarecovery company. We probably don't have any current
 datarecoveries of people with Reiserfs on their disk. But if we had a
 disk-image with a valid (or not) Reiserfs on it, would it link that
 into our filesystem?

Yes, it will.
So basically you do not want to run the rebuild-tree operation on an
FS that contains files with reiserfs metadata embedded in them in the clear.
This is also explained in our FAQ.

 Anyway, when I first started out with Reiserfs, it didn't support >2G
 files (or was it 4G?) I had to patch the kernel and (irreversibly!)
 upgrade the on-disk format.

Yes. Linux itself did not support >2G files some time ago, and people used patches
and changed their on-disk formats even for other filesystems out there.

 We've noticed horrible slowdowns when the filesystem is >90% full. It
 turns out that when a block group is more than 90% full reiserfs will
 prefer a different block group. i.e. it is ALWAYS switching block
 groups when the whole disk is >90% full. Something like that. When we
 report something like that it's always: Ah, yes, that's an old bug
 we've fixed it. Use patch.

In fact this is not exactly true, it only switches to other block group if
you are creating new file. Why do you think this is a problem?
(of course I am speaking of 2.4.20+ kernels).

Bye,
Oleg


Re: nfsd-fh: found a name that I didn't expect

2003-08-14 Thread Oleg Drokin
Hello!

On Wed, Aug 06, 2003 at 05:00:03PM -0400, John Dalbec wrote:

 I just got an nfsd-fh: found a name that I didn't expect yesterday. 
 I'm using a Red Hat 2.4.20 RPM with 2.4.20-pending+data-logging+quota.
 Should I apply just this patch or both this patch and the 
 iget5_locked_2.4.20 patch?

You only need the patch below. iget5_locked_2.4.20 patch is broken.

Bye,
Oleg
 ===== fs/reiserfs/inode.c 1.42 vs edited =====
 --- 1.42/fs/reiserfs/inode.c Thu Feb 13 15:42:42 2003
 +++ edited/fs/reiserfs/inode.c   Thu Feb 20 17:23:24 2003
 @@ -20,6 +20,10 @@
  static int reiserfs_get_block (struct inode * inode, long block,
  struct buffer_head * bh_result, int create);
  
 +/* This spinlock guards inode pkey in private part of inode
 +   against race between find_actor() vs reiserfs_read_inode2 */
 +static spinlock_t keycopy_lock = SPIN_LOCK_UNLOCKED;
 +
  void reiserfs_delete_inode (struct inode * inode)
  {
  int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2; 
 @@ -898,8 +902,9 @@
  bh = PATH_PLAST_BUFFER (path);
  ih = PATH_PITEM_HEAD (path);
  
 -
 +spin_lock(&keycopy_lock);
  copy_key (INODE_PKEY (inode), &(ih->ih_key));
 +spin_unlock(&keycopy_lock);
  inode->i_blksize = PAGE_SIZE;
  
  INIT_LIST_HEAD(&inode->u.reiserfs_i.i_prealloc_list) ;
 @@ -1220,10 +1225,27 @@
  unsigned long inode_no, void *opaque )
  {
  struct reiserfs_iget4_args *args;
 +int retval;
  
  args = opaque;
 +/* We protect against possible parallel init_inode() on another CPU here. */
 +spin_lock(&keycopy_lock);
  /* args is already in CPU order */
 -return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
 +if (le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid)
 +retval = 1;
 +else
 +/* If The key does not match, lets see if we are racing
 +   with another iget4, that already progressed so far
 +   to reiserfs_read_inode2() and was preempted in
 +   call to search_by_key(). The signs of that are:
 + Inode is locked
 + dirid and object id are zero (not yet initialized)*/
 +retval = (inode->i_state & I_LOCK) &&
 + !INODE_PKEY(inode)->k_dir_id &&
 + !INODE_PKEY(inode)->k_objectid;
 +
 +spin_unlock(&keycopy_lock);
 +return retval;
  }
  
  struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)
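
The matching rule that the patch above adds to the find_actor callback can be restated as a small user-space sketch. This is not kernel code: the function name find_actor_matches and its parameters are hypothetical, and a threading.Lock merely stands in for keycopy_lock.

```python
import threading

# Stand-in for the kernel spinlock that guards the inode's private key.
keycopy_lock = threading.Lock()


def find_actor_matches(key_dir_id: int, key_objectid: int,
                       inode_locked: bool, wanted_dir_id: int) -> bool:
    """Decide whether a cached inode should be treated as a match.

    Either the stored key matches, or the inode is still locked with an
    all-zero key, which is taken as a sign that another lookup is in the
    middle of initializing it.
    """
    with keycopy_lock:
        if key_dir_id == wanted_dir_id:
            return True
        return inode_locked and key_dir_id == 0 and key_objectid == 0


print(find_actor_matches(0, 0, True, 42))   # True: still being initialized
print(find_actor_matches(7, 9, False, 42))  # False: a different object
```

Treating the half-initialized inode as a match makes the second lookup wait for the first one instead of creating a duplicate inode for the same object.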
 
 
 


Re: reiser4 snapshot

2003-08-14 Thread Oleg Drokin
Hello!

On Mon, Aug 11, 2003 at 05:32:25PM -0700, Boris Tschirschwitz wrote:

 I thought I'd give it a try on 2.6.0-test3-mm1.
 Even with 'make mrproper' before compiling, I get the following error
 message:
 (Is there any interest in such error reports?)

Yes, there is.

 bobele linux # make bzImage
   CHK include/linux/version.h
   UPD include/linux/version.h
   Making asm->asm-i386 symlink
   CC  scripts/empty.o
   MKELF   scripts/elfconfig.h
   HOSTCC  scripts/file2alias.o
   HOSTCC  scripts/modpost.o
   HOSTLD  scripts/modpost
   SPLIT   include/linux/autoconf.h -> include/config/*
   CC  arch/i386/kernel/asm-offsets.s
   CHK include/asm-i386/asm_offsets.h
   UPD include/asm-i386/asm_offsets.h
   CC  init/main.o
 In file included from include/linux/unistd.h:9,
  from init/main.c:18:
 include/asm/unistd.h: In function `reiser4':
 include/asm/unistd.h:400: error: `__NR_reiser4' undeclared (first use in this function)
 include/asm/unistd.h:400: error: (Each undeclared identifier is reported only once
 include/asm/unistd.h:400: error: for each function it appears in.)
 make[1]: *** [init/main.o] Error 1
 make: *** [init] Error 2

Hm, this is strange.
__NR_reiser4 is clearly defined in include/asm-i386/unistd.h

Probably you had that part of the patch rejected? Can you please verify?

Bye,
Oleg


Re: un-long listable files

2003-08-06 Thread Oleg Drokin
Hello!

On Wed, Aug 06, 2003 at 02:31:39PM +1000, [EMAIL PROTECTED] wrote:

 getxattr(light_in_time_of_darkness__glad_to_see_you.wav,
  system.posix_acl_access
 and there it sits waiting for heat-death-of-universe.

Hm, this code is by SuSE people (and it is only in suse kernels) and I have not
even looked at it closely yet.
Probably Jeff can comment on it.

 Looks like I'm falling foul of some strange ACLs.
 ls -l as root still hangs though.

Bye,
Oleg


Re: rebuild fs

2003-08-05 Thread Oleg Drokin
Hello!

On Tue, Aug 05, 2003 at 04:56:55PM +0400, Hans Reiser wrote:

 rephrase that as, use 3.6.11, if it still fails, tell us, the segfault 
 will at least be fixed regardless of whether fsck has enough data to do 
 its job.
 But it was not failing on the IDE drive anyway.
 I don't understand the relevance of your statement to mine.

Since transferring the image to IDE made reiserfsck not fail (and it failed on
raid5 due to raid errors, I think), your "if it still fails" statement was not adequate.
The current problem is that not everything is restored and some important files were lost.
Now, I know that we recently introduced some serious changes in reiserfsck, and now
if a block has some slight corruption, it is not immediately discarded; fsck
actually tries to extract some useful data out of it if it thinks this is really a
reiserfs metadata block.
That's why a newer reiserfsck might achieve better results.

Bye,
Oleg


Re: rebuild fs

2003-08-05 Thread Oleg Drokin
Hello!

On Tue, Aug 05, 2003 at 04:28:15PM +0400, Hans Reiser wrote:
 But while rebuilding the tree, I got a segmentation fault. Because I 
 didn't
 want to continue work on the original raid system, I copied the entire 
 raid
 disk to the IDE disk.
 dd  if=/dev/rd/c0d0  of=/dev/hda conv=noerror,sync
 I tried to rebuild the fs structure again, and  I was able to access many
 files, but not that are important to me :(
 Is there anything I can do in this state? Do tools other than
 reiserfsck exist?
 If you are not using latest fsck version (3.6.11 as of now), try to use 
 reiserfsprogs 3.6.11,
 as there is slight chance it would do better.
 rephrase that as, use 3.6.11, if it still fails, tell us, the segfault 
 will at least be fixed regardless of whether fsck has enough data to do 
 its job.

But it was not failing on the IDE drive anyway.

Bye,
Oleg


Re: is this a known bug?

2003-07-24 Thread Oleg Drokin
Hello!

On Thu, Jul 24, 2003 at 01:54:35AM +0200, [EMAIL PROTECTED] wrote:
 [usb-ohci:sohci_device_operations+781321/95136603] .LC62 [reiserfs] 0xc3
 Jul 14 13:25:41 mai-stor2 kernel: Call Trace: [f8a456d5] .LC62 [reiserfs] 0xc3
 Jul 14 13:25:41 mai-stor2 kernel: [f8a3ad1e] journal_mark_dirty [reiserfs] 0x13e
 Jul 14 13:25:41 mai-stor2 kernel: [f8a43d60] .LC93 [reiserfs] 0x27a0
 Jul 14 13:25:41 mai-stor2 kernel: [f8a1f100] reiserfs_free_block [reiserfs] 0xa0
 Jul 14 13:25:41 mai-stor2 kernel: [f8a34bb0] prepare_for_delete_or_cut [reiserfs] 0x760
 Jul 14 13:25:41 mai-stor2 kernel: [f8a22417] free_thrown [reiserfs] 0x57
 Jul 14 13:25:41 mai-stor2 kernel: [f8a22689] do_balance [reiserfs] 0xe9
 Jul 14 13:25:41 mai-stor2 kernel: [f8a356ca] reiserfs_cut_from_item [reiserfs]

Yes, I know this one. It is journal overflow we fixed in 2.4.21 or thereabout.
If you use reiserfs, you really do not want to depend on RedHat's 2.4.9 kernel,
you'd better get some recent stuff (or talk us into backporting fixes to 2.4.9?
that might work if you have enough money ;) )

Bye,
Oleg


Re: problem with overwriting large files

2003-07-24 Thread Oleg Drokin
Hello!

On Mon, Jul 21, 2003 at 08:30:19AM -0700, Suman Puthana wrote:
 We do not see any problem when we are writing into empty space(using the
 write call in a C program) as the file is extending( the write operation
 takes less than 3 ms), but for a certain part of the application we need to
 over-write these files and we find that the write operation is taking
 about 200-300 ms every few minutes, sometimes every few seconds depending on
 the system load.

The description is very nice, but it would be even nicer if you could
provide sample test code that we can run to see the problem.
 
 3.) Would writing in file system blocks(4096 bytes?) or multiples of blocks
 help this situation?  From some basic tests it doesn't seem to help by much.
 From the file system performance point of view, is it better to write
 sixteen 4K chunks or one 64K chunk?

Actually, rewriting should be much faster, just because you are not allocating anything
and are only changing mtime. So... I'd really appreciate sample code
that demonstrates the problem.
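
A test of the kind requested above might look like the following sketch. It is only an illustration under stated assumptions, not the reporter's program: make_file and time_overwrites are hypothetical helper names, the file size is assumed to be a multiple of the chunk size, and os.pwrite requires a Unix-like system.

```python
import os
import tempfile
import time


def make_file(path: str, file_size: int) -> None:
    """Create a fully allocated file of file_size zero bytes."""
    with open(path, "wb") as f:
        f.write(bytes(file_size))


def time_overwrites(path: str, file_size: int, chunk_size: int) -> list:
    """Overwrite an existing file in place, one chunk per write call.

    Returns the latency of each write in seconds. The file is never
    extended, so no new blocks should need to be allocated.
    """
    buf = b"\xaa" * chunk_size
    latencies = []
    fd = os.open(path, os.O_WRONLY)
    try:
        for offset in range(0, file_size, chunk_size):
            start = time.perf_counter()
            os.pwrite(fd, buf, offset)
            latencies.append(time.perf_counter() - start)
        os.fsync(fd)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "overwrite.bin")
        make_file(target, 1 << 20)
        for chunk in (4096, 65536):
            lat = time_overwrites(target, 1 << 20, chunk)
            print(chunk, "byte chunks: worst write", max(lat), "s")
```

Pointing the target path at the filesystem under test and using a much larger file would show whether the occasional slow writes depend on the chunk size.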

Also tell us which kernel you are using that shows the problem,
and stuff like that.

Thank you.

Bye,
Oleg


Re: bug report: attributes ( chattr +a ) not respected by reiserfs 3.6, but this isn't listed in man mkreiserfs

2003-07-18 Thread Oleg Drokin
Hello!

On Wed, Jul 16, 2003 at 04:29:38PM -0500, Matt Stegman wrote:

 I get the same behaviour, but it appears that *only* the append-only
 attribute is ignored. Other attributes are respected fine.  However,
 there are still some wierd things with reiserfs and attributes.  This is
 a pretty long email detailing what I found.

Thanks a lot for a lot of details.

 # mkreiserfs /dev/hdc1
 -mkreiserfs, 2003-
 reiserfsprogs 3.6.8
 mkreiserfs: Guessing about desired format.. 
 mkreiserfs: Kernel 2.4.20-xfs-r3 is running.
 Format 3.6 with standard journal
 ...snip...
 # mount -t reiserfs -o attrs /dev/hdc1 /mnt/reiser
 # cd /mnt/reiser
 # echo hello > file
 -bash: file: Permission denied
 Huh?  I'm root, this is a new filesystem, why would permission be
 denied?  Wait a minute...
 # lsattr -d /mnt/reiser
 suS-iadAcjIt- /mnt/reiser

Yes, this is a problem in mkreiserfs.
Surprisingly 2.6.4 works ok.
This of course will be fixed.

 # chattr +i file
 # echo line2  file
 -bash: file: Permission denied
 Append only (a) is not respected.
 
 # chattr -i +a file
 # lsattr file
 s-S--adAc--t- file
 # echo test > file
 # ls -l file
 -rw-r--r--1 root root5 Jul 16 16:09 file

Hm... Indeed. Sigh. 

   a : file can only be opened in append mode
   Ignored by reiserfs.

Yes, this is a bug.

   d : tell dump to ignore this file
   Does dump even work on reiserfs?

We use this for marking the file as not needing tail packing.

   t : do not merge tails on this file
   I don't know if this is supported or not.

Hm, my chattr documentation does not have this flag.

 Finally, 'reiserfsck --clean-attributes' will produce the following:
 # umount /mnt/reiser
 # reiserfsck --clean-attributes /dev/hdc1
 ...snip...
 # mount -t reiserfs -o attrs /dev/hdc1 /mnt/reiser
 # lsattr -d /mnt/reiser
 - /mnt/reiser
 # lsattr /mnt/reiser/file
 - /mnt/reiser/file

Yes, this is expected.
Older kernels, that are unaware of reiserfs attributes (pre 2.4.17 ones)
write various garbage in sd_attrs field in stat data. So this needs to be cleaned.
And we even invented a superblock flag to indicate that such a cleaning was performed 
already.
Now we also see that new mkreiserfs also writes garbage there.

Bye,
Oleg


Re: reiserfsprogs-3.6.9 release

2003-07-17 Thread Oleg Drokin
Hello!

   BTW, it seems one other important change was omitted from the short summary below.
   The license on the reiserfsprogs package was changed from GPLv2 to
   GPLv2 with additional restriction that I quote below:
===
ReiserFSprogs is hereby licensed under the GNU General Public License
version 2 but with the following Anti-Plagiarism modification:
You may not remove any credits or brand marks, or cause them to not
display, unless you are an end user (that is, you are not
redistributing to others).  Yes, there really are people with the
nerve to remove credits from software they did not write, or only
wrote a small part of, and they are even frequently occurring sad to
say.  Credits are not ads, credits describe someone's contribution to
the project (e.g. labor or money) whereas an ad says something else.
===

Bye,
Oleg
On Wed, Jul 16, 2003 at 09:03:29PM +0400, Vitaly Fertman wrote:
 
 Hi all, 
 
 the new reiserfsprogs release is available on our ftp site (ftp.namesys.com).
 
 This release includes:
 - objectid handling was improved, significant speedup at pass0 and
 semantic/lost+found rebuild passes (was in last pre releases for some time);
 - improved leaves recovery on pass0 of rebuild-tree;
 - exit codes of reiserfsck were fixed;
 - reiserfsck --yes option was added
 - mkreiserfs --quiet option was added;
 - fsck check on boot avoids another bitmap reading on the following mount;
 - credits were fixed.
 
 Some bugs were fixed:
 - fsck proceeds for the standard journal when wrong journal parameters in the 
   journal header detected, fixing them with the warning;
 - a bug in journal replaying code when the only transaction exists was fixed;
 - a few not standard journal related bugs in mkreiserfs and reiserfstune were 
 fixed;
 - a pair of bugs in rebuild-sb were fixed;
 
 -- 
 Thanks,
 Vitaly Fertman


Re: Horrible ftruncate performance

2003-07-16 Thread Oleg Drokin
Hello!

On Tue, Jul 15, 2003 at 09:55:09PM +0200, Dieter Nützel wrote:

 Somewhat.
 Mouse movement is OK, now. But...
 
 1+0 Records aus
 0.000u 3.090s 0:16.81 18.3% 0+0k 0+0io 153pf+0w
 0.000u 0.050s 0:00.27 18.5% 0+0k 0+0io 122pf+0w
 INSTALL/SOURCE time dd if=/dev/zero of=sparse1 bs=1 seek=200G count=1 ; time sync
 1+0 Records ein
 1+0 Records aus
 0.000u 3.010s 0:15.27 19.7% 0+0k 0+0io 153pf+0w
 0.000u 0.020s 0:01.01 1.9%  0+0k 0+0io 122pf+0w

So you create a file in 15 seconds and remove it in 15 seconds.
Nothing much changed except that the mouse now moves, or am I reading this wrong?
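
For reference, the dd invocation quoted above creates a sparse file: it seeks far past the start and writes a single byte. A rough equivalent is sketched below; the helper name make_sparse is hypothetical, st_blocks is only available on Unix-like systems, and how much space actually gets allocated depends on the filesystem.

```python
import os
import tempfile


def make_sparse(path: str, seek_bytes: int):
    """Roughly `dd if=/dev/zero of=path bs=1 seek=seek_bytes count=1`.

    Returns (apparent size, bytes actually allocated) for the new file.
    """
    with open(path, "wb") as f:
        f.seek(seek_bytes)   # skip over the hole without writing it
        f.write(b"\0")       # the single byte that dd copies
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        size, allocated = make_sparse(os.path.join(tmp, "sparse1"), 1 << 30)
        print("apparent size:", size, "allocated:", allocated)
```

The thread uses seek=200G; the sketch uses 1 GiB only to keep the demonstration quick, and the apparent size is always the seek offset plus one byte.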

 INSTALL/SOURCE time rm sparse ; time sync
 0.000u 14.990s 1:31.15 16.4%0+0k 0+0io 130pf+0w
 0.000u 0.030s 0:00.22 13.6% 0+0k 0+0io 122pf+0w

So the stuff fell out of cache and we need to read it again,
hence the increased time. Hm, probably this case can be optimized
if there is only one item in the leaf and this item should be removed.
I need to take a closer look at the balancing code.

Bye,
Oleg


Re: Horrible ftruncate performance

2003-07-16 Thread Oleg Drokin
Hello!

On Wed, Jul 16, 2003 at 12:47:53PM +0200, Dieter Nützel wrote:
   Somewhat.
   Mouse movement is OK, now. But...
   1+0 Records aus
   0.000u 3.090s 0:16.81 18.3% 0+0k 0+0io 153pf+0w
   0.000u 0.050s 0:00.27 18.5% 0+0k 0+0io 122pf+0w
   INSTALL/SOURCE time dd if=/dev/zero of=sparse1 bs=1 seek=200G count=1 ;
   time sync
   1+0 Records ein
   1+0 Records aus
   0.000u 3.010s 0:15.27 19.7% 0+0k 0+0io 153pf+0w
   0.000u 0.020s 0:01.01 1.9%  0+0k 0+0io 122pf+0w
  So you create a file in 15 seconds
 Right.
  and remove it in 15 seconds.
 No. Normaly ~5 seconds.

Ah, yes. I was looking at the wrong timing info ;)
I see that yesterday without the patch you had 1m, 9s, 5s, 2m times
for 4 deletes...

  Kind of nothing changed except mouse now moves,

   INSTALL/SOURCE time rm sparse ; time sync
   0.000u 14.990s 1:31.15 16.4%0+0k 0+0io 130pf+0w
   0.000u 0.030s 0:00.22 13.6% 0+0k 0+0io 122pf+0w
  So the stuff fell out of cache and we need to read it again.
 Shouldn't this take only 15 seconds, then?

Probably there was some seeking due to removal of lots of blocks.

 Worst case was ~5 minutes.

Yeah, this is of course sad.
BTW is this with search_reada patch? What if you try without it?

Bye,
Oleg


Re: Horrible ftruncate performance

2003-07-11 Thread Oleg Drokin
Hello!

On Fri, Jul 11, 2003 at 05:27:25PM +0200, Dieter Nützel wrote:
  Actually I did it already, as data-logging patches can be applied to
  2.4.22-pre3 (where this truncate patch was included).
 No -aaX.

Right.

   Maybe it _IS_ time for this _AND_ all the other data-logging patches?
   2.4.22-pre5?
  It's Chris turn. I thought it is good idea to test in -ac first, though
  (even taking into account that these patches are part of SuSE's stock
  kernels).
 I don't think -ac would make it = No big Reiser involved...

Would make what?
I think Alan have agreed to put data-logging code in already.

Bye,
Oleg


Re: Horrible ftruncate performance

2003-07-11 Thread Oleg Drokin
Hello!

On Fri, Jul 11, 2003 at 05:32:49PM +0200, Dieter Nützel wrote:
   OK some hand work...
 Where comes this from?

It was there for a long time. For at least 2 years, I'd say.

 I don't find it my tree:

reiserfs quota patch got rid of it.
Here's relevant part of my diff:
if (retval) {
-   reiserfs_free_block (th, allocated_block_nr);
+   reiserfs_free_block (th, inode, allocated_block_nr, 1);
goto failure;
}
-   if (done) {
-   inode->i_blocks += inode->i_sb->s_blocksize / 512;
-   } else {
+   if (!done) {
/* We need to mark new file size in case this function will be
   interrupted/aborted later on. And we may do this only for
   holes. */

Bye,
Oleg


Re: Horrible ftruncate performance

2003-07-11 Thread Oleg Drokin
Hello!

On Fri, Jul 11, 2003 at 05:34:12PM +0200, Marc-Christian Petersen wrote:

  Actually I did it already, as data-logging patches can be applied to
  2.4.22-pre3 (where this truncate patch was included).
   Maybe it _IS_ time for this _AND_ all the other data-logging patches?
   2.4.22-pre5?
  It's Chris turn. I thought it is good idea to test in -ac first, though
  (even taking into account that these patches are part of SuSE's stock
  kernels).
 Well, I don't think that testing in -ac is necessary at all in this case.

Maybe not. But it is still useful ;)

 I am using WOLK on many production machines with ReiserFS mostly as Fileserver 
 (hundred of gigabytes) and proxy caches.

I am using this code on my production server myself ;)

 If someone would ask me: Go for 2.4 mainline inclusion w/o going via -ac! :)

Chris should decide (and Marcelo should agree). (Actually Chris thought it was a good
idea to submit data-logging to Marcelo now, too.) I have no objections.
Also, now that the quota v2 code is in place, even the quota code can be included.

It would also be great to port this stuff to 2.5 (yes, I know Chris wants this to be
in 2.4 first).

Bye,
Oleg


Re: df shows 172GB reiserFS partition as 109GB partition

2003-07-08 Thread Oleg Drokin
Hello!

On Mon, Jul 07, 2003 at 07:49:22PM +0200, Yasuo Iwakura wrote:

 I have a IBM Deskstar 180GB Harddrive and I created a 172GB Reiser FS
 partition, using drakconf (mdk-9.1-ger).
 The Problem is, df show only 109GB, same with  windows(samba) -
 Windows says 108GB. (btw, the Bios thinks 130GB but i think thats not
 important)
 df (coreutils) 4.5.7
 Linux version 2.4.21-0.13mdk
 /dev/hda1
 - cylinder 1-22526
 - 180940063+ blocks
 - ID 83  Linux
 

Hm, strange.
Looks like the fs was created with a smaller size for some unknown reason.
Try issuing this command (once) while the fs is mounted:
mount /dev/hda1 -t reiserfs -o resize=45235015 

Then see if df now reports the correct number.
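
The resize= value above looks like the partition size expressed in filesystem blocks. A minimal sketch of that arithmetic, assuming the 180940063 figure in the quoted partition listing counts 1 KiB blocks and the filesystem uses 4096-byte blocks (neither is stated in the thread):

```python
# Assumptions (not stated in the thread): the partition listing counts
# 1 KiB blocks and the reiserfs block size is 4096 bytes.
PARTITION_KIB_BLOCKS = 180940063   # from the quoted partition listing
FS_BLOCK_SIZE = 4096


def fs_blocks(kib_blocks, fs_block_size=FS_BLOCK_SIZE):
    """Whole filesystem blocks that fit into the given number of 1 KiB blocks."""
    return (kib_blocks * 1024) // fs_block_size


resize_blocks = fs_blocks(PARTITION_KIB_BLOCKS)
size_gib = resize_blocks * FS_BLOCK_SIZE / 2**30

print(resize_blocks)        # 45235015, the value passed to -o resize=
print(round(size_gib, 1))   # size in GiB that df should report afterwards
```

Under those assumptions the result matches the number in the mount command and comes out at a bit over 172 GiB, which agrees with the partition size in the report.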

Bye,
Oleg


Re: add_save_link:search_by_key

2003-07-06 Thread Oleg Drokin
Hello!

On Mon, Jul 07, 2003 at 07:51:01AM +0200, Trond Hagen wrote:

 Thanks, but the Red Hat kernel is a 2.4.20 why isn't the fix merged in ?

Oops. I meant that the fix was merged into 2.4.21.

Bye, 
Oleg
 On Sat, 2003-07-05 at 13:46, Oleg Drokin wrote:
  Hello!
  
  On Fri, Jul 04, 2003 at 11:32:16PM +0200, Trond Hagen wrote:
  
   why I'm getting a lot of these ? 
   Jul  4 22:24:59 db-http1 kernel: vs-2100: add_save_link:search_by_key
   ([-1 15610004 0x1001 DIRECT]) returned 1
   I'm running: 2.4.20-13.7smp  (Red Hat 7.3 kernel)
  
  Sounds like you've been bitten by infamous iget4() race. The fix is
  merged into 2.4.20
  The individual fix can be obtained from
  ftp://ftp.namesys.com/pub/reiserfs-for-2.4/2.4.20-pending/07-race-fix.diff
  Don't forget to fsck your reiserfs filesystems after applying
  the fix (use latest reiserfsprogs from our ftp site, not the old
  stuff from RH 7.3!)
  
  Bye,
  Oleg
 




Re: reiserfs on removable media

2003-07-03 Thread Oleg Drokin
Hello!

On Wed, Jul 02, 2003 at 02:23:13PM -0400, Zygo Blaxell wrote:

 - If the device is detached while a filesystem is mounted, reiserfs gets a
 whole lot of I/O errors (or worse) and immediately oopses.  It would be
 nice if reiserfs would handle this a bit more gracefully--it should at
 least let me kill processes with open files and umount the filesystem.
 OTOH many other things also oops with with current USB/firewire/scsi device
 driver stack too.  :-P

Write errors to data areas are mostly safe.
It's write errors into the journal area that kill the thing.
Jeff Mahoney of SuSE has a patch that remounts the FS R/O in
case of such an event (I think he even posted some preliminary patches
here); it is most probably what you need in this case.

Bye,
Oleg


Re: Journal-601 error on Redhat 7.3 / reiserfs / ext3 / raid 5

2003-07-03 Thread Oleg Drokin
Hello!

On Thu, Jul 03, 2003 at 01:14:08AM +0300, Jussi Vainionpää wrote:

 Apr 27 20:18:06 un kernel: journal-601, buffer write failed
 I do not know who to blame here. Try to heavily write to loop device 
 itself (without using
 reiserfs) to see if something will break? Or bettr yet - upgrade to newer 
 kernel and see if that's
 cures your problem?
 I tried the same operation using ext2 instead of reiserfs and at least that 
 worked without any problems.

ext2 does not wait on buffers unless you operate in sync mode, so it won't notice.
Try the ext2 with -o sync then?

Bye,
Oleg


data-logging for 2.4.22-pre3

2003-07-02 Thread Oleg Drokin
Hello!

   Yes, I know that 2.4.22-pre3 is not out yet, but Marcelo has accepted our somewhat
   big patches, so you can get replacement patches from
   ftp://namesys.com/pub/reiserfs-for-2.4/testing/data-logging-and-quota-2.4.22-pre3
   once 2.4.22-pre3 is out ;)
   Also, starting from 2.4.22-pre3 you no longer need to apply the
   03-relocation-8.diff.gz patch.

Bye,
Oleg


Re: vpf-10680, minor corruptions

2003-06-27 Thread Oleg Drokin
Hello!

On Fri, Jun 27, 2003 at 04:38:00PM +0400, Oleg Drokin wrote:

 I was looking in the wrong direction, when I produced that patch,
 so it will produce zero output.
 I hope to come up with ultimate fix soon enough. ;)

Well, there is a patch below that does *not* work for me ;)
But it should work.
I have traced the new problem to a cross compiler that, for whatever reason,
compiles code differently than the native compiler
(a demo is attached as the test.c program; it should print "result is 1"
if it is compiled correctly and stuff about unknown
uniqueness if it is miscompiled. In fact, maybe this is just correct compiler
behaviour.)
I now think that when I compile a kernel with the native compiler, it should work
with the patch below. But it seems I can verify that only tomorrow.
You might try that patch as well to see if it helps you before I try it ;)
The patch is obviously correct (except that it does not work
with my cross compiler, and the kernel does work without the patch, which is
really-really strange).

= fs/reiserfs/bitmap.c 1.26 vs edited =
--- 1.26/fs/reiserfs/bitmap.c   Sun May 18 01:09:36 2003
+++ edited/fs/reiserfs/bitmap.c Fri Jun 27 16:58:44 2003
@@ -43,7 +43,7 @@
 test_bit(_ALLOC_ ## optname , SB_ALLOC_OPTS(s))
 
 static inline void get_bit_address (struct super_block * s,
-   unsigned long block, int * bmap_nr, int * offset)
+   b_blocknr_t block, int * bmap_nr, int * offset)
 {
 /* It is in the bitmap block number equal to the block
  * number divided by the number of bits in a block. */
@@ -54,7 +54,7 @@
 }
 
 #ifdef CONFIG_REISERFS_CHECK
-int is_reusable (struct super_block * s, unsigned long block, int bit_value)
+int is_reusable (struct super_block * s, b_blocknr_t block, int bit_value)
 {
 int i, j;
 
@@ -107,7 +107,7 @@
 static inline  int is_block_in_journal (struct super_block * s, int bmap, int off, int *next)
 {
-unsigned long tmp;
+b_blocknr_t tmp;
 
 if (reiserfs_in_journal (s, bmap, off, 1, &tmp)) {
 if (tmp) {  /* hint supplied */
@@ -235,7 +235,7 @@
 /* Tries to find contiguous zero bit window (given size) in given region of
  * bitmap and place new blocks there. Returns number of allocated blocks. */
 static int scan_bitmap (struct reiserfs_transaction_handle *th,
-   unsigned long *start, unsigned long finish,
+   b_blocknr_t *start, b_blocknr_t finish,
int min, int max, int unfm, unsigned long file_block)
 {
 int nr_allocated=0;
@@ -281,7 +281,7 @@
 }
 
 static void _reiserfs_free_block (struct reiserfs_transaction_handle *th,
- unsigned long block)
+ b_blocknr_t block)
 {
 struct super_block * s = th->t_super;
 struct reiserfs_super_block * rs;
@@ -327,7 +327,7 @@
 }
 
 void reiserfs_free_block (struct reiserfs_transaction_handle *th, 
-  unsigned long block)
+  b_blocknr_t block)
 {
 struct super_block * s = th->t_super;
 
@@ -340,7 +340,7 @@
 
 /* preallocated blocks don't need to be run through journal_mark_freed */
 void reiserfs_free_prealloc_block (struct reiserfs_transaction_handle *th, 
-  unsigned long block) {
+  b_blocknr_t block) {
 RFALSE(!th->t_super, "vs-4060: trying to free block on nonexistent device");
 RFALSE(is_reusable (th->t_super, block, 1) == 0, "vs-4070: can not free such block");
 _reiserfs_free_block(th, block) ;
@@ -589,15 +589,15 @@
 
 static inline int old_hashed_relocation (reiserfs_blocknr_hint_t * hint)
 {
-unsigned long border;
-unsigned long hash_in;
+b_blocknr_t border;
+u32 long hash_in;
 
 if (hint->formatted_node || hint->inode == NULL) {
 return 0;
   }
 
 hash_in = le32_to_cpu((INODE_PKEY(hint->inode))->k_dir_id);
-border = hint->beg + (unsigned long) keyed_hash(((char *) (&hash_in)), 4) % (hint->end - hint->beg - 1);
+border = hint->beg + (u32) keyed_hash(((char *) (&hash_in)), 4) % (hint->end - hint->beg - 1);
 if (border > hint->search_start)
 hint->search_start = border;
 
@@ -606,7 +606,7 @@
   
 static inline int old_way (reiserfs_blocknr_hint_t * hint)
 {
-unsigned long border;
+b_blocknr_t border;
 
 if (hint->formatted_node || hint->inode == NULL) {
 return 0;
@@ -622,7 +622,7 @@
 static inline void hundredth_slices (reiserfs_blocknr_hint_t * hint)
 {
 struct key * key = hint->key;
-unsigned long slice_start;
+b_blocknr_t slice_start;
 
 slice_start = (keyed_hash((char*)(&key->k_dir_id),4) % 100) * (hint->end / 100);
 if ( slice_start > hint->search_start || slice_start + (hint->end / 100) <= hint->search_start) {
@@ -910,7 +910,7 @@
 int reiserfs_can_fit_pages ( struct super_block *sb /* superblock of filesystem
   to estimate space

Re: Write-once file system

2003-06-27 Thread Oleg Drokin
Hello!

On Fri, Jun 27, 2003 at 09:07:05AM -0700, Fong Vang wrote:
 Once the write to the file is CLOSED the file should not be modifiable in
 any way.  It should not be writeable by root.  Ideally, this should be
 across reboot and across kernel.  The current requirement is that as long as
 the modified kernel/reisefs is being used then it should NOT be modifiable
 (if a kernel allowing modification is used then it could allow
 modifications).

So basically, do you think it would be better for you to have a write-once flag in
the superblock that will make all files unwritable (except newly created ones), as
opposed to a simple mount option that you'd use for filesystems with non-changeable files?
(You need to mark filesystems that are in write-once mode somehow, because I think
you do not need all reiserfs filesystems to be run in this mode, right?)
Also, concerning the requirement that root should not be able to change the files: root
will be able to overwrite files by using block devices if he wants to.

Bye,
Oleg


Re: vpf-10680, minor corruptions

2003-06-27 Thread Oleg Drokin
Hello!

On Fri, Jun 27, 2003 at 12:23:07PM -0400, Chris Mason wrote:

 Most of these changes are in 2.4.21, which I've been using on an AMD64

Not the reiserfs_file_write() ones.

 bit box for a while without any problems.  The bug should be somewhere
 else, it looks to me like these spots aren't trying to send an unsigned
 long to disk.

The reiserfs_file_write() code
has an array of b_blocknr_t elements.
It then submits this array to reiserfs_paste_into_item/reiserfs_insert_item,
but b_blocknr_t is unsigned long (read: 64 bit on alpha, oops).
The funny thing is that when I declare b_blocknr_t as u32, the kernel basically falls apart
if cross compiled. E.g. key comparison does not work and
all kinds of weird things start to happen.

In short - if you want to make sure the bug is there - compile 2.5.70+ code
on any 64 bit platform, write any file bigger than 2 blocks,
unmount and remount the fs and see what's in the file.
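
The size mismatch described above can be illustrated outside the kernel. This is only a sketch, assuming a little-endian machine and an on-disk format that expects 32-bit block numbers; the block numbers are made up:

```python
import struct

# Hypothetical block numbers; any small values show the effect.
block_numbers = [1000, 1001, 1002]

# In-memory array on a 64-bit build, if each element is an 8-byte
# unsigned long (little-endian layout assumed).
as_unsigned_long = struct.pack("<3Q", *block_numbers)

# What a 32-bit-per-entry on-disk format would expect instead.
as_u32 = struct.pack("<3I", *block_numbers)

# Interpreting the 64-bit array as a sequence of 32-bit entries.
misread = list(struct.unpack("<6I", as_unsigned_long))

print(len(as_unsigned_long), len(as_u32))  # 24 12
print(misread)                             # [1000, 0, 1001, 0, 1002, 0]
```

Every second entry reads back as zero, so a file bigger than a couple of blocks would point at the wrong blocks once it is read back from disk.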

Bye,
Oleg


Re: Write-once file system

2003-06-27 Thread Oleg Drokin
Hello!

On Fri, Jun 27, 2003 at 10:07:05AM -0700, Fong Vang wrote:
 this doesn't seem to work on kernel 2.4.20.  I did a chattr +i on file but
 rm -rf (as root) on the file deletes it.

You need to mount with the -o attrs mount option for extended attributes to work.

Bye,
Oleg


Re: Write-once file system

2003-06-27 Thread Oleg Drokin
Hello!

On Fri, Jun 27, 2003 at 11:27:22AM -0600, 'Andreas Dilger' wrote:

  this doesn't seem to work on kernel 2.4.20.  I did a chattr +i on file but
  rm -rf (as root) on the file deletes it.
 That is a reiserfs bug then...  I just tested it with ext3 and it worked as
 expected.

No, it is a documented reiserfs feature. You must enable extended attributes
support at mount time.

Bye,
Oleg


Re: Quota

2003-06-25 Thread Oleg Drokin
Hello!

On Wed, Jun 25, 2003 at 06:22:00PM +0800, SteelRat wrote:

 Can you help me.
 How can i set quotas to reiserfs?

Patches for recent kernels (2.4.21+) are available at
ftp://ftp.suse.com/pub/people/mason/patches/data-logging/2.4.21
Patches for 2.4.20 are available at
ftp://namesys.com/pub/reiserfs-for-2.4/testing/quota-2.4.20

Apply the patches, recompile your kernel with quota support, upgrade your quota
tools if needed and follow directions from Quota-HOWTO.

Bye,
Oleg


Re: vpf-10680, minor corruptions

2003-06-24 Thread Oleg Drokin
Hello!

On Mon, Jun 23, 2003 at 03:38:20PM +0200, Christian Kujau wrote:

 as stated before, the corruptions occur only on this very alpha machine, 

Well, I still cannot build the kernel myself and am still working on it.
(I get make: *** [vmlinux] Error 139 and a zero-length vmlinux.)

BTW, I realised that I have not looked into your kernel config for that box,
can you send it to me please?

 bread: Cannot read the block (523914): (Input/output error).

Hm, but still it means kernel returned some error for read request.

 hah! i was not aware that the disk might have an hw problem, not a 
 single error ever showed up in my logs. this was weird. so i 
 re-partitioned the disk with a 10MB sde (to circumvent the bread error) 
 on the beginning and a 2 GB sde2. now reiserfsck/cp/diff are all working 
 fine under 2.4.21, but 2.5.72 is still erroneous.

Sigh.

 
 btw: i am still using reiserfsprogs 3.6.8 now (since debian/testing has 
 3.6.6) and i have compiled these utils under a 2.5.72 kernel. is it safe 
 to use them under 2.4 ?

I see that you have used 2.5.70 and earlier kernels on alpha too.
Do you have any idea of when stuff broke for you?

Bye,
Oleg


Re: vpf-10680, minor corruptions

2003-06-24 Thread Oleg Drokin
Hello!

On Wed, Jun 25, 2003 at 02:42:24AM +0200, Christian Kujau wrote:
 (/lib/modules/2.5.65/kernel/fs/reiserfs/reiserfs.ko): Invalid module format
 lila:~# uname -a
 Linux lila 2.5.65 #4 Wed Jun 25 00:48:46 CEST 2003 alpha GNU/Linux
 i compiled the module with CONFIG_REISERFS_CHECK=y.
 shall i go on with 2.5.64 or better 2.5.67 ?

Try compiling the kernel that is known-bad for you with CONFIG_REISERFS_CHECK=y
(e.g. 2.5.72/2.5.73).

Bye,
Oleg


Re: 2.4.21 reiserfs oops

2003-06-23 Thread Oleg Drokin
Hello!

On Mon, Jun 23, 2003 at 11:16:27PM +0100, Nix wrote:

  Jun 22 13:52:42 loki kernel: Unable to handle kernel NULL pointer dereference at 
  virtual address 0001 
  This is very strange address to oops on.
 I'll say! Looks almost like it JMPed to a null pointer or something.

No, if it'd jumped to a NULL pointer, we'd see 0 in EIP.

  Jun 22 13:52:43 loki kernel: EIP:0010:[c0092df4]Not tainted 
  And the EIP is prior to kernel start which is also very strange.
  On the other hand the address c0192df4 is somewhere inside reiserfs code,
  so it looks like a single bit error, I'd say.
 I think it unlikely to be RAM problems given that the problem happened
 shortly after upgrading to 2.4.21; this was about half a day after I
 rebooted it because it threw a pile of never-seen-again, un-syslogged
 SCSI abort errors at me (sym53c875); and *that* was a few minutes after
 I rebooted into 2.4.21 for the first time.

Hm, so first there were some scsi problems and then reiserfs oops?

Actually since the RAM is good, I see no good reason for this to happen.
(actually I see no good reason for valid code before _text, either).

I wonder if 2.4.21 constantly crashes like that for you, then?
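
The single-bit-error reading of the quoted EIP can be checked with a quick XOR of the two addresses mentioned above (both values are taken from the quoted text):

```python
reported_eip = 0xC0092DF4   # EIP printed in the oops
expected_eip = 0xC0192DF4   # address that falls inside reiserfs code

difference = reported_eip ^ expected_eip
flipped_bits = bin(difference).count("1")

print(hex(difference), flipped_bits)  # 0x100000 1
```

The two addresses differ in exactly one bit (bit 20), which is what makes a flipped bit a plausible explanation.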

Bye,
Oleg


Re: illegal seek - unable to umount device

2003-06-23 Thread Oleg Drokin
Hello!

On Mon, Jun 23, 2003 at 07:44:57PM +0200, Gyimesi Akos wrote:

 I have just encountered a problem (?) with reiserfs on Linux 2.4.20. I wrote 
 the following (shortened) python code which produced an unkillable process:
 #!/usr/bin/python
 file=open("terabyte_length_file", "w")
 file.seek(1024*1024*1024*1024)
 file.write("%c" % 0)
 file.close()
 On ext2, this code produced a file which had the (nominal) length of 1 
 terabyte and the real size of 16k. I ran this script on a reiserfs partition 
 as a normal user, and it
 1. produced an unkillable process with 100% CPU usage
 2. i was unable to kill it anyway, and i was unable to unmount the filesystem.
(Illegal seek, Device or resource busy). So finally i rebooted the system.
Of course, the rebooting process also got stuck as init couldn't umount
the device either.
 Naturally, i was not really surprised that this code didn't work on reiserfs, 
 but i was quite astonished by the fact that this dirty program could 
 practically kill the machine as a normal user - in a meaning that running 
 it several times consumes its CPU resouces and makes it impossible to unmount 
 the filesystem.

This is a known problem and we hope to push the fix to Marcelo soon.
In short, the process is not stuck; it is busy creating the file hole (4k at a time).
If you wait long enough, it will eventually finish.
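
To give a feeling for why this takes so long, here is the step count for the reported script, assuming each step covers exactly 4096 bytes of the hole:

```python
SEEK_OFFSET = 1024 * 1024 * 1024 * 1024   # offset used in the reported script
HOLE_STEP = 4096                          # "4k at a time"

steps = SEEK_OFFSET // HOLE_STEP
print(steps)  # 268435456
```

That is 2**28 separate steps for a single one-byte write, so the process looks hung even though it is making progress.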

Thank you for the report.

Bye,
Oleg


Re: Problem mounting ReiserFS as root partition

2003-06-16 Thread Oleg Drokin
Hello!

On Sun, Jun 15, 2003 at 03:08:39PM +0200, Till Gerken wrote:

 I am not able to mount my ReiserFS partition as root partition with any kernel 
 later than 2.4.20-pre11. When trying, I get
 read_super_block: can't find a reiserfs filesystem on (dev 03:00, block 64, 
 size 1024)

Are you sure you want to mount /dev/hda as opposed to some partition on /dev/hda?
Please check that you are mounting the correct thing (the root= option in lilo.conf, for
example).

Bye,
Oleg


Re: Will Reisefs have undo?

2003-06-16 Thread Oleg Drokin
Hello!

On Sun, Jun 15, 2003 at 02:29:26PM -0500, Alex Malinovich wrote:

 I don't think snapshots are really needed. I would be perfectly content
 with a semi-intelligent filesystem that would mark files as deleted in
 the journal while leaving the file intact on the HD. As soon as the file
 has actually been overwritten, it is marked purged and cannot be
 recovered. But up to that point, it would be a relatively simple task to
 just tell the journal to mark the file as not-deleted again.

Hm. You seem to confuse the journal with a log of operations.
The journal just holds copies of the blocks we are going to overwrite,
to achieve atomic block overwrite. So we cannot mark something as deleted in the journal.
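
A toy model may make the distinction clearer. This is not reiserfs code, just a sketch of a block journal under the simplifying assumption that a transaction is nothing more than a set of block numbers with the bytes to be written there (ToyJournal and its methods are made-up names):

```python
class ToyJournal:
    """Toy block journal: stores whole-block copies and nothing else."""

    def __init__(self, disk):
        self.disk = disk   # dict: block number -> bytes ("home" locations)
        self.log = []      # committed transactions, oldest first

    def commit(self, new_blocks):
        """Log full copies of the blocks that are about to be overwritten in place."""
        self.log.append(dict(new_blocks))

    def replay(self):
        """Write every logged block to its home location, oldest first."""
        for transaction in self.log:
            self.disk.update(transaction)
        self.log.clear()


disk = {7: b"old-A", 8: b"old-B"}
journal = ToyJournal(disk)
journal.commit({7: b"new-A", 8: b"new-B"})
# A crash at this point leaves the old contents in place; replay then
# applies both blocks together, never just one of them.
journal.replay()
print(disk)
```

Nothing in such a journal knows about files or operations like "delete"; it only sees block numbers and raw block contents, so there is no entry that could later be flipped back to "not deleted".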

Bye,
Oleg


Re: Problem mounting ReiserFS as root partition

2003-06-16 Thread Oleg Drokin
Hello!

On Mon, Jun 16, 2003 at 11:16:52AM +0200, Till Gerken wrote:

 kernel (hd0,0)/boot/vmlinuz-2.4.21 root=/dev/hda1,rw hdc=ide-scsi pci=biosirq 
 idebus=66

I suggest you replace the comma between root=/dev/hda1 and rw with a space.
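
Why the comma matters can be seen with a simplified parser. This is not the kernel's actual command line code, only a sketch assuming parameters are separated by whitespace and split at the first "=" (parse_cmdline is a made-up helper):

```python
def parse_cmdline(cmdline):
    """Split a boot command line on whitespace into name -> value pairs."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else True   # bare flags such as "rw"
    return params


broken = parse_cmdline("root=/dev/hda1,rw hdc=ide-scsi pci=biosirq idebus=66")
fixed = parse_cmdline("root=/dev/hda1 rw hdc=ide-scsi pci=biosirq idebus=66")

print(broken["root"])   # /dev/hda1,rw
print(fixed["root"])    # /dev/hda1
print("rw" in broken, "rw" in fixed)   # False True
```

With the comma, ",rw" becomes part of the root device name instead of being a separate option.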
 
Bye,
Oleg


Re: Identifying files with badblocks

2003-06-09 Thread Oleg Drokin
Hello!

On Sun, Jun 08, 2003 at 11:16:05PM +0200, Felix E. Klee wrote:
 I am using a ReiserFS 3.6 formatted IBM-DJSA-220-ATA-harddisk with SuSE 
 LINUX 8.2. Today, by using 
 badblocks -s /dev/hda under LINUX 
 and IBM/Hitachi's
 Drive Fitness under DOS, 
 I found that the drive contains a continuous section of bad blocks. The 
 Drive Fitness Utility has an option to repair the corresponding sector but 
 this will destroy all data in it. This is OK, but I need to know what data 
 is destroyed so that I can recreate it later. So, now my question:
 How do I find out which files correspond to certain bad blocks on my Reiser 
 file system?

Well, you can do
debugreiserfs -d /dev/your_device > somefile
then look up the block number there as a text string.
This will give you the file's key.
Then look up the direntry by this key.

Bye,
Oleg


Re: Error In Dmesg

2003-05-31 Thread Oleg Drokin
Hello!

On Fri, May 30, 2003 at 08:56:32AM -0400, Bill Rees wrote:
 My application is running with the Sun jdk 1.4.1_02 under Red Hat 9.0 and
 I've received this  error in dmesg:

Do you have any way to reproduce?

 Unable to handle kernel NULL pointer dereference at virtual address 0018
  printing eip:
 e092b263
 *pde = 
 Oops: 
 CPU:0
 EIP:0060:[e092b263]Not tainted
 EFLAGS: 00010282
 EIP is at do_journal_end [reiserfs] 0x3b3 (2.4.20-8smp)

So it died here:
  /* for each real block, add it to the journal list hash,
  ** copy into real block index array in the commit or desc block
  */
  for (i = 0, cn = SB_JOURNAL(p_s_sb)->j_first ; cn ; cn = cn->next, i++) {
    if (test_bit(BH_JDirty, &cn->bh->b_state) ) {

(in test_bit) because cn->bh is zero.
Hm. Chris, do you have any ideas how that might have happened?

Bye,
Oleg


Re: About Reiser4 release date ...

2003-05-30 Thread Oleg Drokin
Hello!

On Thu, May 29, 2003 at 12:09:31AM +0200, Fred -- Speed Up -- wrote:

 How about the BitKeeper repositery : does it contain the latest 2.5 kernel sources 
 along with your Reiser4 developement patches ?

Yes, our reiser4-linux-2.5 bk repository contains the latest 2.5 kernel sources + patches
to make the UML arch functional + the necessary
reiser4 changes. And our reiser4 bk repository contains the current reiser4 code.

Bye,
Oleg


Re: disk or reiserfs problem?

2003-05-29 Thread Oleg Drokin
Hello!

On Wed, May 28, 2003 at 01:07:27PM -0700, Jeff Breidenbach wrote:
 This is after a hard (power switch) reboot (due to I/O errors). The
 disk in question has about 125 GB of data on a single 200GB reiserfs
 partition. Do people think the disk is toast, or is this possibly some
 correctable filesystem problem? The machine is remote, so I can't
 hdb1: bad access: block=35, count=5
 end_request: I/O error, dev 03:41 (hdb), sector 35

Looks like the disk has gone bad. If you are lucky enough, some of the data
can still be recovered. Try to copy the entire disk into a file or to another
disk to see how many bad sectors there are.
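
One way to do such a copy while counting the unreadable spots is sketched below. It is an illustration only, assuming a Linux system where seeking to the end of a block device returns its size; copy_with_error_count is a made-up helper, and the 4096-byte step is an arbitrary choice:

```python
import os


def copy_with_error_count(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block; zero-fill and count unreadable blocks."""
    bad_blocks = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    # Unreadable block: keep a zero-filled placeholder so
                    # offsets in the copy still match the original.
                    data = b"\0" * want
                    bad_blocks += 1
                if not data:
                    break   # unexpected end of input
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src)
    return bad_blocks


# Example call (paths are placeholders, not taken from the thread):
# bad = copy_with_error_count("/dev/hdb", "/mnt/spare/hdb.img")
```

The returned count gives a rough idea of how much of the disk is gone, and the image can then be examined without stressing the failing drive any further.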

Bye,
Oleg


Re: [PAID PRIORITY SUPPORT REQUEST] Quotas not fully working in 2.4.21-pre5

2003-04-05 Thread Oleg Drokin
Hello!

On Sat, Apr 05, 2003 at 01:11:18PM +0200, Philippe Gramoullé wrote:
   |  Object: Can't set quotas with 2.4.21-pre5
   |   After we upgraded from 2.4.19-pre6 + quota patches , 2.4.21-pre5 + 
 data-logging and quota
   |  patches, we can't set quotas anymore.
   | 
   | Have you enabled all the compatibility stuff in kernel config? (show the 
 relevant part of .config please).
 Well, i think so yes.
 You'll find attached the .config.

# CONFIG_QIFACE_COMPAT is not set
suggests that you do not have the compatibility stuff enabled.
Please enable it and see if it cures the problem.
At least that's how I forced my quotatools to work.
Well, mine are 3.03 and yours are 3.07; the help text says that this option is only
required for quota tools <= 3.04. But I still think you should try it.
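
Checking a .config for this can be scripted. A small sketch, assuming the usual .config conventions ("SYMBOL=value" for enabled options, "# SYMBOL is not set" for disabled ones); kconfig_value and the sample text are made up for illustration:

```python
def kconfig_value(config_text, symbol):
    """Return the value of a symbol from kernel .config text, or None if unset."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "# %s is not set" % symbol:
            return None
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None   # symbol not mentioned at all


# Made-up sample; only the second line comes from the report above.
sample = "CONFIG_QUOTA=y\n# CONFIG_QIFACE_COMPAT is not set\n"

print(kconfig_value(sample, "CONFIG_QIFACE_COMPAT"))   # None
print(kconfig_value(sample, "CONFIG_QUOTA"))           # y
```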

Bye,
Oleg


fixes/changes to mount options parser

2003-04-04 Thread Oleg Drokin
Hello!

   Ok, so after some silence on this front, here are the 2.4 and 2.5 versions of
   the mount options parser fixes I propose.
   These fixes consist of:
   When you pass some mount options at mount time, default mount options are not reset
   if what you pass does not change the defaults (both in 2.4 and 2.5).
   If you are doing a remount and the parser detects an error, the remount fails (2.4 only).
   If you pass more than one jdev= option, the parser spits out an error (2.5 only, as 2.4
   does not have this yet).
   Better support for remount options (2.5 did not have the ability to propagate mount
   options at all; by Jeff Mahoney).

   What this patch does not do, but was supposed to do:
 Hans decided that conflicting mount options on one line (like
 tails=off,tails=small) should produce an error on mount/remount.
 After I implemented this, it turned out that this does not work with remounting:
 if you mount with -o tails=off, and then later remount with -o tails=small,
 the filesystem will be passed an options string like tails=off,tails=small.
 This seems to be a feature of mount(8).
 So the only option that cannot appear on the command line twice is jdev (resize
 can be met more than once if you enlarged the drive twice in a row).
 So I reverted to the old way, where the last option takes effect.
 As I do not want to split the code to determine whether this is a mount or a
 remount, this behaviour will take place in case of both mount and remount.
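
 The "last option takes effect" behaviour falls out naturally from the setmask/clrmask pairs in the patch. A rough sketch of the idea, not the kernel code: the bit positions are placeholders, apply_hash_options is a made-up helper, and only the hash= option is handled:

```python
# Bit positions are placeholders; the real values live in the reiserfs headers.
FORCE_TEA_HASH, FORCE_RUPASOV_HASH, FORCE_R5_HASH = 0, 1, 2

# value -> (setmask, clrmask), mirroring the hash[] table in the patch
HASH_VALUES = {
    "rupasov": (1 << FORCE_RUPASOV_HASH,
                (1 << FORCE_TEA_HASH) | (1 << FORCE_R5_HASH)),
    "tea": (1 << FORCE_TEA_HASH,
            (1 << FORCE_RUPASOV_HASH) | (1 << FORCE_R5_HASH)),
    "r5": (1 << FORCE_R5_HASH,
           (1 << FORCE_RUPASOV_HASH) | (1 << FORCE_TEA_HASH)),
}


def apply_hash_options(mount_opt, options):
    """Apply each hash=... option in order; later ones override earlier ones."""
    for item in options.split(","):
        name, _, value = item.partition("=")
        if name != "hash":
            continue   # other options are ignored in this sketch
        setmask, clrmask = HASH_VALUES[value]
        mount_opt = (mount_opt & ~clrmask) | setmask   # clear first, then set
    return mount_opt


result = apply_hash_options(0, "hash=tea,hash=r5")
print(result == 1 << FORCE_R5_HASH)   # True
```

 Because every value clears the bits of its competitors before setting its own, a string such as the one mount(8) builds on remount simply ends up with the last value, and no error handling is needed for the conflict.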

 Attached are three patches:
 2.4.20_parsefix.diff is the patch for 2.4.20
 2.4_parsefix.diff is the patch against Marcelo's latest bk tree (2.4)
 2.5_parsefix.diff is the patch against Linus' latest bk tree (2.5)

 This code was only tested by me, and I want to hear any opinions on ways to
 improve it before I pass it to our tester and start trying to submit it upstream.
 So if you want to try the code, treat it as experimental.
 (BTW, I wonder how often people actually pass any mount options at all,
 and how often a remount is made with any additional options?)

Bye,
Oleg
--- fs/reiserfs/super.c.1   Fri Apr  4 18:39:16 2003
+++ fs/reiserfs/super.c Fri Apr  4 19:31:39 2003
@@ -402,8 +402,11 @@
mount options that have values rather than being toggles. */
 typedef struct {
 char * value;
-int bitmask; /* bit which is to be set in mount_options bitmask when this
-value is found, 0 is no bits are to be set */
+int setmask; /* bitmask which is to set on mount_options bitmask when this
+value is found, 0 is no bits are to be changed. */
+int clrmask; /* bitmask which is to clear on mount_options bitmask when this
+   value is found, 0 is no bits are to be changed. This is
+   applied BEFORE setmask */
 } arg_desc_t;
 
 
@@ -413,37 +416,42 @@
 char * option_name;
 int arg_required; /* 0 is argument is not required, not 0 otherwise */
 const arg_desc_t * values; /* list of values accepted by an option */
-int bitmask;  /* bit which is to be set in mount_options bitmask when this
-option is selected, 0 is not bits are to be set */
+int setmask; /* bitmask which is to set on mount_options bitmask when this
+   value is found, 0 is no bits are to be changed. */
+int clrmask; /* bitmask which is to clear on mount_options bitmask when this
+   value is found, 0 is no bits are to be changed. This is
+   applied BEFORE setmask */
 } opt_desc_t;
 
 
 /* possible values for -o hash= and bits which are to be set in s_mount_opt
of reiserfs specific part of in-core super block */
 const arg_desc_t hash[] = {
-{"rupasov", FORCE_RUPASOV_HASH},
-{"tea", FORCE_TEA_HASH},
-{"r5", FORCE_R5_HASH},
-{"detect", FORCE_HASH_DETECT},
-{NULL, 0}
+{"rupasov", 1<<FORCE_RUPASOV_HASH,(1<<FORCE_TEA_HASH)|(1<<FORCE_R5_HASH)},
+{"tea", 1<<FORCE_TEA_HASH,(1<<FORCE_RUPASOV_HASH)|(1<<FORCE_R5_HASH)},
+{"r5", 1<<FORCE_R5_HASH,(1<<FORCE_RUPASOV_HASH)|(1<<FORCE_TEA_HASH)},
+{"detect", 1<<FORCE_HASH_DETECT, (1<<FORCE_RUPASOV_HASH)|(1<<FORCE_TEA_HASH)|(1<<FORCE_R5_HASH)},
+{NULL, 0, 0}
 };
 
 
 /* possible values for -o block-allocator= and bits which are to be set in
s_mount_opt of reiserfs specific part of in-core super block */
 const arg_desc_t balloc[] = {
-{"noborder", REISERFS_NO_BORDER},
-{"no_unhashed_relocation", REISERFS_NO_UNHASHED_RELOCATION},
-{"hashed_relocation", REISERFS_HASHED_RELOCATION},
-{"test4", REISERFS_TEST4},
-{NULL, 0}
+{"noborder", 1<<REISERFS_NO_BORDER, 0},
+{"border", 0, 1<<REISERFS_NO_BORDER},
+{"no_unhashed_relocation", 1<<REISERFS_NO_UNHASHED_RELOCATION, 0},
+{"hashed_relocation", 1<<REISERFS_HASHED_RELOCATION, 0},
+{"test4", 1<<REISERFS_TEST4, 0},
+{"notest4", 0, 1<<REISERFS_TEST4},
+{NULL, 0, 0}
 };
 
 const arg_desc_t tails[] = {
-{"on", REISERFS_LARGETAIL},
-{"off", -1},
-{"small", REISERFS_SMALLTAIL},
-{NULL, 0}
+{on, 

Re: [PAID PRIORITY SUPPORT REQUEST] Quotas not fully working in 2.4.21-pre5

2003-04-04 Thread Oleg Drokin
Hello!

On Sat, Apr 05, 2003 at 02:45:29AM +0200, Philippe Gramoullé wrote:

 Object: Can't set quotas with 2.4.21-pre5
  After we upgraded from 2.4.19-pre6 + quota patches , 2.4.21-pre5 + data-logging and 
 quota
 patches, we can't set quotas anymore.

Have you enabled all the compatibility stuff in the kernel config? (Show the relevant part
of .config please.)
Everything worked for me the last time I tried (at the time of creating those patches).
I will do more tests now and see what happens.
What quotatools version do you have?

 here is an extract from 2 straces:
 quotactl(Q_SETQLIM|GRPQUOTA, /dev/sdb1, 502242, {2048, 0, 0, 0, 0, 0, 0, 0}) = -1 
 EINVAL (Invalid argument)
 quotactl(Q_SETQLIM|GRPQUOTA, /dev/sdb1, 32066, {2048, 0, 0, 0, 0, 0, 0, 0}) = -1 
 EINVAL (Invalid argument)
 This problem is really nasty as this filer is used for the paying services and now 
 clients can use as much
 space as they want.

So, you mean that even old quota limits that were set previously are not enforced?

Bye,
Oleg


Re: possible bug - fsck shows perfect results, linux refuses to mount

2003-03-30 Thread Oleg Drokin
Hello!

On Sun, Mar 30, 2003 at 11:22:50PM -0800, Robin H. Johnson wrote:
 The system refused to mount it originally, so I ran just plain
 --fix-fixable. It showed nothing wrong at all. By a fluke of terminals,
 I have a copy of this first output
 [http://www.orbis-terrarum.net/~robbat2/reiserfs/hdb1.first].
 However the system still refused to mount the drive, showing this in
 syslog:
 Mar 30 22:14:22 [kernel] read_super_block: can't find a reiserfs
 filesystem on (dev 03:41, block 64, size 1024)
 Mar 30 22:14:22 [kernel] read_super_block: can't find a reiserfs
 filesystem on (dev 03:41, block 8, size 1024)

This is indeed strange.

 The drive still refused to mount.
 I dug in fsck.reiserfs --help, and saw '--scan-whole-partition'. Tried
 --rebuild-tree with that on. It showed a LOT of stuff about StatDatas,
 and completed successfully.

Well, this is likely to destroy data, but still it should be mountable at the point of
completion.

Can you please make a metadata dump for us?
debugreiserfs -p /dev/hdb1 | bzip2 -9c > metadata.bz2 and make this file
available for us to download.

 I just find that there is something definetly wrong if fsck says the
 partition is fine, but Linux refuses to mount it. Either this is a bug
 in Linux, or the reiserfsprogs. Either way, somebody has a bug :-)

Sure, and we are interested in resolving the problem.

Thank you.

Bye,
Oleg


Re: filesystem corruption ?

2003-03-21 Thread Oleg Drokin
Hello!

On Fri, Mar 21, 2003 at 02:01:38PM +0100, Bernd Schubert wrote:
  So, the beam of X-rays run through the memory module corrupting some bits?
  ;) This stuff should not have been written to disk, so probably
  plain reboot should fix everything? Can you test that?
 indeed after rebooting everything is fine again. We will run another memtest86 

So on-disk corruption is out of the question.

 during the weekend, though I really don't believe we will find a problem.

Ask those physics guys to run some X-ray experiments while you are running memtest86 ;)

 Though this machine will be replaced by a real server in a few month, I'm 
 still rather worried what happend. Even if its 'only' a hardware memory 
 problem this means lots of trouble for us -- on the one hand it seems not to 
 be memtest86 detectable and on the other hand our programs really do need 

Well, it may not be detectable because no high-energy beams are running around at
the time of the test.

 working memory, but of course this is not of your concern.

I learned in school that if you put some big amount of lead (plumbum) in between
some area and a source of radiation, chances are the radiation that reaches the
protected area will be of much lesser strength.
In fact you might go to those guys and ask them what material (and how much of it)
is best suited to shield against the stuff they generate.

Bye,
Oleg


Re: badblocks blocksize setting

2003-03-15 Thread Oleg Drokin
Hello!

On Sat, Mar 15, 2003 at 01:09:19AM +0100, Marius Reiner wrote:

 not really a reiserfs issue:
 When following the FAQ everything is fine, I set blocksize zu 4096 as
 debugreiserfs told me, and don't get any bad blocks.

Good.

 Nevertheless I'm a bit concerned about the ones, badblocks reports, when
 being invoked without the -b option. How does this happen?

Can you please show the command line and the resulting output with badblocks?
Also what is in dmesg output after such a run?

 Or should I just ignore them?

There is not enough info yet.
Can you please tell us the whole story?

Bye,
Oleg


Re: Getopt improvements

2003-02-28 Thread Oleg Drokin
Hello!

On Fri, Feb 28, 2003 at 03:04:26PM +0300, Hans Reiser wrote:
   For the simple cases (which also happen to be all we have right 
 now), yes, I think that my implementation is cleaner. It allows the 
 simple use of mutually exclusive options, through the no prefix, and 
 clearing of the other bits in a multivalue option.  For now, that's 
 all we need - and it's a valid argument for using my code. However, 
 what I like about Oleg's implementation is that if you have an option 
 that excludes other options (even when it's not multivalue), it can 
 clear those bits as well.
 It clears them without failing, yes?  Not sure I like that.
 Hm, why should it fail?
 Incompatible options should fail as they represent error.  Feel free to 
 argue with that.

Hm, I am not going to argue. But we never had this kind of logic.
Usually the latest-specified option was taking effect.

Bye,
Oleg


Re: reiserfsprogs 3.6.5-pre2 release.

2003-02-25 Thread Oleg Drokin
Hello!

On Wed, Feb 26, 2003 at 01:54:48AM +0100, Philippe Gramoullé wrote:
 # time reiserfsck -a /dev/sdb1
 Reiserfs super block in block 16 on 0x811 of format 3.6 with standard journal
 Blocks (total/free): 143109020/59148009 by 4096 bytes
 Filesystem is cleanly umounted
 Replaying journal..
 0 transactions replayed
 Checking internal tree..finished  
 real0m47.890s
 user0m6.668s
 sys 0m0.732s

Thanks for trying.
48 seconds is much longer than we expected such a test to take.
Was the system loaded at the time of the test?

Bye,
Oleg


Re: Indicating filtered spam?

2003-02-22 Thread Oleg Drokin
Hello!

On Sat, Feb 22, 2003 at 02:12:29PM +0100, Szabolcs Szasz wrote:
 Wouldn't it be better to put (back? was it there? I can't
 recall) to the Subject header an indication for filtered
 spam?
 The fact that now there is SpamAssassin at work, actually
 changes the behavior of my organic brain-embedded spam filter
 so that I now find myself opening mails I had been deleting
 before.

It seems our filter that directs spam to /dev/null has broken again.
I'll see what can be done about it.

Bye,
Oleg


Re: [PATCH] new data logging and quota patches available

2003-02-22 Thread Oleg Drokin
Hello!

On Fri, Feb 21, 2003 at 06:32:11PM -0500, Chris Mason wrote:

 ftp.suse.com/pub/people/mason/patches/data-logging/2.4.21 will soon be
 updated with a new set of data logging and quota patches against
 2.4.21-pre4
 The data logging code is updated with another set of io stalling fixes,
 they should improve performance of data=ordered and data=writeback by
 being smarter about forcing commits under heavy write load and kicking
 kreiserfsd.
 Treat these with care, they've gotten a ton of testing under the suse
 kernel, but the port to vanilla was just done today.
 The quota patches include a fix for incorrect sd_block counts on
 symlinks.

A replacement 05-data-logging-36.diff.gz file that applies to 2.4.21-pre4-ac5
is available at
ftp://namesys.com/pub/reiserfs-for-2.4/testing/05-data-logging-36-ac5.diff.gz
It compiles, boots, and survives my (simple) testing (I am writing this email
from a patched 2.4.21-pre4-ac5, too). Quota works. Symlinks now have the correct
block count too.
The rejects are mostly caused by the DIRECTIO fix that also went into the current
bk snapshot, so it will probably apply to Marcelo's bk tree, too.
Chris: Is it intended that directio only works on data=writeback
mounted filesystems?

Also, the following README file diff should be considered:

--- README.orig Sat Feb 22 16:44:34 2003
+++ README  Sat Feb 22 16:44:49 2003
@@ -28,7 +28,7 @@
 
 These add reiserfs quota support
 
-07-quota-v2-2.4.21.diff.gz
+07-quota-v2-2.4.21.diff.gz # you don't need this on -ac, too
 08-reiserfs-quota-26.diff.gz
 09-kinoded-8.diff.gz
 


Re: reiserfs messages cleanup patch.

2003-02-21 Thread Oleg Drokin
Hello!

On Fri, Feb 21, 2003 at 09:22:16AM +0100, Manuel Krause wrote:

 It doesn't apply to my kernel setup [-pre4 + data-logging + preempt]
 -- too many hunks failing in my eyes.

datalogging is just too big of a change.

Bye,
Oleg



About direntries pointing to nowhere on reiserfs problem in 2.4

2003-02-20 Thread Oleg Drokin
Hello!

Vladimir has finally tracked the problem down to a race between
two iget4 calls running on the same file whose inode is not in the cache.
The sequence of events is like this (UP case):
1st thread:
take inode_lock
search through inode cache, but found nothing.
alloc new inode, mark it as locked.
release inode_lock
call reiserfs_read_inode2().
 do some stuff.
 call search_by_key()
 schedule()

Now 2nd thread comes in:
take inode_lock
search through inode cache, found inode with same inode number.
check that there is find_actor defined for reiserfs.
call find_actor()
 check that inode's primary key's dir_id is equal to expected one.
   but at that time this part of the inode is not yet initialized!
   so we return 0;
... 
And we create second inode for the same file.

   This scenario seems possible for any filesystem that stores some cookie 
   in private part of inode and whose read_inode2 can schedule. We checked and
   coda seems safe because they take a semaphore in iget().

   So we solved that with patch below (Zygo, others who think they have this problem,
   please check).

   But Vladimir is really unhappy with that comparison with zero and guessing (though
   he agrees it is correct, if FS is undamaged).

   Andrew, Alan: Is there a possibility to have an iget5_locked() kind of interface
   in 2.4? We need some way to init parts of the inode under inode_lock to solve this
   problem in a more elegant way (inode_lock is not even exported, so I invented another
   spinlock to guard atomicity of the inode pkey update on SMP).

Bye,
Oleg

= fs/reiserfs/inode.c 1.42 vs edited =
--- 1.42/fs/reiserfs/inode.c  Thu Feb 13 15:42:42 2003
+++ edited/fs/reiserfs/inode.c  Thu Feb 20 17:23:24 2003
@@ -20,6 +20,10 @@
 static int reiserfs_get_block (struct inode * inode, long block,
                                struct buffer_head * bh_result, int create);
 
+/* This spinlock guards inode pkey in private part of inode
+   against race between find_actor() vs reiserfs_read_inode2 */
+static spinlock_t keycopy_lock = SPIN_LOCK_UNLOCKED;
+
 void reiserfs_delete_inode (struct inode * inode)
 {
     int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2;
@@ -898,8 +902,9 @@
     bh = PATH_PLAST_BUFFER (path);
     ih = PATH_PITEM_HEAD (path);
 
-
+    spin_lock(&keycopy_lock);
     copy_key (INODE_PKEY (inode), &(ih->ih_key));
+    spin_unlock(&keycopy_lock);
     inode->i_blksize = PAGE_SIZE;
 
     INIT_LIST_HEAD(&inode->u.reiserfs_i.i_prealloc_list) ;
@@ -1220,10 +1225,27 @@
                                 unsigned long inode_no, void *opaque )
 {
     struct reiserfs_iget4_args *args;
+    int retval;
 
     args = opaque;
+    /* We protect against possible parallel init_inode() on another CPU here. */
+    spin_lock(&keycopy_lock);
     /* args is already in CPU order */
-    return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
+    if (le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid)
+        retval = 1;
+    else
+        /* If The key does not match, lets see if we are racing
+           with another iget4, that already progressed so far
+           to reiserfs_read_inode2() and was preempted in
+           call to search_by_key(). The signs of that are:
+             Inode is locked
+             dirid and object id are zero (not yet initialized)*/
+        retval = (inode->i_state & I_LOCK) &&
+                 !INODE_PKEY(inode)->k_dir_id &&
+                 !INODE_PKEY(inode)->k_objectid;
+
+    spin_unlock(&keycopy_lock);
+    return retval;
 }
 
 struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)




Re: About direntries pointing to nowhere on reiserfs problem in 2.4

2003-02-20 Thread Oleg Drokin
Hello!

On Fri, Feb 21, 2003 at 08:11:16AM +0100, Manuel Krause wrote:

 Is this fix safe for usage already?

Works for me (tm) ;)
It will most probably be replaced by iget5_locked, though.

 Mmmh. I have some hangs within KDE 2.2.2 Konqueror when copying over 
 (existing) directory links for some weeks now. I don't copy often over 
 links but when it stalls, a directory link is somewhere involved.
 (No crashes, no messages in the logs, just minutes for copies or 
 deletes that happened in seconds usually.)
 Should I worry and use the patch
  -- or finally upgrade my KDE ;-))

This does not look like the reiserfs bug that the patch is supposed to fix.

Bye,
Oleg



Re: reiser4 and 2.5.60

2003-02-18 Thread Oleg Drokin
Hello!

On Sat, Feb 15, 2003 at 04:00:33PM +0100, Ookhoi wrote:

 I still get a Segmentation fault when I want to untar a kernel source on
 a fresh 512MB loop filesystem.
 Does this help you?

Kind of.
It seems that the inode_file_plugin(inode)->key_by_inode pointer is zero for one of the inodes.
But I do not see how that can happen at all.
I personally untarred (and then compiled) a kernel on this reiser4 snapshot without any
problems more than once (in fact this is one of my basic tests). I tried both UP and SMP,
block device and loop device with file.  Can you describe your system in more detail?

Bye,
Oleg  



Re: Error - Partition Correspondance [was Re: Corrupted/unreadable journal: reiser vs. ext3]

2003-02-17 Thread Oleg Drokin
Hello!

On Tue, Feb 18, 2003 at 12:35:23AM +0100, Manuel Krause wrote:

 BTW, do the ReiserFS errors nowadays print out a usable partition 
 identification (like Chris actual data-logging patches perform at mount, 
 e.g.)?

Sometimes it does.

 I mostly always have 2 partitions with ReiserFS mounted, so -- is it 
 still meaningless to get an error message related to one of them in my logs?

It depends on what the messages are.

 I posted this circumstance some 3.6-ReiserFS levels ago and someone of 
 your team wanted to implement this after his task-list was done, IIRC.

Yes. I have a patch dating back to May 7th, 2002. But it was never
accepted, for a reason I no longer remember.
I will dig through my email, though. Probably I will give it another try.

Bye,
Oleg



Re: rsync and memory leak on linux 2.2?

2003-02-13 Thread Oleg Drokin
Hello!

On Thu, Feb 13, 2003 at 12:11:44PM -0500, Patrick O'Rourke wrote:
 What happens is that we put both systems under a fixed work load and after
 many hours system B will start consuming lots of swap and suffer degraded
 performance.  Through some kernel debugging we discovered that the 4K kmalloc
 slab is slowly growing to the point where there is enough memory pressure to
 start swapping.  We observe that one of the heaviest users of the 4K slab is
 reiserfs_kmalloc(), and in particular, the calls issued from
 get_mem_for_virtual_node() and reiserfs_file_write().  As an experiment we
 ran the same work load, but this time disabled the rsync'ing of the log files
 and we no longer see the growth in the 4k slab.
 This leads us to believe that we have a memory leak somewhere in the reiserfs
 and was wondering if anyone else has seen this, and if so, if a patch exists.
 One question I have is that get_mem_for_virtual_node() will first attempt to
 allocate memory atomically, and if this fails, will re-try non atomically.
 In this case we return SCHEDULE_OCCURRED which results in fix_nodes()
 will also return to its caller.  Is it possible for buffer allocated by
 get_mem_for_virtual_node() to be lost in this case?  I did not see any
 path out of reiserfs_file_write() in which we could return w/o freeing
 the buffer.  Perhaps this problem is triggered via memory pressure?

Thanks for the report. I will investigate it tomorrow, when it is day again
in Russia.
Meanwhile there is an easy way to find out whether some memory is leaked through
reiserfs_kmalloc. If you enable the CONFIG_REISERFS_CHECK kernel option,
then there is code in reiserfs_kmalloc that checks whether we allocate memory
but do not free it.
It will print a warning once in a while.
But please note that CONFIG_REISERFS_CHECK will impose some (substantial?)
slowdown on reiserfs operations, so you might just unconditionally
enable the memleak check in there if you cannot afford the slowdown.
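
The kind of accounting described here can be sketched in user space as below. It is an assumption-laden illustration, not the reiserfs code: counted_alloc(), counted_free() and counted_outstanding() are hypothetical names, and malloc()/free() stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Bytes currently allocated through the wrappers below. */
static long outstanding_bytes;

/* Allocate and remember how much is outstanding. */
void *counted_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        outstanding_bytes += (long)size;
    return p;
}

/* The caller passes the size back, so no per-block header is needed. */
void counted_free(void *p, size_t size)
{
    assert(p != NULL);
    free(p);
    outstanding_bytes -= (long)size;
}

/* A value that keeps growing under a steady workload points at a leak. */
long counted_outstanding(void)
{
    return outstanding_bytes;
}
```

Printing counted_outstanding() periodically, or when the filesystem is unmounted, shows whether allocations and frees balance.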

Bye,
Oleg



Re: Corrupted/unreadable journal: reiser vs. ext3

2003-02-12 Thread Oleg Drokin
Hello!

On Wed, Feb 12, 2003 at 05:56:58PM +0100, Anders Widman wrote:
 
  So it would be possible to do some actions to 1) get some blocks back in the described
  way, 1.1) write to really bad blocks should have remaped them already here if there is
  a space in remap area 2) save bad blocks to badblock list in fs if they are still bad -
  out of remap area. 
  Would be not bad to try to recover in this way already remapped blocks - do not know how
  to get the list of them only.
  Ok, but what if the IO error you got is not a bad block, but a bad cable? Do you want
  the fs to work in the described way? Trying to fix all automatically? I am not sure.
   How about trial and (then) error? :)

That might be suitable for fsck, but not for the kernel, I am sure.
The kernel should probably just return an error or try to use a different block (if it was
doing a write), and if a certain number of attempts fail, return an error too.
It should also remount R/O if the write error is in a system area (journal, superblock, bitmaps)
or if a special mount option was given that demands remounting R/O on I/O errors.
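
As a rough sketch, the policy outlined above could be written as a single decision function. Everything here is hypothetical: the enum, the function name, its parameters and the attempt limit of 3 are chosen for illustration and do not correspond to existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>

enum io_action { IO_RETRY_OTHER_BLOCK, IO_RETURN_ERROR, IO_REMOUNT_RO };

/* Illustrative limit; the text only says "a certain number of attempts". */
#define MAX_WRITE_ATTEMPTS 3

/* Decide what to do after an I/O error. 'attempts' counts the tries
   made so far for this request, including the one that just failed. */
enum io_action on_io_error(bool is_write, bool in_system_area,
                           bool ro_on_error_option, int attempts)
{
    assert(attempts >= 1);
    if (is_write && in_system_area)
        return IO_REMOUNT_RO;        /* journal, superblock, bitmaps */
    if (ro_on_error_option)
        return IO_REMOUNT_RO;        /* the admin asked for it */
    if (is_write && attempts < MAX_WRITE_ATTEMPTS)
        return IO_RETRY_OTHER_BLOCK; /* data can go somewhere else */
    return IO_RETURN_ERROR;          /* reads, or too many failed writes */
}
```

Note that nothing in this sketch tries to repair anything; that is left to fsck.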

Bye,
Oleg



Re: Error after rebuilding file system tree

2003-02-11 Thread Oleg Drokin
Hello!

On Tue, Feb 11, 2003 at 11:01:55AM +0100, Karl Mistelberger wrote:
 Feb 11 08:02:54 nnk kernel: 3a:04: rw=1, want=26537940, limit=26529792
 Feb 11 08:02:54 nnk kernel: attempt to access beyond end of device

What does debugreiserfs /dev/your_device say?

 Starting the system again resulted in:
 <4>reiserfs: found format 3.5 with standard journal
 <4>reiserfs: checking transaction log (lvm(58,4)) for (lvm(58,4))
 <6>attempt to access beyond end of device
 <6>3a:04: rw=1, want=26532716, limit=26529792
 <6>attempt to access beyond end of device

Hm.
1. Please create a metadata dump:
debugreiserfs -p /dev/yourdevice | bzip2 -9c > /path/metadata.bz2
Then make this metadata.bz2 file available for download.

2. Get newer reiserfsprogs from our ftp, and try to run rebuild-tree
with reiserfsck from the latest reiserfsprogs.

It seems that your kernel found non-existent transactions in the log.
Where is your kernel from? (What distro/version? Any updates applied?)

Thank you.

Bye,
Oleg



Re: reiserfsck --blame-it-on-the-hardware-yeah-yeah

2003-02-10 Thread Oleg Drokin
Hello!

On Tue, Feb 11, 2003 at 05:24:58AM +1300, Sam Vilain wrote:
   Therefore, your reiserfsck has a bug.  The whole point of a fsck is
  Well, currently the logic is "If we cannot read some block, that
  usually means this is a bad block."
  And so it prints the message. Of course, more testing of whether
  the block is beyond the partition boundary should probably be added.
 The block is not bad, it's EINVAL :-).  The block *number* is bad; you 

Sure.

 *could* add to your is_block_shagged() function a test for whether the 
 block is out of bounds, but the point is that if it gets as far as that 
 function, chances are that it is too late.

This is being worked on currently.

 (In reiserfsck), you need to do the bounds check when the referring 
 block/data structure is checked.

Sure. We have some checks, though apparently not enough.

[horrors about recompiling fsck with customly disabled stuff skipped].
If you have really decided to shoot yourself in the foot, you might as well
just fill the journal with zeroes. It would be much easier this way ;)

   - filesystem now mounts, however about the first 2 levels of directories,
 and many recently written files, have had their directory entries
 lost - lost+found contains roughly 11,000 entries (of 150,000 or
 so).

Hm, probably corresponding blocks (with names) were only present in
journal, and you erased that.

   - thankfully, I can locate the several hundred megabytes of .debs to save
 myself spending days re-downloading it all over 56k :-).  Mission
 successful.

At least you have not lost anything valuable. This is good.

 If reiserfsck was built with --no-journal-available in mind (that is, 
 ignoring the data present in an in-partition journal with that switch), 
 then I'm fairly sure that I wouldn't have suffered the last problem.  

How so?

 After the first scan, the journal would have been written back to an empty 
 state.

So what? If the directories' content was only present in the journal, you just lose that info.

   I'm going to try removing that test in the 3.x.1b version and see if
   the fsck completes.
  Well, 3.x.1b should not be actually used, lots of bugs were fixed since
  then.
  Vitaly: We need a check that journal target block is in range of
  filesystem. Please add this test.
 That is not all you must do!

 You need to do one, preferably both of the following:
   a) allow reiserfsck to ignore the in-partition journal, without producing
  an insane result (where the filesystem header says there is a journal,
  but the space where the journal is has filesystem data in it).

This cannot happen in any sane way. (I mean root block just cannot live in journal).

   b) make reiserfsck validate the journal as well as the filesystem,
  probably playing them back itself rather than relying on a mount
  option that just does the playback for it.  In theory you could decide
  whether to use the on-disk or the in-journal data structure, depending
  on which was more consistent!

I was thinking about that already. Maybe we will do something like that in 2.7/2.8,
but certainly not now. And it will cause lots of complications, I fear.
People who forget to upgrade their reiserfsprogs will get into trouble when
upgrading kernels and so on...

Bye,
Oleg



Re: reiserfsck --blame-it-on-the-hardware-yeah-yeah

2003-02-09 Thread Oleg Drokin
Hello!

On Sat, Feb 08, 2003 at 10:49:28PM +, [EMAIL PROTECTED] wrote:

 Ah, right, well that explains it.  It complained about block 524111,
 which would be physical block number 2096444.  This is off the end of
 the block device, which only has 261 blocks.

Aha, so this is indeed the problem.

 I acknowledge that I used `hda' where I should have used `hda1' for
 the simple read-test with dd, but did you not see the `badblocks'
 program output in the same e-mail?  `badblocks' read in the existing

Yes, I saw it.

 then wrote the original data back.  It detected no error anywhere in
 the block device.

That's good, it means your hard drive is probably ok.

 Therefore, your reiserfsck has a bug.  The whole point of a fsck is

Well, currently the logic is "If we cannot read some block, that
usually means this is a bad block."
And so it prints the message. Of course, more testing of whether
the block is beyond the partition boundary should probably be added.

 that any data, anywhere, can be corrupted - and reiserfsck should not
 fall over because of it.  So, what you should do is carefully go

Sure, unfortunately the interactive part of reiserfsck is not very mature.
And what do you think it should have done? Shrink the size of the FS
to fit the changed (maybe because of corruption) partition size?
Enlarge the partition? What else?

 through your filesystem data structure, insert garbage in at each
 unique structural location, and run `reiserfsck' on it to see if it
 handles the problem correctly.  Then I'd suggest following that up
 with some randomly corrupted filesystems.

Yup, we are running such tests. But thanks for the suggestion.

 Looking at the source code, I now see why the --no-journal-available
 switch does not do anything if a `standard' journal is used rather
 than an off-device journal.  However, I would suggest that this test
 is superfluous, and the tool has more benefit to the system
 administrator if the test for a `standard' journal with
 fsck_skip_journal is removed, or perhaps replaced with a warning or
 another prompt.

We will think about it. Thanks for the idea.

 I'm going to try removing that test in the 3.x.1b version and see if
 the fsck completes.

Well, 3.x.1b should not be actually used, lots of bugs were fixed since then.

Thanks for the report.

Vitaly: We need a check that journal target block is in range of filesystem.
Please add this test.
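
A check of the kind requested here might look roughly like the sketch below. It assumes the journal's target block numbers are available as a plain array and that the filesystem size is known in blocks; the function name and signature are hypothetical, not taken from reiserfsprogs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first target block that lies outside the
   filesystem, or -1 if every target is in range. Valid block numbers
   are 0 .. fs_blocks - 1. */
long first_out_of_range(const uint32_t *targets, size_t count,
                        uint32_t fs_blocks)
{
    assert(targets != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (targets[i] >= fs_blocks)
            return (long)i;
    }
    return -1;
}
```

Running such a check before a transaction is replayed means a bad block number is reported as corruption, not passed down to the read routine and mistaken for a bad block.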

Bye,
Oleg



Re: reiserfsck --blame-it-on-the-hardware-yeah-yeah

2003-02-08 Thread Oleg Drokin
Hello!


 bread: Cannot read the block (524111).
 Aborted
 (none):~# dd if=/dev/hda of=/tmp/foo skip=524100 count=100
 100+0 records in
 100+0 records out
 (none):~# od -x /tmp/foo
 000 6974 6e6f 7720 6c69 206c 6562 6920 636e
 020 756c 6564 2064 6e69 7420 6568 6e20 7865
 [... lots of very valid looking data snipped ...]

This is the wrong block; try adding bs=4k to dd.
Also, read from your partition rather than from /dev/hda.
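
The arithmetic behind this advice, as a small sketch. It assumes a 4096-byte filesystem block size and dd's default 512-byte block size; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a filesystem block number to the 512-byte sector where it starts. */
uint64_t block_to_sector(uint64_t block, uint32_t block_size)
{
    assert(block_size >= 512 && block_size % 512 == 0);
    return block * (block_size / 512);
}

/* Convert a 512-byte sector number to the filesystem block containing it. */
uint64_t sector_to_block(uint64_t sector, uint32_t block_size)
{
    assert(block_size >= 512 && block_size % 512 == 0);
    return sector / (block_size / 512);
}
```

Under these assumptions block_to_sector(524111, 4096) is 4192888, while sector_to_block(524100, 4096) is 65512, so skip=524100 without bs=4k reads an area far below the block that reiserfsck complained about. Something like dd if=/dev/hda1 bs=4k skip=524111 count=1 would address the intended block, provided hda1 is the partition in question.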

Bye,
Oleg



Re: kernel go-slow

2003-02-06 Thread Oleg Drokin
Hello!

On Thu, Feb 06, 2003 at 05:41:46PM +0100, Russell Coker wrote:

  but there is possible situations that will not generate disk activity,
  but may cause your system to go-slow, if there you have some
  unussual IO numbers while disk activity is moderate to low -
  most likely same sweet pair.
 The problem is that sar etc produce jumbled results.  Profiling the kernel may 
 help, but may also hide the error, and it's not something I can easily do.

Well, you can do it very easily.
Reboot with the profile=2 kernel option.
When the 100% sys CPU situation starts, execute readprofile -r.
When it is finished, execute readprofile -m /path/to/System.map > somefile,
then sort somefile and you are done: you can now see where most of the time
is spent.

 The servers are locked in a managed server room on the other side of the city 
 so seeing the blinken lights is not an option.

;)
<humour>webcam</humour>

 I've put the aa1 kernel on half the machines and now I'll wait to see what 
 happens.  If the aa1 machines don't have the problem but the others do then 
 I'll go all aa1.

Ah, if your problem was caused by highmem I/O support being absent, then that might actually help.

Bye,
Oleg



Re: link/unlink problem gone?

2003-02-06 Thread Oleg Drokin
Hello!

On Thu, Feb 06, 2003 at 05:32:10PM -0500, Zygo Blaxell wrote:

Sigh, these were false hopes indeed.
I can reproduce it with 2.4.21-pre4, only it is now harder for some reason.
 I've seen times-to-failure ranging from 20 minutes to 20+ hours (!).

Same here.

Chris: My current idea is that it happens during low memory conditions, so I am
actively running around prune_icache and its dcache equivalent. You can probably
reproduce that easily if you have no swap and not very much RAM.
 
(Ok, I just checked, limited the RAM to 90M and turned off SWAP entirely.
 and reproduced the problem fairly quickly)
 I have observed the problem on machines ranging in size from 96 to
 512MB RAM.  I haven't observed a correlation between swapping activity
 and failures but I haven't been looking for this either.  The machines

I noticed that with newer 2.4.21-pre kernels I first see processes die because
of OOM, and only after that do I see direntries pointing to nowhere.
I have reproduced this more than once, so I believe there is some correlation
between these.

 that have problems machines are swapping at some time or another (they
 have several hundred MB of swap used).

And they are just swapping all the time, so it may take a while before
useful code runs and problem happens, it seems.

So far I have concluded that with SWAP turned off one can reproduce the problem more
easily than with SWAP on (especially if swap is large).

Bye,
Oleg



Re: Fwd: Re: Segmentation Fault when mounting ataraid

2003-01-29 Thread Oleg Drokin
Hello!

On Wed, Jan 29, 2003 at 08:38:16PM +0300, Hans Reiser wrote:

 Well, that certainly looks like a bug in mount options parsing code.  
 Edward and Oleg, please review and fix.

There is another decoded output that makes much more sense to me.
And it suggests something has gone wrong within the block layer.
(And in the one you are referring to, parse_options's address is only
stored in registers, not in the backtrace. Also, starting from 2.4.20,
there is no function named parse_options in the reiserfs code.)

Bye,
Oleg



Re: Segmentation Fault when mounting ataraid

2003-01-29 Thread Oleg Drokin
Hello!

On Wed, Jan 29, 2003 at 07:32:31PM +0100, Jochen Haemmerle wrote:

 So, here it comes again!
 Don't care about the warning, this is the machine where the error occurs!
 I hope someone understands that sh*** because I don't!!!

Ok, looks like __make_request tries to call get_request, and there is something
wrong with request queue.

 Yesterday I've patched my Kernel to 2.4.21-pre3. The bug does not appear!
 It seems to be only a bug of the 2.4.20 (on 2.4.19 it works too...as I 
 already mentioned here)

Well, that probably means there was some bug in the 2.4.20 ataraid code path
that is now fixed in the later kernel, I presume.
Though I quickly scanned the changelogs and saw nothing related.

Bye,
Oleg 



Re: Re: Segmentation Fault when mounting ataraid

2003-01-28 Thread Oleg Drokin
Hello!

On Tue, Jan 28, 2003 at 09:49:29PM +0100, Jochen Haemmerle wrote:

 Well, down there it is!!

Hm.
Strange stack trace, I'd say.
Please also decode the EIP line; maybe you need to get a newer ksymoops for that.
(EIP 0010:[c01a62c0]Tainted: P)
BTW, what proprietary modules do you have loaded?
 
 guardian@viking:~$ cat segfault.txt | ksymoops -m /boot/System.map-2.4.20
 ksymoops 2.4.6 on i686 2.4.20.  Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.20/ (default)
 -m /boot/System.map-2.4.20 (specified)
 
 Unable to handle kernel NULL pointer dereference at vitual address 0004
 c01a62c0
 *pde = 
 Oops: 0002
 CPU:   0
 EFLAGS: 00010012
 eax:    ebx: c03324f8  ecx: dff188a0   edx: c03324fc
 esi:    edi: c03324f8  ebp: 0008   esp: d7839d98
 Process mount (pid: 327, stackpage=d7839000)
  007c 098a9f2d c0332518 0800 c0332520 c0332518 0080 
 
  003e003f  c01a6c6c c03324f8  d7cc54a0 0008 
 
 Call Trace:   [c01a698d] [c01a6c6c][c01a6ccc] [c01a6e27] 
 [c0172671]
 [c0173188] [c013c712] [c014d546][c013c8fd] [c014e589] [c014e842]
 [c014e6ad] [c014ec6f] [c0106f17]
 Code: 89 50 04 89 02 c7 01 00 00 00 00 c7 41 04 00 00 00 00 ff 0b
 Using defaults from ksymoops -t elf32-i386 -a i386
 
 
 ebx; c03324f8 parse_options+124/1c8
 edx; c03324fc parse_options+128/1c8
 edi; c03324f8 parse_options+124/1c8
 
 Trace; c01a698d reiserfs_super_in_proc+69/254
 Trace; c01a6c6c reiserfs_per_level_in_proc+f4/134
 Trace; c01a6ccc reiserfs_bitmap_in_proc+20/a0
 Trace; c01a6e27 reiserfs_on_disk_super_in_proc+db/f0
 Trace; c0172671 nlm_shutdown_hosts+e9/11c
 Trace; c0173188 nlmsvc_lock+c8/340
 Trace; c013c712 sys_getdents64+7e/b3
 Trace; c014d546 notesize+1e/2c
 Trace; c013c8fd max_select_fd+9d/a4
 Trace; c014e589 handle_ide_mess+25/194
 Trace; c014e842 msdos_partition+14a/2f8
 Trace; c014e6ad handle_ide_mess+149/194
 Trace; c014ec6f __load_block_bitmap+17f/198
 Trace; c0106f17 show_stack+7/78
 
 Code;   Before first symbol
  _EIP:
 Code;   Before first symbol
   0:   89 50 04  mov%edx,0x4(%eax)
 Code;  0003 Before first symbol
   3:   89 02 mov%eax,(%edx)
 Code;  0005 Before first symbol
   5:   c7 01 00 00 00 00 movl   $0x0,(%ecx)
 Code;  000b Before first symbol
   b:   c7 41 04 00 00 00 00  movl   $0x0,0x4(%ecx)
 Code;  0012 Before first symbol
  12:   ff 0b decl   (%ebx)

Bye,
Oleg



Re: mkreiserfs -s 1024 makes unmountable partitions

2003-01-26 Thread Oleg Drokin
Hello!

On Sun, Jan 26, 2003 at 07:18:16PM +0100, Francois-Rene Rideau wrote:

 Hi! No hard disk crash today (I'm just disabling the DMA )- )
 However, I've tried to make small reiserfs partitions,
 and was annoyed at the journal taking a significant size of the disk:
 32MB is 50% of my 64MB /boot partition, and 40% of the whole
 of my server's 80MB harddisk.
 I saw that mkreiserfs had an option -s to select the size of the journal,
 and tried to use it to make a 4MB journal:
   mkreiserfs -s 1024 /dev/hdc1
 However, whereas mkreiserfs didn't complain, the resulting partition
 was unmountable by linux. In the syslogs, the kernel complains:
 read_super_block: can't find a reiserfs filesystem on (dev 16:01, block 128, size 512)
 read_super_block: can't find a reiserfs filesystem on (dev 16:01, block 16, size 512)

You need journal relocation patches from Chris Mason.
ftp://ftp.suse.com/pub/people/mason/patches/datalogging

 Is there a way to make a reiserfs partition with a small journal?

Sure. You just did it. ;)
Now you need in-kernel support to be able to mount it.
Or you can use 2.5 kernels (note that these are not recommended for production environments,
of course).

 Would a small kernel patch do it?

Sure.

 In any case, I think that it is a bug that mkreiserfs doesn't check
 the consistency of its parameters with what the kernel is able to handle.

No, that's not a bug. mkreiserfs cannot know if you are just making a filesystem
and planning to reboot into a proper kernel later (or even move the disk to another system).
Also it cannot detect whether your current kernel has any patches applied or not.

Bye,
Oleg



Re: Hard disk crash and solution

2003-01-26 Thread Oleg Drokin
Hello!

On Mon, Jan 27, 2003 at 05:53:31AM +0100, Ookhoi wrote:
  Title: IBM DTLA 307045 Hard disk crash
  
  I bought this disk (46 GB) about two years ago. One of the best they
  claimed.
  [...]
  What is the fucking MTBF of these drives?? Is it close to one year
  like I experienced?
 That is quite good for those drives :-)

I bought an IBM DTLA-307030 made in Hungary 2 years ago.
It is still working (though it already has ~1500 bad sectors remapped),
aside from making unusual noises when remapping bad sectors ;)
I may be just lucky.
Also I try to run it in a cool environment, so that may help it too.

Bye,
Oleg



Re: quotas in 2.4.20

2003-01-25 Thread Oleg Drokin
Hello!

On Fri, Jan 24, 2003 at 09:02:02PM -0600, José A. Guzmán wrote:

  I'm trying to get quotas working on 2.4.20.
  So far they seem to work ok with the dec-3-2002 patches from: 
 ftp://ftp.namesys.com/pub/reiserfs-for-2.4/testing/quota-2.4.20/ 
 with CONFIG_QFMT_V2.
  However the patch from jan-22-2003 in: 
 ftp://ftp.namesys.com/pub/reiserfs-for-2.4/2.4.20-pending/01-iput-deadlock-fix.diff
 does not apply on top of the testing/quota-2.4.20 patches.

Yup.
You do not need that patch if you are using quotas, as it is already in ;)
I will rediff quota patches against new 2.4.21-pre later.

  Also, when compiling a 2.4.20 kernel with testing/quota patches and the latest
 evms 1.2.1 patches, compilation stops with:
 fs/fs.o: In function `fsync_dev_lockfs':
 fs/fs.o(.text+0x31a8): undefined reference to `DQUOT_SYNC'

Hm, that's strange. Do the evms guys touch quota code at all?

   Are the testing/quota patches recommended for a production box with 2.4.20?
 (Debian 3.0, quota 3.08)

Well, they seem to work well enough. There is no bug report that I can associate with
this quota code for sure.

   Is there a way to get EVMS working with quotas?

Hm. What are the evms guys' thoughts on that issue?

Bye,
Oleg



Re: slightly [OT] highmem (was Re: 2.4.20 at kernel.org and data logging)

2003-01-24 Thread Oleg Drokin
Hello!

On Fri, Jan 24, 2003 at 06:00:19PM +0100, Dieter Nützel wrote:
   highmem4GB / highmem64GB with pae or does it produce more overhead that
   you mention below?
  You get no advantage of course.
  But lots of overhead. Rumours have it that 256M systems with highmem
  enabled kernels (default for RedHat beta it seems) are swapping much more
  than when the same kernel is built with highmem off.
 But that could be because they have forgotten to enable HIGHMEM IO?
 See Andrea Arcangeli's -aa kernels.

What HIGHMEM IO? There is exactly NO highmem, so the highmem IO code won't be used.

Bye,
Oleg



Re: old block allocator found in 2.4.19

2003-01-23 Thread Oleg Drokin
Hello!

On Thu, Jan 23, 2003 at 11:30:07PM +0100, Newsmail wrote:
 Hi Oleg, as you remember I mentioned you about my loop-aes+lvm+reiserfs 
 problem, that leaves hung processes after them, and only a cold reboot 

Yes. Unfortunately I have only produced a 2.4.20+crypto kernel image so far, and
now other, more important bugs and problems are diverting me from looking at your
problem further, sorry.

 could save a solution. well these problems came (in my opinion) after the 
 introduction of the new block allocator in 2.4.20-preX. in the first 
 version the new block allocation wasnt the default one, we had to use some 
 preallocmin= etcetc flag in the mount process. well I would like to try 
 with a new 2.4.20 kernel the settings for the old allocator. is there a way 
 to use the old allocator with the new kernel? some mount option or any?

Well, you can specify tails=large,alloc=old_way:concentrating_formatted_nodes=10.
This way it will resemble the old block allocator pretty closely.
Also you can remove the new block allocator patch from 2.4.20 (just apply it
(and the later fixes) with the -R patch option).
It would also be interesting to see whether your hangs go away if you apply
the iput-deadlock fix: 
ftp://ftp.namesys.com/pub/reiserfs-for-2.4/2.4.20-pending/01-iput-deadlock-fix.diff

Thank you.

Bye,
Oleg



Re: ordered writes in 2.4.20?

2003-01-23 Thread Oleg Drokin
Hello!

On Thu, Jan 23, 2003 at 08:35:18PM -0500, Hubert Chan wrote:

 I'm currently using linux 2.4.20, and I'm wondering if it has support
 for ordered writes, or if I would have to apply a patch.

You need to apply the patch.

Bye,
Oleg



Re: [reiserfs-dev] Re: [ANNOUNCE]: reiser4 snapshot

2003-01-17 Thread Oleg Drokin
Hello!

On Fri, Jan 17, 2003 at 01:09:20PM +0100, Ookhoi wrote:
  It is released as a patch against linux-2.5.58 kernel. It should also
  work with current (January 16th) bk snapshot at
  http://linux.bkbits.net/linux-2.5
  This is mostly bug fixing release.
  READ.ME file contains changelog.
 Can you please have a look at the READ.ME? It contains old info from the
 former snapshot.

Only the snapshot date was old; all the other info was current.
Thanks for noticing. Fixed.

Bye,
Oleg



Re: Quotas aand 2.5.x

2003-01-15 Thread Oleg Drokin
Hello!

On Wed, Jan 15, 2003 at 10:43:06AM +0100, Philippe Gramoullé wrote:

 Actually, right now we still hit that nasty bug every time we run quotacheck,
 which prevents us from enabling quotas on several file servers; that is a big
 problem right now (that's on 2.4.x).

Have you tried a 2.4 kernel without Chris' data-logging patches, but with the
original short overflow fix?

Bye,
Oleg



Re: Stress test failure for reiserfs, but ext3 ok on Linux 2.4.20

2003-01-15 Thread Oleg Drokin
Hello!

On Wed, Jan 15, 2003 at 03:35:59AM +0100, Bernhard Sadlowski wrote:

 I am using the attached stress.sh script (probably from this mailing list)
 for creating load on a reiserfs filesystem, which forks 100
 (read,write,delete) processes: 
 # mkreiserfs /dev/sda4
 # mount /dev/sda4 /backup
 # stress.sh -c /usr -n 100 /backup 
 Then wait until /backup fills up.

Hm. This reminds me of something.
Can you still reproduce the problem after applying the patches from
ftp://ftp.namesys.com/pub/reiserfs-for-2.4/testing/quota-2.4.20/ ?
These patches add quota support to reiserfs, but they also change some
new-inode-related operations to prevent deadlocks like the one you are seeing.

 Any I/O freezes and even after killing the script, the remaining cp and
 mv commands don't terminate. They are in status D. A simple ls
 /backup never comes back. Only a hard powerdown fixes this situation,
 because init 6 etc. doesn't work. I have even activated the reiserfs
 debug, but I don't see any additional info.

Try pressing sysrq-t after the lockup happens, then send us the decoded
output, please.

Thank you.

Bye,
Oleg



Re: Quotas aand 2.5.x

2003-01-15 Thread Oleg Drokin
Hello!

On Wed, Jan 15, 2003 at 11:59:53AM +0100, Philippe Gramoullé wrote:
   |  Have you tried 2.4. without Chris' datalogging patched, but with original short
   |  overflow fix?
 Well, my question was more like a Plan B.
 I think I did, and that it still crashed, always during the quotacheck, but
 I'll try it again to be 100% sure.

You mean the 2.4.19-presomething you had there before, with only the fix
I sent the first time?

Bye,
Oleg



Re: Stress test failure for reiserfs, but ext3 ok on Linux 2.4.20

2003-01-15 Thread Oleg Drokin
Hello!

On Wed, Jan 15, 2003 at 12:51:00PM +0100, Bernhard Sadlowski wrote:

  Hm. This reminds me of something.
  Can you reproduce the same problem if you apply patches from
  ftp://ftp.namesys.com/pub/reiserfs-for-2.4/testing/quota-2.4.20/
  These patches add quota support to reiserfs, but also change some
  new inode-related operation to prevent deadlocks like you are seeing.
 The unpatched kernel shows the hangs much earlier, so I assume that the
 above patches solve the problem. With the patches the load goes up very
 slowly but steady to 100 and I/O does not freeze anymore. vmstat and
 iostat still show activity. I assume you don't need any sysrq-t output
 now.

Ok. That's a good sign.

 Will the patches be included in 2.4.21?

No, they require quota support that won't be included in 2.4 because of
the new quota formats and so on.
I will extract the relevant bits from the patch, though, and send you
a short version without quota once it is ready.

Thank you.

Bye,
Oleg



Re: Stress test failure for reiserfs, but ext3 ok on Linux 2.4.20

2003-01-15 Thread Oleg Drokin
Hello!

On Wed, Jan 15, 2003 at 02:58:04PM +0300, Oleg Drokin wrote:

 I will extract the relevant bits from the patch, though.
 I will send you a short version without quota once it is ready.

Ok, here is the patch, can you give it a try and see if it also helps?
I tested it locally and it works for me.
If you confirm everything is ok, I will try to get it into 2.4.21 in time.

Bye,
Oleg

--- linux-2.4.20/fs/reiserfs/namei.c  Fri Nov 29 02:53:15 2002
+++ linux-2.4.20-t/fs/reiserfs/namei.c  Wed Jan 15 17:08:20 2003
@@ -488,27 +488,58 @@
 return 0;
 }
 
+/* quota utility function, call if you've had to abort after calling
+** new_inode_init, and have not called reiserfs_new_inode yet.
+** This should only be called on inodes that do not have stat data
+** inserted into the tree yet.
+*/
+static int drop_new_inode(struct inode *inode) {
+make_bad_inode(inode) ;
+iput(inode) ;
+return 0 ;
+}
+
+/* utility function that does setup for reiserfs_new_inode.  
+** DQUOT_ALLOC_INODE cannot be called inside a transaction, so we had
+** to pull some bits of reiserfs_new_inode out into this func.
+*/
+static int new_inode_init(struct inode *inode, struct inode *dir, int mode) {
+
+/* the quota init calls have to know who to charge the quota to, so
+** we have to set uid and gid here
+*/
+inode->i_uid = current->fsuid;
+inode->i_mode = mode;
+
+if (dir->i_mode & S_ISGID) {
+inode->i_gid = dir->i_gid;
+if (S_ISDIR(mode))
+inode->i_mode |= S_ISGID;
+} else
+inode->i_gid = current->fsgid;
 
+return 0 ;
+}
+  
 static int reiserfs_create (struct inode * dir, struct dentry *dentry, int mode)
 {
 int retval;
 struct inode * inode;
-int windex ;
 int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2 ;
 struct reiserfs_transaction_handle th ;
 
-
 if (!(inode = new_inode(dir->i_sb))) {
    return -ENOMEM ;
 }
+retval = new_inode_init(inode, dir, mode) ;
+if (retval)
+   return retval ;
+
 journal_begin(&th, dir->i_sb, jbegin_count) ;
 th.t_caller = "create" ;
-windex = push_journal_writer("reiserfs_create") ;
-inode = reiserfs_new_inode (&th, dir, mode, 0, 0/*i_size*/, dentry, inode, &retval);
-if (!inode) {
-   pop_journal_writer(windex) ;
-   journal_end(&th, dir->i_sb, jbegin_count) ;
-   return retval;
+retval = reiserfs_new_inode (&th, dir, mode, 0, 0/*i_size*/, dentry, inode);
+if (retval) {
+   goto out_failed ;
 }
 
 inode->i_op = &reiserfs_file_inode_operations;
@@ -520,20 +551,19 @@
 if (retval) {
    inode->i_nlink--;
    reiserfs_update_sd (&th, inode);
-   pop_journal_writer(windex) ;
-   // FIXME: should we put iput here and have stat data deleted
-   // in the same transactioin
    journal_end(&th, dir->i_sb, jbegin_count) ;
-   iput (inode);
-   return retval;
+   iput(inode) ;
+   goto out_failed ;
 }
 reiserfs_update_inode_transaction(inode) ;
 reiserfs_update_inode_transaction(dir) ;
 
 d_instantiate(dentry, inode);
-pop_journal_writer(windex) ;
 journal_end(&th, dir->i_sb, jbegin_count) ;
 return 0;
+
+out_failed:
+return retval ;
 }
 
 
@@ -541,21 +571,21 @@
 {
 int retval;
 struct inode * inode;
-int windex ;
 struct reiserfs_transaction_handle th ;
 int jbegin_count = JOURNAL_PER_BALANCE_CNT * 3; 
 
 if (!(inode = new_inode(dir->i_sb))) {
    return -ENOMEM ;
 }
+retval = new_inode_init(inode, dir, mode) ;
+if (retval)
+   return retval ;
+
 journal_begin(&th, dir->i_sb, jbegin_count) ;
-windex = push_journal_writer("reiserfs_mknod") ;
 
-inode = reiserfs_new_inode (&th, dir, mode, 0, 0/*i_size*/, dentry, inode, &retval);
-if (!inode) {
-   pop_journal_writer(windex) ;
-   journal_end(&th, dir->i_sb, jbegin_count) ;
-   return retval;
+retval = reiserfs_new_inode(&th, dir, mode, 0, 0/*i_size*/, dentry, inode);
+if (retval) {
+   goto out_failed; 
 }
 
 init_special_inode(inode, mode, rdev) ;
@@ -571,16 +601,17 @@
 if (retval) {
    inode->i_nlink--;
    reiserfs_update_sd (&th, inode);
-   pop_journal_writer(windex) ;
    journal_end(&th, dir->i_sb, jbegin_count) ;
-   iput (inode);
-   return retval;
+   iput(inode) ;
+   goto out_failed; 
 }
 
 d_instantiate(dentry, inode);
-pop_journal_writer(windex) ;
 journal_end(&th, dir->i_sb, jbegin_count) ;
 return 0;
+
+out_failed:
+return retval ;
 }
 
 
@@ -588,15 +619,18 @@
 {
 int retval;
 struct inode * inode;
-int windex ;
 struct reiserfs_transaction_handle th ;
 int jbegin_count = JOURNAL_PER_BALANCE_CNT * 3; 
 
+mode = S_IFDIR | mode;
 if (!(inode = new_inode(dir->i_sb))) {
    return -ENOMEM ;
 }
+retval = new_inode_init(inode, dir, mode) ;
+if (retval)
+   return retval ;
+
 journal_begin(&th, dir->i_sb, jbegin_count) ;
-windex

Re: Stress test failure for reiserfs, but ext3 ok on Linux 2.4.20

2003-01-15 Thread Oleg Drokin
Hello!

On Wed, Jan 15, 2003 at 04:48:52PM +0100, Bernhard Sadlowski wrote:
  Ok, here is the patch, can you give it a try and see if it also helps?
  I tested it locally and it works for me.
  If you confirm everything is ok, I will try to get it into 2.4.21 in time.
 At first glance it seems to work. I will run now that script overnight
 and will tell you, if any problems arise.

Ok, Thank you very much.

Bye,
Oleg



Re: How to break a reiserfs on Linux 2.4.20

2003-01-15 Thread Oleg Drokin
Hello!

On Wed, Jan 15, 2003 at 05:44:26PM -0500, Zygo Blaxell wrote:
   And now I can reliably reproduce it.  It has nothing to do with MD,
   linear, raid, SMP, or unclean shutdowns.
   
   I can reproduce this bug on a plain IDE disk partition in about three
   hours on Linux 2.4.20 (compiled for SMP but running on UP, full .config
   and system details available on request).  My test system has about 4 gigs
   under /etc, /usr, and /var, /dev/hdc2 is 25GB, and there is 1G of swap.
 Thanks for the report. We shall try to reproduce it tonight.
 Were you successful?  If your experience is anything like mine, you
 should have hundreds if not thousands of broken files by now...

Yes, we were able to reproduce the problem and now we are trying to fix it.
Thanks a lot for your help and for the script.

Bye,
Oleg



Re: Core dump in reiserfsck

2003-01-14 Thread Oleg Drokin
Hello!

On Wed, Jan 15, 2003 at 01:08:26PM +0900, Vitaly Porotikov wrote:

   Perhaps 3.x.1c does not have this bug, but I can't make any changes to my
 system. I am sending this in the hope of helping to find some coding errors
 (if this wasn't found before).

Yes, you guessed right; this bug was fixed long ago.
Actually, if you ever decide to upgrade your reiserfsprogs (which is
recommended), pick 3.6.4 (or whatever is the latest at the time),
not 3.x.1c.

Bye,
Oleg



Re: reiserfsck failure

2003-01-14 Thread Oleg Drokin
Hello!

On Tue, Jan 14, 2003 at 04:01:42PM -0500, Bill Schrier wrote:

  I am sending along both the --logfile and the core file from a recent
  reiserfsck we were running on our Redhat 7.2 raidzone machine.

Can you tell us exactly which version that was?
Also, just before dumping core it should have printed some more info on stderr
about an assertion failure; we are interested in that message too.

Thank you.

Bye,
Oleg



Re: kswapd CPU usage and heavy disk IO

2003-01-09 Thread Oleg Drokin
Hello!

On Thu, Jan 09, 2003 at 02:31:54PM +0100, Russell Coker wrote:

 I have a server with 4G of RAM running ReiserFS for everything that matters.
 It has 2G of swap space free, but so far I have not seen swap usage go above 
 1.6M (so in normal use I could turn off swap entirely and expect not to see 
 much difference).
 When it's under really heavy load (when I have a maintenance task involving a 
 find / and there are lots of POP/IMAP clients hitting the server as well as 
 mail delivery) and the load average gets to about 40, the kswapd kernel 
 thread starts using excessive CPU time.  It will stay on ~4% but have spikes 
 of up to 45%!!!  This is a two-processor machine so 45% CPU reported by top 
 means 90% of a single CPU I guess.  90% of a 1.8GHz P4 CPU is a lot of CPU 
 and I think that something is wrong.

Sounds exactly like yesterday's/today's topic on lkml.
You have a highmem box; during heavy I/O all of the lowmem pages are
occupied by bounce buffers and bh's.
The kernel needs more low memory and tries to free some, though without
much success.
This is a known problem that is not related to reiserfs. Unfortunately,
it is not easy to fix.

The relevant lkml topic was "2.4.20, .text.lock.swap cpu usage? (ibm x440)";
see the mail from Andrew Morton with msgid [EMAIL PROTECTED]
He recommended trying
http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1.bz2
and sending a report on the outcome.

Bye,
Oleg



Re: !?!

2003-01-08 Thread Oleg Drokin
Hello!

On Wed, Jan 08, 2003 at 11:53:26AM +0500, Anton Erofeevskij wrote:

 in reiserfs filesystem
 time cat sd1 | ./a.out > sd2
 0.00user 0.05system 0:01.79elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
 0inputs+0outputs (131major+43minor)pagefaults 0swaps
 in ext2 filesystem
 time cat sd1 | ./a.out > sd2
 0.00user 0.05system 0:00.95elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
 0inputs+0outputs (131major+43minor)pagefaults 0swaps
 What is the reason?!?


Generally, reiserfs may have more CPU overhead per operation than ext2, due
to its journaling and balanced-tree nature. For large operations this is
outweighed by the speed of performing the operation itself, but you write
just four bytes at a time, and each such write involves a stat data update
(size, possibly nlinks, times), possible rebalancing, and journal updates.
Also, you have not said which kernel you are using or how that kernel
is configured.

Bye,
Oleg



Re: How to use external journal?

2002-12-23 Thread Oleg Drokin
Hello!

On Mon, Dec 23, 2002 at 04:57:04PM +0100, Luis Gregorio Muniz Rodriguez wrote:
 I have recently discovered that the journal can be placed on an external
 device (i.e, `mkreiserfs --journal-device FILE'), but I haven't found
 any doc about it.

If you do not use a 2.5 kernel, you need to apply a separate patch to your tree.

 I'm currently using ReiserFS on top of LVM partitions, and I am
 wondering if I can use the same shared journal device for a number of
 small partitions.  

No, you cannot.

 Note that I'm not trying to move all the journals to the same device
 (this should be easy using `--journal-device' and `--journal-offset',
 isn't it?).  Rather, I try to share the same 32Mb journal between small
 filesystems and/or filesystems with infrequent writes (such as /usr,
 /usr/local, /var/www, and so on).
 Is that possible?  And convenient?

This is not possible.
But you can divide those 32M into four separate parts and give each
filesystem its own part with --journal-offset.
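
The arithmetic behind such a split can be sketched as follows. This is only
an illustration: it assumes 4 KiB blocks and that mkreiserfs takes the
journal offset (and the journal size) in blocks, which you should confirm in
the mkreiserfs man page for your reiserfsprogs version. The names
split_journal and journal_slice are made up for the example.

```c
#include <stddef.h>

/* Assumption: the filesystems use 4 KiB blocks. */
#define FS_BLOCK_SIZE 4096UL

/* One journal area on the shared device, measured in blocks. */
struct journal_slice {
    unsigned long offset_blocks;   /* where this journal starts */
    unsigned long size_blocks;     /* how many blocks it may use */
};

/*
 * Hypothetical helper: cut an area of `area_bytes` into `parts` equal,
 * non-overlapping slices and store them in `out` (which must have room
 * for `parts` entries). Returns 0 on success, -1 on bad input.
 */
static int split_journal(unsigned long area_bytes, size_t parts,
                         struct journal_slice *out)
{
    unsigned long total_blocks, per_part;
    size_t i;

    if (parts == 0 || out == NULL)
        return -1;

    total_blocks = area_bytes / FS_BLOCK_SIZE;
    per_part = total_blocks / parts;
    if (per_part == 0)
        return -1;              /* area too small for that many journals */

    for (i = 0; i < parts; i++) {
        out[i].offset_blocks = (unsigned long)i * per_part;
        out[i].size_blocks = per_part;
    }
    return 0;
}
```

For a 32M area split four ways this gives slices of 2048 blocks each, at
block offsets 0, 2048, 4096 and 6144. Each filesystem would then be created
with its own offset, so that no two journals overlap.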

Bye,
Oleg



Re: BUG() in _get_block_create_0

2002-12-23 Thread Oleg Drokin
Hello!

On Mon, Dec 23, 2002 at 05:31:09PM +0100, Nick Wellnhofer wrote:

 I'm using ReiserFS with the old 3.5 format on a web server. The system 
 has been running fine for 2 years. About 1 month ago I upgraded from 
 Linux 2.2.16 to 2.4.18 (SuSE 8.1 default kernel). Some weeks ago I got 
 reiserfs error messages in syslog suggesting a fsck and I had some files 
 which couldn't be accessed or deleted. So last week I ran reiserfsck 
 --rebuild-tree. At first everything worked fine. The problematic files 
 could be accessed again.

Which reiserfsck version was it?

 After about 3 hours I got an oops report in my syslog, but the system 
 kept running normally. Again 3 hours later the machine crashed with 
 another oops. It turned out that the BUG() in _get_block_create_0 in 
 fs/reiserfs/inode.c was hit both times. According to the value of EAX
   le_key_k_type (version, key)
 is TYPE_ANY (0x0f) but TYPE_DIRECT (0x02) is expected.

Hm, sounds like FS corruption.
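
For readers wondering how TYPE_ANY can show up there: as far as I recall
from reiserfs_fs.h, keys in the old 3.5 format do not store the item type
directly; it is derived from the key's "uniqueness" field, and any
unrecognized value falls through to TYPE_ANY. The sketch below is a
user-space approximation written from memory, not a copy of the kernel
code. Only TYPE_ANY (0x0f) and TYPE_DIRECT (0x02) are confirmed by the
report above; the other constants and the function name uniqueness_to_type
are assumptions.

```c
#include <stdint.h>

/* Item types; TYPE_DIRECT and TYPE_ANY match the values quoted above,
 * the remaining ones are recalled from memory (assumption). */
enum {
    TYPE_STAT_DATA = 0,
    TYPE_INDIRECT  = 1,
    TYPE_DIRECT    = 2,
    TYPE_DIRENTRY  = 3,
    TYPE_ANY       = 15
};

/* "Uniqueness" values used by 3.5-format keys (assumption, from memory). */
#define V1_SD_UNIQUENESS        0u
#define V1_INDIRECT_UNIQUENESS  0xfffffffeu
#define V1_DIRECT_UNIQUENESS    0xffffffffu
#define V1_DIRENTRY_UNIQUENESS  500u

/* Map a 3.5-format uniqueness value to an item type. */
static int uniqueness_to_type(uint32_t uniqueness)
{
    switch (uniqueness) {
    case V1_SD_UNIQUENESS:
        return TYPE_STAT_DATA;
    case V1_INDIRECT_UNIQUENESS:
        return TYPE_INDIRECT;
    case V1_DIRECT_UNIQUENESS:
        return TYPE_DIRECT;
    case V1_DIRENTRY_UNIQUENESS:
        return TYPE_DIRENTRY;
    default:
        /* Anything else, e.g. a damaged key, is reported as TYPE_ANY. */
        return TYPE_ANY;
    }
}
```

Under these assumptions, a direct item whose uniqueness field has been
damaged on disk would be reported as TYPE_ANY rather than TYPE_DIRECT, which
is consistent with the corruption diagnosis.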

 The machine is a web server in production and I have only remote access, 
 so I couldn't run reiserfsck again.
 Any suggestions?

We'd be interested in a metadata snapshot
(debugreiserfs -p /dev/your_device | bzip2 -9c > metadata.bz2).
You can probably even do this on a device mounted read-only.
It will probably even work on a device mounted read-write, but
make sure there is not much write activity on that fs while the
snapshot is taken. Also avoid writing the metadata to the same fs you are
taking it from ;)

Bye,
Oleg



Re: 640.0 GB symlink

2002-12-04 Thread Oleg Drokin
Hello!

On Tue, Dec 03, 2002 at 09:02:38PM -0800, Jason Mancini wrote:

 Should I just erase and remake the symlink?

Yes, that would be the simplest thing.

 It wasn't like this in July (my last backup, *cough*).

Then something corrupted it, maybe even the drive itself.

reiserfsck generally does not shorten file sizes, but a symlink is a really
special kind of file, so this will be fixed for the next release for sure.
Maybe we will even include a check like that in the kernel.

Thank you for your report.

Bye,
Oleg



Re: journal relocation

2002-12-04 Thread Oleg Drokin
Hello!

On Wed, Dec 04, 2002 at 09:50:52PM -0600, Brian Tinsley wrote:
 Is there a patchset available for journal relocation on a 2.4 kernel 
 (2.4.20 specifically)? I've seen reference to it in a few places but 
 have been unable to locate it.

Sure. Check out ftp://ftp.suse.com/pub/people/mason/patches/data-logging/2.4.20

Bye,
Oleg



Re: non volatile ram devices

2002-12-04 Thread Oleg Drokin
Hello!

On Wed, Dec 04, 2002 at 08:59:35PM +0100, Russell Coker wrote:

 I have some servers that are giving inadequate disk performance for Maildir 
 mail spools.  They are running kernel 2.4.19 (2.4.20 upgrade is planned) and 
 using ReiserFS for everything that's important.

May I ask what kind of inadequacy you observe, and on what kinds of operations?

Thank you.

Bye,
Oleg


