Re: [reiserfs-list] Disk fragmentation and performance degradation caused by NFS/ preallocation code interaction

2001-10-26 Thread Eric Whiting

No -- bonnie doesn't need 1G of RAM -- bonnie needs to test using a file
much larger than available RAM (on this 512M box, a test file of 1G or
more) to ensure that bonnie actually writes something to disk and not
just to the VFS/buffer-cache layer.
eric


Bo Moon wrote:
 
 Hi,
 
 What does 2*RAM mean?
 His box has 512M RAM, so he needs 2*512M for Bonnie?
 Why?
 
 Thanks,
 
 Bo
 
 Chris Mason wrote:
 
 
   Linux box is 512M RAM. The files fit in buffer cache. I run the bonnies
   of size n+50. Results are fairly constant until I hit about 300M. Then
   they seem to fall off an edge. I'll include only two sample points for
   each setup.
 
  Bonnie really needs 2 * RAM to be reliable.  Notice how on the 100M
  test you've got 99% CPU usage?  That's most likely because the 100M
  test is doing nothing but hammering on the various caches.
 
  -chris



Re: [reiserfs-list] Disk fragmentation and performance degradation caused by NFS/ preallocation code interaction

2001-10-25 Thread Chris Mason



On Tuesday, October 23, 2001 02:19:57 PM -0400 Anne Milicia 
[EMAIL PROTECTED] wrote:

[ great analysis of fragmentation problem + fix ]
 
 So, my question is can journal_mark_freed() be safely skipped when
 reiserfs_free_block() is called by __discard_prealloc()?  Can you think
 of any problems with making this change?
 

Ok, this fragmentation problem is triggered by any program that opens,
appends to, and closes the same file frequently enough that this happens
more than once inside a single transaction (5 seconds).  This is because
preallocated blocks are discarded on file close.

So, I wrote a program that did this X number of times in a loop, and 
got similar results to Anne's.  I reworked her patch slightly to 
keep the interface to reiserfs_free_block the same, but I did keep 
the change to __discard_prealloc.
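
For reference, a minimal sketch of that kind of loop looks something like
this (illustrative only -- the file name, write size, and iteration count
are arbitrary, and this is not the exact test program):

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Open, append a small buffer, and close the same file over and over.
 * If two iterations land inside the same reiserfs transaction (~5
 * seconds), the blocks preallocated for the file are discarded at each
 * close and can be handed out to other allocations in the meantime. */
int main(int argc, char **argv)
{
    char buf[4096] = { 0 };
    int i, fd;
    int count = (argc > 1) ? atoi(argv[1]) : 10000;

    for (i = 0; i < count; i++) {
        fd = open("appendtest", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0)
            return 1;
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
            return 1;
        close(fd);
    }
    return 0;
}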

In her code, __discard_prealloc preserves the next block number to be
preallocated, greatly increasing the chance this block will be chosen
again if we do allocation before the inode goes out of cache.

Normally find_tag does this for us, but there are a few corner cases 
(mostly with holes) where it can't.

Anyway, Anne, could you please take a look and make sure this still
improves your performance?  I think the odd results you got for 2.4.12
before were probably due to actual fragmentation against preallocated
blocks from other files.  With a single writer, 2.4.13 allocates blocks
in order during the open, append, close loop for me.

thanks,
Chris

--- 1.11/fs/reiserfs/bitmap.c Wed, 24 Oct 2001 09:50:17 -0400 
+++ 2413.10/fs/reiserfs/bitmap.c Thu, 25 Oct 2001 14:07:55 -0400 
@@ -84,7 +84,7 @@
to free a list of blocks at once. -Hans */
/* I wonder if it would be less modest
now that we use journaling. -Hans */
-void reiserfs_free_block (struct reiserfs_transaction_handle *th, unsigned long block)
+static void _reiserfs_free_block (struct reiserfs_transaction_handle *th, unsigned long block)
 {
 struct super_block * s = th->t_super;
 struct reiserfs_super_block * rs;
@@ -92,18 +92,12 @@
 struct buffer_head ** apbh;
 int nr, offset;
 
-  RFALSE(!s, "vs-4060: trying to free block on nonexistent device");
-  RFALSE(is_reusable (s, block, 1) == 0, "vs-4070: can not free such block");
-
   rs = SB_DISK_SUPER_BLOCK (s);
   sbh = SB_BUFFER_WITH_SB (s);
   apbh = SB_AP_BITMAP (s);
 
   get_bit_address (s, block, &nr, &offset);
 
-  /* mark it before we clear it, just in case */
-  journal_mark_freed(th, s, block) ;
-
   reiserfs_prepare_for_journal(s, apbh[nr], 1 ) ;
 
   /* clear bit for the given block in bit map */
@@ -122,7 +116,26 @@
   s->s_dirt = 1;
 }
 
+void reiserfs_free_block (struct reiserfs_transaction_handle *th, 
+  unsigned long block) {
+struct super_block * s = th->t_super;
+
+RFALSE(!s, "vs-4061: trying to free block on nonexistent device");
+RFALSE(is_reusable (s, block, 1) == 0, "vs-4071: can not free such block");
+/* mark it before we clear it, just in case */
+journal_mark_freed(th, s, block) ;
+_reiserfs_free_block(th, block) ;
+}
+
+/* preallocated blocks don't need to be run through journal_mark_freed */
+void reiserfs_free_prealloc_block (struct reiserfs_transaction_handle *th, 
+  unsigned long block) {
+struct super_block * s = th->t_super;
 
+RFALSE(!s, "vs-4060: trying to free block on nonexistent device");
+RFALSE(is_reusable (s, block, 1) == 0, "vs-4070: can not free such block");
+_reiserfs_free_block(th, block) ;
+}
 
 /* beginning from offset-th bit in bmap_nr-th bitmap block,
find_forward finds the closest zero bit. It returns 1 and zero
@@ -649,11 +662,13 @@
 static void __discard_prealloc (struct reiserfs_transaction_handle * th,
struct inode * inode)
 {
+  unsigned long search_start = inode->u.reiserfs_i.i_prealloc_block ;
   while (inode->u.reiserfs_i.i_prealloc_count > 0) {
-reiserfs_free_block(th,inode->u.reiserfs_i.i_prealloc_block);
+reiserfs_free_prealloc_block(th,inode->u.reiserfs_i.i_prealloc_block);
 inode->u.reiserfs_i.i_prealloc_block++;
 inode->u.reiserfs_i.i_prealloc_count --;
   }
+  inode->u.reiserfs_i.i_prealloc_block = search_start ;
   list_del (&(inode->u.reiserfs_i.i_prealloc_list));
 }
 




Re: [reiserfs-list] Disk fragmentation and performance degradation caused by NFS/ preallocation code interaction

2001-10-25 Thread Anne Milicia

Chris Mason wrote:
 
 Anyway, Anne, could you please take a look and make sure this still
 improves your performance?  I think the odd results you got for 2.4.12
 before were probably due to actual fragmentation against preallocated
 blocks from other files.  With a single writer, 2.4.13 allocates blocks
 in order during the open, append, close loop for me.
 

Hi Chris,
Thanks, I had switched over to almost the same code after your
suggestion.  (Much nicer!)
I will try this patch.  The 2.4.12 results actually turned out to have
reallocated the prealloc blocks to the same file, but out of order.  I
think the reason for this is that the NFS client sent a bunch of NFS
write requests asynchronously and multiple NFSD threads sent the writes
down to reiserfs out of order.  Since I was loopback mounting, the
client for each kernel was the same as the server version.  I don't see
a difference on a quick look, but I'll see if I can figure out the
difference during testing.  I noticed that Trond has two patches,
http://www.fys.uio.no/~trondmy/src/2.4.13/linux-2.4.13-tune.dif and
http://www.fys.uio.no/~trondmy/src/2.4.13/linux-2.4.13-write.dif, that
will be interesting to try.

If NFS continues to send writes to reiserfs out of order, an
NFS-specific optimization would be to allocate the disk blocks that keep
the file data contiguous whenever the new data being written is within a
short number of blocks of the current end of the file.
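
Just to make the idea concrete, something along these lines (purely a
sketch; the helper name, its arguments, and the threshold are
hypothetical and not existing reiserfs code):

/* Hypothetical sketch only.  The idea: when an out-of-order NFS write
 * lands within a few blocks of the current end of file, hint the block
 * allocator toward the block a strictly sequential append would have
 * received, so the on-disk layout stays contiguous. */
#define NEAR_EOF_BLOCKS 32   /* "short number of blocks"; value is a guess */

static unsigned long contiguous_hint(unsigned long last_allocated_block,
                                     unsigned long eof_block,
                                     unsigned long write_block,
                                     unsigned long default_hint)
{
    if (write_block >= eof_block &&
        write_block - eof_block < NEAR_EOF_BLOCKS)
        return last_allocated_block + 1 + (write_block - eof_block);
    return default_hint;
}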

Thanks again for the help!
Anne



Re: [reiserfs-list] Disk fragmentation and performance degradation caused by NFS/ preallocation code interaction

2001-10-25 Thread Chris Mason



On Thursday, October 25, 2001 03:06:27 PM -0600 Eric Whiting [EMAIL PROTECTED] wrote:

 Here is some feedback of 2.4.13+the patch from Chris. Two tests: local
 fs and NFS. I still see odd things happening at files above 300M. This
 is reiserfs formatted -v2 (3.6) with a default mount (tail).
 
 Linux box is 512M RAM. The files fit in buffer cache. I run the bonnies
 of size n+50. Results are fairly constant until I hit about 300M. Then
 they seem to fall off an edge. I'll include only two sample points for
 each setup.

Bonnie really needs 2 * RAM to be reliable.  Notice how on the 100M
test you've got 99% CPU usage?  That's most likely because the 100M
test is doing nothing but hammering on the various caches.  The drop
off point comes at 300M because the test starts taking long enough
for kupdate and friends to jump in and throttle the writes.

How long does each phase take (esp. the intelligent write phase)?

-chris