PATCH: Trying to get back IO performance (WIP)

2000-07-03 Thread Juan J. Quintela


Hi
This patch is against test3-pre2.
It gives here good performance in the first run, and very bad
in the following ones of dbench 48.  I am hitting here problems with
the locking scheme.  I get a lot of contention in __wait_on_supper.
Almost all the dbench processes are waiting in:

0xc7427dcc 0xc0116fbd schedule+0x389 (0xc4840c20, 0x12901d, 0xc7427ea0, 0x123456
7, 0xc7426000)
   kernel .text 0xc010 0xc0116c34 0xc01173c0
   0xc013639c __wait_on_super+0x184 (0xc13f4c00)
   kernel .text 0xc010 0xc0136218 0xc0136410
   0xc01523e5 ext2_alloc_block+0x21 (0xc4840c20, 0x12901d, 0xc7427ea0)
   kernel .text 0xc010 0xc01523c4 0xc015245c
0xc7427e5c 0xc0152892 block_getblk+0x15e (0xc4840c20, 0xc316b5e0, 0x9, 0x15, 0xc
7427ea0)
   kernel .text 0xc010 0xc0152734 0xc0152a68
0xc7427eac 0xc0152ed0 ext2_get_block+0x468 (0xc4840c20, 0x15, 0xc2d99de0, 0x1)
   kernel .text 0xc010 0xc0152a68 0xc0152fc0
0xc7427ef4 0xc0133ae3 __block_prepare_write+0xe7 (0xc4840c20, 0xc11f7f78, 0x0, 0
x1000, 0xc0152a68)
   kernel .text 0xc010 0xc01339fc 0xc0133bf0
0xc7427f18 0xc0134121 block_prepare_write+0x21 (0xc11f7f78, 0x0, 0x1000, 0xc0152
a68)
   kernel .text 0xc010 0xc0134100 0xc013413c
[0]more> 
0xc7427f30 0xc01531d1 ext2_prepare_write+0x19 (0xc06b4f00, 0xc11f7f78, 0x0, 0x10
00)
   kernel .text 0xc010 0xc01531b8 0xc01531d8
0xc7427f90 0xc0127b8d generic_file_write+0x305 (0xc06b4f00, 0x8050461, 0xafc2, 0
xc06b4f20)
   kernel .text 0xc010 0xc0127888 0xc0127cf0
0xc7427fbc 0xc0130ea8 sys_write+0xe8 (0x9, 0x804b460, 0xffc3, 0x28, 0x1082)
   kernel .text 0xc010 0xc0130dc0 0xc0130ed0
   0xc0109874 system_call+0x34
   kernel .text 0xc010 0xc0109840 0xc0109878


This behavior also happens with vanilla kernel, only that it is not
so easy to reproduce.  Vanilla Kernel also gives normally worse IO
throughput than the first run with this patch.

Comments/suggerences/fixes are welcome.

Later, Juan.

This patch does:
 - Introduces WRITETRY logic that means: write this buffer if that
   is possible, but never block.  If we have to block, skip the
   write. (mainly from Jens axboe)
 - Uses that logic in sync_page_buffers(), i.e. never try to block
   in writes, when writing buffers.
 - Do all the blocking/waiting calling balance_dirty() in
   shrink_mmap.
 - export the sync_page_buffers function.
 - make try_to_free_buffers never generate any IO.
 - Change the caller of that two functions accordingly with the new
   semantics.


diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude 
base/drivers/block/ll_rw_blk.c working/drivers/block/ll_rw_blk.c
--- base/drivers/block/ll_rw_blk.c  Fri Jun 30 18:42:30 2000
+++ working/drivers/block/ll_rw_blk.c   Mon Jul  3 00:03:00 2000
@@ -556,7 +556,7 @@
unsigned int sector, count;
int max_segments = MAX_SEGMENTS;
struct request * req = NULL;
-   int rw_ahead, max_sectors, el_ret;
+   int r_ahead, w_ahead, max_sectors, el_ret;
struct list_head *head = &q->queue_head;
int latency;
elevator_t *elevator = &q->elevator;
@@ -584,30 +584,24 @@
}
}
 
-   rw_ahead = 0;   /* normal case; gets changed below for READA */
+   r_ahead = w_ahead = 0;
switch (rw) {
case READA:
-   rw_ahead = 1;
+   r_ahead = 1;
rw = READ;  /* drop into READ */
case READ:
if (buffer_uptodate(bh)) /* Hmmph! Already have it */
goto end_io;
kstat.pgpgin++;
break;
-   case WRITERAW:
-   rw = WRITE;
-   goto do_write;  /* Skip the buffer refile */
+   case WRITETRY:
+   w_ahead = 1;
case WRITE:
if (!test_and_clear_bit(BH_Dirty, &bh->b_state))
goto end_io;/* Hmmph! Nothing to write */
refile_buffer(bh);
-   do_write:
-   /*
-* We don't allow the write-requests to fill up the
-* queue completely:  we want some room for reads,
-* as they take precedence. The last third of the
-* requests are only for reads.
-*/
+   case WRITERAW:  /* Skip the buffer refile */
+   rw = WRITE;
kstat.pgpgout++;
break;
   

Re: PATCH: Trying to get back IO performance (WIP)

2000-07-03 Thread Stephen C. Tweedie

Hi,

On Mon, Jul 03, 2000 at 02:24:07AM +0200, Juan J. Quintela wrote:

> This patch is against test3-pre2.
> It gives here good performance in the first run, and very bad
> in the following ones of dbench 48.  I am hitting here problems with
> the locking scheme.  I get a lot of contention in __wait_on_supper.
> Almost all the dbench processes are waiting in:
> 
>0xc013639c __wait_on_super+0x184 (0xc13f4c00)
>0xc01523e5 ext2_alloc_block+0x21 (0xc4840c20, 0x12901d, 0xc7427ea0)

Known, and I did a patch for this ages ago.  It actually didn't make a
whole lot of difference.  The last version of the ext2 diffs I did for
this are included below.

--Stephen



>From [EMAIL PROTECTED]  Mon May  1 18:18:15 2000
Return-Path: 
Received: (from sct@localhost)
by dukat.scot.redhat.com (8.9.3/8.9.3) id SAA16268
for [EMAIL PROTECTED]; Mon, 1 May 2000 18:18:14 +0100
Resent-From: "Stephen C. Tweedie" <[EMAIL PROTECTED]>
Resent-Message-ID: <[EMAIL PROTECTED]>
Resent-Date: Mon, 1 May 2000 18:18:14 +0100 (BST)
Resent-To: [EMAIL PROTECTED]
X-Authentication-Warning: worf.scot.redhat.com: sct set sender to 
[EMAIL PROTECTED] using -f
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <[EMAIL PROTECTED]>
In-Reply-To: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>
<[EMAIL PROTECTED]>
X-Mailer: VM 6.59 under Emacs 20.3.1
From: "Stephen C. Tweedie" <[EMAIL PROTECTED]>
To: Linus Torvalds <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>, "Stephen C. Tweedie" <[EMAIL PROTECTED]>,
[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: DANGER: DONT apply it. Re: [patch] ext2nolock-2.2.8-A0
Date: Sat, 15 May 1999 03:05:41 +0100 (BST)
Status: RO
Content-Length: 22531
Lines: 755

Hi,

Linus Torvalds writes:

 > In particular, ext2_new_block() returns a block number. That's all fine
 > and dandy, but it's extremely and utterly idiotic. It means that
 > ext2_new_block() needs to clear the block, which is where all the
 > race-avoidance comes from as far as I can tell. 

There are two sets of races involved.  One involves consistency of the
bitmaps themselves: we need to make sure that concurrent allocations
are consistent.  The second is consistency of inodes, and that is
already handled in fs/ext2/inode.c: when we call ext2_alloc_block(),
we follow it up with a check to see if anybody else has allocated the
same block in the same inode while we slept.  If so, we free the block
and repeat the block search.

So, there's no reason why block clearing needs to be locked: the block
cannot be installed in the inode indirection maps until we have
completely returned from ext2_new_block() anyway.

Anyway, the patch below (against 2.2.8) is an extension of the last
one I posted, and it gets rid of the superblock lock in balloc.c
entirely.  We do have to be a bit more careful about allocation
consitency without the lock: in particular, quota operations used to
happen between bitmap and group descriptor updates, but now we can't
risk blocking while the bitmaps and group descriptors are
inconsistent.

It compiles but I can't test right now.  Feel free to play with it if
you want to, but I do believe we can safely go without superblock
locks in both ialloc.c and balloc.c if we are careful.

The main change that dropping the lock imposes on us is that we cannot
rely on the bitmap buffer remaining pinned in cache for the duration
of the allocation, so we have to bump the buffer_head b_count in
load_block_bitmap() and brelse it once we are finished with it.
Shift-scrlck will be very useful here for spotting the buffer_head
leaks I've introduced. :)

--Stephen

alloc-2.2.8.diff:

--- fs/ext2/balloc.c.~1~Thu Oct 29 05:54:56 1998
+++ fs/ext2/balloc.cFri May 14 14:46:59 1999
@@ -9,6 +9,13 @@
  *  Enhanced block allocation by Stephen Tweedie ([EMAIL PROTECTED]), 1993
  *  Big-endian to little-endian byte-swapping/bitmaps by
  *David S. Miller ([EMAIL PROTECTED]), 1995
+ *
+ *  Dropped use of the superblock lock.
+ *  This means that we have to take extra care not to block between any
+ *  operations on the bitmap and the corresponding update to the group 
+ *  descriptors.
+ *   Stephen C. Tweedie ([EMAIL PROTECTED]), 1999
+ * 
  */
 
 /*
@@ -74,42 +81,33 @@
 }
 
 /*
- * Read the bitmap for a given block_group, reading into the specified 
- * slot in the superblock's bitmap cache.
+ * Read the bitmap for a given block_group.
  *
- * Return >=0 on success or a -ve error code.
+ * Return the buffer_head on success or NULL on IO error.
  */
 
-static int read_block_bitmap (struct super_block * sb,
-  unsigned int block_group,
-  unsigned long bitmap_nr)
+static struct buffer_head * read_block_bitmap (struct super_block * sb,
+