PATCH: Trying to get back IO performance (WIP)
Hi This patch is against test3-pre2. It gives here good performance in the first run, and very bad in the following ones of dbench 48. I am hitting here problems with the locking scheme. I get a lot of contention in __wait_on_supper. Almost all the dbench processes are waiting in: 0xc7427dcc 0xc0116fbd schedule+0x389 (0xc4840c20, 0x12901d, 0xc7427ea0, 0x123456 7, 0xc7426000) kernel .text 0xc010 0xc0116c34 0xc01173c0 0xc013639c __wait_on_super+0x184 (0xc13f4c00) kernel .text 0xc010 0xc0136218 0xc0136410 0xc01523e5 ext2_alloc_block+0x21 (0xc4840c20, 0x12901d, 0xc7427ea0) kernel .text 0xc010 0xc01523c4 0xc015245c 0xc7427e5c 0xc0152892 block_getblk+0x15e (0xc4840c20, 0xc316b5e0, 0x9, 0x15, 0xc 7427ea0) kernel .text 0xc010 0xc0152734 0xc0152a68 0xc7427eac 0xc0152ed0 ext2_get_block+0x468 (0xc4840c20, 0x15, 0xc2d99de0, 0x1) kernel .text 0xc010 0xc0152a68 0xc0152fc0 0xc7427ef4 0xc0133ae3 __block_prepare_write+0xe7 (0xc4840c20, 0xc11f7f78, 0x0, 0 x1000, 0xc0152a68) kernel .text 0xc010 0xc01339fc 0xc0133bf0 0xc7427f18 0xc0134121 block_prepare_write+0x21 (0xc11f7f78, 0x0, 0x1000, 0xc0152 a68) kernel .text 0xc010 0xc0134100 0xc013413c [0]more> 0xc7427f30 0xc01531d1 ext2_prepare_write+0x19 (0xc06b4f00, 0xc11f7f78, 0x0, 0x10 00) kernel .text 0xc010 0xc01531b8 0xc01531d8 0xc7427f90 0xc0127b8d generic_file_write+0x305 (0xc06b4f00, 0x8050461, 0xafc2, 0 xc06b4f20) kernel .text 0xc010 0xc0127888 0xc0127cf0 0xc7427fbc 0xc0130ea8 sys_write+0xe8 (0x9, 0x804b460, 0xffc3, 0x28, 0x1082) kernel .text 0xc010 0xc0130dc0 0xc0130ed0 0xc0109874 system_call+0x34 kernel .text 0xc010 0xc0109840 0xc0109878 This behavior also happens with vanilla kernel, only that it is not so easy to reproduce. Vanilla Kernel also gives normally worse IO throughput than the first run with this patch. Comments/suggerences/fixes are welcome. Later, Juan. This patch does: - Introduces WRITETRY logic that means: write this buffer if that is possible, but never block. If we have to block, skip the write. (mainly from Jens axboe) - Uses that logic in sync_page_buffers(), i.e. never try to block in writes, when writing buffers. - Do all the blocking/waiting calling balance_dirty() in shrink_mmap. - export the sync_page_buffers function. - make try_to_free_buffers never generate any IO. - Change the caller of that two functions accordingly with the new semantics. diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude base/drivers/block/ll_rw_blk.c working/drivers/block/ll_rw_blk.c --- base/drivers/block/ll_rw_blk.c Fri Jun 30 18:42:30 2000 +++ working/drivers/block/ll_rw_blk.c Mon Jul 3 00:03:00 2000 @@ -556,7 +556,7 @@ unsigned int sector, count; int max_segments = MAX_SEGMENTS; struct request * req = NULL; - int rw_ahead, max_sectors, el_ret; + int r_ahead, w_ahead, max_sectors, el_ret; struct list_head *head = &q->queue_head; int latency; elevator_t *elevator = &q->elevator; @@ -584,30 +584,24 @@ } } - rw_ahead = 0; /* normal case; gets changed below for READA */ + r_ahead = w_ahead = 0; switch (rw) { case READA: - rw_ahead = 1; + r_ahead = 1; rw = READ; /* drop into READ */ case READ: if (buffer_uptodate(bh)) /* Hmmph! Already have it */ goto end_io; kstat.pgpgin++; break; - case WRITERAW: - rw = WRITE; - goto do_write; /* Skip the buffer refile */ + case WRITETRY: + w_ahead = 1; case WRITE: if (!test_and_clear_bit(BH_Dirty, &bh->b_state)) goto end_io;/* Hmmph! Nothing to write */ refile_buffer(bh); - do_write: - /* -* We don't allow the write-requests to fill up the -* queue completely: we want some room for reads, -* as they take precedence. The last third of the -* requests are only for reads. -*/ + case WRITERAW: /* Skip the buffer refile */ + rw = WRITE; kstat.pgpgout++; break;
Re: PATCH: Trying to get back IO performance (WIP)
Hi, On Mon, Jul 03, 2000 at 02:24:07AM +0200, Juan J. Quintela wrote: > This patch is against test3-pre2. > It gives here good performance in the first run, and very bad > in the following ones of dbench 48. I am hitting here problems with > the locking scheme. I get a lot of contention in __wait_on_supper. > Almost all the dbench processes are waiting in: > >0xc013639c __wait_on_super+0x184 (0xc13f4c00) >0xc01523e5 ext2_alloc_block+0x21 (0xc4840c20, 0x12901d, 0xc7427ea0) Known, and I did a patch for this ages ago. It actually didn't make a whole lot of difference. The last version of the ext2 diffs I did for this are included below. --Stephen >From [EMAIL PROTECTED] Mon May 1 18:18:15 2000 Return-Path: Received: (from sct@localhost) by dukat.scot.redhat.com (8.9.3/8.9.3) id SAA16268 for [EMAIL PROTECTED]; Mon, 1 May 2000 18:18:14 +0100 Resent-From: "Stephen C. Tweedie" <[EMAIL PROTECTED]> Resent-Message-ID: <[EMAIL PROTECTED]> Resent-Date: Mon, 1 May 2000 18:18:14 +0100 (BST) Resent-To: [EMAIL PROTECTED] X-Authentication-Warning: worf.scot.redhat.com: sct set sender to [EMAIL PROTECTED] using -f MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <[EMAIL PROTECTED]> In-Reply-To: <[EMAIL PROTECTED]> References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> X-Mailer: VM 6.59 under Emacs 20.3.1 From: "Stephen C. Tweedie" <[EMAIL PROTECTED]> To: Linus Torvalds <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]>, "Stephen C. Tweedie" <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: Re: DANGER: DONT apply it. Re: [patch] ext2nolock-2.2.8-A0 Date: Sat, 15 May 1999 03:05:41 +0100 (BST) Status: RO Content-Length: 22531 Lines: 755 Hi, Linus Torvalds writes: > In particular, ext2_new_block() returns a block number. That's all fine > and dandy, but it's extremely and utterly idiotic. It means that > ext2_new_block() needs to clear the block, which is where all the > race-avoidance comes from as far as I can tell. There are two sets of races involved. One involves consistency of the bitmaps themselves: we need to make sure that concurrent allocations are consistent. The second is consistency of inodes, and that is already handled in fs/ext2/inode.c: when we call ext2_alloc_block(), we follow it up with a check to see if anybody else has allocated the same block in the same inode while we slept. If so, we free the block and repeat the block search. So, there's no reason why block clearing needs to be locked: the block cannot be installed in the inode indirection maps until we have completely returned from ext2_new_block() anyway. Anyway, the patch below (against 2.2.8) is an extension of the last one I posted, and it gets rid of the superblock lock in balloc.c entirely. We do have to be a bit more careful about allocation consitency without the lock: in particular, quota operations used to happen between bitmap and group descriptor updates, but now we can't risk blocking while the bitmaps and group descriptors are inconsistent. It compiles but I can't test right now. Feel free to play with it if you want to, but I do believe we can safely go without superblock locks in both ialloc.c and balloc.c if we are careful. The main change that dropping the lock imposes on us is that we cannot rely on the bitmap buffer remaining pinned in cache for the duration of the allocation, so we have to bump the buffer_head b_count in load_block_bitmap() and brelse it once we are finished with it. Shift-scrlck will be very useful here for spotting the buffer_head leaks I've introduced. :) --Stephen alloc-2.2.8.diff: --- fs/ext2/balloc.c.~1~Thu Oct 29 05:54:56 1998 +++ fs/ext2/balloc.cFri May 14 14:46:59 1999 @@ -9,6 +9,13 @@ * Enhanced block allocation by Stephen Tweedie ([EMAIL PROTECTED]), 1993 * Big-endian to little-endian byte-swapping/bitmaps by *David S. Miller ([EMAIL PROTECTED]), 1995 + * + * Dropped use of the superblock lock. + * This means that we have to take extra care not to block between any + * operations on the bitmap and the corresponding update to the group + * descriptors. + * Stephen C. Tweedie ([EMAIL PROTECTED]), 1999 + * */ /* @@ -74,42 +81,33 @@ } /* - * Read the bitmap for a given block_group, reading into the specified - * slot in the superblock's bitmap cache. + * Read the bitmap for a given block_group. * - * Return >=0 on success or a -ve error code. + * Return the buffer_head on success or NULL on IO error. */ -static int read_block_bitmap (struct super_block * sb, - unsigned int block_group, - unsigned long bitmap_nr) +static struct buffer_head * read_block_bitmap (struct super_block * sb, +