Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Dave Kleikamp
On Wed, 2007-09-19 at 12:15 -0700, Mingming Cao wrote:

 Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
 cases except one handles memory allocation failure so I get rid of those
 GFP_NOFAIL flags.
 
 Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
 in jbd/jbd2? I will send a separate patch to cleanup that.

No.  GFP_NOFS avoids deadlock.  It prevents the allocation from making
recursive calls back into the file system that could end up blocking on
jbd code.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Dave Kleikamp
On Wed, 2007-09-19 at 14:26 -0500, Dave Kleikamp wrote:
 On Wed, 2007-09-19 at 12:15 -0700, Mingming Cao wrote:
 
  Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
  cases except one handles memory allocation failure so I get rid of those
  GFP_NOFAIL flags.
  
  Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
  in jbd/jbd2? I will send a separate patch to cleanup that.
 
 No.  GFP_NOFS avoids deadlock.  It prevents the allocation from making
 recursive calls back into the file system that could end up blocking on
 jbd code.

Oh, I see your patch now.  You mean use GFP_NOFS instead of
GFP_KERNEL.  :-)  OK then.

 Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Andreas Dilger
On Sep 19, 2007  12:15 -0700, Mingming Cao wrote:
 @@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
  
  alloc_transaction:
   if (!journal-j_running_transaction) {
 - new_transaction = kmalloc(sizeof(*new_transaction),
 - GFP_NOFS|__GFP_NOFAIL);
 + new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);

This should probably be a __GFP_NOFAIL if we are trying to start a new
handle in truncate, as there is no way to propagate an error to the caller.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Mingming Cao
On Wed, 2007-09-19 at 19:28 +, Dave Kleikamp wrote:
 On Wed, 2007-09-19 at 14:26 -0500, Dave Kleikamp wrote:
  On Wed, 2007-09-19 at 12:15 -0700, Mingming Cao wrote:
  
   Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
   cases except one handles memory allocation failure so I get rid of those
   GFP_NOFAIL flags.
   
   Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
   in jbd/jbd2? I will send a separate patch to cleanup that.
  
  No.  GFP_NOFS avoids deadlock.  It prevents the allocation from making
  recursive calls back into the file system that could end up blocking on
  jbd code.
 
 Oh, I see your patch now.  You mean use GFP_NOFS instead of
 GFP_KERNEL.  :-)  OK then.
 

oops, I did mean what you say here.:-)

  Shaggy

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Christoph Hellwig
On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
 Here is the incremental small cleanup patch. 
 
 Remove kamlloc usages in jbd/jbd2 and consistently use 
 jbd_kmalloc/jbd2_malloc.

Shouldn't we kill jbd_kmalloc instead?

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
 On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
  Here is the incremental small cleanup patch. 
  
  Remove kamlloc usages in jbd/jbd2 and consistently use 
  jbd_kmalloc/jbd2_malloc.
 
 Shouldn't we kill jbd_kmalloc instead?
 

It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
places to handle memory (de)allocation(page size) via kmalloc/kfree, so
in the future if we need to change memory allocation in jbd(e.g. not
using kmalloc or using different flag), we don't need to touch every
place in the jbd code calling jbd_kmalloc.

Regards,
Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Dave Kleikamp
On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
 On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
  On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
   Here is the incremental small cleanup patch. 
   
   Remove kamlloc usages in jbd/jbd2 and consistently use 
   jbd_kmalloc/jbd2_malloc.
  
  Shouldn't we kill jbd_kmalloc instead?
  
 
 It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
 places to handle memory (de)allocation(page size) via kmalloc/kfree, so
 in the future if we need to change memory allocation in jbd(e.g. not
 using kmalloc or using different flag), we don't need to touch every
 place in the jbd code calling jbd_kmalloc.

I disagree.  Why would jbd need to globally change the way it allocates
memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
variety of structures.  Having to change one particular instance won't
necessarily mean we want to change all of them.  Adding unnecessary
wrappers only obfuscates the code making it harder to understand.  You
wouldn't want every subsystem to have it's own *_kmalloc() that took
different arguments.  Besides, there aren't that many calls to kmalloc
and kfree in the jbd code, so there wouldn't be much pain in changing
GFP flags or whatever, if it ever needed to be done.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote:
 On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
  On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
   On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
Here is the incremental small cleanup patch. 

Remove kamlloc usages in jbd/jbd2 and consistently use 
jbd_kmalloc/jbd2_malloc.
   
   Shouldn't we kill jbd_kmalloc instead?
   
  
  It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
  places to handle memory (de)allocation(page size) via kmalloc/kfree, so
  in the future if we need to change memory allocation in jbd(e.g. not
  using kmalloc or using different flag), we don't need to touch every
  place in the jbd code calling jbd_kmalloc.
 
 I disagree.  Why would jbd need to globally change the way it allocates
 memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
 variety of structures.  Having to change one particular instance won't
 necessarily mean we want to change all of them.  Adding unnecessary
 wrappers only obfuscates the code making it harder to understand.  You
 wouldn't want every subsystem to have it's own *_kmalloc() that took
 different arguments.  Besides, there aren't that many calls to kmalloc
 and kfree in the jbd code, so there wouldn't be much pain in changing
 GFP flags or whatever, if it ever needed to be done.
 
 Shaggy

Okay, Points taken, Here is the updated patch to get rid of slab
management and jbd_kmalloc from jbd totally. This patch is intend to
replace the patch in mm tree, Andrew, could you pick up this one
instead?

Thanks,

Mingming


jbd/jbd2: JBD memory allocation cleanups

From: Christoph Lameter [EMAIL PROTECTED]

JBD: Replace slab allocations with page cache allocations

JBD allocate memory for committed_data and frozen_data from slab. However
JBD should not pass slab pages down to the block layer. Use page allocator 
pages instead. This will also prepare JBD for the large blocksize patchset.


Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

---
 fs/jbd/commit.c   |6 +--
 fs/jbd/journal.c  |   99 ++
 fs/jbd/transaction.c  |   12 +++---
 fs/jbd2/commit.c  |6 +--
 fs/jbd2/journal.c |   99 ++
 fs/jbd2/transaction.c |   18 -
 include/linux/jbd.h   |   18 +
 include/linux/jbd2.h  |   21 +-
 8 files changed, 52 insertions(+), 227 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-18 17:19:01.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-18 17:51:21.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in-b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in-b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in-b_frozen_data) {
-   jbd_slab_free(tmp, bh_in-b_size);
+   jbd_free(tmp, bh_in-b_size);
goto repeat;
}
 
@@ -654,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb-s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Andrew Morton
On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao [EMAIL PROTECTED] wrote:

 JBD: Replace slab allocations with page cache allocations
 
 JBD allocate memory for committed_data and frozen_data from slab. However
 JBD should not pass slab pages down to the block layer. Use page allocator 
 pages instead. This will also prepare JBD for the large blocksize patchset.
 
 
 Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

__GFP_NOFAIL should only be used when we have no way of recovering
from failure.  The allocation in journal_init_common() (at least)
_can_ recover and hence really shouldn't be using __GFP_NOFAIL.

(Actually, nothing in the kernel should be using __GFP_NOFAIL.  It is 
there as a marker which says we really shouldn't be doing this but
we don't know how to fix it).

So sometime it'd be good if you could review all the __GFP_NOFAILs in
there and see if we can remove some, thanks.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Mingming Cao
On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
 jbd/jbd2: Replace slab allocations with page cache allocations
 
 From: Christoph Lameter [EMAIL PROTECTED]
 
 JBD should not pass slab pages down to the block layer.
 Use page allocator pages instead. This will also prepare
 JBD for the large blocksize patchset.
 

Currently memory allocation for committed_data(and frozen_buffer) for
bufferhead is done through jbd slab management, as Christoph Hellwig
pointed out that this is broken as jbd should not pass slab pages down
to IO layer. and suggested to use get_free_pages() directly.

The problem with this patch, as Andreas Dilger pointed today in ext4
interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
1/3-1/2 page space. 

What was the originally intention to set up slabs for committed_data(and
frozen_buffer) in JBD? Why not using kmalloc?

Mingming

 Tested on 2.6.23-rc6 with fsx runs fine.
 
 Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
 Signed-off-by: Mingming Cao [EMAIL PROTECTED]
 ---
  fs/jbd/checkpoint.c   |2 
  fs/jbd/commit.c   |6 +-
  fs/jbd/journal.c  |  107 
 -
  fs/jbd/transaction.c  |   10 ++--
  fs/jbd2/checkpoint.c  |2 
  fs/jbd2/commit.c  |6 +-
  fs/jbd2/journal.c |  109 
 --
  fs/jbd2/transaction.c |   18 
  include/linux/jbd.h   |   23 +-
  include/linux/jbd2.h  |   28 ++--
  10 files changed, 83 insertions(+), 228 deletions(-)
 
 Index: linux-2.6.23-rc5/fs/jbd/journal.c
 ===
 --- linux-2.6.23-rc5.orig/fs/jbd/journal.c2007-09-13 13:37:57.0 
 -0700
 +++ linux-2.6.23-rc5/fs/jbd/journal.c 2007-09-13 13:45:39.0 -0700
 @@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
  static int journal_convert_superblock_v1(journal_t *, journal_superblock_t 
 *);
  static void __journal_abort_soft (journal_t *journal, int errno);
 -static int journal_create_jbd_slab(size_t slab_size);
 
  /*
   * Helper function used to manage commit timeouts
 @@ -334,10 +333,10 @@ repeat:
   char *tmp;
 
   jbd_unlock_bh_state(bh_in);
 - tmp = jbd_slab_alloc(bh_in-b_size, GFP_NOFS);
 + tmp = jbd_alloc(bh_in-b_size, GFP_NOFS);
   jbd_lock_bh_state(bh_in);
   if (jh_in-b_frozen_data) {
 - jbd_slab_free(tmp, bh_in-b_size);
 + jbd_free(tmp, bh_in-b_size);
   goto repeat;
   }
 
 @@ -679,7 +678,7 @@ static journal_t * journal_init_common (
   /* Set up a default-sized revoke table for the new mount. */
   err = journal_init_revoke(journal, JOURNAL_REVOKE_DEFAULT_HASH);
   if (err) {
 - kfree(journal);
 + jbd_kfree(journal);
   goto fail;
   }
   return journal;
 @@ -728,7 +727,7 @@ journal_t * journal_init_dev(struct bloc
   if (!journal-j_wbuf) {
   printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
   __FUNCTION__);
 - kfree(journal);
 + jbd_kfree(journal);
   journal = NULL;
   goto out;
   }
 @@ -782,7 +781,7 @@ journal_t * journal_init_inode (struct i
   if (!journal-j_wbuf) {
   printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
   __FUNCTION__);
 - kfree(journal);
 + jbd_kfree(journal);
   return NULL;
   }
 
 @@ -791,7 +790,7 @@ journal_t * journal_init_inode (struct i
   if (err) {
   printk(KERN_ERR %s: Cannnot locate journal superblock\n,
  __FUNCTION__);
 - kfree(journal);
 + jbd_kfree(journal);
   return NULL;
   }
 
 @@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
   }
   }
 
 - /*
 -  * Create a slab for this blocksize
 -  */
 - err = journal_create_jbd_slab(be32_to_cpu(sb-s_blocksize));
 - if (err)
 - return err;
 -
   /* Let the recovery code check whether it needs to recover any
* data from the journal. */
   if (journal_recover(journal))
 @@ -1166,7 +1158,7 @@ void journal_destroy(journal_t *journal)
   if (journal-j_revoke)
   journal_destroy_revoke(journal);
   kfree(journal-j_wbuf);
 - kfree(journal);
 + jbd_kfree(journal);
  }
 
 
 @@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
  }
 
  /*
 - * Simple support for retrying memory allocations.  Introduced to help to
 - * debug different VM deadlock avoidance strategies.
 - */
 -void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
 -{
 - return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
 -}
 -
 -/*
 - * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
 - * and allocate frozen 

Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Christoph Hellwig
On Mon, Sep 17, 2007 at 12:29:51PM -0700, Mingming Cao wrote:
 The problem with this patch, as Andreas Dilger pointed today in ext4
 interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
 1/3-1/2 page space. 
 
 What was the originally intention to set up slabs for committed_data(and
 frozen_buffer) in JBD? Why not using kmalloc?

kmalloc is using slabs :)

The intent was to avoid the wasted memory, but as we've repeated a gazillion
times wasted memory on a rather rare codepath doesn't really matter when
you just crash random storage drivers otherwise.
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Badari Pulavarty
On Mon, 2007-09-17 at 12:29 -0700, Mingming Cao wrote:
 On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
  jbd/jbd2: Replace slab allocations with page cache allocations
  
  From: Christoph Lameter [EMAIL PROTECTED]
  
  JBD should not pass slab pages down to the block layer.
  Use page allocator pages instead. This will also prepare
  JBD for the large blocksize patchset.
  
 
 Currently memory allocation for committed_data(and frozen_buffer) for
 bufferhead is done through jbd slab management, as Christoph Hellwig
 pointed out that this is broken as jbd should not pass slab pages down
 to IO layer. and suggested to use get_free_pages() directly.
 
 The problem with this patch, as Andreas Dilger pointed today in ext4
 interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
 1/3-1/2 page space. 
 
 What was the originally intention to set up slabs for committed_data(and
 frozen_buffer) in JBD? Why not using kmalloc?
 
 Mingming

Looks good. Small suggestion is to get rid of all kmalloc() usages and
consistently use jbd_kmalloc() or jbd2_kmalloc().

Thanks,
Badari

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Mingming Cao
On Mon, 2007-09-17 at 15:01 -0700, Badari Pulavarty wrote:
 On Mon, 2007-09-17 at 12:29 -0700, Mingming Cao wrote:
  On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
   jbd/jbd2: Replace slab allocations with page cache allocations
   
   From: Christoph Lameter [EMAIL PROTECTED]
   
   JBD should not pass slab pages down to the block layer.
   Use page allocator pages instead. This will also prepare
   JBD for the large blocksize patchset.
   
  
  Currently memory allocation for committed_data(and frozen_buffer) for
  bufferhead is done through jbd slab management, as Christoph Hellwig
  pointed out that this is broken as jbd should not pass slab pages down
  to IO layer. and suggested to use get_free_pages() directly.
  
  The problem with this patch, as Andreas Dilger pointed today in ext4
  interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
  1/3-1/2 page space. 
  
  What was the originally intention to set up slabs for committed_data(and
  frozen_buffer) in JBD? Why not using kmalloc?
  
  Mingming
 
 Looks good. Small suggestion is to get rid of all kmalloc() usages and
 consistently use jbd_kmalloc() or jbd2_kmalloc().
 
 Thanks,
 Badari
 

Here is the incremental small cleanup patch. 

Remove kamlloc usages in jbd/jbd2 and consistently use jbd_kmalloc/jbd2_malloc.


Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/jbd/journal.c  |8 +---
 fs/jbd/revoke.c   |   12 ++--
 fs/jbd2/journal.c |8 +---
 fs/jbd2/revoke.c  |   12 ++--
 4 files changed, 22 insertions(+), 18 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-17 14:32:16.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-17 14:33:59.0 -0700
@@ -723,7 +723,8 @@ journal_t * journal_init_dev(struct bloc
journal-j_blocksize = blocksize;
n = journal-j_blocksize / sizeof(journal_block_tag_t);
journal-j_wbufsize = n;
-   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal-j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal-j_wbuf) {
printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
__FUNCTION__);
@@ -777,7 +778,8 @@ journal_t * journal_init_inode (struct i
/* journal descriptor can store up to n blocks -bzzz */
n = journal-j_blocksize / sizeof(journal_block_tag_t);
journal-j_wbufsize = n;
-   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal-j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal-j_wbuf) {
printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
__FUNCTION__);
@@ -1157,7 +1159,7 @@ void journal_destroy(journal_t *journal)
iput(journal-j_inode);
if (journal-j_revoke)
journal_destroy_revoke(journal);
-   kfree(journal-j_wbuf);
+   jbd_kfree(journal-j_wbuf);
jbd_kfree(journal);
 }
 
Index: linux-2.6.23-rc6/fs/jbd/revoke.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/revoke.c   2007-09-17 14:32:22.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/revoke.c2007-09-17 14:35:13.0 -0700
@@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
journal-j_revoke-hash_shift = shift;
 
journal-j_revoke-hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal-j_revoke-hash_table) {
kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
journal-j_revoke = NULL;
@@ -231,7 +231,7 @@ int journal_init_revoke(journal_t *journ
 
journal-j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, 
GFP_KERNEL);
if (!journal-j_revoke_table[1]) {
-   kfree(journal-j_revoke_table[0]-hash_table);
+   jbd_kfree(journal-j_revoke_table[0]-hash_table);
kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
return -ENOMEM;
}
@@ -246,9 +246,9 @@ int journal_init_revoke(journal_t *journ
journal-j_revoke-hash_shift = shift;
 
journal-j_revoke-hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal-j_revoke-hash_table) {
-   kfree(journal-j_revoke_table[0]-hash_table);
+   jbd_kfree(journal-j_revoke_table[0]-hash_table);
kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);

[PATCH] JBD slab cleanups

2007-09-14 Thread Mingming Cao
jbd/jbd2: Replace slab allocations with page cache allocations

From: Christoph Lameter [EMAIL PROTECTED]

JBD should not pass slab pages down to the block layer.
Use page allocator pages instead. This will also prepare
JBD for the large blocksize patchset.

Tested on 2.6.23-rc6 with fsx runs fine.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/jbd/checkpoint.c   |2 
 fs/jbd/commit.c   |6 +-
 fs/jbd/journal.c  |  107 -
 fs/jbd/transaction.c  |   10 ++--
 fs/jbd2/checkpoint.c  |2 
 fs/jbd2/commit.c  |6 +-
 fs/jbd2/journal.c |  109 --
 fs/jbd2/transaction.c |   18 
 include/linux/jbd.h   |   23 +-
 include/linux/jbd2.h  |   28 ++--
 10 files changed, 83 insertions(+), 228 deletions(-)

Index: linux-2.6.23-rc5/fs/jbd/journal.c
===
--- linux-2.6.23-rc5.orig/fs/jbd/journal.c  2007-09-13 13:37:57.0 
-0700
+++ linux-2.6.23-rc5/fs/jbd/journal.c   2007-09-13 13:45:39.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in-b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in-b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in-b_frozen_data) {
-   jbd_slab_free(tmp, bh_in-b_size);
+   jbd_free(tmp, bh_in-b_size);
goto repeat;
}
 
@@ -679,7 +678,7 @@ static journal_t * journal_init_common (
/* Set up a default-sized revoke table for the new mount. */
err = journal_init_revoke(journal, JOURNAL_REVOKE_DEFAULT_HASH);
if (err) {
-   kfree(journal);
+   jbd_kfree(journal);
goto fail;
}
return journal;
@@ -728,7 +727,7 @@ journal_t * journal_init_dev(struct bloc
if (!journal-j_wbuf) {
printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
__FUNCTION__);
-   kfree(journal);
+   jbd_kfree(journal);
journal = NULL;
goto out;
}
@@ -782,7 +781,7 @@ journal_t * journal_init_inode (struct i
if (!journal-j_wbuf) {
printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
__FUNCTION__);
-   kfree(journal);
+   jbd_kfree(journal);
return NULL;
}
 
@@ -791,7 +790,7 @@ journal_t * journal_init_inode (struct i
if (err) {
printk(KERN_ERR %s: Cannnot locate journal superblock\n,
   __FUNCTION__);
-   kfree(journal);
+   jbd_kfree(journal);
return NULL;
}
 
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb-s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1166,7 +1158,7 @@ void journal_destroy(journal_t *journal)
if (journal-j_revoke)
journal_destroy_revoke(journal);
kfree(journal-j_wbuf);
-   kfree(journal);
+   jbd_kfree(journal);
 }
 
 
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
- */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size  11)
-
-static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
-static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   jbd_1k, jbd_2k, jbd_4k, NULL, jbd_8k
-};
-
-static void journal_destroy_jbd_slabs(void)
-{
-   int i;
-
-   for (i = 0; i  JBD_MAX_SLABS; i++) {
-   if (jbd_slab[i])
-   kmem_cache_destroy(jbd_slab[i]);
-   jbd_slab[i] = NULL;
-   }

Re: [PATCH] JBD slab cleanups

2007-09-14 Thread Christoph Lameter
Thanks Mingming.


-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html