from:"Mingming Cao"

Re: [PATCH v3 0/8] Support for transparent PUD pages for DAX files

2016-01-21 Thread mingming cao

On 01/08/2016 11:49 AM, Matthew Wilcox wrote:
> From: Matthew Wilcox 
> 
> Andrew, I think this is ready for a spin in -mm.
> 
> v3: Rebased against current mmotm
> v2: Reduced churn in filesystems by switching to ->huge_fault interface
> Addressed concerns from Kirill
> 
> We have customer demand to use 1GB pages to map DAX files.  Unlike the 2MB
> page support, the Linux MM does not currently support PUD pages, so I have
> attempted to add support for the necessary pieces for DAX huge PUD pages.
> 
> Filesystems still need work to allocate 1GB pages.  With ext4, I can
> only get 16MB of contiguous space, although it is aligned.  With XFS,
> I can get 80MB less than 1GB, and it's not aligned.  The XFS problem
> may be due to the small amount of RAM in my test machine.
> 
I don't think ext4 can do 1G at this time due to extent length bits (15 for
unwritten) and the block group size boundary (well, with flex_bg we may be able
to relax this). I have seen about 125M of contiguous space allocated on my fresh
new ext4 filesystem. I do remember mballoc in ext4 used to normalize the
allocation request up to 8 or 16M, but it appears not to be that small any more.
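
To put rough numbers on the limits above, here is a small sketch of the arithmetic. It assumes a 4 KiB block size (not stated in the thread) and the usual ext4 layout in which one bitmap block covers 8 x block_size blocks per group. The MODEL_* names and helper functions are made up for illustration and are not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical constants mirroring the limit described above: the extent
 * length field leaves 15 usable bits for an unwritten extent. */
#define MODEL_EXT_INIT_MAX_LEN      (1U << 15)                   /* 32768 blocks */
#define MODEL_EXT_UNWRITTEN_MAX_LEN (MODEL_EXT_INIT_MAX_LEN - 1) /* 32767 blocks */

/* Largest byte range a single extent can cover for a given block size. */
static uint64_t model_max_extent_bytes(uint32_t max_len_blocks, uint32_t block_size)
{
	return (uint64_t)max_len_blocks * block_size;
}

/* Blocks per group when one bitmap block tracks the whole group:
 * one bit per block, 8 bits per byte of the bitmap block. */
static uint32_t model_blocks_per_group(uint32_t block_size)
{
	return 8 * block_size;
}

/* Number of extents needed to cover want_bytes (rounded up). */
static uint64_t model_extents_needed(uint64_t want_bytes, uint64_t extent_bytes)
{
	return (want_bytes + extent_bytes - 1) / extent_bytes;
}
```

Under these assumptions a written extent tops out at 128 MiB, an unwritten one at one block less, and a block group is also 128 MiB, so a 1 GiB mapping would span at least eight extents. That is in the same range as the roughly 125M of contiguous space reported above.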

Thanks,
Mingming

> This patch set is against something approximately current -mm.  I'd like
> to thank Dave Chinner & Kirill Shutemov for their reviews of v1.
> The conversion of pmd_fault & pud_fault to huge_fault is thanks to
> Dave's poking, and Kirill spotted a couple of problems in the MM code.
> Version 2 of the patch set is about 200 lines smaller (1016 insertions,
> 23 deletions in v1).
> 
> I've done some light testing using a program to mmap a block device
> with DAX enabled, calling mincore() and examining /proc/smaps and
> /proc/pagemap.
> 
> Matthew Wilcox (8):
>   mm: Convert an open-coded VM_BUG_ON_VMA
>   mm,fs,dax: Change ->pmd_fault to ->huge_fault
>   mm: Add support for PUD-sized transparent hugepages
>   mincore: Add support for PUDs
>   procfs: Add support for PUDs to smaps, clear_refs and pagemap
>   x86: Add support for PUD-sized transparent hugepages
>   dax: Support for transparent PUD pages
>   ext4: Support for PUD-sized transparent huge pages
> 
>  Documentation/filesystems/dax.txt     |  12 +-
>  arch/Kconfig                          |   3 +
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/include/asm/paravirt.h       |  11 ++
>  arch/x86/include/asm/paravirt_types.h |   2 +
>  arch/x86/include/asm/pgtable.h        |  94
>  arch/x86/include/asm/pgtable_64.h     |  13 ++
>  arch/x86/kernel/paravirt.c            |   1 +
>  arch/x86/mm/pgtable.c                 |  31
>  fs/block_dev.c                        |  10 +-
>  fs/dax.c                              | 272 +-
>  fs/ext2/file.c                        |  27 +---
>  fs/ext4/file.c                        |  60 +++-
>  fs/proc/task_mmu.c                    | 109 ++
>  fs/xfs/xfs_file.c                     |  25 ++--
>  fs/xfs/xfs_trace.h                    |   2 +-
>  include/asm-generic/pgtable.h         |  62 +++-
>  include/asm-generic/tlb.h             |  14 ++
>  include/linux/dax.h                   |  17 ---
>  include/linux/huge_mm.h               |  50 +++
>  include/linux/mm.h                    |  43 +-
>  include/linux/mmu_notifier.h          |  13 ++
>  include/linux/pfn_t.h                 |   8 +
>  mm/huge_memory.c                      | 151 +++
>  mm/memory.c                           | 101 +++--
>  mm/mincore.c                          |  13 ++
>  mm/pagewalk.c                         |  19 ++-
>  mm/pgtable-generic.c                  |  14 ++
>  28 files changed, 980 insertions(+), 198 deletions(-)
>

Re: [PATCH resend] ext2/3/4: convert byte order of constant instead of variable

2008-02-14 Thread Mingming Cao

On Thu, 2008-02-14 at 14:20 -0800, Andrew Morton wrote:
> On Sun, 10 Feb 2008 11:10:15 +0100
> Marcin Slusarz <[EMAIL PROTECTED]> wrote:
> 
> >  fs/ext2/super.c |    8 +++-
> >  fs/ext3/super.c |    2 +-
> >  fs/ext4/super.c |    2 +-
> 
> Please don't bundle the filesystem patches in this manner.  I split
> it into three patches.
> 
Andrew, Ted,

I added the ext4 patch in the ext4 patch queue.

Regards,
Mingming


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] ext4 can fail badly when device stops accepting BIO_RW_BARRIER requests.

2008-02-07 Thread Mingming Cao

On Wed, 2008-02-06 at 22:25 -0600, Dave Kleikamp wrote:
> Duplicating Neil Brown's jbd patch for jbd2.  I guess this can go
> through the ext4 queue rather than straight into -mm.
> 
Checked-in.

Thanks Shaggy and Neil.

Mingming

> Neil's text:
> 
> Some devices - notably dm and md - can change their behaviour in
> response to BIO_RW_BARRIER requests.  They might start out accepting
> such requests but on reconfiguration, they find out that they cannot
> any more.
> 
> ext3 (and other filesystems) deal with this by always testing if
> BIO_RW_BARRIER requests fail with EOPNOTSUPP, and retrying the write
> requests without the barrier (probably after waiting for any pending
> writes to complete).
> 
> However there is a bug in the handling for this for ext3.
> 
> When ext3 (jbd actually) decides to submit a BIO_RW_BARRIER request,
> it sets the buffer_ordered flag on the buffer head.
> If the request completes successfully, the flag STAYS SET.
> 
> Other code might then write the same buffer_head after the device has
> been reconfigured to not accept barriers.  This write will then fail,
> but the "other code" is not ready to handle EOPNOTSUPP errors and the
> error will be treated as fatal.
> 
> This can be seen without having to reconfigure a device at exactly the
> wrong time by putting:
> 
>     if (buffer_ordered(bh))
>             printk("OH DEAR, and ordered buffer\n");
> 
> 
> in the while loop in "commit phase 5" of journal_commit_transaction.
> 
> If it ever prints the "OH DEAR ..." message (as it does sometimes for
> me), then that request could (in different circumstances) have failed
> with EOPNOTSUPP, but that isn't tested for.
> 
> My proposed fix is to clear the buffer_ordered flag after it has been
> used, as in the following patch.
> 
> Thanks,
> NeilBrown
> 
> Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>
> 
> diff -Nurp linux-2.6.24-mm1/fs/jbd2/commit.c linux/fs/jbd2/commit.c
> --- linux-2.6.24-mm1/fs/jbd2/commit.c 2008-02-04 09:08:44.0 -0600
> +++ linux/fs/jbd2/commit.c 2008-02-06 22:11:14.0 -0600
> @@ -148,6 +148,8 @@ static int journal_submit_commit_record(
>   barrier_done = 1;
>   }
>   ret = submit_bh(WRITE, bh);
> + if (barrier_done)
> + clear_buffer_ordered(bh);
> 
>   /* is it possible for another commit to fail at roughly
>* the same time as this one?  If so, we don't want to
> @@ -166,7 +168,6 @@ static int journal_submit_commit_record(
>   spin_unlock(&journal->j_state_lock);
> 
>   /* And try again, without the barrier */
> - clear_buffer_ordered(bh);
>   set_buffer_uptodate(bh);
>   set_buffer_dirty(bh);
>   ret = submit_bh(WRITE, bh);
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
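
The stale-flag problem Neil describes can be modelled in user space. The sketch below is only an illustration under stated assumptions: model_bh, model_dev, model_submit and model_commit_write are made-up names standing in for the buffer head, the block device, submit_bh() and the commit-record write. The only behaviour taken from the text is that an ordered write to a device that no longer accepts barriers fails with EOPNOTSUPP.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a buffer head and a dm/md style device. */
struct model_bh  { bool ordered; };
struct model_dev { bool accepts_barriers; };

/* An ordered (barrier) write fails once the device stops accepting barriers. */
static int model_submit(const struct model_dev *dev, const struct model_bh *bh)
{
	if (bh->ordered && !dev->accepts_barriers)
		return -EOPNOTSUPP;
	return 0;
}

/* Commit-record write. With patched == true the ordered flag is cleared
 * right after the submission, as in the patch above; with patched == false
 * it is cleared only on the EOPNOTSUPP retry path, so it stays set after a
 * successful barrier write. */
static int model_commit_write(const struct model_dev *dev, struct model_bh *bh,
			      bool patched)
{
	int ret;

	bh->ordered = true;
	ret = model_submit(dev, bh);
	if (patched)
		bh->ordered = false;

	if (ret == -EOPNOTSUPP) {
		/* And try again, without the barrier */
		bh->ordered = false;
		ret = model_submit(dev, bh);
	}
	return ret;
}
```

In this model, a successful unpatched commit leaves bh.ordered set, so a later plain model_submit() on the same buffer returns -EOPNOTSUPP once the device is reconfigured. With the patched variant the flag is already clear and the later write succeeds.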


Re: [PATCH] jbd: fix assertion failure in journal_next_log_block

2008-02-06 Thread Mingming Cao

On Tue, 2008-02-05 at 13:59 -0500, Josef Bacik wrote:
> On Tuesday 05 February 2008 12:27:31 pm Jan Kara wrote:
> >   Hello,
> >
> >   Sorry for replying a bit late but I'm currently falling behind in
> > mailing-list reading...
> >
> > > The way jbd tries to determine if there is enough space left on the
> > > journal in order to start a new transaction is looking at the space left
> > > in the journal and the space needed for the committing transaction, which
> > > is 1/4 of the journal + the number of t_outstanding_credits for that
> > > transaction.  In this case its assumed that t_outstanding_credits
> > > accurately represents the number of credits
> >
> >   Yes.
> >
> > > waiting to be written to the journal, but this sometimes isn't the case. 
> > > The transaction has two counters for buffers, t_outstanding_credits which
> > > is used in conjunction with handles that are added to the transaction,
> > > and t_nr_buffers which is only incremented/decremented when buffers are
> > > added/removed from the transaction and are actually destined to be
> > > journaled.  Normally these two
> >
> >   t_nr_buffers actually represents number of buffers on BJ_Metadata list
> > and nobody uses it (except for the assertion in
> > __journal_temp_unlink_buffer()). t_outstanding_credits is supposed to be
> > *the* counter making sure we don't write more than we have credits for.
> >
> > > counters are the same, however there are cases where the committing
> > > transaction can have buffers moved to the next running transaction, for
> > > example any buffers on the committing transactions t_reserved list would
> > > be moved to the next (running) transaction, and if it had been dirtied in
> > > the process it would immediately make it onto the t_updates list, which
> > > would increment t_nr_buffers
> >
> >   You probably mean t_buffers list here...
> >
> > > but not t_outstanding_credits.  So you get into this situation where
> >
> >   But which moving and dirtying do you mean? The caller which dirties
> > the buffer must make sure that he has acquired enough credits for the
> > transaction where the buffer ends up... So if there were not enough
> > buffers in the running transaction where we refiled the buffer it is a
> > bug in the caller which dirties the buffer.
> >
> 
> You know, now that you say that I feel like an idiot. You are right: the only
> way for something to actually end up on that list was if somebody dirtied it,
> and if they did, it would have had to be accounted for at some point on the
> running transaction.
> 
> > > t_nr_buffers (the actual number of buffers that are on the transaction)
> > > is greater than the number of buffers accounted for via
> > > t_outstanding_credits. This presents a problem since as we loop through
> > > writing buffers to the journal, we decrement t_outstanding_credits, and
> > > if t_nr_buffers is more than t_outstanding_credits then we end up with a
> > > negative number for
> > > t_outstanding_credits, which means we start saying we need less than 1/4
> > > of the journal for our committing transaction and allow more transactions
> > > than we can handle to start, and then bam we fail because
> > > journal_next_log_block doesn't have enough free blocks in order to handle
> > > the request.  This has been tested and fixes the issue (which could not
> > > be reproduced by me but several other people could get it to reproduce
> > > using postmark), and although I couldn't reproduce the assertion, I could
> > > very easily reproduce the situation where t_outstanding_credits was <
> > > than t_nr_buffers.
> >
> >   I suppose you see the assertion J_ASSERT(journal->j_free > 1); to
> > fail, right? I don't see how your patch could help avoid that assertion.
> > You've just removed accounting of t_outstanding_credits which has no
> > impact on the real number of free blocks in the journal stored in
> > j_free. Anyway, if you can reproduce t_outstanding_credits <
> > t_nr_buffers, then there's something fishy. Are you able to reproduce it
> > also with a current kernel?
> >   Thanks for looking into the problem :)
> >
> 
> Well, my patch helped avoid the assertion because t_outstanding_credits was
> going negative, so we were letting transactions start when we shouldn't
> have, and eventually we would end up with too much of the journal in use and
> we'd assert.  Of course I can't reproduce t_outstanding_credits <
> t_nr_buffers upstream (again I feel like an idiot, I should have tested that
> first).  Thanks for looking at this, Jan.
> 
> Mingming, would you mind pulling this patch out of the patch queue, please,
> since it's wrong?  Thanks much,
> 

Sure, done!

Mingming
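
For readers following the accounting argument, here is a minimal user-space sketch of the failure mode Josef originally suspected. It assumes, as described in the thread, that the space needed is the journal's per-transaction maximum plus the committing transaction's outstanding credits, and that one credit is dropped per buffer written. The struct and function names are made up for illustration. As Jan points out above, correct callers should never get the counter out of sync, which is why the patch was dropped.

```c
#include <stdbool.h>

/* Hypothetical model of the fields discussed in the thread. */
struct model_journal {
	int max_transaction_buffers; /* described above as 1/4 of the journal */
	bool has_committing;         /* is there a committing transaction? */
	int committing_credits;      /* its t_outstanding_credits */
};

/* Space that must be free before a new transaction may start. */
static int model_space_needed(const struct model_journal *j)
{
	int nblocks = j->max_transaction_buffers;

	if (j->has_committing)
		nblocks += j->committing_credits;
	return nblocks;
}

/* The commit loop drops one credit per buffer it writes to the journal. */
static void model_log_buffers(struct model_journal *j, int nr_buffers)
{
	for (int i = 0; i < nr_buffers; i++)
		j->committing_credits--;
}
```

With max_transaction_buffers = 256 and 10 credits, logging 30 buffers leaves the counter at -20, and model_space_needed() then reports 236, less than the 256 blocks a single transaction may need.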



Re: [PATCH] jbd: fix assertion failure in journal_next_log_block

2008-01-31 Thread Mingming Cao

On Thu, 2008-01-31 at 16:52 -0500, Josef Bacik wrote:
> On Thu, Jan 31, 2008 at 12:35:43PM -0700, Andreas Dilger wrote:
> > On Jan 31, 2008  11:14 -0500, Josef Bacik wrote:
> > [snip excellent analysis]
> > > So you get into this situation where
> > > t_nr_buffers (the actual number of buffers that are on the transaction) is
> > > greater than the number of buffers accounted for via t_outstanding_credits.
> > > This presents a problem since as we loop through writing buffers to the
> > > journal, we decrement t_outstanding_credits, and if t_nr_buffers is more
> > > than t_outstanding_credits then we end up with a negative number for
> > > t_outstanding_credits
> > > 
> > > Signed-off-by: Josef Bacik <[EMAIL PROTECTED]>
> > 
> > Do you know what kernel this problem was introduced in, or is this a
> > long standing problem?  Presumably the same is needed for jbd2?
> > 
> > Once we have some decent amount of testing going on with ext4, I think
> > it makes sense to merge the jbd2 changes back into jbd and return to
> > a single code base, since there is nothing in the jbd2 code that ext3
> > can't also work with (i.e. all of the changes are properly isolated
> > with compatibility flags and such).
> > 
> > > @@ -1056,7 +1056,7 @@ static inline int jbd_space_needed(journal_t *journal)
> > >   int nblocks = journal->j_max_transaction_buffers;
> > >   if (journal->j_committing_transaction)
> > >   nblocks += journal->j_committing_transaction->
> > > - t_outstanding_credits;
> > > + t_nr_buffers;
> > 
> > (trivial) this can be moved back onto the previous line.
> > 
> > > @@ -1168,7 +1168,7 @@ static inline int jbd_space_needed(journal_t *journal)
> > >   int nblocks = journal->j_max_transaction_buffers;
> > >   if (journal->j_committing_transaction)
> > >   nblocks += journal->j_committing_transaction->
> > > - t_outstanding_credits;
> > > + t_nr_buffers;
> > 
> > Same...
> >
> 
> The original issue was reported on RHEL4, so that's 2.6.9, and looking through
> the old-bkcvs git tree I can't see where this was introduced, so it has probably
> existed since before that.  The same problem looks to exist in jbd2, though I
> haven't tested it myself; I just went ahead and included the fixes.  Here is the
> updated patch, thanks much for the comments.
> 

Added to ext4 patch queue. 

Thanks,
Mingming
> Signed-off-by: Josef Bacik <[EMAIL PROTECTED]>
> 
> 
> diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
> index 31853eb..e385a5c 100644
> --- a/fs/jbd/commit.c
> +++ b/fs/jbd/commit.c
> @@ -561,13 +561,6 @@ void journal_commit_transaction(journal_t *journal)
>   continue;
>   }
> 
> - /*
> -  * start_this_handle() uses t_outstanding_credits to determine
> -  * the free space in the log, but this counter is changed
> -  * by journal_next_log_block() also.
> -  */
> - commit_transaction->t_outstanding_credits--;
> -
>   /* Bump b_count to prevent truncate from stumbling over
> the shadowed buffer!  @@@ This can go if we ever get
> rid of the BJ_IO/BJ_Shadow pairing of buffers. */
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 4f302d2..c0f93f5 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -580,7 +580,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>   stats.u.run.rs_logging = jiffies;
>   stats.u.run.rs_flushing = jbd2_time_diff(stats.u.run.rs_flushing,
>stats.u.run.rs_logging);
> - stats.u.run.rs_blocks = commit_transaction->t_outstanding_credits;
> + stats.u.run.rs_blocks = commit_transaction->t_nr_buffers;
>   stats.u.run.rs_blocks_logged = 0;
> 
>   descriptor = NULL;
> @@ -655,13 +655,6 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>   continue;
>   }
> 
> - /*
> -  * start_this_handle() uses t_outstanding_credits to determine
> -  * the free space in the log, but this counter is changed
> -  * by jbd2_journal_next_log_block() also.
> -  */
> - commit_transaction->t_outstanding_credits--;
> -
>   /* Bump b_count to prevent truncate from stumbling over
> the shadowed buffer!  @@@ This can go if we ever get
> rid of the BJ_IO/BJ_Shadow pairing of buffers. */
> diff --git a/include/linux/jbd.h b/include/linux/jbd.h
> index d9ecd13..eaeb3db 100644
> --- a/include/linux/jbd.h
> +++ b/include/linux/jbd.h
> @@ -1055,8 +1055,7 @@ static inline int jbd_space_needed(journal_t *journal)
>  {
>   int nblocks = journal->j_max_transaction_buffers;
>   if (journal->j_committing_transaction)
> - nblocks += journal->j_committing_transaction->
> - t_outstanding_credits;
> + nblocks += journal->j_committing_transaction->t_nr_buffers;
>   return nblocks;
>  }
> 
> diff --git

Re: [patch 12/26] mount options: fix ext4

2008-01-25 Thread Mingming Cao

On Thu, 2008-01-24 at 20:33 +0100, Miklos Szeredi wrote:
> plain text document attachment (ext4_opts.patch)
> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> Add stripe= option to /proc/mounts for ext4 filesystems.
> 
> Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
> ---
> 
> Index: linux/fs/ext4/super.c
> ===
> --- linux.orig/fs/ext4/super.c 2008-01-23 12:57:07.0 +0100
> +++ linux/fs/ext4/super.c 2008-01-23 21:43:51.0 +0100
> @@ -742,7 +742,8 @@ static int ext4_show_options(struct seq_
>   seq_puts(seq, ",nomballoc");
>   if (!test_opt(sb, DELALLOC))
>   seq_puts(seq, ",nodelalloc");
> -
> + if (sbi->s_stripe)
> + seq_printf(seq, ",stripe=%lu", sbi->s_stripe);
> 
>   /*
>* journal mode get enabled in different ways
> 

Added to ext4 patch queue. Thanks!
http://repo.or.cz/w/ext4-patch-queue.git

Mingming
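
As a rough illustration of what the patch changes in the /proc/mounts output, the sketch below builds the same kind of option string in user space. It is only a model: struct model_opts and model_show_options are made-up names, and snprintf() stands in for the kernel's seq_puts()/seq_printf().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subset of the mount state that the hunk above looks at. */
struct model_opts {
	bool mballoc;         /* false -> ",nomballoc" is shown */
	bool delalloc;        /* false -> ",nodelalloc" is shown */
	unsigned long stripe; /* 0 -> no stripe= option is shown */
};

/* Writes the option suffix into buf and returns snprintf()'s result,
 * i.e. the length the full string needs (excluding the NUL). */
static int model_show_options(char *buf, size_t size, const struct model_opts *o)
{
	char stripe[32] = ""; /* ",stripe=" plus up to 20 digits plus NUL fits */

	if (o->stripe)
		snprintf(stripe, sizeof(stripe), ",stripe=%lu", o->stripe);

	return snprintf(buf, size, "%s%s%s",
			o->mballoc ? "" : ",nomballoc",
			o->delalloc ? "" : ",nodelalloc",
			stripe);
}
```

A non-zero stripe value is reported as ",stripe=N", so the option survives a round trip through /proc/mounts. A zero value prints nothing, matching the `if (sbi->s_stripe)` guard in the patch.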



Re: [PATCH 33/49] ext4: Add the journal checksum feature

2008-01-24 Thread Mingming Cao

al_t *journal,
> > struct recovery_info *info, enum passtype pass)
> >  {
> > @@ -328,6 +360,7 @@ static int do_one_pass(journal_t *journal,
> > unsigned intsequence;
> > int blocktype;
> > int tag_bytes = journal_tag_bytes(journal);
> > +   __u32   crc32_sum = ~0; /* Transactional Checksums */
> >  
> > /* Precompute the maximum metadata descriptors in a descriptor block */
> > int MAX_BLOCKS_PER_DESC;
> > @@ -419,9 +452,23 @@ static int do_one_pass(journal_t *journal,
> > switch(blocktype) {
> > case JBD2_DESCRIPTOR_BLOCK:
> > /* If it is a valid descriptor block, replay it
> > -* in pass REPLAY; otherwise, just skip over the
> > -* blocks it describes. */
> > +* in pass REPLAY; if journal_checksums enabled, then
> > +* calculate checksums in PASS_SCAN, otherwise,
> > +* just skip over the blocks it describes. */
> > if (pass != PASS_REPLAY) {
> > +   if (pass == PASS_SCAN &&
> > +   JBD2_HAS_COMPAT_FEATURE(journal,
> > +   JBD2_FEATURE_COMPAT_CHECKSUM) &&
> > +   !info->end_transaction) {
> > +   if (calc_chksums(journal, bh,
> > +   _log_block,
> > +   _sum)) {
> 
> put_bh()
> 
> > +   brelse(bh);
> > +   break;
> > +   }
> > +   brelse(bh);
> > +   continue;
> 
> put_bh()
> 
> > +   }
> > next_log_block += count_tags(journal, bh);
> > wrap(journal, next_log_block);
> > brelse(bh);
> > @@ -516,9 +563,96 @@ static int do_one_pass(journal_t *journal,
> > continue;
> >  
> > +   brelse(bh);
> 
> etc
> 

Thanks, updated patch below:
ext4: Add the journal checksum feature

From: Girish Shilamkar <[EMAIL PROTECTED]>

The journal checksum feature adds two new flags:
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.

The JBD2_FEATURE_COMPAT_CHECKSUM flag indicates that the commit block
contains the checksum for the blocks described by the descriptor blocks.
Because of the checksums, writing the commit record no longer needs to be
synchronous: the commit record can now be sent to disk without waiting for
the descriptor blocks to be written. This behavior is controlled by the
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT flag. Older kernels/e2fsck must not
attempt to recover a journal written with _ASYNC_COMMIT, hence the flag is
made incompat.
The commit header has been extended to hold the checksum along with the
type of the checksum.

During recovery, checksums are verified in the scan pass (PASS_SCAN) to
ensure the sanity and completeness (in the case of _ASYNC_COMMIT) of
every transaction.

Signed-off-by: Andreas Dilger <[EMAIL PROTECTED]>
Signed-off-by: Girish Shilamkar <[EMAIL PROTECTED]>
Signed-off-by: Dave Kleikamp <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 Documentation/filesystems/ext4.txt |   10 +
 fs/Kconfig                         |    1 
 fs/ext4/super.c                    |   25 
 fs/jbd2/commit.c                   |  198 +++--
 fs/jbd2/journal.c                  |   26 
 fs/jbd2/recovery.c                 |  151 ++--
 include/linux/ext4_fs.h            |    3 
 include/linux/jbd2.h               |   36 +-
 8 files changed, 388 insertions(+), 62 deletions(-)


Index: linux-2.6.24-rc8/Documentation/filesystems/ext4.txt
===
--- linux-2.6.24-rc8.orig/Documentation/filesystems/ext4.txt 2008-01-24 11:18:08.0 -0800
+++ linux-2.6.24-rc8/Documentation/filesystems/ext4.txt 2008-01-24 13:00:44.0 -0800
@@ -89,6 +89,16 @@ When mounting an ext4 filesystem, the fo
 extents                ext4 will use extents to address file data.  The
                        file system will no longer be mountable by ext3.
 
+journal_checksum   Enable checksumming of the journal transactions.
+   This will allow the recovery code in e2fsck and the
+   kernel to detect corruption in the kernel.  It is a
+   compatible change and will be ignored by older kernels.
+
+journal_async_commit   Commit block can be written to disk without waiting
+   for descriptor blocks. If enabled older kernels cannot
+   mount the device. This will enable 'journal_checksum'
+   internally.
+
 journal=update Update the ext4 file system's journal to the current
format.
 
Index: linux-2.6.24-rc8/fs/Kconfig
===
--- linux-2.6.24-rc8.orig/fs/Kconfig 2008-01-24 11:18:08.0 -0800
+++ linux-2.6.24-rc8/fs/Kconfig 2008-01-24 11:18:55.0 -0800
@@ -236,6 +236,7 @@ config JBD_DEBUG
 
 config JBD2
tristate
+   select CRC32
help
  This is a generic journaling layer for block devices that support
  both 32-bit and 64-bit block numbers.  It is currently used by
Index: linux-2.6.24-rc8/fs/ext4/super.c
===
--- linux-2.6.24-rc8.orig/fs/ext4/super.c   2008-01-24 11:18:52.0 -0800
+++ linux-2.6.24-rc8/fs/ext4/super.c2008-01-24 13:00:45.0 -0800
@@ -869,6 +869,7 @@ enum {
Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh
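
The commit message above describes a running checksum that is seeded with ~0 (see crc32_sum in the quoted hunk), accumulated over the blocks of a transaction, and compared with the value stored in the commit block during PASS_SCAN. A minimal userspace sketch of that idea follows. It is an illustration only: the names toy_crc32_update and toy_transaction_ok are hypothetical, and the bitwise CRC-32 used here is the common reflected variant, assumed as a stand-in rather than the CRC routine the patch actually calls.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320) without the final XOR,
 * so the return value can be fed back in to continue over more data.
 * Hypothetical helper, not the kernel's CRC implementation.
 */
static uint32_t toy_crc32_update(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++) {
			if (crc & 1u)
				crc = (crc >> 1) ^ 0xEDB88320u;
			else
				crc >>= 1;
		}
	}
	return crc;
}

/*
 * Accumulate one checksum over every block of a transaction and compare
 * it with the value that would have been stored in the commit block.
 * Returns 1 on a match, 0 on a mismatch.
 */
static int toy_transaction_ok(const unsigned char *const *blocks,
			      size_t nblocks, size_t block_size,
			      uint32_t stored)
{
	uint32_t sum = ~0u;	/* same seed style as crc32_sum = ~0 */
	size_t i;

	for (i = 0; i < nblocks; i++)
		sum = toy_crc32_update(sum, blocks[i], block_size);

	return sum == stored;
}
```

Because the checksum is chained block by block, a transaction whose descriptor blocks never reached the disk produces a different sum than the one recorded in the commit block, which is what lets the commit record be written without waiting.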

Re: [PATCH] Shrink ext3_inode_info by 8 bytes for !POSIX_ACL.

2008-01-18 Thread Mingming Cao

On Sat, 2008-01-12 at 21:35 +0100, Indan Zupancic wrote:
> i_file_acl and i_dir_acl aren't always needed.
> 
> With certain configs this makes 10 ext3_inode_cache objects fit in
> one slab instead of the current 9, as the size shrinks from 416 to
> 408 bytes for 32 bit, !POSIX_ACL and !EXT3_FS_XATTR configs.
> 
> Signed-off-by: Indan Zupancic <[EMAIL PROTECTED]>
> ---
>  fs/ext3/ialloc.c  |2 ++
>  fs/ext3/inode.c   |   29 +++--
>  include/linux/ext3_fs_i.h |2 ++
>  3 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
> index 1bc8cd8..01745bc 100644
> --- a/fs/ext3/ialloc.c
> +++ b/fs/ext3/ialloc.c
> @@ -574,8 +574,10 @@ got:
>   ei->i_frag_no = 0;
>   ei->i_frag_size = 0;
>  #endif
> +#ifdef CONFIG_EXT3_FS_POSIX_ACL
>   ei->i_file_acl = 0;
>   ei->i_dir_acl = 0;
> +#endif

For regular files, i_dir_acl is reused as i_size_high to support
large files.

Mingming
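
Two pieces of arithmetic are in play in this exchange: the slab packing claimed in the patch description (416 vs. 408 bytes per object) and the reuse of i_dir_acl as the high 32 bits of the file size. The sketch below works through both. It assumes a 4096-byte slab and ignores alignment and any per-slab metadata, and the function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* How many objects of object_size fit in slab_bytes, ignoring overhead. */
static unsigned int objects_per_slab(unsigned int slab_bytes,
				     unsigned int object_size)
{
	assert(object_size > 0);
	return slab_bytes / object_size;
}

/*
 * Combine the low 32 bits of a file size with the high 32 bits, which for
 * regular files live in the field otherwise used as i_dir_acl.
 */
static uint64_t full_file_size(uint32_t size_lo, uint32_t size_high)
{
	return ((uint64_t)size_high << 32) | size_lo;
}
```

With these assumptions, objects_per_slab(4096, 416) gives 9 and objects_per_slab(4096, 408) gives 10, matching the numbers in the patch description. full_file_size(0, 1) gives 4 GiB, which shows why the field cannot simply be compiled out for regular files when POSIX ACLs are disabled.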


[PATCH 3/4]JBD2: user of the jiffies rounding code

2008-01-17 Thread Mingming Cao

Ported from JBD changes from Arjan van de Ven <[EMAIL PROTECTED]>
---
Date: Sun, 10 Dec 2006 10:21:26 + (-0800)
Subject: [PATCH] user of the jiffies rounding code: JBD
X-Git-Tag: v2.6.20-rc1~15^2~43
X-Git-Url: 
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=44d306e1508fef6fa7a6eb15a1aba86ef68389a6

[PATCH] user of the jiffies rounding code: JBD

This patch introduces a user of the round_jiffies() function: the "5 second"
ext3/jbd wakeup.

While "every 5 seconds" doesn't sound like a problem, there can be many of these
(and these timers do add up across the whole kernel).  The "5 second" wakeup isn't
really timing sensitive; in addition, even with rounding it will still happen
every 5 seconds (with the exception of the very first time, which is likely to
be rounded up to somewhere closer to 6 seconds).

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

---
 fs/jbd2/transaction.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.24-rc7/fs/jbd2/transaction.c
===
--- linux-2.6.24-rc7.orig/fs/jbd2/transaction.c 2008-01-16 15:41:14.0 -0800
+++ linux-2.6.24-rc7/fs/jbd2/transaction.c  2008-01-16 17:49:48.0 -0800
@@ -54,7 +54,7 @@ jbd2_get_transaction(journal_t *journal,
spin_lock_init(&transaction->t_handle_lock);
 
/* Set up the commit timer for the new transaction. */
-   journal->j_commit_timer.expires = transaction->t_expires;
+   journal->j_commit_timer.expires = round_jiffies(transaction->t_expires);
add_timer(&journal->j_commit_timer);
 
J_ASSERT(journal->j_running_transaction == NULL);
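
The effect described in the commit message, where the first wakeup lands closer to 6 seconds and later ones stay on whole-second boundaries, can be illustrated with a small userspace model. This is a simplified sketch loosely modelled on whole-second rounding, not the kernel's round_jiffies() itself: the function name toy_round_jiffies, the explicit hz and now parameters, and the quarter-second threshold are assumptions made for the example, and any per-CPU skew is left out.

```c
#include <assert.h>

/*
 * Round an expiry time (in ticks) to a whole-second boundary.
 * Expiries less than a quarter second past a boundary are rounded down,
 * everything else is rounded up.  If rounding would place the expiry at
 * or before 'now', the original value is returned unchanged.
 */
static unsigned long toy_round_jiffies(unsigned long expires,
				       unsigned long now,
				       unsigned long hz)
{
	unsigned long rem, rounded;

	assert(hz > 0);
	rem = expires % hz;

	if (rem < hz / 4)
		rounded = expires - rem;
	else
		rounded = expires - rem + hz;

	return rounded > now ? rounded : expires;
}
```

For example, with hz = 250 and now = 1100, a 5 second timer expires at tick 2350 and is rounded to 2500, i.e. 1400 ticks (5.6 seconds) after it was armed. Once timers sit on second boundaries, many of them fire in the same tick, which is the point of the change.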

 




[PATCH 2/4]JBD2: Group short-lived and reclaimable kernel allocations

2008-01-17 Thread Mingming Cao

JBD2: Group short-lived and reclaimable kernel allocations
From: Mingming Cao <[EMAIL PROTECTED]>
Ported from JBD to JBD2


From: Mel Gorman <[EMAIL PROTECTED]>
Date: Tue, 16 Oct 2007 08:25:52 + (-0700)
Subject: Group short-lived and reclaimable kernel allocations
X-Git-Tag: v2.6.24-rc1~1137
X-Git-Url: 
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=e12ba74d8ff3e2f73a583500d7095e406df4d093

Group short-lived and reclaimable kernel allocations

This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations.  When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
them and re-reading the information from elsewhere.

Cc: Mel Gorman <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

---
 fs/jbd2/journal.c |4 ++--
 fs/jbd2/revoke.c  |6 --
 2 files changed, 6 insertions(+), 4 deletions(-)

Index: linux-2.6.24-rc7/fs/jbd2/journal.c
===
--- linux-2.6.24-rc7.orig/fs/jbd2/journal.c 2008-01-16 15:02:40.0 -0800
+++ linux-2.6.24-rc7/fs/jbd2/journal.c  2008-01-16 17:40:24.0 -0800
@@ -1975,7 +1975,7 @@ static int journal_init_jbd2_journal_hea
jbd2_journal_head_cache = kmem_cache_create("jbd2_journal_head",
sizeof(struct journal_head),
0,  /* offset */
-   0,  /* flags */
+   SLAB_TEMPORARY, /* flags */
NULL);  /* ctor */
retval = 0;
if (jbd2_journal_head_cache == 0) {
@@ -2271,7 +2271,7 @@ static int __init journal_init_handle_ca
jbd2_handle_cache = kmem_cache_create("jbd2_journal_handle",
sizeof(handle_t),
0,  /* offset */
-   0,  /* flags */
+   SLAB_TEMPORARY, /* flags */
NULL);  /* ctor */
if (jbd2_handle_cache == NULL) {
printk(KERN_EMERG "JBD: failed to create handle cache\n");
Index: linux-2.6.24-rc7/fs/jbd2/revoke.c
===
--- linux-2.6.24-rc7.orig/fs/jbd2/revoke.c  2008-01-06 13:45:38.0 -0800
+++ linux-2.6.24-rc7/fs/jbd2/revoke.c   2008-01-16 17:40:24.0 -0800
@@ -171,13 +171,15 @@ int __init jbd2_journal_init_revoke_cach
 {
jbd2_revoke_record_cache = kmem_cache_create("jbd2_revoke_record",
   sizeof(struct jbd2_revoke_record_s),
-  0, SLAB_HWCACHE_ALIGN, NULL);
+  0,
+  SLAB_HWCACHE_ALIGN|SLAB_TEMPORARY,
+  NULL);
if (jbd2_revoke_record_cache == 0)
return -ENOMEM;
 
jbd2_revoke_table_cache = kmem_cache_create("jbd2_revoke_table",
   sizeof(struct jbd2_revoke_table_s),
-  0, 0, NULL);
+  0, SLAB_TEMPORARY, NULL);
if (jbd2_revoke_table_cache == 0) {
kmem_cache_destroy(jbd2_revoke_record_cache);
jbd2_revoke_record_cache = NULL;
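
The commit message explains that allocations are grouped by how they can be dealt with later: reclaimable ones (which can be dropped and re-read) are kept apart from unmovable ones. A toy sketch of such a classification follows. All names here (toy_migrate_type, TOY_FLAG_*, toy_classify) are hypothetical and exist only for illustration; the real mapping from allocation flags to migrate types lives in the page allocator and is not reproduced here.

```c
/* Hypothetical grouping classes, mirroring the idea in the description. */
enum toy_migrate_type {
	TOY_UNMOVABLE = 0,	/* long-lived, cannot be moved or reclaimed */
	TOY_RECLAIMABLE = 1,	/* can be freed on demand and re-read later */
	TOY_MOVABLE = 2		/* contents can be migrated elsewhere */
};

/* Hypothetical allocation hint bits. */
#define TOY_FLAG_RECLAIMABLE	0x1u
#define TOY_FLAG_MOVABLE	0x2u

/* Pick the group an allocation should be placed in from its hint bits. */
static enum toy_migrate_type toy_classify(unsigned int flags)
{
	if (flags & TOY_FLAG_MOVABLE)
		return TOY_MOVABLE;
	if (flags & TOY_FLAG_RECLAIMABLE)
		return TOY_RECLAIMABLE;
	return TOY_UNMOVABLE;
}
```

In this model, a cache created with a hint such as SLAB_TEMPORARY would carry the reclaimable bit, so its objects end up next to other short-lived allocations instead of being scattered among unmovable ones.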

 
 



[PATCH 4/4]JBD2: sparse pointer use of zero as null

2008-01-17 Thread Mingming Cao

Ported from upstream jbd changes to jbd2

sparse pointer use of zero as null

Get rid of sparse related warnings from places that use integer as NULL
pointer.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd2/transaction.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6.24-rc7/fs/jbd2/transaction.c
===
--- linux-2.6.24-rc7.orig/fs/jbd2/transaction.c 2008-01-16 17:49:48.0 -0800
+++ linux-2.6.24-rc7/fs/jbd2/transaction.c  2008-01-16 18:06:33.0 -0800
@@ -1182,7 +1182,7 @@ int jbd2_journal_dirty_metadata(handle_t
}
 
/* That test should have eliminated the following case: */
-   J_ASSERT_JH(jh, jh->b_frozen_data == 0);
+   J_ASSERT_JH(jh, jh->b_frozen_data == NULL);
 
JBUFFER_TRACE(jh, "file as BJ_Metadata");
spin_lock(&journal->j_list_lock);
@@ -1532,7 +1532,7 @@ void __jbd2_journal_temp_unlink_buffer(s
 
J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
if (jh->b_jlist != BJ_None)
-   J_ASSERT_JH(jh, transaction != 0);
+   J_ASSERT_JH(jh, transaction != NULL);
 
switch (jh->b_jlist) {
case BJ_None:
@@ -1601,11 +1601,11 @@ __journal_try_to_free_buffer(journal_t *
if (buffer_locked(bh) || buffer_dirty(bh))
goto out;
 
-   if (jh->b_next_transaction != 0)
+   if (jh->b_next_transaction != NULL)
goto out;
 
spin_lock(&journal->j_list_lock);
-   if (jh->b_transaction != 0 && jh->b_cp_transaction == 0) {
+   if (jh->b_transaction != NULL && jh->b_cp_transaction == NULL) {
if (jh->b_jlist == BJ_SyncData || jh->b_jlist == BJ_Locked) {
/* A written-back ordered data buffer */
JBUFFER_TRACE(jh, "release data");
@@ -1613,7 +1613,7 @@ __journal_try_to_free_buffer(journal_t *
jbd2_journal_remove_journal_head(bh);
__brelse(bh);
}
-   } else if (jh->b_cp_transaction != 0 && jh->b_transaction == 0) {
+   } else if (jh->b_cp_transaction != NULL && jh->b_transaction == NULL) {
/* written-back checkpointed metadata buffer */
if (jh->b_jlist == BJ_None) {
JBUFFER_TRACE(jh, "remove from checkpoint list");
@@ -1973,7 +1973,7 @@ void __jbd2_journal_file_buffer(struct j
 
J_ASSERT_JH(jh, jh->b_jlist < BJ_Types);
J_ASSERT_JH(jh, jh->b_transaction == transaction ||
-   jh->b_transaction == 0);
+   jh->b_transaction == NULL);
 
if (jh->b_transaction && jh->b_jlist == jlist)
return;
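
The change is purely about how pointer comparisons are spelled. A minimal sketch of the pattern, using a hypothetical stand-in structure rather than the real struct journal_head:

```c
#include <stddef.h>

/* Hypothetical stand-in carrying one pointer field, for illustration. */
struct toy_journal_head {
	void *b_frozen_data;
};

/*
 * Comparing against NULL states that the operand is a pointer.  Writing
 * "!= 0" compiles to the same code, but sparse flags it with a
 * "Using plain integer as NULL pointer" warning.
 */
static int toy_has_frozen_data(const struct toy_journal_head *jh)
{
	return jh->b_frozen_data != NULL;
}
```

Since the generated code is identical, the patch carries no functional risk; it only silences the warnings and makes the pointer checks easier to read.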

 




[PATCH 1/4]jbd2: port jbd lockdep support to jbd2

2008-01-17 Thread Mingming Cao

Hi Andrew, Ted,

I walked through the history of Linus's git tree and found 4 patches that
should be ported from ext3/jbd to ext4/jbd2, covering the period from the
day ext4 was forked (2006.10.11) to today. I have already queued the ported
patches in the ext4 patch queue and verified that they seem fine. Here is
the first one.



jbd2: port jbd lockdep support to jbd2
> Except lockdep doesn't know about journal_start(), which has ranking
> requirements similar to a semaphore.  

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd2/transaction.c |   11 +++
 include/linux/jbd2.h  |4 
 2 files changed, 15 insertions(+)

Index: linux-2.6.24-rc7/fs/jbd2/transaction.c
===
--- linux-2.6.24-rc7.orig/fs/jbd2/transaction.c 2008-01-16 15:30:24.0 -0800
+++ linux-2.6.24-rc7/fs/jbd2/transaction.c  2008-01-16 15:41:14.0 -0800
@@ -241,6 +241,8 @@ out:
return ret;
 }
 
+static struct lock_class_key jbd2_handle_key;
+
 /* Allocate a new handle.  This should probably be in a slab... */
 static handle_t *new_handle(int nblocks)
 {
@@ -251,6 +253,9 @@ static handle_t *new_handle(int nblocks)
handle->h_buffer_credits = nblocks;
handle->h_ref = 1;
 
+   lockdep_init_map(&handle->h_lockdep_map, "jbd2_handle",
+   &jbd2_handle_key, 0);
+
return handle;
 }
 
@@ -293,7 +298,11 @@ handle_t *jbd2_journal_start(journal_t *
jbd2_free_handle(handle);
current->journal_info = NULL;
handle = ERR_PTR(err);
+   goto out;
}
+
+   lock_acquire(&handle->h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+out:
return handle;
 }
 
@@ -1419,6 +1428,8 @@ int jbd2_journal_stop(handle_t *handle)
spin_unlock(&journal->j_state_lock);
}
 
+   lock_release(&handle->h_lockdep_map, 1, _THIS_IP_);
+
jbd2_free_handle(handle);
return err;
 }
Index: linux-2.6.24-rc7/include/linux/jbd2.h
===
--- linux-2.6.24-rc7.orig/include/linux/jbd2.h  2008-01-16 15:29:03.0 -0800
+++ linux-2.6.24-rc7/include/linux/jbd2.h   2008-01-16 15:29:54.0 -0800
@@ -418,6 +418,10 @@ struct handle_s
unsigned int    h_sync: 1;      /* sync-on-close */
unsigned int    h_jdata:1;      /* force data journaling */
unsigned int    h_aborted:  1;  /* fatal error on handle */
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+   struct lockdep_map  h_lockdep_map;
+#endif
 };
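
The quoted remark says journal_start() has ranking requirements similar to a semaphore: it must be taken in a consistent order relative to other locks. The toy checker below illustrates what a ranking violation looks like. It is a sketch under stated assumptions, not lockdep's algorithm: lockdep learns ordering dynamically from observed acquisitions, whereas this model uses fixed integer ranks, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* Hypothetical per-task record of the ranks currently held, in order. */
struct toy_held_ranks {
	int rank[TOY_MAX_HELD];
	size_t depth;
};

/*
 * Record an acquisition.  Returns 0 on success, or -1 if the table is
 * full or the new rank is not strictly higher than the most recently
 * acquired one (an ordering violation in this model).
 */
static int toy_rank_acquire(struct toy_held_ranks *h, int rank)
{
	assert(h != NULL);
	if (h->depth == TOY_MAX_HELD)
		return -1;
	if (h->depth > 0 && rank <= h->rank[h->depth - 1])
		return -1;
	h->rank[h->depth++] = rank;
	return 0;
}

/* Drop the most recently acquired rank. */
static void toy_rank_release(struct toy_held_ranks *h)
{
	assert(h != NULL && h->depth > 0);
	h->depth--;
}
```

In the patch, the pair lock_acquire() in jbd2_journal_start() and lock_release() in jbd2_journal_stop() plays the role of toy_rank_acquire() and toy_rank_release(): it tells the checker when a handle is held so that inversions against other locks can be reported.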
 




Re: [EXT4 set 6][PATCH 1/1]Export jbd stats through procfs

2007-11-30 Thread Mingming Cao

On Fri, 2007-11-30 at 17:08 -0600, Eric Sandeen wrote:
> Mingming Cao wrote:
> > [PATCH] jbd2 stats through procfs
> > 
> > The patch below updates the jbd stats patch to 2.6.20/jbd2.
> > The initial patch was posted by Alex Tomas in December 2005
> > (http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
> > It provides statistics via procfs such as transaction lifetime and size.
> > 
> > [ This probably should be rewritten to use debugfs?   -- Ted]
> > 
> > Signed-off-by: Johann Lombardi <[EMAIL PROTECTED]>
> 
> I've started going through this one to clean it up to the point where it
> can go forward.  It's been sitting at the top of the unstable portion of
> the patch queue for long enough, I think :)
> 
Thanks!

> I've converted the msecs to jiffies until the user boundary, changed the
> union #defines as suggested by Andrew, and various other little issues etc.
> 
> Remaining to do is a generic time-difference calculator (instead of
> jbd2_time_diff), and looking into whether it should be made a config
> option; I tend to think it should, but it's fairly well sprinkled
> through the code, so I'll see how well that works.
> 
> Also, did we ever decide whether this should go to debugfs?
> 

I thought it was decided to keep it on procfs as debugfs is not always
on...
> Thanks,
> 
> -Eric
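
Eric's two points, keeping times in jiffies until the user boundary and replacing jbd2_time_diff with a generic time-difference helper, can be sketched in userspace as follows. The function names and the explicit hz parameter are assumptions made for this illustration; they are not the helpers the cleaned-up patch ended up using.

```c
#include <assert.h>

/*
 * Elapsed ticks between two samples of a free-running unsigned counter.
 * Unsigned subtraction wraps modulo 2^N, so the result stays correct
 * even if the counter overflowed once between start and end.
 */
static unsigned long toy_ticks_elapsed(unsigned long start, unsigned long end)
{
	return end - start;
}

/*
 * Convert ticks to milliseconds only at the point where the value is
 * reported.  Splitting into whole seconds and a remainder avoids
 * overflowing the intermediate product for large tick counts.
 */
static unsigned long toy_ticks_to_msecs(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);
	return (ticks / hz) * 1000UL + (ticks % hz) * 1000UL / hz;
}
```

Keeping the statistics in ticks internally means the hot paths only do a subtraction, and the division is paid once when the numbers are printed.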


Re: [PATCH] ext4: dir inode reservation V3

2007-11-20 Thread Mingming Cao

On Tue, 2007-11-20 at 12:14 +0800, Coly Li wrote:
> Thanks for the feedback :-)
> 
> Mingming Cao wrote:
> > On Tue, 2007-11-13 at 22:12 +0800, Coly Li wrote:
> >> Basic idea of my dir inode reservation patch can be found here,
> >> http://lists.openwall.net/linux-ext4/2007/11/05/3
> >>
> >> 1, What does dir inode reservation do
> >> Dir inode reservation tries to reserve several inodes in the inode table
> >> for a directory when the directory is created. When a new file is created
> >> under this directory, its inode is allocated from the reserved inode
> >> area. This is called the dir_ireserve inode allocator.
> >>
> > Thanks for the update.
> > 
> > Let me try to understand your method:
> > 
> > So the basic idea is not to do linear inode allocation for directories?
> > The inode structure block for a directory only comes from block 0, N,
> > N+N, ... where the number of skipped blocks N is stored in the in-core
> > superblock structure. 
> 
> N is not stored in the in-core superblock. N = s_dir_ireserve_nr /
> inodes_per_block. What is stored in the in-core superblock is the number
> of inodes to be reserved for each directory.
> 
> > 
> > Whenever we need to allocate an inode for a directory, skip N reserved
> > bits (space for N*16 inodes) if the previous block is already allocated.
> > That places two directories with a hole of N*16 inode structures between
> > them, and allows files under the first directory to stay close to their
> > parent directory. Is this correct?
> 
> The hole is (s_dir_ireserve_nr - 1), not N * s_dir_ireserve_nr. Because the
> directory inode also uses an inode slot from the reserved area, the number
> of slots left for files is (s_dir_ireserve_nr - 1).
> Except for the number of reserved inodes, your understanding exactly
> matches my idea.
> 
Ok, thanks for clarification.
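
To make the arithmetic from the exchange above concrete, here is a minimal sketch. It assumes 16 inodes per inode-table block, as in the N*16 estimate quoted above; the helper names are hypothetical, and only s_dir_ireserve_nr and inodes_per_block come from the thread.

```c
/* Inode-table blocks covered by one directory's reservation
 * (the "N" in the discussion above). */
static unsigned int ireserve_blocks(unsigned int s_dir_ireserve_nr,
				    unsigned int inodes_per_block)
{
	return s_dir_ireserve_nr / inodes_per_block;
}

/* Slots left for files: the directory inode itself takes one slot
 * out of the reserved area. */
static unsigned int ireserve_file_slots(unsigned int s_dir_ireserve_nr)
{
	return s_dir_ireserve_nr - 1;
}
```

With s_dir_ireserve_nr = 16 this gives 15 file slots, which matches the 'low' setting described in the quoted patch description.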

The performance gain with a large number of directories looks interesting,
but I am afraid this makes the uninitialized block group feature (meant to
speed up fsck) less useful than before :( The reserved inodes will push
the unused inode watermark higher than before, and spread the
directories over to other block groups earlier than before. Maybe it is
worth it... not sure.

> >  
> > 
> >> 4, Dir inode reservation is optional
> >> Dir inode reservation is optional. You can use -o followed by one of
> >> these options to enable dir inode reservation when mounting an ext4
> >> file system:
> >>  dir_ireserve=low
> >>  dir_ireserve=normal
> >>  dir_ireserve=high
> > 
> > Would be nice to pass the tuning info low/normal/high (16/64/128 blocks
> > correspondingly) via something else rather than mount options. 
> 
> Sure, I agree with you. I am also wondering whether this patch should let
> the user enter the number of reserved inodes directly rather than
> low/normal/high. I am also looking for ways to display the tuning info
> more conveniently to users.
> 
Export/tune it through procfs?

> >  
> >> Currently, 'low' reserves 15 file inodes for each directory, 'normal'
> >> reserves 31 inodes and 'high' reserves 127 inodes. Reserving more than
> >> 127 inodes does not obviously help performance.
> >>
> >>
> >> 5, Performance number
> >> On a Core-Duo, 2MB DDM memory, 7200 RPM SATA PC, I built a 50GB ext4
> >> partition, and tried to create 5 directories, and create 15 (1KB) files
> >> in each directory alternately. After a remount, I tried to remove all
> >> the directories and files recursively with 'rm -rf'. Below is the
> >> benchmark result:
> >>                   normal ext4          ext4 with dir inode reservation
> >>   mount options:  -o data=writeback    -o data=writeback,dir_ireserve=low
> >>   Create dirs:    real 0m49.101s       real 2m59.703s
> >>   Create files:   real 24m17.962s      real 21m8.161s
> >>   Unlink all:     real 24m43.788s      real 17m29.862s
> >> Creating dirs with dir inode reservation is slower than normal ext4, as
> >> predicted, because allocating directory inodes in non-linear order
> >> causes extra hard disk seeking and block I/O.
> > 
> > Hmm... I suspect there is a bug in your patch; the extra seeks should not
> > make it 4 times slower.
> 
> I agree with you :-)
> 
> > 
> >>  #include <linux/time.h>
> >> @@ -478,6 +480,75 @@ static int find_group_other(struct super_block *sb, struct inode *parent,

Re: [PATCH] ext4: dir inode reservation V3

2007-11-19 Thread Mingming Cao

On Tue, 2007-11-13 at 22:12 +0800, Coly Li wrote:
> Basic idea of my dir inode reservation patch can be found here,
> http://lists.openwall.net/linux-ext4/2007/11/05/3
> 
> 1, What does dir inode reservation do
> Dir inode reservation tries to reserve several inodes in the inode table
> for a directory when the directory is created. When a new file is created
> under this directory, its inode is allocated from the reserved inode
> area. This is called the dir_ireserve inode allocator.
> 
Thanks for the update.

Let me try to understand your method:

So the basic idea is not to do linear inode allocation for directories?
The inode structure block for a directory only comes from block 0, N,
N+N, ... where the number of skipped blocks N is stored in the in-core
superblock structure. 

Whenever we need to allocate an inode for a directory, skip N reserved
bits (space for N*16 inodes) if the previous block is already allocated.
That places two directories with a hole of N*16 inode structures between
them, and allows files under the first directory to stay close to their
parent directory. Is this correct?
 

> 4, Dir inode reservation is optional
> Dir inode reservation is optional. You can use -o followed by one of
> these options to enable dir inode reservation when mounting an ext4
> file system:
>  dir_ireserve=low
>  dir_ireserve=normal
>  dir_ireserve=high

Would be nice to pass the tuning info low/normal/high(16/64/128 blocks
correspondingly) via something else rather than mount options. 
 
> Currently, 'low' reserves 15 file inodes for each directory, 'normal'
> reserves 31 inodes and 'high' reserves 127 inodes. Reserving more than
> 127 inodes does not obviously help performance.
> 
> 
> 5, Performance number
> On a Core-Duo, 2MB DDM memory, 7200 RPM SATA PC, I built a 50GB ext4
> partition, and tried to create 5 directories, and create 15 (1KB) files
> in each directory alternately. After a remount, I tried to remove all
> the directories and files recursively with 'rm -rf'. Below is the
> benchmark result:
>                   normal ext4          ext4 with dir inode reservation
>   mount options:  -o data=writeback    -o data=writeback,dir_ireserve=low
>   Create dirs:    real 0m49.101s       real 2m59.703s
>   Create files:   real 24m17.962s      real 21m8.161s
>   Unlink all:     real 24m43.788s      real 17m29.862s
> Creating dirs with dir inode reservation is slower than normal ext4, as
> predicted, because allocating directory inodes in non-linear order
> causes extra hard disk seeking and block I/O.

Hmm... I suspect there is a bug in your patch; the extra seeks should not
make it 4 times slower.
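
As a quick check of the "4 times slower" figure, the slowdown can be computed from the 'Create dirs' row of the quoted benchmark. This is only a sketch; to_seconds() and slowdown() are hypothetical helpers, not part of any patch.

```c
/* Convert a time(1)-style "XmY.ZZZs" reading, already split into its
 * minute and second parts, into seconds. */
static double to_seconds(unsigned int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Slowdown factor of the patched run relative to the baseline run. */
static double slowdown(double baseline_s, double patched_s)
{
	return patched_s / baseline_s;
}
```

For 0m49.101s against 2m59.703s this gives a factor of roughly 3.66.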

> 
>  #include <linux/time.h>
> @@ -478,6 +480,75 @@ static int find_group_other(struct super_block *sb, struct inode *parent,
>  	return -1;
>  }
>  
> +static int ext4_ino_from_ireserve(handle_t *handle, struct inode *dir,
> +			int mode, ext4_group_t *group, unsigned long *ino)
> +{
> +	struct super_block *sb;
> +	struct ext4_sb_info *sbi;
> +	struct ext4_group_desc *gdp = NULL;
> +	struct buffer_head *gdp_bh = NULL, *bitmap_bh = NULL;
> +	ext4_group_t ires_group = *group;
> +	unsigned long ires_ino;
> +	int i, bit;
> +
> +	sb = dir->i_sb;
> +	sbi = EXT4_SB(sb);
> +
> +	/* if the inode number is not for directory,
> +	 * only try to allocate after directory's inode
> +	 */
> +	if (!S_ISDIR(mode)) {
> +		*ino = dir->i_ino % EXT4_INODES_PER_GROUP(sb);
> +		return 0;
> +	}
> +
> +	/* reserve inodes for new directory */
> +	for (i = 0; i < sbi->s_groups_count; i++) {
> +		gdp = ext4_get_group_desc(sb, ires_group, &gdp_bh);
> +		if (!gdp)
> +			goto fail;
> +		bit = 0;
> +try_same_group:
> +		if (bit < EXT4_INODES_PER_GROUP(sb)) {
> +			brelse(bitmap_bh);
> +			bitmap_bh = read_inode_bitmap(sb, ires_group);
> +			if (!bitmap_bh)
> +				goto fail;
> +
> +			BUFFER_TRACE(bitmap_bh, "get_write_access");
> +			if (ext4_journal_get_write_access(
> +				handle, bitmap_bh) != 0)
> +				goto fail;
> +			if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, ires_group),
> +					bit, bitmap_bh->b_data)) {
> +				/* we won it */
> +				BUFFER_TRACE(bitmap_bh,
> +					"call ext4_journal_dirty_metadata");
> +				if (ext4_journal_dirty_metadata(handle,
> +					bitmap_bh) != 0)
> +					goto fail;
> +				ires_ino = bit;
> +				goto find;
> +			}
> +			/* we lost it */
> +

Re: [2.6 patch] ext4/super.c: fix #ifdef's

2007-11-05 Thread Mingming Cao

Acked-by: Mingming Cao <[EMAIL PROTECTED]>

Ted, I added this patch to the ext4 patch queue.

On Mon, 2007-11-05 at 18:07 +0100, Adrian Bunk wrote:
> This patch fixes the names of two variables in #ifdef's.
> 
> Based on a report by Robert P. J. Day.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> ---
> 
>  fs/ext4/super.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> 44e9889e6a3952ea225704b2e494d31e00f34a6b 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 8031dc0..6673672 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -646,7 +646,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
>  		seq_puts(seq, ",debug");
>  	if (test_opt(sb, OLDALLOC))
>  		seq_puts(seq, ",oldalloc");
> -#ifdef CONFIG_EXT4_FS_XATTR
> +#ifdef CONFIG_EXT4DEV_FS_XATTR
>  	if (test_opt(sb, XATTR_USER))
>  		seq_puts(seq, ",user_xattr");
>  	if (!test_opt(sb, XATTR_USER) &&
> @@ -654,7 +654,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
>  		seq_puts(seq, ",nouser_xattr");
>  	}
>  #endif
> -#ifdef CONFIG_EXT4_FS_POSIX_ACL
> +#ifdef CONFIG_EXT4DEV_FS_POSIX_ACL
>  	if (test_opt(sb, POSIX_ACL))
>  		seq_puts(seq, ",acl");
>  	if (!test_opt(sb, POSIX_ACL) && (def_mount_opts & EXT4_DEFM_ACL))
> 
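
The reason a misnamed symbol in an #ifdef goes unnoticed is that the preprocessor silently drops the guarded block; nothing fails to compile. A small userspace sketch of the effect follows. It assumes, for illustration only, that CONFIG_EXT4DEV_FS_XATTR is the symbol the build actually defines; show_options() here is a hypothetical stand-in, not the kernel function.

```c
#include <string.h>

/* Stand-in for the Kconfig-generated symbol; assumed defined for this sketch. */
#define CONFIG_EXT4DEV_FS_XATTR 1

/* Builds an option string; each block only exists in the binary if its
 * guard names a symbol that is really defined. */
static void show_options(char *buf, size_t len)
{
	buf[0] = '\0';
#ifdef CONFIG_EXT4_FS_XATTR	/* wrong name: never defined, block dropped */
	strncat(buf, ",wrong_guard", len - strlen(buf) - 1);
#endif
#ifdef CONFIG_EXT4DEV_FS_XATTR	/* right name: block is compiled in */
	strncat(buf, ",user_xattr", len - strlen(buf) - 1);
#endif
}
```

Only ",user_xattr" ends up in the buffer, and the compiler gives no warning about the dead first block.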

Re: [PATCH 1/2] i_version update - vfs part

2007-10-25 Thread Mingming Cao

On Thu, 2007-10-25 at 19:04 +0200, Cordenner jean noel wrote:
> Hi,
> 
> This is an update of the previous patches on the ext4 git tree, the 2
> coming patches applies at the end of the current ext4-patch-queue, and
> replaces the inode-version related patches:
> 64-bit-i_version.patch
> i_version_hi.patch
> ext4_i_version_hi_2.patch
> i_version_update_ext4.patch
> 
> The first part deals with the vfs part. 
> The i_version field of the inode is changed to be a 64-bit counter that
> is set on every inode creation and that is incremented every time the
> inode data is modified (similarly to the "ctime" time-stamp). 
> The aim is to fulfill a NFSv4 requirement for rfc3530.
> This first part concerns the vfs, it converts the 32-bit i_version in
> the generic inode to a 64-bit, a flag is added in the super block in
> order to check if the feature is enabled and the i_version is
> incremented in the vfs.
> 
Thanks for reposting it.

> Index: linux-2.6.23-ext4-1/fs/inode.c
> ===
> --- linux-2.6.23-ext4-1.orig/fs/inode.c	2007-10-25 16:15:52.0 +0200
> +++ linux-2.6.23-ext4-1/fs/inode.c	2007-10-25 16:25:53.0 +0200
> @@ -1216,6 +1216,24 @@
>  EXPORT_SYMBOL(touch_atime);
>  
>  /**
> + * inode_inc_iversion  -   increments i_version
> + * @inode: inode that need to be updated
> + *
> + * Every time the inode is modified, the i_version field
> + * will be incremented.
> + * The filesystem has to be mounted with i_version flag
> + *
> + */
> +
> +void inode_inc_iversion(struct inode *inode)
> +{
> +	spin_lock(&inode->i_lock);
> +	inode->i_version++;
> +	spin_unlock(&inode->i_lock);
> +}

I wonder, do we really need i_lock here for the inode version update?

I understand this is a 64-bit counter, but file_update_time() and
ext4_mark_inode_dirty() (where the inode version is updated) are called
on the file write path, so i_mutex should be held the whole time. As long
as the read path holds i_mutex, everything should be fine, shouldn't it?

Have you had a chance to check the performance impact on ext4?
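
The locking question above can be illustrated outside the kernel. The sketch below is userspace C11, not the patch's approach and not kernel code: it shows a 64-bit version counter that is safe to bump and read without a separate lock. struct fake_inode and both helpers are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the inode; not the kernel's struct inode. */
struct fake_inode {
	_Atomic uint64_t i_version;
};

/* Bump the version without taking a lock; returns the new value. */
static uint64_t fake_inode_inc_iversion(struct fake_inode *inode)
{
	return atomic_fetch_add(&inode->i_version, 1) + 1;
}

/* Read the version; the load cannot observe a half-written 64-bit value. */
static uint64_t fake_inode_read_iversion(struct fake_inode *inode)
{
	return atomic_load(&inode->i_version);
}
```

Whether an outer lock such as i_mutex already serializes every writer and reader is exactly the point raised above; if it does, neither a spinlock nor an atomic is needed for correctness.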

> +
> +/**
>   *	file_update_time	-	update mtime and ctime time
>   *	@file: file accessed
>   *
> @@ -1249,6 +1267,11 @@
>  		sync_it = 1;
>  	}
>  
> +	if (IS_I_VERSION(inode)) {
> +		inode_inc_iversion(inode);
> +		sync_it = 1;
> +	}
> +
>  	if (sync_it)
>  		mark_inode_dirty_sync(inode);
>  }
> 
> 

Re: [PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-18 Thread Mingming Cao

On Wed, 2007-10-17 at 21:09 -0700, Andrew Morton wrote:
> On Thu, 11 Oct 2007 13:18:49 +0200 Jan Kara <[EMAIL PROTECTED]> wrote:
> 
> > With 64KB blocksize, a directory entry can have size 64KB which does not fit
> > into 16 bits we have for entry length. So we store 0x instead and convert
> > value when read from / written to disk.
> 
> btw, this changes ext2's on-disk format.
> 
Just to clarify: this only changes the directory entry format on
ext2/3/4 filesystems with a 64k block size. But currently, without kernel
changes, ext2/3/4 does not support a 64k block size.

> a) is the ext2 format documented anywhere?  If so, that document will
>need updating.
> 

e2fsprogs needs to be changed to stay in sync with this change.

Ted wrote a paper a while back that describes the ext2 disk format:
http://web.mit.edu/tytso/www/linux/ext2intro.html

Documentation/filesystems/ext2.txt does not document the ext2 on-disk
format. That document is outdated and needs to be reviewed and cleaned
up.
 
> b) what happens when an old ext2 driver tries to read and/or write this
>directory entry?  Do we need a compat flag for it?
> 
> c) what happens when old and new ext3 or ext4 try to read/write this
>directory entry?
> 

Without the first patch in this series (the ext2 large blocksize support
patch), mounting an ext2 filesystem with a 64k block size fails.

[PATCH 1/2] ext2:  Support large blocksize up to PAGESIZE
http://lkml.org/lkml/2007/10/1/361

So old ext2/3/4 drivers will never get to access directory entries in the
changed 64k block size format.
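
For reference, the conversion described in the quoted patch description can be sketched as below. The sentinel value is an assumption: the quoted text is cut off after "0x", and this sketch uses 0xFFFF, the largest value a 16-bit field can hold. The function names are hypothetical and on-disk byte order is ignored.

```c
#include <stdint.h>

/* Assumed sentinel (the quoted description is truncated after "0x"). */
#define MAX_REC_LEN_ON_DISK 0xFFFFu

/* 16-bit on-disk value -> in-memory entry length in bytes. */
static uint32_t rec_len_from_disk(uint16_t dlen)
{
	if (dlen == MAX_REC_LEN_ON_DISK)
		return 1u << 16;	/* sentinel stands for a full 64KB entry */
	return dlen;
}

/* In-memory entry length in bytes -> 16-bit on-disk value. */
static uint16_t rec_len_to_disk(uint32_t len)
{
	if (len == (1u << 16))
		return MAX_REC_LEN_ON_DISK;	/* 65536 does not fit in 16 bits */
	return (uint16_t)len;
}
```

Every length other than 65536 is stored unchanged, which is why the format only differs on filesystems with a 64k block size.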


Regards,

Mingming


Re: [PATCH] jbd: JBD replace jbd_kmalloc with kmalloc

2007-10-08 Thread Mingming Cao

On Mon, 2007-10-08 at 10:46 -0700, Christoph Lameter wrote:
> On Fri, 5 Oct 2007, Mingming Cao wrote:
> 
> > Index: linux-2.6.23-rc9/fs/jbd/transaction.c
> > ===
> > --- linux-2.6.23-rc9.orig/fs/jbd/transaction.c	2007-10-05 12:08:08.0 -0700
> > +++ linux-2.6.23-rc9/fs/jbd/transaction.c	2007-10-05 12:08:29.0 -0700
> > @@ -96,8 +96,8 @@ static int start_this_handle(journal_t *
> >  
> >  alloc_transaction:
> >  	if (!journal->j_running_transaction) {
> > -		new_transaction = jbd_kmalloc(sizeof(*new_transaction),
> > -						GFP_NOFS);
> > +		new_transaction = kmalloc(sizeof(*new_transaction),
> > +						GFP_NOFS|__GFP_NOFAIL);
> 
> 
> Why was a __GFP_NOFAIL added here? I do not see a use of jbd_rep_kmalloc?
> 
> > -#define jbd_kmalloc(size, flags) \
> > -   __jbd_kmalloc(__FUNCTION__, (size), (flags), journal_oom_retry)
> 
> journal_oom_retry is no longer used?
> -

journal_oom_retry (currently defined as 1) is still used in revoke.c,
so the cleanup patch does not remove its definition.

jbd_kmalloc() always passes journal_oom_retry, i.e. 1, as the retry
argument to __jbd_kmalloc():

void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
{
	return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
}

so every jbd_kmalloc() call already implied __GFP_NOFAIL. That is why
jbd_kmalloc() in start_this_handle() is replaced with kmalloc() plus the
__GFP_NOFAIL flag. The other two places converted to kmalloc() are part
of the init process, so they do not need __GFP_NOFAIL.
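
As a rough userspace illustration of why the flags end up the same (this
is only a sketch, not kernel code: the MOCK_* flag values and the helper
names are made up for the example, and only the flag expression is taken
from __jbd_kmalloc() above):

```c
#include <assert.h>

/* Illustrative flag bits only; the real values live in the kernel headers. */
typedef unsigned int gfp_t;
#define MOCK_GFP_NOFS    0x10u
#define MOCK_GFP_NOFAIL  0x800u

/* journal_oom_retry is defined as 1, so jbd_kmalloc() always retries. */
#define MOCK_JOURNAL_OOM_RETRY 1

/* Flag computation done by the removed __jbd_kmalloc() wrapper. */
gfp_t jbd_kmalloc_flags(gfp_t flags, int retry)
{
	return flags | (retry ? MOCK_GFP_NOFAIL : 0);
}

/* Flags start_this_handle() effectively used before the patch. */
gfp_t old_start_this_handle_flags(void)
{
	return jbd_kmalloc_flags(MOCK_GFP_NOFS, MOCK_JOURNAL_OOM_RETRY);
}

/* Flags start_this_handle() passes to kmalloc() after the patch. */
gfp_t new_start_this_handle_flags(void)
{
	return MOCK_GFP_NOFS | MOCK_GFP_NOFAIL;
}

/* The cleanup is meant to leave the effective flags unchanged. */
void check_flags_unchanged(void)
{
	assert(old_start_this_handle_flags() == new_start_this_handle_flags());
}
```

With the retry argument fixed at 1, both helpers return the same flag
set, which is the point of the replacement.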

Mingming

Re: [PATCH 1/2] i_version update - vfs part

2007-10-05 Thread Mingming Cao

On Fri, 2007-10-05 at 17:28 +0200, Cordenner jean noel wrote:
> Hi, 
> 
Hi Jean Noel,

> This is an update of the i_version patch. 

Just to make sure, is this vfs patch and next ext4 patch together going
to replace the 4 inode-version related patches currently in
ext4-patch-queue (and git tree)? 

> The i_version field is a 64bit counter that is set on every inode 
> creation and that is incremented every time the inode data is modified 
> (similarly to the "ctime" time-stamp). 
> The aim is to fulfill a NFSv4 requirement for rfc3530: 
> "5.5.  Mandatory Attributes - Definitions 
> Name      #    DataType   Access   Description 
> ___ 
> change    3    uint64     READ     A value created by the 
> server that the client can use to determine if file 
> data, directory contents or attributes of the object 
> have been modified.  The server may return the object's 
> time_metadata attribute for this attribute's value but 
> only if the filesystem object can not be updated more 
> frequently than the resolution of time_metadata. 
> "
> 
> This first part deals with adding a flag in the super block and incrementing 
> the i_version in the vfs.
> 
> Signed-off-by: Jean Noel Cordenner <[EMAIL PROTECTED]>
> --- 
>  fs/inode.c |   23 +++
>  fs/libfs.c |   12 
>  include/linux/fs.h |3 +++
>  3 files changed, 38 insertions(+)
> 
> Index: linux-2.6.23-rc8-ext4-i_version/fs/inode.c
> ===
> --- linux-2.6.23-rc8-ext4-i_version.orig/fs/inode.c   2007-09-26 14:41:41.0 +0200
> +++ linux-2.6.23-rc8-ext4-i_version/fs/inode.c   2007-10-05 16:14:41.0 +0200
> @@ -1216,6 +1216,24 @@
>  EXPORT_SYMBOL(touch_atime);
> 
>  /**
> + *   inode_inc_iversion  -   increments i_version
> + *   @inode: inode that need to be updated
> + *
> + *   Every time the inode is modified, the i_version field
> + *   will be incremented.
> + *   The filesystem has to be mounted with i_version flag
> + *
> + */
> +
> +void inode_inc_iversion(struct inode *inode)
> +{
> + spin_lock(&inode->i_lock);
> + inode->i_version++;
> + spin_unlock(&inode->i_lock);
> +}

I doubt we need a lock here: the places that need to update
inode->i_version are already updating the inode, mostly protected by
i_mutex. 

You could remove the above function and update the counter directly at
the places that need it.

> +EXPORT_SYMBOL(inode_inc_iversion);
> +

Seems unnecessary.

> +/**
>   *   file_update_time-   update mtime and ctime time
>   *   @file: file accessed
>   *
> @@ -1249,6 +1267,11 @@
>   sync_it = 1;
>   }
> 
> + if (IS_I_VERSION(inode)) {
> + inode_inc_iversion(inode);
> + sync_it = 1;
> + }
> +
>   if (sync_it)
>   mark_inode_dirty_sync(inode);
>  }
> Index: linux-2.6.23-rc8-ext4-i_version/fs/libfs.c
> ===
> --- linux-2.6.23-rc8-ext4-i_version.orig/fs/libfs.c   2007-07-09 01:32:17.0 +0200
> +++ linux-2.6.23-rc8-ext4-i_version/fs/libfs.c   2007-09-26 14:51:08.0 +0200
> @@ -255,6 +255,10 @@
>   struct inode *inode = old_dentry->d_inode;
> 
>   inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> + if (IS_I_VERSION(inode)) {
> + inode_inc_iversion(inode);
> + inode_inc_iversion(dir);
> + }
>   inc_nlink(inode);
>   atomic_inc(&inode->i_count);
>   dget(dentry);
> @@ -287,6 +291,10 @@
>   struct inode *inode = dentry->d_inode;
> 
>   inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
> + if (IS_I_VERSION(inode)) {
> + inode_inc_iversion(inode);
> + inode_inc_iversion(dir);
> + }
>   drop_nlink(inode);
>   dput(dentry);
>   return 0;
> @@ -323,6 +331,10 @@
> 
>   old_dir->i_ctime = old_dir->i_mtime = new_dir->i_ctime =
>   new_dir->i_mtime = inode->i_ctime = CURRENT_TIME;
> + if (IS_I_VERSION(old_dir)) {
> + inode_inc_iversion(old_dir);
> + inode_inc_iversion(new_dir);
> + }
> 
>   return 0;
>  }

Do we need to update the counter in libfs.c?

> Index: linux-2.6.23-rc8-ext4-i_version/include/linux/fs.h
> ===
> --- linux-2.6.23-rc8-ext4-i_version.orig/include/linux/fs.h   2007-09-26 14:46:15.0 +0200
> +++ linux-2.6.23-rc8-ext4-i_version/include/linux/fs.h   2007-09-26 14:51:08.0 +0200
> @@ -123,6 +123,7 @@
>  #define MS_SLAVE (1<<19) /* change to slave */
>  #define MS_SHARED(1<<20) /* change to shared */
>  #define MS_RELATIME  (1<<21) /* Update atime relative to mtime/ctime. */
> +#define MS_I_VERSION (1<<22) /* Update inode i_version field */
>  #define MS_ACTIVE

Re: [PATCH 2/2] i_version update - ext4 part

2007-10-05 Thread Mingming Cao

On Fri, 2007-10-05 at 17:28 +0200, Cordenner jean noel wrote:
> This patch update the i_version field of the inode and add a mount
> option to enable this feature. The other condition to enable this
> feature is that the inode size should be 256-bytes.
> 
> Signed-off-by: Jean Noel Cordenner <[EMAIL PROTECTED]>
> --- 
>  fs/ext4/inode.c |4 +++-
>  fs/ext4/super.c |7 ++-
>  include/linux/ext4_fs.h |1 +
>  3 files changed, 10 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.23-rc8-ext4-i_version/fs/ext4/inode.c
> ===
> --- linux-2.6.23-rc8-ext4-i_version.orig/fs/ext4/inode.c  2007-10-03 18:11:17.0 +0200
> +++ linux-2.6.23-rc8-ext4-i_version/fs/ext4/inode.c   2007-10-05 10:26:42.0 +0200
> @@ -3173,7 +3173,9 @@
>  {
>   int err = 0;
> 
> - inode->i_version++;
> + if (test_opt(inode->i_sb, I_VERSION))
> + inode_inc_iversion(inode);
> +
>   /* the do_update_inode consumes one bh->b_count */
>   get_bh(iloc->bh);
> 
> Index: linux-2.6.23-rc8-ext4-i_version/fs/ext4/super.c
> ===
> --- linux-2.6.23-rc8-ext4-i_version.orig/fs/ext4/super.c  2007-10-03 18:11:17.0 +0200
> +++ linux-2.6.23-rc8-ext4-i_version/fs/ext4/super.c   2007-10-03 18:17:44.0 +0200
> @@ -742,7 +742,7 @@
>   Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
>   Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
>   Opt_grpquota, Opt_extents, Opt_noextents, Opt_delalloc,
> - Opt_mballoc, Opt_nomballoc, Opt_stripe,
> + Opt_mballoc, Opt_nomballoc, Opt_stripe, Opt_i_version,
>  };
> 
>  static match_table_t tokens = {
> @@ -800,6 +800,7 @@
>   {Opt_mballoc, "mballoc"},
>   {Opt_nomballoc, "nomballoc"},
>   {Opt_stripe, "stripe=%u"},
> + {Opt_i_version, "i_version"},
>   {Opt_err, NULL},
>   {Opt_resize, "resize"},
>  };
> @@ -1161,6 +1162,10 @@
>   return 0;
>   sbi->s_stripe = option;
>   break;
> + case Opt_i_version:
> + set_opt (sbi->s_mount_opt, I_VERSION);
> + sb->s_flags |= MS_I_VERSION;
> + break;

We need to make sure this flag is cleared if the fs is remounted without
I_VERSION.
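
One possible shape for that, as a userspace sketch only: the struct, the
MOCK_* bit values and mock_apply_i_version() are all invented for the
illustration and are not the ext4 remount code. The idea is to clear
both bits first and set them again only when the option was given.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values; not the real MS_I_VERSION / EXT4_MOUNT_I_VERSION. */
#define MOCK_MS_I_VERSION     (1UL << 22)
#define MOCK_MOUNT_I_VERSION  (1UL << 3)

/* Stand-in for the two flag words touched by the Opt_i_version case. */
struct mock_sb {
	unsigned long s_flags;      /* VFS-level mount flags */
	unsigned long s_mount_opt;  /* filesystem-private mount options */
};

/*
 * Apply the parsed option on mount or remount. Both bits are cleared
 * first, so a remount without "i_version" drops the feature instead of
 * inheriting it from the previous mount.
 */
void mock_apply_i_version(struct mock_sb *sb, int i_version_requested)
{
	assert(sb != NULL);

	sb->s_mount_opt &= ~MOCK_MOUNT_I_VERSION;
	sb->s_flags &= ~MOCK_MS_I_VERSION;

	if (i_version_requested) {
		sb->s_mount_opt |= MOCK_MOUNT_I_VERSION;
		sb->s_flags |= MOCK_MS_I_VERSION;
	}
}
```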

>   default:
>   printk (KERN_ERR
>   "EXT4-fs: Unrecognized mount option \"%s\" "
> Index: linux-2.6.23-rc8-ext4-i_version/include/linux/ext4_fs.h
> ===
> --- linux-2.6.23-rc8-ext4-i_version.orig/include/linux/ext4_fs.h   2007-10-03 18:11:17.0 +0200
> +++ linux-2.6.23-rc8-ext4-i_version/include/linux/ext4_fs.h   2007-10-03 18:11:54.0 +0200
> @@ -500,6 +500,7 @@
>  #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT  0x100 /* Journal Async Commit */
>  #define EXT4_MOUNT_DELALLOC  0x200 /* Delalloc support */
>  #define EXT4_MOUNT_MBALLOC   0x400 /* Buddy allocation support */
> +#define EXT4_MOUNT_I_VERSION 0x800 /* i_version support */
>  /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at once */
>  #ifndef _LINUX_EXT2_FS_H
>  #define clear_opt(o, opt)o &= ~EXT4_MOUNT_##opt
> 


I don't see where this counter is stored to or loaded from disk, so I
assume this is not the full patch series?


Mingming


[PATCH] jbd2: JBD replace jbd2_kmalloc with kmalloc

2007-10-05 Thread Mingming Cao

JBD2: JBD2 replace jbd2_kmalloc with kmalloc

From: Mingming Cao <[EMAIL PROTECTED]>

This patch cleans up jbd_kmalloc and replaces it with kmalloc directly

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/jbd2/journal.c |   11 +--
 fs/jbd2/transaction.c |4 ++--
 include/linux/jbd2.h  |7 ---
 3 files changed, 3 insertions(+), 19 deletions(-)


Index: linux-2.6.23-rc9/fs/jbd2/journal.c
===
--- linux-2.6.23-rc9.orig/fs/jbd2/journal.c 2007-10-05 12:08:26.0 -0700
+++ linux-2.6.23-rc9/fs/jbd2/journal.c  2007-10-05 12:08:32.0 -0700
@@ -654,7 +654,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1619,15 +1619,6 @@ size_t journal_tag_bytes(journal_t *jour
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd2_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
  * Journal_head storage management
  */
 static struct kmem_cache *jbd2_journal_head_cache;
Index: linux-2.6.23-rc9/fs/jbd2/transaction.c
===
--- linux-2.6.23-rc9.orig/fs/jbd2/transaction.c 2007-10-05 12:08:26.0 -0700
+++ linux-2.6.23-rc9/fs/jbd2/transaction.c  2007-10-05 12:08:32.0 -0700
@@ -96,8 +96,8 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = jbd_kmalloc(sizeof(*new_transaction),
-   GFP_NOFS);
+   new_transaction = kmalloc(sizeof(*new_transaction),
+   GFP_NOFS|__GFP_NOFAIL);
if (!new_transaction) {
ret = -ENOMEM;
goto out;
Index: linux-2.6.23-rc9/include/linux/jbd2.h
===
--- linux-2.6.23-rc9.orig/include/linux/jbd2.h  2007-10-05 12:08:26.0 -0700
+++ linux-2.6.23-rc9/include/linux/jbd2.h   2007-10-05 12:08:32.0 -0700
@@ -71,13 +71,6 @@ extern u8 jbd2_journal_enable_debug;
 #define jbd_debug(f, a...) /**/
 #endif
 
-extern void * __jbd2_kmalloc (const char *where, size_t size, gfp_t flags, int retry);
-#define jbd_kmalloc(size, flags) \
-   __jbd2_kmalloc(__FUNCTION__, (size), (flags), journal_oom_retry)
-#define jbd_rep_kmalloc(size, flags) \
-   __jbd2_kmalloc(__FUNCTION__, (size), (flags), 1)
-
-
 static inline void *jbd2_alloc(size_t size, gfp_t flags)
 {
return (void *)__get_free_pages(flags, get_order(size));



[PATCH] jbd: JBD replace jbd_kmalloc with kmalloc

2007-10-05 Thread Mingming Cao

JBD: JBD replace jbd_kmalloc with kmalloc

From: Mingming Cao <[EMAIL PROTECTED]>

This patch cleans up jbd_kmalloc and replaces it with kmalloc directly

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/jbd/journal.c |   11 +--
 fs/jbd/transaction.c |4 ++--
 include/linux/jbd.h  |6 --
 3 files changed, 3 insertions(+), 18 deletions(-)


Index: linux-2.6.23-rc9/fs/jbd/journal.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/journal.c  2007-10-05 12:08:08.0 -0700
+++ linux-2.6.23-rc9/fs/jbd/journal.c   2007-10-05 12:08:29.0 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1607,15 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
  * Journal_head storage management
  */
 static struct kmem_cache *journal_head_cache;
Index: linux-2.6.23-rc9/fs/jbd/transaction.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/transaction.c  2007-10-05 12:08:08.0 -0700
+++ linux-2.6.23-rc9/fs/jbd/transaction.c   2007-10-05 12:08:29.0 -0700
@@ -96,8 +96,8 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = jbd_kmalloc(sizeof(*new_transaction),
-   GFP_NOFS);
+   new_transaction = kmalloc(sizeof(*new_transaction),
+   GFP_NOFS|__GFP_NOFAIL);
if (!new_transaction) {
ret = -ENOMEM;
goto out;
Index: linux-2.6.23-rc9/include/linux/jbd.h
===
--- linux-2.6.23-rc9.orig/include/linux/jbd.h   2007-10-05 12:08:08.0 -0700
+++ linux-2.6.23-rc9/include/linux/jbd.h    2007-10-05 12:08:29.0 -0700
@@ -71,12 +71,6 @@ extern int journal_enable_debug;
 #define jbd_debug(f, a...) /**/
 #endif
 
-extern void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry);
-#define jbd_kmalloc(size, flags) \
-   __jbd_kmalloc(__FUNCTION__, (size), (flags), journal_oom_retry)
-#define jbd_rep_kmalloc(size, flags) \
-   __jbd_kmalloc(__FUNCTION__, (size), (flags), 1)
-
 static inline void *jbd_alloc(size_t size, gfp_t flags)
 {
return (void *)__get_free_pages(flags, get_order(size));



[PATCH] jbd2: JBD2 slab allocation cleanups

2007-10-05 Thread Mingming Cao

JBD2: jbd2 slab allocation cleanups

From: Mingming Cao <[EMAIL PROTECTED]>

JBD2: Replace slab allocations with page allocations

JBD2 allocates memory for committed_data and frozen_data from slab. However,
JBD2 should not pass slab pages down to the block layer, so use page allocator
pages instead. This will also prepare JBD for the large blocksize patchset.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/jbd2/commit.c  |6 +--
 fs/jbd2/journal.c |   88 ++
 fs/jbd2/transaction.c |   14 +++
 include/linux/jbd2.h  |   18 +++---
 4 files changed, 27 insertions(+), 99 deletions(-)


Index: linux-2.6.23-rc9/fs/jbd2/commit.c
===
--- linux-2.6.23-rc9.orig/fs/jbd2/commit.c  2007-10-05 12:03:43.0 -0700
+++ linux-2.6.23-rc9/fs/jbd2/commit.c   2007-10-05 12:08:26.0 -0700
@@ -384,7 +384,7 @@ void jbd2_journal_commit_transaction(jou
struct buffer_head *bh = jh2bh(jh);
 
jbd_lock_bh_state(bh);
-   jbd2_slab_free(jh->b_committed_data, bh->b_size);
+   jbd2_free(jh->b_committed_data, bh->b_size);
jh->b_committed_data = NULL;
jbd_unlock_bh_state(bh);
}
@@ -801,14 +801,14 @@ restart_loop:
 * Otherwise, we can just throw away the frozen data now.
 */
if (jh->b_committed_data) {
-   jbd2_slab_free(jh->b_committed_data, bh->b_size);
+   jbd2_free(jh->b_committed_data, bh->b_size);
jh->b_committed_data = NULL;
if (jh->b_frozen_data) {
jh->b_committed_data = jh->b_frozen_data;
jh->b_frozen_data = NULL;
}
} else if (jh->b_frozen_data) {
-   jbd2_slab_free(jh->b_frozen_data, bh->b_size);
+   jbd2_free(jh->b_frozen_data, bh->b_size);
jh->b_frozen_data = NULL;
}
 
Index: linux-2.6.23-rc9/fs/jbd2/journal.c
===
--- linux-2.6.23-rc9.orig/fs/jbd2/journal.c 2007-10-05 12:03:43.0 -0700
+++ linux-2.6.23-rc9/fs/jbd2/journal.c  2007-10-05 12:08:26.0 -0700
@@ -84,7 +84,6 @@ EXPORT_SYMBOL(jbd2_journal_force_commit)
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int jbd2_journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -335,10 +334,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd2_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd2_slab_free(tmp, bh_in->b_size);
+   jbd2_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -1096,13 +1095,6 @@ int jbd2_journal_load(journal_t *journal
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = jbd2_journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (jbd2_journal_recover(journal))
@@ -1636,77 +1628,6 @@ void * __jbd2_kmalloc (const char *where
 }
 
 /*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
- */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size >> 11)
-
-static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
-static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   "jbd2_1k", "jbd2_2k", "jbd2_4k", NULL, "jbd2_8k"
-};
-
-static void jbd2_journal_destroy_jbd_slabs(void)
-{
-   int i;
-
-   for (i = 0; i < JBD_MAX_SLABS; i++) {
-   if (jbd_slab[i])
-   kmem_cache_destroy(jbd_slab[i]);
-   jbd_slab[i] = NULL;
-   }
-}
-
-static int jbd2_journal_create_jbd_slab(size_t slab_size)
-{
-   int i = JBD_SLAB_INDEX(slab_size);
-
-   BUG_ON(i >= JBD_MAX_SLABS);
-
-   /*
-* Check if we already have a slab created for this size
-*/
-   if (jbd_slab[i])
-   r

[PATCH] jbd: JBD slab allocation cleanups

2007-10-05 Thread Mingming Cao

JBD: JBD slab allocation cleanups

From: Mingming Cao <[EMAIL PROTECTED]>

JBD: Replace slab allocations with page allocations

JBD allocates memory for committed_data and frozen_data from slab. However,
JBD should not pass slab pages down to the block layer, so use page allocator
pages instead. This will also prepare JBD for the large blocksize patchset.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/jbd/commit.c  |6 +--
 fs/jbd/journal.c |   88 ++-
 fs/jbd/transaction.c |8 ++--
 include/linux/jbd.h  |   13 +--
 4 files changed, 21 insertions(+), 94 deletions(-)


Index: linux-2.6.23-rc9/fs/jbd/commit.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/commit.c   2007-10-05 12:03:43.0 -0700
+++ linux-2.6.23-rc9/fs/jbd/commit.c    2007-10-05 12:08:08.0 -0700
@@ -375,7 +375,7 @@ void journal_commit_transaction(journal_
struct buffer_head *bh = jh2bh(jh);
 
jbd_lock_bh_state(bh);
-   jbd_slab_free(jh->b_committed_data, bh->b_size);
+   jbd_free(jh->b_committed_data, bh->b_size);
jh->b_committed_data = NULL;
jbd_unlock_bh_state(bh);
}
@@ -792,14 +792,14 @@ restart_loop:
 * Otherwise, we can just throw away the frozen data now.
 */
if (jh->b_committed_data) {
-   jbd_slab_free(jh->b_committed_data, bh->b_size);
+   jbd_free(jh->b_committed_data, bh->b_size);
jh->b_committed_data = NULL;
if (jh->b_frozen_data) {
jh->b_committed_data = jh->b_frozen_data;
jh->b_frozen_data = NULL;
}
} else if (jh->b_frozen_data) {
-   jbd_slab_free(jh->b_frozen_data, bh->b_size);
+   jbd_free(jh->b_frozen_data, bh->b_size);
jh->b_frozen_data = NULL;
}
 
Index: linux-2.6.23-rc9/fs/jbd/journal.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/journal.c  2007-10-05 12:03:43.0 -0700
+++ linux-2.6.23-rc9/fs/jbd/journal.c   2007-10-05 12:08:08.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd_slab_free(tmp, bh_in->b_size);
+   jbd_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1624,77 +1616,6 @@ void * __jbd_kmalloc (const char *where,
 }
 
 /*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
- */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size >> 11)
-
-static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
-static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   "jbd_1k", "jbd_2k", "jbd_4k", NULL, "jbd_8k"
-};
-
-static void journal_destroy_jbd_slabs(void)
-{
-   int i;
-
-   for (i = 0; i < JBD_MAX_SLABS; i++) {
-   if (jbd_slab[i])
-   kmem_cache_destroy(jbd_slab[i]);
-   jbd_slab[i] = NULL;
-   }
-}
-
-static int journal_create_jbd_slab(size_t slab_size)
-{
-   int i = JBD_SLAB_INDEX(slab_size);
-
-   BUG_ON(i >= JBD_MAX_SLABS);
-
-   /*
-* Check if we already have a slab created for this size
-*/
-   if (jbd_slab[i])
-   return 0;
-
-   /*
-* Create a slab and force alig

Re: [PATCH] jbd/jbd2: JBD memory allocation cleanups

2007-10-05 Thread Mingming Cao

On Thu, 2007-10-04 at 07:52 +0100, Christoph Hellwig wrote:
> On Thu, Oct 04, 2007 at 01:50:36AM -0400, Theodore Ts'o wrote:
> > From: Mingming Cao <[EMAIL PROTECTED]>
> > 
> > JBD: Replace slab allocations with page cache allocations
> 
> It's page allocations, not page cache allocations.
> 
> > Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly
> 
> That sounds like it should be a different patch..

Okay. I will resend these as separate patches, with the JBD2 changes
split out into their own patch as well.


Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c

2007-10-05 Thread Mingming Cao

 Forwarded Message 
From: Valerie Clement <[EMAIL PROTECTED]>
To: Linux Kernel Mailing List 
Subject: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c
Date:   Thu, 04 Oct 2007 18:13:46 +0200
While running ffsb tests on my ext4 filesystem, I got an Oops in 
cache_alloc_refill().
I turned on SLAB debugging and here is the message I got:

slab: Internal list corruption detected in cache 'buffer_head'(30), 
slabp 81007e100100(1515870810). Hexdump:

===>

The slabp->inuse counter looks corrupted (1515870810); it should not be
greater than cachep->num, which looks valid (30).
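
The bogus count is itself a hint: 1515870810 is 0x5a5a5a5a, and the
hexdump below is mostly 0x5a bytes, which is the slab debugging poison
value for allocated-but-uninitialised memory (POISON_INUSE in
include/linux/poison.h). That suggests, without proving it, that the
slab management structure was overwritten by, or overlaps, a poisoned
object. A small userspace sketch of the arithmetic (poisoned_u32() is a
name invented for this example):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Slab debug poison byte for "in use, not yet initialised" memory. */
#define POISON_INUSE 0x5a

/*
 * Return the value a 32-bit field would read as if the memory behind it
 * had been filled with a single poison byte. All four bytes are equal,
 * so the result does not depend on endianness.
 */
uint32_t poisoned_u32(unsigned char poison)
{
	unsigned char buf[sizeof(uint32_t)];
	uint32_t value;

	assert(sizeof(buf) == sizeof(value));
	memset(buf, poison, sizeof(buf));
	memcpy(&value, buf, sizeof(value));
	return value;
}
```

Calling poisoned_u32(POISON_INUSE) gives 0x5a5a5a5a, the same number
reported for slabp->inuse.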


000: 5a 5a 5a 5a 5a 5a 5a 5a b8 23 34 7e 00 81 ff ff
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
020: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
030: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
040: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
050: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
060: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a a5
070: c0 88 56 63 c5 56 41 d8 f1 37 4a 80 ff ff ff ff
080: c0 88 56 63 c5 56 41 d8 80 33 53 7d 00 81 ff ff
090: e8 25 60 7d 00 81 ff ff 68 cb 3b 01 00 81 ff ff
0a0: 18 68 50 7d 00 81 ff ff
[ cut here ]
kernel BUG at /home/clementv/src/linux-2.6.23-rc9/mm/slab.c:2923!
invalid opcode:  [1] SMP
CPU 2
Modules linked in: qla2xxx
Pid: 4041, comm: ffsb Not tainted 2.6.23-rc9 #2
RIP: 0010:[802758b6]  [802758b6] check_slabp+0xb5/0xc1
RSP: 0018:8100774bb958  EFLAGS: 00010096
RAX: 0001 RBX: 81007e100100 RCX: 6d20
RDX:  RSI: 0046 RDI: 81007e347280
RBP: 00a8 R08: 0005 R09: 8060bb10
R10: 000ae468 R11: 00050002 R12: 00a8
R13: 81007e347280 R14: 81007e347280 R15: 0002
FS:  41802950(0063) GS:81007e0c4728() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 5f83d00c CR3: 78149000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process ffsb (pid: 4041, threadinfo 8100774ba000, task 81007dbdc7a0)
Stack:  000d 000e 81007e100100 81007e342398
  81007e078488 80277069 8050 81007e347280
  8050 0246 80299539 f000
Call Trace:
  [80277069] cache_alloc_refill+0xc8/0x23f
  [80299539] alloc_buffer_head+0x14/0x45
  [802774cd] kmem_cache_alloc+0x94/0xe9
  [80299539] alloc_buffer_head+0x14/0x45
  [80299cf7] alloc_page_buffers+0x38/0xd5
  [80299da8] create_empty_buffers+0x14/0x9b
  [8029a875] __block_prepare_write+0x7c/0x45b
  [802f6e29] ext4_get_block+0x0/0x139
  [8029ac6e] block_prepare_write+0x1a/0x25
  [802f8340] ext4_prepare_write+0xaf/0x175
  [802576c2] generic_file_buffered_write+0x288/0x631
  [80257daa] __generic_file_aio_write_nolock+0x33f/0x3a9
  [8022b7d5] enqueue_entity+0x17c/0x1a3
  [80257e75] generic_file_aio_write+0x61/0xc1
  [8022c512] __check_preempt_curr_fair+0x56/0x76
  [802f4022] ext4_file_write+0x16/0x91
  [8027c4f4] do_sync_write+0xc9/0x10c
  [8027d50a] file_move+0x1d/0x4c
  [80245992] autoremove_wake_function+0x0/0x2e
  [8027b216] do_filp_open+0x2a/0x38
  [80275f7a] poison_obj+0x26/0x30
  [8027cc34] vfs_write+0xad/0x136
  [8027d171] sys_write+0x45/0x6e
  [8020b32e] system_call+0x7e/0x83


=>

The stack trace shows ext4_new_block(). Is the problem repeatable? Does it
go away without the multiple block allocation patch?

Mingming



Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c

2007-10-05 Thread Mingming Cao

On Fri, 2007-10-05 at 07:54 -0700, Badari Pulavarty wrote:
> On Fri, 2007-10-05 at 15:41 +0200, Valerie Clement wrote:
> > Badari Pulavarty wrote:
> > > On Thu, 2007-10-04 at 18:13 +0200, Valerie Clement wrote:
> > >> While running ffsb tests on my ext4 filesystem, I got an Oops in 
> > >> cache_alloc_refill().
> > >> I turned on SLAB debugging and here is the message I got:
> > >>
> > >> slab: Internal list corruption detected in cache 'buffer_head'(30), 
> > >> slabp 81007e100100(1515870810). Hexdump:
> > > 
> > > slabp->inuse = 1515870810 looks bogus. Is this easily reproducible ?
> > 
> > Hi Badari,
> > Thanks for your answer.
> > I didn't reproduce it without the latest ext4 patches. So I suspect a 
> > bug in one of them.
> > But how debugging this?
> > Which other debug traces can I turn on?
> 
> Let me understand. You applied latest ext4 patchsets ? If so, Mingming
> has some slab-cleanup changes in the patchset. You can try backing them
> out and see. 
> 

It's unlikely to be the jbd_slab_cleanup.patch, which actually gets rid
of slab allocation for buffers passed down to disk IO and replaces it
with get_free_page directly.
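
For reference, jbd_alloc() in that patch is a thin wrapper around
__get_free_pages(flags, get_order(size)). The order arithmetic can be
sketched in userspace as below. This assumes 4 KiB pages, and
MOCK_PAGE_SIZE and mock_get_order() are names invented for the
illustration; it is not the kernel's get_order() implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for the example: 4 KiB pages. */
#define MOCK_PAGE_SIZE 4096UL

/*
 * Smallest order such that (MOCK_PAGE_SIZE << order) >= size, i.e. the
 * power-of-two number of pages a page-allocator request is rounded up to.
 */
int mock_get_order(size_t size)
{
	int order = 0;

	assert(size > 0);
	while ((MOCK_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

Under that assumption, 1k, 2k and 4k journal blocks each take one whole
page (order 0), an 8k block takes two pages (order 1), and a 64k block
would be an order-4 request.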

Could you send me the profile used for ffsb test?

Thanks,
Mingming

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c

2007-10-05 Thread Mingming Cao

On Fri, 2007-10-05 at 07:54 -0700, Badari Pulavarty wrote:
 On Fri, 2007-10-05 at 15:41 +0200, Valerie Clement wrote:
  Badari Pulavarty wrote:
   On Thu, 2007-10-04 at 18:13 +0200, Valerie Clement wrote:
   While running ffsb tests on my ext4 filesystem, I got an Oops in 
   cache_alloc_refill().
   I turned on SLAB debugging and here is the message I got:
  
   slab: Internal list corruption detected in cache 'buffer_head'(30), 
   slabp 81007e100100(1515870810). Hexdump:
   
   slabp-inuse = 1515870810 looks bogus. Is this easily reproducible ?
  
  Hi Badari,
  Thanks for your answer.
  I didn't reproduce it without the latest ext4 patches. So I suspect a 
  bug in one of them.
  But how debugging this?
  Which other debug traces can I turn on?
 
 Let me understand. You applied latest ext4 patchsets ? If so, Mingming
 has some slab-cleanup changes in the patchset. You can try backing them
 out and see. 
 

It's unlikely to be the jbd_slab_cleanup.patch, which actually get rid
of slab allocation for buffers passing down to disk IO, and replace with
get_free_page directly.

Could you send me the profile used for ffsb test?

Thanks,
Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c

2007-10-05 Thread Mingming Cao

 Forwarded Message 
From: Valerie Clement [EMAIL PROTECTED]
To: Linux Kernel Mailing List linux-kernel@vger.kernel.org
Subject: 2.6.23-rc9: Oops in cache_alloc_refill() mm/slab.c
Date:   Thu, 04 Oct 2007 18:13:46 +0200
While running ffsb tests on my ext4 filesystem, I got an Oops in 
cache_alloc_refill().
I turned on SLAB debugging and here is the message I got:

slab: Internal list corruption detected in cache 'buffer_head'(30), 
slabp 81007e100100(1515870810). Hexdump:

===

slabp-inuse counter looks corrupted (1515870810), it should not greater
than cachep-num looks valid (30)


000: 5a 5a 5a 5a 5a 5a 5a 5a b8 23 34 7e 00 81 ff ff
010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
020: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
030: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
040: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
050: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
060: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a a5
070: c0 88 56 63 c5 56 41 d8 f1 37 4a 80 ff ff ff ff
080: c0 88 56 63 c5 56 41 d8 80 33 53 7d 00 81 ff ff
090: e8 25 60 7d 00 81 ff ff 68 cb 3b 01 00 81 ff ff
0a0: 18 68 50 7d 00 81 ff ff
[ cut here ]
kernel BUG at /home/clementv/src/linux-2.6.23-rc9/mm/slab.c:2923!
invalid opcode:  [1] SMP
CPU 2
Modules linked in: qla2xxx
Pid: 4041, comm: ffsb Not tainted 2.6.23-rc9 #2
RIP: 0010:[802758b6]  [802758b6] check_slabp+0xb5/0xc1
RSP: 0018:8100774bb958  EFLAGS: 00010096
RAX: 0001 RBX: 81007e100100 RCX: 6d20
RDX:  RSI: 0046 RDI: 81007e347280
RBP: 00a8 R08: 0005 R09: 8060bb10
R10: 000ae468 R11: 00050002 R12: 00a8
R13: 81007e347280 R14: 81007e347280 R15: 0002
FS:  41802950(0063) GS:81007e0c4728() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 5f83d00c CR3: 78149000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process ffsb (pid: 4041, threadinfo 8100774ba000, task 81007dbdc7a0)
Stack:  000d 000e 81007e100100 81007e342398
  81007e078488 80277069 8050 81007e347280
  8050 0246 80299539 f000
Call Trace:
  [80277069] cache_alloc_refill+0xc8/0x23f
  [80299539] alloc_buffer_head+0x14/0x45
  [802774cd] kmem_cache_alloc+0x94/0xe9
  [80299539] alloc_buffer_head+0x14/0x45
  [80299cf7] alloc_page_buffers+0x38/0xd5
  [80299da8] create_empty_buffers+0x14/0x9b
  [8029a875] __block_prepare_write+0x7c/0x45b
  [802f6e29] ext4_get_block+0x0/0x139
  [8029ac6e] block_prepare_write+0x1a/0x25
  [802f8340] ext4_prepare_write+0xaf/0x175
  [802576c2] generic_file_buffered_write+0x288/0x631
  [80257daa] __generic_file_aio_write_nolock+0x33f/0x3a9
  [8022b7d5] enqueue_entity+0x17c/0x1a3
  [80257e75] generic_file_aio_write+0x61/0xc1
  [8022c512] __check_preempt_curr_fair+0x56/0x76
  [802f4022] ext4_file_write+0x16/0x91
  [8027c4f4] do_sync_write+0xc9/0x10c
  [8027d50a] file_move+0x1d/0x4c
  [80245992] autoremove_wake_function+0x0/0x2e
  [8027b216] do_filp_open+0x2a/0x38
  [80275f7a] poison_obj+0x26/0x30
  [8027cc34] vfs_write+0xad/0x136
  [8027d171] sys_write+0x45/0x6e
  [8020b32e] system_call+0x7e/0x83


=

The stack trace shows ext4_new_block(). Is the problem repeatable? Does it
go away without the multiple block allocation patch?

Mingming



Re: [PATCH] jbd/jbd2: JBD memory allocation cleanups

2007-10-05 Thread Mingming Cao

On Thu, 2007-10-04 at 07:52 +0100, Christoph Hellwig wrote:
> On Thu, Oct 04, 2007 at 01:50:36AM -0400, Theodore Ts'o wrote:
> > From: Mingming Cao [EMAIL PROTECTED]
> > 
> > JBD: Replace slab allocations with page cache allocations
> 
> It's page allocations, not page cache allocations.
> 
> > Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly
> 
> That sounds like it should be a different patch..

Okay. I will send the patches, and will also separate the JBD2 changes
into a different patch.


[PATCH] jbd: JBD slab allocation cleanups

2007-10-05 Thread Mingming Cao

JBD: JBD slab allocation cleanups

From: Mingming Cao [EMAIL PROTECTED]

JBD: Replace slab allocations with page allocations

JBD allocates memory for committed_data and frozen_data from slab. However,
JBD should not pass slab pages down to the block layer. Use page allocator
pages instead. This will also prepare JBD for the large blocksize patchset.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/jbd/commit.c  |6 +--
 fs/jbd/journal.c |   88 ++-
 fs/jbd/transaction.c |8 ++--
 include/linux/jbd.h  |   13 +--
 4 files changed, 21 insertions(+), 94 deletions(-)


Index: linux-2.6.23-rc9/fs/jbd/commit.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/commit.c   2007-10-05 12:03:43.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd/commit.c2007-10-05 12:08:08.0 -0700
@@ -375,7 +375,7 @@ void journal_commit_transaction(journal_
struct buffer_head *bh = jh2bh(jh);
 
jbd_lock_bh_state(bh);
-   jbd_slab_free(jh->b_committed_data, bh->b_size);
+   jbd_free(jh->b_committed_data, bh->b_size);
jh->b_committed_data = NULL;
jbd_unlock_bh_state(bh);
}
@@ -792,14 +792,14 @@ restart_loop:
 * Otherwise, we can just throw away the frozen data now.
 */
if (jh->b_committed_data) {
-   jbd_slab_free(jh->b_committed_data, bh->b_size);
+   jbd_free(jh->b_committed_data, bh->b_size);
jh->b_committed_data = NULL;
if (jh->b_frozen_data) {
jh->b_committed_data = jh->b_frozen_data;
jh->b_frozen_data = NULL;
}
} else if (jh->b_frozen_data) {
-   jbd_slab_free(jh->b_frozen_data, bh->b_size);
+   jbd_free(jh->b_frozen_data, bh->b_size);
jh->b_frozen_data = NULL;
}
 
Index: linux-2.6.23-rc9/fs/jbd/journal.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/journal.c  2007-10-05 12:03:43.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd/journal.c   2007-10-05 12:08:08.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd_slab_free(tmp, bh_in->b_size);
+   jbd_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1624,77 +1616,6 @@ void * __jbd_kmalloc (const char *where,
 }
 
 /*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
- */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size >> 11)
-
-static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
-static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   "jbd_1k", "jbd_2k", "jbd_4k", NULL, "jbd_8k"
-};
-
-static void journal_destroy_jbd_slabs(void)
-{
-   int i;
-
-   for (i = 0; i < JBD_MAX_SLABS; i++) {
-   if (jbd_slab[i])
-   kmem_cache_destroy(jbd_slab[i]);
-   jbd_slab[i] = NULL;
-   }
-}
-
-static int journal_create_jbd_slab(size_t slab_size)
-{
-   int i = JBD_SLAB_INDEX(slab_size);
-
-   BUG_ON(i >= JBD_MAX_SLABS);
-
-   /*
-* Check if we already have a slab created for this size
-*/
-   if (jbd_slab[i])
-   return 0;
-
-   /*
-* Create a slab and force alignment to be same as slabsize -
-* this will make sure that allocations won't cross the page
-* boundary.
-*/
-   jbd_slab[i] = kmem_cache_create(jbd_slab_names[i
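
For readers wondering why the removed jbd_slab_names table has a NULL in its fourth slot: the removed index macro shifts the block size right by 11 bits (as in the 2.6.23 source), so 1k, 2k, 4k and 8k land on indices 0, 1, 2 and 4, and index 3 is never used by a power-of-two block size. A small userspace sketch, with toy_ names that are made up for this illustration:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SLABS 5
#define TOY_SLAB_INDEX(size) ((size) >> 11)

/* Mirrors the layout of the removed name table; index 3 is a hole. */
static const char *const toy_slab_names[TOY_MAX_SLABS] = {
	"jbd_1k", "jbd_2k", "jbd_4k", NULL, "jbd_8k"
};

/* Name of the slab a block size maps to, or NULL if out of range or unused. */
static inline const char *toy_slab_name(size_t block_size)
{
	size_t i = TOY_SLAB_INDEX(block_size);

	if (i >= TOY_MAX_SLABS)
		return NULL;
	return toy_slab_names[i];
}
```

Note that the kernel code uses BUG_ON() for an out-of-range index; the sketch returns NULL instead so it can be exercised safely in userspace.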

[PATCH] jbd2: JBD2 slab allocation cleanups

2007-10-05 Thread Mingming Cao

JBD2: jbd2 slab allocation cleanups

From: Mingming Cao [EMAIL PROTECTED]

JBD2: Replace slab allocations with page allocations

JBD2 allocates memory for committed_data and frozen_data from slab. However,
JBD2 should not pass slab pages down to the block layer. Use page allocator
pages instead. This will also prepare JBD2 for the large blocksize patchset.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/jbd2/commit.c  |6 +--
 fs/jbd2/journal.c |   88 ++
 fs/jbd2/transaction.c |   14 +++
 include/linux/jbd2.h  |   18 +++---
 4 files changed, 27 insertions(+), 99 deletions(-)


Index: linux-2.6.23-rc9/fs/jbd2/commit.c
===
--- linux-2.6.23-rc9.orig/fs/jbd2/commit.c  2007-10-05 12:03:43.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd2/commit.c   2007-10-05 12:08:26.0 -0700
@@ -384,7 +384,7 @@ void jbd2_journal_commit_transaction(jou
struct buffer_head *bh = jh2bh(jh);
 
jbd_lock_bh_state(bh);
-   jbd2_slab_free(jh->b_committed_data, bh->b_size);
+   jbd2_free(jh->b_committed_data, bh->b_size);
jh->b_committed_data = NULL;
jbd_unlock_bh_state(bh);
}
@@ -801,14 +801,14 @@ restart_loop:
 * Otherwise, we can just throw away the frozen data now.
 */
if (jh->b_committed_data) {
-   jbd2_slab_free(jh->b_committed_data, bh->b_size);
+   jbd2_free(jh->b_committed_data, bh->b_size);
jh->b_committed_data = NULL;
if (jh->b_frozen_data) {
jh->b_committed_data = jh->b_frozen_data;
jh->b_frozen_data = NULL;
}
} else if (jh->b_frozen_data) {
-   jbd2_slab_free(jh->b_frozen_data, bh->b_size);
+   jbd2_free(jh->b_frozen_data, bh->b_size);
jh->b_frozen_data = NULL;
}
 
Index: linux-2.6.23-rc9/fs/jbd2/journal.c
===
--- linux-2.6.23-rc9.orig/fs/jbd2/journal.c 2007-10-05 12:03:43.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd2/journal.c  2007-10-05 12:08:26.0 -0700
@@ -84,7 +84,6 @@ EXPORT_SYMBOL(jbd2_journal_force_commit)
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int jbd2_journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -335,10 +334,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd2_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd2_slab_free(tmp, bh_in->b_size);
+   jbd2_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -1096,13 +1095,6 @@ int jbd2_journal_load(journal_t *journal
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = jbd2_journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (jbd2_journal_recover(journal))
@@ -1636,77 +1628,6 @@ void * __jbd2_kmalloc (const char *where
 }
 
 /*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
- */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size >> 11)
-
-static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
-static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   "jbd2_1k", "jbd2_2k", "jbd2_4k", NULL, "jbd2_8k"
-};
-
-static void jbd2_journal_destroy_jbd_slabs(void)
-{
-   int i;
-
-   for (i = 0; i < JBD_MAX_SLABS; i++) {
-   if (jbd_slab[i])
-   kmem_cache_destroy(jbd_slab[i]);
-   jbd_slab[i] = NULL;
-   }
-}
-
-static int jbd2_journal_create_jbd_slab(size_t slab_size)
-{
-   int i = JBD_SLAB_INDEX(slab_size);
-
-   BUG_ON(i >= JBD_MAX_SLABS);
-
-   /*
-* Check if we already have a slab created for this size
-*/
-   if (jbd_slab[i])
-   return 0;
-
-   /*
-* Create a slab and force alignment to be same as slabsize -
-* this will make sure that allocations won't cross the page
-* boundary

[PATCH] jbd: JBD replace jbd_kmalloc with kmalloc

2007-10-05 Thread Mingming Cao

JBD: JBD replace jbd_kmalloc with kmalloc

From: Mingming Cao [EMAIL PROTECTED]

This patch cleans up jbd_kmalloc and replaces it with kmalloc directly.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/jbd/journal.c |   11 +--
 fs/jbd/transaction.c |4 ++--
 include/linux/jbd.h  |6 --
 3 files changed, 3 insertions(+), 18 deletions(-)


Index: linux-2.6.23-rc9/fs/jbd/journal.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/journal.c  2007-10-05 12:08:08.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd/journal.c   2007-10-05 12:08:29.0 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1607,15 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
  * Journal_head storage management
  */
 static struct kmem_cache *journal_head_cache;
Index: linux-2.6.23-rc9/fs/jbd/transaction.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/transaction.c  2007-10-05 12:08:08.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd/transaction.c   2007-10-05 12:08:29.0 
-0700
@@ -96,8 +96,8 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = jbd_kmalloc(sizeof(*new_transaction),
-   GFP_NOFS);
+   new_transaction = kmalloc(sizeof(*new_transaction),
+   GFP_NOFS|__GFP_NOFAIL);
if (!new_transaction) {
ret = -ENOMEM;
goto out;
Index: linux-2.6.23-rc9/include/linux/jbd.h
===
--- linux-2.6.23-rc9.orig/include/linux/jbd.h   2007-10-05 12:08:08.0 
-0700
+++ linux-2.6.23-rc9/include/linux/jbd.h2007-10-05 12:08:29.0 
-0700
@@ -71,12 +71,6 @@ extern int journal_enable_debug;
 #define jbd_debug(f, a...) /**/
 #endif
 
-extern void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry);
-#define jbd_kmalloc(size, flags) \
-   __jbd_kmalloc(__FUNCTION__, (size), (flags), journal_oom_retry)
-#define jbd_rep_kmalloc(size, flags) \
-   __jbd_kmalloc(__FUNCTION__, (size), (flags), 1)
-
 static inline void *jbd_alloc(size_t size, gfp_t flags)
 {
return (void *)__get_free_pages(flags, get_order(size));
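
As a rough userspace illustration of how the jbd_alloc() helper in the context lines above sizes its request: get_order() picks the smallest power-of-two number of pages that covers the size, so every journal buffer up to one page costs a full page. The sketch below assumes a 4KB page and uses made-up toy_ names; it only mirrors the arithmetic and does not allocate anything.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/* Smallest order such that (TOY_PAGE_SIZE << order) >= size; size must be > 0. */
static inline unsigned int toy_get_order(size_t size)
{
	unsigned int order = 0;

	while ((TOY_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Bytes actually handed out for a request of the given size. */
static inline size_t toy_alloc_bytes(size_t size)
{
	return TOY_PAGE_SIZE << toy_get_order(size);
}
```

Under these assumptions a 1KB block still consumes 4096 bytes, which is the memory cost of avoiding slab pages for buffers that go down to the block layer.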



[PATCH] jbd2: JBD replace jbd2_kmalloc with kmalloc

2007-10-05 Thread Mingming Cao

JBD2: JBD2 replace jbd2_kmalloc with kmalloc

From: Mingming Cao [EMAIL PROTECTED]

This patch cleans up jbd_kmalloc and replaces it with kmalloc directly.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/jbd2/journal.c |   11 +--
 fs/jbd2/transaction.c |4 ++--
 include/linux/jbd2.h  |7 ---
 3 files changed, 3 insertions(+), 19 deletions(-)


Index: linux-2.6.23-rc9/fs/jbd2/journal.c
===
--- linux-2.6.23-rc9.orig/fs/jbd2/journal.c 2007-10-05 12:08:26.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd2/journal.c  2007-10-05 12:08:32.0 -0700
@@ -654,7 +654,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1619,15 +1619,6 @@ size_t journal_tag_bytes(journal_t *jour
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd2_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
  * Journal_head storage management
  */
 static struct kmem_cache *jbd2_journal_head_cache;
Index: linux-2.6.23-rc9/fs/jbd2/transaction.c
===
--- linux-2.6.23-rc9.orig/fs/jbd2/transaction.c 2007-10-05 12:08:26.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd2/transaction.c  2007-10-05 12:08:32.0 
-0700
@@ -96,8 +96,8 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = jbd_kmalloc(sizeof(*new_transaction),
-   GFP_NOFS);
+   new_transaction = kmalloc(sizeof(*new_transaction),
+   GFP_NOFS|__GFP_NOFAIL);
if (!new_transaction) {
ret = -ENOMEM;
goto out;
Index: linux-2.6.23-rc9/include/linux/jbd2.h
===
--- linux-2.6.23-rc9.orig/include/linux/jbd2.h  2007-10-05 12:08:26.0 
-0700
+++ linux-2.6.23-rc9/include/linux/jbd2.h   2007-10-05 12:08:32.0 
-0700
@@ -71,13 +71,6 @@ extern u8 jbd2_journal_enable_debug;
 #define jbd_debug(f, a...) /**/
 #endif
 
-extern void * __jbd2_kmalloc (const char *where, size_t size, gfp_t flags, int retry);
-#define jbd_kmalloc(size, flags) \
-   __jbd2_kmalloc(__FUNCTION__, (size), (flags), journal_oom_retry)
-#define jbd_rep_kmalloc(size, flags) \
-   __jbd2_kmalloc(__FUNCTION__, (size), (flags), 1)
-
-
 static inline void *jbd2_alloc(size_t size, gfp_t flags)
 {
return (void *)__get_free_pages(flags, get_order(size));



Re: [PATCH 2/2] i_version update - ext4 part

2007-10-05 Thread Mingming Cao

On Fri, 2007-10-05 at 17:28 +0200, Cordenner jean noel wrote:
 This patch update the i_version field of the inode and add a mount
 option to enable this feature. The other condition to enable this
 feature is that the inode size should be 256-bytes.
 
 Signed-off-by: Jean Noel Cordenner [EMAIL PROTECTED]
 --- 
  fs/ext4/inode.c |4 +++-
  fs/ext4/super.c |7 ++-
  include/linux/ext4_fs.h |1 +
  3 files changed, 10 insertions(+), 2 deletions(-)
 
 Index: linux-2.6.23-rc8-ext4-i_version/fs/ext4/inode.c
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/fs/ext4/inode.c  2007-10-03
 18:11:17.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/fs/ext4/inode.c   2007-10-05
 10:26:42.0 +0200
 @@ -3173,7 +3173,9 @@
  {
   int err = 0;
 
 - inode->i_version++;
 + if (test_opt(inode->i_sb, I_VERSION))
 + inode_inc_iversion(inode);
 +
   /* the do_update_inode consumes one bh->b_count */
   get_bh(iloc->bh);
 
 Index: linux-2.6.23-rc8-ext4-i_version/fs/ext4/super.c
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/fs/ext4/super.c  2007-10-03
 18:11:17.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/fs/ext4/super.c   2007-10-03
 18:17:44.0 +0200
 @@ -742,7 +742,7 @@
   Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
   Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
   Opt_grpquota, Opt_extents, Opt_noextents, Opt_delalloc,
 - Opt_mballoc, Opt_nomballoc, Opt_stripe,
 + Opt_mballoc, Opt_nomballoc, Opt_stripe, Opt_i_version,
  };
 
  static match_table_t tokens = {
 @@ -800,6 +800,7 @@
   {Opt_mballoc, "mballoc"},
   {Opt_nomballoc, "nomballoc"},
   {Opt_stripe, "stripe=%u"},
 + {Opt_i_version, "i_version"},
   {Opt_err, NULL},
   {Opt_resize, "resize"},
  };
 @@ -1161,6 +1162,10 @@
   return 0;
   sbi->s_stripe = option;
   break;
 + case Opt_i_version:
 + set_opt (sbi->s_mount_opt, I_VERSION);
 + sb->s_flags |= MS_I_VERSION;
 + break;

We need to make sure this flag is cleared if the fs is remounted without I_VERSION.
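
To make the remount concern concrete, here is a toy userspace sketch in which both bits are set when the option is present and cleared when it is absent. The struct, the bit values and the function name are hypothetical stand-ins for sbi->s_mount_opt and sb->s_flags; this is not the ext4 remount path.

```c
#include <assert.h>

#define TOY_MOUNT_I_VERSION 0x1UL	/* hypothetical fs-private option bit */
#define TOY_MS_I_VERSION    0x2UL	/* hypothetical superblock flag bit */

struct toy_sb {
	unsigned long mount_opt;	/* stands in for sbi->s_mount_opt */
	unsigned long s_flags;		/* stands in for sb->s_flags */
};

/* Apply the i_version option on (re)mount: set both bits, or clear both. */
static inline void toy_apply_i_version(struct toy_sb *sb, int want_i_version)
{
	if (want_i_version) {
		sb->mount_opt |= TOY_MOUNT_I_VERSION;
		sb->s_flags |= TOY_MS_I_VERSION;
	} else {
		sb->mount_opt &= ~TOY_MOUNT_I_VERSION;
		sb->s_flags &= ~TOY_MS_I_VERSION;
	}
}
```

The point of the else branch is that a set-only option handler leaves a stale flag behind after a remount that drops the option.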

   default:
   printk (KERN_ERR
   "EXT4-fs: Unrecognized mount option \"%s\" "
 Index: linux-2.6.23-rc8-ext4-i_version/include/linux/ext4_fs.h
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/include/linux/ext4_fs.h
 2007-10-03 18:11:17.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/include/linux/ext4_fs.h   2007-10-03
 18:11:54.0 +0200
 @@ -500,6 +500,7 @@
  #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT  0x100 /* Journal Async
 Commit */
  #define EXT4_MOUNT_DELALLOC  0x200 /* Delalloc support */
  #define EXT4_MOUNT_MBALLOC   0x400 /* Buddy allocation support */
 +#define EXT4_MOUNT_I_VERSION 0x800 /* i_version support */
  /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at
 once */
  #ifndef _LINUX_EXT2_FS_H
  #define clear_opt(o, opt) o &= ~EXT4_MOUNT_##opt
 


I don't see any place where this counter is stored to or loaded from disk,
so I assume this is not the full patch series?


Mingming


Re: [PATCH 1/2] i_version update - vfs part

2007-10-05 Thread Mingming Cao

On Fri, 2007-10-05 at 17:28 +0200, Cordenner jean noel wrote:
 Hi, 
 
Hi Jean Noel,

 This is an update of the i_version patch. 

Just to make sure, is this vfs patch and next ext4 patch together going
to replace the 4 inode-version related patches currently in
ext4-patch-queue (and git tree)? 

 The i_version field is a 64bit counter that is set on every inode 
 creation and that is incremented every time the inode data is modified 
 (similarly to the ctime time-stamp). 
 The aim is to fulfill a NFSv4 requirement for rfc3530: 
 5.5.  Mandatory Attributes - Definitions 
 Name      #   DataType   Access   Description
 ___________________________________________________________
 change    3   uint64     READ     A value created by the
 server that the client can use to determine if file
 data, directory contents or attributes of the object
 have been modified.  The server may return the object's
 time_metadata attribute for this attribute's value but
 only if the filesystem object can not be updated more
 frequently than the resolution of time_metadata.
 
 
 This first part deals with adding a flag in the super block and incrementing 
 the i_version in the vfs.
 
 Signed-off-by: Jean Noel Cordenner [EMAIL PROTECTED]
 --- 
  fs/inode.c |   23 +++
  fs/libfs.c |   12 
  include/linux/fs.h |3 +++
  3 files changed, 38 insertions(+)
 
 Index: linux-2.6.23-rc8-ext4-i_version/fs/inode.c
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/fs/inode.c   2007-09-26 
 14:41:41.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/fs/inode.c2007-10-05 
 16:14:41.0 +0200
 @@ -1216,6 +1216,24 @@
  EXPORT_SYMBOL(touch_atime);
 
  /**
 + *   inode_inc_iversion  -   increments i_version
 + *   @inode: inode that need to be updated
 + *
 + *   Every time the inode is modified, the i_version field
 + *   will be incremented.
 + *   The filesystem has to be mounted with i_version flag
 + *
 + */
 +
 +void inode_inc_iversion(struct inode *inode)
 +{
 + spin_lock(&inode->i_lock);
 + inode->i_version++;
 + spin_unlock(&inode->i_lock);
 +}

I suspect we don't need a lock here: the places that need to update
inode->i_version are already updating the inode, mostly under i_mutex.

You could remove the above function and update the counter directly at
the places that need it.

 +EXPORT_SYMBOL(inode_inc_iversion);
 +

Seems unnecessary.

 +/**
   *   file_update_time-   update mtime and ctime time
   *   @file: file accessed
   *
 @@ -1249,6 +1267,11 @@
   sync_it = 1;
   }
 
 + if (IS_I_VERSION(inode)) {
 + inode_inc_iversion(inode);
 + sync_it = 1;
 + }
 +
   if (sync_it)
   mark_inode_dirty_sync(inode);
  }
 Index: linux-2.6.23-rc8-ext4-i_version/fs/libfs.c
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/fs/libfs.c   2007-07-09 
 01:32:17.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/fs/libfs.c2007-09-26 
 14:51:08.0 +0200
 @@ -255,6 +255,10 @@
   struct inode *inode = old_dentry->d_inode;
 
   inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
 + if (IS_I_VERSION(inode)) {
 + inode_inc_iversion(inode);
 + inode_inc_iversion(dir);
 + }
   inc_nlink(inode);
   atomic_inc(&inode->i_count);
   dget(dentry);
 @@ -287,6 +291,10 @@
   struct inode *inode = dentry->d_inode;
 
   inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME;
 + if (IS_I_VERSION(inode)) {
 + inode_inc_iversion(inode);
 + inode_inc_iversion(dir);
 + }
   drop_nlink(inode);
   dput(dentry);
   return 0;
 @@ -323,6 +331,10 @@
 
   old_dir->i_ctime = old_dir->i_mtime = new_dir->i_ctime =
   new_dir->i_mtime = inode->i_ctime = CURRENT_TIME;
 + if (IS_I_VERSION(old_dir)) {
 + inode_inc_iversion(old_dir);
 + inode_inc_iversion(new_dir);
 + }
 
   return 0;
  }

Do we need to update the counter in libfs.c?

 Index: linux-2.6.23-rc8-ext4-i_version/include/linux/fs.h
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/include/linux/fs.h   2007-09-26 
 14:46:15.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/include/linux/fs.h2007-09-26 
 14:51:08.0 +0200
 @@ -123,6 +123,7 @@
  #define MS_SLAVE (1<<19) /* change to slave */
  #define MS_SHARED (1<<20) /* change to shared */
  #define MS_RELATIME (1<<21) /* Update atime relative to mtime/ctime. */
 +#define MS_I_VERSION (1<<22) /* Update inode i_version field */
  #define MS_ACTIVE (1<<30)
  #define MS_NOUSER (1<<31)
 
 @@ -172,6 +173,7 @@
   ((inode)->i_flags & (S_SYNC|S_DIRSYNC)))

[PATCH 1/2] ext3: Support large blocksize up to PAGESIZE

2007-10-01 Thread Mingming Cao

Support large blocksize up to PAGESIZE (max 64KB) for ext3

From: Takashi Sato <[EMAIL PROTECTED]>

This patch set supports large block sizes (>4k, <=64k) in ext3 by
just enlarging the block size limit. But it is NOT possible to have a 64kB
blocksize on ext3 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is to represent a 64k rec_len
with an otherwise impossible value such as rec_len = 0xffff.
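
The wraparound described above is easy to reproduce in userspace. The helper below is a made-up name for this illustration; it simply stores a length in a 16-bit field the way a plain truncating cast would.

```c
#include <assert.h>
#include <stdint.h>

/* Store a record length in a 16-bit field using a plain truncating cast. */
static inline uint16_t toy_store_rec_len(uint32_t len)
{
	return (uint16_t)len;
}
```

With this helper, toy_store_rec_len(65536) yields 0, which is indistinguishable from a corrupted zero-length entry, while every smaller length survives unchanged.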

The Patch-set consists of the following 2 patches.
  [1/2]  ext3: enlarge blocksize
 - Allow blocksize up to pagesize

  [2/2]  ext3: fix rec_len overflow
 - prevent rec_len from overflow with 64KB blocksize

Now a ppc64 box with 64k pages running this patch set can create an ext3
filesystem with a 64k block size and handle empty directory blocks.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/ext3/super.c |6 +-
 include/linux/ext3_fs.h |4 ++--
 2 files changed, 7 insertions(+), 3 deletions(-)


diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 9537316..b4bfd36 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1549,7 +1549,11 @@ static int ext3_fill_super (struct super_block *sb, void 
*data, int silent)
}
 
brelse (bh);
-   sb_set_blocksize(sb, blocksize);
+   if (!sb_set_blocksize(sb, blocksize)) {
+   printk(KERN_ERR "EXT3-fs: bad blocksize %d.\n",
+   blocksize);
+   goto out_fail;
+   }
logic_sb_block = (sb_block * EXT3_MIN_BLOCK_SIZE) / blocksize;
offset = (sb_block * EXT3_MIN_BLOCK_SIZE) % blocksize;
bh = sb_bread(sb, logic_sb_block);
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index ece49a8..7aa5556 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -76,8 +76,8 @@
  * Macro-instructions used to manage several block sizes
  */
 #define EXT3_MIN_BLOCK_SIZE 1024
-#define EXT3_MAX_BLOCK_SIZE 4096
-#define EXT3_MIN_BLOCK_LOG_SIZE 10
+#define EXT3_MAX_BLOCK_SIZE 65536
+#define EXT3_MIN_BLOCK_LOG_SIZE 10
 #ifdef __KERNEL__
 # define EXT3_BLOCK_SIZE(s) ((s)->s_blocksize)
 #else



[PATCH 2/2] ext3: Avoid rec_len overflow with 64KB block size

2007-10-01 Thread Mingming Cao

ext3: Avoid rec_len overflow with 64KB block size

From: Jan Kara <[EMAIL PROTECTED]>

With 64KB blocksize, a directory entry can have size 64KB, which does not fit
into the 16 bits we have for the entry length. So we store 0xffff instead and
convert the value when it is read from / written to disk. The patch also
converts some places to use ext3_next_entry() when we are changing them anyway.
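
A userspace sketch of the conversion pair, modeled on the ext2 helpers that appear in the companion ext2 patch. The toy_ names are made up, the sentinel is assumed to be 0xffff, and the little-endian byte swapping done by the real helpers is left out to keep the example portable.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_REC_LEN 0xffffU	/* assumed sentinel standing for 64KB */

/* Convert an in-memory length (at most 65536) to the 16-bit stored value. */
static inline uint16_t toy_rec_len_to_disk(uint32_t len)
{
	assert(len <= (1U << 16));
	if (len == (1U << 16))
		return TOY_MAX_REC_LEN;
	return (uint16_t)len;
}

/* Convert the 16-bit stored value back to an in-memory length. */
static inline uint32_t toy_rec_len_from_disk(uint16_t dlen)
{
	if (dlen == TOY_MAX_REC_LEN)
		return 1U << 16;
	return dlen;
}
```

The sentinel is safe to reuse because, as far as I know, valid record lengths are multiples of 4, so 0xffff never occurs as a real length.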

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/ext3/dir.c   |   10 +++--
 fs/ext3/namei.c |   90 ++-
 include/linux/ext3_fs.h |   20 ++
 3 files changed, 68 insertions(+), 52 deletions(-)


diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index c00723a..3c4c43a 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -69,7 +69,7 @@ int ext3_check_dir_entry (const char * function, struct inode 
* dir,
  unsigned long offset)
 {
const char * error_msg = NULL;
-   const int rlen = le16_to_cpu(de->rec_len);
+   const int rlen = ext3_rec_len_from_disk(de->rec_len);
 
if (rlen < EXT3_DIR_REC_LEN(1))
error_msg = "rec_len is smaller than minimal";
@@ -177,10 +177,10 @@ revalidate:
 * least that it is non-zero.  A
 * failure will be detected in the
 * dirent test below. */
-   if (le16_to_cpu(de->rec_len) <
+   if (ext3_rec_len_from_disk(de->rec_len) <
EXT3_DIR_REC_LEN(1))
break;
-   i += le16_to_cpu(de->rec_len);
+   i += ext3_rec_len_from_disk(de->rec_len);
}
offset = i;
filp->f_pos = (filp->f_pos & ~(sb->s_blocksize - 1))
@@ -201,7 +201,7 @@ revalidate:
ret = stored;
goto out;
}
-   offset += le16_to_cpu(de->rec_len);
+   offset += ext3_rec_len_from_disk(de->rec_len);
if (le32_to_cpu(de->inode)) {
/* We might block in the next section
 * if the data destination is
@@ -223,7 +223,7 @@ revalidate:
goto revalidate;
stored ++;
}
-   filp->f_pos += le16_to_cpu(de->rec_len);
+   filp->f_pos += ext3_rec_len_from_disk(de->rec_len);
}
offset = 0;
brelse (bh);
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index c1fa190..2c38eb6 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -144,6 +144,15 @@ struct dx_map_entry
u16 size;
 };
 
+/*
+ * p is at least 6 bytes before the end of page
+ */
+static inline struct ext3_dir_entry_2 *ext3_next_entry(struct ext3_dir_entry_2 *p)
+{
+   return (struct ext3_dir_entry_2 *)((char*)p +
+   ext3_rec_len_from_disk(p->rec_len));
+}
+
 #ifdef CONFIG_EXT3_INDEX
 static inline unsigned dx_get_block (struct dx_entry *entry);
 static void dx_set_block (struct dx_entry *entry, unsigned value);
@@ -281,7 +290,7 @@ static struct stats dx_show_leaf(struct dx_hash_info 
*hinfo, struct ext3_dir_ent
space += EXT3_DIR_REC_LEN(de->name_len);
names++;
}
-   de = (struct ext3_dir_entry_2 *) ((char *) de + 
le16_to_cpu(de->rec_len));
+   de = ext3_next_entry(de);
}
printk("(%i)\n", names);
return (struct stats) { names, space, 1 };
@@ -548,14 +557,6 @@ static int ext3_htree_next_block(struct inode *dir, __u32 
hash,
 
 
 /*
- * p is at least 6 bytes before the end of page
- */
-static inline struct ext3_dir_entry_2 *ext3_next_entry(struct ext3_dir_entry_2 *p)
-{
-   return (struct ext3_dir_entry_2 *)((char*)p + le16_to_cpu(p->rec_len));
-}
-
-/*
  * This function fills a red-black tree with information from a
  * directory block.  It returns the number directory entries loaded
  * into the tree.  If there is an error it is returned in err.
@@ -721,7 +722,7 @@ static int dx_make_map (struct ext3_dir_entry_2 *de, int 
size,
cond_resched();
}
/* XXX: do we need to check rec_len == 0 case? -Chris */
-   de = (struct ext3_dir_entry_2 *) ((char *) de + 
le16_to_cpu(de->rec_len));
+   de = ext3_next_entry(de);
}
return count;
 }
@@ -825,7 +826,7 @@ static inline int search_dirblock(struct buffer_head * bh,
return 1;

[PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-01 Thread Mingming Cao

ext2: Avoid rec_len overflow with 64KB block size

From: Jan Kara <[EMAIL PROTECTED]>

With 64KB blocksize, a directory entry can have size 64KB, which does not fit
into the 16 bits we have for the entry length. So we store 0xffff instead and
convert the value when it is read from / written to disk.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/ext2/dir.c   |   43 +++
 include/linux/ext2_fs.h |1 +
 2 files changed, 32 insertions(+), 12 deletions(-)


diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 2bf49d7..1329bdb 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -26,6 +26,24 @@
 
 typedef struct ext2_dir_entry_2 ext2_dirent;
 
+static inline unsigned ext2_rec_len_from_disk(__le16 dlen)
+{
+   unsigned len = le16_to_cpu(dlen);
+
+   if (len == EXT2_MAX_REC_LEN)
+   return 1 << 16;
+   return len;
+}
+
+static inline __le16 ext2_rec_len_to_disk(unsigned len)
+{
+   if (len == (1 << 16))
+   return cpu_to_le16(EXT2_MAX_REC_LEN);
+   else if (len > (1 << 16))
+   BUG();
+   return cpu_to_le16(len);
+}
+
 /*
  * ext2 uses block-sized chunks. Arguably, sector-sized ones would be
  * more robust, but we have what we have
@@ -95,7 +113,7 @@ static void ext2_check_page(struct page *page)
}
for (offs = 0; offs <= limit - EXT2_DIR_REC_LEN(1); offs += rec_len) {
p = (ext2_dirent *)(kaddr + offs);
-   rec_len = le16_to_cpu(p->rec_len);
+   rec_len = ext2_rec_len_from_disk(p->rec_len);
 
if (rec_len < EXT2_DIR_REC_LEN(1))
goto Eshort;
@@ -193,7 +211,8 @@ static inline int ext2_match (int len, const char * const 
name,
  */
 static inline ext2_dirent *ext2_next_entry(ext2_dirent *p)
 {
-   return (ext2_dirent *)((char*)p + le16_to_cpu(p->rec_len));
+   return (ext2_dirent *)((char*)p +
+   ext2_rec_len_from_disk(p->rec_len));
 }
 
 static inline unsigned 
@@ -305,7 +324,7 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t filldir)
return 0;
}
}
-   filp->f_pos += le16_to_cpu(de->rec_len);
+   filp->f_pos += ext2_rec_len_from_disk(de->rec_len);
}
ext2_put_page(page);
}
@@ -413,7 +432,7 @@ void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
struct page *page, struct inode *inode)
 {
unsigned from = (char *) de - (char *) page_address(page);
-   unsigned to = from + le16_to_cpu(de->rec_len);
+   unsigned to = from + ext2_rec_len_from_disk(de->rec_len);
int err;
 
lock_page(page);
@@ -469,7 +488,7 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
/* We hit i_size */
name_len = 0;
rec_len = chunk_size;
-   de->rec_len = cpu_to_le16(chunk_size);
+   de->rec_len = ext2_rec_len_to_disk(chunk_size);
de->inode = 0;
goto got_it;
}
@@ -483,7 +502,7 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
if (ext2_match (namelen, name, de))
goto out_unlock;
name_len = EXT2_DIR_REC_LEN(de->name_len);
-   rec_len = le16_to_cpu(de->rec_len);
+   rec_len = ext2_rec_len_from_disk(de->rec_len);
if (!de->inode && rec_len >= reclen)
goto got_it;
if (rec_len >= name_len + reclen)
@@ -504,8 +523,8 @@ got_it:
goto out_unlock;
if (de->inode) {
ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
-   de1->rec_len = cpu_to_le16(rec_len - name_len);
-   de->rec_len = cpu_to_le16(name_len);
+   de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
+   de->rec_len = ext2_rec_len_to_disk(name_len);
de = de1;
}
de->name_len = namelen;
@@ -536,7 +555,7 @@ int ext2_delete_entry (struct ext2_dir_entry_2 * dir, struct page * page )
struct inode *inode = mapping->host;
char *kaddr = page_address(page);
unsigned from = ((char*)dir - kaddr) & ~(ext2_chunk_size(inode)-1);
-   unsigned to = ((char*)dir - kaddr) + le16_to_cpu(dir->rec_len);
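+   unsigned to = ((char*)dir - kaddr) + ext2_rec_len_from_disk(dir->rec_len);
ext2_dirent * pde = NULL;
ext2_dirent * de = (ext2_dirent *) (kaddr + from);
int err;
@@ -557,7 +576,7 @@ int ext2_delete_entry (struct ext2_dir_entry_2 * dir, struct page * page )
err = mapping->a_ops->prepare_write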

[PATCH 1/2] ext2: Support large blocksize up to PAGESIZE

2007-10-01 Thread Mingming Cao

Support large blocksize up to PAGESIZE (max 64KB) for ext2

From: Takashi Sato <[EMAIL PROTECTED]>

This patch set supports large block sizes (>4k, <=64k) in ext2 by
just enlarging the block size limit. But it is NOT possible to have a 64kB
blocksize on ext2 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is to represent a 64k rec_len
with an otherwise impossible value such as rec_len = 0xffff.
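
To make the wrap-around concrete, here is a small sketch of what happens when a 64KB length is stored into a 16-bit field without any special handling. The helper names are invented for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Store a record length the naive way: only the low 16 bits survive. */
static uint16_t store_rec_len_naive(unsigned len)
{
	return (uint16_t)len;
}

/* Hypothetical check: would this block size be mangled by a naive store? */
static int rec_len_wraps(unsigned block_size)
{
	return store_rec_len_naive(block_size) != block_size;
}
```

An entry spanning a whole 64KB block therefore reads back as length 0, while every smaller power-of-two block size (up to 32KB) is stored intact. That is why only the 64KB case needs the sentinel.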

The Patch-set consists of the following 2 patches.
  [1/2]  ext2: enlarge blocksize
 - Allow blocksize up to pagesize

  [2/2]  ext2: fix rec_len overflow
 - prevent rec_len from overflow with 64KB blocksize

With this patch set, on a ppc64 box with 64k pages we can create an ext2
filesystem with a 64k block size and handle empty directory blocks.

Please consider including this in the -mm tree.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/ext2/super.c |3 ++-
 include/linux/ext2_fs.h |4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)


diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 639a32c..765c805 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -775,7 +775,8 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
brelse(bh);
 
if (!sb_set_blocksize(sb, blocksize)) {
-   printk(KERN_ERR "EXT2-fs: blocksize too small for device.\n");
+   printk(KERN_ERR "EXT2-fs: bad blocksize %d.\n",
+   blocksize);
goto failed_sbi;
}
 
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index 153d755..910a705 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -86,8 +86,8 @@ static inline struct ext2_sb_info *EXT2_SB(struct super_block *sb)
  * Macro-instructions used to manage several block sizes
  */
 #define EXT2_MIN_BLOCK_SIZE		1024
-#define	EXT2_MAX_BLOCK_SIZE		4096
-#define EXT2_MIN_BLOCK_LOG_SIZE		  10
+#define EXT2_MAX_BLOCK_SIZE		65536
+#define EXT2_MIN_BLOCK_LOG_SIZE		10
 #ifdef __KERNEL__
 # define EXT2_BLOCK_SIZE(s)		((s)->s_blocksize)
 #else
 #else


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] ext4: Avoid rec_len overflow with 64KB block size

2007-10-01 Thread Mingming Cao

ext4: Avoid rec_len overflow with 64KB block size

From: Jan Kara <[EMAIL PROTECTED]>

With a 64KB blocksize, a directory entry can be 64KB long, which does not fit
into the 16 bits we have for the entry length. So we store 0xffff instead and
convert the value when it is read from / written to disk. The patch also
converts some places to use ext4_next_entry() while we are changing them anyway.
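
For readers unfamiliar with how rec_len chains entries together, the following userspace sketch walks one directory block by repeatedly adding the decoded rec_len, which is the idea behind an ext4_next_entry()-style helper. The struct is a simplified stand-in, the 0xffff sentinel is assumed, and byte order is ignored; none of this is the kernel's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REC_LEN 0xffffu	/* assumed sentinel meaning "64KB" */

/* Simplified, hypothetical dirent header; not the kernel's struct. */
struct dirent_hdr {
	uint32_t inode;
	uint16_t rec_len;	/* host byte order here for simplicity */
	uint8_t  name_len;
	uint8_t  file_type;
};

static unsigned rec_len_from_disk(uint16_t dlen)
{
	return dlen == MAX_REC_LEN ? 1u << 16 : dlen;
}

/* Count in-use entries in one block; returns -1 on a corrupt rec_len. */
static int count_entries(const unsigned char *block, size_t block_size)
{
	size_t off = 0;
	int live = 0;

	while (off + sizeof(struct dirent_hdr) <= block_size) {
		struct dirent_hdr de;
		unsigned len;

		memcpy(&de, block + off, sizeof(de));	/* avoid unaligned reads */
		len = rec_len_from_disk(de.rec_len);
		if (len < sizeof(de) || len > block_size - off)
			return -1;	/* would loop forever or leave the block */
		if (de.inode != 0)
			live++;
		off += len;	/* step to the next entry */
	}
	return live;
}
```

Because the step always goes through rec_len_from_disk(), an empty 64KB block holding a single entry with rec_len 0xffff is walked in one step instead of being rejected as a zero-length record.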

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/ext4/dir.c   |   12 ---
 fs/ext4/namei.c |   76 ++-
 include/linux/ext4_fs.h |   20 
 3 files changed, 62 insertions(+), 46 deletions(-)


diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 3ab01c0..20b1e28 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -69,7 +69,7 @@ int ext4_check_dir_entry (const char * function, struct inode * dir,
  unsigned long offset)
 {
const char * error_msg = NULL;
-   const int rlen = le16_to_cpu(de->rec_len);
+   const int rlen = ext4_rec_len_from_disk(de->rec_len);
 
if (rlen < EXT4_DIR_REC_LEN(1))
error_msg = "rec_len is smaller than minimal";
@@ -176,10 +176,10 @@ revalidate:
 * least that it is non-zero.  A
 * failure will be detected in the
 * dirent test below. */
-   if (le16_to_cpu(de->rec_len) <
-   EXT4_DIR_REC_LEN(1))
+   if (ext4_rec_len_from_disk(de->rec_len)
+   < EXT4_DIR_REC_LEN(1))
break;
-   i += le16_to_cpu(de->rec_len);
+   i += ext4_rec_len_from_disk(de->rec_len);
}
offset = i;
filp->f_pos = (filp->f_pos & ~(sb->s_blocksize - 1))
@@ -201,7 +201,7 @@ revalidate:
ret = stored;
goto out;
}
-   offset += le16_to_cpu(de->rec_len);
+   offset += ext4_rec_len_from_disk(de->rec_len);
if (le32_to_cpu(de->inode)) {
/* We might block in the next section
 * if the data destination is
@@ -223,7 +223,7 @@ revalidate:
goto revalidate;
stored ++;
}
-   filp->f_pos += le16_to_cpu(de->rec_len);
+   filp->f_pos += ext4_rec_len_from_disk(de->rec_len);
}
offset = 0;
brelse (bh);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5fdb862..96e8a85 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -281,7 +281,7 @@ static struct stats dx_show_leaf(struct dx_hash_info *hinfo, struct ext4_dir_ent
space += EXT4_DIR_REC_LEN(de->name_len);
names++;
}
-   de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+   de = ext4_next_entry(de);
}
printk("(%i)\n", names);
return (struct stats) { names, space, 1 };
@@ -552,7 +552,8 @@ static int ext4_htree_next_block(struct inode *dir, __u32 hash,
  */
 static inline struct ext4_dir_entry_2 *ext4_next_entry(struct ext4_dir_entry_2 *p)
 {
-   return (struct ext4_dir_entry_2 *)((char*)p + le16_to_cpu(p->rec_len));
+   return (struct ext4_dir_entry_2 *)((char*)p +
+   ext4_rec_len_from_disk(p->rec_len));
 }
 
 /*
@@ -721,7 +722,7 @@ static int dx_make_map (struct ext4_dir_entry_2 *de, int size,
cond_resched();
}
/* XXX: do we need to check rec_len == 0 case? -Chris */
-   de = (struct ext4_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+   de = ext4_next_entry(de);
}
return count;
 }
@@ -823,7 +824,7 @@ static inline int search_dirblock(struct buffer_head * bh,
return 1;
}
/* prevent looping on a bad block */
-   de_len = le16_to_cpu(de->rec_len);
+   de_len = ext4_rec_len_from_disk(de->rec_len);
if (de_len <= 0)
return -1;
offset += de_len;
@@ -1136,7 +1137,7 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry *map, int count)
rec_len = EXT4_DIR_REC_LEN(de->name_len);
memcpy (to, de, rec_len);
((struct ext4_dir_entry_2 *) to)->rec_len =
-   cpu_to_le16(rec_len);
+   ext4_rec_len_to_disk(rec_len);
de->inode = 0;
map

[PATCH 1/2] ext4: Support large blocksize up to PAGESIZE

2007-10-01 Thread Mingming Cao

Support large blocksize up to PAGESIZE (max 64KB) for ext4. 

From: Takashi Sato <[EMAIL PROTECTED]>

This patch set supports large block sizes (>4k, <=64k) in ext4 by
just enlarging the block size limit. But it is NOT possible to have a 64kB
blocksize on ext4 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is to represent a 64k rec_len
with an otherwise impossible value such as rec_len = 0xffff.

The Patch-set consists of the following 2 patches.
  [1/2]  ext4: enlarge blocksize
 - Allow blocksize up to pagesize

  [2/2]  ext4: fix rec_len overflow
 - prevent rec_len from overflow with 64KB blocksize

With this patch set, on a ppc64 box with 64k pages we can create an ext4dev
filesystem with a 64k block size and handle empty directory blocks.
Please consider this patch for merging into 2.6.24-rc1.
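
The "blocksize up to pagesize" rule can be sketched in userspace as below. This is only an approximation of the constraints the description implies: the power-of-two requirement is an assumption, and in the patch the real decision is left to sb_set_blocksize(). The names blocksize_ok() and is_power_of_two() are made up for this example.

```c
#include <stdbool.h>

#define MIN_BLOCK_SIZE 1024u	/* EXT4_MIN_BLOCK_SIZE in the patch */
#define MAX_BLOCK_SIZE 65536u	/* EXT4_MAX_BLOCK_SIZE after the patch */

static bool is_power_of_two(unsigned v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical helper: page_size is passed in, not read from the system. */
static bool blocksize_ok(unsigned blocksize, unsigned page_size)
{
	if (!is_power_of_two(blocksize))
		return false;
	if (blocksize < MIN_BLOCK_SIZE || blocksize > MAX_BLOCK_SIZE)
		return false;
	return blocksize <= page_size;	/* block must fit in one page */
}
```

Under these assumptions a 64k block size is accepted on a 64k-page machine such as the ppc64 box mentioned above, but rejected on a machine with 4k pages.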

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/ext4/super.c |5 +
 include/linux/ext4_fs.h |4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)


diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 619db84..d8bb279 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1548,6 +1548,11 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
goto out_fail;
}
 
+   if (!sb_set_blocksize(sb, blocksize)) {
+   printk(KERN_ERR "EXT4-fs: bad blocksize %d.\n", blocksize);
+   goto out_fail;
+   }
+
/*
 * The ext4 superblock will not be buffer aligned for other than 1kB
 * block sizes.  We need to calculate the offset from buffer start.
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index f9881b6..d15a15e 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -77,8 +77,8 @@
  * Macro-instructions used to manage several block sizes
  */
 #define EXT4_MIN_BLOCK_SIZE		1024
-#define	EXT4_MAX_BLOCK_SIZE		4096
-#define EXT4_MIN_BLOCK_LOG_SIZE		  10
+#define	EXT4_MAX_BLOCK_SIZE		65536
+#define EXT4_MIN_BLOCK_LOG_SIZE		10
 #ifdef __KERNEL__
 # define EXT4_BLOCK_SIZE(s)		((s)->s_blocksize)
 #else
 #else



[PATCH 1/2] ext3: Support large blocksize up to PAGESIZE

2007-10-01 Thread Mingming Cao

Support large blocksize up to PAGESIZE (max 64KB) for ext3

From: Takashi Sato <[EMAIL PROTECTED]>

This patch set supports large block sizes (>4k, <=64k) in ext3 by
just enlarging the block size limit. But it is NOT possible to have a 64kB
blocksize on ext3 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is to represent a 64k rec_len
with an otherwise impossible value such as rec_len = 0xffff.

The patch set consists of the following 2 patches.
  [1/2]  ext3: enlarge blocksize
 - Allow blocksize up to pagesize

  [2/2]  ext3: fix rec_len overflow
 - prevent rec_len from overflow with 64KB blocksize

With this patch set, on a ppc64 box with 64k pages we can create an ext3
filesystem with a 64k block size and handle empty directory blocks.

Signed-off-by: Takashi Sato <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/ext3/super.c |6 +-
 include/linux/ext3_fs.h |4 ++--
 2 files changed, 7 insertions(+), 3 deletions(-)


diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 9537316..b4bfd36 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1549,7 +1549,11 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
}
 
brelse (bh);
-   sb_set_blocksize(sb, blocksize);
+	if (!sb_set_blocksize(sb, blocksize)) {
+		printk(KERN_ERR "EXT3-fs: bad blocksize %d.\n",
+			blocksize);
+		goto out_fail;
+	}
logic_sb_block = (sb_block * EXT3_MIN_BLOCK_SIZE) / blocksize;
offset = (sb_block * EXT3_MIN_BLOCK_SIZE) % blocksize;
bh = sb_bread(sb, logic_sb_block);
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index ece49a8..7aa5556 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -76,8 +76,8 @@
  * Macro-instructions used to manage several block sizes
  */
 #define EXT3_MIN_BLOCK_SIZE		1024
-#define	EXT3_MAX_BLOCK_SIZE		4096
-#define EXT3_MIN_BLOCK_LOG_SIZE		  10
+#define	EXT3_MAX_BLOCK_SIZE		65536
+#define EXT3_MIN_BLOCK_LOG_SIZE		10
 #ifdef __KERNEL__
 # define EXT3_BLOCK_SIZE(s)		((s)->s_blocksize)
 #else
 #else



[PATCH 2/2] ext3: Avoid rec_len overflow with 64KB block size

2007-10-01 Thread Mingming Cao

ext3: Avoid rec_len overflow with 64KB block size

From: Jan Kara <[EMAIL PROTECTED]>

With a 64KB blocksize, a directory entry can be 64KB long, which does not fit
into the 16 bits we have for the entry length. So we store 0xffff instead and
convert the value when it is read from / written to disk. The patch also
converts some places to use ext3_next_entry() while we are changing them anyway.

Signed-off-by: Jan Kara <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---

 fs/ext3/dir.c   |   10 +++--
 fs/ext3/namei.c |   90 ++-
 include/linux/ext3_fs.h |   20 ++
 3 files changed, 68 insertions(+), 52 deletions(-)


diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index c00723a..3c4c43a 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -69,7 +69,7 @@ int ext3_check_dir_entry (const char * function, struct inode * dir,
  unsigned long offset)
 {
const char * error_msg = NULL;
-   const int rlen = le16_to_cpu(de->rec_len);
+   const int rlen = ext3_rec_len_from_disk(de->rec_len);
 
if (rlen < EXT3_DIR_REC_LEN(1))
error_msg = "rec_len is smaller than minimal";
@@ -177,10 +177,10 @@ revalidate:
 * least that it is non-zero.  A
 * failure will be detected in the
 * dirent test below. */
-   if (le16_to_cpu(de->rec_len) <
+   if (ext3_rec_len_from_disk(de->rec_len) <
EXT3_DIR_REC_LEN(1))
break;
-   i += le16_to_cpu(de->rec_len);
+   i += ext3_rec_len_from_disk(de->rec_len);
}
offset = i;
filp->f_pos = (filp->f_pos & ~(sb->s_blocksize - 1))
@@ -201,7 +201,7 @@ revalidate:
ret = stored;
goto out;
}
-   offset += le16_to_cpu(de->rec_len);
+   offset += ext3_rec_len_from_disk(de->rec_len);
if (le32_to_cpu(de->inode)) {
/* We might block in the next section
 * if the data destination is
@@ -223,7 +223,7 @@ revalidate:
goto revalidate;
stored ++;
}
-   filp->f_pos += le16_to_cpu(de->rec_len);
+   filp->f_pos += ext3_rec_len_from_disk(de->rec_len);
}
offset = 0;
brelse (bh);
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index c1fa190..2c38eb6 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -144,6 +144,15 @@ struct dx_map_entry
u16 size;
 };
 
+/*
+ * p is at least 6 bytes before the end of page
+ */
+static inline struct ext3_dir_entry_2 *ext3_next_entry(struct ext3_dir_entry_2 *p)
+{
+	return (struct ext3_dir_entry_2 *)((char*)p +
+		ext3_rec_len_from_disk(p->rec_len));
+}
+
 #ifdef CONFIG_EXT3_INDEX
 static inline unsigned dx_get_block (struct dx_entry *entry);
 static void dx_set_block (struct dx_entry *entry, unsigned value);
@@ -281,7 +290,7 @@ static struct stats dx_show_leaf(struct dx_hash_info *hinfo, struct ext3_dir_ent
space += EXT3_DIR_REC_LEN(de->name_len);
names++;
}
-   de = (struct ext3_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+   de = ext3_next_entry(de);
}
printk("(%i)\n", names);
return (struct stats) { names, space, 1 };
@@ -548,14 +557,6 @@ static int ext3_htree_next_block(struct inode *dir, __u32 hash,
 
 
 /*
- * p is at least 6 bytes before the end of page
- */
-static inline struct ext3_dir_entry_2 *ext3_next_entry(struct ext3_dir_entry_2 *p)
-{
-	return (struct ext3_dir_entry_2 *)((char*)p + le16_to_cpu(p->rec_len));
-}
-
-/*
  * This function fills a red-black tree with information from a
  * directory block.  It returns the number directory entries loaded
  * into the tree.  If there is an error it is returned in err.
@@ -721,7 +722,7 @@ static int dx_make_map (struct ext3_dir_entry_2 *de, int size,
cond_resched();
}
/* XXX: do we need to check rec_len == 0 case? -Chris */
-   de = (struct ext3_dir_entry_2 *) ((char *) de + le16_to_cpu(de->rec_len));
+   de = ext3_next_entry(de);
}
return count;
 }
@@ -825,7 +826,7 @@ static inline int search_dirblock(struct buffer_head * bh,
return 1;
}
/* prevent looping on a bad block */
-   de_len = le16_to_cpu(de->rec_len);
+   de_len = ext3_rec_len_from_disk(de

Re: kernel Oops in ext3 code

2007-09-28 Thread Mingming Cao

> BUG: unable to handle kernel paging request at virtual address 104b
>  printing eip:
>  c0195bd3
>  *pde = 
>  Oops:  [#1]
>  PREEMPT SMP 
>  Modules linked in: vboxdrv binfmt_misc fuse coretemp hwmon gspca videodev 
> v4l2_common v4l1_compat iwl3945 mac80211 tifm_7xx1 tifm_core joydev irda 
> crc_ccitt 8250_pnp 8250 serial_core firewire_ohci firewire_core crc_itu_t
>  CPU:0
>  EIP:    0060:[<c0195bd3>]    Not tainted VLI
>  EFLAGS: 00010206   (2.6.23-rc6 #1)
>  EIP is at ext3_discard_reservation+0x18/0x4d
>  eax: dff23800   ebx: 1033   ecx: dfc15ec0   edx: 
>  esi: c0007c44   edi: 1033   ebp: dfc2bef4   esp: dfc2beac
>  ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
>  Process kswapd0 (pid: 261, ti=dfc2a000 task=dfcac570 task.ti=dfc2a000)
>  Stack: c0007ba4 c0007c44 1033 c019ec51 c0007c44 c0007d8c 002c 
> c0171b1b 
> 002c c0007c44 c0007c4c c0171da2 c050880c  0080 
> 0080 
> c0171fb8 0080 c0007e48 df9e3910 7404 c03f5634 0080 
> 00d0 
>  Call Trace:
>   [] ext3_clear_inode+0x5d/0x76
>   [] clear_inode+0x6b/0xb9
>   [] dispose_list+0x48/0xc9
>   [] shrink_icache_memory+0x195/0x1bd
>   [] shrink_slab+0xe2/0x159
>   [] kswapd+0x2d3/0x431
>   [] autoremove_wake_function+0x0/0x33
>   [] kswapd+0x0/0x431
>   [] kthread+0x38/0x5d
>   [] kthread+0x0/0x5d
>   [] kernel_thread_helper+0x7/0x10
>   ===
>  Code: 83 f8 01 19 c0 f7 d0 83 e0 08 89 42 0c 89 56 b4 5b 5e c3 57 56 89 c6 
> 53 8b 58 b4 8b 80 a4 00 00 00 85 db 8b 80 78 01 00 00 74 30 <83> 7b 18 00 74 
> 2a 8d b8 00 03 00 00 89 f8 e8 b8 ca 1a 00 83 7b 
>  EIP: [<c0195bd3>] ext3_discard_reservation+0x18/0x4d SS:ESP 0068:dfc2beac
> 
> 
On Fri, 2007-09-28 at 17:00 +0200, Norbert Preining wrote: 
> On Fr, 28 Sep 2007, Badari Pulavarty wrote:
> > objdump -DlS balloc.o 
> 
> Here it is
> 

Thanks

Looks like kernel oops at 1753(173b+0x18):

173b <ext3_discard_reservation>:
ext3_discard_reservation():
    173b:   57                      push   %edi
    173c:   56                      push   %esi
    173d:   89 c6                   mov    %eax,%esi
    173f:   53                      push   %ebx
    1740:   8b 58 b4                mov    -0x4c(%eax),%ebx
    1743:   8b 80 a4 00 00 00       mov    0xa4(%eax),%eax
    1749:   85 db                   test   %ebx,%ebx
    174b:   8b 80 78 01 00 00       mov    0x178(%eax),%eax
    1751:   74 30                   je     1783
    1753:   83 7b 18 00             cmpl   $0x0,0x18(%ebx)

==> Kernel oops here, ebx=1033, match bad
page location 104b(=1033+0x18)

    1757:   74 2a                   je     1783
    1759:   8d b8 00 03 00 00       lea    0x300(%eax),%edi
    175f:   89 f8                   mov    %edi,%eax
    1761:   e8 fc ff ff ff          call   1762
    1766:   83 7b 18 00             cmpl   $0x0,0x18(%ebx)
    176a:   74 0d                   je     1779
    176c:   8b 86 a4 00 00 00       mov    0xa4(%esi),%eax
    1772:   89 da                   mov    %ebx,%edx
    1774:   e8 dc eb ff ff          call   355
    1779:   89 f8                   mov    %edi,%eax
    177b:   5b                      pop    %ebx
    177c:   5e                      pop    %esi
    177d:   5f                      pop    %edi
    177e:   e9 fc ff ff ff          jmp    177f
    1783:   5b                      pop    %ebx
    1784:   5e                      pop    %esi
    1785:   5f                      pop    %edi
    1786:   c3                      ret


Trying to match this to the code:

void ext3_discard_reservation(struct inode *inode)
{
	struct ext3_inode_info *ei = EXT3_I(inode);
	struct ext3_block_alloc_info *block_i = ei->i_block_alloc_info;
	struct ext3_reserve_window_node *rsv;
	spinlock_t *rsv_lock = &EXT3_SB(inode->i_sb)->s_rsv_window_lock;

	if (!block_i)
		return;

	rsv = &block_i->rsv_window_node;
	if (!rsv_is_empty(&rsv->rsv_window)) {

=> kernel oops here

		spin_lock(rsv_lock);
		if (!rsv_is_empty(&rsv->rsv_window))
			rsv_window_remove(inode->i_sb, rsv);
		spin_unlock(rsv_lock);
	}
}


It seems ebx holds block_i (i_block_alloc_info), and that it points to a bad
memory location, which leads to the bad paging request when we try to read
the rsv_window structure.

But it confuses me why the rsv_window offset is 0x18 from
i_block_alloc_info; it should be 0x14 (20 bytes)... Are you running a
vanilla 2.6.23-rc6?

No clue for now how i_block_alloc_info ends up pointing to a bad location.
ext3_alloc_inode() clearly initializes this field to NULL, and
ext3_clear_inode() clearly sets this field to NULL. So during the
lifecycle of the inode, i_block_alloc_info should either point to a valid
address or be NULL.
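
For readers following the disassembly, the arithmetic behind the oops can be checked with a small userspace sketch. This is not kernel code: it only decodes the four instruction bytes marked in the oops "Code:" line (83 7b 18 00) and adds the displacement to the ebx value from the register dump. The helper names (decode_modrm, effective_address, explain_fault) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = { (byte >> 6) & 0x3u, (byte >> 3) & 0x7u, byte & 0x7u };
    return m;
}

/* Effective address of a [base + disp8] operand; disp8 is sign-extended. */
uint32_t effective_address(uint32_t base, uint8_t disp8)
{
    return base + (uint32_t)(int32_t)(int8_t)disp8;
}

/* Prints the decoding of the faulting instruction from the oops "Code:" line. */
void explain_fault(void)
{
    /* Bytes at the faulting EIP, marked <83> in the oops: 83 7b 18 00 */
    const uint8_t insn[4] = { 0x83, 0x7b, 0x18, 0x00 };
    const uint32_t ebx = 0x1033;  /* ebx as printed in the register dump */
    struct modrm m = decode_modrm(insn[1]);

    /*
     * Opcode 0x83 with reg == 7 is CMP r/m32, imm8; mod == 1 selects an
     * 8-bit displacement; rm == 3 selects %ebx as the base register.
     */
    assert(insn[0] == 0x83 && m.mod == 1 && m.reg == 7 && m.rm == 3);

    printf("cmpl $0x%x,0x%x(%%ebx) -> accesses 0x%x\n",
           (unsigned)insn[3], (unsigned)insn[2],
           (unsigned)effective_address(ebx, insn[2]));
}
```

Calling explain_fault() from a main() reproduces the reasoning above: the instruction reads memory at ebx + 0x18, and with ebx = 0x1033 that is 0x104b, the address reported in the "unable to handle kernel paging request" line.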

And the stack trace indicating the oops happened when pushing the inode
from the

Re: kernel Oops in ext3 code

2007-09-27 Thread Mingming Cao

Hi,
Could you please send the objdump of the ext3_discard_reservation
function? It doesn't match what I see here.

Thanks,
Mingming

On Thu, 2007-09-27 at 12:31 +0200, [EMAIL PROTECTED]
wrote:
> Hi all!
> 
> (Please Cc)
> 
> kernel 2.6.23-rc6
> Debian/sid
> 
> kernel ooops:
> 
> BUG: unable to handle kernel paging request at virtual address 104b
>  printing eip:
>  c0195bd3
>  *pde = 
>  Oops:  [#1]
>  PREEMPT SMP 
>  Modules linked in: vboxdrv binfmt_misc fuse coretemp hwmon gspca videodev 
> v4l2_common v4l1_compat iwl3945 mac80211 tifm_7xx1 tifm_core joydev irda 
> crc_ccitt 8250_pnp 8250 serial_core firewire_ohci firewire_core crc_itu_t
>  CPU:0
>  EIP:    0060:[<c0195bd3>]    Not tainted VLI
>  EFLAGS: 00010206   (2.6.23-rc6 #1)
>  EIP is at ext3_discard_reservation+0x18/0x4d
>  eax: dff23800   ebx: 1033   ecx: dfc15ec0   edx: 
>  esi: c0007c44   edi: 1033   ebp: dfc2bef4   esp: dfc2beac
>  ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
>  Process kswapd0 (pid: 261, ti=dfc2a000 task=dfcac570 task.ti=dfc2a000)
>  Stack: c0007ba4 c0007c44 1033 c019ec51 c0007c44 c0007d8c 002c 
> c0171b1b 
> 002c c0007c44 c0007c4c c0171da2 c050880c  0080 
> 0080 
> c0171fb8 0080 c0007e48 df9e3910 7404 c03f5634 0080 
> 00d0 
>  Call Trace:
>   [<c019ec51>] ext3_clear_inode+0x5d/0x76
>   [<c0171b1b>] clear_inode+0x6b/0xb9
>   [<c0171da2>] dispose_list+0x48/0xc9
>   [<c0171fb8>] shrink_icache_memory+0x195/0x1bd
>   [<c014f5ec>] shrink_slab+0xe2/0x159
>   [<c014f9a0>] kswapd+0x2d3/0x431
>   [<c0132520>] autoremove_wake_function+0x0/0x33
>   [<c014f6cd>] kswapd+0x0/0x431
>   [<c0132453>] kthread+0x38/0x5d
>   [<c013241b>] kthread+0x0/0x5d
>   [<c0104b73>] kernel_thread_helper+0x7/0x10
>   ===
>  Code: 83 f8 01 19 c0 f7 d0 83 e0 08 89 42 0c 89 56 b4 5b 5e c3 57 56 89 c6 
> 53 8b 58 b4 8b 80 a4 00 00 00 85 db 8b 80 78 01 00 00 74 30 <83> 7b 18 00 74 
> 2a 8d b8 00 03 00 00 89 f8 e8 b8 ca 1a 00 83 7b 
>  EIP: [<c0195bd3>] ext3_discard_reservation+0x18/0x4d SS:ESP 0068:dfc2beac
> 
> 
> Sysrq did work, so the oops was saved. Good.
> 
> Any ideas?
> 
> Best wishes
> 
> Norbert
> 
> ---
> Dr. Norbert Preining <[EMAIL PROTECTED]>Vienna University of 
> Technology
> Debian Developer <[EMAIL PROTECTED]> Debian TeX Group
> gpg DSA: 0x09C5B094  fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 
> B094
> ---
> As he came into the light they could see his black and
> gold uniform on which the buttons were so highly polished
> that they shone with an intensity that would have made an
> approaching motorist flash his lights in annoyance.
>  --- Douglas Adams, The Hitchhikers Guide to the Galaxy
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] JBD/ext34 cleanups: convert to kzalloc

2007-09-26 Thread Mingming Cao

On Wed, 2007-09-26 at 12:54 -0700, Andrew Morton wrote:
> On Fri, 21 Sep 2007 16:13:56 -0700
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > Convert kmalloc to kzalloc() and get rid of the memset().
> 
> I split this into separate ext3/jbd and ext4/jbd2 patches.  It's generally
> better to raise separate patches, please - the ext3 patches I'll merge
> directly but the ext4 patches should go through (and be against) the ext4
> devel tree.
> 
Sure. The patches (both ext3/jbd and ext4/jbd2) were already merged into
the ext4 devel tree; I will remove the ext3/jbd part from the ext4
devel tree.

> I fixed lots of rejects against the already-pending changes to these
> filesystems.
> 
> You forgot to remove the memsets in both start_this_handle()s.
> 
Thanks for catching this.

Mingming
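
The conversion discussed in this thread can be illustrated with a userspace analogue. This is only a sketch under stated assumptions: calloc() stands in for kzalloc(), malloc() plus memset() stands in for kmalloc() plus memset(), and struct demo_journal is a hypothetical stand-in, not the real journal_t. It shows why the explicit memset() becomes redundant once the allocator itself returns zeroed memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a structure such as journal_t. */
struct demo_journal {
    int j_flags;
    void *j_wbuf;
    unsigned long j_blocksize;
};

/* Old pattern: allocate, check for failure, then zero by hand. */
struct demo_journal *alloc_then_memset(void)
{
    struct demo_journal *j = malloc(sizeof(*j));

    if (!j)
        return NULL;
    memset(j, 0, sizeof(*j));
    return j;
}

/* New pattern: a single call returns zeroed memory (calloc here). */
struct demo_journal *alloc_zeroed(void)
{
    return calloc(1, sizeof(struct demo_journal));
}

/* Asserts that every field of a freshly allocated object is zero. */
void check_zeroed(const struct demo_journal *j)
{
    assert(j->j_flags == 0 && j->j_wbuf == NULL && j->j_blocksize == 0);
}

/* Returns 1 if both patterns yield byte-identical objects, else 0. */
int patterns_match(void)
{
    struct demo_journal *a = alloc_then_memset();
    struct demo_journal *b = alloc_zeroed();
    int same = a && b && memcmp(a, b, sizeof(*a)) == 0;

    free(a);
    free(b);
    return same;
}
```

The point raised in the review follows directly: any call site that was switched to the zeroing allocator but still carries its memset() does the same work twice.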


[PATCH] JBD2/ext4 naming cleanup

2007-09-21 Thread Mingming Cao

JBD2 naming cleanup

From: Mingming Cao <[EMAIL PROTECTED]>

Change macro names from JBD_XXX to JBD2_XXX in JBD2/Ext4

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c |2 +-
 fs/ext4/super.c   |2 +-
 fs/jbd2/commit.c  |2 +-
 fs/jbd2/journal.c |8 
 fs/jbd2/recovery.c|2 +-
 fs/jbd2/revoke.c  |4 ++--
 include/linux/ext4_jbd2.h |6 +++---
 include/linux/jbd2.h  |   30 +++---
 8 files changed, 28 insertions(+), 28 deletions(-)

Index: linux-2.6.23-rc6/fs/ext4/super.c
===
--- linux-2.6.23-rc6.orig/fs/ext4/super.c   2007-09-21 16:27:31.0 
-0700
+++ linux-2.6.23-rc6/fs/ext4/super.c2007-09-21 16:27:46.0 -0700
@@ -966,7 +966,7 @@ static int parse_options (char *options,
if (option < 0)
return 0;
if (option == 0)
-   option = JBD_DEFAULT_MAX_COMMIT_AGE;
+   option = JBD2_DEFAULT_MAX_COMMIT_AGE;
sbi->s_commit_interval = HZ * option;
break;
case Opt_data_journal:
Index: linux-2.6.23-rc6/include/linux/ext4_jbd2.h
===
--- linux-2.6.23-rc6.orig/include/linux/ext4_jbd2.h 2007-09-10 
19:50:29.0 -0700
+++ linux-2.6.23-rc6/include/linux/ext4_jbd2.h  2007-09-21 16:27:46.0 
-0700
@@ -12,8 +12,8 @@
  * Ext4-specific journaling extensions.
  */
 
-#ifndef _LINUX_EXT4_JBD_H
-#define _LINUX_EXT4_JBD_H
+#ifndef _LINUX_EXT4_JBD2_H
+#define _LINUX_EXT4_JBD2_H
 
 #include <linux/fs.h>
 #include <linux/jbd2.h>
@@ -228,4 +228,4 @@ static inline int ext4_should_writeback_
return 0;
 }
 
-#endif /* _LINUX_EXT4_JBD_H */
+#endif /* _LINUX_EXT4_JBD2_H */
Index: linux-2.6.23-rc6/include/linux/jbd2.h
===
--- linux-2.6.23-rc6.orig/include/linux/jbd2.h  2007-09-21 09:07:09.0 
-0700
+++ linux-2.6.23-rc6/include/linux/jbd2.h   2007-09-21 16:27:46.0 
-0700
@@ -13,8 +13,8 @@
  * filesystem journaling support.
  */
 
-#ifndef _LINUX_JBD_H
-#define _LINUX_JBD_H
+#ifndef _LINUX_JBD2_H
+#define _LINUX_JBD2_H
 
 /* Allow this file to be included directly into e2fsprogs */
 #ifndef __KERNEL__
@@ -37,26 +37,26 @@
 #define journal_oom_retry 1
 
 /*
- * Define JBD_PARANIOD_IOFAIL to cause a kernel BUG() if ext3 finds
+ * Define JBD2_PARANIOD_IOFAIL to cause a kernel BUG() if ext4 finds
  * certain classes of error which can occur due to failed IOs.  Under
- * normal use we want ext3 to continue after such errors, because
+ * normal use we want ext4 to continue after such errors, because
  * hardware _can_ fail, but for debugging purposes when running tests on
  * known-good hardware we may want to trap these errors.
  */
-#undef JBD_PARANOID_IOFAIL
+#undef JBD2_PARANOID_IOFAIL
 
 /*
  * The default maximum commit age, in seconds.
  */
-#define JBD_DEFAULT_MAX_COMMIT_AGE 5
+#define JBD2_DEFAULT_MAX_COMMIT_AGE 5
 
 #ifdef CONFIG_JBD2_DEBUG
 /*
- * Define JBD_EXPENSIVE_CHECKING to enable more expensive internal
+ * Define JBD2_EXPENSIVE_CHECKING to enable more expensive internal
  * consistency checks.  By default we don't do this unless
  * CONFIG_JBD2_DEBUG is on.
  */
-#define JBD_EXPENSIVE_CHECKING
+#define JBD2_EXPENSIVE_CHECKING
 extern u8 jbd2_journal_enable_debug;
 
 #define jbd_debug(n, f, a...)  \
@@ -163,8 +163,8 @@ typedef struct journal_block_tag_s
__be32  t_blocknr_high; /* most-significant high 32bits. */
 } journal_block_tag_t;
 
-#define JBD_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
-#define JBD_TAG_SIZE64 (sizeof(journal_block_tag_t))
+#define JBD2_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
+#define JBD2_TAG_SIZE64 (sizeof(journal_block_tag_t))
 
 /*
  * The revoke descriptor: used on disk to describe a series of blocks to
@@ -256,8 +256,8 @@ typedef struct journal_superblock_s
 #include <linux/fs.h>
 #include <linux/sched.h>
 
-#define JBD_ASSERTIONS
-#ifdef JBD_ASSERTIONS
+#define JBD2_ASSERTIONS
+#ifdef JBD2_ASSERTIONS
 #define J_ASSERT(assert)   \
 do {   \
if (!(assert)) {\
@@ -284,9 +284,9 @@ void buffer_assertion_failure(struct buf
 
 #else
 #define J_ASSERT(assert)   do { } while (0)
-#endif /* JBD_ASSERTIONS */
+#endif /* JBD2_ASSERTIONS */
 
-#if defined(JBD_PARANOID_IOFAIL)
+#if defined(JBD2_PARANOID_IOFAIL)
 #define J_EXPECT(expr, why...) J_ASSERT(expr)
 #define J_EXPECT_BH(bh, expr, why...)  J_ASSERT_BH(bh, expr)
 #define J_EXPECT_JH(jh, expr, why...)  J_ASSERT_JH(jh, expr)
@@ -1104,4 +11

[PATCH] JBD/ext34 cleanups: convert to kzalloc

2007-09-21 Thread Mingming Cao

Convert kmalloc to kzalloc() and get rid of the memset().

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/ext3/xattr.c   |3 +--
 fs/ext4/xattr.c   |3 +--
 fs/jbd/journal.c  |3 +--
 fs/jbd/transaction.c  |2 +-
 fs/jbd2/journal.c |3 +--
 fs/jbd2/transaction.c |2 +-
 6 files changed, 6 insertions(+), 10 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-21 09:08:02.0
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-21 09:10:37.0
-0700
@@ -653,10 +653,9 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kzalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
-   memset(journal, 0, sizeof(*journal));
 
init_waitqueue_head(&journal->j_wait_transaction_locked);
init_waitqueue_head(&journal->j_wait_logspace);
Index: linux-2.6.23-rc6/fs/jbd/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/transaction.c  2007-09-21
09:13:11.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/transaction.c   2007-09-21 09:13:24.0
-0700
@@ -96,7 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
+   new_transaction = kzalloc(sizeof(*new_transaction),
GFP_NOFS|__GFP_NOFAIL);
if (!new_transaction) {
ret = -ENOMEM;
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-21
09:10:53.0 -0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c  2007-09-21 09:11:13.0
-0700
@@ -654,10 +654,9 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kzalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
-   memset(journal, 0, sizeof(*journal));
 
init_waitqueue_head(&journal->j_wait_transaction_locked);
init_waitqueue_head(&journal->j_wait_logspace);
Index: linux-2.6.23-rc6/fs/jbd2/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/transaction.c 2007-09-21
09:12:46.0 -0700
+++ linux-2.6.23-rc6/fs/jbd2/transaction.c  2007-09-21 09:12:59.0
-0700
@@ -96,7 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
+   new_transaction = kzalloc(sizeof(*new_transaction),
GFP_NOFS|__GFP_NOFAIL);
if (!new_transaction) {
ret = -ENOMEM;
Index: linux-2.6.23-rc6/fs/ext3/xattr.c
===
--- linux-2.6.23-rc6.orig/fs/ext3/xattr.c   2007-09-21 10:22:24.0
-0700
+++ linux-2.6.23-rc6/fs/ext3/xattr.c2007-09-21 10:24:19.0 -0700
@@ -741,12 +741,11 @@ ext3_xattr_block_set(handle_t *handle, s
}
} else {
/* Allocate a buffer where we construct the new block. */
-   s->base = kmalloc(sb->s_blocksize, GFP_KERNEL);
+   s->base = kzalloc(sb->s_blocksize, GFP_KERNEL);
/* assert(header == s->base) */
error = -ENOMEM;
if (s->base == NULL)
goto cleanup;
-   memset(s->base, 0, sb->s_blocksize);
header(s->base)->h_magic = cpu_to_le32(EXT3_XATTR_MAGIC);
header(s->base)->h_blocks = cpu_to_le32(1);
header(s->base)->h_refcount = cpu_to_le32(1);
Index: linux-2.6.23-rc6/fs/ext4/xattr.c
===
--- linux-2.6.23-rc6.orig/fs/ext4/xattr.c   2007-09-21 10:20:21.0
-0700
+++ linux-2.6.23-rc6/fs/ext4/xattr.c2007-09-21 10:21:00.0 -0700
@@ -750,12 +750,11 @@ ext4_xattr_block_set(handle_t *handle, s
}
} else {
/* Allocate a buffer where we construct the new block. */
-   s->base = kmalloc(sb->s_blocksize, GFP_KERNEL);
+   s->base = kzalloc(sb->s_blocksize, GFP_KERNEL);
/* assert(header == s->base) */
error = -ENOMEM;
if (s->base == NULL)
goto cleanup;
-   memset(s->base, 0, sb->s_blocksize);

Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Mingming Cao

On Wed, 2007-09-19 at 13:48 -0600, Andreas Dilger wrote:
> On Sep 19, 2007  12:15 -0700, Mingming Cao wrote:
> > @@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
> >  
> >  alloc_transaction:
> > if (!journal->j_running_transaction) {
> > -   new_transaction = kmalloc(sizeof(*new_transaction),
> > -   GFP_NOFS|__GFP_NOFAIL);
> > +   new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);
> 
> This should probably be a __GFP_NOFAIL if we are trying to start a new
> handle in truncate, as there is no way to propagate an error to the caller.
> 

Thanks, updated version.

Here is the patch to clean up the __GFP_NOFAIL flag in jbd/jbd2; in most
cases it is not needed.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd/journal.c  |2 +-
 fs/jbd2/journal.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-19 11:47:58.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-19 14:23:45.0 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-19 11:48:14.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c  2007-09-19 14:23:45.0 -0700
@@ -654,7 +654,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));



Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Mingming Cao

On Wed, 2007-09-19 at 19:28 +, Dave Kleikamp wrote:
> On Wed, 2007-09-19 at 14:26 -0500, Dave Kleikamp wrote:
> > On Wed, 2007-09-19 at 12:15 -0700, Mingming Cao wrote:
> > 
> > > Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
> > > cases except one handles memory allocation failure so I get rid of those
> > > GFP_NOFAIL flags.
> > > 
> > > Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
> > > in jbd/jbd2? I will send a separate patch to cleanup that.
> > 
> > No.  GFP_NOFS avoids deadlock.  It prevents the allocation from making
> > recursive calls back into the file system that could end up blocking on
> > jbd code.
> 
> Oh, I see your patch now.  You mean use GFP_NOFS instead of
> GFP_KERNEL.  :-)  OK then.
> 

Oops, I did mean what you say here. :-)

> > Shaggy
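
The deadlock described in the quoted reply can be modelled in userspace. The
sketch below is only an illustration under stated assumptions: the MODEL_*
flag values are made up, model_fs_writepage() stands in for memory reclaim
calling back into the filesystem, and none of these names are kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel GFP bits; the values are made up. */
#define MODEL_GFP_WAIT   0x1u
#define MODEL_GFP_IO     0x2u
#define MODEL_GFP_FS     0x4u
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)

struct model_journal {
	bool lock_held;      /* models a journal lock held by the caller */
	bool would_deadlock; /* set if reclaim re-entered the fs under that lock */
};

/* Models reclaim writing a dirty page back through the filesystem. */
static void model_fs_writepage(struct model_journal *j)
{
	if (j->lock_held)
		j->would_deadlock = true; /* a real kernel would block here */
}

/* Models an allocation that has to reclaim memory before it can succeed. */
void *model_alloc(struct model_journal *j, size_t size, unsigned int flags)
{
	if (flags & MODEL_GFP_FS)
		model_fs_writepage(j); /* re-entry is allowed only with the FS bit */
	return malloc(size);
}

/* Returns true if allocating with `flags` under the journal lock deadlocks. */
bool model_alloc_under_lock_deadlocks(unsigned int flags)
{
	struct model_journal j = { .lock_held = true, .would_deadlock = false };
	void *p = model_alloc(&j, 64, flags);

	free(p);
	return j.would_deadlock;
}
```

In this model only the MODEL_GFP_KERNEL call re-enters the filesystem while
the lock is held; the MODEL_GFP_NOFS call skips that path.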


[PATCH] JBD: use GFP_NOFS in kmalloc

2007-09-19 Thread Mingming Cao

Convert the GFP_KERNEL flag used in JBD/JBD2 to GFP_NOFS, consistent
with the kmalloc flags used in the rest of the JBD/JBD2 layer.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>

---
 fs/jbd/journal.c  |6 +++---
 fs/jbd/revoke.c   |8 ++++----
 fs/jbd2/journal.c |6 +++---
 fs/jbd2/revoke.c  |8 ++++----
 4 files changed, 14 insertions(+), 14 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-19 11:51:10.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-19 11:51:57.0 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_NOFS);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -723,7 +723,7 @@ journal_t * journal_init_dev(struct bloc
journal->j_blocksize = blocksize;
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
-   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -777,7 +777,7 @@ journal_t * journal_init_inode (struct i
/* journal descriptor can store up to n blocks -bzzz */
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
-   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
Index: linux-2.6.23-rc6/fs/jbd/revoke.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/revoke.c   2007-09-19 11:51:30.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/revoke.c    2007-09-19 11:52:34.0 -0700
@@ -206,7 +206,7 @@ int journal_init_revoke(journal_t *journ
while((tmp >>= 1UL) != 0UL)
shift++;
 
-   journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
+   journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, GFP_NOFS);
if (!journal->j_revoke_table[0])
return -ENOMEM;
journal->j_revoke = journal->j_revoke_table[0];
@@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
journal->j_revoke->hash_shift = shift;
 
journal->j_revoke->hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
if (!journal->j_revoke->hash_table) {
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
journal->j_revoke = NULL;
@@ -229,7 +229,7 @@ int journal_init_revoke(journal_t *journ
for (tmp = 0; tmp < hash_size; tmp++)
INIT_LIST_HEAD(&journal->j_revoke->hash_table[tmp]);
 
-   journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
+   journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_NOFS);
if (!journal->j_revoke_table[1]) {
kfree(journal->j_revoke_table[0]->hash_table);
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
@@ -246,7 +246,7 @@ int journal_init_revoke(journal_t *journ
journal->j_revoke->hash_shift = shift;
 
journal->j_revoke->hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
if (!journal->j_revoke->hash_table) {
kfree(journal->j_revoke_table[0]->hash_table);
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-19 11:52:48.0 -0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c  2007-09-19 11:53:12.0 -0700
@@ -654,7 +654,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_NOFS);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -724,7 +724,7 @@ journal_t * jbd2_journal_init_dev(struct
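journal->j_blocksize = blocksize;
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
-   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL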

Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Mingming Cao

On Tue, 2007-09-18 at 19:19 -0700, Andrew Morton wrote:
> On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > JBD: Replace slab allocations with page cache allocations
> > 
> > JBD allocate memory for committed_data and frozen_data from slab. However
> > JBD should not pass slab pages down to the block layer. Use page allocator 
> > pages instead. This will also prepare JBD for the large blocksize patchset.
> > 
> > 
> > Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly
> 
> __GFP_NOFAIL should only be used when we have no way of recovering
> from failure.  The allocation in journal_init_common() (at least)
> _can_ recover and hence really shouldn't be using __GFP_NOFAIL.
> 
> (Actually, nothing in the kernel should be using __GFP_NOFAIL.  It is 
> there as a marker which says "we really shouldn't be doing this but
> we don't know how to fix it").
> 
> So sometime it'd be good if you could review all the __GFP_NOFAILs in
> there and see if we can remove some, thanks.

Here is the patch to clean up the __GFP_NOFAIL flag in jbd/jbd2. In all
cases except one the caller handles memory allocation failure, so I got rid
of those __GFP_NOFAIL flags.

Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
in jbd/jbd2? I will send a separate patch to cleanup that.

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd/journal.c  |2 +-
 fs/jbd/transaction.c  |3 +--
 fs/jbd2/journal.c |2 +-
 fs/jbd2/transaction.c |3 +--
 4 files changed, 4 insertions(+), 6 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-19 11:47:58.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-19 11:48:40.0 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
Index: linux-2.6.23-rc6/fs/jbd/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/transaction.c  2007-09-19 11:48:05.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/transaction.c   2007-09-19 11:49:10.0 -0700
@@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
-   GFP_NOFS|__GFP_NOFAIL);
+   new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);
if (!new_transaction) {
ret = -ENOMEM;
goto out;
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-19 11:48:14.0 -0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c  2007-09-19 11:49:46.0 -0700
@@ -654,7 +654,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
Index: linux-2.6.23-rc6/fs/jbd2/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/transaction.c 2007-09-19 11:48:08.0 -0700
+++ linux-2.6.23-rc6/fs/jbd2/transaction.c  2007-09-19 11:50:12.0 -0700
@@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal->j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
-   GFP_NOFS|__GFP_NOFAIL);
+   new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);
if (!new_transaction) {
ret = -ENOMEM;
goto out;
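
The point made in the quoted review, that an allocation whose caller can
recover should not be marked no-fail, can be shown with a small userspace
sketch. It mirrors the "if (!journal) goto fail;" shape of the hunks above,
but it is an assumption-laden illustration: struct sketch_journal, the
sketch_* functions and the 4096 block size are invented for the example, and
malloc() stands in for kmalloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for journal_t. */
struct sketch_journal {
	int blocksize;
};

/* Allocator hook so a caller can simulate allocation failure. */
typedef void *(*sketch_alloc_fn)(size_t);

/* An allocator that always fails, used to exercise the error path. */
void *sketch_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * The allocation may fail, and the function reports that by returning NULL.
 * No retry loop is needed because the caller can handle the error.
 */
struct sketch_journal *sketch_journal_init(sketch_alloc_fn alloc)
{
	struct sketch_journal *journal;

	journal = alloc(sizeof(*journal));
	if (!journal)
		goto fail;
	memset(journal, 0, sizeof(*journal));
	journal->blocksize = 4096; /* arbitrary value for the sketch */
	return journal;
fail:
	return NULL;
}
```

A caller would test the return value and fail the mount, or whatever
operation requested the journal, rather than wait indefinitely for memory.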





Re: [PATCH] JBD: use GFP_NOFS in kmalloc

2007-09-19 Thread Mingming Cao

On Wed, 2007-09-19 at 14:34 -0700, Andrew Morton wrote:
> On Wed, 19 Sep 2007 12:22:09 -0700
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > Convert the GFP_KERNEL flag used in JBD/JBD2 to GFP_NOFS, consistent
> > with the rest of kmalloc flag used in the JBD/JBD2 layer.
> > 
> > Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
> > 
> > ---
> >  fs/jbd/journal.c  |6 +++---
> >  fs/jbd/revoke.c   |8 ++++----
> >  fs/jbd2/journal.c |6 +++---
> >  fs/jbd2/revoke.c  |8 ++++----
> >  4 files changed, 14 insertions(+), 14 deletions(-)
> > 
> > Index: linux-2.6.23-rc6/fs/jbd/journal.c
> > ===
> > --- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-19 11:51:10.0 -0700
> > +++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-19 11:51:57.0 -0700
> > @@ -653,7 +653,7 @@ static journal_t * journal_init_common (
> > journal_t *journal;
> > int err;
> >  
> > -   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
> > +   journal = kmalloc(sizeof(*journal), GFP_NOFS);
> > if (!journal)
> > goto fail;
> > memset(journal, 0, sizeof(*journal));
> > @@ -723,7 +723,7 @@ journal_t * journal_init_dev(struct bloc
> > journal->j_blocksize = blocksize;
> > n = journal->j_blocksize / sizeof(journal_block_tag_t);
> > journal->j_wbufsize = n;
> > -   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
> > +   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
> > if (!journal->j_wbuf) {
> > printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
> > __FUNCTION__);
> > @@ -777,7 +777,7 @@ journal_t * journal_init_inode (struct i
> > /* journal descriptor can store up to n blocks -bzzz */
> > n = journal->j_blocksize / sizeof(journal_block_tag_t);
> > journal->j_wbufsize = n;
> > -   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
> > +   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
> > if (!journal->j_wbuf) {
> > printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
> > __FUNCTION__);
> > Index: linux-2.6.23-rc6/fs/jbd/revoke.c
> > ===
> > --- linux-2.6.23-rc6.orig/fs/jbd/revoke.c   2007-09-19 11:51:30.0 -0700
> > +++ linux-2.6.23-rc6/fs/jbd/revoke.c    2007-09-19 11:52:34.0 -0700
> > @@ -206,7 +206,7 @@ int journal_init_revoke(journal_t *journ
> > while((tmp >>= 1UL) != 0UL)
> > shift++;
> >  
> > -   journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
> > +   journal->j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, GFP_NOFS);
> > if (!journal->j_revoke_table[0])
> > return -ENOMEM;
> > journal->j_revoke = journal->j_revoke_table[0];
> > @@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
> > journal->j_revoke->hash_shift = shift;
> >  
> > journal->j_revoke->hash_table =
> > -   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
> > +   kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
> > if (!journal->j_revoke->hash_table) {
> > kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
> > journal->j_revoke = NULL;
> > @@ -229,7 +229,7 @@ int journal_init_revoke(journal_t *journ
> > for (tmp = 0; tmp < hash_size; tmp++)
> > INIT_LIST_HEAD(&journal->j_revoke->hash_table[tmp]);
> >  
> > -   journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
> > +   journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_NOFS);
> > if (!journal->j_revoke_table[1]) {
> > kfree(journal->j_revoke_table[0]->hash_table);
> > kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
> > @@ -246,7 +246,7 @@ int journal_init_revoke(journal_t *journ
> > journal->j_revoke->hash_shift = shift;
> >  
> > journal->j_revoke->hash_table =
> > -   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
> > +   kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
> > if (!journal->j_revoke->hash_table) {
> > kfree(journal->j_revoke_table[0]->hash_table);
> > kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
> 
> These were all OK using GFP_KERNEL.
> 
> GFP_NOFS should only be used when the caller is holding some fs locks which
> might cause a deadlock if that caller reentered the fs in ->writepage (and
> maybe put_inode and such).  That isn't the case in any of the above code,
> which is all mount time stuff (I think).
> 

You are right, they all occur at initialization time.

> ext3/4 should be using GFP_NOFS when the caller has a transaction open, has
> a page locked, is holding i_mutex, etc.
> 

Thanks for your feedback.

Mingming
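
The rule of thumb given in the reply above can be written down as a small
decision helper. This is a userspace sketch under stated assumptions, not
kernel code: struct sketch_ctx, enum sketch_gfp and sketch_pick_gfp() are
hypothetical names, and the three conditions are just the ones listed in the
reply, which ends in "etc." and so is not a complete list.

```c
#include <stdbool.h>

/* Hypothetical labels for the two allocation modes discussed in the thread. */
enum sketch_gfp {
	SKETCH_GFP_KERNEL, /* reclaim may call back into the filesystem */
	SKETCH_GFP_NOFS    /* reclaim must not re-enter the filesystem */
};

/* Caller state that the reply names as a reason to use GFP_NOFS. */
struct sketch_ctx {
	bool transaction_open;
	bool page_locked;
	bool i_mutex_held;
};

/*
 * Pick the allocation mode from the caller's context: any held fs state
 * means reclaim must stay out of the filesystem, otherwise (for example
 * at mount time) the less restrictive mode is fine.
 */
enum sketch_gfp sketch_pick_gfp(const struct sketch_ctx *c)
{
	if (c->transaction_open || c->page_locked || c->i_mutex_held)
		return SKETCH_GFP_NOFS;
	return SKETCH_GFP_KERNEL;
}
```

With every field false, which corresponds to the mount-time paths touched by
the patch, the helper selects SKETCH_GFP_KERNEL.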


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao

On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote:
> On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
> > On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> > > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > > > Here is the incremental small cleanup patch. 
> > > > 
> > > > Remove kamlloc usages in jbd/jbd2 and consistently use 
> > > > jbd_kmalloc/jbd2_malloc.
> > > 
> > > Shouldn't we kill jbd_kmalloc instead?
> > > 
> > 
> > It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
> > places to handle memory (de)allocation(<page size) via kmalloc/kfree, so
> > in the future if we need to change memory allocation in jbd(e.g. not
> > using kmalloc or using different flag), we don't need to touch every
> > place in the jbd code calling jbd_kmalloc.
> 
> I disagree.  Why would jbd need to globally change the way it allocates
> memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
> variety of structures.  Having to change one particular instance won't
> necessarily mean we want to change all of them.  Adding unnecessary
> wrappers only obfuscates the code making it harder to understand.  You
> wouldn't want every subsystem to have it's own *_kmalloc() that took
> different arguments.  Besides, there aren't that many calls to kmalloc
> and kfree in the jbd code, so there wouldn't be much pain in changing
> GFP flags or whatever, if it ever needed to be done.
> 
> Shaggy

Okay, points taken. Here is the updated patch to get rid of slab
management and jbd_kmalloc from jbd entirely. This patch is intended to
replace the patch in the mm tree. Andrew, could you pick up this one
instead?

Thanks,

Mingming


jbd/jbd2: JBD memory allocation cleanups

From: Christoph Lameter <[EMAIL PROTECTED]>

JBD: Replace slab allocations with page cache allocations

JBD allocates memory for committed_data and frozen_data from slab. However,
JBD should not pass slab pages down to the block layer. Use page allocator
pages instead. This will also prepare JBD for the large blocksize patchset.


Also, this patch removes jbd_kmalloc and replaces it with direct kmalloc calls.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>

---
 fs/jbd/commit.c   |6 +--
 fs/jbd/journal.c  |   99 ++
 fs/jbd/transaction.c  |   12 +++---
 fs/jbd2/commit.c  |6 +--
 fs/jbd2/journal.c |   99 ++
 fs/jbd2/transaction.c |   18 -
 include/linux/jbd.h   |   18 +
 include/linux/jbd2.h  |   21 +-
 8 files changed, 52 insertions(+), 227 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-18 17:19:01.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-18 17:51:21.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd_slab_free(tmp, bh_in->b_size);
+   jbd_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -654,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao

On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > Here is the incremental small cleanup patch. 
> > 
> > Remove kamlloc usages in jbd/jbd2 and consistently use 
> > jbd_kmalloc/jbd2_malloc.
> 
> Shouldn't we kill jbd_kmalloc instead?
> 

It seems useful to me to keep jbd_kmalloc/jbd_free. They are the central
places that handle page-size memory (de)allocation via kmalloc/kfree, so
if we later need to change how jbd allocates memory (e.g. stop using
kmalloc, or use a different flag), we don't need to touch every place in
the jbd code that calls jbd_kmalloc.

Regards,
Mingming
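
The argument above for keeping a central allocation point can be sketched
in userspace C. This is only an illustration under stated assumptions:
demo_jbd_alloc(), demo_jbd_free() and demo_live() are hypothetical names,
malloc()/free() stand in for kmalloc()/kfree(), and the counter exists only
to make the effect of the wrapper visible. It is not the jbd code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter, present only so the wrapper's effect can be observed. */
static size_t demo_live_allocs;

/* Hypothetical central allocation point, standing in for jbd_kmalloc(). */
void *demo_jbd_alloc(size_t size)
{
    void *p;

    assert(size > 0);
    p = malloc(size);   /* the single place where the allocator is chosen */
    if (p)
        demo_live_allocs++;
    return p;
}

/* Matching central free, standing in for the jbd free wrapper. */
void demo_jbd_free(void *p)
{
    if (p) {
        demo_live_allocs--;
        free(p);
    }
}

/* Number of buffers currently allocated through the wrapper. */
size_t demo_live(void)
{
    return demo_live_allocs;
}
```

Callers only ever see demo_jbd_alloc()/demo_jbd_free(), so a change of
allocator or flags touches one function. The cost is one more layer of
indirection for readers of the code.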

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao

On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote:
> On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
> > On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
> > > On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
> > > > Here is the incremental small cleanup patch. 
> > > > 
> > > > Remove kamlloc usages in jbd/jbd2 and consistently use 
> > > > jbd_kmalloc/jbd2_malloc.
> > > 
> > > Shouldn't we kill jbd_kmalloc instead?
> > > 
> > 
> > It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
> > places to handle memory (de)allocation(page size) via kmalloc/kfree, so
> > in the future if we need to change memory allocation in jbd(e.g. not
> > using kmalloc or using different flag), we don't need to touch every
> > place in the jbd code calling jbd_kmalloc.
> 
> I disagree.  Why would jbd need to globally change the way it allocates
> memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
> variety of structures.  Having to change one particular instance won't
> necessarily mean we want to change all of them.  Adding unnecessary
> wrappers only obfuscates the code making it harder to understand.  You
> wouldn't want every subsystem to have it's own *_kmalloc() that took
> different arguments.  Besides, there aren't that many calls to kmalloc
> and kfree in the jbd code, so there wouldn't be much pain in changing
> GFP flags or whatever, if it ever needed to be done.
> 
> Shaggy

Okay, points taken. Here is the updated patch, which gets rid of slab
management and jbd_kmalloc from jbd entirely. This patch is intended to
replace the one in the -mm tree. Andrew, could you pick up this one
instead?

Thanks,

Mingming


jbd/jbd2: JBD memory allocation cleanups

From: Christoph Lameter <[EMAIL PROTECTED]>

JBD: Replace slab allocations with page cache allocations

JBD allocates memory for committed_data and frozen_data from slab. However,
JBD should not pass slab pages down to the block layer. Use page allocator 
pages instead. This will also prepare JBD for the large blocksize patchset.


This patch also removes jbd_kmalloc and replaces it with direct kmalloc calls.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>

---
 fs/jbd/commit.c   |6 +--
 fs/jbd/journal.c  |   99 ++
 fs/jbd/transaction.c  |   12 +++---
 fs/jbd2/commit.c  |6 +--
 fs/jbd2/journal.c |   99 ++
 fs/jbd2/transaction.c |   18 -
 include/linux/jbd.h   |   18 +
 include/linux/jbd2.h  |   21 +-
 8 files changed, 52 insertions(+), 227 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-18 17:19:01.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-18 17:51:21.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd_slab_free(tmp, bh_in->b_size);
+   jbd_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -654,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could

Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Mingming Cao

On Mon, 2007-09-17 at 15:01 -0700, Badari Pulavarty wrote:
> On Mon, 2007-09-17 at 12:29 -0700, Mingming Cao wrote:
> > On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
> > > jbd/jbd2: Replace slab allocations with page cache allocations
> > > 
> > > From: Christoph Lameter <[EMAIL PROTECTED]>
> > > 
> > > JBD should not pass slab pages down to the block layer.
> > > Use page allocator pages instead. This will also prepare
> > > JBD for the large blocksize patchset.
> > > 
> > 
> > Currently memory allocation for committed_data(and frozen_buffer) for
> > bufferhead is done through jbd slab management, as Christoph Hellwig
> > pointed out that this is broken as jbd should not pass slab pages down
> > to IO layer. and suggested to use get_free_pages() directly.
> > 
> > The problem with this patch, as Andreas Dilger pointed today in ext4
> > interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
> > 1/3-1/2 page space. 
> > 
> > What was the originally intention to set up slabs for committed_data(and
> > frozen_buffer) in JBD? Why not using kmalloc?
> > 
> > Mingming
> 
> Looks good. Small suggestion is to get rid of all kmalloc() usages and
> consistently use jbd_kmalloc() or jbd2_kmalloc().
> 
> Thanks,
> Badari
> 

Here is the small incremental cleanup patch.

Remove direct kmalloc usages in jbd/jbd2 and consistently use jbd_kmalloc/jbd2_malloc.


Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd/journal.c  |8 +---
 fs/jbd/revoke.c   |   12 ++--
 fs/jbd2/journal.c |8 +---
 fs/jbd2/revoke.c  |   12 ++--
 4 files changed, 22 insertions(+), 18 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-17 14:32:16.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-17 14:33:59.0 -0700
@@ -723,7 +723,8 @@ journal_t * journal_init_dev(struct bloc
journal->j_blocksize = blocksize;
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
-   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal->j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -777,7 +778,8 @@ journal_t * journal_init_inode (struct i
/* journal descriptor can store up to n blocks -bzzz */
n = journal->j_blocksize / sizeof(journal_block_tag_t);
journal->j_wbufsize = n;
-   journal->j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal->j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
@@ -1157,7 +1159,7 @@ void journal_destroy(journal_t *journal)
iput(journal->j_inode);
if (journal->j_revoke)
journal_destroy_revoke(journal);
-   kfree(journal->j_wbuf);
+   jbd_kfree(journal->j_wbuf);
jbd_kfree(journal);
 }
 
Index: linux-2.6.23-rc6/fs/jbd/revoke.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/revoke.c   2007-09-17 14:32:22.0 -0700
+++ linux-2.6.23-rc6/fs/jbd/revoke.c2007-09-17 14:35:13.0 -0700
@@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
journal->j_revoke->hash_shift = shift;
 
journal->j_revoke->hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal->j_revoke->hash_table) {
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
journal->j_revoke = NULL;
@@ -231,7 +231,7 @@ int journal_init_revoke(journal_t *journ
 
journal->j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, GFP_KERNEL);
if (!journal->j_revoke_table[1]) {
-   kfree(journal->j_revoke_table[0]->hash_table);
+   jbd_kfree(journal->j_revoke_table[0]->hash_table);
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
return -ENOMEM;
}
@@ -246,9 +246,9 @@ int journal_init_revoke(journal_t *journ
journal->j_revoke->hash_shift = shift;
 
journal->j_revoke->hash_table =
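-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal->j_revoke->hash_table) {
-   kfree(journal->j_revoke_table[0]->hash_table);
+   jbd_kfree(journal->j_revoke_table[0]->hash_table);
kmem_cache_free(revoke_table_cache, journal->j_revoke_table[0]);
kmem_cache_free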

Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Mingming Cao

On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
> jbd/jbd2: Replace slab allocations with page cache allocations
> 
> From: Christoph Lameter <[EMAIL PROTECTED]>
> 
> JBD should not pass slab pages down to the block layer.
> Use page allocator pages instead. This will also prepare
> JBD for the large blocksize patchset.
> 

Currently, memory for committed_data (and frozen_buffer) for a bufferhead
is allocated through jbd slab management. Christoph Hellwig pointed out
that this is broken, since jbd should not pass slab pages down to the IO
layer, and suggested using get_free_pages() directly.

The problem with this patch, as Andreas Dilger pointed out today in the
ext4 interlock call, is that for ext2/3/4 with 1k or 2k block sizes,
get_free_pages() wastes 1/3-1/2 of the page space.

What was the original intention in setting up slabs for committed_data (and
frozen_buffer) in JBD? Why not use kmalloc?

Mingming
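
The space concern raised above can be checked with a little arithmetic.
The sketch below assumes 4 KiB pages and that each 1k or 2k buffer is
backed by one whole page from the page allocator; the DEMO_PAGE_SIZE
constant and the demo_* helpers are hypothetical names used only here.
Under those assumptions the unused share works out to 3/4 of the page for
1k blocks and 1/2 for 2k blocks, so the rough figure quoted above may
understate the 1k case.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u    /* assumption: 4 KiB pages */

/* Bytes left unused when one whole page backs a single block-sized buffer. */
size_t demo_wasted_bytes(size_t blocksize)
{
    assert(blocksize > 0 && blocksize <= DEMO_PAGE_SIZE);
    return DEMO_PAGE_SIZE - blocksize;
}

/* The same figure expressed as a percentage of the page. */
unsigned demo_wasted_percent(size_t blocksize)
{
    return (unsigned)(demo_wasted_bytes(blocksize) * 100 / DEMO_PAGE_SIZE);
}
```

With a 4k block size nothing is wasted, which is why the concern only
applies to the smaller block sizes.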

> Tested on 2.6.23-rc6 with fsx runs fine.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
> ---
>  fs/jbd/checkpoint.c   |2 
>  fs/jbd/commit.c   |6 +-
>  fs/jbd/journal.c  |  107 
> -
>  fs/jbd/transaction.c  |   10 ++--
>  fs/jbd2/checkpoint.c  |2 
>  fs/jbd2/commit.c  |6 +-
>  fs/jbd2/journal.c |  109 
> --
>  fs/jbd2/transaction.c |   18 
>  include/linux/jbd.h   |   23 +-
>  include/linux/jbd2.h  |   28 ++--
>  10 files changed, 83 insertions(+), 228 deletions(-)
> 
> Index: linux-2.6.23-rc5/fs/jbd/journal.c
> ===
> --- linux-2.6.23-rc5.orig/fs/jbd/journal.c  2007-09-13 13:37:57.0 -0700
> +++ linux-2.6.23-rc5/fs/jbd/journal.c 2007-09-13 13:45:39.0 -0700
> @@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
> 
>  static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
>  static void __journal_abort_soft (journal_t *journal, int errno);
> -static int journal_create_jbd_slab(size_t slab_size);
> 
>  /*
>   * Helper function used to manage commit timeouts
> @@ -334,10 +333,10 @@ repeat:
>   char *tmp;
> 
>   jbd_unlock_bh_state(bh_in);
> - tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
> + tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
>   jbd_lock_bh_state(bh_in);
>   if (jh_in->b_frozen_data) {
> - jbd_slab_free(tmp, bh_in->b_size);
> + jbd_free(tmp, bh_in->b_size);
>   goto repeat;
>   }
> 
> @@ -679,7 +678,7 @@ static journal_t * journal_init_common (
>   /* Set up a default-sized revoke table for the new mount. */
>   err = journal_init_revoke(journal, JOURNAL_REVOKE_DEFAULT_HASH);
>   if (err) {
> - kfree(journal);
> + jbd_kfree(journal);
>   goto fail;
>   }
>   return journal;
> @@ -728,7 +727,7 @@ journal_t * journal_init_dev(struct bloc
>   if (!journal->j_wbuf) {
>   printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
>   __FUNCTION__);
> - kfree(journal);
> + jbd_kfree(journal);
>   journal = NULL;
>   goto out;
>   }
> @@ -782,7 +781,7 @@ journal_t * journal_init_inode (struct i
>   if (!journal->j_wbuf) {
>   printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
>   __FUNCTION__);
> - kfree(journal);
> + jbd_kfree(journal);
>   return NULL;
>   }
> 
> @@ -791,7 +790,7 @@ journal_t * journal_init_inode (struct i
>   if (err) {
>   printk(KERN_ERR "%s: Cannnot locate journal superblock\n",
>  __FUNCTION__);
> - kfree(journal);
> + jbd_kfree(journal);
>   return NULL;
>   }
> 
> @@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
>   }
>   }
> 
> - /*
> -  * Create a slab for this blocksize
> -  */
> - err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
> - if (err)
> - return err;
> -
>   /* Let the recovery code check whether it needs to recover any
>* data from the journal. */
>   if (journal_recover(journal))
> @@ -1166,7 +1158,7 @@ void journal_destroy(journal_t *journal)
>   if (journal->j_revoke)
>   journal_destroy_revoke(journal);
>   kfree(journal->j_wbuf);

[PATCH] JBD slab cleanups

2007-09-14 Thread Mingming Cao

jbd/jbd2: Replace slab allocations with page cache allocations

From: Christoph Lameter <[EMAIL PROTECTED]>

JBD should not pass slab pages down to the block layer.
Use page allocator pages instead. This will also prepare
JBD for the large blocksize patchset.

Tested on 2.6.23-rc6 with fsx runs fine.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>
---
 fs/jbd/checkpoint.c   |2 
 fs/jbd/commit.c   |6 +-
 fs/jbd/journal.c  |  107 -
 fs/jbd/transaction.c  |   10 ++--
 fs/jbd2/checkpoint.c  |2 
 fs/jbd2/commit.c  |6 +-
 fs/jbd2/journal.c |  109 --
 fs/jbd2/transaction.c |   18 
 include/linux/jbd.h   |   23 +-
 include/linux/jbd2.h  |   28 ++--
 10 files changed, 83 insertions(+), 228 deletions(-)

Index: linux-2.6.23-rc5/fs/jbd/journal.c
===
--- linux-2.6.23-rc5.orig/fs/jbd/journal.c  2007-09-13 13:37:57.0 -0700
+++ linux-2.6.23-rc5/fs/jbd/journal.c   2007-09-13 13:45:39.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in->b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in->b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in->b_frozen_data) {
-   jbd_slab_free(tmp, bh_in->b_size);
+   jbd_free(tmp, bh_in->b_size);
goto repeat;
}
 
@@ -679,7 +678,7 @@ static journal_t * journal_init_common (
/* Set up a default-sized revoke table for the new mount. */
err = journal_init_revoke(journal, JOURNAL_REVOKE_DEFAULT_HASH);
if (err) {
-   kfree(journal);
+   jbd_kfree(journal);
goto fail;
}
return journal;
@@ -728,7 +727,7 @@ journal_t * journal_init_dev(struct bloc
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
-   kfree(journal);
+   jbd_kfree(journal);
journal = NULL;
goto out;
}
@@ -782,7 +781,7 @@ journal_t * journal_init_inode (struct i
if (!journal->j_wbuf) {
printk(KERN_ERR "%s: Cant allocate bhs for commit thread\n",
__FUNCTION__);
-   kfree(journal);
+   jbd_kfree(journal);
return NULL;
}
 
@@ -791,7 +790,7 @@ journal_t * journal_init_inode (struct i
if (err) {
printk(KERN_ERR "%s: Cannnot locate journal superblock\n",
   __FUNCTION__);
-   kfree(journal);
+   jbd_kfree(journal);
return NULL;
}
 
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb->s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1166,7 +1158,7 @@ void journal_destroy(journal_t *journal)
if (journal->j_revoke)
journal_destroy_revoke(journal);
kfree(journal->j_wbuf);
-   kfree(journal);
+   jbd_kfree(journal);
 }
 
 
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
- */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size >> 11)
-
-static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
-static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   "jbd_1k", "jbd_2k", "jbd_4k", NULL, "jbd_8k"
-};
-
-static void journal_destroy_jbd_slabs(void)
-{
-   int i;
-
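-   for (i = 0; i < JBD_MAX_SLABS; i++) {
-   if (jbd_slab[i])
-   kmem_cache_destroy(jbd_slab[i]);
-   jbd_slab[i] = NULL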

[RFC 2/2] JBD: blocks reservation fix for large block support

2007-08-31 Thread Mingming Cao

With large block support in the VM, the number of blocks per page can be
less than or equal to 1. This patch fixes the calculation of the number of
blocks to reserve in the journal for the case blocksize > pagesize.



Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>

Index: my2.6/fs/jbd/journal.c
===
--- my2.6.orig/fs/jbd/journal.c 2007-08-31 13:27:16.0 -0700
+++ my2.6/fs/jbd/journal.c  2007-08-31 13:28:18.0 -0700
@@ -1611,7 +1611,12 @@ void journal_ack_err(journal_t *journal)
 
 int journal_blocks_per_page(struct inode *inode)
 {
-   return 1 << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+   int bits = PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits;
+
+   if (bits > 0)
+   return 1 << bits;
+   else
+   return 1;
 }
 
 /*
Index: my2.6/fs/jbd2/journal.c
===
--- my2.6.orig/fs/jbd2/journal.c2007-08-31 13:32:21.0 -0700
+++ my2.6/fs/jbd2/journal.c 2007-08-31 13:32:30.0 -0700
@@ -1612,7 +1612,12 @@ void jbd2_journal_ack_err(journal_t *jou
 
 int jbd2_journal_blocks_per_page(struct inode *inode)
 {
-   return 1 << (PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits);
+   int bits = PAGE_CACHE_SHIFT - inode->i_sb->s_blocksize_bits;
+
+   if (bits > 0)
+   return 1 << bits;
+   else
+   return 1;
 }
 
 /*


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC 1/2] JBD: slab management support for large block(>8k)

2007-08-31 Thread Mingming Cao

From clameter:
Teach jbd/jbd2 slab management to support block sizes larger than 8k. Without
this, ext3 refuses to mount with a block size larger than 8k.
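
For reference, a small userspace sketch of how the old and new index macros map block sizes to slab slots. The names order_for(), old_slab_index() and new_slab_index() are hypothetical; order_for() only approximates the kernel's get_order() (smallest order such that the page size shifted left by that order covers the size), and page_shift is passed in explicitly rather than taken from PAGE_SHIFT.

```c
/*
 * Userspace approximation of get_order(): smallest order such that
 * (1 << page_shift) << order >= size. Hypothetical helper, not the
 * kernel implementation.
 */
static int order_for(unsigned long size, int page_shift)
{
	int order = 0;

	while ((1UL << (page_shift + order)) < size)
		order++;
	return order;
}

/* Old macro: size >> 11. Skips index 3, hence the NULL name slot. */
static int old_slab_index(unsigned long size)
{
	return (int)(size >> 11);
}

/* New macro: get_order((size) << (PAGE_SHIFT - 10)), i.e. log2(size / 1k). */
static int new_slab_index(unsigned long size, int page_shift)
{
	return order_for(size << (page_shift - 10), page_shift);
}
```

With the old macro, 1k, 2k, 4k and 8k map to indexes 0, 1, 2 and 4, so 16k would already land at index 8, outside a 5-entry table. With the new one, 1k through 64k map to 0 through 6, which matches the seven names in the table, and the result does not depend on the page size.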

Signed-off-by: Mingming Cao <[EMAIL PROTECTED]>

Index: my2.6/fs/jbd/journal.c
===
--- my2.6.orig/fs/jbd/journal.c 2007-08-30 18:40:02.0 -0700
+++ my2.6/fs/jbd/journal.c  2007-08-31 11:01:18.0 -0700
@@ -1627,16 +1627,17 @@ void * __jbd_kmalloc (const char *where,
  * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
  * and allocate frozen and commit buffers from these slabs.
  *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
+ * (Note: We only seem to need the definitions here for the SLAB_DEBUG
+ * case. In non debug operations SLUB will find the corresponding kmalloc
+ * cache and create an alias. --clameter)
  */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size >> 11)
+#define JBD_MAX_SLABS 7
+#define JBD_SLAB_INDEX(size)  get_order((size) << (PAGE_SHIFT - 10))
 
 static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
 static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   "jbd_1k", "jbd_2k", "jbd_4k", NULL, "jbd_8k"
+   "jbd_1k", "jbd_2k", "jbd_4k", "jbd_8k",
+   "jbd_16k", "jbd_32k", "jbd_64k"
 };
 
 static void journal_destroy_jbd_slabs(void)
Index: my2.6/fs/jbd2/journal.c
===
--- my2.6.orig/fs/jbd2/journal.c 2007-08-30 18:40:02.0 -0700
+++ my2.6/fs/jbd2/journal.c 2007-08-31 11:04:37.0 -0700
@@ -1639,16 +1639,18 @@ void * __jbd2_kmalloc (const char *where
  * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
  * and allocate frozen and commit buffers from these slabs.
  *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
+ * (Note: We only seem to need the definitions here for the SLAB_DEBUG
+ * case. In non debug operations SLUB will find the corresponding kmalloc
+ * cache and create an alias. --clameter)
  */
 
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size >> 11)
+#define JBD_MAX_SLABS 7
+#define JBD_SLAB_INDEX(size)  get_order((size) << (PAGE_SHIFT - 10))
 
 static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
 static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   "jbd2_1k", "jbd2_2k", "jbd2_4k", NULL, "jbd2_8k"
+   "jbd2_1k", "jbd2_2k", "jbd2_4k", "jbd2_8k",
+   "jbd2_16k", "jbd2_32k", "jbd2_64k"
 };
 
 static void jbd2_journal_destroy_jbd_slabs(void)



Re: [RFC 1/4] Large Blocksize support for Ext2/3/4

2007-08-31 Thread Mingming Cao

On Wed, 2007-08-29 at 17:47 -0700, Mingming Cao wrote:

> Just rebased to 2.6.23-rc4 and against the ext4 patch queue. Compile tested
> only.
> 
> Next steps:
> e2fsprogs changes are needed to be able to test this feature, as mkfs needs
> to be taught not to assume that rec_len always equals the blocksize.
> Will try it with Christoph Lameter's large block patch next.
> 

Two problems were found when testing large block sizes on ext3.  Patches to
follow.

The good news is that, with your changes plus all these extN changes, I am
able to run ext2/3/4 with a 64k block size, tested on x86 and ppc64 with 4k
page size. An fsx test runs fine for an hour on ext3 with a 16k block size on
x86 :-)

Mingming

