Re: [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.

2008-02-22 Thread Mingming Cao
On Fri, 2008-02-22 at 20:09 +0530, Aneesh Kumar K.V wrote:
 We would like to get notified when we are doing a write on mmap section.
 This is needed with respect to preallocated area. We split the preallocated
 area into initialzed extent and uninitialzed extent in the call back. This
 let us handle ENOSPC better. Otherwise we get ENOSPC in the writepage and
 that would result in data loss. The changes are also needed to handle ENOSPC
 when writing to an mmap section of files with holes.
 
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/file.c  |   19 ++-
  fs/ext4/inode.c |   60 
 +++
  include/linux/ext4_fs.h |1 +
  3 files changed, 79 insertions(+), 1 deletions(-)
 
 diff --git a/fs/ext4/file.c b/fs/ext4/file.c
 index 20507a2..77341c1 100644
 --- a/fs/ext4/file.c
 +++ b/fs/ext4/file.c
 @@ -123,6 +123,23 @@ force_commit:
   return ret;
  }
 
 +static struct vm_operations_struct ext4_file_vm_ops = {
 + .fault  = filemap_fault,
 + .page_mkwrite   = ext4_page_mkwrite,
 +};
 +
 +static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 +{
 + struct address_space *mapping = file-f_mapping;
 +
 + if (!mapping-a_ops-readpage)
 + return -ENOEXEC;
 + file_accessed(file);
 + vma-vm_ops = ext4_file_vm_ops;
 + vma-vm_flags |= VM_CAN_NONLINEAR;
 + return 0;
 +}
 +
  const struct file_operations ext4_file_operations = {
   .llseek = generic_file_llseek,
   .read   = do_sync_read,
 @@ -133,7 +150,7 @@ const struct file_operations ext4_file_operations = {
  #ifdef CONFIG_COMPAT
   .compat_ioctl   = ext4_compat_ioctl,
  #endif
 - .mmap   = generic_file_mmap,
 + .mmap   = ext4_file_mmap,
   .open   = generic_file_open,
   .release= ext4_release_file,
   .fsync  = ext4_sync_file,
 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
 index 5b5d63d..00af97d 100644
 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -3490,3 +3490,63 @@ int ext4_change_inode_journal_flag(struct inode 
 *inode, int val)
 
   return err;
  }
 +
 +int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
 +{
 + unsigned long end;
 + loff_t size;
 + handle_t *handle;
 + int ret = -EINVAL, needed_blocks;
 + struct file *file   = vma-vm_file;
 + struct inode *inode = file-f_path.dentry-d_inode;
 +
 + needed_blocks = ext4_writepage_trans_blocks(inode);
 + /* We need to take inode mutex to prevent parallel write */
 + mutex_lock(inode-i_mutex);
 + lock_page(page);
 + size = i_size_read(inode);
 + if ((page-mapping != inode-i_mapping) ||
 + (page_offset(page)  size)) {
 + /* page got truncated out from underneath us */
 + goto out_unlock;
 + }
 +
 + /* page is wholly or partially inside EOF */
 + if (((page-index + 1)  PAGE_CACHE_SHIFT)  size)
 + end = size  ~PAGE_CACHE_MASK;
 + else
 + end = PAGE_CACHE_SIZE;
 +
 + handle = ext4_journal_start(inode, needed_blocks);
 + if (IS_ERR(handle)) {
 + ret = PTR_ERR(handle);
 + goto out_unlock;
 + }
 + /* Will zero out the pages if buffer is marked new */
 + ret = block_prepare_write(page, 0, end, ext4_get_block);
 +
 + if (!ret  ext4_should_journal_data(inode)) {
 + ret = walk_page_buffers(handle, page_buffers(page),
 + 0, end, NULL, do_journal_get_write_access);
 + if (!ret)
 + ret = walk_page_buffers(handle, page_buffers(page),
 + 0, end, NULL, write_end_fn);
 + /*
 +  * we don't want to call block_commit_write in journalled mode
 +  */
 + ext4_journal_stop(handle);
 + goto out_unlock;
 + }
 + if (!ret  ext4_should_order_data(inode)) {
 + ret = walk_page_buffers(handle, page_buffers(page),
 + 0, end, NULL, ext4_journal_dirty_data);
 + }
 + if (!ret)
 + ret = block_commit_write(page, 0, end);
 +
Hmm, it seems wired to do commit_write when the page is about becoming
writable, but maybe that's the way it needs to?

Don't we need to update the i_size somewhere?

 + ext4_journal_stop(handle);
 +out_unlock:
 + unlock_page(page);
 + mutex_unlock(inode-i_mutex);
 + return ret;
 +}

It seems this combined the three journalling mode prepare_write() code
here:( 

Since prepare_write() and commit_write() is going to sunset, why not
simply calling mappings-a_ops-write_begin() and then write_end()? that
should take care of pretty much the journalling and the page operations,
no?

Mingming
 diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
 index 22810b1..8f5a563 100644
 --- a/include/linux/ext4_fs.h
 +++ 

Re: [PATCH] ext4: Use page_mkwrite vma_operations to get mmap write notification.

2008-02-22 Thread Mingming Cao
On Fri, 2008-02-22 at 23:53 +0530, Aneesh Kumar K.V wrote:
 On Fri, Feb 22, 2008 at 10:10:48AM -0800, Mingming Cao wrote:
  On Fri, 2008-02-22 at 20:09 +0530, Aneesh Kumar K.V wrote:
 
 .
 
   + ext4_journal_stop(handle);
   + goto out_unlock;
   + }
   + if (!ret  ext4_should_order_data(inode)) {
   + ret = walk_page_buffers(handle, page_buffers(page),
   + 0, end, NULL, ext4_journal_dirty_data);
   + }
   + if (!ret)
   + ret = block_commit_write(page, 0, end);
   +
  Hmm, it seems wired to do commit_write when the page is about becoming
  writable, but maybe that's the way it needs to?
  
  Don't we need to update the i_size somewhere?
 
ah, i_size didn't change with mapped IO.

 block_commit_write simply iterate over buffer_head of page and mark them
 dirty. That is why we don't want to call that for data=journalled mode.
 

Right, but it still seems odd to mark the buffer_heard dirty *before*
the write happens.

I am confused, if i_size is not changing, then what we are journalling
about?  Keep journal ordering? but we haven't write anything yet

Mingming
  
   + ext4_journal_stop(handle);
   +out_unlock:
   + unlock_page(page);
   + mutex_unlock(inode-i_mutex);
   + return ret;
   +}
  
  It seems this combined the three journalling mode prepare_write() code
  here:( 
  
  Since prepare_write() and commit_write() is going to sunset, why not
  simply calling mappings-a_ops-write_begin() and then write_end()? that
  should take care of pretty much the journalling and the page operations,
  no?
 
 write_begin and write_end works with the user space buffer. In this case
 we don't have one. Also what ext4_page_mkwrite does is mostly what
 write_begin/write_end does except the copy of user space buffer.
 
 
 -aneesh

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC]ext4: block reservations to handle delalloc ENOSPC error

2008-02-22 Thread Mingming Cao
In delayed allocation, blocks to be allocated need to be reserved
before user buffers being copied to memory, otherwise later at
page writeout time we could hit ENOSPC error.

In this patch, blocks(data and metadata) are reserved at da_write_begin()
time, the free blocks counter is updated by then, and the number of reserved
blocks is store in inode structure in memory.

Then later, when new_blocks() is being called, if the caller is already reserved
the blocks for allocation, the free blocks_counter is not being updated, 
otherwise,
in DIO case, which could call get_blocks() without the blocks being reserved,
proper accounting still need to be done.

At the writepage() time, the unused reserved blocks are returned back.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/ext4/balloc.c|   25 +++--
 fs/ext4/dir.c   |3 -
 fs/ext4/extents.c   |   78 +++---
 fs/ext4/inode.c |  102 +---
 fs/ext4/mballoc.c   |   11 +++-
 fs/ext4/migrate.c   |2 
 fs/ext4/super.c |3 +
 fs/ext4/xattr.c |2 
 include/linux/ext4_fs.h |   17 --
 include/linux/ext4_fs_extents.h |4 -
 include/linux/ext4_fs_i.h   |4 +
 11 files changed, 195 insertions(+), 56 deletions(-)

Index: linux-2.6.25-rc2/fs/ext4/super.c
===
--- linux-2.6.25-rc2.orig/fs/ext4/super.c   2008-02-21 17:30:59.0 
-0800
+++ linux-2.6.25-rc2/fs/ext4/super.c2008-02-21 19:47:00.0 -0800
@@ -573,6 +573,8 @@ static struct inode *ext4_alloc_inode(st
memset(ei-i_cached_extent, 0, sizeof(struct ext4_ext_cache));
INIT_LIST_HEAD(ei-i_prealloc_list);
spin_lock_init(ei-i_prealloc_lock);
+   ei-i_reserved_data_blocks = 0;
+   ei-i_reserved_meta_blocks = 0;
return ei-vfs_inode;
 }
 
@@ -2190,6 +2192,7 @@ static int ext4_fill_super (struct super
err = percpu_counter_init(sbi-s_dirs_counter,
ext4_count_dirs(sb));
}
+
if (err) {
printk(KERN_ERR EXT4-fs: insufficient memory\n);
goto failed_mount3;
Index: linux-2.6.25-rc2/fs/ext4/inode.c
===
--- linux-2.6.25-rc2.orig/fs/ext4/inode.c   2008-02-21 17:31:03.0 
-0800
+++ linux-2.6.25-rc2/fs/ext4/inode.c2008-02-21 19:52:34.0 -0800
@@ -36,6 +36,7 @@
 #include linux/mpage.h
 #include linux/uio.h
 #include linux/bio.h
+#include linux/ext4_fs_extents.h
 #include xattr.h
 #include acl.h
 
@@ -508,10 +509,11 @@ static int ext4_blks_to_allocate(Indirec
  * the indirect blocks(if needed) and the first direct block,
  * @blks:  on return it will store the total number of allocated
  * direct blocks
+ * @reserved: flag if blocks is pre-reserved by delayed allocation
  */
 static int ext4_alloc_blocks(handle_t *handle, struct inode *inode,
ext4_fsblk_t goal, int indirect_blks, int blks,
-   ext4_fsblk_t new_blocks[4], int *err)
+   ext4_fsblk_t new_blocks[4], int reserved, int *err)
 {
int target, i;
unsigned long count = 0;
@@ -532,7 +534,8 @@ static int ext4_alloc_blocks(handle_t *h
while (1) {
count = target;
/* allocating blocks for indirect blocks and direct blocks */
-   current_block = ext4_new_blocks(handle,inode,goal,count,err);
+   current_block = ext4_new_blocks(handle, inode, goal,
+   count, reserved, err);
if (*err)
goto failed_out;
 
@@ -567,6 +570,8 @@ failed_out:
  * @blks: number of allocated direct blocks
  * @offsets: offsets (in the blocks) to store the pointers to next.
  * @branch: place to store the chain in.
+ * @reserved: flagging if blocks already reserved for allocation
+ *avoid doing accounting on free blocks
  *
  * This function allocates blocks, zeroes out all but the last one,
  * links them into chain and (if we are synchronous) writes them to disk.
@@ -587,7 +592,7 @@ failed_out:
  */
 static int ext4_alloc_branch(handle_t *handle, struct inode *inode,
int indirect_blks, int *blks, ext4_fsblk_t goal,
-   ext4_lblk_t *offsets, Indirect *branch)
+   ext4_lblk_t *offsets, Indirect *branch, int reserved)
 {
int blocksize = inode-i_sb-s_blocksize;
int i, n = 0;
@@ -598,7 +603,7 @@ static int ext4_alloc_branch(handle_t *h
ext4_fsblk_t current_block;
 
num = ext4_alloc_blocks(handle, inode, goal, indirect_blks,
-   *blks, new_blocks, err);
+   *blks, new_blocks

[RFC] Delayed allocation updates

2008-02-22 Thread Mingming Cao
Hi, ext4 patch queue has keeping a set of delayed allocation for a
while. The know todo is add reservation and ordered mode support.

FYI, The base delayed allocation patches are located at
VFS changes:
http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=delalloc-vfs.patch;h=713004ca3f01ae9e7ace289c596ce25901b6c28a;hb=dc9fa932e786df0f2e18402e990926be667484ac

ext4 support:
http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=delalloc-ext4.patch;h=67f8e2802f52e52497a9a8360a1d979bc60c2142;hb=dc9fa932e786df0f2e18402e990926be667484ac


Here is the updates to the delayed allocation patches in the ext4 patch
queue.

- update delayed allocation with current VFS API write_begin/write_end
- general handling  preallocation
- add block reservation to handle ENOSPC error

Patches will follow. compiles, plan to add to the patch later. Comments?

I looked at ordered mode briefly but not entirely clear how that could
be done without a new callback function. Perhaps Alex has some thoughts?


Cheers,
Mingming


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]ext4: use write_begin/write_end in delalloc instead of prepare_write()

2008-02-22 Thread Mingming Cao
Since generally the current mainline VFS API changed to use write-begin()
and write_end(), updating ext4 delayed allocation to adopt this change.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/ext4/inode.c |   36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

Index: linux-2.6.25-rc2/fs/ext4/inode.c
===
--- linux-2.6.25-rc2.orig/fs/ext4/inode.c   2008-02-21 17:30:59.0 
-0800
+++ linux-2.6.25-rc2/fs/ext4/inode.c2008-02-22 12:17:16.0 -0800
@@ -1421,13 +1421,6 @@ static int ext4_da_get_block_prep(struct
return ret;
 }
 
-
-static int ext4_da_prepare_write(struct file *file, struct page *page,
-unsigned from, unsigned to)
-{
-   return block_prepare_write(page, from, to, ext4_da_get_block_prep);
-}
-
 static int ext4_da_get_block_write(struct inode *inode, sector_t iblock,
   struct buffer_head *bh_result, int create)
 {
@@ -1485,11 +1478,34 @@ out:
 }
 
 static int ext4_da_writepages(struct address_space *mapping,
- struct writeback_control *wbc)
+   struct writeback_control *wbc)
 {
return mpage_da_writepages(mapping, wbc, ext4_da_get_block_write);
 }
 
+static int ext4_da_write_begin(struct file *file, struct address_space 
*mapping,
+   loff_t pos, unsigned len, unsigned flags,
+   struct page **pagep, void **fsdata)
+{
+   int ret;
+   struct page *page;
+   pgoff_t index;
+   unsigned from, to;
+
+   index = pos  PAGE_CACHE_SHIFT;
+   from = pos  (PAGE_CACHE_SIZE - 1);
+   to = from + len;
+
+   page = __grab_cache_page(mapping, index);
+   if (!page)
+   return -ENOMEM;
+   *pagep = page;
+
+   ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+   ext4_da_get_block_prep);
+   return ret;
+}
+
 static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
 {
struct buffer_head *head, *bh;
@@ -1990,8 +2006,8 @@ static const struct address_space_operat
.writepage  = ext4_writeback_writepage,
.writepages = ext4_da_writepages,
.sync_page  = block_sync_page,
-   .prepare_write  = ext4_da_prepare_write,
-   .commit_write   = generic_commit_write,
+   .write_begin= ext4_da_write_begin,
+   .write_end  = generic_write_end,
.bmap   = ext4_bmap,
.invalidatepage = ext4_da_invalidatepage,
.releasepage= ext4_releasepage,


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ext4: general handling preallocated blocks in delayed allocation

2008-02-22 Thread Mingming Cao
With delayed allocation, get_block() is only doing block map at the
write_begin() time. If the blocks are prea-allocated, the result bh
is not mapped, but the blocks are actually being allocated. 
delalloc should not treat it as other regular unallocated area, thus
mark it as need block allocation later, and doing extra reservation
incorrectly.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/ext4/inode.c |   26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

Index: linux-2.6.25-rc2/fs/ext4/inode.c
===
--- linux-2.6.25-rc2.orig/fs/ext4/inode.c   2008-02-19 14:57:00.0 
-0800
+++ linux-2.6.25-rc2/fs/ext4/inode.c2008-02-19 15:04:56.0 -0800
@@ -1401,20 +1401,18 @@ static int ext4_da_get_block_prep(struct
 * XXX: when the filesystem has a lot of free blocks, we could
 * reserve even allocated blocks to save this lookup */
ret = ext4_get_blocks_wrap(NULL, inode, iblock, 1,  bh_result, 0, 0);
-   if (ret = 0) {
-   if (buffer_mapped(bh_result)) {
-   bh_result-b_size = (ret  inode-i_blkbits);
-   } else {
-   /* the block isn't allocated yet, let's reserve space */
-   /* XXX: call reservation here */
-   /*
-* XXX: __block_prepare_write() unmaps passed block,
-* is it OK?
-*/
-   map_bh(bh_result, inode-i_sb, 0);
-   set_buffer_new(bh_result);
-   set_buffer_delay(bh_result);
-   }
+   if (ret == 0) {
+   /* the block isn't allocated yet, let's reserve space */
+   /* XXX: call reservation here */
+   /*
+* XXX: __block_prepare_write() unmaps passed block,
+* is it OK?
+*/
+   map_bh(bh_result, inode-i_sb, 0);
+   set_buffer_new(bh_result);
+   set_buffer_delay(bh_result);
+   } else if ((ret  0)  (buffer_mapped(bh_result))) {
+   bh_result-b_size = (ret  inode-i_blkbits);
ret = 0;
}
 


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -v2] ext4: Use page_mkwrite vma_operations to get mmap write notification.

2008-02-21 Thread Mingming Cao
On Tue, 2008-02-19 at 09:13 +0530, Aneesh Kumar K.V wrote:
 We would like to get notified when we are doing a write on mmap section.
 This is needed with respect to preallocated area. We split the preallocated
 area into initialzed extent and uninitialzed extent in the call back. This
 let us handle ENOSPC better. Otherwise we get ENOSPC in the writepage and
 that would result in data loss. The changes are also needed to handle ENOSPC
 when writing to an mmap section of files with holes.
 

Hi Aneesh,

I have a concern, it seems we missed journalling the allocation activity
for the mmaped write. See comments below...

Another thing, perhaps similar patch should be ported to ext2/3, as this
also addressed the mmaped write ENOSPC error without
preallocation/deleyed allocation.

 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/file.c  |   19 ++-
  fs/ext4/inode.c |6 ++
  include/linux/ext4_fs.h |1 +
  3 files changed, 25 insertions(+), 1 deletions(-)
 
 diff --git a/fs/ext4/file.c b/fs/ext4/file.c
 index 20507a2..77341c1 100644
 --- a/fs/ext4/file.c
 +++ b/fs/ext4/file.c
 @@ -123,6 +123,23 @@ force_commit:
   return ret;
  }
 
 +static struct vm_operations_struct ext4_file_vm_ops = {
 + .fault  = filemap_fault,
 + .page_mkwrite   = ext4_page_mkwrite,
 +};
 +
 +static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 +{
 + struct address_space *mapping = file-f_mapping;
 +
 + if (!mapping-a_ops-readpage)
 + return -ENOEXEC;
 + file_accessed(file);
 + vma-vm_ops = ext4_file_vm_ops;
 + vma-vm_flags |= VM_CAN_NONLINEAR;
 + return 0;
 +}
 +
  const struct file_operations ext4_file_operations = {
   .llseek = generic_file_llseek,
   .read   = do_sync_read,
 @@ -133,7 +150,7 @@ const struct file_operations ext4_file_operations = {
  #ifdef CONFIG_COMPAT
   .compat_ioctl   = ext4_compat_ioctl,
  #endif
 - .mmap   = generic_file_mmap,
 + .mmap   = ext4_file_mmap,
   .open   = generic_file_open,
   .release= ext4_release_file,
   .fsync  = ext4_sync_file,
 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
 index 34f3eb6..81faa67 100644
 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -3466,3 +3466,9 @@ int ext4_change_inode_journal_flag(struct inode *inode, 
 int val)
 
   return err;
  }
 +
 +int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
 +{
 + return block_page_mkwrite(vma, page, ext4_get_block);
 +}
 +

I don't see block allocation being journalled here. block_page_mkwrite()
eventually calling block_prepare_write() which invokes ext4_get_block()
without starting a new journal handle.

Perhaps call ext4 write_begin() and write_end() inode operations, that
would taking care of different write_begin and write_end for three
different journalling mode.

Mingming

 diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
 index 22810b1..8f5a563 100644
 --- a/include/linux/ext4_fs.h
 +++ b/include/linux/ext4_fs.h
 @@ -1059,6 +1059,7 @@ extern void ext4_set_aops(struct inode *inode);
  extern int ext4_writepage_trans_blocks(struct inode *);
  extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
   struct address_space *mapping, loff_t from);
 +extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
  /* ioctl.c */
  extern long ext4_ioctl(struct file *, unsigned int, unsigned long);

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH] ext4: Convert uninitialized extent to initialized extent in case of file system full

2008-02-21 Thread Mingming Cao
Hi Aneesh,

It's a good start, a few comments below..

On Fri, 2008-02-22 at 00:47 +0530, Aneesh Kumar K.V wrote:
 From 6a73edd4dbb32344e6a83ebdc07edd0e96d376bd Mon Sep 17 00:00:00 2001
 From: Aneesh Kumar K.V [EMAIL PROTECTED]
 Date: Thu, 21 Feb 2008 23:57:38 +0530
 Subject: [PATCH] ext4: Convert uninitialized extent to initialized extent in 
 case of file system full
 
 A write to prealloc area cause the split of unititalized extent into a 
 initialized
 and uninitialized extent. If we don't have space to add new extent 
 information instead
 of returning error convert the existing uninitialized extent to initialized 
 one. We
 need to zero out the blocks corresponding to the extent to prevent wrong data 
 reaching
 userspace.
 

 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/extents.c |  135 
 -
  1 files changed, 133 insertions(+), 2 deletions(-)
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index b179b03..d37c14e 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -2137,6 +2137,103 @@ void ext4_ext_release(struct super_block *sb)
  #endif
  }
 
 +static int ext4_ext_zero_out(handle_t *handle, struct inode *inode,
 + ext4_lblk_t iblock, struct ext4_extent *ex)
 +{
 + ext4_lblk_t ee_block;
 + unsigned int ee_len, blkcount, blocksize;
 + loff_t pos;
 + pgoff_t index, skip_index;
 + unsigned long offset;
 + struct page *page;
 + struct address_space *mapping = inode-i_mapping;
 + struct buffer_head *head, *bh;
 + int err = 0;
 +
 + ee_block = le32_to_cpu(ex-ee_block);
 + ee_len = blkcount = ext4_ext_get_actual_len(ex);
 + blocksize = inode-i_sb-s_blocksize;
 +
 + /*
 +  * find the skip index. We can't call __grab_cache_page for this
 +  * because we are in the writeout of this page and we already have
 +  * taken the lock on this page
 +  */
 + pos = iblock   inode-i_blkbits;
 + skip_index = pos  PAGE_CACHE_SHIFT;
 +

We should not need to look up the page cache to do the zero out. The
approach I had thought is just zero it out on disk.

 + while (blkcount) {
 + pos = (ee_block  + ee_len - blkcount)  inode-i_blkbits;
 + index = pos  PAGE_CACHE_SHIFT;
 + offset = (pos  (PAGE_CACHE_SIZE - 1));
 + if (index == skip_index) {
 + /* Page will already be locked in the writepage */
 + read_lock_irq(mapping-tree_lock);
 + page = radix_tree_lookup(mapping-page_tree, index);
 + read_unlock_irq(mapping-tree_lock);
 + if (page)
 + page_cache_get(page);
 + else
 + return -ENOMEM;
 + } else {
 + page = __grab_cache_page(mapping, index);
 + if (!page)
 + return -ENOMEM;
 + }
 +

I the page is already locked before calling get_block() via writepage(),
isn't it? and the journal transaction already started...


 + if (!page_has_buffers(page))
 + create_empty_buffers(page, blocksize, 0);
 +
 + head = page_buffers(page);
 + /* Look for the buffer_head which map the block */
 + bh = head;
 + while (offset  0) {
 + bh = bh-b_this_page;
 + offset -= blocksize;
 + }
 + offset = (pos  (PAGE_CACHE_SIZE - 1));
 +
 + /* Now write all the buffer_heads in the page */
 + do {
 + set_buffer_uptodate(bh);
 + if (ext4_should_journal_data(inode)) {
 + err = ext4_journal_get_write_access(handle, bh);
 + /* do we have that many credits ??*/
 + if (err)
 + goto err_out;
 + }
 + zero_user(page, offset, blocksize);

Ah oh, you are trying to zero out the pages in the page cache, that's
seems wrong to me. By the time get_block() is called from writepages(),
the pages should have meaningful content that needs to flush to disk,
zero the pages out will lost the data.

 + offset += blocksize;
 + if (ext4_should_journal_data(inode)) {
 + err = ext4_journal_dirty_metadata(handle, bh);
 + if (err)
 + goto err_out;
 + } else {
 + if (ext4_should_order_data(inode)) {
 + err = ext4_journal_dirty_data(handle,
 + bh);
 + if (err)
 + goto 

[PATCH] ext4: remove extra defination of ext4_new_blocks_old()

2008-02-21 Thread Mingming Cao
ext4_new_blocks_old() is already externed in ext4_fs.h, remove the extra
define from mballoc.c. Trivial cleanup, but this cause extra maintain work
if later we need to modify this function.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/ext4/mballoc.c |2 --
 1 file changed, 2 deletions(-)

Index: linux-2.6.25-rc2/fs/ext4/mballoc.c
===
--- linux-2.6.25-rc2.orig/fs/ext4/mballoc.c 2008-02-21 17:23:26.0 
-0800
+++ linux-2.6.25-rc2/fs/ext4/mballoc.c  2008-02-21 17:23:54.0 -0800
@@ -576,8 +576,6 @@ static void ext4_mb_store_history(struct
 
 static struct proc_dir_entry *proc_root_ext4;
 struct buffer_head *read_block_bitmap(struct super_block *, ext4_group_t);
-ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode,
-   ext4_fsblk_t goal, unsigned long *count, int *errp);
 
 static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
ext4_group_t group);


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: ext4_find_next_zero_bit needs an aligned address on some arch

2008-02-20 Thread Mingming Cao
On Wed, 2008-02-20 at 08:49 -0600, Eric Sandeen wrote:
 Aneesh Kumar K.V wrote:
  ext4_find_next_zero_bit and ext4_find_next_bit needs a long aligned
  address on x8_64. Add mb_find_next_zero_bit and mb_find_next_bit
  and use them in the mballoc.
  
  Fix: https://bugzilla.redhat.com/show_bug.cgi?id=433286
  
  Eric Sandeen debugged the problem and suggested the fix.
 
 Also, Ted  Mingming: we probably should get this into 2.6.25; at least
 w/ the way the Fedora kernel is configured, ext4 is pretty much DOA w/o
 this change.  I'm not sure why it started showing up now, but it is, in
 a big way. :)
 

Acked, and added to ext4 patch queue. 

Thanks,
Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] ext4: use ext4_group_first_block_no()

2008-02-19 Thread Mingming Cao
On Sun, 2008-02-17 at 15:08 +0900, Akinobu Mita wrote:
 Use ext4_group_first_block_no() and assign the return values to
 ext2_fsblk_t variables.
 
Acked for ext2/3/4 patches(except a little typo in above change log:
ext4_fsblk_t). ext4 patch is queued in ext4 patch queue
http://repo.or.cz/w/ext4-patch-queue.git

Mingming

 Signed-off-by: Akinobu Mita [EMAIL PROTECTED]
 Cc: Stephen Tweedie [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Cc: Andrew Morton [EMAIL PROTECTED]
 Cc: Mingming Cao [EMAIL PROTECTED]
 Cc: Theodore Tso [EMAIL PROTECTED]
 ---
  fs/ext4/balloc.c |6 +++---
  fs/ext4/xattr.c  |6 ++
  2 files changed, 5 insertions(+), 7 deletions(-)
 
 Index: 2.6-rc/fs/ext4/xattr.c
 ===
 --- 2.6-rc.orig/fs/ext4/xattr.c
 +++ 2.6-rc/fs/ext4/xattr.c
 @@ -808,10 +808,8 @@ inserted:
   get_bh(new_bh);
   } else {
   /* We need to allocate a new block */
 - ext4_fsblk_t goal = le32_to_cpu(
 - EXT4_SB(sb)-s_es-s_first_data_block) +
 - (ext4_fsblk_t)EXT4_I(inode)-i_block_group *
 - EXT4_BLOCKS_PER_GROUP(sb);
 + ext4_fsblk_t goal = ext4_group_first_block_no(sb,
 + EXT4_I(inode)-i_block_group);
   ext4_fsblk_t block = ext4_new_block(handle, inode,
   goal, error);
   if (error)
 Index: 2.6-rc/fs/ext4/balloc.c
 ===
 --- 2.6-rc.orig/fs/ext4/balloc.c
 +++ 2.6-rc/fs/ext4/balloc.c
 @@ -48,7 +48,6 @@ void ext4_get_group_no_and_offset(struct
  unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head 
 *bh,
ext4_group_t block_group, struct ext4_group_desc *gdp)
  {
 - unsigned long start;
   int bit, bit_max;
   unsigned free_blocks, group_blocks;
   struct ext4_sb_info *sbi = EXT4_SB(sb);
 @@ -106,11 +105,12 @@ unsigned ext4_init_block_bitmap(struct s
   free_blocks = group_blocks - bit_max;
 
   if (bh) {
 + ext4_fsblk_t start;
 +
   for (bit = 0; bit  bit_max; bit++)
   ext4_set_bit(bit, bh-b_data);
 
 - start = block_group * EXT4_BLOCKS_PER_GROUP(sb) +
 - le32_to_cpu(sbi-s_es-s_first_data_block);
 + start = ext4_group_first_block_no(sb, block_group);
 
   /* Set bits for block and inode bitmaps, and inode table */
   ext4_set_bit(ext4_block_bitmap(sb, gdp) - start, bh-b_data);

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] ext4: use ext4_get_group_desc()

2008-02-19 Thread Mingming Cao
On Sun, 2008-02-17 at 15:13 +0900, Akinobu Mita wrote:
 Use ext4_get_group_desc()
 
Acked and added to ext4 patch queue.

 Signed-off-by: Akinobu Mita [EMAIL PROTECTED]
 Cc: Stephen Tweedie [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Cc: Andrew Morton [EMAIL PROTECTED]
 Cc: Mingming Cao [EMAIL PROTECTED]
 Cc: Theodore Tso [EMAIL PROTECTED]
 ---
  fs/ext4/inode.c |   20 +++-
  1 file changed, 3 insertions(+), 17 deletions(-)
 
 Index: 2.6-rc/fs/ext4/inode.c
 ===
 --- 2.6-rc.orig/fs/ext4/inode.c
 +++ 2.6-rc/fs/ext4/inode.c
 @@ -2455,12 +2455,10 @@ out_stop:
  static ext4_fsblk_t ext4_get_inode_block(struct super_block *sb,
   unsigned long ino, struct ext4_iloc *iloc)
  {
 - unsigned long desc, group_desc;
   ext4_group_t block_group;
   unsigned long offset;
   ext4_fsblk_t block;
 - struct buffer_head *bh;
 - struct ext4_group_desc * gdp;
 + struct ext4_group_desc *gdp;
 
   if (!ext4_valid_inum(sb, ino)) {
   /*
 @@ -2472,22 +2470,10 @@ static ext4_fsblk_t ext4_get_inode_block
   }
 
   block_group = (ino - 1) / EXT4_INODES_PER_GROUP(sb);
 - if (block_group = EXT4_SB(sb)-s_groups_count) {
 - ext4_error(sb,ext4_get_inode_block,group = groups count);
 + gdp = ext4_get_group_desc(sb, block_group, NULL);
 + if (!gdp)
   return 0;
 - }
 - smp_rmb();
 - group_desc = block_group  EXT4_DESC_PER_BLOCK_BITS(sb);
 - desc = block_group  (EXT4_DESC_PER_BLOCK(sb) - 1);
 - bh = EXT4_SB(sb)-s_group_desc[group_desc];
 - if (!bh) {
 - ext4_error (sb, ext4_get_inode_block,
 - Descriptor not loaded);
 - return 0;
 - }
 
 - gdp = (struct ext4_group_desc *)((__u8 *)bh-b_data +
 - desc * EXT4_DESC_SIZE(sb));
   /*
* Figure out the offset within the block group inode table
*/

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] ext4: check ext4_journal_get_write_access() errors

2008-02-19 Thread Mingming Cao
On Sun, 2008-02-17 at 15:15 +0900, Akinobu Mita wrote:
 Check ext4_journal_get_write_access() errors.
 
Acked and added to ext4 patch queue

Mingming
 Signed-off-by: Akinobu Mita [EMAIL PROTECTED]
 Cc: Stephen Tweedie [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Cc: Andrew Morton [EMAIL PROTECTED]
 Cc: Mingming Cao [EMAIL PROTECTED]
 Cc: Theodore Tso [EMAIL PROTECTED]
 ---
  fs/ext4/namei.c |9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)
 
 Index: 2.6-rc/fs/ext4/namei.c
 ===
 --- 2.6-rc.orig/fs/ext4/namei.c
 +++ 2.6-rc/fs/ext4/namei.c
 @@ -57,10 +57,15 @@ static struct buffer_head *ext4_append(h
 
   *block = inode-i_size  inode-i_sb-s_blocksize_bits;
 
 - if ((bh = ext4_bread(handle, inode, *block, 1, err))) {
 + bh = ext4_bread(handle, inode, *block, 1, err);
 + if (bh) {
   inode-i_size += inode-i_sb-s_blocksize;
   EXT4_I(inode)-i_disksize = inode-i_size;
 - ext4_journal_get_write_access(handle,bh);
 + *err = ext4_journal_get_write_access(handle, bh);
 + if (*err) {
 + brelse(bh);
 + bh = NULL;
 + }
   }
   return bh;
  }

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: When reading from fallocated blocks make sure we return zero.

2008-02-18 Thread Mingming Cao
On Tue, 2008-02-19 at 08:49 +0530, Aneesh Kumar K.V wrote:
 On Mon, Feb 18, 2008 at 04:14:34PM -0800, Mingming Cao wrote:
  On Sat, 2008-02-16 at 08:53 +0530, Aneesh Kumar K.V wrote:
  
  How about the following patch?
  
  Regards,
  Mingming
  
  ext4: ext4_get_blocks_wrap fix for writing to preallocated
  From: Mingming Cao [EMAIL PROTECTED]
  
  This patch fixed a issue with wrting to a preallocated blocks.
  A write hit a BUG_ON() in fs/buffer.c saying the buffer is not mapped.
  
  On the write path, ext4_get_block_wrap() is called with create=1, but it
  will pass create=0 down to the underlying ext4ext_get_blocks()
  to do a look up first.  In the preallocation case, ext4_ext_get_blocks()
  with create = 0, will return number of blocks pre-allocated and buffer
  head unmapped. ext4_get_blocks_wrap() thinks it succeeds too early, without
  checking if it needs again call ext4_ext_get_blocks with create = 1
  which would do proper handling for writing to preallocated blocks:
  split the extent to initialized and uninitialized one and
  returns the mapped buffer head.
  
  Treating preallocated blocks as holes equally(i.e. ignoring the number of 
  blocks
  pre-allocated and returns 0) when get_blocks() is called with create = 0 is 
  not enough.
  ext4_ext_get_blocks() needs to differentiate these two cases for delayed 
  allocation
  purpose, as for holes it need to do reservation and prepare for later
  delayed allocation, but for pre-allocated blocks it needs skip that work.
  
  It would makes things more clear if we have clear definition of what
  get_blocks() return value means.
  
  Similar to ext4_get_blocks_handle(),  the following 
  * return  0, # of blocks already allocated
  *  if these are pre-allocated blocks and create = 0
  * buffer head is unmapped
  *  otherwise blocks are mapped.
  *
  * return = 0, if plain look up failed (blocks have not been allocated)
  *  buffer head is unmapped
  *
  * return  0, error case.
  
  The for the write path, at ext4_ext_get_blocks_wrap(), it could check the
  buffer_mapped() status for preallocated extent before quit too early.
  
  Signed-off-by: Mingming Cao [EMAIL PROTECTED]
 
 Reviewed-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 
 
 
 I guess we also need to make sure the buffer head have the mapped bit
 set. Something like the patch below.
 
Good point. I modified the patch with clear_buffer_mapped() added at the
begining of the wrapper function.

Mingming
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index bc7081f..69ccda9 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -2294,6 +2294,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode 
 *inode,
   struct ext4_allocation_request ar;
 
   __clear_bit(BH_New, bh_result-b_state);
 + __clear_bit(BH_Mapped, bh_result-b_state);
   ext_debug(blocks %u/%lu requested for inode %u\n,
   iblock, max_blocks, inode-i_ino);
 

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -v2] ext4: Use page_mkwrite vma_operations to get mmap write notification.

2008-02-18 Thread Mingming Cao
On Tue, 2008-02-19 at 09:13 +0530, Aneesh Kumar K.V wrote:
 We would like to get notified when we are doing a write on mmap section.
 This is needed with respect to preallocated area. We split the preallocated
 area into initialzed extent and uninitialzed extent in the call back. This
 let us handle ENOSPC better. Otherwise we get ENOSPC in the writepage and
 that would result in data loss. The changes are also needed to handle ENOSPC
 when writing to an mmap section of files with holes.
 
Acked. Added to patch queue.

Thanks,
Mingming
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/file.c  |   19 ++-
  fs/ext4/inode.c |6 ++
  include/linux/ext4_fs.h |1 +
  3 files changed, 25 insertions(+), 1 deletions(-)
 
 diff --git a/fs/ext4/file.c b/fs/ext4/file.c
 index 20507a2..77341c1 100644
 --- a/fs/ext4/file.c
 +++ b/fs/ext4/file.c
 @@ -123,6 +123,23 @@ force_commit:
   return ret;
  }
 
 +static struct vm_operations_struct ext4_file_vm_ops = {
 + .fault  = filemap_fault,
 + .page_mkwrite   = ext4_page_mkwrite,
 +};
 +
 +static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 +{
 + struct address_space *mapping = file-f_mapping;
 +
 + if (!mapping-a_ops-readpage)
 + return -ENOEXEC;
 + file_accessed(file);
 + vma-vm_ops = ext4_file_vm_ops;
 + vma-vm_flags |= VM_CAN_NONLINEAR;
 + return 0;
 +}
 +
  const struct file_operations ext4_file_operations = {
   .llseek = generic_file_llseek,
   .read   = do_sync_read,
 @@ -133,7 +150,7 @@ const struct file_operations ext4_file_operations = {
  #ifdef CONFIG_COMPAT
   .compat_ioctl   = ext4_compat_ioctl,
  #endif
 - .mmap   = generic_file_mmap,
 + .mmap   = ext4_file_mmap,
   .open   = generic_file_open,
   .release= ext4_release_file,
   .fsync  = ext4_sync_file,
 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
 index 34f3eb6..81faa67 100644
 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -3466,3 +3466,9 @@ int ext4_change_inode_journal_flag(struct inode *inode, 
 int val)
 
   return err;
  }
 +
 +int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
 +{
 + return block_page_mkwrite(vma, page, ext4_get_block);
 +}
 +
 diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
 index 22810b1..8f5a563 100644
 --- a/include/linux/ext4_fs.h
 +++ b/include/linux/ext4_fs.h
 @@ -1059,6 +1059,7 @@ extern void ext4_set_aops(struct inode *inode);
  extern int ext4_writepage_trans_blocks(struct inode *);
  extern int ext4_block_truncate_page(handle_t *handle, struct page *page,
   struct address_space *mapping, loff_t from);
 +extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
 
  /* ioctl.c */
  extern long ext4_ioctl(struct file *, unsigned int, unsigned long);

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -v2] ext4: Don't mark the filesystem with errors if we fail to fallocate.

2008-02-18 Thread Mingming Cao
On Tue, 2008-02-19 at 09:14 +0530, Aneesh Kumar K.V wrote:
 IF we fail allocate blocks don't call ext4_error. Also don't hide errors
 from ext4_get_blocks_wrap
 
Aced. And added to patch queue.

Thanks,
Mingming
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/extents.c |   11 ++-
  1 files changed, 6 insertions(+), 5 deletions(-)
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index 5b22f71..9a27e88 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -2653,13 +2653,14 @@ retry:
   ret = ext4_get_blocks_wrap(handle, inode, block,
 max_blocks, map_bh,
 EXT4_CREATE_UNINITIALIZED_EXT, 0);
 - WARN_ON(ret = 0);
   if (ret = 0) {
 - ext4_error(inode-i_sb, ext4_fallocate,
 - ext4_ext_get_blocks returned error: 
 - inode#%lu, block=%u, max_blocks=%lu,
 +#ifdef EXT4FS_DEBUG
 + WARN_ON(ret = 0);
 + printk(KERN_ERR %s: ext4_ext_get_blocks 
 + returned error inode#%lu, block=%u, 
 + max_blocks=%lu, __FUNCTION__,
   inode-i_ino, block, max_blocks);
 - ret = -EIO;
 +#endif
   ext4_mark_inode_dirty(handle, inode);
   ret2 = ext4_journal_stop(handle);
   break;

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] don't set extents flag for _any_ symlinks

2008-02-18 Thread Mingming Cao
Thanks, Added to patch queue
On Mon, 2008-02-18 at 16:27 -0600, Eric Sandeen wrote:
 As symlinks are limited to a single block anyway, and e2fsck
 doesn't expect to find it set, don't set the extents flag on 
 any type of symlinks at all: fast/in-inode, or the 
 external-block flavor.
 
 There are a lot of filesystems out there by now w/ exent-style 
 symlink blocks though, so e2fsck should probably be able to 
 repair that at some point...
 
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 
 ---
 
 Index: linux-2.6.24/fs/ext4/namei.c
 ===
 --- linux-2.6.24.orig/fs/ext4/namei.c
 +++ linux-2.6.24/fs/ext4/namei.c
 @@ -2223,7 +2226,6 @@ retry:
   inode-i_op = ext4_fast_symlink_inode_operations;
   memcpy((char*)EXT4_I(inode)-i_data,symname,l);
   inode-i_size = l-1;
 - EXT4_I(inode)-i_flags = ~EXT4_EXTENTS_FL;
   }
   EXT4_I(inode)-i_disksize = inode-i_size;
   err = ext4_add_nondir(handle, dentry, inode);
 
 Index: linux-2.6.24/fs/ext4/ialloc.c
 ===
 --- linux-2.6.24.orig/fs/ext4/ialloc.c
 +++ linux-2.6.24/fs/ext4/ialloc.c
 @@ -744,7 +744,7 @@ got:
   ext4_std_error(sb, err);
   goto fail_free_drop;
   }
 - if (test_opt(sb, EXTENTS)) {
 + if (test_opt(sb, EXTENTS)  !S_ISLNK(mode)) {
   EXT4_I(inode)-i_flags |= EXT4_EXTENTS_FL;
   ext4_ext_tree_init(handle, inode);
   err = ext4_update_incompat_feature(handle, sb,
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clear extents flag on inodes created in ext4_mknod

2008-02-18 Thread Mingming Cao
Acked and added to the patch queue.
On Mon, 2008-02-18 at 15:00 -0600, Eric Sandeen wrote:
 e2fsck doesn't expect to find char, block, fifo, or socket
 files with the extent flag set, so clear that in ext4_mknod.
 
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 ---
 
 Index: linux-2.6.24/fs/ext4/namei.c
 ===
 --- linux-2.6.24.orig/fs/ext4/namei.c
 +++ linux-2.6.24/fs/ext4/namei.c
 @@ -1766,6 +1766,7 @@ retry:
  #ifdef CONFIG_EXT4DEV_FS_XATTR
   inode-i_op = ext4_special_inode_operations;
  #endif
 + EXT4_I(inode)-i_flags = ~EXT4_EXTENTS_FL;
   err = ext4_add_nondir(handle, dentry, inode);
   }
   ext4_journal_stop(handle);
 
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] ext4: add missing ext4_journal_stop()

2008-02-18 Thread Mingming Cao
On Sun, 2008-02-17 at 15:11 +0900, Akinobu Mita wrote:
 Add missing ext4_journal_stop() in error handling.
 
Reviewed and Acked ext2/3/4 changes.

ext4 patch is also queued in ext4 patch queue
http://repo.or.cz/w/ext4-patch-queue.git

Mingming
 Signed-off-by: Akinobu Mita [EMAIL PROTECTED]
 Cc: Stephen Tweedie [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Cc: Andrew Morton [EMAIL PROTECTED]
 Cc: Mingming Cao [EMAIL PROTECTED]
 Cc: Theodore Tso [EMAIL PROTECTED]
 ---
  fs/ext4/resize.c |1 +
  1 file changed, 1 insertion(+)
 
 Index: 2.6-rc/fs/ext4/resize.c
 ===
 --- 2.6-rc.orig/fs/ext4/resize.c
 +++ 2.6-rc/fs/ext4/resize.c
 @@ -1037,6 +1037,7 @@ int ext4_group_extend(struct super_block
   ext4_warning(sb, __FUNCTION__,
multiple resizers run on filesystem!);
   unlock_super(sb);
 + ext4_journal_stop(handle);
   err = -EBUSY;
   goto exit_put;
   }
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] ext4: Fix the memory leak and buffer head leak with respect to ext4_ext_path

2008-02-14 Thread Mingming Cao
On Thu, 2008-02-14 at 16:00 +0530, Aneesh Kumar K.V wrote:
 The path variable returned via ext4_ext_find_extent is a kmalloc variable
 and need to be freeded. It also contain refrences to buffer_head which need
 to be dropped.
 
Acked.

Mingming
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/extents.c   |6 +++---
  fs/ext4/migrate.c   |5 +
  include/linux/ext4_fs_extents.h |1 +
  3 files changed, 9 insertions(+), 3 deletions(-)
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index e856f66..995ac16 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -349,7 +349,7 @@ static void ext4_ext_show_leaf(struct inode *inode, 
 struct ext4_ext_path *path)
  #define ext4_ext_show_leaf(inode,path)
  #endif
 
 -static void ext4_ext_drop_refs(struct ext4_ext_path *path)
 +void ext4_ext_drop_refs(struct ext4_ext_path *path)
  {
   int depth = path-p_depth;
   int i;
 @@ -2200,10 +2200,10 @@ static int ext4_ext_convert_to_initialized(handle_t 
 *handle,
   newdepth = ext_depth(inode);
   if (newdepth != depth) {
   depth = newdepth;
 - path = ext4_ext_find_extent(inode, iblock, NULL);
 + ext4_ext_drop_refs(path);
 + path = ext4_ext_find_extent(inode, iblock, path);
   if (IS_ERR(path)) {
   err = PTR_ERR(path);
 - path = NULL;
   goto out;
   }
   eh = path[depth].p_hdr;
 diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
 index 8c6c685..5c1e27d 100644
 --- a/fs/ext4/migrate.c
 +++ b/fs/ext4/migrate.c
 @@ -43,6 +43,7 @@ static int finish_range(handle_t *handle, struct inode 
 *inode,
 
   if (IS_ERR(path)) {
   retval = PTR_ERR(path);
 + path = NULL;
   goto err_out;
   }
 
 @@ -74,6 +75,10 @@ static int finish_range(handle_t *handle, struct inode 
 *inode,
   }
   retval = ext4_ext_insert_extent(handle, inode, path, newext);
  err_out:
 + if (path) {
 + ext4_ext_drop_refs(path);
 + kfree(path);
 + }
   lb-first_pblock = 0;
   return retval;
  }
 diff --git a/include/linux/ext4_fs_extents.h b/include/linux/ext4_fs_extents.h
 index 697da4b..1285c58 100644
 --- a/include/linux/ext4_fs_extents.h
 +++ b/include/linux/ext4_fs_extents.h
 @@ -227,5 +227,6 @@ extern int ext4_ext_search_left(struct inode *, struct 
 ext4_ext_path *,
   ext4_lblk_t *, ext4_fsblk_t *);
  extern int ext4_ext_search_right(struct inode *, struct ext4_ext_path *,
   ext4_lblk_t *, ext4_fsblk_t *);
 +extern void ext4_ext_drop_refs(struct ext4_ext_path *);
  #endif /* _LINUX_EXT4_EXTENTS */
 

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] ext4: Request for journal write access early.

2008-02-14 Thread Mingming Cao
On Thu, 2008-02-14 at 16:00 +0530, Aneesh Kumar K.V wrote:
 In ext4_ext_convert_to_initialized before we need to request for journal
 write access before we even modify the extent length.
 
Acked.
Thanks
Mingming
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/extents.c |   11 ---
  1 files changed, 8 insertions(+), 3 deletions(-)
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index 995ac16..c4d6f19 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -2168,6 +2168,10 @@ static int ext4_ext_convert_to_initialized(handle_t 
 *handle,
   newblock = iblock - ee_block + ext_pblock(ex);
   ex2 = ex;
 
 + err = ext4_ext_get_access(handle, inode, path + depth);
 + if (err)
 + goto out;
 +
   /* ex1: ee_block to iblock - 1 : uninitialized */
   if (iblock  ee_block) {
   ex1 = ex;
 @@ -2210,6 +2214,10 @@ static int ext4_ext_convert_to_initialized(handle_t 
 *handle,
   ex = path[depth].p_ext;
   if (ex2 != newex)
   ex2 = ex;
 +
 + err = ext4_ext_get_access(handle, inode, path + depth);
 + if (err)
 + goto out;
   }
   allocated = max_blocks;
   }
 @@ -2230,9 +2238,6 @@ static int ext4_ext_convert_to_initialized(handle_t 
 *handle,
   ex2-ee_len = cpu_to_le16(allocated);
   if (ex2 != ex)
   goto insert;
 - err = ext4_ext_get_access(handle, inode, path + depth);
 - if (err)
 - goto out;
   /*
* New (initialized) extent starts from the first block
* in the current extent. i.e., ex2 == ex

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] ext4: Fix the memory leak and buffer head leak with respect to ext4_ext_path

2008-02-14 Thread Mingming Cao
Added in ext4 patch queue,
Thanks

On Thu, 2008-02-14 at 16:00 +0530, Aneesh Kumar K.V wrote:
 The path variable returned via ext4_ext_find_extent is a kmalloc variable
 and need to be freeded. It also contain refrences to buffer_head which need
 to be dropped.
 
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/extents.c   |6 +++---
  fs/ext4/migrate.c   |5 +
  include/linux/ext4_fs_extents.h |1 +
  3 files changed, 9 insertions(+), 3 deletions(-)
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index e856f66..995ac16 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -349,7 +349,7 @@ static void ext4_ext_show_leaf(struct inode *inode, 
 struct ext4_ext_path *path)
  #define ext4_ext_show_leaf(inode,path)
  #endif
 
 -static void ext4_ext_drop_refs(struct ext4_ext_path *path)
 +void ext4_ext_drop_refs(struct ext4_ext_path *path)
  {
   int depth = path-p_depth;
   int i;
 @@ -2200,10 +2200,10 @@ static int ext4_ext_convert_to_initialized(handle_t 
 *handle,
   newdepth = ext_depth(inode);
   if (newdepth != depth) {
   depth = newdepth;
 - path = ext4_ext_find_extent(inode, iblock, NULL);
 + ext4_ext_drop_refs(path);
 + path = ext4_ext_find_extent(inode, iblock, path);
   if (IS_ERR(path)) {
   err = PTR_ERR(path);
 - path = NULL;
   goto out;
   }
   eh = path[depth].p_hdr;
 diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
 index 8c6c685..5c1e27d 100644
 --- a/fs/ext4/migrate.c
 +++ b/fs/ext4/migrate.c
 @@ -43,6 +43,7 @@ static int finish_range(handle_t *handle, struct inode 
 *inode,
 
   if (IS_ERR(path)) {
   retval = PTR_ERR(path);
 + path = NULL;
   goto err_out;
   }
 
 @@ -74,6 +75,10 @@ static int finish_range(handle_t *handle, struct inode 
 *inode,
   }
   retval = ext4_ext_insert_extent(handle, inode, path, newext);
  err_out:
 + if (path) {
 + ext4_ext_drop_refs(path);
 + kfree(path);
 + }
   lb-first_pblock = 0;
   return retval;
  }
 diff --git a/include/linux/ext4_fs_extents.h b/include/linux/ext4_fs_extents.h
 index 697da4b..1285c58 100644
 --- a/include/linux/ext4_fs_extents.h
 +++ b/include/linux/ext4_fs_extents.h
 @@ -227,5 +227,6 @@ extern int ext4_ext_search_left(struct inode *, struct 
 ext4_ext_path *,
   ext4_lblk_t *, ext4_fsblk_t *);
  extern int ext4_ext_search_right(struct inode *, struct ext4_ext_path *,
   ext4_lblk_t *, ext4_fsblk_t *);
 +extern void ext4_ext_drop_refs(struct ext4_ext_path *);
  #endif /* _LINUX_EXT4_EXTENTS */
 

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] ext4: Request for journal write access early.

2008-02-14 Thread Mingming Cao
Added in ext4 patch queue,
Thanks
On Thu, 2008-02-14 at 16:00 +0530, Aneesh Kumar K.V wrote:
 In ext4_ext_convert_to_initialized before we need to request for journal
 write access before we even modify the extent length.
 
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
  fs/ext4/extents.c |   11 ---
  1 files changed, 8 insertions(+), 3 deletions(-)
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index 995ac16..c4d6f19 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -2168,6 +2168,10 @@ static int ext4_ext_convert_to_initialized(handle_t 
 *handle,
   newblock = iblock - ee_block + ext_pblock(ex);
   ex2 = ex;
 
 + err = ext4_ext_get_access(handle, inode, path + depth);
 + if (err)
 + goto out;
 +
   /* ex1: ee_block to iblock - 1 : uninitialized */
   if (iblock  ee_block) {
   ex1 = ex;
 @@ -2210,6 +2214,10 @@ static int ext4_ext_convert_to_initialized(handle_t 
 *handle,
   ex = path[depth].p_ext;
   if (ex2 != newex)
   ex2 = ex;
 +
 + err = ext4_ext_get_access(handle, inode, path + depth);
 + if (err)
 + goto out;
   }
   allocated = max_blocks;
   }
 @@ -2230,9 +2238,6 @@ static int ext4_ext_convert_to_initialized(handle_t 
 *handle,
   ex2-ee_len = cpu_to_le16(allocated);
   if (ex2 != ex)
   goto insert;
 - err = ext4_ext_get_access(handle, inode, path + depth);
 - if (err)
 - goto out;
   /*
* New (initialized) extent starts from the first block
* in the current extent. i.e., ex2 == ex

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH resend] ext2/3/4: convert byte order of constant instead of variable

2008-02-14 Thread Mingming Cao
On Thu, 2008-02-14 at 14:20 -0800, Andrew Morton wrote:
 On Sun, 10 Feb 2008 11:10:15 +0100
 Marcin Slusarz [EMAIL PROTECTED] wrote:
 
   fs/ext2/super.c |8 +++-
   fs/ext3/super.c |2 +-
   fs/ext4/super.c |2 +-
 
 Please don't bundle the filesystem patches in this manner.  I split
 it into three patches.
 
Andrew, Ted,

I added the ext4 patch in the ext4 patch queue.

Regards,
Mingming


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: modify block allocation algorithm for the last group

2008-02-13 Thread Mingming Cao
On Wed, 2008-02-13 at 17:26 +0100, Valerie Clement wrote:
 Modify the block allocation algorithm for the last group
 
 From: Valerie Clement [EMAIL PROTECTED]
 
 When a directory inode is allocated in the last group and the last group
 contains less than s_blocks_per_group blocks, the initial block allocated
 for the directory is not always allocated in the same group as the
 directory inode, but in one of the first groups of the filesystem (group 1
 for example).
 Depending on the current process's pid, ext4_find_near() and 
 ext4_ext_find_goal() can return a block number greater than the maximum
 blocks count in the filesystem and in that case the block will be not
 allocated in the same group as the inode.
 
 The following patch fixes the problem.
 
 Comments?
Looks sane to me. 

 Should the modification also be done in ext2/3 code?
 
I think so.

Mingming
 Signed-off-by: Valerie Clement [EMAIL PROTECTED]
 ---
 
  fs/ext4/extents.c |8 +++-
  fs/ext4/inode.c   |8 +++-
  2 files changed, 14 insertions(+), 2 deletions(-)
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index 5c4af51..1391ded 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -148,6 +148,7 @@ ext4_fsblk_t ext4_ext_find_goal(struct inode *inode,
  {
   struct ext4_inode_info *ei = EXT4_I(inode);
   ext4_fsblk_t bg_start;
 + ext4_fsblk_t last_block;
   ext4_grpblk_t colour;
   int depth;
 
 @@ -169,8 +170,13 @@ ext4_fsblk_t ext4_ext_find_goal(struct inode *inode,
   /* OK. use inode's group */
   bg_start = (ei-i_block_group * EXT4_BLOCKS_PER_GROUP(inode-i_sb)) +
   le32_to_cpu(EXT4_SB(inode-i_sb)-s_es-s_first_data_block);
 - colour = (current-pid % 16) *
 + last_block = ext4_blocks_count(EXT4_SB(inode-i_sb)-s_es) - 1;
 +
 + if (bg_start + EXT4_BLOCKS_PER_GROUP(inode-i_sb) = last_block)
 + colour = (current-pid % 16) *
   (EXT4_BLOCKS_PER_GROUP(inode-i_sb) / 16);
 + else
 + colour = (current-pid % 16) * ((last_block - bg_start) / 16);
   return bg_start + colour + block;
  }
 
 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
 index 51bf3b5..c4f8ace 100644
 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -405,6 +405,7 @@ static ext4_fsblk_t ext4_find_near(struct inode *inode, 
 Indirect *ind)
   __le32 *start = ind-bh ? (__le32*) ind-bh-b_data : ei-i_data;
   __le32 *p;
   ext4_fsblk_t bg_start;
 + ext4_fsblk_t last_block;
   ext4_grpblk_t colour;
 
   /* Try to find previous block */
 @@ -422,8 +423,13 @@ static ext4_fsblk_t ext4_find_near(struct inode *inode, 
 Indirect *ind)
* into the same cylinder group then.
*/
   bg_start = ext4_group_first_block_no(inode-i_sb, ei-i_block_group);
 - colour = (current-pid % 16) *
 + last_block = ext4_blocks_count(EXT4_SB(inode-i_sb)-s_es) - 1;
 +
 + if (bg_start + EXT4_BLOCKS_PER_GROUP(inode-i_sb) = last_block)
 + colour = (current-pid % 16) *
   (EXT4_BLOCKS_PER_GROUP(inode-i_sb) / 16);
 + else
 + colour = (current-pid % 16) * ((last_block - bg_start) / 16);
   return bg_start + colour;
  }
 
 
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] allocate struct ext4_allocation_context from a kmem cache to save stack space

2008-02-07 Thread Mingming Cao
On Wed, 2008-02-06 at 11:05 -0600, Eric Sandeen wrote:
 struct ext4_allocation_context is rather large, and this bloats
 the stack of many functions which use it.  Allocating it from
 a named slab cache will alleviate this.
 
 For example, with this change (on top of the noinline patch sent earlier):
 
 -ext4_mb_new_blocks   200
 +ext4_mb_new_blocks40
 
 -ext4_mb_free_blocks  344
 +ext4_mb_free_blocks  168
 
 -ext4_mb_release_inode_pa 216
 +ext4_mb_release_inode_pa  40
 
 -ext4_mb_release_group_pa 192
 +ext4_mb_release_group_pa  24
 
 Most of these stack-allocated structs are actually used only for
 mballoc history; and in those cases often a smaller struct would do.
 So changing that may be another way around it, at least for those
 functions, if preferred.  For now, in those cases where the ac
 is only for history, an allocation failure simply skips the history
 recording, and does not cause any other failures.
 
 Comments?
 
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 ---
 
 Index: linux-2.6.24-rc6-mm1/fs/ext4/mballoc.c
 ===
 --- linux-2.6.24-rc6-mm1.orig/fs/ext4/mballoc.c
 +++ linux-2.6.24-rc6-mm1/fs/ext4/mballoc.c
 @@ -419,6 +419,7 @@
  #define MB_DEFAULT_GROUP_PREALLOC512
 
  static struct kmem_cache *ext4_pspace_cachep;
 +static struct kmem_cache *ext4_ac_cachep;
 
  #ifdef EXT4_BB_MAX_BLOCKS
  #undef EXT4_BB_MAX_BLOCKS
 @@ -2958,11 +2959,18 @@ int __init init_ext4_mballoc(void)
   if (ext4_pspace_cachep == NULL)
   return -ENOMEM;
 
 -#ifdef CONFIG_PROC_FS
 + ext4_ac_cachep =
 + kmem_cache_create(ext4_alloc_context,
 +  sizeof(struct ext4_allocation_context),
 +  0, SLAB_RECLAIM_ACCOUNT, NULL);
 + if (ext4_ac_cachep == NULL) {
 + kmem_cache_destroy(ext4_pspace_cachep);
 + return -ENOMEM;
 + }
 +
   proc_root_ext4 = proc_mkdir(EXT4_ROOT, proc_root_fs);
   if (proc_root_ext4 == NULL)
   printk(KERN_ERR EXT4-fs: Unable to create %s\n, EXT4_ROOT);
 -#endif
 
   return 0;
  }
 @@ -2971,9 +2979,8 @@ void exit_ext4_mballoc(void)
  {
   /* XXX: synchronize_rcu(); */
   kmem_cache_destroy(ext4_pspace_cachep);
 -#ifdef CONFIG_PROC_FS
 + kmem_cache_destroy(ext4_ac_cachep);
   remove_proc_entry(EXT4_ROOT, proc_root_fs);
 -#endif
  }
 
 

Do you intend to remove the #ifdef CONFIG_PROC_FS, or it's a accident? I
think we need keep that to allow ext4 build without procfs configured.

Other than this, the patch looks fine to me.:)

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG_ON at mballoc.c:3752

2008-02-07 Thread Mingming Cao
On Thu, 2008-02-07 at 18:25 +0530, Aneesh Kumar K.V wrote:

 ext4: Don't panic in case of corrupt bitmap
 
 From: Aneesh Kumar K.V [EMAIL PROTECTED]
 
 Multiblock allocator was calling BUG_ON in many case if the free and used
 blocks count obtained looking at the bitmap is different from what
 the allocator internally accounted for. Use ext4_error in such case
 and don't panic the system.
 

There seems a lot of BUG_ON() and BUG() in mballoc code, other than this
case. Should it always panic the whole system in those cases? Perhaps
replacing with ext4_error() or some cases just WARN_ON is enough.

Mingming
 Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
 ---
 
  fs/ext4/mballoc.c |   35 +--
  1 files changed, 21 insertions(+), 14 deletions(-)
 
 
 diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
 index 06d1f52..656729b 100644
 --- a/fs/ext4/mballoc.c
 +++ b/fs/ext4/mballoc.c
 @@ -680,7 +680,6 @@ static void *mb_find_buddy(struct ext4_buddy *e4b, int 
 order, int *max)
  {
   char *bb;
 
 - /* FIXME!! is this needed */
   BUG_ON(EXT4_MB_BITMAP(e4b) == EXT4_MB_BUDDY(e4b));
   BUG_ON(max == NULL);
 



 @@ -964,7 +963,7 @@ static void ext4_mb_generate_buddy(struct super_block *sb,
   grp-bb_fragments = fragments;
 
   if (free != grp-bb_free) {
 - printk(KERN_DEBUG
 + ext4_error(sb, __FUNCTION__,
   EXT4-fs: group %lu: %u blocks in bitmap, %u in gd\n,
   group, free, grp-bb_free);
   grp-bb_free = free;
 @@ -1821,13 +1820,24 @@ static void ext4_mb_complex_scan_group(struct 
 ext4_allocation_context *ac,
   i = ext4_find_next_zero_bit(bitmap,
   EXT4_BLOCKS_PER_GROUP(sb), i);
   if (i = EXT4_BLOCKS_PER_GROUP(sb)) {
 - BUG_ON(free != 0);
 + /*
 +  * IF we corrupt the bitmap  we won't find any
 +  * free blocks even though group info says we
 +  * we have free blocks
 +  */
 + ext4_error(sb, __FUNCTION__, %d free blocks as per 
 + group info. But bitmap says 0\n,
 + free);
   break;
   }
 
   mb_find_extent(e4b, 0, i, ac-ac_g_ex.fe_len, ex);
   BUG_ON(ex.fe_len = 0);
 - BUG_ON(free  ex.fe_len);
 + if (free  ex.fe_len) {
 + ext4_error(sb, __FUNCTION__, %d free blocks as per 
 + group info. But got %d blocks\n,
 + free, ex.fe_len);
 + }
 
   ext4_mb_measure_extent(ac, ex, e4b);
 
 @@ -3354,13 +3364,10 @@ static void ext4_mb_use_group_pa(struct 
 ext4_allocation_context *ac,
   ac-ac_pa = pa;
 
   /* we don't correct pa_pstart or pa_plen here to avoid
 -  * possible race when tte group is being loaded concurrently
 +  * possible race when the group is being loaded concurrently
* instead we correct pa later, after blocks are marked
 -  * in on-disk bitmap -- see ext4_mb_release_context() */
 - /*
 -  * FIXME!! but the other CPUs can look at this particular
 -  * pa and think that it have enought free blocks if we
 -  * don't update pa_free here right ?
 +  * in on-disk bitmap -- see ext4_mb_release_context()
 +  * Other CPUs are prevented from allocating from this pa by lg_mutex
*/
   mb_debug(use %u/%u from group pa %p\n, pa-pa_lstart-len, len, pa);
  }
 @@ -3743,13 +3750,13 @@ static int ext4_mb_release_inode_pa(struct ext4_buddy 
 *e4b,
   bit = next + 1;
   }
   if (free != pa-pa_free) {
 - printk(KERN_ERR pa %p: logic %lu, phys. %lu, len %lu\n,
 + printk(KERN_CRIT pa %p: logic %lu, phys. %lu, len %lu\n,
   pa, (unsigned long) pa-pa_lstart,
   (unsigned long) pa-pa_pstart,
   (unsigned long) pa-pa_len);
 - printk(KERN_ERR free %u, pa_free %u\n, free, pa-pa_free);
 + ext4_error(sb, __FUNCTION__, free %u, pa_free %u\n,
 + free, pa-pa_free);
   }
 - BUG_ON(free != pa-pa_free);
   atomic_add(free, sbi-s_mb_discarded);
 
   return err;
 @@ -4405,7 +4412,7 @@ void ext4_mb_free_blocks(handle_t *handle, struct inode 
 *inode,
   unsigned long block, unsigned long count,
   int metadata, unsigned long *freed)
  {
 - struct buffer_head *bitmap_bh = 0;
 + struct buffer_head *bitmap_bh = NULL;
   struct super_block *sb = inode-i_sb;
   struct ext4_allocation_context ac;
   struct ext4_group_desc *gdp;
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL 

Re: [PATCH] allocate struct ext4_allocation_context from a kmem cache to save stack space

2008-02-07 Thread Mingming Cao
On Thu, 2008-02-07 at 19:06 -0600, Eric Sandeen wrote:
 Mingming Cao wrote:
 
  Do you intend to remove the #ifdef CONFIG_PROC_FS, or it's a accident? I
  think we need keep that to allow ext4 build without procfs configured.
  
  Other than this, the patch looks fine to me.:)
 
 oh, it kind of snuck in.  It actually should still build, as
 remove_proc_entry is a no-op function w/o the config option.

Oh, I mean the proc_mkdir(EXT4_ROOT, proc_root_fs) will complain w/o
CONFIG_PROC_FS configured.

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] jbd: fix assertion failure in journal_next_log_block

2008-02-06 Thread Mingming Cao
On Tue, 2008-02-05 at 13:59 -0500, Josef Bacik wrote:
 On Tuesday 05 February 2008 12:27:31 pm Jan Kara wrote:
Hello,
 
Sorry for replying a bit late but I'm currently falling behind in
  maling-list reading...
 
   The way jbd tries to determine if there is enough space left on the
   journal in order to start a new transaction is looking at the space left
   in the journal and the space needed for the committing transaction, which
   is 1/4 of the journal + the number of t_outstanding_credits for that
   transaction.  In this case its assumed that t_outstanding_credits
   accurately represents the number of credits
 
Yes.
 
   waiting to be written to the journal, but this sometimes isn't the case. 
   The transaction has two counters for buffers, t_outstanding_credits which
   is used in conjunction with handles that are added to the transaction,
   and t_nr_buffers which is only incremented/decremented when buffers are
   added/removed from the transaction and are actually destined to be
   journaled.  Normally these two
 
t_nr_buffers actually represents number of buffers on BJ_Metadata list
  and nobody uses it (except for the assertion in
  __journal_temp_unlink_buffer()). t_outstanding_credits is supposed to be
  *the* counter making sure we don't write more than we have credits for.
 
   counters are the same, however there are cases where the committing
   transaction can have buffers moved to the next running transaction, for
   example any buffers on the committing transactions t_reserved list would
   be moved to the next (running) transaction, and if it had been dirtied in
   the process it would immediately make it onto the t_updates list, which
   would increment t_nr_buffers
 
You probably mean t_buffers list here...
 
   but not t_outstanding_credits.  So you get into this situation where
 
But which moving and dirtying do you mean? The caller which dirties
  the buffer must make sure that he has acquired enough credits for the
  transaction where the buffer ends up... So if there were not enough
  buffers in the running transaction where we refiled the buffer it is a
  bug in the caller which dirties the buffer.
 
 
 You know now that you say that I feel like an idiot, you are right the only 
 way 
 for something to actually end up on that list was if somebody dirtied it and 
 if 
 they did it would have had to been accounted for at some point on the running 
 transaction.
 
   t_nr_buffers (the actual number of buffers that are on the transaction)
   is greater than the number of buffers accounted for via
   t_outstanding_credits. This presents a problem since as we loop through
   writting buffers to the journal, we decrement t_outstanding_credits, and
   if t_nr_buffers is more than t_outstanding_credits then we end up with a
   negative number for
   t_outstanding_credits, which means we start saying we need less than 1/4
   of the journal for our committing transaction and allow more transactions
   than we can handle to start, and then bam we fail because
   journal_next_log_block doesn't have enough free blocks in order to handle
   the request.  This has been tested and fixes the issue (which could not
   be reproduced by me but several other people could get it to reproduce
   using postmark), and although I couldn't reproduce the assertion, I could
   very easily reproduce the situation where t_outstanding_credits was 
   than t_nr_buffers.
 
I suppose you see the assertion J_ASSERT(journal-j_free  1); to
  fail, right? I don't see how your patch could help avoid that assertion.
  You've just removed accounting of t_outstanding_credits which has no
  impact on the real number of free blocks in the journal stored in
  j_free. Anyway, if you can reproduce t_outstanding_credits 
  t_nr_buffers, then there's something fishy. Are you able to reproduce it
  also with a current kernel?
Thanks for looking into the problem :)
 
 
 Well my patch helped avoid the assertion because t_outstanding_credits was 
 going 
 negative therefore we were letting transactions start when we shouldn't be, 
 and 
 eventually we would end up with too much of the journal in use and we'd 
 assert.  
 Course I can't reproduce where t_outstanding_credits  t_nr_buffers upstream 
 (again I feel like an idiot, should have tested that first).  Thanks for 
 looking at this Jan.
 
 Mingming, would you mind pulling this patch out of the patch queue please 
 since 
 its wrong?  Thanks much,
 

Sure, done!

Mingming


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: Fix DIO locking

2008-02-06 Thread Mingming Cao
On Wed, 2008-02-06 at 17:06 +0100, Jan Kara wrote:
   Hi,
 
   this patch fixes lock inversion in ext4 direct IO path. The similar patch
 has already been accepted for ext3. Mingming, will you put it into ext4
 patch queue? Thanks.
 

Thanks! Added to the ext4 patch queue.

Mingming
   Honza
 -- 
 Jan Kara [EMAIL PROTECTED]
 SUSE Labs, CR
 ---
 
 We cannot start transaction in ext4_direct_IO() and just let it last during 
 the
 whole write because dio_get_page() acquires mmap_sem which ranks above
 transaction start (e.g. because we have dependency chain
 mmap_sem-PageLock-journal_start, or because we update atime while holding
 mmap_sem) and thus deadlocks could happen. We solve the problem by starting
 a transaction separately for each ext4_get_block() call.
 
 We *could* have a problem that we allocate a block and before its data are
 written out the machine crashes and thus we expose stale data. But that
 does not happen because for hole-filling generic code falls back to buffered
 writes and for file extension, we add inode to orphan list and thus in
 case of crash, journal replay will truncate inode back to the original size.
 
 Signed-off-by: Jan Kara [EMAIL PROTECTED]
 
 diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
 index bb717cb..1948b97 100644
 --- a/fs/ext4/inode.c
 +++ b/fs/ext4/inode.c
 @@ -895,7 +895,16 @@ out:
   return err;
  }
 
 -#define DIO_CREDITS (EXT4_RESERVE_TRANS_BLOCKS + 32)
 +/* Maximum number of blocks we map for direct IO at once. */
 +#define DIO_MAX_BLOCKS 4096
 +/*
 + * Number of credits we need for writing DIO_MAX_BLOCKS:
 + * We need sb + group descriptor + bitmap + inode - 4
 + * For B blocks with A block pointers per block we need:
 + * 1 (triple ind.) + (B/A/A + 2) (doubly ind.) + (B/A + 2) (indirect).
 + * If we plug in 4096 for B and 256 for A (for 1KB block size), we get 25.
 + */
 +#define DIO_CREDITS 25
 
  int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t 
 block,
   unsigned long max_blocks, struct buffer_head *bh,
 @@ -942,49 +951,31 @@ static int ext4_get_block(struct inode *inode, sector_t 
 iblock,
   struct buffer_head *bh_result, int create)
  {
   handle_t *handle = ext4_journal_current_handle();
 - int ret = 0;
 + int ret = 0, started = 0;
   unsigned max_blocks = bh_result-b_size  inode-i_blkbits;
 
 - if (!create)
 - goto get_block; /* A read */
 -
 - if (max_blocks == 1)
 - goto get_block; /* A single block get */
 -
 - if (handle-h_transaction-t_state == T_LOCKED) {
 - /*
 -  * Huge direct-io writes can hold off commits for long
 -  * periods of time.  Let this commit run.
 -  */
 - ext4_journal_stop(handle);
 - handle = ext4_journal_start(inode, DIO_CREDITS);
 - if (IS_ERR(handle))
 + 
 + if (create  !handle) {/* Direct IO write... */
 + if (max_blocks  DIO_MAX_BLOCKS)
 + max_blocks = DIO_MAX_BLOCKS;
 + handle = ext4_journal_start(inode, DIO_CREDITS +
 +   2 * EXT4_QUOTA_TRANS_BLOCKS(inode-i_sb));
 + if (IS_ERR(handle)) {
   ret = PTR_ERR(handle);
 - goto get_block;
 - }
 -
 - if (handle-h_buffer_credits = EXT4_RESERVE_TRANS_BLOCKS) {
 - /*
 -  * Getting low on buffer credits...
 -  */
 - ret = ext4_journal_extend(handle, DIO_CREDITS);
 - if (ret  0) {
 - /*
 -  * Couldn't extend the transaction.  Start a new one.
 -  */
 - ret = ext4_journal_restart(handle, DIO_CREDITS);
 + goto out;
   }
 + started = 1;
   }
 
 -get_block:
 - if (ret == 0) {
 - ret = ext4_get_blocks_wrap(handle, inode, iblock,
 + ret = ext4_get_blocks_wrap(handle, inode, iblock,
   max_blocks, bh_result, create, 0);
 - if (ret  0) {
 - bh_result-b_size = (ret  inode-i_blkbits);
 - ret = 0;
 - }
 + if (ret  0) {
 + bh_result-b_size = (ret  inode-i_blkbits);
 + ret = 0;
   }
 + if (started)
 + ext4_journal_stop(handle);
 +out:
   return ret;
  }
 
 @@ -1674,7 +1665,8 @@ static int ext4_releasepage(struct page *page, gfp_t 
 wait)
   * if the machine crashes during the write.
   *
   * If the O_DIRECT write is intantiating holes inside i_size and the machine
 - * crashes then stale disk data _may_ be exposed inside the file.
 + * crashes then stale disk data _may_ be exposed inside the file. But current
 + * VFS code falls back into buffered path in that case so we are safe.
   */
  static ssize_t ext4_direct_IO(int rw, 

Re: - disable-ext4.patch removed from -mm tree

2008-02-05 Thread Mingming Cao
On Mon, 2008-02-04 at 10:00 -0500, Theodore Tso wrote:
 On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
  On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso [EMAIL PROTECTED] wrote:
  
   On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
When I merge David's iget coversion patches this will instead wreck the
ext4 patchset.
   
   That's ok, it shouldn't be hard for me to fix this up.  How quickly
   will you be able to merge David's iget converstion patches?
  
  They're about 1,000 patches back.
 
 OK, if you're not planning on pushing David's changes to Linus right
 away, what if I pull in David's
 
   iget-stop-ext4-from-using-iget-and-read_inode-try.patch 
 
 and push it plus some other ext4 bug fixes directly to Linus, and let
 you know when that has happened so you can drop David's patch from
 your queue?
 
 David's changes to ext4 can be applied standalone without the rest of
 his series, so it would be safe to push that to Linus independently
 and in advance of the rest of his series.  

I get compile error when builing ext4 patch queue with
iget-stop-ext4-from-using-iget-and-read_inode-try.patch applied, against
2.6.24-git14.

It seems iget-stop-ext4-from-using-iget-and-read_inode-try.patch depends
on patches:  
[PATCH 01/31] Add an ERR_CAST() macro to complement ERR_PTR and co.
[PATCH 03/32] IGET: Introduce a function to register iget failure

Mingming

 That should also help
 reduce the number of inter-patch queue dependencies.
 
 Regards,
 
 - Ted
 
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Replace iget with iget_locked in defrag-free-space-fragementation

2008-02-05 Thread Mingming Cao
On Sat, 2008-02-02 at 11:40 -0500, Theodore Ts'o wrote:
 FYI, since iget() is going to be disappearing in 2.6.25, and akpm has
 disabled ext4 in the -mm tree as a result, I'm folding the following
 patch into the defrag-free-space-fragmentation.patch in the ext4 tree.
 
 
I noticed that you have merged the changes in the patch queue...

diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c
 index d9a14e4..6370fb8 100644
 --- a/fs/ext4/defrag.c
 +++ b/fs/ext4/defrag.c
 @@ -287,9 +287,13 @@ static int ext4_ext_extents_info(struct 
 ext4_extents_info *ext_info,
   int entries = 0;
   int err = 0;
 
 - inode = iget(sb, ext_info-ino);
 + inode = iget_locked(sb, ext_info-ino);
   if (!inode)
   return -EACCES;
 + if (inode-i_state  I_NEW) {
 + sb-s_op-read_inode(inode);
 + unlock_new_inode(inode);
 + }
 
   down_write(EXT4_I(inode)-i_data_sem);
 
read_inode() also sunset, as removed in 
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24/2.6.24-mm1/broken-out/iget-remove-iget-and-the-read_inode-super-op-as.patch

so above changes won't compile in mm tree.

I have fixed it in patch queue, using ext4_iget() to replace iget() and 
read-inode() as suggested in 
iget-stop-ext4-from-using-iget-and-read_inode-try.patch 

http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=ext4-online-defrag-iget-read-inode-fix.patch;h=94660ca7371680fcd02bb5804016b4cae4f20846;hb=85589744f9d53579a12aa463ccd42fb450aec786

 We also have an issue where the read-only bind patches from Dave Hansen
 are making changes which conflict with the ext4 patch tree, which is
 making akpm very grumpy.  I'll try to take a look at this later in the
 weekend...
 
   - Ted
 



 @@ -588,9 +592,13 @@ static int ext4_ext_defrag_victim(struct file 
 *target_filp,
   ext.len = 0;
 
   /* Get the inode of the victim file */
 - victim_inode = iget(sb, ex_info-ino);
 + victim_inode = iget_locked(sb, ex_info-ino);
   if (!victim_inode)
   return -EACCES;
 + if (victim_inode-i_state  I_NEW) {
 + sb-s_op-read_inode(victim_inode);
 + unlock_new_inode(victim_inode);
 + }
 
   /* Setup file for the victim file */
   victim_dent.d_inode = victim_inode;

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Replace iget with iget_locked in defrag-free-space-fragementation

2008-02-05 Thread Mingming Cao
On Sat, 2008-02-02 at 11:40 -0500, Theodore Ts'o wrote:
 We also have an issue where the read-only bind patches from Dave Hansen
 are making changes which conflict with the ext4 patch tree, which is
 making akpm very grumpy.  I'll try to take a look at this later in the
 weekend...
 
   - Ted

BTW, Ted, could you clarify about this one? I am able to build the patch
queue on akpm's 2.6.24-mm1 tree with Dave Hansen's read-only bind
changes. Maybe I missed something.

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: - disable-ext4.patch removed from -mm tree

2008-02-05 Thread Mingming Cao
On Tue, 2008-02-05 at 13:26 -0800, Mingming Cao wrote:
 On Mon, 2008-02-04 at 10:00 -0500, Theodore Tso wrote:
  On Sun, Feb 03, 2008 at 07:15:40PM -0800, Andrew Morton wrote:
   On Sun, 3 Feb 2008 20:36:26 -0500 Theodore Tso [EMAIL PROTECTED] wrote:
   
On Sun, Feb 03, 2008 at 12:25:51PM -0800, Andrew Morton wrote:
 When I merge David's iget coversion patches this will instead wreck 
 the
 ext4 patchset.

That's ok, it shouldn't be hard for me to fix this up.  How quickly
will you be able to merge David's iget converstion patches?
   
   They're about 1,000 patches back.
  
  OK, if you're not planning on pushing David's changes to Linus right
  away, what if I pull in David's
  
  iget-stop-ext4-from-using-iget-and-read_inode-try.patch 
  
  and push it plus some other ext4 bug fixes directly to Linus, and let
  you know when that has happened so you can drop David's patch from
  your queue?
  
  David's changes to ext4 can be applied standalone without the rest of
  his series, so it would be safe to push that to Linus independently
  and in advance of the rest of his series.  
 
 I get compile error when builing ext4 patch queue with
 iget-stop-ext4-from-using-iget-and-read_inode-try.patch applied, against
 2.6.24-git14.
 
 It seems iget-stop-ext4-from-using-iget-and-read_inode-try.patch depends
 on patches:  
 [PATCH 01/31] Add an ERR_CAST() macro to complement ERR_PTR and co.
 [PATCH 03/32] IGET: Introduce a function to register iget failure

It seems to me the easiest way to bring ext4 patches back to mm tree, is
to carry above two patches in ext4 patch queue, like we did for other
ext4 patches that depend on generic code in the past.

So I added above two patches to ext4 patch queue, now that ext4 patches
could apply cleanly to linus git tree, and Andrew should able to easily
pull ext4 patches after removing the duplicated patches. 

Ted, I have the ext4 patch queue updated for this. 

Regards,
Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG_ON at mballoc.c:3752

2008-02-01 Thread Mingming Cao
On Thu, 2008-01-31 at 15:01 +0100, Eric Sesterhenn wrote:
 hi,
 
 while running a modified version of fsfuzzer i triggered the BUG() in
 ext4_mb_release_inode_pa(). Sadly I am not able to reproduce this using
 the generated image, but running the fuzzer will usually trigger this in
 less than 40 attempts. Increasing the JBD2 Debug level didnt give more
 information. The kernel is current git with
 ext4-fix-null-pointer-deref-in-journal_wait_on_commit_record.patch
 applied. 
 
 Greetings, Eric
 
Thanks for reporting this.

 [ 1570.971980] EXT4-fs error (device loop0) in ext4_reserve_inode_write: 
 Journal has aborted

Is there any more info about why jbd has aborted?

 [ 1570.972077] pa c6512330: logic 16, phys. 2337, len 16
 [ 1570.972103] free 2, pa_free 1

looks like free!=pa_free.

Aneesh, could you take a look? Thanks!

Mingming
 [ 1570.972191] [ cut here ]
 [ 1570.972217] kernel BUG at fs/ext4/mballoc.c:3752!
 [ 1570.972241] invalid opcode:  [#1] PREEMPT DEBUG_PAGEALLOC
 [ 1570.972386] Modules linked in:
 [ 1570.972425] 
 [ 1570.972509] Pid: 6629, comm: fstest Not tainted 
 (2.6.24-05749-g8af03e7-dirty #19)
 [ 1570.972534] EIP: 0060:[c02266b9] EFLAGS: 00010202 CPU: 0
 [ 1570.972570] EIP is at ext4_mb_release_inode_pa+0x169/0x1a0
 [ 1570.972595] EAX: 0001 EBX: 0930 ECX: 0001 EDX: 0001
 [ 1570.972678] ESI: 0930 EDI: c6512330 EBP: cb638b28 ESP: cb638a84
 [ 1570.972703]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
 [ 1570.972728] Process fstest (pid: 6629, ti=cb638000 task=cb698000 
 task.ti=cb638000)
 [ 1570.972751] Stack: c07c21bb 0002 0001 0921 0010 cbff34e0 
 cb638b54 0002 
 [ 1570.972899]cb5fa430 c64824a0 cb5fb920 0022 cbff34e0 cb638ad4 
 0246 0400 
 [ 1570.972899]cbfa2000 cb5fb920 cbff34e0  092e  
 0002 cbfa2000 
 [ 1570.972899] Call Trace:
 [ 1570.972899]  [c020e3a4] ? read_block_bitmap+0x54/0x120
 [ 1570.972899]  [c022bd24] ? 
 ext4_mb_discard_inode_preallocations+0x124/0x300
 [ 1570.972899]  [c022bda5] ? 
 ext4_mb_discard_inode_preallocations+0x1a5/0x300
 [ 1570.972899]  [c0223577] ? ext4_ext_get_blocks+0x3a7/0x4b0
 [ 1570.972899]  [c0213928] ? ext4_get_blocks_wrap+0xe8/0x130
 [ 1570.972899]  [c0213bce] ? ext4_get_block+0x7e/0xf0
 [ 1570.972899]  [c019d7ba] ? __block_prepare_write+0x17a/0x3a0
 [ 1570.972899]  [c019da68] ? block_write_begin+0x48/0xe0
 [ 1570.972899]  [c0213b50] ? ext4_get_block+0x0/0xf0
 [ 1570.972899]  [c0215127] ? ext4_write_begin+0xb7/0x190
 [ 1570.972899]  [c0213b50] ? ext4_get_block+0x0/0xf0
 [ 1570.972899]  [c01557e9] ? generic_perform_write+0xa9/0x190
 [ 1570.972899]  [c01575bd] ? generic_file_buffered_write+0x6d/0x130
 [ 1570.972899]  [c01578c1] ? __generic_file_aio_write_nolock+0x241/0x550
 [ 1570.972899]  [c0144b44] ? trace_hardirqs_on+0xc4/0x150
 [ 1570.972899]  [c0157c2c] ? generic_file_aio_write+0x5c/0xd0
 [ 1570.972899]  [c015ab0d] ? free_one_page+0x1ed/0x220
 [ 1570.972899]  [c0210410] ? ext4_file_write+0x50/0x160
 [ 1570.972899]  [c017b11d] ? do_sync_write+0xcd/0x110
 [ 1570.972899]  [c01096f9] ? native_sched_clock+0x69/0xc0
 [ 1570.972899]  [c01373c0] ? autoremove_wake_function+0x0/0x50
 [ 1570.972899]  [c01070e5] ? do_softirq+0x55/0xd0
 [ 1570.972899]  [c01050d3] ? restore_nocheck+0x12/0x15
 [ 1570.972899]  [c0144b44] ? trace_hardirqs_on+0xc4/0x150
 [ 1570.972899]  [c017b959] ? vfs_write+0x99/0x130
 [ 1570.972899]  [c017b050] ? do_sync_write+0x0/0x110
 [ 1570.972899]  [c017c048] ? sys_pwrite64+0x68/0x70
 [ 1570.972899]  [c0104fea] ? sysenter_past_esp+0x5f/0xa5
 [ 1570.972899]  ===
 [ 1570.972899] Code: ff 0f b7 47 4e 89 44 24 08 8b 85 78 ff ff ff c7 04 24 bb 
 21 7c c0 89 44 24 04 e8 c3 e0 ef ff 0f b7 47 4e 39 85 78 ff ff ff 74 07 0f 
 0b eb fe 8d 76 00 8b 85 78 ff ff ff 8b 95 7c ff ff ff 01 82 
 [ 1570.972899] EIP: [c02266b9] ext4_mb_release_inode_pa+0x169/0x1a0 SS:ESP 
 0068:cb638a84
 [ 1570.972942] ---[ end trace 51819e80cd9431da ]---
 [ 1570.972969] note: fstest[6629] exited with preempt_count 1
 [ 1570.973013] BUG: sleeping function called from invalid context at 
 kernel/rwsem.c:21
 [ 1570.973039] in_atomic():1, irqs_disabled():0
 [ 1570.973077] INFO: lockdep is turned off.
 [ 1570.973104] Pid: 6629, comm: fstest Tainted: G  D 
 2.6.24-05749-g8af03e7-dirty #19
 [ 1570.973159]  [c011e1a6] __might_sleep+0xc6/0xf0
 [ 1570.973224]  [c06b2c99] down_read+0x19/0x80
 [ 1570.973295]  [c013a7fd] ? hrtimer_try_to_cancel+0x3d/0x80
 [ 1570.973396]  [c0125a27] exit_mm+0x27/0xd0
 [ 1570.973467]  [c01272f3] do_exit+0x133/0x2e0
 [ 1570.973529]  [c010611c] die+0x13c/0x140
 [ 1570.973590]  [c0135197] ? search_exception_tables+0x27/0x30
 [ 1570.973993]  [c01061b1] do_trap+0x91/0xc0
 [ 1570.974054]  [c0106440] ? do_invalid_op+0x0/0xa0
 [ 1570.974133]  [c01064c9] do_invalid_op+0x89/0xa0
 [ 1570.974195]  [c02266b9] ? ext4_mb_release_inode_pa+0x169/0x1a0
 [ 1570.974290]  [c012007b] ? account_system_time+0x9b/0xd0
 [ 1570.974370]  

Re: [PATCH] jbd: fix assertion failure in journal_next_log_block

2008-01-31 Thread Mingming Cao
On Thu, 2008-01-31 at 16:52 -0500, Josef Bacik wrote:
 On Thu, Jan 31, 2008 at 12:35:43PM -0700, Andreas Dilger wrote:
  On Jan 31, 2008  11:14 -0500, Josef Bacik wrote:
  [snip excellent analysis]
   So you get into this situation where
   t_nr_buffers (the actual number of buffers that are on the transaction) is
   greater than the number of buffers accounted for via 
   t_outstanding_credits.
   This presents a problem since as we loop through writting buffers to the
   journal, we decrement t_outstanding_credits, and if t_nr_buffers is more 
   than
   t_outstanding_credits then we end up with a negative number for
   t_outstanding_credits
   
   Signed-off-by: Josef Bacik [EMAIL PROTECTED]
  
  Do you know what kernel this problem was introduced in, or is this a
  long standing problem?  Presumably the same is needed for jbd2?
  
  Once we have some decent amount of testing going on with ext4, I think
  it makes sense to merge the jbd2 changes back into jbd and return to
  a single code base, since there is nothing in the jbd2 code that ext3
  can't also work with (i.e. all of the changes are properly isolated
  with compatibility flags and such).
  
   @@ -1056,7 +1056,7 @@ static inline int jbd_space_needed(journal_t 
   *journal)
 int nblocks = journal-j_max_transaction_buffers;
 if (journal-j_committing_transaction)
 nblocks += journal-j_committing_transaction-
   - t_outstanding_credits;
   + t_nr_buffers;
  
  (trivial) this can be moved back onto the previous line.
  
   @@ -1168,7 +1168,7 @@ static inline int jbd_space_needed(journal_t 
   *journal)
 int nblocks = journal-j_max_transaction_buffers;
 if (journal-j_committing_transaction)
 nblocks += journal-j_committing_transaction-
   - t_outstanding_credits;
   + t_nr_buffers;
  
  Same...
 
 
 The original issue was reported on RHEL4, so thats 2.6.9, and looking through
 the old-bkcvs git tree I can't see where this was introduced, so it's probably
 existed before that.  The same problem looks to exist in jbd2 though I haven't
 tested it myself, I just went ahead and included the fixes.  Here is the 
 updated
 patch, thanks much for the comments.
 

Added to ext4 patch queue. 

Thanks,
Mingming
 Signed-off-by: Josef Bacik [EMAIL PROTECTED]
 
 
 diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
 index 31853eb..e385a5c 100644
 --- a/fs/jbd/commit.c
 +++ b/fs/jbd/commit.c
 @@ -561,13 +561,6 @@ void journal_commit_transaction(journal_t *journal)
   continue;
   }
 
 - /*
 -  * start_this_handle() uses t_outstanding_credits to determine
 -  * the free space in the log, but this counter is changed
 -  * by journal_next_log_block() also.
 -  */
 - commit_transaction-t_outstanding_credits--;
 -
   /* Bump b_count to prevent truncate from stumbling over
 the shadowed buffer!  @@@ This can go if we ever get
 rid of the BJ_IO/BJ_Shadow pairing of buffers. */
 diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
 index 4f302d2..c0f93f5 100644
 --- a/fs/jbd2/commit.c
 +++ b/fs/jbd2/commit.c
 @@ -580,7 +580,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
   stats.u.run.rs_logging = jiffies;
   stats.u.run.rs_flushing = jbd2_time_diff(stats.u.run.rs_flushing,
stats.u.run.rs_logging);
 - stats.u.run.rs_blocks = commit_transaction-t_outstanding_credits;
 + stats.u.run.rs_blocks = commit_transaction-t_nr_buffers;
   stats.u.run.rs_blocks_logged = 0;
 
   descriptor = NULL;
 @@ -655,13 +655,6 @@ void jbd2_journal_commit_transaction(journal_t *journal)
   continue;
   }
 
 - /*
 -  * start_this_handle() uses t_outstanding_credits to determine
 -  * the free space in the log, but this counter is changed
 -  * by jbd2_journal_next_log_block() also.
 -  */
 - commit_transaction-t_outstanding_credits--;
 -
   /* Bump b_count to prevent truncate from stumbling over
 the shadowed buffer!  @@@ This can go if we ever get
 rid of the BJ_IO/BJ_Shadow pairing of buffers. */
 diff --git a/include/linux/jbd.h b/include/linux/jbd.h
 index d9ecd13..eaeb3db 100644
 --- a/include/linux/jbd.h
 +++ b/include/linux/jbd.h
 @@ -1055,8 +1055,7 @@ static inline int jbd_space_needed(journal_t *journal)
  {
   int nblocks = journal-j_max_transaction_buffers;
   if (journal-j_committing_transaction)
 - nblocks += journal-j_committing_transaction-
 - t_outstanding_credits;
 + nblocks += journal-j_committing_transaction-t_nr_buffers;
   return nblocks;
  }
 
 diff --git 

Re: Fw: [Bugme-new] [Bug 9849] New: NULL pointer deref in journal_wait_on_commit_record

2008-01-30 Thread Mingming Cao
On Wed, 2008-01-30 at 12:00 -0800, Andrew Morton wrote:
 
 Begin forwarded message:
 
 Date: Wed, 30 Jan 2008 03:24:08 -0800 (PST)
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: [Bugme-new] [Bug 9849] New: NULL pointer deref in 
 journal_wait_on_commit_record
 
 
 http://bugzilla.kernel.org/show_bug.cgi?id=9849
 
Summary: NULL pointer deref in journal_wait_on_commit_record
Product: File System
Version: 2.5
  KernelVersion: 2.6.24-03997-g85004cc
   Platform: All
 OS/Version: Linux
   Tree: Mainline
 Status: NEW
   Severity: normal
   Priority: P1
  Component: ext4
 AssignedTo: [EMAIL PROTECTED]
 ReportedBy: [EMAIL PROTECTED]
 
 
 Latest working kernel version: -
 Earliest failing kernel version: 2.6.24-03863-g0ba6c33
 Distribution: Ubuntu
 Problem Description:
 
 using a corrupted image causes an oops in unmount, seems as if
 journal_wait_on_commit_record() gets passed a NULL pointer
 

The buufer head pointer passed to journal_wait_on_commit_record() could
be NULL if the previous journal_submit_commit_record() failed or journal
has already aborted.

Looking at the jbd2 debug messages, before the oops happen, the jbd2 is
aborted due to trying to access the next log block beyond the end of
device. This might be caused by using a corrupted image.

We need to check the error returns from journal_submit_commit_record()
and avoid calling journal_wait_on_commit_record() in the failure case.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
The buufer head pointer passed to journal_wait_on_commit_record()
could be NULL if the previous journal_submit_commit_record() failed
or journal has already aborted.

We need to check the error returns from journal_submit_commit_record()
and avoid calling journal_wait_on_commit_record() in the failure case.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/jbd2/commit.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.24-rc8/fs/jbd2/commit.c
===
--- linux-2.6.24-rc8.orig/fs/jbd2/commit.c  2008-01-30 14:12:10.0 
-0800
+++ linux-2.6.24-rc8/fs/jbd2/commit.c   2008-01-30 15:09:50.0 -0800
@@ -872,7 +872,8 @@ wait_for_iobuf:
if (err)
__jbd2_journal_abort_hard(journal);
}
-   err = journal_wait_on_commit_record(cbh);
+   if (!err  !is_journal_aborted(journal))
+   err = journal_wait_on_commit_record(cbh);
 
if (err)
jbd2_journal_abort(journal, err);


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 33/49] ext4: Add the journal checksum feature

2008-01-24 Thread Mingming Cao
,
  +* just skip over the blocks it describes. */
  if (pass != PASS_REPLAY) {
  +   if (pass == PASS_SCAN 
  +   JBD2_HAS_COMPAT_FEATURE(journal,
  +   JBD2_FEATURE_COMPAT_CHECKSUM) 
  +   !info-end_transaction) {
  +   if (calc_chksums(journal, bh,
  +   next_log_block,
  +   crc32_sum)) {
 
 put_bh()
 
  +   brelse(bh);
  +   break;
  +   }
  +   brelse(bh);
  +   continue;
 
 put_bh()
 
  +   }
  next_log_block += count_tags(journal, bh);
  wrap(journal, next_log_block);
  brelse(bh);
  @@ -516,9 +563,96 @@ static int do_one_pass(journal_t *journal,
  continue;
   
  +   brelse(bh);
 
 etc
 

Thanks, Updated  patch below:
ext4: Add the journal checksum feature

From: Girish Shilamkar [EMAIL PROTECTED]

The journal checksum feature adds two new flags i.e
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.

JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the
checksum for the blocks described by the descriptor blocks.
Due to checksums, writing of the commit record no longer needs to be
synchronous. Now commit record can be sent to disk without waiting for
descriptor blocks to be written to disk. This behavior is controlled
using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be
able to recover the journal with _ASYNC_COMMIT hence it is made
incompat.
The commit header has been extended to hold the checksum along with the
type of the checksum.

For recovery in pass scan checksums are verified to ensure the sanity
and completeness(in case of _ASYNC_COMMIT) of every transaction.

Signed-off-by: Andreas Dilger [EMAIL PROTECTED]
Signed-off-by: Girish Shilamkar [EMAIL PROTECTED]
Signed-off-by: Dave Kleikamp [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 Documentation/filesystems/ext4.txt |   10 +
 fs/Kconfig |1 
 fs/ext4/super.c|   25 
 fs/jbd2/commit.c   |  198 +++--
 fs/jbd2/journal.c  |   26 
 fs/jbd2/recovery.c |  151 ++--
 include/linux/ext4_fs.h|3 
 include/linux/jbd2.h   |   36 +-
 8 files changed, 388 insertions(+), 62 deletions(-)


Index: linux-2.6.24-rc8/Documentation/filesystems/ext4.txt
===
--- linux-2.6.24-rc8.orig/Documentation/filesystems/ext4.txt2008-01-24 
11:18:08.0 -0800
+++ linux-2.6.24-rc8/Documentation/filesystems/ext4.txt 2008-01-24 
13:00:44.0 -0800
@@ -89,6 +89,16 @@ When mounting an ext4 filesystem, the fo
 extentsext4 will use extents to address file data.  The
file system will no longer be mountable by ext3.
 
+journal_checksum   Enable checksumming of the journal transactions.
+   This will allow the recovery code in e2fsck and the
+   kernel to detect corruption in the kernel.  It is a
+   compatible change and will be ignored by older kernels.
+
+journal_async_commit   Commit block can be written to disk without waiting
+   for descriptor blocks. If enabled older kernels cannot
+   mount the device. This will enable 'journal_checksum'
+   internally.
+
 journal=update Update the ext4 file system's journal to the current
format.
 
Index: linux-2.6.24-rc8/fs/Kconfig
===
--- linux-2.6.24-rc8.orig/fs/Kconfig2008-01-24 11:18:08.0 -0800
+++ linux-2.6.24-rc8/fs/Kconfig 2008-01-24 11:18:55.0 -0800
@@ -236,6 +236,7 @@ config JBD_DEBUG
 
 config JBD2
tristate
+   select CRC32
help
  This is a generic journaling layer for block devices that support
  both 32-bit and 64-bit block numbers.  It is currently used by
Index: linux-2.6.24-rc8/fs/ext4/super.c
===
--- linux-2.6.24-rc8.orig/fs/ext4/super.c   2008-01-24 11:18:52.0 
-0800
+++ linux-2.6.24-rc8/fs/ext4/super.c2008-01-24 13:00:45.0 -0800
@@ -869,6 +869,7 @@ enum {
Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl,
Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh

Re: [PATCH] Shrink ext3_inode_info by 8 bytes for !POSIX_ACL.

2008-01-18 Thread Mingming Cao
On Sat, 2008-01-12 at 21:35 +0100, Indan Zupancic wrote:
 i_file_acl and i_dir_acl aren't always needed.
 
 With certain configs this makes 10 ext3_inode_cache objects fit in
 one slab instead of the current 9, as the size shrinks from 416 to
 408 bytes for 32 bit, !POSIX_ACL and !EXT3_FS_XATTR configs.
 
 Signed-off-by: Indan Zupancic [EMAIL PROTECTED]
 ---
  fs/ext3/ialloc.c  |2 ++
  fs/ext3/inode.c   |   29 +++--
  include/linux/ext3_fs_i.h |2 ++
  3 files changed, 23 insertions(+), 10 deletions(-)
 
 diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
 index 1bc8cd8..01745bc 100644
 --- a/fs/ext3/ialloc.c
 +++ b/fs/ext3/ialloc.c
 @@ -574,8 +574,10 @@ got:
   ei-i_frag_no = 0;
   ei-i_frag_size = 0;
  #endif
 +#ifdef CONFIG_EXT3_FS_POSIX_ACL
   ei-i_file_acl = 0;
   ei-i_dir_acl = 0;
 +#endif

For regular file, i_dir_acl is being reused as i_size_high to support
large file.

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4]JBD2: user of the jiffies rounding code

2008-01-17 Thread Mingming Cao
Ported from JBD changes from Arjan van de Ven [EMAIL PROTECTED]
---
Date: Sun, 10 Dec 2006 10:21:26 + (-0800)
Subject: [PATCH] user of the jiffies rounding code: JBD
X-Git-Tag: v2.6.20-rc1~15^2~43
X-Git-Url: 
http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=44d306e1508fef6fa7a6eb15a1aba86ef68389a6

[PATCH] user of the jiffies rounding code: JBD

This patch introduces a user: of the round_jiffies() function; the 5 second
ext3/jbd wakeup.

While every 5 seconds doesn't sound as a problem, there can be many of these
(and these timers do add up over all the kernel).  The 5 second wakeup isn't
really timing sensitive; in addition even with rounding it'll still happen
every 5 seconds (with the exception of the very first time, which is likely to
be rounded up to somewhere closer to 6 seconds)

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

---
 fs/jbd2/transaction.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.24-rc7/fs/jbd2/transaction.c
===
--- linux-2.6.24-rc7.orig/fs/jbd2/transaction.c 2008-01-16 15:41:14.0 
-0800
+++ linux-2.6.24-rc7/fs/jbd2/transaction.c  2008-01-16 17:49:48.0 
-0800
@@ -54,7 +54,7 @@ jbd2_get_transaction(journal_t *journal,
spin_lock_init(transaction-t_handle_lock);
 
/* Set up the commit timer for the new transaction. */
-   journal-j_commit_timer.expires = transaction-t_expires;
+   journal-j_commit_timer.expires = round_jiffies(transaction-t_expires);
add_timer(journal-j_commit_timer);
 
J_ASSERT(journal-j_running_transaction == NULL);

 



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4]jbd2: port jbd lockdep support to jbd2

2008-01-17 Thread Mingming Cao
Hi Andrew, Ted,

I walked through the linus's git tree history and found 4 patches should
port from ext3/jbd to ext4/jbd2, since the day ext4 was forked
(2006.10.11) to today. I have already queued the ported patches in ext4
patch queue and verified they seems fine. Here is the first one.



jbd2: port jbd lockdep support to jbd2
 Except lockdep doesn't know about journal_start(), which has ranking
 requirements similar to a semaphore.  

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/jbd2/transaction.c |   11 +++
 include/linux/jbd2.h  |4 
 2 files changed, 15 insertions(+)

Index: linux-2.6.24-rc7/fs/jbd2/transaction.c
===
--- linux-2.6.24-rc7.orig/fs/jbd2/transaction.c 2008-01-16 15:30:24.0 
-0800
+++ linux-2.6.24-rc7/fs/jbd2/transaction.c  2008-01-16 15:41:14.0 
-0800
@@ -241,6 +241,8 @@ out:
return ret;
 }
 
+static struct lock_class_key jbd2_handle_key;
+
 /* Allocate a new handle.  This should probably be in a slab... */
 static handle_t *new_handle(int nblocks)
 {
@@ -251,6 +253,9 @@ static handle_t *new_handle(int nblocks)
handle-h_buffer_credits = nblocks;
handle-h_ref = 1;
 
+   lockdep_init_map(handle-h_lockdep_map, jbd2_handle,
+   jbd2_handle_key, 0);
+
return handle;
 }
 
@@ -293,7 +298,11 @@ handle_t *jbd2_journal_start(journal_t *
jbd2_free_handle(handle);
current-journal_info = NULL;
handle = ERR_PTR(err);
+   goto out;
}
+
+   lock_acquire(handle-h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+out:
return handle;
 }
 
@@ -1419,6 +1428,8 @@ int jbd2_journal_stop(handle_t *handle)
spin_unlock(journal-j_state_lock);
}
 
+   lock_release(handle-h_lockdep_map, 1, _THIS_IP_);
+
jbd2_free_handle(handle);
return err;
 }
Index: linux-2.6.24-rc7/include/linux/jbd2.h
===
--- linux-2.6.24-rc7.orig/include/linux/jbd2.h  2008-01-16 15:29:03.0 
-0800
+++ linux-2.6.24-rc7/include/linux/jbd2.h   2008-01-16 15:29:54.0 
-0800
@@ -418,6 +418,10 @@ struct handle_s
unsigned inth_sync: 1;  /* sync-on-close */
unsigned inth_jdata:1;  /* force data journaling */
unsigned inth_aborted:  1;  /* fatal error on handle */
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+   struct lockdep_map  h_lockdep_map;
+#endif
 };
 



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4]JBD2: sparse pointer use of zero as null

2008-01-17 Thread Mingming Cao
Ported from upstream jbd changes to jbd2

sparse pointer use of zero as null

Get rid of sparse related warnings from places that use integer as NULL
pointer.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/jbd2/transaction.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux-2.6.24-rc7/fs/jbd2/transaction.c
===
--- linux-2.6.24-rc7.orig/fs/jbd2/transaction.c 2008-01-16 17:49:48.0 
-0800
+++ linux-2.6.24-rc7/fs/jbd2/transaction.c  2008-01-16 18:06:33.0 
-0800
@@ -1182,7 +1182,7 @@ int jbd2_journal_dirty_metadata(handle_t
}
 
/* That test should have eliminated the following case: */
-   J_ASSERT_JH(jh, jh-b_frozen_data == 0);
+   J_ASSERT_JH(jh, jh-b_frozen_data == NULL);
 
JBUFFER_TRACE(jh, file as BJ_Metadata);
spin_lock(journal-j_list_lock);
@@ -1532,7 +1532,7 @@ void __jbd2_journal_temp_unlink_buffer(s
 
J_ASSERT_JH(jh, jh-b_jlist  BJ_Types);
if (jh-b_jlist != BJ_None)
-   J_ASSERT_JH(jh, transaction != 0);
+   J_ASSERT_JH(jh, transaction != NULL);
 
switch (jh-b_jlist) {
case BJ_None:
@@ -1601,11 +1601,11 @@ __journal_try_to_free_buffer(journal_t *
if (buffer_locked(bh) || buffer_dirty(bh))
goto out;
 
-   if (jh-b_next_transaction != 0)
+   if (jh-b_next_transaction != NULL)
goto out;
 
spin_lock(journal-j_list_lock);
-   if (jh-b_transaction != 0  jh-b_cp_transaction == 0) {
+   if (jh-b_transaction != NULL  jh-b_cp_transaction == NULL) {
if (jh-b_jlist == BJ_SyncData || jh-b_jlist == BJ_Locked) {
/* A written-back ordered data buffer */
JBUFFER_TRACE(jh, release data);
@@ -1613,7 +1613,7 @@ __journal_try_to_free_buffer(journal_t *
jbd2_journal_remove_journal_head(bh);
__brelse(bh);
}
-   } else if (jh-b_cp_transaction != 0  jh-b_transaction == 0) {
+   } else if (jh-b_cp_transaction != NULL  jh-b_transaction == NULL) {
/* written-back checkpointed metadata buffer */
if (jh-b_jlist == BJ_None) {
JBUFFER_TRACE(jh, remove from checkpoint list);
@@ -1973,7 +1973,7 @@ void __jbd2_journal_file_buffer(struct j
 
J_ASSERT_JH(jh, jh-b_jlist  BJ_Types);
J_ASSERT_JH(jh, jh-b_transaction == transaction ||
-   jh-b_transaction == 0);
+   jh-b_transaction == NULL);
 
if (jh-b_transaction  jh-b_jlist == jlist)
return;

 



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix oops in mballoc caused by a variable overflow

2008-01-16 Thread Mingming Cao
On Wed, 2008-01-16 at 20:11 +0100, Valerie Clement wrote:
 A simple dd oopses the kernel (2.6.24-rc7 with the latest patch queue):
   dd if=/dev/zero of=/mnt/test/foo bs=1M count=8096
 
 EXT4-fs: mballoc enabled
 [ cut here ]
 kernel BUG at fs/ext4/mballoc.c:3148!
 
 The BUG_ON is:
   BUG_ON(size = 0 || size = EXT4_BLOCKS_PER_GROUP(ac-ac_sb));
 
 where the value of size is 4293920768.
 
 This is due to the overflow of the variable start in the 
 ext4_mb_normalize_request() function.
 The patch below fixes it.
 
Thanks!

 Signed-off-by: Valerie Clement [EMAIL PROTECTED]
 ---
 
  mballoc.c |   23 ---
  1 file changed, 12 insertions(+), 11 deletions(-)
 
 
 Index: linux-2.6.24-rc7/fs/ext4/mballoc.c
 ===
 --- linux-2.6.24-rc7.orig/fs/ext4/mballoc.c   2008-01-16 19:22:45.0 
 +0100
 +++ linux-2.6.24-rc7/fs/ext4/mballoc.c2008-01-16 19:25:04.0 
 +0100
 @@ -2990,6 +2990,7 @@ static void ext4_mb_normalize_request(st
   struct list_head *cur;
   loff_t size, orig_size;
   ext4_lblk_t start, orig_start;
 + ext4_fsblk_t pstart;

ext4_fsblk_t is used for fs physical block number, here I think pstart
is pointing to some logical block location..

   struct ext4_inode_info *ei = EXT4_I(ac-ac_inode);
 
   /* do normalize only data requests, metadata requests
 @@ -3029,7 +3030,7 @@ static void ext4_mb_normalize_request(st
 
   /* first, try to predict filesize */
   /* XXX: should this table be tunable? */
 - start = 0;
 + pstart = 0;
   if (size = 16 * 1024) {
   size = 16 * 1024;
   } else if (size = 32 * 1024) {
 @@ -3045,25 +3046,25 @@ static void ext4_mb_normalize_request(st
   } else if (size = 1024 * 1024) {
   size = 1024 * 1024;
   } else if (NRL_CHECK_SIZE(size, 4 * 1024 * 1024, max, bsbits)) {
 - start = ac-ac_o_ex.fe_logical  bsbits;
 - start = (start / (1024 * 1024)) * (1024 * 1024);
 + pstart = ac-ac_o_ex.fe_logical  bsbits;
 + pstart = (pstart / (1024 * 1024)) * (1024 * 1024);

How about using shift...

-   start = ac-ac_o_ex.fe_logical  bsbits;
-   start = (start / (1024 * 1024)) * (1024 * 1024);
+   start = (ac-ac_o_ex.fe_logical  (20-bsbits))  20;

That would be more efficient and should fix the overflow issue

   size = 1024 * 1024;
   } else if (NRL_CHECK_SIZE(size, 8 * 1024 * 1024, max, bsbits)) {
 - start = ac-ac_o_ex.fe_logical  bsbits;
 - start = (start / (4 * (1024 * 1024))) * 4 * (1024 * 1024);
 + pstart = ac-ac_o_ex.fe_logical  bsbits;
 + pstart = (pstart / (4 * (1024 * 1024))) * 4 * (1024 * 1024);

+   start = (ac-ac_o_ex.fe_logical  (22-bsbits))  22;

   size = 4 * 1024 * 1024;
   } else 
 if(NRL_CHECK_SIZE(ac-ac_o_ex.fe_len,(820)bsbits,max,bsbits)){
 - start = ac-ac_o_ex.fe_logical;
 - start = start  bsbits;
 - start = (start / (8 * (1024 * 1024))) * 8 * (1024 * 1024);
 + pstart = ac-ac_o_ex.fe_logical;
 + pstart = pstart  bsbits;
 + pstart = (pstart / (8 * (1024 * 1024))) * 8 * (1024 * 1024);

+   start = (ac-ac_o_ex.fe_logical  (23-bsbits))  23;

   size = 8 * 1024 * 1024;
   } else {
 - start = ac-ac_o_ex.fe_logical;
 - start = start  bsbits;
 + pstart = ac-ac_o_ex.fe_logical;
 + pstart = pstart  bsbits;
   size = ac-ac_o_ex.fe_len  bsbits;
   }
   orig_size = size = size  bsbits;
 - orig_start = start = start  bsbits;
 + orig_start = start = pstart  bsbits;
 
   /* don't cover already allocated blocks in selected range */
   if (ar-pleft  start = ar-lleft) {
 
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Do not try lock_acquire after handle made invalid

2008-01-16 Thread Mingming Cao
On Tue, 2008-01-15 at 15:59 -0800, Andrew Morton wrote:
 On Wed, 16 Jan 2008 00:39:26 +0100
 Jonas Bonn [EMAIL PROTECTED] wrote:
 
  This likely fixes the oops in __lock_acquire reported as:
  
  http://www.kerneloops.org/raw.php?rawid=2753msgid=
  http://www.kerneloops.org/raw.php?rawid=2749msgid=
  
  In these reported oopses, start_this_handle is returning -EROFS.
  
  Signed-off-by: Jonas Bonn [EMAIL PROTECTED]
  ---
   fs/jbd/transaction.c |2 ++
   1 files changed, 2 insertions(+), 0 deletions(-)
  
  diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
  index 08ff6c7..038ed74 100644
  --- a/fs/jbd/transaction.c
  +++ b/fs/jbd/transaction.c
  @@ -288,10 +288,12 @@ handle_t *journal_start(journal_t *journal, int 
  nblocks)
  jbd_free_handle(handle);
  current-journal_info = NULL;
  handle = ERR_PTR(err);
  +   goto out;
  }
   
  lock_acquire(handle-h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);
   
  +out:
  return handle;
   }
 
 Yeah, thanks.
 
 It looks like we forgot to port this lockdep support into ext4.  This is
 bad.  What else got lost?

Here is the ext4/jbd2 port of the original lockdep patch with the above
fix. Please let me know if I missed anything.

It's likely there are other jbd/ext3 changes missed out in ext4/jbd2. I
will take a look at the git tree and do a cleanup.


Thanks for pointing this out.

-
jbd2: lockdep support for jbd2
 Except lockdep doesn't know about journal_start(), which has ranking
 requirements similar to a semaphore.  

Ported from lockdep jbd patch.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/jbd2/transaction.c |   11 +++
 include/linux/jbd2.h  |4 
 2 files changed, 15 insertions(+)

Index: linux-2.6.24-rc7/fs/jbd2/transaction.c
===
--- linux-2.6.24-rc7.orig/fs/jbd2/transaction.c 2008-01-16 15:30:24.0 
-0800
+++ linux-2.6.24-rc7/fs/jbd2/transaction.c  2008-01-16 15:41:14.0 
-0800
@@ -241,6 +241,8 @@ out:
return ret;
 }
 
+static struct lock_class_key jbd2_handle_key;
+
 /* Allocate a new handle.  This should probably be in a slab... */
 static handle_t *new_handle(int nblocks)
 {
@@ -251,6 +253,9 @@ static handle_t *new_handle(int nblocks)
handle-h_buffer_credits = nblocks;
handle-h_ref = 1;
 
+   lockdep_init_map(handle-h_lockdep_map, jbd2_handle,
+   jbd2_handle_key, 0);
+
return handle;
 }
 
@@ -293,7 +298,11 @@ handle_t *jbd2_journal_start(journal_t *
jbd2_free_handle(handle);
current-journal_info = NULL;
handle = ERR_PTR(err);
+   goto out;
}
+
+   lock_acquire(handle-h_lockdep_map, 0, 0, 0, 2, _THIS_IP_);
+out:
return handle;
 }
 
@@ -1419,6 +1428,8 @@ int jbd2_journal_stop(handle_t *handle)
spin_unlock(journal-j_state_lock);
}
 
+   lock_release(handle-h_lockdep_map, 1, _THIS_IP_);
+
jbd2_free_handle(handle);
return err;
 }
Index: linux-2.6.24-rc7/include/linux/jbd2.h
===
--- linux-2.6.24-rc7.orig/include/linux/jbd2.h  2008-01-16 15:29:03.0 
-0800
+++ linux-2.6.24-rc7/include/linux/jbd2.h   2008-01-16 15:29:54.0 
-0800
@@ -418,6 +418,10 @@ struct handle_s
unsigned inth_sync: 1;  /* sync-on-close */
unsigned inth_jdata:1;  /* force data journaling */
unsigned inth_aborted:  1;  /* fatal error on handle */
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+   struct lockdep_map  h_lockdep_map;
+#endif
 };
 
 




-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: checkpatch.pl warnings

2008-01-15 Thread Mingming Cao
On Tue, 2008-01-15 at 18:22 +0530, Aneesh Kumar K.V wrote:
 On Mon, Jan 14, 2008 at 12:49:27PM -0800, Mingming Cao wrote:
  Hi Guys,
  
  Could you check the checkpatch.pl warnings and see if it make sense to fix 
  them? Thanks!
  
  [EMAIL PROTECTED]:~/fs/ext4/stylecheck$ grep has style problems *
  linux-2.6.24-rc7-48-bit-i_blocks.patch.out:Your patch has style problems, 
  please review.  If any of these errors
  linux-2.6.24-rc7-ext3-4-migrate.patch.out:Your patch has style problems, 
  please review.  If any of these errors
  linux-2.6.24-rc7-ext4_export_iov_shorten_from_kernel_for_ext4.patch.out:Your
   patch has style problems, please review.  If any of these errors
  linux-2.6.24-rc7-ext4-journal_chksum-2.6.20.patch.out:Your patch has style 
  problems, please review.  If any of these errors
  linux-2.6.24-rc7-ext4_rec_len_overflow_with_64kblk_fix-v2.patch.out:Your 
  patch has style problems, please review.  If any of these errors
  linux-2.6.24-rc7-ext4_store_maxbytes_for_bitmaped_files.patch.out:Your 
  patch has style problems, please review.  If any of these errors
  linux-2.6.24-rc7-inode-version-ext4.patch.out:Your patch has style 
  problems, please review.  If any of these errors
  linux-2.6.24-rc7-jbd-stats-through-procfs.out:Your patch has style 
  problems, please review.  If any of these errors
  linux-2.6.24-rc7-large-file-blocktype.patch.out:Your patch has style 
  problems, please review.  If any of these errors
  linux-2.6.24-rc7-mballoc-core.patch.out:Your patch has style problems, 
  please review.  If any of these errors
  
 
 Fixed the checkpatch.pl warning for all the patches in the patch queue.
 The diff is attached below for review.

Thanks!

patch queue has been updated.

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: fix uniniatilized extend splitting error.

2008-01-10 Thread Mingming Cao
Thanks for catching this, I added your fix to the ext4 patch queue
http://repo.or.cz/w/ext4-patch-queue.git

On Thu, 2008-01-10 at 17:31 +0300, Dmitry Monakhov wrote:
 Hi,
 While playing with new fancy fallocate interface on ext4 i've triggered
 bug which corrupted my grub :).
 
 My testcase:
 
 blksize = 0x1000;
 fd = open(argv[1], O_RDWR|O_CREAT, 0700);
 unsigned long long sz = 0x1000UL;
 /* allocating big blocks chunk */
 syscall(__NR_fallocate, fd, 0, 0UL, sz)
 
 /* grab all other available filesystem space */
 tfd = open(tmp, O_RDWR|O_CREAT|O_DIRECT, 0700);
 while( write(tfd, buf, 4096)  0); /* loop untill ENOSPC */
 fsync(fd); /* just in case */
 while (pos  sz) {
   /* each seek+ write operation result in splits uninitialized extent
   in three extents. Splitting may result in new extent allocation
   which probably will fail because of ENOSPC*/
 
   lseek(fd, blksize*2 -1, SEEK_CUR);
   if ((ret = write(fd, 'a', 1)) != 1)
   exit(1);
   pos += blksize * 2;
 }
 
 Buggy place:
 
 ext4_ext_get_blocks(..., bh_result,..)
 {
   err = 0;
   allocated = 0;
 
ret = ext4_ext_convert_to_initialized(...)
   if (ret  0)
  By occasion real error code was lost here.
   goto out2
 
 out2:
 
   return err? err: allocated;
  Wow.. exit with 0, and caller assumes what bh_result was properly filled
and then will submit it for write. But in fact bh contains random data in
-b_bdev, -b_blocknr fileds :).
 }
 
 Signed-off-by: Dmitry Monakhov [EMAIL PROTECTED]
 ---
  fs/ext4/extents.c |5 +++--
  1 files changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index 8528774..fc8e508 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -2320,9 +2320,10 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode 
 *inode,
   ret = ext4_ext_convert_to_initialized(handle, inode,
   path, iblock,
   max_blocks);
 - if (ret = 0)
 + if (ret = 0) {
 + err = ret;
   goto out2;
 - else
 + } else
   allocated = ret;
   goto outnew;
   }

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: fix uniniatilized extend splitting error.

2008-01-10 Thread Mingming Cao
On Thu, 2008-01-10 at 14:31 -0700, Andreas Dilger wrote:
 On Jan 10, 2008  17:31 +0300, Dmitry Monakhov wrote:
  While playing with new fancy fallocate interface on ext4 i've triggered
  bug which corrupted my grub :).
 
 I notice I'm CC'd on this, but in fact Amit wrote the code.  I've CC'd
 him even though I expect he will notice it anyways.
 
  My testcase:
  
  blksize = 0x1000;
  fd = open(argv[1], O_RDWR|O_CREAT, 0700);
  unsigned long long sz = 0x1000UL;
  /* allocating big blocks chunk */
  syscall(__NR_fallocate, fd, 0, 0UL, sz)
  
  /* grab all other available filesystem space */
  tfd = open(tmp, O_RDWR|O_CREAT|O_DIRECT, 0700);
  while( write(tfd, buf, 4096)  0); /* loop untill ENOSPC */
  fsync(fd); /* just in case */
  while (pos  sz) {
  /* each seek+ write operation result in splits uninitialized extent
  in three extents. Splitting may result in new extent allocation
  which probably will fail because of ENOSPC*/
  
  lseek(fd, blksize*2 -1, SEEK_CUR);
  if ((ret = write(fd, 'a', 1)) != 1)
  exit(1);
  pos += blksize * 2;
  }
 
 Interesting test, and well thought out...
 
  Buggy place:
  
  ext4_ext_get_blocks(..., bh_result,..)
  {
  err = 0;
  allocated = 0;
  
   ret = ext4_ext_convert_to_initialized(...)
  if (ret  0)
   By occasion real error code was lost here.
  goto out2
  
  out2:
  
  return err? err: allocated;
   Wow.. exit with 0, and caller assumes what bh_result was properly 
  filled
 and then will submit it for write. But in fact bh contains random data 
  in
 -b_bdev, -b_blocknr fileds :).
  }
 
 The other item that Amit and I discussed in the past is in the case of
 ENOSPC it would be possible instead of splitting the extent to zero-fill
 the smaller extent (1 block in your test case) and write the whole thing
 as an initialized extent.  This could then either be merged with the
 previous or following allocated extent, or the whole extent zeroed if that
 was not possible.
 
 It would add some latency in the worst case to do this in the kernel,
 but this would only happen if there is no free space at all.  It might
 even be desirable to always zero-fill small extents instead of splitting
 uninitialized extents, because a random write of 64kB is not more expensive
 than 4kB and avoids overhead of splitting the nicely contiguous extent tree.
 

Zero-fill uninitialized extents all the time can cause extra IO, as that
zero-out (via get_block()) is happened at prepare write time rather than
page write out time.

I'd prefer the latency zero-out happens only when it needed. 


  Signed-off-by: Dmitry Monakhov [EMAIL PROTECTED]
  ---
   fs/ext4/extents.c |5 +++--
   1 files changed, 3 insertions(+), 2 deletions(-)
  
  diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
  index 8528774..fc8e508 100644
  --- a/fs/ext4/extents.c
  +++ b/fs/ext4/extents.c
  @@ -2320,9 +2320,10 @@ int ext4_ext_get_blocks(handle_t *handle, struct 
  inode *inode,
  ret = ext4_ext_convert_to_initialized(handle, inode,
  path, iblock,
  max_blocks);
  -   if (ret = 0)
  +   if (ret = 0) {
  +   err = ret;
  goto out2;
  -   else
  +   } else
  allocated = ret;
  goto outnew;
  }
  -- 
  1.5.3.1.40.g6972-dirty
  
  
  -
  To unsubscribe from this list: send the line unsubscribe linux-ext4 in
  the body of a message to [EMAIL PROTECTED]
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: Fix the soft lockup with multi block allocator.

2008-01-09 Thread Mingming Cao
On Thu, 2008-01-10 at 00:41 +0530, Aneesh Kumar K.V wrote:
 On Wed, Jan 09, 2008 at 07:44:30PM +0100, Jan Kara wrote:
  On Wed 09-01-08 23:54:28, Aneesh Kumar K.V wrote:
   On Wed, Jan 09, 2008 at 01:10:41PM +0100, Jan Kara wrote:
 With the multi block allocator when we don't have prealloc space we 
 discard
 @@ -3790,7 +3782,9 @@ repeat:
  
   /* if we still need more blocks and some PAs were used, try 
 again */
   if (free  needed  busy) {
 + busy = 0;
   ext4_unlock_group(sb, group);
 + schedule_timeout(HZ);
   goto repeat;
   }
  Hmm, wouldn't just schedule() be enough here? That would give a good
chance to other processes to proceed and we would avoid this artificial
wait of 1s which is quite ugly IMO.

   
   But then who will wake up the task ?. I have the below comment added to
   the patch in the patch queue.
As far as I know, you don't have to wake-up the task explicitely.
  Scheduler will simply schedule the task sometime in future (it is a similar
  situation as if the task got preempted in the kernel).
 
 I missed the current-state = TASK_UNINTERRUPTIBLE; in that code piece.
 So yes without setting the task state yes it would put it back to the run
 queue.Infact schedule_timeout() without changing the task state also
 behaves similarly. Now that that we know that it if fine just to have a
 schedule() there since schedule_timeout() was just behaving as
 schedule(). I guess we should make the change you suggested. In that
 case we can remove the comment which says we need to add the wait queue.
 
 Mingming,
 
 Do you want me to send a patch or can you make the modification ?
 

I could make the changes and update the mballoc patch in the queue.

Mingming
 -aneesh
 

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix up vfs inode version patch for modules

2007-12-03 Thread Mingming Cao
On Wed, 2007-11-28 at 14:33 -0600, Eric Sandeen wrote:
 ext4 calls inode_inc_iversion(), but it's not exported, so modular ext4
 has trouble.  Any reason not to just make it an inline, as below?
 

I agree.

Mingming
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 
 ---
 
 Index: linux-2.6.24-rc1/fs/inode.c
 ===
 --- linux-2.6.24-rc1.orig/fs/inode.c
 +++ linux-2.6.24-rc1/fs/inode.c
 @@ -1243,23 +1243,6 @@ void touch_atime(struct vfsmount *mnt, s
  EXPORT_SYMBOL(touch_atime);
 
  /**
 - * inode_inc_iversion  -   increments i_version
 - * @inode: inode that need to be updated
 - *
 - * Every time the inode is modified, the i_version field
 - * will be incremented.
 - * The filesystem has to be mounted with i_version flag
 - *
 - */
 -
 -void inode_inc_iversion(struct inode *inode)
 -{
 - spin_lock(inode-i_lock);
 - inode-i_version++;
 - spin_unlock(inode-i_lock);
 -}
 -
 -/**
   *   file_update_time-   update mtime and ctime time
   *   @file: file accessed
   *
 Index: linux-2.6.24-rc1/include/linux/fs.h
 ===
 --- linux-2.6.24-rc1.orig/include/linux/fs.h
 +++ linux-2.6.24-rc1/include/linux/fs.h
 @@ -1396,7 +1396,21 @@ static inline void inode_dec_link_count(
   mark_inode_dirty(inode);
  }
 
 -extern void inode_inc_iversion(struct inode *inode);
 +/**
 + * inode_inc_iversion - increments i_version
 + * @inode: inode that need to be updated
 + *
 + * Every time the inode is modified, the i_version field will be incremented.
 + * The filesystem has to be mounted with i_version flag
 + */
 +
 +static inline void inode_inc_iversion(struct inode *inode)
 +{
 +   spin_lock(inode-i_lock);
 +   inode-i_version++;
 +   spin_unlock(inode-i_lock);
 +}
 +
  extern void touch_atime(struct vfsmount *mnt, struct dentry *dentry);
  static inline void file_accessed(struct file *file)
  {
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/8] ext4: fix MB_DEBUG format warnings

2007-11-30 Thread Mingming Cao
On Fri, 2007-11-30 at 16:22 -0800, Andrew Morton wrote:
 On Fri, 30 Nov 2007 16:17:32 -0800
 Mingming Cao [EMAIL PROTECTED] wrote:
 
  Thanks for resending the patches. The first three patches in this series
  were missed in the ext4 patch queue, the rest of them are already queued
  there.  They are all in ext4 patch queue now. I will check other ext4
  patches you plan to merge to mm tree and get them sync in ext4 tree.
  
  http://repo.or.cz/w/ext4-patch-queue.git
  git://repo.or.cz/ext4-patch-queue.git
 
 What's all this about?  
That is the latest ext4 patch queue ( quilt format patches, not full
ext4 source tree).

 I've always been pulling
 ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-patches/LATEST/broken-out/
 (and continuously fixing the same rejects each time).
 
 
Ted periodically pull patches from the latest ext4 patch queue on
repo.or.cz, after they had survive fs tests, then, release the ext4
patches on kernel.org (the one you pulled from). I believe Ted is on
vacation now.

Mingming


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/8] ext4: fix MB_DEBUG format warnings

2007-11-30 Thread Mingming Cao
Hi Andrew,

Thanks for resending the patches. The first three patches in this series
were missed in the ext4 patch queue, the rest of them are already queued
there.  They are all in ext4 patch queue now. I will check other ext4
patches you plan to merge to mm tree and get them sync in ext4 tree.

http://repo.or.cz/w/ext4-patch-queue.git
git://repo.or.cz/ext4-patch-queue.git


Regards,
Mingming
On Thu, 2007-11-29 at 12:52 -0800, [EMAIL PROTECTED] wrote:
 From: Eric Sandeen [EMAIL PROTECTED]
 
 (applies at end of current ext4 patch series)
 
 Fix a slew of compile warnings from mismatched format specifiers
 when MB_DEBUG is turned on.
 
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 Signed-off-by: Andrew Morton [EMAIL PROTECTED]
 ---
 
  fs/ext4/mballoc.c |   20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)
 
 diff -puN fs/ext4/mballoc.c~ext4-fix-mb_debug-format-warnings 
 fs/ext4/mballoc.c
 --- a/fs/ext4/mballoc.c~ext4-fix-mb_debug-format-warnings
 +++ a/fs/ext4/mballoc.c
 @@ -954,7 +954,7 @@ static int ext4_mb_init_cache(struct pag
   get_bh(bh[i]);
   bh[i]-b_end_io = end_buffer_read_sync;
   submit_bh(READ, bh[i]);
 - mb_debug(read bitmap for group %u\n, first_group + i);
 + mb_debug(read bitmap for group %lu\n, first_group + i);
   }
 
   /* wait for I/O completion */
 @@ -2670,7 +2670,7 @@ static void ext4_mb_free_committed_block
   if (md == NULL)
   break;
 
 - mb_debug(gonna free %u blocks in group %u (0x%p):,
 + mb_debug(gonna free %u blocks in group %lu (0x%p):,
   md-num, md-group, md);
 
   err = ext4_mb_load_buddy(sb, md-group, e4b);
 @@ -3029,7 +3029,7 @@ static void ext4_mb_normalize_group_requ
 
   BUG_ON(lg == NULL);
   ac-ac_g_ex.fe_len = EXT4_SB(sb)-s_mb_group_prealloc;
 - mb_debug(#%u: goal %u blocks for locality group\n,
 + mb_debug(#%u: goal %lu blocks for locality group\n,
   current-pid, ac-ac_g_ex.fe_len);
  }
 
 @@ -3255,7 +3255,7 @@ static void ext4_mb_use_inode_pa(struct 
   BUG_ON(pa-pa_free  len);
   pa-pa_free -= len;
 
 - mb_debug(use %lu/%lu from inode pa %p\n, start, len, pa);
 + mb_debug(use %llu/%lu from inode pa %p\n, start, len, pa);
  }
 
  /*
 @@ -3282,7 +3282,7 @@ static void ext4_mb_use_group_pa(struct 
* pa and think that it have enought free blocks if we
* don't update pa_free here right ?
*/
 - mb_debug(use %lu/%lu from group pa %p\n, pa-pa_lstart-len, len, pa);
 + mb_debug(use %u/%u from group pa %p\n, pa-pa_lstart-len, len, pa);
  }
 
  /*
 @@ -3513,7 +3513,7 @@ static int ext4_mb_new_inode_pa(struct e
   pa-pa_deleted = 0;
   pa-pa_linear = 0;
 
 - mb_debug(new inode pa %p: %lu/%lu for %lu\n, pa,
 + mb_debug(new inode pa %p: %llu/%u for %u\n, pa,
   pa-pa_pstart, pa-pa_len, pa-pa_lstart);
 
   ext4_mb_use_inode_pa(ac, pa);
 @@ -3569,7 +3569,7 @@ static int ext4_mb_new_group_pa(struct e
   pa-pa_deleted = 0;
   pa-pa_linear = 1;
 
 - mb_debug(new group pa %p: %lu/%lu for %lu\n, pa,
 + mb_debug(new group pa %p: %llu/%u for %u\n, pa,
   pa-pa_pstart, pa-pa_len, pa-pa_lstart);
 
   ext4_mb_use_group_pa(ac, pa);
 @@ -4072,8 +4072,8 @@ static int ext4_mb_initialize_context(st
* locality group. this is a policy, actually */
   ext4_mb_group_or_file(ac);
 
 - mb_debug(init ac: %u blocks @ %llu, goal %llu, flags %x, 2^%d, 
 - left: %llu/%llu, right %llu/%llu to %swritable\n,
 + mb_debug(init ac: %u blocks @ %u, goal %u, flags %x, 2^%d, 
 + left: %u/%u, right %u/%u to %swritable\n,
   (unsigned) ar-len, (unsigned) ar-logical,
   (unsigned) ar-goal, ac-ac_flags, ac-ac_2order,
   (unsigned) ar-lleft, (unsigned) ar-pleft,
 @@ -4288,7 +4288,7 @@ static int ext4_mb_free_metadata(handle_
   page_cache_get(e4b-bd_bitmap_page);
   db-bb_md_cur = md;
   db-bb_tid = handle-h_transaction-t_tid;
 - mb_debug(new md 0x%p for group %u\n,
 + mb_debug(new md 0x%p for group %lu\n,
   md, md-group);
   } else {
   kfree(md);
 _

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 6][PATCH 1/1]Export jbd stats through procfs

2007-11-30 Thread Mingming Cao
On Fri, 2007-11-30 at 17:08 -0600, Eric Sandeen wrote:
 Mingming Cao wrote:
  [PATCH] jbd2 stats through procfs
  
  The patch below updates the jbd stats patch to 2.6.20/jbd2.
  The initial patch was posted by Alex Tomas in December 2005
  (http://marc.info/?l=linux-ext4m=113538565128617w=2).
  It provides statistics via procfs such as transaction lifetime and size.
  
  [ This probably should be rewritten to use debugfs?   -- Ted]
  
  Signed-off-by: Johann Lombardi [EMAIL PROTECTED]
 
 I've started going through this one to clean it up to the point where it
 can go forward.  It's been sitting at the top of the unstable portion of
 the patch queue for long enough, I think :)
 
Thanks!

 I've converted the msecs to jiffies until the user boundary, changed the
 union #defines as suggested by Andrew, and various other little issues etc.
 
 Remaining to do is a generic time-difference calculator (instead of
 jbd2_time_diff), and looking into whether it should be made a config
 option; I tend to think it should, but it's fairly well sprinkled
 through the code, so I'll see how well that works.
 
 Also did we ever decided if this should go to debugfs?
 

I thought it was decided to keep it on procfs as debugfs is not always
on...
 Thanks,
 
 -Eric
 -
 To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ext4 still broken on arm (at least)

2007-11-28 Thread Mingming Cao

Andrew Morton wrote:

On Wed, 28 Nov 2007 11:00:20 +0530 Aneesh Kumar K.V [EMAIL PROTECTED] wrote:


On Tue, Nov 27, 2007 at 09:14:44PM -0800, Andrew Morton wrote:

fs/ext4/mballoc.c: In function `ext4_mb_generate_buddy':
fs/ext4/mballoc.c:836: error: implicit declaration of function `ext2_find_next_bit' 


I actually sent in a patch which changes asking for review to
linux-arch. I haven't got the response yet.


Don't expect one...


Attaching the patch
below 




Introduce ext4_find_next_bit

From: Aneesh Kumar K.V [EMAIL PROTECTED]

This gets used by the ext4 multi block allocator patches.

Also add generic_find_next_le_bit

Signed-off-by: Aneesh Kumar K.V [EMAIL PROTECTED]
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 include/asm-arm/bitops.h |2 +
 include/asm-generic/bitops/ext2-non-atomic.h |2 +
 include/asm-generic/bitops/le.h  |4 ++
 include/asm-m68k/bitops.h|2 +
 include/asm-m68knommu/bitops.h   |2 +
 include/asm-powerpc/bitops.h |4 ++
 include/asm-s390/bitops.h|2 +
 include/linux/ext4_fs.h  |1 +
 lib/find_next_bit.c  |   43 ++


May as well merge it I guess.  I'll do that.  Possibly it should be
merged into the ext4 tree instead, so that people can build the ext4 devel
tree on other architectures but obviously there's no call for that.


I will merge it to ext4 development tree.

Mingming


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext4: dir inode reservation V3

2007-11-19 Thread Mingming Cao
On Tue, 2007-11-13 at 22:12 +0800, Coly Li wrote:
 Basic idea of my dir inode reservation patch can be found here,
 http://lists.openwall.net/linux-ext4/2007/11/05/3
 
 1, What does dir inode reservation do
 Dir inode reservation tries to reserve several inodes in inodes table for a 
 directory when this
 directory is created. When create new file under this directory, try to 
 allocate inode from the
 reserved inodes area. This is called as dir_ireserve inode allocator.
 
Thanks for the update.

Let me try to understand your method:

So the basic idea is not do linear inode allocation for directory? Inode
structure block for directory file is only coming from block 0, N, N
+N,... where the number of skipped blocks N is stored in the in-core
superblock structure. 

When ever need to allocate an inode for directory, skip N reserved bits
(space for N*16 inodes) if the previous block is already allocated. That
way place two directories with the hole of N*16 inodes structures, then
allow files under the first directory stay closer with their parent
directory. Is this correct?
 

 4, Dir inode reservation is optional
 Dir inode reservation is optional, you can use -o followed by one of these 
 options to enable dir
 inode reservation during mount ext4 file system:
  dir_ireserve=low
  dir_ireserve=normal
  dir_ireserve=high

Would be nice to pass the tuning info low/normal/high(16/64/128 blocks
correspondingly) via something else rather than mount options. 
 
 Currently, 'low' reserves 15 file inodes for each directory, 'normal' 
 reserves 31 inodes and 'high'
 reserves 127 inodes. Reserving more than 127 inodes does not help to 
 performance obviously.
 
 
 5, Performance number
 On a Core-Duo, 2MB DDM memory, 7200 RPM SATA PC, I built a 50GB ext4 
 partition, and tried to create
 5 directories, and create 15 (1KB) files in each directory alternatively. 
 After a remount, I
 tried to remove all the directories and files recursively by a 'rm -rf'. 
 Bellow is the benchmark result,
   normal ext4 ext4 with dir inode 
 reservation
   mount options:  -o data=writeback   -o 
 data=writeback,dir_ireserve=low
   Create dirs:real0m49.101s   real2m59.703s
   Create files:   real24m17.962s  real21m8.161s
   Unlink all: real24m43.788s  real17m29.862s
 Creating dirs with dir inode reservation is slower than normal ext4 as 
 predicted, because allocating
 directory inodes in non-linear order will cause extra hard disk seeking and 
 block I/O.

Hmm...I suspect there is bug in your patch, the extra seek should not
contribute to 4 times slower

 
  #include linux/time.h
 @@ -478,6 +480,75 @@ static int find_group_other(struct super_block *sb, 
 struct inode *parent,
   return -1;
  }
 
 +static int ext4_ino_from_ireserve(handle_t *handle, struct inode *dir,
 +   int mode, ext4_group_t *group, unsigned long *ino)
 +{
 + struct super_block *sb;
 + struct ext4_sb_info *sbi;
 + struct ext4_group_desc *gdp = NULL;
 + struct buffer_head *gdp_bh = NULL, *bitmap_bh = NULL;
 + ext4_group_t ires_group = *group;
 + unsigned long ires_ino;
 + int i, bit;
 +
 + sb = dir-i_sb;
 + sbi = EXT4_SB(sb);
 +
 + /* if the inode number is not for directory,
 +  * only try to allocate after directory's inode
 +  */
 + if (!S_ISDIR(mode)) {
 + *ino = dir-i_ino % EXT4_INODES_PER_GROUP(sb);
 + return 0;
 + }
 +
 + /* reserve inodes for new directory */
 + for (i = 0; i  sbi-s_groups_count; i++) {
 + gdp = ext4_get_group_desc(sb, ires_group, gdp_bh);
 + if (!gdp)
 + goto fail;
 + bit = 0;
 +try_same_group:
 + if (bit  EXT4_INODES_PER_GROUP(sb)) {
 + brelse(bitmap_bh);
 + bitmap_bh = read_inode_bitmap(sb, ires_group);
 + if (!bitmap_bh)
 + goto fail;
 +
 + BUFFER_TRACE(bitmap_bh, get_write_access);
 + if (ext4_journal_get_write_access(
 + handle, bitmap_bh) != 0)
 + goto fail;
 + if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, ires_group),
 + bit, bitmap_bh-b_data)) {
 + /* we won it */
 + BUFFER_TRACE(bitmap_bh,
 + call ext4_journal_dirty_metadata);
 + if (ext4_journal_dirty_metadata(handle,
 + bitmap_bh) != 0)
 + goto fail;
 + ires_ino = bit;
 + goto find;
 + }
 + /* we lost it */
 +

Re: [2.6 patch] ext4/super.c: fix #ifdef's

2007-11-05 Thread Mingming Cao
Acked-by: Mingmming Cao [EMAIL PROTECTED]

Ted, I added this patch in ext4 patch queue.

On Mon, 2007-11-05 at 18:07 +0100, Adrian Bunk wrote:
 This patch fixes the names of two variables in #ifdef's.
 
 Based on a report by Robert P. J. Day.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
 
 ---
 
  fs/ext4/super.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 44e9889e6a3952ea225704b2e494d31e00f34a6b 
 diff --git a/fs/ext4/super.c b/fs/ext4/super.c
 index 8031dc0..6673672 100644
 --- a/fs/ext4/super.c
 +++ b/fs/ext4/super.c
 @@ -646,7 +646,7 @@ static int ext4_show_options(struct seq_file *seq, struct 
 vfsmount *vfs)
   seq_puts(seq, ,debug);
   if (test_opt(sb, OLDALLOC))
   seq_puts(seq, ,oldalloc);
 -#ifdef CONFIG_EXT4_FS_XATTR
 +#ifdef CONFIG_EXT4DEV_FS_XATTR
   if (test_opt(sb, XATTR_USER))
   seq_puts(seq, ,user_xattr);
   if (!test_opt(sb, XATTR_USER) 
 @@ -654,7 +654,7 @@ static int ext4_show_options(struct seq_file *seq, struct 
 vfsmount *vfs)
   seq_puts(seq, ,nouser_xattr);
   }
  #endif
 -#ifdef CONFIG_EXT4_FS_POSIX_ACL
 +#ifdef CONFIG_EXT4DEV_FS_POSIX_ACL
   if (test_opt(sb, POSIX_ACL))
   seq_puts(seq, ,acl);
   if (!test_opt(sb, POSIX_ACL)  (def_mount_opts  EXT4_DEFM_ACL))
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC]JBD2: Fix journal checksum kernel oops on NUMA

2007-11-05 Thread Mingming Cao
On Mon, 2007-11-05 at 10:07 -0800, Badari Pulavarty wrote: 
 On Tue, 2007-11-06 at 00:15 +0800, Andreas Dilger wrote:
  On Nov 05, 2007  08:04 -0800, Badari Pulavarty wrote:
   On Sat, 2007-11-03 at 09:36 +0800, Andreas Dilger wrote:
But...  this implies that every user of bh-b_data needs to kmap, and I
don't see that in the code anywhere else.  That makes me think something
else is going wrong here.
   
   Most cases, this is handled in ll_rw_block() code - when we submit the
   buffer head for IO. If the page is in highmem, we will end up creating
   a bounce bufer for it. 
   
   In our case, JBD code is trying to look at the data to do checksum
   on it. Thats why we have to kmap() the page before looking.
  
  My point is that there is a LOT of code in ext[234] that dereferences
  bh-b_data without kmap() (e.g. group descriptors, bitmaps, superblock,
  inode tables, etc).  Does that imply that something is forcing those
  bh pages into lowmem, or is the journal bh page in question being
  allocated in some different way that allows it to be in highmem?
 
 Yes. You are right. Its been a while since I had to deal with HIGHMEM.
 All the meta-data should be in LOWMEM. I asked Mingming to verify
 what the buffer-head is pointing to when it has HIGHMEM page.
 

The buffer_heads with NULL bh-b_data(under the start_journal_io
branch in jbd2_journal_commit_transaction() code) is created by
jbd2_journal_write_metadata_buffer().

Noticed that in jbd2_journal_write_metadata_buffer(),  there are
multiple places which do kmap_atomic() to access the journal bh page
(new_page).  In the normal case the new_page is pointing to the bh
pages, which(the page) was initially allocated by _page_cache_alloc()
(sb_bread-__bread()-_...find_or_create_page()-_page_cache_alloc()

In the case it need a data copy (the buffer start with the
JBD2_MAGIC_NUMBER?), a new page is allocated by by
__get_free_pages()(via jbd2_alloc, which is possible allocated in
highmem. __get_free_pages calls alloc_pages() directly, doesn't seem to
have highmem handling like __page_cache_alloc(). 

I am not sure why we saw this issue on 2.6.23 kernel, where
jbd2_slab_alloc()-kmem_cache_alloc() is used. Isn't all slab pages
under lowmem?


Regards,

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][RFC]Ext4: Use get_cpu()/put_cpu() in preemptible context

2007-11-02 Thread Mingming Cao
Fix the warning of:

printk: 369 messages suppressed.
BUG: using smp_processor_id() in preemptible [0001] code: fsx-linux/31702
caller is ext4_mb_new_blocks+0x2aa/0x1319

Signed-off-by: Mingming Cao [EMAIL PROTECTED]

---
 fs/ext4/mballoc.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.24-rc1/fs/ext4/mballoc.c
===
--- linux-2.6.24-rc1.orig/fs/ext4/mballoc.c 2007-11-02 17:22:18.0 
-0700
+++ linux-2.6.24-rc1/fs/ext4/mballoc.c  2007-11-02 17:23:02.0 -0700
@@ -4006,7 +4006,8 @@ static void ext4_mb_group_or_file(struct
return;
 
BUG_ON(ac-ac_lg != NULL);
-   ac-ac_lg = sbi-s_locality_groups[smp_processor_id()];
+   ac-ac_lg = sbi-s_locality_groups[get_cpu()];
+   put_cpu();
 
/* we're going to use group allocation */
ac-ac_flags |= EXT4_MB_HINT_GROUP_ALLOC;


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][RFC]JBD2: Fix journal checksum kernel oops on NUMA

2007-11-01 Thread Mingming Cao
JBD2: Fix NULL pointer bh-b_data on NUMA box with journal checksumming.

Current journal checksumming patch failed fsstress test on NUMA. The 
bh-b_data passed to the crc32_be () function could be NULL pointer, 
which caused kernel oops immediately when running fsstress with -o 
journal_checksum. It is because the page is part of highmem on NUMA box.
We need to kmap the page before access the bh-b_data to calculate
the checksums.
 
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

---
 fs/jbd2/commit.c |   19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

Index: linux-2.6.24-rc1/fs/jbd2/commit.c
===
--- linux-2.6.24-rc1.orig/fs/jbd2/commit.c  2007-11-01 11:15:08.0 
-0700
+++ linux-2.6.24-rc1/fs/jbd2/commit.c   2007-11-01 12:27:02.0 -0700
@@ -352,6 +352,20 @@ write_out_data:
journal_do_submit_data(wbuf, bufs);
 }
 
+static inline __u32 jbd2_checksum_data(__u32 crc32_sum, struct buffer_head *bh)
+{
+   struct page *page = bh-b_page;
+   char *addr;
+   __u32 checksum;
+
+   addr = kmap(page);
+   checksum = crc32_be(crc32_sum,
+   (void *)(addr + offset_in_page(bh-b_data)), bh-b_size);
+   kunmap(page);
+
+   return checksum;
+}
+
 static inline void write_tag_block(int tag_bytes, journal_block_tag_t *tag,
   unsigned long long block)
 {
@@ -715,9 +729,8 @@ start_journal_io:
 */
if (JBD2_HAS_COMPAT_FEATURE(journal,
JBD2_FEATURE_COMPAT_CHECKSUM)) {
-   crc32_sum = crc32_be(crc32_sum,
-   (void *)bh-b_data,
-   bh-b_size);
+   crc32_sum =
+   jbd2_checksum_data(crc32_sum, bh);
}
 
lock_buffer(bh);


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: problem with delayed allocation option

2007-10-26 Thread Mingming Cao
On Fri, 2007-10-26 at 14:28 +0200, Valerie Clement wrote:
 Hi all,
 
Hi Valerie,

 I ran a small test which creates one directory and 2O 8-KB size files in it.
 
 When the filesystem is mounted without the delalloc option, here is the
 output of the command dumpe2fs for the group in which the directory and 
 the files are created:
 
 Group 532 : (Blocks 17432576-17465343)
Block bitmap at 17432576 (+0), Inode bitmap at 17432577 (+1)
Inode table at 17432578-17433089 (+2)
32213 free blocks, 16363 free inodes, 1 directories
Free blocks : 17433090-17459199, 17459241-17465343
Free inodes : 8716310-8732672
 
 
 When the filesystem is mounted with the delalloc option, the same test
 gives a different result:
 
 Group 395 : (Blocks 12943360-12976127)
Block bitmap at 12943360 (+0), Inode bitmap at 12943361 (+1)
Inode table at 12943362-12943873 (+2)
32213 free blocks, 16363 free inodes, 1 directories
Free blocks : 12943874-12955647, 12955650-12955655, 
 12955658-12955663, 12955666-12955671, 12955674-12955679, 
 12955682-12955687, 12955690-12955695, 12955698-12955703, 
 12955706-12955711, 12955714-12955719, 12955722-12955727, 
 12955730-12955735, 12955738-12955743, 12955746-12955751, 
 12955754-12955759, 12955762-12955767, 12955770-12955775, 
 12955778-12955783, 12955786-12955791, 12955794-12955799, 
 12955802-12961791, 12961793-12976127
Free inodes : 6471702-6488064
 
 In the first case, the allocated blocks are contiguous whereas they are
 not in the second case.
 
 After adding traces in the code to understand why the behavior is
 different with the delalloc option, I found that the problem is related
 to the inode reservation window.

 To simplify, without the delalloc option we have the following scenario:
 For each inode,
   - call alloc_new_reservation() to allocate a new reservation window
   - allocate blocks for data
   - write data to disk
   - ext4_discard_reservation() when the inode is closed.
 
 With the delalloc option, when the data are written to disk we have:
 For each inode,
   - call alloc_new_reservation() to allocate a new reservation window
   - allocate blocks for data
   - write data to disk
 
 
 I think a call to ext4_discard_reservation() is missing somewhere and
 the question is where.
 
Oh, that should be block reservation, not inode reservation window.

The problem with delayed allocation and block reservation is, we don't
know when suppose to close the window, as the file maybe closed with
diry data in cache,and the blocks has not be allocated yet. We would
like to keep the window open so that later delayed allocation happens,
the allocation could take advantage of the reservation. But on the other
hand, that may leads fs external fragmentation.

with mballoc, ext3 block reservation should be turned off and replaced
with the group-in-core-preallocation. 

Has the new delayed allocation integrated with mballoc yet?

 I tried to add this call at the end of the ext4_da_get_block_write()
 function. This seems to fix the problem as the blocks are allocated
 contiguously on disk but the function seems to be called too many times
 so I think it is perhaps not the right place to call it.
 
 Who could look into this problem?
 I've got a few days off so I couldn't help more next days, but the
 problem is easily reproductible.
 
 Wouldn't this also explain why the compilebench results posted by Chris
 Mason are bad in some cases?
 
Valérie
 
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix oops with jbd-stats-through-procfs and external journal

2007-10-25 Thread Mingming Cao
Thanks, Added to ext4 patch queue at
http://repo.or.cz/w/ext4-patch-queue.git

On Thu, 2007-10-25 at 11:39 -0500, Eric Sandeen wrote:
 When using an external device for the journal,
 jbd2_stats_proc_init() wants to use journal-j_dev in its call
 to bdevname() but it's not assigned yet, resulting in an oops.
 
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 
 --- linux.orig/fs/jbd2/journal.c  2007-10-25 11:36:25.772354262 -0500
 +++ linux/fs/jbd2/journal.c   2007-10-25 11:36:35.058278242 -0500
 @@ -1035,11 +1035,11 @@
   journal = NULL;
   goto out;
   }
 - jbd2_stats_proc_init(journal);
   journal-j_dev = bdev;
   journal-j_fs_dev = fs_dev;
   journal-j_blk_offset = start;
   journal-j_maxlen = len;
 + jbd2_stats_proc_init(journal);
 
   bh = __getblk(journal-j_dev, start, journal-j_blocksize);
   J_ASSERT(bh != NULL);
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] i_version update - vfs part

2007-10-25 Thread Mingming Cao
On Thu, 2007-10-25 at 19:04 +0200, Cordenner jean noel wrote:
 Hi,
 
 This is an update of the previous patches on the ext4 git tree, the 2
 coming patches applies at the end of the current ext4-patch-queue, and
 replaces the inode-version related patches:
 64-bit-i_version.patch
 i_version_hi.patch
 ext4_i_version_hi_2.patch
 i_version_update_ext4.patch
 
 The first part deals with the vfs part. 
 The i_version field of the inode is changed to be a 64-bit counter that
 is set on every inode creation and that is incremented every time the
 inode data is modified (similarly to the ctime time-stamp). 
 The aim is to fulfill a NFSv4 requirement for rfc3530.
 This first part concerns the vfs, it converts the 32-bit i_version in
 the generic inode to a 64-bit, a flag is added in the super block in
 order to check if the feature is enabled and the i_version is
 incremented in the vfs.
 
Thanks for reposting it.

 Index: linux-2.6.23-ext4-1/fs/inode.c
 ===
 --- linux-2.6.23-ext4-1.orig/fs/inode.c   2007-10-25 16:15:52.0
 +0200
 +++ linux-2.6.23-ext4-1/fs/inode.c2007-10-25 16:25:53.0 +0200
 @@ -1216,6 +1216,24 @@
  EXPORT_SYMBOL(touch_atime);
 
  /**
 + * inode_inc_iversion  -   increments i_version
 + * @inode: inode that need to be updated
 + *
 + * Every time the inode is modified, the i_version field
 + * will be incremented.
 + * The filesystem has to be mounted with i_version flag
 + *
 + */
 +
 +void inode_inc_iversion(struct inode *inode)
 +{
 + spin_lock(inode-i_lock);
 + inode-i_version++;
 + spin_unlock(inode-i_lock);
 +}

I wonder do we really need i_lock here for inode versioning update?

Understand this is a 64 bit counter, but file_update_time() and
ext4_mark_inode_dirty() (where the inode version is updated) is called
on the file write path so i_mutex should be hold all the time. As long
as the read patch holding i_mutex everything should be fine, isn't it?

Have you get a chance to check the performance impact to ext4?

 +
 +/**
   *   file_update_time-   update mtime and ctime time
   *   @file: file accessed
   *
 @@ -1249,6 +1267,11 @@
   sync_it = 1;
   }
 
 + if (IS_I_VERSION(inode)) {
 + inode_inc_iversion(inode);
 + sync_it = 1;
 + }
 +
   if (sync_it)
   mark_inode_dirty_sync(inode);
  }
 
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: One request for people editing the patch queue....

2007-10-25 Thread Mingming Cao
On Thu, 2007-10-25 at 15:49 -0400, Theodore Ts'o wrote:
 Could people please use more descriptive names for patch names?  Names
 like: 
 
 ext4-cleanup-2.patch
 ext4-cleanup-3.patch
 ext4-cleanup-4.patch
 
 aren't very useful when browsing through the series file.
 
 Thanks,
 

Hi Ted,

I was me being lazy, I just took an updated tar ball from Aneesh without
carefully rename them to a proper names.

Updated ext4-patch-queue patches with meaningful names now.

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ext4 patch queue updated(2.6.23-git17)

2007-10-23 Thread Mingming Cao
http://repo.or.cz/w/ext4-patch-queue.git
This patch queue is against mainline.

Changes: 

Rebase ext4 patch queue after Linus pull patches from ext4 git tree.
Added ext4-Return-after-ext4_error-in-case-of-failures.patch from Aneesh
Added mballoc-compilebench-fix.patch from Aneesh
Reordered series to move some cleanup patches to stable series.


# This series applies on 2.6.24-git17
#

#
# Large blocksize support for ext4
#
ext4_large_blocksize_support.patch
ext4_rec_len_overflow_with_64kblk_fix-v2.patch

large-file-blocktype.patch

#block group varibles ext4_grpnum_t
ext4_grpnum_t.patch
ext4_grpnum_t_int_fix.patch

#large file support
ext4-cleanup.patch
ext4-cleanup-2.patch
ext4-cleanup-3.patch
ext4-cleanup-4.patch
48-bit-i_blocks.patch
large-file.patch
ext2_fix_max_size.patch
ext3_fix_max_size.patch

ext4_sync_group_desciptor_with_e2fsprogs.patch
ext4-Return-after-ext4_error-in-case-of-failures.patch

# unstable patches still have outstanding comments from lkml

stable-boundary
stable-boundary-undo.patch

# Export jbd stats through procfs
# Fold together:
#ext4-jbd-stats-through-procfs.patch
#ext4-jbd-stats-through-procfs_fix.patch
#ext4-fs-jbd2-journalc-kmalloc-memset-conversion-to-kzalloc.patch
# still have comments from akpm to be addressed
jbd-stats-through-procfs

# Add journal checksums
ext4-journal_chksum-2.6.20.patch
ext4-journal-chksum-review-fix.patch

# inode verion patch series
# inode versioning is needed for NFSv4
# Need to address overwrite case and performance concerns
# Need to repost to lkml for comments before submit
#
# vfs changes, 64 bit inode-i_version
64-bit-i_version.patch
# reserve hi 32 bit inode version on ext4 on-disk inode
i_version_hi.patch
# ext4 inode version read/store
ext4_i_version_hi_2.patch
# ext4 inode version update
i_version_update_ext4.patch

# New delayed allocation patch
delalloc-vfs.patch
delalloc-ext4.patch

ext-truncate-mutex.patch
ext3-4-migrate.patch

###
#mballoc
###
# n.b. in the rc7-mm1 tree as introduce-ext4_find_next_bit.patch
generic-find-next-le-bit
new-extent-function.patch
mballoc-core.patch
mballoc-bug-workaround.patch
ext4_grpnumt-mballoc-fix.patch
mballoc-compilebench-fix.patch

# Large block support for blocksize  pagesize
# Needed for Christoph Lameter's largeblock patchset
# to support large block on system that
# blocksize  pagesize
jbd-blocks-reservation-fix-for-large-blk.patch
jbd2-blocks-reservation-fix-for-large-blk.patch

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: compilebench numbers for ext4

2007-10-22 Thread Mingming Cao
On Mon, 2007-10-22 at 19:31 -0400, Chris Mason wrote:
 Hello everyone,
 
 I recently posted some performance numbers for Btrfs with different
 blocksizes, and to help establish a baseline I did comparisons with
 Ext3.
 

Thanks for doing this, Chris!

 The graphs, numbers and a basic description of compilebench are here:
 
 http://oss.oracle.com/~mason/blocksizes/
 
 Ext3 easily wins the read phase, but scores poorly while creating files
 and deleting them.  Since ext3 is winning the read phase, we can assume
 the file layout is fairly good.  I think most of the problems during the
 write phase are caused by pdflush doing metadata writeback.  The file
 data and metadata are written separately, and so we end up seeking
 between things that are actually close together.
 
 Andreas asked me to give ext4 a try, so I grabbed the patch queue from
 Friday along with the latest Linus kernel.  The FS was created with:
 
 mkfs.ext3 -I 256 /dev/
 mount -o delalloc,mballoc,data=ordered -t ext4dev /dev/
 
 I did expect delayed allocation to help the write phases of
 compilebench, especially the parts where it writes out .o files in
 random order (basically writing medium sized files all over the
 directory tree).

Unfortunately delayed allocation support for ordered mode is not there
yet. 

   But, every phase except reads showed huge
 improvements.
 
 http://oss.oracle.com/~mason/compilebench/ext4/ext-create-compare.png
 http://oss.oracle.com/~mason/compilebench/ext4/ext-compile-compare.png
 http://oss.oracle.com/~mason/compilebench/ext4/ext-read-compare.png
 http://oss.oracle.com/~mason/compilebench/ext4/ext-rm-compare.png
 
 To match the ext4 numbers with Btrfs, I'd probably have to turn off data
 checksumming...
 
 But oddly enough I saw very bad ext4 read throughput even when reading
 a single kernel tree (outside of compilebench).  The time to read the
 tree was almost 2x ext3.  Have others seen similar problems?
 
thanks for point this out, will run compilebench. 

Trying to understand the Disk IO graph
http://oss.oracle.com/~mason/compilebench/ext4/ext-read-compare.png
it looks like ext3 the blocks are spread over the disk, while ext4 is
more around the same place, is this right?

 I think the ext4 delete times are so much better than ext3 because this
 is a single threaded test.  delayed allocation is able to get
 everything into a few extents, and these all end up in the inode.  So,
 the delete phase only needs to seek around in small directories and
 seek to well grouped inodes.  ext3 probably had to seek all over for
 the direct/indirect blocks.
 
 So, tomorrow I'll run a few tests with delalloc and mballoc
 independently, but if there are other numbers people are interested in,
 please let me know.
 
 (test box was a desktop machine with single sata drive, barriers were
 not used).
 
 -chris
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-18 Thread Mingming Cao
On Wed, 2007-10-17 at 21:09 -0700, Andrew Morton wrote:
 On Thu, 11 Oct 2007 13:18:49 +0200 Jan Kara [EMAIL PROTECTED] wrote:
 
  With 64KB blocksize, a directory entry can have size 64KB which does not fit
  into 16 bits we have for entry lenght. So we store 0x instead and 
  convert
  value when read from / written to disk.
 
 btw, this changes ext2's on-disk format.
 
Just to clarify this is only changes the directory entries format on
ext2/3/4 fs with 64k block size. But currently without kernel changes
ext2/3/4 does not support 64 block size.

 a) is the ext2 format documented anywhere?  If so, that document will
need updating.
 

The e2fsprogs needs to be changed to sync up with this change.

Ted has a paper a while back to show ext2 disk format 
http://web.mit.edu/tytso/www/linux/ext2intro.html

The Documentation/filesystems/ext2.txt doesn't have the ext2 format
documented. That document is out-dated need to be reviewed and cleaned
up.
 
 b) what happens when an old ext2 driver tries to read and/or write this
directory entry?  Do we need a compat flag for it?
 
 c) what happens when old and new ext3 or ext4 try to read/write this
directory entry?
 

Without the first patch in this series: ext2 large blocksize support
patches, it fails to mount a ext2 filesystem with 64k block size. 

[PATCH 1/2] ext2:  Support large blocksize up to PAGESIZE
http://lkml.org/lkml/2007/10/1/361

So the old ext2/3/4 driver will not get access the directory entry with
64k block size format changes.


Regards,

Mingming

 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: question about patch queue and ext4-git

2007-10-15 Thread Mingming Cao
On Mon, 2007-10-15 at 22:29 +0800, Coly Li wrote:
 
 Jose R. Santos wrote:
  On Mon, 15 Oct 2007 20:46:32 +0800
  Coly Li [EMAIL PROTECTED] wrote:
  
  Thanks for the replying :-)
 
  Jose R. Santos wrote:
  On Mon, 15 Oct 2007 13:36:05 +0800
  Coly Li [EMAIL PROTECTED] wrote:
 
  Now in my mind there are several words for ext4 patches, most frequently 
  one are patch queue.
 
  I see the patches in patch queue from
  http://www2.kernel.org/pub/linux/kernel/people/tytso/ext4-patches/LATEST/broken-out/
   .
  Also I confirm some of the patches are in ext4 git tree now, but I am 
  not sure for two questions:
  1) Whether all the patches are in ext4 git tree ?
  Yes, all these patches should be in ext4 git tree.
 
  2) This patch queue is only used to push ext4 patch into upstream ?
  The patch queue series is divided into stable and unstable patches.
  The stable patches are the one usually the ones used to push back
  upstream, while the unstable section has the patches for development
  purposes only and are not ready for pushing upstream (and some may
  never make it in).
  How to recognize which patch is stable patch and which one is unstable 
  patch ?
  
  Look in the series file.  The mark where the stable patches end is
  documented there.

 Find it now. Thanks for your explaining :-)


when a new ext4 patch is being submitted, it first added to the patch
queue on repo.or.cz, where the whole series (stable and unstable
patches) will be tested automatically. After the patches being stablized
a bit, Ted will pull ext4 patches from patch queue from repo.or.cz to
his git tree, and public the current patches in the git tree on
kernel.org.

Also there is a stable-boundary patch being added in repo.or.cz patch
queue to indicating where the unstable patches starts.

Mingming
 
  Also there is a patch-queue git at 
  http://repo.or.cz/w/ext4-patch-queue.git , is it same to the
  patches in 
  http://www2.kernel.org/pub/linux/kernel/people/tytso/ext4-patches/ ?
  Same thing bug in git format.  I believe Ted updates his patch queue
  from the patches in the git tree repo, so  if you want latest/greatest
  the git tree is what you want.
 
  Thanks for clarifying :-)
 
 
  -JRS
  
  -JRS
 

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch queue updated

2007-10-10 Thread Mingming Cao
On Wed, 2007-10-10 at 17:35 +0530, Aneesh Kumar K.V wrote:
 I have updated the patch queue at 
 
 http://www.radian.org/~kvaneesh/ext4/patch-series/
 
 http://www.radian.org/~kvaneesh/ext4/patch-series/ext4-patch-queue.tar
 
 Changes:
 a) Add large-file-blocktype.patch  large-file.patch patch.

Could you send them out separately? I noticed that your queue is not
based on the last ext4 patch queue. You modified mballoc-core.patch
directly with large file block type changes...but your updated
mballoc-core.patch doesn't have the latest proc fs fix from Ted. 

I would appreciate if you separete the largeblocktype changes for
mballoc out of the core mballoc changes, as that patch is getting really
big already.

 b) Add the stable-boundary-patch with dummy change
 c) Add stable-boundary-undo.patch which undo the above dummy change.

The stable-boundary patch is being removed from ext4 patch queue,as some
tools expect a meaning code diff to apply any patch.

 d) Update the series to point to the right file name 
 ext4_grpnum_t_int_fix.patch
   instead of ext4_grpnum_t_negative_value_fix.patch
 
Fixed it in ext4-patch-queue.Thanks,

 The diff between the content of the current patch-queu and the this is below
 
 -aneesh
 
 
 diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
 index 2ba49ff..9ffae96 100644
 --- a/fs/ext4/dir.c
 +++ b/fs/ext4/dir.c
 @@ -124,7 +124,7 @@ static int ext4_readdir(struct file * filp,
   offset = filp-f_pos  (sb-s_blocksize - 1);
 
   while (!error  !stored  filp-f_pos  inode-i_size) {
 - unsigned long blk = filp-f_pos  EXT4_BLOCK_SIZE_BITS(sb);
 + ext4_lblk_t blk = filp-f_pos  EXT4_BLOCK_SIZE_BITS(sb);
   struct buffer_head map_bh;
   struct buffer_head *bh = NULL;
 
 diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
 index 0ea6122..392b76e 100644
 --- a/fs/ext4/extents.c
 +++ b/fs/ext4/extents.c
 @@ -144,7 +144,7 @@ static int ext4_ext_dirty(handle_t *handle, struct inode 
 *inode,
 
 static ext4_fsblk_t ext4_ext_find_goal(struct inode *inode,
 struct ext4_ext_path *path,
 -   ext4_fsblk_t block)
 +   ext4_lblk_t block)
 {
   struct ext4_inode_info *ei = EXT4_I(inode);
   ext4_fsblk_t bg_start;
 @@ -367,13 +367,14 @@ static void ext4_ext_drop_refs(struct ext4_ext_path 
 *path)
  * the header must be checked before calling this
  */
 static void
 -ext4_ext_binsearch_idx(struct inode *inode, struct ext4_ext_path *path, int 
 block)
 +ext4_ext_binsearch_idx(struct inode *inode,
 + struct ext4_ext_path *path, ext4_lblk_t block)
 {
   struct ext4_extent_header *eh = path-p_hdr;
   struct ext4_extent_idx *r, *l, *m;
 
 
 - ext_debug(binsearch for %d(idx):  , block);
 + ext_debug(binsearch for %lu(idx):  , (unsigned long)block);
 
   l = EXT_FIRST_INDEX(eh) + 1;
   r = EXT_LAST_INDEX(eh);
 @@ -425,7 +426,8 @@ ext4_ext_binsearch_idx(struct inode *inode, struct 
 ext4_ext_path *path, int bloc
  * the header must be checked before calling this
  */
 static void
 -ext4_ext_binsearch(struct inode *inode, struct ext4_ext_path *path, int 
 block)
 +ext4_ext_binsearch(struct inode *inode,
 + struct ext4_ext_path *path, ext4_lblk_t block)
 {
   struct ext4_extent_header *eh = path-p_hdr;
   struct ext4_extent *r, *l, *m;
 @@ -438,7 +440,7 @@ ext4_ext_binsearch(struct inode *inode, struct 
 ext4_ext_path *path, int block)
   return;
   }
 
 - ext_debug(binsearch for %d:  , block);
 + ext_debug(binsearch for %lu:  , (unsigned long)block);
 
   l = EXT_FIRST_EXTENT(eh) + 1;
   r = EXT_LAST_EXTENT(eh);
 @@ -494,7 +496,8 @@ int ext4_ext_tree_init(handle_t *handle, struct inode 
 *inode)
 }
 
 struct ext4_ext_path *
 -ext4_ext_find_extent(struct inode *inode, int block, struct ext4_ext_path 
 *path)
 +ext4_ext_find_extent(struct inode *inode, ext4_lblk_t block,
 + struct ext4_ext_path *path)
 {
   struct ext4_extent_header *eh;
   struct buffer_head *bh;
 @@ -979,8 +982,8 @@ repeat:
   /* refill path */
   ext4_ext_drop_refs(path);
   path = ext4_ext_find_extent(inode,
 - le32_to_cpu(newext-ee_block),
 - path);
 + (ext4_lblk_t )le32_to_cpu(newext-ee_block),
 + path);
   if (IS_ERR(path))
   err = PTR_ERR(path);
   } else {
 @@ -992,8 +995,8 @@ repeat:
   /* refill path */
   ext4_ext_drop_refs(path);
   path = ext4_ext_find_extent(inode,
 - le32_to_cpu(newext-ee_block),
 - path);
 +(ext4_lblk_t)le32_to_cpu(newext-ee_block),
 + path);
   if 

Re: [patch 1/2] add init_ext4_proc() stub for when CONFIG_PROC_FS is not set

2007-10-09 Thread Mingming Cao
On Tue, 2007-10-09 at 09:31 -0700, Badari Pulavarty wrote:
 On Tue, 2007-10-09 at 15:50 +1000, [EMAIL PROTECTED] wrote:
  plain text document attachment (ext4-add-init_ext4_proc-stub.patch)
  init_ext4_fs() calls init_ext4_proc() so we need a stub init_ext4_proc()
  for the case that CONFIG_PROC_FS is not set.
  Without the stub we get a build error:
  
  fs/ext4/mballoc.c: In function 'init_ext4_proc':
  fs/ext4/mballoc.c:2837: error: 'proc_root_fs' undeclared (first use in this 
  function)
  fs/ext4/mballoc.c:2837: error: (Each undeclared identifier is reported only 
  once
  fs/ext4/mballoc.c:2837: error: for each function it appears in.)
  
  Add a stub init_ext4_proc() function that does nothing but return 0
  
  Signed-off-by: Mark Nelson [EMAIL PROTECTED]
  ---
   fs/ext4/mballoc.c |7 +++
   1 file changed, 7 insertions(+)
  
  Index: ext4/fs/ext4/mballoc.c
  ===
  --- ext4.orig/fs/ext4/mballoc.c
  +++ ext4/fs/ext4/mballoc.c
  @@ -2825,6 +2825,7 @@ static int ext4_mb_destroy_per_dev_proc(
  return 0;
   }
   
  +#ifdef CONFIG_PROC_FS
   int __init init_ext4_proc(void)
   {
  ext4_pspace_cachep =
  @@ -2840,6 +2841,12 @@ int __init init_ext4_proc(void)
   
  return 0;
   }
  +#else
  +int __init init_ext4_proc(void)
  +{
  +   return 0;
  +}
  +#endif
   
   void exit_ext4_proc(void)
   {
 
 Nope. I don't think we can do this :(
 
 For example, we need to create ext4_pspace_cachep kmem cache 
 for the pre-allocation to work. We can't ifdef it out.
 
 Mingming/Amit, can you take a look at this ? It looks like
 we NEED procfs support to make mballoc work. If so, we need
 to add it to the dependency.
 
 

I guess our testing did not catch this up because we have CONFIG_PROC_FS
enabled all the time. mballoc needs procfs for exporting some stats info
and tunable paramenters to optimize/customize multiple allocation.

We could select CONFIG_PROC_FS at kconfig when ext4dev is enabled.

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] jbd/jbd2: JBD memory allocation cleanups

2007-10-05 Thread Mingming Cao
On Thu, 2007-10-04 at 07:52 +0100, Christoph Hellwig wrote:
 On Thu, Oct 04, 2007 at 01:50:36AM -0400, Theodore Ts'o wrote:
  From: Mingming Cao [EMAIL PROTECTED]
  
  JBD: Replace slab allocations with page cache allocations
 
 It's page allocations, not page cache allocations.
 
  Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly
 
 That sounds like it should be a different patch..

Okay. Will sent the patches, that also separate JBD2 changes to a
different patch.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] jbd: JBD slab allocation cleanups

2007-10-05 Thread Mingming Cao
JBD: JBD slab allocation cleanups

From: Mingming Cao [EMAIL PROTECTED]

JBD: Replace slab allocations with page allocations

JBD allocate memory for committed_data and frozen_data from slab. However
JBD should not pass slab pages down to the block layer. Use page allocator 
pages instead. This will also prepare JBD for the large blocksize patchset.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/jbd/commit.c  |6 +--
 fs/jbd/journal.c |   88 ++-
 fs/jbd/transaction.c |8 ++--
 include/linux/jbd.h  |   13 +--
 4 files changed, 21 insertions(+), 94 deletions(-)


Index: linux-2.6.23-rc9/fs/jbd/commit.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/commit.c   2007-10-05 12:03:43.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd/commit.c2007-10-05 12:08:08.0 -0700
@@ -375,7 +375,7 @@ void journal_commit_transaction(journal_
struct buffer_head *bh = jh2bh(jh);
 
jbd_lock_bh_state(bh);
-   jbd_slab_free(jh-b_committed_data, bh-b_size);
+   jbd_free(jh-b_committed_data, bh-b_size);
jh-b_committed_data = NULL;
jbd_unlock_bh_state(bh);
}
@@ -792,14 +792,14 @@ restart_loop:
 * Otherwise, we can just throw away the frozen data now.
 */
if (jh-b_committed_data) {
-   jbd_slab_free(jh-b_committed_data, bh-b_size);
+   jbd_free(jh-b_committed_data, bh-b_size);
jh-b_committed_data = NULL;
if (jh-b_frozen_data) {
jh-b_committed_data = jh-b_frozen_data;
jh-b_frozen_data = NULL;
}
} else if (jh-b_frozen_data) {
-   jbd_slab_free(jh-b_frozen_data, bh-b_size);
+   jbd_free(jh-b_frozen_data, bh-b_size);
jh-b_frozen_data = NULL;
}
 
Index: linux-2.6.23-rc9/fs/jbd/journal.c
===
--- linux-2.6.23-rc9.orig/fs/jbd/journal.c  2007-10-05 12:03:43.0 
-0700
+++ linux-2.6.23-rc9/fs/jbd/journal.c   2007-10-05 12:08:08.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in-b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in-b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in-b_frozen_data) {
-   jbd_slab_free(tmp, bh_in-b_size);
+   jbd_free(tmp, bh_in-b_size);
goto repeat;
}
 
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb-s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1624,77 +1616,6 @@ void * __jbd_kmalloc (const char *where,
 }
 
 /*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
- */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size  11)
-
-static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
-static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   jbd_1k, jbd_2k, jbd_4k, NULL, jbd_8k
-};
-
-static void journal_destroy_jbd_slabs(void)
-{
-   int i;
-
-   for (i = 0; i  JBD_MAX_SLABS; i++) {
-   if (jbd_slab[i])
-   kmem_cache_destroy(jbd_slab[i]);
-   jbd_slab[i] = NULL;
-   }
-}
-
-static int journal_create_jbd_slab(size_t slab_size)
-{
-   int i = JBD_SLAB_INDEX(slab_size);
-
-   BUG_ON(i = JBD_MAX_SLABS);
-
-   /*
-* Check if we already have a slab created for this size
-*/
-   if (jbd_slab[i])
-   return 0;
-
-   /*
-* Create a slab and force alignment to be same as slabsize -
-* this will make sure that allocations won't cross the page
-* boundary.
-*/
-   jbd_slab[i] = kmem_cache_create(jbd_slab_names[i

Re: [PATCH 2/2] i_version update - ext4 part

2007-10-05 Thread Mingming Cao
On Fri, 2007-10-05 at 17:28 +0200, Cordenner jean noel wrote:
 This patch update the i_version field of the inode and add a mount
 option to enable this feature. The other condition to enable this
 feature is that the inode size should be 256-bytes.
 
 Signed-off-by: Jean Noel Cordenner [EMAIL PROTECTED]
 --- 
  fs/ext4/inode.c |4 +++-
  fs/ext4/super.c |7 ++-
  include/linux/ext4_fs.h |1 +
  3 files changed, 10 insertions(+), 2 deletions(-)
 
 Index: linux-2.6.23-rc8-ext4-i_version/fs/ext4/inode.c
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/fs/ext4/inode.c  2007-10-03
 18:11:17.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/fs/ext4/inode.c   2007-10-05
 10:26:42.0 +0200
 @@ -3173,7 +3173,9 @@
  {
   int err = 0;
 
 - inode-i_version++;
 + if (test_opt(inode-i_sb, I_VERSION))
 + inode_inc_iversion(inode);
 +
   /* the do_update_inode consumes one bh-b_count */
   get_bh(iloc-bh);
 
 Index: linux-2.6.23-rc8-ext4-i_version/fs/ext4/super.c
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/fs/ext4/super.c  2007-10-03
 18:11:17.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/fs/ext4/super.c   2007-10-03
 18:17:44.0 +0200
 @@ -742,7 +742,7 @@
   Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
   Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
   Opt_grpquota, Opt_extents, Opt_noextents, Opt_delalloc,
 - Opt_mballoc, Opt_nomballoc, Opt_stripe,
 + Opt_mballoc, Opt_nomballoc, Opt_stripe, Opt_i_version,
  };
 
  static match_table_t tokens = {
 @@ -800,6 +800,7 @@
   {Opt_mballoc, mballoc},
   {Opt_nomballoc, nomballoc},
   {Opt_stripe, stripe=%u},
 + {Opt_i_version, i_version},
   {Opt_err, NULL},
   {Opt_resize, resize},
  };
 @@ -1161,6 +1162,10 @@
   return 0;
   sbi-s_stripe = option;
   break;
 + case Opt_i_version:
 + set_opt (sbi-s_mount_opt, I_VERSION);
 + sb-s_flags |= MS_I_VERSION;
 + break;

Need to make sure this flag is cleared if remounted fs without I_VERSION

   default:
   printk (KERN_ERR
   EXT4-fs: Unrecognized mount option \%s\ 
 Index: linux-2.6.23-rc8-ext4-i_version/include/linux/ext4_fs.h
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/include/linux/ext4_fs.h
 2007-10-03 18:11:17.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/include/linux/ext4_fs.h   2007-10-03
 18:11:54.0 +0200
 @@ -500,6 +500,7 @@
  #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT  0x100 /* Journal Async
 Commit */
  #define EXT4_MOUNT_DELALLOC  0x200 /* Delalloc support */
  #define EXT4_MOUNT_MBALLOC   0x400 /* Buddy allocation support */
 +#define EXT4_MOUNT_I_VERSION 0x800 /* i_version support */
  /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at
 once */
  #ifndef _LINUX_EXT2_FS_H
  #define clear_opt(o, opt)o = ~EXT4_MOUNT_##opt
 


I don't see places where this counter is being stored/load to/from disk,
so I assume this is the not the full patch series?


Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] i_version update - vfs part

2007-10-05 Thread Mingming Cao
On Fri, 2007-10-05 at 17:28 +0200, Cordenner jean noel wrote:
 Hi, 
 
Hi Jean Noel,

 This is an update of the i_version patch. 

Just to make sure, is this vfs patch and next ext4 patch together going
to replace the 4 inode-version related patches currently in
ext4-patch-queue (and git tree)? 

 The i_version field is a 64bit counter that is set on every inode 
 creation and that is incremented every time the inode data is modified 
 (similarly to the ctime time-stamp). 
 The aim is to fulfill a NFSv4 requirement for rfc3530: 
 5.5.  Mandatory Attributes - Definitions 
 Name#DataType   Access   Description 
 ___ 
 change3uint64   READ A value created by the 
 server that the client can use to determine if file 
 data, directory contents or attributes of the object 
 have been modified.  The servermay return the object's 
 time_metadata attribute for this attribute's value but 
 only if the filesystem object can not be updated more 
 frequently than the resolution of time_metadata. 
 
 
 This first part deals with adding a flag in the super block and incrementing 
 the i_version in the vfs.
 
 Signed-off-by: Jean Noel Cordenner [EMAIL PROTECTED]
 --- 
  fs/inode.c |   23 +++
  fs/libfs.c |   12 
  include/linux/fs.h |3 +++
  3 files changed, 38 insertions(+)
 
 Index: linux-2.6.23-rc8-ext4-i_version/fs/inode.c
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/fs/inode.c   2007-09-26 
 14:41:41.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/fs/inode.c2007-10-05 
 16:14:41.0 +0200
 @@ -1216,6 +1216,24 @@
  EXPORT_SYMBOL(touch_atime);
 
  /**
 + *   inode_inc_iversion  -   increments i_version
 + *   @inode: inode that need to be updated
 + *
 + *   Every time the inode is modified, the i_version field
 + *   will be incremented.
 + *   The filesystem has to be mounted with i_version flag
 + *
 + */
 +
 +void inode_inc_iversion(struct inode *inode)
 +{
 + spin_lock(inode-i_lock);
 + inode-i_version++;
 + spin_unlock(inode-i_lock);
 +}

I suspect we need a lock here,  the places where need to update the
inode-i_version are already doing update for inode, mostly protected by
i_mutex. 

You could remove the above function and update the counter directly at
the places it need to.

 +EXPORT_SYMBOL(inode_inc_iversion);
 +

Seems unnecessary.

 +/**
   *   file_update_time-   update mtime and ctime time
   *   @file: file accessed
   *
 @@ -1249,6 +1267,11 @@
   sync_it = 1;
   }
 
 + if (IS_I_VERSION(inode)) {
 + inode_inc_iversion(inode);
 + sync_it = 1;
 + }
 +
   if (sync_it)
   mark_inode_dirty_sync(inode);
  }
 Index: linux-2.6.23-rc8-ext4-i_version/fs/libfs.c
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/fs/libfs.c   2007-07-09 
 01:32:17.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/fs/libfs.c2007-09-26 
 14:51:08.0 +0200
 @@ -255,6 +255,10 @@
   struct inode *inode = old_dentry-d_inode;
 
   inode-i_ctime = dir-i_ctime = dir-i_mtime = CURRENT_TIME;
 + if (IS_I_VERSION(inode)) {
 + inode_inc_iversion(inode);
 + inode_inc_iversion(dir);
 + }
   inc_nlink(inode);
   atomic_inc(inode-i_count);
   dget(dentry);
 @@ -287,6 +291,10 @@
   struct inode *inode = dentry-d_inode;
 
   inode-i_ctime = dir-i_ctime = dir-i_mtime = CURRENT_TIME;
 + if (IS_I_VERSION(inode)) {
 + inode_inc_iversion(inode);
 + inode_inc_iversion(dir);
 + }
   drop_nlink(inode);
   dput(dentry);
   return 0;
 @@ -323,6 +331,10 @@
 
   old_dir-i_ctime = old_dir-i_mtime = new_dir-i_ctime =
   new_dir-i_mtime = inode-i_ctime = CURRENT_TIME;
 + if (IS_I_VERSION(old_dir)) {
 + inode_inc_iversion(old_dir);
 + inode_inc_iversion(new_dir);
 + }
 
   return 0;
  }

Need to update the counter in libfs.c?

 Index: linux-2.6.23-rc8-ext4-i_version/include/linux/fs.h
 ===
 --- linux-2.6.23-rc8-ext4-i_version.orig/include/linux/fs.h   2007-09-26 
 14:46:15.0 +0200
 +++ linux-2.6.23-rc8-ext4-i_version/include/linux/fs.h2007-09-26 
 14:51:08.0 +0200
 @@ -123,6 +123,7 @@
  #define MS_SLAVE (119) /* change to slave */
  #define MS_SHARED(120) /* change to shared */
  #define MS_RELATIME  (121) /* Update atime relative to mtime/ctime. */
 +#define MS_I_VERSION (122) /* Update inode i_version field */
  #define MS_ACTIVE(130)
  #define MS_NOUSER(131)
 
 @@ -172,6 +173,7 @@
   ((inode)-i_flags  (S_SYNC|S_DIRSYNC)))
  

[PATCH 1/2] ext4: Support large blocksize up to PAGESIZE

2007-10-01 Thread Mingming Cao
Support large blocksize up to PAGESIZE (max 64KB) for ext4. 

From: Takashi Sato [EMAIL PROTECTED]

This patch set supports large block size(4k, =64k) in ext4,
just enlarging the block size limit. But it is NOT possible to have 64kB
blocksize on ext4 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is treat 64k rec_len
with a an impossible value like rec_len = 0x to handle this.

The Patch-set consists of the following 2 patches.
  [1/2]  ext4: enlarge blocksize
 - Allow blocksize up to pagesize

  [2/2]  ext4: fix rec_len overflow
 - prevent rec_len from overflow with 64KB blocksize

Now on 64k page ppc64 box runs with this patch set we could create a 64k
block size ext4dev, and able to handle empty directory block.
Patch consider to be merge to 2.6.24-rc1.

Signed-off-by: Takashi Sato [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/ext4/super.c |5 +
 include/linux/ext4_fs.h |4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)


diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 619db84..d8bb279 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1548,6 +1548,11 @@ static int ext4_fill_super (struct super_block *sb, void 
*data, int silent)
goto out_fail;
}
 
+   if (!sb_set_blocksize(sb, blocksize)) {
+   printk(KERN_ERR EXT4-fs: bad blocksize %d.\n, blocksize);
+   goto out_fail;
+   }
+
/*
 * The ext4 superblock will not be buffer aligned for other than 1kB
 * block sizes.  We need to calculate the offset from buffer start.
diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index f9881b6..d15a15e 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -77,8 +77,8 @@
  * Macro-instructions used to manage several block sizes
  */
 #define EXT4_MIN_BLOCK_SIZE1024
-#defineEXT4_MAX_BLOCK_SIZE 4096
-#define EXT4_MIN_BLOCK_LOG_SIZE  10
+#defineEXT4_MAX_BLOCK_SIZE 65536
+#define EXT4_MIN_BLOCK_LOG_SIZE10
 #ifdef __KERNEL__
 # define EXT4_BLOCK_SIZE(s)((s)-s_blocksize)
 #else


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] ext4: Avoid rec_len overflow with 64KB block size

2007-10-01 Thread Mingming Cao
ext4: Avoid rec_len overflow with 64KB block size

From: Jan Kara [EMAIL PROTECTED]

With 64KB blocksize, a directory entry can have size 64KB which does not fit
into 16 bits we have for entry lenght. So we store 0x instead and convert
value when read from / written to disk. The patch also converts some places
to use ext4_next_entry() when we are changing them anyway.

Signed-off-by: Jan Kara [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/ext4/dir.c   |   12 ---
 fs/ext4/namei.c |   76 ++-
 include/linux/ext4_fs.h |   20 
 3 files changed, 62 insertions(+), 46 deletions(-)


diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 3ab01c0..20b1e28 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -69,7 +69,7 @@ int ext4_check_dir_entry (const char * function, struct inode 
* dir,
  unsigned long offset)
 {
const char * error_msg = NULL;
-   const int rlen = le16_to_cpu(de-rec_len);
+   const int rlen = ext4_rec_len_from_disk(de-rec_len);
 
if (rlen  EXT4_DIR_REC_LEN(1))
error_msg = rec_len is smaller than minimal;
@@ -176,10 +176,10 @@ revalidate:
 * least that it is non-zero.  A
 * failure will be detected in the
 * dirent test below. */
-   if (le16_to_cpu(de-rec_len) 
-   EXT4_DIR_REC_LEN(1))
+   if (ext4_rec_len_from_disk(de-rec_len)
+EXT4_DIR_REC_LEN(1))
break;
-   i += le16_to_cpu(de-rec_len);
+   i += ext4_rec_len_from_disk(de-rec_len);
}
offset = i;
filp-f_pos = (filp-f_pos  ~(sb-s_blocksize - 1))
@@ -201,7 +201,7 @@ revalidate:
ret = stored;
goto out;
}
-   offset += le16_to_cpu(de-rec_len);
+   offset += ext4_rec_len_from_disk(de-rec_len);
if (le32_to_cpu(de-inode)) {
/* We might block in the next section
 * if the data destination is
@@ -223,7 +223,7 @@ revalidate:
goto revalidate;
stored ++;
}
-   filp-f_pos += le16_to_cpu(de-rec_len);
+   filp-f_pos += ext4_rec_len_from_disk(de-rec_len);
}
offset = 0;
brelse (bh);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 5fdb862..96e8a85 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -281,7 +281,7 @@ static struct stats dx_show_leaf(struct dx_hash_info 
*hinfo, struct ext4_dir_ent
space += EXT4_DIR_REC_LEN(de-name_len);
names++;
}
-   de = (struct ext4_dir_entry_2 *) ((char *) de + 
le16_to_cpu(de-rec_len));
+   de = ext4_next_entry(de);
}
printk((%i)\n, names);
return (struct stats) { names, space, 1 };
@@ -552,7 +552,8 @@ static int ext4_htree_next_block(struct inode *dir, __u32 
hash,
  */
 static inline struct ext4_dir_entry_2 *ext4_next_entry(struct ext4_dir_entry_2 
*p)
 {
-   return (struct ext4_dir_entry_2 *)((char*)p + le16_to_cpu(p-rec_len));
+   return (struct ext4_dir_entry_2 *)((char*)p +
+   ext4_rec_len_from_disk(p-rec_len));
 }
 
 /*
@@ -721,7 +722,7 @@ static int dx_make_map (struct ext4_dir_entry_2 *de, int 
size,
cond_resched();
}
/* XXX: do we need to check rec_len == 0 case? -Chris */
-   de = (struct ext4_dir_entry_2 *) ((char *) de + 
le16_to_cpu(de-rec_len));
+   de = ext4_next_entry(de);
}
return count;
 }
@@ -823,7 +824,7 @@ static inline int search_dirblock(struct buffer_head * bh,
return 1;
}
/* prevent looping on a bad block */
-   de_len = le16_to_cpu(de-rec_len);
+   de_len = ext4_rec_len_from_disk(de-rec_len);
if (de_len = 0)
return -1;
offset += de_len;
@@ -1136,7 +1137,7 @@ dx_move_dirents(char *from, char *to, struct dx_map_entry 
*map, int count)
rec_len = EXT4_DIR_REC_LEN(de-name_len);
memcpy (to, de, rec_len);
((struct ext4_dir_entry_2 *) to)-rec_len =
-   cpu_to_le16(rec_len);
+   ext4_rec_len_to_disk(rec_len);
de-inode = 0;
map

[PATCH 1/2] ext2: Support large blocksize up to PAGESIZE

2007-10-01 Thread Mingming Cao
Support large blocksize up to PAGESIZE (max 64KB) for ext2

From: Takashi Sato [EMAIL PROTECTED]

This patch set supports large block size(4k, =64k) in ext2,
just enlarging the block size limit. But it is NOT possible to have 64kB
blocksize on ext2 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is treat 64k rec_len
with a an impossible value like rec_len = 0x to handle this.

The Patch-set consists of the following 2 patches.
  [1/2]  ext2: enlarge blocksize
 - Allow blocksize up to pagesize

  [2/2]  ext2: fix rec_len overflow
 - prevent rec_len from overflow with 64KB blocksize

Now on 64k page ppc64 box runs with this patch set we could create a 64k
block size ext2, and able to handle empty directory block.

Please consider to include to mm tree.

Signed-off-by: Takashi Sato [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/ext2/super.c |3 ++-
 include/linux/ext2_fs.h |4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)


diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 639a32c..765c805 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -775,7 +775,8 @@ static int ext2_fill_super(struct super_block *sb, void 
*data, int silent)
brelse(bh);
 
if (!sb_set_blocksize(sb, blocksize)) {
-   printk(KERN_ERR EXT2-fs: blocksize too small for 
device.\n);
+   printk(KERN_ERR EXT2-fs: bad blocksize %d.\n,
+   blocksize);
goto failed_sbi;
}
 
diff --git a/include/linux/ext2_fs.h b/include/linux/ext2_fs.h
index 153d755..910a705 100644
--- a/include/linux/ext2_fs.h
+++ b/include/linux/ext2_fs.h
@@ -86,8 +86,8 @@ static inline struct ext2_sb_info *EXT2_SB(struct super_block 
*sb)
  * Macro-instructions used to manage several block sizes
  */
 #define EXT2_MIN_BLOCK_SIZE1024
-#defineEXT2_MAX_BLOCK_SIZE 4096
-#define EXT2_MIN_BLOCK_LOG_SIZE  10
+#define EXT2_MAX_BLOCK_SIZE65536
+#define EXT2_MIN_BLOCK_LOG_SIZE10
 #ifdef __KERNEL__
 # define EXT2_BLOCK_SIZE(s)((s)-s_blocksize)
 #else


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] ext2: Avoid rec_len overflow with 64KB block size

2007-10-01 Thread Mingming Cao
ext2: Avoid rec_len overflow with 64KB block size

From: Jan Kara [EMAIL PROTECTED]

With 64KB blocksize, a directory entry can have size 64KB which does not fit
into 16 bits we have for entry lenght. So we store 0x instead and convert
value when read from / written to disk.

Signed-off-by: Jan Kara [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/ext2/dir.c   |   43 +++
 include/linux/ext2_fs.h |1 +
 2 files changed, 32 insertions(+), 12 deletions(-)


diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 2bf49d7..1329bdb 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -26,6 +26,24 @@
 
 typedef struct ext2_dir_entry_2 ext2_dirent;
 
+static inline unsigned ext2_rec_len_from_disk(__le16 dlen)
+{
+   unsigned len = le16_to_cpu(dlen);
+
+   if (len == EXT2_MAX_REC_LEN)
+   return 1  16;
+   return len;
+}
+
+static inline __le16 ext2_rec_len_to_disk(unsigned len)
+{
+   if (len == (1  16))
+   return cpu_to_le16(EXT2_MAX_REC_LEN);
+   else if (len  (1  16))
+   BUG();
+   return cpu_to_le16(len);
+}
+
 /*
  * ext2 uses block-sized chunks. Arguably, sector-sized ones would be
  * more robust, but we have what we have
@@ -95,7 +113,7 @@ static void ext2_check_page(struct page *page)
}
for (offs = 0; offs = limit - EXT2_DIR_REC_LEN(1); offs += rec_len) {
p = (ext2_dirent *)(kaddr + offs);
-   rec_len = le16_to_cpu(p-rec_len);
+   rec_len = ext2_rec_len_from_disk(p-rec_len);
 
if (rec_len  EXT2_DIR_REC_LEN(1))
goto Eshort;
@@ -193,7 +211,8 @@ static inline int ext2_match (int len, const char * const 
name,
  */
 static inline ext2_dirent *ext2_next_entry(ext2_dirent *p)
 {
-   return (ext2_dirent *)((char*)p + le16_to_cpu(p-rec_len));
+   return (ext2_dirent *)((char*)p +
+   ext2_rec_len_from_disk(p-rec_len));
 }
 
 static inline unsigned 
@@ -305,7 +324,7 @@ ext2_readdir (struct file * filp, void * dirent, filldir_t 
filldir)
return 0;
}
}
-   filp-f_pos += le16_to_cpu(de-rec_len);
+   filp-f_pos += ext2_rec_len_from_disk(de-rec_len);
}
ext2_put_page(page);
}
@@ -413,7 +432,7 @@ void ext2_set_link(struct inode *dir, struct 
ext2_dir_entry_2 *de,
struct page *page, struct inode *inode)
 {
unsigned from = (char *) de - (char *) page_address(page);
-   unsigned to = from + le16_to_cpu(de-rec_len);
+   unsigned to = from + ext2_rec_len_from_disk(de-rec_len);
int err;
 
lock_page(page);
@@ -469,7 +488,7 @@ int ext2_add_link (struct dentry *dentry, struct inode 
*inode)
/* We hit i_size */
name_len = 0;
rec_len = chunk_size;
-   de-rec_len = cpu_to_le16(chunk_size);
+   de-rec_len = ext2_rec_len_to_disk(chunk_size);
de-inode = 0;
goto got_it;
}
@@ -483,7 +502,7 @@ int ext2_add_link (struct dentry *dentry, struct inode 
*inode)
if (ext2_match (namelen, name, de))
goto out_unlock;
name_len = EXT2_DIR_REC_LEN(de-name_len);
-   rec_len = le16_to_cpu(de-rec_len);
+   rec_len = ext2_rec_len_from_disk(de-rec_len);
if (!de-inode  rec_len = reclen)
goto got_it;
if (rec_len = name_len + reclen)
@@ -504,8 +523,8 @@ got_it:
goto out_unlock;
if (de-inode) {
ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
-   de1-rec_len = cpu_to_le16(rec_len - name_len);
-   de-rec_len = cpu_to_le16(name_len);
+   de1-rec_len = ext2_rec_len_to_disk(rec_len - name_len);
+   de-rec_len = ext2_rec_len_to_disk(name_len);
de = de1;
}
de-name_len = namelen;
@@ -536,7 +555,7 @@ int ext2_delete_entry (struct ext2_dir_entry_2 * dir, 
struct page * page )
struct inode *inode = mapping-host;
char *kaddr = page_address(page);
unsigned from = ((char*)dir - kaddr)  ~(ext2_chunk_size(inode)-1);
-   unsigned to = ((char*)dir - kaddr) + le16_to_cpu(dir-rec_len);
+   unsigned to = ((char*)dir - kaddr) + 
ext2_rec_len_from_disk(dir-rec_len);
ext2_dirent * pde = NULL;
ext2_dirent * de = (ext2_dirent *) (kaddr + from);
int err;
@@ -557,7 +576,7 @@ int ext2_delete_entry (struct ext2_dir_entry_2 * dir, 
struct page * page )
err = mapping-a_ops-prepare_write

[PATCH 1/2] ext3: Support large blocksize up to PAGESIZE

2007-10-01 Thread Mingming Cao
Support large blocksize up to PAGESIZE (max 64KB) for ext3

From: Takashi Sato [EMAIL PROTECTED]

This patch set supports large block size(4k, =64k) in ext3
just enlarging the block size limit. But it is NOT possible to have 64kB
blocksize on ext3 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is treat 64k rec_len
with a an impossible value like rec_len = 0x to handle this.

The Patch-set consists of the following 2 patches.
  [1/2]  ext3: enlarge blocksize
 - Allow blocksize up to pagesize

  [2/2]  ext3: fix rec_len overflow
 - prevent rec_len from overflow with 64KB blocksize

Now on 64k page ppc64 box runs with this patch set we could create a 64k
block size ext3, and able to handle empty directory block.

Signed-off-by: Takashi Sato [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/ext3/super.c |6 +-
 include/linux/ext3_fs.h |4 ++--
 2 files changed, 7 insertions(+), 3 deletions(-)


diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 9537316..b4bfd36 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1549,7 +1549,11 @@ static int ext3_fill_super (struct super_block *sb, void 
*data, int silent)
}
 
brelse (bh);
-   sb_set_blocksize(sb, blocksize);
+   if (!sb_set_blocksize(sb, blocksize)) {
+   printk(KERN_ERR EXT3-fs: bad blocksize %d.\n,
+   blocksize);
+   goto out_fail;
+   }
logic_sb_block = (sb_block * EXT3_MIN_BLOCK_SIZE) / blocksize;
offset = (sb_block * EXT3_MIN_BLOCK_SIZE) % blocksize;
bh = sb_bread(sb, logic_sb_block);
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index ece49a8..7aa5556 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -76,8 +76,8 @@
  * Macro-instructions used to manage several block sizes
  */
 #define EXT3_MIN_BLOCK_SIZE1024
-#defineEXT3_MAX_BLOCK_SIZE 4096
-#define EXT3_MIN_BLOCK_LOG_SIZE  10
+#defineEXT3_MAX_BLOCK_SIZE 65536
+#define EXT3_MIN_BLOCK_LOG_SIZE10
 #ifdef __KERNEL__
 # define EXT3_BLOCK_SIZE(s)((s)-s_blocksize)
 #else


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] ext3: Avoid rec_len overflow with 64KB block size

2007-10-01 Thread Mingming Cao
ext3: Avoid rec_len overflow with 64KB block size

From: Jan Kara [EMAIL PROTECTED]

With 64KB blocksize, a directory entry can have size 64KB which does not fit
into 16 bits we have for entry lenght. So we store 0x instead and convert
value when read from / written to disk. The patch also converts some places
to use ext3_next_entry() when we are changing them anyway.

Signed-off-by: Jan Kara [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---

 fs/ext3/dir.c   |   10 +++--
 fs/ext3/namei.c |   90 ++-
 include/linux/ext3_fs.h |   20 ++
 3 files changed, 68 insertions(+), 52 deletions(-)


diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index c00723a..3c4c43a 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -69,7 +69,7 @@ int ext3_check_dir_entry (const char * function, struct inode 
* dir,
  unsigned long offset)
 {
const char * error_msg = NULL;
-   const int rlen = le16_to_cpu(de-rec_len);
+   const int rlen = ext3_rec_len_from_disk(de-rec_len);
 
if (rlen  EXT3_DIR_REC_LEN(1))
error_msg = rec_len is smaller than minimal;
@@ -177,10 +177,10 @@ revalidate:
 * least that it is non-zero.  A
 * failure will be detected in the
 * dirent test below. */
-   if (le16_to_cpu(de-rec_len) 
+   if (ext3_rec_len_from_disk(de-rec_len) 
EXT3_DIR_REC_LEN(1))
break;
-   i += le16_to_cpu(de-rec_len);
+   i += ext3_rec_len_from_disk(de-rec_len);
}
offset = i;
filp-f_pos = (filp-f_pos  ~(sb-s_blocksize - 1))
@@ -201,7 +201,7 @@ revalidate:
ret = stored;
goto out;
}
-   offset += le16_to_cpu(de-rec_len);
+   offset += ext3_rec_len_from_disk(de-rec_len);
if (le32_to_cpu(de-inode)) {
/* We might block in the next section
 * if the data destination is
@@ -223,7 +223,7 @@ revalidate:
goto revalidate;
stored ++;
}
-   filp-f_pos += le16_to_cpu(de-rec_len);
+   filp-f_pos += ext3_rec_len_from_disk(de-rec_len);
}
offset = 0;
brelse (bh);
diff --git a/fs/ext3/namei.c b/fs/ext3/namei.c
index c1fa190..2c38eb6 100644
--- a/fs/ext3/namei.c
+++ b/fs/ext3/namei.c
@@ -144,6 +144,15 @@ struct dx_map_entry
u16 size;
 };
 
+/*
+ * p is at least 6 bytes before the end of page
+ */
+static inline struct ext3_dir_entry_2 *ext3_next_entry(struct ext3_dir_entry_2 
*p)
+{
+   return (struct ext3_dir_entry_2 *)((char*)p +
+   ext3_rec_len_from_disk(p-rec_len));
+}
+
 #ifdef CONFIG_EXT3_INDEX
 static inline unsigned dx_get_block (struct dx_entry *entry);
 static void dx_set_block (struct dx_entry *entry, unsigned value);
@@ -281,7 +290,7 @@ static struct stats dx_show_leaf(struct dx_hash_info 
*hinfo, struct ext3_dir_ent
space += EXT3_DIR_REC_LEN(de-name_len);
names++;
}
-   de = (struct ext3_dir_entry_2 *) ((char *) de + 
le16_to_cpu(de-rec_len));
+   de = ext3_next_entry(de);
}
printk((%i)\n, names);
return (struct stats) { names, space, 1 };
@@ -548,14 +557,6 @@ static int ext3_htree_next_block(struct inode *dir, __u32 
hash,
 
 
 /*
- * p is at least 6 bytes before the end of page
- */
-static inline struct ext3_dir_entry_2 *ext3_next_entry(struct ext3_dir_entry_2 
*p)
-{
-   return (struct ext3_dir_entry_2 *)((char*)p + le16_to_cpu(p-rec_len));
-}
-
-/*
  * This function fills a red-black tree with information from a
  * directory block.  It returns the number directory entries loaded
  * into the tree.  If there is an error it is returned in err.
@@ -721,7 +722,7 @@ static int dx_make_map (struct ext3_dir_entry_2 *de, int 
size,
cond_resched();
}
/* XXX: do we need to check rec_len == 0 case? -Chris */
-   de = (struct ext3_dir_entry_2 *) ((char *) de + 
le16_to_cpu(de-rec_len));
+   de = ext3_next_entry(de);
}
return count;
 }
@@ -825,7 +826,7 @@ static inline int search_dirblock(struct buffer_head * bh,
return 1;
}
/* prevent looping on a bad block */
-   de_len = le16_to_cpu(de-rec_len);
+   de_len = ext3_rec_len_from_disk(de

Re: kernel Oops in ext3 code

2007-09-28 Thread Mingming Cao
 BUG: unable to handle kernel paging request at virtual address 104b
  printing eip:
  c0195bd3
  *pde = 
  Oops:  [#1]
  PREEMPT SMP 
  Modules linked in: vboxdrv binfmt_misc fuse coretemp hwmon gspca videodev 
 v4l2_common v4l1_compat iwl3945 mac80211 tifm_7xx1 tifm_core joydev irda 
 crc_ccitt 8250_pnp 8250 serial_core firewire_ohci firewire_core crc_itu_t
  CPU:0
  EIP:0060:[c0195bd3]Not tainted VLI
  EFLAGS: 00010206   (2.6.23-rc6 #1)
  EIP is at ext3_discard_reservation+0x18/0x4d
  eax: dff23800   ebx: 1033   ecx: dfc15ec0   edx: 
  esi: c0007c44   edi: 1033   ebp: dfc2bef4   esp: dfc2beac
  ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
  Process kswapd0 (pid: 261, ti=dfc2a000 task=dfcac570 task.ti=dfc2a000)
  Stack: c0007ba4 c0007c44 1033 c019ec51 c0007c44 c0007d8c 002c 
 c0171b1b 
 002c c0007c44 c0007c4c c0171da2 c050880c  0080 
 0080 
 c0171fb8 0080 c0007e48 df9e3910 7404 c03f5634 0080 
 00d0 
  Call Trace:
   [c019ec51] ext3_clear_inode+0x5d/0x76
   [c0171b1b] clear_inode+0x6b/0xb9
   [c0171da2] dispose_list+0x48/0xc9
   [c0171fb8] shrink_icache_memory+0x195/0x1bd
   [c014f5ec] shrink_slab+0xe2/0x159
   [c014f9a0] kswapd+0x2d3/0x431
   [c0132520] autoremove_wake_function+0x0/0x33
   [c014f6cd] kswapd+0x0/0x431
   [c0132453] kthread+0x38/0x5d
   [c013241b] kthread+0x0/0x5d
   [c0104b73] kernel_thread_helper+0x7/0x10
   ===
  Code: 83 f8 01 19 c0 f7 d0 83 e0 08 89 42 0c 89 56 b4 5b 5e c3 57 56 89 c6 
 53 8b 58 b4 8b 80 a4 00 00 00 85 db 8b 80 78 01 00 00 74 30 83 7b 18 00 74 
 2a 8d b8 00 03 00 00 89 f8 e8 b8 ca 1a 00 83 7b 
  EIP: [c0195bd3] ext3_discard_reservation+0x18/0x4d SS:ESP 0068:dfc2beac
 
 
On Fri, 2007-09-28 at 17:00 +0200, Norbert Preining wrote: 
 On Fr, 28 Sep 2007, Badari Pulavarty wrote:
  objdump -DlS balloc.o 
 
 Here it is
 

Thanks

Looks like kernel oops at 1753(173b+0x18):

173b ext3_discard_reservation:
ext3_discard_reservation():
173b:   57  push   %edi
173c:   56  push   %esi
173d:   89 c6   mov%eax,%esi
173f:   53  push   %ebx
1740:   8b 58 b4mov-0x4c(%eax),%ebx
1743:   8b 80 a4 00 00 00   mov0xa4(%eax),%eax
1749:   85 db   test   %ebx,%ebx
174b:   8b 80 78 01 00 00   mov0x178(%eax),%eax
1751:   74 30   je 1783
ext3_discard_reservation+0x48
1753:   83 7b 18 00 cmpl   $0x0,0x18(%ebx)

== Kernel oops here, ebx=1033, match bad
page location 104b(=1033+0x18)


1757:   74 2a   je 1783
ext3_discard_reservation+0x48
1759:   8d b8 00 03 00 00   lea0x300(%eax),%edi
175f:   89 f8   mov%edi,%eax
1761:   e8 fc ff ff ff  call   1762
ext3_discard_reservation+0x27
1766:   83 7b 18 00 cmpl   $0x0,0x18(%ebx)
176a:   74 0d   je 1779
ext3_discard_reservation+0x3e
176c:   8b 86 a4 00 00 00   mov0xa4(%esi),%eax
1772:   89 da   mov%ebx,%edx
1774:   e8 dc eb ff ff  call   355 rsv_window_remove
1779:   89 f8   mov%edi,%eax
177b:   5b  pop%ebx
177c:   5e  pop%esi
177d:   5f  pop%edi
177e:   e9 fc ff ff ff  jmp177f
ext3_discard_reservation+0x44
1783:   5b  pop%ebx
1784:   5e  pop%esi
1785:   5f  pop%edi
1786:   c3  ret


And trying to matching to the code:

void ext3_discard_reservation(struct inode *inode)
{
struct ext3_inode_info *ei = EXT3_I(inode);
struct ext3_block_alloc_info *block_i = ei-i_block_alloc_info;
struct ext3_reserve_window_node *rsv;
spinlock_t *rsv_lock = EXT3_SB(inode-i_sb)-s_rsv_window_lock;

if (!block_i)
return;

rsv = block_i-rsv_window_node;
if (!rsv_is_empty(rsv-rsv_window)) {

= kernel oops here

spin_lock(rsv_lock);
if (!rsv_is_empty(rsv-rsv_window))
rsv_window_remove(inode-i_sb, rsv);
spin_unlock(rsv_lock);
}
}


It seems ebx points to block_i(i_block_alloc_info), and that is bad
memory location, so that leads to bad paging request when try to get the
rsv_window structure. 

But it confused me why the rsv_window offset is 0x18 to
i_block_alloc_info, it should be 0x14(20 bytes)...Are you running a
vanilla 2.6.23-rc6?

No clue how i_block_alloc_info pointing to a bad location for now.
ext3_alloc_inode() clearly init this field to 

Re: kernel Oops in ext3 code

2007-09-27 Thread Mingming Cao
Hi,
Could you please sent the objdump of the ext4_discard_reservation
function? It doesn't match what I see here.

Thanks,
Mingming

On Thu, 2007-09-27 at 12:31 +0200, [EMAIL PROTECTED]
wrote:
 Hi all!
 
 (Please Cc)
 
 kernel 2.6.23-rc6
 Debian/sid
 
 kernel ooops:
 
 BUG: unable to handle kernel paging request at virtual address 104b
  printing eip:
  c0195bd3
  *pde = 
  Oops:  [#1]
  PREEMPT SMP 
  Modules linked in: vboxdrv binfmt_misc fuse coretemp hwmon gspca videodev 
 v4l2_common v4l1_compat iwl3945 mac80211 tifm_7xx1 tifm_core joydev irda 
 crc_ccitt 8250_pnp 8250 serial_core firewire_ohci firewire_core crc_itu_t
  CPU:0
  EIP:0060:[c0195bd3]Not tainted VLI
  EFLAGS: 00010206   (2.6.23-rc6 #1)
  EIP is at ext3_discard_reservation+0x18/0x4d
  eax: dff23800   ebx: 1033   ecx: dfc15ec0   edx: 
  esi: c0007c44   edi: 1033   ebp: dfc2bef4   esp: dfc2beac
  ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
  Process kswapd0 (pid: 261, ti=dfc2a000 task=dfcac570 task.ti=dfc2a000)
  Stack: c0007ba4 c0007c44 1033 c019ec51 c0007c44 c0007d8c 002c 
 c0171b1b 
 002c c0007c44 c0007c4c c0171da2 c050880c  0080 
 0080 
 c0171fb8 0080 c0007e48 df9e3910 7404 c03f5634 0080 
 00d0 
  Call Trace:
   [c019ec51] ext3_clear_inode+0x5d/0x76
   [c0171b1b] clear_inode+0x6b/0xb9
   [c0171da2] dispose_list+0x48/0xc9
   [c0171fb8] shrink_icache_memory+0x195/0x1bd
   [c014f5ec] shrink_slab+0xe2/0x159
   [c014f9a0] kswapd+0x2d3/0x431
   [c0132520] autoremove_wake_function+0x0/0x33
   [c014f6cd] kswapd+0x0/0x431
   [c0132453] kthread+0x38/0x5d
   [c013241b] kthread+0x0/0x5d
   [c0104b73] kernel_thread_helper+0x7/0x10
   ===
  Code: 83 f8 01 19 c0 f7 d0 83 e0 08 89 42 0c 89 56 b4 5b 5e c3 57 56 89 c6 
 53 8b 58 b4 8b 80 a4 00 00 00 85 db 8b 80 78 01 00 00 74 30 83 7b 18 00 74 
 2a 8d b8 00 03 00 00 89 f8 e8 b8 ca 1a 00 83 7b 
  EIP: [c0195bd3] ext3_discard_reservation+0x18/0x4d SS:ESP 0068:dfc2beac
 
 
 Sysrq did work, so the oops was saved. Good.
 
 Any ideas?
 
 Best wishes
 
 Norbert
 
 ---
 Dr. Norbert Preining [EMAIL PROTECTED]Vienna University of 
 Technology
 Debian Developer [EMAIL PROTECTED] Debian TeX Group
 gpg DSA: 0x09C5B094  fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 
 B094
 ---
 As he came into the light they could see his black and
 gold uniform on which the buttons were so highly polished
 that they shone with an intensity that would have made an
 approaching motorist flash his lights in annoyance.
  --- Douglas Adams, The Hitchhikers Guide to the Galaxy
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD/ext34 cleanups: convert to kzalloc

2007-09-26 Thread Mingming Cao
On Wed, 2007-09-26 at 12:54 -0700, Andrew Morton wrote:
 On Fri, 21 Sep 2007 16:13:56 -0700
 Mingming Cao [EMAIL PROTECTED] wrote:
 
  Convert kmalloc to kzalloc() and get rid of the memset().
 
 I split this into separate ext3/jbd and ext4/jbd2 patches.  It's generally
 better to raise separate patches, please - the ext3 patches I'll merge
 directly but the ext4 patches should go through (and be against) the ext4
 devel tree.
 
Sure. The patches(including ext3/jbd and ext4/jbd2) were merged into
ext4 devel tree already, I will remove the ext3/jbd part out of the ext4
devel tree.

 I fixed lots of rejects against the already-pending changes to these
 filesystems.
 
 You forgot to remove the memsets in both start_this_handle()s.
 
Thanks for catching this.

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ext4 patch queue updated(2.6.23-rc8)

2007-09-26 Thread Mingming Cao
http://repo.or.cz/w/ext4-patch-queue.git

Changes

+ Add large block support back, 
updated patches to fix rec_len overflow issue with 64k blk size from Jan
Kara

split ext234_enlarge_blocksize.patch to three patches, one for each
filesystem type


Fix compile warning with jbd-stats-through-procfs


split jbd_kmalloc_kzalloc.patch into two patches for ext3/jbd and
ext4/jbd2 to sync with what in mm tree


+Add jbd2-fix-commit-code-to-properly-abort-journal.patch from Jan kara

+Added New patch set :Introduce le32_t and le16_t, from Aneesh







-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] JBD2/ext4 naming cleanup

2007-09-21 Thread Mingming Cao
JBD2 naming cleanup

From: Mingming Cao [EMAIL PROTECTED]

change micros name from JBD_XXX to JBD2_XXX in JBD2/Ext4

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/ext4/extents.c |2 +-
 fs/ext4/super.c   |2 +-
 fs/jbd2/commit.c  |2 +-
 fs/jbd2/journal.c |8 
 fs/jbd2/recovery.c|2 +-
 fs/jbd2/revoke.c  |4 ++--
 include/linux/ext4_jbd2.h |6 +++---
 include/linux/jbd2.h  |   30 +++---
 8 files changed, 28 insertions(+), 28 deletions(-)

Index: linux-2.6.23-rc6/fs/ext4/super.c
===
--- linux-2.6.23-rc6.orig/fs/ext4/super.c   2007-09-21 16:27:31.0 
-0700
+++ linux-2.6.23-rc6/fs/ext4/super.c2007-09-21 16:27:46.0 -0700
@@ -966,7 +966,7 @@ static int parse_options (char *options,
if (option  0)
return 0;
if (option == 0)
-   option = JBD_DEFAULT_MAX_COMMIT_AGE;
+   option = JBD2_DEFAULT_MAX_COMMIT_AGE;
sbi-s_commit_interval = HZ * option;
break;
case Opt_data_journal:
Index: linux-2.6.23-rc6/include/linux/ext4_jbd2.h
===
--- linux-2.6.23-rc6.orig/include/linux/ext4_jbd2.h 2007-09-10 
19:50:29.0 -0700
+++ linux-2.6.23-rc6/include/linux/ext4_jbd2.h  2007-09-21 16:27:46.0 
-0700
@@ -12,8 +12,8 @@
  * Ext4-specific journaling extensions.
  */
 
-#ifndef _LINUX_EXT4_JBD_H
-#define _LINUX_EXT4_JBD_H
+#ifndef _LINUX_EXT4_JBD2_H
+#define _LINUX_EXT4_JBD2_H
 
 #include linux/fs.h
 #include linux/jbd2.h
@@ -228,4 +228,4 @@ static inline int ext4_should_writeback_
return 0;
 }
 
-#endif /* _LINUX_EXT4_JBD_H */
+#endif /* _LINUX_EXT4_JBD2_H */
Index: linux-2.6.23-rc6/include/linux/jbd2.h
===
--- linux-2.6.23-rc6.orig/include/linux/jbd2.h  2007-09-21 09:07:09.0 
-0700
+++ linux-2.6.23-rc6/include/linux/jbd2.h   2007-09-21 16:27:46.0 
-0700
@@ -13,8 +13,8 @@
  * filesystem journaling support.
  */
 
-#ifndef _LINUX_JBD_H
-#define _LINUX_JBD_H
+#ifndef _LINUX_JBD2_H
+#define _LINUX_JBD2_H
 
 /* Allow this file to be included directly into e2fsprogs */
 #ifndef __KERNEL__
@@ -37,26 +37,26 @@
 #define journal_oom_retry 1
 
 /*
- * Define JBD_PARANIOD_IOFAIL to cause a kernel BUG() if ext3 finds
+ * Define JBD2_PARANIOD_IOFAIL to cause a kernel BUG() if ext4 finds
  * certain classes of error which can occur due to failed IOs.  Under
- * normal use we want ext3 to continue after such errors, because
+ * normal use we want ext4 to continue after such errors, because
  * hardware _can_ fail, but for debugging purposes when running tests on
  * known-good hardware we may want to trap these errors.
  */
-#undef JBD_PARANOID_IOFAIL
+#undef JBD2_PARANOID_IOFAIL
 
 /*
  * The default maximum commit age, in seconds.
  */
-#define JBD_DEFAULT_MAX_COMMIT_AGE 5
+#define JBD2_DEFAULT_MAX_COMMIT_AGE 5
 
 #ifdef CONFIG_JBD2_DEBUG
 /*
- * Define JBD_EXPENSIVE_CHECKING to enable more expensive internal
+ * Define JBD2_EXPENSIVE_CHECKING to enable more expensive internal
  * consistency checks.  By default we don't do this unless
  * CONFIG_JBD2_DEBUG is on.
  */
-#define JBD_EXPENSIVE_CHECKING
+#define JBD2_EXPENSIVE_CHECKING
 extern u8 jbd2_journal_enable_debug;
 
 #define jbd_debug(n, f, a...)  \
@@ -163,8 +163,8 @@ typedef struct journal_block_tag_s
__be32  t_blocknr_high; /* most-significant high 32bits. */
 } journal_block_tag_t;
 
-#define JBD_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
-#define JBD_TAG_SIZE64 (sizeof(journal_block_tag_t))
+#define JBD2_TAG_SIZE32 (offsetof(journal_block_tag_t, t_blocknr_high))
+#define JBD2_TAG_SIZE64 (sizeof(journal_block_tag_t))
 
 /*
  * The revoke descriptor: used on disk to describe a series of blocks to
@@ -256,8 +256,8 @@ typedef struct journal_superblock_s
 #include linux/fs.h
 #include linux/sched.h
 
-#define JBD_ASSERTIONS
-#ifdef JBD_ASSERTIONS
+#define JBD2_ASSERTIONS
+#ifdef JBD2_ASSERTIONS
 #define J_ASSERT(assert)   \
 do {   \
if (!(assert)) {\
@@ -284,9 +284,9 @@ void buffer_assertion_failure(struct buf
 
 #else
 #define J_ASSERT(assert)   do { } while (0)
-#endif /* JBD_ASSERTIONS */
+#endif /* JBD2_ASSERTIONS */
 
-#if defined(JBD_PARANOID_IOFAIL)
+#if defined(JBD2_PARANOID_IOFAIL)
 #define J_EXPECT(expr, why...) J_ASSERT(expr)
 #define J_EXPECT_BH(bh, expr, why...)  J_ASSERT_BH(bh, expr)
 #define J_EXPECT_JH(jh, expr, why...)  J_ASSERT_JH(jh, expr

Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Mingming Cao
On Tue, 2007-09-18 at 19:19 -0700, Andrew Morton wrote:
 On Tue, 18 Sep 2007 18:00:01 -0700 Mingming Cao [EMAIL PROTECTED] wrote:
 
  JBD: Replace slab allocations with page cache allocations
  
  JBD allocate memory for committed_data and frozen_data from slab. However
  JBD should not pass slab pages down to the block layer. Use page allocator 
  pages instead. This will also prepare JBD for the large blocksize patchset.
  
  
  Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly
 
 __GFP_NOFAIL should only be used when we have no way of recovering
 from failure.  The allocation in journal_init_common() (at least)
 _can_ recover and hence really shouldn't be using __GFP_NOFAIL.
 
 (Actually, nothing in the kernel should be using __GFP_NOFAIL.  It is 
 there as a marker which says we really shouldn't be doing this but
 we don't know how to fix it).
 
 So sometime it'd be good if you could review all the __GFP_NOFAILs in
 there and see if we can remove some, thanks.

Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
cases except one handles memory allocation failure so I get rid of those
GFP_NOFAIL flags.

Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
in jbd/jbd2? I will send a separate patch to cleanup that.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/jbd/journal.c  |2 +-
 fs/jbd/transaction.c  |3 +--
 fs/jbd2/journal.c |2 +-
 fs/jbd2/transaction.c |3 +--
 4 files changed, 4 insertions(+), 6 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-19 11:47:58.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-19 11:48:40.0 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
Index: linux-2.6.23-rc6/fs/jbd/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/transaction.c  2007-09-19 11:48:05.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/transaction.c   2007-09-19 11:49:10.0 
-0700
@@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal-j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
-   GFP_NOFS|__GFP_NOFAIL);
+   new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);
if (!new_transaction) {
ret = -ENOMEM;
goto out;
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-19 11:48:14.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c  2007-09-19 11:49:46.0 -0700
@@ -654,7 +654,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
Index: linux-2.6.23-rc6/fs/jbd2/transaction.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/transaction.c 2007-09-19 11:48:08.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd2/transaction.c  2007-09-19 11:50:12.0 
-0700
@@ -96,8 +96,7 @@ static int start_this_handle(journal_t *
 
 alloc_transaction:
if (!journal-j_running_transaction) {
-   new_transaction = kmalloc(sizeof(*new_transaction),
-   GFP_NOFS|__GFP_NOFAIL);
+   new_transaction = kmalloc(sizeof(*new_transaction), GFP_NOFS);
if (!new_transaction) {
ret = -ENOMEM;
goto out;




-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] JBD: use GFP_NOFS in kmalloc

2007-09-19 Thread Mingming Cao
Convert the GFP_KERNEL flag used in JBD/JBD2 to GFP_NOFS, consistent
with the rest of kmalloc flag used in the JBD/JBD2 layer.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]

---
 fs/jbd/journal.c  |6 +++---
 fs/jbd/revoke.c   |8 
 fs/jbd2/journal.c |6 +++---
 fs/jbd2/revoke.c  |8 
 4 files changed, 14 insertions(+), 14 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-19 11:51:10.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-19 11:51:57.0 -0700
@@ -653,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_NOFS);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -723,7 +723,7 @@ journal_t * journal_init_dev(struct bloc
journal-j_blocksize = blocksize;
n = journal-j_blocksize / sizeof(journal_block_tag_t);
journal-j_wbufsize = n;
-   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
if (!journal-j_wbuf) {
printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
__FUNCTION__);
@@ -777,7 +777,7 @@ journal_t * journal_init_inode (struct i
/* journal descriptor can store up to n blocks -bzzz */
n = journal-j_blocksize / sizeof(journal_block_tag_t);
journal-j_wbufsize = n;
-   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
if (!journal-j_wbuf) {
printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
__FUNCTION__);
Index: linux-2.6.23-rc6/fs/jbd/revoke.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/revoke.c   2007-09-19 11:51:30.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/revoke.c2007-09-19 11:52:34.0 -0700
@@ -206,7 +206,7 @@ int journal_init_revoke(journal_t *journ
while((tmp = 1UL) != 0UL)
shift++;
 
-   journal-j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, 
GFP_KERNEL);
+   journal-j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, 
GFP_NOFS);
if (!journal-j_revoke_table[0])
return -ENOMEM;
journal-j_revoke = journal-j_revoke_table[0];
@@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
journal-j_revoke-hash_shift = shift;
 
journal-j_revoke-hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
if (!journal-j_revoke-hash_table) {
kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
journal-j_revoke = NULL;
@@ -229,7 +229,7 @@ int journal_init_revoke(journal_t *journ
for (tmp = 0; tmp  hash_size; tmp++)
INIT_LIST_HEAD(journal-j_revoke-hash_table[tmp]);
 
-   journal-j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, 
GFP_KERNEL);
+   journal-j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, 
GFP_NOFS);
if (!journal-j_revoke_table[1]) {
kfree(journal-j_revoke_table[0]-hash_table);
kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
@@ -246,7 +246,7 @@ int journal_init_revoke(journal_t *journ
journal-j_revoke-hash_shift = shift;
 
journal-j_revoke-hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
if (!journal-j_revoke-hash_table) {
kfree(journal-j_revoke_table[0]-hash_table);
kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
Index: linux-2.6.23-rc6/fs/jbd2/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd2/journal.c 2007-09-19 11:52:48.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd2/journal.c  2007-09-19 11:53:12.0 -0700
@@ -654,7 +654,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_NOFS);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -724,7 +724,7 @@ journal_t * jbd2_journal_init_dev(struct
journal-j_blocksize = blocksize;
n = journal-j_blocksize / sizeof(journal_block_tag_t);
journal-j_wbufsize = n;
-   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL

Re: [PATCH] JBD slab cleanups

2007-09-19 Thread Mingming Cao
On Wed, 2007-09-19 at 19:28 +, Dave Kleikamp wrote:
 On Wed, 2007-09-19 at 14:26 -0500, Dave Kleikamp wrote:
  On Wed, 2007-09-19 at 12:15 -0700, Mingming Cao wrote:
  
   Here is the patch to clean up __GFP_NOFAIL flag in jbd/jbd2. In all
   cases except one handles memory allocation failure so I get rid of those
   GFP_NOFAIL flags.
   
   Also, shouldn't we use GFP_KERNEL instead of GFP_NOFS flag for kmalloc
   in jbd/jbd2? I will send a separate patch to cleanup that.
  
  No.  GFP_NOFS avoids deadlock.  It prevents the allocation from making
  recursive calls back into the file system that could end up blocking on
  jbd code.
 
 Oh, I see your patch now.  You mean use GFP_NOFS instead of
 GFP_KERNEL.  :-)  OK then.
 

oops, I did mean what you say here.:-)

  Shaggy

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD: use GFP_NOFS in kmalloc

2007-09-19 Thread Mingming Cao
On Wed, 2007-09-19 at 14:34 -0700, Andrew Morton wrote:
 On Wed, 19 Sep 2007 12:22:09 -0700
 Mingming Cao [EMAIL PROTECTED] wrote:
 
  Convert the GFP_KERNEL flag used in JBD/JBD2 to GFP_NOFS, consistent
  with the rest of kmalloc flag used in the JBD/JBD2 layer.
  
  Signed-off-by: Mingming Cao [EMAIL PROTECTED]
  
  ---
   fs/jbd/journal.c  |6 +++---
   fs/jbd/revoke.c   |8 
   fs/jbd2/journal.c |6 +++---
   fs/jbd2/revoke.c  |8 
   4 files changed, 14 insertions(+), 14 deletions(-)
  
  Index: linux-2.6.23-rc6/fs/jbd/journal.c
  ===
  --- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-19 11:51:10.0 
  -0700
  +++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-19 11:51:57.0 
  -0700
  @@ -653,7 +653,7 @@ static journal_t * journal_init_common (
  journal_t *journal;
  int err;
   
  -   journal = kmalloc(sizeof(*journal), GFP_KERNEL);
  +   journal = kmalloc(sizeof(*journal), GFP_NOFS);
  if (!journal)
  goto fail;
  memset(journal, 0, sizeof(*journal));
  @@ -723,7 +723,7 @@ journal_t * journal_init_dev(struct bloc
  journal-j_blocksize = blocksize;
  n = journal-j_blocksize / sizeof(journal_block_tag_t);
  journal-j_wbufsize = n;
  -   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
  +   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
  if (!journal-j_wbuf) {
  printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
  __FUNCTION__);
  @@ -777,7 +777,7 @@ journal_t * journal_init_inode (struct i
  /* journal descriptor can store up to n blocks -bzzz */
  n = journal-j_blocksize / sizeof(journal_block_tag_t);
  journal-j_wbufsize = n;
  -   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
  +   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_NOFS);
  if (!journal-j_wbuf) {
  printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
  __FUNCTION__);
  Index: linux-2.6.23-rc6/fs/jbd/revoke.c
  ===
  --- linux-2.6.23-rc6.orig/fs/jbd/revoke.c   2007-09-19 11:51:30.0 
  -0700
  +++ linux-2.6.23-rc6/fs/jbd/revoke.c2007-09-19 11:52:34.0 
  -0700
  @@ -206,7 +206,7 @@ int journal_init_revoke(journal_t *journ
  while((tmp = 1UL) != 0UL)
  shift++;
   
  -   journal-j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, 
  GFP_KERNEL);
  +   journal-j_revoke_table[0] = kmem_cache_alloc(revoke_table_cache, 
  GFP_NOFS);
  if (!journal-j_revoke_table[0])
  return -ENOMEM;
  journal-j_revoke = journal-j_revoke_table[0];
  @@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
  journal-j_revoke-hash_shift = shift;
   
  journal-j_revoke-hash_table =
  -   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
  +   kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
  if (!journal-j_revoke-hash_table) {
  kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
  journal-j_revoke = NULL;
  @@ -229,7 +229,7 @@ int journal_init_revoke(journal_t *journ
  for (tmp = 0; tmp  hash_size; tmp++)
  INIT_LIST_HEAD(journal-j_revoke-hash_table[tmp]);
   
  -   journal-j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, 
  GFP_KERNEL);
  +   journal-j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, 
  GFP_NOFS);
  if (!journal-j_revoke_table[1]) {
  kfree(journal-j_revoke_table[0]-hash_table);
  kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
  @@ -246,7 +246,7 @@ int journal_init_revoke(journal_t *journ
  journal-j_revoke-hash_shift = shift;
   
  journal-j_revoke-hash_table =
  -   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
  +   kmalloc(hash_size * sizeof(struct list_head), GFP_NOFS);
  if (!journal-j_revoke-hash_table) {
  kfree(journal-j_revoke_table[0]-hash_table);
  kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
 
 These were all OK using GFP_KERNEL.
 
 GFP_NOFS should only be used when the caller is holding some fs locks which
 might cause a deadlock if that caller reentered the fs in -writepage (and
 maybe put_inode and such).  That isn't the case in any of the above code,
 which is all mount time stuff (I think).
 

You are right they are all occur at initialization time.

 ext3/4 should be using GFP_NOFS when the caller has a transaction open, has
 a page locked, is holding i_mutex, etc.
 

Thanks for your feedback.

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
 On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
  Here is the incremental small cleanup patch. 
  
  Remove kamlloc usages in jbd/jbd2 and consistently use 
  jbd_kmalloc/jbd2_malloc.
 
 Shouldn't we kill jbd_kmalloc instead?
 

It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
places to handle memory (de)allocation(page size) via kmalloc/kfree, so
in the future if we need to change memory allocation in jbd(e.g. not
using kmalloc or using different flag), we don't need to touch every
place in the jbd code calling jbd_kmalloc.

Regards,
Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] JBD slab cleanups

2007-09-18 Thread Mingming Cao
On Tue, 2007-09-18 at 13:04 -0500, Dave Kleikamp wrote:
 On Tue, 2007-09-18 at 09:35 -0700, Mingming Cao wrote:
  On Tue, 2007-09-18 at 10:04 +0100, Christoph Hellwig wrote:
   On Mon, Sep 17, 2007 at 03:57:31PM -0700, Mingming Cao wrote:
Here is the incremental small cleanup patch. 

Remove kamlloc usages in jbd/jbd2 and consistently use 
jbd_kmalloc/jbd2_malloc.
   
   Shouldn't we kill jbd_kmalloc instead?
   
  
  It seems useful to me to keep jbd_kmalloc/jbd_free. They are central
  places to handle memory (de)allocation(page size) via kmalloc/kfree, so
  in the future if we need to change memory allocation in jbd(e.g. not
  using kmalloc or using different flag), we don't need to touch every
  place in the jbd code calling jbd_kmalloc.
 
 I disagree.  Why would jbd need to globally change the way it allocates
 memory?  It currently uses kmalloc (and jbd_kmalloc) for allocating a
 variety of structures.  Having to change one particular instance won't
 necessarily mean we want to change all of them.  Adding unnecessary
 wrappers only obfuscates the code making it harder to understand.  You
 wouldn't want every subsystem to have it's own *_kmalloc() that took
 different arguments.  Besides, there aren't that many calls to kmalloc
 and kfree in the jbd code, so there wouldn't be much pain in changing
 GFP flags or whatever, if it ever needed to be done.
 
 Shaggy

Okay, Points taken, Here is the updated patch to get rid of slab
management and jbd_kmalloc from jbd totally. This patch is intend to
replace the patch in mm tree, Andrew, could you pick up this one
instead?

Thanks,

Mingming


jbd/jbd2: JBD memory allocation cleanups

From: Christoph Lameter [EMAIL PROTECTED]

JBD: Replace slab allocations with page cache allocations

JBD allocate memory for committed_data and frozen_data from slab. However
JBD should not pass slab pages down to the block layer. Use page allocator 
pages instead. This will also prepare JBD for the large blocksize patchset.


Also this patch cleans up jbd_kmalloc and replace it with kmalloc directly

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Mingming Cao [EMAIL PROTECTED]

---
 fs/jbd/commit.c   |6 +--
 fs/jbd/journal.c  |   99 ++
 fs/jbd/transaction.c  |   12 +++---
 fs/jbd2/commit.c  |6 +--
 fs/jbd2/journal.c |   99 ++
 fs/jbd2/transaction.c |   18 -
 include/linux/jbd.h   |   18 +
 include/linux/jbd2.h  |   21 +-
 8 files changed, 52 insertions(+), 227 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-18 17:19:01.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-18 17:51:21.0 -0700
@@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
 static int journal_convert_superblock_v1(journal_t *, journal_superblock_t *);
 static void __journal_abort_soft (journal_t *journal, int errno);
-static int journal_create_jbd_slab(size_t slab_size);
 
 /*
  * Helper function used to manage commit timeouts
@@ -334,10 +333,10 @@ repeat:
char *tmp;
 
jbd_unlock_bh_state(bh_in);
-   tmp = jbd_slab_alloc(bh_in-b_size, GFP_NOFS);
+   tmp = jbd_alloc(bh_in-b_size, GFP_NOFS);
jbd_lock_bh_state(bh_in);
if (jh_in-b_frozen_data) {
-   jbd_slab_free(tmp, bh_in-b_size);
+   jbd_free(tmp, bh_in-b_size);
goto repeat;
}
 
@@ -654,7 +653,7 @@ static journal_t * journal_init_common (
journal_t *journal;
int err;
 
-   journal = jbd_kmalloc(sizeof(*journal), GFP_KERNEL);
+   journal = kmalloc(sizeof(*journal), GFP_KERNEL|__GFP_NOFAIL);
if (!journal)
goto fail;
memset(journal, 0, sizeof(*journal));
@@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
}
}
 
-   /*
-* Create a slab for this blocksize
-*/
-   err = journal_create_jbd_slab(be32_to_cpu(sb-s_blocksize));
-   if (err)
-   return err;
-
/* Let the recovery code check whether it needs to recover any
 * data from the journal. */
if (journal_recover(journal))
@@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
 }
 
 /*
- * Simple support for retrying memory allocations.  Introduced to help to
- * debug different VM deadlock avoidance strategies.
- */
-void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
-{
-   return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
-}
-
-/*
- * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
- * and allocate frozen and commit buffers from these slabs.
- *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could

Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Mingming Cao
On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
 jbd/jbd2: Replace slab allocations with page cache allocations
 
 From: Christoph Lameter [EMAIL PROTECTED]
 
 JBD should not pass slab pages down to the block layer.
 Use page allocator pages instead. This will also prepare
 JBD for the large blocksize patchset.
 

Currently memory allocation for committed_data(and frozen_buffer) for
bufferhead is done through jbd slab management, as Christoph Hellwig
pointed out that this is broken as jbd should not pass slab pages down
to IO layer. and suggested to use get_free_pages() directly.

The problem with this patch, as Andreas Dilger pointed today in ext4
interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
1/3-1/2 page space. 

What was the originally intention to set up slabs for committed_data(and
frozen_buffer) in JBD? Why not using kmalloc?

Mingming

 Tested on 2.6.23-rc6 with fsx runs fine.
 
 Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
 Signed-off-by: Mingming Cao [EMAIL PROTECTED]
 ---
  fs/jbd/checkpoint.c   |2 
  fs/jbd/commit.c   |6 +-
  fs/jbd/journal.c  |  107 
 -
  fs/jbd/transaction.c  |   10 ++--
  fs/jbd2/checkpoint.c  |2 
  fs/jbd2/commit.c  |6 +-
  fs/jbd2/journal.c |  109 
 --
  fs/jbd2/transaction.c |   18 
  include/linux/jbd.h   |   23 +-
  include/linux/jbd2.h  |   28 ++--
  10 files changed, 83 insertions(+), 228 deletions(-)
 
 Index: linux-2.6.23-rc5/fs/jbd/journal.c
 ===
 --- linux-2.6.23-rc5.orig/fs/jbd/journal.c2007-09-13 13:37:57.0 
 -0700
 +++ linux-2.6.23-rc5/fs/jbd/journal.c 2007-09-13 13:45:39.0 -0700
 @@ -83,7 +83,6 @@ EXPORT_SYMBOL(journal_force_commit);
 
  static int journal_convert_superblock_v1(journal_t *, journal_superblock_t 
 *);
  static void __journal_abort_soft (journal_t *journal, int errno);
 -static int journal_create_jbd_slab(size_t slab_size);
 
  /*
   * Helper function used to manage commit timeouts
 @@ -334,10 +333,10 @@ repeat:
   char *tmp;
 
   jbd_unlock_bh_state(bh_in);
 - tmp = jbd_slab_alloc(bh_in-b_size, GFP_NOFS);
 + tmp = jbd_alloc(bh_in-b_size, GFP_NOFS);
   jbd_lock_bh_state(bh_in);
   if (jh_in-b_frozen_data) {
 - jbd_slab_free(tmp, bh_in-b_size);
 + jbd_free(tmp, bh_in-b_size);
   goto repeat;
   }
 
 @@ -679,7 +678,7 @@ static journal_t * journal_init_common (
   /* Set up a default-sized revoke table for the new mount. */
   err = journal_init_revoke(journal, JOURNAL_REVOKE_DEFAULT_HASH);
   if (err) {
 - kfree(journal);
 + jbd_kfree(journal);
   goto fail;
   }
   return journal;
 @@ -728,7 +727,7 @@ journal_t * journal_init_dev(struct bloc
   if (!journal-j_wbuf) {
   printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
   __FUNCTION__);
 - kfree(journal);
 + jbd_kfree(journal);
   journal = NULL;
   goto out;
   }
 @@ -782,7 +781,7 @@ journal_t * journal_init_inode (struct i
   if (!journal-j_wbuf) {
   printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
   __FUNCTION__);
 - kfree(journal);
 + jbd_kfree(journal);
   return NULL;
   }
 
 @@ -791,7 +790,7 @@ journal_t * journal_init_inode (struct i
   if (err) {
   printk(KERN_ERR %s: Cannnot locate journal superblock\n,
  __FUNCTION__);
 - kfree(journal);
 + jbd_kfree(journal);
   return NULL;
   }
 
 @@ -1095,13 +1094,6 @@ int journal_load(journal_t *journal)
   }
   }
 
 - /*
 -  * Create a slab for this blocksize
 -  */
 - err = journal_create_jbd_slab(be32_to_cpu(sb-s_blocksize));
 - if (err)
 - return err;
 -
   /* Let the recovery code check whether it needs to recover any
* data from the journal. */
   if (journal_recover(journal))
 @@ -1166,7 +1158,7 @@ void journal_destroy(journal_t *journal)
   if (journal-j_revoke)
   journal_destroy_revoke(journal);
   kfree(journal-j_wbuf);
 - kfree(journal);
 + jbd_kfree(journal);
  }
 
 
 @@ -1615,86 +1607,6 @@ int journal_blocks_per_page(struct inode
  }
 
  /*
 - * Simple support for retrying memory allocations.  Introduced to help to
 - * debug different VM deadlock avoidance strategies.
 - */
 -void * __jbd_kmalloc (const char *where, size_t size, gfp_t flags, int retry)
 -{
 - return kmalloc(size, flags | (retry ? __GFP_NOFAIL : 0));
 -}
 -
 -/*
 - * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
 - * and allocate frozen

Re: [PATCH] JBD slab cleanups

2007-09-17 Thread Mingming Cao
On Mon, 2007-09-17 at 15:01 -0700, Badari Pulavarty wrote:
 On Mon, 2007-09-17 at 12:29 -0700, Mingming Cao wrote:
  On Fri, 2007-09-14 at 11:53 -0700, Mingming Cao wrote:
   jbd/jbd2: Replace slab allocations with page cache allocations
   
   From: Christoph Lameter [EMAIL PROTECTED]
   
   JBD should not pass slab pages down to the block layer.
   Use page allocator pages instead. This will also prepare
   JBD for the large blocksize patchset.
   
  
  Currently memory allocation for committed_data(and frozen_buffer) for
  bufferhead is done through jbd slab management, as Christoph Hellwig
  pointed out that this is broken as jbd should not pass slab pages down
  to IO layer. and suggested to use get_free_pages() directly.
  
  The problem with this patch, as Andreas Dilger pointed today in ext4
  interlock call, for 1k,2k block size ext2/3/4, get_free_pages() waste
  1/3-1/2 page space. 
  
  What was the originally intention to set up slabs for committed_data(and
  frozen_buffer) in JBD? Why not using kmalloc?
  
  Mingming
 
 Looks good. Small suggestion is to get rid of all kmalloc() usages and
 consistently use jbd_kmalloc() or jbd2_kmalloc().
 
 Thanks,
 Badari
 

Here is the incremental small cleanup patch. 

Remove kamlloc usages in jbd/jbd2 and consistently use jbd_kmalloc/jbd2_malloc.


Signed-off-by: Mingming Cao [EMAIL PROTECTED]
---
 fs/jbd/journal.c  |8 +---
 fs/jbd/revoke.c   |   12 ++--
 fs/jbd2/journal.c |8 +---
 fs/jbd2/revoke.c  |   12 ++--
 4 files changed, 22 insertions(+), 18 deletions(-)

Index: linux-2.6.23-rc6/fs/jbd/journal.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/journal.c  2007-09-17 14:32:16.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/journal.c   2007-09-17 14:33:59.0 -0700
@@ -723,7 +723,8 @@ journal_t * journal_init_dev(struct bloc
journal-j_blocksize = blocksize;
n = journal-j_blocksize / sizeof(journal_block_tag_t);
journal-j_wbufsize = n;
-   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal-j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal-j_wbuf) {
printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
__FUNCTION__);
@@ -777,7 +778,8 @@ journal_t * journal_init_inode (struct i
/* journal descriptor can store up to n blocks -bzzz */
n = journal-j_blocksize / sizeof(journal_block_tag_t);
journal-j_wbufsize = n;
-   journal-j_wbuf = kmalloc(n * sizeof(struct buffer_head*), GFP_KERNEL);
+   journal-j_wbuf = jbd_kmalloc(n * sizeof(struct buffer_head*),
+   GFP_KERNEL);
if (!journal-j_wbuf) {
printk(KERN_ERR %s: Cant allocate bhs for commit thread\n,
__FUNCTION__);
@@ -1157,7 +1159,7 @@ void journal_destroy(journal_t *journal)
iput(journal-j_inode);
if (journal-j_revoke)
journal_destroy_revoke(journal);
-   kfree(journal-j_wbuf);
+   jbd_kfree(journal-j_wbuf);
jbd_kfree(journal);
 }
 
Index: linux-2.6.23-rc6/fs/jbd/revoke.c
===
--- linux-2.6.23-rc6.orig/fs/jbd/revoke.c   2007-09-17 14:32:22.0 
-0700
+++ linux-2.6.23-rc6/fs/jbd/revoke.c2007-09-17 14:35:13.0 -0700
@@ -219,7 +219,7 @@ int journal_init_revoke(journal_t *journ
journal-j_revoke-hash_shift = shift;
 
journal-j_revoke-hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal-j_revoke-hash_table) {
kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
journal-j_revoke = NULL;
@@ -231,7 +231,7 @@ int journal_init_revoke(journal_t *journ
 
journal-j_revoke_table[1] = kmem_cache_alloc(revoke_table_cache, 
GFP_KERNEL);
if (!journal-j_revoke_table[1]) {
-   kfree(journal-j_revoke_table[0]-hash_table);
+   jbd_kfree(journal-j_revoke_table[0]-hash_table);
kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
return -ENOMEM;
}
@@ -246,9 +246,9 @@ int journal_init_revoke(journal_t *journ
journal-j_revoke-hash_shift = shift;
 
journal-j_revoke-hash_table =
-   kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
+   jbd_kmalloc(hash_size * sizeof(struct list_head), GFP_KERNEL);
if (!journal-j_revoke-hash_table) {
-   kfree(journal-j_revoke_table[0]-hash_table);
+   jbd_kfree(journal-j_revoke_table[0]-hash_table);
kmem_cache_free(revoke_table_cache, journal-j_revoke_table[0]);
kmem_cache_free

Ext4 patch queue updated(2.6.23-rc6)

2007-09-14 Thread Mingming Cao
Changes:

Rebase to 2.6.23-rc6
updated jbd_slab_cleanup.patch. (from Mingming)
updated commit messages to different patches in the patch-queue (from
Aneesh.)
Added JBD-blocks-reservation-fix-for-large-blk.patch (part of 4k block
support) (from Mingming)

http://repo.or.cz/w/ext4-patch-queue.git


Ted, could you please update the ext4 git tree with the patches before
the unstable mark, so we could get more air time in mm tree for
testing?

Thanks,
Mingming

current patch series

# Rebased the patches to 2.6.23-rc6
jbd_slab_cleanup.patch
sparse-fix.patch

# Add journal checksums
ext4-journal_chksum-2.6.20.patch
ext4-journal-chksum-review-fix.patch

# Add unused inode watermark and checksum to blockgroup descriptors
# Need e2fsprogs changes to mkfs to use this featur
ext4_uninit_blockgroup.patch

# Large blocksize support
#
ext234_enlarge_blocksize.patch
ext2_rec_len_overflow_with_64kblk_fix.patch
ext3_rec_len_overflow_with_64kblk_fix.patch
ext4_rec_len_overflow_with_64kblk_fix.patch
JBD-blocks-reservation-fix-for-large-blk.patch


# patches still have outstanding comments from lkml


# inode verion patch series
# inode versioning is needed for NFSv4
# Need to address overwrite case and performance concerns
# Need to repost to lkml for comments before submit
#
# vfs changes, 64 bit inode-i_version
64-bit-i_version.patch
# reserve hi 32 bit inode version on ext4 on-disk inode
i_version_hi.patch
# ext4 inode version read/store
ext4_i_version_hi_2.patch
# ext4 inode version update
i_version_update_ext4.patch

# Export jbd stats through procfs
# Shall this move to debugfs?
# Still need to address akpm's comments.
jbd-stats-through-procfs
jbd-stats-through-procfs_fix.patch


#Unstable patches

# New delayed allocation patch
delalloc-vfs.patch
delalloc-ext4.patch

ext-truncate-mutex.patch
ext3-4-migrate.patch
new-extent-function.patch
###
#mballoc
###
generic-find-next-le-bit
mballoc-core.patch

#
#FLEX_BG

flex_bg_kernel.patch



-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: new mballoc patches.

2007-09-11 Thread Mingming Cao
On Tue, 2007-09-11 at 13:59 +0530, Aneesh Kumar K.V wrote:
 I have updated the mballoc patches. The same can be found at
 
 http://www.radian.org/~kvaneesh/ext4/patch-series/
 
 
 
 The series file is 
 
 # This series applies on GIT commit b07d68b5ca4d55a16fab223d63d5fb36f89ff42f
 ext4-journal_chksum-2.6.20.patch
 ext4-journal-chksum-review-fix.patch
 ext4_uninit_blockgroup.patch
 64-bit-i_version.patch
 i_version_hi.patch
 ext4_i_version_hi_2.patch
 i_version_update_ext4.patch
 jbd-stats-through-procfs
 jbd-stats-through-procfs_fix.patch
 delalloc-vfs.patch
 delalloc-ext4.patch
 ext-truncate-mutex.patch
 ext3-4-migrate.patch
 new-extent-function.patch
 generic-find-next-le-bit
 mballoc-core.patch
 ext234_enlarge_blocksize.patch
 ext2_rec_len_overflow_with_64kblk_fix.patch
 ext3_rec_len_overflow_with_64kblk_fix.patch
 ext4_rec_len_overflow_with_64kblk_fix.patch
 RFC-1-2-JBD-slab-management-support-for-large-b.patch
 RFC-2-2-JBD-blocks-reservation-fix-for-large-bl.patch
 
 
 
 Changes:
 a) deleted mballoc-fixup.patch; merged with mballoc-core.path
 b) deleted mballoc-mkdir-oops-fix.patch; merged with mballoc-core.patch
 c) merged  Uninitialized group related changes for mballoc
 d) merged the checkpatch.pl error fix patch for mballoc 
 e) merged the fix for ext4_ext_search_right for error 
EXT4-fs error (device sdc): ext4_ext_search_right: bad header in inode 
 #3745797: unexpected eh_depth - magic f30a, entries 81, max 84(0), depth 0(1)
 f) PPC build fix by introducing generic_find_next_le_bit()
 g) Document the code. Rest of the places that needs more comments are marked 
 with FIXME!!
 h) 48 bit block usage by converting bg_*_bitmap and bg_inode_table to 
 respective ext4_*_bitmap and ext4_inode_table() function
 i) Added jbd fixes patches from Mingming for large  block size with the 
 comment from hch. (this is added towards the end of the series)
 k) Updated the ext234_enlarge_blocksize.patch 
 ext2_rec_len_overflow_with_64kblk_fix.patch 
 ext3_rec_len_overflow_with_64kblk_fix.patch 
ext4_rec_len_overflow_with_64kblk_fix.patch to make sure they can be 
 applied with stgit. They didn't had a valid email id in the
From: field. ( This is only commit message update )
 
 Test status:
 Minor testing with KVM. I also didn't do a PPC build. 
 
 Mingming/Avantika,
 
 Can we update the patch queue with the series ?
 

Thanks for the update.  I just back to home from London late last night,
still catching up emails, so I haven't get a chance to update the ext4
patch queue yet. Will do so shortly.


Cheers,
Mingming
 -aneesh
   
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] placate checkpatch.pl to some degree for mballoc.c

2007-08-31 Thread Mingming Cao
On Fri, 2007-08-31 at 16:28 -0500, Eric Sandeen wrote:
 Here's a patch for mballoc.c, to make checkpatch happier with it.
 
 I was about as pedantic as I could be, except for a few things it 
 complained about which I just could not agree with. :)
 
 One of the comments in the series file in git says checkpatch doesn't like
 mballoc, so, here you go, if it's helpful.  

Very helpful, thanks!

 Applies to the bottom of the
 patch stack - should some of these mballoc patches get rolled together by
 now?
 
Okay, will do.

There are still a issue with mballoc, it doesn't handle big endian  when
generating free extents map from on-disk bitmaps. Thus cause failure
compile on ppc64. ppc64 needs to implement generic_find_next_le_bit.

Andreas pointed a patch already in cfs bug report,
https://bugzilla.lustre.org/show_bug.cgi?id=10634, the patch for this is
in the last attachment. 


 -Eric
 
 
 
 Make checkpatch happier with mballoc.c
 
 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 
 Index: ext4.git/fs/ext4/mballoc.c
 ===
 --- ext4.git.orig/fs/ext4/mballoc.c
 +++ ext4.git/fs/ext4/mballoc.c
 @@ -247,9 +247,9 @@
   */
  #define MB_DEBUG__
  #ifdef MB_DEBUG
 -#define mb_debug(fmt,a...)   printk(fmt, ##a)
 +#define mb_debug(fmt, a...)  printk(fmt, ##a)
  #else
 -#define mb_debug(fmt,a...)
 +#define mb_debug(fmt, a...)
 
  #endif
 
  /*
 @@ -303,7 +303,7 @@
   */
  #define MB_DEFAULT_STRIPE256
 
 -static struct kmem_cache *ext4_pspace_cachep = NULL;
 +static struct kmem_cache *ext4_pspace_cachep;
 
  #ifdef EXT4_BB_MAX_BLOCKS
  #undef EXT4_BB_MAX_BLOCKS
 @@ -353,7 +353,7 @@ struct ext4_prealloc_space {
   unsigned short  pa_len; /* len of preallocated chunk */
   unsigned short  pa_free;/* how many blocks are free */
   unsigned short  pa_linear;  /* consumed in one direction
 -  * strictly, for group prealloc 
 */
 +  * strictly, for grp prealloc */
   spinlock_t  *pa_obj_lock;
   struct inode*pa_inode;  /* hack, for history only */
  };
 @@ -456,8 +456,8 @@ static void ext4_mb_store_history(struct
 
  static struct proc_dir_entry *proc_root_ext4;
 
 -int ext4_create (struct inode *, struct dentry *, int, struct nameidata *);
 -struct buffer_head * read_block_bitmap(struct super_block *, unsigned int);
 +int ext4_create(struct inode *, struct dentry *, int, struct nameidata *);
 +struct buffer_head *read_block_bitmap(struct super_block *, unsigned int);
  ext4_fsblk_t ext4_new_blocks_old(handle_t *handle, struct inode *inode,
   ext4_fsblk_t goal, unsigned long *count, int *errp);
  void ext4_mb_release_blocks(struct super_block *, int);
 @@ -465,11 +465,13 @@ void ext4_mb_poll_new_transaction(struct
  void ext4_mb_free_committed_blocks(struct super_block *);
  void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap, int 
 group);
  void ext4_mb_free_consumed_preallocations(struct ext4_allocation_context 
 *ac);
 -void ext4_mb_return_to_preallocation(struct inode *inode, struct ext4_buddy 
 *e3b,
 - sector_t block, int count);
 +void ext4_mb_return_to_preallocation(struct inode *inode,
 + struct ext4_buddy *e3b, sector_t block,
 + int count);
  void ext4_mb_show_ac(struct ext4_allocation_context *ac);
  void ext4_mb_check_with_pa(struct ext4_buddy *e3b, int first, int count);
 -void ext4_mb_put_pa(struct ext4_allocation_context *, struct super_block *, 
 struct ext4_prealloc_space *pa);
 +void ext4_mb_put_pa(struct ext4_allocation_context *, struct super_block *,
 + struct ext4_prealloc_space *pa);
  int ext4_mb_init_per_dev_proc(struct super_block *sb);
  int ext4_mb_destroy_per_dev_proc(struct super_block *sb);
 
 @@ -507,13 +509,13 @@ unsigned long ext4_grp_offs_to_block(str
  }
 
  #if BITS_PER_LONG == 64
 -#define mb_correct_addr_and_bit(bit,addr)\
 +#define mb_correct_addr_and_bit(bit, addr)   \
  {\
   bit += ((unsigned long) addr  7UL)  3;   \
   addr = (void *) ((unsigned long) addr  ~7UL);  \
  }
  #elif BITS_PER_LONG == 32
 -#define mb_correct_addr_and_bit(bit,addr)\
 +#define mb_correct_addr_and_bit(bit, addr)   \
  {\
   bit += ((unsigned long) addr  3UL)  3;   \
   addr = (void *) ((unsigned long) addr  ~3UL);  \
 @@ -524,31 +526,31 @@ unsigned long ext4_grp_offs_to_block(str
 
  static inline int mb_test_bit(int bit, void *addr)
  {
 - mb_correct_addr_and_bit(bit,addr);
 + mb_correct_addr_and_bit(bit, addr);
   return ext2_test_bit(bit, addr);
  }
 
  static inline void 

Re: [RFC 1/4] Large Blocksize support for Ext2/3/4

2007-08-31 Thread Mingming Cao
On Wed, 2007-08-29 at 17:47 -0700, Mingming Cao wrote:

 Just rebase to 2.6.23-rc4 and against the ext4 patch queue. Compile tested 
 only. 
 
 Next steps:
 Need a e2fsprogs changes to able test this feature. As mkfs needs to be
 educated not assuming rec_len to be blocksize all the time.
 Will try it with Christoph Lameter's large block patch next.
 

Two problems were found when testing largeblock on ext3.  Patches to
follow. 

Good news is, with your changes, plus all these extN changes, I am able
to run ext2/3/4 with 64k block size, tested on x86 and ppc64 with 4k
page size. fsx test runs fine for an hour on ext3 with 16k blocksize on
x86:-)

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 1/2] JBD: slab management support for large block(8k)

2007-08-31 Thread Mingming Cao
From clameter:
Teach jbd/jbd2 slab management to support 8k block size. Without this, it 
refused to mount on 8k ext3.

Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: my2.6/fs/jbd/journal.c
===
--- my2.6.orig/fs/jbd/journal.c 2007-08-30 18:40:02.0 -0700
+++ my2.6/fs/jbd/journal.c  2007-08-31 11:01:18.0 -0700
@@ -1627,16 +1627,17 @@ void * __jbd_kmalloc (const char *where,
  * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
  * and allocate frozen and commit buffers from these slabs.
  *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
+ * (Note: We only seem to need the definitions here for the SLAB_DEBUG
+ * case. In non debug operations SLUB will find the corresponding kmalloc
+ * cache and create an alias. --clameter)
  */
-
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size  11)
+#define JBD_MAX_SLABS 7
+#define JBD_SLAB_INDEX(size)  get_order((size)  (PAGE_SHIFT - 10))
 
 static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
 static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   jbd_1k, jbd_2k, jbd_4k, NULL, jbd_8k
+   jbd_1k, jbd_2k, jbd_4k, jbd_8k,
+   jbd_16k, jbd_32k, jbd_64k
 };
 
 static void journal_destroy_jbd_slabs(void)
Index: my2.6/fs/jbd2/journal.c
===
--- my2.6.orig/fs/jbd2/journal.c2007-08-30 18:40:02.0 -0700
+++ my2.6/fs/jbd2/journal.c 2007-08-31 11:04:37.0 -0700
@@ -1639,16 +1639,18 @@ void * __jbd2_kmalloc (const char *where
  * jbd slab management: create 1k, 2k, 4k, 8k slabs as needed
  * and allocate frozen and commit buffers from these slabs.
  *
- * Reason for doing this is to avoid, SLAB_DEBUG - since it could
- * cause bh to cross page boundary.
+ * (Note: We only seem to need the definitions here for the SLAB_DEBUG
+ * case. In non debug operations SLUB will find the corresponding kmalloc
+ * cache and create an alias. --clameter)
  */
 
-#define JBD_MAX_SLABS 5
-#define JBD_SLAB_INDEX(size)  (size  11)
+#define JBD_MAX_SLABS 7
+#define JBD_SLAB_INDEX(size)  get_order((size)  (PAGE_SHIFT - 10))
 
 static struct kmem_cache *jbd_slab[JBD_MAX_SLABS];
 static const char *jbd_slab_names[JBD_MAX_SLABS] = {
-   jbd2_1k, jbd2_2k, jbd2_4k, NULL, jbd2_8k
+   jbd2_1k, jbd2_2k, jbd2_4k, jbd2_8k,
+jbd2_16k, jbd2_32k, jbd2_64k
 };
 
 static void jbd2_journal_destroy_jbd_slabs(void)


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 2/2] JBD: blocks reservation fix for large block support

2007-08-31 Thread Mingming Cao
The blocks per page could be less or quals to 1 with the large block support in 
VM.
The patch fixed the way to calculate the number of blocks to reserve in journal 
in the
case blocksize  pagesize.



Signed-off-by: Mingming Cao [EMAIL PROTECTED]

Index: my2.6/fs/jbd/journal.c
===
--- my2.6.orig/fs/jbd/journal.c 2007-08-31 13:27:16.0 -0700
+++ my2.6/fs/jbd/journal.c  2007-08-31 13:28:18.0 -0700
@@ -1611,7 +1611,12 @@ void journal_ack_err(journal_t *journal)
 
 int journal_blocks_per_page(struct inode *inode)
 {
-   return 1  (PAGE_CACHE_SHIFT - inode-i_sb-s_blocksize_bits);
+   int bits = PAGE_CACHE_SHIFT - inode-i_sb-s_blocksize_bits;
+
+   if (bits  0)
+   return 1  bits;
+   else
+   return 1;
 }
 
 /*
Index: my2.6/fs/jbd2/journal.c
===
--- my2.6.orig/fs/jbd2/journal.c2007-08-31 13:32:21.0 -0700
+++ my2.6/fs/jbd2/journal.c 2007-08-31 13:32:30.0 -0700
@@ -1612,7 +1612,12 @@ void jbd2_journal_ack_err(journal_t *jou
 
 int jbd2_journal_blocks_per_page(struct inode *inode)
 {
-   return 1  (PAGE_CACHE_SHIFT - inode-i_sb-s_blocksize_bits);
+   int bits = PAGE_CACHE_SHIFT - inode-i_sb-s_blocksize_bits;
+
+   if (bits  0)
+   return 1  bits;
+   else
+   return 1;
 }
 
 /*


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 8][PATCH 1/1]Add journal checksums

2007-08-30 Thread Mingming Cao
On Wed, 2007-08-01 at 12:34 +0530, Girish Shilamkar wrote:
 On Wed, 2007-07-11 at 17:16 +0530, Girish Shilamkar wrote:
 
  I will make the changes and send an incremental patch.
  
 Hi,
   I have made the changes and attached the incremental patch as per the
 review.
 
 This is the actual changelog which was missing in the original patch.
 
 --
 The journal checksum feature adds two new flags i.e 
 JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.
 
 JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the
 checksum for the blocks described by the descriptor blocks.
 Due to checksums, writing of the commit record no longer needs to be
 synchronous. Now commit record can be sent to disk without waiting for
 descriptor blocks to be written to disk. This behavior is controlled
 using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be
 able to recover the journal with _ASYNC_COMMIT hence it is made
 incompat.
 The commit header has been extended to hold the checksum along with the
 type of the checksum.
 
 For recovery in pass scan checksums are verified to ensure the sanity
 and completeness(in case of _ASYNC_COMMIT) of every transaction.
 -

Hit kernel oops when run fsstress test on ext4-patch-queue with -o
journal_checksum.

BUG: unable to handle kernel NULL pointer dereference at virtual address
 
 printing eip:
c118ba5d
*pdpt = 2560a001
*pde = 
Oops:  [#1]
SMP 
Modules linked in:
CPU:1
EIP:0060:[c118ba5d]Not tainted VLI
EFLAGS: 00010257   (2.6.23-rc4-autokern1 #1)
EIP is at crc32_be+0x3d/0x9c
eax: 7e78a276   ebx: 76a2787e   ecx: 0400   edx: 
esi:    edi:    ebp: f56e5200   esp: e61f9e90
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process kjournald2 (pid: 5388, ti=e61f8000 task=e3efc000 task.ti=e61f8000)
Stack: ef5fffc0 0016 c10c762d 055e   1000  
   f50e3e80 7e78a276 0008  0544 eb46aab4 eb46aabc 0155 
   f585f800 e1e1c968  eb059428 055e   e3efc000 
Call Trace:
 [c10c762d] jbd2_journal_commit_transaction+0x92a/0x128d
 [c1029e51] autoremove_wake_function+0x0/0x33
 [c1029e51] autoremove_wake_function+0x0/0x33
 [c10214f4] try_to_del_timer_sync+0x42/0x48
 [c10ca4fd] kjournald2+0x130/0x307
 [c1029e51] autoremove_wake_function+0x0/0x33
 [c129171c] __sched_text_start+0x364/0x3ff
 [c1029e51] autoremove_wake_function+0x0/0x33
 [c10ca3cd] kjournald2+0x0/0x307
 [c1029a27] kthread+0x34/0x55
 [c10299f3] kthread+0x0/0x55
 [c1003173] kernel_thread_helper+0x7/0x10
 ===
Code: 42 30 d8 0f b6 c0 c1 eb 08 33 1c 85 e0 ae 2a c1 49 74 05 f6 c2 03 75 e5 
83 f9 03 76 4c 89 ce 83 ea 04 83 e6 03 c1 e9 02 83 c2 04 33 1a 0f b6 c3 c1 eb 
08 33 1c 85 e0 ae 2a c1 0f b6 c3 c1 eb 08 
EIP: [c118ba5d] crc32_be+0x3d/0x9c SS:ESP 0068:e61f9e90


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8][e2fsprogs] add new -D desc-size option to mkfs

2007-08-30 Thread Mingming Cao
On Thu, 2007-08-30 at 17:37 +0200, Valerie Clement wrote:
 From: Valerie Clement [EMAIL PROTECTED]
 
 This patch adds a new option -D descriptor-size to mkfs to allow creating
 ext4 filesystems with larger group descriptor size.
 
 By default, the group descriptor size is 32 bytes.
 To support 64-bit physical block numbers and thus to be able to use 
 filesystems
   16TB, the group descriptor size must be increased to 64 bytes.
 

Could you explain a bit more why we need this option? It seems to me we
could make the block group descriptor size to 64bytes whenever creating
a ext4 filesystem by default. Is there any case we need to create ext4
with old 32 bytes block group descriptors?

 Use mkfs -t ext4 -D 64 to create an ext4 filesystem with 64-byte group
 descriptor size. This set the EXT4_FEATURE_INCOMPAT_64BIT flag on device.
 
 The descriptor size is limited to a power-of-two value and to the block size.
 
 Signed-off-by: Valerie Clement [EMAIL PROTECTED]
 ---
 
  lib/ext2fs/ext2_fs.h|7 +++
  lib/ext2fs/initialize.c |1 +
  misc/mke2fs.c   |   33 +++--
  3 files changed, 39 insertions(+), 2 deletions(-)
 
 diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
 index a316665..f6e8b36 100644
 --- a/lib/ext2fs/ext2_fs.h
 +++ b/lib/ext2fs/ext2_fs.h
 @@ -217,6 +217,13 @@ struct ext2_dx_countlimit {
  /*
   * Macro-instructions used to manage group descriptors
   */
 +#define EXT2_MIN_DESC_SIZE   32
 +#define EXT2_MIN_DESC_SIZE_64BIT 64
 +#define EXT2_MAX_DESC_SIZE   EXT2_MIN_BLOCK_SIZE
 +#define EXT2_DESC_SIZE(s)\
 + ((EXT2_SB(s)-s_feature_incompat  EXT4_FEATURE_INCOMPAT_64BIT) ? \
 +  (s)-s_desc_size : EXT2_MIN_DESC_SIZE)
 +
  #define EXT2_BLOCKS_PER_GROUP(s) (EXT2_SB(s)-s_blocks_per_group)
  #define EXT2_INODES_PER_GROUP(s) (EXT2_SB(s)-s_inodes_per_group)
  #define EXT2_INODES_PER_BLOCK(s) (EXT2_BLOCK_SIZE(s)/EXT2_INODE_SIZE(s))
 diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
 index 16e9eaa..ee7af93 100644
 --- a/lib/ext2fs/initialize.c
 +++ b/lib/ext2fs/initialize.c
 @@ -149,6 +149,7 @@ errcode_t ext2fs_initialize(const char *name, int flags,
 
   set_field(s_log_block_size, 0); /* default blocksize: 1024 bytes */
   set_field(s_log_frag_size, 0); /* default fragsize: 1024 bytes */
 + set_field(s_desc_size, 0);
   set_field(s_first_data_block, super-s_log_block_size ? 0 : 1);
   set_field(s_max_mnt_count, EXT2_DFL_MAX_MNT_COUNT);
   set_field(s_errors, EXT2_ERRORS_DEFAULT);
 diff --git a/misc/mke2fs.c b/misc/mke2fs.c
 index 4a6cace..115eef9 100644
 --- a/misc/mke2fs.c
 +++ b/misc/mke2fs.c
 @@ -96,7 +96,7 @@ static void usage(void)
  {
   fprintf(stderr, _(Usage: %s [-c|-t|-l filename] [-b block-size] 
   [-f fragment-size]\n\t[-i bytes-per-inode] [-I inode-size] 
 - [-j] [-J journal-options]\n
 + [-D descriptor-size] [-j] [-J journal-options]\n
   \t[-N number-of-inodes] [-m reserved-blocks-percentage] 
   [-o creator-os]\n\t[-g blocks-per-group] [-L volume-label] 
   [-M last-mounted-directory]\n\t[-O feature[,...]] 
 @@ -911,6 +911,7 @@ static void PRS(int argc, char *argv[])
   int blocksize = 0;
   int inode_ratio = 0;
   int inode_size = 0;
 + int desc_size = 0;
   double  reserved_ratio = 5.0;
   int sector_size = 0;
   int show_version_only = 0;
 @@ -993,7 +994,7 @@ static void PRS(int argc, char *argv[])
   }
 
   while ((c = getopt (argc, argv,
 - b:cf:g:i:jl:m:no:qr:s:tvE:FI:J:L:M:N:O:R:ST:V)) != EOF) {
 + b:cf:g:i:jl:m:no:qr:s:tvD:E:FI:J:L:M:N:O:R:ST:V)) != EOF) 
 {
   switch (c) {
   case 'b':
   blocksize = strtol(optarg, tmp, 0);
 @@ -1110,6 +,14 @@ static void PRS(int argc, char *argv[])
   exit(1);
   }
   break;
 + case 'D':
 + desc_size = strtoul(optarg, tmp, 0);
 + if (*tmp) {
 + com_err(program_name, 0,
 + _(invalid descriptor size - %s), 
 optarg);
 + exit(1);
 + }
 + break;
   case 'v':
   verbose = 1;
   break;
 @@ -1439,6 +1448,26 @@ static void PRS(int argc, char *argv[])
   }
   }
 
 + if (desc_size) {
 + if (desc_size  EXT2_MIN_DESC_SIZE ||
 + desc_size  EXT2_BLOCK_SIZE(fs_param) ||
 + desc_size  (desc_size -1)) {
 + com_err(program_name, 0,
 + _(invalid descriptor size %d (min %d/max %d)),
 + desc_size, EXT2_MIN_DESC_SIZE,
 +  

Re: [PATCH V2] fix mballoc oopses on mkdir

2007-08-29 Thread Mingming Cao
On Tue, 2007-08-28 at 00:25 -0500, Eric Sandeen wrote:
 Ugh. brain clearly not fully engaged... here we go:
 
 --
 
 Tried out Ted's git tree + all pending patches today, and 
 immediately oopsed on a mkdir thanks to this in 
 ext4_mb_new_group_pa()
 
   BUG_ON(!S_ISREG(ac-ac_inode-i_mode));
 
 (there are 54 BUGs and BUG_ONs in this file...!)
 
 I think something like this patch is needed?
 

Thanks for catching this and the fix.  I saw the same kernel oops today
when rum mballoc on fsstress (mkdir). I just added this fix to the patch
queue with some comments.

 Signed-off-by: Eric Sandeen [EMAIL PROTECTED]
 
 
 Index: ext4.git/fs/ext4/extents.c
 ===
 --- ext4.git.orig/fs/ext4/extents.c
 +++ ext4.git/fs/ext4/extents.c
 @@ -2535,7 +2535,10 @@ int ext4_ext_get_blocks(handle_t *handle
   ar.goal = ext4_ext_find_goal(inode, path, iblock);
   ar.logical = iblock;
   ar.len = allocated;
 - ar.flags = EXT4_MB_HINT_DATA;
 + if (S_ISREG(inode-i_mode))
 + ar.flags = EXT4_MB_HINT_DATA;
 + else
 + ar.flags = 0;

I see. We set the flags to zero here to avoid doing in-core
preallocation for directory files later. 

Well, I am not sure whether it's worth the effort to support in-core
preallocation for directory files. But I don't see hard reason we cannot
do this. My understanding is: this is the new in-core preallocation
which does a should not depend on extents or any on-disk format changes.

Alex, any comments?


Mingming

   newblock = ext4_mb_new_blocks(handle, ar, err);
   if (!newblock)
   goto out2;
 
 -
 To unsubscribe from this list: send the line unsubscribe linux-ext4 in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   >