Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-13 Thread Mingming Cao

Dave Kleikamp wrote:


If anything, the block-based preallocation could initialize all of the
data to zero.  It would be slow, but it would still provide the correct
function and result in contiguous allocation.


Agree. That seems goood enough.


Mingming

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-13 Thread Suparna Bhattacharya
On Wed, Dec 13, 2006 at 07:36:29AM -0600, Dave Kleikamp wrote:
> On Wed, 2006-12-13 at 15:31 +0530, Amit K. Arora wrote:
> > On Tue, Dec 12, 2006 at 04:20:38PM -0800, Mingming Cao wrote:
> > > On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote:
> 
> > > Supporting preallocation for extent based files seems fairly
> > > straightforward.  I agree we should look at this first.  After get this
> > > done, it probably worth re-consider whether to support preallocation for
> > > non-extent based files on ext4. I could imagine user upgrade from ext3
> > > to ext4, and expecting to use preallocation on those existing files
> 
> I disagree here.  Why add the complexity for what is going to be a rare
> case?  In cases where a user is going to benefit from preallocation,
> she'll probably also benefit from extents, and would be better off
> making a copy of the file, thus converting it to extents.
> 
> > I gave a thought on this initially. But, I was not sure how we should
> > implement preallocation in a non-extent based file. Using extents we can
> > mark a set of blocks as unitialized, but how will we do this for
> > non-extent based files ? If we do not have a way to mark blocks
> > uninitialized, when someone will try to read from a preallocated block,
> > it will return junk/stale data instead of zeroes.
> 
> If anything, the block-based preallocation could initialize all of the
> data to zero.  It would be slow, but it would still provide the correct
> function and result in contiguous allocation.

And posix_fallocate does that already ... 

Regards
Suparna

> 
> Shaggy
> --
> David Kleikamp
> IBM Linux Technology Center
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-13 Thread Dave Kleikamp
On Wed, 2006-12-13 at 15:31 +0530, Amit K. Arora wrote:
> On Tue, Dec 12, 2006 at 04:20:38PM -0800, Mingming Cao wrote:
> > On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote:

> > Supporting preallocation for extent based files seems fairly
> > straightforward.  I agree we should look at this first.  After get this
> > done, it probably worth re-consider whether to support preallocation for
> > non-extent based files on ext4. I could imagine user upgrade from ext3
> > to ext4, and expecting to use preallocation on those existing files

I disagree here.  Why add the complexity for what is going to be a rare
case?  In cases where a user is going to benefit from preallocation,
she'll probably also benefit from extents, and would be better off
making a copy of the file, thus converting it to extents.

> I gave a thought on this initially. But, I was not sure how we should
> implement preallocation in a non-extent based file. Using extents we can
> mark a set of blocks as unitialized, but how will we do this for
> non-extent based files ? If we do not have a way to mark blocks
> uninitialized, when someone will try to read from a preallocated block,
> it will return junk/stale data instead of zeroes.

If anything, the block-based preallocation could initialize all of the
data to zero.  It would be slow, but it would still provide the correct
function and result in contiguous allocation.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-13 Thread Amit K. Arora
On Tue, Dec 12, 2006 at 04:20:38PM -0800, Mingming Cao wrote:
> On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote:
> > +
> > +   if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
> > +   return -ENOTTY;
> >
> 
> Supporting preallocation for extent based files seems fairly
> straightforward.  I agree we should look at this first.  After get this
> done, it probably worth re-consider whether to support preallocation for
> non-extent based files on ext4. I could imagine user upgrade from ext3
> to ext4, and expecting to use preallocation on those existing files

I gave a thought on this initially. But, I was not sure how we should
implement preallocation in a non-extent based file. Using extents we can
mark a set of blocks as unitialized, but how will we do this for
non-extent based files ? If we do not have a way to mark blocks
uninitialized, when someone will try to read from a preallocated block,
it will return junk/stale data instead of zeroes.

But, if we can think of a solution here then it will be as simple as
removing the check above and replacing "ext4_ext_get_blocks()" with
"ext4_get_blocks_wrap()" in the while() loop.

> 
> > +
> > +   block = EXT4_BLOCK_ALIGN(input.offset, blkbits) >> blkbits;
> > +   max_blocks = EXT4_BLOCK_ALIGN(input.len, blkbits) >> blkbits;

I was wondering if I should change above lines to this :

+   block = input.offset >> blkbits;
+   max_blocks = (EXT4_BLOCK_ALIGN(input.len+input.offset,
blkbits) >> blkbits) - block;

Reason is that the block which contains the offset, should also be
preallocated. And the max_blocks should be calculated accordingly.

> > +   while(ret>=0 && ret > +   {
> > +   block = block + ret;
> > +   max_blocks = max_blocks - ret;
> > +   ret = ext4_ext_get_blocks(handle, inode, block,
> > +   max_blocks, &map_bh,
> > +   EXT4_CREATE_UNINITIALIZED_EXT, 1);
> 
> Since the interface takes offset and number of blocks to allocate, I
> assuming we are going to handle holes in preallocation, thus, we cannot
> not mark the extend_size flag to 1 when calling ext4_ext_get_blocks.
> 
> I think we should update i_size and i_disksize after preallocation. Oh,
> to protect parallel updating i_size, we have to take i_mutex down.

Okay. So, is this what you want to be done here :

+retry:
+ret = 0;
+while(ret>=0 && ret 0 && test_bit(BH_New, &map_bh.b_state))
+nblocks = nblocks + ret;
+}
+if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb,
+&retries))
+goto retry;
+
+if(nblocks) {
+mutex_lock(&inode->i_mutex);
+inode->i_size = inode->i_size + (nblocks >> blkbits);
+EXT4_I(inode)->i_disksize = inode->i_size;
+mutex_unlock(&inode->i_mutex);
+}

> 
> > +   }
> > +   ext4_mark_inode_dirty(handle, inode);
> > +   ext4_journal_stop(handle);
> > +
> 
> Error code returned by ext4_journal_stop() is being ignored here, is
> this right?
> Well, there are other places in ext34/ioctl.c which ignore the return
> returned by ext4_journal_stop(), maybe should fix this in a separate
> patch.

Agreed. I think following should take care of it:

+   ext4_mark_inode_dirty(handle, inode);
+   ret2 = ext4_journal_stop(handle);
+   if(ret > 0)
+   ret = ret2;
+   return ret > 0 ? nblocks : ret;

> > +   return ret>0?0:ret;
> > +   }
> >
> >
> Oh, what if we failed to allocate the full amount of blocks? i.e, the
> ext4_ext_get_blocks() returns -ENOSPC error and exit the loop early. Are
> we going to return error, or try something like
> 
> if (ret == -ENOSPC && ext3_should_retry_alloc(inode->i_sb, &retries))
>   goto retry
> 
> I wonder it might be useful to return the actual number of blocks
> preallocated back to the application.

Ok. Yes, makes sense. We can return the number of "new" blocks like
this:
+   return ret > 0 ? nblocks : ret;



Please let me know if you agree with the above set of changes, and any
further comments you have. I will then update and test the new patch and
post it again. Thanks!

Regards,
Amit Arora
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-12 Thread Mingming Cao
On Tue, 2006-12-12 at 11:53 +0530, Amit K. Arora wrote:
> Hi Mingming,
> 
Hi Amit,

> On Mon, Dec 11, 2006 at 05:28:15PM -0800, Mingming Cao wrote:
> > On Wed, 2006-12-06 at 11:28 +0530, Amit K. Arora wrote:
> >
> > > @@ -1142,13 +1155,22 @@
> > >   /* try to insert block into found extent and return */
> > >   if (ex && ext4_can_extents_be_merged(inode, ex, newext)) {
> > >   ext_debug("append %d block to %d:%d (from %llu)\n",
> > > - le16_to_cpu(newext->ee_len),
> > > + ext4_ext_get_actual_len(newext),
> > >   le32_to_cpu(ex->ee_block),
> > > - le16_to_cpu(ex->ee_len), ext_pblock(ex));
> > > + ext4_ext_get_actual_len(ex), ext_pblock(ex));
> > >   if ((err = ext4_ext_get_access(handle, inode, path + depth)))
> > >   return err;
> > > - ex->ee_len = cpu_to_le16(le16_to_cpu(ex->ee_len)
> > > -  + le16_to_cpu(newext->ee_len));
> > > +
> > > + /* ext4_can_extents_be_merged should have checked that either
> > > +  * both extents are uninitialized, or both aren't. Thus we
> > > +  * need to check any of them here.
> > > +  */
> > > + if (ext4_ext_is_uninitialized(ex))
> > > + uninitialized = 1;
> > > + ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
> > > +  + ext4_ext_get_actual_len(newext));
> 
> Above line will "remove" the uninitialized bit from "ex", if it was set.
> We call ext4_ext_get_actual_len() to get the "actual" lengths of the two
> extents, which removes this MSB in ee_len (MSB in ee_len is used to mark
> an extent uninitialized). Now, we do this because if lengths of two
> uninitialized extents will be added as it is (i.e. without masking out
> the MSB in the length), it will result in removing the MSB in ee_len.
> For example, 0x8002 + 0x8003 => 0x10005 => 0x5 (since ee_len is 16 bit).
> 
> That is why just before this line, we save the "state" of this extent,
> whether it was uninitialized or not. And, we restore this "state" below.

Ah...you are right:)

More comments below...

> Index: linux-2.6.19/fs/ext4/ioctl.c
> ===
> --- linux-2.6.19.orig/fs/ext4/ioctl.c 2006-12-06 10:18:12.0 +0530
> +++ linux-2.6.19/fs/ext4/ioctl.c  2006-12-06 10:18:15.0 +0530
> @@ -248,6 +248,47 @@
>   return err;
>   }
> 
> + case EXT4_IOC_PREALLOCATE: {
> + struct ext4_falloc_input input;
> + handle_t *handle;
> + ext4_fsblk_t block, max_blocks;
> + int ret = 0;
> + struct buffer_head map_bh;
> + unsigned int blkbits = inode->i_blkbits;
> +
> + if (IS_RDONLY(inode))
> +return -EROFS;
> +
> + if (copy_from_user(&input,
> + (struct ext4_falloc_input __user *) arg, sizeof(input)))
> +return -EFAULT;
> +
> + if (input.len == 0)
> + return -EINVAL;
> +
> + if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
> + return -ENOTTY;
> 

Supporting preallocation for extent based files seems fairly
straightforward.  I agree we should look at this first.  After get this
done, it probably worth re-consider whether to support preallocation for
non-extent based files on ext4. I could imagine user upgrade from ext3
to ext4, and expecting to use preallocation on those existing files

> +
> + block = EXT4_BLOCK_ALIGN(input.offset, blkbits) >> blkbits;
> + max_blocks = EXT4_BLOCK_ALIGN(input.len, blkbits) >> blkbits;
> + handle=ext4_journal_start(inode,
> + EXT4_DATA_TRANS_BLOCKS(inode->i_sb)+max_blocks);
> + if (IS_ERR(handle))
> + return PTR_ERR(handle);
> + while(ret>=0 && ret + {
> + block = block + ret;
> + max_blocks = max_blocks - ret;
> + ret = ext4_ext_get_blocks(handle, inode, block,
> + max_blocks, &map_bh,
> + EXT4_CREATE_UNINITIALIZED_EXT, 1);

Since the interface takes offset and number of blocks to allocate, I
assuming we are going to handle holes in preallocation, thus, we cannot
not mark the extend_size flag to 1 when calling ext4_ext_get_blocks.

I think we should update i_size and i_disksize after preallocation. Oh,
to protect parallel updating i_size, we have to take i_mutex down.

> + }
> + ext4_mark_inode_dirty(handle, inode);
> + ext4_journal_stop(handle);
> +

Error code returned by ext4_journal_stop() is being ignored here, is
this right?
Well, there are other places in ext34/ioctl.c which ignore the return
returned 

Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-11 Thread Amit K. Arora
Hi Mingming,

On Mon, Dec 11, 2006 at 05:28:15PM -0800, Mingming Cao wrote:
> On Wed, 2006-12-06 at 11:28 +0530, Amit K. Arora wrote:
> 
> > @@ -1142,13 +1155,22 @@
> > /* try to insert block into found extent and return */
> > if (ex && ext4_can_extents_be_merged(inode, ex, newext)) {
> > ext_debug("append %d block to %d:%d (from %llu)\n",
> > -   le16_to_cpu(newext->ee_len),
> > +   ext4_ext_get_actual_len(newext),
> > le32_to_cpu(ex->ee_block),
> > -   le16_to_cpu(ex->ee_len), ext_pblock(ex));
> > +   ext4_ext_get_actual_len(ex), ext_pblock(ex));
> > if ((err = ext4_ext_get_access(handle, inode, path + depth)))
> > return err;
> > -   ex->ee_len = cpu_to_le16(le16_to_cpu(ex->ee_len)
> > -+ le16_to_cpu(newext->ee_len));
> > +
> > +   /* ext4_can_extents_be_merged should have checked that either
> > +* both extents are uninitialized, or both aren't. Thus we
> > +* need to check any of them here.
> > +*/
> > +   if (ext4_ext_is_uninitialized(ex))
> > +   uninitialized = 1;
> > +   ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
> > ++ ext4_ext_get_actual_len(newext));

Above line will "remove" the uninitialized bit from "ex", if it was set.
We call ext4_ext_get_actual_len() to get the "actual" lengths of the two
extents, which removes this MSB in ee_len (MSB in ee_len is used to mark
an extent uninitialized). Now, we do this because if lengths of two
uninitialized extents will be added as it is (i.e. without masking out
the MSB in the length), it will result in removing the MSB in ee_len.
For example, 0x8002 + 0x8003 => 0x10005 => 0x5 (since ee_len is 16 bit).

That is why just before this line, we save the "state" of this extent,
whether it was uninitialized or not. And, we restore this "state" below.

> > +   if(uninitialized)
> > +   ext4_mark_uninitialized_ext(ex);
> > eh = path[depth].p_hdr;
> > nearex = ex;
> > goto merge;
> 
> Hmm, I missed the point to re-mark an uninitialized extent here. If ex
> is an uninitialized extent, the mark(the first bit the ee_len) shall
> still there after the update, isn't?  We already make sure that two
> large uninitialized extent can't get merged if the resulting length will
> take the first bit, which used as the mark of uninitialized extent.

Please get back if you do not agree with the explanation above and if I
am missing something here. Thanks!

Regards,
Amit Arora
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-11 Thread Mingming Cao
On Wed, 2006-12-06 at 11:28 +0530, Amit K. Arora wrote:

> @@ -1142,13 +1155,22 @@
>   /* try to insert block into found extent and return */
>   if (ex && ext4_can_extents_be_merged(inode, ex, newext)) {
>   ext_debug("append %d block to %d:%d (from %llu)\n",
> - le16_to_cpu(newext->ee_len),
> + ext4_ext_get_actual_len(newext),
>   le32_to_cpu(ex->ee_block),
> - le16_to_cpu(ex->ee_len), ext_pblock(ex));
> + ext4_ext_get_actual_len(ex), ext_pblock(ex));
>   if ((err = ext4_ext_get_access(handle, inode, path + depth)))
>   return err;
> - ex->ee_len = cpu_to_le16(le16_to_cpu(ex->ee_len)
> -  + le16_to_cpu(newext->ee_len));
> +
> + /* ext4_can_extents_be_merged should have checked that either
> +  * both extents are uninitialized, or both aren't. Thus we
> +  * need to check any of them here.
> +  */
> + if (ext4_ext_is_uninitialized(ex))
> + uninitialized = 1;
> + ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
> +  + ext4_ext_get_actual_len(newext));
> + if(uninitialized)
> + ext4_mark_uninitialized_ext(ex);
>   eh = path[depth].p_hdr;
>   nearex = ex;
>   goto merge;

Hmm, I missed the point to re-mark an uninitialized extent here. If ex
is an uninitialized extent, the mark(the first bit the ee_len) shall
still there after the update, isn't?  We already make sure that two
large uninitialized extent can't get merged if the resulting length will
take the first bit, which used as the mark of uninitialized extent.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][Patch 1/1] Persistent preallocation in ext4

2006-12-05 Thread Amit K. Arora
This patch takes care of few suggestions given by Suzuki.
Suzuki, Thanks!

Signed-off-by: Amit Arora <[EMAIL PROTECTED]>
---
 fs/ext4/extents.c   |  117 
 fs/ext4/ioctl.c |   41 
 include/linux/ext4_fs.h |   21 
 3 files changed, 141 insertions(+), 38 deletions(-)

Index: linux-2.6.19/fs/ext4/extents.c
===
--- linux-2.6.19.orig/fs/ext4/extents.c 2006-12-06 10:18:12.0 +0530
+++ linux-2.6.19/fs/ext4/extents.c  2006-12-06 10:23:02.0 +0530
@@ -282,7 +282,7 @@
} else if (path->p_ext) {
ext_debug("  %d:%d:%llu ",
  le32_to_cpu(path->p_ext->ee_block),
- le16_to_cpu(path->p_ext->ee_len),
+ ext4_ext_get_actual_len(path->p_ext),
  ext_pblock(path->p_ext));
} else
ext_debug("  []");
@@ -305,7 +305,7 @@
 
for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ex++) {
ext_debug("%d:%d:%llu ", le32_to_cpu(ex->ee_block),
- le16_to_cpu(ex->ee_len), ext_pblock(ex));
+   ext4_ext_get_actual_len(ex), ext_pblock(ex));
}
ext_debug("\n");
 }
@@ -425,7 +425,7 @@
ext_debug("  -> %d:%llu:%d ",
le32_to_cpu(path->p_ext->ee_block),
ext_pblock(path->p_ext),
-   le16_to_cpu(path->p_ext->ee_len));
+   ext4_ext_get_actual_len(path->p_ext));
 
 #ifdef CHECK_BINSEARCH
{
@@ -686,7 +686,7 @@
ext_debug("move %d:%llu:%d in new leaf %llu\n",
le32_to_cpu(path[depth].p_ext->ee_block),
ext_pblock(path[depth].p_ext),
-   le16_to_cpu(path[depth].p_ext->ee_len),
+   ext4_ext_get_actual_len(path[depth].p_ext),
newblock);
/*memmove(ex++, path[depth].p_ext++,
sizeof(struct ext4_extent));
@@ -1097,7 +1097,20 @@
 ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
struct ext4_extent *ex2)
 {
-   if (le32_to_cpu(ex1->ee_block) + le16_to_cpu(ex1->ee_len) !=
+   unsigned short ext1_ee_len, ext2_ee_len;
+
+
+   /*
+* Make sure that either both extents are uninitialized, or
+* both are _not_.
+*/
+   if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
+   return 0;
+
+   ext1_ee_len = ext4_ext_get_actual_len(ex1);
+   ext2_ee_len = ext4_ext_get_actual_len(ex2);
+
+   if (le32_to_cpu(ex1->ee_block) + ext1_ee_len !=
le32_to_cpu(ex2->ee_block))
return 0;
 
@@ -1106,14 +1119,14 @@
 * as an RO_COMPAT feature, refuse to merge to extents if
 * this can result in the top bit of ee_len being set.
 */
-   if (le16_to_cpu(ex1->ee_len) + le16_to_cpu(ex2->ee_len) > EXT_MAX_LEN)
+   if (ext1_ee_len + ext2_ee_len > EXT_MAX_LEN)
return 0;
 #ifdef AGRESSIVE_TEST
if (le16_to_cpu(ex1->ee_len) >= 4)
return 0;
 #endif
 
-   if (ext_pblock(ex1) + le16_to_cpu(ex1->ee_len) == ext_pblock(ex2))
+   if (ext_pblock(ex1) + ext1_ee_len == ext_pblock(ex2))
return 1;
return 0;
 }
@@ -1132,9 +1145,9 @@
struct ext4_extent *ex, *fex;
struct ext4_extent *nearex; /* nearest extent */
struct ext4_ext_path *npath = NULL;
-   int depth, len, err, next;
+   int depth, len, err, next, uninitialized = 0;
 
-   BUG_ON(newext->ee_len == 0);
+   BUG_ON(ext4_ext_get_actual_len(newext) == 0);
depth = ext_depth(inode);
ex = path[depth].p_ext;
BUG_ON(path[depth].p_hdr == NULL);
@@ -1142,13 +1155,22 @@
/* try to insert block into found extent and return */
if (ex && ext4_can_extents_be_merged(inode, ex, newext)) {
ext_debug("append %d block to %d:%d (from %llu)\n",
-   le16_to_cpu(newext->ee_len),
+   ext4_ext_get_actual_len(newext),
le32_to_cpu(ex->ee_block),
-   le16_to_cpu(ex->ee_len), ext_pblock(ex));
+   ext4_ext_get_actual_len(ex), ext_pblock(ex));
if ((err = ext4_ext_get_access(handle, inode, path + depth)))
return err;
-   ex->ee_len = cpu_to_le16(le16_to_cpu(ex->ee_len)
-+ le16_to_cpu(newext->ee_len));
+
+   /* ext4_can_extents_be_merged should have checked that either
+* both extents are uninitialized, or b