Re: [RFC][PATCH 3/3] Btrfs: improve truncation of btrfs

2012-01-08 Thread Miao Xie
On Fri, 6 Jan 2012 09:50:31 -0500, Josef Bacik wrote:
 On Fri, Jan 06, 2012 at 11:51:16AM +0800, Miao Xie wrote:
 On Thu, 5 Jan 2012 10:15:50 -0500, Josef Bacik wrote:
 +  trans = btrfs_start_transaction(root, 2);
 +  if (IS_ERR(trans))
 +  return PTR_ERR(trans);
  
/*
 * setattr is responsible for setting the ordered_data_close flag,
 @@ -6621,26 +6585,12 @@ static int btrfs_truncate(struct inode *inode)
 * using truncate to replace the contents of the file will
 * end up with a zero length file after a crash.
 */
 -  if (inode->i_size == 0 && BTRFS_I(inode)->ordered_data_close)
 +  if (newsize == 0 && BTRFS_I(inode)->ordered_data_close)
btrfs_add_ordered_operation(trans, root, inode);

 Since we have written out all the dirty pages, we can drop the following code
 which is in front of the while loop, and move the first
 btrfs_start_transaction() into the loop; the logic of btrfs_truncate() will
 become simpler.

while (1) {
 -  ret = btrfs_block_rsv_refill(root, rsv, min_size);
 -  if (ret) {
 -  /*
 -   * This can only happen with the original transaction we
 -   * started above, every other time we shouldn't have a
 -   * transaction started yet.
 -   */
 -  if (ret == -EAGAIN)
 -  goto end_trans;
 -  err = ret;
 -  break;
 -  }
 -

 Taking this part out is wrong, we need to have this slack space to account for
 any COW that truncate does.  Other than that this looks pretty good.  Thanks,


 I think we can take this part out, because we start a new transaction every
 time we do a truncation, and reserve enough space at that time. See below:

 
 Ok let me rephrase.  The whole reason I do this is because the reservation stuff
 is tricky, we may not actually use any of this space and so constantly going
 back to reserve it makes us much more likely to fail our truncate() because of
 ENOSPC.  But if we just hold onto a min size and then refill it when we need to
 we lower the risk considerably, so this needs to stay.  Thanks,

I see.
But I think this method is too cautious: it cannot avoid ENOSPC completely, but
it makes the code more complex. Though dropping this part will make the risk of
ENOSPC higher, it doesn't break the metadata, so it is enough to reserve space
when starting a new transaction and just return the error number if ENOSPC
happens.
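A minimal user-space sketch of the approach argued for here, assuming a deliberately simplified model: one shared pool of free metadata space (the hypothetical `struct space_pool`), a fixed reservation taken at the start of every transaction, and one extent dropped per transaction. It is not btrfs code; `trans_reserve()`, `trans_release()` and `truncate_per_trans()` are made-up names used only to show the control flow: reserve, do one step, release the unused part, and on ENOSPC just return the error with the size reached so far still consistent.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical user-space model, not btrfs code: a shared pool of free
 * metadata space that every transaction reserves from. */
struct space_pool {
	uint64_t free_bytes;
};

/* Reserve 'bytes' for one transaction, or fail with -ENOSPC. */
static int trans_reserve(struct space_pool *pool, uint64_t bytes)
{
	if (pool->free_bytes < bytes)
		return -ENOSPC;
	pool->free_bytes -= bytes;
	return 0;
}

/* Give back whatever the transaction reserved but did not use. */
static void trans_release(struct space_pool *pool, uint64_t unused)
{
	pool->free_bytes += unused;
}

/*
 * Shrink *isize to new_size, dropping one extent of 'extent_size' bytes per
 * transaction. Each transaction reserves 'per_trans' bytes up front and uses
 * 'used' of them. On ENOSPC the loop simply returns the error: *isize still
 * describes what has actually been dropped, so nothing is inconsistent.
 */
static int truncate_per_trans(struct space_pool *pool, uint64_t *isize,
			      uint64_t new_size, uint64_t extent_size,
			      uint64_t per_trans, uint64_t used)
{
	assert(used <= per_trans && extent_size > 0);
	while (*isize > new_size) {
		int ret = trans_reserve(pool, per_trans);

		if (ret)
			return ret;	/* just report ENOSPC to the caller */
		/* drop the last extent (or the remaining tail) */
		if (*isize - new_size > extent_size)
			*isize -= extent_size;
		else
			*isize = new_size;
		trans_release(pool, per_trans - used);
	}
	return 0;
}
```

With a pool of 25 bytes and 10 bytes reserved and used per step, a 16384-byte file being truncated to 0 in 4096-byte extents stops after two steps with -ENOSPC and an i_size of 8192, which still matches the extents that remain.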

Thanks
Miao
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 3/3] Btrfs: improve truncation of btrfs

2012-01-06 Thread Josef Bacik
On Fri, Jan 06, 2012 at 11:51:16AM +0800, Miao Xie wrote:
 On thu, 5 Jan 2012 10:15:50 -0500, Josef Bacik wrote:
  +  trans = btrfs_start_transaction(root, 2);
  +  if (IS_ERR(trans))
  +  return PTR_ERR(trans);
   
 /*
  * setattr is responsible for setting the ordered_data_close flag,
  @@ -6621,26 +6585,12 @@ static int btrfs_truncate(struct inode *inode)
  * using truncate to replace the contents of the file will
  * end up with a zero length file after a crash.
  */
  -  if (inode->i_size == 0 && BTRFS_I(inode)->ordered_data_close)
  +  if (newsize == 0 && BTRFS_I(inode)->ordered_data_close)
 btrfs_add_ordered_operation(trans, root, inode);
 
 Since we have written out all the dirty pages, we can drop the following code
 which is in front of the while loop, and move the first
 btrfs_start_transaction() into the loop; the logic of btrfs_truncate() will
 become simpler.
 
 while (1) {
  -  ret = btrfs_block_rsv_refill(root, rsv, min_size);
  -  if (ret) {
  -  /*
  -   * This can only happen with the original transaction we
  -   * started above, every other time we shouldn't have a
  -   * transaction started yet.
  -   */
  -  if (ret == -EAGAIN)
  -  goto end_trans;
  -  err = ret;
  -  break;
  -  }
  -
  
  Taking this part out is wrong, we need to have this slack space to account for
  any COW that truncate does.  Other than that this looks pretty good.  Thanks,
  
 
 I think we can take this part out, because we start a new transaction every
 time we do a truncation, and reserve enough space at that time. See below:
 

Ok let me rephrase.  The whole reason I do this is because the reservation stuff
is tricky, we may not actually use any of this space and so constantly going
back to reserve it makes us much more likely to fail our truncate() because of
ENOSPC.  But if we just hold onto a min size and then refill it when we need to
we lower the risk considerably, so this needs to stay.  Thanks,
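A minimal user-space sketch of the "hold onto a min size and refill it" idea, assuming a simplified model in which a truncate keeps a private reservation (the hypothetical `struct block_rsv`) between steps and other writers compete for a shared pool. It is not the btrfs `btrfs_block_rsv_refill()` implementation; `rsv_refill()` and `rsv_use()` are made-up names that only illustrate why unused slack lowers the ENOSPC risk.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model, not btrfs code: the shared pool of free metadata
 * space and a private reservation held across transactions. */
struct space_pool {
	uint64_t free_bytes;
};

struct block_rsv {
	uint64_t reserved;	/* bytes currently held by this truncate */
};

/*
 * Top the private reservation back up to min_size, taking only the
 * shortfall from the shared pool. Slack left over from earlier steps is
 * kept, so nothing is needed from the pool if it went unused.
 */
static int rsv_refill(struct space_pool *pool, struct block_rsv *rsv,
		      uint64_t min_size)
{
	uint64_t need;

	if (rsv->reserved >= min_size)
		return 0;
	need = min_size - rsv->reserved;
	if (pool->free_bytes < need)
		return -ENOSPC;
	pool->free_bytes -= need;
	rsv->reserved += need;
	return 0;
}

/* Consume 'bytes' of the reservation, e.g. for COW during one step. */
static void rsv_use(struct block_rsv *rsv, uint64_t bytes)
{
	assert(bytes <= rsv->reserved);
	rsv->reserved -= bytes;
}
```

In this model, when a step uses none of its slack, the next refill succeeds even if other writers have drained the shared pool to zero, whereas a fresh reservation of the full min size would fail at that point. When a step does consume space, the refill can still return -ENOSPC, so the scheme lowers the risk rather than removing it.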

Josef


[RFC][PATCH 3/3] Btrfs: improve truncation of btrfs

2012-01-05 Thread Miao Xie
The original truncation of btrfs has a bug: the orphan item is not dropped
when the truncation fails, which triggers BUG() when that truncated file is
unlinked. Besides that, if the user pre-allocates space for a file whose
truncation failed, the pre-allocated extent will be dropped after a re-mount
(umount then mount, not -o remount).

This patch modifies the related truncation functions so that the truncation
updates i_size and disk_i_size of the inode every time a file extent is
dropped successfully, setting them to the real value. In this way, we don't
need to add orphan items to guarantee the consistency of the metadata.
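A minimal user-space sketch of the "update i_size after every dropped extent" idea, assuming a toy model where a file is just a sorted array of extent start offsets and an error can be injected on any drop. `truncate_extents()` and its `fail_at` parameter are hypothetical; the real patch does this inside btrfs_truncate_inode_items() with i_size_write().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model, not btrfs code. 'offsets' holds the start offsets of
 * the file's extents in ascending order. Extents starting at or beyond
 * new_size are dropped from the end; after each drop, *isize is lowered to
 * the start of the dropped extent (only if that is below the current size,
 * so preallocated extents past EOF never grow it). 'fail_at' injects an
 * error on the Nth drop, counting from 0; pass -1 for no failure.
 */
static int truncate_extents(const uint64_t *offsets, int nr, uint64_t *isize,
			    uint64_t new_size, int fail_at)
{
	int dropped = 0;

	assert(new_size <= *isize);
	for (int i = nr - 1; i >= 0 && offsets[i] >= new_size; i--) {
		if (dropped == fail_at)
			return -ENOSPC;	/* *isize already matches the extents left */
		dropped++;
		if (offsets[i] < *isize)
			*isize = offsets[i];
	}
	/* an extent straddling new_size would be trimmed here; the model skips it */
	*isize = new_size;
	return 0;
}
```

Because the recorded size never points past data that has already been dropped, an interrupted truncate leaves a smaller-than-original but self-consistent file, which is why no orphan item is needed in this model.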

With this patch, the file may not be truncated to the size the user expects
(the real size may be <= the original size and >= the expected one), so I
think we shouldn't lose the data that lies within the range [the expected
size, the real size], because the user may take it for granted that the data
in that range is not lost. To implement this, we just write out all the dirty
pages which are beyond the expected size of the file. This operation will take
a lot of time if there are many dirty pages, which is also the only
disadvantage of this patch. (Maybe I'm overcautious, and we needn't hold that
data.)
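A minimal user-space sketch of writing out dirty pages beyond the expected size before any extent is dropped, assuming a toy page cache (the hypothetical `struct model_page` array) and an assumed 4096-byte page. In the kernel a byte range of the page cache is typically flushed with a helper such as `filemap_write_and_wait_range()`; the patch body in this archive is cut off, so which call it actually uses is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Hypothetical per-page state in a toy page cache, not kernel structures. */
struct model_page {
	bool dirty;	/* modified in memory, not yet written */
	bool on_disk;	/* current contents are on disk */
};

/*
 * Write back every dirty page at or beyond expected_size (the page that
 * contains expected_size included) before any extent is dropped. If the
 * truncate later stops early, the data between expected_size and the size
 * actually reached is already on disk. Cost is one write per dirty page,
 * which is the slowdown the description above mentions.
 */
static int write_out_beyond(struct model_page *pages, int nr,
			    uint64_t expected_size)
{
	int written = 0;

	assert(nr >= 0);
	for (uint64_t i = expected_size / MODEL_PAGE_SIZE; i < (uint64_t)nr; i++) {
		if (!pages[i].dirty)
			continue;
		pages[i].dirty = false;
		pages[i].on_disk = true;
		written++;
	}
	return written;
}
```

Pages below the expected size are left dirty; only the range the truncate might fail to remove is forced to disk.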

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/inode.c |  159 +-
 1 files changed, 49 insertions(+), 110 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index df6060f..9ace01b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -88,7 +88,7 @@ static unsigned char btrfs_type_by_mode[S_IFMT >> S_SHIFT] = {
 };
 
 static int btrfs_setsize(struct inode *inode, loff_t newsize);
-static int btrfs_truncate(struct inode *inode);
+static int btrfs_truncate(struct inode *inode, loff_t newsize);
 static int btrfs_finish_ordered_io(struct inode *inode, u64 start, u64 end);
 static noinline int cow_file_range(struct inode *inode,
   struct page *locked_page,
@@ -2230,7 +2230,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 * btrfs_delalloc_reserve_space to catch offenders.
 */
mutex_lock(&inode->i_mutex);
-   ret = btrfs_truncate(inode);
+   ret = btrfs_truncate(inode, inode->i_size);
mutex_unlock(&inode->i_mutex);
} else {
nr_unlink++;
@@ -2993,7 +2993,7 @@ static int btrfs_release_and_test_inline_data_extent(
return 0;
 
/*
-* Truncate inline items is special, we have done it by
+* Truncate inline items is special, we will do it by
 *   btrfs_truncate_page();
 */
if (offset  new_size)
@@ -3121,9 +3121,9 @@ static int btrfs_release_and_test_data_extent(struct btrfs_trans_handle *trans,
  * will kill all the items on this inode, including the INODE_ITEM_KEY.
  */
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root,
-   struct inode *inode,
-   u64 new_size, u32 min_type)
+  struct btrfs_root *root,
+  struct inode *inode,
+  u64 new_size, u32 min_type)
 {
struct btrfs_path *path;
struct extent_buffer *leaf;
@@ -3131,6 +3131,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
struct btrfs_key found_key;
u64 mask = root->sectorsize - 1;
u64 ino = btrfs_ino(inode);
+   u64 old_size = i_size_read(inode);
u32 found_type;
int pending_del_nr = 0;
int pending_del_slot = 0;
@@ -3138,6 +3139,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
int err = 0;
 
BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
+   BUG_ON(new_size & mask);
 
path = btrfs_alloc_path();
if (!path)
@@ -3190,6 +3192,13 @@ search_again:
ret = btrfs_release_and_test_data_extent(trans, root,
path, inode, found_key.offset,
new_size);
+   if (root->ref_cows ||
+   root == root->fs_info->tree_root) {
+   if (ret && found_key.offset < old_size)
+   i_size_write(inode, found_key.offset);
+   else if (!ret)
+   i_size_write(inode, new_size);
+   }
if (!ret)
break;
}
@@ -3247,12 +3256,10 @@ out:
 static int btrfs_truncate_page(struct address_space *mapping, loff_t from)
 {
struct inode *inode = mapping->host;
-   

Re: [RFC][PATCH 3/3] Btrfs: improve truncation of btrfs

2012-01-05 Thread Josef Bacik
On Thu, Jan 05, 2012 at 04:32:46PM +0800, Miao Xie wrote:
 The original truncation of btrfs has a bug: the orphan item is not dropped
 when the truncation fails, which triggers BUG() when that truncated file is
 unlinked. Besides that, if the user pre-allocates space for a file whose
 truncation failed, the pre-allocated extent will be dropped after a re-mount
 (umount then mount, not -o remount).
 
 This patch modifies the related truncation functions so that the truncation
 updates i_size and disk_i_size of the inode every time a file extent is
 dropped successfully, setting them to the real value. In this way, we don't
 need to add orphan items to guarantee the consistency of the metadata.
 
 With this patch, the file may not be truncated to the size the user expects
 (the real size may be <= the original size and >= the expected one), so I
 think we shouldn't lose the data that lies within the range [the expected
 size, the real size], because the user may take it for granted that the data
 in that range is not lost. To implement this, we just write out all the dirty
 pages which are beyond the expected size of the file. This operation will take
 a lot of time if there are many dirty pages, which is also the only
 disadvantage of this patch. (Maybe I'm overcautious, and we needn't hold that
 data.)
 
 Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
 ---
  fs/btrfs/inode.c |  159 +-
  1 files changed, 49 insertions(+), 110 deletions(-)
 
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index df6060f..9ace01b 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -88,7 +88,7 @@ static unsigned char btrfs_type_by_mode[S_IFMT >> S_SHIFT] = {
  };
  
  static int btrfs_setsize(struct inode *inode, loff_t newsize);
 -static int btrfs_truncate(struct inode *inode);
 +static int btrfs_truncate(struct inode *inode, loff_t newsize);
  static int btrfs_finish_ordered_io(struct inode *inode, u64 start, u64 end);
  static noinline int cow_file_range(struct inode *inode,
  struct page *locked_page,
 @@ -2230,7 +2230,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
* btrfs_delalloc_reserve_space to catch offenders.
*/
   mutex_lock(&inode->i_mutex);
 - ret = btrfs_truncate(inode);
 + ret = btrfs_truncate(inode, inode->i_size);
   mutex_unlock(&inode->i_mutex);
   } else {
   nr_unlink++;
 @@ -2993,7 +2993,7 @@ static int btrfs_release_and_test_inline_data_extent(
   return 0;
  
   /*
 -  * Truncate inline items is special, we have done it by
 +  * Truncate inline items is special, we will do it by
*   btrfs_truncate_page();
*/
   if (offset  new_size)
 @@ -3121,9 +3121,9 @@ static int btrfs_release_and_test_data_extent(struct btrfs_trans_handle *trans,
   * will kill all the items on this inode, including the INODE_ITEM_KEY.
   */
  int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 - struct btrfs_root *root,
 - struct inode *inode,
 - u64 new_size, u32 min_type)
 +struct btrfs_root *root,
 +struct inode *inode,
 +u64 new_size, u32 min_type)
  {
   struct btrfs_path *path;
   struct extent_buffer *leaf;
 @@ -3131,6 +3131,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
   struct btrfs_key found_key;
   u64 mask = root->sectorsize - 1;
   u64 ino = btrfs_ino(inode);
 + u64 old_size = i_size_read(inode);
   u32 found_type;
   int pending_del_nr = 0;
   int pending_del_slot = 0;
 @@ -3138,6 +3139,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
   int err = 0;
  
   BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
 + BUG_ON(new_size & mask);
  
   path = btrfs_alloc_path();
   if (!path)
 @@ -3190,6 +3192,13 @@ search_again:
   ret = btrfs_release_and_test_data_extent(trans, root,
   path, inode, found_key.offset,
   new_size);
 + if (root->ref_cows ||
 + root == root->fs_info->tree_root) {
 + if (ret && found_key.offset < old_size)
 + i_size_write(inode, found_key.offset);
 + else if (!ret)
 + i_size_write(inode, new_size);
 + }
   if (!ret)
   break;
   }
 @@ -3247,12 +3256,10 @@ out:
  static int btrfs_truncate_page(struct address_space 

Re: [RFC][PATCH 3/3] Btrfs: improve truncation of btrfs

2012-01-05 Thread Miao Xie
On Thu, 5 Jan 2012 10:15:50 -0500, Josef Bacik wrote:
 +trans = btrfs_start_transaction(root, 2);
 +if (IS_ERR(trans))
 +return PTR_ERR(trans);
  
  /*
   * setattr is responsible for setting the ordered_data_close flag,
 @@ -6621,26 +6585,12 @@ static int btrfs_truncate(struct inode *inode)
   * using truncate to replace the contents of the file will
   * end up with a zero length file after a crash.
   */
 -if (inode->i_size == 0 && BTRFS_I(inode)->ordered_data_close)
 +if (newsize == 0 && BTRFS_I(inode)->ordered_data_close)
  btrfs_add_ordered_operation(trans, root, inode);

Since we have written out all the dirty pages, we can drop the following code
which is in front of the while loop, and move the first
btrfs_start_transaction() into the loop; the logic of btrfs_truncate() will
become simpler.

  while (1) {
 -ret = btrfs_block_rsv_refill(root, rsv, min_size);
 -if (ret) {
 -/*
 - * This can only happen with the original transaction we
 - * started above, every other time we shouldn't have a
 - * transaction started yet.
 - */
 -if (ret == -EAGAIN)
 -goto end_trans;
 -err = ret;
 -break;
 -}
 -
 
 Taking this part out is wrong, we need to have this slack space to account for
 any COW that truncate does.  Other than that this looks pretty good.  Thanks,
 

I think we can take this part out, because we start a new transaction every
time we do a truncation, and reserve enough space at that time. See below:

Thanks
Miao