Re: [RFC][PATCH 3/3] Btrfs: improve truncation of btrfs
On Fri, 6 Jan 2012 09:50:31 -0500, Josef Bacik wrote:
> On Fri, Jan 06, 2012 at 11:51:16AM +0800, Miao Xie wrote:
>> On Thu, 5 Jan 2012 10:15:50 -0500, Josef Bacik wrote:
>>>> +	trans = btrfs_start_transaction(root, 2);
>>>> +	if (IS_ERR(trans))
>>>> +		return PTR_ERR(trans);
>>>>
>>>>  	/*
>>>>  	 * setattr is responsible for setting the ordered_data_close flag,
>>>> @@ -6621,26 +6585,12 @@ static int btrfs_truncate(struct inode *inode)
>>>>  	 * using truncate to replace the contents of the file will
>>>>  	 * end up with a zero length file after a crash.
>>>>  	 */
>>>> -	if (inode->i_size == 0 && BTRFS_I(inode)->ordered_data_close)
>>>> +	if (newsize == 0 && BTRFS_I(inode)->ordered_data_close)
>>>>  		btrfs_add_ordered_operation(trans, root, inode);
>>
>> Since we have written out all the dirty pages, we can drop the following
>> code, which is in front of the while loop, and move the first
>> btrfs_start_transaction() into the loop; the logic of btrfs_truncate()
>> will become simpler.
>>
>>>>  	while (1) {
>>>> -		ret = btrfs_block_rsv_refill(root, rsv, min_size);
>>>> -		if (ret) {
>>>> -			/*
>>>> -			 * This can only happen with the original transaction we
>>>> -			 * started above, every other time we shouldn't have a
>>>> -			 * transaction started yet.
>>>> -			 */
>>>> -			if (ret == -EAGAIN)
>>>> -				goto end_trans;
>>>> -			err = ret;
>>>> -			break;
>>>> -		}
>>>> -
>>>
>>> Taking this part out is wrong, we need to have this slack space to account
>>> for any COW that truncate does.  Other than that this looks pretty good.
>>> Thanks,
>>
>> I think we can take this part out, because we start a new transaction every
>> time we do a truncation, and reserve enough space at that time. See below:
>
> Ok let me rephrase.  The whole reason I do this is because the reservation
> stuff is tricky, we may not actually use any of this space and so constantly
> going back to reserve it makes us much more likely to fail our truncate()
> because of ENOSPC.  But if we just hold onto a min size and then refill it
> when we need to we lower the risk considerably, so this needs to stay.
> Thanks,

I see. But I think this method is too cautious: it cannot avoid ENOSPC
completely, yet it makes the code more complex.

Though dropping this part will make the risk of ENOSPC higher, it doesn't
break the metadata, so it is enough to reserve space when starting a new
transaction and just return the error number if ENOSPC happens.

Thanks
Miao
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
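The two positions in this exchange can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct toy_pool, struct toy_rsv, toy_rsv_refill() and toy_rsv_release() are hypothetical names invented here. The model assumes that a refill only takes the shortfall below min_size from the shared pool, which is the behaviour Josef's argument relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not kernel types. */
struct toy_pool { uint64_t free; };      /* space shared by every user */
struct toy_rsv  { uint64_t reserved; };  /* space held by one truncate */

/*
 * Top the reservation up to min_size, taking only the shortfall from the
 * pool. Returns 0 on success and -1 (standing in for -ENOSPC) otherwise.
 */
static int toy_rsv_refill(struct toy_pool *p, struct toy_rsv *r,
			  uint64_t min_size)
{
	uint64_t need;

	assert(p && r);
	if (r->reserved >= min_size)
		return 0;		/* nothing was used, nothing to ask for */
	need = min_size - r->reserved;
	if (p->free < need)
		return -1;
	p->free -= need;
	r->reserved += need;
	return 0;
}

/* Hand the whole reservation back, as a per-transaction scheme would. */
static void toy_rsv_release(struct toy_pool *p, struct toy_rsv *r)
{
	assert(p && r);
	p->free += r->reserved;
	r->reserved = 0;
}
```

In this model, a truncate that keeps its reservation across loop iterations cannot fail a refill as long as it has not consumed anything, even if other users drain the pool in the meantime. A truncate that releases and re-reserves on every transaction competes with those users each time, which is the higher ENOSPC risk that Miao accepts in exchange for simpler code.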
Re: [RFC][PATCH 3/3] Btrfs: improve truncation of btrfs
On Fri, Jan 06, 2012 at 11:51:16AM +0800, Miao Xie wrote:
> On Thu, 5 Jan 2012 10:15:50 -0500, Josef Bacik wrote:
>>> +	trans = btrfs_start_transaction(root, 2);
>>> +	if (IS_ERR(trans))
>>> +		return PTR_ERR(trans);
>>>
>>>  	/*
>>>  	 * setattr is responsible for setting the ordered_data_close flag,
>>> @@ -6621,26 +6585,12 @@ static int btrfs_truncate(struct inode *inode)
>>>  	 * using truncate to replace the contents of the file will
>>>  	 * end up with a zero length file after a crash.
>>>  	 */
>>> -	if (inode->i_size == 0 && BTRFS_I(inode)->ordered_data_close)
>>> +	if (newsize == 0 && BTRFS_I(inode)->ordered_data_close)
>>>  		btrfs_add_ordered_operation(trans, root, inode);
>
> Since we have written out all the dirty pages, we can drop the following
> code, which is in front of the while loop, and move the first
> btrfs_start_transaction() into the loop; the logic of btrfs_truncate()
> will become simpler.
>
>>>  	while (1) {
>>> -		ret = btrfs_block_rsv_refill(root, rsv, min_size);
>>> -		if (ret) {
>>> -			/*
>>> -			 * This can only happen with the original transaction we
>>> -			 * started above, every other time we shouldn't have a
>>> -			 * transaction started yet.
>>> -			 */
>>> -			if (ret == -EAGAIN)
>>> -				goto end_trans;
>>> -			err = ret;
>>> -			break;
>>> -		}
>>> -
>>
>> Taking this part out is wrong, we need to have this slack space to account
>> for any COW that truncate does.  Other than that this looks pretty good.
>> Thanks,
>
> I think we can take this part out, because we start a new transaction every
> time we do a truncation, and reserve enough space at that time. See below:

Ok let me rephrase.  The whole reason I do this is because the reservation
stuff is tricky, we may not actually use any of this space and so constantly
going back to reserve it makes us much more likely to fail our truncate()
because of ENOSPC.  But if we just hold onto a min size and then refill it
when we need to we lower the risk considerably, so this needs to stay.
Thanks,

Josef
[RFC][PATCH 3/3] Btrfs: improve truncation of btrfs
The original truncation of btrfs has a bug: the orphan item is not dropped
when the truncation fails. This bug triggers BUG() when that truncated file
is unlinked. Besides that, if the user has pre-allocated space for a file
whose truncation fails, the pre-allocated extent will be dropped after a
re-mount (umount then mount, not -o remount).

This patch modifies the functions related to truncation, and makes the
truncation update i_size and disk_i_size of the inode every time we drop a
file extent successfully, setting them to the real value. This way, we
needn't add orphan items to guarantee the consistency of the metadata.

With this patch, it is possible that the file is not truncated to the size
that the user expects (it may be <= the original size and >= the expected
one), so I think we shouldn't lose the data that lies in the range between
the expected size and the real size, because the user may take it for
granted that the data in that range is not lost. To implement this, we just
write out all the dirty pages which are beyond the expected size of the
file. This operation will take a lot of time if there are many dirty pages.
It is also the only disadvantage of this patch. (Maybe I'm overcautious and
we needn't hold that data.)
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/inode.c |  159 +-
 1 files changed, 49 insertions(+), 110 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index df6060f..9ace01b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -88,7 +88,7 @@ static unsigned char btrfs_type_by_mode[S_IFMT >> S_SHIFT] = {
 };
 
 static int btrfs_setsize(struct inode *inode, loff_t newsize);
-static int btrfs_truncate(struct inode *inode);
+static int btrfs_truncate(struct inode *inode, loff_t newsize);
 static int btrfs_finish_ordered_io(struct inode *inode, u64 start, u64 end);
 static noinline int cow_file_range(struct inode *inode,
 				   struct page *locked_page,
@@ -2230,7 +2230,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 			 * btrfs_delalloc_reserve_space to catch offenders.
 			 */
 			mutex_lock(&inode->i_mutex);
-			ret = btrfs_truncate(inode);
+			ret = btrfs_truncate(inode, inode->i_size);
 			mutex_unlock(&inode->i_mutex);
 		} else {
 			nr_unlink++;
@@ -2993,7 +2993,7 @@ static int btrfs_release_and_test_inline_data_extent(
 		return 0;
 
 	/*
-	 * Truncate inline items is special, we have done it by
+	 * Truncate inline items is special, we will do it by
 	 *   btrfs_truncate_page();
 	 */
 	if (offset new_size)
@@ -3121,9 +3121,9 @@ static int btrfs_release_and_test_data_extent(struct btrfs_trans_handle *trans,
  * will kill all the items on this inode, including the INODE_ITEM_KEY.
  */
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
-			       struct btrfs_root *root,
-			       struct inode *inode,
-			       u64 new_size, u32 min_type)
+			struct btrfs_root *root,
+			struct inode *inode,
+			u64 new_size, u32 min_type)
 {
 	struct btrfs_path *path;
 	struct extent_buffer *leaf;
@@ -3131,6 +3131,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	struct btrfs_key found_key;
 	u64 mask = root->sectorsize - 1;
 	u64 ino = btrfs_ino(inode);
+	u64 old_size = i_size_read(inode);
 	u32 found_type;
 	int pending_del_nr = 0;
 	int pending_del_slot = 0;
@@ -3138,6 +3139,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
 	int err = 0;
 
 	BUG_ON(new_size > 0 && min_type != BTRFS_EXTENT_DATA_KEY);
+	BUG_ON(new_size & mask);
 
 	path = btrfs_alloc_path();
 	if (!path)
@@ -3190,6 +3192,13 @@ search_again:
 			ret = btrfs_release_and_test_data_extent(trans, root,
 						path, inode, found_key.offset,
 						new_size);
+			if (root->ref_cows ||
+			    root == root->fs_info->tree_root) {
+				if (ret && found_key.offset < old_size)
+					i_size_write(inode, found_key.offset);
+				else if (!ret)
+					i_size_write(inode, new_size);
+			}
 			if (!ret)
 				break;
 		}
@@ -3247,12 +3256,10 @@ out:
 static int btrfs_truncate_page(struct address_space *mapping, loff_t from)
 {
 	struct inode *inode = mapping->host;
-
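The idea in the changelog, lowering the recorded size after every successfully dropped extent so that a failed truncation still leaves a consistent size behind, can be shown with a toy model. This is a simplified sketch and not the patch itself: struct toy_file and toy_truncate() are hypothetical, extents are reduced to their start offsets, and disk_i_size, partial extents and inline items are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EXTENTS 8

/* Hypothetical model: extent start offsets in ascending order. */
struct toy_file {
	uint64_t extent_start[TOY_MAX_EXTENTS];
	size_t nr_extents;
	uint64_t size;			/* plays the role of i_size */
};

/*
 * Drop extents from the tail until none starts at or beyond new_size.
 * `budget` is the number of drops that succeed before a simulated failure.
 * After each successful drop the size is lowered to the start of the
 * dropped extent, so it never claims data that is already gone.
 * Returns 0 when the truncation completed, -1 when it stopped early.
 */
static int toy_truncate(struct toy_file *f, uint64_t new_size, int budget)
{
	assert(f && f->nr_extents <= TOY_MAX_EXTENTS);
	while (f->nr_extents > 0 &&
	       f->extent_start[f->nr_extents - 1] >= new_size) {
		if (budget-- <= 0)
			return -1;	/* size already matches what is left */
		f->nr_extents--;
		f->size = f->extent_start[f->nr_extents];
	}
	if (new_size < f->size)
		f->size = new_size;	/* trim inside the last kept extent */
	return 0;
}
```

If the simulated failure hits halfway, the file ends up larger than the caller asked for but with a size that agrees with its remaining extents, so the truncation can simply be retried and no orphan item is needed to repair it later.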
Re: [RFC][PATCH 3/3] Btrfs: improve truncation of btrfs
On Thu, Jan 05, 2012 at 04:32:46PM +0800, Miao Xie wrote:
> The original truncation of btrfs has a bug: the orphan item is not dropped
> when the truncation fails. This bug triggers BUG() when that truncated file
> is unlinked. Besides that, if the user has pre-allocated space for a file
> whose truncation fails, the pre-allocated extent will be dropped after a
> re-mount (umount then mount, not -o remount).
>
> This patch modifies the functions related to truncation, and makes the
> truncation update i_size and disk_i_size of the inode every time we drop a
> file extent successfully, setting them to the real value. This way, we
> needn't add orphan items to guarantee the consistency of the metadata.
>
> With this patch, it is possible that the file is not truncated to the size
> that the user expects (it may be <= the original size and >= the expected
> one), so I think we shouldn't lose the data that lies in the range between
> the expected size and the real size, because the user may take it for
> granted that the data in that range is not lost. To implement this, we just
> write out all the dirty pages which are beyond the expected size of the
> file. This operation will take a lot of time if there are many dirty pages.
> It is also the only disadvantage of this patch. (Maybe I'm overcautious and
> we needn't hold that data.)
Re: [RFC][PATCH 3/3] Btrfs: improve truncation of btrfs
On Thu, 5 Jan 2012 10:15:50 -0500, Josef Bacik wrote:
>> +	trans = btrfs_start_transaction(root, 2);
>> +	if (IS_ERR(trans))
>> +		return PTR_ERR(trans);
>>
>>  	/*
>>  	 * setattr is responsible for setting the ordered_data_close flag,
>> @@ -6621,26 +6585,12 @@ static int btrfs_truncate(struct inode *inode)
>>  	 * using truncate to replace the contents of the file will
>>  	 * end up with a zero length file after a crash.
>>  	 */
>> -	if (inode->i_size == 0 && BTRFS_I(inode)->ordered_data_close)
>> +	if (newsize == 0 && BTRFS_I(inode)->ordered_data_close)
>>  		btrfs_add_ordered_operation(trans, root, inode);

Since we have written out all the dirty pages, we can drop the following
code, which is in front of the while loop, and move the first
btrfs_start_transaction() into the loop; the logic of btrfs_truncate()
will become simpler.

>>  	while (1) {
>> -		ret = btrfs_block_rsv_refill(root, rsv, min_size);
>> -		if (ret) {
>> -			/*
>> -			 * This can only happen with the original transaction we
>> -			 * started above, every other time we shouldn't have a
>> -			 * transaction started yet.
>> -			 */
>> -			if (ret == -EAGAIN)
>> -				goto end_trans;
>> -			err = ret;
>> -			break;
>> -		}
>> -
>
> Taking this part out is wrong, we need to have this slack space to account
> for any COW that truncate does.  Other than that this looks pretty good.
> Thanks,

I think we can take this part out, because we start a new transaction every
time we do a truncation, and reserve enough space at that time. See below:

Thanks
Miao