Re: [PATCH 1/2] btrfs: introduce priority based delalloc shrink mechanism
hi, On 10/13/2016 01:20 AM, Josef Bacik wrote: On 10/12/2016 05:03 AM, Wang Xiaoguang wrote: Since commit b02441999efcc6152b87cd58e7970bb7843f76cf, we don't wait all ordered extents, but I run into some enospc errors when doing large file create and delete tests, it's because shrink_delalloc() does not write enough delalloc bytes and wait them finished: From: Miao XieDate: Mon, 4 Nov 2013 23:13:25 +0800 Subject: [PATCH] Btrfs: don't wait for the completion of all the ordered extents It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Here we introduce a simple reclaim_priority variable, the higher the value, the higher the priority, 0 is the minimum priority. The core idea is: if (reclaim_priority >= 3) to_reclaim = percpu_counter_sum_positive(>fs_info->delalloc_bytes); else to_reclaim = orig * (2 << (reclaim_priority + 1)); As the priority increases, we try wo write more delalloc bytes, here orig is the number of metadata we want to reserve. Meanwhile if "reclaim_priority >= 3" returns true, we'll also wait all current ordered extents to finish. Ok this is closer to what I want but not quite there. I want to get rid of the magic numbers, so we start with #define BTRFS_DEFAULT_FLUSH_PRIORITY 3 and once priority == 0 we know it's time to flush and wait for all the things. Then we use the priority as a multiplier, with DEFAULT being no multiplier, so it would look something like this if (reclaim_priority) to_reclaim = orig * (2 << BTRFS_DEFAULT_FLUSH_PRIORITY - priority); else to_reclaim = delalloc_bytes; if (reclaim_priority) items_to_wait = calc_reclaim_items_nr(root, to_reclaim); else items_to_wait = -1; and then instead of if (reclaim_priority > 3) { wake_all_tickets(_info->tickets); ... we can change the while loop to be while ((flush_state <= COMMIT_TRANS) || (--reclaim_priority >= 0)); and move the wake_all_tickets() to outside the loop, and add a if (flush_state > COMMIT_TRANS) flush_state = FLUSH_DELAYED_ITEMS_NR; to the start of the do loop. Does that make sense? Thanks, Yes, thanks for your kindly help. For "wake_all_tickets()", I think we still need to put it in the while loop, If we move it outside, we need to relock btrfs_space_info's lock, but here there maybe some race window, some new tickets maybe added in, we should not wake up them directly and should also do some flush work for them. Regards, Xiaoguang Wang Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] btrfs: introduce priority based delalloc shrink mechanism
On 10/12/2016 05:03 AM, Wang Xiaoguang wrote: Since commit b02441999efcc6152b87cd58e7970bb7843f76cf, we don't wait all ordered extents, but I run into some enospc errors when doing large file create and delete tests, it's because shrink_delalloc() does not write enough delalloc bytes and wait them finished: From: Miao XieDate: Mon, 4 Nov 2013 23:13:25 +0800 Subject: [PATCH] Btrfs: don't wait for the completion of all the ordered extents It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Here we introduce a simple reclaim_priority variable, the higher the value, the higher the priority, 0 is the minimum priority. The core idea is: if (reclaim_priority >= 3) to_reclaim = percpu_counter_sum_positive(>fs_info->delalloc_bytes); else to_reclaim = orig * (2 << (reclaim_priority + 1)); As the priority increases, we try wo write more delalloc bytes, here orig is the number of metadata we want to reserve. Meanwhile if "reclaim_priority >= 3" returns true, we'll also wait all current ordered extents to finish. Ok this is closer to what I want but not quite there. I want to get rid of the magic numbers, so we start with #define BTRFS_DEFAULT_FLUSH_PRIORITY 3 and once priority == 0 we know it's time to flush and wait for all the things. Then we use the priority as a multiplier, with DEFAULT being no multiplier, so it would look something like this if (reclaim_priority) to_reclaim = orig * (2 << BTRFS_DEFAULT_FLUSH_PRIORITY - priority); else to_reclaim = delalloc_bytes; if (reclaim_priority) items_to_wait = calc_reclaim_items_nr(root, to_reclaim); else items_to_wait = -1; and then instead of if (reclaim_priority > 3) { wake_all_tickets(_info->tickets); ... we can change the while loop to be while ((flush_state <= COMMIT_TRANS) || (--reclaim_priority >= 0)); and move the wake_all_tickets() to outside the loop, and add a if (flush_state > COMMIT_TRANS) flush_state = FLUSH_DELAYED_ITEMS_NR; to the start of the do loop. Does that make sense? Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] btrfs: introduce priority based delalloc shrink mechanism
Since commit b02441999efcc6152b87cd58e7970bb7843f76cf, we don't wait all ordered extents, but I run into some enospc errors when doing large file create and delete tests, it's because shrink_delalloc() does not write enough delalloc bytes and wait them finished: From: Miao XieDate: Mon, 4 Nov 2013 23:13:25 +0800 Subject: [PATCH] Btrfs: don't wait for the completion of all the ordered extents It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Here we introduce a simple reclaim_priority variable, the higher the value, the higher the priority, 0 is the minimum priority. The core idea is: if (reclaim_priority >= 3) to_reclaim = percpu_counter_sum_positive(>fs_info->delalloc_bytes); else to_reclaim = orig * (2 << (reclaim_priority + 1)); As the priority increases, we try wo write more delalloc bytes, here orig is the number of metadata we want to reserve. Meanwhile if "reclaim_priority >= 3" returns true, we'll also wait all current ordered extents to finish. Signed-off-by: Wang Xiaoguang --- fs/btrfs/extent-tree.c | 49 include/trace/events/btrfs.h | 11 -- 2 files changed, 35 insertions(+), 25 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 665da8f..661f039 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4690,12 +4690,13 @@ static inline int calc_reclaim_items_nr(struct btrfs_root *root, u64 to_reclaim) /* * shrink metadata reservation for delalloc */ -static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, - bool wait_ordered) +static void shrink_delalloc(struct btrfs_root *root, u64 orig, + bool wait_ordered, int reclaim_priority) { struct btrfs_block_rsv *block_rsv; struct btrfs_space_info *space_info; struct btrfs_trans_handle *trans; + u64 to_reclaim; u64 delalloc_bytes; u64 max_reclaim; long time_left; @@ -4703,22 +4704,35 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, int loops; int items; enum btrfs_reserve_flush_enum flush; + int items_to_wait; + + if (reclaim_priority < 0) + reclaim_priority = 0; + + delalloc_bytes = percpu_counter_sum_positive( + >fs_info->delalloc_bytes); + if (reclaim_priority >= 3) + to_reclaim = delalloc_bytes; + else + to_reclaim = orig * (2 << (reclaim_priority + 1)); /* Calc the number of the pages we need flush for space reservation */ items = calc_reclaim_items_nr(root, to_reclaim); to_reclaim = (u64)items * EXTENT_SIZE_PER_ITEM; + if (reclaim_priority >= 3) + items_to_wait = -1; + else + items_to_wait = items; trans = (struct btrfs_trans_handle *)current->journal_info; block_rsv = >fs_info->delalloc_block_rsv; space_info = block_rsv->space_info; - delalloc_bytes = percpu_counter_sum_positive( - >fs_info->delalloc_bytes); if (delalloc_bytes == 0) { if (trans) return; if (wait_ordered) - btrfs_wait_ordered_roots(root->fs_info, items, + btrfs_wait_ordered_roots(root->fs_info, items_to_wait, 0, (u64)-1); return; } @@ -4763,7 +4777,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, loops++; if (wait_ordered && !trans) { - btrfs_wait_ordered_roots(root->fs_info, items, + btrfs_wait_ordered_roots(root->fs_info, items_to_wait, 0, (u64)-1); } else { time_left = schedule_timeout_killable(1); @@ -4836,7 +4850,7 @@ struct reserve_ticket { static int flush_space(struct btrfs_root *root, struct btrfs_space_info *space_info, u64 num_bytes, - u64 orig_bytes, int state) + int state, int reclaim_priority) { struct btrfs_trans_handle *trans; int nr; @@ -4860,8 +4874,8 @@ static int flush_space(struct btrfs_root *root, break; case FLUSH_DELALLOC: case FLUSH_DELALLOC_WAIT: - shrink_delalloc(root, num_bytes * 2, orig_bytes, - state ==