[PATCH 00/10] several fixes and cleanups
This patchset consists of one bug fix for chunk allocation, six bug fixes
for autodefrag, and other cleanups.

I've tested it with xfstests with the autodefrag mount option enabled.

Liu Bo (10):
  Btrfs: show useful info in space reservation tracepoint
  Btrfs: fix deadlock during allocating chunks
  Btrfs: fix race between direct io and autodefrag
  Btrfs: fix the mismatch of page-mapping
  Btrfs: fix recursive defragment with autodefrag option
  Btrfs: add a check to decide if we should defrag the range
  Btrfs: do not bother to defrag an extent if it is a big real extent
  Btrfs: update to the right index of defragment
  Btrfs: use PagePrivate2 to check ordered data
  Btrfs: drop cache with VACANCY em when we fail to start a transaction

 fs/btrfs/extent-tree.c |   79 --
 fs/btrfs/inode-map.c   |    6 +--
 fs/btrfs/inode.c       |   59 ---
 fs/btrfs/ioctl.c       |   89 +++-
 fs/btrfs/transaction.c |    3 +-
 5 files changed, 151 insertions(+), 85 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/10] Btrfs: show useful info in space reservation tracepoint
o For space info, the type of space info is useful for debugging.
o For a transaction handle, its transid is useful.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |   29 ++---
 fs/btrfs/inode-map.c   |    6 ++
 fs/btrfs/transaction.c |    3 +--
 3 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 37e0a80..f3d367a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3312,8 +3312,7 @@ commit_trans:
 	}
 	data_sinfo->bytes_may_use += bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
-				      (u64)(unsigned long)data_sinfo,
-				      bytes, 1);
+				      data_sinfo->flags, bytes, 1);
 	spin_unlock(&data_sinfo->lock);
 
 	return 0;
@@ -3334,8 +,7 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
 	spin_lock(&data_sinfo->lock);
 	data_sinfo->bytes_may_use -= bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
-				      (u64)(unsigned long)data_sinfo,
-				      bytes, 0);
+				      data_sinfo->flags, bytes, 0);
 	spin_unlock(&data_sinfo->lock);
 }
 
@@ -3700,9 +3698,7 @@ again:
 		if (used + orig_bytes <= space_info->total_bytes) {
 			space_info->bytes_may_use += orig_bytes;
 			trace_btrfs_space_reservation(root->fs_info,
-					      "space_info",
-					      (u64)(unsigned long)space_info,
-					      orig_bytes, 1);
+				"space_info", space_info->flags, orig_bytes, 1);
 			ret = 0;
 		} else {
 			/*
@@ -3771,9 +3767,7 @@ again:
 		if (used + num_bytes < space_info->total_bytes + avail) {
 			space_info->bytes_may_use += orig_bytes;
 			trace_btrfs_space_reservation(root->fs_info,
-					      "space_info",
-					      (u64)(unsigned long)space_info,
-					      orig_bytes, 1);
+				"space_info", space_info->flags, orig_bytes, 1);
 			ret = 0;
 		} else {
 			wait_ordered = true;
@@ -3918,8 +3912,7 @@ static void block_rsv_release_bytes(struct btrfs_fs_info *fs_info,
 		spin_lock(&space_info->lock);
 		space_info->bytes_may_use -= num_bytes;
 		trace_btrfs_space_reservation(fs_info, "space_info",
-					      (u64)(unsigned long)space_info,
-					      num_bytes, 0);
+					      space_info->flags, num_bytes, 0);
 		space_info->reservation_progress++;
 		spin_unlock(&space_info->lock);
 	}
@@ -4137,14 +4130,14 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
 		block_rsv->reserved += num_bytes;
 		sinfo->bytes_may_use += num_bytes;
 		trace_btrfs_space_reservation(fs_info, "space_info",
-				      (u64)(unsigned long)sinfo, num_bytes, 1);
+				      sinfo->flags, num_bytes, 1);
 	}
 
 	if (block_rsv->reserved >= block_rsv->size) {
 		num_bytes = block_rsv->reserved - block_rsv->size;
 		sinfo->bytes_may_use -= num_bytes;
 		trace_btrfs_space_reservation(fs_info, "space_info",
-				      (u64)(unsigned long)sinfo, num_bytes, 0);
+				      sinfo->flags, num_bytes, 0);
 		sinfo->reservation_progress++;
 		block_rsv->reserved = block_rsv->size;
 		block_rsv->full = 1;
@@ -4198,8 +4191,7 @@ void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
 		return;
 
 	trace_btrfs_space_reservation(root->fs_info, "transaction",
-				      (u64)(unsigned long)trans,
-				      trans->bytes_reserved, 0);
+				      trans->transid, trans->bytes_reserved, 0);
 	btrfs_block_rsv_release(root, trans->block_rsv, trans->bytes_reserved);
 	trans->bytes_reserved = 0;
 }
@@ -4716,9 +4708,8 @@ static int btrfs_update_reserved_bytes(struct btrfs_block_group_cache *cache,
 		space_info->bytes_reserved += num_bytes;
 		if (reserve == RESERVE_ALLOC) {
 			trace_btrfs_space_reservation(cache->fs_info,
-					      "space_info",
-
[PATCH 02/10][RESEND] Btrfs: fix deadlock during allocating chunks
This deadlock comes from xfstests 251.

We hold the chunk_mutex throughout the whole of a chunk allocation. But if
we find that we've used up the system chunk space, we need to allocate a new
system chunk, which recurses into chunk allocation and ends up in a deadlock
on chunk_mutex.

So instead we need to allocate the system chunk first if we find we're in
ENOSPC.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |   50
 1 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f3d367a..fe5bbc7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3394,6 +3394,50 @@ static int should_alloc_chunk(struct btrfs_root *root,
 	return 1;
 }
 
+static u64 get_system_chunk_thresh(struct btrfs_root *root, u64 type)
+{
+	u64 num_dev;
+
+	if (type & BTRFS_BLOCK_GROUP_RAID10 ||
+	    type & BTRFS_BLOCK_GROUP_RAID0)
+		num_dev = root->fs_info->fs_devices->rw_devices;
+	else if (type & BTRFS_BLOCK_GROUP_RAID1)
+		num_dev = 2;
+	else
+		num_dev = 1;	/* DUP or single */
+
+	/* metadata for updating devices and chunk tree */
+	return btrfs_calc_trans_metadata_size(root, num_dev + 1);
+}
+
+static void check_system_chunk(struct btrfs_trans_handle *trans,
+			       struct btrfs_root *root, u64 type)
+{
+	struct btrfs_space_info *info;
+	u64 left;
+	u64 thresh;
+
+	info = __find_space_info(root->fs_info, BTRFS_BLOCK_GROUP_SYSTEM);
+	spin_lock(&info->lock);
+	left = info->total_bytes - info->bytes_used - info->bytes_pinned -
+		info->bytes_reserved - info->bytes_readonly;
+	spin_unlock(&info->lock);
+
+	thresh = get_system_chunk_thresh(root, type);
+	if (left < thresh && btrfs_test_opt(root, ENOSPC_DEBUG)) {
+		printk(KERN_INFO "left=%llu, need=%llu, flags=%llu\n",
+		       left, thresh, type);
+		dump_space_info(info, 0, 0);
+	}
+
+	if (left < thresh) {
+		u64 flags;
+
+		flags = btrfs_get_alloc_profile(root->fs_info->chunk_root, 0);
+		btrfs_alloc_chunk(trans, root, flags);
+	}
+}
+
 static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *extent_root, u64 alloc_bytes,
 			  u64 flags, int force)
@@ -3466,6 +3510,12 @@ again:
 		force_metadata_allocation(fs_info);
 	}
 
+	/*
+	 * Check if we have enough space in SYSTEM chunk because we may need
+	 * to update devices.
+	 */
+	check_system_chunk(trans, extent_root, flags);
+
 	ret = btrfs_alloc_chunk(trans, extent_root, flags);
 	if (ret < 0 && ret != -ENOSPC)
 		goto out;
--
1.6.5.2
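The headroom check that this patch adds can be modelled in plain userspace C. This is only an illustrative sketch, not kernel code: `struct space_info_model`, `system_bytes_left()` and `need_system_chunk()` are hypothetical names, locking is omitted, and the threshold is passed in by the caller instead of being computed by `btrfs_calc_trans_metadata_size()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the counters the patch reads. */
struct space_info_model {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_pinned;
	uint64_t bytes_reserved;
	uint64_t bytes_readonly;
};

/* Bytes still free in the SYSTEM space, same arithmetic as the patch. */
static uint64_t system_bytes_left(const struct space_info_model *info)
{
	return info->total_bytes - info->bytes_used - info->bytes_pinned -
	       info->bytes_reserved - info->bytes_readonly;
}

/* True when a SYSTEM chunk should be allocated before the requested chunk. */
static bool need_system_chunk(const struct space_info_model *info,
			      uint64_t thresh)
{
	return system_bytes_left(info) < thresh;
}
```

The point of doing this check up front is ordering: the system chunk is allocated before the data or metadata chunk allocation proceeds, rather than from inside it.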
[PATCH 04/10] Btrfs: fix the mismatch of page-mapping
commit 600a45e1d5e376f679ff9ecc4ce9452710a6d27c
(Btrfs: fix deadlock on page lock when doing auto-defragment)
fixes the deadlock on the page lock, but it also introduces another bug.

A page may have been truncated between the unlock and the subsequent lock,
so we need to look it up again to get the right one.

And since we now hold i_mutex, the inode size cannot change and we can drop
the isize overflow checks.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |   35 +++
 1 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0acc828..81faa78 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -856,6 +856,7 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	u64 isize = i_size_read(inode);
 	u64 page_start;
 	u64 page_end;
+	u64 page_cnt;
 	int ret;
 	int i;
 	int i_done;
@@ -864,19 +865,21 @@ static int cluster_pages_for_defrag(struct inode *inode,
 	struct extent_io_tree *tree;
 	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
 
-	if (isize == 0)
-		return 0;
 	file_end = (isize - 1) >> PAGE_CACHE_SHIFT;
+	if (!isize || start_index > file_end)
+		return 0;
+
+	page_cnt = min_t(u64, (u64)num_pages, (u64)file_end - start_index + 1);
 
 	ret = btrfs_delalloc_reserve_space(inode,
-					   num_pages << PAGE_CACHE_SHIFT);
+					   page_cnt << PAGE_CACHE_SHIFT);
 	if (ret)
 		return ret;
 	i_done = 0;
 	tree = &BTRFS_I(inode)->io_tree;
 
 	/* step one, lock all the pages */
-	for (i = 0; i < num_pages; i++) {
+	for (i = 0; i < page_cnt; i++) {
 		struct page *page;
 again:
 		page = find_or_create_page(inode->i_mapping,
@@ -898,6 +901,15 @@ again:
 			btrfs_start_ordered_extent(inode, ordered, 1);
 			btrfs_put_ordered_extent(ordered);
 			lock_page(page);
+			/*
+			 * we unlocked the page above, so we need check if
+			 * it was released or not.
+			 */
+			if (page->mapping != inode->i_mapping) {
+				unlock_page(page);
+				page_cache_release(page);
+				goto again;
+			}
 		}
 
 		if (!PageUptodate(page)) {
@@ -911,15 +923,6 @@ again:
 			}
 		}
 
-		isize = i_size_read(inode);
-		file_end = (isize - 1) >> PAGE_CACHE_SHIFT;
-		if (!isize || page->index > file_end) {
-			/* whoops, we blew past eof, skip this page */
-			unlock_page(page);
-			page_cache_release(page);
-			break;
-		}
-
 		if (page->mapping != inode->i_mapping) {
 			unlock_page(page);
 			page_cache_release(page);
@@ -953,12 +956,12 @@ again:
 			  EXTENT_DO_ACCOUNTING, 0, 0, &cached_state,
 			  GFP_NOFS);
 
-	if (i_done != num_pages) {
+	if (i_done != page_cnt) {
 		spin_lock(&BTRFS_I(inode)->lock);
 		BTRFS_I(inode)->outstanding_extents++;
 		spin_unlock(&BTRFS_I(inode)->lock);
 		btrfs_delalloc_release_space(inode,
-				     (num_pages - i_done) << PAGE_CACHE_SHIFT);
+				     (page_cnt - i_done) << PAGE_CACHE_SHIFT);
 	}
 
 
@@ -983,7 +986,7 @@ out:
 		unlock_page(pages[i]);
 		page_cache_release(pages[i]);
 	}
-	btrfs_delalloc_release_space(inode, num_pages << PAGE_CACHE_SHIFT);
+	btrfs_delalloc_release_space(inode, page_cnt << PAGE_CACHE_SHIFT);
 	return ret;
 
 }
--
1.6.5.2
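Because the inode size is stable under i_mutex, the number of pages to lock can be computed once before the loop. The clamp can be sketched in userspace C as follows. The function name `defrag_page_count()` and the constant `PAGE_SHIFT_MODEL` are hypothetical, 4 KiB pages are assumed, and unlike the patch this sketch tests for an empty file before subtracting 1 from the size.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12	/* assumption: 4 KiB pages */

/*
 * Number of pages to lock for one defrag cluster: 0 if the file is empty or
 * the cluster starts past EOF, otherwise num_pages clamped to the last page.
 */
static uint64_t defrag_page_count(uint64_t isize, uint64_t start_index,
				  uint64_t num_pages)
{
	uint64_t file_end;
	uint64_t remaining;

	if (isize == 0)
		return 0;
	file_end = (isize - 1) >> PAGE_SHIFT_MODEL;	/* index of last page */
	if (start_index > file_end)
		return 0;
	remaining = file_end - start_index + 1;
	return num_pages < remaining ? num_pages : remaining;
}
```

Reserving and releasing delalloc space with this clamped count, rather than with the caller's `num_pages`, keeps the accounting balanced when the cluster runs into EOF.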
[PATCH 06/10] Btrfs: add a check to decide if we should defrag the range
If our file's layout is as follows:

| hole | data1 | hole | data2 |

we do not need to defrag this file, because this file has holes and
cannot be merged into one extent.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |   36 +++-
 1 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 81faa78..66a4933 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -769,6 +769,31 @@ none:
 	return -ENOENT;
 }
 
+/*
+ * Validity check of prev em and next em:
+ * 1) no prev/next em
+ * 2) prev/next em is an hole/inline extent
+ */
+static int check_adjacent_extents(struct inode *inode, struct extent_map *em)
+{
+	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
+	struct extent_map *prev = NULL, *next = NULL;
+	int ret = 0;
+
+	read_lock(&em_tree->lock);
+	prev = lookup_extent_mapping(em_tree, em->start - 1, (u64)-1);
+	next = lookup_extent_mapping(em_tree, em->start + em->len, (u64)-1);
+	read_unlock(&em_tree->lock);
+
+	if ((!prev || prev->block_start >= EXTENT_MAP_LAST_BYTE) &&
+	    (!next || next->block_start >= EXTENT_MAP_LAST_BYTE))
+		ret = 1;
+	free_extent_map(prev);
+	free_extent_map(next);
+
+	return ret;
+}
+
 static int should_defrag_range(struct inode *inode, u64 start, u64 len,
 			       int thresh, u64 *last_len, u64 *skip,
 			       u64 *defrag_end)
@@ -806,8 +831,16 @@ static int should_defrag_range(struct inode *inode, u64 start, u64 len,
 	}
 
 	/* this will cover holes, and inline extents */
-	if (em->block_start >= EXTENT_MAP_LAST_BYTE)
+	if (em->block_start >= EXTENT_MAP_LAST_BYTE) {
+		ret = 0;
+		goto out;
+	}
+
+	/* If we have nothing to merge with us, just skip. */
+	if (check_adjacent_extents(inode, em)) {
 		ret = 0;
+		goto out;
+	}
 
 	/*
 	 * we hit a real extent, if it is big don't bother defragging it again
@@ -815,6 +848,7 @@ static int should_defrag_range(struct inode *inode, u64 start, u64 len,
 	if ((*last_len == 0 || *last_len >= thresh) && em->len >= thresh)
 		ret = 0;
 
+out:
 	/*
 	 * last_len ends up being a counter of how many bytes we've defragged.
 	 * every time we choose not to defrag an extent, we reset *last_len
--
1.6.5.2
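The neighbour test can be illustrated with a simplified userspace model. Everything here is an assumption made for the example: `struct extent_model` and `nothing_to_merge()` are hypothetical, the extents live in a sorted, contiguous array instead of the kernel's extent map tree, and a boolean stands in for the `block_start >= EXTENT_MAP_LAST_BYTE` comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record; not the kernel's struct extent_map. */
struct extent_model {
	uint64_t start;
	uint64_t len;
	bool hole_or_inline;	/* stands in for block_start >= EXTENT_MAP_LAST_BYTE */
};

/*
 * Returns true when extent idx has no real neighbour to merge with: each
 * side is either missing or a hole/inline extent. The array is assumed to
 * be sorted by start and contiguous.
 */
static bool nothing_to_merge(const struct extent_model *ext, size_t n,
			     size_t idx)
{
	bool prev_unusable = (idx == 0) || ext[idx - 1].hole_or_inline;
	bool next_unusable = (idx + 1 >= n) || ext[idx + 1].hole_or_inline;

	return prev_unusable && next_unusable;
}
```

For the `| hole | data1 | hole | data2 |` layout from the commit message, both data extents report that there is nothing to merge, so the range is skipped.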
[PATCH 05/10][RESEND] Btrfs: fix recursive defragment with autodefrag option
Reproduce:
 $ mkfs.btrfs disk
 $ mount disk /mnt -o autodefrag
 $ dd if=/dev/zero of=/mnt/foobar bs=4k count=10 2>/dev/null && sync
 $ for i in `seq 9 -2 0`; do dd if=/dev/zero of=/mnt/foobar bs=4k count=1 \
   seek=$i conv=notrunc 2> /dev/null; done && sync

After this, foobar gets defragged again and again. The same happens with
-o autodefrag,compress.

Reasons:
When the cleaner kthread fetches inodes from the defrag tree and defrags
them, it dirties pages and submits them. This leads to another DATA COW,
where the inode being processed is inserted into the defrag tree again.

This patch sets a rule for the COW code: only insert an inode when we are
really going to defragment something.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/inode.c |    8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 892b347..7f5018d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -344,8 +344,9 @@ static noinline int compress_file_range(struct inode *inode,
 	int will_compress;
 	int compress_type = root->fs_info->compress_type;
 
-	/* if this is a small write inside eof, kick off a defragbot */
-	if (end <= BTRFS_I(inode)->disk_i_size && (end - start + 1) < 16 * 1024)
+	/* if this is a small write inside eof, kick off a defrag */
+	if ((end - start + 1) < 16 * 1024 &&
+	    (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size))
 		btrfs_add_inode_defrag(NULL, inode);
 
 	actual_end = min_t(u64, isize, end + 1);
@@ -800,7 +801,8 @@ static noinline int cow_file_range(struct inode *inode,
 	ret = 0;
 
 	/* if this is a small write inside eof, kick off defrag */
-	if (end <= BTRFS_I(inode)->disk_i_size && num_bytes < 64 * 1024)
+	if (num_bytes < 64 * 1024 &&
+	    (start > 0 || end + 1 < BTRFS_I(inode)->disk_i_size))
 		btrfs_add_inode_defrag(trans, inode);
 
 	if (start == 0) {
--
1.6.5.2
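To see why the new rule breaks the loop, the old and new conditions can be compared side by side in a small userspace sketch. The function names `should_queue_defrag()` and `old_should_queue_defrag()` are hypothetical; the byte range `[start, end]` is inclusive as in the kernel code, and `small_limit` stands for the 16 KiB or 64 KiB constant used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * New rule: queue the inode only for a small write that either does not
 * start at offset 0 or ends before the on-disk size, i.e. a write that
 * covers only part of the file.
 */
static bool should_queue_defrag(uint64_t start, uint64_t end,
				uint64_t disk_i_size, uint64_t small_limit)
{
	uint64_t num_bytes = end - start + 1;

	return num_bytes < small_limit &&
	       (start > 0 || end + 1 < disk_i_size);
}

/* Old rule, for comparison: any small write ending at or inside EOF. */
static bool old_should_queue_defrag(uint64_t start, uint64_t end,
				    uint64_t disk_i_size, uint64_t small_limit)
{
	return end <= disk_i_size && (end - start + 1) < small_limit;
}
```

For the 40 KiB file in the reproducer, a writeback of the whole range `[0, 40959]` satisfies the old condition but not the new one, while a single 4 KiB rewrite in the middle of the file still queues the inode under both.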
[PATCH 08/10] Btrfs: update to the right index of defragment
When we use autodefrag, we forget to update the index that records the last
page we've dirtied, so we set the dirty flag on the same set of pages again
and again.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 7a6d15c..e3cb770 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1172,6 +1172,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 			if (newer_off == (u64)-1)
 				break;
 
+			if (ret > 0)
+				i += ret;
+
 			newer_off = max(newer_off + 1,
 					(u64)i << PAGE_CACHE_SHIFT);
--
1.6.5.2
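The effect of the three added lines can be shown with a small userspace sketch of the offset update. The helper name `advance_newer_off()` is hypothetical and 4 KiB pages are assumed; `ret` plays the role of the number of pages that the previous cluster call dirtied.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12	/* assumption: 4 KiB pages */

/*
 * Advance the page index by the number of pages just dirtied, then return
 * the next byte offset to search from, as the patched loop does.
 */
static uint64_t advance_newer_off(uint64_t newer_off, uint64_t *i, int ret)
{
	uint64_t by_index;

	if (ret > 0)
		*i += (uint64_t)ret;
	by_index = *i << PAGE_SHIFT_MODEL;
	return newer_off + 1 > by_index ? newer_off + 1 : by_index;
}
```

Without the index update, the search offset advances by only one byte, so the next lookup can land on pages that were dirtied a moment ago.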
[PATCH 10/10] Btrfs: drop cache with VACANCY em when we fail to start a transaction
We need to drop a VACANCY em (if we have one) from the extent cache when we
fail to start a transaction.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/inode.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bacf441..2b2f0b6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3406,9 +3406,6 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 				break;
 			}
 
-			btrfs_drop_extent_cache(inode, hole_start,
-					last_byte - 1, 0);
-
 			btrfs_update_inode(trans, root, inode);
 			btrfs_end_transaction(trans, root);
 		}
@@ -3419,6 +3416,9 @@ int btrfs_cont_expand(struct inode *inode, loff_t oldsize, loff_t size)
 			break;
 	}
 
+	if (em && test_bit(EXTENT_FLAG_VACANCY, &em->flags))
+		btrfs_drop_extent_cache(inode, hole_start, last_byte - 1, 0);
+
 	free_extent_map(em);
 	unlock_extent_cached(io_tree, hole_start, block_end - 1, &cached_state,
 			     GFP_NOFS);
--
1.6.5.2
[PATCH 09/10] Btrfs: use PagePrivate2 to check ordered data
If a page has the PagePrivate2 flag set, it still remains as ordered data,
so we can check this flag directly instead of looking up an ordered extent.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/inode.c |   45 +++--
 1 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7f5018d..bacf441 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6345,8 +6345,6 @@ static int btrfs_releasepage(struct page *page, gfp_t gfp_flags)
 static void btrfs_invalidatepage(struct page *page, unsigned long offset)
 {
 	struct extent_io_tree *tree;
-	struct btrfs_ordered_extent *ordered;
-	struct extent_state *cached_state = NULL;
 	u64 page_start = page_offset(page);
 	u64 page_end = page_start + PAGE_CACHE_SIZE - 1;
 
@@ -6365,35 +6363,22 @@ static void btrfs_invalidatepage(struct page *page, unsigned long offset)
 		btrfs_releasepage(page, GFP_NOFS);
 		return;
 	}
-	lock_extent_bits(tree, page_start, page_end, 0, &cached_state,
-			 GFP_NOFS);
-	ordered = btrfs_lookup_ordered_extent(page->mapping->host,
-					   page_offset(page));
-	if (ordered) {
-		/*
-		 * IO on this page will never be started, so we need
-		 * to account for any ordered extents now
-		 */
-		clear_extent_bit(tree, page_start, page_end,
-				 EXTENT_DIRTY | EXTENT_DELALLOC |
-				 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING, 1, 0,
-				 &cached_state, GFP_NOFS);
-		/*
-		 * whoever cleared the private bit is responsible
-		 * for the finish_ordered_io
-		 */
-		if (TestClearPagePrivate2(page)) {
-			btrfs_finish_ordered_io(page->mapping->host,
-						page_start, page_end);
-		}
-		btrfs_put_ordered_extent(ordered);
-		cached_state = NULL;
-		lock_extent_bits(tree, page_start, page_end, 0, &cached_state,
-				 GFP_NOFS);
-	}
+	/*
+	 * IO on this page will never be started, so we need
+	 * to account for any ordered extents now
+	 */
 	clear_extent_bit(tree, page_start, page_end,
-		 EXTENT_LOCKED | EXTENT_DIRTY | EXTENT_DELALLOC |
-		 EXTENT_DO_ACCOUNTING, 1, 1, &cached_state, GFP_NOFS);
+			 EXTENT_DIRTY | EXTENT_DELALLOC |
+			 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING, 1, 1,
+			 NULL, GFP_NOFS);
+	/*
+	 * whoever cleared the private bit is responsible
+	 * for the finish_ordered_io
+	 */
+	if (TestClearPagePrivate2(page)) {
+		btrfs_finish_ordered_io(page->mapping->host,
+					page_start, page_end);
+	}
 	__btrfs_releasepage(page, GFP_NOFS);
 
 	ClearPageChecked(page);
--
1.6.5.2
[PATCH 07/10] Btrfs: do not bother to defrag an extent if it is a big real extent
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag
$ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3072               10 eof
/mnt/btrfs/foobar: 1 extent found

Now we have a big real extent [0, 40960), but autodefrag will still defrag it.

$ sync
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3082               10 eof
/mnt/btrfs/foobar: 1 extent found

The physical offset moved from 3072 to 3082, so the extent was rewritten for
no benefit. If we already have a big real extent, that is good enough, so
just skip it.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |    9 +++--
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 66a4933..7a6d15c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1126,12 +1126,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		if (!(inode->i_sb->s_flags & MS_ACTIVE))
 			break;
 
-		if (!newer_than &&
-		    !should_defrag_range(inode, (u64)i << PAGE_CACHE_SHIFT,
-					PAGE_CACHE_SIZE,
-					extent_thresh,
-					&last_len, &skip,
-					&defrag_end)) {
+		if (!should_defrag_range(inode, (u64)i << PAGE_CACHE_SHIFT,
+					 PAGE_CACHE_SIZE, extent_thresh,
+					 &last_len, &skip, &defrag_end)) {
 			unsigned long next;
 			/*
 			 * the should_defrag function tells us how much to skip
--
1.6.5.2
[PATCH 03/10] Btrfs: fix race between direct io and autodefrag
The bug is from running xfstests 209 with autodefrag.

The race is as follows:
       t1                                  t2(autodefrag)
   direct IO
     invalidate pagecache
     dio(old data)
     add_inode_defrag
     invalidate pagecache
     endio

   direct IO
     invalidate pagecache
                                           run_defrag
                                             readpage(old data)
                                             set page dirty (old data)
     dio(new data, rewrite)
     invalidate pagecache (*)
     endio

t2(autodefrag) gets old data into the pagecache via readpage and marks the
pages dirty. Meanwhile, invalidate pagecache(*) fails because the pages are
dirty, so the old data may be flushed to disk by the flush thread, which
leads to data loss.

The same applies to user-space defragment programs.

The patch fixes this race by holding i_mutex while we readpage and set the
pages dirty.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |    6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d8b5471..0acc828 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1123,12 +1123,16 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 			ra_index += max_cluster;
 		}
 
+		mutex_lock(&inode->i_mutex);
 		ret = cluster_pages_for_defrag(inode, pages, i, cluster);
-		if (ret < 0)
+		if (ret < 0) {
+			mutex_unlock(&inode->i_mutex);
 			goto out_ra;
+		}
 
 		defrag_count += ret;
 		balance_dirty_pages_ratelimited_nr(inode->i_mapping, ret);
+		mutex_unlock(&inode->i_mutex);
 
 		if (newer_than) {
 			if (newer_off == (u64)-1)
--
1.6.5.2
btrfs csum failed, scrub ok
I have a freshly installed system with btrfs as the root file system. The
machine is running Linux 3.2. The raid1 btrfs file system lives on two new
hard drives.

About one day after installation the following message appeared in kern.log.
There were no other errors.

root@mim:/var/log# grep 'btrfs.*fail' kern.log
Mar 27 01:07:46 mim kernel: [ 6480.233861] btrfs csum failed ino 453509 off 1495040 csum 3301532933 private 4156998194
Mar 27 01:07:46 mim kernel: [ 6480.234470] btrfs csum failed ino 453509 off 1499136 csum 1873118812 private 3512102188
Mar 27 01:07:46 mim kernel: [ 6480.234572] btrfs csum failed ino 453509 off 1503232 csum 1034640717 private 2041007647
Mar 27 01:07:46 mim kernel: [ 6480.234670] btrfs csum failed ino 453509 off 1507328 csum 889729013 private 2342095239
Mar 27 01:07:46 mim kernel: [ 6480.237977] btrfs csum failed ino 453509 off 1503232 csum 1518679450 private 2041007647
Mar 27 01:07:46 mim kernel: [ 6480.238149] btrfs csum failed ino 453509 off 1507328 csum 889729013 private 2342095239
Mar 27 01:07:46 mim kernel: [ 6480.238330] btrfs csum failed ino 453509 off 1495040 csum 3234580989 private 4156998194
Mar 27 01:07:46 mim kernel: [ 6480.238447] btrfs csum failed ino 453509 off 1499136 csum 1873118812 private 3512102188
Mar 27 01:07:46 mim kernel: [ 6480.243873] btrfs csum failed ino 453509 off 1503232 csum 2184012753 private 2041007647
Mar 27 01:07:46 mim kernel: [ 6480.243962] btrfs csum failed ino 453509 off 1507328 csum 240604621 private 2342095239

inode 453509 belongs to a file installed by dpkg:

root@mim:/# find / -inum 453509 -ls
453509 1976 -rw-r--r-- 1 root root 2020832 Mar 7 21:11 /usr/lib/libreoffice/basis3.4/program/libsblx.so

That file seems to be ok, there are no errors when re-reading it.

A scrub done the morning after the incident also didn't find any problems:

root@mim:/home/cwg# btrfs scrub status /
scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
        scrub started at Tue Mar 27 10:37:49 2012 and finished after 3921 seconds
        total bytes scrubbed: 550.20GB with 0 errors

Inspecting the SMART status of the hard drives does not reveal any problems
either.

Is this a bug in btrfs, or am I supposed to be afraid that the new hard
drives are not working reliably? Or could this be the effect of some cosmic
ray hitting my machine? (It doesn't have ECC.) Or is it normal that hard
drives sometimes make errors? (In that case the additional layer of btrfs
checksumming seems to be a very good thing.)

Christoph
Re: btrfs csum failed, scrub ok
On Tue, 27 Mar 2012 12:57:31 +0200
Christoph Groth <c...@falma.de> wrote:

> root@mim:/# find / -inum 453509 -ls
> 453509 1976 -rw-r--r-- 1 root root 2020832 Mar 7 21:11 /usr/lib/libreoffice/basis3.4/program/libsblx.so
>
> That file seems to be ok, there are no errors when re-reading it.

How about

  $ sudo apt-get install debsums
  $ debsums libreoffice-core | grep libsblx.so

--
With respect,
Roman

~~~
Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free.
Re: btrfs csum failed, scrub ok
Roman Mamedov <r...@romanrm.ru> writes:

> On Tue, 27 Mar 2012 12:57:31 +0200
> Christoph Groth <c...@falma.de> wrote:
>
>> root@mim:/# find / -inum 453509 -ls
>> 453509 1976 -rw-r--r-- 1 root root 2020832 Mar 7 21:11 /usr/lib/libreoffice/basis3.4/program/libsblx.so
>>
>> That file seems to be ok, there are no errors when re-reading it.
>
> How about
>
>   $ sudo apt-get install debsums
>   $ debsums libreoffice-core | grep libsblx.so

Good idea!

$ debsums libreoffice-core | grep libsblx.so
/usr/lib/libreoffice/basis3.4/program/libsblx.so OK

I'm still puzzled by this incident.

Christoph
Re: btrfs: open_ctree failed
One entire subvolume was restored. But there were 4 subvolumes on that
partition. Is there a way to specify/force the restore of a different
subvolume? find-root seems to only find a single root.

thanks

On Mon, Mar 26, 2012 at 3:47 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Mon, Mar 26, 2012 at 03:36:13PM -0700, Not Zippy wrote:
>> Hugo I did try the dangerdonteveruse branch and thats the error
>> btrfsck --repair gave me.
>
>    Oooh, a brave one, I see. ;)
>
>> Looks like the btrfs-restore command may work (thanks!). And yes I do
>> have backups for the important data - I had some other data on there
>> which would need to be d/l again..
>
>    Excellent. We don't need to set the hounds onto you, then.
>
>> I don't dabble that much with the kernel - this is a straight ubuntu
>> which I regularly do their updates - Can I advance the kernel beyond ?
>
>    Yes, there's a PPA[1] for it (documented in the Getting Started
> page on the btrfs wiki at [2]).
>
> [1] http://kernel.ubuntu.com/~kernel-ppa/mainline/
> [2] http://btrfs.ipv5.de/index.php?title=Getting_started#Ubuntu_Linux
Re: btrfs: open_ctree failed
On Tue, Mar 27, 2012 at 05:58:17AM -0700, Not Zippy wrote:
> One entire subvolume was restored. But there were 4 subvolumes on that
> partition. Is there a way to specify/force the restore of a different
> subvolume ? find-root seems to only find a single root.

   There is only a single root tree, so that's understandable. If you
have a look at the documentation for restore[1], it mentions (right
near the bottom of the page) that -r will allow you to select an
alternative subvolume to recover from.

   Hugo.

[1] http://btrfs.ipv5.de/index.php?title=Restore

> thanks
>
> On Mon, Mar 26, 2012 at 3:47 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> > On Mon, Mar 26, 2012 at 03:36:13PM -0700, Not Zippy wrote:
> >> Hugo I did try the dangerdonteveruse branch and thats the error
> >> btrfsck --repair gave me.
> >
> >    Oooh, a brave one, I see. ;)
> >
> >> Looks like the btrfs-restore command may work (thanks!). And yes I do
> >> have backups for the important data - I had some other data on there
> >> which would need to be d/l again..
> >
> >    Excellent. We don't need to set the hounds onto you, then.
> >
> >> I don't dabble that much with the kernel - this is a straight ubuntu
> >> which I regularly do their updates - Can I advance the kernel beyond ?
> >
> >    Yes, there's a PPA[1] for it (documented in the Getting Started
> > page on the btrfs wiki at [2]).
> >
> > [1] http://kernel.ubuntu.com/~kernel-ppa/mainline/
> > [2] http://btrfs.ipv5.de/index.php?title=Getting_started#Ubuntu_Linux

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- The enemy have elected for Death by Powerpoint.  That's what ---
                        they shall get.  -- gdb
[PATCH 1/3] Btrfs: actually call btrfs_init_lockdep
btrfs_init_lockdep only makes our lockdep class names look prettier, so it
never hurt that we forgot to actually call it. This turns our lockdep
identifier strings from the auto-set "#[id]" into really pretty
"btrfs-fs-01" or "btrfs-csum-03".

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/super.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 61717a4..5239003 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1404,6 +1404,8 @@ static int __init init_btrfs_fs(void)
 	if (err)
 		goto unregister_ioctl;
 
+	btrfs_init_lockdep();
+
 	printk(KERN_INFO "%s loaded\n", BTRFS_BUILD_VERSION);
 	return 0;
 
--
1.7.3.4
[GIT PULL] Btrfs fixes for 3.4
Hi Chris,

please pull my three current patches from my repo, based on your for-linus
branch (I can rebase them to the integration branch if that helps):

	git://git.jan-o-sch.net/btrfs-unstable for-chris

It's two really small fixes, both mentioned earlier, and a more or less
important fixup for scrub. While working fine in 3.2, name resolving can
deadlock since the first rc of 3.3. I suggest we queue patch 3/3 for
submission to 3.3-stable.

I'm passing xfstests just as well as for-linus does without my patches
(which is not really good). I also made some manual error insertion tests
to verify that the scrub deadlock chance is really gone.

And, we really should have an xfstest for raid-repair and scrub-repair.
Anyone? :-)

-Jan

Jan Schmidt (3):
  Btrfs: actually call btrfs_init_lockdep
  Btrfs: check return value of btrfs_cow_block()
  Btrfs: fix regression in scrub path resolving

 fs/btrfs/backref.c     |  115 +++
 fs/btrfs/backref.h     |    5 +-
 fs/btrfs/ioctl.c       |    4 +-
 fs/btrfs/scrub.c       |    4 +-
 fs/btrfs/super.c       |    2 +
 fs/btrfs/transaction.c |    6 ++-
 6 files changed, 79 insertions(+), 57 deletions(-)

--
1.7.3.4
[PATCH 2/3] Btrfs: check return value of btrfs_cow_block()
The two helper functions commit_cowonly_roots() and
create_pending_snapshot() failed to check the return value from
btrfs_cow_block(), which could at least in theory fail with -ENOSPC from
btrfs_alloc_free_block(). This commit adds the missing checks.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/transaction.c |    6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 04b77e3..cd220f2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -718,7 +718,8 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
 	BUG_ON(ret);
 
 	eb = btrfs_lock_root_node(fs_info->tree_root);
-	btrfs_cow_block(trans, fs_info->tree_root, eb, NULL, 0, &eb);
+	ret = btrfs_cow_block(trans, fs_info->tree_root, eb, NULL, 0, &eb);
+	BUG_ON(ret);
 	btrfs_tree_unlock(eb);
 	free_extent_buffer(eb);
 
@@ -949,7 +950,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	btrfs_set_root_flags(new_root_item, root_flags);
 
 	old = btrfs_lock_root_node(root);
-	btrfs_cow_block(trans, root, old, NULL, 0, &old);
+	ret = btrfs_cow_block(trans, root, old, NULL, 0, &old);
+	BUG_ON(ret);
 	btrfs_set_lock_blocking(old);
 
 	btrfs_copy_root(trans, root, old, &tmp, objectid);
--
1.7.3.4
[PATCH 3/3] Btrfs: fix regression in scrub path resolving
In commit 4692cf58 we introduced new backref walking code for btrfs. This
assumes we're searching live roots, which requires a transaction context.
While scrubbing, however, we must not join a transaction because this could
deadlock with the commit path. Additionally, what scrub really wants to do is
resolving a logical address in the commit root it's currently checking.

This patch adds support for logical to path resolving on commit roots and
makes scrub use that.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
I think we should queue this one for 3.3-stable
---
 fs/btrfs/backref.c |  115 ++--
 fs/btrfs/backref.h |    5 +-
 fs/btrfs/ioctl.c   |    4 +-
 fs/btrfs/scrub.c   |    4 +-
 4 files changed, 73 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 0436c12..56136d90 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -116,6 +116,7 @@ add_parent:
  * to a logical address
  */
 static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info,
+					int search_commit_root,
 					struct __prelim_ref *ref,
 					struct ulist *parents)
 {
@@ -131,6 +132,7 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info,
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
+	path->search_commit_root = !!search_commit_root;
 
 	root_key.objectid = ref->root_id;
 	root_key.type = BTRFS_ROOT_ITEM_KEY;
@@ -188,6 +190,7 @@ out:
  * resolve all indirect backrefs from the list
  */
 static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
+				   int search_commit_root,
 				   struct list_head *head)
 {
 	int err;
@@ -212,7 +215,8 @@ static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
 			continue;
 		if (ref->count == 0)
 			continue;
-		err = __resolve_indirect_ref(fs_info, ref, parents);
+		err = __resolve_indirect_ref(fs_info, search_commit_root,
+					     ref, parents);
 		if (err) {
 			if (ret == 0)
 				ret = err;
@@ -586,6 +590,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
 	struct btrfs_delayed_ref_head *head;
 	int info_level = 0;
 	int ret;
+	int search_commit_root = (trans == BTRFS_BACKREF_SEARCH_COMMIT_ROOT);
 	struct list_head prefs_delayed;
 	struct list_head prefs;
 	struct __prelim_ref *ref;
@@ -600,6 +605,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
+	path->search_commit_root = !!search_commit_root;
 
 	/*
 	 * grab both a lock on the path and a lock on the delayed ref head.
@@ -614,35 +620,39 @@ again:
 		goto out;
 	BUG_ON(ret == 0);
 
-	/*
-	 * look if there are updates for this ref queued and lock the head
-	 */
-	delayed_refs = &trans->transaction->delayed_refs;
-	spin_lock(&delayed_refs->lock);
-	head = btrfs_find_delayed_ref_head(trans, bytenr);
-	if (head) {
-		if (!mutex_trylock(&head->mutex)) {
-			atomic_inc(&head->node.refs);
-			spin_unlock(&delayed_refs->lock);
-
-			btrfs_release_path(path);
-
-			/*
-			 * Mutex was contended, block until it's
-			 * released and try again
-			 */
-			mutex_lock(&head->mutex);
-			mutex_unlock(&head->mutex);
-			btrfs_put_delayed_ref(&head->node);
-			goto again;
-		}
-		ret = __add_delayed_refs(head, seq, &info_key, &prefs_delayed);
-		if (ret) {
-			spin_unlock(&delayed_refs->lock);
-			goto out;
+	if (trans != BTRFS_BACKREF_SEARCH_COMMIT_ROOT) {
+		/*
+		 * look if there are updates for this ref queued and lock the
+		 * head
+		 */
+		delayed_refs = &trans->transaction->delayed_refs;
+		spin_lock(&delayed_refs->lock);
+		head = btrfs_find_delayed_ref_head(trans, bytenr);
+		if (head) {
+			if (!mutex_trylock(&head->mutex)) {
+				atomic_inc(&head->node.refs);
+				spin_unlock(&delayed_refs->lock);
+
+				btrfs_release_path(path);
+
+				/*
+				 * Mutex was contended, block until it's
+				 * released and try again
[PATCH 0/8] Restriper fixes
Hi Chris,

The main one here is the improvement to btrfs_can_relocate(), which is now a
tiny bit smarter and does not return ENOSPC when there's plenty of
unallocated space for target chunks. This, in addition to my patch which
disables silent profile upgrades, should reduce the number of corner cases in
profile changing. The rest are a bunch of cleanups and some minor fixes.

Please pull from

	git://github.com/idryomov/btrfs-unstable.git for-chris

top commit 213e64da90d14537cd63f7090d6c4d1fcc75d9f8

Thanks,

		Ilya

Ilya Dryomov (8):
  Btrfs: add wrappers for working with alloc profiles
  Btrfs: make profile_is_valid() check more strict
  Btrfs: move alloc_profile_is_valid() to volumes.c
  Btrfs: add get_restripe_target() helper
  Btrfs: add __get_block_group_index() helper
  Btrfs: improve the logic in btrfs_can_relocate()
  Btrfs: validate target profiles only if we are going to use them
  Btrfs: allow dup for data chunks in mixed mode

 fs/btrfs/ctree.h       |   33 +--
 fs/btrfs/extent-tree.c |  158 ++--
 fs/btrfs/volumes.c     |   88 ---
 3 files changed, 152 insertions(+), 127 deletions(-)

-- 
1.7.9.1
[PATCH 3/8] Btrfs: move alloc_profile_is_valid() to volumes.c
Header file is not a good place to define functions. This also moves a
call to alloc_profile_is_valid() down the stack and removes a redundant
check from __btrfs_alloc_chunk() - alloc_profile_is_valid() takes it
into account.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 fs/btrfs/ctree.h       |   23 ---
 fs/btrfs/extent-tree.c |    2 --
 fs/btrfs/volumes.c     |   30 +-
 3 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f057e92..a56e1e0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2734,29 +2734,6 @@ static inline void free_fs_info(struct btrfs_fs_info *fs_info)
 	kfree(fs_info->super_for_commit);
 	kfree(fs_info);
 }
-/**
- * alloc_profile_is_valid - see if a given profile is valid and reduced
- * @flags: profile to validate
- * @extended: if true @flags is treated as an extended profile
- */
-static inline int alloc_profile_is_valid(u64 flags, int extended)
-{
-	u64 mask = (extended ? BTRFS_EXTENDED_PROFILE_MASK :
-			       BTRFS_BLOCK_GROUP_PROFILE_MASK);
-
-	flags &= ~BTRFS_BLOCK_GROUP_TYPE_MASK;
-
-	/* 1) check that all other bits are zeroed */
-	if (flags & ~mask)
-		return 0;
-
-	/* 2) see if profile is reduced */
-	if (flags == 0)
-		return !extended; /* 0 is valid for usual profiles */
-
-	/* true if exactly one bit set */
-	return (flags & (flags - 1)) == 0;
-}
 
 /* root-item.c */
 int btrfs_find_root_ref(struct btrfs_root *tree_root,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8c5bd8f..304710c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3400,8 +3400,6 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	int wait_for_alloc = 0;
 	int ret = 0;
 
-	BUG_ON(!alloc_profile_is_valid(flags, 0));
-
 	space_info = __find_space_info(extent_root->fs_info, flags);
 	if (!space_info) {
 		ret = update_space_info(extent_root->fs_info, flags,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e4ef0f2..def9e25 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2598,6 +2598,30 @@ error:
 	return ret;
 }
 
+/**
+ * alloc_profile_is_valid - see if a given profile is valid and reduced
+ * @flags: profile to validate
+ * @extended: if true @flags is treated as an extended profile
+ */
+static int alloc_profile_is_valid(u64 flags, int extended)
+{
+	u64 mask = (extended ? BTRFS_EXTENDED_PROFILE_MASK :
+			       BTRFS_BLOCK_GROUP_PROFILE_MASK);
+
+	flags &= ~BTRFS_BLOCK_GROUP_TYPE_MASK;
+
+	/* 1) check that all other bits are zeroed */
+	if (flags & ~mask)
+		return 0;
+
+	/* 2) see if profile is reduced */
+	if (flags == 0)
+		return !extended; /* 0 is valid for usual profiles */
+
+	/* true if exactly one bit set */
+	return (flags & (flags - 1)) == 0;
+}
+
 static inline int balance_need_close(struct btrfs_fs_info *fs_info)
 {
 	/* cancel requested || normal exit path */
@@ -3124,11 +3148,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int i;
 	int j;
 
-	if ((type & BTRFS_BLOCK_GROUP_RAID1) &&
-	    (type & BTRFS_BLOCK_GROUP_DUP)) {
-		WARN_ON(1);
-		type &= ~BTRFS_BLOCK_GROUP_DUP;
-	}
+	BUG_ON(!alloc_profile_is_valid(type, 0));
 
 	if (list_empty(&fs_devices->alloc_list))
 		return -ENOSPC;
-- 
1.7.9.1
[PATCH 1/8] Btrfs: add wrappers for working with alloc profiles
Add functions to abstract the conversion between chunk and extended
allocation profile formats and switch everybody to use them.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 fs/btrfs/ctree.h       |   15 +++
 fs/btrfs/extent-tree.c |   25 +++--
 fs/btrfs/volumes.c     |   20 
 3 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c2e17cd..aba7832 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -849,6 +849,21 @@ struct btrfs_csum_item {
  */
 #define BTRFS_AVAIL_ALLOC_BIT_SINGLE	(1ULL << 48)
 
+#define BTRFS_EXTENDED_PROFILE_MASK	(BTRFS_BLOCK_GROUP_PROFILE_MASK | \
+					 BTRFS_AVAIL_ALLOC_BIT_SINGLE)
+
+static inline u64 chunk_to_extended(u64 flags)
+{
+	if ((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0)
+		flags |= BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+
+	return flags;
+}
+static inline u64 extended_to_chunk(u64 flags)
+{
+	return flags & ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+}
+
 struct btrfs_block_group_item {
 	__le64 used;
 	__le64 chunk_objectid;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4269777..9f16fdb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3098,11 +3098,8 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 
 static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
-	u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
-
-	/* chunk -> extended profile */
-	if (extra_flags == 0)
-		extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+	u64 extra_flags = chunk_to_extended(flags) &
+				BTRFS_EXTENDED_PROFILE_MASK;
 
 	if (flags & BTRFS_BLOCK_GROUP_DATA)
 		fs_info->avail_data_alloc_bits |= extra_flags;
@@ -3181,9 +3178,7 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 	}
 
 out:
-	/* extended -> chunk profile */
-	flags &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
-	return flags;
+	return extended_to_chunk(flags);
 }
 
 static u64 get_alloc_profile(struct btrfs_root *root, u64 flags)
@@ -6914,11 +6909,8 @@ static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
 			tgt = BTRFS_BLOCK_GROUP_METADATA | bctl->meta.target;
 		}
 
-		if (tgt) {
-			/* extended -> chunk profile */
-			tgt &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
-			return tgt;
-		}
+		if (tgt)
+			return extended_to_chunk(tgt);
 	}
 
 	/*
@@ -7597,11 +7589,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 
 static void clear_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
-	u64 extra_flags = flags & BTRFS_BLOCK_GROUP_PROFILE_MASK;
-
-	/* chunk -> extended profile */
-	if (extra_flags == 0)
-		extra_flags = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+	u64 extra_flags = chunk_to_extended(flags) &
+				BTRFS_EXTENDED_PROFILE_MASK;
 
 	if (flags & BTRFS_BLOCK_GROUP_DATA)
 		fs_info->avail_data_alloc_bits &= ~extra_flags;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 58aad63e..4b263a2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2250,15 +2250,13 @@ static void unset_balance_control(struct btrfs_fs_info *fs_info)
  * Balance filters.  Return 1 if chunk should be filtered out
  * (should not be balanced).
  */
-static int chunk_profiles_filter(u64 chunk_profile,
+static int chunk_profiles_filter(u64 chunk_type,
 				 struct btrfs_balance_args *bargs)
 {
-	chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
+	chunk_type = chunk_to_extended(chunk_type) &
+				BTRFS_EXTENDED_PROFILE_MASK;
 
-	if (chunk_profile == 0)
-		chunk_profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
-
-	if (bargs->profiles & chunk_profile)
+	if (bargs->profiles & chunk_type)
 		return 0;
 
 	return 1;
@@ -2365,18 +2363,16 @@ static int chunk_vrange_filter(struct extent_buffer *leaf,
 	return 1;
 }
 
-static int chunk_soft_convert_filter(u64 chunk_profile,
+static int chunk_soft_convert_filter(u64 chunk_type,
 				     struct btrfs_balance_args *bargs)
 {
 	if (!(bargs->flags & BTRFS_BALANCE_ARGS_CONVERT))
 		return 0;
 
-	chunk_profile &= BTRFS_BLOCK_GROUP_PROFILE_MASK;
-
-	if (chunk_profile == 0)
-		chunk_profile = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
+	chunk_type = chunk_to_extended(chunk_type) &
+				BTRFS_EXTENDED_PROFILE_MASK;
 
-	if (bargs->target & chunk_profile)
+	if (bargs->target == chunk_type)
 		return 1;
 
 	return 0;
-- 
1.7.9.1
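
For readers who want to try the chunk/extended conversion outside the
kernel, here is a minimal userspace sketch of the two wrappers. The SK_*
names and the bit values are assumptions made for this illustration only
(they are not taken from the patch); the logic follows chunk_to_extended()
and extended_to_chunk() as posted.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout; names and values are assumptions of this sketch. */
#define SK_RAID0	(1ULL << 3)
#define SK_RAID1	(1ULL << 4)
#define SK_DUP		(1ULL << 5)
#define SK_RAID10	(1ULL << 6)
#define SK_PROFILE_MASK	(SK_RAID0 | SK_RAID1 | SK_DUP | SK_RAID10)
#define SK_AVAIL_SINGLE	(1ULL << 48)

/* chunk -> extended: "no profile bit" becomes an explicit single bit */
static uint64_t sk_chunk_to_extended(uint64_t flags)
{
	if ((flags & SK_PROFILE_MASK) == 0)
		flags |= SK_AVAIL_SINGLE;

	return flags;
}

/* extended -> chunk: drop the single bit, which never goes to disk */
static uint64_t sk_extended_to_chunk(uint64_t flags)
{
	return flags & ~SK_AVAIL_SINGLE;
}
```

The point of the pair is that a chunk-format value survives a round trip:
converting to extended and back gives the original flags, whether or not a
profile bit was set.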
[PATCH 4/8] Btrfs: add get_restripe_target() helper
Add get_restripe_target() helper and switch everybody to use it.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 fs/btrfs/extent-tree.c |   94 +--
 1 files changed, 50 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 304710c..faf52e0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3110,6 +3110,35 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 }
 
 /*
+ * returns target flags in extended format or 0 if restripe for this
+ * chunk_type is not in progress
+ */
+static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
+{
+	struct btrfs_balance_control *bctl = fs_info->balance_ctl;
+	u64 target = 0;
+
+	BUG_ON(!mutex_is_locked(&fs_info->volume_mutex) &&
+	       !spin_is_locked(&fs_info->balance_lock));
+
+	if (!bctl)
+		return 0;
+
+	if (flags & BTRFS_BLOCK_GROUP_DATA &&
+	    bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) {
+		target = BTRFS_BLOCK_GROUP_DATA | bctl->data.target;
+	} else if (flags & BTRFS_BLOCK_GROUP_SYSTEM &&
+		   bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
+		target = BTRFS_BLOCK_GROUP_SYSTEM | bctl->sys.target;
+	} else if (flags & BTRFS_BLOCK_GROUP_METADATA &&
+		   bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) {
+		target = BTRFS_BLOCK_GROUP_METADATA | bctl->meta.target;
+	}
+
+	return target;
+}
+
+/*
  * @flags: available profiles in extended format (see ctree.h)
  *
  * Returns reduced profile in chunk format.  If profile changing is in
@@ -3125,31 +3154,19 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 	 */
 	u64 num_devices = root->fs_info->fs_devices->rw_devices +
 		root->fs_info->fs_devices->missing_devices;
+	u64 target;
 
-	/* pick restriper's target profile if it's available */
+	/*
+	 * see if restripe for this chunk_type is in progress, if so
+	 * try to reduce to the target profile
+	 */
 	spin_lock(&root->fs_info->balance_lock);
-	if (root->fs_info->balance_ctl) {
-		struct btrfs_balance_control *bctl = root->fs_info->balance_ctl;
-		u64 tgt = 0;
-
-		if ((flags & BTRFS_BLOCK_GROUP_DATA) &&
-		    (bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
-		    (flags & bctl->data.target)) {
-			tgt = BTRFS_BLOCK_GROUP_DATA | bctl->data.target;
-		} else if ((flags & BTRFS_BLOCK_GROUP_SYSTEM) &&
-			   (bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
-			   (flags & bctl->sys.target)) {
-			tgt = BTRFS_BLOCK_GROUP_SYSTEM | bctl->sys.target;
-		} else if ((flags & BTRFS_BLOCK_GROUP_METADATA) &&
-			   (bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
-			   (flags & bctl->meta.target)) {
-			tgt = BTRFS_BLOCK_GROUP_METADATA | bctl->meta.target;
-		}
-
-		if (tgt) {
+	target = get_restripe_target(root->fs_info, flags);
+	if (target) {
+		/* pick target profile only if it's already available */
+		if ((flags & target) & BTRFS_EXTENDED_PROFILE_MASK) {
 			spin_unlock(&root->fs_info->balance_lock);
-			flags = tgt;
-			goto out;
+			return extended_to_chunk(target);
 		}
 	}
 	spin_unlock(&root->fs_info->balance_lock);
@@ -3177,7 +3194,6 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags)
 		flags &= ~BTRFS_BLOCK_GROUP_RAID0;
 	}
 
-out:
 	return extended_to_chunk(flags);
 }
 
@@ -6888,28 +6904,15 @@ int btrfs_drop_subtree(struct btrfs_trans_handle *trans,
 static u64 update_block_group_flags(struct btrfs_root *root, u64 flags)
 {
 	u64 num_devices;
-	u64 stripped = BTRFS_BLOCK_GROUP_RAID0 |
-		BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_RAID10;
-
-	if (root->fs_info->balance_ctl) {
-		struct btrfs_balance_control *bctl = root->fs_info->balance_ctl;
-		u64 tgt = 0;
-
-		/* pick restriper's target profile and return */
-		if (flags & BTRFS_BLOCK_GROUP_DATA &&
-		    bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-			tgt = BTRFS_BLOCK_GROUP_DATA | bctl->data.target;
-		} else if (flags & BTRFS_BLOCK_GROUP_SYSTEM &&
-			   bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-			tgt = BTRFS_BLOCK_GROUP_SYSTEM | bctl->sys.target;
-		} else if (flags & BTRFS_BLOCK_GROUP_METADATA &&
-			   bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) {
-			tgt = BTRFS_BLOCK_GROUP_METADATA | bctl->meta.target;
-		}
+	u64
[PATCH 5/8] Btrfs: add __get_block_group_index() helper
Add __get_block_group_index() helper to be able to derive block group
index from an arbitrary set of flags. Implement get_block_group_index()
in terms of it.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 fs/btrfs/extent-tree.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index faf52e0..c44aa96 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5248,22 +5248,29 @@ wait_block_group_cache_done(struct btrfs_block_group_cache *cache)
 	return 0;
 }
 
-static int get_block_group_index(struct btrfs_block_group_cache *cache)
+static int __get_block_group_index(u64 flags)
 {
 	int index;
-	if (cache->flags & BTRFS_BLOCK_GROUP_RAID10)
+
+	if (flags & BTRFS_BLOCK_GROUP_RAID10)
 		index = 0;
-	else if (cache->flags & BTRFS_BLOCK_GROUP_RAID1)
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1)
 		index = 1;
-	else if (cache->flags & BTRFS_BLOCK_GROUP_DUP)
+	else if (flags & BTRFS_BLOCK_GROUP_DUP)
 		index = 2;
-	else if (cache->flags & BTRFS_BLOCK_GROUP_RAID0)
+	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
 		index = 3;
 	else
 		index = 4;
+
 	return index;
 }
 
+static int get_block_group_index(struct btrfs_block_group_cache *cache)
+{
+	return __get_block_group_index(cache->flags);
+}
+
 enum btrfs_loop_type {
 	LOOP_CACHING_NOWAIT = 0,
 	LOOP_CACHING_WAIT = 1,
-- 
1.7.9.1
[PATCH 2/8] Btrfs: make profile_is_valid() check more strict
0 is a valid value for an on-disk chunk profile, but it is not a valid
extended profile. (We have a separate bit for single chunks in extended
case)

Also rename it to alloc_profile_is_valid() for clarity.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 fs/btrfs/ctree.h       |   21 +
 fs/btrfs/extent-tree.c |    2 +-
 fs/btrfs/volumes.c     |    6 +++---
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aba7832..f057e92 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2735,22 +2735,27 @@ static inline void free_fs_info(struct btrfs_fs_info *fs_info)
 	kfree(fs_info);
 }
 /**
- * profile_is_valid - tests whether a given profile is valid and reduced
+ * alloc_profile_is_valid - see if a given profile is valid and reduced
  * @flags: profile to validate
  * @extended: if true @flags is treated as an extended profile
  */
-static inline int profile_is_valid(u64 flags, int extended)
+static inline int alloc_profile_is_valid(u64 flags, int extended)
 {
-	u64 mask = ~BTRFS_BLOCK_GROUP_PROFILE_MASK;
+	u64 mask = (extended ? BTRFS_EXTENDED_PROFILE_MASK :
+			       BTRFS_BLOCK_GROUP_PROFILE_MASK);
 
 	flags &= ~BTRFS_BLOCK_GROUP_TYPE_MASK;
-	if (extended)
-		mask &= ~BTRFS_AVAIL_ALLOC_BIT_SINGLE;
 
-	if (flags & mask)
+	/* 1) check that all other bits are zeroed */
+	if (flags & ~mask)
 		return 0;
-	/* true if zero or exactly one bit set */
-	return (flags & (~flags + 1)) == flags;
+
+	/* 2) see if profile is reduced */
+	if (flags == 0)
+		return !extended; /* 0 is valid for usual profiles */
+
+	/* true if exactly one bit set */
+	return (flags & (flags - 1)) == 0;
 }
 
 /* root-item.c */
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9f16fdb..8c5bd8f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3400,7 +3400,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	int wait_for_alloc = 0;
 	int ret = 0;
 
-	BUG_ON(!profile_is_valid(flags, 0));
+	BUG_ON(!alloc_profile_is_valid(flags, 0));
 
 	space_info = __find_space_info(extent_root->fs_info, flags);
 	if (!space_info) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4b263a2..e4ef0f2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2669,7 +2669,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
 				BTRFS_BLOCK_GROUP_RAID10);
 
-	if (!profile_is_valid(bctl->data.target, 1) ||
+	if (!alloc_profile_is_valid(bctl->data.target, 1) ||
 	    bctl->data.target & ~allowed) {
 		printk(KERN_ERR "btrfs: unable to start balance with target "
 		       "data profile %llu\n",
@@ -2677,7 +2677,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		ret = -EINVAL;
 		goto out;
 	}
-	if (!profile_is_valid(bctl->meta.target, 1) ||
+	if (!alloc_profile_is_valid(bctl->meta.target, 1) ||
 	    bctl->meta.target & ~allowed) {
 		printk(KERN_ERR "btrfs: unable to start balance with target "
 		       "metadata profile %llu\n",
@@ -2685,7 +2685,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		ret = -EINVAL;
 		goto out;
 	}
-	if (!profile_is_valid(bctl->sys.target, 1) ||
+	if (!alloc_profile_is_valid(bctl->sys.target, 1) ||
 	    bctl->sys.target & ~allowed) {
 		printk(KERN_ERR "btrfs: unable to start balance with target "
 		       "system profile %llu\n",
-- 
1.7.9.1
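
To make the stricter rule easier to poke at, here is a small userspace
sketch of the check described in this patch: strip the type bits, reject
anything outside the allowed mask, treat 0 as valid only for non-extended
profiles, and otherwise require exactly one bit. The SK_* constants and
their values are assumptions of this sketch and stand in for the real
BTRFS_* masks.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout; names and values are assumptions of this sketch. */
#define SK_TYPE_MASK	0x7ULL		/* data | system | metadata */
#define SK_RAID0	(1ULL << 3)
#define SK_RAID1	(1ULL << 4)
#define SK_DUP		(1ULL << 5)
#define SK_RAID10	(1ULL << 6)
#define SK_PROFILE_MASK	(SK_RAID0 | SK_RAID1 | SK_DUP | SK_RAID10)
#define SK_AVAIL_SINGLE	(1ULL << 48)
#define SK_EXTENDED_MASK (SK_PROFILE_MASK | SK_AVAIL_SINGLE)

static int sk_profile_is_valid(uint64_t flags, int extended)
{
	uint64_t mask = extended ? SK_EXTENDED_MASK : SK_PROFILE_MASK;

	flags &= ~SK_TYPE_MASK;

	/* 1) no bits outside the mask for this format */
	if (flags & ~mask)
		return 0;

	/* 2) zero means "single", which only the chunk format allows */
	if (flags == 0)
		return !extended;

	/* 3) reduced profile: exactly one bit set */
	return (flags & (flags - 1)) == 0;
}
```

Under these assumptions, 0 passes as a chunk profile but fails as an
extended one, and the single bit passes only in the extended format, which
is the asymmetry the commit message describes.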
[PATCH 6/8] Btrfs: improve the logic in btrfs_can_relocate()
Currently if we don't have enough space allocated we go ahead and loop
through devices in the hopes of finding enough space for a chunk of the
*same* type as the one we are trying to relocate. The problem with that
is that if we are trying to restripe the chunk its target type can be
more relaxed than the current one (e.g. require fewer devices or less
space). So, when restriping, run checks against the target profile
instead of the current one.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 fs/btrfs/extent-tree.c |   24 ++--
 1 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c44aa96..9454045 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7136,6 +7136,7 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	u64 min_free;
 	u64 dev_min = 1;
 	u64 dev_nr = 0;
+	u64 target;
 	int index;
 	int full = 0;
 	int ret = 0;
@@ -7176,13 +7177,11 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	/*
 	 * ok we don't have enough space, but maybe we have free space on our
 	 * devices to allocate new chunks for relocation, so loop through our
-	 * alloc devices and guess if we have enough space.  However, if we
-	 * were marked as full, then we know there aren't enough chunks, and we
-	 * can just return.
+	 * alloc devices and guess if we have enough space.  if this block
+	 * group is going to be restriped, run checks against the target
+	 * profile instead of the current one.
 	 */
 	ret = -1;
-	if (full)
-		goto out;
 
 	/*
 	 * index:
@@ -7192,7 +7191,20 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
 	 *	3: raid0
 	 *	4: single
 	 */
-	index = get_block_group_index(block_group);
+	target = get_restripe_target(root->fs_info, block_group->flags);
+	if (target) {
+		index = __get_block_group_index(extended_to_chunk(target));
+	} else {
+		/*
+		 * this is just a balance, so if we were marked as full
+		 * we know there is no space for a new chunk
+		 */
+		if (full)
+			goto out;
+
+		index = get_block_group_index(block_group);
+	}
+
 	if (index == 0) {
 		dev_min = 4;
 		/* Divide by 2 */
-- 
1.7.9.1
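
The decision this patch changes can be sketched in a few lines of
standalone C: when a restripe target exists, the space checks are indexed by
the target profile even if the block group was marked full; only a plain
balance bails out early on "full". Everything named sk_* below, the bit
values, and the -1 "skip" return are assumptions of this sketch, not kernel
API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout; names and values are assumptions of this sketch. */
#define SK_RAID0	(1ULL << 3)
#define SK_RAID1	(1ULL << 4)
#define SK_DUP		(1ULL << 5)
#define SK_RAID10	(1ULL << 6)
#define SK_AVAIL_SINGLE	(1ULL << 48)

/* Same ordering as the index table in the patch: 0 raid10 ... 4 single */
static int sk_profile_index(uint64_t flags)
{
	if (flags & SK_RAID10)
		return 0;
	if (flags & SK_RAID1)
		return 1;
	if (flags & SK_DUP)
		return 2;
	if (flags & SK_RAID0)
		return 3;
	return 4;
}

/*
 * Returns the index to run the device checks against, or -1 if the
 * checks can be skipped because no new chunk can be allocated.
 * target == 0 means no restripe is in progress for this chunk type.
 */
static int sk_relocation_check_index(uint64_t current, uint64_t target,
				     int full)
{
	if (target)
		return sk_profile_index(target & ~SK_AVAIL_SINGLE);

	/* plain balance: "full" means there is no room for a same-type chunk */
	if (full)
		return -1;

	return sk_profile_index(current);
}
```

For example, a full raid10 block group that is being converted to single is
checked against index 4 rather than rejected outright, which is the corner
case the commit message is about.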
[PATCH 7/8] Btrfs: validate target profiles only if we are going to use them
Do not run sanity checks on all target profiles unless they all will be
used. This came up because alloc_profile_is_valid() is now more strict
than it used to be.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 fs/btrfs/volumes.c |   27 +++
 1 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index def9e25..28addea 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2676,14 +2676,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		}
 	}
 
-	/*
-	 * Profile changing sanity checks.  Skip them if a simple
-	 * balance is requested.
-	 */
-	if (!((bctl->data.flags | bctl->sys.flags | bctl->meta.flags) &
-	      BTRFS_BALANCE_ARGS_CONVERT))
-		goto do_balance;
-
 	allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE;
 	if (fs_info->fs_devices->num_devices == 1)
 		allowed |= BTRFS_BLOCK_GROUP_DUP;
@@ -2693,24 +2685,27 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
 				BTRFS_BLOCK_GROUP_RAID10);
 
-	if (!alloc_profile_is_valid(bctl->data.target, 1) ||
-	    bctl->data.target & ~allowed) {
+	if ((bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+	    (!alloc_profile_is_valid(bctl->data.target, 1) ||
+	     (bctl->data.target & ~allowed))) {
 		printk(KERN_ERR "btrfs: unable to start balance with target "
 		       "data profile %llu\n",
 		       (unsigned long long)bctl->data.target);
 		ret = -EINVAL;
 		goto out;
 	}
-	if (!alloc_profile_is_valid(bctl->meta.target, 1) ||
-	    bctl->meta.target & ~allowed) {
+	if ((bctl->meta.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+	    (!alloc_profile_is_valid(bctl->meta.target, 1) ||
+	     (bctl->meta.target & ~allowed))) {
 		printk(KERN_ERR "btrfs: unable to start balance with target "
 		       "metadata profile %llu\n",
 		       (unsigned long long)bctl->meta.target);
 		ret = -EINVAL;
 		goto out;
 	}
-	if (!alloc_profile_is_valid(bctl->sys.target, 1) ||
-	    bctl->sys.target & ~allowed) {
+	if ((bctl->sys.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+	    (!alloc_profile_is_valid(bctl->sys.target, 1) ||
+	     (bctl->sys.target & ~allowed))) {
 		printk(KERN_ERR "btrfs: unable to start balance with target "
 		       "system profile %llu\n",
 		       (unsigned long long)bctl->sys.target);
@@ -2718,7 +2713,8 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		goto out;
 	}
 
-	if (bctl->data.target & BTRFS_BLOCK_GROUP_DUP) {
+	if ((bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+	    (bctl->data.target & BTRFS_BLOCK_GROUP_DUP)) {
 		printk(KERN_ERR "btrfs: dup for data is not allowed\n");
 		ret = -EINVAL;
 		goto out;
@@ -2744,7 +2740,6 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		}
 	}
 
-do_balance:
 	ret = insert_balance_item(fs_info->tree_root, bctl);
 	if (ret && ret != -EEXIST)
 		goto out;
-- 
1.7.9.1
[PATCH 8/8] Btrfs: allow dup for data chunks in mixed mode
Generally we don't allow dup for data, but mixed chunks are special and
people seem to think this has its use cases.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 fs/btrfs/volumes.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 28addea..bcc0acd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2650,6 +2650,7 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 {
 	struct btrfs_fs_info *fs_info = bctl->fs_info;
 	u64 allowed;
+	int mixed = 0;
 	int ret;
 
 	if (btrfs_fs_closing(fs_info) ||
@@ -2659,13 +2660,16 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		goto out;
 	}
 
+	allowed = btrfs_super_incompat_flags(fs_info->super_copy);
+	if (allowed & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS)
+		mixed = 1;
+
 	/*
 	 * In case of mixed groups both data and meta should be picked,
 	 * and identical options should be given for both of them.
 	 */
-	allowed = btrfs_super_incompat_flags(fs_info->super_copy);
-	if ((allowed & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
-	    (bctl->flags & (BTRFS_BALANCE_DATA | BTRFS_BALANCE_METADATA))) {
+	allowed = BTRFS_BALANCE_DATA | BTRFS_BALANCE_METADATA;
+	if (mixed && (bctl->flags & allowed)) {
 		if (!(bctl->flags & BTRFS_BALANCE_DATA) ||
 		    !(bctl->flags & BTRFS_BALANCE_METADATA) ||
 		    memcmp(&bctl->data, &bctl->meta, sizeof(bctl->data))) {
@@ -2713,7 +2717,8 @@ int btrfs_balance(struct btrfs_balance_control *bctl,
 		goto out;
 	}
 
-	if ((bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
+	/* allow dup'ed data chunks only in mixed mode */
+	if (!mixed && (bctl->data.flags & BTRFS_BALANCE_ARGS_CONVERT) &&
 	    (bctl->data.target & BTRFS_BLOCK_GROUP_DUP)) {
 		printk(KERN_ERR "btrfs: dup for data is not allowed\n");
 		ret = -EINVAL;
-- 
1.7.9.1
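
As a quick illustration of the resulting rule, the predicate below sketches
when a data target is rejected: only when the filesystem is not in mixed
mode, a conversion was actually requested for data, and the target is dup.
The function name, its parameters and the SK_* bit values are assumptions of
this sketch; in the patch itself the condition lives inline in
btrfs_balance().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values; names and values are assumptions of this sketch. */
#define SK_ARGS_CONVERT	(1ULL << 8)
#define SK_DUP		(1ULL << 5)

/* Returns 1 if the requested data target must be refused, 0 otherwise. */
static int sk_data_dup_rejected(int mixed, uint64_t data_flags,
				uint64_t data_target)
{
	return !mixed &&
	       (data_flags & SK_ARGS_CONVERT) &&
	       (data_target & SK_DUP);
}
```

With this shape, a leftover dup value in an unused data target no longer
blocks a plain balance, and a mixed-mode filesystem may convert to dup.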
[PATCH] Btrfs: fix infinite loop in btrfs_shrink_device()
If relocation of block group 0 fails with ENOSPC we end up looping
infinitely, because the key.offset -= 1 statement in that case brings us
back to where we started.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 fs/btrfs/volumes.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bcc0acd..be2d4e0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2987,7 +2987,7 @@ again:
 	key.offset = (u64)-1;
 	key.type = BTRFS_DEV_EXTENT_KEY;
 
-	while (1) {
+	do {
 		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
 		if (ret < 0)
 			goto done;
@@ -3029,8 +3029,7 @@ again:
 			goto done;
 		if (ret == -ENOSPC)
 			failed++;
-		key.offset -= 1;
-	}
+	} while (key.offset-- > 0);
 
 	if (failed && !retried) {
 		failed = 0;
-- 
1.7.9.1
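
The wrap-around behind this bug is easy to reproduce in userspace. The
sketch below is only an illustration (the function names are made up and the
loop body is a stand-in for the search and relocate steps): decrementing an
unsigned 64-bit offset of 0 yields the same (u64)-1 the loop started from,
while testing the offset before the post-decrement lets the loop stop after
handling offset 0.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: the step taken at the bottom of the while (1) loop. */
static uint64_t sk_old_next_offset(uint64_t offset)
{
	return offset - 1;	/* wraps to UINT64_MAX when offset == 0 */
}

/*
 * New behaviour: do { ... } while (offset-- > 0).
 * Returns how many times the body runs when starting at `start`.
 */
static unsigned int sk_walk_down(uint64_t start)
{
	uint64_t offset = start;
	unsigned int iterations = 0;

	do {
		iterations++;	/* stand-in for search + relocate */
	} while (offset-- > 0);

	return iterations;
}
```

Starting at 0 the body runs exactly once and the loop exits, whereas the old
step would have handed back the initial search key.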
Re: btrfs: open_ctree failed
I had found that note on the restore, but my restore.c does not allow that
flag (it is also missing the m flag). I used the branch dangerousdonteveruse
on https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git

I switched to the master branch to see if there was a difference, but it does
not appear to be any different. (I did find a btrfs-progs on GitHub which
appears to have those flags, but I thought the best one to use would be the
one on git.kernel.)

Assuming I can locate the correct restore.c, is there some other software to
determine the object id of the subvolume? The root object id was 5.

thanks
Nz

On Tue, Mar 27, 2012 at 6:02 AM, Hugo Mills h...@carfax.org.uk wrote:
> On Tue, Mar 27, 2012 at 05:58:17AM -0700, Not Zippy wrote:
>> One entire subvolume was restored. But there were 4 subvolumes on that
>> partition. Is there a way to specify/force the restore of a different
>> subvolume? find-root seems to only find a single root.
>
> There is only a single root tree, so that's understandable. If you
> have a look at the documentation for restore[1], it mentions (right
> near the bottom of the page) that -r will allow you to select an
> alternative subvolume to recover from.
Re: btrfs csum failed, scrub ok
On Tue, Mar 27, 2012 at 4:57 AM, Christoph Groth c...@falma.de wrote:
I have a freshly installed system with btrfs as the root file system. The machine is running linux 3.2. The raid1 btrfs file system lives on two new hard drives. About one day after installation the following message appeared in kern.log. There were no other errors.

root@mim:/var/log# grep 'btrfs.*fail' kern.log
Mar 27 01:07:46 mim kernel: [ 6480.233861] btrfs csum failed ino 453509 off 1495040 csum 3301532933 private 4156998194
Mar 27 01:07:46 mim kernel: [ 6480.234470] btrfs csum failed ino 453509 off 1499136 csum 1873118812 private 3512102188
Mar 27 01:07:46 mim kernel: [ 6480.234572] btrfs csum failed ino 453509 off 1503232 csum 1034640717 private 2041007647
Mar 27 01:07:46 mim kernel: [ 6480.234670] btrfs csum failed ino 453509 off 1507328 csum 889729013 private 2342095239
Mar 27 01:07:46 mim kernel: [ 6480.237977] btrfs csum failed ino 453509 off 1503232 csum 1518679450 private 2041007647
Mar 27 01:07:46 mim kernel: [ 6480.238149] btrfs csum failed ino 453509 off 1507328 csum 889729013 private 2342095239
Mar 27 01:07:46 mim kernel: [ 6480.238330] btrfs csum failed ino 453509 off 1495040 csum 3234580989 private 4156998194
Mar 27 01:07:46 mim kernel: [ 6480.238447] btrfs csum failed ino 453509 off 1499136 csum 1873118812 private 3512102188
Mar 27 01:07:46 mim kernel: [ 6480.243873] btrfs csum failed ino 453509 off 1503232 csum 2184012753 private 2041007647
Mar 27 01:07:46 mim kernel: [ 6480.243962] btrfs csum failed ino 453509 off 1507328 csum 240604621 private 2342095239

inode 453509 belongs to a file installed by dpkg

root@mim:/# find / -inum 453509 -ls
453509 1976 -rw-r--r-- 1 root root 2020832 Mar 7 21:11 /usr/lib/libreoffice/basis3.4/program/libsblx.so

That file seems to be ok, there are no errors when re-reading it.

A scrub done the morning after the incident also didn't find any problems:

root@mim:/home/cwg# btrfs scrub status /
scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
scrub started at Tue Mar 27 10:37:49 2012 and finished after 3921 seconds
total bytes scrubbed: 550.20GB with 0 errors

If btrfs is able to find a good copy, it will fix the bad copy automatically.
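The csum failures in the log above repeat the same few offsets, which is easier to see once duplicates are collapsed. A minimal sketch, assuming the log-line layout shown in the message (`btrfs csum failed ino N off N ...`); the function name `csum_offsets` is made up for illustration and is not part of any btrfs tool:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print each distinct
# "inode offset" pair found in "btrfs csum failed" messages.
# Assumes the field layout seen in the thread: ... ino <N> off <N> csum ...
csum_offsets() {
    awk '/btrfs csum failed/ {
        # Walk the fields and pick the values following "ino" and "off".
        for (i = 1; i < NF; i++) {
            if ($i == "ino") ino = $(i + 1)
            if ($i == "off") off = $(i + 1)
        }
        print ino, off
    }' | sort -u -k1,1n -k2,2n
}
```

Run as `csum_offsets < kern.log`. For the ten lines quoted above this leaves four offsets of inode 453509 (1495040, 1499136, 1503232, 1507328), each 4096 bytes apart.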
Re: btrfs csum failed, scrub ok
On 27.03.2012 18:24, cwillu wrote:
On Tue, Mar 27, 2012 at 4:57 AM, Christoph Groth c...@falma.de wrote:
A scrub done the morning after the incident also didn't find any problems:

root@mim:/home/cwg# btrfs scrub status /
scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
scrub started at Tue Mar 27 10:37:49 2012 and finished after 3921 seconds
total bytes scrubbed: 550.20GB with 0 errors

If btrfs is able to find a good copy, it will fix the bad copy automatically.

It does mention this in your logs, though. Grep for "repair"; if it doesn't occur, btrfs didn't repair any failures. Scrub would normally find and count checksum errors, though.

-Jan
Re: btrfs csum failed, scrub ok
Jan Schmidt list.bt...@jan-o-sch.net writes:
On 27.03.2012 18:24, cwillu wrote:
On Tue, Mar 27, 2012 at 4:57 AM, Christoph Groth c...@falma.de wrote:
A scrub done the morning after the incident also didn't find any problems:

root@mim:/home/cwg# btrfs scrub status /
scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
scrub started at Tue Mar 27 10:37:49 2012 and finished after 3921 seconds
total bytes scrubbed: 550.20GB with 0 errors

If btrfs is able to find a good copy, it will fix the bad copy automatically.

It does mention this in your logs, though. Grep for "repair"; if it doesn't occur, btrfs didn't repair any failures.

"repair" doesn't occur in the logs. Actually, there are no other entries from btrfs. So why didn't btrfs try to repair a block it believed to be bad?
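The check Jan suggests can be written down as a small shell sketch. The exact wording of the kernel's repair messages is not given in this thread, so the match below is deliberately loose (any line mentioning both btrfs and "repair", case-insensitive); the function name `count_btrfs_repairs` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count lines in a log file that mention both "btrfs"
# and "repair". The message format is an assumption, hence the loose match.
count_btrfs_repairs() {
    # grep -c prints 0 but exits non-zero when nothing matches; "|| true"
    # keeps the function usable under "set -e".
    grep -i 'btrfs' "$1" | grep -ci 'repair' || true
}
```

`count_btrfs_repairs /var/log/kern.log` printing 0 corresponds to the situation Christoph describes: checksum failures were logged, but no repair was.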
Create subvolume from a directory?
Hi all,

Just a quick question, but I can't find an obvious answer. Can I create/convert an existing (btrfs) directory into a subvolume? It would be very helpful when transferring 'partitions' into btrfs. I found a similar question way back in google, but that site is down now generally.

Thanks in advance.
Re: btrfs: open_ctree failed
Thought I would let you know I did get things figured out. I used btrfs-progs from GitHub: https://github.com/josefbacik/btrfs-progs

I also used the findroot function from there, which generated more possibilities for the root objectid. By plugging the guesses from findroot into -r objectid for the restore, I was able to access the data from my subvolumes.

thanks
Nz

On Tue, Mar 27, 2012 at 8:21 AM, Not Zippy notzi...@gmail.com wrote:
I had found that note on the restore, but my restore.c does not allow that flag (it is also missing the m flag). I used the branch dangerousdonteveruse on https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git and then switched to the master branch to see if there was a difference, but it does not appear to be any different. (I did find a btrfs-progs on GitHub which appears to have those flags, but I thought the best one to use would be the one on git.kernel.)

Assuming I can locate the correct restore.c, is there some other software to determine the object id of the subvolume? The root object id was 5.

thanks
Nz

On Tue, Mar 27, 2012 at 6:02 AM, Hugo Mills h...@carfax.org.uk wrote:
On Tue, Mar 27, 2012 at 05:58:17AM -0700, Not Zippy wrote:
One entire subvolume was restored. But there were 4 subvolumes on that partition. Is there a way to specify/force the restore of a different subvolume? find-root seems to only find a single root.

There is only a single root tree, so that's understandable. If you have a look at the documentation for restore[1], it mentions (right near the bottom of the page) that -r will allow you to select an alternative subvolume to recover from.
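Trying several candidate objectids one after another, as described above, can be sketched as a dry run that only prints the commands. Only the `-r objectid` option comes from the thread; the binary name `restore`, the device-then-target argument order, the device `/dev/sdX`, the target directory and the example ids are all assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical dry-run helper: print one restore command per candidate
# objectid, each writing into its own sub-directory. Nothing is executed.
# Usage: print_restore_attempts <device> <outdir> <objectid>...
print_restore_attempts() {
    dev=$1
    outdir=$2
    shift 2
    for id in "$@"; do
        printf 'restore -r %s %s %s/%s\n' "$id" "$dev" "$outdir" "$id"
    done
}
```

For example, `print_restore_attempts /dev/sdX /mnt/recovered 256 257` prints two command lines that can be reviewed and then run by hand against the restore build that actually supports `-r`.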
Re: Create subvolume from a directory?
On Tue, Mar 27, 2012 at 12:19 PM, Alex a...@bpmit.com wrote:
Hi all, Just a quick question but can't find an obvious answer. Can I create/convert a existing (btrfs) directory into a subvolume? It would be very helpful when transferring 'partitions' into btrfs. I found a similar question way back in google, but that site is down now generally. Thanks in advance.

I don't think this is possible. The closest thing I can think of is to take a snapshot of the volume, move the directory to the top of the subvolume, then delete all other content...

That seems like an awful lot of work, but it'll preserve the contents of the directory without making duplicates.
Re: Create subvolume from a directory?
Hello Alex and all,

On 2012-03-27 T 17:19 + Alex wrote:
Just a quick question but can't find an obvious answer. Can I create/convert a existing (btrfs) directory into a subvolume? It would be very helpful when transferring 'partitions' into btrfs. I found a similar question way back in google, but that site is down now generally.

As far as I am aware, this is not possible directly. My approach to this would be using copy with reflinks:

-- snip --
## migrate /var/lib/lxc/installserver
## from directory to btrfs subvolume
# du -ks /var/lib/lxc/installserver
500332 /var/lib/lxc/installserver
# mv /var/lib/lxc/installserver /var/lib/lxc/installserver_tmp
# btrfs subvol create /var/lib/lxc/installserver
Create subvolume '/var/lib/lxc/installserver'
# time cp -a --reflink /var/lib/lxc/installserver_tmp/rootfs /var/lib/lxc/installserver
real 0m1.367s
user 0m0.148s
sys 0m1.108s
## Now remove /var/lib/lxc/installserver_tmp (or not)
-- snap --

Just to compare this with a mv:

-- snip --
## Go back to former state
# btrfs subvol delete /var/lib/lxc/installserver
Delete subvolume '/var/lib/lxc/installserver'
# btrfs subvol create /var/lib/lxc/installserver
Create subvolume '/var/lib/lxc/installserver'
# time mv /var/lib/lxc/installserver_tmp/rootfs /var/lib/lxc/installserver/
real 0m12.917s
user 0m0.208s
sys 0m2.508s
-- snap --

While the time measurement might be flawed due to the subvol actions in between, caching etc.: I tried several times, and cp --reflink is always multiple times faster than mv in my environment.

Or did I misunderstand your question?

so long - MgE

--
Matthias G. Eckermann
Senior Product Manager
SUSE® Linux Enterprise
SUSE LINUX Products GmbH
Maxfeldstraße 5
90409 Nürnberg
Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
Re: btrfs crash after disk reconnect
On Monday 2012-03-26 03:42, Liu Bo wrote:
On 03/23/2012 08:07 PM, Jan Engelhardt wrote:
Observed on Linux 3.2.9 after the controller/disk flaked in-out. (The world still needs a SCSI error decoding tool to tell normal people what cmd and res are about.)

I'm not that sure if your 3.2.9-jng4-default build contains this commit or not:
commit 8bedd51b6121c4607784d75f852828d25d119c52 (Btrfs: Check for NULL page in extent_range_uptodate)

8bedd isn't in 3.2.9; thanks for the hint, I will try that one.
Re: Create subvolume from a directory?
On Wed, Mar 28, 2012 at 5:24 AM, Matthias G. Eckermann m...@suse.com wrote:
While the time measurement might be flawed due to the subvol actions inbetween, caching etc.: I tried several times, and cp --reflinks always is multiple times faster than mv in my environment.

So this is cross-subvolume reflinks? I thought the code for that wasn't merged yet?

--
Fajar
Re: Create subvolume from a directory?
On 03/28/2012 06:24 AM, Matthias G. Eckermann wrote:
Hello Alex and all,

On 2012-03-27 T 17:19 + Alex wrote:
Just a quick question but can't find an obvious answer. Can I create/convert a existing (btrfs) directory into a subvolume? It would be very helpful when transferring 'partitions' into btrfs. I found a similar question way back in google, but that site is down now generally.

As far as I am aware, this is not possible directly. My approach to this would be using copy with reflinks:

-- snip --
## migrate /var/lib/lxc/installserver
## from directory to btrfs subvolume
# du -ks /var/lib/lxc/installserver
500332 /var/lib/lxc/installserver
# mv /var/lib/lxc/installserver /var/lib/lxc/installserver_tmp
# btrfs subvol create /var/lib/lxc/installserver
Create subvolume '/var/lib/lxc/installserver'
# time cp -a --reflink /var/lib/lxc/installserver_tmp/rootfs /var/lib/lxc/installserver

This is too weird. AFAIK, clone between different subvolumes should be forbidden. So this would get an "Invalid cross-device link" error, because an individual subvolume can be mounted directly.

thanks,
liubo

real 0m1.367s
user 0m0.148s
sys 0m1.108s
## Now remove /var/lib/lxc/installserver_tmp (or not)
-- snap --

Just to compare this with a mv:

-- snip --
## Go back to former state
# btrfs subvol delete /var/lib/lxc/installserver
Delete subvolume '/var/lib/lxc/installserver'
# btrfs subvol create /var/lib/lxc/installserver
Create subvolume '/var/lib/lxc/installserver'
# time mv /var/lib/lxc/installserver_tmp/rootfs /var/lib/lxc/installserver/
real 0m12.917s
user 0m0.208s
sys 0m2.508s
-- snap --

While the time measurement might be flawed due to the subvol actions inbetween, caching etc.: I tried several times, and cp --reflinks always is multiple times faster than mv in my environment.

Or did I misunderstand your question?

so long - MgE
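Whether a clone across subvolumes succeeds or is refused depends on the kernel in use, as the replies above point out. One way to sidestep the question is GNU cp's `--reflink=auto`, which clones where the file system allows it and otherwise falls back to an ordinary copy. This is a sketch of that alternative, not what was run in the thread (which used plain `--reflink`); the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: copy a tree in archive mode, cloning extents when
# possible and falling back to a regular copy when cloning is not supported.
# Relies on GNU coreutils cp; source and destination come from the caller.
copy_tree_reflink_auto() {
    cp -a --reflink=auto "$1" "$2"
}
```

With the paths from the example above this would be `copy_tree_reflink_auto /var/lib/lxc/installserver_tmp/rootfs /var/lib/lxc/installserver`. The trade-off is that a silent fallback hides whether extents were actually shared, so the timing comparison with mv would no longer say anything about reflinks.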