Removing bad hdd from btrfs volume
Hi,

I have a btrfs volume that spans multiple disks (no RAID, just single), and earlier this morning I hit some hardware problems with one of the disks. I tried btrfs dev del /dev/sda1 /, but btrfs was unable to migrate the 1 GB that appears to be causing the read errors. See http://sprunge.us/aeZC

Is there some way to figure out which file(s) are affected and, if they are stuff I don't care about, some way to force btrfs to drop the 1 GB it can't copy off the failing HDD?

Thanks,
Peter Foley
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/11] Btrfs: fallocate: Work with sectorsized blocks
While at it, this commit changes btrfs_truncate_page() to truncate sectorsized blocks instead of pages. Hence the function has been renamed to btrfs_truncate_block(). Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com --- fs/btrfs/ctree.h | 2 +- fs/btrfs/file.c | 47 +-- fs/btrfs/inode.c | 52 +++- 3 files changed, 53 insertions(+), 48 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index aac314e..fec5fa9 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3897,7 +3897,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct inode *dir, u64 objectid, const char *name, int name_len); -int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len, +int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, int front); int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, struct btrfs_root *root, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index e3b2b3c..f69e030 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2278,23 +2278,26 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) u64 tail_len; u64 orig_start = offset; u64 cur_offset; + unsigned char blocksize_bits; u64 min_size = btrfs_calc_trunc_metadata_size(root, 1); u64 drop_end; int ret = 0; int err = 0; int rsv_count; - bool same_page; + bool same_block; bool no_holes = btrfs_fs_incompat(root-fs_info, NO_HOLES); u64 ino_size; - bool truncated_page = false; + bool truncated_block = false; bool updated_inode = false; + blocksize_bits = inode-i_blkbits; + ret = btrfs_wait_ordered_range(inode, offset, len); if (ret) return ret; mutex_lock(inode-i_mutex); - ino_size = round_up(inode-i_size, PAGE_CACHE_SIZE); + ino_size = round_up(inode-i_size, root-sectorsize); ret = find_first_non_hole(inode, offset, len); if (ret 0) goto out_only_mutex; @@ -2307,31 +2310,30 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) lockstart = round_up(offset, 
BTRFS_I(inode)-root-sectorsize); lockend = round_down(offset + len, BTRFS_I(inode)-root-sectorsize) - 1; - same_page = ((offset PAGE_CACHE_SHIFT) == - ((offset + len - 1) PAGE_CACHE_SHIFT)); - + same_block = ((offset blocksize_bits) + == ((offset + len - 1) blocksize_bits)); /* -* We needn't truncate any page which is beyond the end of the file +* We needn't truncate any block which is beyond the end of the file * because we are sure there is no data there. */ /* -* Only do this if we are in the same page and we aren't doing the -* entire page. +* Only do this if we are in the same block and we aren't doing the +* entire block. */ - if (same_page len PAGE_CACHE_SIZE) { + if (same_block len root-sectorsize) { if (offset ino_size) { - truncated_page = true; - ret = btrfs_truncate_page(inode, offset, len, 0); + truncated_block = true; + ret = btrfs_truncate_block(inode, offset, len, 0); } else { ret = 0; } goto out_only_mutex; } - /* zero back part of the first page */ + /* zero back part of the first block */ if (offset ino_size) { - truncated_page = true; - ret = btrfs_truncate_page(inode, offset, 0, 0); + truncated_block = true; + ret = btrfs_truncate_block(inode, offset, 0, 0); if (ret) { mutex_unlock(inode-i_mutex); return ret; @@ -2366,9 +2368,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) if (!ret) { /* zero the front end of the last page */ if (tail_start + tail_len ino_size) { - truncated_page = true; - ret = btrfs_truncate_page(inode, - tail_start + tail_len, 0, 1); + truncated_block = true; + ret = btrfs_truncate_block(inode, + tail_start + tail_len, + 0, 1); if (ret) goto
[PATCH 08/11] Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length
In subpagesize-blocksize scenario, map_length can be less than the length of a
bio vector. Such a condition may cause btrfs_submit_direct_hook() to submit a
zero length bio. Fix this by comparing map_length against block size rather
than with bv_len.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index dad76ef..1acee74 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8110,9 +8110,11 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 	u64 file_offset = dip->logical_offset;
 	u64 submit_len = 0;
 	u64 map_length;
-	int nr_pages = 0;
-	int ret;
+	u32 blocksize = root->sectorsize;
 	int async_submit = 0;
+	int nr_sectors;
+	int ret;
+	int i;
 
 	map_length = orig_bio->bi_iter.bi_size;
 	ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
@@ -8142,9 +8144,12 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 	atomic_inc(&dip->pending_bios);
 
 	while (bvec <= (orig_bio->bi_io_vec + orig_bio->bi_vcnt - 1)) {
-		if (map_length < submit_len + bvec->bv_len ||
-		    bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-				 bvec->bv_offset) < bvec->bv_len) {
+		nr_sectors = bvec->bv_len >> inode->i_blkbits;
+		i = 0;
+next_block:
+		if (unlikely(map_length < submit_len + blocksize ||
+		    bio_add_page(bio, bvec->bv_page, blocksize,
+			    bvec->bv_offset + (i * blocksize)) < blocksize)) {
 			/*
 			 * inc the count before we submit the bio so
 			 * we know the end IO handler won't happen before
@@ -8165,7 +8170,6 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 			file_offset += submit_len;
 
 			submit_len = 0;
-			nr_pages = 0;
 
 			bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev,
 						  start_sector, GFP_NOFS);
@@ -8183,9 +8187,14 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 				bio_put(bio);
 				goto out_err;
 			}
+
+			goto next_block;
 		} else {
-			submit_len += bvec->bv_len;
-			nr_pages++;
+			submit_len += blocksize;
+			if (--nr_sectors) {
+				i++;
+				goto next_block;
+			}
 			bvec++;
 		}
 	}
-- 
2.1.0
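To illustrate the block-granular walk described in the commit message above, here is a minimal user-space sketch. It is not the kernel code: struct segment and count_bios() are hypothetical stand-ins for a bio_vec and the submission loop, and the sketch assumes map_length is at least one block and every segment length is a multiple of the block size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio_vec: one contiguous memory segment. */
struct segment {
	uint32_t offset;	/* byte offset of the segment within its page */
	uint32_t len;		/* segment length, a multiple of the block size */
};

/*
 * Walk the segments one block at a time and count how many bios would be
 * submitted when at most map_length bytes may go into a single bio.
 * Assumes map_length >= blocksize.
 */
static unsigned int count_bios(const struct segment *segs, size_t nsegs,
			       uint32_t blocksize, uint64_t map_length)
{
	uint64_t submit_len = 0;	/* bytes queued in the current bio */
	unsigned int bios = 0;

	for (size_t s = 0; s < nsegs; s++) {
		uint32_t nr_blocks = segs[s].len / blocksize;

		for (uint32_t i = 0; i < nr_blocks; i++) {
			if (map_length < submit_len + blocksize) {
				/* Current bio is full: submit it, start a new one. */
				bios++;
				submit_len = 0;
			}
			submit_len += blocksize;
		}
	}
	if (submit_len)
		bios++;	/* the last, possibly partially filled bio */
	return bios;
}
```

With a single 64K segment, 4K blocks and a map_length of 16K, the walk yields four 16K bios. A check against the whole segment length would instead find map_length smaller than the segment before anything had been queued, which is the zero length bio case the commit message describes.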
FW: btrfs-progs: android build
Hi,

I made an Android build script for btrfs-progs and tested it. I need some help with btrfs_wipe_existing_sb(). In my test it appears to work well, but I'm not sure it is OK.

0001-btrfs-progs-Add-Android-build-mk-file.patch (attachment, binary data)
[PATCH 09/11] Btrfs: Limit inline extents to root->sectorsize
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1acee74..daf2462 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
 		data_len = compressed_size;
 
 	if (start > 0 ||
-	    actual_end > PAGE_CACHE_SIZE ||
+	    actual_end > root->sectorsize ||
 	    data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
 	    (!compressed_size &&
 	    (actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0
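As a rough illustration of the size checks this patch touches, the sketch below models them in user space. The function may_inline() and its max_inline_data parameter (standing in for BTRFS_MAX_INLINE_DATA_SIZE(root)) are hypothetical, the comparison operators are my reading of the hunk rather than something the archive preserves, and the remaining conditions of the kernel function are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the size checks only. Parameter names follow the
 * patch; max_inline_data is an assumed stand-in for
 * BTRFS_MAX_INLINE_DATA_SIZE(root).
 */
static bool may_inline(uint64_t start, uint64_t actual_end, uint64_t data_len,
		       uint64_t compressed_size, uint32_t sectorsize,
		       uint64_t max_inline_data)
{
	if (start > 0)
		return false;	/* inline data must begin at file offset 0 */
	if (actual_end > sectorsize)
		return false;	/* data must fit in one block, not one page */
	if (data_len > max_inline_data)
		return false;	/* too large to store in a tree leaf */
	if (!compressed_size && (actual_end & (sectorsize - 1)) == 0)
		return false;	/* uncompressed data ending on a block boundary */
	return true;
}
```

On a machine with 64K pages and a 4K sectorsize, a 5000 byte file passes a limit of PAGE_CACHE_SIZE but is rejected once the limit is the sectorsize, which is the behaviour change the commit message asks for.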
[PATCH 02/11] Btrfs: Compute and look up csums based on sectorsized blocks
Checksums are applicable to sectorsize units. The current code uses bio-bv_len units to compute and look up checksums. This works on machines where sectorsize == PAGE_SIZE. This patch makes the checksum computation and look up code to work with sectorsize units. Reviewed-by: Liu Bo bo.li@oracle.com Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com --- fs/btrfs/file-item.c | 90 +--- 1 file changed, 57 insertions(+), 33 deletions(-) diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 58ece65..d752051 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -172,6 +172,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root, u64 item_start_offset = 0; u64 item_last_offset = 0; u64 disk_bytenr; + u64 page_bytes_left; u32 diff; int nblocks; int bio_index = 0; @@ -220,6 +221,8 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root, disk_bytenr = (u64)bio-bi_iter.bi_sector 9; if (dio) offset = logical_offset; + + page_bytes_left = bvec-bv_len; while (bio_index bio-bi_vcnt) { if (!dio) offset = page_offset(bvec-bv_page) + bvec-bv_offset; @@ -243,7 +246,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root, if (BTRFS_I(inode)-root-root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID) { set_extent_bits(io_tree, offset, - offset + bvec-bv_len - 1, + offset + root-sectorsize - 1, EXTENT_NODATASUM, GFP_NOFS); } else { btrfs_info(BTRFS_I(inode)-root-fs_info, @@ -281,11 +284,17 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root, found: csum += count * csum_size; nblocks -= count; - bio_index += count; + while (count--) { - disk_bytenr += bvec-bv_len; - offset += bvec-bv_len; - bvec++; + disk_bytenr += root-sectorsize; + offset += root-sectorsize; + page_bytes_left -= root-sectorsize; + if (!page_bytes_left) { + bio_index++; + bvec++; + page_bytes_left = bvec-bv_len; + } + } } btrfs_free_path(path); @@ -432,6 +441,8 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode, struct bio_vec *bvec = 
bio-bi_io_vec; int bio_index = 0; int index; + int nr_sectors; + int i; unsigned long total_bytes = 0; unsigned long this_sum_bytes = 0; u64 offset; @@ -459,41 +470,54 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode, if (!contig) offset = page_offset(bvec-bv_page) + bvec-bv_offset; - if (offset = ordered-file_offset + ordered-len || - offset ordered-file_offset) { - unsigned long bytes_left; - sums-len = this_sum_bytes; - this_sum_bytes = 0; - btrfs_add_ordered_sum(inode, ordered, sums); - btrfs_put_ordered_extent(ordered); + data = kmap_atomic(bvec-bv_page); - bytes_left = bio-bi_iter.bi_size - total_bytes; - sums = kzalloc(btrfs_ordered_sum_size(root, bytes_left), - GFP_NOFS); - BUG_ON(!sums); /* -ENOMEM */ - sums-len = bytes_left; - ordered = btrfs_lookup_ordered_extent(inode, offset); - BUG_ON(!ordered); /* Logic error */ - sums-bytenr = ((u64)bio-bi_iter.bi_sector 9) + - total_bytes; - index = 0; + nr_sectors = (bvec-bv_len + root-sectorsize - 1) +inode-i_blkbits; + + + for (i = 0; i nr_sectors; i++) { + if (offset = ordered-file_offset + ordered-len || + offset ordered-file_offset) { + unsigned long bytes_left; + + sums-len = this_sum_bytes; + this_sum_bytes = 0; + btrfs_add_ordered_sum(inode, ordered, sums); + btrfs_put_ordered_extent(ordered); + + bytes_left =
[PATCH 03/11] Btrfs: Direct I/O read: Work on sectorsized blocks
The direct I/O read's endio and corresponding repair functions work on page sized blocks. This commit adds the ability for direct I/O read to work on subpagesized blocks. Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com --- fs/btrfs/inode.c | 96 ++-- 1 file changed, 73 insertions(+), 23 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e33dff3..ff8b699 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7630,9 +7630,9 @@ static int btrfs_check_dio_repairable(struct inode *inode, } static int dio_read_error(struct inode *inode, struct bio *failed_bio, - struct page *page, u64 start, u64 end, - int failed_mirror, bio_end_io_t *repair_endio, - void *repair_arg) + struct page *page, unsigned int pgoff, + u64 start, u64 end, int failed_mirror, + bio_end_io_t *repair_endio, void *repair_arg) { struct io_failure_record *failrec; struct bio *bio; @@ -7653,7 +7653,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio, return -EIO; } - if (failed_bio-bi_vcnt 1) + if ((failed_bio-bi_vcnt 1) + || (failed_bio-bi_io_vec-bv_len +BTRFS_I(inode)-root-sectorsize)) read_mode = READ_SYNC | REQ_FAILFAST_DEV; else read_mode = READ_SYNC; @@ -7661,7 +7663,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio, isector = start - btrfs_io_bio(failed_bio)-logical; isector = inode-i_sb-s_blocksize_bits; bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page, - 0, isector, repair_endio, repair_arg); + pgoff, isector, repair_endio, repair_arg); if (!bio) { free_io_failure(inode, failrec); return -EIO; @@ -7691,12 +7693,17 @@ struct btrfs_retry_complete { static void btrfs_retry_endio_nocsum(struct bio *bio, int err) { struct btrfs_retry_complete *done = bio-bi_private; + struct inode *inode; struct bio_vec *bvec; int i; if (err) goto end; + BUG_ON(bio-bi_vcnt != 1); + inode = bio-bi_io_vec-bv_page-mapping-host; + BUG_ON(bio-bi_io_vec-bv_len != BTRFS_I(inode)-root-sectorsize); + done-uptodate = 1; 
bio_for_each_segment_all(bvec, bio, i) clean_io_failure(done-inode, done-start, bvec-bv_page, 0); @@ -7711,22 +7718,30 @@ static int __btrfs_correct_data_nocsum(struct inode *inode, struct bio_vec *bvec; struct btrfs_retry_complete done; u64 start; + unsigned int pgoff; + u32 sectorsize; + int nr_sectors; int i; int ret; + sectorsize = BTRFS_I(inode)-root-sectorsize; + start = io_bio-logical; done.inode = inode; bio_for_each_segment_all(bvec, io_bio-bio, i) { -try_again: + nr_sectors = bvec-bv_len inode-i_blkbits; + pgoff = bvec-bv_offset; + +next_block_or_try_again: done.uptodate = 0; done.start = start; init_completion(done.done); - ret = dio_read_error(inode, io_bio-bio, bvec-bv_page, start, -start + bvec-bv_len - 1, -io_bio-mirror_num, -btrfs_retry_endio_nocsum, done); + ret = dio_read_error(inode, io_bio-bio, bvec-bv_page, + pgoff, start, start + sectorsize - 1, + io_bio-mirror_num, + btrfs_retry_endio_nocsum, done); if (ret) return ret; @@ -7734,10 +7749,15 @@ try_again: if (!done.uptodate) { /* We might have another mirror, so try again */ - goto try_again; + goto next_block_or_try_again; } - start += bvec-bv_len; + start += sectorsize; + + if (nr_sectors--) { + pgoff += sectorsize; + goto next_block_or_try_again; + } } return 0; @@ -7747,7 +7767,9 @@ static void btrfs_retry_endio(struct bio *bio, int err) { struct btrfs_retry_complete *done = bio-bi_private; struct btrfs_io_bio *io_bio = btrfs_io_bio(bio); + struct inode *inode; struct bio_vec *bvec; + u64 start; int uptodate; int ret; int i; @@ -7756,13 +7778,20 @@ static void btrfs_retry_endio(struct bio *bio, int err) goto end; uptodate = 1; + + start = done-start; + + BUG_ON(bio-bi_vcnt !=
[PATCH 05/11] Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units
In subpagesize-blocksize scenario, if i_size occurs in a block which is not
the last block in the page, then the space to be reserved should be calculated
appropriately.

Reviewed-by: Liu Bo bo.li@oracle.com
Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index afb8d2b..b39273b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8626,11 +8626,24 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	loff_t size;
 	int ret;
 	int reserved = 0;
+	u64 reserved_space;
 	u64 page_start;
 	u64 page_end;
+	u64 end;
+
+	reserved_space = PAGE_CACHE_SIZE;
 
 	sb_start_pagefault(inode->i_sb);
-	ret = btrfs_delalloc_reserve_space(inode, PAGE_CACHE_SIZE);
+
+	/*
+	  Reserving delalloc space after obtaining the page lock can lead to
+	  deadlock. For example, if a dirty page is locked by this function
+	  and the call to btrfs_delalloc_reserve_space() ends up triggering
+	  dirty page write out, then the btrfs_writepage() function could
+	  end up waiting indefinitely to get a lock on the page currently
+	  being processed by btrfs_page_mkwrite() function.
+	*/
+	ret = btrfs_delalloc_reserve_space(inode, reserved_space);
 	if (!ret) {
 		ret = file_update_time(vma->vm_file);
 		reserved = 1;
@@ -8651,6 +8664,7 @@ again:
 	size = i_size_read(inode);
 	page_start = page_offset(page);
 	page_end = page_start + PAGE_CACHE_SIZE - 1;
+	end = page_end;
 
 	if ((page->mapping != inode->i_mapping) ||
 	    (page_start >= size)) {
@@ -8666,7 +8680,7 @@ again:
 	 * we can't set the delalloc bits if there are pending ordered
 	 * extents. Drop our locks and wait for them to finish
 	 */
-	ordered = btrfs_lookup_ordered_extent(inode, page_start);
+	ordered = btrfs_lookup_ordered_range(inode, page_start, page_end);
 	if (ordered) {
 		unlock_extent_cached(io_tree, page_start, page_end,
 				     &cached_state, GFP_NOFS);
@@ -8676,6 +8690,18 @@ again:
 		goto again;
 	}
 
+	if (page->index == ((size - 1) >> PAGE_CACHE_SHIFT)) {
+		reserved_space = round_up(size - page_start, root->sectorsize);
+		if (reserved_space < PAGE_CACHE_SIZE) {
+			end = page_start + reserved_space - 1;
+			spin_lock(&BTRFS_I(inode)->lock);
+			BTRFS_I(inode)->outstanding_extents++;
+			spin_unlock(&BTRFS_I(inode)->lock);
+			btrfs_delalloc_release_space(inode,
+						PAGE_CACHE_SIZE - reserved_space);
+		}
+	}
+
 	/*
 	 * XXX - page_mkwrite gets called every time the page is dirtied, even
 	 * if it was already dirty, so for space accounting reasons we need to
@@ -8683,12 +8709,12 @@ again:
 	 * is probably a better way to do this, but for now keep consistent with
 	 * prepare_pages in the normal write path.
 	 */
-	clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end,
+	clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
 			  EXTENT_DIRTY | EXTENT_DELALLOC |
 			  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
 			  0, 0, &cached_state, GFP_NOFS);
 
-	ret = btrfs_set_extent_delalloc(inode, page_start, page_end,
+	ret = btrfs_set_extent_delalloc(inode, page_start, end,
 					&cached_state);
 	if (ret) {
 		unlock_extent_cached(io_tree, page_start, page_end,
@@ -8727,7 +8753,7 @@ out_unlock:
 	}
 	unlock_page(page);
 out:
-	btrfs_delalloc_release_space(inode, PAGE_CACHE_SIZE);
+	btrfs_delalloc_release_space(inode, reserved_space);
 out_noreserve:
 	sb_end_pagefault(inode->i_sb);
 	return ret;
-- 
2.1.0
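The reservation arithmetic in this patch can be tried out in user space. The sketch below is only an approximation under stated assumptions: mkwrite_reserve() and round_up_pow2() are hypothetical helpers, page_size and sectorsize are passed in explicitly, and page_start is assumed to lie below size, as the kernel function checks before reaching this point.

```c
#include <stdint.h>

/* Round x up to a multiple of align, which must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Bytes to keep reserved for a write fault on the page that starts at
 * page_start. Only the page that contains the last byte of the file
 * (size - 1) can need less than a full page.
 */
static uint64_t mkwrite_reserve(uint64_t page_start, uint64_t size,
				uint64_t page_size, uint64_t sectorsize)
{
	uint64_t last_page_start = ((size - 1) / page_size) * page_size;

	if (page_start == last_page_start)
		return round_up_pow2(size - page_start, sectorsize);
	return page_size;
}
```

For example, with 64K pages, a 4K sectorsize and i_size 5000 bytes into the second page, the fault on that page needs 8192 bytes rather than 65536, and the difference is what the patch hands back through btrfs_delalloc_release_space().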
[PATCH 11/11] Btrfs: Clean pte corresponding to page straddling i_size
When extending a file by either truncating up or writing beyond i_size, the
page which had i_size needs to be marked read only so that future writes to
the page via the mmap interface cause btrfs_page_mkwrite() to be invoked. If
not, a write performed via the mmap interface after extending the file will
find the page to be writable and continue writing to it without invoking
btrfs_page_mkwrite(), i.e. we end up writing to a file without reserving disk
space.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/file.c  | 12 ++++++++++--
 fs/btrfs/inode.c |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f69e030..aba215c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1755,6 +1755,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	ssize_t err;
 	loff_t pos;
 	size_t count;
+	loff_t oldsize;
+	int clean_page = 0;
 
 	mutex_lock(&inode->i_mutex);
 	err = generic_write_checks(iocb, from);
@@ -1793,14 +1795,17 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	pos = iocb->ki_pos;
 	count = iov_iter_count(from);
 	start_pos = round_down(pos, root->sectorsize);
-	if (start_pos > i_size_read(inode)) {
+	oldsize = i_size_read(inode);
+	if (start_pos > oldsize) {
 		/* Expand hole size to cover write data, preventing empty gap */
 		end_pos = round_up(pos + count, root->sectorsize);
-		err = btrfs_cont_expand(inode, i_size_read(inode), end_pos);
+		err = btrfs_cont_expand(inode, oldsize, end_pos);
 		if (err) {
 			mutex_unlock(&inode->i_mutex);
 			goto out;
 		}
+		if (start_pos > round_up(oldsize, root->sectorsize))
+			clean_page = 1;
 	}
 
 	if (sync)
@@ -1812,6 +1817,9 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		num_written = __btrfs_buffered_write(file, from, pos);
 		if (num_written > 0)
 			iocb->ki_pos = pos + num_written;
+		if (clean_page)
+			pagecache_isize_extended(inode, oldsize,
+						i_size_read(inode));
 	}
 
 	mutex_unlock(&inode->i_mutex);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ea7d9f1..0a8a5ff 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4824,7 +4824,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 	}
 
 	if (newsize > oldsize) {
-		truncate_pagecache(inode, newsize);
 		/*
 		 * Don't do an expanding truncate while snapshoting is ongoing.
 		 * This is to ensure the snapshot captures a fully consistent
@@ -4847,6 +4846,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 
 		i_size_write(inode, newsize);
 		btrfs_ordered_update_i_size(inode, i_size_read(inode), NULL);
+		pagecache_isize_extended(inode, oldsize, newsize);
 		ret = btrfs_update_inode(trans, root, inode);
 		btrfs_end_write_no_snapshoting(root);
 		btrfs_end_transaction(trans, root);
-- 
2.1.0
[PATCH 10/11] Btrfs: Fix block size returned to user space
btrfs_getattr() returns PAGE_CACHE_SIZE as the block size. Since
generic_fillattr() already does the right thing (by obtaining block size
from inode->i_blkbits), just remove the statement from btrfs_getattr.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index daf2462..ea7d9f1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9164,7 +9164,6 @@ static int btrfs_getattr(struct vfsmount *mnt,
 
 	generic_fillattr(inode, stat);
 	stat->dev = BTRFS_I(inode)->root->anon_dev;
-	stat->blksize = PAGE_CACHE_SIZE;
 
 	spin_lock(&BTRFS_I(inode)->lock);
 	delalloc_bytes = BTRFS_I(inode)->delalloc_bytes;
-- 
2.1.0
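The value this patch changes is what user space sees as st_blksize. A small sketch for checking it from a program follows; preferred_blksize() is a hypothetical helper name, while stat(2) and the st_blksize field are standard POSIX.

```c
#include <sys/stat.h>

/* Return the block size reported by stat(2) for path, or -1 on error. */
static long preferred_blksize(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_blksize;
}
```

On a filesystem with a 4K sectorsize mounted on a 64K-page machine, one would expect the reported value to move from the page size to the sectorsize once the override is removed; I have not measured this on such a setup.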
[PATCH 00/11] Btrfs: Pre subpagesize-blocksize cleanups
Hello all,

The patches posted along with this cover letter are cleanups made during the
development of the subpagesize-blocksize patchset. I believe that they can be
integrated with the mainline kernel. Hence I have posted them separately from
the subpagesize-blocksize patchset.

I have tested the patchset by running xfstests on ppc64 and x86_64. On ppc64,
some of the Btrfs specific tests and generic/255 fail because they assume 4K
as the filesystem's block size. I have fixed some of the test cases. I will
fix the rest and mail them to the fstests mailing list in the near future.

Chandan Rajendra (11):
  Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size
  Btrfs: Compute and look up csums based on sectorsized blocks
  Btrfs: Direct I/O read: Work on sectorsized blocks
  Btrfs: fallocate: Work with sectorsized blocks
  Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units
  Btrfs: Search for all ordered extents that could span across a page
  Btrfs: Use (eb->start, seq) as search key for tree modification log
  Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length
  Btrfs: Limit inline extents to root->sectorsize
  Btrfs: Fix block size returned to user space
  Btrfs: Clean pte corresponding to page straddling i_size

 fs/btrfs/ctree.c     |  34
 fs/btrfs/ctree.h     |   2 +-
 fs/btrfs/extent_io.c |   3 +-
 fs/btrfs/file-item.c |  90
 fs/btrfs/file.c      |  99
 fs/btrfs/inode.c     | 239
 6 files changed, 308 insertions(+), 159 deletions(-)

-- 
2.1.0
[PATCH 06/11] Btrfs: Search for all ordered extents that could span across a page
In subpagesize-blocksize scenario it is not sufficient to search using the
first byte of the page to make sure that there are no ordered extents
present across the page. Fix this.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/extent_io.c |  3 ++-
 fs/btrfs/inode.c     | 25 ++++++++++++++++++-------
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a3ec2c8..65691a0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3164,7 +3164,8 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
 
 	while (1) {
 		lock_extent(tree, start, end);
-		ordered = btrfs_lookup_ordered_extent(inode, start);
+		ordered = btrfs_lookup_ordered_range(inode, start,
+						PAGE_CACHE_SIZE);
 		if (!ordered)
 			break;
 		unlock_extent(tree, start, end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b39273b..dad76ef 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1975,7 +1975,8 @@ again:
 	if (PagePrivate2(page))
 		goto out;
 
-	ordered = btrfs_lookup_ordered_extent(inode, page_start);
+	ordered = btrfs_lookup_ordered_range(inode, page_start,
+					PAGE_CACHE_SIZE);
 	if (ordered) {
 		unlock_extent_cached(&BTRFS_I(inode)->io_tree, page_start,
 				     page_end, &cached_state, GFP_NOFS);
@@ -8519,6 +8520,8 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 	struct extent_state *cached_state = NULL;
 	u64 page_start = page_offset(page);
 	u64 page_end = page_start + PAGE_CACHE_SIZE - 1;
+	u64 start;
+	u64 end;
 	int inode_evicting = inode->i_state & I_FREEING;
 
 	/*
@@ -8538,14 +8541,18 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
 	if (!inode_evicting)
 		lock_extent_bits(tree, page_start, page_end, 0, &cached_state);
-	ordered = btrfs_lookup_ordered_extent(inode, page_start);
+again:
+	start = page_start;
+	ordered = btrfs_lookup_ordered_range(inode, start,
+					page_end - start + 1);
 	if (ordered) {
+		end = min(page_end, ordered->file_offset + ordered->len - 1);
 		/*
 		 * IO on this page will never be started, so we need
 		 * to account for any ordered extents now
 		 */
 		if (!inode_evicting)
-			clear_extent_bit(tree, page_start, page_end,
+			clear_extent_bit(tree, start, end,
 					 EXTENT_DIRTY | EXTENT_DELALLOC |
 					 EXTENT_LOCKED | EXTENT_DO_ACCOUNTING |
 					 EXTENT_DEFRAG, 1, 0, &cached_state,
@@ -8562,22 +8569,26 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset,
 
 			spin_lock_irq(&tree->lock);
 			set_bit(BTRFS_ORDERED_TRUNCATED, &ordered->flags);
-			new_len = page_start - ordered->file_offset;
+			new_len = start - ordered->file_offset;
 			if (new_len < ordered->truncated_len)
 				ordered->truncated_len = new_len;
 			spin_unlock_irq(&tree->lock);
 
 			if (btrfs_dec_test_ordered_pending(inode, &ordered,
-							   page_start,
-							   PAGE_CACHE_SIZE, 1))
+							   start,
+							   end - start + 1, 1))
 				btrfs_finish_ordered_io(ordered);
 		}
 		btrfs_put_ordered_extent(ordered);
 		if (!inode_evicting) {
 			cached_state = NULL;
-			lock_extent_bits(tree, page_start, page_end, 0,
+			lock_extent_bits(tree, start, end, 0,
 					 &cached_state);
 		}
+
+		start = end + 1;
+		if (start < page_end)
+			goto again;
 	}
 
 	if (!inode_evicting) {
-- 
2.1.0
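Why a lookup at the first byte of the page is not enough can be shown with plain interval arithmetic. The sketch below is a user-space illustration only: struct ordered, covers_offset() and overlaps_range() are hypothetical stand-ins, not the kernel's ordered extent tree or its lookup functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an ordered extent: [file_offset, file_offset + len). */
struct ordered {
	uint64_t file_offset;
	uint64_t len;
};

/* Point lookup: does the extent cover the single byte at offset? */
static bool covers_offset(const struct ordered *o, uint64_t offset)
{
	return offset >= o->file_offset && offset < o->file_offset + o->len;
}

/* Range lookup: does the extent overlap [start, start + len)? */
static bool overlaps_range(const struct ordered *o, uint64_t start,
			   uint64_t len)
{
	return o->file_offset < start + len && start < o->file_offset + o->len;
}
```

Take a 64K page starting at offset 0 and a 4K ordered extent at offset 8192. The point lookup at the page start misses it, while the range lookup over the whole page finds it. With a block size equal to the page size the two lookups agree, which is why the problem only shows up in the subpagesize-blocksize case.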
[PATCH] Btrfs: fix stale dir entries after removing a link and fsync
From: Filipe Manana fdman...@suse.com We have one more case where after a log tree is replayed we get inconsistent metadata leading to stale directory entries, due to some directories having entries pointing to some inode while the inode does not have a matching BTRFS_INODE_[REF|EXTREF]_KEY item. To trigger the problem we need to have a file with multiple hard links belonging to different parent directories. Then if one of those hard links is removed and we fsync the file using one of its other links that belongs to a different parent directory, we end up not logging the fact that the removed hard link doesn't exists anymore in the parent directory. Simple reproducer: seq=`basename $0` seqres=$RESULT_DIR/$seq echo QA output created by $seq tmp=/tmp/$$ status=1 # failure is the default! trap _cleanup; exit \$status 0 1 2 3 15 _cleanup() { _cleanup_flakey rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter . ./common/dmflakey # real QA test starts here _need_to_be_root _supported_fs generic _supported_os Linux _require_scratch _require_dm_flakey _require_metadata_journaling $SCRATCH_DEV rm -f $seqres.full _scratch_mkfs $seqres.full 21 _init_flakey _mount_flakey # Create our test directory and file. mkdir $SCRATCH_MNT/testdir touch $SCRATCH_MNT/foo ln $SCRATCH_MNT/foo $SCRATCH_MNT/testdir/foo2 ln $SCRATCH_MNT/foo $SCRATCH_MNT/testdir/foo3 # Make sure everything done so far is durably persisted. sync # Now we remove one of our file's hardlinks in the directory testdir. unlink $SCRATCH_MNT/testdir/foo3 # We now fsync our file using the foo link, which has a parent that # is not the directory testdir. $XFS_IO_PROG -c fsync $SCRATCH_MNT/foo # Silently drop all writes and unmount to simulate a crash/power # failure. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey # Allow writes again, mount to trigger journal/log replay. 
_load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # After the journal/log is replayed we expect to not see the foo3 # link anymore and we should be able to remove all names in the # directory testdir and then remove it (no stale directory entries # left after the journal/log replay). echo Entries in testdir: ls -1 $SCRATCH_MNT/testdir rm -f $SCRATCH_MNT/testdir/* rmdir $SCRATCH_MNT/testdir _unmount_flakey status=0 exit The test fails with: $ ./check generic/107 FSTYP -- btrfs PLATFORM -- Linux/x86_64 debian3 4.1.0-rc6-btrfs-next-11+ MKFS_OPTIONS -- /dev/sdc MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1 generic/107 3s ... - output mismatch (see .../results/generic/107.out.bad) --- tests/generic/107.out 2015-08-01 01:39:45.807462161 +0100 +++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad @@ -1,3 +1,5 @@ QA output created by 107 Entries in testdir: foo2 +foo3 +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty ... _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent \ (see /home/fdmanana/git/hub/xfstests/results//generic/107.full) _check_dmesg: something found in dmesg (see .../results/generic/107.dmesg) Ran: generic/107 Failures: generic/107 Failed 1 of 1 tests $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full (...) checking fs roots root 5 inode 257 errors 200, dir isize wrong unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 errors 5, no dir item, no inode ref (...) And produces the following warning in dmesg: [127298.759064] BTRFS info (device dm-0): failed to delete reference to foo3, inode 258 parent 257 [127298.762081] [ cut here ] [127298.763311] WARNING: CPU: 10 PID: 7891 at fs/btrfs/inode.c:3956 __btrfs_unlink_inode+0x182/0x35a [btrfs]() [127298.767327] BTRFS: Transaction aborted (error -2) (...) [127298.788611] Call Trace: [127298.789137] [8145f077] dump_stack+0x4f/0x7b [127298.790090] [81095de5] ? 
console_unlock+0x356/0x3a2 [127298.791157] [8104b3b0] warn_slowpath_common+0xa1/0xbb [127298.792323] [a065ad09] ? __btrfs_unlink_inode+0x182/0x35a [btrfs] [127298.793633] [8104b410] warn_slowpath_fmt+0x46/0x48 [127298.794699] [a065ad09] __btrfs_unlink_inode+0x182/0x35a [btrfs] [127298.797640] [a065be8f] btrfs_unlink_inode+0x1e/0x40 [btrfs] [127298.798876] [a065bf11] btrfs_unlink+0x60/0x9b [btrfs] [127298.800154] [8116fb48] vfs_unlink+0x9c/0xed [127298.801303] [81173481] do_unlinkat+0x12b/0x1fb [127298.802450] [81253855] ? lockdep_sys_exit_thunk+0x12/0x14 [127298.803797] [81174056] SyS_unlinkat+0x29/0x2b [127298.805017] [81465197] system_call_fastpath+0x12/0x6f [127298.806310] ---[ end trace
[PATCH] fstests: generic test for fsync of file with multiple links
From: Filipe Manana fdman...@suse.com Test that when we have a file with multiple hard links belonging to different parent directories, if we remove one of those links, fsync the file using one of its other links (that has a parent directory different from the one we removed a link from), power fail and then replay the fsync log/journal, the hard link we removed is not available anymore and all the filesystem metadata is in a consistent state. This test is motivated by an issue found in btrfs, where the test fails with: generic/107 2s ... - output mismatch (see .../results/generic/107.out.bad) --- tests/generic/107.out 2015-08-04 09:47:46.922131256 +0100 +++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad @@ -1,3 +1,5 @@ QA output created by 107 Entries in testdir: foo2 +foo3 +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty ... (Run 'diff -u tests/generic/107.out .../generic/107.out.bad' to see the entire diff) _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see .../generic/107.full) _check_dmesg: something found in dmesg (see .../generic/107.dmesg) $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent *** fsck.btrfs output *** checking extents checking free space cache checking fs roots root 5 inode 257 errors 200, dir isize wrong unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 \ errors 5, no dir item, no inode ref $ cat /home/fdmanana/git/hub/xfstests/results//generic/107.dmesg (...) [188897.707311] BTRFS info (device dm-0): failed to delete reference to \ foo3, inode 258 parent 257 [188897.711345] [ cut here ] [188897.713369] WARNING: CPU: 10 PID: 19452 at fs/btrfs/inode.c:3956 \ __btrfs_unlink_inode+0x182/0x35a [btrfs]() [188897.717661] BTRFS: Transaction aborted (error -2) (...) [188897.747898] Call Trace: [188897.748519] [8145f077] dump_stack+0x4f/0x7b [188897.749602] [81095de5] ? 
console_unlock+0x356/0x3a2 [188897.750682] [8104b3b0] warn_slowpath_common+0xa1/0xbb [188897.751936] [a04c5d09] ? __btrfs_unlink_inode+0x182/0x35a [btrfs] [188897.753485] [8104b410] warn_slowpath_fmt+0x46/0x48 [188897.754781] [a04c5d09] __btrfs_unlink_inode+0x182/0x35a [btrfs] [188897.756295] [a04c6e8f] btrfs_unlink_inode+0x1e/0x40 [btrfs] [188897.757692] [a04c6f11] btrfs_unlink+0x60/0x9b [btrfs] [188897.758978] [8116fb48] vfs_unlink+0x9c/0xed [188897.760151] [81173481] do_unlinkat+0x12b/0x1fb [188897.761354] [81253855] ? lockdep_sys_exit_thunk+0x12/0x14 [188897.762692] [81174056] SyS_unlinkat+0x29/0x2b [188897.763741] [81465197] system_call_fastpath+0x12/0x6f [188897.764894] ---[ end trace bbfddacb7aaada8c ]--- [188897.765801] BTRFS warning (device dm-0): __btrfs_unlink_inode:3956: \ Aborting unused transaction(No such entry). Tested against ext3/4, xfs, reiserfs and f2fs too, and all these filesystems currently pass this test (on a 4.1 linux kernel at least). The btrfs issue is fixed by the linux kernel patch titled: Btrfs: fix stale dir entries after removing a link and fsync. Signed-off-by: Filipe Manana fdman...@suse.com --- tests/generic/107 | 99 +++ tests/generic/107.out | 3 ++ tests/generic/group | 1 + 3 files changed, 103 insertions(+) create mode 100755 tests/generic/107 create mode 100644 tests/generic/107.out diff --git a/tests/generic/107 b/tests/generic/107 new file mode 100755 index 000..7d107d7 --- /dev/null +++ b/tests/generic/107 @@ -0,0 +1,99 @@ +#! /bin/bash +# FSQA Test No. 107 +# +# Test that when we have a file with multiple hard links belonging to different +# parent directories, if we remove one of those links, fsync the file using one +# of its other links (that has a parent directory different from the one we +# removed a link from), power fail and then replay the fsync log/journal, the +# hard link we removed is not available anymore and all the filesystem metadata +# is in a consistent state. 
+# +#--- +# +# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana fdman...@suse.com +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +#
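The test description above boils down to a short sequence of system calls. The following is only a sketch of that sequence in C, not the fstests script itself: the helper name link_unlink_fsync is hypothetical, the names foo, testdir/foo2 and testdir/foo3 are taken from the failure output, and the exact setup order (including the sync() before the unlink) is an assumption. It leaves out the dm-flakey power-fail and log-replay steps, so on its own it cannot reproduce the btrfs failure.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of fstests): under base_dir, create a file
 * "foo" with two extra hard links in a different directory, remove one of
 * those links, then fsync the file through its remaining name "foo".
 * Returns 0 on success, -1 on the first failing call.
 */
static int link_unlink_fsync(const char *base_dir)
{
	char dir[PATH_MAX], foo[PATH_MAX], foo2[PATH_MAX], foo3[PATH_MAX];
	int fd;
	int ret = -1;

	snprintf(dir, sizeof(dir), "%s/testdir", base_dir);
	snprintf(foo, sizeof(foo), "%s/foo", base_dir);
	snprintf(foo2, sizeof(foo2), "%s/testdir/foo2", base_dir);
	snprintf(foo3, sizeof(foo3), "%s/testdir/foo3", base_dir);

	if (mkdir(dir, 0755) < 0)
		return -1;
	fd = open(foo, O_CREAT | O_EXCL | O_WRONLY, 0644);
	if (fd < 0)
		return -1;

	/* Two hard links whose parent directory differs from foo's parent. */
	if (link(foo, foo2) < 0 || link(foo, foo3) < 0)
		goto out;

	/* Assumed step: make everything so far durable before the unlink. */
	sync();

	/* Remove one link, then fsync the inode via a name in another directory. */
	if (unlink(foo3) < 0)
		goto out;
	if (fsync(fd) < 0)
		goto out;
	ret = 0;
out:
	close(fd);
	return ret;
}
```

After a power failure and log replay following this sequence, the test expects testdir to contain only foo2 and to be removable once that entry is deleted.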
Re: Data single *and* raid?
Hello Qu,

thanks for your reply.

>> But then:
>> root@homeserver:/mnt/__Complete_Disk# btrfs fi df /mnt/__Complete_Disk/
>> Data, RAID5: total=3.83TiB, used=3.78TiB
>> System, RAID5: total=32.00MiB, used=576.00KiB
>> Metadata, RAID5: total=6.46GiB, used=4.84GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> GlobalReserve is not a chunk type, it just means a range of metadata
> reserved for overcommitting. And it's always single.
>
> Personally, I don't think it should be output by the fi df command, as
> it's at a higher level than chunks.
>
> At least in your case, there is nothing to worry about.

But this seems to be a RAID5 now, right?
Well, that's what I want, but the command was:
btrfs balance start -dprofiles=single -mprofiles=raid1 /mnt/__Complete_Disk/

So, we would expect raid1 here, no?

Greetings,
Hendrik

>> On 01.08.2015 22:44, Chris Murphy wrote:
>>> On Sat, Aug 1, 2015 at 2:32 PM, Hugo Mills h...@carfax.org.uk wrote:
>>>> On Sat, Aug 01, 2015 at 10:09:35PM +0200, Hendrik Friedel wrote:
>>>>> Hello,
>>>>>
>>>>> I converted an array to raid5 by
>>>>> btrfs device add /dev/sdd /mnt/new_storage
>>>>> btrfs device add /dev/sdc /mnt/new_storage
>>>>> btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/new_storage/
>>>>>
>>>>> The Balance went through. But now:
>>>>> Label: none uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
>>>>> Total devices 3 FS bytes used 5.28TiB
>>>>> devid 1 size 2.73TiB used 2.57TiB path /dev/sde
>>>>> devid 2 size 2.73TiB used 2.73TiB path /dev/sdc
>>>>> devid 3 size 2.73TiB used 2.73TiB path /dev/sdd
>>>>> btrfs-progs v4.1.1
>>>>>
>>>>> Already the 2.57TiB is a bit surprising:
>>>>> root@homeserver:/mnt# btrfs fi df /mnt/new_storage/
>>>>> Data, single: total=2.55TiB, used=2.55TiB
>>>>> Data, RAID5: total=2.73TiB, used=2.72TiB
>>>>> System, RAID5: total=32.00MiB, used=736.00KiB
>>>>> Metadata, RAID1: total=6.00GiB, used=5.33GiB
>>>>> Metadata, RAID5: total=3.00GiB, used=2.99GiB
>>>>
>>>> Looking at the btrfs fi show output, you've probably run out of
>>>> space during the conversion, probably due to an uneven distribution
>>>> of the original single chunks.
>>>>
>>>> I think I would suggest balancing the single chunks, and trying the
>>>> conversion (of the unconverted parts) again:
>>>>
>>>> # btrfs balance start -dprofiles=single -mprofiles=raid1 /mnt/new_storage/
>>>> # btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft /mnt/new_storage/
>>>
>>> Yep I bet that's it also. btrfs fi usage might be better at exposing
>>> this case.

--
Hendrik Friedel
Re: Lockup in BTRFS_IOC_CLONE/Kernel 4.2.0-rc5
Hi,

On Wed, Aug 05, 2015 at 10:28:05AM +0200, Elias Probst wrote:
> I can reproduce a hard btrfs lockup (process issuing the ioctl() is in
> D-state, same goes for btrfs-transacti process) on Kernel 4.2.0-rc5.
> I had the same issue on 4.1, so it's unlikely a regression introduced
> in 4.2.
>
> ## With the following steps, I can reproduce the problem:
>
> 1. Create a new clean btrfs volume for /var/lib/machines
>    machinectl set-limit 6G
>
> 2. Paste this to /tmp/yum.conf
>    [main]
>    reposdir=/dev/null
>    gpgcheck=0
>    logfile=/var/log/yum.log
>    installroot=/var/lib/machines/centos7.1-base
>    assumeyes=1
>
>    [base]
>    name=CentOS 7.1.1503 - x86_64
>    baseurl=http://mirror.centos.org/centos/7.1.1503/os/x86_64/
>    enabled=1
>
> 3. Bootstrap a CentOS 7.1 base image
>    /usr/bin/yum -c /tmp/yum.conf groupinstall Base
>
> 4. Start an ephemeral systemd-nspawn container based on 'centos7.1-base'
>    strace -o /tmp/systemd-nspawn.out -s 500 -f systemd-nspawn -xbD /var/lib/machines/centos7.1-base/
>
> `systemd-nspawn` will now just hang forever.
>
> I couldn't come up yet with a shorter/more low-level way to reproduce
> this as I lack quite a bit of btrfs experience.

Thank you for reporting this.

Could you do 'echo w > /proc/sysrq-trigger' to gather the whole hang call
stack?

Here's a quick patch that may address your problem, can you give it a
shot after getting sysrq-w output?

Thanks,

-liubo

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0770c91..b52bd66 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3478,6 +3478,22 @@ process_slot:
 					drop_start = new_key.offset;
 
 				/*
+				 * We need to look up the roots that point at
+				 * this bytenr and see if the new root does. If
+				 * it does not we need to make sure we update
+				 * quotas appropriately.
+*/ + if (disko root != BTRFS_I(src)-root + disko != last_disko) { + no_quota = check_ref(trans, root, +disko); + if (no_quota 0) { + ret = no_quota; + goto out; + } + } + + /* * 1 - adjusting old extent (we may have to * split it) * 1 - add new extent * 1 - inode update @@ -3544,27 +3560,6 @@ process_slot: btrfs_set_file_extent_num_bytes(leaf, extent, datal); - /* -* We need to look up the roots that point at -* this bytenr and see if the new root does. If -* it does not we need to make sure we update -* quotas appropriately. - */ - if (disko root != BTRFS_I(src)-root - disko != last_disko) { - no_quota = check_ref(trans, root, -disko); - if (no_quota 0) { - btrfs_abort_transaction(trans, - root, - ret); - btrfs_end_transaction(trans, - root); - ret = no_quota; - goto out; - } - } - if (disko) { inode_add_bytes(inode, datal); ret = btrfs_inc_extent_ref(trans, root, ## Results: - Last 'strace' lines 6095 fchown(16, 0, 0) = 0 6095 fchmod(16, 0755) = 0 6095 utimensat(16, NULL, {{1402362275, 0}, {1438761285, 819041906}}, 0) = 0 6095 flistxattr(15, , 100) = 0 6095 getdents(15, /* 3 entries */, 32768) = 80 6095 newfstatat(15, coreutils.mo, {st_mode=S_IFREG|0644, st_size=357263, ...}, AT_SYMLINK_NOFOLLOW) = 0 6095 openat(15, coreutils.mo, O_RDONLY|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC) = 17 6095 openat(16, coreutils.mo, O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC, 0644) = 18 6095 fstat(18, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 6095 ioctl(18, BTRFS_IOC_CLONE - call trace in Kernel journal: Aug 05 10:10:03 moria kernel:
Re: [PATCH 01/11] Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size
Hi Chandan,

Thanks for your effort to implement sub-pagesize block size support.

These cleanups look quite good, but I have some small readability
recommendations inlined below.

Chandan Rajendra wrote on 2015/08/06 15:40 +0530:
> Currently, the code reserves/releases extents in multiples of
> PAGE_CACHE_SIZE units. Fix this by doing reservation/releases in block
> size units.
>
> Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
> ---
>  fs/btrfs/file.c | 40
>  1 file changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 795d754..e3b2b3c 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1362,16 +1362,19 @@ fail:
>  static noinline int
>  lock_and_cleanup_extent_if_need(struct inode *inode, struct page **pages,
>  				size_t num_pages, loff_t pos,
> +				size_t write_bytes,
>  				u64 *lockstart, u64 *lockend,
>  				struct extent_state **cached_state)
>  {
> +	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	u64 start_pos;
>  	u64 last_pos;
>  	int i;
>  	int ret = 0;
>
> -	start_pos = pos & ~((u64)PAGE_CACHE_SIZE - 1);
> -	last_pos = start_pos + ((u64)num_pages << PAGE_CACHE_SHIFT) - 1;
> +	start_pos = pos & ~((u64)root->sectorsize - 1);

Why not just round_down(pos, root->sectorsize)?
A hard-coded alignment mask is never that easy to read.

> +	last_pos = start_pos
> +		+ ALIGN(pos + write_bytes - start_pos, root->sectorsize) - 1;

Maybe this is just a matter of preference, but I'd prefer round_up() over
ALIGN(), as with ALIGN() I sometimes still need to figure out whether it
rounds up or rounds down.
if (start_pos inode-i_size) { struct btrfs_ordered_extent *ordered; @@ -1489,6 +1492,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, while (iov_iter_count(i) 0) { size_t offset = pos (PAGE_CACHE_SIZE - 1); + size_t sector_offset; size_t write_bytes = min(iov_iter_count(i), nrptrs * (size_t)PAGE_CACHE_SIZE - offset); @@ -1497,6 +1501,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, size_t reserve_bytes; size_t dirty_pages; size_t copied; + size_t dirty_sectors; + size_t num_sectors; WARN_ON(num_pages nrptrs); @@ -1509,8 +1515,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, break; } - reserve_bytes = num_pages PAGE_CACHE_SHIFT; + sector_offset = pos (root-sectorsize - 1); Same here. Thanks, Qu + reserve_bytes = ALIGN(write_bytes + sector_offset, + root-sectorsize); + ret = btrfs_check_data_free_space(inode, reserve_bytes, write_bytes); + if (ret == -ENOSPC (BTRFS_I(inode)-flags (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC))) { @@ -1523,7 +1533,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, */ num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_CACHE_SIZE); - reserve_bytes = num_pages PAGE_CACHE_SHIFT; + reserve_bytes = ALIGN(write_bytes + sector_offset, + root-sectorsize); + ret = 0; } else { ret = -ENOSPC; @@ -1558,8 +1570,8 @@ again: break; ret = lock_and_cleanup_extent_if_need(inode, pages, num_pages, - pos, lockstart, lockend, - cached_state); + pos, write_bytes, lockstart, + lockend, cached_state); if (ret 0) { if (ret == -EAGAIN) goto again; @@ -1595,9 +1607,14 @@ again: * we still have an outstanding extent for the chunk we actually * managed to copy. */ - if (num_pages dirty_pages) { - release_bytes = (num_pages - dirty_pages) - PAGE_CACHE_SHIFT; + num_sectors = reserve_bytes inode-i_blkbits; + dirty_sectors = round_up(copied + sector_offset, + root-sectorsize); + dirty_sectors = inode-i_blkbits; + + if (num_sectors
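The accounting change in the hunks above can be followed with a small standalone calculation. This is only a userspace sketch under stated assumptions: struct write_reservation, align_up() and account_write() are made-up names, the sector size is assumed to be a power of two, and the release_bytes formula is a guess that mirrors the removed page-based code, since the quoted patch is cut off at that point.

```c
#include <stdint.h>

/* Hypothetical container for the values the patch computes per iteration. */
struct write_reservation {
	uint64_t sector_offset;  /* offset of pos inside its sector */
	uint64_t reserve_bytes;  /* bytes reserved, sector aligned */
	uint64_t num_sectors;    /* reserved sectors */
	uint64_t dirty_sectors;  /* sectors actually dirtied by the copy */
	uint64_t release_bytes;  /* assumed: reserved but unused bytes */
};

/* Stand-in for the kernel's ALIGN(); valid for power-of-two alignments. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static struct write_reservation
account_write(uint64_t pos, uint64_t write_bytes, uint64_t copied,
	      unsigned int blkbits)
{
	struct write_reservation r;
	uint64_t sectorsize = UINT64_C(1) << blkbits;

	r.sector_offset = pos & (sectorsize - 1);
	r.reserve_bytes = align_up(write_bytes + r.sector_offset, sectorsize);
	r.num_sectors = r.reserve_bytes >> blkbits;
	r.dirty_sectors = align_up(copied + r.sector_offset, sectorsize) >> blkbits;

	/* Assumption: release whatever was reserved beyond the dirtied sectors. */
	r.release_bytes = 0;
	if (r.num_sectors > r.dirty_sectors)
		r.release_bytes = (r.num_sectors - r.dirty_sectors) << blkbits;
	return r;
}
```

For example, with 4096-byte sectors (blkbits = 12), a write of 8000 bytes at pos 5000 has sector_offset 904 and reserves 12288 bytes (3 sectors); if only 3000 bytes get copied, one sector is dirty and, under the assumption above, 8192 bytes would be released.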
Re: [PATCH 01/11] Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size
On Friday 07 Aug 2015 11:08:30 Qu Wenruo wrote:
> Hi Chandan,
>
> Thanks for your effort to implement sub-pagesize block size support.
>
> These cleanups look quite good, but I have some small readability
> recommendations inlined below.
>
> Chandan Rajendra wrote on 2015/08/06 15:40 +0530:
>> Currently, the code reserves/releases extents in multiples of
>> PAGE_CACHE_SIZE units. Fix this by doing reservation/releases in block
>> size units.
>>
>> Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
>> ---
>>  fs/btrfs/file.c | 40
>>  1 file changed, 28 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 795d754..e3b2b3c 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -1362,16 +1362,19 @@ fail:
>>  static noinline int
>>  lock_and_cleanup_extent_if_need(struct inode *inode, struct page **pages,
>>  				size_t num_pages, loff_t pos,
>> +				size_t write_bytes,
>>  				u64 *lockstart, u64 *lockend,
>>  				struct extent_state **cached_state)
>>  {
>> +	struct btrfs_root *root = BTRFS_I(inode)->root;
>>  	u64 start_pos;
>>  	u64 last_pos;
>>  	int i;
>>  	int ret = 0;
>>
>> -	start_pos = pos & ~((u64)PAGE_CACHE_SIZE - 1);
>> -	last_pos = start_pos + ((u64)num_pages << PAGE_CACHE_SHIFT) - 1;
>> +	start_pos = pos & ~((u64)root->sectorsize - 1);
>
> Why not just round_down(pos, root->sectorsize)?
> A hard-coded alignment mask is never that easy to read.

Qu Wenruo,

Thanks for pointing it out. I will replace them with round_[down,up]
calls and post V2.

>> +	last_pos = start_pos
>> +		+ ALIGN(pos + write_bytes - start_pos, root->sectorsize) - 1;
>
> Maybe this is just a matter of preference, but I'd prefer round_up() over
> ALIGN(), as with ALIGN() I sometimes still need to figure out whether it
> rounds up or rounds down.
if (start_pos inode-i_size) { struct btrfs_ordered_extent *ordered; @@ -1489,6 +1492,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, while (iov_iter_count(i) 0) { size_t offset = pos (PAGE_CACHE_SIZE - 1); + size_t sector_offset; size_t write_bytes = min(iov_iter_count(i), nrptrs * (size_t)PAGE_CACHE_SIZE - offset); @@ -1497,6 +1501,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, size_t reserve_bytes; size_t dirty_pages; size_t copied; + size_t dirty_sectors; + size_t num_sectors; WARN_ON(num_pages nrptrs); @@ -1509,8 +1515,12 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, break; } - reserve_bytes = num_pages PAGE_CACHE_SHIFT; + sector_offset = pos (root-sectorsize - 1); Same here. Thanks, Qu + reserve_bytes = ALIGN(write_bytes + sector_offset, + root-sectorsize); + ret = btrfs_check_data_free_space(inode, reserve_bytes, write_bytes); + if (ret == -ENOSPC (BTRFS_I(inode)-flags (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC))) { @@ -1523,7 +1533,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, */ num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_CACHE_SIZE); - reserve_bytes = num_pages PAGE_CACHE_SHIFT; + reserve_bytes = ALIGN(write_bytes + sector_offset, + root-sectorsize); + ret = 0; } else { ret = -ENOSPC; @@ -1558,8 +1570,8 @@ again: break; ret = lock_and_cleanup_extent_if_need(inode, pages, num_pages, - pos, lockstart, lockend, - cached_state); + pos, write_bytes, lockstart, + lockend, cached_state); if (ret 0) { if (ret == -EAGAIN) goto again; @@ -1595,9 +1607,14 @@ again: * we still have an outstanding extent for the chunk we actually * managed to copy. */ -
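The readability point discussed in this thread can be checked with a few lines of userspace C. This is a sketch, not the kernel macros: round_down_pow2(), round_up_pow2() and lock_range() are hypothetical names, and the helpers are only correct when the alignment is a power of two, which is assumed to hold for the sector size.

```c
#include <stdint.h>

/* Same result as the open-coded mask `pos & ~(sectorsize - 1)`. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* Same result as ALIGN(x, align) for a power-of-two align. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Compute the sector-aligned range covering a write of write_bytes bytes
 * at pos, following the start_pos/last_pos arithmetic in the quoted hunk.
 */
static void lock_range(uint64_t pos, uint64_t write_bytes, uint64_t sectorsize,
		       uint64_t *start_pos, uint64_t *last_pos)
{
	*start_pos = round_down_pow2(pos, sectorsize);
	*last_pos = *start_pos +
		    round_up_pow2(pos + write_bytes - *start_pos, sectorsize) - 1;
}
```

With 4096-byte sectors, a 100-byte write at pos 5000 gives start_pos = 4096 and last_pos = 8191, while a 200-byte write at pos 4000 straddles two sectors and gives the range 0 to 8191.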
RE: [PATCH V2] btrfs-progs: add newline to some error messages
Reviewed-by: Zhao Lei zhao...@cn.fujitsu.com Thanks Zhaolei -Original Message- From: Tsutomu Itoh [mailto:t-i...@jp.fujitsu.com] Sent: Friday, August 07, 2015 8:20 AM To: linux-btrfs@vger.kernel.org Cc: Zhao Lei Subject: [PATCH V2] btrfs-progs: add newline to some error messages Added a missing newline to some error messages. Also printf() was changed to fprintf(stderr) for error message. Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com --- btrfs-corrupt-block.c | 2 +- cmds-check.c | 4 ++-- cmds-send.c | 4 ++-- dir-item.c| 6 +++--- free-space-cache.c| 24 +++- mkfs.c| 2 +- 6 files changed, 24 insertions(+), 18 deletions(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 1a2aa23..ea871f4 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -1010,7 +1010,7 @@ int find_chunk_offset(struct btrfs_root *root, goto out; } if (ret 0) { - fprintf(stderr, Error searching chunk); + fprintf(stderr, Error searching chunk\n); goto out; } out: diff --git a/cmds-check.c b/cmds-check.c index 50bb6f3..d0ffc94 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -2399,7 +2399,7 @@ static int repair_inode_nlinks(struct btrfs_trans_handle *trans, BTRFS_FIRST_FREE_OBJECTID, lost_found_ino, mode); if (ret 0) { - fprintf(stderr, Failed to create '%s' dir: %s, + fprintf(stderr, Failed to create '%s' dir: %s\n, dir_name, strerror(-ret)); goto out; } @@ -2427,7 +2427,7 @@ static int repair_inode_nlinks(struct btrfs_trans_handle *trans, } if (ret 0) { fprintf(stderr, - Failed to link the inode %llu to %s dir: %s, + Failed to link the inode %llu to %s dir: %s\n, rec-ino, dir_name, strerror(-ret)); goto out; } diff --git a/cmds-send.c b/cmds-send.c index a0b7f95..6f2f340 100644 --- a/cmds-send.c +++ b/cmds-send.c @@ -193,13 +193,13 @@ static int write_buf(int fd, const void *buf, int size) ret = write(fd, (char*)buf + pos, size - pos); if (ret 0) { ret = -errno; - fprintf(stderr, ERROR: failed to dump stream. %s, + fprintf(stderr, ERROR: failed to dump stream. 
%s\n, strerror(-ret)); goto out; } if (!ret) { ret = -EIO; - fprintf(stderr, ERROR: failed to dump stream. %s, + fprintf(stderr, ERROR: failed to dump stream. %s\n, strerror(-ret)); goto out; } diff --git a/dir-item.c b/dir-item.c index a5bf861..f3ad98f 100644 --- a/dir-item.c +++ b/dir-item.c @@ -285,7 +285,7 @@ int verify_dir_item(struct btrfs_root *root, u8 type = btrfs_dir_type(leaf, dir_item); if (type = BTRFS_FT_MAX) { - fprintf(stderr, invalid dir item type: %d, + fprintf(stderr, invalid dir item type: %d\n, (int)type); return 1; } @@ -294,7 +294,7 @@ int verify_dir_item(struct btrfs_root *root, namelen = XATTR_NAME_MAX; if (btrfs_dir_name_len(leaf, dir_item) namelen) { - fprintf(stderr, invalid dir item name len: %u, + fprintf(stderr, invalid dir item name len: %u\n, (unsigned)btrfs_dir_data_len(leaf, dir_item)); return 1; } @@ -302,7 +302,7 @@ int verify_dir_item(struct btrfs_root *root, /* BTRFS_MAX_XATTR_SIZE is the same for all dir items */ if ((btrfs_dir_data_len(leaf, dir_item) + btrfs_dir_name_len(leaf, dir_item)) BTRFS_MAX_XATTR_SIZE(root)) { - fprintf(stderr, invalid dir item name + data len: %u + %u, + fprintf(stderr, invalid dir item name + data len: %u + %u\n, (unsigned)btrfs_dir_name_len(leaf, dir_item), (unsigned)btrfs_dir_data_len(leaf, dir_item)); return 1; diff --git a/free-space-cache.c b/free-space-cache.c index 67f00fd..19ab0c9 100644 --- a/free-space-cache.c +++ b/free-space-cache.c @@ -107,7 +107,8 @@ static int io_ctl_prepare_pages(struct io_ctl *io_ctl, struct btrfs_root *root, ret = btrfs_search_slot(NULL, root, key, path, 0, 0); if (ret) { - printf(Couldn't find file extent item for free space inode + fprintf(stderr, +Couldn't find file extent item for free
Re: Removing bad hdd from btrfs volume
Peter Foley posted on Thu, 06 Aug 2015 15:17:04 -0700 as excerpted:

> I have a btrfs volume that spans multiple disks (no raid, just
> single), and earlier this morning I hit some hardware problems with
> one of the disks. I tried btrfs dev del /dev/sda1 /, but btrfs was
> unable to migrate the 1gb that appears to be causing the read errors.
> See http://sprunge.us/aeZC
>
> Is there some way to figure out which file(s) are affected, and if
> they are stuff I don't care about, is there some way to force btrfs
> to lose the 1gb it can't copy off of the failing hdd?

Of course that's the classic raid0 trap (with btrfs multi-device single
being effectively a raid0 with really big stripes). Raid0 is (ideally)
never supposed to be used for data that isn't throw-away, either because
it's literally no-care data, or because there are backups kept
appropriately updated, as it's generally considered as good as dead the
moment one device fails or even really starts to go bad.

So ideally, with one device starting to go bad, you scrap the entire
filesystem, remove the bad device (or trigger sector remap and reuse, but
that's dangerous as once sectors start to go, generally the badness
spreads so the entire device can't be considered trustworthy again), and
mkfs a new filesystem on the remaining devices, with a replacement device
thrown in as well if desired.

But sometimes the world isn't ideal; on the arguably more practical
side...

Most of my btrfs are raid1, both data/metadata, with the remainder being
mixed-bg dup, so I've never tried this on single, personally, but...

First, you didn't mention versions so be sure you're current. btrfs-progs
v4.1.2 is current on the user side; kernel 4.1.x (which you appear to
have, based on the dmesg, BTW, gentoo here too =:^), or 4.2-rc5+ since
4.2 is close to release now, is current on the kernel side.

Try btrfs scrub. Assuming a current btrfs-progs, that should correct
errors in the metadata, which should be raid1 and thus have a second
hopefully valid copy to read from. It should detect but not be able to
correct errors in the single mode data, but should tell you what files
the errors are in (I believe very old btrfs-progs scrub did not).

Armed with a list of the files with errors, you should be able to delete
them. Once all such files are deleted, the 1 GiB chunk that they were in
should be empty, and a btrfs balance -dusage=0 should eliminate it.

At that point a btrfs dev del should work.

That's the theory, anyway. As I said, I've not tried it myself. But it's
what I'd try if I did have single-mode data on anything and found myself
in that situation.

--
Duncan - List replies preferred. No HTML msgs.
[PATCH V2] btrfs-progs: add newline to some error messages
Added a missing newline to some error messages. Also printf() was changed to fprintf(stderr) for error message. Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com --- btrfs-corrupt-block.c | 2 +- cmds-check.c | 4 ++-- cmds-send.c | 4 ++-- dir-item.c| 6 +++--- free-space-cache.c| 24 +++- mkfs.c| 2 +- 6 files changed, 24 insertions(+), 18 deletions(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 1a2aa23..ea871f4 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -1010,7 +1010,7 @@ int find_chunk_offset(struct btrfs_root *root, goto out; } if (ret 0) { - fprintf(stderr, Error searching chunk); + fprintf(stderr, Error searching chunk\n); goto out; } out: diff --git a/cmds-check.c b/cmds-check.c index 50bb6f3..d0ffc94 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -2399,7 +2399,7 @@ static int repair_inode_nlinks(struct btrfs_trans_handle *trans, BTRFS_FIRST_FREE_OBJECTID, lost_found_ino, mode); if (ret 0) { - fprintf(stderr, Failed to create '%s' dir: %s, + fprintf(stderr, Failed to create '%s' dir: %s\n, dir_name, strerror(-ret)); goto out; } @@ -2427,7 +2427,7 @@ static int repair_inode_nlinks(struct btrfs_trans_handle *trans, } if (ret 0) { fprintf(stderr, - Failed to link the inode %llu to %s dir: %s, + Failed to link the inode %llu to %s dir: %s\n, rec-ino, dir_name, strerror(-ret)); goto out; } diff --git a/cmds-send.c b/cmds-send.c index a0b7f95..6f2f340 100644 --- a/cmds-send.c +++ b/cmds-send.c @@ -193,13 +193,13 @@ static int write_buf(int fd, const void *buf, int size) ret = write(fd, (char*)buf + pos, size - pos); if (ret 0) { ret = -errno; - fprintf(stderr, ERROR: failed to dump stream. %s, + fprintf(stderr, ERROR: failed to dump stream. %s\n, strerror(-ret)); goto out; } if (!ret) { ret = -EIO; - fprintf(stderr, ERROR: failed to dump stream. %s, + fprintf(stderr, ERROR: failed to dump stream. 
%s\n, strerror(-ret)); goto out; } diff --git a/dir-item.c b/dir-item.c index a5bf861..f3ad98f 100644 --- a/dir-item.c +++ b/dir-item.c @@ -285,7 +285,7 @@ int verify_dir_item(struct btrfs_root *root, u8 type = btrfs_dir_type(leaf, dir_item); if (type = BTRFS_FT_MAX) { - fprintf(stderr, invalid dir item type: %d, + fprintf(stderr, invalid dir item type: %d\n, (int)type); return 1; } @@ -294,7 +294,7 @@ int verify_dir_item(struct btrfs_root *root, namelen = XATTR_NAME_MAX; if (btrfs_dir_name_len(leaf, dir_item) namelen) { - fprintf(stderr, invalid dir item name len: %u, + fprintf(stderr, invalid dir item name len: %u\n, (unsigned)btrfs_dir_data_len(leaf, dir_item)); return 1; } @@ -302,7 +302,7 @@ int verify_dir_item(struct btrfs_root *root, /* BTRFS_MAX_XATTR_SIZE is the same for all dir items */ if ((btrfs_dir_data_len(leaf, dir_item) + btrfs_dir_name_len(leaf, dir_item)) BTRFS_MAX_XATTR_SIZE(root)) { - fprintf(stderr, invalid dir item name + data len: %u + %u, + fprintf(stderr, invalid dir item name + data len: %u + %u\n, (unsigned)btrfs_dir_name_len(leaf, dir_item), (unsigned)btrfs_dir_data_len(leaf, dir_item)); return 1; diff --git a/free-space-cache.c b/free-space-cache.c index 67f00fd..19ab0c9 100644 --- a/free-space-cache.c +++ b/free-space-cache.c @@ -107,7 +107,8 @@ static int io_ctl_prepare_pages(struct io_ctl *io_ctl, struct btrfs_root *root, ret = btrfs_search_slot(NULL, root, key, path, 0, 0); if (ret) { - printf(Couldn't find file extent item for free space inode + fprintf(stderr, + Couldn't find file extent item for free space inode %Lu\n, ino); btrfs_release_path(path); return -EINVAL; @@ -138,7 +139,7 @@ static int io_ctl_prepare_pages(struct io_ctl *io_ctl, struct btrfs_root *root, struct
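The cmds-send.c hunk in this patch touches a write loop whose two failure paths report on stderr. A minimal userspace sketch of such a loop, with each message terminated by a newline, could look like the following. The function name write_all is hypothetical and this is not the btrfs-progs source; only the message text is taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write the whole buffer to fd, retrying after short writes.
 * Returns 0 on success or a negative errno value on failure.
 * Errors go to stderr and end with '\n' so that the next message
 * (or the shell prompt) starts on a fresh line.
 */
static int write_all(int fd, const void *buf, size_t size)
{
	size_t pos = 0;

	while (pos < size) {
		ssize_t ret = write(fd, (const char *)buf + pos, size - pos);

		if (ret < 0) {
			int err = errno;

			fprintf(stderr, "ERROR: failed to dump stream. %s\n",
				strerror(err));
			return -err;
		}
		if (ret == 0) {
			/* No progress at all: report it as an I/O error. */
			fprintf(stderr, "ERROR: failed to dump stream. %s\n",
				strerror(EIO));
			return -EIO;
		}
		pos += (size_t)ret;
	}
	return 0;
}
```

Sending errors to stderr rather than stdout also keeps them out of the data stream when the output is piped, which is the reason the patch switches the free-space-cache.c messages from printf() to fprintf(stderr, ...).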
Re: Data single *and* raid?
Hendrik Friedel wrote on 2015/08/06 20:57 +0200:
> Hello Hugo,
> hello Chris,
>
> thanks for your advice. Now I am here:
>
> btrfs balance start -dprofiles=single -mprofiles=raid1 /mnt/__Complete_Disk/
> Done, had to relocate 0 out of 3939 chunks
>
> root@homeserver:/mnt/__Complete_Disk# btrfs fi show
> Label: none uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
> Total devices 3 FS bytes used 3.78TiB
> devid 1 size 2.73TiB used 2.72TiB path /dev/sde
> devid 2 size 2.73TiB used 2.23TiB path /dev/sdc
> devid 3 size 2.73TiB used 2.73TiB path /dev/sdd
> btrfs-progs v4.1.1
>
> So, that looks good.
>
> But then:
> root@homeserver:/mnt/__Complete_Disk# btrfs fi df /mnt/__Complete_Disk/
> Data, RAID5: total=3.83TiB, used=3.78TiB
> System, RAID5: total=32.00MiB, used=576.00KiB
> Metadata, RAID5: total=6.46GiB, used=4.84GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

GlobalReserve is not a chunk type, it just means a range of metadata
reserved for overcommitting. And it's always single.

Personally, I don't think it should be output by the fi df command, as
it's at a higher level than chunks.

At least in your case, there is nothing to worry about.

Thanks,
Qu

> Is the RAID5 expected here?
> I did not yet run:
> btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft /mnt/new_storage/
>
> Regards,
> Hendrik
>
> On 01.08.2015 22:44, Chris Murphy wrote:
>> On Sat, Aug 1, 2015 at 2:32 PM, Hugo Mills h...@carfax.org.uk wrote:
>>> On Sat, Aug 01, 2015 at 10:09:35PM +0200, Hendrik Friedel wrote:
>>>> Hello,
>>>>
>>>> I converted an array to raid5 by
>>>> btrfs device add /dev/sdd /mnt/new_storage
>>>> btrfs device add /dev/sdc /mnt/new_storage
>>>> btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/new_storage/
>>>>
>>>> The Balance went through. But now:
>>>> Label: none uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
>>>> Total devices 3 FS bytes used 5.28TiB
>>>> devid 1 size 2.73TiB used 2.57TiB path /dev/sde
>>>> devid 2 size 2.73TiB used 2.73TiB path /dev/sdc
>>>> devid 3 size 2.73TiB used 2.73TiB path /dev/sdd
>>>> btrfs-progs v4.1.1
>>>>
>>>> Already the 2.57TiB is a bit surprising:
>>>> root@homeserver:/mnt# btrfs fi df /mnt/new_storage/
>>>> Data, single: total=2.55TiB, used=2.55TiB
>>>> Data, RAID5: total=2.73TiB, used=2.72TiB
>>>> System, RAID5: total=32.00MiB, used=736.00KiB
>>>> Metadata, RAID1: total=6.00GiB, used=5.33GiB
>>>> Metadata, RAID5: total=3.00GiB, used=2.99GiB
>>>
>>> Looking at the btrfs fi show output, you've probably run out of
>>> space during the conversion, probably due to an uneven distribution
>>> of the original single chunks.
>>>
>>> I think I would suggest balancing the single chunks, and trying the
>>> conversion (of the unconverted parts) again:
>>>
>>> # btrfs balance start -dprofiles=single -mprofiles=raid1 /mnt/new_storage/
>>> # btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft /mnt/new_storage/
>>
>> Yep I bet that's it also. btrfs fi usage might be better at exposing
>> this case.
Re: [PATCH] btrfs-progs: add newline to some error messages
On 2015/08/06 15:07, Zhao Lei wrote:
Hi, Itoh-san

-----Original Message-----
From: Tsutomu Itoh [mailto:t-i...@jp.fujitsu.com]
Sent: Thursday, August 06, 2015 12:01 PM
To: Zhao Lei; linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs-progs: add newline to some error messages

On 2015/08/06 12:51, Zhao Lei wrote:
Hi, Itoh

-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Tsutomu Itoh
Sent: Thursday, August 06, 2015 11:06 AM
To: linux-btrfs@vger.kernel.org
Subject: [PATCH] btrfs-progs: add newline to some error messages

Added a missing newline to some error messages.

Good find!

It seems more code needs to be fixed, as:

# cat mkfs.c | tr -d '\n' | grep -o -w 'f\?printf([^(]*);' | sed 's/f\?printf[[:blank:]]*(\(stderr,\|\)[[:blank:]]*\(.*\)[,)].*/\2/g' | grep -v '\\n'
symlink too long for %s
Incompat features: %s
#

It's OK.
printf("Incompat features: %s", features_buf);
printf("\n");

# cat utils.c | tr -d '\n' | grep -o -w 'f\?printf([^(]*);' | sed 's/f\?printf[[:blank:]]*(\(stderr,\|\)[[:blank:]]*\(.*\)[,)].*/\2/g' | grep -v '\\n'
ERROR: DUP for data is allowed only in mixed mode
%s [y/N]: *1
#
*1: This one is not a problem and should be ignored.

Already fixed by David in devel branch.

Got it.

I ran the above script on all .c files; nearly all cases are fixed by
this patch, except these in free-space-cache.c:

Duplicate entries in free space cache, dumping
Duplicate entries in free space cache, dumping
block group %llu has wrong amount of free space

The above messages seem to have these problems:
1: lack of '\n'
2: better to use fprintf(stderr,
3: the message says "dumping", but I haven't seen any dump code in the source.
I will send V2 patch, soon, Thanks, Tsutomu Thanks Zhaolei Thanks, Tsutomu Thanks Zhaolei Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com --- btrfs-corrupt-block.c | 2 +- cmds-check.c | 4 ++-- cmds-send.c | 4 ++-- dir-item.c| 6 +++--- mkfs.c| 2 +- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 1a2aa23..ea871f4 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -1010,7 +1010,7 @@ int find_chunk_offset(struct btrfs_root *root, goto out; } if (ret 0) { - fprintf(stderr, Error searching chunk); + fprintf(stderr, Error searching chunk\n); goto out; } out: diff --git a/cmds-check.c b/cmds-check.c index dd2fce3..0ddf57c 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -2398,7 +2398,7 @@ static int repair_inode_nlinks(struct btrfs_trans_handle *trans, BTRFS_FIRST_FREE_OBJECTID, lost_found_ino, mode); if (ret 0) { - fprintf(stderr, Failed to create '%s' dir: %s, + fprintf(stderr, Failed to create '%s' dir: %s\n, dir_name, strerror(-ret)); goto out; } @@ -2426,7 +2426,7 @@ static int repair_inode_nlinks(struct btrfs_trans_handle *trans, } if (ret 0) { fprintf(stderr, - Failed to link the inode %llu to %s dir: %s, + Failed to link the inode %llu to %s dir: %s\n, rec-ino, dir_name, strerror(-ret)); goto out; } diff --git a/cmds-send.c b/cmds-send.c index 20bba18..78ee54c 100644 --- a/cmds-send.c +++ b/cmds-send.c @@ -192,13 +192,13 @@ static int write_buf(int fd, const void *buf, int size) ret = write(fd, (char*)buf + pos, size - pos); if (ret 0) { ret = -errno; - fprintf(stderr, ERROR: failed to dump stream. %s, + fprintf(stderr, ERROR: failed to dump stream. %s\n, strerror(-ret)); goto out; } if (!ret) { ret = -EIO; - fprintf(stderr, ERROR: failed to dump stream. %s, + fprintf(stderr, ERROR: failed to dump stream. 
%s\n, strerror(-ret)); goto out; } diff --git a/dir-item.c b/dir-item.c index a5bf861..f3ad98f 100644 --- a/dir-item.c +++ b/dir-item.c @@ -285,7 +285,7 @@ int verify_dir_item(struct btrfs_root *root, u8 type = btrfs_dir_type(leaf, dir_item); if (type = BTRFS_FT_MAX) { - fprintf(stderr, invalid dir item type: %d, + fprintf(stderr, invalid dir item type: %d\n, (int)type); return 1; } @@ -294,7 +294,7 @@ int verify_dir_item(struct btrfs_root
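The check discussed in this thread can be approximated with a much simpler filter. The following is only a sketch, not the pipeline Zhao Lei posted: the helper name find_missing_newline is made up for illustration, and it assumes the format string starts on the same line as the fprintf(stderr, call, so it will miss calls like the cmds-check.c one where the string sits on the next line.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): list fprintf(stderr, "...")
# calls whose format string does not end in \n. It only sees calls whose
# format string starts on the same line as the fprintf itself.
find_missing_newline() {
    grep -n 'fprintf(stderr, *"' "$1" | grep -v '\\n"'
}

# Small sample built from two calls touched by the patch above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
fprintf(stderr, "Error searching chunk");
fprintf(stderr, "invalid dir item type: %d\n",
EOF

# Prints only the first call, prefixed with its line number.
find_missing_newline "$sample"
rm -f "$sample"
```

The quoted tr/grep/sed pipeline joins the whole file into one line first, which is why it can also catch multi-line calls; the sketch trades that coverage for readability.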
[PATCH] btrfs: Remove unnecessary variants in relocation.c
These arguments are not used in functions, remove them for cleanup and make kernel stack happy.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/ctree.h | 3 +--
 fs/btrfs/relocation.c | 13 +
 fs/btrfs/transaction.c | 2 +-
 3 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f57e6ca..f335c18 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -4185,8 +4185,7 @@ int btrfs_reloc_clone_csums(struct inode *inode, u64 file_pos, u64 len);
 int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root, struct extent_buffer *buf,
 			  struct extent_buffer *cow);
-void btrfs_reloc_pre_snapshot(struct btrfs_trans_handle *trans,
-			      struct btrfs_pending_snapshot *pending,
+void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
 			      u64 *bytes_to_reserve);
 int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
 			      struct btrfs_pending_snapshot *pending);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 4698928..303babe 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2523,8 +2523,7 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
  * counted. return -ENOENT if the block is root of reloc tree.
  */
 static noinline_for_stack
-struct btrfs_root *select_one_root(struct btrfs_trans_handle *trans,
-				   struct backref_node *node)
+struct btrfs_root *select_one_root(struct backref_node *node)
 {
 	struct backref_node *next;
 	struct btrfs_root *root;
@@ -2912,7 +2911,7 @@ static int relocate_tree_block(struct btrfs_trans_handle *trans,
 		return 0;
 
 	BUG_ON(node->processed);
-	root = select_one_root(trans, node);
+	root = select_one_root(node);
 	if (root == ERR_PTR(-ENOENT)) {
 		update_processed_blocks(rc, node);
 		goto out;
@@ -3755,8 +3754,7 @@ out:
  * helper to find next unprocessed extent
  */
 static noinline_for_stack
-int find_next_extent(struct btrfs_trans_handle *trans,
-		     struct reloc_control *rc, struct btrfs_path *path,
+int find_next_extent(struct reloc_control *rc, struct btrfs_path *path,
 		     struct btrfs_key *extent_key)
 {
 	struct btrfs_key key;
@@ -3951,7 +3949,7 @@ restart:
 			continue;
 		}
 
-		ret = find_next_extent(trans, rc, path, &key);
+		ret = find_next_extent(rc, path, &key);
 		if (ret < 0)
 			err = ret;
 		if (ret != 0)
@@ -4596,8 +4594,7 @@ int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
  * called before creating snapshot. it calculates metadata reservation
  * requried for relocating tree blocks in the snapshot
  */
-void btrfs_reloc_pre_snapshot(struct btrfs_trans_handle *trans,
-			      struct btrfs_pending_snapshot *pending,
+void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
 			      u64 *bytes_to_reserve)
 {
 	struct btrfs_root *root;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c0f18e7..049613c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1301,7 +1301,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	 */
 	btrfs_set_skip_qgroup(trans, objectid);
 
-	btrfs_reloc_pre_snapshot(trans, pending, &to_reserve);
+	btrfs_reloc_pre_snapshot(pending, &to_reserve);
 
 	if (to_reserve > 0) {
 		pending->error = btrfs_block_rsv_add(root,
--
1.8.5.1
Re: bedup --defrag freezing
On 2015-08-05 17:45, Konstantin Svist wrote:
Hi, I've been running btrfs on Fedora for a while now, with bedup --defrag running in a night-time cronjob. Last few runs seem to have gotten stuck, without possibility of even killing the process (kill -9 doesn't work) -- all I could do is hard power cycle. Did something change recently? Is bedup simply too out of date? What should I use to de-duplicate across snapshots instead? Etc.?

AFAIK, bedup hasn't been actively developed for quite a while (I'm actually kind of surprised it runs with the newest btrfs-progs). Personally, I'd suggest using duperemove (https://github.com/markfasheh/duperemove).
Re: [PATCH] fstests: generic test for fsync of file with multiple links
On Thu, Aug 06, 2015 at 05:11:30AM +0100, fdman...@kernel.org wrote:
From: Filipe Manana fdman...@suse.com

Test that when we have a file with multiple hard links belonging to different parent directories, if we remove one of those links, fsync the file using one of its other links (that has a parent directory different from the one we removed a link from), power fail and then replay the fsync log/journal, the hard link we removed is not available anymore and all the filesystem metadata is in a consistent state.

Looks good to me, just one minor question below

This test is motivated by an issue found in btrfs, where the test fails with:

generic/107 2s ... - output mismatch (see .../results/generic/107.out.bad)
    --- tests/generic/107.out 2015-08-04 09:47:46.922131256 +0100
    +++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad
    @@ -1,3 +1,5 @@
     QA output created by 107
     Entries in testdir:
     foo2
    +foo3
    +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
    ...
    (Run 'diff -u tests/generic/107.out .../generic/107.out.bad' to see the entire diff)
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see .../generic/107.full)
_check_dmesg: something found in dmesg (see .../generic/107.dmesg)

$ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
*** fsck.btrfs output ***
checking extents
checking free space cache
checking fs roots
root 5 inode 257 errors 200, dir isize wrong
unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 \
    errors 5, no dir item, no inode ref

$ cat /home/fdmanana/git/hub/xfstests/results//generic/107.dmesg
(...)
[188897.707311] BTRFS info (device dm-0): failed to delete reference to \
    foo3, inode 258 parent 257
[188897.711345] ------------[ cut here ]------------
[188897.713369] WARNING: CPU: 10 PID: 19452 at fs/btrfs/inode.c:3956 \
    __btrfs_unlink_inode+0x182/0x35a [btrfs]()
[188897.717661] BTRFS: Transaction aborted (error -2)
(...)
[188897.747898] Call Trace:
[188897.748519] [8145f077] dump_stack+0x4f/0x7b
[188897.749602] [81095de5] ? console_unlock+0x356/0x3a2
[188897.750682] [8104b3b0] warn_slowpath_common+0xa1/0xbb
[188897.751936] [a04c5d09] ? __btrfs_unlink_inode+0x182/0x35a [btrfs]
[188897.753485] [8104b410] warn_slowpath_fmt+0x46/0x48
[188897.754781] [a04c5d09] __btrfs_unlink_inode+0x182/0x35a [btrfs]
[188897.756295] [a04c6e8f] btrfs_unlink_inode+0x1e/0x40 [btrfs]
[188897.757692] [a04c6f11] btrfs_unlink+0x60/0x9b [btrfs]
[188897.758978] [8116fb48] vfs_unlink+0x9c/0xed
[188897.760151] [81173481] do_unlinkat+0x12b/0x1fb
[188897.761354] [81253855] ? lockdep_sys_exit_thunk+0x12/0x14
[188897.762692] [81174056] SyS_unlinkat+0x29/0x2b
[188897.763741] [81465197] system_call_fastpath+0x12/0x6f
[188897.764894] ---[ end trace bbfddacb7aaada8c ]---
[188897.765801] BTRFS warning (device dm-0): __btrfs_unlink_inode:3956: \
    Aborting unused transaction(No such entry).

Tested against ext3/4, xfs, reiserfs and f2fs too, and all these filesystems currently pass this test (on a 4.1 linux kernel at least).

The btrfs issue is fixed by the linux kernel patch titled: Btrfs: fix stale dir entries after removing a link and fsync.

Signed-off-by: Filipe Manana fdman...@suse.com
---
 tests/generic/107 | 99 +++
 tests/generic/107.out | 3 ++
 tests/generic/group | 1 +
 3 files changed, 103 insertions(+)
 create mode 100755 tests/generic/107
 create mode 100644 tests/generic/107.out

diff --git a/tests/generic/107 b/tests/generic/107
new file mode 100755
index 000..7d107d7
--- /dev/null
+++ b/tests/generic/107
@@ -0,0 +1,99 @@
+#! /bin/bash
+# FSQA Test No. 107
+#
+# Test that when we have a file with multiple hard links belonging to different
+# parent directories, if we remove one of those links, fsync the file using one
+# of its other links (that has a parent directory different from the one we
+# removed a link from), power fail and then replay the fsync log/journal, the
+# hard link we removed is not available anymore and all the filesystem metadata
+# is in a consistent state.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY
Re: [PATCH] fstests: generic test for fsync of file with multiple links
On Thu, Aug 6, 2015 at 12:46 PM, Eryu Guan eg...@redhat.com wrote:
On Thu, Aug 06, 2015 at 05:11:30AM +0100, fdman...@kernel.org wrote:
From: Filipe Manana fdman...@suse.com

Test that when we have a file with multiple hard links belonging to different parent directories, if we remove one of those links, fsync the file using one of its other links (that has a parent directory different from the one we removed a link from), power fail and then replay the fsync log/journal, the hard link we removed is not available anymore and all the filesystem metadata is in a consistent state.

Looks good to me, just one minor question below

This test is motivated by an issue found in btrfs, where the test fails with:

generic/107 2s ... - output mismatch (see .../results/generic/107.out.bad)
    --- tests/generic/107.out 2015-08-04 09:47:46.922131256 +0100
    +++ /home/fdmanana/git/hub/xfstests/results//generic/107.out.bad
    @@ -1,3 +1,5 @@
     QA output created by 107
     Entries in testdir:
     foo2
    +foo3
    +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
    ...
    (Run 'diff -u tests/generic/107.out .../generic/107.out.bad' to see the entire diff)
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent (see .../generic/107.full)
_check_dmesg: something found in dmesg (see .../generic/107.dmesg)

$ cat /home/fdmanana/git/hub/xfstests/results//generic/107.full
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
*** fsck.btrfs output ***
checking extents
checking free space cache
checking fs roots
root 5 inode 257 errors 200, dir isize wrong
unresolved ref dir 257 index 3 namelen 4 name foo3 filetype 1 \
    errors 5, no dir item, no inode ref

$ cat /home/fdmanana/git/hub/xfstests/results//generic/107.dmesg
(...)
[188897.707311] BTRFS info (device dm-0): failed to delete reference to \
    foo3, inode 258 parent 257
[188897.711345] ------------[ cut here ]------------
[188897.713369] WARNING: CPU: 10 PID: 19452 at fs/btrfs/inode.c:3956 \
    __btrfs_unlink_inode+0x182/0x35a [btrfs]()
[188897.717661] BTRFS: Transaction aborted (error -2)
(...)
[188897.747898] Call Trace:
[188897.748519] [8145f077] dump_stack+0x4f/0x7b
[188897.749602] [81095de5] ? console_unlock+0x356/0x3a2
[188897.750682] [8104b3b0] warn_slowpath_common+0xa1/0xbb
[188897.751936] [a04c5d09] ? __btrfs_unlink_inode+0x182/0x35a [btrfs]
[188897.753485] [8104b410] warn_slowpath_fmt+0x46/0x48
[188897.754781] [a04c5d09] __btrfs_unlink_inode+0x182/0x35a [btrfs]
[188897.756295] [a04c6e8f] btrfs_unlink_inode+0x1e/0x40 [btrfs]
[188897.757692] [a04c6f11] btrfs_unlink+0x60/0x9b [btrfs]
[188897.758978] [8116fb48] vfs_unlink+0x9c/0xed
[188897.760151] [81173481] do_unlinkat+0x12b/0x1fb
[188897.761354] [81253855] ? lockdep_sys_exit_thunk+0x12/0x14
[188897.762692] [81174056] SyS_unlinkat+0x29/0x2b
[188897.763741] [81465197] system_call_fastpath+0x12/0x6f
[188897.764894] ---[ end trace bbfddacb7aaada8c ]---
[188897.765801] BTRFS warning (device dm-0): __btrfs_unlink_inode:3956: \
    Aborting unused transaction(No such entry).

Tested against ext3/4, xfs, reiserfs and f2fs too, and all these filesystems currently pass this test (on a 4.1 linux kernel at least).

The btrfs issue is fixed by the linux kernel patch titled: Btrfs: fix stale dir entries after removing a link and fsync.

Signed-off-by: Filipe Manana fdman...@suse.com
---
 tests/generic/107 | 99 +++
 tests/generic/107.out | 3 ++
 tests/generic/group | 1 +
 3 files changed, 103 insertions(+)
 create mode 100755 tests/generic/107
 create mode 100644 tests/generic/107.out

diff --git a/tests/generic/107 b/tests/generic/107
new file mode 100755
index 000..7d107d7
--- /dev/null
+++ b/tests/generic/107
@@ -0,0 +1,99 @@
+#! /bin/bash
+# FSQA Test No. 107
+#
+# Test that when we have a file with multiple hard links belonging to different
+# parent directories, if we remove one of those links, fsync the file using one
+# of its other links (that has a parent directory different from the one we
+# removed a link from), power fail and then replay the fsync log/journal, the
+# hard link we removed is not available anymore and all the filesystem metadata
+# is in a consistent state.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed
Re: BTRFS disaster (of my own making). Is this recoverable?
On 2015-08-05 22:13, Chris Murphy wrote:
On Wed, Aug 5, 2015 at 6:45 PM, Paul Jones p...@pauljones.id.au wrote:
Would it be possible to store this type of critical information twice on each disk, at the beginning and end?

I thought BTRFS already did that, but I might be thinking of some other filesystem. I've had my share of these types of oops! moments as well.

That option is metadata profile raid1. To do an automatic -mconvert=raid1 when the user does 'btrfs device add' breaks any use case where you want to temporarily add a small device, maybe a USB stick, and now hundreds of MiBs possibly GiBs of metadata have to be copied over to this device without warning. It could be made smart, autoconvert to raid1 when the added device is at least 4x the size of metadata allocation, but then that makes it inconsistent. OK so it could be made interactive, but now that breaks scripts. So... where do you draw the line? Maybe this would work if the system chunk only was raid1? I don't know what the minimum necessary information is for such a case. Possibly it makes more sense if 'btrfs device add' always does -dconvert=raid1 unless a --quick option is passed?

Perhaps we could print out a big noisy warning that could be silenced?
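To make the "at least 4x the size of metadata allocation" idea concrete, here is a minimal sketch of that threshold test. It is purely hypothetical: should_autoconvert is an invented name, nothing like it exists in btrfs-progs, and the sizes (in MiB) would have to come from somewhere else.

```shell
#!/bin/sh
# Hypothetical policy check, not part of btrfs-progs: is a newly added
# device big enough to justify converting metadata to raid1 automatically?
# Uses the 4x threshold floated in the message above.
# Arguments: device size and current metadata allocation, both in MiB.
should_autoconvert() {
    dev_mib=$1
    meta_mib=$2
    [ "$dev_mib" -ge $((meta_mib * 4)) ]
}

# Example: 6 GiB of metadata, a 16 GiB USB stick versus a 3 TiB disk.
for dev in 16384 3145728; do
    if should_autoconvert "$dev" 6144; then
        echo "$dev MiB: would convert metadata to raid1"
    else
        echo "$dev MiB: would leave the metadata profile alone"
    fi
done
```

The sketch also shows the inconsistency Chris points out: the same 'btrfs device add' would behave differently depending only on the size of the device being added.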
Re: Why subvolume and not just volume?
On 2015-08-06 03:23, Duncan wrote:
Martin posted on Wed, 05 Aug 2015 09:06:40 +0200 as excerpted:
[W]hat is the penalty of a subvolume compared to a directory? From a design perspective, couldn't all directories just be subvolumes?

In addition to the performance issues mentioned by others, there's at least one further practical reason as well. Snapshots stop at subvolume boundaries. It's thus quite useful to use subvolumes to delineate the limits of the snapshot, saying, in effect, snapshot this dir (which happens to be a subvol not just a normal dir) recursively, but don't snapshot the subtree starting with this nested subdir (which again is a (different) subvol).

And for some people, this is very useful functionality. I use it to specifically exclude subsets of trivially reproducible data from backups (for example, I always clone public git repositories into individual subvolumes, and keep my local copy of the Portage tree on a separate one (when it isn't on a separate filesystem that is)).
Re: [PATCH 0/6] sysfs-part2 Add seed device representation on the sysfs
On Thu, Aug 06, 2015 at 05:51:16AM +0800, Anand Jain wrote: ... these can go in. Btrfs: sysfs: support seed devices in the sysfs layout Sorry for late reply, the patches look good. I'm going to prepare a branch for pull into 4.3. Thanks. I suggested if this can wait. on the 2nd thought, I am preparing to conduct a survey to know most preferred sysfs layout for btrfs. mainly between one, less invasive overlays on the existing layout (current method). the other, separates FS and Volume attributes (old method). sorry that I am going back a bit, but i think its worth as these API are forever. Understood.
Re: Data single *and* raid?
Hello Hugo, hello Chris,

thanks for your advice. Now I am here:

btrfs balance start -dprofiles=single -mprofiles=raid1 /mnt/__Complete_Disk/
Done, had to relocate 0 out of 3939 chunks

root@homeserver:/mnt/__Complete_Disk# btrfs fi show
Label: none uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
 Total devices 3 FS bytes used 3.78TiB
 devid 1 size 2.73TiB used 2.72TiB path /dev/sde
 devid 2 size 2.73TiB used 2.23TiB path /dev/sdc
 devid 3 size 2.73TiB used 2.73TiB path /dev/sdd
btrfs-progs v4.1.1

So, that looks good. But then:

root@homeserver:/mnt/__Complete_Disk# btrfs fi df /mnt/__Complete_Disk/
Data, RAID5: total=3.83TiB, used=3.78TiB
System, RAID5: total=32.00MiB, used=576.00KiB
Metadata, RAID5: total=6.46GiB, used=4.84GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Is the RAID5 expected here? I did not yet run:

btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft /mnt/new_storage/

Regards,
Hendrik

On 01.08.2015 22:44, Chris Murphy wrote:
On Sat, Aug 1, 2015 at 2:32 PM, Hugo Mills h...@carfax.org.uk wrote:
On Sat, Aug 01, 2015 at 10:09:35PM +0200, Hendrik Friedel wrote:
Hello, I converted an array to raid5 by

btrfs device add /dev/sdd /mnt/new_storage
btrfs device add /dev/sdc /mnt/new_storage
btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/new_storage/

The balance went through. But now:

Label: none uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
 Total devices 3 FS bytes used 5.28TiB
 devid 1 size 2.73TiB used 2.57TiB path /dev/sde
 devid 2 size 2.73TiB used 2.73TiB path /dev/sdc
 devid 3 size 2.73TiB used 2.73TiB path /dev/sdd
btrfs-progs v4.1.1

Already the 2.57TiB is a bit surprising:

root@homeserver:/mnt# btrfs fi df /mnt/new_storage/
Data, single: total=2.55TiB, used=2.55TiB
Data, RAID5: total=2.73TiB, used=2.72TiB
System, RAID5: total=32.00MiB, used=736.00KiB
Metadata, RAID1: total=6.00GiB, used=5.33GiB
Metadata, RAID5: total=3.00GiB, used=2.99GiB

Looking at the btrfs fi show output, you've probably run out of space during the conversion, probably due to an uneven distribution of the original single chunks. I think I would suggest balancing the single chunks, and trying the conversion (of the unconverted parts) again:

# btrfs balance start -dprofiles=single -mprofiles=raid1 /mnt/new_storage/
# btrfs balance start -dconvert=raid5,soft -mconvert=raid5,soft /mnt/new_storage/

Yep I bet that's it also. btrfs fi usage might be better at exposing this case.

--
Hendrik Friedel
Auf dem Brink 12
28844 Weyhe
Tel. 04203 8394854
Mobil 0178 1874363
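One way to read the two btrfs fi show outputs in this thread is to count how many devices still have unallocated space, since a striped profile such as RAID5 can only allocate a new chunk when more than one device has room. The sketch below does just that arithmetic on the printed figures. It is an illustration only: devices_with_room is a made-up helper, it works on the rounded TiB values from the output rather than on real allocation data, and the "more than one device" requirement is stated here as an assumption about the allocator, not something confirmed in the thread.

```shell
#!/bin/sh
# Hypothetical helper: read "size used" pairs (in TiB, one device per line,
# as printed by btrfs fi show) and print how many devices still have
# unallocated space. The 0.005 cutoff ignores differences below the
# two-decimal precision of that output.
devices_with_room() {
    awk '{ if ($1 - $2 > 0.005) n++ } END { print n + 0 }'
}

# First output in the thread (after the initial conversion attempt).
printf '%s\n' '2.73 2.57' '2.73 2.73' '2.73 2.73' | devices_with_room

# Second output (after balancing the single chunks).
printf '%s\n' '2.73 2.72' '2.73 2.23' '2.73 2.73' | devices_with_room
```

With the first set of figures only devid 1 has anything left, which fits Hugo's reading that the conversion ran out of space; btrfs fi usage reports the per-device unallocated space directly.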
[PATCH 1/2] btrfs: Fix wrong comment of btrfs_alloc_tree_block()
This wrong comment was copied from another function (now expired) at init; this patch fixes it.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a436bd5..792247f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7536,9 +7536,6 @@ static void unuse_block_rsv(struct btrfs_fs_info *fs_info,
 /*
  * finds a free extent and does all the dirty work required for allocation
- * returns the key for the extent through ins, and a tree buffer for
- * the first block of the extent through buf.
- *
  * returns the tree buffer or an ERR_PTR on error.
  */
 struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
--
1.8.5.1
[PATCH] btrfs: abort transaction on btrfs_reloc_cow_block()
When btrfs_reloc_cow_block() fails in __btrfs_cow_block(), the current code just returns an error value to the caller, but leaves the newly created extent buffer in place and locked. Subsequent code (in relocation) then tries to lock that eb again, which causes a deadlock without any dmesg output. (The eb lock uses wait_event(), so there is no lockdep message.)

It is hard to do recovery work in __btrfs_cow_block() at this error point, but we can abort the transaction to avoid the deadlock and avoid operating on an unstable state.

It also helps developers find the wrong place quickly (better than a frozen fs without any dmesg, as before the patch).

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/ctree.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 54114b4..5f745ea 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1159,8 +1159,10 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 
 	if (test_bit(BTRFS_ROOT_REF_COWS, &root->state)) {
 		ret = btrfs_reloc_cow_block(trans, root, buf, cow);
-		if (ret)
+		if (ret) {
+			btrfs_abort_transaction(trans, root, ret);
 			return ret;
+		}
 	}
 
 	if (buf == root->node) {
--
1.8.5.1
[PATCH 2/2] btrfs: Remove root argument in extent_data_ref_count()
Because it is never used.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 792247f..5f7cbd7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1316,8 +1316,7 @@ static noinline int remove_extent_data_ref(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-static noinline u32 extent_data_ref_count(struct btrfs_root *root,
-					  struct btrfs_path *path,
+static noinline u32 extent_data_ref_count(struct btrfs_path *path,
 					  struct btrfs_extent_inline_ref *iref)
 {
 	struct btrfs_key key;
@@ -6318,7 +6317,7 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 	} else {
 		if (found_extent) {
 			BUG_ON(is_data && refs_to_drop !=
-			       extent_data_ref_count(root, path, iref));
+			       extent_data_ref_count(path, iref));
 			if (iref) {
 				BUG_ON(path->slots[0] != extent_slot);
 			} else {
--
1.8.5.1
Re: Why subvolume and not just volume?
Martin posted on Wed, 05 Aug 2015 09:06:40 +0200 as excerpted:
[W]hat is the penalty of a subvolume compared to a directory? From a design perspective, couldn't all directories just be subvolumes?

In addition to the performance issues mentioned by others, there's at least one further practical reason as well. Snapshots stop at subvolume boundaries. It's thus quite useful to use subvolumes to delineate the limits of the snapshot, saying, in effect, snapshot this dir (which happens to be a subvol not just a normal dir) recursively, but don't snapshot the subtree starting with this nested subdir (which again is a (different) subvol).

Subvols act very much like directories, it is true. But they have a few additional properties and different behaviors, and it is the distinction between directories and subvols that makes them valuable /as/ subvols. Without a distinction, the whole reason to have subvols as a separate feature vanishes.

(FWIW, the first systemd release, v219, to use btrfs subvolumes in place of directories found out some of the behavior differences the hard way. Where it was previously doing mkdir, which returns success if the directory is already there, critical for a root filesystem kept read-only mounted by default, but with the required directories already created, on btrfs it tried to create a subvolume instead, which fails if there's a directory already there, particularly if it's a read-only mount. So the behavior creating a subvol differs from that of creating a subdir, and systemd's tmpfiles service was failing on read-only btrfs mounts as a result, while it previously succeeded, when it was only trying to create directories, which already existed. Oops! The bug was fixed in v221, but the experience does illustrate that while subvolumes behave in /many/ ways like subdirs, there are indeed small differences in behavior that can leap up and bite the unwary.)

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
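The difference in create-if-exists behaviour described in the systemd anecdote can be imitated with plain coreutils, no btrfs required. This is only an analogy under stated assumptions: create_tolerant and create_strict are invented names, mkdir -p stands in for the tolerant directory creation the message describes, and plain mkdir stands in for a creation call that refuses an existing path, which is the behaviour the message attributes to subvolume creation.

```shell
#!/bin/sh
# Analogy only, using coreutils. "mkdir -p" tolerates an existing
# directory; plain "mkdir" refuses it. The second behaviour is what the
# message above says subvolume creation does when the path already exists.
create_tolerant() { mkdir -p "$1"; }            # succeeds if $1 already exists
create_strict()   { mkdir "$1" 2>/dev/null; }   # fails if $1 already exists

demo_dir=$(mktemp -d)

create_tolerant "$demo_dir/var" && echo "first create: ok"
create_tolerant "$demo_dir/var" && echo "tolerant re-create: ok"
create_strict   "$demo_dir/var" || echo "strict re-create: failed"

rm -rf "$demo_dir"
```

A tool that switches from the tolerant call to the strict one without handling the "already exists" case runs into the kind of failure systemd's tmpfiles service hit.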
Re: [RFC 0/8] Allow GFP_NOFS allocation to fail
On Wed 05-08-15 20:58:25, Andreas Dilger wrote:
On Aug 5, 2015, at 3:51 AM, mho...@kernel.org wrote:
[...]
The rest are the FS specific patches to fortify allocations requests which are really needed to finish transactions without RO remounts. There might be more needed but my test case survives with these in place.

Wouldn't it make more sense to order the fs-specific patches _before_ the GFP_NOFS can fail patch (#3), so that once that patch is applied all known failures have already been fixed? Otherwise it could show test failures during bisection that would be confusing.

As I write below. If maintainers consider them useful even when GFP_NOFS doesn't fail I will reword them and resend. But you cannot fix the world without breaking it first in this case ;)

They would obviously need some rewording if they are going to be applied even without Patch3 and I will do that if respective maintainers will take them. Ext3 and JBD are going away soon so they might be dropped but they have been in the tree while I was testing so I've kept them.

--
Michal Hocko
SUSE Labs
Re: [PATCH] btrfs: qgroup: Fix a regression in qgroup reserved space.
On Thu, Aug 06, 2015 at 04:42:41PM +0800, Qu Wenruo wrote:
Hi Chris, Would you please consider including this patch into v4.2 if it is still possible? Although the fix is still not perfect and just a hotfix, as qgroup reserve parts still have a lot of problems from design and a lot of operations can still cause reserve space leak. But considering how easy it is to trigger, I still hope it to be merged asap before 4.2.

Thanks Qu, I'll get this in.

-chris
Re: FW: btrfs-progs: android build
Hi,

On Thu, Aug 06, 2015 at 06:45:11PM +0900, [unreadable name] wrote:
Hi, I made btrfs-progs android build script and test it.

Thanks. The changes as they stand are too intrusive to be added but give me an idea what's needed. The makefile part shares some variables with current make and adds some specific variables and includes. Ideally there's only one makefile but I think that we can live with a separate makefile for android as there seem to be specific quirks that would complicate the common makefile. This means we'd have to keep the shared part in sync manually but it's not that hard. New files or new libs, this always requires more care.

And need some help on btrfs_wipe_existing_sb().
On the test it looks work well.

The code changes should be hidden in wrappers and pulled via a separate header file. If this is not possible, the ifdefs should be in the function implementations (eg. is_ssd, check_overwrite), not at the call sites.

Other than that, I don't mind adding support for android builds. I don't have an android build environment at hand and cannot verify that it's always working, so the same holds as for musl libc, fixes are up to you.
[PATCH] btrfs: Add WARN_ON() for double lock in btrfs_tree_lock()
When a task tries to double lock an extent buffer, there is no lockdep warning about it because the lock may be in blocking_lock state, which makes it hard to debug.

This patch adds a WARN_ON() for the above condition. It cannot report all deadlock cases (such as locks between tasks), but it at least helps us some.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/locking.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index f8229ef..d7e6baf 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -241,6 +241,7 @@ void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
  */
 void btrfs_tree_lock(struct extent_buffer *eb)
 {
+	WARN_ON(eb->lock_owner == current->pid);
 again:
 	wait_event(eb->read_lock_wq, atomic_read(&eb->blocking_readers) == 0);
 	wait_event(eb->write_lock_wq, atomic_read(&eb->blocking_writers) == 0);
--
1.8.5.1