Re: [PATCH] btrfs: rename btrfs_close_extra_device to btrfs_free_extra_devids
On 03/01/2018 12:42 PM, kbuild test robot wrote:
> Hi Anand,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on btrfs/next]
> [also build test ERROR on v4.16-rc3 next-20180228]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url:    https://github.com/0day-ci/linux/commits/Anand-Jain/btrfs-rename-btrfs_close_extra_device-to-btrfs_free_extra_devids/20180301-120850
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
> config: x86_64-randconfig-x016-201808 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64
>
> All errors (new ones prefixed by >>):

There is a v2 which already fixed this.

Thanks, Anand

> fs/btrfs/disk-io.c: In function 'open_ctree':
> fs/btrfs/disk-io.c:2783:2: error: implicit declaration of function 'btrfs_free_extra_devids'; did you mean 'btrfs_free_extra_devid'? [-Werror=implicit-function-declaration]
>   btrfs_free_extra_devids(fs_devices, 0);
>   ^~~
>   btrfs_free_extra_devid
> cc1: some warnings being treated as errors
>
> vim +2783 fs/btrfs/disk-io.c
>
>   2396
>   2397  int open_ctree(struct super_block *sb,
>   2398                 struct btrfs_fs_devices *fs_devices,
>   2399                 char *options)
>   2400  {
>   2401          u32 sectorsize;
>   2402          u32 nodesize;
>   2403          u32 stripesize;
>   2404          u64 generation;
>   2405          u64 features;
>   2406          struct btrfs_key location;
>   2407          struct buffer_head *bh;
>   2408          struct btrfs_super_block *disk_super;
>   2409          struct btrfs_fs_info *fs_info = btrfs_sb(sb);
>   2410          struct btrfs_root *tree_root;
>   2411          struct btrfs_root *chunk_root;
>   2412          int ret;
>   2413          int err = -EINVAL;
>   2414          int num_backups_tried = 0;
>   2415          int backup_index = 0;
>   2416          int max_active;
>   2417          int clear_free_space_tree = 0;
>   2418
>   2419          tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info, GFP_KERNEL);
>   2420          chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info, GFP_KERNEL);
>   2421          if (!tree_root || !chunk_root) {
>   2422                  err = -ENOMEM;
>   2423                  goto fail;
>   2424          }
>   2425
>   2426          ret = init_srcu_struct(&fs_info->subvol_srcu);
>   2427          if (ret) {
>   2428                  err = ret;
>   2429                  goto fail;
>   2430          }
>   2431
>   2432          ret = percpu_counter_init(&fs_info->dirty_metadata_bytes, 0, GFP_KERNEL);
>   2433          if (ret) {
>   2434                  err = ret;
>   2435                  goto fail_srcu;
>   2436          }
>   2437          fs_info->dirty_metadata_batch = PAGE_SIZE *
>   2438                                          (1 + ilog2(nr_cpu_ids));
>   2439
>   2440          ret = percpu_counter_init(&fs_info->delalloc_bytes, 0, GFP_KERNEL);
>   2441          if (ret) {
>   2442                  err = ret;
>   2443                  goto fail_dirty_metadata_bytes;
>   2444          }
>   2445
>   2446          ret = percpu_counter_init(&fs_info->bio_counter, 0, GFP_KERNEL);
>   2447          if (ret) {
>   2448                  err = ret;
>   2449                  goto fail_delalloc_bytes;
>   2450          }
>   2451
>   2452          INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
>   2453          INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
>   2454          INIT_LIST_HEAD(&fs_info->trans_list);
>   2455          INIT_LIST_HEAD(&fs_info->dead_roots);
>   2456          INIT_LIST_HEAD(&fs_info->delayed_iputs);
>   2457          INIT_LIST_HEAD(&fs_info->delalloc_roots);
>   2458          INIT_LIST_HEAD(&fs_info->caching_block_groups);
>   2459          spin_lock_init(&fs_info->delalloc_root_lock);
>   2460          spin_lock_init(&fs_info->trans_lock);
>   2461          spin_lock_init(&fs_info->fs_roots_radix_lock);
>   2462          spin_lock_init(&fs_info->delayed_iput_lock);
>   2463          spin_lock_init(&fs_info->defrag_inodes_lock);
>   2464          spin_lock_init(&fs_info->tree_mod_seq_lock);
>   2465          spin_lock_init(&fs_info->super_lock);
>   2466          spin_lock_init(&fs_info->qgroup_op_lock);
>   2467          spin_lock_init(&fs_info->buffer_lock);
>   2468          spin_lock_init(&fs_info->unused_bgs_lock);
>   2469          rwlock_init(&fs_info->tree_mod_log_lock);
>   2470          mutex_init(&fs_info->unused_bg_unpin_mutex);
>   2471          mutex_init(&fs_info->delete_unused_bgs_mutex);
>   2472          mutex_init(&fs_info->reloc_mutex);
>   2473          mutex_init(&fs_info->delalloc_root_mutex);
>   2474          mutex_init(&fs_info->cleaner_delayed_iput_mutex);
Re: [PATCH] btrfs: rename btrfs_close_extra_device to btrfs_free_extra_devids
Hi Anand,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on btrfs/next]
[also build test ERROR on v4.16-rc3 next-20180228]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Anand-Jain/btrfs-rename-btrfs_close_extra_device-to-btrfs_free_extra_devids/20180301-120850
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next
config: x86_64-randconfig-x016-201808 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All errors (new ones prefixed by >>):

   fs/btrfs/disk-io.c: In function 'open_ctree':
>> fs/btrfs/disk-io.c:2783:2: error: implicit declaration of function 'btrfs_free_extra_devids'; did you mean 'btrfs_free_extra_devid'? [-Werror=implicit-function-declaration]
     btrfs_free_extra_devids(fs_devices, 0);
     ^~~
     btrfs_free_extra_devid
   cc1: some warnings being treated as errors

vim +2783 fs/btrfs/disk-io.c

  2396
  2397  int open_ctree(struct super_block *sb,
  2398                 struct btrfs_fs_devices *fs_devices,
  2399                 char *options)
  2400  {
  2401          u32 sectorsize;
  2402          u32 nodesize;
  2403          u32 stripesize;
  2404          u64 generation;
  2405          u64 features;
  2406          struct btrfs_key location;
  2407          struct buffer_head *bh;
  2408          struct btrfs_super_block *disk_super;
  2409          struct btrfs_fs_info *fs_info = btrfs_sb(sb);
  2410          struct btrfs_root *tree_root;
  2411          struct btrfs_root *chunk_root;
  2412          int ret;
  2413          int err = -EINVAL;
  2414          int num_backups_tried = 0;
  2415          int backup_index = 0;
  2416          int max_active;
  2417          int clear_free_space_tree = 0;
  2418
  2419          tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info, GFP_KERNEL);
  2420          chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info, GFP_KERNEL);
  2421          if (!tree_root || !chunk_root) {
  2422                  err = -ENOMEM;
  2423                  goto fail;
  2424          }
  2425
  2426          ret = init_srcu_struct(&fs_info->subvol_srcu);
  2427          if (ret) {
  2428                  err = ret;
  2429                  goto fail;
  2430          }
  2431
  2432          ret = percpu_counter_init(&fs_info->dirty_metadata_bytes, 0, GFP_KERNEL);
  2433          if (ret) {
  2434                  err = ret;
  2435                  goto fail_srcu;
  2436          }
  2437          fs_info->dirty_metadata_batch = PAGE_SIZE *
  2438                                          (1 + ilog2(nr_cpu_ids));
  2439
  2440          ret = percpu_counter_init(&fs_info->delalloc_bytes, 0, GFP_KERNEL);
  2441          if (ret) {
  2442                  err = ret;
  2443                  goto fail_dirty_metadata_bytes;
  2444          }
  2445
  2446          ret = percpu_counter_init(&fs_info->bio_counter, 0, GFP_KERNEL);
  2447          if (ret) {
  2448                  err = ret;
  2449                  goto fail_delalloc_bytes;
  2450          }
  2451
  2452          INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
  2453          INIT_RADIX_TREE(&fs_info->buffer_radix, GFP_ATOMIC);
  2454          INIT_LIST_HEAD(&fs_info->trans_list);
  2455          INIT_LIST_HEAD(&fs_info->dead_roots);
  2456          INIT_LIST_HEAD(&fs_info->delayed_iputs);
  2457          INIT_LIST_HEAD(&fs_info->delalloc_roots);
  2458          INIT_LIST_HEAD(&fs_info->caching_block_groups);
  2459          spin_lock_init(&fs_info->delalloc_root_lock);
  2460          spin_lock_init(&fs_info->trans_lock);
  2461          spin_lock_init(&fs_info->fs_roots_radix_lock);
  2462          spin_lock_init(&fs_info->delayed_iput_lock);
  2463          spin_lock_init(&fs_info->defrag_inodes_lock);
  2464          spin_lock_init(&fs_info->tree_mod_seq_lock);
  2465          spin_lock_init(&fs_info->super_lock);
  2466          spin_lock_init(&fs_info->qgroup_op_lock);
  2467          spin_lock_init(&fs_info->buffer_lock);
  2468          spin_lock_init(&fs_info->unused_bgs_lock);
  2469          rwlock_init(&fs_info->tree_mod_log_lock);
  2470          mutex_init(&fs_info->unused_bg_unpin_mutex);
  2471          mutex_init(&fs_info->delete_unused_bgs_mutex);
  2472          mutex_init(&fs_info->reloc_mutex);
  2473          mutex_init(&fs_info->delalloc_root_mutex);
  2474          mutex_init(&fs_info->cleaner_delayed_iput_mutex);
  2475          seqlock_init(&fs_info->profiles_lock);
  2476
  2477
Re: [PATCH 3/4] btrfs-progs: check/lowmem mode: Check inline extent size
On 03/01/2018 10:47 AM, Qu Wenruo wrote:
> Signed-off-by: Qu Wenruo

Looks good to me.

Reviewed-by: Su Yue

> ---
>  check/mode-lowmem.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index 62bcf3d2e126..44c58163f8f7 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -1417,6 +1417,7 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey,
>  	u64 csum_found;		/* In byte size, sectorsize aligned */
>  	u64 search_start;	/* Logical range start we search for csum */
>  	u64 search_len;		/* Logical range len we search for csum */
> +	u32 max_inline_extent_size = BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info);
>  	unsigned int extent_type;
>  	unsigned int is_hole;
>  	int compressed = 0;
> @@ -1440,6 +1441,13 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey,
>  			root->objectid, fkey->objectid, fkey->offset);
>  		err |= FILE_EXTENT_ERROR;
>  	}
> +	if (extent_num_bytes > max_inline_extent_size) {
> +		error(
> +"root %llu EXTENT_DATA[%llu %llu] too large inline extent size, have %llu, max: %u",
> +			root->objectid, fkey->objectid, fkey->offset,
> +			extent_num_bytes, max_inline_extent_size);
> +		err |= FILE_EXTENT_ERROR;
> +	}
>  	if (!compressed && extent_num_bytes != item_inline_len) {
>  		error(
> "root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: %llu, expected: %u",

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] btrfs-progs: check/original mode: Check inline extent size
On 03/01/2018 10:47 AM, Qu Wenruo wrote:
> Kernel doesn't allow inline extent equal or larger than 4K. And for
> inline extent larger than 4K, __btrfs_drop_extents() can return
> -EOPNOTSUPP and cause unexpected error.
>
> Check it in original mode.
>
> Signed-off-by: Qu Wenruo

Looks good to me.

Reviewed-by: Su Yue

> ---
>  check/main.c          | 4 ++++
>  check/mode-original.h | 1 +
>  2 files changed, 5 insertions(+)
>
> diff --git a/check/main.c b/check/main.c
> index 97baae583f04..ce41550ab16a 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -560,6 +560,8 @@ static void print_inode_error(struct btrfs_root *root, struct inode_record *rec)
>  		fprintf(stderr, ", bad file extent");
>  	if (errors & I_ERR_FILE_EXTENT_OVERLAP)
>  		fprintf(stderr, ", file extent overlap");
> +	if (errors & I_ERR_FILE_EXTENT_TOO_LARGE)
> +		fprintf(stderr, ", inline file extent too large");
>  	if (errors & I_ERR_FILE_EXTENT_DISCOUNT)
>  		fprintf(stderr, ", file extent discount");
>  	if (errors & I_ERR_DIR_ISIZE_WRONG)
> @@ -1461,6 +1463,8 @@ static int process_file_extent(struct btrfs_root *root,
>  		num_bytes = btrfs_file_extent_inline_len(eb, slot, fi);
>  		if (num_bytes == 0)
>  			rec->errors |= I_ERR_BAD_FILE_EXTENT;
> +		if (num_bytes > BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info))
> +			rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
>  		rec->found_size += num_bytes;
>  		num_bytes = (num_bytes + mask) & ~mask;
>  	} else if (extent_type == BTRFS_FILE_EXTENT_REG ||
> diff --git a/check/mode-original.h b/check/mode-original.h
> index f859af478f0f..368de692fdd1 100644
> --- a/check/mode-original.h
> +++ b/check/mode-original.h
> @@ -185,6 +185,7 @@ struct file_extent_hole {
>  #define I_ERR_SOME_CSUM_MISSING		(1 << 12)
>  #define I_ERR_LINK_COUNT_WRONG		(1 << 13)
>  #define I_ERR_FILE_EXTENT_ORPHAN	(1 << 14)
> +#define I_ERR_FILE_EXTENT_TOO_LARGE	(1 << 15)
>
>  struct inode_record {
>  	struct list_head backrefs;
[PATCH 4/4] btrfs-progs: test/convert: Add test case for invalid large inline data extent
Signed-off-by: Qu Wenruo
---
 .../016-invalid-large-inline-extent/test.sh | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh

diff --git a/tests/convert-tests/016-invalid-large-inline-extent/test.sh b/tests/convert-tests/016-invalid-large-inline-extent/test.sh
new file mode 100755
index 000000000000..f37c7c09d2e7
--- /dev/null
+++ b/tests/convert-tests/016-invalid-large-inline-extent/test.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+# Check if btrfs-convert refuses to rollback the filesystem, and leave the fs
+# and the convert image untouched
+
+source "$TEST_TOP/common"
+source "$TEST_TOP/common.convert"
+
+setup_root_helper
+prepare_test_dev
+check_prereq btrfs-convert
+check_global_prereq mke2fs
+
+convert_test_prep_fs ext4 mke2fs -t ext4 -b 4096
+
+# Create a 6K file, which should not be inlined
+run_check $SUDO_HELPER dd if=/dev/zero bs=2k count=3 of="$TEST_MNT/file1"
+
+run_check_umount_test_dev
+
+# convert_test_do_convert() will call btrfs check, which should expose any
+# invalid inline extent with too large size
+convert_test_do_convert
--
2.16.2
[PATCH 3/4] btrfs-progs: check/lowmem mode: Check inline extent size
Signed-off-by: Qu Wenruo
---
 check/mode-lowmem.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 62bcf3d2e126..44c58163f8f7 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -1417,6 +1417,7 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey,
 	u64 csum_found;		/* In byte size, sectorsize aligned */
 	u64 search_start;	/* Logical range start we search for csum */
 	u64 search_len;		/* Logical range len we search for csum */
+	u32 max_inline_extent_size = BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info);
 	unsigned int extent_type;
 	unsigned int is_hole;
 	int compressed = 0;
@@ -1440,6 +1441,13 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey,
 			root->objectid, fkey->objectid, fkey->offset);
 		err |= FILE_EXTENT_ERROR;
 	}
+	if (extent_num_bytes > max_inline_extent_size) {
+		error(
+"root %llu EXTENT_DATA[%llu %llu] too large inline extent size, have %llu, max: %u",
+			root->objectid, fkey->objectid, fkey->offset,
+			extent_num_bytes, max_inline_extent_size);
+		err |= FILE_EXTENT_ERROR;
+	}
 	if (!compressed && extent_num_bytes != item_inline_len) {
 		error(
 "root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: %llu, expected: %u",
--
2.16.2
[PATCH 0/4] Fix long standing -EOPNOTSUPP problem caused by
Kernel doesn't support dropping a range inside an inline extent, and
prevents that from happening by limiting the max inline extent size to
min(max_inline, sectorsize - 1) in cow_file_range_inline().

However btrfs-progs only inherits the BTRFS_MAX_INLINE_DATA_SIZE() macro,
which doesn't have the sectorsize check. And since btrfs-progs defaults
to 16K nodesize, the above macro allows large inline extents over 15K in
size. This leads to unexpected kernel behavior.

The bug exists from the very beginning of btrfs-convert, dating back to
2008 when btrfs-convert was first introduced.

Qu Wenruo (4):
  btrfs-progs: Limit inline extent below page size
  btrfs-progs: check/original mode: Check inline extent size
  btrfs-progs: check/lowmem mode: Check inline extent size
  btrfs-progs: test/convert: Add test case for invalid large inline data extent

 check/main.c                                       |  4 ++++
 check/mode-lowmem.c                                |  8 ++++++++
 check/mode-original.h                              |  1 +
 ctree.h                                            | 11 +++++++++--
 .../016-invalid-large-inline-extent/test.sh        | 22 ++++++++++++++++++++++
 5 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh

--
2.16.2
[PATCH 2/4] btrfs-progs: check/original mode: Check inline extent size
Kernel doesn't allow inline extent equal or larger than 4K. And for
inline extent larger than 4K, __btrfs_drop_extents() can return
-EOPNOTSUPP and cause unexpected error.

Check it in original mode.

Signed-off-by: Qu Wenruo
---
 check/main.c          | 4 ++++
 check/mode-original.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/check/main.c b/check/main.c
index 97baae583f04..ce41550ab16a 100644
--- a/check/main.c
+++ b/check/main.c
@@ -560,6 +560,8 @@ static void print_inode_error(struct btrfs_root *root, struct inode_record *rec)
 		fprintf(stderr, ", bad file extent");
 	if (errors & I_ERR_FILE_EXTENT_OVERLAP)
 		fprintf(stderr, ", file extent overlap");
+	if (errors & I_ERR_FILE_EXTENT_TOO_LARGE)
+		fprintf(stderr, ", inline file extent too large");
 	if (errors & I_ERR_FILE_EXTENT_DISCOUNT)
 		fprintf(stderr, ", file extent discount");
 	if (errors & I_ERR_DIR_ISIZE_WRONG)
@@ -1461,6 +1463,8 @@ static int process_file_extent(struct btrfs_root *root,
 		num_bytes = btrfs_file_extent_inline_len(eb, slot, fi);
 		if (num_bytes == 0)
 			rec->errors |= I_ERR_BAD_FILE_EXTENT;
+		if (num_bytes > BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info))
+			rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
 		rec->found_size += num_bytes;
 		num_bytes = (num_bytes + mask) & ~mask;
 	} else if (extent_type == BTRFS_FILE_EXTENT_REG ||
diff --git a/check/mode-original.h b/check/mode-original.h
index f859af478f0f..368de692fdd1 100644
--- a/check/mode-original.h
+++ b/check/mode-original.h
@@ -185,6 +185,7 @@ struct file_extent_hole {
 #define I_ERR_SOME_CSUM_MISSING		(1 << 12)
 #define I_ERR_LINK_COUNT_WRONG		(1 << 13)
 #define I_ERR_FILE_EXTENT_ORPHAN	(1 << 14)
+#define I_ERR_FILE_EXTENT_TOO_LARGE	(1 << 15)

 struct inode_record {
 	struct list_head backrefs;
--
2.16.2
[PATCH 1/4] btrfs-progs: Limit inline extent below page size
Kernel doesn't support dropping a range inside an inline extent. And the
kernel limits inline extents to just below sectorsize, so also limit it
in btrfs-progs.

This fixes unexpected -EOPNOTSUPP errors from __btrfs_drop_extents() on
converted btrfs.

Fixes: 806528b8755f ("Add Yan Zheng's ext3->btrfs conversion program")
Reported-by: Peter Y. Chuang
Signed-off-by: Qu Wenruo
---
 ctree.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/ctree.h b/ctree.h
index 17cdac76c58c..0282deef339b 100644
--- a/ctree.h
+++ b/ctree.h
@@ -20,6 +20,7 @@
 #define __BTRFS_CTREE_H__

 #include
+#include "internal.h"

 #if BTRFS_FLAT_INCLUDES
 #include "list.h"
@@ -1195,8 +1196,14 @@ static inline u32 BTRFS_NODEPTRS_PER_BLOCK(const struct btrfs_fs_info *info)
 	(offsetof(struct btrfs_file_extent_item, disk_bytenr))
 static inline u32 BTRFS_MAX_INLINE_DATA_SIZE(const struct btrfs_fs_info *info)
 {
-	return BTRFS_MAX_ITEM_SIZE(info) -
-	       BTRFS_FILE_EXTENT_INLINE_DATA_START;
+	/*
+	 * Inline extent larger than pagesize could lead to kernel unexpected
+	 * error when dropping extents, so we need to limit the inline extent
+	 * size to less than sectorsize.
+	 */
+	return min_t(u32, info->sectorsize - 1,
+		     BTRFS_MAX_ITEM_SIZE(info) -
+		     BTRFS_FILE_EXTENT_INLINE_DATA_START);
 }

 static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
--
2.16.2
Re: [Bug report] BTRFS partition re-mounted as read-only after few minutes of use
On 2018-03-01 02:36, Filipe Manana wrote:
> On Wed, Feb 28, 2018 at 5:50 PM, David Sterba wrote:
>> On Wed, Feb 28, 2018 at 05:43:40PM +0100, peteryuchu...@gmail.com wrote:
>>> On my laptop, which has just been switched to BTRFS, the root partition
>>> (a BTRFS partition inside an encrypted LVM. The drive is an NVMe) is
>>> re-mounted as read-only few minutes after boot.
>>>
>>> Trace:
>>
>> By any chance, are there other messages from btrfs above the line?
>>
>>> [  199.974591] [ cut here ]
>>> [  199.974593] BTRFS: Transaction aborted (error -95)
>>
>> -95 is EOPNOTSUPP, ie operation not supported
>>
>>> [  199.974647] WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042 btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]
>>
>> btrfs_finish_ordered_io::
>>
>> 3038                 btrfs_ordered_update_i_size(inode, 0, ordered_extent);
>> 3039                 ret = btrfs_update_inode_fallback(trans, root, inode);
>> 3040                 if (ret) {
>> 3041                         btrfs_abort_transaction(trans, ret);
>> 3042                         goto out;
>> 3043                 }
>
> I don't know what's exactly in Arch's kernel, but looking at the
> 4.15.5 stable tag from kernel.org:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/fs/btrfs/inode.c?h=v4.15.5#n3042
>
> The -EOPNOTSUPP error can come from btrfs_drop_extents() through the
> call to insert_reserved_file_extent().

__btrfs_drop_extents() will return -EOPNOTSUPP if we're dropping part of
an inline extent.

Something could be wrong with how convert generates inline extents.

> We've had several reports of this kind of error in this location in
> the past and they happened to be on filesystems converted from extN to
> btrfs.
> I don't know however if this filesystem was from such a conversion nor
> if those old bugs in the conversion tool were fixed.

And since the user is on Arch with the latest kernel, that normally means
btrfs-progs is also the latest.

I need to double check the convert inline extent code to ensure we don't
create too large inline extents.

Thanks,
Qu

>> the return code is unexpected here. And seeing 'operation not supported'
>> after a inode size change looks strange but EOPNOTSUPP could be returned
>> from some places.
>>
>> The transaction is aborted from a thread that finalizes some processing
>> so we don't have enough information here to see how it started. I
>> suspect there's a file that gets modified short after boot and hits the
>> problem. I don't think the EOPNOTSUPP is returned from the lower layers
>> (lvm encryption or nvme), so at this point seems like a btrfs bug.
Re: spurious full btrfs corruption
On 2018-02-28 23:50, Christoph Anton Mitterer wrote:
> Hey Qu
>
> Thanks for still looking into this.
> I'm still in the recovery process (and there are other troubles at the
> university where I work, so everything will take me some time), but I
> have made a dd image of the broken fs, before I put a backup on the
> SSD, so that still exists in case we need to do further debugging.
>
> To thoroughly describe what has happened, let me go a bit back.
>
> - Until last ~ September, I was using some Fujitsu E782, for at least
>   4 years, with no signs of data corruption.

That's pretty good.

> - For my personal data, I have one[0] Seagate 8 TB SMR HDD, which I
>   backup (send/receive) on two further such HDDs (all these are
>   btrfs), and (rsync) on one further with ext4.
>   These files have all their SHA512 sums attached as XATTRs, which I
>   regularly test. So I think I can be pretty sure, that there was never
>   a case of silent data corruption and the RAM on the E782 is fine.

Backup practice doesn't get much better than that.

> - In October I got a new notebook from the university... brand new
>   Fujitsu U757 in basically the best possible configuration.
>   I ran memtest86+ in its normal (non-SMP) mode for roughly a day,
>   with no errors.
>   In SMP mode (which is considered experimental, I think) it crashes
>   reproducibly at the same position. Many people seem to have this
>   (with exactly the same test, address range where it freezes) so I
>   considered it a bug in memtest86+ SMP mode, which it likely is.
>   A patch[1] didn't help me.

Normally I won't blame memory unless strange behavior happens, from
unexpected freezes to strange kernel panics.
But when that happens, a lot of things can go wrong.

> - Unfortunately from the beginning on that notebook showed many further
>   issues.
>   - CPU overheating[2]
>   - boot freezes, when the initramfs tool of Debian isn't configured to
>     blindly add all modules to the initramfs[3].
>   - spurious freezes, which I couldn't really debug any further since
>     there is no serial port...

Netconsole would help here, especially since the U757 has an RJ45 port.
As long as you have another system which is able to run nc, it should
catch any kernel message, and help us analyse whether it's really a
memory corruption.

>     in those cases neither Magic-SysRq nor
>     even the NumLock LEDs and so on worked anymore.
>     These freezes caused me some troubles with dpkg[4].
>     The issue I describe there could also shed some light on the whole
>     situation, since it resulted from the freezes.
>   - The dealer replaced the thermal paste on the CPU and when the CPU
>     overheating and the freezes didn't go away, they sent the notebook
>     for one week to Fujitsu in Germany, who allegedly thoroughly tested
>     it with Windows, and found no errors.

That's unfortunately very common for consumer electronics, as few people
and corporations really care about Linux users on consumer laptops.

And since there are problems with the system (either hardware or
software), I already see a much higher possibility of a hard reset.

> - The notebook's SSD is a Samsung SSD 850 PRO 1TB, the same which I
>   already used with the old notebook.
>   A long SMART check after the corruption brought no errors.

Also, having used that SSD before with the smaller system, it's less
likely the SSD is the problem.

> - Just before the corruption on the btrfs happened, I decided it's time
>   for a backup of the notebook's SSD (what an irony, I know), so I made
>   a snapshot of my one and only subvol, removed any non-precious data
>   from that snapshot, made another ro-snapshot of that and removed the
>   rw snapshot.
> - The kernel was some 4.14.
>
> - More or less after that, I saw the "BUG: unable to handle kernel
>   paging request at 9fb75f827100" which I reported here.
> I'm not sure whether this had to do with btrfs at all, and even if,
> whether it was the fs on the SSD, or another one on an external HDD
> I've had mounted at that time.

It could be btrfs, and it would block the btrfs module from continuing,
which almost forces a hard reset.

> sync/umount/remount,rw/shutdown all didn't work, and I had to power
> off the node.
> - After that things went on basically as I described in my previous
>   mails to the list already.
>   - There were some csum errors.
>   - Checking these files with debsums (Debian stores MD5s of the
>     package's files) found no errors.
>   - A scrub brought no errors.
>   - Shortly after the scrub, further csum errors as well as:
>     BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096
>   - Then I booted from a rescue USB stick with kernel/btrfs-progs 4.12.
>   - fsck in normal/lowmem mode were okay except:
>     Couldn't find free space inode 1
>   - I cleared the v1 free space cache
>   - a scrub failed with "ret=-1, errno=5 (Input/output error)"
>   - Things like these in the kernel log:
>     Feb 21 17:43:09 heisenberg kernel
Re: [PATCH] Btrfs: fix unexpected -EEXIST when creating new inode
On Wed, Feb 28, 2018 at 04:06:40PM +0000, Filipe Manana wrote:
> On Thu, Jan 25, 2018 at 6:02 PM, Liu Bo wrote:
> > The highest objectid, which is assigned to new inode, is decided at
> > the time of initializing fs roots. However, in cases where log replay
> > gets processed, the btree which fs root owns might be changed, so we
> > have to search it again for the highest objectid, otherwise creating
> > new inode would end up with -EEXIST.
> >
> > cc: v4.4-rc6+
> > Fixes: f32e48e92596 ("Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots")
> > Signed-off-by: Liu Bo
>
> Hi Bo,
>
> Any reason to not have submitted a test case for fstests?
> Unless I missed something this should be easy to reproduce, deterministic issue.

It's been on my todo list for a while, but I forgot about it... will do
after I fix the bugs I have now.

I found this originally from running generic/475.

Thanks,
-liubo

> thanks
>
> > ---
> >  fs/btrfs/tree-log.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index a7e6235..646cdbf 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -28,6 +28,7 @@
> >  #include "hash.h"
> >  #include "compression.h"
> >  #include "qgroup.h"
> > +#include "inode-map.h"
> >
> >  /* magic values for the inode_only field in btrfs_log_inode:
> >   *
> > @@ -5715,6 +5716,24 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
> >                                                       path);
> >                 }
> >
> > +               if (!ret && wc.stage == LOG_WALK_REPLAY_ALL) {
> > +                       struct btrfs_root *root = wc.replay_dest;
> > +
> > +                       btrfs_release_path(path);
> > +
> > +                       /*
> > +                        * We have just replayed everything, and the highest
> > +                        * objectid of fs roots probably has changed in case
> > +                        * some inode_item's got replayed.
> > +                        */
> > +                       /*
> > +                        * root->objectid_mutex is not acquired as log replay
> > +                        * could only happen during mount.
> > +                        */
> > +                       ret = btrfs_find_highest_objectid(root,
> > +                                               &root->highest_objectid);
> > +               }
> > +
> >                 key.offset = found_key.offset - 1;
> >                 wc.replay_dest->log_root = NULL;
> >                 free_extent_buffer(log->node);
> > --
> > 2.9.4
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”
Re: btrfs space used issue
On 2018-02-28 14:54, Duncan wrote:
> Austin S. Hemmelgarn posted on Wed, 28 Feb 2018 14:24:40 -0500 as
> excerpted:
>
>>> I believe this effect is what Austin was referencing when he suggested
>>> the defrag, tho defrag won't necessarily /entirely/ clear it up. One
>>> way to be /sure/ it's cleared up would be to rewrite the entire file,
>>> deleting the original, either by copying it to a different filesystem
>>> and back (with the off-filesystem copy guaranteeing that it can't use
>>> reflinks to the existing extents), or by using cp's --reflink=never
>>> option.
>>> (FWIW, I prefer the former, just to be sure, using temporary copies to
>>> a suitably sized tmpfs for speed where possible, tho obviously if the
>>> file is larger than your memory size that's not possible.)
>> Correct, this is why I recommended trying a defrag. I've actually never
>> seen things so bad that a simple defrag didn't fix them however (though
>> I have seen a few cases where the target extent size had to be set
>> higher than the default of 20MB).
> Good to know. I knew larger target extent sizes could help, but between
> not being sure they'd entirely fix it and not wanting to get too far
> down into the detail when the copy-off-the-filesystem-and-back option is
> /sure/ to fix the problem, I decided to handwave that part of it. =:^)

FWIW, a target size of 128M has fixed it on all 5 cases I've seen where
the default didn't. In theory, there's probably some really pathological
case where that won't work, but I've just gotten into the habit of using
that by default on all my systems now and haven't seen any issues so far
(but like you I'm pretty much exclusively on SSD's, and the small handful
of things I have on traditional hard disks are all archival storage with
WORM access patterns).

>> Also, as counter-intuitive as it might sound, autodefrag really doesn't
>> help much with this, and can actually make things worse.
> I hadn't actually seen that here, but suspect I might, now, as previous
> autodefrag behavior on my system tended to rewrite the entire file[1],
> thereby effectively giving me the benefit of the copy-away-and-back
> technique without actually bothering, while that "bug" has now been
> fixed.
>
> I sort of wish the old behavior remained an option, maybe
> radicalautodefrag or something, and must confess to being a bit
> concerned over the eventual impact here now that autodefrag does /not/
> rewrite the entire file any more, but oh, well...
>
> Chances are it's not going to be /that/ big a deal since I /am/ on fast
> ssd, and if it becomes one, I guess I can just setup say
> firefox-profile-defrag.timer jobs or whatever, as necessary.
>
> ---
> [1] I forgot whether it was ssd behavior, or compression, or what, but
> something I'm using here apparently forced autodefrag to rewrite the
> entire file, and a recent "bugfix" changed that so it's more in line
> with the normal autodefrag behavior. I rather preferred the old
> behavior, especially since I'm on fast ssd and all my large files tend
> to be write-once no-rewrite anyway, but I understand the performance
> implications on large active-rewrite files such as gig-plus database
> and VM-image files, so...

Hmm. I've actually never seen such behavior myself. I do know that
compression impacts how autodefrag works (autodefrag tries to rewrite up
to 64k around a random write, but compression operates in 128k blocks),
but beyond that I'm not sure what might have caused this.
Re: btrfs space used issue
Austin S. Hemmelgarn posted on Wed, 28 Feb 2018 14:24:40 -0500 as excerpted: >> I believe this effect is what Austin was referencing when he suggested >> the defrag, tho defrag won't necessarily /entirely/ clear it up. One >> way to be /sure/ it's cleared up would be to rewrite the entire file, >> deleting the original, either by copying it to a different filesystem >> and back (with the off-filesystem copy guaranteeing that it can't use >> reflinks to the existing extents), or by using cp's --reflink=never >> option. >> (FWIW, I prefer the former, just to be sure, using temporary copies to >> a suitably sized tmpfs for speed where possible, tho obviously if the >> file is larger than your memory size that's not possible.) > Correct, this is why I recommended trying a defrag. I've actually never > seen things so bad that a simple defrag didn't fix them however (though > I have seen a few cases where the target extent size had to be set > higher than the default of 20MB). Good to know. I knew larger target extent sizes could help, but between not being sure they'd entirely fix it and not wanting to get too far down into the detail when the copy-off-the-filesystem-and-back option is /sure/ to fix the problem, I decided to handwave that part of it. =:^) > Also, as counter-intuitive as it > might sound, autodefrag really doesn't help much with this, and can > actually make things worse. I hadn't actually seen that here, but suspect I might, now, as previous autodefrag behavior on my system tended to rewrite the entire file[1], thereby effectively giving me the benefit of the copy-away-and-back technique without actually bothering, while that "bug" has now been fixed. I sort of wish the old behavior remained an option, maybe radicalautodefrag or something, and must confess to being a bit concerned over the eventual impact here now that autodefrag does /not/ rewrite the entire file any more, but oh, well... 
Chances are it's not going to be /that/ big a deal since I /am/ on fast ssd, and if it becomes one, I guess I can just setup say firefox-profile-defrag.timer jobs or whatever, as necessary.

---
[1] I forgot whether it was ssd behavior, or compression, or what, but something I'm using here apparently forced autodefrag to rewrite the entire file, and a recent "bugfix" changed that so it's more in line with the normal autodefrag behavior. I rather preferred the old behavior, especially since I'm on fast ssd and all my large files tend to be write-once no-rewrite anyway, but I understand the performance implications on large active-rewrite files such as gig-plus database and VM-image files, so...

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
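The copy-away-and-back technique described above can be sketched as a small shell helper. The function name and the mktemp-based staging directory are my own assumptions, not from the thread; ideally the staging directory lives on tmpfs or another filesystem, so no reflink to the old extents can survive the round trip.

```shell
#!/bin/sh
# Hedged sketch: fully rewrite a file so no reflinked extents remain.
# rewrite_off_fs is a hypothetical helper name; point TMPDIR at a tmpfs
# (e.g. /dev/shm) to get the speed benefit mentioned above.
rewrite_off_fs() {
    file=$1
    staging=$(mktemp -d) || return 1   # ideally on tmpfs / another filesystem
    cp -- "$file" "$staging/copy" &&   # copy out: a cross-fs copy can't reflink
    rm -- "$file" &&                   # drop the last reference to the old extents
    cp -- "$staging/copy" "$file"      # write back as freshly allocated extents
    status=$?
    rm -rf -- "$staging"
    return $status
}

# usage (illustrative path):
# rewrite_off_fs /mnt/btrfs/some-large-file
```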
Re: btrfs space used issue
On 2018-02-28 14:09, Duncan wrote: vinayak hegde posted on Tue, 27 Feb 2018 18:39:51 +0530 as excerpted:

I am using btrfs, But I am seeing du -sh and df -h showing huge size difference on ssd.

mount:
/dev/drbd1 on /dc/fileunifier.datacache type btrfs (rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)

du -sh /dc/fileunifier.datacache/ - 331G

df -h
/dev/drbd1 746G 346G 398G 47% /dc/fileunifier.datacache

btrfs fi usage /dc/fileunifier.datacache/
Overall:
    Device size:         745.19GiB
    Device allocated:    368.06GiB
    Device unallocated:  377.13GiB
    Device missing:      0.00B
    Used:                346.73GiB
    Free (estimated):    396.36GiB (min: 207.80GiB)
    Data ratio:          1.00
    Metadata ratio:      2.00
    Global reserve:      176.00MiB (used: 0.00B)

Data,single: Size:365.00GiB, Used:345.76GiB
    /dev/drbd1  365.00GiB

Metadata,DUP: Size:1.50GiB, Used:493.23MiB
    /dev/drbd1  3.00GiB

System,DUP: Size:32.00MiB, Used:80.00KiB
    /dev/drbd1  64.00MiB

Unallocated:
    /dev/drbd1  377.13GiB

Even if we consider 6G metadata it's 331+6 = 337. Where is the other 9GB used? Please explain.

Taking a somewhat higher level view than Austin's reply: on btrfs, plain df, and to a somewhat lesser extent du[1], are at best good /estimations/ of usage, and for df, space remaining. Due to btrfs' COW/copy-on-write semantics and the features btrfs makes available, such as the various replication/raid schemes, snapshotting, etc, which df/du don't really understand (they simply don't have, and weren't /designed/ to have, that level of filesystem-specific insight), they, particularly df due to its whole-filesystem focus, aren't particularly accurate on btrfs. Consider their output more a "best estimate given the rough data we have available" sort of report. To get the real filesystem focused picture, use btrfs filesystem usage, or btrfs filesystem show combined with btrfs filesystem df.
That's what you should trust, altho various utilities that check for available space before doing something often use the kernel-call equivalent of (plain) df to ensure they have the required space, so it's worthwhile to keep an eye on it as the filesystem fills, as well. If it gets too out of sync with btrfs filesystem usage, or if btrfs filesystem usage unallocated drops below say five gigs or data or metadata size vs used shows a spread of multiple gigs (your data shows a spread of ~20 gigs ATM, but with 377 gigs still unallocated it's no big deal; it would be a big deal if those were reversed, tho, only 20 gigs unallocated and a spread of 300+ gigs in data size vs used), then corrective action such as a filtered rebalance may be necessary.

There are entries in the FAQ discussing free space issues that you should definitely read if you haven't, altho they obviously address the general case, so if you have more questions about an individual case after having read them, here is a good place to ask. =:^) Everything having to do with "space" (see both the 1/Important-questions and 4/Common-questions sections) here: https://btrfs.wiki.kernel.org/index.php/FAQ

Meanwhile, it's worth noting that not entirely intuitively, btrfs' COW implementation can "waste" space on larger files that are mostly, but not entirely, rewritten. An example is the best way to demonstrate. Consider each x a used block and each - an unused but still referenced block.

Original file, written as a single extent (diagram works best with monospace, not arbitrarily rewrapped):

  xxx

First rewrite of part of it:

  xxx--xx
  xx

Nth rewrite, where some blocks of the original still remain as originally written:

  --xxx--
  xxx---
  xxx
  x---xx
  xxx
  xxx

As you can see, that first really large extent remains fully referenced, altho only three blocks of it remain in actual use. All those -- won't be returned to free space until those last three blocks get rewritten as well, thus freeing the entire original extent.
I believe this effect is what Austin was referencing when he suggested the defrag, tho defrag won't necessarily /entirely/ clear it up. One way to be /sure/ it's cleared up would be to rewrite the entire file, deleting the original, either by copying it to a different filesystem and back (with the off-filesystem copy guaranteeing that it can't use reflinks to the existing extents), or by using cp's --reflink=never option. (FWIW, I prefer the former, just to be sure, using temporary copies to a suitably sized tmpfs for speed where possible, tho obviously if the file is larger than your memory size that's not possible.) Correct, this is why I recommended trying a defrag. I've actually never seen things so bad that a simple defrag didn't fix them however (though I have seen a few cases where the target extent size had to be set higher than the default of 20MB).
Re: btrfs space used issue
vinayak hegde posted on Tue, 27 Feb 2018 18:39:51 +0530 as excerpted:

> I am using btrfs, But I am seeing du -sh and df -h showing huge size
> difference on ssd.
>
> mount:
> /dev/drbd1 on /dc/fileunifier.datacache type btrfs
> (rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)
>
> du -sh /dc/fileunifier.datacache/ - 331G
>
> df -h
> /dev/drbd1 746G 346G 398G 47% /dc/fileunifier.datacache
>
> btrfs fi usage /dc/fileunifier.datacache/
> Overall:
>     Device size:         745.19GiB
>     Device allocated:    368.06GiB
>     Device unallocated:  377.13GiB
>     Device missing:      0.00B
>     Used:                346.73GiB
>     Free (estimated):    396.36GiB (min: 207.80GiB)
>     Data ratio:          1.00
>     Metadata ratio:      2.00
>     Global reserve:      176.00MiB (used: 0.00B)
>
> Data,single: Size:365.00GiB, Used:345.76GiB
>     /dev/drbd1  365.00GiB
>
> Metadata,DUP: Size:1.50GiB, Used:493.23MiB
>     /dev/drbd1  3.00GiB
>
> System,DUP: Size:32.00MiB, Used:80.00KiB
>     /dev/drbd1  64.00MiB
>
> Unallocated:
>     /dev/drbd1  377.13GiB
>
> Even if we consider 6G metadata its 331+6 = 337.
> where is 9GB used?
>
> Please explain.

Taking a somewhat higher level view than Austin's reply: on btrfs, plain df, and to a somewhat lesser extent du[1], are at best good /estimations/ of usage, and for df, space remaining. Due to btrfs' COW/copy-on-write semantics and the features btrfs makes available, such as the various replication/raid schemes, snapshotting, etc, which df/du don't really understand (they simply don't have, and weren't /designed/ to have, that level of filesystem-specific insight), they, particularly df due to its whole-filesystem focus, aren't particularly accurate on btrfs. Consider their output more a "best estimate given the rough data we have available" sort of report. To get the real filesystem focused picture, use btrfs filesystem usage, or btrfs filesystem show combined with btrfs filesystem df.
That's what you should trust, altho various utilities that check for available space before doing something often use the kernel-call equivalent of (plain) df to ensure they have the required space, so it's worthwhile to keep an eye on it as the filesystem fills, as well. If it gets too out of sync with btrfs filesystem usage, or if btrfs filesystem usage unallocated drops below say five gigs or data or metadata size vs used shows a spread of multiple gigs (your data shows a spread of ~20 gigs ATM, but with 377 gigs still unallocated it's no big deal; it would be a big deal if those were reversed, tho, only 20 gigs unallocated and a spread of 300+ gigs in data size vs used), then corrective action such as a filtered rebalance may be necessary.

There are entries in the FAQ discussing free space issues that you should definitely read if you haven't, altho they obviously address the general case, so if you have more questions about an individual case after having read them, here is a good place to ask. =:^) Everything having to do with "space" (see both the 1/Important-questions and 4/Common-questions sections) here: https://btrfs.wiki.kernel.org/index.php/FAQ

Meanwhile, it's worth noting that not entirely intuitively, btrfs' COW implementation can "waste" space on larger files that are mostly, but not entirely, rewritten. An example is the best way to demonstrate. Consider each x a used block and each - an unused but still referenced block.

Original file, written as a single extent (diagram works best with monospace, not arbitrarily rewrapped):

  xxx

First rewrite of part of it:

  xxx--xx
  xx

Nth rewrite, where some blocks of the original still remain as originally written:

  --xxx--
  xxx---
  xxx
  x---xx
  xxx
  xxx

As you can see, that first really large extent remains fully referenced, altho only three blocks of it remain in actual use. All those -- won't be returned to free space until those last three blocks get rewritten as well, thus freeing the entire original extent.
I believe this effect is what Austin was referencing when he suggested the defrag, tho defrag won't necessarily /entirely/ clear it up. One way to be /sure/ it's cleared up would be to rewrite the entire file, deleting the original, either by copying it to a different filesystem and back (with the off-filesystem copy guaranteeing that it can't use reflinks to the existing extents), or by using cp's --reflink=never option. (FWIW, I prefer the former, just to be sure, using temporary copies to a suitably sized tmpfs for speed where possible, tho obviously if the file is larger than your memory size that's not possible.)
Re: [Bug report] BTRFS partition re-mounted as read-only after few minutes of use
On Wed, 2018-02-28 at 18:36 +, Filipe Manana wrote: > On Wed, Feb 28, 2018 at 5:50 PM, David Sterba > wrote: > > On Wed, Feb 28, 2018 at 05:43:40PM +0100, peteryuchu...@gmail.com > > wrote: > > > On my laptop, which has just been switched to BTRFS, the root > > > partition > > > (a BTRFS partition inside an encrypted LVM. The drive is an NVMe) > > > is > > > re-mounted as read-only few minutes after boot. > > > > > > Trace: > > > > By any chance, are there other messages from btrfs above the line? > > > > > > [ 199.974591] [ cut here ] > > > [ 199.974593] BTRFS: Transaction aborted (error -95) > > > > -95 is EOPNOTSUPP, ie operation not supported > > > > > [ 199.974647] WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042 > > > btrfs_finish_ordered_io+0x7ab/0x850 [btrfs] > > > > btrfs_finish_ordered_io:: > > > > 3038 btrfs_ordered_update_i_size(inode, 0, > > ordered_extent); > > 3039 ret = btrfs_update_inode_fallback(trans, root, > > inode); > > 3040 if (ret) { > > 3041 btrfs_abort_transaction(trans, ret); > > 3042 goto out; > > 3043 } > > I don't know what's exactly in Arch's kernel, but looking at the > 4.15.5 stable tag from kernel.org: > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.g > it/tree/fs/btrfs/inode.c?h=v4.15.5#n3042 > > The -EOPNOTSUPP error can come from btrfs_drop_extents() through the > call to insert_reserved_file_extent(). > We've had several reports of this kind of error in this location in > the past and they happened to be on filesystems converted from extN > to > btrfs. > I don't know however if this filesystem was from such a conversion > nor > if those old bugs in the conversion tool were fixed. > > Indeed it was converted from ext4. I may try to rebuild the system from scratch when I have more time, but I'm afraid I have to revert back to ext4 for now. > > > > the return code is unexpected here. 
> > And seeing 'operation not supported' after an inode size change looks
> > strange but EOPNOTSUPP could be returned from some places.
> >
> > The transaction is aborted from a thread that finalizes some processing
> > so we don't have enough information here to see how it started. I
> > suspect there's a file that gets modified shortly after boot and hits
> > the problem. I don't think the EOPNOTSUPP is returned from the lower
> > layers (lvm encryption or nvme), so at this point seems like a btrfs bug.
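Since the filesystem turned out to be converted from ext4, a quick heuristic for spotting such conversions is worth noting: btrfs-convert leaves an `ext2_saved` subvolume (holding the original filesystem image) until the conversion is made permanent, so its presence in `btrfs subvolume list` output strongly suggests a converted filesystem (its absence proves nothing, since it may have been deleted). A minimal sketch; the helper name and the sample listing line are illustrative assumptions, not output from this report:

```shell
#!/bin/sh
# Hedged sketch: detect btrfs-convert leftovers by looking for the
# "ext2_saved" subvolume in `btrfs subvolume list` output. The sample
# line below is illustrative; on a real system pipe the command itself:
#   btrfs subvolume list / | looks_converted
looks_converted() {
    grep -q ' path ext2_saved$'
}

sample='ID 256 gen 30 top level 5 path ext2_saved'
if printf '%s\n' "$sample" | looks_converted; then
    echo "converted-from-extN leftover present"
fi
```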
Re: [Bug report] BTRFS partition re-mounted as read-only after few minutes of use
On Wed, Feb 28, 2018 at 5:50 PM, David Sterba wrote: > On Wed, Feb 28, 2018 at 05:43:40PM +0100, peteryuchu...@gmail.com wrote: >> On my laptop, which has just been switched to BTRFS, the root partition >> (a BTRFS partition inside an encrypted LVM. The drive is an NVMe) is >> re-mounted as read-only few minutes after boot. >> >> Trace: > > By any chance, are there other messages from btrfs above the line? >> >> [ 199.974591] [ cut here ] >> [ 199.974593] BTRFS: Transaction aborted (error -95) > > -95 is EOPNOTSUPP, ie operation not supported > >> [ 199.974647] WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042 >> btrfs_finish_ordered_io+0x7ab/0x850 [btrfs] > > btrfs_finish_ordered_io:: > > 3038 btrfs_ordered_update_i_size(inode, 0, ordered_extent); > 3039 ret = btrfs_update_inode_fallback(trans, root, inode); > 3040 if (ret) { > 3041 btrfs_abort_transaction(trans, ret); > 3042 goto out; > 3043 } I don't know what's exactly in Arch's kernel, but looking at the 4.15.5 stable tag from kernel.org: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/fs/btrfs/inode.c?h=v4.15.5#n3042 The -EOPNOTSUPP error can come from btrfs_drop_extents() through the call to insert_reserved_file_extent(). We've had several reports of this kind of error in this location in the past and they happened to be on filesystems converted from extN to btrfs. I don't know however if this filesystem was from such a conversion nor if those old bugs in the conversion tool were fixed. > > the return code is unexpected here. And seeing 'operation not supported' > after a inode size change looks strange but EOPNOTSUPP could be returned > from some places. > > The transaction is aborted from a thread that finalizes some processing > so we don't have enough information here to see how it started. I > suspect there's a file that gets modified short after boot and hits the > problem. 
I don't think the EOPNOTSUPP is returned from the lower layers > (lvm encryption or nvme), so at this point seems like a btrfs bug. -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”
Re: [Bug report] BTRFS partition re-mounted as read-only after few minutes of use
On Wed, Feb 28, 2018 at 05:43:40PM +0100, peteryuchu...@gmail.com wrote:
> On my laptop, which has just been switched to BTRFS, the root partition
> (a BTRFS partition inside an encrypted LVM. The drive is an NVMe) is
> re-mounted as read-only a few minutes after boot.
>
> Trace:

By any chance, are there other messages from btrfs above the line?

> [ 199.974591] [ cut here ]
> [ 199.974593] BTRFS: Transaction aborted (error -95)

-95 is EOPNOTSUPP, ie operation not supported

> [ 199.974647] WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042
> btrfs_finish_ordered_io+0x7ab/0x850 [btrfs]

btrfs_finish_ordered_io:

3038         btrfs_ordered_update_i_size(inode, 0, ordered_extent);
3039         ret = btrfs_update_inode_fallback(trans, root, inode);
3040         if (ret) {
3041                 btrfs_abort_transaction(trans, ret);
3042                 goto out;
3043         }

the return code is unexpected here. And seeing 'operation not supported' after an inode size change looks strange, but EOPNOTSUPP could be returned from some places.

The transaction is aborted from a thread that finalizes some processing, so we don't have enough information here to see how it started. I suspect there's a file that gets modified shortly after boot and hits the problem. I don't think the EOPNOTSUPP is returned from the lower layers (lvm encryption or nvme), so at this point it seems like a btrfs bug.
Re: btrfs space used issue
On Wed, Feb 28, 2018 at 9:01 AM, vinayak hegde wrote:
> I ran a full defragment and balance both, but it didn't help.

Showing the same information immediately after the full defragment would be helpful.

> The files I created and am accounting for match the du -sh output.
> But I am not getting why btrfs internals use so much extra space.
> My worry is that I will get a no-space error earlier than I expect.
> Is it expected that btrfs internals will use so much extra space?

Did you try to reboot? A deleted but still-open file could well cause this effect.

> Vinayak
>
> On Tue, Feb 27, 2018 at 7:24 PM, Austin S. Hemmelgarn wrote:
>> On 2018-02-27 08:09, vinayak hegde wrote:
>>>
>>> I am using btrfs, But I am seeing du -sh and df -h showing huge size
>>> difference on ssd.
>>>
>>> mount:
>>> /dev/drbd1 on /dc/fileunifier.datacache type btrfs
>>> (rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)
>>>
>>> du -sh /dc/fileunifier.datacache/ - 331G
>>>
>>> df -h
>>> /dev/drbd1 746G 346G 398G 47% /dc/fileunifier.datacache
>>>
>>> btrfs fi usage /dc/fileunifier.datacache/
>>> Overall:
>>>     Device size:         745.19GiB
>>>     Device allocated:    368.06GiB
>>>     Device unallocated:  377.13GiB
>>>     Device missing:      0.00B
>>>     Used:                346.73GiB
>>>     Free (estimated):    396.36GiB (min: 207.80GiB)
>>>     Data ratio:          1.00
>>>     Metadata ratio:      2.00
>>>     Global reserve:      176.00MiB (used: 0.00B)
>>>
>>> Data,single: Size:365.00GiB, Used:345.76GiB
>>>     /dev/drbd1 365.00GiB
>>>
>>> Metadata,DUP: Size:1.50GiB, Used:493.23MiB
>>>     /dev/drbd1 3.00GiB
>>>
>>> System,DUP: Size:32.00MiB, Used:80.00KiB
>>>     /dev/drbd1 64.00MiB
>>>
>>> Unallocated:
>>>     /dev/drbd1 377.13GiB
>>>
>>> Even if we consider 6G metadata its 331+6 = 337.
>>> where is 9GB used?
>>>
>>> Please explain.
>>
>> First, you're counting the metadata wrong. The value shown per-device by
>> `btrfs filesystem usage` already accounts for replication (so it's only 3 GB
>> of metadata allocated, not 6 GB).
>> Neither `df` nor `du` looks at the chunk level allocations though.
>>
>> Now, with that out of the way, the discrepancy almost certainly comes from
>> differences in how `df` and `du` calculate space usage. In particular, `df`
>> calls statvfs and looks at the f_blocks and f_bfree values to compute space
>> usage, while `du` walks the filesystem tree calling stat on everything and
>> looking at st_blksize and st_blocks (or instead at st_size if you pass in
>> `--apparent-size` as an option). This leads to a couple of differences in
>> what they will count:
>>
>> 1. `du` may or may not properly count hardlinks, sparse files, and
>> transparently compressed data, dependent on whether or not you use
>> `--apparent-size` (by default, it does properly count all of those), while
>> `df` will always account for those properly.
>> 2. `du` does not properly account for reflinked blocks (from deduplication,
>> snapshots, or use of the CLONE ioctl), and will count each reflink of every
>> block as part of the total size, while `df` will always count each block
>> exactly once no matter how many reflinks it has.
>> 3. `du` does not account for all of the BTRFS metadata allocations,
>> functionally ignoring space allocated for anything but inline data, while
>> `df` accounts for all BTRFS metadata properly.
>> 4. `du` will recurse into other filesystems if you don't pass the `-x`
>> option to it, while `df` will only report for each filesystem separately.
>> 5. `du` will only count data usage under the given mount point, and won't
>> account for data on other subvolumes that may be mounted elsewhere (and if
>> you pass in `-x` won't count data on other subvolumes located under the
>> given path either), while `df` will count all the data in all subvolumes.
>> 6. There are a couple of other differences too, but they're rather complex
>> and dependent on the internals of BTRFS.
>>
>> In your case, I think the issue is probably one of the various things under
>> item 6.
>> Items 1, 2 and 4 will cause `du` to report more space usage than
>> `df`, item 3 is irrelevant because `du` shows less space than the total data
>> chunk usage reported by `btrfs filesystem usage`, and item 5 is irrelevant
>> because you're mounting the root subvolume and not using the `-x` option on
>> `du` (and therefore there can't be other subvolumes you're missing).
>>
>> Try running a full defrag of the given mount point. If what I think is
>> causing this is in fact the issue, that should bring the numbers back
>> in-line with each other.
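The statvfs-vs-stat distinction Austin describes can be seen side by side with a short sketch. The mount-point default of the current directory is purely illustrative, and the `btrfs` invocation is guarded since it needs btrfs-progs installed (and usually root):

```shell
#!/bin/sh
# Hedged sketch: compare the three views of space usage discussed above.
MNT=${MNT:-.}                         # illustrative; use your btrfs mount point

df -h "$MNT"                          # statvfs(): whole-fs f_blocks/f_bfree view
du -shx "$MNT"                        # stat() walk; counts each reflink's blocks
du -shx --apparent-size "$MNT"        # byte sizes; ignores holes/compression

# chunk-level view; needs btrfs-progs and usually root
if command -v btrfs >/dev/null 2>&1; then
    btrfs filesystem usage "$MNT" || true
fi
```

Comparing the three outputs as the filesystem fills makes the kind of divergence described in the list above visible early, before a tool that relies on plain statvfs numbers runs out of space unexpectedly.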
[PATCH] generic: add test for fsync after renaming and linking special file
From: Filipe Manana Test that when a fsync journal/log exists, if we rename a special file (fifo, symbolic link or device), create a hard link for it with its old name and then commit the journal/log, if a power loss happens the filesystem will not fail to replay the journal/log when it is mounted the next time. This test is motivated by a bug found in btrfs, which is fixed by the following patch for the linux kernel: "Btrfs: fix log replay failure after linking special file and fsync" Signed-off-by: Filipe Manana --- tests/generic/479 | 112 ++ tests/generic/479.out | 2 + tests/generic/group | 1 + 3 files changed, 115 insertions(+) create mode 100644 tests/generic/479 create mode 100644 tests/generic/479.out diff --git a/tests/generic/479 b/tests/generic/479 new file mode 100644 index ..7e4ba7d0 --- /dev/null +++ b/tests/generic/479 @@ -0,0 +1,112 @@ +#! /bin/bash +# FSQA Test No. 479 +# +# Test that when a fsync journal/log exists, if we rename a special file (fifo, +# symbolic link or device), create a hard link for it with its old name and then +# commit the journal/log, if a power loss happens the filesystem will not fail +# to replay the journal/log when it is mounted the next time. +# +#--- +# +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + _cleanup_flakey + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/dmflakey + +# real QA test starts here +_supported_fs generic +_supported_os Linux +_require_scratch +_require_dm_target flakey + +rm -f $seqres.full + +run_test() +{ + local file_type=$1 + + _scratch_mkfs >>$seqres.full 2>&1 + _require_metadata_journaling $SCRATCH_DEV + _init_flakey + _mount_flakey + + mkdir $SCRATCH_MNT/testdir + case $file_type in + symlink) + ln -s xxx $SCRATCH_MNT/testdir/foo + ;; + fifo) + mkfifo $SCRATCH_MNT/testdir/foo + ;; + dev) + mknod $SCRATCH_MNT/testdir/foo c 0 0 + ;; + *) + _fail "Invalid file type argument: $file_type" + esac + # Make sure everything done so far is durably persisted. + sync + + # Create a file and fsync it just to create a journal/log. This file + # must be in the same directory as our special file "foo". + touch $SCRATCH_MNT/testdir/f1 + $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/f1 + + # Rename our special file and then create link that has its old name. + mv $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/bar + ln $SCRATCH_MNT/testdir/bar $SCRATCH_MNT/testdir/foo + + # Create a second file and fsync it. This is just to durably persist the + # fsync journal/log which is typically modified by the previous rename + # and link operations. This file does not need to be placed in the same + # directory as our special file. 
+ touch $SCRATCH_MNT/f2 + $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/f2 + + # Simulate a power failure and mount the filesystem to check that + # replaying the fsync log/journal succeeds, that is the mount operation + # does not fail. + _flakey_drop_and_remount + _unmount_flakey + _cleanup_flakey +} + +run_test symlink +run_test fifo +run_test dev + +echo "Silence is golden" +status=0 +exit diff --git a/tests/generic/479.out b/tests/generic/479.out new file mode 100644 index ..290f18b3 --- /dev/null +++ b/tests/generic/479.out @@ -0,0 +1,2 @@ +QA output created by 479 +Silence is golden diff --git a/tests/generic/group b/tests/generic/group index 1e808865..3b9b47e3 100644 --- a/tests/generic/group +++ b/tests/generic/group @@ -481,3 +481,4 @@ 476 auto rw 477 auto quick exportfs 478 auto quick +479 auto quick metadata -- 2.11.0
[PATCH] generic: test fsync new file after removing hard link
From: Filipe Manana Test that if we have a file with two hard links in the same parent directory, then remove one of the links, create a new file in the same parent directory and with the name of the link removed, fsync the new file and have a power loss, mounting the filesystem succeeds. This test is motivated by a bug found in btrfs, which is fixed by the linux kernel patch titled: "Btrfs: fix log replay failure after unlink and link combination" Signed-off-by: Filipe Manana --- tests/generic/480 | 83 +++ tests/generic/480.out | 2 ++ tests/generic/group | 1 + 3 files changed, 86 insertions(+) create mode 100755 tests/generic/480 create mode 100644 tests/generic/480.out diff --git a/tests/generic/480 b/tests/generic/480 new file mode 100755 index ..a287684b --- /dev/null +++ b/tests/generic/480 @@ -0,0 +1,83 @@ +#! /bin/bash +# FSQA Test No. 480 +# +# Test that if we have a file with two hard links in the same parent directory, +# then remove one of the links, create a new file in the same parent directory and +# with the name of the link removed, fsync the new file and have a power loss, +# mounting the filesystem succeeds. +# +#--- +# +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details.
+# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + _cleanup_flakey + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/dmflakey + +# real QA test starts here +_supported_fs generic +_supported_os Linux +_require_scratch +_require_dm_target flakey + +rm -f $seqres.full + +_scratch_mkfs >>$seqres.full 2>&1 +_require_metadata_journaling $SCRATCH_DEV +_init_flakey +_mount_flakey + +mkdir $SCRATCH_MNT/testdir +touch $SCRATCH_MNT/testdir/foo +ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/bar + +# Make sure everything done so far is durably persisted. +sync + +# Now remove one of the links of our file and create a new file with the same name +# and in the same parent directory, and finally fsync this new file. +unlink $SCRATCH_MNT/testdir/bar +touch $SCRATCH_MNT/testdir/bar +$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/bar + +# Simulate a power failure and mount the filesystem to check that replaying +# the fsync log/journal succeeds, that is the mount operation does not fail.
+_flakey_drop_and_remount + +_unmount_flakey +_cleanup_flakey + +echo "Silence is golden" +status=0 +exit diff --git a/tests/generic/480.out b/tests/generic/480.out new file mode 100644 index ..a40a718e --- /dev/null +++ b/tests/generic/480.out @@ -0,0 +1,2 @@ +QA output created by 480 +Silence is golden diff --git a/tests/generic/group b/tests/generic/group index 3b9b47e3..ea2056b1 100644 --- a/tests/generic/group +++ b/tests/generic/group @@ -482,3 +482,4 @@ 477 auto quick exportfs 478 auto quick 479 auto quick metadata +480 auto quick metadata -- 2.11.0
[Bug report] BTRFS partition re-mounted as read-only after a few minutes of use
Hi, On my laptop, which has just been switched to BTRFS, the root partition (a BTRFS partition inside an encrypted LVM. The drive is an NVMe) is re-mounted as read-only few minutes after boot. Trace: [ 199.974591] [ cut here ] [ 199.974593] BTRFS: Transaction aborted (error -95) [ 199.974647] WARNING: CPU: 0 PID: 324 at fs/btrfs/inode.c:3042 btrfs_finish_ordered_io+0x7ab/0x850 [btrfs] [ 199.974648] Modules linked in: tun fuse cmac rfcomm bnep snd_hda_codec_hdmi ip6t_REJECT snd_hda_codec_generic nf_reject_ipv6 nf_log_ipv6 xt_hl nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_rt ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp nls_iso8859_1 nls_cp437 vfat fat nf_conntrack_ipv4 nf_defrag_ipv4 xt_addrtype xt_conntrack snd_soc_skl snd_soc_skl_ipc snd_hda_ext_core snd_soc_sst_dsp snd_soc_sst_ipc snd_soc_acpi brcmfmac snd_soc_core ip6table_filter ip6_tables brcmutil nf_conntrack_netbios_ns snd_compress nf_conntrack_broadcast nf_nat_ftp ac97_bus cfg80211 snd_pcm_dmaengine nf_nat nf_conntrack_ftp nf_conntrack libcrc32c crc32c_generic thunderbolt iptable_filter iTCO_wdt mmc_core iTCO_vendor_support crypto_user msr intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp [ 199.974675] kvm_intel snd_hda_intel snd_hda_codec kvm snd_hda_core applesmc snd_hwdep input_polldev irqbypass snd_pcm intel_cstate snd_timer intel_uncore intel_rapl_perf pcspkr i915 snd i2c_i801 soundcore joydev mousedev input_leds i2c_algo_bit drm_kms_helper hci_uart btbcm btqca btintel drm bluetooth mei_me 8250_dw intel_gtt mei agpgart acpi_als shpchp syscopyarea idma64 sysfillrect sbs sysimgblt fb_sys_fops ecdh_generic rfkill kfifo_buf sbshc industrialio rtc_cmos evdev mac_hid ac apple_bl facetimehd(O) videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media ip_tables x_tables btrfs xor zstd_decompress zstd_compress xxhash raid6_pq dm_crypt algif_skcipher af_alg dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 
crypto_simd [ 199.974710] glue_helper cryptd xhci_pci xhci_hcd usbcore usb_common applespi(O) crc16 led_class intel_lpss_pci intel_lpss spi_pxa2xx_platform [ 199.974718] CPU: 0 PID: 324 Comm: kworker/u8:6 Tainted: G U O 4.15.5-1-ARCH #1 [ 199.974718] Hardware name: Apple Inc. MacBookPro14,1/Mac- B4831CEBD52A0C4C, BIOS MBP141.88Z.0169.B00.1712141501 12/14/2017 [ 199.974734] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] [ 199.974746] RIP: 0010:btrfs_finish_ordered_io+0x7ab/0x850 [btrfs] [ 199.974747] RSP: 0018:b3310128bdc8 EFLAGS: 00010286 [ 199.974749] RAX: RBX: a25b1e150b60 RCX: 0001 [ 199.974749] RDX: 8001 RSI: 9fe47fd0 RDI: [ 199.974750] RBP: a25a2df7dad0 R08: 0001 R09: 0412 [ 199.974751] R10: R11: R12: a25a2df7d8e0 [ 199.974752] R13: a25a2df7d8c0 R14: a25b1f037f78 R15: 0001 [ 199.974753] FS: () GS:a25b2ec0() knlGS: [ 199.974754] CS: 0010 DS: ES: CR0: 80050033 [ 199.974755] CR2: 7fd9cc5dd3e7 CR3: 6300a002 CR4: 003606f0 [ 199.974756] DR0: DR1: DR2: [ 199.974756] DR3: DR6: fffe0ff0 DR7: 0400 [ 199.974757] Call Trace: [ 199.974774] normal_work_helper+0x39/0x370 [btrfs] [ 199.974779] process_one_work+0x1ce/0x410 [ 199.974782] worker_thread+0x2b/0x3d0 [ 199.974784] ? process_one_work+0x410/0x410 [ 199.974785] kthread+0x113/0x130 [ 199.974787] ? kthread_create_on_node+0x70/0x70 [ 199.974789] ? do_syscall_64+0x74/0x190 [ 199.974791] ? 
SyS_exit_group+0x10/0x10 [ 199.974793] ret_from_fork+0x35/0x40 [ 199.974795] Code: 08 01 e9 a4 fb ff ff 49 8b 46 60 f0 0f ba a8 50 12 00 00 02 72 17 8b 74 24 10 83 fe fb 74 32 48 c7 c7 38 a7 6c c0 e8 85 7a a3 de <0f> 0b 8b 4c 24 10 ba e2 0b 00 00 eb b1 4c 8b 23 4c 8b 53 10 41 [ 199.974820] ---[ end trace c8ed62ff6a525901 ]--- [ 199.974822] BTRFS: error (device dm-2) in btrfs_finish_ordered_io:3042: errno=-95 unknown [ 199.974824] BTRFS info (device dm-2): forced readonly [ 199.976696] BTRFS error (device dm-2): pending csums is 6447104 Bug report: https://bugzilla.kernel.org/show_bug.cgi?id=198945 Kernel version: 4.15.5 Distro: Arch Linux -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
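For reference, the "Transaction aborted (error -95)" in the trace is a negated errno value; decoding it shows the underlying failure is EOPNOTSUPP ("Operation not supported"):

```python
import errno
import os

# The trace reports "Transaction aborted (error -95)".  Kernel code
# returns negative errno values, so the underlying error is errno 95.
err = 95
print(os.strerror(err))    # Operation not supported
# On Linux, errno 95 is EOPNOTSUPP (with ENOTSUP as an alias):
print(errno.errorcode[err] in ("EOPNOTSUPP", "ENOTSUP"))    # True
```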
Re: [PATCH] Btrfs: fix unexpected -EEXIST when creating new inode
On Thu, Jan 25, 2018 at 6:02 PM, Liu Bo wrote: > The highest objectid, which is assigned to new inode, is decided at > the time of initializing fs roots. However, in cases where log replay > gets processed, the btree which fs root owns might be changed, so we > have to search it again for the highest objectid, otherwise creating > new inode would end up with -EEXIST. > > cc: v4.4-rc6+ > Fixes: f32e48e92596 ("Btrfs: Initialize btrfs_root->highest_objectid when > loading tree root and subvolume roots") > Signed-off-by: Liu Bo Hi Bo, Any reason to not have submitted a test case for fstests? Unless I missed something this should be easy to reproduce, deterministic issue. thanks > --- > fs/btrfs/tree-log.c | 19 +++ > 1 file changed, 19 insertions(+) > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index a7e6235..646cdbf 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -28,6 +28,7 @@ > #include "hash.h" > #include "compression.h" > #include "qgroup.h" > +#include "inode-map.h" > > /* magic values for the inode_only field in btrfs_log_inode: > * > @@ -5715,6 +5716,24 @@ int btrfs_recover_log_trees(struct btrfs_root > *log_root_tree) > path); > } > > + if (!ret && wc.stage == LOG_WALK_REPLAY_ALL) { > + struct btrfs_root *root = wc.replay_dest; > + > + btrfs_release_path(path); > + > + /* > +* We have just replayed everything, and the highest > +* objectid of fs roots probably has changed in case > +* some inode_item's got replayed. > +*/ > + /* > +* root->objectid_mutex is not acquired as log replay > +* could only happen during mount. 
> +*/ > + ret = btrfs_find_highest_objectid(root, > + &root->highest_objectid); > + } > + > key.offset = found_key.offset - 1; > wc.replay_dest->log_root = NULL; > free_extent_buffer(log->node); > -- > 2.9.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
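A toy model (plain Python, not btrfs code) of the problem the patch fixes: the fs root caches its highest objectid when it is initialized, but log replay can insert inode items with higher objectids into that root's btree, so the next inode allocation collides unless the cached value is recomputed:

```python
# Toy model of why the highest objectid must be looked up again after
# log replay.  New inode numbers are handed out as highest_objectid + 1.

fs_root = {257, 258}             # objectids present in the fs root's btree

highest_objectid = max(fs_root)  # cached at fs-root init time: 258

fs_root.add(259)                 # log replay inserts inode_item 259

# Without re-scanning, the next "new inode" collides:
stale = highest_objectid + 1     # 259 -> already present -> -EEXIST
print(stale in fs_root)          # True: this is the bug

# The fix: call btrfs_find_highest_objectid() again after replay.
highest_objectid = max(fs_root)
fresh = highest_objectid + 1     # 260 -> free
print(fresh in fs_root)          # False
```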
[PATCH 1/2] Btrfs: fix log replay failure after linking special file and fsync
From: Filipe Manana 

If in the same transaction we rename a special file (fifo, character/block
device or symbolic link), then create a hard link for it with its old name
and sync the log, we will end up with a log that can not be replayed: when
attempting to replay it, an EEXIST error is returned and mounting the
filesystem fails. Example scenario:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt
  $ mkdir /mnt/testdir
  $ mkfifo /mnt/testdir/foo
  # Make sure everything done so far is durably persisted.
  $ sync

  # Create some unrelated file and fsync it, this is just to create a log
  # tree. The file must be in the same directory as our special file.
  $ touch /mnt/testdir/f1
  $ xfs_io -c "fsync" /mnt/testdir/f1

  # Rename our special file and then create a hard link with its old name.
  $ mv /mnt/testdir/foo /mnt/testdir/bar
  $ ln /mnt/testdir/bar /mnt/testdir/foo

  # Create some other unrelated file and fsync it, this is just to persist
  # the log tree which was modified by the previous rename and link
  # operations. Alternatively we could have modified file f1 and fsynced it.
  $ touch /mnt/f2
  $ xfs_io -c "fsync" /mnt/f2

  <power fail>

  $ mount /dev/sdc /mnt
  mount: mount /dev/sdc on /mnt failed: File exists

This happens because the log tree and the subvolume's tree both have an
entry in the directory "testdir" with the same name, that is, there is one
key (258 INODE_REF 257) in the subvolume tree and another one in the log
tree (where 258 is the inode number of our special file and 257 is the
inode of directory "testdir"). Only the data of those two keys differs: in
the subvolume tree the index field of the inode reference has a value of 3,
while in the log tree it has a value of 5. Because the same key exists in
both trees but with different index values, the log replay fails with an
-EEXIST error when attempting to replay the inode reference from the log
tree.
Fix this by setting the last_unlink_trans field of the inode (our special
file) to the current transaction id when a hard link is created, as this
forces logging the parent directory inode, solving the conflict at log
replay time.

A new generic test case for fstests was also submitted.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/tree-log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 28d0de199b05..411a022489e4 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5841,7 +5841,7 @@ int btrfs_log_new_name(struct btrfs_trans_handle *trans,
 	 * this will force the logging code to walk the dentry chain
 	 * up for the file
 	 */
-	if (S_ISREG(inode->vfs_inode.i_mode))
+	if (!S_ISDIR(inode->vfs_inode.i_mode))
 		inode->last_unlink_trans = trans->transid;

 /*
-- 
2.11.0
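The one-line fix widens a mode check. As an illustration using Python's stat module (whose macros mirror the kernel's S_ISREG/S_ISDIR), the old condition only matched regular files, while the new one covers every non-directory, including the fifo from the reproducer:

```python
import stat

# Old condition (before the patch):  S_ISREG(i_mode)  -> regular files only
# New condition (after the patch):  !S_ISDIR(i_mode)  -> everything except
# directories, i.e. also fifos, device nodes, sockets and symlinks.
modes = {
    "regular": stat.S_IFREG,
    "fifo":    stat.S_IFIFO,
    "chardev": stat.S_IFCHR,
    "blkdev":  stat.S_IFBLK,
    "symlink": stat.S_IFLNK,
    "socket":  stat.S_IFSOCK,
    "dir":     stat.S_IFDIR,
}
for name, mode in modes.items():
    old = stat.S_ISREG(mode)        # condition before the patch
    new = not stat.S_ISDIR(mode)    # condition after the patch
    print(f"{name:8} old={old} new={new}")
```

Only "regular" satisfies the old check, so special files never had last_unlink_trans set; all entries except "dir" satisfy the new one.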
[PATCH] Btrfs: fix log replay failure after unlink and link combination
From: Filipe Manana If we have a file with 2 (or more) hard links in the same directory, remove one of the hard links, create a new file (or link an existing file) in the same directory with the name of the removed hard link, and then finally fsync the new file, we end up with a log that fails to replay, causing a mount failure. Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/testdir $ touch /mnt/testdir/foo $ ln /mnt/testdir/foo /mnt/testdir/bar $ sync $ unlink /mnt/testdir/bar $ touch /mnt/testdir/bar $ xfs_io -c "fsync" /mnt/testdir/bar $ mount /dev/sdb /mnt mount: mount(2) failed: /mnt: No such file or directory When replaying the log, for that example, we also see the following in dmesg/syslog: [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257 [71813.674204] [ cut here ] [71813.675694] BTRFS: Transaction aborted (error -2) [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs] [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs] [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: GW 4.15.0-rc9-btrfs-next-56+ #1 [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs] [71813.679669] RSP: 0018:c90001cef738 EFLAGS: 00010286 [71813.679669] RAX: 0025 RBX: 880217ce4708 
RCX: 0001 [71813.679669] RDX: RSI: 81c14bae RDI: [71813.679669] RBP: c90001cef7c0 R08: 0001 R09: 0001 [71813.679669] R10: c90001cef5e0 R11: 8343f007 R12: 880217d474c8 [71813.679669] R13: fffe R14: 88021ccf1548 R15: 0101 [71813.679669] FS: 7f7cee84c480() GS:88023fc8() knlGS: [71813.679669] CS: 0010 DS: ES: CR0: 80050033 [71813.679669] CR2: 7f7cedc1abf9 CR3: 0002354b4003 CR4: 001606e0 [71813.679669] Call Trace: [71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs] [71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs] [71813.679669] add_inode_ref+0x71e/0x851 [btrfs] [71813.679669] ? __lock_is_held+0x39/0x71 [71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs] [71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs] [71813.679669] ? rcu_read_unlock+0x3a/0x57 [71813.679669] ? __lock_is_held+0x39/0x71 [71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs] [71813.679669] walk_log_tree+0xad/0x188 [btrfs] [71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs] [71813.679669] ? replay_one_extent+0x544/0x544 [btrfs] [71813.679669] open_ctree+0x1cf6/0x2209 [btrfs] [71813.679669] btrfs_mount_root+0x368/0x482 [btrfs] [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 [71813.679669] ? __lockdep_init_map+0x176/0x1c2 [71813.679669] ? mount_fs+0x64/0x10b [71813.679669] mount_fs+0x64/0x10b [71813.679669] vfs_kern_mount+0x68/0xce [71813.679669] btrfs_mount+0x13e/0x772 [btrfs] [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 [71813.679669] ? __lockdep_init_map+0x176/0x1c2 [71813.679669] ? mount_fs+0x64/0x10b [71813.679669] mount_fs+0x64/0x10b [71813.679669] vfs_kern_mount+0x68/0xce [71813.679669] do_mount+0x6e5/0x973 [71813.679669] ? 
memdup_user+0x3e/0x5c [71813.679669] SyS_mount+0x72/0x98 [71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b [71813.679669] RIP: 0033:0x7f7cedf150ba [71813.679669] RSP: 002b:7ffca71da688 EFLAGS: 0206 [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c [71813.679669] ---[ end trace 83bd473fc5b4663b ]--- [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree) [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30 [71814.128078] BTRFS error (device dm-0): open_ctree failed This happens because the log has inode reference items for both inode 25
Re: Btrfs occupies more space than du reports...
On Wed, Feb 28, 2018 at 2:26 PM, Shyam Prasad N wrote: > Hi, > > Thanks for the reply. > >> * `df` calls `statvfs` to get it's data, which tries to count physical >> allocation accounting for replication profiles. In other words, data in >> chunks with the dup, raid1, and raid10 profiles gets counted twice, data in >> raid5 and raid6 chunks gets counted with a bit of extra space for the >> parity, etc. > > We have data not using raid (single), metadata using dup, we've not > used compression, subvols have not been created yet (other than the > default subvol), there are no other mount points within the tree. > Taking into account all that you're saying, the numbers don't make > sense to me. "btrfs fi usage" tells that the data "used" is much more > than what it should be. I agree more with what du is saying the disk > usage is. > I tried an experiment. Filled up the available space (as per what > btrfs believes is available) with huge files. As soon as the usage > reached 100%, further writes started to return ENOSPC. This is what > I'm scared is what is going to happen when these filesystems > eventually fill up. This would normally be the expected behaviour, but > in many of these servers, the actual data that is being used is much > lesser (60-70 GBs in some cases). > To me, it looks like a btrfs internal refcounting has gone wrong. > Maybe it's thinking that some data blocks (which are actually free) > are in use? One reason could be overwrites inside of extents. What happens is btrfs does not (always) physically split extent when it is partially overwritten. So some space remains free but unavailable. 
Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb18387584 16704 7531456 1% /mnt localhost:~ # dd if=/dev/urandom of=/mnt/file bs=1M count=100 100+0 records in 100+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 0.580041 s, 181 MB/s localhost:~ # sync localhost:~ # df -k /mnt Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb18387584 119552 7428864 2% /mnt localhost:~ # dd if=/dev/urandom of=/mnt/file bs=1M count=1 conv=notrunc seek=25 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00781892 s, 134 MB/s localhost:~ # dd if=/dev/urandom of=/mnt/file bs=1M count=1 conv=notrunc seek=50 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00780386 s, 134 MB/s localhost:~ # dd if=/dev/urandom of=/mnt/file bs=1M count=1 conv=notrunc seek=75 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00761908 s, 138 MB/s localhost:~ # sync localhost:~ # df -k /mnt Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb18387584 122624 7425792 2% /mnt So 3M is lost. And if you write 50M in the middle you will get 50M "lost" space. I do not know how btrfs decides when to split extent. defragmenting file should free those partial extents again. btrfs fi defrag -r /mnt > Or some other refcounting issue? > We've tried "btrfs check" as well as "btrfs scrub", so far. Both have > not reported any errors. > > Regards, > Shyam > > On Fri, Feb 23, 2018 at 6:53 PM, Austin S. Hemmelgarn > wrote: >> On 2018-02-23 06:21, Shyam Prasad N wrote: >>> >>> Hi, >>> >>> Can someone explain me why there is a difference in the number of >>> blocks reported by df and du commands below? 
>>> >>> = >>> # df -h /dc >>> Filesystem Size Used Avail Use% Mounted on >>> /dev/drbd1 746G 519G 225G 70% /dc >>> >>> # btrfs filesystem df -h /dc/ >>> Data, single: total=518.01GiB, used=516.58GiB >>> System, DUP: total=8.00MiB, used=80.00KiB >>> Metadata, DUP: total=2.00GiB, used=1019.72MiB >>> GlobalReserve, single: total=352.00MiB, used=0.00B >>> >>> # du -sh /dc >>> 467G/dc >>> = >>> >>> df shows 519G is used. While recursive check using du shows only 467G. >>> The filesystem doesn't contain any snapshots/extra subvolumes. >>> Neither does it contain any mounted filesystem under /dc. >>> I also considered that it could be a void left behind by one of the >>> open FDs held by a process. So I rebooted the system. Still no >>> changes. >>> >>> The situation is even worse on a few other systems with similar >>> configuration. >>> >> >> At least part of this is a difference in how each tool computes space usage. >> >> * `df` calls `statvfs` to get it's data, which tries to count physical >> allocation accounting for replication profiles. In other words, data in >> chunks with the dup, raid1, and raid10 profiles gets counted twice, data in >> raid5 and raid6 chunks gets counted with a bit of extra space for the >> parity, etc. >> >> * `btrfs fi df` looks directly at the filesystem itself and counts how much >> space is available to each chunk type in the `total` values and how much >> space is used in each chunk type in the `used` values, after replication. >> If you add together the data used value and twice the system and metadata >> used values
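The df numbers in the dd experiment above line up exactly with that explanation; a quick check of the arithmetic (values in 1K blocks, taken from the transcript):

```python
# "Used" column from the df transcript, in 1K blocks.
after_big_write  = 119552   # after writing the 100 MiB file + sync
after_overwrites = 122624   # after three 1 MiB mid-extent overwrites + sync

lost_kib = after_overwrites - after_big_write
print(lost_kib // 1024)     # 3 -> the three overwritten MiB are counted
                            #      twice until the old extent is released
```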
Re: Btrfs occupies more space than du reports...
Hi, Thanks for the reply. > * `df` calls `statvfs` to get it's data, which tries to count physical > allocation accounting for replication profiles. In other words, data in > chunks with the dup, raid1, and raid10 profiles gets counted twice, data in > raid5 and raid6 chunks gets counted with a bit of extra space for the > parity, etc. We have data not using raid (single), metadata using dup, we've not used compression, subvols have not been created yet (other than the default subvol), there are no other mount points within the tree. Taking into account all that you're saying, the numbers don't make sense to me. "btrfs fi usage" tells that the data "used" is much more than what it should be. I agree more with what du is saying the disk usage is. I tried an experiment. Filled up the available space (as per what btrfs believes is available) with huge files. As soon as the usage reached 100%, further writes started to return ENOSPC. This is what I'm scared is what is going to happen when these filesystems eventually fill up. This would normally be the expected behaviour, but in many of these servers, the actual data that is being used is much lesser (60-70 GBs in some cases). To me, it looks like a btrfs internal refcounting has gone wrong. Maybe it's thinking that some data blocks (which are actually free) are in use? Or some other refcounting issue? We've tried "btrfs check" as well as "btrfs scrub", so far. Both have not reported any errors. Regards, Shyam On Fri, Feb 23, 2018 at 6:53 PM, Austin S. Hemmelgarn wrote: > On 2018-02-23 06:21, Shyam Prasad N wrote: >> >> Hi, >> >> Can someone explain me why there is a difference in the number of >> blocks reported by df and du commands below? 
>> >> = >> # df -h /dc >> Filesystem Size Used Avail Use% Mounted on >> /dev/drbd1 746G 519G 225G 70% /dc >> >> # btrfs filesystem df -h /dc/ >> Data, single: total=518.01GiB, used=516.58GiB >> System, DUP: total=8.00MiB, used=80.00KiB >> Metadata, DUP: total=2.00GiB, used=1019.72MiB >> GlobalReserve, single: total=352.00MiB, used=0.00B >> >> # du -sh /dc >> 467G/dc >> = >> >> df shows 519G is used. While recursive check using du shows only 467G. >> The filesystem doesn't contain any snapshots/extra subvolumes. >> Neither does it contain any mounted filesystem under /dc. >> I also considered that it could be a void left behind by one of the >> open FDs held by a process. So I rebooted the system. Still no >> changes. >> >> The situation is even worse on a few other systems with similar >> configuration. >> > > At least part of this is a difference in how each tool computes space usage. > > * `df` calls `statvfs` to get it's data, which tries to count physical > allocation accounting for replication profiles. In other words, data in > chunks with the dup, raid1, and raid10 profiles gets counted twice, data in > raid5 and raid6 chunks gets counted with a bit of extra space for the > parity, etc. > > * `btrfs fi df` looks directly at the filesystem itself and counts how much > space is available to each chunk type in the `total` values and how much > space is used in each chunk type in the `used` values, after replication. > If you add together the data used value and twice the system and metadata > used values, you get the used value reported by regular `df` (well, close to > it that is, `df` rounds at a lower precision than `btrfs fi df` does). > > * `du` scans the directory tree and looks at the file allocation values > returned form `stat` calls (or just looks at file sizes if you pass the > `--apparent-size` flag to it). 
Like `btrfs fi df`, it reports values after > replication, it has a couple of nasty caveats on BTRFS, namely that it will > report sizes for natively compressed files _before_ compression, and will > count reflinked blocks once for each link. > > Now, this doesn't explain the entirety of the discrepancy with `du`, but it > should cover the whole difference between `df` and `btrfs fi df`. -- -Shyam -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
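The relationship Austin describes can be checked against the numbers posted earlier in the thread: adding the data used value to twice the (DUP) metadata and system used values reproduces df's figure:

```python
GiB = 1024 ** 3
MiB = 1024 ** 2
KiB = 1024

# Values from `btrfs filesystem df -h /dc/` in this thread.
data_used     = 516.58 * GiB    # Data, single
metadata_used = 1019.72 * MiB   # Metadata, DUP -> stored twice on disk
system_used   = 80 * KiB        # System, DUP  -> stored twice on disk

raw_used = data_used + 2 * (metadata_used + system_used)
print(round(raw_used / GiB, 2))  # ~518.57, which df rounds to 519G
```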
[PATCH 3/3] btrfs-progs: fsck-tests: Introduce test case with keyed data backref with the extent offset
Add the testcase for false alert of data extent backref lost with the extent offset. The image can be reproduced by the following commands: -- dev=~/test.img mnt=/mnt/btrfs umount $mnt &> /dev/null fallocate -l 128M $dev mkfs.btrfs $dev mount $dev $mnt for i in `seq 1 10`; do xfs_io -f -c "pwrite 0 2K" $mnt/file$i done xfs_io -f -c "falloc 0 64K" $mnt/file11 for i in `seq 1 32`; do xfs_io -f -c "reflink $mnt/file11 0 $(($i * 64))K 64K" $mnt/file11 done xfs_io -f -c "reflink $mnt/file11 32K $((33 * 64))K 32K" $mnt/file11 btrfs subvolume snapshot $mnt $mnt/snap1 umount $mnt btrfs-image -c9 $dev extent_data_ref.img -- Signed-off-by: Lu Fengqi --- .../fsck-tests/020-extent-ref-cases/extent_data_ref.img | Bin 0 -> 6144 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 tests/fsck-tests/020-extent-ref-cases/extent_data_ref.img diff --git a/tests/fsck-tests/020-extent-ref-cases/extent_data_ref.img b/tests/fsck-tests/020-extent-ref-cases/extent_data_ref.img new file mode 100644 index ..3ab2396ba9c810d98f16a5efcf7fe23ee4b12ab5 GIT binary patch literal 6144 zcmeHKhgXx!wg;pLB1&(OB2^RhswC7R@)a*^1Q zoukK(t9nD9`o4x$Zyr51!+&3ungAIaVYi!m$6rv1jhq;d6iHImVvj2b5>V zZ~4u@ZwCGY81Nfk`ym*TMS8{0BlxQ&y0Uq6A0K)8N6Q>rPiyN%7Kg|ljJ!X25e4SR zqg7JI60Rvr4w?fol%`OCtCcBcV<_$gA%9RVL#wS;7f#<1L22?a92Cb2xOK=~ ztcp9_ps855F#$LaXkeFshxR3J0Tq7L8mL*o3+{{h8zP1eZcPCgCjIA^nAjejfD;qQ|nts`g?xciH zSxzRsQ}~tghc&9j`df0KVy!nE4dZN{05%*Q5=fhH7U^d!v}miq*r;Z3j>~gzE;@8W z6;$=)G#k9A=;EiI!?T{UWC3V7yjSSRK6Abo%Piy-qi57pvzh%8ZyE0 z0;(+IqPI&A6`Z}KJN*Z#XK#$}6`aVdTGyFIX`vPTET|;IsYQub!&uY zwfq$QiB(`qtSwrzUre0u876v5M(qYvQN{owlvG00#}j>3Q58 zRrn9L;w~AKI|K>y>=Isih~!@YzSS&9VjAeID>ihw!{Lp-KJt7ypjyx;GWJED%rm7j ziD!Rj>Mx}I1@ce}zXc`0IWd@m<#(Mcvio-{-6V}syKbv|5Ap*Bju=+1IqCewAP6|U zpi^s!F8{bS^|eCQMI~!KtqWn(DoHEt| zqd2TOWRx%Gz#>Z$ZP-Yi3OTx}MSgu7qWz;ZiBo*i^m1yD^=B_@*B2ZsECn5aDkXlc z*8}y9j>h`0H0m9Pw|Npg28u^ixQhCVH7FFc^9+42`K727;7xag@5JL1F$EKnt2$f> z9z(^ZW(qy31tQgcDbec8_S?c&;(IK=6&Q11Q_pqUnXc8JH{^)tF!oPTFQ_O{pb2Y9 
z@SW{5u?WN@dh8hK6*P}3f$NqxjjF1uXDV?nIA{CO#Y=7;_r1O}o&B4w(EPE0jB49G z&>n=!@5%~3D3OZnH_cwfy~8m(J2bagn3G`$qalENf;V5T~OcAwWmv zHYO=|{o!@5KnfdjLB6Mf6i?z>fGQIAU&dWWMi^UHi(^(}Z;`Zl@~KW%jR7FCz;JxrP??o* zt1gxa#S)8z$HL+6i80A^;WRNbIVJFjmr%KnBF5!q*CSqrgSEvDLVQQ34hjP-ij-P- zr<0E+kFI{4wGub9Xem8y9gR=bu$n){U_FPjMxv94w2|+Q*KQzTwk~pOK0YLW<4(*QGhOFKsvH3 zJie6P);E>4$1Xmv=*fVjYj{Dr${YHb>RqpwW&-*l6=t=T!$j?kDlW79;8c2}Yfj}I zLrgG3$S(XlaQ1SHG&O_?Pnvx{G0os!9{0(T($%_1W~P^soWkM^dU;O;ymGs>{~W%Q zui9?JM8$8ZI#80s6N8A9W}*^kS|fGWY*%H1GT0je(m*lSfwK;Go<~HYgc}4VGzR!! zl=mwm`GoQO!DO1ekx>Q zm0YZ@b=|Q4F@L_HQ)Hxe{s`JeMX+;Qg* zZJ}GJfj_uTDhn!AzJ?8**@hC9GN77IWl9AFNi^(Nz)C#u3c%IYy`Ye9gJVq(X z7p6EJhHx!E4kaAo*1xUqk|+*sRg@mk#xq)n)-)flPrClVW$x+!d^&lnpdRZGGU?FS zjNQh6nEi@Muxi}FUFixI-#B@WhIXGW>tjn-y{7AiA3`+N3wfbWyCHwe>0jo(gu3Fr zpyb0)AQIC9N10u;2seX57s)O;?eu9DeOs&QM`TMgF@e z+{))YEFaq=)c2`XZ4JEpSz3@p%3is6zwLHnPD1K2 znIToGv8NeWky{p%3?Iy+?8*ku4Q91+6Z)~$s`prt>lTVkb>?|Oy`Igpx6CKrKW*~V zV1K<-f2a2&b9|zoQX{f&z5{61=+ch@-ZnHYg1nQtgtx`* z>4y@f;@sfvVb7hjhVP&Dy$bAxmgAUNfEcPKDKt14a0lUO8Q%@N-gA&!l*_yoe#I|8*P6ARScVV3wHEXzx-ppZN-6l$`$I=ZNOF zr0rRR05%UH0BJ)Aq=q8|=8t7{=9&=#*;ZCpTou;iqIcRl=(o1U{bh!^^w_Ro20F%O zW@Q5W84H1=3k#%%b(T@J*IFt=D|w2s-#k{__F4XgHnDetS}sTkjw z{yEPPxKR}p=j&Xab)~p(&BABXuLGcvPW#LOptjWJn}Fg{5P?6eWtMT{t6hPoZazvK z$2Dj>`x0uV-So;v=k{6twPpUV9VM%OsUq4}Kd~SK6%h8#-Omk7cSb7$$10)$clOo* zs_IS@*eY&q>XvGU~` zRNkCm4R&m=)mt}5YucjqzDCRM2pVtLoMduA>yg{veA8;z<=GXpXU50wkLpp4>hb;2 zS!n(_%ikW)ZT8Xebnw|btf_cP#r2Addf4Jg<~Akb9rE8Lj37{ZY*4W~vEB!nVWglp zySnP7ENL39eOyl?7ji47*q8z~CL%0HZWoS+_FZF9cluiDz
[PATCH 2/3] btrfs-progs: check/lowmem: Fix false alert of data extent backref lost for snapshot
Btrfs lowmem check reports the following false alert:
--
ERROR: file extent[267 2162688] root 256 owner 5 backref lost
--

The file extent is in a leaf which is shared by file tree 256 and the fs
tree:
--
leaf 30605312 items 46 free space 4353 generation 7 owner 5
..
	item 45 key (267 EXTENT_DATA 2162688) itemoff 5503 itemsize 53
		generation 7 type 2 (prealloc)
		prealloc data disk byte 13631488 nr 65536
		prealloc data offset 32768 nr 32768
--

And there is a corresponding extent_data_ref item in the extent tree:
--
	item 1 key (13631488 EXTENT_DATA_REF 1007496934287921081) itemoff 15274 itemsize 28
		extent data backref root 5 objectid 267 offset 2129920 count 1
--

The offset of an EXTENT_DATA_REF key is the hash of the owner root
objectid, the inode number, and the calculated offset (file offset -
extent offset). What caused the false alert is that the code mixed up the
owner root objectid and the objectid of the file tree being checked.

Fixes: b0d360b541f0 ("btrfs-progs: check: introduce function to check data backref in extent tree")
Signed-off-by: Lu Fengqi 
---
 check/mode-lowmem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index f37b1b2c1571..6f1ea8db341d 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2689,8 +2689,8 @@ static int check_extent_data_item(struct btrfs_root *root,
 	/* Didn't find inlined data backref, try EXTENT_DATA_REF_KEY */
 	dbref_key.objectid = btrfs_file_extent_disk_bytenr(eb, fi);
 	dbref_key.type = BTRFS_EXTENT_DATA_REF_KEY;
-	dbref_key.offset = hash_extent_data_ref(root->objectid,
-			fi_key.objectid, fi_key.offset - offset);
+	dbref_key.offset = hash_extent_data_ref(owner, fi_key.objectid,
+			fi_key.offset - offset);

 	ret = btrfs_search_slot(NULL, root->fs_info->extent_root,
 			&dbref_key, &path, 0, 0);
-- 
2.16.2
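To make the mix-up concrete, here is a toy stand-in for hash_extent_data_ref() in plain Python. The real btrfs implementation chains crc32c over the three values; zlib.crc32 is used here purely for illustration, so the hash values themselves will not match btrfs's on-disk keys:

```python
import zlib

def hash_extent_data_ref(root_objectid, inode_objectid, offset):
    """Toy stand-in: the real version chains crc32c over the three
    values; plain crc32 here just shows the shape of the computation."""
    h = zlib.crc32(root_objectid.to_bytes(8, "little"))
    h = zlib.crc32(inode_objectid.to_bytes(8, "little"), h)
    h = zlib.crc32(offset.to_bytes(8, "little"), h)
    return h

# The bug: hashing with the tree being checked (snapshot 256) instead of
# the owner of the shared leaf (fs tree, 5) yields a different key
# offset, so the lookup misses the existing EXTENT_DATA_REF item.
file_offset, extent_offset = 2162688, 32768
calculated = file_offset - extent_offset      # 2129920, as in the item dump
wrong = hash_extent_data_ref(256, 267, calculated)
right = hash_extent_data_ref(5, 267, calculated)
print(wrong != right)                         # True -> false "backref lost"
```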
[PATCH v2 1/3] btrfs-progs: check/lowmem: Fix the incorrect error message of check_extent_data_item
Instead of the disk_bytenr and disk_num_bytes of the extent item that the
file extent references, we should output the objectid and offset of the
file extent itself. And since the leaf may be shared by file trees, we
should also print the objectid of the root and the owner of the leaf.

Fixes: b0d360b541f0 ("btrfs-progs: check: introduce function to check data backref in extent tree")
Signed-off-by: Lu Fengqi 
---
V2: Output the objectid of the root and the owner of the leaf.

 check/mode-lowmem.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 62bcf3d2e126..f37b1b2c1571 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2631,9 +2631,9 @@ static int check_extent_data_item(struct btrfs_root *root,

 	if (!(extent_flags & BTRFS_EXTENT_FLAG_DATA)) {
 		error(
-"extent[%llu %llu] backref type mismatch, wanted bit: %llx",
-		      disk_bytenr, disk_num_bytes,
-		      BTRFS_EXTENT_FLAG_DATA);
+"file extent[%llu %llu] root %llu owner %llu backref type mismatch, wanted bit: %llx",
+		      fi_key.objectid, fi_key.offset, root->objectid, owner,
+		      BTRFS_EXTENT_FLAG_DATA);
 		err |= BACKREF_MISMATCH;
 	}

@@ -2722,8 +2722,9 @@ out:
 		err |= BACKREF_MISSING;
 	btrfs_release_path(&path);
 	if (err & BACKREF_MISSING) {
-		error("data extent[%llu %llu] backref lost",
-		      disk_bytenr, disk_num_bytes);
+		error(
+	"file extent[%llu %llu] root %llu owner %llu backref lost",
+		      fi_key.objectid, fi_key.offset, root->objectid, owner);
 	}
 	return err;
 }
-- 
2.16.2
Re: Fwd: Re: BUG: unable to handle kernel paging request at ffff9fb75f827100
Hi Christoph,

Since I'm still digging into the unexpected corruption (although without
much progress yet), would you please describe how the corruption happened?

In my current investigation, btrfs is indeed bullet-proof (unlike my
original assumption) when tested with the newer dm-log-writes tool. But
the free space cache file (v1 free space cache) is not CoW protected, so
it's vulnerable to power loss.

So my current assumption is that at least 2 power losses happened during
the problem. The 1st power loss corrupted the free space cache in a way
not detected by its checksum, and btrfs used the corrupted free space
cache to allocate tree blocks. And then the 2nd power loss happened.
Since newly allocated tree blocks can overwrite existing tree blocks,
this breaks the metadata CoW of btrfs and leads to the final corruption.

Would you please provide some detailed info about the corruption?

Thanks,
Qu

On February 23, 2018 03:21, Christoph Anton Mitterer wrote:
> On February 22, 2018 04:57:53 CET, Qu Wenruo wrote:
>>
>>
>> On February 22, 2018 10:56, Christoph Anton Mitterer wrote:
>>> Just one last for today... I did a quick run with the byte nr from
>>> the last mail... See screenshot
>>>
>>> It still gives these mapping errors... But does seem to write
>>> files...
>>
>> From your previous picture, it seems that FS_TREE is your primary
>> subvolume, and 257 would be your snapshot.
>>
>> And for that block which can't be mapped, it seems to be a corruption
>> and it's really too large.
>>
>> So ignoring it wouldn't be a problem.
>>
>> And just keep btrfs restore running to see what it salvaged?
>>
>>>
>>> But these mapping errors... Wtf?!
>>>
>>>
>>> Thanks and until tomorrow.
>>>
>>> Chris
>>>
>>> Oh and in my panic (I still fear that my main data fs, which is on
>>> other hard disks, could be affected by that strange bug too, and have
>>> no idea how to verify they are not) I forgot: you are from China,
>>> aren't you? So a blessed happy new year. :-)
>>
>> Happy new year too.
>>
>> Thanks,
>> Qu
>>
>>>

> Hey
>
> Have you written more after the mail below? Cause my normal email account
> ran full and I cannot recover that right now with my computer.
>
> Anyway... I now tried the restore and it seems to give back some data
> (haven't looked at it yet)... I also made a dd copy of the whole fs image
> to another freshly crafted btrfs fs (as an image file).
>
> That seemed to work well, but when I diffed that image against the
> original, new csum errors of that file were found. (see attached image)
>
> Could that be a pointer to some hardware defect? Perhaps the memory?
> Though I did do an extensive memtest86+ a while ago.
>
> And that could be the reason for the corruption in the first place...
>
> Thanks,
> Chris.

signature.asc
Description: OpenPGP digital signature
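Qu's two-power-loss hypothesis earlier in the thread can be sketched as a toy model (plain Python, not btrfs code): metadata is CoW, so live blocks are never overwritten, but the v1 free space cache is written in place, and a corrupted cache can hand out a block that is still referenced:

```python
def allocate_and_write(cache, live):
    """Take a block from the free space cache and write to it.
    CoW is only safe if the allocated block really was free."""
    block = min(cache)
    cache.discard(block)
    overwrote_live = block in live   # must never happen on healthy btrfs
    live.add(block)
    return block, overwrote_live

live_blocks = {100, 101, 102}        # blocks the current trees reference

# Healthy cache: only truly free blocks are listed.
_, bad = allocate_and_write({200, 201}, set(live_blocks))
print(bad)                           # False -> the CoW guarantee holds

# Power loss #1 corrupts the cache undetected: live block 101 looks free.
_, bad = allocate_and_write({101, 200}, set(live_blocks))
print(bad)                           # True -> power loss #2 now leaves
                                     #         broken metadata behind
```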