[PATCH RFC] btrfs-progs: Show backtrace on BUGs
btrfs check is still under heavy development and so there are some BUGs beging hit. btrfs check can be run on limited environment which lacks gdb to debug the abort in detail. If we could see backtrace, it will be easier to find a root cause of the BUG. Following is my btrfs check output with and without the patch. without patch: ref mismatch on [1437411180544 16384] extent item 3, found 2 btrfs: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags 1)' failed. enabling repair mode Checking filesystem on /dev/sdb2 UUID: a53121ee-679f-4241-bb44-ceb5a1a7beb7 with patch: ref mismatch on [1437411180544 16384] extent item 3, found 2 btrfs check(free_extent_buffer+0xa3)[0x808ff89] btrfs check[0x8090222] btrfs check(alloc_extent_buffer+0xa7)[0x80903d0] btrfs check(btrfs_find_create_tree_block+0x2e)[0x807edef] btrfs check(btrfs_alloc_free_block+0x299)[0x808a3c4] btrfs check(__btrfs_cow_block+0x1d4)[0x8078896] btrfs check(btrfs_cow_block+0x150)[0x807934d] btrfs check(btrfs_search_slot+0x136)[0x807c041] btrfs check[0x8065a6b] btrfs check[0x806bc0f] btrfs check(cmd_check+0xc8f)[0x806d03a] btrfs check(main+0x167)[0x804f5ec] /lib/libc.so.6(__libc_start_main+0xe6)[0xf754c4f6] btrfs check[0x804f061] btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags 1)' failed. enabling repair mode Checking filesystem on /dev/sdb2 UUID: a53121ee-679f-4241-bb44-ceb5a1a7beb7 Now it's much clear that there's something wrong around alloc_extent_buffer. Signed-off-by: Naohiro Aota na...@elisp.net --- Makefile | 2 +- kerncompat.h | 29 ++--- 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/Makefile b/Makefile index 76565e8..9db5441 100644 --- a/Makefile +++ b/Makefile @@ -5,7 +5,7 @@ CC = gcc LN = ln AR = ar AM_CFLAGS = -Wall -D_FILE_OFFSET_BITS=64 -DBTRFS_FLAT_INCLUDES -fno-strict-aliasing -fPIC -CFLAGS = -g -O1 -fno-strict-aliasing +CFLAGS = -g -O1 -fno-strict-aliasing -rdynamic objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \ root-tree.o dir-item.o file-item.o inode-item.o inode-map.o \ extent-cache.o extent_io.o volumes.o utils.o repair.o \ diff --git a/kerncompat.h b/kerncompat.h index f370cd8..94fadc8 100644 --- a/kerncompat.h +++ b/kerncompat.h @@ -27,6 +27,7 @@ #include byteswap.h #include assert.h #include stddef.h +#include execinfo.h #include linux/types.h #ifndef READ @@ -50,7 +51,18 @@ #define ULONG_MAX (~0UL) #endif -#define BUG() assert(0) +#define MAX_BACKTRACE 128 +#define show_trace() \ + do {\ + void *trace[MAX_BACKTRACE]; \ + int n = backtrace(trace, MAX_BACKTRACE);\ + backtrace_symbols_fd(trace, n, 2); \ + } while(0) +#define BUG() \ + do {\ + show_trace(); \ + assert(0); \ + } while(0) #ifdef __CHECKER__ #define __force__attribute__((force)) #define __bitwise__ __attribute__((bitwise)) @@ -233,8 +245,19 @@ static inline long IS_ERR(const void *ptr) #define kstrdup(x, y) strdup(x) #define kfree(x) free(x) -#define BUG_ON(c) assert(!(c)) -#define WARN_ON(c) assert(!(c)) +#define BUG_ON(c) \ + do {\ + if((c)) { \ + show_trace(); \ + assert(!(c)); \ + } \ + } while(0) +#define WARN_ON(c) do {\ + if((c)) { \ + show_trace(); \ + assert(!(c)); \ + } \ + } while(0) #define container_of(ptr, type, member) ({ \ -- 2.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: Do not free dirty extent buffer
free_some_buffer() should not free dirty extent buffers. They should be left for later commit. Signed-off-by: Naohiro Aota na...@elisp.net --- extent_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/extent_io.c b/extent_io.c index a127e54..8a668be 100644 --- a/extent_io.c +++ b/extent_io.c @@ -552,7 +552,7 @@ static int free_some_buffers(struct extent_io_tree *tree) list_for_each_safe(node, next, tree-lru) { eb = list_entry(node, struct extent_buffer, lru); - if (eb-refs == 1) { + if (eb-refs == 1 !(eb-flags EXTENT_DIRTY)) { free_extent_buffer(eb); if (tree-cache_size cache_hard_max) break; -- 2.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: Do not free dirty extent buffer
Hi, Filipe David Manana fdman...@gmail.com writes: On Thu, Aug 21, 2014 at 1:07 PM, Naohiro Aota na...@elisp.net wrote: free_some_buffer() should not free dirty extent buffers. They should be left for later commit. Signed-off-by: Naohiro Aota na...@elisp.net --- extent_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/extent_io.c b/extent_io.c index a127e54..8a668be 100644 --- a/extent_io.c +++ b/extent_io.c @@ -552,7 +552,7 @@ static int free_some_buffers(struct extent_io_tree *tree) list_for_each_safe(node, next, tree-lru) { eb = list_entry(node, struct extent_buffer, lru); - if (eb-refs == 1) { + if (eb-refs == 1 !(eb-flags EXTENT_DIRTY)) { Hi, Did you meant bitwise and () and not logical and () right? Oops, you're right. Following is the new patch. From 0e1a49216d40b44909cdfacd5cd9a13aa796db41 Mon Sep 17 00:00:00 2001 From: Naohiro Aota na...@elisp.net Date: Mon, 25 Aug 2014 14:03:59 +0900 Subject: [PATCH] btrfs-progs: Do not free dirty extent buffer free_some_buffer() should not free dirty extent buffers. They are left to be committed. Signed-off-by: Naohiro Aota na...@elisp.net --- extent_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/extent_io.c b/extent_io.c index a127e54..1df377d 100644 --- a/extent_io.c +++ b/extent_io.c @@ -552,7 +552,7 @@ static int free_some_buffers(struct extent_io_tree *tree) list_for_each_safe(node, next, tree-lru) { eb = list_entry(node, struct extent_buffer, lru); - if (eb-refs == 1) { + if (eb-refs == 1 !(eb-flags EXTENT_DIRTY)) { free_extent_buffer(eb); if (tree-cache_size cache_hard_max) break; -- 2.0.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs-progs: initial reference count of extent buffer is correct?
Hi, list I'm having trouble with my btrfs FS recently and running btrfs check to try to fix the FS. Unfortunately, it aborted with: btrfsck: root-tree.c:81: btrfs_update_root: Assertion `!(ret != 0)' failed. It means that extent tree root is not found in tree root tree! Then I added btrfs_print_leaf() there to see what is happening there. There were (... METADATA_ITEM 0) keys listed. Well, I found tree root tree's root extent buffer is somewhat replaced by a extent buffer from the extent tree. Reading the code, it seems that free_some_buffers() reclaim extent buffers allocated to root trees because they are not extent_buffer_get()ed (i.e. @refs == 1). To reproduce this problem, try running this code. This program first print root tree node's bytenr, and scan some trees. If your FS is large enough to run free_some_buffers(), tree root node's bytenr after the scan would be different. #include stdio.h #include ctree.h #include disk-io.h void scan_tree(struct btrfs_root *root, struct extent_buffer *eb) { u32 i; u32 nr; nr = btrfs_header_nritems(eb); if (btrfs_is_leaf(eb)) return; u32 size = btrfs_level_size(root, btrfs_header_level(eb) - 1); for (i = 0; i nr; i++) { if (btrfs_is_leaf(eb)) return; u64 bytenr = btrfs_node_blockptr(eb, i); struct extent_buffer *next = read_tree_block(root, bytenr, size, btrfs_node_ptr_generation(eb, i)); if (!next) continue; scan_tree(root, next); } } int main(int ac, char **av) { struct btrfs_fs_info *info; struct btrfs_root *root; info = open_ctree_fs_info(av[1], 0, 0, OPEN_CTREE_PARTIAL); root = info-fs_root; printf(tree root %lld\n, info-tree_root-node-start); scan_tree(info-fs_root, info-extent_root-node); scan_tree(info-fs_root, info-csum_root-node); scan_tree(info-fs_root, info-fs_root-node); printf(tree root %lld\n, info-tree_root-node-start); return close_ctree(root); } On my environment, the above code print the following result. Tree root tree variable is eventually pointing to another extent! $ ./btrfs-reproduce /dev/sda3 tree root 91393835008 tree root 49102848 I found commit 53ee1bccf99cd5b474fe1aa857b7dd176e3a1407 changed the initial @refs to 1, stating that we don't give enough free_extent_buffer() to reduce the eb's references to zero so that the eb can finally be freed, but I don't think this is correct. Even if initial @refs == 2, one free_extent_buffer() would make the @refs to 1 and so let it reclaimed by free_some_buffer(), so it does not seems to be a problem for me... I think there are some collides how to use extent buffer: should __alloc_extent_buffer set @refs = 2 for the caller or should the code call extent_buffer_get() by themselves everywhere you allocate eb before any other eb allocation not to let the first eb reclaimed? How to fix this problem? revert 53ee1bccf99cd5b474fe1aa857b7dd176e3a1407 is the collect way? or add missing extent_buffer_get() everywhere allocating is done? Naohiro -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: do not reclaim extent buffer
We should kill free_some_buffers() to stop reclaiming extent buffers or we will hit a problem described below. As of commit 53ee1bccf99cd5b474fe1aa857b7dd176e3a1407, we are not counting a reference for tree-lru anymore. However free_some_buffers() is still left and is reclaiming extent buffers whose @refs == 1. This cause extent buffers to be reclaimed unintentionally. Thus the following steps could happen: 1. A buffer at address A is reclaimed by free_some_buffers() (address A is also free()ed) 2. Some code call alloc_extent_buffer() 3. Address A is assigned to newly allocated buffer 4. You see a buffer pointed by A suddenly changed its content This problem is also pointed out here and it has a reproducer: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36703.html This commit drop free_some_buffers() and related variables, and also it modify extent_io_tree_cleanup() to catch non-free'ed buffers properly. Signed-off-by: Naohiro Aota na...@elisp.net --- extent_io.c | 37 +++-- 1 file changed, 3 insertions(+), 34 deletions(-) diff --git a/extent_io.c b/extent_io.c index 1df377d..425af8a 100644 --- a/extent_io.c +++ b/extent_io.c @@ -30,9 +30,6 @@ #include ctree.h #include volumes.h -static u64 cache_soft_max = 1024 * 1024 * 256; -static u64 cache_hard_max = 1 * 1024 * 1024 * 1024; - void extent_io_tree_init(struct extent_io_tree *tree) { cache_tree_init(tree-state); @@ -77,12 +74,9 @@ void extent_io_tree_cleanup(struct extent_io_tree *tree) while(!list_empty(tree-lru)) { eb = list_entry(tree-lru.next, struct extent_buffer, lru); - if (eb-refs != 1) { - fprintf(stderr, extent buffer leak: - start %llu len %u\n, - (unsigned long long)eb-start, eb-len); - eb-refs = 1; - } + fprintf(stderr, extent buffer leak: + start %llu len %u\n, + (unsigned long long)eb-start, eb-len); free_extent_buffer(eb); } @@ -541,30 +535,6 @@ out: return ret; } -static int free_some_buffers(struct extent_io_tree *tree) -{ - u32 nrscan = 0; - struct extent_buffer *eb; - struct list_head *node, *next; - - if (tree-cache_size cache_soft_max) - return 0; - - list_for_each_safe(node, next, tree-lru) { - eb = list_entry(node, struct extent_buffer, lru); - if (eb-refs == 1 !(eb-flags EXTENT_DIRTY)) { - free_extent_buffer(eb); - if (tree-cache_size cache_hard_max) - break; - } else { - list_move_tail(eb-lru, tree-lru); - } - if (nrscan++ 64 tree-cache_size cache_hard_max) - break; - } - return 0; -} - static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree, u64 bytenr, u32 blocksize) { @@ -589,7 +559,6 @@ static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree, eb-cache_node.size = blocksize; INIT_LIST_HEAD(eb-recow); - free_some_buffers(tree); ret = insert_cache_extent(tree-cache, eb-cache_node); if (ret) { free(eb); -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: clear bio reference after submit_one_bio()
After submit_one_bio(), `bio' can go away. However submit_extent_page() leave `bio' referable if submit_one_bio() failed (e.g. -ENOMEM on OOM). It will cause invalid paging request when submit_extent_page() is called next time. I reproduced ENOMEM case with the following script (need CONFIG_FAIL_PAGE_ALLOC, and CONFIG_FAULT_INJECTION_DEBUG_FS). #!/bin/bash dmesgout=dmesg.txt start=10 end=30 step=1000 # btrfs options device=/dev/vdb1 directory=/mnt/btrfs # fault-injection options percent=100 times=3 mkdir -p $directory || exit 1 mount -o compress $device $directory || exit 1 rm -f $directory/file || exit 1 dd if=/dev/zero of=$directory/file bs=1M count=512 || exit 1 for interval in `seq $start $step $end`; do dmesg -C echo 1 /proc/sys/vm/drop_caches sync export FAILCMD_TYPE=fail_page_alloc ./failcmd.sh -p $percent -t $times -i $interval \ --ignore-gfp-highmem=N --ignore-gfp-wait=N --min-order=0 \ -- \ cat $directory/file /dev/null dmesg ${dmesgout} if grep -q BUG: ${dmesgout}; then cat ${dmesgout} exit 1 fi done umount $directory exit 0 Signed-off-by: Naohiro Aota na...@elisp.net --- fs/btrfs/extent_io.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4ebabd2..4421161 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2816,8 +2816,10 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree, bio_add_page(bio, page, page_size, offset) page_size) { ret = submit_one_bio(rw, bio, mirror_num, prev_bio_flags); - if (ret 0) + if (ret 0) { + *bio_ret = NULL; return ret; + } bio = NULL; } else { return 0; -- 2.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: tweak key advancing condition
The key advancing condition used in copy_to_sk() is loose. It can advance the key even if it reaches sk-max_*: e.g. when the max key = (512, 1024, -1) and the current key = (512, 1025, 10), it increments the offset by 1, continues hopeless search from (512, 1025, 11). This issue make ioctl() to take a lot of time scanning all the leaf a blocks one by one. This commit fix the problem using standard way of key comparison: btrfs_comp_cpu_keys() Signed-off-by: Naohiro Aota na...@elisp.net --- fs/btrfs/ioctl.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 1c22c65..07dc01d 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1932,6 +1932,7 @@ static noinline int copy_to_sk(struct btrfs_root *root, u64 found_transid; struct extent_buffer *leaf; struct btrfs_ioctl_search_header sh; + struct btrfs_key test; unsigned long item_off; unsigned long item_len; int nritems; @@ -2015,12 +2016,17 @@ static noinline int copy_to_sk(struct btrfs_root *root, } advance_key: ret = 0; - if (key-offset (u64)-1 key-offset sk-max_offset) + test.objectid = sk-max_objectid; + test.type = sk-max_type; + test.offset = sk-max_offset; + if (btrfs_comp_cpu_keys(key, test) = 0) + ret = 1; + else if (key-offset (u64)-1) key-offset++; - else if (key-type (u8)-1 key-type sk-max_type) { + else if (key-type (u8)-1) { key-offset = 0; key-type++; - } else if (key-objectid (u64)-1 key-objectid sk-max_objectid) { + } else if (key-objectid (u64)-1) { key-offset = 0; key-type = 0; key-objectid++; -- 2.4.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][RESEND] btrfs: fix search key advancing condition
The search key advancing condition used in copy_to_sk() is loose. It can advance the key even if it reaches sk-max_*: e.g. when the max key = (512, 1024, -1) and the current key = (512, 1025, 10), it increments the offset by 1, continues hopeless search from (512, 1025, 11). This issue make ioctl() to take unexpectedly long time scanning all the leaf a blocks one by one. This commit fix the problem using standard way of key comparison: btrfs_comp_cpu_keys() Signed-off-by: Naohiro Aota na...@elisp.net --- fs/btrfs/ioctl.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 1c22c65..07dc01d 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1932,6 +1932,7 @@ static noinline int copy_to_sk(struct btrfs_root *root, u64 found_transid; struct extent_buffer *leaf; struct btrfs_ioctl_search_header sh; + struct btrfs_key test; unsigned long item_off; unsigned long item_len; int nritems; @@ -2015,12 +2016,17 @@ static noinline int copy_to_sk(struct btrfs_root *root, } advance_key: ret = 0; - if (key-offset (u64)-1 key-offset sk-max_offset) + test.objectid = sk-max_objectid; + test.type = sk-max_type; + test.offset = sk-max_offset; + if (btrfs_comp_cpu_keys(key, test) = 0) + ret = 1; + else if (key-offset (u64)-1) key-offset++; - else if (key-type (u8)-1 key-type sk-max_type) { + else if (key-type (u8)-1) { key-offset = 0; key-type++; - } else if (key-objectid (u64)-1 key-objectid sk-max_objectid) { + } else if (key-objectid (u64)-1) { key-offset = 0; key-type = 0; key-objectid++; -- 2.4.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RESEND] btrfs: fix search key advancing condition
Hello, list. Could any one take a look at on this? I believe this is a issue slowing down ioctl(BTRFS_IOC_TREE_SEARCH) if the target key is missing. On Tue, Jun 30, 2015 at 11:25 AM, Naohiro Aota na...@elisp.net wrote: The search key advancing condition used in copy_to_sk() is loose. It can advance the key even if it reaches sk-max_*: e.g. when the max key = (512, 1024, -1) and the current key = (512, 1025, 10), it increments the offset by 1, continues hopeless search from (512, 1025, 11). This issue make ioctl() to take unexpectedly long time scanning all the leaf a blocks one by one. This commit fix the problem using standard way of key comparison: btrfs_comp_cpu_keys() Signed-off-by: Naohiro Aota na...@elisp.net --- fs/btrfs/ioctl.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 1c22c65..07dc01d 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1932,6 +1932,7 @@ static noinline int copy_to_sk(struct btrfs_root *root, u64 found_transid; struct extent_buffer *leaf; struct btrfs_ioctl_search_header sh; + struct btrfs_key test; unsigned long item_off; unsigned long item_len; int nritems; @@ -2015,12 +2016,17 @@ static noinline int copy_to_sk(struct btrfs_root *root, } advance_key: ret = 0; - if (key-offset (u64)-1 key-offset sk-max_offset) + test.objectid = sk-max_objectid; + test.type = sk-max_type; + test.offset = sk-max_offset; + if (btrfs_comp_cpu_keys(key, test) = 0) + ret = 1; + else if (key-offset (u64)-1) key-offset++; - else if (key-type (u8)-1 key-type sk-max_type) { + else if (key-type (u8)-1) { key-offset = 0; key-type++; - } else if (key-objectid (u64)-1 key-objectid sk-max_objectid) { + } else if (key-objectid (u64)-1) { key-offset = 0; key-type = 0; key-objectid++; -- 2.4.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] btrfs-progs: test: umount if confirmation failed
When a check in check_inode() failed, the test should umount test target file system. This commit add clean up umount line in failure path. Signed-off-by: Naohiro Aota <na...@elisp.net> --- tests/fsck-tests/012-leaf-corruption/test.sh | 4 1 file changed, 4 insertions(+) diff --git a/tests/fsck-tests/012-leaf-corruption/test.sh b/tests/fsck-tests/012-leaf-corruption/test.sh index 6e23145..bfdd0ea 100755 --- a/tests/fsck-tests/012-leaf-corruption/test.sh +++ b/tests/fsck-tests/012-leaf-corruption/test.sh @@ -57,6 +57,7 @@ check_inode() # Check whether the inode exists exists=$($SUDO_HELPER find $path -inum $ino) if [ -z "$exists" ]; then + $SUDO_HELPER umount $TEST_MNT _fail "inode $ino not recovered correctly" fi @@ -64,17 +65,20 @@ check_inode() found_mode=$(printf "%o" 0x$($SUDO_HELPER stat $exists -c %f)) if [ $found_mode -ne $mode ]; then echo "$found_mode" + $SUDO_HELPER umount $TEST_MNT _fail "inode $ino modes not recovered" fi # Check inode size found_size=$($SUDO_HELPER stat $exists -c %s) if [ $mode -ne 41700 -a $found_size -ne $size ]; then + $SUDO_HELPER umount $TEST_MNT _fail "inode $ino size not recovered correctly" fi # Check inode name if [ "$(basename $exists)" != "$name" ]; then + $SUDO_HELPER umount $TEST_MNT _fail "inode $ino name not recovered correctly" else return 0 -- 2.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs-progs: properly reset nlink of multi-linked file
If a file is linked from more than one directory and only one of the links is corrupted, btrfs check dose not reset the nlink properly. Actually it can go into infinite loop to link the broken file into lost+found. This patch fix two part of the code. The first one delay the freeing valid (no error, found inode ref, directory index, and directory item) backrefs. Freeing valid backrefs earier prevent reset_nlink() to add back all valid links. The second fix is obvious: passing `ref_type' to btrfs_add_link() is just wrong. It should be `filetype' instead. The current code can break all valid file links. Signed-off-by: Naohiro Aota <na...@elisp.net> --- cmds-check.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index 6a0b50a..11ff3fe 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -810,7 +810,8 @@ static void maybe_free_inode_rec(struct cache_tree *inode_cache, if (backref->found_dir_item && backref->found_dir_index) { if (backref->filetype != filetype) backref->errors |= REF_ERR_FILETYPE_UNMATCH; - if (!backref->errors && backref->found_inode_ref) { + if (!backref->errors && backref->found_inode_ref && + rec->nlink == rec->found_link) { list_del(>list); free(backref); } @@ -2392,7 +2393,7 @@ static int reset_nlink(struct btrfs_trans_handle *trans, list_for_each_entry(backref, >backrefs, list) { ret = btrfs_add_link(trans, root, rec->ino, backref->dir, backref->name, backref->namelen, -backref->ref_type, >index, 1); +backref->filetype, >index, 1); if (ret < 0) goto out; } -- 2.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] btrfs-progs: fix file restore to lost+found bug
This series address an issue of btrfsck to restore infinite number of same file into `lost+found' directory. The issue occur on a file which is linked from two different directory A and B. If links from dir A is corrupted and links from dir B is kept valid, btrfsck won't stop creating a file in lost+found like this: - Moving file 'file.del.51' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 1876 Trying to rebuild inode:1877 Moving file 'del' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 1877 Can't get file name for inode 1876, using '1876' as fallback Moving file '1876' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 1876 Can't get file name for inode 1876, using '1876' as fallback Moving file '1876.1876' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 1876 (snip) Moving file '1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876' to 'lost+found' dir since it has no valid backref Fixed the nlink of inode 1876 Can't get file name for inode 1876, using '1876' as fallback Can't get file name for inode 1876, using '1876' as fallback Can't get file name for inode 1876, using '1876' as fallback - The problem is early release of inode backrefs. The release prevents `reset_nlink()' to add back valid backrefs to an inode. In the result, the following results occur: 0. btrfsck scan a FS tree 1. It finds valid links and invalid links (some links are lost) 2. All valid links are released 3. btrfsck detects found_links != nlink 4. reset_nlink() reset nlink to 0 5. No valid links are restored (thus still nlink = 0) 6. The file is restored to lost+found since nlink == 0 (now, nlink = 1) 7. btrfsck rescan the FS tree 8. It finds `found_links' = #valid_links+1 (in lost+found) and nlink = 1 9. again all valid links are lost, and restore to lost+found The first patch add clean up code to the test. It umount test directory on failure path. The second patch fix the above problem. And the last patch extend the test to check a case of multiple-linked file corruption. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] btrfs-progs: fix file restore to lost+found bug
On Sat, Dec 5, 2015 at 10:35 AM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote: > > > On 12/04/2015 01:37 PM, Naohiro Aota wrote: >> >> This series address an issue of btrfsck to restore infinite number of >> same file into `lost+found' directory. The issue occur on a file which >> is linked from two different directory A and B. If links from dir A is >> corrupted and links from dir B is kept valid, btrfsck won't stop >> creating a file in lost+found like this: >> >> - >> Moving file 'file.del.51' to 'lost+found' dir since it has no valid >> backref >> Fixed the nlink of inode 1876 >> Trying to rebuild inode:1877 >> Moving file 'del' to 'lost+found' dir since it has no valid backref >> Fixed the nlink of inode 1877 >> Can't get file name for inode 1876, using '1876' as fallback >> Moving file '1876' to 'lost+found' dir since it has no valid backref >> Fixed the nlink of inode 1876 >> Can't get file name for inode 1876, using '1876' as fallback >> Moving file '1876.1876' to 'lost+found' dir since it has no valid backref >> Fixed the nlink of inode 1876 >> (snip) >> Moving file >> '1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876.1876' >> to 'lost+found' dir since it has no valid backref >> Fixed the nlink of inode 1876 >> Can't get file name for inode 1876, using '1876' as fallback >> Can't get file name for inode 1876, using '1876' as fallback >> Can't get file name for inode 1876, using '1876' as fallback >> - >> >> The problem is early release of inode backrefs. The release prevents >> `reset_nlink()' to add back valid backrefs to an inode. In the result, >> the following results occur: >> >> 0. btrfsck scan a FS tree >> 1. It finds valid links and invalid links (some links are lost) >> 2. All valid links are released >> 3. btrfsck detects found_links != nlink >> 4. reset_nlink() reset nlink to 0 >> 5. No valid links are restored (thus still nlink = 0) >> 6. The file is restored to lost+found since nlink == 0 (now, nlink = 1) >> 7. btrfsck rescan the FS tree >> 8. It finds `found_links' = #valid_links+1 (in lost+found) and nlink = 1 >> 9. again all valid links are lost, and restore to lost+found > > > Right, that's one case I missed in the repair code. > > Thanks for the fix. Thanks for the review. > >> >> The first patch add clean up code to the test. It umount test >> directory on failure path. The second patch fix the above problem. And >> the last patch extend the test to check a case of multiple-linked file >> corruption. > > > But I only see the first 2 patches in maillist... > The last test case seems missing? Maybe, the last patch is too large to post to the list? Even it get smaller, 130260 bytes seems to be a bit large. How should I handle this? Put my repo somewhere and wait a maintainer to pull it? > > Thanks, > Qu > >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] btrfs-progs: test: umount if confirmation failed
On Tue, Dec 8, 2015 at 12:33 AM, David Sterba <dste...@suse.cz> wrote: > On Fri, Dec 04, 2015 at 02:37:25PM +0900, Naohiro Aota wrote: >> When a check in check_inode() failed, the test should umount test target >> file system. This commit add clean up umount line in failure path. > > The lack of cleanup after failed tests is intentional (exceptions > possible). The tests are supposed to do cleanup if they pass but if they > fail the test environment could be examined. In case of more complex > setups (loop devices, dm devices) it's some work to set it up again. > > We could add an optional per-test script cleanup.sh that does the > cleanup on demand (and is also run from tests/clean-tests.sh). I see. Well, I overlooked the README.md stating about clean-tests.sh... -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] btrfs-progs: fix file restore to lost+found bug
On Tue, Dec 8, 2015 at 12:35 AM, David Sterba <dste...@suse.cz> wrote: > On Mon, Dec 07, 2015 at 11:59:19AM +0900, Naohiro Aota wrote: >> > But I only see the first 2 patches in maillist... >> > The last test case seems missing? >> >> Maybe, the last patch is too large to post to the list? Even it get >> smaller, 130260 bytes seems to be a bit large. >> >> How should I handle this? Put my repo somewhere and wait a maintainer >> to pull it? > > Please send it to me directly. The image will be available in > btrfs-progs git and we don't necessarily need the copy in the > mailinglist. Sure. I'll send it to you. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs: Show a warning message if one of objectid reaches its highest value
2016-03-07 12:05 GMT+09:00 Satoru Takeuchi: > - It's better to show a warning message for the exceptional case > that one of objectid (in most case, inode number) reaches its > highest value. Show this message only once to avoid filling > dmesg with it. > - EOVERFLOW is more proper return value for this case. > ENOSPC is for "No space left on device" case and objectid isn't > related to any device. I have concern about EOVERFLOW. The value returned here will go through to the user space via btrfs_find_free_ino() and btrfs_create(). It means that "creat" and "mkdir" can now return EOVERFLOW when it failed to assign new inode number. Such behavior would disagree with other file systems, which result in user space programs to be confused. Also, I don't think EOVERFLOW described in "creat(2)" (or open(2)) suits for this case. As far as I read the following man page from creat(2), giving ENOSPC is better option here. > ENOSPC: pathname was to be created but the device containing pathname has no > room for the new file. > EOVERFLOW: > pathname refers to a regular file that is too large to be opened. > (snip) > Signed-off-by: Satoru Takeuchi > --- > This patch can be applied to 4.5-rc7 > --- > fs/btrfs/inode-map.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c > index e50316c..f5e3228 100644 > --- a/fs/btrfs/inode-map.c > +++ b/fs/btrfs/inode-map.c > @@ -556,7 +556,15 @@ int btrfs_find_free_objectid(struct btrfs_root *root, > u64 *objectid) > mutex_lock(>objectid_mutex); > > if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) { > - ret = -ENOSPC; > + static bool __warned = false; > + > + if (unlikely(!__warned)) { > + btrfs_warn(root->fs_info, > + "The objectid of root %llu reaches its > highest value.\n", > + root->root_key.objectid); > + __warned = true; > + } > + ret = -EOVERFLOW; > goto out; > } > > -- > 2.5.0 > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs
Currently, btrfs_relocate_chunk() is removing relocated BG by itself. But the work can be done by btrfs_delete_unused_bgs() (and it's better since it trim the BG). Let's dedupe the code. While btrfs_delete_unused_bgs() is already hitting the relocated BG, it skip the BG since the BG has "ro" flag set (to keep balancing BG intact). On the other hand, btrfs cannot drop "ro" flag here to prevent additional writes. So this patch make use of "removed" flag. btrfs_delete_unused_bgs() now detect the flag to distinguish whether a read-only BG is relocating or not. Signed-off-by: Naohiro Aota <naohiro.a...@hgst.com> --- fs/btrfs/extent-tree.c | 2 +- fs/btrfs/volumes.c | 24 ++-- 2 files changed, 11 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 843ed27..d382735 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -10971,7 +10971,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) spin_lock(_group->lock); if (block_group->reserved || btrfs_block_group_used(_group->item) || - block_group->ro || + (block_group->ro && !block_group->removed) || list_is_singular(_group->list)) { /* * We want to bail if we made new allocations or have diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 7e6399f..1a6789d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2931,8 +2931,8 @@ out: static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset) { struct btrfs_root *extent_root; - struct btrfs_trans_handle *trans; int ret; + struct btrfs_block_group_cache *block_group; root = root->fs_info->chunk_root; extent_root = root->fs_info->extent_root; @@ -2962,21 +2962,17 @@ static int btrfs_relocate_chunk(struct btrfs_root *root, u64 chunk_offset) if (ret) return ret; - trans = btrfs_start_trans_remove_block_group(root->fs_info, -chunk_offset); - if (IS_ERR(trans)) { - ret = PTR_ERR(trans); - btrfs_handle_fs_error(root->fs_info, ret, NULL); - return ret; - } - /* -* step two, delete the device extents and the -* chunk tree entries +* step two, flag the chunk as removed and let +* btrfs_delete_unused_bgs() remove it. */ - ret = btrfs_remove_chunk(trans, root, chunk_offset); - btrfs_end_transaction(trans, extent_root); - return ret; + block_group = btrfs_lookup_block_group(root->fs_info, chunk_offset); + spin_lock(_group->lock); + block_group->removed = 1; + spin_unlock(_group->lock); + btrfs_put_block_group(block_group); + + return 0; } static int btrfs_relocate_sys_chunks(struct btrfs_root *root) -- 2.7.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs
2016-09-02 (金) の 09:35 -0400 に Josef Bacik さんは書きました: > On 09/02/2016 03:46 AM, Naohiro Aota wrote: > > > > Currently, btrfs_relocate_chunk() is removing relocated BG by > > itself. But > > the work can be done by btrfs_delete_unused_bgs() (and it's better > > since it > > trim the BG). Let's dedupe the code. > > > > While btrfs_delete_unused_bgs() is already hitting the relocated > > BG, it > > skip the BG since the BG has "ro" flag set (to keep balancing BG > > intact). > > On the other hand, btrfs cannot drop "ro" flag here to prevent > > additional > > writes. So this patch make use of "removed" flag. > > btrfs_delete_unused_bgs() now detect the flag to distinguish > > whether a > > read-only BG is relocating or not. > > > > This seems racey to me. We remove the last part of the block group, > it ends up > on the unused_bgs_list, we process this list, see that removed isn't > set and we > skip it, then later we set removed, but it's too late. I think the > right way is > to actually do a transaction, set ->removed, manually add it to the > unused_bgs_list if it's not already, then end the transaction. This > way we are > guaranteed to have the bg on the list when it is ready to be > removed. This is > my analysis after looking at it for 10 seconds after being awake for > like 30 > minutes so if I'm missing something let me know. Thanks, I don't think a race will happen. Since we are holding delete_unused_bgs_mutex here, btrfs_delte_unused_bgs() checks ->removed flag after we unlock the mutex i.e. we setup the flag properly. For a case btrfs_delete_usused_bgs() checks the BG before we hold delte_unused_bgs_mutex, then that BG is removed by it (if it's empty) and btrfs_relocate_chunk() should never see it. Regards, Naohiro
[PATCH 1/2] btrfs-progs: build: generate all dependency files
We're missing several dependency files like: $ diff -u <(find -name '*.o'|cut -d. -f2|sort) <(find -name '*.o.d'|cut -d. -f2|sort) --- /proc/self/fd/112017-09-14 18:17:44.460564620 +0900 +++ /proc/self/fd/122017-09-14 18:17:44.460564620 +0900 @@ -3,7 +3,6 @@ /btrfs-corrupt-block /btrfs-debug-tree /btrfs-find-root -/btrfs-list /btrfs-map-logical /btrfs-select-super /btrfstune @@ -29,11 +28,6 @@ /cmds-scrub /cmds-send /cmds-subvolume -/convert/common -/convert/main -/convert/source-ext2 -/convert/source-fs -/convert/source-reiserfs /ctree /dir-item /disk-io This is due to moving things out of objects and cmds_objects variables. Such missing dependency files cause mis-building of some source files (try touch utils.h; make mkfs/main.o). This patch introduce a new variable "all_objects" to keep all the objects and use the variable to generate proper dependency file building rules. Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> --- Makefile |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index a114eca..c00dff6 100644 --- a/Makefile +++ b/Makefile @@ -121,6 +121,9 @@ libbtrfs_headers = send-stream.h send-utils.h send.h kernel-lib/rbtree.h btrfs-l convert_objects = convert/main.o convert/common.o convert/source-fs.o \ convert/source-ext2.o convert/source-reiserfs.o mkfs_objects = mkfs/main.o mkfs/common.o +image_objects = image/main.o +all_objects = $(objects) $(cmds_objects) $(libbtrfs_objects) $(convert_objects) \ + $(mkfs_objects) $(image_objects) TESTS = fsck-tests.sh convert-tests.sh @@ -591,5 +594,5 @@ uninstall: cd $(DESTDIR)$(bindir); $(RM) -f -- btrfsck fsck.btrfs $(progs_install) ifneq ($(MAKECMDGOALS),clean) --include $(objects:.o=.o.d) $(cmds_objects:.o=.o.d) $(subst .btrfs,, $(filter-out btrfsck.o.d, $(progs:=.o.d))) +-include $(all_objects:.o=.o.d) $(subst .btrfs,, $(filter-out btrfsck.o.d, $(progs:=.o.d))) endif -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] btrfs: build: omit unnecessary -MD flag
According to gcc(1), "-MD is equivalent to -M -MF file, except that -E is not implied." Since the rule in the Makefile is just generating dependency file and not building object file, it is no use to have "-MD" here. Also, it's overridden and conflicting with the following "-MM" flag. I guess we can drop it. Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> --- Makefile |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index c00dff6..60c802a 100644 --- a/Makefile +++ b/Makefile @@ -264,7 +264,7 @@ else endif %.o.d: %.c - $(Q)$(CC) -MD -MM -MG -MF $@ -MT $(@:.o.d=.o) -MT $(@:.o.d=.static.o) -MT $@ $(CFLAGS) $< + $(Q)$(CC) -MM -MG -MF $@ -MT $(@:.o.d=.o) -MT $(@:.o.d=.static.o) -MT $@ $(CFLAGS) $< # # Pick from per-file variables, btrfs_*_cflags -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] btrfs-progs: build: generate all dependency files
On 2017年09月14日 21:41, David Sterba wrote: > On Thu, Sep 14, 2017 at 07:10:46PM +0900, Naohiro Aota wrote: >> We're missing several dependency files like: >> >> $ diff -u <(find -name '*.o'|cut -d. -f2|sort) <(find -name '*.o.d'|cut -d. >> -f2|sort) >>--- /proc/self/fd/112017-09-14 18:17:44.460564620 +0900 >>+++ /proc/self/fd/122017-09-14 18:17:44.460564620 +0900 > > Please note that an actual diff in the changelog is understood as start > of the patch by git-am, indenting the --- or +++ lines makes it work > again. Oops, I forgot about that limitation. Thank you for the fix. > >> @@ -3,7 +3,6 @@ >> /btrfs-corrupt-block >> /btrfs-debug-tree >> /btrfs-find-root >> -/btrfs-list >> /btrfs-map-logical >> /btrfs-select-super >> /btrfstune >> @@ -29,11 +28,6 @@ >> /cmds-scrub >> /cmds-send >> /cmds-subvolume >> -/convert/common >> -/convert/main >> -/convert/source-ext2 >> -/convert/source-fs >> -/convert/source-reiserfs >> /ctree >> /dir-item >> /disk-io >> >> >> This is due to moving things out of objects and cmds_objects variables. Such >> missing dependency files cause mis-building of some source files (try touch >> utils.h; make mkfs/main.o). >> >> This patch introduce a new variable "all_objects" to keep all the objects and >> use the variable to generate proper dependency file building rules. >> >> Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> > > Applied, thanks. > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: fix max chunk size on dup
Hi, You may notice my @wdc.com address is bounced. As my internship job at wdc is finished, the address is already expired. Please contact me on this @elisp.net address. 2017-10-11 2:42 GMT+09:00 Hans van Kranenburg <hans.van.kranenb...@mendix.com>: > Sorry for the mail spam, it's an interesting code puzzle... :) > > On 10/10/2017 07:22 PM, Hans van Kranenburg wrote: >> On 10/10/2017 07:07 PM, Hans van Kranenburg wrote: >>> On 10/10/2017 01:31 PM, David Sterba wrote: >>>> Hi, >>>> >>>> On Fri, Sep 29, 2017 at 04:20:51PM +0900, Naohiro Aota wrote: >>>>> Balancing a fresh METADATA=dup btrfs file system (with size < 50G) >>>>> generates a 128MB sized block group. While we set max_stripe_size = >>>>> max_chunk_size = 256MB, we get this half sized block group: >>>>> >>>>> $ btrfs ins dump-t -t CHUNK_TREE btrfs.img|grep length >>>>> length 8388608 owner 2 stripe_len 65536 type DATA >>>>> length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP >>>>> length 134217728 owner 2 stripe_len 65536 type >>>>> METADATA|DUP >>>>> >>>>> Before commit 86db25785a6e ("Btrfs: fix max chunk size on raid5/6"), we >>>>> used "stripe_size * ndevs > max_chunk_size * ncopies" to check the max >>>>> chunk size. Since stripe_size = 256MB * dev_stripes (= 2) = 512MB, ndevs >>>>> = 1, max_chunk_size = 256MB, and ncopies = 2, we allowed 256MB >>>>> METADATA|DUP block group. >>>>> >>>>> But now, we use "stripe_size * data_stripes > max_chunk_size". Since >>>>> data_stripes = 1 on DUP, it disallows the block group to have > 128MB. >>>>> What missing here is "dev_stripes". Proper logical space used by the block >>>>> group is "stripe_size * data_stripes / dev_stripes". Tweak the equations >>>>> to >>>>> use the right value. >>>> >>>> I started looking into it and still don't fully understand it. Change >>>> deep in the allocator can easily break some blockgroup combinations, so >>>> I'm rather conservative here. >>> >>> I think that the added usage of data_stripes in 86db25785a6e is the >>> problematic change. data_stripes is something that was introduced as >>> part of RAID56 in 53b381b3a and clearly only has a meaning that's >>> properly thought of for RAID56. The RAID56 commit already adds "this >>> will have to be fixed for RAID1 and RAID10 over more drives", only the >>> author doesn't catch the DUP case, which already breaks at that point. >>> Thank you for explaining in detail :) Yes, the allocator was already broken by using the data_stripes. (and it will also "break" existing file systems, in terms of making a DUP|META block group smaller) I believe this patch fix the things. >>> At the beginning it says: >>> >>> int data_stripes; /* number of stripes that count for block group size */ >>> >>> For the example: >>> >>> This is DUP: >>> >>> .sub_stripes= 1, >>> .dev_stripes= 2, >>> .devs_max = 1, >>> .devs_min = 1, >>> .tolerated_failures = 0, >>> .devs_increment = 1, >>> .ncopies= 2, >>> >>> In the code: >>> >>> max_stripe_size = SZ_256M >>> max_chunk_size = max_stripe_size-> SZ_256M >>> >>> Then we have find_free_dev_extent: >>> max_stripe_size * dev_stripes -> SZ_256M * 2 -> 512M >>> >>> So we like to find 512M on a disk, to stuff 2 stripes of 256M inside for >>> the DUP. (remember: The two parts of DUP *never* end up on a different >>> disk, even if you have multiple) >>> > > Another better fix would be to make a change here... > >>> If we find one: >>> stripe_size = devices_info[ndevs-1].max_avail -> 512M, yay > > ...because this is not yay. The 512M is max_avail, which needs to holds > *2* stripes, not 1. So stripe_size is suddenly changed to twice the > stripe size for DUP. > > So, an additional division again by dev_stripes would translate the > max_avail on device ndevs-1 to the stripe size to use. This catch the point. Notice stripe_size is divided by dev_stripes after the lines in the patch. It means that at this region (from "stripe_size = devices_info[ndevs-1].max_avail;" to "stripe_size = div_u64(stripe_size, dev_stri
Re: [PATCH] btrfs: set include path relatively
2017-10-12 21:38 GMT+09:00 David Sterba <dste...@suse.cz>: > On Thu, Oct 12, 2017 at 11:22:24AM +0900, Naohiro Aota wrote: >> Currently, gcc is passed the include directory with full path. As a result, >> dependency files (*.o.d) also record the full path at the build time. Such >> full path dependency is annoying for sharing the source between multiple >> machines, containers, or anything the path differ. >> >> And this is the same way what other program using autotools e.g. e2fsprogs >> is doing: >> >> $ grep top_builddir Makefile >> top_builddir = . >> CPPFLAGS = -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib >> BUILD_CFLAGS = -g -O2 -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib >> -DHAVE_CONFIG_H >> > > Makes sense for the header files. TOPDIR is also used for linking, can > it cause similar problems, where the library search path is set as > -L$(TOPDIR) ? Since both "TOPDIR := $(shell pwd)" and -L$(TOPDIR) is evaluated at the build time, it never cause problem on linking itself. What the problem with dependency files is persisting the full path into the dep files. We generate the dep files only for objects (%.o.d rule) Well, it's ok to use "-L." here, and it's the same way with autotools does. > > I wonder if we should do the same as in the example above and set > default value of TOPDIR to '.', instead of `pwd`. TOPDIR is also used by library-test{,.static}. They run gcc in a temporary directory. Thus, TOPDIR here should be absolute. > > >> Signed-off-by: Naohiro Aota <na...@elisp.net> >> --- >> Makefile | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/Makefile b/Makefile >> index b1f3388..69b94fb 100644 >> --- a/Makefile >> +++ b/Makefile >> @@ -70,8 +70,8 @@ CFLAGS = $(SUBST_CFLAGS) \ >>-D_XOPEN_SOURCE=700 \ >>-fno-strict-aliasing \ >>-fPIC \ >> - -I$(TOPDIR) \ >> - -I$(TOPDIR)/kernel-lib \ >> + -I. \ >> + -I./kernel-lib \ >>$(EXTRAWARN_CFLAGS) \ >>$(DEBUG_CFLAGS_INTERNAL) \ >>$(EXTRA_CFLAGS) >> -- >> 2.14.1 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: propagate error to btrfs_cmp_data_prepare caller
btrfs_cmp_data_prepare() (almost) always returns 0 i.e. ignoring errors from gather_extent_pages(). While the pages are freed by btrfs_cmp_data_free(), cmp->num_pages still has > 0. Then, btrfs_extent_same() try to access the already freed pages causing faults (or violates PageLocked assertion). This patch just return the error as is so that the caller stop the process. Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> Fixes: f441460202cb ("btrfs: fix deadlock with extent-same and readpage") Cc: <sta...@vger.kernel.org> # 4.2 --- fs/btrfs/ioctl.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index ae8fbf9d3de2..b5971923f15f 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3028,7 +3028,7 @@ static int btrfs_cmp_data_prepare(struct inode *src, u64 loff, out: if (ret) btrfs_cmp_data_free(cmp); - return 0; + return ret; } static int btrfs_cmp_data(u64 len, struct cmp_pages *cmp) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] btrfs: clear ordered flag on cleaning up ordered extents
On 2017年09月08日 03:25, David Sterba wrote: > On Fri, Sep 01, 2017 at 06:59:49PM +0800, Qu Wenruo wrote: >> On 2017年09月01日 16:58, Naohiro Aota wrote: >>> commit 524272607e88 ("btrfs: Handle delalloc error correctly to avoid >>> ordered extent hang") introduced btrfs_cleanup_ordered_extents() to cleanup >>> submitted ordered extents. However, it does not clear the ordered bit >>> (Private2) of coresponding pages. Thus, the following BUG occurs from >>> free_pages_check_bad() (on btrfs/125 with nospace_cache). >>> >>> BUG: Bad page state in process btrfs pfn:3fa787 >>> page:df2acfe9e1c0 count:0 mapcount:0 mapping: (null) index:0xd >>> flags: 0x80002008(uptodate|private_2) >>> raw: 80002008 000d >>> raw: df2acf5c1b20 b443802238b0 >>> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set >>> bad because of flags: 0x2000(private_2) >>> >>> This patch clear the flag as same as other places calling >>> btrfs_dec_test_ordered_pending() for every page in the specified range. >>> >>> Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> >>> Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid >>> ordered extent hang") >>> Cc: <sta...@vger.kernel.org> # 4.12 >>> --- >>> fs/btrfs/inode.c | 12 >>> 1 file changed, 12 insertions(+) >>> >>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>> index 24bcd5cd9cf2..ae4c0a1bef38 100644 >>> --- a/fs/btrfs/inode.c >>> +++ b/fs/btrfs/inode.c >>> @@ -135,6 +135,18 @@ static inline void >>> btrfs_cleanup_ordered_extents(struct inode *inode, >>> const u64 offset, >>> const u64 bytes) >>> { >>> + unsigned long index = offset >> PAGE_SHIFT; >>> + unsigned long end_index = (offset + bytes - 1) >> PAGE_SHIFT; >>> + struct page *page; >>> + >>> + while (index <= end_index) { >>> + page = find_get_page(inode->i_mapping, index); >>> + index++; >>> + if (!page) >>> + continue; >>> + ClearPagePrivate2(page); >>> + put_page(page); >>> + } >> >> At first glance, explicitly clearing Private2 flag here seems a little >> strange to me. >> However btrfs_invalidatepage() is also doing the same thing, I think >> it's fine. >> >> Reviewed-by: Qu Wenruo <quwenruo.bt...@gmx.com> >> >> BTW, Private2 flag is set by extent_clear_unlock_delalloc() with >> page_ops |= PAGE_SET_PRIVATE2, but we're clearing the page flag without >> any encapsulation, it may be better to use similar function to clear >> Private2 flag. > > I agree, the Private2 flag is given another meaning in btrfs, ie. the > writeback status, so this would be better wrapped in helpers that > reflect what is the private2 flag used for. The helpers might be > trivial, but their name will be a better documentation than the random > comments that we can be found next to its use. > Thanks for the reviews. I'll send a new patch to add the wrapping function. (or should I squash the patch with this change?) Regards, Naohiro -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: fix max chunk size on dup
Balancing a fresh METADATA=dup btrfs file system (with size < 50G) generates a 128MB sized block group. While we set max_stripe_size = max_chunk_size = 256MB, we get this half sized block group: $ btrfs ins dump-t -t CHUNK_TREE btrfs.img|grep length length 8388608 owner 2 stripe_len 65536 type DATA length 33554432 owner 2 stripe_len 65536 type SYSTEM|DUP length 134217728 owner 2 stripe_len 65536 type METADATA|DUP Before commit 86db25785a6e ("Btrfs: fix max chunk size on raid5/6"), we used "stripe_size * ndevs > max_chunk_size * ncopies" to check the max chunk size. Since stripe_size = 256MB * dev_stripes (= 2) = 512MB, ndevs = 1, max_chunk_size = 256MB, and ncopies = 2, we allowed 256MB METADATA|DUP block group. But now, we use "stripe_size * data_stripes > max_chunk_size". Since data_stripes = 1 on DUP, it disallows the block group to have > 128MB. What missing here is "dev_stripes". Proper logical space used by the block group is "stripe_size * data_stripes / dev_stripes". Tweak the equations to use the right value. Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> --- fs/btrfs/volumes.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 0e8f16c305df..d1ac3e1fb753 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4751,10 +4751,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans, * is really going to be in terms of logical address space, * and compare that answer with the max chunk size */ - if (stripe_size * data_stripes > max_chunk_size) { + if (stripe_size * data_stripes > max_chunk_size * dev_stripes) { u64 mask = (1ULL << 24) - 1; - stripe_size = div_u64(max_chunk_size, data_stripes); + stripe_size = div_u64(max_chunk_size * dev_stripes, + data_stripes); /* bump the answer up to a 16MB boundary */ stripe_size = (stripe_size + mask) & ~mask; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: fix NULL pointer dereference from free_reloc_roots()
__del_reloc_root should be called before freeing up reloc_root->node. If not, calling __del_reloc_root() dereference reloc_root->node, causing the system BUG. Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> --- fs/btrfs/relocation.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 65661d1aae4e..6445de8e9ece 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2393,11 +2393,11 @@ void free_reloc_roots(struct list_head *list) while (!list_empty(list)) { reloc_root = list_entry(list->next, struct btrfs_root, root_list); + __del_reloc_root(reloc_root); free_extent_buffer(reloc_root->node); free_extent_buffer(reloc_root->commit_root); reloc_root->node = NULL; reloc_root->commit_root = NULL; - __del_reloc_root(reloc_root); } } -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] btrfs: finish ordered extent cleaning if no progress is found
__endio_write_update_ordered() repeats the search until it reaches the end of the specified range. This works well with direct IO path, because before the function is called, it's ensured that there are ordered extents filling whole the range. It's not the case, however, when it's called from run_delalloc_range(): it is possible to have error in the midle of the loop in e.g. run_delalloc_nocow(), so that there exisits the range not covered by any ordered extents. By cleaning such "uncomplete" range, __endio_write_update_ordered() stucks at offset where there're no ordered extents. Since the ordered extents are created from head to tail, we can stop the search if there are no offset progress. Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang") Cc: <sta...@vger.kernel.org> # 4.12 --- fs/btrfs/inode.c |4 1 file changed, 4 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ae4c0a1bef38..fd5934121b4b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8309,6 +8309,7 @@ static void __endio_write_update_ordered(struct inode *inode, btrfs_work_func_t func; u64 ordered_offset = offset; u64 ordered_bytes = bytes; + u64 last_offset; int ret; if (btrfs_is_free_space_inode(BTRFS_I(inode))) { @@ -8320,6 +8321,7 @@ static void __endio_write_update_ordered(struct inode *inode, } again: + last_offset = ordered_offset; ret = btrfs_dec_test_first_ordered_pending(inode, , _offset, ordered_bytes, @@ -8330,6 +8332,8 @@ static void __endio_write_update_ordered(struct inode *inode, btrfs_init_work(>work, func, finish_ordered_fn, NULL, NULL); btrfs_queue_work(wq, >work); out_test: + if (ordered_offset == last_offset) + return; /* * our bio might span multiple ordered extents. If we haven't * completed the accounting for the whole dio, go back and try again -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] btrfs: clear ordered flag on cleaning up ordered extents
commit 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang") introduced btrfs_cleanup_ordered_extents() to cleanup submitted ordered extents. However, it does not clear the ordered bit (Private2) of coresponding pages. Thus, the following BUG occurs from free_pages_check_bad() (on btrfs/125 with nospace_cache). BUG: Bad page state in process btrfs pfn:3fa787 page:df2acfe9e1c0 count:0 mapcount:0 mapping: (null) index:0xd flags: 0x80002008(uptodate|private_2) raw: 80002008 000d raw: df2acf5c1b20 b443802238b0 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: 0x2000(private_2) This patch clear the flag as same as other places calling btrfs_dec_test_ordered_pending() for every page in the specified range. Signed-off-by: Naohiro Aota <naohiro.a...@wdc.com> Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang") Cc: <sta...@vger.kernel.org> # 4.12 --- fs/btrfs/inode.c | 12 1 file changed, 12 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 24bcd5cd9cf2..ae4c0a1bef38 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -135,6 +135,18 @@ static inline void btrfs_cleanup_ordered_extents(struct inode *inode, const u64 offset, const u64 bytes) { + unsigned long index = offset >> PAGE_SHIFT; + unsigned long end_index = (offset + bytes - 1) >> PAGE_SHIFT; + struct page *page; + + while (index <= end_index) { + page = find_get_page(inode->i_mapping, index); + index++; + if (!page) + continue; + ClearPagePrivate2(page); + put_page(page); + } return __endio_write_update_ordered(inode, offset + PAGE_SIZE, bytes - PAGE_SIZE, false); } -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: set include path relatively
Currently, gcc is passed the include directory with full path. As a result, dependency files (*.o.d) also record the full path at the build time. Such full path dependency is annoying for sharing the source between multiple machines, containers, or anything the path differ. And this is the same way what other program using autotools e.g. e2fsprogs is doing: $ grep top_builddir Makefile top_builddir = . CPPFLAGS = -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib BUILD_CFLAGS = -g -O2 -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib -DHAVE_CONFIG_H Signed-off-by: Naohiro Aota <na...@elisp.net> --- Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index b1f3388..69b94fb 100644 --- a/Makefile +++ b/Makefile @@ -70,8 +70,8 @@ CFLAGS = $(SUBST_CFLAGS) \ -D_XOPEN_SOURCE=700 \ -fno-strict-aliasing \ -fPIC \ --I$(TOPDIR) \ --I$(TOPDIR)/kernel-lib \ +-I. \ +-I./kernel-lib \ $(EXTRAWARN_CFLAGS) \ $(DEBUG_CFLAGS_INTERNAL) \ $(EXTRA_CFLAGS) -- 2.14.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: revert fs_devices state on error of btrfs_init_new_device()
When btrfs hits error after modifying fs_devices in btrfs_init_new_device() (such as btrfs_add_dev_item() returns error), it leaves everything as is, but frees allocated btrfs_device. As a result, fs_devices->devices and fs_devices->alloc_list contain already freed btrfs_device, leading to later use-after-free bug. Error path also messes the things like ->num_devices. While they go backs to the original value by unscanning btrfs devices, it is safe to revert them here. Fixes: 79787eaab461 ("btrfs: replace many BUG_ONs with proper error handling") Signed-off-by: Naohiro Aota --- fs/btrfs/volumes.c | 28 +++- 1 file changed, 23 insertions(+), 5 deletions(-) This patch applies on master, but not on kdave/for-next because of 74b9f4e186eb ("btrfs: declare fs_devices in btrfs_init_new_device()") diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1da162928d1a..5f0512fffa52 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2410,7 +2410,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path struct list_head *devices; struct super_block *sb = fs_info->sb; struct rcu_string *name; - u64 tmp; + u64 orig_super_total_bytes, orig_super_num_devices; int seeding_dev = 0; int ret = 0; bool unlocked = false; @@ -2509,12 +2509,14 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path if (!blk_queue_nonrot(q)) fs_info->fs_devices->rotating = 1; - tmp = btrfs_super_total_bytes(fs_info->super_copy); + orig_super_total_bytes = btrfs_super_total_bytes(fs_info->super_copy); btrfs_set_super_total_bytes(fs_info->super_copy, - round_down(tmp + device->total_bytes, fs_info->sectorsize)); + round_down(orig_super_total_bytes + device->total_bytes, + fs_info->sectorsize)); - tmp = btrfs_super_num_devices(fs_info->super_copy); - btrfs_set_super_num_devices(fs_info->super_copy, tmp + 1); + orig_super_num_devices = btrfs_super_num_devices(fs_info->super_copy); + btrfs_set_super_num_devices(fs_info->super_copy, + orig_super_num_devices + 1); /* add sysfs device entry */ btrfs_sysfs_add_device_link(fs_info->fs_devices, device); @@ -2594,6 +2596,22 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path error_sysfs: btrfs_sysfs_rm_device_link(fs_info->fs_devices, device); + mutex_lock(_info->fs_devices->device_list_mutex); + mutex_lock(_info->chunk_mutex); + list_del_rcu(>dev_list); + list_del(>dev_alloc_list); + fs_info->fs_devices->num_devices--; + fs_info->fs_devices->open_devices--; + fs_info->fs_devices->rw_devices--; + fs_info->fs_devices->total_devices--; + fs_info->fs_devices->total_rw_bytes -= device->total_bytes; + atomic64_sub(device->total_bytes, _info->free_chunk_space); + btrfs_set_super_total_bytes(fs_info->super_copy, + orig_super_total_bytes); + btrfs_set_super_num_devices(fs_info->super_copy, + orig_super_num_devices); + mutex_unlock(_info->chunk_mutex); + mutex_unlock(_info->fs_devices->device_list_mutex); error_trans: if (seeding_dev) sb->s_flags |= SB_RDONLY; -- 2.18.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: fix use-after-free of cmp workspace pages
btrfs_cmp_data_free() puts cmp's src_pages and dst_pages, but leaves their page address intact. Now, if you hit "goto again" in btrfs_extent_same_range() and hit some error in btrfs_cmp_data_prepare(), you'll try to unlock/put already put pages. This is simple fix to reset the address to avoid use-after-free. Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl") Signed-off-by: Naohiro Aota --- fs/btrfs/ioctl.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 43ecbe620dea..b077544b5232 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3327,11 +3327,13 @@ static void btrfs_cmp_data_free(struct cmp_pages *cmp) if (pg) { unlock_page(pg); put_page(pg); + cmp->src_pages[i] = NULL; } pg = cmp->dst_pages[i]; if (pg) { unlock_page(pg); put_page(pg); + cmp->dst_pages[i] = NULL; } } } -- 2.18.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html