Re: btrfs convert problem
On 14.09.2018 02:17, Qu Wenruo wrote: > > > On 2018/9/14 上午12:37, Nikolay Borisov wrote: >> >> >> On 13.09.2018 19:15, Serhat Sevki Dincer wrote: -1 seems to be EPERM, is your device write-protected, readonly or something like that ? >>> >>> I am able to use ext4 partition read/write, created a file and wrote >>> in it, un/re mounted it, all is ok. >>> The drive only has a microusb port and a little led light; no rw >>> switch or anything like that.. >>> >> >> What might help is running btrfs convert under strace. So something like : >> >> sudo strace -f -o strace.log btrfs-convert /dev/sdb1 and then send the log. >> > strace would greatly help in this case. > > My guess is something wrong happened in migrate_super_block(), which > doesn't handle ret > 0 case well. > In that case, it means pwrite 4K doesn't finish in one call, which looks > pretty strange. So Qu, I'm not seeing any EPERM errors, though I see the following so you might be right: openat(AT_FDCWD, "/dev/sdb1", O_RDWR) = 5 followed by a lot of preads, the last one of which is: pread64(5, 0x557a0abd10b0, 4096, 2732765184) = -1 EIO (Input/output error) This is then followed by a a lot of pwrites, the last 10 or so syscalls are: pwrite64(5, "\220\255\262\244\0\0\0\0\0\0"..., 16384, 91127808) = 16384 pwrite64(5, "|IO\233\0\0\0\0\0\0"..., 16384, 91144192) = 16384 pwrite64(5, "x\2501n\0\0\0\0\0\0"..., 16384, 91160576) = 16384 pwrite64(5, "\252\254l)\0\0\0\0\0\0"..., 16384, 91176960) = 16384 pwrite64(5, "P\256\331\373\0\0\0\0\0\0"..., 16384, 91193344) = 16384 pwrite64(5, "\3\230\230+\0\0\0\0\0\0"..., 4096, 83951616) = 4096 write(2, "ERROR: ", 7) = 7 write(2, "failed to "..., 37) = 37 write(2, "\n", 1) = 1 close(4) = 0 write(2, "WARNING: ", 9) = 9 write(2, "an error o"..., 104) = 104 write(2, "\n", 1) = 1 exit_group(1) = ? but looking at migrate_super_block it's not doing that many writes - just writing the sb at BTRFS_SUPER_INFO_OFFSET and then zeroing out 0..BTRFS_SUPER_INFO_OFFSET. Also the number of pwrites: grep -c pwrite ~/Downloads/btrfs-issue/convert-strace.log 198 So the failure is not originating from migrate_super_block. > > Thanks, > Qu >
Re: [PATCH v2 5/7] btrfs-progs: lowmem: continue to check item in last slot while checking inodes
On 09/14/2018 07:43 AM, Qu Wenruo wrote: On 2018/9/13 上午4:49, damenly...@gmail.com wrote: From: Su Yue After call of check_inode_item(), path may point to the last unchecked slot of the leaf. The outer walk_up_tree() always treats the position as checked item then skips to next item. So check_inode_item() always set its path to *unchecked* slot, while walk_up_tree() always think its path is set to *checked* slot. Then this is indeed a problem. Yep, that's the case. If the last item was an inode item, yes, it was unchecked. Then walk_up_tree() will think the leaf is checked and walk up to upper node, process_one_leaf() in walk_down_tree() would skip to check next inode item. Which means, the inode item won't be checked. Solution: After check_inode_item returns, if found path point to the last item of a leaf, Would you please explain more about why last item makes a difference here? From previous statement, it looks like it's the difference in how walk_up_tree() and check_inode_item() handles the path. Not really related to last item. Yes, the change is tricky. The core problem is walk_up_tree() will skip to other nodes. Decreament of slot will let path still point to the leaf after walk_up_tree() returns. Or we must change logical of walk_up_tree() or check_inode_item(). BTW,now process_one_leaf() checks inode_item from first inode_item or second item with different ino. Change it start from last slot should be the right way. Thanks, Su Thanks, Su Or did I miss something? Thanks, Qu decrease path slot manually, so walk_up_tree() will stay on the leaf. Fixes: 5e2dc770471b ("btrfs-progs: check: skip shared node or leaf check for low_memory mode") Signed-off-by: Su Yue --- check/mode-lowmem.c | 12 1 file changed, 12 insertions(+) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 8fc9edab1d66..b6b33786d02b 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -2612,6 +2612,18 @@ again: if (cur->start == cur_bytenr) goto again; + /* +* path may point at the last item(a inode item maybe) in a leaf. +* Without below lines, walk_up_tree() will skip the item which +* means all items related to the inode will never be checked. +* Decrease the slot manually, walk_up_tree won't skip to next node +* if it occurs. +*/ + if (path->slots[0] + 1 >= btrfs_header_nritems(path->nodes[0])) { + if (path->slots[0]) + path->slots[0]--; + } + /* * we have switched to another leaf, above nodes may * have changed, here walk down the path, if a node
Re: [PATCH v2 4/7] btrfs-progs: lowmem: search key of root again after check_fs_root() under repair
On 09/14/2018 07:37 AM, Qu Wenruo wrote: On 2018/9/13 上午4:49, damenly...@gmail.com wrote: From: Su Yue In check_fs_roots_lowmem(), we do search and follow the resulted path to call check_fs_root(), then call btrfs_next_item() to check next root. However, if repair is enabled, the root tree can be cowed, the existed path can cause strange errors. Solution: If repair, save the key before calling check_fs_root, search the saved key again before checking next root. Both reason and solution looks good. Signed-off-by: Su Yue --- check/mode-lowmem.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 89a304bbdd69..8fc9edab1d66 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -4967,9 +4967,13 @@ int check_fs_roots_lowmem(struct btrfs_fs_info *fs_info) } while (1) { + struct btrfs_key saved_key; + node = path.nodes[0]; slot = path.slots[0]; btrfs_item_key_to_cpu(node, , slot); + if (repair) + saved_key = key; if (key.objectid > BTRFS_LAST_FREE_OBJECTID) goto out; if (key.type == BTRFS_ROOT_ITEM_KEY && @@ -5000,6 +5004,17 @@ int check_fs_roots_lowmem(struct btrfs_fs_info *fs_info) err |= ret; } next: + /* +* Since root tree can be cowed during repair, +* here search the saved key again. +*/ + if (repair) { + btrfs_release_path(); + ret = btrfs_search_slot(NULL, fs_info->tree_root, + _key, , 0, 0); + /* Repair never deletes trees, search must succeed. */ + BUG_ON(ret); But this doesn't look good to me. Your assumption here is valid (at least for now), but it's possible that some tree blocks get corrupted in a large root tree, and in that case, we could still read part of the root tree, but btrfs_search_slot() could still return -EIO for certain search key. So I still prefer to do some error handling other than BUG_ON(ret). Okay, will try it. Thanks, Su Thanks, Qu + } ret = btrfs_next_item(tree_root, ); if (ret > 0) goto out;
Re: [PATCH v2 3/7] btrfs-progs: lowmem check: find dir_item by di_key in check_dir_item()
On 09/14/2018 07:33 AM, Qu Wenruo wrote: On 2018/9/13 上午4:49, damenly...@gmail.com wrote: From: Su Yue In check_dir_item, we are going to search corresponding dir_item/index. Commit 564901eac7a4 ("btrfs-progs: check: introduce print_dir_item_err()") Changed argument name from key to di_key but forgot to change the key name for dir_item search. So @key shouldn't be used here. It should be @di_key. Fixes: 564901eac7a4 ("btrfs-progs: check: introduce print_dir_item_err()") Which also forgot to modify comment about parameters. OhNoticed Signed-off-by: Su Yue --- check/mode-lowmem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 1bce44f5658a..89a304bbdd69 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -1658,7 +1658,7 @@ begin: /* check relative INDEX/ITEM */ key.objectid = di_key->objectid; - if (key.type == BTRFS_DIR_ITEM_KEY) { + if (di_key->type == BTRFS_DIR_ITEM_KEY) { To avoid such problem, I recommend to rename @key to @search_key, and move the declaration inside the while loop. Nice suggestion. Will do it in next version. Thanks, Su Thanks, Qu key.type = BTRFS_DIR_INDEX_KEY; key.offset = index; } else {
Re: [PATCH v2 7/7] btrfs-progs: fsck-tests: add test case inode_extref without dir_item and dir_index
On 2018/9/13 上午4:49, damenly...@gmail.com wrote: > From: Su Yue > > This case contains an inode_extref: > == > ... >item 1 key (257 INODE_EXTREF 3460996356) itemoff 3947 itemsize 24 > index 257 parent 256 namelen 6 name: foo255 > ... > == > The related dir_item and dir_index are missing. > > Add the case to ensure both original and lowmem mode check can handle > the case of inode_extref. > > Lowmem part is supported since patch named >'btrfs-progs: lowmem: improve check_inode_extref()'. > > And rename default_case.img to inode_ref_without_dir_item_index.img. > > Signed-off-by: Su Yue > --- > .../inode_extref_without_dir_item_index.img | Bin 0 -> 10240 bytes > ...=> inode_ref_without_dir_item_and_index.img} | Bin > 2 files changed, 0 insertions(+), 0 deletions(-) > create mode 100644 > tests/fsck-tests/009-no-dir-item-or-index/inode_extref_without_dir_item_index.img Checked the image, it's normal INODE_REF with INODE_EXTREF. Looks good by itself. It could be even better, by providing image with INODE_EXTREF only without any INODE_REF. Thanks, Qu > rename tests/fsck-tests/009-no-dir-item-or-index/{default_case.img => > inode_ref_without_dir_item_and_index.img} (100%) > > diff --git > a/tests/fsck-tests/009-no-dir-item-or-index/inode_extref_without_dir_item_index.img > > b/tests/fsck-tests/009-no-dir-item-or-index/inode_extref_without_dir_item_index.img > new file mode 100644 > index > ..85f7a8211fe66dfa9b2b1f3a98e72f54cd402616 > GIT binary patch > literal 10240 > zcmeHscQo8j8?UlhEOymZqDGCfdWp7rf~Y~1#Of{4q9>NMSe=L(Wkn}~5JZVy61@b` > zOY|NsB z{}O`#MQ8s~cKv^0N5UJX{146m++Z;H#s?3twmBv=mszSgNy!H|F1 > z$NlwH{s)Ks?ejn7KN0v(1pZ$}z<*}eU$kky0Zga5{R>1!Q|jBj!Kk_IdwP^`75 > zM#+ojWKDf@k7f}D9aT8qYmSi4F2VupuXr{D%*M11tY*yA57Z39C|MYjWMVhzVK5j1 > zieWerkiPZjcdzsBS22#CPt1DVI!`Y9h*_LYFoYww=`DnvA?wF8(}YD85$ > zG7$zaGL3dVC{eyOEl3W_-qsf~#zJSUyqf<-HM2v|!&%~jSR%lz|08~jx;7^e3~;sY > zT^phzaQf-A9pzJu_AnE#I*l;xF8RnBNniHgj!g0JPRgY|`Ok zT==vTr(~q+;FBMuOG`o+hG&)q;T&{Y>hQSmPV3fQT{G?g$ZX6e{@Ov^aKMAG > zr|O5JwU)}O00QwdZZ<+yyfQ#~Ka9r6hA@3Oik}0|;rd7T)~)3|9Uli=;g > zh~{}Qn(6qP?bLNIenC5 > zLO3=6UuM9&&4o2fB&4lmozz;AiUWx^ndhJ)O`@bM4QWxL!mC>|7xSNK`Sw)H$N)-m > zSXz2)pA0>PYZPT59_=l(%tNBo?D|v)$%|6$`GKhK z3`TXnj3nB0HqI`I=#f5$3l_;9nqB$anY63#2cOROvo^B4lgXan6XVTQ><>+y93Fiy > zTiWZzmC*aeUfXv1K)7a?^YiuO;WXH%#MegmjkD8b)YP2MzV{*|{#*5|R9Ov&-Q^d% > z_IMBdg|z+-kf1lZG%Y^>3;_8eTVDY0gLuVuK^iH*1%Z5W_7lF*~uM$6u+hgOC_ > z4rWyxVvQeo1hp}4g+fK$4!+0}!ov|G-z-7PF$8c+7m_drD7D1!U-DzS0t`X > z>Y@fV(K_9a{<$w^me4NJyvG7Sf}U~z9#&01Q3o5cr>bn8t5a2O%6)B z^Po>Wf6SW`JBx#V08j00K~k%COtKaNNL64F+f;tgKG=lY(fg3Y1(No%L^e>S=)3E( > z&!A-jV%H0qu$NE*8j| zJXGa7*BIuzqutPc#vY;Wdne2cv}U-aTSp=~9Zl%M@uo?r{oXa+;oE6suP&`d!i > zP#H`@FWLwE;X3r-?mMs`bkeHA5!(ySKs*bn0(aIF=`f>3Wl=2@yv!W(Y57SB519_C > z6)+t9-_I4%1fb5Z{cpNOM?t%BnXTyFy@$oS?=g{%z#FR7*F40BkJ{o-r > zeg|!2s2At!us$=cRn=-Ad2{xrofs7EHO~dqhBX`nd|fIgA*e6HJ^;GIet=Fzs|Guf >
Re: [PATCH v2 6/7] btrfs-progs: lowmem: improve check_inode_extref()
On 2018/9/13 上午4:49, damenly...@gmail.com wrote: > From: Su Yue > > inode_extref is much similar with inode_ref, most codes are reused in > check_inode_extref(). > Exception: > There is no need to check root directory, so remove it. > Make check_inode_extref() verify hash value with key offset now. > > And lowmem check can detect errors about inode_extref and try to > repair errors. > > Signed-off-by: Su Yue > --- > check/mode-lowmem.c | 119 ++-- > 1 file changed, 92 insertions(+), 27 deletions(-) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index b6b33786d02b..bec2ee185cc8 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -1182,37 +1182,79 @@ out: > * > * Return 0 if no error occurred. > */ > -static int check_inode_extref(struct btrfs_root *root, > - struct btrfs_key *ref_key, > - struct extent_buffer *node, int slot, u64 *refs, > - int mode) > +static int check_inode_extref(struct btrfs_root *root, struct btrfs_key > *ref_key, > + struct btrfs_path *path, char *name_ret, > + u32 *namelen_ret, u64 *refs_ret, int mode) > { > struct btrfs_key key; > struct btrfs_key location; > struct btrfs_inode_extref *extref; > + struct extent_buffer *node; > char namebuf[BTRFS_NAME_LEN] = {0}; > u32 total; > - u32 cur = 0; > + u32 cur; > u32 len; > u32 name_len; > u64 index; > u64 parent; > + int err; > + int tmp_err; > int ret; > - int err = 0; > + int slot; > + u64 refs; > + bool search_again = false; > > location.objectid = ref_key->objectid; > location.type = BTRFS_INODE_ITEM_KEY; > location.offset = 0; > +begin: > + cur = 0; > + err = 0; > + refs = *refs_ret; > + *namelen_ret = 0; > + > + /* since after repair, path and the dir item may be changed */ > + if (search_again) { > + search_again = false; > + btrfs_release_path(path); > + ret = btrfs_search_slot(NULL, root, ref_key, path, 0, 0); > + /* > + * The item was deleted, let the path point to the last checked > + * item. > + */ > + if (ret > 0) { > + if (path->slots[0] == 0) { > + ret = btrfs_prev_leaf(root, path); > + /* > + * we are checking the inode item, there must > + * be some items before, the case shall never > + * happen. > + */ This assumption looks valid by a quick glance, but doesn't stand in the following case: 1) It's really the first INODE_ITEM. 2) Previous leaf/node is corrupted. So we still need some error handling for it. > + BUG_ON(ret); > + } else { > + path->slots[0]--; > + } > + goto out; > + } else if (ret < 0) { > + err |= ret; > + goto out; > + } > + } > + > + node = path->nodes[0]; > + slot = path->slots[0]; > > extref = btrfs_item_ptr(node, slot, struct btrfs_inode_extref); > total = btrfs_item_size_nr(node, slot); > > -next: > - /* update inode ref count */ > - (*refs)++; > - name_len = btrfs_inode_extref_name_len(node, extref); > - index = btrfs_inode_extref_index(node, extref); > +loop: > + /* Update inode ref count */ > + refs++; > + tmp_err = 0; > parent = btrfs_inode_extref_parent(node, extref); > + index = btrfs_inode_extref_index(node, extref); > + name_len = btrfs_inode_extref_name_len(node, extref); > + > if (name_len <= BTRFS_NAME_LEN) { > len = name_len; > } else { > @@ -1220,37 +1262,60 @@ next: > warning("root %llu INODE_EXTREF[%llu %llu] name too long", > root->objectid, ref_key->objectid, ref_key->offset); > } > + > read_extent_buffer(node, namebuf, (unsigned long)(extref + 1), len); > > - /* Check root dir ref name */ > - if (index == 0 && strncmp(namebuf, "..", name_len)) { > - error("root %llu INODE_EXTREF[%llu %llu] ROOT_DIR name > shouldn't be %s", > + /* verify hash value */ > + if (ref_key->offset != btrfs_extref_hash(parent, namebuf, len)) { > + err |= -EIO; It's never a good idea to mix minus error number with bitmap errors. Thanks, Qu > + error("root %llu INODE_EXTREF[%llu %llu] name %s namelen %u > mode %d mismatch with its hash, wanted %llu have %llu", > root->objectid, ref_key->objectid, ref_key->offset, > - namebuf); > - err |=
Re: [PATCH v2 5/7] btrfs-progs: lowmem: continue to check item in last slot while checking inodes
On 2018/9/13 上午4:49, damenly...@gmail.com wrote: > From: Su Yue > > After call of check_inode_item(), path may point to the last unchecked > slot of the leaf. The outer walk_up_tree() always treats the position > as checked item then skips to next item. So check_inode_item() always set its path to *unchecked* slot, while walk_up_tree() always think its path is set to *checked* slot. Then this is indeed a problem. > > If the last item was an inode item, yes, it was unchecked. > Then walk_up_tree() will think the leaf is checked and walk up to > upper node, process_one_leaf() in walk_down_tree() would skip to > check next inode item. Which means, the inode item won't be checked. > > Solution: > After check_inode_item returns, if found path point to the last item > of a leaf, Would you please explain more about why last item makes a difference here? >From previous statement, it looks like it's the difference in how walk_up_tree() and check_inode_item() handles the path. Not really related to last item. Or did I miss something? Thanks, Qu > decrease path slot manually, so walk_up_tree() will stay > on the leaf. > > Fixes: 5e2dc770471b ("btrfs-progs: check: skip shared node or leaf check for > low_memory mode") > Signed-off-by: Su Yue > --- > check/mode-lowmem.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 8fc9edab1d66..b6b33786d02b 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -2612,6 +2612,18 @@ again: > if (cur->start == cur_bytenr) > goto again; > > + /* > + * path may point at the last item(a inode item maybe) in a leaf. > + * Without below lines, walk_up_tree() will skip the item which > + * means all items related to the inode will never be checked. > + * Decrease the slot manually, walk_up_tree won't skip to next node > + * if it occurs. > + */ > + if (path->slots[0] + 1 >= btrfs_header_nritems(path->nodes[0])) { > + if (path->slots[0]) > + path->slots[0]--; > + } > + > /* >* we have switched to another leaf, above nodes may >* have changed, here walk down the path, if a node >
Re: [PATCH v2 4/7] btrfs-progs: lowmem: search key of root again after check_fs_root() under repair
On 2018/9/13 上午4:49, damenly...@gmail.com wrote: > From: Su Yue > > In check_fs_roots_lowmem(), we do search and follow the resulted path > to call check_fs_root(), then call btrfs_next_item() to check next > root. > However, if repair is enabled, the root tree can be cowed, the > existed path can cause strange errors. > > Solution: > If repair, save the key before calling check_fs_root, > search the saved key again before checking next root. Both reason and solution looks good. > > Signed-off-by: Su Yue > --- > check/mode-lowmem.c | 15 +++ > 1 file changed, 15 insertions(+) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 89a304bbdd69..8fc9edab1d66 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -4967,9 +4967,13 @@ int check_fs_roots_lowmem(struct btrfs_fs_info > *fs_info) > } > > while (1) { > + struct btrfs_key saved_key; > + > node = path.nodes[0]; > slot = path.slots[0]; > btrfs_item_key_to_cpu(node, , slot); > + if (repair) > + saved_key = key; > if (key.objectid > BTRFS_LAST_FREE_OBJECTID) > goto out; > if (key.type == BTRFS_ROOT_ITEM_KEY && > @@ -5000,6 +5004,17 @@ int check_fs_roots_lowmem(struct btrfs_fs_info > *fs_info) > err |= ret; > } > next: > + /* > + * Since root tree can be cowed during repair, > + * here search the saved key again. > + */ > + if (repair) { > + btrfs_release_path(); > + ret = btrfs_search_slot(NULL, fs_info->tree_root, > + _key, , 0, 0); > + /* Repair never deletes trees, search must succeed. */ > + BUG_ON(ret); But this doesn't look good to me. Your assumption here is valid (at least for now), but it's possible that some tree blocks get corrupted in a large root tree, and in that case, we could still read part of the root tree, but btrfs_search_slot() could still return -EIO for certain search key. So I still prefer to do some error handling other than BUG_ON(ret). Thanks, Qu > + } > ret = btrfs_next_item(tree_root, ); > if (ret > 0) > goto out; >
Re: [PATCH v2 3/7] btrfs-progs: lowmem check: find dir_item by di_key in check_dir_item()
On 2018/9/13 上午4:49, damenly...@gmail.com wrote: > From: Su Yue > > In check_dir_item, we are going to search corresponding > dir_item/index. > > Commit 564901eac7a4 ("btrfs-progs: check: introduce > print_dir_item_err()") Changed argument name from key to di_key but > forgot to change the key name for dir_item search. > So @key shouldn't be used here. It should be @di_key. > > Fixes: 564901eac7a4 ("btrfs-progs: check: introduce print_dir_item_err()") Which also forgot to modify comment about parameters. > Signed-off-by: Su Yue > --- > check/mode-lowmem.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 1bce44f5658a..89a304bbdd69 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -1658,7 +1658,7 @@ begin: > > /* check relative INDEX/ITEM */ > key.objectid = di_key->objectid; > - if (key.type == BTRFS_DIR_ITEM_KEY) { > + if (di_key->type == BTRFS_DIR_ITEM_KEY) { To avoid such problem, I recommend to rename @key to @search_key, and move the declaration inside the while loop. Thanks, Qu > key.type = BTRFS_DIR_INDEX_KEY; > key.offset = index; > } else { >
Re: [PATCH v2 2/7] btrfs-progs: make btrfs_unlink() lookup inode_extref
On 2018/9/13 上午4:49, damenly...@gmail.com wrote: > From: Su Yue > > btrfs_unlink() uses btrfs_lookup_inode_ref() to look up inode_ref > but forget inode_extref case. > > Let btrfs_unlink() call btrfs_lookup_inode_extref() if inode_ref is > found and EXTENDED_IREF feature is enabled. > > Fixes: 0cc75eddd093 ("btrfs-progs: Add btrfs_unlink() and btrfs_add_link() > functions.") My fault. Indeed we need to check extref especially EXTENTD_IREF is already default mkfs features. Reviewed-by: Qu Wenruo Thanks, Qu > Signed-off-by: Su Yue > --- > inode.c | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/inode.c b/inode.c > index 2398bca4a109..598ad0ab6b4c 100644 > --- a/inode.c > +++ b/inode.c > @@ -277,6 +277,7 @@ int btrfs_unlink(struct btrfs_trans_handle *trans, struct > btrfs_root *root, > struct btrfs_key key; > struct btrfs_inode_item *inode_item; > struct btrfs_inode_ref *inode_ref; > + struct btrfs_inode_extref *inode_extref = NULL; > struct btrfs_dir_item *dir_item; > u64 inode_size; > u32 nlinks; > @@ -296,7 +297,18 @@ int btrfs_unlink(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > ret = PTR_ERR(inode_ref); > goto out; > } > - if (inode_ref) > + > + if (!inode_ref && btrfs_fs_incompat(root->fs_info, EXTENDED_IREF)) { > + btrfs_release_path(path); > + inode_extref = btrfs_lookup_inode_extref(trans, root, path, > + name, namelen, ino, parent_ino, 0); > + if (IS_ERR(inode_extref)) { > + ret = PTR_ERR(inode_extref); > + goto out; > + } > + } > + > + if (inode_ref || inode_extref) > del_inode_ref = 1; > btrfs_release_path(path); > >
Re: [PATCH v2 1/7] btrfs-progs: adjust arguments of btrfs_lookup_inode_extref()
On 2018/9/13 上午4:49, damenly...@gmail.com wrote: > From: Su Yue > > The argument index is not used in btrfs_lookup_inode_extref(), > so remove it. > And adjust positions its arguments to make it consistent with > kernel part. > > No functional change. > > Fixes: 260675657767 ("btrfs-progs: Import btrfs_insert/del/lookup_extref() > functions.") > Signed-off-by: Su Yue Looks good, just small nitpick on coding style. After fixing that, feel free to add my tag: Reviewed-by: Qu Wenruo > --- > ctree.h | 6 +++--- > inode-item.c | 6 +++--- > 2 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/ctree.h b/ctree.h > index 4719962df67d..e7f6c5df95f1 100644 > --- a/ctree.h > +++ b/ctree.h > @@ -2708,9 +2708,9 @@ int btrfs_lookup_inode(struct btrfs_trans_handle > *trans, struct btrfs_root > *root, struct btrfs_path *path, > struct btrfs_key *location, int mod); > struct btrfs_inode_extref *btrfs_lookup_inode_extref(struct > btrfs_trans_handle > - *trans, struct btrfs_path *path, struct btrfs_root *root, > - u64 ino, u64 parent_ino, u64 index, const char *name, > - int namelen, int ins_len); > + *trans, struct btrfs_root *root, struct btrfs_path *path, *trans and struct btrfs_trans_handle are in different lines, looks pretty strange. It would be much better to keep them in the same line, just what kernel does. > + const char *name, int namelen, u64 ino, u64 parent_ino, > + int ins_len); > int btrfs_del_inode_extref(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > const char *name, int name_len, > diff --git a/inode-item.c b/inode-item.c > index 1cc106670cd4..461557cb83d6 100644 > --- a/inode-item.c > +++ b/inode-item.c > @@ -228,9 +228,9 @@ static int btrfs_find_name_in_ext_backref(struct > btrfs_path *path, > } > > struct btrfs_inode_extref *btrfs_lookup_inode_extref(struct > btrfs_trans_handle > - *trans, struct btrfs_path *path, struct btrfs_root *root, > - u64 ino, u64 parent_ino, u64 index, const char *name, > - int namelen, int ins_len) > + *trans, struct btrfs_root *root, struct btrfs_path *path, Same here. Thanks, Qu > + const char *name, int namelen, u64 ino, u64 parent_ino, > + int ins_len) > { > struct btrfs_key key; > struct btrfs_inode_extref *extref; >
Re: btrfs convert problem
On 2018/9/14 上午12:37, Nikolay Borisov wrote: > > > On 13.09.2018 19:15, Serhat Sevki Dincer wrote: >>> -1 seems to be EPERM, is your device write-protected, readonly or >>> something like that ? >> >> I am able to use ext4 partition read/write, created a file and wrote >> in it, un/re mounted it, all is ok. >> The drive only has a microusb port and a little led light; no rw >> switch or anything like that.. >> > > What might help is running btrfs convert under strace. So something like : > > sudo strace -f -o strace.log btrfs-convert /dev/sdb1 and then send the log. > strace would greatly help in this case. My guess is something wrong happened in migrate_super_block(), which doesn't handle ret > 0 case well. In that case, it means pwrite 4K doesn't finish in one call, which looks pretty strange. Thanks, Qu
Re: [PATCH 3/3] btrfs-progs: check: lowmem: Refactor extent type checks in check_file_extent
On 2018/9/13 下午8:05, Nikolay Borisov wrote: > Make the checks in check_file_extent a bit more explicit. First we check > for unknown type and fail accordingly. Then we check for inline extent > and handle it in the newly introduced check_file_extent_inline. Finally > if none of the above checks triggered then we must have a regular or > prealloc extents. > > Signed-off-by: Nikolay Borisov Reviewed-by: Qu Wenruo Thanks, Qu > --- > check/mode-lowmem.c | 12 +++- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 3a6fbb33c858..aa946a1307de 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -1915,21 +1915,23 @@ static int check_file_extent(struct btrfs_root *root, > struct btrfs_path *path, > > btrfs_item_key_to_cpu(node, , slot); > fi = btrfs_item_ptr(node, slot, struct btrfs_file_extent_item); > - > - /* Check inline extent */ > extent_type = btrfs_file_extent_type(node, fi); > - if (extent_type == BTRFS_FILE_EXTENT_INLINE) > - return check_file_extent_inline(root, path, size, end); > > /* Check extent type */ > if (extent_type != BTRFS_FILE_EXTENT_REG && > - extent_type != BTRFS_FILE_EXTENT_PREALLOC) { > + extent_type != BTRFS_FILE_EXTENT_PREALLOC && > + extent_type != BTRFS_FILE_EXTENT_INLINE) { > err |= FILE_EXTENT_ERROR; > error("root %llu EXTENT_DATA[%llu %llu] type bad", > root->objectid, fkey.objectid, fkey.offset); > return err; > } > > + /* Check inline extent */ > + if (extent_type == BTRFS_FILE_EXTENT_INLINE) > + return check_file_extent_inline(root, path, size, end); > + > + > /* Check REG_EXTENT/PREALLOC_EXTENT */ > disk_bytenr = btrfs_file_extent_disk_bytenr(node, fi); > disk_num_bytes = btrfs_file_extent_disk_num_bytes(node, fi); >
Re: [PATCH 2/3] btrfs-progs: check: lowmem: Refactor extent len test in check_file_extent_inline
On 2018/9/13 下午8:05, Nikolay Borisov wrote: > Instead of having another top-level if which checks for > 'extent_num_bytes != item_inline_len' only if we are !compressed, just > move the 'if' inside the 'else' branch of the first top-level if, since > it has already checked for !compressed or not. No functional changes. > > Signed-off-by: Nikolay Borisov Reviewed-by: Qu Wenruo Thanks, Qu > --- > check/mode-lowmem.c | 19 ++- > 1 file changed, 10 insertions(+), 9 deletions(-) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 48c1537e7440..3a6fbb33c858 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -1860,19 +1860,20 @@ static int check_file_extent_inline(struct btrfs_root > *root, > err |= FILE_EXTENT_ERROR; > } > > - } > - if (!compressed && extent_num_bytes != item_inline_len) { > - error( > + > + if (extent_num_bytes != item_inline_len) { > + error( > "root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: %llu, expected: > %u", > root->objectid, fkey.objectid, fkey.offset, > extent_num_bytes, item_inline_len); > - if (repair) { > - ret = repair_inline_ram_bytes(root, path, > - _num_bytes); > - if (ret) > + if (repair) { > + ret = repair_inline_ram_bytes(root, path, > + > _num_bytes); > + if (ret) > + err |= FILE_EXTENT_ERROR; > + } else { > err |= FILE_EXTENT_ERROR; > - } else { > - err |= FILE_EXTENT_ERROR; > + } > } > } > *end += extent_num_bytes; >
Re: [PATCH 1/3] btrfs-progs: check: lowmem: Factor out inline extent checking code in its own function
On 2018/9/13 下午8:05, Nikolay Borisov wrote: > Since the inline extent code can be largely self-sufficient, factor > it out from check_file_extent. No functional changes. > > Signed-off-by: Nikolay Borisov Reviewed-by: Qu Wenruo Indeed makes more sense to refactor inline check out. Thanks, Qu > --- > check/mode-lowmem.c | 142 ++-- > 1 file changed, 83 insertions(+), 59 deletions(-) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 1bce44f5658a..48c1537e7440 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -1800,6 +1800,87 @@ static int repair_inline_ram_bytes(struct btrfs_root > *root, > return ret; > } > > + > +static int check_file_extent_inline(struct btrfs_root *root, > + struct btrfs_path *path, u64 *size, > + u64 *end) > +{ > + u32 max_inline_extent_size = min_t(u32, root->fs_info->sectorsize - 1, > + BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)); > + struct extent_buffer *node = path->nodes[0]; > + struct btrfs_item *e = btrfs_item_nr(0); > + struct btrfs_file_extent_item *fi; > + struct btrfs_key fkey; > + u64 extent_num_bytes; > + u32 item_inline_len; > + int ret; > + int compressed = 0; > + int err = 0; > + > + fi = btrfs_item_ptr(node, path->slots[0], > + struct btrfs_file_extent_item); > + item_inline_len = btrfs_file_extent_inline_item_len(node, e); > + extent_num_bytes = btrfs_file_extent_ram_bytes(node, fi); > + compressed = btrfs_file_extent_compression(node, fi); > + btrfs_item_key_to_cpu(node, , path->slots[0]); > + > + if (extent_num_bytes == 0) { > + error( > +"root %llu EXTENT_DATA[%llu %llu] has empty inline extent", > + root->objectid, fkey.objectid, fkey.offset); > + err |= FILE_EXTENT_ERROR; > + } > + > + if (compressed) { > + if (extent_num_bytes > root->fs_info->sectorsize) { > + error( > +"root %llu EXTENT_DATA[%llu %llu] too large inline extent ram size, have > %llu, max: %u", > + root->objectid, fkey.objectid, > + fkey.offset, extent_num_bytes, > + root->fs_info->sectorsize - 1); > + err |= FILE_EXTENT_ERROR; > + } > + > + if (item_inline_len > max_inline_extent_size) { > + error( > +"root %llu EXTENT_DATA[%llu %llu] too large inline extent on-disk size, have > %u, max: %u", > + root->objectid, fkey.objectid, > + fkey.offset, item_inline_len, > + max_inline_extent_size); > + err |= FILE_EXTENT_ERROR; > + } > + > + } else { > + > + if (extent_num_bytes > max_inline_extent_size) { > + error( > +"root %llu EXTENT_DATA[%llu %llu] too large inline extent size, have %llu, > max: %u", > + root->objectid, fkey.objectid, fkey.offset, > + extent_num_bytes, max_inline_extent_size); > + err |= FILE_EXTENT_ERROR; > + } > + > + } > + if (!compressed && extent_num_bytes != item_inline_len) { > + error( > +"root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: %llu, expected: > %u", > + root->objectid, fkey.objectid, fkey.offset, > + extent_num_bytes, item_inline_len); > + if (repair) { > + ret = repair_inline_ram_bytes(root, path, > + _num_bytes); > + if (ret) > + err |= FILE_EXTENT_ERROR; > + } else { > + err |= FILE_EXTENT_ERROR; > + } > + } > + *end += extent_num_bytes; > + *size += extent_num_bytes; > + > + return err; > +} > + > /* > * Check file extent datasum/hole, update the size of the file extents, > * check and update the last offset of the file extent. > @@ -1824,8 +1905,6 @@ static int check_file_extent(struct btrfs_root *root, > struct btrfs_path *path, > u64 csum_found; /* In byte size, sectorsize aligned */ > u64 search_start; /* Logical range start we search for csum */ > u64 search_len; /* Logical range len we search for csum */ > - u32 max_inline_extent_size = min_t(u32, root->fs_info->sectorsize - 1, > - BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)); > unsigned int extent_type; > unsigned int is_hole; > int slot = path->slots[0]; > @@ -1838,63 +1917,8 @@ static int check_file_extent(struct btrfs_root *root, > struct btrfs_path *path, > >
[PATCH v2] Btrfs: remove level==0 check in balance_level
btrfs_search_slot() if (level != 0) setup_nodes_for_search() balance_level() It is just impossible to have level=0 in balance_level. Reviewed-by: Nikolay Borisov Signed-off-by: Liu Bo --- v2: add assertion for level just in case someone breaks the rule in the future. fs/btrfs/ctree.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 8b31caa60b0a..ada44c786f2e 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1778,8 +1778,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, int orig_slot = path->slots[level]; u64 orig_ptr; - if (level == 0) - return 0; + ASSERT(level > 0); mid = path->nodes[level]; -- 1.8.3.1
[PATCH v2] Btrfs: assert page dirty bit
Just in case that someone breaks the rule that pages are dirty as long as eb is dirty. Signed-off-by: Liu Bo --- v2: fix typo of CONFIG_BTRFS_DEBUG fs/btrfs/extent_io.c | 5 + 1 file changed, 5 insertions(+) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index fb2bf50134a1..f88231171009 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5184,6 +5184,11 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb) set_page_dirty(eb->pages[i]); } +#ifdef CONFIG_BTRFS_DEBUG + for (i = 0; i < num_pages; i++) + ASSERT(PageDirty(eb->pages[i])); +#endif + return was_dirty; } -- 1.8.3.1
[PATCH v2] Btrfs: skip set_page_dirty if eb is dirty
As long as @eb is marked with EXTENT_BUFFER_DIRTY, all of its pages are dirty, so no need to set pages dirty again. Ftrace showed that the loop took 10us on my dev box, so removing this can save us at least 10us if eb is already dirty. Signed-off-by: Liu Bo --- v2: Update the patch's description. fs/btrfs/extent_io.c | 11 +++ fs/btrfs/extent_io.h | 2 +- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 628f1aef34b0..fb2bf50134a1 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5165,11 +5165,11 @@ void clear_extent_buffer_dirty(struct extent_buffer *eb) WARN_ON(atomic_read(>refs) == 0); } -int set_extent_buffer_dirty(struct extent_buffer *eb) +bool set_extent_buffer_dirty(struct extent_buffer *eb) { int i; int num_pages; - int was_dirty = 0; + bool was_dirty; check_buffer_tree_ref(eb); @@ -5179,8 +5179,11 @@ int set_extent_buffer_dirty(struct extent_buffer *eb) WARN_ON(atomic_read(>refs) == 0); WARN_ON(!test_bit(EXTENT_BUFFER_TREE_REF, >bflags)); - for (i = 0; i < num_pages; i++) - set_page_dirty(eb->pages[i]); + if (!was_dirty) { + for (i = 0; i < num_pages; i++) + set_page_dirty(eb->pages[i]); + } + return was_dirty; } diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index b4d03e677e1d..f2ab42d57f02 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -479,7 +479,7 @@ void extent_buffer_bitmap_set(struct extent_buffer *eb, unsigned long start, void extent_buffer_bitmap_clear(struct extent_buffer *eb, unsigned long start, unsigned long pos, unsigned long len); void clear_extent_buffer_dirty(struct extent_buffer *eb); -int set_extent_buffer_dirty(struct extent_buffer *eb); +bool set_extent_buffer_dirty(struct extent_buffer *eb); void set_extent_buffer_uptodate(struct extent_buffer *eb); void clear_extent_buffer_uptodate(struct extent_buffer *eb); int extent_buffer_under_io(struct extent_buffer *eb); -- 1.8.3.1
Re: [PATCH] Btrfs: skip set_page_dirty if eb is dirty
On Thu, Sep 13, 2018 at 01:29:59PM +0200, David Sterba wrote: > On Wed, Sep 12, 2018 at 12:27:46PM -0700, Liu Bo wrote: > > On Wed, Sep 12, 2018 at 09:37:20AM +0300, Nikolay Borisov wrote: > > > > > > > > > On 12.09.2018 01:06, Liu Bo wrote: > > > > As long as @eb is marked with EXTENT_BUFFER_DIRTY, all of its pages > > > > are dirty, so no need to set pages dirty again. > > > > > > > > Signed-off-by: Liu Bo > > > > > > Does make it any performance difference, numbers? > > > > > > > To be honest, the performance difference would be trivial in a normal > > big test round. But I just looked into the difference from my ftrace, > > removing the loop can reduce the time spent by 10us in my box. > > 10us was for the case where the pages were dirty already and the for > cycle was then skipped? > > set_page_dirty is not lightweight, calls down to various functions and > holds locks. I can't tell if this is still fast enough on your machine > so that it really takes 10us. > Sorry for not making it clear, on my box 10us was spent in the 'for' loop, which iterates 4 pages and 4 calls for set_page_dirty(). thanks, -liubo > The conditional dirtying is definetelly worth though, > > Reviewed-by: David Sterba
Re: btrfs convert problem
On 13.09.2018 19:15, Serhat Sevki Dincer wrote: >> -1 seems to be EPERM, is your device write-protected, readonly or >> something like that ? > > I am able to use ext4 partition read/write, created a file and wrote > in it, un/re mounted it, all is ok. > The drive only has a microusb port and a little led light; no rw > switch or anything like that.. > What might help is running btrfs convert under strace. So something like : sudo strace -f -o strace.log btrfs-convert /dev/sdb1 and then send the log.
Re: btrfs send hangs after partial transfer and blocks all IO
(resend to all) On Thu, Sep 13, 2018 at 9:44 AM, Nikolay Borisov wrote: > > > On 13.09.2018 18:30, Chris Murphy wrote: >> This is the 2nd or 3rd thread containing hanging btrfs send, with >> kernel 4.18.x. The subject of one is "btrfs send hung in pipe_wait" >> and the other I can't find at the moment. In that case though the hang >> is reproducible in 4.14.x and weirdly it only happens when a snapshot >> contains (perhaps many) reflinks. Scrub and check lowmem find nothing >> wrong. >> >> I have snapshots with a few reflinks (cp --reflink and also >> deduplication), and I see maybe 15-30 second hangs where nothing is >> apparently happening (in top or iotop), but I'm also not seeing any >> blocked tasks or high CPU usage. Perhaps in my case it's just >> recovering quickly. >> >> Are there any kernel config options in "# Debug Lockups and Hangs" >> that might hint at what's going on? Some of these are enabled in >> Fedora debug kernels, which are built practically daily, e.g. right >> now the latest in the build system is 4.19.0-0.rc3.git2.1 - which >> translates to git 54eda9df17f3. > > If it's a lock-related problem then you need Lock Debugging => Lock > debugging: prove locking correctness OK looks like that's under a different section as CONFIG_PROVE_LOCKING which is enabled on Fedora debug kernels. # Debug Lockups and Hangs CONFIG_LOCKUP_DETECTOR=y CONFIG_SOFTLOCKUP_DETECTOR=y # CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0 CONFIG_HARDLOCKUP_DETECTOR_PERF=y CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y CONFIG_HARDLOCKUP_DETECTOR=y # CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is not set CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0 # Lock Debugging (spinlocks, mutexes, etc...) CONFIG_LOCK_DEBUGGING_SUPPORT=y CONFIG_PROVE_LOCKING=y CONFIG_LOCK_STAT=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_LOCKDEP=y # CONFIG_DEBUG_LOCKDEP is not set # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set CONFIG_LOCK_TORTURE_TEST=m -- Chris Murphy
Re: btrfs convert problem
> -1 seems to be EPERM, is your device write-protected, readonly or > something like that ? I am able to use ext4 partition read/write, created a file and wrote in it, un/re mounted it, all is ok. The drive only has a microusb port and a little led light; no rw switch or anything like that..
Re: btrfs convert problem
On 13.09.2018 15:22, Serhat Sevki Dincer wrote: > Hi, > > I have an external usb HDD (WD my passport, just usb cable, no > external power) with a single ext4 partition occupying the whole disk > with 698 GiB capacity and 188 GiB empty space. The data on disk is not > very important. > > I also have a laptop with Manjaro 64-bit XFCE, kernel 4.14.68, > btrfs-progs v4.17.1, all packages come with Manjaro. > > I tried to convert my disk to btrfs with > sudo btrfs-convert /dev/sdb1 > I have also tried -i, -n options, all failed with: > > create btrfs filesystem: > blocksize: 4096 > nodesize: 16384 > features: extref, skinny-metadata (default) > creating ext2 image file > ERROR: failed to create ext2_saved/image: -1 -1 seems to be EPERM, is your device write-protected, readonly or something like that ? > WARNING: an error occurred during conversion, filesystem is partially > created but not finalized and not mountable > > I could not find this error/warning combo on the net. > ext4 partition seems intact and working after these attempts. > How can I debug and complete this conversion? > > Thanks.. >
Re: btrfs send hangs after partial transfer and blocks all IO
On 13.09.2018 18:30, Chris Murphy wrote: > This is the 2nd or 3rd thread containing hanging btrfs send, with > kernel 4.18.x. The subject of one is "btrfs send hung in pipe_wait" > and the other I can't find at the moment. In that case though the hang > is reproducible in 4.14.x and weirdly it only happens when a snapshot > contains (perhaps many) reflinks. Scrub and check lowmem find nothing > wrong. > > I have snapshots with a few reflinks (cp --reflink and also > deduplication), and I see maybe 15-30 second hangs where nothing is > apparently happening (in top or iotop), but I'm also not seeing any > blocked tasks or high CPU usage. Perhaps in my case it's just > recovering quickly. > > Are there any kernel config options in "# Debug Lockups and Hangs" > that might hint at what's going on? Some of these are enabled in > Fedora debug kernels, which are built practically daily, e.g. right > now the latest in the build system is 4.19.0-0.rc3.git2.1 - which > translates to git 54eda9df17f3. If it's a lock-related problem then you need Lock Debugging => Lock debugging: prove locking correctness > > > Chris Murphy >
Re: btrfs send hangs after partial transfer and blocks all IO
This is the 2nd or 3rd thread containing hanging btrfs send, with kernel 4.18.x. The subject of one is "btrfs send hung in pipe_wait" and the other I can't find at the moment. In that case though the hang is reproducible in 4.14.x and weirdly it only happens when a snapshot contains (perhaps many) reflinks. Scrub and check lowmem find nothing wrong. I have snapshots with a few reflinks (cp --reflink and also deduplication), and I see maybe 15-30 second hangs where nothing is apparently happening (in top or iotop), but I'm also not seeing any blocked tasks or high CPU usage. Perhaps in my case it's just recovering quickly. Are there any kernel config options in "# Debug Lockups and Hangs" that might hint at what's going on? Some of these are enabled in Fedora debug kernels, which are built practically daily, e.g. right now the latest in the build system is 4.19.0-0.rc3.git2.1 - which translates to git 54eda9df17f3. Chris Murphy
Re: [PATCH V5] Btrfs: enchanse raid1/10 balance heuristic
сб, 7 июл. 2018 г. в 18:24, Timofey Titovets : > > From: Timofey Titovets > > Currently btrfs raid1/10 balancer bаlance requests to mirrors, > based on pid % num of mirrors. > > Make logic understood: > - if one of underline devices are non rotational > - Queue leght to underline devices > > By default try use pid % num_mirrors guessing, but: > - If one of mirrors are non rotational, repick optimal to it > - If underline mirror have less queue leght then optimal, >repick to that mirror > > For avoid round-robin request balancing, > lets round down queue leght: > - By 8 for rotational devs > - By 2 for all non rotational devs > > Changes: > v1 -> v2: > - Use helper part_in_flight() from genhd.c > to get queue lenght > - Move guess code to guess_optimal() > - Change balancer logic, try use pid % mirror by default > Make balancing on spinning rust if one of underline devices > are overloaded > v2 -> v3: > - Fix arg for RAID10 - use sub_stripes, instead of num_stripes > v3 -> v4: > - Rebased on latest misc-next > v4 -> v5: > - Rebased on latest misc-next > > Signed-off-by: Timofey Titovets > --- > block/genhd.c | 1 + > fs/btrfs/volumes.c | 111 - > 2 files changed, 110 insertions(+), 2 deletions(-) > > diff --git a/block/genhd.c b/block/genhd.c > index 9656f9e9f99e..5ea5acc88d3c 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -81,6 +81,7 @@ void part_in_flight(struct request_queue *q, struct > hd_struct *part, > atomic_read(>in_flight[1]); > } > } > +EXPORT_SYMBOL_GPL(part_in_flight); > > void part_in_flight_rw(struct request_queue *q, struct hd_struct *part, >unsigned int inflight[2]) > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index c95af358b71f..fa7dd6ac087f 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #include "ctree.h" > #include "extent_map.h" > @@ -5201,6 +5202,111 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info > *fs_info, u64 logical, u64 len) > return ret; > } > > +/** > + * bdev_get_queue_len - return rounded down in flight queue lenght of bdev > + * > + * @bdev: target bdev > + * @round_down: round factor big for hdd and small for ssd, like 8 and 2 > + */ > +static int bdev_get_queue_len(struct block_device *bdev, int round_down) > +{ > + int sum; > + struct hd_struct *bd_part = bdev->bd_part; > + struct request_queue *rq = bdev_get_queue(bdev); > + uint32_t inflight[2] = {0, 0}; > + > + part_in_flight(rq, bd_part, inflight); > + > + sum = max_t(uint32_t, inflight[0], inflight[1]); > + > + /* > +* Try prevent switch for every sneeze > +* By roundup output num by some value > +*/ > + return ALIGN_DOWN(sum, round_down); > +} > + > +/** > + * guess_optimal - return guessed optimal mirror > + * > + * Optimal expected to be pid % num_stripes > + * > + * That's generaly ok for spread load > + * Add some balancer based on queue leght to device > + * > + * Basic ideas: > + * - Sequential read generate low amount of request > + *so if load of drives are equal, use pid % num_stripes balancing > + * - For mixed rotate/non-rotate mirrors, pick non-rotate as optimal > + *and repick if other dev have "significant" less queue lenght > + * - Repick optimal if queue leght of other mirror are less > + */ > +static int guess_optimal(struct map_lookup *map, int num, int optimal) > +{ > + int i; > + int round_down = 8; > + int qlen[num]; > + bool is_nonrot[num]; > + bool all_bdev_nonrot = true; > + bool all_bdev_rotate = true; > + struct block_device *bdev; > + > + if (num == 1) > + return optimal; > + > + /* Check accessible bdevs */ > + for (i = 0; i < num; i++) { > + /* Init for missing bdevs */ > + is_nonrot[i] = false; > + qlen[i] = INT_MAX; > + bdev = map->stripes[i].dev->bdev; > + if (bdev) { > + qlen[i] = 0; > + is_nonrot[i] = blk_queue_nonrot(bdev_get_queue(bdev)); > + if (is_nonrot[i]) > + all_bdev_rotate = false; > + else > + all_bdev_nonrot = false; > + } > + } > + > + /* > +* Don't bother with computation > +* if only one of two bdevs are accessible > +*/ > + if (num == 2 && qlen[0] != qlen[1]) { > + if (qlen[0] < qlen[1]) > + return 0; > + else > + return 1; > + } > + > + if (all_bdev_nonrot) > + round_down = 2; > + > + for (i = 0; i < num; i++) { > +
Re: btrfs send hangs after partial transfer and blocks all IO
On 13.09.2018 15:30, Jürgen Herrmann wrote: > OK, I will install kdump later and perform a dump after the hang. > > One more noob question beforehand: does this dump contain sensitive > information, for example the luks encryption key for the disk etc? A > Google search only brings up one relevant search result which can only > be viewed with a redhat subscription... So a kdump will dump the kernel memory so it's possible that the LUKS encryption keys could be extracted from that image. Bummer, it's understandable why you wouldn't want to upload it :). In this case you'd have to install also the 'crash' utility to open the crashdump and extract the calltrace of the btrfs process. The rough process should be : crash 'path to vm linux' 'path to vmcore file', then once inside the crash utility : set , you can acquire the pid by issuing 'ps' which will give you a ps-like output of all running processes at the time of crash. After the context has been set you can run 'bt' which will give you a backtrace of the send process. > > Best regards, > Jürgen > > Am 13. September 2018 14:02:11 schrieb Nikolay Borisov : > >> On 13.09.2018 14:50, Jürgen Herrmann wrote: >>> I was echoing "w" to /proc/sysrq_trigger every 0.5s which did work also >>> after the hang because I started the loop before the hang. The dmesg >>> output should show the hanging tasks from second 346 on or so. Still not >>> useful? >>> >> >> So from 346 it's evident that transaction commit is waiting for >> commit_root_sem to be acquired. So something else is holding it and not >> giving the transaction chance to finish committing. Now the only place >> where send acquires this lock is in find_extent_clone around the call >> to extent_from_logical. The latter basically does an extent tree search >> and doesn't loop so can't possibly deadlock. Furthermore I don't see any >> userspace processes being hung in kernel space. >> >> Additionally looking at the userspace processes they indicate that >> find_extent_clone has finished and are blocked in send_write_or_clone >> which does the write. But I guess this actually happens before the hang. >> >> >> So at this point without looking at the stacktrace of the btrfs send >> process after the hung has occurred I don't think much can be done > > > Mit AquaMail Android > https://www.mobisystems.com/aqua-mail > > >
Re: btrfs send hangs after partial transfer and blocks all IO
OK, I will install kdump later and perform a dump after the hang. One more noob question beforehand: does this dump contain sensitive information, for example the luks encryption key for the disk etc? A Google search only brings up one relevant search result which can only be viewed with a redhat subscription... Best regards, Jürgen Am 13. September 2018 14:02:11 schrieb Nikolay Borisov : On 13.09.2018 14:50, Jürgen Herrmann wrote: I was echoing "w" to /proc/sysrq_trigger every 0.5s which did work also after the hang because I started the loop before the hang. The dmesg output should show the hanging tasks from second 346 on or so. Still not useful? So from 346 it's evident that transaction commit is waiting for commit_root_sem to be acquired. So something else is holding it and not giving the transaction chance to finish committing. Now the only place where send acquires this lock is in find_extent_clone around the call to extent_from_logical. The latter basically does an extent tree search and doesn't loop so can't possibly deadlock. Furthermore I don't see any userspace processes being hung in kernel space. Additionally looking at the userspace processes they indicate that find_extent_clone has finished and are blocked in send_write_or_clone which does the write. But I guess this actually happens before the hang. So at this point without looking at the stacktrace of the btrfs send process after the hung has occurred I don't think much can be done Mit AquaMail Android https://www.mobisystems.com/aqua-mail
Re: [PATCH 0/3] Misc refactoring of check_file_extent
On Thu, Sep 13, 2018 at 03:05:04PM +0300, Nikolay Borisov wrote: >While looking at check_file_extent I thought that the code might be a bit >cleaner than it actually is and cleaner as well. The first patch factors out >the code dealing with inline extents into a separate function aptly named >check_file_extent_inline. This allows to remove some inline-specific variable >from check_file_extent. Patch 2 just moves the final check in the new function >into the already existing branch handling the !compressed case. Finally >the check which detects unknown extent types is moved first in >check_file_extent, >followed by the code to handle inline extents and finally the existing code to >handle regular/prealloc extents is left intact. > >This patchset brings no functional changes. For the series, Reviewed-by: Lu Fengqi -- Thanks, Lu > >Nikolay Borisov (3): > btrfs-progs: check: lowmem: Factor out inline extent checking code in >its own function > btrfs-progs: check: lowmem: Refactor extent len test in >check_file_extent_inline > btrfs-progs: check: lowmem: Refactor extent type checks in >check_file_extent > > check/mode-lowmem.c | 151 ++-- > 1 file changed, 89 insertions(+), 62 deletions(-) > >-- >2.17.1 > > >
btrfs convert problem
Hi, I have an external usb HDD (WD my passport, just usb cable, no external power) with a single ext4 partition occupying the whole disk with 698 GiB capacity and 188 GiB empty space. The data on disk is not very important. I also have a laptop with Manjaro 64-bit XFCE, kernel 4.14.68, btrfs-progs v4.17.1, all packages come with Manjaro. I tried to convert my disk to btrfs with sudo btrfs-convert /dev/sdb1 I have also tried -i, -n options, all failed with: create btrfs filesystem: blocksize: 4096 nodesize: 16384 features: extref, skinny-metadata (default) creating ext2 image file ERROR: failed to create ext2_saved/image: -1 WARNING: an error occurred during conversion, filesystem is partially created but not finalized and not mountable I could not find this error/warning combo on the net. ext4 partition seems intact and working after these attempts. How can I debug and complete this conversion? Thanks..
[PATCH 2/3] btrfs-progs: check: lowmem: Refactor extent len test in check_file_extent_inline
Instead of having another top-level if which checks for 'extent_num_bytes != item_inline_len' only if we are !compressed, just move the 'if' inside the 'else' branch of the first top-level if, since it has already checked for !compressed or not. No functional changes. Signed-off-by: Nikolay Borisov --- check/mode-lowmem.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 48c1537e7440..3a6fbb33c858 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -1860,19 +1860,20 @@ static int check_file_extent_inline(struct btrfs_root *root, err |= FILE_EXTENT_ERROR; } - } - if (!compressed && extent_num_bytes != item_inline_len) { - error( + + if (extent_num_bytes != item_inline_len) { + error( "root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: %llu, expected: %u", root->objectid, fkey.objectid, fkey.offset, extent_num_bytes, item_inline_len); - if (repair) { - ret = repair_inline_ram_bytes(root, path, - _num_bytes); - if (ret) + if (repair) { + ret = repair_inline_ram_bytes(root, path, + _num_bytes); + if (ret) + err |= FILE_EXTENT_ERROR; + } else { err |= FILE_EXTENT_ERROR; - } else { - err |= FILE_EXTENT_ERROR; + } } } *end += extent_num_bytes; -- 2.17.1
[PATCH 3/3] btrfs-progs: check: lowmem: Refactor extent type checks in check_file_extent
Make the checks in check_file_extent a bit more explicit. First we check for unknown type and fail accordingly. Then we check for inline extent and handle it in the newly introduced check_file_extent_inline. Finally if none of the above checks triggered then we must have a regular or prealloc extents. Signed-off-by: Nikolay Borisov --- check/mode-lowmem.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 3a6fbb33c858..aa946a1307de 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -1915,21 +1915,23 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_path *path, btrfs_item_key_to_cpu(node, , slot); fi = btrfs_item_ptr(node, slot, struct btrfs_file_extent_item); - - /* Check inline extent */ extent_type = btrfs_file_extent_type(node, fi); - if (extent_type == BTRFS_FILE_EXTENT_INLINE) - return check_file_extent_inline(root, path, size, end); /* Check extent type */ if (extent_type != BTRFS_FILE_EXTENT_REG && - extent_type != BTRFS_FILE_EXTENT_PREALLOC) { + extent_type != BTRFS_FILE_EXTENT_PREALLOC && + extent_type != BTRFS_FILE_EXTENT_INLINE) { err |= FILE_EXTENT_ERROR; error("root %llu EXTENT_DATA[%llu %llu] type bad", root->objectid, fkey.objectid, fkey.offset); return err; } + /* Check inline extent */ + if (extent_type == BTRFS_FILE_EXTENT_INLINE) + return check_file_extent_inline(root, path, size, end); + + /* Check REG_EXTENT/PREALLOC_EXTENT */ disk_bytenr = btrfs_file_extent_disk_bytenr(node, fi); disk_num_bytes = btrfs_file_extent_disk_num_bytes(node, fi); -- 2.17.1
[PATCH 1/3] btrfs-progs: check: lowmem: Factor out inline extent checking code in its own function
Since the inline extent code can be largely self-sufficient, factor it out from check_file_extent. No functional changes. Signed-off-by: Nikolay Borisov --- check/mode-lowmem.c | 142 ++-- 1 file changed, 83 insertions(+), 59 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 1bce44f5658a..48c1537e7440 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -1800,6 +1800,87 @@ static int repair_inline_ram_bytes(struct btrfs_root *root, return ret; } + +static int check_file_extent_inline(struct btrfs_root *root, + struct btrfs_path *path, u64 *size, + u64 *end) +{ + u32 max_inline_extent_size = min_t(u32, root->fs_info->sectorsize - 1, + BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)); + struct extent_buffer *node = path->nodes[0]; + struct btrfs_item *e = btrfs_item_nr(0); + struct btrfs_file_extent_item *fi; + struct btrfs_key fkey; + u64 extent_num_bytes; + u32 item_inline_len; + int ret; + int compressed = 0; + int err = 0; + + fi = btrfs_item_ptr(node, path->slots[0], + struct btrfs_file_extent_item); + item_inline_len = btrfs_file_extent_inline_item_len(node, e); + extent_num_bytes = btrfs_file_extent_ram_bytes(node, fi); + compressed = btrfs_file_extent_compression(node, fi); + btrfs_item_key_to_cpu(node, , path->slots[0]); + + if (extent_num_bytes == 0) { + error( +"root %llu EXTENT_DATA[%llu %llu] has empty inline extent", + root->objectid, fkey.objectid, fkey.offset); + err |= FILE_EXTENT_ERROR; + } + + if (compressed) { + if (extent_num_bytes > root->fs_info->sectorsize) { + error( +"root %llu EXTENT_DATA[%llu %llu] too large inline extent ram size, have %llu, max: %u", + root->objectid, fkey.objectid, + fkey.offset, extent_num_bytes, + root->fs_info->sectorsize - 1); + err |= FILE_EXTENT_ERROR; + } + + if (item_inline_len > max_inline_extent_size) { + error( +"root %llu EXTENT_DATA[%llu %llu] too large inline extent on-disk size, have %u, max: %u", + root->objectid, fkey.objectid, + fkey.offset, item_inline_len, + max_inline_extent_size); + err |= FILE_EXTENT_ERROR; + } + + } else { + + if (extent_num_bytes > max_inline_extent_size) { + error( +"root %llu EXTENT_DATA[%llu %llu] too large inline extent size, have %llu, max: %u", + root->objectid, fkey.objectid, fkey.offset, + extent_num_bytes, max_inline_extent_size); + err |= FILE_EXTENT_ERROR; + } + + } + if (!compressed && extent_num_bytes != item_inline_len) { + error( +"root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: %llu, expected: %u", + root->objectid, fkey.objectid, fkey.offset, + extent_num_bytes, item_inline_len); + if (repair) { + ret = repair_inline_ram_bytes(root, path, + _num_bytes); + if (ret) + err |= FILE_EXTENT_ERROR; + } else { + err |= FILE_EXTENT_ERROR; + } + } + *end += extent_num_bytes; + *size += extent_num_bytes; + + return err; +} + /* * Check file extent datasum/hole, update the size of the file extents, * check and update the last offset of the file extent. @@ -1824,8 +1905,6 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_path *path, u64 csum_found; /* In byte size, sectorsize aligned */ u64 search_start; /* Logical range start we search for csum */ u64 search_len; /* Logical range len we search for csum */ - u32 max_inline_extent_size = min_t(u32, root->fs_info->sectorsize - 1, - BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)); unsigned int extent_type; unsigned int is_hole; int slot = path->slots[0]; @@ -1838,63 +1917,8 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_path *path, /* Check inline extent */ extent_type = btrfs_file_extent_type(node, fi); - if (extent_type == BTRFS_FILE_EXTENT_INLINE) { - struct btrfs_item *e = btrfs_item_nr(slot); - u32 item_inline_len;
[PATCH 0/3] Misc refactoring of check_file_extent
While looking at check_file_extent I thought that the code might be a bit cleaner than it actually is and cleaner as well. The first patch factors out the code dealing with inline extents into a separate function aptly named check_file_extent_inline. This allows to remove some inline-specific variable from check_file_extent. Patch 2 just moves the final check in the new function into the already existing branch handling the !compressed case. Finally the check which detects unknown extent types is moved first in check_file_extent, followed by the code to handle inline extents and finally the existing code to handle regular/prealloc extents is left intact. This patchset brings no functional changes. Nikolay Borisov (3): btrfs-progs: check: lowmem: Factor out inline extent checking code in its own function btrfs-progs: check: lowmem: Refactor extent len test in check_file_extent_inline btrfs-progs: check: lowmem: Refactor extent type checks in check_file_extent check/mode-lowmem.c | 151 ++-- 1 file changed, 89 insertions(+), 62 deletions(-) -- 2.17.1
Re: btrfs send hangs after partial transfer and blocks all IO
On 13.09.2018 14:50, Jürgen Herrmann wrote: > I was echoing "w" to /proc/sysrq_trigger every 0.5s which did work also > after the hang because I started the loop before the hang. The dmesg > output should show the hanging tasks from second 346 on or so. Still not > useful? > So from 346 it's evident that transaction commit is waiting for commit_root_sem to be acquired. So something else is holding it and not giving the transaction chance to finish committing. Now the only place where send acquires this lock is in find_extent_clone around the call to extent_from_logical. The latter basically does an extent tree search and doesn't loop so can't possibly deadlock. Furthermore I don't see any userspace processes being hung in kernel space. Additionally looking at the userspace processes they indicate that find_extent_clone has finished and are blocked in send_write_or_clone which does the write. But I guess this actually happens before the hang. So at this point without looking at the stacktrace of the btrfs send process after the hung has occurred I don't think much can be done
Re: [PATCH] btrfs: wait on caching when putting the bg cache
On Wed, Sep 12, 2018 at 08:40:44PM +0200, David Sterba wrote: > On Wed, Sep 12, 2018 at 10:45:45AM -0400, Josef Bacik wrote: > > While testing my backport I noticed there was a panic if I ran > > generic/416 generic/417 generic/418 all in a row. This just happened to > > uncover a race where we had outstanding IO after we destroy all of our > > workqueues, and then we'd go to queue the endio work on those free'd > > workqueues. This is because we aren't waiting for the caching threads > > to be done before freeing everything up, so to fix this make sure we > > wait on any outstanding caching that's being done before we free up the > > block group, so we're sure to be done with all IO by the time we get to > > btrfs_stop_all_workers(). This fixes the panic I was seeing > > consistently in testing. > > Can you please attach the stacktrace(s)? I think I've seen similar error > once or twice but not able to reproduce. I found at least this one https://patchwork.kernel.org/patch/10495885/, when the rbio cache is destroyed, there's some in-flight IO. This is not the example I had in mind before but still roughly matches the symptoms.
Re: btrfs send hangs after partial transfer and blocks all IO
I was echoing "w" to /proc/sysrq_trigger every 0.5s which did work also after the hang because I started the loop before the hang. The dmesg output should show the hanging tasks from second 346 on or so. Still not useful? Best regards, Jürgen Am 13. September 2018 13:04:39 schrieb Nikolay Borisov : On 13.09.2018 13:56, Jürgen Herrmann wrote: Both loops were started before the hang because after the hang I cannot do that anymore. That's why there is progress in the logs at first. The hang continues for at least 1.5 hours. No data is transferred anymore during this time. I never waited longer than 1.5 hours. So these logs don't provide any useful information then. The other thing which I can advise is to setup kdump and when the kernel hangs cause a crashdump to be taken and try to upload it somewhere alongside your vmlinux file for further debugging. Best regards, Jürgen Am 13. September 2018 12:50:59 schrieb Nikolay Borisov : On 13.09.2018 13:29, Jürgen Herrmann wrote: Am 13.9.2018 10:40, schrieb Nikolay Borisov: On 13.09.2018 11:34, Jürgen Herrmann wrote: Hello! I have a newly installed laptop running a freshly installed (abt. two months ago) laptop running latest linux mint 19. Root filesystem is on a 1TB Samsung 860 M.2 SSD with btrfs on top of a LUKS encrypted 900G partition. Timeshift-btrfs is enabled for root (@) and home (@home) subvolumes. I want to transfer snapshots to a server with a separated disk via "btrfs send" and ssh. Here's the list of snapshot directories, each containing tow snapshots for root and home: drwxr-xr-x 1 root root 30 Sep 12 22:08 2018-08-16_20-00-01 drwxr-xr-x 1 root root 30 Aug 17 14:00 2018-08-17_14-00-02 drwxr-xr-x 1 root root 30 Aug 23 20:00 2018-08-23_20-00-01 drwxr-xr-x 1 root root 30 Aug 30 20:00 2018-08-30_20-00-01 drwxr-xr-x 1 root root 30 Sep 6 20:00 2018-09-06_20-00-01 drwxr-xr-x 1 root root 30 Sep 6 22:00 2018-09-06_22-00-01 drwxr-xr-x 1 root root 30 Sep 8 16:00 2018-09-08_16-00-01 drwxr-xr-x 1 root root 30 Sep 10 20:00 2018-09-10_20-00-02 drwxr-xr-x 1 root root 30 Sep 11 21:00 2018-09-11_21-00-02 drwxr-xr-x 1 root root 30 Sep 12 21:00 2018-09-12_21-00-01 "btrfs send /mnt/timeshift/backup/timeshift-btrfs/snapshots/2018-08-16_20-00-01/@ /dev/null" results in the btrfs task taking 100% cpu time on one cpu and then all IO is blocked -> only reboot can solve the hang. The crash does not happen immediately, as i was on the road using cellular connection it seemed fine at first. That's how I found out that it transfers ~140MB of data before hanging. The snapshot is created on the server and contains data (du shows abt 140MB). I am running vanilla kernel 4.18.6 (compiled by myself) and btrfs progs 4.17.1 compiled from source. Here's the btrfs filesystem info: Label: none uuid: a914c141-72bf-448b-847f-d64ee82d8b7b Total devices 1 FS bytes used 342.85GiB devid1 size 875.44GiB used 357.05GiB path /dev/mapper/sda3_crypt A scrub shows no errors: scrub status for a914c141-72bf-448b-847f-d64ee82d8b7b scrub started at Thu Sep 13 10:20:18 2018 and finished after 00:12:19 total bytes scrubbed: 342.78GiB with 0 errors What can I do to help debugging this issue? You should provide output of echo w > /proc/sysrq-trigger. Also sample the stack of /proc/[pid of btrfs send]/stack to see if it is changing. Best regards, Jürgen Hello! dmesg output can be found here: https://pastebin.com/g86dPGSZ So from what I see current transaction commit is waiting for root->commit_root_sem and then other threads (in this case systemd) is waiting for transaction commit to finish. stacks can be found here: https://pastebin.com/dCt1YgJp ANd your user process seems to be making some progress as evident from the fact that the call trace of the process is actually changing over the course of sampling. Is it possible that it just takes time to do the IO ? Best regards, Jürgen Mit AquaMail Android https://www.mobisystems.com/aqua-mail Mit AquaMail Android https://www.mobisystems.com/aqua-mail
Re: [PATCH] Btrfs: skip set_page_dirty if eb is dirty
On Wed, Sep 12, 2018 at 12:27:46PM -0700, Liu Bo wrote: > On Wed, Sep 12, 2018 at 09:37:20AM +0300, Nikolay Borisov wrote: > > > > > > On 12.09.2018 01:06, Liu Bo wrote: > > > As long as @eb is marked with EXTENT_BUFFER_DIRTY, all of its pages > > > are dirty, so no need to set pages dirty again. > > > > > > Signed-off-by: Liu Bo > > > > Does make it any performance difference, numbers? > > > > To be honest, the performance difference would be trivial in a normal > big test round. But I just looked into the difference from my ftrace, > removing the loop can reduce the time spent by 10us in my box. 10us was for the case where the pages were dirty already and the for cycle was then skipped? set_page_dirty is not lightweight, calls down to various functions and holds locks. I can't tell if this is still fast enough on your machine so that it really takes 10us. The conditional dirtying is definetelly worth though, Reviewed-by: David Sterba
Re: btrfs send hangs after partial transfer and blocks all IO
On 13.09.2018 13:56, Jürgen Herrmann wrote: > Both loops were started before the hang because after the hang I cannot > do that anymore. That's why there is progress in the logs at first. The > hang continues for at least 1.5 hours. No data is transferred anymore > during this time. I never waited longer than 1.5 hours. So these logs don't provide any useful information then. The other thing which I can advise is to setup kdump and when the kernel hangs cause a crashdump to be taken and try to upload it somewhere alongside your vmlinux file for further debugging. > > Best regards, > Jürgen > > Am 13. September 2018 12:50:59 schrieb Nikolay Borisov : > >> On 13.09.2018 13:29, Jürgen Herrmann wrote: >>> Am 13.9.2018 10:40, schrieb Nikolay Borisov: On 13.09.2018 11:34, Jürgen Herrmann wrote: > Hello! > > I have a newly installed laptop running a freshly installed (abt. two > months ago) laptop running latest linux mint 19. Root filesystem is > on a > 1TB Samsung 860 M.2 SSD with btrfs on top of a LUKS encrypted 900G > partition. Timeshift-btrfs is enabled for root (@) and home (@home) > subvolumes. I want to transfer snapshots to a server with a separated > disk via "btrfs send" and ssh. > > Here's the list of snapshot directories, each containing tow snapshots > for root and home: > > drwxr-xr-x 1 root root 30 Sep 12 22:08 2018-08-16_20-00-01 > drwxr-xr-x 1 root root 30 Aug 17 14:00 2018-08-17_14-00-02 > drwxr-xr-x 1 root root 30 Aug 23 20:00 2018-08-23_20-00-01 > drwxr-xr-x 1 root root 30 Aug 30 20:00 2018-08-30_20-00-01 > drwxr-xr-x 1 root root 30 Sep 6 20:00 2018-09-06_20-00-01 > drwxr-xr-x 1 root root 30 Sep 6 22:00 2018-09-06_22-00-01 > drwxr-xr-x 1 root root 30 Sep 8 16:00 2018-09-08_16-00-01 > drwxr-xr-x 1 root root 30 Sep 10 20:00 2018-09-10_20-00-02 > drwxr-xr-x 1 root root 30 Sep 11 21:00 2018-09-11_21-00-02 > drwxr-xr-x 1 root root 30 Sep 12 21:00 2018-09-12_21-00-01 > > "btrfs send > /mnt/timeshift/backup/timeshift-btrfs/snapshots/2018-08-16_20-00-01/@ >> /dev/null" results in the btrfs task taking 100% cpu time on one cpu > and then all IO is blocked -> only reboot can solve the hang. > > The crash does not happen immediately, as i was on the road using > cellular connection it seemed fine at first. That's how I found out > that > it transfers ~140MB of data before hanging. The snapshot is created on > the server and contains data (du shows abt 140MB). > > I am running vanilla kernel 4.18.6 (compiled by myself) and btrfs > progs > 4.17.1 compiled from source. > > Here's the btrfs filesystem info: > Label: none uuid: a914c141-72bf-448b-847f-d64ee82d8b7b > Total devices 1 FS bytes used 342.85GiB > devid 1 size 875.44GiB used 357.05GiB path > /dev/mapper/sda3_crypt > > A scrub shows no errors: > scrub status for a914c141-72bf-448b-847f-d64ee82d8b7b > scrub started at Thu Sep 13 10:20:18 2018 and finished after > 00:12:19 > total bytes scrubbed: 342.78GiB with 0 errors > > What can I do to help debugging this issue? You should provide output of echo w > /proc/sysrq-trigger. Also sample the stack of /proc/[pid of btrfs send]/stack to see if it is changing. > > Best regards, > Jürgen >>> >>> Hello! >>> >>> dmesg output can be found here: >>> https://pastebin.com/g86dPGSZ >> >> So from what I see current transaction commit is waiting for >> root->commit_root_sem and then other threads (in this case systemd) is >> waiting for transaction commit to finish. >>> >>> stacks can be found here: >>> https://pastebin.com/dCt1YgJp >> >> ANd your user process seems to be making some progress as evident from >> the fact that the call trace of the process is actually changing over >> the course of sampling. Is it possible that it just takes time to do the >> IO ? >>> >>> Best regards, >>> Jürgen > > > Mit AquaMail Android > https://www.mobisystems.com/aqua-mail > > >
Re: btrfs send hangs after partial transfer and blocks all IO
Both loops were started before the hang because after the hang I cannot do that anymore. That's why there is progress in the logs at first. The hang continues for at least 1.5 hours. No data is transferred anymore during this time. I never waited longer than 1.5 hours. Best regards, Jürgen Am 13. September 2018 12:50:59 schrieb Nikolay Borisov : On 13.09.2018 13:29, Jürgen Herrmann wrote: Am 13.9.2018 10:40, schrieb Nikolay Borisov: On 13.09.2018 11:34, Jürgen Herrmann wrote: Hello! I have a newly installed laptop running a freshly installed (abt. two months ago) laptop running latest linux mint 19. Root filesystem is on a 1TB Samsung 860 M.2 SSD with btrfs on top of a LUKS encrypted 900G partition. Timeshift-btrfs is enabled for root (@) and home (@home) subvolumes. I want to transfer snapshots to a server with a separated disk via "btrfs send" and ssh. Here's the list of snapshot directories, each containing tow snapshots for root and home: drwxr-xr-x 1 root root 30 Sep 12 22:08 2018-08-16_20-00-01 drwxr-xr-x 1 root root 30 Aug 17 14:00 2018-08-17_14-00-02 drwxr-xr-x 1 root root 30 Aug 23 20:00 2018-08-23_20-00-01 drwxr-xr-x 1 root root 30 Aug 30 20:00 2018-08-30_20-00-01 drwxr-xr-x 1 root root 30 Sep 6 20:00 2018-09-06_20-00-01 drwxr-xr-x 1 root root 30 Sep 6 22:00 2018-09-06_22-00-01 drwxr-xr-x 1 root root 30 Sep 8 16:00 2018-09-08_16-00-01 drwxr-xr-x 1 root root 30 Sep 10 20:00 2018-09-10_20-00-02 drwxr-xr-x 1 root root 30 Sep 11 21:00 2018-09-11_21-00-02 drwxr-xr-x 1 root root 30 Sep 12 21:00 2018-09-12_21-00-01 "btrfs send /mnt/timeshift/backup/timeshift-btrfs/snapshots/2018-08-16_20-00-01/@ /dev/null" results in the btrfs task taking 100% cpu time on one cpu and then all IO is blocked -> only reboot can solve the hang. The crash does not happen immediately, as i was on the road using cellular connection it seemed fine at first. That's how I found out that it transfers ~140MB of data before hanging. The snapshot is created on the server and contains data (du shows abt 140MB). I am running vanilla kernel 4.18.6 (compiled by myself) and btrfs progs 4.17.1 compiled from source. Here's the btrfs filesystem info: Label: none uuid: a914c141-72bf-448b-847f-d64ee82d8b7b Total devices 1 FS bytes used 342.85GiB devid1 size 875.44GiB used 357.05GiB path /dev/mapper/sda3_crypt A scrub shows no errors: scrub status for a914c141-72bf-448b-847f-d64ee82d8b7b scrub started at Thu Sep 13 10:20:18 2018 and finished after 00:12:19 total bytes scrubbed: 342.78GiB with 0 errors What can I do to help debugging this issue? You should provide output of echo w > /proc/sysrq-trigger. Also sample the stack of /proc/[pid of btrfs send]/stack to see if it is changing. Best regards, Jürgen Hello! dmesg output can be found here: https://pastebin.com/g86dPGSZ So from what I see current transaction commit is waiting for root->commit_root_sem and then other threads (in this case systemd) is waiting for transaction commit to finish. stacks can be found here: https://pastebin.com/dCt1YgJp ANd your user process seems to be making some progress as evident from the fact that the call trace of the process is actually changing over the course of sampling. Is it possible that it just takes time to do the IO ? Best regards, Jürgen Mit AquaMail Android https://www.mobisystems.com/aqua-mail
Re: btrfs send hangs after partial transfer and blocks all IO
On 13.09.2018 13:29, Jürgen Herrmann wrote: > Am 13.9.2018 10:40, schrieb Nikolay Borisov: >> On 13.09.2018 11:34, Jürgen Herrmann wrote: >>> Hello! >>> >>> I have a newly installed laptop running a freshly installed (abt. two >>> months ago) laptop running latest linux mint 19. Root filesystem is on a >>> 1TB Samsung 860 M.2 SSD with btrfs on top of a LUKS encrypted 900G >>> partition. Timeshift-btrfs is enabled for root (@) and home (@home) >>> subvolumes. I want to transfer snapshots to a server with a separated >>> disk via "btrfs send" and ssh. >>> >>> Here's the list of snapshot directories, each containing tow snapshots >>> for root and home: >>> >>> drwxr-xr-x 1 root root 30 Sep 12 22:08 2018-08-16_20-00-01 >>> drwxr-xr-x 1 root root 30 Aug 17 14:00 2018-08-17_14-00-02 >>> drwxr-xr-x 1 root root 30 Aug 23 20:00 2018-08-23_20-00-01 >>> drwxr-xr-x 1 root root 30 Aug 30 20:00 2018-08-30_20-00-01 >>> drwxr-xr-x 1 root root 30 Sep 6 20:00 2018-09-06_20-00-01 >>> drwxr-xr-x 1 root root 30 Sep 6 22:00 2018-09-06_22-00-01 >>> drwxr-xr-x 1 root root 30 Sep 8 16:00 2018-09-08_16-00-01 >>> drwxr-xr-x 1 root root 30 Sep 10 20:00 2018-09-10_20-00-02 >>> drwxr-xr-x 1 root root 30 Sep 11 21:00 2018-09-11_21-00-02 >>> drwxr-xr-x 1 root root 30 Sep 12 21:00 2018-09-12_21-00-01 >>> >>> "btrfs send >>> /mnt/timeshift/backup/timeshift-btrfs/snapshots/2018-08-16_20-00-01/@ /dev/null" results in the btrfs task taking 100% cpu time on one cpu >>> and then all IO is blocked -> only reboot can solve the hang. >>> >>> The crash does not happen immediately, as i was on the road using >>> cellular connection it seemed fine at first. That's how I found out that >>> it transfers ~140MB of data before hanging. The snapshot is created on >>> the server and contains data (du shows abt 140MB). >>> >>> I am running vanilla kernel 4.18.6 (compiled by myself) and btrfs progs >>> 4.17.1 compiled from source. >>> >>> Here's the btrfs filesystem info: >>> Label: none uuid: a914c141-72bf-448b-847f-d64ee82d8b7b >>> Total devices 1 FS bytes used 342.85GiB >>> devid 1 size 875.44GiB used 357.05GiB path >>> /dev/mapper/sda3_crypt >>> >>> A scrub shows no errors: >>> scrub status for a914c141-72bf-448b-847f-d64ee82d8b7b >>> scrub started at Thu Sep 13 10:20:18 2018 and finished after >>> 00:12:19 >>> total bytes scrubbed: 342.78GiB with 0 errors >>> >>> What can I do to help debugging this issue? >> >> >> You should provide output of echo w > /proc/sysrq-trigger. Also >> sample the stack of /proc/[pid of btrfs send]/stack to see if it is >> changing. >> >> >>> >>> Best regards, >>> Jürgen > > Hello! > > dmesg output can be found here: > https://pastebin.com/g86dPGSZ So from what I see current transaction commit is waiting for root->commit_root_sem and then other threads (in this case systemd) is waiting for transaction commit to finish. > > stacks can be found here: > https://pastebin.com/dCt1YgJp ANd your user process seems to be making some progress as evident from the fact that the call trace of the process is actually changing over the course of sampling. Is it possible that it just takes time to do the IO ? > > Best regards, > Jürgen
Re: btrfs send hangs after partial transfer and blocks all IO
Am 13.9.2018 10:40, schrieb Nikolay Borisov: On 13.09.2018 11:34, Jürgen Herrmann wrote: Hello! I have a newly installed laptop running a freshly installed (abt. two months ago) laptop running latest linux mint 19. Root filesystem is on a 1TB Samsung 860 M.2 SSD with btrfs on top of a LUKS encrypted 900G partition. Timeshift-btrfs is enabled for root (@) and home (@home) subvolumes. I want to transfer snapshots to a server with a separated disk via "btrfs send" and ssh. Here's the list of snapshot directories, each containing tow snapshots for root and home: drwxr-xr-x 1 root root 30 Sep 12 22:08 2018-08-16_20-00-01 drwxr-xr-x 1 root root 30 Aug 17 14:00 2018-08-17_14-00-02 drwxr-xr-x 1 root root 30 Aug 23 20:00 2018-08-23_20-00-01 drwxr-xr-x 1 root root 30 Aug 30 20:00 2018-08-30_20-00-01 drwxr-xr-x 1 root root 30 Sep 6 20:00 2018-09-06_20-00-01 drwxr-xr-x 1 root root 30 Sep 6 22:00 2018-09-06_22-00-01 drwxr-xr-x 1 root root 30 Sep 8 16:00 2018-09-08_16-00-01 drwxr-xr-x 1 root root 30 Sep 10 20:00 2018-09-10_20-00-02 drwxr-xr-x 1 root root 30 Sep 11 21:00 2018-09-11_21-00-02 drwxr-xr-x 1 root root 30 Sep 12 21:00 2018-09-12_21-00-01 "btrfs send /mnt/timeshift/backup/timeshift-btrfs/snapshots/2018-08-16_20-00-01/@ /dev/null" results in the btrfs task taking 100% cpu time on one cpu and then all IO is blocked -> only reboot can solve the hang. The crash does not happen immediately, as i was on the road using cellular connection it seemed fine at first. That's how I found out that it transfers ~140MB of data before hanging. The snapshot is created on the server and contains data (du shows abt 140MB). I am running vanilla kernel 4.18.6 (compiled by myself) and btrfs progs 4.17.1 compiled from source. Here's the btrfs filesystem info: Label: none uuid: a914c141-72bf-448b-847f-d64ee82d8b7b Total devices 1 FS bytes used 342.85GiB devid 1 size 875.44GiB used 357.05GiB path /dev/mapper/sda3_crypt A scrub shows no errors: scrub status for a914c141-72bf-448b-847f-d64ee82d8b7b scrub started at Thu Sep 13 10:20:18 2018 and finished after 00:12:19 total bytes scrubbed: 342.78GiB with 0 errors What can I do to help debugging this issue? You should provide output of echo w > /proc/sysrq-trigger. Also sample the stack of /proc/[pid of btrfs send]/stack to see if it is changing. Best regards, Jürgen Hello! dmesg output can be found here: https://pastebin.com/g86dPGSZ stacks can be found here: https://pastebin.com/dCt1YgJp Best regards, Jürgen -- Jürgen Herrmann https://t-5.eu ALbertstraße 2 94327 Bogen
Re: [PATCH] btrfs: Handle error of get_old_root
On Thu, Sep 13, 2018 at 11:35:10AM +0300, Nikolay Borisov wrote: > In btrfs_search_old_slot get_old_root is always used with the > assumption it cannot fail. However, this is not true in rare > circumstance it can fail and return null. Currently this will not happen, as the code returning NULL 1383 if (!eb) 1384 return NULL; is preceed by btrfs_clone_extent_buffer or alloc_dummy_extent_buffer that will not fail due to GFP_NOFAIL in __alloc_extent_buffer. However I agree the error handling in btrfs_search_old_slot should be there as the NOFAIL semantics is hidden and may change eventually.
Re: [PATCH] btrfs-progs: lowmem: fix false alert about the existence of gaps in the check_file_extent
On Thu, Sep 13, 2018 at 12:12:27PM +0300, Nikolay Borisov wrote: > > >On 13.09.2018 11:20, Lu Fengqi wrote: >> In the check_inode_item function, the extent_end variable used to store the >> end of the last file extent that has checked. When it passes to >> check_file_extent, if the offset of the next file extent is not equal to >> it, there is a gap between the two file extents. > >The 'end' parameter of check_file_extent tracks the ending offset of the >last checked extent. This is used to detect gaps between adjacent extents. > >> >> In the case of a gap existing, it is wrong that only add the >> extent_num_bytes of this file extent to the invalid extent_end variable as >> before. Therefore, lowmem check will false alert that there are gaps >> between the subsequent file extents of this inode due to the wrong >> extent_end variable. > >Currently such gaps are wrongly detected since for regular extents only >the size of the extent is added to the 'end' parameter. This results in >wrongly considering all extents of a file as having gaps between them >when only 2 of them really have a gap as seen in the example below. Thank you for refining the commit message for me. > >> >> Solution: >> The extent_end variable should set to the sum of the offset and the >> extent_num_bytes of the file extent. >> >> Example: >> Suppose that lowmem check the following file extent of inode 257. >> >> item 6 key (257 EXTENT_DATA 0) itemoff 15813 itemsize 53 >> generation 6 type 1 (regular) >> extent data disk byte 13631488 nr 4096 >> extent data offset 0 nr 4096 ram 4096 >> extent compression 0 (none) >> item 7 key (257 EXTENT_DATA 8192) itemoff 15760 itemsize 53 >> generation 6 type 1 (regular) >> extent data disk byte 13631488 nr 4096 >> extent data offset 0 nr 4096 ram 4096 >> extent compression 0 (none) >> item 8 key (257 EXTENT_DATA 12288) itemoff 15707 itemsize 53 >> generation 6 type 1 (regular) >> extent data disk byte 13631488 nr 4096 >> extent data offset 0 nr 4096 ram 4096 >> extent compression 0 (none) >> >> For inode 257, check_inode_item set extent_end to 0, then call >> check_file_extent to check item {6,7,8}. >> item 6) >> offset(0) == extent_end(0) >> extent_end = extent_end(0) + extent_num_bytes(4096) >> item 7) >> offset(8192) != extent_end(4096) >> extent_end = extent_end(4096) + extent_num_bytes(4096) >> ^^^ >> The old extent_end should replace by offset(8192). >> item 8) >> offset(12288) != extent_end(8192) >> ^^^ >> But there is no gap between item {7,8}. > >The example makes sense. But can the same thing happen with the inline >extents, ie should the same adjustments be made for the code in if >(extent_type == BTRFS_FILE_EXTENT_INLINE) ? > IIRC, generally there is only one inline extent per file. Although there will be other regular extents, the inline extent must be the first one. So it seems that there is no need to change the code in if (extent_type == BTRFS_FILE_EXTENT_INLINE). -- Thanks, Lu >> >> Fixes: d88da10ddd42 ("btrfs-progs: check: introduce function to check file >> extent") >> Signed-off-by: Lu Fengqi >> --- >> check/mode-lowmem.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c >> index 1bce44f5658a..370318f0e631 100644 >> --- a/check/mode-lowmem.c >> +++ b/check/mode-lowmem.c >> @@ -1974,7 +1974,7 @@ static int check_file_extent(struct btrfs_root *root, >> struct btrfs_path *path, >> } >> } >> >> -*end += extent_num_bytes; >> +*end = fkey.offset + extent_num_bytes; >> if (!is_hole) >> *size += extent_num_bytes; >> >> > >
Re: [PATCH] btrfs-progs: Eliminate remaining uses of strerror(errno)
On Thu, Sep 13, 2018 at 09:17:17AM +0300, Nikolay Borisov wrote: > > > On 13.09.2018 03:52, Rosen Penev wrote: > > %m allows a smaller filesize. Useful on embedded systems. > > > > Signed-off-by: Rosen Penev > > --- > > build/Documentation/Makefile | 142 + > > build/Makefile.inc | 42 ++ > > build/config.h | 139 + > > build/config.log | 822 ++ > > build/config.status | 1059 ++ > > build/version.h | 14 + > > cmds-qgroup.c|3 +- > > messages.h |4 +- > > mkfs/rootdir.c | 47 +- > > qgroup.c |3 +- > > 10 files changed, 2243 insertions(+), 32 deletions(-) > > create mode 100644 build/Documentation/Makefile > > create mode 100644 build/Makefile.inc > > create mode 100644 build/config.h > > create mode 100644 build/config.log > > create mode 100755 build/config.status > > create mode 100644 build/version.h > > You clearly need to resend with changes only to the *.[ch] files No need to resend, the .[ch] changes look good, I'll drop the rest.
Re: [PATCH] btrfs-progs: lowmem: fix false alert about the existence of gaps in the check_file_extent
On 13.09.2018 11:20, Lu Fengqi wrote: > In the check_inode_item function, the extent_end variable used to store the > end of the last file extent that has checked. When it passes to > check_file_extent, if the offset of the next file extent is not equal to > it, there is a gap between the two file extents. The 'end' parameter of check_file_extent tracks the ending offset of the last checked extent. This is used to detect gaps between adjacent extents. > > In the case of a gap existing, it is wrong that only add the > extent_num_bytes of this file extent to the invalid extent_end variable as > before. Therefore, lowmem check will false alert that there are gaps > between the subsequent file extents of this inode due to the wrong > extent_end variable. Currently such gaps are wrongly detected since for regular extents only the size of the extent is added to the 'end' parameter. This results in wrongly considering all extents of a file as having gaps between them when only 2 of them really have a gap as seen in the example below. > > Solution: > The extent_end variable should set to the sum of the offset and the > extent_num_bytes of the file extent. > > Example: > Suppose that lowmem check the following file extent of inode 257. > > item 6 key (257 EXTENT_DATA 0) itemoff 15813 itemsize 53 > generation 6 type 1 (regular) > extent data disk byte 13631488 nr 4096 > extent data offset 0 nr 4096 ram 4096 > extent compression 0 (none) > item 7 key (257 EXTENT_DATA 8192) itemoff 15760 itemsize 53 > generation 6 type 1 (regular) > extent data disk byte 13631488 nr 4096 > extent data offset 0 nr 4096 ram 4096 > extent compression 0 (none) > item 8 key (257 EXTENT_DATA 12288) itemoff 15707 itemsize 53 > generation 6 type 1 (regular) > extent data disk byte 13631488 nr 4096 > extent data offset 0 nr 4096 ram 4096 > extent compression 0 (none) > > For inode 257, check_inode_item set extent_end to 0, then call > check_file_extent to check item {6,7,8}. > item 6) > offset(0) == extent_end(0) > extent_end = extent_end(0) + extent_num_bytes(4096) > item 7) > offset(8192) != extent_end(4096) > extent_end = extent_end(4096) + extent_num_bytes(4096) > ^^^ > The old extent_end should replace by offset(8192). > item 8) > offset(12288) != extent_end(8192) > ^^^ > But there is no gap between item {7,8}. The example makes sense. But can the same thing happen with the inline extents, ie should the same adjustments be made for the code in if (extent_type == BTRFS_FILE_EXTENT_INLINE) ? > > Fixes: d88da10ddd42 ("btrfs-progs: check: introduce function to check file > extent") > Signed-off-by: Lu Fengqi > --- > check/mode-lowmem.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 1bce44f5658a..370318f0e631 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -1974,7 +1974,7 @@ static int check_file_extent(struct btrfs_root *root, > struct btrfs_path *path, > } > } > > - *end += extent_num_bytes; > + *end = fkey.offset + extent_num_bytes; > if (!is_hole) > *size += extent_num_bytes; > >
Re: [PATCH] btrfs: Handle error of get_old_root
On Thu, Sep 13, 2018 at 11:35:10AM +0300, Nikolay Borisov wrote: >In btrfs_search_old_slot get_old_root is always used with the >assumption it cannot fail. However, this is not true in rare >circumstance it can fail and return null. This will lead to null >point dereference when the header is read. Fix this by checking the >return value and properly handling NULL by setting ret to -EIO and >returning gracefully. > >CID: 1087503 >Signed-off-by: Nikolay Borisov Reviewed-by: Lu Fengqi -- Thanks, Lu >--- > fs/btrfs/ctree.c | 4 > 1 file changed, 4 insertions(+) > >diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c >index 1124d236291d..a5399fd49c17 100644 >--- a/fs/btrfs/ctree.c >+++ b/fs/btrfs/ctree.c >@@ -2961,6 +2961,10 @@ int btrfs_search_old_slot(struct btrfs_root *root, >const struct btrfs_key *key, > > again: > b = get_old_root(root, time_seq); >+ if (!b) { >+ ret = -EIO; >+ goto done; >+ } > level = btrfs_header_level(b); > p->locks[level] = BTRFS_READ_LOCK; > >-- >2.7.4 > > >
Re: [PATCH] btrfs: Remove logically dead code from btrfs_orphan_cleanup
On Thu, Sep 13, 2018 at 11:35:00AM +0300, Nikolay Borisov wrote: >In btrfs_orphan_cleanup the final 'if (ret) goto out' cannot ever be >executed. This is due to the last assignment to 'ret' depending on >the return value of btrfs_iget. If an error other than -ENOENT is >returned then the loop is prematurely terminated by 'goto out'. >On the other hand, if the error value is ENOENT then a subsequent >if branch is executed that always re-assigns 'ret' and in case it's >an error just terminates the loop. No functional changes. > >CID: 1437392 >Signed-off-by: Nikolay Borisov Reviewed-by: Lu Fengqi >--- > fs/btrfs/inode.c | 2 -- > 1 file changed, 2 deletions(-) > >diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >index 3f03fec06a3a..64df0378a22f 100644 >--- a/fs/btrfs/inode.c >+++ b/fs/btrfs/inode.c >@@ -3471,8 +3471,6 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) > > /* this will do delete_inode and everything for us */ > iput(inode); >- if (ret) >- goto out; > } > /* release the path since we're done with it */ > btrfs_release_path(path); >-- >2.7.4 > > > -- Thanks, Lu
Re: btrfs send hangs after partial transfer and blocks all IO
On 13.09.2018 11:34, Jürgen Herrmann wrote: > Hello! > > I have a newly installed laptop running a freshly installed (abt. two > months ago) laptop running latest linux mint 19. Root filesystem is on a > 1TB Samsung 860 M.2 SSD with btrfs on top of a LUKS encrypted 900G > partition. Timeshift-btrfs is enabled for root (@) and home (@home) > subvolumes. I want to transfer snapshots to a server with a separated > disk via "btrfs send" and ssh. > > Here's the list of snapshot directories, each containing tow snapshots > for root and home: > > drwxr-xr-x 1 root root 30 Sep 12 22:08 2018-08-16_20-00-01 > drwxr-xr-x 1 root root 30 Aug 17 14:00 2018-08-17_14-00-02 > drwxr-xr-x 1 root root 30 Aug 23 20:00 2018-08-23_20-00-01 > drwxr-xr-x 1 root root 30 Aug 30 20:00 2018-08-30_20-00-01 > drwxr-xr-x 1 root root 30 Sep 6 20:00 2018-09-06_20-00-01 > drwxr-xr-x 1 root root 30 Sep 6 22:00 2018-09-06_22-00-01 > drwxr-xr-x 1 root root 30 Sep 8 16:00 2018-09-08_16-00-01 > drwxr-xr-x 1 root root 30 Sep 10 20:00 2018-09-10_20-00-02 > drwxr-xr-x 1 root root 30 Sep 11 21:00 2018-09-11_21-00-02 > drwxr-xr-x 1 root root 30 Sep 12 21:00 2018-09-12_21-00-01 > > "btrfs send > /mnt/timeshift/backup/timeshift-btrfs/snapshots/2018-08-16_20-00-01/@ >>/dev/null" results in the btrfs task taking 100% cpu time on one cpu > and then all IO is blocked -> only reboot can solve the hang. > > The crash does not happen immediately, as i was on the road using > cellular connection it seemed fine at first. That's how I found out that > it transfers ~140MB of data before hanging. The snapshot is created on > the server and contains data (du shows abt 140MB). > > I am running vanilla kernel 4.18.6 (compiled by myself) and btrfs progs > 4.17.1 compiled from source. > > Here's the btrfs filesystem info: > Label: none uuid: a914c141-72bf-448b-847f-d64ee82d8b7b > Total devices 1 FS bytes used 342.85GiB > devid 1 size 875.44GiB used 357.05GiB path > /dev/mapper/sda3_crypt > > A scrub shows no errors: > scrub status for a914c141-72bf-448b-847f-d64ee82d8b7b > scrub started at Thu Sep 13 10:20:18 2018 and finished after > 00:12:19 > total bytes scrubbed: 342.78GiB with 0 errors > > What can I do to help debugging this issue? You should provide output of echo w > /proc/sysrq-trigger. Also sample the stack of /proc/[pid of btrfs send]/stack to see if it is changing. > > Best regards, > Jürgen
[PATCH] btrfs: Handle error of get_old_root
In btrfs_search_old_slot get_old_root is always used with the assumption it cannot fail. However, this is not true in rare circumstance it can fail and return null. This will lead to null point dereference when the header is read. Fix this by checking the return value and properly handling NULL by setting ret to -EIO and returning gracefully. CID: 1087503 Signed-off-by: Nikolay Borisov --- fs/btrfs/ctree.c | 4 1 file changed, 4 insertions(+) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 1124d236291d..a5399fd49c17 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2961,6 +2961,10 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key, again: b = get_old_root(root, time_seq); + if (!b) { + ret = -EIO; + goto done; + } level = btrfs_header_level(b); p->locks[level] = BTRFS_READ_LOCK; -- 2.7.4
[PATCH] btrfs: Remove logically dead code from btrfs_orphan_cleanup
In btrfs_orphan_cleanup the final 'if (ret) goto out' cannot ever be executed. This is due to the last assignment to 'ret' depending on the return value of btrfs_iget. If an error other than -ENOENT is returned then the loop is prematurely terminated by 'goto out'. On the other hand, if the error value is ENOENT then a subsequent if branch is executed that always re-assigns 'ret' and in case it's an error just terminates the loop. No functional changes. CID: 1437392 Signed-off-by: Nikolay Borisov --- fs/btrfs/inode.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3f03fec06a3a..64df0378a22f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3471,8 +3471,6 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) /* this will do delete_inode and everything for us */ iput(inode); - if (ret) - goto out; } /* release the path since we're done with it */ btrfs_release_path(path); -- 2.7.4
btrfs send hangs after partial transfer and blocks all IO
Hello! I have a newly installed laptop running a freshly installed (abt. two months ago) laptop running latest linux mint 19. Root filesystem is on a 1TB Samsung 860 M.2 SSD with btrfs on top of a LUKS encrypted 900G partition. Timeshift-btrfs is enabled for root (@) and home (@home) subvolumes. I want to transfer snapshots to a server with a separated disk via "btrfs send" and ssh. Here's the list of snapshot directories, each containing tow snapshots for root and home: drwxr-xr-x 1 root root 30 Sep 12 22:08 2018-08-16_20-00-01 drwxr-xr-x 1 root root 30 Aug 17 14:00 2018-08-17_14-00-02 drwxr-xr-x 1 root root 30 Aug 23 20:00 2018-08-23_20-00-01 drwxr-xr-x 1 root root 30 Aug 30 20:00 2018-08-30_20-00-01 drwxr-xr-x 1 root root 30 Sep 6 20:00 2018-09-06_20-00-01 drwxr-xr-x 1 root root 30 Sep 6 22:00 2018-09-06_22-00-01 drwxr-xr-x 1 root root 30 Sep 8 16:00 2018-09-08_16-00-01 drwxr-xr-x 1 root root 30 Sep 10 20:00 2018-09-10_20-00-02 drwxr-xr-x 1 root root 30 Sep 11 21:00 2018-09-11_21-00-02 drwxr-xr-x 1 root root 30 Sep 12 21:00 2018-09-12_21-00-01 "btrfs send /mnt/timeshift/backup/timeshift-btrfs/snapshots/2018-08-16_20-00-01/@ >/dev/null" results in the btrfs task taking 100% cpu time on one cpu and then all IO is blocked -> only reboot can solve the hang. The crash does not happen immediately, as i was on the road using cellular connection it seemed fine at first. That's how I found out that it transfers ~140MB of data before hanging. The snapshot is created on the server and contains data (du shows abt 140MB). I am running vanilla kernel 4.18.6 (compiled by myself) and btrfs progs 4.17.1 compiled from source. Here's the btrfs filesystem info: Label: none uuid: a914c141-72bf-448b-847f-d64ee82d8b7b Total devices 1 FS bytes used 342.85GiB devid1 size 875.44GiB used 357.05GiB path /dev/mapper/sda3_crypt A scrub shows no errors: scrub status for a914c141-72bf-448b-847f-d64ee82d8b7b scrub started at Thu Sep 13 10:20:18 2018 and finished after 00:12:19 total bytes scrubbed: 342.78GiB with 0 errors What can I do to help debugging this issue? Best regards, Jürgen -- Jürgen Herrmann https://t-5.eu ALbertstraße 2 94327 Bogen
Re: [PATCH] btrfs-progs: calibrate extent_end when found a gap
On Thu, Sep 13, 2018 at 04:30:28PM +0800, Lu Fengqi wrote: >On Tue, Sep 11, 2018 at 04:41:21PM +0200, David Sterba wrote: >>On Tue, Sep 04, 2018 at 08:42:01PM +0800, Lu Fengqi wrote: >>> The extent_end will be used to check whether there is gap between this >>> extent and next extent. If it is not calibrated, check_file_extent will >> >>Do you mean 'synchronized' or 'matching'. > >I apologize for this incomprehensible commit message, and I have updated >the commit message. > >[PATCH] btrfs-progs: lowmem: fix false alert about the existence of gaps in >the check_file_extent > >> >>> mistake that there are gaps between the remaining extents. >> >>If this is a bugfix, do you have a testcase? Thanks. >> > >The testcase requires some check repair's fixes (including originl and lowmem) >that my colleagues are working on. After they get it, I will send the >testcase. > >The attached is the image which can trigger the false alert. Sorry, I miss the attached. -- Thanks, Lu > >Without the patch mentioned before, lowmem check will false alert that expect >the hole extent [257 EXTENT_DATA 8192]. > >ERROR: root 5 EXTENT_DATA[257 12288] gap exists, expected: EXTENT_DATA[257 >8192] > >-- >Thanks, >Lu > > file_extent_with_gap.img Description: Binary data
Re: [PATCH] btrfs-progs: calibrate extent_end when found a gap
On Tue, Sep 11, 2018 at 04:41:21PM +0200, David Sterba wrote: >On Tue, Sep 04, 2018 at 08:42:01PM +0800, Lu Fengqi wrote: >> The extent_end will be used to check whether there is gap between this >> extent and next extent. If it is not calibrated, check_file_extent will > >Do you mean 'synchronized' or 'matching'. I apologize for this incomprehensible commit message, and I have updated the commit message. [PATCH] btrfs-progs: lowmem: fix false alert about the existence of gaps in the check_file_extent > >> mistake that there are gaps between the remaining extents. > >If this is a bugfix, do you have a testcase? Thanks. > The testcase requires some check repair's fixes (including originl and lowmem) that my colleagues are working on. After they get it, I will send the testcase. The attached is the image which can trigger the false alert. Without the patch mentioned before, lowmem check will false alert that expect the hole extent [257 EXTENT_DATA 8192]. ERROR: root 5 EXTENT_DATA[257 12288] gap exists, expected: EXTENT_DATA[257 8192] -- Thanks, Lu
[PATCH] btrfs-progs: lowmem: fix false alert about the existence of gaps in the check_file_extent
In the check_inode_item function, the extent_end variable used to store the end of the last file extent that has checked. When it passes to check_file_extent, if the offset of the next file extent is not equal to it, there is a gap between the two file extents. In the case of a gap existing, it is wrong that only add the extent_num_bytes of this file extent to the invalid extent_end variable as before. Therefore, lowmem check will false alert that there are gaps between the subsequent file extents of this inode due to the wrong extent_end variable. Solution: The extent_end variable should set to the sum of the offset and the extent_num_bytes of the file extent. Example: Suppose that lowmem check the following file extent of inode 257. item 6 key (257 EXTENT_DATA 0) itemoff 15813 itemsize 53 generation 6 type 1 (regular) extent data disk byte 13631488 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) item 7 key (257 EXTENT_DATA 8192) itemoff 15760 itemsize 53 generation 6 type 1 (regular) extent data disk byte 13631488 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) item 8 key (257 EXTENT_DATA 12288) itemoff 15707 itemsize 53 generation 6 type 1 (regular) extent data disk byte 13631488 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) For inode 257, check_inode_item set extent_end to 0, then call check_file_extent to check item {6,7,8}. item 6) offset(0) == extent_end(0) extent_end = extent_end(0) + extent_num_bytes(4096) item 7) offset(8192) != extent_end(4096) extent_end = extent_end(4096) + extent_num_bytes(4096) ^^^ The old extent_end should replace by offset(8192). item 8) offset(12288) != extent_end(8192) ^^^ But there is no gap between item {7,8}. Fixes: d88da10ddd42 ("btrfs-progs: check: introduce function to check file extent") Signed-off-by: Lu Fengqi --- check/mode-lowmem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 1bce44f5658a..370318f0e631 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -1974,7 +1974,7 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_path *path, } } - *end += extent_num_bytes; + *end = fkey.offset + extent_num_bytes; if (!is_hole) *size += extent_num_bytes; -- 2.18.0
Re: [PATCH] btrfs-progs: Eliminate remaining uses of strerror(errno)
On 13.09.2018 03:52, Rosen Penev wrote: > %m allows a smaller filesize. Useful on embedded systems. > > Signed-off-by: Rosen Penev > --- > build/Documentation/Makefile | 142 + > build/Makefile.inc | 42 ++ > build/config.h | 139 + > build/config.log | 822 ++ > build/config.status | 1059 ++ > build/version.h | 14 + > cmds-qgroup.c|3 +- > messages.h |4 +- > mkfs/rootdir.c | 47 +- > qgroup.c |3 +- > 10 files changed, 2243 insertions(+), 32 deletions(-) > create mode 100644 build/Documentation/Makefile > create mode 100644 build/Makefile.inc > create mode 100644 build/config.h > create mode 100644 build/config.log > create mode 100755 build/config.status > create mode 100644 build/version.h You clearly need to resend with changes only to the *.[ch] files