Timeline or roadmap to split lowmem mode and original mode check?
Hi David, There is the long planned work to split the original mode and lowmem mode check into their own .[ch] files. I found it harder and harder to locate repair functions for original and lowmem mode when writing fixes for them. I wonder if it's a good time to start the split work, and if you're OK with it, my plan is to: 1) Start the split work based on 4.14.1 2) Split check code into: check/common.[ch]<<< Not sure how many functions will be there check/lowmem.[ch] check/origin.[ch] check/main.c Currently most fsck fixes and enhancement are from Fujitsu guys who I'm quite familiar with, so rebasing their code won't be a big problem for me. If this works for you, I can start the work asap. Thanks, Qu signature.asc Description: OpenPGP digital signature
[PATCH 0/3] Lowmem fsck repair to fix filetype mismatch
Sebastian reported a filesystem corruption where DIR_INDEX has wrong filetype against INODE_ITEM. Lowmem mode normally handles such problem by checking DIR_INDEX, DIR_ITEM and INODE_REF/INODE_ITEM to determine the correct file type. In such case, lowmem mode fsck can get the correct filetype. When fixing the problem, lowmem mode will try to re-insert correct (DIR_INDEX, DIR_ITEM, INODE_REF) tuple, and if existing correct DIR_ITEM and INODE_REF is found, btrfs_link() will just skip and only insert correct DIR_INDEX. However, when inserting correct DIR_INDEX, due to extra DIR_INDEX validation, incorrect one will be skiped and correct one will be inserted after invalid one. This leads to lowmem mode repair to create duplicated DIR_INDEX. This patch will fix it by removing the whole (DIR_INDEX, DIR_ITEM, INODE_REF) tuple before inserting correct tuple. And the removing part, btrfs_unlink(), will be enhanced to handle incorrect tuple member more robust. Please note that, due a bug in lowmem mode repair, btrfs check will still show "error(s) found in fs tree" even repair is done successfully. And test case for this repair still needs extra work for original mode to support such repair, or test case won't pass original mode test. Qu Wenruo (3): btrfs-progs: lowmem fsck: Remove corupted link before re-add correct link btrfs-progs: dir-item: Allow dir item and dir index lookup to ignore found problem btrfs-progs: dir-item: Make btrfs_delete_one_dir_name more robust to handle corrupted name len cmds-check.c | 6 +- convert/main.c | 2 +- ctree.h| 4 ++-- dir-item.c | 45 +++-- inode.c| 14 +++--- 5 files changed, 50 insertions(+), 21 deletions(-) -- 2.15.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs-progs: dir-item: Allow dir item and dir index lookup to ignore found problem
For functions btrfs_lookup_dir_index() and btrfs_lookup_dir_item(), they will check if the dir item/index is valid before doing further check. The behavior is pretty good for healthy fs, but calling them on corrupted fs with incorrect dir item/index will also return NULL, making repair code unable to reuse them. This patch add a new parameter @verify to address the problem. For normal operation, @verify should be set to true to keep the old behavior. While for some functions used in repair, like btrfs_unlink(), they can set @verify to false, so repair code can locate corrupted dir index/item and do what they should do (either repair or just delete). Signed-off-by: Qu Wenruo --- cmds-check.c | 2 +- convert/main.c | 2 +- ctree.h| 4 ++-- dir-item.c | 34 ++ inode.c| 14 +++--- 5 files changed, 37 insertions(+), 19 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index f302724dd840..59575506c83e 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -3074,7 +3074,7 @@ static int delete_dir_index(struct btrfs_root *root, btrfs_init_path(&path); di = btrfs_lookup_dir_index(trans, root, &path, backref->dir, backref->name, backref->namelen, - backref->index, -1); + backref->index, -1, false); if (IS_ERR(di)) { ret = PTR_ERR(di); btrfs_release_path(&path); diff --git a/convert/main.c b/convert/main.c index 89f9261172ca..102ba4fc5ae2 100644 --- a/convert/main.c +++ b/convert/main.c @@ -1580,7 +1580,7 @@ static int do_rollback(const char *devname) /* Search the image file */ root_dir = btrfs_root_dirid(&image_root->root_item); dir = btrfs_lookup_dir_item(NULL, image_root, &path, root_dir, - image_name, strlen(image_name), 0); + image_name, strlen(image_name), 0, true); if (!dir || IS_ERR(dir)) { btrfs_release_path(&path); diff --git a/ctree.h b/ctree.h index ef422ea60031..02529f4cb021 100644 --- a/ctree.h +++ b/ctree.h @@ -2679,12 +2679,12 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, u64 dir, const char *name, int name_len, -int mod); +int mod, bool verify); struct btrfs_dir_item *btrfs_lookup_dir_index(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, u64 dir, const char *name, int name_len, - u64 index, int mod); + u64 index, int mod, bool verify); int btrfs_delete_one_dir_name(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, diff --git a/dir-item.c b/dir-item.c index 462546c0eaf4..31ad1eca29d5 100644 --- a/dir-item.c +++ b/dir-item.c @@ -24,7 +24,7 @@ static struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_root *root, struct btrfs_path *path, - const char *name, int name_len); + const char *name, int name_len, bool verify); static struct btrfs_dir_item *insert_with_overflow(struct btrfs_trans_handle *trans, @@ -43,7 +43,8 @@ static struct btrfs_dir_item *insert_with_overflow(struct btrfs_trans_handle ret = btrfs_insert_empty_item(trans, root, path, cpu_key, data_size); if (ret == -EEXIST) { struct btrfs_dir_item *di; - di = btrfs_match_dir_item_name(root, path, name, name_len); + di = btrfs_match_dir_item_name(root, path, name, name_len, + true); if (di) return ERR_PTR(-EEXIST); ret = btrfs_extend_item(root, path, data_size); @@ -196,7 +197,7 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_path *path, u64 dir, const char *name, int name_len, -int mod) +int mod, bool verify) { int ret; struct btrfs_key key; @@ -227,14 +228,31 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans,
[PATCH 1/3] btrfs-progs: lowmem fsck: Remove corupted link before re-add correct link
For repair_ternary_lowmem() used in lowmem mode, if it found 1 of DIR_INDEX/DIR_ITEM/INODE_REF missing, it will try to insert correct link. However for case like invalid type in DIR_INDEX, we should delete the corrupted DIR_INDEX first before inserting the correct link. This patch will remove the corrupted link before re-insert. This should solve the duplicated DIR_INDEX problem in old lowmem mode repair. Cc: Sebastian Andrzej Siewior Signed-off-by: Qu Wenruo --- cmds-check.c | 4 1 file changed, 4 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index 7fc30da83ea1..f302724dd840 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -4997,6 +4997,10 @@ int repair_ternary_lowmem(struct btrfs_root *root, u64 dir_ino, u64 ino, goto out; } if (stage == 1) { + ret = btrfs_unlink(trans, root, ino, dir_ino, index, name, + name_len, 0); + if (ret) + goto out; ret = btrfs_add_link(trans, root, ino, dir_ino, name, name_len, filetype, &index, 1, 1); goto out; -- 2.15.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] btrfs-progs: dir-item: Make btrfs_delete_one_dir_name more robust to handle corrupted name len
Function btrfs_delete_one_dir_name() will check if the dir_item is the last content of the item, and delete the whole item if needed. However if @name_len of one dir_item/dir_index is corrupted and larger than the item size, the function will still try to treat it as partly remove, which will screw up the whole leaf. This patch will enhance the item deletion check, to cover corrupted name len, so in that case we just delete the whole item. Signed-off-by: Qu Wenruo --- dir-item.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/dir-item.c b/dir-item.c index 31ad1eca29d5..7ce3d2a40433 100644 --- a/dir-item.c +++ b/dir-item.c @@ -281,7 +281,6 @@ int btrfs_delete_one_dir_name(struct btrfs_trans_handle *trans, struct btrfs_path *path, struct btrfs_dir_item *di) { - struct extent_buffer *leaf; u32 sub_item_len; u32 item_len; @@ -291,7 +290,15 @@ int btrfs_delete_one_dir_name(struct btrfs_trans_handle *trans, sub_item_len = sizeof(*di) + btrfs_dir_name_len(leaf, di) + btrfs_dir_data_len(leaf, di); item_len = btrfs_item_size_nr(leaf, path->slots[0]); - if (sub_item_len == item_len) { + + /* +* If @sub_item_len is longer than @item_len, then it means the +* name_len is just corrupted. +* No good idea to know if there is anything we can recover from +* the corrupted item. +* Just delete the item. +*/ + if (sub_item_len >= item_len) { ret = btrfs_del_item(trans, root, path); } else { unsigned long ptr = (unsigned long)di; -- 2.15.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Add chunk allocation ENOSPC debug message for enospc_debug mount option
On 2018年01月16日 21:47, Nikolay Borisov wrote: > > > On 15.01.2018 08:13, Qu Wenruo wrote: >> Enospc_debug makes extent allocator to print more debug messages, >> however for chunk allocation, there is no debug message for enospc_debug >> at all. >> >> This patch will add message for the following parts of chunk allocator: >> >> 1) No rw device at all >>Quite rare, but at least output one message for this case. >> >> 2) No enough space for some device >>This debug message is quite handy for unbalanced disks with stripe >>based profiles (RAID0/10/5/6). >> >> 3) Not enough free devices >>This debug message should tell us if current chunk allocator is >>working correctly on minimal device requirement. >> >> Although under most case, we will hit other ENOSPC before we even hit a >> chunk allocator ENOSPC, but such debug info won't help. >> >> Signed-off-by: Qu Wenruo >> --- >> fs/btrfs/volumes.c | 19 +-- >> 1 file changed, 17 insertions(+), 2 deletions(-) >> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index a25684287501..664d8a1b90b3 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -4622,8 +4622,11 @@ static int __btrfs_alloc_chunk(struct >> btrfs_trans_handle *trans, >> >> BUG_ON(!alloc_profile_is_valid(type, 0)); >> >> -if (list_empty(&fs_devices->alloc_list)) >> +if (list_empty(&fs_devices->alloc_list)) { >> +if (btrfs_test_opt(info, ENOSPC_DEBUG)) >> +btrfs_warn(info, "%s: No writable device", __func__); > > perhaps this shouldn't be gated on ENOSPC_DEBUG if it's a warning, or if > it's to be gated then make it a DEBUG. Because the case of no writeable device is rare. But change it to debug seems good. > >> return -ENOSPC; >> +} >> >> index = __get_raid_index(type); >> >> @@ -4705,8 +4708,14 @@ static int __btrfs_alloc_chunk(struct >> btrfs_trans_handle *trans, >> if (ret == 0) >> max_avail = max_stripe_size * dev_stripes; >> >> -if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) >> +if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) { >> +if (btrfs_test_opt(info, ENOSPC_DEBUG)) >> +btrfs_debug(info, >> +"%s: devid %llu has no free space, have=%llu want=%u", >> +__func__, device->devid, max_avail, >> +BTRFS_STRIPE_LEN * dev_stripes); > > Here we have a debug output gated on ENOSCP_DEBUG so let's be consistent > (hence my previous comment) >> continue; >> +} >> >> if (ndevs == fs_devices->rw_devices) { >> WARN(1, "%s: found more than %llu devices\n", >> @@ -4731,6 +4740,12 @@ static int __btrfs_alloc_chunk(struct >> btrfs_trans_handle *trans, >> >> if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) { >> ret = -ENOSPC; >> +if (btrfs_test_opt(info, ENOSPC_DEBUG)) { >> +btrfs_debug(info, >> +"%s: not enough devices with free space: have=%d minimal=%d >> increment=%d", >> +__func__, ndevs, devs_min, >> +devs_increment * sub_stripes); > > Without looking at the code it's not really obvious what increment is. > Perhaps you can use a more descriptive word? "increment" is indeed less meaningful. I'll change it to only output "minimal" just min(minimal, devs_min). Thanks, Qu > >> +} >> goto error; >> } >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > signature.asc Description: OpenPGP digital signature
[PATCH] Fstests: btrfs/011 fix device mounted when tests aborts
One of btrfs tests, btrfs/011, uses SCRATCH_DEV_POOL and puts a non-SCRATCH_DEV device as the first one when doing mkfs, and this makes _require_scratch{_nocheck} confused since it checks mount point with SCRATCH_DEV only. This adds _scratch_umount to cleanup() to umount SCRATCH_MNT by 011 itself. Signed-off-by: Liu Bo --- tests/btrfs/011 | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/btrfs/011 b/tests/btrfs/011 index 28f1388..f4c5309 100755 --- a/tests/btrfs/011 +++ b/tests/btrfs/011 @@ -51,6 +51,7 @@ _cleanup() fi wait rm -f $tmp.tmp + _scratch_unmount > /dev/null 2>&1 } trap "_cleanup; exit \$status" 0 1 2 3 15 -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fstests: btrfs/027 unmount scratch device if test fails
This test, btrfs/027, runs tests against different raid profiles in a loop, if one of them aborts, it also fails the following ones with errors like, Test -m raid10 -d raid10 ERROR: /dev/xxx is mounted Test -m raid5 -d raid5 ERROR: /dev/xxx is mounted Test -m raid6 -d raid6 ERROR: /dev/xxx is mounted _scratch_unmount is added to avoid the above. Signed-off-by: Liu Bo --- tests/btrfs/027 | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tests/btrfs/027 b/tests/btrfs/027 index 625a27f..689cd4c 100755 --- a/tests/btrfs/027 +++ b/tests/btrfs/027 @@ -95,6 +95,7 @@ run_test() $SCRATCH_MNT >>$seqres.full 2>&1 if [ $? -ne 0 ]; then echo "btrfs replace failed" + _scratch_unmount _spare_dev_put _scratch_dev_pool_put return @@ -102,6 +103,7 @@ run_test() $BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >>$seqres.full 2>&1 if [ $? -ne 0 ]; then echo "btrfs scrub failed" + _scratch_unmount _spare_dev_put _scratch_dev_pool_put return -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: correct wrong comment about magic number of index_cnt
On Fri, Jan 12, 2018 at 11:08:03AM +0800, Su Yue wrote: > There is no function named btrfs_get_inode_index_count. > Explanation for magic number index_cnt=2 in btrfs_new_inode() is > actually located in btrfs_set_inode_index_count(). > > So replace 'btrfs_get_inode_index_count' in the comment by > 'btrfs_set_inode_index_count'. > > Signed-off-by: Su Yue Added to 4.16 queue, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Streamline btrfs_delalloc_reserve_metadata initial operations
On Fri, Jan 12, 2018 at 04:21:05PM +0200, Nikolay Borisov wrote: > The behavior of btrfs_delalloc_reserve_metadata depends on whether > the inode we are allocating for is the freespace inode or not. As it > stands if we are the free node we set 'flush' and 'delalloc_lock' > variable to certain values. Subsequently we check the values of those > vars and act accordingly. Instead, simplify things by having 1 if > which checks whether we are the freespace inode or not and do any > specific operation in either branches of that if. This makes the code > a bit easier to understand, as an added bonus it also shrinks the > compiled size: > > add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-17 (-17) > Function old new delta > btrfs_delalloc_reserve_metadata 18761859 -17 > Total: Before=85966, After=85949, chg -0.02% This looks too fine grained and IMHO not useful to mention. The overall module size delta is interesting when compared between the base nad pull request, but not for individual patches, namely if it's just 17 bytes. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Add chunk allocation ENOSPC debug message for enospc_debug mount option
On 15.01.2018 08:13, Qu Wenruo wrote: > Enospc_debug makes extent allocator to print more debug messages, > however for chunk allocation, there is no debug message for enospc_debug > at all. > > This patch will add message for the following parts of chunk allocator: > > 1) No rw device at all >Quite rare, but at least output one message for this case. > > 2) No enough space for some device >This debug message is quite handy for unbalanced disks with stripe >based profiles (RAID0/10/5/6). > > 3) Not enough free devices >This debug message should tell us if current chunk allocator is >working correctly on minimal device requirement. > > Although under most case, we will hit other ENOSPC before we even hit a > chunk allocator ENOSPC, but such debug info won't help. > > Signed-off-by: Qu Wenruo > --- > fs/btrfs/volumes.c | 19 +-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index a25684287501..664d8a1b90b3 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -4622,8 +4622,11 @@ static int __btrfs_alloc_chunk(struct > btrfs_trans_handle *trans, > > BUG_ON(!alloc_profile_is_valid(type, 0)); > > - if (list_empty(&fs_devices->alloc_list)) > + if (list_empty(&fs_devices->alloc_list)) { > + if (btrfs_test_opt(info, ENOSPC_DEBUG)) > + btrfs_warn(info, "%s: No writable device", __func__); perhaps this shouldn't be gated on ENOSPC_DEBUG if it's a warning, or if it's to be gated then make it a DEBUG. > return -ENOSPC; > + } > > index = __get_raid_index(type); > > @@ -4705,8 +4708,14 @@ static int __btrfs_alloc_chunk(struct > btrfs_trans_handle *trans, > if (ret == 0) > max_avail = max_stripe_size * dev_stripes; > > - if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) > + if (max_avail < BTRFS_STRIPE_LEN * dev_stripes) { > + if (btrfs_test_opt(info, ENOSPC_DEBUG)) > + btrfs_debug(info, > + "%s: devid %llu has no free space, have=%llu want=%u", > + __func__, device->devid, max_avail, > + BTRFS_STRIPE_LEN * dev_stripes); Here we have a debug output gated on ENOSCP_DEBUG so let's be consistent (hence my previous comment) > continue; > + } > > if (ndevs == fs_devices->rw_devices) { > WARN(1, "%s: found more than %llu devices\n", > @@ -4731,6 +4740,12 @@ static int __btrfs_alloc_chunk(struct > btrfs_trans_handle *trans, > > if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) { > ret = -ENOSPC; > + if (btrfs_test_opt(info, ENOSPC_DEBUG)) { > + btrfs_debug(info, > + "%s: not enough devices with free space: have=%d minimal=%d > increment=%d", > + __func__, ndevs, devs_min, > + devs_increment * sub_stripes); Without looking at the code it's not really obvious what increment is. Perhaps you can use a more descriptive word? > + } > goto error; > } > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Recommendations for balancing as part of regular maintenance?
On 2018-01-16 01:45, Chris Murphy wrote: On Mon, Jan 15, 2018 at 11:23 AM, Tom Worster wrote: On 13 Jan 2018, at 17:09, Chris Murphy wrote: On Fri, Jan 12, 2018 at 11:24 AM, Austin S. Hemmelgarn wrote: To that end, I propose the following text for the FAQ: Q: Do I need to run a balance regularly? A: While not strictly necessary for normal operations, running a filtered balance regularly can help prevent your filesystem from ending up with ENOSPC issues. The following command run daily on each BTRFS volume should be more than sufficient for most users: `btrfs balance start -dusage=25 -dlimit=2..10 -musage=25 -mlimit=2..10` Daily? Seems excessive. I've got multiple Btrfs file systems that I haven't balanced, full or partial, in a year. And I have no problems. One is a laptop which accumulates snapshots until roughly 25% free space remains and then most of the snapshots are deleted, except the most recent few, all at one time. I'm not experiencing any problems so far. The other is a NAS and it's multiple copies, with maybe 100-200 snapshots. One backup volume is 99% full, there's no more unallocated free space, I delete snapshots only to make room for btrfs send receive to keep pushing the most recent snapshot from the main volume to the backup. Again no problems. I really think suggestions this broad are just going to paper over bugs or design flaws, we won't see as many bug reports and then real problems won't get fixed. This is just an answer to a FAQ. This is not Austin or anyone else trying to telling you or anyone else that you should do this. It should be clear that there is an implied caveat along the lines of: "There are other ways to manage allocation besides regular balancing. This recommendation is a For-Dummies-kinda default that should work well enough if you don't have another strategy better adapted to your situation." If this implication is not obvious enough then we can add something explicit. It's an upstream answer to a frequently asked question. It's rather official, or about as close as it gets to it. I also thing the time based method is too subjective. What about the layout means a balance is needed? And if it's really a suggestion, why isn't there a chron or systemd unit that just does this for the user, in btrfs-progs, working and enabled by default? As a newcomer to BTRFS, I was astonished to learn that it demands each user figure out some workaround for what is, in my judgement, a required but missing feature, i.e. a defect, a bug. At present the docs are pretty confusing for someone trying to deal with it on their own. Unless some better fix is in the works, this _should_ be a systemd unit or something. Until then, please put it in FAQ. At least openSUSE has a systemd unit for a long time now, but last time I checked (a bit over a year ago) it's disabled by default. Why? And insofar as I'm aware, openSUSE users aren't having big problems related to lack of balancing, they have problems due to the lack of balancing combined with schizo snapper defaults, which are these days masked somewhat by turning on quotas so snapper can be more accurate about cleaning up. And in turn causing other issues because of the quotas, but that's getting OT... Basically the scripted balance tells me two things: a. Something is broken (still) b. None of the developers has time to investigate coherent bug reports about a. and fix/refine it. I don't entirely agree here. The issue is essentially inherent in the very design of the two-stage allocator itself, so it's not really something that can just be fixed by some simple surface patch. The only real options I see to fix it are either: 1. Redesign the allocator or: 2. figure out some way to handle this generically and automatically. The first case is pretty much immediately out because it will almost certainly require a breaking change in the on-disk format. The second is extremely challenging to do right, and likely to cause some significant controversy among list regulars (I for one don't want the FS doing stuff behind my back that impacts performance, and I have a feeling that quite a lot of other people here don't either). Given that, I would say time is only a (probably small) part of it. This is not an easy thing to fix given the current situation, and difficult problems tend to sit around with no progress for very long periods of time in open source development. And therefore papering over the problem is all we have. Basically it's a sledgehammer approach. How exactly is this any different than requiring a user to manually scrub things to check data that's not being actively used? Or requiring manual invocation of defragmentation? Or even batch deduplication? All of those are manually triggered solutions to 'problems' with the filesystem, just like this is. The only difference is that people are used to needing to manually defrag disks, and reasonably used to the need for manual scrubs
Re: invalid files names, btrfs check can't repair it
On 2018年01月16日 20:15, Sebastian Andrzej Siewior wrote: > On 2018-01-16 19:20:56 [+0800], Qu Wenruo wrote: >> This is a very minor problem. >> >> btrfs check --repair should handle it without problem. > > awesome, thank you! After the --repair completed, the following didn't > report the error anymore. And now I was able to remove the two files. > Thank you. And thanks for your report on this issue too, I'm working on the generic repair function for such type mismatch and name len mismatch. It would be soon to see such repair function in btrfs-progs, and no need to bother using any hard coded fixer. (Just a little spoiler, the repair functionality will land in lowmem mode first, as lowmem mode already have such infrastructure to address it, although if executed with mainline btrfs-progs, it will just worse the problem) Thanks, Qu > >> Thanks, >> Qu > > Sebastian > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > signature.asc Description: OpenPGP digital signature
Re: invalid files names, btrfs check can't repair it
On 2018-01-16 19:20:56 [+0800], Qu Wenruo wrote: > This is a very minor problem. > > btrfs check --repair should handle it without problem. awesome, thank you! After the --repair completed, the following didn't report the error anymore. And now I was able to remove the two files. Thank you. > Thanks, > Qu Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/4] btrfs: cleanup btrfs_mount() using btrfs_mount_root()
On 01/16/2018 03:26 AM, David Sterba wrote: On Fri, Jan 12, 2018 at 06:14:40PM +0800, Anand Jain wrote: Misono, This change is causing subsequent (subvol) mount to fail when device option is specified. The simplest eg for failure is .. mkfs.btrfs -qf /dev/sdc /dev/sdb mount -o device=/dev/sdb /dev/sdc /btrfs mount -o device=/dev/sdb /dev/sdc /btrfs1 mount: /dev/sdc is already mounted or /btrfs1 busy Looks like blkdev_get_by_path() <-- is failing. btrfs_scan_one_device() btrfs_parse_early_options() btrfs_mount() Which is due to different holders (viz. btrfs_root_fs_type and btrfs_fs_type) one is used for vfs_mount and other for scan, so they form different holders and can't let EXCL open which is needed for both scan and open. This looks close to what I see in the random test failures. I've reverted your patch "btrfs: optimize move uuid_mutex closer to the critical section" as I bisected to it. The uuid mutex around blkdev_get_path probably protected the concurrent mount and scan so they did not ask for EXCL at the same time. Reverting (or removing the patch from the current misc-next) queue is simpler for me ATM as I want to get to a stable base now, we can add it later if we understand the issue with the mount/scan. Right. I don't see above test case failing on your branch [1] which does not have the uuid_mutex patch. Quite strange, there isn't any concurrency (mount and scan) in this test case. [1] git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next Ran xfstests, got stuck at btrfs/011 failures, (and will wait for Liubo's v2 patch). OR is there any other test case you were referring to as random test failures ? Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: invalid files names, btrfs check can't repair it
On 2018年01月16日 18:49, Sebastian Andrzej Siewior wrote: > On 2018-01-16 18:22:14 [+0800], Qu Wenruo wrote: >> Btrfs check is always recommended before really mount it. > > # btrfs check -p /dev/sdb4 > Checking filesystem on /dev/sdb4 > UUID: b3bfb56e-d445-4335-93f0-c1fb2d1f6df1 > checking extents [O] > cache and super generation don't match, space cache will be invalidated > root 5 inode 57648595 errors 200, dir isize wrong This is a very minor problem. btrfs check --repair should handle it without problem. Or just ignore it if you don't trust check --repair. > > ERROR: errors found in fs roots > found 64009740288 bytes used, error(s) found > total csum bytes: 61644600 > total tree bytes: 865746944 > total fs tree bytes: 693469184 > total extent tree bytes: 84905984 > btree space waste bytes: 225096302 > file data blocks allocated: 63203807232 > referenced 63142268928 > >> However as long as there is no other problem, btrfs check should not >> report anything wrong. > > Is it important to note that the filesystem is a raid1 one? Nope, the repair code handles any chunk profile well. Thanks, Qu > >> Thanks, >> Qu > > Sebastian > signature.asc Description: OpenPGP digital signature
Re: Recommendations for balancing as part of regular maintenance?
On Tue, Jan 16, 2018 at 9:45 AM, Chris Murphy wrote: ... >> >> Unless some better fix is in the works, this _should_ be a systemd unit or >> something. Until then, please put it in FAQ. > > At least openSUSE has a systemd unit for a long time now, but last > time I checked (a bit over a year ago) it's disabled by default. Why? > It is now enabled by default on Tumbleweed and hence likely on SLE/Leap 15. > And insofar as I'm aware, openSUSE users aren't having big problems > related to lack of balancing, they have problems due to the lack of > balancing combined with schizo snapper defaults, which are these days > masked somewhat by turning on quotas so snapper can be more accurate > about cleaning up. > Not only that but also making snapshot policy less aggressive - now (in Tumbleweed/Leap 42.3) periodical snapshots are turned off by default, only configuration changes via YaST/package updates via zypper trigger snapshot creation. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: invalid files names, btrfs check can't repair it
On 2018-01-16 18:22:14 [+0800], Qu Wenruo wrote: > Btrfs check is always recommended before really mount it. # btrfs check -p /dev/sdb4 Checking filesystem on /dev/sdb4 UUID: b3bfb56e-d445-4335-93f0-c1fb2d1f6df1 checking extents [O] cache and super generation don't match, space cache will be invalidated root 5 inode 57648595 errors 200, dir isize wrong ERROR: errors found in fs roots found 64009740288 bytes used, error(s) found total csum bytes: 61644600 total tree bytes: 865746944 total fs tree bytes: 693469184 total extent tree bytes: 84905984 btree space waste bytes: 225096302 file data blocks allocated: 63203807232 referenced 63142268928 > However as long as there is no other problem, btrfs check should not > report anything wrong. Is it important to note that the filesystem is a raid1 one? > Thanks, > Qu Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: invalid files names, btrfs check can't repair it
On 2018年01月16日 18:10, Sebastian Andrzej Siewior wrote: > On 2018-01-16 13:19:50 [+0800], Qu Wenruo wrote: >> Usage: >> ./btrfs-corrupt-block -X > # ./btrfs-corrupt-block -X /dev/sdb4 > Fix is done successfully > > may I mount it now or should I run btrfs check or something first? Btrfs check is always recommended before really mount it. However as long as there is no other problem, btrfs check should not report anything wrong. Thanks, Qu > >> Thanks, >> Qu > > Sebastian > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > signature.asc Description: OpenPGP digital signature
Re: invalid files names, btrfs check can't repair it
On 2018-01-16 13:19:50 [+0800], Qu Wenruo wrote: > Usage: > ./btrfs-corrupt-block -X # ./btrfs-corrupt-block -X /dev/sdb4 Fix is done successfully may I mount it now or should I run btrfs check or something first? > Thanks, > Qu Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: invalid files names, btrfs check can't repair it
On 16 January 2018 05:51:23 CET, Qu Wenruo wrote: >Since there is no other hit, I assume it's root subvolume (5), but I >still need the extra confirm since the fix will be hard-coded. Yes, 5 is correct. > >Thanks, >Qu Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html