Re: Mount stalls indefinitely after enabling quota groups.
On 2018/8/12 12:18 PM, Dan Merillat wrote:
> On Sat, Aug 11, 2018 at 9:36 PM Qu Wenruo wrote:
>
>>> I'll add a new rescue subcommand, 'btrfs rescue disable-quota', for you
>>> to disable quota offline.
>>
>> Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
>> Add ability to disable quota offline".
>> Can also be fetched from github:
>> https://github.com/adam900710/btrfs-progs/tree/quota_disable
>>
>> Usage is:
>> # btrfs rescue disable-quota <device>
>>
>> Tested locally, it would just toggle the ON/OFF flag for quota, so the
>> modification should be minimal.
>
> Noticed one thing while testing this, but it's not related to the
> patch so I'll keep it here.
> I still had the ,ro mounts in fstab, and while it mounted ro quickly,
> *unmounting* the filesystem, even readonly, got hung up:
>
> Aug 11 23:47:27 fileserver kernel: [  484.314725] INFO: task umount:5422 blocked for more than 120 seconds.
> Aug 11 23:47:27 fileserver kernel: [  484.314787]       Not tainted 4.17.14-dirty #3
> Aug 11 23:47:27 fileserver kernel: [  484.314892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 11 23:47:27 fileserver kernel: [  484.315006] umount          D    0  5422   4656 0x0080
> Aug 11 23:47:27 fileserver kernel: [  484.315122] Call Trace:
> Aug 11 23:47:27 fileserver kernel: [  484.315176]  ? __schedule+0x2c0/0x820
> Aug 11 23:47:27 fileserver kernel: [  484.315270]  ? kmem_cache_alloc+0x167/0x1b0
> Aug 11 23:47:27 fileserver kernel: [  484.315358]  schedule+0x3c/0x90
> Aug 11 23:47:27 fileserver kernel: [  484.315493]  schedule_timeout+0x1e4/0x430
> Aug 11 23:47:27 fileserver kernel: [  484.315542]  ? kmem_cache_alloc+0x167/0x1b0
> Aug 11 23:47:27 fileserver kernel: [  484.315686]  wait_for_common+0xb1/0x170
> Aug 11 23:47:27 fileserver kernel: [  484.315798]  ? wake_up_q+0x70/0x70
> Aug 11 23:47:27 fileserver kernel: [  484.315911]  btrfs_qgroup_wait_for_completion+0x5f/0x80

This normally waits for rescan.
It may be possible that your original "btrfs quota enable" kicked in a
rescan which hadn't finished before umount.

But for an RO mount we shouldn't have any rescan running. Maybe I can
find some spare time to look into it.

Thanks for the report,
Qu

> Aug 11 23:47:27 fileserver kernel: [  484.316031]  close_ctree+0x27/0x2d0
> Aug 11 23:47:27 fileserver kernel: [  484.316138]  generic_shutdown_super+0x69/0x110
> Aug 11 23:47:27 fileserver kernel: [  484.316252]  kill_anon_super+0xe/0x20
> Aug 11 23:47:27 fileserver kernel: [  484.316301]  btrfs_kill_super+0x13/0x100
> Aug 11 23:47:27 fileserver kernel: [  484.316349]  deactivate_locked_super+0x39/0x70
> Aug 11 23:47:27 fileserver kernel: [  484.316399]  cleanup_mnt+0x3b/0x70
> Aug 11 23:47:27 fileserver kernel: [  484.316459]  task_work_run+0x89/0xb0
> Aug 11 23:47:27 fileserver kernel: [  484.316519]  exit_to_usermode_loop+0x8c/0x90
> Aug 11 23:47:27 fileserver kernel: [  484.316579]  do_syscall_64+0xf1/0x110
> Aug 11 23:47:27 fileserver kernel: [  484.316639]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Is it trying to write changes to a ro mount, or is it doing a bunch of
> work that it's just going to throw away? I ended up using sysrq-b
> after commenting out the entries in fstab.
>
> Everything seems fine with the filesystem now. I appreciate all the help!
Re: Mount stalls indefinitely after enabling quota groups.
On Sat, Aug 11, 2018 at 9:36 PM Qu Wenruo wrote:
>
>> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
>> to disable quota offline.
>
> Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
> Add ability to disable quota offline".
> Can also be fetched from github:
> https://github.com/adam900710/btrfs-progs/tree/quota_disable
>
> Usage is:
> # btrfs rescue disable-quota <device>
>
> Tested locally, it would just toggle the ON/OFF flag for quota, so the
> modification should be minimal.

Noticed one thing while testing this, but it's not related to the
patch so I'll keep it here.
I still had the ,ro mounts in fstab, and while it mounted ro quickly,
*unmounting* the filesystem, even readonly, got hung up:

Aug 11 23:47:27 fileserver kernel: [  484.314725] INFO: task umount:5422 blocked for more than 120 seconds.
Aug 11 23:47:27 fileserver kernel: [  484.314787]       Not tainted 4.17.14-dirty #3
Aug 11 23:47:27 fileserver kernel: [  484.314892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 11 23:47:27 fileserver kernel: [  484.315006] umount          D    0  5422   4656 0x0080
Aug 11 23:47:27 fileserver kernel: [  484.315122] Call Trace:
Aug 11 23:47:27 fileserver kernel: [  484.315176]  ? __schedule+0x2c0/0x820
Aug 11 23:47:27 fileserver kernel: [  484.315270]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315358]  schedule+0x3c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.315493]  schedule_timeout+0x1e4/0x430
Aug 11 23:47:27 fileserver kernel: [  484.315542]  ? kmem_cache_alloc+0x167/0x1b0
Aug 11 23:47:27 fileserver kernel: [  484.315686]  wait_for_common+0xb1/0x170
Aug 11 23:47:27 fileserver kernel: [  484.315798]  ? wake_up_q+0x70/0x70
Aug 11 23:47:27 fileserver kernel: [  484.315911]  btrfs_qgroup_wait_for_completion+0x5f/0x80
Aug 11 23:47:27 fileserver kernel: [  484.316031]  close_ctree+0x27/0x2d0
Aug 11 23:47:27 fileserver kernel: [  484.316138]  generic_shutdown_super+0x69/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316252]  kill_anon_super+0xe/0x20
Aug 11 23:47:27 fileserver kernel: [  484.316301]  btrfs_kill_super+0x13/0x100
Aug 11 23:47:27 fileserver kernel: [  484.316349]  deactivate_locked_super+0x39/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316399]  cleanup_mnt+0x3b/0x70
Aug 11 23:47:27 fileserver kernel: [  484.316459]  task_work_run+0x89/0xb0
Aug 11 23:47:27 fileserver kernel: [  484.316519]  exit_to_usermode_loop+0x8c/0x90
Aug 11 23:47:27 fileserver kernel: [  484.316579]  do_syscall_64+0xf1/0x110
Aug 11 23:47:27 fileserver kernel: [  484.316639]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Is it trying to write changes to a ro mount, or is it doing a bunch of
work that it's just going to throw away? I ended up using sysrq-b
after commenting out the entries in fstab.

Everything seems fine with the filesystem now. I appreciate all the help!
Re: [PATCH] btrfs-progs: rescue: Add ability to disable quota offline
On Sat, Aug 11, 2018 at 9:34 PM Qu Wenruo wrote:
>
> Provide an offline tool to disable quota.
>
> For kernel which skip_balance doesn't work, there is no way to disable
> quota on huge fs with balance, as quota will cause balance to hang for a
> long long time for each tree block switch.
>
> So add an offline rescue tool to disable quota.
>
> Reported-by: Dan Merillat
> Signed-off-by: Qu Wenruo

That fixed it, thanks.

Tested-By: Dan Merillat
Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
On Fri, Aug 10, 2018 at 9:29 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Chris Murphy posted on Fri, 10 Aug 2018 12:07:34 -0600 as excerpted:
>
>> But whether data is shared or exclusive seems potentially ephemeral, and
>> not something a sysadmin should even be able to anticipate let alone
>> individual users.
>
> Define "user(s)".

The person who is saving their document on a network share, and they've
never heard of Btrfs.

> Arguably, in the context of btrfs tool usage, "user" /is/ the admin,

I'm not talking about btrfs tools. I'm talking about rational,
predictable behavior of a shared folder.

If I try to drop a 1GiB file into my share and I'm denied with "not
enough free space", and behind the scenes it's because of a quota limit,
I expect I can delete *any* file(s) adding up to 1GiB and then drop that
file successfully. But if I'm unwittingly deleting shared files, my
quota usage won't go down, and I still can't save my file. So now I
somehow need a secret incantation to discover only my exclusive files
and delete enough of them in order to save this 1GiB file.

It's weird, it's unexpected, and I think it's a use-case failure. Maybe
Btrfs quotas aren't meant to work with Samba or NFS shares. *shrug*

> "Regular users" as you use the term, that is the non-admins who just need
> to know how close they are to running out of their allotted storage
> resources, shouldn't really need to care about btrfs tool usage in the
> first place, and btrfs commands in general, including btrfs quota related
> commands, really aren't targeted at them, and aren't designed to report
> the type of information they are likely to find useful. Other tools will
> be more appropriate.

I'm not talking about any btrfs commands, or even the term "quota", for
regular users. I'm talking about saving a file, being denied, and how
the user figures out how to free up space.

Anyway, it's a hypothetical scenario. While I have Samba running on a
Btrfs volume with various shares as subvolumes, I don't have quotas
enabled.

-- 
Chris Murphy
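[Editor's illustration] Chris's scenario is easier to see with a toy model of how qgroup-style accounting distinguishes shared from exclusive extents. This is a sketch only: the names and sizes are made up, and the kernel's real accounting walks extent backrefs rather than Python sets, but it shows why deleting shared (e.g. deduped or snapshotted) files does not lower a user's *exclusive* usage:

```python
# Toy model: an extent is "referenced" by every subvolume that links to
# it, but "exclusive" only when a single subvolume links to it.
GiB = 1024 ** 3

def usage(extents, subvol):
    """Return (referenced, exclusive) bytes for one subvolume.

    `extents` is a list of (size, set_of_owning_subvolumes) pairs.
    """
    referenced = sum(size for size, owners in extents if subvol in owners)
    exclusive = sum(size for size, owners in extents if owners == {subvol})
    return referenced, exclusive

# "user_a" has 1 GiB of private data plus 2 GiB of files that dedup made
# shared with "user_b":
extents = [
    (1 * GiB, {"user_a"}),
    (2 * GiB, {"user_a", "user_b"}),
]
print(usage(extents, "user_a"))  # referenced 3 GiB, exclusive 1 GiB

# user_a deletes the shared files to make room. Exclusive usage does not
# drop at all -- those bytes were never exclusively theirs, and the data
# is still on disk because user_b still references it:
extents = [(1 * GiB, {"user_a"}), (2 * GiB, {"user_b"})]
print(usage(extents, "user_a"))  # exclusive still 1 GiB
```

So if the limit the user is hitting is enforced on exclusive bytes, no amount of deleting shared files helps, which is exactly the "secret incantation" problem described above.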
Re: Mount stalls indefinitely after enabling quota groups.
On 2018/8/12 8:30 AM, Qu Wenruo wrote:
> On 2018/8/12 5:10 AM, Dan Merillat wrote:
>> 19 hours later, still going extremely slowly and taking longer and
>> longer for progress made. Main symptom is the mount process is
>> spinning at 100% CPU, interspersed with btrfs-transaction spinning at
>> 100% CPU.
>> So far it's racked up 14h45m of CPU time on mount and an additional
>> 3h40m on btrfs-transaction.
>>
>> The current drop key changes every 10-15 minutes when I check it via
>> inspect-internal, so some progress is slowly being made.
>>
>> I built the kernel with ftrace to see what's going on internally, this
>> is the pattern I'm seeing:
>>
> [snip]
>
> It looks pretty much like qgroup, but there is too much noise.
> The pinpoint trace event would be btrfs_find_all_roots().
>
>> Repeats indefinitely. btrace shows basically zero activity on the
>> array while it spins, with the occasional burst when mount &
>> btrfs-transaction swap off.
>>
>> To recap the chain of events leading up to this:
>> 11TB Array got completely full and started fragmenting badly.
>> Ran bedup and it found 600gb of duplicate files that it offline-shared.
>> Reboot for unrelated reasons
>
> 11T with highly deduped usage is really the worst-case scenario for
> qgroup. Qgroup is not really good at handling highly reflinked files,
> nor balance. When they combine, it gets worse.
>
>> Enabled quota on all subvolumes to try to track where the new data is
>> coming from
>> Tried to balance metadata due to transaction CPU spikes
>> Force-rebooted after the array was completely lagged out.
>>
>> Now attempting to mount it RW. Readonly works, but RW has taken well
>> over 24 hours at this point.
>
> I'll add a new rescue subcommand, 'btrfs rescue disable-quota', for you
> to disable quota offline.

Patch set (from my work mailbox), titled "[PATCH] btrfs-progs: rescue:
Add ability to disable quota offline".
Can also be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/quota_disable

Usage is:
# btrfs rescue disable-quota <device>

Tested locally, it would just toggle the ON/OFF flag for quota, so the
modification should be minimal.

Thanks,
Qu
[PATCH] btrfs-progs: rescue: Add ability to disable quota offline
Provide an offline tool to disable quota.

For kernels where skip_balance doesn't work, there is no way to disable
quota on a huge fs with balance, as quota will cause balance to hang for
a long, long time at each tree block switch.

So add an offline rescue tool to disable quota.

Reported-by: Dan Merillat
Signed-off-by: Qu Wenruo
---
This patch can be fetched from the github repo:
https://github.com/adam900710/btrfs-progs/tree/quota_disable
---
 Documentation/btrfs-rescue.asciidoc |  6 +++
 cmds-rescue.c                       | 80 +
 2 files changed, 86 insertions(+)

diff --git a/Documentation/btrfs-rescue.asciidoc b/Documentation/btrfs-rescue.asciidoc
index f94a0ff2b45e..fb088c1a768a 100644
--- a/Documentation/btrfs-rescue.asciidoc
+++ b/Documentation/btrfs-rescue.asciidoc
@@ -31,6 +31,12 @@ help.
 NOTE: Since *chunk-recover* will scan the whole device, it will be *VERY*
 slow especially executed on a large device.
 
+*disable-quota* <device>::
+disable quota offline
++
+Acts as a fallback method to disable quota for case where mount hangs due to
+balance and quota.
++
 *fix-device-size* <device>::
 fix device size and super block total bytes values that are do not match
 +
diff --git a/cmds-rescue.c b/cmds-rescue.c
index 38c4ab9b2ef6..c7cd92427e9d 100644
--- a/cmds-rescue.c
+++ b/cmds-rescue.c
@@ -250,6 +250,84 @@ out:
 	return !!ret;
 }
 
+static const char * const cmd_rescue_disable_quota_usage[] = {
+	"btrfs rescue disable-quota <device>",
+	"Disable quota, especially useful for balance mount hang when quota enabled",
+	"",
+	NULL
+};
+
+static int cmd_rescue_disable_quota(int argc, char **argv)
+{
+	struct btrfs_trans_handle *trans;
+	struct btrfs_fs_info *fs_info;
+	struct btrfs_path path;
+	struct btrfs_root *root;
+	struct btrfs_qgroup_status_item *qi;
+	struct btrfs_key key;
+	char *devname;
+	int ret;
+
+	clean_args_no_options(argc, argv, cmd_rescue_disable_quota_usage);
+	if (check_argc_exact(argc, 2))
+		usage(cmd_rescue_disable_quota_usage);
+
+	devname = argv[optind];
+	ret = check_mounted(devname);
+	if (ret < 0) {
+		error("could not check mount status: %s", strerror(-ret));
+		return !!ret;
+	} else if (ret) {
+		error("%s is currently mounted", devname);
+		return !!ret;
+	}
+	fs_info = open_ctree_fs_info(devname, 0, 0, 0, OPEN_CTREE_WRITES);
+	if (!fs_info) {
+		error("could not open btrfs");
+		ret = -EIO;
+		return !!ret;
+	}
+	root = fs_info->quota_root;
+	if (!root) {
+		printf("Quota is not enabled, no need to modify the fs\n");
+		goto close;
+	}
+	btrfs_init_path(&path);
+	trans = btrfs_start_transaction(root, 1);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		error("failed to start transaction: %s", strerror(-ret));
+		goto close;
+	}
+	key.objectid = 0;
+	key.type = BTRFS_QGROUP_STATUS_KEY;
+	key.offset = 0;
+	ret = btrfs_search_slot(trans, root, &key, &path, 0, 1);
+	if (ret < 0) {
+		error("failed to search tree: %s", strerror(-ret));
+		goto close;
+	}
+	if (ret > 0) {
+		printf(
+	"qgroup status item not found, no need to modify the fs");
+		ret = 0;
+		goto release;
+	}
+	qi = btrfs_item_ptr(path.nodes[0], path.slots[0],
+			    struct btrfs_qgroup_status_item);
+	btrfs_set_qgroup_status_flags(path.nodes[0], qi,
+			BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT);
+	btrfs_mark_buffer_dirty(path.nodes[0]);
+	ret = btrfs_commit_transaction(trans, root);
+	if (ret < 0)
+		error("failed to commit transaction: %s", strerror(-ret));
+release:
+	btrfs_release_path(&path);
+close:
+	close_ctree(fs_info->tree_root);
+	return !!ret;
+}
+
 static const char rescue_cmd_group_info[] =
 	"toolbox for specific rescue operations";
 
@@ -262,6 +340,8 @@ const struct cmd_group rescue_cmd_group = {
 		{ "zero-log", cmd_rescue_zero_log, cmd_rescue_zero_log_usage,
 			NULL, 0},
 		{ "fix-device-size", cmd_rescue_fix_device_size,
 			cmd_rescue_fix_device_size_usage, NULL, 0},
+		{ "disable-quota", cmd_rescue_disable_quota,
+			cmd_rescue_disable_quota_usage, NULL, 0},
 		NULL_CMD_STRUCT
 	}
 };
-- 
2.18.0
Re: Mount stalls indefinitely after enabling quota groups.
On 2018/8/12 8:59 AM, Dan Merillat wrote:
> On Sat, Aug 11, 2018 at 8:30 PM Qu Wenruo wrote:
>>
>> It looks pretty much like qgroup, but there is too much noise.
>> The pinpoint trace event would be btrfs_find_all_roots().
>
> I had this half-written when you replied.
>
> Agreed: looks like bulk of time spent resides in qgroups. Spent some
> time with sysrq-l and ftrace:
>
>  ? __rcu_read_unlock+0x5/0x50
>  ? return_to_handler+0x15/0x36
>  __rcu_read_unlock+0x5/0x50
>  find_extent_buffer+0x47/0x90               extent_io.c:4888
>  read_block_for_search.isra.12+0xc8/0x350   ctree.c:2399
>  btrfs_search_slot+0x3e7/0x9c0              ctree.c:2837
>  btrfs_next_old_leaf+0x1dc/0x410            ctree.c:5702
>  btrfs_next_old_item                        ctree.h:2952
>  add_all_parents                            backref.c:487
>  resolve_indirect_refs+0x3f7/0x7e0          backref.c:575
>  find_parent_nodes+0x42d/0x1290             backref.c:1236
>  ? find_parent_nodes+0x5/0x1290             backref.c:1114
>  btrfs_find_all_roots_safe+0x98/0x100       backref.c:1414
>  btrfs_find_all_roots+0x52/0x70             backref.c:1442
>  btrfs_qgroup_trace_extent_post+0x27/0x60   qgroup.c:1503
>  btrfs_qgroup_trace_leaf_items+0x104/0x130  qgroup.c:1589
>  btrfs_qgroup_trace_subtree+0x26a/0x3a0     qgroup.c:1750
>  do_walk_down+0x33c/0x5a0                   extent-tree.c:8883
>  walk_down_tree+0xa8/0xd0                   extent-tree.c:9041
>  btrfs_drop_snapshot+0x370/0x8b0            extent-tree.c:9203
>  merge_reloc_roots+0xcf/0x220
>  btrfs_recover_relocation+0x26d/0x400
>  ? btrfs_cleanup_fs_roots+0x16a/0x180
>  btrfs_remount+0x32e/0x510
>  do_remount_sb+0x67/0x1e0
>  do_mount+0x712/0xc90
>
> The mount is looping in btrfs_qgroup_trace_subtree, as evidenced by
> the following ftrace filter:
>
> fileserver:/sys/kernel/tracing# cat set_ftrace_filter
> btrfs_qgroup_trace_extent
> btrfs_qgroup_trace_subtree

Yep, it's quota causing the hang.

> [snip]
>
> So 10-13 minutes per cycle.
>
>> 11T with highly deduped usage is really the worst-case scenario for
>> qgroup. Qgroup is not really good at handling highly reflinked files,
>> nor balance. When they combine, it gets worse.
>
> I'm not really understanding the use-case of qgroup if it melts down
> on large systems with a shared base + individual changes.

The problem is, for balance btrfs does a trick: it switches the tree
reloc tree with the real fs tree. However, the tree reloc tree doesn't
account to quota, while the real fs tree does.

Because of that owner change, btrfs needs to do a full subtree rescan.
For a small subvolume that's not a problem, but for a large subvolume
quota needs to rescan thousands of tree blocks, and due to the highly
deduped files, each tree block needs extra iterations for each deduped
file. Both factors contribute to the slow mount.

There are several workaround patches on the mailing list; one makes the
balance run in the background at mount time, so it won't hang the mount.
But it still makes transactions pretty slow (writes will still be
blocked for a long time).

There is also a plan to skip the subtree rescan completely, but it needs
extra review to ensure such a tree block switch won't change the quota
numbers.

Thanks,
Qu

>> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
>> to disable quota offline.
>
> Ok. I was looking at just doing this to speed things up:
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 51b5e2da708c..c5bf937b79f0 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8877,7 +8877,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
>                 parent = 0;
>         }
>
> -       if (need_account) {
> +       if (0) {
>                 ret = btrfs_qgroup_trace_subtree(trans, root, next,
>                                                  generation, level - 1);
>                 if (ret) {
>                         btrfs_err_rl(fs_info,
> "Error %d accounting shared subtree. Quota is out of sync, rescan required.",
>                                      ret);
>                 }
>
> If I follow, this will leave me with inconsistent qgroups and a full
> rescan is required. That seems an acceptable tradeoff, since it seems
> like the best plan going forward is to nuke the qgroups anyway.
>
> There's still the btrfs-transaction spin, but I'm hoping that's
> related to qgroups as well.
>
>> Thanks,
>> Qu
>
> Appreciate it. I was going to go with my hackjob patch to avoid any
> untested rewriting - there's already an error path for "something went
> wrong updating qgroups during walk_tree" so it seemed safest to take
> advantage of it. I'll patch either the kernel or the btrfs programs,
> whichever you think is best.
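[Editor's illustration] Qu's two factors — the number of tree blocks to rescan, multiplied by backref-resolution work per deduped extent — compound multiplicatively. A back-of-envelope model with purely hypothetical counts (none of these numbers come from Dan's array) shows how offline dedup inflates the rescan cost:

```python
# Rough cost model of the full subtree rescan described above: each tree
# block in the relocated subtree is traced, and tracing a leaf resolves
# backrefs (find_all_roots-style walks) for every extent item it holds,
# so heavy reflinking multiplies the per-leaf cost.

def rescan_ops(tree_blocks, extents_per_leaf, refs_per_extent):
    # one backref walk per extent item, cost proportional to refs
    return tree_blocks * extents_per_leaf * refs_per_extent

# Lightly shared subvolume (hypothetical): cheap.
light = rescan_ops(tree_blocks=1_000, extents_per_leaf=100, refs_per_extent=2)
# Large, heavily deduped subvolume (hypothetical): both factors grow.
heavy = rescan_ops(tree_blocks=50_000, extents_per_leaf=100, refs_per_extent=50)
print(light, heavy, heavy // light)  # dedup makes the rescan ~1250x larger
```

The point is only the shape of the blow-up: dedup does not just add extents, it multiplies the cost of accounting every tree block that references them, which is consistent with the 10-13 minutes per `btrfs_qgroup_trace_subtree` call Dan measured.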
Re: Mount stalls indefinitely after enabling quota groups.
On Sat, Aug 11, 2018 at 8:30 PM Qu Wenruo wrote:
>
> It looks pretty much like qgroup, but there is too much noise.
> The pinpoint trace event would be btrfs_find_all_roots().

I had this half-written when you replied.

Agreed: looks like bulk of time spent resides in qgroups. Spent some
time with sysrq-l and ftrace:

 ? __rcu_read_unlock+0x5/0x50
 ? return_to_handler+0x15/0x36
 __rcu_read_unlock+0x5/0x50
 find_extent_buffer+0x47/0x90               extent_io.c:4888
 read_block_for_search.isra.12+0xc8/0x350   ctree.c:2399
 btrfs_search_slot+0x3e7/0x9c0              ctree.c:2837
 btrfs_next_old_leaf+0x1dc/0x410            ctree.c:5702
 btrfs_next_old_item                        ctree.h:2952
 add_all_parents                            backref.c:487
 resolve_indirect_refs+0x3f7/0x7e0          backref.c:575
 find_parent_nodes+0x42d/0x1290             backref.c:1236
 ? find_parent_nodes+0x5/0x1290             backref.c:1114
 btrfs_find_all_roots_safe+0x98/0x100       backref.c:1414
 btrfs_find_all_roots+0x52/0x70             backref.c:1442
 btrfs_qgroup_trace_extent_post+0x27/0x60   qgroup.c:1503
 btrfs_qgroup_trace_leaf_items+0x104/0x130  qgroup.c:1589
 btrfs_qgroup_trace_subtree+0x26a/0x3a0     qgroup.c:1750
 do_walk_down+0x33c/0x5a0                   extent-tree.c:8883
 walk_down_tree+0xa8/0xd0                   extent-tree.c:9041
 btrfs_drop_snapshot+0x370/0x8b0            extent-tree.c:9203
 merge_reloc_roots+0xcf/0x220
 btrfs_recover_relocation+0x26d/0x400
 ? btrfs_cleanup_fs_roots+0x16a/0x180
 btrfs_remount+0x32e/0x510
 do_remount_sb+0x67/0x1e0
 do_mount+0x712/0xc90

The mount is looping in btrfs_qgroup_trace_subtree, as evidenced by
the following ftrace filter:

fileserver:/sys/kernel/tracing# cat set_ftrace_filter
btrfs_qgroup_trace_extent
btrfs_qgroup_trace_subtree

# cat trace
...
mount-6803 [003] 80407.649752: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
mount-6803 [003] 80407.649772: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.649797: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.649821: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.649846: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.701652: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754547: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754574: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754598: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754622: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [003] 80407.754646: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
... repeats 240 times
mount-6803 [002] 80412.568804: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [002] 80412.568825: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items
mount-6803 [002] 80412.568850: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_subtree
mount-6803 [002] 80412.568872: btrfs_qgroup_trace_extent <-btrfs_qgroup_trace_leaf_items

Looks like invocations of btrfs_qgroup_trace_subtree are taking forever:

mount-6803 [006] 80641.627709: btrfs_qgroup_trace_subtree <-do_walk_down
mount-6803 [003] 81433.760945: btrfs_qgroup_trace_subtree <-do_walk_down

(add do_walk_down to the trace here)

mount-6803 [001] 82124.623557: do_walk_down <-walk_down_tree
mount-6803 [001] 82124.623567: btrfs_qgroup_trace_subtree <-do_walk_down
mount-6803 [006] 82695.241306: do_walk_down <-walk_down_tree
mount-6803 [006] 82695.241316: btrfs_qgroup_trace_subtree <-do_walk_down

So 10-13 minutes per cycle.

> 11T with highly deduped usage is really the worst-case scenario for
> qgroup. Qgroup is not really good at handling highly reflinked files,
> nor balance. When they combine, it gets worse.

I'm not really understanding the use-case of qgroup if it melts down
on large systems with a shared base + individual changes.

> I'll add a new rescue subcommand, 'btrfs rescue disable-quota' for you
> to disable quota offline.

Ok. I was looking at just doing this to speed things up:

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 51b5e2da708c..c5bf937b79f0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8877,7 +8877,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
                parent = 0;
        }

-       if (need_account) {
+       if (0) {
                ret
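[Editor's note] The "10-13 minutes per cycle" figure can be checked directly from the `btrfs_qgroup_trace_subtree <-do_walk_down` timestamps in the trace above (seconds since boot):

```python
# Successive btrfs_qgroup_trace_subtree invocations from the ftrace log:
stamps = [80641.627709, 81433.760945, 82124.623557, 82695.241306]
cycles = [b - a for a, b in zip(stamps, stamps[1:])]
print([round(c / 60, 1) for c in cycles])  # ~[13.2, 11.5, 9.5] minutes
```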
Re: Mount stalls indefinitely after enabling quota groups.
On 2018/8/12 5:10 AM, Dan Merillat wrote:
> 19 hours later, still going extremely slowly and taking longer and
> longer for progress made. Main symptom is the mount process is
> spinning at 100% CPU, interspersed with btrfs-transaction spinning at
> 100% CPU.
> So far it's racked up 14h45m of CPU time on mount and an additional
> 3h40m on btrfs-transaction.
>
> The current drop key changes every 10-15 minutes when I check it via
> inspect-internal, so some progress is slowly being made.
>
> I built the kernel with ftrace to see what's going on internally, this
> is the pattern I'm seeing:
>
[snip]

It looks pretty much like qgroup, but there is too much noise.
The pinpoint trace event would be btrfs_find_all_roots().

> Repeats indefinitely. btrace shows basically zero activity on the
> array while it spins, with the occasional burst when mount &
> btrfs-transaction swap off.
>
> To recap the chain of events leading up to this:
> 11TB Array got completely full and started fragmenting badly.
> Ran bedup and it found 600gb of duplicate files that it offline-shared.
> Reboot for unrelated reasons

11T with highly deduped usage is really the worst-case scenario for
qgroup. Qgroup is not really good at handling highly reflinked files,
nor balance. When they combine, it gets worse.

> Enabled quota on all subvolumes to try to track where the new data is
> coming from
> Tried to balance metadata due to transaction CPU spikes
> Force-rebooted after the array was completely lagged out.
>
> Now attempting to mount it RW. Readonly works, but RW has taken well
> over 24 hours at this point.

I'll add a new rescue subcommand, 'btrfs rescue disable-quota', for you
to disable quota offline.

Thanks,
Qu
Re: Mount stalls indefinitely after enabling quota groups.
19 hours later, still going extremely slowly and taking longer and
longer for progress made. Main symptom is the mount process is
spinning at 100% CPU, interspersed with btrfs-transaction spinning at
100% CPU.
So far it's racked up 14h45m of CPU time on mount and an additional
3h40m on btrfs-transaction.

The current drop key changes every 10-15 minutes when I check it via
inspect-internal, so some progress is slowly being made.

I built the kernel with ftrace to see what's going on internally, this
is the pattern I'm seeing:

mount-6803 [002] ...1 69023.970964: btrfs_next_old_leaf <-resolve_indirect_refs
mount-6803 [002] ...1 69023.970965: btrfs_release_path <-btrfs_next_old_leaf
mount-6803 [002] ...1 69023.970965: btrfs_search_slot <-btrfs_next_old_leaf
mount-6803 [002] ...1 69023.970966: btrfs_clear_path_blocking <-btrfs_search_slot
mount-6803 [002] ...1 69023.970966: btrfs_set_path_blocking <-btrfs_clear_path_blocking
mount-6803 [002] ...1 69023.970967: btrfs_bin_search <-btrfs_search_slot
mount-6803 [002] ...1 69023.970967: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970967: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970968: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970968: btrfs_node_key <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970969: btrfs_buffer_uptodate <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970969: btrfs_clear_path_blocking <-btrfs_search_slot
mount-6803 [002] ...1 69023.970970: btrfs_set_path_blocking <-btrfs_clear_path_blocking
mount-6803 [002] ...1 69023.970970: btrfs_bin_search <-btrfs_search_slot
mount-6803 [002] ...1 69023.970970: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970971: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970971: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970972: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970972: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970973: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970974: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970974: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970975: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970975: btrfs_node_key <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970976: btrfs_buffer_uptodate <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970976: btrfs_clear_path_blocking <-btrfs_search_slot
mount-6803 [002] ...1 69023.970976: btrfs_set_path_blocking <-btrfs_clear_path_blocking
mount-6803 [002] ...1 69023.970977: btrfs_bin_search <-btrfs_search_slot
mount-6803 [002] ...1 69023.970977: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970978: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970979: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970979: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970980: btrfs_comp_cpu_keys <-generic_bin_search.constprop.14
mount-6803 [002] ...1 69023.970981: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970981: btrfs_get_token_64 <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970982: btrfs_node_key <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970982: btrfs_buffer_uptodate <-read_block_for_search.isra.12
mount-6803 [002] ...1 69023.970983: btrfs_clear_path_blocking <-btrfs_search_slot
mount-6803 [002] ...1 69023.970983: btrfs_set_path_blocking <-btrfs_clear_path_blocking
mount-6803 [002] ...1
Re: List of known BTRFS Raid 5/6 Bugs?
On Sat, Aug 11, 2018 at 08:27:04AM +0200, erentheti...@mail.de wrote:
> I guess that covers most topics, two last questions:
>
> Will the write hole behave differently on Raid 6 compared to Raid 5?

Not really. It changes the probability distribution (you get an extra
chance to recover using a parity block in some cases), but there are
still cases where data gets lost that didn't need to be.

> Is there any benefit of running Raid 5 metadata compared to Raid 1?

There may be benefits of raid5 metadata, but they are small compared to
the risks.

In some configurations it may not be possible to allocate the last
gigabyte of space. raid1 will allocate 1GB chunks from 2 disks at a
time, while raid5 will allocate 1GB chunks from N disks at a time, and
if N is an odd number there could be one chunk left over in the array
that is unusable. Most users will find this irrelevant because a large
disk array that is filled to the last GB will become quite slow due to
long free-space search and seek times--you really want to keep usage
below 95%, maybe 98% at most, and that means the last GB will never be
needed.

Reading raid5 metadata could theoretically be faster than raid1, but
that depends on a lot of variables, so you can't assume it as a rule of
thumb.

Raid6 metadata is more interesting because it's the only currently
supported way to get 2-disk failure tolerance in btrfs. Unfortunately
that benefit is rather limited due to the write hole bug.

There are patches floating around that implement multi-disk raid1 (i.e.
3 or 4 mirror copies instead of just 2). This would be much better for
metadata than raid6--more flexible, more robust, and my guess is that it
will be faster as well (no need for RMW updates or journal seeks).
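[Editor's illustration] The "one chunk left over" point can be sketched with a deliberately simplified allocator model: free space counted in 1 GiB slices per disk, and a greedy raid1 policy that takes one slice from each of the two disks with the most free slices. This ignores real-world details (unequal disks, mixed profiles, existing allocations); it only shows where a stray unusable gigabyte can come from:

```python
# Greedy raid1-style pairing over per-disk free space in 1 GiB slices.
# Allocation stops when fewer than two disks have any slices left;
# whatever remains cannot hold a raid1 chunk.

def raid1_leftover(free_slices):
    free = list(free_slices)
    while sum(1 for f in free if f > 0) >= 2:
        free.sort(reverse=True)
        free[0] -= 1   # one slice on the fullest disk...
        free[1] -= 1   # ...mirrored on the second fullest
    return sum(free)

# Even total slice count: everything pairs up cleanly.
print(raid1_leftover([4, 4, 4]))  # 0
# Odd total (e.g. after raid5 data chunks left an odd number of slices):
print(raid1_leftover([5, 4, 4]))  # 1 -- one 1 GiB slice no profile can use
```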
Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
On 10.08.2018 12:33, Tomasz Pala wrote:
>
>> For 4 disks with 1T free space each, if you're using RAID5 for data, then
>> you can write 3T of data.
>> But if you're also using RAID10 for metadata, and you're using default
>> inline, we can use small files to fill the free space, resulting in 2T
>> available space.
>>
>> So in this case how would you calculate the free space? 3T or 2T or
>> anything between them?
>
> The answer is pretty simple: 3T. Rationale:
> - this is the space I can put in as a single data stream,
> - people are aware that there is metadata overhead with any object;
>   after all, metadata are also data,
> - while filling the fs with small files, the free space available would
>   self-adjust after every single file put, so after uploading 1T of such
>   files the df should report 1.5T free. There would be nothing weirder
>   (than now) about 1T of data actually having eaten 1.5T of storage.
>
> No crystal-ball calculations, just KISS; since one _can_ put a 3T file
> (non-sparse, uncompressible, bulk written) on the filesystem, the free
> space is 3T.

As far as I can tell, that is exactly what "df" reports now. "btrfs fi
us" will tell you both the max (reported by "df") and the worst-case min.
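[Editor's note] The quoted 3T/2T figures follow directly from each profile's redundancy overhead; a quick check of the arithmetic (assuming ideal allocation and no pre-existing metadata):

```python
# 4 disks with 1 TiB of free space each:
disks, per_disk = 4, 1.0  # TiB
raw = disks * per_disk

# One large file, data profile RAID5 across all 4 disks: one disk's worth
# of each stripe goes to parity, so 3/4 of raw space is usable.
raid5_data = raw * (disks - 1) / disks
print(raid5_data)  # 3.0 TiB -- the "3T" answer

# Fill with tiny files small enough to be inlined into metadata, profile
# RAID10: everything is mirrored, so only half the raw space holds data.
raid10_metadata = raw / 2
print(raid10_metadata)  # 2.0 TiB -- the "2T" answer
```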
Re: List of known BTRFS Raid 5/6 Bugs?
I guess that covers most topics, two last questions:

Will the write hole behave differently on Raid 6 compared to Raid 5?

Is there any benefit of running Raid 5 metadata compared to Raid 1?
Re: Report correct filesystem usage / limits on BTRFS subvolumes with quota
On 10.08.2018 21:21, Tomasz Pala wrote:
> On Fri, Aug 10, 2018 at 07:39:30 -0400, Austin S. Hemmelgarn wrote:
>
>>> I.e.: every shared segment should be accounted within quota (at least once).
>>
>> I think what you mean to say here is that every shared extent should be
>> accounted to quotas for every location it is reflinked from. IOW, that
>> if an extent is shared between two subvolumes each with its own quota,
>> they should both have it accounted against their quota.
>
> Yes.

This is what "referenced" in the quota group report is, is it not? What
is missing here?