[PATCH] btrfs-progs: Fix NULL pointer when receive clone operation
The subvol_info returned from subvol_uuid_search() can be NULL.
So the branch checking IS_ERR(si) should also check if it's NULL.

Reported-by: Tsutomu Itoh
Signed-off-by: Qu Wenruo
---
 cmds-receive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index cb42aa2..c8f2fff 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -750,7 +750,7 @@ static int process_clone(const char *path, u64 offset, u64 len,
 	si = subvol_uuid_search(&rctx->sus, 0, clone_uuid, clone_ctransid, NULL,
 			subvol_search_by_received_uuid);
-	if (IS_ERR(si)) {
+	if (IS_ERR(si) || !si) {
 		if (memcmp(clone_uuid, rctx->cur_subvol.received_uuid,
 				BTRFS_UUID_SIZE) == 0) {
 			/* TODO check generation of extent */
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
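The distinction the patch relies on (an error pointer is not NULL, and NULL is not an error pointer) can be sketched as follows. This is only an illustration: ERR_PTR(), IS_ERR() and MAX_ERRNO are re-declared locally as simplified stand-ins for the kernel-style helpers, and is_err_or_null(), subvol_info_stub and lookup_stub() are hypothetical names that do not exist in btrfs-progs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-ins for the kernel-style helpers (assumption: the
 * real definitions in the project headers may differ in detail).
 */
#define MAX_ERRNO 4095

void *ERR_PTR(long error)
{
	/* Encode a negative errno value as a pointer. */
	return (void *)(intptr_t)error;
}

int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* NULL is not an error pointer, so it needs its own test. */
int is_err_or_null(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical result type and lookup, for illustration only. */
struct subvol_info_stub {
	int id;
};

/* mode < 0: failure, mode == 0: nothing found, mode > 0: found. */
struct subvol_info_stub *lookup_stub(int mode)
{
	static struct subvol_info_stub found = { 42 };

	if (mode < 0)
		return ERR_PTR(-ENOMEM);
	if (mode == 0)
		return NULL;
	return &found;
}
```

A caller that tests only IS_ERR() would go on to dereference the NULL returned by lookup_stub(0), which is the kind of crash the one-line change avoids.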
[GIT PULL] btrfs fixes and cleanups
Hi David,

This is the collection of my patches targeting 4.10. I've dropped the patch
"Btrfs: adjust len of writes if following a preallocated extent" because of
the deadlock caused by that commit.

The patches are based on v4.9-rc8, and they have been tested against fstests
with default mount options to make sure they don't break anything.

I haven't got a kernel.org git repo, so this is mainly for tracking purposes
and for testing the git flow. (Cherry-picking the patches might be the only
way at this moment... sorry for the inconvenience.)

Anyway, the patches can be found at

https://github.com/liubogithub/btrfs-work.git for-dave

Thanks,
liubo

Liu Bo (9):
  Btrfs: add 'inode' for extent map tracepoint
  Btrfs: add truncated_len for ordered extent tracepoints
  Btrfs: use down_read_nested to make lockdep silent
  Btrfs: fix lockdep warning about log_mutex
  Btrfs: fix truncate down when no_holes feature is enabled
  Btrfs: fix btrfs_ordered_update_i_size to update disk_i_size properly
  Btrfs: fix comment in btrfs_page_mkwrite
  Btrfs: clean up btrfs_ordered_update_i_size
  Btrfs: fix another race between truncate and lockless dio write

 fs/btrfs/extent-tree.c       |  3 ++-
 fs/btrfs/inode.c             | 43 +++
 fs/btrfs/ordered-data.c      | 42 --
 fs/btrfs/tree-log.c          | 13 ++---
 include/trace/events/btrfs.h | 16
 5 files changed, 83 insertions(+), 34 deletions(-)

-- 
2.5.5
[PATCH] Btrfs: fix another race between truncate and lockless dio write
A dio write can update i_size in btrfs_get_blocks_direct when it writes to an
offset beyond EOF, so that endio can update disk_i_size correctly (because we
don't update disk_i_size beyond i_size).

However, when truncating down a file, we first update i_size and then wait
for in-flight lockless dio reads/writes. As described above, i_size may be
changed by those dio writes in the meantime, and then the file extents don't
get truncated.

Since lockless dio writes are always overwrites, i_size is not supposed to be
changed, so this adds a check to filter out this case.

The race could be reproduced by fstests/generic/299 with patch
"Btrfs: fix btrfs_ordered_update_i_size to update disk_i_size properly"
applied.

Signed-off-by: Liu Bo
---
 fs/btrfs/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c9973e5..171d8e8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -72,6 +72,7 @@ struct btrfs_dio_data {
 	u64 reserve;
 	u64 unsubmitted_oe_range_start;
 	u64 unsubmitted_oe_range_end;
+	int overwrite;
 };

 static const struct inode_operations btrfs_dir_inode_operations;
@@ -7833,7 +7834,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock,
 		 * Need to update the i_size under the extent lock so buffered
 		 * readers will get the updated i_size when we unlock.
 		 */
-		if (start + len > i_size_read(inode))
+		if (!dio_data->overwrite && start + len > i_size_read(inode))
 			i_size_write(inode, start + len);

 		adjust_dio_outstanding_extents(inode, dio_data, len);
@@ -8715,6 +8716,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		 * not unlock the i_mutex at this case.
 		 */
 		if (offset + count <= inode->i_size) {
+			dio_data.overwrite = 1;
 			inode_unlock(inode);
 			relock = true;
 		}
-- 
2.5.5
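The effect of the new flag can be illustrated with a small userspace model. This is a sketch only, not kernel code: struct dio_model and both helpers are hypothetical names, and the concurrent truncate is simulated by assigning i_size directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the state touched by the patch. */
struct dio_model {
	uint64_t i_size;	/* in-memory file size */
	bool overwrite;		/* write lay entirely within i_size at submit */
};

/* Models the check in btrfs_direct_IO(): offset + count <= i_size. */
void dio_model_begin(struct dio_model *m, uint64_t offset, uint64_t count)
{
	m->overwrite = (offset + count <= m->i_size);
}

/* Models the patched condition in btrfs_get_blocks_direct(). */
void dio_model_map(struct dio_model *m, uint64_t start, uint64_t len)
{
	/* Only a write that started out as an extending write may grow i_size. */
	if (!m->overwrite && start + len > m->i_size)
		m->i_size = start + len;
}
```

In this model, a write that begins as an overwrite and then races with a truncate to a smaller size leaves i_size at the truncated value, while a write that genuinely extends the file still grows i_size as before.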
Re: Btrfs progs pre-release 4.9-rc1
On 2016/12/14 23:42, David Sterba wrote: > Hi, > > a pre-release has been tagged. Contains almost the entire devel branch from > today. There are small fixes, the lowmem mode of check gets more updates but > still does not work in the --repair mode and is considered experimental. > > ETA for 4.9 is in +6 days (2016-12-20). > > Minor fixes, docs improvements or more testcases will be still considered for > 4.9 release. xfstests btrfs/{108,109,117} that was working in 4.8.5 will not work properly. + ./check btrfs/108 FSTYP -- btrfs PLATFORM -- Linux/x86_64 luna 4.9.0 MKFS_OPTIONS -- /dev/sdb3 MOUNT_OPTIONS -- /dev/sdb3 /test6 btrfs/108 1s ... [failed, exit status 1] - output mismatch (see /xfstests/results//btrfs/108.out.bad) --- tests/btrfs/108.out 2015-10-19 09:55:52.0 +0900 +++ /xfstests/results//btrfs/108.out.bad2016-12-15 15:41:43.771411349 +0900 @@ -8,6 +8,6 @@ File digests in the original filesystem: fbf36a062ffcbd644b5739c4d683ccc7 SCRATCH_MNT/snap/foo 5d2c92827a70aad932cfe7363105c55e SCRATCH_MNT/snap/bar -File digests in the new filesystem: -fbf36a062ffcbd644b5739c4d683ccc7 SCRATCH_MNT/snap/foo -5d2c92827a70aad932cfe7363105c55e SCRATCH_MNT/snap/bar +./common/rc: line 2784: 22352 Segmentation fault (core dumped) "$@" >> $seqres.full 2>&1 ... 
(Run 'diff -u tests/btrfs/108.out /xfstests/results//btrfs/108.out.bad' to see the entire diff) Ran: btrfs/108 Failures: btrfs/108 Failed 1 of 1 tests Thanks, Tsutomu > > Changes: > * check: many lowmem mode updates > * send: use splice syscall to copy buffer from kernel > * receive: new option to dump the stream in textual form > * convert: > * move sources to own directory > * prevent accounting of blocks beyond end of the device > * make it work with 64k sectorsize > * mkfs: move sources to own directory > * defrag: warns if directory used without -r > * dev stats: > * new option to check stats for non-zero values > * add long option for -z > * library: version bump to 0.1.2, added subvol_uuid_search2 > * other: > * cleanups > * docs updates > > Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/ > Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git > > Shortlog: > > Adam Borowski (1): > btrfs-progs: man mkfs: warn about RAID5/6 being experimental > > Anand Jain (1): > btrfs-progs: recursive defrag cleanup duplicate code > > Austin S. 
Hemmelgarn (1): > btrfs-progs: dev stats: add dev stats returncode option > > Chandan Rajendra (3): > btrfs-progs: Use helper function to access > btrfs_super_block->sys_chunk_array_size > btrfs-progs: convert: Prevent accounting blocks beyond end of device > btrfs-progs: convert: Fix migrate_super_block() to work with 64k > sectorsize > > David Sterba (35): > btrfs-progs: remove extra newline from messages > btrfs-progs: use symbolic name for first inode number when searching > btrfs-progs: send: use splice syscall instead of read/write to transfer > buffer > btrfs-progs: send: rename thread callback to read data from kernel > btrfs-progs: make incompat bit wrappers more compact > btrfs-progs: receive: rename receive context variable > btrfs-progs: check: use on-stack path buffer in check_fs_first_inode > btrfs-progs: check: use on-stack path buffer in check_fs_root_v2 > btrfs-progs: check: use on-stack path buffer in check_fs_roots_v2 > btrfs-progs: send dump: introduce helper for printing escaped path > btrfs-progs: send dump: print escaped path > btrfs-progs: send dump: use reentrant variant of localtime > btrfs-progs: tests: add more gobal option to test 001-btrfs > btrfs-progs: docs: update receive help and manual page > btrfs-progs: build: extend pattern rules for standalone directories > btrfs-progs: move btrfs-convert to own directory > btrfs-progs: move mkfs.btrfs sources to own directory > btrfs-progs: tests: check for partscan support in > misc/006-partitioned-loopdev > btrfs-progs: run mkfs tests in CI > btrfs-progs: mkfs: annotation of a case > btrfs-progs: docs: clarify trim after mkfs -K > btrfs-progs: docs: make documentation updates workflow more clear > btrfs-progs: dev stats: adjust some error messages > btrfs-progs: dev stats: use char type path > btrfs-progs: dev stats: use table based printing of items > btrfs-progs: dev stats: add long option for -z > btrfs-progs: docs: update dev stats help and manual page > btrfs-progs: help: fix printing 
of aliased commands > btrfs-progs: fixup API after change in subvol_uuid_search > btrfs-progs: library: bump to 0.1.2 > btrfs-progs: handle failed strdup in subvol_uuid_search2 > btrfs-progs: dev stats: update option name for checking non-zero
[PATCH v2] btrfs-progs: tests: add test for --sync option of qgroup show
Simple test script for the following patch.

   btrfs-progs: qgroup: add sync option to 'qgroup show'

Signed-off-by: Tsutomu Itoh
---
v2: dropped the test of --no-sync
---
 tests/cli-tests/005-qgroup-show-sync/test.sh | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100755 tests/cli-tests/005-qgroup-show-sync/test.sh

diff --git a/tests/cli-tests/005-qgroup-show-sync/test.sh b/tests/cli-tests/005-qgroup-show-sync/test.sh
new file mode 100755
index 000..a325b48
--- /dev/null
+++ b/tests/cli-tests/005-qgroup-show-sync/test.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+#
+# simple test of qgroup show --sync option
+
+source $TOP/tests/common
+
+check_prereq mkfs.btrfs
+check_prereq btrfs
+
+setup_root_helper
+prepare_test_dev 1g
+
+run_check $TOP/mkfs.btrfs -f $IMAGE
+run_check_mount_test_dev
+
+run_check $SUDO_HELPER $TOP/btrfs subvolume create $TEST_MNT/Sub
+run_check $SUDO_HELPER $TOP/btrfs quota enable $TEST_MNT/Sub
+
+for opt in '' '--' '--sync'; do
+	run_check $SUDO_HELPER $TOP/btrfs qgroup limit 300M $TEST_MNT/Sub
+	run_check $SUDO_HELPER dd if=/dev/zero of=$TEST_MNT/Sub/file bs=1M count=200
+
+	run_check $SUDO_HELPER $TOP/btrfs qgroup show -re $opt $TEST_MNT/Sub
+
+	run_check $SUDO_HELPER $TOP/btrfs qgroup limit none $TEST_MNT/Sub
+	run_check rm -f $TEST_MNT/Sub/file
+	run_check $TOP/btrfs filesystem sync $TEST_MNT/Sub
+done
+
+run_check_umount_test_dev
-- 
2.9.3
[PATCH v3 2/2] btrfs-progs: qgroup: change the value of sort option
The value returned for the --sort long option ('S') is not used as a short
option letter. Therefore, change it from a single letter to a non-character
value.

Signed-off-by: Tsutomu Itoh
---
This patch is separated from the patch of the --sync option.
---
 cmds-qgroup.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index 2a10c97..34e3bcc 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -313,10 +313,11 @@ static int cmd_qgroup_show(int argc, char **argv)
 	while (1) {
 		int c;
 		enum {
-			GETOPT_VAL_SYNC = 256
+			GETOPT_VAL_SORT = 256,
+			GETOPT_VAL_SYNC
 		};
 		static const struct option long_options[] = {
-			{"sort", required_argument, NULL, 'S'},
+			{"sort", required_argument, NULL, GETOPT_VAL_SORT},
 			{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
 			{ NULL, 0, NULL, 0 }
 		};
@@ -347,7 +348,7 @@ static int cmd_qgroup_show(int argc, char **argv)
 		case 'f':
 			filter_flag |= 0x2;
 			break;
-		case 'S':
+		case GETOPT_VAL_SORT:
 			ret = btrfs_qgroup_parse_sort_string(optarg,
 							&comparer_set);
 			if (ret)
-- 
2.9.3
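The pattern used by this series, where long-only options return values starting at 256 so they cannot collide with any short option letter, can be sketched in isolation. The names struct show_opts, parse_show_opts() and the OPT_VAL_* constants are hypothetical, and the -p handling is only a stand-in for the real short options of 'qgroup show'.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Values above the unsigned char range cannot collide with short options. */
enum {
	OPT_VAL_SORT = 256,
	OPT_VAL_SYNC
};

/* Hypothetical result holder for this sketch. */
struct show_opts {
	int sync;
	const char *sort;
	int print_parent;	/* stand-in for a short option such as -p */
};

/* Returns 0 on success, -1 on an unknown option. */
int parse_show_opts(int argc, char **argv, struct show_opts *out)
{
	static const struct option long_options[] = {
		{ "sort", required_argument, NULL, OPT_VAL_SORT },
		{ "sync", no_argument, NULL, OPT_VAL_SYNC },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	out->sync = 0;
	out->sort = NULL;
	out->print_parent = 0;
	optind = 1;	/* restart scanning so the function can be called again */

	while ((c = getopt_long(argc, argv, "p", long_options, NULL)) != -1) {
		switch (c) {
		case 'p':
			out->print_parent = 1;
			break;
		case OPT_VAL_SORT:
			out->sort = optarg;
			break;
		case OPT_VAL_SYNC:
			out->sync = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

With this layout, `--sort=rfer` and `--sync` are accepted, while `-S` is rejected as an unknown short option because no letter is reserved for it in the optstring.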
[PATCH v3 1/2] btrfs-progs: qgroup: add sync option to 'qgroup show'
The 'qgroup show' command does not synchronize the filesystem.
Therefore, 'qgroup show' may not display the correct values unless the
filesystem has been synchronized first, e.g. with the 'filesystem sync'
command.

So add the '--sync' option so that we can choose whether or not to
synchronize when executing the command.

Signed-off-by: Tsutomu Itoh
---
v2: use getopt_long with enum instead of single letter (suggested by Qu)
v3: dropped the --no-sync option and separated the patch of sort option
    (suggested by David)
---
 Documentation/btrfs-qgroup.asciidoc |  4 ++++
 cmds-qgroup.c                       | 22 ++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/Documentation/btrfs-qgroup.asciidoc b/Documentation/btrfs-qgroup.asciidoc
index 438dbc7..3053f2e 100644
--- a/Documentation/btrfs-qgroup.asciidoc
+++ b/Documentation/btrfs-qgroup.asciidoc
@@ -126,6 +126,10 @@ Prefix \'+' means ascending order and \'-' means descending order of .
 If no prefix is given, use ascending order by default.
 +
 If multiple s is given, use comma to separate.
++
+--sync
+To retrieve information after updating the state of qgroups,
+force sync of the filesystem identified by before getting information.

 EXIT STATUS
 ---
diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index bc15077..2a10c97 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -272,8 +272,7 @@ static int cmd_qgroup_destroy(int argc, char **argv)
 }

 static const char * const cmd_qgroup_show_usage[] = {
-	"btrfs qgroup show -pcreFf "
-	"[--sort=qgroupid,rfer,excl,max_rfer,max_excl] ",
+	"btrfs qgroup show [options] ",
 	"Show subvolume quota groups.",
 	"-p print parent qgroup id",
 	"-c print child qgroup id",
@@ -288,6 +287,7 @@ static const char * const cmd_qgroup_show_usage[] = {
 	" list qgroups sorted by specified items",
 	" you can use '+' or '-' in front of each item.",
 	" (+:ascending, -:descending, ascending default)",
+	"--sync force sync of the filesystem before getting info",
 	NULL
 };

@@ -301,6 +301,7 @@ static int cmd_qgroup_show(int argc, char **argv)
 	u64 qgroupid;
 	int filter_flag = 0;
 	unsigned unit_mode;
+	int sync = 0;

 	struct btrfs_qgroup_comparer_set *comparer_set;
 	struct btrfs_qgroup_filter_set *filter_set;
@@ -311,8 +312,12 @@ static int cmd_qgroup_show(int argc, char **argv)

 	while (1) {
 		int c;
+		enum {
+			GETOPT_VAL_SYNC = 256
+		};
 		static const struct option long_options[] = {
 			{"sort", required_argument, NULL, 'S'},
+			{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
 			{ NULL, 0, NULL, 0 }
 		};

@@ -348,6 +353,9 @@ static int cmd_qgroup_show(int argc, char **argv)
 			if (ret)
 				usage(cmd_qgroup_show_usage);
 			break;
+		case GETOPT_VAL_SYNC:
+			sync = 1;
+			break;
 		default:
 			usage(cmd_qgroup_show_usage);
 		}
@@ -365,6 +373,16 @@ static int cmd_qgroup_show(int argc, char **argv)
 		return 1;
 	}

+	if (sync) {
+		ret = ioctl(fd, BTRFS_IOC_SYNC);
+		if (ret < 0) {
+			error("sync ioctl failed on '%s': %s", path,
+			      strerror(errno));
+			close_file_or_dir(fd, dirstream);
+			goto out;
+		}
+	}
+
 	if (filter_flag) {
 		ret = lookup_path_rootid(fd, &qgroupid);
 		if (ret < 0) {
-- 
2.9.3
Re: [PATCH v2] btrfs-progs: qgroup: add sync option to 'qgroup show'
Hi David,

Thanks for the review.

On 2016/12/14 19:54, David Sterba wrote:
> On Wed, Dec 07, 2016 at 04:55:15PM +0900, Tsutomu Itoh wrote:
>> The 'qgroup show' command does not synchronize filesystem.
>> Therefore, 'qgroup show' may not display the correct value unless
>> synchronized with 'filesystem sync' command etc.
>>
>> So add the '--sync' and '--no-sync' options so that we can choose
>> whether or not to synchronize when executing the command.
>>
>> Signed-off-by: Tsutomu Itoh
>> ---
>> v2: use getopt_long with enum instead of single letter (suggested by Qu)
>> ---
>>  Documentation/btrfs-qgroup.asciidoc | 6 ++
>>  cmds-qgroup.c | 33 +
>>  2 files changed, 35 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/btrfs-qgroup.asciidoc
>> b/Documentation/btrfs-qgroup.asciidoc
>> index 438dbc7..9c65795 100644
>> --- a/Documentation/btrfs-qgroup.asciidoc
>> +++ b/Documentation/btrfs-qgroup.asciidoc
>> @@ -126,6 +126,12 @@ Prefix \'+' means ascending order and \'-' means
>> descending order of .
>>  If no prefix is given, use ascending order by default.
>>  +
>>  If multiple s is given, use comma to separate.
>> ++
>> +--sync
>> +To retrieve information after updating the status of qgroups,
>> +invoke sync before getting information.
>
> This could be more specific, that it's a filesystem sync.
>
>> +--no-sync
>> +Do not invoke sync before getting information (default).
>
> I'm not sure we need this option, how is it supposed to be used?

I made it to pair with --sync, but there is no use case in particular.
So, I would like to drop this with the next patch.

>
>> @@ -311,8 +313,15 @@ static int cmd_qgroup_show(int argc, char **argv)
>>
>>  	while (1) {
>>  		int c;
>> +		enum {
>> +			GETOPT_VAL_SORT = 256,
>> +			GETOPT_VAL_SYNC,
>> +			GETOPT_VAL_NO_SYNC
>> +		};
>>  		static const struct option long_options[] = {
>> -			{"sort", required_argument, NULL, 'S'},
>> +			{"sort", required_argument, NULL, GETOPT_VAL_SORT},
>
> This change is unrelated to the patch, please make a separate patch for
> that.

OK. I'll separate this with the next patch.

Thanks,
Tsutomu

>
> Otherwise looks good.
>
>> +			{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
>> +			{"no-sync", no_argument, NULL, GETOPT_VAL_NO_SYNC},
>>  			{ NULL, 0, NULL, 0 }
>>  		};
>>
Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
Hi, The dirty data is in large amount, probably unable to commit to disk. And this seems to happen when copying from 7200rpm to 5600rpm disks, according to previous post. Probably the I/Os are buffered and pending, unable to get finished in-time. It might be helpful to know if this only happens for specific types of 5600 rpm disks? And are these disks on RAID groups? Thanks. Xin Sent: Wednesday, December 14, 2016 at 3:38 AM From: admin To: "Michal Hocko" Cc: linux-btrfs@vger.kernel.org, linux-ker...@vger.kernel.org, "David Sterba" , "Chris Mason" Subject: Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another Hi, I verified the log files and see no prior oom killer invocation. Unfortunately the machine has been rebooted since. Next time it happens, I will also look in dmesg. Thanks, David Arendt Michal Hocko – Wed., 14. December 2016 11:31 > Btw. the stall should be preceded by the OOM killer invocation. Could > you share the OOM report please. I am asking because such an OOM killer > would be clearly pre-mature as per your meminfo. I am trying to change > that code and seeing your numbers might help me. > > Thanks! > > On Wed 14-12-16 11:17:43, Michal Hocko wrote: > > On Tue 13-12-16 18:11:01, David Arendt wrote: > > > Hi, > > > > > > I receive the following page allocation stall while copying lots of > > > large files from one btrfs hdd to another. > > > > > > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for > > > 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL) > > > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 > > > Tainted: P O 4.9.0 #1 > > [...] > > > Dec 13 13:04:29 server kernel: Call Trace: > > > Dec 13 13:04:29 server kernel: [] ? dump_stack+0x46/0x5d > > > Dec 13 13:04:29 server kernel: [] ? > > > warn_alloc+0x111/0x130 > > > Dec 13 13:04:33 server kernel: [] ? > > > __alloc_pages_nodemask+0xbe8/0xd30 > > > Dec 13 13:04:33 server kernel: [] ? 
> > > pagecache_get_page+0xe4/0x230 > > > Dec 13 13:04:33 server kernel: [] ? > > > alloc_extent_buffer+0x10b/0x400 > > > Dec 13 13:04:33 server kernel: [] ? > > > btrfs_alloc_tree_block+0x125/0x560 > > > > OK, so this is > > find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL) > > > > The main question is whether this really needs to be NOFS request... > > > > > Dec 13 13:04:33 server kernel: [] ? > > > read_extent_buffer_pages+0x21f/0x280 > > > Dec 13 13:04:33 server kernel: [] ? > > > __btrfs_cow_block+0x141/0x580 > > > Dec 13 13:04:33 server kernel: [] ? > > > btrfs_cow_block+0x100/0x150 > > > Dec 13 13:04:33 server kernel: [] ? > > > btrfs_search_slot+0x1e9/0x9c0 > > > Dec 13 13:04:33 server kernel: [] ? > > > __set_extent_bit+0x512/0x550 > > > Dec 13 13:04:33 server kernel: [] ? > > > lookup_inline_extent_backref+0xf5/0x5e0 > > > Dec 13 13:04:34 server kernel: [] ? > > > set_extent_bit+0x24/0x30 > > > Dec 13 13:04:34 server kernel: [] ? > > > update_block_group.isra.34+0x114/0x380 > > > Dec 13 13:04:34 server kernel: [] ? > > > __btrfs_free_extent.isra.35+0xf4/0xd20 > > > Dec 13 13:04:34 server kernel: [] ? > > > btrfs_merge_delayed_refs+0x61/0x5d0 > > > Dec 13 13:04:34 server kernel: [] ? > > > __btrfs_run_delayed_refs+0x902/0x10a0 > > > Dec 13 13:04:34 server kernel: [] ? > > > btrfs_run_delayed_refs+0x90/0x2a0 > > > Dec 13 13:04:34 server kernel: [] ? > > > delayed_ref_async_start+0x84/0xa0 > > > > What would cause the reclaim recursion? > > > > > Dec 13 13:04:34 server kernel: Mem-Info: > > > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34 > > > isolated_anon:0\x0a active_file:7370032 inactive_file:450105 > > > isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189 > > > unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a > > > mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241 > > > free_cma:0 > > > > This speaks for itself. 
> > There is a lot of dirty data, basically no
> > anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
> > is either a configuration bug as somebody noted down the thread (setting
> > the dirty_ratio) or suboptimality of the btrfs code which might request
> > NOFS even though it is not strictly necessary. This would be more for
> > btrfs developers.
> > --
> > Michal Hocko
> > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs
Re: [PATCH] duperemove: test presence of dedupe ioctl
On Wed, Dec 14, 2016 at 10:38:45AM -0800, Darrick J. Wong wrote:
> > > +struct fake_btrfs_ioctl_same_args {
> > > +	struct btrfs_ioctl_same_args args;
> > > +	struct btrfs_ioctl_same_extent_info info;
> > > +};
> >
> > Why does this need a fake structure here?
>
> In order to test the ioctl we have to fill out at least one
> btrfs_ioctl_same_extent_info so that we get far enough into the fs-specific
> dedupe_range handler that we've verified that the fs is capable of dedupe and
> that the fs is willing to try to satisfy the request.

Oh, got it, it's just the fake that tripped me up.

> We could just malloc sizeof(_same_args) + sizeof(_same_extent_info)...

Either that, or more simply just don't give the structure a name by
just declaring it locally on the stack:

	struct {
		struct btrfs_ioctl_same_args args;
		struct btrfs_ioctl_same_extent_info info;
	} sa = { 0 };
Re: [PATCH] Btrfs: fix btrfs_ordered_update_i_size to update disk_i_size properly
On Thu, Dec 01, 2016 at 01:46:10PM -0800, Liu Bo wrote: > btrfs_ordered_update_i_size can be called by truncate and endio, but only > endio > takes ordered_extent which contains the completed IO. > > while truncating down a file, if there are some in-flight IOs, > btrfs_ordered_update_i_size in endio will set disk_i_size to @orig_offset that > is zero. If truncating-down fails somehow, we try to recover in memory isize > with this zero'd disk_i_size. > > Fix it by only updating disk_i_size with @orig_offset when > btrfs_ordered_update_i_size is not called from endio while truncating down and > waiting for in-flight IOs completing their work before recover in-memory size. > > Besides fixing the above issue, add an assertion for last_size to double check > we truncate down to the desired size. > > Signed-off-by: Liu Bo > --- > fs/btrfs/inode.c| 14 ++ > fs/btrfs/ordered-data.c | 9 +++-- > 2 files changed, 21 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 09157dd..ef3594d 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -4682,6 +4682,13 @@ int btrfs_truncate_inode_items(struct > btrfs_trans_handle *trans, > > btrfs_free_path(path); > > + if (err == 0) { > + /* only inline file may have last_size != new_size */ > + if (new_size >= root->sectorsize || > + new_size > root->fs_info->max_inline) > + ASSERT(last_size == new_size); > + } > + This ASSERT has been hit by fstests/generic/299, and it didn't show up the first time I tested, I'm trying to figure out whether we have problems in code or in this ASSERT. 
Thanks, -liubo > if (be_nice && bytes_deleted > SZ_32M) { > unsigned long updates = trans->delayed_ref_updates; > if (updates) { > @@ -5064,6 +5071,13 @@ static int btrfs_setsize(struct inode *inode, struct > iattr *attr) > if (ret && inode->i_nlink) { > int err; > > + /* To get a stable disk_i_size */ > + err = btrfs_wait_ordered_range(inode, 0, (u64)-1); > + if (err) { > + btrfs_orphan_del(NULL, inode); > + return err; > + } > + > /* >* failed to truncate, disk_i_size is only adjusted down >* as we remove extents, so it should represent the true > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c > index b2d1e95..5eaa25a 100644 > --- a/fs/btrfs/ordered-data.c > +++ b/fs/btrfs/ordered-data.c > @@ -982,8 +982,13 @@ int btrfs_ordered_update_i_size(struct inode *inode, u64 > offset, > } > disk_i_size = BTRFS_I(inode)->disk_i_size; > > - /* truncate file */ > - if (disk_i_size > i_size) { > + /* > + * truncate file. > + * If ordered is not NULL, then this is called from endio and > + * disk_i_size will be updated by either truncate itself or any > + * in-flight IOs which are inside the disk_i_size. > + */ > + if (!ordered && disk_i_size > i_size) { > BTRFS_I(inode)->disk_i_size = orig_offset; > ret = 0; > goto out; > -- > 2.5.5 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] duperemove: test presence of dedupe ioctl
On Wed, Dec 14, 2016 at 02:44:36AM -0800, Christoph Hellwig wrote:
> On Fri, Dec 09, 2016 at 09:56:45AM -0800, Darrick J. Wong wrote:
> > Since a zero-length dedupe operation is guaranteed to succeed, use that
> > to test whether or not this filesystem supports dedupe.
> >
> > Signed-off-by: Darrick J. Wong
> > ---
> >  file_scan.c | 47 +--
> >  1 file changed, 37 insertions(+), 10 deletions(-)
> >
> > diff --git a/file_scan.c b/file_scan.c
> > index 617f166..a34453e 100644
> > --- a/file_scan.c
> > +++ b/file_scan.c
> > @@ -45,11 +45,7 @@
> >  #include "file_scan.h"
> >  #include "dbfile.h"
> >  #include "util.h"
> > -
> > -/* This is not in linux/magic.h */
> > -#ifndef XFS_SB_MAGIC
> > -#define XFS_SB_MAGIC 0x58465342 /* 'XFSB' */
> > -#endif
> > +#include "btrfs-ioctl.h"
> >
> >  static char path[PATH_MAX] = { 0, };
> >  static char *pathp = path;
> > @@ -189,6 +185,39 @@ static int walk_dir(const char *name)
> >  	return ret;
> >  }
> >
> > +struct fake_btrfs_ioctl_same_args {
> > +	struct btrfs_ioctl_same_args args;
> > +	struct btrfs_ioctl_same_extent_info info;
> > +};
>
> Why does this need a fake structure here?

In order to test the ioctl we have to fill out at least one
btrfs_ioctl_same_extent_info so that we get far enough into the fs-specific
dedupe_range handler that we've verified that the fs is capable of dedupe and
that the fs is willing to try to satisfy the request.

We could just malloc sizeof(_same_args) + sizeof(_same_extent_info)...

--D
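The malloc alternative mentioned at the end of this message could look roughly like the sketch below. It uses hypothetical stand-in types (probe_args, probe_extent) rather than the real btrfs_ioctl_same_args and btrfs_ioctl_same_extent_info, whose layouts differ, and it only shows the allocation of a header followed by trailing entries, not the ioctl call itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real btrfs_ioctl_same_* layouts differ. */
struct probe_extent {
	int64_t fd;
	uint64_t offset;
};

struct probe_args {
	uint64_t offset;
	uint64_t length;		/* zero for a pure capability probe */
	uint16_t dest_count;
	struct probe_extent info[];	/* flexible array member */
};

/* Allocate a zeroed header followed by room for n trailing entries. */
struct probe_args *probe_args_alloc(uint16_t n)
{
	struct probe_args *args;

	args = calloc(1, sizeof(*args) + (size_t)n * sizeof(args->info[0]));
	if (!args)
		return NULL;
	args->dest_count = n;
	return args;
}
```

Compared with the named wrapper struct in the patch, this keeps the header and its entries in one allocation at the cost of an error path for allocation failure and a matching free().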
Btrfs progs pre-release 4.9-rc1
Hi, a pre-release has been tagged. Contains almost the entire devel branch from today. There are small fixes, the lowmem mode of check gets more updates but still does not work in the --repair mode and is considered experimental. ETA for 4.9 is in +6 days (2016-12-20). Minor fixes, docs improvements or more testcases will be still considered for 4.9 release. Changes: * check: many lowmem mode updates * send: use splice syscall to copy buffer from kernel * receive: new option to dump the stream in textual form * convert: * move sources to own directory * prevent accounting of blocks beyond end of the device * make it work with 64k sectorsize * mkfs: move sources to own directory * defrag: warns if directory used without -r * dev stats: * new option to check stats for non-zero values * add long option for -z * library: version bump to 0.1.2, added subvol_uuid_search2 * other: * cleanups * docs updates Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/ Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git Shortlog: Adam Borowski (1): btrfs-progs: man mkfs: warn about RAID5/6 being experimental Anand Jain (1): btrfs-progs: recursive defrag cleanup duplicate code Austin S. 
Hemmelgarn (1): btrfs-progs: dev stats: add dev stats returncode option Chandan Rajendra (3): btrfs-progs: Use helper function to access btrfs_super_block->sys_chunk_array_size btrfs-progs: convert: Prevent accounting blocks beyond end of device btrfs-progs: convert: Fix migrate_super_block() to work with 64k sectorsize David Sterba (35): btrfs-progs: remove extra newline from messages btrfs-progs: use symbolic name for first inode number when searching btrfs-progs: send: use splice syscall instead of read/write to transfer buffer btrfs-progs: send: rename thread callback to read data from kernel btrfs-progs: make incompat bit wrappers more compact btrfs-progs: receive: rename receive context variable btrfs-progs: check: use on-stack path buffer in check_fs_first_inode btrfs-progs: check: use on-stack path buffer in check_fs_root_v2 btrfs-progs: check: use on-stack path buffer in check_fs_roots_v2 btrfs-progs: send dump: introduce helper for printing escaped path btrfs-progs: send dump: print escaped path btrfs-progs: send dump: use reentrant variant of localtime btrfs-progs: tests: add more gobal option to test 001-btrfs btrfs-progs: docs: update receive help and manual page btrfs-progs: build: extend pattern rules for standalone directories btrfs-progs: move btrfs-convert to own directory btrfs-progs: move mkfs.btrfs sources to own directory btrfs-progs: tests: check for partscan support in misc/006-partitioned-loopdev btrfs-progs: run mkfs tests in CI btrfs-progs: mkfs: annotation of a case btrfs-progs: docs: clarify trim after mkfs -K btrfs-progs: docs: make documentation updates workflow more clear btrfs-progs: dev stats: adjust some error messages btrfs-progs: dev stats: use char type path btrfs-progs: dev stats: use table based printing of items btrfs-progs: dev stats: add long option for -z btrfs-progs: docs: update dev stats help and manual page btrfs-progs: help: fix printing of aliased commands btrfs-progs: fixup API after change in subvol_uuid_search 
btrfs-progs: library: bump to 0.1.2 btrfs-progs: handle failed strdup in subvol_uuid_search2 btrfs-progs: dev stats: update option name for checking non-zero status btrfs-progs: defrag: cleanup temporary errno value btrfs-progs: defrag: warn when deframgenting directories without -r btrfs-progs: update CHANGES for v4.9 Goldwyn Rodrigues (5): btrfs-progs: Correct value printed by assertions/BUG_ON/WARN_ON btrfs-progs: Remove duplicate printfs in warning_trace()/assert_trace() btrfs-progs: check: fix extents after finding all errors btrfs-progs: Initialize ret to suppress compiler warning btrfs-progs: find_free_dev_extent() closer to kernel code Lu Fengqi (11): btrfs-progs: check: introduce function to find dir_item btrfs-progs: check: introduce function to check inode_ref btrfs-progs: check: introduce function to check inode_extref btrfs-progs: check: introduce function to find inode_ref btrfs-progs: check: introduce function to check dir_item btrfs-progs: check: introduce function to check file extent btrfs-progs: check: introduce function to check inode item btrfs-progs: check: introduce function to check fs root btrfs-progs: check: introduce function to check root ref btrfs-progs: check: introduce low_memory mode fs_tree check btrfs-progs: check: fix the return value bug of cmd_check() Noah Massey (1): btrfs-progs: docs: fix typo in mk
[RFC] btrfs: lockdep says "possible recursive locking detected" in btrfs_clear_lock_blocking_rw()
With lockdep enabled I managed to trigger the following lockdep splat: | = | [ INFO: possible recursive locking detected ] | 4.9.0-rt0 #804 Tainted: GW | - | kworker/u16:4/154 is trying to acquire lock: | (btrfs-fs-00){+.+...}, at: [] btrfs_clear_lock_blocking_rw+0x71/0x120 | | but task is already holding lock: | (btrfs-fs-00){+.+...}, at: [] btrfs_clear_lock_blocking_rw+0x71/0x120 | | other info that might help us debug this: | Possible unsafe locking scenario: | |CPU0 | | lock(btrfs-fs-00); | lock(btrfs-fs-00); | | *** DEADLOCK *** | | May be due to missing lock nesting notation | | 6 locks held by kworker/u16:4/154: | #0: ("%s-%s""btrfs", name){.+.+.+}, at: [] process_one_work+0x1f3/0x7b0 | #1: ((&work->normal_work)){+.+.+.}, at: [] process_one_work+0x1f3/0x7b0 | #2: (sb_internal){.+.+..}, at: [] start_transaction+0x2f1/0x590 | #3: (btrfs-fs-02){+.+...}, at: [] btrfs_clear_lock_blocking_rw+0x71/0x120 | #4: (btrfs-fs-01){+.+...}, at: [] btrfs_clear_lock_blocking_rw+0x71/0x120 | #5: (btrfs-fs-00){+.+...}, at: [] btrfs_clear_lock_blocking_rw+0x71/0x120 | | stack backtrace: | CPU: 1 PID: 154 Comm: kworker/u16:4 Tainted: GW 4.9.0-rt1+ #804 | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014 | Workqueue: btrfs-delalloc btrfs_delalloc_helper | c9000123b7d0 8141a2a5 829d6db0 829d6db0 | c9000123b890 810c19dd 02fe 0006 | 3c272f80 82308200 ce68145f590e60eb 880039c108c0 | Call Trace: | [] dump_stack+0x86/0xc1 | [] __lock_acquire+0x6dd/0x11d0 | [] lock_acquire+0x116/0x240 | [] rt_read_lock+0x45/0x60 | [] btrfs_clear_lock_blocking_rw+0x71/0x120 | [] btrfs_clear_path_blocking+0x94/0xb0 | [] btrfs_next_old_leaf+0x3df/0x420 | [] btrfs_next_leaf+0xb/0x10 | [] __btrfs_drop_extents+0x1cb/0xd50 | [] cow_file_range_inline+0x191/0x6c0 | [] compress_file_range.constprop.68+0x314/0x710 | [] async_cow_start+0x30/0x50 | [] btrfs_scrubparity_helper+0xfd/0x620 | [] btrfs_delalloc_helper+0x9/0x10 | [] process_one_work+0x26e/0x7b0 | [] 
worker_thread+0x46/0x560 | [] kthread+0xee/0x110 | [] ret_from_fork+0x2a/0x40 I can trigger it on -RT but it won't show up on a vanilla kernel. I don't see obvious difference here (between RT and !RT). We do have more preemption points and a spin_lock() does not disable preemption (so any assumption on spin_lock() disabling preemption will fail). With all btrfs events enabled, this did not trigger. With the following patch --- a/fs/btrfs/locking.c +++ b/fs/btrfs/locking.c @@ -41,6 +41,7 @@ void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw) */ if (eb->lock_nested && current->pid == eb->lock_owner) return; + trace_printk("eb %p rw %d\n", eb, rw); if (rw == BTRFS_WRITE_LOCK) { if (atomic_read(&eb->blocking_writers) == 0) { WARN_ON(atomic_read(&eb->spinning_writers) != 1); @@ -73,6 +74,7 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw) if (eb->lock_nested && current->pid == eb->lock_owner) return; + trace_printk("eb %p rw %d\n", eb, rw); if (rw == BTRFS_WRITE_LOCK_BLOCKING) { BUG_ON(atomic_read(&eb->blocking_writers) != 1); write_lock(&eb->lock); I manage to collect this (the last few lines from the kworker): # _-=> irqs-off # / _=> need-resched #|/ _-=> need-resched_lazy #|| / _---=> hardirq/softirq #||| / _--=> preempt-depth # / _-=> preempt-lazy-depth #| / _-=> migrate-disable #|| /delay # TASK-PID CPU# ||| TIMESTAMP FUNCTION # | | | ||| | | kworker/u16:4-154 [001] .1160.632361: btrfs_set_lock_blocking_rw: eb 880039ebac00 rw 1 kworker/u16:4-154 [001] ...60.632362: btrfs_clear_lock_blocking_rw: eb 880039ebac00 rw 3 kworker/u16:4-154 [001] .1160.632366: btrfs_set_lock_blocking_rw: eb 880039ebac00 rw 1 kworker/u16:4-154 [001] ...60.632367: btrfs_clear_lock_blocking_rw: eb 880039ebac00 rw 3 kworker/u16:4-154 [001] .1160.632367: btrfs_set_lock_blocking_rw: eb 880039ebac00 rw 1 kworker/u16:4-154 [001] ...60.632368: btrfs_set_lock_blocking_rw: eb 880039ebac00 rw 3 kworker/u16:4-154 [001] ...60.632369: btrfs_clear_lock_blocking_rw: eb 
880039ebac00 rw 3 kworker/u16:4-154 [001] .1260.632371: btrfs_set_lock_blocking_rw: eb 880039ebb000 rw 1 kwork
[PATCH 2/2] btrfs: swap free() and trace point in run_ordered_work()
The previous patch removed a trace point due to a use after free problem
with tracing enabled. While looking at the backtrace it took me a while
to find the right spot. While doing so I noticed that this trace point
could be used after one of two clean-up functions was invoked:
- run_one_async_free()
- async_cow_free()

Both of them free the `work' item so a later use in the tracepoint is
not possible.
This patch swaps the order so we first have the trace point and then
free the struct.

Signed-off-by: Sebastian Andrzej Siewior
---
 fs/btrfs/async-thread.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index d0dfc3d2e199..6f4631bf74f8 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -288,8 +288,8 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
 		 * we don't want to call the ordered free functions
 		 * with the lock held though
 		 */
-		work->ordered_free(work);
 		trace_btrfs_all_work_done(work);
+		work->ordered_free(work);
 	}
 	spin_unlock_irqrestore(lock, flags);
 }
--
2.11.0
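The ordering problem described in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct work_item`, `trace_work_done()`, `free_work()`, `alloc_work()` and `finish_work()` are hypothetical stand-ins for `struct btrfs_work`, the trace point and the `ordered_free` callback.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_work. */
struct work_item {
	int id;
	void (*ordered_free)(struct work_item *work);
};

/* Remembers the id of the last traced item; stands in for a trace point. */
static int last_traced_id = -1;

static void trace_work_done(const struct work_item *work)
{
	/* Reads from *work, so the object must still be alive here. */
	last_traced_id = work->id;
}

static void free_work(struct work_item *work)
{
	free(work);
}

static struct work_item *alloc_work(int id)
{
	struct work_item *work = malloc(sizeof(*work));

	if (!work)
		return NULL;
	work->id = id;
	work->ordered_free = free_work;
	return work;
}

/* Trace first, then free: nothing touches the item after ordered_free(). */
static void finish_work(struct work_item *work)
{
	trace_work_done(work);
	work->ordered_free(work);
}
```

Calling `finish_work(alloc_work(42))` leaves `last_traced_id` at 42 without ever reading freed memory. With the two calls in `finish_work()` swapped, the trace helper would dereference a pointer that has already been handed to `free()`.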
[PATCH 1/2] btrfs: drop trace_btrfs_all_work_done() from normal_work_helper()
For btrfs_scrubparity_helper() the ->func() is set to
scrub_parity_bio_endio_worker(). This function invokes
scrub_free_parity(), which kfree()s the `work' object. All is good as
long as trace events are not enabled; once they are, we boom with a
backtrace like this:

| Workqueue: btrfs-endio btrfs_endio_helper
| RIP: 0010:[] [] trace_event_raw_event_btrfs__work__done+0x4e/0xa0
| Call Trace:
| [] btrfs_scrubparity_helper+0x59d/0x780
| [] btrfs_endio_helper+0x9/0x10
| [] process_one_work+0x26e/0x7b0
| [] worker_thread+0x46/0x560
| [] kthread+0xee/0x110
| [] ret_from_fork+0x2a/0x40

So in order to avoid this, I remove the trace point.

Signed-off-by: Sebastian Andrzej Siewior
---
 fs/btrfs/async-thread.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index e0f071f6b5a7..d0dfc3d2e199 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -318,8 +318,6 @@ static void normal_work_helper(struct btrfs_work *work)
 		set_bit(WORK_DONE_BIT, &work->flags);
 		run_ordered_work(wq);
 	}
-	if (!need_order)
-		trace_btrfs_all_work_done(work);
 }

 void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t uniq_func,
--
2.11.0
Re: [PATCH 0/4] Lowmem fsck false alert fixes
On Mon, Dec 05, 2016 at 05:07:52PM +0800, Qu Wenruo wrote:
> Btrfs-progs test case 023 will cause assert and a lot of false alerts
> for lowmem mode.
>
> The problems are caused by several reasons, from bad handler for tree
> reloc root (calling btrfs_read_fs_root on tree reloc tree) to too
> strict a check.
>
> Fix the lowmem mode bugs.
>
> There is another bug which affects both original mode and lowmem mode,
> it seems to be caused by this commit:
> commit 00e769d04c2c83029d6c71fbded133597d93ad55
> Author: Goldwyn Rodrigues
> Date: Tue Nov 29 10:24:52 2016 -0600
>
> btrfs-progs: Correct value printed by assertions/BUG_ON/WARN_ON
>
> Informed Goldwyn to fix it.
> So the fix for the common assert is not included in this patchset.
>
> Qu Wenruo (4):
> btrfs-progs: check: Fix assert when using lowmem on fs with tree reloc
> tree
> btrfs-progs: check: Fix lowmem mode stack overflow caused by fsck/023
> btrfs-progs: check: Fix lowmem false alert on tree reloc tree
> btrfs-progs: check: Fix false alert on generation mismatch for tree
> reloc tree

1-4 applied, thanks.
Re: [PATCH 1/2] btrfs-progs: btrfs-convert: Prevent accounting blocks beyond end of device
On Fri, Dec 09, 2016 at 09:03:57AM +0800, Qu Wenruo wrote:
> Hi Chandan,
>
> Thanks for the patch.
>
> At 12/08/2016 09:56 PM, Chandan Rajendra wrote:
> > When looping across data block bitmap, __ext2_add_one_block() may add
> > blocks which do not exist on the underlying disk. This commit prevents
> > this from happening by checking the block index against the maximum
> > block count that was present in the ext4 filesystem instance that is
> > being converted.
>
> The patch looks good to me.
>
> Reviewed-by: Qu Wenruo

Applied, thanks.
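The bounds check described in the quoted commit message can be sketched as follows. This is a simplified illustration and not the btrfs-convert code: `count_used_blocks()` and its parameters are hypothetical, and the bitmap layout (one bit per block, least significant bit first) is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the blocks marked as used in a bitmap, ignoring every bit whose
 * block index lies at or beyond blocks_count (hypothetical helper).
 */
static uint64_t count_used_blocks(const uint8_t *bitmap, size_t bitmap_bytes,
				  uint64_t first_block, uint64_t blocks_count)
{
	uint64_t used = 0;

	for (size_t i = 0; i < bitmap_bytes * 8; i++) {
		uint64_t block = first_block + i;

		/* Bits past the end of the filesystem describe no real block. */
		if (block >= blocks_count)
			break;
		if (bitmap[i / 8] & (1u << (i % 8)))
			used++;
	}
	return used;
}
```

With a two-byte bitmap that has all 16 bits set and a filesystem of only 10 blocks, the helper reports 10 used blocks rather than 16, because the last six bits are never accounted.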
Re: [PATCH 2/2] btrfs-convert: Fix migrate_super_block() to work with 64k sectorsize
On Fri, Dec 09, 2016 at 09:09:29AM +0800, Qu Wenruo wrote:
> At 12/08/2016 09:56 PM, Chandan Rajendra wrote:
> > migrate_super_block() uses sectorsize to refer to the size of the
> > superblock. Hence on 64k sectorsize filesystems, it ends up computing
> > checksum beyond the super block length (i.e.
> > BTRFS_SUPER_INFO_SIZE). This commit fixes the bug by using
> > BTRFS_SUPER_INFO_SIZE instead of sectorsize of the underlying
> > filesystem.
> >
> > Signed-off-by: Chandan Rajendra
>
> Reviewed-by: Qu Wenruo

Applied, thanks.
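The difference between the two length choices can be shown with a small sketch. Everything here is an assumption made for illustration: `SUPER_INFO_SIZE` and `CSUM_SIZE` are assumed to mirror BTRFS_SUPER_INFO_SIZE (4096) and BTRFS_CSUM_SIZE (32), `toy_csum()` is a toy rolling sum standing in for the real crc32c, and both helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SUPER_INFO_SIZE 4096u	/* assumed to mirror BTRFS_SUPER_INFO_SIZE */
#define CSUM_SIZE 32u		/* assumed to mirror BTRFS_CSUM_SIZE */

/* Toy rolling checksum, standing in for the real crc32c. */
static uint32_t toy_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31u + data[i];
	return sum;
}

/* Fixed variant: the checksummed length never depends on the sectorsize. */
static uint32_t super_csum(const uint8_t *buf)
{
	return toy_csum(buf + CSUM_SIZE, SUPER_INFO_SIZE - CSUM_SIZE);
}

/* Buggy variant: the checksummed length follows the sectorsize. */
static uint32_t super_csum_by_sectorsize(const uint8_t *buf, uint32_t sectorsize)
{
	return toy_csum(buf + CSUM_SIZE, sectorsize - CSUM_SIZE);
}
```

On a 4k sectorsize both variants cover the same bytes, which is why the bug stays hidden there. With a 64k sectorsize the second variant also covers whatever follows the superblock, so unrelated data changes the result.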
Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
Hi,

I checked the log files and see no prior OOM killer invocation.
Unfortunately the machine has been rebooted since. Next time it happens,
I will also look in dmesg.

Thanks,
David Arendt

Michal Hocko – Wed., 14. December 2016 11:31
> Btw. the stall should be preceded by the OOM killer invocation. Could
> you share the OOM report please. I am asking because such an OOM killer
> would be clearly premature as per your meminfo. I am trying to change
> that code and seeing your numbers might help me.
>
> Thanks!
>
> On Wed 14-12-16 11:17:43, Michal Hocko wrote:
> > On Tue 13-12-16 18:11:01, David Arendt wrote:
> > > Hi,
> > >
> > > I receive the following page allocation stall while copying lots of
> > > large files from one btrfs hdd to another.
> > >
> > > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> > > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P O 4.9.0 #1
> > [...]
> > > Dec 13 13:04:29 server kernel: Call Trace:
> > > Dec 13 13:04:29 server kernel: [] ? dump_stack+0x46/0x5d
> > > Dec 13 13:04:29 server kernel: [] ? warn_alloc+0x111/0x130
> > > Dec 13 13:04:33 server kernel: [] ? __alloc_pages_nodemask+0xbe8/0xd30
> > > Dec 13 13:04:33 server kernel: [] ? pagecache_get_page+0xe4/0x230
> > > Dec 13 13:04:33 server kernel: [] ? alloc_extent_buffer+0x10b/0x400
> > > Dec 13 13:04:33 server kernel: [] ? btrfs_alloc_tree_block+0x125/0x560
> >
> > OK, so this is
> > find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)
> >
> > The main question is whether this really needs to be a NOFS request...
> >
> > > Dec 13 13:04:33 server kernel: [] ? read_extent_buffer_pages+0x21f/0x280
> > > Dec 13 13:04:33 server kernel: [] ? __btrfs_cow_block+0x141/0x580
> > > Dec 13 13:04:33 server kernel: [] ? btrfs_cow_block+0x100/0x150
> > > Dec 13 13:04:33 server kernel: [] ? btrfs_search_slot+0x1e9/0x9c0
> > > Dec 13 13:04:33 server kernel: [] ? __set_extent_bit+0x512/0x550
> > > Dec 13 13:04:33 server kernel: [] ? lookup_inline_extent_backref+0xf5/0x5e0
> > > Dec 13 13:04:34 server kernel: [] ? set_extent_bit+0x24/0x30
> > > Dec 13 13:04:34 server kernel: [] ? update_block_group.isra.34+0x114/0x380
> > > Dec 13 13:04:34 server kernel: [] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> > > Dec 13 13:04:34 server kernel: [] ? btrfs_merge_delayed_refs+0x61/0x5d0
> > > Dec 13 13:04:34 server kernel: [] ? __btrfs_run_delayed_refs+0x902/0x10a0
> > > Dec 13 13:04:34 server kernel: [] ? btrfs_run_delayed_refs+0x90/0x2a0
> > > Dec 13 13:04:34 server kernel: [] ? delayed_ref_async_start+0x84/0xa0
> >
> > What would cause the reclaim recursion?
> >
> > > Dec 13 13:04:34 server kernel: Mem-Info:
> > > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> > > isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> > > isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> > > unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> > > mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> > > free_cma:0
> >
> > This speaks for itself. There is a lot of dirty data, basically no
> > anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
> > is either a configuration bug as somebody noted down the thread (setting
> > the dirty_ratio) or suboptimality of the btrfs code which might request
> > NOFS even though it is not strictly necessary. This would be more for
> > btrfs developers.
> > --
> > Michal Hocko
> > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs
Re: [PATCH v2] btrfs-progs: qgroup: add sync option to 'qgroup show'
On Wed, Dec 07, 2016 at 04:55:15PM +0900, Tsutomu Itoh wrote:
> The 'qgroup show' command does not synchronize filesystem.
> Therefore, 'qgroup show' may not display the correct value unless
> synchronized with 'filesystem sync' command etc.
>
> So add the '--sync' and '--no-sync' options so that we can choose
> whether or not to synchronize when executing the command.
>
> Signed-off-by: Tsutomu Itoh
> ---
> v2: use getopt_long with enum instead of single letter (suggested by Qu)
> ---
> Documentation/btrfs-qgroup.asciidoc | 6 ++
> cmds-qgroup.c | 33 +
> 2 files changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/btrfs-qgroup.asciidoc
> b/Documentation/btrfs-qgroup.asciidoc
> index 438dbc7..9c65795 100644
> --- a/Documentation/btrfs-qgroup.asciidoc
> +++ b/Documentation/btrfs-qgroup.asciidoc
> @@ -126,6 +126,12 @@ Prefix \'+' means ascending order and \'-' means
> descending order of .
> If no prefix is given, use ascending order by default.
> +
> If multiple s is given, use comma to separate.
> ++
> +--sync
> +To retrieve information after updating the status of qgroups,
> +invoke sync before getting information.

This could be more specific, that it's a filesystem sync.

> +--no-sync
> +Do not invoke sync before getting information (default).

I'm not sure we need this option, how is it supposed to be used?

> @@ -311,8 +313,15 @@ static int cmd_qgroup_show(int argc, char **argv)
>
> 	while (1) {
> 		int c;
> +		enum {
> +			GETOPT_VAL_SORT = 256,
> +			GETOPT_VAL_SYNC,
> +			GETOPT_VAL_NO_SYNC
> +		};
> 		static const struct option long_options[] = {
> -			{"sort", required_argument, NULL, 'S'},
> +			{"sort", required_argument, NULL, GETOPT_VAL_SORT},

This change is unrelated to the patch, please make a separate patch for
that. Otherwise looks good.

> +			{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
> +			{"no-sync", no_argument, NULL, GETOPT_VAL_NO_SYNC},
> 			{ NULL, 0, NULL, 0 }
> 		};
>
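The technique the v2 note refers to, long-only options identified by enum values outside the `char` range, can be sketched as a standalone parser. This is not the cmds-qgroup.c code: `parse_sync_option()` is a hypothetical helper, and making the last of `--sync`/`--no-sync` win is an assumption of the sketch.

```c
#include <getopt.h>
#include <stddef.h>

/* Values above the char range cannot collide with any short option. */
enum {
	GETOPT_VAL_SYNC = 256,
	GETOPT_VAL_NO_SYNC
};

/*
 * Return 1 if a sync was requested, 0 if not, -1 on an unknown option.
 * The last of --sync/--no-sync on the command line wins.
 */
static int parse_sync_option(int argc, char **argv)
{
	static const struct option long_options[] = {
		{"sync", no_argument, NULL, GETOPT_VAL_SYNC},
		{"no-sync", no_argument, NULL, GETOPT_VAL_NO_SYNC},
		{NULL, 0, NULL, 0}
	};
	int sync = 0;	/* default: do not sync */
	int c;

	optind = 1;	/* restart scanning so the helper can be called again */
	opterr = 0;	/* keep getopt_long() from printing its own errors */
	while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
		switch (c) {
		case GETOPT_VAL_SYNC:
			sync = 1;
			break;
		case GETOPT_VAL_NO_SYNC:
			sync = 0;
			break;
		default:
			return -1;
		}
	}
	return sync;
}
```

Because the enum starts at 256, adding a short option later (for example a single letter in the optstring) cannot clash with the values returned for the long-only options.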
Re: [PATCH] duperemove: test presence of dedupe ioctl
On Fri, Dec 09, 2016 at 09:56:45AM -0800, Darrick J. Wong wrote:
> Since a zero-length dedupe operation is guaranteed to succeed, use that
> to test whether or not this filesystem supports dedupe.
>
> Signed-off-by: Darrick J. Wong
> ---
> file_scan.c | 47 +--
> 1 file changed, 37 insertions(+), 10 deletions(-)
>
> diff --git a/file_scan.c b/file_scan.c
> index 617f166..a34453e 100644
> --- a/file_scan.c
> +++ b/file_scan.c
> @@ -45,11 +45,7 @@
> #include "file_scan.h"
> #include "dbfile.h"
> #include "util.h"
> -
> -/* This is not in linux/magic.h */
> -#ifndef XFS_SB_MAGIC
> -#define XFS_SB_MAGIC 0x58465342 /* 'XFSB' */
> -#endif
> +#include "btrfs-ioctl.h"
>
> static char path[PATH_MAX] = { 0, };
> static char *pathp = path;
> @@ -189,6 +185,39 @@ static int walk_dir(const char *name)
> 	return ret;
> }
>
> +struct fake_btrfs_ioctl_same_args {
> +	struct btrfs_ioctl_same_args args;
> +	struct btrfs_ioctl_same_extent_info info;
> +};

Why does this need a fake structure here?
Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
Btw. the stall should be preceded by the OOM killer invocation. Could
you share the OOM report please. I am asking because such an OOM killer
would be clearly premature as per your meminfo. I am trying to change
that code and seeing your numbers might help me.

Thanks!

On Wed 14-12-16 11:17:43, Michal Hocko wrote:
> On Tue 13-12-16 18:11:01, David Arendt wrote:
> > Hi,
> >
> > I receive the following page allocation stall while copying lots of
> > large files from one btrfs hdd to another.
> >
> > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P O 4.9.0 #1
> [...]
> > Dec 13 13:04:29 server kernel: Call Trace:
> > Dec 13 13:04:29 server kernel: [] ? dump_stack+0x46/0x5d
> > Dec 13 13:04:29 server kernel: [] ? warn_alloc+0x111/0x130
> > Dec 13 13:04:33 server kernel: [] ? __alloc_pages_nodemask+0xbe8/0xd30
> > Dec 13 13:04:33 server kernel: [] ? pagecache_get_page+0xe4/0x230
> > Dec 13 13:04:33 server kernel: [] ? alloc_extent_buffer+0x10b/0x400
> > Dec 13 13:04:33 server kernel: [] ? btrfs_alloc_tree_block+0x125/0x560
>
> OK, so this is
> find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)
>
> The main question is whether this really needs to be a NOFS request...
>
> > Dec 13 13:04:33 server kernel: [] ? read_extent_buffer_pages+0x21f/0x280
> > Dec 13 13:04:33 server kernel: [] ? __btrfs_cow_block+0x141/0x580
> > Dec 13 13:04:33 server kernel: [] ? btrfs_cow_block+0x100/0x150
> > Dec 13 13:04:33 server kernel: [] ? btrfs_search_slot+0x1e9/0x9c0
> > Dec 13 13:04:33 server kernel: [] ? __set_extent_bit+0x512/0x550
> > Dec 13 13:04:33 server kernel: [] ? lookup_inline_extent_backref+0xf5/0x5e0
> > Dec 13 13:04:34 server kernel: [] ? set_extent_bit+0x24/0x30
> > Dec 13 13:04:34 server kernel: [] ? update_block_group.isra.34+0x114/0x380
> > Dec 13 13:04:34 server kernel: [] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> > Dec 13 13:04:34 server kernel: [] ? btrfs_merge_delayed_refs+0x61/0x5d0
> > Dec 13 13:04:34 server kernel: [] ? __btrfs_run_delayed_refs+0x902/0x10a0
> > Dec 13 13:04:34 server kernel: [] ? btrfs_run_delayed_refs+0x90/0x2a0
> > Dec 13 13:04:34 server kernel: [] ? delayed_ref_async_start+0x84/0xa0
>
> What would cause the reclaim recursion?
>
> > Dec 13 13:04:34 server kernel: Mem-Info:
> > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> > isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> > isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> > unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> > mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> > free_cma:0
>
> This speaks for itself. There is a lot of dirty data, basically no
> anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
> is either a configuration bug as somebody noted down the thread (setting
> the dirty_ratio) or suboptimality of the btrfs code which might request
> NOFS even though it is not strictly necessary. This would be more for
> btrfs developers.
> --
> Michal Hocko
> SUSE Labs

--
Michal Hocko
SUSE Labs
Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
On Tue 13-12-16 18:11:01, David Arendt wrote:
> Hi,
>
> I receive the following page allocation stall while copying lots of
> large files from one btrfs hdd to another.
>
> Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P O 4.9.0 #1
[...]
> Dec 13 13:04:29 server kernel: Call Trace:
> Dec 13 13:04:29 server kernel: [] ? dump_stack+0x46/0x5d
> Dec 13 13:04:29 server kernel: [] ? warn_alloc+0x111/0x130
> Dec 13 13:04:33 server kernel: [] ? __alloc_pages_nodemask+0xbe8/0xd30
> Dec 13 13:04:33 server kernel: [] ? pagecache_get_page+0xe4/0x230
> Dec 13 13:04:33 server kernel: [] ? alloc_extent_buffer+0x10b/0x400
> Dec 13 13:04:33 server kernel: [] ? btrfs_alloc_tree_block+0x125/0x560

OK, so this is
	find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)

The main question is whether this really needs to be a NOFS request...

> Dec 13 13:04:33 server kernel: [] ? read_extent_buffer_pages+0x21f/0x280
> Dec 13 13:04:33 server kernel: [] ? __btrfs_cow_block+0x141/0x580
> Dec 13 13:04:33 server kernel: [] ? btrfs_cow_block+0x100/0x150
> Dec 13 13:04:33 server kernel: [] ? btrfs_search_slot+0x1e9/0x9c0
> Dec 13 13:04:33 server kernel: [] ? __set_extent_bit+0x512/0x550
> Dec 13 13:04:33 server kernel: [] ? lookup_inline_extent_backref+0xf5/0x5e0
> Dec 13 13:04:34 server kernel: [] ? set_extent_bit+0x24/0x30
> Dec 13 13:04:34 server kernel: [] ? update_block_group.isra.34+0x114/0x380
> Dec 13 13:04:34 server kernel: [] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> Dec 13 13:04:34 server kernel: [] ? btrfs_merge_delayed_refs+0x61/0x5d0
> Dec 13 13:04:34 server kernel: [] ? __btrfs_run_delayed_refs+0x902/0x10a0
> Dec 13 13:04:34 server kernel: [] ? btrfs_run_delayed_refs+0x90/0x2a0
> Dec 13 13:04:34 server kernel: [] ? delayed_ref_async_start+0x84/0xa0

What would cause the reclaim recursion?

> Dec 13 13:04:34 server kernel: Mem-Info:
> Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> free_cma:0

This speaks for itself. There is a lot of dirty data, basically no
anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
is either a configuration bug as somebody noted down the thread (setting
the dirty_ratio) or suboptimality of the btrfs code which might request
NOFS even though it is not strictly necessary. This would be more for
btrfs developers.
--
Michal Hocko
SUSE Labs