Re: [RFC 3/3] btrfs: make shrink_delalloc() try harder to reclaim metadata space
Hi, Any comments about these 3 patches, thanks. Regards, Xiaoguang Wang On 09/22/2016 05:25 PM, Wang Xiaoguang wrote: Since commit b02441999efcc6152b87cd58e7970bb7843f76cf, we don't wait all ordered extents, but I run into some enospc errors when doing large file create and delete test, it's because shrink_delalloc() does not write enough delalloc bytes and wait them finished: From: Miao Xie Date: Mon, 4 Nov 2013 23:13:25 +0800 Subject: [PATCH] Btrfs: don't wait for the completion of all the ordered extents It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. But since Josef introduced "Btrfs: introduce ticketed enospc infrastructure", shrink_delalloc() starts to be run asynchronously, then If we want to reclaim metadata space, we can try harder, after all, false enospc error is not acceptable. Signed-off-by: Wang Xiaoguang --- fs/btrfs/extent-tree.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 46c2a37..f7c420b 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4721,7 +4721,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, if (trans) return; if (wait_ordered) - btrfs_wait_ordered_roots(root->fs_info, items, + btrfs_wait_ordered_roots(root->fs_info, -1, 0, (u64)-1); return; } @@ -4775,6 +4775,14 @@ skip_async: } delalloc_bytes = percpu_counter_sum_positive( &root->fs_info->delalloc_bytes); + if (loops == 2) { + /* +* Try to write all current delalloc bytes and wait all +* ordered extents to have a last try. +*/ + to_reclaim = delalloc_bytes; + items = -1; + } } } -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix enospc in punch hole
From: Robbie Ko when extent-tree level > BTRFS_MAX_LEVEL / 2, __btrfs_drop_extents -> btrfs_duplicate_item -> setup_leaf_for_split -> split_leaf maybe enospc, because min_size is too small, need use btrfs_calc_trans_metadata_size. Signed-off-by: Robbie Ko --- fs/btrfs/file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index fea31a4..809ca85 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2322,7 +2322,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) u64 tail_len; u64 orig_start = offset; u64 cur_offset; - u64 min_size = btrfs_calc_trunc_metadata_size(root, 1); + u64 min_size = btrfs_calc_trans_metadata_size(root, 1); u64 drop_end; int ret = 0; int err = 0; @@ -2469,7 +2469,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) ret = -ENOMEM; goto out_free; } - rsv->size = btrfs_calc_trunc_metadata_size(root, 1); + rsv->size = btrfs_calc_trans_metadata_size(root, 1); rsv->failfast = 1; /* -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: image: fix compiler warning
On 2016/10/07 1:00, David Sterba wrote: > On Wed, Oct 05, 2016 at 05:07:48PM +0900, Tsutomu Itoh wrote: >> In v4.8-rc1, gcc 5.3.1 gives following warning. Fixed it. >> >> [CC] btrfs-image.o >> btrfs-image.c: In function 'flush_pending': >> btrfs-image.c:708:17: warning: 'start' may be used uninitialized in this >> function [-Wmaybe-uninitialized] >> header->bytenr = cpu_to_le64(start); >> ^ >> btrfs-image.c:927:6: note: 'start' was declared here >> u64 start; >> ^ > > So the patch makes the compiler warning go away, but is the code > correct? AFAICS, the warning points to the case where flush_pending is > called with done=1 (from create_metadump) and there's zero > md->pending_size . Are you sure this is an expected state and that the > function can proceed with state = 0 ? > I think that this is a case where some errors occurred before calling flush_pending. Therefore, create_metadump returns the error to the caller. (creating the image fails, I think.) Thanks, Tsutomu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: relocation: Fix NULL pointer access and leaking qgroups in btrfs_recover_relocation
At 10/07/2016 02:00 AM, Goldwyn Rodrigues wrote: Hi Qu, On 10/06/2016 03:03 AM, Qu Wenruo wrote: We fixed balance qgroup leaking by introducing qgroup_fix_relocated_data_extents(), but it only covers normal balance routine well. For case btrfs_recover_relocation() routine, since rc->stage or rc->data_inode are not initialized, we either skip qgroup_fix_relocated_data_extents() or just cause NULL pointer access. Since we skip qgroup_fix_relocated_data_extents() in btrfs_recover_relocation(), we will still leak qgroup numbers for that routine. In the fix, we won't use data_inode any longer, as at the timing of the qgroup fix, noone will modify data_reloc tree any longer, so path locking should be good enough. And add check against rc->merge_reloc_tree, so we can detect if we are in btrfs_recover_relocation() routine and continue qgroup fix. While testing this patch, I realized the original problem of qgroup values being incorrect after a balance, has returned... or probably was not solved completely. I tried it with the shell script sent sometime back. The script fails when I checkout fix df2c95f33e0a2 as well. I remember it did not appear when I tested it last, so it is possible that some other change has affected this. Would recommend to hold off this until the balance issue is not completely fixed. For reference, the test script is: #!/bin/bash -x MNT="/mnt" DEV="/dev/vdb" mkfs.btrfs -f $DEV mount -t btrfs $DEV $MNT mkdir $MNT/snaps echo "populate $MNT with some data" #cp -a /usr/share/fonts $MNT/ cp -a /usr/ $MNT/ & for i in `seq -w 0 8`; do S="$MNT/snaps/snap$i" echo "create and populate $S" btrfs su snap $MNT $S; cp -a /boot $S; done; #let the cp from above finish wait btrfs fi sync $MNT btrfs quota enable $MNT btrfs quota rescan -w $MNT btrfs qg show $MNT umount $MNT mount -t btrfs $DEV $MNT time btrfs balance start --full-balance $MNT umount $MNT btrfsck $DEV Thanks for the report. Yes, the script can reproduce the problem, almost 100% possibility. But it also takes a long time to run.(Partly because of the slow find_parent_nodes function call). I tried to simplify it by using inline extent to bumping the tree level and use small file extents to reduce IO. But no good reproducer yet. Did you remember which kernel version you tested the original fix? Maybe it could provide some clue. Thanks, Qu Reported-by: Goldwyn Rodrigues Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index c0c13dc..f250187 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -3946,8 +3946,7 @@ static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans, struct reloc_control *rc) { struct btrfs_fs_info *fs_info = rc->extent_root->fs_info; - struct inode *inode = rc->data_inode; - struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root; + struct btrfs_root *data_reloc_root; struct btrfs_path *path; struct btrfs_key key; int ret = 0; @@ -3956,18 +3955,28 @@ static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans, return 0; /* +* For normal balance routine, merge_reloc_tree flag is always cleared +* before commit trans. +* For recover relocation routine, merge_reloc_tree is always 1, we need +* to continue anyway. +* * Only for stage where we update data pointers the qgroup fix is * valid. * For MOVING_DATA stage, we will miss the timing of swapping tree * blocks, and won't fix it. */ - if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found)) + if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found) && + rc->merge_reloc_tree == 0) return 0; + data_reloc_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID); + if (IS_ERR(data_reloc_root)) + return PTR_ERR(data_reloc_root); + path = btrfs_alloc_path(); if (!path) return -ENOMEM; - key.objectid = btrfs_ino(inode); + key.objectid = BTRFS_FIRST_FREE_OBJECTID + 1; key.type = BTRFS_EXTENT_DATA_KEY; key.offset = 0; @@ -3975,13 +3984,10 @@ static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans, if (ret < 0) goto out; - lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1); while (1) { struct btrfs_file_extent_item *fi; btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); - if (key.objectid > btrfs_ino(inode)) - break; if (key.type != BTRFS_EXTENT_DATA_KEY) goto next; fi = btrfs_item_ptr(path->no
[PATCH] Btrfs: fix leak subvol subv_writers conter
From: Robbie Ko In run_delalloc_nocow, maybe not release subv_writers conter, will lead to create snapshot hang. Signed-off-by: Robbie Ko --- fs/btrfs/inode.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e6811c4..9722554 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1386,11 +1386,17 @@ next_slot: * this ensure that csum for a given extent are * either valid or do not exist. */ - if (csum_exist_in_range(root, disk_bytenr, num_bytes)) + if (csum_exist_in_range(root, disk_bytenr, num_bytes)) { + if (!nolock) + btrfs_end_write_no_snapshoting(root); goto out_check; + } if (!btrfs_inc_nocow_writers(root->fs_info, -disk_bytenr)) +disk_bytenr)) { + if (!nolock) + btrfs_end_write_no_snapshoting(root); goto out_check; + } nocow = 1; } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) { extent_end = found_key.offset + -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is is possible to submit binary image as fstest test case?
On Thu, Oct 06, 2016 at 08:29:28AM -0400, Brian Foster wrote: > Doesn't necessarily bother me one way or the other, but something we've > done with XFS in such situations is introduce a DEBUG mode only sysfs > tunable that delays certain infrastructure (log recovery in our case) to > coordinate with test cases that try to reproduce such timing/racing > problems. The other approach the btrfs folks might consider is to have a sufficiently powerful "debugfs" or "xfs_db" program that can generate test images with the desired property. I've found that the time I've invested in creating debugfs has repaid itself a hundred times over, especially when maintaining a regression test suite for e2fsck. Cheers, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Making statfs return qgroup info (for container use case etc.)
On Thu, Oct 06, 2016 at 02:51:58PM +0100, Tim Small wrote: > Hello, > > I use btrfs for containers (e.g. full OS style containers using lxc/lxd > etc. rather than application containers). btrfs de-duplication and > send/receive are very useful features for this use-case. > > Each container runs in its own subvolume, and I use quotas to stop one > container exhausting the disk space of the others. > > df shows the total space + free space for the entire filesystem - which > is confusing for users within the containers. Worse - they don't have > any way of knowing how much quota they actually have left (other than du > -xs / vs. whatever I've told them). > > It'd be really nice if running e.g. statfs("/home", ...) within a > container could be made to return the qgroup usage + limit for the path > (e.g. by passing in an option at mount time - such as qgroup level > maybe?) , instead of the global filesystem data in f_bfree f_blocks etc. XFS does this with directory tree quotas. It was implmented 10 years ago or so, IIRC... Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: relocation: Fix NULL pointer access and leaking qgroups in btrfs_recover_relocation
Hi Qu, On 10/06/2016 03:03 AM, Qu Wenruo wrote: > We fixed balance qgroup leaking by introducing > qgroup_fix_relocated_data_extents(), but it only covers normal balance > routine well. > > For case btrfs_recover_relocation() routine, since rc->stage or > rc->data_inode are not initialized, we either skip > qgroup_fix_relocated_data_extents() or just cause NULL pointer access. > > Since we skip qgroup_fix_relocated_data_extents() in > btrfs_recover_relocation(), we will still leak qgroup numbers for that > routine. > > In the fix, we won't use data_inode any longer, as at the timing of the > qgroup fix, noone will modify data_reloc tree any longer, so path > locking should be good enough. > > And add check against rc->merge_reloc_tree, so we can detect if we are > in btrfs_recover_relocation() routine and continue qgroup fix. While testing this patch, I realized the original problem of qgroup values being incorrect after a balance, has returned... or probably was not solved completely. I tried it with the shell script sent sometime back. The script fails when I checkout fix df2c95f33e0a2 as well. I remember it did not appear when I tested it last, so it is possible that some other change has affected this. Would recommend to hold off this until the balance issue is not completely fixed. For reference, the test script is: #!/bin/bash -x MNT="/mnt" DEV="/dev/vdb" mkfs.btrfs -f $DEV mount -t btrfs $DEV $MNT mkdir $MNT/snaps echo "populate $MNT with some data" #cp -a /usr/share/fonts $MNT/ cp -a /usr/ $MNT/ & for i in `seq -w 0 8`; do S="$MNT/snaps/snap$i" echo "create and populate $S" btrfs su snap $MNT $S; cp -a /boot $S; done; #let the cp from above finish wait btrfs fi sync $MNT btrfs quota enable $MNT btrfs quota rescan -w $MNT btrfs qg show $MNT umount $MNT mount -t btrfs $DEV $MNT time btrfs balance start --full-balance $MNT umount $MNT btrfsck $DEV > > Reported-by: Goldwyn Rodrigues > Signed-off-by: Qu Wenruo > --- > fs/btrfs/relocation.c | 21 + > 1 file changed, 13 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index c0c13dc..f250187 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -3946,8 +3946,7 @@ static int qgroup_fix_relocated_data_extents(struct > btrfs_trans_handle *trans, >struct reloc_control *rc) > { > struct btrfs_fs_info *fs_info = rc->extent_root->fs_info; > - struct inode *inode = rc->data_inode; > - struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root; > + struct btrfs_root *data_reloc_root; > struct btrfs_path *path; > struct btrfs_key key; > int ret = 0; > @@ -3956,18 +3955,28 @@ static int qgroup_fix_relocated_data_extents(struct > btrfs_trans_handle *trans, > return 0; > > /* > + * For normal balance routine, merge_reloc_tree flag is always cleared > + * before commit trans. > + * For recover relocation routine, merge_reloc_tree is always 1, we need > + * to continue anyway. > + * >* Only for stage where we update data pointers the qgroup fix is >* valid. >* For MOVING_DATA stage, we will miss the timing of swapping tree >* blocks, and won't fix it. >*/ > - if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found)) > + if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found) && > + rc->merge_reloc_tree == 0) > return 0; > > + data_reloc_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID); > + if (IS_ERR(data_reloc_root)) > + return PTR_ERR(data_reloc_root); > + > path = btrfs_alloc_path(); > if (!path) > return -ENOMEM; > - key.objectid = btrfs_ino(inode); > + key.objectid = BTRFS_FIRST_FREE_OBJECTID + 1; > key.type = BTRFS_EXTENT_DATA_KEY; > key.offset = 0; > > @@ -3975,13 +3984,10 @@ static int qgroup_fix_relocated_data_extents(struct > btrfs_trans_handle *trans, > if (ret < 0) > goto out; > > - lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1); > while (1) { > struct btrfs_file_extent_item *fi; > > btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); > - if (key.objectid > btrfs_ino(inode)) > - break; > if (key.type != BTRFS_EXTENT_DATA_KEY) > goto next; > fi = btrfs_item_ptr(path->nodes[0], path->slots[0], > @@ -4004,7 +4010,6 @@ next: > break; > } > } > - unlock_extent(&BTRFS_I(inode)->io_tree, 0 , (u64)-1); > out: > btrfs_free_path(path); > return ret; > -- Goldwyn -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.or
Re: Problem with btrfs_create() getting EBUSY; cause is stale highest_objectid at mount - kernel.org bug 173001
Dave Olson wrote: > Dave Olson wrote: > > > The full details are in > > https://bugzilla.kernel.org/show_bug.cgi?id=173001 > > > > Basicly, logrotate has rotated a file to a new name, tries to open a new > > file with the original name, and gets EBUSY. The file is not created. The cause is that the highest_objectid field in the btrfs_root structure is set incorrectly (an old stale value) at filesystem mount in some cases. See the writeup in the bug for more details. I'm guessing that some of the journal information is not flushed to disk after btrfs_create(), even when the filesystem is mounted with flushoncommit, so if a powercycle is done within a few seconds of the file being created, there is a good chance that this problem will occur on a subsequent reboot. btrfs check (up through 4.7.3) does not seem to detect this stale condition. Dave Olson ol...@cumulusnetworks.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: image: fix compiler warning
On Wed, Oct 05, 2016 at 05:07:48PM +0900, Tsutomu Itoh wrote: > In v4.8-rc1, gcc 5.3.1 gives following warning. Fixed it. > > [CC] btrfs-image.o > btrfs-image.c: In function 'flush_pending': > btrfs-image.c:708:17: warning: 'start' may be used uninitialized in this > function [-Wmaybe-uninitialized] > header->bytenr = cpu_to_le64(start); > ^ > btrfs-image.c:927:6: note: 'start' was declared here > u64 start; > ^ So the patch makes the compiler warning go away, but is the code correct? AFAICS, the warning points to the case where flush_pending is called with done=1 (from create_metadump) and there's zero md->pending_size . Are you sure this is an expected state and that the function can proceed with state = 0 ? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs-progs: subvol_uuid_search: Return error code on memory allocation failure
On Wed, Oct 05, 2016 at 06:33:12PM +0530, Prasanth K S R wrote: > From: Prasanth K S R > > This commit fixes coverity defect CID 1328695. > > Signed-off-by: Prasanth K S R > --- > send-utils.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/send-utils.c b/send-utils.c > index a85fa08..6f80b6f 100644 > --- a/send-utils.c > +++ b/send-utils.c > @@ -486,6 +486,11 @@ struct subvol_info *subvol_uuid_search(struct > subvol_uuid_search *s, > info->path = strdup(path); > } else { > info->path = malloc(PATH_MAX); > + if (!info->path) { > + fprintf(stderr, "Memory allocation failed\n"); Please use the error() helper (does not need \n terminated string). Also, would be grat to convert subvol_uuid_search to return ERR_PTR style value (which means to conver all callers). Now we don't see why exactly it fails and interpret it as nonexistence of the uuid. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: Fix warning_trace compile error if backtrace is disabled
On Thu, Oct 06, 2016 at 04:47:19PM +0800, Qu Wenruo wrote: > If we disable backtrace, btrfs-progs can't be compiled since we don't > have warning_trace defined. > > Fix by move it out of #ifndef BTRFS_DISABLE_BACKTRACE block. But now this breaks with backtrace, eg: kerncompat.h: In function ‘warning_trace’: kerncompat.h:91:3: warning: implicit declaration of function ‘print_trace’ [-Wimplicit-function-declaration] print_trace(); ^~~ kerncompat.h: At top level: kerncompat.h:97:20: warning: conflicting types for ‘print_trace’ static inline void print_trace(void) ^~~ kerncompat.h:97:20: error: static declaration of ‘print_trace’ follows non-static declaration kerncompat.h:91:3: note: previous implicit declaration of ‘print_trace’ was here print_trace(); I've fixed it and will add a script to cycle through common build configuration so we can automate that a bit. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Making statfs return qgroup info (for container use case etc.)
Hello, I use btrfs for containers (e.g. full OS style containers using lxc/lxd etc. rather than application containers). btrfs de-duplication and send/receive are very useful features for this use-case. Each container runs in its own subvolume, and I use quotas to stop one container exhausting the disk space of the others. df shows the total space + free space for the entire filesystem - which is confusing for users within the containers. Worse - they don't have any way of knowing how much quota they actually have left (other than du -xs / vs. whatever I've told them). It'd be really nice if running e.g. statfs("/home", ...) within a container could be made to return the qgroup usage + limit for the path (e.g. by passing in an option at mount time - such as qgroup level maybe?) , instead of the global filesystem data in f_bfree f_blocks etc. A useful tweak might be to only return the qgroup info within non-root user namespaces, or perhaps just if the root UID != 0 (since the normal use case for this is unprivileged containers), but I've no idea how viable that idea is. Yet another idea - (since btrfs always returns 0 for f_files - used for inodes by other FS), then perhaps having the option of returning quota information there would be useful (or maybe return quota referenced size info in f_blocks and f_bfree, and exclusive size info in f_files and f_free)? Googling shows that this question has come up briefly on the #btrfs IRC channel in the past. Does sound like something that could be added to btrfs? What I need to do currently only works with zfs, but I don't really want to go there if I can avoid it... ref: search for btrfs in this web page: https://www.stgraber.org/2016/03/26/lxd-2-0-resource-control-412/ ... the shown working example is on zfs... Tim. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] btrfs-progs: tests: make the ioctl-test actually useful
Enhance the test to verify ioctl uniqueness, compare the defined values against the hardcoded expected values, and take care of the 32bit/64bit compatibility workarounds. Signed-off-by: David Sterba --- Makefile.in | 27 +- ioctl-test.c | 266 +-- 2 files changed, 263 insertions(+), 30 deletions(-) diff --git a/Makefile.in b/Makefile.in index 983b8b9a6193..48ff54ca60f2 100644 --- a/Makefile.in +++ b/Makefile.in @@ -387,9 +387,30 @@ quick-test: $(objects) $(libs) quick-test.o @echo "[LD] $@" $(Q)$(CC) $(CFLAGS) -o quick-test $(objects) $(libs) quick-test.o $(LDFLAGS) $(LIBS) -ioctl-test: $(objects) $(libs) ioctl-test.o - @echo "[LD] $@" - $(Q)$(CC) $(CFLAGS) -o ioctl-test $(objects) $(libs) ioctl-test.o $(LDFLAGS) $(LIBS) +ioctl-test-64.o: ioctl-test.c ioctl.h kerncompat.h ctree.h + @echo "[CC64] $@" + $(Q)$(CC) $(CFLAGS) -m64 -c $< -o $@ + +ioctl-test-32.o: ioctl-test.c ioctl.h kerncompat.h ctree.h + @echo "[CC32] $@" + $(Q)$(CC) $(CFLAGS) -m32 -c $< -o $@ + +ioctl-test-32: ioctl-test-32.o + @echo "[LD32] $@" + $(Q)$(CC) $(CFLAGS) -m32 -o $@ $< $(LDFLAGS) + @echo " ?[PAHOLE] $@" + -$(Q)pahole $@ > $@.pahole + +ioctl-test-64: ioctl-test-64.o + @echo "[LD64] $@" + $(Q)$(CC) $(CFLAGS) -m64 -o $@ $< $(LDFLAGS) + @echo " ?[PAHOLE] $@" + -$(Q)pahole $@ > $@.pahole + +test-ioctl: ioctl-test-32 ioctl-test-64 + @echo "[TEST/ioctl]" + @$(Q)./ioctl-test-32 > ioctl-test-32.log + @$(Q)./ioctl-test-64 > ioctl-test-64.log send-test: $(objects) $(libs) send-test.o @echo "[LD] $@" diff --git a/ioctl-test.c b/ioctl-test.c index 54fc013584e3..26b3c68df606 100644 --- a/ioctl-test.c +++ b/ioctl-test.c @@ -1,37 +1,249 @@ +/* + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. + */ + +#include "kerncompat.h" #include #include -#include "kerncompat.h" + #include "ioctl.h" +#include "ctree.h" + +#define LIST_32_COMPAT \ + ONE(BTRFS_IOC_SET_RECEIVED_SUBVOL_32) + +#define LIST_64_COMPAT \ + ONE(BTRFS_IOC_SEND_64) + +#define LIST_BASE \ + ONE(BTRFS_IOC_SNAP_CREATE) \ + ONE(BTRFS_IOC_DEFRAG) \ + ONE(BTRFS_IOC_RESIZE) \ + ONE(BTRFS_IOC_SCAN_DEV) \ + ONE(BTRFS_IOC_TRANS_START) \ + ONE(BTRFS_IOC_TRANS_END)\ + ONE(BTRFS_IOC_SYNC) \ + ONE(BTRFS_IOC_CLONE)\ + ONE(BTRFS_IOC_ADD_DEV) \ + ONE(BTRFS_IOC_RM_DEV) \ + ONE(BTRFS_IOC_BALANCE) \ + ONE(BTRFS_IOC_CLONE_RANGE) \ + ONE(BTRFS_IOC_SUBVOL_CREATE)\ + ONE(BTRFS_IOC_SNAP_DESTROY) \ + ONE(BTRFS_IOC_DEFRAG_RANGE) \ + ONE(BTRFS_IOC_TREE_SEARCH) \ + ONE(BTRFS_IOC_TREE_SEARCH_V2) \ + ONE(BTRFS_IOC_INO_LOOKUP) \ + ONE(BTRFS_IOC_DEFAULT_SUBVOL) \ + ONE(BTRFS_IOC_SPACE_INFO) \ + ONE(BTRFS_IOC_START_SYNC) \ + ONE(BTRFS_IOC_WAIT_SYNC)\ + ONE(BTRFS_IOC_SNAP_CREATE_V2) \ + ONE(BTRFS_IOC_SUBVOL_CREATE_V2) \ + ONE(BTRFS_IOC_SUBVOL_GETFLAGS) \ + ONE(BTRFS_IOC_SUBVOL_SETFLAGS) \ + ONE(BTRFS_IOC_SCRUB)\ + ONE(BTRFS_IOC_SCRUB_CANCEL) \ + ONE(BTRFS_IOC_SCRUB_PROGRESS) \ + ONE(BTRFS_IOC_DEV_INFO) \ + ONE(BTRFS_IOC_FS_INFO) \ + ONE(BTRFS_IOC_BALANCE_V2) \ + ONE(BTRFS_IOC_BALANCE_CTL) \ + ONE(BTRFS_IOC_BALANCE_PROGRESS) \ + ONE(BTRFS_IOC_INO_PATHS)\ + ONE(BTRFS_IOC_LOGICAL_INO) \ + ONE(BTRFS_IOC_SET_RECEIVED_SUBVOL) \ + ONE(BTRFS_IOC_SEND) \ + ONE(BTRFS_IOC_DEVICES_READY)\ + ONE(BTRFS_IOC_QUOTA_CTL)\ + ONE(BTRFS_IOC_QGROUP_ASSIGN)\ + ONE(BTRFS_IOC_QGROUP_CREATE)\
[PATCH 3/4] btrfs-progs: ioctl: add 64bit compat for SEND
The ioctl value of SEND will be different on 32bit userspace and 64bit kernel due to different pointer type width, that unfortunatelly made it into the structure definition. To maintain backward compatibility, we must do it in the 64bit->32bit way, because we don't have the kernel side workardound like SET_RECEIVED_SUBVOL has. Changing value of SEND would then break existing users of the raw ioctl. The compatibility structure and ioctl should not be used, exists for documentation, and testing. Signed-off-by: David Sterba --- ioctl.h | 31 +-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/ioctl.h b/ioctl.h index 6baa51ab1724..f7233da90637 100644 --- a/ioctl.h +++ b/ioctl.h @@ -635,8 +635,8 @@ struct btrfs_ioctl_send_args { __u64 reserved[4]; /* in */ } __attribute__((packed)); /* - * Size of structure depends on pointer width, was not caught. Kernel handles - * pointer width differences transparently + * Size of structure depends on pointer width, was not caught in the early + * days. Kernel handles pointer width differences transparently. */ BUILD_ASSERT(sizeof(__u64 *) == 8 ? sizeof(struct btrfs_ioctl_send_args) == 72 @@ -644,6 +644,28 @@ BUILD_ASSERT(sizeof(__u64 *) == 8 ? sizeof(struct btrfs_ioctl_send_args) == 68 : 0)); +/* + * Different pointer width leads to structure size change. Kernel should accept + * both ioctl values (derived from the structures) for backward compatibility. + * Size of this structure is same on 32bit and 64bit though. + * + * NOTE: do not use in your code, this is for testing only + */ +struct btrfs_ioctl_send_args_64 { + __s64 send_fd; /* in */ + __u64 clone_sources_count; /* in */ + union { + __u64 __user *clone_sources;/* in */ + __u64 __clone_sources_alignment; + }; + __u64 parent_root; /* in */ + __u64 flags;/* in */ + __u64 reserved[4]; /* in */ +} __attribute__((packed)); +BUILD_ASSERT(sizeof(struct btrfs_ioctl_send_args_64) == 72); + +#define BTRFS_IOC_SEND_64_COMPAT_DEFINED 1 + /* Error codes as returned by the kernel */ enum btrfs_err_code { notused, @@ -761,6 +783,11 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code) struct btrfs_ioctl_received_subvol_args_32) #endif +#ifdef BTRFS_IOC_SEND_64_COMPAT_DEFINED +#define BTRFS_IOC_SEND_64 _IOW(BTRFS_IOCTL_MAGIC, 38, \ + struct btrfs_ioctl_send_args_64) +#endif + #define BTRFS_IOC_SEND _IOW(BTRFS_IOCTL_MAGIC, 38, struct btrfs_ioctl_send_args) #define BTRFS_IOC_DEVICES_READY _IOR(BTRFS_IOCTL_MAGIC, 39, \ struct btrfs_ioctl_vol_args) -- 2.10.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] btrfs-progs: ioctl: pack structures
Add the packed attribute to several structures so we avoid any accidental size changes due to compiler-dependent padding. Expected alignment is 8 bytes, padded where needed. Signed-off-by: David Sterba --- ioctl.h | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/ioctl.h b/ioctl.h index a7235c00c1a0..d0c06657f0c0 100644 --- a/ioctl.h +++ b/ioctl.h @@ -149,7 +149,7 @@ struct btrfs_ioctl_scrub_args { struct btrfs_scrub_progress progress; /* out */ /* pad to 1k */ __u64 unused[(1024-32-sizeof(struct btrfs_scrub_progress))/8]; -}; +} __attribute__((packed)); BUILD_ASSERT(sizeof(struct btrfs_ioctl_scrub_args) == 1024); #define BTRFS_IOCTL_DEV_REPLACE_CONT_READING_FROM_SRCDEV_MODE_ALWAYS 0 @@ -160,7 +160,7 @@ struct btrfs_ioctl_dev_replace_start_params { * above */ __u8 srcdev_name[BTRFS_DEVICE_PATH_NAME_MAX + 1]; /* in */ __u8 tgtdev_name[BTRFS_DEVICE_PATH_NAME_MAX + 1]; /* in */ -}; +} __attribute__((packed, aligned (8))); BUILD_ASSERT(sizeof(struct btrfs_ioctl_dev_replace_start_params) == 2072); #define BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED0 @@ -175,7 +175,7 @@ struct btrfs_ioctl_dev_replace_status_params { __u64 time_stopped; /* out, seconds since 1-Jan-1970 */ __u64 num_write_errors; /* out */ __u64 num_uncorrectable_read_errors;/* out */ -}; +} __attribute__((packed)); BUILD_ASSERT(sizeof(struct btrfs_ioctl_dev_replace_status_params) == 48); #define BTRFS_IOCTL_DEV_REPLACE_CMD_START 0 @@ -556,7 +556,7 @@ BUILD_ASSERT(sizeof(struct btrfs_ioctl_qgroup_create_args) == 16); struct btrfs_ioctl_timespec { __u64 sec; __u32 nsec; -}; +} __attribute__((packed, aligned (8))); struct btrfs_ioctl_received_subvol_args { charuuid[BTRFS_UUID_SIZE]; /* in */ @@ -601,7 +601,7 @@ struct btrfs_ioctl_send_args { __u64 parent_root; /* in */ __u64 flags;/* in */ __u64 reserved[4]; /* in */ -}; +} __attribute__((packed)); /* * Size of structure depends on pointer width, was not caught. Kernel handles * pointer width differences transparently -- 2.10.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] btrfs-progs: ioctl: add 32bit compat for SET_RECEIVED_SUBVOL
The ioctl value of SET_RECEIVED_SUBVOL will be different on 32bit userspace and 64bit kernel. This is transparently handled on the kernel side. We now define the same structure so we can verify the ioctl value. The defined structure and ioctl should not be used. It is exists only for testing purposes, documenting the mess and as a warning for the future. Signed-off-by: David Sterba --- ioctl.h | 38 ++ 1 file changed, 38 insertions(+) diff --git a/ioctl.h b/ioctl.h index d0c06657f0c0..6baa51ab1724 100644 --- a/ioctl.h +++ b/ioctl.h @@ -570,6 +570,38 @@ struct btrfs_ioctl_received_subvol_args { BUILD_ASSERT(sizeof(struct btrfs_ioctl_received_subvol_args) == 200); /* + * If we have a 32-bit userspace and 64-bit kernel, then the UAPI + * structures are incorrect, as the timespec structure from userspace + * is 4 bytes too small. We define these alternatives here for backward + * compatibility, the kernel understands both values. + */ + +/* + * Structure size is different on 32bit and 64bit, has some padding if the + * structure is embedded. Packing makes sure the size is same on both, but will + * be misaligned on 64bit. + * + * NOTE: do not use in your code, this is for testing only + */ +struct btrfs_ioctl_timespec_32 { + __u64 sec; + __u32 nsec; +} __attribute__ ((__packed__)); + +struct btrfs_ioctl_received_subvol_args_32 { + charuuid[BTRFS_UUID_SIZE]; /* in */ + __u64 stransid; /* in */ + __u64 rtransid; /* out */ + struct btrfs_ioctl_timespec_32 stime; /* in */ + struct btrfs_ioctl_timespec_32 rtime; /* out */ + __u64 flags; /* in */ + __u64 reserved[16]; /* in */ +} __attribute__ ((__packed__)); +BUILD_ASSERT(sizeof(struct btrfs_ioctl_received_subvol_args_32) == 192); + +#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32_COMPAT_DEFINED 1 + +/* * Caller doesn't want file data in the send stream, even if the * search of clone sources doesn't find an extent. UPDATE_EXTENT * commands will be sent instead of WRITE commands. @@ -723,6 +755,12 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code) struct btrfs_ioctl_logical_ino_args) #define BTRFS_IOC_SET_RECEIVED_SUBVOL _IOWR(BTRFS_IOCTL_MAGIC, 37, \ struct btrfs_ioctl_received_subvol_args) + +#ifdef BTRFS_IOC_SET_RECEIVED_SUBVOL_32_COMPAT_DEFINED +#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32 _IOWR(BTRFS_IOCTL_MAGIC, 37, \ + struct btrfs_ioctl_received_subvol_args_32) +#endif + #define BTRFS_IOC_SEND _IOW(BTRFS_IOCTL_MAGIC, 38, struct btrfs_ioctl_send_args) #define BTRFS_IOC_DEVICES_READY _IOR(BTRFS_IOCTL_MAGIC, 39, \ struct btrfs_ioctl_vol_args) -- 2.10.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] Btrfs-progs 4.8, fix build on 32bit, ioctl cleanups
So this series should fix the 32bit build problems, tested on gcc 6.2.1. I've added test of checking the ioctl numbers: "make test-ioctl" that will build 32bit and 64bit versions (needs compiler for both). The ioctl compatibility reasoning is in the changelogs. The goal is to keep everything working (send, receive), so there's no visible change, but more testing would be welcome of course. The patchset is now in devel, on top of 4.8. David Sterba (4): btrfs-progs: ioctl: pack structures btrfs-progs: ioctl: add 32bit compat for SET_RECEIVED_SUBVOL btrfs-progs: ioctl: add 64bit compat for SEND btrfs-progs: tests: make the ioctl-test actually useful Makefile.in | 27 +- ioctl-test.c | 266 +-- ioctl.h | 79 -- 3 files changed, 335 insertions(+), 37 deletions(-) -- 2.10.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] exportfs: be careful to only return expected errors.
On Thu, Oct 06, 2016 at 05:39:24PM +1100, NeilBrown wrote: > On Thu, Aug 04 2016, NeilBrown wrote: > > > > > > > When nfsd calls fh_to_dentry, it expect ESTALE or ENOMEM as errors. > > In particular it can be tempting to return ENOENT, but this is not > > handled well by nfsd. > > > > Rather than requiring strict adherence to error code code filesystems, > > treat all unexpected error codes the same as ESTALE. This is safest. > > > > Signed-off-by: NeilBrown > > --- > > > > I didn't add a dprintk for unexpected error messages, partly > > because dprintk isn't usable in exportfs. I could have used pr_debug() > > but I really didn't see much value. > > > > This has been tested together with the btrfs change, and it restores > > correct functionality. > > > > Thanks, > > NeilBrown > > > Hi Bruce, > I notice that this patch isn't in -next, but the btrfs change which > motivated it is. > That means, unless there is some other work around somewhere, btrfs > might start having problems with nfs export. Whoops, I wonder what happened. Looking back Oh, I think I set it aside pending a response from Christoph, but I don't think that's really necessary. > Can we get this patch into 4.9-rc?? > > Or has someone fixed it a different way? No, I'll just apply. I need to send a pull request soon, I just have to make up my mind on COPY first. --b. > Thanks, > NeilBrown > > > > > > fs/exportfs/expfs.c | 10 ++ > > include/linux/exportfs.h | 13 +++-- > > 2 files changed, 13 insertions(+), 10 deletions(-) > > > > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c > > index 207ba8d627ca..a4b531be9168 100644 > > --- a/fs/exportfs/expfs.c > > +++ b/fs/exportfs/expfs.c > > @@ -428,10 +428,10 @@ struct dentry *exportfs_decode_fh(struct vfsmount > > *mnt, struct fid *fid, > > if (!nop || !nop->fh_to_dentry) > > return ERR_PTR(-ESTALE); > > result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type); > > - if (!result) > > - result = ERR_PTR(-ESTALE); > > - if (IS_ERR(result)) > > - return result; > > + if (PTR_ERR(result) == -ENOMEM) > > + return ERR_CAST(result); > > + if (IS_ERR_OR_NULL(result)) > > + return ERR_PTR(-ESTALE); > > > > if (d_is_dir(result)) { > > /* > > @@ -541,6 +541,8 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, > > struct fid *fid, > > > > err_result: > > dput(result); > > + if (err != -ENOMEM) > > + err = -ESTALE; > > return ERR_PTR(err); > > } > > EXPORT_SYMBOL_GPL(exportfs_decode_fh); > > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h > > index b03c0625fa6e..5ab958cdc50b 100644 > > --- a/include/linux/exportfs.h > > +++ b/include/linux/exportfs.h > > @@ -157,12 +157,13 @@ struct fid { > > *@fh_to_dentry is given a &struct super_block (@sb) and a file handle > > *fragment (@fh, @fh_len). It should return a &struct dentry which > > refers > > *to the same file that the file handle fragment refers to. If it > > cannot, > > - *it should return a %NULL pointer if the file was found but no > > acceptable > > - *&dentries were available, or an %ERR_PTR error code indicating why it > > - *couldn't be found (e.g. %ENOENT or %ENOMEM). Any suitable dentry > > can be > > - *returned including, if necessary, a new dentry created with > > d_alloc_root. > > - *The caller can then find any other extant dentries by following the > > - *d_alias links. > > + *it should return a %NULL pointer if the file cannot be found, or an > > + *%ERR_PTR error code of %ENOMEM if a memory allocation failure > > occurred. > > + *Any other error code is treated like %NULL, and will cause an > > %ESTALE error > > + *for callers of exportfs_decode_fh(). > > + *Any suitable dentry can be returned including, if necessary, a new > > dentry > > + *created with d_alloc_root. The caller can then find any other extant > > + *dentries by following the d_alias links. > > * > > * fh_to_parent: > > *Same as @fh_to_dentry, except that it returns a pointer to the parent > > -- > > 2.9.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is is possible to submit binary image as fstest test case?
On Thu, Oct 06, 2016 at 04:12:56PM +0800, Qu Wenruo wrote: > Hi, > > Just as the title says, for some case(OK, btrfs again) we need to catch a > file system in special timing. > > In this specific case, we need to grab a btrfs image undergoing balancing, > just before the balance finished. > > Although we can use flakey to drop all write, we still don't have method to > catch the timing of the that transaction. > > > On the other hand, we can tweak our local kernel, adding msleep()/message > and dump the disk during the sleep. > And the image I dumped can easily trigger btrfs kernel and user-space bug. > > So I'm wondering if I can just upload a zipped raw image as part of the test > case? > Doesn't necessarily bother me one way or the other, but something we've done with XFS in such situations is introduce a DEBUG mode only sysfs tunable that delays certain infrastructure (log recovery in our case) to coordinate with test cases that try to reproduce such timing/racing problems. See test xfs/051 for an example.. Brian > Thanks, > Qu > > > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: Fix warning_trace compile error if backtrace is disabled
If we disable backtrace, btrfs-progs can't be compiled since we don't have warning_trace defined. Fix by move it out of #ifndef BTRFS_DISABLE_BACKTRACE block. Signed-off-by: Qu Wenruo --- kerncompat.h | 38 -- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/kerncompat.h b/kerncompat.h index ea04ef3..fba0bc1 100644 --- a/kerncompat.h +++ b/kerncompat.h @@ -72,6 +72,26 @@ #define ___token_glue(a,b,c) a ## b ## c #define BUILD_ASSERT(x)extern int __token_glue(compile_time_assert_,__LINE__,__COUNTER__)[1-2*!(x)] __attribute__((unused)) +static inline void warning_trace(const char *assertion, const char *filename, + const char *func, unsigned line, int val, + int trace) +{ + if (val) + return; + if (assertion) + fprintf(stderr, + "%s:%d: %s: Warning: assertion `%s` failed, value %d\n", + filename, line, func, assertion, val); + else + fprintf(stderr, + "%s:%d: %s: Warning: assertion failed, value %d.\n", + filename, line, func, val); +#ifndef BTRFS_DISABLE_BACKTRACE + if (trace) + print_trace(); +#endif +} + #ifndef BTRFS_DISABLE_BACKTRACE #define MAX_BACKTRACE 16 static inline void print_trace(void) @@ -99,24 +119,6 @@ static inline void assert_trace(const char *assertion, const char *filename, exit(1); } -static inline void warning_trace(const char *assertion, const char *filename, - const char *func, unsigned line, int val, - int trace) -{ - if (val) - return; - if (assertion) - fprintf(stderr, - "%s:%d: %s: Warning: assertion `%s` failed, value %d\n", - filename, line, func, assertion, val); - else - fprintf(stderr, - "%s:%d: %s: Warning: assertion failed, value %d.\n", - filename, line, func, val); - if (trace) - print_trace(); -} - #define BUG() assert_trace(NULL, __FILE__, __func__, __LINE__, 0) #else #define BUG() assert(0) -- 2.10.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: Fix stack overflow for checking qgroup on tree reloc tree
For tree reloc tree whose level is >= 2, the root node's parent will point to itself. In this case it will make btrfsck overflow its stack and cause segfault. While for tree reloc tree, it doesn't affect qgroup and kernel can handle it well. So add tree reloc tree check for qgroup-verify.c and fix the bug. Test case will follow soon after I make a minimal image for it. Current xz ziped image is still over 10M for a 512M fs. Signed-off-by: Qu Wenruo --- qgroup-verify.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/qgroup-verify.c b/qgroup-verify.c index df0e547..9a56f59 100644 --- a/qgroup-verify.c +++ b/qgroup-verify.c @@ -369,6 +369,11 @@ static int find_parent_roots(struct ulist *roots, u64 parent) if (ret < 0) goto out; } + } else if (ref->parent == ref->bytenr) { + /* +* Special loop case for tree reloc tree +*/ + ref->root = BTRFS_TREE_RELOC_OBJECTID; } else { ret = find_parent_roots(roots, ref->parent); if (ret < 0) @@ -578,6 +583,8 @@ static u64 resolve_one_root(u64 bytenr) if (ref->root) return ref->root; + if (ref->parent == bytenr) + return BTRFS_TREE_RELOC_OBJECTID; return resolve_one_root(ref->parent); } @@ -748,6 +755,9 @@ static int add_refs_for_implied(struct btrfs_fs_info *info, u64 bytenr, struct btrfs_root *root; struct btrfs_key key; + /* Tree reloc tree doesn't contribute qgroup, skip it */ + if (root_id == BTRFS_TREE_RELOC_OBJECTID) + return 0; key.objectid = root_id; key.type = BTRFS_ROOT_ITEM_KEY; key.offset = (u64)-1; -- 2.10.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Is is possible to submit binary image as fstest test case?
Hi, Just as the title says, for some case(OK, btrfs again) we need to catch a file system in special timing. In this specific case, we need to grab a btrfs image undergoing balancing, just before the balance finished. Although we can use flakey to drop all write, we still don't have method to catch the timing of the that transaction. On the other hand, we can tweak our local kernel, adding msleep()/message and dump the disk during the sleep. And the image I dumped can easily trigger btrfs kernel and user-space bug. So I'm wondering if I can just upload a zipped raw image as part of the test case? Thanks, Qu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: relocation: Fix NULL pointer access and leaking qgroups in btrfs_recover_relocation
We fixed balance qgroup leaking by introducing qgroup_fix_relocated_data_extents(), but it only covers normal balance routine well. For case btrfs_recover_relocation() routine, since rc->stage or rc->data_inode are not initialized, we either skip qgroup_fix_relocated_data_extents() or just cause NULL pointer access. Since we skip qgroup_fix_relocated_data_extents() in btrfs_recover_relocation(), we will still leak qgroup numbers for that routine. In the fix, we won't use data_inode any longer, as at the timing of the qgroup fix, noone will modify data_reloc tree any longer, so path locking should be good enough. And add check against rc->merge_reloc_tree, so we can detect if we are in btrfs_recover_relocation() routine and continue qgroup fix. Reported-by: Goldwyn Rodrigues Signed-off-by: Qu Wenruo --- fs/btrfs/relocation.c | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index c0c13dc..f250187 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -3946,8 +3946,7 @@ static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans, struct reloc_control *rc) { struct btrfs_fs_info *fs_info = rc->extent_root->fs_info; - struct inode *inode = rc->data_inode; - struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root; + struct btrfs_root *data_reloc_root; struct btrfs_path *path; struct btrfs_key key; int ret = 0; @@ -3956,18 +3955,28 @@ static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans, return 0; /* +* For normal balance routine, merge_reloc_tree flag is always cleared +* before commit trans. +* For recover relocation routine, merge_reloc_tree is always 1, we need +* to continue anyway. +* * Only for stage where we update data pointers the qgroup fix is * valid. * For MOVING_DATA stage, we will miss the timing of swapping tree * blocks, and won't fix it. */ - if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found)) + if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found) && + rc->merge_reloc_tree == 0) return 0; + data_reloc_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID); + if (IS_ERR(data_reloc_root)) + return PTR_ERR(data_reloc_root); + path = btrfs_alloc_path(); if (!path) return -ENOMEM; - key.objectid = btrfs_ino(inode); + key.objectid = BTRFS_FIRST_FREE_OBJECTID + 1; key.type = BTRFS_EXTENT_DATA_KEY; key.offset = 0; @@ -3975,13 +3984,10 @@ static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans, if (ret < 0) goto out; - lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1); while (1) { struct btrfs_file_extent_item *fi; btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); - if (key.objectid > btrfs_ino(inode)) - break; if (key.type != BTRFS_EXTENT_DATA_KEY) goto next; fi = btrfs_item_ptr(path->nodes[0], path->slots[0], @@ -4004,7 +4010,6 @@ next: break; } } - unlock_extent(&BTRFS_I(inode)->io_tree, 0 , (u64)-1); out: btrfs_free_path(path); return ret; -- 2.10.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: opencode chunk locking, remove helpers
On Wed, Oct 05, 2016 at 02:41:24PM -0400, Jeff Mahoney wrote: > On 10/5/16 8:23 AM, David Sterba wrote: > > The helpers are trivial and we don't use them consistently. > > This one is going to conflict with the root->fs_info removal patchset in > every chunk since the helper does nothing with the root. Otherwise, the > idea is sound. No problem, I'll refresh and resend after the fsinfo/root series. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full
Thanks Wang, i applied them both on top of vanilla v4.8 - i hope this is OK. Will report back what happens. Greets, Stefan Am 06.10.2016 um 05:04 schrieb Wang Xiaoguang: > Hi, > > On 09/29/2016 03:27 PM, Stefan Priebe - Profihost AG wrote: >> Am 29.09.2016 um 09:13 schrieb Wang Xiaoguang: > I found that compress sometime report ENOSPC error even in 4.8-rc8, > currently I cannot confirm that as i do not have anough space to test this without compression ;-( But yes i've compression enabled. >>> I might not get you, my poor english :) >>> You mean that you only get ENOSPC error when compression is enabled? >>> >>> And when compression is not enabled, you do not get ENOSPC error? >> I can't tell you. I cannot test with compression not enabled. I do not >> have anough free space on this disk. > I had just sent two patches to fix false enospc error for compression, > please have a try, they fix false enospc error in my test environment. > btrfs: fix false enospc for compression > btrfs: improve inode's outstanding_extents computation > > I apply these two patchs in linux upstream tree, the latest commit > is 41844e36206be90cd4d962ea49b0abc3612a99d0. > > Regards, > Xiaoguang Wang > >> > I'm trying to fix it. That sounds good but do you also get the BTRFS: space_info 4 has 18446742286429913088 free, is not full kernel messages on umount? if not you might have found another problem. >>> Yes, I seem similar messages, you can paste you whole dmesg info here. >> [ cut here ] >> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5790 >> btrfs_free_block_groups+0x346/0x430 [btrfs]() >> Modules linked in: netconsole xt_multiport iptable_filter ip_tables >> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm >> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan >> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop >> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov >> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit >> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd >> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid >> CPU: 2 PID: 5187 Comm: umount Tainted: G O 4.4.22+63-ph #1 >> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015 >> 880fda777d00 813b69c3 >> c067a099 880fda777d38 810821c6 >> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098 >> Call Trace: >> [] dump_stack+0x63/0x90 >> [] warn_slowpath_common+0x86/0xc0 >> [] warn_slowpath_null+0x1a/0x20 >> [] btrfs_free_block_groups+0x346/0x430 [btrfs] >> [] close_ctree+0x15d/0x330 [btrfs] >> [] btrfs_put_super+0x19/0x20 [btrfs] >> [] generic_shutdown_super+0x6f/0x100 >> [] kill_anon_super+0x12/0x20 >> [] btrfs_kill_super+0x16/0xa0 [btrfs] >> [] deactivate_locked_super+0x43/0x70 >> [] deactivate_super+0x5c/0x60 >> [] cleanup_mnt+0x3f/0x90 >> [] __cleanup_mnt+0x12/0x20 >> [] task_work_run+0x81/0xa0 >> [] exit_to_usermode_loop+0xb0/0xc0 >> [] syscall_return_slowpath+0xd4/0x130 >> [] int_ret_from_sys_call+0x25/0x8f >> ---[ end trace cee6ace13018e13e ]--- >> [ cut here ] >> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5791 >> btrfs_free_block_groups+0x365/0x430 [btrfs]() >> Modules linked in: netconsole xt_multiport iptable_filter ip_tables >> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm >> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan >> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop >> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov >> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit >> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd >> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid >> CPU: 2 PID: 5187 Comm: umount Tainted: G W O 4.4.22+63-ph #1 >> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015 >> 880fda777d00 813b69c3 >> c067a099 880fda777d38 810821c6 >> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098 >> Call Trace: >> [] dump_stack+0x63/0x90 >> [] warn_slowpath_common+0x86/0xc0 >> [] warn_slowpath_null+0x1a/0x20 >> [] btrfs_free_block_groups+0x365/0x430 [btrfs] >> [] close_ctree+0x15d/0x330 [btrfs] >> [] btrfs_put_super+0x19/0x20 [btrfs] >> [] generic_shutdown_super+0x6f/0x100 >> [] kill_anon_super+0x12/0x20 >> [] btrfs_kill_super+0x16/0xa0 [btrfs] >> [] deactivate_locked_super+0x43/0x70 >> [] deactivate_super+0x5c/0x60 >> [] cleanup_mnt+0x3f/0x90 >> [] __cleanup_mnt+0x12/0x20 >> [] task_work_run+0x81/0xa0 >> [] exit_to_usermode_loop+0xb0/0xc0 >> [] syscall_return_slowpath+0xd4/0x130 >> [] int_ret_from_sys_call+0x25/0x8f >> ---[ end trace cee6ace1
Re: BTRFS: space_info 4 has 18446742286429913088 free, is not full
Thanks Wang, i applied them both on top of vanilla v4.8 - i hope this is OK. Will report back what happens. Greets, Stefan Am 06.10.2016 um 05:04 schrieb Wang Xiaoguang: > Hi, > > On 09/29/2016 03:27 PM, Stefan Priebe - Profihost AG wrote: >> Am 29.09.2016 um 09:13 schrieb Wang Xiaoguang: > I found that compress sometime report ENOSPC error even in 4.8-rc8, > currently I cannot confirm that as i do not have anough space to test this without compression ;-( But yes i've compression enabled. >>> I might not get you, my poor english :) >>> You mean that you only get ENOSPC error when compression is enabled? >>> >>> And when compression is not enabled, you do not get ENOSPC error? >> I can't tell you. I cannot test with compression not enabled. I do not >> have anough free space on this disk. > I had just sent two patches to fix false enospc error for compression, > please have a try, they fix false enospc error in my test environment. > btrfs: fix false enospc for compression > btrfs: improve inode's outstanding_extents computation > > I apply these two patchs in linux upstream tree, the latest commit > is 41844e36206be90cd4d962ea49b0abc3612a99d0. > > Regards, > Xiaoguang Wang > >> > I'm trying to fix it. That sounds good but do you also get the BTRFS: space_info 4 has 18446742286429913088 free, is not full kernel messages on umount? if not you might have found another problem. >>> Yes, I seem similar messages, you can paste you whole dmesg info here. >> [ cut here ] >> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5790 >> btrfs_free_block_groups+0x346/0x430 [btrfs]() >> Modules linked in: netconsole xt_multiport iptable_filter ip_tables >> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm >> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan >> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop >> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov >> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit >> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd >> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid >> CPU: 2 PID: 5187 Comm: umount Tainted: G O 4.4.22+63-ph #1 >> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015 >> 880fda777d00 813b69c3 >> c067a099 880fda777d38 810821c6 >> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098 >> Call Trace: >> [] dump_stack+0x63/0x90 >> [] warn_slowpath_common+0x86/0xc0 >> [] warn_slowpath_null+0x1a/0x20 >> [] btrfs_free_block_groups+0x346/0x430 [btrfs] >> [] close_ctree+0x15d/0x330 [btrfs] >> [] btrfs_put_super+0x19/0x20 [btrfs] >> [] generic_shutdown_super+0x6f/0x100 >> [] kill_anon_super+0x12/0x20 >> [] btrfs_kill_super+0x16/0xa0 [btrfs] >> [] deactivate_locked_super+0x43/0x70 >> [] deactivate_super+0x5c/0x60 >> [] cleanup_mnt+0x3f/0x90 >> [] __cleanup_mnt+0x12/0x20 >> [] task_work_run+0x81/0xa0 >> [] exit_to_usermode_loop+0xb0/0xc0 >> [] syscall_return_slowpath+0xd4/0x130 >> [] int_ret_from_sys_call+0x25/0x8f >> ---[ end trace cee6ace13018e13e ]--- >> [ cut here ] >> WARNING: CPU: 2 PID: 5187 at fs/btrfs/extent-tree.c:5791 >> btrfs_free_block_groups+0x365/0x430 [btrfs]() >> Modules linked in: netconsole xt_multiport iptable_filter ip_tables >> x_tables 8021q garp bonding x86_pkg_temp_thermal coretemp kvm_intel kvm >> irqbypass sb_edac crc32_pclmul edac_core i2c_i801 i40e(O) vxlan >> ip6_udp_tunnel udp_tunnel shpchp ipmi_si ipmi_msghandler button loop >> btrfs dm_mod raid10 raid0 multipath linear raid456 async_raid6_recov >> async_memcpy async_pq async_xor async_tx xor raid6_pq igb i2c_algo_bit >> i2c_core usbhid raid1 md_mod xhci_pci sg ehci_pci xhci_hcd ehci_hcd >> sd_mod ahci usbcore ptp libahci usb_common pps_core aacraid >> CPU: 2 PID: 5187 Comm: umount Tainted: G W O 4.4.22+63-ph #1 >> Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015 >> 880fda777d00 813b69c3 >> c067a099 880fda777d38 810821c6 >> 880074bf0a00 88103c10c088 88103c10c000 88103c10c098 >> Call Trace: >> [] dump_stack+0x63/0x90 >> [] warn_slowpath_common+0x86/0xc0 >> [] warn_slowpath_null+0x1a/0x20 >> [] btrfs_free_block_groups+0x365/0x430 [btrfs] >> [] close_ctree+0x15d/0x330 [btrfs] >> [] btrfs_put_super+0x19/0x20 [btrfs] >> [] generic_shutdown_super+0x6f/0x100 >> [] kill_anon_super+0x12/0x20 >> [] btrfs_kill_super+0x16/0xa0 [btrfs] >> [] deactivate_locked_super+0x43/0x70 >> [] deactivate_super+0x5c/0x60 >> [] cleanup_mnt+0x3f/0x90 >> [] __cleanup_mnt+0x12/0x20 >> [] task_work_run+0x81/0xa0 >> [] exit_to_usermode_loop+0xb0/0xc0 >> [] syscall_return_slowpath+0xd4/0x130 >> [] int_ret_from_sys_call+0x25/0x8f >> ---[ end trace cee6ace1
Re: [PATCH v3.1 2/3] btrfs: relocation: Fix leaking qgroups numbers on data extents
At 10/05/2016 09:53 AM, Goldwyn Rodrigues wrote: On 08/14/2016 09:36 PM, Qu Wenruo wrote: This patch fixes a REGRESSION introduced in 4.2, caused by the big quota rework. When balancing data extents, qgroup will leak all its numbers for relocated data extents. The relocation is done in the following steps for data extents: 1) Create data reloc tree and inode 2) Copy all data extents to data reloc tree And commit transaction 3) Create tree reloc tree(special snapshot) for any related subvolumes 4) Replace file extent in tree reloc tree with new extents in data reloc tree And commit transaction 5) Merge tree reloc tree with original fs, by swapping tree blocks For 1)~4), since tree reloc tree and data reloc tree doesn't count to qgroup, everything is OK. But for 5), the swapping of tree blocks will only info qgroup to track metadata extents. If metadata extents contain file extents, qgroup number for file extents will get lost, leading to corrupted qgroup accounting. The fix is, before commit transaction of step 5), manually info qgroup to track all file extents in data reloc tree. Since at commit transaction time, the tree swapping is done, and qgroup will account these data extents correctly. Cc: Mark Fasheh Reported-by: Mark Fasheh Reported-by: Filipe Manana Signed-off-by: Qu Wenruo Tested-by: Goldwyn Rodrigues --- fs/btrfs/relocation.c | 109 +++--- 1 file changed, 103 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index b26a5ae..27480ef 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -31,6 +31,7 @@ #include "async-thread.h" #include "free-space-cache.h" #include "inode-map.h" +#include "qgroup.h" /* * backref_node, mapping_node and tree_block start with this @@ -3916,6 +3917,90 @@ int prepare_to_relocate(struct reloc_control *rc) return 0; } +/* + * Qgroup fixer for data chunk relocation. + * The data relocation is done in the following steps + * 1) Copy data extents into data reloc tree + * 2) Create tree reloc tree(special snapshot) for related subvolumes + * 3) Modify file extents in tree reloc tree + * 4) Merge tree reloc tree with original fs tree, by swapping tree blocks + * + * The problem is, data and tree reloc tree are not accounted to qgroup, + * and 4) will only info qgroup to track tree blocks change, not file extents + * in the tree blocks. + * + * The good news is, related data extents are all in data reloc tree, so we + * only need to info qgroup to track all file extents in data reloc tree + * before commit trans. + */ +static int qgroup_fix_relocated_data_extents(struct btrfs_trans_handle *trans, +struct reloc_control *rc) +{ + struct btrfs_fs_info *fs_info = rc->extent_root->fs_info; + struct inode *inode = rc->data_inode; + struct btrfs_root *data_reloc_root = BTRFS_I(inode)->root; + struct btrfs_path *path; + struct btrfs_key key; + int ret = 0; + + if (!fs_info->quota_enabled) + return 0; + + /* +* Only for stage where we update data pointers the qgroup fix is +* valid. +* For MOVING_DATA stage, we will miss the timing of swapping tree +* blocks, and won't fix it. +*/ + if (!(rc->stage == UPDATE_DATA_PTRS && rc->extents_found)) + return 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + key.objectid = btrfs_ino(inode); + key.type = BTRFS_EXTENT_DATA_KEY; + key.offset = 0; + + ret = btrfs_search_slot(NULL, data_reloc_root, &key, path, 0, 0); + if (ret < 0) + goto out; + + lock_extent(&BTRFS_I(inode)->io_tree, 0, (u64)-1); + while (1) { + struct btrfs_file_extent_item *fi; + + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); + if (key.objectid > btrfs_ino(inode)) + break; + if (key.type != BTRFS_EXTENT_DATA_KEY) + goto next; + fi = btrfs_item_ptr(path->nodes[0], path->slots[0], + struct btrfs_file_extent_item); + if (btrfs_file_extent_type(path->nodes[0], fi) != + BTRFS_FILE_EXTENT_REG) + goto next; + ret = btrfs_qgroup_insert_dirty_extent(trans, fs_info, + btrfs_file_extent_disk_bytenr(path->nodes[0], fi), + btrfs_file_extent_disk_num_bytes(path->nodes[0], fi), + GFP_NOFS); + if (ret < 0) + break; +next: + ret = btrfs_next_item(data_reloc_root, path); + if (ret < 0) + break; + if (ret > 0) { + ret = 0; + break; + } +