Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)
On Mon, November 04, 2013 at 18:42 (+0100), Josef Bacik wrote:
> On Thu, Oct 24, 2013 at 03:22:06PM +0200, Jan Schmidt wrote:
>> btrfs_dec_ref() queued a delayed ref for owner of a tree block. The qgroup
>> tracking is based on delayed refs. The owner of a tree block is set when a
>> tree block is allocated, it is never updated.
>>
>> When you allocate a tree block and then remove the subvolume that did the
>> allocation, the qgroup accounting for that removal is correct. However, the
>> removal was accounted again for each subvolume deletion that also referenced
>> the tree block, because accounting was erroneously based on the owner.
>>
>> Instead of queueing delayed refs for the non-existent owner, we now
>> queue delayed refs for the root being removed. This fixes the qgroup
>> accounting.
>>
>> Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
>> Tested-by: dustym...@gmail.com
>
> This breaks btrfs/003, I'm kicking it out.

Can you be a bit more specific? Works fine here.

-Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)
(cc Arne)

On Thu, October 24, 2013 at 16:49 (+0200), Wang Shilong wrote:
> Hello Jan,
>
>> btrfs_dec_ref() queued a delayed ref for owner of a tree block. The qgroup
>> tracking is based on delayed refs. The owner of a tree block is set when a
>> tree block is allocated, it is never updated.
>>
>> When you allocate a tree block and then remove the subvolume that did the
>> allocation, the qgroup accounting for that removal is correct. However, the
>> removal was accounted again for each subvolume deletion that also referenced
>> the tree block, because accounting was erroneously based on the owner.
>>
>> Instead of queueing delayed refs for the non-existent owner, we now
>> queue delayed refs for the root being removed. This fixes the qgroup
>> accounting.
>
> Thanks for tracking this. I applied your patch and, using the following
> script, found that the problem still exists. The test script looks like this:
>
> #!/bin/sh
>
> for i in $(seq 1000)
> do
> 	dd if=/dev/zero of=mnt/${i}aaa bs=10K count=1
> done
> btrfs sub snapshot mnt mnt/1
> for i in $(seq 100)
> do
> 	btrfs sub snapshot mnt/$i mnt/$(($i+1))
> done
> for i in $(seq 101)
> do
> 	btrfs sub delete mnt/$i
> done

I've understood the problem this reproducer creates. In fact, you can shorten it dramatically. The story of qgroups is going to turn awkward at this point.

- mkfs and enable quota, put some data in (needs a level 2 tree)
  - this accounts rfer and excl for qgroup 5
- take a snapshot
  - this creates qgroup 257, which gets rfer(257) = rfer(5) and excl(257) = 0, excl(5) = 0.
- now make sure you don't cow anything (which we always did in our extensive tests), just drop the newly created snapshot.
  - excl(5) ought to become what it was before the snapshot, and there's no code for this. This is because there is no code that brings rfer(257) to zero, the data extents are not touched because the tree blocks of 5 and 257 are shared.

Drop tree does not go down the whole tree: when it finds a tree block with refcnt > 1 it just decrements it and is done. This is very efficient but is bad for the qgroup numbers.

We have got three possible solutions in mind:

A: Always walk down the whole tree for quota-enabled fs tree drops. Can be done with the read-ahead code, but is potentially a whole lot of work for large file systems.

B: Use tracking qgroups as required for several operations on higher level qgroups also for the level 0 qgroups. They could be created automatically and track the correct numbers just in case a snapshot is deleted. The problem with that approach is that it does not scale for a large number of subvolumes, as you need to track each possible combination of all subvolumes (exponential costs).

C: Make sure all your metadata is cowed before dropping a subvolume. This is explicitly doing what solution A would do implicitly, but can theoretically be done by the user.

I don't consider C a practical solution.

Sigh.
-Jan
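The snapshot-then-drop sequence from this message can be pictured with a small toy model. This is only a sketch under stated assumptions: struct toy_qgroup and the toy_* helpers are hypothetical names, not btrfs code, and the model assumes exactly two subvolumes (qgroups 5 and 257) that share every block, with nothing cowed in between.

```c
#include <assert.h>
#include <stdint.h>

/* Toy counters for one level-0 qgroup (hypothetical, not the kernel struct). */
struct toy_qgroup {
	uint64_t rfer;	/* bytes referenced */
	uint64_t excl;	/* bytes referenced exclusively */
};

/* Snapshot: the new qgroup references everything, nothing stays exclusive. */
static void toy_snapshot(struct toy_qgroup *src, struct toy_qgroup *snap)
{
	snap->rfer = src->rfer;
	snap->excl = 0;
	src->excl = 0;
}

/*
 * Drop as described in the thread: the walk stops at the first shared
 * tree block (refcnt > 1), so neither qgroup's counters are touched.
 */
static void toy_drop_shallow(struct toy_qgroup *src, struct toy_qgroup *snap)
{
	(void)src;
	(void)snap;
}

/*
 * Drop with a full tree walk (solution A): every block is visited, so the
 * snapshot's counters go to zero and the source regains exclusivity.
 * Only valid under the model's assumption that nothing was cowed.
 */
static void toy_drop_full_walk(struct toy_qgroup *src, struct toy_qgroup *snap)
{
	assert(snap->rfer == src->rfer);	/* no-cow assumption */
	src->excl = src->rfer;
	snap->rfer = 0;
	snap->excl = 0;
}
```

In this model, toy_drop_shallow() leaves excl(5) at zero and rfer(257) at its old value, which is the stale state described above; toy_drop_full_walk() restores excl(5) at the cost of visiting every block.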
Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)
Hi Josef,

please consider this patch for btrfs-next and for the following merge window (3.13). The fact that there's another problem concerning qgroups as discussed in the rest of this thread doesn't make this patch any less correct.

Thanks,
-Jan

On Thu, October 24, 2013 at 15:22 (+0200), Jan Schmidt wrote:
> btrfs_dec_ref() queued a delayed ref for owner of a tree block. The qgroup
> tracking is based on delayed refs. The owner of a tree block is set when a
> tree block is allocated, it is never updated.
>
> When you allocate a tree block and then remove the subvolume that did the
> allocation, the qgroup accounting for that removal is correct. However, the
> removal was accounted again for each subvolume deletion that also referenced
> the tree block, because accounting was erroneously based on the owner.
>
> Instead of queueing delayed refs for the non-existent owner, we now
> queue delayed refs for the root being removed. This fixes the qgroup
> accounting.
>
> Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
> Tested-by: dustym...@gmail.com
> ---
>  fs/btrfs/extent-tree.c |   14 +++++++++-----
>  1 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index d58bef1..7846cae 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3004,12 +3004,11 @@ out:
>  static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
>  			   struct btrfs_root *root,
>  			   struct extent_buffer *buf,
> -			   int full_backref, int inc, int for_cow)
> +			   int full_backref, u64 ref_root, int inc, int for_cow)
>  {
>  	u64 bytenr;
>  	u64 num_bytes;
>  	u64 parent;
> -	u64 ref_root;
>  	u32 nritems;
>  	struct btrfs_key key;
>  	struct btrfs_file_extent_item *fi;
> @@ -3019,7 +3018,6 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
>  	int (*process_func)(struct btrfs_trans_handle *, struct btrfs_root *,
>  			    u64, u64, u64, u64, u64, u64, int);
>  
> -	ref_root = btrfs_header_owner(buf);
>  	nritems = btrfs_header_nritems(buf);
>  	level = btrfs_header_level(buf);
>  
> @@ -3075,13 +3073,19 @@ fail:
>  int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  		  struct extent_buffer *buf, int full_backref, int for_cow)
>  {
> -	return __btrfs_mod_ref(trans, root, buf, full_backref, 1, for_cow);
> +	u64 ref_root;
> +
> +	ref_root = btrfs_header_owner(buf);
> +
> +	return __btrfs_mod_ref(trans, root, buf, full_backref, ref_root,
> +				1, for_cow);
>  }
>  
>  int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  		  struct extent_buffer *buf, int full_backref, int for_cow)
>  {
> -	return __btrfs_mod_ref(trans, root, buf, full_backref, 0, for_cow);
> +	return __btrfs_mod_ref(trans, root, buf, full_backref, root->objectid,
> +				0, for_cow);
>  }
>  
>  static int write_one_cache_group(struct btrfs_trans_handle *trans,
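Why charging the block's owner goes negative can be shown with a toy model. The sketch below is not kernel code: struct toy_block, struct toy_accounting and both helpers are hypothetical, and the model assumes one shared tree block that is referenced by three roots and was allocated by the first of them.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ROOTS 8

/* Hypothetical stand-in for a shared tree block. */
struct toy_block {
	unsigned owner;		/* root that allocated the block, never updated */
	uint64_t bytes;
};

/* Per-root signed byte counters, so over-subtraction becomes visible. */
struct toy_accounting {
	int64_t rfer[TOY_MAX_ROOTS];
};

/* Old behaviour: every removal is charged to the block's owner. */
static void toy_dec_ref_by_owner(struct toy_accounting *acct,
				 const struct toy_block *blk,
				 unsigned removed_root)
{
	(void)removed_root;
	assert(blk->owner < TOY_MAX_ROOTS);
	acct->rfer[blk->owner] -= (int64_t)blk->bytes;
}

/* Patched behaviour: the removal is charged to the root being removed. */
static void toy_dec_ref_by_root(struct toy_accounting *acct,
				const struct toy_block *blk,
				unsigned removed_root)
{
	assert(removed_root < TOY_MAX_ROOTS);
	acct->rfer[removed_root] -= (int64_t)blk->bytes;
}
```

Removing roots 1, 2 and 3 in turn with toy_dec_ref_by_owner() subtracts the block three times from root 1, while the other two roots keep their stale numbers; toy_dec_ref_by_root() brings each counter to zero exactly once.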
Re: btrfs send/receive do not keep inode ctimes
Hi Karl,

On Fri, October 25, 2013 at 15:12 (+0200), Karl Kiniger wrote:
> is there low level support to change inode ctimes somehow?
> (on ext[234] it can be done using debugfs)

No.

> It would be nice to make received snapshots as similar as
> possible to their send source. (I am not talking about uuids
> and such, just ls -lc output)

This is not planned. Currently, we do not even preserve the inode number. Can you give a short explanation of your use case, why do you need to keep the ctime?

Thanks,
-Jan
Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)
On Thu, October 24, 2013 at 16:49 (+0200), Wang Shilong wrote:
> Hello Jan,
>
>> btrfs_dec_ref() queued a delayed ref for owner of a tree block. The qgroup
>> tracking is based on delayed refs. The owner of a tree block is set when a
>> tree block is allocated, it is never updated.
>>
>> When you allocate a tree block and then remove the subvolume that did the
>> allocation, the qgroup accounting for that removal is correct. However, the
>> removal was accounted again for each subvolume deletion that also referenced
>> the tree block, because accounting was erroneously based on the owner.
>>
>> Instead of queueing delayed refs for the non-existent owner, we now
>> queue delayed refs for the root being removed. This fixes the qgroup
>> accounting.
>
> Thanks for tracking this. I applied your patch and, using the following
> script, found that the problem still exists. The test script looks like this:

Reproduced. Gives more negative numbers due to accounting triggered by the cleaner thread, that's the common part here. I still believe that the fix I sent is correct, it's probably not complete. Looking into it.

Thanks,
-Jan

> #!/bin/sh
>
> for i in $(seq 1000)
> do
> 	dd if=/dev/zero of=mnt/${i}aaa bs=10K count=1
> done
> btrfs sub snapshot mnt mnt/1
> for i in $(seq 100)
> do
> 	btrfs sub snapshot mnt/$i mnt/$(($i+1))
> done
> for i in $(seq 101)
> do
> 	btrfs sub delete mnt/$i
> done
>
> Thanks,
> Wang
Re: [PATCH] Btrfs: Don't allocate inode that is already in use
On Tue, October 15, 2013 at 22:41 (+0200), Zach Brown wrote:
>> Probably a bit too obscure to turn this into an xfstest? At least nobody
>> complained so far, and this reproducer takes me 1m57 to run, so nothing I want
>> in each xfstest cycle.
>
> I disagree. The entire point of regression tests is to trigger bugs
> that the usual processes failed to find, like this one.
>
> If you think that two minutes is too long for a test to run then mark it
> as stress (is that the xfstests group for boring long running tests?) or
> take the time to make a tighter test.
>
> Don't just skip regression testing. Please.

You are mixing up my points. The first argument you're quoting is not against regression testing in this case, and it deserves the "stress" answer, I agree.

You don't quote my second argument, which is not "just skip regression testing". I'll try again in other words: A regression test only makes sense if it can prevent us from making the same mistake again. As far as I see, the reproducer script is so specific that the only thing it can prevent is an exact revert of Stefan's patch.

If you argue that we should have a test for just this, fair enough, then we could use exactly Stefan's script. I don't think that gains us anything. We're not normally reverting bugfix patches deliberately, especially not for very short patches with very long descriptions.

I'd very much like to see a more generic test to avoid similar regressions, if that can be created. I don't have a good plan how to trigger such a situation (i.e. know which inodes are on the free_inode_pinned list) in a more general way.

-Jan
Re: [PATCH] Btrfs: use right root when checking for hash collision
On Wed, October 09, 2013 at 18:26 (+0200), Josef Bacik wrote:
> btrfs_rename was using the root of the old dir instead of the root of the new
> dir when checking for a hash collision, so if you tried to move a file into a
> subvol it would freak out because it would see the file you are trying to move
> in its current root. This fixes the bug where this would fail
>
> btrfs subvol create test1
> btrfs subvol create test2
> mv test1 test2.
>
> Thanks to Chris Murphy for catching this,
>
> Reported-by: Chris Murphy li...@colorremedies.com
> Signed-off-by: Josef Bacik jba...@fusionio.com
> ---
>  fs/btrfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 1d7ef37..d468246 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7993,7 +7993,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>  	/* check for collisions, even if the name isn't there */
> -	ret = btrfs_check_dir_item_collision(root, new_dir->i_ino,
> +	ret = btrfs_check_dir_item_collision(dest, new_dir->i_ino,
>  			     new_dentry->d_name.name,
>  			     new_dentry->d_name.len);

Looks correct.

I claim that better variable names would have avoided this bug. The code uses old_dir / new_dir, old_dentry / new_dentry, old_inode / new_inode - so, while you're at it: How about changing the variables to old_root / new_root instead of keeping root / dest?

-Jan
Re: [PATCH] Btrfs: Don't allocate inode that is already in use
On Tue, October 15, 2013 at 20:08 (+0200), Stefan Behrens wrote:
> Due to an off-by-one error, it is possible to reproduce a bug when the inode cache is used.
>
> The same inode number is assigned twice, the second time this leads to an EEXIST in btrfs_insert_empty_items().
>
> The issue can happen when a file is removed right after a subvolume is created and then a new inode number is created before the inodes in free_inode_pinned are processed. unlink() calls btrfs_return_ino() which calls start_caching() in this case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by searching for the highest inode (which already cannot find the unlinked one anymore in btrfs_find_free_objectid()). So if this unlinked inode's number is equal to the highest_ino + 1 (or >= this value instead of > this value which was the off-by-one error), we mustn't add the inode number to free_ino_pinned (caching_thread() does it right). In this case we need to try directly to add the number to the inode_cache which will fail in this case.
>
> When this inode number is allocated while it is still in free_ino_pinned, it is allocated and still added to the free inode cache when the pinned inodes are processed, thus one of the following inode number allocations will get an inode that is already in use and fail with EEXIST in btrfs_insert_empty_items().
>
> One example which was created with the reproducer below:
> Create a snapshot, work in the newly created snapshot for the rest.
> In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
> start_caching() calls add_free_space [34284, 18446744073709517077].
> In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
> mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
> btrfs_unpin_free_ino calls add_free_space [34284, 1].
> mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
> EEXIST when the new inode is inserted.
> One possible reproducer is this one:
> #!/bin/sh
> # preparation
> TEST_DEV=/dev/sdc1
> TEST_MNT=/mnt
> umount ${TEST_MNT} 2>/dev/null || true
> mkfs.btrfs -f ${TEST_DEV}
> mount ${TEST_DEV} ${TEST_MNT} -o \
>  rw,relatime,compress=lzo,space_cache,inode_cache
> btrfs subv create ${TEST_MNT}/s1
> for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
> btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
> FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
> rm ${TEST_MNT}/s2/$FILENAME
> touch ${TEST_MNT}/s2/$FILENAME
> # the following steps can be repeated to reproduce the issue again and again
> [ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
> btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
> rm ${TEST_MNT}/s3/$FILENAME
> touch ${TEST_MNT}/s3/$FILENAME
> ls -alFi ${TEST_MNT}/s?/$FILENAME
> touch ${TEST_MNT}/s3/_1 || logger FAILED
> ls -alFi ${TEST_MNT}/s?/_1
> touch ${TEST_MNT}/s3/_2 || logger FAILED
> ls -alFi ${TEST_MNT}/s?/_2
> touch ${TEST_MNT}/s3/__1 || logger FAILED
> ls -alFi ${TEST_MNT}/s?/__1
> touch ${TEST_MNT}/s3/__2 || logger FAILED
> ls -alFi ${TEST_MNT}/s?/__2
> # if the above is not enough, add the following loop:
> for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
> #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
> # one of the touch(1) calls in s3 fail due to EEXIST because the inode is
> # already in use that btrfs_find_ino_for_alloc() returns.

Probably a bit too obscure to turn this into an xfstest? At least nobody complained so far, and this reproducer takes me 1m57 to run, so nothing I want in each xfstest cycle.

If we ever introduce a similar problem, this reproducer probably won't find it (at least if it's really dependent on the exact number of files and the exact inode number), unless we're effectively reversing this patch. So no real use for a regression test in my opinion, I'm okay with just fixing it.
> Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
> ---
>  fs/btrfs/inode-map.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
> index 014de49..ec08004 100644
> --- a/fs/btrfs/inode-map.c
> +++ b/fs/btrfs/inode-map.c
> @@ -237,7 +237,7 @@ again:
>  		start_caching(root);
>  
>  		if (objectid <= root->cache_progress ||
> -		    objectid > root->highest_objectid)
> +		    objectid >= root->highest_objectid)
>  			__btrfs_add_free_space(ctl, objectid, 1);
>  		else
>  			__btrfs_add_free_space(pinned, objectid, 1);

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net

... although this is not the most beautiful commit message I've ever seen ;-)

-Jan
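The one-character change can be isolated in a small predicate. This is a sketch only: struct toy_root and toy_add_directly() are hypothetical names, the comparison operators are taken from the commit message ("<=" for cache_progress, ">" before and ">=" after the patch), and the concrete numbers (cache_progress 256, highest_objectid equal to the unlinked inode 34284) are assumptions chosen to match the example in the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the per-root state consulted by the check. */
struct toy_root {
	uint64_t cache_progress;
	uint64_t highest_objectid;
};

/*
 * Returns true if the returned inode number would go straight to the free
 * space cache, false if it would be put on the pinned list instead.
 * patched selects the comparison after the fix (>=) or before it (>).
 */
static bool toy_add_directly(const struct toy_root *root, uint64_t objectid,
			     bool patched)
{
	assert(root);
	if (objectid <= root->cache_progress)
		return true;
	return patched ? objectid >= root->highest_objectid
		       : objectid > root->highest_objectid;
}
```

With highest_objectid equal to 34284, the unpatched check sends inode 34284 to the pinned list although start_caching() has already covered it, which is the double bookkeeping the commit message describes; the patched check routes it to the direct path.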
Re: [PATCH] xfstests: btrfs/011 improvement for compressed filesystems
status $SCRATCH_MNT $tmp.tmp 21 cat $tmp.tmp $seqres.full - grep -q finished $tmp.tmp || _fail btrfs replace status failed + grep -q finished $tmp.tmp || _fail btrfs replace status (finished) failed fi if ps -p $noise_pid | grep -q $noise_pid; then The Q-comparisons look a bit strange to me, but they've been there before. Reviewed-by: Jan Schmidt list@jan-o-sch.net -Jan -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 2/2] xfstests btrfs/316: test send / receive
Basic send / receive functionality test for btrfs. Requires current version of fsstress built (-x support). Relies on fssum tool but can skip the test if it failed to build. Signed-off-by: Jan Schmidt list@jan-o-sch.net Reviewed-by: Josef Bacik jba...@fusionio.com --- tests/btrfs/316 | 116 +++ tests/btrfs/316.out |4 ++ tests/btrfs/group |1 + 3 files changed, 121 insertions(+), 0 deletions(-) create mode 100755 tests/btrfs/316 create mode 100644 tests/btrfs/316.out diff --git a/tests/btrfs/316 b/tests/btrfs/316 new file mode 100755 index 000..b3af7d9 --- /dev/null +++ b/tests/btrfs/316 @@ -0,0 +1,116 @@ +#! /bin/bash +# FSQA Test No. 316 +# +# Run fsstress to create a reasonably strange file system, make a +# snapshot (base) and run more fsstress. Then take another snapshot +# (incr) and send both snapshots to a temp file. Remake the file +# system and receive from the files. Check both states with fssum. +# +#--- +# Copyright (C) 2013 STRATO. All rights reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +#--- +# +# creator +owner=list.bt...@jan-o-sch.net + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=`mktemp -d` +status=1 + +_cleanup() +{ + echo *** unmount + umount $SCRATCH_MNT 2/dev/null + rm -f $tmp.* +} +trap _cleanup; exit \$status 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common/rc +. 
./common/filter + +# real QA test starts here +_need_to_be_root +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_seek_data_hole + +FSSUM_PROG=$here/src/fssum +[ -x $FSSUM_PROG ] || _notrun fssum not built + +rm -f $seqres.full + +workout() +{ + fsz=$1 + ops=$2 + + umount $SCRATCH_DEV /dev/null 21 + echo *** mkfs -dsize=$fsz$seqres.full + echo $seqres.full + _scratch_mkfs_sized $fsz $seqres.full 21 \ + || _fail size=$fsz mkfs failed + run_check _scratch_mount -o noatime + + run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n $ops $FSSTRESS_AVOID -x \ + $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/base + + run_check $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/incr + + echo # $BTRFS_UTIL_PROG send $SCRATCH_MNT/base $tmp/base.snap \ +$seqres.full + $BTRFS_UTIL_PROG send $SCRATCH_MNT/base $tmp/base.snap 2 $seqres.full \ + || _fail failed: '$@' + echo # $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base\ + $SCRATCH_MNT/incr $tmp/incr.snap $seqres.full + $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base \ + $SCRATCH_MNT/incr $tmp/incr.snap 2 $seqres.full \ + || _fail failed: '$@' + + run_check $FSSUM_PROG -A -f -w $tmp/base.fssum $SCRATCH_MNT/base + run_check $FSSUM_PROG -A -f -w $tmp/incr.fssum -x $SCRATCH_MNT/incr/base \ + $SCRATCH_MNT/incr + + umount $SCRATCH_DEV /dev/null 21 + echo *** mkfs -dsize=$fsz$seqres.full + echo $seqres.full + _scratch_mkfs_sized $fsz $seqres.full 21 \ + || _fail size=$fsz mkfs failed + run_check _scratch_mount -o noatime + + run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT $tmp/base.snap + run_check $FSSUM_PROG -r $tmp/base.fssum $SCRATCH_MNT/base + + run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT $tmp/incr.snap + run_check $FSSUM_PROG -r $tmp/incr.fssum $SCRATCH_MNT/incr +} + +echo *** test send / receive + +fssize=`expr 2000 \* 1024 \* 1024` +ops=200 + +workout $fssize $ops + +echo *** done +status=0 +exit diff --git a/tests/btrfs/316.out b/tests/btrfs/316.out new file mode 100644 index 000..4564c85 --- /dev/null 
+++ b/tests/btrfs/316.out @@ -0,0 +1,4 @@ +QA output created by 316 +*** test send / receive +*** done +*** unmount diff --git a/tests/btrfs/group b/tests/btrfs/group index bc6c256..11d708a 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -9,3 +9,4 @@ 276 auto rw metadata 284 auto 307 auto quick +316 auto rw metadata -- 1.7.2.5 -- To unsubscribe from this list: send the line unsubscribe linux
[PATCH v4 0/2] xfstest btrfs/316: test send / receive
These two patches add the announced tests for btrfs send / receive. As requested, the fssum tool is now included.

--
v1->v2:
- included fssum
- test number is now 316 (was 314)

v2->v3:
- added missing -lcrypto to build fssum
- removed obsolete change in README now that fssum is included
- fixed comment in test/btrfs/316's header (314 -> 316)

v3->v4:
- build fssum with help of autotools only if libssl is available
- removed clumsy OPT_TARGETS in src/Makefile
- added #define directives for SEEK_DATA and SEEK_HOLE to fssum.c

Jan Schmidt (2):
  xfstests: add fssum tool
  xfstests btrfs/316: test send / receive

 .gitignore           |    1 +
 aclocal.m4           |    1 +
 configure.ac         |    1 +
 include/builddefs.in |    1 +
 m4/Makefile          |    1 +
 m4/package_ssldev.m4 |    4 +
 src/Makefile         |    8 +
 src/fssum.c          |  828 ++
 tests/btrfs/316      |  116 +++
 tests/btrfs/316.out  |    4 +
 tests/btrfs/group    |    1 +
 11 files changed, 966 insertions(+), 0 deletions(-)
 create mode 100644 m4/package_ssldev.m4
 create mode 100644 src/fssum.c
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out

--
1.7.2.5
Re: [patch v2 1/2] Btrfs: fix possible memory leak in find_parent_nodes()
On Fri, August 09, 2013 at 07:25 (+0200), Wang Shilong wrote:
> The original code dealt with 'ref' in the following steps:
>  |->list_del(&ref->list)
>  |->some operations
>  |->kfree(ref)
>
> If the operations failed, it would goto label 'out' without freeing this 'ref',
> and then a memory leak would happen. Just moving list_del() to right before
> kfree() will fix the problem.

Still not sufficient as an explanation. What is missing is the hint that in the error handling code, we free everything that's left in the prefs list.

-Jan

> Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
> Reviewed-by: Miao Xie mi...@cn.fujitsu.com
> ---
> V1->V2: add explanations to changelog
> ---
>  fs/btrfs/backref.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 68048d6..7b55c95 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -911,7 +911,6 @@ again:
>  	while (!list_empty(&prefs)) {
>  		ref = list_first_entry(&prefs, struct __prelim_ref, list);
> -		list_del(&ref->list);
>  		WARN_ON(ref->count < 0);
>  		if (ref->count && ref->root_id && ref->parent == 0) {
>  			/* no parent == root of tree */
> @@ -956,6 +955,7 @@ again:
>  				eie->next = ref->inode_list;
>  			}
>  		}
> +		list_del(&ref->list);
>  		kfree(ref);
>  	}
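The point about the error path can be reproduced in user space. The following is a sketch with hypothetical names (struct toy_ref, toy_process() and friends); it uses a plain singly linked list and malloc()/free() in place of the kernel's list_head and kfree(), and a flag on each entry to simulate an operation that fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a preliminary ref on a singly linked list. */
struct toy_ref {
	struct toy_ref *next;
	bool fails;	/* simulate an operation that errors out */
};

static int toy_live_refs;	/* allocations not yet freed */

static void toy_push(struct toy_ref **head, bool fails)
{
	struct toy_ref *ref = malloc(sizeof(*ref));

	if (!ref)
		abort();
	ref->fails = fails;
	ref->next = *head;
	*head = ref;
	toy_live_refs++;
}

/* Error path: free whatever is still linked on the list. */
static void toy_free_all(struct toy_ref **head)
{
	while (*head) {
		struct toy_ref *ref = *head;

		*head = ref->next;
		free(ref);
		toy_live_refs--;
	}
}

/*
 * Process every ref. With unlink_early, the ref leaves the list before the
 * fallible step (the old order); otherwise it is unlinked just before being
 * freed (the patched order). Returns 0 on success, -1 on failure.
 */
static int toy_process(struct toy_ref **head, bool unlink_early)
{
	assert(head);
	while (*head) {
		struct toy_ref *ref = *head;

		if (unlink_early)
			*head = ref->next;
		if (ref->fails) {
			toy_free_all(head);	/* plays the role of "out:" */
			return -1;
		}
		if (!unlink_early)
			*head = ref->next;
		free(ref);
		toy_live_refs--;
	}
	return 0;
}
```

With unlink_early set, a failing entry is no longer reachable from the list when the cleanup runs, so it is never freed. Leaving it linked until just before the free lets the cleanup pick it up, which is the hint asked for in the review.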
Re: [PATCH] Btrfs: stop using GFP_ATOMIC when allocating rewind ebs
On Wed, August 07, 2013 at 23:11 (+0200), Josef Bacik wrote: There is no reason we can't just set the path to blocking and then do normal GFP_NOFS allocations for these extent buffers. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/ctree.c | 16 ++-- fs/btrfs/extent_io.c |8 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 1dd8a71..414a2d7 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1191,8 +1191,8 @@ __tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, * is freed (its refcount is decremented). */ static struct extent_buffer * -tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, - u64 time_seq) +tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct btrfs_path *path, + struct extent_buffer *eb, u64 time_seq) { struct extent_buffer *eb_rewin; struct tree_mod_elem *tm; @@ -1207,12 +1207,15 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, if (!tm) return eb; + btrfs_set_path_blocking(path); + btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); + if (tm-op == MOD_LOG_KEY_REMOVE_WHILE_FREEING) { BUG_ON(tm-slot != 0); eb_rewin = alloc_dummy_extent_buffer(eb-start, fs_info-tree_root-nodesize); if (!eb_rewin) { - btrfs_tree_read_unlock(eb); + btrfs_tree_read_unlock_blocking(eb); free_extent_buffer(eb); return NULL; } @@ -1224,13 +1227,14 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, } else { eb_rewin = btrfs_clone_extent_buffer(eb); if (!eb_rewin) { - btrfs_tree_read_unlock(eb); + btrfs_tree_read_unlock_blocking(eb); free_extent_buffer(eb); return NULL; } } - btrfs_tree_read_unlock(eb); + btrfs_clear_path_blocking(path, NULL, BTRFS_READ_LOCK); + btrfs_tree_read_unlock_blocking(eb); unlock_blocking? Rest looks ok to me. 
Thanks, -Jan free_extent_buffer(eb); extent_buffer_get(eb_rewin); @@ -2779,7 +2783,7 @@ again: btrfs_clear_path_blocking(p, b, BTRFS_READ_LOCK); } - b = tree_mod_log_rewind(root-fs_info, b, time_seq); + b = tree_mod_log_rewind(root-fs_info, p, b, time_seq); if (!b) { ret = -ENOMEM; goto done; diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index b422cba..beda5a8 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4340,12 +4340,12 @@ struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src) struct extent_buffer *new; unsigned long num_pages = num_extent_pages(src-start, src-len); - new = __alloc_extent_buffer(NULL, src-start, src-len, GFP_ATOMIC); + new = __alloc_extent_buffer(NULL, src-start, src-len, GFP_NOFS); if (new == NULL) return NULL; for (i = 0; i num_pages; i++) { - p = alloc_page(GFP_ATOMIC); + p = alloc_page(GFP_NOFS); if (!p) { btrfs_release_extent_buffer(new); return NULL; @@ -4369,12 +4369,12 @@ struct extent_buffer *alloc_dummy_extent_buffer(u64 start, unsigned long len) unsigned long num_pages = num_extent_pages(0, len); unsigned long i; - eb = __alloc_extent_buffer(NULL, start, len, GFP_ATOMIC); + eb = __alloc_extent_buffer(NULL, start, len, GFP_NOFS); if (!eb) return NULL; for (i = 0; i num_pages; i++) { - eb-pages[i] = alloc_page(GFP_ATOMIC); + eb-pages[i] = alloc_page(GFP_NOFS); if (!eb-pages[i]) goto err; } -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: deal with enomem in the rewind path V3
); + + } + if (page) { + /* One for when we alloced the page */ + page_cache_release(page); + } + } while (index != start_idx); +} + +/* + * Helper for releasing the extent buffer. + */ +static inline void btrfs_release_extent_buffer(struct extent_buffer *eb) +{ + btrfs_release_extent_buffer_page(eb, 0); + __free_extent_buffer(eb); +} + static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree, u64 start, unsigned long len, @@ -4276,7 +4346,10 @@ struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src) for (i = 0; i num_pages; i++) { p = alloc_page(GFP_ATOMIC); - BUG_ON(!p); + if (!p) { + btrfs_release_extent_buffer(new); + return NULL; + } attach_extent_buffer_page(new, p); WARN_ON(PageDirty(p)); SetPageUptodate(p); @@ -4317,76 +4390,6 @@ err: return NULL; } -static int extent_buffer_under_io(struct extent_buffer *eb) -{ - return (atomic_read(eb-io_pages) || - test_bit(EXTENT_BUFFER_WRITEBACK, eb-bflags) || - test_bit(EXTENT_BUFFER_DIRTY, eb-bflags)); -} - -/* - * Helper for releasing extent buffer page. - */ -static void btrfs_release_extent_buffer_page(struct extent_buffer *eb, - unsigned long start_idx) -{ - unsigned long index; - unsigned long num_pages; - struct page *page; - int mapped = !test_bit(EXTENT_BUFFER_DUMMY, eb-bflags); - - BUG_ON(extent_buffer_under_io(eb)); - - num_pages = num_extent_pages(eb-start, eb-len); - index = start_idx + num_pages; - if (start_idx = index) - return; - - do { - index--; - page = extent_buffer_page(eb, index); - if (page mapped) { - spin_lock(page-mapping-private_lock); - /* - * We do this since we'll remove the pages after we've - * removed the eb from the radix tree, so we could race - * and have this page now attached to the new eb. So - * only clear page_private if it's still connected to - * this eb. 
- */ - if (PagePrivate(page) - page-private == (unsigned long)eb) { - BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, eb-bflags)); - BUG_ON(PageDirty(page)); - BUG_ON(PageWriteback(page)); - /* - * We need to make sure we haven't be attached - * to a new eb. - */ - ClearPagePrivate(page); - set_page_private(page, 0); - /* One for the page private */ - page_cache_release(page); - } - spin_unlock(page-mapping-private_lock); - - } - if (page) { - /* One for when we alloced the page */ - page_cache_release(page); - } - } while (index != start_idx); -} - -/* - * Helper for releasing the extent buffer. - */ -static inline void btrfs_release_extent_buffer(struct extent_buffer *eb) -{ - btrfs_release_extent_buffer_page(eb, 0); - __free_extent_buffer(eb); -} - static void check_buffer_tree_ref(struct extent_buffer *eb) { int refs; Weird patch formatting concerning extent_io.c, I assume there are no changes in extent_buffer_under_io and btrfs_release_extent_buffer_page, you just moved btrfs_clone_extent_buffer, right? Perhaps --patience or --minimal could do better? Otherwise, Reviewed-by: Jan Schmidt list@jan-o-sch.net Thanks, -Jan -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC
);
 			ret = __add_prelim_ref(prefs, root, &key, 0, 0,
-					       bytenr, count);
+					       bytenr, count, GFP_NOFS);
 			break;
 		}
 		default:
@@ -738,7 +738,7 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
 		case BTRFS_SHARED_BLOCK_REF_KEY:
 			ret = __add_prelim_ref(prefs, 0, NULL,
 						info_level + 1, key.offset,
-						bytenr, 1);
+						bytenr, 1, GFP_NOFS);
 			break;
 		case BTRFS_SHARED_DATA_REF_KEY: {
 			struct btrfs_shared_data_ref *sdref;
@@ -748,13 +748,13 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
 					      struct btrfs_shared_data_ref);
 			count = btrfs_shared_data_ref_count(leaf, sdref);
 			ret = __add_prelim_ref(prefs, 0, NULL, 0, key.offset,
-						bytenr, count);
+						bytenr, count, GFP_NOFS);
 			break;
 		}
 		case BTRFS_TREE_BLOCK_REF_KEY:
 			ret = __add_prelim_ref(prefs, key.offset, NULL,
 					       info_level + 1, 0,
-					       bytenr, 1);
+					       bytenr, 1, GFP_NOFS);
 			break;
 		case BTRFS_EXTENT_DATA_REF_KEY: {
 			struct btrfs_extent_data_ref *dref;
@@ -770,7 +770,7 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
 			key.offset = btrfs_extent_data_ref_offset(leaf, dref);
 			root = btrfs_extent_data_ref_root(leaf, dref);
 			ret = __add_prelim_ref(prefs, root, &key, 0, 0,
-					       bytenr, count);
+					       bytenr, count, GFP_NOFS);
 			break;
 		}
 		default:

Reviewed-by: Jan Schmidt <list.bt...@jan-o-sch.net>

Thanks,
-Jan
[PATCH v3 2/2] xfstests btrfs/316: test send / receive
Basic send / receive functionality test for btrfs. Requires current version of fsstress built (-x support). Relies on fssum tool but can skip the test if it failed to build. Signed-off-by: Jan Schmidt list@jan-o-sch.net Reviewed-by: Josef Bacik jba...@fusionio.com --- tests/btrfs/316 | 113 +++ tests/btrfs/316.out |4 ++ tests/btrfs/group |1 + 3 files changed, 118 insertions(+), 0 deletions(-) create mode 100755 tests/btrfs/316 create mode 100644 tests/btrfs/316.out diff --git a/tests/btrfs/316 b/tests/btrfs/316 new file mode 100755 index 000..087978a --- /dev/null +++ b/tests/btrfs/316 @@ -0,0 +1,113 @@ +#! /bin/bash +# FSQA Test No. 316 +# +# Run fsstress to create a reasonably strange file system, make a +# snapshot (base) and run more fsstress. Then take another snapshot +# (incr) and send both snapshots to a temp file. Remake the file +# system and receive from the files. Check both states with fssum. +# +#--- +# Copyright (C) 2013 STRATO. All rights reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +#--- +# +# creator +owner=list.bt...@jan-o-sch.net + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=`mktemp -d` +status=1 + +_cleanup() +{ + echo *** unmount + umount $SCRATCH_MNT 2/dev/null + rm -f $tmp.* +} +trap _cleanup; exit \$status 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common/rc +. 
./common/filter + +# real QA test starts here +_need_to_be_root +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_command $FSSUM_PROG fssum + +rm -f $seqres.full + +workout() +{ + fsz=$1 + ops=$2 + + umount $SCRATCH_DEV /dev/null 21 + echo *** mkfs -dsize=$fsz$seqres.full + echo $seqres.full + _scratch_mkfs_sized $fsz $seqres.full 21 \ + || _fail size=$fsz mkfs failed + run_check _scratch_mount -o noatime + + run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n $ops $FSSTRESS_AVOID -x \ + $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/base + + run_check $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/incr + + echo # $BTRFS_UTIL_PROG send $SCRATCH_MNT/base $tmp/base.snap \ +$seqres.full + $BTRFS_UTIL_PROG send $SCRATCH_MNT/base $tmp/base.snap 2 $seqres.full \ + || _fail failed: '$@' + echo # $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base\ + $SCRATCH_MNT/incr $tmp/incr.snap $seqres.full + $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base \ + $SCRATCH_MNT/incr $tmp/incr.snap 2 $seqres.full \ + || _fail failed: '$@' + + run_check $FSSUM_PROG -A -f -w $tmp/base.fssum $SCRATCH_MNT/base + run_check $FSSUM_PROG -A -f -w $tmp/incr.fssum -x $SCRATCH_MNT/incr/base \ + $SCRATCH_MNT/incr + + umount $SCRATCH_DEV /dev/null 21 + echo *** mkfs -dsize=$fsz$seqres.full + echo $seqres.full + _scratch_mkfs_sized $fsz $seqres.full 21 \ + || _fail size=$fsz mkfs failed + run_check _scratch_mount -o noatime + + run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT $tmp/base.snap + run_check $FSSUM_PROG -r $tmp/base.fssum $SCRATCH_MNT/base + + run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT $tmp/incr.snap + run_check $FSSUM_PROG -r $tmp/incr.fssum $SCRATCH_MNT/incr +} + +echo *** test send / receive + +fssize=`expr 2000 \* 1024 \* 1024` +ops=200 + +workout $fssize $ops + +echo *** done +status=0 +exit diff --git a/tests/btrfs/316.out b/tests/btrfs/316.out new file mode 100644 index 000..4564c85 --- /dev/null +++ b/tests/btrfs/316.out @@ -0,0 +1,4 @@ +QA output created by 316 
+*** test send / receive
+*** done
+*** unmount
diff --git a/tests/btrfs/group b/tests/btrfs/group
index bc6c256..11d708a 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -9,3 +9,4 @@
 276 auto rw metadata
 284 auto
 307 auto quick
+316 auto rw metadata
-- 
1.7.2.5
[PATCH v3 1/2] xfstests: add fssum tool
fssum is a tool to build a recursive checksum for a file system. The home repository of fssum is git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git It is added as an optional target, because it depends on glibc = 2.15 for SEEK_HOLE / SEEK_DATA. The test to be added using fssum will just be skipped if fssum wasn't built. Signed-off-by: Jan Schmidt list@jan-o-sch.net --- .gitignore|1 + common/config |2 + src/Makefile | 11 +- src/fssum.c | 819 + 4 files changed, 832 insertions(+), 1 deletions(-) create mode 100644 src/fssum.c diff --git a/.gitignore b/.gitignore index 11594aa..c2fc6e3 100644 --- a/.gitignore +++ b/.gitignore @@ -45,6 +45,7 @@ /src/fill /src/fill2 /src/fs_perms +/src/fssum /src/fstest /src/fsync-tester /src/ftrunc diff --git a/common/config b/common/config index 67c1498..c8bee29 100644 --- a/common/config +++ b/common/config @@ -146,6 +146,8 @@ export SED_PROG=`set_prog_path sed` export BC_PROG=`set_prog_path bc` [ $BC_PROG = ] _fatal bc not found +export FSSUM_PROG=`set_prog_path fssum $here/src/fssum` + export PS_ALL_FLAGS=-ef export DF_PROG=`set_prog_path df` diff --git a/src/Makefile b/src/Makefile index cc679e8..10a4d3c 100644 --- a/src/Makefile +++ b/src/Makefile @@ -20,10 +20,14 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \ stale_handle pwrite_mmap_blocked t_dir_offset2 seek_sanity_test \ seek_copy_test t_readdir_1 t_readdir_2 fsync-tester +OPT_TARGETS = fssum + SUBDIRS = LLDLIBS = $(LIBATTR) $(LIBHANDLE) $(LIBACL) +OPT_LDLIBS = -lssl -lcrypto + ifeq ($(HAVE_XLOG_ASSIGN_LSN), true) LINUX_TARGETS += loggen endif @@ -60,7 +64,7 @@ CFILES = $(TARGETS:=.c) LDIRT = $(TARGETS) -default: depend $(TARGETS) $(SUBDIRS) +default: depend $(TARGETS) $(OPT_TARGETS) $(SUBDIRS) depend: .dep @@ -70,11 +74,16 @@ $(TARGETS): $(LIBTEST) @echo [CC]$@ $(Q)$(LTLINK) $@.c -o $@ $(CFLAGS) $(LDFLAGS) $(LDLIBS) $(LIBTEST) +$(OPT_TARGETS): $(LIBTEST) + @echo [CC]$@ + -$(Q)$(LTLINK) $@.c -o $@ $(CFLAGS) $(LDFLAGS) $(LDLIBS) 
$(OPT_LDLIBS) $(LIBTEST) + LINKTEST = $(LTLINK) $@.c -o $@ $(CFLAGS) $(LDFLAGS) install: default $(addsuffix -install,$(SUBDIRS)) $(INSTALL) -m 755 -d $(PKG_LIB_DIR)/src $(LTINSTALL) -m 755 $(TARGETS) $(PKG_LIB_DIR)/src + -$(LTINSTALL) -m 755 $(OPT_TARGETS) $(PKG_LIB_DIR)/src $(LTINSTALL) -m 755 fill2attr fill2fs fill2fs_check scaleread.sh $(PKG_LIB_DIR)/src $(LTINSTALL) -m 644 dumpfile $(PKG_LIB_DIR)/src diff --git a/src/fssum.c b/src/fssum.c new file mode 100644 index 000..ecddb6a --- /dev/null +++ b/src/fssum.c @@ -0,0 +1,819 @@ +/* + * Copyright (C) 2012 STRATO AG. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, + * Boston, MA 021110-1307, USA. 
+ */ +#define _BSD_SOURCE +#define _LARGEFILE64_SOURCE +#ifndef _GNU_SOURCE +#define _GNU_SOURCE +#endif +#include stdio.h +#include stdlib.h +#include unistd.h +#include string.h +#include fcntl.h +#include dirent.h +#include errno.h +#include sys/types.h +#include sys/stat.h +#ifdef __SOLARIS__ +#include sys/mkdev.h +#endif +#include openssl/md5.h +#include netinet/in.h +#include inttypes.h +#include assert.h + +#define CS_SIZE 16 +#define CHUNKS 128 + +#if __BYTE_ORDER == __LITTLE_ENDIAN +#define htonll(x) __bswap_64 (x) +#endif + +/* TODO: add hardlink recognition */ +/* TODO: add xattr/acl */ + +struct excludes { + char *path; + int len; +}; + +typedef struct _sum { + MD5_CTX md5; + unsigned char out[16]; +} sum_t; + +typedef int (*sum_file_data_t)(int fd, sum_t *dst); + +int gen_manifest = 0; +int in_manifest = 0; +char *checksum = NULL; +struct excludes *excludes; +int n_excludes = 0; +int verbose = 0; +FILE *out_fp; +FILE *in_fp; + +enum _flags { + FLAG_UID, + FLAG_GID, + FLAG_MODE, + FLAG_ATIME, + FLAG_MTIME, + FLAG_CTIME, + FLAG_DATA, + FLAG_OPEN_ERROR, + FLAG_STRUCTURE, + NUM_FLAGS +}; + +const char flchar[] = ugoamcdes; +char line[65536]; + +int flags[NUM_FLAGS] = {1, 1, 1, 1, 1, 0, 1, 0, 0}; + +char * +getln(char *buf, int size, FILE *fp) +{ + char *p; + int l; + + p = fgets(buf, size, fp); + if (!p) + return NULL; + + l
[PATCH v3 0/2] xfstest btrfs/316: test send / receive
These two patches add the announced tests for btrfs send / receive. As
requested, the fssum tool is now included.

One drawback is that I'm unable to edit configure.ac, or whatever else needs
to be modified, in the way autotools prefers. Any hints are appreciated,
preferably hints containing all the modifications required to introduce
something like HAVE_SEEK_HOLE.

I do not want to modify fssum.c here. If that turns out to be absolutely
required (one /could/ get along using linux/fs.h, which is not the way I would
like to go), I'd like to have that changed in the far-progs repository, where
fssum.c comes from, as well.

--
v1->v2:
- included fssum
- test number is now 316 (was 314)

v2->v3:
- added missing -lcrypto to build fssum
- removed obsolete change in README now that fssum is included
- fixed comment in test/btrfs/316's header (314 -> 316)

Jan Schmidt (2):
  xfstests: add fssum tool
  xfstests btrfs/316: test send / receive

 .gitignore          |    1 +
 common/config       |    2 +
 src/Makefile        |   11 +-
 src/fssum.c         |  819 +++
 tests/btrfs/316     |  113 +++
 tests/btrfs/316.out |    4 +
 tests/btrfs/group   |    1 +
 7 files changed, 950 insertions(+), 1 deletions(-)
 create mode 100644 src/fssum.c
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out

-- 
1.7.2.5
Re: [PATCH 2/3] Btrfs: catch error return value from find_extent_in_eb()
On Thu, August 08, 2013 at 12:24 (+0200), Filipe David Manana wrote:
> On Thu, Aug 8, 2013 at 6:04 AM, Wang Shilong <wangsl.f...@cn.fujitsu.com> wrote:
> > find_extent_in_eb() may return ENOMEM, catch this error return value.
> >
> > Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
> > Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>
> > ---
> >  fs/btrfs/backref.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> > index 54e7610..f7781e6 100644
> > --- a/fs/btrfs/backref.c
> > +++ b/fs/btrfs/backref.c
> > @@ -934,6 +934,10 @@ again:
> >  			}
> >  			ret = find_extent_in_eb(eb, bytenr,
> >  						*extent_item_pos, &eie);
> > +			if (ret) {
> > +				free_extent_buffer(eb);
> > +				goto out;
> > +			}
> >  			ref->inode_list = eie;
> >  			free_extent_buffer(eb);
> >  		}
>
> Hello, this is a duplicate of:
>
> https://patchwork.kernel.org/patch/2835989/

Your linked patch checks for ret < 0, which is the safer option, since there
are functions down the stack returning > 0 or 0 for success and < 0 for
errors. Currently, find_extent_in_eb doesn't pass their return values on, but
I'd rather be a bit more on the safe side and use your patch.

Thanks,
-Jan
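The difference between the two checks discussed in this message can be sketched in plain C. The callee below, toy_lookup, is purely hypothetical and only mimics the convention described (negative for errors, 0 or a positive value for success); it is not find_extent_in_eb or any other btrfs function.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical callee following the convention from the review:
 * < 0 is an error, 0 means "found", > 0 means "not found" (still success).
 */
static int toy_lookup(int key)
{
	if (key < 0)
		return -EINVAL;
	return (key % 2) ? 1 : 0;
}

/* Treats every non-zero return as an error, like "if (ret)". */
static int caller_nonzero(int key)
{
	int ret = toy_lookup(key);

	if (ret)
		return -1;	/* also taken for the harmless ret == 1 */
	return 0;
}

/* Treats only negative returns as errors, like "if (ret < 0)". */
static int caller_negative(int key)
{
	int ret = toy_lookup(key);

	assert(ret <= 1);	/* callee contract: -errno, 0 or 1 */
	if (ret < 0)
		return ret;
	return 0;
}
```

With key 3 the toy callee returns 1: caller_nonzero() bails out even though nothing failed, while caller_negative() carries on. That is the robustness argument for preferring the ret < 0 form should a positive return value ever be propagated.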
Re: [PATCH 3/3] Btrfs: allocate prelim_ref with a slab allocater
On Thu, August 08, 2013 at 07:04 (+0200), Wang Shilong wrote:
> struct __prelim_ref is allocated and freed frequently when
> walking backref tree, using slab allocater can not only
> speed up allocating but also detect memory leaks.
>
> Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
> Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>
> ---
>  fs/btrfs/backref.c | 30 +++++++++++++++++++++++++-----
>  fs/btrfs/backref.h |  2 ++
>  fs/btrfs/super.c   |  8 ++++++++
>  3 files changed, 35 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index f7781e6..916e4f1 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -119,6 +119,26 @@ struct __prelim_ref {
>  	u64 wanted_disk_byte;
>  };
>  
> +static struct kmem_cache *prelim_ref_cache;
> +
> +int __init btrfs_prelim_ref_init(void)
> +{
> +	prelim_ref_cache = kmem_cache_create("btrfs_prelim_ref",
> +					sizeof(struct __prelim_ref),
> +					0,
> +					SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
> +					NULL);
> +	if (!prelim_ref_cache)
> +		return -ENOMEM;
> +	return 0;
> +}
> +
> +void btrfs_prelim_ref_exit(void)
> +{
> +	if (prelim_ref_cache)
> +		kmem_cache_destroy(prelim_ref_cache);
> +}
> +
>  /*
>   * the rules for all callers of this function are:
>   * - obtaining the parent is the goal
> @@ -165,7 +185,7 @@ static int __add_prelim_ref(struct list_head *head, u64 root_id,
>  {
>  	struct __prelim_ref *ref;
>  
> -	ref = kmalloc(sizeof(*ref), gfp_mask);
> +	ref = kmem_cache_alloc(prelim_ref_cache, gfp_mask);
>  	if (!ref)
>  		return -ENOMEM;
>  
> @@ -493,7 +513,7 @@ static void __merge_refs(struct list_head *head, int mode)
>  			ref1->count += ref2->count;
>  
>  			list_del(&ref2->list);
> -			kfree(ref2);
> +			kmem_cache_free(prelim_ref_cache, ref2);
>  		}
>  
>  	}
> @@ -958,7 +978,7 @@ again:
>  			}
>  		}
>  		list_del(&ref->list);
> -		kfree(ref);
> +		kmem_cache_free(prelim_ref_cache, ref);
>  	}
>  
>  out:
> @@ -966,13 +986,13 @@ out:
>  	while (!list_empty(&prefs)) {
>  		ref = list_first_entry(&prefs, struct __prelim_ref, list);
>  		list_del(&ref->list);
> -		kfree(ref);
> +		kmem_cache_free(prelim_ref_cache, ref);
>  	}
>  	while (!list_empty(&prefs_delayed)) {
>  		ref = list_first_entry(&prefs_delayed, struct __prelim_ref,
>  				       list);
>  		list_del(&ref->list);
> -		kfree(ref);
> +		kmem_cache_free(prelim_ref_cache, ref);
>  	}
>  
>  	return ret;
> diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
> index 8f2e767..a910b27 100644
> --- a/fs/btrfs/backref.h
> +++ b/fs/btrfs/backref.h
> @@ -72,4 +72,6 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
>  			  struct btrfs_inode_extref **ret_extref,
>  			  u64 *found_off);
>  
> +int __init btrfs_prelim_ref_init(void);
> +void btrfs_prelim_ref_exit(void);
>  #endif
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index b64d762..de7eb3d 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -56,6 +56,7 @@
>  #include "rcu-string.h"
>  #include "dev-replace.h"
>  #include "free-space-cache.h"
> +#include "backref.h"
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/btrfs.h>
> @@ -1774,6 +1775,10 @@ static int __init init_btrfs_fs(void)
>  	if (err)
>  		goto free_auto_defrag;
>  
> +	err = btrfs_prelim_ref_init();
> +	if (err)
> +		goto free_prelim_ref;
> +
>  	err = btrfs_interface_init();
>  	if (err)
>  		goto free_delayed_ref;
> @@ -1791,6 +1796,8 @@ static int __init init_btrfs_fs(void)
>  
>  unregister_ioctl:
>  	btrfs_interface_exit();
> +free_prelim_ref:
> +	btrfs_prelim_ref_exit();
>  free_delayed_ref:
>  	btrfs_delayed_ref_exit();
>  free_auto_defrag:
> @@ -1817,6 +1824,7 @@ static void __exit exit_btrfs_fs(void)
>  	btrfs_delayed_ref_exit();
>  	btrfs_auto_defrag_exit();
>  	btrfs_delayed_inode_exit();
> +	btrfs_prelim_ref_exit();
>  	ordered_data_exit();
>  	extent_map_exit();
>  	extent_io_exit();

I generally like the idea of using a custom cache here. What about this one?

324 static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
[...]
367         /* additional parents require new refs being added here */
368         while ((node = ulist_next(parents, &uiter))) {
369                 new_ref = kmalloc(sizeof(*new_ref), GFP_NOFS);

That new_ref is still allocated with kmalloc, but it will also be freed with
kmem_cache_free after your patch, I think.

Thanks,
-Jan
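The point raised about new_ref is that once the free path goes through the cache, every allocation site has to go through it too. Below is a small userspace model of that pairing rule. toy_cache, toy_cache_alloc and toy_cache_free are hypothetical names, and the "cache" is only a counter wrapped around malloc/free, not a model of the real slab allocator. In the kernel, an object passed to kmem_cache_free() is expected to come from that same cache.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object cache: tracks how many objects are handed out. */
struct toy_cache {
	size_t obj_size;
	long outstanding;	/* allocated from the cache, not yet returned */
};

/* Allocate one object and account for it. */
static void *toy_cache_alloc(struct toy_cache *c)
{
	void *p;

	assert(c != NULL);
	p = malloc(c->obj_size);
	if (p)
		c->outstanding++;
	return p;
}

/* Return one object to the cache and drop it from the accounting. */
static void toy_cache_free(struct toy_cache *c, void *p)
{
	assert(c != NULL);
	if (!p)
		return;
	c->outstanding--;
	free(p);
}
```

If one allocation site keeps using plain malloc() while the object is later handed to toy_cache_free(), the counter goes negative. A balanced counter at teardown is the toy analogue of the leak detection mentioned in the commit message.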
Re: [PATCH] Btrfs: stop using GFP_ATOMIC when allocating rewind ebs
On Thu, August 08, 2013 at 15:12 (+0200), Josef Bacik wrote:
> On Thu, Aug 08, 2013 at 09:23:06AM +0200, Jan Schmidt wrote:
> > On Wed, August 07, 2013 at 23:11 (+0200), Josef Bacik wrote:
> > > There is no reason we can't just set the path to blocking and then do
> > > normal GFP_NOFS allocations for these extent buffers. Thanks,
> > >
> > > Signed-off-by: Josef Bacik <jba...@fusionio.com>
> > > ---
> > >  fs/btrfs/ctree.c     | 16 ++--
> > >  fs/btrfs/extent_io.c |  8
> > >  2 files changed, 14 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > > index 1dd8a71..414a2d7 100644
> > > --- a/fs/btrfs/ctree.c
> > > +++ b/fs/btrfs/ctree.c
> > > @@ -1191,8 +1191,8 @@ __tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
> > >   * is freed (its refcount is decremented).
> > >   */
> > >  static struct extent_buffer *
> > > -tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
> > > -		    u64 time_seq)
> > > +tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
> > > +		    struct extent_buffer *eb, u64 time_seq)
> > >  {
> > >  	struct extent_buffer *eb_rewin;
> > >  	struct tree_mod_elem *tm;
> > > @@ -1207,12 +1207,15 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
> > >  	if (!tm)
> > >  		return eb;
> > >  
> > > +	btrfs_set_path_blocking(path);
> > > +	btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
> > > +
> > >  	if (tm->op == MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
> > >  		BUG_ON(tm->slot != 0);
> > >  		eb_rewin = alloc_dummy_extent_buffer(eb->start,
> > >  						fs_info->tree_root->nodesize);
> > >  		if (!eb_rewin) {
> > > -			btrfs_tree_read_unlock(eb);
> > > +			btrfs_tree_read_unlock_blocking(eb);
> > >  			free_extent_buffer(eb);
> > >  			return NULL;
> > >  		}
> > > @@ -1224,13 +1227,14 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
> > >  	} else {
> > >  		eb_rewin = btrfs_clone_extent_buffer(eb);
> > >  		if (!eb_rewin) {
> > > -			btrfs_tree_read_unlock(eb);
> > > +			btrfs_tree_read_unlock_blocking(eb);
> > >  			free_extent_buffer(eb);
> > >  			return NULL;
> > >  		}
> > >  	}
> > >  
> > > -	btrfs_tree_read_unlock(eb);
> > > +	btrfs_clear_path_blocking(path, NULL, BTRFS_READ_LOCK);
> > > +	btrfs_tree_read_unlock_blocking(eb);
> >
> > unlock_blocking? Rest looks ok to me.
>
> Yeah I change the lock to blocking above, so I have to do
> read_unlock_blocking here. Thanks,

Uh, obviously. Got confused by the btrfs_clear_path_blocking above, but of
course we're locking eb explicitly ourselves.

Reviewed-by: Jan Schmidt <list.bt...@jan-o-sch.net>

Thanks!
-Jan
Re: [PATCH] Btrfs: deal with enomem in the rewind path V3
On Thu, August 08, 2013 at 16:28 (+0200), David Sterba wrote:
> On Thu, Aug 08, 2013 at 09:36:52AM +0200, Jan Schmidt wrote:
> > Weird patch formatting concerning extent_io.c, I assume there are no
> > changes in extent_buffer_under_io and btrfs_release_extent_buffer_page,
> > you just moved btrfs_clone_extent_buffer, right? Perhaps --patience or
> > --minimal could do better? Otherwise,
>
> git diff --patience produces identical result for me (1.8.3.1).

Yeah, I expected that after Josef said that he actually moved the other two
functions, so the structure really changed in a way git cannot diff any better.

> > Reviewed-by: Jan Schmidt list@jan-o-sch.net
>
> ^^^ xfs? :)

Whoops :-) Replace that by btrfs if you wish.

-Jan
Re: [RFC PATCH v5 4/5] Btrfs: disable qgroups accounting when quota is off
Nice try hiding this one in a dedup patch set, but I finally found it :-)

On Wed, July 31, 2013 at 17:37 (+0200), Liu Bo wrote:
> So we don't need to do qgroups accounting trick without enabling quota.
> This reduces my tester's costing time from ~28s to ~23s.
>
> Signed-off-by: Liu Bo <bo.li@oracle.com>
> ---
>  fs/btrfs/extent-tree.c |    6 ++++++
>  fs/btrfs/qgroup.c      |    6 ++++++
>  2 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 10a5c72..c6612f5 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2524,6 +2524,12 @@ int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
>  	struct qgroup_update *qgroup_update;
>  	int ret = 0;
>  
> +	if (!trans->root->fs_info->quota_enabled) {
> +		if (trans->delayed_ref_elem.seq)
> +			btrfs_put_tree_mod_seq(fs_info, &trans->delayed_ref_elem);
> +		return 0;
> +	}
> +
>  	if (list_empty(&trans->qgroup_ref_list) !=
>  	    !trans->delayed_ref_elem.seq) {
>  		/* list without seq or seq without list */
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 1280eff..f3e82aa 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1200,6 +1200,9 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
>  {
>  	struct qgroup_update *u;
>  
> +	if (!trans->root->fs_info->quota_enabled)
> +		return 0;
> +
>  	BUG_ON(!trans->delayed_ref_elem.seq);
>  	u = kmalloc(sizeof(*u), GFP_NOFS);
>  	if (!u)
> @@ -1850,6 +1853,9 @@ out:
>  
>  void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
>  {
> +	if (!trans->root->fs_info->quota_enabled)
> +		return;
> +
>  	if (list_empty(&trans->qgroup_ref_list) && !trans->delayed_ref_elem.seq)
>  		return;
>  	pr_err("btrfs: qgroups not uptodate in trans handle %p: list is%s empty, seq is %#x.%x\n",

The second hunk looks sensible at first sight. However, hunks 1 and 3 don't.
They assert consistency of the qgroup state in well-defined places. The fact
that you need to disable those checks shows that skipping the addition to the
list in the second hunk cannot be right, or at least is not sufficient.

We've got the list of qgroup operations, trans->qgroup_ref_list, and we've got
the qgroup's delayed ref blocker, trans->delayed_ref_elem. If you stop adding
to the list (hunk 2), which seems reasonable when quota is disabled, then you
must also ensure you're not acquiring the delayed ref blocker element, which
should give another performance boost.

need_ref_seq may be the right place for this change. It just feels a bit too
obvious. The critical cases obviously are quota enable and quota disable. I
just don't recall why it wasn't that way from the very beginning of qgroups;
I might be missing something fundamental here.

Thanks,
-Jan
Re: [RFC PATCH v5 4/5] Btrfs: disable qgroups accounting when quota is off
On Mon, August 05, 2013 at 16:18 (+0200), Liu Bo wrote:
> On Mon, Aug 05, 2013 at 02:34:30PM +0200, Jan Schmidt wrote:
> > Nice try hiding this one in a dedup patch set, but I finally found it :-)
>
> Ah, I didn't mean to ;-)
>
> > On Wed, July 31, 2013 at 17:37 (+0200), Liu Bo wrote:
> > > So we don't need to do qgroups accounting trick without enabling quota.
> > > This reduces my tester's costing time from ~28s to ~23s.
> > >
> > > Signed-off-by: Liu Bo <bo.li@oracle.com>
> > > ---
> > >  fs/btrfs/extent-tree.c |    6 ++++++
> > >  fs/btrfs/qgroup.c      |    6 ++++++
> > >  2 files changed, 12 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > index 10a5c72..c6612f5 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -2524,6 +2524,12 @@ int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
> > >  	struct qgroup_update *qgroup_update;
> > >  	int ret = 0;
> > >  
> > > +	if (!trans->root->fs_info->quota_enabled) {
> > > +		if (trans->delayed_ref_elem.seq)
> > > +			btrfs_put_tree_mod_seq(fs_info, &trans->delayed_ref_elem);
> > > +		return 0;
> > > +	}
> > > +
> > >  	if (list_empty(&trans->qgroup_ref_list) !=
> > >  	    !trans->delayed_ref_elem.seq) {
> > >  		/* list without seq or seq without list */
> > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> > > index 1280eff..f3e82aa 100644
> > > --- a/fs/btrfs/qgroup.c
> > > +++ b/fs/btrfs/qgroup.c
> > > @@ -1200,6 +1200,9 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
> > >  {
> > >  	struct qgroup_update *u;
> > >  
> > > +	if (!trans->root->fs_info->quota_enabled)
> > > +		return 0;
> > > +
> > >  	BUG_ON(!trans->delayed_ref_elem.seq);
> > >  	u = kmalloc(sizeof(*u), GFP_NOFS);
> > >  	if (!u)
> > > @@ -1850,6 +1853,9 @@ out:
> > >  
> > >  void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
> > >  {
> > > +	if (!trans->root->fs_info->quota_enabled)
> > > +		return;
> > > +
> > >  	if (list_empty(&trans->qgroup_ref_list) && !trans->delayed_ref_elem.seq)
> > >  		return;
> > >  	pr_err("btrfs: qgroups not uptodate in trans handle %p: list is%s empty, seq is %#x.%x\n",
> >
> > The second hunk looks sensible at first sight. However, hunk 1 and 3 don't.
> > They assert consistency of qgroup state in well defined places. The fact
> > that you need to disable those checks shows that skipping addition to the
> > list in the second hunk cannot be right, or at least not sufficient.
>
> I agree, only hunk 2 is necessary.
>
> > We've got the list of qgroup operations trans->qgroup_ref_list and we've
> > got the qgroup's delayed ref blocker, trans->delayed_ref_elem. If you stop
> > adding to the list (hunk 2) which seems reasonable when quota is disabled,
> > then you also must ensure you're not acquiring the delayed ref blocker
> > element, which should give another performance boost.
>
> WHY a 'must' here?

Because otherwise you are going to hit the BUG_ONs you avoided with hunks 1
and 3.

> > need_ref_seq may be the right place for this change. It just feels a bit
> > too obvious. The critical cases obviously are quota enable and quota
> > disable. I just don't recall why it wasn't that way from the very beginning
> > of qgroups, I might be missing something fundamental here.
>
> Yeah I thought about 'need_ref_seq', but the point is that delayed ref
> blocker not only serves qgroups accounting, but also features based on
> backref walking, such as scrub, snapshot-aware defragment.

I think you're confusing trans->delayed_ref_elem with other callers of
btrfs_get_tree_mod_seq() and btrfs_put_tree_mod_seq(). As far as my grep
reaches, trans->delayed_ref_elem is only used in qgroup context. There are
other callers of btrfs_get_tree_mod_seq() that can put their blocker element
on the stack, such as iterate_extent_inodes(). But I still might be missing
something.

Thanks,
-Jan
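The "must" in this exchange rests on one invariant: the list of queued qgroup updates and the delayed ref blocker are either both in use or both unused. The sketch below is a heavily simplified userspace model of that invariant. struct toy_trans and every function in it are hypothetical; a counter stands in for trans->qgroup_ref_list, an integer for trans->delayed_ref_elem.seq, and the gating in toy_start_ref() only loosely follows the suggestion to make the change where the blocker is acquired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down transaction handle. */
struct toy_trans {
	bool quota_enabled;
	unsigned long blocker_seq;	/* 0 means "blocker not held" */
	size_t n_updates;		/* queued qgroup updates */
};

/* The consistency check: list and blocker are used together or not at all. */
static bool toy_state_consistent(const struct toy_trans *t)
{
	return (t->n_updates == 0) == (t->blocker_seq == 0);
}

/* Acquire the blocker only when quota is enabled. */
static void toy_start_ref(struct toy_trans *t, unsigned long seq)
{
	if (t->quota_enabled && !t->blocker_seq)
		t->blocker_seq = seq;
}

/* Queue an update; with quota off this is a no-op (the idea of hunk 2). */
static int toy_record_ref(struct toy_trans *t)
{
	if (!t->quota_enabled)
		return 0;
	assert(t->blocker_seq != 0);	/* stands in for the BUG_ON */
	t->n_updates++;
	return 0;
}
```

In this model, a handle that holds the blocker while quota is off ends up with a sequence number but an empty list, which is exactly the "seq without list" state that the checks removed by hunks 1 and 3 would have flagged.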
Re: [PATCH v2] Btrfs: add missing error check to find_parent_nodes
On Wed, July 31, 2013 at 01:26 (+0200), Filipe David Borba Manana wrote:
> Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
> ---
>
> V2: Ensure extent buffer is freed on error.
>
>  fs/btrfs/backref.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 8bc5e8c..980e85a 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -935,8 +935,10 @@ again:
>  			}
>  			ret = find_extent_in_eb(eb, bytenr,
>  						*extent_item_pos, &eie);
> -			ref->inode_list = eie;
>  			free_extent_buffer(eb);
> +			if (ret < 0)
> +				goto out;
> +			ref->inode_list = eie;
>  		}
>  		ret = ulist_add_merge(refs, ref->parent,
>  				      (uintptr_t)ref->inode_list,

The only ret < 0 I'm seeing is ENOMEM, so that should be safe.

Reviewed-by: Jan Schmidt <list.bt...@jan-o-sch.net>

Thanks,
-Jan
Re: Cloning a Btrfs partition
On Mon, July 29, 2013 at 17:32 (+0200), BJ Quinn wrote:
> Thanks for the response! Not sure I want to roll a custom kernel on this
> particular system. Any idea on when it might make it to 3.10 stable or 3.11?
> Or should I just revert back to 3.9?

I missed that it is in fact already in 3.11, and if I got Liu Bo right, he's
going to send it to 3.10 stable soon.

Thanks,
-Jan

> Thanks!
>
> -BJ
>
> ----- Original Message -----
> From: Jan Schmidt <list.bt...@jan-o-sch.net>
> Sent: Monday, July 29, 2013 3:21:51 AM
>
> Hi BJ,
>
> [original message rewrapped]
>
> On Thu, July 25, 2013 at 18:32 (+0200), BJ Quinn wrote:
> > (Apologies for the double post -- forgot to send as plain text the first
> > time around, so the list rejected it.)
> >
> > I see that there's now a btrfs send / receive and I've tried using it, but
> > I'm getting the oops I've pasted below, after which the FS becomes
> > unresponsive (no I/O to the drive, no CPU usage, but all attempts to access
> > the FS result in a hang). I have an internal drive (single drive) that
> > contains 82GB of compressed data with a couple hundred snapshots. I tried
> > taking the first snapshot and making a read only copy (btrfs subvolume
> > snapshot -r) and then I connected an external USB drive and ran btrfs send
> > / receive to that external drive. It starts working and gets a couple of GB
> > in (I'd expect the first snapshot to be about 20GB) and then gets the
> > following error. I had to use the latest copy of btrfs-progs from git,
> > because the package installed on my system
> > (btrfs-progs-0.20-0.2.git91d9eec) simply returned invalid argument when
> > trying to run btrfs send / receive. Thanks in advance for any info you may
> > have.
>
> The problem has been introduced with rbtree ulists in 3.10, commit
>
>   Btrfs: add a rb_tree to improve performance of ulist search
>
> You should be safe to revert that commit, it's a performance optimization
> attempt. Alternatively, you can apply the published fix
>
>   Btrfs: fix crash regarding to ulist_add_merge
>
> It has not made it into 3.10 stable or 3.11, yet, but is contained in Josef's
> btrfs-next
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
>
> Thanks,
> -Jan
Re: Fwd: Cloning a Btrfs partition
Hi BJ,

[original message rewrapped]

On Thu, July 25, 2013 at 18:32 (+0200), BJ Quinn wrote:
> (Apologies for the double post -- forgot to send as plain text the first time
> around, so the list rejected it.)
>
> I see that there's now a btrfs send / receive and I've tried using it, but
> I'm getting the oops I've pasted below, after which the FS becomes
> unresponsive (no I/O to the drive, no CPU usage, but all attempts to access
> the FS result in a hang). I have an internal drive (single drive) that
> contains 82GB of compressed data with a couple hundred snapshots. I tried
> taking the first snapshot and making a read only copy (btrfs subvolume
> snapshot -r) and then I connected an external USB drive and ran btrfs send /
> receive to that external drive. It starts working and gets a couple of GB in
> (I'd expect the first snapshot to be about 20GB) and then gets the following
> error. I had to use the latest copy of btrfs-progs from git, because the
> package installed on my system (btrfs-progs-0.20-0.2.git91d9eec) simply
> returned invalid argument when trying to run btrfs send / receive. Thanks in
> advance for any info you may have.

The problem has been introduced with rbtree ulists in 3.10, commit

  Btrfs: add a rb_tree to improve performance of ulist search

You should be safe to revert that commit, it's a performance optimization
attempt. Alternatively, you can apply the published fix

  Btrfs: fix crash regarding to ulist_add_merge

It has not made it into 3.10 stable or 3.11, yet, but is contained in Josef's
btrfs-next

  git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

Thanks,
-Jan
Re: [PATCH v3] Btrfs: fix crash regarding to ulist_add_merge
On Fri, June 28, 2013 at 06:37 (+0200), Liu Bo wrote:
> Several users reported this crash of NULL pointer or general protection,
> the story is that we add a rbtree for speedup ulist iteration, and we
> use krealloc() to address ulist growth, and krealloc() use memcpy to copy
> old data to new memory area, so it's OK for an array as it doesn't use
> pointers while it's not OK for a rbtree as it uses pointers.
>
> So krealloc() will mess up our rbtree and it ends up with crash.
>
> Reviewed-by: Wang Shilong wangsl-f...@cn.fujitsu.com
> Signed-off-by: Liu Bo bo.li@oracle.com
> ---
> v3: fix a return value problem(Thanks Wang Shilong).
> v2: fix an use-after-free bug and a finger error(Thanks Zach and Josef).
>
>  fs/btrfs/ulist.c |   15 +++
>  1 files changed, 15 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c
> index 7b417e2..b0a523b2 100644
> --- a/fs/btrfs/ulist.c
> +++ b/fs/btrfs/ulist.c
> @@ -205,6 +205,10 @@ int ulist_add_merge(struct ulist *ulist, u64 val, u64 aux,
>  		u64 new_alloced = ulist->nodes_alloced + 128;
>  		struct ulist_node *new_nodes;
>  		void *old = NULL;
> +		int i;
> +
> +		for (i = 0; i < ulist->nnodes; i++)
> +			rb_erase(&ulist->nodes[i].rb_node, &ulist->root);
>  
>  		/*
>  		 * if nodes_alloced == ULIST_SIZE no memory has been allocated
> @@ -224,6 +228,17 @@ int ulist_add_merge(struct ulist *ulist, u64 val, u64 aux,
>  
>  		ulist->nodes = new_nodes;
>  		ulist->nodes_alloced = new_alloced;
> +
> +		/*
> +		 * krealloc actually uses memcpy, which does not copy rb_node
> +		 * pointers, so we have to do it ourselves.  Otherwise we may
> +		 * be bitten by crashes.
> +		 */
> +		for (i = 0; i < ulist->nnodes; i++) {
> +			ret = ulist_rbtree_insert(ulist, &ulist->nodes[i]);
> +			if (ret < 0)
> +				return ret;
> +		}
>  	}
>  	ulist->nodes[ulist->nnodes].val = val;
>  	ulist->nodes[ulist->nnodes].aux = aux;

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net

Josef, how about sending this one for the next 3.11 rc and to 3.10 stable? Any
objections?

-Jan
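For readers who want to see the failure mode outside the kernel, the following is a minimal userspace sketch, not the btrfs code. It assumes a growable array whose elements are linked to each other by pointers, with a sorted singly linked list standing in for the rb-tree. All names (`struct set`, `set_grow`, `set_link`, `set_check`) are hypothetical, and the array is grown with malloc + memcpy + free so that it always moves, which is the case krealloc() may hit. Copying the bytes keeps the old link values, so every link has to be rebuilt against the new addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ulist_node: payload plus an intrusive link. */
struct node {
	unsigned long val;
	struct node *next;	/* points at another element of the same array */
};

struct set {
	struct node *nodes;	/* growable array, like ulist->nodes */
	size_t nnodes;
	size_t alloced;
	struct node *head;	/* sorted list, stand-in for the rb-tree root */
};

/* Link one array element into the sorted list. */
static void set_link(struct set *s, struct node *n)
{
	struct node **p = &s->head;

	while (*p && (*p)->val < n->val)
		p = &(*p)->next;
	n->next = *p;
	*p = n;
}

/*
 * Grow the array. The block always moves here, and memcpy copies the old
 * link values verbatim, so they would still point into the freed block.
 */
static int set_grow(struct set *s)
{
	size_t new_alloced = s->alloced + 4;
	struct node *new_nodes = malloc(new_alloced * sizeof(*new_nodes));
	size_t i;

	if (!new_nodes)
		return -1;
	if (s->nnodes)
		memcpy(new_nodes, s->nodes, s->nnodes * sizeof(*new_nodes));
	free(s->nodes);
	s->nodes = new_nodes;
	s->alloced = new_alloced;

	/* Drop every stale link, then re-insert each element at its new address. */
	s->head = NULL;
	for (i = 0; i < s->nnodes; i++)
		set_link(s, &s->nodes[i]);
	return 0;
}

static int set_add(struct set *s, unsigned long val)
{
	if (s->nnodes == s->alloced && set_grow(s) < 0)
		return -1;
	s->nodes[s->nnodes].val = val;
	set_link(s, &s->nodes[s->nnodes]);
	s->nnodes++;
	return 0;
}

/* Returns 1 if the list is sorted, complete and points only into s->nodes. */
static int set_check(const struct set *s)
{
	const struct node *n;
	size_t count = 0;

	for (n = s->head; n; n = n->next) {
		if (n < s->nodes || n >= s->nodes + s->nnodes)
			return 0;
		if (n->next && n->next->val < n->val)
			return 0;
		count++;
	}
	return count == s->nnodes;
}
```

Calling `set_add()` often enough to trigger several grows and then `set_check()` shows the links surviving the moves. Without the relink loop in `set_grow()`, the first walk after a grow would follow pointers into freed memory, which is the kind of crash the patch above addresses.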
[PATCH v2 0/2] xfstest btrfs/316: test send / receive (was: btrfs/314)
From: root root@zarzz.(none)

These two patches add the announced tests for btrfs send / receive. As
requested, the fssum tool is now included.

One drawback is that I'm unable to edit configure.ac or whatever needs to be
modified in an autotools preferred way. Any hints appreciated, preferably
hints containing all the modifications required to introduce something like
HAVE_SEEK_HOLE. I do not want to make modifications to fssum.c here. If that's
absolutely required (because one /could/ get along using linux/fs.h, which is
not the way I would like to go), I'd like to have that changed in the
far-progs repository where fssum.c comes from as well.

Jan Schmidt (2):
  xfstests: add fssum tool
  xfstests btrfs/316: test send / receive

 .gitignore          |    1 +
 README              |    3 +
 common/config       |    2 +
 src/Makefile        |   11 +-
 src/fssum.c         |  819 +++
 tests/btrfs/316     |  113 +++
 tests/btrfs/316.out |    4 +
 tests/btrfs/group   |    1 +
 8 files changed, 953 insertions(+), 1 deletions(-)
 create mode 100644 src/fssum.c
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out

-- 
1.7.2.5
[PATCH v2 2/2] xfstests btrfs/316: test send / receive
Basic send / receive functionality test for btrfs. Requires current
version of fsstress built (-x support). Relies on fssum tool but can
skip the test if it failed to build.

Signed-off-by: Jan Schmidt list@jan-o-sch.net
---
 README              |    3 +
 tests/btrfs/316     |  113 +++
 tests/btrfs/316.out |    4 ++
 tests/btrfs/group   |    1 +
 4 files changed, 121 insertions(+), 0 deletions(-)
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out

diff --git a/README b/README
index a49ca7c..d287f63 100644
--- a/README
+++ b/README
@@ -26,6 +26,9 @@ Preparing system for tests (IRIX and Linux):
       http://www.extra.research.philips.com/udf/, then copy the udf_test
       binary to xfstests/src/. If you wish to disable UDF verification test
       set the environment variable DISABLE_UDF_TEST to 1.
+    - If you wish to run the btrfs send / receive components of the suite
+      install fssum from
+      git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git
 
     - create one or two partitions to use for testing
diff --git a/tests/btrfs/316 b/tests/btrfs/316
new file mode 100755
index 0000000..2e86428
--- /dev/null
+++ b/tests/btrfs/316
@@ -0,0 +1,113 @@
+#! /bin/bash
+# FSQA Test No. 314
+#
+# Run fsstress to create a reasonably strange file system, make a
+# snapshot (base) and run more fsstress. Then take another snapshot
+# (incr) and send both snapshots to a temp file. Remake the file
+# system and receive from the files. Check both states with fssum.
+#
+#-----------------------------------------------------------------------
+# Copyright (C) 2013 STRATO.  All rights reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#-----------------------------------------------------------------------
+#
+# creator
+owner=list.bt...@jan-o-sch.net
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=`mktemp -d`
+status=1
+
+_cleanup()
+{
+	echo "*** unmount"
+	umount $SCRATCH_MNT 2>/dev/null
+	rm -f $tmp.*
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_command $FSSUM_PROG fssum
+
+rm -f $seqres.full
+
+workout()
+{
+	fsz=$1
+	ops=$2
+
+	umount $SCRATCH_DEV >/dev/null 2>&1
+	echo "*** mkfs -dsize=$fsz" >>$seqres.full
+	echo "" >>$seqres.full
+	_scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+		|| _fail "size=$fsz mkfs failed"
+	run_check _scratch_mount "-o noatime"
+
+	run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n $ops $FSSTRESS_AVOID -x \
+		"$BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/base"
+
+	run_check $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/incr
+
+	echo "# $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap" \
+		>> $seqres.full
+	$BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap 2>> $seqres.full \
+		|| _fail "failed: '$@'"
+	echo "# $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base\
+		$SCRATCH_MNT/incr > $tmp/incr.snap" >> $seqres.full
+	$BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base \
+		$SCRATCH_MNT/incr > $tmp/incr.snap 2>> $seqres.full \
+		|| _fail "failed: '$@'"
+
+	run_check $FSSUM_PROG -A -f -w $tmp/base.fssum $SCRATCH_MNT/base
+	run_check $FSSUM_PROG -A -f -w $tmp/incr.fssum -x $SCRATCH_MNT/incr/base \
+		$SCRATCH_MNT/incr
+
+	umount $SCRATCH_DEV >/dev/null 2>&1
+	echo "*** mkfs -dsize=$fsz" >>$seqres.full
+	echo "" >>$seqres.full
+	_scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+		|| _fail "size=$fsz mkfs failed"
+	run_check _scratch_mount "-o noatime"
+
+	run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT < $tmp/base.snap
+	run_check $FSSUM_PROG -r $tmp/base.fssum $SCRATCH_MNT/base
+
+	run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT < $tmp/incr.snap
+	run_check $FSSUM_PROG -r $tmp/incr.fssum $SCRATCH_MNT/incr
+}
+
+echo "*** test send / receive"
+
+fssize=`expr 2000 \* 1024 \* 1024`
+ops=200
+
+workout $fssize $ops
+
+echo "*** done"
+status=0
+exit
Re: [PATCH] Btrfs: fix extent buffer leak after backref walking
On Wed, July 03, 2013 at 08:40 (+0200), Liu Bo wrote:
> commit 47fb091fb787420cd195e66f162737401cce023f
> (Btrfs: fix unlock after free on rewinded tree blocks)
> takes an extra increment on the reference of allocated dummy extent buffer,
> so now we cannot free this dummy one, and end up with extent buffer leak.
>
> Signed-off-by: Liu Bo bo.li@oracle.com

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net

> ---
>  fs/btrfs/ctree.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 02fae7f..3d790b4 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1268,12 +1268,12 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
>  		BUG_ON(!eb_rewin);
>  	}
>  
> -	extent_buffer_get(eb_rewin);
>  	btrfs_tree_read_unlock(eb);
>  	free_extent_buffer(eb);
>  
>  	extent_buffer_get(eb_rewin);
>  	btrfs_tree_read_lock(eb_rewin);
> +
>  	__tree_mod_log_rewind(eb_rewin, time_seq, tm);
>  	WARN_ON(btrfs_header_nritems(eb_rewin) >
>  		BTRFS_NODEPTRS_PER_BLOCK(fs_info->tree_root));
Re: [PATCH] Btrfs: hold the tree mod lock in __tree_mod_log_rewind
On Sun, June 30, 2013 at 15:55 (+0200), Josef Bacik wrote:
> On Sun, Jun 30, 2013 at 10:25:05AM +0200, Jan Schmidt wrote:
>> On 30.06.2013 05:17, Josef Bacik wrote:
>>> We need to hold the tree mod log lock in __tree_mod_log_rewind since we walk
>>> forward in the tree mod entries, otherwise we'll end up with random entries and
>>> trip the BUG_ON() at the front of __tree_mod_log_rewind. This fixes the panics
>>> people were seeing when running
>>>
>>> find /whatever -type f -exec btrfs fi defrag {} \;
>>
>> This patch cannot help to solve the problem, as far as I've understood what is
>> going on. It does change timing, though, which presumably makes it pass the
>> current reproducer we're having.
>>
>> On rewinding, iteration through the tree mod log rb-tree goes backwards in time,
>> which means that once we've found our starting point we cannot be trapped by
>> later additions. The old items we're rewinding towards cannot be freed, because
>> we've allocated a blocker element within the tree and rewinding never goes
>> beyond the allocated blocker. The blocker element is allocated by
>> btrfs_get_tree_mod_seq and mostly referred to as time_seq within the other tree
>> mod log functions in ctree.c. To sum up, the added lock is not required.
>>
>> The debug output I've analyzed so far shows that after we've rewinded all
>> REMOVE_WHILE_FREEING operations on a buffer, ordered consecutively as expected,
>> there comes another REMOVE_WHILE_FREEING with a sequence number much further in
>> the past for the same buffer (but that sequence number is still higher than our
>> time_seq rewind barrier at that point). This must be a logical problem I've not
>> completely understood so far, but locking doesn't seem to be the right track.
>
> Finally reproduced it, this is my output
>
> btrfs-endio-wri-23110 [000] ...2 9556.882103: __tree_mod_log_rewind: rewinding 15450537984
> btrfs-endio-wri-23110 [000] ...2 9556.882104: __tree_mod_log_rewind: 15450537984: processing 880246590a40, op 3, seq 68719476829, slot 0
> btrfs-endio-wri-23110 [000] ...2 9556.882106: __tree_mod_log_rewind: 15450537984: processing 880246590ac0, op 3, seq 68719476828, slot 1
> btrfs-endio-wri-23110 [000] ...2 9556.882108: __tree_mod_log_rewind: 15450537984: processing 880246590a40, op 3, seq 68719476829, slot 0
> btrfs-endio-wri-23110 [000] ...2 9556.882110: __tree_mod_log_rewind: 15450537984: this tm is failing, 880246590a40, seq 68719476829, slot 0
>
> so I'm inclined to believe I've got it right. Thanks,

Looking at the code I agree we should have a read lock around rb_next,
protecting it against reorganization during insertions. That fits this kind of
debug output: the entry at 880246590a40 (seq 68719476829) is handed out a
second time after the walk has already moved on to seq 68719476828.

How about just getting the lock for the rb_next call? There can be quite a lot
of operations to rewind and I'd rather not have every other fs tree
modification block on that.

Thanks,
-Jan

> Josef
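The narrower locking proposed above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the ctree.c code: `struct toy_rwlock` is a counting mock rather than a real lock, `struct log_entry` and its `older` link are hypothetical stand-ins for a tree mod log element and for what rb_next() returns, and the sketch relies on the point made earlier in the thread that entries at or above time_seq cannot be freed while the blocker exists, so holding a pointer to the current entry without the lock is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Toy read lock: only counts, so the sketch can check its own discipline. */
struct toy_rwlock {
	int readers;			/* read locks currently held */
	unsigned long acquisitions;	/* total number of read lock calls */
};

static void toy_read_lock(struct toy_rwlock *l)
{
	l->readers++;
	l->acquisitions++;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* Hypothetical log entry; "older" stands in for what rb_next() returns. */
struct log_entry {
	unsigned long long seq;
	struct log_entry *older;
	int rewound;
};

/* The only place that follows shared links, so the only place that locks. */
static struct log_entry *log_next(struct toy_rwlock *l, struct log_entry *e)
{
	struct log_entry *next;

	toy_read_lock(l);
	next = e->older;
	toy_read_unlock(l);
	return next;
}

/* Rewind every entry with seq >= time_seq; returns how many were rewound. */
static unsigned int log_rewind(struct toy_rwlock *l, struct log_entry *first,
			       unsigned long long time_seq)
{
	struct log_entry *e = first;
	unsigned int n = 0;

	while (e && e->seq >= time_seq) {
		assert(l->readers == 0);	/* per-entry work runs unlocked */
		e->rewound = 1;
		n++;
		e = log_next(l, e);
	}
	return n;
}
```

The trade-off is one lock round trip per entry instead of one for the whole walk, in exchange for writers never waiting on a long rewind.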
Re: [PATCH] Btrfs: stop using GFP_ATOMIC for the tree mod log allocations
On Mon, July 01, 2013 at 22:25 (+0200), Josef Bacik wrote:
> Previously we held the tree mod lock when adding stuff because we use it to
> check and see if we truly do want to track tree modifications. This is
> admirable, but GFP_ATOMIC in a critical area that is going to get hit pretty
> hard and often is not nice. So instead do our basic checks to see if we don't
> need to track modifications, and if those pass then do our allocation, and then
> when we go to insert the new modification check if we still care, and if we
> don't just free up our mod and return. Otherwise we're good to go and we can
> carry on. Thanks,

I'd like to look at a side-by-side diff of that patch in my editor. However, it
does not apply to your current master branch, and git even refuses trying a
3-way-merge because your repository lacks necessary blobs. Can you please push
something?

Thanks,
-Jan

> Signed-off-by: Josef Bacik jba...@fusionio.com
> ---
>  fs/btrfs/ctree.c |  161 ++
>  1 files changed, 54 insertions(+), 107 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 127e1fd..fff08f9 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -484,8 +484,27 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
>  	struct rb_node **new;
>  	struct rb_node *parent = NULL;
>  	struct tree_mod_elem *cur;
> +	int ret = 0;
> +
> +	BUG_ON(!tm);
> +
> +	tree_mod_log_write_lock(fs_info);
> +	if (list_empty(&fs_info->tree_mod_seq_list)) {
> +		tree_mod_log_write_unlock(fs_info);
> +		/*
> +		 * Ok we no longer care about logging modifications, free up tm
> +		 * and return 0.  Any callers shouldn't be using tm after
> +		 * calling tree_mod_log_insert, but if they do we can just
> +		 * change this to return a special error code to let the callers
> +		 * do their own thing.
> +		 */
> +		kfree(tm);
> +		return 0;
> +	}
>  
> -	BUG_ON(!tm || !tm->seq);
> +	spin_lock(&fs_info->tree_mod_seq_lock);
> +	tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
> +	spin_unlock(&fs_info->tree_mod_seq_lock);
>  
>  	tm_root = &fs_info->tree_mod_log;
>  	new = &tm_root->rb_node;
> @@ -501,14 +520,17 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
>  		else if (cur->seq > tm->seq)
>  			new = &((*new)->rb_right);
>  		else {
> +			ret = -EEXIST;
>  			kfree(tm);
> -			return -EEXIST;
> +			goto out;
>  		}
>  	}
>  
>  	rb_link_node(&tm->node, parent, new);
>  	rb_insert_color(&tm->node, tm_root);
> -	return 0;
> +out:
> +	tree_mod_log_write_unlock(fs_info);
> +	return ret;
>  }
>  
>  /*
> @@ -524,55 +546,17 @@ static inline int tree_mod_dont_log(struct btrfs_fs_info *fs_info,
>  		return 1;
>  	if (eb && btrfs_header_level(eb) == 0)
>  		return 1;
> -
> -	tree_mod_log_write_lock(fs_info);
> -	if (list_empty(&fs_info->tree_mod_seq_list)) {
> -		/*
> -		 * someone emptied the list while we were waiting for the lock.
> -		 * we must not add to the list when no blocker exists.
> -		 */
> -		tree_mod_log_write_unlock(fs_info);
> -		return 1;
> -	}
> -
>  	return 0;
>  }
>  
> -/*
> - * This allocates memory and gets a tree modification sequence number.
> - *
> - * Returns <0 on error.
> - * Returns >0 (the added sequence number) on success.
> - */
> -static inline struct tree_mod_elem *
> -tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags)
> -{
> -	struct tree_mod_elem *tm;
> -
> -	/*
> -	 * once we switch from spin locks to something different, we should
> -	 * honor the flags parameter here.
> -	 */
> -	tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
> -	if (!tm)
> -		return NULL;
> -
> -	spin_lock(&fs_info->tree_mod_seq_lock);
> -	tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
> -	spin_unlock(&fs_info->tree_mod_seq_lock);
> -
> -	return tm;
> -}
> -
>  static inline int
>  __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
>  			  struct extent_buffer *eb, int slot,
>  			  enum mod_log_op op, gfp_t flags)
>  {
> -	int ret;
>  	struct tree_mod_elem *tm;
>  
> -	tm = tree_mod_alloc(fs_info, flags);
> +	tm = kzalloc(sizeof(*tm), flags);
>  	if (!tm)
>  		return -ENOMEM;
>  
> @@ -589,34 +573,14 @@ __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
>  }
>  
>  static noinline int
> -tree_mod_log_insert_key_mask(struct btrfs_fs_info *fs_info,
> -			     struct extent_buffer *eb, int slot,
> -			     enum mod_log_op op, gfp_t flags)
> +tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
> +			struct extent_buffer *eb, int slot,
> +
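The ordering described in the commit message (cheap check, allocate without the lock, re-check under the lock, free if nobody cares any more) can be sketched in userspace like this. It is an illustration only: `struct toy_lock` is a mock that merely records whether it is held, `nr_blockers` is a hypothetical stand-in for a non-empty tree_mod_seq_list, and `calloc()` stands in for an allocation that is allowed to sleep.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock lock: records whether it is held so misuse trips an assert. */
struct toy_lock {
	int held;
};

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Hypothetical log element and log, loosely modelled on the patch. */
struct mod_elem {
	unsigned long long seq;
	int slot;
	struct mod_elem *next;
};

struct mod_log {
	struct toy_lock lock;
	int nr_blockers;		/* stand-in for !list_empty(seq list) */
	unsigned long long next_seq;
	struct mod_elem *head;
	unsigned int nr_logged;
};

/* Takes ownership of tm: it is either linked into the log or freed. */
static int mod_log_insert(struct mod_log *log, struct mod_elem *tm)
{
	toy_acquire(&log->lock);
	if (log->nr_blockers == 0) {
		/* Nobody cares any more: drop the element, report success. */
		toy_release(&log->lock);
		free(tm);
		return 0;
	}
	tm->seq = ++log->next_seq;	/* sequence number assigned under the lock */
	tm->next = log->head;
	log->head = tm;
	log->nr_logged++;
	toy_release(&log->lock);
	return 0;
}

static int mod_log_key(struct mod_log *log, int slot)
{
	struct mod_elem *tm;

	if (log->nr_blockers == 0)	/* cheap unlocked pre-check */
		return 0;

	assert(!log->lock.held);	/* the allocation may sleep */
	tm = calloc(1, sizeof(*tm));
	if (!tm)
		return -1;
	tm->slot = slot;
	return mod_log_insert(log, tm);
}
```

The price is an occasional wasted allocation when the last blocker disappears between the pre-check and the insert. In return the allocator never runs with the lock held, so it does not need an atomic allocation mode.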
Re: [PATCH] Btrfs: only do the tree_mod_log_free_eb if this is our last ref
(resent to list)

On Mon, July 01, 2013 at 22:12 (+0200), Josef Bacik wrote:
> There is another bug in the tree mod log stuff in that we're calling
> tree_mod_log_free_eb every single time a block is cow'ed. The problem with this
> is that if this block is shared by multiple snapshots we will call this multiple
> times per block, so if we go to rewind the mod log for this block we'll BUG_ON()
> in __tree_mod_log_rewind because we try to rewind a free twice. We only want to
> call tree_mod_log_free_eb if we are actually freeing the block. With this patch
> I no longer hit the panic in __tree_mod_log_rewind. Thanks,
>
> Signed-off-by: Josef Bacik jba...@fusionio.com

Strange that never really popped up largely so far, should be quite easy to hit.
Anyway,

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net

> ---
>  fs/btrfs/ctree.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 32e30ad..127e1fd 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1093,7 +1093,8 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>  		btrfs_set_node_ptr_generation(parent, parent_slot,
>  					      trans->transid);
>  		btrfs_mark_buffer_dirty(parent);
> -		tree_mod_log_free_eb(root->fs_info, buf);
> +		if (last_ref)
> +			tree_mod_log_free_eb(root->fs_info, buf);
>  		btrfs_free_tree_block(trans, root, buf, parent_start,
>  				      last_ref);
>  	}
Re: [PATCH] Btrfs: hold the tree mod lock in __tree_mod_log_rewind
On 30.06.2013 05:17, Josef Bacik wrote:
> We need to hold the tree mod log lock in __tree_mod_log_rewind since we walk
> forward in the tree mod entries, otherwise we'll end up with random entries and
> trip the BUG_ON() at the front of __tree_mod_log_rewind. This fixes the panics
> people were seeing when running
>
> find /whatever -type f -exec btrfs fi defrag {} \;

This patch cannot help to solve the problem, as far as I've understood what is
going on. It does change timing, though, which presumably makes it pass the
current reproducer we're having.

On rewinding, iteration through the tree mod log rb-tree goes backwards in time,
which means that once we've found our starting point we cannot be trapped by
later additions. The old items we're rewinding towards cannot be freed, because
we've allocated a blocker element within the tree and rewinding never goes
beyond the allocated blocker. The blocker element is allocated by
btrfs_get_tree_mod_seq and mostly referred to as time_seq within the other tree
mod log functions in ctree.c. To sum up, the added lock is not required.

The debug output I've analyzed so far shows that after we've rewinded all
REMOVE_WHILE_FREEING operations on a buffer, ordered consecutively as expected,
there comes another REMOVE_WHILE_FREEING with a sequence number much further in
the past for the same buffer (but that sequence number is still higher than our
time_seq rewind barrier at that point). This must be a logical problem I've not
completely understood so far, but locking doesn't seem to be the right track.

Thanks,
-Jan

> Thanks,
>
> Signed-off-by: Josef Bacik jba...@fusionio.com
> ---
>  fs/btrfs/ctree.c |   10 ++
>  1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index c32d03d..7921e1d 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1161,8 +1161,8 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
>   * time_seq).
>   */
>  static void
> -__tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq,
> -		      struct tree_mod_elem *first_tm)
> +__tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
> +		      u64 time_seq, struct tree_mod_elem *first_tm)
>  {
>  	u32 n;
>  	struct rb_node *next;
> @@ -1172,6 +1172,7 @@ __tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq,
>  	unsigned long p_size = sizeof(struct btrfs_key_ptr);
>  
>  	n = btrfs_header_nritems(eb);
> +	tree_mod_log_read_lock(fs_info);
>  	while (tm && tm->seq >= time_seq) {
>  		/*
>  		 * all the operations are recorded with the operator used for
> @@ -1226,6 +1227,7 @@ __tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq,
>  		if (tm->index != first_tm->index)
>  			break;
>  	}
> +	tree_mod_log_read_unlock(fs_info);
>  	btrfs_set_header_nritems(eb, n);
>  }
>  
> @@ -1274,7 +1276,7 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
>  
>  	extent_buffer_get(eb_rewin);
>  	btrfs_tree_read_lock(eb_rewin);
> -	__tree_mod_log_rewind(eb_rewin, time_seq, tm);
> +	__tree_mod_log_rewind(fs_info, eb_rewin, time_seq, tm);
>  	WARN_ON(btrfs_header_nritems(eb_rewin) >
>  		BTRFS_NODEPTRS_PER_BLOCK(fs_info->tree_root));
>  
> @@ -1350,7 +1352,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
>  		btrfs_set_header_generation(eb, old_generation);
>  	}
>  	if (tm)
> -		__tree_mod_log_rewind(eb, time_seq, tm);
> +		__tree_mod_log_rewind(root->fs_info, eb, time_seq, tm);
>  	else
>  		WARN_ON(btrfs_header_level(eb) != 0);
>  	WARN_ON(btrfs_header_nritems(eb) > BTRFS_NODEPTRS_PER_BLOCK(root));
Re: [PATCH 0/3] Btrfs: qgroup rescan fixes for next rc
Hi Chris,

I know, Linus is turning grumpy again. I'd still feel better if we sent this
patch set for the very next rc now. Any particular objections?

-Jan

On Tue, May 28, 2013 at 17:47 (+0200), Jan Schmidt wrote:
> Here are three fixes for the new qgroup rescan feature. The first two are
> quite small, the third one is a little bigger. I thought about splitting
> that one up, but in the end I didn't find a good point to break that up.
> It achieves more than one goal, I agree, but it's more or less a compact
> code change that need not be split artificially in my opinion.
>
> Jan Schmidt (3):
>   Btrfs: fix memory patcher through fs_info->qgroup_ulist
>   Btrfs: avoid double free of fs_info->qgroup_ulist
>   Btrfs: fix qgroup rescan resume on mount
>
>  fs/btrfs/ctree.h   |    2 +
>  fs/btrfs/disk-io.c |    2 +
>  fs/btrfs/qgroup.c  |  198 +---
>  3 files changed, 131 insertions(+), 71 deletions(-)
Re: [PATCH] xfstests btrfs/314: test send / receive
(cc Arne for far-progs discussion)

On Thu, June 06, 2013 at 19:54 (+0200), Eric Sandeen wrote:
> On 6/6/13 10:20 AM, Jan Schmidt wrote:
>> Basic send / receive functionality test for btrfs. Requires current
>> version of fsstress built (-x support). Relies on fssum tool, which is
>> not part of the test suite but can skip the test if it is missing.
>>
>> Signed-off-by: Jan Schmidt list@jan-o-sch.net
>
> w/o commenting on the test itself, I'm a little uneasy about requiring
> some external, not-widely-installed tool for this to run.
>
> The fear is that it won't be run as often as it could/should be.

The main purpose is to have it run by developers changing something around btrfs
send / receive and probably the backref walker (while there exists a separate
test not requiring fssum for backrefs). I think we can get them to install fssum.

> Could the same test be done w/o fssum, or should we maybe put a copy
> of fssum into xfstests/src/fssum.c ?

I don't know any adequate replacement for fssum in this case. The purpose is to
build a checksum for a whole file system tree, including data and partly
metadata.

I don't feel like copying fssum from far-progs into xfstests, though it probably
won't hurt much. However, I cannot promise we won't make changes to it for
far-progs, probably creating two incompatible versions of fssum in the wild. Arne?

> Or does fssum exist in any standard distro package?

It doesn't. Perhaps Josef can hurry and make a Fedora package for it, if that
prevents a separate copy to xfstests :-)

Thanks,
-Jan

> Thanks,
> -Eric
>
>> ---
>>  README              |    3 +
>>  common/config       |    2 +
>>  tests/btrfs/314     |  113 +++
>>  tests/btrfs/314.out |    4 ++
>>  tests/btrfs/group   |    1 +
>>  5 files changed, 123 insertions(+), 0 deletions(-)
>>  create mode 100755 tests/btrfs/314
>>  create mode 100644 tests/btrfs/314.out
>>
>> diff --git a/README b/README
>> index d4d4f31..56b31f0 100644
>> --- a/README
>> +++ b/README
>> @@ -26,6 +26,9 @@ Preparing system for tests (IRIX and Linux):
>>        http://www.extra.research.philips.com/udf/, then copy the udf_test
>>        binary to xfstests/src/. If you wish to disable UDF verification test
>>        set the environment variable DISABLE_UDF_TEST to 1.
>> +    - If you wish to run the btrfs send / receive components of the suite
>> +      install fssum from
>> +      git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git
>>  
>>      - create one or two partitions to use for testing
>> diff --git a/common/config b/common/config
>> index 67c1498..1c11da3 100644
>> --- a/common/config
>> +++ b/common/config
>> @@ -146,6 +146,8 @@ export SED_PROG="`set_prog_path sed`"
>>  export BC_PROG="`set_prog_path bc`"
>>  [ "$BC_PROG" = "" ] && _fatal "bc not found"
>>  
>> +export FSSUM_PROG="`set_prog_path fssum`"
>> +
>>  export PS_ALL_FLAGS="-ef"
>>  
>>  export DF_PROG="`set_prog_path df`"
>> diff --git a/tests/btrfs/314 b/tests/btrfs/314
>> new file mode 100755
>> index 0000000..2e86428
>> --- /dev/null
>> +++ b/tests/btrfs/314
>> @@ -0,0 +1,113 @@
>> +#! /bin/bash
>> +# FSQA Test No. 314
>> +#
>> +# Run fsstress to create a reasonably strange file system, make a
>> +# snapshot (base) and run more fsstress. Then take another snapshot
>> +# (incr) and send both snapshots to a temp file. Remake the file
>> +# system and receive from the files. Check both states with fssum.
>> +#
>> +#-----------------------------------------------------------------------
>> +# Copyright (C) 2013 STRATO.  All rights reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#
>> +#-----------------------------------------------------------------------
>> +#
>> +# creator
>> +owner=list.bt...@jan-o-sch.net
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=`mktemp -d`
>> +status=1
>> +
>> +_cleanup()
>> +{
>> +	echo "*** unmount"
>> +	umount $SCRATCH_MNT 2>/dev/null
>> +	rm -f $tmp.*
>> +}
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# real QA test starts here
>> +_need_to_be_root
>> +_supported_fs btrfs
>> +_supported_os Linux
>> +_require_scratch
>> +_require_command $FSSUM_PROG fssum
>> +
>> +rm -f $seqres.full
>> +
>> +workout()
>> +{
>> +	fsz=$1
>> +	ops=$2
>> +
>> +	umount $SCRATCH_DEV >/dev/null 2>&1
>> +	echo "*** mkfs -dsize=$fsz" >>$seqres.full
>> +	echo "" >>$seqres.full
Re: [PATCH] xfstests btrfs/314: test send / receive
On Fri, June 07, 2013 at 16:51 (+0200), Arne Jansen wrote:
> On 07.06.2013 16:50, Eric Sandeen wrote:
>> On 6/7/13 5:29 AM, Dave Chinner wrote:
>>> On Fri, Jun 07, 2013 at 09:18:58AM +0200, Jan Schmidt wrote:
>>>> (cc Arne for far-progs discussion)
>>>>
>>>> On Thu, June 06, 2013 at 19:54 (+0200), Eric Sandeen wrote:
>>>>> On 6/6/13 10:20 AM, Jan Schmidt wrote:
>>>>>> Basic send / receive functionality test for btrfs. Requires current
>>>>>> version of fsstress built (-x support). Relies on fssum tool, which is
>>>>>> not part of the test suite but can skip the test if it is missing.
>>>>>>
>>>>>> Signed-off-by: Jan Schmidt list@jan-o-sch.net
>>>>>
>>>>> w/o commenting on the test itself, I'm a little uneasy about requiring
>>>>> some external, not-widely-installed tool for this to run.
>>>>>
>>>>> The fear is that it won't be run as often as it could/should be.
>>>>
>>>> The main purpose is to have it run by developers changing something around
>>>> btrfs send / receive and probably the backref walker (while there exists a
>>>> separate test not requiring fssum for backrefs). I think we can get them to
>>>> install fssum.
>>>
>>> There's no point in having tests that require you to go find
>>> something else before the tests can be run. That's been tried
>>> before, and it doesn't work - the test just won't get run by the
>>> majority of people who run xfstests.
>>>
>>>>> Could the same test be done w/o fssum, or should we maybe put a copy
>>>>> of fssum into xfstests/src/fssum.c ?
>>>>
>>>> I don't know any adequate replacement for fssum in this case. The purpose is
>>>> to build a checksum for a whole file system tree, including data and partly
>>>> metadata.
>>>>
>>>> I don't feel like copying fssum from far-progs into xfstests, though it
>>>> probably won't hurt much. However, I cannot promise we won't make changes to
>>>> it for far-progs, probably creating two incompatible versions of fssum in the
>>>> wild. Arne?
>>>>
>>>>> Or does fssum exist in any standard distro package?
>>>>
>>>> It doesn't. Perhaps Josef can hurry and make a Fedora package for it, if that
>>>> prevents a separate copy to xfstests :-)
>>>
>>> No, it doesn't. Packages would be needed for debian, suse, SLES,
>>> RHEL, etc for that to be a useful method of distribution. Just dump
>>> a snapshot of the utility in the xfstests src dir so we don't have to
>>> care about distribution issues...
>>
>> Yup I agree with this, if it's not widely available or replaceable by more
>> common tools, let's just put a snapshot in xfstests.
>
> I'm fine with that, too.

To prevent more agreement mails: I'll send a v2 including fssum.c, but probably
not today.

-Jan

> -Arne
>
>> -Eric
>>
>>> Cheers,
>>>
>>> Dave.
[PATCH] xfstests btrfs/314: test send / receive
Basic send / receive functionality test for btrfs. Requires a current version of fsstress (built with -x support). Relies on the fssum tool, which is not part of the test suite; the test is skipped if fssum is missing.

Signed-off-by: Jan Schmidt list@jan-o-sch.net
---
 README              |    3 +
 common/config       |    2 +
 tests/btrfs/314     |  113 +++
 tests/btrfs/314.out |    4 ++
 tests/btrfs/group   |    1 +
 5 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100755 tests/btrfs/314
 create mode 100644 tests/btrfs/314.out

diff --git a/README b/README
index d4d4f31..56b31f0 100644
--- a/README
+++ b/README
@@ -26,6 +26,9 @@ Preparing system for tests (IRIX and Linux):
       http://www.extra.research.philips.com/udf/, then copy the udf_test
       binary to xfstests/src/. If you wish to disable UDF verification test
       set the environment variable DISABLE_UDF_TEST to 1.
+    - If you wish to run the btrfs send / receive components of the suite
+      install fssum from
+      git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git
 
 - create one or two partitions to use for testing
 
diff --git a/common/config b/common/config
index 67c1498..1c11da3 100644
--- a/common/config
+++ b/common/config
@@ -146,6 +146,8 @@ export SED_PROG=`set_prog_path sed`
 export BC_PROG=`set_prog_path bc`
 [ "$BC_PROG" = "" ] && _fatal "bc not found"
 
+export FSSUM_PROG=`set_prog_path fssum`
+
 export PS_ALL_FLAGS=-ef
 
 export DF_PROG=`set_prog_path df`
diff --git a/tests/btrfs/314 b/tests/btrfs/314
new file mode 100755
index 000..2e86428
--- /dev/null
+++ b/tests/btrfs/314
@@ -0,0 +1,113 @@
+#! /bin/bash
+# FSQA Test No. 314
+#
+# Run fsstress to create a reasonably strange file system, make a
+# snapshot (base) and run more fsstress. Then take another snapshot
+# (incr) and send both snapshots to a temp file. Remake the file
+# system and receive from the files. Check both states with fssum.
+#
+#---
+# Copyright (C) 2013 STRATO. All rights reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+#---
+#
+# creator
+owner=list.bt...@jan-o-sch.net
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=`mktemp -d`
+status=1
+
+_cleanup()
+{
+	echo "*** unmount"
+	umount $SCRATCH_MNT 2>/dev/null
+	rm -f $tmp.*
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_command $FSSUM_PROG fssum
+
+rm -f $seqres.full
+
+workout()
+{
+	fsz=$1
+	ops=$2
+
+	umount $SCRATCH_DEV >/dev/null 2>&1
+	echo "*** mkfs -dsize=$fsz" >>$seqres.full
+	echo "" >>$seqres.full
+	_scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+		|| _fail "size=$fsz mkfs failed"
+	run_check _scratch_mount -o noatime
+
+	run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n $ops $FSSTRESS_AVOID -x \
+		"$BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/base"
+
+	run_check $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/incr
+
+	echo "# $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap" \
+		>> $seqres.full
+	$BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap 2>> $seqres.full \
+		|| _fail "failed: '$@'"
+	echo "# $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base\
+		$SCRATCH_MNT/incr > $tmp/incr.snap" >> $seqres.full
+	$BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base \
+		$SCRATCH_MNT/incr > $tmp/incr.snap 2>> $seqres.full \
+		|| _fail "failed: '$@'"
+
+	run_check $FSSUM_PROG -A -f -w $tmp/base.fssum $SCRATCH_MNT/base
+	run_check $FSSUM_PROG -A -f -w $tmp/incr.fssum -x $SCRATCH_MNT/incr/base \
+		$SCRATCH_MNT/incr
+
+	umount $SCRATCH_DEV >/dev/null 2>&1
+	echo "*** mkfs -dsize=$fsz" >>$seqres.full
+	echo "" >>$seqres.full
+	_scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+		|| _fail "size=$fsz mkfs failed"
+	run_check _scratch_mount -o noatime
[PATCH 0/3] Btrfs: qgroup rescan fixes for next rc
Here are three fixes for the new qgroup rescan feature. The first two are quite small, the third one is a little bigger. I thought about splitting that one up, but in the end I didn't find a good point to break it up. It achieves more than one goal, I agree, but it's more or less a compact code change that need not be split artificially, in my opinion.

Jan Schmidt (3):
  Btrfs: fix memory patcher through fs_info->qgroup_ulist
  Btrfs: avoid double free of fs_info->qgroup_ulist
  Btrfs: fix qgroup rescan resume on mount

 fs/btrfs/ctree.h   |    2 +
 fs/btrfs/disk-io.c |    2 +
 fs/btrfs/qgroup.c  |  198 +---
 3 files changed, 131 insertions(+), 71 deletions(-)
[PATCH 1/3] Btrfs: fix memory patcher through fs_info->qgroup_ulist
Commit 5b7c665e introduced fs_info->qgroup_ulist, which is allocated during btrfs_read_qgroup_config and meant to be used later by the qgroup accounting code. However, it is always freed before btrfs_read_qgroup_config returns, because the commit mentioned above adds a check for (ret), where a check for (ret < 0) would have been the right choice. This commit fixes the check.

Cc: Wang Shilong wangsl-f...@cn.fujitsu.com
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index d059d86..74b432d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -430,7 +430,7 @@ out:
 	}
 	btrfs_free_path(path);
 
-	if (ret)
+	if (ret < 0)
 		ulist_free(fs_info->qgroup_ulist);
 
 	return ret < 0 ? ret : 0;
--
1.7.1
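As a rough userspace illustration of why the check matters (not kernel code): the sketch below assumes the internal ret can be positive on success, which the `return ret < 0 ? ret : 0` line suggests. The names demo_fs_info, demo_finish and demo_ulist_survives are hypothetical, and the pointer is set to NULL after freeing only so the outcome can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for fs_info; only the list pointer matters here. */
struct demo_fs_info {
	int *qgroup_ulist;
};

/*
 * Mimics the tail of a function whose internal 'ret' may be positive on
 * success. 'use_fixed_check' selects between the old test (ret != 0) and
 * the fixed test (ret < 0). Returns what the caller would see: a negative
 * error code or 0.
 */
static int demo_finish(struct demo_fs_info *fs_info, int ret,
		       bool use_fixed_check)
{
	bool failed = use_fixed_check ? (ret < 0) : (ret != 0);

	if (failed) {
		free(fs_info->qgroup_ulist);
		/* NULLed here only so the sketch can observe the free. */
		fs_info->qgroup_ulist = NULL;
	}
	return ret < 0 ? ret : 0;
}

/* Returns true if the list is still allocated after demo_finish(). */
static bool demo_ulist_survives(int ret, bool use_fixed_check)
{
	struct demo_fs_info fs_info = { .qgroup_ulist = malloc(sizeof(int)) };
	bool alive;

	assert(fs_info.qgroup_ulist != NULL);
	(void)demo_finish(&fs_info, ret, use_fixed_check);
	alive = fs_info.qgroup_ulist != NULL;
	free(fs_info.qgroup_ulist); /* free(NULL) is a no-op */
	return alive;
}
```

With a positive ret, the old check throws the list away although the caller is told 0 (success); the fixed check only releases it on a real error.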
[PATCH 3/3] Btrfs: fix qgroup rescan resume on mount
When called during mount, we cannot start the rescan worker thread until open_ctree is done. This commit restructures the qgroup rescan internals to enable a clean deferral of the rescan resume operation.

First of all, the struct qgroup_rescan is removed, saving us a malloc and some initialization synchronization problems. Its only element (the worker struct) now lives within fs_info just as the rest of the rescan code.

Then setting up a rescan worker is split into several reusable stages. Currently we have three different rescan startup scenarios:
	(A) rescan ioctl
	(B) rescan resume by mount
	(C) rescan by quota enable

Each case needs its own combination of the four following steps:
	(1) set the progress [A, C: zero; B: state of umount]
	(2) commit the transaction [A]
	(3) set the counters [A, C: zero; B: state of umount]
	(4) start worker [A, B, C]

qgroup_rescan_init does step (1). There's no extra function added to commit a transaction, we've got that already. qgroup_rescan_zero_tracking does step (3). Step (4) is nothing more than a call to the generic btrfs_queue_worker.

We also get rid of a double check for the rescan progress during btrfs_qgroup_account_ref, which is no longer required due to having step 2 from the list above.

As a side effect, this commit prepares to move the rescan start code from btrfs_run_qgroups (which is run during commit) to a less time critical section.
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h   |    2 +
 fs/btrfs/disk-io.c |    2 +
 fs/btrfs/qgroup.c  |  190 +---
 3 files changed, 125 insertions(+), 69 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fd62aa8..8ac8d52 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1610,6 +1610,7 @@ struct btrfs_fs_info {
 	struct btrfs_key qgroup_rescan_progress;
 	struct btrfs_workers qgroup_rescan_workers;
 	struct completion qgroup_rescan_completion;
+	struct btrfs_work qgroup_rescan_work;
 
 	/* filesystem state */
 	unsigned long fs_state;
@@ -3856,6 +3857,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
 			struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
+void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
 			      struct btrfs_fs_info *fs_info, u64 src, u64 dst);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d7b46c6..da4a10c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2879,6 +2879,8 @@ retry_root_backup:
 		return ret;
 	}
 
+	btrfs_qgroup_rescan_resume(fs_info);
+
 	return 0;
 
 fail_qgroup:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c6ce642..1280eff 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -98,13 +98,10 @@ struct btrfs_qgroup_list {
 	struct btrfs_qgroup *member;
 };
 
-struct qgroup_rescan {
-	struct btrfs_work	work;
-	struct btrfs_fs_info	*fs_info;
-};
-
-static void qgroup_rescan_start(struct btrfs_fs_info *fs_info,
-				struct qgroup_rescan *qscan);
+static int
+qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
+		   int init_flags);
+static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
 
 /* must be called with qgroup_ioctl_lock held */
 static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
@@ -255,6 +252,7 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 	int slot;
 	int ret = 0;
 	u64 flags = 0;
+	u64 rescan_progress = 0;
 
 	if (!fs_info->quota_enabled)
 		return 0;
@@ -312,20 +310,7 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
 			}
 			fs_info->qgroup_flags = btrfs_qgroup_status_flags(l, ptr);
-			fs_info->qgroup_rescan_progress.objectid =
-					btrfs_qgroup_status_rescan(l, ptr);
-			if (fs_info->qgroup_flags &
-			    BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
-				struct qgroup_rescan *qscan =
-					kmalloc(sizeof(*qscan), GFP_NOFS);
-				if (!qscan) {
-					ret = -ENOMEM;
-					goto out;
-				}
-				fs_info->qgroup_rescan_progress.type = 0;
-				fs_info->qgroup_rescan_progress.offset = 0
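The scenario/step matrix from the commit message of patch 3/3 can be written down as a small lookup. This is only a sketch of the table as stated there (scenarios A, B, C and steps 1 to 4); the enum and function names are hypothetical and do not appear in the patch.

```c
#include <stdbool.h>

/* The three rescan startup scenarios named in the commit message. */
enum demo_rescan_scenario {
	DEMO_RESCAN_IOCTL,        /* (A) rescan ioctl */
	DEMO_RESCAN_RESUME_MOUNT, /* (B) rescan resume by mount */
	DEMO_RESCAN_QUOTA_ENABLE, /* (C) rescan by quota enable */
};

/* The four setup steps, one bit each. */
enum demo_rescan_step {
	DEMO_STEP_SET_PROGRESS = 1 << 0, /* (1) */
	DEMO_STEP_COMMIT_TRANS = 1 << 1, /* (2) */
	DEMO_STEP_SET_COUNTERS = 1 << 2, /* (3) */
	DEMO_STEP_START_WORKER = 1 << 3, /* (4) */
};

/* Returns the bitmask of steps a scenario needs, per the commit message. */
static unsigned int demo_rescan_steps(enum demo_rescan_scenario s)
{
	unsigned int common = DEMO_STEP_SET_PROGRESS | DEMO_STEP_SET_COUNTERS |
			      DEMO_STEP_START_WORKER;

	switch (s) {
	case DEMO_RESCAN_IOCTL:
		/* Only the ioctl path commits the transaction. */
		return common | DEMO_STEP_COMMIT_TRANS;
	case DEMO_RESCAN_RESUME_MOUNT:
	case DEMO_RESCAN_QUOTA_ENABLE:
		return common;
	}
	return 0;
}

/*
 * Progress and counters start at zero for A and C; for B they are
 * restored from the state saved at umount.
 */
static bool demo_rescan_starts_from_zero(enum demo_rescan_scenario s)
{
	return s != DEMO_RESCAN_RESUME_MOUNT;
}
```

A caller could test `demo_rescan_steps(s) & DEMO_STEP_COMMIT_TRANS` to decide whether a transaction commit is needed before queueing the worker.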
[PATCH 2/3] Btrfs: avoid double free of fs_info->qgroup_ulist
When btrfs_read_qgroup_config or btrfs_quota_enable returns non-zero, we've already freed the fs_info->qgroup_ulist. The final btrfs_free_qgroup_config called from quota_disable makes another ulist_free(fs_info->qgroup_ulist) call.

We set fs_info->qgroup_ulist to NULL on the mentioned error paths, turning the ulist_free in btrfs_free_qgroup_config into a noop.

Cc: Wang Shilong wangsl-f...@cn.fujitsu.com
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |    8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 74b432d..c6ce642 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -430,8 +430,10 @@ out:
 	}
 	btrfs_free_path(path);
 
-	if (ret < 0)
+	if (ret < 0) {
 		ulist_free(fs_info->qgroup_ulist);
+		fs_info->qgroup_ulist = NULL;
+	}
 
 	return ret < 0 ? ret : 0;
 }
@@ -932,8 +934,10 @@ out_free_root:
 		kfree(quota_root);
 	}
 out:
-	if (ret)
+	if (ret) {
 		ulist_free(fs_info->qgroup_ulist);
+		fs_info->qgroup_ulist = NULL;
+	}
 	mutex_unlock(&fs_info->qgroup_ioctl_lock);
 	return ret;
 }
--
1.7.1
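The pattern in patch 2/3 (free, then set the pointer to NULL so a later free becomes a no-op) can be shown in plain userspace C. This is a sketch only: demo_ulist_free is a hypothetical helper that is assumed to ignore NULL, as the commit message implies for ulist_free, and the counter exists just to make the effect visible.

```c
#include <stdlib.h>

/* Hypothetical stand-in for fs_info; only the list pointer matters here. */
struct demo_fs_info {
	int *qgroup_ulist;
};

/* Counts how many times a non-NULL list was actually released. */
static int demo_free_calls;

/* Assumed to be a no-op on NULL, like free() itself. */
static void demo_ulist_free(int *ulist)
{
	if (!ulist)
		return;
	demo_free_calls++;
	free(ulist);
}

/* Error path after the patch: free and forget the pointer. */
static void demo_error_path(struct demo_fs_info *fs_info)
{
	demo_ulist_free(fs_info->qgroup_ulist);
	fs_info->qgroup_ulist = NULL;
}

/* Final cleanup, called again later on the same fs_info. */
static void demo_free_config(struct demo_fs_info *fs_info)
{
	demo_ulist_free(fs_info->qgroup_ulist);
	fs_info->qgroup_ulist = NULL;
}

/* Runs error path plus final cleanup; returns the number of real frees. */
static int demo_run(void)
{
	struct demo_fs_info fs_info = { .qgroup_ulist = malloc(sizeof(int)) };

	if (!fs_info.qgroup_ulist)
		return -1;
	demo_free_calls = 0;
	demo_error_path(&fs_info);
	demo_free_config(&fs_info);
	return demo_free_calls;
}
```

Without the NULL assignment in demo_error_path, the second call would pass a dangling pointer to free(), which is the double free the patch avoids.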
Re: [PATCH 0/3] Btrfs: qgroup rescan fixes for next rc
Hi Wang,

Please have a look at these patches, you should have been CCed but I just realized git send-email doesn't care about Cc lines in the patch file. Sigh.

Thanks,
-Jan
Re: hard freezes with 3.9.0 during io-intensive loads
On Thu, May 16, 2013 at 09:19 (+0200), Kai Krakow wrote:
3.9.2 still does not fix anything. I'll go with autodefrag=off for the moment until I hear some news in that regard. With this new information, is it still helpful to get a metadata image from me? It should be reproducible if you enable autodefrag or defragment cow'ed files.

Would still be helpful, yes. If you've got questions on the usage of btrfs-image, your best bet is probably #btrfs on freenode. I haven't created any usable images with that tool so far, but I've heard of people who succeeded.

Thanks!
-Jan
Re: hard freezes with 3.9.0 during io-intensive loads
On Fri, May 10, 2013 at 01:30 (+0200), Kai Krakow wrote:
Jan Schmidt list.bt...@jan-o-sch.net wrote:

Apparently, it's not fixed. The system does not freeze now but it threw multiple backtraces right in front of my Xorg session. The backtraces look a little bit different now. Here's what I got: https://gist.github.com/kakra/8a340f006d01e146865d

Occurrence while running

	bedup dedup --defrag --size-cutoff $((1024*1024))

which was currently dedup'ing my backup volume with daily snapshots filled by rsync --inplace - so I suppose some file contents are pretty scattered.

At least that looks different for now. I'm not certain about all the fixes in btrfs-next. Can you give it a try and bisect if btrfs-next is good? That would be really helpful.

I'd prefer to not bisect my production system kernel... That will probably take ages as running the reproducible test takes about 30-60 minutes before the problem hits my system. At least unless you had a suggestion how to speed up the process... ;-)

I see, hoped it would be something quicker.

I saw the pull request with those fixes, so I suspect it didn't go into 3.9.1 but rather will go into 3.9.2?

Probably. However, those patches obviously weren't enough to solve your problem. We don't submit a lot of things to stable, so they are likely to remain the only btrfs related changes in there, which would mean it is unlikely to help with your problem.

We can try to debug that further, you can send me / upload the output of

	btrfs-image -c9 /dev/whatever blah.img

built from Josef's repository git://github.com/josefbacik/btrfs-progs.git

It contains all your metadata (like file names), data is omitted from the dump.

I probably wait and just do not run the dedup process until I have 3.9.2 installed. The backup works with occasional hiccups, the system very very sometimes freezes but I almost always see the backtraces in dmesg after backup. Let's see if it's all gone in 3.9.2.

It's always an alternative to hope for the best :-)

-Jan
Re: hard freezes with 3.9.0 during io-intensive loads
On Wed, May 08, 2013 at 02:24 (+0200), Kai Krakow wrote:
Kai Krakow hurikhan77+bt...@gmail.com wrote:

I can reliably reproduce it from two different approaches. I'd like to only apply the commits fixing it. Can you name them here?

In git log order:
6ced2666 Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
ef9120b1 Btrfs: fix tree mod log regression on root split operations
2ed098ca Btrfs: fix accessing the root pointer in tree mod log functions
50723551 Btrfs: fix unlock after free on rewinded tree blocks

The commit ids are from josef's master branch (git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git) which is known not to be very stable regarding commit ids.

Thanks, applied almost cleanly to 3.9.0 vanilla with just one reject. And that was for some error message. I'm simply ignoring that and currently compiling it. I will get back here with the result (fixed or not fixed for one or both situations).

Apparently, it's not fixed. The system does not freeze now but it threw multiple backtraces right in front of my Xorg session. The backtraces look a little bit different now. Here's what I got: https://gist.github.com/kakra/8a340f006d01e146865d

Occurrence while running

	bedup dedup --defrag --size-cutoff $((1024*1024))

which was currently dedup'ing my backup volume with daily snapshots filled by rsync --inplace - so I suppose some file contents are pretty scattered.

At least that looks different for now. I'm not certain about all the fixes in btrfs-next. Can you give it a try and bisect if btrfs-next is good? That would be really helpful.

-Jan

[ 2612.573501] [ cut here ]
[ 2612.573509] WARNING: at fs/btrfs/inode.c:2157 record_one_backref+0x310/0x328()
[ 2612.573510] Hardware name: To Be Filled By O.E.M.
[ 2612.573511] Modules linked in: rfcomm bnep af_packet vsock(O) vmmon(O) vmnet(O) vmci(O) vmblock(O) reiserfs snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device gspca_sonixj gspca_main videodev gpio_ich coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel btusb microcode bluetooth pcspkr lpc_ich i2c_i801 8250 mfd_core serial_core evdev usb_storage zram(C) unix
[ 2612.573528] Pid: 13112, comm: btrfs-endio-wri Tainted: G C O 3.9.0-gentoo #3
[ 2612.573529] Call Trace:
[ 2612.573534] [8102f11d] ? warn_slowpath_common+0x78/0x8e
[ 2612.573536] [81183aed] ? record_one_backref+0x310/0x328
[ 2612.573540] [811c5eb0] ? iterate_extent_inodes+0x177/0x23c
[ 2612.573542] [811837dd] ? btrfs_real_readdir+0x482/0x482
[ 2612.573543] [811837dd] ? btrfs_real_readdir+0x482/0x482
[ 2612.573545] [811c5ffe] ? iterate_inodes_from_logical+0x89/0x96
[ 2612.573547] [81182320] ? record_extent_backrefs+0x4d/0x8e
[ 2612.573549] [8118a8d3] ? btrfs_finish_ordered_io+0x671/0x798
[ 2612.573552] [811a33f3] ? worker_loop+0x176/0x493
[ 2612.573553] [811a327d] ? btrfs_queue_worker+0x272/0x272
[ 2612.573554] [811a327d] ? btrfs_queue_worker+0x272/0x272
[ 2612.573557] [810496d2] ? kthread+0x81/0x89
[ 2612.573560] [8105] ? free_sched_groups+0x32/0x50
[ 2612.573561] [81049651] ? kthread_freezable_should_stop+0x36/0x36
[ 2612.573564] [8151c6ac] ? ret_from_fork+0x7c/0xb0
[ 2612.573566] [81049651] ? kthread_freezable_should_stop+0x36/0x36
[ 2612.573567] ---[ end trace 4c42d11ebaf277b6 ]---
[ 2612.574001] [ cut here ]
[ 2612.574004] WARNING: at fs/btrfs/inode.c:2157 record_one_backref+0x310/0x328()
[ 2612.574004] Hardware name: To Be Filled By O.E.M.
[ 2612.574005] Modules linked in: rfcomm bnep af_packet vsock(O) vmmon(O) vmnet(O) vmci(O) vmblock(O) reiserfs snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device gspca_sonixj gspca_main videodev gpio_ich coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel btusb microcode bluetooth pcspkr lpc_ich i2c_i801 8250 mfd_core serial_core evdev usb_storage zram(C) unix
[ 2612.574017] Pid: 13110, comm: btrfs-endio-wri Tainted: GWC O 3.9.0-gentoo #3
[ 2612.574018] Call Trace:
[ 2612.574020] [8102f11d] ? warn_slowpath_common+0x78/0x8e
[ 2612.574021] [81183aed] ? record_one_backref+0x310/0x328
[ 2612.574023] [811c5eb0] ? iterate_extent_inodes+0x177/0x23c
[ 2612.574025] [811837dd] ? btrfs_real_readdir+0x482/0x482
[ 2612.574027] [811837dd] ? btrfs_real_readdir+0x482/0x482
[ 2612.574029] [811c5ffe] ? iterate_inodes_from_logical+0x89/0x96
[ 2612.574030] [81182320] ? record_extent_backrefs+0x4d/0x8e
[ 2612.574032] [8118a8d3] ? btrfs_finish_ordered_io+0x671/0x798
[ 2612.574034] [811a33f3] ? worker_loop+0x176/0x493
[ 2612.574035] [811a327d] ? btrfs_queue_worker+0x272/0x272
[ 2612.574036] [811a327d] ? btrfs_queue_worker+0x272/0x272
[ 2612.574038]
Re: hard freezes with 3.9.0 during io-intensive loads
On Mon, May 06, 2013 at 22:29 (+0200), Kai Krakow wrote:
Jan Schmidt list.bt...@jan-o-sch.net wrote:

That one should be fixed in btrfs-next. If you can reliably reproduce the bug I'd be glad to get a confirmation - you can probably even save putting it on bugzilla then ;-)

I can reliably reproduce it from two different approaches. I'd like to only apply the commits fixing it. Can you name them here?

In git log order:
6ced2666 Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
ef9120b1 Btrfs: fix tree mod log regression on root split operations
2ed098ca Btrfs: fix accessing the root pointer in tree mod log functions
50723551 Btrfs: fix unlock after free on rewinded tree blocks

The commit ids are from josef's master branch (git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git) which is known not to be very stable regarding commit ids.

Thanks,
-Jan

[snip]
Re: [PATCH] Btrfs: add ioctl to wait for qgroup rescan completion
On Mon, May 06, 2013 at 23:20 (+0200), David Sterba wrote:
On Mon, May 06, 2013 at 09:14:17PM +0200, Jan Schmidt wrote:

--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -530,6 +530,7 @@ struct btrfs_ioctl_send_args {
 			       struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
 			       struct btrfs_ioctl_quota_rescan_args)
+#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)

Why do you need an ioctl when the same can be achieved by polling the RESCAN_STATUS value? The code does not do anything special that has to be done within the kernel.

It's because I don't like polling :-) A rescan can take hours to complete, and you wouldn't like to see one ioctl per second for such a period either, I guess. (Plus: Everybody would lose like .9 seconds for each run of the xfstest I'm writing - accumulates to ages at least!)

If you're worried about ioctl numbers, we could turn it into flags for BTRFS_IOC_QUOTA_RESCAN, but I don't see we're short on ioctl numbers yet. The reason why I chose a separate ioctl is that it is more like an attach operation to support both, specifying it when starting a fresh scan and waiting for a scan that's already running. I find it more intuitive to have it separate.

-Jan
Re: [PATCH] Btrfs: save us a mutex_lock usage when doing quota rescan
On Tue, May 07, 2013 at 08:15 (+0200), Wang Shilong wrote:
If the qgroup_rescan worker is in progress, we should ignore the extents that have not yet been dealt with by the qgroup_rescan worker and let them be dealt with later, otherwise we may get wrong qgroup accounting.

However, we have checked this before find_all_roots() without spin_lock. When doing qgroup accounting, we don't have to check it again, because during this period the qgroup_rescan worker can deal with more extents and qgroup_rescan_extent->objectid can only go larger, so here the check is unnecessary.

Just remove this check, so that we don't need to hold qgroup_rescan_lock when doing qgroup accounting.

NAK. After a discussion on that lock the last thing in this thread I see is ...

On Wed, May 01, 2013 at 13:57 (+0200), Jan Schmidt wrote:
Now I see what you mean. The second check is only required when we start a rescan operation after the initial check in btrfs_qgroup_account_ref.

Please continue on that argument, your commit message doesn't explain at all why we should be safe to remove this check.

-Jan

Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
---
 fs/btrfs/qgroup.c |    9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index d059d86..2710784 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1445,15 +1445,7 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
 	if (ret < 0)
 		return ret;
 
-	mutex_lock(&fs_info->qgroup_rescan_lock);
 	spin_lock(&fs_info->qgroup_lock);
-	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
-		if (fs_info->qgroup_rescan_progress.objectid <= node->bytenr) {
-			ret = 0;
-			goto unlock;
-		}
-	}
-
 	quota_root = fs_info->quota_root;
 	if (!quota_root)
 		goto unlock;
@@ -1492,7 +1484,6 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
 
 unlock:
 	spin_unlock(&fs_info->qgroup_lock);
-	mutex_unlock(&fs_info->qgroup_rescan_lock);
 	ulist_free(roots);
 
 	return ret;
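The check that the quoted patch removes can be read as a simple predicate: while a rescan is running, extents the rescan worker has not reached yet are skipped and left to the worker. The sketch below restates that predicate in userspace C; the comparison direction follows the description in the mail, and DEMO_FLAG_RESCAN with its bit value, as well as the function name, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for BTRFS_QGROUP_STATUS_FLAG_RESCAN. */
#define DEMO_FLAG_RESCAN (1ULL << 2)

/*
 * Returns true if accounting for the extent at 'node_bytenr' should be
 * skipped because a rescan is in progress and has not passed it yet.
 */
static bool demo_skip_accounting(uint64_t qgroup_flags,
				 uint64_t rescan_progress_objectid,
				 uint64_t node_bytenr)
{
	if (!(qgroup_flags & DEMO_FLAG_RESCAN))
		return false; /* no rescan running: always account */

	/* The rescan worker will pick this extent up later. */
	return rescan_progress_objectid <= node_bytenr;
}
```

The question raised in the NAK is whether this predicate can change between the first, unlocked evaluation and the accounting itself, for example when a rescan is started in between.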
Re: [PATCH] Btrfs: fix passing wrong arg gfp_t to decide the correct allocation mode
On Tue, May 07, 2013 at 08:20 (+0200), Wang Shilong wrote:
If you look at the code carefully, you will see that all the tree_mod_alloc() calls have to use GFP_ATOMIC. However, the original code passes the wrong arg gfp_t in some places. This doesn't cause any problems, because tree_mod_alloc() ignores arg gfp_t and just uses GFP_ATOMIC directly, but this is not good.

However, I think we should try our best not to allocate with GFP_ATOMIC, so I keep the gfp_t there in the hope we can change the allocation mode in the future.

NAK. The code as it is now is prepared to get rid of at least some GFP_ATOMIC allocations. You won't get rid of all of them, as there are a lot of spin lock situations where we need to add to the tree mod log anyway. As a preparation we currently pass the best flags (least restrictive) we can instead of always passing GFP_ATOMIC.

I pointed you to this comment already:

557 /*
558  * once we switch from spin locks to something different, we should
559  * honor the flags parameter here.
560  */
561 tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);

So, if you want fewer atomic allocations, find something more suitable than an rwlock for fs_info->tree_mod_log_lock and you can in fact replace GFP_ATOMIC with flags in the kzalloc(). The good thing is, because everything is already prepared you don't have to think about all the callers again and pass the correct flags.

-Jan

Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
---
 fs/btrfs/ctree.c |   37 ++---
 1 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index de6de8e..33c9061 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -553,7 +553,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
 	 * once we switch from spin locks to something different, we should
 	 * honor the flags parameter here.
 	 */
-	tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
+	tm = *tm_ret = kzalloc(sizeof(*tm), flags);
 	if (!tm)
 		return -ENOMEM;
 
@@ -591,14 +591,14 @@ __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
 static noinline int
 tree_mod_log_insert_key_mask(struct btrfs_fs_info *fs_info,
 			     struct extent_buffer *eb, int slot,
-			     enum mod_log_op op, gfp_t flags)
+			     enum mod_log_op op)
 {
 	int ret;
 
 	if (tree_mod_dont_log(fs_info, eb))
 		return 0;
 
-	ret = __tree_mod_log_insert_key(fs_info, eb, slot, op, flags);
+	ret = __tree_mod_log_insert_key(fs_info, eb, slot, op, GFP_ATOMIC);
 
 	tree_mod_log_write_unlock(fs_info);
 	return ret;
@@ -608,7 +608,7 @@ static noinline int
 tree_mod_log_insert_key(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
 			int slot, enum mod_log_op op)
 {
-	return tree_mod_log_insert_key_mask(fs_info, eb, slot, op, GFP_NOFS);
+	return tree_mod_log_insert_key_mask(fs_info, eb, slot, op);
 }
 
 static noinline int
@@ -616,13 +616,13 @@ tree_mod_log_insert_key_locked(struct btrfs_fs_info *fs_info,
 			       struct extent_buffer *eb, int slot,
 			       enum mod_log_op op)
 {
-	return __tree_mod_log_insert_key(fs_info, eb, slot, op, GFP_NOFS);
+	return __tree_mod_log_insert_key(fs_info, eb, slot, op, GFP_ATOMIC);
 }
 
 static noinline int
 tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
 			 struct extent_buffer *eb, int dst_slot, int src_slot,
-			 int nr_items, gfp_t flags)
+			 int nr_items)
 {
 	struct tree_mod_elem *tm;
 	int ret;
@@ -642,7 +642,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
 		BUG_ON(ret < 0);
 	}
 
-	ret = tree_mod_alloc(fs_info, flags, &tm);
+	ret = tree_mod_alloc(fs_info, GFP_ATOMIC, &tm);
 	if (ret < 0)
 		goto out;
 
@@ -679,7 +679,7 @@ __tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
 static noinline int
 tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
 			 struct extent_buffer *old_root,
-			 struct extent_buffer *new_root, gfp_t flags,
+			 struct extent_buffer *new_root,
 			 int log_removal)
 {
 	struct tree_mod_elem *tm;
@@ -691,7 +691,7 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
 	if (log_removal)
 		__tree_mod_log_free_eb(fs_info, old_root);
 
-	ret = tree_mod_alloc(fs_info, flags, &tm);
+	ret = tree_mod_alloc(fs_info, GFP_ATOMIC, &tm);
 	if (ret < 0)
 		goto out;
 
@@ -809,19 +809,18 @@ tree_mod_log_eb_move(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
 {
 	int ret;
 	ret = tree_mod_log_insert_move(fs_info,
Re: Kernel BUG: __tree_mod_log_rewind
On Tue, May 07, 2013 at 11:25 (+0200), Elladan wrote:
I can get btrfs to throw a kernel bug easily by running btrfs fi defrag on some files in 3.9.0:

Thanks for reporting. It's a known bug (that ought to be fixed before the 3.9 release in fact). You can either use btrfs-next or apply the commits mentioned in my previous email today:

On Tue, May 07, 2013 at 08:08 (+0200), Jan Schmidt wrote:
In git log order:
6ced2666 Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
ef9120b1 Btrfs: fix tree mod log regression on root split operations
2ed098ca Btrfs: fix accessing the root pointer in tree mod log functions
50723551 Btrfs: fix unlock after free on rewinded tree blocks

The commit ids are from josef's master branch (git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git) which is known not to be very stable regarding commit ids.

Either way should fix your problem. An alternative is to wait for a 3.9 stable release after those fixes are in mainline (which should happen within the next seven days, I hope). Not using defrag, autodefrag or qgroups might also be an effective workaround, but no guarantees on that.

-Jan

May 7 01:57:33 caper kernel: [0.00] Linux version 3.9.0-030900-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201304291257 SMP Mon Apr 29 16:58:15 UTC 2013
...
May 7 02:09:21 caper kernel: [ 726.745485] [ cut here ]
May 7 02:09:21 caper kernel: [ 726.745567] Kernel BUG at a00ea503 [verbose debug info unavailable]
May 7 02:09:21 caper kernel: [ 726.745643] invalid opcode: [#1] SMP
May 7 02:09:21 caper kernel: [ 726.745807] Modules linked in: snd_hrtimer zram(C) bnep rfcomm bluetooth parport_pc ppdev nfsd nfs_acl auth_rpcgss nfs fscache binfmt_misc lockd sunrpc snd_hda_codec_hdmi joydev hid_gaff ff_memless snd_usb_audio snd_usbmidi_lib uvcvideo snd_seq_midi videobuf2_core videodev snd_rawmidi videobuf2_vmalloc videobuf2_memops snd_seq_midi_event dm_multipath snd_hda_codec_realtek snd_seq scsi_dh kvm_amd snd_seq_device snd_hda_intel kvm snd_hda_codec snd_hwdep microcode snd_pcm snd_timer k10temp edac_core edac_mce_amd serio_raw snd sp5100_tco i2c_piix4 soundcore snd_page_alloc mac_hid wmi it87 hwmon_vid lp parport xfs btrfs raid6_pq zlib_deflate xor libcrc32c ses enclosure dm_crypt hid_generic usbhid hid usb_storage firewire_ohci firewire_core crc_itu_t ahci pata_acpi pata_atiixp libahci r8169
May 7 02:09:21 caper kernel: [ 726.749841] CPU 3
May 7 02:09:21 caper kernel: [ 726.749900] Pid: 1703, comm: btrfs-endio-wri Tainted: G C 3.9.0-030900-generic #201304291257 Gigabyte Technology Co., Ltd. GA-MA790GP-UD4H/GA-MA790GP-UD4H
May 7 02:09:21 caper kernel: [ 726.750069] RIP: 0010:[a00ea503] [a00ea503] __tree_mod_log_rewind+0x253/0x260 [btrfs]
May 7 02:09:21 caper kernel: [ 726.750244] RSP: 0018:88011a2e1838 EFLAGS: 00010293
May 7 02:09:21 caper kernel: [ 726.750316] RAX: RBX: 88004b2798f0 RCX: 88011a2e17d8
May 7 02:09:21 caper kernel: [ 726.750390] RDX: 13f3a75c RSI: 05e8 RDI: 8800172ea880
May 7 02:09:21 caper kernel: [ 726.750463] RBP: 88011a2e1868 R08: 1000 R09: 88011a2e17e8
May 7 02:09:21 caper kernel: [ 726.750536] R10: 000103db R11: R12: 880098cf4d80
May 7 02:09:21 caper kernel: [ 726.750609] R13: 002b R14: 8800172ea700 R15: 0009c7a7
May 7 02:09:21 caper kernel: [ 726.750683] FS: 7fa2bc594700() GS:88014fd8() knlGS:
May 7 02:09:21 caper kernel: [ 726.750770] CS: 0010 DS: ES: CR0: 8005003b
May 7 02:09:21 caper kernel: [ 726.750841] CR2: fd82c000 CR3: 00014654d000 CR4: 07e0
May 7 02:09:21 caper kernel: [ 726.750914] DR0: DR1: DR2:
May 7 02:09:21 caper kernel: [ 726.750987] DR3: DR6: 0ff0 DR7: 0400
May 7 02:09:21 caper kernel: [ 726.751061] Process btrfs-endio-wri (pid: 1703, threadinfo 88011a2e, task 88004a6b2ea0)
May 7 02:09:21 caper kernel: [ 726.751147] Stack:
May 7 02:09:21 caper kernel: [ 726.751212] 88011a2e1858 880104c8de30 0009c7a7 8800
May 7 02:09:21 caper kernel: [ 726.751488] a8598000 880148278000 88011a2e18b8 a00ea5ef
May 7 02:09:21 caper kernel: [ 726.751763] 880098cf4d80 88004b2798f0 8800338d3000 0001
May 7 02:09:21 caper kernel: [ 726.752038] Call Trace:
May 7 02:09:21 caper kernel: [ 726.752135] [a00ea5ef] tree_mod_log_rewind+0xdf/0x240 [btrfs]
May 7 02:09:21 caper kernel: [ 726.752237] [a00f25cb] btrfs_search_old_slot+0x4cb/0x670 [btrfs]
May 7 02:09:21 caper kernel: [ 726.752351] [a016d118
Re: [PATCH] Btrfs: use arg gfp_mask to decide how to allocate tree mod
On Sun, May 05, 2013 at 15:58 (+0200), Wang Shilong wrote:
It seems the original code doesn't pass the right arg gfp_t to decide how to allocate. Just applying this patch, fsstress will fail. So please ignore this patch, will resend later..

That's in fact what the comment above the line you changed implies :-)

-Jan

Thanks, Wang

From: Wang Shilong wangsl-f...@cn.fujitsu.com

We have passed arg gfp_mask to tree_mod_alloc(), so just use it rather than always use GFP_ATOMIC.

Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
---
 fs/btrfs/ctree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index de6de8e..0e3514f 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -553,7 +553,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
 	 * once we switch from spin locks to something different, we should
 	 * honor the flags parameter here.
 	 */
-	tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
+	tm = *tm_ret = kzalloc(sizeof(*tm), flags);
 	if (!tm)
 		return -ENOMEM;
 
--
1.7.11.7
Re: Possible to deduplicate read-only snapshots for space-efficient backups
On Sun, May 05, 2013 at 12:07 (+0200), Kai Krakow wrote:

I'm using a bash/rsync script[1] to back up my whole system on a nightly basis to an attached USB3 drive into a scratch area, then take a snapshot of this area. I'd like to have these snapshots immutable, so they should be read-only.

Have you considered using btrfs send / receive for that purpose? You would just save the dedup step.

-Jan
Re: hard freezes with 3.9.0 during io-intensive loads
On Sun, May 05, 2013 at 18:10 (+0200), Kai Krakow wrote:

Hello list,

Kai Krakow hurikhan77+bt...@gmail.com wrote:

I've upgraded to 3.9.0 mainly for the snapshot-aware defragging patches. I'm running bedup[1] on a regular basis and it is now the third time that I got back to my PC just to find it hard-frozen and I needed to use the reset button. It looks like this happens only while running bedup on my two btrfs filesystems but I'm not sure if it happens for any of the filesystems or only one.

This is my setup:

# cat /etc/fstab (shortened)
UUID=d2bb232a-2e8f-4951-8bcc-97e237f1b536 / btrfs compress=lzo,subvol=root64 0 1 # /dev/sd{a,b,c}3
LABEL=usb-backup /mnt/private/usb-backup btrfs noauto,compress-force=zlib,subvolid=0,autodefrag,comment=systemd.automount 0 0 # external usb3 disk

# btrfs filesystem show
Label: 'usb-backup'  uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca
    Total devices 1 FS bytes used 1.13TB
    devid    1 size 1.82TB used 1.75TB path /dev/sdd1

Label: 'system'  uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536
    Total devices 3 FS bytes used 914.43GB
    devid    3 size 927.26GB used 426.03GB path /dev/sdc3
    devid    2 size 927.26GB used 426.03GB path /dev/sdb3
    devid    1 size 927.26GB used 427.07GB path /dev/sda3

Btrfs v0.20-rc1

Since the system hard-freezes I have no messages from dmesg. But I suspect it to be related to the defragmentation option in bedup (I've switched to bedup with --defrag since 3.9.0, and autodefrag for the backup drive). Just in case, I'm going to try without this option now and see if it won't freeze.

I was able to take a physical screenshot with a real camera of a kernel backtrace one time when the freeze happened. I wonder if it is useful to you and where to send it. I just don't want to upload jpegs right here to the list without asking first.

The big plus is: Although I had to hard-reset the frozen system several times now, btrfs survived the procedure without any impact (just boot times increase noticeably, probably due to log-replays or something).
So thumbs up for the developers on that point. Thanks to the great cwillu netcat service here's my backtrace: That one should be fixed in btrfs-next. If you can reliably reproduce the bug I'd be glad to get a confirmation - you can probably even save putting it on bugzilla then ;-) -Jan 4,1072,17508258745,-;[ cut here ] 2,1073,17508258772,-;kernel BUG at fs/btrfs/ctree.c:1144! 4,1074,17508258791,-;invalid opcode: [#1] SMP 4,1075,17508258811,-;Modules linked in: bnep bluetooth af_packet vmci(O) vmmon(O) vmblock(O) vmnet(O) vsock reiserfs snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device gspca_sonixj gpio_ich gspca_main videodev coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel 8250 serial_core lpc_ich microcode mfd_core i2c_i801 pcspkr evdev usb_storage zram(C) unix 4,1076,17508258966,-;CPU 0 4,1077,17508258977,-;Pid: 7212, comm: btrfs-endio-wri Tainted: G C O 3.9.0-gentoo #2 To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3 4,1078,17508259023,-;RIP: 0010:[81161d12] [81161d12] __tree_mod_log_rewind+0x4c/0x121 4,1079,17508259064,-;RSP: 0018:8801966718e8 EFLAGS: 00010293 4,1080,17508259085,-;RAX: 0003 RBX: 8801ee8d33b0 RCX: 880196671888 4,1081,17508259112,-;RDX: 0a4596a4 RSI: 0eee RDI: 8804087be700 4,1082,17508259138,-;RBP: 0071 R08: 1000 R09: 880196671898 4,1083,17508259165,-;R10: R11: R12: 880406c2e000 4,1084,17508259191,-;R13: 8a11 R14: 8803b5aa1200 R15: 0001 4,1085,17508259218,-;FS: () GS:88041f20() knlGS: 4,1086,17508259248,-;CS: 0010 DS: ES: CR0: 80050033 4,1087,17508259270,-;CR2: 026f0390 CR3: 01a0b000 CR4: 000407f0 4,1088,17508259297,-;DR0: DR1: DR2: 4,1089,17508259323,-;DR3: DR6: 0ff0 DR7: 0400 4,1090,17508259350,-;Process btrfs-endio-wri (pid: 7212, threadinfo 88019667, task 8801b82e5400) 4,1091,17508259383,-;Stack: 4,1092,17508259391,-; 8801ee8d38f0 880021b6f360 88013a5b2000 8a11 4,1093,17508259423,-; 8802d0a14000 81167606 0246 8801ee8d33b0 4,1094,17508259455,-; 880406c2e000 8801966719bf 880021b6f360 4,1095,17508259488,-;Call Trace: 
4,1096,17508259500,-; [81167606] ? btrfs_search_old_slot+0x543/0x61e 4,1097,17508259526,-; [811692de] ? btrfs_next_old_leaf+0x8a/0x332 4,1098,17508259552,-; [811c484a] ? __resolve_indirect_refs+0x2d8/0x408 4,1099,17508259578,-; [811c533b] ? find_parent_nodes+0x9c1/0xcec 4,1100,17508259602,-; [811c5e06]
Btrfs: wait for quota rescan to complete
Two small patches, one for the kernel and one for the user mode. Both required to support waiting for quota rescan to complete.

Jan Schmidt (1):
  Btrfs: add ioctl to wait for qgroup rescan completion

 fs/btrfs/ctree.h           |    2 ++
 fs/btrfs/ioctl.c           |   12 ++++++++++++
 fs/btrfs/qgroup.c          |   21 +++++++++++++++++++++
 include/uapi/linux/btrfs.h |    1 +
 4 files changed, 36 insertions(+), 0 deletions(-)

Jan Schmidt (2):
  Btrfs-progs: fixup: add flags to struct btrfs_ioctl_quota_rescan_args
  Btrfs-progs: added btrfs quota rescan -w switch (wait)

 cmds-quota.c |   19 +++++++++++++++++--
 ioctl.h      |    2 ++
 2 files changed, 19 insertions(+), 2 deletions(-)
[PATCH 2/2] Btrfs-progs: added btrfs quota rescan -w switch (wait)
With -w one can wait for a rescan operation to finish. It can be used when starting a rescan operation or later to wait for the currently running rescan operation to finish. Waiting is interruptible.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 cmds-quota.c |   19 +++++++++++++++++--
 ioctl.h      |    1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/cmds-quota.c b/cmds-quota.c
index 1169772..6557e83 100644
--- a/cmds-quota.c
+++ b/cmds-quota.c
@@ -90,10 +90,11 @@ static int cmd_quota_disable(int argc, char **argv)
 }
 
 static const char * const cmd_quota_rescan_usage[] = {
-	"btrfs quota rescan [-s] <path>",
+	"btrfs quota rescan [-sw] <path>",
 	"Trash all qgroup numbers and scan the metadata again with the current config.",
 	"",
 	"-s   show status of a running rescan operation",
+	"-w   wait for rescan operation to finish (can be already in progress)",
 	NULL
 };
 
@@ -105,21 +106,30 @@ static int cmd_quota_rescan(int argc, char **argv)
 	char *path = NULL;
 	struct btrfs_ioctl_quota_rescan_args args;
 	int ioctlnum = BTRFS_IOC_QUOTA_RESCAN;
+	int wait_for_completion = 0;
 
 	optind = 1;
 	while (1) {
-		int c = getopt(argc, argv, "s");
+		int c = getopt(argc, argv, "sw");
 		if (c < 0)
 			break;
 		switch (c) {
 		case 's':
 			ioctlnum = BTRFS_IOC_QUOTA_RESCAN_STATUS;
 			break;
+		case 'w':
+			wait_for_completion = 1;
+			break;
 		default:
 			usage(cmd_quota_rescan_usage);
 		}
 	}
 
+	if (ioctlnum != BTRFS_IOC_QUOTA_RESCAN && wait_for_completion) {
+		fprintf(stderr, "ERROR: -w cannot be used with -s\n");
+		return 12;
+	}
+
 	if (check_argc_exact(argc - optind, 1))
 		usage(cmd_quota_rescan_usage);
 
@@ -134,6 +144,11 @@ static int cmd_quota_rescan(int argc, char **argv)
 
 	ret = ioctl(fd, ioctlnum, &args);
 	e = errno;
+
+	if (wait_for_completion && (ret == 0 || e == EINPROGRESS)) {
+		ret = ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT, &args);
+		e = errno;
+	}
 	close(fd);
 
 	if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN) {
diff --git a/ioctl.h b/ioctl.h
index abe6dd4..c260bbf 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -529,6 +529,7 @@ struct btrfs_ioctl_clone_range_args {
 			       struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
 			       struct btrfs_ioctl_quota_rescan_args)
+#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
 #define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
 				   char[BTRFS_LABEL_SIZE])
 #define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
--
1.7.1
[PATCH 1/2] Btrfs-progs: fixup: add flags to struct btrfs_ioctl_quota_rescan_args
The patch set previously sent was sent together with the kernel part, but was not updated as I added some reserved bytes to the ioctl struct for future compatibility. This fixes struct btrfs_ioctl_quota_rescan_args.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 ioctl.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/ioctl.h b/ioctl.h
index 1ee631a..abe6dd4 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -429,6 +429,7 @@ struct btrfs_ioctl_quota_ctl_args {
 struct btrfs_ioctl_quota_rescan_args {
 	__u64	flags;
 	__u64   progress;
+	__u64   reserved[6];
 };
 
 struct btrfs_ioctl_qgroup_assign_args {
--
1.7.1
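A short sketch of why this fixup matters: the _IOR/_IOW macros encode the size of the argument struct into the ioctl number, so user space and kernel must agree on the layout. The struct below is a userspace mirror written for illustration only. The name quota_rescan_args and the two helper functions are hypothetical, and uint64_t stands in for __u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the struct after the patch (hypothetical name). */
struct quota_rescan_args {
	uint64_t flags;
	uint64_t progress;
	uint64_t reserved[6];	/* room for future extensions */
};

/* Eight 64-bit words, so the layout is fixed at 64 bytes with no padding. */
static_assert(sizeof(struct quota_rescan_args) == 64,
	      "rescan args must stay 64 bytes");

/* Helper for callers that want to check the size at run time. */
size_t rescan_args_size(void)
{
	return sizeof(struct quota_rescan_args);
}

/* Byte offset at which the reserved area starts. */
size_t rescan_args_reserved_offset(void)
{
	return offsetof(struct quota_rescan_args, reserved);
}
```

Without the reserved member the struct would be 16 bytes, and the ioctl numbers computed in user space would not match the ones computed from the kernel's definition.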
[PATCH] Btrfs: add ioctl to wait for qgroup rescan completion
btrfs_qgroup_wait_for_completion waits until the currently running qgroup operation completes. It returns immediately when no rescan process is in progress. This is useful to automate things around the rescan process (e.g. testing).

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h           |    2 ++
 fs/btrfs/ioctl.c           |   12 ++++++++++++
 fs/btrfs/qgroup.c          |   21 +++++++++++++++++++++
 include/uapi/linux/btrfs.h |    1 +
 4 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8624f49..39ca0d9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1607,6 +1607,7 @@ struct btrfs_fs_info {
 	struct mutex qgroup_rescan_lock; /* protects the progress item */
 	struct btrfs_key qgroup_rescan_progress;
 	struct btrfs_workers qgroup_rescan_workers;
+	struct completion qgroup_rescan_completion;
 
 	/* filesystem state */
 	unsigned long fs_state;
@@ -3836,6 +3837,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
 			struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
 			      struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5e93bb8..9161660 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3937,6 +3937,16 @@ static long btrfs_ioctl_quota_rescan_status(struct file *file, void __user *arg)
 	return ret;
 }
 
+static long btrfs_ioctl_quota_rescan_wait(struct file *file, void __user *arg)
+{
+	struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	return btrfs_qgroup_wait_for_completion(root->fs_info);
+}
+
 static long btrfs_ioctl_set_received_subvol(struct file *file,
 					    void __user *arg)
 {
@@ -4179,6 +4189,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_quota_rescan(file, argp);
 	case BTRFS_IOC_QUOTA_RESCAN_STATUS:
 		return btrfs_ioctl_quota_rescan_status(file, argp);
+	case BTRFS_IOC_QUOTA_RESCAN_WAIT:
+		return btrfs_ioctl_quota_rescan_wait(file, argp);
 	case BTRFS_IOC_DEV_REPLACE:
 		return btrfs_ioctl_dev_replace(root, argp);
 	case BTRFS_IOC_GET_FSLABEL:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9d49c58..ebca17a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2068,6 +2068,8 @@ out:
 	} else {
 		pr_err("btrfs: qgroup scan failed with %d\n", err);
 	}
+
+	complete_all(&fs_info->qgroup_rescan_completion);
 }
 
 static void
@@ -2108,6 +2110,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 	fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
 	memset(&fs_info->qgroup_rescan_progress, 0,
 		sizeof(fs_info->qgroup_rescan_progress));
+	init_completion(&fs_info->qgroup_rescan_completion);
 
 	/* clear all current qgroup tracking information */
 	for (n = rb_first(&fs_info->qgroup_tree); n; n = rb_next(n)) {
@@ -2124,3 +2127,21 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
 	return 0;
 }
+
+int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info)
+{
+	int running;
+	int ret = 0;
+
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	spin_lock(&fs_info->qgroup_lock);
+	running = fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+	spin_unlock(&fs_info->qgroup_lock);
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+	if (running)
+		ret = wait_for_completion_interruptible(
+					&fs_info->qgroup_rescan_completion);
+
+	return ret;
+}
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ef0df5..5b683b5 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -530,6 +530,7 @@ struct btrfs_ioctl_send_args {
 			       struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
 			       struct btrfs_ioctl_quota_rescan_args)
+#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
 #define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
 				   char[BTRFS_LABEL_SIZE])
 #define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
--
1.7.1
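From user space, the new ioctl from the patch above could be called roughly as sketched below. This is only an illustration under stated assumptions: quota_rescan_wait() is a hypothetical helper name, the two defines are repeated locally so the sketch compiles without the btrfs headers, and BTRFS_IOCTL_MAGIC is given as 0x94, the value in the btrfs headers (the patch itself does not show it). Since the command is declared with _IO, no argument struct is passed.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>

/* Repeated here so the sketch is self-contained. */
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)

/* On Linux the command number sits in bits 0-7 and the magic in bits 8-15. */
static_assert((BTRFS_IOC_QUOTA_RESCAN_WAIT & 0xff) == 46,
	      "command number is 46");
static_assert(((BTRFS_IOC_QUOTA_RESCAN_WAIT >> 8) & 0xff) == BTRFS_IOCTL_MAGIC,
	      "magic is the btrfs ioctl type");

/*
 * Block until a running qgroup rescan on the filesystem behind fd has
 * finished. Returns 0 on success or a negative errno value, for example
 * -EPERM without CAP_SYS_ADMIN or -EINTR when interrupted by a signal.
 */
int quota_rescan_wait(int fd)
{
	if (ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT) < 0)
		return -errno;
	return 0;
}
```

The fd would come from opening any path on the mounted filesystem. According to the commit message, the call returns immediately when no rescan is running.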
Re: [PATCH v4 2/3] Btrfs: rescan for qgroups
Hi Wang, On 01.05.2013 09:29, Wang Shilong wrote: Hello Jan, If qgroup tracking is out of sync, a rescan operation can be started. It iterates the complete extent tree and recalculates all qgroup tracking data. This is an expensive operation and should not be used unless required. A filesystem under rescan can still be umounted. The rescan continues on the next mount. Status information is provided with a separate ioctl while a rescan operation is in progress. Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/ctree.h | 17 ++- fs/btrfs/disk-io.c |5 + fs/btrfs/ioctl.c | 83 ++-- fs/btrfs/qgroup.c | 318 ++-- include/uapi/linux/btrfs.h | 12 ++- 5 files changed, 400 insertions(+), 35 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 412c306..e4f28a6 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1021,9 +1021,9 @@ struct btrfs_block_group_item { */ #define BTRFS_QGROUP_STATUS_FLAG_ON (1ULL 0) /* - * SCANNING is set during the initialization phase + * RESCAN is set during the initialization phase */ -#define BTRFS_QGROUP_STATUS_FLAG_SCANNING (1ULL 1) +#define BTRFS_QGROUP_STATUS_FLAG_RESCAN (1ULL 1) /* * Some qgroup entries are known to be out of date, * either because the configuration has changed in a way that @@ -1052,7 +1052,7 @@ struct btrfs_qgroup_status_item { * only used during scanning to record the progress * of the scan. 
It contains a logical address */ -__le64 scan; +__le64 rescan; } __attribute__ ((__packed__)); struct btrfs_qgroup_info_item { @@ -1603,6 +1603,11 @@ struct btrfs_fs_info { /* used by btrfs_qgroup_record_ref for an efficient tree traversal */ u64 qgroup_seq; +/* qgroup rescan items */ +struct mutex qgroup_rescan_lock; /* protects the progress item */ +struct btrfs_key qgroup_rescan_progress; +struct btrfs_workers qgroup_rescan_workers; + /* filesystem state */ unsigned long fs_state; @@ -2888,8 +2893,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item, version, 64); BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item, flags, 64); -BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item, - scan, 64); +BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item, + rescan, 64); /* btrfs_qgroup_info_item */ BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item, @@ -3834,7 +3839,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_quota_disable(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); -int btrfs_quota_rescan(struct btrfs_fs_info *fs_info); +int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 src, u64 dst); int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 7717363..63e9348 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2010,6 +2010,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_stop_workers(fs_info-caching_workers); btrfs_stop_workers(fs_info-readahead_workers); btrfs_stop_workers(fs_info-flush_workers); +btrfs_stop_workers(fs_info-qgroup_rescan_workers); } /* helper to cleanup tree roots */ @@ -2301,6 +2302,7 @@ int open_ctree(struct super_block *sb, fs_info-qgroup_seq = 1; 
fs_info-quota_enabled = 0; fs_info-pending_quota_state = 0; +mutex_init(fs_info-qgroup_rescan_lock); btrfs_init_free_cluster(fs_info-meta_alloc_cluster); btrfs_init_free_cluster(fs_info-data_alloc_cluster); @@ -2529,6 +2531,8 @@ int open_ctree(struct super_block *sb, btrfs_init_workers(fs_info-readahead_workers, readahead, fs_info-thread_pool_size, fs_info-generic_worker); +btrfs_init_workers(fs_info-qgroup_rescan_workers, qgroup-rescan, 1, + fs_info-generic_worker); /* * endios are largely parallel and should have a very @@ -2563,6 +2567,7 @@ int open_ctree(struct super_block *sb, ret |= btrfs_start_workers(fs_info-caching_workers); ret |= btrfs_start_workers(fs_info-readahead_workers); ret |= btrfs_start_workers(fs_info-flush_workers); +ret |= btrfs_start_workers(fs_info-qgroup_rescan_workers); if (ret) { err = -ENOMEM; goto fail_sb_buffer; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index d0af96a..5e93bb8 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c
Re: [PATCH v4 2/3] Btrfs: rescan for qgroups
On 01.05.2013 13:42, Wang Shilong wrote: Hi Jan, Hi Wang, On 01.05.2013 09:29, Wang Shilong wrote: Hello Jan, If qgroup tracking is out of sync, a rescan operation can be started. It iterates the complete extent tree and recalculates all qgroup tracking data. This is an expensive operation and should not be used unless required. A filesystem under rescan can still be umounted. The rescan continues on the next mount. Status information is provided with a separate ioctl while a rescan operation is in progress. Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/ctree.h | 17 ++- fs/btrfs/disk-io.c |5 + fs/btrfs/ioctl.c | 83 ++-- fs/btrfs/qgroup.c | 318 ++-- include/uapi/linux/btrfs.h | 12 ++- 5 files changed, 400 insertions(+), 35 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 412c306..e4f28a6 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1021,9 +1021,9 @@ struct btrfs_block_group_item { */ #define BTRFS_QGROUP_STATUS_FLAG_ON(1ULL 0) /* - * SCANNING is set during the initialization phase + * RESCAN is set during the initialization phase */ -#define BTRFS_QGROUP_STATUS_FLAG_SCANNING (1ULL 1) +#define BTRFS_QGROUP_STATUS_FLAG_RESCAN (1ULL 1) /* * Some qgroup entries are known to be out of date, * either because the configuration has changed in a way that @@ -1052,7 +1052,7 @@ struct btrfs_qgroup_status_item { * only used during scanning to record the progress * of the scan. 
It contains a logical address */ - __le64 scan; + __le64 rescan; } __attribute__ ((__packed__)); struct btrfs_qgroup_info_item { @@ -1603,6 +1603,11 @@ struct btrfs_fs_info { /* used by btrfs_qgroup_record_ref for an efficient tree traversal */ u64 qgroup_seq; + /* qgroup rescan items */ + struct mutex qgroup_rescan_lock; /* protects the progress item */ + struct btrfs_key qgroup_rescan_progress; + struct btrfs_workers qgroup_rescan_workers; + /* filesystem state */ unsigned long fs_state; @@ -2888,8 +2893,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item, version, 64); BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item, flags, 64); -BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item, - scan, 64); +BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item, + rescan, 64); /* btrfs_qgroup_info_item */ BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item, @@ -3834,7 +3839,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_quota_disable(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); -int btrfs_quota_rescan(struct btrfs_fs_info *fs_info); +int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 src, u64 dst); int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 7717363..63e9348 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2010,6 +2010,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_stop_workers(fs_info-caching_workers); btrfs_stop_workers(fs_info-readahead_workers); btrfs_stop_workers(fs_info-flush_workers); + btrfs_stop_workers(fs_info-qgroup_rescan_workers); } /* helper to cleanup tree roots */ @@ -2301,6 +2302,7 @@ int open_ctree(struct super_block *sb, fs_info-qgroup_seq = 1; 
fs_info-quota_enabled = 0; fs_info-pending_quota_state = 0; + mutex_init(fs_info-qgroup_rescan_lock); btrfs_init_free_cluster(fs_info-meta_alloc_cluster); btrfs_init_free_cluster(fs_info-data_alloc_cluster); @@ -2529,6 +2531,8 @@ int open_ctree(struct super_block *sb, btrfs_init_workers(fs_info-readahead_workers, readahead, fs_info-thread_pool_size, fs_info-generic_worker); + btrfs_init_workers(fs_info-qgroup_rescan_workers, qgroup-rescan, 1, + fs_info-generic_worker); /* * endios are largely parallel and should have a very @@ -2563,6 +2567,7 @@ int open_ctree(struct super_block *sb, ret |= btrfs_start_workers(fs_info-caching_workers); ret |= btrfs_start_workers(fs_info-readahead_workers); ret |= btrfs_start_workers(fs_info-flush_workers); + ret |= btrfs_start_workers(fs_info-qgroup_rescan_workers); if (ret) { err = -ENOMEM; goto fail_sb_buffer; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index d0af96a..5e93bb8 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3701,12 +3701,10 @@ static long
[BUG] crash after failed mount of btrfs-image
Hi Josef, tried your btrfs-image tool (which didn't work for me but that's not that important). # ~/btrfs-image /dev/sdt1 /var/tmp/janosch.btrfsimage # mount -o loop /var/tmp/janosch.btrfsimage /mnt/test mount: you must specify the filesystem type Doesn't mount, okay. Use -r: # ~/btrfs-image -r /var/tmp/janosch.btrfsimage /var/tmp/removeme # mount -o loop /var/tmp/removeme /mnt/test That failed in open_ctree - probably related. The following loopback mount of a different dd dump of the same file system lead to a null pointer dereference. # mount -o loop /var/tmp/janosch.dump /mnt/test I'm just guessing, should btrfs-image patch the uuid in the blocks and generate a fresh one? 1[ 2287.927943] BUG: unable to handle kernel 6[ 2287.927944] SysRq : Changing Loglevel 4[ 2287.927945] Loglevel set to 3 4[ 2288.061561] NULL pointer dereference at 01e8 1[ 2288.061563] IP: [a048da30] start_transaction+0x20/0x4f0 [btrfs] 4[ 2288.143091] PGD 232f5c067 PUD 2279dd067 PMD 0 4[ 2288.143094] Oops: [#1] PREEMPT SMP DEBUG_PAGEALLOC 4[ 2288.143098] Modules linked in: btrfs raid6_pq xor mpt2sas scsi_transport_sas raid_class [last unloaded: btrfs] 4[ 2288.143104] CPU 2 4[ 2288.143107] Pid: 22375, comm: btrfs-qgroup-re Not tainted 3.8.0+ #15 Supermicro X8SIL/X8SIL 4[ 2288.143109] RIP: 0010:[a048da30] [a048da30] start_transaction+0x20/0x4f0 [btrfs] 4[ 2288.143122] RSP: 0018:880232d79c28 EFLAGS: 00010296 4[ 2288.143123] RAX: 0014 RBX: ffe2 RCX: 0002 4[ 2288.143125] RDX: RSI: RDI: 4[ 2288.143126] RBP: 880232d79c78 R08: R09: 4[ 2288.143128] R10: 0001 R11: 074b R12: 880231c31378 4[ 2288.143129] R13: 880227ce5158 R14: R15: 880231c31368 4[ 2288.143131] FS: () GS:880236a0() knlGS: 4[ 2288.143133] CS: 0010 DS: ES: CR0: 8005003b 4[ 2288.143134] CR2: 01e8 CR3: 00022fdd CR4: 07e0 4[ 2288.143136] DR0: DR1: DR2: 4[ 2288.143137] DR3: DR6: 0ff0 DR7: 0400 4[ 2288.143139] Process btrfs-qgroup-re (pid: 22375, threadinfo 880232d78000, task 880234eb) 4[ 2288.143140] Stack: 4[ 2288.143141] 88020010 
880232d79c98 880232d79c58 0298 4[ 2288.143144] a050658d fff4 880231c31378 880227ce5158 4[ 2288.143147] 880232d79dd8 880231c31368 880232d79c88 a048e298 4[ 2288.143151] Call Trace: 4[ 2288.143164] [a048e298] btrfs_start_transaction+0x18/0x20 [btrfs] 4[ 2288.143180] [a04ef035] btrfs_qgroup_rescan_worker+0xd5/0x840 [btrfs] 4[ 2288.143184] [810ec06d] ? trace_hardirqs_off+0xd/0x10 4[ 2288.143187] [810c99ab] ? local_clock+0x4b/0x60 4[ 2288.143191] [819b9420] ? _raw_spin_unlock_irq+0x30/0x60 4[ 2288.143206] [a04bc26f] worker_loop+0x13f/0x5b0 [btrfs] 4[ 2288.143221] [a04bc130] ? btrfs_queue_worker+0x300/0x300 [btrfs] 4[ 2288.143224] [810b4ebe] kthread+0xde/0xf0 4[ 2288.143227] [810b4de0] ? __init_kthread_worker+0x70/0x70 4[ 2288.143231] [819c0bdc] ret_from_fork+0x7c/0xb0 4[ 2288.143233] [810b4de0] ? __init_kthread_worker+0x70/0x70 4[ 2288.143235] Code: c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 28 66 66 66 66 90 48 c7 c3 e2 ff ff ff 49 89 f6 48 8b b7 e8 01 00 00 49 89 fc 41 89 d5 48 8b 86 a0 33 00 00 a8 1[ 2288.143266] RIP [a048da30] start_transaction+0x20/0x4f0 [btrfs] 4[ 2288.225868] RSP 880232d79c28 4[ 2288.225870] CR2: 01e8 4[ 2288.226363] ---[ end trace 64cb1c6d4f6c2fa7 ]--- The corresponding line of code from start_transaction is 334: 324 static struct btrfs_trans_handle * 325 start_transaction(struct btrfs_root *root, u64 num_items, int type, 326 enum btrfs_reserve_flush_enum flush) 327 { 328 struct btrfs_trans_handle *h; 329 struct btrfs_transaction *cur_trans; 330 u64 num_bytes = 0; 331 int ret; 332 u64 qgroup_reserved = 0; 333 334 if (test_bit(BTRFS_FS_STATE_ERROR, root-fs_info-fs_state)) 335 return ERR_PTR(-EROFS); With the mentioned steps I could reproduce the problem once, a second attempt failed. -Jan -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfstests: btrfs/276 - stop all fsstress before exiting
On Fri, April 26, 2013 at 07:29 (+0200), Eric Sandeen wrote:

Tests after 276 were failing because the background fsstress hadn't quit prior to exit, devices couldn't be unmounted, etc.

I don't see how that would happen. Any further insight?

Just use the same trick as generic/068 does, and use a tmpfile to control whether the background loop keeps running.

I like that trick :-)

Thanks,
-Jan

Also, no need to umount scratch at cleanup time, the scripts do that for us.

Signed-off-by: Eric Sandeen sand...@redhat.com
---

(nobody else ran into this? really?)

diff --git a/tests/btrfs/276 b/tests/btrfs/276
index 0a5ce36..9d68b54 100755
--- a/tests/btrfs/276
+++ b/tests/btrfs/276
@@ -36,14 +36,8 @@ noise_pid=0
 
 _cleanup()
 {
-	if [ $noise_pid -ne 0 ]; then
-		echo "background noise kill $noise_pid" >>$seqres.full
-		kill $noise_pid
-		noise_pid=0
-		wait
-	fi
-	echo "*** unmount"
-	umount $SCRATCH_MNT 2>/dev/null
+	rm $tmp.running
+	wait
 	rm -f $tmp.*
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15
@@ -210,7 +204,7 @@ workout()
 
 	if [ $do_bg_noise -ne 0 ]; then
 		# make background noise while backrefs are being walked
-		while /bin/true; do
+		while [ -f $tmp.running ]; do
 			echo background fsstress >>$seqres.full
 			run_check $FSSTRESS_PROG -d $SCRATCH_MNT/bgnoise -n 999
 			echo background rm >>$seqres.full
@@ -263,6 +257,8 @@ nfiles=4
 numprocs=1
 do_bg_noise=1
 
+touch $tmp.running
+
 workout $filesize $nfiles $numprocs $snap_name $do_bg_noise
 
 echo "*** done"
diff --git a/tests/btrfs/276.out b/tests/btrfs/276.out
index 2032dea..5113164 100644
--- a/tests/btrfs/276.out
+++ b/tests/btrfs/276.out
@@ -1,4 +1,3 @@
 QA output created by 276
 *** test backref walking
 *** done
-*** unmount
[PATCH v4 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions
The function is separated into a preparation part and the three accounting steps mentioned in the qgroups documentation. The goal is to make steps two and three usable by the rescan functionality. A side effect is that the function is restructured into readable subunits. Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/qgroup.c | 253 +++-- 1 files changed, 148 insertions(+), 105 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index f175471..c50e5a5 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1185,6 +1185,144 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans, return 0; } +static int qgroup_account_ref_step1(struct btrfs_fs_info *fs_info, + struct ulist *roots, struct ulist *tmp, + u64 seq) +{ + struct ulist_node *unode; + struct ulist_iterator uiter; + struct ulist_node *tmp_unode; + struct ulist_iterator tmp_uiter; + struct btrfs_qgroup *qg; + int ret; + + ULIST_ITER_INIT(uiter); + while ((unode = ulist_next(roots, uiter))) { + qg = find_qgroup_rb(fs_info, unode-val); + if (!qg) + continue; + + ulist_reinit(tmp); + /* XXX id not needed */ + ret = ulist_add(tmp, qg-qgroupid, + (u64)(uintptr_t)qg, GFP_ATOMIC); + if (ret 0) + return ret; + ULIST_ITER_INIT(tmp_uiter); + while ((tmp_unode = ulist_next(tmp, tmp_uiter))) { + struct btrfs_qgroup_list *glist; + + qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode-aux; + if (qg-refcnt seq) + qg-refcnt = seq + 1; + else + ++qg-refcnt; + + list_for_each_entry(glist, qg-groups, next_group) { + ret = ulist_add(tmp, glist-group-qgroupid, + (u64)(uintptr_t)glist-group, + GFP_ATOMIC); + if (ret 0) + return ret; + } + } + } + + return 0; +} + +static int qgroup_account_ref_step2(struct btrfs_fs_info *fs_info, + struct ulist *roots, struct ulist *tmp, + u64 seq, int sgn, u64 num_bytes, + struct btrfs_qgroup *qgroup) +{ + struct ulist_node *unode; + struct ulist_iterator uiter; + struct btrfs_qgroup *qg; + struct btrfs_qgroup_list *glist; + int ret; + + ulist_reinit(tmp); + 
ret = ulist_add(tmp, qgroup-qgroupid, (uintptr_t)qgroup, GFP_ATOMIC); + if (ret 0) + return ret; + + ULIST_ITER_INIT(uiter); + while ((unode = ulist_next(tmp, uiter))) { + qg = (struct btrfs_qgroup *)(uintptr_t)unode-aux; + if (qg-refcnt seq) { + /* not visited by step 1 */ + qg-rfer += sgn * num_bytes; + qg-rfer_cmpr += sgn * num_bytes; + if (roots-nnodes == 0) { + qg-excl += sgn * num_bytes; + qg-excl_cmpr += sgn * num_bytes; + } + qgroup_dirty(fs_info, qg); + } + WARN_ON(qg-tag = seq); + qg-tag = seq; + + list_for_each_entry(glist, qg-groups, next_group) { + ret = ulist_add(tmp, glist-group-qgroupid, + (uintptr_t)glist-group, GFP_ATOMIC); + if (ret 0) + return ret; + } + } + + return 0; +} + +static int qgroup_account_ref_step3(struct btrfs_fs_info *fs_info, + struct ulist *roots, struct ulist *tmp, + u64 seq, int sgn, u64 num_bytes) +{ + struct ulist_node *unode; + struct ulist_iterator uiter; + struct btrfs_qgroup *qg; + struct ulist_node *tmp_unode; + struct ulist_iterator tmp_uiter; + int ret; + + ULIST_ITER_INIT(uiter); + while ((unode = ulist_next(roots, uiter))) { + qg = find_qgroup_rb(fs_info, unode-val); + if (!qg) + continue; + + ulist_reinit(tmp); + ret = ulist_add(tmp, qg-qgroupid, (uintptr_t)qg, GFP_ATOMIC); + if (ret 0) + return ret; + + ULIST_ITER_INIT(tmp_uiter); + while ((tmp_unode = ulist_next(tmp
[PATCH v4 3/3] Btrfs: automatic rescan after quota enable command
When qgroup tracking is enabled, we do an automatic cycle of the new rescan mechanism.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 664d457..1df4db5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1491,10 +1491,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_root *quota_root = fs_info->quota_root;
 	int ret = 0;
+	int start_rescan_worker = 0;
 
 	if (!quota_root)
 		goto out;
 
+	if (!fs_info->quota_enabled && fs_info->pending_quota_state)
+		start_rescan_worker = 1;
+
 	fs_info->quota_enabled = fs_info->pending_quota_state;
 
 	spin_lock(&fs_info->qgroup_lock);
@@ -1520,6 +1524,13 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 	if (ret)
 		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 
+	if (!ret && start_rescan_worker) {
+		ret = btrfs_qgroup_rescan(fs_info);
+		if (ret)
+			pr_err("btrfs: start rescan quota failed: %d\n", ret);
+		ret = 0;
+	}
+
 out:
 
 	return ret;
-- 
1.7.1
[PATCH v4 2/3] Btrfs: rescan for qgroups
If qgroup tracking is out of sync, a rescan operation can be started. It iterates the complete extent tree and recalculates all qgroup tracking data. This is an expensive operation and should not be used unless required. A filesystem under rescan can still be umounted. The rescan continues on the next mount. Status information is provided with a separate ioctl while a rescan operation is in progress. Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/ctree.h | 17 ++- fs/btrfs/disk-io.c |5 + fs/btrfs/ioctl.c | 83 ++-- fs/btrfs/qgroup.c | 318 ++-- include/uapi/linux/btrfs.h | 12 ++- 5 files changed, 400 insertions(+), 35 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 412c306..e4f28a6 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1021,9 +1021,9 @@ struct btrfs_block_group_item { */ #define BTRFS_QGROUP_STATUS_FLAG_ON(1ULL 0) /* - * SCANNING is set during the initialization phase + * RESCAN is set during the initialization phase */ -#define BTRFS_QGROUP_STATUS_FLAG_SCANNING (1ULL 1) +#define BTRFS_QGROUP_STATUS_FLAG_RESCAN(1ULL 1) /* * Some qgroup entries are known to be out of date, * either because the configuration has changed in a way that @@ -1052,7 +1052,7 @@ struct btrfs_qgroup_status_item { * only used during scanning to record the progress * of the scan. 
It contains a logical address */ - __le64 scan; + __le64 rescan; } __attribute__ ((__packed__)); struct btrfs_qgroup_info_item { @@ -1603,6 +1603,11 @@ struct btrfs_fs_info { /* used by btrfs_qgroup_record_ref for an efficient tree traversal */ u64 qgroup_seq; + /* qgroup rescan items */ + struct mutex qgroup_rescan_lock; /* protects the progress item */ + struct btrfs_key qgroup_rescan_progress; + struct btrfs_workers qgroup_rescan_workers; + /* filesystem state */ unsigned long fs_state; @@ -2888,8 +2893,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item, version, 64); BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item, flags, 64); -BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item, - scan, 64); +BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item, + rescan, 64); /* btrfs_qgroup_info_item */ BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item, @@ -3834,7 +3839,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_quota_disable(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); -int btrfs_quota_rescan(struct btrfs_fs_info *fs_info); +int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 src, u64 dst); int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 7717363..63e9348 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2010,6 +2010,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_stop_workers(fs_info-caching_workers); btrfs_stop_workers(fs_info-readahead_workers); btrfs_stop_workers(fs_info-flush_workers); + btrfs_stop_workers(fs_info-qgroup_rescan_workers); } /* helper to cleanup tree roots */ @@ -2301,6 +2302,7 @@ int open_ctree(struct super_block *sb, fs_info-qgroup_seq = 1; 
fs_info-quota_enabled = 0; fs_info-pending_quota_state = 0; + mutex_init(fs_info-qgroup_rescan_lock); btrfs_init_free_cluster(fs_info-meta_alloc_cluster); btrfs_init_free_cluster(fs_info-data_alloc_cluster); @@ -2529,6 +2531,8 @@ int open_ctree(struct super_block *sb, btrfs_init_workers(fs_info-readahead_workers, readahead, fs_info-thread_pool_size, fs_info-generic_worker); + btrfs_init_workers(fs_info-qgroup_rescan_workers, qgroup-rescan, 1, + fs_info-generic_worker); /* * endios are largely parallel and should have a very @@ -2563,6 +2567,7 @@ int open_ctree(struct super_block *sb, ret |= btrfs_start_workers(fs_info-caching_workers); ret |= btrfs_start_workers(fs_info-readahead_workers); ret |= btrfs_start_workers(fs_info-flush_workers); + ret |= btrfs_start_workers(fs_info-qgroup_rescan_workers); if (ret) { err = -ENOMEM; goto fail_sb_buffer; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index d0af96a..5e93bb8 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3701,12
Re: [PATCH] Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
On Wed, April 24, 2013 at 10:12 (+0200), Liu Bo wrote:
> On Tue, Apr 23, 2013 at 08:00:27PM +0200, Jan Schmidt wrote:
>> Sequence numbers for delayed refs have been introduced in the first version
>> of the qgroup patch set. To solve the problem of find_all_roots on a busy
>> file system, the tree mod log was introduced. The sequence numbers for that
>> were simply shared between those two users.
>
> Can't we just separate them with two vars?

My reasoning comes a few lines below ...

> thanks,
> liubo
>
>> However, at one point in qgroup's quota accounting, there's a statement
>> accessing the previous sequence number, that's still just doing (seq - 1)
>> just as it had to in the very first version.
>>
>> To satisfy that requirement, this patch makes the sequence number counter 64
>> bit and splits it into a major part (used for qgroup sequence number
>> counting) and a minor part (incremented for each tree modification in the
>> log). This enables us to go exactly one major step backwards, as required
>> for qgroups, while still incrementing the sequence counter for tree mod log
>> insertions to keep track of their order. Keeping them in a single variable
>> means there's no need to change all the code dealing with comparisons of two
>> sequence numbers.

See the previous sentence :-) And, it doesn't add too much complexity, setting and incrementing remains in fact quite easy, even though we use the upper 32 bit and the lower 32 bit of that integer independently.

Thanks,
-Jan

>> The sequence number is reset to 0 on commit (not new in this patch), which
>> ensures we won't overflow the two 32 bit counters.
>>
>> Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
>> from the tree mod log code may happen.
>>
>> Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
>> ---

[snip]
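Archive note: the major/minor split discussed in this reply can be tried out in user space. The following is only a rough sketch under assumptions, not the kernel code: it assumes the 32/32 bit split named in the commit message, uses a plain uint64_t where the kernel uses an atomic64_t, and leaves out all locking. The names seq_inc_major, seq_inc_minor and seq_prev are made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical user-space model; upper 32 bits = major, lower 32 bits = minor. */
#define SEQ_MAJOR_MASK 0xffffffff00000000ull

/* Bump the major part and clear the minor part. */
static uint64_t seq_inc_major(uint64_t *seq)
{
	*seq &= SEQ_MAJOR_MASK;
	*seq += 1ull << 32;
	return *seq;
}

/* Bump the minor part; one step per logged tree modification. */
static uint64_t seq_inc_minor(uint64_t *seq)
{
	return ++*seq;
}

/* Largest value that still belongs to the previous major number. */
static uint64_t seq_prev(uint64_t seq)
{
	return (seq & SEQ_MAJOR_MASK) - 1ull;
}
```

Because both parts live in one integer, any two values can still be ordered with an ordinary comparison, which is the point made in the reply. seq_prev() stands in for the old (seq - 1): it lands below the current major no matter how many minor steps have happened since.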
Re: [PATCH] Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
On Wed, April 24, 2013 at 15:04 (+0200), Josef Bacik wrote:
> On Tue, Apr 23, 2013 at 12:00:27PM -0600, Jan Schmidt wrote:
>> Sequence numbers for delayed refs have been introduced in the first version
>> of the qgroup patch set. To solve the problem of find_all_roots on a busy
>> file system, the tree mod log was introduced. The sequence numbers for that
>> were simply shared between those two users.
>>
>> However, at one point in qgroup's quota accounting, there's a statement
>> accessing the previous sequence number, that's still just doing (seq - 1)
>> just as it had to in the very first version.
>>
>> To satisfy that requirement, this patch makes the sequence number counter 64
>> bit and splits it into a major part (used for qgroup sequence number
>> counting) and a minor part (incremented for each tree modification in the
>> log). This enables us to go exactly one major step backwards, as required
>> for qgroups, while still incrementing the sequence counter for tree mod log
>> insertions to keep track of their order. Keeping them in a single variable
>> means there's no need to change all the code dealing with comparisons of two
>> sequence numbers.
>>
>> The sequence number is reset to 0 on commit (not new in this patch), which
>> ensures we won't overflow the two 32 bit counters.
>>
>> Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
>> from the tree mod log code may happen.
>>
>> Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
>> ---
>>  fs/btrfs/ctree.c       |   36 +---
>>  fs/btrfs/ctree.h       |    7 ++-
>>  fs/btrfs/delayed-ref.c |    6 --
>>  fs/btrfs/disk-io.c     |    2 +-
>>  fs/btrfs/extent-tree.c |    5 +++--
>>  fs/btrfs/qgroup.c      |   13 -
>>  fs/btrfs/transaction.c |    2 +-
>>  7 files changed, 52 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 566d99b..b74136e 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -361,6 +361,36 @@ static inline void tree_mod_log_write_unlock(struct btrfs_fs_info *fs_info)
>>  }
>>
>>  /*
>> + * increment the upper half of tree_mod_seq, set lower half zero
>> + *
>> + * must be called with fs_info->tree_mod_seq_lock held
>> + */
>> +static inline u64 btrfs_inc_tree_mod_seq_major(struct btrfs_fs_info *fs_info)
>> +{
>> +	u64 seq = atomic64_read(&fs_info->tree_mod_seq);
>> +	seq &= 0xffffffff00000000ull;
>> +	seq += 1ull << 32;
>> +	atomic64_set(&fs_info->tree_mod_seq, seq);
>> +	return seq;
>> +}
>
> This isn't going to work, you read in the value, inc it and then set the new
> value. If somebody comes in and inc's in between the read and the sync, like
> btrfs_inc_tree_mod_seq_minor could do when you call tree_mod_alloc, you'll end
> up losing the minor update. Thanks,

I don't think I'll lose it. The minor update is made and returned to the one who needs it, that number can still be used. There is no guarantee for two concurrent modifications to which major a minor number belongs, though.

Thanks,
-Jan
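Archive note: the interleaving under discussion can be replayed step by step in a single thread. This is a hedged illustration only, with invented names (replay_interleaving, struct interleave_result) and a plain integer in place of the kernel's atomic64_t. It replays one specific ordering: the major increment reads the counter, a minor increment happens, then the major increment writes its result.

```c
#include <stdint.h>

#define SEQ_MAJOR_MASK 0xffffffff00000000ull

/* Hypothetical record of what each side observed in the replayed ordering. */
struct interleave_result {
	uint64_t minor_seen;	/* value returned to the minor caller */
	uint64_t major_seen;	/* value returned to the major caller */
	uint64_t counter;	/* shared counter after both finished */
};

static struct interleave_result replay_interleaving(uint64_t start)
{
	struct interleave_result r;
	uint64_t counter = start;
	uint64_t snapshot;

	snapshot = counter;		/* major: read */
	r.minor_seen = ++counter;	/* minor: increment and return, in between */
	snapshot &= SEQ_MAJOR_MASK;	/* major: compute from the stale read */
	snapshot += 1ull << 32;
	counter = snapshot;		/* major: set, overwriting the minor bump */

	r.major_seen = snapshot;
	r.counter = counter;
	return r;
}
```

In this ordering the counter itself forgets the minor bump, which matches the objection, but the value handed to the minor caller is still distinct from the new major and sorts below it, which matches the reply. The sketch says nothing about other orderings or about real concurrency.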
Re: [PATCH v3 2/3] Btrfs: rescan for qgroups
On Wed, April 24, 2013 at 13:00 (+0200), Wang Shilong wrote: Hello Jan, [snip] +/* + * returns 0 on error, 0 when more leafs are to be scanned. + * returns 1 when done, 2 when done and FLAG_INCONSISTENT was cleared. + */ +static int +qgroup_rescan_leaf(struct qgroup_rescan *qscan, struct btrfs_path *path, + struct btrfs_trans_handle *trans, struct ulist *tmp, + struct extent_buffer *scratch_leaf) +{ +struct btrfs_key found; +struct btrfs_fs_info *fs_info = qscan-fs_info; +struct ulist *roots = NULL; +struct ulist_node *unode; +struct ulist_iterator uiter; +struct seq_list tree_mod_seq_elem = {}; +u64 seq; +int slot; +int ret; + +path-leave_spinning = 1; +mutex_lock(fs_info-qgroup_rescan_lock); +ret = btrfs_search_slot_for_read(fs_info-extent_root, + fs_info-qgroup_rescan_progress, + path, 1, 0); + +pr_debug(current progress key (%llu %u %llu), search_slot ret %d\n, + (unsigned long long)fs_info-qgroup_rescan_progress.objectid, + fs_info-qgroup_rescan_progress.type, + (unsigned long long)fs_info-qgroup_rescan_progress.offset, + ret); + +if (ret) { +/* + * The rescan is about to end, we will not be scanning any + * further blocks. We cannot unset the RESCAN flag here, because + * we want to commit the transaction if everything went well. + * To make the live accounting work in this phase, we set our + * scan progress pointer such that every real extent objectid + * will be smaller. 
+ */ +fs_info-qgroup_rescan_progress.objectid = (u64)-1; +btrfs_release_path(path); +mutex_unlock(fs_info-qgroup_rescan_lock); +return ret; +} + +btrfs_item_key_to_cpu(path-nodes[0], found, + btrfs_header_nritems(path-nodes[0]) - 1); +fs_info-qgroup_rescan_progress.objectid = found.objectid + 1; + +btrfs_get_tree_mod_seq(fs_info, tree_mod_seq_elem); +memcpy(scratch_leaf, path-nodes[0], sizeof(*scratch_leaf)); +slot = path-slots[0]; +btrfs_release_path(path); +mutex_unlock(fs_info-qgroup_rescan_lock); + +for (; slot btrfs_header_nritems(scratch_leaf); ++slot) { +btrfs_item_key_to_cpu(scratch_leaf, found, slot); +if (found.type != BTRFS_EXTENT_ITEM_KEY) +continue; +ret = btrfs_find_all_roots(trans, fs_info, found.objectid, + tree_mod_seq_elem.seq, roots); +if (ret 0) +break; +spin_lock(fs_info-qgroup_lock); +seq = fs_info-qgroup_seq; +fs_info-qgroup_seq += roots-nnodes + 1; /* max refcnt */ + +ulist_reinit(tmp); +ULIST_ITER_INIT(uiter); +while ((unode = ulist_next(roots, uiter))) { +struct btrfs_qgroup *qg; + +qg = find_qgroup_rb(fs_info, unode-val); +if (!qg) +continue; + +ret = ulist_add(tmp, qg-qgroupid, (uintptr_t)qg, +GFP_ATOMIC); +if (ret 0) { +spin_unlock(fs_info-qgroup_lock); +goto out; +} +} + +/* this is similar to step 2 of btrfs_qgroup_account_ref */ +ULIST_ITER_INIT(uiter); +while ((unode = ulist_next(tmp, uiter))) { +struct btrfs_qgroup *qg; +struct btrfs_qgroup_list *glist; + +qg = (struct btrfs_qgroup *)(uintptr_t) unode-aux; +qg-rfer += found.offset; +qg-rfer_cmpr += found.offset; +WARN_ON(qg-tag = seq); +WARN_ON(qg-refcnt = seq); +if (qg-refcnt seq) +qg-refcnt = seq + 1; +else +qg-refcnt = qg-refcnt + 1; +qgroup_dirty(fs_info, qg); + +list_for_each_entry(glist, qg-groups, next_group) { +ret = ulist_add(tmp, glist-group-qgroupid, +(uintptr_t)glist-group, +GFP_ATOMIC); +if (ret 0) { +spin_unlock(fs_info-qgroup_lock); +goto out; +} +} +} Here i think we can resue arne's 3 steps algorithm to make qgroup accounting correct. 
However, your first step just find all the root
[PATCH v2] Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
Sequence numbers for delayed refs have been introduced in the first version of the qgroup patch set. To solve the problem of find_all_roots on a busy file system, the tree mod log was introduced. The sequence numbers for that were simply shared between those two users.

However, at one point in qgroup's quota accounting, there's a statement accessing the previous sequence number, that's still just doing (seq - 1) just as it would have to in the very first version.

To satisfy that requirement, this patch makes the sequence number counter 64 bit and splits it into a major part (used for qgroup sequence number counting) and a minor part (incremented for each tree modification in the log). This enables us to go exactly one major step backwards, as required for qgroups, while still incrementing the sequence counter for tree mod log insertions to keep track of their order. Keeping them in a single variable means there's no need to change all the code dealing with comparisons of two sequence numbers.

The sequence number is reset to 0 on commit (not new in this patch), which ensures we won't overflow the two 32 bit counters.

Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs from the tree mod log code may happen.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
Changes v1->v2:
- added spin lock and comment around btrfs_inc_tree_mod_seq_minor (to make
  Josef happy in case I get hit by a bus and somebody tries to change it later)

 fs/btrfs/ctree.c       |   47 ---
 fs/btrfs/ctree.h       |    7 ++-
 fs/btrfs/delayed-ref.c |    6 --
 fs/btrfs/disk-io.c     |    2 +-
 fs/btrfs/extent-tree.c |    5 +++--
 fs/btrfs/qgroup.c      |   13 -
 fs/btrfs/transaction.c |    2 +-
 7 files changed, 63 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 566d99b..6275c9c 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -361,6 +361,44 @@ static inline void tree_mod_log_write_unlock(struct btrfs_fs_info *fs_info)
 }
 
 /*
+ * Increment the upper half of tree_mod_seq, set lower half zero.
+ *
+ * Must be called with fs_info->tree_mod_seq_lock held.
+ */
+static inline u64 btrfs_inc_tree_mod_seq_major(struct btrfs_fs_info *fs_info)
+{
+	u64 seq = atomic64_read(&fs_info->tree_mod_seq);
+	seq &= 0xffffffff00000000ull;
+	seq += 1ull << 32;
+	atomic64_set(&fs_info->tree_mod_seq, seq);
+	return seq;
+}
+
+/*
+ * Increment the lower half of tree_mod_seq.
+ *
+ * Must be called with fs_info->tree_mod_seq_lock held. The way major numbers
+ * are generated should not technically require a spin lock here. (Rationale:
+ * incrementing the minor while incrementing the major seq number is between its
+ * atomic64_read and atomic64_set calls doesn't duplicate sequence numbers, it
+ * just returns a unique sequence number as usual.) We have decided to leave
+ * that requirement in here and rethink it once we notice it really imposes a
+ * problem on some workload.
+ */
+static inline u64 btrfs_inc_tree_mod_seq_minor(struct btrfs_fs_info *fs_info)
+{
+	return atomic64_inc_return(&fs_info->tree_mod_seq);
+}
+
+/*
+ * return the last minor in the previous major tree_mod_seq number
+ */
+u64 btrfs_tree_mod_seq_prev(u64 seq)
+{
+	return (seq & 0xffffffff00000000ull) - 1ull;
+}
+
+/*
  * This adds a new blocker to the tree mod log's blocker list if the @elem
  * passed does not already have a sequence number set. So when a caller expects
  * to record tree modifications, it should ensure to set elem->seq to zero
@@ -376,10 +414,10 @@ u64 btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info,
 	tree_mod_log_write_lock(fs_info);
 	spin_lock(&fs_info->tree_mod_seq_lock);
 	if (!elem->seq) {
-		elem->seq = btrfs_inc_tree_mod_seq(fs_info);
+		elem->seq = btrfs_inc_tree_mod_seq_major(fs_info);
 		list_add_tail(&elem->list, &fs_info->tree_mod_seq_list);
 	}
-	seq = btrfs_inc_tree_mod_seq(fs_info);
+	seq = btrfs_inc_tree_mod_seq_minor(fs_info);
 	spin_unlock(&fs_info->tree_mod_seq_lock);
 	tree_mod_log_write_unlock(fs_info);
 
@@ -524,7 +562,10 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
 	if (!tm)
 		return -ENOMEM;
 
-	tm->seq = btrfs_inc_tree_mod_seq(fs_info);
+	spin_lock(&fs_info->tree_mod_seq_lock);
+	tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
+	spin_unlock(&fs_info->tree_mod_seq_lock);
+
 	return tm->seq;
 }
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 412c306..5f34f89 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1422,7 +1422,7 @@ struct btrfs_fs_info {
 
 	/* this protects tree_mod_seq_list */
 	spinlock_t tree_mod_seq_lock;
-	atomic_t tree_mod_seq;
+	atomic64_t tree_mod_seq;
 	struct list_head tree_mod_seq_list;
 	struct seq_list tree_mod_seq_elem;
 
@@ -3334,10
[PATCH v3 3/3] Btrfs: automatic rescan after quota enable command
When qgroup tracking is enabled, we do an automatic cycle of the new rescan mechanism.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 249dd64..b1ae0ab 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1494,10 +1494,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_root *quota_root = fs_info->quota_root;
 	int ret = 0;
+	int start_rescan_worker = 0;
 
 	if (!quota_root)
 		goto out;
 
+	if (!fs_info->quota_enabled && fs_info->pending_quota_state)
+		start_rescan_worker = 1;
+
 	fs_info->quota_enabled = fs_info->pending_quota_state;
 
 	spin_lock(&fs_info->qgroup_lock);
@@ -1523,6 +1527,12 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 	if (ret)
 		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 
+	if (start_rescan_worker) {
+		ret = btrfs_qgroup_rescan(fs_info);
+		if (ret)
+			pr_err("btrfs: start rescan quota failed: %d\n", ret);
+	}
+
 out:
 
 	return ret;
-- 
1.7.1
[PATCH v3 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions
The function is separated into a preparation part and the three accounting steps mentioned in the qgroups documentation. The goal is to make steps two and three usable by the rescan functionality. A side effect is that the function is restructured into readable subunits. Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/qgroup.c | 253 +++-- 1 files changed, 148 insertions(+), 105 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index f175471..c50e5a5 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1185,6 +1185,144 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans, return 0; } +static int qgroup_account_ref_step1(struct btrfs_fs_info *fs_info, + struct ulist *roots, struct ulist *tmp, + u64 seq) +{ + struct ulist_node *unode; + struct ulist_iterator uiter; + struct ulist_node *tmp_unode; + struct ulist_iterator tmp_uiter; + struct btrfs_qgroup *qg; + int ret; + + ULIST_ITER_INIT(uiter); + while ((unode = ulist_next(roots, uiter))) { + qg = find_qgroup_rb(fs_info, unode-val); + if (!qg) + continue; + + ulist_reinit(tmp); + /* XXX id not needed */ + ret = ulist_add(tmp, qg-qgroupid, + (u64)(uintptr_t)qg, GFP_ATOMIC); + if (ret 0) + return ret; + ULIST_ITER_INIT(tmp_uiter); + while ((tmp_unode = ulist_next(tmp, tmp_uiter))) { + struct btrfs_qgroup_list *glist; + + qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode-aux; + if (qg-refcnt seq) + qg-refcnt = seq + 1; + else + ++qg-refcnt; + + list_for_each_entry(glist, qg-groups, next_group) { + ret = ulist_add(tmp, glist-group-qgroupid, + (u64)(uintptr_t)glist-group, + GFP_ATOMIC); + if (ret 0) + return ret; + } + } + } + + return 0; +} + +static int qgroup_account_ref_step2(struct btrfs_fs_info *fs_info, + struct ulist *roots, struct ulist *tmp, + u64 seq, int sgn, u64 num_bytes, + struct btrfs_qgroup *qgroup) +{ + struct ulist_node *unode; + struct ulist_iterator uiter; + struct btrfs_qgroup *qg; + struct btrfs_qgroup_list *glist; + int ret; + + ulist_reinit(tmp); + 
ret = ulist_add(tmp, qgroup-qgroupid, (uintptr_t)qgroup, GFP_ATOMIC); + if (ret 0) + return ret; + + ULIST_ITER_INIT(uiter); + while ((unode = ulist_next(tmp, uiter))) { + qg = (struct btrfs_qgroup *)(uintptr_t)unode-aux; + if (qg-refcnt seq) { + /* not visited by step 1 */ + qg-rfer += sgn * num_bytes; + qg-rfer_cmpr += sgn * num_bytes; + if (roots-nnodes == 0) { + qg-excl += sgn * num_bytes; + qg-excl_cmpr += sgn * num_bytes; + } + qgroup_dirty(fs_info, qg); + } + WARN_ON(qg-tag = seq); + qg-tag = seq; + + list_for_each_entry(glist, qg-groups, next_group) { + ret = ulist_add(tmp, glist-group-qgroupid, + (uintptr_t)glist-group, GFP_ATOMIC); + if (ret 0) + return ret; + } + } + + return 0; +} + +static int qgroup_account_ref_step3(struct btrfs_fs_info *fs_info, + struct ulist *roots, struct ulist *tmp, + u64 seq, int sgn, u64 num_bytes) +{ + struct ulist_node *unode; + struct ulist_iterator uiter; + struct btrfs_qgroup *qg; + struct ulist_node *tmp_unode; + struct ulist_iterator tmp_uiter; + int ret; + + ULIST_ITER_INIT(uiter); + while ((unode = ulist_next(roots, uiter))) { + qg = find_qgroup_rb(fs_info, unode-val); + if (!qg) + continue; + + ulist_reinit(tmp); + ret = ulist_add(tmp, qg-qgroupid, (uintptr_t)qg, GFP_ATOMIC); + if (ret 0) + return ret; + + ULIST_ITER_INIT(tmp_uiter); + while ((tmp_unode = ulist_next(tmp
[PATCH v3 0/3] Btrfs: quota rescan for 3.10
The kernel side for rescan, which is needed if you want to enable qgroup tracking on a non-empty volume.

The first patch splits btrfs_qgroup_account_ref into readable and reusable units. The second patch adds the rescan implementation (refer to its commit message for a description of the algorithm). The third patch starts an automatic rescan when qgroups are enabled. It is only separated to potentially help bisecting things in case of a problem.

The required user space patch was sent on 2013-04-05, subject [PATCH] Btrfs-progs: quota rescan.

--
Changes v2->v3:
- rebased to btrfs-next
- stop rescan worker when quota is disabled
- check return value of ulist_add()
- initialize worker struct to zero

Changes v1->v2:
- fix calculation of the exclusive field for qgroups in level != 0
- split btrfs_qgroup_account_ref
- take into account that mutex_unlock might schedule
- fix kzalloc error checking
- add some reserved ints to struct btrfs_ioctl_quota_rescan_args
- changed modification to unused #define BTRFS_QUOTA_CTL_RESCAN
- added missing (unsigned long long) casts for pr_debug
- more detailed commit messages

Jan Schmidt (3):
  Btrfs: split btrfs_qgroup_account_ref into four functions
  Btrfs: rescan for qgroups
  Btrfs: automatic rescan after quota enable command

 fs/btrfs/ctree.h           |   17 +-
 fs/btrfs/disk-io.c         |    5 +
 fs/btrfs/ioctl.c           |   83 ++-
 fs/btrfs/qgroup.c          |  575 +++-
 include/uapi/linux/btrfs.h |   12 +-
 5 files changed, 552 insertions(+), 140 deletions(-)
[PATCH v3 2/3] Btrfs: rescan for qgroups
If qgroup tracking is out of sync, a rescan operation can be started. It iterates the complete extent tree and recalculates all qgroup tracking data. This is an expensive operation and should not be used unless required. A filesystem under rescan can still be umounted. The rescan continues on the next mount. Status information is provided with a separate ioctl while a rescan operation is in progress. Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/ctree.h | 17 ++- fs/btrfs/disk-io.c |5 + fs/btrfs/ioctl.c | 83 ++-- fs/btrfs/qgroup.c | 312 ++-- include/uapi/linux/btrfs.h | 12 ++- 5 files changed, 394 insertions(+), 35 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 412c306..e4f28a6 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1021,9 +1021,9 @@ struct btrfs_block_group_item { */ #define BTRFS_QGROUP_STATUS_FLAG_ON(1ULL 0) /* - * SCANNING is set during the initialization phase + * RESCAN is set during the initialization phase */ -#define BTRFS_QGROUP_STATUS_FLAG_SCANNING (1ULL 1) +#define BTRFS_QGROUP_STATUS_FLAG_RESCAN(1ULL 1) /* * Some qgroup entries are known to be out of date, * either because the configuration has changed in a way that @@ -1052,7 +1052,7 @@ struct btrfs_qgroup_status_item { * only used during scanning to record the progress * of the scan. 
It contains a logical address */ - __le64 scan; + __le64 rescan; } __attribute__ ((__packed__)); struct btrfs_qgroup_info_item { @@ -1603,6 +1603,11 @@ struct btrfs_fs_info { /* used by btrfs_qgroup_record_ref for an efficient tree traversal */ u64 qgroup_seq; + /* qgroup rescan items */ + struct mutex qgroup_rescan_lock; /* protects the progress item */ + struct btrfs_key qgroup_rescan_progress; + struct btrfs_workers qgroup_rescan_workers; + /* filesystem state */ unsigned long fs_state; @@ -2888,8 +2893,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item, version, 64); BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item, flags, 64); -BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item, - scan, 64); +BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item, + rescan, 64); /* btrfs_qgroup_info_item */ BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item, @@ -3834,7 +3839,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_quota_disable(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); -int btrfs_quota_rescan(struct btrfs_fs_info *fs_info); +int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 src, u64 dst); int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f4628c7..f80383e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1996,6 +1996,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) btrfs_stop_workers(fs_info-caching_workers); btrfs_stop_workers(fs_info-readahead_workers); btrfs_stop_workers(fs_info-flush_workers); + btrfs_stop_workers(fs_info-qgroup_rescan_workers); } /* helper to cleanup tree roots */ @@ -2257,6 +2258,7 @@ int open_ctree(struct super_block *sb, fs_info-qgroup_seq = 1; 
fs_info-quota_enabled = 0; fs_info-pending_quota_state = 0; + mutex_init(fs_info-qgroup_rescan_lock); btrfs_init_free_cluster(fs_info-meta_alloc_cluster); btrfs_init_free_cluster(fs_info-data_alloc_cluster); @@ -2485,6 +2487,8 @@ int open_ctree(struct super_block *sb, btrfs_init_workers(fs_info-readahead_workers, readahead, fs_info-thread_pool_size, fs_info-generic_worker); + btrfs_init_workers(fs_info-qgroup_rescan_workers, qgroup-rescan, 1, + fs_info-generic_worker); /* * endios are largely parallel and should have a very @@ -2519,6 +2523,7 @@ int open_ctree(struct super_block *sb, ret |= btrfs_start_workers(fs_info-caching_workers); ret |= btrfs_start_workers(fs_info-readahead_workers); ret |= btrfs_start_workers(fs_info-flush_workers); + ret |= btrfs_start_workers(fs_info-qgroup_rescan_workers); if (ret) { err = -ENOMEM; goto fail_sb_buffer; diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index d0af96a..5e93bb8 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3701,12
Re: [PATCH v3 2/3] Btrfs: rescan for qgroups
On Tue, April 23, 2013 at 14:05 (+0200), Wang Shilong wrote: Hello Jan, [..snip..] /* * the delayed ref sequence number we pass depends on the direction of * the operation. for add operations, we pass (node-seq - 1) to skip @@ -1401,7 +1428,17 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans, if (ret 0) return ret; +mutex_lock(fs_info-qgroup_rescan_lock); spin_lock(fs_info-qgroup_lock); +if (fs_info-qgroup_flags BTRFS_QGROUP_STATUS_FLAG_RESCAN) { +if (fs_info-qgroup_rescan_progress.objectid = node-bytenr) { +ret = 0; +mutex_unlock(fs_info-qgroup_rescan_lock); +goto unlock; +} +} +mutex_unlock(fs_info-qgroup_rescan_lock); + quota_root = fs_info-quota_root; if (!quota_root) goto unlock; @@ -1820,3 +1857,250 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans) trans-delayed_ref_elem.seq); BUG(); } + +/* + * returns 0 on error, 0 when more leafs are to be scanned. + * returns 1 when done, 2 when done and FLAG_INCONSISTENT was cleared. + */ +static int +qgroup_rescan_leaf(struct qgroup_rescan *qscan, struct btrfs_path *path, + struct btrfs_trans_handle *trans, struct ulist *tmp, + struct extent_buffer *scratch_leaf) +{ +struct btrfs_key found; +struct btrfs_fs_info *fs_info = qscan-fs_info; +struct ulist *roots = NULL; +struct ulist_node *unode; +struct ulist_iterator uiter; +struct seq_list tree_mod_seq_elem = {}; +u64 seq; +int slot; +int ret; + +path-leave_spinning = 1; +mutex_lock(fs_info-qgroup_rescan_lock); +ret = btrfs_search_slot_for_read(fs_info-extent_root, + fs_info-qgroup_rescan_progress, + path, 1, 0); + +pr_debug(current progress key (%llu %u %llu), search_slot ret %d\n, + (unsigned long long)fs_info-qgroup_rescan_progress.objectid, + fs_info-qgroup_rescan_progress.type, + (unsigned long long)fs_info-qgroup_rescan_progress.offset, + ret); + +if (ret) { +/* + * The rescan is about to end, we will not be scanning any + * further blocks. 
We cannot unset the RESCAN flag here, because + * we want to commit the transaction if everything went well. + * To make the live accounting work in this phase, we set our + * scan progress pointer such that every real extent objectid + * will be smaller. + */ +fs_info-qgroup_rescan_progress.objectid = (u64)-1; +btrfs_release_path(path); +mutex_unlock(fs_info-qgroup_rescan_lock); +return ret; +} + +btrfs_item_key_to_cpu(path-nodes[0], found, + btrfs_header_nritems(path-nodes[0]) - 1); +fs_info-qgroup_rescan_progress.objectid = found.objectid + 1; + +btrfs_get_tree_mod_seq(fs_info, tree_mod_seq_elem); +memcpy(scratch_leaf, path-nodes[0], sizeof(*scratch_leaf)); +slot = path-slots[0]; +btrfs_release_path(path); +mutex_unlock(fs_info-qgroup_rescan_lock); + +for (; slot btrfs_header_nritems(scratch_leaf); ++slot) { +btrfs_item_key_to_cpu(scratch_leaf, found, slot); +if (found.type != BTRFS_EXTENT_ITEM_KEY) +continue; +ret = btrfs_find_all_roots(trans, fs_info, found.objectid, + tree_mod_seq_elem.seq, roots); +if (ret 0) +break; +spin_lock(fs_info-qgroup_lock); +seq = fs_info-qgroup_seq; +fs_info-qgroup_seq += roots-nnodes + 1; /* max refcnt */ + +ulist_reinit(tmp); +ULIST_ITER_INIT(uiter); +while ((unode = ulist_next(roots, uiter))) { +struct btrfs_qgroup *qg; + +qg = find_qgroup_rb(fs_info, unode-val); +if (!qg) +continue; + +ret = ulist_add(tmp, qg-qgroupid, (uintptr_t)qg, +GFP_ATOMIC); If ulist_add() fails, we still need to call ulist_free(roots).. +if (ret 0) { +spin_unlock(fs_info-qgroup_lock); +goto out; +} +} + +/* this is similar to step 2 of btrfs_qgroup_account_ref */ +ULIST_ITER_INIT(uiter); +while ((unode = ulist_next(tmp, uiter))) { +struct btrfs_qgroup *qg; +struct btrfs_qgroup_list *glist; + +qg = (struct btrfs_qgroup *)(uintptr_t) unode-aux; +
Re: [PATCH v3 3/3] Btrfs: automatic rescan after quota enable command
On Tue, April 23, 2013 at 17:36 (+0200), David Sterba wrote:
> On Tue, Apr 23, 2013 at 01:26:51PM +0200, Jan Schmidt wrote:
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -1494,10 +1494,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>>  {
>>  	struct btrfs_root *quota_root = fs_info->quota_root;
>>  	int ret = 0;
>> +	int start_rescan_worker = 0;
>>
>>  	if (!quota_root)
>>  		goto out;
>>
>> +	if (!fs_info->quota_enabled && fs_info->pending_quota_state)
>> +		start_rescan_worker = 1;
>> +
>>  	fs_info->quota_enabled = fs_info->pending_quota_state;
>>
>>  	spin_lock(&fs_info->qgroup_lock);
>> @@ -1523,6 +1527,12 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
>>  	if (ret)
>>  		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
>>
>> +	if (start_rescan_worker) {
>> +		ret = btrfs_qgroup_rescan(fs_info);
>
> btrfs_run_qgroups() is called from transaction commit and does BUG_ON
> the return value. btrfs_qgroup_rescan can return -EINPROGRESS if the
> rescan is in progress and this is propagated back to trans commit.
> So the rescan triggered by ioctl may cause a crash, unless I'm missing
> something.

You're right, it doesn't seem like a good idea to propagate that return value
to the caller. I'll leave in the printk following the quoted line and reset ret
to zero afterwards. (As already mentioned, v4 to come)

Thanks,
Jan

> The original question I've had is what sort of work does rescan do
> because it's on the commit path and we don't want to add more work and
> delay commit.
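
The fix agreed on in this message (log the failure, then reset ret so the commit path never sees it) can be sketched in plain userspace C. This is only an illustration under stated assumptions: rescan_state, start_rescan and run_qgroups_sketch are hypothetical stand-ins, not the kernel functions, and fprintf takes the place of the printk.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the rescan bookkeeping in fs_info. */
struct rescan_state {
	int running;
};

/* Hypothetical stand-in for btrfs_qgroup_rescan(): refuses a second start. */
static int start_rescan(struct rescan_state *st)
{
	assert(st != NULL);
	if (st->running)
		return -EINPROGRESS;
	st->running = 1;
	return 0;
}

/*
 * Commit-path helper: a failed rescan start is reported, but the error is
 * not handed back to a caller that would treat any non-zero value as fatal.
 */
static int run_qgroups_sketch(struct rescan_state *st, int start_rescan_worker)
{
	int ret = 0;

	if (start_rescan_worker) {
		ret = start_rescan(st);
		if (ret) {
			fprintf(stderr, "start rescan quota failed: %d\n", ret);
			ret = 0; /* do not propagate to the commit path */
		}
	}
	return ret;
}
```

The design choice is that a rescan already in progress is a benign condition for the commit, so it is worth a log line but not a failed transaction.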
Re: [PATCH v3 2/3] Btrfs: rescan for qgroups
On Tue, April 23, 2013 at 16:54 (+0200), Wang Shilong wrote:
> Hello Jan,
>
>> +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
>> +{
>> +	struct qgroup_rescan *qscan = container_of(work, struct qgroup_rescan,
>> +						   work);
>> +	struct btrfs_path *path;
>> +	struct btrfs_trans_handle *trans = NULL;
>> +	struct btrfs_fs_info *fs_info = qscan->fs_info;
>> +	struct ulist *tmp = NULL;
>> +	struct extent_buffer *scratch_leaf = NULL;
>> +	int err = -ENOMEM;
>> +
>> +	path = btrfs_alloc_path();
>> +	if (!path)
>> +		goto out;
>> +	tmp = ulist_alloc(GFP_NOFS);
>> +	if (!tmp)
>> +		goto out;
>> +	scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS);
>> +	if (!scratch_leaf)
>> +		goto out;
>> +
>> +	err = 0;
>> +	while (!err) {
>> +		trans = btrfs_start_transaction(fs_info->fs_root, 0);
>> +		if (IS_ERR(trans)) {
>> +			err = PTR_ERR(trans);
>> +			break;
>> +		}
>> +		if (!fs_info->quota_enabled) {
>> +			err = EINTR;
>
> Why not -EINTR?

Makes sense, will change that.

>> +		} else {
>> +			err = qgroup_rescan_leaf(qscan, path, trans,
>> +						 tmp, scratch_leaf);
>> +		}
>> +		if (err > 0)
>> +			btrfs_commit_transaction(trans, fs_info->fs_root);
>> +		else
>> +			btrfs_end_transaction(trans, fs_info->fs_root);
>> +	}
>> +
>> +out:
>> +	kfree(scratch_leaf);
>> +	ulist_free(tmp);
>> +	btrfs_free_path(path);
>> +	kfree(qscan);
>> +
>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>> +	fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>> +
>> +	if (err == 2 &&
>> +	    fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
>> +		fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
>> +	} else if (err < 0) {
>
> If -EINTR happens, quota has been disabled, i don't think we should set
> INCONSISTENT flag…

Debatable. Quota information is in fact inconsistent on disk, and only because
we can conclude that also from the fact that it is currently disabled, it
doesn't hurt to set that flag. In fact, whenever quota is enabled, we're setting
the flag, too:

802 int btrfs_quota_enable(struct btrfs_trans_handle *trans,
803 		       struct btrfs_fs_info *fs_info)
...
852 	fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON |
853 				BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;

So I don't think it's worth another comparison here.

Thanks,
-Jan

> Thanks,
> Wang
>
>> +		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
>> +	}
>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +
>> +	if (err >= 0) {
>> +		pr_info("btrfs: qgroup scan completed%s\n",
>> +			err == 2 ? " (inconsistency flag cleared)" : "");
>> +	} else {
>> +		pr_err("btrfs: qgroup scan failed with %d\n", err);
>> +	}
>> +}
>> +
[PATCH] Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
Sequence numbers for delayed refs have been introduced in the first version of the qgroup patch set. To solve the problem of find_all_roots on a busy file system, the tree mod log was introduced. The sequence numbers for that were simply shared between those two users. However, at one point in qgroup's quota accounting, there's a statement accessing the previous sequence number, that's still just doing (seq - 1) just as it had to in the very first version. To satisfy that requirement, this patch makes the sequence number counter 64 bit and splits it into a major part (used for qgroup sequence number counting) and a minor part (incremented for each tree modification in the log). This enables us to go exactly one major step backwards, as required for qgroups, while still incrementing the sequence counter for tree mod log insertions to keep track of their order. Keeping them in a single variable means there's no need to change all the code dealing with comparisons of two sequence numbers. The sequence number is reset to 0 on commit (not new in this patch), which ensures we won't overflow the two 32 bit counters. Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs from the tree mod log code may happen. 
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.c       |   36 +---
 fs/btrfs/ctree.h       |    7 ++-
 fs/btrfs/delayed-ref.c |    6 --
 fs/btrfs/disk-io.c     |    2 +-
 fs/btrfs/extent-tree.c |    5 +++--
 fs/btrfs/qgroup.c      |   13 -
 fs/btrfs/transaction.c |    2 +-
 7 files changed, 52 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 566d99b..b74136e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -361,6 +361,36 @@ static inline void tree_mod_log_write_unlock(struct btrfs_fs_info *fs_info)
 }

 /*
+ * increment the upper half of tree_mod_seq, set lower half zero
+ *
+ * must be called with fs_info->tree_mod_seq_lock held
+ */
+static inline u64 btrfs_inc_tree_mod_seq_major(struct btrfs_fs_info *fs_info)
+{
+	u64 seq = atomic64_read(&fs_info->tree_mod_seq);
+	seq &= 0xffffffff00000000ull;
+	seq += 1ull << 32;
+	atomic64_set(&fs_info->tree_mod_seq, seq);
+	return seq;
+}
+
+/*
+ * increment the lower half of tree_mod_seq
+ */
+static inline u64 btrfs_inc_tree_mod_seq_minor(struct btrfs_fs_info *fs_info)
+{
+	return atomic64_inc_return(&fs_info->tree_mod_seq);
+}
+
+/*
+ * return the last minor in the previous major tree_mod_seq number
+ */
+u64 btrfs_tree_mod_seq_prev(u64 seq)
+{
+	return (seq & 0xffffffff00000000ull) - 1ull;
+}
+
+/*
  * This adds a new blocker to the tree mod log's blocker list if the @elem
  * passed does not already have a sequence number set. So when a caller expects
  * to record tree modifications, it should ensure to set elem->seq to zero
@@ -376,10 +406,10 @@ u64 btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info,
 	tree_mod_log_write_lock(fs_info);
 	spin_lock(&fs_info->tree_mod_seq_lock);
 	if (!elem->seq) {
-		elem->seq = btrfs_inc_tree_mod_seq(fs_info);
+		elem->seq = btrfs_inc_tree_mod_seq_major(fs_info);
 		list_add_tail(&elem->list, &fs_info->tree_mod_seq_list);
 	}
-	seq = btrfs_inc_tree_mod_seq(fs_info);
+	seq = btrfs_inc_tree_mod_seq_minor(fs_info);
 	spin_unlock(&fs_info->tree_mod_seq_lock);
 	tree_mod_log_write_unlock(fs_info);

@@ -524,7 +554,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
 	if (!tm)
 		return -ENOMEM;

-	tm->seq = btrfs_inc_tree_mod_seq(fs_info);
+	tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
 	return tm->seq;
 }

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 412c306..5f34f89 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1422,7 +1422,7 @@ struct btrfs_fs_info {
 	/* this protects tree_mod_seq_list */
 	spinlock_t tree_mod_seq_lock;
-	atomic_t tree_mod_seq;
+	atomic64_t tree_mod_seq;
 	struct list_head tree_mod_seq_list;
 	struct seq_list tree_mod_seq_elem;

@@ -3334,10 +3334,7 @@ u64 btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info,
 			   struct seq_list *elem);
 void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
 			    struct seq_list *elem);
-static inline u64 btrfs_inc_tree_mod_seq(struct btrfs_fs_info *fs_info)
-{
-	return atomic_inc_return(&fs_info->tree_mod_seq);
-}
+u64 btrfs_tree_mod_seq_prev(u64 seq);
 int btrfs_old_root_level(struct btrfs_root *root, u64 time_seq);

 /* root-item.c */
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 116abec..c219463 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -361,8 +361,10 @@ int btrfs_check_delayed_seq(struct btrfs_fs_info *fs_info,
 		elem = list_first_entry(&fs_info->tree_mod_seq_list
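
To see how the major/minor split lets qgroups step back exactly one major number while tree mod log entries keep their order, here is a small userspace sketch of the same arithmetic. It is an illustration only: seq_counter and the seq_* helpers are made-up names, a plain uint64_t replaces atomic64_t, and there is no locking.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits: major (qgroup) part. Lower 32 bits: minor (tree mod log). */
#define SEQ_MAJOR_MASK 0xffffffff00000000ull

/* Illustrative stand-in for fs_info->tree_mod_seq. */
struct seq_counter {
	uint64_t seq;
};

/* Advance to the next major number and clear the minor part. */
static uint64_t seq_inc_major(struct seq_counter *c)
{
	c->seq = (c->seq & SEQ_MAJOR_MASK) + (1ull << 32);
	return c->seq;
}

/* Advance the minor part, one step per logged tree modification. */
static uint64_t seq_inc_minor(struct seq_counter *c)
{
	/* The scheme relies on the minor part never carrying into the major. */
	assert((c->seq & ~SEQ_MAJOR_MASK) != ~SEQ_MAJOR_MASK);
	return ++c->seq;
}

/* Largest value that still belongs to the previous major number. */
static uint64_t seq_prev(uint64_t seq)
{
	return (seq & SEQ_MAJOR_MASK) - 1ull;
}
```

Because both parts live in one 64 bit value, ordinary integer comparison still orders all sequence numbers, and seq_prev() of any value in major step N compares greater than every value handed out during step N - 1.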
Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7
On Fri, April 19, 2013 at 07:57 (+0200), Tejun Heo wrote:
> (cc'ing btrfs people)
>
> On Fri, Apr 19, 2013 at 11:33:20AM +0800, Wanlong Gao wrote:
>> RIP: 0010:[812484d3] [812484d3] ftrace_raw_event_block_bio_complete+0x73/0xf0
> ...
>> [811b6c10] bio_endio+0x80/0x90
>> [a0790d26] btrfs_end_bio+0xf6/0x190 [btrfs]
>> [811b6bcd] bio_endio+0x3d/0x90
>> [81249873] req_bio_endio+0xa3/0xe0
>
> Ugh
>
> In fs/btrfs/volumes.c
>
>  static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
>  {
>  	...
>  	bio->bi_bdev = (struct block_device *)
>  			(unsigned long)bbio->mirror_num;
>  	...
>  }
>
>  static void btrfs_end_bio(struct bio *bio, int err)
>  {
>  	...
>  	bio->bi_bdev = (struct block_device *)
>  			(unsigned long)bbio->mirror_num;
>  	...
>  }
>
> In fs/btrfs/extent_io.c
>
>  static void end_bio_extent_readpage(struct bio *bio, int err)
>  {
>  	int mirror;
>  	...
>  	mirror = (int)(unsigned long)bio->bi_bdev;
>  	...
>  }
>
> Ewweehh
>
> No wonder this thing crashes. Chris, can't the original bio carry
> bbio in bi_private and let end_bio_extent_readpage() free the bbio
> instead of abusing bi_bdev like this?

Oops. That was my patch back in 2011 (commit 2774b2ca3), sent as an RFC patch,
and it just slipped in without further discussion of that exact change. Hackish,
yes. My reasoning was that, because the block layer changed bio->bi_bdev anyway,
no one would want to look into it after the bio returned (and in fact it didn't
hurt for about two years). I admit that although the block layer changes
bi_bdev, it stays a valid bdev pointer.

One way around this would be what you suggest. However, that would mean the
caller of (btrfs|btree)_submit_bio_hook gets its completion called in the end,
but must know that the private is in fact a bbio, which in turn carries the
caller's private. That doesn't sound clean to me, either.

The best idea I currently have is to add a dispatcher function that does the
freeing of bbio and calls the actual completion with mirror_num as a separate
parameter. That would make all the btrfs completions incompatible with
bio_end_io_t, but it shouldn't hurt.

At least now I know I wasn't invited to LSF for a good reason :-)

-Jan
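
The dispatcher idea from the last paragraphs can be sketched in userspace C: a wrapper object carries the caller's completion, its private data and the mirror number, and a single dispatcher frees the wrapper before calling the completion with mirror_num as an explicit argument. Every name below (mirror_end_io_t, bbio_sketch, record_read_done and so on) is hypothetical and only illustrates the shape of the proposal, not the eventual btrfs code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical completion type: mirror_num is passed explicitly. */
typedef void (*mirror_end_io_t)(void *private_data, int err, int mirror_num);

/* Hypothetical stand-in for btrfs_bio: owns nothing but these fields. */
struct bbio_sketch {
	mirror_end_io_t end_io;
	void *private_data;
	int mirror_num;
};

static struct bbio_sketch *bbio_sketch_alloc(mirror_end_io_t end_io,
					     void *private_data, int mirror_num)
{
	struct bbio_sketch *bbio = malloc(sizeof(*bbio));

	if (!bbio)
		return NULL;
	bbio->end_io = end_io;
	bbio->private_data = private_data;
	bbio->mirror_num = mirror_num;
	return bbio;
}

/* Dispatcher: copy what is needed, free the wrapper, call the completion. */
static void bbio_sketch_complete(struct bbio_sketch *bbio, int err)
{
	mirror_end_io_t end_io = bbio->end_io;
	void *private_data = bbio->private_data;
	int mirror_num = bbio->mirror_num;

	assert(end_io != NULL);
	free(bbio);
	end_io(private_data, err, mirror_num);
}

/* Example completion that records what it was given. */
struct read_result {
	int err;
	int mirror;
	int calls;
};

static void record_read_done(void *private_data, int err, int mirror_num)
{
	struct read_result *res = private_data;

	res->err = err;
	res->mirror = mirror_num;
	res->calls++;
}
```

With this shape the completion never has to decode a pointer field to learn the mirror number, and it never sees the wrapper, so it cannot forget to free it.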
Re: [PATCH v2 2/3] Btrfs: rescan for qgroups
On Tue, April 16, 2013 at 14:22 (+0200), Wang Shilong wrote:
> Hello Jan,
>
> more comments below..
>
> [...snip..]
>
>> +
>> +static long btrfs_ioctl_quota_rescan_status(struct file *file, void __user *arg)
>> +{
>> +	struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
>> +	struct btrfs_ioctl_quota_rescan_args *qsa;
>> +	int ret = 0;
>> +
>> +	if (!capable(CAP_SYS_ADMIN))
>> +		return -EPERM;
>> +
>> +	qsa = kzalloc(sizeof(*qsa), GFP_NOFS);
>> +	if (!qsa)
>> +		return -ENOMEM;
>> +
>
> Here, i think we should hold qgroup_rescan_lock and group_lock:
> 1 qgroup_rescan protect BTRFS_QGROUP_STATUS_RESCAN
> 2 quota disabling may happen this time..so group_lock should also be held here.

It's just a status call for user space, I don't really care about exact
synchronization here. *If* we wanted to do that, I would have moved the code
into qgroup.c, because all the code that requires any qgroup locks is there. But
I'd really want to keep it simple. You cannot get completely garbage information
that way, you could only race with someone just starting off or finishing a
rescan operation. I don't think that really matters in the end.

>> +	if (root->fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
>> +		qsa->flags = 1;
>> +		qsa->progress = root->fs_info->qgroup_rescan_progress.objectid;
>> +	}
>> +
>> +	if (copy_to_user(arg, qsa, sizeof(*qsa)))
>> +		ret = -EFAULT;
>> +
>> +	kfree(qsa);
>> +	return ret;
>> +}
>> +
>
> [...snip...]
>
>> +
>> +/*
>> + * returns < 0 on error, 0 when more leafs are to be scanned.
>> + * returns 1 when done, 2 when done and FLAG_INCONSISTENT was cleared.
>> + */
>> +static int
>> +qgroup_rescan_leaf(struct qgroup_rescan *qscan, struct btrfs_path *path,
>> +		   struct btrfs_trans_handle *trans, struct ulist *tmp,
>> +		   struct extent_buffer *scratch_leaf)
>> +{
>> +	struct btrfs_key found;
>> +	struct btrfs_fs_info *fs_info = qscan->fs_info;
>> +	struct ulist *roots = NULL;
>> +	struct ulist_node *unode;
>> +	struct ulist_iterator uiter;
>> +	struct seq_list tree_mod_seq_elem = {};
>> +	u64 seq;
>> +	int slot;
>> +	int ret;
>> +
>> +	path->leave_spinning = 1;
>> +	mutex_lock(&fs_info->qgroup_rescan_lock);
>
> Here in qgroup_rescan_leaf(), we don't need hold group_rescan_lock.
> Because qgroup_rescan_lock is used to protect qgroup_flag, in
> group_rescan_leaf(). we don't change qgroup_flag.. So we don't need hold the
> group_rescan_lock.
>
> Maybe we can just remove the lock qgroup_rescan_lock, and i think what
> qgroup_rscan_lock does that qgroup_lock can replace.

No, we cannot. We need the mutex for the following tree search and to tie it to
the following update of the qgroup_rescan_progress. In fact, that's the only
reason I introduced it, but I don't want to hold a spin lock for a whole tree
search.

If we do not make sure the search operation and the progress update happen under
the same lock, we can end up with a tree block being found by thread A, then
thread B checks the rescan_progress, then thread A updates the rescan_progress
according to the found block and does the rescan. That would result in wrong
tracking information.

>> +	ret = btrfs_search_slot_for_read(fs_info->extent_root,
>> +					 &fs_info->qgroup_rescan_progress,
>> +					 path, 1, 0);
>> +
>> +	pr_debug("current progress key (%llu %u %llu), search_slot ret %d\n",
>> +		 (unsigned long long)fs_info->qgroup_rescan_progress.objectid,
>> +		 fs_info->qgroup_rescan_progress.type,
>> +		 (unsigned long long)fs_info->qgroup_rescan_progress.offset,
>> +		 ret);
>> +
>> +	if (ret) {
>> +		/*
>> +		 * The rescan is about to end, we will not be scanning any
>> +		 * further blocks. We cannot unset the RESCAN flag here, because
>> +		 * we want to commit the transaction if everything went well.
>> +		 * To make the live accounting work in this phase, we set our
>> +		 * scan progress pointer such that every real extent objectid
>> +		 * will be smaller.
>> +		 */
>> +		fs_info->qgroup_rescan_progress.objectid = (u64)-1;
>> +		btrfs_release_path(path);
>> +		mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +		return ret;
>> +	}
>> +
>> +	btrfs_item_key_to_cpu(path->nodes[0], &found,
>> +			      btrfs_header_nritems(path->nodes[0]) - 1);
>> +	fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
>> +
>> +	btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
>> +	memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
>> +	slot = path->slots[0];
>> +	btrfs_release_path(path);
>> +	mutex_unlock(&fs_info->qgroup_rescan_lock);
>> +
>> +	for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
>> +		btrfs_item_key_to_cpu(scratch_leaf, &found, slot);
>> +		if (found.type != BTRFS_EXTENT_ITEM_KEY)
>> +			continue;
>> +		ret =
Re: [PATCH] Btrfs: return error when we specify wrong start
On Tue, April 16, 2013 at 10:40 (+0200), Liu Bo wrote:
> We need such a sanity check for wrong start, otherwise, even with
> a wrong start that's larger than file size, we can end up not only
> changing inode's force compress flag but also FS's incompat flags.

That reads very cryptic. Can you please add something hinting at defrag to the
title or at least the description?

Thanks,
-Jan
[PATCH v2 0/3] Btrfs: quota rescan for 3.10
The kernel side for rescan, which is needed if you want to enable qgroup
tracking on a non-empty volume. The first patch splits
btrfs_qgroup_account_ref into readable and reusable units. The second
patch adds the rescan implementation (refer to its commit message for a
description of the algorithm). The third patch starts an automatic
rescan when qgroups are enabled. It is only separated to potentially
help bisecting things in case of a problem.

The required user space patch was sent at 2013-04-05, subject
[PATCH] Btrfs-progs: quota rescan.

--
Changes v1->v2:
- fix calculation of the exclusive field for qgroups in level != 0
- split btrfs_qgroup_account_ref
- take into account that mutex_unlock might schedule
- fix kzalloc error checking
- add some reserved ints to struct btrfs_ioctl_quota_rescan_args
- changed modification to unused #define BTRFS_QUOTA_CTL_RESCAN
- added missing (unsigned long long) casts for pr_debug
- more detailed commit messages

Jan Schmidt (3):
  Btrfs: split btrfs_qgroup_account_ref into four functions
  Btrfs: rescan for qgroups
  Btrfs: automatic rescan after quota enable command

 fs/btrfs/ctree.h           |   17 +-
 fs/btrfs/disk-io.c         |    6 +
 fs/btrfs/ioctl.c           |   83 ++--
 fs/btrfs/qgroup.c          |  517 +++-
 include/uapi/linux/btrfs.h |   12 +-
 5 files changed, 509 insertions(+), 126 deletions(-)
[PATCH v2 2/3] Btrfs: rescan for qgroups
If qgroup tracking is out of sync, a rescan operation can be started. It
iterates the complete extent tree and recalculates all qgroup tracking data.
This is an expensive operation and should not be used unless required.

A filesystem under rescan can still be umounted. The rescan continues on the
next mount. Status information is provided with a separate ioctl while a
rescan operation is in progress.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h           |   17 ++-
 fs/btrfs/disk-io.c         |    6 +
 fs/btrfs/ioctl.c           |   83 ++--
 fs/btrfs/qgroup.c          |  295 +--
 include/uapi/linux/btrfs.h |   12 ++-
 5 files changed, 378 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0d82922..bd4e2a7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1019,9 +1019,9 @@ struct btrfs_block_group_item {
  */
 #define BTRFS_QGROUP_STATUS_FLAG_ON		(1ULL << 0)
 /*
- * SCANNING is set during the initialization phase
+ * RESCAN is set during the initialization phase
  */
-#define BTRFS_QGROUP_STATUS_FLAG_SCANNING	(1ULL << 1)
+#define BTRFS_QGROUP_STATUS_FLAG_RESCAN		(1ULL << 1)
 /*
  * Some qgroup entries are known to be out of date,
  * either because the configuration has changed in a way that
@@ -1050,7 +1050,7 @@ struct btrfs_qgroup_status_item {
 	 * only used during scanning to record the progress
 	 * of the scan. It contains a logical address
 	 */
-	__le64 scan;
+	__le64 rescan;
 } __attribute__ ((__packed__));

 struct btrfs_qgroup_info_item {
@@ -1587,6 +1587,11 @@ struct btrfs_fs_info {
 	/* used by btrfs_qgroup_record_ref for an efficient tree traversal */
 	u64 qgroup_seq;

+	/* qgroup rescan items */
+	struct mutex qgroup_rescan_lock; /* protects the progress item */
+	struct btrfs_key qgroup_rescan_progress;
+	struct btrfs_workers qgroup_rescan_workers;
+
 	/* filesystem state */
 	unsigned long fs_state;

@@ -2864,8 +2869,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item,
 		   version, 64);
 BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
 		   flags, 64);
-BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item,
-		   scan, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
+		   rescan, 64);

 /* btrfs_qgroup_info_item */
 BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
@@ -3784,7 +3789,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 		       struct btrfs_fs_info *fs_info);
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
 			struct btrfs_fs_info *fs_info);
-int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
 			      struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6d19a0a..60d15fe 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2192,6 +2192,7 @@ int open_ctree(struct super_block *sb,
 	fs_info->qgroup_seq = 1;
 	fs_info->quota_enabled = 0;
 	fs_info->pending_quota_state = 0;
+	mutex_init(&fs_info->qgroup_rescan_lock);

 	btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
 	btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
@@ -2394,6 +2395,8 @@ int open_ctree(struct super_block *sb,
 	btrfs_init_workers(&fs_info->readahead_workers, "readahead",
 			   fs_info->thread_pool_size,
 			   &fs_info->generic_worker);
+	btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
+			   &fs_info->generic_worker);

 	/*
 	 * endios are largely parallel and should have a very
@@ -2428,6 +2431,7 @@ int open_ctree(struct super_block *sb,
 	ret |= btrfs_start_workers(&fs_info->caching_workers);
 	ret |= btrfs_start_workers(&fs_info->readahead_workers);
 	ret |= btrfs_start_workers(&fs_info->flush_workers);
+	ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
 	if (ret) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
@@ -2773,6 +2777,7 @@ fail_sb_buffer:
 	btrfs_stop_workers(&fs_info->delayed_workers);
 	btrfs_stop_workers(&fs_info->caching_workers);
 	btrfs_stop_workers(&fs_info->flush_workers);
+	btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
 fail_alloc:
 fail_iput:
 	btrfs_mapping_tree_free(&fs_info->mapping_tree);
@@ -3463,6 +3468,7 @@ int close_ctree(struct btrfs_root *root)
 	btrfs_stop_workers(&fs_info->caching_workers);
 	btrfs_stop_workers(&fs_info
[PATCH v2 3/3] Btrfs: automatic rescan after quota enable command
When qgroup tracking is enabled, we do an automatic cycle of the new rescan
mechanism.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index bb081b5..0ea2c3e 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1356,10 +1356,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_root *quota_root = fs_info->quota_root;
 	int ret = 0;
+	int start_rescan_worker = 0;

 	if (!quota_root)
 		goto out;

+	if (!fs_info->quota_enabled && fs_info->pending_quota_state)
+		start_rescan_worker = 1;
+
 	fs_info->quota_enabled = fs_info->pending_quota_state;

 	spin_lock(&fs_info->qgroup_lock);
@@ -1385,6 +1389,12 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 	if (ret)
 		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;

+	if (start_rescan_worker) {
+		ret = btrfs_qgroup_rescan(fs_info);
+		if (ret)
+			pr_err("btrfs: start rescan quota failed: %d\n", ret);
+	}
+
 out:

 	return ret;
--
1.7.1
[PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions
The function is separated into a preparation part and the three accounting
steps mentioned in the qgroups documentation. The goal is to make steps two
and three usable by the rescan functionality. A side effect is that the
function is restructured into readable subunits.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |  212 ++---
 1 files changed, 121 insertions(+), 91 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index b44124d..c38a0c5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1075,6 +1075,122 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
 	return 0;
 }

+static void qgroup_account_ref_step1(struct btrfs_fs_info *fs_info,
+				     struct ulist *roots, struct ulist *tmp,
+				     u64 seq)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct ulist_node *tmp_unode;
+	struct ulist_iterator tmp_uiter;
+	struct btrfs_qgroup *qg;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(roots, &uiter))) {
+		qg = find_qgroup_rb(fs_info, unode->val);
+		if (!qg)
+			continue;
+
+		ulist_reinit(tmp);
+		/* XXX id not needed */
+		ulist_add(tmp, qg->qgroupid, (u64)(uintptr_t)qg, GFP_ATOMIC);
+		ULIST_ITER_INIT(&tmp_uiter);
+		while ((tmp_unode = ulist_next(tmp, &tmp_uiter))) {
+			struct btrfs_qgroup_list *glist;
+
+			qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode->aux;
+			if (qg->refcnt < seq)
+				qg->refcnt = seq + 1;
+			else
+				++qg->refcnt;
+
+			list_for_each_entry(glist, &qg->groups, next_group) {
+				ulist_add(tmp, glist->group->qgroupid,
+					  (u64)(uintptr_t)glist->group,
+					  GFP_ATOMIC);
+			}
+		}
+	}
+}
+
+static void qgroup_account_ref_step2(struct btrfs_fs_info *fs_info,
+				     struct ulist *roots, struct ulist *tmp,
+				     u64 seq, int sgn, u64 num_bytes,
+				     struct btrfs_qgroup *qgroup)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct btrfs_qgroup *qg;
+	struct btrfs_qgroup_list *glist;
+
+	ulist_reinit(tmp);
+	ulist_add(tmp, qgroup->qgroupid, (uintptr_t)qgroup, GFP_ATOMIC);
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(tmp, &uiter))) {
+
+		qg = (struct btrfs_qgroup *)(uintptr_t)unode->aux;
+		if (qg->refcnt < seq) {
+			/* not visited by step 1 */
+			qg->rfer += sgn * num_bytes;
+			qg->rfer_cmpr += sgn * num_bytes;
+			if (roots->nnodes == 0) {
+				qg->excl += sgn * num_bytes;
+				qg->excl_cmpr += sgn * num_bytes;
+			}
+			qgroup_dirty(fs_info, qg);
+		}
+		WARN_ON(qg->tag >= seq);
+		qg->tag = seq;
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ulist_add(tmp, glist->group->qgroupid,
+				  (uintptr_t)glist->group, GFP_ATOMIC);
+		}
+	}
+}
+
+static void qgroup_account_ref_step3(struct btrfs_fs_info *fs_info,
+				     struct ulist *roots, struct ulist *tmp,
+				     u64 seq, int sgn, u64 num_bytes)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct btrfs_qgroup *qg;
+	struct ulist_node *tmp_unode;
+	struct ulist_iterator tmp_uiter;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(roots, &uiter))) {
+		qg = find_qgroup_rb(fs_info, unode->val);
+		if (!qg)
+			continue;
+
+		ulist_reinit(tmp);
+		ulist_add(tmp, qg->qgroupid, (uintptr_t)qg, GFP_ATOMIC);
+		ULIST_ITER_INIT(&tmp_uiter);
+		while ((tmp_unode = ulist_next(tmp, &tmp_uiter))) {
+			struct btrfs_qgroup_list *glist;
+
+			qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode->aux;
+			if (qg->tag == seq)
+				continue;
+
+			if (qg->refcnt - seq == roots->nnodes) {
+				qg->excl -= sgn * num_bytes;
+				qg->excl_cmpr -= sgn * num_bytes;
+				qgroup_dirty(fs_info, qg
Re: [PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions
On Tue, April 16, 2013 at 11:20 (+0200), Wang Shilong wrote:
> Hello Jan,
>
>> The function is separated into a preparation part and the three accounting
>> steps mentioned in the qgroups documentation. The goal is to make steps two
>> and three usable by the rescan functionality. A side effect is that the
>> function is restructured into readable subunits.
>
> How about renaming the three functions like:
>
> 1 qgroup_walk_old_roots()
> 2 qgroup_walk_new_root()
> 3 qgroup_rewalk_old_root()
>
> I'd like this function to be meaningful, but not just step1,2,3.
> Maybe you can think out better function name.

I'd like to keep it like 1, 2, 3, because that matches the documentation in the
qgroup pdf and the code has always been documented in those three steps.

Thanks,
-Jan
Re: [PATCH v2 2/3] Btrfs: rescan for qgroups
On Tue, April 16, 2013 at 11:26 (+0200), Wang Shilong wrote:
> Hello, Jan
>
>> If qgroup tracking is out of sync, a rescan operation can be started. It
>> iterates the complete extent tree and recalculates all qgroup tracking data.
>> This is an expensive operation and should not be used unless required.
>>
>> A filesystem under rescan can still be umounted. The rescan continues on the
>> next mount. Status information is provided with a separate ioctl while a
>> rescan operation is in progress.
>>
>> Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
>
> [..snip..]
Re: [PATCH v2 2/3] Btrfs: rescan for qgroups
On Tue, April 16, 2013 at 12:08 (+0200), Wang Shilong wrote:
> Hello Jan,
>
>>  	slot = path->slots[0];
>>  	ptr = btrfs_item_ptr(l, slot, struct btrfs_qgroup_status_item);
>> +	spin_lock(&fs_info->qgroup_lock);
>
> Why we need hold qgroup_lock here? would you please explain...

It would have been easier for me if you had left the relevant context in there,
but I finally found it.

Thinking again about it, as update_qgroup_status_item is only called from
transaction commit context, we can do without a spinlock here. I meant to
protect fs_info->qgroup_flags and fs_info->qgroup_rescan_progress, but it seems
not required.

Thanks,
-Jan
Re: [PATCH 1/2] Btrfs: rescan for qgroups
On Mon, April 15, 2013 at 08:08 (+0200), Wang Shilong wrote:
> Hello Jan,
>
>> On Mon, April 15, 2013 at 07:44 (+0200), Jan Schmidt wrote:
>>> Thanks, v2 to come.
>>
>> Uh, but not immediately. I didn't get tracking of exclusive right. That will
>> need some time to fix and test.
>
> 'exclusive' adds the complexity of btrfs qgroup. So if you send V2. I'd like
> you add more lines in changelog.

Yes, the commit message will be longer as you requested previously. This does
not include a complete description on how exclusive works. The qgroup pdf
explains that.

> Besides, i have a question in my mind.(I have not seen you code)..
>
> When qgroup rescan will happen?
>
> 1 when quota is enabled

That's what the second patch does, yes. Your patches should be merged in a way
that we first create the level 0 qgroups for all subvolumes and then start the
rescan, obviously.

> 2 if a new qgroup relations is created, rescan should happen?

With your patches, there will be no subvolume qgroups missing. For the higher
level groups, one needs expert knowledge anyway. I think it's best to leave that
decision to the administrator configuring those qgroups.

> 2 user call qgroup rescan..

Of course, yes.

-Jan
Re: [PATCH 1/2] Btrfs: rescan for qgroups
On Mon, April 15, 2013 at 07:44 (+0200), Jan Schmidt wrote:
> Thanks, v2 to come.

Uh, but not immediately. I didn't get tracking of exclusive right. That will
need some time to fix and test.

-Jan
[PATCH 2/3] Btrfs: fix accessing the root pointer in tree mod log functions
The tree mod log functions were accessing root->node->... directly, without use
of btrfs_root_node() or explicit rcu locking. This could lead to an extent
buffer reference being leaked and another reference being freed too early when
preemption was enabled.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/ctree.c |   38 +++++++++++++++++++-------------------
 1 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4439cb7..0260795 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1068,11 +1068,11 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
  */
 static struct tree_mod_elem *
 __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
-			   struct btrfs_root *root, u64 time_seq)
+			   struct extent_buffer *eb_root, u64 time_seq)
 {
 	struct tree_mod_elem *tm;
 	struct tree_mod_elem *found = NULL;
-	u64 root_logical = root->node->start;
+	u64 root_logical = eb_root->start;
 	int looped = 0;
 
 	if (!time_seq)
@@ -1106,7 +1106,6 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
 
 		found = tm;
 		root_logical = tm->old_root.logical;
-		BUG_ON(root_logical == root->node->start);
 		looped = 1;
 	}
 
@@ -1245,29 +1244,30 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
 {
 	struct tree_mod_elem *tm;
 	struct extent_buffer *eb;
+	struct extent_buffer *eb_root;
 	struct extent_buffer *old;
 	struct tree_mod_root *old_root = NULL;
 	u64 old_generation = 0;
 	u64 logical;
 	u32 blocksize;
 
-	eb = btrfs_read_lock_root_node(root);
-	tm = __tree_mod_log_oldest_root(root->fs_info, root, time_seq);
+	eb_root = btrfs_read_lock_root_node(root);
+	tm = __tree_mod_log_oldest_root(root->fs_info, eb_root, time_seq);
 	if (!tm)
-		return root->node;
+		return eb_root;
 
 	if (tm->op == MOD_LOG_ROOT_REPLACE) {
 		old_root = &tm->old_root;
 		old_generation = tm->generation;
 		logical = old_root->logical;
 	} else {
-		logical = root->node->start;
+		logical = eb_root->start;
 	}
 
 	tm = tree_mod_log_search(root->fs_info, logical, time_seq);
 	if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
-		btrfs_tree_read_unlock(root->node);
-		free_extent_buffer(root->node);
+		btrfs_tree_read_unlock(eb_root);
+		free_extent_buffer(eb_root);
 		blocksize = btrfs_level_size(root, old_root->level);
 		old = read_tree_block(root, logical, blocksize, 0);
 		if (!old) {
@@ -1279,13 +1279,13 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
 			free_extent_buffer(old);
 		}
 	} else if (old_root) {
-		btrfs_tree_read_unlock(root->node);
-		free_extent_buffer(root->node);
+		btrfs_tree_read_unlock(eb_root);
+		free_extent_buffer(eb_root);
 		eb = alloc_dummy_extent_buffer(logical, root->nodesize);
 	} else {
-		eb = btrfs_clone_extent_buffer(root->node);
-		btrfs_tree_read_unlock(root->node);
-		free_extent_buffer(root->node);
+		eb = btrfs_clone_extent_buffer(eb_root);
+		btrfs_tree_read_unlock(eb_root);
+		free_extent_buffer(eb_root);
 	}
 
 	if (!eb)
@@ -1295,7 +1295,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
 	if (old_root) {
 		btrfs_set_header_bytenr(eb, eb->start);
 		btrfs_set_header_backref_rev(eb, BTRFS_MIXED_BACKREF_REV);
-		btrfs_set_header_owner(eb, root->root_key.objectid);
+		btrfs_set_header_owner(eb, btrfs_header_owner(eb_root));
 		btrfs_set_header_level(eb, old_root->level);
 		btrfs_set_header_generation(eb, old_generation);
 	}
@@ -1312,15 +1312,15 @@ int btrfs_old_root_level(struct btrfs_root *root, u64 time_seq)
 {
 	struct tree_mod_elem *tm;
 	int level;
+	struct extent_buffer *eb_root = btrfs_root_node(root);
 
-	tm = __tree_mod_log_oldest_root(root->fs_info, root, time_seq);
+	tm = __tree_mod_log_oldest_root(root->fs_info, eb_root, time_seq);
 	if (tm && tm->op == MOD_LOG_ROOT_REPLACE) {
 		level = tm->old_root.level;
 	} else {
-		rcu_read_lock();
-		level = btrfs_header_level(root->node);
-		rcu_read_unlock();
+		level = btrfs_header_level(eb_root);
 	}
+	free_extent_buffer(eb_root);
 
 	return level;
 }
-- 
1.7.1
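The reference imbalance described in this commit message can be shown outside the kernel. What follows is only a minimal userspace sketch, not btrfs code: struct eb, struct root, root_node_get() and eb_put() are hypothetical stand-ins for struct extent_buffer, struct btrfs_root, btrfs_root_node() and free_extent_buffer(), and the swap_to argument merely simulates a concurrent writer replacing the root node between the two accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct extent_buffer: start address plus refcount. */
struct eb {
	unsigned long long start;
	int refs;
};

/* Hypothetical stand-in for struct btrfs_root; node may be replaced by a writer. */
struct root {
	struct eb *node;
};

/* Analogue of btrfs_root_node(): take a reference on the current root node. */
static struct eb *root_node_get(struct root *r)
{
	struct eb *eb = r->node;

	eb->refs++;
	return eb;
}

/* Analogue of free_extent_buffer(): drop one reference. */
static void eb_put(struct eb *eb)
{
	eb->refs--;
}

/*
 * Old pattern: the reference is dropped through r->node, which is read a
 * second time. If the root was replaced in between (simulated by swap_to),
 * the put hits a different buffer than the get.
 */
static void use_root_buggy(struct root *r, struct eb *swap_to)
{
	struct eb *eb = root_node_get(r);

	(void)eb;
	if (swap_to)
		r->node = swap_to;	/* simulated concurrent root replacement */
	eb_put(r->node);		/* may drop a reference we never took */
}

/* Pattern used by the patch: remember the pointer once and only use that. */
static void use_root_fixed(struct root *r, struct eb *swap_to)
{
	struct eb *eb_root = root_node_get(r);

	if (swap_to)
		r->node = swap_to;	/* simulated concurrent root replacement */
	eb_put(eb_root);		/* get and put are always paired */
}
```

With the old pattern the first buffer ends up with one reference too many and the replacement with one too few; with the local eb_root pointer both counts stay balanced.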
[PATCH 0/3] Btrfs: patches for tree mod log for next rc
These three fixes for the tree mod log should go into the next rc pull request
to mainline. I suggest adding them to stable as well, as we did with the
previous tree mod log patch.

The first patch fixes logging of root split operations. The second one fixes
access to the root within the tree mod log functions, which weren't correctly
honoring rcu locking. The third is a fix correcting the order of free and
unlock.

With the snapshot aware defrag patches we added in 3.9, these fixes are also
important for users not using qgroups.

Concerning qgroups, there is at least one more issue to be solved: the qgroup
code's expectations of how the tree mod log increases its sequence numbers
don't fit with what the tree mod log code is actually doing. The estimated size
of that fix is larger than what should go into the rc commits or into stable,
so that one will be coming for 3.10.

With these three patches applied, I've been running my tests for more than a
day without any issues, while without, it takes only minutes to trigger a BUG_ON
or WARN_ON.

Jan Schmidt (3):
  Btrfs: fix tree mod log regression on root split operations
  Btrfs: fix accessing the root pointer in tree mod log functions
  Btrfs: fix unlock after free on rewinded tree blocks

 fs/btrfs/ctree.c |  111
 1 files changed, 59 insertions(+), 52 deletions(-)
[PATCH 1/3] Btrfs: fix tree mod log regression on root split operations
Commit d9abbf1c changed tree mod log locking around ROOT_REPLACE operations.
When a tree root is split, however, we were logging removal of all elements
from the root node before logging removal of half of the elements for the
split operation. This leads to a BUG_ON when rewinding. This commit removes
the erroneous logging of removal of all elements.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/ctree.c |   55
 1 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index ca9d8f1..4439cb7 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -643,7 +643,8 @@ __tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
 static noinline int
 tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
 			 struct extent_buffer *old_root,
-			 struct extent_buffer *new_root, gfp_t flags)
+			 struct extent_buffer *new_root, gfp_t flags,
+			 int log_removal)
 {
 	struct tree_mod_elem *tm;
 	int ret;
@@ -651,7 +652,8 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
 	if (tree_mod_dont_log(fs_info, NULL))
 		return 0;
 
-	__tree_mod_log_free_eb(fs_info, old_root);
+	if (log_removal)
+		__tree_mod_log_free_eb(fs_info, old_root);
 
 	ret = tree_mod_alloc(fs_info, flags, &tm);
 	if (ret < 0)
@@ -738,7 +740,7 @@ tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq)
 static noinline void
 tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
 		     struct extent_buffer *src, unsigned long dst_offset,
-		     unsigned long src_offset, int nr_items, int log_removal)
+		     unsigned long src_offset, int nr_items)
 {
 	int ret;
 	int i;
@@ -752,12 +754,10 @@ tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
 	}
 
 	for (i = 0; i < nr_items; i++) {
-		if (log_removal) {
-			ret = tree_mod_log_insert_key_locked(fs_info, src,
-							     i + src_offset,
-							     MOD_LOG_KEY_REMOVE);
-			BUG_ON(ret < 0);
-		}
+		ret = tree_mod_log_insert_key_locked(fs_info, src,
+						     i + src_offset,
+						     MOD_LOG_KEY_REMOVE);
+		BUG_ON(ret < 0);
 		ret = tree_mod_log_insert_key_locked(fs_info, dst,
 						     i + dst_offset,
 						     MOD_LOG_KEY_ADD);
@@ -802,11 +802,12 @@ tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
 
 static noinline void
 tree_mod_log_set_root_pointer(struct btrfs_root *root,
-			      struct extent_buffer *new_root_node)
+			      struct extent_buffer *new_root_node,
+			      int log_removal)
 {
 	int ret;
 	ret = tree_mod_log_insert_root(root->fs_info, root->node,
-				       new_root_node, GFP_NOFS);
+				       new_root_node, GFP_NOFS, log_removal);
 	BUG_ON(ret < 0);
 }
 
@@ -1028,7 +1029,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 			parent_start = 0;
 
 		extent_buffer_get(cow);
-		tree_mod_log_set_root_pointer(root, cow);
+		tree_mod_log_set_root_pointer(root, cow, 1);
 		rcu_assign_pointer(root->node, cow);
 
 		btrfs_free_tree_block(trans, root, buf, parent_start,
@@ -1754,7 +1755,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 			goto enospc;
 		}
 
-		tree_mod_log_set_root_pointer(root, child);
+		tree_mod_log_set_root_pointer(root, child, 1);
 		rcu_assign_pointer(root->node, child);
 
 		add_root_to_dirty_list(root);
@@ -2998,7 +2999,7 @@ static int push_node_left(struct btrfs_trans_handle *trans,
 		push_items = min(src_nritems - 8, push_items);
 
 	tree_mod_log_eb_copy(root->fs_info, dst, src, dst_nritems, 0,
-			     push_items, 1);
+			     push_items);
 	copy_extent_buffer(dst, src,
 			   btrfs_node_key_ptr_offset(dst_nritems),
 			   btrfs_node_key_ptr_offset(0),
@@ -3069,7 +3070,7 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
 				      sizeof(struct btrfs_key_ptr));
 
 	tree_mod_log_eb_copy(root->fs_info, dst, src, 0,
-			     src_nritems - push_items
Re: [PATCH 2/2] Btrfs: introduce noextiref mount option
On Fri, April 12, 2013 at 06:13 (+0200), Miao Xie wrote:
> On Thu, 11 Apr 2013 16:29:48 +0200, Jan Schmidt wrote:
>> On Thu, April 11, 2013 at 12:35 (+0200), Miao Xie wrote:
>>> Now, we set incompat flag EXTEND_IREF when we actually need insert a
>>> extend inode reference, not when making a fs. But some users may hope
>>> that the fs still can be mounted on the old kernel, and don't hope we
>>> insert any extend inode references. So we introduce noextiref mount
>>> option to close this function.
>>
>> That's a much better approach compared to setting the flag on mkfs, I agree.
>>
>>> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
>>> Cc: Mark Fasheh <mfas...@suse.de>
>>> ---
>>>  fs/btrfs/ctree.h      |  1 +
>>>  fs/btrfs/disk-io.c    |  9 +++++++++
>>>  fs/btrfs/inode-item.c |  2 +-
>>>  fs/btrfs/super.c      | 41 ++++++++++++++++++++++++++++++++++++++++-
>>>  4 files changed, 51 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index a883e47..db88963 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -1911,6 +1911,7 @@ struct btrfs_ioctl_defrag_range_args {
>>>  #define BTRFS_MOUNT_CHECK_INTEGRITY	(1 << 20)
>>>  #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
>>>  #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
>>> +#define BTRFS_MOUNT_NOEXTIREF		(1 << 23)
>>>  
>>>  #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
>>>  #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index ab8ef37..ee00448 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -2269,6 +2269,15 @@ int open_ctree(struct super_block *sb,
>>>  		goto fail_alloc;
>>>  	}
>>>  
>>> +	if ((btrfs_super_incompat_flags(disk_super) &
>>> +	     BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF) &&
>>> +	    btrfs_test_opt(tree_root, NOEXTIREF)) {
>>> +		printk(KERN_ERR "BTRFS: couldn't mount because the extend iref "
>>> +		       "can not be close.\n");
>>> +		err = -EINVAL;
>>> +		goto fail_alloc;
>>> +	}
>>> +
>>>  	if (btrfs_super_leafsize(disk_super) !=
>>>  	    btrfs_super_nodesize(disk_super)) {
>>>  		printk(KERN_ERR "BTRFS: couldn't mount because metadata "
>>> diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
>>> index f07eb45..7c4f880 100644
>>> --- a/fs/btrfs/inode-item.c
>>> +++ b/fs/btrfs/inode-item.c
>>> @@ -442,7 +442,7 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans,
>>>  out:
>>>  	btrfs_free_path(path);
>>>  
>>> -	if (ret == -EMLINK) {
>>> +	if (ret == -EMLINK && !btrfs_test_opt(root, NOEXTIREF)) {
>>>  		/*
>>>  		 * We ran out of space in the ref array. Need to add an
>>>  		 * extended ref.
>>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>>> index 0f03569..fd375b3 100644
>>> --- a/fs/btrfs/super.c
>>> +++ b/fs/btrfs/super.c
>>> @@ -315,7 +315,7 @@ enum {
>>>  	Opt_nodatacow, Opt_max_inline, Opt_alloc_start, Opt_nobarrier, Opt_ssd,
>>>  	Opt_nossd, Opt_ssd_spread, Opt_thread_pool, Opt_noacl, Opt_compress,
>>>  	Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
>>> -	Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
>>> +	Opt_notreelog, Opt_noextiref, Opt_ratio, Opt_flushoncommit, Opt_discard,
>>>  	Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
>>>  	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
>>>  	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
>>> @@ -344,6 +344,7 @@ static match_table_t tokens = {
>>>  	{Opt_nossd, "nossd"},
>>>  	{Opt_noacl, "noacl"},
>>>  	{Opt_notreelog, "notreelog"},
>>> +	{Opt_noextiref, "noextiref"},
>>>  	{Opt_flushoncommit, "flushoncommit"},
>>>  	{Opt_ratio, "metadata_ratio=%d"},
>>>  	{Opt_discard, "discard"},
>>> @@ -535,6 +536,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>>>  			printk(KERN_INFO "btrfs: disabling tree log\n");
>>>  			btrfs_set_opt(info->mount_opt, NOTREELOG);
>>>  			break;
>>> +		case Opt_noextiref:
>>> +			printk(KERN_INFO "btrfs: disabling extend inode ref\n");
>>> +			btrfs_set_opt(info->mount_opt, NOEXTIREF);
>>> +			break;
>>>  		case Opt_flushoncommit:
>>>  			printk(KERN_INFO "btrfs: turning on flush-on-commit\n");
>>>  			btrfs_set_opt(info->mount_opt, FLUSHONCOMMIT);
>>> @@ -1202,6 +1207,35 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
>>>  	       new_pool_size);
>>>  }
>>>  
>>> +static int btrfs_close_extend_iref(struct btrfs_fs_info *fs_info,
>>> +				   unsigned long old_opts)
>>
>> The name irritated me, it's more like unset instead of close, isn't it?
>
> Maybe btrfs_set_no_extend_iref() is better, the other developers might think
> we will clear BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF.

I think we should use the exact name of the mount option, so btrfs_set_noextiref
is probably least ambiguous. Or even btrfs_set_mntflag_noextiref.

>>> +{
>>> +	struct btrfs_trans_handle *trans;
>>> +	int ret;
>>> +
>>> +	if (btrfs_raw_test_opt(old_opts, NOEXTIREF
Re: [PATCH] Btrfs: add a rb_tree to improve performance of ulist search
> ;/* auxiliary value saved along with the val */
> +	struct rb_node rb_node;	/* used to speed up search */
>  };
>  
>  struct ulist {
> @@ -54,6 +58,8 @@ struct ulist {
>  	 */
>  	struct ulist_node *nodes;
>  
> +	struct rb_root root;
> +
>  	/*
>  	 * inline storage space for the first ULIST_SIZE entries
>  	 */

Makes a lot of sense. Thanks!

Reviewed-by: Jan Schmidt <list.bt...@jan-o-sch.net>
Re: [PATCH] Btrfs-progs: don't set INCOMPAT_EXTENDED_IREF flag when making a new fs
On Thu, April 11, 2013 at 12:28 (+0200), Miao Xie wrote:
> There is no extended irefs in the new fs, and we can mount it on the old
> kernel without extended iref function safely. So we needn't set
> INCOMPAT_EXTENDED_IREF flag when making a new fs, and just set it when we
> actually insert a extended iref.
>
> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
> Cc: Mark Fasheh <mfas...@suse.de>
> ---
>  mkfs.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/mkfs.c b/mkfs.c
> index c8cb395..aca6e46 100644
> --- a/mkfs.c
> +++ b/mkfs.c
> @@ -1654,8 +1654,6 @@ raid_groups:
>  
>  	super = root->fs_info->super_copy;
>  	flags = btrfs_super_incompat_flags(super);
> -	flags |= BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF;
> -
>  	if (mixed)
>  		flags |= BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS;

This one should have a large

*** do not apply until kernel patches from "[PATCH 0/2] do not open the extend
*** inode reference at the beginning" have been merged.

tag. Otherwise, extended irefs are disabled entirely for all new file systems
in environments where they have been working so far.

-Jan
Re: [PATCH 2/2] Btrfs: introduce noextiref mount option
On Thu, April 11, 2013 at 12:35 (+0200), Miao Xie wrote:
> Now, we set incompat flag EXTEND_IREF when we actually need insert a
> extend inode reference, not when making a fs. But some users may hope
> that the fs still can be mounted on the old kernel, and don't hope we
> insert any extend inode references. So we introduce noextiref mount
> option to close this function.

That's a much better approach compared to setting the flag on mkfs, I agree.

> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
> Cc: Mark Fasheh <mfas...@suse.de>
> ---
>  fs/btrfs/ctree.h      |  1 +
>  fs/btrfs/disk-io.c    |  9 +++++++++
>  fs/btrfs/inode-item.c |  2 +-
>  fs/btrfs/super.c      | 41 ++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 51 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index a883e47..db88963 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1911,6 +1911,7 @@ struct btrfs_ioctl_defrag_range_args {
>  #define BTRFS_MOUNT_CHECK_INTEGRITY	(1 << 20)
>  #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
>  #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR	(1 << 22)
> +#define BTRFS_MOUNT_NOEXTIREF		(1 << 23)
>  
>  #define btrfs_clear_opt(o, opt)		((o) &= ~BTRFS_MOUNT_##opt)
>  #define btrfs_set_opt(o, opt)		((o) |= BTRFS_MOUNT_##opt)
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index ab8ef37..ee00448 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2269,6 +2269,15 @@ int open_ctree(struct super_block *sb,
>  		goto fail_alloc;
>  	}
>  
> +	if ((btrfs_super_incompat_flags(disk_super) &
> +	     BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF) &&
> +	    btrfs_test_opt(tree_root, NOEXTIREF)) {
> +		printk(KERN_ERR "BTRFS: couldn't mount because the extend iref "
> +		       "can not be close.\n");
> +		err = -EINVAL;
> +		goto fail_alloc;
> +	}
> +
>  	if (btrfs_super_leafsize(disk_super) !=
>  	    btrfs_super_nodesize(disk_super)) {
>  		printk(KERN_ERR "BTRFS: couldn't mount because metadata "
> diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
> index f07eb45..7c4f880 100644
> --- a/fs/btrfs/inode-item.c
> +++ b/fs/btrfs/inode-item.c
> @@ -442,7 +442,7 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans,
>  out:
>  	btrfs_free_path(path);
>  
> -	if (ret == -EMLINK) {
> +	if (ret == -EMLINK && !btrfs_test_opt(root, NOEXTIREF)) {
>  		/*
>  		 * We ran out of space in the ref array. Need to add an
>  		 * extended ref.
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 0f03569..fd375b3 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -315,7 +315,7 @@ enum {
>  	Opt_nodatacow, Opt_max_inline, Opt_alloc_start, Opt_nobarrier, Opt_ssd,
>  	Opt_nossd, Opt_ssd_spread, Opt_thread_pool, Opt_noacl, Opt_compress,
>  	Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
> -	Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
> +	Opt_notreelog, Opt_noextiref, Opt_ratio, Opt_flushoncommit, Opt_discard,
>  	Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
>  	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
>  	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
> @@ -344,6 +344,7 @@ static match_table_t tokens = {
>  	{Opt_nossd, "nossd"},
>  	{Opt_noacl, "noacl"},
>  	{Opt_notreelog, "notreelog"},
> +	{Opt_noextiref, "noextiref"},
>  	{Opt_flushoncommit, "flushoncommit"},
>  	{Opt_ratio, "metadata_ratio=%d"},
>  	{Opt_discard, "discard"},
> @@ -535,6 +536,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>  			printk(KERN_INFO "btrfs: disabling tree log\n");
>  			btrfs_set_opt(info->mount_opt, NOTREELOG);
>  			break;
> +		case Opt_noextiref:
> +			printk(KERN_INFO "btrfs: disabling extend inode ref\n");
> +			btrfs_set_opt(info->mount_opt, NOEXTIREF);
> +			break;
>  		case Opt_flushoncommit:
>  			printk(KERN_INFO "btrfs: turning on flush-on-commit\n");
>  			btrfs_set_opt(info->mount_opt, FLUSHONCOMMIT);
> @@ -1202,6 +1207,35 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
>  	       new_pool_size);
>  }
>  
> +static int btrfs_close_extend_iref(struct btrfs_fs_info *fs_info,
> +				   unsigned long old_opts)

The name irritated me, it's more like unset instead of close, isn't it?

> +{
> +	struct btrfs_trans_handle *trans;
> +	int ret;
> +
> +	if (btrfs_raw_test_opt(old_opts, NOEXTIREF) ||
> +	    !btrfs_raw_test_opt(fs_info->mount_opt, NOEXTIREF))
> +		return 0;
> +
> +	trans = btrfs_attach_transaction(fs_info->tree_root);
> +	if (IS_ERR(trans)) {
> +		if (PTR_ERR(trans) != -ENOENT)
> +			return PTR_ERR(trans);
> +	} else {
> +
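For readers following the option handling in this thread, here is a small userspace sketch of the two checks the patch adds. It is an illustration under assumptions, not the kernel code: the macro and function names (set_opt, test_opt, check_mount, may_use_extended_ref) are made up for the example, the bit position 23 is taken from the patch, and the value used for the incompat flag is assumed for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MOUNT_NOEXTIREF		(1UL << 23)	/* bit position as in the patch */
#define INCOMPAT_EXTENDED_IREF	(1ULL << 6)	/* assumed value, illustration only */

/* Same shape as btrfs_set_opt/btrfs_clear_opt in the patch context. */
#define set_opt(o, opt)		((o) |= MOUNT_##opt)
#define clear_opt(o, opt)	((o) &= ~MOUNT_##opt)
#define test_opt(o, opt)	(((o) & MOUNT_##opt) != 0)

/*
 * Mount-time check: a filesystem that already carries extended irefs
 * cannot be mounted with noextiref.
 */
static int check_mount(uint64_t incompat_flags, unsigned long mount_opt)
{
	if ((incompat_flags & INCOMPAT_EXTENDED_IREF) &&
	    test_opt(mount_opt, NOEXTIREF))
		return -EINVAL;
	return 0;
}

/*
 * Insert-time check: fall back to an extended ref on -EMLINK only when
 * noextiref is not set; otherwise the -EMLINK is passed on to the caller.
 */
static int may_use_extended_ref(int ret, unsigned long mount_opt)
{
	return ret == -EMLINK && !test_opt(mount_opt, NOEXTIREF);
}
```

The point of keeping both checks is that the option only promises that no new extended refs get created; it cannot undo ones that are already on disk, hence the refusal at mount time.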
Re: kernel BUG at fs/btrfs/ctree.c:1144!
On Wed, April 10, 2013 at 09:58 (+0200), Ahmet Inan wrote:
> I got this problem since 3.8.5 + for-linus (from that time).
> Have just tried 3.8.6 + for-linus with git merge -X theirs btrfs/for-linus
> but still same problem.
>
> Going back to 3.7.4 + for-linus (from that time) doesn't give me the problem.

The stack you attached shows a function added in the snapshot aware defrag
patches (commit 38c227d8), added in 3.8. The real problem, however, is not
caused by that commit but by a tree mod log bug.

I expect that fs/btrfs/ctree.c:1144 is this BUG_ON in your kernel from
__tree_mod_log_rewind (my line numbers don't match):

1138		switch (tm->op) {
1139		case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
1140			BUG_ON(tm->slot < n);

I've got a fix for that I'm currently testing, expect it on the list soon.

> This is an production nfs server with 2x2TB raid1, so cant reboot it that often.
>
> Have seen this same problem on another system (also raid1) once, but
> rebooting helped, no problems since.
>
> Both systems use autodefrag, maybe that sometimes triggers it?
>
> I really would like to help, so i can stay on the latest kernels.
> What should i do?

For the meantime I recommend to not defrag your filesystem.

As a general remark, please send your stack traces inline, not as attachment if
possible.

Thanks,
-Jan
Lockdep warning on for-linus branch (umount vs. evict_inode)
I was running fsstress to trigger a tree mod log problem on a current kernel
with some custom debug patches applied, so if anyone looking at this needs any
line numbers let me know:

<4>[ 1221.749586] [ INFO: possible circular locking dependency detected ]
<4>[ 1221.749589] 3.8.0+ #9 Not tainted
<4>[ 1221.749590] -------------------------------------------------------
<4>[ 1221.749591] fsstress/3108 is trying to acquire lock:
<4>[ 1221.749592]  (sb_internal){.+.+..}, at: [a0183cde] start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749614]
<4>[ 1221.749614] but task is already holding lock:
<4>[ 1221.749616]  (&fs_info->ordered_operations_mutex){+.+...}, at: [a019c089] btrfs_wait_ordered_extents+0x49/0x270 [btrfs]
<4>[ 1221.749632]
<4>[ 1221.749632] which lock already depends on the new lock.
<4>[ 1221.749632]
<4>[ 1221.749634]
<4>[ 1221.749634] the existing dependency chain (in reverse order) is:
<4>[ 1221.749635]
<4>[ 1221.749635] -> #1 (&fs_info->ordered_operations_mutex){+.+...}:
<4>[ 1221.749638]        [810f1f73] lock_acquire+0x93/0x130
<4>[ 1221.749643]        [819b5fff] __mutex_lock_common+0x5f/0x4a0
<4>[ 1221.749647]        [819b6575] mutex_lock_nested+0x45/0x50
<4>[ 1221.749650]        [a019b935] btrfs_run_ordered_operations+0x55/0x2e0 [btrfs]
<4>[ 1221.749663]        [a0182866] btrfs_commit_transaction+0x76/0xd40 [btrfs]
<4>[ 1221.749675]        [a017c3a7] btrfs_commit_super+0x67/0x130 [btrfs]
<4>[ 1221.749687]        [a017daea] close_ctree+0x34a/0x3a0 [btrfs]
<4>[ 1221.749699]        [a014fe49] btrfs_put_super+0x19/0x20 [btrfs]
<4>[ 1221.749707]        [811bed62] generic_shutdown_super+0x62/0xf0
<4>[ 1221.749710]        [811bee86] kill_anon_super+0x16/0x30
<4>[ 1221.749712]        [a015396a] btrfs_kill_super+0x1a/0x90 [btrfs]
<4>[ 1221.749720]        [811bf3a5] deactivate_locked_super+0x45/0x70
<4>[ 1221.749722]        [811c02aa] deactivate_super+0x4a/0x70
<4>[ 1221.749725]        [811dbe72] mntput_no_expire+0xd2/0x130
<4>[ 1221.749728]        [811dcb6e] sys_umount+0x7e/0x3b0
<4>[ 1221.749730]        [819c0c82] system_call_fastpath+0x16/0x1b
<4>[ 1221.749734]
<4>[ 1221.749734] -> #0 (sb_internal){.+.+..}:
<4>[ 1221.749736]        [810f1e03] __lock_acquire+0x1713/0x17f0
<4>[ 1221.749739]        [810f1f73] lock_acquire+0x93/0x130
<4>[ 1221.749741]        [811be82f] __sb_start_write+0x13f/0x230
<4>[ 1221.749745]        [a0183cde] start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749757]        [a0183fc7] btrfs_join_transaction+0x17/0x20 [btrfs]
<4>[ 1221.749770]        [a01d6bf0] btrfs_commit_inode_delayed_inode+0x60/0x150 [btrfs]
<4>[ 1221.749784]        [a018a240] btrfs_evict_inode+0x140/0x350 [btrfs]
<4>[ 1221.749798]        [811d6df7] evict+0xa7/0x1a0
<4>[ 1221.749801]        [811d7008] iput+0x118/0x1a0
<4>[ 1221.749803]        [a019c286] btrfs_wait_ordered_extents+0x246/0x270 [btrfs]
<4>[ 1221.749817]        [a0151897] btrfs_sync_fs+0x47/0x110 [btrfs]
<4>[ 1221.749825]        [811ecaa0] sync_fs_one_sb+0x20/0x30
<4>[ 1221.749828]        [811c07f6] iterate_supers+0xb6/0xf0
<4>[ 1221.749831]        [811ecf85] sys_sync+0x55/0x90
<4>[ 1221.749833]        [819c0c82] system_call_fastpath+0x16/0x1b
<4>[ 1221.749836]
<4>[ 1221.749836] other info that might help us debug this:
<4>[ 1221.749836]
<4>[ 1221.749837]  Possible unsafe locking scenario:
<4>[ 1221.749837]
<4>[ 1221.749839]        CPU0                    CPU1
<4>[ 1221.749840]        ----                    ----
<4>[ 1221.749841]   lock(&fs_info->ordered_operations_mutex);
<4>[ 1221.749843]                                lock(sb_internal);
<4>[ 1221.749845]                                lock(&fs_info->ordered_operations_mutex);
<4>[ 1221.749846]   lock(sb_internal);
<4>[ 1221.749848]
<4>[ 1221.749848]  *** DEADLOCK ***
<4>[ 1221.749848]
<4>[ 1221.749851] 2 locks held by fsstress/3108:
<4>[ 1221.749852]  #0:  (&type->s_umount_key#22){+.}, at: [811c07e0] iterate_supers+0xa0/0xf0
<4>[ 1221.749857]  #1:  (&fs_info->ordered_operations_mutex){+.+...}, at: [a019c089] btrfs_wait_ordered_extents+0x49/0x270 [btrfs]
<4>[ 1221.749873]
<4>[ 1221.749873] stack backtrace:
<4>[ 1221.749875] Pid: 3108, comm: fsstress Not tainted 3.8.0+ #9
<4>[ 1221.749876] Call Trace:
<4>[ 1221.749880]  [810ef2be] print_circular_bug+0x20e/0x2f0
<4>[ 1221.749883]  [810f1e03] __lock_acquire+0x1713/0x17f0
<4>[ 1221.749896]  [a0183cde] ? start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749898]  [810f1f73] lock_acquire+0x93/0x130
<4>[ 1221.749911]  [a0183cde] ? start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749914]  [8116b60d] ? find_get_pages_tag+0x2d/0x1d0
<4>[ 1221.749918]  [811be82f] __sb_start_write+0x13f/0x230
<4>[ 1221.749930]  [a0183cde] ? start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749943]  [a0183cde] ? start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749946]
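The report above comes down to two lock-order edges: the umount path takes ordered_operations_mutex while sb_internal is held, and the sync path takes sb_internal while ordered_operations_mutex is held. The following toy sketch shows why the second edge closes a cycle. It is not how lockdep is implemented; dep[][], reachable() and record_acquire() are hypothetical names, and the two enum values just label the locks from the trace.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCKS 8

/* Labels for the two lock classes named in the report. */
enum { SB_INTERNAL = 0, ORDERED_OPERATIONS_MUTEX = 1 };

/* dep[a][b] != 0 means: lock b was acquired while lock a was held. */
static int dep[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: is there a recorded path from 'from' to 'to'? */
static int reachable(int from, int to, int *seen)
{
	if (from == to)
		return 1;
	seen[from] = 1;
	for (int next = 0; next < MAX_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return 1;
	return 0;
}

/*
 * Record the edge held -> acquired. Returns 1 if a path
 * acquired -> ... -> held already existed, i.e. if the new edge
 * creates a circular dependency, and 0 otherwise.
 */
static int record_acquire(int held, int acquired)
{
	int seen[MAX_LOCKS];
	int cycle;

	memset(seen, 0, sizeof(seen));
	cycle = reachable(acquired, held, seen);
	dep[held][acquired] = 1;
	return cycle;
}
```

Recording the umount edge first and the sync edge second makes the second call report a cycle, which matches the order in which the dependency chain is listed (#1 already existing, #0 being the new acquisition).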
Re: btrfs-progs: re-add send-test
On 06.04.2013 20:30, Eric Sandeen wrote:
> From: Mark Fasheh <mfas...@suse.de>
>
> btrfs-progs: re-add send-test
>
> send-test.c links against libbtrfs and uses the send functionality provided
> to decode and print a send stream to the console.

This looks pretty much like fardump from Arne's far repository:

git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git

The stream generated by btrfs send is generated in a way to be easily
receivable by any other filesystem (destination fs' limitations apply). Thus, we
came up with the term "Filesystem Agnostic Replication".

In my opinion, one of the next steps would be getting the logic used in btrfs
receive into a generic far lib, which itself would link against libbtrfs to take
the btrfs part from there. So, btrfs receive on the command line would become a
stub calling something in the yet to create far lib, which itself would call
hooks back to libbtrfs. I have no plan ready how to build btrfs progs if we're
doing such a shift, that would have to be sorted out.

My goal is to get everything that could be shared with other filesystems out of
btrfs-progs into a far lib, and the send-test sent here definitely falls into
that category; plus: it already exists outside btrfs-progs, unless I'm missing
something.

Thanks!
-Jan

[snip]