Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)

2013-11-06 Thread Jan Schmidt
 
On Mon, November 04, 2013 at 18:42 (+0100), Josef Bacik wrote:
 On Thu, Oct 24, 2013 at 03:22:06PM +0200, Jan Schmidt wrote:
 btrfs_dec_ref() queued a delayed ref for the owner of a tree block. The qgroup
 tracking is based on delayed refs. The owner of a tree block is set when a
 tree block is allocated; it is never updated.

 When you allocate a tree block and then remove the subvolume that did the
 allocation, the qgroup accounting for that removal is correct. However, the
 removal was accounted again for each subvolume deletion that also referenced
 the tree block, because accounting was erroneously based on the owner.

 Instead of queueing delayed refs for the non-existent owner, we now
 queue delayed refs for the root being removed. This fixes the qgroup
 accounting.

 Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
 Tested-by: dustym...@gmail.com
 
 This breaks btrfs/003, I'm kicking it out.

Can you be a bit more specific? Works fine here.

-Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)

2013-11-01 Thread Jan Schmidt
(cc Arne)

On Thu, October 24, 2013 at 16:49 (+0200), Wang Shilong wrote:
 Hello Jan,
 
 btrfs_dec_ref() queued a delayed ref for the owner of a tree block. The qgroup
 tracking is based on delayed refs. The owner of a tree block is set when a
 tree block is allocated; it is never updated.

 When you allocate a tree block and then remove the subvolume that did the
 allocation, the qgroup accounting for that removal is correct. However, the
 removal was accounted again for each subvolume deletion that also referenced
 the tree block, because accounting was erroneously based on the owner.

 Instead of queueing delayed refs for the non-existent owner, we now
 queue delayed refs for the root being removed. This fixes the qgroup
 accounting.
 
 Thanks for tracking this. I applied your patch and, using the following script,
 found that the problem still exists:
 
 #!/bin/sh
 
 for i in $(seq 1000)
 do
   dd if=/dev/zero of=mnt/${i}aaa bs=10K count=1
 done
 
 btrfs sub snapshot mnt mnt/1
 for i in $(seq 100)
 do
   btrfs sub snapshot mnt/$i mnt/$(($i+1))
 done
 
 for i in $(seq 101)
 do
   btrfs sub delete mnt/$i
 done

I've understood the problem this reproducer creates. In fact, you can shorten it
dramatically. The story of qgroups is going to turn awkward at this point.

mkfs and enable quota, put some data in (needs a level 2 tree)
- this accounts rfer and excl for qgroup 5

take a snapshot
- this creates qgroup 257, which gets rfer(257) = rfer(5) and excl(257) = 0,
excl(5) = 0.

now make sure you don't cow anything (our extensive tests always did cow
something), just drop the newly created snapshot.
- excl(5) ought to become what it was before the snapshot, and there's no code
for this. This is because there is no code that brings rfer(257) to zero: the
data extents are not touched, because the tree blocks of 5 and 257 are shared.

Drop tree does not go down the whole tree: when it finds a tree block with
refcnt > 1, it just decrements it and is done. This is very efficient, but it is
bad for the qgroup numbers.
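The shortcut can be illustrated with a user-space toy model (hypothetical names, not btrfs code). It is simplified to a snapshot root node whose two leaves are still shared with the original subvolume, i.e. the "snapshot, no cow, drop" case; the level 2 tree the reproducer needs is not modelled. Because the walk stops at every block with refs > 1, none of the leaves holding the file extent items is visited, so no accounting for the data extents is ever triggered:

```c
#include <assert.h>

/* Hypothetical toy tree block: a reference count and up to four children.
 * Leaves (nchildren == 0) stand for the blocks holding file extent items. */
struct toy_node {
	int refs;
	int nchildren;
	struct toy_node *children[4];
};

/* Drop one reference to n, mimicking the shortcut: a shared block
 * (refs > 1) is only decremented and never descended into.
 * Returns how many leaves were actually walked. */
static int toy_drop(struct toy_node *n)
{
	int leaves = 0;

	if (n->refs > 1) {
		n->refs--;
		return 0;
	}
	assert(n->refs == 1);
	n->refs = 0;
	if (n->nchildren == 0)
		return 1;
	for (int i = 0; i < n->nchildren; i++)
		leaves += toy_drop(n->children[i]);
	return leaves;
}

struct toy_drop_result {
	int leaves_walked;
	int leaf_refs_after;
};

/* Snapshot without any cow: the snapshot has its own root node, but both
 * leaves are shared with the original subvolume (refs == 2). With
 * shared == 0 the same leaves belong to the dropped tree alone. */
static struct toy_drop_result toy_drop_snapshot(int shared)
{
	struct toy_node leaf_a = { .refs = shared ? 2 : 1 };
	struct toy_node leaf_b = { .refs = shared ? 2 : 1 };
	struct toy_node snap_root = { .refs = 1, .nchildren = 2,
				      .children = { &leaf_a, &leaf_b } };
	struct toy_drop_result res;

	res.leaves_walked = toy_drop(&snap_root);
	res.leaf_refs_after = leaf_a.refs;
	return res;
}
```

With shared == 1 no leaf is walked and the leaves just drop back to one reference; with shared == 0 the same walk visits both leaves.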

We have got three possible solutions in mind:

A: Always walk down the whole tree for quota-enabled fs tree drops. Can be done
with the read-ahead code, but is potentially a whole lot of work for large file
systems.

B: Use tracking qgroups, as already required for several operations on higher
level qgroups, for the level 0 qgroups as well. They could be created automatically and
track the correct numbers just in case a snapshot is deleted. The problem with
that approach is that it does not scale for a large number of subvolumes, as you
need to track each possible combination of all subvolumes (exponential costs).

C: Make sure all your metadata is cowed before dropping a subvolume. This is
explicitly doing what solution A would do implicitly, but can theoretically be
done by the user. I don't consider C a practical solution.
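To put a number on the cost of solution B: assuming one tracking qgroup per non-empty combination of subvolumes (an assumption based on "each possible combination"), n subvolumes need 2^n - 1 of them. A tiny helper, for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Tracking qgroups needed if every non-empty combination of n subvolumes
 * gets one (an assumption for illustration): 2^n - 1. */
static uint64_t toy_tracking_qgroups(unsigned int n)
{
	assert(n < 64);  /* keep the shift within uint64_t */
	return (UINT64_C(1) << n) - 1;
}
```

Already n = 40 gives 1099511627775; for the 102 subvolumes of Wang's reproducer (mnt plus 101 snapshots) the count does not fit into 64 bits at all, which is why the helper asserts n < 64.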

Sigh.
-Jan


Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)

2013-11-01 Thread Jan Schmidt
Hi Josef,

please consider this patch for btrfs-next and for the following merge window
(3.13). The fact that there's another problem concerning qgroups as discussed in
the rest of this thread doesn't make this patch any less correct.
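Why the patch is correct on its own can be shown with a toy model of the accounting error described in the patch below. This is a user-space sketch with hypothetical names and a flat per-root byte counter, not the real qgroup code (which works on delayed refs): one tree block is allocated by root 0, shared by snapshots 1..n, and its owner is never updated. Charging every dropped reference to the owner, as btrfs_dec_ref() did via btrfs_header_owner(), subtracts the block from root 0 once per snapshot deletion; charging it to the root being removed, as the patch does with root->objectid, leaves root 0 alone:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ROOTS 8

/* Hypothetical toy: one tree block shared by every root. Its owner is
 * fixed at allocation time, like btrfs_header_owner(). */
struct toy_block {
	int owner;
	int64_t size;
};

/* Per-root byte counters standing in for qgroup numbers. */
struct toy_fs {
	int64_t rfer[TOY_MAX_ROOTS];
};

/* Charge the dropped reference either to the block's owner (behaviour
 * before the patch) or to the root being removed (after the patch). */
static void toy_drop_ref(struct toy_fs *fs, const struct toy_block *b,
			 int removed_root, int charge_owner)
{
	int ref_root = charge_owner ? b->owner : removed_root;

	assert(ref_root >= 0 && ref_root < TOY_MAX_ROOTS);
	fs->rfer[ref_root] -= b->size;
}

/* Root 0 allocated the block; roots 1..nsnaps are snapshots sharing it.
 * Remove every snapshot and return root 0's counter afterwards. */
static int64_t toy_delete_snapshots(int nsnaps, int64_t size, int charge_owner)
{
	struct toy_fs fs = { { 0 } };
	struct toy_block blk = { .owner = 0, .size = size };

	assert(nsnaps < TOY_MAX_ROOTS);
	for (int r = 0; r <= nsnaps; r++)
		fs.rfer[r] = size;
	for (int s = 1; s <= nsnaps; s++)
		toy_drop_ref(&fs, &blk, s, charge_owner);
	return fs.rfer[0];
}
```

With three snapshots of a 16 KiB block, the owner-based variant drives root 0 to -32768 bytes, the kind of negative value the subject line refers to, while the patched variant keeps it at 16384.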

Thanks,
-Jan

On Thu, October 24, 2013 at 15:22 (+0200), Jan Schmidt wrote:
 btrfs_dec_ref() queued a delayed ref for the owner of a tree block. The qgroup
 tracking is based on delayed refs. The owner of a tree block is set when a
 tree block is allocated; it is never updated.
 
 When you allocate a tree block and then remove the subvolume that did the
 allocation, the qgroup accounting for that removal is correct. However, the
 removal was accounted again for each subvolume deletion that also referenced
 the tree block, because accounting was erroneously based on the owner.
 
 Instead of queueing delayed refs for the non-existent owner, we now
 queue delayed refs for the root being removed. This fixes the qgroup
 accounting.
 
 Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
 Tested-by: dustym...@gmail.com
 ---
  fs/btrfs/extent-tree.c |   14 +-
  1 files changed, 9 insertions(+), 5 deletions(-)
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index d58bef1..7846cae 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -3004,12 +3004,11 @@ out:
  static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  struct extent_buffer *buf,
 -int full_backref, int inc, int for_cow)
 +int full_backref, u64 ref_root, int inc, int for_cow)
  {
   u64 bytenr;
   u64 num_bytes;
   u64 parent;
 - u64 ref_root;
   u32 nritems;
   struct btrfs_key key;
   struct btrfs_file_extent_item *fi;
 @@ -3019,7 +3018,6 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
   int (*process_func)(struct btrfs_trans_handle *, struct btrfs_root *,
   u64, u64, u64, u64, u64, u64, int);
  
 - ref_root = btrfs_header_owner(buf);
   nritems = btrfs_header_nritems(buf);
   level = btrfs_header_level(buf);
  
 @@ -3075,13 +3073,19 @@ fail:
  int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 struct extent_buffer *buf, int full_backref, int for_cow)
  {
 - return __btrfs_mod_ref(trans, root, buf, full_backref, 1, for_cow);
 + u64 ref_root;
 +
 + ref_root = btrfs_header_owner(buf);
 +
 + return __btrfs_mod_ref(trans, root, buf, full_backref, ref_root,
 +1, for_cow);
  }
  
  int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 struct extent_buffer *buf, int full_backref, int for_cow)
  {
 - return __btrfs_mod_ref(trans, root, buf, full_backref, 0, for_cow);
 + return __btrfs_mod_ref(trans, root, buf, full_backref, root->objectid,
 +0, for_cow);
  }
  
  static int write_one_cache_group(struct btrfs_trans_handle *trans,
 


Re: btrfs send/receive do not keep inode ctimes

2013-11-01 Thread Jan Schmidt
Hi Karl,

On Fri, October 25, 2013 at 15:12 (+0200), Karl Kiniger wrote:
 is there low level support to change inode ctimes somehow?
 (on ext[234] it can be done using debugfs)

No.

 It would be nice to make received snapshots as similar as
 possible to their send source. (I am not talking about
 uuids and such, just ls -lc output)

This is not planned. Currently, we do not even preserve the inode number. Can
you briefly explain your use case? Why do you need to keep the ctime?

Thanks,
-Jan


Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)

2013-10-24 Thread Jan Schmidt
On Thu, October 24, 2013 at 16:49 (+0200), Wang Shilong wrote:
 Hello Jan,
 
 btrfs_dec_ref() queued a delayed ref for the owner of a tree block. The qgroup
 tracking is based on delayed refs. The owner of a tree block is set when a
 tree block is allocated; it is never updated.

 When you allocate a tree block and then remove the subvolume that did the
 allocation, the qgroup accounting for that removal is correct. However, the
 removal was accounted again for each subvolume deletion that also referenced
 the tree block, because accounting was erroneously based on the owner.

 Instead of queueing delayed refs for the non-existent owner, we now
 queue delayed refs for the root being removed. This fixes the qgroup
 accounting.
 
 Thanks for tracking this. I applied your patch and, using the following script,
 found that the problem still exists:

Reproduced. It gives more negative numbers due to accounting triggered by the
cleaner thread; that's the common part here. I still believe that the fix I sent
is correct, but it's probably not complete. Looking into it.

Thanks,
-Jan

 #!/bin/sh
 
 for i in $(seq 1000)
 do
   dd if=/dev/zero of=mnt/${i}aaa bs=10K count=1
 done
 
 btrfs sub snapshot mnt mnt/1
 for i in $(seq 100)
 do
   btrfs sub snapshot mnt/$i mnt/$(($i+1))
 done
 
 for i in $(seq 101)
 do
   btrfs sub delete mnt/$i
 done
 
 
 Thanks,
 Wang


Re: [PATCH] Btrfs: Don't allocate inode that is already in use

2013-10-16 Thread Jan Schmidt
On Tue, October 15, 2013 at 22:41 (+0200), Zach Brown wrote:
 Probably a bit too obscure to turn this into an xfstest? At least nobody
 complained so far, and this reproducer takes me 1m57 to run, so nothing I want
 in each xfstest cycle.
 
 I disagree.  The entire point of regression tests is to trigger bugs
 that the usual processes failed to find, like this one.
 
 If you think that two minutes is too long for a test to run then mark it
 as stress (is that the xfstests group for boring long running tests?)
 or take the time to make a tighter test.
 
 Don't just skip regression testing.  Please.

You are mixing up my points. The first argument you're quoting is not against
regression testing in this case, and it deserves the "stress" answer, I agree.

You don't quote my second argument, which is not "just skip regression testing".
I'll try again in other words: A regression test only makes sense if it can
prevent us from making the same mistake again. As far as I see, the reproducer
script is so specific, that the only thing it can prevent is an exact revert of
Stefan's patch. If you argue that we should have a test for just this, fair
enough, then we could use exactly Stefan's script. I don't think that gains us
anything. We're not normally reverting bugfix patches deliberately, especially
not for very short patches with very long descriptions.

I'd very much like to see a more generic test to avoid similar regressions, if
that can be created. I don't have a good plan for how to trigger such a situation
(i.e. know which inodes are on the free_inode_pinned list) in a more general way.

-Jan


Re: [PATCH] Btrfs: use right root when checking for hash collision

2013-10-16 Thread Jan Schmidt
On Wed, October 09, 2013 at 18:26 (+0200), Josef Bacik wrote:
 btrfs_rename was using the root of the old dir instead of the root of the new
 dir when checking for a hash collision, so if you tried to move a file into a
 subvol it would freak out because it would see the file you are trying to move
 in its current root.  This fixes the bug where this would fail
 
 btrfs subvol create test1
 btrfs subvol create test2
 mv test1 test2.
 
 Thanks to Chris Murphy for catching this,
 
 Reported-by: Chris Murphy li...@colorremedies.com
 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  fs/btrfs/inode.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 1d7ef37..d468246 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -7993,7 +7993,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
  
  
   /* check for collisions, even if the  name isn't there */
 - ret = btrfs_check_dir_item_collision(root, new_dir->i_ino,
 + ret = btrfs_check_dir_item_collision(dest, new_dir->i_ino,
    new_dentry->d_name.name,
    new_dentry->d_name.len);

Looks correct.

I claim that better variable names would have avoided this bug. The code
uses old_dir / new_dir, old_dentry / new_dentry, old_inode / new_inode. So, while
you're at it: how about renaming the variables to old_root / new_root instead of
keeping root / dest?
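To illustrate the naming suggestion and the bug it would have prevented, here is a user-space toy model (hypothetical names; a subvolume tree reduced to a list of (directory inode, name) pairs). The top-level directory and the root directory of every subvolume have inode number 256, so looking up (256, "test1") in the old root finds the very entry being moved, while the destination subvolume test2 is empty:

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8

/* Hypothetical toy: a root (subvolume tree) reduced to a list of
 * (directory inode number, entry name) pairs. */
struct toy_root {
	int n;
	unsigned long dir[TOY_MAX_ENTRIES];
	const char *name[TOY_MAX_ENTRIES];
};

/* Does root already have an entry called name in directory dir? */
static int toy_dir_has(const struct toy_root *root, unsigned long dir,
		       const char *name)
{
	for (int i = 0; i < root->n; i++)
		if (root->dir[i] == dir && strcmp(root->name[i], name) == 0)
			return 1;
	return 0;
}

/* Collision check for a rename: the new entry will live in new_root,
 * so that is the tree to search. Passing old_root here is the bug. */
static int toy_rename_collides(const struct toy_root *new_root,
			       unsigned long new_dir_ino, const char *new_name)
{
	return toy_dir_has(new_root, new_dir_ino, new_name);
}

/* "mv test1 test2" at the top level; use_old_root reproduces the bug. */
static int toy_mv_test1_into_test2(int use_old_root)
{
	struct toy_root old_root = { .n = 2, .dir = { 256, 256 },
				     .name = { "test1", "test2" } };
	struct toy_root new_root = { .n = 0 };  /* test2 is empty */
	const struct toy_root *checked = use_old_root ? &old_root : &new_root;

	return toy_rename_collides(checked, 256, "test1");
}
```

With use_old_root set, the check reports a collision for a target directory that is actually empty; with the destination root it correctly reports none.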

- Jan


Re: [PATCH] Btrfs: Don't allocate inode that is already in use

2013-10-15 Thread Jan Schmidt
On Tue, October 15, 2013 at 20:08 (+0200), Stefan Behrens wrote:
 Due to an off-by-one error, it is possible to reproduce a bug
 when the inode cache is used.
 
 The same inode number is assigned twice, the second time this
 leads to an EEXIST in btrfs_insert_empty_items().
 
 The issue can happen when a file is removed right after a subvolume
 is created and then a new inode number is created before the
 inodes in free_inode_pinned are processed.
 unlink() calls btrfs_return_ino() which calls start_caching() in this
 case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
 searching for the highest inode (which already cannot find the
 unlinked one anymore in btrfs_find_free_objectid()). So if this
 unlinked inode's number is equal to the highest_ino + 1 (or >= this value
 instead of > this value, which was the off-by-one error), we mustn't add
 the inode number to free_ino_pinned (caching_thread() does it right).
 In this case we need to try directly to add the number to the inode_cache
 which will fail in this case.
 
 When this inode number is allocated while it is still in free_ino_pinned,
 it is allocated and still added to the free inode cache when the
 pinned inodes are processed, thus one of the following inode number
 allocations will get an inode that is already in use and fail with EEXIST
 in btrfs_insert_empty_items().
 
 One example which was created with the reproducer below:
 Create a snapshot, work in the newly created snapshot for the rest.
 In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
 start_caching() calls add_free_space [34284, 18446744073709517077].
 In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
 mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
 btrfs_unpin_free_ino calls add_free_space [34284, 1].
 mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
 EEXIST when the new inode is inserted.
 
 One possible reproducer is this one:
  #!/bin/sh
  # preparation
 TEST_DEV=/dev/sdc1
 TEST_MNT=/mnt
 umount ${TEST_MNT} 2>/dev/null || true
 mkfs.btrfs -f ${TEST_DEV}
 mount ${TEST_DEV} ${TEST_MNT} -o \
  rw,relatime,compress=lzo,space_cache,inode_cache
 btrfs subv create ${TEST_MNT}/s1
 for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
 btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
 FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
 rm ${TEST_MNT}/s2/$FILENAME
 touch ${TEST_MNT}/s2/$FILENAME
  # the following steps can be repeated to reproduce the issue again and again
 [ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
 btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
 rm ${TEST_MNT}/s3/$FILENAME
 touch ${TEST_MNT}/s3/$FILENAME
 ls -alFi ${TEST_MNT}/s?/$FILENAME
 touch ${TEST_MNT}/s3/_1 || logger FAILED
 ls -alFi ${TEST_MNT}/s?/_1
 touch ${TEST_MNT}/s3/_2 || logger FAILED
 ls -alFi ${TEST_MNT}/s?/_2
 touch ${TEST_MNT}/s3/__1 || logger FAILED
 ls -alFi ${TEST_MNT}/s?/__1
 touch ${TEST_MNT}/s3/__2 || logger FAILED
 ls -alFi ${TEST_MNT}/s?/__2
  # if the above is not enough, add the following loop:
 for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
  #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
  # one of the touch(1) calls in s3 fail due to EEXIST because the inode is
  # already in use that btrfs_find_ino_for_alloc() returns.

Probably a bit too obscure to turn this into an xfstest? At least nobody
complained so far, and this reproducer takes me 1m57 to run, so nothing I want
in each xfstest cycle.

If we ever introduce a similar problem, this reproducer probably won't find it
(at least if it's really dependent on the exact number of files and the exact
inode number), unless we're effectively reversing this patch. So, in my opinion,
a regression test would be of no real use here; I'm okay with just fixing it.

 Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
 ---
  fs/btrfs/inode-map.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
 index 014de49..ec08004 100644
 --- a/fs/btrfs/inode-map.c
 +++ b/fs/btrfs/inode-map.c
 @@ -237,7 +237,7 @@ again:
   start_caching(root);
  
   if (objectid <= root->cache_progress ||
 - objectid > root->highest_objectid)
 + objectid >= root->highest_objectid)
   __btrfs_add_free_space(ctl, objectid, 1);
   else
   __btrfs_add_free_space(pinned, objectid, 1);
 

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net

... although this is not the most beautiful commit message I've ever seen ;-)

-Jan
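The off-by-one can be replayed on toy numbers. In the user-space sketch below (hypothetical names; two bitmaps instead of the real free-space cache; cache_progress assumed to be 0), inode 41 plays the role of 34284 from the commit message: it is the unlinked number, start_caching() has already marked [41, end) free, and highest_objectid is still 41. With the original > comparison the number is additionally pinned and therefore handed out twice; with the patched >= it is not:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_INO_RANGE 64  /* toy inode numbers 0..63 */

/* Hypothetical toy inode cache: numbers free for allocation now, and
 * numbers pinned until the next commit. */
struct toy_ino_cache {
	bool free_ino[TOY_INO_RANGE];
	bool pinned[TOY_INO_RANGE];
};

/* The branch in btrfs_return_ino(); fixed selects the patched >=
 * instead of the original > for the highest_objectid comparison. */
static void toy_return_ino(struct toy_ino_cache *c, uint64_t objectid,
			   uint64_t cache_progress, uint64_t highest_objectid,
			   bool fixed)
{
	bool direct = objectid <= cache_progress ||
		      (fixed ? objectid >= highest_objectid
			     : objectid > highest_objectid);

	assert(objectid < TOY_INO_RANGE);
	if (direct)
		c->free_ino[objectid] = true;  /* no change if already free */
	else
		c->pinned[objectid] = true;
}

/* Hand out the lowest free number, or -1 if there is none. */
static int toy_alloc_ino(struct toy_ino_cache *c)
{
	for (int i = 0; i < TOY_INO_RANGE; i++) {
		if (c->free_ino[i]) {
			c->free_ino[i] = false;
			return i;
		}
	}
	return -1;
}

/* Commit time: pinned numbers become free (like btrfs_unpin_free_ino()). */
static void toy_unpin(struct toy_ino_cache *c)
{
	for (int i = 0; i < TOY_INO_RANGE; i++) {
		if (c->pinned[i]) {
			c->pinned[i] = false;
			c->free_ino[i] = true;
		}
	}
}

/* Replays the commit message's sequence; returns true if the same inode
 * number is handed out twice. */
static bool toy_double_alloc(bool fixed)
{
	struct toy_ino_cache c = { { false }, { false } };

	/* unlink(41): start_caching() adds [highest_ino + 1, end) = [41, 64) */
	for (int i = 41; i < TOY_INO_RANGE; i++)
		c.free_ino[i] = true;
	/* ...and 41 is returned while highest_objectid is still 41 */
	toy_return_ino(&c, 41, 0, 41, fixed);

	int first = toy_alloc_ino(&c);   /* mkdir() gets 41 */
	toy_unpin(&c);                   /* a pinned 41 becomes free again */
	int second = toy_alloc_ino(&c);  /* next mkdir() */
	return first == second;
}
```

In the patched variant the direct add changes nothing because 41 is already free (the real add fails, as the commit message says), nothing is pinned, and the second allocation gets 42.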


Re: [PATCH] xfstests: btrfs/011 improvement for compressed filesystems

2013-09-27 Thread Jan Schmidt
 status $SCRATCH_MNT > $tmp.tmp 2>&1
   cat $tmp.tmp >> $seqres.full
 - grep -q finished $tmp.tmp || _fail "btrfs replace status failed"
 + grep -q finished $tmp.tmp || _fail "btrfs replace status (finished) failed"
   fi
  
   if ps -p $noise_pid | grep -q $noise_pid; then
 

The Q-comparisons look a bit strange to me, but they've been there before.

Reviewed-by: Jan Schmidt list@jan-o-sch.net

-Jan


[PATCH v4 2/2] xfstests btrfs/316: test send / receive

2013-08-13 Thread Jan Schmidt
Basic send / receive functionality test for btrfs. Requires current
version of fsstress built (-x support). Relies on fssum tool but can
skip the test if it failed to build.

Signed-off-by: Jan Schmidt list@jan-o-sch.net
Reviewed-by: Josef Bacik jba...@fusionio.com
---
 tests/btrfs/316 |  116 +++
 tests/btrfs/316.out |4 ++
 tests/btrfs/group   |1 +
 3 files changed, 121 insertions(+), 0 deletions(-)
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out

diff --git a/tests/btrfs/316 b/tests/btrfs/316
new file mode 100755
index 000..b3af7d9
--- /dev/null
+++ b/tests/btrfs/316
@@ -0,0 +1,116 @@
+#! /bin/bash
+# FSQA Test No. 316
+#
+# Run fsstress to create a reasonably strange file system, make a
+# snapshot (base) and run more fsstress. Then take another snapshot
+# (incr) and send both snapshots to a temp file. Remake the file
+# system and receive from the files. Check both states with fssum.
+#
+#---
+# Copyright (C) 2013 STRATO.  All rights reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#---
+#
+# creator
+owner=list.bt...@jan-o-sch.net
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo QA output created by $seq
+
+here=`pwd`
+tmp=`mktemp -d`
+status=1
+
+_cleanup()
+{
+   echo "*** unmount"
+   umount $SCRATCH_MNT 2>/dev/null
+   rm -f $tmp.*
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_seek_data_hole
+
+FSSUM_PROG=$here/src/fssum
+[ -x $FSSUM_PROG ] || _notrun fssum not built
+
+rm -f $seqres.full
+
+workout()
+{
+   fsz=$1
+   ops=$2
+
+   umount $SCRATCH_DEV >/dev/null 2>&1
+   echo "*** mkfs -dsize=$fsz" >>$seqres.full
+   echo "" >>$seqres.full
+   _scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+   || _fail size=$fsz mkfs failed
+   run_check _scratch_mount -o noatime
+
+   run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n $ops $FSSTRESS_AVOID -x \
+   "$BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/base"
+
+   run_check $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/incr
+
+   echo "# $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap" \
+   >> $seqres.full
+   $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap 2>> $seqres.full \
+   || _fail failed: '$@'
+   echo "# $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base\
+   $SCRATCH_MNT/incr > $tmp/incr.snap" >> $seqres.full
+   $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base \
+   $SCRATCH_MNT/incr > $tmp/incr.snap 2>> $seqres.full \
+   || _fail failed: '$@'
+
+   run_check $FSSUM_PROG -A -f -w $tmp/base.fssum $SCRATCH_MNT/base
+   run_check $FSSUM_PROG -A -f -w $tmp/incr.fssum -x $SCRATCH_MNT/incr/base \
+   $SCRATCH_MNT/incr
+
+   umount $SCRATCH_DEV >/dev/null 2>&1
+   echo "*** mkfs -dsize=$fsz" >>$seqres.full
+   echo "" >>$seqres.full
+   _scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+   || _fail size=$fsz mkfs failed
+   run_check _scratch_mount -o noatime
+
+   run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT < $tmp/base.snap
+   run_check $FSSUM_PROG -r $tmp/base.fssum $SCRATCH_MNT/base
+
+   run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT < $tmp/incr.snap
+   run_check $FSSUM_PROG -r $tmp/incr.fssum $SCRATCH_MNT/incr
+}
+
+echo "*** test send / receive"
+
+fssize=`expr 2000 \* 1024 \* 1024`
+ops=200
+
+workout $fssize $ops
+
+echo "*** done"
+status=0
+exit
diff --git a/tests/btrfs/316.out b/tests/btrfs/316.out
new file mode 100644
index 000..4564c85
--- /dev/null
+++ b/tests/btrfs/316.out
@@ -0,0 +1,4 @@
+QA output created by 316
+*** test send / receive
+*** done
+*** unmount
diff --git a/tests/btrfs/group b/tests/btrfs/group
index bc6c256..11d708a 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -9,3 +9,4 @@
 276 auto rw metadata
 284 auto
 307 auto quick
+316 auto rw metadata
-- 
1.7.2.5


[PATCH v4 0/2] xfstest btrfs/316: test send / receive

2013-08-13 Thread Jan Schmidt
These two patches add the announced tests for btrfs send / receive. As
requested, the fssum tool is now included.

--
v1->v2:
 - included fssum
 - test number is now 316 (was 314)
v2->v3:
 - added missing -lcrypto to build fssum
 - removed obsolete change in README now that fssum is included
 - fixed comment in test/btrfs/316's header (314 -> 316)
v3->v4:
 - build fssum with help of autotools only if libssl is available
 - removed clumsy OPT_TARGETS in src/Makefile
 - added #define directives for SEEK_DATA and SEEK_HOLE to fssum.c

Jan Schmidt (2):
  xfstests: add fssum tool
  xfstests btrfs/316: test send / receive

 .gitignore   |1 +
 aclocal.m4   |1 +
 configure.ac |1 +
 include/builddefs.in |1 +
 m4/Makefile  |1 +
 m4/package_ssldev.m4 |4 +
 src/Makefile |8 +
 src/fssum.c  |  828 ++
 tests/btrfs/316  |  116 +++
 tests/btrfs/316.out  |4 +
 tests/btrfs/group|1 +
 11 files changed, 966 insertions(+), 0 deletions(-)
 create mode 100644 m4/package_ssldev.m4
 create mode 100644 src/fssum.c
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out

-- 
1.7.2.5



Re: [patch v2 1/2] Btrfs: fix possible memory leak in find_parent_nodes()

2013-08-09 Thread Jan Schmidt
 
On Fri, August 09, 2013 at 07:25 (+0200), Wang Shilong wrote:
 The original code dealt with 'ref' in the following steps:
   |-list_del(&ref->list)
   |-some operations
   |-kfree(ref)
 
 If an operation failed, it would goto label 'out' without freeing this 'ref',
 and then a memory leak would happen. Just moving list_del() to right before
 kfree() will fix the problem.

Still not sufficient as an explanation. What is missing is the hint that in the
error handling code, we free everything that's left in the prefs list.
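That hint can be illustrated with a small user-space model (hypothetical names; a singly linked list stands in for the kernel's list_head, and a flag makes the "some operations" step fail). The error path frees only what is still on the list, so a ref unlinked before the failing step is leaked, while unlinking it right before it is freed keeps it reachable:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct __prelim_ref on the prefs list. */
struct toy_ref {
	struct toy_ref *next;
	bool fail;  /* makes the "some operations" step fail for this ref */
};

static int toy_live_refs;  /* allocated and not yet freed */

static struct toy_ref *toy_ref_new(bool fail)
{
	struct toy_ref *r = malloc(sizeof(*r));

	if (!r)
		abort();  /* toy code: no handling of allocation failure */
	r->next = NULL;
	r->fail = fail;
	toy_live_refs++;
	return r;
}

static void toy_ref_free(struct toy_ref *r)
{
	free(r);
	toy_live_refs--;
}

/* Consume the list like the loop in find_parent_nodes(). With del_early
 * the entry is unlinked before the work (original order); otherwise it
 * is unlinked right before it is freed (patched order). */
static int toy_consume(struct toy_ref **head, bool del_early)
{
	int ret = 0;

	while (*head) {
		struct toy_ref *ref = *head;

		if (del_early)
			*head = ref->next;
		if (ref->fail) {
			ret = -1;
			goto out;
		}
		if (!del_early)
			*head = ref->next;
		toy_ref_free(ref);
	}
out:
	/* Error handling frees only what is still on the list. */
	while (*head) {
		struct toy_ref *r = *head;

		*head = r->next;
		toy_ref_free(r);
	}
	return ret;
}

/* Three refs, the middle one fails; returns how many refs leaked. */
static int toy_leaked_refs(bool del_early)
{
	struct toy_ref *head = NULL;

	toy_live_refs = 0;
	for (int i = 2; i >= 0; i--) {
		struct toy_ref *r = toy_ref_new(i == 1);

		r->next = head;
		head = r;
	}
	if (toy_consume(&head, del_early) != -1)
		return -1;  /* unexpected: the failing ref was not reached */
	return toy_live_refs;
}
```

In the original order the failing middle ref leaks (one live allocation remains); in the patched order the error path frees it together with the rest of the list.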

-Jan

 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 Reviewed-by: Miao Xie mi...@cn.fujitsu.com
 ---
 V1-V2: add explanations to changelog
 ---
  fs/btrfs/backref.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
 index 68048d6..7b55c95 100644
 --- a/fs/btrfs/backref.c
 +++ b/fs/btrfs/backref.c
 @@ -911,7 +911,6 @@ again:
  
   while (!list_empty(&prefs)) {
   ref = list_first_entry(&prefs, struct __prelim_ref, list);
 - list_del(&ref->list);
   WARN_ON(ref->count < 0);
   if (ref->count && ref->root_id && ref->parent == 0) {
   /* no parent == root of tree */
 @@ -956,6 +955,7 @@ again:
   eie->next = ref->inode_list;
   }
   }
 + list_del(&ref->list);
   kfree(ref);
   }
  
 


Re: [PATCH] Btrfs: stop using GFP_ATOMIC when allocating rewind ebs

2013-08-08 Thread Jan Schmidt
 
On Wed, August 07, 2013 at 23:11 (+0200), Josef Bacik wrote:
 There is no reason we can't just set the path to blocking and then do normal
 GFP_NOFS allocations for these extent buffers.  Thanks,
 
 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  fs/btrfs/ctree.c |   16 ++--
  fs/btrfs/extent_io.c |8 
  2 files changed, 14 insertions(+), 10 deletions(-)
 
 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 1dd8a71..414a2d7 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -1191,8 +1191,8 @@ __tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
   * is freed (its refcount is decremented).
   */
  static struct extent_buffer *
 -tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
 - u64 time_seq)
 +tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
 + struct extent_buffer *eb, u64 time_seq)
  {
   struct extent_buffer *eb_rewin;
   struct tree_mod_elem *tm;
 @@ -1207,12 +1207,15 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
   if (!tm)
   return eb;
  
 + btrfs_set_path_blocking(path);
 + btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
 +
   if (tm->op == MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
   BUG_ON(tm->slot != 0);
   eb_rewin = alloc_dummy_extent_buffer(eb->start,
   fs_info->tree_root->nodesize);
   if (!eb_rewin) {
 - btrfs_tree_read_unlock(eb);
 + btrfs_tree_read_unlock_blocking(eb);
   free_extent_buffer(eb);
   return NULL;
   }
 @@ -1224,13 +1227,14 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
   } else {
   eb_rewin = btrfs_clone_extent_buffer(eb);
   if (!eb_rewin) {
 - btrfs_tree_read_unlock(eb);
 + btrfs_tree_read_unlock_blocking(eb);
   free_extent_buffer(eb);
   return NULL;
   }
   }
  
 - btrfs_tree_read_unlock(eb);
 + btrfs_clear_path_blocking(path, NULL, BTRFS_READ_LOCK);
 + btrfs_tree_read_unlock_blocking(eb);

unlock_blocking? Rest looks ok to me.

Thanks,
-Jan

   free_extent_buffer(eb);
  
   extent_buffer_get(eb_rewin);
 @@ -2779,7 +2783,7 @@ again:
   btrfs_clear_path_blocking(p, b,
 BTRFS_READ_LOCK);
   }
 - b = tree_mod_log_rewind(root->fs_info, b, time_seq);
 + b = tree_mod_log_rewind(root->fs_info, p, b, time_seq);
   if (!b) {
   ret = -ENOMEM;
   goto done;
 diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
 index b422cba..beda5a8 100644
 --- a/fs/btrfs/extent_io.c
 +++ b/fs/btrfs/extent_io.c
 @@ -4340,12 +4340,12 @@ struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src)
   struct extent_buffer *new;
   unsigned long num_pages = num_extent_pages(src->start, src->len);
  
 - new = __alloc_extent_buffer(NULL, src->start, src->len, GFP_ATOMIC);
 + new = __alloc_extent_buffer(NULL, src->start, src->len, GFP_NOFS);
   if (new == NULL)
   return NULL;
  
   for (i = 0; i < num_pages; i++) {
 - p = alloc_page(GFP_ATOMIC);
 + p = alloc_page(GFP_NOFS);
   if (!p) {
   btrfs_release_extent_buffer(new);
   return NULL;
 @@ -4369,12 +4369,12 @@ struct extent_buffer *alloc_dummy_extent_buffer(u64 start, unsigned long len)
   unsigned long num_pages = num_extent_pages(0, len);
   unsigned long i;
  
 - eb = __alloc_extent_buffer(NULL, start, len, GFP_ATOMIC);
 + eb = __alloc_extent_buffer(NULL, start, len, GFP_NOFS);
   if (!eb)
   return NULL;
  
   for (i = 0; i < num_pages; i++) {
 - eb->pages[i] = alloc_page(GFP_ATOMIC);
 + eb->pages[i] = alloc_page(GFP_NOFS);
   if (!eb->pages[i])
   goto err;
   }
 


Re: [PATCH] Btrfs: deal with enomem in the rewind path V3

2013-08-08 Thread Jan Schmidt
);
 +
 + }
 + if (page) {
 + /* One for when we alloced the page */
 + page_cache_release(page);
 + }
 + } while (index != start_idx);
 +}
 +
 +/*
 + * Helper for releasing the extent buffer.
 + */
 +static inline void btrfs_release_extent_buffer(struct extent_buffer *eb)
 +{
 + btrfs_release_extent_buffer_page(eb, 0);
 + __free_extent_buffer(eb);
 +}
 +
  static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree,
  u64 start,
  unsigned long len,
 @@ -4276,7 +4346,10 @@ struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src)
  
   for (i = 0; i < num_pages; i++) {
   p = alloc_page(GFP_ATOMIC);
 - BUG_ON(!p);
 + if (!p) {
 + btrfs_release_extent_buffer(new);
 + return NULL;
 + }
   attach_extent_buffer_page(new, p);
   WARN_ON(PageDirty(p));
   SetPageUptodate(p);
 @@ -4317,76 +4390,6 @@ err:
   return NULL;
  }
  
 -static int extent_buffer_under_io(struct extent_buffer *eb)
 -{
 - return (atomic_read(&eb->io_pages) ||
 - test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
 - test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 -}
 -
 -/*
 - * Helper for releasing extent buffer page.
 - */
 -static void btrfs_release_extent_buffer_page(struct extent_buffer *eb,
 - unsigned long start_idx)
 -{
 - unsigned long index;
 - unsigned long num_pages;
 - struct page *page;
 - int mapped = !test_bit(EXTENT_BUFFER_DUMMY, &eb->bflags);
 -
 - BUG_ON(extent_buffer_under_io(eb));
 -
 - num_pages = num_extent_pages(eb->start, eb->len);
 - index = start_idx + num_pages;
 - if (start_idx >= index)
 - return;
 -
 - do {
 - index--;
 - page = extent_buffer_page(eb, index);
 - if (page && mapped) {
 - spin_lock(&page->mapping->private_lock);
 - /*
 -  * We do this since we'll remove the pages after we've
 -  * removed the eb from the radix tree, so we could race
 -  * and have this page now attached to the new eb.  So
 -  * only clear page_private if it's still connected to
 -  * this eb.
 -  */
 - if (PagePrivate(page) &&
 - page->private == (unsigned long)eb) {
 - BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 - BUG_ON(PageDirty(page));
 - BUG_ON(PageWriteback(page));
 - /*
 -  * We need to make sure we haven't be attached
 -  * to a new eb.
 -  */
 - ClearPagePrivate(page);
 - set_page_private(page, 0);
 - /* One for the page private */
 - page_cache_release(page);
 - }
 - spin_unlock(&page->mapping->private_lock);
 -
 - }
 - if (page) {
 - /* One for when we alloced the page */
 - page_cache_release(page);
 - }
 - } while (index != start_idx);
 -}
 -
 -/*
 - * Helper for releasing the extent buffer.
 - */
 -static inline void btrfs_release_extent_buffer(struct extent_buffer *eb)
 -{
 - btrfs_release_extent_buffer_page(eb, 0);
 - __free_extent_buffer(eb);
 -}
 -
  static void check_buffer_tree_ref(struct extent_buffer *eb)
  {
   int refs;
 

Weird patch formatting concerning extent_io.c, I assume there are no changes in
extent_buffer_under_io and btrfs_release_extent_buffer_page, you just moved
btrfs_clone_extent_buffer, right? Perhaps --patience or --minimal could do
better? Otherwise,

Reviewed-by: Jan Schmidt list@jan-o-sch.net

Thanks,
-Jan
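
The error path this patch adds to btrfs_clone_extent_buffer() (drop the partially built buffer and return NULL instead of hitting BUG_ON(!p)) is a general allocate-or-unwind pattern. Below is a minimal userspace sketch of it, not the btrfs code: `struct buf`, `clone_buf()`, `release_buf()` and the `alloc_budget` failure-injection hook are hypothetical names, and malloc() stands in for alloc_page().

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096            /* hypothetical page size for this sketch */

/* Hypothetical userspace stand-in for an extent buffer and its page slots. */
struct buf {
	size_t n_pages;
	void *pages[];              /* flexible array member, one slot per page */
};

/* Test hook (hypothetical): if >= 0, page allocations left before one fails. */
int alloc_budget = -1;

static void *alloc_page_sim(void)
{
	if (alloc_budget == 0)
		return NULL;
	if (alloc_budget > 0)
		alloc_budget--;
	return malloc(PAGE_SZ);
}

/*
 * Free slots [start_idx, n_pages) walking backwards, in the same shape as the
 * do { index--; ... } while (index != start_idx) loop in the patch.
 * Empty (NULL) slots are fine because free(NULL) does nothing.
 */
static void release_pages(struct buf *b, size_t start_idx)
{
	size_t index = b->n_pages;

	if (start_idx >= index)
		return;
	do {
		index--;
		free(b->pages[index]);
		b->pages[index] = NULL;
	} while (index != start_idx);
}

void release_buf(struct buf *b)
{
	release_pages(b, 0);
	free(b);
}

/* Copy src page by page; on failure undo the partial copy and return NULL. */
struct buf *clone_buf(const struct buf *src)
{
	struct buf *b = calloc(1, sizeof(*b) + src->n_pages * sizeof(void *));

	if (!b)
		return NULL;
	b->n_pages = src->n_pages;      /* slots start out NULL thanks to calloc */
	for (size_t i = 0; i < b->n_pages; i++) {
		void *p = alloc_page_sim();

		if (!p) {
			release_buf(b);         /* instead of BUG_ON(!p) */
			return NULL;
		}
		memcpy(p, src->pages[i], PAGE_SZ);
		b->pages[i] = p;
	}
	return b;
}
```

Because the slots start out NULL, the same release helper serves both the normal teardown and the half-built failure case, which is what lets the error path stay a single call.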


Re: [PATCH] Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC

2013-08-08 Thread Jan Schmidt
);
   ret = __add_prelim_ref(prefs, root, &key, 0, 0,
 -bytenr, count);
 +bytenr, count, GFP_NOFS);
   break;
   }
   default:
 @@ -738,7 +738,7 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
   case BTRFS_SHARED_BLOCK_REF_KEY:
   ret = __add_prelim_ref(prefs, 0, NULL,
   info_level + 1, key.offset,
 - bytenr, 1);
 + bytenr, 1, GFP_NOFS);
   break;
   case BTRFS_SHARED_DATA_REF_KEY: {
   struct btrfs_shared_data_ref *sdref;
 @@ -748,13 +748,13 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
 struct btrfs_shared_data_ref);
   count = btrfs_shared_data_ref_count(leaf, sdref);
   ret = __add_prelim_ref(prefs, 0, NULL, 0, key.offset,
 - bytenr, count);
 + bytenr, count, GFP_NOFS);
   break;
   }
   case BTRFS_TREE_BLOCK_REF_KEY:
   ret = __add_prelim_ref(prefs, key.offset, NULL,
  info_level + 1, 0,
 -bytenr, 1);
 +bytenr, 1, GFP_NOFS);
   break;
   case BTRFS_EXTENT_DATA_REF_KEY: {
   struct btrfs_extent_data_ref *dref;
 @@ -770,7 +770,7 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
   key.offset = btrfs_extent_data_ref_offset(leaf, dref);
   root = btrfs_extent_data_ref_root(leaf, dref);
   ret = __add_prelim_ref(prefs, root, &key, 0, 0,
 -bytenr, count);
 +bytenr, count, GFP_NOFS);
   break;
   }
   default:
 

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net

Thanks,
-Jan


[PATCH v3 2/2] xfstests btrfs/316: test send / receive

2013-08-08 Thread Jan Schmidt
Basic send / receive functionality test for btrfs. Requires current
version of fsstress built (-x support). Relies on fssum tool but can
skip the test if it failed to build.

Signed-off-by: Jan Schmidt list@jan-o-sch.net
Reviewed-by: Josef Bacik jba...@fusionio.com
---
 tests/btrfs/316 |  113 +++
 tests/btrfs/316.out |4 ++
 tests/btrfs/group   |1 +
 3 files changed, 118 insertions(+), 0 deletions(-)
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out

diff --git a/tests/btrfs/316 b/tests/btrfs/316
new file mode 100755
index 000..087978a
--- /dev/null
+++ b/tests/btrfs/316
@@ -0,0 +1,113 @@
+#! /bin/bash
+# FSQA Test No. 316
+#
+# Run fsstress to create a reasonably strange file system, make a
+# snapshot (base) and run more fsstress. Then take another snapshot
+# (incr) and send both snapshots to a temp file. Remake the file
+# system and receive from the files. Check both states with fssum.
+#
+#---
+# Copyright (C) 2013 STRATO.  All rights reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#---
+#
+# creator
+owner=list.bt...@jan-o-sch.net
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=`mktemp -d`
+status=1
+
+_cleanup()
+{
+   echo "*** unmount"
+   umount $SCRATCH_MNT 2>/dev/null
+   rm -f $tmp.*
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_command $FSSUM_PROG fssum
+
+rm -f $seqres.full
+
+workout()
+{
+   fsz=$1
+   ops=$2
+
+   umount $SCRATCH_DEV >/dev/null 2>&1
+   echo "*** mkfs -dsize=$fsz" >>$seqres.full
+   echo "" >>$seqres.full
+   _scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+   || _fail "size=$fsz mkfs failed"
+   run_check _scratch_mount -o noatime
+
+   run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n $ops $FSSTRESS_AVOID -x \
+   "$BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/base"
+
+   run_check $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/incr
+
+   echo "# $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap" \
+   >> $seqres.full
+   $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap 2>> $seqres.full \
+   || _fail "failed: '$@'"
+   echo "# $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base\
+   $SCRATCH_MNT/incr > $tmp/incr.snap" >> $seqres.full
+   $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base \
+   $SCRATCH_MNT/incr > $tmp/incr.snap 2>> $seqres.full \
+   || _fail "failed: '$@'"
+
+   run_check $FSSUM_PROG -A -f -w $tmp/base.fssum $SCRATCH_MNT/base
+   run_check $FSSUM_PROG -A -f -w $tmp/incr.fssum -x $SCRATCH_MNT/incr/base \
+   $SCRATCH_MNT/incr
+
+   umount $SCRATCH_DEV >/dev/null 2>&1
+   echo "*** mkfs -dsize=$fsz" >>$seqres.full
+   echo "" >>$seqres.full
+   _scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+   || _fail "size=$fsz mkfs failed"
+   run_check _scratch_mount -o noatime
+
+   run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT < $tmp/base.snap
+   run_check $FSSUM_PROG -r $tmp/base.fssum $SCRATCH_MNT/base
+
+   run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT < $tmp/incr.snap
+   run_check $FSSUM_PROG -r $tmp/incr.fssum $SCRATCH_MNT/incr
+}
+
+echo "*** test send / receive"
+
+fssize=`expr 2000 \* 1024 \* 1024`
+ops=200
+
+workout $fssize $ops
+
+echo "*** done"
+status=0
+exit
diff --git a/tests/btrfs/316.out b/tests/btrfs/316.out
new file mode 100644
index 000..4564c85
--- /dev/null
+++ b/tests/btrfs/316.out
@@ -0,0 +1,4 @@
+QA output created by 316
+*** test send / receive
+*** done
+*** unmount
diff --git a/tests/btrfs/group b/tests/btrfs/group
index bc6c256..11d708a 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -9,3 +9,4 @@
 276 auto rw metadata
 284 auto
 307 auto quick
+316 auto rw metadata
-- 
1.7.2.5


[PATCH v3 1/2] xfstests: add fssum tool

2013-08-08 Thread Jan Schmidt
fssum is a tool to build a recursive checksum for a file system. The home
repository of fssum is

git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git

It is added as an optional target, because it depends on glibc >= 2.15 for
SEEK_HOLE / SEEK_DATA. The test to be added using fssum will just be skipped
if fssum wasn't built.
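
To illustrate what a recursive file-system checksum involves, here is a minimal sketch over a hypothetical in-memory tree rather than a real directory walk. It is not fssum's code: it hashes names and contents with 64-bit FNV-1a instead of MD5 so it needs no OpenSSL, and it visits children in sorted name order so the result does not depend on the order entries happen to be listed. That ordering property is what a sender-side and a receiver-side checksum need to be comparable; whether fssum achieves it by sorting is an assumption here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a directory tree. */
struct node {
	const char *name;
	const char *data;          /* file contents, or NULL for a directory */
	struct node **children;
	size_t n_children;
};

/* 64-bit FNV-1a, used here instead of MD5 to avoid an OpenSSL dependency. */
static uint64_t fnv1a(uint64_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;     /* FNV-64 prime */
	}
	return h;
}

static int cmp_by_name(const void *a, const void *b)
{
	const struct node *x = *(const struct node *const *)a;
	const struct node *y = *(const struct node *const *)b;

	return strcmp(x->name, y->name);
}

/* Checksum a subtree, visiting children in sorted name order. */
uint64_t tree_sum(const struct node *n)
{
	uint64_t h = 14695981039346656037ULL;  /* FNV-64 offset basis */
	struct node **sorted;

	h = fnv1a(h, n->name, strlen(n->name) + 1);   /* +1 keeps the NUL as a separator */
	if (n->data)
		h = fnv1a(h, n->data, strlen(n->data) + 1);
	if (n->n_children == 0)
		return h;

	/* sort a copy so the caller's child array is left untouched */
	sorted = malloc(n->n_children * sizeof(*sorted));
	if (!sorted)
		abort();
	memcpy(sorted, n->children, n->n_children * sizeof(*sorted));
	qsort(sorted, n->n_children, sizeof(*sorted), cmp_by_name);
	for (size_t i = 0; i < n->n_children; i++) {
		uint64_t child = tree_sum(sorted[i]);

		h = fnv1a(h, &child, sizeof(child));
	}
	free(sorted);
	return h;
}
```

Two trees with the same entries listed in a different order produce the same value, while changing one byte of file content changes it.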

Signed-off-by: Jan Schmidt list@jan-o-sch.net
---
 .gitignore|1 +
 common/config |2 +
 src/Makefile  |   11 +-
 src/fssum.c   |  819 +
 4 files changed, 832 insertions(+), 1 deletions(-)
 create mode 100644 src/fssum.c

diff --git a/.gitignore b/.gitignore
index 11594aa..c2fc6e3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -45,6 +45,7 @@
 /src/fill
 /src/fill2
 /src/fs_perms
+/src/fssum
 /src/fstest
 /src/fsync-tester
 /src/ftrunc
diff --git a/common/config b/common/config
index 67c1498..c8bee29 100644
--- a/common/config
+++ b/common/config
@@ -146,6 +146,8 @@ export SED_PROG=`set_prog_path sed`
 export BC_PROG=`set_prog_path bc`
 [ "$BC_PROG" = "" ] && _fatal "bc not found"
 
+export FSSUM_PROG=`set_prog_path fssum $here/src/fssum`
+
 export PS_ALL_FLAGS=-ef
 
 export DF_PROG=`set_prog_path df`
diff --git a/src/Makefile b/src/Makefile
index cc679e8..10a4d3c 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -20,10 +20,14 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
stale_handle pwrite_mmap_blocked t_dir_offset2 seek_sanity_test \
seek_copy_test t_readdir_1 t_readdir_2 fsync-tester
 
+OPT_TARGETS = fssum
+
 SUBDIRS =
 
 LLDLIBS = $(LIBATTR) $(LIBHANDLE) $(LIBACL)
 
+OPT_LDLIBS = -lssl -lcrypto
+
 ifeq ($(HAVE_XLOG_ASSIGN_LSN), true)
 LINUX_TARGETS += loggen
 endif
@@ -60,7 +64,7 @@ CFILES = $(TARGETS:=.c)
 LDIRT = $(TARGETS)
 
 
-default: depend $(TARGETS) $(SUBDIRS)
+default: depend $(TARGETS) $(OPT_TARGETS) $(SUBDIRS)
 
 depend: .dep
 
@@ -70,11 +74,16 @@ $(TARGETS): $(LIBTEST)
@echo [CC]$@
$(Q)$(LTLINK) $@.c -o $@ $(CFLAGS) $(LDFLAGS) $(LDLIBS) $(LIBTEST)
 
+$(OPT_TARGETS): $(LIBTEST)
+   @echo [CC]$@
+   -$(Q)$(LTLINK) $@.c -o $@ $(CFLAGS) $(LDFLAGS) $(LDLIBS) $(OPT_LDLIBS) $(LIBTEST)
+
 LINKTEST = $(LTLINK) $@.c -o $@ $(CFLAGS) $(LDFLAGS)
 
 install: default $(addsuffix -install,$(SUBDIRS))
$(INSTALL) -m 755 -d $(PKG_LIB_DIR)/src
$(LTINSTALL) -m 755 $(TARGETS) $(PKG_LIB_DIR)/src
+   -$(LTINSTALL) -m 755 $(OPT_TARGETS) $(PKG_LIB_DIR)/src
$(LTINSTALL) -m 755 fill2attr fill2fs fill2fs_check scaleread.sh $(PKG_LIB_DIR)/src
$(LTINSTALL) -m 644 dumpfile $(PKG_LIB_DIR)/src
 
diff --git a/src/fssum.c b/src/fssum.c
new file mode 100644
index 000..ecddb6a
--- /dev/null
+++ b/src/fssum.c
@@ -0,0 +1,819 @@
+/*
+ * Copyright (C) 2012 STRATO AG.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+#define _BSD_SOURCE
+#define _LARGEFILE64_SOURCE
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#ifdef __SOLARIS__
+#include <sys/mkdev.h>
+#endif
+#include <openssl/md5.h>
+#include <netinet/in.h>
+#include <inttypes.h>
+#include <assert.h>
+
+#define CS_SIZE 16
+#define CHUNKS 128
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define htonll(x) __bswap_64 (x)
+#endif
+
+/* TODO: add hardlink recognition */
+/* TODO: add xattr/acl */
+
+struct excludes {
+   char *path;
+   int len;
+};
+
+typedef struct _sum {
+   MD5_CTX md5;
+   unsigned char   out[16];
+} sum_t;
+
+typedef int (*sum_file_data_t)(int fd, sum_t *dst);
+
+int gen_manifest = 0;
+int in_manifest = 0;
+char *checksum = NULL;
+struct excludes *excludes;
+int n_excludes = 0;
+int verbose = 0;
+FILE *out_fp;
+FILE *in_fp;
+
+enum _flags {
+   FLAG_UID,
+   FLAG_GID,
+   FLAG_MODE,
+   FLAG_ATIME,
+   FLAG_MTIME,
+   FLAG_CTIME,
+   FLAG_DATA,
+   FLAG_OPEN_ERROR,
+   FLAG_STRUCTURE,
+   NUM_FLAGS
+};
+
+const char flchar[] = "ugoamcdes";
+char line[65536];
+
+int flags[NUM_FLAGS] = {1, 1, 1, 1, 1, 0, 1, 0, 0};
+
+char *
+getln(char *buf, int size, FILE *fp)
+{
+   char *p;
+   int l;
+
+   p = fgets(buf, size, fp);
+   if (!p)
+   return NULL;
+
+   l

[PATCH v3 0/2] xfstest btrfs/316: test send / receive

2013-08-08 Thread Jan Schmidt
These two patches add the announced tests for btrfs send / receive. As
requested, the fssum tool is now included.

One drawback is that I'm unable to edit configure.ac or whatever needs
to be modified in the autotools-preferred way. Any hints are appreciated,
preferably hints containing all the modifications required to introduce
something like HAVE_SEEK_HOLE.

I do not want to make modifications to fssum.c here. If that's
absolutely required (because one /could/ get along using <linux/fs.h>,
which is not the way I would like to go), I'd like to have that changed
in the far-progs repository where fssum.c comes from as well.
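
For reference, the kind of code that needs SEEK_HOLE / SEEK_DATA, guarded at compile time, might look like the sketch below. HAVE_SEEK_HOLE is only named in the mail; as an assumption, the sketch simply tests whether the system headers define SEEK_DATA and SEEK_HOLE, and it is not taken from fssum.c. `data_bytes()` and `file_data_bytes()` are hypothetical helpers.

```c
#define _GNU_SOURCE             /* glibc >= 2.15 exposes SEEK_DATA / SEEK_HOLE under this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Count bytes inside data regions of an open file, skipping holes.
 * Returns -1 on error or if the headers lack SEEK_DATA / SEEK_HOLE.
 * Filesystems may report data regions with block granularity, so a
 * sparse file can count somewhat more than was actually written.
 */
long long data_bytes(int fd)
{
#if defined(SEEK_DATA) && defined(SEEK_HOLE)
	long long total = 0;
	off_t pos = 0;

	for (;;) {
		off_t data = lseek(fd, pos, SEEK_DATA);

		if (data < 0)                   /* ENXIO: no data at or after pos */
			return errno == ENXIO ? total : -1;
		off_t hole = lseek(fd, data, SEEK_HOLE);  /* EOF counts as an implicit hole */

		if (hole < 0)
			return -1;
		assert(hole > data);
		total += hole - data;
		pos = hole;
	}
#else
	(void)fd;
	errno = ENOSYS;                         /* interface not available at build time */
	return -1;
#endif
}

/* Convenience wrapper: open a path read-only and count its data bytes. */
long long file_data_bytes(const char *path)
{
	int fd = open(path, O_RDONLY);
	long long n;

	if (fd < 0)
		return -1;
	n = data_bytes(fd);
	close(fd);
	return n;
}
```

With this shape, a build without the interface still compiles and the caller can skip, which matches the idea of making fssum optional rather than failing the whole build.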

--
v1->v2:
 - included fssum
 - test number is now 316 (was 314)
v2->v3:
 - added missing -lcrypto to build fssum
 - removed obsolete change in README now that fssum is included
 - fixed comment in tests/btrfs/316's header (314 -> 316)

Jan Schmidt (2):
  xfstests: add fssum tool
  xfstests btrfs/316: test send / receive

 .gitignore  |1 +
 common/config   |2 +
 src/Makefile|   11 +-
 src/fssum.c |  819 +++
 tests/btrfs/316 |  113 +++
 tests/btrfs/316.out |4 +
 tests/btrfs/group   |1 +
 7 files changed, 950 insertions(+), 1 deletions(-)
 create mode 100644 src/fssum.c
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out

-- 
1.7.2.5



Re: [PATCH 2/3] Btrfs: catch error return value from find_extent_in_eb()

2013-08-08 Thread Jan Schmidt
 
On Thu, August 08, 2013 at 12:24 (+0200), Filipe David Manana wrote:
 On Thu, Aug 8, 2013 at 6:04 AM, Wang Shilong wangsl.f...@cn.fujitsu.com 
 wrote:
 find_extent_in_eb() may return ENOMEM, catch this error return value.

 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 Reviewed-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/backref.c | 4 
  1 file changed, 4 insertions(+)

 diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
 index 54e7610..f7781e6 100644
 --- a/fs/btrfs/backref.c
 +++ b/fs/btrfs/backref.c
 @@ -934,6 +934,10 @@ again:
 }
 ret = find_extent_in_eb(eb, bytenr,
  *extent_item_pos, &eie);
 +   if (ret) {
 +   free_extent_buffer(eb);
 +   goto out;
 +   }
 ref->inode_list = eie;
 free_extent_buffer(eb);
 }
 
 Hello, this is a duplicate of:  https://patchwork.kernel.org/patch/2835989/

Your linked patch checks for ret < 0, which is a safer option since there are
functions down the stack returning > 0 or 0 for success and < 0 for errors.
Currently, find_extent_in_eb doesn't return their return values, but I'd rather
be a bit more on the safe side and use your patch.
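
The convention described here (< 0 is an error, 0 and > 0 both mean success) is why `if (ret < 0)` is the more robust check. A minimal sketch with hypothetical function names, not btrfs code, shows how `if (ret)` misreads a positive success value as a failure:

```c
#include <errno.h>

/*
 * Hypothetical helper following the convention from the mail:
 * < 0 is a negative errno, 0 and > 0 both mean success
 * (here 1 means "succeeded, but nothing was found").
 */
int lookup_item(int key, int *out)
{
	if (key < 0)
		return -EINVAL;
	if (key % 2)
		return 1;
	*out = key / 2;
	return 0;
}

/* Treats only negative values as failures, like the linked patch. */
int caller_checks_negative(int key)
{
	int val = 0;
	int ret = lookup_item(key, &val);

	if (ret < 0)
		return ret;
	return val;
}

/* "if (ret)" also bails out on the positive success value. */
int caller_checks_nonzero(int key)
{
	int val = 0;
	int ret = lookup_item(key, &val);

	if (ret)
		return ret;
	return val;
}
```

For key 3 the first caller correctly continues, while the second propagates 1 as if it were an error.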

Thanks,
-Jan


Re: [PATCH 3/3] Btrfs: allocate prelim_ref with a slab allocater

2013-08-08 Thread Jan Schmidt
 
On Thu, August 08, 2013 at 07:04 (+0200), Wang Shilong wrote:
 struct __prelim_ref is allocated and freed frequently when
 walking backref tree, using slab allocater can not only
 speed up allocating but also detect memory leaks.
 
 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 Reviewed-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/backref.c | 30 +-
  fs/btrfs/backref.h |  2 ++
  fs/btrfs/super.c   |  8 
  3 files changed, 35 insertions(+), 5 deletions(-)
 
 diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
 index f7781e6..916e4f1 100644
 --- a/fs/btrfs/backref.c
 +++ b/fs/btrfs/backref.c
 @@ -119,6 +119,26 @@ struct __prelim_ref {
   u64 wanted_disk_byte;
  };
  
 +static struct kmem_cache *prelim_ref_cache;
 +
 +int __init btrfs_prelim_ref_init(void)
 +{
 + prelim_ref_cache = kmem_cache_create("btrfs_prelim_ref",
 + sizeof(struct __prelim_ref),
 + 0,
 + SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
 + NULL);
 + if (!prelim_ref_cache)
 + return -ENOMEM;
 + return 0;
 +}
 +
 +void btrfs_prelim_ref_exit(void)
 +{
 + if (prelim_ref_cache)
 + kmem_cache_destroy(prelim_ref_cache);
 +}
 +
  /*
   * the rules for all callers of this function are:
   * - obtaining the parent is the goal
 @@ -165,7 +185,7 @@ static int __add_prelim_ref(struct list_head *head, u64 root_id,
  {
   struct __prelim_ref *ref;
  
 - ref = kmalloc(sizeof(*ref), gfp_mask);
 + ref = kmem_cache_alloc(prelim_ref_cache, gfp_mask);
   if (!ref)
   return -ENOMEM;
  
 @@ -493,7 +513,7 @@ static void __merge_refs(struct list_head *head, int mode)
   ref1->count += ref2->count;
  
   list_del(&ref2->list);
 - kfree(ref2);
 + kmem_cache_free(prelim_ref_cache, ref2);
   }
  
   }
 @@ -958,7 +978,7 @@ again:
   }
   }
   list_del(&ref->list);
 - kfree(ref);
 + kmem_cache_free(prelim_ref_cache, ref);
   }
  
  out:
 @@ -966,13 +986,13 @@ out:
   while (!list_empty(&prefs)) {
   ref = list_first_entry(&prefs, struct __prelim_ref, list);
   list_del(&ref->list);
 - kfree(ref);
 + kmem_cache_free(prelim_ref_cache, ref);
   }
   while (!list_empty(&prefs_delayed)) {
   ref = list_first_entry(&prefs_delayed, struct __prelim_ref,
  list);
   list_del(&ref->list);
 - kfree(ref);
 + kmem_cache_free(prelim_ref_cache, ref);
   }
  
   return ret;
 diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
 index 8f2e767..a910b27 100644
 --- a/fs/btrfs/backref.h
 +++ b/fs/btrfs/backref.h
 @@ -72,4 +72,6 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
 struct btrfs_inode_extref **ret_extref,
 u64 *found_off);
  
 +int __init btrfs_prelim_ref_init(void);
 +void btrfs_prelim_ref_exit(void);
  #endif
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index b64d762..de7eb3d 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -56,6 +56,7 @@
  #include "rcu-string.h"
  #include "dev-replace.h"
  #include "free-space-cache.h"
 +#include "backref.h"
  
  #define CREATE_TRACE_POINTS
  #include <trace/events/btrfs.h>
 @@ -1774,6 +1775,10 @@ static int __init init_btrfs_fs(void)
   if (err)
   goto free_auto_defrag;
  
 + err = btrfs_prelim_ref_init();
 + if (err)
 + goto free_prelim_ref;
 +
   err = btrfs_interface_init();
   if (err)
   goto free_delayed_ref;
 @@ -1791,6 +1796,8 @@ static int __init init_btrfs_fs(void)
  
  unregister_ioctl:
   btrfs_interface_exit();
 +free_prelim_ref:
 + btrfs_prelim_ref_exit();
  free_delayed_ref:
   btrfs_delayed_ref_exit();
  free_auto_defrag:
 @@ -1817,6 +1824,7 @@ static void __exit exit_btrfs_fs(void)
   btrfs_delayed_ref_exit();
   btrfs_auto_defrag_exit();
   btrfs_delayed_inode_exit();
 + btrfs_prelim_ref_exit();
   ordered_data_exit();
   extent_map_exit();
   extent_io_exit();
 

I generally like the idea of using a custom cache here. What about this one?

 324 static int __resolve_indirect_refs(struct btrfs_fs_info *fs_info,
[...]
 367 /* additional parents require new refs being added here */
 368 while ((node = ulist_next(parents, &uiter))) {
 369 new_ref = kmalloc(sizeof(*new_ref), GFP_NOFS);

That new_ref will also be freed with kmem_cache_free after your patch, I think.
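
The underlying rule is that every object goes back to the allocator it came from. A toy fixed-size object cache makes that pairing concrete; it is a hypothetical single-threaded sketch (`struct obj_cache` and the `cache_*` functions are invented for illustration) and not how the kernel slab allocator works internally. In this toy, a pointer that did not come from cache_alloc(), such as a kmalloc()-style allocation, could be handed to cache_free() and later given out as a full obj_size object even if it is smaller.

```c
#include <stdlib.h>

/* While an object sits on the free list, its memory holds the link. */
struct free_obj {
	struct free_obj *next;
};

/* Hypothetical single-threaded cache of fixed-size objects. */
struct obj_cache {
	size_t obj_size;
	struct free_obj *free_list;
};

void cache_init(struct obj_cache *c, size_t obj_size)
{
	/* each object must be big enough to hold the link while it is free */
	c->obj_size = obj_size < sizeof(struct free_obj) ?
		      sizeof(struct free_obj) : obj_size;
	c->free_list = NULL;
}

void *cache_alloc(struct obj_cache *c)
{
	struct free_obj *o = c->free_list;

	if (o) {                          /* reuse the most recently freed object */
		c->free_list = o->next;
		return o;
	}
	return malloc(c->obj_size);       /* cache empty: get fresh memory */
}

/* Only pass objects that came from cache_alloc() on this same cache. */
void cache_free(struct obj_cache *c, void *p)
{
	struct free_obj *o = p;

	o->next = c->free_list;
	c->free_list = o;
}

/* Release cached (free) objects; objects still in use are the caller's. */
void cache_destroy(struct obj_cache *c)
{
	while (c->free_list) {
		struct free_obj *o = c->free_list;

		c->free_list = o->next;
		free(o);
	}
}
```

Reusing freed objects is where a dedicated cache gets its speed; the price is that every allocation site, including one like `new_ref` above, has to use the matching alloc/free pair.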

Thanks,
-Jan

Re: [PATCH] Btrfs: stop using GFP_ATOMIC when allocating rewind ebs

2013-08-08 Thread Jan Schmidt
On Thu, August 08, 2013 at 15:12 (+0200), Josef Bacik wrote:
 On Thu, Aug 08, 2013 at 09:23:06AM +0200, Jan Schmidt wrote:
  
 On Wed, August 07, 2013 at 23:11 (+0200), Josef Bacik wrote:
 There is no reason we can't just set the path to blocking and then do normal
 GFP_NOFS allocations for these extent buffers.  Thanks,

 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  fs/btrfs/ctree.c |   16 ++--
  fs/btrfs/extent_io.c |8 
  2 files changed, 14 insertions(+), 10 deletions(-)

 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 1dd8a71..414a2d7 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -1191,8 +1191,8 @@ __tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
   * is freed (its refcount is decremented).
   */
  static struct extent_buffer *
 -tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
 -   u64 time_seq)
 +tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
 +   struct extent_buffer *eb, u64 time_seq)
  {
 struct extent_buffer *eb_rewin;
 struct tree_mod_elem *tm;
 @@ -1207,12 +1207,15 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
 if (!tm)
 return eb;
  
 +   btrfs_set_path_blocking(path);
 +   btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
 +
 if (tm->op == MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
 BUG_ON(tm->slot != 0);
 eb_rewin = alloc_dummy_extent_buffer(eb->start,
 fs_info->tree_root->nodesize);
 if (!eb_rewin) {
 -   btrfs_tree_read_unlock(eb);
 +   btrfs_tree_read_unlock_blocking(eb);
 free_extent_buffer(eb);
 return NULL;
 }
 @@ -1224,13 +1227,14 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
 } else {
 eb_rewin = btrfs_clone_extent_buffer(eb);
 if (!eb_rewin) {
 -   btrfs_tree_read_unlock(eb);
 +   btrfs_tree_read_unlock_blocking(eb);
 free_extent_buffer(eb);
 return NULL;
 }
 }
  
 -   btrfs_tree_read_unlock(eb);
 +   btrfs_clear_path_blocking(path, NULL, BTRFS_READ_LOCK);
 +   btrfs_tree_read_unlock_blocking(eb);

 unlock_blocking? Rest looks ok to me.

 
 Yeah I change the lock to blocking above, so I have to do read_unlock_blocking
 here.  Thanks,

Uh, obviously. Got confused by the btrfs_clear_path_blocking above, but of
course we're locking eb explicitly ourselves.

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net

Thanks!
-Jan


Re: [PATCH] Btrfs: deal with enomem in the rewind path V3

2013-08-08 Thread Jan Schmidt
On Thu, August 08, 2013 at 16:28 (+0200), David Sterba wrote:
 On Thu, Aug 08, 2013 at 09:36:52AM +0200, Jan Schmidt wrote:
 Weird patch formatting concerning extent_io.c, I assume there are no changes in
 extent_buffer_under_io and btrfs_release_extent_buffer_page, you just moved
 btrfs_clone_extent_buffer, right? Perhaps --patience or --minimal could do
 better? Otherwise,
 
 git diff --patience produces identical result for me (1.8.3.1).

Yeah, I expected that after Josef said that he actually moved the other two
functions, so the structure really changed in a way git cannot diff any better.

 Reviewed-by: Jan Schmidt list@jan-o-sch.net
  ^^^
 xfs? :)

Whoops :-) Replace that by btrfs if you wish.

-Jan


Re: [RFC PATCH v5 4/5] Btrfs: disable qgroups accounting when quota is off

2013-08-05 Thread Jan Schmidt
Nice try hiding this one in a dedup patch set, but I finally found it :-)

On Wed, July 31, 2013 at 17:37 (+0200), Liu Bo wrote:
 So we don't need to do qgroups accounting trick without enabling quota.
 This reduces my tester's costing time from ~28s to ~23s.  
 
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
  fs/btrfs/extent-tree.c |6 ++
  fs/btrfs/qgroup.c  |6 ++
  2 files changed, 12 insertions(+), 0 deletions(-)
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 10a5c72..c6612f5 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -2524,6 +2524,12 @@ int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
   struct qgroup_update *qgroup_update;
   int ret = 0;
  
 + if (!trans->root->fs_info->quota_enabled) {
 + if (trans->delayed_ref_elem.seq)
 + btrfs_put_tree_mod_seq(fs_info, &trans->delayed_ref_elem);
 + return 0;
 + }
 +
   if (list_empty(&trans->qgroup_ref_list) !=
   !trans->delayed_ref_elem.seq) {
   /* list without seq or seq without list */
 diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
 index 1280eff..f3e82aa 100644
 --- a/fs/btrfs/qgroup.c
 +++ b/fs/btrfs/qgroup.c
 @@ -1200,6 +1200,9 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
  {
   struct qgroup_update *u;
  
 + if (!trans->root->fs_info->quota_enabled)
 + return 0;
 +
   BUG_ON(!trans->delayed_ref_elem.seq);
   u = kmalloc(sizeof(*u), GFP_NOFS);
   if (!u)
 @@ -1850,6 +1853,9 @@ out:
  
  void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
  {
 + if (!trans->root->fs_info->quota_enabled)
 + return;
 +
   if (list_empty(&trans->qgroup_ref_list) && !trans->delayed_ref_elem.seq)
   return;
   pr_err("btrfs: qgroups not uptodate in trans handle %p: list is%s empty, seq is %#x.%x\n",
 

The second hunk looks sensible at first sight. However, hunks 1 and 3 don't. They
assert consistency of qgroup state in well-defined places. The fact that you
need to disable those checks shows that skipping addition to the list in the
second hunk cannot be right, or at least is not sufficient.

We've got the list of qgroup operations, trans->qgroup_ref_list, and we've got
the qgroup's delayed ref blocker, trans->delayed_ref_elem. If you stop adding to
the list (hunk 2), which seems reasonable when quota is disabled, then you must
also ensure you're not acquiring the delayed ref blocker element, which should
give another performance boost.

need_ref_seq may be the right place for this change. It just feels a bit too
obvious. The critical cases obviously are quota enable and quota disable. I just
don't recall why it wasn't that way from the very beginning of qgroups, I might
be missing something fundamental here.
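
The argument rests on one invariant: the qgroup update list and the delayed ref blocker are either both in use or both unused ("list without seq or seq without list" is the error case). A toy model with hypothetical names (`struct trans_model`, `record_ref_*`), not the btrfs code, shows why skipping only the list insertion while still taking the blocker trips that check, whereas gating both on the same condition keeps it intact:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of the per-transaction qgroup state. */
struct trans_model {
	bool quota_enabled;
	int n_updates;      /* stands in for entries on trans->qgroup_ref_list */
	uint64_t seq;       /* stands in for trans->delayed_ref_elem.seq; 0 = not held */
};

static uint64_t next_seq = 1;

/* Consistent when the list and the blocker are both empty or both set. */
bool qgroup_state_consistent(const struct trans_model *t)
{
	return (t->n_updates == 0) == (t->seq == 0);
}

/* Skips only the list entry; the blocker is still taken unconditionally. */
void record_ref_skip_list_only(struct trans_model *t)
{
	if (!t->seq)
		t->seq = next_seq++;
	if (!t->quota_enabled)
		return;
	t->n_updates++;
}

/* Gates both pieces on the same condition (the need_ref_seq idea). */
void record_ref_gated(struct trans_model *t)
{
	if (!t->quota_enabled)
		return;             /* neither queue an update nor take a blocker */
	if (!t->seq)
		t->seq = next_seq++;
	t->n_updates++;
}
```

In this model the gated variant needs no exception in the consistency check, which is the point of moving the decision to where the blocker is acquired.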

Thanks,
-Jan


Re: [RFC PATCH v5 4/5] Btrfs: disable qgroups accounting when quota is off

2013-08-05 Thread Jan Schmidt
On Mon, August 05, 2013 at 16:18 (+0200), Liu Bo wrote:
 On Mon, Aug 05, 2013 at 02:34:30PM +0200, Jan Schmidt wrote:
 Nice try hiding this one in a dedup patch set, but I finally found it :-)
 
 A, I didn't mean to ;-)
 

 On Wed, July 31, 2013 at 17:37 (+0200), Liu Bo wrote:
 So we don't need to do qgroups accounting trick without enabling quota.
 This reduces my tester's costing time from ~28s to ~23s.  



 The second hunk looks sensible at first sight. However, hunk 1 and 3 don't. They
 assert consistency of qgroup state in well defined places. The fact that you
 need to disable those checks shows that skipping addition to the list in the
 second hunk cannot be right, or at least not sufficient.
 
 I agree, only hunk 2 is necessary.
 

 We've got the list of qgroup operations trans->qgroup_ref_list and we've got the
 qgroup's delayed ref blocker, trans->delayed_ref_elem. If you stop adding to the
 list (hunk 2) which seems reasonable when quota is disabled, then you also must
 ensure you're not acquiring the delayed ref blocker element, which should give
 another performance boost.
 
 WHY a 'must' here?

Because otherwise you are going to hit the BUG_ONs you avoided with hunks 1 and 3.


 need_ref_seq may be the right place for this change. It just feels a bit too
 obvious. The critical cases obviously are quota enable and quota disable. I just
 don't recall why it wasn't that way from the very beginning of qgroups, I might
 be missing something fundamental here.
 
 Yeah I thought about 'need_ref_seq', but the point is that delayed ref blocker
 not only serves qgroups accounting, but also features based on backref
 walking, such as scrub, snapshot-aware defragment.

I think you're confusing trans->delayed_ref_elem with other callers of
btrfs_get_tree_mod_seq() and btrfs_put_tree_mod_seq(). trans->delayed_ref_elem
is only used in qgroup context, as far as my grep reaches. There are other
callers of btrfs_get_tree_mod_seq() that can put their blocker element on the
stack, such as iterate_extent_inodes().
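
The blocker elements discussed here can be pictured as registrations in a list: each caller of the get function receives a sequence number, and, roughly, the oldest registered number bounds which tree mod log entries still have to be kept. The model below is a simplified sketch; `struct seq_registry`, `get_seq()`, `put_seq()` and `min_active_seq()` are hypothetical and only loosely mirror btrfs_get_tree_mod_seq() / btrfs_put_tree_mod_seq(). Because the element is owned by the caller, it can live on the caller's stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blocker element; the caller owns it, e.g. on its stack. */
struct seq_elem {
	uint64_t seq;              /* 0 while not registered */
	struct seq_elem *next;
};

struct seq_registry {
	uint64_t next_seq;         /* start at 1 so that 0 can mean "unset" */
	struct seq_elem *active;   /* singly linked list of registered blockers */
};

void get_seq(struct seq_registry *r, struct seq_elem *e)
{
	e->seq = r->next_seq++;
	e->next = r->active;
	r->active = e;
}

void put_seq(struct seq_registry *r, struct seq_elem *e)
{
	for (struct seq_elem **pp = &r->active; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;     /* unlink the element */
			break;
		}
	}
	e->seq = 0;
	e->next = NULL;
}

/* Oldest sequence still registered (0 if none); older log entries could be pruned. */
uint64_t min_active_seq(const struct seq_registry *r)
{
	uint64_t min = 0;

	for (const struct seq_elem *e = r->active; e; e = e->next)
		if (min == 0 || e->seq < min)
			min = e->seq;
	return min;
}
```

Since every holder brings its own element, a per-transaction element like the qgroup one is just one registration among possibly many, which is the distinction drawn in the mail.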

But I still might be missing something.

Thanks,
-Jan


Re: [PATCH v2] Btrfs: add missing error check to find_parent_nodes

2013-07-31 Thread Jan Schmidt
On Wed, July 31, 2013 at 01:26 (+0200), Filipe David Borba Manana wrote:
 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
 ---
 
 V2: Ensure extent buffer is freed on error.
 
  fs/btrfs/backref.c |4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
 index 8bc5e8c..980e85a 100644
 --- a/fs/btrfs/backref.c
 +++ b/fs/btrfs/backref.c
 @@ -935,8 +935,10 @@ again:
   }
   ret = find_extent_in_eb(eb, bytenr,
   *extent_item_pos, &eie);
 - ref->inode_list = eie;
   free_extent_buffer(eb);
 + if (ret < 0)
 + goto out;
 + ref->inode_list = eie;
   }
   ret = ulist_add_merge(refs, ref->parent,
 (uintptr_t)ref->inode_list,
 

The only ret < 0 I'm seeing is ENOMEM, so that should be safe.

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net

Thanks,
-Jan


Re: Cloning a Btrfs partition

2013-07-30 Thread Jan Schmidt
On Mon, July 29, 2013 at 17:32 (+0200), BJ Quinn wrote:
 Thanks for the response!  Not sure I want to roll a custom kernel on this
 particular system.  Any idea on when it might make it to 3.10 stable or 
 3.11?  Or should I just revert back to 3.9?

I missed that it's in fact already in 3.11, and if I understood Liu Bo
correctly, he's going to send it to 3.10 stable soon.

Thanks,
-Jan

 Thanks!
 
 -BJ
 
 - Original Message - 
 
 From: Jan Schmidt list.bt...@jan-o-sch.net 
 Sent: Monday, July 29, 2013 3:21:51 AM 
 
 Hi BJ, 
 
 [original message rewrapped] 
 
 On Thu, July 25, 2013 at 18:32 (+0200), BJ Quinn wrote: 
 (Apologies for the double post -- forgot to send as plain text the first time
 around, so the list rejected it.)

 I see that there's now a btrfs send / receive and I've tried using it, but 
 I'm getting the oops I've pasted below, after which the FS becomes 
 unresponsive (no I/O to the drive, no CPU usage, but all attempts to access
 the FS result in a hang). I have an internal drive (single drive) that
 contains 82GB of compressed data with a couple hundred snapshots. I tried 
 taking the first snapshot and making a read only copy (btrfs subvolume 
 snapshot -r) and then I connected an external USB drive and ran btrfs send / 
 receive to that external drive. It starts working and gets a couple of GB in 
 (I'd expect the first snapshot to be about 20GB) and then gets the following 
 error. I had to use the latest copy of btrfs-progs from git, because the 
 package installed on my system (btrfs-progs-0.20-0.2.git91d9eec) simply 
 returned invalid argument when trying to run btrfs send / receive. Thanks 
 in advance for any info you may have. 
 
 The problem was introduced with rbtree ulists in 3.10, by commit
 
 Btrfs: add a rb_tree to improve performance of ulist search 
 
 It should be safe to revert that commit; it's a performance optimization
 attempt. Alternatively, you can apply the published fix
 
 Btrfs: fix crash regarding to ulist_add_merge 
 
 It has not made it into 3.10 stable or 3.11 yet, but it is contained in Josef's
 btrfs-next 
 
 git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
 
 Thanks, 
 -Jan 
 


Re: Fwd: Cloning a Btrfs partition

2013-07-29 Thread Jan Schmidt
Hi BJ,

[original message rewrapped]

On Thu, July 25, 2013 at 18:32 (+0200), BJ Quinn wrote:
 (Apologies for the double post -- forgot to send as plain text the first time
 around, so the list rejected it.)
 
 I see that there's now a btrfs send / receive and I've tried using it, but
 I'm getting the oops I've pasted below, after which the FS becomes
 unresponsive (no I/O to the drive, no CPU usage, but all attempts to access
 the FS result in a hang). I have an internal drive (single drive) that
 contains 82GB of compressed data with a couple hundred snapshots. I tried
 taking the first snapshot and making a read only copy (btrfs subvolume
 snapshot -r) and then I connected an external USB drive and ran btrfs send /
 receive to that external drive. It starts working and gets a couple of GB in
 (I'd expect the first snapshot to be about 20GB) and then gets the following
 error. I had to use the latest copy of btrfs-progs from git, because the
 package installed on my system (btrfs-progs-0.20-0.2.git91d9eec) simply
 returned invalid argument when trying to run btrfs send / receive. Thanks
 in advance for any info you may have.

The problem was introduced with rbtree ulists in 3.10, by commit

Btrfs: add a rb_tree to improve performance of ulist search

It should be safe to revert that commit; it's a performance optimization
attempt. Alternatively, you can apply the published fix

Btrfs: fix crash regarding to ulist_add_merge

It has not made it into 3.10 stable or 3.11 yet, but it is contained in Josef's
btrfs-next

git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

Thanks,
-Jan


Re: [PATCH v3] Btrfs: fix crash regarding to ulist_add_merge

2013-07-29 Thread Jan Schmidt
On Fri, June 28, 2013 at 06:37 (+0200), Liu Bo wrote:
 Several users reported this crash, either a NULL pointer dereference or a
 general protection fault. The story is that we added an rbtree to speed up
 ulist iteration, and we use krealloc() to handle ulist growth. krealloc()
 uses memcpy to copy the old data to the new memory area, which is OK for an
 array because its elements hold no pointers, but not for an rbtree, whose
 copied node pointers still refer to the old memory.
 
 So krealloc() messes up our rbtree and we end up with a crash.
 
 Reviewed-by: Wang Shilong wangsl-f...@cn.fujitsu.com
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
 v3: fix a return value problem (thanks Wang Shilong).
 v2: fix a use-after-free bug and a typo (thanks Zach and Josef).
 
  fs/btrfs/ulist.c |   15 +++
  1 files changed, 15 insertions(+), 0 deletions(-)
 
 diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c
 index 7b417e2..b0a523b2 100644
 --- a/fs/btrfs/ulist.c
 +++ b/fs/btrfs/ulist.c
 @@ -205,6 +205,10 @@ int ulist_add_merge(struct ulist *ulist, u64 val, u64 aux,
   u64 new_alloced = ulist->nodes_alloced + 128;
   struct ulist_node *new_nodes;
   void *old = NULL;
 + int i;
 +
 + for (i = 0; i < ulist->nnodes; i++)
 + rb_erase(&ulist->nodes[i].rb_node, &ulist->root);
  
   /*
* if nodes_alloced == ULIST_SIZE no memory has been allocated
 @@ -224,6 +228,17 @@ int ulist_add_merge(struct ulist *ulist, u64 val, u64 aux,
  
   ulist->nodes = new_nodes;
   ulist->nodes_alloced = new_alloced;
 +
 + /*
 +  * krealloc actually uses memcpy, which does not copy rb_node
 +  * pointers, so we have to do it ourselves.  Otherwise we may
 +  * be bitten by crashes.
 +  */
 + for (i = 0; i < ulist->nnodes; i++) {
 + ret = ulist_rbtree_insert(ulist, &ulist->nodes[i]);
 + if (ret < 0)
 + return ret;
 + }
   }
   ulist->nodes[ulist->nnodes].val = val;
   ulist->nodes[ulist->nnodes].aux = aux;
 

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net
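For readers of the archive: the failure mode in the commit message can be reproduced outside the kernel. Any structure whose elements hold pointers to other elements of the same array is left with pointers into the old buffer once a realloc-style resize moves it. The sketch below is a hypothetical userspace model, with a `next` link standing in for the rb_node links and realloc() standing in for krealloc(); like the patch, it rebuilds the links after every resize instead of trusting the copied pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ulist_node: 'next' plays the role of
 * the rb_node links, which point at other elements of the same array. */
struct node {
	unsigned long long val;
	struct node *next;
};

/* Hypothetical stand-in for struct ulist. */
struct list {
	struct node *nodes;
	size_t nnodes;
	size_t alloced;
	struct node *head;	/* plays the role of the rb-tree root */
};

/* Rebuild every link from scratch, in array order; this corresponds to
 * re-inserting each node into the rb-tree after krealloc(). */
static void relink(struct list *l)
{
	size_t i;

	l->head = l->nnodes ? &l->nodes[0] : NULL;
	for (i = 0; i < l->nnodes; i++)
		l->nodes[i].next = (i + 1 < l->nnodes) ? &l->nodes[i + 1] : NULL;
}

/* Append a value, growing the array when it is full. Returns 0 or -1. */
static int list_add(struct list *l, unsigned long long val)
{
	if (l->nnodes == l->alloced) {
		size_t new_alloced = l->alloced + 4;
		struct node *n = realloc(l->nodes, new_alloced * sizeof(*n));

		if (!n)
			return -1;
		l->nodes = n;
		l->alloced = new_alloced;
		/* The moved elements still carry links into the old buffer,
		 * which realloc() may have freed, so rebuild them. */
		relink(l);
	}
	assert(l->nnodes < l->alloced);
	l->nodes[l->nnodes].val = val;
	l->nodes[l->nnodes].next = NULL;
	if (l->nnodes > 0)
		l->nodes[l->nnodes - 1].next = &l->nodes[l->nnodes];
	else
		l->head = &l->nodes[0];
	l->nnodes++;
	return 0;
}

/* Follow the links and sum the values; only valid if the links are. */
static unsigned long long list_sum(const struct list *l)
{
	unsigned long long sum = 0;
	const struct node *n;

	for (n = l->head; n; n = n->next)
		sum += n->val;
	return sum;
}
```

The patch does the equivalent by calling rb_erase() on every node before krealloc() and re-inserting each node with ulist_rbtree_insert() afterwards.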

Josef, how about sending this one for the next 3.11 rc and to 3.10 stable? Any
objections?

-Jan


[PATCH v2 0/2] xfstest btrfs/316: test send / receive (was: btrfs/314)

2013-07-24 Thread Jan Schmidt
From: root root@zarzz.(none)

These two patches add the announced tests for btrfs send / receive. As
requested, the fssum tool is now included.

One drawback is that I don't know how to edit configure.ac, or whatever else
needs to be modified, in the way autotools prefers. Any hints are appreciated,
preferably hints containing all the modifications required to introduce
something like HAVE_SEEK_HOLE.

I do not want to make modifications to fssum.c here. If that is absolutely
required (one /could/ get along using linux/fs.h, which is not the way I
would like to go), I'd like to have the same change made in the far-progs
repository, where fssum.c comes from.
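For illustration only: one way to compile such code whether or not the headers define SEEK_DATA, without any configure check, is a plain preprocessor guard plus a runtime fallback that treats the whole file as data. This is a sketch under those assumptions; `next_data_offset()` is a hypothetical helper, not code from fssum.c, and it does not answer how autotools should detect the feature.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc exposes SEEK_DATA/SEEK_HOLE with this */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the offset of the next data region at or after 'off', or -1 if
 * there is no more data (ENXIO) or on another error. Without SEEK_DATA,
 * fall back to treating every byte of the file as data.
 */
static off_t next_data_offset(int fd, off_t off)
{
#ifdef SEEK_DATA
	off_t ret = lseek(fd, off, SEEK_DATA);

	if (ret >= 0 || errno != EINVAL)
		return ret;
	/* EINVAL: assume the file system does not support SEEK_DATA */
#endif
	{
		off_t end = lseek(fd, 0, SEEK_END);

		if (end < 0 || off >= end)
			return -1;
		return off;
	}
}
```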

Jan Schmidt (2):
  xfstests: add fssum tool
  xfstests btrfs/316: test send / receive

 .gitignore  |1 +
 README  |3 +
 common/config   |2 +
 src/Makefile|   11 +-
 src/fssum.c |  819 +++
 tests/btrfs/316 |  113 +++
 tests/btrfs/316.out |4 +
 tests/btrfs/group   |1 +
 8 files changed, 953 insertions(+), 1 deletions(-)
 create mode 100644 src/fssum.c
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out



[PATCH v2 2/2] xfstests btrfs/316: test send / receive

2013-07-24 Thread Jan Schmidt
Basic send / receive functionality test for btrfs. Requires a current
build of fsstress (with -x support). Relies on the fssum tool, but skips
the test if fssum failed to build.

Signed-off-by: Jan Schmidt list@jan-o-sch.net
---
 README  |3 +
 tests/btrfs/316 |  113 +++
 tests/btrfs/316.out |4 ++
 tests/btrfs/group   |1 +
 4 files changed, 121 insertions(+), 0 deletions(-)
 create mode 100755 tests/btrfs/316
 create mode 100644 tests/btrfs/316.out

diff --git a/README b/README
index a49ca7c..d287f63 100644
--- a/README
+++ b/README
@@ -26,6 +26,9 @@ Preparing system for tests (IRIX and Linux):
   http://www.extra.research.philips.com/udf/, then copy the udf_test 
   binary to xfstests/src/. If you wish to disable UDF verification test
   set the environment variable DISABLE_UDF_TEST to 1.
+- If you wish to run the btrfs send / receive components of the suite
+  install fssum from
+git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git

 
 - create one or two partitions to use for testing
diff --git a/tests/btrfs/316 b/tests/btrfs/316
new file mode 100755
index 000..2e86428
--- /dev/null
+++ b/tests/btrfs/316
@@ -0,0 +1,113 @@
+#! /bin/bash
+# FSQA Test No. 316
+#
+# Run fsstress to create a reasonably strange file system, make a
+# snapshot (base) and run more fsstress. Then take another snapshot
+# (incr) and send both snapshots to a temp file. Remake the file
+# system and receive from the files. Check both states with fssum.
+#
+#---
+# Copyright (C) 2013 STRATO.  All rights reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#---
+#
+# creator
+owner=list.bt...@jan-o-sch.net
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo QA output created by $seq
+
+here=`pwd`
+tmp=`mktemp -d`
+status=1
+
+_cleanup()
+{
+   echo "*** unmount"
+   umount $SCRATCH_MNT 2>/dev/null
+   rm -rf $tmp
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_command $FSSUM_PROG fssum
+
+rm -f $seqres.full
+
+workout()
+{
+   fsz=$1
+   ops=$2
+
+   umount $SCRATCH_DEV >/dev/null 2>&1
+   echo "*** mkfs -dsize=$fsz" >>$seqres.full
+   echo "" >>$seqres.full
+   _scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+   || _fail "size=$fsz mkfs failed"
+   run_check _scratch_mount -o noatime
+
+   run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n $ops $FSSTRESS_AVOID -x \
+   "$BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/base"
+
+   run_check $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/incr
+
+   echo "# $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap" \
+   >> $seqres.full
+   $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap 2>> $seqres.full \
+   || _fail "failed: '$@'"
+   echo "# $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base\
+   $SCRATCH_MNT/incr > $tmp/incr.snap" >> $seqres.full
+   $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base \
+   $SCRATCH_MNT/incr > $tmp/incr.snap 2>> $seqres.full \
+   || _fail "failed: '$@'"
+
+   run_check $FSSUM_PROG -A -f -w $tmp/base.fssum $SCRATCH_MNT/base
+   run_check $FSSUM_PROG -A -f -w $tmp/incr.fssum -x $SCRATCH_MNT/incr/base \
+   $SCRATCH_MNT/incr
+
+   umount $SCRATCH_DEV >/dev/null 2>&1
+   echo "*** mkfs -dsize=$fsz" >>$seqres.full
+   echo "" >>$seqres.full
+   _scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+   || _fail "size=$fsz mkfs failed"
+   run_check _scratch_mount -o noatime
+
+   run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT < $tmp/base.snap
+   run_check $FSSUM_PROG -r $tmp/base.fssum $SCRATCH_MNT/base
+
+   run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT < $tmp/incr.snap
+   run_check $FSSUM_PROG -r $tmp/incr.fssum $SCRATCH_MNT/incr
+}
+
+echo "*** test send / receive"
+
+fssize=`expr 2000 \* 1024 \* 1024`
+ops=200
+
+workout $fssize $ops
+
+echo "*** done"
+status=0
+exit

Re: [PATCH] Btrfs: fix extent buffer leak after backref walking

2013-07-03 Thread Jan Schmidt
On Wed, July 03, 2013 at 08:40 (+0200), Liu Bo wrote:
 commit 47fb091fb787420cd195e66f162737401cce023f (Btrfs: fix unlock after free
 on rewinded tree blocks) takes an extra reference on the allocated dummy
 extent buffer, so we can no longer free this dummy buffer and end up with an
 extent buffer leak.
 
 Signed-off-by: Liu Bo bo.li@oracle.com

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net
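A generic model of the imbalance described in the commit message: every get needs exactly one matching put, and one duplicated get leaves the count above zero after all the expected puts, so the buffer is never released. The struct and function names below are hypothetical; the real extent buffer code involves more than a plain counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer (not struct extent_buffer). */
struct buf {
	int refs;
};

static int nr_freed;	/* number of buffers actually released */

static struct buf *buf_alloc(void)
{
	struct buf *b = malloc(sizeof(*b));

	if (b)
		b->refs = 1;	/* the allocation itself holds one reference */
	return b;
}

static void buf_get(struct buf *b)
{
	b->refs++;
}

static void buf_put(struct buf *b)
{
	assert(b->refs > 0);
	if (--b->refs == 0) {
		free(b);
		nr_freed++;
	}
}

/*
 * Allocate a buffer, take 'gets' extra references, then drop 'puts'
 * references ('puts' must not exceed gets + 1). Returns 1 if the buffer
 * ended up freed, 0 if it would have leaked; in that case the sketch
 * frees it itself so the demo stays clean.
 */
static int ends_up_freed(int gets, int puts)
{
	struct buf *b = buf_alloc();
	int before = nr_freed;
	int i;

	assert(b != NULL);
	for (i = 0; i < gets; i++)
		buf_get(b);
	for (i = 0; i < puts; i++)
		buf_put(b);
	if (nr_freed != before)
		return 1;
	free(b);	/* this is where the leak would have been */
	return 0;
}
```

With one get matched by two puts (one for the allocation, one for the get) the buffer is released; a duplicated get with the same puts leaks it.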

 ---
  fs/btrfs/ctree.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 02fae7f..3d790b4 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -1268,12 +1268,12 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
   BUG_ON(!eb_rewin);
   }
  
 - extent_buffer_get(eb_rewin);
   btrfs_tree_read_unlock(eb);
   free_extent_buffer(eb);
  
   extent_buffer_get(eb_rewin);
   btrfs_tree_read_lock(eb_rewin);
 +
   __tree_mod_log_rewind(eb_rewin, time_seq, tm);
   WARN_ON(btrfs_header_nritems(eb_rewin) >
   BTRFS_NODEPTRS_PER_BLOCK(fs_info->tree_root));
 


Re: [PATCH] Btrfs: hold the tree mod lock in __tree_mod_log_rewind

2013-07-02 Thread Jan Schmidt
On Sun, June 30, 2013 at 15:55 (+0200), Josef Bacik wrote:
 On Sun, Jun 30, 2013 at 10:25:05AM +0200, Jan Schmidt wrote:
 On 30.06.2013 05:17, Josef Bacik wrote:
 We need to hold the tree mod log lock in __tree_mod_log_rewind since we walk
 forward in the tree mod entries, otherwise we'll end up with random entries and
 trip the BUG_ON() at the front of __tree_mod_log_rewind.  This fixes the panics
 people were seeing when running

 find /whatever -type f -exec btrfs fi defrag {} \;

 This patch cannot help to solve the problem, as far as I've understood
 what is going on. It does change timing, though, which presumably makes
 it pass the current reproducer we're having.

 On rewinding, iteration through the tree mod log rb-tree goes backwards
 in time, which means that once we've found our starting point we cannot
 be trapped by later additions. The old items we're rewinding towards
 cannot be freed, because we've allocated a blocker element within the
 tree and rewinding never goes beyond the allocated blocker. The blocker
 element is allocated by btrfs_get_tree_mod_seq and mostly referred to as
 time_seq within the other tree mod log functions in ctree.c. To sum up,
 the added lock is not required.

 The debug output I've analyzed so far shows that after we've rewound
 all REMOVE_WHILE_FREEING operations on a buffer, ordered consecutively
 as expected, there comes another REMOVE_WHILE_FREEING with a sequence
 number much further in the past for the same buffer (but that sequence
 number is still higher than our time_seq rewind barrier at that point).
 This must be a logical problem I've not completely understood so far,
 but locking doesn't seem to be the right track.

 
 Finally reproduced it, this is my output
 
  btrfs-endio-wri-23110 [000] ...2  9556.882103: __tree_mod_log_rewind: rewinding 15450537984
  btrfs-endio-wri-23110 [000] ...2  9556.882104: __tree_mod_log_rewind: 15450537984: processing 880246590a40, op 3, seq 68719476829, slot 0
  btrfs-endio-wri-23110 [000] ...2  9556.882106: __tree_mod_log_rewind: 15450537984: processing 880246590ac0, op 3, seq 68719476828, slot 1
  btrfs-endio-wri-23110 [000] ...2  9556.882108: __tree_mod_log_rewind: 15450537984: processing 880246590a40, op 3, seq 68719476829, slot 0
  btrfs-endio-wri-23110 [000] ...2  9556.882110: __tree_mod_log_rewind: 15450537984: this tm is failing, 880246590a40, seq 68719476829, slot 0
 
 so I'm inclined to believe I've got it right.  Thanks,

Looking at the code, I agree we should have a read lock around rb_next,
protecting it against tree reorganization during insertions. That fits this
kind of debug output.

How about taking the lock only for the rb_next call? There can be quite a lot
of operations to rewind, and I'd rather not have every other fs tree
modification block on that.

Thanks,
-Jan

 Josef
 


Re: [PATCH] Btrfs: stop using GFP_ATOMIC for the tree mod log allocations

2013-07-02 Thread Jan Schmidt
On Mon, July 01, 2013 at 22:25 (+0200), Josef Bacik wrote:
 Previously we held the tree mod lock when adding stuff because we use it to
 check and see if we truly do want to track tree modifications.  This is
 admirable, but GFP_ATOMIC in a critical area that is going to get hit pretty
 hard and often is not nice.  So instead do our basic checks to see if we don't
 need to track modifications, and if those pass then do our allocation, and then
 when we go to insert the new modification check if we still care, and if we
 don't just free up our mod and return.  Otherwise we're good to go and we can
 carry on.  Thanks,
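The scheme described above (an unlocked early-out, an allocation that may sleep done outside the lock, and a re-check under the lock that frees the allocation if logging is no longer wanted) can be sketched in userspace as follows. All names are hypothetical: `logging_wanted` stands in for a non-empty tree_mod_seq_list, calloc() for kzalloc() with a sleeping GFP mask, and a linked list for the rb-tree. In the patch, the re-check is the list_empty() test inside __tree_mod_log_insert().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical logged modification (stand-in for struct tree_mod_elem). */
struct mod {
	int slot;
	struct mod *next;
};

struct modlog {
	pthread_mutex_t lock;		/* protects 'head' and 'nr' */
	atomic_bool logging_wanted;	/* stand-in for a non-empty seq list */
	struct mod *head;
	int nr;
};

static int modlog_init(struct modlog *log)
{
	atomic_init(&log->logging_wanted, false);
	log->head = NULL;
	log->nr = 0;
	return pthread_mutex_init(&log->lock, NULL);
}

static void set_logging(struct modlog *log, bool on)
{
	pthread_mutex_lock(&log->lock);
	atomic_store(&log->logging_wanted, on);
	pthread_mutex_unlock(&log->lock);
}

/* Returns 1 if the modification was logged, 0 if logging was not (or
 * no longer) wanted, -1 if the allocation failed. */
static int log_mod(struct modlog *log, int slot)
{
	struct mod *m;

	/* Cheap early-out without the lock; re-checked below. */
	if (!atomic_load(&log->logging_wanted))
		return 0;

	/* Allocate outside the lock, so a sleeping allocation is fine. */
	m = calloc(1, sizeof(*m));
	if (!m)
		return -1;
	m->slot = slot;

	pthread_mutex_lock(&log->lock);
	if (!atomic_load(&log->logging_wanted)) {
		/* Logging was switched off while we allocated: discard. */
		pthread_mutex_unlock(&log->lock);
		free(m);
		return 0;
	}
	m->next = log->head;
	log->head = m;
	log->nr++;
	pthread_mutex_unlock(&log->lock);
	return 1;
}

static void modlog_free(struct modlog *log)
{
	while (log->head) {
		struct mod *m = log->head;

		log->head = m->next;
		free(m);
	}
	log->nr = 0;
	pthread_mutex_destroy(&log->lock);
}
```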

I'd like to look at a side-by-side diff of that patch in my editor. However, it
does not apply to your current master branch, and git even refuses to try a
3-way merge because the "Repository lacks necessary blobs". Can you please push
something?

Thanks,
-Jan

 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  fs/btrfs/ctree.c |  161 ++
  1 files changed, 54 insertions(+), 107 deletions(-)
 
 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 127e1fd..fff08f9 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -484,8 +484,27 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
   struct rb_node **new;
   struct rb_node *parent = NULL;
   struct tree_mod_elem *cur;
 + int ret = 0;
 +
 + BUG_ON(!tm);
 +
 + tree_mod_log_write_lock(fs_info);
 + if (list_empty(&fs_info->tree_mod_seq_list)) {
 + tree_mod_log_write_unlock(fs_info);
 + /*
 +  * Ok we no longer care about logging modifications, free up tm
 +  * and return 0.  Any callers shouldn't be using tm after
 +  * calling tree_mod_log_insert, but if they do we can just
 +  * change this to return a special error code to let the callers
 +  * do their own thing.
 +  */
 + kfree(tm);
 + return 0;
 + }
  
 - BUG_ON(!tm || !tm->seq);
 + spin_lock(&fs_info->tree_mod_seq_lock);
 + tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
 + spin_unlock(&fs_info->tree_mod_seq_lock);
  
   tm_root = &fs_info->tree_mod_log;
   new = &tm_root->rb_node;
 @@ -501,14 +520,17 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
   else if (cur->seq > tm->seq)
   new = &((*new)->rb_right);
   else {
 + ret = -EEXIST;
   kfree(tm);
 - return -EEXIST;
 + goto out;
   }
   }
  
   rb_link_node(&tm->node, parent, new);
   rb_insert_color(&tm->node, tm_root);
 - return 0;
 +out:
 + tree_mod_log_write_unlock(fs_info);
 + return ret;
  }
  
  /*
 @@ -524,55 +546,17 @@ static inline int tree_mod_dont_log(struct btrfs_fs_info *fs_info,
   return 1;
   if (eb && btrfs_header_level(eb) == 0)
   return 1;
 -
 - tree_mod_log_write_lock(fs_info);
 - if (list_empty(&fs_info->tree_mod_seq_list)) {
 - /*
 -  * someone emptied the list while we were waiting for the lock.
 -  * we must not add to the list when no blocker exists.
 -  */
 - tree_mod_log_write_unlock(fs_info);
 - return 1;
 - }
 -
   return 0;
  }
  
 -/*
 - * This allocates memory and gets a tree modification sequence number.
 - *
 - * Returns <0 on error.
 - * Returns >0 (the added sequence number) on success.
 - */
 -static inline struct tree_mod_elem *
 -tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags)
 -{
 - struct tree_mod_elem *tm;
 -
 - /*
 -  * once we switch from spin locks to something different, we should
 -  * honor the flags parameter here.
 -  */
 - tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
 - if (!tm)
 - return NULL;
 -
 - spin_lock(&fs_info->tree_mod_seq_lock);
 - tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
 - spin_unlock(&fs_info->tree_mod_seq_lock);
 -
 - return tm;
 -}
 -
  static inline int
  __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
 struct extent_buffer *eb, int slot,
 enum mod_log_op op, gfp_t flags)
  {
 - int ret;
   struct tree_mod_elem *tm;
  
 - tm = tree_mod_alloc(fs_info, flags);
 + tm = kzalloc(sizeof(*tm), flags);
   if (!tm)
   return -ENOMEM;
  
 @@ -589,34 +573,14 @@ __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
  }
  
  static noinline int
 -tree_mod_log_insert_key_mask(struct btrfs_fs_info *fs_info,
 -  struct extent_buffer *eb, int slot,
 -  enum mod_log_op op, gfp_t flags)
 +tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
 + struct extent_buffer *eb, int slot,
 + 

Re: [PATCH] Btrfs: only do the tree_mod_log_free_eb if this is our last ref

2013-07-02 Thread Jan Schmidt
(resent to list)

On Mon, July 01, 2013 at 22:12 (+0200), Josef Bacik wrote:
 There is another bug in the tree mod log stuff in that we're calling
 tree_mod_log_free_eb every single time a block is cow'ed.  The problem with this
 is that if this block is shared by multiple snapshots we will call this multiple
 times per block, so if we go to rewind the mod log for this block we'll BUG_ON()
 in __tree_mod_log_rewind because we try to rewind a free twice.  We only want to
 call tree_mod_log_free_eb if we are actually freeing the block.  With this patch
 I no longer hit the panic in __tree_mod_log_rewind.  Thanks,
 
 Signed-off-by: Josef Bacik jba...@fusionio.com

Strange that this never really showed up widely so far; it should be quite easy
to hit. Anyway,

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net
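The rule the patch enforces, logging the free of a shared block only when its last reference goes away rather than on every COW, can be shown with a small counter model. The names are hypothetical, and the model tracks nothing but the reference count and how many frees were logged for the block.

```c
#include <assert.h>

/* Hypothetical model of a tree block shared by several snapshots. */
struct block {
	int refs;		/* number of trees referencing the block */
	int frees_logged;	/* how often a "free" was recorded in the log */
};

/*
 * COW the block away from one tree: drop that tree's reference. With
 * only_on_last_ref set (the patched behaviour) the free is logged only
 * when the last reference goes away. Returns 1 if the block was freed.
 */
static int cow_drop_ref(struct block *b, int only_on_last_ref)
{
	int last_ref;

	assert(b->refs > 0);
	last_ref = (--b->refs == 0);
	if (last_ref || !only_on_last_ref)
		b->frees_logged++;
	return last_ref;
}
```

With three snapshots sharing the block, the old behaviour logs three frees for a single block, which is the repeated free a later rewind trips over.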

 ---
  fs/btrfs/ctree.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 32e30ad..127e1fd 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -1093,7 +1093,8 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
   btrfs_set_node_ptr_generation(parent, parent_slot,
   trans->transid);
   btrfs_mark_buffer_dirty(parent);
 - tree_mod_log_free_eb(root->fs_info, buf);
 + if (last_ref)
 + tree_mod_log_free_eb(root->fs_info, buf);
   btrfs_free_tree_block(trans, root, buf, parent_start,
 last_ref);
   }
 


Re: [PATCH] Btrfs: hold the tree mod lock in __tree_mod_log_rewind

2013-06-30 Thread Jan Schmidt
On 30.06.2013 05:17, Josef Bacik wrote:
 We need to hold the tree mod log lock in __tree_mod_log_rewind since we walk
 forward in the tree mod entries, otherwise we'll end up with random entries and
 trip the BUG_ON() at the front of __tree_mod_log_rewind.  This fixes the panics
 people were seeing when running
 
 find /whatever -type f -exec btrfs fi defrag {} \;

This patch cannot help to solve the problem, as far as I've understood
what is going on. It does change timing, though, which presumably makes
it pass the current reproducer we're having.

On rewinding, iteration through the tree mod log rb-tree goes backwards
in time, which means that once we've found our starting point we cannot
be trapped by later additions. The old items we're rewinding towards
cannot be freed, because we've allocated a blocker element within the
tree and rewinding never goes beyond the allocated blocker. The blocker
element is allocated by btrfs_get_tree_mod_seq and mostly referred to as
time_seq within the other tree mod log functions in ctree.c. To sum up,
the added lock is not required.

The debug output I've analyzed so far shows that after we've rewound
all REMOVE_WHILE_FREEING operations on a buffer, ordered consecutively
as expected, there comes another REMOVE_WHILE_FREEING with a sequence
number much further in the past for the same buffer (but that sequence
number is still higher than our time_seq rewind barrier at that point).
This must be a logical problem I've not completely understood so far,
but locking doesn't seem to be the right track.
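The invariant described above can be modelled in a few lines. In this hypothetical userspace model (fixed-size arrays, no locking, no per-block filtering), a blocker takes a sequence number, cleanup only drops entries older than the oldest remaining blocker, and a rewind walks backwards from the newest entry and stops at its blocker's sequence number, so it never reaches an entry that cleanup is allowed to free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 64
#define MAX_BLOCKERS 8

/* Hypothetical model of the tree mod log: entry seqs in ascending order. */
struct modlog {
	uint64_t seq[MAX_ENTRIES];
	size_t nr;
	uint64_t next_seq;
	uint64_t blockers[MAX_BLOCKERS];	/* 0 marks an unused slot */
};

/* Record a modification with a fresh sequence number. */
static uint64_t log_add(struct modlog *log)
{
	assert(log->nr < MAX_ENTRIES);
	log->seq[log->nr++] = ++log->next_seq;
	return log->next_seq;
}

/* Register a blocker (compare btrfs_get_tree_mod_seq()); its seq is the
 * time_seq a later rewind stops at. */
static uint64_t get_blocker(struct modlog *log)
{
	size_t i;

	for (i = 0; i < MAX_BLOCKERS; i++) {
		if (!log->blockers[i]) {
			log->blockers[i] = ++log->next_seq;
			return log->blockers[i];
		}
	}
	assert(!"too many blockers");
	return 0;
}

/* Drop a blocker and free every entry no remaining blocker can need,
 * i.e. everything not newer than the oldest remaining blocker. */
static void put_blocker(struct modlog *log, uint64_t blocker)
{
	uint64_t min = UINT64_MAX;
	size_t i, kept = 0;

	for (i = 0; i < MAX_BLOCKERS; i++) {
		if (log->blockers[i] == blocker)
			log->blockers[i] = 0;
		else if (log->blockers[i] && log->blockers[i] < min)
			min = log->blockers[i];
	}
	for (i = 0; i < log->nr; i++)
		if (log->seq[i] > min)
			log->seq[kept++] = log->seq[i];
	log->nr = kept;
}

/* Walk backwards in time from the newest entry, undoing everything
 * newer than time_seq; returns the number of entries undone. */
static size_t rewind_to(const struct modlog *log, uint64_t time_seq)
{
	size_t n = 0, i = log->nr;

	while (i > 0 && log->seq[i - 1] > time_seq) {
		i--;
		n++;
	}
	return n;
}
```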

Thanks,
-Jan


 Thanks,
 
 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---
  fs/btrfs/ctree.c |   10 ++
  1 files changed, 6 insertions(+), 4 deletions(-)
 
 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index c32d03d..7921e1d 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -1161,8 +1161,8 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
   * time_seq).
   */
  static void
 -__tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq,
 -   struct tree_mod_elem *first_tm)
 +__tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
 +   u64 time_seq, struct tree_mod_elem *first_tm)
  {
   u32 n;
   struct rb_node *next;
 @@ -1172,6 +1172,7 @@ __tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq,
   unsigned long p_size = sizeof(struct btrfs_key_ptr);
  
   n = btrfs_header_nritems(eb);
 + tree_mod_log_read_lock(fs_info);
   while (tm && tm->seq >= time_seq) {
   /*
* all the operations are recorded with the operator used for
 @@ -1226,6 +1227,7 @@ __tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq,
   if (tm->index != first_tm->index)
   break;
   }
 + tree_mod_log_read_unlock(fs_info);
   btrfs_set_header_nritems(eb, n);
  }
  
 @@ -1274,7 +1276,7 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
  
   extent_buffer_get(eb_rewin);
   btrfs_tree_read_lock(eb_rewin);
 - __tree_mod_log_rewind(eb_rewin, time_seq, tm);
 + __tree_mod_log_rewind(fs_info, eb_rewin, time_seq, tm);
   WARN_ON(btrfs_header_nritems(eb_rewin) >
   BTRFS_NODEPTRS_PER_BLOCK(fs_info->tree_root));
  
 @@ -1350,7 +1352,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
   btrfs_set_header_generation(eb, old_generation);
   }
   if (tm)
 - __tree_mod_log_rewind(eb, time_seq, tm);
 + __tree_mod_log_rewind(root->fs_info, eb, time_seq, tm);
   else
   WARN_ON(btrfs_header_level(eb) != 0);
   WARN_ON(btrfs_header_nritems(eb) > BTRFS_NODEPTRS_PER_BLOCK(root));
 


Re: [PATCH 0/3] Btrfs: qgroup rescan fixes for next rc

2013-06-10 Thread Jan Schmidt
Hi Chris,

I know, Linus is turning grumpy again. I'd still feel better if we sent this
patch set for the very next rc now. Any particular objections?

-Jan

On Tue, May 28, 2013 at 17:47 (+0200), Jan Schmidt wrote:
 Here are three fixes for the new qgroup rescan feature. The first two
 are quite small; the third one is a little bigger. I thought about
 splitting that one up, but in the end I didn't find a good point to
 break it. It achieves more than one goal, I agree, but it's more or
 less a compact code change that need not be split artificially, in my
 opinion.
 
 Jan Schmidt (3):
   Btrfs: fix memory patcher through fs_info->qgroup_ulist
   Btrfs: avoid double free of fs_info->qgroup_ulist
   Btrfs: fix qgroup rescan resume on mount
 
  fs/btrfs/ctree.h   |2 +
  fs/btrfs/disk-io.c |2 +
  fs/btrfs/qgroup.c  |  198 +---
  3 files changed, 131 insertions(+), 71 deletions(-)
 
 


Re: [PATCH] xfstests btrfs/314: test send / receive

2013-06-07 Thread Jan Schmidt
(cc Arne for far-progs discussion)

On Thu, June 06, 2013 at 19:54 (+0200), Eric Sandeen wrote:
 On 6/6/13 10:20 AM, Jan Schmidt wrote:
 Basic send / receive functionality test for btrfs. Requires current
 version of fsstress built (-x support). Relies on fssum tool, which is
 not part of the test suite but can skip the test if it is missing.

 Signed-off-by: Jan Schmidt list@jan-o-sch.net
 
 w/o commenting on the test itself, I'm a little uneasy about requiring
 some external, not-widely-installed tool for this to run.  The fear is
 that it won't be run as often as it could/should be.

The main purpose is to have it run by developers changing something around btrfs
send / receive and probably the backref walker (there is a separate test for
backrefs that does not require fssum). I think we can get them to install
fssum.

 Could the same test be done w/o fssum, or should we maybe put a copy
 of fssum into xfstests/src/fssum.c ?

I don't know any adequate replacement for fssum in this case. The purpose is to
build a checksum for a whole file system tree, including data and, in part,
metadata.
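To give an idea of what such a tool computes, here is a much-simplified, hypothetical whole-tree checksum. It walks a directory with nftw(), hashes each regular file's relative path and contents with 64-bit FNV-1a, and adds up the per-file hashes so the result does not depend on traversal order. It is not fssum's algorithm or output format, and unlike fssum it covers no metadata at all.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* nftw(), mkdtemp() */
#endif
#include <assert.h>
#include <ftw.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static uint64_t tree_sum;	/* accumulated by visit() */
static size_t root_len;		/* length of the root path prefix */

/* FNV-1a over 'len' bytes, continuing from hash value 'h'. */
static uint64_t fnv1a(uint64_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		h ^= *p++;
		h *= 1099511628211ULL;	/* 64-bit FNV prime */
	}
	return h;
}

/* nftw() callback: hash one regular file's relative path and data. */
static int visit(const char *path, const struct stat *st, int type,
		 struct FTW *ftwbuf)
{
	uint64_t h = 14695981039346656037ULL;	/* 64-bit FNV offset basis */
	unsigned char buf[4096];
	size_t n;
	FILE *f;

	(void)ftwbuf;
	if (type != FTW_F || !S_ISREG(st->st_mode))
		return 0;	/* this sketch ignores everything but files */
	/* Hash the path relative to the root (including its NUL), so two
	 * copies of a tree under different mount points compare equal. */
	h = fnv1a(h, path + root_len, strlen(path + root_len) + 1);
	f = fopen(path, "rb");
	if (!f)
		return -1;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		h = fnv1a(h, buf, n);
	if (ferror(f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	tree_sum += h;	/* addition makes the result order-independent */
	return 0;
}

/* Store the checksum of the tree below 'root' in *sum; 0 on success. */
static int tree_checksum(const char *root, uint64_t *sum)
{
	tree_sum = 0;
	root_len = strlen(root);
	if (nftw(root, visit, 16, FTW_PHYS) != 0)
		return -1;
	*sum = tree_sum;
	return 0;
}
```

Computing tree_checksum() on the source subvolume and again on the received copy, then comparing the two values, is the kind of check the test performs with fssum.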

I don't feel like copying fssum from far-progs into xfstests, though it probably
won't hurt much. However, I cannot promise we won't make changes to it for
far-progs, which could create two incompatible versions of fssum in the wild.
Arne?

 Or does fssum exist in any standard distro package?

It doesn't. Perhaps Josef can hurry and make a Fedora package for it, if that
prevents a separate copy to xfstests :-)

Thanks,
-Jan

 Thanks,
 -Eric
 
 ---
  README  |3 +
  common/config   |2 +
  tests/btrfs/314 |  113 +++
  tests/btrfs/314.out |4 ++
  tests/btrfs/group   |1 +
  5 files changed, 123 insertions(+), 0 deletions(-)
  create mode 100755 tests/btrfs/314
  create mode 100644 tests/btrfs/314.out

 diff --git a/README b/README
 index d4d4f31..56b31f0 100644
 --- a/README
 +++ b/README
 @@ -26,6 +26,9 @@ Preparing system for tests (IRIX and Linux):
http://www.extra.research.philips.com/udf/, then copy the udf_test 
binary to xfstests/src/. If you wish to disable UDF verification test
set the environment variable DISABLE_UDF_TEST to 1.
 +- If you wish to run the btrfs send / receive components of the suite
 +  install fssum from
 +git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git
  
  
  - create one or two partitions to use for testing
 diff --git a/common/config b/common/config
 index 67c1498..1c11da3 100644
 --- a/common/config
 +++ b/common/config
 @@ -146,6 +146,8 @@ export SED_PROG=`set_prog_path sed`
  export BC_PROG=`set_prog_path bc`
  [ "$BC_PROG" = "" ] && _fatal "bc not found"
  
 +export FSSUM_PROG=`set_prog_path fssum`
 +
  export PS_ALL_FLAGS=-ef
  
  export DF_PROG=`set_prog_path df`
 diff --git a/tests/btrfs/314 b/tests/btrfs/314
 new file mode 100755
 index 000..2e86428
 --- /dev/null
 +++ b/tests/btrfs/314
 @@ -0,0 +1,113 @@
 +#! /bin/bash
 +# FSQA Test No. 314
 +#
 +# Run fsstress to create a reasonably strange file system, make a
 +# snapshot (base) and run more fsstress. Then take another snapshot
 +# (incr) and send both snapshots to a temp file. Remake the file
 +# system and receive from the files. Check both states with fssum.
 +#
 +#---
 +# Copyright (C) 2013 STRATO.  All rights reserved.
 +#
 +# This program is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU General Public License as
 +# published by the Free Software Foundation.
 +#
 +# This program is distributed in the hope that it would be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +# GNU General Public License for more details.
 +#
 +# You should have received a copy of the GNU General Public License
 +# along with this program; if not, write the Free Software Foundation,
 +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 +#
 +#---
 +#
 +# creator
 +owner=list.bt...@jan-o-sch.net
 +
 +seq=`basename $0`
 +seqres=$RESULT_DIR/$seq
 +echo QA output created by $seq
 +
 +here=`pwd`
 +tmp=`mktemp -d`
 +status=1
 +
 +_cleanup()
 +{
 +echo *** unmount
 +umount $SCRATCH_MNT 2/dev/null
 +rm -f $tmp.*
 +}
 +trap _cleanup; exit \$status 0 1 2 3 15
 +
 +# get standard environment, filters and checks
 +. ./common/rc
 +. ./common/filter
 +
 +# real QA test starts here
 +_need_to_be_root
 +_supported_fs btrfs
 +_supported_os Linux
 +_require_scratch
 +_require_command $FSSUM_PROG fssum
 +
 +rm -f $seqres.full
 +
 +workout()
 +{
 +fsz=$1
 +ops=$2
 +
 +umount $SCRATCH_DEV /dev/null 21
 +echo *** mkfs -dsize=$fsz$seqres.full
 +echo  $seqres.full

Re: [PATCH] xfstests btrfs/314: test send / receive

2013-06-07 Thread Jan Schmidt
On Fri, June 07, 2013 at 16:51 (+0200), Arne Jansen wrote:
 On 07.06.2013 16:50, Eric Sandeen wrote:
 On 6/7/13 5:29 AM, Dave Chinner wrote:
 On Fri, Jun 07, 2013 at 09:18:58AM +0200, Jan Schmidt wrote:
 (cc Arne for far-progs discussion)

 On Thu, June 06, 2013 at 19:54 (+0200), Eric Sandeen wrote:
 On 6/6/13 10:20 AM, Jan Schmidt wrote:
 Basic send / receive functionality test for btrfs. Requires a current
 version of fsstress (built with -x support). Relies on the fssum tool,
 which is not part of the test suite; the test is skipped if fssum is missing.

 Signed-off-by: Jan Schmidt list@jan-o-sch.net

 w/o commenting on the test itself, I'm a little uneasy about requiring
 some external, not-widely-installed tool for this to run.  The fear is
 that it won't be run as often as it could/should be.

 The main purpose is to have it run by developers changing something around
 btrfs send / receive and probably the backref walker (while there exists a
 separate test not requiring fssum for backrefs). I think we can get them to
 install fssum.

 There's no point in having tests that require you to go find
 something else before the tests can be run. That's been tried
 before, and it doesn't work - the test just won't get run by
 the majority of people who run xfstests.

 Could the same test be done w/o fssum, or should we maybe put a copy
 of fssum into xfstests/src/fssum.c ?

 I don't know any adequate replacement for fssum in this case. The purpose is
 to build a checksum for a whole file system tree, including data and part of
 the metadata.

 I don't feel like copying fssum from far-progs into xfstests, though it
 probably won't hurt much. However, I cannot promise we won't make changes to
 it for far-progs, probably creating two incompatible versions of fssum in
 the wild. Arne?

 Or does fssum exist in any standard distro package?

 It doesn't. Perhaps Josef can hurry and make a Fedora package for it, if that
 prevents a separate copy in xfstests :-)

 No, it doesn't. Packages would be needed for debian, suse, SLES,
 RHEL, etc for that to be a useful method of distribution. Just dump
 a snapshot of the utility in the xfstests src dir so we don't have
 to care about distribution issues...

 Yup I agree with this, if it's not widely available or replaceable by more
 common tools, let's just put a snapshot in xfstests.
 
 I'm fine with that, too.

To prevent more agreement mails: I'll send a v2 including fssum.c, but probably
not today.

-Jan

 -Arne
 

 -Eric

 Cheers,

 Dave.


 
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] xfstests btrfs/314: test send / receive

2013-06-06 Thread Jan Schmidt
Basic send / receive functionality test for btrfs. Requires a current
version of fsstress (built with -x support). Relies on the fssum tool,
which is not part of the test suite; the test is skipped if fssum is missing.

Signed-off-by: Jan Schmidt list@jan-o-sch.net
---
 README  |3 +
 common/config   |2 +
 tests/btrfs/314 |  113 +++
 tests/btrfs/314.out |4 ++
 tests/btrfs/group   |1 +
 5 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100755 tests/btrfs/314
 create mode 100644 tests/btrfs/314.out

diff --git a/README b/README
index d4d4f31..56b31f0 100644
--- a/README
+++ b/README
@@ -26,6 +26,9 @@ Preparing system for tests (IRIX and Linux):
   http://www.extra.research.philips.com/udf/, then copy the udf_test 
   binary to xfstests/src/. If you wish to disable UDF verification test
   set the environment variable DISABLE_UDF_TEST to 1.
+- If you wish to run the btrfs send / receive components of the suite
+  install fssum from
+git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git

 
 - create one or two partitions to use for testing
diff --git a/common/config b/common/config
index 67c1498..1c11da3 100644
--- a/common/config
+++ b/common/config
@@ -146,6 +146,8 @@ export SED_PROG=`set_prog_path sed`
 export BC_PROG=`set_prog_path bc`
 [ "$BC_PROG" = "" ] && _fatal "bc not found"
 
+export FSSUM_PROG=`set_prog_path fssum`
+
 export PS_ALL_FLAGS=-ef
 
 export DF_PROG=`set_prog_path df`
diff --git a/tests/btrfs/314 b/tests/btrfs/314
new file mode 100755
index 000..2e86428
--- /dev/null
+++ b/tests/btrfs/314
@@ -0,0 +1,113 @@
+#! /bin/bash
+# FSQA Test No. 314
+#
+# Run fsstress to create a reasonably strange file system, make a
+# snapshot (base) and run more fsstress. Then take another snapshot
+# (incr) and send both snapshots to a temp file. Remake the file
+# system and receive from the files. Check both states with fssum.
+#
+#---
+# Copyright (C) 2013 STRATO.  All rights reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#---
+#
+# creator
+owner=list.bt...@jan-o-sch.net
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo QA output created by $seq
+
+here=`pwd`
+tmp=`mktemp -d`
+status=1
+
+_cleanup()
+{
+   echo "*** unmount"
+   umount $SCRATCH_MNT 2>/dev/null
+   rm -f $tmp.*
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_command $FSSUM_PROG fssum
+
+rm -f $seqres.full
+
+workout()
+{
+   fsz=$1
+   ops=$2
+
+   umount $SCRATCH_DEV >/dev/null 2>&1
+   echo "*** mkfs -dsize=$fsz" >>$seqres.full
+   echo "" >>$seqres.full
+   _scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+   || _fail "size=$fsz mkfs failed"
+   run_check _scratch_mount -o noatime
+
+   run_check $FSSTRESS_PROG -d $SCRATCH_MNT -n $ops $FSSTRESS_AVOID -x \
+   "$BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/base"
+
+   run_check $BTRFS_UTIL_PROG subvol snap -r $SCRATCH_MNT $SCRATCH_MNT/incr
+
+   echo "# $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap" \
+   >> $seqres.full
+   $BTRFS_UTIL_PROG send $SCRATCH_MNT/base > $tmp/base.snap 2>> $seqres.full \
+   || _fail "failed: '$@'"
+   echo "# $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base\
+   $SCRATCH_MNT/incr > $tmp/incr.snap" >> $seqres.full
+   $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/base \
+   $SCRATCH_MNT/incr > $tmp/incr.snap 2>> $seqres.full \
+   || _fail "failed: '$@'"
+
+   run_check $FSSUM_PROG -A -f -w $tmp/base.fssum $SCRATCH_MNT/base
+   run_check $FSSUM_PROG -A -f -w $tmp/incr.fssum -x $SCRATCH_MNT/incr/base \
+   $SCRATCH_MNT/incr
+
+   umount $SCRATCH_DEV >/dev/null 2>&1
+   echo "*** mkfs -dsize=$fsz" >>$seqres.full
+   echo "" >>$seqres.full
+   _scratch_mkfs_sized $fsz >>$seqres.full 2>&1 \
+   || _fail "size=$fsz mkfs failed"
+   run_check _scratch_mount -o noatime

[PATCH 0/3] Btrfs: qgroup rescan fixes for next rc

2013-05-28 Thread Jan Schmidt
Here are three fixes for the new qgroup rescan feature. The first two
are quite small; the third one is a little bigger. I thought about
splitting that one up, but in the end I didn't find a good point to
break it up. It achieves more than one goal, I agree, but it's more or
less a compact code change that need not be split artificially, in my
opinion.

Jan Schmidt (3):
  Btrfs: fix memory patcher through fs_info->qgroup_ulist
  Btrfs: avoid double free of fs_info->qgroup_ulist
  Btrfs: fix qgroup rescan resume on mount

 fs/btrfs/ctree.h   |2 +
 fs/btrfs/disk-io.c |2 +
 fs/btrfs/qgroup.c  |  198 +---
 3 files changed, 131 insertions(+), 71 deletions(-)



[PATCH 1/3] Btrfs: fix memory patcher through fs_info->qgroup_ulist

2013-05-28 Thread Jan Schmidt
Commit 5b7c665e introduced fs_info->qgroup_ulist, which is allocated during
btrfs_read_qgroup_config and meant to be used later by the qgroup accounting
code. However, it is always freed before btrfs_read_qgroup_config returns,
because the commit mentioned above adds a check for (ret), where a check
for (ret < 0) would have been the right choice. This commit fixes the check.

Cc: Wang Shilong wangsl-f...@cn.fujitsu.com
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index d059d86..74b432d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -430,7 +430,7 @@ out:
}
btrfs_free_path(path);
 
-   if (ret)
+   if (ret < 0)
ulist_free(fs_info->qgroup_ulist);
 
return ret < 0 ? ret : 0;
-- 
1.7.1



[PATCH 3/3] Btrfs: fix qgroup rescan resume on mount

2013-05-28 Thread Jan Schmidt
When called during mount, we cannot start the rescan worker thread until
open_ctree is done. This commit restructures the qgroup rescan internals to
enable a clean deferral of the rescan resume operation.

First of all, the struct qgroup_rescan is removed, saving us a malloc and
some initialization synchronization problems. Its only element (the worker
struct) now lives within fs_info, just like the rest of the rescan code.

Then setting up a rescan worker is split into several reusable stages.
Currently we have three different rescan startup scenarios:
(A) rescan ioctl
(B) rescan resume by mount
(C) rescan by quota enable

Each case needs its own combination of the following four steps:
(1) set the progress [A, C: zero; B: state of umount]
(2) commit the transaction [A]
(3) set the counters [A, C: zero; B: state of umount]
(4) start worker [A, B, C]

qgroup_rescan_init does step (1). There's no extra function added to commit
a transaction; we've got that already. qgroup_rescan_zero_tracking does
step (3). Step (4) is nothing more than a call to the generic
btrfs_queue_worker.

We also get rid of a double check for the rescan progress during
btrfs_qgroup_account_ref, which is no longer required due to having step 2
from the list above.

As a side effect, this commit prepares to move the rescan start code from
btrfs_run_qgroups (which is run during commit) to a less time critical
section.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h   |2 +
 fs/btrfs/disk-io.c |2 +
 fs/btrfs/qgroup.c  |  190 +---
 3 files changed, 125 insertions(+), 69 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fd62aa8..8ac8d52 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1610,6 +1610,7 @@ struct btrfs_fs_info {
struct btrfs_key qgroup_rescan_progress;
struct btrfs_workers qgroup_rescan_workers;
struct completion qgroup_rescan_completion;
+   struct btrfs_work qgroup_rescan_work;
 
/* filesystem state */
unsigned long fs_state;
@@ -3856,6 +3857,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
+void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, u64 src, u64 dst);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d7b46c6..da4a10c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2879,6 +2879,8 @@ retry_root_backup:
return ret;
}
 
+   btrfs_qgroup_rescan_resume(fs_info);
+
return 0;
 
 fail_qgroup:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c6ce642..1280eff 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -98,13 +98,10 @@ struct btrfs_qgroup_list {
struct btrfs_qgroup *member;
 };
 
-struct qgroup_rescan {
-   struct btrfs_work   work;
-   struct btrfs_fs_info    *fs_info;
-};
-
-static void qgroup_rescan_start(struct btrfs_fs_info *fs_info,
-   struct qgroup_rescan *qscan);
+static int
+qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
+  int init_flags);
+static void qgroup_rescan_zero_tracking(struct btrfs_fs_info *fs_info);
 
 /* must be called with qgroup_ioctl_lock held */
 static struct btrfs_qgroup *find_qgroup_rb(struct btrfs_fs_info *fs_info,
@@ -255,6 +252,7 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
int slot;
int ret = 0;
u64 flags = 0;
+   u64 rescan_progress = 0;
 
if (!fs_info->quota_enabled)
return 0;
@@ -312,20 +310,7 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info)
}
fs_info->qgroup_flags = btrfs_qgroup_status_flags(l,
  ptr);
-   fs_info->qgroup_rescan_progress.objectid =
-   btrfs_qgroup_status_rescan(l, ptr);
-   if (fs_info->qgroup_flags &
-   BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
-   struct qgroup_rescan *qscan =
-   kmalloc(sizeof(*qscan), GFP_NOFS);
-   if (!qscan) {
-   ret = -ENOMEM;
-   goto out;
-   }
-   fs_info->qgroup_rescan_progress.type = 0;
-   fs_info->qgroup_rescan_progress.offset = 0

[PATCH 2/3] Btrfs: avoid double free of fs_info->qgroup_ulist

2013-05-28 Thread Jan Schmidt
When btrfs_read_qgroup_config or btrfs_quota_enable return non-zero, we've
already freed fs_info->qgroup_ulist. The final btrfs_free_qgroup_config,
called from quota_disable, makes another ulist_free(fs_info->qgroup_ulist)
call.

We set fs_info->qgroup_ulist to NULL on the mentioned error paths, turning
the ulist_free in btrfs_free_qgroup_config into a no-op.

Cc: Wang Shilong wangsl-f...@cn.fujitsu.com
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |8 ++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 74b432d..c6ce642 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -430,8 +430,10 @@ out:
}
btrfs_free_path(path);
 
-   if (ret < 0)
+   if (ret < 0) {
ulist_free(fs_info->qgroup_ulist);
+   fs_info->qgroup_ulist = NULL;
+   }
 
return ret < 0 ? ret : 0;
 }
@@ -932,8 +934,10 @@ out_free_root:
kfree(quota_root);
}
 out:
-   if (ret)
+   if (ret) {
ulist_free(fs_info->qgroup_ulist);
+   fs_info->qgroup_ulist = NULL;
+   }
mutex_unlock(&fs_info->qgroup_ioctl_lock);
return ret;
 }
-- 
1.7.1



Re: [PATCH 0/3] Btrfs: qgroup rescan fixes for next rc

2013-05-28 Thread Jan Schmidt
Hi Wang,

Please have a look at these patches. You should have been CCed, but I just
realized git send-email doesn't care about Cc lines in the patch file. Sigh.

Thanks,
-Jan


Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-17 Thread Jan Schmidt
On Thu, May 16, 2013 at 09:19 (+0200), Kai Krakow wrote:
 3.9.2 still does not fix anything. I'll go with autodefrag=off for the 
 moment until I hear some news in that regard. With this new information, is 
 it still helpful to get a metadata image from me? It should be reproducible 
 if you enable autodefrag or defragment cow'ed files.

Would still be helpful, yes. If you've got questions on the usage of
btrfs-image, your best bet is probably #btrfs on freenode. I haven't created
any usable images with that tool so far, but I've heard of people who succeeded.

Thanks!
-Jan


Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-10 Thread Jan Schmidt
On Fri, May 10, 2013 at 01:30 (+0200), Kai Krakow wrote:
 Jan Schmidt list.bt...@jan-o-sch.net schrieb:
 
 Apparently, it's not fixed. The system does not freeze now but it threw
 multiple backtraces right in front of my Xorg session. The backtraces
 look a little bit different now. Here's what I got:

 https://gist.github.com/kakra/8a340f006d01e146865d

 Occurrence while running bedup dedup --defrag --size-cutoff
 $((1024*1024)) which was currently dedup'ing my backup volume with daily
 snapshots filled by rsync --inplace - so I suppose some file contents
 are pretty scattered.

 At least that looks different for now. I'm not certain about all the fixes
 in btrfs-next. Can you give it a try and bisect if btrfs-next is good?
 That would be really helpful.
 
 I'd prefer not to bisect my production system kernel... That will probably 
 take ages, as running the reproducing test takes about 30-60 minutes 
 before the problem hits my system. At least unless you had a suggestion how 
 to speed up the process... ;-)

I see; I hoped it would be something quicker.

 I saw the pull request with those fixes, so I suspect it didn't go into 
 3.9.1 but rather will go into 3.9.2?

Probably. However, those patches obviously weren't enough to solve your problem.
We don't submit a lot of things to stable, so they are likely to remain the only
btrfs-related changes in there, which means 3.9.2 is unlikely to help with
your problem.

We can try to debug that further, you can send me / upload the output of

   btrfs-image -c9 /dev/whatever blah.img

built from Josef's repository

   git://github.com/josefbacik/btrfs-progs.git

It contains all your metadata (like file names); data is omitted from the dump.

 I'll probably wait and just not run the dedup process until I have 3.9.2 
 installed. The backup works with occasional hiccups; the system very rarely 
 freezes, but I almost always see the backtraces in dmesg after the 
 backup. Let's see if it's all gone in 3.9.2.

It's always an alternative to hope for the best :-)

-Jan


Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-08 Thread Jan Schmidt
On Wed, May 08, 2013 at 02:24 (+0200), Kai Krakow wrote:
 Kai Krakow hurikhan77+bt...@gmail.com schrieb:
 
 I can reliably reproduce it from two different approaches. I'd like to
 only apply the commits fixing it. Can you name them here?

 In git log order:

 6ced2666 Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
 ef9120b1 Btrfs: fix tree mod log regression on root split operations
 2ed098ca Btrfs: fix accessing the root pointer in tree mod log functions
 50723551 Btrfs: fix unlock after free on rewinded tree blocks

 The commit ids are from josef's master branch
 (git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git)
 which is known not to be very stable regarding commit ids.

 Thanks, applied almost cleanly to 3.9.0 vanilla with just one reject. And
 that was for some error message. I'm simply ignoring that and currently
 compiling it.

 I will get back here with the result (fixed or not fixed for one or both
 situations).
 
 Apparently, it's not fixed. The system does not freeze now but it threw 
 multiple backtraces right in front of my Xorg session. The backtraces look a 
 little bit different now. Here's what I got:
 
 https://gist.github.com/kakra/8a340f006d01e146865d
 
 Occurrence while running bedup dedup --defrag --size-cutoff $((1024*1024)) 
 which was currently dedup'ing my backup volume with daily snapshots filled 
 by rsync --inplace - so I suppose some file contents are pretty scattered.

At least that looks different for now. I'm not certain about all the fixes in
btrfs-next. Can you give it a try and bisect if btrfs-next is good? That would
be really helpful.

-Jan

 [ 2612.573501] [ cut here ]
 [ 2612.573509] WARNING: at fs/btrfs/inode.c:2157 record_one_backref+0x310/0x328()
 [ 2612.573510] Hardware name: To Be Filled By O.E.M.
 [ 2612.573511] Modules linked in: rfcomm bnep af_packet vsock(O) vmmon(O) 
 vmnet(O) vmci(O) vmblock(O) reiserfs snd_usb_audio snd_usbmidi_lib 
 snd_rawmidi snd_seq_device gspca_sonixj gspca_main videodev gpio_ich 
 coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel btusb microcode 
 bluetooth pcspkr lpc_ich i2c_i801 8250 mfd_core serial_core evdev 
 usb_storage zram(C) unix
 [ 2612.573528] Pid: 13112, comm: btrfs-endio-wri Tainted: G C O 3.9.0-gentoo #3
 [ 2612.573529] Call Trace:
 [ 2612.573534]  [8102f11d] ? warn_slowpath_common+0x78/0x8e
 [ 2612.573536]  [81183aed] ? record_one_backref+0x310/0x328
 [ 2612.573540]  [811c5eb0] ? iterate_extent_inodes+0x177/0x23c
 [ 2612.573542]  [811837dd] ? btrfs_real_readdir+0x482/0x482
 [ 2612.573543]  [811837dd] ? btrfs_real_readdir+0x482/0x482
 [ 2612.573545]  [811c5ffe] ? iterate_inodes_from_logical+0x89/0x96
 [ 2612.573547]  [81182320] ? record_extent_backrefs+0x4d/0x8e
 [ 2612.573549]  [8118a8d3] ? btrfs_finish_ordered_io+0x671/0x798
 [ 2612.573552]  [811a33f3] ? worker_loop+0x176/0x493
 [ 2612.573553]  [811a327d] ? btrfs_queue_worker+0x272/0x272
 [ 2612.573554]  [811a327d] ? btrfs_queue_worker+0x272/0x272
 [ 2612.573557]  [810496d2] ? kthread+0x81/0x89
 [ 2612.573560]  [8105] ? free_sched_groups+0x32/0x50
 [ 2612.573561]  [81049651] ? kthread_freezable_should_stop+0x36/0x36
 [ 2612.573564]  [8151c6ac] ? ret_from_fork+0x7c/0xb0
 [ 2612.573566]  [81049651] ? kthread_freezable_should_stop+0x36/0x36
 [ 2612.573567] ---[ end trace 4c42d11ebaf277b6 ]---
 [ 2612.574001] [ cut here ]
 [ 2612.574004] WARNING: at fs/btrfs/inode.c:2157 record_one_backref+0x310/0x328()
 [ 2612.574004] Hardware name: To Be Filled By O.E.M.
 [ 2612.574005] Modules linked in: rfcomm bnep af_packet vsock(O) vmmon(O) 
 vmnet(O) vmci(O) vmblock(O) reiserfs snd_usb_audio snd_usbmidi_lib 
 snd_rawmidi snd_seq_device gspca_sonixj gspca_main videodev gpio_ich 
 coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel btusb microcode 
 bluetooth pcspkr lpc_ich i2c_i801 8250 mfd_core serial_core evdev 
 usb_storage zram(C) unix
 [ 2612.574017] Pid: 13110, comm: btrfs-endio-wri Tainted: GWC O 3.9.0-gentoo #3
 [ 2612.574018] Call Trace:
 [ 2612.574020]  [8102f11d] ? warn_slowpath_common+0x78/0x8e
 [ 2612.574021]  [81183aed] ? record_one_backref+0x310/0x328
 [ 2612.574023]  [811c5eb0] ? iterate_extent_inodes+0x177/0x23c
 [ 2612.574025]  [811837dd] ? btrfs_real_readdir+0x482/0x482
 [ 2612.574027]  [811837dd] ? btrfs_real_readdir+0x482/0x482
 [ 2612.574029]  [811c5ffe] ? iterate_inodes_from_logical+0x89/0x96
 [ 2612.574030]  [81182320] ? record_extent_backrefs+0x4d/0x8e
 [ 2612.574032]  [8118a8d3] ? btrfs_finish_ordered_io+0x671/0x798
 [ 2612.574034]  [811a33f3] ? worker_loop+0x176/0x493
 [ 2612.574035]  [811a327d] ? btrfs_queue_worker+0x272/0x272
 [ 2612.574036]  [811a327d] ? btrfs_queue_worker+0x272/0x272
 [ 2612.574038]  

Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-07 Thread Jan Schmidt
On Mon, May 06, 2013 at 22:29 (+0200), Kai Krakow wrote:
 Jan Schmidt list.bt...@jan-o-sch.net schrieb:
 
 That one should be fixed in btrfs-next. If you can reliably reproduce the
 bug I'd be glad to get a confirmation - you can probably even save putting
 it on bugzilla then ;-)
 
 I can reliably reproduce it from two different approaches. I'd like to only 
 apply the commits fixing it. Can you name them here?

In git log order:

6ced2666 Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
ef9120b1 Btrfs: fix tree mod log regression on root split operations
2ed098ca Btrfs: fix accessing the root pointer in tree mod log functions
50723551 Btrfs: fix unlock after free on rewinded tree blocks

The commit ids are from josef's master branch
(git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git) which is
known not to be very stable regarding commit ids.

Thanks,
-Jan

 [snip]


Re: [PATCH] Btrfs: add ioctl to wait for qgroup rescan completion

2013-05-07 Thread Jan Schmidt
On Mon, May 06, 2013 at 23:20 (+0200), David Sterba wrote:
 On Mon, May 06, 2013 at 09:14:17PM +0200, Jan Schmidt wrote:
 --- a/include/uapi/linux/btrfs.h
 +++ b/include/uapi/linux/btrfs.h
 @@ -530,6 +530,7 @@ struct btrfs_ioctl_send_args {
 struct btrfs_ioctl_quota_rescan_args)
  #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
 struct btrfs_ioctl_quota_rescan_args)
 +#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
 
 Why do you need an ioctl when the same can be achieved by polling the
 RESCAN_STATUS value? The code does not do anything special that has to be
 done within the kernel.

It's because I don't like polling :-) A rescan can take hours to complete, and
you wouldn't like to see one ioctl per second for such a period either, I guess.
(Plus: Everybody would lose like .9 seconds for each run of the xfstest I'm
writing - accumulates to ages at least!)

If you're worried about ioctl numbers, we could turn it into flags for
BTRFS_IOC_QUOTA_RESCAN, but I don't think we're short on ioctl numbers yet. The
reason why I chose a separate ioctl is that it is more like an attach operation,
supporting both waiting right after starting a fresh scan and waiting for a scan
that's already running. I find it more intuitive to have it separate.

-Jan


Re: [PATCH] Btrfs: save us a mutex_lock usage when doing quota rescan

2013-05-07 Thread Jan Schmidt
On Tue, May 07, 2013 at 08:15 (+0200), Wang Shilong wrote:
 If the qgroup_rescan worker is in progress, we should ignore the
 extents that have not been dealt with by the qgroup_rescan worker and just
 let them be dealt with later; otherwise we may get wrong qgroup accounting.
 
 However, we have checked this before find_all_roots() without spin_lock.
 When doing qgroup accounting, we don't have to check it again, because
 during this period, the qgroup_rescan worker can deal with more extents and
 qgroup_rescan_extent->objectid can only go larger, so here the check
 is unnecessary.
 
 Just remove this check, so that we don't need to hold qgroup_rescan_lock
 when doing qgroup accounting.

NAK.

After a discussion on that lock the last thing in this thread I see is ...

On Wed, May 01, 2013 at 13:57 (+0200), Jan Schmidt wrote:
 Now I see what you mean. The second check is only required when we start
 a rescan operation after the initial check in btrfs_qgroup_account_ref.

Please continue from that argument; your commit message doesn't explain at all
why it should be safe to remove this check.

-Jan

 Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
 ---
  fs/btrfs/qgroup.c |9 -
  1 files changed, 0 insertions(+), 9 deletions(-)
 
 diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
 index d059d86..2710784 100644
 --- a/fs/btrfs/qgroup.c
 +++ b/fs/btrfs/qgroup.c
 @@ -1445,15 +1445,7 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
   if (ret < 0)
   return ret;
  
 - mutex_lock(&fs_info->qgroup_rescan_lock);
   spin_lock(&fs_info->qgroup_lock);
 - if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
 - if (fs_info->qgroup_rescan_progress.objectid <= node->bytenr) {
 - ret = 0;
 - goto unlock;
 - }
 - }
 -
   quota_root = fs_info-quota_root;
   if (!quota_root)
   goto unlock;
 @@ -1492,7 +1484,6 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
  
  unlock:
   spin_unlock(&fs_info->qgroup_lock);
 - mutex_unlock(&fs_info->qgroup_rescan_lock);
   ulist_free(roots);
  
   return ret;
 


Re: [PATCH] Btrfs: fix passing wrong arg gfp_t to decide the correct allocation mode

2013-05-07 Thread Jan Schmidt
On Tue, May 07, 2013 at 08:20 (+0200), Wang Shilong wrote:
 If you look at the code carefully, you will see that all the tree_mod_alloc()
 calls have to use GFP_ATOMIC. However, the original code passes the wrong
 gfp_t arg in some places. This doesn't cause any problems, because
 tree_mod_alloc() ignores the gfp_t arg and just uses GFP_ATOMIC directly,
 but this is not good.
 
 However, I think we should try our best not to allocate with GFP_ATOMIC, so
 I keep the gfp_t there in the hope we can change the allocation mode in the
 future.

NAK.

The code as it is now is prepared to get rid of at least some GFP_ATOMIC
allocations. You won't get rid of all of them, as there are a lot of spin lock
situations where we need to add to the tree mod log anyway.

As a preparation we currently pass the best flags (least restrictive) we can
instead of always passing GFP_ATOMIC. I pointed you to this comment already:

 /*
  * once we switch from spin locks to something different, we should
  * honor the flags parameter here.
  */
 tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);

So, if you want fewer atomic allocations, find something more suitable than an
rwlock for fs_info->tree_mod_log_lock and you can in fact replace GFP_ATOMIC
with flags in the kzalloc().

The good thing is, because everything is already prepared, you don't have to
think about all the callers again and pass the correct flags.

-Jan


 Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
 ---
  fs/btrfs/ctree.c |   37 ++---
  1 files changed, 18 insertions(+), 19 deletions(-)
 
 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index de6de8e..33c9061 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -553,7 +553,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
* once we switch from spin locks to something different, we should
* honor the flags parameter here.
*/
 - tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
 + tm = *tm_ret = kzalloc(sizeof(*tm), flags);
   if (!tm)
   return -ENOMEM;
  
 @@ -591,14 +591,14 @@ __tree_mod_log_insert_key(struct btrfs_fs_info *fs_info,
  static noinline int
  tree_mod_log_insert_key_mask(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb, int slot,
 -  enum mod_log_op op, gfp_t flags)
 +  enum mod_log_op op)
  {
   int ret;
  
   if (tree_mod_dont_log(fs_info, eb))
   return 0;
  
 - ret = __tree_mod_log_insert_key(fs_info, eb, slot, op, flags);
 + ret = __tree_mod_log_insert_key(fs_info, eb, slot, op, GFP_ATOMIC);
  
   tree_mod_log_write_unlock(fs_info);
   return ret;
 @@ -608,7 +608,7 @@ static noinline int
  tree_mod_log_insert_key(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
   int slot, enum mod_log_op op)
  {
 - return tree_mod_log_insert_key_mask(fs_info, eb, slot, op, GFP_NOFS);
 + return tree_mod_log_insert_key_mask(fs_info, eb, slot, op);
  }
  
  static noinline int
 @@ -616,13 +616,13 @@ tree_mod_log_insert_key_locked(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb, int slot,
enum mod_log_op op)
  {
 - return __tree_mod_log_insert_key(fs_info, eb, slot, op, GFP_NOFS);
 + return __tree_mod_log_insert_key(fs_info, eb, slot, op, GFP_ATOMIC);
  }
  
  static noinline int
  tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb, int dst_slot, int src_slot,
 -  int nr_items, gfp_t flags)
 +  int nr_items)
  {
   struct tree_mod_elem *tm;
   int ret;
 @@ -642,7 +642,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
  BUG_ON(ret < 0);
   }
  
 - ret = tree_mod_alloc(fs_info, flags, tm);
 + ret = tree_mod_alloc(fs_info, GFP_ATOMIC, tm);
  if (ret < 0)
   goto out;
  
 @@ -679,7 +679,7 @@ __tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
  static noinline int
  tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
struct extent_buffer *old_root,
 -  struct extent_buffer *new_root, gfp_t flags,
 +  struct extent_buffer *new_root,
int log_removal)
  {
   struct tree_mod_elem *tm;
 @@ -691,7 +691,7 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
   if (log_removal)
   __tree_mod_log_free_eb(fs_info, old_root);
  
 - ret = tree_mod_alloc(fs_info, flags, tm);
 + ret = tree_mod_alloc(fs_info, GFP_ATOMIC, tm);
  if (ret < 0)
   goto out;
  
 @@ -809,19 +809,18 @@ tree_mod_log_eb_move(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
  {
   int ret;
   ret = tree_mod_log_insert_move(fs_info, 

Re: Kernel BUG: __tree_mod_log_rewind

2013-05-07 Thread Jan Schmidt
On Tue, May 07, 2013 at 11:25 (+0200), Elladan wrote:
 I can get btrfs to throw a kernel bug easily by running btrfs fi
 defrag on some files in 3.9.0:

Thanks for reporting. It's a known bug (one that in fact ought to have been fixed
before the 3.9 release). You can either use btrfs-next or apply the commits
mentioned in my previous email today:

On Tue, May 07, 2013 at 08:08 (+0200), Jan Schmidt wrote:
 In git log order:

 6ced2666 Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
 ef9120b1 Btrfs: fix tree mod log regression on root split operations
 2ed098ca Btrfs: fix accessing the root pointer in tree mod log functions
 50723551 Btrfs: fix unlock after free on rewinded tree blocks

 The commit ids are from josef's master branch
 (git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git) which is
 known not to be very stable regarding commit ids.

Either way should fix your problem. An alternative is to wait for a 3.9 stable
release after those fixes are in mainline (which should happen within the next
seven days, I hope). Not using defrag, autodefrag or qgroups might also be an
effective workaround, but no guarantees on that.

-Jan

 May  7 01:57:33 caper kernel: [0.00] Linux version
 3.9.0-030900-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro
 4.6.3-1ubuntu5) ) #201304291257 SMP Mon Apr 29 16:58:15 UTC 2013
 ...
 May  7 02:09:21 caper kernel: [  726.745485] [ cut here
 ]
 May  7 02:09:21 caper kernel: [  726.745567] Kernel BUG at
 a00ea503 [verbose debug info unavailable]
 May  7 02:09:21 caper kernel: [  726.745643] invalid opcode:  [#1] SMP
 May  7 02:09:21 caper kernel: [  726.745807] Modules linked in:
 snd_hrtimer zram(C) bnep rfcomm bluetooth parport_pc ppdev nfsd
 nfs_acl auth_rpcgss nfs fscache binfmt_misc lockd sunrpc
 snd_hda_codec_hdmi joydev hid_gaff ff_memless snd_usb_
 audio snd_usbmidi_lib uvcvideo snd_seq_midi videobuf2_core videodev
 snd_rawmidi videobuf2_vmalloc videobuf2_memops snd_seq_midi_event
 dm_multipath snd_hda_codec_realtek snd_seq scsi_dh kvm_amd
 snd_seq_device snd_hda_intel kvm snd_hda_codec
  snd_hwdep microcode snd_pcm snd_timer k10temp edac_core edac_mce_amd
 serio_raw snd sp5100_tco i2c_piix4 soundcore snd_page_alloc mac_hid
 wmi it87 hwmon_vid lp parport xfs btrfs raid6_pq zlib_deflate xor
 libcrc32c ses enclosure dm_crypt hi
 d_generic usbhid hid usb_storage firewire_ohci firewire_core crc_itu_t
 ahci pata_acpi pata_atiixp libahci r8169
 May  7 02:09:21 caper kernel: [  726.749841] CPU 3
 May  7 02:09:21 caper kernel: [  726.749900] Pid: 1703, comm:
 btrfs-endio-wri Tainted: G C   3.9.0-030900-generic
 #201304291257 Gigabyte Technology Co., Ltd.
 GA-MA790GP-UD4H/GA-MA790GP-UD4H
 May  7 02:09:21 caper kernel: [  726.750069] RIP:
 0010:[a00ea503]  [a00ea503]
 __tree_mod_log_rewind+0x253/0x260 [btrfs]
 May  7 02:09:21 caper kernel: [  726.750244] RSP:
 0018:88011a2e1838  EFLAGS: 00010293
 May  7 02:09:21 caper kernel: [  726.750316] RAX: 
 RBX: 88004b2798f0 RCX: 88011a2e17d8
 May  7 02:09:21 caper kernel: [  726.750390] RDX: 13f3a75c
 RSI: 05e8 RDI: 8800172ea880
 May  7 02:09:21 caper kernel: [  726.750463] RBP: 88011a2e1868
 R08: 1000 R09: 88011a2e17e8
 May  7 02:09:21 caper kernel: [  726.750536] R10: 000103db
 R11:  R12: 880098cf4d80
 May  7 02:09:21 caper kernel: [  726.750609] R13: 002b
 R14: 8800172ea700 R15: 0009c7a7
 May  7 02:09:21 caper kernel: [  726.750683] FS:
 7fa2bc594700() GS:88014fd8()
 knlGS:
 May  7 02:09:21 caper kernel: [  726.750770] CS:  0010 DS:  ES:
  CR0: 8005003b
 May  7 02:09:21 caper kernel: [  726.750841] CR2: fd82c000
 CR3: 00014654d000 CR4: 07e0
 May  7 02:09:21 caper kernel: [  726.750914] DR0: 
 DR1:  DR2: 
 May  7 02:09:21 caper kernel: [  726.750987] DR3: 
 DR6: 0ff0 DR7: 0400
 May  7 02:09:21 caper kernel: [  726.751061] Process btrfs-endio-wri
 (pid: 1703, threadinfo 88011a2e, task 88004a6b2ea0)
 May  7 02:09:21 caper kernel: [  726.751147] Stack:
 May  7 02:09:21 caper kernel: [  726.751212]  88011a2e1858
 880104c8de30 0009c7a7 8800
 May  7 02:09:21 caper kernel: [  726.751488]  a8598000
 880148278000 88011a2e18b8 a00ea5ef
 May  7 02:09:21 caper kernel: [  726.751763]  880098cf4d80
 88004b2798f0 8800338d3000 0001
 May  7 02:09:21 caper kernel: [  726.752038] Call Trace:
 May  7 02:09:21 caper kernel: [  726.752135]  [a00ea5ef]
 tree_mod_log_rewind+0xdf/0x240 [btrfs]
 May  7 02:09:21 caper kernel: [  726.752237]  [a00f25cb]
 btrfs_search_old_slot+0x4cb/0x670 [btrfs]
 May  7 02:09:21 caper kernel: [  726.752351]  [a016d118

Re: [PATCH] Btrfs: use arg gfp_mask to decide how to allocate tree mod

2013-05-06 Thread Jan Schmidt
On Sun, May 05, 2013 at 15:58 (+0200), Wang Shilong wrote:
 It seems the original code doesn't pass the right gfp_t argument to decide how
 to allocate.
 With just this patch applied, fsstress will fail. So please ignore this patch;
 I will resend later.

That's in fact what the comment above the line you changed implies :-)

-Jan

 Thanks,
 Wang
 
 From: Wang Shilong wangsl-f...@cn.fujitsu.com

 We already pass the gfp_mask argument to tree_mod_alloc(), so
 just use it rather than always using GFP_ATOMIC.

 Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
 ---
 fs/btrfs/ctree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index de6de8e..0e3514f 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -553,7 +553,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
   * once we switch from spin locks to something different, we should
   * honor the flags parameter here.
   */
 -tm = *tm_ret = kzalloc(sizeof(*tm), GFP_ATOMIC);
 +tm = *tm_ret = kzalloc(sizeof(*tm), flags);
  if (!tm)
  return -ENOMEM;

 -- 
 1.7.11.7



Re: Possible to deduplicate read-only snapshots for space-efficient backups

2013-05-06 Thread Jan Schmidt
On Sun, May 05, 2013 at 12:07 (+0200), Kai Krakow wrote:
 I'm using a bash/rsync script[1] to back up my whole system on a nightly 
 basis to an attached USB3 drive into a scratch area, then take a snapshot of 
 this area. I'd like to have these snapshots immutable, so they should be 
 read-only.

Have you considered using btrfs send / receive for that purpose? That way you
would skip the dedup step altogether.

-Jan


Re: hard freezes with 3.9.0 during io-intensive loads

2013-05-06 Thread Jan Schmidt
On Sun, May 05, 2013 at 18:10 (+0200), Kai Krakow wrote:
 Hello list,
 
 Kai Krakow hurikhan77+bt...@gmail.com schrieb:
 
 I've upgraded to 3.9.0 mainly for the snapshot-aware defragging patches.
 I'm running bedup[1] on a regular basis and it is now the third time that
 I got back to my PC just to find it hard-frozen and I needed to use the
 reset button.

 It looks like this happens only while running bedup on my two btrfs
 filesystems but I'm not sure if it happens for any of the filesystems or
 only one. This is my setup:

 # cat /etc/fstab (shortened)
 UUID=d2bb232a-2e8f-4951-8bcc-97e237f1b536 / btrfs compress=lzo,subvol=root64 0 1 # /dev/sd{a,b,c}3
 LABEL=usb-backup /mnt/private/usb-backup btrfs noauto,compress-force=zlib,subvolid=0,autodefrag,comment=systemd.automount 0 0 # external usb3 disk

 # btrfs filesystem show
 Label: 'usb-backup'  uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca
 Total devices 1 FS bytes used 1.13TB
 devid    1 size 1.82TB used 1.75TB path /dev/sdd1

 Label: 'system'  uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536
 Total devices 3 FS bytes used 914.43GB
 devid    3 size 927.26GB used 426.03GB path /dev/sdc3
 devid    2 size 927.26GB used 426.03GB path /dev/sdb3
 devid    1 size 927.26GB used 427.07GB path /dev/sda3

 Btrfs v0.20-rc1

 Since the system hard-freezes, I have no messages from dmesg. But I suspect
 it is related to the defragmentation option in bedup (I've been running
 bedup with --defrag since 3.9.0, and autodefrag for the backup drive).
 Just in case, I'm going to try without this option now and see whether it
 still freezes.

 I was able to take a physical screenshot with a real camera of a kernel
 backtrace one time when the freeze happened. I wonder if it is useful to
 you and where to send it. I just don't want to upload jpegs right here to
 the list without asking first.

 The big plus is: although I have had to hard-reset the frozen system several
 times now, btrfs survived the procedure without any impact (only boot
 time increases noticeably, probably due to log replays or something). So
 thumbs up for the developers on that point.
 
 Thanks to the great cwillu netcat service here's my backtrace:

That one should be fixed in btrfs-next. If you can reliably reproduce the bug,
I'd be glad to get a confirmation; you can probably even skip filing it on
bugzilla then ;-)

-Jan

 4,1072,17508258745,-;[ cut here ]
 2,1073,17508258772,-;kernel BUG at fs/btrfs/ctree.c:1144!
 4,1074,17508258791,-;invalid opcode:  [#1] SMP 
 4,1075,17508258811,-;Modules linked in: bnep bluetooth af_packet vmci(O) 
 vmmon(O) vmblock(O) vmnet(O) vsock reiserfs snd_usb_audio snd_usbmidi_lib 
 snd_rawmidi snd_seq_device gspca_sonixj gpio_ich gspca_main videodev 
 coretemp hwmon kvm_intel kvm crc32_pclmul crc32c_intel 8250 serial_core 
 lpc_ich microcode mfd_core i2c_i801 pcspkr evdev usb_storage zram(C) unix
 4,1076,17508258966,-;CPU 0 
 4,1077,17508258977,-;Pid: 7212, comm: btrfs-endio-wri Tainted: G C O 
 3.9.0-gentoo #2 To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3
 4,1078,17508259023,-;RIP: 0010:[81161d12]  [81161d12] 
 __tree_mod_log_rewind+0x4c/0x121
 4,1079,17508259064,-;RSP: 0018:8801966718e8  EFLAGS: 00010293
 4,1080,17508259085,-;RAX: 0003 RBX: 8801ee8d33b0 RCX: 
 880196671888
 4,1081,17508259112,-;RDX: 0a4596a4 RSI: 0eee RDI: 
 8804087be700
 4,1082,17508259138,-;RBP: 0071 R08: 1000 R09: 
 880196671898
 4,1083,17508259165,-;R10:  R11:  R12: 
 880406c2e000
 4,1084,17508259191,-;R13: 8a11 R14: 8803b5aa1200 R15: 
 0001
 4,1085,17508259218,-;FS:  () GS:88041f20() 
 knlGS:
 4,1086,17508259248,-;CS:  0010 DS:  ES:  CR0: 80050033
 4,1087,17508259270,-;CR2: 026f0390 CR3: 01a0b000 CR4: 
 000407f0
 4,1088,17508259297,-;DR0:  DR1:  DR2: 
 
 4,1089,17508259323,-;DR3:  DR6: 0ff0 DR7: 
 0400
 4,1090,17508259350,-;Process btrfs-endio-wri (pid: 7212, threadinfo 
 88019667, task 8801b82e5400)
 4,1091,17508259383,-;Stack:
 4,1092,17508259391,-; 8801ee8d38f0 880021b6f360 88013a5b2000 
 8a11
 4,1093,17508259423,-; 8802d0a14000 81167606 0246 
 8801ee8d33b0
 4,1094,17508259455,-; 880406c2e000 8801966719bf 880021b6f360 
 
 4,1095,17508259488,-;Call Trace:
 4,1096,17508259500,-; [81167606] ? 
 btrfs_search_old_slot+0x543/0x61e
 4,1097,17508259526,-; [811692de] ? btrfs_next_old_leaf+0x8a/0x332
 4,1098,17508259552,-; [811c484a] ? 
 __resolve_indirect_refs+0x2d8/0x408
 4,1099,17508259578,-; [811c533b] ? find_parent_nodes+0x9c1/0xcec
 4,1100,17508259602,-; [811c5e06] 

Btrfs: wait for quota rescan to complete

2013-05-06 Thread Jan Schmidt
One small patch for the kernel and two for user mode (btrfs-progs). Both parts
are required to support waiting for a quota rescan to complete.

Jan Schmidt (1):
  Btrfs: add ioctl to wait for qgroup rescan completion

 fs/btrfs/ctree.h   |2 ++
 fs/btrfs/ioctl.c   |   12 
 fs/btrfs/qgroup.c  |   21 +
 include/uapi/linux/btrfs.h |1 +
 4 files changed, 36 insertions(+), 0 deletions(-)


Jan Schmidt (2):
  Btrfs-progs: fixup: add flags to struct btrfs_ioctl_quota_rescan_args
  Btrfs-progs: added btrfs quota rescan -w switch (wait)

 cmds-quota.c |   19 +--
 ioctl.h  |2 ++
 2 files changed, 19 insertions(+), 2 deletions(-)


[PATCH 2/2] Btrfs-progs: added btrfs quota rescan -w switch (wait)

2013-05-06 Thread Jan Schmidt
With -w one can wait for a rescan operation to finish. It can be used when
starting a rescan operation or later to wait for the currently running
rescan operation to finish. Waiting is interruptible.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 cmds-quota.c |   19 +--
 ioctl.h  |1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/cmds-quota.c b/cmds-quota.c
index 1169772..6557e83 100644
--- a/cmds-quota.c
+++ b/cmds-quota.c
@@ -90,10 +90,11 @@ static int cmd_quota_disable(int argc, char **argv)
 }
 
 static const char * const cmd_quota_rescan_usage[] = {
-   "btrfs quota rescan [-s] <path>",
+   "btrfs quota rescan [-sw] <path>",
"Trash all qgroup numbers and scan the metadata again with the current config.",
"",
"-s   show status of a running rescan operation",
+   "-w   wait for rescan operation to finish (can be already in progress)",
NULL
 };
 
@@ -105,21 +106,30 @@ static int cmd_quota_rescan(int argc, char **argv)
char *path = NULL;
struct btrfs_ioctl_quota_rescan_args args;
int ioctlnum = BTRFS_IOC_QUOTA_RESCAN;
+   int wait_for_completion = 0;
 
optind = 1;
while (1) {
-   int c = getopt(argc, argv, "s");
+   int c = getopt(argc, argv, "sw");
if (c < 0)
break;
switch (c) {
case 's':
ioctlnum = BTRFS_IOC_QUOTA_RESCAN_STATUS;
break;
+   case 'w':
+   wait_for_completion = 1;
+   break;
default:
usage(cmd_quota_rescan_usage);
}
}
 
+   if (ioctlnum != BTRFS_IOC_QUOTA_RESCAN && wait_for_completion) {
+   fprintf(stderr, "ERROR: -w cannot be used with -s\n");
+   return 12;
+   }
+
if (check_argc_exact(argc - optind, 1))
usage(cmd_quota_rescan_usage);
 
@@ -134,6 +144,11 @@ static int cmd_quota_rescan(int argc, char **argv)
 
ret = ioctl(fd, ioctlnum, &args);
e = errno;
+
+   if (wait_for_completion && (ret == 0 || e == EINPROGRESS)) {
+   ret = ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT, &args);
+   e = errno;
+   }
close(fd);
 
if (ioctlnum == BTRFS_IOC_QUOTA_RESCAN) {
diff --git a/ioctl.h b/ioctl.h
index abe6dd4..c260bbf 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -529,6 +529,7 @@ struct btrfs_ioctl_clone_range_args {
   struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
   struct btrfs_ioctl_quota_rescan_args)
+#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
 #define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
   char[BTRFS_LABEL_SIZE])
 #define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
-- 
1.7.1
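A minimal userspace sketch of the -w flow above, assuming a Linux host. It restates the ioctl definitions locally instead of including btrfs-progs' ioctl.h (magic 0x94 and command number 44 are those of BTRFS_IOC_QUOTA_RESCAN in mainline; 46 is from the patch), and `quota_rescan()` and `rescan_should_wait()` are hypothetical helper names, not btrfs-progs functions. As in cmd_quota_rescan(), the wait ioctl is also issued when starting the rescan fails with EINPROGRESS, i.e. when a rescan is already running.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Same layout as struct btrfs_ioctl_quota_rescan_args after the fixup. */
struct quota_rescan_args {
	uint64_t flags;
	uint64_t progress;
	uint64_t reserved[6];
};
static_assert(sizeof(struct quota_rescan_args) == 64,
	      "the size is encoded in the ioctl request number");

#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_QUOTA_RESCAN \
	_IOW(BTRFS_IOCTL_MAGIC, 44, struct quota_rescan_args)
#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)

/*
 * Hypothetical helper: wait if -w was given and the rescan either started
 * (start_ret == 0) or was already running (EINPROGRESS).
 */
int rescan_should_wait(int wait, int start_ret, int start_errno)
{
	return wait && (start_ret == 0 || start_errno == EINPROGRESS);
}

/*
 * Hypothetical helper: start a rescan on the filesystem containing `path`
 * and optionally block until it finishes. Returns 0 or a positive errno.
 */
int quota_rescan(const char *path, int wait)
{
	struct quota_rescan_args args;
	int fd, ret, e;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	memset(&args, 0, sizeof(args));	/* flags must be zero */
	ret = ioctl(fd, BTRFS_IOC_QUOTA_RESCAN, &args);
	e = ret ? errno : 0;

	if (rescan_should_wait(wait, ret, e)) {
		ret = ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT);
		e = ret ? errno : 0;
	}
	close(fd);
	return e;
}
```

On a btrfs mount point (CAP_SYS_ADMIN required), `quota_rescan("/mnt", 1)` would block until the rescan ends. Since the kernel side uses wait_for_completion_interruptible(), a caller may see EINTR when a signal arrives.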



[PATCH 1/2] Btrfs-progs: fixup: add flags to struct btrfs_ioctl_quota_rescan_args

2013-05-06 Thread Jan Schmidt
The previously sent patch set was posted together with the kernel part, but
was not updated when I added some reserved bytes to the ioctl struct for
future compatibility. This fixes struct btrfs_ioctl_quota_rescan_args.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 ioctl.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/ioctl.h b/ioctl.h
index 1ee631a..abe6dd4 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -429,6 +429,7 @@ struct btrfs_ioctl_quota_ctl_args {
 struct btrfs_ioctl_quota_rescan_args {
__u64   flags;
__u64   progress;
+   __u64   reserved[6];
 };
 
 struct btrfs_ioctl_qgroup_assign_args {
-- 
1.7.1
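Why the progs copy of the struct must carry the same reserved words can be illustrated in a few lines, assuming Linux `<sys/ioctl.h>`: `_IOW()` encodes `sizeof()` of the argument struct in the request number, so a 16-byte progs struct and the 64-byte kernel struct produce different request numbers for the same command. The two struct names are hypothetical stand-ins for the before and after versions of struct btrfs_ioctl_quota_rescan_args.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define BTRFS_IOCTL_MAGIC 0x94

/* Progs layout before this fixup: 16 bytes. */
struct rescan_args_before {
	uint64_t flags;
	uint64_t progress;
};

/* Layout after this fixup, matching the kernel: 64 bytes. New fields can
 * later be carved out of reserved[] without changing the size. */
struct rescan_args_after {
	uint64_t flags;
	uint64_t progress;
	uint64_t reserved[6];
};
static_assert(sizeof(struct rescan_args_after) == 64, "kernel ABI size");

/* Same command number (44), different argument size: different request. */
#define RESCAN_REQ_BEFORE _IOW(BTRFS_IOCTL_MAGIC, 44, struct rescan_args_before)
#define RESCAN_REQ_AFTER  _IOW(BTRFS_IOCTL_MAGIC, 44, struct rescan_args_after)

/* Extract the argument size encoded in an ioctl request number. */
unsigned int ioctl_arg_size(unsigned long req)
{
	return _IOC_SIZE(req);
}
```

A kernel that only knows the 64-byte request number would not recognize the 16-byte one. Growing the struct later would change the number again, whereas reusing `reserved[]` keeps it stable.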



[PATCH] Btrfs: add ioctl to wait for qgroup rescan completion

2013-05-06 Thread Jan Schmidt
btrfs_qgroup_wait_for_completion waits until the currently running qgroup
rescan operation completes. It returns immediately when no rescan process is in
progress. This is useful to automate things around the rescan process (e.g.
testing).

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h   |2 ++
 fs/btrfs/ioctl.c   |   12 
 fs/btrfs/qgroup.c  |   21 +
 include/uapi/linux/btrfs.h |1 +
 4 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8624f49..39ca0d9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1607,6 +1607,7 @@ struct btrfs_fs_info {
struct mutex qgroup_rescan_lock; /* protects the progress item */
struct btrfs_key qgroup_rescan_progress;
struct btrfs_workers qgroup_rescan_workers;
+   struct completion qgroup_rescan_completion;
 
/* filesystem state */
unsigned long fs_state;
@@ -3836,6 +3837,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info);
 int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5e93bb8..9161660 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3937,6 +3937,16 @@ static long btrfs_ioctl_quota_rescan_status(struct file *file, void __user *arg)
return ret;
 }
 
+static long btrfs_ioctl_quota_rescan_wait(struct file *file, void __user *arg)
+{
+   struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   return btrfs_qgroup_wait_for_completion(root->fs_info);
+}
+
 static long btrfs_ioctl_set_received_subvol(struct file *file,
void __user *arg)
 {
@@ -4179,6 +4189,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_quota_rescan(file, argp);
case BTRFS_IOC_QUOTA_RESCAN_STATUS:
return btrfs_ioctl_quota_rescan_status(file, argp);
+   case BTRFS_IOC_QUOTA_RESCAN_WAIT:
+   return btrfs_ioctl_quota_rescan_wait(file, argp);
case BTRFS_IOC_DEV_REPLACE:
return btrfs_ioctl_dev_replace(root, argp);
case BTRFS_IOC_GET_FSLABEL:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 9d49c58..ebca17a 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2068,6 +2068,8 @@ out:
} else {
pr_err("btrfs: qgroup scan failed with %d\n", err);
}
+
+   complete_all(&fs_info->qgroup_rescan_completion);
 }
 
 static void
@@ -2108,6 +2110,7 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN;
memset(&fs_info->qgroup_rescan_progress, 0,
sizeof(fs_info->qgroup_rescan_progress));
+   init_completion(&fs_info->qgroup_rescan_completion);
 
/* clear all current qgroup tracking information */
for (n = rb_first(&fs_info->qgroup_tree); n; n = rb_next(n)) {
@@ -2124,3 +2127,21 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)
 
return 0;
 }
+
+int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info)
+{
+   int running;
+   int ret = 0;
+
+   mutex_lock(&fs_info->qgroup_rescan_lock);
+   spin_lock(&fs_info->qgroup_lock);
+   running = fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+   spin_unlock(&fs_info->qgroup_lock);
+   mutex_unlock(&fs_info->qgroup_rescan_lock);
+
+   if (running)
+   ret = wait_for_completion_interruptible(
+   &fs_info->qgroup_rescan_completion);
+
+   return ret;
+}
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5ef0df5..5b683b5 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -530,6 +530,7 @@ struct btrfs_ioctl_send_args {
   struct btrfs_ioctl_quota_rescan_args)
 #define BTRFS_IOC_QUOTA_RESCAN_STATUS _IOR(BTRFS_IOCTL_MAGIC, 45, \
   struct btrfs_ioctl_quota_rescan_args)
+#define BTRFS_IOC_QUOTA_RESCAN_WAIT _IO(BTRFS_IOCTL_MAGIC, 46)
 #define BTRFS_IOC_GET_FSLABEL _IOR(BTRFS_IOCTL_MAGIC, 49, \
   char[BTRFS_LABEL_SIZE])
 #define BTRFS_IOC_SET_FSLABEL _IOW(BTRFS_IOCTL_MAGIC, 50, \
-- 
1.7.1
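The completion handshake in this patch (init_completion() when a rescan starts, complete_all() when the worker finishes, and a waiter that returns at once if the RESCAN flag is clear) can be modeled in userspace with a mutex and a condition variable. This is a hedged sketch, not kernel code: struct completion is emulated with pthreads, a bool stands in for BTRFS_QGROUP_STATUS_FLAG_RESCAN, and all names are hypothetical. Link with -pthread on older glibc.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Stand-in for the rescan-related fields of struct btrfs_fs_info. */
struct rescan_model {
	pthread_mutex_t rescan_lock;	/* plays qgroup_rescan_lock */
	bool running;			/* plays the RESCAN status flag */
	struct completion_model completion;
	unsigned long items_scanned;	/* pretend work done by the worker */
};

void rescan_model_init(struct rescan_model *m)
{
	pthread_mutex_init(&m->rescan_lock, NULL);
	pthread_mutex_init(&m->completion.lock, NULL);
	pthread_cond_init(&m->completion.cond, NULL);
	m->completion.done = true;	/* nothing pending yet */
	m->running = false;
	m->items_scanned = 0;
}

/* Like complete_all(): wake all current waiters and let later ones pass. */
static void complete_all_model(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Like wait_for_completion(); the patch uses the interruptible variant. */
static void wait_for_completion_model(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Worker: do the scan, then clear the flag and complete under rescan_lock. */
static void *rescan_worker(void *arg)
{
	struct rescan_model *m = arg;

	for (int i = 0; i < 1000; i++)
		m->items_scanned++;

	pthread_mutex_lock(&m->rescan_lock);
	m->running = false;
	complete_all_model(&m->completion);
	pthread_mutex_unlock(&m->rescan_lock);
	return NULL;
}

/* Like btrfs_qgroup_rescan(): set the flag, re-arm the completion, start. */
int rescan_start(struct rescan_model *m, pthread_t *worker)
{
	int ret;

	pthread_mutex_lock(&m->rescan_lock);
	if (m->running) {
		pthread_mutex_unlock(&m->rescan_lock);
		return EINPROGRESS;
	}
	m->running = true;
	m->items_scanned = 0;
	pthread_mutex_lock(&m->completion.lock);
	m->completion.done = false;	/* init_completion() */
	pthread_mutex_unlock(&m->completion.lock);
	pthread_mutex_unlock(&m->rescan_lock);

	ret = pthread_create(worker, NULL, rescan_worker, m);
	if (ret) {	/* roll back so that waiters do not block forever */
		pthread_mutex_lock(&m->rescan_lock);
		m->running = false;
		complete_all_model(&m->completion);
		pthread_mutex_unlock(&m->rescan_lock);
	}
	return ret;
}

/* Like btrfs_qgroup_wait_for_completion(): return at once if nothing runs. */
void rescan_wait(struct rescan_model *m)
{
	bool running;

	pthread_mutex_lock(&m->rescan_lock);
	running = m->running;
	pthread_mutex_unlock(&m->rescan_lock);

	if (running)
		wait_for_completion_model(&m->completion);
}
```

The worker clears the flag and signals completion while holding rescan_lock, so a new rescan_start() cannot re-arm the completion in between and be woken by the previous worker.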



Re: [PATCH v4 2/3] Btrfs: rescan for qgroups

2013-05-01 Thread Jan Schmidt
Hi Wang,

On 01.05.2013 09:29, Wang Shilong wrote:
 Hello Jan,
 
 If qgroup tracking is out of sync, a rescan operation can be started. It
 iterates the complete extent tree and recalculates all qgroup tracking data.
 This is an expensive operation and should not be used unless required.

 A filesystem under rescan can still be unmounted. The rescan continues on the
 next mount.  Status information is provided with a separate ioctl while a
 rescan operation is in progress.

 Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
 ---
 fs/btrfs/ctree.h   |   17 ++-
 fs/btrfs/disk-io.c |5 +
 fs/btrfs/ioctl.c   |   83 ++--
 fs/btrfs/qgroup.c  |  318 
 ++--
 include/uapi/linux/btrfs.h |   12 ++-
 5 files changed, 400 insertions(+), 35 deletions(-)

 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 412c306..e4f28a6 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1021,9 +1021,9 @@ struct btrfs_block_group_item {
  */
 #define BTRFS_QGROUP_STATUS_FLAG_ON  (1ULL << 0)
 /*
 - * SCANNING is set during the initialization phase
 + * RESCAN is set during the initialization phase
  */
 -#define BTRFS_QGROUP_STATUS_FLAG_SCANNING   (1ULL << 1)
 +#define BTRFS_QGROUP_STATUS_FLAG_RESCAN (1ULL << 1)
 /*
  * Some qgroup entries are known to be out of date,
  * either because the configuration has changed in a way that
 @@ -1052,7 +1052,7 @@ struct btrfs_qgroup_status_item {
   * only used during scanning to record the progress
   * of the scan. It contains a logical address
   */
 -__le64 scan;
 +__le64 rescan;
 } __attribute__ ((__packed__));

 struct btrfs_qgroup_info_item {
 @@ -1603,6 +1603,11 @@ struct btrfs_fs_info {
  /* used by btrfs_qgroup_record_ref for an efficient tree traversal */
  u64 qgroup_seq;

 +/* qgroup rescan items */
 +struct mutex qgroup_rescan_lock; /* protects the progress item */
 +struct btrfs_key qgroup_rescan_progress;
 +struct btrfs_workers qgroup_rescan_workers;
 +
  /* filesystem state */
  unsigned long fs_state;

 @@ -2888,8 +2893,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item,
 version, 64);
 BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
 flags, 64);
 -BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item,
 -   scan, 64);
 +BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
 +   rescan, 64);

 /* btrfs_qgroup_info_item */
 BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
 @@ -3834,7 +3839,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
 struct btrfs_fs_info *fs_info);
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info);
 -int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
 +int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 7717363..63e9348 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -2010,6 +2010,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
  btrfs_stop_workers(&fs_info->caching_workers);
  btrfs_stop_workers(&fs_info->readahead_workers);
  btrfs_stop_workers(&fs_info->flush_workers);
 +btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
 }

 /* helper to cleanup tree roots */
 @@ -2301,6 +2302,7 @@ int open_ctree(struct super_block *sb,
  fs_info->qgroup_seq = 1;
  fs_info->quota_enabled = 0;
  fs_info->pending_quota_state = 0;
 +mutex_init(&fs_info->qgroup_rescan_lock);

  btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
  btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
 @@ -2529,6 +2531,8 @@ int open_ctree(struct super_block *sb,
  btrfs_init_workers(&fs_info->readahead_workers, "readahead",
 fs_info->thread_pool_size,
 &fs_info->generic_worker);
 +btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
 +   &fs_info->generic_worker);

  /*
    * endios are largely parallel and should have a very
 @@ -2563,6 +2567,7 @@ int open_ctree(struct super_block *sb,
  ret |= btrfs_start_workers(&fs_info->caching_workers);
  ret |= btrfs_start_workers(&fs_info->readahead_workers);
  ret |= btrfs_start_workers(&fs_info->flush_workers);
 +ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
  if (ret) {
  err = -ENOMEM;
  goto fail_sb_buffer;
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index d0af96a..5e93bb8 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c

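The rescan described in the v4 patch walks the complete extent tree, recomputes all qgroup numbers from scratch, and records its position as a logical address in the qgroup status item, so an unmount merely pauses it. A simplified, hypothetical model of that resumable loop is sketched below: extents are a sorted in-memory array rather than a B-tree, `progress` plays the role of the persisted rescan position, and each extent is charged to exactly one qgroup, whereas real accounting has to resolve every root that references an extent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QGROUPS 4	/* hypothetical fixed number of qgroups in the model */

/* Hypothetical flattened extent record; the array is sorted by bytenr. */
struct extent {
	uint64_t bytenr;	/* logical start address */
	uint64_t num_bytes;
	unsigned int qgroup;	/* single qgroup charged for this extent */
};

struct rescan_state {
	uint64_t progress;	/* next logical address to visit */
	uint64_t referenced[MAX_QGROUPS];
};

/* Start a rescan: trash all qgroup numbers and restart at address 0. */
void rescan_begin(struct rescan_state *st)
{
	st->progress = 0;
	for (int i = 0; i < MAX_QGROUPS; i++)
		st->referenced[i] = 0;
}

/*
 * Account at most `batch` extents at or after st->progress and advance the
 * progress marker. Returns how many extents were processed; 0 means the
 * rescan has finished. Stopping between calls models an unmount, and
 * calling again with the saved state models resuming on the next mount.
 */
size_t rescan_step(struct rescan_state *st, const struct extent *ext,
		   size_t n, size_t batch)
{
	size_t i = 0, done = 0;

	/* Skip what is already accounted for, like searching from the progress key. */
	while (i < n && ext[i].bytenr < st->progress)
		i++;

	for (; i < n && done < batch; i++, done++) {
		assert(ext[i].qgroup < MAX_QGROUPS);
		st->referenced[ext[i].qgroup] += ext[i].num_bytes;
		st->progress = ext[i].bytenr + 1;
	}
	return done;
}
```

Interrupting after any batch and continuing from the saved state yields the same totals as an uninterrupted scan, which is the property that lets the rescan survive an unmount.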
Re: [PATCH v4 2/3] Btrfs: rescan for qgroups

2013-05-01 Thread Jan Schmidt


On 01.05.2013 13:42, Wang Shilong wrote:
 Hi Jan,
 
 Hi Wang,

 On 01.05.2013 09:29, Wang Shilong wrote:
 Hello Jan,


[BUG] crash after failed mount of btrfs-image

2013-04-29 Thread Jan Schmidt
Hi Josef,

I tried your btrfs-image tool (which didn't work for me, but that's not that
important).

# ~/btrfs-image /dev/sdt1 /var/tmp/janosch.btrfsimage
# mount -o loop /var/tmp/janosch.btrfsimage /mnt/test
mount: you must specify the filesystem type

Doesn't mount, okay. Use -r:

# ~/btrfs-image -r /var/tmp/janosch.btrfsimage /var/tmp/removeme
# mount -o loop /var/tmp/removeme /mnt/test

That mount failed in open_ctree, which is probably related. The subsequent
loopback mount of a different dd dump of the same file system led to a NULL
pointer dereference.

# mount -o loop /var/tmp/janosch.dump /mnt/test

I'm just guessing here: should btrfs-image generate a fresh UUID and patch it
into the blocks?

<1>[ 2287.927943] BUG: unable to handle kernel
<6>[ 2287.927944] SysRq : Changing Loglevel
<4>[ 2287.927945] Loglevel set to 3
<4>[ 2288.061561] NULL pointer dereference at 01e8
<1>[ 2288.061563] IP: [a048da30] start_transaction+0x20/0x4f0 [btrfs]
<4>[ 2288.143091] PGD 232f5c067 PUD 2279dd067 PMD 0
<4>[ 2288.143094] Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
<4>[ 2288.143098] Modules linked in: btrfs raid6_pq xor mpt2sas scsi_transport_sas raid_class [last unloaded: btrfs]
<4>[ 2288.143104] CPU 2
<4>[ 2288.143107] Pid: 22375, comm: btrfs-qgroup-re Not tainted 3.8.0+ #15 Supermicro X8SIL/X8SIL
<4>[ 2288.143109] RIP: 0010:[a048da30]  [a048da30] start_transaction+0x20/0x4f0 [btrfs]
<4>[ 2288.143122] RSP: 0018:880232d79c28  EFLAGS: 00010296
<4>[ 2288.143123] RAX: 0014 RBX: ffe2 RCX: 0002
<4>[ 2288.143125] RDX:  RSI:  RDI:
<4>[ 2288.143126] RBP: 880232d79c78 R08:  R09:
<4>[ 2288.143128] R10: 0001 R11: 074b R12: 880231c31378
<4>[ 2288.143129] R13: 880227ce5158 R14:  R15: 880231c31368
<4>[ 2288.143131] FS:  () GS:880236a0() knlGS:
<4>[ 2288.143133] CS:  0010 DS:  ES:  CR0: 8005003b
<4>[ 2288.143134] CR2: 01e8 CR3: 00022fdd CR4: 07e0
<4>[ 2288.143136] DR0:  DR1:  DR2:
<4>[ 2288.143137] DR3:  DR6: 0ff0 DR7: 0400
<4>[ 2288.143139] Process btrfs-qgroup-re (pid: 22375, threadinfo 880232d78000, task 880234eb)
<4>[ 2288.143140] Stack:
<4>[ 2288.143141]  88020010 880232d79c98 880232d79c58 0298
<4>[ 2288.143144]  a050658d fff4 880231c31378 880227ce5158
<4>[ 2288.143147]  880232d79dd8 880231c31368 880232d79c88 a048e298
<4>[ 2288.143151] Call Trace:
<4>[ 2288.143164]  [a048e298] btrfs_start_transaction+0x18/0x20 [btrfs]
<4>[ 2288.143180]  [a04ef035] btrfs_qgroup_rescan_worker+0xd5/0x840 [btrfs]
<4>[ 2288.143184]  [810ec06d] ? trace_hardirqs_off+0xd/0x10
<4>[ 2288.143187]  [810c99ab] ? local_clock+0x4b/0x60
<4>[ 2288.143191]  [819b9420] ? _raw_spin_unlock_irq+0x30/0x60
<4>[ 2288.143206]  [a04bc26f] worker_loop+0x13f/0x5b0 [btrfs]
<4>[ 2288.143221]  [a04bc130] ? btrfs_queue_worker+0x300/0x300 [btrfs]
<4>[ 2288.143224]  [810b4ebe] kthread+0xde/0xf0
<4>[ 2288.143227]  [810b4de0] ? __init_kthread_worker+0x70/0x70
<4>[ 2288.143231]  [819c0bdc] ret_from_fork+0x7c/0xb0
<4>[ 2288.143233]  [810b4de0] ? __init_kthread_worker+0x70/0x70
<4>[ 2288.143235] Code: c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 28 66 66 66 66 90 48 c7 c3 e2 ff ff ff 49 89 f6 48 8b b7 e8 01 00 00 49 89 fc 41 89 d5 48 8b 86 a0 33 00 00 a8
<1>[ 2288.143266] RIP  [a048da30] start_transaction+0x20/0x4f0 [btrfs]
<4>[ 2288.225868]  RSP 880232d79c28
<4>[ 2288.225870] CR2: 01e8
<4>[ 2288.226363] ---[ end trace 64cb1c6d4f6c2fa7 ]---

The corresponding line of code from start_transaction is 334:

 324 static struct btrfs_trans_handle *
 325 start_transaction(struct btrfs_root *root, u64 num_items, int type,
 326   enum btrfs_reserve_flush_enum flush)
 327 {
 328 struct btrfs_trans_handle *h;
 329 struct btrfs_transaction *cur_trans;
 330 u64 num_bytes = 0;
 331 int ret;
 332 u64 qgroup_reserved = 0;
 333 
 334 if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state))
 335 return ERR_PTR(-EROFS);

Following the steps above, I could reproduce the problem once; a second attempt
failed.

-Jan


Re: [PATCH] xfstests: btrfs/276 - stop all fsstress before exiting

2013-04-26 Thread Jan Schmidt
On Fri, April 26, 2013 at 07:29 (+0200), Eric Sandeen wrote:
 Tests after 276 were failing because the background fsstress
 hadn't quit prior to exit, devices couldn't be unmounted, etc.

I don't see how that would happen. Any further insight?

 Just use the same trick as generic/068 does, and use
 a tmpfile to control whether the background loop keeps
 running.

I like that trick :-)
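For readers less familiar with the shell idiom, here is a rough C analogue of
the tmpfile trick, offered only as a sketch: the background loop polls for a
flag file and exits on its own once the file is gone, so cleanup just removes
the file and waits. It assumes POSIX fork(), access(), unlink(), nanosleep()
and waitpid(); the function name and flag path are hypothetical, and the short
sleep stands in for one fsstress round.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/*
 * Hypothetical helper: returns the background loop's exit status,
 * or -1 if setup failed.
 */
static int run_background_noise_demo(const char *flag_path)
{
	assert(flag_path != NULL);

	/* "touch $tmp.running" */
	int fd = open(flag_path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	close(fd);

	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* "while [ -f $tmp.running ]; do ... done &" */
		while (access(flag_path, F_OK) == 0)
			sleep_ms(1);	/* stand-in for one fsstress + rm round */
		_exit(0);		/* loop saw the file vanish and quit */
	}

	sleep_ms(20);			/* let the loop run for a while */

	/* _cleanup(): "rm $tmp.running" followed by "wait" */
	unlink(flag_path);
	int status;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The point of the design is that nothing has to be killed: once waitpid()
returns, the loop is known to have finished its current round, so nothing is
left holding the scratch mount.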

Thanks,
-Jan

 Also, no need to umount scratch at cleanup time, the scripts
 do that for us.
 
 Signed-off-by: Eric Sandeen sand...@redhat.com
 ---
 
 (nobody else ran into this?  really?)
 
 diff --git a/tests/btrfs/276 b/tests/btrfs/276
 index 0a5ce36..9d68b54 100755
 --- a/tests/btrfs/276
 +++ b/tests/btrfs/276
 @@ -36,14 +36,8 @@ noise_pid=0
  
  _cleanup()
  {
 -	if [ $noise_pid -ne 0 ]; then
 -		echo background noise kill $noise_pid >>$seqres.full
 -		kill $noise_pid
 -		noise_pid=0
 -		wait
 -	fi
 -	echo "*** unmount"
 -	umount $SCRATCH_MNT 2>/dev/null
 +	rm $tmp.running
 +	wait
  	rm -f $tmp.*
   }
   trap "_cleanup; exit \$status" 0 1 2 3 15
 @@ -210,7 +204,7 @@ workout()
  
   if [ $do_bg_noise -ne 0 ]; then
   # make background noise while backrefs are being walked
 -		while /bin/true; do
 +		while [ -f $tmp.running ]; do
  			echo background fsstress >>$seqres.full
  			run_check $FSSTRESS_PROG -d $SCRATCH_MNT/bgnoise -n 999
  			echo background rm >>$seqres.full
 @@ -263,6 +257,8 @@ nfiles=4
  numprocs=1
  do_bg_noise=1
  
 +touch $tmp.running
 +
  workout $filesize $nfiles $numprocs $snap_name $do_bg_noise
  
  echo "*** done"
 diff --git a/tests/btrfs/276.out b/tests/btrfs/276.out
 index 2032dea..5113164 100644
 --- a/tests/btrfs/276.out
 +++ b/tests/btrfs/276.out
 @@ -1,4 +1,3 @@
  QA output created by 276
  *** test backref walking
  *** done
 -*** unmount
 


[PATCH v4 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-25 Thread Jan Schmidt
The function is separated into a preparation part and the three accounting
steps mentioned in the qgroups documentation. The goal is to make steps two
and three usable by the rescan functionality. A side effect is that the
function is restructured into readable subunits.
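The refcnt/seq handling in step 1 and step 2 (and in the rescan code) lets each
accounting round start without resetting any qgroup: a qgroup whose refcnt is
below the round's seq counts as untouched, and advancing the global sequence by
the largest possible count plus one (the "roots->nnodes + 1 /* max refcnt */"
bump in the rescan patch) invalidates all old marks at once. The following is a
minimal user-space sketch of that idea only; struct toy_qgroup and the toy_*
helpers are hypothetical, and the walk up the qgroup hierarchy is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_qgroup: only refcnt matters here. */
struct toy_qgroup {
	uint64_t refcnt;
};

/*
 * Count one visit in the round that started at seq, following the
 * pattern in qgroup_account_ref_step1(): refcnt < seq means "not yet
 * visited in this round", so the first visit starts counting at seq + 1.
 */
static void toy_visit(struct toy_qgroup *qg, uint64_t seq)
{
	if (qg->refcnt < seq)
		qg->refcnt = seq + 1;
	else
		++qg->refcnt;
}

/* Mirrors the "qg->refcnt < seq" test in step 2: touched this round? */
static int toy_was_visited(const struct toy_qgroup *qg, uint64_t seq)
{
	/* with seq == 0, an untouched refcnt of 0 would look visited */
	assert(seq > 0);
	return qg->refcnt >= seq;
}

/* Number of visits recorded in the round that started at seq. */
static uint64_t toy_visits(const struct toy_qgroup *qg, uint64_t seq)
{
	return toy_was_visited(qg, seq) ? qg->refcnt - seq : 0;
}

/*
 * Start a new round without touching any qgroup: move the global
 * sequence past the largest refcnt this round can produce.
 */
static uint64_t toy_next_round(uint64_t *global_seq, uint64_t max_visits)
{
	uint64_t seq = *global_seq;

	*global_seq += max_visits + 1;
	return seq;
}
```

With the global sequence starting at 1 (as fs_info->qgroup_seq does in
open_ctree), a qgroup visited twice in one round reads as unvisited in the next
round without ever being cleared.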

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |  253 +++--
 1 files changed, 148 insertions(+), 105 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f175471..c50e5a5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1185,6 +1185,144 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+static int qgroup_account_ref_step1(struct btrfs_fs_info *fs_info,
+				    struct ulist *roots, struct ulist *tmp,
+				    u64 seq)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct ulist_node *tmp_unode;
+	struct ulist_iterator tmp_uiter;
+	struct btrfs_qgroup *qg;
+	int ret;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(roots, &uiter))) {
+		qg = find_qgroup_rb(fs_info, unode->val);
+		if (!qg)
+			continue;
+
+		ulist_reinit(tmp);
+		/* XXX id not needed */
+		ret = ulist_add(tmp, qg->qgroupid,
+				(u64)(uintptr_t)qg, GFP_ATOMIC);
+		if (ret < 0)
+			return ret;
+		ULIST_ITER_INIT(&tmp_uiter);
+		while ((tmp_unode = ulist_next(tmp, &tmp_uiter))) {
+			struct btrfs_qgroup_list *glist;
+
+			qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode->aux;
+			if (qg->refcnt < seq)
+				qg->refcnt = seq + 1;
+			else
+				++qg->refcnt;
+
+			list_for_each_entry(glist, &qg->groups, next_group) {
+				ret = ulist_add(tmp, glist->group->qgroupid,
+						(u64)(uintptr_t)glist->group,
+						GFP_ATOMIC);
+				if (ret < 0)
+					return ret;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int qgroup_account_ref_step2(struct btrfs_fs_info *fs_info,
+				    struct ulist *roots, struct ulist *tmp,
+				    u64 seq, int sgn, u64 num_bytes,
+				    struct btrfs_qgroup *qgroup)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct btrfs_qgroup *qg;
+	struct btrfs_qgroup_list *glist;
+	int ret;
+
+	ulist_reinit(tmp);
+	ret = ulist_add(tmp, qgroup->qgroupid, (uintptr_t)qgroup, GFP_ATOMIC);
+	if (ret < 0)
+		return ret;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(tmp, &uiter))) {
+		qg = (struct btrfs_qgroup *)(uintptr_t)unode->aux;
+		if (qg->refcnt < seq) {
+			/* not visited by step 1 */
+			qg->rfer += sgn * num_bytes;
+			qg->rfer_cmpr += sgn * num_bytes;
+			if (roots->nnodes == 0) {
+				qg->excl += sgn * num_bytes;
+				qg->excl_cmpr += sgn * num_bytes;
+			}
+			qgroup_dirty(fs_info, qg);
+		}
+		WARN_ON(qg->tag >= seq);
+		qg->tag = seq;
+
+		list_for_each_entry(glist, &qg->groups, next_group) {
+			ret = ulist_add(tmp, glist->group->qgroupid,
+					(uintptr_t)glist->group, GFP_ATOMIC);
+			if (ret < 0)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int qgroup_account_ref_step3(struct btrfs_fs_info *fs_info,
+				    struct ulist *roots, struct ulist *tmp,
+				    u64 seq, int sgn, u64 num_bytes)
+{
+	struct ulist_node *unode;
+	struct ulist_iterator uiter;
+	struct btrfs_qgroup *qg;
+	struct ulist_node *tmp_unode;
+	struct ulist_iterator tmp_uiter;
+	int ret;
+
+	ULIST_ITER_INIT(&uiter);
+	while ((unode = ulist_next(roots, &uiter))) {
+		qg = find_qgroup_rb(fs_info, unode->val);
+		if (!qg)
+			continue;
+
+		ulist_reinit(tmp);
+		ret = ulist_add(tmp, qg->qgroupid, (uintptr_t)qg, GFP_ATOMIC);
+		if (ret < 0)
+			return ret;
+
+		ULIST_ITER_INIT(&tmp_uiter);
+		while ((tmp_unode = ulist_next(tmp

[PATCH v4 3/3] Btrfs: automatic rescan after quota enable command

2013-04-25 Thread Jan Schmidt
When qgroup tracking is enabled, we do an automatic cycle of the new rescan
mechanism.
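The trigger condition is an edge, not a level: a rescan starts only when quota
goes from disabled to enabled in this commit, never on later commits while
quota stays on. The sketch below models just that control flow from the v4
hunk in btrfs_run_qgroups(); struct toy_fs_info, toy_run_qgroups() and the
rescan callbacks are hypothetical stand-ins, and the qgroup item write-back is
omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for the relevant btrfs_fs_info fields. */
struct toy_fs_info {
	bool quota_enabled;       /* state in effect for this commit */
	bool pending_quota_state; /* state requested by the quota ioctl */
	int rescans_started;
};

/* Stand-in for btrfs_qgroup_rescan(): 0 on success or a negative errno. */
typedef int (*toy_rescan_fn)(struct toy_fs_info *fs);

/*
 * Start a rescan only on the disabled -> enabled transition, and report
 * but swallow a failure to start it so this function itself still
 * returns success (as the v4 hunk does with "ret = 0").
 */
static int toy_run_qgroups(struct toy_fs_info *fs, toy_rescan_fn rescan)
{
	int ret = 0; /* would hold the result of writing back qgroup items */
	bool start_rescan_worker;

	assert(rescan != NULL);
	start_rescan_worker = !fs->quota_enabled && fs->pending_quota_state;
	fs->quota_enabled = fs->pending_quota_state;

	if (!ret && start_rescan_worker) {
		ret = rescan(fs);
		if (ret)
			fprintf(stderr, "toy: start rescan quota failed: %d\n", ret);
		ret = 0;
	}
	return ret;
}

static int toy_rescan_ok(struct toy_fs_info *fs)
{
	fs->rescans_started++;
	return 0;
}

static int toy_rescan_busy(struct toy_fs_info *fs)
{
	(void)fs;
	return -EBUSY;
}
```

The v3 posting of this patch differs in exactly that last step: it kept the
error from btrfs_qgroup_rescan() in ret and returned it to the caller.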

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 664d457..1df4db5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1491,10 +1491,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_root *quota_root = fs_info->quota_root;
 	int ret = 0;
+	int start_rescan_worker = 0;
 
 	if (!quota_root)
 		goto out;
 
+	if (!fs_info->quota_enabled && fs_info->pending_quota_state)
+		start_rescan_worker = 1;
+
 	fs_info->quota_enabled = fs_info->pending_quota_state;
 
 	spin_lock(&fs_info->qgroup_lock);
@@ -1520,6 +1524,13 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 	if (ret)
 		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 
+	if (!ret && start_rescan_worker) {
+		ret = btrfs_qgroup_rescan(fs_info);
+		if (ret)
+			pr_err("btrfs: start rescan quota failed: %d\n", ret);
+		ret = 0;
+	}
+
 out:
 
 	return ret;
-- 
1.7.1



[PATCH v4 2/3] Btrfs: rescan for qgroups

2013-04-25 Thread Jan Schmidt
If qgroup tracking is out of sync, a rescan operation can be started. It
iterates the complete extent tree and recalculates all qgroup tracking data.
This is an expensive operation and should not be used unless required.

A filesystem under rescan can still be unmounted. The rescan continues on the
next mount.  Status information is provided with a separate ioctl while a
rescan operation is in progress.
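The progress handling can be pictured with a toy model: extents are scanned one
leaf at a time, the progress key is moved just past the last item of each
leaf, and once nothing is left it is pushed to the maximum value so that every
real objectid compares below it. The model assumes, based on the comment in
qgroup_rescan_leaf() from the v3 review, that live accounting should only
touch extents below the progress pointer, because the rescan will still pick up
the rest. All names are hypothetical, a "leaf" is simply two array entries,
and persisting progress across an unmount is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LEAF_SIZE 2 /* hypothetical: items per "leaf" */

struct toy_rescan {
	const uint64_t *extents;    /* sorted extent objectids ("extent tree") */
	size_t nr_extents;
	uint64_t progress;          /* first objectid not yet scanned */
	uint64_t extents_accounted; /* stand-in for recomputed qgroup numbers */
};

/* Scan one leaf. Returns 0 if more leaves remain, 1 when the scan is done. */
static int toy_rescan_leaf(struct toy_rescan *r)
{
	size_t i = 0;

	/* position at the first extent >= progress, like the key search */
	while (i < r->nr_extents && r->extents[i] < r->progress)
		i++;
	if (i == r->nr_extents) {
		/* done: every real objectid now compares below progress */
		r->progress = UINT64_MAX;
		return 1;
	}

	size_t end = i + TOY_LEAF_SIZE;
	if (end > r->nr_extents)
		end = r->nr_extents;
	/* next time, resume right after the last item of this leaf */
	r->progress = r->extents[end - 1] + 1;
	for (; i < end; i++)
		r->extents_accounted++;
	return 0;
}

/* Live accounting only covers the part the rescan has already passed. */
static int toy_live_accounting_applies(const struct toy_rescan *r,
				       uint64_t objectid)
{
	return objectid < r->progress;
}
```

In the patch itself the progress key lives in fs_info->qgroup_rescan_progress
under qgroup_rescan_lock, and the on-disk status item gains a rescan field for
recording progress; the toy keeps everything in one struct.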

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h   |   17 ++-
 fs/btrfs/disk-io.c |5 +
 fs/btrfs/ioctl.c   |   83 ++--
 fs/btrfs/qgroup.c  |  318 ++--
 include/uapi/linux/btrfs.h |   12 ++-
 5 files changed, 400 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 412c306..e4f28a6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1021,9 +1021,9 @@ struct btrfs_block_group_item {
  */
 #define BTRFS_QGROUP_STATUS_FLAG_ON		(1ULL << 0)
 /*
- * SCANNING is set during the initialization phase
+ * RESCAN is set during the initialization phase
  */
-#define BTRFS_QGROUP_STATUS_FLAG_SCANNING	(1ULL << 1)
+#define BTRFS_QGROUP_STATUS_FLAG_RESCAN		(1ULL << 1)
 /*
  * Some qgroup entries are known to be out of date,
  * either because the configuration has changed in a way that
@@ -1052,7 +1052,7 @@ struct btrfs_qgroup_status_item {
 * only used during scanning to record the progress
 * of the scan. It contains a logical address
 */
-   __le64 scan;
+   __le64 rescan;
 } __attribute__ ((__packed__));
 
 struct btrfs_qgroup_info_item {
@@ -1603,6 +1603,11 @@ struct btrfs_fs_info {
/* used by btrfs_qgroup_record_ref for an efficient tree traversal */
u64 qgroup_seq;
 
+   /* qgroup rescan items */
+   struct mutex qgroup_rescan_lock; /* protects the progress item */
+   struct btrfs_key qgroup_rescan_progress;
+   struct btrfs_workers qgroup_rescan_workers;
+
/* filesystem state */
unsigned long fs_state;
 
@@ -2888,8 +2893,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item,
   version, 64);
 BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
   flags, 64);
-BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item,
-  scan, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
+  rescan, 64);
 
 /* btrfs_qgroup_info_item */
 BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
@@ -3834,7 +3839,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
   struct btrfs_fs_info *fs_info);
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info);
-int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7717363..63e9348 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2010,6 +2010,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_stop_workers(&fs_info->caching_workers);
 	btrfs_stop_workers(&fs_info->readahead_workers);
 	btrfs_stop_workers(&fs_info->flush_workers);
+	btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
 }
 
 /* helper to cleanup tree roots */
@@ -2301,6 +2302,7 @@ int open_ctree(struct super_block *sb,
 	fs_info->qgroup_seq = 1;
 	fs_info->quota_enabled = 0;
 	fs_info->pending_quota_state = 0;
+	mutex_init(&fs_info->qgroup_rescan_lock);
 
 	btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
 	btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
@@ -2529,6 +2531,8 @@ int open_ctree(struct super_block *sb,
 	btrfs_init_workers(&fs_info->readahead_workers, "readahead",
 			   fs_info->thread_pool_size,
 			   &fs_info->generic_worker);
+	btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
+			   &fs_info->generic_worker);
 
 	/*
 	 * endios are largely parallel and should have a very
@@ -2563,6 +2567,7 @@ int open_ctree(struct super_block *sb,
 	ret |= btrfs_start_workers(&fs_info->caching_workers);
 	ret |= btrfs_start_workers(&fs_info->readahead_workers);
 	ret |= btrfs_start_workers(&fs_info->flush_workers);
+	ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
err = -ENOMEM;
goto fail_sb_buffer;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d0af96a..5e93bb8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3701,12

Re: [PATCH] Btrfs: separate sequence numbers for delayed ref tracking and tree mod log

2013-04-24 Thread Jan Schmidt
On Wed, April 24, 2013 at 10:12 (+0200), Liu Bo wrote:
 On Tue, Apr 23, 2013 at 08:00:27PM +0200, Jan Schmidt wrote:
 Sequence numbers for delayed refs have been introduced in the first version
 of the qgroup patch set. To solve the problem of find_all_roots on a busy
 file system, the tree mod log was introduced. The sequence numbers for that
 were simply shared between those two users.
 
 Can't we just separate them with two vars?

My reasoning comes a few lines below ...

 thanks,
 liubo
 

 However, at one point in qgroup's quota accounting, there's a statement
 accessing the previous sequence number, that's still just doing (seq - 1)
 just as it had to in the very first version.

 To satisfy that requirement, this patch makes the sequence number counter 64
 bit and splits it into a major part (used for qgroup sequence number
 counting) and a minor part (incremented for each tree modification in the
 log). This enables us to go exactly one major step backwards, as required
 for qgroups, while still incrementing the sequence counter for tree mod log
 insertions to keep track of their order. Keeping them in a single variable
 means there's no need to change all the code dealing with comparisons of two
 sequence numbers.

See the previous sentence :-)

And it doesn't add much complexity: setting and incrementing remain quite
easy, even though we use the upper 32 bits and the lower 32 bits of that
integer independently.
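To make the "quite easy" claim concrete, here is a user-space sketch of the
split counter with C11 atomics standing in for the kernel's atomic64_t helpers.
It mirrors the three helpers in the v2 patch (major increment, minor increment,
previous major); the toy_* names are hypothetical, and the kernel versions are
additionally called with tree_mod_seq_lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_MAJOR_MASK 0xffffffff00000000ull

/* Hypothetical user-space stand-in for fs_info->tree_mod_seq. */
static _Atomic uint64_t toy_tree_mod_seq;

/* cf. btrfs_inc_tree_mod_seq_major(): bump the upper half, zero the lower */
static uint64_t toy_inc_major(void)
{
	uint64_t seq = atomic_load(&toy_tree_mod_seq);

	seq &= TOY_MAJOR_MASK;
	seq += 1ull << 32;
	atomic_store(&toy_tree_mod_seq, seq);
	return seq;
}

/* cf. btrfs_inc_tree_mod_seq_minor(): bump the lower half, return new value */
static uint64_t toy_inc_minor(void)
{
	uint64_t seq = atomic_fetch_add(&toy_tree_mod_seq, 1) + 1;

	/* the minor must not wrap into the major; per the patch description,
	 * the reset to 0 on commit keeps it small */
	assert((seq & ~TOY_MAJOR_MASK) != 0);
	return seq;
}

/* cf. btrfs_tree_mod_seq_prev(): last minor number of the previous major */
static uint64_t toy_seq_prev(uint64_t seq)
{
	return (seq & TOY_MAJOR_MASK) - 1ull;
}
```

Every number handed out under major 1 is at most toy_seq_prev() of the first
major-2 value, which is the "exactly one major step backwards" that qgroups
need.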

Thanks,
-Jan


 The sequence number is reset to 0 on commit (not new in this patch), which
 ensures we won't overflow the two 32 bit counters.

 Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
 from the tree mod log code may happen.

 Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
 ---
 [snip]


Re: [PATCH] Btrfs: separate sequence numbers for delayed ref tracking and tree mod log

2013-04-24 Thread Jan Schmidt
On Wed, April 24, 2013 at 15:04 (+0200), Josef Bacik wrote:
 On Tue, Apr 23, 2013 at 12:00:27PM -0600, Jan Schmidt wrote:
 Sequence numbers for delayed refs have been introduced in the first version
 of the qgroup patch set. To solve the problem of find_all_roots on a busy
 file system, the tree mod log was introduced. The sequence numbers for that
 were simply shared between those two users.

 However, at one point in qgroup's quota accounting, there's a statement
 accessing the previous sequence number, that's still just doing (seq - 1)
 just as it had to in the very first version.

 To satisfy that requirement, this patch makes the sequence number counter 64
 bit and splits it into a major part (used for qgroup sequence number
 counting) and a minor part (incremented for each tree modification in the
 log). This enables us to go exactly one major step backwards, as required
 for qgroups, while still incrementing the sequence counter for tree mod log
 insertions to keep track of their order. Keeping them in a single variable
 means there's no need to change all the code dealing with comparisons of two
 sequence numbers.

 The sequence number is reset to 0 on commit (not new in this patch), which
 ensures we won't overflow the two 32 bit counters.

 Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
 from the tree mod log code may happen.

 Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
 ---
  fs/btrfs/ctree.c   |   36 +---
  fs/btrfs/ctree.h   |7 ++-
  fs/btrfs/delayed-ref.c |6 --
  fs/btrfs/disk-io.c |2 +-
  fs/btrfs/extent-tree.c |5 +++--
  fs/btrfs/qgroup.c  |   13 -
  fs/btrfs/transaction.c |2 +-
  7 files changed, 52 insertions(+), 19 deletions(-)

 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 566d99b..b74136e 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -361,6 +361,36 @@ static inline void tree_mod_log_write_unlock(struct btrfs_fs_info *fs_info)
  }
  
  /*
 + * increment the upper half of tree_mod_seq, set lower half zero
 + *
 + * must be called with fs_info->tree_mod_seq_lock held
 + */
 +static inline u64 btrfs_inc_tree_mod_seq_major(struct btrfs_fs_info *fs_info)
 +{
 +	u64 seq = atomic64_read(&fs_info->tree_mod_seq);
 +	seq &= 0xffffffff00000000ull;
 +	seq += 1ull << 32;
 +	atomic64_set(&fs_info->tree_mod_seq, seq);
 +	return seq;
 +}
 
 This isn't going to work, you read in the value, inc it and then set the new
 value.  If somebody comes in and inc's in between the read and the set, like
 btrfs_inc_tree_mod_seq_minor could do when you call tree_mod_alloc, you'll end
 up losing the minor update.  Thanks,

I don't think I'll lose it. The minor update is made and its number is returned
to the caller who needs it, so that number can still be used. For two
concurrent modifications, though, there is no guarantee as to which major
number a minor number belongs.
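The interleaving under discussion can be replayed deterministically. The sketch
below splits a major increment into its read and its write and lets a minor
increment land in between; it uses a plain variable and hypothetical names, so
it illustrates only the arithmetic, not real concurrency. The minor number
handed out stays unique and below the new major, and the final counter value
is the same as if the minor increment had come first, because the major
increment clears the lower half anyway. (The v2 patch nevertheless keeps the
spin lock around the minor increment.)

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAJOR_MASK 0xffffffff00000000ull

struct toy_trace {
	uint64_t minor_seq; /* number returned by the concurrent minor increment */
	uint64_t major_seq; /* number returned by the major increment */
	uint64_t final_seq; /* counter value once both are done */
};

/* Replay: major reads, minor increments, then major masks, bumps, stores. */
static struct toy_trace toy_replay_race(uint64_t start)
{
	struct toy_trace t;
	uint64_t counter = start;

	uint64_t seq = counter;  /* major: atomic64_read() */
	t.minor_seq = ++counter; /* minor: atomic64_inc_return() sneaks in */
	seq &= TOY_MAJOR_MASK;   /* major: keep the upper half ... */
	seq += 1ull << 32;       /* ... and bump it */
	counter = seq;           /* major: atomic64_set() overwrites the bump */
	t.major_seq = seq;
	t.final_seq = counter;

	/* the two numbers handed out are distinct and ordered */
	assert(t.minor_seq < t.major_seq);
	return t;
}

/* The same two operations without interleaving: minor first, then major. */
static uint64_t toy_sequential_final(uint64_t start)
{
	uint64_t counter = start + 1;                        /* minor */

	counter = (counter & TOY_MAJOR_MASK) + (1ull << 32); /* major */
	return counter;
}
```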

Thanks,
-Jan


Re: [PATCH v3 2/3] Btrfs: rescan for qgroups

2013-04-24 Thread Jan Schmidt
On Wed, April 24, 2013 at 13:00 (+0200), Wang Shilong wrote:
 Hello Jan,
 
 [snip]
 
 +/*
 + * returns < 0 on error, 0 when more leafs are to be scanned.
 + * returns 1 when done, 2 when done and FLAG_INCONSISTENT was cleared.
 + */
 +static int
 +qgroup_rescan_leaf(struct qgroup_rescan *qscan, struct btrfs_path *path,
 +		   struct btrfs_trans_handle *trans, struct ulist *tmp,
 +		   struct extent_buffer *scratch_leaf)
 +{
 +	struct btrfs_key found;
 +	struct btrfs_fs_info *fs_info = qscan->fs_info;
 +	struct ulist *roots = NULL;
 +	struct ulist_node *unode;
 +	struct ulist_iterator uiter;
 +	struct seq_list tree_mod_seq_elem = {};
 +	u64 seq;
 +	int slot;
 +	int ret;
 +
 +	path->leave_spinning = 1;
 +	mutex_lock(&fs_info->qgroup_rescan_lock);
 +	ret = btrfs_search_slot_for_read(fs_info->extent_root,
 +					 &fs_info->qgroup_rescan_progress,
 +					 path, 1, 0);
 +
 +	pr_debug("current progress key (%llu %u %llu), search_slot ret %d\n",
 +		 (unsigned long long)fs_info->qgroup_rescan_progress.objectid,
 +		 fs_info->qgroup_rescan_progress.type,
 +		 (unsigned long long)fs_info->qgroup_rescan_progress.offset,
 +		 ret);
 +
 +	if (ret) {
 +		/*
 +		 * The rescan is about to end, we will not be scanning any
 +		 * further blocks. We cannot unset the RESCAN flag here, because
 +		 * we want to commit the transaction if everything went well.
 +		 * To make the live accounting work in this phase, we set our
 +		 * scan progress pointer such that every real extent objectid
 +		 * will be smaller.
 +		 */
 +		fs_info->qgroup_rescan_progress.objectid = (u64)-1;
 +		btrfs_release_path(path);
 +		mutex_unlock(&fs_info->qgroup_rescan_lock);
 +		return ret;
 +	}
 +
 +	btrfs_item_key_to_cpu(path->nodes[0], &found,
 +			      btrfs_header_nritems(path->nodes[0]) - 1);
 +	fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
 +
 +	btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
 +	memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
 +	slot = path->slots[0];
 +	btrfs_release_path(path);
 +	mutex_unlock(&fs_info->qgroup_rescan_lock);
 +
 +	for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
 +		btrfs_item_key_to_cpu(scratch_leaf, &found, slot);
 +		if (found.type != BTRFS_EXTENT_ITEM_KEY)
 +			continue;
 +		ret = btrfs_find_all_roots(trans, fs_info, found.objectid,
 +					   tree_mod_seq_elem.seq, &roots);
 +		if (ret < 0)
 +			break;
 +		spin_lock(&fs_info->qgroup_lock);
 +		seq = fs_info->qgroup_seq;
 +		fs_info->qgroup_seq += roots->nnodes + 1; /* max refcnt */
 +
 +		ulist_reinit(tmp);
 +		ULIST_ITER_INIT(&uiter);
 +		while ((unode = ulist_next(roots, &uiter))) {
 +			struct btrfs_qgroup *qg;
 +
 +			qg = find_qgroup_rb(fs_info, unode->val);
 +			if (!qg)
 +				continue;
 +
 +			ret = ulist_add(tmp, qg->qgroupid, (uintptr_t)qg,
 +					GFP_ATOMIC);
 +			if (ret < 0) {
 +				spin_unlock(&fs_info->qgroup_lock);
 +				goto out;
 +			}
 +		}
 +
 +		/* this is similar to step 2 of btrfs_qgroup_account_ref */
 +		ULIST_ITER_INIT(&uiter);
 +		while ((unode = ulist_next(tmp, &uiter))) {
 +			struct btrfs_qgroup *qg;
 +			struct btrfs_qgroup_list *glist;
 +
 +			qg = (struct btrfs_qgroup *)(uintptr_t) unode->aux;
 +			qg->rfer += found.offset;
 +			qg->rfer_cmpr += found.offset;
 +			WARN_ON(qg->tag >= seq);
 +			WARN_ON(qg->refcnt >= seq);
 +			if (qg->refcnt < seq)
 +				qg->refcnt = seq + 1;
 +			else
 +				qg->refcnt = qg->refcnt + 1;
 +			qgroup_dirty(fs_info, qg);
 +
 +			list_for_each_entry(glist, &qg->groups, next_group) {
 +				ret = ulist_add(tmp, glist->group->qgroupid,
 +						(uintptr_t)glist->group,
 +						GFP_ATOMIC);
 +				if (ret < 0) {
 +					spin_unlock(&fs_info->qgroup_lock);
 +					goto out;
 +				}
 +			}
 +		}
 
 
 Here I think we can reuse Arne's 3-step algorithm to make the qgroup
 accounting correct.
 However, your first step just finds all the root 

[PATCH v2] Btrfs: separate sequence numbers for delayed ref tracking and tree mod log

2013-04-24 Thread Jan Schmidt
Sequence numbers for delayed refs have been introduced in the first version
of the qgroup patch set. To solve the problem of find_all_roots on a busy
file system, the tree mod log was introduced. The sequence numbers for that
were simply shared between those two users.

However, at one point in qgroup's quota accounting, there's a statement
accessing the previous sequence number, that's still just doing (seq - 1)
just as it would have to in the very first version.

To satisfy that requirement, this patch makes the sequence number counter 64
bit and splits it into a major part (used for qgroup sequence number
counting) and a minor part (incremented for each tree modification in the
log). This enables us to go exactly one major step backwards, as required
for qgroups, while still incrementing the sequence counter for tree mod log
insertions to keep track of their order. Keeping them in a single variable
means there's no need to change all the code dealing with comparisons of two
sequence numbers.

The sequence number is reset to 0 on commit (not new in this patch), which
ensures we won't overflow the two 32 bit counters.

Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
from the tree mod log code may happen.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
Changes v1->v2:
- added spin lock and comment around btrfs_inc_tree_mod_seq_minor (to make
  Josef happy in case I get hit by a bus and somebody tries to change it
  later)

 fs/btrfs/ctree.c   |   47 ---
 fs/btrfs/ctree.h   |7 ++-
 fs/btrfs/delayed-ref.c |6 --
 fs/btrfs/disk-io.c |2 +-
 fs/btrfs/extent-tree.c |5 +++--
 fs/btrfs/qgroup.c  |   13 -
 fs/btrfs/transaction.c |2 +-
 7 files changed, 63 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 566d99b..6275c9c 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -361,6 +361,44 @@ static inline void tree_mod_log_write_unlock(struct btrfs_fs_info *fs_info)
 }
 
 /*
+ * Increment the upper half of tree_mod_seq, set lower half zero.
+ *
+ * Must be called with fs_info->tree_mod_seq_lock held.
+ */
+static inline u64 btrfs_inc_tree_mod_seq_major(struct btrfs_fs_info *fs_info)
+{
+	u64 seq = atomic64_read(&fs_info->tree_mod_seq);
+	seq &= 0xffffffff00000000ull;
+	seq += 1ull << 32;
+	atomic64_set(&fs_info->tree_mod_seq, seq);
+	return seq;
+}
+
+/*
+ * Increment the lower half of tree_mod_seq.
+ *
+ * Must be called with fs_info->tree_mod_seq_lock held. The way major numbers
+ * are generated should not technically require a spin lock here. (Rationale:
+ * incrementing the minor while incrementing the major seq number is between its
+ * atomic64_read and atomic64_set calls doesn't duplicate sequence numbers, it
+ * just returns a unique sequence number as usual.) We have decided to leave
+ * that requirement in here and rethink it once we notice it really imposes a
+ * problem on some workload.
+ */
+static inline u64 btrfs_inc_tree_mod_seq_minor(struct btrfs_fs_info *fs_info)
+{
+	return atomic64_inc_return(&fs_info->tree_mod_seq);
+}
+
+/*
+ * return the last minor in the previous major tree_mod_seq number
+ */
+u64 btrfs_tree_mod_seq_prev(u64 seq)
+{
+	return (seq & 0xffffffff00000000ull) - 1ull;
+}
+
+/*
  * This adds a new blocker to the tree mod log's blocker list if the @elem
  * passed does not already have a sequence number set. So when a caller expects
  * to record tree modifications, it should ensure to set elem->seq to zero
@@ -376,10 +414,10 @@ u64 btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info,
 	tree_mod_log_write_lock(fs_info);
 	spin_lock(&fs_info->tree_mod_seq_lock);
 	if (!elem->seq) {
-		elem->seq = btrfs_inc_tree_mod_seq(fs_info);
+		elem->seq = btrfs_inc_tree_mod_seq_major(fs_info);
 		list_add_tail(&elem->list, &fs_info->tree_mod_seq_list);
 	}
-	seq = btrfs_inc_tree_mod_seq(fs_info);
+	seq = btrfs_inc_tree_mod_seq_minor(fs_info);
 	spin_unlock(&fs_info->tree_mod_seq_lock);
 	tree_mod_log_write_unlock(fs_info);
 
@@ -524,7 +562,10 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
 	if (!tm)
 		return -ENOMEM;
 
-	tm->seq = btrfs_inc_tree_mod_seq(fs_info);
+	spin_lock(&fs_info->tree_mod_seq_lock);
+	tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
+	spin_unlock(&fs_info->tree_mod_seq_lock);
+
 	return tm->seq;
 }
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 412c306..5f34f89 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1422,7 +1422,7 @@ struct btrfs_fs_info {
 
/* this protects tree_mod_seq_list */
spinlock_t tree_mod_seq_lock;
-   atomic_t tree_mod_seq;
+   atomic64_t tree_mod_seq;
struct list_head tree_mod_seq_list;
struct seq_list tree_mod_seq_elem;
 
@@ -3334,10

[PATCH v3 3/3] Btrfs: automatic rescan after quota enable command

2013-04-23 Thread Jan Schmidt
When qgroup tracking is enabled, we do an automatic cycle of the new rescan
mechanism.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 249dd64..b1ae0ab 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1494,10 +1494,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_root *quota_root = fs_info->quota_root;
 	int ret = 0;
+	int start_rescan_worker = 0;
 
 	if (!quota_root)
 		goto out;
 
+	if (!fs_info->quota_enabled && fs_info->pending_quota_state)
+		start_rescan_worker = 1;
+
 	fs_info->quota_enabled = fs_info->pending_quota_state;
 
 	spin_lock(&fs_info->qgroup_lock);
@@ -1523,6 +1527,12 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 	if (ret)
 		fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 
+	if (start_rescan_worker) {
+		ret = btrfs_qgroup_rescan(fs_info);
+		if (ret)
+			pr_err("btrfs: start rescan quota failed: %d\n", ret);
+	}
+
 out:
 
 	return ret;
-- 
1.7.1



[PATCH v3 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-23 Thread Jan Schmidt
The function is separated into a preparation part and the three accounting
steps mentioned in the qgroups documentation. The goal is to make steps two
and three usable by the rescan functionality. A side effect is that the
function is restructured into readable subunits.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |  253 +++--
 1 files changed, 148 insertions(+), 105 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f175471..c50e5a5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1185,6 +1185,144 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle 
*trans,
return 0;
 }
 
+static int qgroup_account_ref_step1(struct btrfs_fs_info *fs_info,
+   struct ulist *roots, struct ulist *tmp,
+   u64 seq)
+{
+   struct ulist_node *unode;
+   struct ulist_iterator uiter;
+   struct ulist_node *tmp_unode;
+   struct ulist_iterator tmp_uiter;
+   struct btrfs_qgroup *qg;
+   int ret;
+
+   ULIST_ITER_INIT(uiter);
+   while ((unode = ulist_next(roots, uiter))) {
+   qg = find_qgroup_rb(fs_info, unode-val);
+   if (!qg)
+   continue;
+
+   ulist_reinit(tmp);
+   /* XXX id not needed */
+   ret = ulist_add(tmp, qg->qgroupid,
+   (u64)(uintptr_t)qg, GFP_ATOMIC);
+   if (ret < 0)
+   return ret;
+   ULIST_ITER_INIT(&tmp_uiter);
+   while ((tmp_unode = ulist_next(tmp, &tmp_uiter))) {
+   struct btrfs_qgroup_list *glist;
+
+   qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode->aux;
+   if (qg->refcnt < seq)
+   qg->refcnt = seq + 1;
+   else
+   ++qg->refcnt;
+
+   list_for_each_entry(glist, &qg->groups, next_group) {
+   ret = ulist_add(tmp, glist->group->qgroupid,
+   (u64)(uintptr_t)glist->group,
+   GFP_ATOMIC);
+   if (ret < 0)
+   return ret;
+   }
+   }
+   }
+
+   return 0;
+}
+
+static int qgroup_account_ref_step2(struct btrfs_fs_info *fs_info,
+   struct ulist *roots, struct ulist *tmp,
+   u64 seq, int sgn, u64 num_bytes,
+   struct btrfs_qgroup *qgroup)
+{
+   struct ulist_node *unode;
+   struct ulist_iterator uiter;
+   struct btrfs_qgroup *qg;
+   struct btrfs_qgroup_list *glist;
+   int ret;
+
+   ulist_reinit(tmp);
+   ret = ulist_add(tmp, qgroup->qgroupid, (uintptr_t)qgroup, GFP_ATOMIC);
+   if (ret < 0)
+   return ret;
+
+   ULIST_ITER_INIT(&uiter);
+   while ((unode = ulist_next(tmp, &uiter))) {
+   qg = (struct btrfs_qgroup *)(uintptr_t)unode->aux;
+   if (qg->refcnt < seq) {
+   /* not visited by step 1 */
+   qg->rfer += sgn * num_bytes;
+   qg->rfer_cmpr += sgn * num_bytes;
+   if (roots->nnodes == 0) {
+   qg->excl += sgn * num_bytes;
+   qg->excl_cmpr += sgn * num_bytes;
+   }
+   qgroup_dirty(fs_info, qg);
+   }
+   WARN_ON(qg->tag >= seq);
+   qg->tag = seq;
+
+   list_for_each_entry(glist, &qg->groups, next_group) {
+   ret = ulist_add(tmp, glist->group->qgroupid,
+   (uintptr_t)glist->group, GFP_ATOMIC);
+   if (ret < 0)
+   return ret;
+   }
+   }
+
+   return 0;
+}
+
+static int qgroup_account_ref_step3(struct btrfs_fs_info *fs_info,
+   struct ulist *roots, struct ulist *tmp,
+   u64 seq, int sgn, u64 num_bytes)
+{
+   struct ulist_node *unode;
+   struct ulist_iterator uiter;
+   struct btrfs_qgroup *qg;
+   struct ulist_node *tmp_unode;
+   struct ulist_iterator tmp_uiter;
+   int ret;
+
+   ULIST_ITER_INIT(&uiter);
+   while ((unode = ulist_next(roots, &uiter))) {
+   qg = find_qgroup_rb(fs_info, unode->val);
+   if (!qg)
+   continue;
+
+   ulist_reinit(tmp);
+   ret = ulist_add(tmp, qg->qgroupid, (uintptr_t)qg, GFP_ATOMIC);
+   if (ret < 0)
+   return ret;
+
+   ULIST_ITER_INIT(&tmp_uiter);
+   while ((tmp_unode = ulist_next(tmp

[PATCH v3 0/3] Btrfs: quota rescan for 3.10

2013-04-23 Thread Jan Schmidt
The kernel side for rescan, which is needed if you want to enable qgroup
tracking on a non-empty volume. The first patch splits
btrfs_qgroup_account_ref into readable and reusable units. The second
patch adds the rescan implementation (refer to its commit message for a
description of the algorithm). The third patch starts an automatic
rescan when qgroups are enabled. It is kept separate only to help
bisecting in case of a problem.

The required user space patch was sent on 2013-04-05, subject [PATCH]
Btrfs-progs: quota rescan.

--
Changes v2-v3:
- rebased to btrfs-next
- stop rescan worker when quota is disabled
- check return value of ulist_add()
- initialize worker struct to zero

Changes v1-v2:
- fix calculation of the exclusive field for qgroups in level != 0
- split btrfs_qgroup_account_ref
- take into account that mutex_unlock might schedule
- fix kzalloc error checking
- add some reserved ints to struct btrfs_ioctl_quota_rescan_args
- changed modification to unused #define BTRFS_QUOTA_CTL_RESCAN
- added missing (unsigned long long) casts for pr_debug
- more detailed commit messages

Jan Schmidt (3):
  Btrfs: split btrfs_qgroup_account_ref into four functions
  Btrfs: rescan for qgroups
  Btrfs: automatic rescan after quota enable command

 fs/btrfs/ctree.h   |   17 +-
 fs/btrfs/disk-io.c |5 +
 fs/btrfs/ioctl.c   |   83 ++-
 fs/btrfs/qgroup.c  |  575 +++-
 include/uapi/linux/btrfs.h |   12 +-
 5 files changed, 552 insertions(+), 140 deletions(-)



[PATCH v3 2/3] Btrfs: rescan for qgroups

2013-04-23 Thread Jan Schmidt
If qgroup tracking is out of sync, a rescan operation can be started. It
iterates the complete extent tree and recalculates all qgroup tracking data.
This is an expensive operation and should not be used unless required.

A filesystem under rescan can still be unmounted. The rescan continues on the
next mount. Status information is provided with a separate ioctl while a
rescan operation is in progress.
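A rough model of how a resumable rescan can share the extent space with live accounting: the rescan moves a progress cursor forward leaf by leaf, live accounting handles only extents the cursor has already passed (compare the qgroup_rescan_progress check in btrfs_qgroup_account_ref() quoted later in this thread), and once nothing is left the cursor is pushed to the maximum value. This is a simplified user-space sketch under stated assumptions: extents are plain byte numbers in a sorted array, a "leaf" is a batch of TOY_LEAF_ITEMS entries, the cursor lives only in memory (persisting it is what lets the real rescan resume after a remount), and all toy_* names are hypothetical. The real code walks the extent tree with btrfs_search_slot_for_read() and resolves the roots of each extent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LEAF_ITEMS 2 /* hypothetical number of extent items per leaf */

struct toy_rescan {
	const uint64_t *extents; /* sorted extent start addresses (bytenr) */
	size_t nr;
	uint64_t progress;       /* next bytenr the rescan will look at */
	uint64_t accounted;      /* extents counted by the rescan so far */
};

/*
 * Scan one "leaf" starting at the cursor. Returns 0 if more leaves may
 * remain and 1 when nothing is left, loosely mirroring the return
 * convention of qgroup_rescan_leaf().
 */
static int toy_rescan_leaf(struct toy_rescan *rs)
{
	size_t i = 0, n = 0;
	uint64_t last = 0;

	while (i < rs->nr && rs->extents[i] < rs->progress)
		i++;
	if (i == rs->nr) {
		/* done: move the cursor past every real extent */
		rs->progress = UINT64_MAX;
		return 1;
	}
	for (; i < rs->nr && n < TOY_LEAF_ITEMS; i++, n++) {
		last = rs->extents[i];
		rs->accounted++;
	}
	rs->progress = last + 1;
	return 0;
}

/*
 * Live accounting decision: an extent at or beyond the cursor is left to
 * the rescan, which has not reached it yet.
 */
static int toy_live_should_account(const struct toy_rescan *rs, uint64_t bytenr)
{
	return rs->progress > bytenr;
}
```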

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h   |   17 ++-
 fs/btrfs/disk-io.c |5 +
 fs/btrfs/ioctl.c   |   83 ++--
 fs/btrfs/qgroup.c  |  312 ++--
 include/uapi/linux/btrfs.h |   12 ++-
 5 files changed, 394 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 412c306..e4f28a6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1021,9 +1021,9 @@ struct btrfs_block_group_item {
  */
 #define BTRFS_QGROUP_STATUS_FLAG_ON        (1ULL << 0)
 /*
- * SCANNING is set during the initialization phase
+ * RESCAN is set during the initialization phase
  */
-#define BTRFS_QGROUP_STATUS_FLAG_SCANNING  (1ULL << 1)
+#define BTRFS_QGROUP_STATUS_FLAG_RESCAN    (1ULL << 1)
 /*
  * Some qgroup entries are known to be out of date,
  * either because the configuration has changed in a way that
@@ -1052,7 +1052,7 @@ struct btrfs_qgroup_status_item {
 * only used during scanning to record the progress
 * of the scan. It contains a logical address
 */
-   __le64 scan;
+   __le64 rescan;
 } __attribute__ ((__packed__));
 
 struct btrfs_qgroup_info_item {
@@ -1603,6 +1603,11 @@ struct btrfs_fs_info {
/* used by btrfs_qgroup_record_ref for an efficient tree traversal */
u64 qgroup_seq;
 
+   /* qgroup rescan items */
+   struct mutex qgroup_rescan_lock; /* protects the progress item */
+   struct btrfs_key qgroup_rescan_progress;
+   struct btrfs_workers qgroup_rescan_workers;
+
/* filesystem state */
unsigned long fs_state;
 
@@ -2888,8 +2893,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item,
   version, 64);
 BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
   flags, 64);
-BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item,
-  scan, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
+  rescan, 64);
 
 /* btrfs_qgroup_info_item */
 BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
@@ -3834,7 +3839,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
   struct btrfs_fs_info *fs_info);
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info);
-int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f4628c7..f80383e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1996,6 +1996,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
btrfs_stop_workers(&fs_info->caching_workers);
btrfs_stop_workers(&fs_info->readahead_workers);
btrfs_stop_workers(&fs_info->flush_workers);
+   btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
 }
 
 /* helper to cleanup tree roots */
@@ -2257,6 +2258,7 @@ int open_ctree(struct super_block *sb,
fs_info->qgroup_seq = 1;
fs_info->quota_enabled = 0;
fs_info->pending_quota_state = 0;
+   mutex_init(&fs_info->qgroup_rescan_lock);
 
btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
@@ -2485,6 +2487,8 @@ int open_ctree(struct super_block *sb,
btrfs_init_workers(&fs_info->readahead_workers, "readahead",
   fs_info->thread_pool_size,
   &fs_info->generic_worker);
+   btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
+  &fs_info->generic_worker);
 
/*
 * endios are largely parallel and should have a very
@@ -2519,6 +2523,7 @@ int open_ctree(struct super_block *sb,
ret |= btrfs_start_workers(&fs_info->caching_workers);
ret |= btrfs_start_workers(&fs_info->readahead_workers);
ret |= btrfs_start_workers(&fs_info->flush_workers);
+   ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
err = -ENOMEM;
goto fail_sb_buffer;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d0af96a..5e93bb8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3701,12

Re: [PATCH v3 2/3] Btrfs: rescan for qgroups

2013-04-23 Thread Jan Schmidt
On Tue, April 23, 2013 at 14:05 (+0200), Wang Shilong wrote:
 Hello Jan,
 
 [..snip..]
 
 
 
  /*
   * the delayed ref sequence number we pass depends on the direction of
   * the operation. for add operations, we pass (node->seq - 1) to skip
 @@ -1401,7 +1428,17 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
  if (ret < 0)
  return ret;
  
 +mutex_lock(&fs_info->qgroup_rescan_lock);
  spin_lock(&fs_info->qgroup_lock);
 +if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
 +if (fs_info->qgroup_rescan_progress.objectid <= node->bytenr) {
 +ret = 0;
 +mutex_unlock(&fs_info->qgroup_rescan_lock);
 +goto unlock;
 +}
 +}
 +mutex_unlock(&fs_info->qgroup_rescan_lock);
 +
  quota_root = fs_info->quota_root;
  if (!quota_root)
  goto unlock;
 @@ -1820,3 +1857,250 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans)
  trans->delayed_ref_elem.seq);
  BUG();
  }
 +
 +/*
 + * returns < 0 on error, 0 when more leafs are to be scanned.
 + * returns 1 when done, 2 when done and FLAG_INCONSISTENT was cleared.
 + */
 +static int
 +qgroup_rescan_leaf(struct qgroup_rescan *qscan, struct btrfs_path *path,
 +   struct btrfs_trans_handle *trans, struct ulist *tmp,
 +   struct extent_buffer *scratch_leaf)
 +{
 +struct btrfs_key found;
 +struct btrfs_fs_info *fs_info = qscan->fs_info;
 +struct ulist *roots = NULL;
 +struct ulist_node *unode;
 +struct ulist_iterator uiter;
 +struct seq_list tree_mod_seq_elem = {};
 +u64 seq;
 +int slot;
 +int ret;
 +
 +path->leave_spinning = 1;
 +mutex_lock(&fs_info->qgroup_rescan_lock);
 +ret = btrfs_search_slot_for_read(fs_info->extent_root,
 + &fs_info->qgroup_rescan_progress,
 + path, 1, 0);
 +
 +pr_debug("current progress key (%llu %u %llu), search_slot ret %d\n",
 + (unsigned long long)fs_info->qgroup_rescan_progress.objectid,
 + fs_info->qgroup_rescan_progress.type,
 + (unsigned long long)fs_info->qgroup_rescan_progress.offset,
 + ret);
 +
 +if (ret) {
 +/*
 + * The rescan is about to end, we will not be scanning any
 + * further blocks. We cannot unset the RESCAN flag here, because
 + * we want to commit the transaction if everything went well.
 + * To make the live accounting work in this phase, we set our
 + * scan progress pointer such that every real extent objectid
 + * will be smaller.
 + */
 +fs_info->qgroup_rescan_progress.objectid = (u64)-1;
 +btrfs_release_path(path);
 +mutex_unlock(&fs_info->qgroup_rescan_lock);
 +return ret;
 +}
 +
 +btrfs_item_key_to_cpu(path->nodes[0], &found,
 +  btrfs_header_nritems(path->nodes[0]) - 1);
 +fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
 +
 +btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
 +memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
 +slot = path->slots[0];
 +btrfs_release_path(path);
 +mutex_unlock(&fs_info->qgroup_rescan_lock);
 +
 +for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
 +btrfs_item_key_to_cpu(scratch_leaf, &found, slot);
 +if (found.type != BTRFS_EXTENT_ITEM_KEY)
 +continue;
 +ret = btrfs_find_all_roots(trans, fs_info, found.objectid,
 +   tree_mod_seq_elem.seq, &roots);
 +if (ret < 0)
 +break;
 +spin_lock(&fs_info->qgroup_lock);
 +seq = fs_info->qgroup_seq;
 +fs_info->qgroup_seq += roots->nnodes + 1; /* max refcnt */
 +
 +ulist_reinit(tmp);
 +ULIST_ITER_INIT(&uiter);
 +while ((unode = ulist_next(roots, &uiter))) {
 +struct btrfs_qgroup *qg;
 +
 +qg = find_qgroup_rb(fs_info, unode->val);
 +if (!qg)
 +continue;
 +
 +ret = ulist_add(tmp, qg->qgroupid, (uintptr_t)qg,
 +GFP_ATOMIC);
 
 
 If ulist_add() fails, we still need to call ulist_free(roots).
 
 +if (ret < 0) {
 +spin_unlock(&fs_info->qgroup_lock);
 
 +goto out;
 +}
 +}
 +
 +/* this is similar to step 2 of btrfs_qgroup_account_ref */
 +ULIST_ITER_INIT(&uiter);
 +while ((unode = ulist_next(tmp, &uiter))) {
 +struct btrfs_qgroup *qg;
 +struct btrfs_qgroup_list *glist;
 +
 +qg = (struct btrfs_qgroup *)(uintptr_t) unode->aux;
 +
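Wang's remark about ulist_free(roots) concerns the usual single-exit cleanup pattern: once roots has been allocated, every later error path has to go through a label that releases it. A minimal user-space sketch, where toy_list, toy_list_alloc(), toy_list_add() and toy_list_free() are hypothetical stand-ins for the ulist API and the failure is injected through a parameter:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_list {
	int nnodes;
};

static int toy_frees; /* number of lists released, for the usage checks */

static struct toy_list *toy_list_alloc(void)
{
	return calloc(1, sizeof(struct toy_list));
}

static void toy_list_free(struct toy_list *l)
{
	if (l)
		toy_frees++;
	free(l);
}

/* Hypothetical add: fails with -ENOMEM when asked to, returns 1 on insert. */
static int toy_list_add(struct toy_list *l, int fail)
{
	if (fail)
		return -ENOMEM;
	l->nnodes++;
	return 1;
}

static int toy_scan_one(int fail_add)
{
	struct toy_list *roots = toy_list_alloc();
	int ret;

	if (!roots)
		return -ENOMEM;
	ret = toy_list_add(roots, fail_add);
	if (ret < 0)
		goto out;	/* returning here directly would leak roots */
	ret = 0;
out:
	toy_list_free(roots);
	return ret;
}
```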

Re: [PATCH v3 3/3] Btrfs: automatic rescan after quota enable command

2013-04-23 Thread Jan Schmidt
On Tue, April 23, 2013 at 17:36 (+0200), David Sterba wrote:
 On Tue, Apr 23, 2013 at 01:26:51PM +0200, Jan Schmidt wrote:
 --- a/fs/btrfs/qgroup.c
 +++ b/fs/btrfs/qgroup.c
 @@ -1494,10 +1494,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
  {
  struct btrfs_root *quota_root = fs_info->quota_root;
  int ret = 0;
 +int start_rescan_worker = 0;
  
  if (!quota_root)
  goto out;
  
 +if (!fs_info->quota_enabled && fs_info->pending_quota_state)
 +start_rescan_worker = 1;
 +
  fs_info->quota_enabled = fs_info->pending_quota_state;
  
  spin_lock(&fs_info->qgroup_lock);
 @@ -1523,6 +1527,12 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
  if (ret)
  fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
  
 +if (start_rescan_worker) {
 +ret = btrfs_qgroup_rescan(fs_info);
 
 btrfs_run_qgroups() is called from transaction commit, which does a BUG_ON
 on the return value.
 
 btrfs_qgroup_rescan can return -EINPROGRESS if a rescan is already in
 progress, and this is propagated back to the transaction commit. So a rescan
 triggered by the ioctl may cause a crash, unless I'm missing something.

You're right, it doesn't seem like a good idea to propagate that return value to
the caller. I'll keep the printk following the quoted line and reset ret to
zero afterwards. (As already mentioned, v4 is to come.)
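The change Jan describes (report the failure, then clear ret so the transaction commit path never sees -EINPROGRESS) looks roughly like this. It is a user-space sketch: toy_qgroup_rescan() and toy_run_qgroups() are hypothetical stand-ins for btrfs_qgroup_rescan() and btrfs_run_qgroups(), and fprintf() replaces pr_err().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

static int toy_rescan_running;

/* Stand-in for btrfs_qgroup_rescan(): refuses to start a second rescan. */
static int toy_qgroup_rescan(void)
{
	if (toy_rescan_running)
		return -EINPROGRESS;
	toy_rescan_running = 1;
	return 0;
}

/* Stand-in for btrfs_run_qgroups(): a failure to start the rescan is only
 * reported, because the caller (transaction commit) does a BUG_ON on a
 * nonzero return value. */
static int toy_run_qgroups(int start_rescan_worker)
{
	int ret = 0;

	if (start_rescan_worker) {
		ret = toy_qgroup_rescan();
		if (ret)
			fprintf(stderr, "btrfs: start rescan quota failed: %d\n", ret);
		ret = 0;
	}
	return ret;
}
```

The second call in a row hits -EINPROGRESS, prints the message, and still returns 0 to its caller.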

Thanks,
Jan


 My original question was what sort of work rescan does, because it's on the
 commit path and we don't want to add more work there and delay the commit.


Re: [PATCH v3 2/3] Btrfs: rescan for qgroups

2013-04-23 Thread Jan Schmidt
On Tue, April 23, 2013 at 16:54 (+0200), Wang Shilong wrote:
 
 Hello Jan,
 
  
 +static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
 +{
 +struct qgroup_rescan *qscan = container_of(work, struct qgroup_rescan,
 +   work);
 +struct btrfs_path *path;
 +struct btrfs_trans_handle *trans = NULL;
 +struct btrfs_fs_info *fs_info = qscan->fs_info;
 +struct ulist *tmp = NULL;
 +struct extent_buffer *scratch_leaf = NULL;
 +int err = -ENOMEM;
 +
 +path = btrfs_alloc_path();
 +if (!path)
 +goto out;
 +tmp = ulist_alloc(GFP_NOFS);
 +if (!tmp)
 +goto out;
 +scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS);
 +if (!scratch_leaf)
 +goto out;
 +
 +err = 0;
 +while (!err) {
 +trans = btrfs_start_transaction(fs_info->fs_root, 0);
 +if (IS_ERR(trans)) {
 +err = PTR_ERR(trans);
 +break;
 +}
 +if (!fs_info->quota_enabled) {
 +err = EINTR;
   Why not -EINTR?

Makes sense, will change that.
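The sign matters because, in the worker loop quoted here, positive values have a meaning of their own: qgroup_rescan_leaf() returns 1 or 2 when done, the loop commits the transaction when err is positive, and the final message treats err >= 0 as success. A positive EINTR would therefore be committed and reported as a completed scan. The sketch below models only that classification; toy_classify() and enum toy_outcome are hypothetical.

```c
#include <assert.h>
#include <errno.h>

enum toy_outcome { TOY_CONTINUE, TOY_DONE, TOY_FAILED };

/* Classify a worker-loop value the way the quoted worker does:
 * 0 keeps the while (!err) loop going, > 0 means done (commit and
 * report success), < 0 is an error. */
static enum toy_outcome toy_classify(int err)
{
	if (err == 0)
		return TOY_CONTINUE;
	return err > 0 ? TOY_DONE : TOY_FAILED;
}
```

toy_classify(EINTR) yields TOY_DONE, i.e. the disabled-quota case would look like a finished scan, while toy_classify(-EINTR) yields TOY_FAILED, in line with the kernel convention of negative errno values.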

 +} else {
 +err = qgroup_rescan_leaf(qscan, path, trans,
 + tmp, scratch_leaf);
 +}
 +if (err > 0)
 +btrfs_commit_transaction(trans, fs_info->fs_root);
 +else
 +btrfs_end_transaction(trans, fs_info->fs_root);
 +}
 +
 +out:
 +kfree(scratch_leaf);
 +ulist_free(tmp);
 +btrfs_free_path(path);
 +kfree(qscan);
 +
 +mutex_lock(&fs_info->qgroup_rescan_lock);
 +fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
 +
 +if (err == 2 &&
 +fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
 +fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 +} else if (err < 0) {
 
 If -EINTR happens, quota has been disabled, so I don't think we should set
 the INCONSISTENT flag...

Debatable. Quota information is in fact inconsistent on disk, and although we
could also conclude that from quota being currently disabled, it doesn't hurt
to set the flag. In fact, whenever quota is enabled, we set the flag, too:

 int btrfs_quota_enable(struct btrfs_trans_handle *trans,
                        struct btrfs_fs_info *fs_info)
...
         fs_info->qgroup_flags = BTRFS_QGROUP_STATUS_FLAG_ON |
                                 BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;

So I don't think it's worth another comparison here.

Thanks,
-Jan

 Thanks,
 Wang
 
 +fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 +}
 +mutex_unlock(&fs_info->qgroup_rescan_lock);
 +
 +if (err >= 0) {
 +pr_info("btrfs: qgroup scan completed%s\n",
 +err == 2 ? " (inconsistency flag cleared)" : "");
 +} else {
 +pr_err("btrfs: qgroup scan failed with %d\n", err);
 +}
 +}
 +



[PATCH] Btrfs: separate sequence numbers for delayed ref tracking and tree mod log

2013-04-23 Thread Jan Schmidt
Sequence numbers for delayed refs have been introduced in the first version
of the qgroup patch set. To solve the problem of find_all_roots on a busy
file system, the tree mod log was introduced. The sequence numbers for that
were simply shared between those two users.

However, at one point in qgroup's quota accounting, there's a statement
accessing the previous sequence number that still just does (seq - 1),
as it had to in the very first version.

To satisfy that requirement, this patch makes the sequence number counter
64-bit and splits it into a major part (used for qgroup sequence number
counting) and a minor part (incremented for each tree modification in the
log). This enables us to go exactly one major step backwards, as required
for qgroups, while still incrementing the sequence counter for tree mod log
insertions to keep track of their order. Keeping them in a single variable
means there's no need to change all the code dealing with comparisons of two
sequence numbers.

The sequence number is reset to 0 on commit (not new in this patch), which
ensures we won't overflow the two 32-bit counters.

Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
from the tree mod log code may happen.
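The major/minor split can be modeled in user space as below. This sketch uses C11 `<stdatomic.h>` in place of the kernel's atomic64_t helpers; toy_tree_mod_seq and the toy_* functions are hypothetical names mirroring btrfs_inc_tree_mod_seq_major(), btrfs_inc_tree_mod_seq_minor() and btrfs_tree_mod_seq_prev() from the diff below, and the mask is assumed to select the upper 32 bits, as the "set lower half zero" comment describes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_MAJOR_MASK 0xffffffff00000000ull

/* User-space model of fs_info->tree_mod_seq (an atomic64_t in the patch). */
static _Atomic uint64_t toy_tree_mod_seq;

/* Like btrfs_inc_tree_mod_seq_major(): bump the upper 32 bits and zero the
 * lower 32. The kernel version relies on tree_mod_seq_lock to make this
 * read-modify-write safe; this single-threaded sketch does not lock. */
static uint64_t toy_inc_seq_major(void)
{
	uint64_t seq = atomic_load(&toy_tree_mod_seq);

	seq &= TOY_MAJOR_MASK;
	seq += 1ull << 32;
	atomic_store(&toy_tree_mod_seq, seq);
	return seq;
}

/* Like btrfs_inc_tree_mod_seq_minor(): an atomic64_inc_return() equivalent. */
static uint64_t toy_inc_seq_minor(void)
{
	return atomic_fetch_add(&toy_tree_mod_seq, 1) + 1;
}

/* Like btrfs_tree_mod_seq_prev(): the last minor of the previous major. */
static uint64_t toy_seq_prev(uint64_t seq)
{
	return (seq & TOY_MAJOR_MASK) - 1ull;
}
```

Because both halves live in one 64-bit value, plain comparisons keep working: every number from a later major step compares greater than every number from an earlier one, and toy_seq_prev() lands on the largest value the previous major step could have reached.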

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.c   |   36 +---
 fs/btrfs/ctree.h   |7 ++-
 fs/btrfs/delayed-ref.c |6 --
 fs/btrfs/disk-io.c |2 +-
 fs/btrfs/extent-tree.c |5 +++--
 fs/btrfs/qgroup.c  |   13 -
 fs/btrfs/transaction.c |2 +-
 7 files changed, 52 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 566d99b..b74136e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -361,6 +361,36 @@ static inline void tree_mod_log_write_unlock(struct btrfs_fs_info *fs_info)
 }
 
 /*
+ * increment the upper half of tree_mod_seq, set lower half zero
+ *
+ * must be called with fs_info->tree_mod_seq_lock held
+ */
+static inline u64 btrfs_inc_tree_mod_seq_major(struct btrfs_fs_info *fs_info)
+{
+   u64 seq = atomic64_read(&fs_info->tree_mod_seq);
+   seq &= 0xffffffff00000000ull;
+   seq += 1ull << 32;
+   atomic64_set(&fs_info->tree_mod_seq, seq);
+   return seq;
+}
+
+/*
+ * increment the lower half of tree_mod_seq
+ */
+static inline u64 btrfs_inc_tree_mod_seq_minor(struct btrfs_fs_info *fs_info)
+{
+   return atomic64_inc_return(&fs_info->tree_mod_seq);
+}
+
+/*
+ * return the last minor in the previous major tree_mod_seq number
+ */
+u64 btrfs_tree_mod_seq_prev(u64 seq)
+{
+   return (seq & 0xffffffff00000000ull) - 1ull;
+}
+
+/*
  * This adds a new blocker to the tree mod log's blocker list if the @elem
  * passed does not already have a sequence number set. So when a caller expects
  * to record tree modifications, it should ensure to set elem->seq to zero
@@ -376,10 +406,10 @@ u64 btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info,
tree_mod_log_write_lock(fs_info);
spin_lock(&fs_info->tree_mod_seq_lock);
if (!elem->seq) {
-   elem->seq = btrfs_inc_tree_mod_seq(fs_info);
+   elem->seq = btrfs_inc_tree_mod_seq_major(fs_info);
list_add_tail(&elem->list, &fs_info->tree_mod_seq_list);
}
-   seq = btrfs_inc_tree_mod_seq(fs_info);
+   seq = btrfs_inc_tree_mod_seq_minor(fs_info);
spin_unlock(&fs_info->tree_mod_seq_lock);
tree_mod_log_write_unlock(fs_info);
 
@@ -524,7 +554,7 @@ static inline int tree_mod_alloc(struct btrfs_fs_info *fs_info, gfp_t flags,
if (!tm)
return -ENOMEM;
 
-   tm->seq = btrfs_inc_tree_mod_seq(fs_info);
+   tm->seq = btrfs_inc_tree_mod_seq_minor(fs_info);
return tm->seq;
 }
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 412c306..5f34f89 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1422,7 +1422,7 @@ struct btrfs_fs_info {
 
/* this protects tree_mod_seq_list */
spinlock_t tree_mod_seq_lock;
-   atomic_t tree_mod_seq;
+   atomic64_t tree_mod_seq;
struct list_head tree_mod_seq_list;
struct seq_list tree_mod_seq_elem;
 
@@ -3334,10 +3334,7 @@ u64 btrfs_get_tree_mod_seq(struct btrfs_fs_info *fs_info,
   struct seq_list *elem);
 void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
struct seq_list *elem);
-static inline u64 btrfs_inc_tree_mod_seq(struct btrfs_fs_info *fs_info)
-{
-   return atomic_inc_return(&fs_info->tree_mod_seq);
-}
+u64 btrfs_tree_mod_seq_prev(u64 seq);
 int btrfs_old_root_level(struct btrfs_root *root, u64 time_seq);
 
 /* root-item.c */
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 116abec..c219463 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -361,8 +361,10 @@ int btrfs_check_delayed_seq(struct btrfs_fs_info *fs_info,
elem = list_first_entry(&fs_info->tree_mod_seq_list

Re: [BUG REPORT] Kernel panic on 3.9.0-rc7-4-gbb33db7

2013-04-19 Thread Jan Schmidt
On Fri, April 19, 2013 at 07:57 (+0200), Tejun Heo wrote:
 (cc'ing btrfs people)
 
 On Fri, Apr 19, 2013 at 11:33:20AM +0800, Wanlong Gao wrote:
 RIP: 0010:[812484d3]  [812484d3] 
 ftrace_raw_event_block_bio_complete+0x73/0xf0
 ...
  [811b6c10] bio_endio+0x80/0x90
  [a0790d26] btrfs_end_bio+0xf6/0x190 [btrfs]
  [811b6bcd] bio_endio+0x3d/0x90
  [81249873] req_bio_endio+0xa3/0xe0
 
 Ugh
 
 In fs/btrfs/volumes.c
 
   static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
   {
   ...
   bio->bi_bdev = (struct block_device *)
  (unsigned long)bbio->mirror_num;
   ...
   }
 
   static void btrfs_end_bio(struct bio *bio, int err)
   {
   ...
   bio->bi_bdev = (struct block_device *)
   (unsigned long)bbio->mirror_num;
   
   ...
   }
 
 In fs/btrfs/extent_io.c
 
   static void end_bio_extent_readpage(struct bio *bio, int err)
   {
   int mirror;
   ...
   mirror = (int)(unsigned long)bio->bi_bdev;
   ...
   }
 
 Ewweehh
 
 No wonder this thing crashes.  Chris, can't the original bio carry
 bbio in bi_private and let end_bio_extent_readpage() free the bbio
 instead of abusing bi_bdev like this?

Oops.

That was my patch back in 2011 (commit 2774b2ca3), sent as an RFC patch; it
slipped in without further discussion of that exact change. Hackish, yes. My
reasoning was that because the block layer changes bio->bi_bdev anyway, no one
would want to look at it after the bio returned (and in fact it didn't hurt
for about two years). I admit, though, that although the block layer changes
bi_bdev, it stays a valid bdev pointer.

One way around this would be what you suggest. However, that would mean the
caller of (btrfs|btree)_submit_bio_hook gets its completion called in the end,
but must know that bi_private is in fact a bbio, which in turn carries the
caller's private data. That doesn't sound clean to me, either.

The best idea I currently have is to add a dispatcher function that does the
freeing of bbio and calls the actual completion with mirror_num as a separate
parameter. That would make all the btrfs completions incompatible with
bio_end_io_t, but it shouldn't hurt.
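A rough user-space sketch of the dispatcher idea: a wrapper carries the caller's completion and private data, and a single end_io function extracts mirror_num, frees the wrapper, and calls the real completion with mirror_num as an explicit argument, so no pointer field has to hold an integer. Every type and function name here (toy_bio, toy_bbio, toy_map_bio, toy_bbio_end_io, toy_end_fn) is hypothetical; this is not the btrfs or block layer API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_bio;

/* Completion type that takes mirror_num explicitly, unlike bio_end_io_t. */
typedef void (*toy_end_fn)(struct toy_bio *bio, int err, int mirror_num);

/* Hypothetical, heavily reduced bio: only the fields the sketch needs. */
struct toy_bio {
	void *private;                                /* role of bi_private */
	void (*end_io)(struct toy_bio *bio, int err);
};

/* Hypothetical mapping context standing in for struct btrfs_bio. */
struct toy_bbio {
	toy_end_fn orig_end_io;
	void *orig_private;
	int mirror_num;
};

/* Dispatcher installed as the bio's end_io: restore the caller's private
 * pointer, free the bbio, then pass mirror_num as a real argument. */
static void toy_bbio_end_io(struct toy_bio *bio, int err)
{
	struct toy_bbio *bbio = bio->private;
	toy_end_fn end = bbio->orig_end_io;
	int mirror = bbio->mirror_num;

	bio->private = bbio->orig_private;
	free(bbio);
	end(bio, err, mirror);
}

/* Wrap the caller's completion before the bio is submitted. */
static int toy_map_bio(struct toy_bio *bio, toy_end_fn end, int mirror_num)
{
	struct toy_bbio *bbio = malloc(sizeof(*bbio));

	if (!bbio)
		return -ENOMEM;
	bbio->orig_end_io = end;
	bbio->orig_private = bio->private;
	bbio->mirror_num = mirror_num;
	bio->private = bbio;
	bio->end_io = toy_bbio_end_io;
	return 0;
}

/* Example completion, playing the part of end_bio_extent_readpage(). */
static int toy_last_mirror;

static void toy_readpage_end(struct toy_bio *bio, int err, int mirror_num)
{
	(void)bio;
	(void)err;
	toy_last_mirror = mirror_num;
}
```

The price is the one mentioned above: completions of type toy_end_fn no longer match a plain end_io signature, so only the dispatcher can be handed to the lower layer.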

At least now I know I wasn't invited to LSF for a good reason :-)

-Jan


Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-17 Thread Jan Schmidt
On Tue, April 16, 2013 at 14:22 (+0200), Wang Shilong wrote:
 
 Hello Jan, more comments below..
 
 [...snip..]
 
  
 +
 +static long btrfs_ioctl_quota_rescan_status(struct file *file, void __user *arg)
 +{
 +struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
 +struct btrfs_ioctl_quota_rescan_args *qsa;
 +int ret = 0;
 +
 +if (!capable(CAP_SYS_ADMIN))
 +return -EPERM;
 +
 +qsa = kzalloc(sizeof(*qsa), GFP_NOFS);
 +if (!qsa)
 +return -ENOMEM;
 +
 
   Here, I think we should hold qgroup_rescan_lock and qgroup_lock:
 
   1 qgroup_rescan_lock protects BTRFS_QGROUP_STATUS_FLAG_RESCAN
   2 quota disabling may happen at this time, so qgroup_lock should also be
 held here.

It's just a status call for user space; I don't really care about exact
synchronization here. *If* we wanted to do that, I would have moved the code
into qgroup.c, because all the code that requires any qgroup locks is there. But
I'd really like to keep it simple. You cannot get garbage information that
way; you could only race with someone just starting or finishing a rescan
operation. I don't think that really matters in the end.

 
 
 +if (root->fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) {
 +qsa->flags = 1;
 +qsa->progress = root->fs_info->qgroup_rescan_progress.objectid;
 +}
 +
 +if (copy_to_user(arg, qsa, sizeof(*qsa)))
 +ret = -EFAULT;
 +
 +kfree(qsa);
 +return ret;
 +}
 +
  
 [….snip...]

 +
 +/*
 + * returns < 0 on error, 0 when more leafs are to be scanned.
 + * returns 1 when done, 2 when done and FLAG_INCONSISTENT was cleared.
 + */
 +static int
 +qgroup_rescan_leaf(struct qgroup_rescan *qscan, struct btrfs_path *path,
 +   struct btrfs_trans_handle *trans, struct ulist *tmp,
 +   struct extent_buffer *scratch_leaf)
 +{
 +struct btrfs_key found;
 +struct btrfs_fs_info *fs_info = qscan->fs_info;
 +struct ulist *roots = NULL;
 +struct ulist_node *unode;
 +struct ulist_iterator uiter;
 +struct seq_list tree_mod_seq_elem = {};
 +u64 seq;
 +int slot;
 +int ret;
 +
 +path->leave_spinning = 1;
 +mutex_lock(&fs_info->qgroup_rescan_lock);
 
 Here in qgroup_rescan_leaf(), we don't need to hold qgroup_rescan_lock,
 because qgroup_rescan_lock is used to protect qgroup_flags, and in
 qgroup_rescan_leaf() we don't change qgroup_flags. So we don't need to
 hold qgroup_rescan_lock.
 
 Maybe we can just remove qgroup_rescan_lock; I think qgroup_lock can do
 what qgroup_rescan_lock does.

No, we cannot. We need the mutex to tie the following tree search to the
subsequent update of qgroup_rescan_progress. In fact, that's the only reason I
introduced it, and I don't want to hold a spin lock for a whole tree search.
If we do not make sure the search operation and the progress update happen under
the same lock, we can end up with a tree block being found by thread A, then
thread B checking the rescan_progress, then thread A updating the rescan_progress
according to the found block and doing the rescan. That would result in wrong
tracking information.
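The race described above goes away when the search that picks the next leaf and the progress update happen inside one critical section, and live accounting reads the progress under the same lock. A pthread-based sketch of just that locking shape; toy_find_next_leaf_end() is a hypothetical stand-in for btrfs_search_slot_for_read(), the "three 4 KiB extents per leaf" geometry is arbitrary, and all toy_* names are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t toy_rescan_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t toy_rescan_progress; /* protected by toy_rescan_lock */

/* Hypothetical search: bytenr of the last extent in the leaf at "from". */
static uint64_t toy_find_next_leaf_end(uint64_t from)
{
	return from + 3 * 4096;
}

/* Rescan side: search and advance the cursor without dropping the lock,
 * so no reader can observe "leaf found, cursor not yet advanced". */
static uint64_t toy_claim_next_leaf(void)
{
	uint64_t last;

	pthread_mutex_lock(&toy_rescan_lock);
	last = toy_find_next_leaf_end(toy_rescan_progress);
	toy_rescan_progress = last + 1;
	pthread_mutex_unlock(&toy_rescan_lock);
	return last;
}

/* Live accounting side: account live only if the rescan cursor has already
 * moved past this extent; otherwise the rescan will count it. */
static int toy_live_account_allowed(uint64_t bytenr)
{
	int allowed;

	pthread_mutex_lock(&toy_rescan_lock);
	allowed = toy_rescan_progress > bytenr;
	pthread_mutex_unlock(&toy_rescan_lock);
	return allowed;
}
```

A mutex rather than a spinlock fits here for the reason given above: the critical section contains a whole tree search, which may sleep.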

 
 
 +ret = btrfs_search_slot_for_read(fs_info->extent_root,
 + &fs_info->qgroup_rescan_progress,
 + path, 1, 0);
 +
 +pr_debug("current progress key (%llu %u %llu), search_slot ret %d\n",
 + (unsigned long long)fs_info->qgroup_rescan_progress.objectid,
 + fs_info->qgroup_rescan_progress.type,
 + (unsigned long long)fs_info->qgroup_rescan_progress.offset,
 + ret);
 +
 +if (ret) {
 +/*
 + * The rescan is about to end, we will not be scanning any
 + * further blocks. We cannot unset the RESCAN flag here, because
 + * we want to commit the transaction if everything went well.
 + * To make the live accounting work in this phase, we set our
 + * scan progress pointer such that every real extent objectid
 + * will be smaller.
 + */
 +fs_info->qgroup_rescan_progress.objectid = (u64)-1;
 +btrfs_release_path(path);
 +mutex_unlock(&fs_info->qgroup_rescan_lock);
 +return ret;
 +}
 +
 +btrfs_item_key_to_cpu(path->nodes[0], &found,
 +  btrfs_header_nritems(path->nodes[0]) - 1);
 +fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
 +
 +btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
 +memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
 +slot = path->slots[0];
 +btrfs_release_path(path);
 +mutex_unlock(&fs_info->qgroup_rescan_lock);
 +
 +for (; slot < btrfs_header_nritems(scratch_leaf); ++slot) {
 +btrfs_item_key_to_cpu(scratch_leaf, &found, slot);
 +if (found.type != BTRFS_EXTENT_ITEM_KEY)
 +continue;
 +ret = 

Re: [PATCH] Btrfs: return error when we specify wrong start

2013-04-16 Thread Jan Schmidt
On Tue, April 16, 2013 at 10:40 (+0200), Liu Bo wrote:
 We need such a sanity check for wrong start, otherwise, even with
 a wrong start that's larger than file size, we can end up not only
 changing inode's force compress flag but also FS's incompat flags.

That reads as very cryptic. Can you please add something hinting at defrag to
the title or at least the description?

Thanks,
-Jan


[PATCH v2 0/3] Btrfs: quota rescan for 3.10

2013-04-16 Thread Jan Schmidt
The kernel side for rescan, which is needed if you want to enable qgroup
tracking on a non-empty volume. The first patch splits
btrfs_qgroup_account_ref into readable and reusable units. The second
patch adds the rescan implementation (refer to its commit message for a
description of the algorithm). The third patch starts an automatic
rescan when qgroups are enabled. It is kept separate only to help
bisecting in case of a problem.

The required user space patch was sent on 2013-04-05, subject [PATCH]
Btrfs-progs: quota rescan.

--
Changes v1-v2:
- fix calculation of the exclusive field for qgroups in level != 0
- split btrfs_qgroup_account_ref
- take into account that mutex_unlock might schedule
- fix kzalloc error checking
- add some reserved ints to struct btrfs_ioctl_quota_rescan_args
- changed modification to unused #define BTRFS_QUOTA_CTL_RESCAN
- added missing (unsigned long long) casts for pr_debug
- more detailed commit messages

Jan Schmidt (3):
  Btrfs: split btrfs_qgroup_account_ref into four functions
  Btrfs: rescan for qgroups
  Btrfs: automatic rescan after quota enable command

 fs/btrfs/ctree.h   |   17 +-
 fs/btrfs/disk-io.c |6 +
 fs/btrfs/ioctl.c   |   83 ++--
 fs/btrfs/qgroup.c  |  517 +++-
 include/uapi/linux/btrfs.h |   12 +-
 5 files changed, 509 insertions(+), 126 deletions(-)



[PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Jan Schmidt
If qgroup tracking is out of sync, a rescan operation can be started. It
iterates the complete extent tree and recalculates all qgroup tracking data.
This is an expensive operation and should not be used unless required.

A filesystem under rescan can still be unmounted. The rescan continues on the
next mount. Status information is provided with a separate ioctl while a
rescan operation is in progress.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h           |   17 ++-
 fs/btrfs/disk-io.c         |    6 +
 fs/btrfs/ioctl.c           |   83 ++--
 fs/btrfs/qgroup.c          |  295 +--
 include/uapi/linux/btrfs.h |   12 ++-
 5 files changed, 378 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0d82922..bd4e2a7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1019,9 +1019,9 @@ struct btrfs_block_group_item {
  */
 #define BTRFS_QGROUP_STATUS_FLAG_ON            (1ULL << 0)
 /*
- * SCANNING is set during the initialization phase
+ * RESCAN is set during the initialization phase
  */
-#define BTRFS_QGROUP_STATUS_FLAG_SCANNING      (1ULL << 1)
+#define BTRFS_QGROUP_STATUS_FLAG_RESCAN        (1ULL << 1)
 /*
  * Some qgroup entries are known to be out of date,
  * either because the configuration has changed in a way that
@@ -1050,7 +1050,7 @@ struct btrfs_qgroup_status_item {
 * only used during scanning to record the progress
 * of the scan. It contains a logical address
 */
-   __le64 scan;
+   __le64 rescan;
 } __attribute__ ((__packed__));
 
 struct btrfs_qgroup_info_item {
@@ -1587,6 +1587,11 @@ struct btrfs_fs_info {
/* used by btrfs_qgroup_record_ref for an efficient tree traversal */
u64 qgroup_seq;
 
+   /* qgroup rescan items */
+   struct mutex qgroup_rescan_lock; /* protects the progress item */
+   struct btrfs_key qgroup_rescan_progress;
+   struct btrfs_workers qgroup_rescan_workers;
+
/* filesystem state */
unsigned long fs_state;
 
@@ -2864,8 +2869,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item,
   version, 64);
 BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
   flags, 64);
-BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item,
-  scan, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
+  rescan, 64);
 
 /* btrfs_qgroup_info_item */
 BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
@@ -3784,7 +3789,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
   struct btrfs_fs_info *fs_info);
 int btrfs_quota_disable(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info);
-int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info);
 int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
  struct btrfs_fs_info *fs_info, u64 src, u64 dst);
 int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6d19a0a..60d15fe 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2192,6 +2192,7 @@ int open_ctree(struct super_block *sb,
fs_info->qgroup_seq = 1;
fs_info->quota_enabled = 0;
fs_info->pending_quota_state = 0;
+   mutex_init(&fs_info->qgroup_rescan_lock);
 
btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
@@ -2394,6 +2395,8 @@ int open_ctree(struct super_block *sb,
btrfs_init_workers(&fs_info->readahead_workers, "readahead",
   fs_info->thread_pool_size,
   &fs_info->generic_worker);
+   btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
+  &fs_info->generic_worker);
 
/*
 * endios are largely parallel and should have a very
@@ -2428,6 +2431,7 @@ int open_ctree(struct super_block *sb,
ret |= btrfs_start_workers(&fs_info->caching_workers);
ret |= btrfs_start_workers(&fs_info->readahead_workers);
ret |= btrfs_start_workers(&fs_info->flush_workers);
+   ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
if (ret) {
err = -ENOMEM;
goto fail_sb_buffer;
@@ -2773,6 +2777,7 @@ fail_sb_buffer:
btrfs_stop_workers(&fs_info->delayed_workers);
btrfs_stop_workers(&fs_info->caching_workers);
btrfs_stop_workers(&fs_info->flush_workers);
+   btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
 fail_alloc:
 fail_iput:
btrfs_mapping_tree_free(&fs_info->mapping_tree);
@@ -3463,6 +3468,7 @@ int close_ctree(struct btrfs_root *root)
btrfs_stop_workers(&fs_info->caching_workers);
btrfs_stop_workers(fs_info

[PATCH v2 3/3] Btrfs: automatic rescan after quota enable command

2013-04-16 Thread Jan Schmidt
When qgroup tracking is enabled, we automatically run one cycle of the new
rescan mechanism.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index bb081b5..0ea2c3e 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1356,10 +1356,14 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 {
struct btrfs_root *quota_root = fs_info->quota_root;
int ret = 0;
+   int start_rescan_worker = 0;
 
if (!quota_root)
goto out;
 
+   if (!fs_info->quota_enabled && fs_info->pending_quota_state)
+   start_rescan_worker = 1;
+
fs_info->quota_enabled = fs_info->pending_quota_state;
 
spin_lock(&fs_info->qgroup_lock);
@@ -1385,6 +1389,12 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
if (ret)
fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
 
+   if (start_rescan_worker) {
+   ret = btrfs_qgroup_rescan(fs_info);
+   if (ret)
+   pr_err("btrfs: start rescan quota failed: %d\n", ret);
+   }
+
 out:
 
return ret;
-- 
1.7.1



[PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-16 Thread Jan Schmidt
The function is separated into a preparation part and the three accounting
steps mentioned in the qgroups documentation. The goal is to make steps two
and three usable by the rescan functionality. A side effect is that the
function is restructured into readable subunits.
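The three steps can be modelled in userspace to show how refcnt, tag and the sequence number cooperate. This is a simplified sketch, not the kernel code: struct qg, walk_up() and account_ref() are hypothetical, a fixed array replaces the ulist, only rfer and excl are tracked (no compressed counters), and step 3 here also sets tag on each qgroup it handles, so a qgroup reachable from several old roots is adjusted at most once. roots[] holds the qgroups that referenced the extent before the change; after step 1, refcnt - seq is the number of those roots below each qgroup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARENTS 4
#define MAX_WALK 16

/* Hypothetical qgroup: counters plus the refcnt/tag scratch fields. */
struct qg {
	int64_t rfer, excl;
	uint64_t refcnt, tag;
	int nparents;
	struct qg *parents[MAX_PARENTS];
};

/* Collect 'start' and all of its ancestors, each once, into out[]. */
static int walk_up(struct qg *start, struct qg **out)
{
	int n = 0;

	out[n++] = start;
	for (int i = 0; i < n; i++) {
		for (int p = 0; p < out[i]->nparents; p++) {
			struct qg *cand = out[i]->parents[p];
			int seen = 0;

			for (int j = 0; j < n; j++)
				seen |= (out[j] == cand);
			if (!seen) {
				assert(n < MAX_WALK);
				out[n++] = cand;
			}
		}
	}
	return n;
}

/* Apply a change of sgn * bytes by 'target'; roots[] are the old referencers. */
static void account_ref(uint64_t *qgroup_seq, struct qg *target,
			struct qg **roots, int nroots, int sgn, int64_t bytes)
{
	struct qg *walk[MAX_WALK];
	uint64_t seq = *qgroup_seq;
	int n;

	*qgroup_seq += (uint64_t)nroots + 1;

	/* step 1: count how many old roots reach each qgroup */
	for (int r = 0; r < nroots; r++) {
		n = walk_up(roots[r], walk);
		for (int i = 0; i < n; i++) {
			if (walk[i]->refcnt < seq)
				walk[i]->refcnt = seq + 1;
			else
				walk[i]->refcnt++;
		}
	}

	/* step 2: qgroups above target that no old root reaches change rfer,
	 * and excl too if nobody referenced the extent before */
	n = walk_up(target, walk);
	for (int i = 0; i < n; i++) {
		if (walk[i]->refcnt < seq) {
			walk[i]->rfer += sgn * bytes;
			if (nroots == 0)
				walk[i]->excl += sgn * bytes;
		}
		walk[i]->tag = seq;
	}

	/* step 3: qgroups reached by every old root but not by target
	 * change excl in the opposite direction */
	for (int r = 0; r < nroots; r++) {
		n = walk_up(roots[r], walk);
		for (int i = 0; i < n; i++) {
			if (walk[i]->tag == seq)
				continue;
			if (walk[i]->refcnt - seq == (uint64_t)nroots)
				walk[i]->excl -= sgn * bytes;
			walk[i]->tag = seq;	/* adjust each qgroup only once */
		}
	}
}
```

With two subvolume qgroups below one parent, the first reference makes the extent exclusive everywhere; a second reference from the sibling makes it shared for both subvolumes while it stays exclusive to the parent.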

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/qgroup.c |  212 ++---
 1 files changed, 121 insertions(+), 91 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index b44124d..c38a0c5 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1075,6 +1075,122 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
return 0;
 }
 
+static void qgroup_account_ref_step1(struct btrfs_fs_info *fs_info,
+struct ulist *roots, struct ulist *tmp,
+u64 seq)
+{
+   struct ulist_node *unode;
+   struct ulist_iterator uiter;
+   struct ulist_node *tmp_unode;
+   struct ulist_iterator tmp_uiter;
+   struct btrfs_qgroup *qg;
+
+   ULIST_ITER_INIT(&uiter);
+   while ((unode = ulist_next(roots, &uiter))) {
+   qg = find_qgroup_rb(fs_info, unode->val);
+   if (!qg)
+   continue;
+
+   ulist_reinit(tmp);
+   /* XXX id not needed */
+   ulist_add(tmp, qg->qgroupid, (u64)(uintptr_t)qg, GFP_ATOMIC);
+   ULIST_ITER_INIT(&tmp_uiter);
+   while ((tmp_unode = ulist_next(tmp, &tmp_uiter))) {
+   struct btrfs_qgroup_list *glist;
+
+   qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode->aux;
+   if (qg->refcnt < seq)
+   qg->refcnt = seq + 1;
+   else
+   ++qg->refcnt;
+
+   list_for_each_entry(glist, &qg->groups, next_group) {
+   ulist_add(tmp, glist->group->qgroupid,
+ (u64)(uintptr_t)glist->group,
+ GFP_ATOMIC);
+   }
+   }
+   }
+}
+
+static void qgroup_account_ref_step2(struct btrfs_fs_info *fs_info,
+struct ulist *roots, struct ulist *tmp,
+u64 seq, int sgn, u64 num_bytes,
+struct btrfs_qgroup *qgroup)
+{
+   struct ulist_node *unode;
+   struct ulist_iterator uiter;
+   struct btrfs_qgroup *qg;
+   struct btrfs_qgroup_list *glist;
+
+   ulist_reinit(tmp);
+   ulist_add(tmp, qgroup->qgroupid, (uintptr_t)qgroup, GFP_ATOMIC);
+
+   ULIST_ITER_INIT(&uiter);
+   while ((unode = ulist_next(tmp, &uiter))) {
+
+   qg = (struct btrfs_qgroup *)(uintptr_t)unode->aux;
+   if (qg->refcnt < seq) {
+   /* not visited by step 1 */
+   qg->rfer += sgn * num_bytes;
+   qg->rfer_cmpr += sgn * num_bytes;
+   if (roots->nnodes == 0) {
+   qg->excl += sgn * num_bytes;
+   qg->excl_cmpr += sgn * num_bytes;
+   }
+   qgroup_dirty(fs_info, qg);
+   }
+   WARN_ON(qg->tag >= seq);
+   qg->tag = seq;
+
+   list_for_each_entry(glist, &qg->groups, next_group) {
+   ulist_add(tmp, glist->group->qgroupid,
+ (uintptr_t)glist->group, GFP_ATOMIC);
+   }
+   }
+}
+
+static void qgroup_account_ref_step3(struct btrfs_fs_info *fs_info,
+struct ulist *roots, struct ulist *tmp,
+u64 seq, int sgn, u64 num_bytes)
+{
+   struct ulist_node *unode;
+   struct ulist_iterator uiter;
+   struct btrfs_qgroup *qg;
+   struct ulist_node *tmp_unode;
+   struct ulist_iterator tmp_uiter;
+
+   ULIST_ITER_INIT(&uiter);
+   while ((unode = ulist_next(roots, &uiter))) {
+   qg = find_qgroup_rb(fs_info, unode->val);
+   if (!qg)
+   continue;
+
+   ulist_reinit(tmp);
+   ulist_add(tmp, qg->qgroupid, (uintptr_t)qg, GFP_ATOMIC);
+   ULIST_ITER_INIT(&tmp_uiter);
+   while ((tmp_unode = ulist_next(tmp, &tmp_uiter))) {
+   struct btrfs_qgroup_list *glist;
+
+   qg = (struct btrfs_qgroup *)(uintptr_t)tmp_unode->aux;
+   if (qg->tag == seq)
+   continue;
+
+   if (qg->refcnt - seq == roots->nnodes) {
+   qg->excl -= sgn * num_bytes;
+   qg->excl_cmpr -= sgn * num_bytes;
+   qgroup_dirty(fs_info, qg

Re: [PATCH v2 1/3] Btrfs: split btrfs_qgroup_account_ref into four functions

2013-04-16 Thread Jan Schmidt
On Tue, April 16, 2013 at 11:20 (+0200), Wang Shilong wrote:
 Hello Jan,
 
 The function is separated into a preparation part and the three accounting
 steps mentioned in the qgroups documentation. The goal is to make steps two
 and three usable by the rescan functionality. A side effect is that the
 function is restructured into readable subunits.
 
 
 How about renaming the three functions like:
 
 1 qgroup_walk_old_roots()
 2 qgroup_walk_new_root()
 3 qgroup_rewalk_old_root()
 
 I'd like these functions to have meaningful names, not just step1, 2, 3.
 Maybe you can think of better function names.

I'd like to keep the step 1, 2, 3 naming, because it matches the documentation
in the qgroup pdf, and the code has always been documented in those three steps.

Thanks,
-Jan


Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Jan Schmidt
On Tue, April 16, 2013 at 11:26 (+0200), Wang Shilong wrote:
 Hello, Jan
 

Re: [PATCH v2 2/3] Btrfs: rescan for qgroups

2013-04-16 Thread Jan Schmidt
On Tue, April 16, 2013 at 12:08 (+0200), Wang Shilong wrote:
 Hello Jan,
  
 
  slot = path->slots[0];
  ptr = btrfs_item_ptr(l, slot, struct btrfs_qgroup_status_item);
 +spin_lock(&fs_info->qgroup_lock);
 
 
 Why do we need to hold qgroup_lock here? Would you please explain?

It would have been easier for me if you had left the relevant context in there,
but I finally found it.

Thinking about it again, as update_qgroup_status_item is only called from
transaction commit context, we can do without a spinlock here. I meant to
protect fs_info->qgroup_flags and fs_info->qgroup_rescan_progress, but that
does not seem to be required.

Thanks,
-Jan


Re: [PATCH 1/2] Btrfs: rescan for qgroups

2013-04-15 Thread Jan Schmidt
On Mon, April 15, 2013 at 08:08 (+0200), Wang Shilong wrote:
 Hello Jan,
 
 On Mon, April 15, 2013 at 07:44 (+0200), Jan Schmidt wrote:
 Thanks, v2 to come.

 Uh, but not immediately. I didn't get tracking of exclusive right. That will
 need some time to fix and test.
 
 
 'exclusive' adds to the complexity of btrfs qgroups.
 So if you send v2, I'd like you to add more lines to the changelog.

Yes, the commit message will be longer, as you requested previously. It does
not include a complete description of how exclusive works; the qgroup pdf
explains that.

 Besides, I have a question (I have not seen your code yet):
 when will a qgroup rescan happen?
 
 1 when quota is enabled

That's what the second patch does, yes. Your patches should be merged in a way
that we first create the level 0 qgroups for all subvolumes and then start the
rescan, obviously.

 2 if a new qgroup relation is created, should a rescan happen?

With your patches, there will be no subvolume qgroups missing. For the higher
level groups, one needs expert knowledge anyway. I think it's best to leave that
decision to the administrator configuring those qgroups.

 3 when the user calls qgroup rescan.

Of course, yes.

-Jan


Re: [PATCH 1/2] Btrfs: rescan for qgroups

2013-04-14 Thread Jan Schmidt
On Mon, April 15, 2013 at 07:44 (+0200), Jan Schmidt wrote:
 Thanks, v2 to come.

Uh, but not immediately. I didn't get tracking of exclusive right. That will
need some time to fix and test.

-Jan


[PATCH 2/3] Btrfs: fix accessing the root pointer in tree mod log functions

2013-04-13 Thread Jan Schmidt
The tree mod log functions were accessing root->node->... directly, without
use of btrfs_root_node() or explicit RCU locking. This could lead to an
extent buffer reference being leaked and another reference being freed too
early when preemption was enabled.
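The leak and premature free can be reproduced deterministically in a single-threaded model. In this sketch, struct node, struct root and the use_root_*() helpers are hypothetical, and the replaced_by argument simulates a concurrent COW of the root between taking and dropping the reference. Re-reading r->node on release drops the wrong buffer, while keeping the pointer returned by the grab (as the patch does with eb_root) releases exactly what was taken. The RCU protection inside the real btrfs_root_node() is not modelled.

```c
#include <assert.h>

/* Hypothetical stand-ins for an extent buffer and a btrfs root. */
struct node {
	int refs;
};

struct root {
	struct node *node;
};

/* Take a reference on the current root node and return that node. */
static struct node *grab_root_node(struct root *r)
{
	struct node *n = r->node;

	n->refs++;
	return n;
}

static void put_node(struct node *n)
{
	assert(n->refs > 0);	/* dropping a reference nobody holds */
	n->refs--;
}

/* Buggy shape: re-reads r->node when dropping the reference. */
static void use_root_reread(struct root *r, struct node *replaced_by)
{
	grab_root_node(r);
	if (replaced_by)
		r->node = replaced_by;	/* simulated concurrent root COW */
	put_node(r->node);		/* may release a different node */
}

/* Fixed shape: keep the snapshot and release exactly what was taken. */
static void use_root_snapshot(struct root *r, struct node *replaced_by)
{
	struct node *eb_root = grab_root_node(r);

	if (replaced_by)
		r->node = replaced_by;
	put_node(eb_root);
}
```

In the re-read version the old node keeps an extra reference (leak) and the new node loses one it never gave out (freed too early); the snapshot version leaves both counts balanced.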

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.c |   38 +++---
 1 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4439cb7..0260795 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1068,11 +1068,11 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
  */
 static struct tree_mod_elem *
 __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
-  struct btrfs_root *root, u64 time_seq)
+  struct extent_buffer *eb_root, u64 time_seq)
 {
struct tree_mod_elem *tm;
struct tree_mod_elem *found = NULL;
-   u64 root_logical = root->node->start;
+   u64 root_logical = eb_root->start;
int looped = 0;
 
if (!time_seq)
@@ -1106,7 +1106,6 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
 
found = tm;
root_logical = tm->old_root.logical;
-   BUG_ON(root_logical == root->node->start);
looped = 1;
}
 
@@ -1245,29 +1244,30 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
 {
struct tree_mod_elem *tm;
struct extent_buffer *eb;
+   struct extent_buffer *eb_root;
struct extent_buffer *old;
struct tree_mod_root *old_root = NULL;
u64 old_generation = 0;
u64 logical;
u32 blocksize;
 
-   eb = btrfs_read_lock_root_node(root);
-   tm = __tree_mod_log_oldest_root(root->fs_info, root, time_seq);
+   eb_root = btrfs_read_lock_root_node(root);
+   tm = __tree_mod_log_oldest_root(root->fs_info, eb_root, time_seq);
if (!tm)
-   return root->node;
+   return eb_root;
 
if (tm->op == MOD_LOG_ROOT_REPLACE) {
old_root = &tm->old_root;
old_generation = tm->generation;
logical = old_root->logical;
} else {
-   logical = root->node->start;
+   logical = eb_root->start;
}
 
tm = tree_mod_log_search(root->fs_info, logical, time_seq);
if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
-   btrfs_tree_read_unlock(root->node);
-   free_extent_buffer(root->node);
+   btrfs_tree_read_unlock(eb_root);
+   free_extent_buffer(eb_root);
blocksize = btrfs_level_size(root, old_root->level);
old = read_tree_block(root, logical, blocksize, 0);
if (!old) {
@@ -1279,13 +1279,13 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
free_extent_buffer(old);
}
} else if (old_root) {
-   btrfs_tree_read_unlock(root->node);
-   free_extent_buffer(root->node);
+   btrfs_tree_read_unlock(eb_root);
+   free_extent_buffer(eb_root);
eb = alloc_dummy_extent_buffer(logical, root->nodesize);
} else {
-   eb = btrfs_clone_extent_buffer(root->node);
-   btrfs_tree_read_unlock(root->node);
-   free_extent_buffer(root->node);
+   eb = btrfs_clone_extent_buffer(eb_root);
+   btrfs_tree_read_unlock(eb_root);
+   free_extent_buffer(eb_root);
}
 
if (!eb)
@@ -1295,7 +1295,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
if (old_root) {
btrfs_set_header_bytenr(eb, eb->start);
btrfs_set_header_backref_rev(eb, BTRFS_MIXED_BACKREF_REV);
-   btrfs_set_header_owner(eb, root->root_key.objectid);
+   btrfs_set_header_owner(eb, btrfs_header_owner(eb_root));
btrfs_set_header_level(eb, old_root->level);
btrfs_set_header_generation(eb, old_generation);
}
@@ -1312,15 +1312,15 @@ int btrfs_old_root_level(struct btrfs_root *root, u64 time_seq)
 {
struct tree_mod_elem *tm;
int level;
+   struct extent_buffer *eb_root = btrfs_root_node(root);
 
-   tm = __tree_mod_log_oldest_root(root->fs_info, root, time_seq);
+   tm = __tree_mod_log_oldest_root(root->fs_info, eb_root, time_seq);
if (tm && tm->op == MOD_LOG_ROOT_REPLACE) {
level = tm->old_root.level;
} else {
-   rcu_read_lock();
-   level = btrfs_header_level(root->node);
-   rcu_read_unlock();
+   level = btrfs_header_level(eb_root);
}
+   free_extent_buffer(eb_root);
 
return level;
 }
-- 
1.7.1


[PATCH 0/3] Btrfs: patches for tree mod log for next rc

2013-04-13 Thread Jan Schmidt
These three fixes for the tree mod log should go into the next rc pull
request to mainline. I suggest adding them to stable as well, as we did
with the previous tree mod log patch.

The first patch fixes logging of root split operations. The second one
fixes access to the root within the tree mod log functions, which
weren't correctly honoring RCU locking. The third fixes the order of
free and unlock.

With the snapshot aware defrag patches we added in 3.9, these fixes are
also important for users not using qgroups.

Concerning qgroups, there is at least one more issue to be solved: the
qgroup code's expectations of how the tree mod log increases its sequence
numbers don't match what the tree mod log code actually does. The estimated
size of that fix is larger than what should go into rc commits or into
stable; it will come for 3.10.

With these three patches applied, I've been running my tests for more
than a day without any issues, whereas without them it takes only minutes
to trigger a BUG_ON or WARN_ON.

Jan Schmidt (3):
  Btrfs: fix tree mod log regression on root split operations
  Btrfs: fix accessing the root pointer in tree mod log functions
  Btrfs: fix unlock after free on rewinded tree blocks

 fs/btrfs/ctree.c |  111 -
 1 files changed, 59 insertions(+), 52 deletions(-)



[PATCH 1/3] Btrfs: fix tree mod log regression on root split operations

2013-04-13 Thread Jan Schmidt
Commit d9abbf1c changed tree mod log locking around ROOT_REPLACE operations.
When a tree root is split, however, we were logging removal of all elements
from the root node before logging removal of half of the elements for the
split operation. This leads to a BUG_ON when rewinding.

This commit removes the erroneous logging of removal of all elements.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.c |   55 -
 1 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index ca9d8f1..4439cb7 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -643,7 +643,8 @@ __tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
 static noinline int
 tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
 struct extent_buffer *old_root,
-struct extent_buffer *new_root, gfp_t flags)
+struct extent_buffer *new_root, gfp_t flags,
+int log_removal)
 {
struct tree_mod_elem *tm;
int ret;
@@ -651,7 +652,8 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
if (tree_mod_dont_log(fs_info, NULL))
return 0;
 
-   __tree_mod_log_free_eb(fs_info, old_root);
+   if (log_removal)
+   __tree_mod_log_free_eb(fs_info, old_root);
 
ret = tree_mod_alloc(fs_info, flags, &tm);
if (ret < 0)
@@ -738,7 +740,7 @@ tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq)
 static noinline void
 tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
 struct extent_buffer *src, unsigned long dst_offset,
-unsigned long src_offset, int nr_items, int log_removal)
+unsigned long src_offset, int nr_items)
 {
int ret;
int i;
@@ -752,12 +754,10 @@ tree_mod_log_eb_copy(struct btrfs_fs_info *fs_info, struct extent_buffer *dst,
}
 
for (i = 0; i < nr_items; i++) {
-   if (log_removal) {
-   ret = tree_mod_log_insert_key_locked(fs_info, src,
-   i + src_offset,
-   MOD_LOG_KEY_REMOVE);
-   BUG_ON(ret < 0);
-   }
+   ret = tree_mod_log_insert_key_locked(fs_info, src,
+   i + src_offset,
+   MOD_LOG_KEY_REMOVE);
+   BUG_ON(ret < 0);
ret = tree_mod_log_insert_key_locked(fs_info, dst,
 i + dst_offset,
 MOD_LOG_KEY_ADD);
@@ -802,11 +802,12 @@ tree_mod_log_free_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
 
 static noinline void
 tree_mod_log_set_root_pointer(struct btrfs_root *root,
- struct extent_buffer *new_root_node)
+ struct extent_buffer *new_root_node,
+ int log_removal)
 {
int ret;
ret = tree_mod_log_insert_root(root->fs_info, root->node,
-  new_root_node, GFP_NOFS);
+  new_root_node, GFP_NOFS, log_removal);
BUG_ON(ret < 0);
 }
 
@@ -1028,7 +1029,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
parent_start = 0;
 
extent_buffer_get(cow);
-   tree_mod_log_set_root_pointer(root, cow);
+   tree_mod_log_set_root_pointer(root, cow, 1);
rcu_assign_pointer(root->node, cow);
 
btrfs_free_tree_block(trans, root, buf, parent_start,
@@ -1754,7 +1755,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
goto enospc;
}
 
-   tree_mod_log_set_root_pointer(root, child);
+   tree_mod_log_set_root_pointer(root, child, 1);
rcu_assign_pointer(root->node, child);
 
add_root_to_dirty_list(root);
@@ -2998,7 +2999,7 @@ static int push_node_left(struct btrfs_trans_handle *trans,
push_items = min(src_nritems - 8, push_items);
 
tree_mod_log_eb_copy(root->fs_info, dst, src, dst_nritems, 0,
-push_items, 1);
+push_items);
copy_extent_buffer(dst, src,
   btrfs_node_key_ptr_offset(dst_nritems),
   btrfs_node_key_ptr_offset(0),
@@ -3069,7 +3070,7 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
  sizeof(struct btrfs_key_ptr));
 
tree_mod_log_eb_copy(root->fs_info, dst, src, 0,
-src_nritems - push_items

Re: [PATCH 2/2] Btrfs: introduce noextiref mount option

2013-04-12 Thread Jan Schmidt
On Fri, April 12, 2013 at 06:13 (+0200), Miao Xie wrote:
 On Thu, 11 Apr 2013 16:29:48 +0200, Jan Schmidt wrote:
 On Thu, April 11, 2013 at 12:35 (+0200), Miao Xie wrote:
 Now we set the incompat flag EXTEND_IREF when we actually need to insert an
 extended inode reference, not when making a fs. But some users may want the
 fs to remain mountable on old kernels and do not want us to insert any
 extended inode references. So we introduce the noextiref mount option to
 turn this feature off.

 That's a much better approach compared to setting the flag on mkfs, I agree.

 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 Cc: Mark Fasheh mfas...@suse.de
 ---
  fs/btrfs/ctree.h  |  1 +
  fs/btrfs/disk-io.c|  9 +
  fs/btrfs/inode-item.c |  2 +-
  fs/btrfs/super.c  | 41 -
  4 files changed, 51 insertions(+), 2 deletions(-)

 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index a883e47..db88963 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1911,6 +1911,7 @@ struct btrfs_ioctl_defrag_range_args {
  #define BTRFS_MOUNT_CHECK_INTEGRITY          (1 << 20)
  #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
  #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR     (1 << 22)
 +#define BTRFS_MOUNT_NOEXTIREF                (1 << 23)
  
  #define btrfs_clear_opt(o, opt)      ((o) &= ~BTRFS_MOUNT_##opt)
  #define btrfs_set_opt(o, opt)        ((o) |= BTRFS_MOUNT_##opt)
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index ab8ef37..ee00448 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -2269,6 +2269,15 @@ int open_ctree(struct super_block *sb,
 goto fail_alloc;
 }
  
 +   if ((btrfs_super_incompat_flags(disk_super) &
 +BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF) &&
 +   btrfs_test_opt(tree_root, NOEXTIREF)) {
 +   printk(KERN_ERR "BTRFS: couldn't mount because the extend iref "
 +  "can not be close.\n");
 +   err = -EINVAL;
 +   goto fail_alloc;
 +   }
 +
 if (btrfs_super_leafsize(disk_super) !=
 btrfs_super_nodesize(disk_super)) {
 printk(KERN_ERR "BTRFS: couldn't mount because metadata "
 diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
 index f07eb45..7c4f880 100644
 --- a/fs/btrfs/inode-item.c
 +++ b/fs/btrfs/inode-item.c
 @@ -442,7 +442,7 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans,
  out:
 btrfs_free_path(path);
  
 -   if (ret == -EMLINK) {
 +   if (ret == -EMLINK && !btrfs_test_opt(root, NOEXTIREF)) {
 /*
  * We ran out of space in the ref array. Need to add an
  * extended ref.
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 0f03569..fd375b3 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -315,7 +315,7 @@ enum {
 Opt_nodatacow, Opt_max_inline, Opt_alloc_start, Opt_nobarrier, Opt_ssd,
 Opt_nossd, Opt_ssd_spread, Opt_thread_pool, Opt_noacl, Opt_compress,
 Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
 -   Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
 +   Opt_notreelog, Opt_noextiref, Opt_ratio, Opt_flushoncommit, Opt_discard,
 Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
 Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
 Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 @@ -344,6 +344,7 @@ static match_table_t tokens = {
 {Opt_nossd, "nossd"},
 {Opt_noacl, "noacl"},
 {Opt_notreelog, "notreelog"},
 +   {Opt_noextiref, "noextiref"},
 {Opt_flushoncommit, "flushoncommit"},
 {Opt_ratio, "metadata_ratio=%d"},
 {Opt_discard, "discard"},
 @@ -535,6 +536,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 printk(KERN_INFO "btrfs: disabling tree log\n");
 btrfs_set_opt(info->mount_opt, NOTREELOG);
 break;
 +   case Opt_noextiref:
 +   printk(KERN_INFO "btrfs: disabling extend inode ref\n");
 +   btrfs_set_opt(info->mount_opt, NOEXTIREF);
 +   break;
 case Opt_flushoncommit:
 printk(KERN_INFO "btrfs: turning on flush-on-commit\n");
 btrfs_set_opt(info->mount_opt, FLUSHONCOMMIT);
 @@ -1202,6 +1207,35 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
   new_pool_size);
  }
  
 +static int btrfs_close_extend_iref(struct btrfs_fs_info *fs_info,
 +  unsigned long old_opts)

 The name confused me; it's more like "unset" than "close", isn't it?
 
 Maybe btrfs_set_no_extend_iref() is better; otherwise other developers might
 think we will clear BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF.

I think we should use the exact name of the mount option, so
btrfs_set_noextiref is probably least ambiguous. Or even
btrfs_set_mntflag_noextiref.


 +{
 +   struct btrfs_trans_handle *trans;
 +   int ret;
 +
 +   if (btrfs_raw_test_opt(old_opts, NOEXTIREF

Re: [PATCH] Btrfs: add a rb_tree to improve performance of ulist search

2013-04-11 Thread Jan Schmidt
;/* auxiliary value saved along with the val */
 + struct rb_node rb_node; /* used to speed up search */
  };
  
  struct ulist {
 @@ -54,6 +58,8 @@ struct ulist {
*/
   struct ulist_node *nodes;
  
 + struct rb_root root;
 +
   /*
* inline storage space for the first ULIST_SIZE entries
*/
 

Makes a lot of sense. Thanks!
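The idea behind the patch, keeping a search index next to the insertion-ordered node array so that adding an element no longer scans every node to detect duplicates, can be sketched without an rb-tree. This hypothetical, fixed-capacity sketch swaps in a sorted index array searched by binary search: lookups are O(log n), but insertion still shifts the index, whereas the rb-tree in the patch keeps insertion logarithmic as well. struct ulist_sketch and ulist_sketch_add() are illustrative names; the return convention (1 when added, 0 when already present) follows ulist_add().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ULIST_CAP 64

/* vals[] keeps insertion order for iteration; idx[] holds positions in
 * vals[] sorted by value, so duplicate checks use binary search. */
struct ulist_sketch {
	uint64_t vals[ULIST_CAP];
	int idx[ULIST_CAP];
	int nnodes;
};

/* Position in idx[] where val is, or where it would be inserted. */
static int lower_bound(const struct ulist_sketch *ul, uint64_t val)
{
	int lo = 0, hi = ul->nnodes;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;

		if (ul->vals[ul->idx[mid]] < val)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Returns 1 if val was added, 0 if it was already present. */
static int ulist_sketch_add(struct ulist_sketch *ul, uint64_t val)
{
	int pos = lower_bound(ul, val);

	if (pos < ul->nnodes && ul->vals[ul->idx[pos]] == val)
		return 0;
	assert(ul->nnodes < ULIST_CAP);
	ul->vals[ul->nnodes] = val;
	/* shift the sorted index to make room at pos */
	memmove(&ul->idx[pos + 1], &ul->idx[pos],
		(size_t)(ul->nnodes - pos) * sizeof(ul->idx[0]));
	ul->idx[pos] = ul->nnodes;
	ul->nnodes++;
	return 1;
}
```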

Reviewed-by: Jan Schmidt list.bt...@jan-o-sch.net


Re: [PATCH] Btrfs-progs: don't set INCOMPAT_EXTENDED_IREF flag when making a new fs

2013-04-11 Thread Jan Schmidt
On Thu, April 11, 2013 at 12:28 (+0200), Miao Xie wrote:
 There are no extended irefs in a new fs, so it can safely be mounted on
 an old kernel without extended iref support. So we needn't set the
 INCOMPAT_EXTENDED_IREF flag when making a new fs; we just set it when
 we actually insert an extended iref.
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 Cc: Mark Fasheh mfas...@suse.de
 ---
  mkfs.c | 2 --
  1 file changed, 2 deletions(-)
 
 diff --git a/mkfs.c b/mkfs.c
 index c8cb395..aca6e46 100644
 --- a/mkfs.c
 +++ b/mkfs.c
 @@ -1654,8 +1654,6 @@ raid_groups:
  
   super = root->fs_info->super_copy;
   flags = btrfs_super_incompat_flags(super);
 - flags |= BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF;
 -
   if (mixed)
   flags |= BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS;
  
 

This one should have a large

*** do not apply until the kernel patches from "[PATCH 0/2] do not open the
*** extend inode reference at the beginning" have been merged.

tag. Otherwise, extended irefs are disabled entirely for all new file systems in
environments where they have been working so far.
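To make the ordering problem concrete, here is a small sketch assuming, as the remark above implies, that a kernel without the companion patches only falls back to an extended iref on -EMLINK when the superblock already carries the incompat flag, while a patched kernel sets the flag lazily on first use. The sk_* names and the flag's bit position are illustrative, not taken from btrfs.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative bit position, not necessarily btrfs's value. */
#define SK_INCOMPAT_EXTENDED_IREF (1ULL << 6)

struct sk_super {
	uint64_t incompat_flags;
};

/* mkfs with the patch applied: the flag is no longer set up front. */
static void sk_mkfs(struct sk_super *sb)
{
	sb->incompat_flags = 0;
}

/* Unpatched kernel (assumed behaviour): on -EMLINK, an extended iref is
 * only used if the flag is already on disk; otherwise the link fails. */
static int sk_unpatched_on_emlink(const struct sk_super *sb)
{
	if (sb->incompat_flags & SK_INCOMPAT_EXTENDED_IREF)
		return 0;       /* extended iref inserted */
	return -EMLINK;         /* too many links, no fallback */
}

/* Patched kernel (assumed behaviour): set the flag lazily, then insert. */
static int sk_patched_on_emlink(struct sk_super *sb)
{
	sb->incompat_flags |= SK_INCOMPAT_EXTENDED_IREF;
	return 0;
}
```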

-Jan


Re: [PATCH 2/2] Btrfs: introduce noextiref mount option

2013-04-11 Thread Jan Schmidt
On Thu, April 11, 2013 at 12:35 (+0200), Miao Xie wrote:
 Now we set the incompat flag EXTEND_IREF when we actually need to insert an
 extended inode reference, not when making a fs. But some users may want
 the fs to remain mountable on old kernels, and don't want us to insert any
 extended inode references. So we introduce the noextiref mount option to
 turn this feature off.

That's a much better approach compared to setting the flag on mkfs, I agree.
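A user-space sketch of the two checks this patch introduces, assuming only the hunks quoted below: open_ctree() refuses a noextiref mount when the superblock already has the EXTENDED_IREF incompat flag set, and btrfs_insert_inode_ref() only falls back to an extended ref on -EMLINK when noextiref is not set. The macros copy the token-pasting shape of btrfs_set_opt()/btrfs_clear_opt(); the sk_* names and the incompat bit value are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define SK_MOUNT_NOEXTIREF (1UL << 23)        /* same bit as in the patch */
#define SK_INCOMPAT_EXTENDED_IREF (1ULL << 6) /* illustrative value */

/* Same token-pasting shape as btrfs_clear_opt()/btrfs_set_opt(). */
#define sk_clear_opt(o, opt)    ((o) &= ~SK_MOUNT_##opt)
#define sk_set_opt(o, opt)      ((o) |= SK_MOUNT_##opt)
#define sk_raw_test_opt(o, opt) ((o) & SK_MOUNT_##opt)

/* Modelled on the open_ctree() hunk: a fs that already contains
 * extended irefs cannot honour noextiref, so the mount is refused. */
static int sk_check_mount(uint64_t incompat, unsigned long mount_opt)
{
	if ((incompat & SK_INCOMPAT_EXTENDED_IREF) &&
	    sk_raw_test_opt(mount_opt, NOEXTIREF))
		return -EINVAL;
	return 0;
}

/* Modelled on the btrfs_insert_inode_ref() hunk: -EMLINK only falls back
 * to an extended ref (represented here by returning 0) without noextiref. */
static int sk_insert_ref_result(int ret, unsigned long mount_opt)
{
	if (ret == -EMLINK && !sk_raw_test_opt(mount_opt, NOEXTIREF))
		return 0;
	return ret;
}
```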

 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 Cc: Mark Fasheh mfas...@suse.de
 ---
  fs/btrfs/ctree.h  |  1 +
  fs/btrfs/disk-io.c|  9 +
  fs/btrfs/inode-item.c |  2 +-
  fs/btrfs/super.c  | 41 -
  4 files changed, 51 insertions(+), 2 deletions(-)
 
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index a883e47..db88963 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1911,6 +1911,7 @@ struct btrfs_ioctl_defrag_range_args {
  #define BTRFS_MOUNT_CHECK_INTEGRITY  (1 << 20)
  #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
  #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR (1 << 22)
 +#define BTRFS_MOUNT_NOEXTIREF (1 << 23)
  
  #define btrfs_clear_opt(o, opt)  ((o) &= ~BTRFS_MOUNT_##opt)
  #define btrfs_set_opt(o, opt) ((o) |= BTRFS_MOUNT_##opt)
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index ab8ef37..ee00448 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -2269,6 +2269,15 @@ int open_ctree(struct super_block *sb,
   goto fail_alloc;
   }
  
 + if ((btrfs_super_incompat_flags(disk_super) &
 +  BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF) &&
 + btrfs_test_opt(tree_root, NOEXTIREF)) {
 + printk(KERN_ERR "BTRFS: couldn't mount because the extend iref "
 +"can not be close.\n");
 + err = -EINVAL;
 + goto fail_alloc;
 + }
 +
   if (btrfs_super_leafsize(disk_super) !=
   btrfs_super_nodesize(disk_super)) {
   printk(KERN_ERR "BTRFS: couldn't mount because metadata "
 diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
 index f07eb45..7c4f880 100644
 --- a/fs/btrfs/inode-item.c
 +++ b/fs/btrfs/inode-item.c
 @@ -442,7 +442,7 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle *trans,
  out:
   btrfs_free_path(path);
  
 - if (ret == -EMLINK) {
 + if (ret == -EMLINK && !btrfs_test_opt(root, NOEXTIREF)) {
   /*
* We ran out of space in the ref array. Need to add an
* extended ref.
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 0f03569..fd375b3 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -315,7 +315,7 @@ enum {
   Opt_nodatacow, Opt_max_inline, Opt_alloc_start, Opt_nobarrier, Opt_ssd,
   Opt_nossd, Opt_ssd_spread, Opt_thread_pool, Opt_noacl, Opt_compress,
   Opt_compress_type, Opt_compress_force, Opt_compress_force_type,
 - Opt_notreelog, Opt_ratio, Opt_flushoncommit, Opt_discard,
 + Opt_notreelog, Opt_noextiref, Opt_ratio, Opt_flushoncommit, Opt_discard,
   Opt_space_cache, Opt_clear_cache, Opt_user_subvol_rm_allowed,
   Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
   Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 @@ -344,6 +344,7 @@ static match_table_t tokens = {
   {Opt_nossd, "nossd"},
   {Opt_noacl, "noacl"},
   {Opt_notreelog, "notreelog"},
 + {Opt_noextiref, "noextiref"},
   {Opt_flushoncommit, "flushoncommit"},
   {Opt_ratio, "metadata_ratio=%d"},
   {Opt_discard, "discard"},
 @@ -535,6 +536,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
   printk(KERN_INFO "btrfs: disabling tree log\n");
   btrfs_set_opt(info->mount_opt, NOTREELOG);
   break;
 + case Opt_noextiref:
 + printk(KERN_INFO "btrfs: disabling extend inode ref\n");
 + btrfs_set_opt(info->mount_opt, NOEXTIREF);
 + break;
   case Opt_flushoncommit:
   printk(KERN_INFO "btrfs: turning on flush-on-commit\n");
   btrfs_set_opt(info->mount_opt, FLUSHONCOMMIT);
 @@ -1202,6 +1207,35 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
 new_pool_size);
  }
  
 +static int btrfs_close_extend_iref(struct btrfs_fs_info *fs_info,
 +unsigned long old_opts)

The name irritated me, it's more like unset instead of close, isn't it?

 +{
 + struct btrfs_trans_handle *trans;
 + int ret;
 +
 + if (btrfs_raw_test_opt(old_opts, NOEXTIREF) ||
 + !btrfs_raw_test_opt(fs_info->mount_opt, NOEXTIREF))
 + return 0;
 +
 + trans = btrfs_attach_transaction(fs_info->tree_root);
 + if (IS_ERR(trans)) {
 + if (PTR_ERR(trans) != -ENOENT)
 + return PTR_ERR(trans);
 + } else {
 +   

Re: kernel BUG at fs/btrfs/ctree.c:1144!

2013-04-10 Thread Jan Schmidt

On Wed, April 10, 2013 at 09:58 (+0200), Ahmet Inan wrote:
 I got this problem since 3.8.5 + for-linus (from that time).
 Have just tried 3.8.6 + for-linus with git merge -X theirs
 btrfs/for-linus but still same problem.
 Going back to 3.7.4 + for-linus (from that time) doesn't give me the problem.

The stack you attached shows a function that came with the snapshot-aware defrag
patches (commit 38c227d8), which were merged in 3.8.

The real problem, however, is not caused by that commit but by a tree mod log
bug. I suspect that fs/btrfs/ctree.c:1144 in your kernel is this BUG_ON in
__tree_mod_log_rewind (my line numbers don't match):

1138         switch (tm->op) {
1139         case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
1140                 BUG_ON(tm->slot < n);

I have a fix for that which I'm currently testing; expect it on the list soon.

 This is an production nfs server with 2x2TB raid1, so cant reboot it that 
 often.
 Have seen this same problem on another system (also raid1) once, but
 rebooting helped, no problems since.
 Both systems use autodefrag, maybe that sometimes triggers it?
 I really would like to help, so i can stay on the latest kernels.
 What should i do?

In the meantime, I recommend not defragmenting your filesystem.

As a general remark, please send your stack traces inline rather than as
attachments, if possible.

Thanks,
-Jan


Lockdep warning on for-linus branch (umount vs. evict_inode)

2013-04-10 Thread Jan Schmidt
I was running fsstress to trigger a tree mod log problem on a current kernel
with some custom debug patches applied, so if anyone looking at this needs
line numbers, let me know:

<4>[ 1221.749586] [ INFO: possible circular locking dependency detected ]
<4>[ 1221.749589] 3.8.0+ #9 Not tainted
<4>[ 1221.749590] -------------------------------------------------------
<4>[ 1221.749591] fsstress/3108 is trying to acquire lock:
<4>[ 1221.749592]  (sb_internal){.+.+..}, at: [a0183cde] start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749614] 
<4>[ 1221.749614] but task is already holding lock:
<4>[ 1221.749616]  (&fs_info->ordered_operations_mutex){+.+...}, at: [a019c089] btrfs_wait_ordered_extents+0x49/0x270 [btrfs]
<4>[ 1221.749632] 
<4>[ 1221.749632] which lock already depends on the new lock.
<4>[ 1221.749632] 
<4>[ 1221.749634] 
<4>[ 1221.749634] the existing dependency chain (in reverse order) is:
<4>[ 1221.749635] 
<4>[ 1221.749635] -> #1 (&fs_info->ordered_operations_mutex){+.+...}:
<4>[ 1221.749638]        [810f1f73] lock_acquire+0x93/0x130
<4>[ 1221.749643]        [819b5fff] __mutex_lock_common+0x5f/0x4a0
<4>[ 1221.749647]        [819b6575] mutex_lock_nested+0x45/0x50
<4>[ 1221.749650]        [a019b935] btrfs_run_ordered_operations+0x55/0x2e0 [btrfs]
<4>[ 1221.749663]        [a0182866] btrfs_commit_transaction+0x76/0xd40 [btrfs]
<4>[ 1221.749675]        [a017c3a7] btrfs_commit_super+0x67/0x130 [btrfs]
<4>[ 1221.749687]        [a017daea] close_ctree+0x34a/0x3a0 [btrfs]
<4>[ 1221.749699]        [a014fe49] btrfs_put_super+0x19/0x20 [btrfs]
<4>[ 1221.749707]        [811bed62] generic_shutdown_super+0x62/0xf0
<4>[ 1221.749710]        [811bee86] kill_anon_super+0x16/0x30
<4>[ 1221.749712]        [a015396a] btrfs_kill_super+0x1a/0x90 [btrfs]
<4>[ 1221.749720]        [811bf3a5] deactivate_locked_super+0x45/0x70
<4>[ 1221.749722]        [811c02aa] deactivate_super+0x4a/0x70
<4>[ 1221.749725]        [811dbe72] mntput_no_expire+0xd2/0x130
<4>[ 1221.749728]        [811dcb6e] sys_umount+0x7e/0x3b0
<4>[ 1221.749730]        [819c0c82] system_call_fastpath+0x16/0x1b
<4>[ 1221.749734] 
<4>[ 1221.749734] -> #0 (sb_internal){.+.+..}:
<4>[ 1221.749736]        [810f1e03] __lock_acquire+0x1713/0x17f0
<4>[ 1221.749739]        [810f1f73] lock_acquire+0x93/0x130
<4>[ 1221.749741]        [811be82f] __sb_start_write+0x13f/0x230
<4>[ 1221.749745]        [a0183cde] start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749757]        [a0183fc7] btrfs_join_transaction+0x17/0x20 [btrfs]
<4>[ 1221.749770]        [a01d6bf0] btrfs_commit_inode_delayed_inode+0x60/0x150 [btrfs]
<4>[ 1221.749784]        [a018a240] btrfs_evict_inode+0x140/0x350 [btrfs]
<4>[ 1221.749798]        [811d6df7] evict+0xa7/0x1a0
<4>[ 1221.749801]        [811d7008] iput+0x118/0x1a0
<4>[ 1221.749803]        [a019c286] btrfs_wait_ordered_extents+0x246/0x270 [btrfs]
<4>[ 1221.749817]        [a0151897] btrfs_sync_fs+0x47/0x110 [btrfs]
<4>[ 1221.749825]        [811ecaa0] sync_fs_one_sb+0x20/0x30
<4>[ 1221.749828]        [811c07f6] iterate_supers+0xb6/0xf0
<4>[ 1221.749831]        [811ecf85] sys_sync+0x55/0x90
<4>[ 1221.749833]        [819c0c82] system_call_fastpath+0x16/0x1b
<4>[ 1221.749836] 
<4>[ 1221.749836] other info that might help us debug this:
<4>[ 1221.749836] 
<4>[ 1221.749837]  Possible unsafe locking scenario:
<4>[ 1221.749837] 
<4>[ 1221.749839]        CPU0                    CPU1
<4>[ 1221.749840]        ----                    ----
<4>[ 1221.749841]   lock(&fs_info->ordered_operations_mutex);
<4>[ 1221.749843]                                lock(sb_internal);
<4>[ 1221.749845]                                lock(&fs_info->ordered_operations_mutex);
<4>[ 1221.749846]   lock(sb_internal);
<4>[ 1221.749848] 
<4>[ 1221.749848]  *** DEADLOCK ***
<4>[ 1221.749848] 
<4>[ 1221.749851] 2 locks held by fsstress/3108:
<4>[ 1221.749852]  #0:  (&type->s_umount_key#22){+.}, at: [811c07e0] iterate_supers+0xa0/0xf0
<4>[ 1221.749857]  #1:  (&fs_info->ordered_operations_mutex){+.+...}, at: [a019c089] btrfs_wait_ordered_extents+0x49/0x270 [btrfs]
<4>[ 1221.749873] 
<4>[ 1221.749873] stack backtrace:
<4>[ 1221.749875] Pid: 3108, comm: fsstress Not tainted 3.8.0+ #9
<4>[ 1221.749876] Call Trace:
<4>[ 1221.749880]  [810ef2be] print_circular_bug+0x20e/0x2f0
<4>[ 1221.749883]  [810f1e03] __lock_acquire+0x1713/0x17f0
<4>[ 1221.749896]  [a0183cde] ? start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749898]  [810f1f73] lock_acquire+0x93/0x130
<4>[ 1221.749911]  [a0183cde] ? start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749914]  [8116b60d] ? find_get_pages_tag+0x2d/0x1d0
<4>[ 1221.749918]  [811be82f] __sb_start_write+0x13f/0x230
<4>[ 1221.749930]  [a0183cde] ? start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749943]  [a0183cde] ? start_transaction+0x2de/0x4f0 [btrfs]
<4>[ 1221.749946]  
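The report above describes an AB-BA inversion: according to chain #1, the umount path acquired ordered_operations_mutex while sb_internal was held, and the sync path (chain #0) now wants sb_internal while holding ordered_operations_mutex. A toy version of the ordering bookkeeping lockdep performs could look like the following sketch, assuming just these two lock classes and checking only direct two-class cycles (real lockdep searches the whole dependency graph). All names are hypothetical.

```c
#include <stdbool.h>

/* The two lock classes from the report (names are hypothetical). */
enum sk_lock_class {
	SK_ORDERED_OPERATIONS_MUTEX,
	SK_SB_INTERNAL,
	SK_NR_LOCK_CLASSES
};

/* sk_after[a][b] is true once b has been acquired while a was held. */
static bool sk_after[SK_NR_LOCK_CLASSES][SK_NR_LOCK_CLASSES];

/* Record "acquire next while holding held". Returns false when the
 * opposite order was already recorded, i.e. a possible AB-BA deadlock.
 * Only direct two-class cycles are detected here. */
static bool sk_lock_acquired(enum sk_lock_class held, enum sk_lock_class next)
{
	if (sk_after[next][held])
		return false;
	sk_after[held][next] = true;
	return true;
}
```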

Re: btrfs-progs: re-add send-test

2013-04-09 Thread Jan Schmidt
On 06.04.2013 20:30, Eric Sandeen wrote:
 From: Mark Fasheh mfas...@suse.de
 
 btrfs-progs: re-add send-test
 
 send-test.c links against libbtrfs and uses the send functionality provided
 to decode and print a send stream to the console.

This looks pretty much like fardump from Arne's far repository:

   git://git.kernel.org/pub/scm/linux/kernel/git/arne/far-progs.git

The stream generated by btrfs send is designed to be easily receivable by any
other filesystem (within the destination filesystem's limitations). Hence the
term Filesystem Agnostic Replication (FAR).

In my opinion, one of the next steps would be moving the logic used in
btrfs receive into a generic far lib, which would itself link against
libbtrfs for the btrfs-specific parts. btrfs receive on the command line
would then become a stub calling into the yet-to-be-created far lib, which
in turn would call hooks back into libbtrfs.
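A rough sketch of the layering proposed above, in which a filesystem-agnostic replay loop drives filesystem-specific hooks through a table of function pointers. None of these names exist in btrfs-progs or far-progs; the command set is reduced to two operations, and the logging backend merely stands in for hooks that would live in libbtrfs.

```c
#include <stdio.h>

/* Hypothetical hook table; a btrfs backend would fill it with libbtrfs calls. */
struct sk_far_ops {
	int (*mkfile)(void *ctx, const char *path);
	int (*rename)(void *ctx, const char *from, const char *to);
};

enum sk_far_cmd_type { SK_FAR_MKFILE, SK_FAR_RENAME };

/* A decoded send-stream command, reduced to two operations. */
struct sk_far_cmd {
	enum sk_far_cmd_type type;
	const char *path;
	const char *path_to; /* only used by SK_FAR_RENAME */
};

/* Filesystem-agnostic part: replay commands through whatever hooks the
 * destination filesystem provides, stopping at the first error. */
static int sk_far_replay(const struct sk_far_cmd *cmds, size_t n,
			 const struct sk_far_ops *ops, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		int ret;

		switch (cmds[i].type) {
		case SK_FAR_MKFILE:
			ret = ops->mkfile(ctx, cmds[i].path);
			break;
		case SK_FAR_RENAME:
			ret = ops->rename(ctx, cmds[i].path, cmds[i].path_to);
			break;
		default:
			ret = -1;
			break;
		}
		if (ret)
			return ret;
	}
	return 0;
}

/* Stand-in backend that records the calls it receives. */
struct sk_log_ctx {
	char buf[256];
	size_t len;
};

static int sk_log_mkfile(void *ctx, const char *path)
{
	struct sk_log_ctx *lc = ctx;
	size_t room = sizeof(lc->buf) - lc->len;
	int n = snprintf(lc->buf + lc->len, room, "mkfile %s;", path);

	if (n < 0 || (size_t)n >= room)
		return -1;
	lc->len += (size_t)n;
	return 0;
}

static int sk_log_rename(void *ctx, const char *from, const char *to)
{
	struct sk_log_ctx *lc = ctx;
	size_t room = sizeof(lc->buf) - lc->len;
	int n = snprintf(lc->buf + lc->len, room, "rename %s %s;", from, to);

	if (n < 0 || (size_t)n >= room)
		return -1;
	lc->len += (size_t)n;
	return 0;
}
```

Supporting another destination filesystem would then mean supplying a different ops table while the replay loop stays unchanged.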

I don't have a plan yet for how to build btrfs-progs after such a
shift; that would have to be sorted out.

My goal is to move everything that could be shared with other filesystems
out of btrfs-progs into a far lib, and the send-test sent here definitely
falls into that category. Besides, it already exists outside btrfs-progs,
unless I'm missing something.

Thanks!
-Jan

 [snip]

