Re: bug caused by removal of trans_mutex? (Was: Re: kernel BUG at fs/btrfs/extent-tree.c:6164!)

2011-06-14 Thread Yan, Zheng
I found another bug. There are codes (btrfs_save_ino_cache) that
modify fs trees after
create_pending_snapshots is called. This can corrupt your fs.

On Mon, Jun 13, 2011 at 3:13 PM, Li Zefan l...@cn.fujitsu.com wrote:
 Cc: Josef

 I encountered following panic using 'btrfs-unstable + for-linus'
 kernel.

 I ran btrfs fi bal /test5 command, and mount option of /test5
 is as follows:

  /dev/sdc3 on /test5 type btrfs 
 (rw,space_cache,compress=lzo,inode_cache)

 So, just a btrfs fi bal would lead to the bug?
 I think so.

 It should be specific to the inode caching code.  The balancing code is
 finding the inode map cache extents, but it doesn't know how to relocate
 them.

 However, the panic has occurred even if inode_cahce is turned off.
 Is this another problem?


 I don't think free inode cache isthe cause of the bug here (even if 
 inode_cache
 is turned on).

 What I have found out is:

 1. git checkout a4abeea41adfa3c143c289045f4625dfaeba2212

 So the top commit is the removal of trans_mutex and no delayed_inode patch
 or free inode cache patchset in the tree, and bug can be triggered.

 2. git checkout 2a1eb4614d984d5cd4c928784e9afcf5c07f93be

 So the top commit is the one before trans_mutex removal, and no bug triggered.

 3. test linus' tree

 bug triggered.

 4. revert that suspicoius commit manually from linus' tree

 no bug.

 so either that commit is buggy or it reveals some bugs covered by the 
 trans_mutex.

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strange filefrag output on btrfs

2011-06-14 Thread Andreas Philipp

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
 
On 14.06.2011 04:54, Li Zefan wrote:
 Andreas Philipp wrote:

 -BEGIN PGP SIGNED MESSAGE- Hash: SHA1

 On 13.06.2011 13:50, David Sterba wrote:
 On Sat, Jun 11, 2011 at 05:39:15PM +0200, Andreas Philipp
 wrote:
 On one of my btrfs volumes I see a strange output from
 filefrag when run against a particular large (~8GB) file.
 filefrag and filefrag -v give me a different number of
 extents, see below.

 aph@thor /mnt/nutshell $ sudo filefrag -v funtoo.img | grep
 extents funtoo.img: 2624 extents found aph@thor /mnt/nutshell
 $ sudo filefrag funtoo.img | grep extents funtoo.img: 2653
 extents found

 is the file open and being written to? did you run sync before
 the first command?
 The file is not open. Yes, I have run sync before the first
 command. Now, I tested again with a copy of file but the results
 is more or less the same.

 aph@thor /mnt/nutshell $ cp funtoo.img funtoo.1.img aph@thor
 /mnt/nutshell $ sync aph@thor /mnt/nutshell $ sudo filefrag -v
 funtoo.img funtoo.1.img | grep extents funtoo.img: 2624 extents
 found funtoo.1.img: 57 extents found aph@thor /mnt/nutshell $
 sudo filefrag funtoo.img funtoo.1.img | grep extents funtoo.img:
 2653 extents found funtoo.1.img: 311 extents found


 If you look into the source code of filefrag, you'll know why.

 There are two ways to calc the extent number, depending on whether
 verbose option is turned on or not.

 In the verbose mode, it will check if the next extent is adjacent
 to the prev extent in the physical position, and in this case they
 are considered to be one extent.

 That's why the number returned in verbose mode is smaller.
Thank you for this explanation.

Andreas
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
 
iQIcBAEBAgAGBQJN9wmvAAoJEJIcBJ3+XkgiIWgQAL0+SLwnc6V6nar30rG6wCt+
czTTy7wFgdP5oYby9NMj2a5YifxG2XBa+Hnw3doLSxTHv4i7WqouaFeT4OotBzb+
jV8GBAn3vRyGlV0mfEx1PHzqUJNzUpJHZpWKvKx4JW91z3gZ/FdXEhbZXNyTVvPm
WtaLXz71CMtCSy81TN437T92H7yvv4SxiRubbe+IuBpKCJaIA1eH2yoJ+72yDNKH
TS74hvfYoDXngxZry4EA2/3mGTOq3PMSljWBw76pqx47KhsZged0ZN+YA8th7iiK
H3Pm3m19yzvt5niA5aS/ilwR50pKE2LI2dq7kkc2yjol/A86iUmIkAEm94Bv7a/3
hdBHslzqZpb2sWaQB2qjDA9aWGyDld3B2C1a+CiYSr0kqtPRlWKPPCQiDNibxrMp
cC2vT92OCoJMnsz7OC3nQN+UZAzBTnx7deFVAlgxnLrsuVT2IZMfxeurTLGJy0vG
zygp7pXdLbj4pzvLcIbf53DQ8wsSfQfLlMDec7wj+TpDLWCuBLQRVmWIKsc1ovMb
epoBihD4xJZguaeQAsyxBuFgYNoWCj0ebxWejGIYvilCZ8SJflN8/dEN3HaT8haR
9k6qdNB9cNULggs4dN8zvB530InDNxJHuI67hRcdLs+VDcWHjCdXmfcgn3Lz5km4
wDAlG4uZi/T5Pqz1Eqvq
=NrYD
-END PGP SIGNATURE-

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strange filefrag output on btrfs

2011-06-14 Thread Chris Samuel
On 14/06/11 12:54, Li Zefan wrote:

 There are two ways to calc the extent number, depending
 on whether verbose option is turned on or not.

To me that's very counter intuitive!

Maybe another way to do that is to name those two values
differently and in verbose mode report them both so people
can see that there is a match ?

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: stalls with latest btrfs merge into 3.0-rc2

2011-06-14 Thread Jim Schutt

Josef Bacik wrote:

On 06/13/2011 05:07 PM, Jim Schutt wrote:

Hi,

On a system under a heavy write load from multiple ceph OSDs,
I'm running into the following hung tasks where btrfs is implicated.
I'm running commit 3c25fa740e2 from Linus' tree merged with
commit cb9b41c92fa from git://ceph.newdream.net/git/ceph-client.git.



Please try this patch and verify it fixes the problem.  If it does I'll 
make it less crappy and send it along.  Thanks,


I saw no stalls with your patch applied after 30 minutes of writing,
whereas without it the stalls would trigger after a few minutes.

Let me know if you get a new version you'd like me to test, otherwise

Tested-by: Jim Schutt jasc...@sandia.gov

Thanks -- Jim



Josef




--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix path leakage on subvol deletion

2011-06-14 Thread Josef Bacik
The delayed ref patch accidently removed the btrfs_free_path in
btrfs_unlink_subvol, this puts it back and means we don't leak a path.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/inode.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 30a1d24..1483372 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3074,6 +3074,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
ret = btrfs_update_inode(trans, root, dir);
BUG_ON(ret);
 
+   btrfs_free_path(path);
return 0;
 }
 
-- 
1.7.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: protect the pending_snapshots list with trans_lock

2011-06-14 Thread Josef Bacik
Currently there is nothing protecting the pending_snapshots list on the
transaction.  We only hold the directory mutex that we are snapshotting and a
read lock on the subvol_sem, so we could race with somebody else creating a
snapshot in a different directory and end up with list corruption.  So protect
this list with the trans_lock.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/ioctl.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b793d11..a3c4751 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -482,8 +482,10 @@ static int create_snapshot(struct btrfs_root *root, struct 
dentry *dentry,
ret = btrfs_snap_reserve_metadata(trans, pending_snapshot);
BUG_ON(ret);
 
+   spin_lock(root-fs_info-trans_lock);
list_add(pending_snapshot-list,
 trans-transaction-pending_snapshots);
+   spin_unlock(root-fs_info-trans_lock);
if (async_transid) {
*async_transid = trans-transid;
ret = btrfs_commit_transaction_async(trans,
-- 
1.7.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: set no_trans_join after trying to expand the transaction

2011-06-14 Thread Josef Bacik
We can lockup if we try to allow new writers join the transaction and we have
flushoncommit set or have a pending snapshot.  This is because we set
no_trans_join and then loop around and try to wait for ordered extents again.
The problem is the ordered endio stuff needs to join the transaction, which it
can't do because no_trans_join is set.  So instead wait until after this loop to
set no_trans_join and then make sure to wait for num_writers == 1 in case
anybody got started in between us exiting the loop and setting no_trans_join.
This could easily be reproduced by mounting -o flushoncommit and running xfstest
13.  It cannot be reproduced with this patch.  Thanks,

Reported-by: Jim Schutt jasc...@sandia.gov
Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/transaction.c |   14 +++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8754997..d0f1c07 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1239,12 +1239,20 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans,
schedule_timeout(1);
 
finish_wait(cur_trans-writer_wait, wait);
-   spin_lock(root-fs_info-trans_lock);
-   root-fs_info-trans_no_join = 1;
-   spin_unlock(root-fs_info-trans_lock);
} while (atomic_read(cur_trans-num_writers)  1 ||
 (should_grow  cur_trans-num_joined != joined));
 
+   /*
+* Ok now we need to make sure to block out any other joins while we
+* commit the transaction.  We could have started a join before setting
+* no_join so make sure to wait for num_writers to == 1 again.
+*/
+   spin_lock(root-fs_info-trans_lock);
+   root-fs_info-trans_no_join = 1;
+   spin_unlock(root-fs_info-trans_lock);
+   wait_event(cur_trans-writer_wait,
+  atomic_read(cur_trans-num_writers) == 1);
+
ret = create_pending_snapshots(trans, root-fs_info);
BUG_ON(ret);
 
-- 
1.7.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html