Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2014-02-18 Thread Alex Lyakas
Hello Josef,

On Tue, Dec 18, 2012 at 3:52 PM, Josef Bacik jba...@fusionio.com wrote:
 On Wed, Dec 12, 2012 at 06:52:37PM -0700, Liu Bo wrote:
  An user reported that he has hit an annoying deadlock while playing with
  ceph based on btrfs.

  Current updating device tree requires space from METADATA chunk,
  so we -may- need to do a recursive chunk allocation when adding/updating
  dev extent, that is where the deadlock comes from.

  If we use SYSTEM metadata to update device tree, we can avoid the recursive
  stuff.

 This is going to cause us to allocate much more system chunks than we used to
 which could land us in trouble.  Instead let's just keep us from re-entering
 if we're already allocating a chunk.  We do the chunk allocation when we don't
 have enough space for a cluster, but we'll likely have plenty of space to make
 an allocation.  Can you give this patch a try Jim and see if it fixes your
 problem?
 Thanks,

 Josef


 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index e152809..59df5e7 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -3564,6 +3564,10 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
          int wait_for_alloc = 0;
          int ret = 0;

 +        /* Don't re-enter if we're already allocating a chunk */
 +        if (trans->allocating_chunk)
 +                return -ENOSPC;
 +
          space_info = __find_space_info(extent_root->fs_info, flags);
          if (!space_info) {
                  ret = update_space_info(extent_root->fs_info, flags,
 @@ -3606,6 +3610,8 @@ again:
                  goto again;
          }

 +        trans->allocating_chunk = true;
 +
          /*
           * If we have mixed data/metadata chunks we want to make sure we keep
           * allocating mixed chunks instead of individual chunks.
 @@ -3632,6 +3638,7 @@ again:
          check_system_chunk(trans, extent_root, flags);

          ret = btrfs_alloc_chunk(trans, extent_root, flags);
 +        trans->allocating_chunk = false;
          if (ret < 0 && ret != -ENOSPC)
                  goto out;

 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index e6509b9..47ad8be 100644
 --- a/fs/btrfs/transaction.c
 +++ b/fs/btrfs/transaction.c
 @@ -388,6 +388,7 @@ again:
          h->qgroup_reserved = qgroup_reserved;
          h->delayed_ref_elem.seq = 0;
          h->type = type;
 +        h->allocating_chunk = false;
          INIT_LIST_HEAD(&h->qgroup_ref_list);
          INIT_LIST_HEAD(&h->new_bgs);

 diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
 index 0e8aa1e..69700f7 100644
 --- a/fs/btrfs/transaction.h
 +++ b/fs/btrfs/transaction.h
 @@ -68,6 +68,7 @@ struct btrfs_trans_handle {
          struct btrfs_block_rsv *orig_rsv;
          short aborted;
          short adding_csums;
 +        bool allocating_chunk;
          enum btrfs_trans_type type;
          /*
           * this root is only needed to validate that the root passed to

I hit this problem in the following scenario:
- a data chunk allocation is triggered, and locks chunk_mutex
- the same thread now also wants to allocate a metadata chunk, so it
recursively calls do_chunk_alloc, but cannot lock the chunk_mutex =>
deadlock
- btrfs has only one metadata chunk, the one that was initially
allocated by mkfs, it has:
total_bytes=8388608
bytes_used=8130560
bytes_pinned=77824
bytes_reserved=180224
so bytes_used + bytes_pinned + bytes_reserved == total_bytes
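
The accounting above can be checked with a small standalone sketch. The struct and helper names here are hypothetical stand-ins, not the kernel's own space_info code; the sketch only adds up the three counters from the report and says how many bytes are left.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the counters listed above. */
struct space_counters {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_pinned;
	uint64_t bytes_reserved;
};

/* Bytes still available for a new allocation; 0 when fully committed. */
static uint64_t space_left(const struct space_counters *sc)
{
	uint64_t committed = sc->bytes_used + sc->bytes_pinned +
			     sc->bytes_reserved;

	return committed >= sc->total_bytes ? 0 : sc->total_bytes - committed;
}

/* The single mkfs-created metadata chunk from the report above. */
static const struct space_counters reported = {
	.total_bytes    = 8388608,
	.bytes_used     = 8130560,
	.bytes_pinned   = 77824,
	.bytes_reserved = 180224,
};
```

With these values space_left(&reported) is 0: every byte of the 8 MiB chunk is used, pinned or reserved, so a tree block can only come from a new metadata chunk.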

Your patch would have returned ENOSPC and avoided the deadlock, but
there would be a failure to allocate a tree block for metadata. So the
transaction would probably have aborted.

How should such a situation be handled?

Idea1:
- lock chunk mutex,
- if we are allocating a data chunk, check whether the metadata space
is below some threshold. If yes, go and allocate a metadata chunk
first and then only a data chunk.
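
Idea1 could look roughly like the sketch below. This is a toy model under stated assumptions, not btrfs code: struct pool, CHUNK_SIZE, LOW_WATER_PCT and alloc_chunk_idea1 are all made-up names, and the 10% threshold is arbitrary. The point is only that the metadata top-up happens inside the same locked section, before the data chunk, so no recursive call is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum chunk_type { CHUNK_DATA, CHUNK_METADATA };

/* Hypothetical model of one space pool; not the kernel's structures. */
struct pool {
	uint64_t total_bytes;
	uint64_t committed_bytes;	/* used + pinned + reserved */
	unsigned int chunks;
};

#define CHUNK_SIZE	(8ULL * 1024 * 1024)	/* assumed chunk size */
#define LOW_WATER_PCT	10			/* assumed threshold */

/* True when less than LOW_WATER_PCT percent of the pool is free. */
static bool metadata_is_low(const struct pool *meta)
{
	uint64_t free_bytes = 0;

	if (meta->committed_bytes < meta->total_bytes)
		free_bytes = meta->total_bytes - meta->committed_bytes;
	return free_bytes * 100 < meta->total_bytes * LOW_WATER_PCT;
}

static void add_chunk(struct pool *p)
{
	p->total_bytes += CHUNK_SIZE;
	p->chunks++;
}

/* Assumed to be called with the chunk mutex already taken exactly once. */
static void alloc_chunk_idea1(struct pool *data, struct pool *meta,
			      enum chunk_type type)
{
	/* Top up metadata first, so the data chunk never has to recurse. */
	if (type == CHUNK_DATA && metadata_is_low(meta))
		add_chunk(meta);
	add_chunk(type == CHUNK_DATA ? data : meta);
}
```

The open question with this approach is how to pick the threshold so that it covers the worst-case metadata needed by one data chunk allocation.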

Idea2:
- check if we are the same thread that already locked the chunk mutex.
If yes, allow recursive call but don't attempt to lock/unlock the
chunk_mutex this time
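
Idea2 amounts to tracking the owner of the lock. The following is a single-threaded toy model, assuming a made-up struct chunk_lock and integer thread ids; real code would need an actual mutex plus a safe way to identify the current task, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical owner-tracking wrapper around the chunk lock. */
struct chunk_lock {
	bool locked;
	int owner;	/* id of the thread holding the lock */
	int depth;	/* how many times the owner has entered */
};

/*
 * Returns true if the caller may proceed. A nested call by the owner
 * skips the locking step; any other thread would have to wait (modelled
 * here by returning false).
 */
static bool chunk_lock_enter(struct chunk_lock *l, int tid)
{
	if (l->locked && l->owner == tid) {
		l->depth++;
		return true;
	}
	if (l->locked)
		return false;
	l->locked = true;
	l->owner = tid;
	l->depth = 1;
	return true;
}

/* Only the outermost exit by the owner really releases the lock. */
static void chunk_lock_exit(struct chunk_lock *l, int tid)
{
	if (!l->locked || l->owner != tid)
		return;
	if (--l->depth == 0)
		l->locked = false;
}
```

This only shows the bookkeeping; whether the nested chunk allocation is safe against the state the outer allocation has half-updated is a separate question.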

Or some other way?

Thanks!
Alex.






 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2014-02-18 Thread Josef Bacik



On 02/18/2014 10:47 AM, Alex Lyakas wrote:
 I hit this problem in the following scenario:
 - a data chunk allocation is triggered, and locks chunk_mutex
 - the same thread now also wants to allocate a metadata chunk, so it
 recursively calls do_chunk_alloc, but cannot lock the chunk_mutex =>
 deadlock
 - btrfs has only one metadata chunk, the one that was initially
 allocated by mkfs, it has:
 total_bytes=8388608
 bytes_used=8130560
 bytes_pinned=77824
 bytes_reserved=180224
 so bytes_used + bytes_pinned + bytes_reserved == total_bytes

 Your patch would have returned ENOSPC and avoided the deadlock, but
 there would be a failure to allocate a tree block for metadata. So the
 transaction would probably have aborted.

 How should such a situation be handled?

 Idea1:
 - lock chunk mutex,
 - if we are allocating a data chunk, check whether the metadata space
 is below some threshold. If yes, go and allocate a metadata chunk
 first and then only a data chunk.

 Idea2:
 - check if we are the same thread that already locked the chunk mutex.
 If yes, allow recursive call but don't attempt to lock/unlock the
 chunk_mutex this time

 Or some other way?
 

I fixed this with the delayed chunk allocation stuff which doesn't
actually do the block group creation stuff until we end the
transaction, so we can allocate metadata chunks without any issue.
Thanks,

Josef
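
The deferral described above can be pictured with a toy model. Everything below is hypothetical (struct trans_model, queue_new_bg, create_pending_bgs and the fixed-size array are invented for illustration) and is not taken from the actual commit. It only shows the shape of the idea: allocating a chunk just queues the new block group on the transaction handle, and the tree insertions happen later, when the transaction ends and no chunk lock is held, so needing another metadata chunk at that point simply queues one more entry.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NEW_BGS	8
#define METADATA_BG_ID	100

/* Hypothetical transaction handle with a queue of new block groups. */
struct trans_model {
	int new_bgs[MAX_NEW_BGS];	/* queued block group ids */
	int nr_new_bgs;
	int nr_created;			/* block groups written to the trees */
	bool need_metadata;		/* modelled: inserting needs a new metadata chunk */
};

/* Chunk allocation only queues the block group; no tree is touched. */
static int queue_new_bg(struct trans_model *t, int id)
{
	if (t->nr_new_bgs == MAX_NEW_BGS)
		return -1;
	t->new_bgs[t->nr_new_bgs++] = id;
	return 0;
}

/* Run at transaction end: do the deferred insertions. */
static int create_pending_bgs(struct trans_model *t)
{
	int i;

	/* nr_new_bgs may grow while we iterate; late entries are handled too. */
	for (i = 0; i < t->nr_new_bgs; i++) {
		if (t->need_metadata) {
			t->need_metadata = false;
			if (queue_new_bg(t, METADATA_BG_ID))
				return -1;
		}
		t->nr_created++;
	}
	t->nr_new_bgs = 0;
	return 0;
}
```

In the model, queueing one data block group while need_metadata is set ends with two block groups created and an empty queue, without any nested locking.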

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2014-02-18 Thread Alex Lyakas
Hi Josef,
Is this the commit to look at:
6df9a95e63395f595d0d1eb5d561dd6c91c40270 Btrfs: make the chunk
allocator completely tree lockless

Or are some other commits also relevant?

Alex.



Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2014-02-18 Thread Josef Bacik

On 02/18/2014 11:24 AM, Alex Lyakas wrote:
 Hi Josef, is this the commit to look at: 
 6df9a95e63395f595d0d1eb5d561dd6c91c40270 Btrfs: make the chunk 
 allocator completely tree lockless
 
 or some other commits are also relevant?
 

It's been so long but I'm pretty sure everything you need is in that
patch.  Thanks,

Josef


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-31 Thread Josef Bacik
On Wed, Jan 30, 2013 at 02:37:40PM -0700, Jim Schutt wrote:
 On 01/30/2013 09:38 AM, Josef Bacik wrote:
  On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
   On 01/29/2013 01:04 PM, Josef Bacik wrote:
    On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
     On 01/28/2013 02:23 PM, Josef Bacik wrote:
      On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
       Hi Josef,

       Thanks for the patch - sorry for the long delay in testing...

      Jim,

      I've been trying to reason out how this happens, could you do a btrfs
      fi df on the filesystem thats giving you trouble so I can see if what I
      think is happening is what's actually happening.  Thanks,

     Here's an example, using a slightly different kernel than
     my previous report.  It's your btrfs-next master branch
     (commit 8f139e59d5 Btrfs: use bit operation for ->fs_state)
     with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).

     Here I'm finding the file system in question:

     # ls -l /dev/mapper | grep dm-93
     lrwxrwxrwx 1 root root   8 Jan 29 11:13 cs53s19p2 -> ../dm-93

     # df -h | grep -A 1 cs53s19p2
     /dev/mapper/cs53s19p2
                           896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522

     Here's the info you asked for:

     # btrfs fi df /ram/mnt/ceph/data.osd.522
     Data: total=2.01GB, used=1.00GB
     System: total=4.00MB, used=64.00KB
     Metadata: total=8.00MB, used=7.56MB

    How big is the disk you are using, and what mount options?  I have a patch
    to keep the panic from happening and hopefully the abort, could you try
    this?  I still want to keep the underlying error from happening because it
    shouldn't be, but no reason I can't fix the error case while you can easily
    reproduce it :).
    Thanks,

    Josef

    From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
    From: Josef Bacik jba...@fusionio.com
    Date: Tue, 29 Jan 2013 15:03:37 -0500
    Subject: [PATCH] Btrfs: fix chunk allocation error handling

    If we error out allocating a dev extent we will have already created the
    block group and such which will cause problems since the allocator may have
    tried to allocate out of the block group that no longer exists.  This will
    cause BUG_ON()'s in the bio submission path.  This also makes a failure to
    allocate a dev extent a non-abort error, we will just clean up the dev
    extents we did allocate and exit.  Now if we fail to delete the dev extents
    we will abort since we can't have half of the dev extents hanging around,
    but this will make us much less likely to abort.  Thanks,

    Signed-off-by: Josef Bacik jba...@fusionio.com
---
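
The error handling described in that commit message follows a common unwind pattern, sketched here as a standalone model. The names (struct chunk_model, alloc_dev_extent, free_dev_extent, the fail_* knobs) are invented for illustration and the patch body itself is not shown in this thread, so this is not the actual change: a failed dev extent allocation is a plain error after the already-allocated extents are removed, and only a failed cleanup marks the transaction as aborted.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_STRIPES 4

/* Hypothetical model: which stripes currently own a dev extent. */
struct chunk_model {
	bool extent[NR_STRIPES];
	int fail_alloc_at;	/* stripe whose allocation fails, -1 = none */
	int fail_free_at;	/* stripe whose cleanup fails, -1 = none */
	bool aborted;
};

static int alloc_dev_extent(struct chunk_model *c, int i)
{
	if (i == c->fail_alloc_at)
		return -1;
	c->extent[i] = true;
	return 0;
}

static int free_dev_extent(struct chunk_model *c, int i)
{
	if (i == c->fail_free_at)
		return -1;
	c->extent[i] = false;
	return 0;
}

/* Allocate one dev extent per stripe, undoing partial work on failure. */
static int alloc_chunk_extents(struct chunk_model *c)
{
	int i, ret = 0;

	for (i = 0; i < NR_STRIPES; i++) {
		ret = alloc_dev_extent(c, i);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* Non-abort error path: remove the extents we did allocate. */
	while (--i >= 0) {
		if (free_dev_extent(c, i)) {
			/* Half a chunk left behind: only now abort. */
			c->aborted = true;
			break;
		}
	}
	return ret;
}
```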
   
   Interesting - with your patch applied I triggered the following, just
   bringing up a fresh Ceph filesystem - I didn't even get a chance to
   mount it on my Ceph clients:
   
  Ok can you give this patch a whirl as well?  It seems to fix the problem
  for me.
 
 With this patch on top of your previous patch, after several trials of
 my test I am also unable to reproduce the issue.  Since I had been
 having trouble first time, every time, I think it also seems to fix
 the problem for me.
 

Hey Jim,

Could you test this patch instead?  I think it's a little less hamfisted and
should give us a nice balance between not crashing and being good for
performance.  Thanks,

Josef

commit 43510c0e5faad8e5e4d8ba13baa1dd5dfb3d39ce
Author: Josef Bacik jba...@fusionio.com
Date:   Wed Jan 30 17:02:51 2013 -0500

Btrfs: do not allow overcommit to happen if we are over 80% in use

Because of how little we allocate chunks now we can get really tight on
metadata space before we will allocate a new chunk.  This resulted in being
unable to add device extents when allocating a new metadata chunk as we did
not have enough space.  This is because we were allowed to overcommit too
much metadata without actually making sure we had enough space to make
allocations.  The idea behind overcommit is that we are allowed to say sure
you can have that reservation when most of the free space is occupied by
reservations, not actual allocations.  But in this case where a majority of
the total space is in use by actual allocations we can screw ourselves by
not being able to make real allocations when it matters.  So put this cap in
place for now to keep us from overcommitting so much that we run out of
space.  Thanks,

Reported-and-tested-by: Jim Schutt jasc...@sandia.gov
Signed-off-by: Josef Bacik jba...@fusionio.com
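
As an illustration of the cap described in the commit message, here is a minimal sketch. It assumes a hypothetical helper may_overcommit() that only looks at real allocations versus total space; the real can_overcommit() takes more inputs and its changed body is cut off in this archive, so this does not reproduce the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the cap: overcommit is refused once real
 * allocations exceed 80% of the space in the pool.
 */
static bool may_overcommit(uint64_t total_bytes, uint64_t allocated_bytes)
{
	return allocated_bytes * 100 <= total_bytes * 80;
}
```

Applied to the btrfs fi df output quoted earlier (Metadata: total=8.00MB, used=7.56MB, roughly 8192 KiB and 7741 KiB), the pool is about 94% allocated, so may_overcommit(8192, 7741) is false in this model.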

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dca5679..156341e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3672,13 +3672,30 @@ static int can_overcommit(struct btrfs_root *root,
   

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-31 Thread Jim Schutt
On 01/31/2013 08:33 AM, Josef Bacik wrote:
 Hey Jim,
 
 Could you test this patch instead?  I think it's a little less hamfisted and
 should give us a nice balance between not crashing and being good for
 performance.  Thanks,

Hi Josef,

Running with this patch in place of your previous version, I
was again unable to reproduce the issue.

I might be seeing a couple percent increase in performance, or
it might just be noise, but I'm willing to say that I think
performance is same-or-better than the previous version of
the patch.

Thanks again!

-- Jim

 

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-30 Thread Josef Bacik
On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
 Interesting - with your patch applied I triggered the following, just
 bringing up a fresh Ceph filesystem - I didn't even get a chance to
 mount it on my Ceph clients:
 

Well that makes me a sad panda, but hey it didn't panic this time.  What
workload are you running on this fs/ceph cluster?  Thanks,

Josef


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-30 Thread Josef Bacik
On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
 Interesting - with your patch applied I triggered the following, just
 bringing up a fresh Ceph filesystem - I didn't even get a chance to
 mount it on my Ceph clients:
 

Actually never mind, it looks like I figured out how to reproduce it.  I'll let
you know when I have something to test.  Thanks,

Josef


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-30 Thread Josef Bacik
On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
 On 01/29/2013 01:04 PM, Josef Bacik wrote:
 From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
  From: Josef Bacik jba...@fusionio.com
  Date: Tue, 29 Jan 2013 15:03:37 -0500
  Subject: [PATCH] Btrfs: fix chunk allocation error handling
  
  If we error out allocating a dev extent we will have already created the
  block group and such which will cause problems since the allocator may have
  tried to allocate out of the block group that no longer exists.  This will
  cause BUG_ON()'s in the bio submission path.  This also makes a failure to
  allocate a dev extent a non-abort error, we will just clean up the dev
  extents we did allocate and exit.  Now if we fail to delete the dev extents
  we will abort since we can't have half of the dev extents hanging around,
  but this will make us much less likely to abort.  Thanks,
  
  Signed-off-by: Josef Bacik jba...@fusionio.com
  ---
 
 Interesting - with your patch applied I triggered the following, just
 bringing up a fresh Ceph filesystem - I didn't even get a chance to
 mount it on my Ceph clients:
 

Ok can you give this patch a whirl as well?  It seems to fix the problem for me.
Thanks,

Josef

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dca5679..874bcf2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3677,8 +3677,18 @@ static int can_overcommit(struct btrfs_root *root,
 	u64 used;
 
 	used = space_info->bytes_used + space_info->bytes_reserved +
-		space_info->bytes_pinned + space_info->bytes_readonly +
-		space_info->bytes_may_use;
+		space_info->bytes_pinned + space_info->bytes_readonly;
+
+	/*
+	 * We only want to allow over committing if we have lots of actual space
+	 * free, but if we've tied up more than 80% of the space with actual
+	 * space reservation (not including bytes we _might_ use) then don't
+	 * allow overcommitting as it will just make things go badly for us.
+	 */
+	if (used > div_factor(space_info->total_bytes, 8))
+		return 0;
+
+	used += space_info->bytes_may_use;
 
 	spin_lock(&root->fs_info->free_chunk_lock);
 	avail = root->fs_info->free_chunk_space;
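
For readers following the patch above, the new gate can be sketched in plain
userspace C.  This is a minimal sketch, not the kernel code: struct
space_counters and passes_overcommit_gate() are names invented here, and
div_factor() is re-implemented on the assumption that it computes
num * factor / 10.  The rest of can_overcommit() (the free chunk space
check) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the counters kept in space_info. */
struct space_counters {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_reserved;
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
	uint64_t bytes_may_use;	/* soft reservations, ignored by the gate */
};

/* Assumed to mirror the kernel helper: num * factor / 10. */
static uint64_t div_factor(uint64_t num, int factor)
{
	if (factor == 10)
		return num;
	return num * (uint64_t)factor / 10;
}

/*
 * Returns false once the hard commitments (everything except
 * bytes_may_use) exceed 80% of the space, true otherwise.
 */
static bool passes_overcommit_gate(const struct space_counters *s)
{
	uint64_t used = s->bytes_used + s->bytes_reserved +
			s->bytes_pinned + s->bytes_readonly;

	return used <= div_factor(s->total_bytes, 8);
}
```

With the Metadata figures Jim posted (total=8.00MB, used=7.56MB), and
assuming "used" there corresponds to bytes_used alone, the 80% threshold is
about 6.4MB, so this gate would refuse overcommit.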


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-30 Thread Jim Schutt
On 01/30/2013 09:38 AM, Josef Bacik wrote:
 Ok can you give this patch a whirl as well?  It seems to fix the problem for 
 me.

With this patch on top of your previous patch, after several trials of
my test I am also unable to reproduce the issue.  Since I had been
having trouble first time, every time, I think it also seems to fix
the problem for me.

Thanks again!

-- Jim

 Thanks,
 
 Josef




Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-30 Thread Josef Bacik
On Wed, Jan 30, 2013 at 02:37:40PM -0700, Jim Schutt wrote:
 With this patch on top of your previous patch, after several trials of
 my test I am also unable to reproduce the issue.  Since I had been
 having trouble first time, every time, I think it also seems to fix
 the problem for me.
 
 Thanks again!
 

Awesome thanks for testing!

Josef


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-29 Thread Josef Bacik
On Mon, Jan 28, 2013 at 07:30:09PM -0700, Liu Bo wrote:
 Josef,
 
 A quick reproducer here: running xfstests 251 with autodefrag,compress=zlib
 


251  [not run] FSTRIM is not supported

Are you sure it's 251?  Thanks,

Josef


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-29 Thread Josef Bacik
On Tue, Jan 29, 2013 at 08:47:30AM -0500, Josef Bacik wrote:
 251  [not run] FSTRIM is not supported
 
 Are you sure it's 251?  Thanks,

Sorry it's early, I need a device that does trim.  /me waits for his fusion card
to get back from the shop,

Josef


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-29 Thread David Sterba
On Tue, Jan 29, 2013 at 08:50:34AM -0500, Josef Bacik wrote:
 On Tue, Jan 29, 2013 at 08:47:30AM -0500, Josef Bacik wrote:
  251  [not run] FSTRIM is not supported
  
  Are you sure it's 251?  Thanks,
 
 Sorry it's early, I need a device that does trim.  /me waits for his fusion 
 card
 to get back from the shop,

You can use scsi_debug device with

parm:   lbpu:enable LBP, support UNMAP command (def=0) (int)

david


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-29 Thread David Sterba
On Tue, Jan 29, 2013 at 05:43:31PM +0100, David Sterba wrote:
 On Tue, Jan 29, 2013 at 08:50:34AM -0500, Josef Bacik wrote:
 You can use scsi_debug device with
 
 parm:   lbpu:enable LBP, support UNMAP command (def=0) (int)

Also, a loop device whose backing file lives on a filesystem with hole
punch support understands TRIM.

david


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-29 Thread Jim Schutt
On 01/28/2013 02:23 PM, Josef Bacik wrote:
 On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
 Hi Josef,

 Thanks for the patch - sorry for the long delay in testing...

 
 Jim,
 
 I've been trying to reason out how this happens, could you do a btrfs fi df on
 the filesystem thats giving you trouble so I can see if what I think is
 happening is what's actually happening.  Thanks,

Here's an example, using a slightly different kernel than
my previous report.  It's your btrfs-next master branch
(commit 8f139e59d5 Btrfs: use bit operation for ->fs_state)
with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).


Here I'm finding the file system in question:

# ls -l /dev/mapper | grep dm-93
lrwxrwxrwx 1 root root   8 Jan 29 11:13 cs53s19p2 -> ../dm-93

# df -h | grep -A 1 cs53s19p2
/dev/mapper/cs53s19p2
  896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522


Here's the info you asked for:

# btrfs fi df /ram/mnt/ceph/data.osd.522
Data: total=2.01GB, used=1.00GB
System: total=4.00MB, used=64.00KB
Metadata: total=8.00MB, used=7.56MB


And here's the backtrace that had trouble on dm-93.
It's a little different to my previous report:

[  705.496463] ------------[ cut here ]------------
[  705.501123] WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x60/0x110 [btrfs]()
[  705.509751] Hardware name: X8DTH-i/6/iF/6F
[  705.513862] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm 
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash 
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput 
sg joydev sd_mod hid_generic iTCO_wdt iTCO_vendor_support coretemp kvm 
crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 
xts gf128mul microcode serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en 
mlx4_core ata_piix libata mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 
i2c_i801 i2c_core button lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac 
edac_core dm_mod ioatdma nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc 
fscache broadcom tg3 hwmon bnx2 igb dca e1000
[  705.580232] Pid: 33025, comm: ceph-osd Not tainted 3.7.0-00269-gd9acbfd #492
[  705.587488] Call Trace:
[  705.589957]  [8103ff04] warn_slowpath_common+0x94/0xc0
[  705.596108]  [a055331a] ? btrfs_free_path+0x2a/0x40 [btrfs]
[  705.602685]  [8103ffe6] warn_slowpath_fmt+0x46/0x50
[  705.608563]  [a054c730] __btrfs_abort_transaction+0x60/0x110 
[btrfs]
[  705.615994]  [a05a2058] __btrfs_alloc_chunk+0x678/0x710 [btrfs]
[  705.622945]  [a05a214e] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
[  705.629635]  [a055edb1] ? check_system_chunk+0x71/0x130 [btrfs]
[  705.637079]  [a055f15c] do_chunk_alloc+0x2ec/0x370 [btrfs]
[  705.643451]  [a055b199] ? btrfs_reduce_alloc_profile+0xa9/0x120 
[btrfs]
[  705.650951]  [a0561d1c] btrfs_check_data_free_space+0x13c/0x2b0 
[btrfs]
[  705.658446]  [a0564a70] btrfs_delalloc_reserve_space+0x20/0x60 
[btrfs]
[  705.665882]  [a058980e] __btrfs_buffered_write+0x15e/0x340 [btrfs]
[  705.672952]  [a0589e29] btrfs_file_aio_write+0x309/0x450 [btrfs]
[  705.679889]  [a0589b20] ? __btrfs_direct_write+0x130/0x130 [btrfs]
[  705.686934]  [811626f4] do_sync_readv_writev+0x94/0xe0
[  705.692942]  [811637b3] do_readv_writev+0xe3/0x1e0
[  705.698604]  [81180c42] ? fget_light+0x122/0x170
[  705.704093]  [811638f6] vfs_writev+0x46/0x60
[  705.709239]  [81163a2f] sys_writev+0x5f/0xc0
[  705.714388]  [812637ee] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  705.720827]  [814b7882] system_call_fastpath+0x16/0x1b
[  705.726829] ---[ end trace 6e889d6d939ca116 ]---
[  705.731459] BTRFS warning (device dm-93): __btrfs_alloc_chunk:3787: Aborting unused transaction(error 28).
[  705.741187] btrfs: mapping failed logical 1099431936 bio len 524288 len 65536
[  705.741192] BTRFS warning (device dm-93): find_free_extent:5948: Aborting unused transaction(Object already exists).
[  705.759185] ------------[ cut here ]------------
[  705.763929] kernel BUG at fs/btrfs/volumes.c:4891!
[  705.768990] invalid opcode:  [#1] SMP 
[  705.773561] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm 
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash 
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput 
sg joydev sd_mod hid_generic iTCO_wdt iTCO_vendor_support coretemp kvm 
crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 
xts gf128mul microcode serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en 
mlx4_core ata_piix libata mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 
i2c_i801 i2c_core button lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac 
edac_core dm_mod ioatdma nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc 
fscache broadcom tg3 hwmon bnx2 igb dca e1000
[  705.845121] 

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-29 Thread Josef Bacik
On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
 Here's an example, using a slightly different kernel than
 my previous report.  It's your btrfs-next master branch
 (commit 8f139e59d5 Btrfs: use bit operation for ->fs_state)
 with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
 
 
 Here I'm finding the file system in question:
 
 # ls -l /dev/mapper | grep dm-93
 lrwxrwxrwx 1 root root   8 Jan 29 11:13 cs53s19p2 -> ../dm-93
 
 # df -h | grep -A 1 cs53s19p2
 /dev/mapper/cs53s19p2
   896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522
 
 
 Here's the info you asked for:
 
 # btrfs fi df /ram/mnt/ceph/data.osd.522
 Data: total=2.01GB, used=1.00GB
 System: total=4.00MB, used=64.00KB
 Metadata: total=8.00MB, used=7.56MB
 

How big is the disk you are using, and what mount options?  I have a patch to
keep the panic from happening and hopefully the abort, could you try this?  I
still want to keep the underlying error from happening because it shouldn't be,
but no reason I can't fix the error case while you can easily reproduce it :).
Thanks,

Josef

From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
From: Josef Bacik jba...@fusionio.com
Date: Tue, 29 Jan 2013 15:03:37 -0500
Subject: [PATCH] Btrfs: fix chunk allocation error handling

If we error out allocating a dev extent we will have already created the
block group and such which will cause problems since the allocator may have
tried to allocate out of the block group that no longer exists.  This will
cause BUG_ON()'s in the bio submission path.  This also makes a failure to
allocate a dev extent a non-abort error, we will just clean up the dev
extents we did allocate and exit.  Now if we fail to delete the dev extents
we will abort since we can't have half of the dev extents hanging around,
but this will make us much less likely to abort.  Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/volumes.c |   32 ++++++++++++++++++++++----------
 1 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4f8c281..2ba5b84 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3766,12 +3766,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto error;
 
-	ret = btrfs_make_block_group(trans, extent_root, 0, type,
-				     BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-				     start, num_bytes);
-	if (ret)
-		goto error;
-
 	for (i = 0; i < map->num_stripes; ++i) {
 		struct btrfs_device *device;
 		u64 dev_offset;
@@ -3783,15 +3777,33 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 				info->chunk_root->root_key.objectid,
 				BTRFS_FIRST_CHUNK_TREE_OBJECTID,
 				start, dev_offset, stripe_size);
-		if (ret) {
-			btrfs_abort_transaction(trans, extent_root, ret);
-			goto error;
-		}
+		if (ret)
+			goto error_dev_extent;
+	}
+
+	ret = btrfs_make_block_group(trans, extent_root, 0, type,
+				     BTRFS_FIRST_CHUNK_TREE_OBJECTID,
+				     start, num_bytes);
+	if (ret) {
+		i = map->num_stripes - 1;
+		goto error_dev_extent;
 	}
 
 	kfree(devices_info);
 	return 0;
 
+error_dev_extent:
+	for (; i >= 0; i--) {
+		struct btrfs_device *device;
+		int err;
+
+		device = map->stripes[i].dev;
+		err = btrfs_free_dev_extent(trans, device, start);
+		if (err) {
+			btrfs_abort_transaction(trans, extent_root, err);
+			break;
+		}
+	}
 error:
 	kfree(map);
 	kfree(devices_info);
-- 
1.7.7.6
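
The unwind idea in the commit message above (undo the dev extents already
allocated, and treat only a failed undo as fatal) can be sketched in
userspace C.  This is a rough illustration under stated assumptions, not the
kernel code: struct fake_dev, fake_alloc(), fake_free() and
alloc_chunk_stripes() are all hypothetical names.  The sketch also starts
its cleanup at the last stripe that succeeded, which is a choice made here
and not a statement about what the patch does.

```c
#include <errno.h>

/* Hypothetical per-device state standing in for a dev extent. */
struct fake_dev {
	int has_extent;	/* 1 once an extent has been reserved */
	int fail_alloc;	/* force the allocation to fail */
	int fail_free;	/* force the cleanup to fail */
};

static int fake_alloc(struct fake_dev *d)
{
	if (d->fail_alloc)
		return -ENOSPC;
	d->has_extent = 1;
	return 0;
}

static int fake_free(struct fake_dev *d)
{
	if (d->fail_free)
		return -EIO;
	d->has_extent = 0;
	return 0;
}

/*
 * Reserve one extent per stripe.  If stripe i fails, release stripes
 * i-1..0 and return the allocation error.  Only a failed release sets
 * *must_abort, since half-created extents cannot be left behind.
 */
static int alloc_chunk_stripes(struct fake_dev *devs, int num_stripes,
			       int *must_abort)
{
	int i, ret = 0;

	*must_abort = 0;
	for (i = 0; i < num_stripes; i++) {
		ret = fake_alloc(&devs[i]);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	for (i = i - 1; i >= 0; i--) {
		int err = fake_free(&devs[i]);

		if (err) {
			*must_abort = 1;
			return err;
		}
	}
	return ret;
}
```

A caller would see a plain -ENOSPC when the third of three stripes cannot be
allocated, with the first two released again and no abort requested.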



Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-29 Thread Jim Schutt
On 01/29/2013 01:04 PM, Josef Bacik wrote:
 How big is the disk you are using, and what mount options? 

The partition is ~900 GiB, and the mount options according
to /proc/mounts are: rw,noatime,nospace_cache

Also, in case it matters, I build the file systems
with -l 65536 -n 65536.

 I have a patch to
 keep the panic from happening and hopefully the abort, could you try this?  I
 still want to keep the underlying error from happening because it shouldn't 
 be,
 but no reason I can't fix the error case while you can easily reproduce it :).

I'm happy to try it - but I probably won't have results
for you until tomorrow, due to other time pressures.

Thanks for taking a look.

-- Jim

 Thanks,
 
 Josef
 



Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-29 Thread Jim Schutt
On 01/29/2013 01:04 PM, Josef Bacik wrote:
 How big is the disk you are using, and what mount options?  I have a patch to
 keep the panic from happening and hopefully the abort, could you try this?  I
 still want to keep the underlying error from happening because it shouldn't 
 be,
 but no reason I can't fix the error case while you can easily reproduce it :).
 Thanks,
 
 Josef
 
From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
 From: Josef Bacik jba...@fusionio.com
 Date: Tue, 29 Jan 2013 15:03:37 -0500
 Subject: [PATCH] Btrfs: fix chunk allocation error handling
 
 If we error out allocating a dev extent we will have already created the
 block group and such which will cause problems since the allocator may have
 tried to allocate out of the block group that no longer exists.  This will
 cause BUG_ON()'s in the bio submission path.  This also makes a failure to
 allocate a dev extent a non-abort error, we will just clean up the dev
 extents we did allocate and exit.  Now if we fail to delete the dev extents
 we will abort since we can't have half of the dev extents hanging around,
 but this will make us much less likely to abort.  Thanks,
 
 Signed-off-by: Josef Bacik jba...@fusionio.com
 ---

Interesting - with your patch applied I triggered the following, just
bringing up a fresh Ceph filesystem - I didn't even get a chance to
mount it on my Ceph clients:

[ 6419.450179] BTRFS error (device dm-73) in btrfs_free_dev_extent:1115: error 28 (Slot search failed)
[ 6419.459223] btrfs is forced readonly
[ 6419.462805] ------------[ cut here ]------------
[ 6419.467440] WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x60/0x110 [btrfs]()
[ 6419.475809] Hardware name: X8DTH-i/6/iF/6F
[ 6419.479914] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm 
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash 
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput 
sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic coretemp kvm 
crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 
xts gf128mul microcode button ata_piix libata mpt2sas scsi_transport_sas 
raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en 
mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core uhci_hcd ehci_hcd 
i7core_edac edac_core ioatdma dm_mod nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs 
lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 6419.546095] Pid: 107593, comm: ceph-osd Not tainted 3.7.0-00270-g8353482 #494
[ 6419.553227] Call Trace:
[ 6419.555697]  [8103ff04] warn_slowpath_common+0x94/0xc0
[ 6419.561708]  [8103ffe6] warn_slowpath_fmt+0x46/0x50
[ 6419.567491]  [a0542730] __btrfs_abort_transaction+0x60/0x110 
[btrfs]
[ 6419.574746]  [a05980c6] __btrfs_alloc_chunk+0x6e6/0x770 [btrfs]
[ 6419.581553]  [a05981ae] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
[ 6419.588017]  [a0554db1] ? check_system_chunk+0x71/0x130 [btrfs]
[ 6419.594824]  [a055515c] do_chunk_alloc+0x2ec/0x370 [btrfs]
[ 6419.601188]  [a055e06c] find_free_extent+0xaac/0xbe0 [btrfs]
[ 6419.607733]  [a055e222] btrfs_reserve_extent+0x82/0x190 [btrfs]
[ 6419.614545]  [a055e3b5] btrfs_alloc_free_block+0x85/0x230 [btrfs]
[ 6419.621530]  [a0586e55] ? check_buffer_tree_ref+0x25/0x50 [btrfs]
[ 6419.628512]  [a0549bca] __btrfs_cow_block+0x14a/0x4b0 [btrfs]
[ 6419.635155]  [a05a261c] ? btrfs_try_tree_write_lock+0x3c/0xa0 
[btrfs]
[ 6419.642475]  [a05a2c43] ? btrfs_set_lock_blocking_rw+0xe3/0x160 
[btrfs]
[ 6419.649970]  [a054a5b1] btrfs_cow_block+0x161/0x200 [btrfs]
[ 6419.656424]  [a054d679] btrfs_search_slot+0x399/0x760 

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-28 Thread Josef Bacik
On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
 Hi Josef,
 
 Thanks for the patch - sorry for the long delay in testing...
 

Jim,

I've been trying to reason out how this happens, could you do a btrfs fi df on
the filesystem that's giving you trouble so I can see if what I think is
happening is what's actually happening.  Thanks,

Josef


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-28 Thread Jim Schutt
On 01/28/2013 02:23 PM, Josef Bacik wrote:
 On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
 Hi Josef,

 Thanks for the patch - sorry for the long delay in testing...

 
 Jim,
 
 I've been trying to reason out how this happens, could you do a btrfs fi df on
 the filesystem thats giving you trouble so I can see if what I think is
 happening is what's actually happening.  Thanks,

Sure - it'll take me a bit to set the test up again.

-- Jim

 
 Josef
 
 




Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-28 Thread Liu Bo
On Mon, Jan 28, 2013 at 04:23:31PM -0500, Josef Bacik wrote:
 On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
  Hi Josef,
  
  Thanks for the patch - sorry for the long delay in testing...
  
 
 Jim,
 
 I've been trying to reason out how this happens, could you do a btrfs fi df on
 the filesystem thats giving you trouble so I can see if what I think is
 happening is what's actually happening.  Thanks,

Josef,

A quick reproducer here: running xfstests 251 with autodefrag,compress=zlib

thanks,
liubo


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2013-01-03 Thread Jim Schutt
Hi Josef,

Thanks for the patch - sorry for the long delay in testing...


On 12/18/2012 06:52 AM, Josef Bacik wrote:
> On Wed, Dec 12, 2012 at 06:52:37PM -0700, Liu Bo wrote:
>> A user reported that he has hit an annoying deadlock while playing with
>> ceph based on btrfs.
>>
>> Currently, updating the device tree requires space from a METADATA chunk,
>> so we -may- need to do a recursive chunk allocation when adding/updating
>> a dev extent; that is where the deadlock comes from.
>>
>> If we use SYSTEM metadata to update the device tree, we can avoid the
>> recursive allocation.
>>
>
> This is going to cause us to allocate many more system chunks than we used to,
> which could land us in trouble.  Instead, let's just keep us from re-entering if
> we're already allocating a chunk.  We do the chunk allocation when we don't have
> enough space for a cluster, but we'll likely have plenty of space to make an
> allocation.  Can you give this patch a try, Jim, and see if it fixes your problem?
> Thanks,
>
> Josef
>

With your patch applied to 3.7.1, I get the following on one
of my servers running Ceph OSDs.  The end effect is that some
of my ceph client writes hang. 

[ 1440.335752] ------------[ cut here ]------------
[ 1440.340602] WARNING: at fs/btrfs/super.c:246 __btrfs_abort_transaction+0x60/0x110 [btrfs]()
[ 1440.349117] Hardware name: X8DTH-i/6/iF/6F
[ 1440.353252] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm 
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash 
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput 
sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic button ata_piix 
libata coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper 
cryptd lrw aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas 
raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en 
mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma 
i7core_edac dm_mod edac_core nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd 
sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 1440.419398] Pid: 48686, comm: ceph-osd Not tainted 3.7.1-6-gc794580 #484
[ 1440.426614] Call Trace:
[ 1440.429083]  [8103fed4] warn_slowpath_common+0x94/0xc0
[ 1440.435110]  [8103ffb6] warn_slowpath_fmt+0x46/0x50
[ 1440.440894]  [a05425c0] __btrfs_abort_transaction+0x60/0x110 [btrfs]
[ 1440.448135]  [a059513d] __btrfs_alloc_chunk+0x6cd/0x750 [btrfs]
[ 1440.454941]  [a059521e] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
[ 1440.461382]  [a05543a1] ? check_system_chunk+0x71/0x130 [btrfs]
[ 1440.468188]  [a055474c] do_chunk_alloc+0x2ec/0x370 [btrfs]
[ 1440.474562]  [a05509e9] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
[ 1440.482050]  [a055839c] btrfs_check_data_free_space+0x13c/0x2b0 [btrfs]
[ 1440.489558]  [a0559f40] btrfs_delalloc_reserve_space+0x20/0x60 [btrfs]
[ 1440.497013]  [a057e31e] __btrfs_buffered_write+0x15e/0x350 [btrfs]
[ 1440.504095]  [a057e849] btrfs_file_aio_write+0x209/0x320 [btrfs]
[ 1440.511000]  [a057e640] ? __btrfs_direct_write+0x130/0x130 [btrfs]
[ 1440.518062]  [81164ef4] do_sync_readv_writev+0x94/0xe0
[ 1440.524105]  [81165f03] do_readv_writev+0xe3/0x1e0
[ 1440.529792]  [81182ff2] ? fget_light+0x122/0x170
[ 1440.535275]  [81166046] vfs_writev+0x46/0x60
[ 1440.540412]  [8116617f] sys_writev+0x5f/0xc0
[ 1440.545547]  [81264b3e] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1440.551987]  [814b7102] system_call_fastpath+0x16/0x1b
[ 1440.558016] ---[ end trace 764e83a458dabca6 ]---
[ 1440.562662] BTRFS warning (device dm-32): __btrfs_alloc_chunk:3488: Aborting unused transaction(error 28).
[ 1440.595987] BTRFS warning (device dm-32): find_free_extent:5871: Aborting unused transaction(Object already exists).
[ 1440.606542] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1440.614382] IP: [a0584e5e] map_private_extent_buffer+0xe/0xf0 [btrfs]
[ 1440.621704] PGD 6138e8067 PUD 56749f067 PMD 0 
[ 1440.626190] Oops:  [#1] SMP 
[ 1440.629442] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm 
ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash 
dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput 
sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic button ata_piix 
libata coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper 
cryptd lrw aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas 
raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en 
mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma 
i7core_edac dm_mod edac_core nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd 
sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 1440.694855] CPU 16 
[ 1440.696784] Pid: 48687, comm: ceph-osd Tainted: GW

Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2012-12-18 Thread Josef Bacik
On Wed, Dec 12, 2012 at 06:52:37PM -0700, Liu Bo wrote:
> A user reported that he has hit an annoying deadlock while playing with
> ceph based on btrfs.
>
> Currently, updating the device tree requires space from a METADATA chunk,
> so we -may- need to do a recursive chunk allocation when adding/updating
> a dev extent; that is where the deadlock comes from.
>
> If we use SYSTEM metadata to update the device tree, we can avoid the
> recursive allocation.
>

This is going to cause us to allocate many more system chunks than we used to,
which could land us in trouble.  Instead, let's just keep us from re-entering if
we're already allocating a chunk.  We do the chunk allocation when we don't have
enough space for a cluster, but we'll likely have plenty of space to make an
allocation.  Can you give this patch a try, Jim, and see if it fixes your problem?
Thanks,

Josef


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e152809..59df5e7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3564,6 +3564,10 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	int wait_for_alloc = 0;
 	int ret = 0;
 
+	/* Don't re-enter if we're already allocating a chunk */
+	if (trans->allocating_chunk)
+		return -ENOSPC;
+
 	space_info = __find_space_info(extent_root->fs_info, flags);
 	if (!space_info) {
 		ret = update_space_info(extent_root->fs_info, flags,
@@ -3606,6 +3610,8 @@ again:
 		goto again;
 	}
 
+	trans->allocating_chunk = true;
+
 	/*
 	 * If we have mixed data/metadata chunks we want to make sure we keep
 	 * allocating mixed chunks instead of individual chunks.
@@ -3632,6 +3638,7 @@ again:
 	check_system_chunk(trans, extent_root, flags);
 
 	ret = btrfs_alloc_chunk(trans, extent_root, flags);
+	trans->allocating_chunk = false;
 	if (ret < 0 && ret != -ENOSPC)
 		goto out;
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e6509b9..47ad8be 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -388,6 +388,7 @@ again:
 	h->qgroup_reserved = qgroup_reserved;
 	h->delayed_ref_elem.seq = 0;
 	h->type = type;
+	h->allocating_chunk = false;
 	INIT_LIST_HEAD(&h->qgroup_ref_list);
 	INIT_LIST_HEAD(&h->new_bgs);
 
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 0e8aa1e..69700f7 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -68,6 +68,7 @@ struct btrfs_trans_handle {
 	struct btrfs_block_rsv *orig_rsv;
 	short aborted;
 	short adding_csums;
+	bool allocating_chunk;
 	enum btrfs_trans_type type;
 	/*
 	 * this root is only needed to validate that the root passed to
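
For anyone who wants to see the guard in isolation, here is a minimal
user-space sketch of the re-entrancy check. It is not kernel code:
struct trans_handle, alloc_chunk() and the chunks_allocated counter are
hypothetical stand-ins, and only the allocating_chunk flag and the -ENOSPC
return mirror the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct btrfs_trans_handle. */
struct trans_handle {
	bool allocating_chunk;	/* the flag added by the patch */
	int chunks_allocated;	/* illustration only */
};

static int do_chunk_alloc(struct trans_handle *trans);

/*
 * Hypothetical stand-in for btrfs_alloc_chunk(): updating the device tree
 * may itself need space, so it calls back into do_chunk_alloc().
 */
static int alloc_chunk(struct trans_handle *trans)
{
	int ret;

	/* In this model we are only ever called with the guard set. */
	assert(trans->allocating_chunk);

	ret = do_chunk_alloc(trans);
	/* The nested request is refused with -ENOSPC, which is not fatal. */
	if (ret < 0 && ret != -ENOSPC)
		return ret;

	trans->chunks_allocated++;
	return 0;
}

static int do_chunk_alloc(struct trans_handle *trans)
{
	int ret;

	/* Don't re-enter if we're already allocating a chunk */
	if (trans->allocating_chunk)
		return -ENOSPC;

	trans->allocating_chunk = true;
	ret = alloc_chunk(trans);
	trans->allocating_chunk = false;
	return ret;
}
```

Calling do_chunk_alloc() on a fresh handle allocates exactly one chunk in
this model; the nested call sees the flag and backs out instead of recursing.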


Re: [PATCH] Btrfs: fix a deadlock on chunk mutex

2012-12-18 Thread Josef Bacik
On Tue, Dec 18, 2012 at 07:47:51AM -0700, Liu Bo wrote:
> On Tue, Dec 18, 2012 at 08:52:42AM -0500, Josef Bacik wrote:
> > On Wed, Dec 12, 2012 at 06:52:37PM -0700, Liu Bo wrote:
> > > A user reported that he has hit an annoying deadlock while playing with
> > > ceph based on btrfs.
> > >
> > > Currently, updating the device tree requires space from a METADATA chunk,
> > > so we -may- need to do a recursive chunk allocation when adding/updating
> > > a dev extent; that is where the deadlock comes from.
> > >
> > > If we use SYSTEM metadata to update the device tree, we can avoid the
> > > recursive allocation.
> > >
> >
> > This is going to cause us to allocate many more system chunks than we used to,
> > which could land us in trouble.  Instead, let's just keep us from re-entering if
> > we're already allocating a chunk.  We do the chunk allocation when we don't have
> > enough space for a cluster, but we'll likely have plenty of space to make an
> > allocation.  Can you give this patch a try, Jim, and see if it fixes your problem?
> > Thanks,
>
> From the stack info Jim gave, returning ENOSPC to the caller will end up
> aborting to read-only unless someone else saves the situation by
> allocating another METADATA chunk, which is a recursive allocation though.
>

if (ret < 0 && ret != -ENOSPC)

With this check it shouldn't abort. It should just drop empty_size, stop trying
to allocate a cluster, and allocate only the blocks needed. This applies only to
the recursive chunk allocation, so once that succeeds we'll have a new chunk and
the original allocation will be able to carry on.  Thanks,

Josef


[PATCH] Btrfs: fix a deadlock on chunk mutex

2012-12-12 Thread Liu Bo
A user reported that he has hit an annoying deadlock while playing with
ceph based on btrfs.

Currently, updating the device tree requires space from a METADATA chunk,
so we -may- need to do a recursive chunk allocation when adding/updating
a dev extent; that is where the deadlock comes from.

If we use SYSTEM metadata to update the device tree, we can avoid the
recursive allocation.

Reported-by: Jim Schutt <jasc...@sandia.gov>
Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/extent-tree.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d3e2c1..561dad5 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3346,7 +3346,8 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data)
 
 	if (data)
 		flags = BTRFS_BLOCK_GROUP_DATA;
-	else if (root == root->fs_info->chunk_root)
+	else if (root == root->fs_info->chunk_root ||
+		 root == root->fs_info->dev_root)
 		flags = BTRFS_BLOCK_GROUP_SYSTEM;
 	else
 		flags = BTRFS_BLOCK_GROUP_METADATA;
@@ -3534,7 +3535,8 @@ static u64 get_system_chunk_thresh(struct btrfs_root *root, u64 type)
 	else
 		num_dev = 1;	/* DUP or single */
 
-	/* metadata for updaing devices and chunk tree */
+	/* metadata for adding/updating devices and chunk tree */
+	num_dev = num_dev << 1
 	return btrfs_calc_trans_metadata_size(root, num_dev + 1);
 }
 
@@ -4351,7 +4353,7 @@ static void init_global_block_rsv(struct btrfs_fs_info *fs_info)
 
 	fs_info->extent_root->block_rsv = &fs_info->global_block_rsv;
 	fs_info->csum_root->block_rsv = &fs_info->global_block_rsv;
-	fs_info->dev_root->block_rsv = &fs_info->global_block_rsv;
+	fs_info->dev_root->block_rsv = &fs_info->chunk_block_rsv;
 	fs_info->tree_root->block_rsv = &fs_info->global_block_rsv;
 	fs_info->chunk_root->block_rsv = &fs_info->chunk_block_rsv;
 
-- 
1.7.7.6
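
As a rough illustration of what the patch changes, the sketch below models
the profile selection and the threshold arithmetic in user space. The enum
names and both functions are hypothetical stand-ins for the kernel code, and
the doubling assumes the operator on the num_dev line of the patch is a left
shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical identifiers for the trees involved. */
enum root_id { ROOT_TREE, ROOT_EXTENT, ROOT_CHUNK, ROOT_DEV, ROOT_CSUM };

/* Hypothetical stand-ins for the BTRFS_BLOCK_GROUP_* types. */
enum bg_type { BG_DATA, BG_SYSTEM, BG_METADATA };

/*
 * Models btrfs_get_alloc_profile() after the patch: the device tree now
 * draws from SYSTEM space, like the chunk tree.
 */
static enum bg_type alloc_profile(enum root_id root, int data)
{
	if (data)
		return BG_DATA;
	if (root == ROOT_CHUNK || root == ROOT_DEV)
		return BG_SYSTEM;
	return BG_METADATA;
}

/*
 * Models the item count handed to btrfs_calc_trans_metadata_size():
 * two items per device (adding and updating) plus one for the chunk tree.
 */
static uint64_t system_chunk_items(uint64_t num_dev)
{
	assert(num_dev <= UINT64_MAX / 2);	/* keep the shift from overflowing */
	num_dev = num_dev << 1;
	return num_dev + 1;
}
```

In this model a two-device profile asks for room for five items, where the
code before the patch would have asked for three.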
