Re: Need some help with a backref problem

2008-10-17 Thread Josef Bacik
On Fri, Oct 17, 2008 at 09:48:58AM +0800, Yan Zheng wrote:
 2008/10/17 Josef Bacik [EMAIL PROTECTED]:
  On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
  Hello,
 
  It's the end of the day here and I haven't figured this out, so hopefully Yan
  you can figure this out and I can come in tomorrow and keep working on taking
  alloc_mutex out :).  What is happening is I'm getting -ENOENT from
  lookup_extent_backref in finish_current_insert() when extent_op->type ==
  PENDING_BACKREF_UPDATE.  The way I have the locking, the only way this can
  happen is if we delete the extent backref completely and then do
  btrfs_update_ref.  I put a lookup_extent_backref in __btrfs_update_extent_ref
  and did a BUG_ON(ret), and it gave me this backtrace:
 
 
 I guess there are two or more threads running finish_current_insert at the
 same time (a find_first_extent_bit vs. clear_extent_bits race).


This can't happen; it's guarded by a new mutex that is responsible for the
extent_ins/del_pending/pinned_extents extent io trees.
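
Yan's guess describes a check-then-act race: two threads each call find_first_extent_bit, both see the same pending extent, and both process it before either clear_extent_bits lands. Josef's answer is that a single mutex now covers those trees. A minimal userspace sketch of both situations follows; the 64-bit mask standing in for the extent_ins tree, the names pending_set, find_first_pending, clear_pending and take_first_pending, and the C11 atomic_flag spinlock used in place of the kernel mutex are all assumptions for illustration, not btrfs code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not btrfs code: bit i of 'bits' set means
 * extent i is still pending, playing the role of the extent_ins tree.
 */
#define NR_EXTENTS 64

struct pending_set {
	unsigned long long bits;
	atomic_flag lock;	/* stand-in for the kernel mutex */
};

/* Racy step 1: report the first pending extent without claiming it. */
static int find_first_pending(const struct pending_set *s)
{
	for (int i = 0; i < NR_EXTENTS; i++)
		if (s->bits & (1ULL << i))
			return i;
	return -1;
}

/* Racy step 2: claim extent i; returns false if it was already cleared. */
static bool clear_pending(struct pending_set *s, int i)
{
	bool was_set = (s->bits >> i) & 1ULL;

	s->bits &= ~(1ULL << i);
	return was_set;
}

/* Fixed version: find and clear inside one critical section. */
static int take_first_pending(struct pending_set *s)
{
	while (atomic_flag_test_and_set(&s->lock))
		;	/* spin until the lock is free */

	int i = find_first_pending(s);
	if (i >= 0) {
		bool was_set = clear_pending(s, i);

		assert(was_set);	/* nobody else can clear it under the lock */
		(void)was_set;
	}

	atomic_flag_clear(&s->lock);
	return i;
}
```

If two callers run find_first_pending() before either runs clear_pending(), both get the same extent and the second clear_pending() returns false because the bit is already gone. take_first_pending() holds the lock across the find and the clear, so successive callers always get distinct extents.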
 
   [a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
   [a034f859] ? insert_ptr+0x176/0x184 [btrfs]
   [a0354615] ? split_node+0x54a/0x5b3 [btrfs]
   [a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
   [8109a1ad] ? check_bytes_and_report+0x37/0xc9
   [8109a1ad] ? check_bytes_and_report+0x37/0xc9
   [a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
   [a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
   [a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
   [8109a832] ? init_object+0x27/0x6e
   [a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
   [a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
   [a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
   [a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
   [a037fe51] ? worker_loop+0x42/0x125 [btrfs]
   [a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
   [81046721] ? kthread+0x47/0x76
   [8100cd59] ? child_rip+0xa/0x11
   [810466da] ? kthread+0x0/0x76
   [8100cd4f] ? child_rip+0x0/0x11
 
  And I also put in some printks to figure out when exactly this was happening,
  and it happens in split_node() when c == root->node, so we do an
  insert_new_root.  My first reaction was to put a c = path->nodes[level] after
  the insert_new_root, but looking at it, that's just going to give me the same
  thing back.  I can't figure out if I'm doing something wrong or if there is
  something wonky with the backref stuff, and what is even more worrisome is
  that I can't figure out why having alloc_mutex in there kept this problem from
  happening before, since the way this happens doesn't have anything to do with
  alloc_mutex.  All help is appreciated, even random thinking out loud;
  hopefully we can figure out what is going on and I can finish ripping
  alloc_mutex out.
  Thanks,
 
 
  Ok, I think I figured it out: we need to have a

  c = root->node;

  after the insert_new_root, since we will have freed the old extent buffer and
  replaced it with a new one.  Does that sound right?  Thanks,
 
 
 I don't think so. If we do this, we will end up splitting the new root.
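
Yan's objection can be checked on a toy model. Assuming, as his reply implies, that insert_new_root() keeps the old root as the single child of the new root and records the new root in path->nodes[level + 1], the local c in split_node() still points at the node that actually needs splitting. The structs, MAX_LEVEL and node_to_split() below are hypothetical simplifications, not the btrfs definitions.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LEVEL 8

/* Hypothetical toy stand-ins for extent buffers, btrfs_root and btrfs_path. */
struct node {
	int level;
	int nritems;
	struct node *ptrs[4];
};

struct root {
	struct node *node;		/* like root->node */
};

struct path {
	struct node *nodes[MAX_LEVEL];	/* like path->nodes[] */
};

/*
 * Assumed behaviour of insert_new_root(): put a new node at 'level' whose
 * only pointer is the old root, make it root->node and record it in the
 * path.  The old root survives as the new root's child.
 */
static int insert_new_root(struct root *root, struct path *path, int level)
{
	struct node *lower = path->nodes[level - 1];
	struct node *c;

	assert(lower == root->node);
	assert(path->nodes[level] == NULL);

	c = calloc(1, sizeof(*c));
	if (!c)
		return -1;		/* -ENOMEM in the kernel */
	c->level = level;
	c->nritems = 1;
	c->ptrs[0] = lower;
	root->node = c;
	path->nodes[level] = c;
	return 0;
}

/* The part of split_node() under discussion: which node gets split? */
static struct node *node_to_split(struct root *root, struct path *path,
				  int level)
{
	struct node *c = path->nodes[level];

	if (c == root->node) {
		/* trying to split the root, make a new one above it */
		if (insert_new_root(root, path, level + 1))
			return NULL;
		/*
		 * Adding "c = root->node;" here would select the brand new
		 * root, so the split would happen one level too high.
		 */
	}
	return c;
}
```

With a one-level tree whose root is being split, node_to_split() returns the old root, which now hangs below a new level-2 root recorded in path->nodes[2]; switching c to root->node at that point would hand the one-item new root to the split instead.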


Ok, I think I understand this now. Thanks,

Josef 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Need some help with a backref problem

2008-10-17 Thread Josef Bacik
On Fri, Oct 17, 2008 at 09:48:58AM +0800, Yan Zheng wrote:
 2008/10/17 Josef Bacik [EMAIL PROTECTED]:
  On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
  Hello,
 
  It's the end of the day here and I haven't figured this out, so hopefully Yan
  you can figure this out and I can come in tomorrow and keep working on taking
  alloc_mutex out :).  What is happening is I'm getting -ENOENT from
  lookup_extent_backref in finish_current_insert() when extent_op->type ==
  PENDING_BACKREF_UPDATE.  The way I have the locking, the only way this can
  happen is if we delete the extent backref completely and then do
  btrfs_update_ref.  I put a lookup_extent_backref in __btrfs_update_extent_ref
  and did a BUG_ON(ret), and it gave me this backtrace:
 
 
 I guess there are two or more threads running finish_current_insert at the
 same time (a find_first_extent_bit vs. clear_extent_bits race).
 
  And I also put in some printks to figure out when exactly this was happening,
  and it happens in split_node() when c == root->node, so we do an
  insert_new_root.  My first reaction was to put a c = path->nodes[level] after
  the insert_new_root, but looking at it, that's just going to give me the same
  thing back.  I can't figure out if I'm doing something wrong or if there is
  something wonky with the backref stuff, and what is even more worrisome is
  that I can't figure out why having alloc_mutex in there kept this problem from
  happening before, since the way this happens doesn't have anything to do with
  alloc_mutex.  All help is appreciated, even random thinking out loud;
  hopefully we can figure out what is going on and I can finish ripping
  alloc_mutex out.
  Thanks,
 
 
  Ok, I think I figured it out: we need to have a

  c = root->node;

  after the insert_new_root, since we will have freed the old extent buffer and
  replaced it with a new one.  Does that sound right?  Thanks,
 
 
 I don't think so. If we do this, we will end up splitting the new root.


Here is my patch; it's a bit of a mess right now. Thanks,

Josef

 
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 9caeb37..4f2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1390,8 +1390,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
 	lowest_level = p->lowest_level;
 	WARN_ON(lowest_level && ins_len > 0);
 	WARN_ON(p->nodes[0] != NULL);
-	WARN_ON(cow && root == root->fs_info->extent_root &&
-		!mutex_is_locked(&root->fs_info->alloc_mutex));
+
 	if (ins_len < 0)
 		lowest_unlock = 2;
 
@@ -2051,6 +2050,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	if (c == root->node) {
 		/* trying to split the root, lets make a new one */
 		ret = insert_new_root(trans, root, path, level + 1);
+		printk(KERN_ERR "splitting the root, %llu\n", c->start);
 		if (ret)
 			return ret;
 	} else {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fad58b9..d1e304f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -516,12 +516,14 @@ struct btrfs_free_space {
 	struct rb_node offset_index;
 	u64 offset;
 	u64 bytes;
+	unsigned long ip;
 };
 
 struct btrfs_block_group_cache {
 	struct btrfs_key key;
 	struct btrfs_block_group_item item;
 	spinlock_t lock;
+	struct mutex alloc_mutex;
 	u64 pinned;
 	u64 reserved;
 	u64 flags;
@@ -600,6 +602,7 @@ struct btrfs_fs_info {
 	struct mutex transaction_kthread_mutex;
 	struct mutex cleaner_mutex;
 	struct mutex alloc_mutex;
+	struct mutex extent_io_mutex;
 	struct mutex chunk_mutex;
 	struct mutex drop_mutex;
 	struct mutex volume_mutex;
@@ -1879,8 +1882,12 @@ int 
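
The patch pushes locking down a level: it drops the WARN_ON that required the global alloc_mutex when COWing the extent root, adds an alloc_mutex to each btrfs_block_group_cache, and adds an extent_io_mutex to btrfs_fs_info. A hedged userspace sketch of the per-group half of that idea follows. The hunks that actually take these locks are not included here, so the bump allocator, the names block_group, group_lock and alloc_in_group, and the C11 atomic_flag spinlock standing in for struct mutex are all assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical model: each block group carries its own lock (like the
 * alloc_mutex the patch adds to btrfs_block_group_cache), so allocating
 * from one group never waits on another.  A bump pointer stands in for
 * the real free-space bookkeeping.
 */
struct block_group {
	atomic_flag alloc_lock;		/* stand-in for struct mutex */
	uint64_t start;
	uint64_t len;
	uint64_t used;
};

static void group_lock(struct block_group *bg)
{
	while (atomic_flag_test_and_set(&bg->alloc_lock))
		;	/* spin until the lock is free */
}

static void group_unlock(struct block_group *bg)
{
	atomic_flag_clear(&bg->alloc_lock);
}

/* Returns 0 and the start of the allocation in *out, or -1 if full. */
static int alloc_in_group(struct block_group *bg, uint64_t bytes,
			  uint64_t *out)
{
	int ret = -1;

	group_lock(bg);
	assert(bg->used <= bg->len);
	if (bg->len - bg->used >= bytes) {
		*out = bg->start + bg->used;
		bg->used += bytes;
		ret = 0;
	}
	group_unlock(bg);
	return ret;
}
```

Two groups can be allocated from independently, and a full group fails without touching its neighbour's lock. What the real per-group alloc_mutex protects is not visible in the hunks above, so treat the bump pointer purely as a placeholder.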

Need some help with a backref problem

2008-10-16 Thread Josef Bacik
Hello,

It's the end of the day here and I haven't figured this out, so hopefully Yan you
can figure this out and I can come in tomorrow and keep working on taking
alloc_mutex out :).  What is happening is I'm getting -ENOENT from
lookup_extent_backref in finish_current_insert() when extent_op->type ==
PENDING_BACKREF_UPDATE.  The way I have the locking, the only way this can
happen is if we delete the extent backref completely and then do
btrfs_update_ref.  I put a lookup_extent_backref in __btrfs_update_extent_ref
and did a BUG_ON(ret), and it gave me this backtrace:

 [a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
 [a034f859] ? insert_ptr+0x176/0x184 [btrfs]
 [a0354615] ? split_node+0x54a/0x5b3 [btrfs]
 [a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
 [8109a1ad] ? check_bytes_and_report+0x37/0xc9
 [8109a1ad] ? check_bytes_and_report+0x37/0xc9
 [a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
 [a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
 [a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
 [8109a832] ? init_object+0x27/0x6e
 [a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
 [a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
 [a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
 [a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
 [a037fe51] ? worker_loop+0x42/0x125 [btrfs]
 [a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
 [81046721] ? kthread+0x47/0x76
 [8100cd59] ? child_rip+0xa/0x11
 [810466da] ? kthread+0x0/0x76
 [8100cd4f] ? child_rip+0x0/0x11
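
The reasoning above is that, with the new locking, a PENDING_BACKREF_UPDATE can only fail with -ENOENT if the backref was deleted outright before btrfs_update_ref() ran. A toy model of that ordering follows; the fixed-size backref table indexed by a small block number, and the names backref_table, lookup_backref, delete_backref and update_backref, are hypothetical and much simpler than the real extent tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy backref table, indexed by a small block number. */
#define NR_BLOCKS 16

struct backref_table {
	bool present[NR_BLOCKS];		/* does a backref exist? */
	unsigned long long parent[NR_BLOCKS];	/* block that points at it */
};

static int lookup_backref(const struct backref_table *t, int bytenr)
{
	assert(bytenr >= 0 && bytenr < NR_BLOCKS);
	return t->present[bytenr] ? 0 : -ENOENT;
}

static void delete_backref(struct backref_table *t, int bytenr)
{
	assert(bytenr >= 0 && bytenr < NR_BLOCKS);
	t->present[bytenr] = false;
}

/* Model of applying a pending backref update: re-point it at new_parent. */
static int update_backref(struct backref_table *t, int bytenr,
			  unsigned long long new_parent)
{
	int ret = lookup_backref(t, bytenr);

	if (ret)
		return ret;	/* the -ENOENT case from the report */
	t->parent[bytenr] = new_parent;
	return 0;
}
```

An update applied before the delete succeeds; once delete_backref() has run, the same update returns -ENOENT and leaves the old parent untouched, which is the condition the BUG_ON(ret) in __btrfs_update_extent_ref was added to catch.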

And I also put in some printks to figure out when exactly this was happening,
and it happens in split_node() when c == root->node, so we do an
insert_new_root.  My first reaction was to put a c = path->nodes[level] after
the insert_new_root, but looking at it, that's just going to give me the same
thing back.  I can't figure out if I'm doing something wrong or if there is
something wonky with the backref stuff, and what is even more worrisome is that
I can't figure out why having alloc_mutex in there kept this problem from
happening before, since the way this happens doesn't have anything to do with
alloc_mutex.  All help is appreciated, even random thinking out loud; hopefully
we can figure out what is going on and I can finish ripping alloc_mutex out.
Thanks,

Josef


Re: Need some help with a backref problem

2008-10-16 Thread Josef Bacik
On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
 Hello,
 
 It's the end of the day here and I haven't figured this out, so hopefully Yan
 you can figure this out and I can come in tomorrow and keep working on taking
 alloc_mutex out :).  What is happening is I'm getting -ENOENT from
 lookup_extent_backref in finish_current_insert() when extent_op->type ==
 PENDING_BACKREF_UPDATE.  The way I have the locking, the only way this can
 happen is if we delete the extent backref completely and then do
 btrfs_update_ref.  I put a lookup_extent_backref in __btrfs_update_extent_ref
 and did a BUG_ON(ret), and it gave me this backtrace:
 
 And I also put in some printks to figure out when exactly this was happening,
 and it happens in split_node() when c == root->node, so we do an
 insert_new_root.  My first reaction was to put a c = path->nodes[level] after
 the insert_new_root, but looking at it, that's just going to give me the same
 thing back.  I can't figure out if I'm doing something wrong or if there is
 something wonky with the backref stuff, and what is even more worrisome is
 that I can't figure out why having alloc_mutex in there kept this problem from
 happening before, since the way this happens doesn't have anything to do with
 alloc_mutex.  All help is appreciated, even random thinking out loud;
 hopefully we can figure out what is going on and I can finish ripping
 alloc_mutex out.
 Thanks,
 Thanks,


Ok, I think I figured it out: we need to have a

c = root->node;

after the insert_new_root, since we will have freed the old extent buffer and
replaced it with a new one.  Does that sound right?  Thanks,

Josef 


Re: Need some help with a backref problem

2008-10-16 Thread Yan Zheng
2008/10/17 Josef Bacik [EMAIL PROTECTED]:
 On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
 Hello,

 It's the end of the day here and I haven't figured this out, so hopefully Yan
 you can figure this out and I can come in tomorrow and keep working on taking
 alloc_mutex out :).  What is happening is I'm getting -ENOENT from
 lookup_extent_backref in finish_current_insert() when extent_op->type ==
 PENDING_BACKREF_UPDATE.  The way I have the locking, the only way this can
 happen is if we delete the extent backref completely and then do
 btrfs_update_ref.  I put a lookup_extent_backref in __btrfs_update_extent_ref
 and did a BUG_ON(ret), and it gave me this backtrace:


I guess there are two or more threads running finish_current_insert at the same
time (a find_first_extent_bit vs. clear_extent_bits race).

 And I also put in some printks to figure out when exactly this was happening,
 and it happens in split_node() when c == root->node, so we do an
 insert_new_root.  My first reaction was to put a c = path->nodes[level] after
 the insert_new_root, but looking at it, that's just going to give me the same
 thing back.  I can't figure out if I'm doing something wrong or if there is
 something wonky with the backref stuff, and what is even more worrisome is
 that I can't figure out why having alloc_mutex in there kept this problem from
 happening before, since the way this happens doesn't have anything to do with
 alloc_mutex.  All help is appreciated, even random thinking out loud;
 hopefully we can figure out what is going on and I can finish ripping
 alloc_mutex out.
 Thanks,


 Ok, I think I figured it out: we need to have a

 c = root->node;

 after the insert_new_root, since we will have freed the old extent buffer and
 replaced it with a new one.  Does that sound right?  Thanks,


I don't think so. If we do this, we will end up splitting the new root.

Regards
Yan Zheng