Re: Need some help with a backref problem
On Fri, Oct 17, 2008 at 09:48:58AM +0800, Yan Zheng wrote:
> 2008/10/17 Josef Bacik [EMAIL PROTECTED]:
> > On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
> > > Hello,
> > >
> > > It's the end of the day here and I haven't figured this out, so hopefully Yan, you can figure this out and I can come in tomorrow and keep working on taking alloc_mutex out :). What is happening is I'm getting -ENOENT from lookup_extent_backref in finish_current_insert() when extent_op->type == PENDING_BACKREF_UPDATE. With the locking I have, the only way this can happen is if we delete the extent backref completely and then do btrfs_update_ref. I put a lookup_extent_backref in __btrfs_update_extent_ref and did a BUG_ON(ret), and it gave me this backtrace:
>
> I guess there are two or more threads running finish_current_insert at the same time (find_first_extent_bit vs. clear_extent_bits race).

This can't happen; it's guarded by a new mutex that is responsible for the extent_ins/del_pending/pinned_extents extent io trees.

> > > [a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
> > > [a034f859] ? insert_ptr+0x176/0x184 [btrfs]
> > > [a0354615] ? split_node+0x54a/0x5b3 [btrfs]
> > > [a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
> > > [8109a1ad] ? check_bytes_and_report+0x37/0xc9
> > > [8109a1ad] ? check_bytes_and_report+0x37/0xc9
> > > [a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
> > > [a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
> > > [a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
> > > [8109a832] ? init_object+0x27/0x6e
> > > [a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
> > > [a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
> > > [a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
> > > [a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
> > > [a037fe51] ? worker_loop+0x42/0x125 [btrfs]
> > > [a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
> > > [81046721] ? kthread+0x47/0x76
> > > [8100cd59] ? child_rip+0xa/0x11
> > > [810466da] ? kthread+0x0/0x76
> > > [8100cd4f] ? child_rip+0x0/0x11
> > >
> > > I also put in some printk's to figure out exactly when this was happening, and it happens in split_node() when c == root->node, so we do an insert_new_root. My first reaction was to put a c = path->nodes[level] after the insert_new_root, but looking at it, that's just going to give me the same thing back. I can't figure out if I'm doing something wrong or if there is something wonky with the backref stuff, and what is even more worrisome is that I can't figure out why having alloc_mutex in there kept this problem from happening before, since the way this happens doesn't have anything to do with alloc_mutex. All help is appreciated, even random thinking out loud; hopefully we can figure out what is going on and I can finish ripping alloc_mutex out.
> > >
> > > Thanks,
> >
> > OK, I think I figured it out: we need to have a c = root->node; after the insert_new_root, since we will have freed the old extent buffer and replaced it with a new one. Does that sound right?
> >
> > Thanks,
>
> I don't think so. If we do this, we will end up splitting the new root.

OK, I think I understand this now, thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
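The exchange above about a find_first_extent_bit vs. clear_extent_bits race can be illustrated with a small userspace model. This is only a sketch under stated assumptions: pending_set, find_first_bit, clear_bit, drain_racy and drain_serialised are hypothetical names, not btrfs functions, and the "two workers" are simulated by an explicit interleaving in a single thread rather than by real threads and a real mutex.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a pending-extent tree: one bit per pending entry. */
typedef struct { uint64_t bits; } pending_set;

/* Index of the lowest set bit, or -1 if nothing is pending. */
static int find_first_bit(const pending_set *s)
{
    for (int i = 0; i < 64; i++)
        if (s->bits & (UINT64_C(1) << i))
            return i;
    return -1;
}

/* Clearing an already-clear bit is a no-op. */
static void clear_bit(pending_set *s, int i)
{
    assert(i >= 0 && i < 64);
    s->bits &= ~(UINT64_C(1) << i);
}

/*
 * Unserialised interleaving: both simulated workers run "find" before
 * either runs "clear", so both pick the same entry and handle it.
 * Returns the number of times an entry was handled.
 */
static int drain_racy(pending_set *s)
{
    int handled = 0;

    for (;;) {
        int a = find_first_bit(s);  /* worker A looks */
        int b = find_first_bit(s);  /* worker B looks before A clears */

        if (a < 0)
            break;
        clear_bit(s, a);
        handled++;                  /* A handles the entry */
        clear_bit(s, b);
        handled++;                  /* B handles the same entry again */
    }
    return handled;
}

/*
 * Serialised: find + clear form one indivisible step, which is what
 * holding a single mutex around both calls would give each worker.
 */
static int drain_serialised(pending_set *s)
{
    int handled = 0;
    int i;

    while ((i = find_first_bit(s)) >= 0) {
        clear_bit(s, i);
        handled++;
    }
    return handled;
}
```

With three pending entries, the racy interleaving handles each entry twice, while the serialised version handles each exactly once. That is the property the new mutex is meant to provide for the extent_ins/del_pending/pinned_extents trees, according to the reply above.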
Re: Need some help with a backref problem
On Fri, Oct 17, 2008 at 09:48:58AM +0800, Yan Zheng wrote:
> 2008/10/17 Josef Bacik [EMAIL PROTECTED]:
> > On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
> > > Hello,
> > >
> > > It's the end of the day here and I haven't figured this out, so hopefully Yan, you can figure this out and I can come in tomorrow and keep working on taking alloc_mutex out :). What is happening is I'm getting -ENOENT from lookup_extent_backref in finish_current_insert() when extent_op->type == PENDING_BACKREF_UPDATE. With the locking I have, the only way this can happen is if we delete the extent backref completely and then do btrfs_update_ref. I put a lookup_extent_backref in __btrfs_update_extent_ref and did a BUG_ON(ret), and it gave me this backtrace:
>
> I guess there are two or more threads running finish_current_insert at the same time (find_first_extent_bit vs. clear_extent_bits race).
>
> > > [a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
> > > [a034f859] ? insert_ptr+0x176/0x184 [btrfs]
> > > [a0354615] ? split_node+0x54a/0x5b3 [btrfs]
> > > [a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
> > > [8109a1ad] ? check_bytes_and_report+0x37/0xc9
> > > [8109a1ad] ? check_bytes_and_report+0x37/0xc9
> > > [a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
> > > [a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
> > > [a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
> > > [8109a832] ? init_object+0x27/0x6e
> > > [a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
> > > [a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
> > > [a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
> > > [a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
> > > [a037fe51] ? worker_loop+0x42/0x125 [btrfs]
> > > [a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
> > > [81046721] ? kthread+0x47/0x76
> > > [8100cd59] ? child_rip+0xa/0x11
> > > [810466da] ? kthread+0x0/0x76
> > > [8100cd4f] ? child_rip+0x0/0x11
> > >
> > > I also put in some printk's to figure out exactly when this was happening, and it happens in split_node() when c == root->node, so we do an insert_new_root. My first reaction was to put a c = path->nodes[level] after the insert_new_root, but looking at it, that's just going to give me the same thing back. I can't figure out if I'm doing something wrong or if there is something wonky with the backref stuff, and what is even more worrisome is that I can't figure out why having alloc_mutex in there kept this problem from happening before, since the way this happens doesn't have anything to do with alloc_mutex. All help is appreciated, even random thinking out loud; hopefully we can figure out what is going on and I can finish ripping alloc_mutex out.
> > >
> > > Thanks,
> >
> > OK, I think I figured it out: we need to have a c = root->node; after the insert_new_root, since we will have freed the old extent buffer and replaced it with a new one. Does that sound right?
> >
> > Thanks,
>
> I don't think so. If we do this, we will end up splitting the new root.

Here is my patch; it's a bit of a mess right now.

Thanks,

Josef

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 9caeb37..4f2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1390,8 +1390,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
     lowest_level = p->lowest_level;
     WARN_ON(lowest_level && ins_len > 0);
     WARN_ON(p->nodes[0] != NULL);
-    WARN_ON(cow && root == root->fs_info->extent_root &&
-        !mutex_is_locked(&root->fs_info->alloc_mutex));
+
     if (ins_len < 0)
         lowest_unlock = 2;
@@ -2051,6 +2050,7 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
     if (c == root->node) {
         /* trying to split the root, lets make a new one */
         ret = insert_new_root(trans, root, path, level + 1);
+        printk(KERN_ERR "splitting the root, %llu\n", c->start);
         if (ret)
             return ret;
     } else {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fad58b9..d1e304f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -516,12 +516,14 @@ struct btrfs_free_space {
     struct rb_node offset_index;
     u64 offset;
     u64 bytes;
+    unsigned long ip;
 };

 struct btrfs_block_group_cache {
     struct btrfs_key key;
     struct btrfs_block_group_item item;
     spinlock_t lock;
+    struct mutex alloc_mutex;
     u64 pinned;
     u64 reserved;
     u64 flags;
@@ -600,6 +602,7 @@ struct btrfs_fs_info {
     struct mutex transaction_kthread_mutex;
     struct mutex cleaner_mutex;
     struct mutex alloc_mutex;
+    struct mutex extent_io_mutex;
     struct mutex chunk_mutex;
     struct mutex drop_mutex;
     struct mutex volume_mutex;
@@ -1879,8 +1882,12 @@ int
Need some help with a backref problem
Hello,

It's the end of the day here and I haven't figured this out, so hopefully Yan, you can figure this out and I can come in tomorrow and keep working on taking alloc_mutex out :). What is happening is I'm getting -ENOENT from lookup_extent_backref in finish_current_insert() when extent_op->type == PENDING_BACKREF_UPDATE. With the locking I have, the only way this can happen is if we delete the extent backref completely and then do btrfs_update_ref. I put a lookup_extent_backref in __btrfs_update_extent_ref and did a BUG_ON(ret), and it gave me this backtrace:

[a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
[a034f859] ? insert_ptr+0x176/0x184 [btrfs]
[a0354615] ? split_node+0x54a/0x5b3 [btrfs]
[a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
[8109a1ad] ? check_bytes_and_report+0x37/0xc9
[8109a1ad] ? check_bytes_and_report+0x37/0xc9
[a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
[a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
[a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
[8109a832] ? init_object+0x27/0x6e
[a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
[a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
[a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
[a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
[a037fe51] ? worker_loop+0x42/0x125 [btrfs]
[a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
[81046721] ? kthread+0x47/0x76
[8100cd59] ? child_rip+0xa/0x11
[810466da] ? kthread+0x0/0x76
[8100cd4f] ? child_rip+0x0/0x11

I also put in some printk's to figure out exactly when this was happening, and it happens in split_node() when c == root->node, so we do an insert_new_root. My first reaction was to put a c = path->nodes[level] after the insert_new_root, but looking at it, that's just going to give me the same thing back. I can't figure out if I'm doing something wrong or if there is something wonky with the backref stuff, and what is even more worrisome is that I can't figure out why having alloc_mutex in there kept this problem from happening before, since the way this happens doesn't have anything to do with alloc_mutex. All help is appreciated, even random thinking out loud; hopefully we can figure out what is going on and I can finish ripping alloc_mutex out.

Thanks,

Josef
Re: Need some help with a backref problem
On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
> Hello,
>
> It's the end of the day here and I haven't figured this out, so hopefully Yan, you can figure this out and I can come in tomorrow and keep working on taking alloc_mutex out :). What is happening is I'm getting -ENOENT from lookup_extent_backref in finish_current_insert() when extent_op->type == PENDING_BACKREF_UPDATE. With the locking I have, the only way this can happen is if we delete the extent backref completely and then do btrfs_update_ref. I put a lookup_extent_backref in __btrfs_update_extent_ref and did a BUG_ON(ret), and it gave me this backtrace:
>
> [a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
> [a034f859] ? insert_ptr+0x176/0x184 [btrfs]
> [a0354615] ? split_node+0x54a/0x5b3 [btrfs]
> [a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
> [8109a1ad] ? check_bytes_and_report+0x37/0xc9
> [8109a1ad] ? check_bytes_and_report+0x37/0xc9
> [a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
> [a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
> [a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
> [8109a832] ? init_object+0x27/0x6e
> [a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
> [a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
> [a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
> [a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
> [a037fe51] ? worker_loop+0x42/0x125 [btrfs]
> [a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
> [81046721] ? kthread+0x47/0x76
> [8100cd59] ? child_rip+0xa/0x11
> [810466da] ? kthread+0x0/0x76
> [8100cd4f] ? child_rip+0x0/0x11
>
> I also put in some printk's to figure out exactly when this was happening, and it happens in split_node() when c == root->node, so we do an insert_new_root. My first reaction was to put a c = path->nodes[level] after the insert_new_root, but looking at it, that's just going to give me the same thing back. I can't figure out if I'm doing something wrong or if there is something wonky with the backref stuff, and what is even more worrisome is that I can't figure out why having alloc_mutex in there kept this problem from happening before, since the way this happens doesn't have anything to do with alloc_mutex. All help is appreciated, even random thinking out loud; hopefully we can figure out what is going on and I can finish ripping alloc_mutex out.
>
> Thanks,

OK, I think I figured it out: we need to have a c = root->node; after the insert_new_root, since we will have freed the old extent buffer and replaced it with a new one. Does that sound right?

Thanks,

Josef
Re: Need some help with a backref problem
2008/10/17 Josef Bacik [EMAIL PROTECTED]:
> On Thu, Oct 16, 2008 at 04:54:12PM -0400, Josef Bacik wrote:
> > Hello,
> >
> > It's the end of the day here and I haven't figured this out, so hopefully Yan, you can figure this out and I can come in tomorrow and keep working on taking alloc_mutex out :). What is happening is I'm getting -ENOENT from lookup_extent_backref in finish_current_insert() when extent_op->type == PENDING_BACKREF_UPDATE. With the locking I have, the only way this can happen is if we delete the extent backref completely and then do btrfs_update_ref. I put a lookup_extent_backref in __btrfs_update_extent_ref and did a BUG_ON(ret), and it gave me this backtrace:

I guess there are two or more threads running finish_current_insert at the same time (find_first_extent_bit vs. clear_extent_bits race).

> > [a035ecac] ? btrfs_update_ref+0x2ce/0x322 [btrfs]
> > [a034f859] ? insert_ptr+0x176/0x184 [btrfs]
> > [a0354615] ? split_node+0x54a/0x5b3 [btrfs]
> > [a03555af] ? btrfs_search_slot+0x4ef/0x7aa [btrfs]
> > [8109a1ad] ? check_bytes_and_report+0x37/0xc9
> > [8109a1ad] ? check_bytes_and_report+0x37/0xc9
> > [a0355da7] ? btrfs_insert_empty_items+0x7d/0x43b [btrfs]
> > [a03561b4] ? btrfs_insert_item+0x4f/0xa4 [btrfs]
> > [a0358ef6] ? finish_current_insert+0xfc/0x2b5 [btrfs]
> > [8109a832] ? init_object+0x27/0x6e
> > [a035a77b] ? __btrfs_alloc_reserved_extent+0x37d/0x3dc [btrfs]
> > [a035aa26] ? btrfs_alloc_reserved_extent+0x2b/0x5b [btrfs]
> > [a036accf] ? btrfs_finish_ordered_io+0x21b/0x344 [btrfs]
> > [a037a782] ? end_bio_extent_writepage+0x9b/0x172 [btrfs]
> > [a037fe51] ? worker_loop+0x42/0x125 [btrfs]
> > [a037fe0f] ? worker_loop+0x0/0x125 [btrfs]
> > [81046721] ? kthread+0x47/0x76
> > [8100cd59] ? child_rip+0xa/0x11
> > [810466da] ? kthread+0x0/0x76
> > [8100cd4f] ? child_rip+0x0/0x11
> >
> > I also put in some printk's to figure out exactly when this was happening, and it happens in split_node() when c == root->node, so we do an insert_new_root. My first reaction was to put a c = path->nodes[level] after the insert_new_root, but looking at it, that's just going to give me the same thing back. I can't figure out if I'm doing something wrong or if there is something wonky with the backref stuff, and what is even more worrisome is that I can't figure out why having alloc_mutex in there kept this problem from happening before, since the way this happens doesn't have anything to do with alloc_mutex. All help is appreciated, even random thinking out loud; hopefully we can figure out what is going on and I can finish ripping alloc_mutex out.
> >
> > Thanks,
>
> OK, I think I figured it out: we need to have a c = root->node; after the insert_new_root, since we will have freed the old extent buffer and replaced it with a new one. Does that sound right?
>
> Thanks,

I don't think so. If we do this, we will end up splitting the new root.

Regards
Yan Zheng
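The two candidate assignments discussed in this thread can be compared on a toy model. This is a minimal sketch, not the btrfs code: toy_node, toy_root, toy_path, toy_insert_new_root, reload_from_path and reload_from_root are hypothetical names, and the model assumes only what the thread states, namely that growing the tree leaves the old root in the path at its own level while root->node starts pointing at the new top node.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 8

/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct toy_node { int level; };
struct toy_root { struct toy_node *node; };
struct toy_path { struct toy_node *nodes[TOY_MAX_LEVEL]; };

/*
 * Model of growing the tree by one level: the new top node is recorded
 * in root->node and in path->nodes[level]. The old root is untouched
 * and stays where it was, in path->nodes[level - 1].
 */
static void toy_insert_new_root(struct toy_root *root, struct toy_path *path,
                                struct toy_node *new_root, int level)
{
    assert(level > 0 && level < TOY_MAX_LEVEL);
    new_root->level = level;
    root->node = new_root;
    path->nodes[level] = new_root;
}

/* First idea from the thread: c = path->nodes[level]. */
static struct toy_node *reload_from_path(const struct toy_path *path, int level)
{
    return path->nodes[level];
}

/* Second idea from the thread: c = root->node. */
static struct toy_node *reload_from_root(const struct toy_root *root)
{
    return root->node;
}
```

In this model, re-reading path->nodes[level] after the insert returns the node that c already pointed at, which matches the remark that it gives "the same thing back". Reading root->node instead returns the freshly created top node, so a split that continued with that value would operate on the new root rather than on the full node that needed splitting.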