Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Nov 05, 2018 at 04:36:34PM +, Filipe Manana wrote: > On Mon, Nov 5, 2018 at 4:34 PM David Sterba wrote: > > On Mon, Nov 05, 2018 at 04:30:35PM +, Filipe Manana wrote: > > > On Mon, Nov 5, 2018 at 4:29 PM David Sterba wrote: > > > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > > > > > Ah ok makes sense. Well in that case lets just make > > > > > > btrfs_read_locked_inode() > > > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > > > > > > > > > if (path != in_path) > > > > > > > > > > You mean the following on top of v4: > > > > > > > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > > > > > > > > > Not much different, just saves one such if statement. I'm ok with > > > > > that. > > > > > > > > Now in misc-next with v4 and the friendpaste incremental as > > > > > > > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69 > > > > > > Please don't add the incremental. It's buggy. It was meant to figure > > > out what Josef was saying. That's why I haven't sent a V5. > > > > Ok dropped, I'll will wait for a proper patch. > > It's V4, the last sent version. Just forget the incremental. > Thanks. For the record, V4 has been merged to master in 4.20-rc2.
Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Nov 5, 2018 at 4:34 PM David Sterba wrote: > > On Mon, Nov 05, 2018 at 04:30:35PM +, Filipe Manana wrote: > > On Mon, Nov 5, 2018 at 4:29 PM David Sterba wrote: > > > > > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > > > > Ah ok makes sense. Well in that case lets just make > > > > > btrfs_read_locked_inode() > > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > > > > > > > if (path != in_path) > > > > > > > > You mean the following on top of v4: > > > > > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > > > > > > > Not much different, just saves one such if statement. I'm ok with that. > > > > > > Now in misc-next with v4 and the friendpaste incremental as > > > > > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69 > > > > Please don't add the incremental. It's buggy. It was meant to figure > > out what Josef was saying. That's why I haven't sent a V5. > > Ok dropped, I'll will wait for a proper patch. It's V4, the last sent version. Just forget the incremental. Thanks.
Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Nov 05, 2018 at 04:30:35PM +, Filipe Manana wrote: > On Mon, Nov 5, 2018 at 4:29 PM David Sterba wrote: > > > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > > > Ah ok makes sense. Well in that case lets just make > > > > btrfs_read_locked_inode() > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > > > > > if (path != in_path) > > > > > > You mean the following on top of v4: > > > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > > > > > Not much different, just saves one such if statement. I'm ok with that. > > > > Now in misc-next with v4 and the friendpaste incremental as > > > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69 > > Please don't add the incremental. It's buggy. It was meant to figure > out what Josef was saying. That's why I haven't sent a V5. Ok dropped, I'll will wait for a proper patch.
Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Nov 5, 2018 at 4:29 PM David Sterba wrote: > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > > Ah ok makes sense. Well in that case lets just make > > > btrfs_read_locked_inode() > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > > > if (path != in_path) > > > > You mean the following on top of v4: > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > > > Not much different, just saves one such if statement. I'm ok with that. > > Now in misc-next with v4 and the friendpaste incremental as > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69 Please don't add the incremental. It's buggy. It was meant to figure out what Josef was saying. That's why I haven't sent a V5.
Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > > Ah ok makes sense. Well in that case lets just make > > btrfs_read_locked_inode() > > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > > > if (path != in_path) > > You mean the following on top of v4: > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg > > Not much different, just saves one such if statement. I'm ok with that. Now in misc-next with v4 and the friendpaste incremental as https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote: > On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik wrote: > > > > On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote: > > > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik wrote: > > > > > > > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote: > > > > > From: Filipe Manana > > > > > > > > > > When we are writing out a free space cache, during the transaction > > > > > commit > > > > > phase, we can end up in a deadlock which results in a stack trace > > > > > like the > > > > > following: > > > > > > > > > > schedule+0x28/0x80 > > > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > > > > ? finish_wait+0x80/0x80 > > > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > > > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > > > > ? inode_insert5+0x119/0x190 > > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > > btrfs_iget+0x113/0x690 [btrfs] > > > > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > > > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > > > > load_free_space_cache+0x7c/0x170 [btrfs] > > > > > ? cache_block_group+0x72/0x3b0 [btrfs] > > > > > cache_block_group+0x1b3/0x3b0 [btrfs] > > > > > ? finish_wait+0x80/0x80 > > > > > find_free_extent+0x799/0x1010 [btrfs] > > > > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > > > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > > > > btrfs_cow_block+0xdc/0x180 [btrfs] > > > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > > > > cache_save_setup+0xe4/0x3a0 [btrfs] > > > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > > > > > > > At cache_save_setup() we need to update the inode item of a block > > > > > group's > > > > > cache which is located in the tree root (fs_info->tree_root), which > > > > > means > > > > > that it may result in COWing a leaf from that tree. If that happens we > > > > > need to find a free metadata extent and while looking for one, if we > > > > > find > > > > > a block group which was not cached yet we attempt to load its cache by > > > > > calling cache_block_group(). However this function will try to load > > > > > the > > > > > inode of the free space cache, which requires finding the matching > > > > > inode > > > > > item in the tree root - if that inode item is located in the same > > > > > leaf as > > > > > the inode item of the space cache we are updating at > > > > > cache_save_setup(), > > > > > we end up in a deadlock, since we try to obtain a read lock on the > > > > > same > > > > > extent buffer that we previously write locked. > > > > > > > > > > So fix this by using the tree root's commit root when searching for a > > > > > block group's free space cache inode item when we are attempting to > > > > > load > > > > > a free space cache. This is safe since block groups once loaded stay > > > > > in > > > > > memory forever, as well as their caches, so after they are first > > > > > loaded > > > > > we will never need to read their inode items again. For new block > > > > > groups, > > > > > once they are created they get their ->cached field set to > > > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode > > > > > item. > > > > > > > > > > Reported-by: Andrew Nelson > > > > > Link: > > > > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > > > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > >
Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik wrote: > > On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote: > > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik wrote: > > > > > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote: > > > > From: Filipe Manana > > > > > > > > When we are writing out a free space cache, during the transaction > > > > commit > > > > phase, we can end up in a deadlock which results in a stack trace like > > > > the > > > > following: > > > > > > > > schedule+0x28/0x80 > > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > > > ? finish_wait+0x80/0x80 > > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > > > ? inode_insert5+0x119/0x190 > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > btrfs_iget+0x113/0x690 [btrfs] > > > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > > > load_free_space_cache+0x7c/0x170 [btrfs] > > > > ? cache_block_group+0x72/0x3b0 [btrfs] > > > > cache_block_group+0x1b3/0x3b0 [btrfs] > > > > ? finish_wait+0x80/0x80 > > > > find_free_extent+0x799/0x1010 [btrfs] > > > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > > > btrfs_cow_block+0xdc/0x180 [btrfs] > > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > > > cache_save_setup+0xe4/0x3a0 [btrfs] > > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > > > > > At cache_save_setup() we need to update the inode item of a block > > > > group's > > > > cache which is located in the tree root (fs_info->tree_root), which > > > > means > > > > that it may result in COWing a leaf from that tree. If that happens we > > > > need to find a free metadata extent and while looking for one, if we > > > > find > > > > a block group which was not cached yet we attempt to load its cache by > > > > calling cache_block_group(). However this function will try to load the > > > > inode of the free space cache, which requires finding the matching inode > > > > item in the tree root - if that inode item is located in the same leaf > > > > as > > > > the inode item of the space cache we are updating at cache_save_setup(), > > > > we end up in a deadlock, since we try to obtain a read lock on the same > > > > extent buffer that we previously write locked. > > > > > > > > So fix this by using the tree root's commit root when searching for a > > > > block group's free space cache inode item when we are attempting to load > > > > a free space cache. This is safe since block groups once loaded stay in > > > > memory forever, as well as their caches, so after they are first loaded > > > > we will never need to read their inode items again. For new block > > > > groups, > > > > once they are created they get their ->cached field set to > > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. > > > > > > > > Reported-by: Andrew Nelson > > > > Link: > > > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > > > Tested-by: Andrew Nelson > > > > Signed-off-by: Filipe Manana > > > > --- > > > > > > > > > > Now my goal is to see how many times I can get you to redo this thing. > > > > > > Why not instead just do > > > > > > if (btrfs_is_free_space_inode(inode)) > > > path->search_commit_root = 1; > > > > > > in read_locked_inode? That would be cleaner. If we don't want to do > > > that for > > > the inode cache (I'm not sure if it's ok or not) we could just do > > > > > > if (root == fs_info->tree_root) > > > > We can't (not just that at least). > > Tried something like that, but we get into a BUG_ON when writing out > > the space cache for new block groups (created in the current > > transaction). > > Because at cache_save_setup() we have this: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342 > > > > Lookup for the inode in normal root, doesn't exist, create it then > > repeat - if still not found, BUG_ON. > > Could also make create_free_space_inode() return an inode pointer and > > make it call btrfs_iget(). > > > > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() > take a path, and allocate it in btrfs_iget, that'll remove the ugly > > if (path != in_path) You mean the following on top of v4: https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg Not much different, just saves one such if statement. I'm ok with that. > > stuff. Thanks, > > Josef
Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote: > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik wrote: > > > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote: > > > From: Filipe Manana > > > > > > When we are writing out a free space cache, during the transaction commit > > > phase, we can end up in a deadlock which results in a stack trace like the > > > following: > > > > > > schedule+0x28/0x80 > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > > ? finish_wait+0x80/0x80 > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > > ? inode_insert5+0x119/0x190 > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > ? kmem_cache_alloc+0x166/0x1d0 > > > btrfs_iget+0x113/0x690 [btrfs] > > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > > load_free_space_cache+0x7c/0x170 [btrfs] > > > ? cache_block_group+0x72/0x3b0 [btrfs] > > > cache_block_group+0x1b3/0x3b0 [btrfs] > > > ? finish_wait+0x80/0x80 > > > find_free_extent+0x799/0x1010 [btrfs] > > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > > btrfs_cow_block+0xdc/0x180 [btrfs] > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > ? kmem_cache_alloc+0x166/0x1d0 > > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > > cache_save_setup+0xe4/0x3a0 [btrfs] > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > > > At cache_save_setup() we need to update the inode item of a block group's > > > cache which is located in the tree root (fs_info->tree_root), which means > > > that it may result in COWing a leaf from that tree. If that happens we > > > need to find a free metadata extent and while looking for one, if we find > > > a block group which was not cached yet we attempt to load its cache by > > > calling cache_block_group(). However this function will try to load the > > > inode of the free space cache, which requires finding the matching inode > > > item in the tree root - if that inode item is located in the same leaf as > > > the inode item of the space cache we are updating at cache_save_setup(), > > > we end up in a deadlock, since we try to obtain a read lock on the same > > > extent buffer that we previously write locked. > > > > > > So fix this by using the tree root's commit root when searching for a > > > block group's free space cache inode item when we are attempting to load > > > a free space cache. This is safe since block groups once loaded stay in > > > memory forever, as well as their caches, so after they are first loaded > > > we will never need to read their inode items again. For new block groups, > > > once they are created they get their ->cached field set to > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. > > > > > > Reported-by: Andrew Nelson > > > Link: > > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > > Tested-by: Andrew Nelson > > > Signed-off-by: Filipe Manana > > > --- > > > > > > > Now my goal is to see how many times I can get you to redo this thing. > > > > Why not instead just do > > > > if (btrfs_is_free_space_inode(inode)) > > path->search_commit_root = 1; > > > > in read_locked_inode? That would be cleaner. If we don't want to do that > > for > > the inode cache (I'm not sure if it's ok or not) we could just do > > > > if (root == fs_info->tree_root) > > We can't (not just that at least). > Tried something like that, but we get into a BUG_ON when writing out > the space cache for new block groups (created in the current > transaction). > Because at cache_save_setup() we have this: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342 > > Lookup for the inode in normal root, doesn't exist, create it then > repeat - if still not found, BUG_ON. > Could also make create_free_space_inode() return an inode pointer and > make it call btrfs_iget(). > Ah ok makes sense. Well in that case lets just make btrfs_read_locked_inode() take a path, and allocate it in btrfs_iget, that'll remove the ugly if (path != in_path) stuff. Thanks, Josef
Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik wrote: > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote: > > From: Filipe Manana > > > > When we are writing out a free space cache, during the transaction commit > > phase, we can end up in a deadlock which results in a stack trace like the > > following: > > > > schedule+0x28/0x80 > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > ? finish_wait+0x80/0x80 > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > ? inode_insert5+0x119/0x190 > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_iget+0x113/0x690 [btrfs] > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > load_free_space_cache+0x7c/0x170 [btrfs] > > ? cache_block_group+0x72/0x3b0 [btrfs] > > cache_block_group+0x1b3/0x3b0 [btrfs] > > ? finish_wait+0x80/0x80 > > find_free_extent+0x799/0x1010 [btrfs] > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > btrfs_cow_block+0xdc/0x180 [btrfs] > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > cache_save_setup+0xe4/0x3a0 [btrfs] > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > At cache_save_setup() we need to update the inode item of a block group's > > cache which is located in the tree root (fs_info->tree_root), which means > > that it may result in COWing a leaf from that tree. If that happens we > > need to find a free metadata extent and while looking for one, if we find > > a block group which was not cached yet we attempt to load its cache by > > calling cache_block_group(). However this function will try to load the > > inode of the free space cache, which requires finding the matching inode > > item in the tree root - if that inode item is located in the same leaf as > > the inode item of the space cache we are updating at cache_save_setup(), > > we end up in a deadlock, since we try to obtain a read lock on the same > > extent buffer that we previously write locked. > > > > So fix this by using the tree root's commit root when searching for a > > block group's free space cache inode item when we are attempting to load > > a free space cache. This is safe since block groups once loaded stay in > > memory forever, as well as their caches, so after they are first loaded > > we will never need to read their inode items again. For new block groups, > > once they are created they get their ->cached field set to > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. > > > > Reported-by: Andrew Nelson > > Link: > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > Tested-by: Andrew Nelson > > Signed-off-by: Filipe Manana > > --- > > > > Now my goal is to see how many times I can get you to redo this thing. > > Why not instead just do > > if (btrfs_is_free_space_inode(inode)) > path->search_commit_root = 1; > > in read_locked_inode? That would be cleaner. If we don't want to do that for > the inode cache (I'm not sure if it's ok or not) we could just do > > if (root == fs_info->tree_root) We can't (not just that at least). Tried something like that, but we get into a BUG_ON when writing out the space cache for new block groups (created in the current transaction). Because at cache_save_setup() we have this: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342 Lookup for the inode in normal root, doesn't exist, create it then repeat - if still not found, BUG_ON. Could also make create_free_space_inode() return an inode pointer and make it call btrfs_iget(). > > instead. Thanks, > > Josef
Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > When we are writing out a free space cache, during the transaction commit > phase, we can end up in a deadlock which results in a stack trace like the > following: > > schedule+0x28/0x80 > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > ? finish_wait+0x80/0x80 > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > btrfs_search_slot+0xf6/0x9f0 [btrfs] > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > ? inode_insert5+0x119/0x190 > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_iget+0x113/0x690 [btrfs] > __lookup_free_space_inode+0xd8/0x150 [btrfs] > lookup_free_space_inode+0x5b/0xb0 [btrfs] > load_free_space_cache+0x7c/0x170 [btrfs] > ? cache_block_group+0x72/0x3b0 [btrfs] > cache_block_group+0x1b3/0x3b0 [btrfs] > ? finish_wait+0x80/0x80 > find_free_extent+0x799/0x1010 [btrfs] > btrfs_reserve_extent+0x9b/0x180 [btrfs] > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > __btrfs_cow_block+0x11d/0x500 [btrfs] > btrfs_cow_block+0xdc/0x180 [btrfs] > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_update_inode_item+0x46/0x100 [btrfs] > cache_save_setup+0xe4/0x3a0 [btrfs] > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > At cache_save_setup() we need to update the inode item of a block group's > cache which is located in the tree root (fs_info->tree_root), which means > that it may result in COWing a leaf from that tree. If that happens we > need to find a free metadata extent and while looking for one, if we find > a block group which was not cached yet we attempt to load its cache by > calling cache_block_group(). However this function will try to load the > inode of the free space cache, which requires finding the matching inode > item in the tree root - if that inode item is located in the same leaf as > the inode item of the space cache we are updating at cache_save_setup(), > we end up in a deadlock, since we try to obtain a read lock on the same > extent buffer that we previously write locked. > > So fix this by using the tree root's commit root when searching for a > block group's free space cache inode item when we are attempting to load > a free space cache. This is safe since block groups once loaded stay in > memory forever, as well as their caches, so after they are first loaded > we will never need to read their inode items again. For new block groups, > once they are created they get their ->cached field set to > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. > > Reported-by: Andrew Nelson > Link: > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > Tested-by: Andrew Nelson > Signed-off-by: Filipe Manana > --- > Now my goal is to see how many times I can get you to redo this thing. Why not instead just do if (btrfs_is_free_space_inode(inode)) path->search_commit_root = 1; in read_locked_inode? That would be cleaner. If we don't want to do that for the inode cache (I'm not sure if it's ok or not) we could just do if (root == fs_info->tree_root) instead. Thanks, Josef
[PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent
From: Filipe Manana When we are writing out a free space cache, during the transaction commit phase, we can end up in a deadlock which results in a stack trace like the following: schedule+0x28/0x80 btrfs_tree_read_lock+0x8e/0x120 [btrfs] ? finish_wait+0x80/0x80 btrfs_read_lock_root_node+0x2f/0x40 [btrfs] btrfs_search_slot+0xf6/0x9f0 [btrfs] ? evict_refill_and_join+0xd0/0xd0 [btrfs] ? inode_insert5+0x119/0x190 btrfs_lookup_inode+0x3a/0xc0 [btrfs] ? kmem_cache_alloc+0x166/0x1d0 btrfs_iget+0x113/0x690 [btrfs] __lookup_free_space_inode+0xd8/0x150 [btrfs] lookup_free_space_inode+0x5b/0xb0 [btrfs] load_free_space_cache+0x7c/0x170 [btrfs] ? cache_block_group+0x72/0x3b0 [btrfs] cache_block_group+0x1b3/0x3b0 [btrfs] ? finish_wait+0x80/0x80 find_free_extent+0x799/0x1010 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] __btrfs_cow_block+0x11d/0x500 [btrfs] btrfs_cow_block+0xdc/0x180 [btrfs] btrfs_search_slot+0x3bd/0x9f0 [btrfs] btrfs_lookup_inode+0x3a/0xc0 [btrfs] ? kmem_cache_alloc+0x166/0x1d0 btrfs_update_inode_item+0x46/0x100 [btrfs] cache_save_setup+0xe4/0x3a0 [btrfs] btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] btrfs_commit_transaction+0xcb/0x8b0 [btrfs] At cache_save_setup() we need to update the inode item of a block group's cache which is located in the tree root (fs_info->tree_root), which means that it may result in COWing a leaf from that tree. If that happens we need to find a free metadata extent and while looking for one, if we find a block group which was not cached yet we attempt to load its cache by calling cache_block_group(). However this function will try to load the inode of the free space cache, which requires finding the matching inode item in the tree root - if that inode item is located in the same leaf as the inode item of the space cache we are updating at cache_save_setup(), we end up in a deadlock, since we try to obtain a read lock on the same extent buffer that we previously write locked. So fix this by using the tree root's commit root when searching for a block group's free space cache inode item when we are attempting to load a free space cache. This is safe since block groups once loaded stay in memory forever, as well as their caches, so after they are first loaded we will never need to read their inode items again. For new block groups, once they are created they get their ->cached field set to BTRFS_CACHE_FINISHED meaning we will not need to read their inode item. Reported-by: Andrew Nelson Link: https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") Tested-by: Andrew Nelson Signed-off-by: Filipe Manana --- V2: Made the solution more generic, since the problem could happen in any path COWing an extent buffer from the root tree. Applies on top of a previous patch titled: "Btrfs: fix deadlock when writing out free space caches" V3: Made it more simple by avoiding the atomic from V2 and pass the root to find_free_extent(). V4: Changed the whole approach so that we lookup for free space cache inode items using the commit root instead. The previous approach was causing some transactions to be aborted with -ENOSPC during umount because sometimes we were skipping cache loading of all metadata block groups. fs/btrfs/ctree.h| 3 +++ fs/btrfs/free-space-cache.c | 22 +- fs/btrfs/inode.c| 32 ++-- 3 files changed, 46 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2cddfe7806a4..2b34b2a05ad6 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3177,6 +3177,9 @@ void btrfs_destroy_inode(struct inode *inode); int btrfs_drop_inode(struct inode *inode); int __init btrfs_init_cachep(void); void __cold btrfs_destroy_cachep(void); +struct inode *btrfs_iget_path(struct super_block *s, struct btrfs_key *location, + struct btrfs_root *root, int *new, + struct btrfs_path *path); struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location, struct btrfs_root *root, int *was_new); struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 65b79500e09f..7265f35324f6 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -68,7 +68,8 @@ static struct inode *__lookup_free_space_inode(struct btrfs_root *root, btrfs_disk_key_to_cpu(, _key); btrfs_release_path(path); - inode = btrfs_iget(fs_info->sb, , root, NULL); + inode = btrfs_iget_path(fs_info->sb, , root, NULL, path); + btrfs_release_path(path); if (IS_ERR(inode)) return inode; @@ -830,6 +831,25 @@
Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
On Wed, Oct 24, 2018 at 5:08 AM Josef Bacik wrote: > > On Mon, Oct 22, 2018 at 11:05:08PM +0100, Filipe Manana wrote: > > On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik wrote: > > > > > > On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdman...@kernel.org wrote: > > > > From: Filipe Manana > > > > > > > > When we are writing out a free space cache, during the transaction > > > > commit > > > > phase, we can end up in a deadlock which results in a stack trace like > > > > the > > > > following: > > > > > > > > schedule+0x28/0x80 > > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > > > ? finish_wait+0x80/0x80 > > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > > > ? inode_insert5+0x119/0x190 > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > btrfs_iget+0x113/0x690 [btrfs] > > > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > > > load_free_space_cache+0x7c/0x170 [btrfs] > > > > ? cache_block_group+0x72/0x3b0 [btrfs] > > > > cache_block_group+0x1b3/0x3b0 [btrfs] > > > > ? finish_wait+0x80/0x80 > > > > find_free_extent+0x799/0x1010 [btrfs] > > > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > > > btrfs_cow_block+0xdc/0x180 [btrfs] > > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > > ? kmem_cache_alloc+0x166/0x1d0 > > > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > > > cache_save_setup+0xe4/0x3a0 [btrfs] > > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > > > > > At cache_save_setup() we need to update the inode item of a block > > > > group's > > > > cache which is located in the tree root (fs_info->tree_root), which > > > > means > > > > that it may result in COWing a leaf from that tree. If that happens we > > > > need to find a free metadata extent and while looking for one, if we > > > > find > > > > a block group which was not cached yet we attempt to load its cache by > > > > calling cache_block_group(). However this function will try to load the > > > > inode of the free space cache, which requires finding the matching inode > > > > item in the tree root - if that inode item is located in the same leaf > > > > as > > > > the inode item of the space cache we are updating at cache_save_setup(), > > > > we end up in a deadlock, since we try to obtain a read lock on the same > > > > extent buffer that we previously write locked. > > > > > > > > So fix this by skipping the loading of free space caches of any block > > > > groups that are not yet cached (rare cases) if we are COWing an extent > > > > buffer from the root tree and space caching is enabled (-o space_cache > > > > mount option). This is a rare case and its downside is failure to > > > > find a free extent (return -ENOSPC) when all the already cached block > > > > groups have no free extents. > > > > > > > > Reported-by: Andrew Nelson > > > > Link: > > > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > > > Tested-by: Andrew Nelson > > > > Signed-off-by: Filipe Manana > > > > > > Great, thanks, > > > > > > Reviewed-by: Josef Bacik > > > > So this makes many fstests occasionally fail with aborted transaction > > due to ENOSPC. > > It's late and I haven't verified yet, but I suppose this is because we > > always skip loading the cache regardless of currently being COWing an > > existing leaf or allocating a new one (growing the tree). > > Needs to be fixed. > > > > How about we just use path->search_commit_root? If we're loading the cache we > just want the last committed version, it's not like we read it after we've > written it. Then we can go back to business as usual. Thanks, Yeah, that works. It was an idea before sending v1 but it felt a bit dirty at the time. Left fstests running overnight using the commit root approach and everything seems fine. Sending a v4. Another alternative, which would solve similar problems for any tree, would be to allow getting a read lock on an already (spin) write locked eb, just like we do for blocking write locks: https://friendpaste.com/6XrGXb5p0RSJGixUFZ8lCt thanks > > Josef
Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Oct 22, 2018 at 11:05:08PM +0100, Filipe Manana wrote: > On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik wrote: > > > > On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdman...@kernel.org wrote: > > > From: Filipe Manana > > > > > > When we are writing out a free space cache, during the transaction commit > > > phase, we can end up in a deadlock which results in a stack trace like the > > > following: > > > > > > schedule+0x28/0x80 > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > > ? finish_wait+0x80/0x80 > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > > ? inode_insert5+0x119/0x190 > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > ? kmem_cache_alloc+0x166/0x1d0 > > > btrfs_iget+0x113/0x690 [btrfs] > > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > > load_free_space_cache+0x7c/0x170 [btrfs] > > > ? cache_block_group+0x72/0x3b0 [btrfs] > > > cache_block_group+0x1b3/0x3b0 [btrfs] > > > ? finish_wait+0x80/0x80 > > > find_free_extent+0x799/0x1010 [btrfs] > > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > > btrfs_cow_block+0xdc/0x180 [btrfs] > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > ? kmem_cache_alloc+0x166/0x1d0 > > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > > cache_save_setup+0xe4/0x3a0 [btrfs] > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > > > At cache_save_setup() we need to update the inode item of a block group's > > > cache which is located in the tree root (fs_info->tree_root), which means > > > that it may result in COWing a leaf from that tree. If that happens we > > > need to find a free metadata extent and while looking for one, if we find > > > a block group which was not cached yet we attempt to load its cache by > > > calling cache_block_group(). However this function will try to load the > > > inode of the free space cache, which requires finding the matching inode > > > item in the tree root - if that inode item is located in the same leaf as > > > the inode item of the space cache we are updating at cache_save_setup(), > > > we end up in a deadlock, since we try to obtain a read lock on the same > > > extent buffer that we previously write locked. > > > > > > So fix this by skipping the loading of free space caches of any block > > > groups that are not yet cached (rare cases) if we are COWing an extent > > > buffer from the root tree and space caching is enabled (-o space_cache > > > mount option). This is a rare case and its downside is failure to > > > find a free extent (return -ENOSPC) when all the already cached block > > > groups have no free extents. > > > > > > Reported-by: Andrew Nelson > > > Link: > > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > > Tested-by: Andrew Nelson > > > Signed-off-by: Filipe Manana > > > > Great, thanks, > > > > Reviewed-by: Josef Bacik > > So this makes many fstests occasionally fail with aborted transaction > due to ENOSPC. > It's late and I haven't verified yet, but I suppose this is because we > always skip loading the cache regardless of currently being COWing an > existing leaf or allocating a new one (growing the tree). > Needs to be fixed. > How about we just use path->search_commit_root? If we're loading the cache we just want the last committed version, it's not like we read it after we've written it. Then we can go back to business as usual. Thanks, Josef
Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik wrote: > > On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdman...@kernel.org wrote: > > From: Filipe Manana > > > > When we are writing out a free space cache, during the transaction commit > > phase, we can end up in a deadlock which results in a stack trace like the > > following: > > > > schedule+0x28/0x80 > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > ? finish_wait+0x80/0x80 > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > ? inode_insert5+0x119/0x190 > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_iget+0x113/0x690 [btrfs] > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > load_free_space_cache+0x7c/0x170 [btrfs] > > ? cache_block_group+0x72/0x3b0 [btrfs] > > cache_block_group+0x1b3/0x3b0 [btrfs] > > ? finish_wait+0x80/0x80 > > find_free_extent+0x799/0x1010 [btrfs] > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > btrfs_cow_block+0xdc/0x180 [btrfs] > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > cache_save_setup+0xe4/0x3a0 [btrfs] > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > At cache_save_setup() we need to update the inode item of a block group's > > cache which is located in the tree root (fs_info->tree_root), which means > > that it may result in COWing a leaf from that tree. If that happens we > > need to find a free metadata extent and while looking for one, if we find > > a block group which was not cached yet we attempt to load its cache by > > calling cache_block_group(). However this function will try to load the > > inode of the free space cache, which requires finding the matching inode > > item in the tree root - if that inode item is located in the same leaf as > > the inode item of the space cache we are updating at cache_save_setup(), > > we end up in a deadlock, since we try to obtain a read lock on the same > > extent buffer that we previously write locked. > > > > So fix this by skipping the loading of free space caches of any block > > groups that are not yet cached (rare cases) if we are COWing an extent > > buffer from the root tree and space caching is enabled (-o space_cache > > mount option). This is a rare case and its downside is failure to > > find a free extent (return -ENOSPC) when all the already cached block > > groups have no free extents. > > > > Reported-by: Andrew Nelson > > Link: > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > Tested-by: Andrew Nelson > > Signed-off-by: Filipe Manana > > Great, thanks, > > Reviewed-by: Josef Bacik So this makes many fstests occasionally fail with aborted transaction due to ENOSPC. It's late and I haven't verified yet, but I suppose this is because we always skip loading the cache regardless of currently being COWing an existing leaf or allocating a new one (growing the tree). Needs to be fixed. > > Josef
Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > When we are writing out a free space cache, during the transaction commit > phase, we can end up in a deadlock which results in a stack trace like the > following: > > schedule+0x28/0x80 > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > ? finish_wait+0x80/0x80 > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > btrfs_search_slot+0xf6/0x9f0 [btrfs] > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > ? inode_insert5+0x119/0x190 > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_iget+0x113/0x690 [btrfs] > __lookup_free_space_inode+0xd8/0x150 [btrfs] > lookup_free_space_inode+0x5b/0xb0 [btrfs] > load_free_space_cache+0x7c/0x170 [btrfs] > ? cache_block_group+0x72/0x3b0 [btrfs] > cache_block_group+0x1b3/0x3b0 [btrfs] > ? finish_wait+0x80/0x80 > find_free_extent+0x799/0x1010 [btrfs] > btrfs_reserve_extent+0x9b/0x180 [btrfs] > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > __btrfs_cow_block+0x11d/0x500 [btrfs] > btrfs_cow_block+0xdc/0x180 [btrfs] > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_update_inode_item+0x46/0x100 [btrfs] > cache_save_setup+0xe4/0x3a0 [btrfs] > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > At cache_save_setup() we need to update the inode item of a block group's > cache which is located in the tree root (fs_info->tree_root), which means > that it may result in COWing a leaf from that tree. If that happens we > need to find a free metadata extent and while looking for one, if we find > a block group which was not cached yet we attempt to load its cache by > calling cache_block_group(). However this function will try to load the > inode of the free space cache, which requires finding the matching inode > item in the tree root - if that inode item is located in the same leaf as > the inode item of the space cache we are updating at cache_save_setup(), > we end up in a deadlock, since we try to obtain a read lock on the same > extent buffer that we previously write locked. > > So fix this by skipping the loading of free space caches of any block > groups that are not yet cached (rare cases) if we are COWing an extent > buffer from the root tree and space caching is enabled (-o space_cache > mount option). This is a rare case and its downside is failure to > find a free extent (return -ENOSPC) when all the already cached block > groups have no free extents. > > Reported-by: Andrew Nelson > Link: > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > Tested-by: Andrew Nelson > Signed-off-by: Filipe Manana Great, thanks, Reviewed-by: Josef Bacik Josef
[PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent
From: Filipe Manana When we are writing out a free space cache, during the transaction commit phase, we can end up in a deadlock which results in a stack trace like the following: schedule+0x28/0x80 btrfs_tree_read_lock+0x8e/0x120 [btrfs] ? finish_wait+0x80/0x80 btrfs_read_lock_root_node+0x2f/0x40 [btrfs] btrfs_search_slot+0xf6/0x9f0 [btrfs] ? evict_refill_and_join+0xd0/0xd0 [btrfs] ? inode_insert5+0x119/0x190 btrfs_lookup_inode+0x3a/0xc0 [btrfs] ? kmem_cache_alloc+0x166/0x1d0 btrfs_iget+0x113/0x690 [btrfs] __lookup_free_space_inode+0xd8/0x150 [btrfs] lookup_free_space_inode+0x5b/0xb0 [btrfs] load_free_space_cache+0x7c/0x170 [btrfs] ? cache_block_group+0x72/0x3b0 [btrfs] cache_block_group+0x1b3/0x3b0 [btrfs] ? finish_wait+0x80/0x80 find_free_extent+0x799/0x1010 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] __btrfs_cow_block+0x11d/0x500 [btrfs] btrfs_cow_block+0xdc/0x180 [btrfs] btrfs_search_slot+0x3bd/0x9f0 [btrfs] btrfs_lookup_inode+0x3a/0xc0 [btrfs] ? kmem_cache_alloc+0x166/0x1d0 btrfs_update_inode_item+0x46/0x100 [btrfs] cache_save_setup+0xe4/0x3a0 [btrfs] btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] btrfs_commit_transaction+0xcb/0x8b0 [btrfs] At cache_save_setup() we need to update the inode item of a block group's cache which is located in the tree root (fs_info->tree_root), which means that it may result in COWing a leaf from that tree. If that happens we need to find a free metadata extent and while looking for one, if we find a block group which was not cached yet we attempt to load its cache by calling cache_block_group(). However this function will try to load the inode of the free space cache, which requires finding the matching inode item in the tree root - if that inode item is located in the same leaf as the inode item of the space cache we are updating at cache_save_setup(), we end up in a deadlock, since we try to obtain a read lock on the same extent buffer that we previously write locked. So fix this by skipping the loading of free space caches of any block groups that are not yet cached (rare cases) if we are COWing an extent buffer from the root tree and space caching is enabled (-o space_cache mount option). This is a rare case and its downside is failure to find a free extent (return -ENOSPC) when all the already cached block groups have no free extents. Reported-by: Andrew Nelson Link: https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") Tested-by: Andrew Nelson Signed-off-by: Filipe Manana --- V2: Made the solution more generic, since the problem could happen in any path COWing an extent buffer from the root tree. Applies on top of a previous patch titled: "Btrfs: fix deadlock when writing out free space caches" V3: Made it more simple by avoiding the atomic from V2 and pass the root to find_free_extent(). fs/btrfs/extent-tree.c | 22 ++ 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 577878324799..e5fd086799ab 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7218,12 +7218,13 @@ btrfs_release_block_group(struct btrfs_block_group_cache *cache, * the free space extent currently. */ static noinline int find_free_extent(struct btrfs_fs_info *fs_info, + struct btrfs_root *root, u64 ram_bytes, u64 num_bytes, u64 empty_size, u64 hint_byte, struct btrfs_key *ins, u64 flags, int delalloc) { int ret = 0; - struct btrfs_root *root = fs_info->extent_root; + struct btrfs_root *extent_root = fs_info->extent_root; struct btrfs_free_cluster *last_ptr = NULL; struct btrfs_block_group_cache *block_group = NULL; u64 search_start = 0; @@ -7366,7 +7367,20 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info, have_block_group: cached = block_group_cache_done(block_group); - if (unlikely(!cached)) { + /* +* If we are COWing a leaf/node from the root tree, we can not +* start caching of a block group because we could deadlock on +* an extent buffer of the root tree. +* Because if we are COWing a leaf from the root tree, we are +* holding a write lock on the respective extent buffer, and +* loading the space cache of a block group requires searching +* for its inode item in the root tree, which can be located +* in the same leaf that we previously write locked, in which +* case we will deadlock. +*/ + if (unlikely(!cached) &&
Re: [PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Oct 22, 2018 at 7:56 PM Josef Bacik wrote: > > On Mon, Oct 22, 2018 at 07:48:30PM +0100, fdman...@kernel.org wrote: > > From: Filipe Manana > > > > When we are writing out a free space cache, during the transaction commit > > phase, we can end up in a deadlock which results in a stack trace like the > > following: > > > > schedule+0x28/0x80 > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > ? finish_wait+0x80/0x80 > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > ? inode_insert5+0x119/0x190 > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_iget+0x113/0x690 [btrfs] > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > load_free_space_cache+0x7c/0x170 [btrfs] > > ? cache_block_group+0x72/0x3b0 [btrfs] > > cache_block_group+0x1b3/0x3b0 [btrfs] > > ? finish_wait+0x80/0x80 > > find_free_extent+0x799/0x1010 [btrfs] > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > btrfs_cow_block+0xdc/0x180 [btrfs] > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > cache_save_setup+0xe4/0x3a0 [btrfs] > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > At cache_save_setup() we need to update the inode item of a block group's > > cache which is located in the tree root (fs_info->tree_root), which means > > that it may result in COWing a leaf from that tree. If that happens we > > need to find a free metadata extent and while looking for one, if we find > > a block group which was not cached yet we attempt to load its cache by > > calling cache_block_group(). However this function will try to load the > > inode of the free space cache, which requires finding the matching inode > > item in the tree root - if that inode item is located in the same leaf as > > the inode item of the space cache we are updating at cache_save_setup(), > > we end up in a deadlock, since we try to obtain a read lock on the same > > extent buffer that we previously write locked. > > > > So fix this by skipping the loading of free space caches of any block > > groups that are not yet cached (rare cases) if we are COWing an extent > > buffer from the root tree and space caching is enabled (-o space_cache > > mount option). This is a rare case and its downside is failure to > > find a free extent (return -ENOSPC) when all the already cached block > > groups have no free extents. > > > > Reported-by: Andrew Nelson > > Link: > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > Tested-by: Andrew Nelson > > Signed-off-by: Filipe Manana > > --- > > > > V2: Made the solution more generic, since the problem could happen in any > > path COWing an extent buffer from the root tree. > > > > Applies on top of a previous patch titled: > > > > "Btrfs: fix deadlock when writing out free space caches" > > > > fs/btrfs/ctree.c | 4 > > fs/btrfs/ctree.h | 3 +++ > > fs/btrfs/disk-io.c | 2 ++ > > fs/btrfs/extent-tree.c | 15 ++- > > 4 files changed, 23 insertions(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > > index 089b46c4d97f..646aafda55a3 100644 > > --- a/fs/btrfs/ctree.c > > +++ b/fs/btrfs/ctree.c > > @@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct > > btrfs_trans_handle *trans, > > root == fs_info->chunk_root || > > root == fs_info->dev_root) > > trans->can_flush_pending_bgs = false; > > + else if (root == fs_info->tree_root) > > + atomic_inc(_info->tree_root_cows); > > > > cow = btrfs_alloc_tree_block(trans, root, parent_start, > > root->root_key.objectid, _key, level, > > search_start, empty_size); > > + if (root == fs_info->tree_root) > > + atomic_dec(_info->tree_root_cows); > > Do we need this though? Our root should be the root we're cow'ing the block > for, and it should be passed all the way down to find_free_extent properly, so > we really should be able to just do if (root == fs_info->tree_root) and not > add > all this stuff. Ups, I missed that we could actually pass the root down to find_free_extent(). That's why made the atomic thing. Sending v3, thanks. > > Not to mention this will race with anybody else doing stuff, so if another > thread that isn't actually touching the tree_root it would skip caching a > block > group when it's completely ok for that thread to do it. Thanks, > > Josef
Re: [PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Oct 22, 2018 at 07:48:30PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > When we are writing out a free space cache, during the transaction commit > phase, we can end up in a deadlock which results in a stack trace like the > following: > > schedule+0x28/0x80 > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > ? finish_wait+0x80/0x80 > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > btrfs_search_slot+0xf6/0x9f0 [btrfs] > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > ? inode_insert5+0x119/0x190 > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_iget+0x113/0x690 [btrfs] > __lookup_free_space_inode+0xd8/0x150 [btrfs] > lookup_free_space_inode+0x5b/0xb0 [btrfs] > load_free_space_cache+0x7c/0x170 [btrfs] > ? cache_block_group+0x72/0x3b0 [btrfs] > cache_block_group+0x1b3/0x3b0 [btrfs] > ? finish_wait+0x80/0x80 > find_free_extent+0x799/0x1010 [btrfs] > btrfs_reserve_extent+0x9b/0x180 [btrfs] > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > __btrfs_cow_block+0x11d/0x500 [btrfs] > btrfs_cow_block+0xdc/0x180 [btrfs] > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_update_inode_item+0x46/0x100 [btrfs] > cache_save_setup+0xe4/0x3a0 [btrfs] > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > At cache_save_setup() we need to update the inode item of a block group's > cache which is located in the tree root (fs_info->tree_root), which means > that it may result in COWing a leaf from that tree. If that happens we > need to find a free metadata extent and while looking for one, if we find > a block group which was not cached yet we attempt to load its cache by > calling cache_block_group(). However this function will try to load the > inode of the free space cache, which requires finding the matching inode > item in the tree root - if that inode item is located in the same leaf as > the inode item of the space cache we are updating at cache_save_setup(), > we end up in a deadlock, since we try to obtain a read lock on the same > extent buffer that we previously write locked. > > So fix this by skipping the loading of free space caches of any block > groups that are not yet cached (rare cases) if we are COWing an extent > buffer from the root tree and space caching is enabled (-o space_cache > mount option). This is a rare case and its downside is failure to > find a free extent (return -ENOSPC) when all the already cached block > groups have no free extents. > > Reported-by: Andrew Nelson > Link: > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > Tested-by: Andrew Nelson > Signed-off-by: Filipe Manana > --- > > V2: Made the solution more generic, since the problem could happen in any > path COWing an extent buffer from the root tree. > > Applies on top of a previous patch titled: > > "Btrfs: fix deadlock when writing out free space caches" > > fs/btrfs/ctree.c | 4 > fs/btrfs/ctree.h | 3 +++ > fs/btrfs/disk-io.c | 2 ++ > fs/btrfs/extent-tree.c | 15 ++- > 4 files changed, 23 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > index 089b46c4d97f..646aafda55a3 100644 > --- a/fs/btrfs/ctree.c > +++ b/fs/btrfs/ctree.c > @@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct > btrfs_trans_handle *trans, > root == fs_info->chunk_root || > root == fs_info->dev_root) > trans->can_flush_pending_bgs = false; > + else if (root == fs_info->tree_root) > + atomic_inc(_info->tree_root_cows); > > cow = btrfs_alloc_tree_block(trans, root, parent_start, > root->root_key.objectid, _key, level, > search_start, empty_size); > + if (root == fs_info->tree_root) > + atomic_dec(_info->tree_root_cows); Do we need this though? Our root should be the root we're cow'ing the block for, and it should be passed all the way down to find_free_extent properly, so we really should be able to just do if (root == fs_info->tree_root) and not add all this stuff. Not to mention this will race with anybody else doing stuff, so if another thread that isn't actually touching the tree_root it would skip caching a block group when it's completely ok for that thread to do it. Thanks, Josef
Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Oct 22, 2018 at 7:07 PM Josef Bacik wrote: > > On Mon, Oct 22, 2018 at 10:09:46AM +0100, fdman...@kernel.org wrote: > > From: Filipe Manana > > > > When we are writing out a free space cache, during the transaction commit > > phase, we can end up in a deadlock which results in a stack trace like the > > following: > > > > schedule+0x28/0x80 > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > ? finish_wait+0x80/0x80 > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > ? inode_insert5+0x119/0x190 > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_iget+0x113/0x690 [btrfs] > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > load_free_space_cache+0x7c/0x170 [btrfs] > > ? cache_block_group+0x72/0x3b0 [btrfs] > > cache_block_group+0x1b3/0x3b0 [btrfs] > > ? finish_wait+0x80/0x80 > > find_free_extent+0x799/0x1010 [btrfs] > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > btrfs_cow_block+0xdc/0x180 [btrfs] > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > cache_save_setup+0xe4/0x3a0 [btrfs] > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > At cache_save_setup() we need to update the inode item of a block group's > > cache which is located in the tree root (fs_info->tree_root), which means > > that it may result in COWing a leaf from that tree. If that happens we > > need to find a free metadata extent and while looking for one, if we find > > a block group which was not cached yet we attempt to load its cache by > > calling cache_block_group(). However this function will try to load the > > inode of the free space cache, which requires finding the matching inode > > item in the tree root - if that inode item is located in the same leaf as > > the inode item of the space cache we are updating at cache_save_setup(), > > we end up in a deadlock, since we try to obtain a read lock on the same > > extent buffer that we previously write locked. > > > > So fix this by skipping the loading of free space caches of any block > > groups that are not yet cached (rare cases) if we are updating the inode > > of a free space cache. This is a rare case and its downside is failure to > > find a free extent (return -ENOSPC) when all the already cached block > > groups have no free extents. > > > > Actually isn't this a problem for anything that tries to allocate an extent > while in the tree_root? Like we snapshot or make a subvolume or anything? Indeed. Initially I considered making it more generic (like the recent fix for deadlock when cowing from extent/chunk/device tree) but I totally forgot about the other cases like you mentioned. > We > should just disallow if root == tree_root. But even then we only need to do > this if we're using SPACE_CACHE, using the ye-olde caching or the free space > tree are both ok. Let's just limit it to those cases. Thanks, Yep, makes all sense. Thanks! V2 sent out. > > Josef
[PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent
From: Filipe Manana When we are writing out a free space cache, during the transaction commit phase, we can end up in a deadlock which results in a stack trace like the following: schedule+0x28/0x80 btrfs_tree_read_lock+0x8e/0x120 [btrfs] ? finish_wait+0x80/0x80 btrfs_read_lock_root_node+0x2f/0x40 [btrfs] btrfs_search_slot+0xf6/0x9f0 [btrfs] ? evict_refill_and_join+0xd0/0xd0 [btrfs] ? inode_insert5+0x119/0x190 btrfs_lookup_inode+0x3a/0xc0 [btrfs] ? kmem_cache_alloc+0x166/0x1d0 btrfs_iget+0x113/0x690 [btrfs] __lookup_free_space_inode+0xd8/0x150 [btrfs] lookup_free_space_inode+0x5b/0xb0 [btrfs] load_free_space_cache+0x7c/0x170 [btrfs] ? cache_block_group+0x72/0x3b0 [btrfs] cache_block_group+0x1b3/0x3b0 [btrfs] ? finish_wait+0x80/0x80 find_free_extent+0x799/0x1010 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] __btrfs_cow_block+0x11d/0x500 [btrfs] btrfs_cow_block+0xdc/0x180 [btrfs] btrfs_search_slot+0x3bd/0x9f0 [btrfs] btrfs_lookup_inode+0x3a/0xc0 [btrfs] ? kmem_cache_alloc+0x166/0x1d0 btrfs_update_inode_item+0x46/0x100 [btrfs] cache_save_setup+0xe4/0x3a0 [btrfs] btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] btrfs_commit_transaction+0xcb/0x8b0 [btrfs] At cache_save_setup() we need to update the inode item of a block group's cache which is located in the tree root (fs_info->tree_root), which means that it may result in COWing a leaf from that tree. If that happens we need to find a free metadata extent and while looking for one, if we find a block group which was not cached yet we attempt to load its cache by calling cache_block_group(). However this function will try to load the inode of the free space cache, which requires finding the matching inode item in the tree root - if that inode item is located in the same leaf as the inode item of the space cache we are updating at cache_save_setup(), we end up in a deadlock, since we try to obtain a read lock on the same extent buffer that we previously write locked. So fix this by skipping the loading of free space caches of any block groups that are not yet cached (rare cases) if we are COWing an extent buffer from the root tree and space caching is enabled (-o space_cache mount option). This is a rare case and its downside is failure to find a free extent (return -ENOSPC) when all the already cached block groups have no free extents. Reported-by: Andrew Nelson Link: https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") Tested-by: Andrew Nelson Signed-off-by: Filipe Manana --- V2: Made the solution more generic, since the problem could happen in any path COWing an extent buffer from the root tree. Applies on top of a previous patch titled: "Btrfs: fix deadlock when writing out free space caches" fs/btrfs/ctree.c | 4 fs/btrfs/ctree.h | 3 +++ fs/btrfs/disk-io.c | 2 ++ fs/btrfs/extent-tree.c | 15 ++- 4 files changed, 23 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 089b46c4d97f..646aafda55a3 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans, root == fs_info->chunk_root || root == fs_info->dev_root) trans->can_flush_pending_bgs = false; + else if (root == fs_info->tree_root) + atomic_inc(_info->tree_root_cows); cow = btrfs_alloc_tree_block(trans, root, parent_start, root->root_key.objectid, _key, level, search_start, empty_size); + if (root == fs_info->tree_root) + atomic_dec(_info->tree_root_cows); trans->can_flush_pending_bgs = true; if (IS_ERR(cow)) return PTR_ERR(cow); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2cddfe7806a4..1b73433c69e2 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1121,6 +1121,9 @@ struct btrfs_fs_info { u32 sectorsize; u32 stripesize; + /* Number of tasks corrently COWing a leaf/node from the tree root. */ + atomic_t tree_root_cows; + #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; struct rb_root block_tree; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 05dc3c17cb62..08c15bf69fb5 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb, fs_info->sectorsize = 4096; fs_info->stripesize = 4096; + atomic_set(_info->tree_root_cows, 0); + ret = btrfs_alloc_stripe_hash_table(fs_info); if (ret) { err = ret; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 577878324799..14f35e020050 100644 --- a/fs/btrfs/ext
Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Oct 22, 2018 at 10:09:46AM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > When we are writing out a free space cache, during the transaction commit > phase, we can end up in a deadlock which results in a stack trace like the > following: > > schedule+0x28/0x80 > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > ? finish_wait+0x80/0x80 > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > btrfs_search_slot+0xf6/0x9f0 [btrfs] > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > ? inode_insert5+0x119/0x190 > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_iget+0x113/0x690 [btrfs] > __lookup_free_space_inode+0xd8/0x150 [btrfs] > lookup_free_space_inode+0x5b/0xb0 [btrfs] > load_free_space_cache+0x7c/0x170 [btrfs] > ? cache_block_group+0x72/0x3b0 [btrfs] > cache_block_group+0x1b3/0x3b0 [btrfs] > ? finish_wait+0x80/0x80 > find_free_extent+0x799/0x1010 [btrfs] > btrfs_reserve_extent+0x9b/0x180 [btrfs] > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > __btrfs_cow_block+0x11d/0x500 [btrfs] > btrfs_cow_block+0xdc/0x180 [btrfs] > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_update_inode_item+0x46/0x100 [btrfs] > cache_save_setup+0xe4/0x3a0 [btrfs] > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > At cache_save_setup() we need to update the inode item of a block group's > cache which is located in the tree root (fs_info->tree_root), which means > that it may result in COWing a leaf from that tree. If that happens we > need to find a free metadata extent and while looking for one, if we find > a block group which was not cached yet we attempt to load its cache by > calling cache_block_group(). However this function will try to load the > inode of the free space cache, which requires finding the matching inode > item in the tree root - if that inode item is located in the same leaf as > the inode item of the space cache we are updating at cache_save_setup(), > we end up in a deadlock, since we try to obtain a read lock on the same > extent buffer that we previously write locked. > > So fix this by skipping the loading of free space caches of any block > groups that are not yet cached (rare cases) if we are updating the inode > of a free space cache. This is a rare case and its downside is failure to > find a free extent (return -ENOSPC) when all the already cached block > groups have no free extents. > Actually isn't this a problem for anything that tries to allocate an extent while in the tree_root? Like we snapshot or make a subvolume or anything? We should just disallow if root == tree_root. But even then we only need to do this if we're using SPACE_CACHE, using the ye-olde caching or the free space tree are both ok. Let's just limit it to those cases. Thanks, Josef
Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Oct 22, 2018 at 10:10 AM wrote: > > From: Filipe Manana > > When we are writing out a free space cache, during the transaction commit > phase, we can end up in a deadlock which results in a stack trace like the > following: > > schedule+0x28/0x80 > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > ? finish_wait+0x80/0x80 > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > btrfs_search_slot+0xf6/0x9f0 [btrfs] > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > ? inode_insert5+0x119/0x190 > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_iget+0x113/0x690 [btrfs] > __lookup_free_space_inode+0xd8/0x150 [btrfs] > lookup_free_space_inode+0x5b/0xb0 [btrfs] > load_free_space_cache+0x7c/0x170 [btrfs] > ? cache_block_group+0x72/0x3b0 [btrfs] > cache_block_group+0x1b3/0x3b0 [btrfs] > ? finish_wait+0x80/0x80 > find_free_extent+0x799/0x1010 [btrfs] > btrfs_reserve_extent+0x9b/0x180 [btrfs] > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > __btrfs_cow_block+0x11d/0x500 [btrfs] > btrfs_cow_block+0xdc/0x180 [btrfs] > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_update_inode_item+0x46/0x100 [btrfs] > cache_save_setup+0xe4/0x3a0 [btrfs] > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > At cache_save_setup() we need to update the inode item of a block group's > cache which is located in the tree root (fs_info->tree_root), which means > that it may result in COWing a leaf from that tree. If that happens we > need to find a free metadata extent and while looking for one, if we find > a block group which was not cached yet we attempt to load its cache by > calling cache_block_group(). However this function will try to load the > inode of the free space cache, which requires finding the matching inode > item in the tree root - if that inode item is located in the same leaf as > the inode item of the space cache we are updating at cache_save_setup(), > we end up in a deadlock, since we try to obtain a read lock on the same > extent buffer that we previously write locked. > > So fix this by skipping the loading of free space caches of any block > groups that are not yet cached (rare cases) if we are updating the inode > of a free space cache. This is a rare case and its downside is failure to > find a free extent (return -ENOSPC) when all the already cached block > groups have no free extents. > > Reported-by: Andrew Nelson > Link: > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > Signed-off-by: Filipe Manana Tested-by: Andrew Nelson > --- > fs/btrfs/ctree.h | 3 +++ > fs/btrfs/disk-io.c | 2 ++ > fs/btrfs/extent-tree.c | 22 +- > 3 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 2cddfe7806a4..d23ee26eb17d 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1121,6 +1121,9 @@ struct btrfs_fs_info { > u32 sectorsize; > u32 stripesize; > > + /* The task currently updating a free space cache inode item. */ > + struct task_struct *space_cache_updater; > + > #ifdef CONFIG_BTRFS_FS_REF_VERIFY > spinlock_t ref_verify_lock; > struct rb_root block_tree; > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 05dc3c17cb62..aa5e9a91e560 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb, > fs_info->sectorsize = 4096; > fs_info->stripesize = 4096; > > + fs_info->space_cache_updater = NULL; > + > ret = btrfs_alloc_stripe_hash_table(fs_info); > if (ret) { > err = ret; > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 577878324799..e93040449771 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -3364,7 +3364,9 @@ static int cache_save_setup(struct > btrfs_block_group_cache *block_group, > * time. > */ > BTRFS_I(inode)->generation = 0; > + fs_info->space_cache_updater = current; > ret = btrfs_update_inode(trans, root, inode); > + fs_info->space_cache_updater = NULL; > if (ret) { > /* > * So theoretically we could recover from this, simply set the > @@ -7366,7 +7368,25 @@ static noinline int find_free_extent(struct > btrfs_fs_info *fs_info, > > have_bl
Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
On Mon, Oct 22, 2018 at 10:10 AM wrote: > > From: Filipe Manana > > When we are writing out a free space cache, during the transaction commit > phase, we can end up in a deadlock which results in a stack trace like the > following: > > schedule+0x28/0x80 > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > ? finish_wait+0x80/0x80 > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > btrfs_search_slot+0xf6/0x9f0 [btrfs] > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > ? inode_insert5+0x119/0x190 > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_iget+0x113/0x690 [btrfs] > __lookup_free_space_inode+0xd8/0x150 [btrfs] > lookup_free_space_inode+0x5b/0xb0 [btrfs] > load_free_space_cache+0x7c/0x170 [btrfs] > ? cache_block_group+0x72/0x3b0 [btrfs] > cache_block_group+0x1b3/0x3b0 [btrfs] > ? finish_wait+0x80/0x80 > find_free_extent+0x799/0x1010 [btrfs] > btrfs_reserve_extent+0x9b/0x180 [btrfs] > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > __btrfs_cow_block+0x11d/0x500 [btrfs] > btrfs_cow_block+0xdc/0x180 [btrfs] > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > ? kmem_cache_alloc+0x166/0x1d0 > btrfs_update_inode_item+0x46/0x100 [btrfs] > cache_save_setup+0xe4/0x3a0 [btrfs] > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > At cache_save_setup() we need to update the inode item of a block group's > cache which is located in the tree root (fs_info->tree_root), which means > that it may result in COWing a leaf from that tree. If that happens we > need to find a free metadata extent and while looking for one, if we find > a block group which was not cached yet we attempt to load its cache by > calling cache_block_group(). However this function will try to load the > inode of the free space cache, which requires finding the matching inode > item in the tree root - if that inode item is located in the same leaf as > the inode item of the space cache we are updating at cache_save_setup(), > we end up in a deadlock, since we try to obtain a read lock on the same > extent buffer that we previously write locked. > > So fix this by skipping the loading of free space caches of any block > groups that are not yet cached (rare cases) if we are updating the inode > of a free space cache. This is a rare case and its downside is failure to > find a free extent (return -ENOSPC) when all the already cached block > groups have no free extents. > > Reported-by: Andrew Nelson > Link: > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > Signed-off-by: Filipe Manana Andrew Nelson > --- > fs/btrfs/ctree.h | 3 +++ > fs/btrfs/disk-io.c | 2 ++ > fs/btrfs/extent-tree.c | 22 +- > 3 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 2cddfe7806a4..d23ee26eb17d 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1121,6 +1121,9 @@ struct btrfs_fs_info { > u32 sectorsize; > u32 stripesize; > > + /* The task currently updating a free space cache inode item. */ > + struct task_struct *space_cache_updater; > + > #ifdef CONFIG_BTRFS_FS_REF_VERIFY > spinlock_t ref_verify_lock; > struct rb_root block_tree; > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 05dc3c17cb62..aa5e9a91e560 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb, > fs_info->sectorsize = 4096; > fs_info->stripesize = 4096; > > + fs_info->space_cache_updater = NULL; > + > ret = btrfs_alloc_stripe_hash_table(fs_info); > if (ret) { > err = ret; > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 577878324799..e93040449771 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -3364,7 +3364,9 @@ static int cache_save_setup(struct > btrfs_block_group_cache *block_group, > * time. > */ > BTRFS_I(inode)->generation = 0; > + fs_info->space_cache_updater = current; > ret = btrfs_update_inode(trans, root, inode); > + fs_info->space_cache_updater = NULL; > if (ret) { > /* > * So theoretically we could recover from this, simply set the > @@ -7366,7 +7368,25 @@ static noinline int find_free_extent(struct > btrfs_fs_info *fs_info, > > have_block_group: > cached = b
[PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent
From: Filipe Manana When we are writing out a free space cache, during the transaction commit phase, we can end up in a deadlock which results in a stack trace like the following: schedule+0x28/0x80 btrfs_tree_read_lock+0x8e/0x120 [btrfs] ? finish_wait+0x80/0x80 btrfs_read_lock_root_node+0x2f/0x40 [btrfs] btrfs_search_slot+0xf6/0x9f0 [btrfs] ? evict_refill_and_join+0xd0/0xd0 [btrfs] ? inode_insert5+0x119/0x190 btrfs_lookup_inode+0x3a/0xc0 [btrfs] ? kmem_cache_alloc+0x166/0x1d0 btrfs_iget+0x113/0x690 [btrfs] __lookup_free_space_inode+0xd8/0x150 [btrfs] lookup_free_space_inode+0x5b/0xb0 [btrfs] load_free_space_cache+0x7c/0x170 [btrfs] ? cache_block_group+0x72/0x3b0 [btrfs] cache_block_group+0x1b3/0x3b0 [btrfs] ? finish_wait+0x80/0x80 find_free_extent+0x799/0x1010 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] __btrfs_cow_block+0x11d/0x500 [btrfs] btrfs_cow_block+0xdc/0x180 [btrfs] btrfs_search_slot+0x3bd/0x9f0 [btrfs] btrfs_lookup_inode+0x3a/0xc0 [btrfs] ? kmem_cache_alloc+0x166/0x1d0 btrfs_update_inode_item+0x46/0x100 [btrfs] cache_save_setup+0xe4/0x3a0 [btrfs] btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] btrfs_commit_transaction+0xcb/0x8b0 [btrfs] At cache_save_setup() we need to update the inode item of a block group's cache which is located in the tree root (fs_info->tree_root), which means that it may result in COWing a leaf from that tree. If that happens we need to find a free metadata extent and while looking for one, if we find a block group which was not cached yet we attempt to load its cache by calling cache_block_group(). However this function will try to load the inode of the free space cache, which requires finding the matching inode item in the tree root - if that inode item is located in the same leaf as the inode item of the space cache we are updating at cache_save_setup(), we end up in a deadlock, since we try to obtain a read lock on the same extent buffer that we previously write locked. So fix this by skipping the loading of free space caches of any block groups that are not yet cached (rare cases) if we are updating the inode of a free space cache. This is a rare case and its downside is failure to find a free extent (return -ENOSPC) when all the already cached block groups have no free extents. Reported-by: Andrew Nelson Link: https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") Signed-off-by: Filipe Manana --- fs/btrfs/ctree.h | 3 +++ fs/btrfs/disk-io.c | 2 ++ fs/btrfs/extent-tree.c | 22 +- 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2cddfe7806a4..d23ee26eb17d 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1121,6 +1121,9 @@ struct btrfs_fs_info { u32 sectorsize; u32 stripesize; + /* The task currently updating a free space cache inode item. */ + struct task_struct *space_cache_updater; + #ifdef CONFIG_BTRFS_FS_REF_VERIFY spinlock_t ref_verify_lock; struct rb_root block_tree; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 05dc3c17cb62..aa5e9a91e560 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb, fs_info->sectorsize = 4096; fs_info->stripesize = 4096; + fs_info->space_cache_updater = NULL; + ret = btrfs_alloc_stripe_hash_table(fs_info); if (ret) { err = ret; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 577878324799..e93040449771 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3364,7 +3364,9 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group, * time. */ BTRFS_I(inode)->generation = 0; + fs_info->space_cache_updater = current; ret = btrfs_update_inode(trans, root, inode); + fs_info->space_cache_updater = NULL; if (ret) { /* * So theoretically we could recover from this, simply set the @@ -7366,7 +7368,25 @@ static noinline int find_free_extent(struct btrfs_fs_info *fs_info, have_block_group: cached = block_group_cache_done(block_group); - if (unlikely(!cached)) { + /* +* If we are updating the inode of a free space cache, we can +* not start the caching of any block group because we could +* deadlock on an extent buffer of the root tree. +* At cache_save_setup() we update the inode item of a free +* space cache, so we may need to COW a leaf of the root tree, +* which implies finding a free metadata extent. So if when +* searching for such
Re: Couldn't read tree root BTRFS - SEED
Hello Qu, Thank you very much for the answer. I am sorry to answer late because before bothering more, I decided to perform another cloning test. In this time he has joined the second device correctly. The problem has occurred because they are BCACHE devices, and although the SEED disk does not vary over time, buy the read-write yes, with which I have made sure this time empty the cache before performing the cloning. Now, when inserting the discs I have managed to mount everything correctly: Label: 'SEED_MD2' uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 Total devices 2 FS bytes used 2.00TiB devid1 size 5.44TiB used 1.97TiB path /dev/bcache2 devid2 size 1.80TiB used 122.03GiB path /dev/bcache1 The truth is that the BTRFS-BCACHE-MDADM combination is complex, but the results are really satisfactory and enormous versatility is achieved. To thank the BTRFS team and other responsible people for doing a great job. El jueves, 5 de abril de 2018 20:57:49 (CEST) Qu Wenruo escribió: > On 2018年04月05日 19:27, Senén Vidal Blanco wrote: > > Hello, > > I have a problem when mounting the btrfs system with two units (one in > > SEED > > mode and the other in read-write mode) > > > > Initially it is functioning correctly in production: > > > > btrfs fi sh > > Label: 'SEED_MD2' uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 > > > >Total devices 2 FS bytes used 2.00TiB > >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 > >devid2 size 1.80TiB used 119.03GiB path /dev/bcache0 > > > > Label: 'BOOT' uuid: ce8fd2ef-975c-417a-b90c-280f8d324c44 > > > >Total devices 1 FS bytes used 536.86MiB > >devid1 size 1.86GiB used 1.15GiB path /dev/md1 > > > > Label: 'SEED_MD2' uuid: 851e4474-d375-4a25-b202-949e51f05877 > > > >Total devices 1 FS bytes used 1.94TiB > >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 > > > > The bcache0 and bcache1 units are what I mention. > > bcache1 is the SEED > > bcache0 is the read-write > > > > When it comes to mounting the two copy mirror devices that is in > > production I have this result: > > > > parent transid verify failed on 2184045428736 wanted 269593 found 266275 > > parent transid verify failed on 2184045428736 wanted 269593 found 266275 > > It's not the problem that btrfs fails to assemble correct seed/rw > devices list, it's something wrong (metadata corruption) happened. > > > Ignoring transid failure > > So this is btrfs-progs (btrfs check). Kernel doesn't ignore any transid > failure AFAIK. > > > Couldn't map the block 2287467560960 > > No mapping for 2287467560960-2287467577344 > > Couldn't map the block 2287467560960 > > bytenr mismatch, want=2287467560960, have=0 > > Couldn't read tree root > > Label: 'SEED_MD2' uuid: 851e4474-d375-4a25-b202-949e51f05877 > > > >Total devices 1 FS bytes used 1.94TiB > >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 > > > > Label: 'SEED_MD2' uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 > > > >Total devices 2 FS bytes used 1.98TiB > >devid2 size 1.80TiB used 102.03GiB path /dev/bcache2 > >*** Some devices missing > > > > Would there be any way to tell BTRFS that the missing unit is the one that > > corresponds to the SEED unit so that it correctly rebuilds the file > > system? > > The rw device is corrupted. > > To ensure your seed device is safe, please run "btrfs check device>" to see if there is any problem. > > From current output, it seems that your chunk tree is corrupted, so that > btrfs check can't even map the logical address for your tree root. > > And to further debug the corrupted fs, please attach the following data: > > 1) btrfs inspect dump-super -fa > 2) btrfs inspect dump-tree -t chunk > > Thanks, > Qu > > > Thank you. -- Senén Vidal Blanco - SGISoft S.L. Tlf.: 986413322 - 660923711 GPG ID 466431A8AF01F99A http://www.sgisoft.com/ -- signature.asc Description: This is a digitally signed message part.
Re: Couldn't read tree root BTRFS - SEED
On 2018年04月05日 19:27, Senén Vidal Blanco wrote: > Hello, > I have a problem when mounting the btrfs system with two units (one in SEED > mode and the other in read-write mode) > > Initially it is functioning correctly in production: > > btrfs fi sh > Label: 'SEED_MD2' uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 >Total devices 2 FS bytes used 2.00TiB >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 >devid2 size 1.80TiB used 119.03GiB path /dev/bcache0 > > Label: 'BOOT' uuid: ce8fd2ef-975c-417a-b90c-280f8d324c44 >Total devices 1 FS bytes used 536.86MiB >devid1 size 1.86GiB used 1.15GiB path /dev/md1 > > Label: 'SEED_MD2' uuid: 851e4474-d375-4a25-b202-949e51f05877 >Total devices 1 FS bytes used 1.94TiB >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 > > The bcache0 and bcache1 units are what I mention. > bcache1 is the SEED > bcache0 is the read-write > > When it comes to mounting the two copy mirror devices that is in production I > have this result: > > parent transid verify failed on 2184045428736 wanted 269593 found 266275 > parent transid verify failed on 2184045428736 wanted 269593 found 266275 It's not the problem that btrfs fails to assemble correct seed/rw devices list, it's something wrong (metadata corruption) happened. > Ignoring transid failure So this is btrfs-progs (btrfs check). Kernel doesn't ignore any transid failure AFAIK. > Couldn't map the block 2287467560960 > No mapping for 2287467560960-2287467577344 > Couldn't map the block 2287467560960 > bytenr mismatch, want=2287467560960, have=0 > Couldn't read tree root > Label: 'SEED_MD2' uuid: 851e4474-d375-4a25-b202-949e51f05877 >Total devices 1 FS bytes used 1.94TiB >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 > > Label: 'SEED_MD2' uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 >Total devices 2 FS bytes used 1.98TiB >devid2 size 1.80TiB used 102.03GiB path /dev/bcache2 >*** Some devices missing > > Would there be any way to tell BTRFS that the missing unit is the one that > corresponds to the SEED unit so that it correctly rebuilds the file system? The rw device is corrupted. To ensure your seed device is safe, please run "btrfs check " to see if there is any problem. From current output, it seems that your chunk tree is corrupted, so that btrfs check can't even map the logical address for your tree root. And to further debug the corrupted fs, please attach the following data: 1) btrfs inspect dump-super -fa 2) btrfs inspect dump-tree -t chunk Thanks, Qu > > Thank you. > signature.asc Description: OpenPGP digital signature
Couldn't read tree root BTRFS - SEED
Hello, I have a problem when mounting the btrfs system with two units (one in SEED mode and the other in read-write mode) Initially it is functioning correctly in production: btrfs fi sh Label: 'SEED_MD2' uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 Total devices 2 FS bytes used 2.00TiB devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 devid2 size 1.80TiB used 119.03GiB path /dev/bcache0 Label: 'BOOT' uuid: ce8fd2ef-975c-417a-b90c-280f8d324c44 Total devices 1 FS bytes used 536.86MiB devid1 size 1.86GiB used 1.15GiB path /dev/md1 Label: 'SEED_MD2' uuid: 851e4474-d375-4a25-b202-949e51f05877 Total devices 1 FS bytes used 1.94TiB devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 The bcache0 and bcache1 units are what I mention. bcache1 is the SEED bcache0 is the read-write When it comes to mounting the two copy mirror devices that is in production I have this result: parent transid verify failed on 2184045428736 wanted 269593 found 266275 parent transid verify failed on 2184045428736 wanted 269593 found 266275 Ignoring transid failure Couldn't map the block 2287467560960 No mapping for 2287467560960-2287467577344 Couldn't map the block 2287467560960 bytenr mismatch, want=2287467560960, have=0 Couldn't read tree root Label: 'SEED_MD2' uuid: 851e4474-d375-4a25-b202-949e51f05877 Total devices 1 FS bytes used 1.94TiB devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 Label: 'SEED_MD2' uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 Total devices 2 FS bytes used 1.98TiB devid2 size 1.80TiB used 102.03GiB path /dev/bcache2 *** Some devices missing Would there be any way to tell BTRFS that the missing unit is the one that corresponds to the SEED unit so that it correctly rebuilds the file system? Thank you. signature.asc Description: This is a digitally signed message part.
Re: [PATCH] btrfs-progs: Use more restrict check to read out tree root
On Tue, Apr 25, 2017 at 04:40:16PM +0800, Qu Wenruo wrote: > For fuzzed image bko-156811-bad-parent-ref-qgroup-verify.raw, it cause > qgroup to report -ENOMEM. > > But the fact is, such image is heavy damaged so there is not valid root > item for extent tree. > > Normal extent tree key in root tree should be (EXTENT_TREE ROOT_ITEM 0), > while in that fuzzed image, we got (EXTENT_TREE EXXTENT_DATA SOME_NUMBER). > > It's btrfs_find_last_root() that only checks the objectid, not caring > key type leads to such problem. > > Fix by doing extra check on key type for such case. > > Signed-off-by: Qu WenruoApplied, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: Use more restrict check to read out tree root
For fuzzed image bko-156811-bad-parent-ref-qgroup-verify.raw, it cause qgroup to report -ENOMEM. But the fact is, such image is heavy damaged so there is not valid root item for extent tree. Normal extent tree key in root tree should be (EXTENT_TREE ROOT_ITEM 0), while in that fuzzed image, we got (EXTENT_TREE EXXTENT_DATA SOME_NUMBER). It's btrfs_find_last_root() that only checks the objectid, not caring key type leads to such problem. Fix by doing extra check on key type for such case. Signed-off-by: Qu Wenruo--- root-tree.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/root-tree.c b/root-tree.c index ab01a140..6b8f8c1c 100644 --- a/root-tree.c +++ b/root-tree.c @@ -51,7 +51,8 @@ int btrfs_find_last_root(struct btrfs_root *root, u64 objectid, l = path->nodes[0]; slot = path->slots[0] - 1; btrfs_item_key_to_cpu(l, _key, slot); - if (found_key.objectid != objectid) { + if (found_key.type != BTRFS_ROOT_ITEM_KEY || + found_key.objectid != objectid) { ret = -ENOENT; goto out; } -- 2.12.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rescue a single-device btrfs instance with zeroed tree root
On 2016-06-21 at 20:23 +0300, Ivan Shapovalov wrote: > Hello, > > So this is another case of "I lost my partition and do not have > backups". More precisely, _this_ is the backup and it turned out to > be > damaged. > > (The backup was made by partclone.btrfs. Together with a zeroed out > tree root, this asks for a bug in partclone...) > > So: the tree root is zeroes, backup roots are zeroes too, > btrfs-find-root only reports blocks of level 0 (needed is 1). > Is there something that can be done? Maybe it is possible to > reconstruct the root from its children? > Operations log following. > Please Cc: me in replies as I'm not subscribed to the list. Anyone? I'd appreciate any advice on how to rebuild the tree roots. I'd even write some code if someone tells me the disk format and logical tree constraints (i. e. in which order to put pointers to child nodes). BTW, it looks like all tree roots are lost, i. e. btrfs-find-root with any objectid (extent tree, subvolume tree, anything in ctree.h) finds only level 0 nodes. It should be possible to rebuild intermediate nodes, isn't it? Do they contain valuable information? Please Cc: me in replies as I'm not subscribed to the list. Thanks, -- Ivan Shapovalov / intelfx / > > 1. regular mount > > # mount /dev/loop0p3 /mnt/temp > > === dmesg === > [106737.299592] BTRFS info (device loop0p3): disk space caching is > enabled > [106737.299604] BTRFS: has skinny extents > [106737.299884] BTRFS error (device loop0p3): bad tree block start 0 > 162633449472 > [106737.299888] BTRFS: failed to read tree root on loop0p3 > [106737.314359] BTRFS: open_ctree failed > === end dmesg === > > 2. mount with -o recovery > > # mount -o recovery /dev/loop0p3 /mnt/temp > > === dmesg === > [106742.305720] BTRFS warning (device loop0p3): 'recovery' is > deprecated, use 'usebackuproot' instead > [106742.305722] BTRFS info (device loop0p3): trying to use backup > root at mount time > [106742.305724] BTRFS info (device loop0p3): disk space caching is > enabled > [106742.305725] BTRFS: has skinny extents > [106742.306056] BTRFS error (device loop0p3): bad tree block start 0 > 162633449472 > [106742.306060] BTRFS: failed to read tree root on loop0p3 > [106742.306069] BTRFS error (device loop0p3): bad tree block start 0 > 162633449472 > [106742.306071] BTRFS: failed to read tree root on loop0p3 > [106742.306084] BTRFS error (device loop0p3): bad tree block start 0 > 162632237056 > [106742.306086] BTRFS: failed to read tree root on loop0p3 > [106742.306097] BTRFS error (device loop0p3): bad tree block start 0 > 162626682880 > [106742.306100] BTRFS: failed to read tree root on loop0p3 > [106742.306111] BTRFS error (device loop0p3): bad tree block start 0 > 162609168384 > [106742.306114] BTRFS: failed to read tree root on loop0p3 > [106742.327272] BTRFS: open_ctree failed > === end dmesg === > > > 3. btrfs-find-root > > # btrfs-find-root /dev/loop0p3 > Couldn't read tree root > Superblock thinks the generation is 22332 > Superblock thinks the level is 1 > Well block 162633646080(gen: 22332 level: 0) seems good, but > generation/level doesn't match, want gen: 22332 level: 1 > Well block 162633596928(gen: 22332 level: 0) seems good, but > generation/level doesn't match, want gen: 22332 level: 1 > Well block 162633515008(gen: 22332 level: 0) seems good, but > generation/level doesn't match, want gen: 22332 level: 1 > > Thanks, signature.asc Description: This is a digitally signed message part
Rescue a single-device btrfs instance with zeroed tree root
Hello, So this is another case of "I lost my partition and do not have backups". More precisely, _this_ is the backup and it turned out to be damaged. (The backup was made by partclone.btrfs. Together with a zeroed out tree root, this asks for a bug in partclone...) So: the tree root is zeroes, backup roots are zeroes too, btrfs-find-root only reports blocks of level 0 (needed is 1). Is there something that can be done? Maybe it is possible to reconstruct the root from its children? Operations log following. Please Cc: me in replies as I'm not subscribed to the list. 1. regular mount # mount /dev/loop0p3 /mnt/temp === dmesg === [106737.299592] BTRFS info (device loop0p3): disk space caching is enabled [106737.299604] BTRFS: has skinny extents [106737.299884] BTRFS error (device loop0p3): bad tree block start 0 162633449472 [106737.299888] BTRFS: failed to read tree root on loop0p3 [106737.314359] BTRFS: open_ctree failed === end dmesg === 2. mount with -o recovery # mount -o recovery /dev/loop0p3 /mnt/temp === dmesg === [106742.305720] BTRFS warning (device loop0p3): 'recovery' is deprecated, use 'usebackuproot' instead [106742.305722] BTRFS info (device loop0p3): trying to use backup root at mount time [106742.305724] BTRFS info (device loop0p3): disk space caching is enabled [106742.305725] BTRFS: has skinny extents [106742.306056] BTRFS error (device loop0p3): bad tree block start 0 162633449472 [106742.306060] BTRFS: failed to read tree root on loop0p3 [106742.306069] BTRFS error (device loop0p3): bad tree block start 0 162633449472 [106742.306071] BTRFS: failed to read tree root on loop0p3 [106742.306084] BTRFS error (device loop0p3): bad tree block start 0 162632237056 [106742.306086] BTRFS: failed to read tree root on loop0p3 [106742.306097] BTRFS error (device loop0p3): bad tree block start 0 162626682880 [106742.306100] BTRFS: failed to read tree root on loop0p3 [106742.306111] BTRFS error (device loop0p3): bad tree block start 0 162609168384 [106742.306114] BTRFS: failed to read tree root on loop0p3 [106742.327272] BTRFS: open_ctree failed === end dmesg === 3. btrfs-find-root # btrfs-find-root /dev/loop0p3 Couldn't read tree root Superblock thinks the generation is 22332 Superblock thinks the level is 1 Well block 162633646080(gen: 22332 level: 0) seems good, but generation/level doesn't match, want gen: 22332 level: 1 Well block 162633596928(gen: 22332 level: 0) seems good, but generation/level doesn't match, want gen: 22332 level: 1 Well block 162633515008(gen: 22332 level: 0) seems good, but generation/level doesn't match, want gen: 22332 level: 1 Thanks, -- Ivan Shapovalov / intelfx / signature.asc Description: This is a digitally signed message part
[PATCH V2] Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots
The following call trace is seen when btrfs/031 test is executed in a loop, [ 158.661848] [ cut here ] [ 158.662634] WARNING: CPU: 2 PID: 890 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea() [ 158.664102] BTRFS: Transaction aborted (error -2) [ 158.664774] Modules linked in: [ 158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711a #2 [ 158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 158.667392] 81c0a6b0 8806c7c4f8e8 81431fc8 8806c7c4f930 [ 158.668515] 8806c7c4f920 81051aa1 880c85aff000 8800bb44d000 [ 158.669647] 8808863b5c98 fffe 8806c7c4f980 [ 158.670769] Call Trace: [ 158.671153] [] dump_stack+0x44/0x5c [ 158.671884] [] warn_slowpath_common+0x81/0xc0 [ 158.672769] [] warn_slowpath_fmt+0x47/0x50 [ 158.673620] [] create_subvol+0x3d1/0x6ea [ 158.674440] [] btrfs_mksubvol.isra.30+0x369/0x520 [ 158.675376] [] ? percpu_down_read+0x1a/0x50 [ 158.676235] [] btrfs_ioctl_snap_create_transid+0x101/0x180 [ 158.677268] [] btrfs_ioctl_snap_create+0x52/0x70 [ 158.678183] [] btrfs_ioctl+0x474/0x2f90 [ 158.678975] [] ? vma_merge+0xee/0x300 [ 158.679751] [] ? alloc_pages_vma+0x91/0x170 [ 158.680599] [] ? lru_cache_add_active_or_unevictable+0x22/0x70 [ 158.681686] [] ? selinux_file_ioctl+0xff/0x1d0 [ 158.682581] [] do_vfs_ioctl+0x2c1/0x490 [ 158.683399] [] ? security_file_ioctl+0x3e/0x60 [ 158.684297] [] SyS_ioctl+0x74/0x80 [ 158.685051] [] entry_SYSCALL_64_fastpath+0x12/0x6a [ 158.685958] ---[ end trace 4b63312de5a2cb76 ]--- [ 158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry [ 158.709508] BTRFS info (device loop0): forced readonly [ 158.737113] BTRFS info (device loop0): disk space caching is enabled [ 158.738096] BTRFS error (device loop0): Remounting read-write after error is not allowed [ 158.851303] BTRFS error (device loop0): cleaner transaction attach returned -30 This occurs because, Mount filesystem Create subvol with ID 257 Unmount filesystem Mount filesystem Delete subvol with ID 257 btrfs_drop_snapshot() Add root corresponding to subvol 257 into btrfs_transaction->dropped_roots list Create new subvol (i.e. create_subvol()) 257 is returned as the next free objectid btrfs_read_fs_root_no_name() Finds the btrfs_root instance corresponding to the old subvol with ID 257 in btrfs_fs_info->fs_roots_radix. Returns error since btrfs_root_item->refs has the value of 0. To fix the issue the commit initializes tree root's and subvolume root's highest_objectid when loading the roots from disk. Signed-off-by: Chandan Rajendra <chan...@linux.vnet.ibm.com> --- Changelog: v1->v2: A newly created subvolume cannot have an ID beyond BTRFS_LAST_FREE_OBJECTID. Hence when loading tree root tree or any of the subvolume trees, We now use an assert statement to verify if root->highest_objectid has a value greater than BTRFS_LAST_FREE_OBJECTID. fs/btrfs/disk-io.c | 27 +++ fs/btrfs/inode-map.c | 9 + fs/btrfs/inode-map.h | 1 + fs/btrfs/ioctl.c | 4 4 files changed, 33 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index dc6b73a..3679ad6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1583,8 +1583,23 @@ int btrfs_init_fs_root(struct btrfs_root *root) ret = get_anon_bdev(>anon_dev); if (ret) goto free_writers; + + mutex_lock(>objectid_mutex); + ret = btrfs_find_highest_objectid(root, + >highest_objectid); + if (ret) { + mutex_unlock(>objectid_mutex); + goto free_root_dev; + } + + ASSERT(root->highest_objectid <= BTRFS_LAST_FREE_OBJECTID); + + mutex_unlock(>objectid_mutex); + return 0; +free_root_dev: + free_anon_bdev(root->anon_dev); free_writers: btrfs_free_subvolume_writers(root->subv_writers); fail: @@ -2914,6 +2929,18 @@ retry_root_backup: tree_root->commit_root = btrfs_root_node(tree_root); btrfs_set_root_refs(_root->root_item, 1); + mutex_lock(_root->objectid_mutex); + ret = btrfs_find_highest_objectid(tree_root, + _root->highest_objectid); + if (ret) { + mutex_unlock(_root->objectid_mutex); + goto recovery_tree_root; + } + + ASSERT(tree_root->highest_objectid <= BTRFS_LAST_FREE_OBJECTID); + + mutex_unlock(_root->objectid_mutex); + ret = btrfs_read_roots(fs_info, tree_root); if (ret) goto recovery_tree_root; diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c index 767a605..07573dc 100644 --- a/fs/btrfs/inode-map.c +++ b/fs/btrfs/i
Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots
On Tuesday 05 Jan 2016 13:12:34 David Sterba wrote: > Sorry for not answering that. As you're going to resend it, please > use EOVERFLOW in the btrfs_init_fs_root. We should not hit the overflow > error in the mount path. Right. Now I understand that. David, Replacing the following code snippet instances (in both open_ctree() and btrfs_init_fs_root()) ... if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) { mutex_unlock(>objectid_mutex); ret = -EOVERFLOW; goto free_root_dev; } with ASSERT(tree_root->highest_objectid <= BTRFS_LAST_FREE_OBJECTID); is probably a better option? The validation of root->highest_objectid must have been done by btrfs_find_free_objectid() when creating the subvolume. If the parent subvolume already has an objectid with BTRFS_LAST_FREE_OBJECTID as the value, btrfs_find_free_objectid() would return with an error and hence we should never have subvolumes containing other subvolumes with objectid greater than BTRFS_LAST_FREE_OBJECTID. -- chandan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots
On Wed, Oct 07, 2015 at 07:40:46PM +0530, Chandan Rajendra wrote: > On Wednesday 07 Oct 2015 11:25:03 David Sterba wrote: > > On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote: > > > + if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) { > > > + mutex_unlock(>objectid_mutex); > > > + ret = -ENOSPC; > > > > ENOSPC ... I don't think it's right as this could be with a normal > > enospc during subvolume creation. The problem is that theh inode number > > space is exhausted, the closest error code I see is EOVERFLOW. As this > > is an ioctl we can afford to define the meaning of this return value as > > such (unlike for eg. creat()/open()). > > > > > + goto free_root_dev; > > > + } > > > + > > > + mutex_unlock(>objectid_mutex); > > > + > > > > > > return 0; > > David, Are you suggesting that we return -EOVERFLOW from within > btrfs_init_fs_root() and continue returning -ENOSPC in case of error > (i.e. tree_root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID) from > open_ctree()? > > If yes, btrfs_init_fs_root() gets invoked from open_ctree() via > btrfs_read_fs_root_no_name() and hence we may end up returning -EOVERFLOW when > servicing the mount() syscall. Sorry for not answering that. As you're going to resend it, please use EOVERFLOW in the btrfs_init_fs_root. We should not hit the overflow error in the mount path. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots
On Sunday 03 Jan 2016 00:02:18 james harvey wrote: > Bump. > > Pretty sure I just ran into this, outside of a testing scenario. See > http://permalink.gmane.org/gmane.comp.file-systems.btrfs/51796 > > Looks like the patch was never committed. > Thanks for pointing this out. I will rebase & test the patch on top of current integration branch and post it again. -- chandan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots
Bump. Pretty sure I just ran into this, outside of a testing scenario. See http://permalink.gmane.org/gmane.comp.file-systems.btrfs/51796 Looks like the patch was never committed. On Wed, Oct 7, 2015 at 10:10 AM, Chandan Rajendrawrote: > On Wednesday 07 Oct 2015 11:25:03 David Sterba wrote: >> On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote: >> > + if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) { >> > + mutex_unlock(>objectid_mutex); >> > + ret = -ENOSPC; >> >> ENOSPC ... I don't think it's right as this could be with a normal >> enospc during subvolume creation. The problem is that theh inode number >> space is exhausted, the closest error code I see is EOVERFLOW. As this >> is an ioctl we can afford to define the meaning of this return value as >> such (unlike for eg. creat()/open()). >> >> > + goto free_root_dev; >> > + } >> > + >> > + mutex_unlock(>objectid_mutex); >> > + >> > >> > return 0; > > David, Are you suggesting that we return -EOVERFLOW from within > btrfs_init_fs_root() and continue returning -ENOSPC in case of error > (i.e. tree_root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID) from > open_ctree()? > > If yes, btrfs_init_fs_root() gets invoked from open_ctree() via > btrfs_read_fs_root_no_name() and hence we may end up returning -EOVERFLOW when > servicing the mount() syscall. > > -- > chandan > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 09/25] btrfs-progs: Introduce function to setup temporary tree root
Introduce new function, setup_temp_tree_root(), to initialize temporary tree root for make_btrfs_v2(). The new function will setup tree root at metadata chunk and ensure data won't be written into metadata chunk. Also, new make_btrfs_v2() will have a much better code structure than old make_btrfs(). Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com> --- utils.c | 148 1 file changed, 148 insertions(+) diff --git a/utils.c b/utils.c index 444b3f3..fa26949 100644 --- a/utils.c +++ b/utils.c @@ -303,6 +303,136 @@ out: } /* + * Setup an extent buffer for tree block. + */ +static int setup_temp_extent_buffer(struct extent_buffer *buf, + struct btrfs_mkfs_config *cfg, + u64 bytenr, u64 owner) +{ + unsigned char fsid[BTRFS_FSID_SIZE]; + unsigned char chunk_uuid[BTRFS_UUID_SIZE]; + int ret; + + /* We rely on cfg->fs_uuid and chunk_uuid to fsid and chunk uuid */ + BUG_ON(!cfg->fs_uuid || !cfg->chunk_uuid); + ret = uuid_parse(cfg->fs_uuid, fsid); + if (ret) + return -EINVAL; + ret = uuid_parse(cfg->chunk_uuid, chunk_uuid); + if (ret) + return -EINVAL; + + memset(buf->data, 0, cfg->nodesize); + buf->len = cfg->nodesize; + btrfs_set_header_bytenr(buf, bytenr); + btrfs_set_header_generation(buf, 1); + btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV); + btrfs_set_header_owner(buf, owner); + btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN); + write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf), + BTRFS_UUID_SIZE); + write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE); + return 0; +} + +static inline int write_temp_extent_buffer(int fd, struct extent_buffer *buf, + u64 bytenr) +{ + int ret; + + csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0); + + /* Temporary extent buffer is always mapped 1:1 on disk */ + ret = pwrite(fd, buf->data, buf->len, bytenr); + if (ret < buf->len) + ret = (ret < 0 ? ret : -EIO); + else + ret = 0; + return ret; +} + +/* + * Insert a root item for temporary tree root + * + * Only used in make_btrfs_v2(). + */ +static void insert_temp_root_item(struct extent_buffer *buf, + struct btrfs_mkfs_config *cfg, + int *slot, u32 *itemoff, u64 objectid, + u64 bytenr) +{ + struct btrfs_root_item root_item; + struct btrfs_inode_item *inode_item; + struct btrfs_disk_key disk_key; + + btrfs_set_header_nritems(buf, *slot + 1); + (*itemoff) -= sizeof(root_item); + memset(_item, 0, sizeof(root_item)); + inode_item = _item.inode; + btrfs_set_stack_inode_generation(inode_item, 1); + btrfs_set_stack_inode_size(inode_item, 3); + btrfs_set_stack_inode_nlink(inode_item, 1); + btrfs_set_stack_inode_nbytes(inode_item, cfg->nodesize); + btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755); + btrfs_set_root_refs(_item, 1); + btrfs_set_root_used(_item, cfg->nodesize); + btrfs_set_root_generation(_item, 1); + btrfs_set_root_bytenr(_item, bytenr); + + memset(_key, 0, sizeof(disk_key)); + btrfs_set_disk_key_type(_key, BTRFS_ROOT_ITEM_KEY); + btrfs_set_disk_key_objectid(_key, objectid); + btrfs_set_disk_key_offset(_key, 0); + + btrfs_set_item_key(buf, _key, *slot); + btrfs_set_item_offset(buf, btrfs_item_nr(*slot), *itemoff); + btrfs_set_item_size(buf, btrfs_item_nr(*slot), sizeof(root_item)); + write_extent_buffer(buf, _item, + btrfs_item_ptr_offset(buf, *slot), + sizeof(root_item)); + (*slot)++; +} + +static int setup_temp_root_tree(int fd, struct btrfs_mkfs_config *cfg, + u64 root_bytenr, u64 extent_bytenr, + u64 dev_bytenr, u64 fs_bytenr, u64 csum_bytenr) +{ + struct extent_buffer *buf = NULL; + u32 itemoff = __BTRFS_LEAF_DATA_SIZE(cfg->nodesize); + int slot = 0; + int ret; + + /* + * Provided bytenr must in ascending order, or tree root will have a +* bad key order. +*/ + BUG_ON(!(root_bytenr < extent_bytenr && extent_bytenr < dev_bytenr && +dev_bytenr < fs_bytenr && fs_bytenr < csum_bytenr)); + buf = malloc(sizeof(*buf) + cfg->nodesize); + if (!buf) + return -ENOMEM; + + ret = setup_temp_extent_buffer(buf, cfg, root_bytenr, + BTRFS_ROOT_TREE_OB
[PATCH 09/25] btrfs-progs: Introduce function to setup temporary tree root
Introduce new function, setup_temp_tree_root(), to initialize temporary tree root for make_btrfs_v2(). The new function will setup tree root at metadata chunk and ensure data won't be written into metadata chunk. Also, new make_btrfs_v2() will have a much better code structure than old make_btrfs(). Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com> --- utils.c | 148 1 file changed, 148 insertions(+) diff --git a/utils.c b/utils.c index 6c4da04..ce4d93c 100644 --- a/utils.c +++ b/utils.c @@ -493,6 +493,136 @@ out: } /* + * Setup an extent buffer for tree block. + */ +static int setup_temp_extent_buffer(struct extent_buffer *buf, + struct btrfs_mkfs_config *cfg, + u64 bytenr, u64 owner) +{ + unsigned char fsid[BTRFS_FSID_SIZE]; + unsigned char chunk_uuid[BTRFS_UUID_SIZE]; + int ret; + + /* We rely on cfg->fs_uuid and chunk_uuid to fsid and chunk uuid */ + BUG_ON(!cfg->fs_uuid || !cfg->chunk_uuid); + ret = uuid_parse(cfg->fs_uuid, fsid); + if (ret) + return -EINVAL; + ret = uuid_parse(cfg->chunk_uuid, chunk_uuid); + if (ret) + return -EINVAL; + + memset(buf->data, 0, cfg->nodesize); + buf->len = cfg->nodesize; + btrfs_set_header_bytenr(buf, bytenr); + btrfs_set_header_generation(buf, 1); + btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV); + btrfs_set_header_owner(buf, owner); + btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN); + write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf), + BTRFS_UUID_SIZE); + write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE); + return 0; +} + +static inline int write_temp_extent_buffer(int fd, struct extent_buffer *buf, + u64 bytenr) +{ + int ret; + + csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0); + + /* Temporary extent buffer is always mapped 1:1 on disk */ + ret = pwrite(fd, buf->data, buf->len, bytenr); + if (ret < buf->len) + ret = (ret < 0 ? ret : -EIO); + else + ret = 0; + return ret; +} + +/* + * Insert a root item for temporary tree root + * + * Only used in make_btrfs_v2(). + */ +static void insert_temp_root_item(struct extent_buffer *buf, + struct btrfs_mkfs_config *cfg, + int *slot, u32 *itemoff, u64 objectid, + u64 bytenr) +{ + struct btrfs_root_item root_item; + struct btrfs_inode_item *inode_item; + struct btrfs_disk_key disk_key; + + btrfs_set_header_nritems(buf, *slot + 1); + (*itemoff) -= sizeof(root_item); + memset(_item, 0, sizeof(root_item)); + inode_item = _item.inode; + btrfs_set_stack_inode_generation(inode_item, 1); + btrfs_set_stack_inode_size(inode_item, 3); + btrfs_set_stack_inode_nlink(inode_item, 1); + btrfs_set_stack_inode_nbytes(inode_item, cfg->nodesize); + btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755); + btrfs_set_root_refs(_item, 1); + btrfs_set_root_used(_item, cfg->nodesize); + btrfs_set_root_generation(_item, 1); + btrfs_set_root_bytenr(_item, bytenr); + + memset(_key, 0, sizeof(disk_key)); + btrfs_set_disk_key_type(_key, BTRFS_ROOT_ITEM_KEY); + btrfs_set_disk_key_objectid(_key, objectid); + btrfs_set_disk_key_offset(_key, 0); + + btrfs_set_item_key(buf, _key, *slot); + btrfs_set_item_offset(buf, btrfs_item_nr(*slot), *itemoff); + btrfs_set_item_size(buf, btrfs_item_nr(*slot), sizeof(root_item)); + write_extent_buffer(buf, _item, + btrfs_item_ptr_offset(buf, *slot), + sizeof(root_item)); + (*slot)++; +} + +static int setup_temp_root_tree(int fd, struct btrfs_mkfs_config *cfg, + u64 root_bytenr, u64 extent_bytenr, + u64 dev_bytenr, u64 fs_bytenr, u64 csum_bytenr) +{ + struct extent_buffer *buf = NULL; + u32 itemoff = __BTRFS_LEAF_DATA_SIZE(cfg->nodesize); + int slot = 0; + int ret; + + /* + * Provided bytenr must in ascending order, or tree root will have a +* bad key order. +*/ + BUG_ON(!(root_bytenr < extent_bytenr && extent_bytenr < dev_bytenr && +dev_bytenr < fs_bytenr && fs_bytenr < csum_bytenr)); + buf = malloc(sizeof(*buf) + cfg->nodesize); + if (!buf) + return -ENOMEM; + + ret = setup_temp_extent_buffer(buf, cfg, root_bytenr, + BTRFS_ROOT_TREE_OB
[RFC PATCH 09/22] btrfs-progs: Introduce function to setup temporary tree root
Introduce new function, setup_temp_tree_root(), to initialize temporary tree root for make_btrfs_v2(). The new function will setup tree root at metadata chunk and ensure data won't be written into metadata chunk. Also, new make_btrfs_v2() will have a much better code structure than old make_btrfs(). Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com> --- utils.c | 148 1 file changed, 148 insertions(+) diff --git a/utils.c b/utils.c index 6c4da04..ce4d93c 100644 --- a/utils.c +++ b/utils.c @@ -493,6 +493,136 @@ out: } /* + * Setup an extent buffer for tree block. + */ +static int setup_temp_extent_buffer(struct extent_buffer *buf, + struct btrfs_mkfs_config *cfg, + u64 bytenr, u64 owner) +{ + unsigned char fsid[BTRFS_FSID_SIZE]; + unsigned char chunk_uuid[BTRFS_UUID_SIZE]; + int ret; + + /* We rely on cfg->fs_uuid and chunk_uuid to fsid and chunk uuid */ + BUG_ON(!cfg->fs_uuid || !cfg->chunk_uuid); + ret = uuid_parse(cfg->fs_uuid, fsid); + if (ret) + return -EINVAL; + ret = uuid_parse(cfg->chunk_uuid, chunk_uuid); + if (ret) + return -EINVAL; + + memset(buf->data, 0, cfg->nodesize); + buf->len = cfg->nodesize; + btrfs_set_header_bytenr(buf, bytenr); + btrfs_set_header_generation(buf, 1); + btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV); + btrfs_set_header_owner(buf, owner); + btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN); + write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf), + BTRFS_UUID_SIZE); + write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE); + return 0; +} + +static inline int write_temp_extent_buffer(int fd, struct extent_buffer *buf, + u64 bytenr) +{ + int ret; + + csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0); + + /* Temporary extent buffer is always mapped 1:1 on disk */ + ret = pwrite(fd, buf->data, buf->len, bytenr); + if (ret < buf->len) + ret = (ret < 0 ? ret : -EIO); + else + ret = 0; + return ret; +} + +/* + * Insert a root item for temporary tree root + * + * Only used in make_btrfs_v2(). + */ +static void insert_temp_root_item(struct extent_buffer *buf, + struct btrfs_mkfs_config *cfg, + int *slot, u32 *itemoff, u64 objectid, + u64 bytenr) +{ + struct btrfs_root_item root_item; + struct btrfs_inode_item *inode_item; + struct btrfs_disk_key disk_key; + + btrfs_set_header_nritems(buf, *slot + 1); + (*itemoff) -= sizeof(root_item); + memset(_item, 0, sizeof(root_item)); + inode_item = _item.inode; + btrfs_set_stack_inode_generation(inode_item, 1); + btrfs_set_stack_inode_size(inode_item, 3); + btrfs_set_stack_inode_nlink(inode_item, 1); + btrfs_set_stack_inode_nbytes(inode_item, cfg->nodesize); + btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755); + btrfs_set_root_refs(_item, 1); + btrfs_set_root_used(_item, cfg->nodesize); + btrfs_set_root_generation(_item, 1); + btrfs_set_root_bytenr(_item, bytenr); + + memset(_key, 0, sizeof(disk_key)); + btrfs_set_disk_key_type(_key, BTRFS_ROOT_ITEM_KEY); + btrfs_set_disk_key_objectid(_key, objectid); + btrfs_set_disk_key_offset(_key, 0); + + btrfs_set_item_key(buf, _key, *slot); + btrfs_set_item_offset(buf, btrfs_item_nr(*slot), *itemoff); + btrfs_set_item_size(buf, btrfs_item_nr(*slot), sizeof(root_item)); + write_extent_buffer(buf, _item, + btrfs_item_ptr_offset(buf, *slot), + sizeof(root_item)); + (*slot)++; +} + +static int setup_temp_root_tree(int fd, struct btrfs_mkfs_config *cfg, + u64 root_bytenr, u64 extent_bytenr, + u64 dev_bytenr, u64 fs_bytenr, u64 csum_bytenr) +{ + struct extent_buffer *buf = NULL; + u32 itemoff = __BTRFS_LEAF_DATA_SIZE(cfg->nodesize); + int slot = 0; + int ret; + + /* + * Provided bytenr must in ascending order, or tree root will have a +* bad key order. +*/ + BUG_ON(!(root_bytenr < extent_bytenr && extent_bytenr < dev_bytenr && +dev_bytenr < fs_bytenr && fs_bytenr < csum_bytenr)); + buf = malloc(sizeof(*buf) + cfg->nodesize); + if (!buf) + return -ENOMEM; + + ret = setup_temp_extent_buffer(buf, cfg, root_bytenr, + BTRFS_ROOT_TREE_OB
Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots
On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote: > + if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) { > + mutex_unlock(>objectid_mutex); > + ret = -ENOSPC; ENOSPC ... I don't think it's right as this could be with a normal enospc during subvolume creation. The problem is that theh inode number space is exhausted, the closest error code I see is EOVERFLOW. As this is an ioctl we can afford to define the meaning of this return value as such (unlike for eg. creat()/open()). > + goto free_root_dev; > + } > + > + mutex_unlock(>objectid_mutex); > + > return 0; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 10/10] btrfs-progs: Allow open_ctree use backup tree root or search it automatically if primary tree root is corrupted.
On Tue, Jul 28, 2015 at 08:34:50AM +0800, Qu Wenruo wrote: Yes, you were against the automatic use of backup root or especially iteration all metadata space to find the latest tree root. I'll try to add a new option like --full-scan to enable the automatic search of all metadata space. But I'm afraid it may be a little late as I'm recently debugging other things like in-band deduplication. This is not anything urgent, I was going through my git branches with pending patches and found this. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 10/10] btrfs-progs: Allow open_ctree use backup tree root or search it automatically if primary tree root is corrupted.
On Mon, Jan 19, 2015 at 02:45:12PM +0800, Qu Wenruo wrote: Allow open_ctree to try its best to open tree root on damaged file system. With this patch, open_ctree will follow the below priority to read tree root, providing better chance to open damaged btrfs fs. 1) Using root bytenr in SB to read tree root Normal routine if not specified to use backup roots or specified root tree bytenr. 2) Using backup root if specified or 1) fails Backup will be automatically used without user specification if normal routine fails 3) Try to search possible tree root in all its metadata chunks The last chance, searching through all the metadata space. May takes a long time but still worth a try since above methods all fails. I don't see this patch merged and there are no folowups. IIRC I had objections against the automatic behaviour but cannot find the discussion. Can we merge at least some bit of that patch? It does more than one thing so it would be better to split it. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 10/10] btrfs-progs: Allow open_ctree use backup tree root or search it automatically if primary tree root is corrupted.
David Sterba wrote on 2015/07/27 17:04 +0200: On Mon, Jan 19, 2015 at 02:45:12PM +0800, Qu Wenruo wrote: Allow open_ctree to try its best to open tree root on damaged file system. With this patch, open_ctree will follow the below priority to read tree root, providing better chance to open damaged btrfs fs. 1) Using root bytenr in SB to read tree root Normal routine if not specified to use backup roots or specified root tree bytenr. 2) Using backup root if specified or 1) fails Backup will be automatically used without user specification if normal routine fails 3) Try to search possible tree root in all its metadata chunks The last chance, searching through all the metadata space. May takes a long time but still worth a try since above methods all fails. I don't see this patch merged and there are no folowups. IIRC I had objections against the automatic behaviour but cannot find the discussion. Can we merge at least some bit of that patch? It does more than one thing so it would be better to split it. I forgot this patchset again. I must admit that I have a bad memory... Yes, you were against the automatic use of backup root or especially iteration all metadata space to find the latest tree root. I'll try to add a new option like --full-scan to enable the automatic search of all metadata space. But I'm afraid it may be a little late as I'm recently debugging other things like in-band deduplication. Thanks Qu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root.
New function reset_one_root_csum() will reset all csum in one root. And reset_roots_csum() will reset all csum of all trees in tree root. which provides the basis for later dangerous options. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- cmds-check.c | 176 +++ 1 file changed, 176 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index e4b4f4a..535a518 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -9193,6 +9193,182 @@ out: return ret; } +static int reset_one_root_csum(struct btrfs_root *root) +{ + struct btrfs_fs_info *fs_info = root-fs_info; + struct btrfs_path *path; + struct btrfs_key key; + struct extent_buffer *node; + u32 sectorsize = root-sectorsize; + int max_level = btrfs_root_level(root-root_item); + int slot; + int i; + int ret = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + path-reada = 0; + + /* Iterate all levels except level 0 */ + for (i = 0; i max_level; i++) { + int cur_level = max_level - i; + + path-lowest_level = cur_level; + key.offset = 0; + key.objectid = 0; + key.type = 0; + + /* +* btrfs_search_slot() can't work with lowest_level and cow, +* so use inslen=0 and cow=0 and later modify will be done +* directly to disk. +*/ + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); + if (ret 0) + goto out; + + /* Iterate all node slots in this level */ + while (1) { + u64 bytenr; + + node = path-nodes[0]; + slot = path-slots[0]; + bytenr = btrfs_node_blockptr(node, slot); + + if (bytenr != round_down(bytenr, sectorsize)) { + bytenr = round_down(bytenr, sectorsize); + btrfs_set_node_blockptr(node, slot, bytenr); + ret = write_tree_block(NULL, root, node); + if (ret 0) { + fprintf(stderr, + Fail to write extent at %llu\n, + bytenr); + goto out; + } + } + ret = reset_tree_block_csum(fs_info, bytenr, + root-nodesize); + if (ret 0) { + fprintf(stderr, + Fail to reset csum for tree block at %llu\n, + bytenr); + goto out; + } + + ret = btrfs_next_slot(root, path, cur_level); + /* +* Error should not happen since higher level iteration +* has already reset the csum of this level. +* Either way, goto next level should be OK. +*/ + if (ret) + break; + } + btrfs_release_path(path); + } +out: + btrfs_free_path(path); + return ret; +} + +static int reset_roots_csum(struct btrfs_root *tree_root) +{ + struct btrfs_fs_info *fs_info = tree_root-fs_info; + struct btrfs_key key; + struct btrfs_path *path; + struct btrfs_root_item *ri; + struct btrfs_root *root; + struct extent_buffer *node; + u32 nodesize = tree_root-nodesize; + u32 sectorsize = tree_root-sectorsize; + u64 bytenr; + int slot; + int ret = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + key.objectid = 0; + key.offset = 0; + key.type = 0; + + /* +* Tree root is OK and we can do cow. But in case extent tree is +* corrupted, we still use the nocow method. +*/ + ret = btrfs_search_slot(NULL, tree_root, key, path, 0, 0); + if (ret 0) + goto out; + while (1) { + slot = path-slots[0]; + node = path-nodes[0]; + btrfs_item_key_to_cpu(node, key, slot); + if (key.type != BTRFS_ROOT_ITEM_KEY) + goto next; + + /* +* skip tree reloc tree, it's not support by +* btrfs_read_fs_root() yet. +*/ + if (key.objectid == BTRFS_TREE_RELOC_OBJECTID) + goto next; + + ri = btrfs_item_ptr(node, slot, struct btrfs_root_item); + bytenr
[PATCH 5/5] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root.
New function reset_one_root_csum() will reset all csum in one root. And reset_roots_csum() will reset all csum of all trees in tree root. which provides the basis for later dangerous options. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com --- cmds-check.c | 157 +++ 1 file changed, 157 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index e4b4f4a..2cdd3d9 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -9193,6 +9193,163 @@ out: return ret; } +static int reset_one_root_csum(struct btrfs_trans_handle *trans, + struct btrfs_root *root) +{ + struct btrfs_fs_info *fs_info = root-fs_info; + struct btrfs_path *path; + struct btrfs_key key; + struct extent_buffer *node; + u32 sectorsize = root-sectorsize; + int max_level = btrfs_root_level(root-root_item); + int slot; + int i; + int ret = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + path-reada = 0; + + /* Iterate all levels except level 0 */ + for (i = 0; i max_level; i++) { + int cur_level = max_level - i; + + path-lowest_level = cur_level; + key.offset = 0; + key.objectid = 0; + key.type = 0; + + ret = btrfs_search_slot(trans, root, key, path, 1, 1); + if (ret 0) + goto out; + + /* Iterate all node slots in this level */ + while (1) { + u64 bytenr; + + node = path-nodes[0]; + slot = path-slots[0]; + bytenr = btrfs_node_blockptr(node, slot); + + if (bytenr != round_down(bytenr, sectorsize)) { + bytenr = round_down(bytenr, sectorsize); + btrfs_set_node_blockptr(node, slot, bytenr); + btrfs_mark_buffer_dirty(node); + } + ret = reset_tree_block_csum(fs_info, bytenr, + root-nodesize); + if (ret 0) { + fprintf(stderr, + Fail to reset csum for tree block at %llu\n, + bytenr); + goto next_slot; + } +next_slot: + ret = btrfs_next_slot(root, path, cur_level); + /* +* Error should not happen since higher level iteration +* has already reset the csum of this level. +* Either way, goto next level should be OK. +*/ + if (ret) + break; + } + btrfs_release_path(path); + } +out: + btrfs_free_path(path); + return ret; +} + +static int reset_roots_csum(struct btrfs_trans_handle *trans, + struct btrfs_root *tree_root) +{ + struct btrfs_fs_info *fs_info = tree_root-fs_info; + struct btrfs_key key; + struct btrfs_path *path; + struct btrfs_root_item *ri; + struct btrfs_root *root; + struct extent_buffer *node; + u32 nodesize = tree_root-nodesize; + u32 sectorsize = tree_root-sectorsize; + u64 bytenr; + int slot; + int ret = 0; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + key.objectid = 0; + key.offset = 0; + key.type = 0; + + ret = btrfs_search_slot(trans, tree_root, key, path, 1, 0); + if (ret 0) + goto out; + while (1) { + slot = path-slots[0]; + node = path-nodes[0]; + btrfs_item_key_to_cpu(node, key, slot); + if (key.type != BTRFS_ROOT_ITEM_KEY) + goto next; + + /* +* skip tree reloc tree, it's not support by +* btrfs_read_fs_root() yet. +*/ + if (key.objectid == BTRFS_TREE_RELOC_OBJECTID) + goto next; + + ri = btrfs_item_ptr(node, slot, struct btrfs_root_item); + bytenr = btrfs_disk_root_bytenr(node, ri); + + /* +* Check if the bytenr is aligned. +* If the error bit is only in the low 12bits, no real damage +* will happen since we will align it. +*/ + if (round_down(bytenr, sectorsize) != bytenr) { + bytenr = round_down(bytenr, sectorsize); + btrfs_set_disk_root_bytenr(node, ri, bytenr); + btrfs_mark_buffer_dirty(node); + } + + ret
[PATCH v2 10/10] btrfs-progs: Allow open_ctree use backup tree root or search it automatically if primary tree root is corrupted.
Allow open_ctree to try its best to open tree root on damaged file system. With this patch, open_ctree will follow the below priority to read tree root, providing better chance to open damaged btrfs fs. 1) Using root bytenr in SB to read tree root Normal routine if not specified to use backup roots or specified root tree bytenr. 2) Using backup root if specified or 1) fails Backup will be automatically used without user specification if normal routine fails 3) Try to search possible tree root in all its metadata chunks The last chance, searching through all the metadata space. May takes a long time but still worth a try since above methods all fails. Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com Signed-off-by: David Sterba dste...@suse.cz --- disk-io.c | 104 +- 1 file changed, 96 insertions(+), 8 deletions(-) diff --git a/disk-io.c b/disk-io.c index 71ca31f..6673a99 100644 --- a/disk-io.c +++ b/disk-io.c @@ -35,6 +35,7 @@ #include utils.h #include print-tree.h #include rbtree-utils.h +#include find-root.h static int check_tree_block(struct btrfs_root *root, struct extent_buffer *buf) { @@ -894,9 +895,33 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr, blocksize = btrfs_level_size(root, btrfs_super_root_level(sb)); generation = btrfs_super_generation(sb); - if (!root_tree_bytenr !(flags OPEN_CTREE_BACKUP_ROOT)) { + /* +* If root bytenr is specified, use it and if fails, just return error, +* not to try other method. +*/ + if (root_tree_bytenr) { + root-node = read_tree_block(root, root_tree_bytenr, +blocksize, generation); + if (!extent_buffer_uptodate(root-node)) { + fprintf(stderr, Couldn't read tree root at %llu\n, + root_tree_bytenr); + return -EIO; + } else + goto extent_tree; + } + /* Normal root bytenr from super */ + if (!(flags OPEN_CTREE_BACKUP_ROOT)) { root_tree_bytenr = btrfs_super_root(sb); - } else if (flags OPEN_CTREE_BACKUP_ROOT) { + root-node = read_tree_block(root, root_tree_bytenr, +blocksize, generation); + if (!extent_buffer_uptodate(root-node)) { + fprintf(stderr, + Couldn't read tree root, try backup\n); + ret = -EAGAIN; + } else + goto extent_tree; + } + if ((flags OPEN_CTREE_BACKUP_ROOT) || ret == -EAGAIN) { struct btrfs_root_backup *backup; int index = find_best_backup_root(sb); if (index = BTRFS_NUM_BACKUP_ROOTS) { @@ -906,15 +931,78 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr, backup = fs_info-super_copy-super_roots + index; root_tree_bytenr = btrfs_backup_tree_root(backup); generation = btrfs_backup_tree_root_gen(backup); + root-node = read_tree_block(root, root_tree_bytenr, +blocksize, generation); + if (!extent_buffer_uptodate(root-node)) { + fprintf(stderr, + Couldn't read backup tree root, try searching tree root\n); + ret = -EAGAIN; + } else { + ret = 0; + goto extent_tree; + } } - root-node = read_tree_block(root, root_tree_bytenr, blocksize, -generation); - if (!extent_buffer_uptodate(root-node)) { - fprintf(stderr, Couldn't read tree root\n); - return -EIO; - } + /* Last chance, search the tree root */ + if (ret == -EAGAIN) { + struct btrfs_find_root_filter filter = {0}; + struct btrfs_find_root_gen_cache *gen_cache; + struct cache_tree result; + struct cache_extent *cache; + struct cache_extent *tree_block; + + filter.objectid = BTRFS_ROOT_TREE_OBJECTID; + cache_tree_init(result); + + printf(Searching tree root, may take some time\n); + ret = btrfs_find_root_search(fs_info-chunk_root, filter, +result, NULL); + if (ret 0) { + fprintf(stderr, Couldn't search tree root: %s\n, + strerror(-ret)); + btrfs_find_root_free(result); + return ret; + } + if (cache_tree_empty(result)) { + fprintf(stderr, Fail to find any tree
[SOLVED] btrfs unmountable: read block failed check_tree_block; Couldn't read tree root
For solution see http://article.gmane.org/gmane.comp.file-systems.btrfs/39974 Am 28.10.14 um 00:03 schrieb Ansgar Hockmann-Stolle: Am 27.10.14 um 14:23 schrieb Ansgar Hockmann-Stolle: Hi! My btrfs system partition went readonly. After reboot it doesnt mount anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250 GB SSD to some USB file and tried some btrfs tools on another copy per loopback device. But everything failed with: kernel: BTRFS: failed to read tree root on dm-2 See http://pastebin.com/raw.php?i=dPnU6nzg. Any hints where to go from here? After an offlist hint (thanks Tom!) I compiled the latest btrfs-progs 3.17 and tried some more ... linux:~/bin # ./btrfs --version Btrfs v3.17 linux:~/bin # ./btrfs-find-root /dev/sda3 Super think's the tree root is at 1015238656, chunk root 20971520 Well block 239718400 seems great, but generation doesn't match, have=661931, want=663595 level 0 Well block 239722496 seems great, but generation doesn't match, have=661931, want=663595 level 0 Well block 320098304 seems great, but generation doesn't match, have=662233, want=663595 level 0 Well block 879341568 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879345664 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879382528 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879398912 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879403008 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879423488 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879435776 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 880095232 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 881504256 seems great, but generation doesn't match, have=663228, want=663595 level 0 Well block 881512448 seems great, but generation doesn't match, have=663228, want=663595 level 0 Well block 936271872 seems great, but generation doesn't match, have=663397, want=663595 level 0 Well block 1004490752 seems great, but generation doesn't match, have=663571, want=663595 level 0 Well block 1007804416 seems great, but generation doesn't match, have=663572, want=663595 level 0 Well block 1012031488 seems great, but generation doesn't match, have=663575, want=663595 level 0 Well block 1012396032 seems great, but generation doesn't match, have=663575, want=663595 level 0 Well block 1012633600 seems great, but generation doesn't match, have=663586, want=663595 level 0 Well block 1012871168 seems great, but generation doesn't match, have=663585, want=663595 level 0 Well block 1015201792 seems great, but generation doesn't match, have=663588, want=663595 level 0 Well block 1015836672 seems great, but generation doesn't match, have=663596, want=663595 level 1 Well block 44132536320 seems great, but generation doesn't match, have=658774, want=663595 level 0 Well block 44178280448 seems great, but generation doesn't match, have=658774, want=663595 level 0 Well block 87443644416 seems great, but generation doesn't match, have=661349, want=663595 level 0 Well block 87514079232 seems great, but generation doesn't match, have=651051, want=663595 level 0 Well block 87517679616 seems great, but generation doesn't match, have=661349, want=663595 level 0 Well block 98697822208 seems great, but generation doesn't match, have=643548, want=663595 level 0 Well block 103285026816 seems great, but generation doesn't match, have=661672, want=663595 level 0 Well block 103309553664 seems great, but generation doesn't match, have=661674, want=663595 level 0 Well block 103523430400 seems great, but generation doesn't match, have=661767, want=663595 level 0 No more metdata to scan, exiting This line I found interesting because have is want + 1: Well block 1015836672 seems great, but generation doesn't match, have=663596, want=663595 level 1 And here the tail of btrfs rescue chunk-recover (full output at http://pastebin.com/raw.php?i=1D5VgDxv) [..] Total Chunks:234 Heathy:231 Bad:3 Orphan Block Groups: Orphan Device Extents: Couldn't map the block 1015238656 btrfs: volumes.c:1140: btrfs_num_copies: Assertion `!(ce-start logical || ce-start + ce-size logical)' failed. Aborted Sadly btrfs check --repair keep up refusing to do its job. linux:~ # btrfs check --repair /dev/sda3 enabling repair mode Check tree block failed, want=1015238656, have=0 Check tree block failed, want=1015238656, have=0 Check tree block failed, want=1015238656, have=0 Check tree block failed, want=1015238656, have=0 Check tree block failed, want=1015238656, have=0 read block failed check_tree_block Couldn't read tree root Checking filesystem on /dev/sda3 UUID: 1af256b5-b1ad-443b-aeee-a6853e70b7e2
Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root
Duncan 1i5t5.duncan at cox.net writes: [..] Hope that helps! =:^) Thanks a lot for that many hints! Unfortunately, btrfs restore does not find the tree root and so it does not find anything. I will wait for Qu Wenruo to enhance chunk-recovering. And in the meantime I will test openSUSE 13.2-RC1 by installing a fresh copy of it. I have the dd dump and a working copy of it to test. And backups ... I told each friend I helped in the last years to make regular backups but me ... ;-) Thanks, Ansgar -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs unmountable: read block failed check_tree_block; Couldn't read tree root
Hi! My btrfs system partition went readonly. After reboot it doesnt mount anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250 GB SSD to some USB file and tried some btrfs tools on another copy per loopback device. But everything failed with: kernel: BTRFS: failed to read tree root on dm-2 See http://pastebin.com/raw.php?i=dPnU6nzg. Any hints where to go from here? Ciao Ansgar -- Ansgar Hockmann-Stolle, Universität Osnabrück, Rechenzentrum Albrechtstraße 28, 49076 Osnabrück, Deutschland, Raum 31/E77B +49 541 969-2749 (fax -2470), http://www.home.uos.de/anshockm -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root
Am 27.10.14 um 14:23 schrieb Ansgar Hockmann-Stolle: Hi! My btrfs system partition went readonly. After reboot it doesnt mount anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250 GB SSD to some USB file and tried some btrfs tools on another copy per loopback device. But everything failed with: kernel: BTRFS: failed to read tree root on dm-2 See http://pastebin.com/raw.php?i=dPnU6nzg. Any hints where to go from here? After an offlist hint (thanks Tom!) I compiled the latest btrfs-progs 3.17 and tried some more ... linux:~/bin # ./btrfs --version Btrfs v3.17 linux:~/bin # ./btrfs-find-root /dev/sda3 Super think's the tree root is at 1015238656, chunk root 20971520 Well block 239718400 seems great, but generation doesn't match, have=661931, want=663595 level 0 Well block 239722496 seems great, but generation doesn't match, have=661931, want=663595 level 0 Well block 320098304 seems great, but generation doesn't match, have=662233, want=663595 level 0 Well block 879341568 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879345664 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879382528 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879398912 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879403008 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879423488 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879435776 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 880095232 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 881504256 seems great, but generation doesn't match, have=663228, want=663595 level 0 Well block 881512448 seems great, but generation doesn't match, have=663228, want=663595 level 0 Well block 936271872 seems great, but generation doesn't match, have=663397, want=663595 level 0 Well block 1004490752 seems great, but generation doesn't match, have=663571, want=663595 level 0 Well block 1007804416 seems great, but generation doesn't match, have=663572, want=663595 level 0 Well block 1012031488 seems great, but generation doesn't match, have=663575, want=663595 level 0 Well block 1012396032 seems great, but generation doesn't match, have=663575, want=663595 level 0 Well block 1012633600 seems great, but generation doesn't match, have=663586, want=663595 level 0 Well block 1012871168 seems great, but generation doesn't match, have=663585, want=663595 level 0 Well block 1015201792 seems great, but generation doesn't match, have=663588, want=663595 level 0 Well block 1015836672 seems great, but generation doesn't match, have=663596, want=663595 level 1 Well block 44132536320 seems great, but generation doesn't match, have=658774, want=663595 level 0 Well block 44178280448 seems great, but generation doesn't match, have=658774, want=663595 level 0 Well block 87443644416 seems great, but generation doesn't match, have=661349, want=663595 level 0 Well block 87514079232 seems great, but generation doesn't match, have=651051, want=663595 level 0 Well block 87517679616 seems great, but generation doesn't match, have=661349, want=663595 level 0 Well block 98697822208 seems great, but generation doesn't match, have=643548, want=663595 level 0 Well block 103285026816 seems great, but generation doesn't match, have=661672, want=663595 level 0 Well block 103309553664 seems great, but generation doesn't match, have=661674, want=663595 level 0 Well block 103523430400 seems great, but generation doesn't match, have=661767, want=663595 level 0 No more metdata to scan, exiting This line I found interesting because have is want + 1: Well block 1015836672 seems great, but generation doesn't match, have=663596, want=663595 level 1 And here the tail of btrfs rescue chunk-recover (full output at http://pastebin.com/raw.php?i=1D5VgDxv) [..] Total Chunks: 234 Heathy: 231 Bad: 3 Orphan Block Groups: Orphan Device Extents: Couldn't map the block 1015238656 btrfs: volumes.c:1140: btrfs_num_copies: Assertion `!(ce-start logical || ce-start + ce-size logical)' failed. Aborted Sadly btrfs check --repair keep up refusing to do its job. linux:~ # btrfs check --repair /dev/sda3 enabling repair mode Check tree block failed, want=1015238656, have=0 Check tree block failed, want=1015238656, have=0 Check tree block failed, want=1015238656, have=0 Check tree block failed, want=1015238656, have=0 Check tree block failed, want=1015238656, have=0 read block failed check_tree_block Couldn't read tree root Checking filesystem on /dev/sda3 UUID: 1af256b5-b1ad-443b-aeee-a6853e70b7e2 Critical roots corrupted, unable to fsck the FS Segmentation fault Any more hints? Ciao
Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root
Ansgar Hockmann-Stolle posted on Mon, 27 Oct 2014 14:23:19 +0100 as excerpted: Hi! My btrfs system partition went readonly. After reboot it doesnt mount anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250 GB SSD to some USB file and tried some btrfs tools on another copy per loopback device. But everything failed with: kernel: BTRFS: failed to read tree root on dm-2 See http://pastebin.com/raw.php?i=dPnU6nzg. Any hints where to go from here? Good job posting initial problem information. =:^) A lot of folks take 2-3 rounds of request and reply before that much info is available on the problem. While others may be able to assist you in restoring that filesystem to working condition, my focus is more on recovering what can be recovered from it and doing a fresh mkfs. System partition, 250 GB, looks to be just under 231 GiB based on the total bytes from btrfs-show-super. How recent is your backup, and/or being a system partition, is it simply the distro installation, possibly without too much customization, thus easily reinstalled? IOW, if you were to call that partition a total loss and simply mkfs it, would you lose anything real valuable that's not backed up? (Of course, the standard lecture at this point is that if it's not backed up, by definition you didn't consider it valuable enough to be worth the hassle, so by definition it's not valuable and you can simply blow it away, but...) If you're in good shape in that regard, that's what I'd probably do at this point, keeping the dd image you made in case someone's interested in tracking the problem down and making btrfs handle that case. If there's important files on there that you don't have backed up, or if you have a backup but it's older than you'd like and you want to try to recover current versions of what you can (the situation I was in a few months ago), then btrfs restore is what you're interested in. Restore works on an /unmounted/ (and potentially unmountable, as here) filesystem, letting you retrieve files from it and copy them to other filesystems. It does NOT write anything to the damaged filesystem itself, so no worries about making the problem worse. There's a page on the wiki describing how to use btrfs restore along with btrfs-find-root in some detail, definitely more than is in the manpages or that I want to do here. https://btrfs.wiki.kernel.org/index.php/Restore Some useful hints that weren't originally clear to me as I used that page here: * Generation and transid are the same thing, a sequentially increasing number that updates every time the root tree is written. The generation recorded in your superblocks (from btrfs-show-super) is 663595, so the idea would be that generation/transid, falling back one to 663594 if 95 isn't usable, then 93, then... etc. The lower the number the further back in history you're going, so obviously, you want the closest to 663595 that you can get, that still gives you access to a (nearly) whole filesystem, or at least the parts of it you are interested in. * That page was written before restore's -D/--dry-run option was available. This option can be quite helpful, and I recommend using it to see what will actually be restored at each generation and associated tree root (bytenr/byte-number). Tho (with -v/verbose) the list of files restored will normally be too long to go thru in detail, you can either scan it or pipe the output to wc -l to get a general idea of how many files would be restored. * Restore's -l/list-tree-roots option isn't listed on the page either. btrfs restore -l -t bytenr can be quite useful, giving you a nice list of trees available for the generation corresponding to that bytenr (as found using btrfs-find-root). This is where the page's advice to pick the latest tree root with all or as many as possible of the filesystem trees in it, comes in, since this lets you easily see which trees each root has available. * I don't use snapshots or subvolumes here, while I understand OpenSuSE uses them rather heavily (via snapper). Thus I have no direct experience with restore's snapshot-related options. Presumably you can either ignore the snapshots (the apparent default) or restore them either in general (using -s) or selectively (using -r, with the appropriate snapshot rootid). * It's worth noting that restore simply lets you retrieve files. It does *NOT* retrieve file ownership or permissions, with the restored files all being owned by the user you ran btrfs restore under (presumably root), with $UMASK permissions. You'll have to restore ownership and permissions manually. When I used restore here I had a backup, but the backup was old. So I hacked up a bash scriptlet with a for loop, that went thru all the restored files recursively, comparing them against the old backup. If the file existed
Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root
Original Message Subject: Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root From: Qu Wenruo quwen...@cn.fujitsu.com To: Ansgar Hockmann-Stolle ansgar.hockmann-sto...@uni-osnabrueck.de, linux-btrfs@vger.kernel.org Date: 2014年10月28日 09:05 Original Message Subject: Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root From: Ansgar Hockmann-Stolle ansgar.hockmann-sto...@uni-osnabrueck.de To: linux-btrfs@vger.kernel.org Date: 2014年10月28日 07:03 Am 27.10.14 um 14:23 schrieb Ansgar Hockmann-Stolle: Hi! My btrfs system partition went readonly. After reboot it doesnt mount anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250 GB SSD to some USB file and tried some btrfs tools on another copy per loopback device. But everything failed with: kernel: BTRFS: failed to read tree root on dm-2 See http://pastebin.com/raw.php?i=dPnU6nzg. Any hints where to go from here? After an offlist hint (thanks Tom!) I compiled the latest btrfs-progs 3.17 and tried some more ... linux:~/bin # ./btrfs --version Btrfs v3.17 linux:~/bin # ./btrfs-find-root /dev/sda3 Super think's the tree root is at 1015238656, chunk root 20971520 Well block 239718400 seems great, but generation doesn't match, have=661931, want=663595 level 0 Well block 239722496 seems great, but generation doesn't match, have=661931, want=663595 level 0 Well block 320098304 seems great, but generation doesn't match, have=662233, want=663595 level 0 Well block 879341568 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879345664 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879382528 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879398912 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879403008 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879423488 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 879435776 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 880095232 seems great, but generation doesn't match, have=663227, want=663595 level 0 Well block 881504256 seems great, but generation doesn't match, have=663228, want=663595 level 0 Well block 881512448 seems great, but generation doesn't match, have=663228, want=663595 level 0 Well block 936271872 seems great, but generation doesn't match, have=663397, want=663595 level 0 Well block 1004490752 seems great, but generation doesn't match, have=663571, want=663595 level 0 Well block 1007804416 seems great, but generation doesn't match, have=663572, want=663595 level 0 Well block 1012031488 seems great, but generation doesn't match, have=663575, want=663595 level 0 Well block 1012396032 seems great, but generation doesn't match, have=663575, want=663595 level 0 Well block 1012633600 seems great, but generation doesn't match, have=663586, want=663595 level 0 Well block 1012871168 seems great, but generation doesn't match, have=663585, want=663595 level 0 Well block 1015201792 seems great, but generation doesn't match, have=663588, want=663595 level 0 Well block 1015836672 seems great, but generation doesn't match, have=663596, want=663595 level 1 Well block 44132536320 seems great, but generation doesn't match, have=658774, want=663595 level 0 Well block 44178280448 seems great, but generation doesn't match, have=658774, want=663595 level 0 Well block 87443644416 seems great, but generation doesn't match, have=661349, want=663595 level 0 Well block 87514079232 seems great, but generation doesn't match, have=651051, want=663595 level 0 Well block 87517679616 seems great, but generation doesn't match, have=661349, want=663595 level 0 Well block 98697822208 seems great, but generation doesn't match, have=643548, want=663595 level 0 Well block 103285026816 seems great, but generation doesn't match, have=661672, want=663595 level 0 Well block 103309553664 seems great, but generation doesn't match, have=661674, want=663595 level 0 Well block 103523430400 seems great, but generation doesn't match, have=661767, want=663595 level 0 No more metdata to scan, exiting This line I found interesting because have is want + 1: Well block 1015836672 seems great, but generation doesn't match, have=663596, want=663595 level 1 And here the tail of btrfs rescue chunk-recover (full output at http://pastebin.com/raw.php?i=1D5VgDxv) [..] Total Chunks:234 Heathy:231 Bad:3 Orphan Block Groups: Orphan Device Extents: Couldn't map the block 1015238656 btrfs: volumes.c:1140: btrfs_num_copies: Assertion `!(ce-start logical || ce-start + ce-size logical)' failed. Aborted After looking into the 3 bad chunks, it turns
[PATCH 07/12] Btrfs-progs: re-search tree root if it changes
If we change something while scanning fs-roots we need to redo our search so that we get valid root items and have valid root cache. Thanks, Signed-off-by: Josef Bacik jba...@fb.com --- cmds-check.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index e81a26c..9522077 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -2171,7 +2171,7 @@ static int check_fs_roots(struct btrfs_root *root, struct btrfs_path path; struct btrfs_key key; struct walk_control wc; - struct extent_buffer *leaf; + struct extent_buffer *leaf, *tree_node; struct btrfs_root *tmp_root; struct btrfs_root *tree_root = root-fs_info-tree_root; int ret; @@ -2186,13 +2186,19 @@ static int check_fs_roots(struct btrfs_root *root, memset(wc, 0, sizeof(wc)); cache_tree_init(wc.shared); btrfs_init_path(path); - +again: key.offset = 0; key.objectid = 0; key.type = BTRFS_ROOT_ITEM_KEY; ret = btrfs_search_slot(NULL, tree_root, key, path, 0, 0); BUG_ON(ret 0); + tree_node = tree_root-node; while (1) { + if (tree_node != tree_root-node) { + free_root_recs_tree(root_cache); + btrfs_release_path(path); + goto again; + } leaf = path.nodes[0]; if (path.slots[0] = btrfs_header_nritems(leaf)) { ret = btrfs_next_leaf(tree_root, path); -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/12] btrfs: handle errors from reading the quota tree root
On Fri, Aug 01, 2014 at 06:12:37PM -0500, Eric Sandeen wrote: Reading the quota tree root may fail with ENOENT if there is no quota, which is fine, but the code was ignoring every other error as well, which is not fine. Kinda makes you want to write a test that would have caught this. Kinda. Also, if you're still keen to iterate on this series, it looks like this pattern is copied and pasted a few times in open_ctree(). With temporary root pointers for each block, for some reason. A little helper function could take a bite out of open_ctree(). - z -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/12] btrfs: handle errors from reading the quota tree root
On 8/4/14, 1:35 PM, Zach Brown wrote: On Fri, Aug 01, 2014 at 06:12:37PM -0500, Eric Sandeen wrote: Reading the quota tree root may fail with ENOENT if there is no quota, which is fine, but the code was ignoring every other error as well, which is not fine. Kinda makes you want to write a test that would have caught this. Kinda. /me looks at ground, shuffles feet ... Also, if you're still keen to iterate on this series, it looks like this pattern is copied and pasted a few times in open_ctree(). With temporary root pointers for each block, for some reason. A little helper function could take a bite out of open_ctree(). Hm, the uuid tree is roughly similar, but not exactly. I think those are the only 2 optional roots (uuid because it'll get regenerated). I'm guessing the temporary root pointer is so we don't ever assign a PTR_ERR to the root in fs_info? -Eric -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/12] btrfs: handle errors from reading the quota tree root
On Mon, Aug 04, 2014 at 01:42:23PM -0500, Eric Sandeen wrote: On 8/4/14, 1:35 PM, Zach Brown wrote: On Fri, Aug 01, 2014 at 06:12:37PM -0500, Eric Sandeen wrote: Reading the quota tree root may fail with ENOENT if there is no quota, which is fine, but the code was ignoring every other error as well, which is not fine. Kinda makes you want to write a test that would have caught this. Kinda. /me looks at ground, shuffles feet ... Also, if you're still keen to iterate on this series, it looks like this pattern is copied and pasted a few times in open_ctree(). With temporary root pointers for each block, for some reason. A little helper function could take a bite out of open_ctree(). Hm, the uuid tree is roughly similar, but not exactly. I think those are the only 2 optional roots (uuid because it'll get regenerated). I'm guessing the temporary root pointer is so we don't ever assign a PTR_ERR to the root in fs_info? It took me a while to see what you meant. Yeah, using a temporary root makes sense. Using a different one for each block makes less sense. a = f(A); if (a) goto out; info-a = a; b = f(B); if (b) goto out; info-b = b; vs. r = f(A); if (r) goto out; info-a = r; r = f(B); if (r) goto out; info-b = r; - z -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/12] btrfs: handle errors from reading the quota tree root
On 8/4/14, 1:51 PM, Zach Brown wrote: On Mon, Aug 04, 2014 at 01:42:23PM -0500, Eric Sandeen wrote: On 8/4/14, 1:35 PM, Zach Brown wrote: On Fri, Aug 01, 2014 at 06:12:37PM -0500, Eric Sandeen wrote: Reading the quota tree root may fail with ENOENT if there is no quota, which is fine, but the code was ignoring every other error as well, which is not fine. Kinda makes you want to write a test that would have caught this. Kinda. /me looks at ground, shuffles feet ... Also, if you're still keen to iterate on this series, it looks like this pattern is copied and pasted a few times in open_ctree(). With temporary root pointers for each block, for some reason. A little helper function could take a bite out of open_ctree(). Hm, the uuid tree is roughly similar, but not exactly. I think those are the only 2 optional roots (uuid because it'll get regenerated). I'm guessing the temporary root pointer is so we don't ever assign a PTR_ERR to the root in fs_info? It took me a while to see what you meant. Yeah, using a temporary root makes sense. Using a different one for each block makes less sense. snip - z Yeah, fair enough, I thought about that after I hit send ;) I could send a V2 of patch 11/12 to do that w/o needing to redo the series too much. :) I'll see if there are any other comments. Thanks! -Eric -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/12] btrfs: handle errors from reading the quota tree root
Reading the quota tree root may fail with ENOENT if there is no quota, which is fine, but the code was ignoring every other error as well, which is not fine. Signed-off-by: Eric Sandeen sand...@redhat.com --- fs/btrfs/disk-io.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e6746be..28d35a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2733,7 +2733,12 @@ retry_root_backup: location.objectid = BTRFS_QUOTA_TREE_OBJECTID; quota_root = btrfs_read_tree_root(tree_root, location); - if (!IS_ERR(quota_root)) { + if (IS_ERR(quota_root)) { + ret = PTR_ERR(quota_root); + /* It's fine to not have quotas */ + if (ret != -ENOENT) + goto recovery_tree_root; + } else { set_bit(BTRFS_ROOT_TRACK_DIRTY, quota_root-state); fs_info-quota_enabled = 1; fs_info-pending_quota_state = 1; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] Btrfs-progs: fsck: force to udate tree root for some cases
commit roots won't update root item in tree root if it finds updated root's bytenr is same as before. However, this is not right for fsck, we need update tree root in the following case: 1.overwrite previous root node. 2.reinit reloc data tree, this is because we skip pin relo data tree before which means we can allocate same block as before. Fix this by updating tree root ourselves for the above cases. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- cmds-check.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/cmds-check.c b/cmds-check.c index 4b2a8f0..ae611d1 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -5906,6 +5906,7 @@ static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans, struct extent_buffer *c; struct extent_buffer *old = root-node; int level; + int ret; struct btrfs_disk_key disk_key = {0,0,0}; level = 0; @@ -5922,6 +5923,7 @@ static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans, if (IS_ERR(c)) { c = old; extent_buffer_get(c); + overwrite = 1; } init: memset_extent_buffer(c, 0, 0, sizeof(struct btrfs_header)); @@ -5939,7 +5941,26 @@ init: BTRFS_UUID_SIZE); btrfs_mark_buffer_dirty(c); - + /* +* this case can happen in the following case: +* +* 1.overwrite previous root. +* +* 2.reinit reloc data root, this is because we skip pin +* down reloc data tree before which means we can allocate +* same block bytenr here. +*/ + if (old-start == c-start) { + btrfs_set_root_generation(root-root_item, + trans-transid); + root-root_item.level = btrfs_header_level(root-node); + ret = btrfs_update_root(trans, root-fs_info-tree_root, + root-root_key, root-root_item); + if (ret) { + free_extent_buffer(c); + return ret; + } + } free_extent_buffer(old); root-node = c; add_root_to_dirty_list(root); -- 1.9.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix wrong search path initialization before searching tree root
From: Wang Shilong wangsl.f...@cn.fujitsu.com To search tree root without transaction protection, we should neither search commit root nor skip locking here, fix it. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- fs/btrfs/send.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 54665d4..14f9693 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -2095,7 +2095,7 @@ static int send_subvol_begin(struct send_ctx *sctx) char *name = NULL; int namelen; - path = alloc_path_for_send(); + path = btrfs_alloc_path(); if (!path) return -ENOMEM; -- 1.8.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: look up the containing tree root id
Find the tree id of the containing subvolume for a given file or directory. For subvolume return it's own id. $ btrfs inspect-internal rootid path Signed-off-by: David Sterba dste...@suse.cz --- cmds-inspect.c | 38 ++ man/btrfs.8.in |9 + utils.c| 30 ++ utils.h|2 ++ 4 files changed, 79 insertions(+), 0 deletions(-) diff --git a/cmds-inspect.c b/cmds-inspect.c index bdebf7d..f0c8e3d 100644 --- a/cmds-inspect.c +++ b/cmds-inspect.c @@ -301,6 +301,43 @@ out: return ret ? 1 : 0; } +static const char* const cmd_rootid_usage[] = { + btrfs inspect-internal rootid path, + Get tree ID of the containing subvolume of path., + NULL +}; + +static int cmd_rootid(int argc, char **argv) +{ + int ret; + int fd = -1; + u64 rootid; + DIR *dirstream = NULL; + + if (check_argc_exact(argc, 2)) + usage(cmd_rootid_usage); + + fd = open_file_or_dir(argv[1], dirstream); + if (fd 0) { + fprintf(stderr, ERROR: can't access '%s'\n, argv[1]); + ret = -ENOENT; + goto out; + } + + ret = lookup_ino_rootid(fd, rootid); + if (ret) { + fprintf(stderr, %s: rootid failed with ret=%d\n, + argv[0], ret); + goto out; + } + + printf(%llu\n, (unsigned long long)rootid); +out: + close_file_or_dir(fd, dirstream); + + return !!ret; +} + const struct cmd_group inspect_cmd_group = { inspect_cmd_group_usage, NULL, { { inode-resolve, cmd_inode_resolve, cmd_inode_resolve_usage, @@ -309,6 +346,7 @@ const struct cmd_group inspect_cmd_group = { cmd_logical_resolve_usage, NULL, 0 }, { subvolid-resolve, cmd_subvolid_resolve, cmd_subvolid_resolve_usage, NULL, 0 }, + { rootid, cmd_rootid, cmd_rootid_usage, NULL, 0 }, NULL_CMD_STRUCT } }; diff --git a/man/btrfs.8.in b/man/btrfs.8.in index 0b69ce2..be4a6b2 100644 --- a/man/btrfs.8.in +++ b/man/btrfs.8.in @@ -80,6 +80,8 @@ btrfs \- control a btrfs filesystem .PP \fBbtrfs\fP \fBinspect-internal subvolid-resolve\fP \fIsubvolid\fP \fIpath\fP .PP +\fBbtrfs\fP \fBinspect-internal rootid\fP \fIpath\fP +.PP .PP \fBbtrfs\fP \fBsend\fP [-v] [-p \fIparent\fP] [-c \fIclone-src\fP] [-f \fIoutfile\fP] \fIsubvol\fP .PP @@ -582,6 +584,13 @@ not enough to read all the resolved results. The max value one can set is 64k. Get file system paths for the given subvolume ID. .TP +\fBinspect-internal rootid\fP \fIpath\fP +For a given file or directory, return the containing tree root id. For a +subvolume return it's own tree id. + +The result is undefined for the so-called empty subvolumes (identified by inode number 2). +.TP + \fBsend\fP [-v] [-p \fIparent\fP] [-c \fIclone-src\fP] [-f \fIoutfile\fP] \fIsubvol\fP Send the subvolume to stdout. Sends the subvolume specified by \fIsubvol\fR to stdout. diff --git a/utils.c b/utils.c index 5fa193b..8aecc4a 100644 --- a/utils.c +++ b/utils.c @@ -1975,3 +1975,33 @@ int ask_user(char *question) (answer = strtok_r(buf, \t\n\r, saveptr)) (!strcasecmp(answer, yes) || !strcasecmp(answer, y)); } + +/* + * For a given: + * - file or directory return the containing tree root id + * - subvolume return it's own tree id + * - BTRFS_EMPTY_SUBVOL_DIR_OBJECTID (directory with ino == 2) the result is + * undefined and function returns -1 + */ +int lookup_ino_rootid(int fd, u64 *rootid) +{ + struct btrfs_ioctl_ino_lookup_args args; + int ret; + int e; + + memset(args, 0, sizeof(args)); + args.treeid = 0; + args.objectid = BTRFS_FIRST_FREE_OBJECTID; + + ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, args); + e = errno; + if (ret) { + fprintf(stderr, ERROR: Failed to lookup root id - %s\n, + strerror(e)); + return ret; + } + + *rootid = args.treeid; + + return 0; +} diff --git a/utils.h b/utils.h index fdef3f0..19f028f 100644 --- a/utils.h +++ b/utils.h @@ -81,4 +81,6 @@ int is_vol_small(char *file); int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf, int verify); int ask_user(char *question); +int lookup_ino_rootid(int fd, u64 *rootid); + #endif -- 1.7.9 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/17] Btrfs: fix accessing a freed tree root
inode_tree_del() will move the tree root into the dead root list, and then the tree will be destroyed by the cleaner. So if we remove the delayed node which is cached in the inode after inode_tree_del(), we may access a freed tree root. Fix it. Signed-off-by: Miao Xie mi...@cn.fujitsu.com --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1669c3b..7f6e78a 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4723,6 +4723,7 @@ void btrfs_evict_inode(struct inode *inode) btrfs_end_transaction(trans, root); btrfs_btree_balance_dirty(root); no_delete: + btrfs_remove_delayed_node(inode); clear_inode(inode); return; } @@ -7978,7 +7979,6 @@ void btrfs_destroy_inode(struct inode *inode) inode_tree_del(inode); btrfs_drop_extent_cache(inode, 0, (u64)-1, 0); free: - btrfs_remove_delayed_node(inode); call_rcu(inode-i_rcu, btrfs_i_callback); } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tree root
On Wed, Oct 03, 2012 at 06:35:00PM +0200, Øystein Sættem Middelthun wrote: Well block 14102764707840 seems great, but generation doesn't match, have=109268, want=109269 Because the filesystem was last in use with a pre 3.2-kernel I am unable to use mount -o recovery, but restore seems to work when I specify the previous tree-root. My problem is however that the btrfs is so large I have nowhere to temporarily put all the files. I am currently running kernel 3.5. Does mount have an option to manually tell it to use the tree root at block 14102764707840? It's not possible to supply the tree_root via mount option, but it should be possible to modify the superblock to point the tree_root to the one you've found, in a similar way that the -o recovery does that, but this hasn't been done before and minor details may make it complicated (like getting the transaction number in sync, as the tree_root has latest-1). david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
tree root
Hi! I have a broken btrfs unable to mount because it is unable to find the tree root. Using find-root I find the following: Well block 14102764707840 seems great, but generation doesn't match, have=109268, want=109269 Because the filesystem was last in use with a pre 3.2-kernel I am unable to use mount -o recovery, but restore seems to work when I specify the previous tree-root. My problem is however that the btrfs is so large I have nowhere to temporarily put all the files. I am currently running kernel 3.5. Does mount have an option to manually tell it to use the tree root at block 14102764707840? -- Best regards Øystein Middelthun -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tree root
On Wed, Oct 3, 2012 at 11:35 AM, Øystein Sættem Middelthun oyst...@middelthun.no wrote: Hi! I have a broken btrfs unable to mount because it is unable to find the tree root. Using find-root I find the following: Well block 14102764707840 seems great, but generation doesn't match, have=109268, want=109269 Because the filesystem was last in use with a pre 3.2-kernel I am unable to use mount -o recovery, but restore seems to work when I specify the previous tree-root. My problem is however that the btrfs is so large I have nowhere to temporarily put all the files. I am currently running kernel 3.5. Does mount have an option to manually tell it to use the tree root at block 14102764707840? If you do not have a suitable backup for these files, please make an effort to do what you can with restore. Some of the repair methods out there have a possibility to make the situation worse. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tree root
On 10/03/2012 07:29 PM, Mitch Harder wrote: If you do not have a suitable backup for these files, please make an effort to do what you can with restore. Some of the repair methods out there have a possibility to make the situation worse. We are talking about something like 50TB, so there is just no way I have the available space on other disks for temporary storage. So in effect you are saying that there are no other available options than a restore? If I understand correctly a feature along the lines of mount -o tree_root=14102764707840 /dev/ /path/ would solve my problem. The fs is unmountable because of a temporary loss of connection with an underlying disk controller, and I don't think the device has a lot of errors besides not being able to find the latest tree root. -- Best regards Øystein Middelthun -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: tree root
On Wed, Oct 3, 2012 at 5:11 PM, Øystein Sættem Middelthun oyst...@middelthun.no wrote: On 10/03/2012 07:29 PM, Mitch Harder wrote: If you do not have a suitable backup for these files, please make an effort to do what you can with restore. Some of the repair methods out there have a possibility to make the situation worse. We are talking about something like 50TB, so there is just no way I have the available space on other disks for temporary storage. So in effect you are saying that there are no other available options than a restore? If I understand correctly a feature along the lines of mount -o tree_root=14102764707840 /dev/ /path/ would solve my problem. The fs is unmountable because of a temporary loss of connection with an underlying disk controller, and I don't think the device has a lot of errors besides not being able to find the latest tree root. You should probably try to supply some more information about your situation. Was this btrfs volume build with RAID-1? If so, we should be able to mount in degraded mode. Even so, when I see the words unable to find the tree root and temporary loss of connection with an underlying disk controller along with the implication that you have no reliable backup of this data, I worry that your situation is potentially precarious. The possibility exists that recovering your data is your best option (as opposed to restoring to previous working condition). Using backup tree-roots and super-blocks has the potential to do irreversible damage. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Failed tree root
Hi! After a short loss of contact with an underlying disk controller my btrfs partition is unmountable. dmesg provides the following information: [356850.853787] btrfs bad tree block start 0 14102771924992 [356850.853808] btrfs: failed to read tree root on dm-0 [356850.859218] btrfs: open_ctree failed Is this type of error known to be fixable by the unstable btrfsck (20120328)? I am currently running kernel 3.2.0-25. -- Best regards Øystein Middelthun -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Seg fault when csum-tree root is corrupted.
From: Anand Jain anand.j...@oracle.com Segmentation fault with the following trace when csum-tree is deliberately corrupted using btrfs-corrupt-block read block failed check_tree_block Couldn't setup csum tree checking extents Check tree block failed, want=29376512, have=0 :: read block failed check_tree_block !! Segmentation fault (core dumped) --- !! The below fix will redirect btrfsck user to use --init-csum-tree when csum root is corrupted --- btrfsck.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/btrfsck.c b/btrfsck.c index 7aac736..6ced7b5 100644 --- a/btrfsck.c +++ b/btrfsck.c @@ -3483,6 +3483,10 @@ int main(int ac, char **av) fprintf(stderr, Critical roots corrupted, unable to fsck the FS\n); return -EIO; } + if (!extent_buffer_uptodate(info-csum_root-node) !init_csum_tree) { + fprintf(stderr, Checksum root corrupted, run 'btrfsck --init-csum-tree'\n); + return -EIO; + } root = info-fs_root; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html