Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-13 Thread David Sterba
On Mon, Nov 05, 2018 at 04:36:34PM +, Filipe Manana wrote:
> On Mon, Nov 5, 2018 at 4:34 PM David Sterba  wrote:
> > On Mon, Nov 05, 2018 at 04:30:35PM +, Filipe Manana wrote:
> > > On Mon, Nov 5, 2018 at 4:29 PM David Sterba  wrote:
> > > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > > > > Ah ok makes sense.  Well in that case lets just make 
> > > > > > btrfs_read_locked_inode()
> > > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > > > > >
> > > > > > if (path != in_path)
> > > > >
> > > > > You mean the following on top of v4:
> > > > >
> > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> > > > >
> > > > > Not much different, just saves one such if statement. I'm ok with 
> > > > > that.
> > > >
> > > > Now in misc-next with v4 and the friendpaste incremental as
> > > >
> > > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
> > >
> > > Please don't add the incremental. It's buggy. It was meant to figure
> > > out what Josef was saying. That's why I haven't sent a V5.
> >
> > Ok dropped, I'll will wait for a proper patch.
> 
> It's V4, the last sent version. Just forget the incremental.
> Thanks.

For the record, V4 has been merged to master in 4.20-rc2.


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-05 Thread Filipe Manana
On Mon, Nov 5, 2018 at 4:34 PM David Sterba  wrote:
>
> On Mon, Nov 05, 2018 at 04:30:35PM +, Filipe Manana wrote:
> > On Mon, Nov 5, 2018 at 4:29 PM David Sterba  wrote:
> > >
> > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > > > Ah ok makes sense.  Well in that case lets just make 
> > > > > btrfs_read_locked_inode()
> > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > > > >
> > > > > if (path != in_path)
> > > >
> > > > You mean the following on top of v4:
> > > >
> > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> > > >
> > > > Not much different, just saves one such if statement. I'm ok with that.
> > >
> > > Now in misc-next with v4 and the friendpaste incremental as
> > >
> > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
> >
> > Please don't add the incremental. It's buggy. It was meant to figure
> > out what Josef was saying. That's why I haven't sent a V5.
>
> Ok dropped, I'll will wait for a proper patch.

It's V4, the last sent version. Just forget the incremental.
Thanks.


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-05 Thread David Sterba
On Mon, Nov 05, 2018 at 04:30:35PM +, Filipe Manana wrote:
> On Mon, Nov 5, 2018 at 4:29 PM David Sterba  wrote:
> >
> > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > > Ah ok makes sense.  Well in that case lets just make 
> > > > btrfs_read_locked_inode()
> > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > > >
> > > > if (path != in_path)
> > >
> > > You mean the following on top of v4:
> > >
> > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> > >
> > > Not much different, just saves one such if statement. I'm ok with that.
> >
> > Now in misc-next with v4 and the friendpaste incremental as
> >
> > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
> 
> Please don't add the incremental. It's buggy. It was meant to figure
> out what Josef was saying. That's why I haven't sent a V5.

Ok dropped, I'll will wait for a proper patch.


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-05 Thread Filipe Manana
On Mon, Nov 5, 2018 at 4:29 PM David Sterba  wrote:
>
> On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > Ah ok makes sense.  Well in that case lets just make 
> > > btrfs_read_locked_inode()
> > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > >
> > > if (path != in_path)
> >
> > You mean the following on top of v4:
> >
> > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> >
> > Not much different, just saves one such if statement. I'm ok with that.
>
> Now in misc-next with v4 and the friendpaste incremental as
>
> https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69

Please don't add the incremental. It's buggy. It was meant to figure
out what Josef was saying. That's why I haven't sent a V5.


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-05 Thread David Sterba
On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > Ah ok makes sense.  Well in that case lets just make 
> > btrfs_read_locked_inode()
> > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> >
> > if (path != in_path)
> 
> You mean the following on top of v4:
> 
> https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> 
> Not much different, just saves one such if statement. I'm ok with that.

Now in misc-next with v4 and the friendpaste incremental as

https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Josef Bacik
On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik  wrote:
> >
> > On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> > > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik  wrote:
> > > >
> > > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> > > > > From: Filipe Manana 
> > > > >
> > > > > When we are writing out a free space cache, during the transaction 
> > > > > commit
> > > > > phase, we can end up in a deadlock which results in a stack trace 
> > > > > like the
> > > > > following:
> > > > >
> > > > >  schedule+0x28/0x80
> > > > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > > >  ? finish_wait+0x80/0x80
> > > > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > > >  ? inode_insert5+0x119/0x190
> > > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > > >  btrfs_iget+0x113/0x690 [btrfs]
> > > > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > > > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > > > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > > > >  ? finish_wait+0x80/0x80
> > > > >  find_free_extent+0x799/0x1010 [btrfs]
> > > > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > > > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > > > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > > >
> > > > > At cache_save_setup() we need to update the inode item of a block 
> > > > > group's
> > > > > cache which is located in the tree root (fs_info->tree_root), which 
> > > > > means
> > > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > > need to find a free metadata extent and while looking for one, if we 
> > > > > find
> > > > > a block group which was not cached yet we attempt to load its cache by
> > > > > calling cache_block_group(). However this function will try to load 
> > > > > the
> > > > > inode of the free space cache, which requires finding the matching 
> > > > > inode
> > > > > item in the tree root - if that inode item is located in the same 
> > > > > leaf as
> > > > > the inode item of the space cache we are updating at 
> > > > > cache_save_setup(),
> > > > > we end up in a deadlock, since we try to obtain a read lock on the 
> > > > > same
> > > > > extent buffer that we previously write locked.
> > > > >
> > > > > So fix this by using the tree root's commit root when searching for a
> > > > > block group's free space cache inode item when we are attempting to 
> > > > > load
> > > > > a free space cache. This is safe since block groups once loaded stay 
> > > > > in
> > > > > memory forever, as well as their caches, so after they are first 
> > > > > loaded
> > > > > we will never need to read their inode items again. For new block 
> > > > > groups,
> > > > > once they are created they get their ->cached field set to
> > > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode 
> > > > > item.
> > > > >
> > > > > Reported-by: Andrew Nelson 
> > > > > Link: 
> > > > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > 

Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Filipe Manana
On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik  wrote:
>
> On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik  wrote:
> > >
> > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> > > > From: Filipe Manana 
> > > >
> > > > When we are writing out a free space cache, during the transaction 
> > > > commit
> > > > phase, we can end up in a deadlock which results in a stack trace like 
> > > > the
> > > > following:
> > > >
> > > >  schedule+0x28/0x80
> > > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > >  ? inode_insert5+0x119/0x190
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_iget+0x113/0x690 [btrfs]
> > > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  find_free_extent+0x799/0x1010 [btrfs]
> > > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > >
> > > > At cache_save_setup() we need to update the inode item of a block 
> > > > group's
> > > > cache which is located in the tree root (fs_info->tree_root), which 
> > > > means
> > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > need to find a free metadata extent and while looking for one, if we 
> > > > find
> > > > a block group which was not cached yet we attempt to load its cache by
> > > > calling cache_block_group(). However this function will try to load the
> > > > inode of the free space cache, which requires finding the matching inode
> > > > item in the tree root - if that inode item is located in the same leaf 
> > > > as
> > > > the inode item of the space cache we are updating at cache_save_setup(),
> > > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > > extent buffer that we previously write locked.
> > > >
> > > > So fix this by using the tree root's commit root when searching for a
> > > > block group's free space cache inode item when we are attempting to load
> > > > a free space cache. This is safe since block groups once loaded stay in
> > > > memory forever, as well as their caches, so after they are first loaded
> > > > we will never need to read their inode items again. For new block 
> > > > groups,
> > > > once they are created they get their ->cached field set to
> > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> > > >
> > > > Reported-by: Andrew Nelson 
> > > > Link: 
> > > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > > Tested-by: Andrew Nelson 
> > > > Signed-off-by: Filipe Manana 
> > > > ---
> > > >
> > >
> > > Now my goal is to see how many times I can get you to redo this thing.
> > >
> > > Why not instead just do
> > >
> > > if (btrfs_is_free_space_inode(inode))
> > >   path->search_commit_root = 1;
> > >
> > > in read_locked_inode?  That would be cleaner.  If we don't want to do 
> > > that for
> > > the inode cache (I'm not sure if it's ok or not) we could just do
> > >
> > > if (root == fs_info->tree_root)
> >
> > We can't (not just that at least).
> > Tried something like that, but we get into a BUG_ON when writing out
> > the space cache for new block groups (created in the current
> > transaction).
> > Because at cache_save_setup() we have this:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342
> >
> > Lookup for the inode in normal root, doesn't exist, create it then
> > repeat - if still not found, BUG_ON.
> > Could also make create_free_space_inode() return an inode pointer and
> > make it call btrfs_iget().
> >
>
> Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> take a path, and allocate it in btrfs_iget, that'll remove the ugly
>
> if (path != in_path)

You mean the following on top of v4:

https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg

Not much different, just saves one such if statement. I'm ok with that.

>
> stuff.  Thanks,
>
> Josef


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Josef Bacik
On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik  wrote:
> >
> > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> > > From: Filipe Manana 
> > >
> > > When we are writing out a free space cache, during the transaction commit
> > > phase, we can end up in a deadlock which results in a stack trace like the
> > > following:
> > >
> > >  schedule+0x28/0x80
> > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > >  ? inode_insert5+0x119/0x190
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_iget+0x113/0x690 [btrfs]
> > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  find_free_extent+0x799/0x1010 [btrfs]
> > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > >
> > > At cache_save_setup() we need to update the inode item of a block group's
> > > cache which is located in the tree root (fs_info->tree_root), which means
> > > that it may result in COWing a leaf from that tree. If that happens we
> > > need to find a free metadata extent and while looking for one, if we find
> > > a block group which was not cached yet we attempt to load its cache by
> > > calling cache_block_group(). However this function will try to load the
> > > inode of the free space cache, which requires finding the matching inode
> > > item in the tree root - if that inode item is located in the same leaf as
> > > the inode item of the space cache we are updating at cache_save_setup(),
> > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > extent buffer that we previously write locked.
> > >
> > > So fix this by using the tree root's commit root when searching for a
> > > block group's free space cache inode item when we are attempting to load
> > > a free space cache. This is safe since block groups once loaded stay in
> > > memory forever, as well as their caches, so after they are first loaded
> > > we will never need to read their inode items again. For new block groups,
> > > once they are created they get their ->cached field set to
> > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> > >
> > > Reported-by: Andrew Nelson 
> > > Link: 
> > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > Tested-by: Andrew Nelson 
> > > Signed-off-by: Filipe Manana 
> > > ---
> > >
> >
> > Now my goal is to see how many times I can get you to redo this thing.
> >
> > Why not instead just do
> >
> > if (btrfs_is_free_space_inode(inode))
> >   path->search_commit_root = 1;
> >
> > in read_locked_inode?  That would be cleaner.  If we don't want to do that 
> > for
> > the inode cache (I'm not sure if it's ok or not) we could just do
> >
> > if (root == fs_info->tree_root)
> 
> We can't (not just that at least).
> Tried something like that, but we get into a BUG_ON when writing out
> the space cache for new block groups (created in the current
> transaction).
> Because at cache_save_setup() we have this:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342
> 
> Lookup for the inode in normal root, doesn't exist, create it then
> repeat - if still not found, BUG_ON.
> Could also make create_free_space_inode() return an inode pointer and
> make it call btrfs_iget().
> 

Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
take a path, and allocate it in btrfs_iget, that'll remove the ugly

if (path != in_path)

stuff.  Thanks,

Josef


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Filipe Manana
On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik  wrote:
>
> On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> >
> > When we are writing out a free space cache, during the transaction commit
> > phase, we can end up in a deadlock which results in a stack trace like the
> > following:
> >
> >  schedule+0x28/0x80
> >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> >  ? inode_insert5+0x119/0x190
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_iget+0x113/0x690 [btrfs]
> >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> >  load_free_space_cache+0x7c/0x170 [btrfs]
> >  ? cache_block_group+0x72/0x3b0 [btrfs]
> >  cache_block_group+0x1b3/0x3b0 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  find_free_extent+0x799/0x1010 [btrfs]
> >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> >  btrfs_cow_block+0xdc/0x180 [btrfs]
> >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> >  cache_save_setup+0xe4/0x3a0 [btrfs]
> >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> >
> > At cache_save_setup() we need to update the inode item of a block group's
> > cache which is located in the tree root (fs_info->tree_root), which means
> > that it may result in COWing a leaf from that tree. If that happens we
> > need to find a free metadata extent and while looking for one, if we find
> > a block group which was not cached yet we attempt to load its cache by
> > calling cache_block_group(). However this function will try to load the
> > inode of the free space cache, which requires finding the matching inode
> > item in the tree root - if that inode item is located in the same leaf as
> > the inode item of the space cache we are updating at cache_save_setup(),
> > we end up in a deadlock, since we try to obtain a read lock on the same
> > extent buffer that we previously write locked.
> >
> > So fix this by using the tree root's commit root when searching for a
> > block group's free space cache inode item when we are attempting to load
> > a free space cache. This is safe since block groups once loaded stay in
> > memory forever, as well as their caches, so after they are first loaded
> > we will never need to read their inode items again. For new block groups,
> > once they are created they get their ->cached field set to
> > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> >
> > Reported-by: Andrew Nelson 
> > Link: 
> > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > Tested-by: Andrew Nelson 
> > Signed-off-by: Filipe Manana 
> > ---
> >
>
> Now my goal is to see how many times I can get you to redo this thing.
>
> Why not instead just do
>
> if (btrfs_is_free_space_inode(inode))
>   path->search_commit_root = 1;
>
> in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> the inode cache (I'm not sure if it's ok or not) we could just do
>
> if (root == fs_info->tree_root)

We can't (not just that at least).
Tried something like that, but we get into a BUG_ON when writing out
the space cache for new block groups (created in the current
transaction).
Because at cache_save_setup() we have this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342

Lookup for the inode in normal root, doesn't exist, create it then
repeat - if still not found, BUG_ON.
Could also make create_free_space_inode() return an inode pointer and
make it call btrfs_iget().

>
> instead.  Thanks,
>
> Josef


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Josef Bacik
On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by using the tree root's commit root when searching for a
> block group's free space cache inode item when we are attempting to load
> a free space cache. This is safe since block groups once loaded stay in
> memory forever, as well as their caches, so after they are first loaded
> we will never need to read their inode items again. For new block groups,
> once they are created they get their ->cached field set to
> BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> 
> Reported-by: Andrew Nelson 
> Link: 
> https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Tested-by: Andrew Nelson 
> Signed-off-by: Filipe Manana 
> ---
> 

Now my goal is to see how many times I can get you to redo this thing.

Why not instead just do 

if (btrfs_is_free_space_inode(inode))
  path->search_commit_root = 1;

in read_locked_inode?  That would be cleaner.  If we don't want to do that for
the inode cache (I'm not sure if it's ok or not) we could just do

if (root == fs_info->tree_root)

instead.  Thanks,

Josef


[PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread fdmanana
From: Filipe Manana 

When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:

 schedule+0x28/0x80
 btrfs_tree_read_lock+0x8e/0x120 [btrfs]
 ? finish_wait+0x80/0x80
 btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
 btrfs_search_slot+0xf6/0x9f0 [btrfs]
 ? evict_refill_and_join+0xd0/0xd0 [btrfs]
 ? inode_insert5+0x119/0x190
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_iget+0x113/0x690 [btrfs]
 __lookup_free_space_inode+0xd8/0x150 [btrfs]
 lookup_free_space_inode+0x5b/0xb0 [btrfs]
 load_free_space_cache+0x7c/0x170 [btrfs]
 ? cache_block_group+0x72/0x3b0 [btrfs]
 cache_block_group+0x1b3/0x3b0 [btrfs]
 ? finish_wait+0x80/0x80
 find_free_extent+0x799/0x1010 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
 __btrfs_cow_block+0x11d/0x500 [btrfs]
 btrfs_cow_block+0xdc/0x180 [btrfs]
 btrfs_search_slot+0x3bd/0x9f0 [btrfs]
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_update_inode_item+0x46/0x100 [btrfs]
 cache_save_setup+0xe4/0x3a0 [btrfs]
 btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
 btrfs_commit_transaction+0xcb/0x8b0 [btrfs]

At cache_save_setup() we need to update the inode item of a block group's
cache which is located in the tree root (fs_info->tree_root), which means
that it may result in COWing a leaf from that tree. If that happens we
need to find a free metadata extent and while looking for one, if we find
a block group which was not cached yet we attempt to load its cache by
calling cache_block_group(). However this function will try to load the
inode of the free space cache, which requires finding the matching inode
item in the tree root - if that inode item is located in the same leaf as
the inode item of the space cache we are updating at cache_save_setup(),
we end up in a deadlock, since we try to obtain a read lock on the same
extent buffer that we previously write locked.

So fix this by using the tree root's commit root when searching for a
block group's free space cache inode item when we are attempting to load
a free space cache. This is safe since block groups once loaded stay in
memory forever, as well as their caches, so after they are first loaded
we will never need to read their inode items again. For new block groups,
once they are created they get their ->cached field set to
BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.

Reported-by: Andrew Nelson 
Link: 
https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
Tested-by: Andrew Nelson 
Signed-off-by: Filipe Manana 
---

V2: Made the solution more generic, since the problem could happen in any
path COWing an extent buffer from the root tree.

Applies on top of a previous patch titled:

 "Btrfs: fix deadlock when writing out free space caches"

V3: Made it more simple by avoiding the atomic from V2 and pass the root
to find_free_extent().

V4: Changed the whole approach so that we lookup for free space cache inode
items using the commit root instead.
The previous approach was causing some transactions to be aborted with
-ENOSPC during umount because sometimes we were skipping cache loading
of all metadata block groups.

 fs/btrfs/ctree.h|  3 +++
 fs/btrfs/free-space-cache.c | 22 +-
 fs/btrfs/inode.c| 32 ++--
 3 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2cddfe7806a4..2b34b2a05ad6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3177,6 +3177,9 @@ void btrfs_destroy_inode(struct inode *inode);
 int btrfs_drop_inode(struct inode *inode);
 int __init btrfs_init_cachep(void);
 void __cold btrfs_destroy_cachep(void);
+struct inode *btrfs_iget_path(struct super_block *s, struct btrfs_key 
*location,
+ struct btrfs_root *root, int *new,
+ struct btrfs_path *path);
 struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
 struct btrfs_root *root, int *was_new);
 struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 65b79500e09f..7265f35324f6 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -68,7 +68,8 @@ static struct inode *__lookup_free_space_inode(struct 
btrfs_root *root,
btrfs_disk_key_to_cpu(, _key);
btrfs_release_path(path);
 
-   inode = btrfs_iget(fs_info->sb, , root, NULL);
+   inode = btrfs_iget_path(fs_info->sb, , root, NULL, path);
+   btrfs_release_path(path);
if (IS_ERR(inode))
return inode;
 
@@ -830,6 +831,25 @@ 

Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Filipe Manana
On Wed, Oct 24, 2018 at 5:08 AM Josef Bacik  wrote:
>
> On Mon, Oct 22, 2018 at 11:05:08PM +0100, Filipe Manana wrote:
> > On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik  wrote:
> > >
> > > On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdman...@kernel.org wrote:
> > > > From: Filipe Manana 
> > > >
> > > > When we are writing out a free space cache, during the transaction 
> > > > commit
> > > > phase, we can end up in a deadlock which results in a stack trace like 
> > > > the
> > > > following:
> > > >
> > > >  schedule+0x28/0x80
> > > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > >  ? inode_insert5+0x119/0x190
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_iget+0x113/0x690 [btrfs]
> > > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  find_free_extent+0x799/0x1010 [btrfs]
> > > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > >
> > > > At cache_save_setup() we need to update the inode item of a block 
> > > > group's
> > > > cache which is located in the tree root (fs_info->tree_root), which 
> > > > means
> > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > need to find a free metadata extent and while looking for one, if we 
> > > > find
> > > > a block group which was not cached yet we attempt to load its cache by
> > > > calling cache_block_group(). However this function will try to load the
> > > > inode of the free space cache, which requires finding the matching inode
> > > > item in the tree root - if that inode item is located in the same leaf 
> > > > as
> > > > the inode item of the space cache we are updating at cache_save_setup(),
> > > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > > extent buffer that we previously write locked.
> > > >
> > > > So fix this by skipping the loading of free space caches of any block
> > > > groups that are not yet cached (rare cases) if we are COWing an extent
> > > > buffer from the root tree and space caching is enabled (-o space_cache
> > > > mount option). This is a rare case and its downside is failure to
> > > > find a free extent (return -ENOSPC) when all the already cached block
> > > > groups have no free extents.
> > > >
> > > > Reported-by: Andrew Nelson 
> > > > Link: 
> > > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > > Tested-by: Andrew Nelson 
> > > > Signed-off-by: Filipe Manana 
> > >
> > > Great, thanks,
> > >
> > > Reviewed-by: Josef Bacik 
> >
> > So this makes many fstests occasionally fail with aborted transaction
> > due to ENOSPC.
> > It's late and I haven't verified yet, but I suppose this is because we
> > always skip loading the cache regardless of currently being COWing an
> > existing leaf or allocating a new one (growing the tree).
> > Needs to be fixed.
> >
>
> How about we just use path->search_commit_root?  If we're loading the cache we
> just want the last committed version, it's not like we read it after we've
> written it.  Then we can go back to business as usual.  Thanks,

Yeah, that works. It was an idea before sending v1 but it felt a bit
dirty at the time.
Left fstests running overnight using the commit root approach and
everything seems fine. Sending a v4.

Another alternative, which would solve similar problems for any tree,
would be to allow getting a read lock on an
already (spin) write locked eb, just like we do for blocking write
locks:  https://friendpaste.com/6XrGXb5p0RSJGixUFZ8lCt

thanks


>
> Josef


Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-23 Thread Josef Bacik
On Mon, Oct 22, 2018 at 11:05:08PM +0100, Filipe Manana wrote:
> On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik  wrote:
> >
> > On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdman...@kernel.org wrote:
> > > From: Filipe Manana 
> > >
> > > When we are writing out a free space cache, during the transaction commit
> > > phase, we can end up in a deadlock which results in a stack trace like the
> > > following:
> > >
> > >  schedule+0x28/0x80
> > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > >  ? inode_insert5+0x119/0x190
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_iget+0x113/0x690 [btrfs]
> > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  find_free_extent+0x799/0x1010 [btrfs]
> > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > >
> > > At cache_save_setup() we need to update the inode item of a block group's
> > > cache which is located in the tree root (fs_info->tree_root), which means
> > > that it may result in COWing a leaf from that tree. If that happens we
> > > need to find a free metadata extent and while looking for one, if we find
> > > a block group which was not cached yet we attempt to load its cache by
> > > calling cache_block_group(). However this function will try to load the
> > > inode of the free space cache, which requires finding the matching inode
> > > item in the tree root - if that inode item is located in the same leaf as
> > > the inode item of the space cache we are updating at cache_save_setup(),
> > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > extent buffer that we previously write locked.
> > >
> > > So fix this by skipping the loading of free space caches of any block
> > > groups that are not yet cached (rare cases) if we are COWing an extent
> > > buffer from the root tree and space caching is enabled (-o space_cache
> > > mount option). This is a rare case and its downside is failure to
> > > find a free extent (return -ENOSPC) when all the already cached block
> > > groups have no free extents.
> > >
> > > Reported-by: Andrew Nelson 
> > > Link: 
> > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > Tested-by: Andrew Nelson 
> > > Signed-off-by: Filipe Manana 
> >
> > Great, thanks,
> >
> > Reviewed-by: Josef Bacik 
> 
> So this makes many fstests occasionally fail with aborted transaction
> due to ENOSPC.
> It's late and I haven't verified yet, but I suppose this is because we
> always skip loading the cache regardless of currently being COWing an
> existing leaf or allocating a new one (growing the tree).
> Needs to be fixed.
> 

How about we just use path->search_commit_root?  If we're loading the cache we
just want the last committed version, it's not like we read it after we've
written it.  Then we can go back to business as usual.  Thanks,

Josef


Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread Filipe Manana
On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik  wrote:
>
> On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> >
> > When we are writing out a free space cache, during the transaction commit
> > phase, we can end up in a deadlock which results in a stack trace like the
> > following:
> >
> >  schedule+0x28/0x80
> >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> >  ? inode_insert5+0x119/0x190
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_iget+0x113/0x690 [btrfs]
> >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> >  load_free_space_cache+0x7c/0x170 [btrfs]
> >  ? cache_block_group+0x72/0x3b0 [btrfs]
> >  cache_block_group+0x1b3/0x3b0 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  find_free_extent+0x799/0x1010 [btrfs]
> >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> >  btrfs_cow_block+0xdc/0x180 [btrfs]
> >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> >  cache_save_setup+0xe4/0x3a0 [btrfs]
> >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> >
> > At cache_save_setup() we need to update the inode item of a block group's
> > cache which is located in the tree root (fs_info->tree_root), which means
> > that it may result in COWing a leaf from that tree. If that happens we
> > need to find a free metadata extent and while looking for one, if we find
> > a block group which was not cached yet we attempt to load its cache by
> > calling cache_block_group(). However this function will try to load the
> > inode of the free space cache, which requires finding the matching inode
> > item in the tree root - if that inode item is located in the same leaf as
> > the inode item of the space cache we are updating at cache_save_setup(),
> > we end up in a deadlock, since we try to obtain a read lock on the same
> > extent buffer that we previously write locked.
> >
> > So fix this by skipping the loading of free space caches of any block
> > groups that are not yet cached (rare cases) if we are COWing an extent
> > buffer from the root tree and space caching is enabled (-o space_cache
> > mount option). This is a rare case and its downside is failure to
> > find a free extent (return -ENOSPC) when all the already cached block
> > groups have no free extents.
> >
> > Reported-by: Andrew Nelson 
> > Link: 
> > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > Tested-by: Andrew Nelson 
> > Signed-off-by: Filipe Manana 
>
> Great, thanks,
>
> Reviewed-by: Josef Bacik 

So this makes many fstests occasionally fail with aborted transaction
due to ENOSPC.
It's late and I haven't verified yet, but I suppose this is because we
always skip loading the cache regardless of currently being COWing an
existing leaf or allocating a new one (growing the tree).
Needs to be fixed.

>
> Josef


Re: [PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread Josef Bacik
On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are COWing an extent
> buffer from the root tree and space caching is enabled (-o space_cache
> mount option). This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
> 
> Reported-by: Andrew Nelson 
> Link: 
> https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Tested-by: Andrew Nelson 
> Signed-off-by: Filipe Manana 

Great, thanks,

Reviewed-by: Josef Bacik 

Josef


[PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread fdmanana
From: Filipe Manana 

When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:

 schedule+0x28/0x80
 btrfs_tree_read_lock+0x8e/0x120 [btrfs]
 ? finish_wait+0x80/0x80
 btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
 btrfs_search_slot+0xf6/0x9f0 [btrfs]
 ? evict_refill_and_join+0xd0/0xd0 [btrfs]
 ? inode_insert5+0x119/0x190
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_iget+0x113/0x690 [btrfs]
 __lookup_free_space_inode+0xd8/0x150 [btrfs]
 lookup_free_space_inode+0x5b/0xb0 [btrfs]
 load_free_space_cache+0x7c/0x170 [btrfs]
 ? cache_block_group+0x72/0x3b0 [btrfs]
 cache_block_group+0x1b3/0x3b0 [btrfs]
 ? finish_wait+0x80/0x80
 find_free_extent+0x799/0x1010 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
 __btrfs_cow_block+0x11d/0x500 [btrfs]
 btrfs_cow_block+0xdc/0x180 [btrfs]
 btrfs_search_slot+0x3bd/0x9f0 [btrfs]
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_update_inode_item+0x46/0x100 [btrfs]
 cache_save_setup+0xe4/0x3a0 [btrfs]
 btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
 btrfs_commit_transaction+0xcb/0x8b0 [btrfs]

At cache_save_setup() we need to update the inode item of a block group's
cache which is located in the tree root (fs_info->tree_root), which means
that it may result in COWing a leaf from that tree. If that happens we
need to find a free metadata extent and while looking for one, if we find
a block group which was not cached yet we attempt to load its cache by
calling cache_block_group(). However this function will try to load the
inode of the free space cache, which requires finding the matching inode
item in the tree root - if that inode item is located in the same leaf as
the inode item of the space cache we are updating at cache_save_setup(),
we end up in a deadlock, since we try to obtain a read lock on the same
extent buffer that we previously write locked.

So fix this by skipping the loading of free space caches of any block
groups that are not yet cached (rare cases) if we are COWing an extent
buffer from the root tree and space caching is enabled (-o space_cache
mount option). This is a rare case and its downside is failure to
find a free extent (return -ENOSPC) when all the already cached block
groups have no free extents.

Reported-by: Andrew Nelson 
Link: 
https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
Tested-by: Andrew Nelson 
Signed-off-by: Filipe Manana 
---

V2: Made the solution more generic, since the problem could happen in any
path COWing an extent buffer from the root tree.

Applies on top of a previous patch titled:

 "Btrfs: fix deadlock when writing out free space caches"

V3: Made it more simple by avoiding the atomic from V2 and pass the root
to find_free_extent().

 fs/btrfs/extent-tree.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 577878324799..e5fd086799ab 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7218,12 +7218,13 @@ btrfs_release_block_group(struct 
btrfs_block_group_cache *cache,
  * the free space extent currently.
  */
 static noinline int find_free_extent(struct btrfs_fs_info *fs_info,
+   struct btrfs_root *root,
u64 ram_bytes, u64 num_bytes, u64 empty_size,
u64 hint_byte, struct btrfs_key *ins,
u64 flags, int delalloc)
 {
int ret = 0;
-   struct btrfs_root *root = fs_info->extent_root;
+   struct btrfs_root *extent_root = fs_info->extent_root;
struct btrfs_free_cluster *last_ptr = NULL;
struct btrfs_block_group_cache *block_group = NULL;
u64 search_start = 0;
@@ -7366,7 +7367,20 @@ static noinline int find_free_extent(struct 
btrfs_fs_info *fs_info,
 
 have_block_group:
cached = block_group_cache_done(block_group);
-   if (unlikely(!cached)) {
+   /*
+* If we are COWing a leaf/node from the root tree, we can not
+* start caching of a block group because we could deadlock on
+* an extent buffer of the root tree.
+* Because if we are COWing a leaf from the root tree, we are
+* holding a write lock on the respective extent buffer, and
+* loading the space cache of a block group requires searching
+* for its inode item in the root tree, which can be located
+* in the same leaf that we previously write locked, in which
+* case we will deadlock.
+*/
+   if (unlikely(!cached) &&

Re: [PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread Filipe Manana
On Mon, Oct 22, 2018 at 7:56 PM Josef Bacik  wrote:
>
> On Mon, Oct 22, 2018 at 07:48:30PM +0100, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> >
> > When we are writing out a free space cache, during the transaction commit
> > phase, we can end up in a deadlock which results in a stack trace like the
> > following:
> >
> >  schedule+0x28/0x80
> >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> >  ? inode_insert5+0x119/0x190
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_iget+0x113/0x690 [btrfs]
> >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> >  load_free_space_cache+0x7c/0x170 [btrfs]
> >  ? cache_block_group+0x72/0x3b0 [btrfs]
> >  cache_block_group+0x1b3/0x3b0 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  find_free_extent+0x799/0x1010 [btrfs]
> >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> >  btrfs_cow_block+0xdc/0x180 [btrfs]
> >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> >  cache_save_setup+0xe4/0x3a0 [btrfs]
> >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> >
> > At cache_save_setup() we need to update the inode item of a block group's
> > cache which is located in the tree root (fs_info->tree_root), which means
> > that it may result in COWing a leaf from that tree. If that happens we
> > need to find a free metadata extent and while looking for one, if we find
> > a block group which was not cached yet we attempt to load its cache by
> > calling cache_block_group(). However this function will try to load the
> > inode of the free space cache, which requires finding the matching inode
> > item in the tree root - if that inode item is located in the same leaf as
> > the inode item of the space cache we are updating at cache_save_setup(),
> > we end up in a deadlock, since we try to obtain a read lock on the same
> > extent buffer that we previously write locked.
> >
> > So fix this by skipping the loading of free space caches of any block
> > groups that are not yet cached (rare cases) if we are COWing an extent
> > buffer from the root tree and space caching is enabled (-o space_cache
> > mount option). This is a rare case and its downside is failure to
> > find a free extent (return -ENOSPC) when all the already cached block
> > groups have no free extents.
> >
> > Reported-by: Andrew Nelson 
> > Link: 
> > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > Tested-by: Andrew Nelson 
> > Signed-off-by: Filipe Manana 
> > ---
> >
> > V2: Made the solution more generic, since the problem could happen in any
> > path COWing an extent buffer from the root tree.
> >
> > Applies on top of a previous patch titled:
> >
> >  "Btrfs: fix deadlock when writing out free space caches"
> >
> >  fs/btrfs/ctree.c   |  4 
> >  fs/btrfs/ctree.h   |  3 +++
> >  fs/btrfs/disk-io.c |  2 ++
> >  fs/btrfs/extent-tree.c | 15 ++-
> >  4 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 089b46c4d97f..646aafda55a3 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct 
> > btrfs_trans_handle *trans,
> >   root == fs_info->chunk_root ||
> >   root == fs_info->dev_root)
> >   trans->can_flush_pending_bgs = false;
> > + else if (root == fs_info->tree_root)
> > + atomic_inc(_info->tree_root_cows);
> >
> >   cow = btrfs_alloc_tree_block(trans, root, parent_start,
> >   root->root_key.objectid, _key, level,
> >   search_start, empty_size);
> > + if (root == fs_info->tree_root)
> > + atomic_dec(_info->tree_root_cows);
>
> Do we need this though?  Our root should be the root we're cow'ing the block
> for, and it should be passed all the way down to find_free_extent properly, so
> we really should be able to just do if (root == fs_info->tree_root) and not 
> add
> all this stuff.

Ups, I missed that we could actually pass the root down to find_free_extent().
That's why made the atomic thing.

Sending v3, thanks.

>
> Not to mention this will race with anybody else doing stuff, so if another
> thread that isn't actually touching the tree_root it would skip caching a 
> block
> group when it's completely ok for that thread to do it.  Thanks,
>
> Josef


Re: [PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread Josef Bacik
On Mon, Oct 22, 2018 at 07:48:30PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are COWing an extent
> buffer from the root tree and space caching is enabled (-o space_cache
> mount option). This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
> 
> Reported-by: Andrew Nelson 
> Link: 
> https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Tested-by: Andrew Nelson 
> Signed-off-by: Filipe Manana 
> ---
> 
> V2: Made the solution more generic, since the problem could happen in any
> path COWing an extent buffer from the root tree.
> 
> Applies on top of a previous patch titled:
> 
>  "Btrfs: fix deadlock when writing out free space caches"
> 
>  fs/btrfs/ctree.c   |  4 
>  fs/btrfs/ctree.h   |  3 +++
>  fs/btrfs/disk-io.c |  2 ++
>  fs/btrfs/extent-tree.c | 15 ++-
>  4 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 089b46c4d97f..646aafda55a3 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct 
> btrfs_trans_handle *trans,
>   root == fs_info->chunk_root ||
>   root == fs_info->dev_root)
>   trans->can_flush_pending_bgs = false;
> + else if (root == fs_info->tree_root)
> + atomic_inc(_info->tree_root_cows);
>  
>   cow = btrfs_alloc_tree_block(trans, root, parent_start,
>   root->root_key.objectid, _key, level,
>   search_start, empty_size);
> + if (root == fs_info->tree_root)
> + atomic_dec(_info->tree_root_cows);

Do we need this though?  Our root should be the root we're cow'ing the block
for, and it should be passed all the way down to find_free_extent properly, so
we really should be able to just do if (root == fs_info->tree_root) and not add
all this stuff.

Not to mention this will race with anybody else doing stuff, so if another
thread that isn't actually touching the tree_root it would skip caching a block
group when it's completely ok for that thread to do it.  Thanks,

Josef


Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread Filipe Manana
On Mon, Oct 22, 2018 at 7:07 PM Josef Bacik  wrote:
>
> On Mon, Oct 22, 2018 at 10:09:46AM +0100, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> >
> > When we are writing out a free space cache, during the transaction commit
> > phase, we can end up in a deadlock which results in a stack trace like the
> > following:
> >
> >  schedule+0x28/0x80
> >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> >  ? inode_insert5+0x119/0x190
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_iget+0x113/0x690 [btrfs]
> >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> >  load_free_space_cache+0x7c/0x170 [btrfs]
> >  ? cache_block_group+0x72/0x3b0 [btrfs]
> >  cache_block_group+0x1b3/0x3b0 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  find_free_extent+0x799/0x1010 [btrfs]
> >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> >  btrfs_cow_block+0xdc/0x180 [btrfs]
> >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> >  cache_save_setup+0xe4/0x3a0 [btrfs]
> >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> >
> > At cache_save_setup() we need to update the inode item of a block group's
> > cache which is located in the tree root (fs_info->tree_root), which means
> > that it may result in COWing a leaf from that tree. If that happens we
> > need to find a free metadata extent and while looking for one, if we find
> > a block group which was not cached yet we attempt to load its cache by
> > calling cache_block_group(). However this function will try to load the
> > inode of the free space cache, which requires finding the matching inode
> > item in the tree root - if that inode item is located in the same leaf as
> > the inode item of the space cache we are updating at cache_save_setup(),
> > we end up in a deadlock, since we try to obtain a read lock on the same
> > extent buffer that we previously write locked.
> >
> > So fix this by skipping the loading of free space caches of any block
> > groups that are not yet cached (rare cases) if we are updating the inode
> > of a free space cache. This is a rare case and its downside is failure to
> > find a free extent (return -ENOSPC) when all the already cached block
> > groups have no free extents.
> >
>
> Actually isn't this a problem for anything that tries to allocate an extent
> while in the tree_root?  Like we snapshot or make a subvolume or anything?

Indeed. Initially I considered making it more generic (like the recent
fix for deadlock when cowing from extent/chunk/device tree) but I
totally forgot about the other cases like you mentioned.

>  We
> should just disallow if root == tree_root.  But even then we only need to do
> this if we're using SPACE_CACHE, using the ye-olde caching or the free space
> tree are both ok.  Let's just limit it to those cases.  Thanks,

Yep, makes all sense.

Thanks! V2 sent out.

>
> Josef


[PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread fdmanana
From: Filipe Manana 

When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:

 schedule+0x28/0x80
 btrfs_tree_read_lock+0x8e/0x120 [btrfs]
 ? finish_wait+0x80/0x80
 btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
 btrfs_search_slot+0xf6/0x9f0 [btrfs]
 ? evict_refill_and_join+0xd0/0xd0 [btrfs]
 ? inode_insert5+0x119/0x190
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_iget+0x113/0x690 [btrfs]
 __lookup_free_space_inode+0xd8/0x150 [btrfs]
 lookup_free_space_inode+0x5b/0xb0 [btrfs]
 load_free_space_cache+0x7c/0x170 [btrfs]
 ? cache_block_group+0x72/0x3b0 [btrfs]
 cache_block_group+0x1b3/0x3b0 [btrfs]
 ? finish_wait+0x80/0x80
 find_free_extent+0x799/0x1010 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
 __btrfs_cow_block+0x11d/0x500 [btrfs]
 btrfs_cow_block+0xdc/0x180 [btrfs]
 btrfs_search_slot+0x3bd/0x9f0 [btrfs]
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_update_inode_item+0x46/0x100 [btrfs]
 cache_save_setup+0xe4/0x3a0 [btrfs]
 btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
 btrfs_commit_transaction+0xcb/0x8b0 [btrfs]

At cache_save_setup() we need to update the inode item of a block group's
cache which is located in the tree root (fs_info->tree_root), which means
that it may result in COWing a leaf from that tree. If that happens we
need to find a free metadata extent and while looking for one, if we find
a block group which was not cached yet we attempt to load its cache by
calling cache_block_group(). However this function will try to load the
inode of the free space cache, which requires finding the matching inode
item in the tree root - if that inode item is located in the same leaf as
the inode item of the space cache we are updating at cache_save_setup(),
we end up in a deadlock, since we try to obtain a read lock on the same
extent buffer that we previously write locked.

So fix this by skipping the loading of free space caches of any block
groups that are not yet cached (rare cases) if we are COWing an extent
buffer from the root tree and space caching is enabled (-o space_cache
mount option). This is a rare case and its downside is failure to
find a free extent (return -ENOSPC) when all the already cached block
groups have no free extents.

Reported-by: Andrew Nelson 
Link: 
https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
Tested-by: Andrew Nelson 
Signed-off-by: Filipe Manana 
---

V2: Made the solution more generic, since the problem could happen in any
path COWing an extent buffer from the root tree.

Applies on top of a previous patch titled:

 "Btrfs: fix deadlock when writing out free space caches"

 fs/btrfs/ctree.c   |  4 
 fs/btrfs/ctree.h   |  3 +++
 fs/btrfs/disk-io.c |  2 ++
 fs/btrfs/extent-tree.c | 15 ++-
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 089b46c4d97f..646aafda55a3 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1065,10 +1065,14 @@ static noinline int __btrfs_cow_block(struct 
btrfs_trans_handle *trans,
root == fs_info->chunk_root ||
root == fs_info->dev_root)
trans->can_flush_pending_bgs = false;
+   else if (root == fs_info->tree_root)
+   atomic_inc(_info->tree_root_cows);
 
cow = btrfs_alloc_tree_block(trans, root, parent_start,
root->root_key.objectid, _key, level,
search_start, empty_size);
+   if (root == fs_info->tree_root)
+   atomic_dec(_info->tree_root_cows);
trans->can_flush_pending_bgs = true;
if (IS_ERR(cow))
return PTR_ERR(cow);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2cddfe7806a4..1b73433c69e2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1121,6 +1121,9 @@ struct btrfs_fs_info {
u32 sectorsize;
u32 stripesize;
 
+   /* Number of tasks corrently COWing a leaf/node from the tree root. */
+   atomic_t tree_root_cows;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
spinlock_t ref_verify_lock;
struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 05dc3c17cb62..08c15bf69fb5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb,
fs_info->sectorsize = 4096;
fs_info->stripesize = 4096;
 
+   atomic_set(_info->tree_root_cows, 0);
+
ret = btrfs_alloc_stripe_hash_table(fs_info);
if (ret) {
err = ret;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 577878324799..14f35e020050 100644
--- a/fs/btrfs/ext

Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread Josef Bacik
On Mon, Oct 22, 2018 at 10:09:46AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are updating the inode
> of a free space cache. This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
> 

Actually isn't this a problem for anything that tries to allocate an extent
while in the tree_root?  Like we snapshot or make a subvolume or anything?  We
should just disallow if root == tree_root.  But even then we only need to do
this if we're using SPACE_CACHE, using the ye-olde caching or the free space
tree are both ok.  Let's just limit it to those cases.  Thanks,

Josef


Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread Filipe Manana
On Mon, Oct 22, 2018 at 10:10 AM  wrote:
>
> From: Filipe Manana 
>
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
>
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
>
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
>
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are updating the inode
> of a free space cache. This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
>
> Reported-by: Andrew Nelson 
> Link: 
> https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Signed-off-by: Filipe Manana 

Tested-by: Andrew Nelson 

> ---
>  fs/btrfs/ctree.h   |  3 +++
>  fs/btrfs/disk-io.c |  2 ++
>  fs/btrfs/extent-tree.c | 22 +-
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2cddfe7806a4..d23ee26eb17d 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1121,6 +1121,9 @@ struct btrfs_fs_info {
> u32 sectorsize;
> u32 stripesize;
>
> +   /* The task currently updating a free space cache inode item. */
> +   struct task_struct *space_cache_updater;
> +
>  #ifdef CONFIG_BTRFS_FS_REF_VERIFY
> spinlock_t ref_verify_lock;
> struct rb_root block_tree;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 05dc3c17cb62..aa5e9a91e560 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb,
> fs_info->sectorsize = 4096;
> fs_info->stripesize = 4096;
>
> +   fs_info->space_cache_updater = NULL;
> +
> ret = btrfs_alloc_stripe_hash_table(fs_info);
> if (ret) {
> err = ret;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 577878324799..e93040449771 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3364,7 +3364,9 @@ static int cache_save_setup(struct 
> btrfs_block_group_cache *block_group,
>  * time.
>  */
> BTRFS_I(inode)->generation = 0;
> +   fs_info->space_cache_updater = current;
> ret = btrfs_update_inode(trans, root, inode);
> +   fs_info->space_cache_updater = NULL;
> if (ret) {
> /*
>  * So theoretically we could recover from this, simply set the
> @@ -7366,7 +7368,25 @@ static noinline int find_free_extent(struct 
> btrfs_fs_info *fs_info,
>
>  have_bl

Re: [PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread Filipe Manana
On Mon, Oct 22, 2018 at 10:10 AM  wrote:
>
> From: Filipe Manana 
>
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
>
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
>
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
>
> So fix this by skipping the loading of free space caches of any block
> groups that are not yet cached (rare cases) if we are updating the inode
> of a free space cache. This is a rare case and its downside is failure to
> find a free extent (return -ENOSPC) when all the already cached block
> groups have no free extents.
>
> Reported-by: Andrew Nelson 
> Link: 
> https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Signed-off-by: Filipe Manana 

Andrew Nelson 


> ---
>  fs/btrfs/ctree.h   |  3 +++
>  fs/btrfs/disk-io.c |  2 ++
>  fs/btrfs/extent-tree.c | 22 +-
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2cddfe7806a4..d23ee26eb17d 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1121,6 +1121,9 @@ struct btrfs_fs_info {
> u32 sectorsize;
> u32 stripesize;
>
> +   /* The task currently updating a free space cache inode item. */
> +   struct task_struct *space_cache_updater;
> +
>  #ifdef CONFIG_BTRFS_FS_REF_VERIFY
> spinlock_t ref_verify_lock;
> struct rb_root block_tree;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 05dc3c17cb62..aa5e9a91e560 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb,
> fs_info->sectorsize = 4096;
> fs_info->stripesize = 4096;
>
> +   fs_info->space_cache_updater = NULL;
> +
> ret = btrfs_alloc_stripe_hash_table(fs_info);
> if (ret) {
> err = ret;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 577878324799..e93040449771 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3364,7 +3364,9 @@ static int cache_save_setup(struct 
> btrfs_block_group_cache *block_group,
>  * time.
>  */
> BTRFS_I(inode)->generation = 0;
> +   fs_info->space_cache_updater = current;
> ret = btrfs_update_inode(trans, root, inode);
> +   fs_info->space_cache_updater = NULL;
> if (ret) {
> /*
>  * So theoretically we could recover from this, simply set the
> @@ -7366,7 +7368,25 @@ static noinline int find_free_extent(struct 
> btrfs_fs_info *fs_info,
>
>  have_block_group:
> cached = b

[PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread fdmanana
From: Filipe Manana 

When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:

 schedule+0x28/0x80
 btrfs_tree_read_lock+0x8e/0x120 [btrfs]
 ? finish_wait+0x80/0x80
 btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
 btrfs_search_slot+0xf6/0x9f0 [btrfs]
 ? evict_refill_and_join+0xd0/0xd0 [btrfs]
 ? inode_insert5+0x119/0x190
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_iget+0x113/0x690 [btrfs]
 __lookup_free_space_inode+0xd8/0x150 [btrfs]
 lookup_free_space_inode+0x5b/0xb0 [btrfs]
 load_free_space_cache+0x7c/0x170 [btrfs]
 ? cache_block_group+0x72/0x3b0 [btrfs]
 cache_block_group+0x1b3/0x3b0 [btrfs]
 ? finish_wait+0x80/0x80
 find_free_extent+0x799/0x1010 [btrfs]
 btrfs_reserve_extent+0x9b/0x180 [btrfs]
 btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
 __btrfs_cow_block+0x11d/0x500 [btrfs]
 btrfs_cow_block+0xdc/0x180 [btrfs]
 btrfs_search_slot+0x3bd/0x9f0 [btrfs]
 btrfs_lookup_inode+0x3a/0xc0 [btrfs]
 ? kmem_cache_alloc+0x166/0x1d0
 btrfs_update_inode_item+0x46/0x100 [btrfs]
 cache_save_setup+0xe4/0x3a0 [btrfs]
 btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
 btrfs_commit_transaction+0xcb/0x8b0 [btrfs]

At cache_save_setup() we need to update the inode item of a block group's
cache which is located in the tree root (fs_info->tree_root), which means
that it may result in COWing a leaf from that tree. If that happens we
need to find a free metadata extent and while looking for one, if we find
a block group which was not cached yet we attempt to load its cache by
calling cache_block_group(). However this function will try to load the
inode of the free space cache, which requires finding the matching inode
item in the tree root - if that inode item is located in the same leaf as
the inode item of the space cache we are updating at cache_save_setup(),
we end up in a deadlock, since we try to obtain a read lock on the same
extent buffer that we previously write locked.

So fix this by skipping the loading of free space caches of any block
groups that are not yet cached (rare cases) if we are updating the inode
of a free space cache. This is a rare case and its downside is failure to
find a free extent (return -ENOSPC) when all the already cached block
groups have no free extents.

Reported-by: Andrew Nelson 
Link: 
https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/ctree.h   |  3 +++
 fs/btrfs/disk-io.c |  2 ++
 fs/btrfs/extent-tree.c | 22 +-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2cddfe7806a4..d23ee26eb17d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1121,6 +1121,9 @@ struct btrfs_fs_info {
u32 sectorsize;
u32 stripesize;
 
+   /* The task currently updating a free space cache inode item. */
+   struct task_struct *space_cache_updater;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
spinlock_t ref_verify_lock;
struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 05dc3c17cb62..aa5e9a91e560 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2782,6 +2782,8 @@ int open_ctree(struct super_block *sb,
fs_info->sectorsize = 4096;
fs_info->stripesize = 4096;
 
+   fs_info->space_cache_updater = NULL;
+
ret = btrfs_alloc_stripe_hash_table(fs_info);
if (ret) {
err = ret;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 577878324799..e93040449771 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3364,7 +3364,9 @@ static int cache_save_setup(struct 
btrfs_block_group_cache *block_group,
 * time.
 */
BTRFS_I(inode)->generation = 0;
+   fs_info->space_cache_updater = current;
ret = btrfs_update_inode(trans, root, inode);
+   fs_info->space_cache_updater = NULL;
if (ret) {
/*
 * So theoretically we could recover from this, simply set the
@@ -7366,7 +7368,25 @@ static noinline int find_free_extent(struct 
btrfs_fs_info *fs_info,
 
 have_block_group:
cached = block_group_cache_done(block_group);
-   if (unlikely(!cached)) {
+   /*
+* If we are updating the inode of a free space cache, we can
+* not start the caching of any block group because we could
+* deadlock on an extent buffer of the root tree.
+* At cache_save_setup() we update the inode item of a free
+* space cache, so we may need to COW a leaf of the root tree,
+* which implies finding a free metadata extent. So if when
+* searching for such 

Re: Couldn't read tree root BTRFS - SEED

2018-04-09 Thread Senén Vidal Blanco
Hello Qu,
Thank you very much for the answer. I am sorry to answer late because before 
bothering more, I decided to perform another cloning test.
In this time he has joined the second device correctly.
The problem has occurred because they are BCACHE devices, and although the 
SEED disk does not vary over time, buy the read-write yes, with which I have 
made sure this time empty the cache before performing the cloning.

Now, when inserting the discs I have managed to mount everything correctly:
Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6
Total devices 2 FS bytes used 2.00TiB
devid1 size 5.44TiB used 1.97TiB path /dev/bcache2
devid2 size 1.80TiB used 122.03GiB path /dev/bcache1

The truth is that the BTRFS-BCACHE-MDADM combination is complex, but the 
results are really satisfactory and enormous versatility is achieved.

To thank the BTRFS team and other responsible people for doing a great job.


El jueves, 5 de abril de 2018 20:57:49 (CEST) Qu Wenruo escribió:
> On 2018年04月05日 19:27, Senén Vidal Blanco wrote:
> > Hello,
> > I have a problem when mounting the btrfs system with two units (one in
> > SEED
> > mode and the other in read-write mode)
> > 
> > Initially it is functioning correctly in production:
> > 
> > btrfs fi sh
> > Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6
> > 
> >Total devices 2 FS bytes used 2.00TiB
> >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1
> >devid2 size 1.80TiB used 119.03GiB path /dev/bcache0
> > 
> > Label: 'BOOT'  uuid: ce8fd2ef-975c-417a-b90c-280f8d324c44
> > 
> >Total devices 1 FS bytes used 536.86MiB
> >devid1 size 1.86GiB used 1.15GiB path /dev/md1
> > 
> > Label: 'SEED_MD2'  uuid: 851e4474-d375-4a25-b202-949e51f05877
> > 
> >Total devices 1 FS bytes used 1.94TiB
> >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1
> > 
> > The bcache0 and bcache1 units are what I mention.
> > bcache1 is the SEED
> > bcache0 is the read-write
> > 
> > When it comes to mounting the two copy mirror devices that is in
> > production I have this result:
> > 
> > parent transid verify failed on 2184045428736 wanted 269593 found 266275
> > parent transid verify failed on 2184045428736 wanted 269593 found 266275
> 
> It's not the problem that btrfs fails to assemble correct seed/rw
> devices list, it's something wrong (metadata corruption) happened.
> 
> > Ignoring transid failure
> 
> So this is btrfs-progs (btrfs check). Kernel doesn't ignore any transid
> failure AFAIK.
> 
> > Couldn't map the block 2287467560960
> > No mapping for 2287467560960-2287467577344
> > Couldn't map the block 2287467560960
> > bytenr mismatch, want=2287467560960, have=0
> > Couldn't read tree root
> > Label: 'SEED_MD2'  uuid: 851e4474-d375-4a25-b202-949e51f05877
> > 
> >Total devices 1 FS bytes used 1.94TiB
> >devid1 size 5.44TiB used 1.97TiB path /dev/bcache1
> > 
> > Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6
> > 
> >Total devices 2 FS bytes used 1.98TiB
> >devid2 size 1.80TiB used 102.03GiB path /dev/bcache2
> >*** Some devices missing
> > 
> > Would there be any way to tell BTRFS that the missing unit is the one that
> > corresponds to the SEED unit so that it correctly rebuilds the file
> > system?
> 
> The rw device is corrupted.
> 
> To ensure your seed device is safe, please run "btrfs check  device>" to see if there is any problem.
> 
> From current output, it seems that your chunk tree is corrupted, so that
> btrfs check can't even map the logical address for your tree root.
> 
> And to further debug the corrupted fs, please attach the following data:
> 
> 1) btrfs inspect dump-super -fa 
> 2) btrfs inspect dump-tree -t chunk 
> 
> Thanks,
> Qu
> 
> > Thank you.

-- 
Senén Vidal Blanco - SGISoft S.L.
 
Tlf.: 986413322 - 660923711
GPG ID 466431A8AF01F99A
http://www.sgisoft.com/
--
 


signature.asc
Description: This is a digitally signed message part.


Re: Couldn't read tree root BTRFS - SEED

2018-04-05 Thread Qu Wenruo


On 2018年04月05日 19:27, Senén Vidal Blanco wrote:
> Hello,
> I have a problem when mounting the btrfs system with two units (one in SEED 
> mode and the other in read-write mode)
>  
> Initially it is functioning correctly in production:
>  
> btrfs fi sh 
> Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 
>Total devices 2 FS bytes used 2.00TiB 
>devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 
>devid2 size 1.80TiB used 119.03GiB path /dev/bcache0 
>  
> Label: 'BOOT'  uuid: ce8fd2ef-975c-417a-b90c-280f8d324c44 
>Total devices 1 FS bytes used 536.86MiB 
>devid1 size 1.86GiB used 1.15GiB path /dev/md1 
>  
> Label: 'SEED_MD2'  uuid: 851e4474-d375-4a25-b202-949e51f05877 
>Total devices 1 FS bytes used 1.94TiB 
>devid1 size 5.44TiB used 1.97TiB path /dev/bcache1
>  
> The bcache0 and bcache1 units are what I mention.
> bcache1 is the SEED
> bcache0 is the read-write
>  
> When it comes to mounting the two copy mirror devices that is in production I 
> have this result:
>  
> parent transid verify failed on 2184045428736 wanted 269593 found 266275 
> parent transid verify failed on 2184045428736 wanted 269593 found 266275

It's not the problem that btrfs fails to assemble correct seed/rw
devices list, it's something wrong (metadata corruption) happened.



> Ignoring transid failure

So this is btrfs-progs (btrfs check). Kernel doesn't ignore any transid
failure AFAIK.

> Couldn't map the block 2287467560960 
> No mapping for 2287467560960-2287467577344 
> Couldn't map the block 2287467560960 
> bytenr mismatch, want=2287467560960, have=0 
> Couldn't read tree root 
> Label: 'SEED_MD2'  uuid: 851e4474-d375-4a25-b202-949e51f05877 
>Total devices 1 FS bytes used 1.94TiB 
>devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 
>  
> Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 
>Total devices 2 FS bytes used 1.98TiB 
>devid2 size 1.80TiB used 102.03GiB path /dev/bcache2 
>*** Some devices missing
>  
> Would there be any way to tell BTRFS that the missing unit is the one that 
> corresponds to the SEED unit so that it correctly rebuilds the file system?

The rw device is corrupted.

To ensure your seed device is safe, please run "btrfs check " to see if there is any problem.

From current output, it seems that your chunk tree is corrupted, so that
btrfs check can't even map the logical address for your tree root.

And to further debug the corrupted fs, please attach the following data:

1) btrfs inspect dump-super -fa 
2) btrfs inspect dump-tree -t chunk 

Thanks,
Qu

> 
> Thank you.
> 



signature.asc
Description: OpenPGP digital signature


Couldn't read tree root BTRFS - SEED

2018-04-05 Thread Senén Vidal Blanco
Hello,
I have a problem when mounting the btrfs system with two units (one in SEED 
mode and the other in read-write mode)
 
Initially it is functioning correctly in production:
 
btrfs fi sh 
Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 
   Total devices 2 FS bytes used 2.00TiB 
   devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 
   devid2 size 1.80TiB used 119.03GiB path /dev/bcache0 
 
Label: 'BOOT'  uuid: ce8fd2ef-975c-417a-b90c-280f8d324c44 
   Total devices 1 FS bytes used 536.86MiB 
   devid1 size 1.86GiB used 1.15GiB path /dev/md1 
 
Label: 'SEED_MD2'  uuid: 851e4474-d375-4a25-b202-949e51f05877 
   Total devices 1 FS bytes used 1.94TiB 
   devid1 size 5.44TiB used 1.97TiB path /dev/bcache1
 
The bcache0 and bcache1 units are what I mention.
bcache1 is the SEED
bcache0 is the read-write
 
When it comes to mounting the two copy mirror devices that is in production I 
have this result:
 
parent transid verify failed on 2184045428736 wanted 269593 found 266275 
parent transid verify failed on 2184045428736 wanted 269593 found 266275 
Ignoring transid failure 
Couldn't map the block 2287467560960 
No mapping for 2287467560960-2287467577344 
Couldn't map the block 2287467560960 
bytenr mismatch, want=2287467560960, have=0 
Couldn't read tree root 
Label: 'SEED_MD2'  uuid: 851e4474-d375-4a25-b202-949e51f05877 
   Total devices 1 FS bytes used 1.94TiB 
   devid1 size 5.44TiB used 1.97TiB path /dev/bcache1 
 
Label: 'SEED_MD2'  uuid: 285b16fa-b297-43ed-bb3b-311950729eb6 
   Total devices 2 FS bytes used 1.98TiB 
   devid2 size 1.80TiB used 102.03GiB path /dev/bcache2 
   *** Some devices missing
 
Would there be any way to tell BTRFS that the missing unit is the one that 
corresponds to the SEED unit so that it correctly rebuilds the file system?

Thank you.



signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] btrfs-progs: Use more restrict check to read out tree root

2017-05-02 Thread David Sterba
On Tue, Apr 25, 2017 at 04:40:16PM +0800, Qu Wenruo wrote:
> For fuzzed image bko-156811-bad-parent-ref-qgroup-verify.raw, it cause
> qgroup to report -ENOMEM.
> 
> But the fact is, such image is heavy damaged so there is not valid root
> item for extent tree.
> 
> Normal extent tree key in root tree should be (EXTENT_TREE ROOT_ITEM 0),
> while in that fuzzed image, we got (EXTENT_TREE EXXTENT_DATA SOME_NUMBER).
> 
> It's btrfs_find_last_root() that only checks the objectid, not caring
> key type leads to such problem.
> 
> Fix by doing extra check on key type for such case.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: Use more restrict check to read out tree root

2017-04-25 Thread Qu Wenruo
For fuzzed image bko-156811-bad-parent-ref-qgroup-verify.raw, it cause
qgroup to report -ENOMEM.

But the fact is, such image is heavy damaged so there is not valid root
item for extent tree.

Normal extent tree key in root tree should be (EXTENT_TREE ROOT_ITEM 0),
while in that fuzzed image, we got (EXTENT_TREE EXXTENT_DATA SOME_NUMBER).

It's btrfs_find_last_root() that only checks the objectid, not caring
key type leads to such problem.

Fix by doing extra check on key type for such case.

Signed-off-by: Qu Wenruo 
---
 root-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/root-tree.c b/root-tree.c
index ab01a140..6b8f8c1c 100644
--- a/root-tree.c
+++ b/root-tree.c
@@ -51,7 +51,8 @@ int btrfs_find_last_root(struct btrfs_root *root, u64 
objectid,
l = path->nodes[0];
slot = path->slots[0] - 1;
btrfs_item_key_to_cpu(l, _key, slot);
-   if (found_key.objectid != objectid) {
+   if (found_key.type != BTRFS_ROOT_ITEM_KEY ||
+   found_key.objectid != objectid) {
ret = -ENOENT;
goto out;
}
-- 
2.12.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Rescue a single-device btrfs instance with zeroed tree root

2016-06-27 Thread Ivan Shapovalov
On 2016-06-21 at 20:23 +0300, Ivan Shapovalov wrote:
> Hello,
> 
> So this is another case of "I lost my partition and do not have
> backups". More precisely, _this_ is the backup and it turned out to
> be
> damaged.
> 
> (The backup was made by partclone.btrfs. Together with a zeroed out
> tree root, this asks for a bug in partclone...)
> 
> So: the tree root is zeroes, backup roots are zeroes too,
> btrfs-find-root only reports blocks of level 0 (needed is 1).
> Is there something that can be done? Maybe it is possible to
> reconstruct the root from its children?
> Operations log following.
> Please Cc: me in replies as I'm not subscribed to the list.


Anyone? I'd appreciate any advice on how to rebuild the tree roots.

I'd even write some code if someone tells me the disk format and
logical tree constraints (i. e. in which order to put pointers to child
nodes).

BTW, it looks like all tree roots are lost, i. e. btrfs-find-root with
any objectid (extent tree, subvolume tree, anything in ctree.h) finds
only level 0 nodes.

It should be possible to rebuild intermediate nodes, isn't it? Do they
contain valuable information?

Please Cc: me in replies as I'm not subscribed to the list.

Thanks,
-- 

Ivan Shapovalov / intelfx /

> 
> 1. regular mount
> 
> # mount /dev/loop0p3 /mnt/temp
> 
> === dmesg ===
> [106737.299592] BTRFS info (device loop0p3): disk space caching is
> enabled
> [106737.299604] BTRFS: has skinny extents
> [106737.299884] BTRFS error (device loop0p3): bad tree block start 0
> 162633449472
> [106737.299888] BTRFS: failed to read tree root on loop0p3
> [106737.314359] BTRFS: open_ctree failed
> === end dmesg ===
> 
> 2. mount with -o recovery
> 
> # mount -o recovery /dev/loop0p3 /mnt/temp
> 
> === dmesg ===
> [106742.305720] BTRFS warning (device loop0p3): 'recovery' is
> deprecated, use 'usebackuproot' instead
> [106742.305722] BTRFS info (device loop0p3): trying to use backup
> root at mount time
> [106742.305724] BTRFS info (device loop0p3): disk space caching is
> enabled
> [106742.305725] BTRFS: has skinny extents
> [106742.306056] BTRFS error (device loop0p3): bad tree block start 0
> 162633449472
> [106742.306060] BTRFS: failed to read tree root on loop0p3
> [106742.306069] BTRFS error (device loop0p3): bad tree block start 0
> 162633449472
> [106742.306071] BTRFS: failed to read tree root on loop0p3
> [106742.306084] BTRFS error (device loop0p3): bad tree block start 0
> 162632237056
> [106742.306086] BTRFS: failed to read tree root on loop0p3
> [106742.306097] BTRFS error (device loop0p3): bad tree block start 0
> 162626682880
> [106742.306100] BTRFS: failed to read tree root on loop0p3
> [106742.306111] BTRFS error (device loop0p3): bad tree block start 0
> 162609168384
> [106742.306114] BTRFS: failed to read tree root on loop0p3
> [106742.327272] BTRFS: open_ctree failed
> === end dmesg ===
> 
> 
> 3. btrfs-find-root
> 
> # btrfs-find-root /dev/loop0p3
> Couldn't read tree root
> Superblock thinks the generation is 22332
> Superblock thinks the level is 1
> Well block 162633646080(gen: 22332 level: 0) seems good, but
> generation/level doesn't match, want gen: 22332 level: 1
> Well block 162633596928(gen: 22332 level: 0) seems good, but
> generation/level doesn't match, want gen: 22332 level: 1
> Well block 162633515008(gen: 22332 level: 0) seems good, but
> generation/level doesn't match, want gen: 22332 level: 1
> 
> Thanks,

signature.asc
Description: This is a digitally signed message part


Rescue a single-device btrfs instance with zeroed tree root

2016-06-21 Thread Ivan Shapovalov
Hello,

So this is another case of "I lost my partition and do not have
backups". More precisely, _this_ is the backup and it turned out to be
damaged.

(The backup was made by partclone.btrfs. Together with a zeroed out
tree root, this asks for a bug in partclone...)

So: the tree root is zeroes, backup roots are zeroes too,
btrfs-find-root only reports blocks of level 0 (needed is 1).
Is there something that can be done? Maybe it is possible to
reconstruct the root from its children?
Operations log following.
Please Cc: me in replies as I'm not subscribed to the list.

1. regular mount

# mount /dev/loop0p3 /mnt/temp

=== dmesg ===
[106737.299592] BTRFS info (device loop0p3): disk space caching is enabled
[106737.299604] BTRFS: has skinny extents
[106737.299884] BTRFS error (device loop0p3): bad tree block start 0 
162633449472
[106737.299888] BTRFS: failed to read tree root on loop0p3
[106737.314359] BTRFS: open_ctree failed
=== end dmesg ===

2. mount with -o recovery

# mount -o recovery /dev/loop0p3 /mnt/temp

=== dmesg ===
[106742.305720] BTRFS warning (device loop0p3): 'recovery' is deprecated, use 
'usebackuproot' instead
[106742.305722] BTRFS info (device loop0p3): trying to use backup root at mount 
time
[106742.305724] BTRFS info (device loop0p3): disk space caching is enabled
[106742.305725] BTRFS: has skinny extents
[106742.306056] BTRFS error (device loop0p3): bad tree block start 0 
162633449472
[106742.306060] BTRFS: failed to read tree root on loop0p3
[106742.306069] BTRFS error (device loop0p3): bad tree block start 0 
162633449472
[106742.306071] BTRFS: failed to read tree root on loop0p3
[106742.306084] BTRFS error (device loop0p3): bad tree block start 0 
162632237056
[106742.306086] BTRFS: failed to read tree root on loop0p3
[106742.306097] BTRFS error (device loop0p3): bad tree block start 0 
162626682880
[106742.306100] BTRFS: failed to read tree root on loop0p3
[106742.306111] BTRFS error (device loop0p3): bad tree block start 0 
162609168384
[106742.306114] BTRFS: failed to read tree root on loop0p3
[106742.327272] BTRFS: open_ctree failed
=== end dmesg ===


3. btrfs-find-root

# btrfs-find-root /dev/loop0p3
Couldn't read tree root
Superblock thinks the generation is 22332
Superblock thinks the level is 1
Well block 162633646080(gen: 22332 level: 0) seems good, but generation/level 
doesn't match, want gen: 22332 level: 1
Well block 162633596928(gen: 22332 level: 0) seems good, but generation/level 
doesn't match, want gen: 22332 level: 1
Well block 162633515008(gen: 22332 level: 0) seems good, but generation/level 
doesn't match, want gen: 22332 level: 1

Thanks,
-- 

Ivan Shapovalov / intelfx /

signature.asc
Description: This is a digitally signed message part


[PATCH V2] Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots

2016-01-07 Thread Chandan Rajendra
The following call trace is seen when btrfs/031 test is executed in a loop,

[  158.661848] [ cut here ]
[  158.662634] WARNING: CPU: 2 PID: 890 at 
/home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea()
[  158.664102] BTRFS: Transaction aborted (error -2)
[  158.664774] Modules linked in:
[  158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711a #2
[  158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Bochs 01/01/2011
[  158.667392]  81c0a6b0 8806c7c4f8e8 81431fc8 
8806c7c4f930
[  158.668515]  8806c7c4f920 81051aa1 880c85aff000 
8800bb44d000
[  158.669647]  8808863b5c98  fffe 
8806c7c4f980
[  158.670769] Call Trace:
[  158.671153]  [] dump_stack+0x44/0x5c
[  158.671884]  [] warn_slowpath_common+0x81/0xc0
[  158.672769]  [] warn_slowpath_fmt+0x47/0x50
[  158.673620]  [] create_subvol+0x3d1/0x6ea
[  158.674440]  [] btrfs_mksubvol.isra.30+0x369/0x520
[  158.675376]  [] ? percpu_down_read+0x1a/0x50
[  158.676235]  [] btrfs_ioctl_snap_create_transid+0x101/0x180
[  158.677268]  [] btrfs_ioctl_snap_create+0x52/0x70
[  158.678183]  [] btrfs_ioctl+0x474/0x2f90
[  158.678975]  [] ? vma_merge+0xee/0x300
[  158.679751]  [] ? alloc_pages_vma+0x91/0x170
[  158.680599]  [] ? 
lru_cache_add_active_or_unevictable+0x22/0x70
[  158.681686]  [] ? selinux_file_ioctl+0xff/0x1d0
[  158.682581]  [] do_vfs_ioctl+0x2c1/0x490
[  158.683399]  [] ? security_file_ioctl+0x3e/0x60
[  158.684297]  [] SyS_ioctl+0x74/0x80
[  158.685051]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[  158.685958] ---[ end trace 4b63312de5a2cb76 ]---
[  158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No 
such entry
[  158.709508] BTRFS info (device loop0): forced readonly
[  158.737113] BTRFS info (device loop0): disk space caching is enabled
[  158.738096] BTRFS error (device loop0): Remounting read-write after error is 
not allowed
[  158.851303] BTRFS error (device loop0): cleaner transaction attach returned 
-30

This occurs because,

Mount filesystem
Create subvol with ID 257
Unmount filesystem
Mount filesystem
Delete subvol with ID 257
  btrfs_drop_snapshot()
Add root corresponding to subvol 257 into
btrfs_transaction->dropped_roots list
Create new subvol (i.e. create_subvol())
  257 is returned as the next free objectid
  btrfs_read_fs_root_no_name()
Finds the btrfs_root instance corresponding to the old subvol with ID 257
in btrfs_fs_info->fs_roots_radix.
Returns error since btrfs_root_item->refs has the value of 0.

To fix the issue the commit initializes tree root's and subvolume root's
highest_objectid when loading the roots from disk.

Signed-off-by: Chandan Rajendra <chan...@linux.vnet.ibm.com>
---
Changelog:

v1->v2:
A newly created subvolume cannot have an ID beyond
BTRFS_LAST_FREE_OBJECTID.  Hence when loading tree root tree or any of
the subvolume trees, We now use an assert statement to verify if
root->highest_objectid has a value greater than
BTRFS_LAST_FREE_OBJECTID.

 fs/btrfs/disk-io.c   | 27 +++
 fs/btrfs/inode-map.c |  9 +
 fs/btrfs/inode-map.h |  1 +
 fs/btrfs/ioctl.c |  4 
 4 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dc6b73a..3679ad6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1583,8 +1583,23 @@ int btrfs_init_fs_root(struct btrfs_root *root)
ret = get_anon_bdev(>anon_dev);
if (ret)
goto free_writers;
+
+   mutex_lock(>objectid_mutex);
+   ret = btrfs_find_highest_objectid(root,
+   >highest_objectid);
+   if (ret) {
+   mutex_unlock(>objectid_mutex);
+   goto free_root_dev;
+   }
+
+   ASSERT(root->highest_objectid <= BTRFS_LAST_FREE_OBJECTID);
+
+   mutex_unlock(>objectid_mutex);
+
return 0;
 
+free_root_dev:
+   free_anon_bdev(root->anon_dev);
 free_writers:
btrfs_free_subvolume_writers(root->subv_writers);
 fail:
@@ -2914,6 +2929,18 @@ retry_root_backup:
tree_root->commit_root = btrfs_root_node(tree_root);
btrfs_set_root_refs(_root->root_item, 1);
 
+   mutex_lock(_root->objectid_mutex);
+   ret = btrfs_find_highest_objectid(tree_root,
+   _root->highest_objectid);
+   if (ret) {
+   mutex_unlock(_root->objectid_mutex);
+   goto recovery_tree_root;
+   }
+
+   ASSERT(tree_root->highest_objectid <= BTRFS_LAST_FREE_OBJECTID);
+
+   mutex_unlock(_root->objectid_mutex);
+
ret = btrfs_read_roots(fs_info, tree_root);
if (ret)
goto recovery_tree_root;
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 767a605..07573dc 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/i

Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots

2016-01-06 Thread Chandan Rajendra
On Tuesday 05 Jan 2016 13:12:34 David Sterba wrote:
> Sorry for not answering that. As you're going to resend it, please
> use EOVERFLOW in the btrfs_init_fs_root. We should not hit the overflow
> error in the mount path.
Right. Now I understand that.

David, Replacing the following code snippet instances (in both open_ctree()
and btrfs_init_fs_root()) ...

if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
mutex_unlock(>objectid_mutex);
ret = -EOVERFLOW;
goto free_root_dev;
}

with 

 ASSERT(tree_root->highest_objectid <= BTRFS_LAST_FREE_OBJECTID);

is probably a better option?

The validation of root->highest_objectid must have been done by
btrfs_find_free_objectid() when creating the subvolume. If the parent
subvolume already has an objectid with BTRFS_LAST_FREE_OBJECTID as the value,
btrfs_find_free_objectid() would return with an error and hence we should
never have subvolumes containing other subvolumes with objectid greater than
BTRFS_LAST_FREE_OBJECTID.

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots

2016-01-05 Thread David Sterba
On Wed, Oct 07, 2015 at 07:40:46PM +0530, Chandan Rajendra wrote:
> On Wednesday 07 Oct 2015 11:25:03 David Sterba wrote:
> > On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote:
> > > + if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
> > > + mutex_unlock(>objectid_mutex);
> > > + ret = -ENOSPC;
> > 
> > ENOSPC ... I don't think it's right as this could be with a normal
> > enospc during subvolume creation. The problem is that theh inode number
> > space is exhausted, the closest error code I see is EOVERFLOW. As this
> > is an ioctl we can afford to define the meaning of this return value as
> > such (unlike for eg. creat()/open()).
> > 
> > > + goto free_root_dev;
> > > + }
> > > +
> > > + mutex_unlock(>objectid_mutex);
> > > +
> > > 
> > >   return 0;
> 
> David, Are you suggesting that we return -EOVERFLOW from within
> btrfs_init_fs_root() and continue returning -ENOSPC in case of error
> (i.e. tree_root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID) from
> open_ctree()?
> 
> If yes, btrfs_init_fs_root() gets invoked from open_ctree() via
> btrfs_read_fs_root_no_name() and hence we may end up returning -EOVERFLOW when
> servicing the mount() syscall.

Sorry for not answering that. As you're going to resend it, please
use EOVERFLOW in the btrfs_init_fs_root. We should not hit the overflow
error in the mount path.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots

2016-01-04 Thread Chandan Rajendra
On Sunday 03 Jan 2016 00:02:18 james harvey wrote:
> Bump.
> 
> Pretty sure I just ran into this, outside of a testing scenario.  See
> http://permalink.gmane.org/gmane.comp.file-systems.btrfs/51796
> 
> Looks like the patch was never committed.
>

Thanks for pointing this out. I will rebase & test the patch on top of current
integration branch and post it again.

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots

2016-01-02 Thread james harvey
Bump.

Pretty sure I just ran into this, outside of a testing scenario.  See
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/51796

Looks like the patch was never committed.

On Wed, Oct 7, 2015 at 10:10 AM, Chandan Rajendra
 wrote:
> On Wednesday 07 Oct 2015 11:25:03 David Sterba wrote:
>> On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote:
>> > +   if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
>> > +   mutex_unlock(>objectid_mutex);
>> > +   ret = -ENOSPC;
>>
>> ENOSPC ... I don't think it's right as this could be with a normal
>> enospc during subvolume creation. The problem is that theh inode number
>> space is exhausted, the closest error code I see is EOVERFLOW. As this
>> is an ioctl we can afford to define the meaning of this return value as
>> such (unlike for eg. creat()/open()).
>>
>> > +   goto free_root_dev;
>> > +   }
>> > +
>> > +   mutex_unlock(>objectid_mutex);
>> > +
>> >
>> > return 0;
>
> David, Are you suggesting that we return -EOVERFLOW from within
> btrfs_init_fs_root() and continue returning -ENOSPC in case of error
> (i.e. tree_root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID) from
> open_ctree()?
>
> If yes, btrfs_init_fs_root() gets invoked from open_ctree() via
> btrfs_read_fs_root_no_name() and hence we may end up returning -EOVERFLOW when
> servicing the mount() syscall.
>
> --
> chandan
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 09/25] btrfs-progs: Introduce function to setup temporary tree root

2015-11-30 Thread Qu Wenruo
Introduce new function, setup_temp_tree_root(), to initialize temporary
tree root for make_btrfs_v2().

The new function will setup tree root at metadata chunk and ensure data
won't be written into metadata chunk.

Also, new make_btrfs_v2() will have a much better code structure than
old make_btrfs().

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 utils.c | 148 
 1 file changed, 148 insertions(+)

diff --git a/utils.c b/utils.c
index 444b3f3..fa26949 100644
--- a/utils.c
+++ b/utils.c
@@ -303,6 +303,136 @@ out:
 }
 
 /*
+ * Setup an extent buffer for tree block.
+ */
+static int setup_temp_extent_buffer(struct extent_buffer *buf,
+   struct btrfs_mkfs_config *cfg,
+   u64 bytenr, u64 owner)
+{
+   unsigned char fsid[BTRFS_FSID_SIZE];
+   unsigned char chunk_uuid[BTRFS_UUID_SIZE];
+   int ret;
+
+   /* We rely on cfg->fs_uuid and chunk_uuid to fsid and chunk uuid */
+   BUG_ON(!cfg->fs_uuid || !cfg->chunk_uuid);
+   ret = uuid_parse(cfg->fs_uuid, fsid);
+   if (ret)
+   return -EINVAL;
+   ret = uuid_parse(cfg->chunk_uuid, chunk_uuid);
+   if (ret)
+   return -EINVAL;
+
+   memset(buf->data, 0, cfg->nodesize);
+   buf->len = cfg->nodesize;
+   btrfs_set_header_bytenr(buf, bytenr);
+   btrfs_set_header_generation(buf, 1);
+   btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
+   btrfs_set_header_owner(buf, owner);
+   btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN);
+   write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf),
+   BTRFS_UUID_SIZE);
+   write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
+   return 0;
+}
+
+static inline int write_temp_extent_buffer(int fd, struct extent_buffer *buf,
+  u64 bytenr)
+{
+   int ret;
+
+   csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0);
+
+   /* Temporary extent buffer is always mapped 1:1 on disk */
+   ret = pwrite(fd, buf->data, buf->len, bytenr);
+   if (ret < buf->len)
+   ret = (ret < 0 ? ret : -EIO);
+   else
+   ret = 0;
+   return ret;
+}
+
+/*
+ * Insert a root item for temporary tree root
+ *
+ * Only used in make_btrfs_v2().
+ */
+static void insert_temp_root_item(struct extent_buffer *buf,
+ struct btrfs_mkfs_config *cfg,
+ int *slot, u32 *itemoff, u64 objectid,
+ u64 bytenr)
+{
+   struct btrfs_root_item root_item;
+   struct btrfs_inode_item *inode_item;
+   struct btrfs_disk_key disk_key;
+
+   btrfs_set_header_nritems(buf, *slot + 1);
+   (*itemoff) -= sizeof(root_item);
+   memset(_item, 0, sizeof(root_item));
+   inode_item = _item.inode;
+   btrfs_set_stack_inode_generation(inode_item, 1);
+   btrfs_set_stack_inode_size(inode_item, 3);
+   btrfs_set_stack_inode_nlink(inode_item, 1);
+   btrfs_set_stack_inode_nbytes(inode_item, cfg->nodesize);
+   btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755);
+   btrfs_set_root_refs(_item, 1);
+   btrfs_set_root_used(_item, cfg->nodesize);
+   btrfs_set_root_generation(_item, 1);
+   btrfs_set_root_bytenr(_item, bytenr);
+
+   memset(_key, 0, sizeof(disk_key));
+   btrfs_set_disk_key_type(_key, BTRFS_ROOT_ITEM_KEY);
+   btrfs_set_disk_key_objectid(_key, objectid);
+   btrfs_set_disk_key_offset(_key, 0);
+
+   btrfs_set_item_key(buf, _key, *slot);
+   btrfs_set_item_offset(buf, btrfs_item_nr(*slot), *itemoff);
+   btrfs_set_item_size(buf, btrfs_item_nr(*slot), sizeof(root_item));
+   write_extent_buffer(buf, _item,
+   btrfs_item_ptr_offset(buf, *slot),
+   sizeof(root_item));
+   (*slot)++;
+}
+
+static int setup_temp_root_tree(int fd, struct btrfs_mkfs_config *cfg,
+   u64 root_bytenr, u64 extent_bytenr,
+   u64 dev_bytenr, u64 fs_bytenr, u64 csum_bytenr)
+{
+   struct extent_buffer *buf = NULL;
+   u32 itemoff = __BTRFS_LEAF_DATA_SIZE(cfg->nodesize);
+   int slot = 0;
+   int ret;
+
+   /*
+    * Provided bytenr must in ascending order, or tree root will have a
+* bad key order.
+*/
+   BUG_ON(!(root_bytenr < extent_bytenr && extent_bytenr < dev_bytenr &&
+dev_bytenr < fs_bytenr && fs_bytenr < csum_bytenr));
+   buf = malloc(sizeof(*buf) + cfg->nodesize);
+   if (!buf)
+   return -ENOMEM;
+
+   ret = setup_temp_extent_buffer(buf, cfg, root_bytenr,
+  BTRFS_ROOT_TREE_OB

[PATCH 09/25] btrfs-progs: Introduce function to setup temporary tree root

2015-11-19 Thread Qu Wenruo
Introduce new function, setup_temp_tree_root(), to initialize temporary
tree root for make_btrfs_v2().

The new function will setup tree root at metadata chunk and ensure data
won't be written into metadata chunk.

Also, new make_btrfs_v2() will have a much better code structure than
old make_btrfs().

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 utils.c | 148 
 1 file changed, 148 insertions(+)

diff --git a/utils.c b/utils.c
index 6c4da04..ce4d93c 100644
--- a/utils.c
+++ b/utils.c
@@ -493,6 +493,136 @@ out:
 }
 
 /*
+ * Setup an extent buffer for tree block.
+ */
+static int setup_temp_extent_buffer(struct extent_buffer *buf,
+   struct btrfs_mkfs_config *cfg,
+   u64 bytenr, u64 owner)
+{
+   unsigned char fsid[BTRFS_FSID_SIZE];
+   unsigned char chunk_uuid[BTRFS_UUID_SIZE];
+   int ret;
+
+   /* We rely on cfg->fs_uuid and chunk_uuid to fsid and chunk uuid */
+   BUG_ON(!cfg->fs_uuid || !cfg->chunk_uuid);
+   ret = uuid_parse(cfg->fs_uuid, fsid);
+   if (ret)
+   return -EINVAL;
+   ret = uuid_parse(cfg->chunk_uuid, chunk_uuid);
+   if (ret)
+   return -EINVAL;
+
+   memset(buf->data, 0, cfg->nodesize);
+   buf->len = cfg->nodesize;
+   btrfs_set_header_bytenr(buf, bytenr);
+   btrfs_set_header_generation(buf, 1);
+   btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
+   btrfs_set_header_owner(buf, owner);
+   btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN);
+   write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf),
+   BTRFS_UUID_SIZE);
+   write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
+   return 0;
+}
+
+static inline int write_temp_extent_buffer(int fd, struct extent_buffer *buf,
+  u64 bytenr)
+{
+   int ret;
+
+   csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0);
+
+   /* Temporary extent buffer is always mapped 1:1 on disk */
+   ret = pwrite(fd, buf->data, buf->len, bytenr);
+   if (ret < buf->len)
+   ret = (ret < 0 ? ret : -EIO);
+   else
+   ret = 0;
+   return ret;
+}
+
+/*
+ * Insert a root item for temporary tree root
+ *
+ * Only used in make_btrfs_v2().
+ */
+static void insert_temp_root_item(struct extent_buffer *buf,
+ struct btrfs_mkfs_config *cfg,
+ int *slot, u32 *itemoff, u64 objectid,
+ u64 bytenr)
+{
+   struct btrfs_root_item root_item;
+   struct btrfs_inode_item *inode_item;
+   struct btrfs_disk_key disk_key;
+
+   btrfs_set_header_nritems(buf, *slot + 1);
+   (*itemoff) -= sizeof(root_item);
+   memset(_item, 0, sizeof(root_item));
+   inode_item = _item.inode;
+   btrfs_set_stack_inode_generation(inode_item, 1);
+   btrfs_set_stack_inode_size(inode_item, 3);
+   btrfs_set_stack_inode_nlink(inode_item, 1);
+   btrfs_set_stack_inode_nbytes(inode_item, cfg->nodesize);
+   btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755);
+   btrfs_set_root_refs(_item, 1);
+   btrfs_set_root_used(_item, cfg->nodesize);
+   btrfs_set_root_generation(_item, 1);
+   btrfs_set_root_bytenr(_item, bytenr);
+
+   memset(_key, 0, sizeof(disk_key));
+   btrfs_set_disk_key_type(_key, BTRFS_ROOT_ITEM_KEY);
+   btrfs_set_disk_key_objectid(_key, objectid);
+   btrfs_set_disk_key_offset(_key, 0);
+
+   btrfs_set_item_key(buf, _key, *slot);
+   btrfs_set_item_offset(buf, btrfs_item_nr(*slot), *itemoff);
+   btrfs_set_item_size(buf, btrfs_item_nr(*slot), sizeof(root_item));
+   write_extent_buffer(buf, _item,
+   btrfs_item_ptr_offset(buf, *slot),
+   sizeof(root_item));
+   (*slot)++;
+}
+
+static int setup_temp_root_tree(int fd, struct btrfs_mkfs_config *cfg,
+   u64 root_bytenr, u64 extent_bytenr,
+   u64 dev_bytenr, u64 fs_bytenr, u64 csum_bytenr)
+{
+   struct extent_buffer *buf = NULL;
+   u32 itemoff = __BTRFS_LEAF_DATA_SIZE(cfg->nodesize);
+   int slot = 0;
+   int ret;
+
+   /*
+    * Provided bytenr must in ascending order, or tree root will have a
+* bad key order.
+*/
+   BUG_ON(!(root_bytenr < extent_bytenr && extent_bytenr < dev_bytenr &&
+dev_bytenr < fs_bytenr && fs_bytenr < csum_bytenr));
+   buf = malloc(sizeof(*buf) + cfg->nodesize);
+   if (!buf)
+   return -ENOMEM;
+
+   ret = setup_temp_extent_buffer(buf, cfg, root_bytenr,
+  BTRFS_ROOT_TREE_OB

[RFC PATCH 09/22] btrfs-progs: Introduce function to setup temporary tree root

2015-11-17 Thread Qu Wenruo
Introduce new function, setup_temp_tree_root(), to initialize temporary
tree root for make_btrfs_v2().

The new function will setup tree root at metadata chunk and ensure data
won't be written into metadata chunk.

Also, new make_btrfs_v2() will have a much better code structure than
old make_btrfs().

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 utils.c | 148 
 1 file changed, 148 insertions(+)

diff --git a/utils.c b/utils.c
index 6c4da04..ce4d93c 100644
--- a/utils.c
+++ b/utils.c
@@ -493,6 +493,136 @@ out:
 }
 
 /*
+ * Setup an extent buffer for tree block.
+ */
+static int setup_temp_extent_buffer(struct extent_buffer *buf,
+   struct btrfs_mkfs_config *cfg,
+   u64 bytenr, u64 owner)
+{
+   unsigned char fsid[BTRFS_FSID_SIZE];
+   unsigned char chunk_uuid[BTRFS_UUID_SIZE];
+   int ret;
+
+   /* We rely on cfg->fs_uuid and chunk_uuid to fsid and chunk uuid */
+   BUG_ON(!cfg->fs_uuid || !cfg->chunk_uuid);
+   ret = uuid_parse(cfg->fs_uuid, fsid);
+   if (ret)
+   return -EINVAL;
+   ret = uuid_parse(cfg->chunk_uuid, chunk_uuid);
+   if (ret)
+   return -EINVAL;
+
+   memset(buf->data, 0, cfg->nodesize);
+   buf->len = cfg->nodesize;
+   btrfs_set_header_bytenr(buf, bytenr);
+   btrfs_set_header_generation(buf, 1);
+   btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
+   btrfs_set_header_owner(buf, owner);
+   btrfs_set_header_flags(buf, BTRFS_HEADER_FLAG_WRITTEN);
+   write_extent_buffer(buf, chunk_uuid, btrfs_header_chunk_tree_uuid(buf),
+   BTRFS_UUID_SIZE);
+   write_extent_buffer(buf, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
+   return 0;
+}
+
+static inline int write_temp_extent_buffer(int fd, struct extent_buffer *buf,
+  u64 bytenr)
+{
+   int ret;
+
+   csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0);
+
+   /* Temporary extent buffer is always mapped 1:1 on disk */
+   ret = pwrite(fd, buf->data, buf->len, bytenr);
+   if (ret < buf->len)
+   ret = (ret < 0 ? ret : -EIO);
+   else
+   ret = 0;
+   return ret;
+}
+
+/*
+ * Insert a root item for temporary tree root
+ *
+ * Only used in make_btrfs_v2().
+ */
+static void insert_temp_root_item(struct extent_buffer *buf,
+ struct btrfs_mkfs_config *cfg,
+ int *slot, u32 *itemoff, u64 objectid,
+ u64 bytenr)
+{
+   struct btrfs_root_item root_item;
+   struct btrfs_inode_item *inode_item;
+   struct btrfs_disk_key disk_key;
+
+   btrfs_set_header_nritems(buf, *slot + 1);
+   (*itemoff) -= sizeof(root_item);
+   memset(_item, 0, sizeof(root_item));
+   inode_item = _item.inode;
+   btrfs_set_stack_inode_generation(inode_item, 1);
+   btrfs_set_stack_inode_size(inode_item, 3);
+   btrfs_set_stack_inode_nlink(inode_item, 1);
+   btrfs_set_stack_inode_nbytes(inode_item, cfg->nodesize);
+   btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755);
+   btrfs_set_root_refs(_item, 1);
+   btrfs_set_root_used(_item, cfg->nodesize);
+   btrfs_set_root_generation(_item, 1);
+   btrfs_set_root_bytenr(_item, bytenr);
+
+   memset(_key, 0, sizeof(disk_key));
+   btrfs_set_disk_key_type(_key, BTRFS_ROOT_ITEM_KEY);
+   btrfs_set_disk_key_objectid(_key, objectid);
+   btrfs_set_disk_key_offset(_key, 0);
+
+   btrfs_set_item_key(buf, _key, *slot);
+   btrfs_set_item_offset(buf, btrfs_item_nr(*slot), *itemoff);
+   btrfs_set_item_size(buf, btrfs_item_nr(*slot), sizeof(root_item));
+   write_extent_buffer(buf, _item,
+   btrfs_item_ptr_offset(buf, *slot),
+   sizeof(root_item));
+   (*slot)++;
+}
+
+static int setup_temp_root_tree(int fd, struct btrfs_mkfs_config *cfg,
+   u64 root_bytenr, u64 extent_bytenr,
+   u64 dev_bytenr, u64 fs_bytenr, u64 csum_bytenr)
+{
+   struct extent_buffer *buf = NULL;
+   u32 itemoff = __BTRFS_LEAF_DATA_SIZE(cfg->nodesize);
+   int slot = 0;
+   int ret;
+
+   /*
+    * Provided bytenr must in ascending order, or tree root will have a
+* bad key order.
+*/
+   BUG_ON(!(root_bytenr < extent_bytenr && extent_bytenr < dev_bytenr &&
+dev_bytenr < fs_bytenr && fs_bytenr < csum_bytenr));
+   buf = malloc(sizeof(*buf) + cfg->nodesize);
+   if (!buf)
+   return -ENOMEM;
+
+   ret = setup_temp_extent_buffer(buf, cfg, root_bytenr,
+  BTRFS_ROOT_TREE_OB

Re: [PATCH] Btrfs: Intialize btrfs_root->highest_objectid when loading tree root and subvolume roots

2015-10-07 Thread David Sterba
On Mon, Oct 05, 2015 at 10:14:24PM +0530, Chandan Rajendra wrote:
> + if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
> + mutex_unlock(>objectid_mutex);
> + ret = -ENOSPC;

ENOSPC ... I don't think it's right as this could be with a normal
enospc during subvolume creation. The problem is that theh inode number
space is exhausted, the closest error code I see is EOVERFLOW. As this
is an ioctl we can afford to define the meaning of this return value as
such (unlike for eg. creat()/open()).

> + goto free_root_dev;
> + }
> +
> + mutex_unlock(>objectid_mutex);
> +
>   return 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/10] btrfs-progs: Allow open_ctree use backup tree root or search it automatically if primary tree root is corrupted.

2015-07-28 Thread David Sterba
On Tue, Jul 28, 2015 at 08:34:50AM +0800, Qu Wenruo wrote:
 Yes, you were against the automatic use of backup root or especially 
 iteration all metadata space to find the latest tree root.
 
 I'll try to add a new option like --full-scan to enable the automatic 
 search of all metadata space.
 
 But I'm afraid it may be a little late as I'm recently debugging other 
 things like in-band deduplication.

This is not anything urgent, I was going through my git branches with
pending patches and found this.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/10] btrfs-progs: Allow open_ctree use backup tree root or search it automatically if primary tree root is corrupted.

2015-07-27 Thread David Sterba
On Mon, Jan 19, 2015 at 02:45:12PM +0800, Qu Wenruo wrote:
 Allow open_ctree to try its best to open tree root on damaged file system.
 
 With this patch, open_ctree will follow the below priority to read tree
 root, providing better chance to open damaged btrfs fs.
 1) Using root bytenr in SB to read tree root
 Normal routine if not specified to use backup roots or specified root
 tree bytenr.
 
 2) Using backup root if specified or 1) fails
 Backup will be automatically used without user specification if
 normal routine fails
 
 3) Try to search possible tree root in all its metadata chunks
 The last chance, searching through all the metadata space.
 May takes a long time but still worth a try since above methods all
 fails.

I don't see this patch merged and there are no folowups. IIRC I had
objections against the automatic behaviour but cannot find the
discussion. Can we merge at least some bit of that patch? It does more
than one thing so it would be better to split it.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/10] btrfs-progs: Allow open_ctree use backup tree root or search it automatically if primary tree root is corrupted.

2015-07-27 Thread Qu Wenruo



David Sterba wrote on 2015/07/27 17:04 +0200:

On Mon, Jan 19, 2015 at 02:45:12PM +0800, Qu Wenruo wrote:

Allow open_ctree to try its best to open tree root on damaged file system.

With this patch, open_ctree will follow the below priority to read tree
root, providing better chance to open damaged btrfs fs.
1) Using root bytenr in SB to read tree root
Normal routine if not specified to use backup roots or specified root
tree bytenr.

2) Using backup root if specified or 1) fails
Backup will be automatically used without user specification if
normal routine fails

3) Try to search possible tree root in all its metadata chunks
The last chance, searching through all the metadata space.
May takes a long time but still worth a try since above methods all
fails.


I don't see this patch merged and there are no folowups. IIRC I had
objections against the automatic behaviour but cannot find the
discussion. Can we merge at least some bit of that patch? It does more
than one thing so it would be better to split it.



I forgot this patchset again.
I must admit that I have a bad memory...

Yes, you were against the automatic use of backup root or especially 
iteration all metadata space to find the latest tree root.


I'll try to add a new option like --full-scan to enable the automatic 
search of all metadata space.


But I'm afraid it may be a little late as I'm recently debugging other 
things like in-band deduplication.


Thanks
Qu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root.

2015-02-03 Thread Qu Wenruo
New function reset_one_root_csum() will reset all csum in one root.
And reset_roots_csum() will reset all csum of all trees in tree root.
which provides the basis for later dangerous options.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 cmds-check.c | 176 +++
 1 file changed, 176 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index e4b4f4a..535a518 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9193,6 +9193,182 @@ out:
return ret;
 }
 
+static int reset_one_root_csum(struct btrfs_root *root)
+{
+   struct btrfs_fs_info *fs_info = root-fs_info;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   struct extent_buffer *node;
+   u32 sectorsize = root-sectorsize;
+   int max_level = btrfs_root_level(root-root_item);
+   int slot;
+   int i;
+   int ret = 0;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+   path-reada = 0;
+
+   /* Iterate all levels except level 0 */
+   for (i = 0; i  max_level; i++) {
+   int cur_level = max_level - i;
+
+   path-lowest_level = cur_level;
+   key.offset = 0;
+   key.objectid = 0;
+   key.type = 0;
+
+   /*
+* btrfs_search_slot() can't work with lowest_level and cow,
+* so use inslen=0 and cow=0 and later modify will be done
+* directly to disk.
+*/
+   ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+   if (ret  0)
+   goto out;
+
+   /* Iterate all node slots in this level */
+   while (1) {
+   u64 bytenr;
+
+   node = path-nodes[0];
+   slot = path-slots[0];
+   bytenr = btrfs_node_blockptr(node, slot);
+
+   if (bytenr != round_down(bytenr, sectorsize)) {
+   bytenr = round_down(bytenr, sectorsize);
+   btrfs_set_node_blockptr(node, slot, bytenr);
+   ret = write_tree_block(NULL, root, node);
+   if (ret  0) {
+   fprintf(stderr,
+   Fail to write extent at %llu\n,
+   bytenr);
+   goto out;
+   }
+   }
+   ret = reset_tree_block_csum(fs_info, bytenr,
+   root-nodesize);
+   if (ret  0) {
+   fprintf(stderr,
+   Fail to reset csum for tree block at 
%llu\n,
+   bytenr);
+   goto out;
+   }
+
+   ret = btrfs_next_slot(root, path, cur_level);
+   /*
+* Error should not happen since higher level iteration
+* has already reset the csum of this level.
+* Either way, goto next level should be OK.
+*/
+   if (ret)
+   break;
+   }
+   btrfs_release_path(path);
+   }
+out:
+   btrfs_free_path(path);
+   return ret;
+}
+
+static int reset_roots_csum(struct btrfs_root *tree_root)
+{
+   struct btrfs_fs_info *fs_info = tree_root-fs_info;
+   struct btrfs_key key;
+   struct btrfs_path *path;
+   struct btrfs_root_item *ri;
+   struct btrfs_root *root;
+   struct extent_buffer *node;
+   u32 nodesize = tree_root-nodesize;
+   u32 sectorsize = tree_root-sectorsize;
+   u64 bytenr;
+   int slot;
+   int ret = 0;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   key.objectid = 0;
+   key.offset = 0;
+   key.type = 0;
+
+   /*
+* Tree root is OK and we can do cow. But in case extent tree is
+* corrupted, we still use the nocow method.
+*/
+   ret = btrfs_search_slot(NULL, tree_root, key, path, 0, 0);
+   if (ret  0)
+   goto out;
+   while (1) {
+   slot = path-slots[0];
+   node = path-nodes[0];
+   btrfs_item_key_to_cpu(node, key, slot);
+   if (key.type != BTRFS_ROOT_ITEM_KEY)
+   goto next;
+
+   /*
+* skip tree reloc tree, it's not support by
+* btrfs_read_fs_root() yet.
+*/
+   if (key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+   goto next;
+
+   ri = btrfs_item_ptr(node, slot, struct btrfs_root_item);
+   bytenr

[PATCH 5/5] btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to reset one/all tree's csum in tree root.

2015-02-03 Thread Qu Wenruo
New function reset_one_root_csum() will reset all csum in one root.
And reset_roots_csum() will reset all csum of all trees in tree root.
which provides the basis for later dangerous options.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 cmds-check.c | 157 +++
 1 file changed, 157 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index e4b4f4a..2cdd3d9 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9193,6 +9193,163 @@ out:
return ret;
 }
 
+static int reset_one_root_csum(struct btrfs_trans_handle *trans,
+  struct btrfs_root *root)
+{
+   struct btrfs_fs_info *fs_info = root-fs_info;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   struct extent_buffer *node;
+   u32 sectorsize = root-sectorsize;
+   int max_level = btrfs_root_level(root-root_item);
+   int slot;
+   int i;
+   int ret = 0;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+   path-reada = 0;
+
+   /* Iterate all levels except level 0 */
+   for (i = 0; i  max_level; i++) {
+   int cur_level = max_level - i;
+
+   path-lowest_level = cur_level;
+   key.offset = 0;
+   key.objectid = 0;
+   key.type = 0;
+
+   ret = btrfs_search_slot(trans, root, key, path, 1, 1);
+   if (ret  0)
+   goto out;
+
+   /* Iterate all node slots in this level */
+   while (1) {
+   u64 bytenr;
+
+   node = path-nodes[0];
+   slot = path-slots[0];
+   bytenr = btrfs_node_blockptr(node, slot);
+
+   if (bytenr != round_down(bytenr, sectorsize)) {
+   bytenr = round_down(bytenr, sectorsize);
+   btrfs_set_node_blockptr(node, slot, bytenr);
+   btrfs_mark_buffer_dirty(node);
+   }
+   ret = reset_tree_block_csum(fs_info, bytenr,
+   root-nodesize);
+   if (ret  0) {
+   fprintf(stderr,
+   Fail to reset csum for tree block at 
%llu\n,
+   bytenr);
+   goto next_slot;
+   }
+next_slot:
+   ret = btrfs_next_slot(root, path, cur_level);
+   /*
+* Error should not happen since higher level iteration
+* has already reset the csum of this level.
+* Either way, goto next level should be OK.
+*/
+   if (ret)
+   break;
+   }
+   btrfs_release_path(path);
+   }
+out:
+   btrfs_free_path(path);
+   return ret;
+}
+
+static int reset_roots_csum(struct btrfs_trans_handle *trans,
+   struct btrfs_root *tree_root)
+{
+   struct btrfs_fs_info *fs_info = tree_root-fs_info;
+   struct btrfs_key key;
+   struct btrfs_path *path;
+   struct btrfs_root_item *ri;
+   struct btrfs_root *root;
+   struct extent_buffer *node;
+   u32 nodesize = tree_root-nodesize;
+   u32 sectorsize = tree_root-sectorsize;
+   u64 bytenr;
+   int slot;
+   int ret = 0;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   key.objectid = 0;
+   key.offset = 0;
+   key.type = 0;
+
+   ret = btrfs_search_slot(trans, tree_root, key, path, 1, 0);
+   if (ret  0)
+   goto out;
+   while (1) {
+   slot = path-slots[0];
+   node = path-nodes[0];
+   btrfs_item_key_to_cpu(node, key, slot);
+   if (key.type != BTRFS_ROOT_ITEM_KEY)
+   goto next;
+
+   /*
+* skip tree reloc tree, it's not support by
+* btrfs_read_fs_root() yet.
+*/
+   if (key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+   goto next;
+
+   ri = btrfs_item_ptr(node, slot, struct btrfs_root_item);
+   bytenr = btrfs_disk_root_bytenr(node, ri);
+
+   /*
+* Check if the bytenr is aligned.
+* If the error bit is only in the low 12bits, no real damage
+* will happen since we will align it.
+*/
+   if (round_down(bytenr, sectorsize) != bytenr) {
+   bytenr = round_down(bytenr, sectorsize);
+   btrfs_set_disk_root_bytenr(node, ri, bytenr);
+   btrfs_mark_buffer_dirty(node);
+   }
+
+   ret

[PATCH v2 10/10] btrfs-progs: Allow open_ctree use backup tree root or search it automatically if primary tree root is corrupted.

2015-01-18 Thread Qu Wenruo
Allow open_ctree to try its best to open tree root on damaged file system.

With this patch, open_ctree will follow the below priority to read tree
root, providing better chance to open damaged btrfs fs.
1) Using root bytenr in SB to read tree root
Normal routine if not specified to use backup roots or specified root
tree bytenr.

2) Using backup root if specified or 1) fails
Backup will be automatically used without user specification if
normal routine fails

3) Try to search possible tree root in all its metadata chunks
The last chance, searching through all the metadata space.
May takes a long time but still worth a try since above methods all
fails.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Signed-off-by: David Sterba dste...@suse.cz
---
 disk-io.c | 104 +-
 1 file changed, 96 insertions(+), 8 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 71ca31f..6673a99 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -35,6 +35,7 @@
 #include utils.h
 #include print-tree.h
 #include rbtree-utils.h
+#include find-root.h
 
 static int check_tree_block(struct btrfs_root *root, struct extent_buffer *buf)
 {
@@ -894,9 +895,33 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, 
u64 root_tree_bytenr,
blocksize = btrfs_level_size(root, btrfs_super_root_level(sb));
generation = btrfs_super_generation(sb);
 
-   if (!root_tree_bytenr  !(flags  OPEN_CTREE_BACKUP_ROOT)) {
+   /*
+* If root bytenr is specified, use it and if fails, just return error,
+* not to try other method.
+*/
+   if (root_tree_bytenr) {
+   root-node = read_tree_block(root, root_tree_bytenr,
+blocksize, generation);
+   if (!extent_buffer_uptodate(root-node)) {
+   fprintf(stderr, Couldn't read tree root at %llu\n,
+   root_tree_bytenr);
+   return -EIO;
+   } else
+   goto extent_tree;
+   }
+   /* Normal root bytenr from super */
+   if (!(flags  OPEN_CTREE_BACKUP_ROOT)) {
root_tree_bytenr = btrfs_super_root(sb);
-   } else if (flags  OPEN_CTREE_BACKUP_ROOT) {
+   root-node = read_tree_block(root, root_tree_bytenr,
+blocksize, generation);
+   if (!extent_buffer_uptodate(root-node)) {
+   fprintf(stderr,
+   Couldn't read tree root, try backup\n);
+   ret = -EAGAIN;
+   } else
+   goto extent_tree;
+   }
+   if ((flags  OPEN_CTREE_BACKUP_ROOT) || ret == -EAGAIN) {
struct btrfs_root_backup *backup;
int index = find_best_backup_root(sb);
if (index = BTRFS_NUM_BACKUP_ROOTS) {
@@ -906,15 +931,78 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, 
u64 root_tree_bytenr,
backup = fs_info-super_copy-super_roots + index;
root_tree_bytenr = btrfs_backup_tree_root(backup);
generation = btrfs_backup_tree_root_gen(backup);
+   root-node = read_tree_block(root, root_tree_bytenr,
+blocksize, generation);
+   if (!extent_buffer_uptodate(root-node)) {
+   fprintf(stderr,
+   Couldn't read backup tree root, try searching 
tree root\n);
+   ret = -EAGAIN;
+   } else {
+   ret = 0;
+   goto extent_tree;
+   }
}
 
-   root-node = read_tree_block(root, root_tree_bytenr, blocksize,
-generation);
-   if (!extent_buffer_uptodate(root-node)) {
-   fprintf(stderr, Couldn't read tree root\n);
-   return -EIO;
-   }
+   /* Last chance, search the tree root */
+   if (ret == -EAGAIN) {
+   struct btrfs_find_root_filter filter = {0};
+   struct btrfs_find_root_gen_cache *gen_cache;
+   struct cache_tree result;
+   struct cache_extent *cache;
+   struct cache_extent *tree_block;
+
+   filter.objectid = BTRFS_ROOT_TREE_OBJECTID;
+   cache_tree_init(result);
+
+   printf(Searching tree root, may take some time\n);
+   ret = btrfs_find_root_search(fs_info-chunk_root, filter,
+result, NULL);
+   if (ret  0) {
+   fprintf(stderr, Couldn't search tree root: %s\n,
+   strerror(-ret));
+   btrfs_find_root_free(result);
+   return ret;
+   }
+   if (cache_tree_empty(result)) {
+   fprintf(stderr, Fail to find any tree

[SOLVED] btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-11-04 Thread Ansgar Hockmann-Stolle

For solution see
http://article.gmane.org/gmane.comp.file-systems.btrfs/39974

Am 28.10.14 um 00:03 schrieb Ansgar Hockmann-Stolle:

Am 27.10.14 um 14:23 schrieb Ansgar Hockmann-Stolle:

Hi!

My btrfs system partition went readonly. After reboot it doesnt mount
anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm
on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250
GB SSD to some USB file and tried some btrfs tools on another copy per
loopback device. But everything failed with:

kernel: BTRFS: failed to read tree root on dm-2

See http://pastebin.com/raw.php?i=dPnU6nzg.

Any hints where to go from here?


After an offlist hint (thanks Tom!) I compiled the latest btrfs-progs
3.17 and tried some more ...

linux:~/bin # ./btrfs --version
Btrfs v3.17
linux:~/bin # ./btrfs-find-root /dev/sda3
Super think's the tree root is at 1015238656, chunk root 20971520
Well block 239718400 seems great, but generation doesn't match,
have=661931, want=663595 level 0
Well block 239722496 seems great, but generation doesn't match,
have=661931, want=663595 level 0
Well block 320098304 seems great, but generation doesn't match,
have=662233, want=663595 level 0
Well block 879341568 seems great, but generation doesn't match,
have=663227, want=663595 level 0
Well block 879345664 seems great, but generation doesn't match,
have=663227, want=663595 level 0
Well block 879382528 seems great, but generation doesn't match,
have=663227, want=663595 level 0
Well block 879398912 seems great, but generation doesn't match,
have=663227, want=663595 level 0
Well block 879403008 seems great, but generation doesn't match,
have=663227, want=663595 level 0
Well block 879423488 seems great, but generation doesn't match,
have=663227, want=663595 level 0
Well block 879435776 seems great, but generation doesn't match,
have=663227, want=663595 level 0
Well block 880095232 seems great, but generation doesn't match,
have=663227, want=663595 level 0
Well block 881504256 seems great, but generation doesn't match,
have=663228, want=663595 level 0
Well block 881512448 seems great, but generation doesn't match,
have=663228, want=663595 level 0
Well block 936271872 seems great, but generation doesn't match,
have=663397, want=663595 level 0
Well block 1004490752 seems great, but generation doesn't match,
have=663571, want=663595 level 0
Well block 1007804416 seems great, but generation doesn't match,
have=663572, want=663595 level 0
Well block 1012031488 seems great, but generation doesn't match,
have=663575, want=663595 level 0
Well block 1012396032 seems great, but generation doesn't match,
have=663575, want=663595 level 0
Well block 1012633600 seems great, but generation doesn't match,
have=663586, want=663595 level 0
Well block 1012871168 seems great, but generation doesn't match,
have=663585, want=663595 level 0
Well block 1015201792 seems great, but generation doesn't match,
have=663588, want=663595 level 0
Well block 1015836672 seems great, but generation doesn't match,
have=663596, want=663595 level 1
Well block 44132536320 seems great, but generation doesn't match,
have=658774, want=663595 level 0
Well block 44178280448 seems great, but generation doesn't match,
have=658774, want=663595 level 0
Well block 87443644416 seems great, but generation doesn't match,
have=661349, want=663595 level 0
Well block 87514079232 seems great, but generation doesn't match,
have=651051, want=663595 level 0
Well block 87517679616 seems great, but generation doesn't match,
have=661349, want=663595 level 0
Well block 98697822208 seems great, but generation doesn't match,
have=643548, want=663595 level 0
Well block 103285026816 seems great, but generation doesn't match,
have=661672, want=663595 level 0
Well block 103309553664 seems great, but generation doesn't match,
have=661674, want=663595 level 0
Well block 103523430400 seems great, but generation doesn't match,
have=661767, want=663595 level 0
No more metdata to scan, exiting

This line I found interesting because have is want + 1:
Well block 1015836672 seems great, but generation doesn't match,
have=663596, want=663595 level 1

And here the tail of btrfs rescue chunk-recover (full output at
http://pastebin.com/raw.php?i=1D5VgDxv)

[..]
Total Chunks:234
   Heathy:231
   Bad:3

Orphan Block Groups:

Orphan Device Extents:
Couldn't map the block 1015238656
btrfs: volumes.c:1140: btrfs_num_copies: Assertion `!(ce-start 
logical || ce-start + ce-size  logical)' failed.
Aborted


Sadly btrfs check --repair keep up refusing to do its job.

linux:~ # btrfs check --repair /dev/sda3
enabling repair mode
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
read block failed check_tree_block
Couldn't read tree root
Checking filesystem on /dev/sda3
UUID: 1af256b5-b1ad-443b-aeee-a6853e70b7e2

Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-10-28 Thread Ansgar Hockmann-Stolle
Duncan 1i5t5.duncan at cox.net writes:

[..]
 
 Hope that helps! =:^)
 

Thanks a lot for that many hints!
Unfortunately, btrfs restore does not find the tree root and so it does not
find anything.

I will wait for Qu Wenruo to enhance chunk-recovering.
And in the meantime I will test openSUSE 13.2-RC1 by installing a fresh copy
of it. I have the dd dump and a working copy of it to test. And backups ...
I told each friend I helped in the last years to make regular backups but me
... ;-)

Thanks,
Ansgar


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-10-27 Thread Ansgar Hockmann-Stolle

Hi!

My btrfs system partition went readonly. After reboot it doesnt mount 
anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm 
on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250 
GB SSD to some USB file and tried some btrfs tools on another copy per 
loopback device. But everything failed with:


kernel: BTRFS: failed to read tree root on dm-2

See http://pastebin.com/raw.php?i=dPnU6nzg.

Any hints where to go from here?

Ciao
Ansgar
--
Ansgar Hockmann-Stolle, Universität Osnabrück, Rechenzentrum
Albrechtstraße 28, 49076 Osnabrück, Deutschland, Raum 31/E77B
+49 541 969-2749 (fax -2470), http://www.home.uos.de/anshockm
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-10-27 Thread Ansgar Hockmann-Stolle

Am 27.10.14 um 14:23 schrieb Ansgar Hockmann-Stolle:

Hi!

My btrfs system partition went readonly. After reboot it doesnt mount
anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm
on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250
GB SSD to some USB file and tried some btrfs tools on another copy per
loopback device. But everything failed with:

kernel: BTRFS: failed to read tree root on dm-2

See http://pastebin.com/raw.php?i=dPnU6nzg.

Any hints where to go from here?


After an offlist hint (thanks Tom!) I compiled the latest btrfs-progs 
3.17 and tried some more ...


linux:~/bin # ./btrfs --version
Btrfs v3.17
linux:~/bin # ./btrfs-find-root /dev/sda3
Super think's the tree root is at 1015238656, chunk root 20971520
Well block 239718400 seems great, but generation doesn't match, 
have=661931, want=663595 level 0
Well block 239722496 seems great, but generation doesn't match, 
have=661931, want=663595 level 0
Well block 320098304 seems great, but generation doesn't match, 
have=662233, want=663595 level 0
Well block 879341568 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879345664 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879382528 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879398912 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879403008 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879423488 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879435776 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 880095232 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 881504256 seems great, but generation doesn't match, 
have=663228, want=663595 level 0
Well block 881512448 seems great, but generation doesn't match, 
have=663228, want=663595 level 0
Well block 936271872 seems great, but generation doesn't match, 
have=663397, want=663595 level 0
Well block 1004490752 seems great, but generation doesn't match, 
have=663571, want=663595 level 0
Well block 1007804416 seems great, but generation doesn't match, 
have=663572, want=663595 level 0
Well block 1012031488 seems great, but generation doesn't match, 
have=663575, want=663595 level 0
Well block 1012396032 seems great, but generation doesn't match, 
have=663575, want=663595 level 0
Well block 1012633600 seems great, but generation doesn't match, 
have=663586, want=663595 level 0
Well block 1012871168 seems great, but generation doesn't match, 
have=663585, want=663595 level 0
Well block 1015201792 seems great, but generation doesn't match, 
have=663588, want=663595 level 0
Well block 1015836672 seems great, but generation doesn't match, 
have=663596, want=663595 level 1
Well block 44132536320 seems great, but generation doesn't match, 
have=658774, want=663595 level 0
Well block 44178280448 seems great, but generation doesn't match, 
have=658774, want=663595 level 0
Well block 87443644416 seems great, but generation doesn't match, 
have=661349, want=663595 level 0
Well block 87514079232 seems great, but generation doesn't match, 
have=651051, want=663595 level 0
Well block 87517679616 seems great, but generation doesn't match, 
have=661349, want=663595 level 0
Well block 98697822208 seems great, but generation doesn't match, 
have=643548, want=663595 level 0
Well block 103285026816 seems great, but generation doesn't match, 
have=661672, want=663595 level 0
Well block 103309553664 seems great, but generation doesn't match, 
have=661674, want=663595 level 0
Well block 103523430400 seems great, but generation doesn't match, 
have=661767, want=663595 level 0

No more metdata to scan, exiting

This line I found interesting because have is want + 1:
Well block 1015836672 seems great, but generation doesn't match, 
have=663596, want=663595 level 1


And here the tail of btrfs rescue chunk-recover (full output at 
http://pastebin.com/raw.php?i=1D5VgDxv)


[..]
Total Chunks:   234
  Heathy:   231
  Bad:  3

Orphan Block Groups:

Orphan Device Extents:
Couldn't map the block 1015238656
btrfs: volumes.c:1140: btrfs_num_copies: Assertion `!(ce-start  
logical || ce-start + ce-size  logical)' failed.

Aborted


Sadly btrfs check --repair keep up refusing to do its job.

linux:~ # btrfs check --repair /dev/sda3
enabling repair mode
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
Check tree block failed, want=1015238656, have=0
read block failed check_tree_block
Couldn't read tree root
Checking filesystem on /dev/sda3
UUID: 1af256b5-b1ad-443b-aeee-a6853e70b7e2
Critical roots corrupted, unable to fsck the FS
Segmentation fault

Any more hints?

Ciao

Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-10-27 Thread Duncan
Ansgar Hockmann-Stolle posted on Mon, 27 Oct 2014 14:23:19 +0100 as
excerpted:

 Hi!
 
 My btrfs system partition went readonly. After reboot it doesnt mount
 anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm
 on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 250
 GB SSD to some USB file and tried some btrfs tools on another copy per
 loopback device. But everything failed with:
 
 kernel: BTRFS: failed to read tree root on dm-2
 
 See http://pastebin.com/raw.php?i=dPnU6nzg.
 
 Any hints where to go from here?

Good job posting initial problem information.  =:^)  A lot of folks take 
2-3 rounds of request and reply before that much info is available on the 
problem.

While others may be able to assist you in restoring that filesystem to 
working condition, my focus is more on recovering what can be recovered 
from it and doing a fresh mkfs.

System partition, 250 GB, looks to be just under 231 GiB based on the 
total bytes from btrfs-show-super.

How recent is your backup, and/or being a system partition, is it simply 
the distro installation, possibly without too much customization, thus 
easily reinstalled?

IOW, if you were to call that partition a total loss and simply mkfs it, 
would you lose anything real valuable that's not backed up?  (Of course, 
the standard lecture at this point is that if it's not backed up, by 
definition you didn't consider it valuable enough to be worth the hassle, 
so by definition it's not valuable and you can simply blow it away, 
but...)

If you're in good shape in that regard, that's what I'd probably do at 
this point, keeping the dd image you made in case someone's interested in 
tracking the problem down and making btrfs handle that case.

If there's important files on there that you don't have backed up, or if 
you have a backup but it's older than you'd like and you want to try to 
recover current versions of what you can (the situation I was in a few 
months ago), then btrfs restore is what you're interested in.  Restore 
works on an /unmounted/ (and potentially unmountable, as here) 
filesystem, letting you retrieve files from it and copy them to other 
filesystems.  It does NOT write anything to the damaged filesystem 
itself, so no worries about making the problem worse.

There's a page on the wiki describing how to use btrfs restore along with 
btrfs-find-root in some detail, definitely more than is in the manpages 
or that I want to do here.

https://btrfs.wiki.kernel.org/index.php/Restore

Some useful hints that weren't originally clear to me as I used that page 
here:

* Generation and transid are the same thing, a sequentially increasing 
number that updates every time the root tree is written.  The generation 
recorded in your superblocks (from btrfs-show-super) is 663595, so the 
idea would be that generation/transid, falling back one to 663594 if 95 
isn't usable, then 93, then... etc.  The lower the number the further 
back in history you're going, so obviously, you want the closest to 
663595 that you can get, that still gives you access to a (nearly) whole 
filesystem, or at least the parts of it you are interested in.

* That page was written before restore's -D/--dry-run option was 
available.  This option can be quite helpful, and I recommend using it to 
see what will actually be restored at each generation and associated tree 
root (bytenr/byte-number).  Tho (with -v/verbose) the list of files 
restored will normally be too long to go thru in detail, you can either 
scan it or pipe the output to wc -l to get a general idea of how many 
files would be restored.

* Restore's -l/list-tree-roots option isn't listed on the page either.  
btrfs restore -l -t bytenr can be quite useful, giving you a nice list 
of trees available for the generation corresponding to that bytenr (as 
found using btrfs-find-root).  This is where the page's advice to pick 
the latest tree root with all or as many as possible of the filesystem 
trees in it, comes in, since this lets you easily see which trees each 
root has available.

* I don't use snapshots or subvolumes here, while I understand OpenSuSE 
uses them rather heavily (via snapper).  Thus I have no direct experience 
with restore's snapshot-related options.  Presumably you can either 
ignore the snapshots (the apparent default) or restore them either in 
general (using -s) or selectively (using -r, with the appropriate 
snapshot rootid).

* It's worth noting that restore simply lets you retrieve files.  It does 
*NOT* retrieve file ownership or permissions, with the restored files all 
being owned by the user you ran btrfs restore under (presumably root), 
with $UMASK permissions.  You'll have to restore ownership and 
permissions manually.

When I used restore here I had a backup, but the backup was old.  So I 
hacked up a bash scriptlet with a for loop, that went thru all the 
restored files recursively, comparing them against the old backup.  If 
the file existed

Re: btrfs unmountable: read block failed check_tree_block; Couldn't read tree root

2014-10-27 Thread Qu Wenruo


 Original Message 
Subject: Re: btrfs unmountable: read block failed check_tree_block; 
Couldn't read tree root

From: Qu Wenruo quwen...@cn.fujitsu.com
To: Ansgar Hockmann-Stolle ansgar.hockmann-sto...@uni-osnabrueck.de, 
linux-btrfs@vger.kernel.org

Date: 2014年10月28日 09:05


 Original Message 
Subject: Re: btrfs unmountable: read block failed check_tree_block; 
Couldn't read tree root

From: Ansgar Hockmann-Stolle ansgar.hockmann-sto...@uni-osnabrueck.de
To: linux-btrfs@vger.kernel.org
Date: 2014年10月28日 07:03

Am 27.10.14 um 14:23 schrieb Ansgar Hockmann-Stolle:

Hi!

My btrfs system partition went readonly. After reboot it doesnt mount
anymore. System was openSUSE 13.1 Tumbleweed (kernel 3.17.??). Now I'm
on openSUSE 13.2-RC1 rescue (kernel 3.16.3). I dumped (dd) the whole 
250

GB SSD to some USB file and tried some btrfs tools on another copy per
loopback device. But everything failed with:

kernel: BTRFS: failed to read tree root on dm-2

See http://pastebin.com/raw.php?i=dPnU6nzg.

Any hints where to go from here?


After an offlist hint (thanks Tom!) I compiled the latest btrfs-progs 
3.17 and tried some more ...


linux:~/bin # ./btrfs --version
Btrfs v3.17
linux:~/bin # ./btrfs-find-root /dev/sda3
Super think's the tree root is at 1015238656, chunk root 20971520
Well block 239718400 seems great, but generation doesn't match, 
have=661931, want=663595 level 0
Well block 239722496 seems great, but generation doesn't match, 
have=661931, want=663595 level 0
Well block 320098304 seems great, but generation doesn't match, 
have=662233, want=663595 level 0
Well block 879341568 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879345664 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879382528 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879398912 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879403008 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879423488 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 879435776 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 880095232 seems great, but generation doesn't match, 
have=663227, want=663595 level 0
Well block 881504256 seems great, but generation doesn't match, 
have=663228, want=663595 level 0
Well block 881512448 seems great, but generation doesn't match, 
have=663228, want=663595 level 0
Well block 936271872 seems great, but generation doesn't match, 
have=663397, want=663595 level 0
Well block 1004490752 seems great, but generation doesn't match, 
have=663571, want=663595 level 0
Well block 1007804416 seems great, but generation doesn't match, 
have=663572, want=663595 level 0
Well block 1012031488 seems great, but generation doesn't match, 
have=663575, want=663595 level 0
Well block 1012396032 seems great, but generation doesn't match, 
have=663575, want=663595 level 0
Well block 1012633600 seems great, but generation doesn't match, 
have=663586, want=663595 level 0
Well block 1012871168 seems great, but generation doesn't match, 
have=663585, want=663595 level 0
Well block 1015201792 seems great, but generation doesn't match, 
have=663588, want=663595 level 0
Well block 1015836672 seems great, but generation doesn't match, 
have=663596, want=663595 level 1
Well block 44132536320 seems great, but generation doesn't match, 
have=658774, want=663595 level 0
Well block 44178280448 seems great, but generation doesn't match, 
have=658774, want=663595 level 0
Well block 87443644416 seems great, but generation doesn't match, 
have=661349, want=663595 level 0
Well block 87514079232 seems great, but generation doesn't match, 
have=651051, want=663595 level 0
Well block 87517679616 seems great, but generation doesn't match, 
have=661349, want=663595 level 0
Well block 98697822208 seems great, but generation doesn't match, 
have=643548, want=663595 level 0
Well block 103285026816 seems great, but generation doesn't match, 
have=661672, want=663595 level 0
Well block 103309553664 seems great, but generation doesn't match, 
have=661674, want=663595 level 0
Well block 103523430400 seems great, but generation doesn't match, 
have=661767, want=663595 level 0

No more metdata to scan, exiting

This line I found interesting because have is want + 1:
Well block 1015836672 seems great, but generation doesn't match, 
have=663596, want=663595 level 1


And here the tail of btrfs rescue chunk-recover (full output at 
http://pastebin.com/raw.php?i=1D5VgDxv)


[..]
Total Chunks:234
  Heathy:231
  Bad:3

Orphan Block Groups:

Orphan Device Extents:
Couldn't map the block 1015238656
btrfs: volumes.c:1140: btrfs_num_copies: Assertion `!(ce-start  
logical || ce-start + ce-size  logical)' failed.

Aborted

After looking into the 3 bad chunks, it turns

[PATCH 07/12] Btrfs-progs: re-search tree root if it changes

2014-10-10 Thread Josef Bacik
If we change something while scanning fs-roots we need to redo our search so
that we get valid root items and have valid root cache.  Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
 cmds-check.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index e81a26c..9522077 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2171,7 +2171,7 @@ static int check_fs_roots(struct btrfs_root *root,
struct btrfs_path path;
struct btrfs_key key;
struct walk_control wc;
-   struct extent_buffer *leaf;
+   struct extent_buffer *leaf, *tree_node;
struct btrfs_root *tmp_root;
struct btrfs_root *tree_root = root-fs_info-tree_root;
int ret;
@@ -2186,13 +2186,19 @@ static int check_fs_roots(struct btrfs_root *root,
memset(wc, 0, sizeof(wc));
cache_tree_init(wc.shared);
btrfs_init_path(path);
-
+again:
key.offset = 0;
key.objectid = 0;
key.type = BTRFS_ROOT_ITEM_KEY;
ret = btrfs_search_slot(NULL, tree_root, key, path, 0, 0);
BUG_ON(ret  0);
+   tree_node = tree_root-node;
while (1) {
+   if (tree_node != tree_root-node) {
+   free_root_recs_tree(root_cache);
+   btrfs_release_path(path);
+   goto again;
+   }
leaf = path.nodes[0];
if (path.slots[0] = btrfs_header_nritems(leaf)) {
ret = btrfs_next_leaf(tree_root, path);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/12] btrfs: handle errors from reading the quota tree root

2014-08-04 Thread Zach Brown
On Fri, Aug 01, 2014 at 06:12:37PM -0500, Eric Sandeen wrote:
 Reading the quota tree root may fail with ENOENT
 if there is no quota, which is fine, but the code was
 ignoring every other error as well, which is not fine.

Kinda makes you want to write a test that would have caught this.

Kinda.

Also, if you're still keen to iterate on this series, it looks like this
pattern is copied and pasted a few times in open_ctree().  With
temporary root pointers for each block, for some reason.  A little
helper function could take a bite out of open_ctree().

- z
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/12] btrfs: handle errors from reading the quota tree root

2014-08-04 Thread Eric Sandeen
On 8/4/14, 1:35 PM, Zach Brown wrote:
 On Fri, Aug 01, 2014 at 06:12:37PM -0500, Eric Sandeen wrote:
 Reading the quota tree root may fail with ENOENT
 if there is no quota, which is fine, but the code was
 ignoring every other error as well, which is not fine.
 
 Kinda makes you want to write a test that would have caught this.
 
 Kinda.

/me looks at ground, shuffles feet ...
 
 Also, if you're still keen to iterate on this series, it looks like this
 pattern is copied and pasted a few times in open_ctree().  With
 temporary root pointers for each block, for some reason.  A little
 helper function could take a bite out of open_ctree().

Hm, the uuid tree is roughly similar, but not exactly.  I think those
are the only 2 optional roots (uuid because it'll get regenerated).

I'm guessing the temporary root pointer is so we don't ever assign a
PTR_ERR to the root in fs_info?  

-Eric
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/12] btrfs: handle errors from reading the quota tree root

2014-08-04 Thread Zach Brown
On Mon, Aug 04, 2014 at 01:42:23PM -0500, Eric Sandeen wrote:
 On 8/4/14, 1:35 PM, Zach Brown wrote:
  On Fri, Aug 01, 2014 at 06:12:37PM -0500, Eric Sandeen wrote:
  Reading the quota tree root may fail with ENOENT
  if there is no quota, which is fine, but the code was
  ignoring every other error as well, which is not fine.
  
  Kinda makes you want to write a test that would have caught this.
  
  Kinda.
 
 /me looks at ground, shuffles feet ...
  
  Also, if you're still keen to iterate on this series, it looks like this
  pattern is copied and pasted a few times in open_ctree().  With
  temporary root pointers for each block, for some reason.  A little
  helper function could take a bite out of open_ctree().
 
 Hm, the uuid tree is roughly similar, but not exactly.  I think those
 are the only 2 optional roots (uuid because it'll get regenerated).
 
 I'm guessing the temporary root pointer is so we don't ever assign a
 PTR_ERR to the root in fs_info?  

It took me a while to see what you meant.

Yeah, using a temporary root makes sense.  Using a different one for
each block makes less sense.

a = f(A);
if (a)
goto out;
info-a = a;

b = f(B);
if (b)
goto out;
info-b = b;

vs.

r = f(A);
if (r)
goto out;
info-a = r;

r = f(B);
if (r)
goto out;
info-b = r;

- z
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/12] btrfs: handle errors from reading the quota tree root

2014-08-04 Thread Eric Sandeen
On 8/4/14, 1:51 PM, Zach Brown wrote:
 On Mon, Aug 04, 2014 at 01:42:23PM -0500, Eric Sandeen wrote:
 On 8/4/14, 1:35 PM, Zach Brown wrote:
 On Fri, Aug 01, 2014 at 06:12:37PM -0500, Eric Sandeen wrote:
 Reading the quota tree root may fail with ENOENT
 if there is no quota, which is fine, but the code was
 ignoring every other error as well, which is not fine.

 Kinda makes you want to write a test that would have caught this.

 Kinda.

 /me looks at ground, shuffles feet ...
  
 Also, if you're still keen to iterate on this series, it looks like this
 pattern is copied and pasted a few times in open_ctree().  With
 temporary root pointers for each block, for some reason.  A little
 helper function could take a bite out of open_ctree().

 Hm, the uuid tree is roughly similar, but not exactly.  I think those
 are the only 2 optional roots (uuid because it'll get regenerated).

 I'm guessing the temporary root pointer is so we don't ever assign a
 PTR_ERR to the root in fs_info?  
 
 It took me a while to see what you meant.
 
 Yeah, using a temporary root makes sense.  Using a different one for
 each block makes less sense.
 

snip

 - z
 

Yeah, fair enough, I thought about that after I hit send ;)
I could send a V2 of patch 11/12 to do that w/o needing to redo
the series too much.  :)  I'll see if there are any other comments.

Thanks!
-Eric
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/12] btrfs: handle errors from reading the quota tree root

2014-08-01 Thread Eric Sandeen
Reading the quota tree root may fail with ENOENT
if there is no quota, which is fine, but the code was
ignoring every other error as well, which is not fine.

Signed-off-by: Eric Sandeen sand...@redhat.com
---
 fs/btrfs/disk-io.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e6746be..28d35a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2733,7 +2733,12 @@ retry_root_backup:
 
location.objectid = BTRFS_QUOTA_TREE_OBJECTID;
quota_root = btrfs_read_tree_root(tree_root, location);
-   if (!IS_ERR(quota_root)) {
+   if (IS_ERR(quota_root)) {
+   ret = PTR_ERR(quota_root);
+   /* It's fine to not have quotas */
+   if (ret != -ENOENT)
+   goto recovery_tree_root;
+   } else {
set_bit(BTRFS_ROOT_TRACK_DIRTY, quota_root-state);
fs_info-quota_enabled = 1;
fs_info-pending_quota_state = 1;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] Btrfs-progs: fsck: force to udate tree root for some cases

2014-03-05 Thread Wang Shilong
commit roots won't update root item in tree root if it finds
updated root's bytenr is same as before.

However, this is not right for fsck, we need update tree root in
the following case:

1.overwrite previous root node.

2.reinit reloc data tree, this is because we skip pin relo data
 tree before which means we can allocate same block as before.

Fix this by updating tree root ourselves for the above cases.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 cmds-check.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 4b2a8f0..ae611d1 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -5906,6 +5906,7 @@ static int btrfs_fsck_reinit_root(struct 
btrfs_trans_handle *trans,
struct extent_buffer *c;
struct extent_buffer *old = root-node;
int level;
+   int ret;
struct btrfs_disk_key disk_key = {0,0,0};
 
level = 0;
@@ -5922,6 +5923,7 @@ static int btrfs_fsck_reinit_root(struct 
btrfs_trans_handle *trans,
if (IS_ERR(c)) {
c = old;
extent_buffer_get(c);
+   overwrite = 1;
}
 init:
memset_extent_buffer(c, 0, 0, sizeof(struct btrfs_header));
@@ -5939,7 +5941,26 @@ init:
BTRFS_UUID_SIZE);
 
btrfs_mark_buffer_dirty(c);
-
+   /*
+* this case can happen in the following case:
+*
+* 1.overwrite previous root.
+*
+* 2.reinit reloc data root, this is because we skip pin
+* down reloc data tree before which means we can allocate
+* same block bytenr here.
+*/
+   if (old-start == c-start) {
+   btrfs_set_root_generation(root-root_item,
+ trans-transid);
+   root-root_item.level = btrfs_header_level(root-node);
+   ret = btrfs_update_root(trans, root-fs_info-tree_root,
+   root-root_key, root-root_item);
+   if (ret) {
+   free_extent_buffer(c);
+   return ret;
+   }
+   }
free_extent_buffer(old);
root-node = c;
add_root_to_dirty_list(root);
-- 
1.9.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix wrong search path initialization before searching tree root

2014-01-14 Thread Wang Shilong
From: Wang Shilong wangsl.f...@cn.fujitsu.com

To search tree root without transaction protection, we should neither search 
commit
root nor skip locking here, fix it.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/send.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 54665d4..14f9693 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -2095,7 +2095,7 @@ static int send_subvol_begin(struct send_ctx *sctx)
char *name = NULL;
int namelen;
 
-   path = alloc_path_for_send();
+   path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
 
-- 
1.8.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: look up the containing tree root id

2013-09-18 Thread David Sterba
Find the tree id of the containing subvolume for a given file or
directory. For subvolume return it's own id.

$ btrfs inspect-internal rootid path

Signed-off-by: David Sterba dste...@suse.cz
---
 cmds-inspect.c |   38 ++
 man/btrfs.8.in |9 +
 utils.c|   30 ++
 utils.h|2 ++
 4 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/cmds-inspect.c b/cmds-inspect.c
index bdebf7d..f0c8e3d 100644
--- a/cmds-inspect.c
+++ b/cmds-inspect.c
@@ -301,6 +301,43 @@ out:
return ret ? 1 : 0;
 }
 
+static const char* const cmd_rootid_usage[] = {
+   btrfs inspect-internal rootid path,
+   Get tree ID of the containing subvolume of path.,
+   NULL
+};
+
+static int cmd_rootid(int argc, char **argv)
+{
+   int ret;
+   int fd = -1;
+   u64 rootid;
+   DIR *dirstream = NULL;
+
+   if (check_argc_exact(argc, 2))
+   usage(cmd_rootid_usage);
+
+   fd = open_file_or_dir(argv[1], dirstream);
+   if (fd  0) {
+   fprintf(stderr, ERROR: can't access '%s'\n, argv[1]);
+   ret = -ENOENT;
+   goto out;
+   }
+
+   ret = lookup_ino_rootid(fd, rootid);
+   if (ret) {
+   fprintf(stderr, %s: rootid failed with ret=%d\n,
+   argv[0], ret);
+   goto out;
+   }
+
+   printf(%llu\n, (unsigned long long)rootid);
+out:
+   close_file_or_dir(fd, dirstream);
+
+   return !!ret;
+}
+
 const struct cmd_group inspect_cmd_group = {
inspect_cmd_group_usage, NULL, {
{ inode-resolve, cmd_inode_resolve, cmd_inode_resolve_usage,
@@ -309,6 +346,7 @@ const struct cmd_group inspect_cmd_group = {
cmd_logical_resolve_usage, NULL, 0 },
{ subvolid-resolve, cmd_subvolid_resolve,
cmd_subvolid_resolve_usage, NULL, 0 },
+   { rootid, cmd_rootid, cmd_rootid_usage, NULL, 0 },
NULL_CMD_STRUCT
}
 };
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 0b69ce2..be4a6b2 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -80,6 +80,8 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBinspect-internal subvolid-resolve\fP \fIsubvolid\fP 
\fIpath\fP
 .PP
+\fBbtrfs\fP \fBinspect-internal rootid\fP \fIpath\fP
+.PP
 .PP
 \fBbtrfs\fP \fBsend\fP [-v] [-p \fIparent\fP] [-c \fIclone-src\fP] [-f 
\fIoutfile\fP] \fIsubvol\fP
 .PP
@@ -582,6 +584,13 @@ not enough to read all the resolved results. The max value 
one can set is 64k.
 Get file system paths for the given subvolume ID.
 .TP
 
+\fBinspect-internal rootid\fP \fIpath\fP
+For a given file or directory, return the containing tree root id. For a
+subvolume return it's own tree id.
+
+The result is undefined for the so-called empty subvolumes (identified by 
inode number 2).
+.TP
+
 \fBsend\fP [-v] [-p \fIparent\fP] [-c \fIclone-src\fP] [-f 
\fIoutfile\fP] \fIsubvol\fP
 Send the subvolume to stdout.
 Sends the subvolume specified by \fIsubvol\fR to stdout.
diff --git a/utils.c b/utils.c
index 5fa193b..8aecc4a 100644
--- a/utils.c
+++ b/utils.c
@@ -1975,3 +1975,33 @@ int ask_user(char *question)
   (answer = strtok_r(buf,  \t\n\r, saveptr)) 
   (!strcasecmp(answer, yes) || !strcasecmp(answer, y));
 }
+
+/*
+ * For a given:
+ * - file or directory return the containing tree root id
+ * - subvolume return it's own tree id
+ * - BTRFS_EMPTY_SUBVOL_DIR_OBJECTID (directory with ino == 2) the result is
+ *   undefined and function returns -1
+ */
+int lookup_ino_rootid(int fd, u64 *rootid)
+{
+   struct btrfs_ioctl_ino_lookup_args args;
+   int ret;
+   int e;
+
+   memset(args, 0, sizeof(args));
+   args.treeid = 0;
+   args.objectid = BTRFS_FIRST_FREE_OBJECTID;
+
+   ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, args);
+   e = errno;
+   if (ret) {
+   fprintf(stderr, ERROR: Failed to lookup root id - %s\n,
+   strerror(e));
+   return ret;
+   }
+
+   *rootid = args.treeid;
+
+   return 0;
+}
diff --git a/utils.h b/utils.h
index fdef3f0..19f028f 100644
--- a/utils.h
+++ b/utils.h
@@ -81,4 +81,6 @@ int is_vol_small(char *file);
 int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
   int verify);
 int ask_user(char *question);
+int lookup_ino_rootid(int fd, u64 *rootid);
+
 #endif
-- 
1.7.9

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/17] Btrfs: fix accessing a freed tree root

2013-05-15 Thread Miao Xie
inode_tree_del() will move the tree root into the dead root list, and
then the tree will be destroyed by the cleaner. So if we remove the
delayed node which is cached in the inode after inode_tree_del(),
we may access a freed tree root. Fix it.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1669c3b..7f6e78a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4723,6 +4723,7 @@ void btrfs_evict_inode(struct inode *inode)
btrfs_end_transaction(trans, root);
btrfs_btree_balance_dirty(root);
 no_delete:
+   btrfs_remove_delayed_node(inode);
clear_inode(inode);
return;
 }
@@ -7978,7 +7979,6 @@ void btrfs_destroy_inode(struct inode *inode)
inode_tree_del(inode);
btrfs_drop_extent_cache(inode, 0, (u64)-1, 0);
 free:
-   btrfs_remove_delayed_node(inode);
call_rcu(inode-i_rcu, btrfs_i_callback);
 }
 
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tree root

2012-10-04 Thread David Sterba
On Wed, Oct 03, 2012 at 06:35:00PM +0200, Øystein Sættem Middelthun wrote:
 Well block 14102764707840 seems great, but generation doesn't match,
 have=109268, want=109269
 
 Because the filesystem was last in use with a pre 3.2-kernel I am unable to
 use mount -o recovery, but restore seems to work when I specify the previous
 tree-root. My problem is however that the btrfs is so large I have nowhere
 to temporarily put all the files. I am currently running kernel 3.5. Does
 mount have an option to manually tell it to use the tree root at block
 14102764707840?

It's not possible to supply the tree_root via mount option, but it
should be possible to modify the superblock to point the tree_root to
the one you've found, in a similar way that the -o recovery does that,
but this hasn't been done before and minor details may make it
complicated (like getting the transaction number in sync, as the
tree_root has latest-1).


david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


tree root

2012-10-03 Thread Øystein Sættem Middelthun

Hi!

I have a broken btrfs unable to mount because it is unable to find the 
tree root. Using find-root I find the following:


Well block 14102764707840 seems great, but generation doesn't match, 
have=109268, want=109269


Because the filesystem was last in use with a pre 3.2-kernel I am unable 
to use mount -o recovery, but restore seems to work when I specify the 
previous tree-root. My problem is however that the btrfs is so large I 
have nowhere to temporarily put all the files. I am currently running 
kernel 3.5. Does mount have an option to manually tell it to use the 
tree root at block 14102764707840?



--
Best regards
Øystein Middelthun
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tree root

2012-10-03 Thread Mitch Harder
On Wed, Oct 3, 2012 at 11:35 AM, Øystein Sættem Middelthun
oyst...@middelthun.no wrote:
 Hi!

 I have a broken btrfs unable to mount because it is unable to find the tree
 root. Using find-root I find the following:

 Well block 14102764707840 seems great, but generation doesn't match,
 have=109268, want=109269

 Because the filesystem was last in use with a pre 3.2-kernel I am unable to
 use mount -o recovery, but restore seems to work when I specify the previous
 tree-root. My problem is however that the btrfs is so large I have nowhere
 to temporarily put all the files. I am currently running kernel 3.5. Does
 mount have an option to manually tell it to use the tree root at block
 14102764707840?


If you do not have a suitable backup for these files, please make an
effort to do what you can with restore.  Some of the repair methods
out there have a possibility to make the situation worse.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tree root

2012-10-03 Thread Øystein Sættem Middelthun

On 10/03/2012 07:29 PM, Mitch Harder wrote:

If you do not have a suitable backup for these files, please make an
effort to do what you can with restore.  Some of the repair methods
out there have a possibility to make the situation worse.


We are talking about something like 50TB, so there is just no way I have 
the available space on other disks for temporary storage.


So in effect you are saying that there are no other available options 
than a restore? If I understand correctly a feature along the lines of 
mount -o tree_root=14102764707840 /dev/ /path/ would solve my problem.


The fs is unmountable because of a temporary loss of connection with an 
underlying disk controller, and I don't think the device has a lot of 
errors besides not being able to find the latest tree root.



--
Best regards
Øystein Middelthun
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tree root

2012-10-03 Thread Mitch Harder
On Wed, Oct 3, 2012 at 5:11 PM, Øystein Sættem Middelthun
oyst...@middelthun.no wrote:
 On 10/03/2012 07:29 PM, Mitch Harder wrote:

 If you do not have a suitable backup for these files, please make an
 effort to do what you can with restore.  Some of the repair methods
 out there have a possibility to make the situation worse.


 We are talking about something like 50TB, so there is just no way I have the
 available space on other disks for temporary storage.

 So in effect you are saying that there are no other available options than a
 restore? If I understand correctly a feature along the lines of mount -o
 tree_root=14102764707840 /dev/ /path/ would solve my problem.

 The fs is unmountable because of a temporary loss of connection with an
 underlying disk controller, and I don't think the device has a lot of errors
 besides not being able to find the latest tree root.


You should probably try to supply some more information about your situation.

Was this btrfs volume build with RAID-1?

If so, we should be able to mount in degraded mode.

Even so, when I see the words unable to find the tree root and
temporary loss of connection with an underlying disk controller
along with the implication that you have no reliable backup of this
data, I worry that your situation is potentially precarious.

The possibility exists that recovering your data is your best option
(as opposed to restoring to previous working condition).

Using backup tree-roots and super-blocks has the potential to do
irreversible damage.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Failed tree root

2012-07-07 Thread Øystein Sættem Middelthun

Hi!

After a short loss of contact with an underlying disk controller my 
btrfs partition is unmountable.


dmesg provides the following information:

[356850.853787] btrfs bad tree block start 0 14102771924992
[356850.853808] btrfs: failed to read tree root on dm-0
[356850.859218] btrfs: open_ctree failed

Is this type of error known to be fixable by the unstable btrfsck 
(20120328)? I am currently running kernel 3.2.0-25.


--
Best regards
Øystein Middelthun

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Seg fault when csum-tree root is corrupted.

2012-07-02 Thread Anand jain
From: Anand Jain anand.j...@oracle.com

Segmentation fault with the following trace when csum-tree is
deliberately corrupted using btrfs-corrupt-block

read block failed check_tree_block
Couldn't setup csum tree
checking extents
Check tree block failed, want=29376512, have=0
::
read block failed check_tree_block  !!
Segmentation fault (core dumped) --- !!

The below fix will redirect btrfsck user to use --init-csum-tree
when csum root is corrupted
---
 btrfsck.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/btrfsck.c b/btrfsck.c
index 7aac736..6ced7b5 100644
--- a/btrfsck.c
+++ b/btrfsck.c
@@ -3483,6 +3483,10 @@ int main(int ac, char **av)
fprintf(stderr, Critical roots corrupted, unable to fsck the 
FS\n);
return -EIO;
}
+   if (!extent_buffer_uptodate(info-csum_root-node)  !init_csum_tree) {
+   fprintf(stderr, Checksum root corrupted, run 'btrfsck 
--init-csum-tree'\n);
+   return -EIO;
+   }
 
root = info-fs_root;
 
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html