Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-13 Thread David Sterba
On Mon, Nov 05, 2018 at 04:36:34PM +, Filipe Manana wrote:
> On Mon, Nov 5, 2018 at 4:34 PM David Sterba  wrote:
> > On Mon, Nov 05, 2018 at 04:30:35PM +, Filipe Manana wrote:
> > > On Mon, Nov 5, 2018 at 4:29 PM David Sterba  wrote:
> > > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > > > > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > > > > >
> > > > > > if (path != in_path)
> > > > >
> > > > > You mean the following on top of v4:
> > > > >
> > > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> > > > >
> > > > > Not much different, just saves one such if statement. I'm ok with that.
> > > >
> > > > Now in misc-next with v4 and the friendpaste incremental as
> > > >
> > > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
> > >
> > > Please don't add the incremental. It's buggy. It was meant to figure
> > > out what Josef was saying. That's why I haven't sent a V5.
> >
> > Ok dropped, I'll wait for a proper patch.
> 
> It's V4, the last sent version. Just forget the incremental.
> Thanks.

For the record, V4 has been merged to master in 4.20-rc2.


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-05 Thread Filipe Manana
On Mon, Nov 5, 2018 at 4:34 PM David Sterba  wrote:
>
> On Mon, Nov 05, 2018 at 04:30:35PM +, Filipe Manana wrote:
> > On Mon, Nov 5, 2018 at 4:29 PM David Sterba  wrote:
> > >
> > > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > > > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > > > >
> > > > > if (path != in_path)
> > > >
> > > > You mean the following on top of v4:
> > > >
> > > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> > > >
> > > > Not much different, just saves one such if statement. I'm ok with that.
> > >
> > > Now in misc-next with v4 and the friendpaste incremental as
> > >
> > > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
> >
> > Please don't add the incremental. It's buggy. It was meant to figure
> > out what Josef was saying. That's why I haven't sent a V5.
>
> Ok dropped, I'll wait for a proper patch.

It's V4, the last sent version. Just forget the incremental.
Thanks.


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-05 Thread David Sterba
On Mon, Nov 05, 2018 at 04:30:35PM +, Filipe Manana wrote:
> On Mon, Nov 5, 2018 at 4:29 PM David Sterba  wrote:
> >
> > On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > > >
> > > > if (path != in_path)
> > >
> > > You mean the following on top of v4:
> > >
> > > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> > >
> > > Not much different, just saves one such if statement. I'm ok with that.
> >
> > Now in misc-next with v4 and the friendpaste incremental as
> >
> > https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69
> 
> Please don't add the incremental. It's buggy. It was meant to figure
> out what Josef was saying. That's why I haven't sent a V5.

Ok dropped, I'll wait for a proper patch.


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-05 Thread Filipe Manana
On Mon, Nov 5, 2018 at 4:29 PM David Sterba  wrote:
>
> On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> > >
> > > if (path != in_path)
> >
> > You mean the following on top of v4:
> >
> > https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> >
> > Not much different, just saves one such if statement. I'm ok with that.
>
> Now in misc-next with v4 and the friendpaste incremental as
>
> https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69

Please don't add the incremental. It's buggy. It was meant to figure
out what Josef was saying. That's why I haven't sent a V5.


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-11-05 Thread David Sterba
On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> > Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> > take a path, and allocate it in btrfs_iget, that'll remove the ugly
> >
> > if (path != in_path)
> 
> You mean the following on top of v4:
> 
> https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg
> 
> Not much different, just saves one such if statement. I'm ok with that.

Now in misc-next with v4 and the friendpaste incremental as

https://github.com/kdave/btrfs-devel/commit/efcfd6c87d28b3aa9bcba52d7c1e1fc79a2dad69


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Josef Bacik
On Wed, Oct 24, 2018 at 01:48:40PM +0100, Filipe Manana wrote:
> On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik  wrote:
> >
> > On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> > > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik  wrote:
> > > >
> > > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> > > > > From: Filipe Manana 
> > > > >
> > > > > > When we are writing out a free space cache, during the transaction commit
> > > > > > phase, we can end up in a deadlock which results in a stack trace like the
> > > > > > following:
> > > > >
> > > > >  schedule+0x28/0x80
> > > > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > > >  ? finish_wait+0x80/0x80
> > > > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > > >  ? inode_insert5+0x119/0x190
> > > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > > >  btrfs_iget+0x113/0x690 [btrfs]
> > > > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > > > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > > > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > > > >  ? finish_wait+0x80/0x80
> > > > >  find_free_extent+0x799/0x1010 [btrfs]
> > > > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > > > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > > > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > > >
> > > > > > At cache_save_setup() we need to update the inode item of a block group's
> > > > > > cache which is located in the tree root (fs_info->tree_root), which means
> > > > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > > > need to find a free metadata extent and while looking for one, if we find
> > > > > > a block group which was not cached yet we attempt to load its cache by
> > > > > > calling cache_block_group(). However this function will try to load the
> > > > > > inode of the free space cache, which requires finding the matching inode
> > > > > > item in the tree root - if that inode item is located in the same leaf as
> > > > > > the inode item of the space cache we are updating at cache_save_setup(),
> > > > > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > > > > extent buffer that we previously write locked.
> > > > > >
> > > > > > So fix this by using the tree root's commit root when searching for a
> > > > > > block group's free space cache inode item when we are attempting to load
> > > > > > a free space cache. This is safe since block groups once loaded stay in
> > > > > > memory forever, as well as their caches, so after they are first loaded
> > > > > > we will never need to read their inode items again. For new block groups,
> > > > > > once they are created they get their ->cached field set to
> > > > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> > > > >
> > > > > Reported-by: Andrew Nelson 
> > > > > > Link: https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > > > Tested-by: Andrew Nelson 
> > > > > Signed-off-by: Filipe Manana 
> > > > > ---
> > > > >
> > > >
> > > > Now my goal is to see how many times I can get you to redo this thing.
> > > >
> > > > Why not instead just do
> > > >
> > > > if (btrfs_is_free_space_inode(inode))
> > > >   path->search_commit_root = 1;
> > > >
> > > > > in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> > > > > the inode cache (I'm not sure if it's ok or not) we could just do
> > > >
> > > > if (root == fs_info->tree_root)
> > >
> > > We can't (not just that at least).
> > > Tried something like that, but we get into a BUG_ON when writing out
> > > the space cache for new block groups (created in the current
> > > transaction).
> > > Because at cache_save_setup() we have this:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342
> > >
> > > Lookup for the inode in normal root, doesn't exist, create it then
> > > repeat - if still not found, BUG_ON.
> > > Could also make create_free_space_inode() return an inode pointer and
> > > make it call btrfs_iget().
> > >
> >
> > Ah ok makes sense.  

Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Filipe Manana
On Wed, Oct 24, 2018 at 1:40 PM Josef Bacik  wrote:
>
> On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> > On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik  wrote:
> > >
> > > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> > > > From: Filipe Manana 
> > > >
> > > > > When we are writing out a free space cache, during the transaction commit
> > > > > phase, we can end up in a deadlock which results in a stack trace like the
> > > > > following:
> > > >
> > > >  schedule+0x28/0x80
> > > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > > >  ? inode_insert5+0x119/0x190
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_iget+0x113/0x690 [btrfs]
> > > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > > >  ? finish_wait+0x80/0x80
> > > >  find_free_extent+0x799/0x1010 [btrfs]
> > > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > > >  ? kmem_cache_alloc+0x166/0x1d0
> > > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > > >
> > > > > At cache_save_setup() we need to update the inode item of a block group's
> > > > > cache which is located in the tree root (fs_info->tree_root), which means
> > > > > that it may result in COWing a leaf from that tree. If that happens we
> > > > > need to find a free metadata extent and while looking for one, if we find
> > > > > a block group which was not cached yet we attempt to load its cache by
> > > > > calling cache_block_group(). However this function will try to load the
> > > > > inode of the free space cache, which requires finding the matching inode
> > > > > item in the tree root - if that inode item is located in the same leaf as
> > > > > the inode item of the space cache we are updating at cache_save_setup(),
> > > > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > > > extent buffer that we previously write locked.
> > > > >
> > > > > So fix this by using the tree root's commit root when searching for a
> > > > > block group's free space cache inode item when we are attempting to load
> > > > > a free space cache. This is safe since block groups once loaded stay in
> > > > > memory forever, as well as their caches, so after they are first loaded
> > > > > we will never need to read their inode items again. For new block groups,
> > > > > once they are created they get their ->cached field set to
> > > > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> > > >
> > > > Reported-by: Andrew Nelson 
> > > > > Link: https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > > Tested-by: Andrew Nelson 
> > > > Signed-off-by: Filipe Manana 
> > > > ---
> > > >
> > >
> > > Now my goal is to see how many times I can get you to redo this thing.
> > >
> > > Why not instead just do
> > >
> > > if (btrfs_is_free_space_inode(inode))
> > >   path->search_commit_root = 1;
> > >
> > > > in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> > > > the inode cache (I'm not sure if it's ok or not) we could just do
> > >
> > > if (root == fs_info->tree_root)
> >
> > We can't (not just that at least).
> > Tried something like that, but we get into a BUG_ON when writing out
> > the space cache for new block groups (created in the current
> > transaction).
> > Because at cache_save_setup() we have this:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342
> >
> > Lookup for the inode in normal root, doesn't exist, create it then
> > repeat - if still not found, BUG_ON.
> > Could also make create_free_space_inode() return an inode pointer and
> > make it call btrfs_iget().
> >
>
> Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
> take a path, and allocate it in btrfs_iget, that'll remove the ugly
>
> if (path != in_path)

You mean the following on top of v4:

https://friendpaste.com/6XrGXb5p0RSJGixUFYouHg

Not much different, just saves one such if statement. I'm ok with that.

>
> stuff.  Thanks,
>
> Josef


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Josef Bacik
On Wed, Oct 24, 2018 at 12:53:59PM +0100, Filipe Manana wrote:
> On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik  wrote:
> >
> > On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> > > From: Filipe Manana 
> > >
> > > When we are writing out a free space cache, during the transaction commit
> > > phase, we can end up in a deadlock which results in a stack trace like the
> > > following:
> > >
> > >  schedule+0x28/0x80
> > >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> > >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> > >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> > >  ? inode_insert5+0x119/0x190
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_iget+0x113/0x690 [btrfs]
> > >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> > >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> > >  load_free_space_cache+0x7c/0x170 [btrfs]
> > >  ? cache_block_group+0x72/0x3b0 [btrfs]
> > >  cache_block_group+0x1b3/0x3b0 [btrfs]
> > >  ? finish_wait+0x80/0x80
> > >  find_free_extent+0x799/0x1010 [btrfs]
> > >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> > >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> > >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> > >  btrfs_cow_block+0xdc/0x180 [btrfs]
> > >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> > >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> > >  ? kmem_cache_alloc+0x166/0x1d0
> > >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> > >  cache_save_setup+0xe4/0x3a0 [btrfs]
> > >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> > >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> > >
> > > At cache_save_setup() we need to update the inode item of a block group's
> > > cache which is located in the tree root (fs_info->tree_root), which means
> > > that it may result in COWing a leaf from that tree. If that happens we
> > > need to find a free metadata extent and while looking for one, if we find
> > > a block group which was not cached yet we attempt to load its cache by
> > > calling cache_block_group(). However this function will try to load the
> > > inode of the free space cache, which requires finding the matching inode
> > > item in the tree root - if that inode item is located in the same leaf as
> > > the inode item of the space cache we are updating at cache_save_setup(),
> > > we end up in a deadlock, since we try to obtain a read lock on the same
> > > extent buffer that we previously write locked.
> > >
> > > So fix this by using the tree root's commit root when searching for a
> > > block group's free space cache inode item when we are attempting to load
> > > a free space cache. This is safe since block groups once loaded stay in
> > > memory forever, as well as their caches, so after they are first loaded
> > > we will never need to read their inode items again. For new block groups,
> > > once they are created they get their ->cached field set to
> > > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> > >
> > > Reported-by: Andrew Nelson 
> > > > Link: https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > > Tested-by: Andrew Nelson 
> > > Signed-off-by: Filipe Manana 
> > > ---
> > >
> >
> > Now my goal is to see how many times I can get you to redo this thing.
> >
> > Why not instead just do
> >
> > if (btrfs_is_free_space_inode(inode))
> >   path->search_commit_root = 1;
> >
> > > in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> > > the inode cache (I'm not sure if it's ok or not) we could just do
> >
> > if (root == fs_info->tree_root)
> 
> We can't (not just that at least).
> Tried something like that, but we get into a BUG_ON when writing out
> the space cache for new block groups (created in the current
> transaction).
> Because at cache_save_setup() we have this:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342
> 
> Lookup for the inode in normal root, doesn't exist, create it then
> repeat - if still not found, BUG_ON.
> Could also make create_free_space_inode() return an inode pointer and
> make it call btrfs_iget().
> 

Ah ok makes sense.  Well in that case lets just make btrfs_read_locked_inode()
take a path, and allocate it in btrfs_iget, that'll remove the ugly

if (path != in_path)

stuff.  Thanks,

Josef
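
As a rough illustration of the ownership question behind the
"if (path != in_path)" remark, here is a minimal userspace sketch in C. It is
not the v4 patch and not the friendpaste incremental; struct toy_path,
toy_alloc_path(), toy_read_inode_optional(), toy_read_inode() and toy_iget()
are all hypothetical names invented for this example. The first variant lets
the callee accept an optional path and free it only when it allocated it
itself; the second has the caller always allocate and free the path, so the
callee needs no such check.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a search path; not the kernel's struct. */
struct toy_path {
	int search_commit_root;
};

/* Number of paths currently allocated, used only to check for leaks. */
static int live_paths;

static struct toy_path *toy_alloc_path(void)
{
	struct toy_path *p = calloc(1, sizeof(*p));

	if (p)
		live_paths++;
	return p;
}

static void toy_free_path(struct toy_path *p)
{
	if (!p)
		return;
	assert(live_paths > 0);
	live_paths--;
	free(p);
}

/* Variant 1: the callee takes an optional path and must track ownership. */
static int toy_read_inode_optional(struct toy_path *in_path)
{
	struct toy_path *path = in_path;

	if (!path) {
		path = toy_alloc_path();
		if (!path)
			return -ENOMEM;
	}
	/* ... the lookup using 'path' would go here ... */
	if (path != in_path)	/* free only what this function allocated */
		toy_free_path(path);
	return 0;
}

/* Variant 2: the callee always receives a path owned by the caller. */
static int toy_read_inode(struct toy_path *path)
{
	/* ... the lookup using 'path' would go here ... */
	(void)path;
	return 0;
}

/* The caller allocates, configures and frees the path in one place. */
static int toy_iget(int search_commit_root)
{
	struct toy_path *path = toy_alloc_path();
	int ret;

	if (!path)
		return -ENOMEM;
	path->search_commit_root = search_commit_root;
	ret = toy_read_inode(path);
	toy_free_path(path);
	return ret;
}
```

In this toy both variants leave live_paths at zero after a call; the second
simply keeps allocation and release in one function.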


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Filipe Manana
On Wed, Oct 24, 2018 at 12:37 PM Josef Bacik  wrote:
>
> On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> > From: Filipe Manana 
> >
> > When we are writing out a free space cache, during the transaction commit
> > phase, we can end up in a deadlock which results in a stack trace like the
> > following:
> >
> >  schedule+0x28/0x80
> >  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
> >  btrfs_search_slot+0xf6/0x9f0 [btrfs]
> >  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
> >  ? inode_insert5+0x119/0x190
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_iget+0x113/0x690 [btrfs]
> >  __lookup_free_space_inode+0xd8/0x150 [btrfs]
> >  lookup_free_space_inode+0x5b/0xb0 [btrfs]
> >  load_free_space_cache+0x7c/0x170 [btrfs]
> >  ? cache_block_group+0x72/0x3b0 [btrfs]
> >  cache_block_group+0x1b3/0x3b0 [btrfs]
> >  ? finish_wait+0x80/0x80
> >  find_free_extent+0x799/0x1010 [btrfs]
> >  btrfs_reserve_extent+0x9b/0x180 [btrfs]
> >  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
> >  __btrfs_cow_block+0x11d/0x500 [btrfs]
> >  btrfs_cow_block+0xdc/0x180 [btrfs]
> >  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
> >  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
> >  ? kmem_cache_alloc+0x166/0x1d0
> >  btrfs_update_inode_item+0x46/0x100 [btrfs]
> >  cache_save_setup+0xe4/0x3a0 [btrfs]
> >  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
> >  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> >
> > At cache_save_setup() we need to update the inode item of a block group's
> > cache which is located in the tree root (fs_info->tree_root), which means
> > that it may result in COWing a leaf from that tree. If that happens we
> > need to find a free metadata extent and while looking for one, if we find
> > a block group which was not cached yet we attempt to load its cache by
> > calling cache_block_group(). However this function will try to load the
> > inode of the free space cache, which requires finding the matching inode
> > item in the tree root - if that inode item is located in the same leaf as
> > the inode item of the space cache we are updating at cache_save_setup(),
> > we end up in a deadlock, since we try to obtain a read lock on the same
> > extent buffer that we previously write locked.
> >
> > So fix this by using the tree root's commit root when searching for a
> > block group's free space cache inode item when we are attempting to load
> > a free space cache. This is safe since block groups once loaded stay in
> > memory forever, as well as their caches, so after they are first loaded
> > we will never need to read their inode items again. For new block groups,
> > once they are created they get their ->cached field set to
> > BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> >
> > Reported-by: Andrew Nelson 
> > > Link: https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> > Tested-by: Andrew Nelson 
> > Signed-off-by: Filipe Manana 
> > ---
> >
>
> Now my goal is to see how many times I can get you to redo this thing.
>
> Why not instead just do
>
> if (btrfs_is_free_space_inode(inode))
>   path->search_commit_root = 1;
>
> in read_locked_inode?  That would be cleaner.  If we don't want to do that for
> the inode cache (I'm not sure if it's ok or not) we could just do
>
> if (root == fs_info->tree_root)

We can't (not just that at least).
Tried something like that, but we get into a BUG_ON when writing out
the space cache for new block groups (created in the current
transaction).
Because at cache_save_setup() we have this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/extent-tree.c?h=v4.19#n3342

Lookup for the inode in normal root, doesn't exist, create it then
repeat - if still not found, BUG_ON.
Could also make create_free_space_inode() return an inode pointer and
make it call btrfs_iget().

>
> instead.  Thanks,
>
> Josef
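
The lookup, create, lookup-again sequence described above can be modeled in a
few lines of userspace C. This is only a sketch under stated assumptions: the
tree root is reduced to a flat array of inode numbers, the commit root is a
separate copy that creation does not touch, and every name here (struct
toy_root, struct toy_fs, toy_cache_save_setup() and so on) is hypothetical
rather than kernel API. It shows why forcing every free space inode lookup
onto the commit root would make the second lookup fail for a block group
created in the current transaction, which is where the real code hits BUG_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ITEMS 8

/* Toy stand-in for a tree root: a flat set of inode numbers. */
struct toy_root {
	unsigned long inos[TOY_MAX_ITEMS];
	size_t nr;
};

/* The root being modified by the running transaction, plus the
 * snapshot left by the last commit. */
struct toy_fs {
	struct toy_root live;
	struct toy_root commit;
};

static bool toy_root_has(const struct toy_root *root, unsigned long ino)
{
	for (size_t i = 0; i < root->nr; i++)
		if (root->inos[i] == ino)
			return true;
	return false;
}

/* New items go into the live root only; the commit root stays as it was. */
static void toy_create_inode(struct toy_fs *fs, unsigned long ino)
{
	assert(fs->live.nr < TOY_MAX_ITEMS);
	fs->live.inos[fs->live.nr++] = ino;
}

static bool toy_lookup_inode(const struct toy_fs *fs, unsigned long ino,
			     bool search_commit_root)
{
	return toy_root_has(search_commit_root ? &fs->commit : &fs->live, ino);
}

/* Look up, create if missing, look up again. Returns false at the point
 * where the pattern described above would hit BUG_ON. */
static bool toy_cache_save_setup(struct toy_fs *fs, unsigned long ino,
				 bool search_commit_root)
{
	if (toy_lookup_inode(fs, ino, search_commit_root))
		return true;
	toy_create_inode(fs, ino);
	return toy_lookup_inode(fs, ino, search_commit_root);
}
```

With an empty toy_fs, toy_cache_save_setup(&fs, 256, false) finds the inode on
the retry, while the same call with search_commit_root set to true does not,
because the inode only exists in the live root.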


Re: [PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread Josef Bacik
On Wed, Oct 24, 2018 at 10:13:03AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When we are writing out a free space cache, during the transaction commit
> phase, we can end up in a deadlock which results in a stack trace like the
> following:
> 
>  schedule+0x28/0x80
>  btrfs_tree_read_lock+0x8e/0x120 [btrfs]
>  ? finish_wait+0x80/0x80
>  btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
>  btrfs_search_slot+0xf6/0x9f0 [btrfs]
>  ? evict_refill_and_join+0xd0/0xd0 [btrfs]
>  ? inode_insert5+0x119/0x190
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_iget+0x113/0x690 [btrfs]
>  __lookup_free_space_inode+0xd8/0x150 [btrfs]
>  lookup_free_space_inode+0x5b/0xb0 [btrfs]
>  load_free_space_cache+0x7c/0x170 [btrfs]
>  ? cache_block_group+0x72/0x3b0 [btrfs]
>  cache_block_group+0x1b3/0x3b0 [btrfs]
>  ? finish_wait+0x80/0x80
>  find_free_extent+0x799/0x1010 [btrfs]
>  btrfs_reserve_extent+0x9b/0x180 [btrfs]
>  btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
>  __btrfs_cow_block+0x11d/0x500 [btrfs]
>  btrfs_cow_block+0xdc/0x180 [btrfs]
>  btrfs_search_slot+0x3bd/0x9f0 [btrfs]
>  btrfs_lookup_inode+0x3a/0xc0 [btrfs]
>  ? kmem_cache_alloc+0x166/0x1d0
>  btrfs_update_inode_item+0x46/0x100 [btrfs]
>  cache_save_setup+0xe4/0x3a0 [btrfs]
>  btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
>  btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
> 
> At cache_save_setup() we need to update the inode item of a block group's
> cache which is located in the tree root (fs_info->tree_root), which means
> that it may result in COWing a leaf from that tree. If that happens we
> need to find a free metadata extent and while looking for one, if we find
> a block group which was not cached yet we attempt to load its cache by
> calling cache_block_group(). However this function will try to load the
> inode of the free space cache, which requires finding the matching inode
> item in the tree root - if that inode item is located in the same leaf as
> the inode item of the space cache we are updating at cache_save_setup(),
> we end up in a deadlock, since we try to obtain a read lock on the same
> extent buffer that we previously write locked.
> 
> So fix this by using the tree root's commit root when searching for a
> block group's free space cache inode item when we are attempting to load
> a free space cache. This is safe since block groups once loaded stay in
> memory forever, as well as their caches, so after they are first loaded
> we will never need to read their inode items again. For new block groups,
> once they are created they get their ->cached field set to
> BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
> 
> Reported-by: Andrew Nelson 
> Link: https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/
> Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists")
> Tested-by: Andrew Nelson 
> Signed-off-by: Filipe Manana 
> ---
> 

Now my goal is to see how many times I can get you to redo this thing.

Why not instead just do 

if (btrfs_is_free_space_inode(inode))
  path->search_commit_root = 1;

in read_locked_inode?  That would be cleaner.  If we don't want to do that for
the inode cache (I'm not sure if it's ok or not) we could just do

if (root == fs_info->tree_root)

instead.  Thanks,

Josef
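
For readers following the quoted patch description, the self-deadlock and the
effect of searching the commit root can be sketched in userspace C. This is a
toy model, not kernel code: struct toy_eb and the toy_* helpers are
hypothetical, the lock is a plain non-recursive reader/writer flag, and "try"
variants return false at the point where a blocking lock would sleep forever.
It also assumes the commit root leaf is a separate buffer that the writer does
not hold; how the kernel actually locks (or skips locking) commit roots is not
modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy extent buffer lock: one writer or many readers, not recursive. */
struct toy_eb {
	bool write_locked;
	int readers;
};

static bool toy_try_write_lock(struct toy_eb *eb)
{
	if (eb->write_locked || eb->readers > 0)
		return false;
	eb->write_locked = true;
	return true;
}

static void toy_write_unlock(struct toy_eb *eb)
{
	assert(eb->write_locked);
	eb->write_locked = false;
}

static bool toy_try_read_lock(struct toy_eb *eb)
{
	if (eb->write_locked)
		return false;	/* a blocking read lock would sleep here */
	eb->readers++;
	return true;
}

static void toy_read_unlock(struct toy_eb *eb)
{
	assert(eb->readers > 0);
	eb->readers--;
}

/* The nested inode lookup: read lock either the live leaf or the
 * commit root leaf, depending on search_commit_root. */
static bool toy_nested_lookup(struct toy_eb *live_leaf,
			      struct toy_eb *commit_leaf,
			      bool search_commit_root)
{
	struct toy_eb *eb = search_commit_root ? commit_leaf : live_leaf;

	if (!toy_try_read_lock(eb))
		return false;
	toy_read_unlock(eb);
	return true;
}

/* The outer update: write lock the live leaf, then run the nested
 * lookup while still holding that lock, as in the stack trace. */
static bool toy_outer_update(bool search_commit_root)
{
	struct toy_eb live_leaf = { false, 0 };
	struct toy_eb commit_leaf = { false, 0 };
	bool ok;

	if (!toy_try_write_lock(&live_leaf))
		return false;
	ok = toy_nested_lookup(&live_leaf, &commit_leaf, search_commit_root);
	toy_write_unlock(&live_leaf);
	return ok;
}
```

In this model toy_outer_update(false) reports that the nested read lock could
not be taken, which corresponds to the hang, while toy_outer_update(true)
succeeds because the nested lookup never touches the write locked leaf.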