Re: [PATCH][v2] btrfs: run delayed items before dropping the snapshot

2018-12-06 Thread David Sterba
On Wed, Dec 05, 2018 at 12:12:21PM -0500, Josef Bacik wrote:
> From: Josef Bacik 
> 
> With my delayed refs patches in place we started seeing a large amount
> of aborts in __btrfs_free_extent
> 
> BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 
> 35964  owner 1 offset 0
> Call Trace:
>  ? btrfs_merge_delayed_refs+0xaf/0x340
>  __btrfs_run_delayed_refs+0x6ea/0xfc0
>  ? btrfs_set_path_blocking+0x31/0x60
>  btrfs_run_delayed_refs+0xeb/0x180
>  btrfs_commit_transaction+0x179/0x7f0
>  ? btrfs_check_space_for_delayed_refs+0x30/0x50
>  ? should_end_transaction.isra.19+0xe/0x40
>  btrfs_drop_snapshot+0x41c/0x7c0
>  btrfs_clean_one_deleted_snapshot+0xb5/0xd0
>  cleaner_kthread+0xf6/0x120
>  kthread+0xf8/0x130
>  ? btree_invalidatepage+0x90/0x90
>  ? kthread_bind+0x10/0x10
>  ret_from_fork+0x35/0x40
> 
> This was because btrfs_drop_snapshot depends on the root not being modified
> while it's dropping the snapshot.  It will unlock the root node (and really
> every node) as it walks down the tree, only to re-lock it when it needs to do
> something.  This is a problem because if we modify the tree we could cow a 
> block
> in our path, which free's our reference to that block.  Then once we get back 
> to
> that shared block we'll free our reference to it again, and get ENOENT when
> trying to lookup our extent reference to that block in __btrfs_free_extent.
> 
> This is ultimately happening because we have delayed items left to be 
> processed
> for our deleted snapshot _after_ all of the inodes are closed for the 
> snapshot.
> We only run the delayed inode item if we're deleting the inode, and even then 
> we
> do not run the delayed insertions or delayed removals.  These can be run at 
> any
> point after our final inode does it's last iput, which is what triggers the
> snapshot deletion.  We can end up with the snapshot deletion happening and 
> then
> have the delayed items run on that file system, resulting in the above 
> problem.
> 
> This problem has existed forever, however my patches made it much easier to 
> hit
> as I wake up the cleaner much more often to deal with delayed iputs, which 
> made
> us more likely to start the snapshot dropping work before the transaction
> commits, which is when the delayed items would generally be run.  Before,
> generally speaking, we would run the delayed items, commit the transaction, 
> and
> wakeup the cleaner thread to start deleting snapshots, which means we were 
> less
> likely to hit this problem.  You could still hit it if you had multiple
> snapshots to be deleted and ended up with lots of delayed items, but it was
> definitely harder.
> 
> Fix for now by simply running all the delayed items before starting to drop 
> the
> snapshot.  We could make this smarter in the future by making the delayed 
> items
> per-root, and then simply drop any delayed items for roots that we are going 
> to
> delete.  But for now just a quick and easy solution is the safest.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Josef Bacik 
> ---
> v1->v2:
> - check for errors from btrfs_run_delayed_items.
> - Dave I can reroll the series, but the second version of patch 1 is the same,
>   let me know what you want.

As this is a small update it's fine to send just that patch. You may
also use --in-reply-to so it threads to the original series. Resending
series makes most sense (to me) when there's a discussion and many
changes, so a fresh series makes it clear what's the current status.

Patch replaced in for-next topic branch, thanks.


Re: [PATCH][v2] btrfs: run delayed items before dropping the snapshot

2018-12-05 Thread Filipe Manana
On Wed, Dec 5, 2018 at 5:14 PM Josef Bacik  wrote:
>
> From: Josef Bacik 
>
> With my delayed refs patches in place we started seeing a large amount
> of aborts in __btrfs_free_extent
>
> BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 
> 35964  owner 1 offset 0
> Call Trace:
>  ? btrfs_merge_delayed_refs+0xaf/0x340
>  __btrfs_run_delayed_refs+0x6ea/0xfc0
>  ? btrfs_set_path_blocking+0x31/0x60
>  btrfs_run_delayed_refs+0xeb/0x180
>  btrfs_commit_transaction+0x179/0x7f0
>  ? btrfs_check_space_for_delayed_refs+0x30/0x50
>  ? should_end_transaction.isra.19+0xe/0x40
>  btrfs_drop_snapshot+0x41c/0x7c0
>  btrfs_clean_one_deleted_snapshot+0xb5/0xd0
>  cleaner_kthread+0xf6/0x120
>  kthread+0xf8/0x130
>  ? btree_invalidatepage+0x90/0x90
>  ? kthread_bind+0x10/0x10
>  ret_from_fork+0x35/0x40
>
> This was because btrfs_drop_snapshot depends on the root not being modified
> while it's dropping the snapshot.  It will unlock the root node (and really
> every node) as it walks down the tree, only to re-lock it when it needs to do
> something.  This is a problem because if we modify the tree we could cow a 
> block
> in our path, which free's our reference to that block.  Then once we get back 
> to
> that shared block we'll free our reference to it again, and get ENOENT when
> trying to lookup our extent reference to that block in __btrfs_free_extent.
>
> This is ultimately happening because we have delayed items left to be 
> processed
> for our deleted snapshot _after_ all of the inodes are closed for the 
> snapshot.
> We only run the delayed inode item if we're deleting the inode, and even then 
> we
> do not run the delayed insertions or delayed removals.  These can be run at 
> any
> point after our final inode does it's last iput, which is what triggers the
> snapshot deletion.  We can end up with the snapshot deletion happening and 
> then
> have the delayed items run on that file system, resulting in the above 
> problem.
>
> This problem has existed forever, however my patches made it much easier to 
> hit
> as I wake up the cleaner much more often to deal with delayed iputs, which 
> made
> us more likely to start the snapshot dropping work before the transaction
> commits, which is when the delayed items would generally be run.  Before,
> generally speaking, we would run the delayed items, commit the transaction, 
> and
> wakeup the cleaner thread to start deleting snapshots, which means we were 
> less
> likely to hit this problem.  You could still hit it if you had multiple
> snapshots to be deleted and ended up with lots of delayed items, but it was
> definitely harder.
>
> Fix for now by simply running all the delayed items before starting to drop 
> the
> snapshot.  We could make this smarter in the future by making the delayed 
> items
> per-root, and then simply drop any delayed items for roots that we are going 
> to
> delete.  But for now just a quick and easy solution is the safest.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Josef Bacik 

Reviewed-by: Filipe Manana 

Looks good now. Thanks.

> ---
> v1->v2:
> - check for errors from btrfs_run_delayed_items.
> - Dave I can reroll the series, but the second version of patch 1 is the same,
>   let me know what you want.
>
>  fs/btrfs/extent-tree.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index dcb699dd57f3..473084eb7a2d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9330,6 +9330,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> goto out_free;
> }
>
> +   err = btrfs_run_delayed_items(trans);
> +   if (err)
> +   goto out_end_trans;
> +
> if (block_rsv)
> trans->block_rsv = block_rsv;
>
> --
> 2.14.3
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”


[PATCH][v2] btrfs: run delayed items before dropping the snapshot

2018-12-05 Thread Josef Bacik
From: Josef Bacik 

With my delayed refs patches in place we started seeing a large amount
of aborts in __btrfs_free_extent

BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 
35964  owner 1 offset 0
Call Trace:
 ? btrfs_merge_delayed_refs+0xaf/0x340
 __btrfs_run_delayed_refs+0x6ea/0xfc0
 ? btrfs_set_path_blocking+0x31/0x60
 btrfs_run_delayed_refs+0xeb/0x180
 btrfs_commit_transaction+0x179/0x7f0
 ? btrfs_check_space_for_delayed_refs+0x30/0x50
 ? should_end_transaction.isra.19+0xe/0x40
 btrfs_drop_snapshot+0x41c/0x7c0
 btrfs_clean_one_deleted_snapshot+0xb5/0xd0
 cleaner_kthread+0xf6/0x120
 kthread+0xf8/0x130
 ? btree_invalidatepage+0x90/0x90
 ? kthread_bind+0x10/0x10
 ret_from_fork+0x35/0x40

This was because btrfs_drop_snapshot depends on the root not being modified
while it's dropping the snapshot.  It will unlock the root node (and really
every node) as it walks down the tree, only to re-lock it when it needs to do
something.  This is a problem because if we modify the tree we could cow a block
in our path, which free's our reference to that block.  Then once we get back to
that shared block we'll free our reference to it again, and get ENOENT when
trying to lookup our extent reference to that block in __btrfs_free_extent.

This is ultimately happening because we have delayed items left to be processed
for our deleted snapshot _after_ all of the inodes are closed for the snapshot.
We only run the delayed inode item if we're deleting the inode, and even then we
do not run the delayed insertions or delayed removals.  These can be run at any
point after our final inode does it's last iput, which is what triggers the
snapshot deletion.  We can end up with the snapshot deletion happening and then
have the delayed items run on that file system, resulting in the above problem.

This problem has existed forever, however my patches made it much easier to hit
as I wake up the cleaner much more often to deal with delayed iputs, which made
us more likely to start the snapshot dropping work before the transaction
commits, which is when the delayed items would generally be run.  Before,
generally speaking, we would run the delayed items, commit the transaction, and
wakeup the cleaner thread to start deleting snapshots, which means we were less
likely to hit this problem.  You could still hit it if you had multiple
snapshots to be deleted and ended up with lots of delayed items, but it was
definitely harder.

Fix for now by simply running all the delayed items before starting to drop the
snapshot.  We could make this smarter in the future by making the delayed items
per-root, and then simply drop any delayed items for roots that we are going to
delete.  But for now just a quick and easy solution is the safest.

Cc: sta...@vger.kernel.org
Signed-off-by: Josef Bacik 
---
v1->v2:
- check for errors from btrfs_run_delayed_items.
- Dave I can reroll the series, but the second version of patch 1 is the same,
  let me know what you want.

 fs/btrfs/extent-tree.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dcb699dd57f3..473084eb7a2d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9330,6 +9330,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
goto out_free;
}
 
+   err = btrfs_run_delayed_items(trans);
+   if (err)
+   goto out_end_trans;
+
if (block_rsv)
trans->block_rsv = block_rsv;
 
-- 
2.14.3