Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Mon, Sep 16, 2019 at 01:59:47PM +0100, Filipe Manana wrote: > On Thu, Sep 12, 2019 at 8:32 AM Sasha Levin wrote: > > > > Hi, > > > > [This is an automated email] > > > > This commit has been processed because it contains a -stable tag. > > The stable tag indicates that it's relevant for the following trees: 4.4+ > > > > The bot has tested the following trees: v5.2.14, v4.19.72, v4.14.143, > > v4.9.192, v4.4.192. > > > > v5.2.14: Build OK! > > v4.19.72: Failed to apply! Possible dependencies: > > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of > > different files") > > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > > unlink/rmdir") > > b8aa330d2acb ("Btrfs: improve performance on fsync of files with > > multiple hardlinks") > > > > v4.14.143: Failed to apply! Possible dependencies: > > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link > > recreation") > > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link > > combination") > > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of > > different files") > > 8d9e220ca084 ("btrfs: simplify IS_ERR/PTR_ERR checks") > > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > > unlink/rmdir") > > b8aa330d2acb ("Btrfs: improve performance on fsync of files with > > multiple hardlinks") > > > > v4.9.192: Failed to apply! Possible dependencies: > > 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience > > variables") > > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link > > recreation") > > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link > > combination") > > 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take > > btrfs_inode") > > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of > > different files") > > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > > unlink/rmdir") > > cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block") > > da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and > > into fs_info") > > db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode") > > de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info") > > fb456252d3d9 ("btrfs: root->fs_info cleanup, use fs_info->dev_root > > everywhere") > > > > v4.4.192: Failed to apply! Possible dependencies: > > 0132761017e0 ("btrfs: fix string and comment grammatical issues and > > typos") > > 09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and > > page_cache_{get,release} macros") > > 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience > > variables") > > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link > > recreation") > > 0e749e54244e ("dax: increase granularity of dax_clear_blocks() > > operations") > > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link > > combination") > > 44f714dae50a ("Btrfs: improve performance on fsync against new inode > > after rename/unlink") > > 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take > > btrfs_inode") > > 52db400fcd50 ("pmem, dax: clean up clear_pmem()") > > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of > > different files") > > 781feef7e6be ("Btrfs: fix lockdep warning about log_mutex") > > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > > unlink/rmdir") > > b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with > > dax_map_atomic()") > > bb7ab3b92e46 ("btrfs: Fix misspellings in comments.") > > cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block") > > d1a5f2b4d8a1 ("block: use DAX for partition table reads") > > db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode") > > de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info") > > > > > > NOTE: The patch will not be queued to stable trees until it is upstream. > > > > How should we proceed with this patch? > > So here follows, as attachments, patches for the following stable > versions: 4.4.193, 4.9.193, 4.14.144 and 4.19.73. Thanks for these backports, now queued up! greg k-h
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Thu, Sep 12, 2019 at 8:32 AM Sasha Levin wrote: > > Hi, > > [This is an automated email] > > This commit has been processed because it contains a -stable tag. > The stable tag indicates that it's relevant for the following trees: 4.4+ > > The bot has tested the following trees: v5.2.14, v4.19.72, v4.14.143, > v4.9.192, v4.4.192. > > v5.2.14: Build OK! > v4.19.72: Failed to apply! Possible dependencies: > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different > files") > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > unlink/rmdir") > b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple > hardlinks") > > v4.14.143: Failed to apply! Possible dependencies: > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link > recreation") > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link > combination") > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different > files") > 8d9e220ca084 ("btrfs: simplify IS_ERR/PTR_ERR checks") > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > unlink/rmdir") > b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple > hardlinks") > > v4.9.192: Failed to apply! Possible dependencies: > 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience > variables") > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link > recreation") > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link > combination") > 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take > btrfs_inode") > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different > files") > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > unlink/rmdir") > cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block") > da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and into > fs_info") > db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode") > de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info") > fb456252d3d9 ("btrfs: root->fs_info cleanup, use fs_info->dev_root > everywhere") > > v4.4.192: Failed to apply! Possible dependencies: > 0132761017e0 ("btrfs: fix string and comment grammatical issues and > typos") > 09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and > page_cache_{get,release} macros") > 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience > variables") > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link > recreation") > 0e749e54244e ("dax: increase granularity of dax_clear_blocks() > operations") > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link > combination") > 44f714dae50a ("Btrfs: improve performance on fsync against new inode > after rename/unlink") > 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take > btrfs_inode") > 52db400fcd50 ("pmem, dax: clean up clear_pmem()") > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different > files") > 781feef7e6be ("Btrfs: fix lockdep warning about log_mutex") > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > unlink/rmdir") > b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with > dax_map_atomic()") > bb7ab3b92e46 ("btrfs: Fix misspellings in comments.") > cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block") > d1a5f2b4d8a1 ("block: use DAX for partition table reads") > db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode") > de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info") > > > NOTE: The patch will not be queued to stable trees until it is upstream. > > How should we proceed with this patch? So here follows, as attachments, patches for the following stable versions: 4.4.193, 4.9.193, 4.14.144 and 4.19.73. Thanks. > > -- > Thanks, > Sasha -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” From 842266d7437f7714a67ddb34680546f0268f27c9 Mon Sep 17 00:00:00 2001 From: Filipe Manana Date: Tue, 10 Sep 2019 15:26:49 +0100 Subject: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction Sometimes when fsync'ing a file we need to log that other inodes exist and when we need to do that we acquire a reference on the inodes and then drop that reference using iput() after logging them. That generally is not a problem except if we end up doing the final iput() (dropping the last reference) on the inode and that inode has a link count of 0, which can happen in a very short time window if the logging path gets a reference on the inode while it's being unlinked. In that case we end up getting the eviction callback, btrfs_evict_inode(), invoked through the iput() call chain which needs to drop all of the inode's items from its subvol
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Thu, Sep 12, 2019 at 02:19:55PM +0100, Filipe Manana wrote: > On Thu, Sep 12, 2019 at 1:18 PM Josef Bacik wrote: > > > > On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdman...@kernel.org wrote: > > > From: Filipe Manana > > > > > > Sometimes when fsync'ing a file we need to log that other inodes exist and > > > when we need to do that we acquire a reference on the inodes and then drop > > > that reference using iput() after logging them. > > > > > > That generally is not a problem except if we end up doing the final iput() > > > (dropping the last reference) on the inode and that inode has a link count > > > of 0, which can happen in a very short time window if the logging path > > > gets a reference on the inode while it's being unlinked. > > > > > > In that case we end up getting the eviction callback, btrfs_evict_inode(), > > > invoked through the iput() call chain which needs to drop all of the > > > inode's items from its subvolume btree, and in order to do that, it needs > > > to join a transaction at the helper function evict_refill_and_join(). > > > However because the task previously started a transaction at the fsync > > > handler, btrfs_sync_file(), it has current->journal_info already pointing > > > to a transaction handle and therefore evict_refill_and_join() will get > > > that transaction handle from btrfs_join_transaction(). From this point on, > > > two different problems can happen: > > > > > > 1) evict_refill_and_join() will often change the transaction handle's > > >block reserve (->block_rsv) and set its ->bytes_reserved field to a > > >value greater than 0. If evict_refill_and_join() never commits the > > >transaction, the eviction handler ends up decreasing the reference > > >count (->use_count) of the transaction handle through the call to > > >btrfs_end_transaction(), and after that point we have a transaction > > >handle with a NULL ->block_rsv (which is the value prior to the > > >transaction join from evict_refill_and_join()) and a ->bytes_reserved > > >value greater than 0. If after the eviction/iput completes the inode > > >logging path hits an error or it decides that it must fallback to a > > >transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a > > >non-zero value from btrfs_log_dentry_safe(), and because of that > > >non-zero value it tries to commit the transaction using a handle with > > >a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes > > >the transaction commit hit an assertion failure at > > >btrfs_trans_release_metadata() because ->bytes_reserved is not zero but > > >the ->block_rsv is NULL. The produced stack trace for that is like the > > >following: > > > > > >[192922.917158] assertion failed: !trans->bytes_reserved, file: > > > fs/btrfs/transaction.c, line: 816 > > >[192922.917553] [ cut here ] > > >[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! > > >[192922.918310] invalid opcode: [#1] SMP DEBUG_PAGEALLOC PTI > > >[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: GW > > > 5.1.4-btrfs-next-47 #1 > > >[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > > > BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 > > >[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] > > >(...) > > >[192922.920925] RSP: 0018:aebdc8a27da8 EFLAGS: 00010286 > > >[192922.921315] RAX: 0051 RBX: 95c9c16a41c0 RCX: > > > > > >[192922.921692] RDX: RSI: 95cab6b16838 RDI: > > > 95cab6b16838 > > >[192922.922066] RBP: 95c9c16a41c0 R08: R09: > > > > > >[192922.922442] R10: aebdc8a27e70 R11: R12: > > > 95ca731a0980 > > >[192922.922820] R13: R14: 95ca84c73338 R15: > > > 95ca731a0ea8 > > >[192922.923200] FS: 7f337eda4e80() GS:95cab6b0() > > > knlGS: > > >[192922.923579] CS: 0010 DS: ES: CR0: 80050033 > > >[192922.923948] CR2: 7f337edad000 CR3: 0001e00f6002 CR4: > > > 003606e0 > > >[192922.924329] DR0: DR1: DR2: > > > > > >[192922.924711] DR3: DR6: fffe0ff0 DR7: > > > 0400 > > >[192922.925105] Call Trace: > > >[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] > > >[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] > > >[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] > > >[192922.926731] do_fsync+0x38/0x60 > > >[192922.927138] __x64_sys_fdatasync+0x13/0x20 > > >[192922.927543] do_syscall_64+0x60/0x1c0 > > >[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > >(...) > > >[192922.934077] ---[ end trace f00808b12068168f ]---
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Thu, Sep 12, 2019 at 1:18 PM Josef Bacik wrote: > > On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdman...@kernel.org wrote: > > From: Filipe Manana > > > > Sometimes when fsync'ing a file we need to log that other inodes exist and > > when we need to do that we acquire a reference on the inodes and then drop > > that reference using iput() after logging them. > > > > That generally is not a problem except if we end up doing the final iput() > > (dropping the last reference) on the inode and that inode has a link count > > of 0, which can happen in a very short time window if the logging path > > gets a reference on the inode while it's being unlinked. > > > > In that case we end up getting the eviction callback, btrfs_evict_inode(), > > invoked through the iput() call chain which needs to drop all of the > > inode's items from its subvolume btree, and in order to do that, it needs > > to join a transaction at the helper function evict_refill_and_join(). > > However because the task previously started a transaction at the fsync > > handler, btrfs_sync_file(), it has current->journal_info already pointing > > to a transaction handle and therefore evict_refill_and_join() will get > > that transaction handle from btrfs_join_transaction(). From this point on, > > two different problems can happen: > > > > 1) evict_refill_and_join() will often change the transaction handle's > >block reserve (->block_rsv) and set its ->bytes_reserved field to a > >value greater than 0. If evict_refill_and_join() never commits the > >transaction, the eviction handler ends up decreasing the reference > >count (->use_count) of the transaction handle through the call to > >btrfs_end_transaction(), and after that point we have a transaction > >handle with a NULL ->block_rsv (which is the value prior to the > >transaction join from evict_refill_and_join()) and a ->bytes_reserved > >value greater than 0. If after the eviction/iput completes the inode > >logging path hits an error or it decides that it must fallback to a > >transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a > >non-zero value from btrfs_log_dentry_safe(), and because of that > >non-zero value it tries to commit the transaction using a handle with > >a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes > >the transaction commit hit an assertion failure at > >btrfs_trans_release_metadata() because ->bytes_reserved is not zero but > >the ->block_rsv is NULL. The produced stack trace for that is like the > >following: > > > >[192922.917158] assertion failed: !trans->bytes_reserved, file: > > fs/btrfs/transaction.c, line: 816 > >[192922.917553] [ cut here ] > >[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! > >[192922.918310] invalid opcode: [#1] SMP DEBUG_PAGEALLOC PTI > >[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: GW > > 5.1.4-btrfs-next-47 #1 > >[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > > BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 > >[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] > >(...) > >[192922.920925] RSP: 0018:aebdc8a27da8 EFLAGS: 00010286 > >[192922.921315] RAX: 0051 RBX: 95c9c16a41c0 RCX: > > > >[192922.921692] RDX: RSI: 95cab6b16838 RDI: > > 95cab6b16838 > >[192922.922066] RBP: 95c9c16a41c0 R08: R09: > > > >[192922.922442] R10: aebdc8a27e70 R11: R12: > > 95ca731a0980 > >[192922.922820] R13: R14: 95ca84c73338 R15: > > 95ca731a0ea8 > >[192922.923200] FS: 7f337eda4e80() GS:95cab6b0() > > knlGS: > >[192922.923579] CS: 0010 DS: ES: CR0: 80050033 > >[192922.923948] CR2: 7f337edad000 CR3: 0001e00f6002 CR4: > > 003606e0 > >[192922.924329] DR0: DR1: DR2: > > > >[192922.924711] DR3: DR6: fffe0ff0 DR7: > > 0400 > >[192922.925105] Call Trace: > >[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] > >[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] > >[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] > >[192922.926731] do_fsync+0x38/0x60 > >[192922.927138] __x64_sys_fdatasync+0x13/0x20 > >[192922.927543] do_syscall_64+0x60/0x1c0 > >[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe > >(...) > >[192922.934077] ---[ end trace f00808b12068168f ]--- > > > > 2) If evict_refill_and_join() decides to commit the transaction, it will > >be able to do it, since the nested transaction join only increments the > >transaction handle's ->use_count reference counter and it does n
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Thu, Sep 12, 2019 at 12:59 PM Josef Bacik wrote: > > On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdman...@kernel.org wrote: > > From: Filipe Manana > > > > Sometimes when fsync'ing a file we need to log that other inodes exist and > > when we need to do that we acquire a reference on the inodes and then drop > > that reference using iput() after logging them. > > > > That generally is not a problem except if we end up doing the final iput() > > (dropping the last reference) on the inode and that inode has a link count > > of 0, which can happen in a very short time window if the logging path > > gets a reference on the inode while it's being unlinked. > > > > In that case we end up getting the eviction callback, btrfs_evict_inode(), > > invoked through the iput() call chain which needs to drop all of the > > inode's items from its subvolume btree, and in order to do that, it needs > > to join a transaction at the helper function evict_refill_and_join(). > > However because the task previously started a transaction at the fsync > > handler, btrfs_sync_file(), it has current->journal_info already pointing > > to a transaction handle and therefore evict_refill_and_join() will get > > that transaction handle from btrfs_join_transaction(). From this point on, > > two different problems can happen: > > > > 1) evict_refill_and_join() will often change the transaction handle's > >block reserve (->block_rsv) and set its ->bytes_reserved field to a > >value greater than 0. If evict_refill_and_join() never commits the > >transaction, the eviction handler ends up decreasing the reference > >count (->use_count) of the transaction handle through the call to > >btrfs_end_transaction(), and after that point we have a transaction > >handle with a NULL ->block_rsv (which is the value prior to the > >transaction join from evict_refill_and_join()) and a ->bytes_reserved > >value greater than 0. If after the eviction/iput completes the inode > >logging path hits an error or it decides that it must fallback to a > >transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a > >non-zero value from btrfs_log_dentry_safe(), and because of that > >non-zero value it tries to commit the transaction using a handle with > >a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes > >the transaction commit hit an assertion failure at > >btrfs_trans_release_metadata() because ->bytes_reserved is not zero but > >the ->block_rsv is NULL. The produced stack trace for that is like the > >following: > > > >[192922.917158] assertion failed: !trans->bytes_reserved, file: > > fs/btrfs/transaction.c, line: 816 > >[192922.917553] [ cut here ] > >[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! > >[192922.918310] invalid opcode: [#1] SMP DEBUG_PAGEALLOC PTI > >[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: GW > > 5.1.4-btrfs-next-47 #1 > >[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > > BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 > >[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] > >(...) > >[192922.920925] RSP: 0018:aebdc8a27da8 EFLAGS: 00010286 > >[192922.921315] RAX: 0051 RBX: 95c9c16a41c0 RCX: > > > >[192922.921692] RDX: RSI: 95cab6b16838 RDI: > > 95cab6b16838 > >[192922.922066] RBP: 95c9c16a41c0 R08: R09: > > > >[192922.922442] R10: aebdc8a27e70 R11: R12: > > 95ca731a0980 > >[192922.922820] R13: R14: 95ca84c73338 R15: > > 95ca731a0ea8 > >[192922.923200] FS: 7f337eda4e80() GS:95cab6b0() > > knlGS: > >[192922.923579] CS: 0010 DS: ES: CR0: 80050033 > >[192922.923948] CR2: 7f337edad000 CR3: 0001e00f6002 CR4: > > 003606e0 > >[192922.924329] DR0: DR1: DR2: > > > >[192922.924711] DR3: DR6: fffe0ff0 DR7: > > 0400 > >[192922.925105] Call Trace: > >[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] > >[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] > >[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] > >[192922.926731] do_fsync+0x38/0x60 > >[192922.927138] __x64_sys_fdatasync+0x13/0x20 > >[192922.927543] do_syscall_64+0x60/0x1c0 > >[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe > >(...) > >[192922.934077] ---[ end trace f00808b12068168f ]--- > > > > 2) If evict_refill_and_join() decides to commit the transaction, it will > >be able to do it, since the nested transaction join only increments the > >transaction handle's ->use_count reference counter and it does
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > Sometimes when fsync'ing a file we need to log that other inodes exist and > when we need to do that we acquire a reference on the inodes and then drop > that reference using iput() after logging them. > > That generally is not a problem except if we end up doing the final iput() > (dropping the last reference) on the inode and that inode has a link count > of 0, which can happen in a very short time window if the logging path > gets a reference on the inode while it's being unlinked. > > In that case we end up getting the eviction callback, btrfs_evict_inode(), > invoked through the iput() call chain which needs to drop all of the > inode's items from its subvolume btree, and in order to do that, it needs > to join a transaction at the helper function evict_refill_and_join(). > However because the task previously started a transaction at the fsync > handler, btrfs_sync_file(), it has current->journal_info already pointing > to a transaction handle and therefore evict_refill_and_join() will get > that transaction handle from btrfs_join_transaction(). From this point on, > two different problems can happen: > > 1) evict_refill_and_join() will often change the transaction handle's >block reserve (->block_rsv) and set its ->bytes_reserved field to a >value greater than 0. If evict_refill_and_join() never commits the >transaction, the eviction handler ends up decreasing the reference >count (->use_count) of the transaction handle through the call to >btrfs_end_transaction(), and after that point we have a transaction >handle with a NULL ->block_rsv (which is the value prior to the >transaction join from evict_refill_and_join()) and a ->bytes_reserved >value greater than 0. If after the eviction/iput completes the inode >logging path hits an error or it decides that it must fallback to a >transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a >non-zero value from btrfs_log_dentry_safe(), and because of that >non-zero value it tries to commit the transaction using a handle with >a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes >the transaction commit hit an assertion failure at >btrfs_trans_release_metadata() because ->bytes_reserved is not zero but >the ->block_rsv is NULL. The produced stack trace for that is like the >following: > >[192922.917158] assertion failed: !trans->bytes_reserved, file: > fs/btrfs/transaction.c, line: 816 >[192922.917553] [ cut here ] >[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! >[192922.918310] invalid opcode: [#1] SMP DEBUG_PAGEALLOC PTI >[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: GW > 5.1.4-btrfs-next-47 #1 >[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 >[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] >(...) >[192922.920925] RSP: 0018:aebdc8a27da8 EFLAGS: 00010286 >[192922.921315] RAX: 0051 RBX: 95c9c16a41c0 RCX: > >[192922.921692] RDX: RSI: 95cab6b16838 RDI: > 95cab6b16838 >[192922.922066] RBP: 95c9c16a41c0 R08: R09: > >[192922.922442] R10: aebdc8a27e70 R11: R12: > 95ca731a0980 >[192922.922820] R13: R14: 95ca84c73338 R15: > 95ca731a0ea8 >[192922.923200] FS: 7f337eda4e80() GS:95cab6b0() > knlGS: >[192922.923579] CS: 0010 DS: ES: CR0: 80050033 >[192922.923948] CR2: 7f337edad000 CR3: 0001e00f6002 CR4: > 003606e0 >[192922.924329] DR0: DR1: DR2: > >[192922.924711] DR3: DR6: fffe0ff0 DR7: > 0400 >[192922.925105] Call Trace: >[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] >[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] >[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] >[192922.926731] do_fsync+0x38/0x60 >[192922.927138] __x64_sys_fdatasync+0x13/0x20 >[192922.927543] do_syscall_64+0x60/0x1c0 >[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe >(...) >[192922.934077] ---[ end trace f00808b12068168f ]--- > > 2) If evict_refill_and_join() decides to commit the transaction, it will >be able to do it, since the nested transaction join only increments the >transaction handle's ->use_count reference counter and it does not >prevent the transaction from getting committed. This means that after >eviction completes, the fsync logging path will be using a transaction >handle that refers to an already committed transaction. What
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > Sometimes when fsync'ing a file we need to log that other inodes exist and > when we need to do that we acquire a reference on the inodes and then drop > that reference using iput() after logging them. > > That generally is not a problem except if we end up doing the final iput() > (dropping the last reference) on the inode and that inode has a link count > of 0, which can happen in a very short time window if the logging path > gets a reference on the inode while it's being unlinked. > > In that case we end up getting the eviction callback, btrfs_evict_inode(), > invoked through the iput() call chain which needs to drop all of the > inode's items from its subvolume btree, and in order to do that, it needs > to join a transaction at the helper function evict_refill_and_join(). > However because the task previously started a transaction at the fsync > handler, btrfs_sync_file(), it has current->journal_info already pointing > to a transaction handle and therefore evict_refill_and_join() will get > that transaction handle from btrfs_join_transaction(). From this point on, > two different problems can happen: > > 1) evict_refill_and_join() will often change the transaction handle's >block reserve (->block_rsv) and set its ->bytes_reserved field to a >value greater than 0. If evict_refill_and_join() never commits the >transaction, the eviction handler ends up decreasing the reference >count (->use_count) of the transaction handle through the call to >btrfs_end_transaction(), and after that point we have a transaction >handle with a NULL ->block_rsv (which is the value prior to the >transaction join from evict_refill_and_join()) and a ->bytes_reserved >value greater than 0. If after the eviction/iput completes the inode >logging path hits an error or it decides that it must fallback to a >transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a >non-zero value from btrfs_log_dentry_safe(), and because of that >non-zero value it tries to commit the transaction using a handle with >a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes >the transaction commit hit an assertion failure at >btrfs_trans_release_metadata() because ->bytes_reserved is not zero but >the ->block_rsv is NULL. The produced stack trace for that is like the >following: > >[192922.917158] assertion failed: !trans->bytes_reserved, file: > fs/btrfs/transaction.c, line: 816 >[192922.917553] [ cut here ] >[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! >[192922.918310] invalid opcode: [#1] SMP DEBUG_PAGEALLOC PTI >[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: GW > 5.1.4-btrfs-next-47 #1 >[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 >[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] >(...) >[192922.920925] RSP: 0018:aebdc8a27da8 EFLAGS: 00010286 >[192922.921315] RAX: 0051 RBX: 95c9c16a41c0 RCX: > >[192922.921692] RDX: RSI: 95cab6b16838 RDI: > 95cab6b16838 >[192922.922066] RBP: 95c9c16a41c0 R08: R09: > >[192922.922442] R10: aebdc8a27e70 R11: R12: > 95ca731a0980 >[192922.922820] R13: R14: 95ca84c73338 R15: > 95ca731a0ea8 >[192922.923200] FS: 7f337eda4e80() GS:95cab6b0() > knlGS: >[192922.923579] CS: 0010 DS: ES: CR0: 80050033 >[192922.923948] CR2: 7f337edad000 CR3: 0001e00f6002 CR4: > 003606e0 >[192922.924329] DR0: DR1: DR2: > >[192922.924711] DR3: DR6: fffe0ff0 DR7: > 0400 >[192922.925105] Call Trace: >[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] >[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] >[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] >[192922.926731] do_fsync+0x38/0x60 >[192922.927138] __x64_sys_fdatasync+0x13/0x20 >[192922.927543] do_syscall_64+0x60/0x1c0 >[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe >(...) >[192922.934077] ---[ end trace f00808b12068168f ]--- > > 2) If evict_refill_and_join() decides to commit the transaction, it will >be able to do it, since the nested transaction join only increments the >transaction handle's ->use_count reference counter and it does not >prevent the transaction from getting committed. This means that after This brings up a good point, we should probably not allow the commit in this case, or add an ASSERT(use_count == 1) or something, cause this
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On 12.09.19 г. 14:24 ч., Filipe Manana wrote: > On Thu, Sep 12, 2019 at 12:10 PM Nikolay Borisov wrote: >> >> >> >> On 10.09.19 г. 17:26 ч., fdman...@kernel.org wrote: >>> From: Filipe Manana >>> >>> Sometimes when fsync'ing a file we need to log that other inodes exist and >>> when we need to do that we acquire a reference on the inodes and then drop >>> that reference using iput() after logging them. >>> >>> That generally is not a problem except if we end up doing the final iput() >>> (dropping the last reference) on the inode and that inode has a link count >>> of 0, which can happen in a very short time window if the logging path >>> gets a reference on the inode while it's being unlinked. >>> >>> In that case we end up getting the eviction callback, btrfs_evict_inode(), >>> invoked through the iput() call chain which needs to drop all of the >>> inode's items from its subvolume btree, and in order to do that, it needs >>> to join a transaction at the helper function evict_refill_and_join(). >>> However because the task previously started a transaction at the fsync >>> handler, btrfs_sync_file(), it has current->journal_info already pointing >>> to a transaction handle and therefore evict_refill_and_join() will get >>> that transaction handle from btrfs_join_transaction(). From this point on, >>> two different problems can happen: >>> >>> 1) evict_refill_and_join() will often change the transaction handle's >>>block reserve (->block_rsv) and set its ->bytes_reserved field to a >>>value greater than 0. If evict_refill_and_join() never commits the >>>transaction, the eviction handler ends up decreasing the reference >>>count (->use_count) of the transaction handle through the call to >>>btrfs_end_transaction(), and after that point we have a transaction >>>handle with a NULL ->block_rsv (which is the value prior to the >>>transaction join from evict_refill_and_join()) and a ->bytes_reserved >>>value greater than 0. If after the eviction/iput completes the inode >>>logging path hits an error or it decides that it must fallback to a >>>transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a >>>non-zero value from btrfs_log_dentry_safe(), and because of that >>>non-zero value it tries to commit the transaction using a handle with >>>a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes >>>the transaction commit hit an assertion failure at >>>btrfs_trans_release_metadata() because ->bytes_reserved is not zero but >>>the ->block_rsv is NULL. The produced stack trace for that is like the >>>following: >>> >>>[192922.917158] assertion failed: !trans->bytes_reserved, file: >>> fs/btrfs/transaction.c, line: 816 >>>[192922.917553] [ cut here ] >>>[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! >>>[192922.918310] invalid opcode: [#1] SMP DEBUG_PAGEALLOC PTI >>>[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: GW >>> 5.1.4-btrfs-next-47 #1 >>>[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >>> BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 >>>[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] >>>(...) >>>[192922.920925] RSP: 0018:aebdc8a27da8 EFLAGS: 00010286 >>>[192922.921315] RAX: 0051 RBX: 95c9c16a41c0 RCX: >>> >>>[192922.921692] RDX: RSI: 95cab6b16838 RDI: >>> 95cab6b16838 >>>[192922.922066] RBP: 95c9c16a41c0 R08: R09: >>> >>>[192922.922442] R10: aebdc8a27e70 R11: R12: >>> 95ca731a0980 >>>[192922.922820] R13: R14: 95ca84c73338 R15: >>> 95ca731a0ea8 >>>[192922.923200] FS: 7f337eda4e80() GS:95cab6b0() >>> knlGS: >>>[192922.923579] CS: 0010 DS: ES: CR0: 80050033 >>>[192922.923948] CR2: 7f337edad000 CR3: 0001e00f6002 CR4: >>> 003606e0 >>>[192922.924329] DR0: DR1: DR2: >>> >>>[192922.924711] DR3: DR6: fffe0ff0 DR7: >>> 0400 >>>[192922.925105] Call Trace: >>>[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] >>>[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] >>>[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] >>>[192922.926731] do_fsync+0x38/0x60 >>>[192922.927138] __x64_sys_fdatasync+0x13/0x20 >>>[192922.927543] do_syscall_64+0x60/0x1c0 >>>[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>(...) >>>[192922.934077] ---[ end trace f00808b12068168f ]--- >>> >>> 2) If evict_refill_and_join() decides to commit the transaction, it will >> evict_refill_and_join only ever calls btrfs_join_transaction so it >> cannot ever commit the transac
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Thu, Sep 12, 2019 at 12:10 PM Nikolay Borisov wrote: > > > > On 10.09.19 г. 17:26 ч., fdman...@kernel.org wrote: > > From: Filipe Manana > > > > Sometimes when fsync'ing a file we need to log that other inodes exist and > > when we need to do that we acquire a reference on the inodes and then drop > > that reference using iput() after logging them. > > > > That generally is not a problem except if we end up doing the final iput() > > (dropping the last reference) on the inode and that inode has a link count > > of 0, which can happen in a very short time window if the logging path > > gets a reference on the inode while it's being unlinked. > > > > In that case we end up getting the eviction callback, btrfs_evict_inode(), > > invoked through the iput() call chain which needs to drop all of the > > inode's items from its subvolume btree, and in order to do that, it needs > > to join a transaction at the helper function evict_refill_and_join(). > > However because the task previously started a transaction at the fsync > > handler, btrfs_sync_file(), it has current->journal_info already pointing > > to a transaction handle and therefore evict_refill_and_join() will get > > that transaction handle from btrfs_join_transaction(). From this point on, > > two different problems can happen: > > > > 1) evict_refill_and_join() will often change the transaction handle's > >block reserve (->block_rsv) and set its ->bytes_reserved field to a > >value greater than 0. If evict_refill_and_join() never commits the > >transaction, the eviction handler ends up decreasing the reference > >count (->use_count) of the transaction handle through the call to > >btrfs_end_transaction(), and after that point we have a transaction > >handle with a NULL ->block_rsv (which is the value prior to the > >transaction join from evict_refill_and_join()) and a ->bytes_reserved > >value greater than 0. If after the eviction/iput completes the inode > >logging path hits an error or it decides that it must fallback to a > >transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a > >non-zero value from btrfs_log_dentry_safe(), and because of that > >non-zero value it tries to commit the transaction using a handle with > >a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes > >the transaction commit hit an assertion failure at > >btrfs_trans_release_metadata() because ->bytes_reserved is not zero but > >the ->block_rsv is NULL. The produced stack trace for that is like the > >following: > > > >[192922.917158] assertion failed: !trans->bytes_reserved, file: > > fs/btrfs/transaction.c, line: 816 > >[192922.917553] [ cut here ] > >[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! > >[192922.918310] invalid opcode: [#1] SMP DEBUG_PAGEALLOC PTI > >[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: GW > > 5.1.4-btrfs-next-47 #1 > >[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > > BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 > >[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] > >(...) > >[192922.920925] RSP: 0018:aebdc8a27da8 EFLAGS: 00010286 > >[192922.921315] RAX: 0051 RBX: 95c9c16a41c0 RCX: > > > >[192922.921692] RDX: RSI: 95cab6b16838 RDI: > > 95cab6b16838 > >[192922.922066] RBP: 95c9c16a41c0 R08: R09: > > > >[192922.922442] R10: aebdc8a27e70 R11: R12: > > 95ca731a0980 > >[192922.922820] R13: R14: 95ca84c73338 R15: > > 95ca731a0ea8 > >[192922.923200] FS: 7f337eda4e80() GS:95cab6b0() > > knlGS: > >[192922.923579] CS: 0010 DS: ES: CR0: 80050033 > >[192922.923948] CR2: 7f337edad000 CR3: 0001e00f6002 CR4: > > 003606e0 > >[192922.924329] DR0: DR1: DR2: > > > >[192922.924711] DR3: DR6: fffe0ff0 DR7: > > 0400 > >[192922.925105] Call Trace: > >[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] > >[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] > >[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] > >[192922.926731] do_fsync+0x38/0x60 > >[192922.927138] __x64_sys_fdatasync+0x13/0x20 > >[192922.927543] do_syscall_64+0x60/0x1c0 > >[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe > >(...) > >[192922.934077] ---[ end trace f00808b12068168f ]--- > > > > 2) If evict_refill_and_join() decides to commit the transaction, it will > evict_refill_and_join only ever calls btrfs_join_transaction so it > cannot ever commit the transaction. It can: https://git.kernel.org/pub/scm/linux/kern
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On 10.09.19 г. 17:26 ч., fdman...@kernel.org wrote: > From: Filipe Manana > > Sometimes when fsync'ing a file we need to log that other inodes exist and > when we need to do that we acquire a reference on the inodes and then drop > that reference using iput() after logging them. > > That generally is not a problem except if we end up doing the final iput() > (dropping the last reference) on the inode and that inode has a link count > of 0, which can happen in a very short time window if the logging path > gets a reference on the inode while it's being unlinked. > > In that case we end up getting the eviction callback, btrfs_evict_inode(), > invoked through the iput() call chain which needs to drop all of the > inode's items from its subvolume btree, and in order to do that, it needs > to join a transaction at the helper function evict_refill_and_join(). > However because the task previously started a transaction at the fsync > handler, btrfs_sync_file(), it has current->journal_info already pointing > to a transaction handle and therefore evict_refill_and_join() will get > that transaction handle from btrfs_join_transaction(). From this point on, > two different problems can happen: > > 1) evict_refill_and_join() will often change the transaction handle's >block reserve (->block_rsv) and set its ->bytes_reserved field to a >value greater than 0. If evict_refill_and_join() never commits the >transaction, the eviction handler ends up decreasing the reference >count (->use_count) of the transaction handle through the call to >btrfs_end_transaction(), and after that point we have a transaction >handle with a NULL ->block_rsv (which is the value prior to the >transaction join from evict_refill_and_join()) and a ->bytes_reserved >value greater than 0. If after the eviction/iput completes the inode >logging path hits an error or it decides that it must fallback to a >transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a >non-zero value from btrfs_log_dentry_safe(), and because of that >non-zero value it tries to commit the transaction using a handle with >a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes >the transaction commit hit an assertion failure at >btrfs_trans_release_metadata() because ->bytes_reserved is not zero but >the ->block_rsv is NULL. The produced stack trace for that is like the >following: > >[192922.917158] assertion failed: !trans->bytes_reserved, file: > fs/btrfs/transaction.c, line: 816 >[192922.917553] [ cut here ] >[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! >[192922.918310] invalid opcode: [#1] SMP DEBUG_PAGEALLOC PTI >[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: GW > 5.1.4-btrfs-next-47 #1 >[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 >[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] >(...) >[192922.920925] RSP: 0018:aebdc8a27da8 EFLAGS: 00010286 >[192922.921315] RAX: 0051 RBX: 95c9c16a41c0 RCX: > >[192922.921692] RDX: RSI: 95cab6b16838 RDI: > 95cab6b16838 >[192922.922066] RBP: 95c9c16a41c0 R08: R09: > >[192922.922442] R10: aebdc8a27e70 R11: R12: > 95ca731a0980 >[192922.922820] R13: R14: 95ca84c73338 R15: > 95ca731a0ea8 >[192922.923200] FS: 7f337eda4e80() GS:95cab6b0() > knlGS: >[192922.923579] CS: 0010 DS: ES: CR0: 80050033 >[192922.923948] CR2: 7f337edad000 CR3: 0001e00f6002 CR4: > 003606e0 >[192922.924329] DR0: DR1: DR2: > >[192922.924711] DR3: DR6: fffe0ff0 DR7: > 0400 >[192922.925105] Call Trace: >[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] >[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] >[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] >[192922.926731] do_fsync+0x38/0x60 >[192922.927138] __x64_sys_fdatasync+0x13/0x20 >[192922.927543] do_syscall_64+0x60/0x1c0 >[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe >(...) >[192922.934077] ---[ end trace f00808b12068168f ]--- > > 2) If evict_refill_and_join() decides to commit the transaction, it will evict_refill_and_join only ever calls btrfs_join_transaction so it cannot ever commit the transaction. Let's look into its sole caller, btrfs_evict_inode, I see 3 cases where transactions handle is involved in that function: 1. btrfs_commit_inode_delayed_inode - iot calls trans_join + end_transaction => use_count = 2 -> no commit 2. btrfs_truncate_inode_items -> never calls e
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Thu, Sep 12, 2019 at 8:32 AM Sasha Levin wrote: > > Hi, > > [This is an automated email] > > This commit has been processed because it contains a -stable tag. > The stable tag indicates that it's relevant for the following trees: 4.4+ > > The bot has tested the following trees: v5.2.14, v4.19.72, v4.14.143, > v4.9.192, v4.4.192. > > v5.2.14: Build OK! > v4.19.72: Failed to apply! Possible dependencies: > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different > files") > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > unlink/rmdir") > b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple > hardlinks") > > v4.14.143: Failed to apply! Possible dependencies: > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link > recreation") > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link > combination") > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different > files") > 8d9e220ca084 ("btrfs: simplify IS_ERR/PTR_ERR checks") > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > unlink/rmdir") > b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple > hardlinks") > > v4.9.192: Failed to apply! Possible dependencies: > 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience > variables") > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link > recreation") > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link > combination") > 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take > btrfs_inode") > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different > files") > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > unlink/rmdir") > cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block") > da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and into > fs_info") > db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode") > de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info") > fb456252d3d9 ("btrfs: root->fs_info cleanup, use fs_info->dev_root > everywhere") > > v4.4.192: Failed to apply! Possible dependencies: > 0132761017e0 ("btrfs: fix string and comment grammatical issues and > typos") > 09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and > page_cache_{get,release} macros") > 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience > variables") > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link > recreation") > 0e749e54244e ("dax: increase granularity of dax_clear_blocks() > operations") > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link > combination") > 44f714dae50a ("Btrfs: improve performance on fsync against new inode > after rename/unlink") > 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take > btrfs_inode") > 52db400fcd50 ("pmem, dax: clean up clear_pmem()") > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different > files") > 781feef7e6be ("Btrfs: fix lockdep warning about log_mutex") > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and > unlink/rmdir") > b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with > dax_map_atomic()") > bb7ab3b92e46 ("btrfs: Fix misspellings in comments.") > cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block") > d1a5f2b4d8a1 ("block: use DAX for partition table reads") > db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode") > de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info") > > > NOTE: The patch will not be queued to stable trees until it is upstream. > > How should we proceed with this patch? After it lands on Linus' tree, I'll try to send patch versions that apply to different kernel releases. Thanks. > > -- > Thanks, > Sasha -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
Hi, [This is an automated email] This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: 4.4+ The bot has tested the following trees: v5.2.14, v4.19.72, v4.14.143, v4.9.192, v4.4.192. v5.2.14: Build OK! v4.19.72: Failed to apply! Possible dependencies: 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files") a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir") b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks") v4.14.143: Failed to apply! Possible dependencies: 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation") 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination") 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files") 8d9e220ca084 ("btrfs: simplify IS_ERR/PTR_ERR checks") a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir") b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks") v4.9.192: Failed to apply! Possible dependencies: 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables") 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation") 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination") 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode") 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files") a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir") cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block") da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and into fs_info") db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode") de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info") fb456252d3d9 ("btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere") v4.4.192: Failed to apply! Possible dependencies: 0132761017e0 ("btrfs: fix string and comment grammatical issues and typos") 09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros") 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables") 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation") 0e749e54244e ("dax: increase granularity of dax_clear_blocks() operations") 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination") 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink") 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode") 52db400fcd50 ("pmem, dax: clean up clear_pmem()") 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files") 781feef7e6be ("Btrfs: fix lockdep warning about log_mutex") a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir") b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()") bb7ab3b92e46 ("btrfs: Fix misspellings in comments.") cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block") d1a5f2b4d8a1 ("block: use DAX for partition table reads") db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode") de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info") NOTE: The patch will not be queued to stable trees until it is upstream. How should we proceed with this patch? -- Thanks, Sasha
Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > Sometimes when fsync'ing a file we need to log that other inodes exist and > when we need to do that we acquire a reference on the inodes and then drop > that reference using iput() after logging them. > > That generally is not a problem except if we end up doing the final iput() > (dropping the last reference) on the inode and that inode has a link count > of 0, which can happen in a very short time window if the logging path > gets a reference on the inode while it's being unlinked. > > In that case we end up getting the eviction callback, btrfs_evict_inode(), > invoked through the iput() call chain which needs to drop all of the > inode's items from its subvolume btree, and in order to do that, it needs > to join a transaction at the helper function evict_refill_and_join(). > However because the task previously started a transaction at the fsync > handler, btrfs_sync_file(), it has current->journal_info already pointing > to a transaction handle and therefore evict_refill_and_join() will get > that transaction handle from btrfs_join_transaction(). From this point on, > two different problems can happen: > > 1) evict_refill_and_join() will often change the transaction handle's >block reserve (->block_rsv) and set its ->bytes_reserved field to a >value greater than 0. If evict_refill_and_join() never commits the >transaction, the eviction handler ends up decreasing the reference >count (->use_count) of the transaction handle through the call to >btrfs_end_transaction(), and after that point we have a transaction >handle with a NULL ->block_rsv (which is the value prior to the >transaction join from evict_refill_and_join()) and a ->bytes_reserved >value greater than 0. If after the eviction/iput completes the inode >logging path hits an error or it decides that it must fallback to a >transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a >non-zero value from btrfs_log_dentry_safe(), and because of that >non-zero value it tries to commit the transaction using a handle with >a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes >the transaction commit hit an assertion failure at >btrfs_trans_release_metadata() because ->bytes_reserved is not zero but >the ->block_rsv is NULL. The produced stack trace for that is like the >following: > >[192922.917158] assertion failed: !trans->bytes_reserved, file: > fs/btrfs/transaction.c, line: 816 >[192922.917553] [ cut here ] >[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! >[192922.918310] invalid opcode: [#1] SMP DEBUG_PAGEALLOC PTI >[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: GW > 5.1.4-btrfs-next-47 #1 >[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 >[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] >(...) >[192922.920925] RSP: 0018:aebdc8a27da8 EFLAGS: 00010286 >[192922.921315] RAX: 0051 RBX: 95c9c16a41c0 RCX: > >[192922.921692] RDX: RSI: 95cab6b16838 RDI: > 95cab6b16838 >[192922.922066] RBP: 95c9c16a41c0 R08: R09: > >[192922.922442] R10: aebdc8a27e70 R11: R12: > 95ca731a0980 >[192922.922820] R13: R14: 95ca84c73338 R15: > 95ca731a0ea8 >[192922.923200] FS: 7f337eda4e80() GS:95cab6b0() > knlGS: >[192922.923579] CS: 0010 DS: ES: CR0: 80050033 >[192922.923948] CR2: 7f337edad000 CR3: 0001e00f6002 CR4: > 003606e0 >[192922.924329] DR0: DR1: DR2: > >[192922.924711] DR3: DR6: fffe0ff0 DR7: > 0400 >[192922.925105] Call Trace: >[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] >[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] >[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] >[192922.926731] do_fsync+0x38/0x60 >[192922.927138] __x64_sys_fdatasync+0x13/0x20 >[192922.927543] do_syscall_64+0x60/0x1c0 >[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe >(...) >[192922.934077] ---[ end trace f00808b12068168f ]--- > > 2) If evict_refill_and_join() decides to commit the transaction, it will >be able to do it, since the nested transaction join only increments the >transaction handle's ->use_count reference counter and it does not >prevent the transaction from getting committed. This means that after >eviction completes, the fsync logging path will be using a transaction >handle that refers to an already committed transaction. What