Explanatory note on bionic-proposed (see comment #17):
The flaky fail set had a long tail (tests that failed just a few times).
Symptom:
---
The logs show the failure cause is output mismatch, because '/target'
failed to umount (busy), and it was logged; e.g.,
btrfs/027 [failed, exit status 1]- output mismatch (see
/home/ubuntu/xfstests-dev/results//btrfs/027.out.bad)
--- tests/btrfs/027.out 2020-10-26 22:23:54.626341276 +0000
+++ /home/ubuntu/xfstests-dev/results//btrfs/027.out.bad
2020-11-17 06:34:12.623548481 +0000
@@ -1,2 +1,3 @@
QA output created by 027
-Silence is golden
+umount: /mnt/scratch: target is busy.
+failed to unmount /dev/loop2
...
The number of failed tests increases with the number of umount failures.
Comparing both numbers per log file: (line breaks added for clarity.)
$ (grep ^Failed xfstests.log.2020-11-1*; \
grep -c 'umount: /mnt/scratch: target is busy'
xfstests.log.2020-11-1* | grep -v :0) \
| sort
xfstests.log.2020-11-16-21-03-32:Failed 31 of 823 tests // 30ish is
normal failure count
xfstests.log.2020-11-16-22-16-38:Failed 34 of 823 tests
xfstests.log.2020-11-16-23-25-56:Failed 34 of 823 tests
xfstests.log.2020-11-17-00-36-27:Failed 36 of 823 tests
xfstests.log.2020-11-17-01-44-04:Failed 37 of 823 tests
xfstests.log.2020-11-17-02-52-41:Failed 37 of 823 tests // things go
bad after this. umount hung here.
xfstests.log.2020-11-17-04-07-11:15
xfstests.log.2020-11-17-04-07-11:Failed 68 of 823 tests // umount hung
here too.
xfstests.log.2020-11-17-05-21-34:76
xfstests.log.2020-11-17-05-21-34:Failed 158 of 823 tests
xfstests.log.2020-11-17-06-29-11:148
xfstests.log.2020-11-17-06-29-11:Failed 278 of 823 tests
xfstests.log.2020-11-17-07-29-57:31
xfstests.log.2020-11-17-07-29-57:Failed 104 of 823 tests
xfstests.log.2020-11-17-08-37-16:33
xfstests.log.2020-11-17-08-37-16:Failed 87 of 823 tests
xfstests.log.2020-11-17-09-50-43:36
xfstests.log.2020-11-17-09-50-43:Failed 101 of 823 tests
xfstests.log.2020-11-17-11-04-16:43
xfstests.log.2020-11-17-11-04-16:Failed 111 of 823 tests
xfstests.log.2020-11-17-12-08-14:68
xfstests.log.2020-11-17-12-08-14:Failed 143 of 823 tests
xfstests.log.2020-11-17-13-14-01:93
xfstests.log.2020-11-17-13-14-01:Failed 200 of 823 tests
And the kernel log files show that 'umount' hung in the timestamps where
the increase started.
$ sudo grep umount kern.log.for.2020-11-1*
kern.log.for.2020-11-17-02-52-41:Nov 17 03:52:55 mfo-s390x-bionic
kernel: [24748.572259] CPU: 4 PID: 10862 Comm: umount Tainted: G W
4.15.0-125-generic #128-Ubuntu
kern.log.for.2020-11-17-02-52-41:Nov 17 03:53:10 mfo-s390x-bionic
kernel: [24762.822089] CPU: 4 PID: 11705 Comm: umount Tainted: G W
4.15.0-125-generic #128-Ubuntu
kern.log.for.2020-11-17-02-52-41:Nov 17 03:53:16 mfo-s390x-bionic
kernel: [24768.881926] CPU: 4 PID: 12110 Comm: umount Tainted: G W
4.15.0-125-generic #128-Ubuntu
kern.log.for.2020-11-17-04-07-11:Nov 17 03:52:55 mfo-s390x-bionic
kernel: [24748.572259] CPU: 4 PID: 10862 Comm: umount Tainted: G W
4.15.0-125-generic #128-Ubuntu
kern.log.for.2020-11-17-04-07-11:Nov 17 03:53:10 mfo-s390x-bionic
kernel: [24762.822089] CPU: 4 PID: 11705 Comm: umount Tainted: G W
4.15.0-125-generic #128-Ubuntu
kern.log.for.2020-11-17-04-07-11:Nov 17 03:53:16 mfo-s390x-bionic
kernel: [24768.881926] CPU: 4 PID: 12110 Comm: umount Tainted: G W
4.15.0-125-generic #128-Ubuntu
This is the stack trace (s390x kernel pointer addresses.)
24748.572081] ------------[ cut here ]------------
[24748.572083] BTRFS: Transaction aborted (error -28) // i.e., -ENOSPC
[24748.572217] WARNING: CPU: 4 PID: 10862 at
/build/linux-9nQaV0/linux-4.15.0/fs/btrfs/extent-tree.c:3097
btrfs_run_delayed_refs+0x208/0x260 [btrfs]
...
[24748.572259] CPU: 4 PID: 10862 Comm: umount Tainted: G W
4.15.0-125-generic #128-Ubuntu
[24748.572259] Hardware name: IBM 2964 N63 400 (KVM/Linux)
...
[24748.572297] Call Trace:
[24748.572309] ([<000003ff8039710c>] btrfs_run_delayed_refs+0x204/0x260
[btrfs])
[24748.572322] [<000003ff803b0b46>]
btrfs_commit_transaction+0x66/0x9e8 [btrfs]
[24748.572330] [<00000000003cdf9e>] __sync_filesystem+0x56/0x80
[24748.572333] [<000000000038f088>] generic_shutdown_super+0x48/0x168
[24748.572334] [<000000000038f50e>] kill_anon_super+0x2e/0x40
[24748.572344] [<000003ff80375b60>] btrfs_kill_super+0x30/0x148 [btrfs]
[24748.572346] [<000000000038fa90>] deactivate_locked_super+0x70/0xa0
[24748.572348] [<00000000003b6b94>] cleanup_mnt+0x64/0xa8
[24748.572351] [<000000000019a542>] task_work_run+0xba/0x100
[24748.572361] [<0000000000108a6e>] do_notify_resume+0x4e/0x60
[24748.572365] [<00000000008ff3be>] system_call+0xe2/0x2c8
After these errors, the failure rate increased; but not all times/tests;
since it seems that at some point the umount finishes, unblocking tests.
Cause:
---
Since this issue didn't happen with the test/patched kernel, it would
seem to originate from other/new patches. There are indeed 2 new ones
not used in the test kernel:
6f64bf0873eb btrfs: qgroup: fix data leak caused by race between
writeback and truncate <<< NEW
d14eef17dcf2 btrfs: don't force read-only after error in drop snapshot
<<< NEW
$ ls -1r *.patch # test kernel on top of Ubuntu-4.15.0-122.124
0017-btrfs-ctree-check-key-order-before-merging-tree-bloc.patch
0016-btrfs-extent-tree-kill-the-BUG_ON-in-insert_inline_e.patch
0015-btrfs-extent-tree-kill-BUG_ON-in-__btrfs_free_extent.patch
0014-btrfs-extent_io-do-extra-check-for-extent-buffer-rea.patch
0013-btrfs-drop-unnecessary-offset_in_page-in-extent-buff.patch
0012-btrfs-use-BUG-instead-of-BUG_ON-1.patch
0011-btrfs-use-offset_in_page-instead-of-open-coding-it.patch
0010-btrfs-fix-wrong-address-when-faulting-in-pages-in-th.patch
0009-btrfs-fix-lockdep-splat-in-add_missing_dev.patch
0008-btrfs-require-only-sector-size-alignment-for-parent-.patch
0007-btrfs-fix-potential-deadlock-in-the-search-ioctl.patch
0006-uaccess-Add-non-pagefault-user-space-write-function.patch
0005-uaccess-Add-non-pagefault-user-space-read-functions.patch
0004-btrfs-set-the-lockdep-class-for-log-tree-extent-buff.patch
0003-btrfs-Remove-extraneous-extent_buffer_get-from-tree_.patch
0002-btrfs-Remove-redundant-extent_buffer_get-in-get_old_.patch
0001-btrfs-drop-path-before-adding-new-uuid-tree-entry.patch
$ git log --oneline Ubuntu-4.15.0-122.124..Ubuntu-4.15.0-125.128 -- fs/btrfs/
a0a951aaa63f btrfs: ctree: check key order before merging tree blocks
0db62fbc1d08 btrfs: extent-tree: kill the BUG_ON() in
insert_inline_extent_backref()
4f0e0482c42a btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
36b1133f57e3 btrfs: extent_io: do extra check for extent buffer read write
functions
82e64a4e1ff8 btrfs: drop unnecessary offset_in_page in extent buffer helpers
3393e02f4100 btrfs: use BUG() instead of BUG_ON(1)
10679c69e0a8 btrfs: use offset_in_page instead of open-coding it
6f64bf0873eb btrfs: qgroup: fix data leak caused by race between writeback and
truncate <<< NEW
d14eef17dcf2 btrfs: don't force read-only after error in drop snapshot <<< NEW
2f54a800e2ae btrfs: fix wrong address when faulting in pages in the search ioctl
71ef83d6ab86 btrfs: fix lockdep splat in add_missing_dev
cb1483e19a40 btrfs: require only sector size alignment for parent eb bytenr
d738658f503d btrfs: fix potential deadlock in the search ioctl
cca53f849a53 btrfs: set the lockdep class for log tree extent buffers
bb11dbfbc405 btrfs: Remove extraneous extent_buffer_get from tree_mod_log_rewind
9a4e81ea6a44 btrfs: Remove redundant extent_buffer_get in get_old_root
725ce0e39e6d btrfs: drop path before adding new uuid tree entry
The commit message in 6f64bf0873eb explicitly describes the umount path
(now in return to userspace, but still.) So it does look like a match:
commit 6f64bf0873ebc15a4e710cb04c7630576637a845
Author: Qu Wenruo <[email protected]>
Date: Fri Jul 17 15:12:05 2020 +0800
btrfs: qgroup: fix data leak caused by race between writeback and truncate
BugLink: https://bugs.launchpad.net/bugs/1901257
[ Upstream commit fa91e4aa1716004ea8096d5185ec0451e206aea0 ]
[BUG]
When running tests like generic/013 on test device with btrfs quota
enabled, it can normally lead to data leak, detected at unmount time:
BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv
4096
------------[ cut here ]------------
WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142
close_ctree+0x1dc/0x323 [btrfs]
RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
Call Trace:
btrfs_put_super+0x15/0x17 [btrfs]
generic_shutdown_super+0x72/0x110
kill_anon_super+0x18/0x30
btrfs_kill_super+0x17/0x30 [btrfs]
deactivate_locked_super+0x3b/0xa0
deactivate_super+0x40/0x50
cleanup_mnt+0x135/0x190
__cleanup_mnt+0x12/0x20
task_work_run+0x64/0xb0
__prepare_exit_to_usermode+0x1bc/0x1c0
__syscall_return_slowpath+0x47/0x230
do_syscall_64+0x64/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
---[ end trace caf08beafeca2392 ]---
BTRFS error (device dm-3): qgroup reserved space leaked
Also, this issue happened with Focal's original kernel from master-next,
which happened to have that commit already; and it didn't happen with
Focal's 'pre-original' kernel from -updates, which didn't. More evidence.
Revert?
---
Apparently this doesn't need to be reverted; it doesn't happen everytime,
as seen in this test, and with previous tests for Focal:
As described above, this was also noticed in Focal's results w/ the original
kernel from master-next, which has that commit, but to a lesser extent (see
comment #8) so it wasn't noticed as much.
Thus the impact seem to vary, and happen more rarely, because the results w/
focal-proposed did NOT hit it a single time in 11 runs, and commit IS there.
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1902254
Title:
Bionic: btrfs: kernel BUG at /build/linux-
eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233!
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902254/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs