** Also affects: linux (Ubuntu Oracular)
Importance: Undecided
Status: New
** Changed in: linux (Ubuntu Oracular)
Status: New => In Progress
** Changed in: linux (Ubuntu Oracular)
Importance: Undecided => Medium
** Changed in: linux (Ubuntu Oracular)
Assignee: (unassigned) => Matthew Ruffell (mruffell)
** Summary changed:
- BTRFS kernel panic on btrfs_remove_qgroup
+ btrfs will WARN_ON() in btrfs_remove_qgroup() unnecessarily
** Description changed:
- We are able to trigger a kernel oops in the btrfs code from userspace:
+ BugLink: https://bugs.launchpad.net/bugs/2091719
- [ 46.597006] Kernel panic - not syncing: kernel: panic_on_warn set ...
- [ 46.597474] CPU: 0 PID: 1316 Comm: (sd-clean) Not tainted 6.8.0-50-generic #51-Ubuntu
- [ 46.597660] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-3.fc41 04/01/2014
- [ 46.597882] Call Trace:
- [ 46.597948] <TASK>
- [ 46.598028] dump_stack_lvl+0x27/0xa0
- [ 46.598115] dump_stack+0x10/0x20
- [ 46.598222] panic+0x366/0x3c0
- [ 46.598319] ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
- [ 46.598517] check_panic_on_warn+0x4f/0x60
- [ 46.598609] __warn+0x95/0x160
- [ 46.598703] ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
- [ 46.598932] report_bug+0x17e/0x1b0
- [ 46.599245] handle_bug+0x51/0xa0
- [ 46.599414] exc_invalid_op+0x18/0x80
- [ 46.599645] asm_exc_invalid_op+0x1b/0x20
- [ 46.599794] RIP: 0010:btrfs_remove_qgroup+0x271/0x490 [btrfs]
- [ 46.600073] Code: c0 0f 85 27 fe ff ff 48 8b 43 b0 4c 39 f0 75 d5 4d 8d b5 e0 08 00 00 4c 89 f7 e8 8a 45 19 e2 48 83 7b 98 00 0f 84 52 01 00 00 <0f> 0b 49 8b 45 10 a8 10 74 42 41 f6 85 d0 08 00 00 0c 75 38 48 83
- [ 46.600516] RSP: 0018:ffffa0a0c30b3d58 EFLAGS: 00010206
- [ 46.600640] RAX: 0000000000000000 RBX: ffff958787663cb8 RCX: 0000000000000000
- [ 46.600826] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
- [ 46.601025] RBP: ffffa0a0c30b3dc0 R08: 0000000000000000 R09: 0000000000000000
- [ 46.601199] R10: 0000000000000000 R11: 0000000000000000 R12: 00ff00000000010f
- [ 46.601385] R13: ffff95878b730000 R14: ffff95878b7308e0 R15: 0000000000000000
- [ 46.601579] ? btrfs_remove_qgroup+0x266/0x490 [btrfs]
- [ 46.601867] btrfs_ioctl+0x12b9/0x13a0 [btrfs]
- [ 46.602100] ? srso_alias_return_thunk+0x5/0xfbef5
- [ 46.602575] ? __seccomp_filter+0x368/0x570
- [ 46.602696] ? __fput+0x15e/0x2e0
- [ 46.602993] __x64_sys_ioctl+0xa3/0xf0
- [ 46.603143] x64_sys_call+0x12a3/0x25a0
- [ 46.603379] do_syscall_64+0x7f/0x180
- [ 46.603557] ? srso_alias_return_thunk+0x5/0xfbef5
- [ 46.603680] ? do_syscall_64+0x8c/0x180
- [ 46.603774] ? srso_alias_return_thunk+0x5/0xfbef5
- [ 46.603898] ? syscall_exit_to_user_mode+0x86/0x260
- [ 46.604043] ? srso_alias_return_thunk+0x5/0xfbef5
- [ 46.604155] ? do_syscall_64+0x8c/0x180
- [ 46.604248] ? do_syscall_64+0x8c/0x180
- [ 46.604341] ? srso_alias_return_thunk+0x5/0xfbef5
- [ 46.604693] entry_SYSCALL_64_after_hwframe+0x78/0x80
- [ 46.605029] RIP: 0033:0x7c2fbeb24ded
- [ 46.605283] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
- [ 46.606374] RSP: 002b:00007ffe3e103770 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
- [ 46.607365] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007c2fbeb24ded
- [ 46.607872] RDX: 00007ffe3e1037d0 RSI: 000000004010942a RDI: 0000000000000016
- [ 46.608856] RBP: 00007ffe3e1037c0 R08: 0000000000000069 R09: 0000000000000000
- [ 46.609477] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000016
- [ 46.609672] R13: 0000000000000000 R14: 00ff00000000010f R15: 0000000000000016
- [ 46.609862] </TASK>
- [ 46.611054] Kernel Offset: 0x20600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
+ [Impact]
- This happens on Ubuntu Noble since the kernel update to 6.8.0-50-generic,
- which was promoted from proposed to updates this week; it did not happen
- before. The upstream systemd CI on GitHub reproduces this issue
- consistently, e.g.:
+ The following commit for noble and oracular introduced two new WARN_ON() calls
+ in btrfs qgroup removal. Although the author believed at the time that they
+ would not be reachable, it turns out they can be hit quite frequently under the
+ right conditions.
+
+ ubuntu-noble b2ad25ba539452f492805e5f7d94e80894aa860f
+ commit a776bf5f3c2300cfdf8a195663460b1793ac9847
+ Author: Qu Wenruo <[email protected]>
+ Date: Fri Apr 19 14:29:32 2024 +0930
+ Subject: btrfs: slightly loosen the requirement for qgroup removal
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a776bf5f3c2300cfdf8a195663460b1793ac9847
+
+ $ git describe --contains b2ad25ba539452f492805e5f7d94e80894aa860f
+ Ubuntu-6.8.0-50.51~143
+
+ This primarily affects the systemd CI that runs integration tests on merge:
+ https://github.com/systemd/systemd/actions/runs/12297539029/job/34318915884?pr=35589
+ The CI runs with panic_on_warn set, so the WARN_ON() escalates into a kernel panic:
- It also happens on the newest upstream kernel, and was reproduced with
- the same backtrace on Archlinux too. It was bisected to one of the
- following upstream BTRFS commits listed in this GitHub comment:
+ Kernel panic - not syncing: kernel: panic_on_warn set ...
+ CPU: 0 PID: 1316 Comm: (sd-clean) Not tainted 6.8.0-50-generic #51-Ubuntu
+ Call Trace:
+ <TASK>
+ dump_stack_lvl+0x27/0xa0
+ dump_stack+0x10/0x20
+ panic+0x366/0x3c0
+ ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
+ check_panic_on_warn+0x4f/0x60
+ __warn+0x95/0x160
+ ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
+ report_bug+0x17e/0x1b0
+ handle_bug+0x51/0xa0
+ exc_invalid_op+0x18/0x80
+ asm_exc_invalid_op+0x1b/0x20
+ RIP: 0010:btrfs_remove_qgroup+0x271/0x490 [btrfs]
+ Code: c0 0f 85 27 fe ff ff 48 8b 43 b0 4c 39 f0 75 d5 4d 8d b5 e0 08 00 00 4c 89 f7 e8 8a 45 19 e2 48 83 7b 98 00 0f 84 52 01 00 00 <0f> 0b 49 8b 45 10 a8 10 74 42 41 f6 85 d0 08 00 00 0c 75 38 48 83
+ ? btrfs_remove_qgroup+0x266/0x490 [btrfs]
+ btrfs_ioctl+0x12b9/0x13a0 [btrfs]
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? __seccomp_filter+0x368/0x570
+ ? __fput+0x15e/0x2e0
+ __x64_sys_ioctl+0xa3/0xf0
+ x64_sys_call+0x12a3/0x25a0
+ do_syscall_64+0x7f/0x180
+ entry_SYSCALL_64_after_hwframe+0x78/0x80
+ [Fix]
+
+ The fix just landed in mainline as:
+
+ commit c0def46dec9c547679a25fe7552c4bcbec0b0dd2
+ Author: Qu Wenruo <[email protected]>
+ Date: Mon Nov 11 07:29:07 2024 +1030
+ Subject: btrfs: improve the warning and error message for btrfs_remove_qgroup()
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0def46dec9c547679a25fe7552c4bcbec0b0dd2
+
+ The commit places the WARN_ON() behind CONFIG_BTRFS_DEBUG, which silences the
+ warning for most users. As the author notes, this is safe because the userspace
+ tool managing the qgroups will rescan them to fix the inconsistent view.
+
+ This is needed for both noble and oracular.
+
+ [Testcase]
+
+ The upstream systemd CI tests can consistently reproduce the issue, so the test
+ and proposed kernels will be run against the systemd CI for verification.
+
+ There is a test kernel available in the following ppa:
+
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp2091719-test
+
+ If you install it, the systemd CI will run to completion.
+
+ [Where problems could occur]
+
+ We are changing the WARN_ON() to occur only when CONFIG_BTRFS_DEBUG is enabled.
+ There is no other change in logic, so functionality should be the same as what
+ we have now.
+
+ If a regression were to occur, it would affect systems with btrfs filesystems
+ that are utilising subvolumes. It is unlikely to cause any data loss or disk
+ corruption, as userspace tools should be able to automatically fix up any
+ inconsistent views without user interaction.
+
+ [Other info]
+
+ Systemd upstream bisected the issue here: https://github.com/systemd/systemd/pull/35567#issuecomment-2538160543
-
- A fix has been proposed by SUSE and tested on Archlinux, and confirmed
- to solve the crash:
-
- https://github.com/btrfs/linux/commit/c61ffaa0a3d9a2094e24d1fe2b17f20e109d2cc8
-
- Note that this fix has been tested on Archlinux, not on Ubuntu. But we
- are confident it's the same issue.
** Tags added: noble oracular
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2091719
Title:
btrfs will WARN_ON() in btrfs_remove_qgroup() unnecessarily