** Also affects: linux (Ubuntu Oracular)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Oracular)
       Status: New => In Progress

** Changed in: linux (Ubuntu Oracular)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Oracular)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Summary changed:

- BTRFS kernel panic on btrfs_remove_qgroup
+ btrfs will WARN_ON() in btrfs_remove_qgroup() unnecessarily

** Description changed:

- We are able to trigger a kernel oops in the btrfs code from userspace:
+ BugLink: https://bugs.launchpad.net/bugs/2091719
  
- [   46.597006] Kernel panic - not syncing: kernel: panic_on_warn set ...
- [   46.597474] CPU: 0 PID: 1316 Comm: (sd-clean) Not tainted 6.8.0-50-generic #51-Ubuntu
- [   46.597660] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-3.fc41 04/01/2014
- [   46.597882] Call Trace:
- [   46.597948]  <TASK>
- [   46.598028]  dump_stack_lvl+0x27/0xa0
- [   46.598115]  dump_stack+0x10/0x20
- [   46.598222]  panic+0x366/0x3c0
- [   46.598319]  ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
- [   46.598517]  check_panic_on_warn+0x4f/0x60
- [   46.598609]  __warn+0x95/0x160
- [   46.598703]  ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
- [   46.598932]  report_bug+0x17e/0x1b0
- [   46.599245]  handle_bug+0x51/0xa0
- [   46.599414]  exc_invalid_op+0x18/0x80
- [   46.599645]  asm_exc_invalid_op+0x1b/0x20
- [   46.599794] RIP: 0010:btrfs_remove_qgroup+0x271/0x490 [btrfs]
- [   46.600073] Code: c0 0f 85 27 fe ff ff 48 8b 43 b0 4c 39 f0 75 d5 4d 8d b5 e0 08 00 00 4c 89 f7 e8 8a 45 19 e2 48 83 7b 98 00 0f 84 52 01 00 00 <0f> 0b 49 8b 45 10 a8 10 74 42 41 f6 85 d0 08 00 00 0c 75 38 48 83
- [   46.600516] RSP: 0018:ffffa0a0c30b3d58 EFLAGS: 00010206
- [   46.600640] RAX: 0000000000000000 RBX: ffff958787663cb8 RCX: 0000000000000000
- [   46.600826] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
- [   46.601025] RBP: ffffa0a0c30b3dc0 R08: 0000000000000000 R09: 0000000000000000
- [   46.601199] R10: 0000000000000000 R11: 0000000000000000 R12: 00ff00000000010f
- [   46.601385] R13: ffff95878b730000 R14: ffff95878b7308e0 R15: 0000000000000000
- [   46.601579]  ? btrfs_remove_qgroup+0x266/0x490 [btrfs]
- [   46.601867]  btrfs_ioctl+0x12b9/0x13a0 [btrfs]
- [   46.602100]  ? srso_alias_return_thunk+0x5/0xfbef5
- [   46.602575]  ? __seccomp_filter+0x368/0x570
- [   46.602696]  ? __fput+0x15e/0x2e0
- [   46.602993]  __x64_sys_ioctl+0xa3/0xf0
- [   46.603143]  x64_sys_call+0x12a3/0x25a0
- [   46.603379]  do_syscall_64+0x7f/0x180
- [   46.603557]  ? srso_alias_return_thunk+0x5/0xfbef5
- [   46.603680]  ? do_syscall_64+0x8c/0x180
- [   46.603774]  ? srso_alias_return_thunk+0x5/0xfbef5
- [   46.603898]  ? syscall_exit_to_user_mode+0x86/0x260
- [   46.604043]  ? srso_alias_return_thunk+0x5/0xfbef5
- [   46.604155]  ? do_syscall_64+0x8c/0x180
- [   46.604248]  ? do_syscall_64+0x8c/0x180
- [   46.604341]  ? srso_alias_return_thunk+0x5/0xfbef5
- [   46.604693]  entry_SYSCALL_64_after_hwframe+0x78/0x80
- [   46.605029] RIP: 0033:0x7c2fbeb24ded
- [   46.605283] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
- [   46.606374] RSP: 002b:00007ffe3e103770 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
- [   46.607365] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007c2fbeb24ded
- [   46.607872] RDX: 00007ffe3e1037d0 RSI: 000000004010942a RDI: 0000000000000016
- [   46.608856] RBP: 00007ffe3e1037c0 R08: 0000000000000069 R09: 0000000000000000
- [   46.609477] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000016
- [   46.609672] R13: 0000000000000000 R14: 00ff00000000010f R15: 0000000000000016
- [   46.609862]  </TASK>
- [   46.611054] Kernel Offset: 0x20600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
+ [Impact]
  
- This happens on Ubuntu Noble since the kernel update to 6.8.0-50-generic
- which was promoted from proposed to updates this week, it did not happen
- before. The upstream systemd CI on Github reproduces this issue
- consistently, e.g.:
+ The following commit, present in both noble and oracular, introduced two new
+ WARN_ON() calls in btrfs qgroup removal. Although the author believed at the
+ time that they would be unreachable, it turns out they can be hit quite
+ frequently under the right conditions.
+ 
+ ubuntu-noble b2ad25ba539452f492805e5f7d94e80894aa860f
+ commit a776bf5f3c2300cfdf8a195663460b1793ac9847
+ Author: Qu Wenruo
+ Date: Fri Apr 19 14:29:32 2024 +0930
+ Subject: btrfs: slightly loosen the requirement for qgroup removal
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a776bf5f3c2300cfdf8a195663460b1793ac9847
+ 
+ $ git describe --contains b2ad25ba539452f492805e5f7d94e80894aa860f
+ Ubuntu-6.8.0-50.51~143
+ 
+ This primarily affects the systemd CI that runs integration tests on merge:
  
https://github.com/systemd/systemd/actions/runs/12297539029/job/34318915884?pr=35589
  
- It also happens on the newest upstream kernel, and was reproduced with
- the same backtrace on Archlinux too. It was bisected to one of the
- following upstream BTRFS commits listed in this Github comment:
+ Kernel panic - not syncing: kernel: panic_on_warn set ...
+ CPU: 0 PID: 1316 Comm: (sd-clean) Not tainted 6.8.0-50-generic #51-Ubuntu
+ Call Trace:
+  <TASK>
+  dump_stack_lvl+0x27/0xa0
+  dump_stack+0x10/0x20
+  panic+0x366/0x3c0
+  ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
+  check_panic_on_warn+0x4f/0x60
+  __warn+0x95/0x160
+  ? btrfs_remove_qgroup+0x271/0x490 [btrfs]
+  report_bug+0x17e/0x1b0
+  handle_bug+0x51/0xa0
+  exc_invalid_op+0x18/0x80
+  asm_exc_invalid_op+0x1b/0x20
+ RIP: 0010:btrfs_remove_qgroup+0x271/0x490 [btrfs]
+ Code: c0 0f 85 27 fe ff ff 48 8b 43 b0 4c 39 f0 75 d5 4d 8d b5 e0 08 00 00 4c 89 f7 e8 8a 45 19 e2 48 83 7b 98 00 0f 84 52 01 00 00 <0f> 0b 49 8b 45 10 a8 10 74 42 41 f6 85 d0 08 00 00 0c 75 38 48 83
+  ? btrfs_remove_qgroup+0x266/0x490 [btrfs]
+  btrfs_ioctl+0x12b9/0x13a0 [btrfs]
+  ? srso_alias_return_thunk+0x5/0xfbef5
+  ? __seccomp_filter+0x368/0x570
+  ? __fput+0x15e/0x2e0
+  __x64_sys_ioctl+0xa3/0xf0
+  x64_sys_call+0x12a3/0x25a0
+  do_syscall_64+0x7f/0x180
+  entry_SYSCALL_64_after_hwframe+0x78/0x80
  
+ [Fix]
+ 
+ The fix just landed in mainline as:
+ 
+ commit c0def46dec9c547679a25fe7552c4bcbec0b0dd2
+ Author: Qu Wenruo
+ Date:   Mon Nov 11 07:29:07 2024 +1030
+ Subject: btrfs: improve the warning and error message for btrfs_remove_qgroup()
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0def46dec9c547679a25fe7552c4bcbec0b0dd2
+ 
+ The commit places the WARN_ON() behind CONFIG_BTRFS_DEBUG, which silences the
+ warning for most users, whose kernels are not built with that option. As the
+ author notes, this is safe because the userspace tool managing the qgroups will
+ rescan them to fix the inconsistent view.
+ 
+ This is needed for both noble and oracular.
+ 
+ [Testcase]
+ 
+ The upstream systemd CI tests can consistently reproduce the issue, so the test
+ and proposed kernels will be run against the systemd CI for verification.
+ 
+ There is a test kernel available in the following PPA:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp2091719-test
+ 
+ If you install it, the systemd CI will run to completion.
+ 
+ [Where problems could occur]
+ 
+ We are changing the WARN_ON() to occur only when CONFIG_BTRFS_DEBUG is enabled.
+ There is no other change in logic, so functionality should be the same as what
+ we have now.
+ 
+ If a regression were to occur, it would affect systems with btrfs filesystems
+ that are utilising subvolumes and qgroups. It would be unlikely to cause any
+ data loss or disk corruption, as userspace tools should be able to
+ automatically fix up any inconsistent views without user interaction.
+ 
+ [Other info]
+ 
+ Systemd upstream bisected the issue here:
  https://github.com/systemd/systemd/pull/35567#issuecomment-2538160543
- 
- A fix has been proposed by SUSE and tested on Archlinux, and confirmed
- to solve the crash:
- 
- https://github.com/btrfs/linux/commit/c61ffaa0a3d9a2094e24d1fe2b17f20e109d2cc8
- 
- Note that this fix has been tested on Archlinux, not on Ubuntu. But we
- are confident it's the same issue.
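
A minimal userspace C sketch of the mechanism described under [Fix], showing why an unconditional WARN_ON() takes down a machine that has panic_on_warn set (as in the CI trace) and why gating it behind CONFIG_BTRFS_DEBUG leaves the rest of the handling intact. This is a model, not kernel code: panic_on_warn here is a plain variable, and model_warn_on(), MODEL_DEBUG_WARN_ON(), struct qgroup_model, reset_model_state(), remove_qgroup_before() and remove_qgroup_after() are all hypothetical stand-ins. Only one of the two WARN_ON() calls is modelled, and the real btrfs_remove_qgroup() and upstream patch differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's panic_on_warn setting (set in the CI). */
static bool panic_on_warn = true;
static int warn_count = 0;  /* warnings emitted so far */
static int panic_count = 0; /* times a real kernel would have panicked */
/* Hypothetical flag standing in for "qgroup numbers need a userspace rescan". */
static bool qgroups_inconsistent = false;

/* Model of WARN_ON(): returns cond; reports (and maybe "panics") when true. */
static bool model_warn_on(bool cond, const char *what)
{
    assert(what != NULL);
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: %s\n", what);
        if (panic_on_warn) {
            /* A real kernel calls panic() here and never returns. */
            panic_count++;
            fprintf(stderr, "Kernel panic - not syncing: panic_on_warn set\n");
        }
    }
    return cond;
}

/*
 * Model of the fix: the warning is only compiled in on CONFIG_BTRFS_DEBUG
 * builds. The condition is still evaluated and returned either way, so the
 * caller's follow-up handling does not change.
 */
#ifdef CONFIG_BTRFS_DEBUG
#define MODEL_DEBUG_WARN_ON(cond, what) model_warn_on((cond), (what))
#else
#define MODEL_DEBUG_WARN_ON(cond, what) (cond)
#endif

/* Hypothetical, heavily simplified qgroup: only the byte counters matter. */
struct qgroup_model {
    unsigned long long rfer; /* referenced bytes */
    unsigned long long excl; /* exclusive bytes */
};

void reset_model_state(void)
{
    warn_count = 0;
    panic_count = 0;
    qgroups_inconsistent = false;
}

/* Before the fix: unexpected non-zero counters always hit the warning. */
void remove_qgroup_before(const struct qgroup_model *qg)
{
    if (model_warn_on(qg->rfer != 0 || qg->excl != 0,
                      "qgroup to be removed still has non-zero counters"))
        qgroups_inconsistent = true; /* userspace is expected to rescan */
}

/* After the fix: same check and same follow-up; the warning is debug-only. */
void remove_qgroup_after(const struct qgroup_model *qg)
{
    if (MODEL_DEBUG_WARN_ON(qg->rfer != 0 || qg->excl != 0,
                            "qgroup to be removed still has non-zero counters"))
        qgroups_inconsistent = true; /* userspace is expected to rescan */
}
```

Built without -DCONFIG_BTRFS_DEBUG, calling remove_qgroup_before() on a qgroup with non-zero counters records one warning and one would-be panic, while remove_qgroup_after() records neither. Both still set qgroups_inconsistent, mirroring the point that the inconsistent view is left for a userspace rescan rather than reported loudly.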

** Tags added: noble oracular

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2091719

Title:
  btrfs will WARN_ON() in btrfs_remove_qgroup() unnecessarily

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2091719/+subscriptions


-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs