Public bug reported:
Every now and then, the `umount` command gets stuck in the `D` state
when unmounting ZFS snapshots:
# ps aux | grep umount
root 912290 0.0 0.0 10344 2560 ? D Apr26 0:01 umount /mnt/zfs-snapshot-backup/var/opt/jira
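For anyone who wants to check a fleet for the same symptom, here is a minimal sketch that lists `umount` processes in uninterruptible sleep (`D` state). The function names `filter_d_state_umount` and `list_stuck_umounts` are made up for this example, and it assumes the procps `ps` shipped with Ubuntu:

```shell
#!/bin/sh
# Read "PID STAT COMMAND..." lines on stdin and print the PID of every
# umount process whose state starts with D (uninterruptible sleep).
filter_d_state_umount() {
    awk '$2 ~ /^D/ && $3 ~ /(^|\/)umount$/ { print $1 }'
}

# Feed the filter from the live process table (headers suppressed with "=").
list_stuck_umounts() {
    ps -eo pid=,stat=,args= | filter_d_state_umount
}
```

Running `list_stuck_umounts` on the affected machine above should print `912290`. As root, `cat /proc/<pid>/stack` may then show where the process is blocked in the kernel, provided the kernel exposes that file.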
At the same time, we can see a kernel oops/panic in `dmesg`:
Sat 2025-04-26 02:15:43 UTC systemd[1]: mnt-zfs\x2dsnapshot\x2dbackup-var-opt-jira.mount: Deactivated successfully.
Sat 2025-04-26 02:15:44 UTC kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: #PF: supervisor instruction fetch in kernel mode
Sat 2025-04-26 02:15:44 UTC kernel: #PF: error_code(0x0010) - not-present page
Sat 2025-04-26 02:15:44 UTC kernel: PGD 8000000131251067 P4D 8000000131251067 PUD 0
Sat 2025-04-26 02:15:44 UTC kernel: Oops: 0010 [#1] PREEMPT SMP PTI
Sat 2025-04-26 02:15:44 UTC kernel: CPU: 0 PID: 486 Comm: arc_prune Tainted: P O 6.8.0-58-generic #60-Ubuntu
Sat 2025-04-26 02:15:44 UTC kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Sat 2025-04-26 02:15:44 UTC kernel: RIP: 0010:0x0
Sat 2025-04-26 02:15:44 UTC kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Sat 2025-04-26 02:15:44 UTC kernel: RSP: 0018:ffffb845c0cebd40 EFLAGS: 00010246
Sat 2025-04-26 02:15:44 UTC kernel: RAX: 0000000000000000 RBX: ffffb845c0cebdac RCX: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: RDX: 0000000000000000 RSI: ffffb845c0cebd48 RDI: ffff8f8bb1fa4f00
Sat 2025-04-26 02:15:44 UTC kernel: RBP: ffffb845c0cebd98 R08: 0000000000000000 R09: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000009ca5
Sat 2025-04-26 02:15:44 UTC kernel: R13: 0000000000000000 R14: ffff8f8ab33dc000 R15: ffff8f8bb1fa4f00
Sat 2025-04-26 02:15:44 UTC kernel: FS: 0000000000000000(0000) GS:ffff8f8cede00000(0000) knlGS:0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sat 2025-04-26 02:15:44 UTC kernel: CR2: ffffffffffffffd6 CR3: 000000013837e002 CR4: 00000000001706f0
Sat 2025-04-26 02:15:44 UTC kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sat 2025-04-26 02:15:44 UTC kernel: Call Trace:
Sat 2025-04-26 02:15:44 UTC kernel: <TASK>
Sat 2025-04-26 02:15:44 UTC kernel: ? show_regs+0x6d/0x80
Sat 2025-04-26 02:15:44 UTC kernel: ? __die+0x24/0x80
Sat 2025-04-26 02:15:44 UTC kernel: ? page_fault_oops+0x99/0x1b0
Sat 2025-04-26 02:15:44 UTC kernel: ? do_user_addr_fault+0x2e9/0x670
Sat 2025-04-26 02:15:44 UTC kernel: ? free_large_kmalloc+0x6b/0xc0
Sat 2025-04-26 02:15:44 UTC kernel: ? exc_page_fault+0x83/0x1b0
Sat 2025-04-26 02:15:44 UTC kernel: ? asm_exc_page_fault+0x27/0x30
Sat 2025-04-26 02:15:44 UTC kernel: zfs_prune+0x90/0x130 [zfs]
Sat 2025-04-26 02:15:44 UTC kernel: zpl_prune_sb+0x35/0x60 [zfs]
Sat 2025-04-26 02:15:44 UTC kernel: arc_prune_task+0x22/0x40 [zfs]
Sat 2025-04-26 02:15:44 UTC kernel: taskq_thread+0x1f6/0x3c0 [spl]
Sat 2025-04-26 02:15:44 UTC kernel: ? __pfx_default_wake_function+0x10/0x10
Sat 2025-04-26 02:15:44 UTC kernel: ? __pfx_taskq_thread+0x10/0x10 [spl]
Sat 2025-04-26 02:15:44 UTC kernel: kthread+0xf2/0x120
Sat 2025-04-26 02:15:44 UTC kernel: ? __pfx_kthread+0x10/0x10
Sat 2025-04-26 02:15:44 UTC kernel: ret_from_fork+0x47/0x70
Sat 2025-04-26 02:15:44 UTC kernel: ? __pfx_kthread+0x10/0x10
Sat 2025-04-26 02:15:44 UTC kernel: ret_from_fork_asm+0x1b/0x30
Sat 2025-04-26 02:15:44 UTC kernel: </TASK>
Sat 2025-04-26 02:15:44 UTC kernel: Modules linked in: tls tcp_diag udp_diag
inet_diag xt_comment xt_set ip_set_hash_net ip_set_hash_ip ip_set xt_tcpudp
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables
cfg80211 binfmt_misc intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass
rapl qxl drm_ttm_helper ttm i2c_piix4 zfs(PO) pvpanic_mmio pvpanic qemu_fw_cfg
spl(O) input_leds joydev mac_hid serio_raw sch_fq_codel dm_multipath efi_pstore
nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c raid1 raid0 crct10dif_pclmul crc32_pclmul hid_generic polyval_clmulni
polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 floppy virtio_rng
psmouse pata_acpi usbhid hid aesni_intel crypto_simd cryptd
Sat 2025-04-26 02:15:44 UTC kernel: CR2: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: ---[ end trace 0000000000000000 ]---
Sat 2025-04-26 02:15:44 UTC kernel: RIP: 0010:0x0
Sat 2025-04-26 02:15:44 UTC kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Sat 2025-04-26 02:15:44 UTC kernel: RSP: 0018:ffffb845c0cebd40 EFLAGS: 00010246
Sat 2025-04-26 02:15:44 UTC kernel: RAX: 0000000000000000 RBX: ffffb845c0cebdac RCX: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: RDX: 0000000000000000 RSI: ffffb845c0cebd48 RDI: ffff8f8bb1fa4f00
Sat 2025-04-26 02:15:44 UTC kernel: RBP: ffffb845c0cebd98 R08: 0000000000000000 R09: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000009ca5
Sat 2025-04-26 02:15:44 UTC kernel: R13: 0000000000000000 R14: ffff8f8ab33dc000 R15: ffff8f8bb1fa4f00
Sat 2025-04-26 02:15:44 UTC kernel: FS: 0000000000000000(0000) GS:ffff8f8cede00000(0000) knlGS:0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sat 2025-04-26 02:15:44 UTC kernel: CR2: ffffffffffffffd6 CR3: 000000013837e002 CR4: 00000000001706f0
Sat 2025-04-26 02:15:44 UTC kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sat 2025-04-26 02:15:44 UTC kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sat 2025-04-26 02:15:44 UTC kernel: note: arc_prune[486] exited with irqs disabled
Here is another stack trace from the same situation on a different VM:
[May10 04:58] general protection fault, probably for non-canonical address 0x636f6c2f7273752f: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000037] CPU: 3 PID: 676 Comm: arc_prune Tainted: P O 6.8.0-55-generic #57-Ubuntu
[ +0.000022] Hardware name: Hetzner vServer/Standard PC (Q35 + ICH9, 2009), BIOS 20171111 11/11/2017
[ +0.000020] RIP: 0010:srso_alias_safe_ret+0x5/0x7
[ +0.000019] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8d 64 24 08 <c3> cc e8 f4 ff ff ff 0f 0b cc cc cc cc cc cc cc cc cc cc cc cc cc
[ +0.000044] RSP: 0018:ffff9e32c043fd38 EFLAGS: 00010293
[ +0.000015] RAX: 636f6c2f7273752f RBX: ffff9e32c043fdac RCX: 0000000000000000
[ +0.000016] RDX: 0000000000000000 RSI: ffff9e32c043fd48 RDI: ffff8d5002f1db80
[ +0.000016] RBP: ffff9e32c043fd98 R08: 0000000000000000 R09: 0000000000000000
[ +0.000021] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000059b
[ +0.000016] R13: 0000000000000000 R14: ffff8d5010ada000 R15: ffff8d5002f1db80
[ +0.000019] FS: 0000000000000000(0000) GS:ffff8d5730780000(0000) knlGS:0000000000000000
[ +0.000019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000013] CR2: 00007fdf8ea1b000 CR3: 0000000113a3c004 CR4: 0000000000770ef0
[ +0.000017] PKRU: 55555554
[ +0.000009] Call Trace:
[ +0.000009] <TASK>
[ +0.000010] ? show_regs+0x6d/0x80
[ +0.000014] ? die_addr+0x37/0xa0
[ +0.000011] ? exc_general_protection+0x1db/0x480
[ +0.000015] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000015] ? asm_exc_general_protection+0x27/0x30
[ +0.000017] ? srso_alias_safe_ret+0x5/0x7
[ +0.000012] ? srso_alias_return_thunk+0x5/0xfbef5
[ +0.000014] ? zfs_prune+0xf7/0x130 [zfs]
[ +0.000234] zpl_prune_sb+0x35/0x60 [zfs]
[ +0.000202] arc_prune_task+0x22/0x40 [zfs]
[ +0.000211] taskq_thread+0x1f6/0x3c0 [spl]
[ +0.000026] ? __pfx_default_wake_function+0x10/0x10
[ +0.000019] ? __pfx_taskq_thread+0x10/0x10 [spl]
[ +0.000023] kthread+0xf2/0x120
[ +0.000013] ? __pfx_kthread+0x10/0x10
[ +0.000014] ret_from_fork+0x47/0x70
[ +0.000013] ? __pfx_kthread+0x10/0x10
[ +0.000013] ret_from_fork_asm+0x1b/0x30
[ +0.000017] </TASK>
[ +0.000009] Modules linked in: tls tcp_diag udp_diag inet_diag xt_comment
xt_set ip_set_hash_net ip_set_hash_ip ip_set xt_tcpudp xt_conntrack
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables binfmt_misc
nls_iso8859_1 zfs(PO) spl(O) input_leds joydev serio_raw sch_fq_codel
dm_multipath efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs
blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic usbhid hid
crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic
ghash_clmulni_intel sha256_ssse3 sha1_ssse3 ahci psmouse libahci virtio_gpu
xhci_pci virtio_rng xhci_pci_renesas virtio_dma_buf aesni_intel crypto_simd
cryptd
[ +0.000208] ---[ end trace 0000000000000000 ]---
[ +0.758503] RIP: 0010:srso_alias_safe_ret+0x5/0x7
[ +0.000038] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8d 64 24 08 <c3> cc e8 f4 ff ff ff 0f 0b cc cc cc cc cc cc cc cc cc cc cc cc cc
[ +0.000039] RSP: 0018:ffff9e32c043fd38 EFLAGS: 00010293
[ +0.000579] RAX: 636f6c2f7273752f RBX: ffff9e32c043fdac RCX: 0000000000000000
[ +0.000614] RDX: 0000000000000000 RSI: ffff9e32c043fd48 RDI: ffff8d5002f1db80
[ +0.000615] RBP: ffff9e32c043fd98 R08: 0000000000000000 R09: 0000000000000000
[ +0.000575] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000059b
[ +0.000503] R13: 0000000000000000 R14: ffff8d5010ada000 R15: ffff8d5002f1db80
[ +0.000450] FS: 0000000000000000(0000) GS:ffff8d5730780000(0000) knlGS:0000000000000000
[ +0.000504] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000481] CR2: 00007fdf8ea1b000 CR3: 000000010f716005 CR4: 0000000000770ef0
[ +0.000381] PKRU: 55555554
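The non-canonical address in the second trace, `0x636f6c2f7273752f` (also the value in RAX), looks more like ASCII text than a pointer. Here is a small sketch that decodes such a value byte by byte in little-endian order, which is how x86_64 lays it out in memory. `decode_le_ascii` is a made-up helper name, and it assumes every byte is printable:

```shell
#!/bin/sh
# Decode a 64-bit hex value as ASCII, least significant byte first.
decode_le_ascii() {
    hex=${1#0x}
    # Require an even number of hex digits so the loop below terminates.
    [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
    out=""
    while [ -n "$hex" ]; do
        rest=${hex#??}           # everything after the first two digits
        byte=${hex%"$rest"}      # the first two digits
        hex=$rest
        # Convert the byte to octal and let printf emit the character.
        char=$(printf "\\$(printf '%03o' "0x$byte")")
        out="$char$out"          # prepend: the last byte of the hex string comes first
    done
    printf '%s\n' "$out"
}

decode_le_ascii 0x636f6c2f7273752f   # prints /usr/loc
```

The result reads like the start of a path such as `/usr/local`. That might mean the pointer was loaded from memory that had already been freed and reused for a string, but this is only a guess from the register dump, not a confirmed diagnosis.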
The end result is a system that cannot be shut down cleanly anymore,
because unmounting never finishes.
This is *not* easily reproducible. We run about 300 systems with Ubuntu
24.04, each one mounting and unmounting ZFS snapshots at least once per
day. On those, we saw the bug 3 times in the last 2 months or so.
Mounting/unmounting ZFS snapshots is part of our backup software. We've
been doing that for many years now and this bug only started appearing
with Ubuntu 24.04.
Let me know if you need any more info. Thanks!
More info:
# lsb_release -rd
No LSB modules are available.
Description: Ubuntu 24.04.2 LTS
Release: 24.04
# apt-cache policy zfsutils-linux
zfsutils-linux:
Installed: 2.2.2-0ubuntu9.2
Candidate: 2.2.2-0ubuntu9.2
Version table:
*** 2.2.2-0ubuntu9.2 500
500 mirror+file:/etc/apt/mirrors/ubuntu.txt noble-updates/main amd64 Packages
100 /var/lib/dpkg/status
# uname -a
Linux foo 6.8.0-59-generic #61-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 11 23:16:11 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
** Affects: zfs-linux (Ubuntu)
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/2110885
Title:
Kernel panic when unmounting ZFS snapshots