Public bug reported:
One of my bionic servers with HWE 5.4.0 hangs on boot (apparently while
starting LVM snapshots) after upgrading from Linux 5.4.0-42 to 5.4.0-47,
with the following trace:
[ 29.126292] kobject_add_internal failed for :a-0000152 with -EEXIST, don't try to register things with the same name in the same directory.
[ 29.138854] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 29.145977] #PF: supervisor read access in kernel mode
[ 29.145979] #PF: error_code(0x0000) - not-present page
[ 29.145981] PGD 0 P4D 0
[ 29.158800] Oops: 0000 [#1] SMP NOPTI
[ 29.162468] CPU: 6 PID: 2532 Comm: lvm Not tainted 5.4.0-46-generic #50~18.04.1-Ubuntu
[ 29.170378] Hardware name: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.3 07/15/2019
[ 29.178038] RIP: 0010:free_percpu+0x120/0x1f0
[ 29.183786] Code: 43 64 48 01 d0 49 39 c4 0f 83 71 ff ff ff 65 8b 05 a5 4e bc 58 48 8b 15 0e 4e 20 01 48 98 48 8b 3c c2 4c 01 e7 e8 f0 97 02 00 <48> 8b 58 20 48 8b 53 38 e9 48 ff ff ff f3 c3 48 8b 43 38 48 89 45
[ 29.202530] RSP: 0018:ffffa2f69c3d38e8 EFLAGS: 00010046
[ 29.209204] RAX: 0000000000000000 RBX: ffff92202ff397c0 RCX: ffffffffa880a000
[ 29.216336] RDX: cf35c0f24f2cc3c0 RSI: 43817c451b92afcb RDI: 0000000000000000
[ 29.223469] RBP: ffffa2f69c3d3918 R08: 0000000000000000 R09: ffffffffa74a5300
[ 29.230609] R10: ffffa2f69c3d3820 R11: 0000000000000000 R12: cf35c0f24f14c3c0
[ 29.237745] R13: cf362fb2a054c3c0 R14: 0000000000000287 R15: 0000000000000008
[ 29.244878] FS: 00007f93a04b0900(0000) GS:ffff913faed80000(0000) knlGS:0000000000000000
[ 29.252961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 29.258707] CR2: 0000000000000020 CR3: 0000003fa9d90000 CR4: 00000000003406e0
[ 29.265883] Call Trace:
[ 29.268346] __kmem_cache_release+0x1a/0x30
[ 29.273913] __kmem_cache_create+0x4f9/0x550
[ 29.278192] ? __kmalloc_node+0x1eb/0x320
[ 29.282205] ? kvmalloc_node+0x31/0x80
[ 29.285962] create_cache+0x120/0x1f0
[ 29.291003] kmem_cache_create_usercopy+0x17d/0x270
[ 29.295882] kmem_cache_create+0x16/0x20
[ 29.300152] dm_bufio_client_create+0x1af/0x3f0 [dm_bufio]
[ 29.305644] ? snapshot_map+0x5e0/0x5e0 [dm_snapshot]
[ 29.310693] persistent_read_metadata+0x1ed/0x500 [dm_snapshot]
[ 29.316627] ? _cond_resched+0x19/0x40
[ 29.320384] snapshot_ctr+0x79e/0x910 [dm_snapshot]
[ 29.325276] dm_table_add_target+0x18d/0x370
[ 29.329552] table_load+0x12a/0x370
[ 29.333045] ctl_ioctl+0x1e2/0x590
[ 29.336450] ? retrieve_status+0x1c0/0x1c0
[ 29.340551] dm_ctl_ioctl+0xe/0x20
[ 29.343958] do_vfs_ioctl+0xa9/0x640
[ 29.347547] ? ksys_semctl.constprop.19+0xf7/0x190
[ 29.352337] ksys_ioctl+0x75/0x80
[ 29.355663] __x64_sys_ioctl+0x1a/0x20
[ 29.359421] do_syscall_64+0x57/0x190
[ 29.363094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 29.368144] RIP: 0033:0x7f939f0286d7
[ 29.371732] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
[ 29.390478] RSP: 002b:00007ffe918df168 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 29.398045] RAX: ffffffffffffffda RBX: 0000561c107f672c RCX: 00007f939f0286d7
[ 29.405175] RDX: 0000561c1107c610 RSI: 00000000c138fd09 RDI: 0000000000000009
[ 29.412309] RBP: 00007ffe918df220 R08: 00007f939f59d120 R09: 00007ffe918defd0
[ 29.419442] R10: 0000561c1107c6c0 R11: 0000000000000202 R12: 00007f939f59c4e6
[ 29.426623] R13: 00007f939f59c4e6 R14: 00007f939f59c4e6 R15: 00007f939f59c4e6
[ 29.433778] Modules linked in: dm_snapshot dm_bufio dm_zero nls_iso8859_1 ipmi_ssif input_leds amd64_edac_mod edac_mce_amd joydev kvm_amd kvm ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core bcache crc64 hid_generic crct10dif_pclmul mlx5_core crc32_pclmul ast ghash_clmulni_intel drm_vram_helper pci_hyperv_intf ttm aesni_intel mpt3sas nvme crypto_simd drm_kms_helper syscopyarea igb cryptd raid_class sysfillrect ahci tls sysimgblt glue_helper dca usbhid fb_sys_fops libahci nvme_core mlxfw i2c_algo_bit scsi_transport_sas drm hid i2c_piix4
[ 29.507853] CR2: 0000000000000020
[ 29.511174] ---[ end trace 43bd923f80cbdf52 ]---
The :a-0000152 in the first line of the trace refers to
/sys/kernel/slab/:a-0000152. Even a working kernel shows a problem
there:
$ uname -a
Linux <REDACTED> 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ ls -l /sys/kernel/slab | grep a-0000152
lrwxrwxrwx 1 root root 0 Sep 8 03:20 dm_bufio_buffer -> :a-0000152
So on 5.4.0-42 the dm_bufio_buffer symlink is dangling: the node it
points to, :a-0000152, never gets created, but at least the kernel
doesn't crash. The same dangling link is visible on my 5.8.0-18 desktop.
However, I can't reproduce the crash on other machines with snapshot
thin volumes, even though it happens on every boot (even with
maxcpus=1) on the affected system.
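To check other machines for the same symptom, something like the
following could be used. It is only a sketch: the function name
dangling_slab_links is made up for illustration, and it assumes the
slab sysfs directory is readable by the invoking user. It prints every
symlink in the directory whose target does not exist.

```shell
#!/bin/sh
# List symlinks in a slab sysfs directory whose target node is missing.
# Usage: dangling_slab_links [DIR]   (DIR defaults to /sys/kernel/slab)
# Note: the function name is hypothetical, not an existing tool.
dangling_slab_links() {
    dir="${1:-/sys/kernel/slab}"
    for entry in "$dir"/*; do
        # -L is true for a symlink; -e follows the link, so it is
        # false when the link points at a node that was never created.
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s -> %s\n' "${entry##*/}" "$(readlink "$entry")"
        fi
    done
}
```

Running dangling_slab_links with no argument on the 5.4.0-42 system
above should list dm_bufio_buffer if its target is indeed absent; an
empty result would mean every alias resolves.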
Note that LVM was not in use on this system until just before it was
rebooted into the new kernel, but downgrading to -42 works, so the
timing seems to be a coincidence. Before I realised this was a recent
regression, I dug through the history of mm/slub.c and found dde3c6b7
("mm/slub: fix a memory leak in sysfs_slab_add()") suspicious: it
ostensibly fixes a leak from 80da026a ("mm/slub: fix slab double-free
in case of duplicate sysfs filename"), which is exactly the code path
that seems to crash here.
There's clearly some existing bug causing the slab sysfs node to not be
added, and I guess dde3c6b7 turns that into a crash on some systems.
This is a test system, so I can do whatever debugging is required to
narrow down the trigger.
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Description changed:
- That :a-0000152 is meant to be /sys/kernel/debug/:a-0000152. Even a
+ That :a-0000152 is meant to be /sys/kernel/slab/:a-0000152. Even a
working kernel shows some trouble there:
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1894780
Title:
Oops and hang when starting LVM snapshots on 5.4.0-47