I have tried the same again, this time with openvswitch unconfigured (but still
running).
ubuntu@node-horsea:~$ sudo ovs-vsctl del-br ovsbr0
ubuntu@node-horsea:~$ sudo ovs-vsctl show
8dfc2067-7b9b-48d7-a50a-df17bbd3cb6c
ovs_version: "2.13.5"
chcpu disabling/enabling still crashes.
ubuntu@node-horsea:~$ sudo chcpu -d 5-11
Killed
[ +3.357665] IRQ 56: no longer affine to CPU5
[ +0.000021] IRQ 72: no longer affine to CPU5
[ +0.000009] IRQ 82: no longer affine to CPU5
[ +0.000011] IRQ 96: no longer affine to CPU5
[ +0.000019] IRQ 121: no longer affine to CPU5
[ +0.000012] IRQ 136: no longer affine to CPU5
[ +0.002380] smpboot: CPU 5 is now offline
[ +0.000468] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ +0.031357] #PF: supervisor write access in kernel mode
[ +0.023816] #PF: error_code(0x0002) - not-present page
[ +0.023147] PGD 0 P4D 0
[ +0.011391] Oops: 0002 [#1] SMP PTI
[ +0.015688] CPU: 11 PID: 5967 Comm: chcpu Tainted: P W O
5.13.0-27-generic #29~20.04.1-Ubuntu
[ +0.043614] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS
P89 01/22/2018
[ +0.037462] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ +0.023926] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48 8b
48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77 08
48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ +0.084916] RSP: 0018:ffffacf3ccddbbb0 EFLAGS: 00010286
[ +0.023534] RAX: ffffccf3bfb7a580 RBX: 0000000000000000 RCX: ffffacf3ccddbbb0
[ +0.033000] RDX: ffffccf3bfb7a588 RSI: 0000000000000000 RDI: 0000000000000000
[ +0.032118] RBP: ffffacf3ccddbbe8 R08: 0000000000000000 R09: ffffacf3ccddbaa8
[ +0.032456] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8da022a50000
[ +0.032331] R13: ffffccf3bfb7a580 R14: ffffacf3ccddbbb0 R15: 0000000000000005
[ +0.032137] FS: 00007f1a4aff9580(0000) GS:ffff8da29fcc0000(0000)
knlGS:0000000000000000
[ +0.036480] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.025981] CR2: 0000000000000008 CR3: 00000005b29d6006 CR4: 00000000001706e0
[ +0.032158] Call Trace:
[ +0.010991] ? blk_mq_exit_hctx+0x160/0x160
[ +0.018782] cpuhp_invoke_callback+0x179/0x430
[ +0.020112] cpuhp_invoke_callback_range+0x44/0x80
[ +0.021557] _cpu_down+0x109/0x310
[ +0.015284] cpu_down+0x36/0x60
[ +0.014475] cpu_device_down+0x16/0x20
[ +0.016905] cpu_subsys_offline+0xe/0x10
[ +0.017652] device_offline+0x8e/0xc0
[ +0.016520] online_store+0x4c/0x90
[ +0.015662] dev_attr_store+0x17/0x30
[ +0.016581] sysfs_kf_write+0x3e/0x50
[ +0.016546] kernfs_fop_write_iter+0x138/0x1c0
[ +0.020381] new_sync_write+0x117/0x1b0
[ +0.017389] vfs_write+0x185/0x250
[ +0.015407] ksys_write+0x67/0xe0
[ +0.014961] __x64_sys_write+0x1a/0x20
[ +0.016879] do_syscall_64+0x61/0xb0
[ +0.016095] ? syscall_exit_to_user_mode+0x27/0x50
[ +0.021544] ? __x64_sys_faccessat+0x1c/0x20
[ +0.019347] ? do_syscall_64+0x6e/0xb0
[ +0.016881] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ +0.022785] RIP: 0033:0x7f1a4af140a7
[ +0.016084] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3
0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0
ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ +0.086348] RSP: 002b:00007ffd8b2963d8 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ +0.034055] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007f1a4af140a7
[ +0.032183] RDX: 0000000000000001 RSI: 000055feb3b47869 RDI: 0000000000000004
[ +0.032139] RBP: 00007f1a4aff9500 R08: 0000000000000000 R09: 00007ffd8b296380
[ +0.032232] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ +0.032338] R13: 00007ffd8b2963e0 R14: 000055feb3b47869 R15: 0000000000000001
[ +0.032799] Modules linked in: ebtable_filter ebtables veth nbd xt_comment
zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO)
spl(O) vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock
xt_MASQUERADE xt_conntrack xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp
ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_tables
ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc
nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nls_iso8859_1 dm_multipath scsi_dh_rdac
scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel rpcrdma kvm sunrpc
rdma_ucm rapl intel_cstate ib_iser libiscsi scsi_transport_iscsi efi_pstore
rdma_cm ib_umad ib_ipoib iw_cm ib_cm hpilo ioatdma acpi_ipmi ipmi_si
acpi_power_meter acpi_tad mac_hid sch_fq_codel ipmi_devintf ipmi_msghandler msr
ip_tables x_tables autofs4 btrfs
[ +0.000048] blake2b_generic zstd_compress raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0
multipath linear mlx5_ib ib_uverbs ib_core ses enclosure mgag200 i2c_algo_bit
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core
crct10dif_pclmul crc32_pclmul mlx5_core ghash_clmulni_intel nvme ixgbe
pci_hyperv_intf drm aesni_intel psample xfrm_algo crypto_simd i2c_i801 xhci_pci
hpsa mlxfw dca cryptd tg3 i2c_smbus lpc_ich xhci_pci_renesas tls nvme_core mdio
scsi_transport_sas wmi
[ +0.618979] CR2: 0000000000000008
[ +0.014940] ---[ end trace acdc0f1424b180b1 ]---
[ +0.026418] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ +0.023965] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48 8b
48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77 08
48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ +0.085003] RSP: 0018:ffffacf3ccddbbb0 EFLAGS: 00010286
[ +0.024710] RAX: ffffccf3bfb7a580 RBX: 0000000000000000 RCX: ffffacf3ccddbbb0
[ +0.032751] RDX: ffffccf3bfb7a588 RSI: 0000000000000000 RDI: 0000000000000000
[ +0.032111] RBP: ffffacf3ccddbbe8 R08: 0000000000000000 R09: ffffacf3ccddbaa8
[ +0.032106] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8da022a50000
[ +0.032477] R13: ffffccf3bfb7a580 R14: ffffacf3ccddbbb0 R15: 0000000000000005
[ +0.032263] FS: 00007f1a4aff9580(0000) GS:ffff8da29fcc0000(0000)
knlGS:0000000000000000
[ +0.036690] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.025863] CR2: 0000000000000008 CR3: 00000005b29d6006 CR4: 00000000001706e0
That further simplifies the test, as it comes down to "I cannot disable
CPUs on 5.13.0-27-generic".
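For reference, the call trace above goes through online_store and
sysfs_kf_write, which suggests that chcpu -d ends up writing 0 to each CPU's
sysfs "online" attribute. A minimal sketch of the same operation without chcpu
could look like the following. The function name offline_cpus and the DRY_RUN
switch are hypothetical, made up for this illustration. DRY_RUN defaults to 1,
so the script only prints what it would do unless it is explicitly disabled.

```sh
#!/bin/sh
# Sketch only: take a range of CPUs offline through sysfs, one CPU at a time.
# offline_cpus and DRY_RUN are illustrative names, not part of any tool.

offline_cpus() {
    cpu=$1
    last=$2
    while [ "$cpu" -le "$last" ]; do
        path="/sys/devices/system/cpu/cpu${cpu}/online"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only report the write that would be done.
            echo "would write 0 to ${path}"
        else
            # Needs root. On the affected kernel this write is expected
            # to trigger the oops shown above.
            echo 0 > "${path}" || return 1
        fi
        cpu=$((cpu + 1))
    done
}

# Same range as "chcpu -d 5-11".
offline_cpus 5 11
```

Running it as root with DRY_RUN=0 on a scratch machine should be enough to
check whether a given kernel is affected, assuming the crash does not depend on
something chcpu does beyond these writes.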
** Summary changed:
- Focal 20.04.4 crashing when using openvswitch and disabling CPUs
+ Focal 20.04.4 5.13.0-27-generic crashing disabling CPUs
** Description changed:
Hi, I have now hit the following crash two times in a row while running the
same test, so it seems somewhat reproducible:
[ 1444.399448] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 1444.431172] #PF: supervisor write access in kernel mode
[ 1444.454715] #PF: error_code(0x0002) - not-present page
[ 1444.478052] PGD 0 P4D 0
[ 1444.489448] Oops: 0002 [#1] SMP PTI
[ 1444.505120] CPU: 6 PID: 26233 Comm: chcpu Tainted: P W O
5.13.0-27-generic #29~20.04.1-Ubuntu
[ 1444.549884] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9,
BIOS P89 01/22/2018
[ 1444.587322] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ 1444.611352] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48
8b 48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77
08 48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ 1444.696490] RSP: 0018:ffffbf5d818dbbf0 EFLAGS: 00010282
[ 1444.720510] RAX: ffffdf5d7fb788c0 RBX: 0000000000000000 RCX:
ffffbf5d818dbbf0
[ 1444.752719] RDX: ffffdf5d7fb788c8 RSI: 0000000000000000 RDI:
0000000000000000
[ 1444.784978] RBP: ffffbf5d818dbc28 R08: 0000000000000000 R09:
ffffbf5d818dbae8
[ 1444.816712] R10: 0000000000000001 R11: 0000000000000001 R12:
ffff983d939b0000
[ 1444.848844] R13: ffffdf5d7fb788c0 R14: ffffbf5d818dbbf0 R15:
0000000000000005
[ 1444.881389] FS: 00007f1c8fe88580(0000) GS:ffff9844dfb80000(0000)
knlGS:0000000000000000
[ 1444.918201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1444.944633] CR2: 0000000000000008 CR3: 000000064ddf6006 CR4:
00000000001706e0
[ 1444.977001] Call Trace:
[ 1444.988071] ? blk_mq_exit_hctx+0x160/0x160
[ 1445.007037] cpuhp_invoke_callback+0x179/0x430
[ 1445.027179] cpuhp_invoke_callback_range+0x44/0x80
[ 1445.048737] _cpu_down+0x109/0x310
[ 1445.064062] cpu_down+0x36/0x60
[ 1445.077882] cpu_device_down+0x16/0x20
[ 1445.094741] cpu_subsys_offline+0xe/0x10
[ 1445.112439] device_offline+0x8e/0xc0
[ 1445.129064] online_store+0x4c/0x90
[ 1445.144835] dev_attr_store+0x17/0x30
[ 1445.161307] sysfs_kf_write+0x3e/0x50
[ 1445.177856] kernfs_fop_write_iter+0x138/0x1c0
[ 1445.198036] new_sync_write+0x117/0x1b0
[ 1445.215386] vfs_write+0x185/0x250
[ 1445.230649] ksys_write+0x67/0xe0
[ 1445.245565] __x64_sys_write+0x1a/0x20
[ 1445.262448] do_syscall_64+0x61/0xb0
[ 1445.278585] ? do_syscall_64+0x6e/0xb0
[ 1445.295940] ? asm_exc_page_fault+0x8/0x30
[ 1445.314969] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1445.338356] RIP: 0033:0x7f1c8fda30a7
[ 1445.355062] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00
f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 1445.440161] RSP: 002b:00007fffed1c4418 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 1445.474829] RAX: ffffffffffffffda RBX: 0000000000000040 RCX:
00007f1c8fda30a7
[ 1445.507219] RDX: 0000000000000001 RSI: 0000559369f25869 RDI:
0000000000000004
[ 1445.539438] RBP: 00007f1c8fe88500 R08: 0000000000000000 R09:
00007fffed1c43c0
[ 1445.572547] R10: 0000000000000000 R11: 0000000000000246 R12:
0000000000000004
[ 1445.604842] R13: 00007fffed1c4420 R14: 0000559369f25869 R15:
0000000000000001
[ 1445.636897] Modules linked in: vhost_net tap ebtable_filter ebtables veth
nbd xt_comment zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO)
zcommon(PO) znvpair(PO) spl(O) vhost_vsock vmw_vsock_virtio_transport_common
vhost vhost_iotlb vsock xt_MASQUERADE xt_conntrack xt_CHECKSUM ipt_REJECT
nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle
iptable_nat nf_tables ip6table_filter ip6_tables iptable_filter bpfilter bridge
stp llc nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 uio_pci_generic uio nls_iso8859_1
dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr
intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp
rpcrdma kvm_intel sunrpc kvm rdma_ucm ib_iser libiscsi scsi_transport_iscsi
rapl ib_umad rdma_cm ib_ipoib intel_cstate efi_pstore iw_cm ib_cm hpilo ioatdma
acpi_ipmi acpi_tad ipmi_si mac_hid acpi_power_meter sch_fq_codel ipmi_devintf
ipmi_msghandler msr
[ 1445.636951] ip_tables x_tables autofs4 btrfs blake2b_generic
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs
ib_core ses enclosure mgag200 i2c_algo_bit drm_kms_helper crct10dif_pclmul
syscopyarea crc32_pclmul sysfillrect ghash_clmulni_intel sysimgblt fb_sys_fops
aesni_intel mlx5_core cec ixgbe pci_hyperv_intf crypto_simd xfrm_algo psample
nvme cryptd rc_core hpsa mlxfw i2c_i801 dca xhci_pci drm i2c_smbus lpc_ich tg3
tls xhci_pci_renesas nvme_core mdio scsi_transport_sas wmi
[ 1446.267521] CR2: 0000000000000008
[ 1446.282506] ---[ end trace 99f81ab62ed1f929 ]---
[ 1446.308857] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ 1446.311595] ixgbe 0000:04:00.1 eno50: NIC Link is Up 10 Gbps, Flow
Control: None
[ 1446.332973] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48
8b 48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77
08 48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ 1446.332975] RSP: 0018:ffffbf5d818dbbf0 EFLAGS: 00010282
[ 1446.332977] RAX: ffffdf5d7fb788c0 RBX: 0000000000000000 RCX:
ffffbf5d818dbbf0
[ 1446.332978] RDX: ffffdf5d7fb788c8 RSI: 0000000000000000 RDI:
0000000000000000
[ 1446.332978] RBP: ffffbf5d818dbc28 R08: 0000000000000000 R09:
ffffbf5d818dbae8
[ 1446.332979] R10: 0000000000000001 R11: 0000000000000001 R12:
ffff983d939b0000
[ 1446.332980] R13: ffffdf5d7fb788c0 R14: ffffbf5d818dbbf0 R15:
0000000000000005
[ 1446.332981] FS: 00007f1c8fe88580(0000) GS:ffff9844dfb80000(0000)
knlGS:0000000000000000
[ 1446.332982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1446.332983] CR2: 0000000000000008 CR3: 000000064ddf6006 CR4:
00000000001706e0
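As a reading aid for the "#PF: error_code(0x0002)" lines in the trace, the low
bits of the x86 page-fault error code can be decoded by hand: bit 0 is clear
for a not-present page, bit 1 is set for a write, bit 2 is set for a user-mode
access and bit 4 is set for an instruction fetch. The helper below is a small
sketch of that decoding; the function name decode_pf_error is made up for this
example and only these four bits are handled.

```sh
#!/bin/sh
# Sketch only: decode bits 0, 1, 2 and 4 of an x86 page-fault error code.
# decode_pf_error is an illustrative name; other bits are ignored.

decode_pf_error() {
    code=$(( $1 ))
    # Bit 0: 0 = page not present, 1 = permissions problem on a present page.
    if [ $(( code & 1 )) -ne 0 ]; then
        cause="permissions violation"
    else
        cause="not-present page"
    fi
    # Bit 1: 0 = read, 1 = write.
    if [ $(( code & 2 )) -ne 0 ]; then
        access="write access"
    else
        access="read access"
    fi
    # Bit 4: instruction fetch takes precedence over read/write.
    if [ $(( code & 16 )) -ne 0 ]; then
        access="instruction fetch"
    fi
    # Bit 2: 0 = supervisor access, 1 = user access.
    if [ $(( code & 4 )) -ne 0 ]; then
        mode="user"
    else
        mode="supervisor"
    fi
    echo "${mode} ${access}, ${cause}"
}

decode_pf_error 0x0002
```

For 0x0002 this gives a supervisor write to a not-present page, which matches
the two "#PF:" lines and the faulting address of 0x8 in CR2.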
The system is somewhat stuck afterwards.
I can't get back to libvirt (neither restarting the service nor spawning a new
guest works), nor to openvswitch (ovs-vsctl show); all those calls get stuck
while other things somewhat work. But a new ssh login, for example, is also
stuck, so debugging after the crash is very limited.
The order in which the tests do things is like:
1. set up a simple openvswitch
2. start the libvirt network for this OVS instance
3. disable CPUs 5-11 (as I want the test to only have 0-4)
+ 4. start a KVM guest on that OVS with huge pages
- Note: I reproduced this without step #4 in the meantime, so we can
- ignore the VM that I originally mentioned.
-
+ Note: I reproduced this without steps #4, #1 and #2 in the meantime, so
+ we can ignore the VM and OVS that I originally mentioned.
--- details ---
- The OVS setup is rather simple,
- one internal bridge and one upstream port, nothing "too special".
- This looks like:
-
- + ovs-vsctl show
- 8dfc2067-7b9b-48d7-a50a-df17bbd3cb6c
- Bridge ovsbr0
- Port eno49
- Interface eno49
- Port ovsbr0
- Interface ovsbr0
- type: internal
- ovs_version: "2.13.5"
-
+ I originally had details about the OVS and the VM here, but to be honest
+ all that is left is "boot and run chcpu -d => crash". So there are not many
+ details to share.
---
ProblemType: Bug
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Mar 29 07:06 seq
crw-rw---- 1 root audio 116, 33 Mar 29 07:06 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant DL360 Gen9
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.13.0-27-generic
root=UUID=c941b173-e6b5-485a-a02b-8d966b8d3c73 ro --- console=ttyS1,115200
ProcVersionSignature: Ubuntu 5.13.0-27.29~20.04.1-generic 5.13.19
RelatedPackageVersions:
linux-restricted-modules-5.13.0-27-generic N/A
linux-backports-modules-5.13.0-27-generic N/A
linux-firmware 1.187.29
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal uec-images
Uname: Linux 5.13.0-27-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: kvm libvirt
_MarkForUpload: True
dmi.bios.date: 01/22/2018
dmi.bios.release: 2.56
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.board.name: ProLiant DL360 Gen9
dmi.board.vendor: HP
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.ec.firmware.release: 2.60
dmi.modalias:
dmi:bvnHP:bvrP89:bd01/22/2018:br2.56:efr2.60:svnHP:pnProLiantDL360Gen9:pvr:rvnHP:rnProLiantDL360Gen9:rvr:cvnHP:ct23:cvr:sku780018-S01:
dmi.product.family: ProLiant
dmi.product.name: ProLiant DL360 Gen9
dmi.product.sku: 780018-S01
dmi.sys.vendor: HP
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1966870
Title:
Focal 20.04.4 5.13.0-27-generic crashing disabling CPUs