[Bug 1733662] Re: System hang with Linux kernel 4.13, not with 4.10

Rod Smith Thu, 21 Dec 2017 08:01:23 -0800

That one completed one run of the test OK, but then crashed on the
second one, when bringing CPU 15 back online, with the following dmesg
output:


[  160.596312] EDAC MC0: Giving out device to module sb_edac.c controller Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0 (INTERRUPT)
[  160.596537] EDAC MC1: Giving out device to module sb_edac.c controller Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0 (INTERRUPT)
[  160.596679] EDAC sbridge: Some needed devices are missing
[  160.627089] EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
[  160.651100] EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
[  160.651271] EDAC sbridge: Couldn't find mci handler
[  160.651422] EDAC sbridge: Couldn't find mci handler
[  160.651572] EDAC sbridge: Failed to register device with error -19.
[  161.099074] BUG: unable to handle kernel paging request at 0000000180040100
[  161.099512] IP: __kmalloc_node+0x135/0x2a0
[  161.099704] PGD 1ff1f01067 
[  161.099705] P4D 1ff1f01067 
[  161.099871] PUD 0 

[  161.100373] Oops: 0000 [#2] SMP
[  161.100548] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp intel_cstate kvm_intel kvm irqbypass intel_rapl_perf joydev input_leds ipmi_ssif ipmi_si ipmi_devintf ipmi_msghandler mei_me mei shpchp lpc_ich acpi_pad mac_hid acpi_power_meter ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas fnic crct10dif_pclmul crc32_pclmul mgag200 ghash_clmulni_intel ttm pcbc igb hid_generic drm_kms_helper aesni_intel dca syscopyarea i2c_algo_bit sysfillrect aes_x86_64 sysimgblt usbhid libfcoe crypto_simd fb_sys_fops ahci ptp glue_helper hid mxm_wmi libfc cryptd libahci
[  161.102507]  pps_core drm enic scsi_transport_fc megaraid_sas wmi
[  161.102856] CPU: 2 PID: 3686 Comm: python3 Tainted: G      D         4.13.0-13-generic #14~lp1733662Commit8d9d2235a82ea41
[  161.103230] Hardware name: Cisco Systems Inc UCSC-C240-M4L/UCSC-C240-M4L, BIOS C240M4.2.0.10c.0.032320160820 03/23/2016
[  161.103624] task: ffff8f3de5989740 task.stack: ffffa3a7ce288000
[  161.104024] RIP: 0010:__kmalloc_node+0x135/0x2a0
[  161.104431] RSP: 0018:ffffa3a7ce28bc30 EFLAGS: 00010246
[  161.104846] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000f95
[  161.105274] RDX: 0000000000000f94 RSI: 0000000000000000 RDI: 000000000001f3e0
[  161.105705] RBP: ffffa3a7ce28bc70 R08: ffff8f3dffc9f3e0 R09: ffff8f3dff807c00
[  161.106148] R10: ffffffffbb017760 R11: ffff8f5df8fa21f2 R12: 00000000014080c0
[  161.106599] R13: 0000000000000008 R14: 0000000180040100 R15: ffff8f3dff807c00
[  161.107057] FS:  00007f7849b98700(0000) GS:ffff8f3dffc80000(0000) knlGS:0000000000000000
[  161.107530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  161.108014] CR2: 0000000180040100 CR3: 0000001ff6e6e000 CR4: 00000000001406e0
[  161.108509] Call Trace:
[  161.109012]  ? alloc_cpumask_var_node+0x1f/0x30
[  161.109523]  ? on_each_cpu_cond+0x160/0x160
[  161.110036]  alloc_cpumask_var_node+0x1f/0x30
[  161.110558]  zalloc_cpumask_var_node+0xf/0x20
[  161.111084]  smpcfd_prepare_cpu+0x64/0xc0
[  161.111615]  cpuhp_invoke_callback+0x84/0x3b0
[  161.112151]  cpuhp_up_callbacks+0x36/0xc0
[  161.112690]  _cpu_up+0x87/0xd0
[  161.113235]  do_cpu_up+0x8b/0xb0
[  161.113785]  cpu_up+0x13/0x20
[  161.114342]  cpu_subsys_online+0x3d/0x90
[  161.114881]  device_online+0x4a/0x90
[  161.115422]  online_store+0x89/0xa0
[  161.115951]  dev_attr_store+0x18/0x30
[  161.116472]  sysfs_kf_write+0x37/0x40
[  161.116994]  kernfs_fop_write+0x11c/0x1a0
[  161.117510]  __vfs_write+0x18/0x40
[  161.118029]  vfs_write+0xb1/0x1a0
[  161.118544]  SyS_write+0x55/0xc0
[  161.119062]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[  161.119581] RIP: 0033:0x7f78497784a0
[  161.120081] RSP: 002b:00007fff6e69ed48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  161.120602] RAX: ffffffffffffffda RBX: 0000000001ea8410 RCX: 00007f78497784a0
[  161.121129] RDX: 0000000000000002 RSI: 0000000001fbe400 RDI: 0000000000000003
[  161.121666] RBP: 0000000000a3e020 R08: 0000000000000000 R09: 0000000000000001
[  161.122202] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
[  161.122720] R13: 0000000000501520 R14: 00007fff6e69f1b0 R15: 00007f7848690240
[  161.123226] Code: 89 cf 4c 89 4d c0 e8 0b 7f 01 00 49 89 c7 4c 8b 4d c0 4d 85 ff 0f 85 47 ff ff ff 45 31 f6 eb 3c 49 63 47 20 49 8b 3f 48 8d 4a 01 <49> 8b 1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 20 ff
[  161.124251] RIP: __kmalloc_node+0x135/0x2a0 RSP: ffffa3a7ce28bc30
[  161.124738] CR2: 0000000180040100
[  161.125220] ---[ end trace 1246d63efc5b2bf0 ]---

Rather than hanging, as has happened before, the script crashed: "Killed"
was displayed and I was dropped back to a bash prompt. The system
behaved unreliably afterward, and I was forced to reboot it via its BMC.
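
For reference, the operation in flight when the oops fired (a write to a
CPU's sysfs "online" attribute, per the online_store and cpu_up frames in
the trace) can be sketched roughly as follows. This is not the actual test
script; the function names and the sysfs_root parameter are hypothetical,
the latter added only so the sketch can be pointed at a scratch directory
instead of the real /sys tree.

```python
import os

# Standard sysfs location of the per-CPU hotplug control files.
DEFAULT_SYSFS_ROOT = "/sys/devices/system/cpu"


def cpu_online_path(cpu, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Return the path of the 'online' attribute for the given CPU number."""
    return os.path.join(sysfs_root, "cpu%d" % cpu, "online")


def set_cpu_online(cpu, online, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Write '1' (online) or '0' (offline) to the CPU's 'online' attribute.

    On a real system this needs root and triggers the kernel's CPU
    hotplug path.
    """
    with open(cpu_online_path(cpu, sysfs_root), "w") as f:
        f.write("1" if online else "0")


def cycle_cpu(cpu, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Take one CPU offline, then bring it back online."""
    set_cpu_online(cpu, False, sysfs_root)
    set_cpu_online(cpu, True, sysfs_root)
```

On the affected machine, something like cycle_cpu(15) run as root would
exercise the same path; the crash above happened during the "back online"
half of such a cycle.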

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1733662

Title:
  System hang with Linux kernel 4.13, not with 4.10

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1733662/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
