That one completed two runs, but during the second run dmesg included
the following message:

[  240.841694] kernel BUG at 
/home/jsalisbury/bugs/lp1733662/ubuntu-artful/mm/slub.c:3878!
[  240.842765] invalid opcode: 0000 [#1] SMP
[  240.843718] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate intel_rapl_perf 
ipmi_ssif joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler 
acpi_power_meter lpc_ich shpchp acpi_pad mac_hid mei_me mei ib_iser rdma_cm 
iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure 
scsi_transport_sas crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc fnic 
mgag200 ttm hid_generic drm_kms_helper syscopyarea igb sysfillrect aesni_intel 
sysimgblt usbhid libfcoe fb_sys_fops aes_x86_64 dca hid crypto_simd 
i2c_algo_bit mxm_wmi glue_helper ptp cryptd ahci libfc libahci
[  240.851457]  drm pps_core megaraid_sas scsi_transport_fc enic wmi
[  240.852693] CPU: 8 PID: 2724 Comm: irqbalance Not tainted 4.13.0-13-generic 
#14~lp1733662Commitac2fc5adab0f4
[  240.853965] Hardware name: Cisco Systems Inc UCSC-C240-M4L/UCSC-C240-M4L, 
BIOS C240M4.2.0.10c.0.032320160820 03/23/2016
[  240.855281] task: ffff9b62a76645c0 task.stack: ffffb973cf6fc000
[  240.856603] RIP: 0010:kfree+0x11c/0x160
[  240.857937] RSP: 0018:ffffb973cf6ffa08 EFLAGS: 00010246
[  240.859280] RAX: fffff8803cff0020 RBX: ffff9b6200000000 RCX: 0000000000000000
[  240.860632] RDX: 0000000000000000 RSI: ffff9b62b0eb5348 RDI: 000064dcc0000000
[  240.861995] RBP: ffffb973cf6ffa20 R08: ffff9b62b22f70f0 R09: 0000000180220021
[  240.863367] R10: fffff8803d000000 R11: 0000000000000001 R12: ffff9b62b1648780
[  240.864756] R13: ffffffffb65dd4e0 R14: ffff9b62a872f0d8 R15: ffff9b62a872fac0
[  240.866145] FS:  00007ff8c4d06740(0000) GS:ffff9b62bf200000(0000) 
knlGS:0000000000000000
[  240.867562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  240.868986] CR2: 00007fff9ef860f8 CR3: 0000003fe7876000 CR4: 00000000001406e0
[  240.870438] Call Trace:
[  240.871882]  kfree_const+0x20/0x30
[  240.873328]  kernfs_put+0x71/0x180
[  240.874778]  kernfs_dop_release+0x12/0x20
[  240.876218]  __dentry_kill+0xe5/0x150
[  240.877644]  shrink_dentry_list+0x11f/0x2e0
[  240.879078]  d_invalidate+0x67/0x110
[  240.880526]  lookup_fast+0x2b9/0x310
[  240.881968]  ? dput.part.23+0x2d/0x1e0
[  240.883393]  walk_component+0x49/0x340
[  240.884811]  ? kernfs_iop_permission+0x4f/0x60
[  240.886253]  link_path_walk+0x1bc/0x590
[  240.887690]  ? path_init+0x177/0x2f0
[  240.889105]  path_lookupat+0x56/0x1f0
[  240.890529]  filename_lookup+0xb6/0x190
[  240.891964]  ? sprintf+0x51/0x70
[  240.893387]  ? __check_object_size+0xaf/0x1b0
[  240.894822]  ? strncpy_from_user+0x4d/0x170
[  240.896240]  user_path_at_empty+0x36/0x40
[  240.897673]  ? user_path_at_empty+0x36/0x40
[  240.899101]  vfs_statx+0x76/0xe0
[  240.900517]  SYSC_newstat+0x3d/0x70
[  240.901934]  ? ____fput+0xe/0x10
[  240.903365]  ? task_work_run+0x7b/0x90
[  240.904783]  ? exit_to_usermode_loop+0x9b/0xd0
[  240.906181]  SyS_newstat+0xe/0x10
[  240.907559]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[  240.908900] RIP: 0033:0x7ff8c3df6bb5
[  240.910196] RSP: 002b:00007ffe6cf8a928 EFLAGS: 00000246 ORIG_RAX: 
0000000000000004
[  240.911496] RAX: ffffffffffffffda RBX: 0000000000fe9a40 RCX: 00007ff8c3df6bb5
[  240.912763] RDX: 00007ffe6cf8a980 RSI: 00007ffe6cf8a980 RDI: 00007ffe6cf8c210
[  240.913985] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000039
[  240.915181] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  240.916320] R13: 00007ffe6cf8b22b R14: 0000000000fe9a40 R15: 0000000000fe92f0
[  240.917447] Code: 08 49 83 c4 18 48 89 da 4c 89 ee ff d0 49 8b 04 24 48 85 
c0 75 e6 e9 0e ff ff ff 49 8b 02 f6 c4 80 75 0a 49 8b 42 20 a8 01 75 02 <0f> 0b 
49 8b 02 31 f6 f6 c4 80 74 04 41 8b 72 6c 4c 89 d7 e8 2c 
[  240.919769] RIP: kfree+0x11c/0x160 RSP: ffffb973cf6ffa08
[  240.920909] ---[ end trace 67fe147f4dd931eb ]---

A third run produced a hang when offlining CPU 8, with the following
dmesg output:

[  352.776303] EDAC MC1: Giving out device to module sb_edac.c controller 
Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0 (INTERRUPT)
[  352.776572] EDAC sbridge: Some needed devices are missing
[  352.801614] EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#1_Ha#0: 
DEV 0000:ff:12.0
[  352.825588] EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#0_Ha#0: 
DEV 0000:7f:12.0
[  352.826090] EDAC sbridge: Couldn't find mci handler
[  352.826457] EDAC sbridge: Couldn't find mci handler
[  352.826826] EDAC sbridge: Failed to register device with error -19.
[  353.286163] BUG: unable to handle kernel paging request at 0000317865646e69
[  353.286790] IP: __kmalloc_node+0x135/0x2a0
[  353.287303] PGD 0 
[  353.287304] P4D 0 

[  353.288695] Oops: 0000 [#2] SMP
[  353.289158] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate intel_rapl_perf 
ipmi_ssif joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler 
acpi_power_meter lpc_ich shpchp acpi_pad mac_hid mei_me mei ib_iser rdma_cm 
iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure 
scsi_transport_sas crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc fnic 
mgag200 ttm hid_generic drm_kms_helper syscopyarea igb sysfillrect aesni_intel 
sysimgblt usbhid libfcoe fb_sys_fops aes_x86_64 dca hid crypto_simd 
i2c_algo_bit mxm_wmi glue_helper ptp cryptd ahci libfc libahci
[  353.294318]  drm pps_core megaraid_sas scsi_transport_fc enic wmi
[  353.295246] CPU: 8 PID: 56 Comm: cpuhp/8 Tainted: G      D         
4.13.0-13-generic #14~lp1733662Commitac2fc5adab0f4
[  353.296231] Hardware name: Cisco Systems Inc UCSC-C240-M4L/UCSC-C240-M4L, 
BIOS C240M4.2.0.10c.0.032320160820 03/23/2016
[  353.297274] task: ffff9b62b8fc0000 task.stack: ffffb973cc780000
[  353.298341] RIP: 0010:__kmalloc_node+0x135/0x2a0
[  353.299416] RSP: 0018:ffffb973cc783bb0 EFLAGS: 00010246
[  353.300511] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000008a2
[  353.301652] RDX: 00000000000008a1 RSI: 0000000000000000 RDI: 000000000001f3e0
[  353.302793] RBP: ffffb973cc783bf0 R08: ffff9b62bf21f3e0 R09: ffff9b42bf807c00
[  353.303960] R10: 000000000000024c R11: 0000000000020dd1 R12: 00000000014080c0
[  353.305155] R13: 0000000000000008 R14: 0000317865646e69 R15: ffff9b42bf807c00
[  353.306379] FS:  0000000000000000(0000) GS:ffff9b62bf200000(0000) 
knlGS:0000000000000000
[  353.307637] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  353.308901] CR2: 0000317865646e69 CR3: 0000002343409000 CR4: 00000000001406e0
[  353.310205] Call Trace:
[  353.311531]  ? alloc_cpumask_var_node+0x1f/0x30
[  353.312881]  alloc_cpumask_var_node+0x1f/0x30
[  353.314245]  zalloc_cpumask_var+0x14/0x20
[  353.315616]  cpudl_init+0x6a/0xe0
[  353.316992]  init_rootdomain+0x7a/0xd0
[  353.318393]  build_sched_domains+0x26a/0xdd0
[  353.319817]  ? call_rcu_sched+0x17/0x20
[  353.321249]  ? cpu_attach_domain+0x1af/0x6a0
[  353.322698]  ? kfree+0x14a/0x160
[  353.324146]  partition_sched_domains+0x1c6/0x2f0
[  353.325623]  ? sched_cpu_activate+0xd0/0xd0
[  353.327122]  cpuset_update_active_cpus+0x17/0x40
[  353.328583]  sched_cpu_deactivate+0x94/0xd0
[  353.330052]  ? call_rcu_bh+0x20/0x20
[  353.331495]  ? call_rcu_bh+0x20/0x20
[  353.332894]  ? trace_raw_output_rcu_utilization+0x50/0x50
[  353.334320]  ? pick_next_task_fair+0x48e/0x560
[  353.335736]  cpuhp_invoke_callback+0x84/0x3b0
[  353.337164]  cpuhp_down_callbacks+0x42/0x80
[  353.338579]  cpuhp_thread_fun+0x88/0xe0
[  353.339971]  smpboot_thread_fn+0xec/0x160
[  353.341346]  kthread+0x125/0x140
[  353.342723]  ? sort_range+0x30/0x30
[  353.344106]  ? kthread_create_on_node+0x70/0x70
[  353.345521]  ret_from_fork+0x25/0x30
[  353.346928] Code: 89 cf 4c 89 4d c0 e8 0b 7f 01 00 49 89 c7 4c 8b 4d c0 4d 
85 ff 0f 85 47 ff ff ff 45 31 f6 eb 3c 49 63 47 20 49 8b 3f 48 8d 4a 01 <49> 8b 
1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 20 ff 
[  353.349833] RIP: __kmalloc_node+0x135/0x2a0 RSP: ffffb973cc783bb0
[  353.351218] CR2: 0000317865646e69
[  353.352559] ---[ end trace 67fe147f4dd931ec ]---
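
The faulting address in this trace (CR2 and R14, 0000317865646e69) does
not look like a kernel pointer. Read as little-endian bytes it is
printable ASCII, which may mean that a slab freelist pointer was
overwritten with string data. That is only one possible reading of the
oops, not a confirmed diagnosis. A minimal Python sketch of the
decoding follows; the helper name decode_le_ascii is hypothetical:

```python
def decode_le_ascii(value, width=8):
    """Interpret an integer as little-endian bytes and return the ASCII text.

    Trailing NUL bytes are dropped; non-ASCII bytes are replaced rather
    than raising, since an arbitrary address need not be text at all.
    """
    raw = value.to_bytes(width, "little")
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


# Faulting address taken from the CR2 line of the trace above.
fault_addr = 0x0000317865646E69
print(decode_le_ascii(fault_addr))  # prints "index1"
```

The decoded string matches the name of the per-CPU cache directories in
sysfs (for example cpu8/cache/index1), which would be consistent with
the kernfs_put/kfree_const path in the first trace, but this has not
been verified.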

Although the test script hung, I was able to continue using my other
terminal normally, run other programs, log out, log back in, etc. An
attempt to reboot ("sudo shutdown -h now") did not succeed; the system
hung with "[ OK ] Stopped target Multi-User System" on the console.
After forcing a restart via the BMC, I ran the test script again. It
completed one run but hung on the second, and the system had only
limited functionality thereafter. The dmesg output on the second run
included the following:

[  103.752641] ------------[ cut here ]------------
[  103.752643] kernel BUG at 
/home/jsalisbury/bugs/lp1733662/ubuntu-artful/mm/slub.c:3878!
[  103.753548] invalid opcode: 0000 [#1] SMP
[  103.754440] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal 
intel_powerclamp ipmi_ssif coretemp joydev input_leds intel_cstate ipmi_si 
intel_rapl_perf mei_me ipmi_devintf ipmi_msghandler kvm_intel kvm irqbypass mei 
mac_hid shpchp acpi_power_meter lpc_ich acpi_pad ib_iser rdma_cm iw_cm ib_cm 
ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure 
scsi_transport_sas crct10dif_pclmul mgag200 crc32_pclmul igb ttm hid_generic 
ghash_clmulni_intel drm_kms_helper fnic pcbc usbhid dca syscopyarea aesni_intel 
sysfillrect i2c_algo_bit sysimgblt fb_sys_fops hid libfcoe aes_x86_64 ahci ptp 
crypto_simd libfc glue_helper mxm_wmi cryptd drm
[  103.762134]  libahci pps_core enic scsi_transport_fc megaraid_sas wmi
[  103.763369] CPU: 0 PID: 3649 Comm: python3 Not tainted 4.13.0-13-generic 
#14~lp1733662Commitac2fc5adab0f4
[  103.764641] Hardware name: Cisco Systems Inc UCSC-C240-M4L/UCSC-C240-M4L, 
BIOS C240M4.2.0.10c.0.032320160820 03/23/2016
[  103.765948] task: ffff8e90a5999740 task.stack: ffff9dbb4e320000
[  103.767263] RIP: 0010:kfree+0x11c/0x160
[  103.768601] RSP: 0018:ffff9dbb4e323cb0 EFLAGS: 00010246
[  103.769941] RAX: fffffa5b3cff0020 RBX: ffff8eb000000000 RCX: 0000000000000000
[  103.771301] RDX: 0000000000000000 RSI: 0000000000000028 RDI: 0000718ec0000000
[  103.772663] RBP: ffff9dbb4e323cc8 R08: dead000000000100 R09: ffffffff985ed7a8
[  103.774049] R10: fffffa5b3d000000 R11: 0000000000000000 R12: 0000000000000028
[  103.775426] R13: ffffffff97eead09 R14: 000000000000000a R15: ffffffff977143f0
[  103.776809] FS:  00007f1e1c29f700(0000) GS:ffff8e90bfc00000(0000) 
knlGS:0000000000000000
[  103.778214] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  103.779645] CR2: 000055be9d7243a8 CR3: 0000003ff74a3000 CR4: 00000000001406f0
[  103.781094] Call Trace:
[  103.782527]  free_cpumask_var+0x9/0x10
[  103.783961]  smpcfd_dead_cpu+0x24/0x40
[  103.785415]  cpuhp_invoke_callback+0x84/0x3b0
[  103.786859]  ? flow_cache_lookup+0x4c0/0x4c0
[  103.788303]  cpuhp_down_callbacks+0x42/0x80
[  103.789745]  _cpu_down+0xc2/0x100
[  103.791191]  do_cpu_down+0x33/0x50
[  103.792624]  cpu_down+0x10/0x20
[  103.794056]  cpu_subsys_offline+0x14/0x20
[  103.795492]  device_offline+0x73/0xc0
[  103.796926]  online_store+0x4c/0xa0
[  103.798351]  dev_attr_store+0x18/0x30
[  103.799779]  sysfs_kf_write+0x37/0x40
[  103.801201]  kernfs_fop_write+0x11c/0x1a0
[  103.802634]  __vfs_write+0x18/0x40
[  103.804065]  vfs_write+0xb1/0x1a0
[  103.805485]  SyS_write+0x55/0xc0
[  103.806888]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[  103.808310] RIP: 0033:0x7f1e1be7f4a0
[  103.809730] RSP: 002b:00007ffc4ead2768 EFLAGS: 00000246 ORIG_RAX: 
0000000000000001
[  103.811181] RAX: ffffffffffffffda RBX: 0000000001d8b410 RCX: 00007f1e1be7f4a0
[  103.812648] RDX: 0000000000000002 RSI: 0000000001ea1060 RDI: 0000000000000003
[  103.814122] RBP: 0000000000a3e020 R08: 0000000000000000 R09: 0000000000000001
[  103.815600] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
[  103.817048] R13: 0000000000501520 R14: 00007ffc4ead2bd0 R15: 00007f1e1ad98240
[  103.818475] Code: 08 49 83 c4 18 48 89 da 4c 89 ee ff d0 49 8b 04 24 48 85 
c0 75 e6 e9 0e ff ff ff 49 8b 02 f6 c4 80 75 0a 49 8b 42 20 a8 01 75 02 <0f> 0b 
49 8b 02 31 f6 f6 c4 80 74 04 41 8b 72 6c 4c 89 d7 e8 2c 
[  103.821390] RIP: kfree+0x11c/0x160 RSP: ffff9dbb4e323cb0
[  103.822826] ---[ end trace 7c1d545f713a5ad1 ]---
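
The call trace above runs from SyS_write through kernfs_fop_write and
online_store into cpu_down, which is consistent with the python3 process
writing to a CPU's sysfs "online" attribute. A minimal sketch of that
kind of write, assuming the standard /sys/devices/system/cpu/cpuN/online
path; the function name and the sysfs_root parameter are hypothetical
and are not taken from the actual test script:

```python
import os


def set_cpu_online(cpu, online, sysfs_root="/sys/devices/system/cpu"):
    """Write "1" or "0" to cpu<N>/online and return the path written.

    On a real system this needs root, and CPU 0 often has no "online"
    file. sysfs_root is a parameter only so the function can be tried
    against a scratch directory instead of the live sysfs tree.
    """
    path = os.path.join(sysfs_root, f"cpu{cpu}", "online")
    with open(path, "w") as f:
        f.write("1\n" if online else "0\n")
    return path
```

Calling set_cpu_online(8, False) would request that CPU 8 be taken
offline, and set_cpu_online(8, True) would bring it back.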

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1733662

Title:
  System hang with Linux kernel 4.13, not with 4.10

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1733662/+subscriptions
