[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
Please upgrade to linux-hwe kernel version 4.13 and re-open this bug if you are able to reproduce it.

** Changed in: linux (Ubuntu)
       Status: In Progress => Won't Fix

** Changed in: ubuntu-power-systems
       Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1733864

Title:
  kernel 4.10.0-40 is hanging with a CPU soft lock

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1733864/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
** Tags added: kernel-da-key
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
** Changed in: ubuntu-power-systems
   Importance: Critical => High
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
** Tags removed: severity-critical
** Tags added: severity-high
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
After experiencing the issue on the guest, I turned on SMT again and the threads look fine:

$ sudo ppc64_cpu --smt=8
$ sudo ppc64_cpu --info
Core  0:    0*    1*    2*    3*    4*    5*    6*    7*
Core  1:    8*    9*   10*   11*   12*   13*   14*   15*
Core  2:   16*   17*   18*   19*   20*   21*   22*   23*
Core  3:   24*   25*   26*   27*   28*   29*   30*   31*
Core  4:   32*   33*   34*   35*   36*   37*   38*   39*
Core  5:   40*   41*   42*   43*   44*   45*   46*   47*
Core  6:   48*   49*   50*   51*   52*   53*   54*   55*
Core  7:   56*   57*   58*   59*   60*   61*   62*   63*
Core  8:   64*   65*   66*   67*   68*   69*   70*   71*
Core  9:   72*   73*   74*   75*   76*   77*   78*   79*
Core 10:   80*   81*   82*   83*   84*   85*   86*   87*
Core 11:   88*   89*   90*   91*   92*   93*   94*   95*
Core 12:   96*   97*   98*   99*  100*  101*  102*  103*
Core 13:  104*  105*  106*  107*  108*  109*  110*  111*
Core 14:  112*  113*  114*  115*  116*  117*  118*  119*
Core 15:  120*  121*  122*  123*  124*  125*  126*  127*
Core 16:  128*  129*  130*  131*  132*  133*  134*  135*
Core 17:  136*  137*  138*  139*  140*  141*  142*  143*
Core 18:  144*  145*  146*  147*  148*  149*  150*  151*
Core 19:  152*  153*  154*  155*  156*  157*  158*  159*
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
** Tags removed: targetmilestone-inin---
** Tags added: targetmilestone-inin16044
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
** Changed in: ubuntu-power-systems
       Status: New => In Progress
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
Hi Diego,

What's the processor type of the host system?

Thanks,
Kleber
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
The problem seems to be related to how qemu allocates the virtual cores and threads to the VM. Starting the VMs with the following command, I cannot reproduce the issue:

$ qemu-system-ppc64le -m 1024 -smp 16,threads=8 -nographic -net nic,model=virtio -net user,hostfwd=tcp::10022-:22 -drive file=./autopkgtest-xenial-ppc64el.img -enable-kvm

As long as the number of cpus and threads provided to the -smp parameter follows the topology of the p8 processor, making qemu use more than one core only if the number of threads is a multiple of 8, I can't reproduce the hang. If the parameters provided make qemu spread the virtual threads over more than one physical cpu without using all 8 threads (e.g. '-smp 2', '-smp 4,threads=2', '-smp 8,threads=4'), then the guest kernel hangs at some point.

This is the environment being tested:

Machine type-model: 8247-22L
Host:
  Artful
  kernel 4.13.0-17-generic
  qemu 2.10+dfsg-0ubuntu3
Guests: Xenial, Zesty and Artful with latest kernel from -updates.

Could someone from IBM please confirm whether this is a known or expected behavior? This issue seems to have been around for a long time, and it's odd that it hasn't come up before.
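
For anyone who wants to check an -smp value against the observation above before starting a guest, here is a minimal sketch. It only encodes the pattern reported in this comment (more than one core is fine only when threads is a multiple of 8); it is not a rule taken from qemu or the kernel. The function names and the constant are hypothetical, and the parser handles only the cpus and threads fields of -smp, ignoring anything else.

```python
# Hypothetical helper: encodes the observation from this bug comment only.
THREADS_PER_P8_CORE = 8  # SMT8, as on the POWER8 host in the report


def parse_smp(arg):
    """Parse an -smp value such as '16,threads=8' into (cpus, threads).

    Only the CPU count and 'threads=' are read; other fields are ignored.
    threads defaults to 1 when not given (an assumption of this sketch).
    """
    cpus, threads = None, 1
    for field in arg.split(","):
        key, sep, value = field.partition("=")
        if not sep:
            cpus = int(key)          # bare number, e.g. '16'
        elif key == "cpus":
            cpus = int(value)        # explicit 'cpus=16'
        elif key == "threads":
            threads = int(value)
    if cpus is None:
        raise ValueError("no CPU count in -smp value: %r" % arg)
    return cpus, threads


def follows_p8_topology(arg):
    """True if the -smp value stays within the pattern that did not hang."""
    cpus, threads = parse_smp(arg)
    cores = -(-cpus // threads)      # ceiling division: virtual cores used
    return cores == 1 or threads % THREADS_PER_P8_CORE == 0


if __name__ == "__main__":
    for value in ("16,threads=8", "2", "4,threads=2", "8,threads=4"):
        print(value, follows_p8_topology(value))
```

With the values from this comment, '16,threads=8' is accepted and '2', '4,threads=2' and '8,threads=4' are flagged, matching the cases where the hang was and was not reproduced.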
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
I also found that if I run an Artful VM on the same hypervisor, I do not see this problem with the Artful kernel. So, from what I see, it seems to be a problem in kernel 4.10 that is exposed more readily on a KVM host running kernel 4.13.

This is a better log that I was able to capture:

[ 32.029274] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [swapper/10:0]
[ 32.029402] Modules linked in: vmx_crypto kvm ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ibmvscsi ibmveth crc32c_vpmsum virtio_blk
[ 32.029464] CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-383-generic #383
[ 32.029466] task: c007f7663c00 task.stack: c007fe158000
[ 32.029467] NIP: c00163c4 LR: c00163c4 CTR: 0006
[ 32.029467] REGS: c007fe15b590 TRAP: 0901 Not tainted (4.10.0-383-generic)
[ 32.029468] MSR: 80009033
[ 32.029472] CR: 28002824 XER:
[ 32.029472] CFAR: c01894e4 SOFTE: 1
GPR00: c018995c c007fe15b810 c14ac900 0900
GPR04: c010bc30 c007ffda2980 8000 c007fe15b8c8
GPR08: 0600 0100 c007ffd8e128
GPR12: c09b8450 c7b85a00
[ 32.029483] NIP [c00163c4] arch_local_irq_restore+0x74/0x90
[ 32.029484] LR [c00163c4] arch_local_irq_restore+0x74/0x90
[ 32.029485] Call Trace:
[ 32.029486] [c007fe15b810] [fffee2c1] 0xfffee2c1 (unreliable)
[ 32.029488] [c007fe15b830] [c018995c] expire_timers+0x13c/0x210
[ 32.029490] [c007fe15b8a0] [c0189bd8] run_timer_softirq+0x1a8/0x230
[ 32.029492] [c007fe15b940] [c0bae11c] __do_softirq+0x19c/0x3fc
[ 32.029494] [c007fe15ba30] [c00f28c8] irq_exit+0xe8/0x120
[ 32.029496] [c007fe15ba50] [c00250d4] timer_interrupt+0xa4/0xe0
[ 32.029498] [c007fe15ba80] [c00090d4] decrementer_common+0x114/0x120
[ 32.029501] --- interrupt: 901 at plpar_hcall_norets+0x1c/0x28
                 LR = check_and_cede_processor+0x38/0x50
[ 32.029502] [c007fe15bd70] [c09b80d4] check_and_cede_processor+0x24/0x50 (unreliable)
[ 32.029504] [c007fe15bdd0] [c09b84a4] shared_cede_loop+0x54/0x150
[ 32.029506] [c007fe15be00] [c09b53ec] cpuidle_enter_state+0x17c/0x450
[ 32.029507] [c007fe15be60] [c01519a0] call_cpuidle+0x50/0xa0
[ 32.029509] [c007fe15be80] [c0151ec0] do_idle+0x2d0/0x340
[ 32.029510] [c007fe15bf00] [c01521c0] cpu_startup_entry+0x40/0x60
[ 32.029512] [c007fe15bf30] [c0047200] start_secondary+0x340/0x390
[ 32.029514] [c007fe15bf90] [c000aa6c] start_secondary_prolog+0x10/0x14
[ 32.029515] Instruction dump:
[ 32.029517] 994d01d2 2fa3 409e0024 e92d0020 61298000 7d210164 38210020 e8010010
[ 32.029520] 7c0803a6 4e800020 6042 4bff42e5 <6000> 4be4 6042 e92d0020
[ 34.073274] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 34.073318] 10-...: (8 GPs behind) idle=ab9/1/0 softirq=703/703 fqs=2514
[ 34.073352] (detected by 13, t=5252 jiffies, g=-67, c=-68, q=6813)
[ 34.073386] Task dump for CPU 10:
[ 34.073387] swapper/10 R running task 0 0 1 0x0804
[ 34.073389] Call Trace:
[ 34.073391] [c007fe15bb20] [0001] 0x1 (unreliable)
[ 97.093266] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 97.093310] 10-...: (8 GPs behind) idle=ab9/1/0 softirq=703/703 fqs=10197
[ 97.093342] (detected by 13, t=21007 jiffies, g=-67, c=-68, q=169213)
[ 97.093378] Task dump for CPU 10:
[ 97.093379] swapper/10 R running task 0 0 1 0x0804
[ 97.093382] Call Trace:
[ 97.093384] [c007fe15bb20] [0001] 0x1 (unreliable)
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
I found something interesting. The original dump above was reproduced with kernel 4.4.0-94-generic in the hypervisor. It is very hard to reproduce this issue on that hypervisor (I only reproduce it once in a while). I migrated to a machine running kernel 4.13.0-17-generic, and there the problem reproduces instantaneously. That said, I understand that the problem has something to do with the hypervisor kernel version.
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
I can reproduce the issue even going back to Zesty kernel 4.10.0-20-generic on the guest. I will continue going back to try to identify the first bad version.
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
Looking at an email thread titled "RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?" [1], it seems that this problem is very similar to the one discussed there. The final fix seems to be commit id 2fe59f507a65dbd734b990a11ebc7488f6f87a24.

[1] http://linuxppc.10917.n7.nabble.com/Re-RCU-lockup-issues-when-CONFIG-SOFTLOCKUP-DETECTOR-n-any-one-else-seeing-this-tt125634.html#none
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
I can reproduce the problem if I provide nr_cpu > 1 to the qemu -smp flag.
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu)
       Status: New => In Progress

** Changed in: linux (Ubuntu)
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) => Kleber Sacilotto de Souza (kleber-souza)
[Bug 1733864] Re: kernel 4.10.0-40 is hanging with a CPU soft lock
** Also affects: ubuntu-power-systems
   Importance: Undecided
       Status: New

** Changed in: ubuntu-power-systems
     Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

** Changed in: ubuntu-power-systems
   Importance: Undecided => Critical

** Tags added: triage-g