Re: kvm crash on 5.7-rc1 and later
Woody Suwalski wrote:
> Xiaoyao Li wrote:
> > On 7/12/2020 2:21 AM, Peter Zijlstra wrote:
> > > On Fri, Jul 03, 2020 at 11:15:31AM -0400, Woody Suwalski wrote:
> > > > I am observing a 100% reproducible kvm crash on kernels starting with
> > > > 5.7-rc1, always with the same opcode .
> > > > It happens during wake up from the host suspended state. Worked OK on 5.6
> > > > and older.
> > > > The host is based on Debian testing, Thinkpad T440, i5 cpu.
> > > >
> > > > [ 61.576664] kernel BUG at arch/x86/kvm/x86.c:387!
> > > > [ 61.576672] invalid opcode: [#1] PREEMPT SMP NOPTI
> > > > [ 61.576678] CPU: 0 PID: 3851 Comm: qemu-system-x86 Not tainted 5.7-pingu #0
> > > > [ 61.576680] Hardware name: LENOVO 20B6005JUS/20B6005JUS, BIOS GJETA4WW (2.54 ) 03/27/2020
> > > > [ 61.576700] RIP: 0010:kvm_spurious_fault+0xa/0x10 [kvm]
> > > >
> > > > Crash results in a dead kvm and occasionally a very unstable system.
> > > >
> > > > Bisecting the problem between v5.6 and v5.7-rc1 points to
> > > >
> > > > commit 6650cdd9a8ccf00555dbbe743d58541ad8feb6a7
> > > > Author: Peter Zijlstra (Intel)
> > > > Date: Sun Jan 26 12:05:35 2020 -0800
> > > >
> > > > x86/split_lock: Enable split lock detection by kernel
> > > >
> > > > Reversing that patch seems to actually "cure" the issue.
> > > >
> > > > The problem is present in all kernels past 5.7-rc1, however the patch is not
> > > > reversing directly in later source trees, so can not retest the logic on
> > > > recent kernels.
> > > >
> > > > Peter, would you have idea how to debug that (or even better - would you
> > > > happen to know the fix)?
> > > >
> > > > I have attached dmesg logs from a "good" 5.6.9 kernel, and then "bad" 5.7.0
> > > > and 5.8-rc3
> > >
> > > I have no clue about kvm. Nor do I actually have hardware with SLD on.
> > > I've Cc'ed a bunch of folks who might have more ideas.
> >
> > I think this bug is the same as the one found by Sean, and is already
> > fixed in 5.8-rc4.
> >
> > https://lore.kernel.org/kvm/20200605192605.7439-1-sean.j.christopher...@intel.com/
>
> You are right, kvm works OK on 5.8-rc4.
> The fix will need to be backported to 5.7.
>
> Thanks, Woody

I see it is already in 5.7.8. Great :-)

Thanks, Woody
Re: kvm crash on 5.7-rc1 and later
Xiaoyao Li wrote:
> On 7/12/2020 2:21 AM, Peter Zijlstra wrote:
> > On Fri, Jul 03, 2020 at 11:15:31AM -0400, Woody Suwalski wrote:
> > > I am observing a 100% reproducible kvm crash on kernels starting with
> > > 5.7-rc1, always with the same opcode .
> > > It happens during wake up from the host suspended state. Worked OK on 5.6
> > > and older.
> > > The host is based on Debian testing, Thinkpad T440, i5 cpu.
> > >
> > > [ 61.576664] kernel BUG at arch/x86/kvm/x86.c:387!
> > > [ 61.576672] invalid opcode: [#1] PREEMPT SMP NOPTI
> > > [ 61.576678] CPU: 0 PID: 3851 Comm: qemu-system-x86 Not tainted 5.7-pingu #0
> > > [ 61.576680] Hardware name: LENOVO 20B6005JUS/20B6005JUS, BIOS GJETA4WW (2.54 ) 03/27/2020
> > > [ 61.576700] RIP: 0010:kvm_spurious_fault+0xa/0x10 [kvm]
> > >
> > > Crash results in a dead kvm and occasionally a very unstable system.
> > >
> > > Bisecting the problem between v5.6 and v5.7-rc1 points to
> > >
> > > commit 6650cdd9a8ccf00555dbbe743d58541ad8feb6a7
> > > Author: Peter Zijlstra (Intel)
> > > Date: Sun Jan 26 12:05:35 2020 -0800
> > >
> > > x86/split_lock: Enable split lock detection by kernel
> > >
> > > Reversing that patch seems to actually "cure" the issue.
> > >
> > > The problem is present in all kernels past 5.7-rc1, however the patch is not
> > > reversing directly in later source trees, so can not retest the logic on
> > > recent kernels.
> > >
> > > Peter, would you have idea how to debug that (or even better - would you
> > > happen to know the fix)?
> > >
> > > I have attached dmesg logs from a "good" 5.6.9 kernel, and then "bad" 5.7.0
> > > and 5.8-rc3
> >
> > I have no clue about kvm. Nor do I actually have hardware with SLD on.
> > I've Cc'ed a bunch of folks who might have more ideas.
>
> I think this bug is the same as the one found by Sean, and is already
> fixed in 5.8-rc4.
>
> https://lore.kernel.org/kvm/20200605192605.7439-1-sean.j.christopher...@intel.com/

You are right, kvm works OK on 5.8-rc4.
The fix will need to be backported to 5.7.

Thanks, Woody
Re: kvm crash on 5.7-rc1 and later
On 7/12/2020 2:21 AM, Peter Zijlstra wrote:
> On Fri, Jul 03, 2020 at 11:15:31AM -0400, Woody Suwalski wrote:
> > I am observing a 100% reproducible kvm crash on kernels starting with
> > 5.7-rc1, always with the same opcode .
> > It happens during wake up from the host suspended state. Worked OK on 5.6
> > and older.
> > The host is based on Debian testing, Thinkpad T440, i5 cpu.
> >
> > [ 61.576664] kernel BUG at arch/x86/kvm/x86.c:387!
> > [ 61.576672] invalid opcode: [#1] PREEMPT SMP NOPTI
> > [ 61.576678] CPU: 0 PID: 3851 Comm: qemu-system-x86 Not tainted 5.7-pingu #0
> > [ 61.576680] Hardware name: LENOVO 20B6005JUS/20B6005JUS, BIOS GJETA4WW (2.54 ) 03/27/2020
> > [ 61.576700] RIP: 0010:kvm_spurious_fault+0xa/0x10 [kvm]
> >
> > Crash results in a dead kvm and occasionally a very unstable system.
> >
> > Bisecting the problem between v5.6 and v5.7-rc1 points to
> >
> > commit 6650cdd9a8ccf00555dbbe743d58541ad8feb6a7
> > Author: Peter Zijlstra (Intel)
> > Date: Sun Jan 26 12:05:35 2020 -0800
> >
> > x86/split_lock: Enable split lock detection by kernel
> >
> > Reversing that patch seems to actually "cure" the issue.
> >
> > The problem is present in all kernels past 5.7-rc1, however the patch is not
> > reversing directly in later source trees, so can not retest the logic on
> > recent kernels.
> >
> > Peter, would you have idea how to debug that (or even better - would you
> > happen to know the fix)?
> >
> > I have attached dmesg logs from a "good" 5.6.9 kernel, and then "bad" 5.7.0
> > and 5.8-rc3
>
> I have no clue about kvm. Nor do I actually have hardware with SLD on.
> I've Cc'ed a bunch of folks who might have more ideas.

I think this bug is the same as the one found by Sean, and is already
fixed in 5.8-rc4.

https://lore.kernel.org/kvm/20200605192605.7439-1-sean.j.christopher...@intel.com/
Re: kvm crash on 5.7-rc1 and later
On Fri, Jul 03, 2020 at 11:15:31AM -0400, Woody Suwalski wrote:
> I am observing a 100% reproducible kvm crash on kernels starting with
> 5.7-rc1, always with the same opcode .
> It happens during wake up from the host suspended state. Worked OK on 5.6
> and older.
> The host is based on Debian testing, Thinkpad T440, i5 cpu.
>
> [ 61.576664] kernel BUG at arch/x86/kvm/x86.c:387!
> [ 61.576672] invalid opcode: [#1] PREEMPT SMP NOPTI
> [ 61.576678] CPU: 0 PID: 3851 Comm: qemu-system-x86 Not tainted 5.7-pingu #0
> [ 61.576680] Hardware name: LENOVO 20B6005JUS/20B6005JUS, BIOS GJETA4WW (2.54 ) 03/27/2020
> [ 61.576700] RIP: 0010:kvm_spurious_fault+0xa/0x10 [kvm]
>
> Crash results in a dead kvm and occasionally a very unstable system.
>
> Bisecting the problem between v5.6 and v5.7-rc1 points to
>
> commit 6650cdd9a8ccf00555dbbe743d58541ad8feb6a7
> Author: Peter Zijlstra (Intel)
> Date: Sun Jan 26 12:05:35 2020 -0800
>
> x86/split_lock: Enable split lock detection by kernel
>
> Reversing that patch seems to actually "cure" the issue.
>
> The problem is present in all kernels past 5.7-rc1, however the patch is not
> reversing directly in later source trees, so can not retest the logic on
> recent kernels.
>
> Peter, would you have idea how to debug that (or even better - would you
> happen to know the fix)?
>
> I have attached dmesg logs from a "good" 5.6.9 kernel, and then "bad" 5.7.0
> and 5.8-rc3

I have no clue about kvm. Nor do I actually have hardware with SLD on.
I've Cc'ed a bunch of folks who might have more ideas.