Re: kvm crash on 5.7-rc1 and later

2020-07-12 Thread Woody Suwalski

Woody Suwalski wrote:

> You are right, kvm works OK on 5.8-rc4.
> The fix will need to be backported to 5.7.
>
> Thanks, Woody


I see it is already in 5.7.8. Great :-)
Thanks, Woody
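The thread gives enough data points to sketch a quick check of whether a host kernel release should carry the fix: 5.6.9 good, 5.7.0 and 5.8-rc3 bad, 5.8-rc4 good, and the fix present in 5.7.8. This is a minimal sketch under stated assumptions. The helper names are hypothetical, 5.7.8 and 5.8-rc4 are treated as the first fixed releases (the thread only says the fix is present there), and a version-only check cannot account for distribution kernels that backport patches on their own.

```python
import re

# Hypothetical helper for illustration only; not part of any kernel tooling.
# Boundaries come from this thread and are assumptions:
#   - regression introduced in v5.7-rc1 (commit 6650cdd9a8cc...)
#   - fix present in 5.8-rc4 (mainline) and 5.7.8 (stable); treating these
#     as the *first* fixed releases is not confirmed by the thread.

_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?")
_FINAL = 10**6  # sentinel so that 5.8-rcN sorts before the final 5.8


def parse_release(release):
    """Map '5.8-rc3', '5.7.8' or '5.7-pingu' to a sortable tuple.

    Local suffixes such as '-pingu' are ignored. Stable -rc tags
    (e.g. '5.7.8-rc1') are not handled.
    """
    m = _RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, sublevel, rc = m.groups()
    rc_key = int(rc) if rc is not None else _FINAL
    return (int(major), int(minor), rc_key, int(sublevel or 0))


INTRODUCED = parse_release("5.7-rc1")
FIXED_5_7_STABLE = parse_release("5.7.8")
FIXED_MAINLINE = parse_release("5.8-rc4")


def resume_crash_status(release):
    """Classify a release as 'unaffected', 'affected' or 'fixed'."""
    v = parse_release(release)
    if v < INTRODUCED:
        return "unaffected"
    if v[:2] == (5, 7):
        # 5.7-rcN and 5.7.x: only the assumed stable backport counts as fixed
        return "fixed" if v >= FIXED_5_7_STABLE else "affected"
    return "fixed" if v >= FIXED_MAINLINE else "affected"


for rel in ["5.6.9", "5.7.0", "5.7-pingu", "5.7.8", "5.8-rc3", "5.8-rc4"]:
    print(rel, resume_crash_status(rel))
```

Ordering release candidates before the final release (5.8-rc4 < 5.8 < 5.8.1) is what lets a plain tuple comparison do the work.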



Re: kvm crash on 5.7-rc1 and later

2020-07-12 Thread Woody Suwalski

Xiaoyao Li wrote:

> I think this bug is the same as the one found by Sean, and is already
> fixed in 5.8-rc4.
>
> https://lore.kernel.org/kvm/20200605192605.7439-1-sean.j.christopher...@intel.com/



You are right, kvm works OK on 5.8-rc4.
The fix will need to be backported to 5.7.

Thanks, Woody



Re: kvm crash on 5.7-rc1 and later

2020-07-12 Thread Xiaoyao Li

On 7/12/2020 2:21 AM, Peter Zijlstra wrote:

> I have no clue about kvm. Nor do I actually have hardware with SLD on.
> I've Cc'ed a bunch of folks who might have more ideas.



I think this bug is the same as the one found by Sean, and is already 
fixed in 5.8-rc4.


https://lore.kernel.org/kvm/20200605192605.7439-1-sean.j.christopher...@intel.com/


Re: kvm crash on 5.7-rc1 and later

2020-07-11 Thread Peter Zijlstra
On Fri, Jul 03, 2020 at 11:15:31AM -0400, Woody Suwalski wrote:
> I am observing a 100% reproducible kvm crash on kernels starting with
> 5.7-rc1, always with the same opcode .
> It happens during wake up from the host suspended state. Worked OK on 5.6
> and older.
> The host is based on Debian testing: ThinkPad T440, i5 CPU.
> 
> [   61.576664] kernel BUG at arch/x86/kvm/x86.c:387!
> [   61.576672] invalid opcode:  [#1] PREEMPT SMP NOPTI
> [   61.576678] CPU: 0 PID: 3851 Comm: qemu-system-x86 Not tainted 5.7-pingu #0
> [   61.576680] Hardware name: LENOVO 20B6005JUS/20B6005JUS, BIOS GJETA4WW (2.54 ) 03/27/2020
> [   61.576700] RIP: 0010:kvm_spurious_fault+0xa/0x10 [kvm]
> 
> Crash results in a dead kvm and occasionally a very unstable system.
> 
> Bisecting the problem between v5.6 and v5.7-rc1 points to
> 
> commit 6650cdd9a8ccf00555dbbe743d58541ad8feb6a7
> Author: Peter Zijlstra (Intel) 
> Date:   Sun Jan 26 12:05:35 2020 -0800
> 
>     x86/split_lock: Enable split lock detection by kernel
> 
> Reverting that patch seems to actually "cure" the issue.
> 
> The problem is present in all kernels past 5.7-rc1; however, the patch does
> not revert cleanly on later source trees, so I cannot retest the logic on
> recent kernels.
> 
> Peter, would you have an idea how to debug that (or, even better, would you
> happen to know the fix)?
> 
> I have attached dmesg logs from a "good" 5.6.9 kernel, and then "bad" 5.7.0
> and 5.8-rc3

I have no clue about kvm. Nor do I actually have hardware with SLD on.
I've Cc'ed a bunch of folks who might have more ideas.
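Bisecting v5.6 (good) against v5.7-rc1 (bad), as in the report quoted above, is a binary search: each test build roughly halves the remaining range, so about log2(N) builds isolate the first bad commit among N candidates. This is a minimal sketch assuming a linear history; real `git bisect` walks the commit graph including merges, and the commit list and `is_bad` callback below are hypothetical stand-ins for "build, boot, suspend, resume, start a VM".

```python
def first_bad(commits, is_bad):
    """Binary-search a linear commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad. Returns (commit, tests_run).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # first bad commit is at mid or earlier
        else:
            lo = mid  # first bad commit is after mid
    return commits[hi], tests


# Hypothetical history: placeholder names, with the suspect commit at index 3.
commits = ["v5.6", "c1", "c2", "6650cdd9a8cc", "c4", "c5", "v5.7-rc1"]


def is_bad(commit):
    # Stand-in for a real test run; here every commit from index 3 on is bad.
    return commits.index(commit) >= 3


culprit, tests = first_bad(commits, is_bad)
print(culprit, tests)
```

The endpoints are never tested inside the loop because the bisection starts from a known-good and a known-bad release, exactly as the report does with v5.6 and v5.7-rc1.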