Re: [Xen-devel] Recent cores-scheduling failures

2020-01-15 Thread Sergey Dyasli
On 19/12/2019 16:14, Jürgen Groß wrote:
> On 19.12.19 13:45, Sergey Dyasli wrote:
>> Hi Juergen,
>>
>> We recently did another quick test of core scheduling mode, and the following
>> failures were found:
>>
>> 1. live-patch apply failures:
>>
>>  (XEN) [ 1058.751974] livepatch: lp_1_1: Timed out on semaphore in CPU quiesce phase 30/31
>>  (XEN) [ 1058.751982] livepatch: lp_1_1 finished REPLACE with rc=-16

Have you been able to look into this one?

>>
>> 2. ACPI S5 crash:
>>
>>  https://paste.debian.net/1121748/
>
> So in sched_slave() *vprev is already scrubbed.
>
> I currently have no idea how that could happen, as vprev->is_running
> should be cleared only a little bit later.

Have you been able to identify the place in code where this happens?
I can try adding some debug messages.

In some good news, we did more XenRT testing with core scheduling mode
and there were no other issues found so far.

--
Thanks,
Sergey

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Recent cores-scheduling failures

2020-01-07 Thread Sergey Dyasli
On 20/12/2019 06:26, Jürgen Groß wrote:
> On 19.12.19 13:45, Sergey Dyasli wrote:
>> Hi Juergen,
>>
>> We recently did another quick test of core scheduling mode, and the following
>> failures were found:
>>
>> 1. live-patch apply failures:
>>
>>  (XEN) [ 1058.751974] livepatch: lp_1_1: Timed out on semaphore in CPU quiesce phase 30/31
>>  (XEN) [ 1058.751982] livepatch: lp_1_1 finished REPLACE with rc=-16
>>
>> 2. ACPI S5 crash:
>>
>>  https://paste.debian.net/1121748/
>
> Are there any XenServer patches in your hypervisor?
>
> I'm asking because I don't see why a vcpu would be freed when shutting
> down the host (other than by any shutdown scripts, but those should be
> long finished when trying to enter S5).

While we have the patch-queue applied in our testing, there is nothing
there that would affect the scheduler directly.

The S5 crash reproduces reliably in automated testing, but I still don't
know how to trigger the issue manually.

--
Thanks,
Sergey

Re: [Xen-devel] Recent cores-scheduling failures

2019-12-19 Thread Jürgen Groß

On 19.12.19 13:45, Sergey Dyasli wrote:

> Hi Juergen,
>
> We recently did another quick test of core scheduling mode, and the following
> failures were found:
>
> 1. live-patch apply failures:
>
>  (XEN) [ 1058.751974] livepatch: lp_1_1: Timed out on semaphore in CPU quiesce phase 30/31
>  (XEN) [ 1058.751982] livepatch: lp_1_1 finished REPLACE with rc=-16
>
> 2. ACPI S5 crash:
>
>  https://paste.debian.net/1121748/


Are there any XenServer patches in your hypervisor?

I'm asking because I don't see why a vcpu would be freed when shutting
down the host (other than by any shutdown scripts, but those should be
long finished when trying to enter S5).


Juergen

Re: [Xen-devel] Recent cores-scheduling failures

2019-12-19 Thread Jürgen Groß

On 19.12.19 13:45, Sergey Dyasli wrote:

> Hi Juergen,
>
> We recently did another quick test of core scheduling mode, and the following
> failures were found:
>
> 1. live-patch apply failures:
>
>  (XEN) [ 1058.751974] livepatch: lp_1_1: Timed out on semaphore in CPU quiesce phase 30/31
>  (XEN) [ 1058.751982] livepatch: lp_1_1 finished REPLACE with rc=-16
>
> 2. ACPI S5 crash:
>
>  https://paste.debian.net/1121748/


So in sched_slave() *vprev is already scrubbed.

I currently have no idea how that could happen, as vprev->is_running
should be cleared only a little bit later.

Will look into it more.


Juergen

Re: [Xen-devel] Recent cores-scheduling failures

2019-12-19 Thread Andrew Cooper
On 19/12/2019 12:45, Sergey Dyasli wrote:
> 2. ACPI S5 crash:
>
> https://paste.debian.net/1121748/

cmpw $0x7fff,(%rax) with %rax as 0xc2c2c2c2c2c2c2c2

Looks like a use-after-free while checking for the idle domain.
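
To spell out that reasoning, here is a minimal C sketch. It is not taken from
the Xen tree. It assumes that debug builds of Xen fill freed memory with the
byte 0xc2, that the idle domain's ID is 0x7fff (DOMID_IDLE), and that
addresses are 48-bit x86-64 virtual addresses. struct fake_domain, is_idle(),
is_scrub_pattern() and is_canonical() are hypothetical names made up for
illustration. A 16-bit compare against 0x7fff through a pointer is the shape
an idle-domain check would compile to, and a pointer made entirely of 0xc2
bytes suggests it was loaded from memory that had already been freed and
scrubbed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: scrub byte used for freed memory in debug builds. */
#define SCRUB_BYTE 0xc2u

/* Assumption: ID reserved for the idle domain. */
#define DOMID_IDLE 0x7fffu

/* Hypothetical stand-in for a domain structure whose first field is a
 * 16-bit ID, so that reading it is a 16-bit load at offset 0. */
struct fake_domain {
    uint16_t domain_id;
};

/* Hypothetical idle-domain check: a 16-bit compare through the pointer,
 * which corresponds to "cmpw $0x7fff,(%rax)" with the pointer in %rax. */
bool is_idle(const struct fake_domain *d)
{
    return d->domain_id == DOMID_IDLE;
}

/* True if every byte of the 64-bit value equals the scrub byte. */
bool is_scrub_pattern(uint64_t val)
{
    return val == SCRUB_BYTE * 0x0101010101010101ULL;
}

/* True if the value is a canonical 48-bit x86-64 virtual address:
 * bits 63..47 must all be equal. */
bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == 0x1ffff;
}
```

With these helpers, is_scrub_pattern(0xc2c2c2c2c2c2c2c2ULL) is true and
is_canonical(0xc2c2c2c2c2c2c2c2ULL) is false, so passing such a value to a
check like is_idle() would dereference an invalid address.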

~Andrew
