On 12.02.20 12:21, Sergey Dyasli wrote:
Hi Juergen,

Recently our testing has found a host crash which is reproducible.
Do you have any idea what might be going on here?

Oh, nice catch!

The problem is that get_cpu_idle_time() is calling vcpu_runstate_get()
for an idle vcpu. This is fragile as idle vcpus are sometimes assigned
temporarily to normal scheduling units, thus the ASSERT() in the unlock
function is failing when the assignment of the idle vcpu is modified
under the feet of vcpu_runstate_get() and the unit it has been assigned
to before is already scheduled on another cpu.

The patch is rather easy, though. Can you try it, please?


Juergen
>From 0236aee221409fa826a81395f2f3e8b15d5128de Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgr...@suse.com>
To: xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dun...@eu.citrix.com>
Cc: Dario Faggioli <dfaggi...@suse.com>
Date: Wed, 12 Feb 2020 13:04:16 +0100
Subject: [PATCH] xen/sched: fix get_cpu_idle_time() with core scheduling

get_cpu_idle_time() is calling vcpu_runstate_get() for an idle vcpu.
With core scheduling active this is fragile, as idle vcpus are assigned
to other scheduling units temporarily, and that assignment is changed
in some cases without holding the scheduling lock, and
vcpu_runstate_get() is using v->sched_unit as parameter for
unit_schedule_[un]lock_irq(), resulting in an ASSERT() triggering in
unlock in case v->sched_unit has changed meanwhile.

Fix that by using a local unit variable holding the correct unit.

Signed-off-by: Juergen Gross <jgr...@suse.com>
---
 xen/common/sched/core.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 2e43f8029f..de5a6b1a57 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -308,17 +308,26 @@ void vcpu_runstate_get(const struct vcpu *v,
 {
     spinlock_t *lock;
     s_time_t delta;
+    struct sched_unit *unit;
 
     rcu_read_lock(&sched_res_rculock);
 
-    lock = likely(v == current) ? NULL : unit_schedule_lock_irq(v->sched_unit);
+    /*
+     * Be careful in case of an idle vcpu: the assignment to a unit might
+     * change even with the scheduling lock held, so be sure to use the
+     * correct unit for locking in order to avoid triggering an ASSERT() in
+     * the unlock function.
+     */
+    unit = is_idle_vcpu(v) ? get_sched_res(v->processor)->sched_unit_idle
+                           : v->sched_unit;
+    lock = likely(v == current) ? NULL : unit_schedule_lock_irq(unit);
     memcpy(runstate, &v->runstate, sizeof(*runstate));
     delta = NOW() - runstate->state_entry_time;
     if ( delta > 0 )
         runstate->time[runstate->state] += delta;
 
     if ( unlikely(lock != NULL) )
-        unit_schedule_unlock_irq(lock, v->sched_unit);
+        unit_schedule_unlock_irq(lock, unit);
 
     rcu_read_unlock(&sched_res_rculock);
 }
-- 
2.16.4

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to