On Thu, 2018-04-12 at 09:38 +0000, George Dunlap wrote:
> > On Apr 11, 2018, at 10:31 PM, Dario Faggioli <raist...@linux.it>
> > wrote:
> > (XEN) Xen BUG at sched_credit.c:876
> > (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-
> > 7.bug1087289_411  x86_64  debug=y   Not tainted ]----
> > (XEN) CPU:    108
> > (XEN) RIP:    e008:[<ffff82d080229ab4>]
> > sched_credit.c#csched_vcpu_migrate+0x27/0x54
> > (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> > ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d080229ab4>]
> > sched_credit.c#csched_vcpu_migrate+0x27/0x54
> > (XEN)    [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2
> > (XEN)    [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b
> > (XEN)    [<ffff82d080239367>] context_saved+0x95/0x9c
> > (XEN)    [<ffff82d08027797d>] context_switch+0xe66/0xeb0
> > (XEN)    [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627
> > (XEN)    [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90
> > (XEN)    [<ffff82d080239f6a>] do_softirq+0x13/0x15
> > (XEN)    [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30
> > 
> > So, really *exactly* the same. Ok, thanks.
> 
> But this doesn’t make any sense.  If you applied Dario’s ‘fix’ patch,
> then context_saved() should have *just* called vcpu_sleep_nosync()
> before calling vcpu_migrate().  The VPF_migrating flag should still
> be set, so it should have called csched_vcpu_sleep(); and sd->curr
> should have been changed to be != prev way back in schedule(), so
> csched_vcpu_sleep() should have called runq_remove().
> 
Well, you've just described me, banging my head on my desk, since
yesterday afternoon. :-P

> It’s probably worth asking the obvious question: Are you sure the
> “fix” patch is actually applied (in addition to the new “debug”
> patch)? :-)
> 
> If so, then maybe it’s time to open-code vcpu_sleep_nosync() there in
> context_saved(), to try to figure out where our understanding of what
> *should* happen is incorrect.
> 
Ehm... Can you please stop reading my mind? It's annoying. :-D
Well, I guess we can say: "great minds think alike". :-P

Olaf, new patch. Please, remove _everything_ and apply _only_ this one.

As George is saying, the vcpu just can't be in the runqueue, unless:
 1) vcpu_sleep_nosync() did not remove it
 2) someone is putting it back there

Let's check 1 first.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 9bc638c09c..67628a1f95 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -867,6 +867,17 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     return cpu;
 }
 
+static void
+csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc,
+		    unsigned int new_cpu)
+{
+    BUG_ON(vc->is_running);
+    BUG_ON(test_bit(_VPF_migrating, &vc->pause_flags));
+    BUG_ON(CSCHED_VCPU(vc) == CSCHED_VCPU(curr_on_cpu(vc->processor)));
+    BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
+    vc->processor = new_cpu;
+}
+
 static int
 csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
@@ -2278,6 +2289,7 @@ static const struct scheduler sched_credit_def = {
     .adjust_global  = csched_sys_cntl,
 
     .pick_cpu       = csched_cpu_pick,
+    .migrate        = csched_vcpu_migrate,
     .do_schedule    = csched_schedule,
 
     .dump_cpu_state = csched_dump_pcpu,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 343ab6306e..7be62efa33 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1554,7 +1554,29 @@ void context_saved(struct vcpu *prev)
     SCHED_OP(vcpu_scheduler(prev), context_saved, prev);
 
     if ( unlikely(prev->pause_flags & VPF_migrating) )
+    {
+        /*
+         * If someone (e.g., vcpu_set_affinity()) has set VPF_migrating
+         * on prev in between when schedule() releases the scheduler
+         * lock and here, we need to make sure we properly mark the
+         * vcpu as not runnable (and all it comes with that), with
+         * vcpu_sleep_nosync(), before calling vcpu_migrate().
+         */
+        //vcpu_sleep_nosync(prev);
+        unsigned long flags;
+        spinlock_t *lock = vcpu_schedule_lock_irqsave(prev, &flags);
+
+        BUG_ON(vcpu_runnable(prev));
+        BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
+        if ( prev->runstate.state == RUNSTATE_runnable )
+            vcpu_runstate_change(prev, RUNSTATE_offline, NOW());
+        BUG_ON(curr_on_cpu(prev->processor) == prev);
+        SCHED_OP(vcpu_scheduler(prev), sleep, prev);
+
+        vcpu_schedule_unlock_irqrestore(lock, flags, prev);
+
         vcpu_migrate(prev);
+    }
 }
 
 /* The scheduler timer: force a run through the scheduler */

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to