On Wed, 2018-04-11 at 17:27 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Olaf Hering wrote:
> 
> > That was with sched=credit2, sorry for that.
> > Now with just that second patch ...
> 
> Still BUG in csched_load_balance.
> 
> (XEN) Xen BUG at sched_credit.c:1694
> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-6.bug1087289_411  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    135
> (XEN) RIP:    e008:[<ffff82d08022ae34>] sched_credit.c#csched_schedule+0x44a/0xd42
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08022ae34>] sched_credit.c#csched_schedule+0x44a/0xd42
> (XEN)    [<ffff82d080236406>] schedule.c#schedule+0x107/0x627
> (XEN)    [<ffff82d080239ec5>] softirq.c#__do_softirq+0x85/0x90
> (XEN)    [<ffff82d080239f1a>] do_softirq+0x13/0x15
> (XEN)    [<ffff82d0802738f0>] domain.c#idle_loop+0xac/0xbe
> 
Ok, back to square 1. :-/

A data point is that Credit2 works. In Credit2, vcpu_move_locked()
(called by vcpu_migrate()) calls a function called migrate() which,
for Credit2-specific reasons, considers it legitimate to find the
vcpu in a runqueue... So that's what I think "saves" us, and
that is why this data point does not help much (sorry Olaf for not
realizing this earlier, and asking you to try Credit2). :-(

On the other hand, in Credit1, there should be no good reason why
vcpu_migrate() would be called on a vcpu which is on a runqueue, and
the fact that we're still crashing proves that there is at least
another race, causing that to happen.
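
To make the invariant concrete, here is a toy, standalone C model of it.
This is not Xen code: struct toy_vcpu, toy_migrate_allowed() and
toy_migrate() are made-up names for illustration only, and the model
covers just two of the conditions (running, on a runqueue) that one
would expect to be false when a Credit1 vcpu is migrated.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a vcpu; hypothetical, NOT Xen's struct vcpu. */
struct toy_vcpu {
    bool is_running;        /* currently executing on a pcpu */
    bool on_runq;           /* queued in some pcpu's runqueue */
    unsigned int processor; /* pcpu the vcpu is assigned to */
};

/*
 * Assumed Credit1-style rule: a vcpu may only be moved to another
 * pcpu if it is neither running nor sitting in a runqueue.
 */
bool toy_migrate_allowed(const struct toy_vcpu *v)
{
    return !v->is_running && !v->on_runq;
}

/*
 * Move v to new_cpu. Returns 0 on success, or -1 if the invariant is
 * violated (the point where a debug check would fire instead).
 */
int toy_migrate(struct toy_vcpu *v, unsigned int new_cpu)
{
    if (!toy_migrate_allowed(v))
        return -1;
    v->processor = new_cpu;
    return 0;
}

/* Small self-check: an idle, unqueued vcpu moves; a queued one does not. */
void toy_selfcheck(void)
{
    struct toy_vcpu v = { .is_running = false, .on_runq = false,
                          .processor = 0 };

    assert(toy_migrate(&v, 3) == 0);
    v.on_runq = true;
    assert(toy_migrate(&v, 4) == -1);
}
```

In this model, a vcpu that is still queued when the move is attempted
is rejected and keeps its old processor; in the hypervisor the same
situation would mean that some other path raced with the migration.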

So, the debug patch I posted previously in this thread was wrong. I'm
attaching a new one to this email. Olaf, if you're trying again, please
do it with both the "fix" (xen-sched-debug-vcpumigrate-race.patch)
and this one.

Debug hypervisor, as usual, if possible. :-)

It will crash, again, possibly with the same stack trace, but I think
it's worth a try.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
commit 5fb7ad8d1220101e69a87014d5a485d19aea9917
Author: Dario Faggioli <dfaggi...@suse.com>
Date:   Wed Apr 11 09:04:33 2018 +0200

    xen: credit: implement SCHED_OP(migrate)
    
    with just sanity checking in it, to catch a race.
    
    Signed-off-by: Dario Faggioli <dfaggi...@suse.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 9bc638c09c..7a909376e6 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -867,6 +867,17 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     return cpu;
 }
 
+static void
+csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc,
+		    unsigned int new_cpu)
+{
+    BUG_ON(vc->is_running);
+    BUG_ON(test_bit(_VPF_migrating, &vc->pause_flags));
+    BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
+    BUG_ON(CSCHED_VCPU(vc) == CSCHED_VCPU(curr_on_cpu(vc->processor)));
+    vc->processor = new_cpu;
+}
+
 static int
 csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
@@ -2278,6 +2289,7 @@ static const struct scheduler sched_credit_def = {
     .adjust_global  = csched_sys_cntl,
 
     .pick_cpu       = csched_cpu_pick,
+    .migrate        = csched_vcpu_migrate,
     .do_schedule    = csched_schedule,
 
     .dump_cpu_state = csched_dump_pcpu,

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel