On Wed, 2018-04-11 at 17:27 +0200, Olaf Hering wrote:
> On Wed, Apr 11, Olaf Hering wrote:
>
> > That was with sched=credit2, sorry for that.
> > Now with just that second patch ...
>
> Still BUG in csched_load_balance.
>
> (XEN) Xen BUG at sched_credit.c:1694
> (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5-6.bug1087289_411 x86_64 debug=y Not tainted ]----
> (XEN) CPU: 135
> (XEN) RIP: e008:[<ffff82d08022ae34>] sched_credit.c#csched_schedule+0x44a/0xd42
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d08022ae34>] sched_credit.c#csched_schedule+0x44a/0xd42
> (XEN) [<ffff82d080236406>] schedule.c#schedule+0x107/0x627
> (XEN) [<ffff82d080239ec5>] softirq.c#__do_softirq+0x85/0x90
> (XEN) [<ffff82d080239f1a>] do_softirq+0x13/0x15
> (XEN) [<ffff82d0802738f0>] domain.c#idle_loop+0xac/0xbe
>
Ok, back to square 1. :-/

A data point is that Credit2 works. In Credit2, vcpu_move_locked()
(called by vcpu_migrate()) calls a function called migrate() which,
for Credit2-specific reasons, considers it legitimate to find the vcpu
in a runqueue. So that's what I think "saves" us, and that is why this
data point does not help much (sorry Olaf for not realizing this
earlier, and asking you to try Credit2). :-(

On the other hand, in Credit1, there should be no good reason why
vcpu_migrate() would be called on a vcpu which is on a runqueue, and
the fact that we're still crashing proves that there is at least
another race causing that to happen.

So, the debug patch I posted previously in this thread was wrong. I'm
attaching a new one to this email.

Olaf, if you're trying again, please do it with both the "fix"
(xen-sched-debug-vcpumigrate-race.patch) and this one. Debug
hypervisor, as usual, if possible. :-)

It will crash again, possibly with the same stack trace, but I think
it's worth a try.

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

commit 5fb7ad8d1220101e69a87014d5a485d19aea9917
Author: Dario Faggioli <dfaggi...@suse.com>
Date:   Wed Apr 11 09:04:33 2018 +0200

    xen: credit: implement SCHED_OP(migrate)

    with just sanity checking in it, to catch a race.

    Signed-off-by: Dario Faggioli <dfaggi...@suse.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 9bc638c09c..7a909376e6 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -867,6 +867,17 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
     return cpu;
 }
 
+static void
+csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc,
+                    unsigned int new_cpu)
+{
+    BUG_ON(vc->is_running);
+    BUG_ON(test_bit(_VPF_migrating, &vc->pause_flags));
+    BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc)));
+    BUG_ON(CSCHED_VCPU(vc) == CSCHED_VCPU(curr_on_cpu(vc->processor)));
+    vc->processor = new_cpu;
+}
+
 static int
 csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
@@ -2278,6 +2289,7 @@ static const struct scheduler sched_credit_def = {
     .adjust_global  = csched_sys_cntl,
 
     .pick_cpu       = csched_cpu_pick,
+    .migrate        = csched_vcpu_migrate,
     .do_schedule    = csched_schedule,
 
     .dump_cpu_state = csched_dump_pcpu,
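
For anyone who wants to reason about the four checks outside the hypervisor, here is a rough user-space sketch of the same idea. It is not Xen code: struct toy_vcpu, the VIOL_* flags and both functions are made-up names for illustration, the booleans only stand in for vc->is_running, _VPF_migrating and __vcpu_on_runq(), and assert() plays the role of BUG_ON(). The "curr" argument is assumed to be whatever is currently running on the vcpu's pCPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a vCPU; every name in this block is hypothetical. */
struct toy_vcpu {
    unsigned int processor; /* pCPU the vCPU is currently assigned to */
    bool is_running;        /* stand-in for vc->is_running */
    bool migrating;         /* stand-in for the _VPF_migrating pause flag */
    bool on_runq;           /* stand-in for __vcpu_on_runq() */
};

/* One bit per condition the debug patch treats as a bug. */
enum {
    VIOL_RUNNING   = 1u << 0,
    VIOL_MIGRATING = 1u << 1,
    VIOL_ON_RUNQ   = 1u << 2,
    VIOL_IS_CURR   = 1u << 3,
};

/*
 * Return a bitmask of the violated conditions instead of crashing, so
 * each one can be inspected separately. 'curr' is assumed to be the
 * vCPU currently running on vc->processor.
 */
static unsigned int
toy_migrate_violations(const struct toy_vcpu *vc, const struct toy_vcpu *curr)
{
    unsigned int v = 0;

    if (vc->is_running)
        v |= VIOL_RUNNING;
    if (vc->migrating)
        v |= VIOL_MIGRATING;
    if (vc->on_runq)
        v |= VIOL_ON_RUNQ;
    if (vc == curr)
        v |= VIOL_IS_CURR;

    return v;
}

/* Migrate hook in the style of the patch: check everything, then move. */
static void
toy_vcpu_migrate(struct toy_vcpu *vc, const struct toy_vcpu *curr,
                 unsigned int new_cpu)
{
    assert(toy_migrate_violations(vc, curr) == 0);
    vc->processor = new_cpu;
}
```

With a vcpu that is neither running, nor flagged as migrating, nor queued, toy_vcpu_migrate() just updates the processor field. Setting on_runq before the call trips the assert, which is the user-space analogue of the situation the debug patch is meant to catch.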

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel