On Thu, 2018-04-12 at 09:38 +0000, George Dunlap wrote: > > On Apr 11, 2018, at 10:31 PM, Dario Faggioli <raist...@linux.it> > > wrote: > > (XEN) Xen BUG at sched_credit.c:876 > > (XEN) ----[ Xen-4.11.20180410T125709.50f8ba84a5- > > 7.bug1087289_411 x86_64 debug=y Not tainted ]---- > > (XEN) CPU: 108 > > (XEN) RIP: e008:[<ffff82d080229ab4>] > > sched_credit.c#csched_vcpu_migrate+0x27/0x54 > > (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor > > ... > > (XEN) Xen call trace: > > (XEN) [<ffff82d080229ab4>] > > sched_credit.c#csched_vcpu_migrate+0x27/0x54 > > (XEN) [<ffff82d080236348>] schedule.c#vcpu_move_locked+0xbb/0xc2 > > (XEN) [<ffff82d08023764c>] schedule.c#vcpu_migrate+0x226/0x25b > > (XEN) [<ffff82d080239367>] context_saved+0x95/0x9c > > (XEN) [<ffff82d08027797d>] context_switch+0xe66/0xeb0 > > (XEN) [<ffff82d080236943>] schedule.c#schedule+0x5f4/0x627 > > (XEN) [<ffff82d080239f15>] softirq.c#__do_softirq+0x85/0x90 > > (XEN) [<ffff82d080239f6a>] do_softirq+0x13/0x15 > > (XEN) [<ffff82d08031f5db>] vmx_asm_do_vmentry+0x2b/0x30 > > > > So, really *exactly* the same. Ok, thanks. > > But this doesn’t make any sense. If you applied Dario’s ‘fix’ patch, > then context_saved() should have *just* called vcpu_sleep_nosync() > before calling vcpu_migrate(). The VPF_migrating flag should still > be set, so it should have called csched_vcpu_sleep(); and sd->curr > should have been changed to be != prev way back in schedule(), so > csched_vcpu_sleep() should have called runq_remove(). > Well, you've just described me, banging my head on my desk, since yesterday afternoon. :-P
> It’s probably worth asking the obvious question: Are you sure the > “fix” patch is actually applied (in addition to the new “debug” > patch)? :-) > > If so, then maybe it’s time to open-code vcpu_sleep_nosync() there in > context_saved(), to try to figure out where our understanding of what > *should* happen is incorrect. > Ehm... Can you please stop reading my mind? It's annoying. :-D Well, I guess we can say: "great minds think alike". :-P Olaf, new patch. Please, remove _everything_ and apply _only_ this one. As George is saying, the vcpu just can't be in the runqueue, unless: 1) vcpu_sleep_nosync() did not remove it 2) someone is putting it back there Let's check 1 first. Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Software Engineer @ SUSE https://www.suse.com/
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 9bc638c09c..67628a1f95 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -867,6 +867,17 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) return cpu; } +static void +csched_vcpu_migrate(const struct scheduler *ops, struct vcpu *vc, + unsigned int new_cpu) +{ + BUG_ON(vc->is_running); + BUG_ON(test_bit(_VPF_migrating, &vc->pause_flags)); + BUG_ON(CSCHED_VCPU(vc) == CSCHED_VCPU(curr_on_cpu(vc->processor))); + BUG_ON(__vcpu_on_runq(CSCHED_VCPU(vc))); + vc->processor = new_cpu; +} + static int csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc) { @@ -2278,6 +2289,7 @@ static const struct scheduler sched_credit_def = { .adjust_global = csched_sys_cntl, .pick_cpu = csched_cpu_pick, + .migrate = csched_vcpu_migrate, .do_schedule = csched_schedule, .dump_cpu_state = csched_dump_pcpu, diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 343ab6306e..7be62efa33 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1554,7 +1554,29 @@ void context_saved(struct vcpu *prev) SCHED_OP(vcpu_scheduler(prev), context_saved, prev); if ( unlikely(prev->pause_flags & VPF_migrating) ) + { + /* + * If someone (e.g., vcpu_set_affinity()) has set VPF_migrating + * on prev in between when schedule() releases the scheduler + * lock and here, we need to make sure we properly mark the + * vcpu as not runnable (and all it comes with that), with + * vcpu_sleep_nosync(), before calling vcpu_migrate(). + */ + //vcpu_sleep_nosync(prev); + unsigned long flags; + spinlock_t *lock = vcpu_schedule_lock_irqsave(prev, &flags); + + BUG_ON(vcpu_runnable(prev)); + BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags)); + if ( prev->runstate.state == RUNSTATE_runnable ) + vcpu_runstate_change(prev, RUNSTATE_offline, NOW()); + BUG_ON(curr_on_cpu(prev->processor) == prev); + SCHED_OP(vcpu_scheduler(prev), sleep, prev); + + vcpu_schedule_unlock_irqrestore(lock, flags, prev); + vcpu_migrate(prev); + } } /* The scheduler timer: force a run through the scheduler */
signature.asc
Description: This is a digitally signed message part
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel