On Tue, 2015-10-27 at 14:32 -0600, suokun wrote:
> On Tue, Oct 27, 2015 at 4:44 AM, Dario Faggioli
> <dario.faggi...@citrix.com> wrote:

> Hi, Dario,
> Thank you for your reply.
>
Hi,

> Here are my two VMs running on the same physical CPU.
> VM-IO: 1-vCPU pinned to a pCPU, running netperf
> VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
> Another machine runs the netperf client, sending requests to VM-IO.
> 
> My code is very simple:
> in VM-IO, as server side: $ netserver -p 12345
> in VM-CPU, just running a while(1) loop: $ ./loop
> in the client, send I/O requests to VM-IO:
> $ netperf -H [server_ip] -l 15 -t TCP_STREAM -p 12345
> 
Ok, thanks.
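
(As an aside, for anyone wanting to reproduce this: the "noise" side
needs nothing fancy. Here's a minimal C sketch of what I assume ./loop
to be, given that you describe it as just a while(1) loop; the file
name loop.c is, of course, only hypothetical:

  /* loop.c: stand-in for the "./loop" noise program described above,
   * i.e., nothing but a CPU-bound busy loop. */
  int main(void)
  {
      while ( 1 )
          ;   /* spin forever, keeping the vCPU 100% busy */
      return 0;
  }

Build it with something like "gcc -o loop loop.c" and run it inside
VM-CPU.)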

> The setting that led to the poor IO performance is as follows:
> VM-IO:  1-vCPU pinned to a pCPU, running netperf
> VM-CPU: 1-vCPU pinned to the same pCPU, running a while(1) loop
> 
> The root cause is that, when an I/O request arrives, VM-IO's vCPU is
> elevated to BOOST and goes through vcpu_wake() --> __runq_tickle(). In
> __runq_tickle(), the currently running vCPU (i.e., the vCPU from
> VM-CPU) gets the _VPF_migrating flag set.
>
Ok.

> Then, Xen goes through schedule() to
> deschedule the current vCPU (i.e., the vCPU from VM-CPU) and schedule
> the next vCPU (i.e., the vCPU from VM-IO). Due to the _VPF_migrating
> flag, the descheduled vCPU will be migrated in context_saved() and
> later woken up via vcpu_wake().
>
Sure.

> Indeed, csched_vcpu_wake() would quit early if the
> vCPU from VM-CPU were on the run queue, but it actually is not: in
> csched_schedule(), the vCPU was not inserted back into the run queue
> because it is not runnable, due to the _VPF_migrating bit in
> pause_flags. As such, the vCPU from VM-CPU will be boosted and will
> not be preempted by a later I/O request, because BOOST cannot preempt
> BOOST.
> 
Aha! Now I see what you mean. From the previous email, I couldn't
really tell which call to schedule() you were looking at during
each phase of the analysis... Thanks for clarifying!

And, yes, I agree with you that, since the vCPU of VM-CPU fails the
vcpu_runnable() test, it is treated in csched_vcpu_wake() as if it were
really waking up from sleep, and hence boosted.
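
Just to make the effect fully explicit, here is a tiny, self-contained
C toy model (emphatically not Xen code: the type, the priority values
and the names are made up for illustration) of the preemption check we
are talking about. The strict "greater than" comparison mirrors the
"new->pri > cur->pri" test in __runq_tickle(), so once the CPU-bound
vCPU has been spuriously boosted, a later I/O wakeup at the same BOOST
priority no longer kicks it off the pCPU:

  #include <stdio.h>
  #include <stdbool.h>

  /* Toy priorities, mimicking Credit1's ordering (BOOST > UNDER > OVER). */
  enum pri { PRI_OVER = -2, PRI_UNDER = -1, PRI_BOOST = 0 };

  struct toy_vcpu { const char *name; enum pri pri; };

  /* Mirrors the "new->pri > cur->pri" check in __runq_tickle(): a waking
   * vCPU only kicks away the running one if it has strictly higher
   * priority. */
  static bool can_preempt(const struct toy_vcpu *new,
                          const struct toy_vcpu *cur)
  {
      return new->pri > cur->pri;
  }

  int main(void)
  {
      struct toy_vcpu io  = { "VM-IO",  PRI_UNDER };
      struct toy_vcpu cpu = { "VM-CPU", PRI_UNDER };

      io.pri = PRI_BOOST;                      /* 1st I/O wakeup: BOOSTed  */
      printf("%s preempts %s: %d\n", io.name, cpu.name,
             can_preempt(&io, &cpu));          /* prints 1: preemption     */

      cpu.pri = PRI_BOOST;  /* spurious wakeup via _VPF_migrating: BOOSTed */
      printf("%s preempts %s: %d\n", io.name, cpu.name,
             can_preempt(&io, &cpu));          /* prints 0: I/O starves    */

      return 0;
  }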

> A simple fix would be allowing BOOST to preempt BOOST. 
>
Nah, that would be a hack on top of a hack! :-P

> A better fix
> would be checking the CPU affinity before setting the __VPF_migrating
> flag.
> 
Yeah, I like this better. So, can you try the patch attached to this
email?

Here at my place, without any patch, I get the following results:

 idle:       throughput = 806.64
 with noise: throughput = 166.50

With the patch, I get this:

 idle:       throughput = 807.18
 with noise: throughput = 731.66

The patch (if you confirm that it works) fixes the bug in this
particular situation, where the vCPUs are all pinned to the same pCPU,
but it does not prevent vCPUs that get migrated around the pCPUs from
becoming BOOSTed in Credit1.

That is something I think we should avoid, and I've got a (small) patch
series ready for that. I'll give it some more testing before sending
it to the list, though, as I want to make sure it's not causing
regressions.

Thanks and Regards,
Dario
--- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

commit 16381936ad320d010c7566c946a3e528f803e78a
Author: Dario Faggioli <dario.faggi...@citrix.com>
Date:   Tue Oct 27 23:22:16 2015 +0100

    xen: credit1: on vCPU wakeup, kick away current only if it makes sense
    
    In fact, when waking up a vCPU, __runq_tickle() is called
    to allow the new vCPU to run on a pCPU (which one depends
    on how the priority of the new vCPU compares with those of
    the vCPUs that are already running).
    
    If there is no idle processor on which the new vCPU can
    run (e.g., because of pinning/affinity), we try to migrate
    away the vCPU that is currently running on the new vCPU's
    processor (i.e., the processor on which the vCPU is waking
    up).
    
    Now, trying to migrate a vCPU has the effect of pushing it
    through a
    
     running --> offline --> runnable
    
    transition, which, in turn, has the following negative
    effects:
    
     1) Credit1 counts that as a wakeup, and it BOOSTs the
        vCPU, even if it is a CPU-bound one that would not
        normally have deserved boosting. This can prevent
        legitimate I/O-bound vCPUs from getting hold of the
        processor until such spurious boosting expires,
        hurting performance!
    
     2) since the vCPU fails the vcpu_runnable() test
        (within the call to csched_schedule() that follows
        the wakeup, as a consequence of tickling), the
        scheduler's rate-limiting mechanism is also fooled,
        i.e., the context switch happens even if less than
        the minimum amount of execution time has passed.
    
    In particular, 1) has been reported to cause the following
    issue:
    
     * VM-IO: 1-vCPU pinned to a pCPU, running netperf
     * VM-CPU: 1-vCPU pinned to the same pCPU, running a busy
               CPU loop
     ==> Only VM-I/O: throughput is 806.64 Mbps
     ==> VM-I/O + VM-CPU: throughput is 166.50 Mbps
    
    This patch solves (for the above scenario) the problem
    by checking whether or not it makes sense to try to
    migrate away the vCPU currently running on the processor.
    In fact, we should not even try to do it if there are no
    idle processors where such a vCPU can execute; in that case,
    attempting the migration is just futile (harmful, actually!).
    
    With this patch, in the above configuration, results are:
    
     ==> Only VM-I/O: throughput is 807.18 Mbps
     ==> VM-I/O + VM-CPU: throughput is 731.66 Mbps
    
    Note that, still about 1), it is _wrong_ that Credit1
    treats wakeups resulting from migration of a vCPU to
    another pCPU as "regular wakeups", hence granting BOOST
    priority to the vCPUs experiencing that. However:
     - fixing that is non-trivial, and deserves to be done
       in its own patch;
     - that is orthogonal to the fix being introduced here.
       That is to say, even once Credit1 is fixed not to
       boost migrating vCPUs, this patch will still be
       correct and necessary.
    
    Reported-by: suokun <suokuns...@gmail.com>
    Signed-off-by: Dario Faggioli <dario.faggi...@citrix.com>
    ---
    Cc: George Dunlap <george.dun...@citrix.com>
    Cc: suokun <suokuns...@gmail.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index b8f28fe..1b30e67 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -426,9 +426,10 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
 
             /*
              * If there are no suitable idlers for new, and it's higher
-             * priority than cur, ask the scheduler to migrate cur away.
-             * We have to act like this (instead of just waking some of
-             * the idlers suitable for cur) because cur is running.
+             * priority than cur, check whether we can migrate cur away.
+             * (We have to do it indirectly, via _VPF_migrating, instead
+             * of just tickling any idler suitable for cur, because cur
+             * is running.)
              *
              * If there are suitable idlers for new, no matter priorities,
              * leave cur alone (as it is running and is, likely, cache-hot)
@@ -437,11 +438,18 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
              */
             if ( new_idlers_empty && new->pri > cur->pri )
             {
+                csched_balance_cpumask(cur->vcpu, balance_step,
+                                       csched_balance_mask(cpu));
+                if ( cpumask_intersects(csched_balance_mask(cpu),
+                                        &idle_mask) )
+                {
+                    SCHED_VCPU_STAT_CRANK(cur, kicked_away);
+                    SCHED_VCPU_STAT_CRANK(cur, migrate_r);
+                    SCHED_STAT_CRANK(migrate_kicked_away);
+                    set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
+                }
+                /* Tickle cpu anyway, to let new preempt cur. */
                 SCHED_STAT_CRANK(tickle_idlers_none);
-                SCHED_VCPU_STAT_CRANK(cur, kicked_away);
-                SCHED_VCPU_STAT_CRANK(cur, migrate_r);
-                SCHED_STAT_CRANK(migrate_kicked_away);
-                set_bit(_VPF_migrating, &cur->vcpu->pause_flags);
                 __cpumask_set_cpu(cpu, &mask);
             }
             else if ( !new_idlers_empty )
