Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-22 Thread Quan Xu



On 2017-11-16 17:45, Daniel Lezcano wrote:

On 16/11/2017 10:12, Quan Xu wrote:


On 2017-11-16 06:03, Thomas Gleixner wrote:

On Wed, 15 Nov 2017, Peter Zijlstra wrote:


On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:

From: Yang Zhang <yang.zhang...@gmail.com>

Implement a generic idle poll which resembles the functionality
found in arch/. Provide weak arch_cpu_idle_poll function which
can be overridden by the architecture code if needed.

No, we want less of those magic hooks, not more.


Interrupts arrive which may not cause a reschedule in idle loops.
In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
for interrupts and VM-exit immediately. Also this becomes more
expensive than bare metal. Add a generic idle poll before enter
real idle path. When a reschedule event is pending, we can bypass
the real idle path.

Why not do a HV specific idle driver?

If I understand the problem correctly then he wants to avoid the heavy
lifting in tick_nohz_idle_enter() in the first place, but there is already
an interesting quirk there which makes it exit early.  See commit
3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this commit
looks similar. But lets not proliferate that. I'd rather see that go away.

Agreed.

We can get even more benefit in a KVM guest than commit 3c5d92a0cfb5
("nohz: Introduce arch_needs_cpu") does. I won't proliferate that.


But the irq_timings stuff is heading into the same direction, with a more
complex prediction logic which should tell you pretty good how long that
idle period is going to be and in case of an interrupt heavy workload this
would skip the extra work of stopping and restarting the tick and provide a
very good input into a polling decision.


Interesting. I have tested the IRQ_TIMINGS-related code, which does not
seem to work so far.

I don't know how you tested it, can you elaborate what you meant by
"seems not working so far" ?


Daniel, I tried to enable IRQ_TIMINGS* manually and used
irq_timings_next_event() to return an estimate of the earliest interrupt.
However, I always got a constant.


There is still some work to do to be more efficient. The prediction
based on the irq timings is all right if the interrupts have a simple
periodicity. But as soon as there is a pattern, the current code can't
handle it properly and makes bad predictions.

I'm working on a self-learning pattern detection which is too heavy for
the kernel, and with it we should be able to properly detect the
patterns and re-adjust the period if it changes. I'm in the process of
making it suitable for kernel code (both math and perf).

One improvement which can be done right now and which can help you is
the interrupt rate on the CPU. It is possible to compute it and that
will give accurate information for the polling decision.


As tglx said, we should talk to each other and work together to make it
usable for all use cases.
Could you share how to enable it to get the interrupt rate on the CPU?
I can try it in a cloud scenario. Of course, I'd like to work with you
to improve it.

Quan
Alibaba Cloud

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [Xen-devel] [PATCH RFC v3 0/6] x86/idle: add halt poll support

2017-11-22 Thread Quan Xu



On 2017-11-16 05:31, Konrad Rzeszutek Wilk wrote:

On Mon, Nov 13, 2017 at 06:05:59PM +0800, Quan Xu wrote:

From: Yang Zhang <yang.zhang...@gmail.com>

Some latency-intensive workloads see an obvious performance
drop when running inside a VM. The main reason is that the overhead
is amplified when running inside a VM. The highest cost I have seen is
in the idle path.

Meaning an VMEXIT b/c it is an 'halt' operation ? And then going
back in guest (VMRESUME) takes time. And hence your latency gets
all whacked b/c of this?

   Konrad, I can't follow 'b/c' here.. sorry.


So if I understand - you want to use your _full_ timeslice (of the guest)
without ever (or as much as possible) to go in the hypervisor?

    as much as possible.


Which means in effect you don't care about power-saving or CPUfreq
savings, you just want to eat the full CPU for snack?
  Actually, we do care about power saving. The poll duration is
  self-tuning; otherwise it would be almost the same as 'halt=poll'.
  Also, we always report the CPU usage alongside the netperf/ctxsw
  benchmark results. We got much better performance with a limited
  increase in CPU usage.
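
To make the self-tuning idea concrete, here is a minimal userspace sketch of
how a poll window might grow and shrink. It is not the code from this patch
set: struct poll_tuner, poll_tuner_update() and the doubling/halving policy
are hypothetical, loosely modelled on the grow/shrink idea of host-side halt
polling.

```c
#include <stdint.h>

/* Hypothetical tuning state; none of these names come from the patch set. */
struct poll_tuner {
	uint64_t poll_ns;	/* current poll window, 0 means "do not poll" */
	uint64_t min_ns;	/* first window used when growing from 0 */
	uint64_t max_ns;	/* upper bound for the window */
};

/*
 * Adjust the poll window after one idle period that lasted idle_ns.
 * Assumed policy: keep the window if it already covered the idle period,
 * grow it if a somewhat longer window would have covered it, and shrink
 * it if the idle period was longer than we are ever willing to poll.
 */
static void poll_tuner_update(struct poll_tuner *t, uint64_t idle_ns)
{
	uint64_t next;

	if (idle_ns <= t->poll_ns)
		return;		/* polling was long enough, nothing to tune */

	if (idle_ns <= t->max_ns) {
		/* Grow: double the window, starting from min_ns. */
		next = t->poll_ns ? t->poll_ns * 2 : t->min_ns;
		t->poll_ns = next > t->max_ns ? t->max_ns : next;
	} else {
		/* Shrink: halve the window, switch polling off below min_ns. */
		next = t->poll_ns / 2;
		t->poll_ns = next < t->min_ns ? 0 : next;
	}
}
```

Starting from poll_ns = 0 with min_ns = 10000 and max_ns = 200000, an idle
period of 50000 ns sets the window to 10000 ns, while idle periods longer
than max_ns halve it until it drops below min_ns and polling is switched off.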



This patch introduces a new mechanism to poll for a while before
entering the idle state. If a schedule is needed during the poll, then we
don't need to go through the heavy overhead path.

Schedule of what? The guest or the host?

  A reschedule by the guest scheduler.
  It is the guest.


Quan
Alibaba Cloud






Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-22 Thread Quan Xu



On 2017-11-17 19:36, Thomas Gleixner wrote:

On Fri, 17 Nov 2017, Quan Xu wrote:

On 2017-11-16 17:53, Thomas Gleixner wrote:

That's just plain wrong. We don't want to see any of this PARAVIRT crap in
anything outside the architecture/hypervisor interfacing code which really
needs it.

The problem can and must be solved at the generic level in the first place
to gather the data which can be used to make such decisions.

How that information is used might be either completely generic or requires
system specific variants. But as long as we don't have any information at
all we cannot discuss that.

Please sit down and write up which data needs to be considered to make
decisions about probabilistic polling. Then we need to compare and contrast
that with the data which is necessary to make power/idle state decisions.

I would be very surprised if this data would not overlap by at least 90%.


1. Which data needs to be considered to make decisions about probabilistic polling

I really need to write up which data needs to be considered to make
decisions about probabilistic polling. For the last several months,
I have always focused on the data _from idle to reschedule_, in order to
bypass the idle loops. Unfortunately, this inevitably makes me touch the
scheduler/idle/nohz code.

Following tglx's suggestion, the data which is necessary to make power/idle
state decisions is the last idle state's residency time. IIUC this data
is the duration from idle to wakeup, which may be caused by a reschedule
IRQ or another IRQ.

That's part of the picture, but not complete.


tglx, could you share more? I am very curious about it..


I also tested that the reschedule IRQ overlaps by more than 90% (by tracing
the need_resched status after cpuidle_idle_call) when I run ctxsw/netperf
for one minute.

Given the overlap, I think I can use the last idle state's residency time as
input to make decisions about probabilistic polling, as @dev->last_residency
does. That data is much easier to get.

That's only true for your particular use case.


2. do a HV specific idle driver (function)

So far, power management is not exposed to the guest. Idle is simple for a
KVM guest: it calls "sti" / "hlt" (cpuidle_idle_call() --> default_idle_call()).
Thanks to the Xen guys, who implemented the paravirt framework, I can
implement it as easily as the following:

  --- a/arch/x86/kernel/kvm.c

Your email client is using a very strange formatting.


My bad, I inserted spaces to highlight the code.


This is definitely better than what you proposed so far and implementing it
as a proof of concept seems to be worthwhile.

But I doubt that this is the final solution. It's not generic and not
necessarily suitable for all use case scenarios.



yes, I am exhausted :):)


Could you tell me what the gap is to being generic and suitable for
all use case scenarios? Is it the lack of irq/idle predictors?

I really want to upstream it for all public cloud users/providers.

As the KVM host has a similar mechanism, is it possible to upstream it under
the following conditions?
    1). add a QEMU configuration option to enable or disable it, disabled
        by default.
    2). add some "TODO" comments near the code.
    3). ...


anyway, thanks for your help..

Quan
 Alibaba Cloud


Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-22 Thread Quan Xu



On 2017-11-16 06:03, Thomas Gleixner wrote:

On Wed, 15 Nov 2017, Peter Zijlstra wrote:


On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:

From: Yang Zhang <yang.zhang...@gmail.com>

Implement a generic idle poll which resembles the functionality
found in arch/. Provide weak arch_cpu_idle_poll function which
can be overridden by the architecture code if needed.

No, we want less of those magic hooks, not more.


Interrupts arrive which may not cause a reschedule in idle loops.
In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
for interrupts and VM-exit immediately. Also this becomes more
expensive than bare metal. Add a generic idle poll before enter
real idle path. When a reschedule event is pending, we can bypass
the real idle path.

Why not do a HV specific idle driver?

If I understand the problem correctly then he wants to avoid the heavy
lifting in tick_nohz_idle_enter() in the first place, but there is already
an interesting quirk there which makes it exit early.  See commit
3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this commit
looks similar. But lets not proliferate that. I'd rather see that go away.


agreed.

We can get even more benefit in a KVM guest than commit 3c5d92a0cfb5
("nohz: Introduce arch_needs_cpu") does. I won't proliferate that.


But the irq_timings stuff is heading into the same direction, with a more
complex prediction logic which should tell you pretty good how long that
idle period is going to be and in case of an interrupt heavy workload this
would skip the extra work of stopping and restarting the tick and provide a
very good input into a polling decision.



Interesting. I have tested the IRQ_TIMINGS-related code, which does not
seem to work so far.

Also I'd like to help as much as I can.

This can be handled either in a HV specific idle driver or even in the
generic core code. If the interrupt does not arrive within the predicted
time then you can assume that the flood stopped and invoke halt or whatever.

That avoids all of that 'tunable and tweakable' x86 specific hackery and
utilizes common functionality which is mostly there already.

Here is some sample code: poll for a while before entering halt in
cpuidle_enter_state(). If I get a reschedule event, then don't try to
enter halt. (I hope this is the right direction, as Peter mentioned in
another email.)


--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -210,6 +210,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 		target_state = &drv->states[index];
 	}
 
+#ifdef CONFIG_PARAVIRT
+	paravirt_idle_poll();
+
+	if (need_resched())
+		return -EBUSY;
+#endif
+
 	/* Take note of the planned idle state. */
 	sched_idle_set_state(target_state);
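
The control flow of this hunk (poll first, skip the halt if a reschedule
became pending) can be modelled in plain C. The sketch below is a userspace
model under stated assumptions, not kernel code: poll_before_idle(), the
callback type and the spin budget are hypothetical stand-ins for
paravirt_idle_poll() and need_resched(), and fake_pending() exists only to
exercise the loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback standing in for need_resched(). */
typedef bool (*resched_pending_fn)(void *ctx);

/*
 * Spin for at most max_spins checks. Returns true if a reschedule event
 * showed up while polling (the caller should skip the halt), false if the
 * budget ran out (the caller should enter the real idle path).
 */
static bool poll_before_idle(resched_pending_fn pending, void *ctx,
			     uint32_t max_spins)
{
	for (uint32_t i = 0; i < max_spins; i++) {
		if (pending(ctx))
			return true;
	}
	return false;
}

/* Demo event source: becomes pending on the fires_at-th check. */
struct fake_event {
	uint32_t checks;
	uint32_t fires_at;
};

static bool fake_pending(void *ctx)
{
	struct fake_event *ev = ctx;

	return ++ev->checks >= ev->fires_at;
}
```

A real implementation would bound the loop by time rather than by a spin
count; the count is used here only to keep the model deterministic.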




thanks,

Quan
Alibaba Cloud

Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-22 Thread Quan Xu



On 2017-11-16 17:53, Thomas Gleixner wrote:

On Thu, 16 Nov 2017, Quan Xu wrote:

On 2017-11-16 06:03, Thomas Gleixner wrote:
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -210,6 +210,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 		target_state = &drv->states[index];
 	}
 
+#ifdef CONFIG_PARAVIRT
+	paravirt_idle_poll();
+
+	if (need_resched())
+		return -EBUSY;
+#endif

That's just plain wrong. We don't want to see any of this PARAVIRT crap in
anything outside the architecture/hypervisor interfacing code which really
needs it.

The problem can and must be solved at the generic level in the first place
to gather the data which can be used to make such decisions.

How that information is used might be either completely generic or requires
system specific variants. But as long as we don't have any information at
all we cannot discuss that.

Please sit down and write up which data needs to be considered to make
decisions about probabilistic polling. Then we need to compare and contrast
that with the data which is necessary to make power/idle state decisions.

I would be very surprised if this data would not overlap by at least 90%.



Peter, tglx
Thanks for your comments..

Rethinking this patch set:

1. Which data needs to be considered to make decisions about probabilistic
polling

I really need to write up which data needs to be considered to make
decisions about probabilistic polling. For the last several months,
I have always focused on the data _from idle to reschedule_, in order to
bypass the idle loops. Unfortunately, this inevitably makes me touch the
scheduler/idle/nohz code.

Following tglx's suggestion, the data which is necessary to make power/idle
state decisions is the last idle state's residency time. IIUC this data
is the duration from idle to wakeup, which may be caused by a reschedule
IRQ or another IRQ.

I also tested that the reschedule IRQ overlaps by more than 90% (by tracing
the need_resched status after cpuidle_idle_call) when I run ctxsw/netperf
for one minute.

Given the overlap, I think I can use the last idle state's residency time as
input to make decisions about probabilistic polling, as @dev->last_residency
does. That data is much easier to get.
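
As a rough illustration of using the last residency time as the polling
input, consider the userspace sketch below. Everything in it is an
assumption made for the example: struct idle_stats, record_idle() and
should_poll() are hypothetical names, and the rule "poll only if the last
idle period fit inside the poll window" is just one possible policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU bookkeeping, similar in spirit to last_residency. */
struct idle_stats {
	uint64_t last_residency_ns;	/* 0 until one idle period was measured */
};

/* Record how long the last idle period lasted (idle entry to wakeup). */
static void record_idle(struct idle_stats *s, uint64_t entered_ns,
			uint64_t woke_ns)
{
	s->last_residency_ns = woke_ns - entered_ns;
}

/*
 * Assumed policy: polling is worthwhile only if the previous idle period
 * was short enough that a poll of poll_window_ns would have covered it.
 */
static bool should_poll(const struct idle_stats *s, uint64_t poll_window_ns)
{
	return s->last_residency_ns != 0 &&
	       s->last_residency_ns <= poll_window_ns;
}
```

With a 2000 ns window, a previous residency of 500 ns votes for polling,
while a previous residency of 1 ms votes for halting straight away.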


2. do a HV specific idle driver (function)

So far, power management is not exposed to the guest. Idle is simple for a
KVM guest: it calls "sti" / "hlt" (cpuidle_idle_call() --> default_idle_call()).
Thanks to the Xen guys, who implemented the paravirt framework, I can
implement it as easily as the following:

--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -465,6 +465,12 @@ static void __init kvm_apf_trap_init(void)
 	update_intr_gate(X86_TRAP_PF, async_page_fault);
 }
 
+static __cpuidle void kvm_safe_halt(void)
+{
+	/* 1. POLL, if need_resched() --> return */
+
+	asm volatile("sti; hlt": : :"memory"); /* 2. halt */
+
+	/* 3. get the last idle state's residency time */
+
+	/* 4. update poll duration based on last idle state's residency time */
+}
+
 void __init kvm_guest_init(void)
 {
 	int i;
@@ -490,6 +496,8 @@ void __init kvm_guest_init(void)
 	if (kvmclock_vsyscall)
 		kvm_setup_vsyscall_timeinfo();
 
+	pv_irq_ops.safe_halt = kvm_safe_halt;
+
 #ifdef CONFIG_SMP




Then I have no need to introduce a new pvops, and never have to modify the
schedule/idle/nohz code again.

Also, I can narrow all of the code down to arch/x86/kernel/kvm.c.

If this is in the right direction, I will send a new patch set next week.

thanks,

Quan
Alibaba Cloud


Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-22 Thread Quan Xu



On 2017-11-16 16:45, Peter Zijlstra wrote:

On Wed, Nov 15, 2017 at 11:03:08PM +0100, Thomas Gleixner wrote:

If I understand the problem correctly then he wants to avoid the heavy
lifting in tick_nohz_idle_enter() in the first place, but there is already
an interesting quirk there which makes it exit early.

Sure. And there are people who want to do the same for native.

Adding more ugly and special cases just isn't the way to go about doing
that.

I'm fairly sure I've told the various groups that want to tinker with
this to work together on this. I've also in fairly significant detail
sketched how to rework the idle code and idle predictors.

At this point I'm too tired to dig any of that up, so I'll just keep
saying no to patches that don't even attempt to go in the right
direction.

Peter, take care.

I really have considered this, and I try my best not to interfere with
the scheduler/idle code. If the irq_timings code is ready, I can use it
directly. I think irq_timings is not an easy task, and I'd like to help
as much as I can. I also won't try to touch the tick_nohz* code again.

As tglx suggested, this can be handled either in a HV specific idle driver
or even in the generic core code.

I hope this is in the right direction.

Quan
Alibaba Cloud


Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-22 Thread Quan Xu



On 2017/11/14 18:27, Juergen Gross wrote:

On 14/11/17 10:38, Quan Xu wrote:


On 2017/11/14 15:30, Juergen Gross wrote:

On 14/11/17 08:02, Quan Xu wrote:

On 2017/11/13 18:53, Juergen Gross wrote:

On 13/11/17 11:06, Quan Xu wrote:

From: Quan Xu <quan@gmail.com>

So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
in idle path which will poll for a while before we enter the real idle
state.

In virtualization, the idle path includes several heavy operations,
including timer access (LAPIC timer or TSC deadline timer), which
hurt performance, especially for latency-intensive workloads like
message-passing tasks. The cost comes mainly from the vmexit, which is a
hardware context switch between the virtual machine and the hypervisor.
Our solution is to poll for a while and not enter the real idle path if
we get a schedule event during polling.

Polling may waste CPU, so we adopt a smart polling mechanism to
reduce useless polling.

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Alok Kataria <akata...@vmware.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary?

Juergen, Here is the data we get when running benchmark netperf:
   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
  29031.6 bit/s -- 76.1 %CPU

   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
  35787.7 bit/s -- 129.4 %CPU

   3. w/ kvm dynamic poll:
  35735.6 bit/s -- 200.0 %CPU

   4. w/patch and w/ kvm dynamic poll:
  42225.3 bit/s -- 198.7 %CPU

   5. idle=poll
  37081.7 bit/s -- 998.1 %CPU



   With this patch, we improve performance by 23%. We could even improve
   performance by 45.4% if we use the patch together with kvm dynamic poll.
   Also, the CPU cost is much lower than in the 'idle=poll' case.
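
The quoted percentages follow from the netperf figures listed above, taking
case 1 (no patch, halt_poll_ns=0) as the baseline. A small helper reproduces
the arithmetic; the name percent_gain() is made up for this illustration.

```c
/*
 * Relative improvement of `value` over `baseline`, in percent.
 * Hypothetical helper, only meant to check the arithmetic in this mail.
 */
static double percent_gain(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

With the baseline of 29031.6, the patch alone (35787.7) gives about 23.3%,
and the patch combined with kvm dynamic poll (42225.3) gives about 45.4%.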

I don't question the general idea. I just think pvops isn't the best way
to implement it.


Wouldn't a function pointer, maybe guarded
by a static key, be enough? A further advantage would be that this would
work on other architectures, too.

I assume this feature will be ported to other archs.. a new pvops makes

   Sorry, a typo: s/other archs/other hypervisors/
   (it refers to hypervisors like Xen, Hyper-V and VMware).


code clean and easy to maintain. Also, I tried to add it to the existing
pvops, but it doesn't fit.

You are aware that pvops is x86 only?

yes, I'm aware..


I really don't see the big difference in maintainability compared to the
static key / function pointer variant:

void (*guest_idle_poll_func)(void);
struct static_key guest_idle_poll_key __read_mostly;

static inline void guest_idle_poll(void)
{
	if (static_key_false(&guest_idle_poll_key))
		guest_idle_poll_func();
}
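
For readers without a kernel tree at hand, the same pattern can be tried in
userspace. This is only a sketch under assumptions: a plain bool stands in
for the static key (a real static key patches the branch at runtime instead
of testing a variable), and guest_idle_poll_register(), demo_poll() and
demo_poll_calls are hypothetical names added for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hook that a hypervisor backend may install; NULL when nothing is set. */
static void (*guest_idle_poll_func)(void);

/* Userspace stand-in for the static key: a simple enable flag. */
static bool guest_idle_poll_enabled;

/* Called from the idle path; does nothing unless a backend registered. */
static inline void guest_idle_poll(void)
{
	if (guest_idle_poll_enabled && guest_idle_poll_func)
		guest_idle_poll_func();
}

/* Hypothetical registration helper a backend would call at init time. */
static void guest_idle_poll_register(void (*fn)(void))
{
	guest_idle_poll_func = fn;
	guest_idle_poll_enabled = (fn != NULL);
}

/* Demo backend that only counts how often it was invoked. */
static int demo_poll_calls;

static void demo_poll(void)
{
	demo_poll_calls++;
}
```

Nothing in this pattern is tied to x86, which is the point made above: any
architecture's idle loop could call guest_idle_poll().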



Thank you for your sample code :)
I agree there is no big difference. I think we are discussing two
things:
  1) an x86 VM on different hypervisors
  2) VMs of different archs on the KVM hypervisor

What I want to do is an x86 VM on different hypervisors, such as KVM / Xen
/ Hyper-V.

Why limit the solution to x86 if the more general solution isn't
harder?

As you didn't give any reason why the pvops approach is better other
than you don't care for non-x86 platforms you won't get an "Ack" from
me for this patch.



It just looks a little odd to me. I understand that you care about non-x86
archs.

Are you aware of 'pv_time_ops' for the arm64/arm/x86 archs, defined in
   - arch/arm64/include/asm/paravirt.h
   - arch/x86/include/asm/paravirt_types.h
   - arch/arm/include/asm/paravirt.h

I am unfamiliar with the arm code. IIUC, if you implemented pv_idle_ops
for the arm/arm64 archs, you would define the same structure in
   - arch/arm64/include/asm/paravirt.h or
   - arch/arm/include/asm/paravirt.h

instead of a static key / function pointer,

then implement the real function in
   - arch/arm/kernel/paravirt.c.

Also, I wonder HOW/WHERE to define a static key / function pointer so that
it benefits both x86 and non-x86 archs?

Quan
Alibaba Cloud


And KVM would just need to set guest_idle_poll_func and enable the
static key. Works on non-x86 architectures, too.


Take 'pv_mmu_ops' as a reference: Hyper-V and Xen can implement their own
functions for 'pv_mmu_ops'.
I think it is the same for pv_idle_ops.

With the above explanation, do you still think I need to define the static
key / function pointer variant?

btw, any interest to port it to Xen HVM guest? :)

Maybe. But this should work for Xen on ARM, too.


Juergen




Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-22 Thread Quan Xu



On 2017/11/14 16:22, Wanpeng Li wrote:

2017-11-14 16:15 GMT+08:00 Quan Xu <quan@gmail.com>:


On 2017/11/14 15:12, Wanpeng Li wrote:

2017-11-14 15:02 GMT+08:00 Quan Xu <quan@gmail.com>:


On 2017/11/13 18:53, Juergen Gross wrote:

On 13/11/17 11:06, Quan Xu wrote:

From: Quan Xu <quan@gmail.com>

So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
in idle path which will poll for a while before we enter the real idle
state.

In virtualization, the idle path includes several heavy operations,
including timer access (LAPIC timer or TSC deadline timer), which
hurt performance, especially for latency-intensive workloads like
message-passing tasks. The cost comes mainly from the vmexit, which is a
hardware context switch between the virtual machine and the hypervisor.
Our solution is to poll for a while and not enter the real idle path if
we get a schedule event during polling.

Polling may waste CPU, so we adopt a smart polling mechanism to
reduce useless polling.

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Alok Kataria <akata...@vmware.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary?

Juergen, Here is the data we get when running benchmark netperf:
   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
  29031.6 bit/s -- 76.1 %CPU

   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
  35787.7 bit/s -- 129.4 %CPU

   3. w/ kvm dynamic poll:
  35735.6 bit/s -- 200.0 %CPU

Actually, we can reduce the CPU utilization by sleeping for a period of
time, as is already done in the poll logic of the IO subsystem;
then we can improve the algorithm in KVM instead of introducing another
duplicate one in the KVM guest.

We really appreciate upstream's KVM dynamic poll mechanism, which is
really helpful in a lot of scenarios.

However, as the description says, in virtualization the idle path includes
several heavy operations, including timer access (LAPIC timer or TSC
deadline timer), which hurt performance, especially for latency-intensive
workloads like message-passing tasks. The cost comes mainly from the
vmexit, which is a hardware context switch between the virtual machine
and the hypervisor.

As for upstream's KVM dynamic poll mechanism, even if you could provide a
better algorithm, how could you bypass the timer access (LAPIC timer or TSC
deadline timer), or the hardware context switch between the virtual machine
and the hypervisor? I know this is a tradeoff.

Furthermore, here is the data we get when running benchmark contextswitch
to measure the latency(lower is better):

1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
   3402.9 ns/ctxsw -- 199.8 %CPU

2. w/ patch and disable kvm dynamic poll:
   1163.5 ns/ctxsw -- 205.5 %CPU

3. w/ kvm dynamic poll:
   2280.6 ns/ctxsw -- 199.5 %CPU

So these two solutions are quite similar, but not duplicates.

That is also why we add a generic idle poll before entering the real idle
path. When a reschedule event is pending, we can bypass the real idle path.


There is similar logic in the idle governor/driver, so how does this
patchset influence the decision in the idle governor/driver when
running on bare metal (power management is not exposed to the guest, so
we will not enter the idle driver in the guest)?



This is expected to take effect only when running as a virtual machine with
the proper CONFIG_* enabled. It cannot work on bare metal, even with the
proper CONFIG_* enabled.

Quan
Alibaba Cloud


Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-22 Thread Quan Xu



On 2017/11/14 15:30, Juergen Gross wrote:

On 14/11/17 08:02, Quan Xu wrote:


On 2017/11/13 18:53, Juergen Gross wrote:

On 13/11/17 11:06, Quan Xu wrote:

From: Quan Xu <quan@gmail.com>

So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
in idle path which will poll for a while before we enter the real idle
state.

In virtualization, the idle path includes several heavy operations,
including timer access (LAPIC timer or TSC deadline timer), which
hurt performance, especially for latency-intensive workloads like
message-passing tasks. The cost comes mainly from the vmexit, which is a
hardware context switch between the virtual machine and the hypervisor.
Our solution is to poll for a while and not enter the real idle path if
we get a schedule event during polling.

Polling may waste CPU, so we adopt a smart polling mechanism to
reduce useless polling.

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Alok Kataria <akata...@vmware.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary?

Juergen, Here is the data we get when running benchmark netperf:
  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
     29031.6 bit/s -- 76.1 %CPU

  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
     35787.7 bit/s -- 129.4 %CPU

  3. w/ kvm dynamic poll:
     35735.6 bit/s -- 200.0 %CPU

  4. w/patch and w/ kvm dynamic poll:
     42225.3 bit/s -- 198.7 %CPU

  5. idle=poll
     37081.7 bit/s -- 998.1 %CPU



  With this patch, we improve performance by 23%. We could even improve
  performance by 45.4% if we use the patch together with kvm dynamic poll.
  Also, the CPU cost is much lower than in the 'idle=poll' case.

I don't question the general idea. I just think pvops isn't the best way
to implement it.


Wouldn't a function pointer, maybe guarded
by a static key, be enough? A further advantage would be that this would
work on other architectures, too.

I assume this feature will be ported to other archs.. a new pvops makes


  Sorry, a typo: s/other archs/other hypervisors/
  (it refers to hypervisors like Xen, Hyper-V and VMware).


code clean and easy to maintain. Also, I tried to add it to the existing
pvops, but it doesn't fit.

You are aware that pvops is x86 only?


yes, I'm aware..


I really don't see the big difference in maintainability compared to the
static key / function pointer variant:

void (*guest_idle_poll_func)(void);
struct static_key guest_idle_poll_key __read_mostly;

static inline void guest_idle_poll(void)
{
	if (static_key_false(&guest_idle_poll_key))
		guest_idle_poll_func();
}




Thank you for your sample code :)
I agree there is no big difference. I think we are discussing two
things:

 1) an x86 VM on different hypervisors
 2) VMs of different archs on the KVM hypervisor

What I want to do is an x86 VM on different hypervisors, such as KVM / Xen
/ Hyper-V.



And KVM would just need to set guest_idle_poll_func and enable the
static key. Works on non-x86 architectures, too.



Take 'pv_mmu_ops' as a reference: Hyper-V and Xen can implement their own
functions for 'pv_mmu_ops'.

I think it is the same for pv_idle_ops.

With the above explanation, do you still think I need to define the static
key / function pointer variant?

btw, any interest to port it to Xen HVM guest? :)

Quan
Alibaba Cloud

Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-22 Thread Quan Xu



On 2017/11/14 15:12, Wanpeng Li wrote:

2017-11-14 15:02 GMT+08:00 Quan Xu <quan@gmail.com>:


On 2017/11/13 18:53, Juergen Gross wrote:

On 13/11/17 11:06, Quan Xu wrote:

From: Quan Xu <quan@gmail.com>

So far, pv_idle_ops.poll is the only ops for pv_idle. .poll is called
in idle path which will poll for a while before we enter the real idle
state.

In virtualization, the idle path includes several heavy operations,
including timer access (LAPIC timer or TSC deadline timer), which
hurt performance, especially for latency-intensive workloads like
message-passing tasks. The cost comes mainly from the vmexit, which is a
hardware context switch between the virtual machine and the hypervisor.
Our solution is to poll for a while and not enter the real idle path if
we get a schedule event during polling.

Polling may waste CPU, so we adopt a smart polling mechanism to
reduce useless polling.

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Alok Kataria <akata...@vmware.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary?

Juergen, Here is the data we get when running benchmark netperf:
  1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
 29031.6 bit/s -- 76.1 %CPU

  2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
 35787.7 bit/s -- 129.4 %CPU

  3. w/ kvm dynamic poll:
 35735.6 bit/s -- 200.0 %CPU

Actually, we can reduce the CPU utilization by sleeping for a period of
time, as is already done in the poll logic of the IO subsystem;
then we can improve the algorithm in KVM instead of introducing another
duplicate one in the KVM guest.

We really appreciate upstream's KVM dynamic poll mechanism, which is
really helpful in a lot of scenarios.

However, as the description says, in virtualization the idle path includes
several heavy operations, including timer access (LAPIC timer or TSC
deadline timer), which hurt performance, especially for latency-intensive
workloads like message-passing tasks. The cost comes mainly from the
vmexit, which is a hardware context switch between the virtual machine
and the hypervisor.

As for upstream's KVM dynamic poll mechanism, even if you could provide a
better algorithm, how could you bypass the timer access (LAPIC timer or TSC
deadline timer), or the hardware context switch between the virtual machine
and the hypervisor? I know this is a tradeoff.

Furthermore, here is the data we get when running benchmark contextswitch
to measure the latency(lower is better):

1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
  3402.9 ns/ctxsw -- 199.8 %CPU

2. w/ patch and disable kvm dynamic poll:
  1163.5 ns/ctxsw -- 205.5 %CPU

3. w/ kvm dynamic poll:
  2280.6 ns/ctxsw -- 199.5 %CPU

So these two solutions are quite similar, but not duplicates..

That's also why we add a generic idle poll before entering the real idle path.
When a reschedule event is pending, we can bypass the real idle path.


Quan
Alibaba Cloud





Regards,
Wanpeng Li


  4. w/patch and w/ kvm dynamic poll:
 42225.3 bit/s -- 198.7 %CPU

  5. idle=poll
 37081.7 bit/s -- 998.1 %CPU



  With this patch, we improve performance by 23%.. we could even improve
  performance by 45.4% if we use the patch together with kvm dynamic poll. Also, the
  CPU cost is much lower than in the 'idle=poll' case..


Wouldn't a function pointer, maybe guarded
by a static key, be enough? A further advantage would be that this would
work on other architectures, too.


I assume this feature will be ported to other archs.. A new pvops keeps the code
clean and easy to maintain. Also, I tried to add it to the existing pvops, but
it doesn't fit.



Quan
Alibaba Cloud


Juergen



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-22 Thread Quan Xu



On 2017/11/13 18:53, Juergen Gross wrote:

On 13/11/17 11:06, Quan Xu wrote:

From: Quan Xu <quan@gmail.com>

So far, pv_idle_ops.poll is the only op in pv_idle_ops. .poll is called
in the idle path and polls for a while before we enter the real idle
state.

In virtualization, the idle path includes several heavy operations,
including timer access (LAPIC timer or TSC deadline timer), which
hurt performance, especially for latency-intensive workloads such as
message-passing tasks. The cost comes mainly from the vmexit, which is a
hardware context switch between the virtual machine and the hypervisor.
Our solution is to poll for a while and skip the real idle path if a
schedule event arrives during polling.

Polling may waste CPU, so we adopt a smart polling mechanism to reduce
useless polling.

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Alok Kataria <akata...@vmware.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org

Hmm, is the idle entry path really so critical to performance that a new
pvops function is necessary?

Juergen, Here is the data we get when running benchmark netperf:
 1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
    29031.6 bit/s -- 76.1 %CPU

 2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
    35787.7 bit/s -- 129.4 %CPU

 3. w/ kvm dynamic poll:
    35735.6 bit/s -- 200.0 %CPU

 4. w/patch and w/ kvm dynamic poll:
    42225.3 bit/s -- 198.7 %CPU

 5. idle=poll
    37081.7 bit/s -- 998.1 %CPU



 With this patch, we improve performance by 23%.. we could even improve
 performance by 45.4% if we use the patch together with kvm dynamic poll. Also, the
 CPU cost is much lower than in the 'idle=poll' case..
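
For reference, the two percentages follow directly from the netperf figures
listed above (case 1 as the baseline, cases 2 and 4 as the patched results).
A minimal sketch of the arithmetic in C; gain_percent() is a hypothetical
helper written for this illustration and is not part of the series:

```c
#include <assert.h>

/* Relative throughput gain of `patched` over `baseline`, in percent.
 * Hypothetical helper, used only to reproduce the figures quoted above. */
double gain_percent(double baseline, double patched)
{
	assert(baseline > 0.0);
	return (patched - baseline) / baseline * 100.0;
}
```

gain_percent(29031.6, 35787.7) gives roughly 23.3 (patch only), and
gain_percent(29031.6, 42225.3) gives roughly 45.4 (patch plus kvm dynamic
poll).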


Wouldn't a function pointer, maybe guarded
by a static key, be enough? A further advantage would be that this would
work on other architectures, too.


I assume this feature will be ported to other archs.. A new pvops keeps the code
clean and easy to maintain. Also, I tried to add it to the existing pvops, but
it doesn't fit.
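
For readers following the thread, the alternative Juergen raises (a function
pointer guarded by a static key) could look roughly like the sketch below.
This is a userspace illustration only: a plain bool stands in for a kernel
static key, and every name here (idle_poll_hook, idle_poll_enabled,
register_idle_poll, demo_guest_poll) is hypothetical and does not appear in
the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none come from the patch series. */
typedef void (*idle_poll_fn)(void);

idle_poll_fn idle_poll_hook;	/* NULL until a guest registers a hook */
bool idle_poll_enabled;		/* stand-in for a kernel static key */
int poll_calls;			/* counts hook invocations for the demo */

/* Example hook a guest could register. */
void demo_guest_poll(void)
{
	poll_calls++;
}

/* Install the hook first, then flip the guard. */
void register_idle_poll(idle_poll_fn fn)
{
	assert(fn != NULL);
	idle_poll_hook = fn;
	idle_poll_enabled = true;
}

/* Called on the idle path: only a flag test when nothing is registered. */
void idle_poll(void)
{
	if (idle_poll_enabled)
		idle_poll_hook();
}
```

With this shape the generic idle code would not need an arch hook; whether
that is preferable to a new pvops is exactly the question under discussion.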



Quan
Alibaba Cloud


Juergen




Re: [PATCH RFC v3 4/6] Documentation: Add three sysctls for smart idle poll

2017-11-22 Thread Quan Xu



On 2017/11/13 23:08, Ingo Molnar wrote:

* Quan Xu <quan.x...@gmail.com> wrote:


From: Quan Xu <quan@gmail.com>

To reduce the cost of polling, we introduce three sysctls to control the
poll time when running as a virtual machine with paravirt.

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
---
  Documentation/sysctl/kernel.txt |   35 +++
  arch/x86/kernel/paravirt.c  |4 
  include/linux/kernel.h  |6 ++
  kernel/sysctl.c |   34 ++
  4 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c..30c25fb 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -714,6 +714,41 @@ kernel tries to allocate a number starting from this one.
  
  ==
  
+paravirt_poll_grow: (X86 only)

+
+Multiplier used to increase the poll time. This is expected to take
+effect only when running as a virtual machine with CONFIG_PARAVIRT
+enabled. It brings no benefit on bare metal, even with
+CONFIG_PARAVIRT enabled.
+
+By default this value is 2. Possible values are in the range {2..16}.
+
+==
+
+paravirt_poll_shrink: (X86 only)
+
+Divisor used to reduce the poll time. This is expected to take effect
+only when running as a virtual machine with CONFIG_PARAVIRT enabled.
+It brings no benefit on bare metal, even with CONFIG_PARAVIRT
+enabled.
+
+By default this value is 2. Possible values are in the range {2..16}.
+
+==
+
+paravirt_poll_threshold_ns: (X86 only)
+
+Controls the maximum poll time before entering the real idle path. This is
+expected to take effect only when running as a virtual machine with
+CONFIG_PARAVIRT enabled. It brings no benefit on bare metal,
+even with CONFIG_PARAVIRT enabled.
+
+By default this value is 0, which means no polling. Possible values
+are in the range {0..50}. Change the value to non-zero if running
+latency-bound workloads in a virtual machine.

I absolutely hate it how this hybrid idle loop polling mechanism is not
self-tuning!


Ingo, actually it is self-tuning..

Please make it all work fine by default, and automatically so, instead of adding
three random parameters...
.. I will make it all work fine by default. However, cloud environments are
diverse. Could I keep only the paravirt_poll_threshold_ns parameter (the
maximum poll time), which is similar to the "adaptive halt-polling" Wanpeng
mentioned? Then users can turn it off, or find an appropriate threshold for
some odd scenario..

thanks for your comments!!
Quan
Alibaba Cloud

And if it cannot be done automatically then we should rather not do it at all.
Maybe the next submitter of a similar feature can think of a better approach.

Thanks,

Ingo





[PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

2017-11-22 Thread Quan Xu
From: Yang Zhang <yang.zhang...@gmail.com>

Implement a generic idle poll which resembles the functionality
found in arch/. Provide weak arch_cpu_idle_poll function which
can be overridden by the architecture code if needed.

Interrupts may arrive in the idle loop without causing a reschedule.
In a KVM guest, this costs several VM-exit/VM-entry cycles: a VM-entry
for the interrupt followed immediately by a VM-exit. This makes the idle
path more expensive than on bare metal. Add a generic idle poll before
entering the real idle path. When a reschedule event is pending, we can
bypass the real idle path.

Signed-off-by: Quan Xu <quan@gmail.com>
Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Kyle Huey <m...@kylehuey.com>
Cc: Len Brown <len.br...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Tom Lendacky <thomas.lenda...@amd.com>
Cc: Tobias Klauser <tklau...@distanz.ch>
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/kernel/process.c |7 +++
 kernel/sched/idle.c   |2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c676853..f7db8b5 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -333,6 +333,13 @@ void arch_cpu_idle(void)
x86_idle();
 }
 
+#ifdef CONFIG_PARAVIRT
+void arch_cpu_idle_poll(void)
+{
+   paravirt_idle_poll();
+}
+#endif
+
 /*
  * We use this if we don't have any better idle routine..
  */
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 257f4f0..df7c422 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -74,6 +74,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
 }
 
 /* Weak implementations for optional arch specific functions */
+void __weak arch_cpu_idle_poll(void) { }
 void __weak arch_cpu_idle_prepare(void) { }
 void __weak arch_cpu_idle_enter(void) { }
 void __weak arch_cpu_idle_exit(void) { }
@@ -219,6 +220,7 @@ static void do_idle(void)
 */
 
__current_set_polling();
+   arch_cpu_idle_poll();
quiet_vmstat();
tick_nohz_idle_enter();
 
-- 
1.7.1



[PATCH RFC v3 0/6] x86/idle: add halt poll support

2017-11-22 Thread Quan Xu
From: Quan Xu <quan@gmail.com>

Some latency-intensive workloads see an obvious performance
drop when running inside a VM. The main reason is that the overhead
is amplified when running inside a VM. The largest cost I have seen is
in the idle path.

This patch series introduces a new mechanism to poll for a while before
entering the idle state. If a reschedule is needed during the poll, then we
don't need to go through the heavy overhead path.

Here is the data we get when running benchmark contextswitch to measure
the latency(lower is better):

   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
 3402.9 ns/ctxsw -- 199.8 %CPU

   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
  halt_poll_threshold=1  -- 1151.4 ns/ctxsw -- 200.1 %CPU
  halt_poll_threshold=2  -- 1149.7 ns/ctxsw -- 199.9 %CPU
  halt_poll_threshold=3  -- 1151.0 ns/ctxsw -- 199.9 %CPU
  halt_poll_threshold=4  -- 1155.4 ns/ctxsw -- 199.3 %CPU
  halt_poll_threshold=5  -- 1161.0 ns/ctxsw -- 200.0 %CPU
  halt_poll_threshold=10 -- 1163.8 ns/ctxsw -- 200.4 %CPU
  halt_poll_threshold=30 -- 1159.4 ns/ctxsw -- 201.9 %CPU
  halt_poll_threshold=50 -- 1163.5 ns/ctxsw -- 205.5 %CPU

   3. w/ kvm dynamic poll:
  halt_poll_ns=1  -- 3470.5 ns/ctxsw -- 199.6 %CPU
  halt_poll_ns=2  -- 3273.0 ns/ctxsw -- 199.7 %CPU
  halt_poll_ns=3  -- 3628.7 ns/ctxsw -- 199.4 %CPU
  halt_poll_ns=4  -- 2280.6 ns/ctxsw -- 199.5 %CPU
  halt_poll_ns=5  -- 3200.3 ns/ctxsw -- 199.7 %CPU
  halt_poll_ns=10 -- 2186.6 ns/ctxsw -- 199.6 %CPU
  halt_poll_ns=30 -- 3178.7 ns/ctxsw -- 199.6 %CPU
  halt_poll_ns=50 -- 3505.4 ns/ctxsw -- 199.7 %CPU

   4. w/patch and w/ kvm dynamic poll:

  halt_poll_ns=1 & halt_poll_threshold=1  -- 1155.5 ns/ctxsw -- 199.8 %CPU
  halt_poll_ns=1 & halt_poll_threshold=2  -- 1165.6 ns/ctxsw -- 199.8 %CPU
  halt_poll_ns=1 & halt_poll_threshold=3  -- 1161.1 ns/ctxsw -- 200.0 %CPU

  halt_poll_ns=2 & halt_poll_threshold=1  -- 1158.1 ns/ctxsw -- 199.8 %CPU
  halt_poll_ns=2 & halt_poll_threshold=2  -- 1161.0 ns/ctxsw -- 199.7 %CPU
  halt_poll_ns=2 & halt_poll_threshold=3  -- 1163.7 ns/ctxsw -- 199.9 %CPU

  halt_poll_ns=3 & halt_poll_threshold=1  -- 1158.7 ns/ctxsw -- 199.7 %CPU
  halt_poll_ns=3 & halt_poll_threshold=2  -- 1153.8 ns/ctxsw -- 199.8 %CPU
  halt_poll_ns=3 & halt_poll_threshold=3  -- 1155.1 ns/ctxsw -- 199.8 %CPU

   5. idle=poll
  3957.57 ns/ctxsw --  999.4%CPU

Here is the data we get when running benchmark netperf:

   1. w/o patch and disable kvm dynamic poll (halt_poll_ns=0):
  29031.6 bit/s -- 76.1 %CPU

   2. w/ patch and disable kvm dynamic poll (halt_poll_ns=0):
  halt_poll_threshold=1  -- 29021.7 bit/s -- 105.1 %CPU
  halt_poll_threshold=2  -- 33463.5 bit/s -- 128.2 %CPU
  halt_poll_threshold=3  -- 34436.4 bit/s -- 127.8 %CPU
  halt_poll_threshold=4  -- 35563.3 bit/s -- 129.6 %CPU
  halt_poll_threshold=5  -- 35787.7 bit/s -- 129.4 %CPU
  halt_poll_threshold=10 -- 35477.7 bit/s -- 130.0 %CPU
  halt_poll_threshold=30 -- 35730.0 bit/s -- 132.4 %CPU
  halt_poll_threshold=50 -- 34978.4 bit/s -- 134.2 %CPU

   3. w/ kvm dynamic poll:
  halt_poll_ns=1  -- 28849.8 bit/s -- 75.2  %CPU
  halt_poll_ns=2  -- 29004.8 bit/s -- 76.1  %CPU
  halt_poll_ns=3  -- 35662.0 bit/s -- 199.7 %CPU
  halt_poll_ns=4  -- 35874.8 bit/s -- 187.5 %CPU
  halt_poll_ns=5  -- 35603.1 bit/s -- 199.8 %CPU
  halt_poll_ns=10 -- 35588.8 bit/s -- 200.0 %CPU
  halt_poll_ns=30 -- 35912.4 bit/s -- 200.0 %CPU
  halt_poll_ns=50 -- 35735.6 bit/s -- 200.0 %CPU

   4. w/patch and w/ kvm dynamic poll:

  halt_poll_ns=1 & halt_poll_threshold=1  -- 29427.9 bit/s -- 107.8 %CPU
  halt_poll_ns=1 & halt_poll_threshold=2  -- 33048.4 bit/s -- 128.1 %CPU
  halt_poll_ns=1 & halt_poll_threshold=3  -- 35129.8 bit/s -- 129.1 %CPU

  halt_poll_ns=2 & halt_poll_threshold=1  -- 31091.3 bit/s -- 130.3 %CPU
  halt_poll_ns=2 & halt_poll_threshold=2  -- 33587.9 bit/s -- 128.9 %CPU
  halt_poll_ns=2 & halt_poll_threshold=3  -- 35532.9 bit/s -- 129.1 %CPU

  halt_poll_ns=3 & halt_poll_threshold=1  -- 35633.1 bit/s -- 199.4 %CPU
  halt_poll_ns=3 & halt_poll_threshold=2  -- 42225.3 bit/s -- 198.7 %CPU
  halt_poll_ns=3 & halt_poll_threshold=3  -- 42210.7 bit/s -- 200.3 %CPU

   5. idle=poll
  37081.7 bit/s -- 998.1 %CPU

---
V2 -> V3:
- move poll update into arch/. in v3, poll update is based on duration of the
  last idle loop which is from tick_nohz_idle_enter to tick_nohz_idle_exit,
  and try our best not to interfere with schedule

[PATCH RFC v3 5/6] tick: get duration of the last idle loop

2017-11-22 Thread Quan Xu
From: Quan Xu <quan@gmail.com>

Record the duration of the last idle loop, measured from
tick_nohz_idle_enter to tick_nohz_idle_exit, and expose it through
tick_nohz_get_last_idle_length().

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
Cc: Frederic Weisbecker <fweis...@gmail.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: linux-ker...@vger.kernel.org
---
 include/linux/tick.h |2 ++
 kernel/time/tick-sched.c |   11 +++
 kernel/time/tick-sched.h |3 +++
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index cf413b3..77ae46d 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -118,6 +118,7 @@ enum tick_dep_bits {
 extern void tick_nohz_idle_exit(void);
 extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
+extern ktime_t tick_nohz_get_last_idle_length(void);
 extern unsigned long tick_nohz_get_idle_calls(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
@@ -127,6 +128,7 @@ enum tick_dep_bits {
 static inline void tick_nohz_idle_enter(void) { }
 static inline void tick_nohz_idle_exit(void) { }
 
+static inline ktime_t tick_nohz_get_last_idle_length(void) { return -1; }
 static inline ktime_t tick_nohz_get_sleep_length(void)
 {
return NSEC_PER_SEC / HZ;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c7a899c..65c9cc0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -548,6 +548,7 @@ static void tick_nohz_update_jiffies(ktime_t now)
else
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
ts->idle_entrytime = now;
+   ts->idle_length = delta;
}
 
if (last_update_time)
@@ -998,6 +999,16 @@ void tick_nohz_irq_exit(void)
 }
 
 /**
+ * tick_nohz_get_last_idle_length - return the length of the last idle loop
+ */
+ktime_t tick_nohz_get_last_idle_length(void)
+{
+   struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+
+   return ts->idle_length;
+}
+
+/**
  * tick_nohz_get_sleep_length - return the length of the current sleep
  *
  * Called from power state control code with interrupts disabled
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index 954b43d..2630cf9 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -39,6 +39,8 @@ enum tick_nohz_mode {
  * @idle_sleeptime:Sum of the time slept in idle with sched tick stopped
  * @iowait_sleeptime:  Sum of the time slept in idle with sched tick stopped, with IO outstanding
  * @sleep_length:  Duration of the current idle sleep
+ * @idle_length:   Duration of the last idle loop is from
+ * tick_nohz_idle_enter to tick_nohz_idle_exit.
  * @do_timer_lst:  CPU was the last one doing do_timer before going idle
  */
 struct tick_sched {
@@ -59,6 +61,7 @@ struct tick_sched {
ktime_t idle_sleeptime;
ktime_t iowait_sleeptime;
ktime_t sleep_length;
+   ktime_t idle_length;
unsigned long   last_jiffies;
u64 next_timer;
ktime_t idle_expires;
-- 
1.7.1



[PATCH RFC v3 6/6] KVM guest: introduce smart idle poll algorithm

2017-11-22 Thread Quan Xu
From: Yang Zhang <yang.zhang...@gmail.com>

Use a smart idle poll to reduce useless polling when the system is idle.

Signed-off-by: Quan Xu <quan@gmail.com>
Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/kernel/kvm.c |   47 +++
 1 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 2a6e402..8bb6d55 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -365,11 +366,57 @@ static void kvm_guest_cpu_init(void)
kvm_register_steal_time();
 }
 
+static unsigned int grow_poll_ns(unsigned int old, unsigned int grow,
+unsigned int max)
+{
+   unsigned int val;
+
+   /* set base poll time to 1ns */
+   if (old == 0 && grow)
+   return 1;
+
+   val = old * grow;
+   if (val > max)
+   val = max;
+
+   return val;
+}
+
+static unsigned int shrink_poll_ns(unsigned int old, unsigned int shrink)
+{
+   if (shrink == 0)
+   return 0;
+
+   return old / shrink;
+}
+
+static void kvm_idle_update_poll_duration(ktime_t idle)
+{
+   unsigned long poll_duration = this_cpu_read(poll_duration_ns);
+
+   /* so far poll duration is based on nohz */
+   if (idle == -1ULL)
+   return;
+
+   if (poll_duration && idle > paravirt_poll_threshold_ns)
+   poll_duration = shrink_poll_ns(poll_duration,
+  paravirt_poll_shrink);
+   else if (poll_duration < paravirt_poll_threshold_ns &&
+idle < paravirt_poll_threshold_ns)
+   poll_duration = grow_poll_ns(poll_duration, paravirt_poll_grow,
+paravirt_poll_threshold_ns);
+
+   this_cpu_write(poll_duration_ns, poll_duration);
+}
+
 static void kvm_idle_poll(void)
 {
unsigned long poll_duration = this_cpu_read(poll_duration_ns);
+   ktime_t idle = tick_nohz_get_last_idle_length();
ktime_t start, cur, stop;
 
+   kvm_idle_update_poll_duration(idle);
+
start = cur = ktime_get();
stop = ktime_add_ns(ktime_get(), poll_duration);
 
-- 
1.7.1
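
The grow/shrink logic in the patch above can be tried out in userspace with
the sketch below. It mirrors grow_poll_ns() and shrink_poll_ns() from the
patch; update_poll_ns() is a simplified stand-in for
kvm_idle_update_poll_duration() that takes the tunables as plain parameters
(an assumption made for illustration: the patch reads them from per-CPU
state and sysctls, and uses ktime_t for the idle length).

```c
#include <assert.h>

/* Same logic as grow_poll_ns() in the patch: start at 1ns, then multiply,
 * capped at max. */
unsigned int grow_poll_ns(unsigned int old, unsigned int grow,
			  unsigned int max)
{
	unsigned int val;

	if (old == 0 && grow)
		return 1;

	val = old * grow;
	if (val > max)
		val = max;

	return val;
}

/* Same logic as shrink_poll_ns() in the patch: divide, or stop polling. */
unsigned int shrink_poll_ns(unsigned int old, unsigned int shrink)
{
	if (shrink == 0)
		return 0;

	return old / shrink;
}

/* Simplified stand-in for kvm_idle_update_poll_duration(): shrink after a
 * long idle period, grow after a short one, otherwise keep the budget. */
unsigned int update_poll_ns(unsigned int poll, unsigned int idle_ns,
			    unsigned int threshold, unsigned int grow,
			    unsigned int shrink)
{
	if (poll && idle_ns > threshold)
		return shrink_poll_ns(poll, shrink);
	if (poll < threshold && idle_ns < threshold)
		return grow_poll_ns(poll, grow, threshold);
	return poll;
}
```

Starting from 0 with threshold 50 and grow/shrink both 2, repeated short
idle periods step the budget through 1, 2, 4, 8, 16, 32 and then the cap of
50; a single long idle period halves it to 25. With the default threshold
of 0 the budget never leaves 0, so nothing is polled.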



[PATCH RFC v3 4/6] Documentation: Add three sysctls for smart idle poll

2017-11-22 Thread Quan Xu
From: Quan Xu <quan@gmail.com>

To reduce the cost of polling, we introduce three sysctls to control the
poll time when running as a virtual machine with paravirt.

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
---
 Documentation/sysctl/kernel.txt |   35 +++
 arch/x86/kernel/paravirt.c  |4 
 include/linux/kernel.h  |6 ++
 kernel/sysctl.c |   34 ++
 4 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c..30c25fb 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -714,6 +714,41 @@ kernel tries to allocate a number starting from this one.
 
 ==
 
+paravirt_poll_grow: (X86 only)
+
+Multiplier used to increase the poll time. This is expected to take
+effect only when running as a virtual machine with CONFIG_PARAVIRT
+enabled. It brings no benefit on bare metal, even with
+CONFIG_PARAVIRT enabled.
+
+By default this value is 2. Possible values are in the range {2..16}.
+
+==
+
+paravirt_poll_shrink: (X86 only)
+
+Divisor used to reduce the poll time. This is expected to take effect
+only when running as a virtual machine with CONFIG_PARAVIRT enabled.
+It brings no benefit on bare metal, even with CONFIG_PARAVIRT
+enabled.
+
+By default this value is 2. Possible values are in the range {2..16}.
+
+==
+
+paravirt_poll_threshold_ns: (X86 only)
+
+Controls the maximum poll time before entering the real idle path. This is
+expected to take effect only when running as a virtual machine with
+CONFIG_PARAVIRT enabled. It brings no benefit on bare metal,
+even with CONFIG_PARAVIRT enabled.
+
+By default this value is 0, which means no polling. Possible values
+are in the range {0..50}. Change the value to non-zero if running
+latency-bound workloads in a virtual machine.
+
+==
+
 powersave-nap: (PPC only)
 
 If set, Linux-PPC will use the 'nap' mode of powersaving,
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 67cab22..28c74ca 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -317,6 +317,10 @@ struct pv_idle_ops pv_idle_ops = {
.poll = paravirt_nop,
 };
 
+unsigned long paravirt_poll_threshold_ns;
+unsigned int paravirt_poll_shrink = 2;
+unsigned int paravirt_poll_grow = 2;
+
 __visible struct pv_irq_ops pv_irq_ops = {
.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 4b484ab..0f46846 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -491,6 +491,12 @@ extern __scanf(2, 0)
 
 extern bool crash_kexec_post_notifiers;
 
+#ifdef CONFIG_PARAVIRT
+extern unsigned long paravirt_poll_threshold_ns;
+extern unsigned int paravirt_poll_shrink;
+extern unsigned int paravirt_poll_grow;
+#endif
+
 /*
  * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
  * holds a CPU number which is executing panic() currently. A value of
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d9c31bc..9f194dc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -135,6 +135,11 @@
 static int six_hundred_forty_kb = 640 * 1024;
 #endif
 
+#ifdef CONFIG_PARAVIRT
+static int sixteen = 16;
+static int five_hundred_thousand = 50;
+#endif
+
 /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */
 static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
 
@@ -1226,6 +1231,35 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
.extra2 = ,
},
 #endif
+#ifdef CONFIG_PARAVIRT
+   {
+   .procname   = "paravirt_halt_poll_threshold",
+   .data   = &paravirt_poll_threshold_ns,
+   .maxlen = sizeof(unsigned long),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = &five_hundred_thousand,
+   },
+   {
+   .procname   = "paravirt_halt_poll_grow",
+   .data   = &paravirt_poll_grow,
+   .maxlen = sizeof(unsigned int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = ,
+   .extra2 = ,
+   },
+   {
+   .procname   = "paravirt_halt_poll_shrink",
+   .data   = &paravirt_poll_shrink,
+   .maxlen = sizeof(unsig

[PATCH RFC v3 1/6] x86/paravirt: Add pv_idle_ops to paravirt ops

2017-11-22 Thread Quan Xu
From: Quan Xu <quan@gmail.com>

So far, pv_idle_ops.poll is the only op in pv_idle_ops. .poll is called
in the idle path and polls for a while before we enter the real idle
state.

In virtualization, the idle path includes several heavy operations,
including timer access (LAPIC timer or TSC deadline timer), which
hurt performance, especially for latency-intensive workloads such as
message-passing tasks. The cost comes mainly from the vmexit, which is a
hardware context switch between the virtual machine and the hypervisor.
Our solution is to poll for a while and skip the real idle path if a
schedule event arrives during polling.

Polling may waste CPU, so we adopt a smart polling mechanism to reduce
useless polling.

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Alok Kataria <akata...@vmware.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org
---
 arch/x86/include/asm/paravirt.h   |5 +
 arch/x86/include/asm/paravirt_types.h |6 ++
 arch/x86/kernel/paravirt.c|6 ++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index fd81228..3c83727 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -198,6 +198,11 @@ static inline unsigned long long paravirt_read_pmc(int counter)
 
 #define rdpmcl(counter, val) ((val) = paravirt_read_pmc(counter))
 
+static inline void paravirt_idle_poll(void)
+{
+   PVOP_VCALL0(pv_idle_ops.poll);
+}
+
 static inline void paravirt_alloc_ldt(struct desc_struct *ldt, unsigned entries)
 {
PVOP_VCALL2(pv_cpu_ops.alloc_ldt, ldt, entries);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 10cc3b9..95c0e3e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -313,6 +313,10 @@ struct pv_lock_ops {
struct paravirt_callee_save vcpu_is_preempted;
 } __no_randomize_layout;
 
+struct pv_idle_ops {
+   void (*poll)(void);
+} __no_randomize_layout;
+
 /* This contains all the paravirt structures: we get a convenient
  * number for each function using the offset which we use to indicate
  * what to patch. */
@@ -323,6 +327,7 @@ struct paravirt_patch_template {
struct pv_irq_ops pv_irq_ops;
struct pv_mmu_ops pv_mmu_ops;
struct pv_lock_ops pv_lock_ops;
+   struct pv_idle_ops pv_idle_ops;
 } __no_randomize_layout;
 
 extern struct pv_info pv_info;
@@ -332,6 +337,7 @@ struct paravirt_patch_template {
 extern struct pv_irq_ops pv_irq_ops;
 extern struct pv_mmu_ops pv_mmu_ops;
 extern struct pv_lock_ops pv_lock_ops;
+extern struct pv_idle_ops pv_idle_ops;
 
 #define PARAVIRT_PATCH(x)  \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 19a3e8f..67cab22 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -128,6 +128,7 @@ unsigned paravirt_patch_jmp(void *insnbuf, const void *target,
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
.pv_lock_ops = pv_lock_ops,
 #endif
+   .pv_idle_ops = pv_idle_ops,
};
return *((void **)&tmpl + type);
 }
@@ -312,6 +313,10 @@ struct pv_time_ops pv_time_ops = {
.steal_clock = native_steal_clock,
 };
 
+struct pv_idle_ops pv_idle_ops = {
+   .poll = paravirt_nop,
+};
+
 __visible struct pv_irq_ops pv_irq_ops = {
.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
@@ -463,3 +468,4 @@ struct pv_mmu_ops pv_mmu_ops __ro_after_init = {
 EXPORT_SYMBOL(pv_mmu_ops);
 EXPORT_SYMBOL_GPL(pv_info);
 EXPORT_SYMBOL(pv_irq_ops);
+EXPORT_SYMBOL(pv_idle_ops);
-- 
1.7.1



[PATCH RFC v3 2/6] KVM guest: register kvm_idle_poll for pv_idle_ops

2017-11-22 Thread Quan Xu
From: Quan Xu <quan@gmail.com>

Although smart idle poll has nothing to do with paravirt, it cannot
bring any benefit to native. So we only enable it when Linux
runs as a KVM guest (it can also be extended to other hypervisors such as
Xen, Hyper-V and VMware).

Introduce the per-CPU variable poll_duration_ns to control the max
poll time.

Signed-off-by: Yang Zhang <yang.zhang...@gmail.com>
Signed-off-by: Quan Xu <quan@gmail.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: "Radim Krčmář" <rkrc...@redhat.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/kernel/kvm.c |   26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8bb9594..2a6e402 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,6 +75,7 @@ static int parse_no_kvmclock_vsyscall(char *arg)
 
 early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
 
+static DEFINE_PER_CPU(unsigned long, poll_duration_ns);
 static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
 static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
 static int has_steal_clock = 0;
@@ -364,6 +365,29 @@ static void kvm_guest_cpu_init(void)
kvm_register_steal_time();
 }
 
+static void kvm_idle_poll(void)
+{
+   unsigned long poll_duration = this_cpu_read(poll_duration_ns);
+   ktime_t start, cur, stop;
+
+   start = cur = ktime_get();
+   stop = ktime_add_ns(ktime_get(), poll_duration);
+
+   do {
+   if (need_resched())
+   break;
+   cur = ktime_get();
+   } while (ktime_before(cur, stop));
+}
+
+static void kvm_guest_idle_init(void)
+{
+   if (!kvm_para_available())
+   return;
+
+   pv_idle_ops.poll = kvm_idle_poll;
+}
+
 static void kvm_pv_disable_apf(void)
 {
if (!__this_cpu_read(apf_reason.enabled))
@@ -499,6 +523,8 @@ void __init kvm_guest_init(void)
kvm_guest_cpu_init();
 #endif
 
+   kvm_guest_idle_init();
+
/*
 * Hard lockup detection is enabled by default. Disable it, as guests
 * can get false positives too easily, for example if the host is
-- 
1.7.1
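
The bounded busy-wait in kvm_idle_poll() above can be approximated in
userspace as follows. This is a sketch under stated assumptions, not the
kernel code: clock_gettime(CLOCK_MONOTONIC) stands in for ktime_get(), a
caller-supplied predicate stands in for need_resched(), and the names
now_ns, poll_before_idle, always_pending and never_pending are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds; userspace stand-in for ktime_get(). */
uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Demo predicates standing in for need_resched(). */
bool always_pending(void) { return true; }
bool never_pending(void) { return false; }

/* Spin for at most budget_ns. Returns true if work became pending during
 * the poll (the real idle path can be skipped), false if the budget
 * expired (the caller falls through to the real idle path). */
bool poll_before_idle(uint64_t budget_ns, bool (*work_pending)(void))
{
	uint64_t stop;

	assert(work_pending != NULL);
	stop = now_ns() + budget_ns;

	do {
		if (work_pending())
			return true;
	} while (now_ns() < stop);

	return false;
}
```

As in the patch, the pending check runs at least once even with a zero
budget, so a reschedule that is already pending is always noticed.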


[PATCH RFC v3 0/6] x86/idle: add halt poll support

2017-11-22 Thread Quan Xu
heduler/idle code. (This seems
  not to follow Peter's v2 comment, however we had a f2f discussion about it
  in Prague.)
- enhance patch description.
- enhance Documentation and sysctls.
- test with IRQ_TIMINGS related code, which seems not working so far.

V1 -> V2:
- integrate the smart halt poll into paravirt code
- use idle_stamp instead of check_poll
- since it is hard to tell whether a vcpu is the only task on a pcpu, we
  don't consider it in this series. (May improve it in the future)

---
Quan Xu (4):
  x86/paravirt: Add pv_idle_ops to paravirt ops
  KVM guest: register kvm_idle_poll for pv_idle_ops
  Documentation: Add three sysctls for smart idle poll
  tick: get duration of the last idle loop

Yang Zhang (2):
  sched/idle: Add a generic poll before enter real idle path
  KVM guest: introduce smart idle poll algorithm

 Documentation/sysctl/kernel.txt       |   35
 arch/x86/include/asm/paravirt.h       |    5 ++
 arch/x86/include/asm/paravirt_types.h |    6 +++
 arch/x86/kernel/kvm.c                 |   73 +
 arch/x86/kernel/paravirt.c            |   10 +
 arch/x86/kernel/process.c             |    7 +++
 include/linux/kernel.h                |    6 +++
 include/linux/tick.h                  |    2 +
 kernel/sched/idle.c                   |    2 +
 kernel/sysctl.c                       |   34 +++
 kernel/time/tick-sched.c              |   11 +
 kernel/time/tick-sched.h              |    3 +
 12 files changed, 194 insertions(+), 0 deletions(-)
