Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-02 Thread Wanpeng Li

On 9/2/15 9:49 AM, David Matlack wrote:

On Tue, Sep 1, 2015 at 5:29 PM, Wanpeng Li  wrote:

On 9/2/15 7:24 AM, David Matlack wrote:

On Tue, Sep 1, 2015 at 3:58 PM, Wanpeng Li  wrote:



Why can this happen?

Ah, probably because I'm missing 9c8fd1ba220 (KVM: x86: optimize delivery
of TSC deadline timer interrupt). I don't think the edge case exists in
the latest kernel.


Yeah, I hope we both (including Peter Kieser) can test against the latest kvm
tree to avoid confusion. The reason to introduce the adaptive halt-polling
toggle is to handle the "edge case" you mentioned above. So I think we can put
more effort into improving v4 instead. I will improve v4 to handle short halts
today. ;-)

That's fine. It's just easier to convey my ideas with a patch. FYI the
other reason for the toggle patch was to add the timer for kvm_vcpu_block,
which I think is the only way to get dynamic halt-polling right. Feel free
to work on top of v4!


I incorporated your idea of shrinking/growing the poll time in v5 by detecting 
long/short halts, and the performance looks good. Many thanks for your help, 
David! ;-)


Regards,
Wanpeng Li





Did you test your patch against a windows guest?

I have not. I tested against a 250HZ linux guest to check how it performs
against a ticking guest. Presumably, windows should be the same, but at a
higher tick rate. Do you have a test for Windows?


I just tested the idle vCPU usage.


V4 for windows 10:

+-----------------+----------------+-----------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic(v4) halt-poll |
+-----------------+----------------+-----------------------+
|      ~2.1%      |     ~3.0%      |         ~2.4%         |
+-----------------+----------------+-----------------------+

I'm not seeing the same results with v4. With a 250HZ ticking guest
I see 15% c0 with halt_poll_ns=200 and 1.27% with halt_poll_ns=0.
Are you running one vcpu per pcpu?

(The reason for the overhead: the new tracepoint shows each vcpu is
alternating between 0 and 500 us.)


V4 for linux guest:

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |     ~1.8%      |       ~1.2%       |
+-----------------+----------------+-------------------+


Regards,
Wanpeng Li


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread David Matlack
On Tue, Sep 1, 2015 at 5:29 PM, Wanpeng Li  wrote:
> On 9/2/15 7:24 AM, David Matlack wrote:
>>
>> On Tue, Sep 1, 2015 at 3:58 PM, Wanpeng Li  wrote:

>>>
>>> Why can this happen?
>>
>> Ah, probably because I'm missing 9c8fd1ba220 (KVM: x86: optimize delivery
>> of TSC deadline timer interrupt). I don't think the edge case exists in
>> the latest kernel.
>
>
> Yeah, I hope we both (including Peter Kieser) can test against the latest kvm
> tree to avoid confusion. The reason to introduce the adaptive halt-polling
> toggle is to handle the "edge case" you mentioned above. So I think we can put
> more effort into improving v4 instead. I will improve v4 to handle short halts
> today. ;-)

That's fine. It's just easier to convey my ideas with a patch. FYI the
other reason for the toggle patch was to add the timer for kvm_vcpu_block,
which I think is the only way to get dynamic halt-polling right. Feel free
to work on top of v4!

>

>>>
>>> Did you test your patch against a windows guest?
>>
>> I have not. I tested against a 250HZ linux guest to check how it performs
>> against a ticking guest. Presumably, windows should be the same, but at a
>> higher tick rate. Do you have a test for Windows?
>
>
> I just tested the idle vCPU usage.
>
>
> V4 for windows 10:
>
> +-----------------+----------------+-----------------------+
> |  w/o halt-poll  |  w/ halt-poll  | dynamic(v4) halt-poll |
> +-----------------+----------------+-----------------------+
> |      ~2.1%      |     ~3.0%      |         ~2.4%         |
> +-----------------+----------------+-----------------------+

I'm not seeing the same results with v4. With a 250HZ ticking guest
I see 15% c0 with halt_poll_ns=200 and 1.27% with halt_poll_ns=0.
Are you running one vcpu per pcpu?

(The reason for the overhead: the new tracepoint shows each vcpu is
alternating between 0 and 500 us.)

>
> V4 for linux guest:
>
> +-----------------+----------------+-------------------+
> |  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
> +-----------------+----------------+-------------------+
> |      ~0.9%      |     ~1.8%      |       ~1.2%       |
> +-----------------+----------------+-------------------+
>
>
> Regards,
> Wanpeng Li


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread Wanpeng Li

On 9/2/15 7:24 AM, David Matlack wrote:

On Tue, Sep 1, 2015 at 3:58 PM, Wanpeng Li  wrote:

On 9/2/15 6:34 AM, David Matlack wrote:

On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li  wrote:

On 9/2/15 5:45 AM, David Matlack wrote:

On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li 
wrote:

v3 -> v4:
  * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
    when idle VCPU is detected

v2 -> v3:
  * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
  * drop the macros and hard-code the numbers in the param definitions
  * update the comments "5-7 us"
  * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
    vcpu->halt_poll_ns starts at zero
  * drop the wrappers
  * move the grow/shrink logic before "out:" w/ "if (waited)"

I posted a patchset which adds dynamic poll toggling (on/off switch). I think
this gives you a good place to build your dynamic growth patch on top. The
toggling patch has close to zero overhead for idle VMs and equivalent
performance to always-poll for VMs doing message passing. It's a patch that's
been in my queue for a few weeks but I just haven't had the time to send out.
We can win even more with your patchset by only polling as much as we need
(via dynamic growth/shrink). It also gives us a better place to stand for
choosing a default for halt_poll_ns. (We can run experiments and see how high
vcpu->halt_poll_ns tends to grow.)

The reason I posted a separate patch for toggling is because it adds timers
to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
called multiple times for one halt). To do dynamic poll adjustment


Why can this happen?

Ah, probably because I'm missing 9c8fd1ba220 (KVM: x86: optimize delivery
of TSC deadline timer interrupt). I don't think the edge case exists in
the latest kernel.


Yeah, I hope we both (including Peter Kieser) can test against the latest kvm
tree to avoid confusion. The reason to introduce the adaptive halt-polling
toggle is to handle the "edge case" you mentioned above. So I think we can put
more effort into improving v4 instead. I will improve v4 to handle short halts
today. ;-)







correctly, we have to time the length of each halt. Otherwise we hit some bad
edge cases:

  v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
  time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
  2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
  the halts are long.

  v4: v4 fixed the idle overhead problem but broke dynamic growth for message
  passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
  That means vcpu->halt_poll_ns will always be maxed out, even when the halt
  time is much less than the max.

I think we can fix both edge cases if we make grow/shrink decisions based on
the length of kvm_vcpu_block rather than the arrival of a guest interrupt
during polling.

Some thoughts for dynamic growth:
  * Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
    less than 1 ms. 200 us has been a good value for always-poll. We can
    probably go a bit higher once we have your patch. Maybe 500 us?


Did you test your patch against a windows guest?

I have not. I tested against a 250HZ linux guest to check how it performs
against a ticking guest. Presumably, windows should be the same, but at a
higher tick rate. Do you have a test for Windows?


I just test the idle vCPUs usage.


V4 for windows 10:

+-----------------+----------------+-----------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic(v4) halt-poll |
+-----------------+----------------+-----------------------+
|      ~2.1%      |     ~3.0%      |         ~2.4%         |
+-----------------+----------------+-----------------------+

V4 for linux guest:

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |     ~1.8%      |       ~1.2%       |
+-----------------+----------------+-------------------+


Regards,
Wanpeng Li




  * The base case of dynamic growth (the first grow() after being at 0) should
    be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
    of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
    so 10 us would be a good base case.


How do I get your TCP_RR benchmark?

Regards,
Wanpeng Li

Install the netperf package, or build from here:
http://www.netperf.org/netperf/DownloadNetperf.html

In the vm:


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread David Matlack
On Tue, Sep 1, 2015 at 3:58 PM, Wanpeng Li  wrote:
> On 9/2/15 6:34 AM, David Matlack wrote:
>>
>> On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li  wrote:
>>>
>>> On 9/2/15 5:45 AM, David Matlack wrote:

 On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li 
 wrote:
>
> v3 -> v4:
>   * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
>     when idle VCPU is detected
>
> v2 -> v3:
>   * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
>   * drop the macros and hard-code the numbers in the param definitions
>   * update the comments "5-7 us"
>   * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
>     vcpu->halt_poll_ns starts at zero
>   * drop the wrappers
>   * move the grow/shrink logic before "out:" w/ "if (waited)"

 I posted a patchset which adds dynamic poll toggling (on/off switch). I think
 this gives you a good place to build your dynamic growth patch on top. The
 toggling patch has close to zero overhead for idle VMs and equivalent
 performance to always-poll for VMs doing message passing. It's a patch that's
 been in my queue for a few weeks but I just haven't had the time to send out.
 We can win even more with your patchset by only polling as much as we need
 (via dynamic growth/shrink). It also gives us a better place to stand for
 choosing a default for halt_poll_ns. (We can run experiments and see how high
 vcpu->halt_poll_ns tends to grow.)

 The reason I posted a separate patch for toggling is because it adds timers
 to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
 called multiple times for one halt). To do dynamic poll adjustment
>
>
> Why can this happen?

Ah, probably because I'm missing 9c8fd1ba220 (KVM: x86: optimize delivery
of TSC deadline timer interrupt). I don't think the edge case exists in
the latest kernel.

>
>
 correctly, we have to time the length of each halt. Otherwise we hit some bad
 edge cases:

 v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
 time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
 2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
 the halts are long.

 v4: v4 fixed the idle overhead problem but broke dynamic growth for message
 passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
 That means vcpu->halt_poll_ns will always be maxed out, even when the halt
 time is much less than the max.

 I think we can fix both edge cases if we make grow/shrink decisions based on
 the length of kvm_vcpu_block rather than the arrival of a guest interrupt
 during polling.

 Some thoughts for dynamic growth:
 * Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
   less than 1 ms. 200 us has been a good value for always-poll. We can
   probably go a bit higher once we have your patch. Maybe 500 us?
>
>
> Did you test your patch against a windows guest?

I have not. I tested against a 250HZ linux guest to check how it performs
against a ticking guest. Presumably, windows should be the same, but at a
higher tick rate. Do you have a test for Windows?

>

 * The base case of dynamic growth (the first grow() after being at 0) should
   be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
   of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
   so 10 us would be a good base case.
>>>
>>>
>>> How do I get your TCP_RR benchmark?
>>>
>>> Regards,
>>> Wanpeng Li
>>
>> Install the netperf package, or build from here:
>> http://www.netperf.org/netperf/DownloadNetperf.html
>>
>> In the vm:
>>
>> # ./netserver
>> # ./netperf -t TCP_RR
>>
>> Be sure to use an SMP guest (we want TCP_RR to be a cross-core message
>> passing workload in order to test halt-polling).
>
>
> Ah, ok, I use the same benchmark as yours.
>
> Regards,
> Wanpeng Li
>
>


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread Wanpeng Li

On 9/2/15 6:34 AM, David Matlack wrote:

On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li  wrote:

On 9/2/15 5:45 AM, David Matlack wrote:

On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li 
wrote:

v3 -> v4:
  * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
    when idle VCPU is detected

v2 -> v3:
  * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
  * drop the macros and hard-code the numbers in the param definitions
  * update the comments "5-7 us"
  * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
    vcpu->halt_poll_ns starts at zero
  * drop the wrappers
  * move the grow/shrink logic before "out:" w/ "if (waited)"

I posted a patchset which adds dynamic poll toggling (on/off switch). I think
this gives you a good place to build your dynamic growth patch on top. The
toggling patch has close to zero overhead for idle VMs and equivalent
performance to always-poll for VMs doing message passing. It's a patch that's
been in my queue for a few weeks but I just haven't had the time to send out.
We can win even more with your patchset by only polling as much as we need
(via dynamic growth/shrink). It also gives us a better place to stand for
choosing a default for halt_poll_ns. (We can run experiments and see how high
vcpu->halt_poll_ns tends to grow.)

The reason I posted a separate patch for toggling is because it adds timers
to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
called multiple times for one halt). To do dynamic poll adjustment


Why can this happen?


correctly, we have to time the length of each halt. Otherwise we hit some bad
edge cases:

v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
the halts are long.

v4: v4 fixed the idle overhead problem but broke dynamic growth for message
passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
That means vcpu->halt_poll_ns will always be maxed out, even when the halt
time is much less than the max.

I think we can fix both edge cases if we make grow/shrink decisions based on
the length of kvm_vcpu_block rather than the arrival of a guest interrupt
during polling.

Some thoughts for dynamic growth:
* Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
  less than 1 ms. 200 us has been a good value for always-poll. We can
  probably go a bit higher once we have your patch. Maybe 500 us?


Did you test your patch against a windows guest?



* The base case of dynamic growth (the first grow() after being at 0) should
  be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
  of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
  so 10 us would be a good base case.


How do I get your TCP_RR benchmark?

Regards,
Wanpeng Li

Install the netperf package, or build from here:
http://www.netperf.org/netperf/DownloadNetperf.html

In the vm:

# ./netserver
# ./netperf -t TCP_RR

Be sure to use an SMP guest (we want TCP_RR to be a cross-core message
passing workload in order to test halt-polling).


Ah, ok, I use the same benchmark as yours.

Regards,
Wanpeng Li




Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread David Matlack
On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li  wrote:
> On 9/2/15 5:45 AM, David Matlack wrote:
>>
>> On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li 
>> wrote:
>>>
>>> v3 -> v4:
>>>   * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
>>> when idle VCPU is detected
>>>
>>> v2 -> v3:
>>>   * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or
>>> /halt_poll_ns_shrink
>>>   * drop the macros and hard coding the numbers in the param definitions
>>>   * update the comments "5-7 us"
>>>   * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns
>>> time,
>>> vcpu->halt_poll_ns start at zero
>>>   * drop the wrappers
>>>   * move the grow/shrink logic before "out:" w/ "if (waited)"
>>
>> I posted a patchset which adds dynamic poll toggling (on/off switch). I think
>> this gives you a good place to build your dynamic growth patch on top. The
>> toggling patch has close to zero overhead for idle VMs and equivalent
>> performance to always-poll for VMs doing message passing. It's a patch that's
>> been in my queue for a few weeks but I just haven't had the time to send out.
>> We can win even more with your patchset by only polling as much as we need
>> (via dynamic growth/shrink). It also gives us a better place to stand for
>> choosing a default for halt_poll_ns. (We can run experiments and see how high
>> vcpu->halt_poll_ns tends to grow.)
>>
>> The reason I posted a separate patch for toggling is because it adds timers
>> to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
>> called multiple times for one halt). To do dynamic poll adjustment correctly,
>> we have to time the length of each halt. Otherwise we hit some bad edge
>> cases:
>>
>>   v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew
>>   every time we had a long halt. So idle VMs looked like: 0 us -> 500 us ->
>>   1 ms -> 2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay
>>   at 0 when the halts are long.
>>
>>   v4: v4 fixed the idle overhead problem but broke dynamic growth for
>>   message passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns
>>   would grow. That means vcpu->halt_poll_ns will always be maxed out, even
>>   when the halt time is much less than the max.
>>
>> I think we can fix both edge cases if we make grow/shrink decisions based on
>> the length of kvm_vcpu_block rather than the arrival of a guest interrupt
>> during polling.
>>
>> Some thoughts for dynamic growth:
>>   * Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
>>     less than 1 ms. 200 us has been a good value for always-poll. We can
>>     probably go a bit higher once we have your patch. Maybe 500 us?
>>
>>   * The base case of dynamic growth (the first grow() after being at 0)
>>     should be small. 500 us is too big. When I run TCP_RR in my guest I see
>>     poll times of < 10 us. TCP_RR is on the lower-end of message passing
>>     workload latency, so 10 us would be a good base case.
>
>
> How do I get your TCP_RR benchmark?
>
> Regards,
> Wanpeng Li

Install the netperf package, or build from here:
http://www.netperf.org/netperf/DownloadNetperf.html

In the vm:

# ./netserver
# ./netperf -t TCP_RR

Be sure to use an SMP guest (we want TCP_RR to be a cross-core message
passing workload in order to test halt-polling).

>
>
>>> v1 -> v2:
>>>   * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
>>> the module parameter
>>>   * use the shrink/grow matrix which is suggested by David
>>>   * set halt_poll_ns_max to 2ms
>>>
>>> There is a downside of halt_poll_ns since poll is still happen for idle
>>> VCPU which can waste cpu usage. This patchset add the ability to adjust
>>> halt_poll_ns dynamically, grows halt_poll_ns if an interrupt arrives and
>>> shrinks halt_poll_ns when idle VCPU is detected.
>>>
>>> There are two new kernel parameters for changing the halt_poll_ns:
>>> halt_poll_ns_grow and halt_poll_ns_shrink.
>>>
>>>
>>> Test w/ high cpu overcommit ratio, pin vCPUs, and the halt_poll_ns of
>>> halt-poll is the default 50ns, the max halt_poll_ns of dynamic
>>> halt-poll is 2ms. Then watch the %C0 in the dump of Powertop tool.
>>> The test method is almost from David.
>>>
>>> +-----------------+----------------+-------------------+
>>> |  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
>>> +-----------------+----------------+-------------------+
>>> |      ~0.9%      |     ~1.8%      |       ~1.2%       |
>>> +-----------------+----------------+-------------------+
>>>
>>> The always halt-poll will increase ~0.9% cpu usage for idle vCPUs, and the
>>> dynamic halt-poll drops it to ~0.3%, which reduces the overhead introduced
>>> by always halt-poll by about 67%.
>>>
>>> Wanpeng Li (3):
>>>KVM: make halt_poll_ns 

Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread Wanpeng Li

On 9/2/15 5:45 AM, David Matlack wrote:

On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li  wrote:

v3 -> v4:
  * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
when idle VCPU is detected

v2 -> v3:
  * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
  * drop the macros and hard coding the numbers in the param definitions
  * update the comments "5-7 us"
  * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
vcpu->halt_poll_ns start at zero
  * drop the wrappers
  * move the grow/shrink logic before "out:" w/ "if (waited)"

I posted a patchset which adds dynamic poll toggling (on/off switch). I think
this gives you a good place to build your dynamic growth patch on top. The
toggling patch has close to zero overhead for idle VMs and equivalent
performance to always-poll for VMs doing message passing. It's a patch that's been
in my queue for a few weeks but just haven't had the time to send out. We can
win even more with your patchset by only polling as much as we need (via
dynamic growth/shrink). It also gives us a better place to stand for choosing
a default for halt_poll_ns. (We can run experiments and see how high
vcpu->halt_poll_ns tends to grow.)

The reason I posted a separate patch for toggling is because it adds timers
to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
called multiple times for one halt). To do dynamic poll adjustment correctly,
we have to time the length of each halt. Otherwise we hit some bad edge cases:

   v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
   time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
   2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
   the halts are long.

   v4: v4 fixed the idle overhead problem but broke dynamic growth for message
   passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
   That means vcpu->halt_poll_ns will always be maxed out, even when the halt
   time is much less than the max.

I think we can fix both edge cases if we make grow/shrink decisions based on
the length of kvm_vcpu_block rather than the arrival of a guest interrupt
during polling.

Some thoughts for dynamic growth:
   * Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
 less than 1ms. 200 us has been a good value for always-poll. We can
 probably go a bit higher once we have your patch. Maybe 500 us?

   * The base case of dynamic growth (the first grow() after being at 0) should
 be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
 of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
 so 10 us would be a good base case.


How do I get your TCP_RR benchmark?

Regards,
Wanpeng Li


v1 -> v2:
  * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
the module parameter
  * use the shrink/grow matrix which is suggested by David
  * set halt_poll_ns_max to 2ms

There is a downside to halt_poll_ns since polling still happens for idle
VCPUs, which can waste cpu usage. This patchset adds the ability to adjust
halt_poll_ns dynamically: it grows halt_poll_ns if an interrupt arrives and
shrinks halt_poll_ns when an idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink.


Tested w/ a high cpu overcommit ratio and pinned vCPUs; the halt_poll_ns of
halt-poll is the default 50ns, and the max halt_poll_ns of dynamic
halt-poll is 2ms. Then watch the %C0 in the dump of the Powertop tool.
The test method is mostly borrowed from David.

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |     ~1.8%      |       ~1.2%       |
+-----------------+----------------+-------------------+

The always halt-poll will increase ~0.9% cpu usage for idle vCPUs, and the
dynamic halt-poll drops it to ~0.3%, which reduces the overhead introduced by
always halt-poll by about 67%.

Wanpeng Li (3):
   KVM: make halt_poll_ns per-VCPU
   KVM: dynamic halt_poll_ns adjustment
   KVM: trace kvm_halt_poll_ns grow/shrink

  include/linux/kvm_host.h   |  1 +
  include/trace/events/kvm.h | 30 
  virt/kvm/kvm_main.c| 50 +++---
  3 files changed, 78 insertions(+), 3 deletions(-)
--
1.9.1





Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread David Matlack
On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li  wrote:
> v3 -> v4:
>  * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
>when idle VCPU is detected
>
> v2 -> v3:
>  * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or 
> /halt_poll_ns_shrink
>  * drop the macros and hard coding the numbers in the param definitions
>  * update the comments "5-7 us"
>  * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
>vcpu->halt_poll_ns start at zero
>  * drop the wrappers
>  * move the grow/shrink logic before "out:" w/ "if (waited)"

I posted a patchset which adds dynamic poll toggling (on/off switch). I think
this gives you a good place to build your dynamic growth patch on top. The
toggling patch has close to zero overhead for idle VMs and equivalent
performance to always-poll for VMs doing message passing. It's a patch that's been
in my queue for a few weeks but just haven't had the time to send out. We can
win even more with your patchset by only polling as much as we need (via
dynamic growth/shrink). It also gives us a better place to stand for choosing
a default for halt_poll_ns. (We can run experiments and see how high
vcpu->halt_poll_ns tends to grow.)

The reason I posted a separate patch for toggling is because it adds timers
to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
called multiple times for one halt). To do dynamic poll adjustment correctly,
we have to time the length of each halt. Otherwise we hit some bad edge cases:

  v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
  time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
  2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
  the halts are long.

  v4: v4 fixed the idle overhead problem but broke dynamic growth for message
  passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
  That means vcpu->halt_poll_ns will always be maxed out, even when the halt
  time is much less than the max.

I think we can fix both edge cases if we make grow/shrink decisions based on
the length of kvm_vcpu_block rather than the arrival of a guest interrupt
during polling.

Some thoughts for dynamic growth:
  * Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
less than 1ms. 200 us has been a good value for always-poll. We can
probably go a bit higher once we have your patch. Maybe 500 us?

  * The base case of dynamic growth (the first grow() after being at 0) should
be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
so 10 us would be a good base case.

>
> v1 -> v2:
>  * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
>the module parameter
>  * use the shrink/grow matrix which is suggested by David
>  * set halt_poll_ns_max to 2ms
>
> There is a downside to halt_poll_ns since polling still happens for idle
> VCPUs, which can waste cpu usage. This patchset adds the ability to adjust
> halt_poll_ns dynamically: it grows halt_poll_ns if an interrupt arrives and
> shrinks halt_poll_ns when an idle VCPU is detected.
>
> There are two new kernel parameters for changing the halt_poll_ns:
> halt_poll_ns_grow and halt_poll_ns_shrink.
>
>
> Test w/ high cpu overcommit ratio, pin vCPUs, and the halt_poll_ns of
> halt-poll is the default 50ns, the max halt_poll_ns of dynamic
> halt-poll is 2ms. Then watch the %C0 in the dump of Powertop tool.
> The test method is almost from David.
>
> +-----------------+----------------+-------------------+
> |  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
> +-----------------+----------------+-------------------+
> |      ~0.9%      |     ~1.8%      |       ~1.2%       |
> +-----------------+----------------+-------------------+
>
> Always-on halt-poll increases cpu usage for idle vCPUs by ~0.9%, and
> dynamic halt-poll drops that to ~0.3%, which reduces the overhead
> introduced by always-on halt-poll by 67%.
>
> Wanpeng Li (3):
>   KVM: make halt_poll_ns per-VCPU
>   KVM: dynamic halt_poll_ns adjustment
>   KVM: trace kvm_halt_poll_ns grow/shrink
>
>  include/linux/kvm_host.h   |  1 +
>  include/trace/events/kvm.h | 30 
>  virt/kvm/kvm_main.c| 50 +++---
>  3 files changed, 78 insertions(+), 3 deletions(-)
> --
> 1.9.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread David Matlack
On Tue, Sep 1, 2015 at 5:29 PM, Wanpeng Li  wrote:
> On 9/2/15 7:24 AM, David Matlack wrote:
>>
>> On Tue, Sep 1, 2015 at 3:58 PM, Wanpeng Li  wrote:

>>>
>>> Why can this happen?
>>
>> Ah, probably because I'm missing 9c8fd1ba220 (KVM: x86: optimize delivery
>> of TSC deadline timer interrupt). I don't think the edge case exists in
>> the latest kernel.
>
>
> Yeah, I hope we both (and Peter Kieser) can test against the latest kvm
> tree to avoid confusion. The reason to introduce the adaptive halt-polling
> toggle is to handle the "edge case" you mentioned above. So I think we can
> make more effort to improve v4 instead. I will improve v4 to handle short
> halts today. ;-)

That's fine. It's just easier to convey my ideas with a patch. FYI the
other reason for the toggle patch was to add the timer for kvm_vcpu_block,
which I think is the only way to get dynamic halt-polling right. Feel free
to work on top of v4!

>

>>>
>>> Did you test your patch against a windows guest?
>>
>> I have not. I tested against a 250HZ linux guest to check how it performs
>> against a ticking guest. Presumably, windows should be the same, but at a
>> higher tick rate. Do you have a test for Windows?
>
>
> I just tested idle vCPU usage.
>
>
> V4 for windows 10:
>
> +-----------------+----------------+-----------------------+
> |                 |                |                       |
> |  w/o halt-poll  |  w/ halt-poll  | dynamic(v4) halt-poll |
> +-----------------+----------------+-----------------------+
> |                 |                |                       |
> |      ~2.1%      |     ~3.0%      |         ~2.4%         |
> +-----------------+----------------+-----------------------+

I'm not seeing the same results with v4. With a 250HZ ticking guest
I see 15% c0 with halt_poll_ns=200 and 1.27% with halt_poll_ns=0.
Are you running one vcpu per pcpu?

(The reason for the overhead: the new tracepoint shows each vcpu is
alternating between 0 and 500 us.)

>
> V4  for linux guest:
>
> +-----------------+----------------+-------------------+
> |                 |                |                   |
> |  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
> +-----------------+----------------+-------------------+
> |                 |                |                   |
> |      ~0.9%      |     ~1.8%      |       ~1.2%       |
> +-----------------+----------------+-------------------+
>
>
> Regards,
> Wanpeng Li


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread David Matlack
On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li  wrote:
> v3 -> v4:
>  * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
>when idle VCPU is detected
>
> v2 -> v3:
>  * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
>  * drop the macros and hard coding the numbers in the param definitions
>  * update the comments "5-7 us"
>  * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
>vcpu->halt_poll_ns start at zero
>  * drop the wrappers
>  * move the grow/shrink logic before "out:" w/ "if (waited)"

I posted a patchset which adds dynamic poll toggling (on/off switch). I think
this gives you a good place to build your dynamic growth patch on top. The
toggling patch has close to zero overhead for idle VMs and, for VMs doing
message passing, performance equivalent to always-poll. It's a patch that's been
in my queue for a few weeks but I just haven't had the time to send it out. We can
win even more with your patchset by only polling as much as we need (via
dynamic growth/shrink). It also gives us a better place to stand for choosing
a default for halt_poll_ns. (We can run experiments and see how high
vcpu->halt_poll_ns tends to grow.)

The reason I posted a separate patch for toggling is because it adds timers
to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
called multiple times for one halt). To do dynamic poll adjustment correctly,
we have to time the length of each halt. Otherwise we hit some bad edge cases:

  v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
  time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
  2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
  the halts are long.

  v4: v4 fixed the idle overhead problem but broke dynamic growth for message
  passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
  That means vcpu->halt_poll_ns will always be maxed out, even when the halt
  time is much less than the max.

I think we can fix both edge cases if we make grow/shrink decisions based on
the length of kvm_vcpu_block rather than the arrival of a guest interrupt
during polling.

Some thoughts for dynamic growth:
  * Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
less than 1ms. 200 us has been a good value for always-poll. We can
probably go a bit higher once we have your patch. Maybe 500 us?

  * The base case of dynamic growth (the first grow() after being at 0) should
be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
so 10 us would be a good base case.

>
> v1 -> v2:
>  * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
>the module parameter
>  * use the shrink/grow matrix which is suggested by David
>  * set halt_poll_ns_max to 2ms
>
> There is a downside to halt_poll_ns since polling still happens for an idle
> VCPU, which can waste cpu usage. This patchset adds the ability to adjust
> halt_poll_ns dynamically: it grows halt_poll_ns when an interrupt arrives
> and shrinks it when an idle VCPU is detected.
>
> There are two new kernel parameters for changing the halt_poll_ns:
> halt_poll_ns_grow and halt_poll_ns_shrink.
>
>
> Test w/ a high cpu overcommit ratio and pinned vCPUs; the halt_poll_ns of
> always-on halt-poll is the default 50ns, and the max halt_poll_ns of dynamic
> halt-poll is 2ms. Then watch %C0 in the dump of the Powertop tool.
> The test method largely follows David's.
>
> +-----------------+----------------+-------------------+
> |                 |                |                   |
> |  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
> +-----------------+----------------+-------------------+
> |                 |                |                   |
> |      ~0.9%      |     ~1.8%      |       ~1.2%       |
> +-----------------+----------------+-------------------+
>
> Always-on halt-poll increases cpu usage for idle vCPUs by ~0.9%, and
> dynamic halt-poll drops that to ~0.3%, which reduces the overhead
> introduced by always-on halt-poll by 67%.
>
> Wanpeng Li (3):
>   KVM: make halt_poll_ns per-VCPU
>   KVM: dynamic halt_poll_ns adjustment
>   KVM: trace kvm_halt_poll_ns grow/shrink
>
>  include/linux/kvm_host.h   |  1 +
>  include/trace/events/kvm.h | 30 
>  virt/kvm/kvm_main.c| 50 +++---
>  3 files changed, 78 insertions(+), 3 deletions(-)
> --
> 1.9.1
>


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread Wanpeng Li

On 9/2/15 7:24 AM, David Matlack wrote:

On Tue, Sep 1, 2015 at 3:58 PM, Wanpeng Li  wrote:

On 9/2/15 6:34 AM, David Matlack wrote:

On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li  wrote:

On 9/2/15 5:45 AM, David Matlack wrote:

On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li 
wrote:

v3 -> v4:
 * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
   when idle VCPU is detected

v2 -> v3:
 * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
 * drop the macros and hard coding the numbers in the param definitions
 * update the comments "5-7 us"
 * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
   vcpu->halt_poll_ns start at zero
 * drop the wrappers
 * move the grow/shrink logic before "out:" w/ "if (waited)"

I posted a patchset which adds dynamic poll toggling (on/off switch). I think
this gives you a good place to build your dynamic growth patch on top. The
toggling patch has close to zero overhead for idle VMs and, for VMs doing
message passing, performance equivalent to always-poll. It's a patch that's
been in my queue for a few weeks but I just haven't had the time to send it
out. We can win even more with your patchset by only polling as much as we
need (via dynamic growth/shrink). It also gives us a better place to stand
for choosing a default for halt_poll_ns. (We can run experiments and see how
high vcpu->halt_poll_ns tends to grow.)

The reason I posted a separate patch for toggling is because it adds timers
to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
called multiple times for one halt). To do dynamic poll adjustment


Why can this happen?

Ah, probably because I'm missing 9c8fd1ba220 (KVM: x86: optimize delivery
of TSC deadline timer interrupt). I don't think the edge case exists in
the latest kernel.


Yeah, I hope we both (and Peter Kieser) can test against the latest kvm
tree to avoid confusion. The reason to introduce the adaptive
halt-polling toggle is to handle the "edge case" you mentioned above.
So I think we can make more effort to improve v4 instead. I will improve
v4 to handle short halts today. ;-)

correctly, we have to time the length of each halt. Otherwise we hit some
bad edge cases:

 v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew
 every time we had a long halt. So idle VMs looked like: 0 us -> 500 us ->
 1 ms -> 2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay
 at 0 when the halts are long.

 v4: v4 fixed the idle overhead problem but broke dynamic growth for
 message passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns
 would grow. That means vcpu->halt_poll_ns will always be maxed out, even
 when the halt time is much less than the max.

I think we can fix both edge cases if we make grow/shrink decisions based
on the length of kvm_vcpu_block rather than the arrival of a guest
interrupt during polling.

Some thoughts for dynamic growth:
 * Given the Windows 10 timer tick (1 ms), let's set the maximum poll time
   to less than 1 ms. 200 us has been a good value for always-poll. We can
   probably go a bit higher once we have your patch. Maybe 500 us?


Did you test your patch against a windows guest?

I have not. I tested against a 250HZ linux guest to check how it performs
against a ticking guest. Presumably, windows should be the same, but at a
higher tick rate. Do you have a test for Windows?


I just tested idle vCPU usage.


V4 for windows 10:

+-----------------+----------------+-----------------------+
|                 |                |                       |
|  w/o halt-poll  |  w/ halt-poll  | dynamic(v4) halt-poll |
+-----------------+----------------+-----------------------+
|                 |                |                       |
|      ~2.1%      |     ~3.0%      |         ~2.4%         |
+-----------------+----------------+-----------------------+

V4  for linux guest:

+-----------------+----------------+-------------------+
|                 |                |                   |
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|                 |                |                   |
|      ~0.9%      |     ~1.8%      |       ~1.2%       |
+-----------------+----------------+-------------------+


Regards,
Wanpeng Li




 * The base case of dynamic growth (the first grow() after being at 0)
   should be small. 500 us is too big. When I run TCP_RR in my guest I see
   poll times of < 10 us. TCP_RR is on the lower-end of message passing
   workload latency, so 10 us would be a good base case.

How to get your TCP_RR benchmark?

Regards,
Wanpeng Li

Install the netperf package, or build from here:
http://www.netperf.org/netperf/DownloadNetperf.html

Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread David Matlack
On Tue, Sep 1, 2015 at 3:58 PM, Wanpeng Li  wrote:
> On 9/2/15 6:34 AM, David Matlack wrote:
>>
>> On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li  wrote:
>>>
>>> On 9/2/15 5:45 AM, David Matlack wrote:

 On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li 
 wrote:
>
> v3 -> v4:
>   * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
>     when idle VCPU is detected
>
> v2 -> v3:
>   * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
>   * drop the macros and hard coding the numbers in the param definitions
>   * update the comments "5-7 us"
>   * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
>     vcpu->halt_poll_ns start at zero
>   * drop the wrappers
>   * move the grow/shrink logic before "out:" w/ "if (waited)"

 I posted a patchset which adds dynamic poll toggling (on/off switch). I think
 this gives you a good place to build your dynamic growth patch on top. The
 toggling patch has close to zero overhead for idle VMs and, for VMs doing
 message passing, performance equivalent to always-poll. It's a patch that's
 been in my queue for a few weeks but I just haven't had the time to send it
 out. We can win even more with your patchset by only polling as much as we
 need (via dynamic growth/shrink). It also gives us a better place to stand
 for choosing a default for halt_poll_ns. (We can run experiments and see how
 high vcpu->halt_poll_ns tends to grow.)

 The reason I posted a separate patch for toggling is because it adds timers
 to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
 called multiple times for one halt). To do dynamic poll adjustment
>
>
> Why can this happen?

Ah, probably because I'm missing 9c8fd1ba220 (KVM: x86: optimize delivery
of TSC deadline timer interrupt). I don't think the edge case exists in
the latest kernel.

>
>
 correctly, we have to time the length of each halt. Otherwise we hit some
 bad edge cases:

 v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew
 every time we had a long halt. So idle VMs looked like: 0 us -> 500 us ->
 1 ms -> 2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay
 at 0 when the halts are long.

 v4: v4 fixed the idle overhead problem but broke dynamic growth for
 message passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns
 would grow. That means vcpu->halt_poll_ns will always be maxed out, even
 when the halt time is much less than the max.

 I think we can fix both edge cases if we make grow/shrink decisions based
 on the length of kvm_vcpu_block rather than the arrival of a guest
 interrupt during polling.

 Some thoughts for dynamic growth:
 * Given the Windows 10 timer tick (1 ms), let's set the maximum poll time
   to less than 1 ms. 200 us has been a good value for always-poll. We can
   probably go a bit higher once we have your patch. Maybe 500 us?
>
>
> Did you test your patch against a windows guest?

I have not. I tested against a 250HZ linux guest to check how it performs
against a ticking guest. Presumably, windows should be the same, but at a
higher tick rate. Do you have a test for Windows?

>

 * The base case of dynamic growth (the first grow() after being at 0)
   should be small. 500 us is too big. When I run TCP_RR in my guest I see
   poll times of < 10 us. TCP_RR is on the lower-end of message passing
   workload latency, so 10 us would be a good base case.
>>>
>>>
>>> How to get your TCP_RR benchmark?
>>>
>>> Regards,
>>> Wanpeng Li
>>
>> Install the netperf package, or build from here:
>> http://www.netperf.org/netperf/DownloadNetperf.html
>>
>> In the vm:
>>
>> # ./netserver
>> # ./netperf -t TCP_RR
>>
>> Be sure to use an SMP guest (we want TCP_RR to be a cross-core message
>> passing workload in order to test halt-polling).
>
>
> Ah, ok, I use the same benchmark as yours.
>
> Regards,
> Wanpeng Li
>
>


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread David Matlack
On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li  wrote:
> On 9/2/15 5:45 AM, David Matlack wrote:
>>
>> On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li 
>> wrote:
>>>
>>> v3 -> v4:
>>>   * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
>>> when idle VCPU is detected
>>>
>>> v2 -> v3:
>>>   * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
>>>   * drop the macros and hard coding the numbers in the param definitions
>>>   * update the comments "5-7 us"
>>>   * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns
>>> time,
>>> vcpu->halt_poll_ns start at zero
>>>   * drop the wrappers
>>>   * move the grow/shrink logic before "out:" w/ "if (waited)"
>>
>> I posted a patchset which adds dynamic poll toggling (on/off switch). I
>> think
>> this gives you a good place to build your dynamic growth patch on top. The
>> toggling patch has close to zero overhead for idle VMs and, for VMs doing
>> message passing, performance equivalent to always-poll. It's a patch that's
>> been in my queue for a few weeks but I just haven't had the time to send it
>> out. We can
>> win even more with your patchset by only polling as much as we need (via
>> dynamic growth/shrink). It also gives us a better place to stand for
>> choosing
>> a default for halt_poll_ns. (We can run experiments and see how high
>> vcpu->halt_poll_ns tends to grow.)
>>
>> The reason I posted a separate patch for toggling is because it adds
>> timers
>> to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
>> called multiple times for one halt). To do dynamic poll adjustment
>> correctly,
>> we have to time the length of each halt. Otherwise we hit some bad edge
>> cases:
>>
>>    v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew
>>    every time we had a long halt. So idle VMs looked like: 0 us -> 500 us ->
>>    1 ms -> 2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay
>>    at 0 when the halts are long.
>>
>>    v4: v4 fixed the idle overhead problem but broke dynamic growth for
>>    message passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns
>>    would grow. That means vcpu->halt_poll_ns will always be maxed out, even
>>    when the halt time is much less than the max.
>>
>> I think we can fix both edge cases if we make grow/shrink decisions based
>> on the length of kvm_vcpu_block rather than the arrival of a guest
>> interrupt during polling.
>>
>> Some thoughts for dynamic growth:
>>   * Given the Windows 10 timer tick (1 ms), let's set the maximum poll time
>>     to less than 1 ms. 200 us has been a good value for always-poll. We can
>>     probably go a bit higher once we have your patch. Maybe 500 us?
>>
>>   * The base case of dynamic growth (the first grow() after being at 0)
>>     should be small. 500 us is too big. When I run TCP_RR in my guest I see
>>     poll times of < 10 us. TCP_RR is on the lower-end of message passing
>>     workload latency, so 10 us would be a good base case.
>
>
> How to get your TCP_RR benchmark?
>
> Regards,
> Wanpeng Li

Install the netperf package, or build from here:
http://www.netperf.org/netperf/DownloadNetperf.html

In the vm:

# ./netserver
# ./netperf -t TCP_RR

Be sure to use an SMP guest (we want TCP_RR to be a cross-core message
passing workload in order to test halt-polling).

>
>
>>> v1 -> v2:
>>>   * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
>>> the module parameter
>>>   * use the shrink/grow matrix which is suggested by David
>>>   * set halt_poll_ns_max to 2ms
>>>
>>> There is a downside to halt_poll_ns since polling still happens for an idle
>>> VCPU, which can waste cpu usage. This patchset adds the ability to adjust
>>> halt_poll_ns dynamically: it grows halt_poll_ns when an interrupt arrives
>>> and shrinks it when an idle VCPU is detected.
>>>
>>> There are two new kernel parameters for changing the halt_poll_ns:
>>> halt_poll_ns_grow and halt_poll_ns_shrink.
>>>
>>>
>>> Test w/ a high cpu overcommit ratio and pinned vCPUs; the halt_poll_ns of
>>> always-on halt-poll is the default 50ns, and the max halt_poll_ns of dynamic
>>> halt-poll is 2ms. Then watch %C0 in the dump of the Powertop tool.
>>> The test method largely follows David's.
>>>
>>> +-----------------+----------------+-------------------+
>>> |                 |                |                   |
>>> |  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
>>> +-----------------+----------------+-------------------+
>>> |                 |                |                   |
>>> |      ~0.9%      |     ~1.8%      |       ~1.2%       |
>>> +-----------------+----------------+-------------------+
>>>
>>> Always-on halt-poll increases cpu usage for idle vCPUs by ~0.9%, and
>>> dynamic halt-poll drops that to ~0.3%, which reduces the overhead
>>> introduced by always-on halt-poll by 67%.
>>>

Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread Wanpeng Li

On 9/2/15 6:34 AM, David Matlack wrote:

On Tue, Sep 1, 2015 at 3:30 PM, Wanpeng Li  wrote:

On 9/2/15 5:45 AM, David Matlack wrote:

On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li 
wrote:

v3 -> v4:
   * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
     when idle VCPU is detected

v2 -> v3:
   * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
   * drop the macros and hard coding the numbers in the param definitions
   * update the comments "5-7 us"
   * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
     vcpu->halt_poll_ns start at zero
   * drop the wrappers
   * move the grow/shrink logic before "out:" w/ "if (waited)"

I posted a patchset which adds dynamic poll toggling (on/off switch). I think
this gives you a good place to build your dynamic growth patch on top. The
toggling patch has close to zero overhead for idle VMs and, for VMs doing
message passing, performance equivalent to always-poll. It's a patch that's
been in my queue for a few weeks but I just haven't had the time to send it
out. We can win even more with your patchset by only polling as much as we
need (via dynamic growth/shrink). It also gives us a better place to stand
for choosing a default for halt_poll_ns. (We can run experiments and see how
high vcpu->halt_poll_ns tends to grow.)

The reason I posted a separate patch for toggling is because it adds timers
to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
called multiple times for one halt). To do dynamic poll adjustment


Why can this happen?


correctly, we have to time the length of each halt. Otherwise we hit some bad
edge cases:

v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
the halts are long.

v4: v4 fixed the idle overhead problem but broke dynamic growth for message
passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
That means vcpu->halt_poll_ns will always be maxed out, even when the halt
time is much less than the max.

I think we can fix both edge cases if we make grow/shrink decisions based on
the length of kvm_vcpu_block rather than the arrival of a guest interrupt
during polling.

Some thoughts for dynamic growth:
* Given the Windows 10 timer tick (1 ms), let's set the maximum poll time to
  less than 1 ms. 200 us has been a good value for always-poll. We can
  probably go a bit higher once we have your patch. Maybe 500 us?


Did you test your patch against a windows guest?



* The base case of dynamic growth (the first grow() after being at 0) should
  be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
  of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
  so 10 us would be a good base case.


How to get your TCP_RR benchmark?

Regards,
Wanpeng Li

Install the netperf package, or build from here:
http://www.netperf.org/netperf/DownloadNetperf.html

In the vm:

# ./netserver
# ./netperf -t TCP_RR

Be sure to use an SMP guest (we want TCP_RR to be a cross-core message
passing workload in order to test halt-polling).


Ah, ok, I use the same benchmark as yours.

Regards,
Wanpeng Li




Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-09-01 Thread Wanpeng Li

On 9/2/15 5:45 AM, David Matlack wrote:

On Thu, Aug 27, 2015 at 2:47 AM, Wanpeng Li  wrote:

v3 -> v4:
  * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
when idle VCPU is detected

v2 -> v3:
  * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
  * drop the macros and hard coding the numbers in the param definitions
  * update the comments "5-7 us"
  * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
vcpu->halt_poll_ns start at zero
  * drop the wrappers
  * move the grow/shrink logic before "out:" w/ "if (waited)"

I posted a patchset which adds dynamic poll toggling (on/off switch). I think
this gives you a good place to build your dynamic growth patch on top. The
toggling patch has close to zero overhead for idle VMs and, for VMs doing
message passing, performance equivalent to always-poll. It's a patch that's been
in my queue for a few weeks but I just haven't had the time to send it out. We can
win even more with your patchset by only polling as much as we need (via
dynamic growth/shrink). It also gives us a better place to stand for choosing
a default for halt_poll_ns. (We can run experiments and see how high
vcpu->halt_poll_ns tends to grow.)

The reason I posted a separate patch for toggling is because it adds timers
to kvm_vcpu_block and deals with a weird edge case (kvm_vcpu_block can get
called multiple times for one halt). To do dynamic poll adjustment correctly,
we have to time the length of each halt. Otherwise we hit some bad edge cases:

   v3: v3 had lots of idle overhead. It's because vcpu->halt_poll_ns grew every
   time we had a long halt. So idle VMs looked like: 0 us -> 500 us -> 1 ms ->
   2 ms -> 4 ms -> 0 us. Ideally vcpu->halt_poll_ns should just stay at 0 when
   the halts are long.

   v4: v4 fixed the idle overhead problem but broke dynamic growth for message
   passing VMs. Every time a VM did a short halt, vcpu->halt_poll_ns would grow.
   That means vcpu->halt_poll_ns will always be maxed out, even when the halt
   time is much less than the max.

I think we can fix both edge cases if we make grow/shrink decisions based on
the length of kvm_vcpu_block rather than the arrival of a guest interrupt
during polling.

Some thoughts for dynamic growth:
   * Given Windows 10 timer tick (1 ms), let's set the maximum poll time to
 less than 1ms. 200 us has been a good value for always-poll. We can
 probably go a bit higher once we have your patch. Maybe 500 us?

   * The base case of dynamic growth (the first grow() after being at 0) should
 be small. 500 us is too big. When I run TCP_RR in my guest I see poll times
 of < 10 us. TCP_RR is on the lower-end of message passing workload latency,
 so 10 us would be a good base case.


How to get your TCP_RR benchmark?

Regards,
Wanpeng Li


v1 -> v2:
  * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
the module parameter
  * use the shrink/grow matrix which is suggested by David
  * set halt_poll_ns_max to 2ms

There is a downside to halt_poll_ns since polling still happens for an idle
VCPU, which can waste cpu usage. This patchset adds the ability to adjust
halt_poll_ns dynamically: it grows halt_poll_ns when an interrupt arrives and
shrinks it when an idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink.


Test w/ a high cpu overcommit ratio and pinned vCPUs; the halt_poll_ns of
always-on halt-poll is the default 50ns, and the max halt_poll_ns of dynamic
halt-poll is 2ms. Then watch %C0 in the dump of the Powertop tool.
The test method largely follows David's.

+-----------------+----------------+-------------------+
|                 |                |                   |
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|                 |                |                   |
|      ~0.9%      |     ~1.8%      |       ~1.2%       |
+-----------------+----------------+-------------------+

Always-on halt-poll increases cpu usage for idle vCPUs by ~0.9%, and
dynamic halt-poll drops that to ~0.3%, which reduces the overhead
introduced by always-on halt-poll by 67%.

Wanpeng Li (3):
   KVM: make halt_poll_ns per-VCPU
   KVM: dynamic halt_poll_ns adjustment
   KVM: trace kvm_halt_poll_ns grow/shrink

  include/linux/kvm_host.h   |  1 +
  include/trace/events/kvm.h | 30 
  virt/kvm/kvm_main.c| 50 +++---
  3 files changed, 78 insertions(+), 3 deletions(-)
--
1.9.1





Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-31 Thread Wanpeng Li

On 8/31/15 3:44 PM, Wanpeng Li wrote:

On 8/30/15 6:26 AM, Peter Kieser wrote:

Thanks, Wanpeng. Applied this to Linux 3.18 and I'm seeing much higher CPU
usage (200%) for the qemu 2.4.0 process on a Windows 10 x64 guest. qemu
parameters:


Interesting. I tested this against the latest kvm tree and stable qemu
2.0.0, with 4 vCPUs on pCPU0 (the other pCPUs are offline, to make %C0
easy to observe and to avoid vCPU scheduling overhead). I just ignore the
fluctuation and post the most common %C0 result against the Windows
10 x86 guest.


s/x86/x64



+-----------------+----------------+-----------------------+
|                 |                |                       |
|  w/o halt-poll  |  w/ halt-poll  | dynamic(v4) halt-poll |
+-----------------+----------------+-----------------------+
|                 |                |                       |
|      ~2.1%      |     ~3.0%      |         ~2.4%         |
+-----------------+----------------+-----------------------+

Regards,
Wanpeng Li



qemu-system-x86_64 -enable-kvm -name arwan-20150704 -S -machine
pc-q35-2.2,accel=kvm,usb=off -cpu
Haswell,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 8192
-realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid
7c2fc02d-2798-4fc9-ad04-db5f1af92723 -no-user-config -nodefaults
-chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/arwan-20150704.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-shutdown -boot strict=on -device
i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device
nec-usb-xhci,id=usb1,bus=pci.2,addr=0x4 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x5 -drive
file=/dev/mapper/crypt-arwan-20150704,if=none,id=drive-virtio-disk0,format=raw,cache=none,discard=unmap,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-drive
file=/usr/share/virtio-win/virtio-win.iso,if=none,media=cdrom,id=drive-sata0-0-2,readonly=on,format=raw
-device
ide-cd,bus=ide.2,drive=drive-sata0-0-2,id=sata0-0-2,bootindex=1
-netdev tap,fds=31:32:33:34,id=hostnet0,vhost=on,vhostfds=35:36:37:38
-device
virtio-net-pci,guest_csum=off,guest_tso4=off,guest_tso6=off,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=52:54:00:f3:6b:c4,bus=pci.2,addr=0x2
-chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/arwan-20150704.org.qemu.guest_agent.0,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-chardev spicevmc,id=charchannel1,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
-vnc 127.0.0.1:4 -device
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pcie.0,addr=0x1
-device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x1 -msg
timestamp=on

If I revert the patch, qemu shows 17% CPU usage on the host. Thoughts?

-Peter

On 2015-08-29 3:21 PM, Wanpeng Li wrote:

Hi Peter,
On 8/30/15 5:18 AM, Peter Kieser wrote:

Hi Wanpeng,

Do I need to set any module parameters to use your patch, or should
halt_poll_ns automatically tune with just your patch series applied?



You don't need any module parameters.

Regards,
Wanpeng Li


Thanks.

On 2015-08-27 2:47 AM, Wanpeng Li wrote:

v3 -> v4:
  * bring back grow vcpu->halt_poll_ns when interrupt arrives and
shrinks
when idle VCPU is detected

v2 -> v3:
  * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or
/halt_poll_ns_shrink
  * drop the macros and hard coding the numbers in the param
definitions
  * update the comments "5-7 us"
  * remove halt_poll_ns_max and use halt_poll_ns as the max
halt_poll_ns time,
vcpu->halt_poll_ns start at zero
  * drop the wrappers
  * move the grow/shrink logic before "out:" w/ "if (waited)"

v1 -> v2:
  * change kvm_vcpu_block to read halt_poll_ns from the vcpu
instead of
the module parameter
  * use the shrink/grow matrix which is suggested by David
  * set halt_poll_ns_max to 2ms

There is a downside of halt_poll_ns since poll is still happen for
idle
VCPU which can waste cpu usage. This patchset add the ability to
adjust
halt_poll_ns dynamically, grows halt_poll_ns if an interrupt
arrives and
shrinks halt_poll_ns when idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink.


Test w/ high cpu overcommit ratio, pin vCPUs, and the halt_poll_ns of
halt-poll is the default 50ns, the max halt_poll_ns of dynamic
halt-poll is 2ms. Then watch the %C0 in the dump of Powertop tool.
The test method is almost from David.

+-++---+
| ||   |
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-++---+
| ||   |
|~0.9%|  

Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-31 Thread Wanpeng Li

On 8/30/15 6:26 AM, Peter Kieser wrote:

Thanks, Wanpeng. Applied this to Linux 3.18 and seeing much higher CPU
usage (200%) for qemu 2.4.0 process on a Windows 10 x64 guest. qemu
parameters:


Interesting. I tested this against the latest kvm tree and stable qemu
2.0.0, with 4 vCPUs on pCPU0 (the other pCPUs are offline, to make %C0
easy to observe and to avoid vCPU scheduling overhead). I ignore the
fluctuation and post the most common %C0 result against the Windows
10 x86 guest.


+-----------------+----------------+-----------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic(v4) halt-poll |
+-----------------+----------------+-----------------------+
|      ~2.1%      |     ~3.0%      |         ~2.4%         |
+-----------------+----------------+-----------------------+
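For reference, a sketch of the host setup described above (all vCPU
threads sharing pCPU0, the other pCPUs offline) using the standard Linux
CPU-hotplug sysfs interface; run as root, and the powertop invocation is
just one way to read %C0:

```shell
# take every pCPU except cpu0 offline so the vCPU threads share cpu0
for cpu in /sys/devices/system/cpu/cpu[1-9]*; do
    echo 0 > "$cpu/online"
done

# watch C-state residency (%C0) for 60 seconds
powertop --time=60
```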

Regards,
Wanpeng Li



qemu-system-x86_64 -enable-kvm -name arwan-20150704 -S -machine
pc-q35-2.2,accel=kvm,usb=off -cpu
Haswell,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 8192
-realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid
7c2fc02d-2798-4fc9-ad04-db5f1af92723 -no-user-config -nodefaults
-chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/arwan-20150704.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-shutdown -boot strict=on -device
i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device
nec-usb-xhci,id=usb1,bus=pci.2,addr=0x4 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x5 -drive
file=/dev/mapper/crypt-arwan-20150704,if=none,id=drive-virtio-disk0,format=raw,cache=none,discard=unmap,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-drive
file=/usr/share/virtio-win/virtio-win.iso,if=none,media=cdrom,id=drive-sata0-0-2,readonly=on,format=raw
-device
ide-cd,bus=ide.2,drive=drive-sata0-0-2,id=sata0-0-2,bootindex=1
-netdev tap,fds=31:32:33:34,id=hostnet0,vhost=on,vhostfds=35:36:37:38
-device
virtio-net-pci,guest_csum=off,guest_tso4=off,guest_tso6=off,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=52:54:00:f3:6b:c4,bus=pci.2,addr=0x2
-chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/arwan-20150704.org.qemu.guest_agent.0,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-chardev spicevmc,id=charchannel1,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
-vnc 127.0.0.1:4 -device
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pcie.0,addr=0x1
-device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x1 -msg
timestamp=on

If I revert the patch, qemu shows 17% CPU usage on the host. Thoughts?

-Peter

On 2015-08-29 3:21 PM, Wanpeng Li wrote:

Hi Peter,
On 8/30/15 5:18 AM, Peter Kieser wrote:

Hi Wanpeng,

Do I need to set any module parameters to use your patch, or should
halt_poll_ns automatically tune with just your patch series applied?



You don't need any module parameters.

Regards,
Wanpeng Li


Thanks.


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-31 Thread Wanpeng Li

On 8/31/15 3:44 PM, Wanpeng Li wrote:

On 8/30/15 6:26 AM, Peter Kieser wrote:

Thanks, Wanpeng. Applied this to Linux 3.18 and seeing much higher CPU
usage (200%) for qemu 2.4.0 process on a Windows 10 x64 guest. qemu
parameters:


Interesting. I tested this against the latest kvm tree and stable qemu
2.0.0, with 4 vCPUs on pCPU0 (the other pCPUs are offline, to make %C0
easy to observe and to avoid vCPU scheduling overhead). I ignore the
fluctuation and post the most common %C0 result against the Windows
10 x86 guest.


s/x86/x64




Regards,
Wanpeng Li





Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-29 Thread Wanpeng Li

On 8/30/15 8:13 AM, Peter Kieser wrote:

On 2015-08-29 4:55 PM, Wanpeng Li wrote:

On 8/30/15 6:26 AM, Peter Kieser wrote:
Thanks, Wanpeng. Applied this to Linux 3.18 and seeing much higher 
CPU usage (200%) for qemu 2.4.0 process on a Windows 10 x64 guest. 
qemu parameters:


Thanks for the report. Is Paolo's patch "kvm: add halt_poll_ns module
parameter" applied on your 3.18? Btw, did you test a linux guest?


No high CPU usage on Linux guests. 


What's the difference w/ and w/o the patchset?


Following patch series are applied (in order):

* kvm: add halt_poll_ns module parameter
* KVM: make halt_poll_ns static
* KVM: Dynamic Halt-Polling v4



I will find a Windows 10 x64 guest tomorrow to figure out what happens.
Did you test other Windows guests (like Win7)? Btw, could you test v3
dynamic halt-polling (against the Windows 10 guest)? David's shrink/grow
logic in v3 is different from v4's. Many thanks for your time, Peter! ;-)


Regards,
Wanpeng Li


-Peter




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-29 Thread Peter Kieser



On 2015-08-29 4:55 PM, Wanpeng Li wrote:

On 8/30/15 6:26 AM, Peter Kieser wrote:
Thanks, Wanpeng. Applied this to Linux 3.18 and seeing much higher 
CPU usage (200%) for qemu 2.4.0 process on a Windows 10 x64 guest. 
qemu parameters:


Thanks for the report. Is Paolo's patch "kvm: add halt_poll_ns module
parameter" applied on your 3.18? Btw, did you test a linux guest?


No high CPU usage on Linux guests. Following patch series are applied 
(in order):


* kvm: add halt_poll_ns module parameter
* KVM: make halt_poll_ns static
* KVM: Dynamic Halt-Polling v4

-Peter






Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-29 Thread Wanpeng Li

On 8/30/15 6:26 AM, Peter Kieser wrote:
Thanks, Wanpeng. Applied this to Linux 3.18 and seeing much higher CPU 
usage (200%) for qemu 2.4.0 process on a Windows 10 x64 guest. qemu 
parameters:


Thanks for the report. Is Paolo's patch "kvm: add halt_poll_ns module
parameter" applied on your 3.18? Btw, did you test a linux guest?


Regards,
Wanpeng Li





If I revert the patch, qemu shows 17% CPU usage on the host. Thoughts?

-Peter

On 2015-08-29 3:21 PM, Wanpeng Li wrote:

Hi Peter,
On 8/30/15 5:18 AM, Peter Kieser wrote:

Hi Wanpeng,

Do I need to set any module parameters to use your patch, or should 
halt_poll_ns automatically tune with just your patch series applied?




You don't need any module parameters.

Regards,
Wanpeng Li


Thanks.


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-29 Thread Peter Kieser
Thanks, Wanpeng. I applied this to Linux 3.18 and am seeing much higher
CPU usage (200%) for the qemu 2.4.0 process on a Windows 10 x64 guest.
qemu parameters:


qemu-system-x86_64 -enable-kvm -name arwan-20150704 -S -machine 
pc-q35-2.2,accel=kvm,usb=off -cpu 
Haswell,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 8192 
-realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 
7c2fc02d-2798-4fc9-ad04-db5f1af92723 -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/arwan-20150704.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
-no-shutdown -boot strict=on -device 
i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device 
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device 
nec-usb-xhci,id=usb1,bus=pci.2,addr=0x4 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x5 -drive 
file=/dev/mapper/crypt-arwan-20150704,if=none,id=drive-virtio-disk0,format=raw,cache=none,discard=unmap,aio=native 
-device 
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 
-drive 
file=/usr/share/virtio-win/virtio-win.iso,if=none,media=cdrom,id=drive-sata0-0-2,readonly=on,format=raw 
-device ide-cd,bus=ide.2,drive=drive-sata0-0-2,id=sata0-0-2,bootindex=1 
-netdev tap,fds=31:32:33:34,id=hostnet0,vhost=on,vhostfds=35:36:37:38 
-device 
virtio-net-pci,guest_csum=off,guest_tso4=off,guest_tso6=off,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=52:54:00:f3:6b:c4,bus=pci.2,addr=0x2 
-chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/arwan-20150704.org.qemu.guest_agent.0,server,nowait 
-device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 
-chardev spicevmc,id=charchannel1,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 
-vnc 127.0.0.1:4 -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pcie.0,addr=0x1 
-device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x1 -msg timestamp=on


If I revert the patch, qemu shows 17% CPU usage on the host. Thoughts?

-Peter

On 2015-08-29 3:21 PM, Wanpeng Li wrote:

Hi Peter,
On 8/30/15 5:18 AM, Peter Kieser wrote:

Hi Wanpeng,

Do I need to set any module parameters to use your patch, or should 
halt_poll_ns automatically tune with just your patch series applied?




You don't need any module parameters.

Regards,
Wanpeng Li


Thanks.

On 2015-08-27 2:47 AM, Wanpeng Li wrote:

v3 -> v4:
  * bring back grow vcpu->halt_poll_ns when an interrupt arrives and shrink
    when an idle VCPU is detected

v2 -> v3:
  * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or
    /halt_poll_ns_shrink
  * drop the macros and hard-code the numbers in the param definitions
  * update the comments "5-7 us"
  * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns
    time, vcpu->halt_poll_ns starts at zero
  * drop the wrappers
  * move the grow/shrink logic before "out:" w/ "if (waited)"

v1 -> v2:
  * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
    the module parameter
  * use the shrink/grow matrix which is suggested by David
  * set halt_poll_ns_max to 2ms

There is a downside to halt_poll_ns: polling still happens for an idle
VCPU, which can waste cpu usage. This patchset adds the ability to
adjust halt_poll_ns dynamically: it grows halt_poll_ns when an interrupt
arrives and shrinks it when an idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink.
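The grow/shrink behaviour the changelog describes can be sketched as a
toy shell-arithmetic model — this is NOT the kvm_main.c code, and the
seed window (10000ns), factors of 2, and 2ms cap are illustrative
assumptions:

```shell
halt_poll_ns=2000000      # max poll window (the 2ms cap)
grow=2                    # stands in for halt_poll_ns_grow
shrink=2                  # stands in for halt_poll_ns_shrink
vcpu_poll_ns=0            # per-vCPU window starts at zero

# $1 = 1 if an interrupt arrived during the halt, 0 if the vCPU idled
update_poll_ns() {
    if [ "$1" -eq 1 ]; then
        if [ "$vcpu_poll_ns" -eq 0 ]; then
            vcpu_poll_ns=10000                     # start polling
        else
            vcpu_poll_ns=$((vcpu_poll_ns * grow))  # grow the window
        fi
        if [ "$vcpu_poll_ns" -gt "$halt_poll_ns" ]; then
            vcpu_poll_ns=$halt_poll_ns             # clamp to the cap
        fi
    else
        vcpu_poll_ns=$((vcpu_poll_ns / shrink))    # idle: shrink
    fi
}

update_poll_ns 1          # interrupt arrived: polling starts
update_poll_ns 1          # interrupt arrived again: window grows
echo "poll window after busy phase: ${vcpu_poll_ns}ns"
update_poll_ns 0          # vCPU really idled: window shrinks
echo "poll window after idle phase: ${vcpu_poll_ns}ns"
```

An idle vCPU therefore decays back toward a zero-length poll window,
which is what removes the always-poll overhead in the idle columns of
the tables below.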


Test w/ a high cpu overcommit ratio and pinned vCPUs; the halt_poll_ns
of always halt-poll is the default 500000ns, and the max halt_poll_ns of
dynamic halt-poll is 2ms. Then watch the %C0 in the output of the
Powertop tool. The test method is mostly from David.

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |     ~1.8%      |       ~1.2%       |
+-----------------+----------------+-------------------+
Always halt-poll increases cpu usage for idle vCPUs by ~0.9%, and
dynamic halt-poll drops that to ~0.3%, a 67% reduction of the overhead
introduced by always halt-poll.

Wanpeng Li (3):
   KVM: make halt_poll_ns per-VCPU
   KVM: dynamic halt_poll_ns adjustment
   KVM: trace kvm_halt_poll_ns grow/shrink

  include/linux/kvm_host.h   |  1 +
  include/trace/events/kvm.h | 30
  virt/kvm/kvm_main.c        | 50 +++---

  3 files changed, 78 insertions(+), 3 deletions(-)






--
Peter Kieser
604.338.9294 / pe...@kieser.ca






Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-29 Thread Wanpeng Li

Hi Peter,
On 8/30/15 5:18 AM, Peter Kieser wrote:

Hi Wanpeng,

Do I need to set any module parameters to use your patch, or should 
halt_poll_ns automatically tune with just your patch series applied?




You don't need any module parameters.
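(The tunables the series adds are still exposed as runtime module
parameters; a sketch of inspecting or changing them through sysfs —
defaults vary by kernel, and writes require root:)

```shell
cat /sys/module/kvm/parameters/halt_poll_ns         # max poll window (ns)
cat /sys/module/kvm/parameters/halt_poll_ns_grow    # growth multiplier
cat /sys/module/kvm/parameters/halt_poll_ns_shrink  # shrink divisor

# e.g. disable halt polling entirely
echo 0 > /sys/module/kvm/parameters/halt_poll_ns
```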

Regards,
Wanpeng Li


Thanks.









Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-29 Thread Peter Kieser
Thanks, Wanpeng. Applied this to Linux 3.18 and seeing much higher CPU 
usage (200%) for qemu 2.4.0 process on a Windows 10 x64 guest. qemu 
parameters:


qemu-system-x86_64 -enable-kvm -name arwan-20150704 -S -machine 
pc-q35-2.2,accel=kvm,usb=off -cpu 
Haswell,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 8192 
-realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 
7c2fc02d-2798-4fc9-ad04-db5f1af92723 -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/arwan-20150704.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
-no-shutdown -boot strict=on -device 
i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device 
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device 
nec-usb-xhci,id=usb1,bus=pci.2,addr=0x4 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x5 -drive 
file=/dev/mapper/crypt-arwan-20150704,if=none,id=drive-virtio-disk0,format=raw,cache=none,discard=unmap,aio=native 
-device 
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 
-drive 
file=/usr/share/virtio-win/virtio-win.iso,if=none,media=cdrom,id=drive-sata0-0-2,readonly=on,format=raw 
-device ide-cd,bus=ide.2,drive=drive-sata0-0-2,id=sata0-0-2,bootindex=1 
-netdev tap,fds=31:32:33:34,id=hostnet0,vhost=on,vhostfds=35:36:37:38 
-device 
virtio-net-pci,guest_csum=off,guest_tso4=off,guest_tso6=off,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=52:54:00:f3:6b:c4,bus=pci.2,addr=0x2 
-chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/arwan-20150704.org.qemu.guest_agent.0,server,nowait 
-device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 
-chardev spicevmc,id=charchannel1,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 
-vnc 127.0.0.1:4 -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pcie.0,addr=0x1 
-device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x1 -msg timestamp=on


If I revert the patch, qemu shows 17% CPU usage on the host. Thoughts?

-Peter

On 2015-08-29 3:21 PM, Wanpeng Li wrote:

Hi Peter,
On 8/30/15 5:18 AM, Peter Kieser wrote:

Hi Wanpeng,

Do I need to set any module parameters to use your patch, or should 
halt_poll_ns automatically tune with just your patch series applied?




You don't need any module parameters.

Regards,
Wanpeng Li


Thanks.

On 2015-08-27 2:47 AM, Wanpeng Li wrote:

v3 -> v4:
 * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
   when idle VCPU is detected

v2 -> v3:
 * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
 * drop the macros and hard coding the numbers in the param definitions
 * update the comments "5-7 us"
 * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
   vcpu->halt_poll_ns start at zero
 * drop the wrappers
 * move the grow/shrink logic before "out:" w/ "if (waited)"

v1 -> v2:
 * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
   the module parameter
 * use the shrink/grow matrix which is suggested by David
 * set halt_poll_ns_max to 2ms

There is a downside to halt_poll_ns: polling still happens for an idle
VCPU, which can waste CPU cycles. This patchset adds the ability to adjust
halt_poll_ns dynamically: it grows halt_poll_ns when an interrupt arrives
and shrinks it when an idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink.


Test w/ a high cpu overcommit ratio and pinned vCPUs; the halt_poll_ns of
plain halt-poll is the default 500000ns, and the max halt_poll_ns of dynamic
halt-poll is 2ms. Then watch the %C0 in the dump of the PowerTop tool.
The test method is almost entirely from David.

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |     ~1.8%      |       ~1.2%       |
+-----------------+----------------+-------------------+

Always-on halt-poll increases idle-vCPU cpu usage by ~0.9%, and the
dynamic halt-poll drops it to ~0.3%, which removes about 67% of the
overhead introduced by always-on halt-poll.

Wanpeng Li (3):
   KVM: make halt_poll_ns per-VCPU
   KVM: dynamic halt_poll_ns adjustment
   KVM: trace kvm_halt_poll_ns grow/shrink

  include/linux/kvm_host.h   |  1 +
  include/trace/events/kvm.h | 30 ++++++++++++++++++++++++++++++
  virt/kvm/kvm_main.c        | 50 +++++++++++++++++++++++++++++++++++++++++---

  3 files changed, 78 insertions(+), 3 deletions(-)






--
Peter Kieser
604.338.9294 / pe...@kieser.ca




smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-29 Thread Wanpeng Li

On 8/30/15 6:26 AM, Peter Kieser wrote:
Thanks, Wanpeng. Applied this to Linux 3.18 and seeing much higher CPU 
usage (200%) for qemu 2.4.0 process on a Windows 10 x64 guest. qemu 
parameters:


Thanks for the report. Is Paolo's patch "kvm: add halt_poll_ns module
parameter" applied on your 3.18? Btw, did you test a Linux guest?


Regards,
Wanpeng Li



qemu-system-x86_64 -enable-kvm -name arwan-20150704 -S -machine 
pc-q35-2.2,accel=kvm,usb=off -cpu 
Haswell,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 8192 
-realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 
7c2fc02d-2798-4fc9-ad04-db5f1af92723 -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/arwan-20150704.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
-no-shutdown -boot strict=on -device 
i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device 
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device 
nec-usb-xhci,id=usb1,bus=pci.2,addr=0x4 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x5 -drive 
file=/dev/mapper/crypt-arwan-20150704,if=none,id=drive-virtio-disk0,format=raw,cache=none,discard=unmap,aio=native 
-device 
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 
-drive 
file=/usr/share/virtio-win/virtio-win.iso,if=none,media=cdrom,id=drive-sata0-0-2,readonly=on,format=raw 
-device 
ide-cd,bus=ide.2,drive=drive-sata0-0-2,id=sata0-0-2,bootindex=1 
-netdev tap,fds=31:32:33:34,id=hostnet0,vhost=on,vhostfds=35:36:37:38 
-device 
virtio-net-pci,guest_csum=off,guest_tso4=off,guest_tso6=off,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=52:54:00:f3:6b:c4,bus=pci.2,addr=0x2 
-chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/arwan-20150704.org.qemu.guest_agent.0,server,nowait 
-device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 
-chardev spicevmc,id=charchannel1,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 
-vnc 127.0.0.1:4 -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pcie.0,addr=0x1 
-device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x1 -msg 
timestamp=on


If I revert the patch, qemu shows 17% CPU usage on the host. Thoughts?

-Peter

On 2015-08-29 3:21 PM, Wanpeng Li wrote:

Hi Peter,
On 8/30/15 5:18 AM, Peter Kieser wrote:

Hi Wanpeng,

Do I need to set any module parameters to use your patch, or should 
halt_poll_ns automatically tune with just your patch series applied?




You don't need any module parameters.

Regards,
Wanpeng Li


Thanks.

On 2015-08-27 2:47 AM, Wanpeng Li wrote:

v3 -> v4:
 * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
   when idle VCPU is detected

v2 -> v3:
 * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
 * drop the macros and hard coding the numbers in the param definitions
 * update the comments "5-7 us"
 * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
   vcpu->halt_poll_ns start at zero
 * drop the wrappers
 * move the grow/shrink logic before "out:" w/ "if (waited)"

v1 -> v2:
 * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
   the module parameter
 * use the shrink/grow matrix which is suggested by David
 * set halt_poll_ns_max to 2ms

There is a downside to halt_poll_ns: polling still happens for an idle
VCPU, which can waste CPU cycles. This patchset adds the ability to adjust
halt_poll_ns dynamically: it grows halt_poll_ns when an interrupt arrives
and shrinks it when an idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink.


Test w/ a high cpu overcommit ratio and pinned vCPUs; the halt_poll_ns of
plain halt-poll is the default 500000ns, and the max halt_poll_ns of dynamic
halt-poll is 2ms. Then watch the %C0 in the dump of the PowerTop tool.
The test method is almost entirely from David.

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |     ~1.8%      |       ~1.2%       |
+-----------------+----------------+-------------------+

Always-on halt-poll increases idle-vCPU cpu usage by ~0.9%, and the
dynamic halt-poll drops it to ~0.3%, which removes about 67% of the
overhead introduced by always-on halt-poll.

Wanpeng Li (3):
   KVM: make halt_poll_ns per-VCPU
   KVM: dynamic halt_poll_ns adjustment
   KVM: trace kvm_halt_poll_ns grow/shrink

  include/linux/kvm_host.h   |  1 +
  include/trace/events/kvm.h | 30 ++++++++++++++++++++++++++++++
  virt/kvm/kvm_main.c        | 50 +++++++++++++++++++++++++++++++++++++++++---
  3 files changed, 78 insertions(+), 3 deletions(-)

Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-29 Thread Peter Kieser



On 2015-08-29 4:55 PM, Wanpeng Li wrote:

On 8/30/15 6:26 AM, Peter Kieser wrote:
Thanks, Wanpeng. Applied this to Linux 3.18 and seeing much higher 
CPU usage (200%) for qemu 2.4.0 process on a Windows 10 x64 guest. 
qemu parameters:


Thanks for the report. Is Paolo's patch "kvm: add halt_poll_ns module
parameter" applied on your 3.18? Btw, did you test a Linux guest?


No high CPU usage on Linux guests. The following patch series are applied
(in order):


* kvm: add halt_poll_ns module parameter
* KVM: make halt_poll_ns static
* KVM: Dynamic Halt-Polling v4

-Peter




smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-29 Thread Wanpeng Li

On 8/30/15 8:13 AM, Peter Kieser wrote:

On 2015-08-29 4:55 PM, Wanpeng Li wrote:

On 8/30/15 6:26 AM, Peter Kieser wrote:
Thanks, Wanpeng. Applied this to Linux 3.18 and seeing much higher 
CPU usage (200%) for qemu 2.4.0 process on a Windows 10 x64 guest. 
qemu parameters:


Thanks for the report. Is Paolo's patch "kvm: add halt_poll_ns module
parameter" applied on your 3.18? Btw, did you test a Linux guest?


No high CPU usage on Linux guests. 


What's the difference w/ and w/o the patchset?


The following patch series are applied (in order):

* kvm: add halt_poll_ns module parameter
* KVM: make halt_poll_ns static
* KVM: Dynamic Halt-Polling v4



I will find a Windows 10 x64 guest tomorrow to figure out what happens. Did
you test other Windows guests (like Win7)? Btw, could you test v3 dynamic
halt-polling (against the Windows 10 guest), where David's shrink/grow logic
differs from v4? Many thanks for your time, Peter! ;-)


Regards,
Wanpeng Li


-Peter






[PATCH v4 0/3] KVM: Dynamic Halt-Polling

2015-08-27 Thread Wanpeng Li
v3 -> v4:
 * bring back grow vcpu->halt_poll_ns when interrupt arrives and shrinks
   when idle VCPU is detected 

v2 -> v3:
 * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
 * drop the macros and hard coding the numbers in the param definitions
 * update the comments "5-7 us"
 * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time,
   vcpu->halt_poll_ns start at zero
 * drop the wrappers 
 * move the grow/shrink logic before "out:" w/ "if (waited)"

v1 -> v2:
 * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of 
   the module parameter
 * use the shrink/grow matrix which is suggested by David
 * set halt_poll_ns_max to 2ms

There is a downside to halt_poll_ns: polling still happens for an idle
VCPU, which can waste CPU cycles. This patchset adds the ability to adjust
halt_poll_ns dynamically: it grows halt_poll_ns when an interrupt arrives
and shrinks it when an idle VCPU is detected.

There are two new kernel parameters for changing the halt_poll_ns:
halt_poll_ns_grow and halt_poll_ns_shrink. 


Test w/ a high cpu overcommit ratio and pinned vCPUs; the halt_poll_ns of
plain halt-poll is the default 500000ns, and the max halt_poll_ns of dynamic
halt-poll is 2ms. Then watch the %C0 in the dump of the PowerTop tool.
The test method is almost entirely from David.
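Since the knobs involved are ordinary KVM module parameters, a test setup
like the one above can be inspected and adjusted at runtime. The sysfs paths
below are the conventional module-parameter location, assumed rather than
quoted from this thread:

```shell
# Maximum poll window in ns (0 disables halt polling); with this series
# applied it also caps the per-VCPU dynamic value.
cat /sys/module/kvm/parameters/halt_poll_ns

# Grow/shrink factors introduced by this series.
cat /sys/module/kvm/parameters/halt_poll_ns_grow
cat /sys/module/kvm/parameters/halt_poll_ns_shrink

# Cap the dynamic poll window at 2 ms, matching the test setup above.
echo 2000000 | sudo tee /sys/module/kvm/parameters/halt_poll_ns
```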

+-----------------+----------------+-------------------+
|  w/o halt-poll  |  w/ halt-poll  | dynamic halt-poll |
+-----------------+----------------+-------------------+
|      ~0.9%      |     ~1.8%      |       ~1.2%       |
+-----------------+----------------+-------------------+

Always-on halt-poll increases idle-vCPU cpu usage by ~0.9%, and the
dynamic halt-poll drops it to ~0.3%, which removes about 67% of the
overhead introduced by always-on halt-poll.

Wanpeng Li (3):
  KVM: make halt_poll_ns per-VCPU
  KVM: dynamic halt_poll_ns adjustment
  KVM: trace kvm_halt_poll_ns grow/shrink

 include/linux/kvm_host.h   |  1 +
 include/trace/events/kvm.h | 30 ++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c        | 50 +++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 78 insertions(+), 3 deletions(-)
-- 
1.9.1




