Re: pidns: Make pid accounting and pid_max per namespace

2015-10-09 Thread Zhang Haoyu

On 10/10/15 12:40, Zhang Haoyu wrote:
> On 10/10/15 11:35, Zefan Li wrote:
>> On 2015/10/9 18:29, Zhang Haoyu wrote:
>>> I started multiple Docker containers on CentOS 6.6 (linux-2.6.32-504.16.2),
>>> and one misbehaving program was running in one of the containers.
>>> This program kept creating child threads without ever freeing them, so it
>>> consumed more and more pid numbers until it hit the pid_max limit
>>> (32768 by default on my system).
>>>
>>> What's worse is that the containers and the host share the pid number
>>> space, so no new program could be started on the host or in the other
>>> containers.
>>>
>>> I also cloned the upstream kernel source from
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>> and the problem appears to still be there, though I'm not sure.
>>>
>>> IMO, we should isolate pid accounting and pid_max between pid namespaces,
>>> and make them per pidns.
>>> The post below already requested making pid_max per pidns:
>>> http://thread.gmane.org/gmane.linux.kernel/1108167/focus=210
>>>
>> The mainline kernel already supports a per-cgroup pid limit, which should
>> solve your problem.
>>
> What about pid accounting?
> If one pidns consumes too many pids, does it influence the other pid
> namespaces?
I found it, thanks very much.
>
> Thanks,
> Zhang Haoyu
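
For reference, the per-cgroup pid limit referred to above is the pids cgroup controller (pids.max), which landed in mainline around v4.3. Below is a minimal sketch, not taken from this thread, of using it to fence in a leaky container; the cgroup v1 mount point, the "demo" group name and the 512 limit are assumptions made only for the example.

/*
 * Minimal sketch, assuming a cgroup v1 pids hierarchy mounted at
 * /sys/fs/cgroup/pids; the "demo" group and the 512 limit are examples.
 */
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_file(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fprintf(f, "%s", val);
        return fclose(f);
}

int main(void)
{
        char self[32];

        /* Create a child cgroup and cap how many tasks it may contain. */
        if (mkdir("/sys/fs/cgroup/pids/demo", 0755) && errno != EEXIST) {
                perror("mkdir");
                return 1;
        }
        if (write_file("/sys/fs/cgroup/pids/demo/pids.max", "512"))
                return 1;

        /* Move the current process (and thus its future children) in. */
        snprintf(self, sizeof(self), "%d", (int)getpid());
        if (write_file("/sys/fs/cgroup/pids/demo/cgroup.procs", self))
                return 1;

        /*
         * From here on, fork()/pthread_create() in this group fails with
         * EAGAIN once 512 tasks exist, while the rest of the host keeps
         * its pids.
         */
        return 0;
}

With such a limit in place, the runaway threads fail inside their own group instead of exhausting the host-wide pid space.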




Re: pidns: Make pid accounting and pid_max per namespace

2015-10-09 Thread Zhang Haoyu

On 10/10/15 11:35, Zefan Li wrote:
> On 2015/10/9 18:29, Zhang Haoyu wrote:
>> I started multiple Docker containers on CentOS 6.6 (linux-2.6.32-504.16.2),
>> and one misbehaving program was running in one of the containers.
>> This program kept creating child threads without ever freeing them, so it
>> consumed more and more pid numbers until it hit the pid_max limit
>> (32768 by default on my system).
>>
>> What's worse is that the containers and the host share the pid number
>> space, so no new program could be started on the host or in the other
>> containers.
>>
>> I also cloned the upstream kernel source from
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> and the problem appears to still be there, though I'm not sure.
>>
>> IMO, we should isolate pid accounting and pid_max between pid namespaces,
>> and make them per pidns.
>> The post below already requested making pid_max per pidns:
>> http://thread.gmane.org/gmane.linux.kernel/1108167/focus=210
>>
>
> The mainline kernel already supports a per-cgroup pid limit, which should
> solve your problem.
>
What about pid accounting?
If one pidns consumes too many pids, does it influence the other pid namespaces?

Thanks,
Zhang Haoyu



pidns: Make pid accounting and pid_max per namespace

2015-10-09 Thread Zhang Haoyu
I started multiple Docker containers on CentOS 6.6 (linux-2.6.32-504.16.2),
and one misbehaving program was running in one of the containers.
This program kept creating child threads without ever freeing them, so it
consumed more and more pid numbers until it hit the pid_max limit
(32768 by default on my system).

What's worse is that the containers and the host share the pid number
space, so no new program could be started on the host or in the other
containers.

I also cloned the upstream kernel source from
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
and the problem appears to still be there, though I'm not sure.

IMO, we should isolate pid accounting and pid_max between pid namespaces,
and make them per pidns.
The post below already requested making pid_max per pidns:
http://thread.gmane.org/gmane.linux.kernel/1108167/focus=210

Thanks,
Zhang Haoyu
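
To make the shared limit concrete: at the time of this thread pid_max is a single system-wide sysctl, so a process inside any PID namespace reads, and competes against, the same host value. A trivial, purely illustrative sketch (not from this thread) that just prints it:

/* Illustrative only: pid_max is one global knob, shared by the host and
 * the containers alike in the setup described above. */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/kernel/pid_max", "r");
        long pid_max;

        if (!f) {
                perror("fopen");
                return 1;
        }
        if (fscanf(f, "%ld", &pid_max) != 1) {
                fclose(f);
                return 1;
        }
        fclose(f);
        printf("system-wide pid_max: %ld\n", pid_max);
        return 0;
}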
 





Re: [question] lots of interrupts injected to vm when pressing some key w/o releasing

2014-11-28 Thread Zhang Haoyu
Hi all,
On Thu, Nov 27, 2014 at 03:20:43PM +0800, Zhang Haoyu wrote:
 I tested win-server-2008 with "-cpu core2duo,hv_spinlocks=0x,hv_relaxed,hv_time";
 this problem still happened, about 200,000 vmexits per second,
 giving a very bad experience, just like being stuck.
 
 Please upload a full trace somewhere, or at least the perf report 
 output.
 
 
 And, if I remove the commit of 0bc830b0, the problem disappeared.
 
 Please send the full trace file.  If you compress it, it should be small.
 
 See attachment 1, please.
 
 Paolo

Can you try the following draft patch to see whether it solves your problem?
This patch is based on commit 0bc830b0.

After applying this patch, the VM got stuck with a black screen at boot stage;
# trace-cmd report:
version = 6
CPU 0 is empty
CPU 1 is empty
CPU 2 is empty
CPU 3 is empty
CPU 5 is empty
CPU 7 is empty
cpus=8
 kvm-1266  [004] 14399.834397: kvm_set_irq:  gsi 9 level 
 1 source 0
 kvm-1266  [004] 14399.834403: kvm_pic_set_irq:  chip 1 pin 1 
 (edge|masked)
 kvm-1266  [004] 14399.834411: kvm_apic_accept_irq:  apicid 0 vec 
 177 (LowPrio|level)
 kvm-1266  [004] 14399.834412: kvm_ioapic_set_irq:   pin 9 dst 3 
 vec=177 (LowPrio|logical|level)
 kvm-1266  [004] 14402.180013: kvm_set_irq:  gsi 9 level 
 1 source 0
 kvm-1266  [004] 14402.180019: kvm_pic_set_irq:  chip 1 pin 1 
 (edge|masked)
 kvm-1266  [004] 14402.180028: kvm_apic_accept_irq:  apicid 1 vec 
 177 (LowPrio|level)
 kvm-1266  [004] 14402.180029: kvm_ioapic_set_irq:   pin 9 dst 3 
 vec=177 (LowPrio|logical|level)
 kvm-1266  [004] 14404.525627: kvm_set_irq:  gsi 9 level 
 1 source 0
 kvm-1266  [004] 14404.525634: kvm_pic_set_irq:  chip 1 pin 1 
 (edge|masked)
 kvm-1266  [004] 14404.525641: kvm_apic_accept_irq:  apicid 0 vec 
 177 (LowPrio|level)
 kvm-1266  [004] 14404.525642: kvm_ioapic_set_irq:   pin 9 dst 3 
 vec=177 (LowPrio|logical|level)
 kvm-1266  [004] 14406.871238: kvm_set_irq:  gsi 9 level 
 1 source 0
 kvm-1266  [004] 14406.871245: kvm_pic_set_irq:  chip 1 pin 1 
 (edge|masked)
 kvm-1266  [004] 14406.871254: kvm_apic_accept_irq:  apicid 1 vec 
 177 (LowPrio|level)
 kvm-1266  [004] 14406.871256: kvm_ioapic_set_irq:   pin 9 dst 3 
 vec=177 (LowPrio|logical|level)
 kvm-1266  [006] 14409.216849: kvm_set_irq:  gsi 9 level 
 1 source 0
 kvm-1266  [006] 14409.216855: kvm_pic_set_irq:  chip 1 pin 1 
 (edge|masked)
 kvm-1266  [006] 14409.216862: kvm_apic_accept_irq:  apicid 0 vec 
 177 (LowPrio|level)
 kvm-1266  [006] 14409.216863: kvm_ioapic_set_irq:   pin 9 dst 3 
 vec=177 (LowPrio|logical|level)
 kvm-1266  [006] 14411.562475: kvm_set_irq:  gsi 9 level 
 1 source 0
 kvm-1266  [006] 14411.562481: kvm_pic_set_irq:  chip 1 pin 1 
 (edge|masked)
 kvm-1266  [006] 14411.562489: kvm_apic_accept_irq:  apicid 1 vec 
 177 (LowPrio|level)
 kvm-1266  [006] 14411.562491: kvm_ioapic_set_irq:   pin 9 dst 3 
 vec=177 (LowPrio|logical|level)
 kvm-1266  [004] 14413.908074: kvm_set_irq:  gsi 9 level 
 1 source 0
 kvm-1266  [004] 14413.908080: kvm_pic_set_irq:  chip 1 pin 1 
 (edge|masked)
 kvm-1266  [004] 14413.908088: kvm_apic_accept_irq:  apicid 0 vec 
 177 (LowPrio|level)
 kvm-1266  [004] 14413.908089: kvm_ioapic_set_irq:   pin 9 dst 3 
 vec=177 (LowPrio|logical|level)

Thanks,
Zhang Haoyu

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 25e16a6..8f4e211 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -305,6 +305,7 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
 	return ret;
 }
 
+static int irq_status[256];
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
 		       int level, bool line_status)
 {
@@ -312,10 +313,13 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
 	u32 mask = 1 << irq;
 	union kvm_ioapic_redirect_entry entry;
 	int ret, irq_level;
+	int old_irq;
 

I'm not sure which version of kvm the patch is against; anyway, all the
modifications should be moved to ioapic_set_irq() against upstream kvm.
I tested the patch with a win2k8 guest and without Haoyu's command line,
and it can fix the bug mentioned by Haoyu; in addition, pressing any key
can reproduce the bug on my side, not just the small set of keys that
Haoyu mentioned.

Yang's patch can indeed fix the problem.
Our rtc optimization together with Yang's patch is what causes the hang
at boot stage.

Thanks,
Zhang Haoyu

Regards,
Wanpeng Li 

 	BUG_ON(irq < 0 || irq >= IOAPIC_NUM_PINS);
 
 	spin_lock(&ioapic->lock);
+	old_irq = irq_status[irq];
+	irq_status[irq] = level
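
The quoted draft above is cut short by the archive, so for orientation here is a small, self-contained sketch of the idea its old_irq/irq_status variables suggest: remember the last observed level per pin and only service a level-triggered pin when the line actually changes. It is illustrative only, not the actual KVM patch, and every name in it is invented.

/* Illustrative sketch only, not the KVM patch quoted above. */
#include <stdbool.h>
#include <stdio.h>

#define NR_PINS 256

static int last_level[NR_PINS];

/* Returns nonzero if the interrupt should actually be serviced. */
static int set_pin_level(int pin, int level, bool level_triggered)
{
        int old = last_level[pin];

        last_level[pin] = level;

        if (level_triggered && old == level)
                return 0;       /* no edge: line already in this state */
        return level != 0;      /* service only on assertion */
}

int main(void)
{
        /* Holding a key re-asserts the line over and over; only the first
         * assertion, and the one after a release, get serviced. */
        printf("%d %d %d %d\n",
               set_pin_level(9, 1, true),   /* 1: 0 -> 1 edge          */
               set_pin_level(9, 1, true),   /* 0: still high, no edge  */
               set_pin_level(9, 0, true),   /* 0: de-assertion         */
               set_pin_level(9, 1, true));  /* 1: new 0 -> 1 edge      */
        return 0;
}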

Re: [PATCH] vhost: Add polling mode

2014-08-22 Thread Zhang Haoyu
>>> > 
>>> > Results:
>>> > 
>>> > Netperf, 1 vm:
>>> > The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 
>>> > MB/sec).
>>> > Number of exits/sec decreased 6x.
>>> > The same improvement was shown when I tested with 3 vms running netperf
>>> > (4086 MB/sec -> 5545 MB/sec).
>>> > 
>>> > filebench, 1 vm:
>>> > ops/sec improved by 13% with the polling patch. Number of exits 
>>> was reduced by
>>> > 31%.
>>> > The same experiment with 3 vms running filebench showed similar numbers.
>>> > 
>>> > Signed-off-by: Razya Ladelsky 
>>> 
>>> Gave it a quick try on s390/kvm. As expected it makes no difference 
>>> for big streaming workload like iperf.
>>> uperf with a 1-1 round robin got indeed faster by about 30%.
>>> The high CPU consumption is something that bothers me though, as 
>>> virtualized systems tend to be full.
>>> 
>>> 
>>
>>Thanks for confirming the results!
>>The best way to use this patch would be along with a shared vhost thread 
>>for multiple
>>devices/vms, as described in:
>>http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument
>>This work assumes having a dedicated I/O core where the vhost thread 
>>serves multiple vms, which 
>>makes the high cpu utilization less of a concern. 
>>
>Hi, Razya, Shirley,
>I am going to test the combination of
>"several (depending on the total number of CPUs on the host, e.g., total_number * 1/3)
>vhost threads serve all VMs" and "vhost: add polling mode".
>I now have the patch
>"http://thread.gmane.org/gmane.comp.emulators.kvm.devel/88682/focus=88723"
>posted by Shirley; is there any update to this patch?
>
>And I want to make a small change to this patch: create total_cpu_number *
>1/N (N={3,4}) vhost threads instead of a per-cpu vhost thread to serve all VMs,
Just like the xen netback threads, whose number equals num_online_cpus on
Dom0; but for a KVM host, I think one vhost thread per CPU is too many.
>any ideas?
>
>Thanks,
>Zhang Haoyu
>>
>>
>>> > +static int poll_start_rate = 0;
>>> > +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
>>> > +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
>>> > +
>>> > +static int poll_stop_idle = 3*HZ; /* 3 seconds */
>>> > +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
>>> > +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
>>> 
>>> This seems ridiculously high. Even one jiffy is an eternity, so
>>> setting it to 1 as a default would reduce the CPU overhead for most cases.
>>> If we don't have a packet in one millisecond, we can surely go back
>>> to the kick approach, I think.
>>> 
>>> Christian
>>> 
>>
>>Good point, will reduce it and recheck.
>>Thank you,
>>Razya
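
As a side note, the start/stop behaviour that poll_start_rate and poll_stop_idle describe can be pictured with a small sketch. This is not the vhost patch itself; the struct and the function below are invented purely to illustrate the heuristic: begin busy-polling once the event rate reaches poll_start_rate events per jiffy, and fall back to kick-based notification after poll_stop_idle jiffies without work.

/* Illustrative sketch only -- not the vhost patch under discussion. */
#include <stdbool.h>

static int poll_start_rate = 0;      /* 0 means never start polling       */
static int poll_stop_idle = 3 * 250; /* ~3 seconds at HZ=250, as a default */

struct vq_poll_state {
        bool polling;                    /* currently busy-polling?          */
        unsigned long cur_jiffy;         /* jiffy the rate counter refers to */
        unsigned long events_this_jiffy; /* events seen in that jiffy        */
        unsigned long last_work_jiffy;   /* last jiffy that produced work    */
};

/* Call once per loop iteration with the current jiffy count. */
static void vq_poll_update(struct vq_poll_state *s, unsigned long now,
                           bool did_work)
{
        if (now != s->cur_jiffy) {       /* new jiffy: restart the rate count */
                s->cur_jiffy = now;
                s->events_this_jiffy = 0;
        }
        if (did_work) {
                s->events_this_jiffy++;
                s->last_work_jiffy = now;
        }
        if (!s->polling && poll_start_rate &&
            s->events_this_jiffy >= (unsigned long)poll_start_rate)
                s->polling = true;       /* busy enough: poll instead of kick */
        if (s->polling &&
            now - s->last_work_jiffy > (unsigned long)poll_stop_idle)
                s->polling = false;      /* idle too long: back to kicks      */
}

Christian's point above amounts to shrinking poll_stop_idle from 3*HZ toward a single jiffy so that the busy loop gives the CPU back sooner.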



Re: [PATCH] vhost: Add polling mode

2014-08-22 Thread Zhang Haoyu
>> > 
>> > Results:
>> > 
>> > Netperf, 1 vm:
>> > The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 MB/sec).
>> > Number of exits/sec decreased 6x.
>> > The same improvement was shown when I tested with 3 vms running netperf
>> > (4086 MB/sec -> 5545 MB/sec).
>> > 
>> > filebench, 1 vm:
>> > ops/sec improved by 13% with the polling patch. Number of exits 
>> was reduced by
>> > 31%.
>> > The same experiment with 3 vms running filebench showed similar numbers.
>> > 
>> > Signed-off-by: Razya Ladelsky 
>> 
>> Gave it a quick try on s390/kvm. As expected it makes no difference 
>> for big streaming workload like iperf.
>> uperf with a 1-1 round robin got indeed faster by about 30%.
>> The high CPU consumption is something that bothers me though, as 
>> virtualized systems tend to be full.
>> 
>> 
>
>Thanks for confirming the results!
>The best way to use this patch would be along with a shared vhost thread 
>for multiple
>devices/vms, as described in:
>http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument
>This work assumes having a dedicated I/O core where the vhost thread 
>serves multiple vms, which 
>makes the high cpu utilization less of a concern. 
>
Hi, Razya, Shirley,
I am going to test the combination of
"several (depending on the total number of CPUs on the host, e.g., total_number * 1/3)
vhost threads serve all VMs" and "vhost: add polling mode".
I now have the patch
"http://thread.gmane.org/gmane.comp.emulators.kvm.devel/88682/focus=88723"
posted by Shirley; is there any update to this patch?

And I want to make a small change to this patch: create total_cpu_number *
1/N (N={3,4}) vhost threads instead of a per-cpu vhost thread to serve all VMs,
any ideas?

Thanks,
Zhang Haoyu
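
To picture the change proposed above (a small shared pool of vhost-style workers, roughly num_online_cpus()/3 of them, serving all VMs instead of one thread per CPU or per virtqueue), here is a rough, purely illustrative sketch; the names, the round-robin assignment and the fixed VM count are assumptions, not code from any patch.

/* Rough sketch only -- not vhost code. Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NR_VMS      8
#define MAX_WORKERS 64

static int nr_workers;

static void *worker_fn(void *arg)
{
        long id = (long)arg;

        /* A real implementation would pull virtqueue work items from a
         * shared queue; the sketch only reports which worker owns which VMs. */
        for (int vm = 0; vm < NR_VMS; vm++)
                if (vm % nr_workers == (int)id)
                        printf("worker %ld handles vm %d\n", id, vm);
        return NULL;
}

int main(void)
{
        pthread_t tid[MAX_WORKERS];

        /* N = 3: one shared vhost-style worker per three online host CPUs. */
        nr_workers = (int)sysconf(_SC_NPROCESSORS_ONLN) / 3;
        if (nr_workers < 1)
                nr_workers = 1;
        if (nr_workers > MAX_WORKERS)
                nr_workers = MAX_WORKERS;

        for (long i = 0; i < nr_workers; i++)
                pthread_create(&tid[i], NULL, worker_fn, (void *)i);
        for (long i = 0; i < nr_workers; i++)
                pthread_join(tid[i], NULL);
        return 0;
}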
>
>
>> > +static int poll_start_rate = 0;
>> > +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
>> > +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
>> > +
>> > +static int poll_stop_idle = 3*HZ; /* 3 seconds */
>> > +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
>> > +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
>> 
>> This seems ridiculously high. Even one jiffy is an eternity, so
>> setting it to 1 as a default would reduce the CPU overhead for most cases.
>> If we don't have a packet in one millisecond, we can surely go back
>> to the kick approach, I think.
>> 
>> Christian
>> 
>
>Good point, will reduce it and recheck.
>Thank you,
>Razya


