Re: pidns: Make pid accounting and pid_max per namespace
On 10/10/15 12:40, Zhang Haoyu wrote:
> On 10/10/15 11:35, Zefan Li wrote:
>> On 2015/10/9 18:29, Zhang Haoyu wrote:
>>> I started multiple docker containers on centos6.6 (linux-2.6.32-504.16.2),
>>> and one bad program was running in one of the containers. This program
>>> continuously produced child threads without freeing them, so more and more
>>> pid numbers were consumed by it, until it hit the pid_max limit (32768 by
>>> default on my system).
>>>
>>> What's worse is that the containers and the host share the pid number
>>> space, so no new program could be started on the host or in the other
>>> containers.
>>>
>>> I also cloned the upstream kernel source from
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>> and the problem seems to still be there, though I'm not sure.
>>>
>>> IMO, we should isolate pid accounting and pid_max between pid namespaces,
>>> and make them per pidns. The post below requested making pid_max per pidns:
>>> http://thread.gmane.org/gmane.linux.kernel/1108167/focus=210
>>>
>> The mainline kernel already supports a per-cgroup pid limit, which should
>> solve your problem.
>>
> What about pid accounting?
> If one pidns consumes too many pids, does it influence the other pid
> namespaces?

I found it, thanks very much.

> Thanks,
> Zhang Haoyu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: pidns: Make pid accounting and pid_max per namespace
On 10/10/15 11:35, Zefan Li wrote:
> On 2015/10/9 18:29, Zhang Haoyu wrote:
>> I started multiple docker containers on centos6.6 (linux-2.6.32-504.16.2),
>> and one bad program was running in one of the containers. This program
>> continuously produced child threads without freeing them, so more and more
>> pid numbers were consumed by it, until it hit the pid_max limit (32768 by
>> default on my system).
>>
>> What's worse is that the containers and the host share the pid number
>> space, so no new program could be started on the host or in the other
>> containers.
>>
>> I also cloned the upstream kernel source from
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>> and the problem seems to still be there, though I'm not sure.
>>
>> IMO, we should isolate pid accounting and pid_max between pid namespaces,
>> and make them per pidns. The post below requested making pid_max per pidns:
>> http://thread.gmane.org/gmane.linux.kernel/1108167/focus=210
>>
> The mainline kernel already supports a per-cgroup pid limit, which should
> solve your problem.

What about pid accounting? If one pidns consumes too many pids, does it
influence the other pid namespaces?

Thanks,
Zhang Haoyu
pidns: Make pid accounting and pid_max per namespace
I started multiple docker containers on centos6.6 (linux-2.6.32-504.16.2),
and one bad program was running in one of the containers. This program
continuously produced child threads without freeing them, so more and more
pid numbers were consumed by it, until it hit the pid_max limit (32768 by
default on my system).

What's worse is that the containers and the host share the pid number space,
so no new program could be started on the host or in the other containers.

I also cloned the upstream kernel source from
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
and the problem seems to still be there, though I'm not sure.

IMO, we should isolate pid accounting and pid_max between pid namespaces,
and make them per pidns. The post below requested making pid_max per pidns:
http://thread.gmane.org/gmane.linux.kernel/1108167/focus=210

Thanks,
Zhang Haoyu
Re: [question] lots of interrupts injected to vm when pressing some key w/o releasing
Hi all,

On Thu, Nov 27, 2014 at 03:20:43PM +0800, Zhang Haoyu wrote:
>>>> I tested win-server-2008 with
>>>> "-cpu core2duo,hv_spinlocks=0x,hv_relaxed,hv_time"; this problem still
>>>> happened, about 200,000 vmexits per second, bringing a very bad
>>>> experience, just like being stuck.
>>>
>>> Please upload a full trace somewhere, or at least the "perf report"
>>> output.
>>>
>>>> And, if I remove commit 0bc830b0, the problem disappears.
>>>
>>> Please send the full trace file. If you compress it, it should be small.
>>>
>>>> See attachment 1, please.
>>>
>>> Paolo
>>
>> Can you try the following draft patch to see whether it solves your
>> problem? This patch is based on commit 0bc830b0.
>>
> After applying this patch, the VM got stuck with a black screen at the boot
> stage.
>
> # trace-cmd report:
> version = 6
> CPU 0 is empty
> CPU 1 is empty
> CPU 2 is empty
> CPU 3 is empty
> CPU 5 is empty
> CPU 7 is empty
> cpus=8
>  kvm-1266 [004] 14399.834397: kvm_set_irq:         gsi 9 level 1 source 0
>  kvm-1266 [004] 14399.834403: kvm_pic_set_irq:     chip 1 pin 1 (edge|masked)
>  kvm-1266 [004] 14399.834411: kvm_apic_accept_irq: apicid 0 vec 177 (LowPrio|level)
>  kvm-1266 [004] 14399.834412: kvm_ioapic_set_irq:  pin 9 dst 3 vec=177 (LowPrio|logical|level)
>  kvm-1266 [004] 14402.180013: kvm_set_irq:         gsi 9 level 1 source 0
>  kvm-1266 [004] 14402.180019: kvm_pic_set_irq:     chip 1 pin 1 (edge|masked)
>  kvm-1266 [004] 14402.180028: kvm_apic_accept_irq: apicid 1 vec 177 (LowPrio|level)
>  kvm-1266 [004] 14402.180029: kvm_ioapic_set_irq:  pin 9 dst 3 vec=177 (LowPrio|logical|level)
>  kvm-1266 [004] 14404.525627: kvm_set_irq:         gsi 9 level 1 source 0
>  kvm-1266 [004] 14404.525634: kvm_pic_set_irq:     chip 1 pin 1 (edge|masked)
>  kvm-1266 [004] 14404.525641: kvm_apic_accept_irq: apicid 0 vec 177 (LowPrio|level)
>  kvm-1266 [004] 14404.525642: kvm_ioapic_set_irq:  pin 9 dst 3 vec=177 (LowPrio|logical|level)
>  kvm-1266 [004] 14406.871238: kvm_set_irq:         gsi 9 level 1 source 0
>  kvm-1266 [004] 14406.871245: kvm_pic_set_irq:     chip 1 pin 1 (edge|masked)
>  kvm-1266 [004] 14406.871254: kvm_apic_accept_irq: apicid 1 vec 177 (LowPrio|level)
>  kvm-1266 [004] 14406.871256: kvm_ioapic_set_irq:  pin 9 dst 3 vec=177 (LowPrio|logical|level)
>  kvm-1266 [006] 14409.216849: kvm_set_irq:         gsi 9 level 1 source 0
>  kvm-1266 [006] 14409.216855: kvm_pic_set_irq:     chip 1 pin 1 (edge|masked)
>  kvm-1266 [006] 14409.216862: kvm_apic_accept_irq: apicid 0 vec 177 (LowPrio|level)
>  kvm-1266 [006] 14409.216863: kvm_ioapic_set_irq:  pin 9 dst 3 vec=177 (LowPrio|logical|level)
>  kvm-1266 [006] 14411.562475: kvm_set_irq:         gsi 9 level 1 source 0
>  kvm-1266 [006] 14411.562481: kvm_pic_set_irq:     chip 1 pin 1 (edge|masked)
>  kvm-1266 [006] 14411.562489: kvm_apic_accept_irq: apicid 1 vec 177 (LowPrio|level)
>  kvm-1266 [006] 14411.562491: kvm_ioapic_set_irq:  pin 9 dst 3 vec=177 (LowPrio|logical|level)
>  kvm-1266 [004] 14413.908074: kvm_set_irq:         gsi 9 level 1 source 0
>  kvm-1266 [004] 14413.908080: kvm_pic_set_irq:     chip 1 pin 1 (edge|masked)
>  kvm-1266 [004] 14413.908088: kvm_apic_accept_irq: apicid 0 vec 177 (LowPrio|level)
>  kvm-1266 [004] 14413.908089: kvm_ioapic_set_irq:  pin 9 dst 3 vec=177 (LowPrio|logical|level)
>
> Thanks,
> Zhang Haoyu

The draft patch:

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 25e16a6..8f4e211 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -305,6 +305,7 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
 	return ret;
 }
 
+static int irq_status[256];
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
 		       int level, bool line_status)
 {
@@ -312,10 +313,13 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
 	u32 mask = 1 << irq;
 	union kvm_ioapic_redirect_entry entry;
 	int ret, irq_level;
+	int old_irq;
 
 	BUG_ON(irq < 0 || irq >= IOAPIC_NUM_PINS);
 	spin_lock(&ioapic->lock);
+	old_irq = irq_status[irq];
+	irq_status[irq] = level;

I'm not sure which version of kvm the patch is against; anyway, all of the
modifications should be moved to ioapic_set_irq() against upstream kvm. I
tested the patch with a win2k8 guest and without Haoyu's command line, and
it can fix the bug Haoyu mentioned; in addition, on my side pressing any key
can reproduce the bug, instead of only the small set of keys Haoyu mentioned.

Yang's patch indeed can fix the problem. Our rtc optimization together with
Yang's patch caused the hang at the boot stage.

Thanks,
Zhang Haoyu

Regards,
Wanpeng Li
Re: [PATCH] vhost: Add polling mode
>>> > Results:
>>> >
>>> > Netperf, 1 vm:
>>> > The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 MB/sec).
>>> > Number of exits/sec decreased 6x.
>>> > The same improvement was shown when I tested with 3 vms running netperf
>>> > (4086 MB/sec -> 5545 MB/sec).
>>> >
>>> > filebench, 1 vm:
>>> > ops/sec improved by 13% with the polling patch. Number of exits was
>>> > reduced by 31%.
>>> > The same experiment with 3 vms running filebench showed similar numbers.
>>> >
>>> > Signed-off-by: Razya Ladelsky
>>>
>>> Gave it a quick try on s390/kvm. As expected it makes no difference for a
>>> big streaming workload like iperf. uperf with a 1-1 round robin indeed got
>>> faster, by about 30%. The high CPU consumption is something that bothers
>>> me though, as virtualized systems tend to be full.
>>>
>> Thanks for confirming the results!
>> The best way to use this patch would be along with a shared vhost thread
>> for multiple devices/vms, as described in:
>> http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument
>> This work assumes having a dedicated I/O core where the vhost thread
>> serves multiple vms, which makes the high cpu utilization less of a
>> concern.
>>
> Hi, Razya, Shirley,
> I am going to test the combination of "several vhost threads (depending on
> the total number of cpus on the host, e.g., total_number * 1/3) serve all
> VMs" and "vhost: add polling mode". I got the patch
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/88682/focus=88723
> posted by Shirley; is there any update to this patch?
>
> And, I want to make a small change to this patch: create total_cpu_number *
> 1/N (N={3,4}) vhost threads, instead of a per-cpu vhost thread, to serve
> all VMs.

Just like the xen netback threads, whose number equals num_online_cpus on
Dom0; but for a kvm host, I think per-cpu vhost threads are too many.

> Any ideas?
>
> Thanks,
> Zhang Haoyu

>>> > +static int poll_start_rate = 0;
>>> > +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
>>> > +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of
>>> > virtqueue when rate of events is at least this number per jiffy. If 0,
>>> > never start polling.");
>>> > +
>>> > +static int poll_stop_idle = 3*HZ; /* 3 seconds */
>>> > +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
>>> > +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue
>>> > after this many jiffies of no work.");
>>>
>>> This seems ridiculously high. Even one jiffy is an eternity, so setting
>>> it to 1 as a default would reduce the CPU overhead for most cases. If we
>>> don't have a packet in one millisecond, we can surely go back to the kick
>>> approach, I think.
>>>
>>> Christian
>>>
>> Good point, will reduce it and recheck.
>> Thank you,
>> Razya