Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat
On Thu, 2010-03-11 at 09:50 +0200, Avi Kivity wrote:
> On 03/11/2010 09:46 AM, Sheng Yang wrote:
>> On Thursday 11 March 2010 15:36:01 Avi Kivity wrote:
>>> On 03/11/2010 09:20 AM, Sheng Yang wrote:
>>>> Currently we can only get the cpu_stat of the whole guest as one. This
>>>> patch enhances cpu_stat with more detail: it adds guest_system and
>>>> guest_user cpu time statistics with a little overhead.
>>>>
>>>> Signed-off-by: Sheng Yang <sh...@linux.intel.com>

It seems per-process guest cpu utilization is more useful than per-cpu.

>>>> ---
>>>> This draft patch is based on KVM upstream to show the idea. I would
>>>> split it into a more kernel-friendly version later.
>>>>
>>>> The overhead is the cost of get_cpl() after each exit from the guest.
>>>
>>> This can be very expensive in the nested virtualization case, so I
>>> wouldn't like this to be in normal paths. I think detailed profiling
>>> like that can be left to 'perf kvm', which only has overhead if enabled
>>> at runtime.
>>
>> Yes, that's my concern too (though nested vmcs/vmcb reads are already too
>> expensive; they should be optimized...).
>
> Any ideas on how to do that? Perhaps use paravirt_ops to convert the
> vmread into a memory read? We store the vmwrites in the vmcs anyway.

Another method is to add a sysctl entry, such as
/proc/sys/kernel/collect_guest_utilization, and set it off by default.
Or add /sys/kernel/debug/kvm/collect_guest_utilization.

>> The other concern is that a perf-like mechanism would bring a lot more
>> overhead compared to this.
>
> Ordinarily users won't care if time is spent in guest kernel mode or guest
> user mode. They want to see which guest is imposing a load on a system. I
> consider a user profiling a guest from the host an advanced and rarer use
> case, so it's okay to require tools and additional overhead for this.

Here is the story of why Sheng worked out the patch. Some people working on
KVM performance want us to extend top to show guest utilization info, such
as guest kernel and guest userspace cpu utilization. With the new tool, they
could find which VM (mapped to its qemu process id) consumes too much cpu
time in host space (including kernel and userspace), and compare that with
guest kernel/userspace time. That information could provide a first-hand,
high-level overview of all VMs running in the system and help the admin
quickly find the worst VM instance. So we need per-process (guest) cpu
utilization rather than per-cpu guest utilization.

>>> For example you can put the code to note the cpl in a tracepoint which
>>> is enabled dynamically.
>>
>> Yanmin has already implemented perf kvm to support this. We are just
>> arguing whether a normal top-like mechanism is necessary.

perf kvm is mostly used to find hot functions, which might cause more
overhead. Sheng's patch has less overhead.

>> I am also considering making it a feature that can be disabled. But it
>> seems that would complicate things and result in uncertain cpustat
>> output.
>
> I'm not even sure that guest time was a good idea.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
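For readers following the per-process argument: Linux already exports a per-process guest time counter, so a top-like tool can get the coarse "which qemu process is busy in guest mode" number without any new kernel code. The sketch below pulls guest_time (field 43 of /proc/<pid>/stat according to proc(5)) out of one stat line. The helper name parse_guest_time is made up for illustration, and this does not give the guest kernel/user split that the patch under discussion is about.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract guest_time (field 43 of /proc/<pid>/stat, in clock ticks) from one
 * stat line. Returns 0 on success, -1 if the line is malformed or too short.
 * The comm field (field 2) may contain spaces, so parsing restarts after the
 * last ')' in the line. parse_guest_time is a hypothetical helper name.
 */
int parse_guest_time(const char *line, unsigned long long *guest_ticks)
{
    const char *p = strrchr(line, ')');
    int field = 2;                  /* the ')' closes field 2 (comm) */

    if (!p)
        return -1;
    p++;

    while (*p) {
        while (*p == ' ')
            p++;
        if (!*p)
            break;
        field++;                    /* p now points at the start of this field */
        if (field == 43) {
            char *end;

            *guest_ticks = strtoull(p, &end, 10);
            return end == p ? -1 : 0;
        }
        while (*p && *p != ' ')     /* skip over the current field */
            p++;
    }
    return -1;
}
```

A tool would read this value twice for each qemu pid and divide the delta by the clock tick rate (sysconf(_SC_CLK_TCK)) to get guest-mode seconds for that VM.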
Re: Make QEmu HPET disabled by default for KVM?
On Thursday 11 March 2010 15:58:12 Avi Kivity wrote:
> On 03/11/2010 09:52 AM, Sheng Yang wrote:
>> I think we have already suffered enough timer issues due to this (e.g. I
>> can't boot up well on a 2.6.18 kernel)...
>
> 2.6.18 as guest or as host?

Guest.

>> I have kept --no-hpet in my setup for months...
>
> Any details about the problems? HPET is important to some guests.

It seems the HPET reaction is too slow to satisfy some guests (since it
would replace the PIT). Here is the thread from last time:

http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899

--
regards
Yang, Sheng
Re: Make QEmu HPET disabled by default for KVM?
On 03/11/2010 10:23 AM, Sheng Yang wrote:
>>> I have kept --no-hpet in my setup for months...
>>
>> Any details about the problems? HPET is important to some guests.
>
> It seems the HPET reaction is too slow to satisfy some guests (since it
> would replace the PIT). Here is the thread from last time:
>
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899

Thanks.

We can address this in three ways: first, adjust the guest not to do
timing-related tests when virtualized (since no matter what we do, the
tests may fail). Second, I think we should implement userspace ack
notifiers (similar to the tpr access notifiers already present). Third, we
can implement a kernel hpet, which, after we solve the zillion bugs it
introduces, will also give a nice performance improvement for
hpet-intensive workloads.

--
error compiling committee.c: too many arguments to function
Re: Make QEmu HPET disabled by default for KVM?
On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote:
> On 03/11/2010 10:23 AM, Sheng Yang wrote:
>>>> I have kept --no-hpet in my setup for months...
>>>
>>> Any details about the problems? HPET is important to some guests.
>>
>> It seems the HPET reaction is too slow to satisfy some guests (since it
>> would replace the PIT). Here is the thread from last time:
>>
>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899
>
> Thanks.
>
> We can address this in three ways: first, adjust the guest not to do
> timing-related tests when virtualized (since no matter what we do, the
> tests may fail). Second, I think we should implement userspace ack
> notifiers (similar to the tpr access notifiers already present). Third,
> we can implement a kernel hpet, which, after we solve the zillion bugs it
> introduces, will also give a nice performance improvement for
> hpet-intensive workloads.

The second will not solve the problem. The presence of ack notifiers will
not make the HPET interrupt arrive faster.

--
	Gleb.
Re: [PATCH 2/4] KVM: Rework VCPU state writeback API
On 03/02/2010 02:14 AM, Marcelo Tosatti wrote:
> On Mon, Mar 01, 2010 at 07:10:30PM +0100, Jan Kiszka wrote:
>> This grand cleanup drops all reset and vmsave/load related
>> synchronization points in favor of four(!) generic hooks:
>>
>> - cpu_synchronize_all_states in qemu_savevm_state_complete
>>   (initial sync from kernel before vmsave)
>> - cpu_synchronize_all_post_init in qemu_loadvm_state
>>   (writeback after vmload)
>> - cpu_synchronize_all_post_init in main after machine init
>> - cpu_synchronize_all_post_reset in qemu_system_reset
>>   (writeback after system reset)
>>
>> These writeback points + the existing one of VCPU exec after
>> cpu_synchronize_state map on three levels of writeback:
>>
>> - KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
>> - KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
>> - KVM_PUT_FULL_STATE    (on init or vmload, all VCPUs stopped as well)
>>
>> This level is passed to the arch-specific VCPU state writing function
>> that will decide which concrete substates need to be written. That way,
>> no writer of load, save or reset functions that interact with in-kernel
>> KVM states will ever have to worry about synchronization again. That
>> also means that a lot of reasons for races, segfaults and deadlocks are
>> eliminated.
>>
>> cpu_synchronize_state remains untouched, just as Anthony suggested. We
>> continue to need it before reading or writing of VCPU states that are
>> also tracked by in-kernel KVM subsystems.
>>
>> Consequently, this patch removes many cpu_synchronize_state calls that
>> are now redundant, just like remaining explicit register syncs.
>>
>> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
>
> Jan,
>
> This patch breaks system reset of WinXP.32 install (more easily
> reproducible without iothread enabled).

What's the conclusion here? The patch is innocent of the regression?

--
error compiling committee.c: too many arguments to function
Re: [patch 1/3] target-i386: print EFER in cpu_dump_state
On 03/09/2010 03:53 AM, Marcelo Tosatti wrote:
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: qemu-kvm-uq/target-i386/helper.c
> ===
> --- qemu-kvm-uq.orig/target-i386/helper.c
> +++ qemu-kvm-uq/target-i386/helper.c
> @@ -1176,6 +1176,7 @@ void cpu_dump_state(CPUState *env, FILE
>      cpu_x86_dump_seg_cache(env, f, cpu_fprintf, "TR", &env->tr);
>
>  #ifdef TARGET_X86_64
> +    cpu_fprintf(f, "EFER=%016" PRIx64 "\n", env->efer);
>      if (env->hflags & HF_LMA_MASK) {
>          cpu_fprintf(f, "GDT= %016" PRIx64 " %08x\n",
>                      env->gdt.base, env->gdt.limit);

Better to do this for i386 too, no?

--
error compiling committee.c: too many arguments to function
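Since the archive stripped the quotes around the format strings in the patch above, here is a small standalone sketch of how the PRIx64 macro from <inttypes.h> is spliced into a printf format by string-literal concatenation. The helper name format_efer and the use of snprintf are illustrative assumptions; QEMU's own code goes through cpu_fprintf.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 64-bit register value as 16 zero-padded hex
 * digits. PRIx64 expands to the correct conversion for uint64_t on the
 * current platform and is concatenated with the adjacent string literals.
 */
int format_efer(char *buf, size_t len, uint64_t efer)
{
    return snprintf(buf, len, "EFER=%016" PRIx64 "\n", efer);
}
```

Writing "%016llx" or "%016lx" directly would only be correct on platforms where uint64_t happens to be that exact type, which is why the patch uses the macro.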
Re: Make QEmu HPET disabled by default for KVM?
On Thursday 11 March 2010 16:31:57 Gleb Natapov wrote:
> On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote:
>> On 03/11/2010 10:23 AM, Sheng Yang wrote:
>>>>> I have kept --no-hpet in my setup for months...
>>>>
>>>> Any details about the problems? HPET is important to some guests.
>>>
>>> It seems the HPET reaction is too slow to satisfy some guests (since it
>>> would replace the PIT). Here is the thread from last time:
>>>
>>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899
>>
>> Thanks.
>>
>> We can address this in three ways: first, adjust the guest not to do
>> timing-related tests when virtualized (since no matter what we do, the
>> tests may fail). Second, I think we should implement userspace ack
>> notifiers (similar to the tpr access notifiers already present). Third,
>> we can implement a kernel hpet, which, after we solve the zillion bugs
>> it introduces, will also give a nice performance improvement for
>> hpet-intensive workloads.
>
> The second will not solve the problem. The presence of ack notifiers will
> not make the HPET interrupt arrive faster.

The slowness may also be due to lost ticks. And with lost ticks, hpet is
still unusable...

--
regards
Yang, Sheng
Re: [PATCH 2/2] KVM test: Support to SLES install
On Wed, 2010-03-10 at 10:42 -0300, Lucas Meneghel Rodrigues wrote:
> On Wed, Mar 10, 2010 at 8:45 AM, Lucas Meneghel Rodrigues
> l...@redhat.com wrote:
>> From: yogi anant...@linux.vnet.ibm.com
>>
>> Adds a new SUSE entry for SLES to the tests_base file, and contains an
>> autoinst file for doing an unattended SLES 11 64-bit install.
>
> Oh Yogi, by the way, could you please reorganize the opensuse section and
> add at least an autoyast file for opensuse 11.2 so I can actually test if
> we can get a successful installation? I tried to play with the XML file
> of SLES to see if I could get opensuse 11.2 installed, but it turns out
> that those config files are an endless XML nightmare and everything I
> tried makes yast die. The mechanics of the whole thing are correct, I can
> get yast to start with no problems, but parsing the autoyast file makes
> the VM hang.
>
> So I am fine with adding the patch, but it'd be nice to have an OS with
> unrestricted access that everybody could play with (opensuse). I don't
> have enough time to make it work on my own, so if you have some spare
> time, please work on this.

Sure Lucas, I will be happy to create an autoyast file for opensuse too.
I will work on that patch and send it as soon as possible.

>> Signed-off-by: Yogananth Subramanian anant...@linux.vnet.ibm.com
>> ---
>>  client/tests/kvm/tests_base.cfg.sample |   22 ++++++++++++++++++++++
>>  1 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
>> index c76470d..acb2076 100644
>> --- a/client/tests/kvm/tests_base.cfg.sample
>> +++ b/client/tests/kvm/tests_base.cfg.sample
>> @@ -503,6 +503,28 @@ variants:
>>              md5sum = 2afee1b8a87175e6dee2b8dbbd1ad8e8
>>              md5sum_1m = 768ca32503ef92c28f2d144f2a87e4d0
>>
>> +- SLES:
>> +    no setup
>> +    shell_prompt = ^r...@.*[\#\$]\s*$|#
>> +    unattended_install:
>> +        pxe_image = linux
>> +        pxe_initrd = initrd
>> +        extra_params += -bootp /pxelinux.0 -boot n
>> +        kernel_args = autoyast=floppy
>> +
>> +    variants:
>> +        - 11.64:
>> +            no setup
>> +            image_name = sles11-64
>> +            cdrom=linux/SLES-11-DVD-x86_64-GM-DVD1.iso
>> +            md5sum = 50a2bd45cd12c3808c3ee48208e2586b
>> +            md5sum_1m = 0951cab7c32e332362fc424c1054
>> +            unattended_install:
>> +                unattended_file = unattended/Sles11-64-autoinst.xml
>> +                tftp = images/sles11-64/tftpboot
>> +                floppy = images/sles11-64floppy.img
>> +                pxe_dir = boot/x86_64/loader
>> +
>>  - @Ubuntu:
>>      shell_prompt = ^r...@.*[\#\$]\s*$
>> --
>> 1.6.6.1
Re: [patch 2/3] kvm: handle internal error
On 03/09/2010 03:53 AM, Marcelo Tosatti wrote:
> Port qemu-kvm's KVM_EXIT_INTERNAL_ERROR handling to upstream.
>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: qemu-kvm/kvm-all.c
> ===
> --- qemu-kvm.orig/kvm-all.c
> +++ qemu-kvm/kvm-all.c
> @@ -721,6 +721,28 @@ static int kvm_handle_io(uint16_t port,
>      return 1;
>  }
>
> +#ifdef KVM_CAP_INTERNAL_ERROR_DATA
> +static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
> +{
> +
> +    if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
> +        int i;
> +
> +        fprintf(stderr, "KVM internal error. Suberror: %d\n",
> +                run->internal.suberror);
> +
> +        for (i = 0; i < run->internal.ndata; ++i) {
> +            fprintf(stderr, "extra data[%d]: %"PRIx64"\n",
> +                    i, (uint64_t)run->internal.data[i]);
> +        }
> +    }
> +    cpu_dump_state(env, stderr, fprintf, 0);
> +    if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION)
> +        fprintf(stderr, "emulation failure\n");

{ braces }

> +    vm_stop(0);
> +}
> +#endif

Should trigger a qmp message to let management know something went wrong
(can come later).

--
error compiling committee.c: too many arguments to function
Re: [patch 3/3] kvm: allow qemu to set EPT identity mapping address
On 03/09/2010 03:53 AM, Marcelo Tosatti wrote:
> From: Sheng Yang <sh...@linux.intel.com>
>
> If we use a larger BIOS image than the current 256KB, we would need to
> move the reserved TSS and EPT identity mapping pages. Currently TSS
> supports this, but not EPT.
>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: qemu-kvm/target-i386/kvm.c
> ===
> --- qemu-kvm.orig/target-i386/kvm.c
> +++ qemu-kvm/target-i386/kvm.c
> @@ -341,6 +341,24 @@ static int kvm_has_msr_star(CPUState *en
>      return 0;
>  }
>
> +static int kvm_init_identity_map_page(KVMState *s)
> +{
> +#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
> +    int ret;
> +    uint64_t addr = 0xfffbc000;
> +
> +    if (!kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR))
> +        return 0;

{ braces }

> +
> +    ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR, &addr);
> +    if (ret < 0) {
> +        fprintf(stderr, "kvm_set_identity_map_addr: %s\n", strerror(ret));
> +        return ret;
> +    }
> +#endif
> +    return 0;
> +}
> +
>  int kvm_arch_init(KVMState *s, int smp_cpus)
>  {
>      int ret;
> @@ -368,7 +386,11 @@ int kvm_arch_init(KVMState *s, int smp_c
>          perror("e820_add_entry() table is full");
>          exit(1);
>      }
> -    return kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
> +    ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
> +    if (ret < 0)
> +        return ret;

{ }

> +
> +    return kvm_init_identity_map_page(s);
>  }
>
>  static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)

--
error compiling committee.c: too many arguments to function
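To see where the two magic addresses in the patch come from, the arithmetic can be sketched as below. It assumes, as the KVM API documentation describes, that the TSS region is three pages and the identity map region is one page, and it further assumes (my reading of the constants, not something the patch states) that both are stacked directly below a BIOS image that ends at 4 GiB. All names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE_4K    0x1000ULL
#define FOUR_GIB        0x100000000ULL
#define TSS_PAGES       3   /* assumed size of the KVM_SET_TSS_ADDR region */
#define IDENTITY_PAGES  1   /* assumed size of the identity map region */

struct reserved_layout {
    uint64_t bios_base;      /* first byte of the BIOS image */
    uint64_t tss_base;       /* value one would pass to KVM_SET_TSS_ADDR */
    uint64_t identity_base;  /* value for KVM_SET_IDENTITY_MAP_ADDR */
};

/* Stack the reserved regions directly below a BIOS image ending at 4 GiB. */
struct reserved_layout layout_below_bios(uint64_t bios_size)
{
    struct reserved_layout l;

    l.bios_base = FOUR_GIB - bios_size;
    l.tss_base = l.bios_base - TSS_PAGES * PAGE_SIZE_4K;
    l.identity_base = l.tss_base - IDENTITY_PAGES * PAGE_SIZE_4K;
    return l;
}
```

With a 256KB BIOS this gives 0xfffbd000 and 0xfffbc000, the values hard-coded in the patch; a larger BIOS moves both addresses down, which is why the identity map address needs to be settable as well.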
Re: Make QEmu HPET disabled by default for KVM?
On Thu, Mar 11, 2010 at 04:38:48PM +0800, Sheng Yang wrote:
> On Thursday 11 March 2010 16:31:57 Gleb Natapov wrote:
>> On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote:
>>> On 03/11/2010 10:23 AM, Sheng Yang wrote:
>>>>>> I have kept --no-hpet in my setup for months...
>>>>>
>>>>> Any details about the problems? HPET is important to some guests.
>>>>
>>>> It seems the HPET reaction is too slow to satisfy some guests (since
>>>> it would replace the PIT). Here is the thread from last time:
>>>>
>>>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899
>>>
>>> Thanks.
>>>
>>> We can address this in three ways: first, adjust the guest not to do
>>> timing-related tests when virtualized (since no matter what we do, the
>>> tests may fail). Second, I think we should implement userspace ack
>>> notifiers (similar to the tpr access notifiers already present). Third,
>>> we can implement a kernel hpet, which, after we solve the zillion bugs
>>> it introduces, will also give a nice performance improvement for
>>> hpet-intensive workloads.
>>
>> The second will not solve the problem. The presence of ack notifiers
>> will not make the HPET interrupt arrive faster.
>
> The slowness may also be due to lost ticks. And with lost ticks, hpet is
> still unusable...

If the problem is due to lost ticks, reinjection may solve it, but only
partially. What if the IO thread hasn't run even once during the time the
vcpu did the clock source check? IIRC sometimes we trigger this even with
the in-kernel PIT.

--
	Gleb.
Re: Make QEmu HPET disabled by default for KVM?
On 03/11/2010 10:42 AM, Gleb Natapov wrote:
> On Thu, Mar 11, 2010 at 04:38:48PM +0800, Sheng Yang wrote:
>> On Thursday 11 March 2010 16:31:57 Gleb Natapov wrote:
>>> On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote:
>>>> On 03/11/2010 10:23 AM, Sheng Yang wrote:
>>>>>>> I have kept --no-hpet in my setup for months...
>>>>>>
>>>>>> Any details about the problems? HPET is important to some guests.
>>>>>
>>>>> It seems the HPET reaction is too slow to satisfy some guests (since
>>>>> it would replace the PIT). Here is the thread from last time:
>>>>>
>>>>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899
>>>>
>>>> Thanks.
>>>>
>>>> We can address this in three ways: first, adjust the guest not to do
>>>> timing-related tests when virtualized (since no matter what we do,
>>>> the tests may fail). Second, I think we should implement userspace
>>>> ack notifiers (similar to the tpr access notifiers already present).
>>>> Third, we can implement a kernel hpet, which, after we solve the
>>>> zillion bugs it introduces, will also give a nice performance
>>>> improvement for hpet-intensive workloads.
>>>
>>> The second will not solve the problem. The presence of ack notifiers
>>> will not make the HPET interrupt arrive faster.
>>
>> The slowness may also be due to lost ticks. And with lost ticks, hpet is
>> still unusable...
>
> If the problem is due to lost ticks, reinjection may solve it, but only
> partially. What if the IO thread hasn't run even once during the time the
> vcpu did the clock source check? IIRC sometimes we trigger this even with
> the in-kernel PIT.

That is true. Reinjection can correct problems in the long term, but may
fail in the short term. 10 ticks is easily short term in a heavily loaded
system.

How does it happen with kernel PIT? I could understand it if we had a work
item doing the injection, but everything happens either from hrtimer
context or vcpu context.

--
error compiling committee.c: too many arguments to function
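The "long term versus short term" point can be illustrated with a toy model. This is not QEMU or KVM code: the structure, the names, and the catch-up policy (deliver the current tick plus at most one backlog tick per period) are all assumptions made up for the illustration.

```c
/* Toy model of timer tick reinjection; every name here is hypothetical. */
struct tick_model {
    unsigned pending;    /* ticks that fired but were not yet delivered */
    unsigned delivered;  /* ticks the guest has actually seen */
};

/*
 * One timer period elapses. If the injecting thread did not get to run, the
 * tick is only queued. If it ran, it delivers the current tick plus at most
 * one queued tick, so the backlog drains gradually.
 */
void tick(struct tick_model *m, int injector_ran)
{
    unsigned n;

    m->pending++;
    if (!injector_ran)
        return;
    n = m->pending < 2 ? m->pending : 2;
    m->pending -= n;
    m->delivered += n;
}
```

If the injector is starved for 10 periods and then runs for the next 10, the guest has seen 20 ticks after 20 periods, so the long-term count is right. A guest that calibrates its clock source during the first 10 periods, however, sees zero ticks, which is the short-term failure described above.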
[PATCH] KVM: Move kvm_exit tracepoint rip reading inside tracepoint
Reading rip is expensive on vmx, so move it inside the tracepoint so we only
incur the cost if tracing is enabled.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/svm.c   |    2 +-
 arch/x86/kvm/trace.h |    6 +++---
 arch/x86/kvm/vmx.c   |    2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3a2f2b9..b646e96 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2685,7 +2685,7 @@ static int handle_exit(struct kvm_vcpu *vcpu)
 	struct kvm_run *kvm_run = vcpu->run;
 	u32 exit_code = svm->vmcb->control.exit_code;
 
-	trace_kvm_exit(exit_code, svm->vmcb->save.rip);
+	trace_kvm_exit(exit_code, vcpu);
 
 	if (unlikely(svm->nested.exit_required)) {
 		nested_svm_vmexit(svm);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b75efef..3cf9547 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -182,8 +182,8 @@ TRACE_EVENT(kvm_apic,
  * Tracepoint for kvm guest exit:
  */
 TRACE_EVENT(kvm_exit,
-	TP_PROTO(unsigned int exit_reason, unsigned long guest_rip),
-	TP_ARGS(exit_reason, guest_rip),
+	TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu),
+	TP_ARGS(exit_reason, vcpu),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	exit_reason	)
@@ -192,7 +192,7 @@ TRACE_EVENT(kvm_exit,
 
 	TP_fast_assign(
 		__entry->exit_reason	= exit_reason;
-		__entry->guest_rip	= guest_rip;
+		__entry->guest_rip	= kvm_rip_read(vcpu);
 	),
 
 	TP_printk("reason %s rip 0x%lx",
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae3217d..06108f3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3605,7 +3605,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 	u32 exit_reason = vmx->exit_reason;
 	u32 vectoring_info = vmx->idt_vectoring_info;
 
-	trace_kvm_exit(exit_reason, kvm_rip_read(vcpu));
+	trace_kvm_exit(exit_reason, vcpu);
 
 	/* If guest state is invalid, start emulating */
 	if (vmx->emulation_required && emulate_invalid_guest_state)
--
1.7.0.2
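The idea behind the patch, passing the object to the tracepoint and doing the costly read only on the enabled path, can be shown with a userspace sketch. Nothing below is kernel API: trace_enabled, read_rip and trace_exit are hypothetical stand-ins, and the counter exists only to make the saved work visible.

```c
#include <stdbool.h>

struct vcpu {
    unsigned long rip;
};

bool trace_enabled;               /* stand-in for the tracepoint's enable state */
unsigned expensive_reads;         /* how many times the costly read was done */
unsigned long last_traced_rip;    /* stand-in for the trace ring buffer */

/* Stand-in for a costly register read (on vmx this would be a VMREAD). */
unsigned long read_rip(struct vcpu *v)
{
    expensive_reads++;
    return v->rip;
}

/*
 * The caller passes the vcpu, not the rip value, so the expensive read is
 * only performed when tracing is actually enabled.
 */
void trace_exit(struct vcpu *v)
{
    if (!trace_enabled)
        return;
    last_traced_rip = read_rip(v);
}
```

Had the caller written trace_exit_value(read_rip(v)), the argument would be evaluated on every exit whether or not anyone is listening, which is the cost the patch removes from the vmx exit path.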
Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat
On Thursday 11 March 2010 15:50:54 Avi Kivity wrote:
> On 03/11/2010 09:46 AM, Sheng Yang wrote:
>> On Thursday 11 March 2010 15:36:01 Avi Kivity wrote:
>>> On 03/11/2010 09:20 AM, Sheng Yang wrote:
>>>> Currently we can only get the cpu_stat of the whole guest as one. This
>>>> patch enhances cpu_stat with more detail: it adds guest_system and
>>>> guest_user cpu time statistics with a little overhead.
>>>>
>>>> Signed-off-by: Sheng Yang <sh...@linux.intel.com>
>>>> ---
>>>> This draft patch is based on KVM upstream to show the idea. I would
>>>> split it into a more kernel-friendly version later.
>>>>
>>>> The overhead is the cost of get_cpl() after each exit from the guest.
>>>
>>> This can be very expensive in the nested virtualization case, so I
>>> wouldn't like this to be in normal paths. I think detailed profiling
>>> like that can be left to 'perf kvm', which only has overhead if enabled
>>> at runtime.
>>
>> Yes, that's my concern too (though nested vmcs/vmcb reads are already too
>> expensive; they should be optimized...).
>
> Any ideas on how to do that? Perhaps use paravirt_ops to convert the
> vmread into a memory read? We store the vmwrites in the vmcs anyway.

When Qing (CCed) was working on nested VMX in the past, he found that PV
vmread/vmwrite indeed works well (it would write to the virtual vmcs, so
vmwrite can also benefit). Though compared to old machines (one of our
internal patches shows an improvement of more than 5%), NHM gets less
benefit due to the reduced vmexit cost.

--
regards
Yang, Sheng

>> The other concern is that a perf-like mechanism would bring a lot more
>> overhead compared to this.
>
> Ordinarily users won't care if time is spent in guest kernel mode or guest
> user mode. They want to see which guest is imposing a load on a system. I
> consider a user profiling a guest from the host an advanced and rarer use
> case, so it's okay to require tools and additional overhead for this.
>
>>> For example you can put the code to note the cpl in a tracepoint which
>>> is enabled dynamically.
>>
>> Yanmin has already implemented perf kvm to support this. We are just
>> arguing whether a normal top-like mechanism is necessary.
>>
>> I am also considering making it a feature that can be disabled. But it
>> seems that would complicate things and result in uncertain cpustat
>> output.
>
> I'm not even sure that guest time was a good idea.
Re: Ideas wiki for GSoC 2010
On 03/11/2010 08:55 AM, Avi Kivity wrote:
> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>> 2. Do we have kvm-specific projects? Can they be part of the QEMU
>> project or do we need a different mentoring organization for it?
>
> Something really interesting is kvm-assisted tcg. I'm afraid it's a bit
> too complicated for GSoC.

I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page at
some point, so it's good to have ideas written down.

Also, the selection of projects will be done by members of the community,
by grading the students' submissions. The bar would be placed higher for
someone who picks a complicated project.

Paolo
guest kernel debugging through serial port
Hi,

I have followed the Windows guest debugging procedure from

http://www.linux-kvm.org/page/WindowsGuestDrivers/GuestDebugging

It works when I start two guests and bind a TCP port to the guest serial
port, but it is really slow. And if I use -serial /dev/ttyS1 for the guest
debugging target, I can't talk to it from my dev machine, which is
connected to ttyS1 of the target machine (host). Is this a known problem?

Thanks,
Neo

--
I would remember that if researchers were not ambitious probably today we
haven't the technology we are using!
Status of KVM vulnerabilities
Hi, all.

Recently Debian published DSA-2010-1 [1], in which the following
vulnerabilities are fixed:

* CVE-2010-0298 CVE-2010-0306 (Gleb Natapov)
* CVE-2010-0309 (Marcelo Tosatti)
* CVE-2010-0419 (Paolo Bonzini)

I'm using Linux 2.6.32.3 with qemu-kvm-0.12.1.2, and I would like to know
whether it is necessary to update kvm-kmod or qemu-kvm: are any of these
versions affected by these vulnerabilities, and does a newer version that
fixes them already exist?

Thanks in advance for your replies.

Regards,
Daniel

[1] http://seclists.org/bugtraq/2010/Mar/98

--
Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
Gleb Natapov wrote:
> On Wed, Mar 10, 2010 at 07:08:31PM +0900, Takuya Yoshikawa wrote:
>> Gleb Natapov wrote:
>>>>> Entering guest from time to time will not change semantics of the
>>>>> processor (if code is not modified under processor's feet at least).
>>>>> Currently we reenter guest mode after each iteration of string
>>>>> instruction for all instruction but ins/outs.
>>>>
>>>> E.g., is there no chance that during the repetitions, in the middle of
>>>> the repetitions, page faults occur? If it can, without entering the
>>>> guest, can we handle it?
>>>> -- I lack some basic assumptions?
>>>
>>> If page fault occurs we inject it to the guest.
>>
>> Oh, I might have failed to explain what I worried about. The opposite,
>> I mean: I worried about the case of NOT reentering the guest.
>
> Are you thinking about something specific here? If we inject exceptions

Yes.

> when they occur and we inject interrupt when they arrive what problem do
> you see? I guess this is how real CPU actually works. I doubt it re-reads
> string instruction on each iteration.

No problem if we detect and inject page faults like that.

I just wasn't so certain whether, when we encounter a page fault in the
middle of the repetitions (the rep-specific case), we can inject it,
suspend the repetition and enter the guest immediately, as SDM Vol. 2B
says:

  A repeating string operation can be suspended by an exception or
  interrupt. When this happens, the state of the registers is preserved
  to allow the string operation to be resumed upon a return from the
  exception or interrupt handler.
  ...
  This mechanism allows long string operations to proceed without
  affecting the interrupt response time of the system.

Ah, I may have misunderstood: I thought that if we reenter the guest every
time for rep, page fault detection (not injection) could be done easily in
other ways, by the EXIT reason or something. Both ways may need the same
thing, sorry.

Another concern I wrote about was just the dependency between your "time
to time" criteria and the SDM's "without affecting the interrupt response
time". This is just the problem of how we can determine the criteria
appropriately.

>> I know that current implementation with reentrance is OK.
>
> Current implementation does not reenter guest on each iteration for pio
> string, so currently we have both variants.

I'm sorry, I was confused, as if the current implementation already
included some of your patches.

>> To inject a page fault without reentering the guest, we need to add some
>> more hacks to the emulator IIUC.
>
> No, we just need to enter guest if exception happens. I see that this is
> handled incorrectly in my current patch series.

I was just not certain if the following condition (from SDM Vol. 2B) is
satisfied

  The source and destination registers point to the next string elements
  to be operated on, the EIP register points to the string instruction,
  and the ECX register has the value it held following the last
  successful iteration of the instruction.

in the emulator's fault handling.

I should have read your patch more closely.

Thanks,
  Takuya
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
On Thu, Mar 11, 2010 at 06:58:14PM +0900, Takuya Yoshikawa wrote:
> Gleb Natapov wrote:
>> On Wed, Mar 10, 2010 at 07:08:31PM +0900, Takuya Yoshikawa wrote:
>>> Gleb Natapov wrote:
>>>> Entering guest from time to time will not change semantics of the processor (if code is not modified under processor's feet at least). Currently we reenter guest mode after each iteration of string instruction for all instructions but ins/outs.
>>> E.g., is there no chance that page faults occur in the middle of the repetitions? If they can, can we handle them without entering the guest? -- Am I missing some basic assumptions?
>> If page fault occurs we inject it to the guest.
> Oh, I might have failed to explain what I was worried about. I mean the opposite: I was worried about the case of NOT reentering the guest.
>> Are you thinking about something specific here? If we inject exceptions
> Yes.
>> when they occur and we inject interrupts when they arrive, what problem do you see? I guess this is how a real CPU actually works. I doubt it re-reads the string instruction on each iteration.
> No problem if we detect and inject page faults like that.

Yes, that part is missing from my patch.

> I just wasn't certain whether, when we encounter a page fault in the middle of the repetitions (the rep-specific case), we can inject it, suspend the repetition and enter the guest immediately, as SDM Vol. 2B says:
>
>   "A repeating string operation can be suspended by an exception or interrupt. When this happens, the state of the registers is preserved to allow the string operation to be resumed upon a return from the exception or interrupt handler. ... This mechanism allows long string operations to proceed without affecting the interrupt response time of the system."
>
> Ah, I may have misunderstood: I thought that if we reenter the guest every time for rep, page fault detection (not injection) could easily be done in other ways, by EXIT reason or something. Both ways may need the same thing, sorry.

When an instruction is emulated, page fault detection is done by the emulator itself. During guest entry the exception is injected. So all we need to do in the emulator is to enter the guest immediately when an exception condition is detected.

> Another concern I wrote about was just the dependency between your "time to time" criteria and the SDM's "without affecting the interrupt response time". This is just the problem of how we can determine the criteria appropriately.

We can reenter the guest immediately if there is a pending interrupt (we can't do that with ins read ahead, but this optimization is non-architectural anyway).

>>> I know that the current implementation with reentrance is OK.
>> Current implementation does not reenter guest on each iteration for pio string, so currently we have both variants.
> I'm sorry, I was confused, as if the current implementation already included some of your patches.

It's independent from my patches. This is how string pio always worked. Otherwise certain workloads are too slow.

>>> To inject a page fault without reentering the guest, we need to add some more hacks to the emulator IIUC.
>> No, we just need to enter guest if exception happens. I see that this is handled incorrectly in my current patch series.
> I was just not certain if the following condition (from SDM Vol. 2B) is satisfied
>
>   "The source and destination registers point to the next string elements to be operated on, the EIP register points to the string instruction, and the ECX register has the value it held following the last successful iteration of the instruction."

It is satisfied. Writeback is done on each iteration.

> in the emulator's fault handling. I should have read your patch more closely.
>
> Thanks,
> Takuya

--
Gleb.
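The stop conditions discussed in this thread can be sketched as a small standalone loop. This is only an illustration under stated assumptions, not the KVM emulator: `struct regs`, `emulate_rep_movsb`, the flat `mem` buffer, the `fault_addr` parameter and the iteration `budget` are all hypothetical names invented here. It shows registers being written back after every iteration, so that whichever condition stops the loop (fault, pending interrupt, or the "from time to time" budget), the counter still holds the value it had after the last successful iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified guest register state (not KVM's emulator context). */
struct regs {
    uint64_t rcx;   /* remaining iterations */
    uint64_t rsi;   /* source offset into mem */
    uint64_t rdi;   /* destination offset into mem */
};

enum stop_reason {
    STOP_DONE,       /* rcx reached zero */
    STOP_FAULT,      /* an access faulted: inject the exception, enter the guest */
    STOP_INTERRUPT,  /* an interrupt is pending: enter the guest to deliver it */
    STOP_BUDGET      /* budget used up: enter the guest "from time to time" */
};

/*
 * Emulate "rep movsb" over a flat buffer. fault_addr is a stand-in for a
 * page that is not mapped: touching it stops the loop before the copy.
 */
static enum stop_reason emulate_rep_movsb(struct regs *r, uint8_t *mem,
                                          size_t mem_size, uint64_t fault_addr,
                                          const bool *irq_pending,
                                          unsigned budget)
{
    while (r->rcx != 0) {
        if (*irq_pending)
            return STOP_INTERRUPT;
        if (budget == 0)
            return STOP_BUDGET;
        budget--;
        if (r->rsi == fault_addr || r->rdi == fault_addr)
            return STOP_FAULT;

        assert(r->rsi < mem_size && r->rdi < mem_size);
        mem[r->rdi] = mem[r->rsi];

        /* Writeback on each iteration keeps the state resumable. */
        r->rsi++;
        r->rdi++;
        r->rcx--;
    }
    return STOP_DONE;
}
```

A caller would re-invoke the function after the guest has handled the fault or interrupt; because the instruction pointer is assumed to still point at the string instruction, the copy resumes where it stopped.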
[PATCH] KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
These are guest-triggerable. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c | 27 --- 1 files changed, 0 insertions(+), 27 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 169b1b3..66609f6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -404,8 +404,6 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) #ifdef CONFIG_X86_64 if (cr0 0xUL) { - printk(KERN_DEBUG set_cr0: 0x%lx #GP, reserved bits 0x%lx\n, - cr0, kvm_read_cr0(vcpu)); kvm_inject_gp(vcpu, 0); return; } @@ -414,14 +412,11 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) cr0 = ~CR0_RESERVED_BITS; if ((cr0 X86_CR0_NW) !(cr0 X86_CR0_CD)) { - printk(KERN_DEBUG set_cr0: #GP, CD == 0 NW == 1\n); kvm_inject_gp(vcpu, 0); return; } if ((cr0 X86_CR0_PG) !(cr0 X86_CR0_PE)) { - printk(KERN_DEBUG set_cr0: #GP, set PG flag - and a clear PE flag\n); kvm_inject_gp(vcpu, 0); return; } @@ -432,15 +427,11 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) int cs_db, cs_l; if (!is_pae(vcpu)) { - printk(KERN_DEBUG set_cr0: #GP, start paging - in long mode while PAE is disabled\n); kvm_inject_gp(vcpu, 0); return; } kvm_x86_ops-get_cs_db_l_bits(vcpu, cs_db, cs_l); if (cs_l) { - printk(KERN_DEBUG set_cr0: #GP, start paging - in long mode while CS.L == 1\n); kvm_inject_gp(vcpu, 0); return; @@ -448,8 +439,6 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) } else #endif if (is_pae(vcpu) !load_pdptrs(vcpu, vcpu-arch.cr3)) { - printk(KERN_DEBUG set_cr0: #GP, pdptrs - reserved bits\n); kvm_inject_gp(vcpu, 0); return; } @@ -475,28 +464,23 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE; if (cr4 CR4_RESERVED_BITS) { - printk(KERN_DEBUG set_cr4: #GP, reserved bits\n); kvm_inject_gp(vcpu, 0); return; } if (is_long_mode(vcpu)) { if (!(cr4 X86_CR4_PAE)) { - printk(KERN_DEBUG set_cr4: #GP, clearing PAE while - in long mode\n); kvm_inject_gp(vcpu, 0); return; } } else 
if (is_paging(vcpu) (cr4 X86_CR4_PAE) ((cr4 ^ old_cr4) pdptr_bits) !load_pdptrs(vcpu, vcpu-arch.cr3)) { - printk(KERN_DEBUG set_cr4: #GP, pdptrs reserved bits\n); kvm_inject_gp(vcpu, 0); return; } if (cr4 X86_CR4_VMXE) { - printk(KERN_DEBUG set_cr4: #GP, setting VMXE\n); kvm_inject_gp(vcpu, 0); return; } @@ -517,21 +501,16 @@ void kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) if (is_long_mode(vcpu)) { if (cr3 CR3_L_MODE_RESERVED_BITS) { - printk(KERN_DEBUG set_cr3: #GP, reserved bits\n); kvm_inject_gp(vcpu, 0); return; } } else { if (is_pae(vcpu)) { if (cr3 CR3_PAE_RESERVED_BITS) { - printk(KERN_DEBUG - set_cr3: #GP, reserved bits\n); kvm_inject_gp(vcpu, 0); return; } if (is_paging(vcpu) !load_pdptrs(vcpu, cr3)) { - printk(KERN_DEBUG set_cr3: #GP, pdptrs - reserved bits\n); kvm_inject_gp(vcpu, 0); return; } @@ -563,7 +542,6 @@ EXPORT_SYMBOL_GPL(kvm_set_cr3); void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8) { if (cr8 CR8_RESERVED_BITS) { - printk(KERN_DEBUG set_cr8: #GP, reserved bits 0x%lx\n, cr8); kvm_inject_gp(vcpu, 0); return; } @@ -619,15 +597,12 @@ static u32 emulated_msrs[] = { static void set_efer(struct kvm_vcpu *vcpu, u64 efer) { if (efer efer_reserved_bits)
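For readers following the checks this patch touches, here is a minimal standalone sketch of three of the CR0 conditions that lead to a #GP in the code above: bits set above bit 31, NW set without CD, and PG set without PE. The bit positions are the architectural ones; the macro names and the function `cr0_write_ok` are made up for this sketch and are not the kernel's, and the long-mode and PDPTR checks are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions; names are local to this sketch. */
#define CR0_PE (1ULL << 0)   /* protection enable */
#define CR0_NW (1ULL << 29)  /* not write-through */
#define CR0_CD (1ULL << 30)  /* cache disable */
#define CR0_PG (1ULL << 31)  /* paging */

/* Returns false where the code above would inject #GP for these cases. */
static bool cr0_write_ok(uint64_t cr0)
{
    if (cr0 >> 32)                             /* upper 32 bits must be zero */
        return false;
    if ((cr0 & CR0_NW) && !(cr0 & CR0_CD))     /* NW == 1 requires CD == 1 */
        return false;
    if ((cr0 & CR0_PG) && !(cr0 & CR0_PE))     /* paging requires protected mode */
        return false;
    return true;
}
```

Since a guest can trigger each of these conditions at will, logging them on the host gives the guest a way to flood the host log, which is the motivation stated in the patch description.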
Re: Make QEmu HPET disabled by default for KVM?
On Thu, Mar 11, 2010 at 10:46:06AM +0200, Avi Kivity wrote: On 03/11/2010 10:42 AM, Gleb Natapov wrote: On Thu, Mar 11, 2010 at 04:38:48PM +0800, Sheng Yang wrote: On Thursday 11 March 2010 16:31:57 Gleb Natapov wrote: On Thu, Mar 11, 2010 at 10:28:12AM +0200, Avi Kivity wrote: On 03/11/2010 10:23 AM, Sheng Yang wrote: I have kept --no-hpet in my setup for months... Any details about the problems? HPET is important to some guests. Seems like HPET reaction is too slow to satisfy some guests(for it would replace PIT). Here is the thread last time. http://thread.gmane.org/gmane.comp.emulators.kvm.devel/44899 Thanks. We can address this in three ways: first, adjust the guest not to do timing related tests when virtualized (since no matter what we do, the tests may fail). Second, I think we should implement userspace ack notifiers (similar to tpr access notifiers already present). Third, we can implement a kernel hpet, which, after we solve the zillion bug it introduces, will also give a nice performance improvement for hpet intensive workloads. Second will not solve the problem. Presence of ack notifiers will not make HPET interrupt arrive faster. The slow may also due to lost tick. And with the lost tick, hpet is still unusable... If the problem it due to lost ticks reinjection may solve it, but only partially. What if IO thread haven't run even once during the time vcpu did clock source check? IIRC sometimes we trigger this even with in kernel PIT. That is true. Reinjection can correct problems in the long term, but may fail in the short term. 10 ticks is easily short term in a heavily loaded system. How does it happen with kernel PIT? I could understand it if we had a work item doing the injection, but everything happens either from hrtimer context or vcpu context. Do we kick vcpu out of guest mode when hrtimer triggers? I don't see us doing it in __kvm_timer_fn(). -- Gleb. 
Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
On Thu, Mar 11, 2010 at 08:47:21AM +0200, Avi Kivity wrote:
> tdp is still used in both cases, so that name is confusing. We could call it mmu.direct_map (and set it for real mode?) or mmu.virtual_map (with the opposite sense). Or something.

I like the mmu.direct_map name. It's a good term too; I will change it in the patch.

Joerg
Re: [PATCH] KVM: Move kvm_exit tracepoint rip reading inside tracepoint
Avi Kivity wrote:
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index b75efef..3cf9547 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -182,8 +182,8 @@ TRACE_EVENT(kvm_apic,
>   * Tracepoint for kvm guest exit:
>   */
>  TRACE_EVENT(kvm_exit,
> -	TP_PROTO(unsigned int exit_reason, unsigned long guest_rip),
> -	TP_ARGS(exit_reason, guest_rip),
> +	TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu),

Whitespaces were inserted by accident?

> +	TP_ARGS(exit_reason, vcpu),
>
>  	TP_STRUCT__entry(
>  		__field(	unsigned int,	exit_reason	)
> @@ -192,7 +192,7 @@ TRACE_EVENT(kvm_exit,
>
>  	TP_fast_assign(
>  		__entry->exit_reason	= exit_reason;
> -		__entry->guest_rip	= guest_rip;
> +		__entry->guest_rip	= kvm_rip_read(vcpu);
>  	),
>
>  	TP_printk("reason %s rip 0x%lx",
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ae3217d..06108f3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3605,7 +3605,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>  	u32 exit_reason = vmx->exit_reason;
>  	u32 vectoring_info = vmx->idt_vectoring_info;
>
> -	trace_kvm_exit(exit_reason, kvm_rip_read(vcpu));
> +	trace_kvm_exit(exit_reason, vcpu);
>
>  	/* If guest state is invalid, start emulating */
>  	if (vmx->emulation_required && emulate_invalid_guest_state)
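The effect of moving the rip read into the tracepoint can be illustrated with a small standalone model. The patch itself does not spell out its motivation, so the reading here is an assumption: an argument expression at the call site is evaluated on every exit, while code inside the tracepoint body only runs when tracing is enabled. All names below (`struct vcpu`, `struct trace_state`, `rip_read`, `trace_exit_eager`, `trace_exit_lazy`) are hypothetical and do not model the real TRACE_EVENT machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vcpu: 'reads' counts how often the (possibly costly) read ran. */
struct vcpu {
    unsigned long rip;
    unsigned reads;
};

/* Hypothetical trace sink holding the last recorded event. */
struct trace_state {
    bool enabled;
    unsigned last_reason;
    unsigned long last_rip;
};

static unsigned long rip_read(struct vcpu *v)
{
    v->reads++;
    return v->rip;
}

/* Old shape: the caller reads rip before the enabled check can skip it. */
static void trace_exit_eager(struct trace_state *t, unsigned reason,
                             unsigned long rip)
{
    if (!t->enabled)
        return;
    t->last_reason = reason;
    t->last_rip = rip;
}

/* New shape: rip is read only when the event is actually recorded. */
static void trace_exit_lazy(struct trace_state *t, unsigned reason,
                            struct vcpu *v)
{
    assert(v != NULL);
    if (!t->enabled)
        return;
    t->last_reason = reason;
    t->last_rip = rip_read(v);
}
```

With tracing disabled, `trace_exit_eager(&t, reason, rip_read(&v))` still performs the read, whereas `trace_exit_lazy(&t, reason, &v)` does not.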
[PATCH] KVM: Trace exception injection
Often an exception can help point out where things start to go wrong.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/trace.h |   32 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c   |    3 +++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index d10b359..32c912c 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -219,6 +219,38 @@ TRACE_EVENT(kvm_inj_virq,
 	TP_printk("irq %u", __entry->irq)
 );
 
+#define EXS(x) { x##_VECTOR, "#" #x }
+
+#define kvm_trace_sym_exc						\
+	EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM),	\
+	EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF),		\
+	EXS(MF), EXS(MC)
+
+/*
+ * Tracepoint for kvm interrupt injection:
+ */
+TRACE_EVENT(kvm_inj_exception,
+	TP_PROTO(unsigned exception, bool has_error, unsigned error_code),
+	TP_ARGS(exception, has_error, error_code),
+
+	TP_STRUCT__entry(
+		__field(	u8,	exception	)
+		__field(	u8,	has_error	)
+		__field(	u32,	error_code	)
+	),
+
+	TP_fast_assign(
+		__entry->exception	= exception;
+		__entry->has_error	= has_error;
+		__entry->error_code	= error_code;
+	),
+
+	TP_printk("%s (0x%x)",
+		  __print_symbolic(__entry->exception, kvm_trace_sym_exc),
+		  /* FIXME: don't print error_code if not present */
+		  __entry->has_error ? __entry->error_code : 0)
+);
+
 /*
  * Tracepoint for page fault.
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 66609f6..bcf52d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4231,6 +4231,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
 {
 	/* try to reinject previous events if any */
 	if (vcpu->arch.exception.pending) {
+		trace_kvm_inj_exception(vcpu->arch.exception.nr,
+					vcpu->arch.exception.has_error_code,
+					vcpu->arch.exception.error_code);
 		kvm_x86_ops->queue_exception(vcpu, vcpu->arch.exception.nr,
 					  vcpu->arch.exception.has_error_code,
 					  vcpu->arch.exception.error_code);
-- 
1.7.0.2
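The symbolic lookup that the tracepoint relies on can be sketched in plain C. This is a userspace approximation, not the kernel's __print_symbolic: the `EXS` macro and the vector list mirror the patch, while `struct sym`, `exc_name` and `format_exception` are names assumed for this sketch. As a variation, the formatter leaves out the error code when none is present, which is one possible way to address the FIXME in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Architectural x86 exception vector numbers. */
enum {
    DE_VECTOR = 0,  DB_VECTOR = 1,  BP_VECTOR = 3,  OF_VECTOR = 4,
    BR_VECTOR = 5,  UD_VECTOR = 6,  NM_VECTOR = 7,  DF_VECTOR = 8,
    TS_VECTOR = 10, NP_VECTOR = 11, SS_VECTOR = 12, GP_VECTOR = 13,
    PF_VECTOR = 14, MF_VECTOR = 16, MC_VECTOR = 18
};

struct sym {
    unsigned value;
    const char *name;
};

/* EXS(GP) expands to { GP_VECTOR, "#GP" } via token pasting and stringizing. */
#define EXS(x) { x##_VECTOR, "#" #x }

static const struct sym exc_names[] = {
    EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM),
    EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF),
    EXS(MF), EXS(MC)
};

/* Returns the symbolic name for a vector, or NULL if it is not in the table. */
static const char *exc_name(unsigned vector)
{
    for (size_t i = 0; i < sizeof exc_names / sizeof exc_names[0]; i++)
        if (exc_names[i].value == vector)
            return exc_names[i].name;
    return NULL;
}

/* Formats "#GP (0x0)" with an error code, or just "#UD" without one. */
static int format_exception(char *buf, size_t len, unsigned vector,
                            bool has_error, unsigned error_code)
{
    const char *name = exc_name(vector);
    char fallback[16];

    assert(buf != NULL && len > 0);
    if (name == NULL) {
        /* Unknown vectors are shown as a hex number. */
        snprintf(fallback, sizeof fallback, "0x%x", vector);
        name = fallback;
    }
    if (has_error)
        return snprintf(buf, len, "%s (0x%x)", name, error_code);
    return snprintf(buf, len, "%s", name);
}
```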
Re: [PATCH] KVM: Move kvm_exit tracepoint rip reading inside tracepoint
On 03/11/2010 01:03 PM, Takuya Yoshikawa wrote:
> Avi Kivity wrote:
>> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
>> index b75efef..3cf9547 100644
>> --- a/arch/x86/kvm/trace.h
>> +++ b/arch/x86/kvm/trace.h
>> @@ -182,8 +182,8 @@ TRACE_EVENT(kvm_apic,
>>   * Tracepoint for kvm guest exit:
>>   */
>>  TRACE_EVENT(kvm_exit,
>> -	TP_PROTO(unsigned int exit_reason, unsigned long guest_rip),
>> -	TP_ARGS(exit_reason, guest_rip),
>> +	TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu),
>
> Whitespaces were inserted by accident?

Yeah, already fixed locally.

-- 
error compiling committee.c: too many arguments to function
Re: guest patched with pax causes set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003? flood on host
On 03/11/2010 04:31 PM, pagee...@freemail.hu wrote:
> On 11 Mar 2010 at 8:44, Avi Kivity wrote:
>> On 03/10/2010 06:17 PM, Antoine Martin wrote:
>>> Hi,
>>>
>>> I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the base system), rebuilt kvm.
>>> ... and now I get hundreds of those in dmesg on the host when I start a guest kernel that worked fine before. (2.6.33 + pax patch v5)
>>>
>>> set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033
>>> set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b
>>> set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033
>>> set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b
>> The guest is clearly confused. Can you bisect kvm to find out what introduced this problem?

OK, will try to find the time.

> the guest is calling pax_{open,close}_kernel that flip cr0.wp off/on, respectively. Antoine, can you decode some of those rip values please (or better, send me the corresponding vmlinux and all logs)

I've dumped everything here (.config, vmlinuz and log):
http://users.nagafix.co.uk/~antoine/KVM/

Antoine
Re: [PATCH] KVM: Trace exception injection
On Thu, Mar 11, 2010 at 01:03:12PM +0200, Avi Kivity wrote:
> Often an exception can help point out where things start to go wrong.

Adding guest rip where exception happened will be useful too.

> Signed-off-by: Avi Kivity <a...@redhat.com>
> ---
>  arch/x86/kvm/trace.h |   32 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c   |    3 +++
>  2 files changed, 35 insertions(+), 0 deletions(-)
> [...]

--
Gleb.
Re: guest patched with pax causes set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003? flood on host
On 11 Mar 2010 at 8:44, Avi Kivity wrote:
> On 03/10/2010 06:17 PM, Antoine Martin wrote:
>> Hi,
>>
>> I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the base system), rebuilt kvm.
>> ... and now I get hundreds of those in dmesg on the host when I start a guest kernel that worked fine before. (2.6.33 + pax patch v5)
>>
>> set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033
>> set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b
>> set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033
>> set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b
> The guest is clearly confused. Can you bisect kvm to find out what introduced this problem?

the guest is calling pax_{open,close}_kernel that flip cr0.wp off/on, respectively. Antoine, can you decode some of those rip values please (or better, send me the corresponding vmlinux and all logs)?
Re: Ideas wiki for GSoC 2010
On 11.03.2010, at 10:43, Paolo Bonzini wrote:
> On 03/11/2010 08:55 AM, Avi Kivity wrote:
>> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
>>> 2. Do we have kvm-specific projects? Can they be part of the QEMU project or do we need a different mentoring organization for it?
>> Something really interesting is kvm-assisted tcg. I'm afraid it's a bit too complicated for GSoC.
> I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page in some time, so it's good to have ideas written down. Also, the selection of projects will be done by members of the community, by grading the student's submissions. The bar would be placed higher for someone who picks a complicated project.

The list is also still missing a lot of potential mentors for the listed ideas. Let me propose some here :)

== Shared memory transport between guest(s) and host ==
Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it comes to shm.

== Pass through file systems (9p, CIFS) ==
I dislike CIFS now that we use it regularly. It just doesn't work for Linux to Linux communication. But as far as 9P is concerned, do you need help there, Anthony? If so, would you take over the mentoring?

== Add more sophisticated encodings to VNC server ==
I could probably help out being a secondary mentor here, but Anthony would be a good fit as primary, no? I guess Kraxel could help out too.

== Write a C QMP library based on QEMU JSON and QMP code ==
Suggested by Anthony, mentored by Anthony? :) Possible other candidates are Luiz and Kraxel I guess? I haven't really tracked QMP that much.

== Add support for guest copy/paste ==
This should probably be folded into the above VNC server improvements. By itself it's just too little of a task.

== Device state visualization ==
Jan, Kraxel? Maybe too small for a task?

== Upstreaming some of the Android emulator bits ==
Jan, Anthony?

If you read the suggestion and just think to yourself "well yes, I think I could do it", then put your name in the wiki :).

Alex
Re: [PATCH] KVM: Trace exception injection
On 03/11/2010 01:09 PM, Gleb Natapov wrote:
> On Thu, Mar 11, 2010 at 01:03:12PM +0200, Avi Kivity wrote:
>> Often an exception can help point out where things start to go wrong.
> Adding guest rip where exception happened will be useful too.

You get that from the previous kvm_exit trace.

-- 
error compiling committee.c: too many arguments to function
Re: Ideas wiki for GSoC 2010
On 03/11/2010 01:25 PM, Alexander Graf wrote:
> The list is also still missing a lot of potential mentors for the listed ideas. Let me propose some here :)
>
> == Shared memory transport between guest(s) and host ==
> Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it comes to shm.

Not sure what this is.

-- 
error compiling committee.c: too many arguments to function
Re: Make QEmu HPET disabled by default for KVM?
On 03/11/2010 12:23 PM, Gleb Natapov wrote:
>>> If the problem is due to lost ticks, reinjection may solve it, but only partially. What if the IO thread hasn't run even once during the time the vcpu did the clock source check? IIRC sometimes we trigger this even with the in-kernel PIT.
>> That is true. Reinjection can correct problems in the long term, but may fail in the short term. 10 ticks is easily short term in a heavily loaded system.
>>
>> How does it happen with kernel PIT? I could understand it if we had a work item doing the injection, but everything happens either from hrtimer context or vcpu context.
> Do we kick vcpu out of guest mode when hrtimer triggers? I don't see us doing it in __kvm_timer_fn().

We're always running on the same cpu as vcpu 0, so no need.

-- 
error compiling committee.c: too many arguments to function
Re: Ideas wiki for GSoC 2010
On 11.03.2010, at 12:54, Avi Kivity wrote:
> On 03/11/2010 01:25 PM, Alexander Graf wrote:
>> The list is also still missing a lot of potential mentors for the listed ideas. Let me propose some here :)
>>
>> == Shared memory transport between guest(s) and host ==
>> Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it comes to shm.
> Not sure what this is.

Cam's shared memory device.

Alex
Re: Make QEmu HPET disabled by default for KVM?
On 03/11/2010 01:56 PM, Avi Kivity wrote:
> On 03/11/2010 12:23 PM, Gleb Natapov wrote:
>>>> If the problem is due to lost ticks, reinjection may solve it, but only partially. What if the IO thread hasn't run even once during the time the vcpu did the clock source check? IIRC sometimes we trigger this even with the in-kernel PIT.
>>> That is true. Reinjection can correct problems in the long term, but may fail in the short term. 10 ticks is easily short term in a heavily loaded system.
>>>
>>> How does it happen with kernel PIT? I could understand it if we had a work item doing the injection, but everything happens either from hrtimer context or vcpu context.
>> Do we kick vcpu out of guest mode when hrtimer triggers? I don't see us doing it in __kvm_timer_fn().
> We're always running on the same cpu as vcpu 0, so no need.

Would be better to do it, though, in case we have migration races.

-- 
error compiling committee.c: too many arguments to function
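The "correct in the long term, may fail in the short term" behaviour of tick reinjection mentioned in this thread can be modelled with a simple pending counter. This is a toy model under stated assumptions, not KVM's PIT or HPET code: `struct timer_state`, `timer_fire`, `try_inject` and `guest_ack` are invented for illustration. Expired ticks are counted rather than dropped, and one tick is delivered per guest acknowledgement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-timer bookkeeping for tick reinjection. */
struct timer_state {
    unsigned pending;     /* ticks that expired but were not yet delivered */
    bool in_service;      /* a delivered tick has not been acked yet */
    unsigned delivered;   /* total ticks the guest has seen */
};

/* Timer expiry (think hrtimer callback): count the tick, never drop it. */
static void timer_fire(struct timer_state *t)
{
    t->pending++;
}

/* On guest entry: deliver at most one tick, and only after the last ack. */
static bool try_inject(struct timer_state *t)
{
    if (t->pending == 0 || t->in_service)
        return false;
    t->pending--;
    t->in_service = true;
    t->delivered++;
    return true;
}

/* Guest acknowledged the interrupt; the next pending tick may be injected. */
static void guest_ack(struct timer_state *t)
{
    assert(t->in_service);
    t->in_service = false;
}
```

If three ticks expire while the vcpu is not running, the guest eventually sees all three, but a guest that measures a short window in between still observes too few ticks, which is the short-term failure described above.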
Re: Ideas wiki for GSoC 2010
On 03/11/2010 01:56 PM, Alexander Graf wrote:
> On 11.03.2010, at 12:54, Avi Kivity wrote:
>> On 03/11/2010 01:25 PM, Alexander Graf wrote:
>>> The list is also still missing a lot of potential mentors for the listed ideas. Let me propose some here :)
>>>
>>> == Shared memory transport between guest(s) and host ==
>>> Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it comes to shm.
>> Not sure what this is.
> Cam's shared memory device.

That's plain shared memory among guests (though the host could also participate). "transport" evokes something like virtio rings. I could mentor it, though I prefer something in kvm, and it looks close to completion.

-- 
error compiling committee.c: too many arguments to function
Re: Ideas wiki for GSoC 2010
On 11.03.2010, at 12:58, Avi Kivity wrote:
> On 03/11/2010 01:56 PM, Alexander Graf wrote:
>> On 11.03.2010, at 12:54, Avi Kivity wrote:
>>> On 03/11/2010 01:25 PM, Alexander Graf wrote:
>>>> The list is also still missing a lot of potential mentors for the listed ideas. Let me propose some here :)
>>>>
>>>> == Shared memory transport between guest(s) and host ==
>>>> Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it comes to shm.
>>> Not sure what this is.
>> Cam's shared memory device.
> That's plain shared memory among guests (though the host could also participate). "transport" evokes something like virtio rings. I could mentor it, though I prefer something in kvm, and it looks close to completion.

I agree. Take it off the list then :-).

Another idea I'd have would be upstream integration (and cleanup) of the ARM KVM port:
https://wiki.ncl.cs.columbia.edu/wiki/index.php/AndroidVirt:MainPage

Alex
Re: Ideas wiki for GSoC 2010
On 03/11/2010 12:25 PM, Alexander Graf wrote:
> == Write a C QMP library based on QEMU JSON and QMP code ==
> Suggested by Anthony, mentored by Anthony? :) Possible other candidates are Luiz and Kraxel I guess? I haven't really tracked QMP that much.

If you guys are okay with this, I think I could mentor since I followed the design of QMP quite closely (and this is the only one that I think I could do a decent job with).

BTW, it worked out much better for me in the past when the student and mentor were in a similar time zone.

Paolo
Re: guest patched with pax causes set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003? flood on host
On 11 Mar 2010 at 8:44, Avi Kivity wrote:
> On 03/10/2010 06:17 PM, Antoine Martin wrote:
>> Hi,
>>
>> I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the base system), rebuilt kvm.
>> ... and now I get hundreds of those in dmesg on the host when I start a guest kernel that worked fine before. (2.6.33 + pax patch v5)
>>
>> set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033
>> set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b
>> set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033
>> set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b
> The guest is clearly confused. Can you bisect kvm to find out what introduced this problem?

i screwed up the paravirt register clobbers, don't worry about it.
Re: Ideas wiki for GSoC 2010
On 03/11/2010 02:03 PM, Alexander Graf wrote:
> Another idea I'd have would be upstream integration (and cleanup) of the ARM KVM port:
> https://wiki.ncl.cs.columbia.edu/wiki/index.php/AndroidVirt:MainPage

Huh, didn't even know this thing existed. Definitely something to merge.

-- 
error compiling committee.c: too many arguments to function
Windows Driver for -vga std
Hi all,

with the default VGA setting (-vga std), Windows XP detects an unknown VGA device. Everything otherwise works fine and the display settings are OK. But how can I set up my XP guest so that it detects this virtual graphics board correctly? I just want to keep using this setting, but without complaints in the system/hardware settings.

Best regards,
Erik
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/10/2010 07:41 PM, Paul Brook wrote: You're much better off using a bulk-data transfer API that relaxes coherency requirements. IOW, shared memory doesn't make sense for TCG Rather, tcg doesn't make sense for shared memory smp. But we knew that already. In think TCG SMP is a hard, but soluble problem, especially when you're running guests used to coping with NUMA. Do you mean by using a per-cpu tlb? These kind of solutions are generally slow, but tcg's slowness may mask this out. Yes. TCG interacting with third parties via shared memory is probably never going to make sense. The third party in this case is qemu. Maybe. But it's a different instance of qemu, and once this feature exists I bet people will use it for other things. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ideas wiki for GSoC 2010
On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
> 2. Do we have kvm-specific projects? Can they be part of the QEMU project or do we need a different mentoring organization for it?

Complete big real mode emulation. I'll add this.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: Trace exception injection
On Thu, Mar 11, 2010 at 01:51:30PM +0200, Avi Kivity wrote:
> On 03/11/2010 01:09 PM, Gleb Natapov wrote:
>> On Thu, Mar 11, 2010 at 01:03:12PM +0200, Avi Kivity wrote:
>>> Often an exception can help point out where things start to go wrong.
>> Adding guest rip where exception happened will be useful too.
> You get that from the previous kvm_exit trace.

Not in a case of emulation ;)

--
Gleb.
Re: [PATCH] KVM: Trace exception injection
On 03/11/2010 02:31 PM, Gleb Natapov wrote:
> On Thu, Mar 11, 2010 at 01:51:30PM +0200, Avi Kivity wrote:
>> On 03/11/2010 01:09 PM, Gleb Natapov wrote:
>>> On Thu, Mar 11, 2010 at 01:03:12PM +0200, Avi Kivity wrote:
>>>> Often an exception can help point out where things start to go wrong.
>>> Adding guest rip where exception happened will be useful too.
>> You get that from the previous kvm_exit trace.
> Not in a case of emulation ;)

Then we need an emulator trace. I have it in a branch somewhere, will reactivate it after your stuff goes in.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH] Inter-VM shared memory PCI device
On Thursday 11 March 2010, Avi Kivity wrote:
>> A totally different option that avoids this whole problem would be to separate the signalling from the shared memory, making the PCI shared memory device a trivial device with a single memory BAR, and using a higher-level concept like a virtio based serial line for the actual signalling.
> That would be much slower. The current scheme allows for an ioeventfd/irqfd short circuit which allows one guest to interrupt another without involving their qemus at all.

Yes, the serial line approach would be much slower, but my point was that we can do signaling over something else, which could well be something building on irqfd.

Arnd
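The ioeventfd and irqfd mechanisms mentioned here are built on the Linux eventfd primitive. The sketch below shows only that underlying primitive from userspace, on the assumption of a Linux host; it does not use the KVM ioctls, and the helper names `notify` and `drain` are invented for this example. One side writes to the eventfd to signal, the other reads the accumulated count.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal the peer once: adds 1 to the eventfd counter. Returns 0 on success. */
static int notify(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/*
 * Consume all pending signals. Without EFD_SEMAPHORE, read() returns the
 * accumulated counter and resets it to zero. Returns -1 on error.
 */
static int64_t drain(int efd)
{
    uint64_t count;

    assert(efd >= 0);
    if (read(efd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    return (int64_t)count;
}
```

A file descriptor created with `eventfd(0, 0)` can be handed to both sides; two calls to `notify()` followed by one `drain()` yield a count of 2, so signals coalesce instead of queueing as separate messages.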
Re: Ideas wiki for GSoC 2010
On Thu, 11 Mar 2010 10:43:09 +0100 Paolo Bonzini pbonz...@redhat.com wrote: On 03/11/2010 08:55 AM, Avi Kivity wrote: On 03/10/2010 11:30 PM, Luiz Capitulino wrote: 2. Do we have kvm-specific projects? Can they be part of the QEMU project or do we need a different mentoring organization for it? Something really interesting is kvm-assisted tcg. I'm afraid it's a bit too complicated to GSoC. I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page in some time, so it's good to have ideas written down. Also, the selection of projects will be done by members of the community, by grading the student's submissions. The bar would be placed higher for someone who picks a complicated project. Exactly, we also have a 'skill level' tag, setting it to high should help and note that we can have from grad students to phd ones. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ideas wiki for GSoC 2010
On 11.03.2010, at 13:59, Luiz Capitulino wrote: On Thu, 11 Mar 2010 10:43:09 +0100 Paolo Bonzini pbonz...@redhat.com wrote: On 03/11/2010 08:55 AM, Avi Kivity wrote: On 03/10/2010 11:30 PM, Luiz Capitulino wrote: 2. Do we have kvm-specific projects? Can they be part of the QEMU project or do we need a different mentoring organization for it? Something really interesting is kvm-assisted tcg. I'm afraid it's a bit too complicated to GSoC. I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page in some time, so it's good to have ideas written down. Also, the selection of projects will be done by members of the community, by grading the student's submissions. The bar would be placed higher for someone who picks a complicated project. Exactly, we also have a 'skill level' tag, setting it to high should help and note that we can have from grad students to phd ones. I don't think we should put in a correlation between skill level and degree. I myself only have a Bachelor's degree :-). Alex-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Inter-VM shared memory PCI device
On 03/11/2010 02:57 PM, Arnd Bergmann wrote: On Thursday 11 March 2010, Avi Kivity wrote: A totally different option that avoids this whole problem would be to separate the signalling from the shared memory, making the PCI shared memory device a trivial device with a single memory BAR, and using something a higher-level concept like a virtio based serial line for the actual signalling. That would be much slower. The current scheme allows for an ioeventfd/irqfd short circuit which allows one guest to interrupt another without involving their qemus at all. Yes, the serial line approach would be much slower, but my point was that we can do signaling over something else, which could well be something building on irqfd. Well, we could, but it seems to make things more complicated? A card with shared memory, and another card with an interrupt interconnect? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Ideas wiki for GSoC 2010
On Wed, 2010-03-10 at 18:30 -0300, Luiz Capitulino wrote: Hi there, Our wiki page for the Summer of Code 2010 is doing quite well: http://wiki.qemu.org/Google_Summer_of_Code_2010 Just to let you guys know that I'm going to give a talk at the local university (Unicamp) about kvm autotest, and will spread the word about the qemu and kvm summer of code applications, will incentivate the students to apply for qemu and kvm. The university was the 2nd overall place on number of student proposals accepted on gsoc for the last couple of years, with an excellent completion rate, so I believe we could have some good work coming out of it. Now the most important is: 1. Get mentors assigned to projects. Just put your name and email in the right field. It's ok and even desirable to have two mentors per project, but please remember that mentoring is serious work, more info here: http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors http://gsoc-wiki.osuosl.org/index.php/Main_Page 2. Do we have kvm-specific projects? Can they be part of the QEMU project or do we need a different mentoring organization for it? 3. Fill in the missing information for the suggested project (description, skill level, languages, etc) I will complete our application tomorrow or on Friday. PS: I'm CC'ing everyone who suggested projects there, except one or two I couldn't find the email address. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ideas wiki for GSoC 2010
On Thu, 11 Mar 2010 12:25:24 +0100 Alexander Graf ag...@suse.de wrote: == Write a C QMP library based on QEMU JSON and QMP code == Suggested by Anthony, mentored by Anthony? :) Possible other candidates are Luiz and Kraxel I guess? I haven't really tracked QMP that much. I didn't candidate as a mentor myself because Anthony has a better idea wrt to the public API. But I certainly can help with the implementation. I have more two or three QMP projects to suggest, btw. == Add support for guest copy/paste == This should probably be folded into the above VNC server improvements. By itself it's just too little of a task. == Device state visualization == Jan, Kraxel? Maybe too small for a task? I think that whether a task is small or not also depends on the student, of course that we should not come up with a project that can be easily done in two weeks. On the other hand, 'not that difficult' tasks can be an excellent project for those really new to open source and serious development. You know, when you're a starter you spend quite a lot of time reading code and trying things out (and there's nothing wrong with that). So, for this kind of project the mentor only should take extra care to choose a student that is really going to learn a lot in the project. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ideas wiki for GSoC 2010
On Thu, 11 Mar 2010 13:09:37 +0100 Paolo Bonzini pbonz...@redhat.com wrote: On 03/11/2010 12:25 PM, Alexander Graf wrote: == Write a C QMP library based on QEMU JSON and QMP code == Suggested by Anthony, mentored by Anthony?:) Possible other candidates are Luiz and Kraxel I guess? I haven't really tracked QMP that much. If you guys are okay with this, I think I could mentor since I followed the design of QMP quite closely (and this is the only one that I think I could do a decent job with). Sure. BTW, it worked out much better for me in the past when the student and mentor were in a similar time zone. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ideas wiki for GSoC 2010
On Thu, 11 Mar 2010 14:00:46 +0100 Alexander Graf ag...@suse.de wrote: On 11.03.2010, at 13:59, Luiz Capitulino wrote: On Thu, 11 Mar 2010 10:43:09 +0100 Paolo Bonzini pbonz...@redhat.com wrote: On 03/11/2010 08:55 AM, Avi Kivity wrote: On 03/10/2010 11:30 PM, Luiz Capitulino wrote: 2. Do we have kvm-specific projects? Can they be part of the QEMU project or do we need a different mentoring organization for it? Something really interesting is kvm-assisted tcg. I'm afraid it's a bit too complicated to GSoC. I suppose the GSoC ideas wiki page will migrate to a QEMU ideas page in some time, so it's good to have ideas written down. Also, the selection of projects will be done by members of the community, by grading the student's submissions. The bar would be placed higher for someone who picks a complicated project. Exactly, we also have a 'skill level' tag, setting it to high should help and note that we can have from grad students to phd ones. I don't think we should put in a correlation between skill level and degree. I myself only have a Bachelor's degree :-). Absolutely, my bad :) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: how to tweak kernel to get the best out of kvm?
Hi Avi,

I forgot to include some important syslog lines from the host system. See attachment.

On 03/10/10 14:15, Avi Kivity wrote:

You have tons of iowait time, indicating an I/O bottleneck.

Is this disk IO or network IO? The rsync session puts a high load on both, but I do not see how a high load on disk or block IO could make the virtual hosts unresponsive, as shown by the host's syslog.

What filesystem are you using for the host? Are you using qcow2 or raw access? What's the qemu command line?

It is ext3 and qcow2. Currently I am testing with reiserfs on the host system. The system performance seems to be worse compared with ext3. Here is the kvm command line (as generated by libvirt):

/usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 1024 -smp 1 -name test0.0 \
    -uuid 74e71149-4baf-3af0-9c99-f4e50273296f \
    -monitor unix:/var/lib/libvirt/qemu/test0.0.monitor,server,nowait \
    -boot c -drive if=ide,media=cdrom,bus=1,unit=0 \
    -drive file=/export/storage/test0.0.img,if=virtio,boot=on \
    -net nic,macaddr=00:16:36:94:7e:f3,vlan=0,model=virtio,name=net0 \
    -net tap,fd=60,vlan=0,name=hostnet0 -serial pty -parallel none \
    -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -balloon virtio

How many virtual machines would you assume I could run on a host with 64 GByte RAM, 2 quad cores, a bonding NIC with 4*1Gbit/sec and a hardware RAID? Each vhost is supposed to get 4 GByte RAM and 1 CPU.

15 guests should fit comfortably, more with ksm running if the workloads are similar, or if you use ballooning.

15 vhosts would be nice. ksm is in the kernel, but not in my qemu-kvm (yet).

Here the problem is likely the host filesystem and/or I/O scheduler. The optimal layout is placing guest disks in LVM volumes, and accessing them with -drive file=...,cache=none. However, file-based access should also work.

I will try LVM tomorrow, when the test with reiserfs is completed.

Many thanks
Harri

[Attachment: syslog.gz, application/gzip]
Re: [PATCH] Inter-VM shared memory PCI device
On Thursday 11 March 2010, Avi Kivity wrote: That would be much slower. The current scheme allows for an ioeventfd/irqfd short circuit which allows one guest to interrupt another without involving their qemus at all. Yes, the serial line approach would be much slower, but my point was that we can do signaling over something else, which could well be something building on irqfd. Well, we could, but it seems to make things more complicated? A card with shared memory, and another card with an interrupt interconnect? Yes, I agree that it's more complicated if you have a specific application in mind that needs one of each, and most use cases that want shared memory also need an interrupt mechanism, but it's not always the case: - You could use ext2 with -o xip on a private mapping of a shared host file in order to share the page cache. This does not need any interrupts. - If you have more than two parties sharing the segment, there are different ways to communicate, e.g. always send an interrupt to all others, or have dedicated point-to-point connections. There is also some complexity in trying to cover all possible cases in one driver. I have to say that I also really like the idea of futex over shared memory, which could potentially make this all a lot simpler. I don't know how this would best be implemented on the host though. Arnd -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On Thu, 11 Mar 2010, Nick Piggin wrote:
On Thu, Mar 11, 2010 at 03:10:47AM +, Jamie Lokier wrote:
Paul Brook wrote:

In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load.

Btw, x86 doesn't have any implicit barriers due to ordinary loads. Only stores and atomics have implicit barriers, afaik.

As of March 2009[1] Intel guarantees that memory reads occur in order (they may only be reordered relative to writes). It appears AMD do not provide this guarantee, which could be an interesting problem for heterogeneous migration..

(Summary: At least on AMD64, it does too, for normal accesses to naturally aligned addresses in write-back cacheable memory.)

Oh, that's interesting. Way back when I guess we knew writes were in order and it wasn't explicit that reads were, hence smp_rmb() using a locked atomic.

Here is a post by Nick Piggin from 2007 with links to Intel _and_ AMD documents asserting that reads to cacheable memory are in program order:

http://lkml.org/lkml/2007/9/28/212
Subject: [patch] x86: improved memory barrier implementation

Links to documents:

http://developer.intel.com/products/processor/manuals/318147.pdf
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

The Intel link doesn't work any more, but the AMD one does.

It might have been merged into their development manual now.

It was (http://www.intel.com/products/processor/manuals/):

Intel® 64 Architecture Memory Ordering White Paper
This document has been merged into Volume 3A of Intel 64 and IA-32 Architectures Software Developer's Manual.

[..snip..]

-- mailto:av1...@comtv.ru
Re: Shadow page table questions
It doesn't, and there are often multiple shadow pages per guest page, distinguished by their sp->role field.

Oh, great! Does this mean that there is already a mechanism for synchronizing all shadow pages shadowing the same guest page when such a guest page changes?

Marek
Re: [Qemu-devel] Re: Ideas wiki for GSoC 2010
On Thu, Mar 11, 2010 at 5:03 AM, Alexander Graf ag...@suse.de wrote: On 11.03.2010, at 12:58, Avi Kivity wrote: On 03/11/2010 01:56 PM, Alexander Graf wrote: On 11.03.2010, at 12:54, Avi Kivity wrote: On 03/11/2010 01:25 PM, Alexander Graf wrote: The list is also still missing a lot of potential mentors for the listed ideas. Let me propose some here :) == Shared memory transport between guest(s) and host == Sounds like Avi would be a good fit. I'm pretty unknowledgeable when it comes to shm. Not sure what this is. Cam's shared memory device. That's plain shared memory among guests (though the host could also participate). transport evokes something like virtio rings. I could mentor it, though I prefer something in kvm, and it looks close to completion. I agree. Take it off the list then :-). Fair enough. I'd be willing to take up one of the other suggestions. Cam -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make QEmu HPET disabled by default for KVM?
On Thu, Mar 11, 2010 at 09:58:12AM +0200, Avi Kivity wrote: On 03/11/2010 09:52 AM, Sheng Yang wrote: I think we have already suffered enough timer issues due to this(e.g. I can't boot up well on 2.6.18 kernel)... 2.6.18 as guest or as host? I have kept --no-hpet in my setup for months... Any details about the problems? HPET is important to some guests. As Gleb mentioned in the other thread, reinjection will introduce another set of problems. Ideally all this timer related problems should be fixed by correlating timer interrupts and time source reads. Since one already has to use special timer parameters (-rtc-td-hack, -no-kvm-pit-reinjection), using -no-hpet for problematic Linux guests seems fine? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/3] target-i386: print EFER in cpu_dump_state
On Thu, Mar 11, 2010 at 10:35:21AM +0200, Avi Kivity wrote:
On 03/09/2010 03:53 AM, Marcelo Tosatti wrote:

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu-kvm-uq/target-i386/helper.c
===
--- qemu-kvm-uq.orig/target-i386/helper.c
+++ qemu-kvm-uq/target-i386/helper.c
@@ -1176,6 +1176,7 @@ void cpu_dump_state(CPUState *env, FILE
     cpu_x86_dump_seg_cache(env, f, cpu_fprintf, "TR", &env->tr);

 #ifdef TARGET_X86_64
+    cpu_fprintf(f, "EFER=%016" PRIx64 "\n", env->efer);
     if (env->hflags & HF_LMA_MASK) {
         cpu_fprintf(f, "GDT= %016" PRIx64 " %08x\n",
                     env->gdt.base, env->gdt.limit);

Better to do this for i386 too, no?

On systems that support IA-32e mode, the extended feature enable register (IA32_EFER) is available. This model-specific register controls activation of IA-32e mode and other IA-32e mode operations.

Can it be useful for i386 too?
Re: [PATCH 2/4] KVM: Rework VCPU state writeback API
On Thu, Mar 11, 2010 at 10:32:50AM +0200, Avi Kivity wrote: On 03/02/2010 02:14 AM, Marcelo Tosatti wrote: On Mon, Mar 01, 2010 at 07:10:30PM +0100, Jan Kiszka wrote: This grand cleanup drops all reset and vmsave/load related synchronization points in favor of four(!) generic hooks: - cpu_synchronize_all_states in qemu_savevm_state_complete (initial sync from kernel before vmsave) - cpu_synchronize_all_post_init in qemu_loadvm_state (writeback after vmload) - cpu_synchronize_all_post_init in main after machine init - cpu_synchronize_all_post_reset in qemu_system_reset (writeback after system reset) These writeback points + the existing one of VCPU exec after cpu_synchronize_state map on three levels of writeback: - KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run) - KVM_PUT_RESET_STATE (on synchronous system reset, all VCPUs stopped) - KVM_PUT_FULL_STATE(on init or vmload, all VCPUs stopped as well) This level is passed to the arch-specific VCPU state writing function that will decide which concrete substates need to be written. That way, no writer of load, save or reset functions that interact with in-kernel KVM states will ever have to worry about synchronization again. That also means that a lot of reasons for races, segfaults and deadlocks are eliminated. cpu_synchronize_state remains untouched, just as Anthony suggested. We continue to need it before reading or writing of VCPU states that are also tracked by in-kernel KVM subsystems. Consequently, this patch removes many cpu_synchronize_state calls that are now redundant, just like remaining explicit register syncs. Signed-off-by: Jan Kiszkajan.kis...@siemens.com Jan, This patch breaks system reset of WinXP.32 install (more easily reproducible without iothread enabled). What's the conclusion here? The patch is innocent of the regression? Yes, it is. The problem was caused by a recent seabios change, now fixed. 
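The three writeback levels described in the patch can be pictured with a small sketch. The level names come from the patch description; the numeric values, the struct demo_writeback fields, and the assumption that each level is a superset of the one below it are illustrative only and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Level names as given in the patch description; values are assumed. */
enum demo_put_level {
    KVM_PUT_RUNTIME_STATE = 1,  /* during runtime, other VCPUs keep running */
    KVM_PUT_RESET_STATE   = 2,  /* synchronous system reset, VCPUs stopped */
    KVM_PUT_FULL_STATE    = 3,  /* machine init or vmload, VCPUs stopped */
};

/* Hypothetical grouping of substates an arch-specific writer might have. */
struct demo_writeback {
    bool runtime_state;  /* state that may change while the guest runs */
    bool reset_state;    /* state that only changes on reset */
    bool full_state;     /* state that only changes on init or vmload */
};

/* Decide which substates to write for a given level. This sketch assumes
 * the levels are cumulative: a higher level writes everything below it. */
static struct demo_writeback demo_put_state(enum demo_put_level level)
{
    struct demo_writeback w = { false, false, false };

    assert(level >= KVM_PUT_RUNTIME_STATE && level <= KVM_PUT_FULL_STATE);

    w.runtime_state = true;
    w.reset_state = level >= KVM_PUT_RESET_STATE;
    w.full_state = level >= KVM_PUT_FULL_STATE;
    return w;
}
```

The point of passing a level rather than calling individual sync functions is that callers of load, save or reset code only name the situation they are in, and the arch code decides what that implies.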
[ kvm-Bugs-2968899 ] guest lockup setting clock when smp 1
Bugs item #2968899, was opened at 2010-03-11 14:31 Message generated for change (Tracker Item Submitted) made by high33 You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2968899group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: hugohiggins (high33) Assigned to: Nobody/Anonymous (nobody) Summary: guest lockup setting clock when smp 1 Initial Comment: When booting iso image ubuntu-9.10-server-amd64.iso using qemu-kvm-0.12.3 the guest will always lock up when installer tries to set clock via ntp when using -smp 2. Bug is repeatable every time during install. Workaround seems to be booting without -smp parameter. command line: /usr/local/qemu-kvm-0.12.3/bin/qemu-system-x86_64 -name test -M pc -m 2048 -boot d -vga std -sdl -net nic,macaddr=BA:DD:C0:FF:EE:F6 -net vde -drive file=/dev/sdp,if=scsi,boot=on -cdrom iso/ubuntu-9.10-server-amd64.iso -k en-us -usbdevice tablet -serial file:serial.log -smp 2 -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2968899group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2968899 ] guest lockup setting clock when smp 1
Bugs item #2968899, was opened at 2010-03-11 14:31 Message generated for change (Comment added) made by high33 You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2968899group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: hugohiggins (high33) Assigned to: Nobody/Anonymous (nobody) Summary: guest lockup setting clock when smp 1 Initial Comment: When booting iso image ubuntu-9.10-server-amd64.iso using qemu-kvm-0.12.3 the guest will always lock up when installer tries to set clock via ntp when using -smp 2. Bug is repeatable every time during install. Workaround seems to be booting without -smp parameter. command line: /usr/local/qemu-kvm-0.12.3/bin/qemu-system-x86_64 -name test -M pc -m 2048 -boot d -vga std -sdl -net nic,macaddr=BA:DD:C0:FF:EE:F6 -net vde -drive file=/dev/sdp,if=scsi,boot=on -cdrom iso/ubuntu-9.10-server-amd64.iso -k en-us -usbdevice tablet -serial file:serial.log -smp 2 -- Comment By: hugohiggins (high33) Date: 2010-03-11 14:33 Message: This is on a kvm hypervisor host running xubuntu 9.04 dual processor 6-core Opteron with 32Gig of ram and kernel 2.6.28-16-generic #55-Ubuntu SMP Tue Oct 20 19:48:32 UTC 2009 x86_64 GNU/Linux -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2968899group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
On Thu, Mar 04, 2010 at 04:58:20PM +0100, Joerg Roedel wrote:
On Thu, Mar 04, 2010 at 11:42:55AM -0300, Marcelo Tosatti wrote:
On Wed, Mar 03, 2010 at 08:12:03PM +0100, Joerg Roedel wrote:

Hi,

here are the patches that implement nested paging support for nested svm. They are somewhat intrusive to the soft-mmu, so I post them as RFC in the first round to get feedback about the general direction of the changes. Nevertheless I am proud to report that with these patches the famous kernel-compile benchmark runs only 4% slower in the l2 guest than in the l1 guest when l2 is single-processor. With SMP guests the situation is very different. The more vcpus the guest has, the larger the performance drop from l1 to l2. Anyway, this post is to get feedback about the overall concept of these patches. Please review and give feedback :-)

Joerg,

What perf gain does this bring? (I'm not aware of the current overhead.)

The benchmark was an allnoconfig kernel compile in tmpfs, which took with the same guest image:

as l1-guest with npt: 2m23s
as l2-guest with l1(nested)->l2(shadow): around 8-9 minutes
as l2-guest with l1(nested)->l2(shadow) without the recent msrpm optimization: around 19 minutes
as l2-guest with l1(nested)->l2(nested) [this patchset]: 2m25s-2m30s

Overall comments: Can't you translate l2_gpa -> l1_gpa walking the current l1 nested pagetable, and pass that to the kvm tdp fault path (with the correct context setup)?

If I understand your suggestion correctly, I think that's exactly what's done in the patches. Some words about the design:

For nested-nested we need to shadow the l1-nested-ptable on the host. This is done using the vcpu->arch.mmu context, which holds the l1 paging modes while the l2 is running. On a npt-fault from the l2 we just instrument the shadow-ptable code. This is the common case, because it happens all the time while the l2 is running.
OK, makes sense now, I was missing the fact that the l1-nested-ptable needs to be shadowed and l1 translations to it must be write protected. You should disable out of sync shadow so that l1 guest writes to l1-nested-ptables always trap. And in the trap case, you'd have to invalidate l2 shadow pagetable entries that used the (now obsolete) l1-nested-ptable entry. Does that happen automatically?

The other thing is that vcpu->arch.mmu.gva_to_gpa is expected to still work and translate virtual addresses of the l2 into physical addresses of the l1 (so they can be accessed with kvm functions). To do this we need to be aware of the L2 paging mode. It is stored in the vcpu->arch.nested_mmu context. This context is only used for gva_to_gpa translations. It is not used to build shadow page tables or anything else. That's the reason only the parts necessary for gva_to_gpa translations of the nested_mmu context are initialized.

Since we cannot use mmu.gva_to_gpa to translate only between l2_gpa and l1_gpa, because this function is required to translate l2_gva to l1_gpa by other parts of kvm, the function which does this translation is moved to nested_mmu.gva_to_gpa. So basically the gva_to_gpa function pointers are swapped between mmu and nested_mmu.

The nested_mmu.gva_to_gpa function is used in translate_gpa_nested, which is assigned to the newly introduced translate_gpa callback of the nested_mmu context. This callback is used in the walk_addr function to translate every l2_gpa address we read from cr3 or the guest ptes into an l1_gpa, in order to read the next step from the guest memory. In the old unnested case the translate_gpa callback would point to a function which just returns the gpa passed to it unmodified.

The walk_addr function is generalized and now there are basically two versions of it:

* walk_addr, which translates using the vcpu->arch.mmu context
* walk_addr_nested, which translates using the vcpu->arch.nested_mmu context

That's pretty much how these patches work.
You probably need to include a flag in base_role to differentiate between l1 / l2 shadow tables (say if they use the same cr3 value).

Not sure if this is necessary. It may be necessary when large pages come into play. Otherwise the host npt pages are distinguished from the shadow npt pages by the direct-flag.

Joerg
KVM: x86: ignore access permissions for hypercall patching
Ignore access permissions while patching hypercall instructions. Otherwise KVM injects a page fault when trying to patch vmcall on read-only text regions:

Freeing initrd memory: 8843k freed
Freeing unused kernel memory: 660k freed
Write protecting the kernel text: 4780k
Write protecting the kernel read-only data: 1912k
BUG: unable to handle kernel paging request at c01292e3
IP: [c01292e3] kvm_leave_lazy_mmu+0x43/0x70
*pde = 00910067 *pte = 00129161
Oops: 0003 [#1] SMP

CC: sta...@kernel.org
Reported-by: Stefan Bader stefan.ba...@canonical.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 703f637..bf5c83f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3253,12 +3253,17 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 static int emulator_write_emulated_onepage(unsigned long addr,
 					   const void *val,
 					   unsigned int bytes,
-					   struct kvm_vcpu *vcpu)
+					   struct kvm_vcpu *vcpu,
+					   bool guest_initiated)
 {
 	gpa_t gpa;
 	u32 error_code;

-	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+
+	if (guest_initiated)
+		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+	else
+		gpa = kvm_mmu_gva_to_gpa_system(vcpu, addr, &error_code);

 	if (gpa == UNMAPPED_GVA) {
 		kvm_inject_page_fault(vcpu, addr, error_code);
@@ -3289,24 +3294,35 @@ mmio:
 	return X86EMUL_CONTINUE;
 }

-int emulator_write_emulated(unsigned long addr,
+int __emulator_write_emulated(unsigned long addr,
 			    const void *val,
 			    unsigned int bytes,
-			    struct kvm_vcpu *vcpu)
+			    struct kvm_vcpu *vcpu,
+			    bool guest_initiated)
 {
 	/* Crossing a page boundary? */
 	if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
 		int rc, now;

 		now = -addr & ~PAGE_MASK;
-		rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
+		rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
+						     guest_initiated);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		addr += now;
 		val += now;
 		bytes -= now;
 	}
-	return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
+	return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
+					       guest_initiated);
+}
+
+int emulator_write_emulated(unsigned long addr,
+			    const void *val,
+			    unsigned int bytes,
+			    struct kvm_vcpu *vcpu)
+{
+	return __emulator_write_emulated(addr, val, bytes, vcpu, true);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);

@@ -3997,7 +4013,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)

 	kvm_x86_ops->patch_hypercall(vcpu, instruction);

-	return emulator_write_emulated(rip, instruction, 3, vcpu);
+	return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
 }

 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
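The page-boundary check in the patched function is compact enough that a standalone illustration may help. The sketch below reuses the same two expressions in userspace, assuming 4 KiB pages; DEMO_PAGE_SIZE, DEMO_PAGE_MASK, struct demo_split and demo_split_write are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, mask defined the way the kernel's PAGE_MASK is. */
#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Hypothetical result type: how a write is split across two pages. */
struct demo_split {
    size_t first;   /* bytes that land in the page containing addr */
    size_t second;  /* bytes that land in the next page, 0 if no split */
};

static struct demo_split demo_split_write(unsigned long addr, size_t bytes)
{
    struct demo_split s = { bytes, 0 };

    assert(bytes > 0 && bytes <= DEMO_PAGE_SIZE);

    /* If the first and last byte differ in any page-number bit, the
     * write crosses a page boundary. */
    if (((addr + bytes - 1) ^ addr) & DEMO_PAGE_MASK) {
        /* Two's-complement trick: distance from addr up to the next
         * page boundary. */
        size_t now = -addr & ~DEMO_PAGE_MASK;

        s.first = now;
        s.second = bytes - now;
    }
    return s;
}
```

For the 3-byte hypercall patch, a write starting at page offset 0xffe would be split into 2 bytes in the first page and 1 byte in the next, which is why guest_initiated has to be passed to both onepage calls.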
Re: KVM: x86: ignore access permissions for hypercall patching
With this patch applied on top, I was able to boot my guest on an AMD host system.

Marcelo Tosatti wrote:
> Ignore access permissions while patching hypercall instructions.
>
> Otherwise KVM injects a page fault when trying to patch vmcall on
> read-only text regions:
>
> Freeing initrd memory: 8843k freed
> Freeing unused kernel memory: 660k freed
> Write protecting the kernel text: 4780k
> Write protecting the kernel read-only data: 1912k
> BUG: unable to handle kernel paging request at c01292e3
> IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
> *pde = 00910067 *pte = 00129161
> Oops: 0003 [#1] SMP
>
> CC: sta...@kernel.org
> Reported-by: Stefan Bader <stefan.ba...@canonical.com>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Tested-by: Stefan Bader <stefan.ba...@canonical.com>
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 703f637..bf5c83f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3253,12 +3253,17 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
>  static int emulator_write_emulated_onepage(unsigned long addr,
>  					   const void *val,
>  					   unsigned int bytes,
> -					   struct kvm_vcpu *vcpu)
> +					   struct kvm_vcpu *vcpu,
> +					   bool guest_initiated)
>  {
>  	gpa_t gpa;
>  	u32 error_code;
>
> -	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
> +
> +	if (guest_initiated)
> +		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
> +	else
> +		gpa = kvm_mmu_gva_to_gpa_system(vcpu, addr, &error_code);
>
>  	if (gpa == UNMAPPED_GVA) {
>  		kvm_inject_page_fault(vcpu, addr, error_code);
> @@ -3289,24 +3294,35 @@ mmio:
>  	return X86EMUL_CONTINUE;
>  }
>
> -int emulator_write_emulated(unsigned long addr,
> +int __emulator_write_emulated(unsigned long addr,
>  				   const void *val,
>  				   unsigned int bytes,
> -				   struct kvm_vcpu *vcpu)
> +				   struct kvm_vcpu *vcpu,
> +				   bool guest_initiated)
>  {
>  	/* Crossing a page boundary? */
>  	if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
>  		int rc, now;
>
>  		now = -addr & ~PAGE_MASK;
> -		rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
> +		rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
> +						     guest_initiated);
>  		if (rc != X86EMUL_CONTINUE)
>  			return rc;
>  		addr += now;
>  		val += now;
>  		bytes -= now;
>  	}
> -	return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
> +	return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
> +					       guest_initiated);
> +}
> +
> +int emulator_write_emulated(unsigned long addr,
> +			    const void *val,
> +			    unsigned int bytes,
> +			    struct kvm_vcpu *vcpu)
> +{
> +	return __emulator_write_emulated(addr, val, bytes, vcpu, true);
>  }
>  EXPORT_SYMBOL_GPL(emulator_write_emulated);
>
> @@ -3997,7 +4013,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
>
>  	kvm_x86_ops->patch_hypercall(vcpu, instruction);
>
> -	return emulator_write_emulated(rip, instruction, 3, vcpu);
> +	return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
>  }
>
>  static u64 mk_cr_64(u64 curr_cr, u32 new_val)
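The page-crossing arithmetic in the quoted patch is easy to misread, so here is a minimal standalone sketch of it. It assumes 4 KiB pages; `SKETCH_PAGE_SIZE`, `SKETCH_PAGE_MASK` and `first_chunk()` are hypothetical names for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Return how many bytes of a write at [addr, addr + bytes) fall into the
 * first page. If the first and last byte share all bits above the page
 * offset, the write does not cross a boundary and goes out in one piece.
 */
size_t first_chunk(unsigned long addr, size_t bytes)
{
    assert(bytes > 0);

    /* XOR of first and last byte address has a high bit set only if
     * the two addresses lie in different pages. */
    if (((addr + bytes - 1) ^ addr) & SKETCH_PAGE_MASK)
        /* -addr modulo the page size: distance to the next boundary. */
        return -addr & ~SKETCH_PAGE_MASK;

    return bytes;
}
```

For the 3-byte hypercall instruction, a write starting two bytes before a page end would be split into a 2-byte and a 1-byte write, each translated separately.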
Re: [Qemu-devel] Re: Ideas wiki for GSoC 2010
Avi Kivity wrote:
> On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
> > 2. Do we have kvm-specific projects? Can they be part of the QEMU project
> >    or do we need a different mentoring organization for it?
>
> Something really interesting is kvm-assisted tcg. I'm afraid it's a bit
> too complicated to GSoC.

Is this simpler: kvm-assisted user-mode emulation (no TCG involved)?

-- Jamie
[PATCH] KVM: ia64: fix the error code of ioctl KVM_IA64_VCPU_GET_STACK failure
The ioctl KVM_IA64_VCPU_GET_STACK does not set the error code if copy_to_user() fails, so 0 is returned. We should return -EFAULT instead of 0 in this case, and this patch fixes that.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
---
 arch/ia64/kvm/kvm-ia64.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 26e0e08..bc07c81 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1535,8 +1535,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 			goto out;

 		if (copy_to_user(user_stack, stack,
-				 sizeof(struct kvm_ia64_vcpu_stack)))
+				 sizeof(struct kvm_ia64_vcpu_stack))) {
+			r = -EFAULT;
 			goto out;
+		}

 		break;
 	}
-- 
1.6.3.3
[PATCH] KVM: x86: fix the error of ioctl KVM_IRQ_LINE if no irq chip
If there is no in-kernel irq chip, the KVM_IRQ_LINE ioctl returns -EFAULT. Other places, such as KVM_[GET|SET]IRQCHIP, return -ENXIO in this case, so this patch uses -ENXIO instead of -EFAULT.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
---
 arch/x86/kvm/x86.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3753c11..c6b7e9f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2857,11 +2857,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&irq_event, argp, sizeof irq_event))
 			goto out;
+		r = -ENXIO;
 		if (irqchip_in_kernel(kvm)) {
 			__s32 status;
 			status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
 					irq_event.irq, irq_event.level);
 			if (ioctl == KVM_IRQ_LINE_STATUS) {
+				r = -EFAULT;
 				irq_event.status = status;
 				if (copy_to_user(argp, &irq_event,
 							sizeof irq_event))
-- 
1.6.3.3
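The control flow this patch relies on (preset `r` before each step that can fail, then `goto out`) can be sketched in isolation. The function below is a hypothetical stand-in, not kernel code: the boolean parameters replace the real copy_from_user(), irqchip_in_kernel() and copy_to_user() calls so the resulting error codes can be checked in user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the KVM_IRQ_LINE / KVM_IRQ_LINE_STATUS path.
 * Each failure point is preceded by the errno it should report.
 */
int irq_line_sketch(bool copy_in_ok, bool have_irqchip,
                    bool want_status, bool copy_out_ok)
{
    int r;

    r = -EFAULT;                /* copying the argument in may fault */
    if (!copy_in_ok)
        goto out;

    r = -ENXIO;                 /* no in-kernel irq chip: no such device */
    if (have_irqchip) {
        if (want_status) {
            r = -EFAULT;        /* copying the status out may fault */
            if (!copy_out_ok)
                goto out;
        }
        r = 0;
    }
out:
    assert(r <= 0);             /* errno-style result: zero or negative */
    return r;
}
```

Without the second `r = -EFAULT`, a failed copy-out would now report -ENXIO, which is why the patch adds both assignments together.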
[PATCH] KVM: ia64: fix the error of ioctl KVM_IRQ_LINE if no irq chip
If there is no in-kernel irq chip, the KVM_IRQ_LINE ioctl returns -EFAULT. Other places, such as KVM_[GET|SET]IRQCHIP, return -ENXIO in this case, so this patch uses -ENXIO instead of -EFAULT.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
---
 arch/ia64/kvm/kvm-ia64.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 26e0e08..0d2e41a 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -979,11 +979,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&irq_event, argp, sizeof irq_event))
 			goto out;
+		r = -ENXIO;
 		if (irqchip_in_kernel(kvm)) {
 			__s32 status;
 			status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
 					irq_event.irq, irq_event.level);
 			if (ioctl == KVM_IRQ_LINE_STATUS) {
+				r = -EFAULT;
 				irq_event.status = status;
 				if (copy_to_user(argp, &irq_event,
 							sizeof irq_event))
-- 
1.6.3.3
[patch 2/2] virtio-serial-bus: wake up iothread upon guest read notification
Wake up iothread when buffers are consumed.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-ioworker/hw/virtio-serial-bus.c
===================================================================
--- qemu-ioworker.orig/hw/virtio-serial-bus.c
+++ qemu-ioworker/hw/virtio-serial-bus.c
@@ -331,6 +331,7 @@ static void handle_output(VirtIODevice *

 static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
 {
+    qemu_notify_event(main_io_worker);
 }

 static uint32_t get_features(VirtIODevice *vdev, uint32_t features)
[patch 0/2] introduce QEMUIOWorker and wake up iothread on virtio-serial-bus notification
[patch 1/2] Pass QEMUIOWorker to qemu_notify_event
This can be used later to introduce generic iothread workers.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu-ioworker/async.c
===================================================================
--- qemu-ioworker.orig/async.c
+++ qemu-ioworker/async.c
@@ -180,7 +180,7 @@ void qemu_bh_schedule(QEMUBH *bh)
     bh->scheduled = 1;
     bh->idle = 0;
     /* stop the currently executing CPU to execute the BH ASAP */
-    qemu_notify_event();
+    qemu_notify_event(main_io_worker);
 }

 void qemu_bh_cancel(QEMUBH *bh)
Index: qemu-ioworker/hw/mac_dbdma.c
===================================================================
--- qemu-ioworker.orig/hw/mac_dbdma.c
+++ qemu-ioworker/hw/mac_dbdma.c
@@ -655,7 +655,7 @@ void DBDMA_register_channel(void *dbdma,

 void DBDMA_schedule(void)
 {
-    qemu_notify_event();
+    qemu_notify_event(main_io_worker);
 }

 static void
Index: qemu-ioworker/hw/virtio-net.c
===================================================================
--- qemu-ioworker.orig/hw/virtio-net.c
+++ qemu-ioworker/hw/virtio-net.c
@@ -359,7 +359,7 @@ static void virtio_net_handle_rx(VirtIOD

     /* We now have RX buffers, signal to the IO thread to break out of the
      * select to re-poll the tap file descriptor */
-    qemu_notify_event();
+    qemu_notify_event(main_io_worker);
 }

 static int virtio_net_can_receive(VLANClientState *nc)
Index: qemu-ioworker/qemu-common.h
===================================================================
--- qemu-ioworker.orig/qemu-common.h
+++ qemu-ioworker/qemu-common.h
@@ -234,11 +234,17 @@ typedef uint64_t pcibus_t;
 void cpu_save(QEMUFile *f, void *opaque);
 int cpu_load(QEMUFile *f, void *opaque, int version_id);

+typedef struct QEMUIOWorker {
+    void *opaque;
+} QEMUIOWorker;
+
 /* Force QEMU to stop what it's doing and service IO */
 void qemu_service_io(void);

 /* Force QEMU to process pending events */
-void qemu_notify_event(void);
+void qemu_notify_event(QEMUIOWorker *worker);
+
+extern QEMUIOWorker *main_io_worker;

 /* Unblock cpu */
 void qemu_cpu_kick(void *env);
Index: qemu-ioworker/vl.c
===================================================================
--- qemu-ioworker.orig/vl.c
+++ qemu-ioworker/vl.c
@@ -274,6 +274,9 @@ uint8_t qemu_uuid[16];
 static QEMUBootSetHandler *boot_set_handler;
 static void *boot_set_opaque;

+QEMUIOWorker iothread_worker;
+QEMUIOWorker *main_io_worker = &iothread_worker;
+
 #ifdef SIGRTMIN
 #define SIG_IPI (SIGRTMIN+4)
 #else
@@ -885,7 +888,7 @@ void qemu_mod_timer(QEMUTimer *ts, int64
         }
         /* Interrupt execution to force deadline recalculation. */
         if (use_icount)
-            qemu_notify_event();
+            qemu_notify_event(main_io_worker);
     }
 }

@@ -1062,7 +1065,7 @@ static void host_alarm_handler(int host_
         }
 #endif
         timer_alarm_pending = 1;
-        qemu_notify_event();
+        qemu_notify_event(main_io_worker);
     }
 }

@@ -2928,7 +2931,7 @@ static int ram_load(QEMUFile *f, void *o

 void qemu_service_io(void)
 {
-    qemu_notify_event();
+    qemu_notify_event(main_io_worker);
 }

 /***/
@@ -3180,26 +3183,26 @@ void qemu_system_reset_request(void)
     } else {
         reset_requested = 1;
     }
-    qemu_notify_event();
+    qemu_notify_event(main_io_worker);
 }

 void qemu_system_shutdown_request(void)
 {
     shutdown_requested = 1;
-    qemu_notify_event();
+    qemu_notify_event(main_io_worker);
 }

 void qemu_system_powerdown_request(void)
 {
     powerdown_requested = 1;
-    qemu_notify_event();
+    qemu_notify_event(main_io_worker);
 }

 #ifdef CONFIG_IOTHREAD
 static void qemu_system_vmstop_request(int reason)
 {
     vmstop_requested = reason;
-    qemu_notify_event();
+    qemu_notify_event(main_io_worker);
 }
 #endif

@@ -3341,7 +3344,7 @@ void qemu_cpu_kick(void *env)
     return;
 }

-void qemu_notify_event(void)
+void qemu_notify_event(QEMUIOWorker *worker)
 {
     CPUState *env = cpu_single_env;

@@ -3727,7 +3730,7 @@ void qemu_init_vcpu(void *_env)
     tcg_init_vcpu(env);
 }

-void qemu_notify_event(void)
+void qemu_notify_event(QEMUIOWorker *worker)
 {
     qemu_event_increment();
 }
[PATCH] KVM: coalesced_mmio: NULLify the pointers before freeing ring page and dev
kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced mmio ring page and dev even after it has freed them. This may trigger problems, e.g., if we call kvm_coalesced_mmio_free() in kvm_destroy_vm() or kvm_vm_ioctl_register_coalesced_mmio() afterward.

This patch avoids such problems by NULLifying the pointers.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 virt/kvm/coalesced_mmio.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..11776b7 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
 	return ret;

 out_free_dev:
+	kvm->coalesced_mmio_dev = NULL;
 	kfree(dev);
 out_free_page:
+	kvm->coalesced_mmio_ring = NULL;
 	__free_page(page);
 out_err:
 	return ret;
-- 
1.6.3.3
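The hazard described above is a published pointer that outlives the memory it refers to. A minimal user-space sketch of the fixed error path, assuming malloc()/free() in place of the kernel allocators; `struct vm_sketch`, `mmio_init_sketch()` and `mmio_free_sketch()` are hypothetical names, not KVM code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two pointers kept in struct kvm. */
struct vm_sketch {
    void *ring;
    void *dev;
};

/* Mirrors the goto-based unwinding: each label clears the published
 * pointer before freeing, so later teardown never sees a stale address. */
int mmio_init_sketch(struct vm_sketch *vm, bool register_ok)
{
    void *ring, *dev;
    int ret = -ENOMEM;

    ring = malloc(64);
    if (!ring)
        goto out_err;
    vm->ring = ring;

    dev = malloc(64);
    if (!dev)
        goto out_free_ring;
    vm->dev = dev;

    if (!register_ok) {         /* models a failed bus registration */
        ret = -EBUSY;
        goto out_free_dev;
    }
    return 0;

out_free_dev:
    vm->dev = NULL;
    free(dev);
out_free_ring:
    vm->ring = NULL;
    free(ring);
out_err:
    return ret;
}

/* Teardown that is safe after a failed init: free(NULL) is a no-op. */
void mmio_free_sketch(struct vm_sketch *vm)
{
    assert(vm != NULL);
    free(vm->dev);
    free(vm->ring);
    vm->dev = NULL;
    vm->ring = NULL;
}
```

Without the two NULL assignments, calling the teardown function after a failed init would free the same blocks twice.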
Re: [PATCH] KVM: coalesced_mmio: NULLify the pointers before freeing ring page and dev
Takuya Yoshikawa wrote:
> kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
> mmio ring page and dev even after it has freed them. This may trigger
> problems, e.g., if we call kvm_coalesced_mmio_free() in kvm_destroy_vm()
> or kvm_vm_ioctl_register_coalesced_mmio() afterward.
>
> This patch avoids such problems by NULLifying the pointers.

After this patch, I think we also need to add a check in kvm_vcpu_fault() for coalesced_mmio_ring, since the coalesced_mmio may not be initialized correctly. This is a separate issue, so I will send a new patch for it.

> Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
> ---
>  virt/kvm/coalesced_mmio.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> index 5169736..11776b7 100644
> --- a/virt/kvm/coalesced_mmio.c
> +++ b/virt/kvm/coalesced_mmio.c
> @@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
>  	return ret;
>
>  out_free_dev:
> +	kvm->coalesced_mmio_dev = NULL;
>  	kfree(dev);
>  out_free_page:
> +	kvm->coalesced_mmio_ring = NULL;
>  	__free_page(page);
>  out_err:
>  	return ret;
[PATCH] KVM: fix to not use NULL kvm-coalesced_mmio_ring in kvm_vcpu_fault()
If coalesced_mmio init fails, kvm->coalesced_mmio_ring will be set to NULL. In that case, kvm_vcpu_fault() should return VM_FAULT_SIGBUS even if vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
---
 virt/kvm/kvm_main.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e758ef7..0e06a6d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1253,7 +1253,8 @@ static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		page = virt_to_page(vcpu->arch.pio_data);
 #endif
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-	else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
+	else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET &&
+		 vcpu->kvm->coalesced_mmio_ring)
 		page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
 #endif
 	else
-- 
1.6.3.3
Re: [PATCH] KVM: coalesced_mmio: NULLify the pointers before freeing ring page and dev
Wei Yongjun wrote:
> Takuya Yoshikawa wrote:
> > kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
> > mmio ring page and dev even after it has freed them. This may trigger
> > problems, e.g., if we call kvm_coalesced_mmio_free() in kvm_destroy_vm()
> > or kvm_vm_ioctl_register_coalesced_mmio() afterward.
> >
> > This patch avoids such problems by NULLifying the pointers.
>
> After this patch, I think we also need to do some check in
> kvm_vcpu_fault() for coalesced_mmio_ring, since the coalesced_mmio
> may not be init correctly. This is other issue, so I will send a new
> patch for this.

Eh, thanks.

> > Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
> > ---
> >  virt/kvm/coalesced_mmio.c |    2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> > index 5169736..11776b7 100644
> > --- a/virt/kvm/coalesced_mmio.c
> > +++ b/virt/kvm/coalesced_mmio.c
> > @@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
> >  	return ret;
> >
> >  out_free_dev:
> > +	kvm->coalesced_mmio_dev = NULL;
> >  	kfree(dev);
> >  out_free_page:
> > +	kvm->coalesced_mmio_ring = NULL;
> >  	__free_page(page);
> >  out_err:
> >  	return ret;
Re: [PATCH] KVM: fix to not use NULL kvm-coalesced_mmio_ring in kvm_vcpu_fault()
Wei Yongjun wrote:
> If coalesced_mmio init fail, the kvm->coalesced_mmio_ring will be set
> to NULL. If so, we should return VM_FAULT_SIGBUS in kvm_vcpu_fault()
> even if vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET.
>
> Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
> ---
>  virt/kvm/kvm_main.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e758ef7..0e06a6d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1253,7 +1253,8 @@ static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  		page = virt_to_page(vcpu->arch.pio_data);
>  #endif
>  #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
> -	else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
> +	else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET &&
> +		 vcpu->kvm->coalesced_mmio_ring)
>  		page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
>  #endif
>  	else

Btw, I am not certain if we can continue the normal path even if kvm_coalesced_mmio_init() fails.
[PATCH] KVM: fix the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO failure
This patch changes the errno of the KVM_[UN]REGISTER_COALESCED_MMIO ioctls from -EINVAL to -ENXIO if no coalesced mmio dev exists.

Signed-off-by: Wei Yongjun <yj...@cn.fujitsu.com>
---
 virt/kvm/coalesced_mmio.c |    4 ++--
 virt/kvm/kvm_main.c       |    2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..22500d4 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -138,7 +138,7 @@ int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
 	struct kvm_coalesced_mmio_dev *dev = kvm->coalesced_mmio_dev;

 	if (dev == NULL)
-		return -EINVAL;
+		return -ENXIO;

 	mutex_lock(&kvm->slots_lock);
 	if (dev->nb_zones >= KVM_COALESCED_MMIO_ZONE_MAX) {
@@ -161,7 +161,7 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
 	struct kvm_coalesced_mmio_zone *z;

 	if (dev == NULL)
-		return -EINVAL;
+		return -ENXIO;

 	mutex_lock(&kvm->slots_lock);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0e06a6d..861435e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1603,7 +1603,6 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&zone, argp, sizeof zone))
 			goto out;
-		r = -ENXIO;
 		r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone);
 		if (r)
 			goto out;
@@ -1615,7 +1614,6 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&zone, argp, sizeof zone))
 			goto out;
-		r = -ENXIO;
 		r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone);
 		if (r)
 			goto out;
-- 
1.6.3.3
ioeventfd usage in KVM
Hi,

I'm trying to use ioeventfd/irqfds for my shared memory patch. I followed the usage in the vhost-net patches to see how it's set up for virtio-pci and tried to follow it as closely as I could. Despite the call to kvm_vm_ioctl() returning 0, writes to the assigned 4-byte memory area do not seem to trigger a write to the corresponding fd. At this point, I'm just trying to get the ioeventfd to fire.

I notice that virtio-pci allocates its BAR as PCI_BASE_ADDRESS_SPACE_IO and then uses register_ioport_{read,write}, whereas I use cpu_register_io_memory and the PCI_BASE_ADDRESS_SPACE_MEMORY type, as shown below.

+static void ivshmem_mmio_map(PCIDevice *pci_dev, int region_num,
+                             pcibus_t addr, pcibus_t size, int type)
+{
+    PCI_IVShmemState *d = (PCI_IVShmemState *)pci_dev;
+    IVShmemState *s = &d->ivshmem_state;
+
+    s->otheraddr = addr; /* this address will be used for the ioeventfd */
+    cpu_register_physical_memory(addr + 0, 0x100, s->ivshmem_mmio_io_addr);
+}

+    s->ivshmem_mmio_io_addr = cpu_register_io_memory(ivshmem_mmio_read,
+                                    ivshmem_mmio_write, s);
+    /* region for registers */
+    pci_register_bar(&d->dev, 0, 0x100,
+                     PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_mmio_map);

My basic attempt looks like this:

    struct kvm_ioeventfd ked;

    ked.addr = s->otheraddr + Doorbell;
    ked.len = 4;
    ked.flags = KVM_IOEVENTFD_FLAG_PIO;
    ked.fd = an_eventfd;
    ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &ked);

But when the guest writes to the offset of Doorbell, I cannot see any action (via a select on the fd). Is there something obviously wrong that I'm doing?

When I get this working, I'd be happy to write up a page for the KVM site.

Thanks,
Cam
Re: [patch 2/2] virtio-serial-bus: wake up iothread upon guest read notification
On (Thu) Mar 11 2010 [23:45:51], Marcelo Tosatti wrote:
> Wake up iothread when buffers are consumed.
>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: qemu-ioworker/hw/virtio-serial-bus.c
> ===================================================================
> --- qemu-ioworker.orig/hw/virtio-serial-bus.c
> +++ qemu-ioworker/hw/virtio-serial-bus.c
> @@ -331,6 +331,7 @@ static void handle_output(VirtIODevice *
>
>  static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
>  {
> +    qemu_notify_event(main_io_worker);
>  }

ACK, the host lets us know buffers are consumed and new buffers have been added to the pool so that we can start sending more data.

Before this patch my tests took 16m18s to run. After this patch my tests take 1m17s to run. Both tests were done with just one buffer made available in the virtio-queues.

		Amit
Re: KVM: x86: ignore access permissions for hypercall patching
On Thu, Mar 11, 2010 at 06:16:05PM -0300, Marcelo Tosatti wrote:
> Ignore access permissions while patching hypercall instructions.
>
> Otherwise KVM injects a page fault when trying to patch vmcall on
> read-only text regions:
>
> Freeing initrd memory: 8843k freed
> Freeing unused kernel memory: 660k freed
> Write protecting the kernel text: 4780k
> Write protecting the kernel read-only data: 1912k
> BUG: unable to handle kernel paging request at c01292e3
> IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
> *pde = 00910067 *pte = 00129161
> Oops: 0003 [#1] SMP
>
> CC: sta...@kernel.org
> Reported-by: Stefan Bader <stefan.ba...@canonical.com>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

My emulator patch series introduces kvm_write_guest_virt_system(). Maybe use it here (only compile tested).

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3753c11..9833c25 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3157,14 +3157,18 @@ static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes,
 	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
 }

-static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
-				struct kvm_vcpu *vcpu, u32 *error)
+static int kvm_write_guest_virt_helper(gva_t addr, void *val,
+				       unsigned int bytes,
+				       struct kvm_vcpu *vcpu, u32 access,
+				       u32 *error)
 {
 	void *data = val;
 	int r = X86EMUL_CONTINUE;

+	access |= PFERR_WRITE_MASK;
+
 	while (bytes) {
-		gpa_t gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error);
+		gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
 		unsigned offset = addr & (PAGE_SIZE-1);
 		unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
 		int ret;
@@ -3187,6 +3191,19 @@ out:
 	return r;
 }

+static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
+				struct kvm_vcpu *vcpu, u32 *error)
+{
+	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, access, error);
+}
+
+static int kvm_write_guest_virt_system(gva_t addr, void *val,
+				       unsigned int bytes,
+				       struct kvm_vcpu *vcpu, u32 *error)
+{
+	return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
+}

 static int emulator_read_emulated(unsigned long addr,
 				  void *val,
@@ -3997,7 +4014,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)

 	kvm_x86_ops->patch_hypercall(vcpu, instruction);

-	return emulator_write_emulated(rip, instruction, 3, vcpu);
+	return kvm_write_guest_virt_system(rip, instruction, 3, vcpu, NULL);
 }

 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
--
			Gleb.
Re: KVM: x86: ignore access permissions for hypercall patching
On Fri, Mar 12, 2010 at 07:56:00AM +0200, Gleb Natapov wrote:
> On Thu, Mar 11, 2010 at 06:16:05PM -0300, Marcelo Tosatti wrote:
> > Ignore access permissions while patching hypercall instructions.
> >
> > Otherwise KVM injects a page fault when trying to patch vmcall on
> > read-only text regions:
> >
> > Freeing initrd memory: 8843k freed
> > Freeing unused kernel memory: 660k freed
> > Write protecting the kernel text: 4780k
> > Write protecting the kernel read-only data: 1912k
> > BUG: unable to handle kernel paging request at c01292e3
> > IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
> > *pde = 00910067 *pte = 00129161
> > Oops: 0003 [#1] SMP
> >
> > CC: sta...@kernel.org
> > Reported-by: Stefan Bader <stefan.ba...@canonical.com>
> > Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> My emulator patch series introduce kvm_write_guest_virt_system(). May be
> used it here (only compile tested).

Ignore that, it will not work.

--
			Gleb.
Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
On 03/11/2010 10:58 PM, Marcelo Tosatti wrote:
> > > Can't you translate l2_gpa -> l1_gpa walking the current l1 nested
> > > pagetable, and pass that to the kvm tdp fault path (with the correct
> > > context setup)?
> >
> > If I understand your suggestion correctly, I think thats exactly whats
> > done in the patches. Some words about the design:
> >
> > For nested-nested we need to shadow the l1-nested-ptable on the host.
> > This is done using the vcpu->arch.mmu context which holds the l1 paging
> > modes while the l2 is running. On a npt-fault from the l2 we just
> > instrument the shadow-ptable code. This is the common case. because it
> > happens all the time while the l2 is running.
>
> OK, makes sense now, I was missing the fact that the l1-nested-ptable
> needs to be shadowed and l1 translations to it must be write protected.

Shadow converts (gva -> gpa -> hpa) to (gva -> hpa) or (ngpa -> gpa -> hpa) to (ngpa -> hpa) equally well. In the second case npt still does (ngva -> ngpa).

> You should disable out of sync shadow so that l1 guest writes to
> l1-nested-ptables always trap.

Why? The guest is under obligation to flush the tlb if it writes to a page table, and we will resync on that tlb flush.

Unsync makes just as much sense for nnpt. Think of khugepaged in the guest eating a page table and spitting out a PDE.

> And in the trap case, you'd have to invalidate l2 shadow pagetable
> entries that used the (now obsolete) l1-nested-ptable entry. Does that
> happen automatically?

What do you mean by 'l2 shadow ptable entries'? There are the guest's page tables (ordinary direct mapped, unless the guest's guest is also running an npt-enabled hypervisor), and the host page tables. When the guest writes to each page table, we invalidate the shadows.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH 0/18][RFC] Nested Paging support for Nested SVM (aka NPT-Virtualization)
On 03/04/2010 05:58 PM, Joerg Roedel wrote:
> > You probably need to include a flag in base_role to differentiate
> > between l1 / l2 shadow tables (say if they use the same cr3 value).
>
> Not sure if this is necessary. It may be necessary when large pages come
> into play. Otherwise the host npt pages are distinguished by the shadow
> npt pages by the direct-flag.

Hm, I think that direct maps for the same gfn can be shared.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: ioeventfd usage in KVM
On 03/12/2010 07:08 AM, Cam Macdonell wrote:
> +    s->ivshmem_mmio_io_addr = cpu_register_io_memory(ivshmem_mmio_read,
> +                                    ivshmem_mmio_write, s);
> +    /* region for registers */
> +    pci_register_bar(&d->dev, 0, 0x100,
> +                     PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_mmio_map);

You've selected the memory address space here.

> my basic attempt looks like this:
>
>     struct kvm_ioeventfd ked;
>
>     ked.addr = s->otheraddr + Doorbell;
>     ked.len = 4;
>     ked.flags = KVM_IOEVENTFD_FLAG_PIO;
>     ked.fd = an_eventfd;
>     ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &ked);

But the PIO address space here.

> but when the guest writes to the offset of Doorbell, I cannot see any
> action (via a select on the fd). Is there something obviously wrong
> that I'm doing?

Yes - they must match. Note that PIO is faster on x86 but nonexistent elsewhere.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
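The rule that the ioeventfd flags must match the BAR's address space can be sketched as a small helper. This is only an illustration under stated assumptions: `make_doorbell_ioeventfd()` and its parameters are hypothetical names; `struct kvm_ioeventfd`, its fields and `KVM_IOEVENTFD_FLAG_PIO` come from the Linux UAPI header `<linux/kvm.h>`.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: describe a 4-byte doorbell at bar_addr + offset.
 * The PIO flag is set only when the BAR was registered in the I/O port
 * space; for a memory BAR (the case in the question) flags stays 0,
 * which makes KVM match the address on the MMIO bus.
 */
struct kvm_ioeventfd make_doorbell_ioeventfd(uint64_t bar_addr,
                                             uint64_t offset,
                                             int efd, bool bar_is_pio)
{
    struct kvm_ioeventfd ked;

    assert(efd >= 0);

    memset(&ked, 0, sizeof(ked));   /* also clears datamatch and pad */
    ked.addr = bar_addr + offset;
    ked.len = 4;
    ked.fd = efd;
    ked.flags = bar_is_pio ? KVM_IOEVENTFD_FLAG_PIO : 0;
    return ked;
}
```

The filled structure would then be passed by address to the KVM_IOEVENTFD ioctl on the VM file descriptor, for example through QEMU's kvm_vm_ioctl() as in the question.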
[PATCH -v2] KVM: fix kvm_coalesced_mmio_init()'s error handling
This version may be better.

Thanks,
  Takuya

===
kvm_coalesced_mmio_init() keeps holding the addresses of the coalesced mmio ring page and dev even after it has freed them.

Also, a failure of this function, though it must be rare, suggests that the system is in a serious state.

This patch changes the error handling for this function to fix these issues.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
---
 virt/kvm/coalesced_mmio.c |    2 ++
 virt/kvm/kvm_main.c       |    4 +++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5169736..11776b7 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
 	return ret;

 out_free_dev:
+	kvm->coalesced_mmio_dev = NULL;
 	kfree(dev);
 out_free_page:
+	kvm->coalesced_mmio_ring = NULL;
 	__free_page(page);
 out_err:
 	return ret;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e758ef7..9e72067 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -419,7 +419,9 @@ static struct kvm *kvm_create_vm(void)
 	list_add(&kvm->vm_list, &vm_list);
 	spin_unlock(&kvm_lock);
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-	kvm_coalesced_mmio_init(kvm);
+	r = kvm_coalesced_mmio_init(kvm);
+	if (r < 0)
+		goto out_err;
 #endif
 out:
 	return kvm;
-- 
1.6.3.3
Re: [PATCH -v2] KVM: fix kvm_coalesced_mmio_init()'s error handling
Takuya Yoshikawa wrote:
> This version may be better.
>
> Thanks,
>   Takuya
>
> ===
> kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
> mmio ring page and dev even after it has freed them.
>
> Also, if this function fails, though it must be rare, it seems to be
> suggesting the system's serious state.
>
> This patch changes the error handling for this function to fix these
> issues.

We must also unregister mmu_notifier in the error path.

> Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
> ---
>  virt/kvm/coalesced_mmio.c |    2 ++
>  virt/kvm/kvm_main.c       |    4 +++-
>  2 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> index 5169736..11776b7 100644
> --- a/virt/kvm/coalesced_mmio.c
> +++ b/virt/kvm/coalesced_mmio.c
> @@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
>  	return ret;
>
>  out_free_dev:
> +	kvm->coalesced_mmio_dev = NULL;
>  	kfree(dev);
>  out_free_page:
> +	kvm->coalesced_mmio_ring = NULL;
>  	__free_page(page);
>  out_err:
>  	return ret;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e758ef7..9e72067 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -419,7 +419,9 @@ static struct kvm *kvm_create_vm(void)
>  	list_add(&kvm->vm_list, &vm_list);
>  	spin_unlock(&kvm_lock);
>  #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
> -	kvm_coalesced_mmio_init(kvm);
> +	r = kvm_coalesced_mmio_init(kvm);
> +	if (r < 0)
> +		goto out_err;
>  #endif
>  out:
>  	return kvm;
Re: [PATCH -v2] KVM: fix kvm_coalesced_mmio_init()'s error handling
Wei Yongjun wrote:
> Takuya Yoshikawa wrote:
> > This version may be better.
> >
> > Thanks,
> >   Takuya
> >
> > ===
> > kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
> > mmio ring page and dev even after it has freed them.
> >
> > Also, if this function fails, though it must be rare, it seems to be
> > suggesting the system's serious state.
> >
> > This patch changes the error handling for this function to fix these
> > issues.
>
> We must also unregister mmu_notifier in the error path.

Oh, sorry.

> > Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
> > ---
> >  virt/kvm/coalesced_mmio.c |    2 ++
> >  virt/kvm/kvm_main.c       |    4 +++-
> >  2 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> > index 5169736..11776b7 100644
> > --- a/virt/kvm/coalesced_mmio.c
> > +++ b/virt/kvm/coalesced_mmio.c
> > @@ -119,8 +119,10 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
> >  	return ret;
> >
> >  out_free_dev:
> > +	kvm->coalesced_mmio_dev = NULL;
> >  	kfree(dev);
> >  out_free_page:
> > +	kvm->coalesced_mmio_ring = NULL;
> >  	__free_page(page);
> >  out_err:
> >  	return ret;
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index e758ef7..9e72067 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -419,7 +419,9 @@ static struct kvm *kvm_create_vm(void)
> >  	list_add(&kvm->vm_list, &vm_list);
> >  	spin_unlock(&kvm_lock);
> >  #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
> > -	kvm_coalesced_mmio_init(kvm);
> > +	r = kvm_coalesced_mmio_init(kvm);
> > +	if (r < 0)
> > +		goto out_err;
> >  #endif
> >  out:
> >  	return kvm;