Re: Make QEmu HPET disabled by default for KVM?
On 03/11/2010 09:52 AM, Sheng Yang wrote:
> I think we have already suffered enough timer issues due to this (e.g. I can't boot up well on a 2.6.18 kernel)...

2.6.18 as guest or as host?

> I have kept --no-hpet in my setup for months...

Any details about the problems? HPET is important to some guests.

-- error compiling committee.c: too many arguments to function

-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Ideas wiki for GSoC 2010
On 03/10/2010 11:30 PM, Luiz Capitulino wrote:
> 2. Do we have kvm-specific projects? Can they be part of the QEMU project or do we need a different mentoring organization for it?

Something really interesting is kvm-assisted tcg. I'm afraid it's a bit too complicated for GSoC.
Make QEmu HPET disabled by default for KVM?
I think we have already suffered enough timer issues due to this (e.g. I can't boot up well on a 2.6.18 kernel)... I have kept --no-hpet in my setup for months...

-- regards Yang, Sheng
Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat
On 03/11/2010 09:46 AM, Sheng Yang wrote:
> On Thursday 11 March 2010 15:36:01 Avi Kivity wrote:
>> On 03/11/2010 09:20 AM, Sheng Yang wrote:
>>> Currently we can only get the cpu_stat of the whole guest as one. This patch enhances cpu_stat with more detail: guest_system and guest_user cpu time statistics, with a little overhead.
>>>
>>> Signed-off-by: Sheng Yang
>>> ---
>>> This draft patch is based on KVM upstream to show the idea. I will split it into a more kernel-friendly version later.
>>>
>>> The overhead is the cost of get_cpl() after each exit from the guest.
>>
>> This can be very expensive in the nested virtualization case, so I wouldn't like this to be in normal paths. I think detailed profiling like that can be left to 'perf kvm', which only has overhead if enabled at runtime.
>
> Yes, that's my concern too (though nested vmcs/vmcb reads are already too expensive; they should be optimized...).

Any ideas on how to do that? Perhaps use paravirt_ops to convert the vmread into a memory read? We store the vmwrites in the vmcs anyway.

> The other concern is, a perf-alike mechanism would bring a lot more overhead compared to this.

Ordinarily users won't care whether time is spent in guest kernel mode or guest user mode. They want to see which guest is imposing a load on a system. I consider a user profiling a guest from the host an advanced and rarer use case, so it's okay to require tools and additional overhead for this.

>> For example you can put the code to note the cpl in a tracepoint which is enabled dynamically.
>
> Yanmin has already implemented "perf kvm" to support this. We are just arguing whether a normal top-alike mechanism is necessary. I am also considering making it a feature that can be disabled. But it seems that makes things complicated and results in uncertain cpustat output.

I'm not even sure that guest time was a good idea.
Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat
On Thursday 11 March 2010 15:36:01 Avi Kivity wrote:
> On 03/11/2010 09:20 AM, Sheng Yang wrote:
>> Currently we can only get the cpu_stat of the whole guest as one. This patch enhances cpu_stat with more detail: guest_system and guest_user cpu time statistics, with a little overhead.
>>
>> Signed-off-by: Sheng Yang
>> ---
>> This draft patch is based on KVM upstream to show the idea. I will split it into a more kernel-friendly version later.
>>
>> The overhead is the cost of get_cpl() after each exit from the guest.
>
> This can be very expensive in the nested virtualization case, so I wouldn't like this to be in normal paths. I think detailed profiling like that can be left to 'perf kvm', which only has overhead if enabled at runtime.

Yes, that's my concern too (though nested vmcs/vmcb reads are already too expensive; they should be optimized...). The other concern is, a perf-alike mechanism would bring a lot more overhead compared to this.

> For example you can put the code to note the cpl in a tracepoint which is enabled dynamically.

Yanmin has already implemented "perf kvm" to support this. We are just arguing whether a normal top-alike mechanism is necessary. I am also considering making it a feature that can be disabled. But it seems that makes things complicated and results in uncertain cpustat output.

-- regards Yang, Sheng
Re: [PATCH] x86/kvm: Show guest system/user cputime in cpustat
On 03/11/2010 09:20 AM, Sheng Yang wrote:
> Currently we can only get the cpu_stat of the whole guest as one. This patch enhances cpu_stat with more detail: guest_system and guest_user cpu time statistics, with a little overhead.
>
> Signed-off-by: Sheng Yang
> ---
> This draft patch is based on KVM upstream to show the idea. I will split it into a more kernel-friendly version later.
>
> The overhead is the cost of get_cpl() after each exit from the guest.

This can be very expensive in the nested virtualization case, so I wouldn't like this to be in normal paths. I think detailed profiling like that can be left to 'perf kvm', which only has overhead if enabled at runtime.

For example you can put the code to note the cpl in a tracepoint which is enabled dynamically.
[PATCH] x86/kvm: Show guest system/user cputime in cpustat
Currently we can only get the cpu_stat of the whole guest as one. This patch enhances cpu_stat with more detail: guest_system and guest_user cpu time statistics, with a little overhead.

Signed-off-by: Sheng Yang
---
This draft patch is based on KVM upstream to show the idea. I will split it into a more kernel-friendly version later. The overhead is the cost of get_cpl() after each exit from the guest.

Comments are welcome!

 arch/x86/kvm/x86.c          | 10 ++
 fs/proc/stat.c              | 22 --
 include/linux/kernel_stat.h |  2 ++
 include/linux/kvm_host.h    |  1 +
 include/linux/sched.h       |  1 +
 kernel/sched.c              |  6 ++
 6 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 703f637..c8ea6e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4290,6 +4290,14 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
 	}
 }

+static void kvm_update_guest_mode(struct kvm_vcpu *vcpu)
+{
+	int cpl = kvm_x86_ops->get_cpl(vcpu);
+
+	if (cpl != 0)
+		current->flags |= PF_VCPU_USER;
+}
+
 static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 {
 	int r;
@@ -4377,6 +4385,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	trace_kvm_entry(vcpu->vcpu_id);
 	kvm_x86_ops->run(vcpu);

+	kvm_update_guest_mode(vcpu);
+
 	/*
 	 * If the guest has used debug registers, at least dr7
 	 * will be disabled while returning to the host.
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index b9b7aad..d07640a 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -27,7 +27,7 @@ static int show_stat(struct seq_file *p, void *v)
 	int i, j;
 	unsigned long jif;
 	cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
-	cputime64_t guest, guest_nice;
+	cputime64_t guest, guest_nice, guest_user, guest_system;
 	u64 sum = 0;
 	u64 sum_softirq = 0;
 	unsigned int per_softirq_sums[NR_SOFTIRQS] = {0};
@@ -36,7 +36,7 @@ static int show_stat(struct seq_file *p, void *v)
 	user = nice = system = idle = iowait =
 		irq = softirq = steal = cputime64_zero;
-	guest = guest_nice = cputime64_zero;
+	guest = guest_nice = guest_user = guest_system = cputime64_zero;
 	getboottime(&boottime);
 	jif = boottime.tv_sec;
@@ -53,6 +53,10 @@ static int show_stat(struct seq_file *p, void *v)
 		guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
 		guest_nice = cputime64_add(guest_nice,
 			kstat_cpu(i).cpustat.guest_nice);
+		guest_user = cputime64_add(guest_user,
+			kstat_cpu(i).cpustat.guest_user);
+		guest_system = cputime64_add(guest_system,
+			kstat_cpu(i).cpustat.guest_system);
 		for_each_irq_nr(j) {
 			sum += kstat_irqs_cpu(j, i);
 		}
@@ -68,7 +72,7 @@ static int show_stat(struct seq_file *p, void *v)
 	sum += arch_irq_stat();
 	seq_printf(p, "cpu  %llu %llu %llu %llu %llu %llu %llu %llu %llu "
-		"%llu\n",
+		"%llu %llu %llu\n",
 		(unsigned long long)cputime64_to_clock_t(user),
 		(unsigned long long)cputime64_to_clock_t(nice),
 		(unsigned long long)cputime64_to_clock_t(system),
@@ -78,7 +82,9 @@ static int show_stat(struct seq_file *p, void *v)
 		(unsigned long long)cputime64_to_clock_t(softirq),
 		(unsigned long long)cputime64_to_clock_t(steal),
 		(unsigned long long)cputime64_to_clock_t(guest),
-		(unsigned long long)cputime64_to_clock_t(guest_nice));
+		(unsigned long long)cputime64_to_clock_t(guest_nice),
+		(unsigned long long)cputime64_to_clock_t(guest_user),
+		(unsigned long long)cputime64_to_clock_t(guest_system));
 	for_each_online_cpu(i) {
 		/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
@@ -93,9 +99,11 @@ static int show_stat(struct seq_file *p, void *v)
 		steal = kstat_cpu(i).cpustat.steal;
 		guest = kstat_cpu(i).cpustat.guest;
 		guest_nice = kstat_cpu(i).cpustat.guest_nice;
+		guest_user = kstat_cpu(i).cpustat.guest_user;
+		guest_system = kstat_cpu(i).cpustat.guest_system;
 		seq_printf(p, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu "
-			"%llu\n",
+			"%llu %llu %llu\n",
 			i,
 			(unsigned long long)cputime64_to_clock_t(user),
 			(unsigned long long)cputime64_to_clock_t(nice),
@@ -106,7 +114,9 @@ static int show_stat(struct seq_file *p, void *v)
 			(unsigned long long)cputime64_to_clock_t(softirq),
 			(unsigned
Re: [PATCH] Inter-VM shared memory PCI device
On 03/10/2010 04:04 PM, Arnd Bergmann wrote:
> On Tuesday 09 March 2010, Cam Macdonell wrote:
>>> We could make the masking in RAM, not in registers, like virtio, which would require no exits. It would then be part of the application specific protocol and out of scope of this spec.
>>
>> This kind of implementation would be possible now since with UIO it's up to the application whether to mask interrupts or not and what interrupts mean. We could leave the interrupt mask register for those who want that behaviour. Arnd's idea would remove the need for the Doorbell and Mask, but we will always need at least one MMIO register to send whatever interrupts we do send.
>
> You'd also have to be very careful if the notification is in RAM to avoid races between one guest triggering an interrupt and another guest clearing its interrupt mask.
>
> A totally different option that avoids this whole problem would be to separate the signalling from the shared memory, making the PCI shared memory device a trivial device with a single memory BAR, and using a higher-level concept like a virtio-based serial line for the actual signalling.

That would be much slower. The current scheme allows for an ioeventfd/irqfd short circuit which allows one guest to interrupt another without involving their qemus at all.
Re: [PATCH] Inter-VM shared memory PCI device
On 03/10/2010 06:36 PM, Cam Macdonell wrote:
> On Wed, Mar 10, 2010 at 2:21 AM, Avi Kivity wrote:
>> On 03/09/2010 08:34 PM, Cam Macdonell wrote:
>>> On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity wrote:
>>>> On 03/09/2010 05:27 PM, Cam Macdonell wrote:
>>>>> Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server).
>>>> How does the driver detect whether interrupts are supported or not?
>>> At the moment, the VM ID is set to -1 if interrupts aren't supported, but that may not be the clearest way to do things. With UIO is there a way to detect if the interrupt pin is on?
>> I suggest not designing the device to uio. Make it a good guest-independent device, and if uio doesn't fit it, change it.
>>>> Why not support interrupts unconditionally? Is the device useful without interrupts?
>>> Currently my patch works with or without the shared memory server. If you give the parameter -ivshmem 256,foo then this will create (if necessary) and map /dev/shm/foo as the shared region without interrupt support. Some users of shared memory are using it this way. Going forward we can require the shared memory server and always have interrupts enabled.
>> Can you explain how they synchronize? Polling? Using the network? Using it as a shared cache? If it's a reasonable use case it makes sense to keep it.
> Do you mean how they synchronize without interrupts? One project I've been contacted about uses the shared region directly for synchronization for simulations running in different VMs that share data in the memory region. In my tests spinlocks in the shared region work between guests.

I see.

> If we want to keep the serverless implementation, do we need to support shm_open with -chardev somehow? Something like -chardev shm,name=foo. Right now my qdev implementation just passes the name to the -device option and opens it.

I think using the file name is fine.
>> Another thing comes to mind - a shared memory ID, in case a guest has multiple cards.
> Sure, a number that can be passed on the command-line and stored in a register?

Yes. NICs use the MAC address and storage uses the disk serial number; this is the same thing for shared memory.
Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
On 03/10/2010 05:26 PM, Joerg Roedel wrote:
> On Wed, Mar 10, 2010 at 04:53:29PM +0200, Avi Kivity wrote:
>> On 03/10/2010 04:44 PM, Joerg Roedel wrote:
>>> On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
>>>> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
>>>>> This patch changes the tdp_enabled flag from its global meaning to the mmu-context. This is necessary for Nested SVM with emulation of Nested Paging where we need an extra MMU context to shadow the Nested Nested Page Table.
>>>>>
>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>> index ec891a2..e7bef19 100644
>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>> @@ -254,6 +254,7 @@ struct kvm_mmu {
>>>>> 	int root_level;
>>>>> 	int shadow_root_level;
>>>>> 	union kvm_mmu_page_role base_role;
>>>>> +	bool tdp_enabled;
>>>> This needs a different name, since the old one is still around. Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.
>>> Hmm, how about renaming the global tdp_enabled variable to tdp_usable? The global variable indicates if tdp is _usable_ and we can _enable_ it for a mmu context.
>> I think of the global flags as host tdp, and the mmu as guest tdp (but maybe this is wrong?). If that makes sense, the naming should reflect that.
>
> The basic flow of the mmu state with npt-npt is:
>
> 1. As long as L1 is running, the arch.mmu context is in tdp mode and builds a direct-mapped page table.
> 2. When vmrun is emulated and the nested vmcb enables nested paging, arch.mmu is switched to a shadow-mmu mode which now shadows the L1 nested page table. So when the L2 guest runs with nested paging, the arch.mmu.tdp_enabled variable on the host is false.
> 3. On a vmexit emulation the mmu is switched back to the tdp handling state.
>
> So the mmu.tdp_enabled parameter is about tdp being enabled for the mmu context (mmu.tdp_enabled means that we build an L1-direct-mapped page table when true, or shadow an L1 page table when false). That's why I think the 'tdp_enabled' name makes sense in the mmu-context.
> The global flag only shows if an mmu-context could be in tdp-state. So tdp_usable may be a good name for it.

tdp is still used in both cases, so that name is confusing. We could call it mmu.direct_map (and set it for real mode?) or mmu.virtual_map (with the opposite sense). Or something.
Re: guest patched with pax causes "set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003?" flood on host
On 03/10/2010 06:17 PM, Antoine Martin wrote:
> Hi,
>
> I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the base system), rebuilt kvm. ... and now I get hundreds of those in dmesg on the host when I start a guest kernel that worked fine before. (2.6.33 + pax patch v5)
>
> set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033
> set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b
> set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033
> set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b

The guest is clearly confused. Can you bisect kvm to find out what introduced this problem?

> (hundreds of all 4) And the VM just reboots shortly after starting init. Funnily enough, I've got some VMs still running that kernel just fine! (as I started them before the headers+glibc+qemu-kvm rebuild) Now, you might just say that I shouldn't use out of tree patches like pax,

You can run anything you like in a guest.

> but I just want to know one thing: should the guest kernel still be able to flood dmesg on the host like this?

No, these are debug messages.

> Thanks
> Antoine
>
> PS: Avi, are you still interested in seeing if this rebuild fixes the pread/glibc bug?

I think we figured it out, but a confirmation would be nice.
Re: Shadow page table questions
On 03/11/2010 02:06 AM, Marek Olszewski wrote:
> Thanks for the response. I've looked through the code some more and think I have figured it out now. I finally see that the root_hpa variable gets switched before entering the guest in mmu_alloc_roots, to correspond with the new cr3. Thanks again.
>
> Perhaps you can help me with one more question. I was hoping to try out a certain change for a research project. I would like to "privatize" kvm_mmu_page's and their sptes for each guest thread running in certain designated guest processes. The goal is to give each thread its own shadow page table graphs that map the same guest logical addresses to guest physical addresses (with some changes to be introduced later). Are there any assumptions that KVM makes that will break if I do something like this? I understand that I will have to add some code throughout the mmu to make sure that these structures are synchronized when a guest thread makes a change, but I'm wondering if there is anything else. Does the reverse mapping data structure you have assume that there is only one shadow page per guest page?

It doesn't, and there are often multiple shadow pages per guest page, distinguished by their sp->role field.
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/10/2010 07:41 PM, Paul Brook wrote:
>>> You're much better off using a bulk-data transfer API that relaxes coherency requirements. IOW, shared memory doesn't make sense for TCG
>> Rather, tcg doesn't make sense for shared memory smp. But we knew that already.
> I think TCG SMP is a hard, but soluble problem, especially when you're running guests used to coping with NUMA.

Do you mean by using a per-cpu tlb? These kinds of solutions are generally slow, but tcg's slowness may mask this out.

> TCG interacting with third parties via shared memory is probably never going to make sense.

The third party in this case is qemu.
[PATCH] KVM-test: SR-IOV: Fix a bug that wrongly check VFs count
The parameter 'devices_requested' is unrelated to the driver option 'max_vfs' of 'igb'. The 82576 NIC has two network interfaces, and each can be virtualized into up to 7 virtual functions; therefore we multiply the value of the driver option 'max_vfs' by two to get the total number of VFs.

Signed-off-by: Yolkfull Chow
---
 client/tests/kvm/kvm_utils.py | 19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 4565dc1..1813ed1 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -1012,17 +1012,22 @@ class PciAssignable(object):
         """
         Get VFs count number according to lspci.
         """
+        # FIXME: Need to think out a method of identifying which
+        # 'virtual function' belongs to which physical card, considering
+        # that the host may have more than one 82576 card. PCI_ID?
         cmd = "lspci | grep 'Virtual Function' | wc -l"
-        # For each VF we'll see 2 prints of 'Virtual Function', so let's
-        # divide the result per 2
-        return int(commands.getoutput(cmd)) / 2
+        return int(commands.getoutput(cmd))

     def check_vfs_count(self):
         """
         Check VFs count number according to the parameter driver_options.
         """
-        return (self.get_vfs_count == self.devices_requested)
+        # Network card 82576 has two network interfaces and each can be
+        # virtualized up to 7 virtual functions, therefore we multiply
+        # two for the value of driver_option 'max_vfs'.
+        expected_count = int((re.findall("(\d)", self.driver_option)[0])) * 2
+        return (self.get_vfs_count == expected_count)

     def is_binded_to_stub(self, full_id):
@@ -1054,15 +1059,17 @@ class PciAssignable(object):
         elif not self.check_vfs_count():
             os.system("modprobe -r %s" % self.driver)
             re_probe = True
+        else:
+            return True

         # Re-probe driver with proper number of VFs
         if re_probe:
             cmd = "modprobe %s %s" % (self.driver, self.driver_option)
+            logging.info("Loading the driver '%s' with option '%s'" %
+                         (self.driver, self.driver_option))
             s, o = commands.getstatusoutput(cmd)
             if s:
                 return False
-            if not self.check_vfs_count():
-                return False
         return True
--
1.7.0.1
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On Thu, Mar 11, 2010 at 03:10:47AM +0000, Jamie Lokier wrote:
> Paul Brook wrote:
>>>> In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load.
>>> Btw, x86 doesn't have any implicit barriers due to ordinary loads. Only stores and atomics have implicit barriers, afaik.
>> As of March 2009[1] Intel guarantees that memory reads occur in order (they may only be reordered relative to writes). It appears AMD do not provide this guarantee, which could be an interesting problem for heterogeneous migration..
>
> (Summary: At least on AMD64, it does too, for normal accesses to naturally aligned addresses in write-back cacheable memory.)
>
> Oh, that's interesting. Way back when I guess we knew writes were in order and it wasn't explicit that reads were, hence smp_rmb() using a locked atomic.
>
> Here is a post by Nick Piggin from 2007 with links to Intel _and_ AMD documents asserting that reads to cacheable memory are in program order:
>
> http://lkml.org/lkml/2007/9/28/212
> Subject: [patch] x86: improved memory barrier implementation
>
> Links to documents:
>
> http://developer.intel.com/products/processor/manuals/318147.pdf
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
>
> The Intel link doesn't work any more, but the AMD one does.

It might have been merged into their development manual now.

> Nick asserts "both manufacturers are committed to in-order loads from cacheable memory for the x86 architecture".

At the time we did ask Intel and AMD engineers. We talked with Andy Glew from Intel, I believe, but I can't recall the AMD contact. Linus was involved in the discussions as well. We tried to do the right thing with this.

> I have just read the AMD document, and it is in there (but not completely obviously), in section 7.2. The implicit load-load and store-store barriers are only guaranteed for "normal cacheable accesses on naturally aligned boundaries to WB [write-back cacheable] memory". There are also implicit load-store barriers but not store-load.
>
> Note that the document covers AMD64; it does not say anything about their (now old) 32-bit processors.

Hmm. Well, it couldn't hurt to ask again. We've never seen any problems yet, so I'm rather sure we're in the clear.

>>> [*] The most recent docs I have handy. Up to and including Core-2 Duo.
>
> Are you sure the read ordering applies to 32-bit Intel and AMD CPUs too?
>
> Many years ago, before 64-bit x86 existed, I recall discussions on LKML where it was made clear that stores were performed in program order. If it were known at the time that loads were performed in program order on 32-bit x86s, I would have expected that to have been mentioned by someone.

The way it was explained to us by the Intel engineer is that they had implemented only visibly in-order loads, but they wanted to keep their options open in the future, so they did not want to commit to in-order loads as an ISA feature. So when the whitepaper was released we got their blessing to retroactively apply the rules to previous CPUs.
Re: Question on stopping KVM start at boot
On Thu, Mar 11, 2010 at 11:59:45AM +0800, sati...@pacific.net.hk wrote:
> Hi Dustin,
>
>> Or you can edit the /etc/init.d/kvm or /etc/init.d/qemu-kvm init script and add the "-b" option to the modprobe calls in there.
>
> $ cat /etc/init.d/qemu-kvm | grep modprobe
>         if modprobe "$module"
>
> Where shall I add "-b" option? Thanks

If I'm not wrong, there. That "if" loads the module and checks the return code to give an error if it fails to load.

Thanks, Rodrigo
Re: Question on stopping KVM start at boot
Quoting Bitman Zhou:
- snip -
>> Please what further command I have to run in order to activate the new blacklist.conf?
>
> For Ubuntu, you can just use update-rc.d:
>
>   sudo update-rc.d kvm disable
>
> to disable kvm, and
>
>   sudo update-rc.d kvm enable
>
> to enable it again.

Tks for your advice.

B.R. Stephen L
Re: Question on stopping KVM start at boot
Hi Dustin,

Thanks for your advice.

- snip -
> Or you can edit the /etc/init.d/kvm or /etc/init.d/qemu-kvm init script and add the "-b" option to the modprobe calls in there.

$ cat /etc/init.d/kvm | grep modprobe

No printout.

$ cat /etc/init.d/qemu-kvm | grep modprobe
        if modprobe "$module"
...
        if modprobe "$module"
        then
                log_end_msg 0
        else
                log_end_msg 1
                exit 1
        fi
        ;;

Where shall I add the "-b" option? Thanks

B.R. Stephen
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
Paul Brook wrote:
>>> In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load.
>> Btw, x86 doesn't have any implicit barriers due to ordinary loads. Only stores and atomics have implicit barriers, afaik.
> As of March 2009[1] Intel guarantees that memory reads occur in order (they may only be reordered relative to writes). It appears AMD do not provide this guarantee, which could be an interesting problem for heterogeneous migration..

(Summary: At least on AMD64, it does too, for normal accesses to naturally aligned addresses in write-back cacheable memory.)

Oh, that's interesting. Way back when I guess we knew writes were in order and it wasn't explicit that reads were, hence smp_rmb() using a locked atomic.

Here is a post by Nick Piggin from 2007 with links to Intel _and_ AMD documents asserting that reads to cacheable memory are in program order:

http://lkml.org/lkml/2007/9/28/212
Subject: [patch] x86: improved memory barrier implementation

Links to documents:

http://developer.intel.com/products/processor/manuals/318147.pdf
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

The Intel link doesn't work any more, but the AMD one does.

Nick asserts "both manufacturers are committed to in-order loads from cacheable memory for the x86 architecture".

I have just read the AMD document, and it is in there (but not completely obviously), in section 7.2. The implicit load-load and store-store barriers are only guaranteed for "normal cacheable accesses on naturally aligned boundaries to WB [write-back cacheable] memory". There are also implicit load-store barriers but not store-load.

Note that the document covers AMD64; it does not say anything about their (now old) 32-bit processors.

> [*] The most recent docs I have handy. Up to and including Core-2 Duo.

Are you sure the read ordering applies to 32-bit Intel and AMD CPUs too?

Many years ago, before 64-bit x86 existed, I recall discussions on LKML where it was made clear that stores were performed in program order. If it were known at the time that loads were performed in program order on 32-bit x86s, I would have expected that to have been mentioned by someone.

-- Jamie
Re: Shadow page table questions
Thanks for the response. I've looked through the code some more and think I have figured it out now. I finally see that the root_hpa variable gets switched before entering the guest in mmu_alloc_roots, to correspond with the new cr3. Thanks again. Perhaps you can help me with one more question. I was hoping to try out a certain change for a research project. I would like to "privatize" kvm_mmu_page's and their spe's for each guest thread running in certain designated guest processes. The goal is to give each thread its own shadow page table graphs that map the same guest logical addresses to guest physical addresses (with some changes to be introduced later). Are there any assumptions that KVM makes that will break if I do something like this? I understand that I will have to add some code throughout the mmu to make sure that these structures are synchronized when a guest thread makes a change, but I'm wondering if there is anything else. Does the reverse mapping data structure you have assume that there is only one shadow page per guest page? Thanks! Marek Avi Kivity wrote: On 03/10/2010 06:57 AM, Marek Olszewski wrote: Hello, I was wondering if someone could point me to some documentation that explains the basic non-nested-paging shadow page table algorithm/strategy used by KVM. I understand that KVM caches shadow page tables across context switches and that there is a reverse mapping and page protection to help zap shadow page tables when the guest page tables change. However, I'm not entirely sure how the actual caching is done. At first I assumed that KVM would change the host CR3 on every guest context switch such that it would point to a cached shadow page table for the currently running guest user thread, however, as far as I can tell, the host CR3 does not change so I'm a little lost. If indeed it doesn't change the CR3, how does KVM solve the problem that arises when two processes in the guest OS share the same guest logical addresses? 
The host cr3 does change, though not by using the 'mov cr3' instruction (that would cause the host to immediately switch to the guest address space, which would be bad). See the calls to kvm_x86_ops->set_cr3(). I'm also interested in figuring out what KVM does when running with multiple virtual CPUs. Looking at the code, I can see that each VCPU has its own root pointer to a shadow page table graph, but I have yet to figure out if this graph has node's shared between VCPUs, or whether they are all private. Everything is shared. If the guest is running with identical cr3s, kvm will load identical cr3s in guest mode. An exception is when we use 32-bit pae mode. In that case, the guest cr3s will be different (but guest PDPTRs will be identical). Instead of dealing with the pae cr3, we deal with the four PDPTRs. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][ PATCH 1/3] vhost-net: support multiple buffer heads in receiver
"Michael S. Tsirkin" wrote on 03/07/2010 11:45:33 PM: > > > > +static int skb_head_len(struct sk_buff_head *skq) > > > > +{ > > > > + struct sk_buff *head; > > > > + > > > > + head = skb_peek(skq); > > > > + if (head) > > > > + return head->len; > > > > + return 0; > > > > +} > > > > + > > > > > > This is done without locking, which I think can crash > > > if skb is consumed after we peek it but before we read the > > > length. > > > > This thread is the only legitimate consumer, right? But > > qemu has the file descriptor and I guess we shouldn't trust > > that it won't give it to someone else; it'd break vhost, but > > a crash would be inappropriate, of course. I'd like to avoid > > the lock, but I have another idea here, so will investigate. I looked at this some more and actually, I'm not sure I see a crash here. First, without qemu, or something it calls, being broken as root, nothing else should ever read from the socket, in which case the length will be exactly right for the next packet we read. No problem. But if by some error this skb is freed, we'll have valid memory address that isn't the length field of the next packet we'll read. If the length is negative or more than available in the vring, we'll fail the buffer allocation, exit the loop, and get the new head length of the receive queue the next time around -- no problem. If the length field is 0, we'll exit the loop even though we have data to read, but will get that packet the next time we get in here, again, with the right length. No problem. If the length field is big enough to allocate buffer space for it, but smaller than the new head we have to read, the recvmsg will fail with EMSGSIZE, drop the packet, exit the loop and be back in business with the next packet. No problem. Otherwise, the packet will fit and be delivered. 
I don't much like the notion of using skb->head when it's garbage, but that can only happen if qemu has broken, and I don't see a crash unless the skb is not only freed but no longer a valid memory address for reading at all, and all within the race window. Since the code handles other failure cases (not enough ring buffers or packet not fitting in the allocated buffers), the actual length value only matters in the sense that it prevents us from using buffers unnecessarily - something that isn't all that relevant if it's hosed enough to have unauthorized readers on the socket. Is this case worth the performance penalty we'll no doubt pay for either locking the socket or always allocating for a max-sized packet? I'll experiment with a couple solutions here, but unless I've missed something, we might be better off just leaving it as-is. +-DLS 
Ideas wiki for GSoC 2010
Hi there, Our wiki page for the Summer of Code 2010 is doing quite well: http://wiki.qemu.org/Google_Summer_of_Code_2010 Now the most important tasks are:
1. Get mentors assigned to projects. Just put your name and email in the right field. It's ok and even desirable to have two mentors per project, but please remember that mentoring is serious work, more info here: http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors http://gsoc-wiki.osuosl.org/index.php/Main_Page
2. Do we have kvm-specific projects? Can they be part of the QEMU project or do we need a different mentoring organization for it?
3. Fill in the missing information for the suggested projects (description, skill level, languages, etc)
I will complete our application tomorrow or on Friday. PS: I'm CC'ing everyone who suggested projects there, except one or two whose email addresses I couldn't find. 
Re: Question on stopping KVM start at boot
On Wed, Mar 10, 2010 at 3:08 AM, Bitman Zhou wrote: >> I need to stop KVM starting at boot. >> >> I added following 2 lines at the bottom of /etc/modprobe.d/blacklist.conf >> blacklist kvm >> blacklist kvm-amd >> >> >> Reboot PC >> >> It doesn't work. >> >> $ lsmod | grep kvm >> kvm_amd 41556 0 >> kvm 190648 1 kvm_amd >> >> >> Please what further command I have to run in order to activate the new >> blacklist.conf ? > > For Ubuntu, you can just use update-rc.d > > sudo update-rc.d kvm disable > > to disable kvm and > > sudo update-rc.d kvm enable > > to enable it again. Hi there, Unfortunately, the /etc/init.d/kvm and /etc/init.d/qemu-kvm init scripts in previous Ubuntu releases (9.10 and earlier) didn't respect the module blacklists. I have corrected this in Ubuntu 10.04 by using modprobe -b. Thus for Ubuntu 10.04 forward, you should be able to use the blacklist appropriately. For other releases, you can disable the init script entirely as the other responder wrote. Or you can edit the /etc/init.d/kvm or /etc/init.d/qemu-kvm init script and add the "-b" option to the modprobe calls in there. -- :-Dustin 
Re: 32-bit qemu + 64-bit kvm be a problem?
Neo Jia wrote: > hi, > > I have to keep a 32-bit qemu user space to work with some legacy > library I have but still want to use 64-bit host Linux to explore > 64-bit advantage. > > So I am wondering if I can use a 32-bit qemu + 64-bit kvm-kmod > configuration. Will there be any limitation or drawback for this > configuration? I already get one that we can't assign guest physical > memory more than 2047 MB. I have used 32-bit kvm on a 64-bit kernel since day one. Nothing of interest since then, everything just works. Recently (this week) I came across a situation where something does not work in 64/32 mode. Namely, it is linux aio (see the other thread in kvm@ a few days back) - but this is not due to kvm but due to another kernel subsystem (in this case aio) which lacks proper compat handlers. Over time I have reported quite a few issues with this configuration - here and there something did not work. The number of problem spots is decreasing (hopefully anyway); at least I haven't seen issues recently, except for this aio stuff. But strictly speaking, I don't see any good reason to run 32-bit kvm on a 64-bit kernel either. Most distributions nowadays ship a set of 32-bit compatibility libraries with their 64-bit versions, so that limited support for 32-bit binaries is available. This is mostly enough for kvm - without X and SDL support it works just fine (using a vnc display). Historically I have a 32-bit userspace, but most guests now are running with 64-bit kvm - either because the guests switched to 64-bit kernels, or because of the aio thing, or just because it looks more efficient (less syscall/ioctl 32=>64 translation and the like). kvm itself uses very little memory, so here it makes almost no difference between 32 and 64 bits (in 64-bit mode pointers are larger and hence usually a bit more memory is used). Yes, it is difficult to provide everything needed for sdl, but for our tasks SDL windows aren't really necessary, and for testing 32-bit mode works just fine too... 
/mjt 
Re: [PATCH] [Autotest] [KVM-AUTOTEST] fix tap interface for parallel execution
- "Yogananth Subramanian" wrote: > Adds support to create guests with different MAC address during > parallel > execution of autotest, this is done by creating worker dicts with > different "address_index" > > Signed-off-by: Yogananth Subramanian > --- > client/tests/kvm/kvm_scheduler.py |3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/client/tests/kvm/kvm_scheduler.py > b/client/tests/kvm/kvm_scheduler.py > index 93b7df6..9000391 100644 > --- a/client/tests/kvm/kvm_scheduler.py > +++ b/client/tests/kvm/kvm_scheduler.py > @@ -33,7 +33,8 @@ class scheduler: > # "Personal" worker dicts contain modifications that are applied > # specifically to each worker. For example, each worker must use a > # different environment file and a different MAC address pool. > -self.worker_dicts = [{"env": "env%d" % i} for i in > range(num_workers)] > +self.worker_dicts = [{"env": "env%d" % i, "address_index": i-1} > + for i in range(num_workers)] This approach won't work in the general case -- some tests use more than 1 VM and each VM requires a different address_index. address_pools.cfg defines, for each host, a MAC address pool. Every pool consists of several contiguous ranges, and looks something like this:
address_ranges = r1 r2 r3
address_range_base_mac_r1 = 52:54:00:12:34:56
address_range_size_r1 = 20
address_range_base_mac_r2 = 52:54:00:12:80:00
address_range_size_r2 = 20
... (more ranges here)
The pool itself needs to be split between the parallel workers, so that each worker has its own completely separate pool. In other words, the parameters address_ranges, address_range_base_mac_* and address_range_size_* need to be modified in 'self.worker_dicts', not address_index. For example, if a pool has 2 ranges, r1 and r2, and there are 3 workers, the pool needs to be cut into 4 contiguous pieces of roughly equal total size, r1 through r4 (splitting the original ranges where necessary), so that worker A gets r1, worker B gets [r2, r3] and worker C gets r4. This shouldn't be very hard. 
I'll see if I can work on a patch that will do this. 
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
> > You're much better off using a bulk-data transfer API that relaxes > > coherency requirements. IOW, shared memory doesn't make sense for TCG > > Rather, tcg doesn't make sense for shared memory smp. But we knew that > already. I think TCG SMP is a hard but soluble problem, especially when you're running guests used to coping with NUMA. TCG interacting with third parties via shared memory is probably never going to make sense. Paul 
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/10/2010 07:13 PM, Anthony Liguori wrote: On 03/10/2010 03:25 AM, Avi Kivity wrote: On 03/09/2010 11:44 PM, Anthony Liguori wrote: Ah yes. For cross tcg environments you can map the memory using mmio callbacks instead of directly, and issue the appropriate barriers there. Not good enough unless you want to severely restrict the use of shared memory within the guest. For instance, it's going to be useful to assume that your atomic instructions remain atomic. Crossing architecture boundaries here makes these assumptions invalid. A barrier is not enough. You could make the mmio callbacks flow to the shared memory server over the unix-domain socket, which would then serialize them. Still need to keep RMWs as single operations. When the host supports it, implement the operation locally (you can't render cmpxchg16b on i386, for example). But now you have a requirement that the shmem server runs in lock-step with the guest VCPU which has to happen for every single word of data transferred. Alternative implementation: expose a futex in a shared memory object and use that to serialize access. Now all accesses happen from vcpu context, and as long as there is no contention, should be fast, at least relative to tcg. You're much better off using a bulk-data transfer API that relaxes coherency requirements. IOW, shared memory doesn't make sense for TCG :-) Rather, tcg doesn't make sense for shared memory smp. But we knew that already. -- error compiling committee.c: too many arguments to function 
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/10/2010 03:25 AM, Avi Kivity wrote: On 03/09/2010 11:44 PM, Anthony Liguori wrote: Ah yes. For cross tcg environments you can map the memory using mmio callbacks instead of directly, and issue the appropriate barriers there. Not good enough unless you want to severely restrict the use of shared memory within the guest. For instance, it's going to be useful to assume that your atomic instructions remain atomic. Crossing architecture boundaries here makes these assumptions invalid. A barrier is not enough. You could make the mmio callbacks flow to the shared memory server over the unix-domain socket, which would then serialize them. Still need to keep RMWs as single operations. When the host supports it, implement the operation locally (you can't render cmpxchg16b on i386, for example). But now you have a requirement that the shmem server runs in lock-step with the guest VCPU which has to happen for every single word of data transferred. You're much better off using a bulk-data transfer API that relaxes coherency requirements. IOW, shared memory doesn't make sense for TCG :-) Regards, Anthony Liguori 
[Autotest] [KVM-AUTOTEST] Patch to fix tap interface support for parallel execution
Hello Lucas, I'd like to submit a patch to fix support for tap interfaces in kvm-autotest when executing it in parallel. This is done by creating workers with different "address_index", so that the MAC addresses of the guests created will be unique. The current implementation tries to create just different env files and not different MAC addresses or address_index values, so all the guests end up using the same address_index and MAC address. Thanks and Regards Yogi 
[PATCH] [Autotest] [KVM-AUTOTEST] fix tap interface for parallel execution
Adds support to create guests with different MAC address during parallel execution of autotest, this is done by creating worker dicts with different "address_index" Signed-off-by: Yogananth Subramanian --- client/tests/kvm/kvm_scheduler.py |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/client/tests/kvm/kvm_scheduler.py b/client/tests/kvm/kvm_scheduler.py index 93b7df6..9000391 100644 --- a/client/tests/kvm/kvm_scheduler.py +++ b/client/tests/kvm/kvm_scheduler.py @@ -33,7 +33,8 @@ class scheduler: # "Personal" worker dicts contain modifications that are applied # specifically to each worker. For example, each worker must use a # different environment file and a different MAC address pool. -self.worker_dicts = [{"env": "env%d" % i} for i in range(num_workers)] +self.worker_dicts = [{"env": "env%d" % i, "address_index": i-1} + for i in range(num_workers)] def worker(self, index, run_test_func): -- 1.6.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Inter-VM shared memory PCI device
On Wed, Mar 10, 2010 at 2:21 AM, Avi Kivity wrote: > On 03/09/2010 08:34 PM, Cam Macdonell wrote: >> >> On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity wrote: >> >>> >>> On 03/09/2010 05:27 PM, Cam Macdonell wrote: >>> > > >> >> Registers are used >> for synchronization between guests sharing the same memory object when >> interrupts are supported (this requires using the shared memory >> server). >> >> >> > > How does the driver detect whether interrupts are supported or not? > > At the moment, the VM ID is set to -1 if interrupts aren't supported, but that may not be the clearest way to do things. With UIO is there a way to detect if the interrupt pin is on? >>> >>> I suggest not designing the device to uio. Make it a good >>> guest-independent >>> device, and if uio doesn't fit it, change it. >>> >>> Why not support interrupts unconditionally? Is the device useful without >>> interrupts? >>> >> >> Currently my patch works with or without the shared memory server. If >> you give the parameter >> >> -ivshmem 256,foo >> >> then this will create (if necessary) and map /dev/shm/foo as the >> shared region without interrupt support. Some users of shared memory >> are using it this way. >> >> Going forward we can require the shared memory server and always have >> interrupts enabled. >> > > Can you explain how they synchronize? Polling? Using the network? Using > it as a shared cache? > > If it's a reasonable use case it makes sense to keep it. > Do you mean how they synchronize without interrupts? One project I've been contacted about uses the shared region directly for synchronization for simulations running in different VMs that share data in the memory region. In my tests spinlocks in the shared region work between guests. If we want to keep the serverless implementation, do we need to support shm_open with -chardev somehow? Something like -chardev shm,name=foo. Right now my qdev implementation just passes the name to the -device option and opens it. 
> Another thing comes to mind - a shared memory ID, in case a guest has > multiple cards. Sure, a number that can be passed on the command-line and stored in a register? Cam 
[ kvm-Bugs-2941282 ] Ubuntu 10.04 installer fails due to I/O errors with virtio
Bugs item #2941282, was opened at 2010-01-27 23:19 Message generated for change (Comment added) made by bschmidt You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2941282&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: Marti Raudsepp (intgr) Assigned to: Nobody/Anonymous (nobody) Summary: Ubuntu 10.04 installer fails due to I/O errors with virtio Initial Comment: I tried installing Ubuntu 10.04 and Fedora 12 in a KVM virtual machine using virtio, on a 8G raw file-backed disk. Both installers failed half-way due to I/O errors. So I tried reproducing it and managed to repeat it 6 times. The bug doesn't occur with IDE emulation. The bug happens fairly quickly with -smp 4 -- usually within 5 minutes -- but is much rarer with -smp 1. Ubuntu installer has kernel 2.6.32-11-generic Fedora 12 has kernel 2.6.31.5-127.fc12.x86_64 Host has kernel 2.6.32.6 (Arch Linux) and QEMU 0.12.2 When testing with -smp 1, it also produced a kernel oops from "block/blk-core.c:245". This line warns when the function is called with interrupts enabled: void blk_start_queue(struct request_queue *q) { WARN_ON(!irqs_disabled()); queue_flag_clear(QUEUE_FLAG_STOPPED, q); __blk_run_queue(q); } --- host machine --- [ma...@newn]% qemu-kvm --version QEMU PC emulator version 0.12.2 (qemu-kvm-0.12.2), Copyright (c) 2003-2008 Fabrice Bellard [ma...@newn]% ps aux |grep crash root 16283 31.4 7.1 427020 289960 ? 
Sl 22:44 8:37 /usr/bin/qemu-kvm -S -M pc-0.11 -enable-kvm -m 256 -smp 1 -name ubuntu-crashtest -uuid 0d7d4f2d-5589-160b-1f1b-75d46e293a2c -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/ubuntu-crashtest.monitor,server,nowait -monitor chardev:monitor -boot d -drive file=/store/iso/lucid-desktop-amd64.iso,if=ide,media=cdrom,index=2,format=raw -drive file=/store/virt/ubuntu-crashtest.img,if=virtio,index=0,format=raw -net nic,macaddr=52:54:00:45:e7:19,vlan=0,name=nic.0 -net tap,fd=43,vlan=0,name=tap.0 -serial none -parallel none -usb -usbdevice tablet -vnc 127.0.0.1:1 -k en-us -vga cirrus -soundhw es1370 -balloon virtio marti17700 0.0 0.0 8360 968 pts/4S+ 23:11 0:00 grep crash [ma...@newn]% stat /store/virt/ubuntu-crashtest.img File: `/store/virt/ubuntu-crashtest.img' Size: 8589934592 Blocks: 5615368IO Block: 4096 regular file Device: fe01h/65025dInode: 4718596 Links: 1 Access: (0600/-rw---) Uid: (0/root) Gid: (0/root) Access: 2010-01-27 22:43:45.128113080 +0200 Modify: 2010-01-27 23:09:11.523577452 +0200 Change: 2010-01-27 23:09:11.523577452 +0200 [ma...@newn]% uname -a Linux newn 2.6.32-ARCH #1 SMP PREEMPT Mon Jan 25 20:33:50 CET 2010 x86_64 AMD Phenom(tm) II X4 940 Processor AuthenticAMD GNU/Linux [ma...@newn]% cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 16 model : 4 model name : AMD Phenom(tm) II X4 940 Processor stepping: 2 cpu MHz : 800.000 cache size : 512 KB physical id : 0 siblings: 4 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt bogomips: 6028.69 TLB size: 1024 4K pages clflush size: 64 cache_alignment : 64 address sizes : 48 bits 
physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate *snip* three more CPU cores --- Ubuntu guest VM --- ubu...@ubuntu:/tmp$ uname -a Linux ubuntu 2.6.32-11-generic #15-Ubuntu SMP Tue Jan 19 20:38:41 UTC 2010 x86_64 GNU/Linux ubu...@ubuntu:/tmp$ cat /sys/block/vda/stat 7388948289 1661218 39497026765 947851 6284676 9459960 0 987890 9893220 ubu...@ubuntu:/tmp$ dmesg [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Linux version 2.6.32-11-generic (bui...@crested) (gcc version 4.4.3 20100116 (prerelease) (Ubuntu 4.4.2-9ubuntu4) ) #15-Ubuntu SMP Tue Jan 19 20:38:41 UTC 2010 (Ubuntu 2.6.32-11.15-generic) [0.00] Command line: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/ubuntu.seed boot=casper only-ubiquity initrd=/casper/initrd.lz quiet splash -- [0.00] KERNEL supported cpus: [0.00] Intel GenuineIntel [0.00] AMD AuthenticAMD [0.00] Centaur CentaurHauls [0.00] BIOS-provide
guest patched with pax causes "set_cr0: 0xffff88000[...] #GP, reserved bits 0x8004003?" flood on host
Hi, I've updated my host kernel headers to 2.6.33, rebuilt glibc (and the base system), rebuilt kvm. ... and now I get hundreds of those in dmesg on the host when I start a guest kernel that worked fine before. (2.6.33 + pax patch v5) set_cr0: 0x88000ec29d58 #GP, reserved bits 0x80040033 set_cr0: 0x88000f3cdb38 #GP, reserved bits 0x8004003b set_cr0: 0x88000f3dbc88 #GP, reserved bits 0x80040033 set_cr0: 0x88000f83b958 #GP, reserved bits 0x8004003b (hundreds of all 4) And the VM just reboots shortly after starting init. Funnily enough, I've got some VMs still running that kernel just fine! (as I started them before the headers+glibc+qemu-kvm rebuild) Now, you might just say that I shouldn't use out of tree patches like pax, but I just want to know one thing: should the guest kernel still be able to flood dmesg on the host like this? Thanks Antoine PS: Avi, are you still interested in seeing if this rebuild fixes the pread/glibc bug? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: how to tweak kernel to get the best out of kvm?
On 03/10/2010 05:57 PM, Javier Guerra Giraldez wrote: On Wed, Mar 10, 2010 at 8:15 AM, Avi Kivity wrote: 15 guests should fit comfortably, more with ksm running if the workloads are similar, or if you use ballooning. is there any simple way to get some stats to see how is ksm doing? See /sys/kernel/mm/ksm -- error compiling committee.c: too many arguments to function 
Re: how to tweak kernel to get the best out of kvm?
On Wed, Mar 10, 2010 at 8:15 AM, Avi Kivity wrote: > 15 guests should fit comfortably, more with ksm running if the workloads are > similar, or if you use ballooning. is there any simple way to get some stats to see how is ksm doing? -- Javier 
Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
On Wed, Mar 10, 2010 at 04:53:29PM +0200, Avi Kivity wrote: > On 03/10/2010 04:44 PM, Joerg Roedel wrote: > >On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote: > >>On 03/03/2010 09:12 PM, Joerg Roedel wrote: > >>>This patch changes the tdp_enabled flag from its global > >>>meaning to the mmu-context. This is necessary for Nested SVM > >>>with emulation of Nested Paging where we need an extra MMU > >>>context to shadow the Nested Nested Page Table. > >>> > >>> > >>>diff --git a/arch/x86/include/asm/kvm_host.h > >>>b/arch/x86/include/asm/kvm_host.h > >>>index ec891a2..e7bef19 100644 > >>>--- a/arch/x86/include/asm/kvm_host.h > >>>+++ b/arch/x86/include/asm/kvm_host.h > >>>@@ -254,6 +254,7 @@ struct kvm_mmu { > >>> int root_level; > >>> int shadow_root_level; > >>> union kvm_mmu_page_role base_role; > >>>+ bool tdp_enabled; > >>> > >>This needs a different name, since the old one is still around. > >>Perhaps we could call it parent_mmu and make it a kvm_mmu pointer. > >Hmm, how about renaming the global tdp_enabled variable to tdp_usable? > >The global variable indicates if tdp is _usable_ and we can _enable_ it > >for a mmu context. > > I think of the global flags as host tdp, and the mmu as guest tdp > (but maybe this is wrong?). If that makes sense, the naming should > reflect that. The basic flow of the mmu state with npt-npt is: 1. As long as the L1 is running the arch.mmu context is in tdp mode and builds a direct-mapped page table. 2. When vmrun is emulated and the nested vmcb enables nested paging, arch.mmu is switched to a shadow-mmu mode which now shadows the l1 nested page table. So when the l2-guest runs with nested paging the arch.mmu.tdp_enabled variable on the host is false. 3. On a vmexit emulation the mmu is switched back to tdp handling state. So the mmu.tdp_enabled parameter is about tdp being enabled for the mmu context (so mmu.tdp_enabled means that we build a l1-direct-mapped page table when true or shadow a l1-page-table when false). 
That's why I think the 'tdp_enabled' name makes sense in the mmu-context. The global flag only shows if an mmu-context could be in tdp-state. So tdp_usable may be a good name for it. Joerg 
[PATCH] QMP: Sync with upstream event changes
This commit contains the following QMP event changes to sync kvm_main_loop() with upstream: - Add the SHUTDOWN event (it's currently missing here) - Drop the RESET event (it's now emitted in qemu_system_reset()) - Drop the DEBUG event (it has been dropped upstream) Signed-off-by: Luiz Capitulino --- qemu-kvm.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index e417f21..2233a37 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -2038,6 +2038,7 @@ int kvm_main_loop(void) while (1) { main_loop_wait(1000); if (qemu_shutdown_requested()) { +monitor_protocol_event(QEVENT_SHUTDOWN, NULL); if (qemu_no_shutdown()) { vm_stop(0); } else @@ -2046,10 +2047,8 @@ int kvm_main_loop(void) monitor_protocol_event(QEVENT_POWERDOWN, NULL); qemu_irq_raise(qemu_system_powerdown); } else if (qemu_reset_requested()) { -monitor_protocol_event(QEVENT_RESET, NULL); qemu_kvm_system_reset(); } else if (kvm_debug_cpu_requested) { -monitor_protocol_event(QEVENT_DEBUG, NULL); gdb_set_stop_cpu(kvm_debug_cpu_requested); vm_stop(EXCP_DEBUG); kvm_debug_cpu_requested = NULL; -- 1.7.0.2.182.ge007 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
On 03/10/2010 04:44 PM, Joerg Roedel wrote: On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote: On 03/03/2010 09:12 PM, Joerg Roedel wrote: This patch changes the tdp_enabled flag from its global meaning to the mmu-context. This is necessary for Nested SVM with emulation of Nested Paging where we need an extra MMU context to shadow the Nested Nested Page Table. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ec891a2..e7bef19 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -254,6 +254,7 @@ struct kvm_mmu { int root_level; int shadow_root_level; union kvm_mmu_page_role base_role; + bool tdp_enabled; This needs a different name, since the old one is still around. Perhaps we could call it parent_mmu and make it a kvm_mmu pointer. Hmm, how about renaming the global tdp_enabled variable to tdp_usable? The global variable indicates if tdp is _usable_ and we can _enable_ it for a mmu context. I think of the global flags as host tdp, and the mmu as guest tdp (but maybe this is wrong?). If that makes sense, the naming should reflect that. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] KVM: MMU: Reinstate pte prefetch on invlpg
Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races. However, the SDM is adamant that prefetch is allowed: "The processor may create entries in paging-structure caches for translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path." And, in fact, there was a race in the prefetch code: we picked up the pte without the mmu lock held, so an older invlpg could install the pte over a newer invlpg. Reinstate the prefetch logic, but this time note whether another invlpg has executed using a counter. If a race occured, do not install the pte. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/mmu.c | 37 +++-- arch/x86/kvm/paging_tmpl.h | 15 +++ 3 files changed, 39 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index ec891a2..fb2afda 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -389,6 +389,7 @@ struct kvm_arch { unsigned int n_free_mmu_pages; unsigned int n_requested_mmu_pages; unsigned int n_alloc_mmu_pages; + atomic_t invlpg_counter; struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES]; /* * Hash table of struct kvm_mmu_page. 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 086025e..e821609 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2611,20 +2611,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	int flooded = 0;
 	int npte;
 	int r;
+	int invlpg_counter;
 
 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
-	switch (bytes) {
-	case 4:
-		gentry = *(const u32 *)new;
-		break;
-	case 8:
-		gentry = *(const u64 *)new;
-		break;
-	default:
-		gentry = 0;
-		break;
-	}
+	invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
 
 	/*
 	 * Assume that the pte write on a page table of the same type
@@ -2632,16 +2623,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	 * (might be false while changing modes). Note it is verified later
 	 * by update_pte().
 	 */
-	if (is_pae(vcpu) && bytes == 4) {
+	if ((is_pae(vcpu) && bytes == 4) || !new) {
 		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-		gpa &= ~(gpa_t)7;
-		r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+		if (is_pae(vcpu)) {
+			gpa &= ~(gpa_t)7;
+			bytes = 8;
+		}
+		r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
 		if (r)
 			gentry = 0;
+		new = (const u8 *)&gentry;
+	}
+
+	switch (bytes) {
+	case 4:
+		gentry = *(const u32 *)new;
+		break;
+	case 8:
+		gentry = *(const u64 *)new;
+		break;
+	default:
+		gentry = 0;
+		break;
 	}
 
 	mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
 	spin_lock(&vcpu->kvm->mmu_lock);
+	if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
+		gentry = 0;
 	kvm_mmu_access_page(vcpu, gfn);
 	kvm_mmu_free_some_pages(vcpu);
 	++vcpu->kvm->stat.mmu_pte_write;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4b37e1a..067797a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -463,6 +463,7 @@ out_unlock:
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct kvm_shadow_walk_iterator iterator;
+	gpa_t pte_gpa = -1;
 	int level;
 	u64 *sptep;
 	int need_flush = 0;
@@ -476,6 +477,10 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 		if (level == PT_PAGE_TABLE_LEVEL ||
 		    ((level == PT_DIRECTORY_LEVEL && is_large_pte(*sptep))) ||
 		    ((level == PT_PDPE_LEVEL && is_large_pte(*sptep)))) {
+			struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+			pte_gpa = (sp->gfn << PAGE_SHIFT);
+			pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
 
 			if (is_shadow_present_pte(*sptep)) {
 				rmap_remove(vcpu->kvm, sptep);
@@ -493,7 +498,17 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 	if (need_flush)
 		kvm_flush_remote_tlbs(vcpu->kvm);
+
+	atomic_inc(&vcpu->kvm->arch.invlpg_counter);
+
 	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	if (pte_gpa == -1)
+		return;
+
+	if (mmu_topup_memory_caches(vcpu)
[PATCH 4/5] KVM: MMU: Do not instantiate nontrapping spte on unsync page
The update_pte() path currently uses a nontrapping spte when a nonpresent (or nonaccessed) gpte is written. This is fine since at present it is only used on sync pages. However, on an unsync page this will cause an endless fault loop as the guest is under no obligation to invlpg a gpte that transitions from nonpresent to present.

Needed for the next patch which reinstates update_pte() on invlpg.

Signed-off-by: Avi Kivity
---
 arch/x86/kvm/paging_tmpl.h |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 81eab9a..4b37e1a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -258,11 +258,17 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 	pt_element_t gpte;
 	unsigned pte_access;
 	pfn_t pfn;
+	u64 new_spte;
 
 	gpte = *(const pt_element_t *)pte;
 	if (~gpte & (PT_PRESENT_MASK | PT_ACCESSED_MASK)) {
-		if (!is_present_gpte(gpte))
-			__set_spte(spte, shadow_notrap_nonpresent_pte);
+		if (!is_present_gpte(gpte)) {
+			if (page->unsync)
+				new_spte = shadow_trap_nonpresent_pte;
+			else
+				new_spte = shadow_notrap_nonpresent_pte;
+			__set_spte(spte, new_spte);
+		}
 		return;
 	}
 	pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
-- 
1.7.0.2
[PATCH 2/5] KVM: Make locked operations truly atomic
Once upon a time, locked operations were emulated while holding the mmu mutex. Since mmu pages were write protected, it was safe to emulate the writes in a non-atomic manner, since there could be no other writer, either in the guest or in the kernel.

These days emulation takes place without holding the mmu spinlock, so the write could be preempted by an unshadowing event, which exposes the page to writes by the guest. This may cause corruption of guest page tables.

Fix by using an atomic cmpxchg for these operations.

Signed-off-by: Avi Kivity
---
 arch/x86/kvm/x86.c |   69
 1 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3753c11..8558a1c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3310,41 +3310,68 @@ int emulator_write_emulated(unsigned long addr,
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
+#define CMPXCHG_TYPE(t, ptr, old, new) \
+	(cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old))
+
+#ifdef CONFIG_X86_64
+# define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new)
+#else
+# define CMPXCHG64(ptr, old, new) \
+	(cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old))
+#endif
+
 static int emulator_cmpxchg_emulated(unsigned long addr,
 				     const void *old,
 				     const void *new,
 				     unsigned int bytes,
 				     struct kvm_vcpu *vcpu)
 {
-	printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
-#ifndef CONFIG_X86_64
-	/* guests cmpxchg8b have to be emulated atomically */
-	if (bytes == 8) {
-		gpa_t gpa;
-		struct page *page;
-		char *kaddr;
-		u64 val;
+	gpa_t gpa;
+	struct page *page;
+	char *kaddr;
+	bool exchanged;
 
-		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
+	/* guests cmpxchg8b have to be emulated atomically */
+	if (bytes > 8 || (bytes & (bytes - 1)))
+		goto emul_write;
 
-		if (gpa == UNMAPPED_GVA ||
-		    (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
-			goto emul_write;
+	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
 
-		if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
-			goto emul_write;
+	if (gpa == UNMAPPED_GVA ||
+	    (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
+		goto emul_write;
 
-		val = *(u64 *)new;
+	if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
+		goto emul_write;
 
-		page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+	page = gfn_to_page(vcpu->kvm, gpa >> PAGE_SHIFT);
 
-		kaddr = kmap_atomic(page, KM_USER0);
-		set_64bit((u64 *)(kaddr + offset_in_page(gpa)), val);
-		kunmap_atomic(kaddr, KM_USER0);
-		kvm_release_page_dirty(page);
+	kaddr = kmap_atomic(page, KM_USER0);
+	kaddr += offset_in_page(gpa);
+	switch (bytes) {
+	case 1:
+		exchanged = CMPXCHG_TYPE(u8, kaddr, old, new);
+		break;
+	case 2:
+		exchanged = CMPXCHG_TYPE(u16, kaddr, old, new);
+		break;
+	case 4:
+		exchanged = CMPXCHG_TYPE(u32, kaddr, old, new);
+		break;
+	case 8:
+		exchanged = CMPXCHG64(kaddr, old, new);
+		break;
+	default:
+		BUG();
 	}
+	kunmap_atomic(kaddr, KM_USER0);
+	kvm_release_page_dirty(page);
+
+	if (!exchanged)
+		return X86EMUL_CMPXCHG_FAILED;
+
 emul_write:
-#endif
+	printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
 	return emulator_write_emulated(addr, new, bytes, vcpu);
 }
-- 
1.7.0.2
[PATCH 0/5] Fix some mmu/emulator atomicity issues (v2)
Currently when we emulate a locked operation into a shadowed guest page table, we perform a write rather than a true atomic. This is indicated by the "emulating exchange as write" message that shows up in dmesg.

In addition, the pte prefetch operation during invlpg suffered from a race. This was fixed by removing the operation.

This patchset fixes both issues and reinstates pte prefetch on invlpg.

v2:
 - fix truncated description for patch 1
 - add new patch 4, which fixes a bug in patch 5

Avi Kivity (5):
  KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()
  KVM: Make locked operations truly atomic
  KVM: Don't follow an atomic operation by a non-atomic one
  KVM: MMU: Do not instantiate nontrapping spte on unsync page
  KVM: MMU: Reinstate pte prefetch on invlpg

 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |   78 +++---
 arch/x86/kvm/paging_tmpl.h      |   25 +-
 arch/x86/kvm/x86.c              |  101 ---
 4 files changed, 137 insertions(+), 68 deletions(-)
[PATCH 1/5] KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()
kvm_mmu_pte_write() reads guest ptes in two different occasions, both to allow a 32-bit pae guest to update a pte with 4-byte writes. Consolidate these into a single read, which also allows us to consolidate another read from an invlpg speculating a gpte into the shadow page table.

Signed-off-by: Avi Kivity
---
 arch/x86/kvm/mmu.c |   69 +++
 1 files changed, 31 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 741373e..086025e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2558,36 +2558,11 @@ static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
 }
 
 static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-					  const u8 *new, int bytes)
+					  u64 gpte)
 {
 	gfn_t gfn;
-	int r;
-	u64 gpte = 0;
 	pfn_t pfn;
 
-	if (bytes != 4 && bytes != 8)
-		return;
-
-	/*
-	 * Assume that the pte write on a page table of the same type
-	 * as the current vcpu paging mode.  This is nearly always true
-	 * (might be false while changing modes).  Note it is verified later
-	 * by update_pte().
-	 */
-	if (is_pae(vcpu)) {
-		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-		if ((bytes == 4) && (gpa % 4 == 0)) {
-			r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8);
-			if (r)
-				return;
-			memcpy((void *)&gpte + (gpa % 8), new, 4);
-		} else if ((bytes == 8) && (gpa % 8 == 0)) {
-			memcpy((void *)&gpte, new, 8);
-		}
-	} else {
-		if ((bytes == 4) && (gpa % 4 == 0))
-			memcpy((void *)&gpte, new, 4);
-	}
 	if (!is_present_gpte(gpte))
 		return;
 	gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -2638,7 +2613,34 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	int r;
 
 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
-	mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes);
+
+	switch (bytes) {
+	case 4:
+		gentry = *(const u32 *)new;
+		break;
+	case 8:
+		gentry = *(const u64 *)new;
+		break;
+	default:
+		gentry = 0;
+		break;
+	}
+
+	/*
+	 * Assume that the pte write on a page table of the same type
+	 * as the current vcpu paging mode.  This is nearly always true
+	 * (might be false while changing modes).  Note it is verified later
+	 * by update_pte().
+	 */
+	if (is_pae(vcpu) && bytes == 4) {
+		/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
+		gpa &= ~(gpa_t)7;
+		r = kvm_read_guest(vcpu->kvm, gpa, &gentry, 8);
+		if (r)
+			gentry = 0;
+	}
+
+	mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
 	spin_lock(&vcpu->kvm->mmu_lock);
 	kvm_mmu_access_page(vcpu, gfn);
 	kvm_mmu_free_some_pages(vcpu);
@@ -2703,20 +2705,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 			continue;
 		}
 		spte = &sp->spt[page_offset / sizeof(*spte)];
-		if ((gpa & (pte_size - 1)) || (bytes < pte_size)) {
-			gentry = 0;
-			r = kvm_read_guest_atomic(vcpu->kvm,
-						  gpa & ~(u64)(pte_size - 1),
-						  &gentry, pte_size);
-			new = (const void *)&gentry;
-			if (r < 0)
-				new = NULL;
-		}
 		while (npte--) {
 			entry = *spte;
 			mmu_pte_write_zap_pte(vcpu, sp, spte);
-			if (new)
-				mmu_pte_write_new_pte(vcpu, sp, spte, new);
+			if (gentry)
+				mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
 			mmu_pte_write_flush_tlb(vcpu, entry, *spte);
 			++spte;
 		}
-- 
1.7.0.2
[PATCH 3/5] KVM: Don't follow an atomic operation by a non-atomic one
Currently emulated atomic operations are immediately followed by a non-atomic operation, so that kvm_mmu_pte_write() can be invoked. This updates the mmu but undoes the whole point of doing things atomically.

Fix by only performing the atomic operation and the mmu update, and avoiding the non-atomic write.

Signed-off-by: Avi Kivity
---
 arch/x86/kvm/x86.c |   32 +---
 1 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8558a1c..4cd56c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3253,7 +3253,8 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 
 static int emulator_write_emulated_onepage(unsigned long addr,
 					   const void *val,
 					   unsigned int bytes,
-					   struct kvm_vcpu *vcpu)
+					   struct kvm_vcpu *vcpu,
+					   bool mmu_only)
 {
 	gpa_t gpa;
 	u32 error_code;
@@ -3269,6 +3270,10 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 	if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
 		goto mmio;
 
+	if (mmu_only) {
+		kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1);
+		return X86EMUL_CONTINUE;
+	}
 	if (emulator_write_phys(vcpu, gpa, val, bytes))
 		return X86EMUL_CONTINUE;
 
@@ -3289,24 +3294,35 @@ mmio:
 	return X86EMUL_CONTINUE;
 }
 
-int emulator_write_emulated(unsigned long addr,
-			    const void *val,
-			    unsigned int bytes,
-			    struct kvm_vcpu *vcpu)
+static int __emulator_write_emulated(unsigned long addr,
+				     const void *val,
+				     unsigned int bytes,
+				     struct kvm_vcpu *vcpu,
+				     bool mmu_only)
 {
 	/* Crossing a page boundary? */
 	if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
 		int rc, now;
 
 		now = -addr & ~PAGE_MASK;
-		rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
+		rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
+						     mmu_only);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		addr += now;
 		val += now;
 		bytes -= now;
 	}
-	return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
+	return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
+					       mmu_only);
+}
+
+int emulator_write_emulated(unsigned long addr,
+			    const void *val,
+			    unsigned int bytes,
+			    struct kvm_vcpu *vcpu)
+{
+	return __emulator_write_emulated(addr, val, bytes, vcpu, false);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3370,6 +3386,8 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
 	if (!exchanged)
 		return X86EMUL_CMPXCHG_FAILED;
 
+	return __emulator_write_emulated(addr, new, bytes, vcpu, true);
+
 emul_write:
 	printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
 
-- 
1.7.0.2
Re: [PATCH 18/18] KVM: X86: Add KVM_CAP_SVM_CPUID_FIXED
On Mon, Mar 08, 2010 at 11:39:31AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >This capability shows userspace that it can trust the values
> >of cpuid[0x8000000A] that it gets from the kernel. Old
> >behavior was to just return the host cpuid values, which is
> >broken because all additional svm-features need support in
> >the svm emulation code.
> >
> 
> I think we can simply fix the bug and push the fix to the various
> stable queues.

Ok, sounds good too. I have some more fixes queued up and will send this one together with them.

	Joerg
Re: [PATCH 11/18] KVM: MMU: Add infrastructure for two-level page walker
On Mon, Mar 08, 2010 at 11:37:22AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >This patch introduces a mmu-callback to translate gpa
> >addresses in the walk_addr code. This is later used to
> >translate l2_gpa addresses into l1_gpa addresses.
> >
> >Signed-off-by: Joerg Roedel
> >---
> > arch/x86/include/asm/kvm_host.h |    1 +
> > arch/x86/kvm/mmu.c              |    7 +++
> > arch/x86/kvm/paging_tmpl.h      |   19 +++
> > include/linux/kvm_host.h        |    5 +
> > 4 files changed, 32 insertions(+), 0 deletions(-)
> >
> >diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >index c0b5576..76c8b5f 100644
> >--- a/arch/x86/include/asm/kvm_host.h
> >+++ b/arch/x86/include/asm/kvm_host.h
> >@@ -250,6 +250,7 @@ struct kvm_mmu {
> > 	void (*free)(struct kvm_vcpu *vcpu);
> > 	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
> > 			    u32 *error);
> >+	gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error);
> > 	void (*prefetch_page)(struct kvm_vcpu *vcpu,
> > 			      struct kvm_mmu_page *page);
> > 	int (*sync_page)(struct kvm_vcpu *vcpu,
> 
> I think placing this here means we will miss a few translations,
> namely when we do a physical access (say, reading PDPTEs or
> similar).
> 
> We need to do this on the level of kvm_read_guest() so we capture
> physical accesses:
> 
>   kvm_read_guest_virt
>     -> walk_addr
>       -> kvm_read_guest_tdp
>         -> kvm_read_guest_virt
>           -> walk_addr
>             -> kvm_read_guest_tdp
>               -> kvm_read_guest
> 
> Of course, not all accesses will use kvm_read_guest_tdp; for example
> kvmclock accesses should still go untranslated.

Ok, doing the translation in kvm_read_guest is certainly the more generic approach. I already fixed a bug related to loading l2 pdptr pointers. Doing the translation in kvm_read_guest makes the code a lot nicer.

	Joerg
Re: [PATCH 02/18] KVM: MMU: Make tdp_enabled a mmu-context parameter
On Mon, Mar 08, 2010 at 11:17:41AM +0200, Avi Kivity wrote:
> On 03/03/2010 09:12 PM, Joerg Roedel wrote:
> >This patch changes the tdp_enabled flag from its global
> >meaning to the mmu-context. This is necessary for Nested SVM
> >with emulation of Nested Paging where we need an extra MMU
> >context to shadow the Nested Nested Page Table.
> >
> >diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >index ec891a2..e7bef19 100644
> >--- a/arch/x86/include/asm/kvm_host.h
> >+++ b/arch/x86/include/asm/kvm_host.h
> >@@ -254,6 +254,7 @@ struct kvm_mmu {
> > 	int root_level;
> > 	int shadow_root_level;
> > 	union kvm_mmu_page_role base_role;
> >+	bool tdp_enabled;
> 
> This needs a different name, since the old one is still around.
> Perhaps we could call it parent_mmu and make it a kvm_mmu pointer.

Hmm, how about renaming the global tdp_enabled variable to tdp_usable? The global variable indicates if tdp is _usable_ and we can _enable_ it for a mmu context.

	Joerg
Re: [PATCH 19/24] KVM: x86 emulator: fix in/out emulation.
On Wed, Mar 10, 2010 at 11:12:34AM +0200, Avi Kivity wrote:
> On 03/09/2010 08:09 PM, Gleb Natapov wrote:
> >
> >>We don't want to enter the emulator for non-string in/out. Leftover
> >>test code?
> >>
> >No, unfortunately this is not leftover. I just don't see a way how we
> >can bypass the emulator and still have the emulator be able to emulate
> >in/out (for big real mode for instance). The problem is basically
> >described in the commit message. If we have a function outside of the
> >emulator that does in/out emulation on the vcpu directly, then the
> >emulator can't use it since committing shadowed registers will
> >overwrite the result of emulation. Having two different emulations
> >(one outside of the emulator and another in the emulator) is also
> >problematic since when userspace returns after an IO exit we don't
> >know which emulation to continue. If we want to avoid instruction
> >decoding we can fill in the emulation context from exit info as if
> >the instruction was already decoded and call the emulator.
> >
> 
> Alternatively, another entry point would be fine. in/out is a fast
> path (used for virtio for example).

You mean another entry point into the emulator, not a separate implementation for emulated in/out and the intercepted one. If yes, this is what I mean by "faking" the decoding stage.

--
	Gleb.
[ kvm-Bugs-2962575 ] MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled
Bugs item #2962575, was opened at 2010-03-03 13:20
Message generated for change (Comment added) made by erikvdk
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2962575&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: intel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Erik van der Kouwe (erikvdk)
Assigned to: Nobody/Anonymous (nobody)
Summary: MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled

Initial Comment:
Dear all,

If one runs the following commands after installing qemu-0.12.3 or qemu-kvm-0.12.3:

wget http://www.minix3.org/download/minix_R3.1.6-r6084.iso.bz2
bunzip2 minix_R3.1.6-r6084.iso.bz2
qemu-system-x86_64 -cdrom minix_R3.1.6-r6084.iso -enable-kvm

and presses 1 (Regular MINIX 3), the following error message results when loading MINIX:

kvm: unhandled exit 80000021
kvm_run returned -22

The guest stops after that. This error message does not occur without the -enable-kvm switch. It does not occur with qemu-kvm-0.11.0 as bundled with Ubuntu. The problem occurs with the "qemu" binary from qemu-0.12.3 as well as "qemu-system-x86_64" from qemu-kvm-0.12.3, but in the former case no error message is printed.

The code that is running when it fails is in https://gforge.cs.vu.nl/gf/project/minix/scmsvn/?action=browse&path=%2Ftrunk%2Fsrc%2Fboot%2Fboothead.s&revision=5918&view=markup. It happens in ext_copy:

ext_copy:
	mov	x_dst_desc+2, ax
	movb	x_dst_desc+4, dl	! Set base of destination segment
	mov	ax, 8(bp)
	mov	dx, 10(bp)
	mov	x_src_desc+2, ax
	movb	x_src_desc+4, dl	! Set base of source segment
	mov	si, #x_gdt		! es:si = global descriptor table
	shr	cx, #1			! Words to move
	movb	ah, #0x87		! Code for extended memory move
	int	0x15

The line that fails is "int 0x15", which performs a BIOS call to copy data from low memory to above the 1MB barrier. The machine is running in 16-bit real mode when this code is executed.

Output for "uname -a" on the host:
Linux hp364 2.6.31-20-generic #57-Ubuntu SMP Mon Feb 8 09:05:19 UTC 2010 i686 GNU/Linux

Output for "cat /proc/cpuinfo" on the host:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU E8600 @ 3.33GHz
stepping	: 10
cpu MHz		: 1998.000
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips	: 6650.50
clflush size	: 64
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU E8600 @ 3.33GHz
stepping	: 10
cpu MHz		: 1998.000
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips	: 6649.80
clflush size	: 64
power management:

With kind regards,
Erik

----------------------------------------------------------------------

Comment By: Erik van der Kouwe (erikvdk)
Date: 2010-03-10 15:16

Message:
Thanks to Avi Kivity I now have a workaround for this issue, namely to 16-byte align the addresses in the GDT passed to the BIOS extended copy function. The BIOS left the unaligned descriptor, causing MINIX to operate in unreal mode, which is not well supported by KVM on Intel.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2962575&group_id=180599
[ kvm-Bugs-2967396 ] Workaround
Bugs item #2967396, was opened at 2010-03-10 15:13
Message generated for change (Comment added) made by erikvdk
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2967396&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
>Status: Deleted
Resolution: None
Priority: 5
Private: No
Submitted By: Erik van der Kouwe (erikvdk)
Assigned to: Nobody/Anonymous (nobody)
Summary: Workaround

Initial Comment:
Thanks to Avi Kivity I now have a workaround for this issue, namely to 16-byte align the addresses in the GDT passed to the BIOS extended copy function. The BIOS left the unaligned descriptor, causing MINIX to operate in unreal mode, which is not well supported by KVM on Intel.

----------------------------------------------------------------------

>Comment By: Erik van der Kouwe (erikvdk)
Date: 2010-03-10 15:14

Message:
Oops, this was supposed to be a comment to another report

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2967396&group_id=180599
[ kvm-Bugs-2967396 ] Workaround
Bugs item #2967396, was opened at 2010-03-10 15:13
Message generated for change (Tracker Item Submitted) made by erikvdk
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2967396&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Erik van der Kouwe (erikvdk)
Assigned to: Nobody/Anonymous (nobody)
Summary: Workaround

Initial Comment:
Thanks to Avi Kivity I now have a workaround for this issue, namely to 16-byte align the addresses in the GDT passed to the BIOS extended copy function. The BIOS left the unaligned descriptor, causing MINIX to operate in unreal mode, which is not well supported by KVM on Intel.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2967396&group_id=180599
Re: Question on stopping KVM start at boot
> Hi folks.
>
> Host - ubuntu 9.10 64bit
> Virtualizer - KVM
>
> I need to stop KVM starting at boot.
>
> I added following 2 lines at the bottom of /etc/modprobe.d/blacklist.conf
> blacklist kvm
> blacklist kvm-amd
>
> Reboot PC
>
> It doesn't work.
>
> $ lsmod | grep kvm
> kvm_amd                41556  0
> kvm                   190648  1 kvm_amd
>
> Please what further command I have to run in order to activate the new
> blacklist.conf ?

For Ubuntu, you can just use update-rc.d:

  sudo update-rc.d kvm disable

to disable kvm, and

  sudo update-rc.d kvm enable

to enable it again.
Re: [PATCH] Inter-VM shared memory PCI device
On Tuesday 09 March 2010, Cam Macdonell wrote:
> >
> > We could make the masking in RAM, not in registers, like virtio, which would
> > require no exits. It would then be part of the application specific
> > protocol and out of scope of this spec.
> >
>
> This kind of implementation would be possible now since with UIO it's
> up to the application whether to mask interrupts or not and what
> interrupts mean. We could leave the interrupt mask register for those
> who want that behaviour. Arnd's idea would remove the need for the
> Doorbell and Mask, but we will always need at least one MMIO register
> to send whatever interrupts we do send.

You'd also have to be very careful if the notification is in RAM to avoid races between one guest triggering an interrupt and another guest clearing its interrupt mask.

A totally different option that avoids this whole problem would be to separate the signalling from the shared memory, making the PCI shared memory device a trivial device with a single memory BAR, and using a higher-level concept like a virtio based serial line for the actual signalling.

	Arnd
PCI capabilities support for assigned devices
Hi *,

in qemu-kvm/hw/device-assignment.c assigned_device_pci_cap_init() apparently only PCI_CAP_ID_MSI and PCI_CAP_ID_MSIX are exposed to the guest. Linux Broadcom bnx2 and tg3 drivers expect PCI_CAP_ID_PM to be present. Are there any plans to implement this and possibly other PCI capability features for assigned devices? If not, is there a list of network cards known to work with PCI assignment in KVM?

Best Regards,
Sebastian
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
On Wed, Mar 10, 2010 at 07:08:31PM +0900, Takuya Yoshikawa wrote:
> Gleb Natapov wrote:
>
> >>>Entering guest from time to time will not change semantics of the
> >>>processor (if code is not modified under processor's feet at least).
> >>>Currently we reenter guest mode after each iteration of string
> >>>instruction for all instructions but ins/outs.
> >>>
> >>E.g., is there no chance that during the repetitions, in the middle of the
> >>repetitions, page faults occur? If it can, without entering the guest, can
> >>we handle it?
> >> -- I lack some basic assumptions?
> >>
> >If a page fault occurs we inject it into the guest.
>
> Oh, I might have failed to tell what I worried about.
> Opposite, I mean, I worried about the NOT reentering the guest case.

Are you thinking about something specific here? If we inject exceptions when they occur and we inject interrupts when they arrive, what problem do you see? I guess this is how a real CPU actually works. I doubt it re-reads the string instruction on each iteration.

> I know that the current implementation with reentrance is OK.

The current implementation does not reenter the guest on each iteration for pio string, so currently we have both variants.

> To inject a page fault without reentering the guest, we need to add
> some more hacks to the emulator IIUC.

No, we just need to enter the guest if an exception happens. I see that this is handled incorrectly in my current patch series.

--
	Gleb.
Re: [PATCH 2/2] KVM test: Support to SLES install
On Wed, Mar 10, 2010 at 8:45 AM, Lucas Meneghel Rodrigues wrote: > From: yogi > > Adds new entry "SUSE" in test_base file for sles and > contains autoinst file for doing unatteneded Sles11 64-bit > install. Oh Yogi, by the way, could you please reorganize the opensuse session and add at least an autoyast file for opensuse 11.2 so I can actually test if we can get a successful installation? I tried to play with the XML file of SLES to see if I could get opensuse 11.2 installed, but it turns out that those config files are an endless XML nightmare and all I tried makes yast to die. The mechanics of the whole thing are correct, I can get yast to start with no problems, but parsing the autoyast file makes the VM to hang. So I am fine with adding the patch, but it'd be nice to have an OS with irrestrict access that everybody could play with (opensuse). I don't have enough time to make it work on my own, so if you have some spare time, please work on this. > Signed-off-by: Yogananth Subramanian > --- > client/tests/kvm/tests_base.cfg.sample | 22 ++ > 1 files changed, 22 insertions(+), 0 deletions(-) > > diff --git a/client/tests/kvm/tests_base.cfg.sample > b/client/tests/kvm/tests_base.cfg.sample > index c76470d..acb2076 100644 > --- a/client/tests/kvm/tests_base.cfg.sample > +++ b/client/tests/kvm/tests_base.cfg.sample > @@ -503,6 +503,28 @@ variants: > md5sum = 2afee1b8a87175e6dee2b8dbbd1ad8e8 > md5sum_1m = 768ca32503ef92c28f2d144f2a87e4d0 > > + - SLES: > + no setup > + shell_prompt = "^r...@.*[\#\$]\s*$|#" > + unattended_install: > + pxe_image = "linux" > + pxe_initrd = "initrd" > + extra_params += " -bootp /pxelinux.0 -boot n" > + kernel_args = "autoyast=floppy" > + > + variants: > + - 11.64: > + no setup > + image_name = sles11-64 > + cdrom=linux/SLES-11-DVD-x86_64-GM-DVD1.iso > + md5sum = 50a2bd45cd12c3808c3ee48208e2586b > + md5sum_1m = 0951cab7c32e332362fc424c1054 > + unattended_install: > + unattended_file = > unattended/Sles11-64-autoinst.xml > + tftp = 
"images/sles11-64/tftpboot" > + floppy = "images/sles11-64floppy.img" > + pxe_dir = "boot/x86_64/loader" > + > - @Ubuntu: > shell_prompt = "^r...@.*[\#\$]\s*$" > > -- > 1.6.6.1 > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Lucas -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: how to tweak kernel to get the best out of kvm?
On 03/10/2010 02:58 PM, Harald Dunkel wrote: Hi Avi, On 03/08/10 12:02, Avi Kivity wrote: On 03/05/2010 05:20 PM, Harald Dunkel wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi folks, Problem: My kvm server (8 cores, 64 GByte RAM, amd64) can eat up all block device or file system performance, so that the kvm clients become almost unresponsive. This is _very_ bad. I would like to make sure that the kvm clients do not affect each other, and that all (including the server itself) get a fair part of computing power and memory space. Please describe the issue in detail, provide output from 'vmstat' and 'top'. Sorry for the delay. I cannot put these services at risk, so I have setup a test environment on another host (2 quadcore Xeons, ht enabled, 32 GByte RAM, no swap, bridged networking) to reproduce the problem. There are 8 virtual hosts, each with a single CPU, 1 GByte RAM and 4 GByte swap on a virtual disk. The virtual disks are image files in the local file system. These images are not shared. For testing each virtual host builds the Linux kernel. In parallel I am running rsync to clone a remote virtual machine (22 GByte) to the local physical disk. Attached you can find the requested logs. The kern.log shows the problem: The virtual CPUs get stuck (as it seems). Several virtual hosts showed this effect. One vhost was unresponsive for more than 30 minutes. Surely this is a stress test, but I had a similar effect with our virtual mail server on the production system, while I was running a similar rsync session. mailhost was unresponsive for more than 2 minutes, then it was back. The other 8 virtual hosts on this system were started, but idle (AFAICT). You have tons of iowait time, indicating an I/O bottleneck. What filesystem are you using for the host? Are you using qcow2 or raw access? What's the qemu command line. Perhaps your filesystem doesn't perform well on synchronous writes. For testing only, you might try cache=writeback. 
BTW, please note that free memory goes down over time. This happens only if the rsync is running. Without rsync the free memory is stable. That's expected. rsync fills up the guest and host pagecache, both drain free memory (the guest only until it has touched all of its memory). What config options would you suggest to build and run a Linux kernel optimized for running kvm clients? Sorry for asking, but AFAICS some general guidelines for kvm are missing here. Of course I saw a lot of options in Documentation/kernel-parameters.txt, but unfortunately I am not a kernel hacker. Any helpful comment would be highly appreciated. One way to ensure guests don't affect each other is not to overcommit, that is make sure each guest gets its own cores, there is enough memory for all guests, and guests have separate disks. Of course that defeats some of the reasons for virtualizing in the first place; but if you share resources, some compromises must be made. How many virtual machines would you assume I could run on a host with 64 GByte RAM, 2 quad cores, a bonding NIC with 4*1Gbit/sec and a hardware RAID? Each vhost is supposed to get 4 GByte RAM and 1 CPU. 15 guests should fit comfortably, more with ksm running if the workloads are similar, or if you use ballooning. If you do share resources, then Linux manages how they are shared. The scheduler will share the processors, the memory management subsystem will share memory, and the I/O scheduler will share disk bandwidth. If you see a problem in one of these areas you will need to tune the subsystem that is misbehaving. Do you think that the bridge connecting the tunnel devices and the real NIC causes the problems? Is there also a subsystem managing network access? Here the problem is likely the host filesystem and/or I/O scheduler. The optimal layout is placing guest disks in LVM volumes, and accessing them with -drive file=...,cache=none. However, file-based access should also work. 
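The two disk layouts discussed here can be sketched as qemu command lines. The image paths, memory size, and virtio interface below are illustrative assumptions, not taken from Harald's actual setup:

```shell
# File-backed qcow2 image going through the host page cache; cache=writeback
# is fast for testing but unsafe if the host crashes (path is a placeholder):
qemu-system-x86_64 -m 1024 -smp 1 \
    -drive file=/var/lib/images/guest1.qcow2,if=virtio,cache=writeback

# LVM-backed volume with the host page cache bypassed -- the layout suggested
# above for keeping guests from starving each other on I/O:
qemu-system-x86_64 -m 1024 -smp 1 \
    -drive file=/dev/vg0/guest1,if=virtio,cache=none
```

With cache=none, each guest's writes go straight to its own volume, so the host pagecache no longer shrinks as rsync runs.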
-- error compiling committee.c: too many arguments to function
Re: Question on stopping KVM start at boot
On Wed, Mar 10, 2010 at 04:07:09PM +0800, sati...@pacific.net.hk wrote: > Hi folks. > > Host - ubuntu 9.10 64bit > Virtualizer - KVM > > I need to stop KVM starting at boot. > > I added the following 2 lines at the bottom of /etc/modprobe.d/blacklist.conf > blacklist kvm > blacklist kvm-amd > > > Reboot PC > > It doesn't work. > > $ lsmod | grep kvm > kvm_amd 41556 0 > kvm 190648 1 kvm_amd > > > What further command do I have to run in order to activate the > new blacklist.conf? /etc/init.d/qemu-kvm loads the module. So you probably must not run it on startup, or teach it how to read the blacklist. Perhaps this is more appropriate to ask on an Ubuntu-specific mailing list, since this is an Ubuntu problem. Thanks, Rodrigo
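A sketch of why the blacklist alone fails, and of the "install" override that also blocks explicit loads. The config is written to a local file purely for illustration; on a real system it would live under /etc/modprobe.d/, and the module names are the ones from the thread:

```shell
# A bare "blacklist" line only stops alias/udev autoloading; an init script
# that runs "modprobe kvm-amd" explicitly still loads the module.  An
# "install <mod> /bin/true" override makes even an explicit modprobe a no-op.
conf=./blacklist-kvm.conf          # illustrative path, not /etc/modprobe.d/
cat > "$conf" <<'EOF'
blacklist kvm
blacklist kvm-amd
install kvm /bin/true
install kvm-amd /bin/true
EOF
grep -c '^install' "$conf"
```

The remaining step on Ubuntu would be to stop the /etc/init.d/qemu-kvm script from running at boot at all.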
Re: [PATCH 2/2] arch/x86/kvm/* Checkpatch cleanup
What have you based these changes on? I already did most of the cleanups to svm.c you have made here, and the patch should be in avi/master. Joerg On Wed, Mar 10, 2010 at 12:37:47PM +0100, Andrea Gelmini wrote: > Fixes for all files > > Signed-off-by: Andrea Gelmini > --- > arch/x86/kvm/emulate.c | 139 > +++--- > arch/x86/kvm/i8254.c |8 +-- > arch/x86/kvm/i8254.h | 12 ++-- > arch/x86/kvm/i8259.c |3 +- > arch/x86/kvm/kvm_timer.h |6 +- > arch/x86/kvm/lapic.c |6 +- > arch/x86/kvm/mmu.c | 17 +++--- > arch/x86/kvm/mmutrace.h |6 +- > arch/x86/kvm/svm.c | 77 +- > arch/x86/kvm/trace.h | 12 ++-- > arch/x86/kvm/vmx.c | 44 +++--- > arch/x86/kvm/x86.c | 18 +++--- > arch/x86/kvm/x86.h |2 +- > 13 files changed, 170 insertions(+), 180 deletions(-)
[PATCH 2/2] KVM test: Support to SLES install
From: yogi Adds new entry "SUSE" in test_base file for sles and contains autoinst file for doing unatteneded Sles11 64-bit install. Signed-off-by: Yogananth Subramanian --- client/tests/kvm/tests_base.cfg.sample | 22 ++ 1 files changed, 22 insertions(+), 0 deletions(-) diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample index c76470d..acb2076 100644 --- a/client/tests/kvm/tests_base.cfg.sample +++ b/client/tests/kvm/tests_base.cfg.sample @@ -503,6 +503,28 @@ variants: md5sum = 2afee1b8a87175e6dee2b8dbbd1ad8e8 md5sum_1m = 768ca32503ef92c28f2d144f2a87e4d0 +- SLES: +no setup +shell_prompt = "^r...@.*[\#\$]\s*$|#" +unattended_install: +pxe_image = "linux" +pxe_initrd = "initrd" +extra_params += " -bootp /pxelinux.0 -boot n" +kernel_args = "autoyast=floppy" + +variants: +- 11.64: +no setup +image_name = sles11-64 +cdrom=linux/SLES-11-DVD-x86_64-GM-DVD1.iso +md5sum = 50a2bd45cd12c3808c3ee48208e2586b +md5sum_1m = 0951cab7c32e332362fc424c1054 +unattended_install: +unattended_file = unattended/Sles11-64-autoinst.xml +tftp = "images/sles11-64/tftpboot" +floppy = "images/sles11-64floppy.img" +pxe_dir = "boot/x86_64/loader" + - @Ubuntu: shell_prompt = "^r...@.*[\#\$]\s*$" -- 1.6.6.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM checkpatch.pl cleanup
Hi all, as Marcelo told me, I'm sending you this group of patches. They're just checkpatch.pl cleanups. They apply cleanly and compile on the latest Linus git tree. Thanks a lot for your work, Andrea
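For reference, a hedged sketch of how such a cleanup pass is typically run from the top of a kernel tree (the file path is just an example from the patch series):

```shell
# Report style issues in a whole source file rather than a patch:
./scripts/checkpatch.pl -f arch/x86/kvm/x86.c

# Or vet a generated patch on stdin before posting it:
git format-patch -1 --stdout | ./scripts/checkpatch.pl -
```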
[PATCH 2/2] arch/x86/kvm/* Checkpatch cleanup
Fixes for all files Signed-off-by: Andrea Gelmini --- arch/x86/kvm/emulate.c | 139 +++--- arch/x86/kvm/i8254.c |8 +-- arch/x86/kvm/i8254.h | 12 ++-- arch/x86/kvm/i8259.c |3 +- arch/x86/kvm/kvm_timer.h |6 +- arch/x86/kvm/lapic.c |6 +- arch/x86/kvm/mmu.c | 17 +++--- arch/x86/kvm/mmutrace.h |6 +- arch/x86/kvm/svm.c | 77 +- arch/x86/kvm/trace.h | 12 ++-- arch/x86/kvm/vmx.c | 44 +++--- arch/x86/kvm/x86.c | 18 +++--- arch/x86/kvm/x86.h |2 +- 13 files changed, 170 insertions(+), 180 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 4dade6a..3ebec1e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -449,35 +449,34 @@ static u32 group2_table[] = { /* Raw emulation: instruction has two explicit operands. */ -#define __emulate_2op_nobyte(_op,_src,_dst,_eflags,_wx,_wy,_lx,_ly,_qx,_qy) \ - do {\ - unsigned long _tmp; \ - \ - switch ((_dst).bytes) { \ - case 2: \ - emulate_2op(_op,_src,_dst,_eflags,_wx,_wy,"w"); \ - break; \ - case 4: \ - emulate_2op(_op,_src,_dst,_eflags,_lx,_ly,"l"); \ - break; \ - case 8: \ - ON64(emulate_2op(_op,_src,_dst,_eflags,_qx,_qy,"q")); \ - break; \ - } \ +#define __emulate_2op_nobyte(_op, _src, _dst, _eflags, _wx, _wy, _lx, _ly, _qx, _qy) \ + do { \ + unsigned long _tmp; \ + switch ((_dst).bytes) { \ + case 2: \ + emulate_2op(_op, _src, _dst, _eflags, _wx, _wy, "w"); \ + break; \ + case 4: \ + emulate_2op(_op, _src, _dst, _eflags, _lx, _ly, "l"); \ + break; \ + case 8: \ + ON64(emulate_2op(_op, _src, _dst, _eflags, _qx, _qy, "q")); \ + break; \ + } \ } while (0) -#define __emulate_2op(_op,_src,_dst,_eflags,_bx,_by,_wx,_wy,_lx,_ly,_qx,_qy) \ - do { \ - unsigned long _tmp; \ - switch ((_dst).bytes) { \ - case 1: \ - emulate_2op(_op,_src,_dst,_eflags,_bx,_by,"b"); \ - break; \ - default: \ - __emulate_2op_nobyte(_op, _src, _dst, _eflags, \ -_wx, _wy, _lx, _ly, _qx, _qy); \ - break; \ - }\ +#define __emulate_2op(_op, _src, _dst, _eflags, _bx, _by, _wx, _wy, _lx, _ly, _qx, _qy)\ + do { \ + unsigned long 
_tmp; \ + switch ((_dst).bytes) { \ + case 1: \ + emulate_2op(_op, _src, _dst, _eflags, _bx, _by, "b"); \ + break; \ + default:
[PATCH 1/2] virt/kvm/* Checkpatch cleanup
Fixes for all files Signed-off-by: Andrea Gelmini --- virt/kvm/assigned-dev.c |2 +- virt/kvm/coalesced_mmio.h |4 ++-- virt/kvm/ioapic.c |4 ++-- virt/kvm/ioapic.h |2 +- virt/kvm/irq_comm.c |2 +- virt/kvm/kvm_main.c | 12 ++-- 6 files changed, 13 insertions(+), 13 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index 057e2cc..6595bf2 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -786,7 +786,7 @@ long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl, goto out_free_irq_routing; r = kvm_set_irq_routing(kvm, entries, routing.nr, routing.flags); - out_free_irq_routing: +out_free_irq_routing: vfree(entries); break; } diff --git a/virt/kvm/coalesced_mmio.h b/virt/kvm/coalesced_mmio.h index 8a5959e..8c3b79f 100644 --- a/virt/kvm/coalesced_mmio.h +++ b/virt/kvm/coalesced_mmio.h @@ -25,9 +25,9 @@ struct kvm_coalesced_mmio_dev { int kvm_coalesced_mmio_init(struct kvm *kvm); void kvm_coalesced_mmio_free(struct kvm *kvm); int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm, - struct kvm_coalesced_mmio_zone *zone); +struct kvm_coalesced_mmio_zone *zone); int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm, - struct kvm_coalesced_mmio_zone *zone); +struct kvm_coalesced_mmio_zone *zone); #else diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 3db15a8..b718699 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -43,7 +43,7 @@ #include "irq.h" #if 0 -#define ioapic_debug(fmt,arg...) printk(KERN_WARNING fmt,##arg) +#define ioapic_debug(fmt, arg...) printk(KERN_WARNING fmt, ##arg) #else #define ioapic_debug(fmt, arg...) 
#endif @@ -326,7 +326,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len, return -EOPNOTSUPP; ioapic_debug("ioapic_mmio_write addr=%p len=%d val=%p\n", -(void*)addr, len, val); +(void *)addr, len, val); ASSERT(!(addr & 0xf)); /* check alignment */ if (len == 4 || len == 8) diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h index 8a751b7..0053dd7 100644 --- a/virt/kvm/ioapic.h +++ b/virt/kvm/ioapic.h @@ -50,7 +50,7 @@ struct kvm_ioapic { }; #ifdef DEBUG -#define ASSERT(x) \ +#define ASSERT(x) \ do { \ if (!(x)) { \ printk(KERN_EMERG "assertion failed %s: %d: %s\n", \ diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 9fd5b3e..9a05c77 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -162,7 +162,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level) irq_set[i++] = *e; rcu_read_unlock(); - while(i--) { + while (i--) { int r; r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level); if (r < 0) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 548f925..596900e 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -64,7 +64,7 @@ MODULE_LICENSE("GPL"); /* * Ordering of locks: * - * kvm->lock --> kvm->slots_lock --> kvm->irq_lock + * kvm->lock --> kvm->slots_lock --> kvm->irq_lock */ DEFINE_SPINLOCK(kvm_lock); @@ -681,8 +681,8 @@ skip_lpage: * memslot will be created. 
* * validation of sp->gfn happens in: -* - gfn_to_hva (kvm_read_guest, gfn_to_pfn) -* - kvm_is_visible_gfn (mmu_check_roots) +* - gfn_to_hva (kvm_read_guest, gfn_to_pfn) +* - kvm_is_visible_gfn (mmu_check_roots) */ kvm_arch_flush_shadow(kvm); kfree(old_memslots); @@ -918,7 +918,7 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn) slot = gfn_to_memslot_unaliased(kvm, gfn); if (!slot || slot->flags & KVM_MEMSLOT_INVALID) return bad_hva(); - return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE); + return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE; } EXPORT_SYMBOL_GPL(gfn_to_hva); @@ -970,7 +970,7 @@ EXPORT_SYMBOL_GPL(gfn_to_pfn); static unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn) { - return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE); + return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE; } pfn_t gfn_to_pfn_memsl
Re: [PATCH] KVM test: Make sure check_image script runs on VMs turned off
On Wed, Mar 10, 2010 at 5:21 AM, Michael Goldish wrote: > > - "Lucas Meneghel Rodrigues" wrote: > >> As it is hard to guarantee that a qcow2 image will be in a >> consistent state with a VM turned on, take an extra safety >> step and make sure the preprocessor shuts down the VMs >> before the post process command check_image.py runs. >> >> Signed-off-by: Lucas Meneghel Rodrigues >> --- >> client/tests/kvm/tests_base.cfg.sample | 2 ++ >> 1 files changed, 2 insertions(+), 0 deletions(-) >> >> diff --git a/client/tests/kvm/tests_base.cfg.sample >> b/client/tests/kvm/tests_base.cfg.sample >> index 340b0c0..beae786 100644 >> --- a/client/tests/kvm/tests_base.cfg.sample >> +++ b/client/tests/kvm/tests_base.cfg.sample >> @@ -1049,6 +1049,8 @@ variants: >> post_command = " python scripts/check_image.py;" >> remove_image = no >> post_command_timeout = 600 >> + kill_vm = yes >> + kill_vm_gracefully = yes > > That's not necessarily bad, but this may significantly slow down > testing because it means the VM will shutdown and boot up again > after every qcow2 test. It'll also separate the tests in an > unnatural way, eliminating the possibility of catching problems > that only appear after several consecutive tests (such problems > may or may not be possible, I'm not sure). > Maybe we should consider specifying the post_command for only some > of the tests, or add a dedicated test for this purpose, or even > a no-op test that only shuts down the VM and runs the post command. Or we could make this post command non critical and avoid the shutdowns. This way we'd pay attention to failures only when investigating the logs, this way if the consistency check fails on a situation where it shouldn't then we'd take an action. 
>> - vmdk: >> only Fedora Ubuntu Windows >> only smp2 >> -- >> 1.6.6.1 -- Lucas
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
> >> As of March 2009[1] Intel guarantees that memory reads occur in order > >> (they may only be reordered relative to writes). It appears AMD do not > >> provide this guarantee, which could be an interesting problem for > >> heterogeneous migration. > > > > Interesting, but what ordering would cause problems that AMD would do > > but Intel wouldn't? Wouldn't that ordering cause the same problems > > for POSIX shared memory in general (regardless of Qemu) on AMD? > > If some code was written for the Intel guarantees it would break if > migrated to AMD. Of course, it would also break if run on AMD in the > first place. Right. This is independent of shared memory, and is a case where reporting an Intel CPUID on an AMD host might get you into trouble. Paul
Re: MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled
On 03/10/2010 12:26 PM, Erik van der Kouwe wrote: Dear all, I've submitted this bug report a week ago: http://sourceforge.net/tracker/?func=detail&aid=2962575&group_id=180599&atid=893831 I was wondering if work has already been done on this (maybe the problem was already known) and whether patches to fix this and/or workarounds are known. MINIX is using big real mode which is currently not well supported by kvm on Intel hardware: (qemu) info registers EAX=0010 EBX=0009 ECX=4920 EDX=a796 ESI=0200 EDI=49200200 EBP=0009 ESP=a762 EIP=f4a7 EFL=00023002 [---] CPL=3 II=0 A20=1 SMM=0 HLT=0 ES = f300 CS =f000 000f f300 SS =9492 00094920 f300 DS =97ce 00097cec f300 A ds.base of 0x97cec cannot be translated to a real mode segment. There is some work to get this to work, but it is proceeding really slowly. It should work on AMD hardware though. -- error compiling committee.c: too many arguments to function
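To see why that base is untranslatable: a pure real-mode segment base is always selector * 16, so any base that is not a multiple of 16 (like the 0x97cec in the dump above) has no real-mode encoding. A quick shell check:

```shell
# Big real mode leaves a cached segment base that real mode cannot express:
# real-mode base = selector << 4, so the low 4 bits must be zero.
base=$(( 0x97cec ))
if [ $(( base & 0xf )) -eq 0 ]; then
    printf 'representable: selector %04x\n' $(( base >> 4 ))
else
    printf 'base %x is not a multiple of 16: no real-mode selector maps to it\n' "$base"
fi
```

This is why kvm on Intel, which must re-enter vmx-compatible real mode, struggles here while AMD (with unrestricted real-mode support in SVM) does not.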
MINIX 3.1.6 works in QEMU-0.12.3 only with KVM disabled
Dear all, I've submitted this bug report a week ago: http://sourceforge.net/tracker/?func=detail&aid=2962575&group_id=180599&atid=893831 I was wondering if work has already been done on this (maybe the problem was already known) and whether patches to fix this and/or workarounds are known. Thanks for any answers, Erik van der Kouwe
Re: [Qemu-devel] Re: extended vga modes?
2010/3/8 Avi Kivity : > On 03/08/2010 01:07 PM, Michael Tokarev wrote: >> >> Avi Kivity wrote: >> [] >> In short, when vgabios were dropped from qemu-kvm (for whatever yet unknown reason), >>> >>> What do you mean? qemu-kvm still carries a local vgabios (see >>> kvm/vgabios in qemu-kvm.git). >>> >> >> Oh my. So we all overlooked it. I asked you several times >> about the bios sources, in 0.12 seabios were supposed to be >> in roms/seabios (which is still empty in the release), and >> I thought vgabios should be in roms/vgabios (which is empty >> too), and concluded it were dropped from qemu-kvm tarball. >> But you're right, and I by mistake take vgabios sources from >> upstream qemu when building Debian package, instead of using >> the old'good sources from kvm/vgabios. What a mess!... :( >> >> And it looks like that it's time to remove at least parts of >> this mess, don't you think? How about pushing the vgabios >> changes to qemu and moving it to the same place where it is >> in qemu? Does it make sense? >> > > We can't push the changes to qemu since qemu.git doesn't have a vgabios > fork. We might push the changes upstream. Best of all if the seabios thing > repeats itself with vgabios so we have maintainable and maintained vga > firmware. > actually they do. see vgasrc directory in seabios.git. but it is very incomplete: no cirrus support, no vbe. > -- > error compiling committee.c: too many arguments to function
Re: 32-bit qemu + 64-bit kvm be a problem?
On 03/10/2010 11:59 AM, Neo Jia wrote: hi, I have to keep a 32-bit qemu user space to work with some legacy library I have but still want to use 64-bit host Linux to explore the 64-bit advantage. So I am wondering if I can use a 32-bit qemu + 64-bit kvm-kmod configuration. It's fully supported. It's less well tested than 64/64 or 32/32, so please report any bugs. Will there be any limitation or drawback for this configuration? I already found one: we can't assign the guest more than 2047 MB of physical memory. That is the only limitation AFAIK. -- error compiling committee.c: too many arguments to function
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
Gleb Natapov wrote: Entering guest from time to time will not change semantics of the processor (if code is not modified under processor's feet at least). Currently we reenter guest mode after each iteration of string instruction for all instructions but ins/outs. E.g., is there no chance that during the repetitions, in the middle of the repetitions, page faults occur? If it can, without entering the guest, can we handle it? -- I lack some basic assumptions? If a page fault occurs we inject it into the guest. Oh, I might have failed to tell what I worried about. Opposite, I mean, I worried about the NOT reentering the guest case. I know that the current implementation with reentrance is OK. To inject a page fault without reentering the guest, we need to add some more hacks to the emulator IIUC. Thanks, Takuya
32-bit qemu + 64-bit kvm be a problem?
hi, I have to keep a 32-bit qemu user space to work with some legacy library I have but still want to use 64-bit host Linux to explore the 64-bit advantage. So I am wondering if I can use a 32-bit qemu + 64-bit kvm-kmod configuration. Will there be any limitation or drawback for this configuration? I already found one: we can't assign the guest more than 2047 MB of physical memory. Thanks, Neo -- I would remember that if researchers were not ambitious probably today we haven't the technology we are using!
Re: Shadow page table questions
On 03/10/2010 06:57 AM, Marek Olszewski wrote: Hello, I was wondering if someone could point me to some documentation that explains the basic non-nested-paging shadow page table algorithm/strategy used by KVM. I understand that KVM caches shadow page tables across context switches and that there is a reverse mapping and page protection to help zap shadow page tables when the guest page tables change. However, I'm not entirely sure how the actual caching is done. At first I assumed that KVM would change the host CR3 on every guest context switch such that it would point to a cached shadow page table for the currently running guest user thread, however, as far as I can tell, the host CR3 does not change so I'm a little lost. If indeed it doesn't change the CR3, how does KVM solve the problem that arises when two processes in the guest OS share the same guest logical addresses? The host cr3 does change, though not by using the 'mov cr3' instruction (that would cause the host to immediately switch to the guest address space, which would be bad). See the calls to kvm_x86_ops->set_cr3(). I'm also interested in figuring out what KVM does when running with multiple virtual CPUs. Looking at the code, I can see that each VCPU has its own root pointer to a shadow page table graph, but I have yet to figure out if this graph has nodes shared between VCPUs, or whether they are all private. Everything is shared. If the guests are running with identical cr3s, kvm will load identical cr3s in guest mode. An exception is when we use 32-bit pae mode. In that case, the guest cr3s will be different (but guest PDPTRs will be identical). Instead of dealing with the pae cr3, we deal with the four PDPTRs. -- error compiling committee.c: too many arguments to function
Re: KVM PMU virtualization
On Thu, 2010-03-04 at 09:00 +0800, Zhang, Yanmin wrote: > On Wed, 2010-03-03 at 11:15 +0100, Peter Zijlstra wrote: > > On Wed, 2010-03-03 at 17:27 +0800, Zhang, Yanmin wrote: > > > -#ifndef perf_misc_flags > > > -#define perf_misc_flags(regs) (user_mode(regs) ? PERF_RECORD_MISC_USER > > > : \ > > > -PERF_RECORD_MISC_KERNEL) > > > -#define perf_instruction_pointer(regs) instruction_pointer(regs) > > > -#endif > > > > Ah, that #ifndef is for powerpc, which I think you just broke. > Thanks for the reminder. I deleted powerpc codes when building cscope > lib. > > It seems perf_save_virt_ip/perf_reset_virt_ip interfaces are ugly. I plan to > change them to a callback function struct and kvm registers its version to > perf. > > Such like: > struct perf_guest_info_callbacks { > int (*is_in_guest)(); > u64 (*get_guest_ip)(); > int (*copy_guest_stack)(); > int (*reset_in_guest)(); > ... > }; > int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *); > int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *); > > It's more scalable and neater. In case you guys might lose patience, I worked out a new patch against 2.6.34-rc1. It could work with: #perf kvm --guest --guestkallsyms /guest/os/kernel/proc/kallsyms --guestmodules /guest/os/proc/modules top It also support to collect both host side and guest side at the same time: #perf kvm --host --guest --guestkallsyms /guest/os/kernel/proc/kallsyms --guestmodules /guest/os/proc/modules top The first output line of top has guest kernel/user space percentage. Or just host side: #perf kvm --host As tool perf source codes have lots of changes, I am still working on perf kvm record and report. 
--- diff -Nraup linux-2.6.34-rc1/arch/x86/include/asm/ptrace.h linux-2.6.34-rc1_work/arch/x86/include/asm/ptrace.h --- linux-2.6.34-rc1/arch/x86/include/asm/ptrace.h 2010-03-09 13:04:20.730596079 +0800 +++ linux-2.6.34-rc1_work/arch/x86/include/asm/ptrace.h 2010-03-10 17:06:34.228953260 +0800 @@ -167,6 +167,15 @@ static inline int user_mode(struct pt_re #endif } +static inline int user_mode_cs(u16 cs) +{ +#ifdef CONFIG_X86_32 + return (cs & SEGMENT_RPL_MASK) == USER_RPL; +#else + return !!(cs & 3); +#endif +} + static inline int user_mode_vm(struct pt_regs *regs) { #ifdef CONFIG_X86_32 diff -Nraup linux-2.6.34-rc1/arch/x86/kvm/vmx.c linux-2.6.34-rc1_work/arch/x86/kvm/vmx.c --- linux-2.6.34-rc1/arch/x86/kvm/vmx.c 2010-03-09 13:04:20.758593132 +0800 +++ linux-2.6.34-rc1_work/arch/x86/kvm/vmx.c2010-03-10 17:11:49.709019136 +0800 @@ -26,6 +26,7 @@ #include #include #include +#include #include "kvm_cache_regs.h" #include "x86.h" @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct vmcs_write32(TPR_THRESHOLD, irr); } +DEFINE_PER_CPU(int, kvm_in_guest) = {0}; + +static void kvm_set_in_guest(void) +{ + percpu_write(kvm_in_guest, 1); +} + +static int kvm_is_in_guest(void) +{ + return percpu_read(kvm_in_guest); +} + +static int kvm_is_user_mode(void) +{ + int user_mode; + user_mode = user_mode_cs(vmcs_read16(GUEST_CS_SELECTOR)); + return user_mode; +} + +static u64 kvm_get_guest_ip(void) +{ + return vmcs_readl(GUEST_RIP); +} + +static void kvm_reset_in_guest(void) +{ + if (percpu_read(kvm_in_guest)) + percpu_write(kvm_in_guest, 0); +} + +static struct perf_guest_info_callbacks kvm_guest_cbs = { + .is_in_guest= kvm_is_in_guest, + .is_user_mode = kvm_is_user_mode, + .get_guest_ip = kvm_get_guest_ip, + .reset_in_guest = kvm_reset_in_guest +}; + static void vmx_complete_interrupts(struct vcpu_vmx *vmx) { u32 exit_intr_info; @@ -3653,8 +3691,11 @@ static void vmx_complete_interrupts(stru /* We need to handle NMIs before interrupts are enabled */ if ((exit_intr_info & 
INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR && - (exit_intr_info & INTR_INFO_VALID_MASK)) + (exit_intr_info & INTR_INFO_VALID_MASK)) { + kvm_set_in_guest(); asm("int $2"); + kvm_reset_in_guest(); + } idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK; @@ -4251,6 +4292,8 @@ static int __init vmx_init(void) if (bypass_guest_pf) kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull); + perf_register_guest_info_callbacks(&kvm_guest_cbs); + return 0; out3: @@ -4266,6 +4309,8 @@ out: static void __exit vmx_exit(void) { + perf_unregister_guest_info_callbacks(&kvm_guest_cbs); + free_page((unsigned long)vmx_msr_bitmap_legacy); free_page((unsigned long)vmx_msr_bitmap_longmode); free_page((unsigned long)vmx_io_bitmap_b); diff -Nraup linux-2.6.34-rc1/include/linux/perf_event.h linux-2.6.34-rc1_work/include/linux/perf_event.h --- linux-2.6.34-rc1/include/li
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/10/2010 06:38 AM, Cam Macdonell wrote:

On Tue, Mar 9, 2010 at 5:03 PM, Paul Brook wrote:

In a cross environment that becomes extremely hairy. For example the x86 architecture effectively has an implicit write barrier before every store, and an implicit read barrier before every load.

Btw, x86 doesn't have any implicit barriers due to ordinary loads. Only stores and atomics have implicit barriers, afaik.

As of March 2009[1] Intel guarantees that memory reads occur in order (they may only be reordered relative to writes). It appears AMD do not provide this guarantee, which could be an interesting problem for heterogeneous migration.

Paul

[1] The most recent docs I have handy. Up to and including Core-2 Duo.

Interesting, but what ordering would cause problems that AMD would do but Intel wouldn't? Wouldn't that ordering cause the same problems for POSIX shared memory in general (regardless of Qemu) on AMD?

If some code was written for the Intel guarantees it would break if migrated to AMD. Of course, it would also break if run on AMD in the first place.

I think shared memory breaks migration anyway. Until someone implements distributed shared memory.

-- error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
On 03/09/2010 11:44 PM, Anthony Liguori wrote:

Ah yes. For cross tcg environments you can map the memory using mmio callbacks instead of directly, and issue the appropriate barriers there.

Not good enough unless you want to severely restrict the use of shared memory within the guest. For instance, it's going to be useful to assume that your atomic instructions remain atomic. Crossing architecture boundaries here makes these assumptions invalid. A barrier is not enough.

You could make the mmio callbacks flow to the shared memory server over the unix-domain socket, which would then serialize them. Still need to keep RMWs as single operations. When the host supports it, implement the operation locally (you can't render cmpxchg16b on i386, for example).

Shared memory only makes sense when using KVM. In fact, we should actively disable the shared memory device when not using KVM.

Looks like that's the only practical choice.

-- error compiling committee.c: too many arguments to function
Re: [PATCH] Inter-VM shared memory PCI device
On 03/09/2010 08:34 PM, Cam Macdonell wrote:

On Tue, Mar 9, 2010 at 10:28 AM, Avi Kivity wrote:

On 03/09/2010 05:27 PM, Cam Macdonell wrote:

Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server).

How does the driver detect whether interrupts are supported or not?

At the moment, the VM ID is set to -1 if interrupts aren't supported, but that may not be the clearest way to do things. With UIO is there a way to detect if the interrupt pin is on?

I suggest not designing the device to uio. Make it a good guest-independent device, and if uio doesn't fit it, change it.

Why not support interrupts unconditionally? Is the device useful without interrupts?

Currently my patch works with or without the shared memory server. If you give the parameter -ivshmem 256,foo then this will create (if necessary) and map /dev/shm/foo as the shared region without interrupt support. Some users of shared memory are using it this way. Going forward we can require the shared memory server and always have interrupts enabled.

Can you explain how they synchronize? Polling? Using the network? Using it as a shared cache? If it's a reasonable use case it makes sense to keep it.

Another thing comes to mind - a shared memory ID, in case a guest has multiple cards.

-- error compiling committee.c: too many arguments to function
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
On Wed, Mar 10, 2010 at 06:12:34PM +0900, Takuya Yoshikawa wrote:
> Gleb Natapov wrote:
> >On Wed, Mar 10, 2010 at 11:30:20AM +0900, Takuya Yoshikawa wrote:
> >>Gleb Natapov wrote:
> >>>On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote:
> On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> >Currently when string instruction is only partially complete we go back
> >to a guest mode, guest tries to reexecute instruction and exits again
> >and at this point emulation continues. Avoid all of this by restarting
> >instruction without going back to a guest mode.
> What happens if rcx is really big? Going back into the guest gave
> us a preemption point.
>
> >>>Two solutions. We can check if reschedule is required and yield cpu if
> >>>needed. Or we can enter guest from time to time.
> >>One generic question: from the viewpoint of KVM's policy, is it OK to make
> >>the semantics different from real CPUs?
> >>
> >>Semantics, may be better to use other words, but I'm little bit worried that
> >>the second solution may change something, not mentioning about bugs but some
> >>behavior patterns depending on the "time to time".
> >>
> >Entering guest from time to time will not change semantics of the
> >processor (if code is not modified under processor's feet at least).
> >Currently we reenter guest mode after each iteration of string
> >instruction for all instruction but ins/outs.
> >
> >
> E.g., is there no chance that during the repetitions, in the middle of the
> repetitions, page faults occur? If it can, without entering the guest, can
> we handle it?
> -- I lack some basic assumptions?
>
If a page fault occurs we inject it to the guest.

--
Gleb.
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
On 03/10/2010 11:12 AM, Takuya Yoshikawa wrote:

Entering guest from time to time will not change semantics of the processor (if code is not modified under processor's feet at least). Currently we reenter guest mode after each iteration of string instruction for all instruction but ins/outs.

E.g., is there no chance that during the repetitions, in the middle of the repetitions, page faults occur? If it can, without entering the guest, can we handle it?

Page faults can occur, and we need to handle them. Another reason for reentering the guest is so that we can inject guest interrupts.

-- error compiling committee.c: too many arguments to function
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
On 03/09/2010 08:11 PM, Gleb Natapov wrote:

On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote:

On 03/09/2010 04:09 PM, Gleb Natapov wrote:

Currently when string instruction is only partially complete we go back to a guest mode, guest tries to reexecute instruction and exits again and at this point emulation continues. Avoid all of this by restarting instruction without going back to a guest mode.

What happens if rcx is really big? Going back into the guest gave us a preemption point.

Two solutions. We can check if reschedule is required and yield cpu if needed. Or we can enter guest from time to time.

I'd stick with the current solution, reentering the guest every page or so.

-- error compiling committee.c: too many arguments to function
Re: [PATCH 19/24] KVM: x86 emulator: fix in/out emulation.
On 03/09/2010 08:09 PM, Gleb Natapov wrote:

We don't want to enter the emulator for non-string in/out. Leftover test code?

No, unfortunately this is not leftover. I just don't see how we can bypass the emulator and still have the emulator able to emulate in/out (for big real mode, for instance). The problem is basically described in the commit message. If we have a function outside the emulator that does in/out emulation on the vcpu directly, then the emulator can't use it, since committing shadowed registers will overwrite the result of emulation. Having two different emulations (one outside the emulator and another inside it) is also problematic, since when userspace returns after an IO exit we don't know which emulation to continue. If we want to avoid instruction decoding we can fill in the emulation context from exit info as if the instruction was already decoded, and call the emulator.

Alternatively, another entry point would be fine. in/out is a fast path (used for virtio for example).

-- error compiling committee.c: too many arguments to function
Re: [PATCH 15/24] KVM: x86 emulator: Provide more callbacks for x86 emulator.
On 03/09/2010 07:57 PM, Gleb Natapov wrote:

Descriptor writes need an atomic kvm_set_guest_bit(), no?

It is? Atomic against what? Current code just writes the whole descriptor using write_std().

These are accessed bit changes, and are done atomically in the same way as a page table walk sets the accessed and dirty bit. Presumably the atomic operation is to allow the kernel to scan segments and swap them out if they are not used.

We can use cmpxchg callback for that, no?

Yes.

-- error compiling committee.c: too many arguments to function
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
Gleb Natapov wrote:

On Wed, Mar 10, 2010 at 11:30:20AM +0900, Takuya Yoshikawa wrote:

Gleb Natapov wrote:

On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote:

On 03/09/2010 04:09 PM, Gleb Natapov wrote:

Currently when string instruction is only partially complete we go back to a guest mode, guest tries to reexecute instruction and exits again and at this point emulation continues. Avoid all of this by restarting instruction without going back to a guest mode.

What happens if rcx is really big? Going back into the guest gave us a preemption point.

Two solutions. We can check if reschedule is required and yield cpu if needed. Or we can enter guest from time to time.

One generic question: from the viewpoint of KVM's policy, is it OK to make the semantics different from real CPUs?

Semantics, may be better to use other words, but I'm little bit worried that the second solution may change something, not mentioning about bugs but some behavior patterns depending on the "time to time".

Entering guest from time to time will not change semantics of the processor (if code is not modified under processor's feet at least). Currently we reenter guest mode after each iteration of string instruction for all instruction but ins/outs.

E.g., is there no chance that during the repetitions, in the middle of the repetitions, page faults occur? If it can, without entering the guest, can we handle it? -- I lack some basic assumptions?
Re: [PATCH 22/24] KVM: x86 emulator: restart string instruction without going back to a guest.
On Wed, Mar 10, 2010 at 11:30:20AM +0900, Takuya Yoshikawa wrote:
> Gleb Natapov wrote:
> >On Tue, Mar 09, 2010 at 04:50:29PM +0200, Avi Kivity wrote:
> >>On 03/09/2010 04:09 PM, Gleb Natapov wrote:
> >>>Currently when string instruction is only partially complete we go back
> >>>to a guest mode, guest tries to reexecute instruction and exits again
> >>>and at this point emulation continues. Avoid all of this by restarting
> >>>instruction without going back to a guest mode.
> >>What happens if rcx is really big? Going back into the guest gave
> >>us a preemption point.
> >>
> >Two solutions. We can check if reschedule is required and yield cpu if
> >needed. Or we can enter guest from time to time.
>
> One generic question: from the viewpoint of KVM's policy, is it OK to make
> the semantics different from real CPUs?
>
> Semantics, may be better to use other words, but I'm little bit worried that
> the second solution may change something, not mentioning about bugs but some
> behavior patterns depending on the "time to time".
>
Entering guest from time to time will not change semantics of the processor (if code is not modified under processor's feet at least). Currently we reenter guest mode after each iteration of string instruction for all instruction but ins/outs.

--
Gleb.
Re: [PATCH] KVM test: Make sure check_image script runs on VMs turned off
- "Lucas Meneghel Rodrigues" wrote:

> As it is hard to guarantee that a qcow2 image will be in a
> consistent state with a VM turned on, take an extra safety
> step and make sure the preprocessor shuts down the VMs
> before the post process command check_image.py runs.
>
> Signed-off-by: Lucas Meneghel Rodrigues
> ---
> client/tests/kvm/tests_base.cfg.sample |2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/client/tests/kvm/tests_base.cfg.sample
> b/client/tests/kvm/tests_base.cfg.sample
> index 340b0c0..beae786 100644
> --- a/client/tests/kvm/tests_base.cfg.sample
> +++ b/client/tests/kvm/tests_base.cfg.sample
> @@ -1049,6 +1049,8 @@ variants:
> post_command = " python scripts/check_image.py;"
> remove_image = no
> post_command_timeout = 600
> +kill_vm = yes
> +kill_vm_gracefully = yes

That's not necessarily bad, but this may significantly slow down testing because it means the VM will shutdown and boot up again after every qcow2 test. It'll also separate the tests in an unnatural way, eliminating the possibility of catching problems that only appear after several consecutive tests (such problems may or may not be possible, I'm not sure). Maybe we should consider specifying the post_command for only some of the tests, or add a dedicated test for this purpose, or even a no-op test that only shuts down the VM and runs the post command.

> - vmdk:
> only Fedora Ubuntu Windows
> only smp2
> --
> 1.6.6.1
Question on stopping KVM start at boot
Hi folks.

Host - ubuntu 9.10 64bit
Virtualizer - KVM

I need to stop KVM starting at boot. I added the following 2 lines at the bottom of /etc/modprobe.d/blacklist.conf

blacklist kvm
blacklist kvm-amd

Reboot PC. It doesn't work.

$ lsmod | grep kvm
kvm_amd                41556  0
kvm                   190648  1 kvm_amd

Please, what further command do I have to run in order to activate the new blacklist.conf?

TIA
B.R.
Stephen L