Re: ESXi, KVM or Xen?
Peter Chacko wrote:
Did you consider VirtualBox? Of course VMware is the market leader NOW, but if you plan to invest in future open-source platforms, you should choose KVM (which is now Linux-native) or Xen (it's unlikely to be killed). KVM still lags behind in terms of enterprise-class features, but count on it for future investment. So I think you should just start off with Xen or VirtualBox, with a migration plan to KVM in the future.

VBox certainly has its strengths on non-Linux hosts and on hosts without virtualization acceleration. But I would carefully evaluate its performance under a relevant load: http://permalink.gmane.org/gmane.comp.emulators.virtualbox.devel/2796

I recently learned from someone doing Xen consulting that it's still troublesome to get it running on non-certified hardware. This may have an impact on the hardware choice.

For hosting Linux-on-Linux, I would also consider containers, e.g. LXC or OpenVZ. Performance-wise, that's generally the most efficient approach.

Jan
Re: ESXi, KVM or Xen?
Emmanuel Noobadmin wrote:
If by 'put storage on the network' you mean using a block-level protocol (iSCSI, FCoE, AoE, NBD, DRBD...), then you should by all means initiate on the host OS (Dom0 in Xen) and present it to the VM as if it were local storage. It's far faster and more stable that way. In that case, storage wouldn't add to the VM's network load, which might or might not make those (old) scenarios irrelevant.

Thanks for that tip :)

In any case, yes; Xen does have more maturity on big hosting deployments, but most third parties are betting on KVM for the future. The biggest examples are Red Hat, Canonical, libvirt (which is sponsored by Red Hat), and Eucalyptus (which reimplements Amazon's EC2 with either Xen or KVM, focusing on the latter), so the gap is closing.

This is what I figured too, hence it's not a straightforward choice. I don't need top-notch performance for most of the servers targeted for virtualization. Loads are usually low except on the mail servers, and often only when there's a mail-loop problem. So if the performance hit under a worst-case situation is only 10~20%, it's something I can live with, especially since the intended VM servers (i5/i7) will be significantly faster than the current ones (P4/C2D) I'm basing my estimates on. But I need to do my due diligence and have justification ready to show that current performance/reliability/security is at least good enough, instead of "I like where KVM is going and think it'll be the platform of choice in the years to come." Bosses and clients tend to frown on that kind of thing :D

How much customization will you apply to your virtualization infrastructure? If you can manage to do the majority via a proper hypervisor abstraction, specifically libvirt, you will actually have quite some freedom in choosing the platform.
If not, I would very carefully look at the management interfaces of all those hypervisors: how much they conform to standard administration procedures, and what specialties they require, both on the host and the guest side.

And finally, even if right now the 'best' deployment on Xen definitely outperforms KVM by a measurable margin, when things are not optimal Xen degrades a lot quicker than KVM, in part because the Xen core scheduler is far from the maturity of the Linux kernel's scheduler.

The problem is finding stats to back that up if my clients/boss ask about it. So far most of the available comparisons/data seem rather dated, mostly 2007 and 2008. The most professional-looking one, in that PDF I linked to, seems to indicate the opposite, i.e. KVM degrades faster when things go south. That graph with the Apache problem is especially damning, because our primary products/services are web-based applications, with infrastructure as a supplementary service/product.

In addition, I remember reading a thread on this list where an Intel developer pointed out that the Linux scheduler causes a performance hit, about 8x~10x slower, when the physical processors are heavily loaded and there are more vCPUs than pCPUs, because it puts the same VM's vCPUs onto the same physical core.

That's only relevant if you run SMP guests on over-committed hosts. What will your guests look like?

So I am a little worried, since 8~10x is a massive difference, especially if some process goes awry, starts chewing up processor cycles, and the VM starts to lag because of it: a vicious cycle that makes it even harder to fix things without killing the VM. Of course, if I could honestly tell my clients/boss "this, this and this are rare situations we will almost never encounter...", then it's a different thing. Hence asking about this here :)

All solutions have weak points. The point is indeed to estimate whether your use cases will trigger them.
Even then, the question remains whether some weakness is inherent to the solution's design, or likely to be fixed sooner than you will actually hit it. And weaknesses are not necessarily only performance aspects.

Jan
[PATCH] KVM: VMX: fix tlb flush with invalid root
Commit 341d9b535b6c simplified the reload logic on guest-mode entry: it avoids an unnecessary sync-root when KVM_REQ_MMU_RELOAD and KVM_REQ_MMU_SYNC are both set. But it introduces an issue: when we handle KVM_REQ_TLB_FLUSH, the root may be invalid. This was triggered during my test:

Kernel BUG at a00212b8 [verbose debug info unavailable]
..
[810f5caf] ? fget_light+0x111/0x28e
[81103963] sys_ioctl+0x47/0x6a
[81002c1b] system_call_fastpath+0x16/0x1b
Code: f0 eb 21 f7 c2 00 00 00 04 74 22 48 8d 45 f0 48 c7 45 f0 00 00 00 00 48 c7 45 f8 00 00 00 00 b9 02 00 00 00 66 0f 38 80 08 77 02 0f 0b c9 c3 55 48 89 e5 0f 1f 44 00 00 ba 00 68 00 00 48 8b 8f
RIP [a00212b8] vmx_flush_tlb+0xdf/0xe3 [kvm_intel]
RSP 8800be269ca8

Fixed by returning directly if the root is not ready.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |    2 ++
 arch/x86/kvm/mmu.c              |    2 --
 arch/x86/kvm/vmx.c              |    5 ++++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2bda624..8f522ec 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -40,6 +40,8 @@ 0xFF00ULL)
 #define INVALID_PAGE (~(hpa_t)0)
+#define VALID_PAGE(x) ((x) != INVALID_PAGE)
+
 #define UNMAPPED_GVA (~(gpa_t)0)
 
 /* KVM Hugepage definitions for x86 */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a0c5c31..399ddb0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -92,8 +92,6 @@ module_param(oos_shadow, bool, 0644);
 #define PT_FIRST_AVAIL_BITS_SHIFT 9
 #define PT64_SECOND_AVAIL_BITS_SHIFT 52
 
-#define VALID_PAGE(x) ((x) != INVALID_PAGE)
-
 #define PT64_LEVEL_BITS 9
 
 #define PT64_LEVEL_SHIFT(level) \
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 806ab12..ebaaeaf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1831,8 +1831,11 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	vpid_sync_context(to_vmx(vcpu));
-	if (enable_ept)
+	if (enable_ept) {
+		if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
+			return;
 		ept_sync_context(construct_eptp(vcpu->arch.mmu.root_hpa));
+	}
 }
 
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
-- 
1.6.1.2
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
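The fix above follows a simple guard-clause pattern: validate the MMU root before deriving anything from it. Below is a minimal userspace sketch of that logic; the macro names mirror the patch, but `sync_context` and `flush_tlb` are hypothetical stand-ins, not the kernel code.

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t hpa_t;

#define INVALID_PAGE (~(hpa_t)0)
#define VALID_PAGE(x) ((x) != INVALID_PAGE)

/* Records the root of the last flush issued; stand-in for ept_sync_context(). */
hpa_t last_synced_root = INVALID_PAGE;

void sync_context(hpa_t root)
{
	last_synced_root = root;
}

/* Mirrors the patched vmx_flush_tlb() control flow: bail out early when
 * the MMU root has not been set up yet. Returns true if a sync happened. */
bool flush_tlb(hpa_t root_hpa, bool enable_ept)
{
	if (enable_ept) {
		if (!VALID_PAGE(root_hpa))
			return false;	/* root not ready: nothing to sync */
		sync_context(root_hpa);
		return true;
	}
	return false;
}
```

The point of the guard is ordering: the invalid-root check must sit between the decision to flush and the dereference of the root, exactly where the patch inserts it.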
Re: [PATCH v4 4/6] KVM: MMU: prefetch ptes when intercepted guest #PF
Marcelo Tosatti wrote:
+
+		if (is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL))
+			break;

BTW, doesn't sync_page also lack reserved-bit checking? (Unrelated to this patch.)

I think it doesn't, since if EPT is enabled no unsync pages exist, so the sync-page path can't be triggered. :-)
Re: [PATCH v2] KVM: IOAPIC: only access APIC registers one dword at a time
Marcelo Tosatti wrote:
+	if (len != 4) {
+		printk(KERN_WARNING "ioapic: wrong length %d\n", len);
+		return 0;
+	}
+

Just remove the printks please, as guests can flood the host's dmesg.

OK, please review this one. Thanks, Marcelo!

Subject: [PATCH v3] KVM: IOAPIC: only access APIC registers one dword at a time

The IOAPIC spec says:

  When accessing these registers, accesses must be done one dword at a
  time. For example, software should never access byte 2 from the Data
  register before accessing bytes 0 and 1. The hardware will not attempt
  to recover from a bad programming model in this case.

So this patch removes the other access widths.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 virt/kvm/ioapic.c |   22 ++++------
 1 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 1149c60..aed91fd 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -288,6 +288,9 @@ static int ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
 	ioapic_debug("addr %lx\n", (unsigned long)addr);
 	ASSERT(!(addr & 0xf));	/* check alignment */
+	if (len != 4)
+		return 0;
+
 	addr &= 0xff;
 	spin_lock(&ioapic->lock);
 	switch (addr) {
@@ -305,18 +308,7 @@ static int ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
 	}
 	spin_unlock(&ioapic->lock);
-	switch (len) {
-	case 8:
-		*(u64 *) val = result;
-		break;
-	case 1:
-	case 2:
-	case 4:
-		memcpy(val, (char *)&result, len);
-		break;
-	default:
-		printk(KERN_WARNING "ioapic: wrong length %d\n", len);
-	}
+	*(u32 *) val = result;
 	return 0;
 }
@@ -332,12 +324,10 @@ static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
 		     (void*)addr, len, val);
 	ASSERT(!(addr & 0xf));	/* check alignment */
-	if (len == 4 || len == 8)
+	if (len == 4)
 		data = *(u32 *) val;
-	else {
-		printk(KERN_WARNING "ioapic: Unsupported size %d\n", len);
+	else
 		return 0;
-	}
 	addr &= 0xff;
 	spin_lock(&ioapic->lock);
-- 
1.6.1.2
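The dword-only rule the patch enforces can be sketched in isolation: an MMIO handler that serves 4-byte accesses and silently ignores every other width. This is a hypothetical userspace model (`ioapic_reg`, `mmio_read`, `mmio_write` are illustrative names, not the kernel functions):

```c
#include <stdint.h>
#include <string.h>

/* One register standing in for the IOAPIC window. */
uint32_t ioapic_reg = 0x11223344u;

/* Serve only dword reads, per the IOAPIC programming model;
 * return -1 (and leave *val untouched) for any other width. */
int mmio_read(unsigned len, void *val)
{
	if (len != 4)
		return -1;		/* reject byte/word/qword access */
	memcpy(val, &ioapic_reg, 4);
	return 0;
}

/* Same rule on the write side: only a full dword is accepted. */
int mmio_write(unsigned len, const void *val)
{
	if (len != 4)
		return -1;
	memcpy(&ioapic_reg, val, 4);
	return 0;
}
```

Note the design choice debated in the thread: rejected accesses here return an error code instead of logging, because a guest can trigger the path at will and flood the host log.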
Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
On Mon, Jun 28, 2010 at 05:56:07PM +0800, Xin, Xiaohui wrote:
OK, if I understand you correctly then I don't think we have a problem. With your current patch-set you have exactly the same situation when the skb->data is reallocated as a kernel buffer.

When will skb->data be reallocated? Could you point me to the code path?

Anything that calls pskb_expand_head. This is OK because, as you correctly argue, it is a rare situation. With my proposal you will need to get this extra external buffer in even fewer cases, because you'd only need to do it if the skb head grows, which only happens if it becomes encapsulated.

So let me explain it in a bit more detail: our packet starts out as a purely non-linear skb, i.e., skb->head contains nothing and all the page frags come from the guest. During host processing we may pull data into skb->head, but the first frag will remain unless we pull all of it. If we did do that, then you would have a free external buffer anyway.

Now in the common case the header may be modified or pulled, but it very rarely grows. So you can just copy the header back into the first frag just before we give it to the guest.

Since the data is still there, recomputing the page offset and size is OK, right?

Right, you just move the page offset back and increase the size. However, to do this safely we'd need a way of knowing whether the skb head has been modified. It may well turn out to be just as effective to do something like

	if (memcmp(skb->data, page frag head, skb_headlen))
		memcpy(page frag head, skb->data, skb_headlen)

as the page frag head should be in cache, since it would've been used to populate skb->data. It'd be good to run some benchmarks with this to see whether adding a bit to sk_buff to avoid the memcmp is worth it or not.
Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
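Herbert's compare-then-copy heuristic above can be sketched as a plain function. This is a hypothetical userspace model, assuming the first page fragment and skb->data are directly addressable buffers of the same header length:

```c
#include <string.h>
#include <stdbool.h>

/* Compare the (likely cache-hot) first page fragment against the skb
 * linear header and copy back only when they differ.
 * Returns true if a copy was actually needed. */
bool sync_header(char *frag_head, const char *skb_data, size_t headlen)
{
	if (memcmp(skb_data, frag_head, headlen) != 0) {
		memcpy(frag_head, skb_data, headlen);
		return true;
	}
	return false;		/* header untouched: skip the copy */
}
```

The trade-off being benchmarked in the thread is exactly this: a memcmp on every packet versus spending a dirty bit in sk_buff to skip the comparison when the header is known to be clean.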
Re: [PATCH v2] KVM: IOAPIC: only access APIC registers one dword at a time
On 07/02/2010 11:00 AM, Xiao Guangrong wrote:
The IOAPIC spec says: when accessing these registers, accesses must be done one dword at a time. For example, software should never access byte 2 from the Data register before accessing bytes 0 and 1. The hardware will not attempt to recover from a bad programming model in this case. So this patch removes the other access widths.

The ioapic code also implements the ia64 iosapic. I'm guessing that does support 64-bit accesses; please check the iosapic documentation.

There might also be guests that use incorrect accesses despite the documentation; if real hardware supports them, they should work. So we need to start with just a warning and allow the access. Later we can drop the invalid access.

@@ -288,6 +288,11 @@ static int ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len,
 	ioapic_debug("addr %lx\n", (unsigned long)addr);
 	ASSERT(!(addr & 0xf));	/* check alignment */
+	if (len != 4) {
+		printk(KERN_WARNING "ioapic: wrong length %d\n", len);
+		return 0;
+	}
+

Guest-triggered, so it needs to be rate-limited.

-- 
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: qemu-kvm.git unittest failures
On 07/03/2010 01:44 AM, Marcelo Tosatti wrote:
Config entry:

[access]
file = access.flat

Massive log, compressed and attached.

run test pde.p user: FAIL: error code 5 expected 4
test pte.rw pde.p user: FAIL: error code 5 expected 4
test pte.user pde.p user: FAIL: error code 5 expected 4
test pte.rw pte.user pde.p user: FAIL: error code 5 expected 4
test pte.a pde.p user: FAIL: error code 5 expected 4
test pte.rw pte.a pde.p user: FAIL: error code 5 expected 4
test pte.user pte.a pde.p user: FAIL: error code 5 expected 4

P flag (bit 0). This flag is 0 if there is no valid translation for the linear address because the P flag was 0 in one of the paging-structure entries used to translate that address.

Avi, a walk ignoring access permissions should be done to properly set the P flag in the error code.

Does anybody care? Probably not, but it's best to be compatible with real hardware. We could simply continue the existing walk, just setting a flag indicating that it can't succeed.
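The disagreement behind "error code 5 expected 4" comes down to two independent questions: did every paging-structure entry have P=1 (so a translation existed), and did the access violate permissions. The P bit of the error code must reflect only the former. A minimal sketch with the architectural bit positions (`make_error_code` is a hypothetical helper, not KVM code):

```c
#include <stdint.h>
#include <stdbool.h>

/* x86 page-fault error code bits (architectural). */
#define PFERR_PRESENT (1u << 0)	/* P: a translation existed */
#define PFERR_WRITE   (1u << 1)	/* W/R: the access was a write */
#define PFERR_USER    (1u << 2)	/* U/S: the access was from user mode */

/* The P bit depends only on whether the full walk found present
 * entries, not on whether permissions allowed the access. That is
 * why a user-mode permission fault on a present mapping must report
 * error code 5 (P|U), not 4 (U). */
uint32_t make_error_code(bool translation_valid, bool write, bool user)
{
	uint32_t ec = 0;
	if (translation_valid)
		ec |= PFERR_PRESENT;
	if (write)
		ec |= PFERR_WRITE;
	if (user)
		ec |= PFERR_USER;
	return ec;
}
```

In the failing tests above, the mapping is present but not-present at the PDE level for user access checks KVM got wrong; real hardware completes the permission-ignoring walk first, which is the extra step Marcelo suggests.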
Re: random crash in post_kvm_run()
On 07/02/2010 10:08 PM, BuraphaLinux Server wrote:
Hello, I tried my best to do the bisection, and the result after many kernels was:

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[3ce672d48400e0112fec7a3cb6bb2120493c6e11] KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization

So what do I do next?

You still need to test that last kernel. "0 revisions left" means that after this test (and the last "git bisect good" or "git bisect bad") you'll have the answer.

I did not 'make mrproper' between each build - should I have done that?

No need.
Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
Marcelo Tosatti wrote:
On Thu, Jul 01, 2010 at 09:55:56PM +0800, Xiao Guangrong wrote:
Combine the guest pte read between the guest pte walk and pte prefetch

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h | 48 ++-
 1 files changed, 33 insertions(+), 15 deletions(-)

Can't do this, it can miss invlpg:

vcpu0				vcpu1
read guest ptes
				modify guest pte
				invlpg
instantiate stale guest pte

Ah, oops, sorry :-(

See how the pte is reread inside fetch with mmu_lock held.

It looks like something is broken in the 'fetch' function; this patch will fix it.

Subject: [PATCH] KVM: MMU: fix last level broken in FNAME(fetch)

We read the guest levels outside of mmu_lock, so sometimes the host mapping can end up wrong. Consider this case:

VCPU0:					VCPU1:
Read guest mapping, assume the
mapping is:
GLV3 -> GLV2 -> GLV1 -> GFNA,
and in the host, the corresponding
mapping is
HLV3 -> HLV2 -> HLV1 (P=0)
					Write GLV1 and make the mapping
					point to GFNB (may occur in the
					pte_write or invlpg path)
Map GLV1 to GFNA

This issue only occurs in the last indirect mapping, since if a middle mapping is changed, the mapping will be zapped and that is detected in the FNAME(fetch) path; but when it maps the last level, it is not checked. Fixed by also checking the last level.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h | 32 +---
 1 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3350c02..e617e93 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -291,6 +291,20 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		     gpte_to_gfn(gpte), pfn, true, true);
 }
 
+static bool FNAME(check_level_mapping)(struct kvm_vcpu *vcpu,
+				       struct guest_walker *gw, int level)
+{
+	pt_element_t curr_pte;
+	int r;
+
+	r = kvm_read_guest_atomic(vcpu->kvm, gw->pte_gpa[level - 1],
+				  &curr_pte, sizeof(curr_pte));
+	if (r || curr_pte != gw->ptes[level - 1])
+		return false;
+
+	return true;
+}
+
 /*
  * Fetch a shadow pte for a specific level in the paging hierarchy.
  */
@@ -304,11 +318,9 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	u64 spte, *sptep = NULL;
 	int direct;
 	gfn_t table_gfn;
-	int r;
 	int level;
-	bool dirty = is_dirty_gpte(gw->ptes[gw->level - 1]);
+	bool dirty = is_dirty_gpte(gw->ptes[gw->level - 1]), check = true;
 	unsigned direct_access;
-	pt_element_t curr_pte;
 	struct kvm_shadow_walk_iterator iterator;
 
 	if (!is_present_gpte(gw->ptes[gw->level - 1]))
@@ -322,6 +334,12 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		level = iterator.level;
 		sptep = iterator.sptep;
 		if (iterator.level == hlevel) {
+			if (check && level == gw->level &&
+			      !FNAME(check_level_mapping)(vcpu, gw, hlevel)) {
+				kvm_release_pfn_clean(pfn);
+				break;
+			}
+
 			mmu_set_spte(vcpu, sptep, access,
 				     gw->pte_access & access,
 				     user_fault, write_fault,
@@ -376,10 +394,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		sp = kvm_mmu_get_page(vcpu, table_gfn, addr, level-1,
 				      direct, access, sptep);
 		if (!direct) {
-			r = kvm_read_guest_atomic(vcpu->kvm,
-						  gw->pte_gpa[level - 2],
-						  &curr_pte, sizeof(curr_pte));
-			if (r || curr_pte != gw->ptes[level - 2]) {
+			if (hlevel == level - 1)
+				check = false;
+
+			if (!FNAME(check_level_mapping)(vcpu, gw, level - 1)) {
 				kvm_mmu_put_page(sp, sptep);
 				kvm_release_pfn_clean(pfn);
 				sptep = NULL;
-- 
1.6.1.2
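The core of FNAME(check_level_mapping) is an optimistic read-then-verify pattern: cache a guest pte during the lockless walk, then reread it once the lock is held and compare. A minimal userspace sketch of that pattern; the global stands in for guest memory, and the names are illustrative rather than the kernel's:

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t pt_element_t;

/* Simulated guest memory holding one pte. */
pt_element_t guest_pte;

pt_element_t read_guest_pte(void)
{
	return guest_pte;
}

/* Reread the guest pte (done under mmu_lock in the real code) and
 * compare it with the value cached by the lockless walk_addr pass.
 * A mismatch means another vcpu rewrote the mapping, so the cached
 * walk is stale and must not be instantiated into the shadow table. */
bool pte_still_valid(pt_element_t cached)
{
	return read_guest_pte() == cached;
}
```

The bug fixed above is precisely a missing application of this verify step at the last level: the middle levels were rechecked, but the leaf pfn from the stale walk was mapped without one final comparison.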
Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
On 07/02/2010 08:03 PM, Marcelo Tosatti wrote:
On Thu, Jul 01, 2010 at 09:55:56PM +0800, Xiao Guangrong wrote:
Combine the guest pte read between the guest pte walk and pte prefetch

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h | 48 ++-
 1 files changed, 33 insertions(+), 15 deletions(-)

Can't do this, it can miss invlpg:

vcpu0				vcpu1
read guest ptes
				modify guest pte
				invlpg
instantiate stale guest pte

See how the pte is reread inside fetch with mmu_lock held.

Note, this is fine if the pte is unsync, since vcpu0 will soon invlpg it. It's only broken for sync ptes.
Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
On 07/03/2010 01:31 PM, Xiao Guangrong wrote:
See how the pte is reread inside fetch with mmu_lock held.

It looks like something is broken in the 'fetch' function; this patch will fix it.

Subject: [PATCH] KVM: MMU: fix last level broken in FNAME(fetch)

We read the guest levels outside of mmu_lock, so sometimes the host mapping can end up wrong. Consider this case:

VCPU0:					VCPU1:
Read guest mapping, assume the
mapping is:
GLV3 -> GLV2 -> GLV1 -> GFNA,
and in the host, the corresponding
mapping is
HLV3 -> HLV2 -> HLV1 (P=0)
					Write GLV1 and make the mapping
					point to GFNB (may occur in the
					pte_write or invlpg path)
Map GLV1 to GFNA

This issue only occurs in the last indirect mapping, since if a middle mapping is changed, the mapping will be zapped and that is detected in the FNAME(fetch) path; but when it maps the last level, it is not checked. Fixed by also checking the last level.

I don't really see what is fixed. We already check the gpte. What's special about the new scenario?

@@ -322,6 +334,12 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		level = iterator.level;
 		sptep = iterator.sptep;
 		if (iterator.level == hlevel) {
+			if (check && level == gw->level &&
+			      !FNAME(check_level_mapping)(vcpu, gw, hlevel)) {
+				kvm_release_pfn_clean(pfn);
+				break;
+			}
+

Now we check here...

 			mmu_set_spte(vcpu, sptep, access,
 				     gw->pte_access & access,
 				     user_fault, write_fault,

@@ -376,10 +394,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		sp = kvm_mmu_get_page(vcpu, table_gfn, addr, level-1,
 				      direct, access, sptep);
 		if (!direct) {
-			r = kvm_read_guest_atomic(vcpu->kvm,
-						  gw->pte_gpa[level - 2],
-						  &curr_pte, sizeof(curr_pte));
-			if (r || curr_pte != gw->ptes[level - 2]) {
+			if (hlevel == level - 1)
+				check = false;
+
+			if (!FNAME(check_level_mapping)(vcpu, gw, level - 1)) {

... and here? Why?

(Looking at the code, we have a call to kvm_host_page_size() on every page fault, and that takes mmap_sem... that's got to impact scaling.)
Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
Avi Kivity wrote:
On 07/03/2010 01:31 PM, Xiao Guangrong wrote:
[...]
This issue only occurs in the last indirect mapping, since if a middle mapping is changed, the mapping will be zapped and that is detected in the FNAME(fetch) path; but when it maps the last level, it is not checked. Fixed by also checking the last level.

I don't really see what is fixed. We already check the gpte. What's special about the new scenario?

What I mean is: when we map the last level, we directly set it to the pfn, but the pfn was obtained by walk_addr; by this time, the guest mapping may have changed.

What does 'We already check the gpte' mean? I think I'm missing something :-(
Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
On 07/03/2010 03:16 PM, Xiao Guangrong wrote:
[...]
What I mean is: when we map the last level, we directly set it to the pfn, but the pfn was obtained by walk_addr; by this time, the guest mapping may have changed.

What does 'We already check the gpte' mean? I think I'm missing something :-(

		if (!direct) {
			r = kvm_read_guest_atomic(vcpu->kvm,
						  gw->pte_gpa[level - 2],
						  &curr_pte, sizeof(curr_pte));
			if (r || curr_pte != gw->ptes[level - 2]) {
				kvm_mmu_put_page(shadow_page, sptep);
				kvm_release_pfn_clean(pfn);
				sptep = NULL;
				break;
			}
		}

The code you moved... under what scenario is it not sufficient?
Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
On 07/03/2010 03:31 PM, Xiao Guangrong wrote:
Avi Kivity wrote:

		if (!direct) {
			r = kvm_read_guest_atomic(vcpu->kvm,
						  gw->pte_gpa[level - 2],
						  &curr_pte, sizeof(curr_pte));
			if (r || curr_pte != gw->ptes[level - 2]) {
				kvm_mmu_put_page(shadow_page, sptep);
				kvm_release_pfn_clean(pfn);
				sptep = NULL;
				break;
			}
		}

The code you moved... under what scenario is it not sufficient?

I didn't move that code; I just replaced it with a common function, FNAME(check_level_mapping)(), which does the same work. And this check is not sufficient, since it is only reached if the mapping was zapped or does not exist; in other words, only when this condition fails:

	is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)

If the middle level is present and it's not a large mapping, the check is skipped.

Well, in the description, it looked like everything was using small pages (in KVM, level=1 means the PTE level; we need to change this one day). Please describe again and say exactly when the guest or host uses large pages.
Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
On 07/03/2010 03:44 PM, Avi Kivity wrote:
On 07/03/2010 03:31 PM, Xiao Guangrong wrote:
[...]
I didn't move that code; I just replaced it with a common function, FNAME(check_level_mapping)(), which does the same work. And this check is not sufficient, since it is only reached if the mapping was zapped or does not exist; in other words, only when this condition fails:

	is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)

If the middle level is present and it's not a large mapping, the check is skipped.

Well, in the description, it looked like everything was using small pages (in KVM, level=1 means the PTE level; we need to change this one day). Please describe again and say exactly when the guest or host uses large pages.

Oh, I see what you mean.

Regarding the patch, is it possible just to move the check earlier, instead of adding the 'check' variable?
Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
Avi Kivity wrote:
And this check is not sufficient, since it is only reached if the mapping was zapped or does not exist; in other words, only when this condition fails:

	is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)

If the middle level is present and it's not a large mapping, the check is skipped.

Well, in the description, it looked like everything was using small pages (in KVM, level=1 means the PTE level; we need to change this one day). Please describe again and say exactly when the guest or host uses large pages.

Avi, sorry for my poor English; I didn't mean that everything was using small pages. I don't know what caused the confusion :-(
Re: [PATCH v4 5/6] KVM: MMU: combine guest pte read between walk and pte prefetch
Avi Kivity wrote:
Well, in the description, it looked like everything was using small pages (in KVM, level=1 means the PTE level; we need to change this one day). Please describe again and say exactly when the guest or host uses large pages.

Oh, I see what you mean.

Regarding the patch, is it possible just to move the check earlier, instead of adding the 'check' variable?

Umm, if we move the check before the judgment, it will check every level; actually, only the opened mapping and the last level need to be checked, so for performance reasons it may be better to keep the two check-points.
Re: random crash in post_kvm_run()
Ok, I kept going like you said. Here is what it said:

$ git bisect good
44ea2b1758d88ad822e65b1c4c21ca6164494e27 is the first bad commit
commit 44ea2b1758d88ad822e65b1c4c21ca6164494e27
Author: Avi Kivity a...@redhat.com
Date: Sun Sep 6 15:55:37 2009 +0300

    KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area

    Currently MSR_KERNEL_GS_BASE is saved and restored as part of the
    guest/host msr reloading. Since we wish to lazy-restore all the other
    msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using
    the common code.

    Signed-off-by: Avi Kivity a...@redhat.com

:04 04 fcf14f9e5578a996430650f7806a54bcc8184ef6 24f4b80719c5b7931a5ed5604f3554f78352ed67 M arch
$

On 7/3/10, Avi Kivity a...@redhat.com wrote:
On 07/02/2010 10:08 PM, BuraphaLinux Server wrote:
Hello, I tried my best to do the bisection, and the result after many kernels was:

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[3ce672d48400e0112fec7a3cb6bb2120493c6e11] KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization

So what do I do next?

You still need to test that last kernel. "0 revisions left" means that after this test (and the last "git bisect good" or "git bisect bad") you'll have the answer.

I did not 'make mrproper' between each build - should I have done that?

No need.
Re: [PATCH 27/27] KVM: PPC: Add Documentation about PV interface
On Fri, 2010-07-02 at 18:27 +0200, Segher Boessenkool wrote: +To find out if we're running on KVM or not, we overlay the PVR register. Usually +the PVR register contains an id that identifies your CPU type. If, however, you +pass KVM_PVR_PARA in the register that you want the PVR result in, the register +still contains KVM_PVR_PARA after the mfpvr call. + + LOAD_REG_IMM(r5, KVM_PVR_PARA) + mfpvr r5 + [r5 still contains KVM_PVR_PARA] I love this part :-) Me not :-) It should be in the device-tree instead, or something like that. Enough games with PVR... Ben. + __u64 scratch3; + __u64 critical; /* Guest may not get interrupts if == r1 */ + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; + __u64 srr0; + __u64 srr1; + __u64 dar; + __u64 msr; + __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ +}; + +Additions to the page must only occur at the end. Struct fields are always 32 +bit aligned. The u64s are 64-bit aligned, should they always be? +The ld and std instructions are transformed to lwz and stw instructions +respectively on 32 bit systems with an added offset of 4 to accommodate big +endianness. Will this add never overflow? Is there anything that checks for it? +mtmsrd rX, 0 b special mtmsr section +mtmsr b special mtmsr section mtmsr rX Segher ___ Linuxppc-dev mailing list linuxppc-...@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 27/27] KVM: PPC: Add Documentation about PV interface
On Fri, 2010-07-02 at 20:41 +0200, Alexander Graf wrote: The u64s are 64-bit aligned, should they always be? That's obvious, isn't it? And the ABI only specifies u64s to be 32 bit aligned, no? At least that's what ld and std specify. No, the PowerPC ABI specifies u64's to be 64-bit aligned, even for 32-bit binaries. Ben. +The ld and std instructions are transformed to lwz and stw instructions +respectively on 32 bit systems with an added offset of 4 to accommodate big +endianness. Will this add never overflow? Is there anything that checks for it? It basically means that to access dar, we either do ld rX, DAR(0) or lwz rX, DAR+4(0) +mtmsrd rX, 0 b special mtmsr section +mtmsr b special mtmsr section mtmsr rX Nod.