Re: [RFC] Nested VMX support - kernel
Orit,

First, thank you for this work, it's very interesting. I tried the patchset but ran into a problem: a kernel panic in the L2 guest, while L1 and L0 remain operable:

BUG: unable to handle kernel paging request at 0104b00d
IP: [c0105282] math_state_restore+0xe/0x2f
*pde =

In my environment the L1 hypervisor is 32-bit KVM (kernel version 2.6.25); the complete serial.log of L2 is attached. Do you know how I can get past this panic?

Thanks,
Qing

On Mon, 2009-08-17 at 21:48 +0800, or...@il.ibm.com wrote:
> From: Orit Wasserman <or...@il.ibm.com>
>
> This patch implements nested VMX support. It enables a guest to use the
> VMX APIs in order to run its own nested guest (i.e., it enables running
> other hypervisors which use VMX under KVM). The current patch supports
> running Linux under a nested KVM. Additional patches for running Windows
> under nested KVM, and Linux and Windows under nested VMware server(!),
> are currently running in the lab. We are in the process of
> forward-porting those patches to -tip.
>
> The current patch only supports a single nested hypervisor, which can
> only run a single guest. SMP is not supported yet when running a nested
> hypervisor (work in progress). Only 64-bit nested hypervisors are
> supported. Currently only EPT mode in both host and nested hypervisor
> is supported (i.e., both hypervisors must use EPT).
>
> This patch was written by:
>     Orit Wasserman, or...@il.ibm.com
>     Ben-Ami Yassour, ben...@il.ibm.com
>     Abel Gordon, ab...@il.ibm.com
>     Muli Ben-Yehuda, m...@il.ibm.com
>
> With contributions by:
>     Anthony Liguori, aligu...@us.ibm.com
>     Mike Day, md...@us.ibm.com
>
> This work was inspired by the nested SVM support by Alexander Graf and
> Joerg Roedel.
>
> Signed-off-by: Orit Wasserman <or...@il.ibm.com>

[attached serial.log, excerpt:]
Linux version 2.6.25 (r...@localhost.localdomain) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Mar 18 13:12:03 CST 2009
BIOS-provided physical RAM map:
 BIOS-e820: - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e8000 - 0010 (reserved)
 BIOS-e820: 0010 - 31ff (usable)
 BIOS-e820: 31ff - 3200 (ACPI data)
 BIOS-e820: fffbd000 - 0001 (reserved)
0MB HIGHMEM available.
799MB LOWMEM available.
Scan SMP from c000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f for 65536 bytes.
found SMP MP-table at [c00fb540] 000fb540
Zone PFN ranges:
  DMA            0 -   4096
  Normal      4096 - 204784
  HighMem   204784 - 204784
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
    0:        0 - 204784
DMI 2.4 present.
Using APIC driver default
ACPI: RSDP 000FB660, 0014 (r0 QEMU)
ACPI: RSDT 31FF, 002C (r1 QEMU QEMURSDT1 QEMU1)
ACPI: FACP 31FF002C, 0074 (r1 QEMU QEMUFACP1 QEMU1)
ACPI: DSDT 31FF0100, 24A4 (r1 BXPC BXDSDT1 INTL 20061109)
ACPI: FACS 31FF00C0, 0040
ACPI: APIC 31FF25A8, 00E0 (r1 QEMU QEMUAPIC1 QEMU1)
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] disabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] disabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] disabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] disabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] disabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] disabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] disabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] disabled)
ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Enabling APIC mode: Flat. Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 4000 (gap: 3200:cdfbd000)
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 203185
Kernel command line: ro root=LABEL=/ rhgb console=ttyS0 console=tty0
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 3200.275 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
console
Re: [RFC] Nested VMX support - kernel
On 08/17/2009 04:48 PM, or...@il.ibm.com wrote:
> From: Orit Wasserman <or...@il.ibm.com>
>
> This patch implements nested VMX support. It enables a guest to use the
> VMX APIs in order to run its own nested guest (i.e., it enables running
> other hypervisors which use VMX under KVM). The current patch supports
> running Linux under a nested KVM. Additional patches for running Windows
> under nested KVM, and Linux and Windows under nested VMware server(!),
> are currently running in the lab. We are in the process of
> forward-porting those patches to -tip.

Very impressive stuff.

> The current patch only supports a single nested hypervisor, which can
> only run a single guest. SMP is not supported yet when running a nested
> hypervisor (work in progress). Only 64-bit nested hypervisors are
> supported. Currently only EPT mode in both host and nested hypervisor
> is supported (i.e., both hypervisors must use EPT).

Can you explain what is missing wrt SMP and multi-guest support?

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 33901be..fad3577 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -389,7 +389,8 @@ struct kvm_arch {
>  	unsigned int n_free_mmu_pages;
>  	unsigned int n_requested_mmu_pages;
>  	unsigned int n_alloc_mmu_pages;
> -	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> +	struct hlist_head _mmu_page_hash[KVM_NUM_MMU_PAGES];
> +	struct hlist_head *mmu_page_hash;

This leads to exploding memory use when the guest has multiple eptp
contexts. You should put all shadow pages in the existing mmu_page_hash,
and tag them with the eptp pointer (see the sketch after this message).

Nested EPT is just like ordinary shadow pagetables. Shadow folds the
gva->gpa translation with the gpa->hpa translation; nested EPT folds the
ngpa->gpa translation with the gpa->hpa translation, so you should be
able to reuse the code.

> #include <linux/kvm_host.h>
> #include <linux/types.h>
> @@ -2042,7 +2043,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
>  		ASSERT(!VALID_PAGE(root));
>  		if (tdp_enabled)
>  			direct = 1;
> -		if (mmu_check_root(vcpu, root_gfn))
> +		if (!is_nested_tdp() && mmu_check_root(vcpu, root_gfn))
>  			return 1;

Why remove the check? It's still needed (presuming root_gfn refers to
the guest eptp).

> -static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
> +static int __tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
> +			    pfn_t pfn, u32 error_code)
>  {
> -	pfn_t pfn;
>  	int r;
>  	int level;
>  	gfn_t gfn = gpa >> PAGE_SHIFT;
> @@ -2159,11 +2159,6 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
>  	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>  	smp_rmb();
> -	pfn = gfn_to_pfn(vcpu->kvm, gfn);
> -	if (is_error_pfn(pfn)) {
> -		kvm_release_pfn_clean(pfn);
> -		return 1;
> -	}
>  	spin_lock(&vcpu->kvm->mmu_lock);
>  	if (mmu_notifier_retry(vcpu, mmu_seq))
>  		goto out_unlock;
> @@ -2180,6 +2175,30 @@ out_unlock:
>  	return 0;
>  }

Why do you need a variant of tdp_page_fault() that doesn't do error
checking?

> +int nested_tdp_page_fault(struct kvm_vcpu *vcpu,
> +			  gpa_t gpa2,
> +			  gpa_t ept12)
> +{
> +	gpa_t gpa1;
> +	pfn_t pfn;
> +	int r;
> +	u64 data = 0;
> +
> +	ASSERT(vcpu);
> +	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
> +
> +	r = mmu_topup_memory_caches(vcpu);
> +	if (r)
> +		return r;
> +
> +	gpa1 = paging64_nested_ept_walk(vcpu, gpa2, ept12);
> +
> +	if (gpa1 == UNMAPPED_GVA)
> +		return 1;
> +
> +	kvm_read_guest(vcpu->kvm, gpa1, &data, sizeof(data));

Missing error check.
> +
> +	pfn = gfn_to_pfn(vcpu->kvm, gpa1 >> PAGE_SHIFT);
> +
> +	if (is_error_pfn(pfn)) {
> +		kvm_release_pfn_clean(pfn);
> +		return 1;
> +	}
> +
> +	r = __tdp_page_fault(vcpu, gpa2 & PAGE_MASK, pfn, 0);
> +	if (r)
> +		return r;
> +
> +	return 0;
> +
> +}
> +EXPORT_SYMBOL_GPL(nested_tdp_page_fault);

This should be part of the normal kvm_mmu_page_fault(). It needs to
emulate if the nested guest has mmio access (for example if you use
device assignment in the guest with an emulated device).

> +#if PTTYPE == 64
> +static gpa_t paging64_nested_ept_walk(struct kvm_vcpu *vcpu, gpa_t addr,
> +				      gpa_t ept12)
> +{
> +	pt_element_t pte;
> +	gfn_t table_gfn;
> +	unsigned index;
> +	gpa_t pte_gpa;
> +	gpa_t gpa1 = UNMAPPED_GVA;
> +
> +	struct guest_walker walk;
> +	struct guest_walker *walker = &walk;
> +
> +	walker->level = vcpu->arch.mmu.shadow_root_level;
> +	pte = ept12;
> +
> +	for (;;) {
> +		index = PT_INDEX(addr, walker->level);
> +
> +		table_gfn = gpte_to_gfn(pte);
> +		pte_gpa = gfn_to_gpa(table_gfn);
> +
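A rough standalone sketch of the single-hash tagging idea suggested above: every shadow page lives in one shared hash table and carries the guest EPTP it was built for, so pages from different nested EPT contexts share the hash without aliasing. This is plain C for illustration only, not KVM code; the struct layout, SHADOW_HASH_SIZE, and all names are invented.

#include <stdint.h>
#include <stddef.h>

#define SHADOW_HASH_SIZE 1024

/* Hypothetical, simplified shadow page descriptor. */
struct shadow_page {
	struct shadow_page *next;   /* hash-chain link */
	uint64_t gfn;               /* guest frame number this page shadows */
	uint64_t nested_eptp;       /* guest EPTP context tag; 0 = non-nested */
};

static struct shadow_page *shadow_hash[SHADOW_HASH_SIZE];

/* Look up a shadow page by (gfn, nested_eptp); both must match. */
static struct shadow_page *shadow_lookup(uint64_t gfn, uint64_t nested_eptp)
{
	struct shadow_page *sp = shadow_hash[gfn % SHADOW_HASH_SIZE];

	for (; sp; sp = sp->next)
		if (sp->gfn == gfn && sp->nested_eptp == nested_eptp)
			return sp;  /* hit: same page, same EPT context */
	return NULL;                /* miss: caller allocates one tagged entry */
}

The point of the tag is that memory use then scales with the number of shadow pages actually in use, not with the number of guest eptp contexts times a full hash array.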
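The quoted walk is cut off in the archive. As a rough illustration of the fold Avi describes (ngpa->gpa through the guest's EPT tables, then gpa->hpa), here is a minimal standalone sketch. It assumes 4-level tables with 4KB pages only, and the helpers read_gpa() and gpa_to_hpa() are invented placeholders; real code must also handle permissions, large pages, and read errors.

#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_MASK	(~((uint64_t)0xfff))
#define EPT_PRESENT	0x7		/* EPT entry present if any of R/W/X set */
#define UNMAPPED	(~(uint64_t)0)

/* Assumed helper: read 8 bytes of guest physical memory. */
extern uint64_t read_gpa(uint64_t gpa);
/* Assumed helper: the host's own gpa -> hpa translation. */
extern uint64_t gpa_to_hpa(uint64_t gpa);

/* Walk the guest's EPT tables rooted at ept12: ngpa -> gpa. */
static uint64_t nested_ept_walk(uint64_t ept12, uint64_t ngpa)
{
	uint64_t pte = ept12;
	int level;

	for (level = 4; level > 0; level--) {
		/* 9 index bits per level: bits 39-47, 30-38, 21-29, 12-20 */
		unsigned idx = (ngpa >> (PAGE_SHIFT + 9 * (level - 1))) & 0x1ff;
		uint64_t table = pte & PAGE_MASK;

		pte = read_gpa(table + idx * sizeof(uint64_t));
		if (!(pte & EPT_PRESENT))
			return UNMAPPED;	/* would inject an EPT violation into L1 */
	}
	return (pte & PAGE_MASK) | (ngpa & ~PAGE_MASK);
}

/* The fold: ngpa -> gpa -> hpa; the result is what the shadow EPT maps. */
static uint64_t nested_translate(uint64_t ept12, uint64_t ngpa)
{
	uint64_t gpa = nested_ept_walk(ept12, ngpa);

	return gpa == UNMAPPED ? UNMAPPED : gpa_to_hpa(gpa);
}

This is the same shape as an ordinary shadow-pagetable walk with the gva->gpa step replaced by ngpa->gpa, which is why the existing shadow code should be reusable.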
Re: [RFC] Nested VMX support - kernel
On Mon, Aug 17, 2009 at 03:48:35PM +0200, or...@il.ibm.com wrote:
> From: Orit Wasserman <or...@il.ibm.com>
>
> This patch implements nested VMX support. It enables a guest to use the
> VMX APIs in order to run its own nested guest (i.e., it enables running
> other hypervisors which use VMX under KVM). The current patch supports
> running Linux under a nested KVM. Additional patches for running Windows
> under nested KVM, and Linux and Windows under nested VMware server(!),
> are currently running in the lab. We are in the process of
> forward-porting those patches to -tip.
>
> The current patch only supports a single nested hypervisor, which can
> only run a single guest. SMP is not supported yet when running a nested
> hypervisor (work in progress). Only 64-bit nested hypervisors are
> supported. Currently only EPT mode in both host and nested hypervisor
> is supported (i.e., both hypervisors must use EPT).
>
> This patch was written by:
>     Orit Wasserman, or...@il.ibm.com
>     Ben-Ami Yassour, ben...@il.ibm.com
>     Abel Gordon, ab...@il.ibm.com
>     Muli Ben-Yehuda, m...@il.ibm.com
>
> With contributions by:
>     Anthony Liguori, aligu...@us.ibm.com
>     Mike Day, md...@us.ibm.com

Nice work. Do you have any performance numbers?

	Joerg
Re: [RFC] Nested VMX support - kernel
Roedel, Joerg <joerg.roe...@amd.com> wrote on 17/08/2009 17:24:33:

> On Mon, Aug 17, 2009 at 03:48:35PM +0200, or...@il.ibm.com wrote:
> > From: Orit Wasserman <or...@il.ibm.com>
> >
> > This patch implements nested VMX support. It enables a guest to use
> > the VMX APIs in order to run its own nested guest (i.e., it enables
> > running other hypervisors which use VMX under KVM). The current patch
> > supports running Linux under a nested KVM. Additional patches for
> > running Windows under nested KVM, and Linux and Windows under nested
> > VMware server(!), are currently running in the lab. We are in the
> > process of forward-porting those patches to -tip.
> >
> > The current patch only supports a single nested hypervisor, which can
> > only run a single guest. SMP is not supported yet when running a
> > nested hypervisor (work in progress). Only 64-bit nested hypervisors
> > are supported. Currently only EPT mode in both host and nested
> > hypervisor is supported (i.e., both hypervisors must use EPT).
> >
> > This patch was written by:
> >     Orit Wasserman, or...@il.ibm.com
> >     Ben-Ami Yassour, ben...@il.ibm.com
> >     Abel Gordon, ab...@il.ibm.com
> >     Muli Ben-Yehuda, m...@il.ibm.com
> >
> > With contributions by:
> >     Anthony Liguori, aligu...@us.ibm.com
> >     Mike Day, md...@us.ibm.com
>
> Nice work. Do you have any performance numbers?

We are currently working on performance and will have more numbers soon. With this patch, the kernbench overhead of nested KVM over non-nested KVM is about 12%, mostly thanks to nested EPT, which really improves performance.

Regards,
Ben