Re: [RFC] Nested VMX support - kernel

2009-08-21 Thread Qing He
Orit,
First, thank you for this work, it's very interesting. I
tried the patchset but ran into a problem: a kernel panic in
the L2 guest, while L1 and L0 remain operable:

BUG: unable to handle kernel paging request at 0104b00d
IP: [<c0105282>] math_state_restore+0xe/0x2f
*pde = 00000000

In my environment, the L1 hypervisor is 32-bit KVM (kernel version
2.6.25); the complete serial.log of L2 is attached. Do you know
how I can get past this crash?

Thanks,
Qing

On Mon, 2009-08-17 at 21:48 +0800, or...@il.ibm.com wrote:
 From: Orit Wasserman or...@il.ibm.com
 
 This patch implements nested VMX support. It enables a guest to use the
 VMX APIs in order to run its own nested guest (i.e., it enables
 running other hypervisors which use VMX under KVM). The current patch
 supports running Linux under a nested KVM. Additional patches for
 running Windows under nested KVM, and Linux and Windows under nested
 VMware server(!), are currently running in the lab. We are in the
 process of forward-porting those patches to -tip.
 
 The current patch only supports a single nested hypervisor, which can
 only run a single guest.  SMP is not supported yet when running nested
 hypervisor (work in progress). Only 64 bit nested hypervisors are
 supported. Currently only EPT mode in both host and nested hypervisor
 is supported (i.e., both hypervisors must use EPT).
 
 This patch was written by:
 Orit Wasserman, or...@il.ibm.com
 Ben-Ami Yassour, ben...@il.ibm.com
 Abel Gordon, ab...@il.ibm.com
 Muli Ben-Yehuda, m...@il.ibm.com
 
 With contributions by
 Anthony Liguori, aligu...@us.ibm.com
 Mike Day, md...@us.ibm.com
 
 This work was inspired by the nested SVM support by Alexander Graf and
 Joerg Roedel.
 
 Signed-off-by: Orit Wasserman or...@il.ibm.com
Linux version 2.6.25 (r...@localhost.localdomain) (gcc version 4.1.2 20080704 
(Red Hat 4.1.2-44)) #1 SMP Wed Mar 18 13:12:03 CST 2009
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000031ff0000 (usable)
 BIOS-e820: 0000000031ff0000 - 0000000032000000 (ACPI data)
 BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
799MB LOWMEM available.
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
found SMP MP-table at [c00fb540] 000fb540
Zone PFN ranges:
  DMA 0 - 4096
  Normal   4096 -   204784
  HighMem204784 -   204784
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0:0 -   204784
DMI 2.4 present.
Using APIC driver default
ACPI: RSDP 000FB660, 0014 (r0 QEMU  )
ACPI: RSDT 31FF0000, 002C (r1 QEMU   QEMURSDT1 QEMU1)
ACPI: FACP 31FF002C, 0074 (r1 QEMU   QEMUFACP1 QEMU1)
ACPI: DSDT 31FF0100, 24A4 (r1   BXPC   BXDSDT1 INTL 20061109)
ACPI: FACS 31FF00C0, 0040
ACPI: APIC 31FF25A8, 00E0 (r1 QEMU   QEMUAPIC1 QEMU1)
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] disabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] disabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] disabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] disabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] disabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] disabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] disabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] disabled)
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 40000000 (gap: 32000000:cdfbd000)
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 203185
Kernel command line: ro root=LABEL=/ rhgb console=ttyS0 console=tty0
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 3200.275 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
console 

Re: [RFC] Nested VMX support - kernel

2009-08-18 Thread Avi Kivity

On 08/17/2009 04:48 PM, or...@il.ibm.com wrote:

From: Orit Wasserman or...@il.ibm.com

This patch implements nested VMX support. It enables a guest to use the
VMX APIs in order to run its own nested guest (i.e., it enables
running other hypervisors which use VMX under KVM). The current patch
supports running Linux under a nested KVM. Additional patches for
running Windows under nested KVM, and Linux and Windows under nested
VMware server(!), are currently running in the lab. We are in the
process of forward-porting those patches to -tip.

   


Very impressive stuff.


The current patch only supports a single nested hypervisor, which can
only run a single guest.  SMP is not supported yet when running nested
hypervisor (work in progress). Only 64 bit nested hypervisors are
supported. Currently only EPT mode in both host and nested hypervisor
is supported (i.e., both hypervisors must use EPT).

   


Can you explain what is missing wrt SMP and multiguest support?


diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 33901be..fad3577 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -389,7 +389,8 @@ struct kvm_arch{
unsigned int n_free_mmu_pages;
unsigned int n_requested_mmu_pages;
unsigned int n_alloc_mmu_pages;
-   struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
+   struct hlist_head _mmu_page_hash[KVM_NUM_MMU_PAGES];
+   struct hlist_head *mmu_page_hash;
   


This leads to exploding memory use when the guest has multiple eptp 
contexts.  You should put all shadow pages in the existing 
mmu_page_hash, and tag them with the eptp pointer.
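
Something along these lines (a rough sketch with made-up names, not KVM's
actual code): keep the single existing hash table and make the guest eptp part
of the lookup key, rather than allocating a separate mmu_page_hash per EPT
context.

struct kvm_mmu_page_key {			/* hypothetical */
	gfn_t	gfn;
	gpa_t	guest_eptp;			/* 0 for non-nested shadow pages */
};

/* Fold the guest eptp into the existing gfn-based hash (sketch only). */
static unsigned nested_page_hashfn(gfn_t gfn, gpa_t guest_eptp)
{
	return (gfn ^ (guest_eptp >> PAGE_SHIFT)) % KVM_NUM_MMU_PAGES;
}

Lookups would then compare both fields while walking the hash chain, so shadow
pages built for different EPT contexts can coexist in the one table.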


Nested EPT is just like ordinary shadow pagetables.  Shadow folds the 
gva->gpa translation with the gpa->hpa translation; nested EPT folds the 
ngpa->gpa translation with the gpa->hpa translation, so you should be 
able to reuse the code.
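
To make the analogy concrete, here is a minimal sketch of that folding
(walk_guest_ept() is a hypothetical helper, not code from this patch):

/* Sketch only: fold ngpa -> gpa (guest EPT) with gpa -> hpa (host). */
static hpa_t fold_nested_translation(struct kvm_vcpu *vcpu, gpa_t ngpa,
				     gpa_t guest_eptp)
{
	gpa_t gpa;
	pfn_t pfn;

	/* Walk the L1 guest's EPT tables (they live in L1 memory): ngpa -> gpa. */
	gpa = walk_guest_ept(vcpu, ngpa, guest_eptp);	/* hypothetical */
	if (gpa == UNMAPPED_GVA)
		return INVALID_PAGE;

	/* Resolve the resulting L1 gpa through the host mapping: gpa -> hpa. */
	pfn = gfn_to_pfn(vcpu->kvm, gpa >> PAGE_SHIFT);
	if (is_error_pfn(pfn)) {
		kvm_release_pfn_clean(pfn);
		return INVALID_PAGE;
	}

	/* The entry installed in the shadow EPT therefore maps ngpa -> hpa. */
	return ((hpa_t)pfn << PAGE_SHIFT) | (ngpa & ~PAGE_MASK);
}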




  #include <linux/kvm_host.h>
  #include <linux/types.h>
@@ -2042,7 +2043,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
ASSERT(!VALID_PAGE(root));
if (tdp_enabled)
direct = 1;
-   if (mmu_check_root(vcpu, root_gfn))
+   if (!is_nested_tdp() && mmu_check_root(vcpu, root_gfn))
return 1;
   


Why remove the check?  It's still needed (presuming root_gfn refers to 
the guest eptp).




-static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
+static int __tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, pfn_t pfn,
u32 error_code)
  {
-   pfn_t pfn;
int r;
int level;
gfn_t gfn = gpa >> PAGE_SHIFT;
@@ -2159,11 +2159,6 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa,

mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
-   pfn = gfn_to_pfn(vcpu->kvm, gfn);
-   if (is_error_pfn(pfn)) {
-   kvm_release_pfn_clean(pfn);
-   return 1;
-   }
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
@@ -2180,6 +2175,30 @@ out_unlock:
return 0;
  }
   


Why do you need a variant of tdp_page_fault() that doesn't do error 
checking?




+int nested_tdp_page_fault(struct kvm_vcpu *vcpu,
+ gpa_t gpa2,
+ gpa_t ept12)
+{
+   gpa_t gpa1;
+   pfn_t pfn;
+   int r;
+   u64 data = 0;
+
+   ASSERT(vcpu);
+   ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   r = mmu_topup_memory_caches(vcpu);
+   if (r)
+   return r;
+
+   gpa1 = paging64_nested_ept_walk(vcpu, gpa2, ept12);
+
+   if (gpa1 == UNMAPPED_GVA)
+   return 1;
+
+   kvm_read_guest(vcpu->kvm, gpa1, &data, sizeof(data));
   


Missing error check.
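
For reference, the check could look roughly like this (a sketch of the obvious
fix, not the authors' actual change); kvm_read_guest() returns a negative
value on failure:

	if (kvm_read_guest(vcpu->kvm, gpa1, &data, sizeof(data)) < 0)
		return 1;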


+
+   pfn = gfn_to_pfn(vcpu->kvm, gpa1 >> PAGE_SHIFT);
+
+   if (is_error_pfn(pfn)) {
+   kvm_release_pfn_clean(pfn);
+   return 1;
+   }
+
+   r = __tdp_page_fault(vcpu, gpa2 & PAGE_MASK, pfn, 0);
+   if (r)
+   return r;
+
+   return 0;
+
+}
+EXPORT_SYMBOL_GPL(nested_tdp_page_fault);
   


This should be part of the normal kvm_mmu_page_fault().  It needs to 
fall back to emulation when the nested guest makes an mmio access (for 
example if you use device assignment in the guest with an emulated device).
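
A hedged sketch of that flow, assuming the nested handler keeps the patch's
return convention (0 = resolved, 1 = not backed by RAM);
kvm_emulate_mmio_access() is a hypothetical stand-in for the emulation path
kvm_mmu_page_fault() already takes:

static int handle_nested_ept_fault(struct kvm_vcpu *vcpu, gpa_t gpa2,
				   gpa_t guest_eptp, u32 error_code)
{
	int r = nested_tdp_page_fault(vcpu, gpa2, guest_eptp);

	if (r < 0)
		return r;
	if (r == 0)
		return 1;	/* shadow EPT entry installed, re-enter L2 */

	/*
	 * Unmapped or MMIO (e.g. an emulated device assigned to the
	 * nested guest): emulate the access instead of failing.
	 */
	return kvm_emulate_mmio_access(vcpu, gpa2, error_code);	/* hypothetical */
}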



+#if PTTYPE == 64
+static gpa_t paging64_nested_ept_walk(struct kvm_vcpu *vcpu, gpa_t addr,
+   gpa_t ept12)
+{
+   pt_element_t pte;
+   gfn_t table_gfn;
+   unsigned index;
+   gpa_t pte_gpa;
+   gpa_t gpa1 = UNMAPPED_GVA;
+
+   struct guest_walker walk;
+   struct guest_walker *walker = &walk;
+
+   walker->level = vcpu->arch.mmu.shadow_root_level;
+   pte = ept12;
+
+   for (;;) {
+   index = PT_INDEX(addr, walker->level);
+
+   table_gfn = gpte_to_gfn(pte);
+   pte_gpa = gfn_to_gpa(table_gfn);
+   

Re: [RFC] Nested VMX support - kernel

2009-08-17 Thread Roedel, Joerg
On Mon, Aug 17, 2009 at 03:48:35PM +0200, or...@il.ibm.com wrote:
 From: Orit Wasserman or...@il.ibm.com
 
 This patch implements nested VMX support. It enables a guest to use the
 VMX APIs in order to run its own nested guest (i.e., it enables
 running other hypervisors which use VMX under KVM). The current patch
 supports running Linux under a nested KVM. Additional patches for
 running Windows under nested KVM, and Linux and Windows under nested
 VMware server(!), are currently running in the lab. We are in the
 process of forward-porting those patches to -tip.
 
 The current patch only supports a single nested hypervisor, which can
 only run a single guest.  SMP is not supported yet when running nested
 hypervisor (work in progress). Only 64 bit nested hypervisors are
 supported. Currently only EPT mode in both host and nested hypervisor
 is supported (i.e., both hypervisors must use EPT).
 
 This patch was written by:
 Orit Wasserman, or...@il.ibm.com
 Ben-Ami Yassour, ben...@il.ibm.com
 Abel Gordon, ab...@il.ibm.com
 Muli Ben-Yehuda, m...@il.ibm.com
 
 With contributions by
 Anthony Liguori, aligu...@us.ibm.com
 Mike Day, md...@us.ibm.com

Nice work. Do you have any performance numbers?

Joerg




Re: [RFC] Nested VMX support - kernel

2009-08-17 Thread Ben-Ami Yassour1


Roedel, Joerg joerg.roe...@amd.com wrote on 17/08/2009 17:24:33:

 On Mon, Aug 17, 2009 at 03:48:35PM +0200, or...@il.ibm.com wrote:
  From: Orit Wasserman or...@il.ibm.com
 
  This patch implements nested VMX support. It enables a guest to use the
  VMX APIs in order to run its own nested guest (i.e., it enables
  running other hypervisors which use VMX under KVM). The current patch
  supports running Linux under a nested KVM. Additional patches for
  running Windows under nested KVM, and Linux and Windows under nested
  VMware server(!), are currently running in the lab. We are in the
  process of forward-porting those patches to -tip.
 
  The current patch only supports a single nested hypervisor, which can
  only run a single guest.  SMP is not supported yet when running nested
  hypervisor (work in progress). Only 64 bit nested hypervisors are
  supported. Currently only EPT mode in both host and nested hypervisor
  is supported (i.e., both hypervisors must use EPT).
 
  This patch was written by:
  Orit Wasserman, or...@il.ibm.com
  Ben-Ami Yassour, ben...@il.ibm.com
  Abel Gordon, ab...@il.ibm.com
  Muli Ben-Yehuda, m...@il.ibm.com
 
  With contributions by
  Anthony Liguori, aligu...@us.ibm.com
  Mike Day, md...@us.ibm.com

 Nice work. Do you have any performance numbers?

We are currently working on performance and will have more
numbers soon. With this patch, the kernbench overhead of
nested KVM over non-nested KVM is about 12%, largely thanks to
nested EPT, which really improves performance.

Regards,
Ben




