Re: [PATCH] allow enabling/disabling NPT by reloading only the architecture module
On Tuesday 15 July 2008 18:55:37 Avi Kivity wrote:
> Yang, Sheng wrote:
>> On Tuesday 15 July 2008 02:36:36 Joerg Roedel wrote:
>>> If NPT is enabled after loading both KVM modules on AMD and it
>>> should be disabled, both KVM modules must be reloaded. If only the
>>> architecture module is reloaded the behavior is undefined. With
>>> this patch it is possible to disable NPT by reloading only the
>>> kvm_amd module.
>>>
>>> Signed-off-by: Joerg Roedel [EMAIL PROTECTED]
>>> ---
>>
>> From 3dd7fa4abb1cfc702b3fbd7038d585b541f981a4 Mon Sep 17 00:00:00 2001
>> From: Sheng Yang [EMAIL PROTECTED]
>> Date: Tue, 15 Jul 2008 14:18:29 +0800
>> Subject: [PATCH] KVM: VMX: Fix undefined behaviour of EPT after reload kvm-intel.ko
>>
>> Based on Joerg Roedel's fix for NPT. Thanks Joerg!
>>
>> Signed-off-by: Sheng Yang [EMAIL PROTECTED]
>> ---
>>  arch/x86/kvm/vmx.c |   15 +++++++++------
>>  1 files changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 5f807e3..374e1ca 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -3108,14 +3108,17 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>>  		return ERR_PTR(-ENOMEM);
>>
>>  	allocate_vpid(vmx);
>> -	if (id == 0 && vm_need_ept()) {
>> -		kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
>> -			VMX_EPT_WRITABLE_MASK |
>> -			VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
>> -		kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
>> +	if (id == 0) {
>> +		if (vm_need_ept()) {
>> +			kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
>> +				VMX_EPT_WRITABLE_MASK |
>> +				VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
>> +			kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
>>  				VMX_EPT_FAKE_DIRTY_MASK, 0ull,
>>  				VMX_EPT_EXECUTABLE_MASK);
>> -		kvm_enable_tdp();
>> +			kvm_enable_tdp();
>> +		} else
>> +			kvm_disable_tdp();
>>  	}
>
> hmm, what is this code doing in vmx_create_vcpu()? surely vmx_init()
> is a better place?

Oh, maybe a historic reason :) Move it to vmx_init() now.
-- regards Yang, Sheng -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] KVM: VMX: Fix bypass_guest_pf enabling when disable EPT in module parameter
From c4a2cad8b91ac4c0b04a5ccd1f0bfab1d7e6ef37 Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Wed, 16 Jul 2008 09:21:22 +0800
Subject: [PATCH] KVM: VMX: Fix bypass_guest_pf enabling when disable EPT in module parameter

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/vmx.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5f807e3..d47c3f8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3294,7 +3294,7 @@ static int __init vmx_init(void)
 	vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_ESP);
 	vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_EIP);
 
-	if (cpu_has_vmx_ept())
+	if (vm_need_ept())
 		bypass_guest_pf = 0;
 
 	if (bypass_guest_pf)
--
1.5.6
[PATCH 2/2] KVM: VMX: Fix undefined behaviour of EPT after reload kvm-intel.ko
From bcbe1b5c4c6098f122accba4f00f6617baf807f7 Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Wed, 16 Jul 2008 09:25:40 +0800
Subject: [PATCH] KVM: VMX: Fix undefined behaviour of EPT after reload kvm-intel.ko

As well as move set base/mask ptes to vmx_init().

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/vmx.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d47c3f8..baddb6e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3108,15 +3108,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		return ERR_PTR(-ENOMEM);
 
 	allocate_vpid(vmx);
-	if (id == 0 && vm_need_ept()) {
-		kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
-			VMX_EPT_WRITABLE_MASK |
-			VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
-		kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
-				VMX_EPT_FAKE_DIRTY_MASK, 0ull,
-				VMX_EPT_EXECUTABLE_MASK);
-		kvm_enable_tdp();
-	}
 
 	err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
 	if (err)
@@ -3294,8 +3285,17 @@ static int __init vmx_init(void)
 	vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_ESP);
 	vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_EIP);
 
-	if (vm_need_ept())
+	if (vm_need_ept()) {
 		bypass_guest_pf = 0;
+		kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
+			VMX_EPT_WRITABLE_MASK |
+			VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
+		kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
+				VMX_EPT_FAKE_DIRTY_MASK, 0ull,
+				VMX_EPT_EXECUTABLE_MASK);
+		kvm_enable_tdp();
+	} else
+		kvm_disable_tdp();
 
 	if (bypass_guest_pf)
 		kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
--
1.5.6
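Editor's sketch, not part of the patch: the reason an explicit kvm_disable_tdp() matters can be modelled in a few lines of user-space C. The assumption here is that the TDP flag lives in the common module and therefore outlives a reload of the architecture module; every name below (tdp_enabled as a plain static, ept_param, arch_module_init, reload_arch_module) is a hypothetical stand-in, not the real KVM symbol.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real KVM symbols. */
static bool tdp_enabled;        /* assumed to live in the common module */
static bool ept_param = true;   /* models the kvm-intel "ept" parameter */
static bool cpu_has_ept = true; /* models the hardware capability */

static bool need_ept(void)
{
	return cpu_has_ept && ept_param;
}

/* Models architecture-module init: always set the shared flag explicitly. */
static void arch_module_init(void)
{
	if (need_ept())
		tdp_enabled = true;
	else
		tdp_enabled = false; /* without this branch a stale "true" would persist */
}

/* Models unloading and reloading only the architecture module. */
static bool reload_arch_module(bool ept)
{
	ept_param = ept;
	arch_module_init();
	return tdp_enabled;
}
```

With the else branch in place, reloading with EPT off reliably clears the flag; dropping that branch in the sketch reproduces the stale state the patch description calls undefined behaviour.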
[PATCH 0/2] configure: add support for audio-{drv,card}-list
The following series adds support for qemu's audio configure option lists, which were added in kvm-71. The lists select which host interface is used to provide audio to the guest (oss, alsa, sdl, esd, fmod, or pulseaudio) and which audio device emulations are enabled for the guest (ac97, adlib, cs4231a or gus).

PATCH 1/2: configure: include audio list options for --help output
PATCH 2/2: configure: passthrough for audio-{drv,card}-list

Carlo
[PATCH 2/2] configure: passthrough for audio-{drv,card}-list and logic cleanup
Extend the cleanup logic used in a patch from Jindrich Makovicka: change the default case to pass the full original option to qemu's configure, and add a passthrough for qemu options that take a space-separated list, such as the list of audio drivers enabled or the list of audio devices emulated.

Signed-off-by: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
---
 configure |   17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index 2558e0e..fc05767 100755
--- a/configure
+++ b/configure
@@ -10,6 +10,8 @@ qemu_cflags=
 qemu_ldflags=
 qemu_opts=
 cross_prefix=
+audio_drv_list=
+audio_card_list=
 arch=`uname -m`
 target_exec=
 
@@ -39,7 +41,8 @@ EOF
 }
 
 while [[ $1 = -* ]]; do
-    opt=$1; shift
+    optorig=$1; shift
+    opt=$optorig
     arg=
     if [[ $opt = *=* ]]; then
 	arg=${opt#*=}
@@ -67,16 +70,21 @@ while [[ $1 = -* ]]; do
 	--cross-prefix)
 	    cross_prefix=$arg
 	    ;;
+	--audio-drv-list)
+	    audio_drv_list=$arg
+	    ;;
+	--audio-card-list)
+	    audio_card_list=$arg
+	    ;;
 	--help)
 	    usage
 	    ;;
 	*)
-	    qemu_opts="$qemu_opts $opt"
+	    qemu_opts="$qemu_opts $optorig"
 	    ;;
     esac
 done
-
 
 #set kenel directory
 libkvm_kerneldir=$(readlink -f kernel)
 
@@ -114,11 +122,12 @@ fi
     --extra-ldflags="-L $PWD/../libkvm $qemu_ldflags" \
     --kernel-path=$libkvm_kerneldir \
     --prefix=$prefix \
+    ${audio_drv_list:+"--audio-drv-list=$audio_drv_list"} \
+    ${audio_card_list:+"--audio-card-list=$audio_card_list"} \
     ${cross_prefix:+--cross-prefix=$cross_prefix} \
     ${cross_prefix:+--cpu=$arch} $qemu_opts
 ) || usage
-
 
 cat <<EOF > config.mak
 ARCH=$arch
 PREFIX=$prefix
--
1.5.4.5
RE: [PATCH 01/04]Create x86 directory to hold x86-specific files.
Avi Kivity wrote:
> Zhang, Xiantao wrote:
>> From 03ac444d1ab4446c587e8180ceaba60b9e75b28d Mon Sep 17 00:00:00 2001
>> From: Xiantao Zhang [EMAIL PROTECTED]
>> Date: Fri, 11 Jul 2008 10:13:08 +0800
>> Subject: [PATCH] KVM: external module: Moving x86-specific files to x86 directory.
>>
>> Create x86 directory to hold x86-specific files.
>>
>> Signed-off-by: Xiantao Zhang [EMAIL PROTECTED]
>> ---
>>  kernel/{ => x86}/anon_inodes.c            |    0
>
> This isn't really x86 specific. It's just kernel version dependent.
> The problem is that it is built unconditionally, even if the kernel
> has anon_inodes support. Please send a patch that wraps the entire
> file in #ifdef so that we use the host kernel's anon_inodes if it is
> available.

Sure, I will update the patch.

>>  kernel/{ => x86}/external-module-compat.c |    0
>>  kernel/{ => x86}/external-module-compat.h |    0
>
> Parts of this are generic, for example the mutex code. Please move
> only the x86 specific parts (even if ia64 doesn't need everything in
> the generic code).

OK.
Re: [PATCH] Ignore DEBUGCTL MSRs
Avi Kivity wrote:
> Alexander Graf wrote:
>> Avi Kivity wrote:
>>> Alexander Graf wrote:
>>>> Netware writes and reads to the DEBUGCTL and LAST*IP MSRs without
>>>> further checks and is really confused to receive a #GP during
>>>> that. To make it happy we should just make them stubs, which is
>>>> exactly what SVM already does. To support VMX too, I put these in
>>>> the generic code. Maybe the SVM code could be cleaned up to use
>>>> generic code too.
>>>
>>> Please add a pr_unimpl() when bits that cause a real processor to
>>> do something are set.
>>
>> Like this? I also removed the set handlers for the *IP MSRs, as
>> these are read only and made it only handle debug bits, no perfmon
>> bits.
>
> With a changelog entry.

ok.

>> +		pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
>> +			  __func__, data);
>
> We can avoid the printout if data == 0, since we support that case
> fully.

I thought about that a lot. Even though we support data == 0, the kernel log output is usually useful for people trying to find out whether something is causing a problem. If they see DEBUGCTL being set but never see it being unset, they would get confused IMHO. So the current behavior is on purpose, but if you object to that idea, please tell me.

---

Netware writes to DEBUGCTL and reads from the DEBUGCTL and LAST*IP MSRs without further checks and is really confused to receive a #GP during that. To make it happy we should just make them stubs, which is exactly what SVM already does.

Writes to DEBUGCTL that set vendor-specific bits are handled as if the virtual CPU does not know those bits.

Signed-off-by: Alexander Graf [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc0721e..10f5e95 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -609,6 +609,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		pr_unimpl(vcpu, "%s: MSR_IA32_MCG_CTL 0x%llx, nop\n",
 			  __func__, data);
 		break;
+	case MSR_IA32_DEBUGCTLMSR:
+		if (data & ~(u64)(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
+			/* Values other than LBR and BTF are vendor-specific,
+			   thus reserved and should throw a #GP */
+			return 1;
+		}
+		pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+			  __func__, data);
+		break;
 	case MSR_IA32_UCODE_REV:
 	case MSR_IA32_UCODE_WRITE:
 		break;
@@ -705,6 +714,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_IA32_MC0_MISC+16:
 	case MSR_IA32_UCODE_REV:
 	case MSR_IA32_EBL_CR_POWERON:
+	case MSR_IA32_DEBUGCTLMSR:
+	case MSR_IA32_LASTBRANCHFROMIP:
+	case MSR_IA32_LASTBRANCHTOIP:
+	case MSR_IA32_LASTINTFROMIP:
+	case MSR_IA32_LASTINTTOIP:
 		data = 0;
 		break;
 	case MSR_MTRRcap:
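Editor's sketch, not part of the patch: the reserved-bit test in the write handler can be tried in isolation. The two macro names are taken from the patch and the bit positions (LBR is bit 0, BTF is bit 1) follow the architectural IA32_DEBUGCTL layout; the function name debugctl_write and its bare return convention are assumptions made for illustration.

```c
#include <stdint.h>

/* Names mirror the patch; bit 0 = LBR, bit 1 = BTF in IA32_DEBUGCTL. */
#define DEBUGCTLMSR_LBR (1ULL << 0) /* last-branch recording */
#define DEBUGCTLMSR_BTF (1ULL << 1) /* single-step on branches */

/*
 * Hypothetical helper: returns 1 when the write should raise #GP
 * (any bit other than LBR/BTF is set), 0 when it is accepted as a no-op.
 */
static int debugctl_write(uint64_t data)
{
	if (data & ~(uint64_t)(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF))
		return 1;
	return 0;
}
```

A write of 0, LBR, BTF, or both is accepted and ignored; any other bit takes the #GP path, which matches the stated intent that vendor-specific bits look unknown to the virtual CPU.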
[PATCH] qemu: re-add definition for qemu_get_launch_info
somehow missing from sysemu.h after a qemu merge and otherwise complaining with the following warning:

  kvm-71/qemu/migration.c: In function 'migration_init_ssh':
  kvm-71/qemu/migration.c:629: warning: implicit declaration of function 'qemu_get_launch_info'

Signed-off-by: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
---
 qemu/sysemu.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/qemu/sysemu.h b/qemu/sysemu.h
index 993d67b..ab8ac91 100644
--- a/qemu/sysemu.h
+++ b/qemu/sysemu.h
@@ -41,6 +41,9 @@ void qemu_system_powerdown(void);
 #endif
 void qemu_system_reset(void);
 
+void qemu_get_launch_info(int *argc, char ***argv,
+                          int *opt_daemonize, const char **opt_incoming);
+
 void do_savevm(const char *name);
 void do_loadvm(const char *name);
 void do_delvm(const char *name);
--
1.5.4.5
[PATCH] qemu: remove duplicated inclusion of signal.h in qemu-kvm.h
added by mistake as part of 4820cce75999b2673a964eb87601229a4bd78ad9

Signed-off-by: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
---
 qemu/qemu-kvm.h |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
index 8b7dcde..7e28428 100644
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -12,8 +12,6 @@
 
 #include <signal.h>
 
-#include <signal.h>
-
 int kvm_main_loop(void);
 int kvm_qemu_init(void);
 int kvm_qemu_create_context(void);
--
1.5.4.5
Re: kvm: Unknown error 524, Fail to handle apic access vmexit
* Yang, Sheng [EMAIL PROTECTED] [2008-07-16 11:26]:
> Hi Martin, can you show more dmesg here?

It doesn't contain any other messages from kvm. If you still want it, let me know.

> And can it be reproduced reliably?

I can reproduce this 100%.

Anyway, I just tried 2.6.26 with FlexPriority disabled and now kvm no longer exits (and there's no "Fail to handle apic access vmexit" message) but Windows still displays the same blue screen (and reboots).

--
Martin Michlmayr
http://www.cyrius.com/
PCI passthrough with VT-d - native performance
In the last few tests that we made with PCI passthrough and VT-d using iperf, we were able to get the same throughput as on the native OS with a 1G NIC (with higher CPU utilization).

The following patches are the PCI passthrough patches that Amit sent (rebased on the latest kvm tree), followed by a few improvements and the VT-d extension.

I am also sending the userspace patches: the patch that Amit sent for PCI passthrough and the direct-mmio extension for userspace (note that without the direct-mmio extension we get less than half the throughput).

Comments are welcome.

Regards,
Ben
[PATCH 5/8] KVM: PCIPT: change order of device release
Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8d25b4a..65b307d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -343,9 +343,9 @@ static void kvm_free_pci_passthrough(struct kvm *kvm)
 		pci_pt_dev = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
 
 		/* Search for this device got us a refcount */
-		pci_dev_put(pci_pt_dev->pt_dev.dev);
 		pci_release_regions(pci_pt_dev->pt_dev.dev);
 		pci_disable_device(pci_pt_dev->pt_dev.dev);
+		pci_dev_put(pci_pt_dev->pt_dev.dev);
 
 		list_del(&pci_pt_dev->list);
 		kfree(pci_pt_dev);
--
1.5.6
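Editor's sketch, not part of the patch: the reordering follows the usual rule that a reference is dropped only after the last use of the object. The toy model below is an assumption-laden illustration in user-space C; struct toy_dev, toy_put and toy_release are hypothetical and do not correspond to the PCI core API.

```c
#include <stdbool.h>

/* Hypothetical miniature of a refcounted device; not the PCI core API. */
struct toy_dev {
	int refcount;
	bool regions_held;
	bool enabled;
	bool freed;
};

static void toy_put(struct toy_dev *d)
{
	if (--d->refcount == 0)
		d->freed = true; /* the real object would be released here */
}

/* Returns false if the device was touched after its last reference was dropped. */
static bool toy_release(struct toy_dev *d, bool put_first)
{
	bool ok = true;

	if (put_first)
		toy_put(d);

	if (d->freed)
		ok = false;          /* use after the final put */
	d->regions_held = false; /* models pci_release_regions() */
	d->enabled = false;      /* models pci_disable_device() */

	if (!put_first)
		toy_put(d);

	return ok;
}
```

Calling toy_release() with put_first set models the old order and reports the late access; with it cleared, which models the patched order, the device is torn down first and the reference is dropped last.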
[PATCH 4/8] KVM: PCIPT: fix interrupt handling
This patch fixes a few problems with the interrupt handling for passthrough devices.

1. Pass the interrupt handler the pointer to the device, so we do not need to lock the pcipt lock in the interrupt handler.
2. Remove the pt_irq_handled bitmap - it is no longer needed.
3. Split kvm_pci_pt_work_fn into two functions, one for interrupt injection and another for the ack - the code is much simpler this way.
4. Change the passthrough initialization order - add the device structure to the list before registering the interrupt handler.
5. On the passthrough destruction path, free the interrupt handler before cleaning queued work.

Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c         |  156 ++-
 include/asm-x86/kvm_host.h |    5 +-
 virt/kvm/ioapic.c          |    5 +-
 3 files changed, 69 insertions(+), 97 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c07ca2b..8d25b4a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -145,49 +145,37 @@ kvm_find_pci_pt_dev(struct list_head *head,
 	return NULL;
 }
 
-static DECLARE_BITMAP(pt_irq_handled, NR_IRQS);
-
-static void kvm_pci_pt_work_fn(struct work_struct *work)
+static void kvm_pci_pt_int_work_fn(struct work_struct *work)
 {
-	struct kvm_pci_pt_dev_list *match;
 	struct kvm_pci_pt_work *int_work;
-	int source;
-	unsigned long flags;
-	int guest_irq;
-	int host_irq;
 
 	int_work = container_of(work, struct kvm_pci_pt_work, work);
 
-	source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ;
-
 	/* This is taken to safely inject irq inside the guest. When
 	 * the interrupt injection (or the ioapic code) uses a
 	 * finer-grained lock, update this
 	 */
-	mutex_lock(&int_work->kvm->lock);
-	read_lock_irqsave(&kvm_pci_pt_lock, flags);
-	match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL,
-				    int_work->irq, source);
-	if (!match) {
-		printk(KERN_ERR "%s: no matching device assigned to guest "
-		       "found for irq %d, source = %d!\n",
-		       __func__, int_work->irq, int_work->source);
-		read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
-		goto out;
-	}
-	guest_irq = match->pt_dev.guest.irq;
-	host_irq = match->pt_dev.host.irq;
-	read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
+	mutex_lock(&int_work->pt_dev->kvm->lock);
+	kvm_set_irq(int_work->pt_dev->kvm, int_work->pt_dev->guest.irq, 1);
+	mutex_unlock(&int_work->pt_dev->kvm->lock);
+	kvm_put_kvm(int_work->pt_dev->kvm);
+}
 
-	if (source == KVM_PT_SOURCE_IRQ)
-		kvm_set_irq(int_work->kvm, guest_irq, 1);
-	else {
-		kvm_set_irq(int_work->kvm, int_work->irq, 0);
-		enable_irq(host_irq);
-	}
-out:
-	mutex_unlock(&int_work->kvm->lock);
-	kvm_put_kvm(int_work->kvm);
+static void kvm_pci_pt_ack_work_fn(struct work_struct *work)
+{
+	struct kvm_pci_pt_work *ack_work;
+
+	ack_work = container_of(work, struct kvm_pci_pt_work, work);
+
+	/* This is taken to safely inject irq inside the guest. When
+	 * the interrupt injection (or the ioapic code) uses a
+	 * finer-grained lock, update this
+	 */
+	mutex_lock(&ack_work->pt_dev->kvm->lock);
+	kvm_set_irq(ack_work->pt_dev->kvm, ack_work->pt_dev->guest.irq, 0);
+	enable_irq(ack_work->pt_dev->host.irq);
+	mutex_unlock(&ack_work->pt_dev->kvm->lock);
+	kvm_put_kvm(ack_work->pt_dev->kvm);
 }
 
 /* FIXME: Implement the OR logic needed to make shared interrupts on
@@ -195,28 +183,11 @@ out:
  */
 static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id)
 {
-	struct kvm *kvm = (struct kvm *) dev_id;
-	struct kvm_pci_pt_dev_list *pci_pt_dev;
-
-	if (!test_bit(irq, pt_irq_handled))
-		return IRQ_NONE;
-
-	read_lock(&kvm_pci_pt_lock);
-	pci_pt_dev = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head, NULL,
-					 irq, KVM_PT_SOURCE_IRQ);
-	if (!pci_pt_dev) {
-		read_unlock(&kvm_pci_pt_lock);
-		return IRQ_NONE;
-	}
-
-	pci_pt_dev->pt_dev.int_work.irq = irq;
-	pci_pt_dev->pt_dev.int_work.kvm = kvm;
-	pci_pt_dev->pt_dev.int_work.source = 0;
-
-	kvm_get_kvm(kvm);
-	schedule_work(&pci_pt_dev->pt_dev.int_work.work);
-	read_unlock(&kvm_pci_pt_lock);
+	struct kvm_pci_passthrough_dev_kernel *pt_dev =
+		(struct kvm_pci_passthrough_dev_kernel *) dev_id;
 
+	kvm_get_kvm(pt_dev->kvm);
+	schedule_work(&pt_dev->int_work.work);
 	disable_irq_nosync(irq);
 	return IRQ_HANDLED;
 }
@@ -226,25 +197,20 @@ static void kvm_pci_pt_ack_irq(void *opaque, int irq)
 {
 	struct kvm *kvm = opaque;
 	struct kvm_pci_pt_dev_list *pci_pt_dev;
-	unsigned long flags;
[PATCH 7/8] KVM: PCIPT: VT-d support
This patch includes the functions to support VT-d for passthrough devices.

[Ben: fixed memory pinning, cleanup]

Signed-off-by: Kay, Allen M [EMAIL PROTECTED]
Signed-off-by: Weidong Han [EMAIL PROTECTED]
Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 arch/x86/kvm/Makefile      |    2 +-
 arch/x86/kvm/vtd.c         |  176
 arch/x86/kvm/x86.c         |   11 +++
 include/asm-x86/kvm_host.h |    1 +
 include/linux/kvm_host.h   |    6 ++
 virt/kvm/kvm_main.c        |    6 ++
 6 files changed, 201 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kvm/vtd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index d0e940b..5d9d079 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,7 @@ endif
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
 
 kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
-	i8254.o
+	i8254.o vtd.o
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
new file mode 100644
index 000..83efb8a
--- /dev/null
+++ b/arch/x86/kvm/vtd.c
@@ -0,0 +1,176 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Copyright (C) 2006-2008 Intel Corporation
+ * Author: Allen M. Kay [EMAIL PROTECTED]
+ * Author: Weidong Han [EMAIL PROTECTED]
+ */
+
+#include <linux/list.h>
+#include <linux/kvm_host.h>
+#include <linux/pci.h>
+#include <linux/dmar.h>
+#include <linux/intel-iommu.h>
+
+static int kvm_iommu_unmap_memslots(struct kvm *kvm);
+
+int kvm_iommu_map_pages(struct kvm *kvm,
+			gfn_t base_gfn, unsigned long npages)
+{
+	gfn_t gfn = base_gfn;
+	pfn_t pfn;
+	int i, rc;
+	struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
+
+	if (!domain)
+		return -EFAULT;
+
+	for (i = 0; i < npages; i++) {
+		pfn = gfn_to_pfn(kvm, gfn);
+		rc = intel_iommu_page_mapping(domain,
+					      gfn << PAGE_SHIFT,
+					      pfn << PAGE_SHIFT,
+					      PAGE_SIZE,
+					      DMA_PTE_READ |
+					      DMA_PTE_WRITE);
+		if (rc)
+			kvm_release_pfn_clean(pfn);
+
+		gfn++;
+	}
+	return 0;
+}
+
+static int kvm_iommu_map_memslots(struct kvm *kvm)
+{
+	int i, rc;
+	for (i = 0; i < kvm->nmemslots; i++) {
+		rc = kvm_iommu_map_pages(kvm, kvm->memslots[i].base_gfn,
+					 kvm->memslots[i].npages);
+		if (rc)
+			return rc;
+	}
+	return 0;
+}
+
+int kvm_iommu_map_guest(struct kvm *kvm,
+			struct kvm_pci_passthrough_dev *pci_pt_dev)
+{
+	struct pci_dev *pdev = NULL;
+
+	printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
+	       pci_pt_dev->host.busnr,
+	       PCI_SLOT(pci_pt_dev->host.devfn),
+	       PCI_FUNC(pci_pt_dev->host.devfn));
+
+	for_each_pci_dev(pdev) {
+		if ((pdev->bus->number == pci_pt_dev->host.busnr) &&
+		    (pdev->devfn == pci_pt_dev->host.devfn)) {
+			break;
+		}
+	}
+
+	if (pdev == NULL) {
+		if (kvm->arch.intel_iommu_domain) {
+			intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
+			kvm->arch.intel_iommu_domain = NULL;
+		}
+		return -ENODEV;
+	}
+
+	kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev);
+
+	if (kvm_iommu_map_memslots(kvm)) {
+		kvm_iommu_unmap_memslots(kvm);
+		return -EFAULT;
+	}
+
+	intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
+			       pdev->bus->number, pdev->devfn);
+
+	if (intel_iommu_context_mapping(kvm->arch.intel_iommu_domain,
+					pdev)) {
+		printk(KERN_ERR "Domain context map for %s failed",
+		       pci_name(pdev));
+		return -EFAULT;
+	}
+	return 0;
+}
+
+static int kvm_iommu_put_pages(struct kvm *kvm,
+
[PATCH 8/8] KVM: PCIPT: VT-d: dont map mmio memory slots
Avoid mapping mmio memory slots.

Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 arch/x86/kvm/vtd.c         |   20 +++++++++++++-------
 include/asm-x86/kvm_host.h |    2 ++
 virt/kvm/kvm_main.c        |    2 +-
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index 83efb8a..77044fb 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -40,14 +40,20 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 
 	for (i = 0; i < npages; i++) {
 		pfn = gfn_to_pfn(kvm, gfn);
-		rc = intel_iommu_page_mapping(domain,
-					      gfn << PAGE_SHIFT,
+		if (!is_mmio_pfn(pfn)) {
+			rc = intel_iommu_page_mapping(domain,
+					gfn << PAGE_SHIFT,
 					      pfn << PAGE_SHIFT,
-					      PAGE_SIZE,
-					      DMA_PTE_READ |
-					      DMA_PTE_WRITE);
-		if (rc)
-			kvm_release_pfn_clean(pfn);
+					PAGE_SIZE,
+					DMA_PTE_READ |
+					DMA_PTE_WRITE);
+			if (rc)
+				kvm_release_pfn_clean(pfn);
+		} else {
+			printk(KERN_DEBUG "kvm_iommu_map_page:"
+			       "invalid pfn=%lx\n", pfn);
+			return 0;
+		}
 
 		gfn++;
 	}
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 6185ed7..ee4685c 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -513,6 +513,8 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
 		  gpa_t addr, unsigned long *ret);
 
+int is_mmio_pfn(pfn_t pfn);
+
 extern bool tdp_enabled;
 
 enum emulation_result {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 77d7001..0653ec1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -77,7 +77,7 @@ static inline int valid_vcpu(int n)
 	return likely(n >= 0 && n < KVM_MAX_VCPUS);
 }
 
-static inline int is_mmio_pfn(pfn_t pfn)
+inline int is_mmio_pfn(pfn_t pfn)
 {
 	if (pfn_valid(pfn))
 		return PageReserved(pfn_to_page(pfn));
--
1.5.6
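Editor's sketch, not part of the patch: the control flow of the patched loop can be modelled in user-space C. The struct toy_page type, the is_mmio field and toy_map_pages are hypothetical; the real code derives the MMIO property from the pfn through is_mmio_pfn() and maps through intel_iommu_page_mapping().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page descriptor; the real code derives this from the pfn. */
struct toy_page {
	unsigned long pfn;
	bool is_mmio;
};

/*
 * Maps pages in order and stops at the first MMIO page, mirroring the
 * early "return 0" in the patch. Returns how many pages were mapped.
 */
static size_t toy_map_pages(const struct toy_page *pages, size_t npages,
			    bool *mapped)
{
	size_t i, count = 0;

	for (i = 0; i < npages; i++) {
		if (pages[i].is_mmio)
			break;        /* never hand MMIO to the IOMMU mapper */
		mapped[i] = true; /* stands in for intel_iommu_page_mapping() */
		count++;
	}
	return count;
}
```

One property the model makes visible: because the patch returns from inside the loop, any ordinary pages that follow the first MMIO page in the same range are also left unmapped. Whether that is intended is not stated in the posting.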
[PATCH 6/8] VT-d: changes to support KVM
From: Kay, Allen M [EMAIL PROTECTED]

This patch extends the VT-d driver to support KVM

[Ben: fixed memory pinning]

Signed-off-by: Kay, Allen M [EMAIL PROTECTED]
Signed-off-by: Weidong Han [EMAIL PROTECTED]
Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 drivers/pci/dmar.c                            |    4 ++--
 drivers/pci/intel-iommu.c                     |  117 +-
 drivers/pci/iova.c                            |    2 +-
 {drivers/pci => include/linux}/intel-iommu.h  |   11 +++
 {drivers/pci => include/linux}/iova.h         |    0
 5 files changed, 127 insertions(+), 7 deletions(-)
 rename {drivers/pci => include/linux}/intel-iommu.h (94%)
 rename {drivers/pci => include/linux}/iova.h (100%)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f941f60..a58a5b0 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -26,8 +26,8 @@
 
 #include <linux/pci.h>
 #include <linux/dmar.h>
-#include "iova.h"
-#include "intel-iommu.h"
+#include <linux/iova.h>
+#include <linux/intel-iommu.h>
 
 #undef PREFIX
 #define PREFIX "DMAR:"
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index bb06423..a566406 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -20,6 +20,7 @@
  * Author: Anil S Keshavamurthy [EMAIL PROTECTED]
  */
 
+#undef DEBUG
 #include <linux/init.h>
 #include <linux/bitmap.h>
 #include <linux/debugfs.h>
@@ -33,8 +34,8 @@
 #include <linux/dma-mapping.h>
 #include <linux/mempool.h>
 #include <linux/timer.h>
-#include "iova.h"
-#include "intel-iommu.h"
+#include <linux/iova.h>
+#include <linux/intel-iommu.h>
 #include <asm/proto.h> /* force_iommu in this header in x86-64*/
 #include <asm/cacheflush.h>
 #include <asm/gart.h>
@@ -160,7 +161,7 @@ static inline void *alloc_domain_mem(void)
 	return iommu_kmem_cache_alloc(iommu_domain_cache);
 }
 
-static inline void free_domain_mem(void *vaddr)
+static void free_domain_mem(void *vaddr)
 {
 	kmem_cache_free(iommu_domain_cache, vaddr);
 }
@@ -1414,7 +1415,7 @@ static void domain_remove_dev_info(struct dmar_domain *domain)
  * find_domain
  * Note: we use struct pci_dev->dev.archdata.iommu stores the info
  */
-struct dmar_domain *
+static struct dmar_domain *
 find_domain(struct pci_dev *pdev)
 {
 	struct device_domain_info *info;
@@ -2431,3 +2432,111 @@ int __init intel_iommu_init(void)
 	return 0;
 }
 
+void intel_iommu_domain_exit(struct dmar_domain *domain)
+{
+	u64 end;
+
+	/* Domain 0 is reserved, so dont process it */
+	if (!domain)
+		return;
+
+	end = DOMAIN_MAX_ADDR(domain->gaw);
+	end = end & (~PAGE_MASK_4K);
+
+	/* clear ptes */
+	dma_pte_clear_range(domain, 0, end);
+
+	/* free page tables */
+	dma_pte_free_pagetable(domain, 0, end);
+
+	iommu_free_domain(domain);
+	free_domain_mem(domain);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_domain_exit);
+
+struct dmar_domain *intel_iommu_domain_alloc(struct pci_dev *pdev)
+{
+	struct dmar_drhd_unit *drhd;
+	struct dmar_domain *domain;
+	struct intel_iommu *iommu;
+
+	drhd = dmar_find_matched_drhd_unit(pdev);
+	if (!drhd) {
+		printk(KERN_ERR "intel_iommu_domain_alloc: drhd == NULL\n");
+		return NULL;
+	}
+
+	iommu = drhd->iommu;
+	if (!iommu) {
+		printk(KERN_ERR
+			"intel_iommu_domain_alloc: iommu == NULL\n");
+		return NULL;
+	}
+	domain = iommu_alloc_domain(iommu);
+	if (!domain) {
+		printk(KERN_ERR
+			"intel_iommu_domain_alloc: domain == NULL\n");
+		return NULL;
+	}
+	if (domain_init(domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
+		printk(KERN_ERR
+			"intel_iommu_domain_alloc: domain_init() failed\n");
+		intel_iommu_domain_exit(domain);
+		return NULL;
+	}
+	return domain;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_domain_alloc);
+
+int intel_iommu_context_mapping(
+	struct dmar_domain *domain, struct pci_dev *pdev)
+{
+	int rc;
+	rc = domain_context_mapping(domain, pdev);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_context_mapping);
+
+int intel_iommu_page_mapping(
+	struct dmar_domain *domain, dma_addr_t iova,
+	u64 hpa, size_t size, int prot)
+{
+	int rc;
+	rc = domain_page_mapping(domain, iova, hpa, size, prot);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_page_mapping);
+
+void intel_iommu_detach_dev(struct dmar_domain *domain, u8 bus, u8 devfn)
+{
+	detach_domain_for_dev(domain, bus, devfn);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_detach_dev);
+
+struct dmar_domain *
+intel_iommu_find_domain(struct pci_dev *pdev)
+{
+	return find_domain(pdev);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_find_domain);
+
+int intel_iommu_found(void)
+{
+	return g_num_of_iommus;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_found);
+
+u64 intel_iommu_iova_to_pfn(struct dmar_domain *domain, u64
[PATCH 1/8] KVM: Introduce a callback routine for IOAPIC ack handling
From: Amit Shah [EMAIL PROTECTED]

This will be useful for acking irqs of assigned devices

Signed-off-by: Amit Shah [EMAIL PROTECTED]
---
 virt/kvm/ioapic.c |    3 +++
 virt/kvm/ioapic.h |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index c0d2287..8ce93c7 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -295,6 +295,9 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
 	ent->fields.remote_irr = 0;
 	if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
 		ioapic_service(ioapic, gsi);
+
+	if (ioapic->ack_notifier)
+		ioapic->ack_notifier(ioapic->kvm, gsi);
 }
 
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector)
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 7f16675..a42743f 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -58,6 +58,7 @@ struct kvm_ioapic {
 	} redirtbl[IOAPIC_NUM_PINS];
 	struct kvm_io_device dev;
 	struct kvm *kvm;
+	void (*ack_notifier)(void *opaque, int irq);
 };
 
 #ifdef DEBUG
-- 
1.5.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
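For readers unfamiliar with the pattern, the optional-callback idea in this patch can be modelled in plain user-space C. This is only a rough sketch: struct ioapic_model, ioapic_model_eoi() and record_ack() are hypothetical names invented for illustration, not part of the KVM tree.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_ioapic; only the fields the sketch needs. */
struct ioapic_model {
	void *owner;                                  /* opaque pointer handed back to the callback */
	void (*ack_notifier)(void *opaque, int irq);  /* optional; NULL means nobody is listening */
};

/* Models the tail of the EOI path: notify only when a callback is registered. */
void ioapic_model_eoi(struct ioapic_model *ioapic, int gsi)
{
	if (ioapic->ack_notifier)
		ioapic->ack_notifier(ioapic->owner, gsi);
}

/* Example listener (hypothetical): remembers the last irq that was acked. */
int last_acked_irq = -1;

void record_ack(void *opaque, int irq)
{
	(void)opaque;
	last_acked_irq = irq;
}
```

Leaving ack_notifier as NULL keeps the EOI path unchanged for guests without assigned devices, which appears to be why the patch guards the call with a NULL check.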
Re: kvm: Unknown error 524, Fail to handle apic access vmexit
Yang, Sheng wrote:
On Tuesday 15 July 2008 23:19:07 Dor Laor wrote:
Martin Michlmayr wrote:
I installed a Windows XP SP2 guest on a Debian x86_64 host. The installation itself went fine, but kvm aborts when XP starts during Windows XP Setup. XP mentions something with intelppm.sys (see the attached screenshot) and kvm says:

kvm_run: Unknown error 524
kvm_run returned -524

It's a FlexPriority bug. While it should be solved, you can disable it by using the kvm-intel module parameter.

Dor, are you sure it's a FlexPriority bug?

Well, I'm not sure it's FlexPriority's fault; it's just that when it is disabled it does not happen, and I saw the apic access. It can be misemulation too. It happened to me on ~ kvm-69

If you look at where the complaint comes from, you will find it is a result of emulate_instruction(). And you will find a clear "emulation failed (mmio) rip 7cb3d000 ff ff 8d 85" in the bug tracker Martin mentioned, above the "Fail to handle apic access vmexit! Offset is 0xf0" (Spurious Interrupt Vector Register). I don't think ff ff 8d 85 is a valid opcode for that case.

Maybe it's a regression? The last report is long ago...

Hi Martin, can you show more dmesg here? And can it be reproduced reliably? Thanks.
Re: networking setup problem
paolo pedaletti wrote:
Hi, I hope this is the right ml to submit my problem.

Abstract: I can't set up 2 different networks inside my VMs, one public and one private.

Scheme:
[ASCII diagram lost in archiving: a HOST box connected to three VMs (proxy, web, db), each with an eth0 and an eth1 interface]

This is a classic LAMP stack, spread over 3 VMs:
1) front end, proxy (apache2 in reverse with mod-security)
2) application server, web (apache2 + php5)
3) database (mysql5)
(it's a test/backup environment)

Each VM must have 2 network cards:
eth0 on the local network, in bridge with the host physical eth0
eth1 on the virtual private network, for internal communications between them

Having said that, ... it doesn't work :-(
(linux ubuntu 8.04 2.6.24-19-generic, kvm-62)

These are the command lines:

kvm -name PROXY -net nic,vlan=0,macaddr=00:18:BE:EF:17:2A,model=rtl8139 -net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh -net nic,vlan=1,macaddr=00:18:BE:EF:17:2B,model=rtl8139 -net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh -drive index=0,media=disk,if=scsi,file=./ubuntu-server.PROXY.root,boot=on -drive index=1,media=disk,if=scsi,file=./ubuntu-server.PROXY.home -drive index=2,media=disk,if=scsi,file=./ubuntu-server.PROXY.swap

kvm -name WEBAPP -net nic,vlan=0,macaddr=00:18:BE:EF:17:1A,model=rtl8139 -net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh -net nic,vlan=1,macaddr=00:18:BE:EF:17:1B,model=rtl8139 -net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh -drive index=0,media=disk,if=scsi,file=./ubuntu-server.WEB.root,boot=on -drive index=1,media=disk,if=scsi,file=./ubuntu-server.WEB.home -drive index=2,media=disk,if=scsi,file=./ubuntu-server.WEB.swap

kvm -name DB -net nic,vlan=0,macaddr=00:18:BE:EF:17:0A,model=rtl8139 -net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh -net nic,vlan=1,macaddr=00:18:BE:EF:17:0B,model=rtl8139 -net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh -drive index=0,media=disk,if=scsi,file=./ubuntu-server.DB.root,boot=on -drive index=1,media=disk,if=scsi,file=./ubuntu-server.DB.home -drive index=2,media=disk,if=scsi,file=./ubuntu-server.DB.swap

Does using a different ifname help?
PROXY: ifname=tap2 and dmz2
WEBAPP: ifname=tap1 and dmz1
DB: ifname=tap0 and dmz0

Also check route on guests.
[ kvm-Bugs-2019053 ] tbench fails on guest when AMD NPT enabled
Bugs item #2019053, was opened at 2008-07-16 03:10
Message generated for change (Comment added) made by avik
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019053&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: amd
Group: None
Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: Alex Williamson (alex_williamson)
Assigned to: Nobody/Anonymous (nobody)
Summary: tbench fails on guest when AMD NPT enabled

Initial Comment:
Running on a dual-socket system with AMD 2356 quad-core processors (8 total cores), 32GB RAM, Ubuntu Hardy 2.6.24-19-generic (64bit) with kvm-71 userspace and kernel modules. With no module options, dmesg confirms:

kvm: Nested Paging enabled

Start guest with:

/usr/local/kvm/bin/qemu-system-x86_64 -hda /dev/VM/Ubuntu64 -m 1024 -net nic,model=e1000,mac=de:ad:be:ef:00:01 -net tap,script=/root/bin/br0-ifup -smp 8 -vnc :0

Guest VM is also Ubuntu Hardy 64bit. On the guest run 'tbench 16 tbench server'. System running tbench_srv is a different system in my case. The tbench client will fail randomly, often quietly with "Child failed with status 1", but sometimes more harshly with a glibc double free error. If I unload the modules and reload w/o npt:

modprobe -r kvm-amd
modprobe -r kvm
modprobe kvm-amd npt=0

dmesg confirms:

kvm: Nested Paging disabled

The tbench test now runs over and over successfully. The test also runs fine on an Intel E5450 (no EPT).

--

Comment By: Avi Kivity (avik)
Date: 2008-07-16 17:19

Message:
Logged In: YES
user_id=539971
Originator: NO

Strange. If you add an mlockall() to qemu startup, does the test pass?
[PATCH] posix-timers: Do not modify an already queued timer signal
When a timer fires, posix_timer_event() zeroes out its pre-allocated siginfo structure, initialises it and then queues up the signal with send_sigqueue(). However, we may have previously queued up this signal, in which case we only want to increment si_overrun and re-initialising the siginfo structure is incorrect. Also, since we are modifying an already queued signal without the protection of the sighand spinlock, we may also race with e.g. collect_signal() causing it to fail to find a signal on the pending list because it happens to look at the siginfo struct after it was zeroed and before it was re-initialised. The race was observed with a modified kvm-userspace when running a guest under heavy network load. When it occurs, KVM never sees another SIGALRM signal because although the signal is queued up the appropriate bit is never set in the pending mask. Manually sending the process a SIGALRM kicks it out of this state. The fix is simple - only modify the pre-allocated sigqueue once we're sure that it hasn't already been queued. 
Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
Cc: Oleg Nesterov [EMAIL PROTECTED]
Cc: Roland McGrath [EMAIL PROTECTED]
---
 include/linux/sched.h |    2 +-
 kernel/posix-timers.c |   20 +++++++++++---------
 kernel/signal.c       |    5 +++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2134917..718f7ec 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1791,7 +1791,7 @@ extern void zap_other_threads(struct task_struct *p);
 extern int kill_proc(pid_t, int, int);
 extern struct sigqueue *sigqueue_alloc(void);
 extern void sigqueue_free(struct sigqueue *);
-extern int send_sigqueue(struct sigqueue *, struct task_struct *, int group);
+extern int send_sigqueue(struct sigqueue *, siginfo_t *, struct task_struct *, int group);
 extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
 extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long);
 
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index dbd8398..b42c964 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -298,19 +298,21 @@ void do_schedule_next_timer(struct siginfo *info)
 
 int posix_timer_event(struct k_itimer *timr,int si_private)
 {
-	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
-	timr->sigq->info.si_sys_private = si_private;
+	siginfo_t info;
+
+	memset(&info, 0, sizeof(siginfo_t));
+	info.si_sys_private = si_private;
 	/* Send signal to the process that owns this timer.*/
 
-	timr->sigq->info.si_signo = timr->it_sigev_signo;
-	timr->sigq->info.si_errno = 0;
-	timr->sigq->info.si_code = SI_TIMER;
-	timr->sigq->info.si_tid = timr->it_id;
-	timr->sigq->info.si_value = timr->it_sigev_value;
+	info.si_signo = timr->it_sigev_signo;
+	info.si_errno = 0;
+	info.si_code = SI_TIMER;
+	info.si_tid = timr->it_id;
+	info.si_value = timr->it_sigev_value;
 
 	if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
 		struct task_struct *leader;
-		int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
+		int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);
 
 		if (likely(ret >= 0))
 			return ret;
@@ -321,7 +323,7 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
 		timr->it_process = leader;
 	}
 
-	return send_sigqueue(timr->sigq, timr->it_process, 1);
+	return send_sigqueue(timr->sigq, &info, timr->it_process, 1);
 }
 EXPORT_SYMBOL_GPL(posix_timer_event);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 6c0958e..50e0b13 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1292,9 +1292,9 @@ void sigqueue_free(struct sigqueue *q)
 	__sigqueue_free(q);
 }
 
-int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
+int send_sigqueue(struct sigqueue *q, siginfo_t *info, struct task_struct *t, int group)
 {
-	int sig = q->info.si_signo;
+	int sig = info->si_signo;
 	struct sigpending *pending;
 	unsigned long flags;
 	int ret;
@@ -1322,6 +1322,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 
 	signalfd_notify(t, sig);
 	pending = group ? &t->signal->shared_pending : &t->pending;
+	copy_siginfo(&q->info, info);
 	list_add_tail(&q->list, &pending->list);
 	sigaddset(&pending->signal, sig);
 	complete_signal(sig, t, group);
-- 
1.5.5.1
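The idea behind the fix (build the payload locally, and touch the preallocated entry only when it is not already queued) can be illustrated with a small user-space model. This is a sketch under simplifying assumptions: struct info_model, struct sigq_model and timer_event_model() are hypothetical names, the model is single-threaded, and it omits the sighand spinlock that the real send_sigqueue() takes.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for siginfo_t and struct sigqueue. */
struct info_model {
	int signo;
	int overrun;
	int value;
};

struct sigq_model {
	bool queued;             /* true while the entry sits on a pending list */
	struct info_model info;  /* payload a consumer would read */
};

/*
 * Build the payload in a local variable and copy it into the
 * preallocated entry only when the entry is not already queued.
 * Returns 1 if the signal was already pending (overrun counted),
 * 0 if it was newly queued.
 */
int timer_event_model(struct sigq_model *q, int signo, int value)
{
	struct info_model info;

	memset(&info, 0, sizeof(info));
	info.signo = signo;
	info.value = value;

	if (q->queued) {
		/* Already pending: leave the queued payload intact, just count it. */
		q->info.overrun++;
		return 1;
	}

	q->info = info;    /* stands in for copy_siginfo() */
	q->queued = true;  /* stands in for list_add_tail() + sigaddset() */
	return 0;
}
```

In the model, a second expiry while the entry is still queued only bumps the overrun counter, so a concurrent reader would never observe a zeroed payload.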
Re: [PATCH 4/8] KVM: PCIPT: fix interrupt handling
Ben-Ami Yassour wrote:
This patch fixes a few problems with the interrupt handling for passthrough devices.

Well, fold it into the patch it fixes. There is no point in sending a buggy patch and a fix in the same patchset.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 3/8] KVM: Handle device assignment to guests
Ben-Ami Yassour wrote: From: Han, Weidong [EMAIL PROTECTED] This patch adds support for handling PCI devices that are assigned to the guest (PCI passthrough). + +/* + * Used to find a registered host PCI device (a passthrough device) + * during ioctls, interrupts or EOI + */ +struct kvm_pci_pt_dev_list * +kvm_find_pci_pt_dev(struct list_head *head, + struct kvm_pci_pt_info *pt_pci_info, int irq, int source) +{ + struct list_head *ptr; + struct kvm_pci_pt_dev_list *match; + + list_for_each(ptr, head) { + match = list_entry(ptr, struct kvm_pci_pt_dev_list, list); + + switch (source) { + case KVM_PT_SOURCE_IRQ: + /* +* Used to find a registered host device +* during interrupt context on host +*/ + if (match-pt_dev.host.irq == irq) + return match; + break; + case KVM_PT_SOURCE_IRQ_ACK: + /* +* Used to find a registered host device when +* the guest acks an interrupt +*/ + if (match-pt_dev.guest.irq == irq) + return match; + break; + case KVM_PT_SOURCE_UPDATE: + if ((match-pt_dev.host.busnr == pt_pci_info-busnr) + (match-pt_dev.host.devfn == pt_pci_info-devfn)) + return match; + break; + } + } + return NULL; +} This monster is best split into three functions each handling a separate case, without the 'source' argument. +static void kvm_pci_pt_work_fn(struct work_struct *work) +{ + struct kvm_pci_pt_dev_list *match; + struct kvm_pci_pt_work *int_work; + int source; + unsigned long flags; + int guest_irq; + int host_irq; + + int_work = container_of(work, struct kvm_pci_pt_work, work); + + source = int_work-source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ; + + /* This is taken to safely inject irq inside the guest. 
When +* the interrupt injection (or the ioapic code) uses a +* finer-grained lock, update this +*/ + mutex_lock(int_work-kvm-lock); + read_lock_irqsave(kvm_pci_pt_lock, flags); + match = kvm_find_pci_pt_dev(int_work-kvm-arch.pci_pt_dev_head, NULL, + int_work-irq, source); + if (!match) { + printk(KERN_ERR %s: no matching device assigned to guest + found for irq %d, source = %d!\n, + __func__, int_work-irq, int_work-source); + read_unlock_irqrestore(kvm_pci_pt_lock, flags); + goto out; + } + guest_irq = match-pt_dev.guest.irq; + host_irq = match-pt_dev.host.irq; + read_unlock_irqrestore(kvm_pci_pt_lock, flags); + + if (source == KVM_PT_SOURCE_IRQ) + kvm_set_irq(int_work-kvm, guest_irq, 1); + else { + kvm_set_irq(int_work-kvm, int_work-irq, 0); + enable_irq(host_irq); + } +out: + mutex_unlock(int_work-kvm-lock); + kvm_put_kvm(int_work-kvm); +} + +/* FIXME: Implement the OR logic needed to make shared interrupts on + * this line behave properly + */ Isn't this a showstopper? There is no easy way for a user to avoid sharing, especially as we have only three pci irqs at present. +static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id) +{ + struct kvm *kvm = (struct kvm *) dev_id; + struct kvm_pci_pt_dev_list *pci_pt_dev; + + if (!test_bit(irq, pt_irq_handled)) + return IRQ_NONE; + + read_lock(kvm_pci_pt_lock); + pci_pt_dev = kvm_find_pci_pt_dev(kvm-arch.pci_pt_dev_head, NULL, +irq, KVM_PT_SOURCE_IRQ); + if (!pci_pt_dev) { + read_unlock(kvm_pci_pt_lock); + return IRQ_NONE; + } I see we don't reuse the result of the search. I guess we can't, since the list may change between the interrupt and the execution of the work function. + + pci_pt_dev-pt_dev.int_work.irq = irq; + pci_pt_dev-pt_dev.int_work.kvm = kvm; + pci_pt_dev-pt_dev.int_work.source = 0; + For a bool, use false, not 0. But 'source' isn't really a good name for a boolean. Perhaps 'is_ack'? 
+ +/* Ack the irq line for a passthrough device */ +static void kvm_pci_pt_ack_irq(void *opaque, int irq) +{ + struct kvm *kvm = opaque; + struct kvm_pci_pt_dev_list *pci_pt_dev; + unsigned long flags; + + if (irq == -1) + return; + + read_lock_irqsave(kvm_pci_pt_lock, flags); + pci_pt_dev = kvm_find_pci_pt_dev(kvm-arch.pci_pt_dev_head, NULL, irq, +
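The review suggestion to split kvm_find_pci_pt_dev() into three functions, one per lookup key and without the 'source' argument, might look roughly like the following user-space sketch. The struct and function names here (pt_dev_model, find_by_host_irq, find_by_guest_irq, find_by_bus_devfn) are hypothetical, and a plain singly linked list stands in for the kernel's list_head.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct kvm_pci_pt_dev_list. */
struct pt_dev_model {
	int host_irq;
	int guest_irq;
	int busnr;
	int devfn;
	struct pt_dev_model *next;
};

/* Lookup used from the host interrupt handler. */
struct pt_dev_model *find_by_host_irq(struct pt_dev_model *head, int irq)
{
	for (; head; head = head->next)
		if (head->host_irq == irq)
			return head;
	return NULL;
}

/* Lookup used when the guest acks an interrupt. */
struct pt_dev_model *find_by_guest_irq(struct pt_dev_model *head, int irq)
{
	for (; head; head = head->next)
		if (head->guest_irq == irq)
			return head;
	return NULL;
}

/* Lookup used from the ioctl path, keyed on the host bus/devfn pair. */
struct pt_dev_model *find_by_bus_devfn(struct pt_dev_model *head,
				       int busnr, int devfn)
{
	for (; head; head = head->next)
		if (head->busnr == busnr && head->devfn == devfn)
			return head;
	return NULL;
}
```

Each caller then names the key it searches by, and the NULL pt_pci_info argument seen in the irq paths of the quoted patch would no longer be needed.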
[ kvm-Bugs-2019053 ] tbench fails on guest when AMD NPT enabled
Bugs item #2019053, was opened at 2008-07-15 18:10
Message generated for change (Comment added) made by alex_williamson
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019053&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: amd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Alex Williamson (alex_williamson)
Assigned to: Nobody/Anonymous (nobody)
Summary: tbench fails on guest when AMD NPT enabled

Initial Comment:
Running on a dual-socket system with AMD 2356 quad-core processors (8 total cores), 32GB RAM, Ubuntu Hardy 2.6.24-19-generic (64bit) with kvm-71 userspace and kernel modules. With no module options, dmesg confirms:

kvm: Nested Paging enabled

Start guest with:

/usr/local/kvm/bin/qemu-system-x86_64 -hda /dev/VM/Ubuntu64 -m 1024 -net nic,model=e1000,mac=de:ad:be:ef:00:01 -net tap,script=/root/bin/br0-ifup -smp 8 -vnc :0

Guest VM is also Ubuntu Hardy 64bit. On the guest run 'tbench 16 tbench server'. System running tbench_srv is a different system in my case. The tbench client will fail randomly, often quietly with "Child failed with status 1", but sometimes more harshly with a glibc double free error. If I unload the modules and reload w/o npt:

modprobe -r kvm-amd
modprobe -r kvm
modprobe kvm-amd npt=0

dmesg confirms:

kvm: Nested Paging disabled

The tbench test now runs over and over successfully. The test also runs fine on an Intel E5450 (no EPT).

--

Comment By: Alex Williamson (alex_williamson)
Date: 2008-07-16 09:18

Message:
Logged In: YES
user_id=333914
Originator: YES

No, I added mlockall(MCL_CURRENT | MCL_FUTURE) to qemu/vl.c:main() and it makes no difference. I'm only starting a 1G guest on an otherwise idle 32G host, so host memory pressure is pretty light.

--

Comment By: Avi Kivity (avik)
Date: 2008-07-16 08:19

Message:
Logged In: YES
user_id=539971
Originator: NO

Strange. If you add an mlockall() to qemu startup, does the test pass?
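For anyone wanting to repeat the mlockall() experiment from the comments, a minimal sketch of such a call is shown below. The wrapper name lock_all_memory() is hypothetical; only mlockall() and its MCL_CURRENT and MCL_FUTURE flags come from the thread. Note that the call can fail (for example with EPERM or ENOMEM) when the process lacks the privilege or the RLIMIT_MEMLOCK headroom to lock its memory.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: lock all current and future mappings of the
 * calling process into RAM. Returns 0 on success, or a negative errno
 * value if the kernel refused the request.
 */
int lock_all_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return -errno;
	return 0;
}
```

Calling this early in main(), before guest memory is allocated, matches what the reporter describes; MCL_FUTURE is what makes later allocations locked as well.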
Re: PCI passthrough with VT-d - native performance
On Wed, 2008-07-16 at 17:36 +0300, Avi Kivity wrote:
Ben-Ami Yassour wrote:
In the last few tests that we made with PCI-passthrough and VT-d using iperf, we were able to get the same throughput as on native OS with a 1G NIC

Excellent!

(with higher CPU utilization).

How much higher?

Here are some numbers for running iperf -l 1M:

e1000 NIC (behind a PCI bridge)
                             Bandwidth (Mbit/sec)   CPU utilization
Native OS                    771                    18%
Native OS with VT-d          760                    18%
KVM VT-d                     390                    95%
KVM VT-d with direct mmio    770                    84%
KVM emulated                 57                     100%

Comment: it's not clear to me why native Linux cannot get closer to 1G for this NIC (I verified that it's not an external network issue). But clearly we shouldn't hope to get more than the host does with a KVM guest (especially if the guest and host are the same OS, as in this case...).

e1000e NIC (onboard)
                             Bandwidth (Mbit/sec)   CPU utilization
Native OS                    915                    18%
Native OS with VT-d          915                    18%
KVM VT-d with direct mmio    914                    98%

Clearly we need to try and improve the CPU utilization, but I think that this is good enough for the first phase.

The following patches are the PCI-passthrough patches that Amit sent (re-based on the last kvm tree), followed by a few improvements and the VT-d extension. I am also sending the userspace patches: the patch that Amit sent for PCI passthrough and the direct-mmio extension for userspace (note that without the direct mmio extension we get less than half the throughput).

Is mmio passthrough the reason for the performance improvement? If not, what was the problem?

Direct mmio was definitely a major improvement; without it we got half the throughput, as you can see above. In addition, patch 4/8 improves the interrupt handling and removes unnecessary locks, and I assume that it also fixed performance issues (I did not investigate exactly in what way).

Regards,
Ben
Re: PCI passthrough with VT-d - native performance
Ben-Ami Yassour wrote:
(with higher CPU utilization).

How much higher?

Here are some numbers for running iperf -l 1M:

e1000 NIC (behind a PCI bridge)
                             Bandwidth (Mbit/sec)   CPU utilization
Native OS                    771                    18%
Native OS with VT-d          760                    18%
KVM VT-d                     390                    95%
KVM VT-d with direct mmio    770                    84%
KVM emulated                 57                     100%

Comment: it's not clear to me why native Linux cannot get closer to 1G for this NIC (I verified that it's not an external network issue). But clearly we shouldn't hope to get more than the host does with a KVM guest (especially if the guest and host are the same OS, as in this case...).

e1000e NIC (onboard)
                             Bandwidth (Mbit/sec)   CPU utilization
Native OS                    915                    18%
Native OS with VT-d          915                    18%
KVM VT-d with direct mmio    914                    98%

Clearly we need to try and improve the CPU utilization, but I think that this is good enough for the first phase.

Agree; part of the higher utilization is of course not the fault of the device assignment code, rather it is ordinary virtualization overhead. We'll have to tune this.

-- 
error compiling committee.c: too many arguments to function
Re: PCI passthrough with VT-d - native performance
Ben-Ami Yassour wrote:
On Wed, 2008-07-16 at 17:36 +0300, Avi Kivity wrote:
Ben-Ami Yassour wrote:
In the last few tests that we made with PCI-passthrough and VT-d using iperf, we were able to get the same throughput as on native OS with a 1G NIC

Excellent!

(with higher CPU utilization).

How much higher?

Here are some numbers for running iperf -l 1M:

e1000 NIC (behind a PCI bridge)
                             Bandwidth (Mbit/sec)   CPU utilization
Native OS                    771                    18%
Native OS with VT-d          760                    18%
KVM VT-d                     390                    95%
KVM VT-d with direct mmio    770                    84%
KVM emulated                 57                     100%

What about virtio? Also, which emulated is this? That CPU utilization is extremely high and somewhat illogical if native w/vt-d has almost no CPU impact. Have you run oprofile yet or have any insight into where CPU is being burnt? What does kvm_stat look like? I wonder if there are a large number of PIO exits. What does the interrupt count look like on native vs. KVM with VT-d?

Regards,
Anthony Liguori

Comment: it's not clear to me why native Linux cannot get closer to 1G for this NIC (I verified that it's not an external network issue). But clearly we shouldn't hope to get more than the host does with a KVM guest (especially if the guest and host are the same OS, as in this case...).

e1000e NIC (onboard)
                             Bandwidth (Mbit/sec)   CPU utilization
Native OS                    915                    18%
Native OS with VT-d          915                    18%
KVM VT-d with direct mmio    914                    98%

Clearly we need to try and improve the CPU utilization, but I think that this is good enough for the first phase.

The following patches are the PCI-passthrough patches that Amit sent (re-based on the last kvm tree), followed by a few improvements and the VT-d extension. I am also sending the userspace patches: the patch that Amit sent for PCI passthrough and the direct-mmio extension for userspace (note that without the direct mmio extension we get less than half the throughput).

Is mmio passthrough the reason for the performance improvement? If not, what was the problem?

Direct mmio was definitely a major improvement; without it we got half the throughput, as you can see above. In addition, patch 4/8 improves the interrupt handling and removes unnecessary locks, and I assume that it also fixed performance issues (I did not investigate exactly in what way).

Regards,
Ben
[PATCH v2/RFC] libkvm-s390
This is an update patch for libkvm to build and work on s390. It should address all comments from Avi as well as some aspects I have found: o implement kvm_show_regs o use s390 instead of s390x in file names. It is commonly used for 31 and 64bit systems o dont define __s390__ and __s390x__ in config.mak. Its predefined in gcc. o add some callbacks (done by Carsten, but not yet posted) From: Carsten Otte [EMAIL PROTECTED] From: Christian Borntraeger [EMAIL PROTECTED] Signed-off-by: Christian Borntraeger [EMAIL PROTECTED] --- Makefile|2 libkvm/config-s390.mak |3 + libkvm/config-s390x.mak |3 + libkvm/kvm-common.h |7 ++ libkvm/kvm-s390.h | 31 ++ libkvm/libkvm-s390.c| 137 libkvm/libkvm.c | 25 libkvm/libkvm.h | 17 + 8 files changed, 224 insertions(+), 1 deletion(-) Index: kvm-userspace/Makefile === --- kvm-userspace.orig/Makefile +++ kvm-userspace/Makefile @@ -5,7 +5,7 @@ DESTDIR= rpmrelease = devel -sane-arch = $(subst i386,x86,$(subst x86_64,x86,$(ARCH))) +sane-arch = $(subst i386,x86,$(subst x86_64,x86,$(subst s390x,s390,$(ARCH .PHONY: kernel user libkvm qemu bios vgabios extboot clean libfdt Index: kvm-userspace/libkvm/config-s390.mak === --- /dev/null +++ kvm-userspace/libkvm/config-s390.mak @@ -0,0 +1,3 @@ +# s390 31bit mode +LIBDIR := /lib +libkvm-$(ARCH)-objs := libkvm-s390.o Index: kvm-userspace/libkvm/config-s390x.mak === --- /dev/null +++ kvm-userspace/libkvm/config-s390x.mak @@ -0,0 +1,3 @@ +# s390 64 bit mode (arch=s390x) +LIBDIR := /lib64 +libkvm-$(ARCH)-objs := libkvm-s390.o Index: kvm-userspace/libkvm/kvm-common.h === --- kvm-userspace.orig/libkvm/kvm-common.h +++ kvm-userspace/libkvm/kvm-common.h @@ -18,8 +18,15 @@ /* FIXME: share this number with kvm */ /* FIXME: or dynamically alloc/realloc regions */ +#ifndef __s390__ #define KVM_MAX_NUM_MEM_REGIONS 8u #define MAX_VCPUS 16 +#else +#define KVM_MAX_NUM_MEM_REGIONS 1u +#define MAX_VCPUS 64 +#define LIBKVM_S390_ORIGIN (0UL) +#endif + /* kvm abi verison variable */ extern int kvm_abi; Index: 
kvm-userspace/libkvm/kvm-s390.h === --- /dev/null +++ kvm-userspace/libkvm/kvm-s390.h @@ -0,0 +1,31 @@ +/* + * This header is for functions variables that will ONLY be + * used inside libkvm for s390. + * THESE ARE NOT EXPOSED TO THE USER AND ARE ONLY FOR USE + * WITHIN LIBKVM. + * + * Copyright (C) 2006 Qumranet, Inc. + * + * Authors: + * Avi Kivity [EMAIL PROTECTED] + * Yaniv Kamay [EMAIL PROTECTED] + * + * Copyright 2008 IBM Corporation. + * Authors: + * Carsten Otte [EMAIL PROTECTED] + * + * This work is licensed under the GNU LGPL license, version 2. + */ + +#ifndef KVM_S390_H +#define KVM_S390_H + +#include asm/ptrace.h +#include kvm-common.h + +#define PAGE_SIZE 4096ul +#define PAGE_MASK (~(PAGE_SIZE - 1)) + +#define smp_wmb() asm volatile( ::: memory) + +#endif Index: kvm-userspace/libkvm/libkvm-s390.c === --- /dev/null +++ kvm-userspace/libkvm/libkvm-s390.c @@ -0,0 +1,137 @@ +/* + * This file contains the s390 specific implementation for the + * architecture dependent functions defined in kvm-common.h and + * libkvm.h + * + * Copyright (C) 2006 Qumranet + * Copyright IBM Corp. 2008 + * + * Authors: + * Carsten Otte [EMAIL PROTECTED] + * Christian Borntraeger [EMAIL PROTECTED] + * + * This work is licensed under the GNU LGPL license, version 2. 
+ */ + +#include sys/ioctl.h +#include asm/ptrace.h + +#include libkvm.h +#include kvm-common.h +#include errno.h +#include stdio.h +#include inttypes.h + +int handle_dcr(struct kvm_run *run, kvm_context_t kvm, int vcpu) +{ + fprintf(stderr, %s: Operation not supported\n, __FUNCTION__); + return -1; +} + +int kvm_alloc_kernel_memory(kvm_context_t kvm, unsigned long memory, + void **vm_mem) +{ + fprintf(stderr, %s: Operation not supported\n, __FUNCTION__); + return -1; +} + +void *kvm_create_kernel_phys_mem(kvm_context_t kvm, unsigned long phys_start, +unsigned long len, int log, int writable) +{ + fprintf(stderr, %s: Operation not supported\n, __FUNCTION__); + return NULL; +} + +void kvm_show_code(kvm_context_t kvm, int vcpu) +{ + fprintf(stderr, %s: Operation not supported\n, __FUNCTION__); +} + +void kvm_show_regs(kvm_context_t kvm, int vcpu) +{ + struct kvm_regs regs; + struct kvm_sregs sregs; + int i; + + if (kvm_get_regs(kvm, vcpu, regs)) + return; + + if (kvm_get_sregs(kvm,
Re: [PATCH] posix-timers: Do not modify an already queued timer signal
On Wed, 2008-07-16 at 15:50 +0100, Mark McLoughlin wrote:
The race was observed with a modified kvm-userspace when running a guest under heavy network load. When it occurs, KVM never sees another SIGALRM signal because although the signal is queued up the appropriate bit is never set in the pending mask. Manually sending the process a SIGALRM kicks it out of this state.

I should clarify what I mean by "modified kvm-userspace". Basically, I was trying out a suggestion of Marcelo's to drop the global qemu mutex when reading GSO packets from a tap device, i.e.

@@ -4299,7 +4299,9 @@ static void tap_send(void *opaque)
 	sbuf.buf = s->buf;
 	s->size = getmsg(s->fd, NULL, &sbuf, &f) >=0 ? sbuf.len : -1;
 #else
+	kvm_mutex_unlock();
 	s->size = read(s->fd, s->buf, sizeof(s->buf));
+	kvm_mutex_lock();

It seems to work fine, but more on that later ... the important thing is that if people see a hard-to-reproduce condition where things seem to slow down or lock up, try manually doing a kill -ALRM $(qemu) and if that fixes it, then you're probably seeing this bug.

Cheers,
Mark.
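The "drop the global mutex around a blocking read" experiment described above follows a common pattern that can be sketched with POSIX threads. The names global_lock_model and read_unlocked() are hypothetical and merely stand in for the qemu global mutex and the tap read path; this is an illustration of the pattern, not the kvm-userspace code.

```c
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical global lock standing in for the qemu/kvm global mutex. */
pthread_mutex_t global_lock_model = PTHREAD_MUTEX_INITIALIZER;

/*
 * Caller must hold global_lock_model. The lock is released only for
 * the duration of read(), so other threads can make progress while
 * this one is inside the system call, and is re-acquired before
 * returning to the caller.
 */
ssize_t read_unlocked(int fd, void *buf, size_t len)
{
	ssize_t n;

	pthread_mutex_unlock(&global_lock_model);
	n = read(fd, buf, len);
	pthread_mutex_lock(&global_lock_model);
	return n;
}
```

The trade-off is that any state protected by the lock may change while read() is in progress, so the caller has to revalidate whatever it cached before the call.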
[PATCH 3/6] KVM: Handle device assignment to guests
From: Amit Shah [EMAIL PROTECTED]

This patch adds support for handling PCI devices that are assigned to
the guest (PCI passthrough).

The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled. If a device is already assigned, or
the device driver for it is still loaded on the host, the device
assignment is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest. The
VT-d extension is required to enable the device to perform DMA. Another
alternative is PVDMA.

Signed-off-by: Amit Shah [EMAIL PROTECTED]
Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
Signed-off-by: Han, Weidong [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c         |  267
 include/asm-x86/kvm_host.h |   37 ++
 include/asm-x86/kvm_para.h |   16 +++-
 include/linux/kvm.h        |    3 +
 virt/kvm/ioapic.c          |   12 ++-
 5 files changed, 332 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3167006..65b307d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4,10 +4,12 @@
  * derived from drivers/kvm/kvm_main.c
  *
  * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright (C) 2008 Qumranet, Inc.
  *
  * Authors:
  *   Avi Kivity   [EMAIL PROTECTED]
  *   Yaniv Kamay  [EMAIL PROTECTED]
+ *   Amit Shah    [EMAIL PROTECTED]
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -23,8 +25,10 @@
 #include "x86.h"
 
 #include <linux/clocksource.h>
+#include <linux/interrupt.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
+#include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
 #include <linux/mman.h>
@@ -98,6 +102,256 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };
 
+DEFINE_RWLOCK(kvm_pci_pt_lock);
+
+/*
+ * Used to find a registered host PCI device (a passthrough device)
+ * during ioctls, interrupts or EOI
+ */
+struct kvm_pci_pt_dev_list *
+kvm_find_pci_pt_dev(struct list_head *head,
+		    struct kvm_pci_pt_info *pt_pci_info, int irq, int source)
+{
+	struct list_head *ptr;
+	struct kvm_pci_pt_dev_list *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+		switch (source) {
+		case KVM_PT_SOURCE_IRQ:
+			/*
+			 * Used to find a registered host device
+			 * during interrupt context on host
+			 */
+			if (match->pt_dev.host.irq == irq)
+				return match;
+			break;
+		case KVM_PT_SOURCE_IRQ_ACK:
+			/*
+			 * Used to find a registered host device when
+			 * the guest acks an interrupt
+			 */
+			if (match->pt_dev.guest.irq == irq)
+				return match;
+			break;
+		case KVM_PT_SOURCE_UPDATE:
+			if ((match->pt_dev.host.busnr == pt_pci_info->busnr) &&
+			    (match->pt_dev.host.devfn == pt_pci_info->devfn))
+				return match;
+			break;
+		}
+	}
+	return NULL;
+}
+
+static void kvm_pci_pt_int_work_fn(struct work_struct *work)
+{
+	struct kvm_pci_pt_work *int_work;
+
+	int_work = container_of(work, struct kvm_pci_pt_work, work);
+
+	/* This is taken to safely inject irq inside the guest. When
+	 * the interrupt injection (or the ioapic code) uses a
+	 * finer-grained lock, update this
+	 */
+	mutex_lock(&int_work->pt_dev->kvm->lock);
+	kvm_set_irq(int_work->pt_dev->kvm, int_work->pt_dev->guest.irq, 1);
+	mutex_unlock(&int_work->pt_dev->kvm->lock);
+	kvm_put_kvm(int_work->pt_dev->kvm);
+}
+
+static void kvm_pci_pt_ack_work_fn(struct work_struct *work)
+{
+	struct kvm_pci_pt_work *ack_work;
+
+	ack_work = container_of(work, struct kvm_pci_pt_work, work);
+
+	/* This is taken to safely inject irq inside the guest. When
+	 * the interrupt injection (or the ioapic code) uses a
+	 * finer-grained lock, update this
+	 */
+	mutex_lock(&ack_work->pt_dev->kvm->lock);
+	kvm_set_irq(ack_work->pt_dev->kvm, ack_work->pt_dev->guest.irq, 0);
+	enable_irq(ack_work->pt_dev->host.irq);
+	mutex_unlock(&ack_work->pt_dev->kvm->lock);
+	kvm_put_kvm(ack_work->pt_dev->kvm);
+}
+
+/* FIXME: Implement the OR logic needed to make shared interrupts on
+ * this line behave properly
+ */
+static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id)
+{
+	struct kvm_pci_passthrough_dev_kernel *pt_dev =
+		(struct kvm_pci_passthrough_dev_kernel *)
Re: [PATCH 3/6] KVM: Handle device assignment to guests
Please ignore this repeated patch.
Re: PCI passthrough with VT-d - native performance
Ben-Ami Yassour wrote:
>> That CPU utilization is extremely high and somewhat illogical if native
>> w/vt-d has almost no CPU impact. Have you run oprofile yet or have any
>> insight into where CPU is being burnt?
>>
>> What does kvm_stat look like? I wonder if there are a large number of PIO
>> exits. What does the interrupt count look like on native vs. KVM with VT-d?
>>
>> Regards,
>> Anthony Liguori
>
> These are all good points and questions, I agree that we need to take a
> deeper look into the performance issues, but I think that we need to merge
> with the main KVM tree first.

It would be good to get the host interrupt rate, to confirm that the host
isn't flooded with interrupts. A deeper analysis can wait.

-- 
error compiling committee.c: too many arguments to function
kvm_queue_exception
Hi there,

I was using kvm-70 and kernel 2.6.25 on Debian lenny for two weeks without
problems. The processor is an AMD Opteron 2350.

Now, out of nowhere, there is a problem with 1 of 4 guests (1x Debian etch
amd64, 1x Debian etch i386, 1x Ubuntu 7.10 amd64, 1x Ubuntu 7.10 i386).

On the host I see the following kernel messages again and again. I rebooted
the host and started all guests: same problem again, with only one guest
(Debian etch amd64 with kernel 2.6.25).

Jul 16 19:17:03 cubalibre kernel: [ 9275.836932] ------------[ cut here ]------------
Jul 16 19:17:03 cubalibre kernel: [ 9275.836962] WARNING: at /usr/src/modules/kvm/x86.c:185 kvm_queue_exception_e+0x26/0x47 [kvm]()
Jul 16 19:17:03 cubalibre kernel: [ 9275.837022] Modules linked in: tun sbs ac battery wmi container sbshc video output nfs lockd nfs_acl sunrpc bridge ipv6 mptctl ipmi_poweroff ipmi_si ipmi_devintf ipmi_msghandler kvm_amd kvm bonding loop psmouse serio_raw pcspkr button i2c_piix4 shpchp pci_hotplug i2c_core dcdbas evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod ide_cd_mod cdrom sd_mod sata_svw usbhid hid ff_memless ata_generic serverworks libata dock mptsas mptscsih mptbase scsi_transport_sas scsi_mod tg3 ehci_hcd ide_pci_generic ide_core ohci_hcd thermal processor fan
Jul 16 19:17:03 cubalibre kernel: [ 9275.837458] Pid: 5135, comm: kvm Not tainted 2.6.25-2-amd64 #1
Jul 16 19:17:03 cubalibre kernel: [ 9275.837491]
Jul 16 19:17:03 cubalibre kernel: [ 9275.837491] Call Trace:
Jul 16 19:17:03 cubalibre kernel: [ 9275.837544] [80234ce5] warn_on_slowpath+0x51/0x63
Jul 16 19:17:03 cubalibre kernel: [ 9275.837580] [8022a56c] hrtick_start_fair+0xfb/0x143
Jul 16 19:17:03 cubalibre kernel: [ 9275.837618] [80230217] hrtick_set+0x88/0xf7
Jul 16 19:17:03 cubalibre kernel: [ 9275.837652] [8041f709] error_exit+0x0/0x60
Jul 16 19:17:03 cubalibre kernel: [ 9275.837697] [882118f0] :kvm:gfn_to_hva+0x1c/0x40
Jul 16 19:17:03 cubalibre kernel: [ 9275.837742] [88211a7b] :kvm:kvm_read_guest_page+0x34/0x46
Jul 16 19:17:03 cubalibre kernel: [ 9275.837792] [88214cd1] :kvm:kvm_queue_exception_e+0x26/0x47
Jul 16 19:17:03 cubalibre kernel: [ 9275.837839] [88237ddc] :kvm_amd:handle_exit+0x9a/0x1ab
Jul 16 19:17:03 cubalibre kernel: [ 9275.837886] [8821705b] :kvm:kvm_arch_vcpu_ioctl_run+0x460/0x612
Jul 16 19:17:03 cubalibre kernel: [ 9275.837940] [882129fb] :kvm:kvm_vcpu_ioctl+0xf3/0x3a9
Jul 16 19:17:03 cubalibre kernel: [ 9275.837976] [8027c8aa] zone_statistics+0x3f/0x93
Jul 16 19:17:03 cubalibre kernel: [ 9275.838011] [8027659e] get_page_from_freelist+0x4a6/0x638
Jul 16 19:17:03 cubalibre kernel: [ 9275.838055] [80276e13] __alloc_pages+0x71/0x312
Jul 16 19:17:03 cubalibre kernel: [ 9275.838092] [80281318] handle_mm_fault+0x38b/0x893
Jul 16 19:17:03 cubalibre kernel: [ 9275.838134] [8023e5c4] recalc_sigpending+0xe/0x38
Jul 16 19:17:03 cubalibre kernel: [ 9275.838167] [8023f80d] dequeue_signal+0x8d/0x113
Jul 16 19:17:03 cubalibre kernel: [ 9275.838206] [802a5b05] vfs_ioctl+0x21/0x6b
Jul 16 19:17:03 cubalibre kernel: [ 9275.838239] [802a5d97] do_vfs_ioctl+0x248/0x261
Jul 16 19:17:03 cubalibre kernel: [ 9275.838274] [8029ad84] vfs_read+0x11e/0x152
Jul 16 19:17:03 cubalibre kernel: [ 9275.838308] [802a5e01] sys_ioctl+0x51/0x70
Jul 16 19:17:03 cubalibre kernel: [ 9275.838347] [8020bd9a] system_call_after_swapgs+0x8a/0x8f
Jul 16 19:17:03 cubalibre kernel: [ 9275.838385]
Jul 16 19:17:03 cubalibre kernel: [ 9275.838412] ---[ end trace 18dbdafc95bffe16 ]---
Jul 16 19:17:03 cubalibre kernel: [ 9275.838456] ------------[ cut here ]------------

kvm_stat output:

efer_reload                 0       0
exits                26806018    2361
fpu_reload           17349000    1205
halt_exits            2938557     370
halt_wakeup          33961629
host_state_reload    17697095    1245
hypercalls                  0       0
insn_emulation        7519074     753
insn_emulation_fail         0       0
invlpg                      0       0
io_exits             13196230     591
irq_exits             1593408     279
irq_window                  0       0
largepages                  0       0
mmio_exits            1445645     224
mmu_cache_miss            702       0
mmu_flooded                 0       0
mmu_pde_zapped              0       0
mmu_pte_updated             0       0
mmu_pte_write           23000       0
mmu_recycled                0       0
mmu_shadow_zapped           0       0
nmi_window                  0       0
pf_fixed                    0       0
pf_guest                    0       0
remote_tlb_flush            3       0
request_irq                 0       0
signal_exits                4       0
Re: kvm causing memory corruption? now 2.6.26
On a suggestion of Anthony's, I tried a defconfig kernel. It is now
bombing out on an assertion in the lapic code:

http://sr71.net/~dave/linux/2.6.26-oops1.txt

-- Dave
[patch 3/3] KVM: VMX: handle segment limit granularity special case in software
As the comment in the diff mentions, VMX does not accept any bit in the
range 11:0 of the limit field of the ES, CS, FS, GS and SS segment
registers being zero while the granularity bit is set to one.

So clear granularity and adjust the limit accordingly.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

Index: kvm/arch/x86/kvm/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -1665,6 +1665,22 @@ static void vmx_set_segment(struct kvm_v
 		return;
 	}
 	vmcs_writel(sf->base, var->base);
+
+	/*
+	 * section 22.3.1.2:
+	 * - If any bit in the limit field in the range 11:0 is 0, G must be 0.
+	 * - If any bit in the limit field in the range 31:20 is 1, G must be 1.
+	 */
+	if (!vcpu->arch.rmode.active && !var->unusable &&
+	    seg != VCPU_SREG_TR && seg != VCPU_SREG_LDTR) {
+#define SEG_MASK ((1 << 12)-1)
+		if (var->g && (var->limit & SEG_MASK) != SEG_MASK) {
+			var->g = 0;
+			var->limit <<= 12;
+			var->limit |= SEG_MASK;
+		}
+	}
+
 	vmcs_write32(sf->limit, var->limit);
 	vmcs_write16(sf->selector, var->selector);
 	if (vcpu->arch.rmode.active && var->s) {

-- 
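
For readers who want to check the arithmetic in the hunk outside the kernel, here is a minimal userspace sketch. It only mirrors the limit adjustment shown in the patch; `struct seg` and `seg_fix_granularity()` are hypothetical names invented for this illustration, not KVM identifiers, and the sketch makes no claim about what the hardware accepts.

```c
#include <stdint.h>

/* Low 12 bits of the limit, as in the patch's SEG_MASK. */
#define SEG_MASK ((1u << 12) - 1)

/* Hypothetical stand-in for the two fields of the segment that the hunk touches. */
struct seg {
	uint32_t limit;
	unsigned int g;	/* granularity bit */
};

/*
 * Mirror of the adjustment in the hunk: if G is set but the low 12 bits
 * of the limit are not all ones, clear G, shift the limit left by 12 and
 * fill the low 12 bits with ones.
 */
static void seg_fix_granularity(struct seg *s)
{
	if (s->g && (s->limit & SEG_MASK) != SEG_MASK) {
		s->g = 0;
		s->limit <<= 12;
		s->limit |= SEG_MASK;
	}
}
```

With this sketch, a segment with `limit = 0xfe` and `g = 1` ends up with `limit = 0xfefff` and `g = 0`, while a segment with `limit = 0xfffff` and `g = 1` is left untouched because its low 12 bits are already all ones.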
KVM overflows the stack
On Wed, 2008-07-16 at 14:44 -0700, Dave Hansen wrote:
> On a suggestion of Anthony's, I tried a defconfig kernel. It is now
> bombing out on an assertion in the lapic code:
>
> http://sr71.net/~dave/linux/2.6.26-oops1.txt

I think I found it!!!

$ (objdump -d kvm.ko ; objdump -d kvm-intel.ko ) | egrep 'sub.*0x...,.*esp|:' | egrep sub -B1
1a90 kvm_vcpu_ioctl:
    1a9a: 81 ec 60 06 00 00    sub $0x660,%esp
--
4e90 kvm_arch_vcpu_ioctl:
    4e9d: 81 ec 6c 08 00 00    sub $0x86c,%esp
--
5900 kvm_arch_vm_ioctl:
    5903: 81 ec 34 05 00 00    sub $0x534,%esp
--
d4f0 paging64_prefetch_page:
    d4f8: 81 ec 1c 01 00 00    sub $0x11c,%esp
--
dfd0 paging32_prefetch_page:
    dfd8: 81 ec 1c 01 00 00    sub $0x11c,%esp
--
f390 kvm_pv_mmu_op:
    f3a1: 81 ec 28 02 00 00    sub $0x228,%esp

We're simply overflowing the stack. I changed all of the large on-stack
allocations to 'static', and it actually boots now. I know 'static'
isn't safe, but it was good for a quick test.

A 'make checkstack' confirms this:

[EMAIL PROTECTED]:~/kernels/linux-2.6.git$ make checkstack
objdump -d vmlinux $(find . -name '*.ko') | \
perl /home/dave/kernels/linux-2.6.git-t61/scripts/checkstack.pl i386
0x42d3 kvm_arch_vcpu_ioctl [kvm]:   2148
0x12e3 kvm_vcpu_ioctl [kvm]:        1620
0x4a83 kvm_arch_vm_ioctl [kvm]:     1332
0x9a26 airo_get_aplist [airo]:      1140
0x9b76 airo_get_aplist [airo]:      1140
0x9c82 airo_get_aplist [airo]:      1140
...

In other words, kvm has the top 3 stack users in my kernel. As you can
see from my trace above, these things also get called with super-long
stacks already.

Man. That sucked to find.

Avi, how would you like this fixed? I'd be happy to prepare some
patches. Do you have a particular approach that you think we should use?
Just make the big objects dynamically allocated?

-- Dave
Re: networking setup problem
paolo pedaletti wrote:
> Hi,
> I hope this is the right ml to submit my problem.
>
> Abstract:
> I can't set up 2 different networks inside my VMs, one public and one
> private.
>
> Scheme:
> [diagram: the host with three VMs (proxy, web, db); each VM has eth0 and eth1]
>
> this is a classic LAMP, sparse on 3 VM
> 1) front end, proxy (apache2 in reverse with mod-security)
> 2) application server, web (apache2 + php5)
> 3) database (mysql5)
> (it's a test/backup environment)
>
> each VM must have 2 network cards:
> eth0 on the local network, in bridge with the host physical eth0
> eth1 on the virtual private network, for internal communications
> between them
>
> saying that, ... it doesn't work :-(
> (linux ubuntu 8.04 2.6.24-19-generic, kvm-62)
>
> these are the command lines:
>
> kvm -name PROXY \
>   -net nic,vlan=0,macaddr=00:18:BE:EF:17:2A,model=rtl8139 \
>   -net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh \
>   -net nic,vlan=1,macaddr=00:18:BE:EF:17:2B,model=rtl8139 \
>   -net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh \
>   -drive index=0,media=disk,if=scsi,file=./ubuntu-server.PROXY.root,boot=on \
>   -drive index=1,media=disk,if=scsi,file=./ubuntu-server.PROXY.home \
>   -drive index=2,media=disk,if=scsi,file=./ubuntu-server.PROXY.swap
>
> kvm -name WEBAPP \
>   -net nic,vlan=0,macaddr=00:18:BE:EF:17:1A,model=rtl8139 \
>   -net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh \
>   -net nic,vlan=1,macaddr=00:18:BE:EF:17:1B,model=rtl8139 \
>   -net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh \
>   -drive index=0,media=disk,if=scsi,file=./ubuntu-server.WEB.root,boot=on \
>   -drive index=1,media=disk,if=scsi,file=./ubuntu-server.WEB.home \
>   -drive index=2,media=disk,if=scsi,file=./ubuntu-server.WEB.swap
>
> kvm -name DB \
>   -net nic,vlan=0,macaddr=00:18:BE:EF:17:0A,model=rtl8139 \
>   -net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh \
>   -net nic,vlan=1,macaddr=00:18:BE:EF:17:0B,model=rtl8139 \
>   -net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh \
>   -drive index=0,media=disk,if=scsi,file=./ubuntu-server.DB.root,boot=on \
>   -drive index=1,media=disk,if=scsi,file=./ubuntu-server.DB.home \
>   -drive index=2,media=disk,if=scsi,file=./ubuntu-server.DB.swap
>
> $ cat /etc/qemu-ifup
> ----8<----8<----
> #!/bin/sh
> set -x
> echo "Executing $0"
> case $1 in
> tap*)
>     echo "tun network"
>     BRIDGE=br0
>     if [ -z "$(ifconfig $BRIDGE)" ] ; then
>         /usr/sbin/brctl addbr $BRIDGE
>         dhclient $BRIDGE
>     fi
>     /usr/sbin/tunctl -u `whoami` -t $1
>     echo "Bringing up $1 for bridged mode..."
>     /sbin/ifconfig $1 0.0.0.0 promisc up
>     /sbin/ip link set $1 up
>     sleep 0.5s
>     echo "Adding $1 to br0..."
>     /usr/sbin/brctl addif $BRIDGE $1
>     ;;
> dmz*)
>     echo "dmz network"
>     BRIDGE=br1
>     if [ -z "$(ifconfig $BRIDGE)" ] ; then
>         /usr/sbin/brctl addbr $BRIDGE
>         dhclient $BRIDGE
>     fi
>     /usr/sbin/tunctl -u `whoami` -t $1
>     echo "Bringing up $1 for bridged mode..."
>     /sbin/ifconfig $1 0.0.0.0 promisc up
>     /sbin/ip link set $1 up
>     sleep 0.5s
>     echo "Adding $1 to $BRIDGE..."
>     /usr/sbin/brctl addif $BRIDGE $1
>     ;;
> *)
>     echo "Error: no interface specified or interface '$1' invalid"
>     exit 1
> esac
> ----8<----8<----
>
> eth0 works for all the VMs, eth1 doesn't.
>
> constraint: no dhcp, all static ip
>
> any suggestion?

AFAIK, -net user does not need an ifname or script argument - there's no
host interface for the user mode stack. Try these:

kvm -name PROXY \
  -net nic,vlan=0,macaddr=00:18:BE:EF:17:2A,model=rtl8139 \
  -net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh \
  -net nic,vlan=1,macaddr=00:18:BE:EF:17:2B,model=rtl8139 \
  -net user,vlan=1 \
  -drive index=0,media=disk,if=scsi,file=./ubuntu-server.PROXY.root,boot=on \
  -drive index=1,media=disk,if=scsi,file=./ubuntu-server.PROXY.home \
  -drive index=2,media=disk,if=scsi,file=./ubuntu-server.PROXY.swap

kvm -name WEBAPP \
  -net nic,vlan=0,macaddr=00:18:BE:EF:17:1A,model=rtl8139 \
  -net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh \
  -net nic,vlan=1,macaddr=00:18:BE:EF:17:1B,model=rtl8139 \
  -net user,vlan=1 \
  -drive index=0,media=disk,if=scsi,file=./ubuntu-server.WEB.root,boot=on \
  -drive index=1,media=disk,if=scsi,file=./ubuntu-server.WEB.home \
  -drive index=2,media=disk,if=scsi,file=./ubuntu-server.WEB.swap

kvm -name DB \
  -net nic,vlan=0,macaddr=00:18:BE:EF:17:0A,model=rtl8139 \
  -net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh \
  -net nic,vlan=1,macaddr=00:18:BE:EF:17:0B,model=rtl8139 \
  -net user,vlan=1 \
  -drive index=0,media=disk,if=scsi,file=./ubuntu-server.DB.root,boot=on \
  -drive index=1,media=disk,if=scsi,file=./ubuntu-server.DB.home \
  -drive index=2,media=disk,if=scsi,file=./ubuntu-server.DB.swap

-- David.
Re: [ANNOUNCE] kvm-autotest
* Uri Lublin [EMAIL PROTECTED] [2008-07-16 18:15]:
>> Client side, for installation, we already have a solution that works for
>> all types of guests: http://kvm.qumranet.com/kvmwiki/KVMTest which is
>> already integrated as a client test in autotest. Once you record your
>> installation via kvmtest, then it is just a matter of keeping the iso and
>> an empty disk image around and replaying the installation with -snapshot.
>
> So guest installation is a client test. KVMtest has its own way of
> managing/booting/communicating-with guests, and naturally does not use
> KVM/KVMGuest classes of autotest server.

It does not use the KVM/KVMGuest classes in autotest server precisely
because it is a client test; there is no need to do anything server-side
w.r.t. KVMTest. We ensure that we've built and installed whatever version
of kvm on the target machine and that the target machine has access to the
required inputs (iso, disk image) and invoke KVMTest.

> How about the test, suggested by Marcelo/Chris, of changing physical cpu
> of a VM using taskset. Would that be a client test or a server test ?
> What about stop/cont, save/restore ?
> I think most kvm-tests will be client tests.

Agreed, the above examples will be client tests. Autotest client can do
parallel execution, or even step-wise. Between those, we should be able
to ensure we get proper coverage for the envisioned scenarios.

> More complex tests, which involve multiple hosts, such as migration
> between two hosts, should be server tests. Managing all those tests can
> be done by autoserv.

Agreed.

>> Now, I'm actually more interested in doing the following: use kvmtest to
>> replay an installation of a guest and instead of throwing the guest away
>> (running with -snapshot) once it has passed the install, it is now ready
>> to be used to execute autotest client tests or something else.
>
> autotest client tests can be considered as guest load which usually is
> orthogonal to the real kvm-test that is running (e.g. migration-test

Definitely. I just want to utilize autotest to drive both guest creation
and test orchestration, which includes installation as well as testing
known working guests in various scenarios, and yes, autotest simplifies
generating guest load.

> while watching a movie, or migration-test while building the kernel).
> Also what would you do for non-linux guest ?

I'm not quite sure what to do here since non-linux guests aren't really in
my scope beyond simple tests (installation, shutdown, reboot,
configuration variance).

> We'll try this little exercise of writing a kvm-test on the server side
> and on the client side and compare complexity.

That's a bit vague, what sort of test are you talking about? If you mean
installation, I'm not interested since that's been handled by KVMTest.

> Are you actually running autotest tests with KVMTest installed guests? Do

Yes, but the transition between using KVMtest to install a guest and
running autotest inside isn't completely automated -- yet.

> you have to manually exchange ssh-keys.

Yeah, we'll need a solution, but it should be pretty simple to automate.
Most likely pregenerate a key and serve it up to the guest via cdrom image
and then install that into the guest.

>> As to complexity, I urge you to look at the existing kvm u examples[2] in
>> the autotest server dir, those look pretty darn simple to me and already
>> include all of the infrastructure for capturing console logs, results and
>> errors.
>
> It does look simple. It was written for a different purpose though, which
> is to run autotest tests on guests, not to run kvm tests.

Sure, but that doesn't mean it isn't a good infrastructure on top of which
we can build kvm testing.

>> Oh, I forgot my pointer to the server setup last time:
>>
>> 1. http://test.kernel.org/autotest/AutotestServerInstall
>> 2. autotest/server/samples/kvm.srv
>
> So what do you propose ? We were thinking today how we can move things to
> the server and want to get your (or anyone's) opinion.
> Does the server always start (boot) guests ?
> Do most tests run on the server-side (similar to [2]) or may

Yeah, let me explain a bit more about how the server and clients work:

You'll have one master server which runs the job monitor. It will look for
autotest server files (.srv) and that file will run on the server, but you
will write your srv file to execute on a set of machines that your master
server manages. Autotest server maintains a db of machines.

Looking at autotest/server/samples/sleeptest.srv:

def run(machine):
    host = hosts.SSHHost(machine)
    at = autotest.Autotest(host)
    at.run_test('sleeptest')

job.parallel_simple(run, machines)

We're defining a run function, and then running that across all the
machines in the grid.

The other interesting one to look at is netperf-guest-to-host-far.srv.
That file demonstrates installing kvm to different host machines (not the
server where the .srv file is running); on
Re: [PATCH 3/6] KVM: Handle device assignment to guests
Some comments below. :)

On Wednesday 16 July 2008 23:56:50 Ben-Ami Yassour wrote:
> From: Amit Shah [EMAIL PROTECTED]
>
> This patch adds support for handling PCI devices that are assigned to
> the guest (PCI passthrough).
>
> The device to be assigned to the guest is registered in the host kernel
> and interrupt delivery is handled. If a device is already assigned, or
> the device driver for it is still loaded on the host, the device
> assignment is failed by conveying a -EBUSY reply to the userspace.
>
> Devices that share their interrupt line are not supported at the moment.
>
> By itself, this patch will not make devices work within the guest. The
> VT-d extension is required to enable the device to perform DMA. Another
> alternative is PVDMA.
>
> Signed-off-by: Amit Shah [EMAIL PROTECTED]
> Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
> Signed-off-by: Han, Weidong [EMAIL PROTECTED]
> ---
>  arch/x86/kvm/x86.c         |  267
>  include/asm-x86/kvm_host.h |   37 ++
>  include/asm-x86/kvm_para.h |   16 +++-
>  include/linux/kvm.h        |    3 +
>  virt/kvm/ioapic.c          |   12 ++-
>  5 files changed, 332 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3167006..65b307d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4,10 +4,12 @@
>   * derived from drivers/kvm/kvm_main.c
>   *
>   * Copyright (C) 2006 Qumranet, Inc.
> + * Copyright (C) 2008 Qumranet, Inc.
>   *
>   * Authors:
>   *   Avi Kivity   [EMAIL PROTECTED]
>   *   Yaniv Kamay  [EMAIL PROTECTED]
> + *   Amit Shah    [EMAIL PROTECTED]
>   *
>   * This work is licensed under the terms of the GNU GPL, version 2.  See
>   * the COPYING file in the top-level directory.
> @@ -23,8 +25,10 @@
>  #include "x86.h"
>
>  #include <linux/clocksource.h>
> +#include <linux/interrupt.h>
>  #include <linux/kvm.h>
>  #include <linux/fs.h>
> +#include <linux/pci.h>
>  #include <linux/vmalloc.h>
>  #include <linux/module.h>
>  #include <linux/mman.h>
> @@ -98,6 +102,256 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>  	{ NULL }
>  };

[snip]

> +
> +static int kvm_vm_ioctl_pci_pt_dev(struct kvm *kvm,
> +				   struct kvm_pci_passthrough_dev *pci_pt_dev)
> +{
> +	int r = 0;
> +	struct kvm_pci_pt_dev_list *match;
> +	struct pci_dev *dev;
> +
> +	write_lock(&kvm_pci_pt_lock);
> +
> +	/* Check if this is a request to update the irq of the device
> +	 * in the guest (BIOS/ kernels can dynamically reprogram irq
> +	 * numbers).  This also protects us from adding the same
> +	 * device twice.
> +	 */
> +	match = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head,
> +				    &pci_pt_dev->host, 0, KVM_PT_SOURCE_UPDATE);
> +	if (match) {
> +		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
> +		write_unlock(&kvm_pci_pt_lock);
> +		goto out;
> +	}
> +	write_unlock(&kvm_pci_pt_lock);
> +
> +	match = kzalloc(sizeof(struct kvm_pci_pt_dev_list), GFP_KERNEL);
> +	if (match == NULL) {
> +		printk(KERN_INFO "%s: Couldn't allocate memory\n",
> +		       __func__);
> +		r = -ENOMEM;
> +		goto out;
> +	}
> +	dev = pci_get_bus_and_slot(pci_pt_dev->host.busnr,
> +				   pci_pt_dev->host.devfn);
> +	if (!dev) {
> +		printk(KERN_INFO "%s: host device not found\n", __func__);
> +		r = -EINVAL;
> +		goto out_free;
> +	}
> +	if (pci_enable_device(dev)) {
> +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
> +		r = -EBUSY;
> +		goto out_put;
> +	}
> +	r = pci_request_regions(dev, "kvm_pt_device");
> +	if (r) {
> +		printk(KERN_INFO "%s: Could not get access to device regions\n",
> +		       __func__);
> +		goto out_put;

pci_disable_device()?

> +	}
> +	match->pt_dev.guest.busnr = pci_pt_dev->guest.busnr;
> +	match->pt_dev.guest.devfn = pci_pt_dev->guest.devfn;
> +	match->pt_dev.host.busnr = pci_pt_dev->host.busnr;
> +	match->pt_dev.host.devfn = pci_pt_dev->host.devfn;
> +	match->pt_dev.dev = dev;
> +
> +	write_lock(&kvm_pci_pt_lock);
> +
> +	INIT_WORK(&match->pt_dev.int_work.work, kvm_pci_pt_int_work_fn);
> +	INIT_WORK(&match->pt_dev.ack_work.work, kvm_pci_pt_ack_work_fn);
> +
> +	match->pt_dev.kvm = kvm;
> +	match->pt_dev.int_work.pt_dev = &match->pt_dev;
> +	match->pt_dev.ack_work.pt_dev = &match->pt_dev;
> +
> +	list_add(&match->list, &kvm->arch.pci_pt_dev_head);
> +
> +	write_unlock(&kvm_pci_pt_lock);
> +
> +	if (irqchip_in_kernel(kvm)) {
> +		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
> +		match->pt_dev.host.irq = dev->irq;
> +		if (kvm->arch.vioapic)
> +			kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
> +		if (kvm->arch.vpic)
> +			kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
> +
> +		/* Even though this is PCI, we don't want to use shared
> +
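
The inline remark "pci_disable_device()?" in the review above concerns error-path unwinding: once the device has been enabled, a later failure should undo the enable as well as drop the device reference. A minimal userspace sketch of that goto-based pattern follows. Every function here (`get_dev`, `enable_dev`, `request_regions` and their counterparts) is a hypothetical stand-in that only flips a flag; none of them is a kernel API, and the label names are illustrative rather than taken from the patch.

```c
/* Flags recording which hypothetical resources are currently held. */
static int dev_ref, dev_enabled, regions_held;

/* Hypothetical stand-ins for acquiring and releasing the three resources. */
static int get_dev(void)      { dev_ref = 1; return 0; }
static void put_dev(void)     { dev_ref = 0; }
static int enable_dev(void)   { dev_enabled = 1; return 0; }
static void disable_dev(void) { dev_enabled = 0; }

/* Fails on request, so the unwind path can be exercised. */
static int request_regions(int fail)
{
	if (fail)
		return -1;
	regions_held = 1;
	return 0;
}

/*
 * Acquire resources in order; on failure, jump to the label that releases
 * everything acquired so far, in reverse order.
 */
static int assign_device(int fail_regions)
{
	int r;

	r = get_dev();
	if (r)
		goto out;
	r = enable_dev();
	if (r)
		goto out_put;
	r = request_regions(fail_regions);
	if (r)
		goto out_disable;	/* undo enable_dev() too, not only get_dev() */
	return 0;

out_disable:
	disable_dev();
out_put:
	put_dev();
out:
	return r;
}
```

Calling `assign_device(1)` in this sketch returns an error with all three flags cleared, which is the property the reviewer's question is probing for in the real error path.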
RE: [PATCH 3/8] KVM: Handle device assignment to guests
Avi Kivity wrote:
>> +static void kvm_pci_pt_work_fn(struct work_struct *work)
>> +{
>> +	struct kvm_pci_pt_dev_list *match;
>> +	struct kvm_pci_pt_work *int_work;
>> +	int source;
>> +	unsigned long flags;
>> +	int guest_irq;
>> +	int host_irq;
>> +
>> +	int_work = container_of(work, struct kvm_pci_pt_work, work);
>> +
>> +	source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ;
>> +
>> +	/* This is taken to safely inject irq inside the guest. When
>> +	 * the interrupt injection (or the ioapic code) uses a
>> +	 * finer-grained lock, update this
>> +	 */
>> +	mutex_lock(&int_work->kvm->lock);
>> +	read_lock_irqsave(&kvm_pci_pt_lock, flags);
>> +	match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL,
>> +				    int_work->irq, source);
>> +	if (!match) {
>> +		printk(KERN_ERR "%s: no matching device assigned to guest "
>> +		       "found for irq %d, source = %d!\n",
>> +		       __func__, int_work->irq, int_work->source);
>> +		read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
>> +		goto out;
>> +	}
>> +	guest_irq = match->pt_dev.guest.irq;
>> +	host_irq = match->pt_dev.host.irq;
>> +	read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
>> +
>> +	if (source == KVM_PT_SOURCE_IRQ)
>> +		kvm_set_irq(int_work->kvm, guest_irq, 1);
>> +	else {
>> +		kvm_set_irq(int_work->kvm, int_work->irq, 0);
>> +		enable_irq(host_irq);
>> +	}
>> +out:
>> +	mutex_unlock(&int_work->kvm->lock);
>> +	kvm_put_kvm(int_work->kvm);
>> +}
>> +
>> +/* FIXME: Implement the OR logic needed to make shared interrupts on
>> + * this line behave properly
>> + */
>
> Isn't this a showstopper? There is no easy way for a user to avoid
> sharing, especially as we have only three pci irqs at present.

Currently it's not easy to avoid sharing. I think we can support MSI for
assigned devices to solve the sharing problem.

Randy (Weidong)
RE: PCI passthrough with VT-d - native performance
Anthony Liguori wrote:
> Ben-Ami Yassour wrote:
>> On Wed, 2008-07-16 at 17:36 +0300, Avi Kivity wrote:
>>> Ben-Ami Yassour wrote:
>>>> In last few tests that we made with PCI-passthrough and VT-d using
>>>> iperf, we were able to get the same throughput as on native OS with a
>>>> 1G NIC
>>>
>>> Excellent!
>>>
>>>> (with higher CPU utilization).
>>>
>>> How much higher?
>>
>> Here are some numbers for running iperf -l 1M:
>>
>> e1000 NIC (behind a PCI bridge)
>>                             Bandwidth (Mbit/sec)   CPU utilization
>> Native OS                   771                    18%
>> Native OS with VT-d         760                    18%
>> KVM VT-d                    390                    95%
>> KVM VT-d with direct mmio   770                    84%
>> KVM emulated                57                     100%
>
> What about virtio? Also, which emulated is this?
>
> That CPU utilization is extremely high and somewhat illogical if native
> w/vt-d has almost no CPU impact. Have you run oprofile yet or have any
> insight into where CPU is being burnt?
>
> What does kvm_stat look like? I wonder if there are a large number of PIO
> exits. What does the interrupt count look like on native vs. KVM with VT-d?

e1000 NIC doesn't use PIO.

Randy (Weidong)
Re: kvm causing memory corruption? now 2.6.26
Dave Hansen wrote:
> On a suggestion of Anthony's, I tried a defconfig kernel. It is now
> bombing out on an assertion in the lapic code:
>
> http://sr71.net/~dave/linux/2.6.26-oops1.txt

Well that assert is plain wrong:

static int apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
			   int short_hand, int dest, int dest_mode)
{
	int result = 0;
	struct kvm_lapic *target = vcpu->arch.apic;

	apic_debug("target %p, source %p, dest 0x%x, "
		   "dest_mode 0x%x, short_hand 0x%x",
		   target, source, dest, dest_mode, short_hand);

	ASSERT(!target);

It should be ASSERT(target), if anything.

-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.
Re: KVM overflows the stack
Dave Hansen wrote:
> On Wed, 2008-07-16 at 14:44 -0700, Dave Hansen wrote:
>> On a suggestion of Anthony's, I tried a defconfig kernel. It is now
>> bombing out on an assertion in the lapic code:
>>
>> http://sr71.net/~dave/linux/2.6.26-oops1.txt
>
> I think I found it!!!
>
> $ (objdump -d kvm.ko ; objdump -d kvm-intel.ko ) | egrep 'sub.*0x...,.*esp|:' | egrep sub -B1
> 1a90 kvm_vcpu_ioctl:
>     1a9a: 81 ec 60 06 00 00    sub $0x660,%esp
> --
> 4e90 kvm_arch_vcpu_ioctl:
>     4e9d: 81 ec 6c 08 00 00    sub $0x86c,%esp
> --
> 5900 kvm_arch_vm_ioctl:
>     5903: 81 ec 34 05 00 00    sub $0x534,%esp
> --
> d4f0 paging64_prefetch_page:
>     d4f8: 81 ec 1c 01 00 00    sub $0x11c,%esp
> --
> dfd0 paging32_prefetch_page:
>     dfd8: 81 ec 1c 01 00 00    sub $0x11c,%esp
> --
> f390 kvm_pv_mmu_op:
>     f3a1: 81 ec 28 02 00 00    sub $0x228,%esp
>
> We're simply overflowing the stack. I changed all of the large on-stack
> allocations to 'static', and it actually boots now. I know 'static'
> isn't safe, but it was good for a quick test.

Yes! It's obvious, once you know it...

> A 'make checkstack' confirms this:
>
> [EMAIL PROTECTED]:~/kernels/linux-2.6.git$ make checkstack
> objdump -d vmlinux $(find . -name '*.ko') | \
> perl /home/dave/kernels/linux-2.6.git-t61/scripts/checkstack.pl i386
> 0x42d3 kvm_arch_vcpu_ioctl [kvm]:   2148
> 0x12e3 kvm_vcpu_ioctl [kvm]:        1620
> 0x4a83 kvm_arch_vm_ioctl [kvm]:     1332
> 0x9a26 airo_get_aplist [airo]:      1140
> 0x9b76 airo_get_aplist [airo]:      1140
> 0x9c82 airo_get_aplist [airo]:      1140
> ...
>
> In other words, kvm has the top 3 stack users in my kernel. As you can
> see from my trace above, these things also get called with super-long
> stacks already.
>
> Man. That sucked to find.
>
> Avi, how would you like this fixed? I'd be happy to prepare some
> patches. Do you have a particular approach that you think we should use?
> Just make the big objects dynamically allocated?

Yes, things like kvm_lapic_state are way too big to be on the stack.

There's an additional problem here, that apparently your gcc (which
version?) doesn't fold objects in a switch statement into the same stack
slot:

    switch (...) {
    case x: {
        struct medium a;
        ...
    }
    case y: {
        struct medium b;
        ...
    }
    };

These could be solved either by heap allocation, or by moving into
functions marked noinline. Whichever is easier.

-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.
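
As a rough illustration of the "dynamically allocated" option discussed in this thread, the sketch below keeps only a pointer on the stack and puts the large object on the heap. It is a userspace analogue under stated assumptions: `struct big_state` and `handle_get_state()` are hypothetical names made up for this example, and `calloc()`/`free()` stand in for what kernel code would do with `kzalloc()`/`kfree()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large state structure such as kvm_lapic_state. */
struct big_state {
	char regs[1024];
};

/*
 * Hypothetical ioctl-style handler: builds the state in a heap object and
 * copies up to out_len bytes of it to the caller.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int handle_get_state(char *out, size_t out_len)
{
	struct big_state *st;	/* only a pointer lives on the stack */
	size_t n;

	st = calloc(1, sizeof(*st));	/* kernel code would use kzalloc() */
	if (!st)
		return -1;

	memset(st->regs, 0x5a, sizeof(st->regs));	/* pretend to fill in state */
	n = out_len < sizeof(st->regs) ? out_len : sizeof(st->regs);
	memcpy(out, st->regs, n);

	free(st);			/* kernel code would use kfree() */
	return 0;
}
```

The trade-off is an extra allocation and a new failure path per call in exchange for a stack frame that no longer grows with the size of the structure.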