Re: [PATCH] allow enabling/disabling NPT by reloading only the architecture module

2008-07-16 Thread Yang, Sheng
On Tuesday 15 July 2008 18:55:37 Avi Kivity wrote:
 Yang, Sheng wrote:
  On Tuesday 15 July 2008 02:36:36 Joerg Roedel wrote:
  If NPT is enabled after loading both KVM modules on AMD and it
  should be disabled, both KVM modules must be reloaded. If only
  the architecture module is reloaded the behavior is undefined.
  With this patch it is possible to disable NPT only by reloading
  the kvm_amd module.
 
  Signed-off-by: Joerg Roedel [EMAIL PROTECTED]
  ---
 
  From 3dd7fa4abb1cfc702b3fbd7038d585b541f981a4 Mon Sep 17 00:00:00
  2001 From: Sheng Yang [EMAIL PROTECTED]
  Date: Tue, 15 Jul 2008 14:18:29 +0800
  Subject: [PATCH] KVM: VMX: Fix undefined behaviour of EPT after
  reload kvm-intel.ko
 
  Based on Joerg Roedel's fix for NPT.
 
  Thanks Joerg!
 
  Signed-off-by: Sheng Yang [EMAIL PROTECTED]
  ---
   arch/x86/kvm/vmx.c |   15 +--
   1 files changed, 9 insertions(+), 6 deletions(-)
 
  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
  index 5f807e3..374e1ca 100644
  --- a/arch/x86/kvm/vmx.c
  +++ b/arch/x86/kvm/vmx.c
  @@ -3108,14 +3108,17 @@ static struct kvm_vcpu
  *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
  return ERR_PTR(-ENOMEM);
 
  allocate_vpid(vmx);
  -   if (id == 0 && vm_need_ept()) {
  -   kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
  -   VMX_EPT_WRITABLE_MASK |
  -   VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
  -   kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
  +   if (id == 0) {
  +   if (vm_need_ept()) {
  +   kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
  +   VMX_EPT_WRITABLE_MASK |
  +   VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
  +   kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
  VMX_EPT_FAKE_DIRTY_MASK, 0ull,
  VMX_EPT_EXECUTABLE_MASK);
  -   kvm_enable_tdp();
  +   kvm_enable_tdp();
  +   } else
  +   kvm_disable_tdp();
  }

 hmm, what is this code doing in vmx_create_vcpu()?  surely
 vmx_init() is a better place?

Oh, probably for historical reasons :)

I will move it to vmx_init() now.

-- 
regards
Yang, Sheng
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: VMX: Fix bypass_guest_pf enabling when disable EPT in module parameter

2008-07-16 Thread Yang, Sheng
From c4a2cad8b91ac4c0b04a5ccd1f0bfab1d7e6ef37 Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Wed, 16 Jul 2008 09:21:22 +0800
Subject: [PATCH] KVM: VMX: Fix bypass_guest_pf enabling when disable EPT in module parameter


Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/vmx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5f807e3..d47c3f8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3294,7 +3294,7 @@ static int __init vmx_init(void)
 	vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_ESP);
 	vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_EIP);
 
-	if (cpu_has_vmx_ept())
+	if (vm_need_ept())
 		bypass_guest_pf = 0;
 
 	if (bypass_guest_pf)
-- 
1.5.6
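
The one-line change above can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct ept_config and the model_* functions are hypothetical names, and the sketch assumes that vm_need_ept() is true only when the CPU supports EPT and the module parameter leaves it enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs: what the CPU offers and what the admin asked for. */
struct ept_config {
	bool cpu_has_ept;	/* hardware capability */
	bool enable_ept;	/* module parameter */
};

/* Assumed shape of vm_need_ept(): capability AND parameter. */
static bool model_need_ept(const struct ept_config *c)
{
	return c->cpu_has_ept && c->enable_ept;
}

/* Before the patch: bypass_guest_pf is cleared whenever the CPU has EPT. */
static int model_bypass_before_patch(const struct ept_config *c, int bypass)
{
	if (c->cpu_has_ept)
		bypass = 0;
	return bypass;
}

/* After the patch: it is cleared only when EPT is actually in use. */
static int model_bypass_after_patch(const struct ept_config *c, int bypass)
{
	if (model_need_ept(c))
		bypass = 0;
	return bypass;
}
```

With EPT-capable hardware and EPT switched off by parameter, the first variant loses the bypass_guest_pf optimisation even though shadow paging is in use; the second keeps it.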



[PATCH 2/2] KVM: VMX: Fix undefined behaviour of EPT after reload kvm-intel.ko

2008-07-16 Thread Yang, Sheng
From bcbe1b5c4c6098f122accba4f00f6617baf807f7 Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Wed, 16 Jul 2008 09:25:40 +0800
Subject: [PATCH] KVM: VMX: Fix undefined behaviour of EPT after reload kvm-intel.ko

As well as move set base/mask ptes to vmx_init().

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/vmx.c |   20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d47c3f8..baddb6e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3108,15 +3108,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 		return ERR_PTR(-ENOMEM);
 
 	allocate_vpid(vmx);
-	if (id == 0 && vm_need_ept()) {
-		kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
-			VMX_EPT_WRITABLE_MASK |
-			VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
-		kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
-				VMX_EPT_FAKE_DIRTY_MASK, 0ull,
-				VMX_EPT_EXECUTABLE_MASK);
-		kvm_enable_tdp();
-	}
 
 	err = kvm_vcpu_init(&vmx->vcpu, kvm, id);
 	if (err)
@@ -3294,8 +3285,17 @@ static int __init vmx_init(void)
 	vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_ESP);
 	vmx_disable_intercept_for_msr(vmx_msr_bitmap, MSR_IA32_SYSENTER_EIP);
 
-	if (vm_need_ept())
+	if (vm_need_ept()) {
 		bypass_guest_pf = 0;
+		kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
+			VMX_EPT_WRITABLE_MASK |
+			VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
+		kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
+				VMX_EPT_FAKE_DIRTY_MASK, 0ull,
+				VMX_EPT_EXECUTABLE_MASK);
+		kvm_enable_tdp();
+	} else
+		kvm_disable_tdp();
 
 	if (bypass_guest_pf)
 		kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
-- 
1.5.6
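
Why the explicit kvm_disable_tdp() matters can be shown with a userspace model. This is a sketch under stated assumptions, not the kernel implementation: all model_* names are hypothetical, and the sketch assumes the TDP flag lives in the generic kvm module and therefore keeps its value when only the architecture module is reloaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the flag owned by the generic kvm module; it survives a
 * reload of the architecture module (assumption for this sketch). */
static bool tdp_enabled;

static void model_enable_tdp(void)  { tdp_enabled = true; }
static void model_disable_tdp(void) { tdp_enabled = false; }
static bool model_tdp_enabled(void) { return tdp_enabled; }

/* Old behaviour: the flag is only ever set, never cleared. */
static void model_arch_init_old(bool need_ept)
{
	if (need_ept)
		model_enable_tdp();
}

/* Behaviour after the patch: the else branch clears stale state. */
static void model_arch_init_new(bool need_ept)
{
	if (need_ept)
		model_enable_tdp();
	else
		model_disable_tdp();
}
```

Calling model_arch_init_old(true) and then model_arch_init_old(false) models "load with EPT, reload without EPT": the flag stays set, which is the stale state the patch removes.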



[PATCH 0/2] configure: add support for audio-{drv,card}-list

2008-07-16 Thread Carlo Marcelo Arenas Belon
The following series adds support for qemu's audio configure option lists,
which were added in kvm-71. They select which host audio interfaces are used
to provide audio to the guest (oss, alsa, sdl, esd, fmod, or pulseaudio) and
which audio device emulations are enabled for the guest (ac97, adlib,
cs4231a, or gus).

  PATCH 1/2 : configure: include audio list options for --help output
  PATCH 2/2 : configure: passthrough for audio-{drv,card}-list

Carlo


[PATCH 2/2] configure: passthrough for audio-{drv,card}-list and logic cleanup

2008-07-16 Thread Carlo Marcelo Arenas Belon
Extend the cleanup logic used in a patch from Jindrich Makovicka:
change the default case to pass the full, unmodified option to qemu's
configure, and add a passthrough for qemu options that take a
space-separated list, such as the list of enabled audio drivers
or the list of emulated audio devices.

Signed-off-by: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
---
 configure |   17 +
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index 2558e0e..fc05767 100755
--- a/configure
+++ b/configure
@@ -10,6 +10,8 @@ qemu_cflags=
 qemu_ldflags=
 qemu_opts=
 cross_prefix=
+audio_drv_list=
+audio_card_list=
 arch=`uname -m`
 target_exec=
 
@@ -39,7 +41,8 @@ EOF
 }
 
 while [[ $1 = -* ]]; do
-opt=$1; shift
+optorig=$1; shift
+opt=$optorig
 arg=
 if [[ $opt = *=* ]]; then
arg=${opt#*=}
@@ -67,16 +70,21 @@ while [[ $1 = -* ]]; do
--cross-prefix)
cross_prefix=$arg
 ;;
+   --audio-drv-list)
+   audio_drv_list=$arg
+   ;;
+   --audio-card-list)
+   audio_card_list=$arg
+   ;;
--help)
usage
;;
*)
-   qemu_opts="$qemu_opts $opt"
+   qemu_opts="$qemu_opts $optorig"
;;
 esac
 done
 
-
 #set kenel directory
 libkvm_kerneldir=$(readlink -f kernel)
 
@@ -114,11 +122,12 @@ fi
 --extra-ldflags="-L $PWD/../libkvm $qemu_ldflags" \
 --kernel-path=$libkvm_kerneldir \
 --prefix=$prefix \
+${audio_drv_list:+--audio-drv-list=$audio_drv_list} \
+${audio_card_list:+--audio-card-list=$audio_card_list} \
 ${cross_prefix:+--cross-prefix=$cross_prefix} \
 ${cross_prefix:+--cpu=$arch} $qemu_opts
 ) || usage
 
-
 cat <<EOF > config.mak
 ARCH=$arch
 PREFIX=$prefix
-- 
1.5.4.5
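
The ${var:+word} expansion used in the patch emits the option only when the variable is set and non-empty. For readers more at home in C, the same "pass through only if set" rule can be sketched as follows; append_opt_if_set() is a hypothetical helper written for this illustration and is not part of configure or qemu.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append " --name=value" to cmd only when value is non-empty, mirroring
 * the shell expansion ${var:+--name=$var}. cmd must already hold a
 * NUL-terminated string shorter than size. Returns 0 on success and -1
 * if the buffer is too small (cmd is then left unchanged). */
static int append_opt_if_set(char *cmd, size_t size,
			     const char *name, const char *value)
{
	size_t used = strlen(cmd);
	int n;

	if (value == NULL || value[0] == '\0')
		return 0;	/* unset or empty: emit nothing */

	n = snprintf(cmd + used, size - used, " --%s=%s", name, value);
	if (n < 0 || (size_t)n >= size - used) {
		cmd[used] = '\0';	/* undo the partial write */
		return -1;
	}
	return 0;
}
```

An empty audio_drv_list therefore adds nothing to the command line, while a value such as "alsa oss" is appended as one option including the embedded space.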



RE: [PATCH 01/04]Create x86 directory to hold x86-specific files.

2008-07-16 Thread Zhang, Xiantao
Avi Kivity wrote:
 Zhang, Xiantao wrote:
 From 03ac444d1ab4446c587e8180ceaba60b9e75b28d Mon Sep 17 00:00:00
 2001 From: Xiantao Zhang [EMAIL PROTECTED]
 Date: Fri, 11 Jul 2008 10:13:08 +0800
 Subject: [PATCH] KVM: external module: Moving x86-specific files to
 x86 directory. 
 
 Create x86 directory to hold x86-specific files.
 Signed-off-by: Xiantao Zhang [EMAIL PROTECTED]
 ---
  kernel/{ => x86}/anon_inodes.c|0
 
 
 This isn't really x86 specific.  It's just kernel version dependent.
 The problem is that it is built unconditionally, even if the kernel
 has anon_inodes support.
 Please send a patch that wraps the entire file in #ifdef so that we
 use the host kernel's anon_inodes if it is available.
 
Sure, I will update the patch. 
  kernel/{ => x86}/external-module-compat.c |0
  kernel/{ => x86}/external-module-compat.h |0
 
 
 Parts of this are generic, for example the mutex code.  Please move
 only the x86-specific parts (even if ia64 doesn't need everything in
 the generic code).
OK.


Re: [PATCH] Ignore DEBUGCTL MSRs

2008-07-16 Thread Alexander Graf

Avi Kivity wrote:

Alexander Graf wrote:

Avi Kivity wrote:

Alexander Graf wrote:
Netware writes and reads to the DEBUGCTL and LAST*IP MSRs without 
further checks and is really confused to receive a #GP during that. 
To make it happy we should just make them stubs, which is exactly 
what SVM already does.


To support VMX too, I put these in the generic code. Maybe the SVM 
code could be cleaned up to use generic code too.




Please add a pr_unimpl() when bits that cause a real processor to do 
something are set.


Like this? I also removed the set handlers for the *IP MSRs, as these 
are read only and made it only handle debug bits, no perfmon bits.




With a changelog entry.


ok.




+pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+__func__, data);
  


We can avoid the printout if data == 0, since we support that case fully.


I was thinking a lot about that. Even though we support data == 0, 
the kernel log output is usually useful for people trying to find out 
whether something is causing a problem. If they see DEBUGCTL getting 
set but never see it getting unset, they'd get confused IMHO.
So the current behavior is on purpose, but if you object to that idea, 
please tell me.



---

Netware writes to DEBUGCTL and reads from the DEBUGCTL and LAST*IP MSRs 
without further checks and is really confused to receive a #GP during 
that. To make it happy we should just make them stubs, which is exactly 
what SVM already does.


Writes to DEBUGCTL that set vendor-specific bits are handled as if the 
virtual CPU does not know those bits.


Signed-off-by: Alexander Graf [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc0721e..10f5e95 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -609,6 +609,15 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		pr_unimpl(vcpu, "%s: MSR_IA32_MCG_CTL 0x%llx, nop\n",
 			__func__, data);
 		break;
+	case MSR_IA32_DEBUGCTLMSR:
+		if (data & ~(u64)(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) {
+			/* Values other than LBR and BTF are vendor-specific,
+			   thus reserved and should throw a #GP */
+			return 1;
+		}
+		pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",
+			__func__, data);
+		break;
 	case MSR_IA32_UCODE_REV:
 	case MSR_IA32_UCODE_WRITE:
 		break;
@@ -705,6 +714,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_IA32_MC0_MISC+16:
 	case MSR_IA32_UCODE_REV:
 	case MSR_IA32_EBL_CR_POWERON:
+	case MSR_IA32_DEBUGCTLMSR:
+	case MSR_IA32_LASTBRANCHFROMIP:
+	case MSR_IA32_LASTBRANCHTOIP:
+	case MSR_IA32_LASTINTFROMIP:
+	case MSR_IA32_LASTINTTOIP:
 		data = 0;
 		break;
 	case MSR_MTRRcap:
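
The stub semantics of the patch can be tried out in userspace. This is a minimal sketch, not the kvm code itself: the model_* names and MODEL_* macros are hypothetical, and the bit positions are assumed to match the architectural definitions of LBR (bit 0) and BTF (bit 1).

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the two architectural DEBUGCTL bits the patch accepts. */
#define MODEL_DEBUGCTL_LBR (1ULL << 0)	/* last branch recording */
#define MODEL_DEBUGCTL_BTF (1ULL << 1)	/* single-step on branches */

/* Returns 0 when the write is accepted (and ignored), 1 when the caller
 * should inject #GP because a reserved bit is set. */
static int model_set_debugctl(uint64_t data)
{
	if (data & ~(uint64_t)(MODEL_DEBUGCTL_LBR | MODEL_DEBUGCTL_BTF))
		return 1;
	return 0;	/* stub: the value is dropped */
}

/* Reads of DEBUGCTL and the LAST*IP MSRs always yield zero in the stub. */
static int model_get_debugctl(uint64_t *pdata)
{
	*pdata = 0;
	return 0;
}
```

A guest that writes LBR or BTF and reads the register back sees zero, so it behaves as if the feature were present but idle; any other bit takes the #GP path.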


[PATCH] qemu: re-add definition for qemu_get_launch_info

2008-07-16 Thread Carlo Marcelo Arenas Belon
The declaration somehow went missing from sysemu.h after a qemu merge;
without it the build complains with the following warning:

  kvm-71/qemu/migration.c: In function 'migration_init_ssh':
  kvm-71/qemu/migration.c:629: warning: implicit declaration of function 'qemu_get_launch_info'

Signed-off-by: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
---
 qemu/sysemu.h |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/qemu/sysemu.h b/qemu/sysemu.h
index 993d67b..ab8ac91 100644
--- a/qemu/sysemu.h
+++ b/qemu/sysemu.h
@@ -41,6 +41,9 @@ void qemu_system_powerdown(void);
 #endif
 void qemu_system_reset(void);
 
+void qemu_get_launch_info(int *argc, char ***argv,
+  int *opt_daemonize, const char **opt_incoming);
+
 void do_savevm(const char *name);
 void do_loadvm(const char *name);
 void do_delvm(const char *name);
-- 
1.5.4.5



[PATCH] qemu: remove duplicated inclusion of signal.h in qemu-kvm.h

2008-07-16 Thread Carlo Marcelo Arenas Belon
added by mistake as part of 4820cce75999b2673a964eb87601229a4bd78ad9

Signed-off-by: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
---
 qemu/qemu-kvm.h |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/qemu/qemu-kvm.h b/qemu/qemu-kvm.h
index 8b7dcde..7e28428 100644
--- a/qemu/qemu-kvm.h
+++ b/qemu/qemu-kvm.h
@@ -12,8 +12,6 @@
 
 #include <signal.h>
 
-#include <signal.h>
-
 int kvm_main_loop(void);
 int kvm_qemu_init(void);
 int kvm_qemu_create_context(void);
-- 
1.5.4.5



Re: kvm: Unknown error 524, Fail to handle apic access vmexit

2008-07-16 Thread Martin Michlmayr
* Yang, Sheng [EMAIL PROTECTED] [2008-07-16 11:26]:
 Hi Martin, can you show more dmesg here?

It doesn't contain any other messages from kvm.  If you still want it,
let me know.

 And can it be reproduced reliably?

I can reproduce this 100%.

Anyway, I just tried 2.6.26 with FlexPriority disabled and now kvm no
longer exits (and there's no "Fail to handle apic access vmexit"
message) but Windows still displays the same blue screen (and
reboots).

-- 
Martin Michlmayr
http://www.cyrius.com/


PCI passthrough with VT-d - native performance

2008-07-16 Thread Ben-Ami Yassour
In the last few tests that we made with PCI passthrough and VT-d using
iperf, we were able to get the same throughput as on a native OS with a
1G NIC (with higher CPU utilization).

The following patches are the PCI-passthrough patches that Amit sent
(re-based on the last kvm tree), followed by a few improvements and the
VT-d extension.
I am also sending the userspace patches: the patch that Amit sent for
PCI passthrough and the direct-mmio extension for userspace (note that
without the direct mmio extension we get less than half the throughput).

Comments are welcome.

Regards,
Ben




[PATCH 5/8] KVM: PCIPT: change order of device release

2008-07-16 Thread Ben-Ami Yassour
Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8d25b4a..65b307d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -343,9 +343,9 @@ static void kvm_free_pci_passthrough(struct kvm *kvm)
pci_pt_dev = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
 
/* Search for this device got us a refcount */
-   pci_dev_put(pci_pt_dev->pt_dev.dev);
pci_release_regions(pci_pt_dev->pt_dev.dev);
pci_disable_device(pci_pt_dev->pt_dev.dev);
+   pci_dev_put(pci_pt_dev->pt_dev.dev);
 
list_del(&pci_pt_dev->list);
kfree(pci_pt_dev);
-- 
1.5.6
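
The commit message gives no rationale for the reordering. One plausible reading is that the reference obtained when the device was looked up should be dropped only after the last use of the device. The userspace sketch below models that reading; struct model_dev and the model_* functions are hypothetical and merely stand in for the PCI calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted device; "freed" models the object going away. */
struct model_dev {
	int refcount;
	bool freed;
	int use_after_free;	/* counts accesses made after the last put */
};

/* Stand-in for pci_dev_put(): the last put frees the device. */
static void model_dev_put(struct model_dev *d)
{
	if (--d->refcount == 0)
		d->freed = true;
}

/* Stand-in for pci_release_regions() and pci_disable_device(). */
static void model_dev_touch(struct model_dev *d)
{
	if (d->freed)
		d->use_after_free++;
}

/* Old order: put first, then two more accesses to the device. */
static int model_release_old_order(struct model_dev *d)
{
	model_dev_put(d);
	model_dev_touch(d);
	model_dev_touch(d);
	return d->use_after_free;
}

/* New order: all accesses happen while the reference is still held. */
static int model_release_new_order(struct model_dev *d)
{
	model_dev_touch(d);
	model_dev_touch(d);
	model_dev_put(d);
	return d->use_after_free;
}
```

If the reference being dropped is the last one, the old order touches the device twice after it is gone, while the new order never does.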



[PATCH 4/8] KVM: PCIPT: fix interrupt handling

2008-07-16 Thread Ben-Ami Yassour
This patch fixes a few problems with the interrupt handling for
passthrough devices.

1. Pass the interrupt handler the pointer to the device, so we do not
need to lock the pcipt lock in the interrupt handler.

2. Remove the pt_irq_handled bitmap - it is no longer needed.

3. Split kvm_pci_pt_work_fn into two functions, one for interrupt
injection and another for the ack; the code is much simpler this way.

4. Change the passthrough initialization order - add the device
structure to the list, before registering the interrupt handler.

5. On passthrough destruction path, free the interrupt handler before
cleaning queued work.
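
Points 1 and 3 above can be sketched in plain C. This is a userspace model with hypothetical names (model_pt_dev, model_work and the model_* functions), not the kvm code: the handler gets the device itself as dev_id, and each work item carries a back-pointer to its device, so neither path has to search a list under a lock.

```c
#include <assert.h>
#include <stddef.h>

struct model_pt_dev;

/* Hypothetical work item with a back-pointer to its device. */
struct model_work {
	struct model_pt_dev *pt_dev;	/* set once at init time */
	int pending;
};

/* Hypothetical per-device state with one work item per task. */
struct model_pt_dev {
	int guest_irq;
	int host_irq;
	struct model_work int_work;	/* interrupt injection */
	struct model_work ack_work;	/* interrupt ack */
};

static void model_pt_dev_init(struct model_pt_dev *dev,
			      int guest_irq, int host_irq)
{
	dev->guest_irq = guest_irq;
	dev->host_irq = host_irq;
	dev->int_work.pt_dev = dev;
	dev->int_work.pending = 0;
	dev->ack_work.pt_dev = dev;
	dev->ack_work.pending = 0;
}

/* The handler receives the device directly as dev_id: no list walk and
 * no lock are needed to find it. */
static int model_irq_handler(int irq, void *dev_id)
{
	struct model_pt_dev *dev = dev_id;

	(void)irq;
	dev->int_work.pending = 1;	/* models schedule_work() */
	return 1;			/* handled */
}

/* The injection work function reaches its device through the
 * back-pointer and returns the irq it would raise in the guest. */
static int model_int_work_fn(struct model_work *work)
{
	work->pending = 0;
	return work->pt_dev->guest_irq;
}
```

A separate ack function would look the same but use host_irq to re-enable the host interrupt, which is the split described in point 3.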

Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c |  156 ++-
 include/asm-x86/kvm_host.h |5 +-
 virt/kvm/ioapic.c  |5 +-
 3 files changed, 69 insertions(+), 97 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c07ca2b..8d25b4a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -145,49 +145,37 @@ kvm_find_pci_pt_dev(struct list_head *head,
return NULL;
 }
 
-static DECLARE_BITMAP(pt_irq_handled, NR_IRQS);
-
-static void kvm_pci_pt_work_fn(struct work_struct *work)
+static void kvm_pci_pt_int_work_fn(struct work_struct *work)
 {
-   struct kvm_pci_pt_dev_list *match;
struct kvm_pci_pt_work *int_work;
-   int source;
-   unsigned long flags;
-   int guest_irq;
-   int host_irq;
 
int_work = container_of(work, struct kvm_pci_pt_work, work);
 
-   source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ;
-
/* This is taken to safely inject irq inside the guest. When
 * the interrupt injection (or the ioapic code) uses a
 * finer-grained lock, update this
 */
-   mutex_lock(&int_work->kvm->lock);
-   read_lock_irqsave(&kvm_pci_pt_lock, flags);
-   match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL,
-   int_work->irq, source);
-   if (!match) {
-   printk(KERN_ERR "%s: no matching device assigned to guest "
-  "found for irq %d, source = %d!\n",
-  __func__, int_work->irq, int_work->source);
-   read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
-   goto out;
-   }
-   guest_irq = match->pt_dev.guest.irq;
-   host_irq = match->pt_dev.host.irq;
-   read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
+   mutex_lock(&int_work->pt_dev->kvm->lock);
+   kvm_set_irq(int_work->pt_dev->kvm, int_work->pt_dev->guest.irq, 1);
+   mutex_unlock(&int_work->pt_dev->kvm->lock);
+   kvm_put_kvm(int_work->pt_dev->kvm);
+}
 
-   if (source == KVM_PT_SOURCE_IRQ)
-   kvm_set_irq(int_work->kvm, guest_irq, 1);
-   else {
-   kvm_set_irq(int_work->kvm, int_work->irq, 0);
-   enable_irq(host_irq);
-   }
-out:
-   mutex_unlock(&int_work->kvm->lock);
-   kvm_put_kvm(int_work->kvm);
+static void kvm_pci_pt_ack_work_fn(struct work_struct *work)
+{
+   struct kvm_pci_pt_work *ack_work;
+
+   ack_work = container_of(work, struct kvm_pci_pt_work, work);
+
+   /* This is taken to safely inject irq inside the guest. When
+* the interrupt injection (or the ioapic code) uses a
+* finer-grained lock, update this
+*/
+   mutex_lock(&ack_work->pt_dev->kvm->lock);
+   kvm_set_irq(ack_work->pt_dev->kvm, ack_work->pt_dev->guest.irq, 0);
+   enable_irq(ack_work->pt_dev->host.irq);
+   mutex_unlock(&ack_work->pt_dev->kvm->lock);
+   kvm_put_kvm(ack_work->pt_dev->kvm);
 }
 
 /* FIXME: Implement the OR logic needed to make shared interrupts on
@@ -195,28 +183,11 @@ out:
  */
 static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id)
 {
-   struct kvm *kvm = (struct kvm *) dev_id;
-   struct kvm_pci_pt_dev_list *pci_pt_dev;
-
-   if (!test_bit(irq, pt_irq_handled))
-   return IRQ_NONE;
-
-   read_lock(&kvm_pci_pt_lock);
-   pci_pt_dev = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head, NULL,
-irq, KVM_PT_SOURCE_IRQ);
-   if (!pci_pt_dev) {
-   read_unlock(&kvm_pci_pt_lock);
-   return IRQ_NONE;
-   }
-
-   pci_pt_dev->pt_dev.int_work.irq = irq;
-   pci_pt_dev->pt_dev.int_work.kvm = kvm;
-   pci_pt_dev->pt_dev.int_work.source = 0;
-
-   kvm_get_kvm(kvm);
-   schedule_work(&pci_pt_dev->pt_dev.int_work.work);
-   read_unlock(&kvm_pci_pt_lock);
+   struct kvm_pci_passthrough_dev_kernel *pt_dev =
+   (struct kvm_pci_passthrough_dev_kernel *) dev_id;
 
+   kvm_get_kvm(pt_dev->kvm);
+   schedule_work(&pt_dev->int_work.work);
disable_irq_nosync(irq);
return IRQ_HANDLED;
 }
@@ -226,25 +197,20 @@ static void kvm_pci_pt_ack_irq(void *opaque, int irq)
 {
struct kvm *kvm = opaque;
struct kvm_pci_pt_dev_list *pci_pt_dev;
-   unsigned long flags;
 
   

[PATCH 7/8] KVM: PCIPT: VT-d support

2008-07-16 Thread Ben-Ami Yassour
This patch includes the functions to support VT-d for passthrough
devices.

[Ben: fixed memory pinning, cleanup]

Signed-off-by: Kay, Allen M [EMAIL PROTECTED]
Signed-off-by: Weidong Han [EMAIL PROTECTED]
Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 arch/x86/kvm/Makefile  |2 +-
 arch/x86/kvm/vtd.c |  176 
 arch/x86/kvm/x86.c |   11 +++
 include/asm-x86/kvm_host.h |1 +
 include/linux/kvm_host.h   |6 ++
 virt/kvm/kvm_main.c|6 ++
 6 files changed, 201 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/kvm/vtd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index d0e940b..5d9d079 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,7 @@ endif
 EXTRA_CFLAGS += -Ivirt/kvm -Iarch/x86/kvm
 
 kvm-objs := $(common-objs) x86.o mmu.o x86_emulate.o i8259.o irq.o lapic.o \
-   i8254.o
+   i8254.o vtd.o
 obj-$(CONFIG_KVM) += kvm.o
 kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
new file mode 100644
index 000..83efb8a
--- /dev/null
+++ b/arch/x86/kvm/vtd.c
@@ -0,0 +1,176 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Copyright (C) 2006-2008 Intel Corporation
+ * Author: Allen M. Kay [EMAIL PROTECTED]
+ * Author: Weidong Han [EMAIL PROTECTED]
+ */
+
+#include <linux/list.h>
+#include <linux/kvm_host.h>
+#include <linux/pci.h>
+#include <linux/dmar.h>
+#include <linux/intel-iommu.h>
+
+static int kvm_iommu_unmap_memslots(struct kvm *kvm);
+
+int kvm_iommu_map_pages(struct kvm *kvm,
+   gfn_t base_gfn, unsigned long npages)
+{
+   gfn_t gfn = base_gfn;
+   pfn_t pfn;
+   int i, rc;
+   struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
+
+   if (!domain)
+   return -EFAULT;
+
+   for (i = 0; i < npages; i++) {
+   pfn = gfn_to_pfn(kvm, gfn);
+   rc = intel_iommu_page_mapping(domain,
+ gfn << PAGE_SHIFT,
+ pfn << PAGE_SHIFT,
+ PAGE_SIZE,
+ DMA_PTE_READ |
+ DMA_PTE_WRITE);
+   if (rc)
+   kvm_release_pfn_clean(pfn);
+
+   gfn++;
+   }
+   return 0;
+}
+
+static int kvm_iommu_map_memslots(struct kvm *kvm)
+{
+   int i, rc;
+   for (i = 0; i < kvm->nmemslots; i++) {
+   rc = kvm_iommu_map_pages(kvm, kvm->memslots[i].base_gfn,
+kvm->memslots[i].npages);
+   if (rc)
+   return rc;
+   }
+   return 0;
+}
+
+int kvm_iommu_map_guest(struct kvm *kvm,
+   struct kvm_pci_passthrough_dev *pci_pt_dev)
+{
+   struct pci_dev *pdev = NULL;
+
+   printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
+  pci_pt_dev->host.busnr,
+  PCI_SLOT(pci_pt_dev->host.devfn),
+  PCI_FUNC(pci_pt_dev->host.devfn));
+
+   for_each_pci_dev(pdev) {
+   if ((pdev->bus->number == pci_pt_dev->host.busnr) &&
+   (pdev->devfn == pci_pt_dev->host.devfn)) {
+   break;
+   }
+   }
+
+   if (pdev == NULL) {
+   if (kvm->arch.intel_iommu_domain) {
+   intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
+   kvm->arch.intel_iommu_domain = NULL;
+   }
+   return -ENODEV;
+   }
+
+   kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev);
+
+   if (kvm_iommu_map_memslots(kvm)) {
+   kvm_iommu_unmap_memslots(kvm);
+   return -EFAULT;
+   }
+
+   intel_iommu_detach_dev(kvm->arch.intel_iommu_domain,
+  pdev->bus->number, pdev->devfn);
+
+   if (intel_iommu_context_mapping(kvm->arch.intel_iommu_domain,
+   pdev)) {
+   printk(KERN_ERR "Domain context map for %s failed",
+  pci_name(pdev));
+   return -EFAULT;
+   }
+   return 0;
+}
+
+static int kvm_iommu_put_pages(struct kvm *kvm,
+  

[PATCH 8/8] KVM: PCIPT: VT-d: dont map mmio memory slots

2008-07-16 Thread Ben-Ami Yassour
Avoid mapping mmio memory slots.

Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 arch/x86/kvm/vtd.c |   20 +---
 include/asm-x86/kvm_host.h |2 ++
 virt/kvm/kvm_main.c|2 +-
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vtd.c b/arch/x86/kvm/vtd.c
index 83efb8a..77044fb 100644
--- a/arch/x86/kvm/vtd.c
+++ b/arch/x86/kvm/vtd.c
@@ -40,14 +40,20 @@ int kvm_iommu_map_pages(struct kvm *kvm,
 
for (i = 0; i < npages; i++) {
pfn = gfn_to_pfn(kvm, gfn);
-   rc = intel_iommu_page_mapping(domain,
- gfn << PAGE_SHIFT,
+   if (!is_mmio_pfn(pfn)) {
+   rc = intel_iommu_page_mapping(domain,
+ gfn << PAGE_SHIFT,
  pfn << PAGE_SHIFT,
- PAGE_SIZE,
- DMA_PTE_READ |
- DMA_PTE_WRITE);
-   if (rc)
-   kvm_release_pfn_clean(pfn);
+ PAGE_SIZE,
+ DMA_PTE_READ |
+ DMA_PTE_WRITE);
+   if (rc)
+   kvm_release_pfn_clean(pfn);
+   } else {
+   printk(KERN_DEBUG "kvm_iommu_map_page:"
+  "invalid pfn=%lx\n", pfn);
+   return 0;
+   }
 
gfn++;
}
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 6185ed7..ee4685c 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -513,6 +513,8 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
  gpa_t addr, unsigned long *ret);
 
+int is_mmio_pfn(pfn_t pfn);
+
 extern bool tdp_enabled;
 
 enum emulation_result {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 77d7001..0653ec1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -77,7 +77,7 @@ static inline int valid_vcpu(int n)
return likely(n >= 0 && n < KVM_MAX_VCPUS);
 }
 
-static inline int is_mmio_pfn(pfn_t pfn)
+inline int is_mmio_pfn(pfn_t pfn)
 {
if (pfn_valid(pfn))
return PageReserved(pfn_to_page(pfn));
-- 
1.5.6
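
The mapping loop after this patch can be modelled in userspace as follows. This is only a sketch: struct model_page, MODEL_PAGE_SHIFT and model_map_pages() are hypothetical, a 4 KiB page size is assumed, and the real IOMMU call is replaced by recording the (iova, hpa) pair. Note that, as in the patch, the walk stops at the first MMIO page instead of skipping over it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Hypothetical lookup result: host frame number plus an MMIO marker. */
struct model_page {
	uint64_t pfn;
	bool is_mmio;
};

/* Walks npages starting at base_gfn, records one (iova, hpa) pair per
 * RAM page in out[], and stops at the first MMIO page.
 * Returns the number of mappings recorded. */
static unsigned long model_map_pages(const struct model_page *pages,
				     uint64_t base_gfn, unsigned long npages,
				     uint64_t (*out)[2])
{
	unsigned long i, mapped = 0;

	for (i = 0; i < npages; i++) {
		if (pages[i].is_mmio)
			break;
		/* guest physical address used as the I/O virtual address */
		out[mapped][0] = (base_gfn + i) << MODEL_PAGE_SHIFT;
		/* host physical address backing that guest page */
		out[mapped][1] = pages[i].pfn << MODEL_PAGE_SHIFT;
		mapped++;
	}
	return mapped;
}
```

Because the guest frame number doubles as the I/O virtual address, a device programmed by the guest with guest physical addresses ends up reaching the host pages that back them.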



[PATCH 6/8] VT-d: changes to support KVM

2008-07-16 Thread Ben-Ami Yassour
From: Kay, Allen M [EMAIL PROTECTED]

This patch extends the VT-d driver to support KVM

[Ben: fixed memory pinning]

Signed-off-by: Kay, Allen M [EMAIL PROTECTED]
Signed-off-by: Weidong Han [EMAIL PROTECTED]
Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
---
 drivers/pci/dmar.c   |4 +-
 drivers/pci/intel-iommu.c|  117 +-
 drivers/pci/iova.c   |2 +-
 {drivers/pci => include/linux}/intel-iommu.h |   11 +++
 {drivers/pci => include/linux}/iova.h|0 
 5 files changed, 127 insertions(+), 7 deletions(-)
 rename {drivers/pci => include/linux}/intel-iommu.h (94%)
 rename {drivers/pci => include/linux}/iova.h (100%)

diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
index f941f60..a58a5b0 100644
--- a/drivers/pci/dmar.c
+++ b/drivers/pci/dmar.c
@@ -26,8 +26,8 @@
 
 #include <linux/pci.h>
 #include <linux/dmar.h>
-#include "iova.h"
-#include "intel-iommu.h"
+#include <linux/iova.h>
+#include <linux/intel-iommu.h>
 
 #undef PREFIX
 #define PREFIX "DMAR:"
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index bb06423..a566406 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -20,6 +20,7 @@
  * Author: Anil S Keshavamurthy [EMAIL PROTECTED]
  */
 
+#undef DEBUG
 #include <linux/init.h>
 #include <linux/bitmap.h>
 #include <linux/debugfs.h>
@@ -33,8 +34,8 @@
 #include <linux/dma-mapping.h>
 #include <linux/mempool.h>
 #include <linux/timer.h>
-#include "iova.h"
-#include "intel-iommu.h"
+#include <linux/iova.h>
+#include <linux/intel-iommu.h>
 #include <asm/proto.h> /* force_iommu in this header in x86-64*/
 #include <asm/cacheflush.h>
 #include <asm/gart.h>
@@ -160,7 +161,7 @@ static inline void *alloc_domain_mem(void)
return iommu_kmem_cache_alloc(iommu_domain_cache);
 }
 
-static inline void free_domain_mem(void *vaddr)
+static void free_domain_mem(void *vaddr)
 {
kmem_cache_free(iommu_domain_cache, vaddr);
 }
@@ -1414,7 +1415,7 @@ static void domain_remove_dev_info(struct dmar_domain *domain)
  * find_domain
  * Note: we use struct pci_dev->dev.archdata.iommu stores the info
  */
-struct dmar_domain *
+static struct dmar_domain *
 find_domain(struct pci_dev *pdev)
 {
struct device_domain_info *info;
@@ -2431,3 +2432,111 @@ int __init intel_iommu_init(void)
return 0;
 }
 
+void intel_iommu_domain_exit(struct dmar_domain *domain)
+{
+   u64 end;
+
+   /* Domain 0 is reserved, so dont process it */
+   if (!domain)
+   return;
+
+   end = DOMAIN_MAX_ADDR(domain->gaw);
+   end = end & (~PAGE_MASK_4K);
+
+   /* clear ptes */
+   dma_pte_clear_range(domain, 0, end);
+
+   /* free page tables */
+   dma_pte_free_pagetable(domain, 0, end);
+
+   iommu_free_domain(domain);
+   free_domain_mem(domain);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_domain_exit);
+
+struct dmar_domain *intel_iommu_domain_alloc(struct pci_dev *pdev)
+{
+   struct dmar_drhd_unit *drhd;
+   struct dmar_domain *domain;
+   struct intel_iommu *iommu;
+
+   drhd = dmar_find_matched_drhd_unit(pdev);
+   if (!drhd) {
+   printk(KERN_ERR intel_iommu_domain_alloc: drhd == NULL\n);
+   return NULL;
+   }
+
+   iommu = drhd-iommu;
+   if (!iommu) {
+   printk(KERN_ERR
+   intel_iommu_domain_alloc: iommu == NULL\n);
+   return NULL;
+   }
+   domain = iommu_alloc_domain(iommu);
+   if (!domain) {
+   printk(KERN_ERR
+   intel_iommu_domain_alloc: domain == NULL\n);
+   return NULL;
+   }
+   if (domain_init(domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
+   printk(KERN_ERR
+   intel_iommu_domain_alloc: domain_init() failed\n);
+   intel_iommu_domain_exit(domain);
+   return NULL;
+   }
+   return domain;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_domain_alloc);
+
+int intel_iommu_context_mapping(
+   struct dmar_domain *domain, struct pci_dev *pdev)
+{
+   int rc;
+   rc = domain_context_mapping(domain, pdev);
+   return rc;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_context_mapping);
+
+int intel_iommu_page_mapping(
+   struct dmar_domain *domain, dma_addr_t iova,
+   u64 hpa, size_t size, int prot)
+{
+   int rc;
+   rc = domain_page_mapping(domain, iova, hpa, size, prot);
+   return rc;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_page_mapping);
+
+void intel_iommu_detach_dev(struct dmar_domain *domain, u8 bus, u8 devfn)
+{
+   detach_domain_for_dev(domain, bus, devfn);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_detach_dev);
+
+struct dmar_domain *
+intel_iommu_find_domain(struct pci_dev *pdev)
+{
+   return find_domain(pdev);
+}
+EXPORT_SYMBOL_GPL(intel_iommu_find_domain);
+
+int intel_iommu_found(void)
+{
+   return g_num_of_iommus;
+}
+EXPORT_SYMBOL_GPL(intel_iommu_found);
+
+u64 intel_iommu_iova_to_pfn(struct dmar_domain *domain, u64 

[PATCH 1/8] KVM: Introduce a callback routine for IOAPIC ack handling

2008-07-16 Thread Ben-Ami Yassour
From: Amit Shah [EMAIL PROTECTED]

This will be useful for acking irqs of assigned devices

Signed-off-by: Amit Shah [EMAIL PROTECTED]
---
 virt/kvm/ioapic.c |3 +++
 virt/kvm/ioapic.h |1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index c0d2287..8ce93c7 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -295,6 +295,9 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
 	ent->fields.remote_irr = 0;
 	if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
 		ioapic_service(ioapic, gsi);
+
+	if (ioapic->ack_notifier)
+		ioapic->ack_notifier(ioapic->kvm, gsi);
 }
 
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector)
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 7f16675..a42743f 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -58,6 +58,7 @@ struct kvm_ioapic {
} redirtbl[IOAPIC_NUM_PINS];
struct kvm_io_device dev;
struct kvm *kvm;
+   void (*ack_notifier)(void *opaque, int irq);
 };
 
 #ifdef DEBUG
-- 
1.5.6
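
The hook added by this patch is an optional function pointer that is called at EOI time. As a rough user-space illustration of that pattern only, here is a minimal sketch; `toy_ioapic`, `toy_update_eoi`, `ack_log` and `count_ack` are hypothetical names and are not part of KVM:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_ioapic; only the hook is modelled. */
struct toy_ioapic {
	void *opaque;                                /* context handed back to the callback */
	void (*ack_notifier)(void *opaque, int irq); /* optional, may be NULL */
};

/* Mirrors the tail of __kvm_ioapic_update_eoi(): call the hook if one is set. */
static void toy_update_eoi(struct toy_ioapic *ioapic, int gsi)
{
	assert(ioapic != NULL);
	if (ioapic->ack_notifier)
		ioapic->ack_notifier(ioapic->opaque, gsi);
}

/* Example consumer: remember the last acked irq and count the acks. */
struct ack_log {
	int last_irq;
	int count;
};

static void count_ack(void *opaque, int irq)
{
	struct ack_log *log = opaque;

	log->last_irq = irq;
	log->count++;
}
```

With no notifier registered, `toy_update_eoi()` does nothing extra; once `count_ack` is installed, every EOI for a gsi is reported to it along with the opaque pointer.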

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm: Unknown error 524, Fail to handle apic access vmexit

2008-07-16 Thread Dor Laor

Yang, Sheng wrote:

On Tuesday 15 July 2008 23:19:07 Dor Laor wrote:

Martin Michlmayr wrote:

I installed a Windows XP SP2 guest on a Debian x86_64 host.  The
installation itself went fine but kvm aborts when XP starts
during Windows XP Setup.  XP mentions something with
intelppm.sys (see the attached screenshot) and kvm says:

kvm_run: Unknown error 524
kvm_run returned -524

It's a FlexPriority bug; while it should be solved, you can disable
it by using the kvm-intel module parameter.

Dor, are you sure it's a FlexPriority bug?

Well, I'm not sure it's FlexPriority's fault; it's just that when it is
disabled it does not happen, and I saw the apic access. It could be
misemulation too.
It happened to me on ~ kvm-69.

If you look at where the complaint comes from, you will find it is a result
of emulate_instruction().

And you will find a clear "emulation failed (mmio) rip 7cb3d000 ff
ff 8d 85" in the bug tracker Martin mentioned, above the "Fail to
handle apic access vmexit! Offset is 0xf0" (Spurious Interrupt Vector
Register).

I don't think ff ff 8d 85 is a valid opcode for that case.

Maybe it's a regression? The last report was long ago...

Hi Martin, can you show more dmesg here? And can it be reproduced
reliably?

Thanks.



Re: networking setup problem

2008-07-16 Thread Uri Lublin

paolo pedaletti wrote:

Hi,
I hope this is the right ml to submit my problem.

Abstract: I can't set up 2 different networks inside my VMs, one public
and one private.


Scheme:

  eth0 -
 -| proxy |---eth1
 |-  |
H|   |
O|   eth0 -  |
S|| web   |--|eth1
T|-  |
 |   |
 |   eth0 -  |
 || db|---eth1
  -


this is a classic LAMP stack, spread across 3 VMs

1) front end, proxy (apache2 in reverse with mod-security)
2) application server, web (apache2 + php5)
3) database (mysql5)

(it's a test/backup environment)

each VM must have 2 network cards:
eth0 on the local network, bridged with the host's physical eth0
eth1 on the virtual private network, for internal communication between
them


saying that, ... it doesn't work :-(
(linux ubuntu 8.04 2.6.24-19-generic, kvm-62)

these are the command lines:

kvm -name PROXY
-net nic,vlan=0,macaddr=00:18:BE:EF:17:2A,model=rtl8139
-net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh
-net nic,vlan=1,macaddr=00:18:BE:EF:17:2B,model=rtl8139
-net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh
-drive index=0,media=disk,if=scsi,file=./ubuntu-server.PROXY.root,boot=on
-drive index=1,media=disk,if=scsi,file=./ubuntu-server.PROXY.home
-drive index=2,media=disk,if=scsi,file=./ubuntu-server.PROXY.swap

kvm -name WEBAPP
-net nic,vlan=0,macaddr=00:18:BE:EF:17:1A,model=rtl8139
-net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh
-net nic,vlan=1,macaddr=00:18:BE:EF:17:1B,model=rtl8139
-net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh
-drive index=0,media=disk,if=scsi,file=./ubuntu-server.WEB.root,boot=on
-drive index=1,media=disk,if=scsi,file=./ubuntu-server.WEB.home
-drive index=2,media=disk,if=scsi,file=./ubuntu-server.WEB.swap

kvm -name DB
-net nic,vlan=0,macaddr=00:18:BE:EF:17:0A,model=rtl8139
-net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh
-net nic,vlan=1,macaddr=00:18:BE:EF:17:0B,model=rtl8139
-net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh
-drive index=0,media=disk,if=scsi,file=./ubuntu-server.DB.root,boot=on
-drive index=1,media=disk,if=scsi,file=./ubuntu-server.DB.home
-drive index=2,media=disk,if=scsi,file=./ubuntu-server.DB.swap



Does using a different ifname help ?
PROXY:  ifname=tap2 and dmz2
WEBAPP: ifname=tap1 and dmz1
DB: ifname=tap0 and dmz0

Also check route on guests.


[ kvm-Bugs-2019053 ] tbench fails on guest when AMD NPT enabled

2008-07-16 Thread SourceForge.net
Bugs item #2019053, was opened at 2008-07-16 03:10
Message generated for change (Comment added) made by avik
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019053&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: amd
Group: None
Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: Alex Williamson (alex_williamson)
Assigned to: Nobody/Anonymous (nobody)
Summary: tbench fails on guest when AMD NPT enabled

Initial Comment:
Running on a dual-socket system with AMD 2356 quad-core processors (8 total 
cores), 32GB RAM, Ubuntu Hardy 2.6.24-19-generic (64bit) with kvm-71 userspace 
and kernel modules.  With no module options, dmesg confirms: kvm: Nested Paging 
enabled

Start guest with:

/usr/local/kvm/bin/qemu-system-x86_64 -hda /dev/VM/Ubuntu64 -m 1024 -net 
nic,model=e1000,mac=de:ad:be:ef:00:01 -net tap,script=/root/bin/br0-ifup -smp 8 
-vnc :0

Guest VM is also Ubuntu Hardy 64bit.  On the guest run 'tbench 16 <tbench
server>'.  System running tbench_srv is a different system in my case.

The tbench client will fail randomly, often quietly with "Child failed with
status 1", but sometimes more harshly with a glibc double free error.

If I unload the modules and reload w/o npt:

modprobe -r kvm-amd
modprobe -r kvm
modprobe kvm-amd npt=0

dmesg confirms: kvm: Nested Paging disabled

The tbench test now runs over and over successfully.  The test also runs fine 
on an Intel E5450 (no EPT).

--

Comment By: Avi Kivity (avik)
Date: 2008-07-16 17:19

Message:
Logged In: YES 
user_id=539971
Originator: NO

Strange.  If you add an mlockall() to qemu startup, does the test pass?



[PATCH] posix-timers: Do not modify an already queued timer signal

2008-07-16 Thread Mark McLoughlin
When a timer fires, posix_timer_event() zeroes out its
pre-allocated siginfo structure, initialises it and then
queues up the signal with send_sigqueue().

However, we may have previously queued up this signal, in
which case we only want to increment si_overrun and
re-initialising the siginfo structure is incorrect.

Also, since we are modifying an already queued signal
without the protection of the sighand spinlock, we may also
race with e.g. collect_signal() causing it to fail to find
a signal on the pending list because it happens to look at
the siginfo struct after it was zeroed and before it was
re-initialised.

The race was observed with a modified kvm-userspace when
running a guest under heavy network load. When it occurs,
KVM never sees another SIGALRM signal because although
the signal is queued up the appropriate bit is never set
in the pending mask. Manually sending the process a SIGALRM
kicks it out of this state.

The fix is simple - only modify the pre-allocated sigqueue
once we're sure that it hasn't already been queued.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
Cc: Oleg Nesterov [EMAIL PROTECTED]
Cc: Roland McGrath [EMAIL PROTECTED]

---
 include/linux/sched.h |2 +-
 kernel/posix-timers.c |   20 +++-
 kernel/signal.c   |5 +++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2134917..718f7ec 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1791,7 +1791,7 @@ extern void zap_other_threads(struct task_struct *p);
 extern int kill_proc(pid_t, int, int);
 extern struct sigqueue *sigqueue_alloc(void);
 extern void sigqueue_free(struct sigqueue *);
-extern int send_sigqueue(struct sigqueue *,  struct task_struct *, int group);
+extern int send_sigqueue(struct sigqueue *, siginfo_t *, struct task_struct *, int group);
 extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
 extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long);
 
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index dbd8398..b42c964 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -298,19 +298,21 @@ void do_schedule_next_timer(struct siginfo *info)
 
 int posix_timer_event(struct k_itimer *timr,int si_private)
 {
-	memset(&timr->sigq->info, 0, sizeof(siginfo_t));
-	timr->sigq->info.si_sys_private = si_private;
+	siginfo_t info;
+
+	memset(&info, 0, sizeof(siginfo_t));
+	info.si_sys_private = si_private;
 	/* Send signal to the process that owns this timer.*/
 
-	timr->sigq->info.si_signo = timr->it_sigev_signo;
-	timr->sigq->info.si_errno = 0;
-	timr->sigq->info.si_code = SI_TIMER;
-	timr->sigq->info.si_tid = timr->it_id;
-	timr->sigq->info.si_value = timr->it_sigev_value;
+	info.si_signo = timr->it_sigev_signo;
+	info.si_errno = 0;
+	info.si_code = SI_TIMER;
+	info.si_tid = timr->it_id;
+	info.si_value = timr->it_sigev_value;
 
 	if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
 		struct task_struct *leader;
-		int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
+		int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);
 
 		if (likely(ret >= 0))
 			return ret;
@@ -321,7 +323,7 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
 		timr->it_process = leader;
 	}
 
-	return send_sigqueue(timr->sigq, timr->it_process, 1);
+	return send_sigqueue(timr->sigq, &info, timr->it_process, 1);
 }
 EXPORT_SYMBOL_GPL(posix_timer_event);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 6c0958e..50e0b13 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1292,9 +1292,9 @@ void sigqueue_free(struct sigqueue *q)
 	__sigqueue_free(q);
 }
 
-int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
+int send_sigqueue(struct sigqueue *q, siginfo_t *info, struct task_struct *t, int group)
 {
-	int sig = q->info.si_signo;
+	int sig = info->si_signo;
 	struct sigpending *pending;
 	unsigned long flags;
 	int ret;
@@ -1322,6 +1322,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 
 	signalfd_notify(t, sig);
 	pending = group ? &t->signal->shared_pending : &t->pending;
+	copy_siginfo(&q->info, info);
 	list_add_tail(&q->list, &pending->list);
 	sigaddset(&pending->signal, sig);
 	complete_signal(sig, t, group);
-- 
1.5.5.1
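
The idea behind the fix can be illustrated outside the kernel. The following is only a sketch under simplifying assumptions: `toy_info`, `toy_sigqueue`, `toy_send_sigqueue` and `toy_timer_event` are hypothetical stand-ins rather than the kernel's types or functions, and all locking is omitted. It shows the payload being built on the stack and copied into the pre-allocated entry only when that entry is not already queued:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-ins; these are NOT the kernel's siginfo_t or struct sigqueue. */
struct toy_info {
	int signo;
	int overrun;
};

struct toy_sigqueue {
	bool queued;          /* true while the entry sits on a pending list */
	struct toy_info info; /* payload a consumer may be reading */
};

/*
 * Queue q with the payload in *info. If q is already queued, only bump
 * the overrun count and leave the rest of the payload untouched.
 * Returns 0 when newly queued, 1 when it was already queued.
 */
static int toy_send_sigqueue(struct toy_sigqueue *q, const struct toy_info *info)
{
	assert(q != NULL && info != NULL);
	if (q->queued) {
		q->info.overrun++;
		return 1;
	}
	memcpy(&q->info, info, sizeof(q->info)); /* copy only at this point */
	q->queued = true;
	return 0;
}

/* Timer expiry: build the payload on the stack, never in q->info directly. */
static int toy_timer_event(struct toy_sigqueue *q, int signo)
{
	struct toy_info info;

	memset(&info, 0, sizeof(info));
	info.signo = signo;
	return toy_send_sigqueue(q, &info);
}
```

Calling `toy_timer_event()` twice on the same entry queues it once and then only increments `overrun`, so a reader never observes a zeroed payload in between.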



Re: [PATCH 4/8] KVM: PCIPT: fix interrupt handling

2008-07-16 Thread Avi Kivity

Ben-Ami Yassour wrote:

This patch fixes a few problems with the interrupt handling for
passthrough devices.

  


Well, fold it into the patch it fixes.  There is no point in sending a 
buggy patch and a fix in the same patchset.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-16 Thread Avi Kivity

Ben-Ami Yassour wrote:

From: Han, Weidong [EMAIL PROTECTED]

This patch adds support for handling PCI devices that are assigned to
the guest (PCI passthrough).
+
+/*
+ * Used to find a registered host PCI device (a passthrough device)
+ * during ioctls, interrupts or EOI
+ */
+struct kvm_pci_pt_dev_list *
+kvm_find_pci_pt_dev(struct list_head *head,
+		struct kvm_pci_pt_info *pt_pci_info, int irq, int source)
+{
+	struct list_head *ptr;
+	struct kvm_pci_pt_dev_list *match;
+
+	list_for_each(ptr, head) {
+		match = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+		switch (source) {
+		case KVM_PT_SOURCE_IRQ:
+			/*
+			 * Used to find a registered host device
+			 * during interrupt context on host
+			 */
+			if (match->pt_dev.host.irq == irq)
+				return match;
+			break;
+		case KVM_PT_SOURCE_IRQ_ACK:
+			/*
+			 * Used to find a registered host device when
+			 * the guest acks an interrupt
+			 */
+			if (match->pt_dev.guest.irq == irq)
+				return match;
+			break;
+		case KVM_PT_SOURCE_UPDATE:
+			if ((match->pt_dev.host.busnr == pt_pci_info->busnr) &&
+			    (match->pt_dev.host.devfn == pt_pci_info->devfn))
+				return match;
+			break;
+		}
+	}
+	return NULL;
+}
  


This monster is best split into three functions each handling a separate 
case, without the 'source' argument.
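
A rough sketch of what such a split could look like, written as stand-alone C rather than against the patch. The `toy_pt_dev` structure, its singly linked list and all function names below are hypothetical simplifications; the real code would keep using `struct list_head` and the patch's own types:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model of one assigned-device list entry. */
struct toy_pt_dev {
	int host_irq;
	int guest_irq;
	int host_busnr;
	int host_devfn;
	struct toy_pt_dev *next;
};

/* Push an entry on the front of the list. */
static void toy_pt_list_add(struct toy_pt_dev **head, struct toy_pt_dev *dev)
{
	assert(head != NULL && dev != NULL);
	dev->next = *head;
	*head = dev;
}

/* Lookup used from the host interrupt handler. */
static struct toy_pt_dev *find_by_host_irq(struct toy_pt_dev *head, int irq)
{
	for (; head; head = head->next)
		if (head->host_irq == irq)
			return head;
	return NULL;
}

/* Lookup used when the guest acks an interrupt. */
static struct toy_pt_dev *find_by_guest_irq(struct toy_pt_dev *head, int irq)
{
	for (; head; head = head->next)
		if (head->guest_irq == irq)
			return head;
	return NULL;
}

/* Lookup used when an assignment is updated by bus/devfn. */
static struct toy_pt_dev *find_by_bus_devfn(struct toy_pt_dev *head,
					    int busnr, int devfn)
{
	for (; head; head = head->next)
		if (head->host_busnr == busnr && head->host_devfn == devfn)
			return head;
	return NULL;
}
```

Each caller then states which key it searches on in the function name, and none of them has to pass arguments that its case ignores.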



+static void kvm_pci_pt_work_fn(struct work_struct *work)
+{
+	struct kvm_pci_pt_dev_list *match;
+	struct kvm_pci_pt_work *int_work;
+	int source;
+	unsigned long flags;
+	int guest_irq;
+	int host_irq;
+
+	int_work = container_of(work, struct kvm_pci_pt_work, work);
+
+	source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ;
+
+	/* This is taken to safely inject irq inside the guest. When
+	 * the interrupt injection (or the ioapic code) uses a
+	 * finer-grained lock, update this
+	 */
+	mutex_lock(&int_work->kvm->lock);
+	read_lock_irqsave(&kvm_pci_pt_lock, flags);
+	match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL,
+				    int_work->irq, source);
+	if (!match) {
+		printk(KERN_ERR "%s: no matching device assigned to guest "
+		       "found for irq %d, source = %d!\n",
+		       __func__, int_work->irq, int_work->source);
+		read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
+		goto out;
+	}
+	guest_irq = match->pt_dev.guest.irq;
+	host_irq = match->pt_dev.host.irq;
+	read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
+
+	if (source == KVM_PT_SOURCE_IRQ)
+		kvm_set_irq(int_work->kvm, guest_irq, 1);
+	else {
+		kvm_set_irq(int_work->kvm, int_work->irq, 0);
+		enable_irq(host_irq);
+	}
+out:
+	mutex_unlock(&int_work->kvm->lock);
+	kvm_put_kvm(int_work->kvm);
+}
  
+

+/* FIXME: Implement the OR logic needed to make shared interrupts on
+ * this line behave properly
+ */
  


Isn't this a showstopper?  There is no easy way for a user to avoid 
sharing, especially as we have only three pci irqs at present.



+static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id)
+{
+	struct kvm *kvm = (struct kvm *) dev_id;
+	struct kvm_pci_pt_dev_list *pci_pt_dev;
+
+	if (!test_bit(irq, pt_irq_handled))
+		return IRQ_NONE;
+
+	read_lock(&kvm_pci_pt_lock);
+	pci_pt_dev = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head, NULL,
+					 irq, KVM_PT_SOURCE_IRQ);
+	if (!pci_pt_dev) {
+		read_unlock(&kvm_pci_pt_lock);
+		return IRQ_NONE;
+	}
  



I see we don't reuse the result of the search.  I guess we can't, since 
the list may change between the interrupt and the execution of the work 
function.



+
+	pci_pt_dev->pt_dev.int_work.irq = irq;
+	pci_pt_dev->pt_dev.int_work.kvm = kvm;
+	pci_pt_dev->pt_dev.int_work.source = 0;
+
  


For a bool, use false, not 0.  But 'source' isn't really a good name for 
a boolean.  Perhaps 'is_ack'?



+
+/* Ack the irq line for a passthrough device */
+static void kvm_pci_pt_ack_irq(void *opaque, int irq)
+{
+	struct kvm *kvm = opaque;
+	struct kvm_pci_pt_dev_list *pci_pt_dev;
+	unsigned long flags;
+
+	if (irq == -1)
+		return;
+
+	read_lock_irqsave(&kvm_pci_pt_lock, flags);
+	pci_pt_dev = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head, NULL, irq,
+   

[ kvm-Bugs-2019053 ] tbench fails on guest when AMD NPT enabled

2008-07-16 Thread SourceForge.net
Bugs item #2019053, was opened at 2008-07-15 18:10
Message generated for change (Comment added) made by alex_williamson
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2019053&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: amd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Alex Williamson (alex_williamson)
Assigned to: Nobody/Anonymous (nobody)
Summary: tbench fails on guest when AMD NPT enabled

--

Comment By: Alex Williamson (alex_williamson)
Date: 2008-07-16 09:18

Message:
Logged In: YES 
user_id=333914
Originator: YES

No, I added mlockall(MCL_CURRENT | MCL_FUTURE) to qemu/vl.c:main() and it
makes no difference.  I'm only starting a 1G guest on an otherwise idle 32G
host, so host memory pressure is pretty light.
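
For reference, the experiment described here amounts to a call like the one below early in the program. This is a minimal stand-alone sketch, not the actual change made to qemu/vl.c; the wrapper name `lock_all_memory` is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/*
 * Try to pin all current and future pages of this process in RAM.
 * Returns 0 on success or a negative errno value, for example -ENOMEM
 * or -EPERM when RLIMIT_MEMLOCK or privileges do not allow it.
 */
static int lock_all_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
		assert(errno > 0);
		return -errno;
	}
	return 0;
}
```

A caller would normally run this before allocating guest memory and report a non-zero result, since an unprivileged process is often limited by RLIMIT_MEMLOCK.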

--

Comment By: Avi Kivity (avik)
Date: 2008-07-16 08:19

Message:
Logged In: YES 
user_id=539971
Originator: NO

Strange.  If you add an mlockall() to qemu startup, does the test pass?



Re: PCI passthrough with VT-d - native performance

2008-07-16 Thread Ben-Ami Yassour
On Wed, 2008-07-16 at 17:36 +0300, Avi Kivity wrote:
> Ben-Ami Yassour wrote:
> > In last few tests that we made with PCI-passthrough and VT-d using
> > iperf, we were able to get the same throughput as on native OS with a 1G
> > NIC
>
> Excellent!
>
> > (with higher CPU utilization).
>
> How much higher?

Here are some numbers for running iperf -l 1M:

e1000 NIC (behind a PCI bridge)
                            Bandwidth (Mbit/sec)   CPU utilization
Native OS                            771                  18%
Native OS with VT-d                  760                  18%
KVM VT-d                             390                  95%
KVM VT-d with direct mmio            770                  84%
KVM emulated                          57                 100%

Comment: it's not clear to me why native Linux cannot get closer to 1G for
this NIC (I verified that it's not an external network issue). But clearly
we shouldn't hope to get more than the host does with a KVM guest
(especially if the guest and host are the same OS, as in this case...).

e1000e NIC (onboard)
                            Bandwidth (Mbit/sec)   CPU utilization
Native OS                            915                  18%
Native OS with VT-d                  915                  18%
KVM VT-d with direct mmio            914                  98%

Clearly we need to try and improve the CPU utilization, but I think that this
is good enough for the first phase.

 
> > The following patches are the PCI-passthrough patches that Amit sent
> > (re-based on the last kvm tree), followed by a few improvements and the
> > VT-d extension.
> > I am also sending the userspace patches: the patch that Amit sent for
> > PCI passthrough and the direct-mmio extension for userspace (note that
> > without the direct mmio extension we get less than half the throughput).
>
> Is mmio passthrough the reason for the performance improvement?  If not,
> what was the problem?
 
Direct mmio was definitely a major improvement; without it we got half the
throughput, as you can see above.
In addition, patch 4/8 improves the interrupt handling and removes unnecessary
locks, and I assume that it also fixed performance issues (I did not
investigate exactly in what way).

Regards,
Ben




Re: PCI passthrough with VT-d - native performance

2008-07-16 Thread Avi Kivity

Ben-Ami Yassour wrote:
> > > (with higher CPU utilization).
> >
> > How much higher?
>
> Here are some numbers for running iperf -l 1M:
>
> e1000 NIC (behind a PCI bridge)
>                             Bandwidth (Mbit/sec)   CPU utilization
> Native OS                            771                  18%
> Native OS with VT-d                  760                  18%
> KVM VT-d                             390                  95%
> KVM VT-d with direct mmio            770                  84%
> KVM emulated                          57                 100%
>
> Comment: it's not clear to me why native Linux cannot get closer to 1G for
> this NIC (I verified that it's not an external network issue). But clearly
> we shouldn't hope to get more than the host does with a KVM guest
> (especially if the guest and host are the same OS, as in this case...).
>
> e1000e NIC (onboard)
>                             Bandwidth (Mbit/sec)   CPU utilization
> Native OS                            915                  18%
> Native OS with VT-d                  915                  18%
> KVM VT-d with direct mmio            914                  98%
>
> Clearly we need to try and improve the CPU utilization, but I think that this
> is good enough for the first phase.

Agree;  part of the higher utilization is of course not the fault of the
device assignment code, rather it is ordinary virtualization overhead.
We'll have to tune this.


--
error compiling committee.c: too many arguments to function



Re: PCI passthrough with VT-d - native performance

2008-07-16 Thread Anthony Liguori

Ben-Ami Yassour wrote:
> On Wed, 2008-07-16 at 17:36 +0300, Avi Kivity wrote:
> > Ben-Ami Yassour wrote:
> > > In last few tests that we made with PCI-passthrough and VT-d using
> > > iperf, we were able to get the same throughput as on native OS with a 1G
> > > NIC
> >
> > Excellent!
> >
> > > (with higher CPU utilization).
> >
> > How much higher?
>
> Here are some numbers for running iperf -l 1M:
>
> e1000 NIC (behind a PCI bridge)
>                             Bandwidth (Mbit/sec)   CPU utilization
> Native OS                            771                  18%
> Native OS with VT-d                  760                  18%
> KVM VT-d                             390                  95%
> KVM VT-d with direct mmio            770                  84%
> KVM emulated                          57                 100%

What about virtio?  Also, which emulated NIC is this?

That CPU utilization is extremely high and somewhat illogical if native
w/vt-d has almost no CPU impact.  Have you run oprofile yet or have any
insight into where CPU is being burnt?

What does kvm_stat look like?  I wonder if there are a large number of
PIO exits.  What does the interrupt count look like on native vs. KVM
with VT-d?

Regards,

Anthony Liguori

> Comment: it's not clear to me why native Linux cannot get closer to 1G for
> this NIC (I verified that it's not an external network issue). But clearly
> we shouldn't hope to get more than the host does with a KVM guest
> (especially if the guest and host are the same OS, as in this case...).
>
> e1000e NIC (onboard)
>                             Bandwidth (Mbit/sec)   CPU utilization
> Native OS                            915                  18%
> Native OS with VT-d                  915                  18%
> KVM VT-d with direct mmio            914                  98%
>
> Clearly we need to try and improve the CPU utilization, but I think that this
> is good enough for the first phase.
>
> > > The following patches are the PCI-passthrough patches that Amit sent
> > > (re-based on the last kvm tree), followed by a few improvements and the
> > > VT-d extension.
> > > I am also sending the userspace patches: the patch that Amit sent for
> > > PCI passthrough and the direct-mmio extension for userspace (note that
> > > without the direct mmio extension we get less than half the throughput).
> >
> > Is mmio passthrough the reason for the performance improvement?  If not,
> > what was the problem?
>
> Direct mmio was definitely a major improvement; without it we got half the
> throughput, as you can see above.
> In addition, patch 4/8 improves the interrupt handling and removes
> unnecessary locks, and I assume that it also fixed performance issues (I did
> not investigate exactly in what way).
>
> Regards,
> Ben




[PATCH v2/RFC] libkvm-s390

2008-07-16 Thread Christian Borntraeger
This is an updated patch for libkvm to build and work on s390.

It should address all comments from Avi as well as some aspects I have found:
o implement kvm_show_regs
o use s390 instead of s390x in file names. It is commonly used for both 31 and
  64bit systems
o don't define __s390__ and __s390x__ in config.mak. They are predefined by gcc.
o add some callbacks (done by Carsten, but not yet posted)

From: Carsten Otte [EMAIL PROTECTED]
From: Christian Borntraeger [EMAIL PROTECTED]

Signed-off-by: Christian Borntraeger [EMAIL PROTECTED]

---
 Makefile|2 
 libkvm/config-s390.mak  |3 +
 libkvm/config-s390x.mak |3 +
 libkvm/kvm-common.h |7 ++
 libkvm/kvm-s390.h   |   31 ++
 libkvm/libkvm-s390.c|  137 
 libkvm/libkvm.c |   25 
 libkvm/libkvm.h |   17 +
 8 files changed, 224 insertions(+), 1 deletion(-)

Index: kvm-userspace/Makefile
===
--- kvm-userspace.orig/Makefile
+++ kvm-userspace/Makefile
@@ -5,7 +5,7 @@ DESTDIR=
 
 rpmrelease = devel
 
-sane-arch = $(subst i386,x86,$(subst x86_64,x86,$(ARCH)))
+sane-arch = $(subst i386,x86,$(subst x86_64,x86,$(subst s390x,s390,$(ARCH))))
 
 .PHONY: kernel user libkvm qemu bios vgabios extboot clean libfdt
 
Index: kvm-userspace/libkvm/config-s390.mak
===
--- /dev/null
+++ kvm-userspace/libkvm/config-s390.mak
@@ -0,0 +1,3 @@
+# s390 31bit mode
+LIBDIR := /lib
+libkvm-$(ARCH)-objs := libkvm-s390.o
Index: kvm-userspace/libkvm/config-s390x.mak
===
--- /dev/null
+++ kvm-userspace/libkvm/config-s390x.mak
@@ -0,0 +1,3 @@
+# s390 64 bit mode (arch=s390x)
+LIBDIR := /lib64
+libkvm-$(ARCH)-objs := libkvm-s390.o
Index: kvm-userspace/libkvm/kvm-common.h
===
--- kvm-userspace.orig/libkvm/kvm-common.h
+++ kvm-userspace/libkvm/kvm-common.h
@@ -18,8 +18,15 @@
 
 /* FIXME: share this number with kvm */
 /* FIXME: or dynamically alloc/realloc regions */
+#ifndef __s390__
 #define KVM_MAX_NUM_MEM_REGIONS 8u
 #define MAX_VCPUS 16
+#else
+#define KVM_MAX_NUM_MEM_REGIONS 1u
+#define MAX_VCPUS 64
+#define LIBKVM_S390_ORIGIN (0UL)
+#endif
+
 
 /* kvm abi verison variable */
 extern int kvm_abi;
Index: kvm-userspace/libkvm/kvm-s390.h
===
--- /dev/null
+++ kvm-userspace/libkvm/kvm-s390.h
@@ -0,0 +1,31 @@
+/*
+ * This header is for functions & variables that will ONLY be
+ * used inside libkvm for s390.
+ * THESE ARE NOT EXPOSED TO THE USER AND ARE ONLY FOR USE
+ * WITHIN LIBKVM.
+ *
+ * Copyright (C) 2006 Qumranet, Inc.
+ *
+ * Authors:
+ * Avi Kivity   [EMAIL PROTECTED]
+ * Yaniv Kamay  [EMAIL PROTECTED]
+ *
+ * Copyright 2008 IBM Corporation.
+ * Authors:
+ * Carsten Otte [EMAIL PROTECTED]
+ *
+ * This work is licensed under the GNU LGPL license, version 2.
+ */
+
+#ifndef KVM_S390_H
+#define KVM_S390_H
+
+#include <asm/ptrace.h>
+#include "kvm-common.h"
+
+#define PAGE_SIZE 4096ul
+#define PAGE_MASK (~(PAGE_SIZE - 1))
+
+#define smp_wmb()   asm volatile("" ::: "memory")
+
+#endif
Index: kvm-userspace/libkvm/libkvm-s390.c
===
--- /dev/null
+++ kvm-userspace/libkvm/libkvm-s390.c
@@ -0,0 +1,137 @@
+/*
+ * This file contains the s390 specific implementation for the
+ * architecture dependent functions defined in kvm-common.h and
+ * libkvm.h
+ *
+ * Copyright (C) 2006 Qumranet
+ * Copyright IBM Corp. 2008
+ *
+ * Authors:
+ * Carsten Otte [EMAIL PROTECTED]
+ *  Christian Borntraeger [EMAIL PROTECTED]
+ *
+ * This work is licensed under the GNU LGPL license, version 2.
+ */
+
+#include <sys/ioctl.h>
+#include <asm/ptrace.h>
+
+#include "libkvm.h"
+#include "kvm-common.h"
+#include <errno.h>
+#include <stdio.h>
+#include <inttypes.h>
+
+int handle_dcr(struct kvm_run *run,  kvm_context_t kvm, int vcpu)
+{
+	fprintf(stderr, "%s: Operation not supported\n", __FUNCTION__);
+	return -1;
+}
+
+int kvm_alloc_kernel_memory(kvm_context_t kvm, unsigned long memory,
+			    void **vm_mem)
+{
+	fprintf(stderr, "%s: Operation not supported\n", __FUNCTION__);
+	return -1;
+}
+
+void *kvm_create_kernel_phys_mem(kvm_context_t kvm, unsigned long phys_start,
+				 unsigned long len, int log, int writable)
+{
+	fprintf(stderr, "%s: Operation not supported\n", __FUNCTION__);
+	return NULL;
+}
+
+void kvm_show_code(kvm_context_t kvm, int vcpu)
+{
+	fprintf(stderr, "%s: Operation not supported\n", __FUNCTION__);
+}
+
+void kvm_show_regs(kvm_context_t kvm, int vcpu)
+{
+	struct kvm_regs regs;
+	struct kvm_sregs sregs;
+	int i;
+
+	if (kvm_get_regs(kvm, vcpu, &regs))
+		return;
+
+   if (kvm_get_sregs(kvm, 

Re: [PATCH] posix-timers: Do not modify an already queued timer signal

2008-07-16 Thread Mark McLoughlin
On Wed, 2008-07-16 at 15:50 +0100, Mark McLoughlin wrote:

> The race was observed with a modified kvm-userspace when
> running a guest under heavy network load. When it occurs,
> KVM never sees another SIGALRM signal because although
> the signal is queued up the appropriate bit is never set
> in the pending mask. Manually sending the process a SIGALRM
> kicks it out of this state.

I should clarify what I mean by modified kvm-userspace. Basically, I
was trying out a suggestion of Marcelo's to drop the global qemu mutex
when reading GSO packets from a tap device i.e.

@@ -4299,7 +4299,9 @@ static void tap_send(void *opaque)
     sbuf.buf = s->buf;
     s->size = getmsg(s->fd, NULL, &sbuf, &f) >=0 ? sbuf.len : -1;
 #else
+    kvm_mutex_unlock();
     s->size = read(s->fd, s->buf, sizeof(s->buf));
+    kvm_mutex_lock();

It seems to work fine, but more on that later ... important thing is
that if people see a hard-to-reproduce condition where things seem to
slow down or lock up, try manually doing a kill -ALRM $(qemu) and if
that fixes it, then you're probably seeing this bug.

Cheers,
Mark.
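
A minimal userspace sketch of the pattern in the hunk above: release a global lock around a read() that may block, then take it again before touching shared state. The names global_lock and read_unlocked are hypothetical stand-ins, not the kvm-userspace API, and a pthread mutex is assumed in place of the qemu mutex.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for the global qemu mutex. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Read from fd with the global lock released for the duration of the
 * (possibly blocking) read().  The caller must hold global_lock on
 * entry and holds it again on return.
 */
static ssize_t read_unlocked(int fd, void *buf, size_t len)
{
    ssize_t n;

    pthread_mutex_unlock(&global_lock);
    n = read(fd, buf, len);
    pthread_mutex_lock(&global_lock);
    return n;
}
```

Other threads can run while the read blocks, so any state checked before the call may have changed by the time the lock is held again.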

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] KVM: Handle device assignment to guests

2008-07-16 Thread Ben-Ami Yassour
From: Amit Shah [EMAIL PROTECTED]

This patch adds support for handling PCI devices that are assigned to
the guest (PCI passthrough).

The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled. If a device is already assigned, or
the device driver for it is still loaded on the host, the device assignment
is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest.
The VT-d extension is required to enable the device to perform DMA.
Another alternative is PVDMA.

Signed-off-by: Amit Shah [EMAIL PROTECTED]
Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
Signed-off-by: Han, Weidong [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c |  267 
 include/asm-x86/kvm_host.h |   37 ++
 include/asm-x86/kvm_para.h |   16 +++-
 include/linux/kvm.h|3 +
 virt/kvm/ioapic.c  |   12 ++-
 5 files changed, 332 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3167006..65b307d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4,10 +4,12 @@
  * derived from drivers/kvm/kvm_main.c
  *
  * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright (C) 2008 Qumranet, Inc.
  *
  * Authors:
  *   Avi Kivity   [EMAIL PROTECTED]
  *   Yaniv Kamay  [EMAIL PROTECTED]
+ *   Amit Shah[EMAIL PROTECTED]
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -23,8 +25,10 @@
 #include "x86.h"
 
 #include <linux/clocksource.h>
+#include <linux/interrupt.h>
 #include <linux/kvm.h>
 #include <linux/fs.h>
+#include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
 #include <linux/mman.h>
@@ -98,6 +102,256 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
+DEFINE_RWLOCK(kvm_pci_pt_lock);
+
+/*
+ * Used to find a registered host PCI device (a passthrough device)
+ * during ioctls, interrupts or EOI
+ */
+struct kvm_pci_pt_dev_list *
+kvm_find_pci_pt_dev(struct list_head *head,
+   struct kvm_pci_pt_info *pt_pci_info, int irq, int source)
+{
+   struct list_head *ptr;
+   struct kvm_pci_pt_dev_list *match;
+
+   list_for_each(ptr, head) {
+   match = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+   switch (source) {
+   case KVM_PT_SOURCE_IRQ:
+   /*
+* Used to find a registered host device
+* during interrupt context on host
+*/
+   if (match->pt_dev.host.irq == irq)
+   return match;
+   break;
+   case KVM_PT_SOURCE_IRQ_ACK:
+   /*
+* Used to find a registered host device when
+* the guest acks an interrupt
+*/
+   if (match->pt_dev.guest.irq == irq)
+   return match;
+   break;
+   case KVM_PT_SOURCE_UPDATE:
+   if ((match->pt_dev.host.busnr == pt_pci_info->busnr) &&
+   (match->pt_dev.host.devfn == pt_pci_info->devfn))
+   return match;
+   break;
+   }
+   }
+   return NULL;
+}
+
+static void kvm_pci_pt_int_work_fn(struct work_struct *work)
+{
+   struct kvm_pci_pt_work *int_work;
+
+   int_work = container_of(work, struct kvm_pci_pt_work, work);
+
+   /* This is taken to safely inject irq inside the guest. When
+* the interrupt injection (or the ioapic code) uses a
+* finer-grained lock, update this
+*/
+   mutex_lock(&int_work->pt_dev->kvm->lock);
+   kvm_set_irq(int_work->pt_dev->kvm, int_work->pt_dev->guest.irq, 1);
+   mutex_unlock(&int_work->pt_dev->kvm->lock);
+   kvm_put_kvm(int_work->pt_dev->kvm);
+}
+
+static void kvm_pci_pt_ack_work_fn(struct work_struct *work)
+{
+   struct kvm_pci_pt_work *ack_work;
+
+   ack_work = container_of(work, struct kvm_pci_pt_work, work);
+
+   /* This is taken to safely inject irq inside the guest. When
+* the interrupt injection (or the ioapic code) uses a
+* finer-grained lock, update this
+*/
+   mutex_lock(&ack_work->pt_dev->kvm->lock);
+   kvm_set_irq(ack_work->pt_dev->kvm, ack_work->pt_dev->guest.irq, 0);
+   enable_irq(ack_work->pt_dev->host.irq);
+   mutex_unlock(&ack_work->pt_dev->kvm->lock);
+   kvm_put_kvm(ack_work->pt_dev->kvm);
+}
+
+/* FIXME: Implement the OR logic needed to make shared interrupts on
+ * this line behave properly
+ */
+static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id)
+{
+   struct kvm_pci_passthrough_dev_kernel *pt_dev =
+   (struct kvm_pci_passthrough_dev_kernel *) 

Re: [PATCH 3/6] KVM: Handle device assignment to guests

2008-07-16 Thread Ben-Ami Yassour
Please ignore this repeated patch




Re: PCI passthrough with VT-d - native performance

2008-07-16 Thread Avi Kivity

Ben-Ami Yassour wrote:
>> That CPU utilization is extremely high and somewhat illogical if native
>> w/vt-d has almost no CPU impact.  Have you run oprofile yet or have any
>> insight into where CPU is being burnt?
>>
>> What does kvm_stat look like?  I wonder if there are a large number of
>> PIO exits.  What does the interrupt count look like on native vs. KVM
>> with VT-d?
>>
>> Regards,
>>
>> Anthony Liguori
>
> These are all good points and questions, I agree that we need to take a deeper
> look into the performance issues, but I think that we need to merge with
> the main KVM tree first.

It would be good to get the host interrupt rate, to confirm that the
host isn't flooded with interrupts.  A deeper analysis can wait.



--
error compiling committee.c: too many arguments to function



kvm_queue_exception

2008-07-16 Thread Thomas Mueller
hi there

i was using kvm-70 and kernel 2.6.25 on debian lenny for two weeks
without problems. processor is an AMD Opteron 2350. now, out of
nowhere, there is a problem with 1 of 4 guests (1x debian etch amd64,
1x debian etch i386, 1x ubuntu 7.10 amd64, 1x ubuntu 7.10 i386)

On the host i see the following kernel  messages again and again... 
Rebooted the host and started all guests. Same 
problem again with only one guest (debian etch amd64 with kernel 2.6.25). 

Jul 16 19:17:03 cubalibre kernel: [ 9275.836932] ------------[ cut here ]------------
Jul 16 19:17:03 cubalibre kernel: [ 9275.836962] WARNING: at 
/usr/src/modules/kvm/x86.c:185 kvm_queue_exception_e+0x26/0x47 [kvm]()
Jul 16 19:17:03 cubalibre kernel: [ 9275.837022] Modules linked in: tun sbs ac 
battery wmi container sbshc video output nfs lockd nfs_acl sunrpc bridge ipv6 
mptctl ipmi_poweroff ipmi_si ipmi_devintf ipmi_msghandler kvm_amd kvm bonding 
loop psmouse serio_raw pcspkr button i2c_piix4 shpchp pci_hotplug i2c_core 
dcdbas evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod ide_cd_mod cdrom 
sd_mod sata_svw usbhid hid ff_memless ata_generic serverworks libata dock 
mptsas mptscsih mptbase scsi_transport_sas scsi_mod tg3 ehci_hcd 
ide_pci_generic ide_core ohci_hcd thermal processor fan
Jul 16 19:17:03 cubalibre kernel: [ 9275.837458] Pid: 5135, comm: kvm Not 
tainted 2.6.25-2-amd64 #1
Jul 16 19:17:03 cubalibre kernel: [ 9275.837491]
Jul 16 19:17:03 cubalibre kernel: [ 9275.837491] Call Trace:
Jul 16 19:17:03 cubalibre kernel: [ 9275.837544]  [80234ce5] 
warn_on_slowpath+0x51/0x63
Jul 16 19:17:03 cubalibre kernel: [ 9275.837580]  [8022a56c] 
hrtick_start_fair+0xfb/0x143
Jul 16 19:17:03 cubalibre kernel: [ 9275.837618]  [80230217] 
hrtick_set+0x88/0xf7
Jul 16 19:17:03 cubalibre kernel: [ 9275.837652]  [8041f709] 
error_exit+0x0/0x60
Jul 16 19:17:03 cubalibre kernel: [ 9275.837697]  [882118f0] 
:kvm:gfn_to_hva+0x1c/0x40
Jul 16 19:17:03 cubalibre kernel: [ 9275.837742]  [88211a7b] 
:kvm:kvm_read_guest_page+0x34/0x46
Jul 16 19:17:03 cubalibre kernel: [ 9275.837792]  [88214cd1] 
:kvm:kvm_queue_exception_e+0x26/0x47
Jul 16 19:17:03 cubalibre kernel: [ 9275.837839]  [88237ddc] 
:kvm_amd:handle_exit+0x9a/0x1ab
Jul 16 19:17:03 cubalibre kernel: [ 9275.837886]  [8821705b] 
:kvm:kvm_arch_vcpu_ioctl_run+0x460/0x612
Jul 16 19:17:03 cubalibre kernel: [ 9275.837940]  [882129fb] 
:kvm:kvm_vcpu_ioctl+0xf3/0x3a9
Jul 16 19:17:03 cubalibre kernel: [ 9275.837976]  [8027c8aa] 
zone_statistics+0x3f/0x93
Jul 16 19:17:03 cubalibre kernel: [ 9275.838011]  [8027659e] 
get_page_from_freelist+0x4a6/0x638
Jul 16 19:17:03 cubalibre kernel: [ 9275.838055]  [80276e13] 
__alloc_pages+0x71/0x312
Jul 16 19:17:03 cubalibre kernel: [ 9275.838092]  [80281318] 
handle_mm_fault+0x38b/0x893
Jul 16 19:17:03 cubalibre kernel: [ 9275.838134]  [8023e5c4] 
recalc_sigpending+0xe/0x38
Jul 16 19:17:03 cubalibre kernel: [ 9275.838167]  [8023f80d] 
dequeue_signal+0x8d/0x113
Jul 16 19:17:03 cubalibre kernel: [ 9275.838206]  [802a5b05] 
vfs_ioctl+0x21/0x6b
Jul 16 19:17:03 cubalibre kernel: [ 9275.838239]  [802a5d97] 
do_vfs_ioctl+0x248/0x261
Jul 16 19:17:03 cubalibre kernel: [ 9275.838274]  [8029ad84] 
vfs_read+0x11e/0x152
Jul 16 19:17:03 cubalibre kernel: [ 9275.838308]  [802a5e01] 
sys_ioctl+0x51/0x70
Jul 16 19:17:03 cubalibre kernel: [ 9275.838347]  [8020bd9a] 
system_call_after_swapgs+0x8a/0x8f
Jul 16 19:17:03 cubalibre kernel: [ 9275.838385]
Jul 16 19:17:03 cubalibre kernel: [ 9275.838412] ---[ end trace 
18dbdafc95bffe16 ]---
Jul 16 19:17:03 cubalibre kernel: [ 9275.838456] ------------[ cut here ]------------

kvm_stat output:

efer_reload                  0       0
exits                 26806018    2361
fpu_reload            17349000    1205
halt_exits             2938557     370
halt_wakeup   33961629
host_state_reload     17697095    1245
hypercalls                   0       0
insn_emulation         7519074     753
insn_emulation_fail          0       0
invlpg                       0       0
io_exits              13196230     591
irq_exits              1593408     279
irq_window                   0       0
largepages                   0       0
mmio_exits             1445645     224
mmu_cache_miss             702       0
mmu_flooded                  0       0
mmu_pde_zapped               0       0
mmu_pte_updated              0       0
mmu_pte_write            23000       0
mmu_recycled                 0       0
mmu_shadow_zapped            0       0
nmi_window                   0       0
pf_fixed                     0       0
pf_guest                     0       0
remote_tlb_flush             3       0
request_irq                  0       0
signal_exits                 4       0

Re: kvm causing memory corruption? now 2.6.26

2008-07-16 Thread Dave Hansen
On a suggestion of Anthony's, I tried a defconfig kernel.

It is now bombing out on an assertion in the lapic code:

http://sr71.net/~dave/linux/2.6.26-oops1.txt



-- Dave



[patch 3/3] KVM: VMX: handle segment limit granularity special case in software

2008-07-16 Thread Marcelo Tosatti
As the comment in the diff mentions, VMX does not accept any bit in
the range 11:0 of the ES, CS, FS, GS, SS segment register limit fields
to be zero with the granularity bit set to one.

So clear granularity and adjust the limit accordingly.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -1665,6 +1665,22 @@ static void vmx_set_segment(struct kvm_v
return;
}
 vmcs_writel(sf->base, var->base);
+
+   /*
+* section 22.3.1.2:
+* - If any bit in the limit field in the range 11:0 is 0, G must be 0.
+* - If any bit in the limit field in the range 31:20 is 1, G must be 1.
+*/
+   if (!vcpu->arch.rmode.active && !var->unusable &&
+    seg != VCPU_SREG_TR && seg != VCPU_SREG_LDTR) {
+#define SEG_MASK ((1 << 12)-1)
+   if (var->g && (var->limit & SEG_MASK) != SEG_MASK) {
+   var->g = 0;
+   var-limit = 12;
+   var->limit |= SEG_MASK;
+   }
+   }
+
 vmcs_write32(sf->limit, var->limit);
 vmcs_write16(sf->selector, var->selector);
 if (vcpu->arch.rmode.active && var->s) {

-- 
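
The two rules quoted in the patch comment can be read as a predicate on a (limit, G) pair. The sketch below is only an illustration of those two rules as stated; the function name seg_limit_g_consistent is hypothetical and this is not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_MASK ((1u << 12) - 1)   /* bits 11:0 of the limit field */

/*
 * Returns true when a (limit, G) pair satisfies both rules quoted in
 * the patch comment:
 *  - if any bit in limit[11:0] is 0, G must be 0
 *  - if any bit in limit[31:20] is 1, G must be 1
 */
static bool seg_limit_g_consistent(uint32_t limit, bool g)
{
    bool low_has_zero = (limit & SEG_MASK) != SEG_MASK;
    bool high_has_one = (limit >> 20) != 0;

    if (low_has_zero && g)
        return false;
    if (high_has_one && !g)
        return false;
    return true;
}
```

A limit such as 0x12345000 fails the predicate for either value of G, so a case like that can only be handled by changing the limit itself, not just the G bit.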



KVM overflows the stack

2008-07-16 Thread Dave Hansen
On Wed, 2008-07-16 at 14:44 -0700, Dave Hansen wrote:
> On a suggestion of Anthony's, I tried a defconfig kernel.
>
> It is now bombing out on an assertion in the lapic code:
>
>   http://sr71.net/~dave/linux/2.6.26-oops1.txt

I think I found it!!!

$ (objdump -d kvm.ko ; objdump -d kvm-intel.ko ) | egrep 'sub.*0x...,.*esp|:' | egrep sub -B1
1a90 <kvm_vcpu_ioctl>:
    1a9a:   81 ec 60 06 00 00       sub    $0x660,%esp
--
4e90 <kvm_arch_vcpu_ioctl>:
    4e9d:   81 ec 6c 08 00 00       sub    $0x86c,%esp
--
5900 <kvm_arch_vm_ioctl>:
    5903:   81 ec 34 05 00 00       sub    $0x534,%esp
--
d4f0 <paging64_prefetch_page>:
    d4f8:   81 ec 1c 01 00 00       sub    $0x11c,%esp
--
dfd0 <paging32_prefetch_page>:
    dfd8:   81 ec 1c 01 00 00       sub    $0x11c,%esp
--
f390 <kvm_pv_mmu_op>:
    f3a1:   81 ec 28 02 00 00       sub    $0x228,%esp

We're simply overflowing the stack.  I changed all of the large on-stack
allocations to 'static', and it actually boots now.  I know 'static'
isn't safe, but it was good for a quick test.

A 'make checkstack' confirms this:

[EMAIL PROTECTED]:~/kernels/linux-2.6.git$ make checkstack
objdump -d vmlinux $(find . -name '*.ko') | \
perl /home/dave/kernels/linux-2.6.git-t61/scripts/checkstack.pl i386
0x42d3 kvm_arch_vcpu_ioctl [kvm]:   2148
0x12e3 kvm_vcpu_ioctl [kvm]:1620
0x4a83 kvm_arch_vm_ioctl [kvm]: 1332
0x9a26 airo_get_aplist [airo]:  1140
0x9b76 airo_get_aplist [airo]:  1140
0x9c82 airo_get_aplist [airo]:  1140
...

In other words, kvm has the top 3 stack users in my kernel.  As you can
see from my trace above, these things also get called with super-long
stacks already.  Man.  That sucked to find.

Avi, how would you like this fixed?  I'd be happy to prepare some
patches.  Do you have a particular approach that you think we should
use?  Just make the big objects dynamically allocated?

-- Dave
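
One way to read the "dynamically allocated" option asked about above, as a minimal userspace sketch: the large object moves from the stack frame to the heap and is freed on every exit path. struct big_state and handle_request are hypothetical names, and calloc/free stand in for the kernel's kzalloc(..., GFP_KERNEL) and kfree.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical large object that used to live on the stack. */
struct big_state {
    unsigned char data[2048];
};

/*
 * Uses a heap-allocated big_state instead of a 2 KB local variable, so
 * the function's own stack frame stays small.  Returns the last byte
 * of the filled buffer, or -1 if the allocation fails.
 */
static int handle_request(unsigned char seed)
{
    struct big_state *st;
    int r;

    st = calloc(1, sizeof(*st));    /* kernel code: kzalloc() */
    if (!st)
        return -1;                  /* kernel code: -ENOMEM */

    memset(st->data, seed, sizeof(st->data));
    r = st->data[sizeof(st->data) - 1];

    free(st);                       /* kernel code: kfree() */
    return r;
}
```

Unlike the 'static' quick test, each call gets its own object, so concurrent callers do not share state; the cost is an allocation and a new failure path per call.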



Re: networking setup problem

2008-07-16 Thread David Mair

paolo pedaletti wrote:

Hi,
I hope this is the right ml to submit my problem.

Abstract: I can't set up 2 different networks inside my VMs, one public
and one private.


Scheme:

  eth0 -
 -| proxy |---eth1
 |-  |
H|   |
O|   eth0 -  |
S|| web   |--|eth1
T|-  |
 |   |
 |   eth0 -  |
 || db|---eth1
  -


this is a classic LAMP stack, spread across 3 VMs

1) front end, proxy (apache2 in reverse with mod-security)
2) application server, web (apache2 + php5)
3) database (mysql5)

(it's a test/backup environment)

each VM must have 2 network cards:
eth0 on the local network, in bridge with the host physical eth0
eth1 on the virtual private network, for internal communications between 
them


saying that, ... it doesn't work :-(
(linux ubuntu 8.04 2.6.24-19-generic, kvm-62)

these are the command lines:

kvm -name PROXY
-net nic,vlan=0,macaddr=00:18:BE:EF:17:2A,model=rtl8139
-net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh
-net nic,vlan=1,macaddr=00:18:BE:EF:17:2B,model=rtl8139
-net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh
-drive index=0,media=disk,if=scsi,file=./ubuntu-server.PROXY.root,boot=on
-drive index=1,media=disk,if=scsi,file=./ubuntu-server.PROXY.home
-drive index=2,media=disk,if=scsi,file=./ubuntu-server.PROXY.swap

kvm -name WEBAPP
-net nic,vlan=0,macaddr=00:18:BE:EF:17:1A,model=rtl8139
-net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh
-net nic,vlan=1,macaddr=00:18:BE:EF:17:1B,model=rtl8139
-net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh
-drive index=0,media=disk,if=scsi,file=./ubuntu-server.WEB.root,boot=on
-drive index=1,media=disk,if=scsi,file=./ubuntu-server.WEB.home
-drive index=2,media=disk,if=scsi,file=./ubuntu-server.WEB.swap

kvm -name DB
-net nic,vlan=0,macaddr=00:18:BE:EF:17:0A,model=rtl8139
-net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh
-net nic,vlan=1,macaddr=00:18:BE:EF:17:0B,model=rtl8139
-net user,vlan=1,ifname=dmz0,script=./qemu-ifup.sh
-drive index=0,media=disk,if=scsi,file=./ubuntu-server.DB.root,boot=on
-drive index=1,media=disk,if=scsi,file=./ubuntu-server.DB.home
-drive index=2,media=disk,if=scsi,file=./ubuntu-server.DB.swap


$ cat /etc/qemu-ifup
-8-88--
#!/bin/sh
set -x

echo "Executing $0"

case "$1" in
tap*)   echo "tun network"
        BRIDGE=br0
        if [ -z "$(ifconfig $BRIDGE)" ] ; then
            /usr/sbin/brctl addbr $BRIDGE
            dhclient $BRIDGE
        fi
        /usr/sbin/tunctl -u `whoami` -t $1
        echo "Bringing up $1 for bridged mode..."
        /sbin/ifconfig $1 0.0.0.0 promisc up
        /sbin/ip link set $1 up
        sleep 0.5s
        echo "Adding $1 to br0..."
        /usr/sbin/brctl addif $BRIDGE $1
        ;;

dmz*)   echo "dmz network"
        BRIDGE=br1
        if [ -z "$(ifconfig $BRIDGE)" ] ; then
            /usr/sbin/brctl addbr $BRIDGE
            dhclient $BRIDGE
        fi
        /usr/sbin/tunctl -u `whoami` -t $1
        echo "Bringing up $1 for bridged mode..."
        /sbin/ifconfig $1 0.0.0.0 promisc up
        /sbin/ip link set $1 up
        sleep 0.5s
        echo "Adding $1 to $BRIDGE..."
        /usr/sbin/brctl addif $BRIDGE $1
        ;;

*)      echo "Error: no interface specified or interface '$1' invalid"
        exit 1
esac
-8-88--



eth0 works for all the VM, eth1 doesn't.

constraint: no dhcp, all static ip

any suggestion?



AFAIK, -net user does not need an ifname or script argument - there's no host 
interface for the user mode stack. Try these:


kvm -name PROXY
-net nic,vlan=0,macaddr=00:18:BE:EF:17:2A,model=rtl8139
-net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh
-net nic,vlan=1,macaddr=00:18:BE:EF:17:2B,model=rtl8139
-net user,vlan=1
-drive index=0,media=disk,if=scsi,file=./ubuntu-server.PROXY.root,boot=on
-drive index=1,media=disk,if=scsi,file=./ubuntu-server.PROXY.home
-drive index=2,media=disk,if=scsi,file=./ubuntu-server.PROXY.swap

kvm -name WEBAPP
-net nic,vlan=0,macaddr=00:18:BE:EF:17:1A,model=rtl8139
-net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh
-net nic,vlan=1,macaddr=00:18:BE:EF:17:1B,model=rtl8139
-net user,vlan=1
-drive index=0,media=disk,if=scsi,file=./ubuntu-server.WEB.root,boot=on
-drive index=1,media=disk,if=scsi,file=./ubuntu-server.WEB.home
-drive index=2,media=disk,if=scsi,file=./ubuntu-server.WEB.swap

kvm -name DB
-net nic,vlan=0,macaddr=00:18:BE:EF:17:0A,model=rtl8139
-net tap,vlan=0,ifname=tap0,script=./qemu-ifup.sh
-net nic,vlan=1,macaddr=00:18:BE:EF:17:0B,model=rtl8139
-net user,vlan=1
-drive index=0,media=disk,if=scsi,file=./ubuntu-server.DB.root,boot=on
-drive index=1,media=disk,if=scsi,file=./ubuntu-server.DB.home
-drive index=2,media=disk,if=scsi,file=./ubuntu-server.DB.swap

--
David.


Re: [ANNOUNCE] kvm-autotest

2008-07-16 Thread Ryan Harper
* Uri Lublin [EMAIL PROTECTED] [2008-07-16 18:15]:
 Client side, for installation, we already have a solution that works for
 all types of guests:
 
 http://kvm.qumranet.com/kvmwiki/KVMTest
 
 
 which is already integrated as a client test in autotest.  Once you
 record your installation via kvmtest, then it is just a matter of
 keeping the iso and an empty disk image around and replaying the
 installation with -snapshot.
 
 
 So guest installation is a client test. KVMtest has its own way of
 managing/booting/communicating-with guests, and naturally does not use
 KVM/KVMGuest classes of autotest server.

It does not use the KVM/KVMGuest classes in autotest server precisely
because it is a client test, there is no need to do anything
server-side w.r.t KVMTest.  We ensure that we've built and installed
whatever version of kvm on the target machine and that the target
machine has access to the required inputs (iso, disk image) and invoke
KVMTest.

 
 How about the test, suggested by Marcelo/Chris, of changing physical cpu of 
 a VM
 using taskset. Would that be a client test or a server test ?
 What about stop/cont, save/restore ?
 
 I think most kvm-tests will be client tests.

Agreed, the above examples will be client tests.  Autotest client can do
parallel execution, or even step-wise.  Between those, we should be able
to ensure we get proper coverage for the envisioned scenarios.

 
 More complex tests, which involve multiple hosts, such as migration
 between two hosts, should be server tests.
 
 Managing all those tests, can be done by autoserv.

Agreed.

 
 
 
 Now, I'm actually more interested in doing the following:
 
 use kvmtest to replay an installation of a guest and instead of throwing
 the guest away (running with -snapshot) once it has passed the install,
 it is now ready to be used to execute autotest client tests or something
 else.
 
 autotest client tests can be considered guest load, which is usually
 orthogonal to the real kvm-test that is running (e.g. migration-test

Definitely.  I just want to utilize autotest to drive both guest
creation and test orchestration which includes installation as well as
testing known working guests in various scenarios and yes autotest
simplifies generating guest load.

 while watching a movie, or migration-test while building the kernel). Also 
 what would you do for non-linux guest ?

I'm not quite sure what to do here since non-linux guests aren't really
in my scope beyond simple tests (installation, shutdown, reboot,
configuration variance).  

 
 
 We'll try this little exercise of writing a kvm-test on the server side 
 and on the client side and compare complexity.
 
 That's a bit vague, what sort of test are you talking about?  If you
 mean installation, i'm not interested since that's been handled by
 KVMTest.
 Are you actually running autotest tests with KVMTest installed guests? Do 

Yes, but the transition between using KVMtest to install a guest and
running autotest inside isn't completely automated -- yet.

 you have to manually exchange ssh-keys.

Yeah, we'll need a solution, but it should be pretty simple to automate.
Most likely: pregenerate a key, serve it up to the guest via a cdrom
image, and then install that into the guest.


 As to complexity, I urge you to look at the existing kvm
 examples[2] in the autotest server dir, those look pretty darn simple to me
 and already include all of the infrastructure for capturing console
 logs, results and errors.
 
 It does look simple. It was written for a different purpose though, which 
 is to run autotest tests on guests, not to run kvm tests.

Sure, but that doesn't mean it isn't a good infrastructure on top of
which we can build kvm testing.

 
 
 
 Oh, I forgot my pointer to the server setup last time:
 
 1. http://test.kernel.org/autotest/AutotestServerInstall
 2. autotest/server/samples/kvm.srv
 
 
 
 
 So what do you propose ?
 We were thinking today about how we can move things to the server and want to
 get your (or anyone's) opinion. Does the server always start (boot)
 guests ? Do most tests run on the server-side (similar to [2]) or may

Yeah, let me explain a bit more about how the server and clients work:

You'll have one master server which runs the job monitor, it will look
for autotest server files (.srv) and that file will run on the server,
but you will write your srv file to execute on a set of machines that
your master server manages.  Autotest server maintains a db of machines.

Looking at autotest/server/samples/sleeptest.srv:


def run(machine):
   host = hosts.SSHHost(machine)
   at = autotest.Autotest(host)
   at.run_test('sleeptest')

job.parallel_simple(run, machines)

We're defining a run function, and then running that across all the
machines in the grid.  

The other interesting one to look at is netperf-guest-to-host-far.srv.

That file demonstrates installing kvm to different host machines (not the
server where the .srv file is running); on 

Re: [PATCH 3/6] KVM: Handle device assignment to guests

2008-07-16 Thread Yang, Sheng
Some comments below. :)

On Wednesday 16 July 2008 23:56:50 Ben-Ami Yassour wrote:
 From: Amit Shah [EMAIL PROTECTED]

 This patch adds support for handling PCI devices that are assigned
 to the guest (PCI passthrough).

 The device to be assigned to the guest is registered in the host
 kernel and interrupt delivery is handled. If a device is already
 assigned, or the device driver for it is still loaded on the host,
 the device assignment is failed by conveying a -EBUSY reply to the
 userspace.

 Devices that share their interrupt line are not supported at the
 moment.

 By itself, this patch will not make devices work within the guest.
 The VT-d extension is required to enable the device to perform DMA.
 Another alternative is PVDMA.

 Signed-off-by: Amit Shah [EMAIL PROTECTED]
 Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
 Signed-off-by: Han, Weidong [EMAIL PROTECTED]
 ---
  arch/x86/kvm/x86.c |  267
 
 include/asm-x86/kvm_host.h |   37 ++
  include/asm-x86/kvm_para.h |   16 +++-
  include/linux/kvm.h|3 +
  virt/kvm/ioapic.c  |   12 ++-
  5 files changed, 332 insertions(+), 3 deletions(-)

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 3167006..65b307d 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -4,10 +4,12 @@
   * derived from drivers/kvm/kvm_main.c
   *
   * Copyright (C) 2006 Qumranet, Inc.
 + * Copyright (C) 2008 Qumranet, Inc.
   *
   * Authors:
   *   Avi Kivity   [EMAIL PROTECTED]
   *   Yaniv Kamay  [EMAIL PROTECTED]
 + *   Amit Shah[EMAIL PROTECTED]
   *
   * This work is licensed under the terms of the GNU GPL, version
 2.  See * the COPYING file in the top-level directory.
 @@ -23,8 +25,10 @@
  #include x86.h

  #include linux/clocksource.h
 +#include linux/interrupt.h
  #include linux/kvm.h
  #include linux/fs.h
 +#include linux/pci.h
  #include linux/vmalloc.h
  #include linux/module.h
  #include linux/mman.h
 @@ -98,6 +102,256 @@ struct kvm_stats_debugfs_item
 debugfs_entries[] = { { NULL }
  };

[snip]
 +
 +static int kvm_vm_ioctl_pci_pt_dev(struct kvm *kvm,
 +				   struct kvm_pci_passthrough_dev *pci_pt_dev)
 +{
 +	int r = 0;
 +	struct kvm_pci_pt_dev_list *match;
 +	struct pci_dev *dev;
 +
 +	write_lock(&kvm_pci_pt_lock);
 +
 +	/* Check if this is a request to update the irq of the device
 +	 * in the guest (BIOS/ kernels can dynamically reprogram irq
 +	 * numbers).  This also protects us from adding the same
 +	 * device twice.
 +	 */
 +	match = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head,
 +				    &pci_pt_dev->host, 0, KVM_PT_SOURCE_UPDATE);
 +	if (match) {
 +		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
 +		write_unlock(&kvm_pci_pt_lock);
 +		goto out;
 +	}
 +	write_unlock(&kvm_pci_pt_lock);
 +
 +	match = kzalloc(sizeof(struct kvm_pci_pt_dev_list), GFP_KERNEL);
 +	if (match == NULL) {
 +		printk(KERN_INFO "%s: Couldn't allocate memory\n",
 +		       __func__);
 +		r = -ENOMEM;
 +		goto out;
 +	}
 +	dev = pci_get_bus_and_slot(pci_pt_dev->host.busnr,
 +				   pci_pt_dev->host.devfn);
 +	if (!dev) {
 +		printk(KERN_INFO "%s: host device not found\n", __func__);
 +		r = -EINVAL;
 +		goto out_free;
 +	}
 +	if (pci_enable_device(dev)) {
 +		printk(KERN_INFO "%s: Could not enable PCI device\n", __func__);
 +		r = -EBUSY;
 +		goto out_put;
 +	}
 +	r = pci_request_regions(dev, "kvm_pt_device");
 +	if (r) {
 +		printk(KERN_INFO "%s: Could not get access to device regions\n",
 +		       __func__);
 +		goto out_put;

pci_disable_device()?

 +	}
 +	match->pt_dev.guest.busnr = pci_pt_dev->guest.busnr;
 +	match->pt_dev.guest.devfn = pci_pt_dev->guest.devfn;
 +	match->pt_dev.host.busnr = pci_pt_dev->host.busnr;
 +	match->pt_dev.host.devfn = pci_pt_dev->host.devfn;
 +	match->pt_dev.dev = dev;
 +
 +	write_lock(&kvm_pci_pt_lock);
 +
 +	INIT_WORK(&match->pt_dev.int_work.work, kvm_pci_pt_int_work_fn);
 +	INIT_WORK(&match->pt_dev.ack_work.work, kvm_pci_pt_ack_work_fn);
 +
 +	match->pt_dev.kvm = kvm;
 +	match->pt_dev.int_work.pt_dev = &match->pt_dev;
 +	match->pt_dev.ack_work.pt_dev = &match->pt_dev;
 +
 +	list_add(&match->list, &kvm->arch.pci_pt_dev_head);
 +
 +	write_unlock(&kvm_pci_pt_lock);
 +
 +	if (irqchip_in_kernel(kvm)) {
 +		match->pt_dev.guest.irq = pci_pt_dev->guest.irq;
 +		match->pt_dev.host.irq = dev->irq;
 +		if (kvm->arch.vioapic)
 +			kvm->arch.vioapic->ack_notifier = kvm_pci_pt_ack_irq;
 +		if (kvm->arch.vpic)
 +			kvm->arch.vpic->ack_notifier = kvm_pci_pt_ack_irq;
 +
 +		/* Even though this is PCI, we don't want to use shared
 +  
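
The pci_disable_device() remark above is about unwinding: once the device has been enabled, a later failure should disable it again before dropping the reference. A minimal user-space sketch of that ordering, assuming one cleanup label per setup step. The fake_* helpers and struct fake_dev are hypothetical stand-ins, not the kernel PCI API, and calloc()/free() stand in for kzalloc()/kfree():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_dev; records what is currently held. */
struct fake_dev {
	bool referenced;	/* pci_get_bus_and_slot() analogue */
	bool enabled;		/* pci_enable_device() analogue */
	bool regions_held;	/* pci_request_regions() analogue */
	bool fail_regions;	/* knob: make the region request fail */
};

static int fake_get(struct fake_dev *d) { d->referenced = true; return 0; }
static void fake_put(struct fake_dev *d) { d->referenced = false; }
static int fake_enable(struct fake_dev *d) { d->enabled = true; return 0; }
static void fake_disable(struct fake_dev *d) { d->enabled = false; }

static int fake_request_regions(struct fake_dev *d)
{
	if (d->fail_regions)
		return -EBUSY;
	d->regions_held = true;
	return 0;
}

/* Set the device up; on failure undo exactly the steps that succeeded. */
int assign_device(struct fake_dev *d)
{
	void *entry;
	int r;

	entry = calloc(1, 64);	/* list entry placeholder */
	if (!entry)
		return -ENOMEM;

	r = fake_get(d);
	if (r)
		goto out_free;
	r = fake_enable(d);
	if (r)
		goto out_put;
	r = fake_request_regions(d);
	if (r)
		goto out_disable;	/* the step the review comment asks about */

	free(entry);	/* sketch only: real code would keep it on a list */
	return 0;

out_disable:
	fake_disable(d);
out_put:
	fake_put(d);
out_free:
	free(entry);
	return r;
}
```

With fail_regions set, assign_device() returns -EBUSY and leaves the device neither enabled nor referenced.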

RE: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-16 Thread Han, Weidong
Avi Kivity wrote:
 +static void kvm_pci_pt_work_fn(struct work_struct *work)
 +{
 +	struct kvm_pci_pt_dev_list *match;
 +	struct kvm_pci_pt_work *int_work;
 +	int source;
 +	unsigned long flags;
 +	int guest_irq;
 +	int host_irq;
 +
 +	int_work = container_of(work, struct kvm_pci_pt_work, work);
 +
 +	source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ;
 +
 +	/* This is taken to safely inject irq inside the guest. When
 +	 * the interrupt injection (or the ioapic code) uses a
 +	 * finer-grained lock, update this
 +	 */
 +	mutex_lock(&int_work->kvm->lock);
 +	read_lock_irqsave(&kvm_pci_pt_lock, flags);
 +	match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL,
 +				    int_work->irq, source);
 +	if (!match) {
 +		printk(KERN_ERR "%s: no matching device assigned to guest "
 +		       "found for irq %d, source = %d!\n",
 +		       __func__, int_work->irq, int_work->source);
 +		read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
 +		goto out;
 +	}
 +	guest_irq = match->pt_dev.guest.irq;
 +	host_irq = match->pt_dev.host.irq;
 +	read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
 +
 +	if (source == KVM_PT_SOURCE_IRQ)
 +		kvm_set_irq(int_work->kvm, guest_irq, 1);
 +	else {
 +		kvm_set_irq(int_work->kvm, int_work->irq, 0);
 +		enable_irq(host_irq);
 +	}
 +out:
 +	mutex_unlock(&int_work->kvm->lock);
 +	kvm_put_kvm(int_work->kvm);
 +}
 
 +
 +/* FIXME: Implement the OR logic needed to make shared interrupts on
 + * this line behave properly
 + */
 
 
 Isn't this a showstopper?  There is no easy way for a user to avoid
 sharing, especially as we have only three pci irqs at present.
 

Currently it's not easy to avoid sharing. I think we can support MSI for
assigned devices to solve the sharing problem.
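
For reference, the OR logic the FIXME asks for could look roughly like this user-space sketch: track which sources currently assert a shared guest line and derive the line level as the OR of all of them. struct shared_line and shared_line_set() are hypothetical names, not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one guest interrupt line shared by up to 32 sources. */
struct shared_line {
	uint32_t asserted;	/* bit n set: source n currently asserts the line */
};

/* Record a level change from one source and return the resulting line level. */
bool shared_line_set(struct shared_line *line, unsigned int source, bool level)
{
	assert(source < 32);

	if (level)
		line->asserted |= UINT32_C(1) << source;
	else
		line->asserted &= ~(UINT32_C(1) << source);

	/* The line stays high while at least one source still asserts it. */
	return line->asserted != 0;
}
```

In the patch's terms, the returned level is what would be handed to kvm_set_irq(), so one device deasserting would no longer lower a line that another device still holds.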

Randy (Weidong)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: PCI passthrough with VT-d - native performance

2008-07-16 Thread Han, Weidong
Anthony Liguori wrote:
 Ben-Ami Yassour wrote:
 On Wed, 2008-07-16 at 17:36 +0300, Avi Kivity wrote:
 
 Ben-Ami Yassour wrote:
 
 In last few tests that we made with PCI-passthrough and VT-d using
 iperf, we were able to get the same throughput as on native OS
 with a 1G NIC 
 
 Excellent!
 
 
  (with higher CPU utilization).
 
 
 How much higher?
 
 
 Here are some numbers for running iperf -l 1M:
 
 e1000 NIC (behind a PCI bridge)
                              Bandwidth (Mbit/sec)   CPU utilization
 Native OS                    771                     18%
 Native OS with VT-d          760                     18%
 KVM VT-d                     390                     95%
 KVM VT-d with direct mmio    770                     84%
 KVM emulated                  57                    100%
 
 
 What about virtio?  Also, which emulated is this?
 
 That CPU utilization is extremely high and somewhat illogical if
 native w/vt-d has almost no CPU impact.  Have you run oprofile yet or
 have any insight into where CPU is being burnt?
 
 What does kvm_stat look like?  I wonder if there are a large number of
 PIO exits.  What does the interrupt count look like on native vs. KVM
 with VT-d?
 

e1000 NIC doesn't use PIO. 

Randy (Weidong)


Re: kvm causing memory corruption? now 2.6.26

2008-07-16 Thread Avi Kivity

Dave Hansen wrote:

On a suggestion of Anthony's, I tried a defconfig kernel.

It is now bombing out on an assertion in the lapic code:

http://sr71.net/~dave/linux/2.6.26-oops1.txt


  


Well that assert is plain wrong:

static int apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
                           int short_hand, int dest, int dest_mode)
{
   int result = 0;
   struct kvm_lapic *target = vcpu->arch.apic;

   apic_debug("target %p, source %p, dest 0x%x, "
              "dest_mode 0x%x, short_hand 0x%x",
              target, source, dest, dest_mode, short_hand);

   ASSERT(!target);


It should be ASSERT(target), if anything.
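
To make the polarity concrete, a user-space sketch with a simplified stand-in for the ASSERT macro (the real one in lapic.c may differ): an assertion fires when its condition is false, so ASSERT(!target) trips exactly when the lapic pointer is valid. struct lapic here is a hypothetical placeholder for struct kvm_lapic:

```c
#include <assert.h>
#include <stddef.h>

static int assert_failures;	/* how often the stand-in ASSERT fired */

/* Simplified stand-in: count a failure when the condition is false. */
#define ASSERT(x) do { if (!(x)) assert_failures++; } while (0)

struct lapic { int id; };	/* hypothetical placeholder type */

/* The quoted form: fires for every valid pointer. */
int check_wrong(struct lapic *target)
{
	assert_failures = 0;
	ASSERT(!target);
	return assert_failures;
}

/* The suggested form: fires only when the pointer is NULL. */
int check_fixed(struct lapic *target)
{
	assert_failures = 0;
	ASSERT(target);
	return assert_failures;
}
```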

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: KVM overflows the stack

2008-07-16 Thread Avi Kivity

Dave Hansen wrote:

On Wed, 2008-07-16 at 14:44 -0700, Dave Hansen wrote:
  

On a suggestion of Anthony's, I tried a defconfig kernel.

It is now bombing out on an assertion in the lapic code:

http://sr71.net/~dave/linux/2.6.26-oops1.txt



I think I found it!!!

$ (objdump -d kvm.ko ; objdump -d kvm-intel.ko ) | egrep 'sub.*0x...,.*esp|:' \
    | egrep sub -B1
1a90 <kvm_vcpu_ioctl>:
    1a9a:   81 ec 60 06 00 00   sub    $0x660,%esp
--
4e90 <kvm_arch_vcpu_ioctl>:
    4e9d:   81 ec 6c 08 00 00   sub    $0x86c,%esp
--
5900 <kvm_arch_vm_ioctl>:
    5903:   81 ec 34 05 00 00   sub    $0x534,%esp
--
d4f0 <paging64_prefetch_page>:
    d4f8:   81 ec 1c 01 00 00   sub    $0x11c,%esp
--
dfd0 <paging32_prefetch_page>:
    dfd8:   81 ec 1c 01 00 00   sub    $0x11c,%esp
--
f390 <kvm_pv_mmu_op>:
    f3a1:   81 ec 28 02 00 00   sub    $0x228,%esp

We're simply overflowing the stack.  I changed all of the large on-stack
allocations to 'static', and it actually boots now.  I know 'static'
isn't safe, but it was good for a quick test.
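
The sub immediates in the objdump output are the frame sizes in bytes. A small sketch that adds them up for the vcpu ioctl path, assuming kvm_vcpu_ioctl hands arch-specific ioctls on to kvm_arch_vcpu_ioctl so that both frames are live at the same time:

```c
#include <assert.h>
#include <stddef.h>

/* Frame sizes copied from the "sub $imm,%esp" lines in the objdump output. */
struct frame {
	const char *fn;
	unsigned int bytes;
};

static const struct frame ioctl_path[] = {
	{ "kvm_vcpu_ioctl",      0x660 },
	{ "kvm_arch_vcpu_ioctl", 0x86c },
};

/* Sum the stack consumed by a chain of nested calls. */
unsigned int stack_bytes(const struct frame *path, size_t n)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += path[i].bytes;
	return total;
}
```

stack_bytes(ioctl_path, 2) comes to 3788 bytes (1632 + 2156) for these two frames alone, before counting any of their callers or callees.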

  


Yes!   It's obvious, once you know it...


A 'make checkstack' confirms this:

[EMAIL PROTECTED]:~/kernels/linux-2.6.git$ make checkstack
objdump -d vmlinux $(find . -name '*.ko') | \
perl /home/dave/kernels/linux-2.6.git-t61/scripts/checkstack.pl i386
0x42d3 kvm_arch_vcpu_ioctl [kvm]:       2148
0x12e3 kvm_vcpu_ioctl [kvm]:            1620
0x4a83 kvm_arch_vm_ioctl [kvm]:         1332
0x9a26 airo_get_aplist [airo]:          1140
0x9b76 airo_get_aplist [airo]:          1140
0x9c82 airo_get_aplist [airo]:          1140
...

In other words, kvm has the top 3 stack users in my kernel.  As you can
see from my trace above, these things also get called with super-long
stacks already.  Man.  That sucked to find.

Avi, how would you like this fixed?  I'd be happy to prepare some
patches.  Do you have a particular approach that you think we should
use?  Just make the big objects dynamically allocated?
  


Yes, things like kvm_lapic_state are way too big to be on the stack.  
There's an additional problem here, that apparently your gcc (which 
version?) doesn't fold objects in a switch statement into the same stack 
slot:


switch (...) {
   case x: {
       struct medium a;
       ...
   }
   case y: {
       struct medium b;
       ...
   }
}

These could be solved either by heap allocation, or by moving the cases into 
functions marked noinline.  Whichever is easier.
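
The two approaches, dynamic allocation and noinline helpers, sketched in user-space C under some assumptions: malloc()/free() stand in for kmalloc()/kfree(), struct medium is the placeholder from the snippet above with a made-up size, and the noinline attribute uses the GCC/Clang spelling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct medium { char data[1024]; };	/* made-up size, large for a kernel stack */

/* Option 1: keep the object off the stack entirely. */
int handle_x(void)
{
	struct medium *a = malloc(sizeof(*a));	/* kmalloc() analogue */
	int r;

	if (!a)
		return -1;
	memset(a, 0, sizeof(*a));
	a->data[0] = 1;
	r = a->data[0];
	free(a);	/* kfree() analogue */
	return r;
}

/* Option 2: the object stays on the stack, but in a frame of its own
 * that exists only while this helper runs. */
__attribute__((noinline)) int handle_y(void)
{
	struct medium b;

	memset(&b, 0, sizeof(b));
	b.data[0] = 2;
	return b.data[0];
}

/* The dispatcher's own frame no longer holds either object. */
int dispatch(int cmd)
{
	switch (cmd) {
	case 0:
		return handle_x();
	case 1:
		return handle_y();
	default:
		return -1;
	}
}
```

Either way the switch itself stops paying for every case's locals at once; the noinline variant only helps if the compiler is actually prevented from folding the helper back into the caller.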


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
