[COMMIT master] KVM: Trivial format fix in setup_routing_entry()

2009-05-03 Thread Avi Kivity
From: Chris Wright chr...@sous-sol.org

Remove extra tab.

Signed-off-by: Chris Wright chr...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 4fa1f60..a8bd466 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -271,7 +271,7 @@ static int setup_routing_entry(struct kvm_kernel_irq_routing_entry *e,
 			delta = 8;
 			break;
 		case KVM_IRQCHIP_IOAPIC:
-				e->set = kvm_set_ioapic_irq;
+			e->set = kvm_set_ioapic_irq;
 			break;
 		default:
 			goto out;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: Fix NX support reporting

2009-05-03 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

NX support is bit 20, not bit 1.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5fcde2c..8b0d777 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1266,7 +1266,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		bit(X86_FEATURE_CMOV) | bit(X86_FEATURE_PSE36) |
 		bit(X86_FEATURE_MMX) | bit(X86_FEATURE_FXSR) |
 		bit(X86_FEATURE_SYSCALL) |
-		(bit(X86_FEATURE_NX) && is_efer_nx()) |
+		(is_efer_nx() ? bit(X86_FEATURE_NX) : 0) |
 #ifdef CONFIG_X86_64
 		bit(X86_FEATURE_LM) |
 #endif
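
A minimal userspace sketch of why the old expression reported the wrong
bit. A logical AND yields 0 or 1, so the feature mask is lost; the
conditional keeps it. FEATURE_NX_BIT, feature_bit() and the two helpers are
illustrative stand-ins rather than the kernel's definitions; the only fact
taken from the commit is that NX is bit 20.

```c
#include <stdint.h>

/* Illustrative stand-ins, not the kernel's definitions. */
#define FEATURE_NX_BIT 20u

static uint32_t feature_bit(unsigned int n)
{
    return (uint32_t)1 << n;
}

/* Old form: the logical AND collapses the mask to 0 or 1. */
static uint32_t nx_mask_old(unsigned long long efer_nx)
{
    return (uint32_t)(feature_bit(FEATURE_NX_BIT) && efer_nx);
}

/* Fixed form: contribute the whole mask, or nothing. */
static uint32_t nx_mask_new(unsigned long long efer_nx)
{
    return efer_nx ? feature_bit(FEATURE_NX_BIT) : 0;
}
```

Any non-zero argument gives 1 from nx_mask_old() and 0x100000 from
nx_mask_new().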


[COMMIT master] KVM: Make EFER reads safe when EFER does not exist

2009-05-03 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Some processors don't have EFER; don't oops if userspace wants us to
read EFER when we check NX.

Cc: sta...@kernel.org
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8b0d777..2d7082c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1128,9 +1128,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 
 static int is_efer_nx(void)
 {
-	u64 efer;
+	unsigned long long efer = 0;
 
-	rdmsrl(MSR_EFER, efer);
+	rdmsrl_safe(MSR_EFER, &efer);
 	return efer & EFER_NX;
 }
 


[COMMIT master] Increment virtio-net savevm version to avoid conflict with upstream QEMU.

2009-05-03 Thread Avi Kivity
From: Anthony Liguori aligu...@us.ibm.com

When TAP_VNET_HDR eventually merges into upstream QEMU, it cannot change the
format of the version 6 savevm data.  This means that we're going to have to
bump the version up to 7.  I'm happy to reserve version 7 as having TAP_VNET_HDR
support to allow time to include this support in upstream QEMU.

For those shipping products based on KVM though, it's important that we do not
conflict with upstream QEMU versioning or else it's going to result in breakage
of backwards compatibility.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 5f5f2f3..ac8e030 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -21,7 +21,12 @@
 
 #define TAP_VNET_HDR
 
-#define VIRTIO_NET_VM_VERSION    6
+/* Version 7 has TAP_VNET_HDR support.  This is reserved in upstream QEMU to
+ * avoid future conflict.
+ * We can't assume verisons  7 have TAP_VNET_HDR support until this is merged
+ * in upstream QEMU.
+ */
+#define VIRTIO_NET_VM_VERSION    7
 
 #define MAC_TABLE_ENTRIES    32
 #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
@@ -652,8 +657,9 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
     qemu_get_buffer(f, (uint8_t *)n->vlans, MAX_VLAN >> 3);
 
 #ifdef TAP_VNET_HDR
-    if (qemu_get_be32(f))
+    if (version_id == 7 && qemu_get_be32(f)) {
         tap_using_vnet_hdr(n->vc->vlan->first_client, 1);
+    }
 #endif
 
     if (n->tx_timer_active) {
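
The version gate in the load path can be sketched on its own, assuming an
in-memory stream in place of QEMUFile. struct stream, get_be32() and
load_vnet_hdr_flag() are hypothetical names; only the "read the flag when
version_id is 7" rule comes from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stream; QEMU's QEMUFile is not modelled here. */
struct stream {
    const uint8_t *data;
    size_t len;
    size_t pos;
};

/* Read a big-endian 32-bit value; returns 0 if fewer than 4 bytes remain. */
static uint32_t get_be32(struct stream *s)
{
    uint32_t v = 0;
    int i;

    if (s->len - s->pos < 4)
        return 0;
    for (i = 0; i < 4; i++)
        v = (v << 8) | s->data[s->pos++];
    return v;
}

#define VNET_HDR_VERSION 7

/* Returns 1 if the saved state asks for the vnet header, else 0.
 * For any other version the flag is not consumed from the stream. */
static int load_vnet_hdr_flag(struct stream *s, int version_id)
{
    if (version_id == VNET_HDR_VERSION && get_be32(s))
        return 1;
    return 0;
}
```

The short-circuit matters: with a version other than 7 the stream position
must stay where it is, otherwise every later field would be read four bytes
off.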


[COMMIT master] Remove -cpu-vendor-string

2009-05-03 Thread Avi Kivity
From: Anthony Liguori aligu...@us.ibm.com

Superseded by qemu '-cpu vendor=...' option.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/linux-user/main.c b/linux-user/main.c
index 5967fa3..dc39b05 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -43,7 +43,6 @@ int singlestep;
 
 static const char *interp_prefix = CONFIG_QEMU_PREFIX;
 const char *qemu_uname_release = CONFIG_UNAME_RELEASE;
-const char *cpu_vendor_string = NULL;
 
 #if defined(__i386__) && !defined(CONFIG_STATIC)
 /* Force usage of an ELF interpreter even if it is an ELF shared
diff --git a/qemu-options.hx b/qemu-options.hx
index a11ead9..f7d83c9 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1545,8 +1545,6 @@ DEF("pcidevice", HAS_ARG, QEMU_OPTION_pcidevice,
 #endif
 DEF("enable-nesting", 0, QEMU_OPTION_enable_nesting,
     "-enable-nesting enable support for running a VM inside the VM (AMD only)\n")
-DEF("cpu-vendor", HAS_ARG, QEMU_OPTION_cpu_vendor,
-    "-cpu-vendor STRING   override the cpuid vendor string\n")
 DEF("nvram", HAS_ARG, QEMU_OPTION_nvram,
     "-nvram FILE  provide ia64 nvram contents\n")
 DEF("tdf", 0, QEMU_OPTION_tdf,
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 24fcea8..7440983 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -91,8 +91,6 @@ static void add_flagname_to_bitmaps(char *flagname, uint32_t *features,
         fprintf(stderr, "CPU feature %s not found\n", flagname);
 }
 
-extern const char *cpu_vendor_string;
-
 typedef struct x86_def_t {
 const char *name;
 uint32_t level;
@@ -431,9 +429,6 @@ static int cpu_x86_register (CPUX86State *env, const char *cpu_model)
     {
         const char *model_id = def->model_id;
         int c, len, i;
-
-        if (cpu_vendor_string != NULL)
-            model_id = cpu_vendor_string;
         if (!model_id)
             model_id = "";
         len = strlen(model_id);
diff --git a/vl.c b/vl.c
index fbc84a7..38f208a 100644
--- a/vl.c
+++ b/vl.c
@@ -268,7 +268,6 @@ const char *mem_path = NULL;
 int mem_prealloc = 1;  /* force preallocation of physical target memory */
 #endif
 long hpagesize = 0;
-const char *cpu_vendor_string;
 #ifdef TARGET_ARM
 int old_param = 0;
 #endif
@@ -5216,9 +5215,6 @@ int main(int argc, char **argv, char **envp)
 nb_prom_envs++;
 break;
 #endif
-   case QEMU_OPTION_cpu_vendor:
-   cpu_vendor_string = optarg;
-   break;
 #ifdef TARGET_ARM
 case QEMU_OPTION_old_param:
 old_param = 1;


[COMMIT master] Don't clean *.dtb

2009-05-03 Thread Avi Kivity
From: Anthony Liguori aligu...@us.ibm.com

*.dtb is under source control, so it shouldn't be deleted.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Acked-by: Hollis Blanchard holl...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/pc-bios/Makefile b/pc-bios/Makefile
index dabeb4c..315288d 100644
--- a/pc-bios/Makefile
+++ b/pc-bios/Makefile
@@ -16,4 +16,4 @@ all: $(TARGETS)
 	dtc -I dts -O dtb -o $@ $<
 
 clean:
-   rm -f $(TARGETS) *.o *~ *.dtb
+   rm -f $(TARGETS) *.o *~


[COMMIT master] Increment version id for CPU save state

2009-05-03 Thread Avi Kivity
From: Anthony Liguori aligu...@us.ibm.com

9 is reserved for KVM.  KVM cannot support migration from any other version.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index f054af1..af0ee18 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -838,7 +838,9 @@ static inline int cpu_get_time_fast(void)
 #define cpu_signal_handler cpu_x86_signal_handler
 #define cpu_list x86_cpu_list
 
-#define CPU_SAVE_VERSION 8
+/* CPU_SAVE_VERSION 9 is reserved for KVM.  This is to avoid breakage as KVM
+ * merges into upstream QEMU */
+#define CPU_SAVE_VERSION 9
 
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _kernel
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 399204d..7f75d31 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -196,6 +196,9 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
     if (version_id != 3 && version_id != 4 && version_id != 5
         && version_id != 6 && version_id != 7 && version_id != 8)
         return -EINVAL;
+    /* KVM cannot accept migrations from QEMU today */
+    if (version_id != 9)
+        return -EINVAL;
     for(i = 0; i < CPU_NB_REGS; i++)
         qemu_get_betls(f, &env->regs[i]);
     qemu_get_betls(f, &env->eip);


[COMMIT master] Fix build when objdir != srcdir

2009-05-03 Thread Avi Kivity
From: Anthony Liguori aligu...@us.ibm.com

This requires adding the necessary bits to configure to create the directories
and symlinks for libkvm.  It also requires sticking KVM_CFLAGS in
config-host.mak to ensure that it gets the right set of includes for the
kernel headers.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index f59476d..7bc7f22 100755
--- a/configure
+++ b/configure
@@ -520,7 +520,7 @@ if test $werror = yes ; then
     CFLAGS="$CFLAGS -Werror"
 fi
 
-CFLAGS="$CFLAGS -I$(readlink -f kvm/libkvm)"
+CFLAGS="$CFLAGS -I$(readlink -f $source_path/kvm/libkvm)"
 
 if test $solaris = no ; then
     if ld --version 2>/dev/null | grep "GNU ld" >/dev/null 2>/dev/null ; then
@@ -1788,6 +1788,11 @@ bsd)
 ;;
 esac
 
+# this is a temp hack needed for libkvm
+if test $kvm = yes ; then
+    echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
+fi
+
 tools=
 if test `expr $target_list : .*softmmu.*` != 0 ; then
   tools="qemu-img\$(EXESUF) $tools"
@@ -2165,10 +2170,11 @@ done # for target in $targets
 
 # build tree in object directory if source path is different from current one
 if test $source_path_used = yes ; then
-    DIRS="tests tests/cris slirp audio"
+    DIRS="tests tests/cris slirp audio kvm/libkvm"
     FILES="Makefile tests/Makefile"
     FILES="$FILES tests/cris/Makefile tests/cris/.gdbinit"
     FILES="$FILES tests/test-mmap.c"
+    FILES="$FILES kvm/libkvm/Makefile"
 for dir in $DIRS ; do
 mkdir -p $dir
 done
diff --git a/kvm/libkvm/Makefile b/kvm/libkvm/Makefile
index 727ce48..8de7eaf 100644
--- a/kvm/libkvm/Makefile
+++ b/kvm/libkvm/Makefile
@@ -1,5 +1,12 @@
 include ../../config-host.mak
-include config-$(ARCH).mak
+ifneq ($(VPATH),)
+srcdir=$(VPATH)/kvm/libkvm
+else
+srcdir=.
+endif
+
+include $(srcdir)/config-$(ARCH).mak
+
 
 # libkvm is not -Wredundant-decls friendly yet
 CFLAGS += -Wno-redundant-decls
@@ -18,6 +25,8 @@ LDFLAGS += $(CFLAGS)
 
 CXXFLAGS = $(autodepend-flags)
 
+VPATH:=$(VPATH)/kvm/libkvm
+
 autodepend-flags = -MMD -MF $(dir $*).$(notdir $*).d
 
 


[COMMIT master] Rename config-powerpc to config-ppc

2009-05-03 Thread Avi Kivity
From: Hollis Blanchard holl...@us.ibm.com

Apparently $(ARCH) now holds the qemu meaning, rather than the KVM meaning.

Signed-off-by: Hollis Blanchard holl...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/libkvm/config-powerpc.mak b/kvm/libkvm/config-ppc.mak
similarity index 100%
rename from kvm/libkvm/config-powerpc.mak
rename to kvm/libkvm/config-ppc.mak


[COMMIT master] Fix format of package version for KVM

2009-05-03 Thread Avi Kivity
From: Anthony Liguori aligu...@us.ibm.com

We currently show 0.10.50kvm-devel whereas pkgversion normally would show
0.10.50 (kvm-devel).  This is due to some weirdness in how pkgversion is
constructed in configure.

This corrects the version display.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index 09b61a6..a9a6756 100755
--- a/configure
+++ b/configure
@@ -194,7 +194,7 @@ blobs=yes
 fdt=yes
 sdl_x11=no
 xen=yes
-pkgversion="kvm-devel"
+pkgversion=" (kvm-devel)"
 signalfd=no
 eventfd=no
 cpu_emulation=yes
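
The effect of the changed default can be sketched as follows, assuming the
displayed version is a plain concatenation of the base version and
pkgversion. That assumption fits the two strings quoted in the commit
message but is not confirmed by it, and format_version() is a hypothetical
helper, not a QEMU function.

```c
#include <stdio.h>

/* Hypothetical helper: joins the base version and the package suffix the
 * way a plain string concatenation would. Returns snprintf()'s result. */
static int format_version(char *out, size_t size,
                          const char *version, const char *pkgversion)
{
    return snprintf(out, size, "%s%s", version, pkgversion);
}
```

With "0.10.50" as the base version, the suffix "kvm-devel" produces
"0.10.50kvm-devel", while " (kvm-devel)" produces "0.10.50 (kvm-devel)".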


[COMMIT master] Fix missing prototype warning.

2009-05-03 Thread Avi Kivity
From: Hollis Blanchard holl...@us.ibm.com

As far as I can see, kvm_destroy_memory_region_works() has nothing to do with
KVM_CAP_DEVICE_ASSIGNMENT, so move the prototype outside that ifdef block.

Signed-off-by: Hollis Blanchard holl...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/libkvm/libkvm.h b/kvm/libkvm/libkvm.h
index ce6f054..c23d37b 100644
--- a/kvm/libkvm/libkvm.h
+++ b/kvm/libkvm/libkvm.h
@@ -739,6 +739,7 @@ int kvm_assign_irq(kvm_context_t kvm,
 int kvm_deassign_irq(kvm_context_t kvm,
struct kvm_assigned_irq *assigned_irq);
 #endif
+#endif
 
 /*!
  * \brief Determines whether destroying memory regions is allowed
@@ -748,7 +749,6 @@ int kvm_deassign_irq(kvm_context_t kvm,
  * \param kvm Pointer to the current kvm_context
  */
 int kvm_destroy_memory_region_works(kvm_context_t kvm);
-#endif
 
 #ifdef KVM_CAP_DEVICE_DEASSIGNMENT
 /*!


[COMMIT master] Fix warning when __ia64__ is not defined.

2009-05-03 Thread Avi Kivity
From: Hollis Blanchard holl...@us.ibm.com

Signed-off-by: Hollis Blanchard holl...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/libkvm/kvm-common.h b/kvm/libkvm/kvm-common.h
index 96361e8..591fb53 100644
--- a/kvm/libkvm/kvm-common.h
+++ b/kvm/libkvm/kvm-common.h
@@ -22,7 +22,7 @@
 #define KVM_MAX_NUM_MEM_REGIONS 1u
 #define MAX_VCPUS 64
 #define LIBKVM_S390_ORIGIN (0UL)
-#elif __ia64__
+#elif defined(__ia64__)
 #define KVM_MAX_NUM_MEM_REGIONS 32u
 #define MAX_VCPUS 256
 #else
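
A small sketch of the pattern. Testing a macro with defined() is
well-formed whether or not the macro exists, whereas evaluating an
undefined name directly in #if or #elif can draw a warning (for example
from GCC with -Wundef). HYPOTHETICAL_ARCH_A, HYPOTHETICAL_ARCH_B,
MAX_REGIONS and its fallback value are made up for illustration.

```c
/* Made-up macros for illustration; neither is expected to be defined. */
#if defined(HYPOTHETICAL_ARCH_A)
#define MAX_REGIONS 1u
#elif defined(HYPOTHETICAL_ARCH_B)   /* safe even when the macro is absent */
#define MAX_REGIONS 32u
#else
#define MAX_REGIONS 8u               /* illustrative fallback value */
#endif

static unsigned int max_regions(void)
{
    return MAX_REGIONS;
}
```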


[COMMIT stable-0.10] Read kvm version from KVM_VERSION file

2009-05-03 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

This allows the packager to add a KVM_VERSION file to the tarball instead
of modifying the source.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index 0cfdf7b..04e072b 100755
--- a/configure
+++ b/configure
@@ -152,6 +152,17 @@ case $cpu in
 cpu=unknown
   ;;
 esac
+
+kvm_version() {
+local fname=$(dirname $0)/KVM_VERSION
+
+if test -f $fname; then
+cat $fname
+else
+echo kvm-devel
+fi
+}
+
 gprof=no
 sparse=no
 bigendian=no
@@ -190,6 +201,7 @@ aix=no
 blobs=yes
 fdt=yes
 sdl_x11=no
+pkgversion=$(kvm_version)
 signalfd=no
 eventfd=no
 cpu_emulation=yes
@@ -1474,7 +1486,7 @@ fi
 qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >>$config_mak
 echo "#define QEMU_VERSION \"$qemu_version\"" >> $config_h
-echo "#define KVM_VERSION \"kvm-devel\"" >> $config_h
+echo "#define KVM_VERSION \"${pkgversion}\"" >> $config_h
 
 echo "SRC_PATH=$source_path" >> $config_mak
 if [ $source_path_used = yes ]; then
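
The lookup that kvm_version() performs in shell can be sketched in C for
comparison: use the file's content if the file exists, otherwise fall back
to "kvm-devel". read_kvm_version() is a hypothetical helper; unlike cat it
keeps only the first line and strips the newline, and it assumes size is at
least 2.

```c
#include <stdio.h>
#include <string.h>

/* Returns the first line of `path` (copied into `out`), or the literal
 * "kvm-devel" when the file cannot be opened. `size` must be >= 2. */
static const char *read_kvm_version(const char *path, char *out, size_t size)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return "kvm-devel";
    if (fgets(out, (int)size, f) == NULL)
        out[0] = '\0';                 /* empty file: empty version string */
    out[strcspn(out, "\r\n")] = '\0';  /* drop the trailing newline, if any */
    fclose(f);
    return out;
}
```

A packager's tarball would carry the file, so the fallback applies only to
builds from a plain source tree.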


[COMMIT master] patch add_powerpc_kvm_headers.diff

2009-05-03 Thread Avi Kivity
From: Hollis Blanchard holl...@us.ibm.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/kernel/arch/powerpc/include/asm/kvm.h b/kvm/kernel/arch/powerpc/include/asm/kvm.h
new file mode 100644
index 0000000..c4f1ed1
--- /dev/null
+++ b/kvm/kernel/arch/powerpc/include/asm/kvm.h
@@ -0,0 +1,102 @@
+#ifndef KVM_UNIFDEF_H
+#define KVM_UNIFDEF_H
+
+#ifdef __i386__
+#ifndef CONFIG_X86_32
+#define CONFIG_X86_32 1
+#endif
+#endif
+
+#ifdef __x86_64__
+#ifndef CONFIG_X86_64
+#define CONFIG_X86_64 1
+#endif
+#endif
+
+#if defined(__i386__) || defined (__x86_64__)
+#ifndef CONFIG_X86
+#define CONFIG_X86 1
+#endif
+#endif
+
+#ifdef __ia64__
+#ifndef CONFIG_IA64
+#define CONFIG_IA64 1
+#endif
+#endif
+
+#ifdef __PPC__
+#ifndef CONFIG_PPC
+#define CONFIG_PPC 1
+#endif
+#endif
+
+#ifdef __s390__
+#ifndef CONFIG_S390
+#define CONFIG_S390 1
+#endif
+#endif
+
+#endif
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2007
+ *
+ * Authors: Hollis Blanchard holl...@us.ibm.com
+ */
+
+#ifndef __LINUX_KVM_POWERPC_H
+#define __LINUX_KVM_POWERPC_H
+
+#include <linux/types.h>
+
+struct kvm_regs {
+   __u64 pc;
+   __u64 cr;
+   __u64 ctr;
+   __u64 lr;
+   __u64 xer;
+   __u64 msr;
+   __u64 srr0;
+   __u64 srr1;
+   __u64 pid;
+
+   __u64 sprg0;
+   __u64 sprg1;
+   __u64 sprg2;
+   __u64 sprg3;
+   __u64 sprg4;
+   __u64 sprg5;
+   __u64 sprg6;
+   __u64 sprg7;
+
+   __u64 gpr[32];
+};
+
+struct kvm_sregs {
+};
+
+struct kvm_fpu {
+   __u64 fpr[32];
+};
+
+struct kvm_debug_exit_arch {
+};
+
+/* for KVM_SET_GUEST_DEBUG */
+struct kvm_guest_debug_arch {
+};
+
+#endif /* __LINUX_KVM_POWERPC_H */
diff --git a/kvm/kernel/arch/powerpc/include/asm/kvm_44x.h b/kvm/kernel/arch/powerpc/include/asm/kvm_44x.h
new file mode 100644
index 0000000..956f252
--- /dev/null
+++ b/kvm/kernel/arch/powerpc/include/asm/kvm_44x.h
@@ -0,0 +1,108 @@
+#ifndef KVM_UNIFDEF_H
+#define KVM_UNIFDEF_H
+
+#ifdef __i386__
+#ifndef CONFIG_X86_32
+#define CONFIG_X86_32 1
+#endif
+#endif
+
+#ifdef __x86_64__
+#ifndef CONFIG_X86_64
+#define CONFIG_X86_64 1
+#endif
+#endif
+
+#if defined(__i386__) || defined (__x86_64__)
+#ifndef CONFIG_X86
+#define CONFIG_X86 1
+#endif
+#endif
+
+#ifdef __ia64__
+#ifndef CONFIG_IA64
+#define CONFIG_IA64 1
+#endif
+#endif
+
+#ifdef __PPC__
+#ifndef CONFIG_PPC
+#define CONFIG_PPC 1
+#endif
+#endif
+
+#ifdef __s390__
+#ifndef CONFIG_S390
+#define CONFIG_S390 1
+#endif
+#endif
+
+#endif
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard holl...@us.ibm.com
+ */
+
+#ifndef __ASM_44X_H__
+#define __ASM_44X_H__
+
+#include <linux/kvm_host.h>
+
+#define PPC44x_TLB_SIZE 64
+
+/* If the guest is expecting it, this can be as large as we like; we'd just
+ * need to find some way of advertising it. */
+#define KVM44x_GUEST_TLB_SIZE 64
+
+struct kvmppc_44x_tlbe {
+   u32 tid; /* Only the low 8 bits are used. */
+   u32 word0;
+   u32 word1;
+   u32 word2;
+};
+
+struct kvmppc_44x_shadow_ref {
+   struct page *page;
+   u16 gtlb_index;
+   u8 writeable;
+   u8 tid;
+};
+
+struct kvmppc_vcpu_44x {
+   /* Unmodified copy of the guest's TLB. */
+   struct kvmppc_44x_tlbe guest_tlb[KVM44x_GUEST_TLB_SIZE];
+
+   /* References to guest pages in the hardware TLB. */
+   struct kvmppc_44x_shadow_ref shadow_refs[PPC44x_TLB_SIZE];
+
+   /* State of the shadow TLB at guest context switch time. */
+   struct kvmppc_44x_tlbe shadow_tlb[PPC44x_TLB_SIZE];
+   u8 shadow_tlb_mod[PPC44x_TLB_SIZE];
+
+   struct kvm_vcpu vcpu;
+};
+
+static inline struct kvmppc_vcpu_44x *to_44x(struct kvm_vcpu *vcpu)
+{
+   return 

[COMMIT stable-0.10] Merge commit 'v0.10.3' into stable-0.10

2009-05-03 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* commit 'v0.10.3':
  Update version for 0.10.3 release
  Implement cancellation method for dma async I/O (Avi Kivity)
  Convert vectored aio emulation to use a dedicated pool (Avi Kivity)
  Refactor aio callback allocation to use an aiocb pool (Avi Kivity)
  Fix hw/acpi.c build w/ DEBUG enabled
  Make sure not to fall through on error in loadvm
  Pci nic: pci_register_device can fail
  Fix serial option with -drive
  suport device driver initialization model
  kvm: Avoid COW if KVM MMU is asynchronous
  vnc: windup keypad keys for qemu console emulation


[COMMIT stable-0.10] kvm: Add release script

2009-05-03 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/scripts/make-release b/kvm/scripts/make-release
new file mode 100755
index 0000000..3b1dccf
--- /dev/null
+++ b/kvm/scripts/make-release
@@ -0,0 +1,60 @@
+#!/bin/bash -e
+
+usage() {
+echo usage: $0 [--upload] [--formal] commit [name]
+exit 1
+}
+
+[[ -f ~/.kvmreleaserc ]] && . ~/.kvmreleaserc
+
+upload=
+formal=
+
+releasedir=~/sf-release
+[[ -z $TMP ]] && TMP=/tmp
+tmpdir=$TMP/qemu-kvm-make-release.$$
+while [[ $1 = -* ]]; do
+opt=$1
+shift
+case $opt in
+   --upload)
+   upload=yes
+   ;;
+   --formal)
+   formal=yes
+   ;;
+   *)
+   usage
+   ;;
+esac
+done
+
+commit=$1
+name=$2
+
+if [[ -z $commit ]]; then
+usage
+fi
+
+if [[ -z $name ]]; then
+name=$commit
+fi
+
+tarball=$releasedir/$name.tar
+
+cd $(dirname $0)/../..
+git archive --prefix=$name/ --format=tar $commit > $tarball
+
+if [[ -n $formal ]]; then
+mkdir -p $tmpdir
+echo $name > $tmpdir/KVM_VERSION
+tar -rf $tarball --transform s,^,$name/, -C $tmpdir KVM_VERSION
+rm -rf $tmpdir
+fi
+
+gzip -9 $tarball
+tarball=$tarball.gz
+
+if [[ -n $upload ]]; then
+rsync --progress -h $tarball a...@frs.sourceforge.net:uploads/
+fi


[COMMIT stable-0.10] kvm: disable kqemu

2009-05-03 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index 913bcb8..0cfdf7b 100755
--- a/configure
+++ b/configure
@@ -314,6 +314,7 @@ if [ $cpu = i386 -o $cpu = x86_64 ] ; then
 kqemu=yes
 audio_possible_drivers=$audio_possible_drivers fmod
 kvm=yes
+kqemu=no
 fi
 if [ $cpu = ia64 ] ; then
  kvm=yes


[COMMIT master] Revert "Sync idcache after emualted DMA operations for ia64"

2009-05-03 Thread Avi Kivity
From: Hollis Blanchard holl...@us.ibm.com

This reverts commit 9dc99a28236161a5a1b4c58f1e9c4ec6179cb976.
Aside from the other issues discussed on kvm-devel, this commit breaks the
PowerPC build.

Signed-off-by: Hollis Blanchard holl...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/cache-utils.h b/cache-utils.h
index 2a07fbd..b45fde4 100644
--- a/cache-utils.h
+++ b/cache-utils.h
@@ -33,22 +33,8 @@ static inline void flush_icache_range(unsigned long start, unsigned long stop)
     asm volatile ("sync" : : : "memory");
     asm volatile ("isync" : : : "memory");
 }
-#define qemu_sync_idcache flush_icache_range
-#else
 
-#ifdef __ia64__
-static inline void qemu_sync_idcache(unsigned long start, unsigned long stop)
-{
-    while (start < stop) {
-	asm volatile ("fc %0" :: "r"(start));
-	start += 32;
-    }
-    asm volatile (";;sync.i;;srlz.i;;");
-}
 #else
-static inline void qemu_sync_idcache(unsigned long start, unsigned long stop) {}
-#endif
-
 #define qemu_cache_utils_init(envp) do { (void) (envp); } while (0)
 #endif
 
diff --git a/cutils.c b/cutils.c
index e2bee1e..a1652ab 100644
--- a/cutils.c
+++ b/cutils.c
@@ -23,7 +23,6 @@
  */
 #include "qemu-common.h"
 #include "host-utils.h"
-#include "cache-utils.h"
 #include <assert.h>
 
 void pstrcpy(char *buf, int buf_size, const char *str)
@@ -177,8 +176,6 @@ void qemu_iovec_from_buffer(QEMUIOVector *qiov, const void *buf, size_t count)
         if (copy > qiov->iov[i].iov_len)
             copy = qiov->iov[i].iov_len;
         memcpy(qiov->iov[i].iov_base, p, copy);
-        qemu_sync_idcache((unsigned long)qiov->iov[i].iov_base,
-                    (unsigned long)(qiov->iov[i].iov_base + copy));
         p += copy;
         count -= copy;
     }
diff --git a/dma-helpers.c b/dma-helpers.c
index f71ca3b..f9eb224 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -9,7 +9,6 @@
 
 #include "dma.h"
 #include "block_int.h"
-#include "cache-utils.h"
 
 static AIOPool dma_aio_pool;
 
@@ -138,8 +137,6 @@ static BlockDriverAIOCB *dma_bdrv_io(
     BlockDriverCompletionFunc *cb, void *opaque,
     int is_write)
 {
-    int i;
-    QEMUIOVector *qiov;
     DMAAIOCB *dbs =  qemu_aio_get_pool(&dma_aio_pool, bs, cb, opaque);
 
     dbs->acb = NULL;
@@ -152,15 +149,6 @@ static BlockDriverAIOCB *dma_bdrv_io(
     dbs->bh = NULL;
     qemu_iovec_init(&dbs->iov, sg->nsg);
     dma_bdrv_cb(dbs, 0);
-
-    if (!is_write) {
-        qiov = &dbs->iov;
-        for (i = 0; i < qiov->niov; ++i) {
-           qemu_sync_idcache((unsigned long)qiov->iov[i].iov_base,
-                 (unsigned long)(qiov->iov[i].iov_base + qiov->iov[i].iov_len));
-       }
-    }
-
     if (!dbs->acb) {
         qemu_aio_release(dbs);
         return NULL;
diff --git a/exec.c b/exec.c
index 0c5545e..a5bca49 100644
--- a/exec.c
+++ b/exec.c
@@ -3400,9 +3400,6 @@ void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
 addr1 += l;
 access_len -= l;
 }
-if (kvm_enabled())
-   flush_icache_range((unsigned long)buffer,
-   (unsigned long)buffer + access_len);
 }
 return;
 }


[COMMIT stable-0.10] Fix warning when __ia64__ is not defined.

2009-05-03 Thread Avi Kivity
From: Hollis Blanchard holl...@us.ibm.com

Signed-off-by: Hollis Blanchard holl...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/libkvm/kvm-common.h b/kvm/libkvm/kvm-common.h
index de1ada2..9060820 100644
--- a/kvm/libkvm/kvm-common.h
+++ b/kvm/libkvm/kvm-common.h
@@ -22,7 +22,7 @@
 #define KVM_MAX_NUM_MEM_REGIONS 1u
 #define MAX_VCPUS 64
 #define LIBKVM_S390_ORIGIN (0UL)
-#elif __ia64__
+#elif defined(__ia64__)
 #define KVM_MAX_NUM_MEM_REGIONS 32u
 #define MAX_VCPUS 256
 #else


[COMMIT master] kvm: Add release script

2009-05-03 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/scripts/make-release b/kvm/scripts/make-release
new file mode 100755
index 0000000..3b1dccf
--- /dev/null
+++ b/kvm/scripts/make-release
@@ -0,0 +1,60 @@
+#!/bin/bash -e
+
+usage() {
+echo usage: $0 [--upload] [--formal] commit [name]
+exit 1
+}
+
+[[ -f ~/.kvmreleaserc ]] && . ~/.kvmreleaserc
+
+upload=
+formal=
+
+releasedir=~/sf-release
+[[ -z $TMP ]] && TMP=/tmp
+tmpdir=$TMP/qemu-kvm-make-release.$$
+while [[ $1 = -* ]]; do
+opt=$1
+shift
+case $opt in
+   --upload)
+   upload=yes
+   ;;
+   --formal)
+   formal=yes
+   ;;
+   *)
+   usage
+   ;;
+esac
+done
+
+commit=$1
+name=$2
+
+if [[ -z $commit ]]; then
+usage
+fi
+
+if [[ -z $name ]]; then
+name=$commit
+fi
+
+tarball=$releasedir/$name.tar
+
+cd $(dirname $0)/../..
+git archive --prefix=$name/ --format=tar $commit > $tarball
+
+if [[ -n $formal ]]; then
+mkdir -p $tmpdir
+echo $name > $tmpdir/KVM_VERSION
+tar -rf $tarball --transform s,^,$name/, -C $tmpdir KVM_VERSION
+rm -rf $tmpdir
+fi
+
+gzip -9 $tarball
+tarball=$tarball.gz
+
+if [[ -n $upload ]]; then
+rsync --progress -h $tarball a...@frs.sourceforge.net:uploads/
+fi


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Al Viro
On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
> +	/* We re-use eventfd for irqfd */
> +	fd = sys_eventfd2(0, 0);
> +	if (fd < 0) {
> +		ret = fd;
> +		goto fail;
> +	}
> +
> +	/* We maintain a reference to eventfd for the irqfd lifetime */
> +	file = eventfd_fget(fd);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto fail;
> +	}
> +
> +	irqfd->file = file;

This is just plain wrong.  You have no promise whatsoever that caller of
that sucker won't race with e.g. dup2().  IOW, you can't assume that
file will be of the expected kind.
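
The objection can be illustrated from userspace with a sketch that assumes
Linux, where each pipe has its own inode number. A descriptor number is
only a slot in a table, so looking it up a second time may find a
different file. The function below is hypothetical and deterministic: it
performs the dup2() itself rather than racing another thread.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if descriptor number `fd` refers to a different file after
 * dup2() than before, 0 if it is unchanged, -1 on error. */
static int fd_number_can_be_rebound(void)
{
    int a[2], b[2];
    struct stat before, after;
    int fd, changed;

    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    fd = a[0];                      /* the number we "remembered" */
    if (fstat(fd, &before) != 0 ||
        dup2(b[0], fd) < 0 ||       /* what a racing thread could do */
        fstat(fd, &after) != 0)
        changed = -1;
    else
        changed = (before.st_ino != after.st_ino);

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return changed;
}
```

In the quoted patch the window sits between sys_eventfd2() returning the
number and eventfd_fget() resolving it again.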
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: build system Add link to qemu

2009-05-03 Thread Avi Kivity

Jan Kiszka wrote:
> I'm getting closer to a working qemu-kvm, but there are still a few
> messy parts. The magic dance goes like this:
>
> cd qemu-kvm/kvm
> ln -s .. qemu   (or apply patch below)
> ./configure -whatever
> make
>
> Still, this is unintuitive. As both top-level configure and Makefile
> already differ from upstream, I see no reason not tweaking them also in
> way that ./configure && make from the top-level directory behaves as
> expected again. May look into this later (and the other warnings the
> build threw at me), now I've to understand an ugly shadow page table
> inconsistency of kvm...
>
> Jan
>
> ---
>
> Subject: [PATCH] qemu-kvm: build system Add link to qemu
>
> Signed-off-by: Jan Kiszka jan.kis...@siemens.com
> ---
>  kvm/qemu |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>  create mode 120000 kvm/qemu
>
> diff --git a/kvm/qemu b/kvm/qemu
> new file mode 120000
> index 0000000..a96aa0e
> --- /dev/null
> +++ b/kvm/qemu
> @@ -0,0 +1 @@
> +..
> \ No newline at end of file

This shouldn't be needed.  Can you confirm this with current qemu-kvm.git?

--
error compiling committee.c: too many arguments to function



Re: PPC support for qemu-kvm

2009-05-03 Thread Avi Kivity

Hollis Blanchard wrote:
> These patches fix a number of issues with PowerPC builds of qemu-kvm.git.
> However, even after applying these patches it still doesn't build, due to
> confusion with KVM_UPSTREAM and CONFIG_KVM.

Applied all, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] kvm: trivial format fix in setup_routing_entry()

2009-05-03 Thread Avi Kivity

Chris Wright wrote:

Remove extra tab.

  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: Unable to boot guest on kernel 2.6.29.1 with kvm-84 or kvm-85

2009-05-03 Thread Avi Kivity

Avi Kivity wrote:

Kenni Lund wrote:

Avi Kivity a...@redhat.com wrote:
 

Kenni Lund wrote:
   

Ok, but as I write in my message, I'm using the
  

KVM modules from the latest upstream kernel, not the kvm-85
modules.
   
According to the KVM download page, 
http://www.linux-kvm.org/page/Downloads, any kernel above 2.6.25 
should work with the
  
latest KVM userspace. This has been true until now in my case, but 
it breaks

with 2.6.29.1 and that's the reason why I'm posting this bug report.
   


Can you try a bisect?



Yes, sorry for the late reply. I did the bisect as requested and it 
returned the following results:


# bad: [8d7bff2d72660d9d60aa371ae3d1356bbf329a09] Linux 2.6.29.1
# good: [4a6908a3a050aacc9c3a2f36b276b46c0629ad91] Linux 2.6.28
git bisect start 'v2.6.29.1' 'v2.6.28' '--' 'arch/x86/kvm' 'virt/kvm'
# good: [b82091824ee4970adf92d5cd6d57b12273171625] KVM: Prevent trace call into unloaded module text
git bisect good b82091824ee4970adf92d5cd6d57b12273171625
# good: [7f59f492da722eb3551bbe1f8f4450a21896f05d] KVM: use cpumask_var_t for cpus_hardware_enabled
git bisect good 7f59f492da722eb3551bbe1f8f4450a21896f05d
# good: [19de40a8472fa64693eab844911eec277d489f6c] KVM: change KVM to use IOMMU API
git bisect good 19de40a8472fa64693eab844911eec277d489f6c
# good: [2aaf69dcee864f4fb6402638dd2f263324ac839f] KVM: MMU: Map device MMIO as UC in EPT
git bisect good 2aaf69dcee864f4fb6402638dd2f263324ac839f
# good: [682edb4c01e690c7c7cd772dbd6f4e0fd74dc572] KVM: Fix assigned devices circular locking dependency
git bisect good 682edb4c01e690c7c7cd772dbd6f4e0fd74dc572
# bad: [f438349efb8247cd0c1d453a4131b1f801bf5691] KVM: VMX: Don't allow uninhibited access to EFER on i386
git bisect bad f438349efb8247cd0c1d453a4131b1f801bf5691
# good: [516a1a7e9dc80358030fe01aabb3bedf882db9e2] KVM: VMX: Flush volatile msrs before emulating rdmsr
git bisect good 516a1a7e9dc80358030fe01aabb3bedf882db9e2


And the final output:

f438349efb8247cd0c1d453a4131b1f801bf5691 is first bad commit
commit f438349efb8247cd0c1d453a4131b1f801bf5691
Author: Avi Kivity
Date:   Thu Mar 26 23:05:03 2009 +0000

    KVM: VMX: Don't allow uninhibited access to EFER on i386

    upstream commit: 16175a796d061833aacfbd9672235f2d2725df65

    vmx_set_msr() does not allow i386 guests to touch EFER, but they can still
    do so through the default: label in the switch.  If they set EFER_LME, they
    can oops the host.

    Fix by having EFER access through the normal channel (which will check for
    EFER_LME) even on i386.

    Reported-and-tested-by: Benjamin Gilbert
    Cc: sta...@kernel.org
    Signed-off-by: Avi Kivity
    Signed-off-by: Chris Wright

:040000 040000 cf7848d35c136beee6665e67839080d450977af0 0a39980481dd346306b2ac54dbe916741515f1f1 M  arch




FYI, I also tested 2.6.29.2 and the issue still exists.

Do you need more information?

  


Please try the attached patch.



It won't help - I reproduced the issue.  Instead, try passing the 
parameter '-cpu qemu32' (or '-cpu qemu64,-nx').


--
error compiling committee.c: too many arguments to function



Re: Implement generic double fault generation mechanism

2009-05-03 Thread Gleb Natapov
On Thu, Apr 30, 2009 at 03:24:07PM +0800, Dong, Eddie wrote:
 
 
 Move Double-Fault generation logic out of page fault
 exception generating function to cover more generic case.
 
 Signed-off-by: Eddie Dong eddie.d...@intel.com
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index ab1fdac..51a8dad 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -162,12 +162,59 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data)
  }
  EXPORT_SYMBOL_GPL(kvm_set_apic_base);
  
 +#define EXCPT_BENIGN 0
 +#define EXCPT_CONTRIBUTORY   1
 +#define EXCPT_PF 2
 +
 +static int exception_class(int vector)
 +{
 +	if (vector == 14)
 +		return EXCPT_PF;
 +	else if (vector == 0 || (vector >= 10 && vector <= 13))
 +		return EXCPT_CONTRIBUTORY;
 +	else
 +		return EXCPT_BENIGN;
 +}
 +
This makes double fault (8) a benign exception. Surely not what you want.

 +static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 +		unsigned nr, bool has_error, u32 error_code)
 +{
 +	u32 prev_nr;
 +	int class1, class2;
 +
 +	if (!vcpu->arch.exception.pending) {
 +		vcpu->arch.exception.pending = true;
 +		vcpu->arch.exception.has_error_code = has_error;
 +		vcpu->arch.exception.nr = nr;
 +		vcpu->arch.exception.error_code = error_code;
 +		return;
 +	}
 +
 +	/* to check exception */
 +	prev_nr = vcpu->arch.exception.nr;
 +	class2 = exception_class(nr);
 +	class1 = exception_class(prev_nr);
 +	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
 +		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
 +		/* generate double fault per SDM Table 5-5 */
 +		printk(KERN_DEBUG "kvm: double fault 0x%x on 0x%x\n",
 +			prev_nr, nr);
 +		vcpu->arch.exception.pending = true;
 +		vcpu->arch.exception.has_error_code = 1;
 +		vcpu->arch.exception.nr = DF_VECTOR;
 +		vcpu->arch.exception.error_code = 0;
 +		if (prev_nr == DF_VECTOR) {
 +			/* triple fault - shutdown */
 +			set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
 +		}
 +	} else
 +		printk(KERN_ERR "Exception 0x%x on 0x%x happens serially\n",
 +			prev_nr, nr);
 +}
When two exceptions happen serially, it is better to replace the pending
exception with the new one. This way the first exception (which is lost) will
be regenerated when the instruction is re-executed.
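
For illustration only, here is a minimal user-space sketch of how the two
review comments above could be folded into the decision logic. It is not the
code that was eventually merged. The names combine_exceptions() and
enum outcome are hypothetical, and handling vector 8 before classification is
an assumption made for this example: a pending double fault followed by a
contributory exception or page fault escalates to shutdown, and exceptions
that merely happen serially result in the new one replacing the pending one.

```c
#define EXCPT_BENIGN        0
#define EXCPT_CONTRIBUTORY  1
#define EXCPT_PF            2

#define DF_VECTOR  8
#define PF_VECTOR 14

/* Hypothetical result type: what to queue after a second exception. */
enum outcome { DELIVER_NEW, DOUBLE_FAULT, TRIPLE_FAULT };

static int exception_class(int vector)
{
    if (vector == PF_VECTOR)
        return EXCPT_PF;
    if (vector == 0 || (vector >= 10 && vector <= 13))
        return EXCPT_CONTRIBUTORY;
    return EXCPT_BENIGN;
}

/* prev_nr is the pending vector, nr the newly raised one. */
static enum outcome combine_exceptions(int prev_nr, int nr)
{
    int class1 = exception_class(prev_nr);
    int class2 = exception_class(nr);

    /* Assumption: a fault while a double fault is pending means shutdown. */
    if (prev_nr == DF_VECTOR)
        return class2 == EXCPT_BENIGN ? DELIVER_NEW : TRIPLE_FAULT;

    if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
        (class1 == EXCPT_PF && class2 != EXCPT_BENIGN))
        return DOUBLE_FAULT;

    /* Serial case: the new exception replaces the pending one. */
    return DELIVER_NEW;
}
```

With this shape the caller only has to act on three outcomes, and the serial
case no longer needs a KERN_ERR message.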

--
Gleb.


[PATCH] kvm: Add helpers for checking and requiring kvm extensions

2009-05-03 Thread Avi Kivity
Instead of open-coding the check extension sequence, provide helpers
for checking whether an extension exists, and for aborting if an
extension is missing.

Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm-all.c |   63 +++--
 kvm.h |6 +
 target-i386/kvm.c |8 +--
 3 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 36659a9..1642a2a 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -64,6 +64,30 @@ struct KVMState
 
 static KVMState *kvm_state;
 
+int kvm_check_extension(int extension)
+{
+    int ret;
+
+    ret = kvm_ioctl(kvm_state, KVM_CHECK_EXTENSION, extension);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_CHECK_EXTENSION failed: %s\n", strerror(errno));
+        exit(1);
+    }
+    return ret;
+}
+
+int kvm_require_extension(int extension, const char *estr)
+{
+    int ret;
+
+    ret = kvm_check_extension(extension);
+    if (!ret) {
+        fprintf(stderr, "Required KVM extension %s not present\n", estr);
+        exit(1);
+    }
+    return ret;
+}
+
 static KVMSlot *kvm_alloc_slot(KVMState *s)
 {
 int i;
@@ -331,6 +355,8 @@ int kvm_init(int smp_cpus)
 
     s = qemu_mallocz(sizeof(KVMState));
 
+    kvm_state = s;
+
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     TAILQ_INIT(&s->kvm_sw_breakpoints);
 #endif
@@ -368,42 +394,22 @@ int kvm_init(int smp_cpus)
  * just use a user allocated buffer so we can use regular pages
  * unmodified.  Make sure we have a sufficiently modern version of KVM.
  */
-    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY);
-    if (ret <= 0) {
-        if (ret == 0)
-            ret = -EINVAL;
-        fprintf(stderr, "kvm does not support KVM_CAP_USER_MEMORY\n");
-        goto err;
-    }
+    KVM_REQUIRE_EXTENSION(KVM_CAP_USER_MEMORY);
 
     /* There was a nasty bug in < kvm-80 that prevents memory slots from being
      * destroyed properly.  Since we rely on this capability, refuse to work
      * with any kernel without this capability. */
-    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION,
-                    KVM_CAP_DESTROY_MEMORY_REGION_WORKS);
-    if (ret <= 0) {
-        if (ret == 0)
-            ret = -EINVAL;
-
-        fprintf(stderr,
-                "KVM kernel module broken (DESTROY_MEMORY_REGION)\n"
-                "Please upgrade to at least kvm-81.\n");
-        goto err;
-    }
+    KVM_REQUIRE_EXTENSION(KVM_CAP_DESTROY_MEMORY_REGION_WORKS);
 
     s->coalesced_mmio = 0;
 #ifdef KVM_CAP_COALESCED_MMIO
-    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_COALESCED_MMIO);
-    if (ret > 0)
-        s->coalesced_mmio = ret;
+    s->coalesced_mmio = kvm_check_extension(KVM_CAP_COALESCED_MMIO);
 #endif
 
     ret = kvm_arch_init(s, smp_cpus);
     if (ret < 0)
         goto err;
 
-    kvm_state = s;
-
     return 0;
 
 err:
@@ -415,6 +421,8 @@ err:
 }
 qemu_free(s);
 
+kvm_state = NULL;
+
 return ret;
 }
 
@@ -763,14 +771,7 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...)
 
 int kvm_has_sync_mmu(void)
 {
-#ifdef KVM_CAP_SYNC_MMU
-    KVMState *s = kvm_state;
-
-    if (kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_SYNC_MMU) > 0)
-        return 1;
-#endif
-
-    return 0;
+    return kvm_check_extension(KVM_CAP_SYNC_MMU);
 }
 
 void kvm_setup_guest_memory(void *start, size_t size)
diff --git a/kvm.h b/kvm.h
index 0ea2426..bd4e8d4 100644
--- a/kvm.h
+++ b/kvm.h
@@ -29,6 +29,12 @@ struct kvm_run;
 
 /* external API */
 
+#define KVM_REQUIRE_EXTENSION(extension) \
+kvm_require_extension(extension, #extension)
+
+int kvm_check_extension(int extension);
+int kvm_require_extension(int extension, const char *estr);
+
 int kvm_init(int smp_cpus);
 
 int kvm_init_vcpu(CPUState *env);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2de8b81..b534b2d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -154,19 +154,13 @@ static int kvm_has_msr_star(CPUState *env)
 
 int kvm_arch_init(KVMState *s, int smp_cpus)
 {
-int ret;
-
 /* create vm86 tss.  KVM uses vm86 mode to emulate 16-bit code
  * directly.  In order to use vm86 mode, a TSS is needed.  Since this
  * must be part of guest physical memory, we need to allocate it.  Older
  * versions of KVM just assumed that it would be at the end of physical
  * memory but that doesn't work with more than 4GB of memory.  We simply
  * refuse to work with those older versions of KVM. */
-    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_SET_TSS_ADDR);
-    if (ret <= 0) {
-        fprintf(stderr, "kvm does not support KVM_CAP_SET_TSS_ADDR\n");
-        return ret;
-    }
+    KVM_REQUIRE_EXTENSION(KVM_CAP_SET_TSS_ADDR);
 
 /* this address is 3 pages before the bios, and the bios should present
  * as unavaible memory.  FIXME, need to ensure the e820 map deals with
-- 
1.6.1.1
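
As a rough, self-contained sketch of the helper pattern in the patch above
(not QEMU code): a check function wraps the capability query, a require
function reports a missing capability by name, and a macro stringifies the
capability constant so call sites stay short. Everything here is hypothetical
for the sake of the example: the CAP_* numbers, fake_check_ioctl() standing in
for the KVM_CHECK_EXTENSION ioctl, and the choice to return an error code
instead of calling exit(1) as the patch does.

```c
#include <stdio.h>

/* Hypothetical capability numbers standing in for KVM_CAP_* constants. */
#define CAP_USER_MEMORY 3
#define CAP_SYNC_MMU    16

/* Stand-in for the ioctl: >0 means present, 0 absent, <0 error. */
static int fake_check_ioctl(int extension)
{
    return extension == CAP_USER_MEMORY ? 1 : 0;
}

static int check_extension(int extension)
{
    int ret = fake_check_ioctl(extension);

    if (ret < 0) {
        fprintf(stderr, "capability query failed\n");
        return 0;
    }
    return ret;
}

/* Returns 0 if the capability is present, -1 otherwise. */
static int require_extension(int extension, const char *name)
{
    if (!check_extension(extension)) {
        fprintf(stderr, "Required extension %s not present\n", name);
        return -1;
    }
    return 0;
}

/* #ext turns the constant's spelling into the string used in the message. */
#define REQUIRE_EXTENSION(ext) require_extension(ext, #ext)
```

A caller would write REQUIRE_EXTENSION(CAP_USER_MEMORY) and get an error
message that names the constant without repeating it by hand.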


Re: Unable to boot guest on kernel 2.6.29.1 with kvm-84 or kvm-85

2009-05-03 Thread Kenni Lund
Avi Kivity wrote:
 Kenni Lund wrote:  
 Avi Kivity a...@redhat.com wrote:  
   
 Kenni Lund wrote:
  
 Ok, but as I write in my message, I'm using the

 KVM modules from the latest upstream kernel, not the kvm-85
 modules.   
 
 According to the KVM download page,
 http://www.linux-kvm.org/page/Downloads, any kernel above 2.6.25
 should work with the 
  
 latest KVM userspace. This has been true until now in my case, but
 it breaks  
 with 2.6.29.1 and that's the reason why I'm posting this bug report.
  
 
 Can you try a bisect?
  
   
 Yes, sorry for the late reply. I did the bisect as requested and it   
 returned the following results:   
   
 # bad: [8d7bff2d72660d9d60aa371ae3d1356bbf329a09] Linux 2.6.29.1
 # good: [4a6908a3a050aacc9c3a2f36b276b46c0629ad91] Linux 2.6.28
 git bisect start 'v2.6.29.1' 'v2.6.28' '--' 'arch/x86/kvm' 'virt/kvm'
 # good: [b82091824ee4970adf92d5cd6d57b12273171625] KVM: Prevent trace call into unloaded module text
 git bisect good b82091824ee4970adf92d5cd6d57b12273171625
 # good: [7f59f492da722eb3551bbe1f8f4450a21896f05d] KVM: use cpumask_var_t for cpus_hardware_enabled
 git bisect good 7f59f492da722eb3551bbe1f8f4450a21896f05d
 # good: [19de40a8472fa64693eab844911eec277d489f6c] KVM: change KVM to use IOMMU API
 git bisect good 19de40a8472fa64693eab844911eec277d489f6c
 # good: [2aaf69dcee864f4fb6402638dd2f263324ac839f] KVM: MMU: Map device MMIO as UC in EPT
 git bisect good 2aaf69dcee864f4fb6402638dd2f263324ac839f
 # good: [682edb4c01e690c7c7cd772dbd6f4e0fd74dc572] KVM: Fix assigned devices circular locking dependency
 git bisect good 682edb4c01e690c7c7cd772dbd6f4e0fd74dc572
 # bad: [f438349efb8247cd0c1d453a4131b1f801bf5691] KVM: VMX: Don't allow uninhibited access to EFER on i386
 git bisect bad f438349efb8247cd0c1d453a4131b1f801bf5691
 # good: [516a1a7e9dc80358030fe01aabb3bedf882db9e2] KVM: VMX: Flush volatile msrs before emulating rdmsr
 git bisect good 516a1a7e9dc80358030fe01aabb3bedf882db9e2
   
   
 And the final output: 
   
 f438349efb8247cd0c1d453a4131b1f801bf5691 is first bad commit
 commit f438349efb8247cd0c1d453a4131b1f801bf5691
 Author: Avi Kivity
 Date:   Thu Mar 26 23:05:03 2009 +0000

     KVM: VMX: Don't allow uninhibited access to EFER on i386

     upstream commit: 16175a796d061833aacfbd9672235f2d2725df65

     vmx_set_msr() does not allow i386 guests to touch EFER, but they can still
     do so through the default: label in the switch.  If they set EFER_LME, they
     can oops the host.

     Fix by having EFER access through the normal channel (which will check for
     EFER_LME) even on i386.

     Reported-and-tested-by: Benjamin Gilbert
     Cc: sta...@kernel.org
     Signed-off-by: Avi Kivity
     Signed-off-by: Chris Wright

 :040000 040000 cf7848d35c136beee6665e67839080d450977af0 0a39980481dd346306b2ac54dbe916741515f1f1 M  arch

 

 FYI, I also tested 2.6.29.2 and the issue still exists.

 Do you need more information?



 Please try the attached patch.


It won't help - I reproduced the issue. Instead, try passing the
parameter '-cpu qemu32' (or '-cpu qemu64,-nx').

Adding the parameter '-cpu qemu32' (32bit host + 32 bit guest) makes the WinXP 
guest boot.
...but is this parameter equal to '-no-kvm'? Eg. with emulated CPU?

Best Regards
Kenni Lund

Re: Unable to boot guest on kernel 2.6.29.1 with kvm-84 or kvm-85

2009-05-03 Thread Avi Kivity

Kenni Lund wrote:


It won't help - I reproduced the issue. Instead, try passing the
parameter '-cpu qemu32' (or '-cpu qemu64,-nx').



Adding the parameter '-cpu qemu32' (32bit host + 32 bit guest) makes the WinXP 
guest boot.
...but is this parameter equal to '-no-kvm'? Eg. with emulated CPU?
  


No, as you can tell from the speed (and 'info kvm' output).

'-cpu qemu32' means use the cpu features exposed by the virtual qemu32 
cpu.  The default is qemu64, which includes 64-bit support and NX.  Your 
host kernel doesn't have NX support, but qemu erroneously reports that it 
does, causing Windows to get confused.
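
To make the effect concrete, here is a small sketch (not taken from qemu) of
what hiding NX from a guest amounts to. NX is bit 20 and long mode is bit 29
of EDX in CPUID leaf 0x80000001; the function name guest_ext2_features() and
its host_has_nx parameter are made up for this example.

```c
#include <stdint.h>

#define CPUID_EXT2_NX (1U << 20)  /* no-execute page protection */
#define CPUID_EXT2_LM (1U << 29)  /* long mode (64-bit) */

/* Clear NX from the CPU model's feature word when the host cannot back it. */
static uint32_t guest_ext2_features(uint32_t model_features, int host_has_nx)
{
    if (!host_has_nx)
        model_features &= ~CPUID_EXT2_NX;
    return model_features;
}
```

Passing '-cpu qemu64,-nx' has roughly this effect by hand: the NX bit is
dropped while the rest of the model is left alone.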


--
error compiling committee.c: too many arguments to function



[PATCH 0/4] Fix kvm cpuid reporting

2009-05-03 Thread Avi Kivity
kvm supports an interface for reporting which cpuid features are supported.
Use it for trimming the cpu feature set reported to the guest.  This prevents,
for example, reporting NX to a guest when in fact we do not support it.

Avi Kivity (4):
  kvm: Add support for querying supported cpu features
  Make x86 cpuid feature names available in file scope
  Fix x86 feature modifications for features that set multiple bits
  kvm: Trim cpu features not supported by kvm

 kvm.h|3 ++
 target-i386/helper.c |   98 +
 target-i386/kvm.c|   80 
 3 files changed, 149 insertions(+), 32 deletions(-)



[PATCH 2/4] Make x86 cpuid feature names available in file scope

2009-05-03 Thread Avi Kivity
To be used later.

Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/helper.c |   55 +
 1 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index a070e08..88585b8 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -32,39 +32,40 @@
 
 //#define DEBUG_MMU
 
+/* feature flags taken from Intel Processor Identification and the CPUID
+ * Instruction and AMD's CPUID Specification. In cases of disagreement
+ * about feature names, the Linux name is used. */
+static const char *feature_name[] = {
+fpu, vme, de, pse, tsc, msr, pae, mce,
+cx8, apic, NULL, sep, mtrr, pge, mca, cmov,
+pat, pse36, pn /* Intel psn */, clflush /* Intel clfsh */, NULL, 
ds /* Intel dts */, acpi, mmx,
+fxsr, sse, sse2, ss, ht /* Intel htt */, tm, ia64, pbe,
+};
+static const char *ext_feature_name[] = {
+pni /* Intel,AMD sse3 */, NULL, NULL, monitor, ds_cpl, vmx, NULL 
/* Linux smx */, est,
+tm2, ssse3, cid, NULL, NULL, cx16, xtpr, NULL,
+NULL, NULL, dca, NULL, NULL, NULL, NULL, popcnt,
+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+};
+static const char *ext2_feature_name[] = {
+fpu, vme, de, pse, tsc, msr, pae, mce,
+cx8 /* AMD CMPXCHG8B */, apic, NULL, syscall, mtrr, pge, mca, 
cmov,
+pat, pse36, NULL, NULL /* Linux mp */, nx /* Intel xd */, NULL, 
mmxext, mmx,
+fxsr, fxsr_opt /* AMD ffxsr */, pdpe1gb /* AMD Page1GB */, rdtscp, 
NULL, lm /* Intel 64 */, 3dnowext, 3dnow,
+};
+static const char *ext3_feature_name[] = {
+lahf_lm /* AMD LahfSahf */, cmp_legacy, svm, extapic /* AMD 
ExtApicSpace */, cr8legacy /* AMD AltMovCr8 */, abm, sse4a, misalignsse,
+3dnowprefetch, osvw, NULL /* Linux ibs */, NULL, skinit, wdt, 
NULL, NULL,
+NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+};
+
 static void add_flagname_to_bitmaps(char *flagname, uint32_t *features, 
 uint32_t *ext_features, 
 uint32_t *ext2_features, 
 uint32_t *ext3_features)
 {
 int i;
-/* feature flags taken from Intel Processor Identification and the CPUID
- * Instruction and AMD's CPUID Specification. In cases of disagreement 
- * about feature names, the Linux name is used. */
-static const char *feature_name[] = {
-fpu, vme, de, pse, tsc, msr, pae, mce,
-cx8, apic, NULL, sep, mtrr, pge, mca, cmov,
-pat, pse36, pn /* Intel psn */, clflush /* Intel clfsh */, 
NULL, ds /* Intel dts */, acpi, mmx,
-fxsr, sse, sse2, ss, ht /* Intel htt */, tm, ia64, pbe,
-};
-static const char *ext_feature_name[] = {
-   pni /* Intel,AMD sse3 */, NULL, NULL, monitor, ds_cpl, vmx, 
NULL /* Linux smx */, est,
-   tm2, ssse3, cid, NULL, NULL, cx16, xtpr, NULL,
-   NULL, NULL, dca, NULL, NULL, NULL, NULL, popcnt,
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-};
-static const char *ext2_feature_name[] = {
-   fpu, vme, de, pse, tsc, msr, pae, mce,
-   cx8 /* AMD CMPXCHG8B */, apic, NULL, syscall, mtrr, pge, 
mca, cmov,
-   pat, pse36, NULL, NULL /* Linux mp */, nx /* Intel xd */, NULL, 
mmxext, mmx,
-   fxsr, fxsr_opt /* AMD ffxsr */, pdpe1gb /* AMD Page1GB */, 
rdtscp, NULL, lm /* Intel 64 */, 3dnowext, 3dnow,
-};
-static const char *ext3_feature_name[] = {
-   lahf_lm /* AMD LahfSahf */, cmp_legacy, svm, extapic /* AMD 
ExtApicSpace */, cr8legacy /* AMD AltMovCr8 */, abm, sse4a, misalignsse,
-   3dnowprefetch, osvw, NULL /* Linux ibs */, NULL, skinit, wdt, 
NULL, NULL,
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-};
 
 for ( i = 0 ; i  32 ; i++ ) 
 if (feature_name[i]  !strcmp (flagname, feature_name[i])) {
-- 
1.6.1.1



[PATCH 3/4] Fix x86 feature modifications for features that set multiple bits

2009-05-03 Thread Avi Kivity
QEMU allows adding or removing cpu features by using the syntax '-cpu +feature'
or '-cpu -feature'.  Some cpuid features cause more than one bit to be set or
cleared; but QEMU stops after just one bit has been modified, causing the
feature bits to be inconsistent.

Fix by allowing all feature bits corresponding to a given name to be set.

Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/helper.c |   13 -
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 88585b8..0c91133 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -66,28 +66,31 @@ static void add_flagname_to_bitmaps(char *flagname, 
uint32_t *features,
 uint32_t *ext3_features)
 {
 int i;
+    int found = 0;
 
     for ( i = 0 ; i < 32 ; i++ )
         if (feature_name[i] && !strcmp (flagname, feature_name[i])) {
             *features |= 1 << i;
-            return;
+            found = 1;
         }
     for ( i = 0 ; i < 32 ; i++ )
         if (ext_feature_name[i] && !strcmp (flagname, ext_feature_name[i])) {
             *ext_features |= 1 << i;
-            return;
+            found = 1;
         }
     for ( i = 0 ; i < 32 ; i++ )
         if (ext2_feature_name[i] && !strcmp (flagname, ext2_feature_name[i])) {
             *ext2_features |= 1 << i;
-            return;
+            found = 1;
         }
     for ( i = 0 ; i < 32 ; i++ )
         if (ext3_feature_name[i] && !strcmp (flagname, ext3_feature_name[i])) {
             *ext3_features |= 1 << i;
-            return;
+            found = 1;
         }
-    fprintf(stderr, "CPU feature %s not found\n", flagname);
+    if (!found) {
+        fprintf(stderr, "CPU feature %s not found\n", flagname);
+    }
 }
 
 typedef struct x86_def_t {
-- 
1.6.1.1
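
A reduced sketch of the behaviour this patch is after, assuming two feature
words instead of four. The tables names_a and names_b and the helpers
set_matching() and add_flag() are invented for the example; only the idea
(keep scanning after the first hit, report an error only if nothing matched)
comes from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Two toy name tables; "mmx" deliberately appears in both words. */
static const char *names_a[32] = { [0] = "fpu", [23] = "mmx" };
static const char *names_b[32] = { [0] = "fpu", [20] = "nx", [23] = "mmx" };

/* Set every bit in *bits whose name equals flag; return 1 if any matched. */
static int set_matching(const char *flag, const char *names[32], uint32_t *bits)
{
    int found = 0;

    for (int i = 0; i < 32; i++) {
        if (names[i] && !strcmp(flag, names[i])) {
            *bits |= 1U << i;
            found = 1;
        }
    }
    return found;
}

/* Scan all tables instead of returning after the first hit. */
static int add_flag(const char *flag, uint32_t *a, uint32_t *b)
{
    int found = set_matching(flag, names_a, a);

    found |= set_matching(flag, names_b, b);
    return found;
}
```

With the early return of the old code, add_flag("mmx", ...) would have set
the bit in the first word only and left the two words inconsistent.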



[PATCH 4/4] kvm: Trim cpu features not supported by kvm

2009-05-03 Thread Avi Kivity
Remove cpu features that are not supported by kvm from the cpuid features
reported to the guest.

Signed-off-by: Avi Kivity a...@redhat.com
---
 target-i386/helper.c |   30 ++
 1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 0c91133..bdf242b 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -93,6 +93,21 @@ static void add_flagname_to_bitmaps(char *flagname, uint32_t 
*features,
 }
 }
 
+static void kvm_trim_features(uint32_t *features, uint32_t supported,
+                              const char *names[])
+{
+    int i;
+    uint32_t mask;
+
+    for (i = 0; i < 32; ++i) {
+        mask = 1U << i;
+        if ((*features & mask) && !(supported & mask)) {
+            printf("Processor feature %s not supported by kvm\n", names[i]);
+            *features &= ~mask;
+        }
+    }
+}
+
 typedef struct x86_def_t {
 const char *name;
 uint32_t level;
@@ -1699,5 +1714,20 @@ CPUX86State *cpu_x86_init(const char *cpu_model)
 
 qemu_init_vcpu(env);
 
+    if (kvm_enabled()) {
+        kvm_trim_features(&env->cpuid_features,
+                          kvm_arch_get_supported_cpuid(env, 1, R_EDX),
+                          feature_name);
+        kvm_trim_features(&env->cpuid_ext_features,
+                          kvm_arch_get_supported_cpuid(env, 1, R_ECX),
+                          ext_feature_name);
+        kvm_trim_features(&env->cpuid_ext2_features,
+                          kvm_arch_get_supported_cpuid(env, 0x80000001, R_EDX),
+                          ext2_feature_name);
+        kvm_trim_features(&env->cpuid_ext3_features,
+                          kvm_arch_get_supported_cpuid(env, 0x80000001, R_ECX),
+                          ext3_feature_name);
+    }
+
 return env;
 }
-- 
1.6.1.1
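
The trimming step can be tried outside QEMU with a stand-alone variant like
the one below. It is a sketch, not the patch itself: trim_features() returns
the number of cleared bits so the result is easy to check, and it prints a
placeholder when a bit has no name, which the patch does not do (it passes
names[i] to printf unconditionally).

```c
#include <stdint.h>
#include <stdio.h>

/* Clear every requested bit that is not in 'supported'; return how many. */
static int trim_features(uint32_t *features, uint32_t supported,
                         const char *names[32])
{
    int trimmed = 0;

    for (int i = 0; i < 32; ++i) {
        uint32_t mask = 1U << i;

        if ((*features & mask) && !(supported & mask)) {
            printf("Processor feature %s not supported\n",
                   names[i] ? names[i] : "(unnamed)");
            *features &= ~mask;
            trimmed++;
        }
    }
    return trimmed;
}
```

The net effect on the feature word is the same as *features &= supported;
the loop exists only so that each dropped feature can be reported by name.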



[PATCH 1/4] kvm: Add support for querying supported cpu features

2009-05-03 Thread Avi Kivity
kvm does not support all cpu features; add support for dynamically querying
the supported feature set.

Signed-off-by: Avi Kivity a...@redhat.com
---
 kvm.h |3 ++
 target-i386/kvm.c |   80 +
 2 files changed, 83 insertions(+), 0 deletions(-)

diff --git a/kvm.h b/kvm.h
index bd4e8d4..c134c45 100644
--- a/kvm.h
+++ b/kvm.h
@@ -124,6 +124,9 @@ void kvm_arch_remove_all_hw_breakpoints(void);
 
 void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg);
 
+uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
+  int reg);
+
 /* generic hooks - to be moved/refactored once there are more users */
 
 static inline void cpu_synchronize_state(CPUState *env, int modified)
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index b534b2d..5f54ff5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -34,6 +34,86 @@
 do { } while (0)
 #endif
 
+#ifdef KVM_CAP_EXT_CPUID
+
+static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
+{
+    struct kvm_cpuid2 *cpuid;
+    int r, size;
+
+    size = sizeof(*cpuid) + max * sizeof(*cpuid->entries);
+    cpuid = (struct kvm_cpuid2 *)qemu_mallocz(size);
+    cpuid->nent = max;
+    r = kvm_ioctl(s, KVM_GET_SUPPORTED_CPUID, cpuid);
+    if (r < 0) {
+        if (r == -E2BIG) {
+            qemu_free(cpuid);
+            return NULL;
+        } else {
+            fprintf(stderr, "KVM_GET_SUPPORTED_CPUID failed: %s\n",
+                    strerror(-r));
+            exit(1);
+        }
+    }
+    return cpuid;
+}
+
+uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg)
+{
+    struct kvm_cpuid2 *cpuid;
+    int i, max;
+    uint32_t ret = 0;
+    uint32_t cpuid_1_edx;
+
+    if (!kvm_check_extension(KVM_CAP_EXT_CPUID)) {
+        return -1U;
+    }
+
+    max = 1;
+    while ((cpuid = try_get_cpuid(env->kvm_state, max)) == NULL) {
+        max *= 2;
+    }
+
+    for (i = 0; i < cpuid->nent; ++i) {
+        if (cpuid->entries[i].function == function) {
+            switch (reg) {
+            case R_EAX:
+                ret = cpuid->entries[i].eax;
+                break;
+            case R_EBX:
+                ret = cpuid->entries[i].ebx;
+                break;
+            case R_ECX:
+                ret = cpuid->entries[i].ecx;
+                break;
+            case R_EDX:
+                ret = cpuid->entries[i].edx;
+                if (function == 0x80000001) {
+                    /* On Intel, kvm returns cpuid according to the Intel spec,
+                     * so add missing bits according to the AMD spec:
+                     */
+                    cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, R_EDX);
+                    ret |= cpuid_1_edx & 0xdfeff7ff;
+                }
+                break;
+            }
+        }
+    }
+
+    qemu_free(cpuid);
+
+    return ret;
+}
+
+#else
+
+uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function, int reg)
+{
+    return -1U;
+}
+
+#endif
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
 struct {
-- 
1.6.1.1
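
The retry loop around try_get_cpuid() follows a common pattern: start with a
small buffer, and double it for as long as the callee answers -E2BIG. The
sketch below shows the pattern in isolation. fake_query(), FAKE_ENTRIES and
query_all() are hypothetical stand-ins; no KVM ioctl is involved, and other
errors are returned as NULL rather than terminating the program as the patch
does.

```c
#include <errno.h>
#include <stdlib.h>

#define FAKE_ENTRIES 5  /* entries the pretend backend wants to return */

/* Stand-in for the ioctl: fails with -E2BIG while the buffer is too small. */
static int fake_query(int *entries, int max)
{
    if (max < FAKE_ENTRIES)
        return -E2BIG;
    for (int i = 0; i < FAKE_ENTRIES; i++)
        entries[i] = i;
    return FAKE_ENTRIES;
}

/* Grow the buffer geometrically until the query fits; caller frees it. */
static int *query_all(int *count)
{
    int max = 1;

    for (;;) {
        int *buf = calloc(max, sizeof(*buf));
        int r;

        if (!buf)
            return NULL;
        r = fake_query(buf, max);
        if (r >= 0) {
            *count = r;
            return buf;
        }
        free(buf);
        if (r != -E2BIG)
            return NULL;  /* any other error is not retried */
        max *= 2;
    }
}
```

Doubling keeps the number of failed attempts logarithmic in the final size,
at the cost of allocating at most about twice what is needed.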



Re: [PATCH] Revert Sync idcache after emualted DMA operations for ia64

2009-05-03 Thread Avi Kivity

Hollis Blanchard wrote:

This reverts commit 9dc99a28236161a5a1b4c58f1e9c4ec6179cb976.
Aside from the other issues discussed on kvm-devel, this commit breaks the
PowerPC build.

  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 03/04] qemu-kvm: Remove the dependency for phys_ram_base for ipf.c

2009-05-03 Thread Avi Kivity

Jes Sorensen wrote:

Zhang, Xiantao wrote:

Jes Sorensen wrote:
I still can't see the difference with the patch in Avi's tree except 
nvram stuff.  And I believe the global variable you mentioned should 
be only used for nvram. So I propose an incremental patch for that. :)


Hi,

Here is an incremental version of the patch. I think the differences
should be pretty obvious now :-)

It fixes the memcpy issues in the hob and nvram code and also cleans
up the interfaces a lot.

Avi, please add.


Looks good to me.  Xiantao?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 03/21] Remove use of signalfd in block-raw-posix.c

2009-05-03 Thread Avi Kivity

Avi Kivity wrote:

Anthony Liguori wrote:


Oh okay.  But signal delivery is slow; for example the FPU needs to 
be reset.


Is it really justified to add all of this extra code (including 
signalfd emulation) for something that probably isn't even measurable?


We don't have to add signalfd emulation; we can simply use signal+pipe 
in that case.


We won't know if it's measurable or not until we measure it (or not).



I like using wiz-bang features of Linux as much as the next guy, but 
I think we're stretching to justify it here :-)




I think it's worth it in this case.  It will become more important in 
time, too.




Out of curiosity, I measured this:

[...@balrog test]$ ./signal
2777 ns/signal (pipe)
 844 ns/signal (signalfd)


At 10000 signals/sec, this will save about 2% cpu time (1933 ns x 10000 is 
roughly 19 ms per second).  It's definitely worthwhile for the handful of 
lines it takes.


test program:

#include <signal.h>
#include <unistd.h>
#include <sys/signalfd.h>
#include <sys/time.h>
#include <stdint.h>
#include <stdio.h>

static int wfd;

/* SIGUSR1 handler for the pipe variant: wake the reader with one byte. */
static void handler(int signum)
{
    char b = 0;

    write(wfd, &b, 1);
}

/* signal+pipe: returns the read end of the pipe the handler writes to. */
static int create_pipe(void)
{
    int fd[2];
    sigset_t s;

    pipe(fd);
    wfd = fd[1];
    signal(SIGUSR1, handler);
    sigemptyset(&s);
    sigaddset(&s, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &s, NULL);
    return fd[0];
}

/* signalfd: block SIGUSR1 and read it from a file descriptor instead. */
static int create_signalfd(void)
{
    sigset_t s;

    sigemptyset(&s);
    sigaddset(&s, SIGUSR1);
    sigprocmask(SIG_BLOCK, &s, NULL);
    return signalfd(-1, &s, 0);
}

static uint64_t time_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

#define N 1000

static void test(const char *name, int fd, int len)
{
    int i;
    uint64_t t1, t2;
    char buf[128];

    t1 = time_usec();
    for (i = 0; i < N; ++i) {
        raise(SIGUSR1);
        read(fd, buf, len);
    }
    t2 = time_usec();

    close(fd);

    /* elapsed usec * 1000 / N = average nanoseconds per signal */
    printf("%5d ns/signal (%s)\n", (int)(1000 * (t2 - t1) / N), name);
}

int main(int ac, char **av)
{
    test("pipe", create_pipe(), 1);
    test("signalfd", create_signalfd(), 128);
    return 0;
}


--
error compiling committee.c: too many arguments to function



Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
  

This allows an eventfd to be registered as an irq source with a guest.  Any
signaling operation on the eventfd (via userspace or kernel) will inject
the registered GSI at the next available window.

Signed-off-by: Gregory Haskins ghask...@novell.com



If we ever want to use this with e.g. MSI-X emulation in guest, and want
to be strictly compliant with MSI-X, we'll need a way for guest to mask
interrupts, and for host to report that a masked interrupt is pending.
Ideally, all this will be doable with a couple of mmapped pages to avoid
vmexits/system calls.

  


We could do this in two ways:

- move msix entry emulation into the kernel
- require the device to support replacing its irqfd, and juggle it like so:
  - guest disables msi
  - replace device model fd with eventfd belonging to us
  - when the device fires its eventfd, set the irq pending bit
  - guest enables msi
   - if the pending bit is set, fire the interrupt?
   - replace device model fd with the real irqfd

I'm leaning towards the latter, though it's not an easy call.
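
The second option can be modelled as a tiny state machine, sketched below
under heavy simplification. There are no real eventfds here: swapping the
device's fd for one owned by the emulator is represented by the 'masked'
flag, and struct msix_vector, vector_mask(), device_fire() and
vector_unmask() are all names invented for the example. Firing once on
unmask when the pending bit is set is the behaviour the list above marks
with a question mark, so treat it as an assumption.

```c
#include <stdbool.h>

struct msix_vector {
    bool masked;    /* guest disabled the vector: events go to "our" fd */
    bool pending;   /* an event arrived while masked */
    int injected;   /* interrupts actually delivered to the guest */
};

/* Guest masks the vector: from now on events are only recorded. */
static void vector_mask(struct msix_vector *v)
{
    v->masked = true;
}

/* Device signals its eventfd. */
static void device_fire(struct msix_vector *v)
{
    if (v->masked)
        v->pending = true;   /* remember it, do not inject */
    else
        v->injected++;       /* normal path through the real irqfd */
}

/* Guest unmasks: deliver one interrupt if anything arrived meanwhile. */
static void vector_unmask(struct msix_vector *v)
{
    v->masked = false;
    if (v->pending) {
        v->pending = false;
        v->injected++;
    }
}
```

Several events that arrive while masked collapse into a single interrupt on
unmask, which matches how a one-bit pending flag behaves.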


+static void
+irqfd_inject(struct work_struct *work)
+{
+	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
+	struct kvm *kvm = irqfd->kvm;
+
+	mutex_lock(&kvm->lock);
+	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
+	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+	mutex_unlock(&kvm->lock);



This will do weird stuff (deliver the irq twice) if the irq is
MSI/MSI-X. I know this was discussed already and is a temporary
shortcut, but maybe add a comment that we really want kvm_toggle_irq,
so that we won't forget?
  


If so, that's a bug.  MSI should ignore kvm_set_irq(..., 0).
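
[Editor's sketch, not part of the original message.] The point about the
1-then-0 pulse can be modelled in a few lines of plain C. Everything here
(struct irq_line, model_set_irq, model_pulse) is a hypothetical userspace
model and not KVM code; it only assumes the semantics stated above, namely
that an MSI-like source ignores a set-irq call with level 0.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt source; not a KVM structure. */
struct irq_line {
	bool edge;      /* true: MSI-like message, false: level-triggered pin */
	int level;      /* last level seen (used only for level-triggered) */
	int delivered;  /* interrupts handed to the guest so far */
};

/* Model of a set-irq call with the semantics described in the thread. */
void model_set_irq(struct irq_line *line, int level)
{
	if (line->edge) {
		/* An MSI is a message: only the assertion carries meaning. */
		if (level)
			line->delivered++;
		return;
	}
	/* Level-triggered pin: deliver on a 0 -> 1 transition only. */
	if (level && !line->level)
		line->delivered++;
	line->level = level;
}

/* The assert-then-deassert pulse that irqfd_inject() performs. */
int model_pulse(struct irq_line *line)
{
	model_set_irq(line, 1);
	model_set_irq(line, 0);
	return line->delivered;
}
```

With these semantics one pulse yields exactly one delivery for either kind
of source. A second delivery could only come from an MSI path that also
acted on the level-0 call, which is the behaviour called a bug above.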

--
error compiling committee.c: too many arguments to function



qemu-kvm: unapplied patches in my queue

2009-05-03 Thread Michael S. Tsirkin
Hi,
The following patches were posted a week ago:
[PATCH] qemu-kvm: make clean should propagate into libkvm dire
[PATCH] qemu-kvm: fix compiler warning
[PATCH] qemu-kvm: make kvm_create_pit static

No comments have been made since then - does this mean they can be
applied?

Thanks,
-- 
MST


Re: qemu-kvm: unapplied patches in my queue

2009-05-03 Thread Avi Kivity

Michael S. Tsirkin wrote:
> Hi,
> The following patches were posted a week ago:
> [PATCH] qemu-kvm: make clean should propagate into libkvm dire
> [PATCH] qemu-kvm: fix compiler warning
> [PATCH] qemu-kvm: make kvm_create_pit static
>
> No comments have been made since then - does this mean they can be
> applied?

I make a note when I apply patches, and make a comment when something is
wrong with them.  No comment means they're either waiting to be
reviewed, or that I forgot about them completely.


--
error compiling committee.c: too many arguments to function



Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Davide Libenzi
On Sun, 3 May 2009, Al Viro wrote:

> On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
> > +	/* We re-use eventfd for irqfd */
> > +	fd = sys_eventfd2(0, 0);
> > +	if (fd < 0) {
> > +		ret = fd;
> > +		goto fail;
> > +	}
> > +
> > +	/* We maintain a reference to eventfd for the irqfd lifetime */
> > +	file = eventfd_fget(fd);
> > +	if (IS_ERR(file)) {
> > +		ret = PTR_ERR(file);
> > +		goto fail;
> > +	}
> > +
> > +	irqfd->file = file;
>
> This is just plain wrong.  You have no promise whatsoever that caller of
> that sucker won't race with e.g. dup2().  IOW, you can't assume that
> file will be of the expected kind.

The eventfd_fget() checks for the file_operations pointer, before
returning the file*, and fails if the fd is not an eventfd. Or do you have
other concerns?



- Davide




Linux x86 guest panics in skb_copy_bits

2009-05-03 Thread Justin Dossey
Hi all,

I have a pretty straightforward setup.

Hypervisor:
dual xeon e5205 running Gentoo Linux
kernel 2.6.27 with virtio devices enabled
kvm 84
libvirt 0.5.1

Guest:
32-bit, virtio for nic and disk, qcow2.
Linux 2.6.28.

Network is bridged using tap and brctl.

I'm running Apache on the guest.  Whenever I send enough data through
the virtual NIC, I get a panic in skb_copy_bits.  I've tried using the
e1000 driver instead of the virtio one, but that makes no difference.

Has anyone else seen this behavior before?  I got this on 2.6.27 and 2.6.28.

Here's a snippet:

[280204.340016]  [<c02253e2>] panic+0x4e/0xea
[280204.340016]  [<c05906b9>] oops_end+0x8f/0xa3
[280204.340016]  [<c0204e94>] die+0x57/0x5f
[280204.340016]  [<c0592192>] do_page_fault+0x605/0x6bc
[280204.340016]  [<c059010d>] ? _spin_lock+0x15/0x18
[280204.340016]  [<c04bfad8>] ? __qdisc_run+0xe6/0x1a7
[280204.340016]  [<c0591b8d>] ? do_page_fault+0x0/0x6bc
[280204.340016]  [<c0590482>] error_code+0x72/0x78
[280204.340016]  [<c04ace8c>] ? skb_copy_bits+0x4f/0x1c4
[280204.340016]  [<c0215864>] ? kvm_set_pte+0x26/0x29
[280204.340016]  [<c054889c>] xdr_skb_read_bits+0x1f/0x37
[280204.340016]  [<c054872b>] xdr_partial_copy_from_skb+0x117/0x16c
[280204.340016]  [<c0549ec4>] xs_tcp_data_recv+0x245/0x3de
[280204.340016]  [<c054887d>] ? xdr_skb_read_bits+0x0/0x37
[280204.340016]  [<c04e07d6>] tcp_read_sock+0x8c/0x1e2
[280204.340016]  [<c0549c7f>] ? xs_tcp_data_recv+0x0/0x3de
[280204.340016]  [<c054a5d1>] xs_tcp_data_ready+0x54/0x64
[280204.340016]  [<c04e9469>] tcp_rcv_established+0x524/0x7b7
[280204.340016]  [<c04ee4b2>] tcp_v4_do_rcv+0x173/0x2dc


--
Justin Dossey


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Al Viro
On Sun, May 03, 2009 at 11:07:26AM -0700, Davide Libenzi wrote:
> On Sun, 3 May 2009, Al Viro wrote:
> > On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
> > > +	/* We re-use eventfd for irqfd */
> > > +	fd = sys_eventfd2(0, 0);
> > > +	if (fd < 0) {
> > > +		ret = fd;
> > > +		goto fail;
> > > +	}
> > > +
> > > +	/* We maintain a reference to eventfd for the irqfd lifetime */
> > > +	file = eventfd_fget(fd);
> > > +	if (IS_ERR(file)) {
> > > +		ret = PTR_ERR(file);
> > > +		goto fail;
> > > +	}
> > > +
> > > +	irqfd->file = file;
> >
> > This is just plain wrong.  You have no promise whatsoever that caller of
> > that sucker won't race with e.g. dup2().  IOW, you can't assume that
> > file will be of the expected kind.
>
> The eventfd_fget() checks for the file_operations pointer, before
> returning the file*, and fails if the fd is not an eventfd. Or do you have
> other concerns?

OK, but... it's still wrong.  Descriptor numbers are purely for interaction
with userland; using them that way violates very general race-prevention
rules, even if you do paper over the worst of consequences with check in
eventfd_fget().

General rules:
* descriptor you've generated is fit only for return to userland;
* descriptor you've got from userland is fit only for *single*
fget() or equivalent, unless you are one of the core syscalls manipulating
the descriptor table itself (dup2, etc.)
* once file is installed in descriptor table, you'd better be past
the last failure exit; sys_close() on cleanup path is not acceptable.
That's what reserving descriptors is for.

IOW, the sane solution would be to export something that returns your
struct file *.  And stop playing with fd completely.
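
[Editor's sketch, not part of the original message.] The reason a descriptor
number is unsafe to look up twice can be shown from userspace: dup2() can
retarget a number at any time. The function names below are hypothetical,
and the dup2() is done in the same thread only to keep the example short;
in the scenario described above it would come from another thread sharing
the descriptor table.

```c
#include <sys/eventfd.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if descriptor number fd refers to a pipe, 0 if not, -1 on error. */
int fd_is_pipe(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return S_ISFIFO(st.st_mode) ? 1 : 0;
}

/*
 * Creates an eventfd, then lets dup2() put a pipe end under the same
 * number.  Returns 1 if the number was an eventfd before and a pipe
 * afterwards, 0 if not, -1 on setup failure.
 */
int fd_number_can_be_retargeted(void)
{
	int pipefd[2];
	int efd, before, after;

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;
	if (pipe(pipefd) < 0) {
		close(efd);
		return -1;
	}

	before = fd_is_pipe(efd);

	/* Same number, different struct file from here on. */
	if (dup2(pipefd[0], efd) < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		close(efd);
		return -1;
	}

	after = fd_is_pipe(efd);

	close(pipefd[0]);
	close(pipefd[1]);
	close(efd);
	return before == 0 && after == 1;
}
```

A type check at lookup time catches the wrong kind of file, but it cannot
guarantee that the number still names the object that was just created,
which is why the advice above is to hand around the struct file * instead.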


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Michael S. Tsirkin
On Sun, May 03, 2009 at 07:59:40PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
>>> This allows an eventfd to be registered as an irq source with a guest.  Any
>>> signaling operation on the eventfd (via userspace or kernel) will inject
>>> the registered GSI at the next available window.
>>>
>>> Signed-off-by: Gregory Haskins ghask...@novell.com
>>
>> If we ever want to use this with e.g. MSI-X emulation in guest, and want
>> to be strictly compliant to MSI-X, we'll need a way for guest to mask
>> interrupts, and for host to report that a masked interrupt is pending.
>> Ideally, all this will be doable with a couple of mmapped pages to avoid
>> vmexits/system calls.
>
> We could do this in two ways:
>
> - move msix entry emulation into the kernel

It's not too bad IMO: MSIX is just a table with a list
of vectors, you check the mask bit on each interrupt,
if masked mark vector pending and poll until unmasked.

> - require the device to support replacing its irqfd, and juggle it like so:
>   - guest disables msi
>   - replace device model fd with eventfd belonging to us
>   - when the device fires its eventfd, set the irq pending bit
>   - guest enables msi
>    - if the pending bit is set, fire the interrupt?
>    - replace device model fd with the real irqfd

Looks like a lot of code. No?

> I'm leaning towards the latter, though it's not an easy call.

Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls
which will block/unblock guest from getting interrupt on this irq,
whatever the source.  Interrupts are queued in kernel while masked. A
third ioctl KVM_PENDING_IRQS will return the status for a set of IRQs.
qemu would call these ioctls when guest edits the MSIX vector control or
reads the pending bit array.
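
[Editor's sketch, not part of the original message.] The mask/unmask/pending
behaviour proposed here can be written down as a small state machine. This
is a plain C model under assumptions: the three ioctls are only proposals in
this thread, every name below is hypothetical, the model handles at most 64
irqs, and "queued" is taken to mean a single pending bit per irq (as with
the MSI-X pending bit array), so repeated raises while masked coalesce.

```c
#include <stdint.h>

#define MODEL_NR_IRQS 64  /* callers must pass irq < MODEL_NR_IRQS */

/* Hypothetical per-VM state for the proposed mask/unmask/pending calls. */
struct irq_mask_state {
	uint64_t masked;                        /* bit set: irq is masked */
	uint64_t pending;                       /* bit set: raised while masked */
	unsigned int delivered[MODEL_NR_IRQS];  /* deliveries seen by the guest */
};

/* Stand-in for the proposed KVM_MASK_IRQ. */
void model_mask_irq(struct irq_mask_state *s, unsigned int irq)
{
	s->masked |= UINT64_C(1) << irq;
}

/* Any source raising the irq: deliver now, or remember it if masked. */
void model_raise_irq(struct irq_mask_state *s, unsigned int irq)
{
	uint64_t bit = UINT64_C(1) << irq;

	if (s->masked & bit)
		s->pending |= bit;
	else
		s->delivered[irq]++;
}

/* Stand-in for the proposed KVM_UNMASK_IRQ: flush a pending interrupt. */
void model_unmask_irq(struct irq_mask_state *s, unsigned int irq)
{
	uint64_t bit = UINT64_C(1) << irq;

	s->masked &= ~bit;
	if (s->pending & bit) {
		s->pending &= ~bit;
		s->delivered[irq]++;
	}
}

/* Stand-in for the proposed KVM_PENDING_IRQS: status for a set of irqs. */
uint64_t model_pending_irqs(const struct irq_mask_state *s, uint64_t set)
{
	return s->pending & set;
}
```

In this model qemu would call the mask and unmask helpers when the guest
edits the vector control word, and the pending query when the guest reads
the pending bit array.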

>>> +static void
>>> +irqfd_inject(struct work_struct *work)
>>> +{
>>> +   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
>>> +   struct kvm *kvm = irqfd->kvm;
>>> +
>>> +   mutex_lock(&kvm->lock);
>>> +   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
>>> +   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
>>> +   mutex_unlock(&kvm->lock);
>>
>> This will do weird stuff (deliver the irq twice) if the irq is
>> MSI/MSI-X. I know this was discussed already and is a temporary
>> shortcut, but maybe add a comment that we really want kvm_toggle_irq,
>> so that we won't forget?
>
> If so, that's a bug.  MSI should ignore kvm_set_irq(..., 0).

-- 
MST


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Avi Kivity

Michael S. Tsirkin wrote:
> On Sun, May 03, 2009 at 07:59:40PM +0300, Avi Kivity wrote:
>> Michael S. Tsirkin wrote:
>>> On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
>>>> This allows an eventfd to be registered as an irq source with a guest.  Any
>>>> signaling operation on the eventfd (via userspace or kernel) will inject
>>>> the registered GSI at the next available window.
>>>>
>>>> Signed-off-by: Gregory Haskins ghask...@novell.com
>>>
>>> If we ever want to use this with e.g. MSI-X emulation in guest, and want
>>> to be strictly compliant to MSI-X, we'll need a way for guest to mask
>>> interrupts, and for host to report that a masked interrupt is pending.
>>> Ideally, all this will be doable with a couple of mmapped pages to avoid
>>> vmexits/system calls.
>>
>> We could do this in two ways:
>>
>> - move msix entry emulation into the kernel
>
> It's not too bad IMO: MSIX is just a table with a list
> of vectors, you check the mask bit on each interrupt,
> if masked mark vector pending and poll until unmasked.

Right, but it's more and more core, and more and more bugs.  Bugs in the
kernel are more expensive to fix and get to users.

>> - require the device to support replacing its irqfd, and juggle it like so:
>>   - guest disables msi
>>   - replace device model fd with eventfd belonging to us
>>   - when the device fires its eventfd, set the irq pending bit
>>   - guest enables msi
>>    - if the pending bit is set, fire the interrupt?
>>    - replace device model fd with the real irqfd
>
> Looks like a lot of code. No?

We'll need exactly the same code if we do it in the kernel.  The only
addition is replacing the fd.

>> I'm leaning towards the latter, though it's not an easy call.
>
> Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls
> which will block/unblock guest from getting interrupt on this irq,
> whatever the source.  Interrupts are queued in kernel while masked. A
> third ioctl KVM_PENDING_IRQS will return the status for a set of IRQs.
> qemu would call these ioctls when guest edits the MSIX vector control or
> reads the pending bit array.

I think this is the best option.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



file descriptor abuses

2009-05-03 Thread Al Viro
On Sun, May 03, 2009 at 08:01:36PM +0100, Al Viro wrote:

> General rules:
> 	* descriptor you've generated is fit only for return to userland;
> 	* descriptor you've got from userland is fit only for *single*
> fget() or equivalent, unless you are one of the core syscalls manipulating
> the descriptor table itself (dup2, etc.)
> 	* once file is installed in descriptor table, you'd better be past
> the last failure exit; sys_close() on cleanup path is not acceptable.
> That's what reserving descriptors is for.
>
> IOW, the sane solution would be to export something that returns your
> struct file *.  And stop playing with fd completely.

Speaking of which, quick look through fget() callers shows this turd:

static int p9_socket_open(struct p9_client *client, struct socket *csocket)
{
	int fd, ret;

	fd = sock_map_fd(csocket, 0);
	...

	ret = p9_fd_open(client, fd, fd);
	if (ret < 0) {
		P9_EPRINTK(KERN_ERR, "p9_socket_open: failed to open fd\n");
		sockfd_put(csocket);
		return ret;
	}
	...
	return 0;
}

where p9_fd_open() calls fget() on its 2nd and 3rd arguments.  Which does
worse than just a leak, AFAICT - on failure exit it leaves a dangling
pointer from descriptor table.

On the almost unrelated note, we have (in drivers/sneak_in^Wstaging/usbip)
sockfd_to_socket(), with all callers leaking struct file, AFAICS.


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Michael S. Tsirkin
On Sun, May 03, 2009 at 10:17:16PM +0300, Avi Kivity wrote:
>> Actually there's a third option: add KVM_MASK_IRQ, KVM_UNMASK_IRQ ioctls
>> which will block/unblock guest from getting interrupt on this irq,
>> whatever the source.  Interrupts are queued in kernel while masked. A
>> third ioctl KVM_PENDING_IRQS will return the status for a set of IRQs.
>> qemu would call these ioctls when guest edits the MSIX vector control or
>> reads the pending bit array.
>
> I think this is the best option.

Sounds good.

-- 
MST


Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface

2009-05-03 Thread Davide Libenzi
On Sun, 3 May 2009, Al Viro wrote:

 IOW, the sane solution would be to export something that returns your
 struct file *.  And stop playing with fd completely.

This builds but it's not tested at all.

- Make all the work of the old anon_inode_getfd(), done by a new 
  anon_inode_getfile(), with anon_inode_getfd() using its services

- Make all the work done by sys_eventfd(), done by a new 
  eventfd_file_create() (which in turn uses anon_inode_getfile()), with 
  sys_eventfd() using its services

IRQfd can use eventfd_file_create(), fget(), get_unused_fd_flags() and 
fd_install() just before returning.
Is that what you had in mind?
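
[Editor's sketch, not part of the original message.] The ordering described
here (create the file, reserve a descriptor, install it only once nothing
can fail any more) can be modelled in userspace. The descriptor table, the
file structure and all model_* names are hypothetical stand-ins, not the
kernel functions named above; fail_setup simulates a late setup error.

```c
#include <stddef.h>

#define MODEL_MAX_FDS 8

struct model_file {
	int refcount;
};

/* Toy descriptor table: a reserved slot is invisible until installed. */
struct model_fdtable {
	unsigned char reserved[MODEL_MAX_FDS];
	struct model_file *files[MODEL_MAX_FDS];
};

/* Stand-in for reserving a descriptor number without publishing a file. */
int model_get_unused_fd(struct model_fdtable *t)
{
	int i;

	for (i = 0; i < MODEL_MAX_FDS; i++) {
		if (!t->reserved[i]) {
			t->reserved[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Stand-in for giving back a reserved, never-installed descriptor. */
void model_put_unused_fd(struct model_fdtable *t, int fd)
{
	t->reserved[fd] = 0;
}

/* Stand-in for publishing the file under its descriptor number. */
void model_fd_install(struct model_fdtable *t, int fd, struct model_file *f)
{
	t->files[fd] = f;
}

/* Returns the new descriptor, or -1 with nothing left in the table. */
int model_irqfd_create(struct model_fdtable *t, struct model_file *file,
		       int fail_setup)
{
	int fd;

	file->refcount = 1;            /* stands in for creating the file */

	fd = model_get_unused_fd(t);
	if (fd < 0)
		goto fail_file;

	if (fail_setup)                /* any later setup step failing */
		goto fail_fd;

	model_fd_install(t, fd, file); /* last step, past all failure exits */
	return fd;

fail_fd:
	model_put_unused_fd(t, fd);
fail_file:
	file->refcount = 0;            /* stands in for dropping the file */
	return -1;
}
```

Because the install is the final step, the error paths never have to close
a descriptor that userspace could already have seen.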



- Davide


---
 fs/anon_inodes.c|   68 +---
 fs/eventfd.c|   44 +---
 include/linux/anon_inodes.h |3 +
 include/linux/eventfd.h |6 +++
 4 files changed, 92 insertions(+), 29 deletions(-)

Index: linux-2.6.mod/fs/anon_inodes.c
===
--- linux-2.6.mod.orig/fs/anon_inodes.c 2009-05-03 12:21:09.0 -0700
+++ linux-2.6.mod/fs/anon_inodes.c  2009-05-03 12:54:02.0 -0700
@@ -64,28 +64,24 @@ static const struct dentry_operations an
  *
  * Creates a new file by hooking it on a single inode. This is useful for files
  * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfd() will share a single inode,
+ * All the files created with anon_inode_getfile() will share a single inode,
  * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup.  Returns new descriptor or -error.
+ * setup.  Returns the newly created file* or error.
  */
-int anon_inode_getfd(const char *name, const struct file_operations *fops,
-void *priv, int flags)
+struct file *anon_inode_getfile(const char *name,
+   const struct file_operations *fops,
+   void *priv, int flags)
 {
struct qstr this;
struct dentry *dentry;
struct file *file;
-   int error, fd;
+   int error;
 
if (IS_ERR(anon_inode_inode))
-   return -ENODEV;
+   return ERR_PTR(-ENODEV);
 
if (fops->owner && !try_module_get(fops->owner))
-   return -ENOENT;
-
-   error = get_unused_fd_flags(flags);
-   if (error < 0)
-   goto err_module;
-   fd = error;
+   return ERR_PTR(-ENOENT);
 
/*
 * Link the inode to a directory entry by creating a unique name
@@ -97,7 +93,7 @@ int anon_inode_getfd(const char *name, c
this.hash = 0;
dentry = d_alloc(anon_inode_mnt->mnt_sb->s_root, &this);
if (!dentry)
-   goto err_put_unused_fd;
+   goto err_module;
 
/*
 * We know the anon_inode inode count is always greater than zero,
@@ -123,16 +119,54 @@ int anon_inode_getfd(const char *name, c
file->f_version = 0;
file->private_data = priv;
 
+   return file;
+
+err_dput:
+   dput(dentry);
+err_module:
+   module_put(fops->owner);
+   return ERR_PTR(error);
+}
+EXPORT_SYMBOL_GPL(anon_inode_getfile);
+
+/**
+ * anon_inode_getfd - creates a new file instance by hooking it up to an
+ *anonymous inode, and a dentry that describe the class
+ *of the file
+ *
+ * @name:[in]name of the class of the new file
+ * @fops:[in]file operations for the new file
+ * @priv:[in]private data for the new file (will be file's private_data)
+ * @flags:   [in]flags
+ *
+ * Creates a new file by hooking it on a single inode. This is useful for files
+ * that do not need to have a full-fledged inode in order to operate correctly.
+ * All the files created with anon_inode_getfd() will share a single inode,
+ * hence saving memory and avoiding code duplication for the file/inode/dentry
+ * setup.  Returns new descriptor or -error.
+ */
+int anon_inode_getfd(const char *name, const struct file_operations *fops,
+void *priv, int flags)
+{
+   int error, fd;
+   struct file *file;
+
+   error = get_unused_fd_flags(flags);
+   if (error < 0)
+   return error;
+   fd = error;
+
+   file = anon_inode_getfile(name, fops, priv, flags);
+   if (IS_ERR(file)) {
+   error = PTR_ERR(file);
+   goto err_put_unused_fd;
+   }
fd_install(fd, file);
 
return fd;
 
-err_dput:
-   dput(dentry);
 err_put_unused_fd:
put_unused_fd(fd);
-err_module:
-   module_put(fops-owner);
return error;
 }
 EXPORT_SYMBOL_GPL(anon_inode_getfd);
Index: linux-2.6.mod/fs/eventfd.c
===
--- linux-2.6.mod.orig/fs/eventfd.c 2009-05-03 12:21:09.0 -0700
+++ linux-2.6.mod/fs/eventfd.c  2009-05-03 

Custom BIOS supported size

2009-05-03 Thread Cristi Magherusan
Hello,

What is the maximum size supported for a custom BIOS image (e.g.
coreboot-based)? I tried some 256K coreboot BIOS images and they seemed to
work fine, but kvm blew up with a 3MB image (which, by the way, works fine
on qemu).

Qemu also seems to fail with images greater than 4MB, and I'd appreciate
if either kvm or qemu had support for such images, and maybe even
greater than that if technically possible.

You can get such images from http://www.coreboot.org/QEMU
The 3MB one that I tried can be fetched from
http://panzer.utcluj.ro/~alien/coreboot/AVATT/BIOS/OpenVZ/bios.bin

Thanks!
Cristi

-- 
Ing. Cristi Măgherușan, System/Network Engineer
Technical University of Cluj-Napoca, Romania
http://cc.utcluj.ro  +40264 401247




[RFC] Bring in all the Linux headers we depend on in QEMU

2009-05-03 Thread Anthony Liguori

Sorry this explanation is long winded, but this is a messy situation.

In Linux, there isn't a very consistent policy about userspace kernel 
header inclusion.  On a typical Linux system, you're likely to find 
kernel headers in three places.


glibc headers (/usr/include/{linux,asm})

These headers are installed by glibc.  They are very often based on much
older kernel versions than the kernel you have in your distribution.
For software that depends on these headers, this very often means that
your software detects features as missing that are present in your
kernel.  Furthermore, glibc only installs the headers it needs, so very
often certain headers have dependencies that aren't met.  A classic
example is linux/compiler.h and the broken usbdevice_fs.h header that
depends on it.  There are still distributions today that QEMU doesn't
compile on because of this.


Today, most of QEMU's code depends on these headers.

/lib/modules/$(uname -r)/build

These are the kernel headers that are installed as part of your kernel.  
In general, this is a pretty good place to find the headers that are 
associated with the kernel version you're actually running on.  However, 
these headers are part of the kernel build tree and are not always 
guaranteed to be includable from userspace.


random kernel tree

Developers, in particular, like to point things at their random kernel 
trees.  In general though, relying on a full kernel source tree being 
available isn't a good idea.  Kernel headers change dramatically across 
versions too so it's very likely that we would need to have a lot of 
#ifdefs dependent on kernel versions, or some of the uglier work arounds 
we have in usb-linux.c.
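
[Editor's sketch, not part of the original message.] For readers who have
not seen the pattern, this is roughly what version-dependent #ifdefs look
like. The MODEL_* macros are hypothetical and defined locally so the
fragment stands alone; the encoding mirrors the kernel's KERNEL_VERSION()
macro from linux/version.h, and the 2.6.27 threshold is an arbitrary
example, not a claim about any real feature.

```c
/* Same encoding as the kernel's KERNEL_VERSION(a, b, c). */
#define MODEL_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Version of the headers being compiled against (assumed for the example). */
#ifndef MODEL_HEADERS_VERSION
#define MODEL_HEADERS_VERSION MODEL_KERNEL_VERSION(2, 6, 29)
#endif

/* One such block is needed per feature that appeared in some release. */
#if MODEL_HEADERS_VERSION >= MODEL_KERNEL_VERSION(2, 6, 27)
#define MODEL_HAVE_FEATURE 1
#else
#define MODEL_HAVE_FEATURE 0
#endif

/* Lets the rest of the program ask whether the feature was compiled in. */
int model_have_feature(void)
{
	return MODEL_HAVE_FEATURE;
}
```

Shipping one known set of headers fixes the version at build time, so
checks like this one would not need to be repeated throughout the code.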


I think the best way to avoid #ifdefs and dependencies on 
broken/incomplete glibc headers is to include all of the Linux headers 
we need within QEMU.  The attached patch does just this.


I think there's room for discussion about whether we really want to do 
this.  We could potentially depend on some more common glibc headers 
(like asm/types.h) while bringing in less reliable headers 
(if_tun.h/virtio*).  Including them all seems like the most robust 
solution to me though.


Comments?

Regards,

Anthony Liguori
From b42aa57e94fc6b05897271b0816fa34d70670c9a Mon Sep 17 00:00:00 2001
From: Anthony Liguori aligu...@us.ibm.com
Date: Sat, 2 May 2009 15:45:24 -0500
Subject: [PATCH] Import Linux headers into QEMU

These are the headers from Linux 2.6.29 that have been modified to work under
QEMU.  This includes the necessary scripts to generate the headers from the
original Linux source tree.

I've not included the headers in this patch as they are quite big and would make
review more difficult.  I've included headers for all host architectures QEMU
currently supports but I've only tested x86.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
---
 Makefile.target |3 --
 configure   |   37 ++-
 linux/Makefile  |  108 +++
 linux/README|   19 ++
 linux/fixup.sed |   19 ++
 usb-linux.c |1 -
 6 files changed, 165 insertions(+), 22 deletions(-)
 create mode 100644 linux/Makefile
 create mode 100644 linux/README
 create mode 100644 linux/fixup.sed

diff --git a/Makefile.target b/Makefile.target
index f735105..474245a 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -124,9 +124,6 @@ CFLAGS+=-I/opt/SUNWspro/prod/include/cc
 endif
 endif
 
-kvm.o: CFLAGS+=$(KVM_CFLAGS)
-kvm-all.o: CFLAGS+=$(KVM_CFLAGS)
-
 all: $(PROGS)
 
 #
diff --git a/configure b/configure
index 82fb60a..379a2a6 100755
--- a/configure
+++ b/configure
@@ -1081,6 +1081,23 @@ EOF
   fi
 fi
 
+# Linux kernel headers CFLAGS
+if test -z "$kerneldir" ; then
+    linux_cflags="-I$source_path/linux"
+else
+    linux_cflags="-I$kerneldir/include"
+    if test \( "$cpu" = "i386" -o "$cpu" = "x86_64" \) \
+	-a -d "$kerneldir/arch/x86/include" ; then
+	linux_cflags="$linux_cflags -I$kerneldir/arch/x86/include"
+    elif test "$cpu" = "ppc" -a -d "$kerneldir/arch/powerpc/include" ; then
+	linux_cflags="$linux_cflags -I$kerneldir/arch/powerpc/include"
+    elif test -d "$kerneldir/arch/$cpu/include" ; then
+	linux_cflags="$linux_cflags -I$kerneldir/arch/$cpu/include"
+    fi
+fi
+
+OS_CFLAGS="$OS_CFLAGS $linux_cflags"
+
 ##
 # kvm probe
 if test $kvm = yes ; then
@@ -1100,27 +1117,14 @@ if test $kvm = yes ; then
 #endif
 int main(void) { return 0; }
 EOF
-  if test $kerneldir !=  ; then
-  kvm_cflags=-I$kerneldir/include
-  if test \( $cpu = i386 -o $cpu = x86_64 \) \
- -a -d $kerneldir/arch/x86/include ; then
-kvm_cflags=$kvm_cflags -I$kerneldir/arch/x86/include
-	elif test $cpu = ppc -a -d $kerneldir/arch/powerpc/include ; then
-	kvm_cflags=$kvm_cflags -I$kerneldir/arch/powerpc/include
-elif test -d $kerneldir/arch/$cpu/include ; then
-kvm_cflags=$kvm_cflags 

Re: kvm-userspace broken?

2009-05-03 Thread Hans de Bruin

Avi Kivity wrote:
> Oliver Rath wrote:
>> Hi List,
>>
>> maybe I missed some announcements, but
>>
>> git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git
>>
>> gives no response:
>>
>> kvm-userspace # git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git
>> Initialized empty Git repository in /home/oliver/kvm-userspace/kvm-userspace/.git/
>> fatal: The remote end hung up unexpectedly
>>
>> What's up there?
>
> kvm-userspace.git has been retired; it's now playing golf in
> git://git.kernel.org/pub/scm/virt/kvm/retired/kvm-userspace.git.  Use
> git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git instead.

The latest tarball on sourceforge is kvm-85, yet the clone I just made
from qemu-kvm 'git describes' itself as kvm-84-756-gf7d114d. Is that right?


--
Hans


Re: KVM x86_64 with SR-IOV..?

2009-05-03 Thread Nicholas A. Bellinger
On Sat, 2009-05-02 at 18:22 +0800, Sheng Yang wrote:
> On Thu, Apr 30, 2009 at 01:22:54PM -0700, Nicholas A. Bellinger wrote:
> > Greetings KVM folks,
> >
> > I am wondering if any information exists for doing SR-IOV on the new VT-d
> > capable chipsets with KVM..?  From what I understand the patches for
> > doing this with KVM are floating around, but I have been unable to find
> > any user-level docs for actually making it all go against upstream
> > v2.6.30-rc3 code..
> >
> > So far I have been doing IOV testing with Xen 3.3 and 3.4.0-pre, and I
> > am really hoping to be able to jump to KVM for single-function and
> > then multi-function SR-IOV.  I know that the VM migration stuff for IOV
> > in Xen is up and running,  and I assume it is being worked in for KVM
> > instance migration as well..?  This part is less important (at least for
> > me :-) than getting a stable SR-IOV setup running under the KVM
> > hypervisor..  Does anyone have any pointers for this..?
> >
> > Any comments or suggestions are appreciated!
> >
>
> Hi Nicholas
>
> The patches are not floating around now. As you know, SR-IOV for Linux has
> been in 2.6.30, so you can use upstream KVM and qemu-kvm (or the recently
> released kvm-85) with 2.6.30-rc3 as host kernel. And some time ago, there were
> several SR-IOV related patches for qemu-kvm, and now they have all been checked
> in.
>
> And for KVM, an extra document is not necessary, for you can simply assign a
> VF to a guest like any other device. How to create a VF is specific to each
> device driver. So just creating a VF and then assigning it to a KVM guest is fine.
>

Greetings Sheng,

So, I have been trying the latest kvm-85 release on a v2.6.30-rc3
checkout from linux-2.6.git on a CentOS 5u3 x86_64 install on an Intel
IOH-5520 based dual socket Nehalem board.  I have enabled DMAR and
Interrupt Remapping on my KVM host using v2.6.30-rc3 and from what I can
tell, the KVM_CAP_* defines from libkvm are enabled when building kvm-85
after './configure --kerneldir=/usr/src/linux-2.6.git' and the PCI
passthrough code is being enabled in kvm-85/qemu/hw/device-assignment.c
AFAICT..

From there, I use the freshly installed qemu-system-x86_64 binary to
start a Debian 5 x86_64 HVM (that previously had been moving network
packets under Xen for PCIe passthrough). I see the MSI-X interrupt
remapping working on the KVM host for the passed -pcidevice, and the
MMIO mappings from the qemu build that I also saw while using
Xen/qemu-dm built with PCI passthrough are there as well..

But while the KVM guest is booting, I see the following exception(s)
from qemu-system-x86_64 for one of the VFs for a multi-function PCIe device:

BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)

I try with one of the on-board e1000e ports (02:00.0) and I see the same
exception along with some MSI-X exceptions from qemu-system-x86_64 in the
KVM guest.. However, I am still able to see the e1000e and the other
vxge multi-function device with lspci, but I am unable to dhcp or ping
with the e1000e, and the VF from the multi-function device fails to
register the MSI-X interrupt in the guest..

So, I enabled the debugging code in kvm-85/qemu/hw/device-assignment.c
and see that the PAGE aligned MMIO memory for the passed PCIe device is being
released during the BUG exceptions above..  Is there something else I should
be looking at..?  I have pci-stub enabled, and I unbind 02:00.0
from /sys/bus/pci/drivers/e1000e/unbind successfully (just like with Xen
and pciback), but I am unable to do the 'echo -n 02:00.0 >
/sys/bus/pci/drivers/pci-stub/bind' (it returns write error, no such
device, with no dmesg output) on the KVM host running v2.6.30-rc3.  Is
this supposed to happen on v2.6.30-rc3 with pci-stub..?  I am also using
the kvm-85 source dist kvm_intel.ko and kvm.ko kernel modules.  Is
there something I am missing when building kvm-85 for SR-IOV passthrough..?

Also FYI, I am having to use pci=resource_alignment=
because the BIOS does not PAGE_SIZE align the MMIO BARs for my multi-function
devices..

Also, I tried with disabling the DMAR with the Intel IOMMU passthrough
from this patch:

https://lists.linux-foundation.org/pipermail/iommu/2009-April/001339.html

that did not make it into v2.6.30-rc3.  The patch logic was enabled but
still I saw the same kvm exceptions from qemu-system-x86_64.

Anyways, I am going to give it a shot with the Fedora 11 x86_64 Preview
and see if it works as expected with an IOH-5520 chipset with the AMI
BIOS on a Tyan S7010 with Xeon 5520s.  Hopefully this is just a kvm-85 build
and/or install issue I am seeing on my CentOS 5u3 install (that has a Xen PCIe
passthrough setup on it as well) with v2.6.30-rc3.  I will try on a fresh
install on a distro with the new KVM logic and see what happens.

:-)

Thanks for your comments!

--nab



RE: [PATCH] kvm-kmod: fix build on kernels with kvm trace set

2009-05-03 Thread Zhang, Xiantao
Avi Kivity wrote:
 Michael S. Tsirkin wrote:
 CONFIG_KVM_TRACE in kernel conflicts with the definition
 in external module. external-module-compat-comm.h tried
 to work around this, but this didn't work as some
 code still does #include <linux/autoconf.h>
 directly.
 
 Solve this differently by s/CONFIG_KVM_TRACE/CONFIG_KMOD_KVM_TRACE/
 in awk. Had to tighten regular expressions in hack-module.awk
 so that they don't trigger on kvm_host.h .
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  Makefile  |5 +++--
  configure |2 +-
  external-module-compat-comm.h |7 ---
  x86/Kbuild|2 +-
  x86/hack-module.awk   |8 +---
  5 files changed, 10 insertions(+), 14 deletions(-)
 
 diff --git a/Makefile b/Makefile
 index f2ef811..9cdc0af 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -34,8 +34,8 @@ hack-files-ia64 = kvm_main.c kvm_fw.c kvm_lib.c
 kvm-ia64.c 
 
  hack-files = $(hack-files-$(ARCH_DIR))
 
 -ifeq ($(EXT_CONFIG_KVM_TRACE),y)
 -module_defines += -DEXT_CONFIG_KVM_TRACE=y
 +ifeq ($(CONFIG_KMOD_KVM_TRACE),y)
 +module_defines += -DCONFIG_KMOD_KVM_TRACE=1
  endif
 
  all:: prerequisite
 @@ -72,6 +72,7 @@ header-sync:
  for i in $$(find $T -name '*.h'); do \
  $(call unifdef,$$i); done
  $(call hack, include/linux/kvm.h)
 +$(call hack, include/linux/kvm_host.h)
  $(call hack, include/asm-$(ARCH_DIR)/kvm.h)
  set -e && for i in $$(find $T -type f -printf '%P '); \
  do mkdir -p $$(dirname $$i); cmp -s $$i $T/$$i || cp $T/$$i $$i;
 done 
 diff --git a/configure b/configure
 index 30af6e7..6e12bb1 100755
 --- a/configure
 +++ b/configure
 @@ -122,5 +122,5 @@ DEPMOD_VERSION=$depmod_version
  EOF
 
  cat <<EOF > config.kbuild
 -EXT_CONFIG_KVM_TRACE=$kvm_trace
 +CONFIG_KMOD_KVM_TRACE=$kvm_trace
  EOF
 diff --git a/external-module-compat-comm.h
 b/external-module-compat-comm.h 
 index c955927..e561448 100644
 --- a/external-module-compat-comm.h
 +++ b/external-module-compat-comm.h
 @@ -18,13 +18,6 @@
  #include <linux/hrtimer.h>
  #include <asm/bitops.h>
 
 -/* Override CONFIG_KVM_TRACE */
 -#ifdef EXT_CONFIG_KVM_TRACE
 -#  define CONFIG_KVM_TRACE 1
 -#else
 -#  undef CONFIG_KVM_TRACE
 -#endif
 -
  /*
   * 2.6.16 does not have GFP_NOWAIT
   */
 diff --git a/x86/Kbuild b/x86/Kbuild
 index d3aca00..fbdb28b 100644
 --- a/x86/Kbuild
 +++ b/x86/Kbuild
 @@ -7,7 +7,7 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_emulate.o
   ../anon_inodes.o irq.o i8259.o lapic.o ioapic.o preempt.o i8254.o
   coalesced_mmio.o irq_comm.o \   timer.o \
 ../external-module-compat.o -ifeq ($(EXT_CONFIG_KVM_TRACE),y)
 +ifeq ($(CONFIG_KMOD_KVM_TRACE),y)
  kvm-objs += kvm_trace.o
  endif
  ifeq ($(CONFIG_IOMMU_API),y)
 diff --git a/x86/hack-module.awk b/x86/hack-module.awk
 index 260eeef..f3d95be 100644
 --- a/x86/hack-module.awk
 +++ b/x86/hack-module.awk
 @@ -4,7 +4,7 @@ BEGIN { split(INIT_WORK desc_struct ldttss_desc64
desc_ptr  \ hrtimer_expires_remaining  \
on_each_cpu relay_open request_irq , compat_apis); }
 
 -/^int kvm_init\(/ { anon_inodes = 1 }
 +/^int kvm_init\([^)]*\)$/ { anon_inodes = 1 }
 
  /return 0;/ && anon_inodes {
  print "\tr = kvm_init_anon_inodes();";
 @@ -17,7 +17,7 @@ BEGIN { split(INIT_WORK desc_struct ldttss_desc64
  desc_ptr  \  anon_inodes = 0 }
 
 -/^void kvm_exit/ { anon_inodes_exit = 1 }
 +/^void kvm_exit[^)]*\)$/ { anon_inodes_exit = 1 }
 
  /\}/ && anon_inodes_exit {
  print "\tkvm_exit_anon_inodes();";
 @@ -25,7 +25,7 @@ BEGIN { split(INIT_WORK desc_struct ldttss_desc64
  desc_ptr  \  anon_inodes_exit = 0 }
 
 -/^int kvm_arch_init/ { kvm_arch_init = 1 }
 +/^int kvm_arch_init[^)])$/ { kvm_arch_init = 1 }
  /\tsc_khz\/  kvm_arch_init { sub(\\tsc_khz\\,
 kvm_tsc_khz) }  /^}/ { kvm_arch_init = 0 } 
 
 @@ -85,6 +85,8 @@ BEGIN { split(INIT_WORK desc_struct ldttss_desc64
 desc_ptr  \ 
 
  /\kvm_.*_fops\.owner = module;/ { $0 =
 IF_ANON_INODES_DOES_REFCOUNTS( $0 ) } 
 
 +{ sub(/\<CONFIG_KVM_TRACE\>/, "CONFIG_KMOD_KVM_TRACE") }
 +
  { print }
 
  /unsigned long flags;/   vmx_load_host_state {
 
 
 Xiantao, do we need to change this for ia64?

IA64 doesn't support kvm trace, so it doesn't need these changes, thanks! :)
Xiantao
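A note on the regular-expression tightening in the quoted patch: the original
pattern /^int kvm_init\(/ matches any line that starts a kvm_init declaration,
including a prototype in a header, while the anchored form only matches a line
that ends at the closing parenthesis. The sketch below checks both patterns
with POSIX regcomp()/regexec(). The two input lines are hypothetical examples,
not copied from kvm_host.h or kvm_main.c, and POSIX extended syntax is assumed
to behave like awk's for these two patterns.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if `line` matches the POSIX extended regex `pattern`. */
static bool line_matches(const char *pattern, const char *line)
{
    regex_t re;
    bool hit;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;   /* treat an uncompilable pattern as "no match" */
    hit = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}

/* The kvm_init patterns before and after the change in the hunk. */
static const char LOOSE_PATTERN[] = "^int kvm_init\\(";
static const char TIGHT_PATTERN[] = "^int kvm_init\\([^)]*\\)$";

/* Hypothetical input lines, for illustration only. */
static const char PROTOTYPE_LINE[]  = "int kvm_init(void *opaque);";
static const char DEFINITION_LINE[] = "int kvm_init(void *opaque)";
```

With these inputs the loose pattern matches both lines, while the tightened
one rejects the prototype line because of its trailing semicolon.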


RE: [PATCH] Revert Sync idcache after emualted DMA operations for ia64

2009-05-03 Thread Zhang, Xiantao
Avi Kivity wrote:
 Hollis Blanchard wrote:
 This reverts commit 9dc99a28236161a5a1b4c58f1e9c4ec6179cb976.
 Aside from the other issues discussed on kvm-devel, this commit
 breaks the PowerPC build. 
 
 
 
 Applied, thanks.

Hollis, 
   Could you explain why this patch breaks the powerpc build?  qemu_sync_icache 
has a definition for the non-ia64 case, so it shouldn't break any arch-specific 
build.  
Xiantao


RE: [PATCH 03/04] qemu-kvm: Remove the dependency for phys_ram_base for ipf.c

2009-05-03 Thread Zhang, Xiantao
Avi Kivity wrote:
 Jes Sorensen wrote:
 Zhang, Xiantao wrote:
 Jes Sorensen wrote:
 I still can't see the difference with the patch in Avi's tree except
 nvram stuff.  And I believe the global variable you mentioned should
 be only used for nvram. So I propose an incremental patch for that.
 :) 
 
 Hi,
 
 Here is an incremental version of the patch. I think the differences
 should be pretty obvious now :-)
 
 It fixes the memcpy issues in the hob and nvram code and also cleans
 up the interfaces a lot. 
 
 Avi, please add.
 
 Looks good to me.  Xiantao?

Hi, Jes
Have you tested nvram support with this patch? I
Xiantao


Re: [PATCH] Assign the correct pci id range to virtio_pci

2009-05-03 Thread Rusty Russell
On Mon, 27 Apr 2009 12:53:25 pm Pantelis Koukousoulas wrote:
 On Mon, Apr 27, 2009 at 3:44 AM, Anthony Liguori anth...@codemonkey.ws 
 wrote:
  Would be good to at least include the experiment range in case people are
  making third-party virtio modules and want to play around without replacing
  virtio-{pci,*}.
 
 I 'd be happy with a simple comment explaining the 0x103f (e.g.,
 /* Not yet using the full 0x1000 - 0x10ef to hedge our bets in case we
 broke the ABI.*/
 as explained above)

Thanks, I like your patch.

Where did this idea of experimental range come from, BTW?  I prefer your
module cmdline approach, as it discourages deployment with such numbers.

Thanks,
Rusty.


Re: KVM x86_64 with SR-IOV..?

2009-05-03 Thread Sheng Yang
On Monday 04 May 2009 08:53:07 Nicholas A. Bellinger wrote:
 On Sat, 2009-05-02 at 18:22 +0800, Sheng Yang wrote:
  On Thu, Apr 30, 2009 at 01:22:54PM -0700, Nicholas A. Bellinger wrote:
   Greetings KVM folks,
  
   I wondering if any information exists for doing SR-IOV on the new VT-d
   capable chipsets with KVM..?  From what I understand the patches for
   doing this with KVM are floating around, but I have been unable to find
   any user-level docs for actually making it all go against a upstream
   v2.6.30-rc3 code..
  
   So far I have been doing IOV testing with Xen 3.3 and 3.4.0-pre, and I
   am really hoping to be able to jump to KVM for single-function and and
   then multi-function SR-IOV.  I know that the VM migration stuff for IOV
   in Xen is up and running,  and I assume it is being worked in for KVM
   instance migration as well..?  This part is less important (at least
   for me :-) than getting a stable SR-IOV setup running under the KVM
   hypervisor..  Does anyone have any pointers for this..?
  
   Any comments or suggestions are appreciated!
 
  Hi Nicholas
 
  The patches are not floating around now. As you know, SR-IOV for Linux
  have been in 2.6.30, so then you can use upstream KVM and qemu-kvm(or
  recent released kvm-85) with 2.6.30-rc3 as host kernel. And some time
  ago, there are several SRIOV related patches for qemu-kvm, and now they
  all have been checked in.
 
  And for KVM, the extra document is not necessary, for you can simple
  assign a VF to guest like any other devices. And how to create VF is
  specific for each device driver. So just create a VF then assign it to
  KVM guest is fine.

 Greetings Sheng,

 So, I have been trying the latest kvm-85 release on a v2.6.30-rc3
 checkout from linux-2.6.git on a CentOS 5u3 x86_64 install on Intel
 IOH-5520 based dual socket Nehalem board.  I have enabled DMAR and
 Interrupt Remapping my KVM host using v2.6.30-rc3 and from what I can
 tell, the KVM_CAP_* defines from libkvm are enabled with building kvm-85
 after './configure --kerneldir=/usr/src/linux-2.6.git' and the PCI
 passthrough code is being enabled in kvm-85/qemu/hw/device-assignment.c
 AFAICT..

 From there, I use the freshly installed qemu-x86_64-system binary to

 start a Debian 5 x86_64 HVM (that previously had been moving network
 packets under Xen for PCIe passthrough). I see the MSI-X interrupt
 remapping working on the KVM host for the passed -pcidevice, and the
 MMIO mappings from the qemu build that I also saw while using
 Xen/qemu-dm built with PCI passthrough are there as well..


Hi Nicholas

 But while the KVM guest is booting, I see the following exception(s)
 from qemu-x86_64-system for one of the VFs for a multi-function PCIe
 device:

 BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)

This one is mostly harmless.

 I try with one of the on-board e1000e ports (02:00.0) and I see the same
 exception along with some MSI-X exceptions from qemu-x86_64-system in
 KVM guest.. However, I am still able to see the e1000e and the other
 vxge multi-function device with lspci, but I am unable to dhcp or ping
 with the e1000e and VF from multi-function device fails to register the
 MSI-X interrupt in the guest..

Did you see the interrupt on both the guest and host side? I think you can try the
on-board e1000e for MSI-X first. And please ensure the corresponding driver has been
loaded correctly. And what do you mean by some MSI-X exceptions? It would be better to
post the log.

 S, I enabled the debugging code in kvm-85/qemu/hw/device-assignment.c
 and see the PAGE aligned MMIO memory for the passed PCIe device is being
 released during the BUG exceptions above..  Is there something else I
 should be looking at..?  

That part of memory should be released in order to trap MMIO accesses to the MSI-X table.

 I have pci-stub enabled, and I unbind 02:00.0
 from /sys/bus/pci/drivers/e1000e/unbind successfully (just like with Xen
 and pciback), but I am unable to do the 'echo -n 02:00.0 >
 /sys/bus/pci/drivers/pci-stub/bind' (it returns write error, no such
 device, with no dmesg output) on the KVM host running v2.6.30-rc3.  Is
 this supposed to happen on v2.6.30-rc3 with pci-stub..?  

Maybe you need echo 0000:02:00.0 > /sys/bus/pci/drivers/pci-stub/bind? 
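
The suggestion above comes down to address format: sysfs names PCI devices by
their full domain:bus:device.function address, so the string written to a
driver's bind file needs the domain prefix. A minimal sketch of that
normalization follows. pci_full_address() is a hypothetical helper, and
defaulting to domain 0000 is an assumption that only holds on single-domain
hosts.

```c
#include <stdio.h>
#include <string.h>

/*
 * Expand a PCI address to the domain:bus:device.function form used for
 * sysfs device names. An address with a single ':' (e.g. "02:00.0") is
 * assumed to be in domain 0000; an address that already carries a domain
 * is copied unchanged. Returns 0 on success, -1 if the result does not fit.
 */
static int pci_full_address(const char *addr, char *out, size_t outlen)
{
    const char *first_colon = strchr(addr, ':');
    int has_domain = first_colon && strchr(first_colon + 1, ':');
    int n = snprintf(out, outlen, "%s%s", has_domain ? "" : "0000:", addr);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting string is what would be written to the pci-stub bind file and
used as the directory name under /sys/bus/pci/devices/.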

 I am also using
 the the kvm-85 source dist kvm_intel.ko and kvm.ko kernel modules.  Is
 there something I am missing when building kvm-85 for SR-IOV passthrough..?

I think the first thing is to confirm that device assignment works in your 
environment, using the on-board card. You can also refer to 
http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM

And you can post the debug_device_assignment=1 log, the qemu log, and the tail of 
dmesg as well.

Thanks!

-- 
regards
Yang, Sheng


 Also FYI, I am having to use pci=resource_alignment=
 because the BIOS does not PAGE_SIZE align the MMIO BARs for my
 multi-function devices..

 Also, I tried with disabling the DMAR with the Intel IOMMU passthrough
 from this patch:

 

Re: [PATCH] Assign the correct pci id range to virtio_pci

2009-05-03 Thread Pantelis Koukousoulas
 I 'd be happy with a simple comment explaining the 0x103f (e.g.,
 /* Not yet using the full 0x1000 - 0x10ef to hedge our bets in case we
 broke the ABI.*/
 as explained above)

 Thanks, I like your patch.

 Where did this idea of experimental range come from, BTW?

In the qemu sources, there is a file pci-ids.txt that documents the
PCI ID rules. I'm attaching it for your convenience.

 I prefer your module cmdline approach, as it discourages
 deployment with such numbers.

Great, I like this way better too, because it allows using the full experimental
range (16 IDs) while also allowing for breaking the virtio_pci ABI.

Thanks,
Pantelis

PCI IDs for qemu
================

Red Hat, Inc. donates a part of its device ID range to qemu, to be used for
virtual devices.  The vendor ID is 1af4 (formerly Qumranet ID).

The 1000 - 10ff device ID range is used for VirtIO devices.

The 1100 device ID is used as PCI Subsystem ID for existing hardware
devices emulated by qemu.

All other device IDs are reserved.


VirtIO Device IDs
-----------------

1af4:1000  network device
1af4:1001  block device
1af4:1002  balloon device
1af4:1003  console device

1af4:1004  Reserved.
   to  Contact Gerd Hoffmann kra...@redhat.com to get a
1af4:10ef  device ID assigned for your new virtio device.

1af4:10f0  Available for experimental usage without registration.  Must get
   to  official ID when the code leaves the test lab (i.e. when seeking
1af4:10ff  upstream merge or shipping a distro/product) to avoid conflicts.
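
The ranges in the attached file can be summarised in code. The following is
only a sketch: the enum and function names are made up for illustration, and
the boundaries are taken directly from the table above.

```c
#include <stdint.h>

#define QEMU_PCI_VENDOR_ID 0x1af4 /* Red Hat, formerly Qumranet */

enum virtio_id_class {
    VIRTIO_ID_NOT_VIRTIO,    /* vendor mismatch or outside 0x1000 - 0x10ff */
    VIRTIO_ID_ASSIGNED,      /* 0x1000 - 0x1003: net, block, balloon, console */
    VIRTIO_ID_RESERVED,      /* 0x1004 - 0x10ef: needs an assigned ID */
    VIRTIO_ID_EXPERIMENTAL   /* 0x10f0 - 0x10ff: usable without registration */
};

/* Classify a vendor/device pair according to the ranges in pci-ids.txt. */
static enum virtio_id_class classify_virtio_id(uint16_t vendor, uint16_t device)
{
    if (vendor != QEMU_PCI_VENDOR_ID || device < 0x1000 || device > 0x10ff)
        return VIRTIO_ID_NOT_VIRTIO;
    if (device <= 0x1003)
        return VIRTIO_ID_ASSIGNED;
    if (device <= 0x10ef)
        return VIRTIO_ID_RESERVED;
    return VIRTIO_ID_EXPERIMENTAL;
}
```

Note that the experimental range is the last 16 IDs of the VirtIO range, which
matches the "16 IDs" mentioned in the reply.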



Re: [RFC 1/2] Add MCE simulation support to qemu/tcg

2009-05-03 Thread Huang Ying
I found there is a qemu-kvm.git on git.kernel.org. I will re-base my
patches for that.

And I found that the kvm support in qemu (target-i386/kvm.c) is quite
different from that in qemu-kvm (kvm/). Where should MCE support go:
into target-i386/kvm.c, kvm/, or both?

Best Regards,
Huang Ying
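
For reference, the bank registers in the quoted patch are laid out as four
consecutive MSRs per bank starting at MSR_MC0_CTL (0x400), with the bank count
in the low 8 bits of MCG_CAP. The sketch below decodes an MSR index under that
layout. The struct and function names are hypothetical. It also masks the bank
count before multiplying by four, which is an assumption about the intent,
since in C the patch's unparenthesised range check multiplies first and masks
afterwards.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_MC0_CTL       0x400u /* first bank register, as in the patch */
#define MCE_REGS_PER_BANK 4u     /* CTL, STATUS, ADDR, MISC */

enum mce_bank_reg { MCE_REG_CTL, MCE_REG_STATUS, MCE_REG_ADDR, MCE_REG_MISC };

struct mce_msr {
    unsigned bank;
    enum mce_bank_reg reg;
};

/*
 * Decode a bank MSR index. The bank count is the low 8 bits of MCG_CAP.
 * Returns false if `msr` is not a bank register for this bank count.
 */
static bool mce_decode_msr(uint32_t msr, uint64_t mcg_cap, struct mce_msr *out)
{
    uint32_t bank_count = (uint32_t)(mcg_cap & 0xff);
    uint32_t offset;

    if (msr < MSR_MC0_CTL || msr >= MSR_MC0_CTL + MCE_REGS_PER_BANK * bank_count)
        return false;
    offset = msr - MSR_MC0_CTL;
    out->bank = offset / MCE_REGS_PER_BANK;
    out->reg = (enum mce_bank_reg)(offset % MCE_REGS_PER_BANK);
    return true;
}
```

For an MCG_CAP value with four banks, the two forms of the range check give
the same upper bound, so the difference only matters for other capability
values.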

On Thu, 2009-04-30 at 16:53 +0800, Huang Ying wrote:
 - MCE features are initialized when VCPU is initialized according to CPUID.
 - A monitor command mce is added to inject a MCE.
 - A new interrupt mask: CPU_INTERRUPT_MCE is added to inject the MCE.
 
 Signed-off-by: Huang Ying ying.hu...@intel.com
 
 ---
  cpu-all.h   |4 ++
  cpu-exec.c  |4 ++
  monitor.c   |   49 +
  target-i386/cpu.h   |   22 +++
  target-i386/helper.c|   70 
 
  target-i386/op_helper.c |   34 +++
  6 files changed, 183 insertions(+)
 
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -202,6 +202,7 @@
  #define CR4_DE_MASK   (1 << 3)
  #define CR4_PSE_MASK  (1 << 4)
  #define CR4_PAE_MASK  (1 << 5)
 +#define CR4_MCE_MASK  (1 << 6)
  #define CR4_PGE_MASK  (1 << 7)
  #define CR4_PCE_MASK  (1 << 8)
  #define CR4_OSFXSR_SHIFT 9
 @@ -248,6 +249,17 @@
  #define PG_ERROR_RSVD_MASK 0x08
  #define PG_ERROR_I_D_MASK  0x10
  
 +#define MCE_CAP_DEF     0x100
 +#define MCE_BANKS_DEF   4
 +
 +#define MCG_CTL_P       (1UL<<8)
 +
 +#define MCG_STATUS_MCIP (1UL<<2)
 +
 +#define MCI_STATUS_VAL  (1UL<<63)
 +#define MCI_STATUS_OVER (1UL<<62)
 +#define MCI_STATUS_UC   (1UL<<61)
 +
  #define MSR_IA32_TSC            0x10
  #define MSR_IA32_APICBASE       0x1b
  #define MSR_IA32_APICBASE_BSP   (1<<8)
 @@ -288,6 +300,11 @@
  
  #define MSR_MTRRdefType  0x2ff
  
 +#define MSR_MC0_CTL  0x400
 +#define MSR_MC0_STATUS   0x401
 +#define MSR_MC0_ADDR 0x402
 +#define MSR_MC0_MISC 0x403
 +
  #define MSR_EFER                0xc0000080
  
  #define MSR_EFER_SCE   (1 << 0)
 @@ -673,6 +690,11 @@ typedef struct CPUX86State {
  /* in order to simplify APIC support, we leave this pointer to the
 user */
  struct APICState *apic_state;
 +
 +uint64 mcg_cap;
 +uint64 mcg_status;
 +uint64 mcg_ctl;
 +uint64 *mce_banks;
  } CPUX86State;
  
  CPUX86State *cpu_x86_init(const char *cpu_model);
 --- a/target-i386/op_helper.c
 +++ b/target-i386/op_helper.c
 @@ -3133,7 +3133,23 @@ void helper_wrmsr(void)
  case MSR_MTRRdefType:
  env->mtrr_deftype = val;
  break;
 +case MSR_MCG_STATUS:
 +env->mcg_status = val;
 +break;
 +case MSR_MCG_CTL:
 +if ((env->mcg_cap & MCG_CTL_P)
 +&& (val == 0 || val == ~(uint64_t)0))
 +env->mcg_ctl = val;
 +break;
  default:
 +if ((uint32_t)ECX >= MSR_MC0_CTL
 +&& (uint32_t)ECX < MSR_MC0_CTL + (4 * env->mcg_cap & 0xff)) {
 +uint32_t offset = (uint32_t)ECX - MSR_MC0_CTL;
 +if ((offset & 0x3) != 0
 +|| (val == 0 || val == ~(uint64_t)0))
 +env->mce_banks[offset] = val;
 +break;
 +}
  /* XXX: exception ? */
  break;
  }
 @@ -3252,7 +3268,25 @@ void helper_rdmsr(void)
  /* XXX: exception ? */
  val = 0;
  break;
 +case MSR_MCG_CAP:
 +val = env->mcg_cap;
 +break;
 +case MSR_MCG_CTL:
 +if (env->mcg_cap & MCG_CTL_P)
 +val = env->mcg_ctl;
 +else
 +val = 0;
 +break;
 +case MSR_MCG_STATUS:
 +val = env->mcg_status;
 +break;
  default:
 +if ((uint32_t)ECX >= MSR_MC0_CTL
 +&& (uint32_t)ECX < MSR_MC0_CTL + (4 * env->mcg_cap & 0xff)) {
 +uint32_t offset = (uint32_t)ECX - MSR_MC0_CTL;
 +val = env->mce_banks[offset];
 +break;
 +}
  /* XXX: exception ? */
  val = 0;
  break;
 --- a/target-i386/helper.c
 +++ b/target-i386/helper.c
 @@ -1430,6 +1430,75 @@ static void breakpoint_handler(CPUState 
  }
  #endif /* !CONFIG_USER_ONLY */
  
 +/* This should come from sysemu.h - if we could include it here... */
 +void qemu_system_reset_request(void);
 +
 +void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 +uint64_t mcg_status, uint64_t addr, uint64_t misc)
 +{
 +uint64_t mcg_cap = cenv->mcg_cap;
 +unsigned bank_num = mcg_cap & 0xff;
 +uint64_t *banks = cenv->mce_banks;
 +
 +if (bank >= bank_num || !(status & MCI_STATUS_VAL))
 +return;
 +
 +/*
 + * if MSR_MCG_CTL is not all 1s, the uncorrected error
 + * reporting is disabled
 + */
 +if ((status & MCI_STATUS_UC) && (mcg_cap & MCG_CTL_P) &&
 +cenv->mcg_ctl != ~(uint64_t)0)
 +return;
 +banks += 4 * bank;
 +/*
 + * if MSR_MCi_CTL is not all 1s, the uncorrected 

Re: KVM x86_64 with SR-IOV..?

2009-05-03 Thread Nicholas A. Bellinger
On Mon, 2009-05-04 at 10:09 +0800, Sheng Yang wrote:
 On Monday 04 May 2009 08:53:07 Nicholas A. Bellinger wrote:
  On Sat, 2009-05-02 at 18:22 +0800, Sheng Yang wrote:
   On Thu, Apr 30, 2009 at 01:22:54PM -0700, Nicholas A. Bellinger wrote:
Greetings KVM folks,
   
I wondering if any information exists for doing SR-IOV on the new VT-d
capable chipsets with KVM..?  From what I understand the patches for
doing this with KVM are floating around, but I have been unable to find
any user-level docs for actually making it all go against a upstream
v2.6.30-rc3 code..
   
So far I have been doing IOV testing with Xen 3.3 and 3.4.0-pre, and I
am really hoping to be able to jump to KVM for single-function and and
then multi-function SR-IOV.  I know that the VM migration stuff for IOV
in Xen is up and running,  and I assume it is being worked in for KVM
instance migration as well..?  This part is less important (at least
for me :-) than getting a stable SR-IOV setup running under the KVM
hypervisor..  Does anyone have any pointers for this..?
   
Any comments or suggestions are appreciated!
  
   Hi Nicholas
  
   The patches are not floating around now. As you know, SR-IOV for Linux
   have been in 2.6.30, so then you can use upstream KVM and qemu-kvm(or
   recent released kvm-85) with 2.6.30-rc3 as host kernel. And some time
   ago, there are several SRIOV related patches for qemu-kvm, and now they
   all have been checked in.
  
   And for KVM, the extra document is not necessary, for you can simple
   assign a VF to guest like any other devices. And how to create VF is
   specific for each device driver. So just create a VF then assign it to
   KVM guest is fine.
 
  Greetings Sheng,
 
  So, I have been trying the latest kvm-85 release on a v2.6.30-rc3
  checkout from linux-2.6.git on a CentOS 5u3 x86_64 install on Intel
  IOH-5520 based dual socket Nehalem board.  I have enabled DMAR and
  Interrupt Remapping my KVM host using v2.6.30-rc3 and from what I can
  tell, the KVM_CAP_* defines from libkvm are enabled with building kvm-85
  after './configure --kerneldir=/usr/src/linux-2.6.git' and the PCI
  passthrough code is being enabled in kvm-85/qemu/hw/device-assignment.c
  AFAICT..
 
  From there, I use the freshly installed qemu-x86_64-system binary to
 
  start a Debian 5 x86_64 HVM (that previously had been moving network
  packets under Xen for PCIe passthrough). I see the MSI-X interrupt
  remapping working on the KVM host for the passed -pcidevice, and the
  MMIO mappings from the qemu build that I also saw while using
  Xen/qemu-dm built with PCI passthrough are there as well..
 
 
 Hi Nicholas
 
  But while the KVM guest is booting, I see the following exception(s)
  from qemu-x86_64-system for one of the VFs for a multi-function PCIe
  device:
 
  BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
 
 This one is mostly harmless.
 

Ok, good to know..  :-)

  I try with one of the on-board e1000e ports (02:00.0) and I see the same
  exception along with some MSI-X exceptions from qemu-x86_64-system in
  KVM guest.. However, I am still able to see the e1000e and the other
  vxge multi-function device with lspci, but I am unable to dhcp or ping
  with the e1000e and VF from multi-function device fails to register the
  MSI-X interrupt in the guest..
 
 Did you see the interrupt in the guest and host side?

Ok, I am restarting the e1000e test with a fresh Fedora 11 install and
KVM host kernel 2.6.29.1-111.fc11.x86_64.   After unbinding and
attaching the e1000e single-function device at 02:00.0 to pci-stub with:

   echo 8086 10d3 > /sys/bus/pci/drivers/pci-stub/new_id
   echo 0000:02:00.0 > /sys/bus/pci/devices/0000:02:00.0/driver/unbind
   echo 0000:02:00.0 > /sys/bus/pci/drivers/pci-stub/bind

I see the following in the KVM host kernel ring buffer:

   e1000e 0000:02:00.0: PCI INT A disabled
   pci-stub 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
   pci-stub 0000:02:00.0: irq 58 for MSI/MSI-X

  I think you can try on-
 board e1000e for MSI-X first. And please ensure correlated driver have been 
 loaded correctly.

nod..

  And what do you mean by some MSI-X exceptions? Better with 
 the log.

Ok, with the Fedora 11 installed qemu-kvm, I see the expected
kvm_destroy_phys_mem() statements:

#kvm-host qemu-kvm -m 2048 -smp 8 -pcidevice host=02:00.0 
lenny64guest1-orig.img 
BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)

However I still see the following in the KVM guest kernel ring buffer
running v2.6.30-rc in the HVM guest.

[    5.523790] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    5.524582] e1000e 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
[    5.525710] e1000e 0000:00:05.0: setting latency timer to 64
[    5.526048] 0000:00:05.0: 0000:00:05.0: Failed to initialize MSI-X interrupts.  Falling back to MSI interrupts.
[    5.527200] 

Re: [PATCH 03/04] qemu-kvm: Remove the dependency for phys_ram_base for ipf.c

2009-05-03 Thread Jes Sorensen

Zhang, Xiantao wrote:

Avi Kivity wrote:
Looks good to me.  Xiantao?

Hi, Jes
Have you tested nvram support with this patch? I
Xiantao


No,

But it is behaving exactly like the old code, so it is no more broken
than the old code was.

Let's apply this and then look at the nvram issues afterwards.

Jes


RE: [PATCH 03/04] qemu-kvm: Remove the dependency for phys_ram_base for ipf.c

2009-05-03 Thread Zhang, Xiantao
Jes Sorensen wrote:
 Zhang, Xiantao wrote:
 Avi Kivity wrote:
 Looks good to me.  Xiantao?
 
 Hi, Jes
 Have you tested nvram support with this patch? I
 Xiantao
 
 No,
 
 But it is behaving exactly like the old code, so it is no more broken
 than the old code was.
 
 Let's apply this and then look at the nvram issues afterwards.

Okay. :)
Xiantao



Re: KVM x86_64 with SR-IOV..?

2009-05-03 Thread Nicholas A. Bellinger
On Sun, 2009-05-03 at 21:36 -0700, Nicholas A. Bellinger wrote:
 On Mon, 2009-05-04 at 10:09 +0800, Sheng Yang wrote:
  On Monday 04 May 2009 08:53:07 Nicholas A. Bellinger wrote:
   On Sat, 2009-05-02 at 18:22 +0800, Sheng Yang wrote:
On Thu, Apr 30, 2009 at 01:22:54PM -0700, Nicholas A. Bellinger wrote:
 Greetings KVM folks,

 I wondering if any information exists for doing SR-IOV on the new VT-d
 capable chipsets with KVM..?  From what I understand the patches for
 doing this with KVM are floating around, but I have been unable to 
 find
 any user-level docs for actually making it all go against a upstream
 v2.6.30-rc3 code..

 So far I have been doing IOV testing with Xen 3.3 and 3.4.0-pre, and I
 am really hoping to be able to jump to KVM for single-function and and
 then multi-function SR-IOV.  I know that the VM migration stuff for 
 IOV
 in Xen is up and running,  and I assume it is being worked in for KVM
 instance migration as well..?  This part is less important (at least
 for me :-) than getting a stable SR-IOV setup running under the KVM
 hypervisor..  Does anyone have any pointers for this..?

 Any comments or suggestions are appreciated!
   
Hi Nicholas
   
The patches are not floating around now. As you know, SR-IOV for Linux
have been in 2.6.30, so then you can use upstream KVM and qemu-kvm(or
recent released kvm-85) with 2.6.30-rc3 as host kernel. And some time
ago, there are several SRIOV related patches for qemu-kvm, and now they
all have been checked in.
   
And for KVM, the extra document is not necessary, for you can simple
assign a VF to guest like any other devices. And how to create VF is
specific for each device driver. So just create a VF then assign it to
KVM guest is fine.
  
   Greetings Sheng,
  
   So, I have been trying the latest kvm-85 release on a v2.6.30-rc3
   checkout from linux-2.6.git on a CentOS 5u3 x86_64 install on Intel
   IOH-5520 based dual socket Nehalem board.  I have enabled DMAR and
   Interrupt Remapping my KVM host using v2.6.30-rc3 and from what I can
   tell, the KVM_CAP_* defines from libkvm are enabled with building kvm-85
   after './configure --kerneldir=/usr/src/linux-2.6.git' and the PCI
   passthrough code is being enabled in kvm-85/qemu/hw/device-assignment.c
   AFAICT..
  
   From there, I use the freshly installed qemu-x86_64-system binary to
  
   start a Debian 5 x86_64 HVM (that previously had been moving network
   packets under Xen for PCIe passthrough). I see the MSI-X interrupt
   remapping working on the KVM host for the passed -pcidevice, and the
   MMIO mappings from the qemu build that I also saw while using
   Xen/qemu-dm built with PCI passthrough are there as well..
  
  
  Hi Nicholas
  
   But while the KVM guest is booting, I see the following exception(s)
   from qemu-x86_64-system for one of the VFs for a multi-function PCIe
   device:
  
   BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
  
  This one is mostly harmless.
  
 
 Ok, good to know..  :-)
 
   I try with one of the on-board e1000e ports (02:00.0) and I see the same
   exception along with some MSI-X exceptions from qemu-x86_64-system in
   KVM guest.. However, I am still able to see the e1000e and the other
   vxge multi-function device with lspci, but I am unable to dhcp or ping
   with the e1000e and VF from multi-function device fails to register the
   MSI-X interrupt in the guest..
  
  Did you see the interrupt in the guest and host side?
 
 Ok, I am restarting the e1000e test with a fresh Fedora 11 install and
 KVM host kernel 2.6.29.1-111.fc11.x86_64.   After unbinding and
 attaching the e1000e single-function device at 02:00.0 to pci-stub with:
 
    echo 8086 10d3 > /sys/bus/pci/drivers/pci-stub/new_id
    echo 0000:02:00.0 > /sys/bus/pci/devices/0000:02:00.0/driver/unbind
    echo 0000:02:00.0 > /sys/bus/pci/drivers/pci-stub/bind
 
 I see the following in the KVM host kernel ring buffer:
 
    e1000e 0000:02:00.0: PCI INT A disabled
    pci-stub 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
    pci-stub 0000:02:00.0: irq 58 for MSI/MSI-X
 
   I think you can try on-
  board e1000e for MSI-X first. And please ensure correlated driver have been 
  loaded correctly.
 
 nod..
 
   And what do you mean by some MSI-X exceptions? Better with 
  the log.
 
 Ok, with the Fedora 11 installed qemu-kvm, I see the expected
 kvm_destroy_phys_mem() statements:
 
 #kvm-host qemu-kvm -m 2048 -smp 8 -pcidevice host=02:00.0 
 lenny64guest1-orig.img 
 BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
 BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
 
 However I still see the following in the KVM guest kernel ring buffer
 running v2.6.30-rc in the HVM guest.
 
 [5.523790] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
 [5.524582] e1000e :00:05.0: PCI INT A - Link[LNKA] - GSI 10 (level, 
 high) - IRQ 10
 [

Re: KVM x86_64 with SR-IOV..?

2009-05-03 Thread Nicholas A. Bellinger
On Sun, 2009-05-03 at 22:28 -0700, Nicholas A. Bellinger wrote:
 On Sun, 2009-05-03 at 21:36 -0700, Nicholas A. Bellinger wrote:
  On Mon, 2009-05-04 at 10:09 +0800, Sheng Yang wrote:
   On Monday 04 May 2009 08:53:07 Nicholas A. Bellinger wrote:
On Sat, 2009-05-02 at 18:22 +0800, Sheng Yang wrote:
 On Thu, Apr 30, 2009 at 01:22:54PM -0700, Nicholas A. Bellinger wrote:
  Greetings KVM folks,
 
  I wondering if any information exists for doing SR-IOV on the new 
  VT-d
  capable chipsets with KVM..?  From what I understand the patches for
  doing this with KVM are floating around, but I have been unable to 
  find
  any user-level docs for actually making it all go against a upstream
  v2.6.30-rc3 code..
 
  So far I have been doing IOV testing with Xen 3.3 and 3.4.0-pre, 
  and I
  am really hoping to be able to jump to KVM for single-function and 
  and
  then multi-function SR-IOV.  I know that the VM migration stuff for 
  IOV
  in Xen is up and running,  and I assume it is being worked in for 
  KVM
  instance migration as well..?  This part is less important (at least
  for me :-) than getting a stable SR-IOV setup running under the KVM
  hypervisor..  Does anyone have any pointers for this..?
 
  Any comments or suggestions are appreciated!

 Hi Nicholas

 The patches are not floating around now. As you know, SR-IOV for Linux
 have been in 2.6.30, so then you can use upstream KVM and qemu-kvm(or
 recent released kvm-85) with 2.6.30-rc3 as host kernel. And some time
 ago, there are several SRIOV related patches for qemu-kvm, and now 
 they
 all have been checked in.

 And for KVM, the extra document is not necessary, for you can simple
 assign a VF to guest like any other devices. And how to create VF is
 specific for each device driver. So just create a VF then assign it to
 KVM guest is fine.
   
Greetings Sheng,
   
So, I have been trying the latest kvm-85 release on a v2.6.30-rc3
checkout from linux-2.6.git on a CentOS 5u3 x86_64 install on Intel
IOH-5520 based dual socket Nehalem board.  I have enabled DMAR and
Interrupt Remapping my KVM host using v2.6.30-rc3 and from what I can
tell, the KVM_CAP_* defines from libkvm are enabled with building kvm-85
after './configure --kerneldir=/usr/src/linux-2.6.git' and the PCI
passthrough code is being enabled in kvm-85/qemu/hw/device-assignment.c
AFAICT..
   
From there, I use the freshly installed qemu-x86_64-system binary to
   
start a Debian 5 x86_64 HVM (that previously had been moving network
packets under Xen for PCIe passthrough). I see the MSI-X interrupt
remapping working on the KVM host for the passed -pcidevice, and the
MMIO mappings from the qemu build that I also saw while using
Xen/qemu-dm built with PCI passthrough are there as well..
   
   
   Hi Nicholas
   
But while the KVM guest is booting, I see the following exception(s)
from qemu-x86_64-system for one of the VFs for a multi-function PCIe
device:
   
BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
   
   This one is mostly harmless.
   
  
  Ok, good to know..  :-)
  
I try with one of the on-board e1000e ports (02:00.0) and I see the same
exception along with some MSI-X exceptions from qemu-x86_64-system in
KVM guest.. However, I am still able to see the e1000e and the other
vxge multi-function device with lspci, but I am unable to dhcp or ping
with the e1000e and VF from multi-function device fails to register the
MSI-X interrupt in the guest..
   
   Did you see the interrupt in the guest and host side?
  
  Ok, I am restarting the e1000e test with a fresh Fedora 11 install and
  KVM host kernel 2.6.29.1-111.fc11.x86_64.   After unbinding and
  attaching the e1000e single-function device at 02:00.0 to pci-stub with:
  
     echo 8086 10d3 > /sys/bus/pci/drivers/pci-stub/new_id
     echo 0000:02:00.0 > /sys/bus/pci/devices/0000:02:00.0/driver/unbind
     echo 0000:02:00.0 > /sys/bus/pci/drivers/pci-stub/bind
  
  I see the following in the KVM host kernel ring buffer:
  
     e1000e 0000:02:00.0: PCI INT A disabled
     pci-stub 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
     pci-stub 0000:02:00.0: irq 58 for MSI/MSI-X
  
I think you can try on-
   board e1000e for MSI-X first. And please ensure correlated driver have 
   been 
   loaded correctly.
  
  nod..
  
And what do you mean by some MSI-X exceptions? Better with 
   the log.
  
  Ok, with the Fedora 11 installed qemu-kvm, I see the expected
  kvm_destroy_phys_mem() statements:
  
  #kvm-host qemu-kvm -m 2048 -smp 8 -pcidevice host=02:00.0 
  lenny64guest1-orig.img 
  BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
  BUG: kvm_destroy_phys_mem: invalid parameters (slot=-1)
  
  However I still see the following in the KVM guest kernel ring