[PATCH] Merge commit 'qemu-svn/trunk'

2009-04-06 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* commit 'qemu-svn/trunk':
  Fix wrong return value
  Remove dead AIO code for win32
  target-mips: optimize gen_movcf_*()
  target-mips: optimize gen_movci()
  target-mips: optimize gen_compute_branch1()
  Misc scsi disk/cdrom fixes/improvements 4/4
  misc scsi disk/cdrom fixes/improvements 3/4
  misc scsi disk/cdrom fixes/improvements 2/4
  misc scsi disk/cdrom fixes/improvements 1/4
  target-mips: don't map FP registers as TCG global variables
  target-mips: fix divu instruction
  tcg: fix _tl aliases for divu/remu
  target-ppc: Explain why the whole TLB is flushed on SR write
  Fix hxtool eating backslash sequences for sh != bash
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: configure: pass --with-patched-kernel to kernel/configure

2009-04-06 Thread Avi Kivity
From: Mark McLoughlin mar...@redhat.com

We need to know this so that we can avoid doing things that
are specific to building kvm.ko - e.g. when we are only
doing header-sync.

Signed-off-by: Mark McLoughlin mar...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index 49c4419..249c743 100755
--- a/configure
+++ b/configure
@@ -121,6 +121,7 @@ arch=${arch%%-*}
 ./configure \
--kerneldir=$kerneldir \
--arch=$arch \
+   $([ -z "${want_module}" ] && echo --with-patched-kernel) \
${cross_prefix:+--cross-prefix=$cross_prefix} \
${kvm_trace:+--with-kvm-trace}
 )
diff --git a/kernel/configure b/kernel/configure
index 79fb093..3fd0c94 100755
--- a/kernel/configure
+++ b/kernel/configure
@@ -6,6 +6,7 @@ cc=gcc
 ld=ld
 objcopy=objcopy
 ar=ar
+want_module=1
 kvm_trace=
 cross_prefix=
 arch=`uname -m`
@@ -44,6 +45,9 @@ while [[ $1 = -* ]]; do
kerneldir=$arg
 no_uname=1
;;
+   --with-patched-kernel)
+   want_module=
+   ;;
--with-kvm-trace)
kvm_trace=y
;;


[PATCH] kvm: configure: run kernel configure even with --with-patched-kernel

2009-04-06 Thread Avi Kivity
From: Mark McLoughlin mar...@redhat.com

We still run header-sync in this case, which requires configure
to have been run.

Signed-off-by: Mark McLoughlin mar...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index a2aad59..49c4419 100755
--- a/configure
+++ b/configure
@@ -117,7 +117,7 @@ processor=${arch#*-}
 arch=${arch%%-*}
 
 #configure kernel module
-[[ -n "$want_module" ]] && (cd kernel;
+[ -e kernel/Makefile ] && (cd kernel;
 ./configure \
--kerneldir=$kerneldir \
--arch=$arch \


[PATCH] kvm: external module: only hack tsc_khz in kvm_arch_init

2009-04-06 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

We now hack it to a function call, so all hell breaks loose if we change
local variables named tsc_khz.

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kernel/x86/hack-module.awk b/kernel/x86/hack-module.awk
index a05c0c3..260eeef 100644
--- a/kernel/x86/hack-module.awk
+++ b/kernel/x86/hack-module.awk
@@ -1,4 +1,4 @@
-BEGIN { split("INIT_WORK tsc_khz desc_struct ldttss_desc64 desc_ptr " \
+BEGIN { split("INIT_WORK desc_struct ldttss_desc64 desc_ptr " \
           "hrtimer_add_expires_ns hrtimer_get_expires " \
           "hrtimer_get_expires_ns hrtimer_start_expires " \
           "hrtimer_expires_remaining " \
@@ -25,6 +25,10 @@ BEGIN { split("INIT_WORK tsc_khz desc_struct ldttss_desc64 desc_ptr " \
     anon_inodes_exit = 0
 }
 
+/^int kvm_arch_init/ { kvm_arch_init = 1 }
+/\<tsc_khz\>/ && kvm_arch_init { sub("\\<tsc_khz\\>", "kvm_tsc_khz") }
+/^}/ { kvm_arch_init = 0 }
+
 /MODULE_AUTHOR/ {
     printf("MODULE_INFO(version, \"%s\");\n", version)
 }


[PATCH] kvm: external module: backward compatibility for ia64 msidef.h

2009-04-06 Thread Avi Kivity
From: Zhang, Yang yang.zh...@intel.com

When running make in kernel/, the build cannot find msidef.h. This patch
fixes that.

Signed-off-by: Yang Zhang yang.zh...@intel.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kernel/include-compat/asm-ia64/msidef.h b/kernel/include-compat/asm-ia64/msidef.h
new file mode 100644
index 0000000..592c104
--- /dev/null
+++ b/kernel/include-compat/asm-ia64/msidef.h
@@ -0,0 +1,42 @@
+#ifndef _IA64_MSI_DEF_H
+#define _IA64_MSI_DEF_H
+
+/*
+ * Shifts for APIC-based data
+ */
+
+#define MSI_DATA_VECTOR_SHIFT          0
+#define     MSI_DATA_VECTOR(v)         (((u8)v) << MSI_DATA_VECTOR_SHIFT)
+#define MSI_DATA_VECTOR_MASK           0xffffff00
+
+#define MSI_DATA_DELIVERY_MODE_SHIFT   8
+#define     MSI_DATA_DELIVERY_FIXED    (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define     MSI_DATA_DELIVERY_LOWPRI   (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+
+#define MSI_DATA_LEVEL_SHIFT           14
+#define     MSI_DATA_LEVEL_DEASSERT    (0 << MSI_DATA_LEVEL_SHIFT)
+#define     MSI_DATA_LEVEL_ASSERT      (1 << MSI_DATA_LEVEL_SHIFT)
+
+#define MSI_DATA_TRIGGER_SHIFT         15
+#define     MSI_DATA_TRIGGER_EDGE      (0 << MSI_DATA_TRIGGER_SHIFT)
+#define     MSI_DATA_TRIGGER_LEVEL     (1 << MSI_DATA_TRIGGER_SHIFT)
+
+/*
+ * Shift/mask fields for APIC-based bus address
+ */
+
+#define MSI_ADDR_DEST_ID_SHIFT         4
+#define MSI_ADDR_HEADER                0xfee00000
+
+#define MSI_ADDR_DEST_ID_MASK          0xfff0000f
+#define     MSI_ADDR_DEST_ID_CPU(cpu)  ((cpu) << MSI_ADDR_DEST_ID_SHIFT)
+
+#define MSI_ADDR_DEST_MODE_SHIFT       2
+#define     MSI_ADDR_DEST_MODE_PHYS    (0 << MSI_ADDR_DEST_MODE_SHIFT)
+#define     MSI_ADDR_DEST_MODE_LOGIC   (1 << MSI_ADDR_DEST_MODE_SHIFT)
+
+#define MSI_ADDR_REDIRECTION_SHIFT     3
+#define     MSI_ADDR_REDIRECTION_CPU   (0 << MSI_ADDR_REDIRECTION_SHIFT)
+#define     MSI_ADDR_REDIRECTION_LOWPRI (1 << MSI_ADDR_REDIRECTION_SHIFT)
+
+#endif /* _IA64_MSI_DEF_H */


[PATCH] kvm: external module: backward compatibility for PAGE_KERNEL_UC on ia64

2009-04-06 Thread Avi Kivity
From: Yang Zhang Yang Zhang

Signed-off-by: Yang Zhang Yang Zhang
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kernel/ia64/external-module-compat.h b/kernel/ia64/external-module-compat.h
index 592733c..bc78c3d 100644
--- a/kernel/ia64/external-module-compat.h
+++ b/kernel/ia64/external-module-compat.h
@@ -42,6 +42,12 @@ typedef u64 phys_addr_t;
 
 #endif
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,30)
+
+#define PAGE_KERNEL_UC __pgprot(__DIRTY_BITS  | _PAGE_PL_0 | _PAGE_AR_RWX | \
+   _PAGE_MA_UC)
+#endif
+
 #endif
 
 #ifndef CONFIG_HAVE_KVM_IRQCHIP


[PATCH] kvm: qemu: Do not use log dirty on ia64

2009-04-06 Thread Avi Kivity
From: Zhang, Yang yang.zh...@intel.com

ia64 does not support log dirty.

Signed-off-by: Yang Zhang yang.zh...@intel.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 4164368..ed76367 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -1374,7 +1374,10 @@ int kvm_log_start(target_phys_addr_t phys_addr, target_phys_addr_t len)
 if (must_use_aliases_source(phys_addr))
 return 0;
 #endif
+
+#ifndef TARGET_IA64
 kvm_qemu_log_memory(phys_addr, len, 1);
+#endif
 return 0;
 }
 
@@ -1384,7 +1387,10 @@ int kvm_log_stop(target_phys_addr_t phys_addr, target_phys_addr_t len)
 if (must_use_aliases_source(phys_addr))
 return 0;
 #endif
+
+#ifndef TARGET_IA64
 kvm_qemu_log_memory(phys_addr, len, 0);
+#endif
 return 0;
 }
 


[PATCH] kvm: testsuite: add tests for short/near Jcc and call instruction emulation

2009-04-06 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/user/test/x86/realmode.c b/user/test/x86/realmode.c
index f6d5326..336ba1c 100644
--- a/user/test/x86/realmode.c
+++ b/user/test/x86/realmode.c
@@ -361,14 +361,94 @@ void test_io(void)
 void test_call(void)
 {
struct regs inregs = { 0 }, outregs;
+   u32 esp[16];
+
+   inregs.esp = (u32)esp;
+
    MK_INSN(call1, "mov $test_function, %eax \n\t"
                   "call *%eax\n\t");
+   MK_INSN(call_near1, "jmp 2f\n\t"
+                       "1: mov $0x1234, %eax\n\t"
+                       "ret\n\t"
+                       "2: call 1b\t");
+   MK_INSN(call_near2, "call 1f\n\t"
+                       "jmp 2f\n\t"
+                       "1: mov $0x1234, %eax\n\t"
+                       "ret\n\t"
+                       "2:\t");
 
    exec_in_big_real_mode(&inregs, &outregs,
                          insn_call1,
                          insn_call1_end - insn_call1);
    if(!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1234)
        print_serial("Call Test 1: FAIL\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+                   insn_call_near1, insn_call_near1_end - insn_call_near1);
+   if(!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1234)
+       print_serial("Call near Test 1: FAIL\n");
+   exec_in_big_real_mode(&inregs, &outregs,
+                   insn_call_near2, insn_call_near2_end - insn_call_near2);
+   if(!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1234)
+       print_serial("Call near Test 2: FAIL\n");
+}
+
+void test_jcc_short(void)
+{
+   struct regs inregs = { 0 }, outregs;
+   MK_INSN(jnz_short1, "jnz 1f\n\t"
+                       "mov $0x1234, %eax\n\t"
+                       "1:\n\t");
+   MK_INSN(jnz_short2, "1:\n\t"
+                       "cmp $0x1234, %eax\n\t"
+                       "mov $0x1234, %eax\n\t"
+                       "jnz 1b\n\t");
+   MK_INSN(jmp_short1, "jmp 1f\n\t"
+                       "mov $0x1234, %eax\n\t"
+                       "1:\n\t");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+                   insn_jnz_short1, insn_jnz_short1_end - insn_jnz_short1);
+   if(!regs_equal(&inregs, &outregs, 0))
+       print_serial("JNZ sort Test 1: FAIL\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+                   insn_jnz_short2, insn_jnz_short2_end - insn_jnz_short2);
+   if(!regs_equal(&inregs, &outregs, R_AX) || !(outregs.eflags & (1 << 6)))
+       print_serial("JNZ sort Test 2: FAIL\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+                   insn_jmp_short1, insn_jmp_short1_end - insn_jmp_short1);
+   if(!regs_equal(&inregs, &outregs, 0))
+       print_serial("JMP sort Test 1: FAIL\n");
+}
+
+void test_jcc_near(void)
+{
+   struct regs inregs = { 0 }, outregs;
+   /* encode near jmp manually. gas will not do it if offsets < 127 byte */
+   MK_INSN(jnz_near1, ".byte 0x0f, 0x85, 0x06, 0x00\n\t"
+                      "mov $0x1234, %eax\n\t");
+   MK_INSN(jnz_near2, "cmp $0x1234, %eax\n\t"
+                      "mov $0x1234, %eax\n\t"
+                      ".byte 0x0f, 0x85, 0xf0, 0xff\n\t");
+   MK_INSN(jmp_near1, ".byte 0xE9, 0x06, 0x00\n\t"
+                      "mov $0x1234, %eax\n\t");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+                   insn_jnz_near1, insn_jnz_near1_end - insn_jnz_near1);
+   if(!regs_equal(&inregs, &outregs, 0))
+       print_serial("JNZ near Test 1: FAIL\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+                   insn_jnz_near2, insn_jnz_near2_end - insn_jnz_near2);
+   if(!regs_equal(&inregs, &outregs, R_AX) || !(outregs.eflags & (1 << 6)))
+       print_serial("JNZ near Test 2: FAIL\n");
+
+   exec_in_big_real_mode(&inregs, &outregs,
+                   insn_jmp_near1, insn_jmp_near1_end - insn_jmp_near1);
+   if(!regs_equal(&inregs, &outregs, 0))
+       print_serial("JMP near Test 1: FAIL\n");
 }
 
 void test_null(void)
@@ -389,6 +469,10 @@ void start(void)
test_add_imm();
test_io();
test_eflags_insn();
+   test_jcc_short();
+   test_jcc_near();
+   /* test_call() uses short jump so call it after testing jcc */
+   test_call();
 
exit(0);
 }


[PATCH] kvm: qemu: improve pci host device address parsing

2009-04-06 Thread Avi Kivity
From: Han, Weidong weidong@intel.com

pci_parse_devaddr parses [[domain:]bus:]slot, so it accepts input that
gives only a slot, whereas the device assignment command
(-pcidevice host=bus:slot.func) requires the full bus:slot.func form.
This patch therefore adds a dedicated function to parse the device BDF in
the device assignment command, rather than mixing the two parsers together.

Signed-off-by: Weidong Han weidong@intel.com
Signed-off-by: Avi Kivity a...@redhat.com
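
Editorial sketch (not part of the patch): the strict bus:slot.func parsing described above can be illustrated with a small stand-alone function. The name parse_host_bdf is hypothetical and the body is a simplified rendering under the assumption that all three fields are hexadecimal and mandatory; it is not the code that was committed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse "bus:slot.func" (hex fields, all mandatory).
 * Returns 0 on success and -1 on malformed or out-of-range input. */
int parse_host_bdf(const char *addr, int *busp, int *slotp, int *funcp)
{
    const char *p = addr;
    char *e;
    unsigned long bus, slot, func;

    assert(addr && busp && slotp && funcp);

    bus = strtoul(p, &e, 16);
    if (e == p || *e != ':')
        return -1;          /* a bare slot is rejected here */

    p = e + 1;
    slot = strtoul(p, &e, 16);
    if (e == p || *e != '.')
        return -1;          /* the function number is mandatory too */

    p = e + 1;
    func = strtoul(p, &e, 16);
    if (e == p || *e != '\0')
        return -1;          /* reject trailing characters */

    /* PCI limits: 8-bit bus, 5-bit slot, 3-bit function */
    if (bus > 0xff || slot > 0x1f || func > 0x7)
        return -1;

    *busp = (int)bus;
    *slotp = (int)slot;
    *funcp = (int)func;
    return 0;
}
```

With this sketch, "00:1f.7" parses to bus 0, slot 0x1f, function 7, while "1f", "00:1f" and "00:20.0" are all rejected. The output parameters are only written on success, so a caller can pass the fields of a partially filled structure.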

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index b7f9fa6..09e54ae 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -1190,8 +1190,7 @@ out:
  */
 AssignedDevInfo *add_assigned_device(const char *arg)
 {
-char *cp, *cp1;
-char device[8];
+char device[16];
 char dma[6];
 int r;
 AssignedDevInfo *adev;
@@ -1202,6 +1201,13 @@ AssignedDevInfo *add_assigned_device(const char *arg)
 return NULL;
 }
     r = get_param_value(device, sizeof(device), "host", arg);
+    if (!r)
+         goto bad;
+
+    r = pci_parse_host_devaddr(device, &adev->bus, &adev->dev, &adev->func);
+    if (r)
+        goto bad;
+
     r = get_param_value(adev->name, sizeof(adev->name), "name", arg);
     if (!r)
        snprintf(adev->name, sizeof(adev->name), "%s", device);
@@ -1211,18 +1217,6 @@ AssignedDevInfo *add_assigned_device(const char *arg)
     if (r && !strncmp(dma, "none", 4))
         adev->disable_iommu = 1;
 #endif
-    cp = device;
-    adev->bus = strtoul(cp, &cp1, 16);
-    if (*cp1 != ':')
-        goto bad;
-    cp = cp1 + 1;
-
-    adev->dev = strtoul(cp, &cp1, 16);
-    if (*cp1 != '.')
-        goto bad;
-    cp = cp1 + 1;
-
-    adev->func = strtoul(cp, &cp1, 16);
 
     LIST_INSERT_HEAD(&adev_head, adev, next);
 return adev;
diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index eca0517..bf97c8c 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -163,6 +163,7 @@ static int pci_set_default_subsystem_id(PCIDevice *pci_dev)
 }
 
 /*
+ * Parse pci address in qemu command
  * Parse [[domain:]bus:]slot, return -1 on error
  */
 static int pci_parse_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp)
@@ -211,6 +212,55 @@ static int pci_parse_devaddr(const char *addr, int *domp, int *busp, unsigned *s
 return 0;
 }
 
+/*
+ * Parse device bdf in device assignment command:
+ *
+ * -pcidevice host=bus:dev.func
+ *
+ * Parse bus:slot.func return -1 on error
+ */
+int pci_parse_host_devaddr(const char *addr, int *busp,
+   int *slotp, int *funcp)
+{
+    const char *p;
+    char *e;
+    int val;
+    int bus = 0, slot = 0, func = 0;
+
+    p = addr;
+    val = strtoul(p, &e, 16);
+    if (e == p)
+       return -1;
+    if (*e == ':') {
+       bus = val;
+       p = e + 1;
+       val = strtoul(p, &e, 16);
+       if (e == p)
+           return -1;
+       if (*e == '.') {
+           slot = val;
+           p = e + 1;
+           val = strtoul(p, &e, 16);
+           if (e == p)
+               return -1;
+           func = val;
+       } else
+           return -1;
+    } else
+       return -1;
+
+    if (bus > 0xff || slot > 0x1f || func > 0x7)
+       return -1;
+
+    if (*e)
+       return -1;
+
+    *busp = bus;
+    *slotp = slot;
+    *funcp = func;
+    return 0;
+}
+
 int pci_read_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp)
 {
 char devaddr[32];
diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
index ea0bec8..890a41b 100644
--- a/qemu/hw/pci.h
+++ b/qemu/hw/pci.h
@@ -234,6 +234,9 @@ PCIDevice *pci_find_device(int bus_num, int slot, int function);
 int pci_read_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp);
 int pci_assign_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp);
 
+int pci_parse_host_devaddr(const char *addr, int *busp,
+   int *slotp, int *funcp);
+
 void pci_info(Monitor *mon);
 PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t vid, uint16_t did,
 pci_map_irq_fn map_irq, const char *name);


[PATCH] kvm: configure: --with-patched-kernel doesn't need kernelversion

2009-04-06 Thread Avi Kivity
From: Mark McLoughlin mar...@redhat.com

DEPMOD_VERSION is only used for kvm.ko, not for e.g. header-sync.

Signed-off-by: Mark McLoughlin mar...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kernel/configure b/kernel/configure
index 3fd0c94..7a222e4 100755
--- a/kernel/configure
+++ b/kernel/configure
@@ -12,6 +12,7 @@ cross_prefix=
 arch=`uname -m`
 # don't use uname if kerneldir is set
 no_uname=
+# we only need depmod_version for kvm.ko install
 depmod_version=
 if [ -z TMPDIR ] ; then
 TMPDIR=.
@@ -80,7 +81,7 @@ if [ -d $kerneldir/include2 ]; then
 kernelsourcedir=${kerneldir%/*}/source
 fi
 
-if [ -n "$no_uname" ]; then
+if [ -n "$no_uname" -a "$want_module" ]; then
 if [ -e $kerneldir/.kernelrelease ]; then
 depmod_version=`cat $kerneldir/.kernelrelease`
 


[PATCH] kvm: extboot: silence compiler warning

2009-04-06 Thread Avi Kivity
From: Jan Kiszka jan.kis...@web.de

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/qemu/hw/extboot.c b/qemu/hw/extboot.c
index 32e6226..13ffafa 100644
--- a/qemu/hw/extboot.c
+++ b/qemu/hw/extboot.c
@@ -77,8 +77,8 @@ static void extboot_write_cmd(void *opaque, uint32_t addr, uint32_t value)
 BlockDriverState *bs = opaque;
 int cylinders, heads, sectors, err;
 uint64_t nb_sectors;
-target_phys_addr_t pa;
-int blen;
+target_phys_addr_t pa = 0;
+int blen = 0;
 void *buf = NULL;
 
     if (cmd->type == 0x01 || cmd->type == 0x02) {


[PATCH] kvm: testsuite: test long JMP emulation

2009-04-06 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/user/test/x86/realmode.c b/user/test/x86/realmode.c
index 336ba1c..755b5d1 100644
--- a/user/test/x86/realmode.c
+++ b/user/test/x86/realmode.c
@@ -451,6 +451,23 @@ void test_jcc_near(void)
print_serial(JMP near Test 1: FAIL\n);
 }
 
+void test_long_jmp()
+{
+   struct regs inregs = { 0 }, outregs;
+   u32 esp[16];
+
+   inregs.esp = (u32)esp;
+   MK_INSN(long_jmp, "call 1f\n\t"
+                     "jmp 2f\n\t"
+                     "1: jmp $0, $test_function\n\t"
+                     "2:\n\t");
+   exec_in_big_real_mode(&inregs, &outregs,
+                         insn_long_jmp,
+                         insn_long_jmp_end - insn_long_jmp);
+   if(!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1234)
+       print_serial("Long JMP Test: FAIL\n");
+}
+
 void test_null(void)
 {
struct regs inregs = { 0 }, outregs;
@@ -473,6 +490,8 @@ void start(void)
test_jcc_near();
/* test_call() uses short jump so call it after testing jcc */
test_call();
+   /* long jmp test uses call near so test it after testing call */
+   test_long_jmp();
 
exit(0);
 }


[PATCH] kvm: Update .gitignore

2009-04-06 Thread Avi Kivity
From: Jan Kiszka jan.kis...@web.de

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/.gitignore b/.gitignore
index fcdc357..22a8200 100644
--- a/.gitignore
+++ b/.gitignore
@@ -53,10 +53,14 @@ kernel/x86/coalesced_mmio.[ch]
 kernel/x86/kvm_cache_regs.h
 kernel/x86/vtd.c
 kernel/x86/irq_comm.c
+kernel/x86/timer.c
+kernel/x86/kvm_timer.h
+kernel/x86/iommu.c
 qemu/pc-bios/extboot.bin
 qemu/qemu-doc.html
 qemu/*.[18]
 qemu/*.pod
 qemu/qemu-tech.html
+qemu/qemu-options.texi
 user/kvmtrace
 user/test/x86/bootstrap


[PATCH] kvm: qemu: fixup 4GB+ memslot large page alignment

2009-04-06 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

Need to align the 4GB+ memslot after we know its address, not before.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
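
Editorial sketch (not part of the patch): the ordering matters because rounding an address up only makes sense once the allocator has returned it. The helper below shows the usual power-of-two round-up arithmetic; align_up is a hypothetical name, and the sketch assumes the huge page size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * Assumption for this sketch: align is a non-zero power of two. */
uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For a 2 MB huge page size, an allocation returned at 0x201000 would round up to 0x400000, and an already aligned address is returned unchanged. If the alignment step ran before the allocation, its result would simply be overwritten by the allocator's return value.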

diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index d4a4320..cc84772 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -866,6 +866,7 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
 
 /* above 4giga memory allocation */
     if (above_4g_mem_size > 0) {
+        ram_addr = qemu_ram_alloc(above_4g_mem_size);
         if (hpagesize) {
             if (ram_addr & (hpagesize-1)) {
                 unsigned long aligned_addr;
@@ -874,7 +875,6 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
                 ram_addr = aligned_addr;
             }
         }
-        ram_addr = qemu_ram_alloc(above_4g_mem_size);
         cpu_register_physical_memory(0x100000000ULL,
  above_4g_mem_size,
  ram_addr);


[PATCH] KVM: VMX: Fix handling of a fault during NMI unblocked due to IRET

2009-04-06 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Bit 12 is undefined in any of the following cases:
 - If the VM exit sets the valid bit in the IDT-vectoring information field.
 - If the VM exit is due to a double fault.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
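
Editorial sketch (not part of the patch): the two "undefined" cases above translate into a predicate that trusts bit 12 only when the exit interruption information is valid, the vector is not a double fault, and the IDT-vectoring information is not valid. The function name must_reblock_nmi is hypothetical, and the numeric bit positions are written out as assumptions for this sketch rather than taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout for this sketch; only the macro names mirror the patch. */
#define INTR_INFO_VECTOR_MASK      0xffu        /* bits 7:0 */
#define INTR_INFO_UNBLOCK_NMI      (1u << 12)   /* NMI unblocking due to IRET */
#define INTR_INFO_VALID_MASK       (1u << 31)
#define VECTORING_INFO_VALID_MASK  (1u << 31)
#define DF_VECTOR                  8

/* Returns true when "blocked by NMI" should be set again before VM entry. */
bool must_reblock_nmi(uint32_t exit_intr_info, uint32_t idt_vectoring_info)
{
    bool unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
    bool idtv_valid = (idt_vectoring_info & VECTORING_INFO_VALID_MASK) != 0;
    uint8_t vector = exit_intr_info & INTR_INFO_VECTOR_MASK;

    /* Bit 12 is only meaningful if the field is valid, the exit is not a
     * double fault, and no event was being delivered through the IDT. */
    return (exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
           vector != DF_VECTOR && !idtv_valid;
}
```

A page fault (vector 14) raised by a guest IRET with bit 12 set yields true; the same exit with a valid IDT-vectoring field, or with the double-fault vector, yields false because bit 12 cannot be trusted there.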

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7d7b0d6..631f9b7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3272,36 +3272,41 @@ static void update_tpr_threshold(struct kvm_vcpu *vcpu)
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
u32 exit_intr_info;
-   u32 idt_vectoring_info;
+   u32 idt_vectoring_info = vmx->idt_vectoring_info;
    bool unblock_nmi;
    u8 vector;
    int type;
    bool idtv_info_valid;
    u32 error;
 
+   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
    exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
    if (cpu_has_virtual_nmis()) {
        unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
        vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
        /*
-        * SDM 3: 25.7.1.2
+        * SDM 3: 27.7.1.2 (September 2008)
         * Re-set bit block by NMI before VM entry if vmexit caused by
         * a guest IRET fault.
+        * SDM 3: 23.2.2 (September 2008)
+        * Bit 12 is undefined in any of the following cases:
+        *  If the VM exit sets the valid bit in the IDT-vectoring
+        *   information field.
+        *  If the VM exit is due to a double fault.
         */
-       if (unblock_nmi && vector != DF_VECTOR)
+       if ((exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
+           vector != DF_VECTOR && !idtv_info_valid)
            vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
                          GUEST_INTR_STATE_NMI);
    } else if (unlikely(vmx->soft_vnmi_blocked))
        vmx->vnmi_blocked_time +=
            ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 
-   idt_vectoring_info = vmx->idt_vectoring_info;
-   idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
    vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
    type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
    if (vmx->vcpu.arch.nmi_injected) {
/*
-* SDM 3: 25.7.1.2
+* SDM 3: 27.7.1.2 (September 2008)
 * Clear bit block by NMI before VM entry if a NMI delivery
 * faulted.
 */


[PATCH] KVM: Use rsvd_bits_mask in load_pdptrs()

2009-04-06 Thread Avi Kivity
From: Dong, Eddie eddie.d...@intel.com

Also remove bit 5-6 from rsvd_bits_mask per latest SDM.

Signed-off-by: Eddie Dong eddie.d...@intel.com
Signed-off-by: Avi Kivity a...@redhat.com
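
Editorial sketch (not part of the patch): the PDPTE reserved-bit mask added by this patch is built from inclusive bit ranges. The stand-alone rsvd_bits below is an assumed re-implementation of a helper with that calling convention (only the name and the three ranges come from the diff), and pdpte_rsvd_mask is a hypothetical wrapper used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set. Assumed semantics for this sketch. */
uint64_t rsvd_bits(int s, int e)
{
    assert(0 <= s && s <= e && e < 64);
    /* Avoid an undefined 64-bit shift when the range covers the whole word. */
    uint64_t width_mask = (e - s == 63) ? ~0ULL : ((1ULL << (e - s + 1)) - 1);
    return width_mask << s;
}

/* The three PDPTE ranges used in the patch, for a given physical address width. */
uint64_t pdpte_rsvd_mask(int maxphyaddr)
{
    return rsvd_bits(maxphyaddr, 63) | rsvd_bits(7, 8) | rsvd_bits(1, 2);
}
```

Assuming a 36-bit physical address width, the mask evaluates to 0xfffffff000000186. OR-ing bits 5-6 back in gives 0xfffffff0000001e6, which illustrates the effect of dropping those two bits from the reserved set.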

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e0f63b6..76dd43c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -225,11 +225,6 @@ static int is_nx(struct kvm_vcpu *vcpu)
    return vcpu->arch.shadow_efer & EFER_NX;
 }
 
-static int is_present_pte(unsigned long pte)
-{
-   return pte & PT_PRESENT_MASK;
-}
-
 static int is_shadow_present_pte(u64 pte)
 {
return pte != shadow_trap_nonpresent_pte
@@ -2195,6 +2190,9 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
        context->rsvd_bits_mask[1][0] = ~0ull;
        break;
    case PT32E_ROOT_LEVEL:
+       context->rsvd_bits_mask[0][2] =
+           rsvd_bits(maxphyaddr, 63) |
+           rsvd_bits(7, 8) | rsvd_bits(1, 2);  /* PDPTE */
        context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
            rsvd_bits(maxphyaddr, 62);  /* PDE */
        context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index eaab214..3494a2f 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -75,4 +75,9 @@ static inline int is_paging(struct kvm_vcpu *vcpu)
    return vcpu->arch.cr0 & X86_CR0_PG;
 }
 
+static inline int is_present_pte(unsigned long pte)
+{
+   return pte & PT_PRESENT_MASK;
+}
+
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aeb0193..0dcf95b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -234,7 +234,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
goto out;
}
    for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
-       if ((pdpte[i] & 1) && (pdpte[i] & 0xfffffff0000001e6ull)) {
+       if (is_present_pte(pdpte[i]) &&
+           (pdpte[i] & vcpu->arch.mmu.rsvd_bits_mask[0][2])) {
ret = 0;
goto out;
}


[PATCH] KVM: VMX: Fix feature testing

2009-04-06 Thread Avi Kivity
From: Sheng Yang sh...@linux.intel.com

The feature tests currently run too early, before vmcs_config has been
fully initialized.

Signed-off-by: Sheng Yang sh...@linux.intel.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1caa1fc..7d7b0d6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1208,15 +1208,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
              &vmx_capability.ept, &vmx_capability.vpid);
}
 
-   if (!cpu_has_vmx_vpid())
-   enable_vpid = 0;
-
-   if (!cpu_has_vmx_ept())
-   enable_ept = 0;
-
-   if (!cpu_has_vmx_flexpriority())
-   flexpriority_enabled = 0;
-
min = 0;
 #ifdef CONFIG_X86_64
min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
@@ -1320,6 +1311,15 @@ static __init int hardware_setup(void)
if (boot_cpu_has(X86_FEATURE_NX))
kvm_enable_efer_bits(EFER_NX);
 
+   if (!cpu_has_vmx_vpid())
+   enable_vpid = 0;
+
+   if (!cpu_has_vmx_ept())
+   enable_ept = 0;
+
+   if (!cpu_has_vmx_flexpriority())
+   flexpriority_enabled = 0;
+
return alloc_kvm_area();
 }
 


[PATCH] Merge commit 'qemu-svn/trunk'

2009-04-06 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* commit 'qemu-svn/trunk': (38 commits)
  Remove WIN32 guard around -k
  Add new command line option -singlestep for tcg single stepping.
  tcg/x86_64: optimize register allocation order
  stop dirty tracking just at the end of migration (Glauber Costa)
  create qemu_file_set_error (Glauber Costa)
  propagate error on failed completion (Glauber Costa)
  Disable qemu-io on Win32
  Add files not included in previous commit.
  Fix savevm after BDRV_FILE size enforcement
  Fix the build for --disable-aio
  gdbstub: Rework configuration via command line and monitor (Jan Kiszka)
  Make `-icount' help fit 80 chars screen width (Robert Riebisch)
  qemu-io - an I/O path exerciser (Christoph Hellwig)
  Fix display breakage when resizing the screen (v2) (Avi Kivity)
  Fix some win32 compile warnings
  Make binary stripping conditional (Riku Voipio)
  qcow2: fix image creation for large, > ~2TB, images (Chris Wright)
  pci_add storage: fix error handling for 'if' parameter (Eduardo Habkost)
  build system: clean qemu-options.texi and gdbstub-xml.c (Jan Kiszka)
  build system: silent generation of doc files and qemu-options.h (Jan Kiszka)
  ...

Conflicts:
qemu/Makefile.target
qemu/hw/vga.c
qemu/vl.c

Signed-off-by: Avi Kivity a...@redhat.com


[PATCH] KVM: VMX: Rewrite vmx_complete_interrupt()'s twisted maze of if() statements

2009-04-06 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

...with a more straightforward switch().

Also fix a bug where an NMI could be dropped on exit, although this should
never happen in practice: NMIs can only be injected, never triggered
internally by the guest the way exceptions are.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
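
Editorial sketch (not part of the patch): the shape of the rewritten function is "clear everything, return early if the IDT-vectoring information is invalid, then switch on the event type". The classifier below isolates that dispatch. The enum, the function name classify_idt_vectoring, and the numeric field encodings are assumptions made for this sketch; only the macro names mirror the diff.

```c
#include <stdint.h>

/* Assumed field encodings for this sketch. */
#define VECTORING_INFO_TYPE_MASK    0x700u
#define VECTORING_INFO_VALID_MASK   (1u << 31)
#define INTR_TYPE_EXT_INTR          (0u << 8)
#define INTR_TYPE_NMI_INTR          (2u << 8)
#define INTR_TYPE_HARD_EXCEPTION    (3u << 8)
#define INTR_TYPE_SOFT_EXCEPTION    (6u << 8)

/* Hypothetical result type: which event, if any, must be queued again. */
enum requeue {
    REQUEUE_NONE,
    REQUEUE_NMI,
    REQUEUE_EXCEPTION,
    REQUEUE_INTERRUPT
};

enum requeue classify_idt_vectoring(uint32_t info)
{
    /* Nothing was being delivered when the exit happened. */
    if (!(info & VECTORING_INFO_VALID_MASK))
        return REQUEUE_NONE;

    switch (info & VECTORING_INFO_TYPE_MASK) {
    case INTR_TYPE_NMI_INTR:
        return REQUEUE_NMI;
    case INTR_TYPE_HARD_EXCEPTION:
    case INTR_TYPE_SOFT_EXCEPTION:
        return REQUEUE_EXCEPTION;
    case INTR_TYPE_EXT_INTR:
        return REQUEUE_INTERRUPT;
    default:
        return REQUEUE_NONE;
    }
}
```

Because the NMI case is now decided from the vectoring information alone, an NMI whose delivery faulted is requeued regardless of any earlier injected flag, which is one way to read the "NMI could be dropped" fix described above.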

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 631f9b7..577aa95 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3277,7 +3277,6 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
u8 vector;
int type;
bool idtv_info_valid;
-   u32 error;
 
    idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
    exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
@@ -3302,34 +3301,42 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
        vmx->vnmi_blocked_time +=
            ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
 
+   vmx->vcpu.arch.nmi_injected = false;
+   kvm_clear_exception_queue(&vmx->vcpu);
+   kvm_clear_interrupt_queue(&vmx->vcpu);
+
+   if (!idtv_info_valid)
+       return;
+
    vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
    type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
-   if (vmx->vcpu.arch.nmi_injected) {
+
+   switch(type) {
+   case INTR_TYPE_NMI_INTR:
+       vmx->vcpu.arch.nmi_injected = true;
        /*
         * SDM 3: 27.7.1.2 (September 2008)
-        * Clear bit block by NMI before VM entry if a NMI delivery
-        * faulted.
+        * Clear bit block by NMI before VM entry if a NMI
+        * delivery faulted.
         */
-       if (idtv_info_valid && type == INTR_TYPE_NMI_INTR)
-           vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
-                   GUEST_INTR_STATE_NMI);
-       else
-           vmx->vcpu.arch.nmi_injected = false;
-   }
-   kvm_clear_exception_queue(&vmx->vcpu);
-   if (idtv_info_valid && (type == INTR_TYPE_HARD_EXCEPTION ||
-               type == INTR_TYPE_SOFT_EXCEPTION)) {
+       vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
+               GUEST_INTR_STATE_NMI);
+       break;
+   case INTR_TYPE_HARD_EXCEPTION:
+   case INTR_TYPE_SOFT_EXCEPTION:
        if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-           error = vmcs_read32(IDT_VECTORING_ERROR_CODE);
-           kvm_queue_exception_e(&vmx->vcpu, vector, error);
+           u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE);
+           kvm_queue_exception_e(&vmx->vcpu, vector, err);
        } else
            kvm_queue_exception(&vmx->vcpu, vector);
        vmx->idt_vectoring_info = 0;
-   }
-   kvm_clear_interrupt_queue(&vmx->vcpu);
-   if (idtv_info_valid && type == INTR_TYPE_EXT_INTR) {
+       break;
+   case INTR_TYPE_EXT_INTR:
        kvm_queue_interrupt(&vmx->vcpu, vector);
        vmx->idt_vectoring_info = 0;
+   break;
+   default:
+   break;
}
 }
 


[PATCH] KVM: Fix task switch back link handling.

2009-04-06 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

The back link is currently written to the wrong TSS.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
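
Editorial sketch (not part of the patch): the fix writes the previous task's selector into the new task's TSS, and only for switches that nest tasks (CALL and task gate). The helper below isolates that decision. The enum values, the function name back_link_selector, and the use of 0xffff as the "no back link" sentinel are assumptions made for this sketch.

```c
#include <stdint.h>

/* Hypothetical reason codes; only the names mirror the patch. */
enum task_switch_reason {
    TASK_SWITCH_CALL,
    TASK_SWITCH_IRET,
    TASK_SWITCH_JMP,
    TASK_SWITCH_GATE
};

/* Assumed sentinel meaning "do not write a back link". */
#define NO_BACK_LINK 0xffffu

/* Selector to store in the NEW task's prev_task_link, or NO_BACK_LINK. */
uint16_t back_link_selector(enum task_switch_reason reason, uint16_t old_tss_sel)
{
    /* Only CALL and task-gate switches nest tasks, so only they link back. */
    if (reason != TASK_SWITCH_CALL && reason != TASK_SWITCH_GATE)
        return NO_BACK_LINK;
    return old_tss_sel;
}
```

A caller would skip the write to the incoming TSS whenever the result equals the sentinel, so JMP and IRET switches leave the new task's back link untouched.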

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0dcf95b..a13fa70 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3710,7 +3710,6 @@ static void save_state_to_tss32(struct kvm_vcpu *vcpu,
    tss->fs = get_segment_selector(vcpu, VCPU_SREG_FS);
    tss->gs = get_segment_selector(vcpu, VCPU_SREG_GS);
    tss->ldt_selector = get_segment_selector(vcpu, VCPU_SREG_LDTR);
-   tss->prev_task_link = get_segment_selector(vcpu, VCPU_SREG_TR);
 }
 
 static int load_state_from_tss32(struct kvm_vcpu *vcpu,
@@ -3807,8 +3806,8 @@ static int load_state_from_tss16(struct kvm_vcpu *vcpu,
 }
 
 static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
-  u32 old_tss_base,
-  struct desc_struct *nseg_desc)
+ u16 old_tss_sel, u32 old_tss_base,
+ struct desc_struct *nseg_desc)
 {
struct tss_segment_16 tss_segment_16;
int ret = 0;
@@ -3827,6 +3826,16 @@ static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
    &tss_segment_16, sizeof tss_segment_16))
 goto out;
 
+   if (old_tss_sel != 0xffff) {
+   tss_segment_16.prev_task_link = old_tss_sel;
+
+   if (kvm_write_guest(vcpu->kvm,
+   get_tss_base_addr(vcpu, nseg_desc),
+   &tss_segment_16.prev_task_link,
+   sizeof tss_segment_16.prev_task_link))
+   goto out;
+   }
+
 if (load_state_from_tss16(vcpu, &tss_segment_16))
goto out;
 
@@ -3836,7 +3845,7 @@ out:
 }
 
 static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
-  u32 old_tss_base,
+  u16 old_tss_sel, u32 old_tss_base,
   struct desc_struct *nseg_desc)
 {
struct tss_segment_32 tss_segment_32;
@@ -3856,6 +3865,16 @@ static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
    &tss_segment_32, sizeof tss_segment_32))
 goto out;
 
+   if (old_tss_sel != 0xffff) {
+   tss_segment_32.prev_task_link = old_tss_sel;
+
+   if (kvm_write_guest(vcpu->kvm,
+   get_tss_base_addr(vcpu, nseg_desc),
+   &tss_segment_32.prev_task_link,
+   sizeof tss_segment_32.prev_task_link))
+   goto out;
+   }
+
 if (load_state_from_tss32(vcpu, &tss_segment_32))
goto out;
 
@@ -3911,12 +3930,17 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason)
 
 kvm_x86_ops->skip_emulated_instruction(vcpu);
 
+   /* set back link to prev task only if NT bit is set in eflags
+  note that old_tss_sel is not used afetr this point */
+   if (reason != TASK_SWITCH_CALL && reason != TASK_SWITCH_GATE)
+   old_tss_sel = 0xffff;
+
 if (nseg_desc.type & 8)
-   ret = kvm_task_switch_32(vcpu, tss_selector, old_tss_base,
-&nseg_desc);
+   ret = kvm_task_switch_32(vcpu, tss_selector, old_tss_sel,
+old_tss_base, &nseg_desc);
 else
-   ret = kvm_task_switch_16(vcpu, tss_selector, old_tss_base,
-&nseg_desc);
+   ret = kvm_task_switch_16(vcpu, tss_selector, old_tss_sel,
+old_tss_base, &nseg_desc);
 
 if (reason == TASK_SWITCH_CALL || reason == TASK_SWITCH_GATE) {
 u32 eflags = kvm_x86_ops->get_rflags(vcpu);


[PATCH] KVM: VMX: Do not zero idt_vectoring_info in vmx_complete_interrupts().

2009-04-06 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

We will need it later in task_switch().
Code in handle_exception() is dead. is_external_interrupt(vect_info)
will always be false since idt_vectoring_info is zeroed in
vmx_complete_interrupts().

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 577aa95..e4ad9d3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2626,11 +2626,6 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 printk(KERN_ERR "%s: unexpected, vectoring info 0x%x "
    "intr info 0x%x\n", __func__, vect_info, intr_info);
 
-   if (!irqchip_in_kernel(vcpu->kvm) && is_external_interrupt(vect_info)) {
-   int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
-   kvm_push_irq(vcpu, irq);
-   }
-
 if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR)
 return 1;  /* already handled by vmx_vcpu_run() */
 
@@ -3329,11 +3324,9 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 kvm_queue_exception_e(&vmx->vcpu, vector, err);
 } else
 kvm_queue_exception(&vmx->vcpu, vector);
-   vmx->idt_vectoring_info = 0;
 break;
 case INTR_TYPE_EXT_INTR:
 kvm_queue_interrupt(&vmx->vcpu, vector);
-   vmx->idt_vectoring_info = 0;
break;
default:
break;


[PATCH] KVM: Fix unneeded instruction skipping during task switching.

2009-04-06 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

There is no need to skip an instruction if the reason for a task switch
is a task gate in the IDT and access to it is caused by an external event.
The problem is currently solved only for VMX, since there is no reliable
way to skip an instruction in SVM. We should emulate it instead.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
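
The decision described above can be sketched as a small predicate, modelled on the VMX hunk of this patch. The enum and function names are hypothetical, and the model ignores everything except the event type that was being delivered when the task switch was intercepted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event types, loosely modelled on the VMX IDT-vectoring types. */
enum vectoring_type {
	VEC_EXT_INTR,
	VEC_NMI,
	VEC_HARD_EXCEPTION,
	VEC_SOFT_INTR,
	VEC_SOFT_EXCEPTION,
};

/*
 * Decide whether the instruction pointer should be advanced before the
 * task switch is emulated.
 */
bool should_skip_instruction(bool vectoring_valid, enum vectoring_type type)
{
	if (!vectoring_valid)
		return true; /* the switch came from an instruction itself */

	switch (type) {
	case VEC_EXT_INTR:
	case VEC_NMI:
	case VEC_HARD_EXCEPTION:
		return false; /* external event or fault: nothing to skip */
	case VEC_SOFT_INTR:
	case VEC_SOFT_EXCEPTION:
		return true; /* software-generated event, e.g. INT n */
	}

	assert(!"unknown vectoring type");
	return true;
}
```

The point of the split is that an external interrupt, NMI or hardware exception arrives between instructions, so advancing the instruction pointer would silently drop a guest instruction that never ran.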

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 82ada75..85574b7 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -225,6 +225,7 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_EVTINJ_VALID_ERR (1 << 11)
 
 #define SVM_EXITINTINFO_VEC_MASK SVM_EVTINJ_VEC_MASK
+#define SVM_EXITINTINFO_TYPE_MASK SVM_EVTINJ_TYPE_MASK
 
 #define SVM_EXITINTINFO_TYPE_INTR SVM_EVTINJ_TYPE_INTR
 #define SVM_EXITINTINFO_TYPE_NMI SVM_EVTINJ_TYPE_NMI
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1fcbc17..3ffb695 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1823,17 +1823,28 @@ static int task_switch_interception(struct vcpu_svm *svm,
 struct kvm_run *kvm_run)
 {
 u16 tss_selector;
+   int reason;
+   int int_type = svm->vmcb->control.exit_int_info &
+   SVM_EXITINTINFO_TYPE_MASK;
 
 tss_selector = (u16)svm->vmcb->control.exit_info_1;
+
 if (svm->vmcb->control.exit_info_2 &
 (1ULL << SVM_EXITINFOSHIFT_TS_REASON_IRET))
-   return kvm_task_switch(&svm->vcpu, tss_selector,
-  TASK_SWITCH_IRET);
-   if (svm->vmcb->control.exit_info_2 &
-   (1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
-   return kvm_task_switch(&svm->vcpu, tss_selector,
-  TASK_SWITCH_JMP);
-   return kvm_task_switch(&svm->vcpu, tss_selector, TASK_SWITCH_CALL);
+   reason = TASK_SWITCH_IRET;
+   else if (svm->vmcb->control.exit_info_2 &
+(1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
+   reason = TASK_SWITCH_JMP;
+   else if (svm->vmcb->control.exit_int_info & SVM_EXITINTINFO_VALID)
+   reason = TASK_SWITCH_GATE;
+   else
+   reason = TASK_SWITCH_CALL;
+
+
+   if (reason != TASK_SWITCH_GATE || int_type == SVM_EXITINTINFO_TYPE_SOFT)
+   skip_emulated_instruction(&svm->vcpu);
+
+   return kvm_task_switch(&svm->vcpu, tss_selector, reason);
 }
 
 static int cpuid_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e4ad9d3..c6997c0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3038,22 +3038,40 @@ static int handle_task_switch(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 struct vcpu_vmx *vmx = to_vmx(vcpu);
 unsigned long exit_qualification;
 u16 tss_selector;
-   int reason;
+   int reason, type, idt_v;
+
+   idt_v = (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK);
+   type = (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK);
 
 exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
 reason = (u32)exit_qualification >> 30;
-   if (reason == TASK_SWITCH_GATE && vmx->vcpu.arch.nmi_injected &&
-   (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
-   (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK)
-   == INTR_TYPE_NMI_INTR) {
-   vcpu->arch.nmi_injected = false;
-   if (cpu_has_virtual_nmis())
-   vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
- GUEST_INTR_STATE_NMI);
+   if (reason == TASK_SWITCH_GATE && idt_v) {
+   switch (type) {
+   case INTR_TYPE_NMI_INTR:
+   vcpu->arch.nmi_injected = false;
+   if (cpu_has_virtual_nmis())
+   vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+   break;
+   case INTR_TYPE_EXT_INTR:
+   kvm_clear_interrupt_queue(vcpu);
+   break;
+   case INTR_TYPE_HARD_EXCEPTION:
+   case INTR_TYPE_SOFT_EXCEPTION:
+   kvm_clear_exception_queue(vcpu);
+   break;
+   default:
+   break;
+   }
 }
 tss_selector = exit_qualification;
 
+   if (!idt_v || (type != INTR_TYPE_HARD_EXCEPTION &&
+  type != INTR_TYPE_EXT_INTR &&
+  type != INTR_TYPE_NMI_INTR))
+   skip_emulated_instruction(vcpu);
+
 if (!kvm_task_switch(vcpu, tss_selector, reason))
 return 0;
 
@@ -3306,7 +3324,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
 type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
 

[PATCH] KVM: VMX: Clean up Flex Priority related

2009-04-06 Thread Avi Kivity
From: Sheng Yang sh...@linux.intel.com

Also remove redundant parentheses in return statements.

Signed-off-by: Sheng Yang sh...@linux.intel.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aba41ae..1caa1fc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -216,61 +216,69 @@ static inline int is_external_interrupt(u32 intr_info)
 
 static inline int cpu_has_vmx_msr_bitmap(void)
 {
-   return (vmcs_config.cpu_based_exec_ctrl & CPU_BASED_USE_MSR_BITMAPS);
+   return vmcs_config.cpu_based_exec_ctrl & CPU_BASED_USE_MSR_BITMAPS;
 }
 
 static inline int cpu_has_vmx_tpr_shadow(void)
 {
-   return (vmcs_config.cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW);
+   return vmcs_config.cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW;
 }
 
 static inline int vm_need_tpr_shadow(struct kvm *kvm)
 {
-   return ((cpu_has_vmx_tpr_shadow()) && (irqchip_in_kernel(kvm)));
+   return (cpu_has_vmx_tpr_shadow()) && (irqchip_in_kernel(kvm));
 }
 
 static inline int cpu_has_secondary_exec_ctrls(void)
 {
-   return (vmcs_config.cpu_based_exec_ctrl &
-   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS);
+   return vmcs_config.cpu_based_exec_ctrl &
+   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
 }
 
 static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
 {
-   return flexpriority_enabled;
+   return vmcs_config.cpu_based_2nd_exec_ctrl &
+   SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+}
+
+static inline bool cpu_has_vmx_flexpriority(void)
+{
+   return cpu_has_vmx_tpr_shadow() &&
+   cpu_has_vmx_virtualize_apic_accesses();
 }
 
 static inline int cpu_has_vmx_invept_individual_addr(void)
 {
-   return (!!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT));
+   return !!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT);
 }
 
 static inline int cpu_has_vmx_invept_context(void)
 {
-   return (!!(vmx_capability.ept & VMX_EPT_EXTENT_CONTEXT_BIT));
+   return !!(vmx_capability.ept & VMX_EPT_EXTENT_CONTEXT_BIT);
 }
 
 static inline int cpu_has_vmx_invept_global(void)
 {
-   return (!!(vmx_capability.ept & VMX_EPT_EXTENT_GLOBAL_BIT));
+   return !!(vmx_capability.ept & VMX_EPT_EXTENT_GLOBAL_BIT);
 }
 
 static inline int cpu_has_vmx_ept(void)
 {
-   return (vmcs_config.cpu_based_2nd_exec_ctrl &
-   SECONDARY_EXEC_ENABLE_EPT);
+   return vmcs_config.cpu_based_2nd_exec_ctrl &
+   SECONDARY_EXEC_ENABLE_EPT;
 }
 
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
-   return ((cpu_has_vmx_virtualize_apic_accesses()) &&
-   (irqchip_in_kernel(kvm)));
+   return flexpriority_enabled &&
+   (cpu_has_vmx_virtualize_apic_accesses()) &&
+   (irqchip_in_kernel(kvm));
 }
 
 static inline int cpu_has_vmx_vpid(void)
 {
-   return (vmcs_config.cpu_based_2nd_exec_ctrl &
-   SECONDARY_EXEC_ENABLE_VPID);
+   return vmcs_config.cpu_based_2nd_exec_ctrl &
+   SECONDARY_EXEC_ENABLE_VPID;
 }
 
 static inline int cpu_has_virtual_nmis(void)
@@ -278,6 +286,11 @@ static inline int cpu_has_virtual_nmis(void)
 return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline bool report_flexpriority(void)
+{
+   return flexpriority_enabled;
+}
+
 static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
 {
int i;
@@ -1201,7 +1214,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 if (!cpu_has_vmx_ept())
 enable_ept = 0;
 
-   if (!(vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
+   if (!cpu_has_vmx_flexpriority())
flexpriority_enabled = 0;
 
min = 0;
@@ -3655,7 +3668,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.check_processor_compatibility = vmx_check_processor_compat,
.hardware_enable = hardware_enable,
.hardware_disable = hardware_disable,
-   .cpu_has_accelerated_tpr = cpu_has_vmx_virtualize_apic_accesses,
+   .cpu_has_accelerated_tpr = report_flexpriority,
 
.vcpu_create = vmx_create_vcpu,
.vcpu_free = vmx_free_vcpu,


[PATCH] KVM: MMU: Discard reserved bits checking on PDE bit 7-8

2009-04-06 Thread Avi Kivity
From: Sheng Yang sh...@linux.intel.com

1. It's related to a Linux kernel bug which was fixed by Ingo in commit
07a66d7c53a538e1a9759954a82bb6c07365eff9. The original code had existed for
quite a long time; it would convert a PDE for a large page into a normal PDE,
but the result did not fit the normal PDE format well. With the code before
Ingo's fix, the kernel would fail reserved bit checking on bit 8, the leftover
global bit of the PTE, so the kernel would receive a double fault.

2. After discussion, we decided to discard reserved bit checking on PDE bits
7-8 for now, since they are marked as reserved in the SDM but are in fact not
checked by the processor...

Signed-off-by: Sheng Yang sh...@linux.intel.com
Signed-off-by: Avi Kivity a...@redhat.com
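
To make the effect of dropping `rsvd_bits(7, 8)` concrete, here is a minimal user-space sketch of how such a mask is built and applied. The `rsvd_bits(s, e)` name appears in the diff below, but the body here is a reconstruction from its usage, and `has_reserved_bits()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with bits s..e (inclusive) set. */
uint64_t rsvd_bits(int s, int e)
{
	assert(s >= 0 && s <= e && e <= 63);

	uint64_t width = (uint64_t)(e - s + 1);
	/* A 64-bit wide shift is undefined in C, so handle it separately. */
	uint64_t ones = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return ones << s;
}

/* An entry is rejected when any bit of the mask is set in it. */
int has_reserved_bits(uint64_t entry, uint64_t mask)
{
	return (entry & mask) != 0;
}
```

With bits 7-8 in the mask, a PDE that still carries bit 8 (0x100) is treated as faulty; once those bits are removed from the mask the same entry passes the check.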

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 76dd43c..d5bdf3a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2194,7 +2194,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
 rsvd_bits(maxphyaddr, 63) |
 rsvd_bits(7, 8) | rsvd_bits(1, 2);  /* PDPTE */
 context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
-   rsvd_bits(maxphyaddr, 62);  /* PDE */
+   rsvd_bits(maxphyaddr, 62);  /* PDE */
 context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
 rsvd_bits(maxphyaddr, 62);  /* PTE */
 context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
@@ -2208,13 +2208,14 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
 context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
 rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
 context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
-   rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+   rsvd_bits(maxphyaddr, 51);
 context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
 rsvd_bits(maxphyaddr, 51);
 context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
 context->rsvd_bits_mask[1][2] = context->rsvd_bits_mask[0][2];
 context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
-   rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20);
+   rsvd_bits(maxphyaddr, 51) |
+   rsvd_bits(13, 20);  /* large page */
 context->rsvd_bits_mask[1][0] = ~0ull;
break;
}


[PATCH] KVM: x86 emulator: fix call near emulation

2009-04-06 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

The length of the return address pushed onto the stack depends on the
operand size, not the address size.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com
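
A toy model of the fixed behaviour, assuming a simplified emulator state: `struct emu_state`, `emu_push()` and `emu_call_near()` are hypothetical names for illustration, not the KVM emulator's own structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified emulator state; not the KVM decode cache. */
struct emu_state {
	uint8_t stack[64];  /* toy stack memory */
	uint64_t sp;        /* offset of the current stack top */
	uint64_t ip;        /* address of the next instruction */
	unsigned op_bytes;  /* operand size: 2, 4 or 8 */
	unsigned ad_bytes;  /* address size: 2, 4 or 8 */
};

/* Push the low op_bytes bytes of val, least significant byte first. */
void emu_push(struct emu_state *s, uint64_t val)
{
	assert(s->op_bytes == 2 || s->op_bytes == 4 || s->op_bytes == 8);
	assert(s->sp >= s->op_bytes && s->sp <= sizeof(s->stack));

	s->sp -= s->op_bytes;
	for (unsigned i = 0; i < s->op_bytes; i++)
		s->stack[s->sp + i] = (uint8_t)(val >> (8 * i));
}

/* Near call: remember the return address, jump, then push it. */
void emu_call_near(struct emu_state *s, int64_t rel)
{
	uint64_t ret_addr = s->ip;

	s->ip += (uint64_t)rel;
	emu_push(s, ret_addr); /* width follows op_bytes, never ad_bytes */
}
```

With `op_bytes = 2` and `ad_bytes = 4`, the call consumes two bytes of stack; the removed assignment would have made it consume four.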

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index ca91749..d7c9f6f 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -1792,7 +1792,6 @@ special_insn:
}
 c->src.val = (unsigned long) c->eip;
 jmp_rel(c, rel);
-   c->op_bytes = c->ad_bytes;
emulate_push(ctxt);
break;
}


Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer

2009-04-06 Thread Avi Kivity

Pekka Paalanen wrote:

Not just emulation but address diversion, i.e. modifying the operation
(not the text) before executing it. Mmiotrace could do something like
this:
1. a blob calls ioremap
2. mmiotrace maps the MMIO area privately
3. the blob receives a dummy map from ioremap, that will generate
page fault
4. the blob accesses the dummy map and raises a page fault
5. pf handler detects the dummy map
6. mmiotrace pf handler emulates the instruction and replaces the
dummy address with the real MMIO address.
7. mmiotrace records the operation and the datum
8. go to step 4, or whatever

This means mmiotrace would not have to fiddle with the page
tables and page presence bits like it does now. As said, this
would make mmiotrace SMP-proof, and also eliminate the die notifier
(used for the instruction single stepping trap).

IMO a big step from a hack to a tool. Getting rid of the custom
instruction parser in mmiotrace would be a good step in itself.

Avi Kivity noted that the KVM emulator does almost everything. Does
it also allow address diversion?
  


Operand access is by means of a callback, so yes.  In kvm's use, it's 
used to access guest memory, so it modifies the addresses before reading 
or writing.
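
A minimal sketch of what callback-based operand access with address diversion could look like. All names here (`struct emu_ops`, `struct divert_ctx`, `diverted_read()`, `emu_load()`) are hypothetical and much simpler than the real emulator interface; a plain byte array stands in for the real MMIO region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback table; the real emulator's ops differ. */
struct emu_ops {
	int (*read)(uint64_t addr, void *val, size_t bytes, void *ctx);
};

/* Hypothetical mapping from a faulting dummy range to the real range. */
struct divert_ctx {
	uint64_t dummy_base;
	uint64_t real_base;
	size_t len;
	const uint8_t *backing; /* stands in for the real MMIO region */
};

/* Translate the dummy address, then perform the access on the real one. */
int diverted_read(uint64_t addr, void *val, size_t bytes, void *ctx)
{
	struct divert_ctx *d = ctx;

	if (addr < d->dummy_base || addr - d->dummy_base + bytes > d->len)
		return -1; /* not inside the diverted range */

	uint64_t real = d->real_base + (addr - d->dummy_base);
	/* A tracer would record (real, bytes) here before the access. */
	memcpy(val, d->backing + (real - d->real_base), bytes);
	return 0;
}

/* The emulator core only ever reaches memory through the callback. */
int emu_load(const struct emu_ops *ops, uint64_t addr, void *val,
	     size_t bytes, void *ctx)
{
	assert(ops && ops->read);
	return ops->read(addr, val, bytes, ctx);
}
```

Because the core never dereferences the operand address itself, swapping in a different `read` callback is enough to redirect or log every access.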


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] add ksm kernel shared memory driver.

2009-04-06 Thread Izik Eidus

Andrey Panin wrote:

On 094, 04 04, 2009 at 05:35:22PM +0300, Izik Eidus wrote:

SNIP

  

+static inline u32 calc_checksum(struct page *page)
+{
+   u32 checksum;
+   void *addr = kmap_atomic(page, KM_USER0);
+   checksum = jhash(addr, PAGE_SIZE, 17);



Why is jhash2() not used here? It's faster and leads to smaller code size.
  


Because I didn't know; I will check that and change it.

Thanks.

(We should really use the in-CPU CRC on Intel Nehalem, and the dirty bit on 
the rest of the architectures...)


  

+   kunmap_atomic(addr, KM_USER0);
+   return checksum;
+}



  




Re: VMExits on Software Interrupts

2009-04-06 Thread Avi Kivity

Rastogi, Shaurya wrote:

Hi,

Is there any way to generate a VMExit for software interrupts?

I am interested in causing a VMExit when Int 80 instruction is executed in guest VM. Is it possible to do so? If so how? 
  


Neither vmx nor svm support trapping on software interrupts.  You might 
be able to track changes to the IDT and install a hardware breakpoint on 
the int 80 handler, but that's quite difficult and hacky.


--
error compiling committee.c: too many arguments to function



Re: KVM performance

2009-04-06 Thread Hauke Hoffmann
On Friday 03 April 2009 13:32:50 you wrote:
 Hello,

 as I want to switch from XEN to KVM I've made some performance tests
 to see if KVM is as performant as XEN. But tests with a VMU that receives
 a streamed video, adds a small logo to the video and streams it to a
 client have shown that XEN performs much better than KVM.
 In XEN the vlc (videolan client used to receive, process and send the
 video) process within the vmu has a cpuload of 33,8 % whereas in KVM
 the vlc process has a cpuload of 99.9 %.
 I'm not sure why; does anybody know some settings to improve
 the KVM performance?

 Thank you.
 Regards, Stefanie.


 Used hardware and settings:
 In the tests I've used the same host hardware for XEN and KVM:
 - Dual Core AMD 2.2 GHz, 8 GB RAM
 - Tested OSes for KVM Host: Fedora 10, 2.6.27.5-117.fc10.x86_64 with kvm
 version 10.fc10 version 74
 also tested in january: compiled kernel with
 kvm-83

 - KVM Guest settings: OS: Fedora 9 2.6.25-14.fc9.x86_64 (i386 also
 tested)
   RAM: 256 MB (same for XEN vmu)
   CPU: 1 Core with 2,2 GHz (same for XEN vmu)
   tested nic models: rtl8139, e1000, virtio

 Tested Scenario: VMU receives a streamed video , adds a logo (watermark)
 to the video stream and then streams it to a client

 Results:

 XEN:
 Host cpu load (virt-manager): 23%
 VMU  cpu load (virt-manager): 18 %
 VLC process within VMU (top): 33,8%

 KVM:
 no virt-manager cpu load as I started the vmu with the kvm command
 Host cpu load :   52%
 qemu-kvm process (top)77-100%
 VLC process within vmu (top): 80 - 99,9%

 KVM command to start vmu
 /usr/bin/qemu-kvm -boot c -hda /images/vmu01.raw -m 256 \
   -net nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio \
   -net tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown \
   -vnc 127.0.0.1:1 -k de --daemonize

Hi Stefanie,

Does vlc perform operations on disk (e.g. caching, logging, ...)? 

If it caches to disk, you can use virtio for the disk as well. 
Just change
-hda /images/vmu01.raw
to
-drive file=/images/vmu01.raw,if=virtio,boot=on

Regards
Hauke







 




-- 

hauke hoffmann service and electronic systems

Moristeig 60, D-23556 Lübeck

Telefon: +49 (0) 451 8896462
Fax: +49 (0) 451 8896461
Mobil: +49 (0) 170 7580491
E-Mail: off...@hauke-hoffmann.net
PGP public key: www.hauke-hoffmann.net/static/pgp/kontakt.asc


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Nikola Ciprich
Hi Izik,
Is there some user documentation available? (apart from RTFS? :))
I've compiled a kernel with v2 of your patches, loaded the ksm module, and
did echo 1 > /proc/sys/kernel/mm/ksm/run, but I think it didn't do
anything; at least no pages were collected.
Could you advise me a bit?
Thanks a lot in advance...
I can't wait to try it on our hosts running 50-60 KVMs :)
BR
nik


On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:
 From v1 to v2:
 
 1)Fixed security issue found by Chris Wright:
 Ksm was checking if a page is a shared page by running !PageAnon.
 Because Ksm scans only anonymous memory, all !PageAnon pages
 inside ksm data structures are shared pages. However, there might
 be a case in do_wp_page() when VM_SHARED is used where
 do_wp_page(), instead of copying the page into a new anonymous
 page, would reuse the page. It was fixed by adding a check for the
 dirty_bit of the virtual addresses pointing into the shared page.
 I did not find any VM code that would clear the dirty bit from
 this virtual address (due to the fact that we allocate the page
 using page_alloc() - kernel allocated pages), ~but i still want
 confirmation about this from the vm guys - thanks.~
 
 2)Moved to sysfs to control ksm:
 It was requested as a better way to control the ksm scanning
 thread than ioctls.
 the sysfs api:
 dir: /sys/kernel/mm/ksm/
 
 kernel_pages_allocated - information about how many kernel pages
 ksm has allocated; these pages are not swappable, and each such
 page is used by ksm to share pages with identical content
 
 pages_shared - how many pages were shared by ksm
 
 run - set to 1 when you want ksm to run, 0 when not
 
 max_kernel_pages - set the maximum amount of kernel pages
 to be allocated by ksm, set 0 for unlimited.
 
 pages_to_scan - how many pages to scan before ksm will sleep
 
 sleep - how many usecs ksm will sleep.
 
 3)Add sysfs parameter to control the maximum kernel pages to be
 allocated by ksm.
 
 4)Add statistics about how many pages are really shared.
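
For anyone scripting this, a control file such as the `run` entry under `/sys/kernel/mm/ksm/` can be written from C as well as from the shell. The helper below is a hypothetical sketch (the name `write_control_file()` is made up for illustration) and works on any writable file path.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short string to a control file; 0 on success, -1 on error. */
int write_control_file(const char *path, const char *value)
{
	assert(path && value);

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	int ok = fputs(value, f) >= 0;
	/* fclose() flushes, so a rejected value can surface here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Assuming the directory layout listed above, `write_control_file("/sys/kernel/mm/ksm/run", "1")` would be the equivalent of the shell `echo`; it needs the same privileges.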
 
 
 One issue still to be discussed:
 There was a suggestion to use madvise(SHAREABLE) instead of using
 ioctls to register memory that needs to be scanned by ksm.
 Such a change is outside the area of ksm.c and would require adding
 a new madvise API, and changing some parts of the vm and the kernel
 code, so the first thing to do is to decide whether we really want this.
 
 I don't know of any other open issues.
 
 Thanks.
 
 This is from the first post:
 (The kvm part, together with the kvm-userspace part, was posted with V1
 about a week ago; whoever wants to test ksm may download the
 patch from the lkml archive)
 
 KSM is a linux driver that allows dynamically sharing identical memory
 pages between one or more processes.
 
 Unlike traditional page sharing that is made at the allocation of the
 memory, ksm does it dynamically after the memory was created.
 Memory is periodically scanned; identical pages are identified and
 merged.
 The sharing is unnoticeable by the process that uses this memory.
 (the shared pages are marked as readonly, and in case of a write
 do_wp_page() takes care to create a new copy of the page)
 
 To find identical pages ksm uses an algorithm that is split into three
 primary levels:
 
 1) Ksm will start scanning the memory and will calculate a checksum for each
page that is registered to be scanned.
(In the first round of the scanning, ksm would only calculate
 this checksum for all the pages)
 
 2) Ksm will go over the whole memory again and will recalculate the
checksum of the pages; pages that are found to have the same
checksum value are considered pages that most likely
won't change.
Ksm will insert these pages into an RB-tree sorted by page content that
is called the unstable tree. The reason this tree is called
unstable is that the page contents might change
while they are still inside the tree, and therefore the tree would
become corrupted.
Due to this problem ksm takes two more steps in addition to the
checksum calculation:
a) Ksm will throw away and recreate the entire unstable tree each round
   of memory scanning - so if we have corruption, it will be fixed
   when we rebuild the tree.
b) Ksm is using an RB-tree, whose balancing is done by the node color
   and not by the content, so even if the page gets corrupted, it still
   would take the same amount of time to search in it.
 
 3) In addition to the unstable tree, ksm holds another tree that is called
the stable tree - this tree is an RB-tree that is sorted by the pages'
content and all its pages are write protected, and therefore it can't get
corrupted.
Each time ksm finds two identical pages using the unstable tree,
it will create a new write-protected shared page, and this page will be
inserted into the stable tree, and would be saved there, the
stable tree, unlike the 

Re: CPU Limits on KVM?

2009-04-06 Thread Oshrit Feder
Hi Francisco, 
I've been trying to limit the CPU usage of a VM using cgroups, so I can 
share my experience with you. However, from monitoring the kvm process, 
it looks like this doesn't enforce a hard limit on CPU usage. Hopefully 
someone can shed some light on the issue. 
I'm interested in limiting the I/O bandwidth of network and disk as well 
(dm-ioband - any experience with that? other suggestions?).

1. mount -t cgroup none /dev/cgroup -o cpu,memory
This creates the cgroup file system API on /dev/cgroup. The top level 
hierarchy node contains all the processes by default. 
2. cd /dev/cgroup
3. mkdir VM1
4. mkdir VM2
5. echo {kvm1 pid} > VM1/tasks
6. echo {kvm2 pid} > VM2/tasks
This moves the VMs' processes from the top node to a child node, to enable 
individual control.
7. edit VM1/cpu.shares and VM2/cpu.shares to change the CPU proportion 
(default is 1024 shares)
I've used 128 for VM1 and 1024 for VM2. To my understanding, that should 
yield a proportion of 128/(1024+128), about 11% (under 15%), as the CPU 
limit for VM1.
Each of the VMs was allocated 2 vcpus; the host has 4 cores. (I also tried 
the same with 1 vcpu each; the results weren't different.)
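
The arithmetic behind step 7 can be sketched as below. `share_fraction()` is a hypothetical helper, and it models only the case where every group is busy at the same time; as far as I understand, cpu.shares is a relative weight rather than a cap, so a group may use more than this fraction whenever the others leave CPU idle.

```c
#include <assert.h>

/*
 * Fraction of CPU time a group is entitled to when every group in
 * shares[] is runnable and competing. Shares are weights, not caps.
 */
double share_fraction(const unsigned *shares, int n, int idx)
{
	unsigned long total = 0;

	assert(shares && n > 0 && idx >= 0 && idx < n);
	for (int i = 0; i < n; i++)
		total += shares[i];
	assert(total > 0);

	return (double)shares[idx] / (double)total;
}
```

For shares of 128 and 1024 this gives roughly 0.11 for the first group, and only under contention, which may be why a lightly loaded host still shows VM1 well above that figure.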

Monitoring results:
I used top, atop and the system monitoring GUI on the host and on the guest 
for monitoring. I was able to load VM1 to use more than 50%...
 

(Monitoring screenshot at http://tinyurl.com/csx3v7. On the left: guest 
monitoring, where the time graph shows peaks of more than 50%. On the 
right: host monitoring, showing kvm pid 32620 (the VM1 pid) with 63% load.)


Regarding the references you asked for, these are the ones I've 
found and worked with:
www.mjmwired.net/kernel/Documentation/cgroups.txt
http://tinyurl.com/dcnoav

- Oshrit
oshr...@il.ibm.com



From:
Brian Jackson i...@theiggy.com
To:
Francisco Mazzeo francisco.maz...@gmail.com, kvm@vger.kernel.org
Date:
03/04/2009 02:32
Subject:
Re: CPU Limits on KVM?



I haven't ever really used cgroups. I always figured a fair host scheduler
is good enough to handle spreading load. So I don't know if it will fit
exactly what you need. I don't think so. I also don't know of any other
options. I will say, if I gave 4 VMs a single cpu each on a 4 core host, I
would expect the host to be fully loaded. I wouldn't see any reason for the
host not to be fully loaded. That is after all one of the key points of
virtualization: better utilization of hardware.



On Thursday 02 April 2009 17:33:07 Francisco Mazzeo wrote:
 Hello Brian,

  Thanks for the reply. Is there a wiki about cgroups and how to set
 them up?

  Also, I tried just for kicks to see what would happen if I create 4
 Virtual Windows machines, run prime95 (a tool that does iterations
 like superpi to stress test memory/cpu) on all of them and just assign
 them only ONE core each.

  The server node did not crash and you are right; however, I was hoping
 for the server load to stay below 50% as I only gave one single
 core to each KVM VE. Instead it seems like KVM let each VE get one
 slice of each of the 4 cores of my CPU, which did not accomplish what
 I wanted.

  Is cgroups the only choice available?

 -- Francisco

 On Thu, Apr 2, 2009 at 3:29 PM, Brian Jackson i...@theiggy.com wrote:
  There's CPU cgroups. It doesn't have exactly the ability you are after,
  but it is able to limit process(es) CPU usage. Maxing out CPU usage won't
  crash your server. The kernel will arbitrate sharing the CPU evenly among
  processes/VMs.
 
  --Brian Jackson
 
  On Thursday 02 April 2009 16:41:10 Francisco Mazzeo wrote:
  Hello,
 
   I am a new user to KVM and was wondering if there was any way to
  limit a VE from using up all the resources of the processor.
 
 
   Right now I have a Quad core 2.5Ghz, I have a KVM VE (running windows
  server 2003) and assigned 4 CPUs to it. If I max out the load for that
  VE, the entire host node load will be 100% which may crash it if I
  hosted more than 1 single VE.
 
   OpenVZ has cpulimit command, does KVM have something similar or any
  way that I can implement a limit on a single VE? Say I want to only
  give a max of 500Mhz per core, to total 2Ghz to the VE.
 
  Thanks
  Francisco
  www.navigatoris.net / www.serversoutlet.com


Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2

2009-04-06 Thread Andrea Arcangeli
On Mon, Apr 06, 2009 at 05:04:49PM +1000, Nick Piggin wrote:
 They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc.
 Presumably they will probably want to control it to interleave it over
 all numa nodes and use hugepages for it. It would be very little work.

I thought it's the intermediate result of the computations that leads
to lots of equal data too, in which case ksm is the only way to share
it all.
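
To make the "lots of equal data" point concrete, here is a small sketch that counts how many pages of a buffer are byte-for-byte duplicates of an earlier page, i.e. how many pages a content-based sharing scheme could in principle merge. This is not how ksm itself finds candidates; hashing with SHA-1 and the 4096-byte page size are assumptions made purely for the illustration.

```python
import hashlib
from collections import Counter

PAGE_SIZE = 4096  # assumed page size for the illustration


def mergeable_pages(buf, page_size=PAGE_SIZE):
    """Count pages in `buf` whose content duplicates an earlier page."""
    digests = Counter(
        hashlib.sha1(buf[off:off + page_size]).digest()
        for off in range(0, len(buf) - page_size + 1, page_size)
    )
    # Every group of n identical pages could collapse to one shared page.
    return sum(n - 1 for n in digests.values())


# Three zero pages, two pages of 0x01 and one unique page.
memory = (b"\x00" * PAGE_SIZE * 3
          + b"\x01" * PAGE_SIZE * 2
          + bytes(range(256)) * 16)
print(mergeable_pages(memory))
```

Intermediate results that happen to be equal show up here exactly like the zero pages do: nothing in the program declared them shared, they just have the same content.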


Re: strange guest slowness after some time

2009-04-06 Thread Tomasz Chmielewski

Tomasz Chmielewski wrote:


As I mentioned, it was using virtio net.

Guests running with e1000 (and virtio_blk) don't have this problem.


Also, virtio_console seems to be affected by this slowness issue.

Am I correct to think that if:

* on guest lsmod outputs:

virtio_console  6828  0 [permanent]

* on guest, /etc/inittab contains:

6:2345:respawn:/sbin/mingetty ttyS0

* on host, I start the guest with a parameter:

-serial unix:/var/run/qemu-server/103.serial,server,nowait


then the guest's ttyS0 console is virtio_console?



If my thinking is correct, then I have a slow serial console on some 
of the guests using the virtio_pci and virtio_console drivers.



By slow serial console I mean that any character typed shows up after a 
second or so.


It can also be cured the same way as with virtio_net - just run:

dd if=/dev/vda of=/dev/null

And the console reacts normally. Stop dd, console is slow again.


I have this issue on two guests with e1000 network, which use virtio_blk 
(and virtio_console...).

I never saw this issue with guests which don't use virtio.



--
Tomasz Chmielewski
http://wpkg.org



Fwd: KVM performance

2009-04-06 Thread BRAUN, Stefanie
 

-----Original Message-----
From: BRAUN, Stefanie 
Sent: Monday, April 6, 2009 18:25
To: 'Avi Kivity'
Subject: Re: KVM performance

 

-----Original Message-----
From: Avi Kivity [mailto:a...@redhat.com]
Sent: Monday, April 6, 2009 13:45
To: BRAUN, Stefanie
Cc: kvm@vger.kernel.org
Subject: Re: KVM performance

BRAUN, Stefanie wrote:
 Hello,

 as I want to switch from XEN to KVM I've made some performance tests 
 to see if KVM is as performant as XEN. But tests with a VMU that 
 receives a streamed video, adds a small logo to the video and streams 
 it to a client have shown that XEN performs much better than KVM.
 In XEN the vlc (videolan client used to receive, process and send the
 video) process within the vmu has a CPU load of 33.8% whereas in KVM
 the vlc process has a CPU load of 99.9%.
 I'm not sure why; does anybody know some settings to improve the KVM 
 performance?
   

Is this a tcp test?

Can you test receive and transmit separately?

Hello,

it's a transcoder test, but without transcoding between video formats, the 
vmu just adds a logo (a watermark) into the video.

At the same time the vmu performed several actions:
- receiving a streamed video via udp
- adding a logo to the video
- sending the streamed video via udp

But I think I can split up the test into the following subtests and provide 
further performance values:
- Sub test 1 (receive): receiving the video from the network (udp) and
  saving it locally
- Sub test 2 (transmit): reading the video from a local resource and
  sending it via the network
- Sub test 3 (process): reading the video from a local resource, adding
  the logo to the video stream and saving it again locally


 

--
error compiling committee.c: too many arguments to function



Re: KVM performance

2009-04-06 Thread BRAUN, Stefanie
 

-----Original Message-----
From: Hauke Hoffmann [mailto:kont...@hauke-hoffmann.net] 
Sent: Monday, April 6, 2009 14:13
To: kvm@vger.kernel.org
Cc: BRAUN, Stefanie
Subject: Re: KVM performance

On Friday 03 April 2009 13:32:50 you wrote:
 Hello,

 as I want to switch from XEN to KVM I've made some performance tests 
 to see if KVM is as performant as XEN. But tests with a VMU that 
 receives a streamed video, adds a small logo to the video and streams 
 it to a client have shown that XEN performs much better than KVM.
 In XEN the vlc (videolan client used to receive, process and send the
 video) process within the vmu has a CPU load of 33.8% whereas in KVM
 the vlc process has a CPU load of 99.9%.
 I'm not sure why; does anybody know some settings to improve the KVM 
 performance?

 Thank you.
 Regards, Stefanie.


 Used hardware and settings:
 In the tests I've used the same host hardware for XEN and KVM:
 - Dual Core AMD 2.2 GHz, 8 GB RAM
 - Tested OSes for KVM Host: Fedora 10, 2.6.27.5-117.fc10.x86_64 with
   kvm version 10.fc10 version 74;
   also tested in January: compiled kernel with kvm-83

 - KVM Guest settings: OS: Fedora 9 2.6.25-14.fc9.x86_64 (i386 also tested)
   RAM: 256 MB (same for XEN vmu)
   CPU: 1 core at 2.2 GHz (same for XEN vmu)
   tested nic models: rtl8139, e1000, virtio

 Tested Scenario: VMU receives a streamed video , adds a logo 
 (watermark) to the video stream and then streams it to a client

 Results:

 XEN:
 Host cpu load (virt-manager): 23%
 VMU cpu load (virt-manager):  18%
 VLC process within VMU (top): 33.8%

 KVM:
 no virt-manager cpu load as I started the vmu with the kvm command
 Host cpu load:                52%
 qemu-kvm process (top):       77-100%
 VLC process within vmu (top): 80-99.9%

 KVM command to start vmu
 /usr/bin/qemu-kvm -boot c -hda /images/vmu01.raw -m 256 \
   -net nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio \
   -net tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown \
   -vnc 127.0.0.1:1 -k de --daemonize

Hi Stefanie,

does vlc perform operations on disk (e.g. caching, logging, ...)? 

If it caches, you can also use virtio for the disk. 
Just change
-hda /images/vmu01.raw
to
-drive file=/images/vmu01.raw,if=virtio,boot=on
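
For anyone scripting their guest start-up, the same substitution can be sketched as a small helper that rewrites an argument list. The function name is invented for this example; the option strings are exactly the two shown above.

```python
def use_virtio_disk(argv):
    """Return a copy of argv with '-hda PATH' replaced by a virtio -drive."""
    out = []
    args = iter(argv)
    for arg in args:
        if arg == "-hda":
            path = next(args)  # the image path follows -hda
            out += ["-drive", "file=%s,if=virtio,boot=on" % path]
        else:
            out.append(arg)
    return out


old = ["/usr/bin/qemu-kvm", "-boot", "c", "-hda", "/images/vmu01.raw", "-m", "256"]
print(" ".join(use_virtio_disk(old)))
```

The guest needs the virtio_blk driver for this to boot, and the disk will typically appear under a different device name inside the guest.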

Regards
Hauke


Hi Hauke,

Thanks for your reply.

vlc does not perform excessive operations on disk.

Even so, I've added virtio for the disk to the vmu setup. 
But the qemu-kvm process on the host and the
vlc process within the vmu still consume up to 100% CPU.

Regards,
Stefanie






 

 Alcatel-Lucent Deutschland AG
 Bell Labs Germany
 Service Infrastructure, ZFZ-SI
 Stefanie Braun
 Phone:   +49.711.821-34865
 Fax: +49.711.821-32453

 Postal address:
 Alcatel-Lucent Deutschland AG
 Lorenzstrasse 10
 D-70435 STUTTGART

 Mail: stefanie.br...@alcatel-lucent.de



 Alcatel-Lucent Deutschland AG
 Registered office: Stuttgart - District Court (Amtsgericht) Stuttgart HRB 4026
 Chairman of the Supervisory Board: Michael Oppenhoff
 Management Board: Alf Henryk Wulf (Chairman), Dr. Rainer Fechner

 



-- 

hauke hoffmann service and electronic systems

Moristeig 60, D-23556 Lübeck

Phone: +49 (0) 451 8896462
Fax: +49 (0) 451 8896461
Mobile: +49 (0) 170 7580491
E-Mail: off...@hauke-hoffmann.net
PGP public key: www.hauke-hoffmann.net/static/pgp/kontakt.asc


Re: [PATCH] KVM: Qemu: Flush i-cache after ide-dma operation in IA64

2009-04-06 Thread Hollis Blanchard
On Thu, 2009-04-02 at 10:01 +0800, Zhang, Yang wrote:
 The data from dma will include instructions. In order to execute the right
 instructions, we should flush the i-cache to ensure that the data can be seen
 by the cpu.
 
 Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com
 Signed-off-by: Yang Zhang yang.zh...@intel.com
 ---
 
 
 diff --git a/qemu/cache-utils.h b/qemu/cache-utils.h
 index b45fde4..5e11d12 100644
 --- a/qemu/cache-utils.h
 +++ b/qemu/cache-utils.h
 @@ -33,8 +33,22 @@ static inline void flush_icache_range(unsigned long start, unsigned long stop)
      asm volatile ("sync" : : : "memory");
      asm volatile ("isync" : : : "memory");
  }
 +#define qemu_sync_idcache flush_icache_range
 +#else
  
 +#ifdef __ia64__
 +static inline void qemu_sync_idcache(unsigned long start, unsigned long stop)
 +{
 +    while (start < stop) {
 +        asm volatile ("fc %0" :: "r"(start));
 +        start += 32;
 +    }
 +    asm volatile (";;sync.i;;srlz.i;;");
 +}
  #else
 +static inline void qemu_sync_idcache(unsigned long start, unsigned long stop)
 +#endif
 +
  #define qemu_cache_utils_init(envp) do { (void) (envp); } while (0)
  #endif

You already have flush_icache_range() in qemu/target-ia64/fake-exec.c,
so this is redundant. Moving that to cache-utils.h might make sense, but
this should be discussed on qemu-devel.

Also, flush_icache_range() is already called from
cpu_physical_memory_rw(). It would be helpful to include a comment in
this commit explaining why this path is different. (I can see that it
is, but only because I went hunting myself.)

 diff --git a/qemu/cutils.c b/qemu/cutils.c
 index 5b36cc6..7b57173 100644
 --- a/qemu/cutils.c
 +++ b/qemu/cutils.c
 @@ -23,6 +23,7 @@
   */
  #include "qemu-common.h"
  #include "host-utils.h"
 +#include "cache-utils.h"
  #include <assert.h>
  
  void pstrcpy(char *buf, int buf_size, const char *str)
 @@ -215,6 +216,8 @@ void qemu_iovec_from_buffer(QEMUIOVector *qiov, const void *buf, size_t count)
          if (copy > qiov->iov[i].iov_len)
              copy = qiov->iov[i].iov_len;
          memcpy(qiov->iov[i].iov_base, p, copy);
 +        qemu_sync_idcache((unsigned long)qiov->iov[i].iov_base, 
 +                    (unsigned long)(qiov->iov[i].iov_base + copy));
          p += copy;
          count -= copy;
      }

This is way too generic a call for this location. Other architectures
also need to synchronize L1 caches sometimes, but they don't need to do
it here. You need to comment and guard this call better (probably using
some combination of kvm_enabled() and ifdefs).

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: KVM Port

2009-04-06 Thread Hollis Blanchard
On Sun, 2009-04-05 at 00:11 +0530, kvm port wrote:
 ok, so these are a few steps to begin
 (a) add a QEMUMachine for my h/w in qemu

As an alternative, you could start exercising KVM kernel code using
kvm-userspace/test before qemu is ready.

 (b) Add arch support in kvm
 
 I have a few questions
 (a) qemu starts in user space, how would I configure my linux. Should
 the linux run in Hypervisor state and the apps run in user state, and
 nothing runs in guest state [ there are 3 states in my processor]

Are there only 3, or are there two independent dimensions
(hypervisor/guest, user/supervisor)? If there are only 3, you'll need to
figure out how to isolate guest kernel and guest userspace from each
other.

 (b) qemu starts the VM and somehow ( i dont know yet, how?) , starts
 my code in processor guest state

Why are you asking us? You are the processor expert... :)

Qemu calls into KVM via an ioctl, and processor-specific KVM code
(that's you) somehow jumps into guest mode.
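
As a rough map of that ioctl path, the sketch below computes the request numbers userspace passes to ioctl() and lists the order in which they are issued. It assumes the asm-generic _IO() encoding used on x86; some architectures encode _IO() differently, so a new port should take the values from its own headers. Nothing here opens /dev/kvm.

```python
KVMIO = 0xAE  # ioctl "type" byte used by linux/kvm.h


def _IO(type_, nr):
    """asm-generic _IO(): no direction bits and a zero size field."""
    return (type_ << 8) | nr


KVM_GET_API_VERSION = _IO(KVMIO, 0x00)
KVM_CREATE_VM = _IO(KVMIO, 0x01)
KVM_CREATE_VCPU = _IO(KVMIO, 0x41)
KVM_RUN = _IO(KVMIO, 0x80)

# Order in which userspace issues the calls, and on which file descriptor.
SEQUENCE = [
    ("/dev/kvm fd", "KVM_GET_API_VERSION", KVM_GET_API_VERSION),
    ("/dev/kvm fd", "KVM_CREATE_VM", KVM_CREATE_VM),
    ("vm fd", "KVM_CREATE_VCPU", KVM_CREATE_VCPU),
    ("vcpu fd", "KVM_RUN", KVM_RUN),
]

for fd, name, request in SEQUENCE:
    print("%-12s ioctl(%s) = 0x%04x" % (fd, name, request))
```

KVM_RUN is the call that ends up in the processor-specific code; it returns to userspace whenever the guest does something the kernel cannot handle on its own.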

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: kvm-84 + virtio Ubuntu Hardy guests

2009-04-06 Thread Dustin Kirkland
On Mon, 2009-04-06 at 18:43 +0100, Mark McLoughlin wrote:
 The problem here is that 2.6.24/5 vintage guests are saying they support
 something they don't. See this for further details:
 
 http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00574.html

Agreed, understood.  And those kernels should be patched.

However, it's a bit of a chicken/egg problem...  One can't even boot
those guests (with virtio) to update the kernel, as it panics on boot
(without the kvm hack).


:-Dustin



Re: kvm-84 + virtio Ubuntu Hardy guests

2009-04-06 Thread Mark McLoughlin
On Mon, 2009-04-06 at 09:55 -0700, Dustin Kirkland wrote:
 On Tue, Mar 31, 2009 at 5:28 PM, Dustin Kirkland kirkl...@canonical.com wrote:
  I'm receiving a heavy volume of Ubuntu Jaunty Beta users reporting
  that Jaunty hosts running kvm-84 (userspace and kernel) are not able
  to boot previously-working Hardy guests (2.6.24 kernel) if virtio
  networking is enabled [1].  Users report that if e1000 is used
  instead, the guest is able to boot (with degraded network performance,
  obviously).  Users are also reporting that this was not a problem when
  kvm-82 was used in Jaunty (though we also merged libvirt 0.5.1 up to
  0.6.0 in roughly the same timeframe).
 ...
  [1] https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/331128
 
 Howdy-
 
 Just a follow-up...
 
 Anthony was able to confirm this issue, and create a patch for KVM,
 which we're carrying in Ubuntu.  It's a bit of a special-case hack,
 but I'm dropping it here for the sake of completeness.
 
 Basically, Hardy guests do not have working GSO (generic segmentation
 offload) support.  Some changes in kvm/libvirt appear to be exposing
 this, and breaking some guests when running virtio.
 
 This patch from Anthony basically disables this support in KVM
 userspace (until we have a better solution for auto-detecting GSO
 support or lack thereof).

The problem here is that 2.6.24/5 vintage guests are saying they support
something they don't. See this for further details:

http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00574.html

Cheers,
Mark.
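
A minimal sketch of why a guest that acknowledges too much is a problem, and what a host-side workaround of the "disable it in userspace" kind looks like. The bit numbers are the virtio-net feature bits; which bits Anthony's patch actually touches is not stated in this thread, so masking the whole offload group below is an assumption, and the function names are made up.

```python
# virtio-net feature bit numbers
VIRTIO_NET_F_CSUM = 0
VIRTIO_NET_F_GUEST_CSUM = 1
VIRTIO_NET_F_GUEST_TSO4 = 7
VIRTIO_NET_F_GUEST_TSO6 = 8
VIRTIO_NET_F_GUEST_ECN = 9
VIRTIO_NET_F_GUEST_UFO = 10
VIRTIO_NET_F_HOST_TSO4 = 11
VIRTIO_NET_F_HOST_TSO6 = 12
VIRTIO_NET_F_HOST_ECN = 13
VIRTIO_NET_F_HOST_UFO = 14


def bits(*numbers):
    """Turn feature bit numbers into a bitmask."""
    mask = 0
    for n in numbers:
        mask |= 1 << n
    return mask


OFFLOAD_BITS = bits(*range(VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_HOST_UFO + 1))


def negotiate(host_offered, guest_acked):
    """The device may only use features that both sides agreed on."""
    return host_offered & guest_acked


host = bits(VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM) | OFFLOAD_BITS
buggy_guest = host  # acknowledges everything, working or not

print(hex(negotiate(host, buggy_guest)))                  # offloads enabled
print(hex(negotiate(host & ~OFFLOAD_BITS, buggy_guest)))  # offloads never offered
```

The host cannot tell from the acknowledgement alone whether the guest's implementation works, so the only lever it has is to not offer the feature in the first place.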



Re: [PATCH] KVM: Make kvm header compile under g++.

2009-04-06 Thread nathan binkert
Hi Avi,

You said that you'd be willing to include this.  I don't want to
pester or anything, but I would like it to not fall into the abyss.
Would you like me to file it as a bug and assign it to you?  Are there
any changes that you'd like?

The one change you mentioned was to pull struct kvm_io outside of
struct kvm_run.  I mentioned that a grep shows no usage of kvm_io
anywhere, so I didn't do that.

  Nate

On Fri, Mar 27, 2009 at 9:53 PM, nathan binkert n...@binkert.org wrote:
 Two things needed fixing: 1) g++ does not allow a named structure type
 within an anonymous union and 2) Avoid name clash between two padding
 fields within the same struct by giving them different names as is
 done elsewhere in the header.


 Signed-off-by: Nathan Binkert n...@binkert.org
 ---
  include/linux/kvm.h |    6 +++---
  1 files changed, 3 insertions(+), 3 deletions(-)

 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index ee755e2..2e3a734 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -119,7 +119,7 @@ struct kvm_run {
                        __u32 error_code;
                } ex;
                /* KVM_EXIT_IO */
 -               struct kvm_io {
 +               struct {
  #define KVM_EXIT_IO_IN  0
  #define KVM_EXIT_IO_OUT 1
                        __u8 direction;
 @@ -224,10 +224,10 @@ struct kvm_interrupt {
  /* for KVM_GET_DIRTY_LOG */
  struct kvm_dirty_log {
        __u32 slot;
 -       __u32 padding;
 +       __u32 padding1;
        union {
                void __user *dirty_bitmap; /* one bit per page */
 -               __u64 padding;
 +               __u64 padding2;
        };
  };

 --
 1.6.1.2
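
To illustrate the second fix: members of an anonymous union are accessed as if they belonged to the enclosing struct, so the union's padding field and the struct's padding field end up in the same namespace. The ctypes model below mirrors the patched kvm_dirty_log layout; it is only an illustration of the naming and layout, not something the patch or the kernel headers provide.

```python
import ctypes


class _DirtyBitmapUnion(ctypes.Union):
    _fields_ = [
        ("dirty_bitmap", ctypes.c_void_p),  # one bit per page
        ("padding2", ctypes.c_uint64),
    ]


class KvmDirtyLog(ctypes.Structure):
    # "u" is anonymous: its members are reachable directly on the struct,
    # which is why they must not share a name with the struct's own fields.
    _anonymous_ = ("u",)
    _fields_ = [
        ("slot", ctypes.c_uint32),
        ("padding1", ctypes.c_uint32),
        ("u", _DirtyBitmapUnion),
    ]


log = KvmDirtyLog(slot=3)
log.padding2 = 0x1234  # reaches into the anonymous union
print(ctypes.sizeof(KvmDirtyLog), log.slot, hex(log.padding2))
```

Renaming the fields does not change the layout, so the ioctl ABI stays the same.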



Re: [PATCH -tip 2/6 V4.2] x86: add arch-dep register and stack access API to ptrace

2009-04-06 Thread Roland McGrath
I have no comments about the patch, but the subject line is misleading
because this has nothing to do with ptrace.  It's in asm/ptrace.h, but the
fact that it deals with struct pt_regs does not really have anything to do
with ptrace.


Thanks,
Roland


[PATCH kvm-autotest] new test, saves and reloads a guest, from Red Hat QE

2009-04-06 Thread David Huff
---
 client/tests/kvm_runtest_2/kvm_runtest_2.py |1 +
 client/tests/kvm_runtest_2/kvm_tests.cfg.sample |6 ++
 client/tests/kvm_runtest_2/kvm_tests.py |   97 +++
 3 files changed, 104 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm_runtest_2/kvm_runtest_2.py b/client/tests/kvm_runtest_2/kvm_runtest_2.py
index c53877f..c83dce9 100644
--- a/client/tests/kvm_runtest_2/kvm_runtest_2.py
+++ b/client/tests/kvm_runtest_2/kvm_runtest_2.py
@@ -34,6 +34,7 @@ class kvm_runtest_2(test.test):
                 "migration":    test_routine("kvm_tests",   "run_migration"),
                 "yum_update":   test_routine("kvm_tests",   "run_yum_update"),
                 "autotest":     test_routine("kvm_tests",   "run_autotest"),
+                "saveload":     test_routine("kvm_tests",   "run_save_load"),
                 "kvm_install":  test_routine("kvm_install", "run_kvm_install"),
                 "linux_s3":     test_routine("kvm_tests",   "run_linux_s3"),
                 }
diff --git a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
index 5619fa8..869398a 100644
--- a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
+++ b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
@@ -48,6 +48,12 @@ variants:
 reboot = yes
 extra_params +=  -snapshot
 kill_vm_on_error = yes
+
+- saveload:install
+   type = saveload
+   pre_command = if [ -f kvm_save_test_file ] ; then rm -rf kvm_save_test_file; fi
+   do_command  = touch kvm_save_test_file
+   verify_command = [ -f kvm_save_test_file ]
 
 - migrate:  install setup
 type = migration
diff --git a/client/tests/kvm_runtest_2/kvm_tests.py b/client/tests/kvm_runtest_2/kvm_tests.py
index 0d19af6..463cc84 100644
--- a/client/tests/kvm_runtest_2/kvm_tests.py
+++ b/client/tests/kvm_runtest_2/kvm_tests.py
@@ -449,3 +449,100 @@ def run_linux_s3(test, params, env):
     kvm_log.info("VM resumed after S3")
 
     session.close()
+
+def run_save_load(test, params, env):
+    # state testing, save and load vm state
+    vm = kvm_utils.env_get_vm(env, params.get("main_vm"))
+    if not vm:
+        message = "VM object not found in environment"
+        kvm_log.error(message)
+        raise error.TestError, message
+    if not vm.is_alive():
+        message = "VM seems to be dead; Test requires a living VM"
+        kvm_log.error(message)
+        raise error.TestError, message
+
+    kvm_log.info("Waiting for guest to be up...")
+
+    pxssh = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2)
+    if not pxssh:
+        message = "Could not log into guest"
+        kvm_log.error(message)
+        raise error.TestFail, message
+
+    pre_command=params.get("pre_command")
+    do_command=params.get("do_command")
+    verify_command=params.get("verify_command")
+
+    # do preparation
+    kvm_log.info("Logged in")
+    kvm_log.info("Doing preparation ... %s" % pre_command)
+    if not pxssh.send_command(pre_command):
+        message = "%s failed" % pre_command
+        kvm_log.error(message)
+        raise error.TestFail, message
+    pxssh.close()
+    kvm_log.info("Logged out")
+
+    # save state
+    kvm_log.info("Saving VM state ...")
+    vm.send_monitor_cmd('savevm test1')
+    s, o = vm.send_monitor_cmd('info snapshots')
+    if not 'test1' in o:
+        message = "Saveing VM state error %s" % o
+        kvm_log.error(message)
+        raise error.TestFail, message
+    kvm_log.info(o)
+    kvm_log.info("VM state saved")
+
+    kvm_log.info("Destroying VM ...")
+    vm.destroy();
+    kvm_log.info("VM Destroyed")
+
+    kvm_log.info("Booting VM...")
+    if not vm.create():
+        message = "Could no recreate VM instance"
+        kvm_log.error(message)
+        raise error.TestError, message
+    kvm_log.info("VM recreated")
+
+    # do modification
+    pxssh = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2)
+    if not pxssh:
+        message = "Could not log into guest"
+        kvm_log.error(message)
+        raise error.TestFail, message
+    kvm_log.info("Logged in")
+    kvm_log.info("Doing modification %s" % do_command)
+    if not pxssh.send_command(do_command):
+        message = "%s failed" % do_command
+        kvm_log.error(message)
+        raise error.TestFail, message
+    pxssh.close()
+    kvm_log.info("Logged out")
+
+    # load state
+    kvm_log.info("Loading VM state ...")
+    s, o = vm.send_monitor_cmd('loadvm test1')
+    kvm_log.info(o)
+    if "Error" in o:
+        message = "VM state load failed: %s" % o
+        kvm_log.error(message)
+        raise error.TestFail, message
+    kvm_log.info("VM state loaded")
+
+    # verify the status
+    pxssh = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2)
+    if not pxssh:
+        message = "Could not log into guest"
+        kvm_log.error(message)
+        raise error.TestFail, message
+    kvm_log.info("Verifying ... %s" % verify_command)
+if 

Re: [PATCH 1/2] qemu: Allow SMBIOS entries to be loaded and provided to the VM BIOS

2009-04-06 Thread Anthony Liguori

Alex Williamson wrote:

Create a new -smbios options that takes binary SMBIOS entries
to provide to the VM BIOS.  The binary can be easily generated
using something like:

dmidecode -t 1 -u | grep $'^\t\t[^"]' | xargs -n1 | \
perl -lne 'printf "%c", hex($_)' > smbios_type_1.bin
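
For readers who find the pipeline hard to follow, here is a sketch of the same conversion as a function: keep the lines that start with two tabs and are not quoted strings, and turn every hex token on them into one byte. The sample text is made up to resemble `dmidecode -u` output and is not from a real machine; the function name is likewise invented.

```python
def smbios_blob_from_dump(text):
    """Convert the hex dump lines of `dmidecode -u` output into raw bytes."""
    blob = bytearray()
    for line in text.splitlines():
        # Same filter as grep $'^\t\t[^"]': two tabs, then anything but a quote.
        if line.startswith("\t\t") and len(line) > 2 and line[2] != '"':
            blob.extend(int(token, 16) for token in line.split())
    return bytes(blob)


SAMPLE = (
    "Handle 0x0001, DMI type 1, 27 bytes\n"
    "\tHeader and Data:\n"
    "\t\t01 1B 01 00 01 02 03 04\n"
    "\tStrings:\n"
    "\t\t51 45 4D 55 00\n"
    "\t\t\"QEMU\"\n"
)

blob = smbios_blob_from_dump(SAMPLE)
print(len(blob), blob.hex())
```

Writing `blob` to a file opened in binary mode gives the equivalent of smbios_type_1.bin.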

For some inventory tools, this makes the VM report the system
information for the host.  One entry per binary file, multiple
files can be chained together as:

  -smbios file1,file2,...

or specified independently:

  -smbios file1 -smbios file2

Signed-off-by: Alex Williamson alex.william...@hp.com
  


Hi Alex,

I know we have to support blobs because of OEM specific smbios entries, 
but there are a number of common ones that it would probably be good to 
specify in a less user-unfriendly way.  What do you think?


Anyway, comments below.


diff --git a/hw/acpi.c b/hw/acpi.c
index 52f50a0..0bd93bf 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -915,3 +915,69 @@ out:
 }
 return -1;
 }
+
+char *smbios_entries;
+size_t smbios_entries_len;
  


I think an accessor would be better than making these variables global.


+int smbios_entry_add(const char *t)
+{
  


acpi.c is hardware emulation, I'd rather see the command line parsing 
done somewhere else (like vl.c).



+struct stat s;
+char file[1024], *p, *f, *n;
+int fd, r;
+size_t len, off;
+
+f = (char *)t;
+do {
+n = strchr(f, ',');
+if (n) {
+strncpy(file, f, (n - f));
+file[n - f] = '\0';
+f = n + 1;
+} else {
+strcpy(file, f);
+f += strlen(file);
+}
  


I'm happy to just require multiple -smbios options.  I dislike 
overloading with ','s even though we do it a lot in QEMU.



+fd = open(file, O_RDONLY);
+if (fd < 0)
+return -1;
+
+if (fstat(fd, &s) < 0) {
+close(fd);
+return -1;
+}
  


May want to look at load_image/get_image_size.


--
Regards,

Anthony Liguori



Re: [PATCH 1/2] qemu: Allow SMBIOS entries to be loaded and provided to the VM BIOS

2009-04-06 Thread Alex Williamson
Hi Anthony,

On Mon, 2009-04-06 at 14:50 -0500, Anthony Liguori wrote:
 Alex Williamson wrote:
 
 I know we have to support blobs because of OEM specific smbios entries, 
 but there are a number of common ones that it would probably be good to 
 specify in a less user-unfriendly way.  What do you think?

Yeah, I'll admit this is a pretty unfriendly interface.  I get from your
comment on the other part of the patch that you'd prefer not to get into
the mess of having both binary blobs and command line switches
augmenting the blobs.  This seems reasonable, but also means that we
need a way to fully define the tables we generate from the command line.
For a type 0 entry, that might mean the following set of switches:

-bios-version, -bios-date, -bios-characteristics, -bios-release

And for a type 1:

-system-manufacturer, -system-name, -system-version, -system-serial,
-system-sku, -system-family

type 3:

-chassis-manufacturer, -chassis-type, -chassis-version, -chassis-serial,
-chassis-asset, -chassis-oem

I'm sure I'm missing some, plus we might want to allow the memory and
processor entries to have some fields changed.  Do we want to add that
many switches and means to access them from the rombios?

 Anyway, comments below.
 
  diff --git a/hw/acpi.c b/hw/acpi.c
  index 52f50a0..0bd93bf 100644
  --- a/hw/acpi.c
  +++ b/hw/acpi.c
  @@ -915,3 +915,69 @@ out:
   }
   return -1;
   }
  +
  +char *smbios_entries;
  +size_t smbios_entries_len;

 
 I think an accessor would be better than making these variables global.

Ok

  +int smbios_entry_add(const char *t)
  +{

 
 acpi.c is hardware emulation, I'd rather see the command line parsing 
 done somewhere else (like vl.c).

Ok.  acpi.c was just a convenient place to not bother architectures that
don't care about smbios.

  +struct stat s;
  +char file[1024], *p, *f, *n;
  +int fd, r;
  +size_t len, off;
  +
  +f = (char *)t;
  +do {
  +n = strchr(f, ',');
  +if (n) {
  +strncpy(file, f, (n - f));
  +file[n - f] = '\0';
  +f = n + 1;
  +} else {
  +strcpy(file, f);
  +f += strlen(file);
  +}

 
 I'm happy to just require multiple -smbios options.  I dislike 
 overloading with ','s even though we do it a lot in QEMU.

Yup, I didn't have it initially, but added it because I thought someone
might complain other qemu options allow it.

  +fd = open(file, O_RDONLY);
  +if (fd < 0)
  +return -1;
  +
  +if (fstat(fd, &s) < 0) {
  +close(fd);
  +return -1;
  +}

 
 May want to look at load_image/get_image_size.

Will do.  Thanks,

Alex




Re: [PATCH 1/2] qemu: Allow SMBIOS entries to be loaded and provided to the VM BIOS

2009-04-06 Thread Anthony Liguori

Alex Williamson wrote:

Hi Anthony,

On Mon, 2009-04-06 at 14:50 -0500, Anthony Liguori wrote:
  

Alex Williamson wrote:

I know we have to support blobs because of OEM specific smbios entries, 
but there are a number of common ones that it would probably be good to 
specify in a less user-unfriendly way.  What do you think?



Yeah, I'll admit this is a pretty unfriendly interface.  I get from your
comment on the other part of the patch that you'd prefer not to get into
the mess of having both binary blobs and command line switches
augmenting the blobs.  This seems reasonable, but also means that we
need a way to fully define the tables we generate from the command line.
For a type 0 entry, that might mean the following set of switches:

-bios-version, -bios-date, -bios-characteristics, -bios-release
  


You could go one level higher:

-smbios type=0,bios-version='1.0',bios-date='2009/10/20' etc.
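
A minimal sketch of how such an option string could be split into fields, just to show the shape of the proposal. This is not QEMU's option parser; the function name is invented, and the sketch deliberately does not handle commas inside quoted values.

```python
def parse_smbios_option(optarg):
    """Split 'key=value,key=value' into a dict, dropping surrounding quotes."""
    fields = {}
    for item in optarg.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % item)
        fields[key.strip()] = value.strip().strip("'\"")
    return fields


opts = parse_smbios_option("type=0,bios-version='1.0',bios-date='2009/10/20'")
print(opts)
```

With the table type carried in the `type` field, one option can describe any of the supported tables, and unknown keys can be rejected per type.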


I'm sure I'm missing some, plus we might want to allow the memory and
processor entries to have some fields changed.  Do we want to add that
many switches and means to access them from the rombios?
  


I think it's okay to start with some of the more common tables and 
provide the parsing in QEMU.  We could then humanize more 
tables down the road as people see fit.


At the end of the day, I'm most interested in the tables that are going 
to be frequently used by management applications.  That is, the tables 
that are required for things like SVVP certification should be specified 
in a human readable format that QEMU can build reasonable defaults for 
and management tools can override.


I'm torn between exposing the tables directly to the firmware or 
providing a higher level interface.  I really don't like -uuid 
overriding a binary blob though so I'd prefer to avoid that.  -uuid 
should only be respected if using the QEMU generated version of the 
SMBIOS table.  I'll defer to whatever you think is better for what is 
exposed in the firmware interface, as I can see arguments for both.


--
Regards,

Anthony Liguori



Re: [PATCH -tip 3/6 V4.1] x86: instruction decorder API

2009-04-06 Thread Jim Keniston
On Fri, 2009-04-03 at 20:37 -0400, Masami Hiramatsu wrote:
 Hi Peter,
 
 H. Peter Anvin wrote:
  Masami Hiramatsu wrote:
  Add x86 instruction decoder to arch-specific libraries. This decoder
  can decode all x86 instructions into prefix, opcode, modrm, sib,
  displacement and immediates. This can also show the length of
  instructions.
 
...
  
  Hi Masami,
  
  On the surface the overall structure looks fine, but I have a couple of 
  concerns:
  
  1. is this meant to be able to decode userspace code or just kernel 
  code?  If it is supposed to be able to decode userspace code, is there a 
  reason you're not dealing with 16-bit or V86 mode code at all?  If not, 
  why are you including the 32-bit decoder in a 64-bit kernel (as well as 
  instructions which we're pretty much guaranteed to never use in the 
  kernel, such as ENTER.)
 
 Actually, this aims to decode both user-space and kernel code.
 At this point, it just needs to cover kernel code, because kprobes
 just wants to decode the kernel binary.
 However, this is just a starting point; uprobe developers want to
 use it to decode user-space code. In that case, it needs to be
 enhanced.

For user-space probing, we've been concentrating on native-built
executables.  Am I correct in thinking that we'll see 16-bit or V86 mode
only on legacy apps built elsewhere?  In any case, it only makes sense
to build on the kvm folks' work in this regard.

...
 
  
  4. you have a bunch of magic opcode constants all over the place.  This 
  means that as new instructions come in -- and they're going to be coming 
  in -- this is going to be hard to update.  It would be cleaner if we 
  could have an intermediate format that preprocesses down to all the 
  relevant tables and perhaps even some of the code rather than 
  open-coding everything in C.
  
  This matters... for example you have:
  
  +   } else if (opcode == 0xea /* jmp far seg:offs */) {
  +   __get_immptr(insn);
  
  ... but nothing similar for opcode 0x9a.  This is extremely hard to spot 
  with this kind of structure.
 
 Oops, that should be a bug. Hmm, I think we'd better use bit-flag tables
 for classifying opcodes.
 Jim, can your INAT idea help this situation?
 
 http://sources.redhat.com/ml/systemtap/2009-q2/msg00109.html
 

As noted, the INAT tables follow the kvm model of one fat bitmap of
attributes per opcode, rather than the kprobes/uprobes model of one or
two 256-bit tables per attribute.  (This latter approach was due to the
gradual accumulation of tables over the years.)

I like the bitmap-per-opcode approach because it's relatively easy to
see in one place everything you're saying about a particular opcode.
But with all the potential clients for this service, it's not clear that
we'll get by with a single bitmap for every opcode.  (x86 kvm uses 32
bits per opcode, I think, and the INAT tables use 10.  Seems like we
could overrun 64 bits pretty quickly.)  So I guess that means we'll have
to get a little creative as to how we expose these attribute sets to the
client.
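
A small sketch of the two layouts, to make the comparison concrete. The attribute names and the three opcodes are chosen for the example only (0x89 takes a ModRM byte; 0x9a and 0xea take a far-pointer immediate) and are not taken from the kvm or INAT tables.

```python
from enum import IntFlag


class Attr(IntFlag):  # hypothetical attribute names
    MODRM = 1
    IMM_PTR = 2


# Bitmap-per-opcode: everything known about one opcode sits in one place.
PER_OPCODE = {
    0x89: Attr.MODRM,    # mov r/m, r
    0x9a: Attr.IMM_PTR,  # call far seg:offs
    0xea: Attr.IMM_PTR,  # jmp far seg:offs
}


def per_attribute_tables(per_opcode):
    """Build one 256-bit table per attribute, bit N standing for opcode N."""
    tables = {attr: 0 for attr in Attr}
    for opcode, attrs in per_opcode.items():
        for attr in Attr:
            if attrs & attr:
                tables[attr] |= 1 << opcode
    return tables


tables = per_attribute_tables(PER_OPCODE)
for attr, table in tables.items():
    print("%-8s %064x" % (attr.name, table))
```

The second form can always be derived from the first, which is one way to serve clients that want per-attribute tables without maintaining them by hand.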

...
 
 Thank you for good advice!
 

Ditto.
Jim Keniston



Re: [PATCH -tip 3/6 V4.1] x86: instruction decorder API

2009-04-06 Thread H. Peter Anvin
Jim Keniston wrote:
 
 For user-space probing, we've been concentrating on native-built
 executables.  Am I correct in thinking that we'll see 16-bit or V86 mode
 only on legacy apps built elsewhere?  In any case, it only makes sense
 to build on the kvm folks' work in this regard.
 

That's a fair assumption; you will of course need to test it and take
appropriate action if it doesn't pan out.

 
 As noted, the INAT tables follow the kvm model of one fat bitmap of
 attributes per opcode, rather than the kprobes/uprobes model of one or
 two 256-bit tables per attribute.  (This latter approach was due to the
 gradual accumulation of tables over the years.)
 
 I like the bitmap-per-opcode approach because it's relatively easy to
 see in one place everything you're saying about a particular opcode.
 But with all the potential clients for this service, it's not clear that
 we'll get by with a single bitmap for every opcode.  (x86 kvm uses 32
 bits per opcode, I think, and the INAT tables use 10.  Seems like we
 could overrun 64 bits pretty quickly.)  So I guess that means we'll have
 to get a little creative as to how we expose these attribute sets to the
 client.
 

This is another very good reason to use an instruction table which is
preprocessed into a usable format: it means that if the internal data
structures change -- and they almost certainly will have to at some
point -- the raw data isn't lost.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: kvm-84 + virtio Ubuntu Hardy guests

2009-04-06 Thread Anthony Liguori

Dustin Kirkland wrote:

On Mon, 2009-04-06 at 18:43 +0100, Mark McLoughlin wrote:
  

The problem here is that 2.6.24/5 vintage guests are saying they
support
something they don't. See this for further details:

http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00574.html



Agreed, understood.  And those kernels should be patched.

However, it's a bit of a chicken/egg problem...  One can't even boot
those guests (with virtio) to update the kernel, as it panics on boot
(without the kvm hack).
  


There's now a fix for this in QEMU stable.  It'll make its way to
kvm-userspace once Avi merges.


Regards,

Anthony Liguori

:-Dustin

  




Re: kvm-autotest new test, save and load

2009-04-06 Thread Charles Duffy

David Huff wrote:
Submitting new test for review and comments. 
This test was originally developed by the Red Hat QE team. 
It starts a new guest, saves it, and reloads the saved state.


Looking for comments and/or improvements that can be made to this test.


I have some code (aimed at testing software within the guest, rather 
than kvm itself) which does this kind of operation quite frequently; the 
most common case where I've seen failures happen is when writes are 
active while state is being saved.


One enhancement might be to test the integrity of block device writes 
being done during the save/restore process itself -- for instance, write 
a file of all high bits, start a process overwriting it with all low 
bits, run the save and restore before the latter process has completed, 
and ensure that after the migration is complete and the process has 
finished, the file contains nothing but 0s.
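
A minimal sketch of the in-guest half of such a check, in C. The helper names and the file path are hypothetical; the save and restore step itself is driven from the host and is not shown.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write `len` copies of `byte` to `path`.  Use mode "wb" to create the
 * file and "r+b" to overwrite an existing file in place.
 * Returns 0 on success, -1 on error.
 */
static int fill_file(const char *path, unsigned char byte, size_t len,
		     const char *mode)
{
	unsigned char buf[4096];
	FILE *f = fopen(path, mode);

	if (!f)
		return -1;
	memset(buf, byte, sizeof(buf));
	while (len > 0) {
		size_t n = len < sizeof(buf) ? len : sizeof(buf);

		if (fwrite(buf, 1, n, f) != n) {
			fclose(f);
			return -1;
		}
		len -= n;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Count the bytes in `path` that differ from `expected`; -1 on error. */
static long count_mismatches(const char *path, unsigned char expected)
{
	unsigned char buf[4096];
	long bad = 0;
	size_t n, i;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		for (i = 0; i < n; i++)
			if (buf[i] != expected)
				bad++;
	fclose(f);
	return bad;
}
```

The guest would call fill_file(path, 0xff, len, "wb") first, start fill_file(path, 0x00, len, "r+b") in the background, and after the restore has finished and the writer has exited, expect count_mismatches(path, 0x00) to return 0.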




Re: [RFC PATCH] PCI pass-through fixups

2009-04-06 Thread Chris Wright
* Alex Williamson (alex.william...@hp.com) wrote:
 
 I'm wondering if we need a spot for device specific fixups for PCI
 pass-through.  In the example below, I want to expose a single port of
 an Intel 82571EB quad port copper NIC to a guest.  It works great until
 I shut down the guest, at which point the guest e1000e driver knows by
 the device ID that the NIC is a quad port, and blindly attempts to
 twiddle some bits on the bridge above it (that doesn't exist).

And what happens?

 Obviously some robustness could be added to the driver, but would it
 make sense to do something like below and automatically remap these
 devices to identical single port device IDs?  Thanks,

Sounds quite fragile to me.

thanks,
-chris


Re: [PATCH][RESEND] kvm-userspace: fix option_rom_setup_reset address

2009-04-06 Thread Glauber Costa
On Mon, Apr 6, 2009 at 1:32 PM, Ryan Harper ry...@us.ibm.com wrote:
 Commit f2b690ba461971fb8b04354de8717a73fd08b945 changed the target
 address for option roms, but failed to use the same address when
 registering an option rom reset.  This manifests itself when using
 extboot (boot=on) and resetting a guest via reboot or system_reset on
 the monitor: the guest fails to boot.  This patch registers the correct
 region for each option rom.

looks good to me.


-- 
Glauber  Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.


[RFC PATCH 1/3] PCI: rewrite Function Level Reset

2009-04-06 Thread Yu Zhao
Changes:
  1) remove disable_irq() so the shared IRQ won't be disabled.
  2) replace the 1s wait with 100, 200 and 400ms wait intervals
 for the Pending Transaction.
  3) replace mdelay() with msleep().
  4) add might_sleep().
  5) lock the device to prevent PM suspend from accessing the CSRs
 during the reset.
  6) coding style fixes.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/pci.c   |  166 ++-
 include/linux/pci.h |2 +-
 2 files changed, 85 insertions(+), 83 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af4db4e..46ae997 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2008,111 +2008,112 @@ int pci_set_dma_seg_boundary(struct pci_dev *dev, unsigned long mask)
 EXPORT_SYMBOL(pci_set_dma_seg_boundary);
 #endif
 
-static int __pcie_flr(struct pci_dev *dev, int probe)
+static int pcie_flr(struct pci_dev *dev, int probe)
 {
-	u16 status;
+	int i;
+	int pos;
 	u32 cap;
-	int exppos = pci_find_capability(dev, PCI_CAP_ID_EXP);
+	u16 status;
 
-	if (!exppos)
+	pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
+	if (!pos)
 		return -ENOTTY;
-	pci_read_config_dword(dev, exppos + PCI_EXP_DEVCAP, &cap);
+
+	pci_read_config_dword(dev, pos + PCI_EXP_DEVCAP, &cap);
 	if (!(cap & PCI_EXP_DEVCAP_FLR))
 		return -ENOTTY;
 
 	if (probe)
 		return 0;
 
-	pci_block_user_cfg_access(dev);
-
 	/* Wait for Transaction Pending bit clean */
-	pci_read_config_word(dev, exppos + PCI_EXP_DEVSTA, &status);
-	if (!(status & PCI_EXP_DEVSTA_TRPND))
-		goto transaction_done;
+	for (i = 0; i < 4; i++) {
+		if (i)
+			msleep((1 << (i - 1)) * 100);
 
-	msleep(100);
-	pci_read_config_word(dev, exppos + PCI_EXP_DEVSTA, &status);
-	if (!(status & PCI_EXP_DEVSTA_TRPND))
-		goto transaction_done;
-
-	dev_info(&dev->dev, "Busy after 100ms while trying to reset; "
-			"sleeping for 1 second\n");
-	ssleep(1);
-	pci_read_config_word(dev, exppos + PCI_EXP_DEVSTA, &status);
-	if (status & PCI_EXP_DEVSTA_TRPND)
-		dev_info(&dev->dev, "Still busy after 1s; "
-				"proceeding with reset anyway\n");
-
-transaction_done:
-	pci_write_config_word(dev, exppos + PCI_EXP_DEVCTL,
+		pci_read_config_word(dev, pos + PCI_EXP_DEVSTA, &status);
+		if (!(status & PCI_EXP_DEVSTA_TRPND))
+			goto clear;
+	}
+
+	dev_err(&dev->dev, "transaction is not cleared; "
+			"proceeding with reset anyway\n");
+
+clear:
+	pci_write_config_word(dev, pos + PCI_EXP_DEVCTL,
 				PCI_EXP_DEVCTL_BCR_FLR);
-	mdelay(100);
+	msleep(100);
 
-	pci_unblock_user_cfg_access(dev);
 	return 0;
 }
 
-static int __pci_af_flr(struct pci_dev *dev, int probe)
+static int pci_af_flr(struct pci_dev *dev, int probe)
 {
-	int cappos = pci_find_capability(dev, PCI_CAP_ID_AF);
-	u8 status;
+	int i;
+	int pos;
 	u8 cap;
+	u8 status;
 
-	if (!cappos)
+	pos = pci_find_capability(dev, PCI_CAP_ID_AF);
+	if (!pos)
 		return -ENOTTY;
-	pci_read_config_byte(dev, cappos + PCI_AF_CAP, &cap);
+
+	pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
 	if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
 		return -ENOTTY;
 
 	if (probe)
 		return 0;
 
-	pci_block_user_cfg_access(dev);
-
 	/* Wait for Transaction Pending bit clean */
-	pci_read_config_byte(dev, cappos + PCI_AF_STATUS, &status);
-	if (!(status & PCI_AF_STATUS_TP))
-		goto transaction_done;
+	for (i = 0; i < 4; i++) {
+		if (i)
+			msleep((1 << (i - 1)) * 100);
+
+		pci_read_config_byte(dev, pos + PCI_AF_STATUS, &status);
+		if (!(status & PCI_AF_STATUS_TP))
+			goto clear;
+	}
+
+	dev_err(&dev->dev, "transaction is not cleared; "
+			"proceeding with reset anyway\n");
 
+clear:
+	pci_write_config_byte(dev, pos + PCI_AF_CTRL, PCI_AF_CTRL_FLR);
 	msleep(100);
-	pci_read_config_byte(dev, cappos + PCI_AF_STATUS, &status);
-	if (!(status & PCI_AF_STATUS_TP))
-		goto transaction_done;
-
-	dev_info(&dev->dev, "Busy after 100ms while trying to"
-			" reset; sleeping for 1 second\n");
-	ssleep(1);
-	pci_read_config_byte(dev, cappos + PCI_AF_STATUS, &status);
-	if (status & PCI_AF_STATUS_TP)
-		dev_info(&dev->dev, "Still busy after 1s; "
-				"proceeding with reset anyway\n");
-
-transaction_done:
-	pci_write_config_byte(dev, cappos + PCI_AF_CTRL, PCI_AF_CTRL_FLR);
-	mdelay(100);
-
-   
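
The polling schedule described in change 2 of this patch (check at once, then after 100, 200 and 400 ms) can be modelled outside the kernel. The sketch below is a user-space simulation in C, not kernel code: struct fake_dev and fake_pending() are hypothetical stand-ins for reading the Transaction Pending bit, and the wait is only added up rather than slept.

```c
#include <stdbool.h>

/* Hypothetical device whose Transaction Pending bit stays set for the
 * first `busy_polls` status reads. */
struct fake_dev {
	int busy_polls;
};

static bool fake_pending(struct fake_dev *dev)
{
	if (dev->busy_polls > 0) {
		dev->busy_polls--;
		return true;
	}
	return false;
}

/* Delay before poll number `attempt` (0-based): 0, 100, 200, 400 ms. */
static unsigned int poll_delay_ms(int attempt)
{
	return attempt ? (1u << (attempt - 1)) * 100 : 0;
}

/*
 * Poll up to four times with doubling delays.  Returns true if the
 * pending bit cleared; *waited_ms receives the total simulated wait.
 */
static bool wait_for_idle(struct fake_dev *dev, unsigned int *waited_ms)
{
	int i;

	*waited_ms = 0;
	for (i = 0; i < 4; i++) {
		/* A driver would sleep here; the sketch only sums the time. */
		*waited_ms += poll_delay_ms(i);
		if (!fake_pending(dev))
			return true;
	}
	return false;
}
```

The worst case is 700 ms of waiting before the reset proceeds anyway, compared with the 1.1 s of the code being replaced.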

[RFC PATCH 3/3] PCI: support Secondary Bus Reset

2009-04-06 Thread Yu Zhao
PCI-to-PCI Bridge 1.2 specifies that the Secondary Bus Reset bit can
force the assertion of RST# on the secondary interface, which can be
used to reset all devices including subordinates under this bus. This
can be used to reset a function if this function is the only device
under this bus.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/pci.c |   31 +++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e459a0b..a77c33a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2115,6 +2115,33 @@ static int pci_pm_flr(struct pci_dev *dev, int probe)
 	return 0;
 }
 
+static int pci_secondary_bus_reset(struct pci_dev *dev, int probe)
+{
+	u16 ctrl;
+	struct pci_dev *pdev;
+
+	if (dev->subordinate)
+		return -ENOTTY;
+
+	list_for_each_entry(pdev, &dev->bus->devices, bus_list)
+		if (pdev != dev)
+			return -ENOTTY;
+
+	if (probe)
+		return 0;
+
+	pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
+	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
+	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
+	msleep(100);
+
+	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
+	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
+	msleep(100);
+
+	return 0;
+}
+
 static int pci_dev_reset(struct pci_dev *dev, int probe)
 {
 	int rc;
@@ -2136,6 +2163,10 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
 		goto done;
 
 	rc = pci_pm_flr(dev, probe);
+	if (rc != -ENOTTY)
+		goto done;
+
+	rc = pci_secondary_bus_reset(dev, probe);
 done:
 	up(&dev->dev.sem);
 
-- 
1.5.6.4
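
For readers who want to try the logic outside the kernel, here is a small user-space sketch in C of the two ideas in this patch: the "only device on the bus" check and the set-then-clear of the Secondary Bus Reset bit. The helper names are hypothetical, and a plain array of integer IDs stands in for the kernel's device list; the bit value 0x40 matches PCI_BRIDGE_CTL_BUS_RESET.

```c
#include <stddef.h>
#include <stdint.h>

#define BRIDGE_CTL_BUS_RESET 0x40	/* value of PCI_BRIDGE_CTL_BUS_RESET */

/* Returns 1 if `dev` is the only entry among the devices on its bus. */
static int is_only_device(const int *bus_devs, size_t n, int dev)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (bus_devs[i] != dev)
			return 0;
	return 1;
}

/* Bridge Control value that asserts RST# on the secondary bus. */
static uint16_t set_bus_reset(uint16_t ctrl)
{
	return ctrl | BRIDGE_CTL_BUS_RESET;
}

/* Bridge Control value that releases RST#, leaving other bits alone. */
static uint16_t clear_bus_reset(uint16_t ctrl)
{
	return ctrl & (uint16_t)~BRIDGE_CTL_BUS_RESET;
}
```

The read-modify-write matters because the Bridge Control register carries unrelated bits that must survive the reset pulse.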



[PATCH] align vga rom to 4k boundary.

2009-04-06 Thread Glauber Costa
Instead of aligning to 2k boundary, as required by the bios,
align to 4k boundary, as required by kvm memory functions. Without
this patch, starting kvm with -vga std option fails with:

create_userspace_phys_mem: Invalid argument
kvm_cpu_register_physical_memory: failed

as described by: https://bugzilla.redhat.com/show_bug.cgi?id=494376

It does not fail with cirrus vga, because it is naturally aligned.
This problem does not seem to affect upstream qemu.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 qemu/hw/pc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index cc84772..680d4a2 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -919,7 +919,7 @@ vga_bios_error:
 exit(1);
 }
/* Round up vga bios size to the next 2k boundary */
-   vga_bios_size = (vga_bios_size + 2047) & ~2047;
+   vga_bios_size = (vga_bios_size + 4095) & ~4095;
option_rom_start = 0xc0000 + vga_bios_size;
 
 /* setup basic memory access */
-- 
1.5.6.6
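
The rounding idiom in this patch can be checked in isolation. The sketch below is plain C with a hypothetical helper name; it assumes a power-of-two alignment and, for the worked numbers, a VGA BIOS loaded at the conventional 0xc0000 with a made-up image size of 0x8800 bytes.

```c
#include <stdint.h>

/* Round `size` up to the next multiple of `align` (a power of two). */
static uint32_t align_up(uint32_t size, uint32_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Returns 1 if `addr` sits on a 4 KiB page boundary. */
static int is_page_aligned(uint32_t addr)
{
	return (addr & 4095) == 0;
}
```

With a 0x8800-byte image, 2k rounding leaves the size at 0x8800, so the option ROM area would start at 0xc8800, which is not page aligned; 4k rounding gives 0x9000 and a start of 0xc9000.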



3525MB RAM Limit

2009-04-06 Thread Daniel Scott
Hi,

I'm running a KVM host using Fedora 10 (x86_64) on 4x Quad-Core AMD
Opteron(tm) Processor 8347 HE system with 16GB ram. I've created a
Fedora 10 (x86_64) guest and allocated 8 CPUs and 8GB ram which I can
see using 'virt-manager'. When I login to the guest, I can see the 8
CPUs, but apparently, I only have 3525MB RAM (output from 'free'). At
first I thought it was a 32bit problem, but I've double checked and
the host and guest are both running 64bit architectures. virt-manager
shows 50% ram usage which appears to be correct (4GB, out of 8). It
also correctly recognises the 16GB available in the host machine.

Is this expected? Will the extra ram become available once it is
required? If not, does anyone have any ideas for how I can fix the
problem?

Thanks,

Dan


Re: 3525MB RAM Limit

2009-04-06 Thread Brian Jackson
Read the list archives. A fix was discussed recently.

--Brian Jackson


On Monday 06 April 2009 21:59:24 Daniel Scott wrote:
 Hi,

 I'm running a KVM host using Fedora 10 (x86_64) on 4x Quad-Core AMD
 Opteron(tm) Processor 8347 HE system with 16GB ram. I've created a
 Fedora 10 (x86_64) guest and allocated 8 CPUs and 8GB ram which I can
 see using 'virt-manager'. When I login to the guest, I can see the 8
 CPUs, but apparently, I only have 3525MB RAM (output from 'free'). At
 first I thought it was a 32bit problem, but I've double checked and
 the host and guest are both running 64bit architectures. virt-manager
 shows 50% ram usage which appears to be correct (4GB, out of 8). It
 also correctly recognises the 16GB available in the host machine.

 Is this expected? Will the extra ram become available once it is
 required? If not, does anyone have any Ideas for how I can fix the
 problem?

 Thanks,

 Dan




Re: [kvm] [PATCH 06/16] Support for device capability

2009-04-06 Thread Sheng Yang
On Saturday 04 April 2009 03:23:31 Alex Williamson wrote:
 On Tue, 2009-03-17 at 11:50 +0800, Sheng Yang wrote:
  This framework can be easily extended to support device capability, like
  MSI/MSI-x.

 Sheng,

 Are you already looking at adding support for PM and EXP capabilities?
 The bnx2 driver is an example that won't claim the device if these
 capabilities aren't present.  Thanks,

Not yet... 

(And it's quite strange that this mail didn't reach my private mailbox; only
the mailing-list copy arrived...)

-- 
regards
Yang, Sheng


 Alex





Re: KVM Port

2009-04-06 Thread Hollis Blanchard
On Sun, 2009-04-05 at 00:11 +0530, kvm port wrote:
 ok, so these are a few steps to begin
 (a) add a QEMUMachine for my h/w in qemu

As an alternative, you could start exercising KVM kernel code using
kvm-userspace/test before qemu is ready.

 (b) Add arch support in kvm
 
 I have a few questions
 (a) qemu starts in user space, how would I configure my linux. Should
 the linux run in Hypervisor state and the apps run in user state, and
 nothing runs in guest state [ there are 3 states in my processor]

Are there only 3, or are there two independent dimensions
(hypervisor/guest, user/supervisor)? If there are only 3, you'll need to
figure out how to isolate guest kernel and guest userspace from each
other.

 (b) qemu starts the VM and somehow ( i dont know yet, how?) , starts
 my code in processor guest state

Why are you asking us? You are the processor expert... :)

Qemu calls into KVM via an ioctl, and processor-specific KVM code
(that's you) somehow jumps into guest mode.
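
As a rough picture of that split, here is a toy model in C. It is not the KVM API: every name below is hypothetical, and in real KVM the request is the KVM_RUN ioctl issued on a vcpu file descriptor. The point is only that the generic entry point dispatches to a hook that the architecture port supplies.

```c
#include <stddef.h>

/* Hypothetical request code; real KVM uses ioctl numbers from <linux/kvm.h>. */
enum toy_request {
	TOY_RUN = 1,
};

struct toy_vcpu {
	unsigned long pc;	/* guest program counter */
	int in_guest;		/* 1 while "executing" guest code */
	int exits;		/* number of returns to the host */
};

/* Hooks that the processor-specific code has to provide. */
struct toy_arch_ops {
	void (*enter_guest)(struct toy_vcpu *vcpu);
};

/* Generic entry point: what userspace reaches through the ioctl. */
static int toy_vcpu_ioctl(struct toy_vcpu *vcpu,
			  const struct toy_arch_ops *ops, int request)
{
	switch (request) {
	case TOY_RUN:
		if (!ops || !ops->enter_guest)
			return -1;
		ops->enter_guest(vcpu);
		return 0;
	default:
		return -1;
	}
}

/* Example hook: pretends the guest ran one 4-byte instruction. */
static void demo_enter_guest(struct toy_vcpu *vcpu)
{
	vcpu->in_guest = 1;	/* a port would switch processor state here */
	vcpu->pc += 4;
	vcpu->in_guest = 0;	/* guest exit: control returns to the host */
	vcpu->exits++;
}

static const struct toy_arch_ops demo_ops = {
	.enter_guest = demo_enter_guest,
};
```

On a processor with a separate guest state, the body of the enter_guest hook is where the port would save host state and perform the actual mode switch.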

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html