[PATCH] Merge commit 'qemu-svn/trunk'
From: Avi Kivity <a...@redhat.com>

* commit 'qemu-svn/trunk':
  Fix wrong return value
  Remove dead AIO code for win32
  target-mips: optimize gen_movcf_*()
  target-mips: optimize gen_movci()
  target-mips: optimize gen_compute_branch1()
  Misc scsi disk/cdrom fixes/improvements 4/4
  misc scsi disk/cdrom fixes/improvements 3/4
  misc scsi disk/cdrom fixes/improvements 2/4
  misc scsi disk/cdrom fixes/improvements 1/4
  target-mips: don't map FP registers as TCG global variables
  target-mips: fix divu instruction
  tcg: fix _tl aliases for divu/remu
  target-ppc: Explain why the whole TLB is flushed on SR write
  Fix hxtool eating backslash sequences for sh != bash
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: configure: pass --with-patched-kernel to kernel/configure
From: Mark McLoughlin <mar...@redhat.com>

We need to know this so that we can avoid doing things that are specific
to building kvm.ko - e.g. when we are only doing header-sync.

Signed-off-by: Mark McLoughlin <mar...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/configure b/configure
index 49c4419..249c743 100755
--- a/configure
+++ b/configure
@@ -121,6 +121,7 @@ arch=${arch%%-*}
     ./configure \
         --kerneldir=$kerneldir \
         --arch=$arch \
+        $([ -z ${want_module} ] && echo --with-patched-kernel) \
         ${cross_prefix:+--cross-prefix=$cross_prefix} \
         ${kvm_trace:+--with-kvm-trace}
 )
diff --git a/kernel/configure b/kernel/configure
index 79fb093..3fd0c94 100755
--- a/kernel/configure
+++ b/kernel/configure
@@ -6,6 +6,7 @@ cc=gcc
 ld=ld
 objcopy=objcopy
 ar=ar
+want_module=1
 kvm_trace=
 cross_prefix=
 arch=`uname -m`
@@ -44,6 +45,9 @@ while [[ $1 = -* ]]; do
             kerneldir=$arg
             no_uname=1
             ;;
+        --with-patched-kernel)
+            want_module=
+            ;;
         --with-kvm-trace)
             kvm_trace=y
             ;;
[PATCH] kvm: configure: run kernel configure even with --with-patched-kernel
From: Mark McLoughlin <mar...@redhat.com>

We still run header-sync in this case, which requires configure to have
been run.

Signed-off-by: Mark McLoughlin <mar...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/configure b/configure
index a2aad59..49c4419 100755
--- a/configure
+++ b/configure
@@ -117,7 +117,7 @@ processor=${arch#*-}
 arch=${arch%%-*}

 #configure kernel module
-[[ -n $want_module ]] && (cd kernel;
+[ -e kernel/Makefile ] && (cd kernel;
     ./configure \
         --kerneldir=$kerneldir \
         --arch=$arch \
[PATCH] kvm: external module: only hack tsc_khz in kvm_arch_init
From: Avi Kivity <a...@redhat.com>

We now hack it to a function call, so all hell breaks loose if we change
local variables named tsc_khz.

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/kernel/x86/hack-module.awk b/kernel/x86/hack-module.awk
index a05c0c3..260eeef 100644
--- a/kernel/x86/hack-module.awk
+++ b/kernel/x86/hack-module.awk
@@ -1,4 +1,4 @@
-BEGIN { split("INIT_WORK tsc_khz desc_struct ldttss_desc64 desc_ptr " \
+BEGIN { split("INIT_WORK desc_struct ldttss_desc64 desc_ptr " \
            "hrtimer_add_expires_ns hrtimer_get_expires " \
            "hrtimer_get_expires_ns hrtimer_start_expires " \
            "hrtimer_expires_remaining " \
@@ -25,6 +25,10 @@ BEGIN { split("INIT_WORK tsc_khz desc_struct ldttss_desc64 desc_ptr " \
     anon_inodes_exit = 0
 }

+/^int kvm_arch_init/ { kvm_arch_init = 1 }
+/\<tsc_khz\>/ && kvm_arch_init { sub("\\<tsc_khz\\>", "kvm_tsc_khz") }
+/^}/ { kvm_arch_init = 0 }
+
 /MODULE_AUTHOR/ {
     printf("MODULE_INFO(version, \"%s\");\n", version)
 }
[PATCH] kvm: external module: backward compatibility for ia64 msidef.h
From: Zhang, Yang <yang.zh...@intel.com>

When running make in kernel/, the build cannot find msidef.h.
This patch fixes that.

Signed-off-by: Yang Zhang <yang.zh...@intel.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/kernel/include-compat/asm-ia64/msidef.h b/kernel/include-compat/asm-ia64/msidef.h
new file mode 100644
index 0000000..592c104
--- /dev/null
+++ b/kernel/include-compat/asm-ia64/msidef.h
@@ -0,0 +1,42 @@
+#ifndef _IA64_MSI_DEF_H
+#define _IA64_MSI_DEF_H
+
+/*
+ * Shifts for APIC-based data
+ */
+
+#define     MSI_DATA_VECTOR_SHIFT           0
+#define     MSI_DATA_VECTOR(v)              (((u8)v) << MSI_DATA_VECTOR_SHIFT)
+#define     MSI_DATA_VECTOR_MASK            0xffffff00
+
+#define     MSI_DATA_DELIVERY_MODE_SHIFT    8
+#define     MSI_DATA_DELIVERY_FIXED         (0 << MSI_DATA_DELIVERY_MODE_SHIFT)
+#define     MSI_DATA_DELIVERY_LOWPRI        (1 << MSI_DATA_DELIVERY_MODE_SHIFT)
+
+#define     MSI_DATA_LEVEL_SHIFT            14
+#define     MSI_DATA_LEVEL_DEASSERT         (0 << MSI_DATA_LEVEL_SHIFT)
+#define     MSI_DATA_LEVEL_ASSERT           (1 << MSI_DATA_LEVEL_SHIFT)
+
+#define     MSI_DATA_TRIGGER_SHIFT          15
+#define     MSI_DATA_TRIGGER_EDGE           (0 << MSI_DATA_TRIGGER_SHIFT)
+#define     MSI_DATA_TRIGGER_LEVEL          (1 << MSI_DATA_TRIGGER_SHIFT)
+
+/*
+ * Shift/mask fields for APIC-based bus address
+ */
+
+#define     MSI_ADDR_DEST_ID_SHIFT          4
+#define     MSI_ADDR_HEADER                 0xfee00000
+
+#define     MSI_ADDR_DEST_ID_MASK           0xfff0000f
+#define     MSI_ADDR_DEST_ID_CPU(cpu)       ((cpu) << MSI_ADDR_DEST_ID_SHIFT)
+
+#define     MSI_ADDR_DEST_MODE_SHIFT        2
+#define     MSI_ADDR_DEST_MODE_PHYS         (0 << MSI_ADDR_DEST_MODE_SHIFT)
+#define     MSI_ADDR_DEST_MODE_LOGIC        (1 << MSI_ADDR_DEST_MODE_SHIFT)
+
+#define     MSI_ADDR_REDIRECTION_SHIFT      3
+#define     MSI_ADDR_REDIRECTION_CPU        (0 << MSI_ADDR_REDIRECTION_SHIFT)
+#define     MSI_ADDR_REDIRECTION_LOWPRI     (1 << MSI_ADDR_REDIRECTION_SHIFT)
+
+#endif/* _IA64_MSI_DEF_H */
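As a rough illustration of how the _SHIFT constants in this header are meant to be combined, here is a minimal sketch that builds an MSI data word. The helper name msi_compose_data and its parameters are hypothetical and not part of the patch; only the four shift values are taken from the header.

```c
#include <stdint.h>

/* Shift values as listed in the compat header. */
#define MSI_DATA_VECTOR_SHIFT         0
#define MSI_DATA_DELIVERY_MODE_SHIFT  8
#define MSI_DATA_LEVEL_SHIFT          14
#define MSI_DATA_TRIGGER_SHIFT        15

/*
 * Hypothetical helper (not from the patch): pack a vector and three
 * one-bit options into an MSI data word using the shifts above.
 */
static uint32_t msi_compose_data(uint8_t vector, int lowpri,
                                 int level_assert, int level_trigger)
{
    uint32_t data = (uint32_t)vector << MSI_DATA_VECTOR_SHIFT;

    data |= (uint32_t)(lowpri ? 1 : 0) << MSI_DATA_DELIVERY_MODE_SHIFT;
    data |= (uint32_t)(level_assert ? 1 : 0) << MSI_DATA_LEVEL_SHIFT;
    data |= (uint32_t)(level_trigger ? 1 : 0) << MSI_DATA_TRIGGER_SHIFT;
    return data;
}
```

With these shifts, vector 0x41 with lowest-priority delivery, level asserted and level trigger packs to 0xc141.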
[PATCH] kvm: external module: backward compatibility for PAGE_KERNEL_UC on ia64
From: Yang Zhang

Signed-off-by: Yang Zhang
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/kernel/ia64/external-module-compat.h b/kernel/ia64/external-module-compat.h
index 592733c..bc78c3d 100644
--- a/kernel/ia64/external-module-compat.h
+++ b/kernel/ia64/external-module-compat.h
@@ -42,6 +42,12 @@ typedef u64 phys_addr_t;

 #endif

+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,30)
+
+#define PAGE_KERNEL_UC  __pgprot(__DIRTY_BITS | _PAGE_PL_0 | _PAGE_AR_RWX | \
+                                 _PAGE_MA_UC)
+#endif
+
 #endif

 #ifndef CONFIG_HAVE_KVM_IRQCHIP
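The version guard in this patch depends on how kernel versions are packed into a single integer. The sketch below assumes the classic encoding from linux/version.h and that the guard means "older than 2.6.30"; needs_page_kernel_uc_compat is a hypothetical name used only for illustration.

```c
/* Classic packing used by <linux/version.h>: major, minor, patch. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper: returns 1 when the given version code is older
 * than 2.6.30, i.e. when a compat definition would be needed.
 */
static int needs_page_kernel_uc_compat(unsigned int linux_version_code)
{
    return linux_version_code < KERNEL_VERSION(2, 6, 30);
}
```

Because each component occupies its own byte range, a plain integer comparison orders versions correctly as long as minor and patch numbers stay below 256.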
[PATCH] kvm: qemu: Do not use log dirty on ia64
From: Zhang, Yang <yang.zh...@intel.com>

ia64 does not support log dirty.

Signed-off-by: Yang Zhang <yang.zh...@intel.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/qemu/qemu-kvm.c b/qemu/qemu-kvm.c
index 4164368..ed76367 100644
--- a/qemu/qemu-kvm.c
+++ b/qemu/qemu-kvm.c
@@ -1374,7 +1374,10 @@ int kvm_log_start(target_phys_addr_t phys_addr, target_phys_addr_t len)
     if (must_use_aliases_source(phys_addr))
         return 0;
 #endif
+
+#ifndef TARGET_IA64
     kvm_qemu_log_memory(phys_addr, len, 1);
+#endif
     return 0;
 }

@@ -1384,7 +1387,10 @@ int kvm_log_stop(target_phys_addr_t phys_addr, target_phys_addr_t len)
     if (must_use_aliases_source(phys_addr))
         return 0;
 #endif
+
+#ifndef TARGET_IA64
     kvm_qemu_log_memory(phys_addr, len, 0);
+#endif
     return 0;
 }

[PATCH] kvm: testsuite: add tests for short/near Jcc and call instruction emulation
From: Gleb Natapov <g...@redhat.com>

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/user/test/x86/realmode.c b/user/test/x86/realmode.c
index f6d5326..336ba1c 100644
--- a/user/test/x86/realmode.c
+++ b/user/test/x86/realmode.c
@@ -361,14 +361,94 @@ void test_io(void)
 void test_call(void)
 {
         struct regs inregs = { 0 }, outregs;
+        u32 esp[16];
+
+        inregs.esp = (u32)esp;
+
         MK_INSN(call1, "mov $test_function, %eax \n\t"
                        "call *%eax\n\t");
+        MK_INSN(call_near1, "jmp 2f\n\t"
+                            "1: mov $0x1234, %eax\n\t"
+                            "ret\n\t"
+                            "2: call 1b\t");
+        MK_INSN(call_near2, "call 1f\n\t"
+                            "jmp 2f\n\t"
+                            "1: mov $0x1234, %eax\n\t"
+                            "ret\n\t"
+                            "2:\t");

         exec_in_big_real_mode(&inregs, &outregs,
                               insn_call1, insn_call1_end - insn_call1);
         if(!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1234)
                 print_serial("Call Test 1: FAIL\n");
+
+        exec_in_big_real_mode(&inregs, &outregs,
+                        insn_call_near1, insn_call_near1_end - insn_call_near1);
+        if(!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1234)
+                print_serial("Call near Test 1: FAIL\n");
+        exec_in_big_real_mode(&inregs, &outregs,
+                        insn_call_near2, insn_call_near2_end - insn_call_near2);
+        if(!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1234)
+                print_serial("Call near Test 2: FAIL\n");
+}
+
+void test_jcc_short(void)
+{
+        struct regs inregs = { 0 }, outregs;
+        MK_INSN(jnz_short1, "jnz 1f\n\t"
+                            "mov $0x1234, %eax\n\t"
+                            "1:\n\t");
+        MK_INSN(jnz_short2, "1:\n\t"
+                            "cmp $0x1234, %eax\n\t"
+                            "mov $0x1234, %eax\n\t"
+                            "jnz 1b\n\t");
+        MK_INSN(jmp_short1, "jmp 1f\n\t"
+                            "mov $0x1234, %eax\n\t"
+                            "1:\n\t");
+
+        exec_in_big_real_mode(&inregs, &outregs,
+                        insn_jnz_short1, insn_jnz_short1_end - insn_jnz_short1);
+        if(!regs_equal(&inregs, &outregs, 0))
+                print_serial("JNZ sort Test 1: FAIL\n");
+
+        exec_in_big_real_mode(&inregs, &outregs,
+                        insn_jnz_short2, insn_jnz_short2_end - insn_jnz_short2);
+        if(!regs_equal(&inregs, &outregs, R_AX) || !(outregs.eflags & (1 << 6)))
+                print_serial("JNZ sort Test 2: FAIL\n");
+
+        exec_in_big_real_mode(&inregs, &outregs,
+                        insn_jmp_short1, insn_jmp_short1_end - insn_jmp_short1);
+        if(!regs_equal(&inregs, &outregs, 0))
+                print_serial("JMP sort Test 1: FAIL\n");
+}
+
+void test_jcc_near(void)
+{
+        struct regs inregs = { 0 }, outregs;
+        /* encode near jmp manually. gas will not do it if offsets < 127 byte */
+        MK_INSN(jnz_near1, ".byte 0x0f, 0x85, 0x06, 0x00\n\t"
+                           "mov $0x1234, %eax\n\t");
+        MK_INSN(jnz_near2, "cmp $0x1234, %eax\n\t"
+                           "mov $0x1234, %eax\n\t"
+                           ".byte 0x0f, 0x85, 0xf0, 0xff\n\t");
+        MK_INSN(jmp_near1, ".byte 0xE9, 0x06, 0x00\n\t"
+                           "mov $0x1234, %eax\n\t");
+
+        exec_in_big_real_mode(&inregs, &outregs,
+                        insn_jnz_near1, insn_jnz_near1_end - insn_jnz_near1);
+        if(!regs_equal(&inregs, &outregs, 0))
+                print_serial("JNZ near Test 1: FAIL\n");
+
+        exec_in_big_real_mode(&inregs, &outregs,
+                        insn_jnz_near2, insn_jnz_near2_end - insn_jnz_near2);
+        if(!regs_equal(&inregs, &outregs, R_AX) || !(outregs.eflags & (1 << 6)))
+                print_serial("JNZ near Test 2: FAIL\n");
+
+        exec_in_big_real_mode(&inregs, &outregs,
+                        insn_jmp_near1, insn_jmp_near1_end - insn_jmp_near1);
+        if(!regs_equal(&inregs, &outregs, 0))
+                print_serial("JMP near Test 1: FAIL\n");
 }

 void test_null(void)
@@ -389,6 +469,10 @@ void start(void)
         test_add_imm();
         test_io();
         test_eflags_insn();
+        test_jcc_short();
+        test_jcc_near();
+        /* test_call() uses short jump so call it after testing jcc */
+        test_call();

         exit(0);
 }
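The hand-written .byte sequences in test_jcc_near() can be reproduced with a small encoder. This is a sketch, assuming the tests execute with a 16-bit operand size so that a near JNZ is the opcode pair 0F 85 followed by a little-endian 16-bit displacement; encode_jnz_near16 and zero_flag_set are hypothetical helpers and not part of the test suite.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write "JNZ rel16" (0F 85 lo hi) into buf.
 * disp is relative to the end of the 4-byte instruction.
 * Returns the number of bytes written.
 */
static size_t encode_jnz_near16(uint8_t *buf, int16_t disp)
{
    uint16_t raw = (uint16_t)disp;  /* two's complement bit pattern */

    buf[0] = 0x0f;
    buf[1] = 0x85;
    buf[2] = (uint8_t)(raw & 0xff); /* low byte first (little endian) */
    buf[3] = (uint8_t)(raw >> 8);
    return 4;
}

/* ZF is bit 6 of EFLAGS, which is what the tests check with (1 << 6). */
static int zero_flag_set(uint32_t eflags)
{
    return (eflags & (1u << 6)) != 0;
}
```

A displacement of 6 gives the bytes 0f 85 06 00 and a displacement of -16 gives 0f 85 f0 ff, matching the two JNZ encodings in the test. Under the further assumption that each 32-bit immediate `cmp`/`mov` on %eax assembles to 6 bytes in 16-bit code, +6 skips one `mov` and -16 jumps back over `cmp`, `mov` and the 4-byte jump itself.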
[PATCH] kvm: qemu: improve pci host device address parsing
From: Han, Weidong <weidong@intel.com>

pci_parse_devaddr() parses [[domain:]bus:]slot, so it accepts input that
contains only a slot, whereas the device assignment command requires the
full bus:slot.func form (-pcidevice host=bus:slot.func). So I implemented
a dedicated function to parse the device bdf in the device assignment
command, rather than mixing the two parsing functions together.

Signed-off-by: Weidong Han <weidong@intel.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index b7f9fa6..09e54ae 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -1190,8 +1190,7 @@ out:
  */
 AssignedDevInfo *add_assigned_device(const char *arg)
 {
-    char *cp, *cp1;
-    char device[8];
+    char device[16];
     char dma[6];
     int r;
     AssignedDevInfo *adev;
@@ -1202,6 +1201,13 @@ AssignedDevInfo *add_assigned_device(const char *arg)
         return NULL;
     }
     r = get_param_value(device, sizeof(device), "host", arg);
+    if (!r)
+         goto bad;
+
+    r = pci_parse_host_devaddr(device, &adev->bus, &adev->dev, &adev->func);
+    if (r)
+        goto bad;
+
     r = get_param_value(adev->name, sizeof(adev->name), "name", arg);
     if (!r)
         snprintf(adev->name, sizeof(adev->name), "%s", device);
@@ -1211,18 +1217,6 @@ AssignedDevInfo *add_assigned_device(const char *arg)
     if (r && !strncmp(dma, "none", 4))
         adev->disable_iommu = 1;
 #endif
-    cp = device;
-    adev->bus = strtoul(cp, &cp1, 16);
-    if (*cp1 != ':')
-        goto bad;
-    cp = cp1 + 1;
-
-    adev->dev = strtoul(cp, &cp1, 16);
-    if (*cp1 != '.')
-        goto bad;
-    cp = cp1 + 1;
-
-    adev->func = strtoul(cp, &cp1, 16);

     LIST_INSERT_HEAD(&adev_head, adev, next);
     return adev;
diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
index eca0517..bf97c8c 100644
--- a/qemu/hw/pci.c
+++ b/qemu/hw/pci.c
@@ -163,6 +163,7 @@ static int pci_set_default_subsystem_id(PCIDevice *pci_dev)
 }

 /*
+ * Parse pci address in qemu command
  * Parse [[domain:]bus:]slot, return -1 on error
  */
 static int pci_parse_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp)
@@ -211,6 +212,55 @@ static int pci_parse_devaddr(const char *addr, int *domp, int *busp, unsigned *s
     return 0;
 }

+/*
+ * Parse device bdf in device assignment command:
+ *
+ * -pcidevice host=bus:dev.func
+ *
+ * Parse bus:slot.func return -1 on error
+ */
+int pci_parse_host_devaddr(const char *addr, int *busp,
+                           int *slotp, int *funcp)
+{
+    const char *p;
+    char *e;
+    int val;
+    int bus = 0, slot = 0, func = 0;
+
+    p = addr;
+    val = strtoul(p, &e, 16);
+    if (e == p)
+        return -1;
+    if (*e == ':') {
+        bus = val;
+        p = e + 1;
+        val = strtoul(p, &e, 16);
+        if (e == p)
+            return -1;
+        if (*e == '.') {
+            slot = val;
+            p = e + 1;
+            val = strtoul(p, &e, 16);
+            if (e == p)
+                return -1;
+            func = val;
+        } else
+            return -1;
+    } else
+        return -1;
+
+    if (bus > 0xff || slot > 0x1f || func > 0x7)
+        return -1;
+
+    if (*e)
+        return -1;
+
+    *busp = bus;
+    *slotp = slot;
+    *funcp = func;
+    return 0;
+}
+
 int pci_read_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp)
 {
     char devaddr[32];
diff --git a/qemu/hw/pci.h b/qemu/hw/pci.h
index ea0bec8..890a41b 100644
--- a/qemu/hw/pci.h
+++ b/qemu/hw/pci.h
@@ -234,6 +234,9 @@ PCIDevice *pci_find_device(int bus_num, int slot, int function);
 int pci_read_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp);
 int pci_assign_devaddr(const char *addr, int *domp, int *busp, unsigned *slotp);

+int pci_parse_host_devaddr(const char *addr, int *busp,
+                           int *slotp, int *funcp);
+
 void pci_info(Monitor *mon);
 PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t vid, uint16_t did,
                         pci_map_irq_fn map_irq, const char *name);
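To make the accepted syntax concrete, here is a stand-alone sketch of a strict bus:slot.func parser in the spirit of the function this patch adds. It is not the patch's code: parse_host_bdf is a hypothetical name, and this version keeps the parsed fields as unsigned long until after the range check, which is a design choice of the sketch.

```c
#include <stdlib.h>

/*
 * Illustrative parser for "bus:slot.func" with hexadecimal fields.
 * Returns 0 on success and -1 on malformed or out-of-range input.
 * Hypothetical helper; all three fields are mandatory.
 */
static int parse_host_bdf(const char *addr, int *busp, int *slotp, int *funcp)
{
    const char *p = addr;
    char *e;
    unsigned long bus, slot, func;

    bus = strtoul(p, &e, 16);
    if (e == p || *e != ':')        /* no digits, or missing ':' */
        return -1;

    p = e + 1;
    slot = strtoul(p, &e, 16);
    if (e == p || *e != '.')        /* no digits, or missing '.' */
        return -1;

    p = e + 1;
    func = strtoul(p, &e, 16);
    if (e == p || *e != '\0')       /* no digits, or trailing garbage */
        return -1;

    /* PCI limits: 8-bit bus, 5-bit slot, 3-bit function. */
    if (bus > 0xff || slot > 0x1f || func > 0x7)
        return -1;

    *busp = (int)bus;
    *slotp = (int)slot;
    *funcp = (int)func;
    return 0;
}
```

Under these rules "00:1f.7" parses to bus 0, slot 0x1f, function 7, while "1f", "00:1f" and "100:00.0" are all rejected.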
[PATCH] kvm: configure: --with-patched-kernel doesn't need kernelversion
From: Mark McLoughlin <mar...@redhat.com>

DEPMOD_VERSION is only used for kvm.ko, not for e.g. header-sync.

Signed-off-by: Mark McLoughlin <mar...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/kernel/configure b/kernel/configure
index 3fd0c94..7a222e4 100755
--- a/kernel/configure
+++ b/kernel/configure
@@ -12,6 +12,7 @@ cross_prefix=
 arch=`uname -m`
 # don't use uname if kerneldir is set
 no_uname=
+# we only need depmod_version for kvm.ko install
 depmod_version=
 if [ -z TMPDIR ] ; then
     TMPDIR=.
@@ -80,7 +81,7 @@ if [ -d $kerneldir/include2 ]; then
     kernelsourcedir=${kerneldir%/*}/source
 fi

-if [ -n $no_uname ]; then
+if [ -n $no_uname -a $want_module ]; then
     if [ -e $kerneldir/.kernelrelease ]; then
         depmod_version=`cat $kerneldir/.kernelrelease`
[PATCH] kvm: extboot: silence compiler warning
From: Jan Kiszka <jan.kis...@web.de>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/qemu/hw/extboot.c b/qemu/hw/extboot.c
index 32e6226..13ffafa 100644
--- a/qemu/hw/extboot.c
+++ b/qemu/hw/extboot.c
@@ -77,8 +77,8 @@ static void extboot_write_cmd(void *opaque, uint32_t addr, uint32_t value)
     BlockDriverState *bs = opaque;
     int cylinders, heads, sectors, err;
     uint64_t nb_sectors;
-    target_phys_addr_t pa;
-    int blen;
+    target_phys_addr_t pa = 0;
+    int blen = 0;
     void *buf = NULL;

     if (cmd->type == 0x01 || cmd->type == 0x02) {
[PATCH] kvm: testsuite: test long JMP emulation
From: Gleb Natapov <g...@redhat.com>

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/user/test/x86/realmode.c b/user/test/x86/realmode.c
index 336ba1c..755b5d1 100644
--- a/user/test/x86/realmode.c
+++ b/user/test/x86/realmode.c
@@ -451,6 +451,23 @@ void test_jcc_near(void)
                 print_serial("JMP near Test 1: FAIL\n");
 }

+void test_long_jmp()
+{
+        struct regs inregs = { 0 }, outregs;
+        u32 esp[16];
+
+        inregs.esp = (u32)esp;
+        MK_INSN(long_jmp, "call 1f\n\t"
+                          "jmp 2f\n\t"
+                          "1: jmp $0, $test_function\n\t"
+                          "2:\n\t");
+        exec_in_big_real_mode(&inregs, &outregs,
+                              insn_long_jmp,
+                              insn_long_jmp_end - insn_long_jmp);
+        if(!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1234)
+                print_serial("Long JMP Test: FAIL\n");
+}
+
 void test_null(void)
 {
         struct regs inregs = { 0 }, outregs;
@@ -473,6 +490,8 @@ void start(void)
         test_jcc_near();
         /* test_call() uses short jump so call it after testing jcc */
         test_call();
+        /* long jmp test uses call near so test it after testing call */
+        test_long_jmp();

         exit(0);
 }
[PATCH] kvm: Update .gitignore
From: Jan Kiszka <jan.kis...@web.de>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/.gitignore b/.gitignore
index fcdc357..22a8200 100644
--- a/.gitignore
+++ b/.gitignore
@@ -53,10 +53,14 @@ kernel/x86/coalesced_mmio.[ch]
 kernel/x86/kvm_cache_regs.h
 kernel/x86/vtd.c
 kernel/x86/irq_comm.c
+kernel/x86/timer.c
+kernel/x86/kvm_timer.h
+kernel/x86/iommu.c
 qemu/pc-bios/extboot.bin
 qemu/qemu-doc.html
 qemu/*.[18]
 qemu/*.pod
 qemu/qemu-tech.html
+qemu/qemu-options.texi
 user/kvmtrace
 user/test/x86/bootstrap
[PATCH] kvm: qemu: fixup 4GB+ memslot large page alignment
From: Marcelo Tosatti <mtosa...@redhat.com>

We need to align the 4GB+ memslot after we know its address, not before.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index d4a4320..cc84772 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -866,6 +866,7 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,

     /* above 4giga memory allocation */
     if (above_4g_mem_size > 0) {
+        ram_addr = qemu_ram_alloc(above_4g_mem_size);
         if (hpagesize) {
             if (ram_addr & (hpagesize-1)) {
                 unsigned long aligned_addr;
@@ -874,7 +875,6 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
                 ram_addr = aligned_addr;
             }
         }
-        ram_addr = qemu_ram_alloc(above_4g_mem_size);
         cpu_register_physical_memory(0x100000000ULL,
                                      above_4g_mem_size,
                                      ram_addr);
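The ordering matters because the alignment is computed from the address the allocator returns. A minimal sketch of the power-of-two rounding involved, with hypothetical helper names (align_up, align_padding) that do not appear in the patch:

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: number of bytes to skip so that an allocation
 * that started at addr continues on an align boundary.
 */
static uint64_t align_padding(uint64_t addr, uint64_t align)
{
    return align_up(addr, align) - addr;
}
```

With a 2 MB large page (0x200000), an allocation that starts at 0x201000 needs 0x1ff000 bytes of padding to reach 0x400000. That padding can only be computed once the allocation address is known, which is the point of the reordering.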
[PATCH] KVM: VMX: Fix handling of a fault during NMI unblocked due to IRET
From: Gleb Natapov <g...@redhat.com>

Bit 12 is undefined in any of the following cases:
 - If the VM exit sets the valid bit in the IDT-vectoring information field.
 - If the VM exit is due to a double fault.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7d7b0d6..631f9b7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3272,36 +3272,41 @@ static void update_tpr_threshold(struct kvm_vcpu *vcpu)
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
         u32 exit_intr_info;
-        u32 idt_vectoring_info;
+        u32 idt_vectoring_info = vmx->idt_vectoring_info;
         bool unblock_nmi;
         u8 vector;
         int type;
         bool idtv_info_valid;
         u32 error;

+        idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
         exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
         if (cpu_has_virtual_nmis()) {
                 unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
                 vector = exit_intr_info & INTR_INFO_VECTOR_MASK;
                 /*
-                 * SDM 3: 25.7.1.2
+                 * SDM 3: 27.7.1.2 (September 2008)
                  * Re-set bit "block by NMI" before VM entry if vmexit caused by
                  * a guest IRET fault.
+                 * SDM 3: 23.2.2 (September 2008)
+                 * Bit 12 is undefined in any of the following cases:
+                 *  If the VM exit sets the valid bit in the IDT-vectoring
+                 *   information field.
+                 *  If the VM exit is due to a double fault.
                  */
-                if (unblock_nmi && vector != DF_VECTOR)
+                if ((exit_intr_info & INTR_INFO_VALID_MASK) && unblock_nmi &&
+                    vector != DF_VECTOR && !idtv_info_valid)
                         vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
                                       GUEST_INTR_STATE_NMI);
         } else if (unlikely(vmx->soft_vnmi_blocked))
                 vmx->vnmi_blocked_time +=
                         ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));

-        idt_vectoring_info = vmx->idt_vectoring_info;
-        idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
         vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
         type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
         if (vmx->vcpu.arch.nmi_injected) {
                 /*
-                 * SDM 3: 25.7.1.2
+                 * SDM 3: 27.7.1.2 (September 2008)
                  * Clear bit "block by NMI" before VM entry if a NMI delivery
                  * faulted.
                  */
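The tightened condition can be read as a pure predicate over the two VMCS fields. The following is a sketch only: should_reblock_nmi is a hypothetical name, and the numeric constant values are assumptions based on the bit positions the commit message and the SDM describe (vector in bits 7:0, bit 12, valid bit 31), not values copied from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the VM-exit interruption-information field. */
#define INTR_INFO_VECTOR_MASK      0xffu
#define INTR_INFO_UNBLOCK_NMI      0x1000u      /* bit 12 */
#define INTR_INFO_VALID_MASK       0x80000000u  /* bit 31 */
/* Assumed valid bit of the IDT-vectoring information field. */
#define VECTORING_INFO_VALID_MASK  0x80000000u  /* bit 31 */
#define DF_VECTOR                  8

/*
 * Hypothetical predicate: should "block by NMI" be set again before the
 * next VM entry? Bit 12 is only trusted when the exit information is
 * valid, the exit is not a double fault, and no event was being
 * delivered through the IDT.
 */
static bool should_reblock_nmi(uint32_t exit_intr_info,
                               uint32_t idt_vectoring_info)
{
    bool unblock_nmi = (exit_intr_info & INTR_INFO_UNBLOCK_NMI) != 0;
    bool idtv_valid = (idt_vectoring_info & VECTORING_INFO_VALID_MASK) != 0;
    uint8_t vector = (uint8_t)(exit_intr_info & INTR_INFO_VECTOR_MASK);

    return (exit_intr_info & INTR_INFO_VALID_MASK) != 0 && unblock_nmi &&
           vector != DF_VECTOR && !idtv_valid;
}
```

A general-protection fault (vector 13) with bit 12 set and no IDT-vectoring event satisfies the predicate; the same exit with a valid IDT-vectoring field, or with vector 8, does not.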
[PATCH] KVM: Use rsvd_bits_mask in load_pdptrs()
From: Dong, Eddie <eddie.d...@intel.com>

Also remove bits 5-6 from rsvd_bits_mask per the latest SDM.

Signed-off-by: Eddie Dong <eddie.d...@intel.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e0f63b6..76dd43c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -225,11 +225,6 @@ static int is_nx(struct kvm_vcpu *vcpu)
         return vcpu->arch.shadow_efer & EFER_NX;
 }

-static int is_present_pte(unsigned long pte)
-{
-        return pte & PT_PRESENT_MASK;
-}
-
 static int is_shadow_present_pte(u64 pte)
 {
         return pte != shadow_trap_nonpresent_pte
@@ -2195,6 +2190,9 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
                 context->rsvd_bits_mask[1][0] = ~0ull;
                 break;
         case PT32E_ROOT_LEVEL:
+                context->rsvd_bits_mask[0][2] =
+                        rsvd_bits(maxphyaddr, 63) |
+                        rsvd_bits(7, 8) | rsvd_bits(1, 2);      /* PDPTE */
                 context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
                         rsvd_bits(maxphyaddr, 62);      /* PDE */
                 context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index eaab214..3494a2f 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -75,4 +75,9 @@ static inline int is_paging(struct kvm_vcpu *vcpu)
         return vcpu->arch.cr0 & X86_CR0_PG;
 }

+static inline int is_present_pte(unsigned long pte)
+{
+        return pte & PT_PRESENT_MASK;
+}
+
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aeb0193..0dcf95b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -234,7 +234,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
                 goto out;
         }
         for (i = 0; i < ARRAY_SIZE(pdpte); ++i) {
-                if ((pdpte[i] & 1) && (pdpte[i] & 0xfffffff0000001e6ull)) {
+                if (is_present_pte(pdpte[i]) &&
+                    (pdpte[i] & vcpu->arch.mmu.rsvd_bits_mask[0][2])) {
                         ret = 0;
                         goto out;
                 }
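To see what the new PDPTE mask evaluates to, here is a small sketch. It assumes rsvd_bits(s, e) yields a mask with bits s through e (inclusive) set, which is how the call sites in the patch read; the implementation below and the helper pae_pdpte_rsvd_mask are illustrative and not taken from the patch.

```c
#include <stdint.h>

/*
 * Mask with bits s..e (inclusive) set.
 * Valid for 0 <= s <= e <= 63 as long as the range is narrower than 64 bits.
 */
static uint64_t rsvd_bits(int s, int e)
{
    return ((1ULL << (e - s + 1)) - 1) << s;
}

/*
 * Hypothetical helper: reserved-bit mask for a PAE PDPTE, built from the
 * same three ranges the patch uses: everything from maxphyaddr up to
 * bit 63, bits 7-8, and bits 1-2.
 */
static uint64_t pae_pdpte_rsvd_mask(int maxphyaddr)
{
    return rsvd_bits(maxphyaddr, 63) | rsvd_bits(7, 8) | rsvd_bits(1, 2);
}
```

For a 36-bit physical address width this gives 0xfffffff000000186. Adding bits 5-6 back (0x60) turns the low part into 0x1e6, which illustrates what "remove bits 5-6" changes in the mask.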
[PATCH] KVM: VMX: Fix feature testing
From: Sheng Yang <sh...@linux.intel.com>

The feature tests currently run too early, before vmcs_config has been
completely initialized.

Signed-off-by: Sheng Yang <sh...@linux.intel.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1caa1fc..7d7b0d6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1208,15 +1208,6 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
                       vmx_capability.ept, vmx_capability.vpid);
         }

-        if (!cpu_has_vmx_vpid())
-                enable_vpid = 0;
-
-        if (!cpu_has_vmx_ept())
-                enable_ept = 0;
-
-        if (!cpu_has_vmx_flexpriority())
-                flexpriority_enabled = 0;
-
         min = 0;
 #ifdef CONFIG_X86_64
         min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
@@ -1320,6 +1311,15 @@ static __init int hardware_setup(void)
         if (boot_cpu_has(X86_FEATURE_NX))
                 kvm_enable_efer_bits(EFER_NX);

+        if (!cpu_has_vmx_vpid())
+                enable_vpid = 0;
+
+        if (!cpu_has_vmx_ept())
+                enable_ept = 0;
+
+        if (!cpu_has_vmx_flexpriority())
+                flexpriority_enabled = 0;
+
         return alloc_kvm_area();
 }

[PATCH] Merge commit 'qemu-svn/trunk'
From: Avi Kivity <a...@redhat.com>

* commit 'qemu-svn/trunk': (38 commits)
  Remove WIN32 guard around -k
  Add new command line option -singlestep for tcg single stepping.
  tcg/x86_64: optimize register allocation order
  stop dirty tracking just at the end of migration (Glauber Costa)
  create qemu_file_set_error (Glauber Costa)
  propagate error on failed completion (Glauber Costa)
  Disable qemu-io on Win32
  Add files not included in previous commit.
  Fix savevm after BDRV_FILE size enforcement
  Fix the build for --disable-aio
  gdbstub: Rework configuration via command line and monitor (Jan Kiszka)
  Make `-icount' help fit 80 chars screen width (Robert Riebisch)
  qemu-io - an I/O path exerciser (Christoph Hellwig)
  Fix display breakage when resizing the screen (v2) (Avi Kivity)
  Fix some win32 compile warnings
  Make binary stripping conditional (Riku Voipio)
  qcow2: fix image creation for large, ~2TB, images (Chris Wright)
  pci_add storage: fix error handling for 'if' parameter (Eduardo Habkost)
  build system: clean qemu-options.texi and gdbstub-xml.c (Jan Kiszka)
  build system: silent generation of doc files and qemu-options.h (Jan Kiszka)
  ...

Conflicts:
  qemu/Makefile.target
  qemu/hw/vga.c
  qemu/vl.c

Signed-off-by: Avi Kivity <a...@redhat.com>
[PATCH] KVM: VMX: Rewrite vmx_complete_interrupt()'s twisted maze of if() statements
From: Gleb Natapov <g...@redhat.com>

...with a more straightforward switch().

Also fix a bug where an NMI could be dropped on exit, although this should
never happen in practice, since NMIs can only be injected, never triggered
internally by the guest like exceptions.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 631f9b7..577aa95 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3277,7 +3277,6 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
         u8 vector;
         int type;
         bool idtv_info_valid;
-        u32 error;

         idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
         exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
@@ -3302,34 +3301,42 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
                 vmx->vnmi_blocked_time +=
                         ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));

+        vmx->vcpu.arch.nmi_injected = false;
+        kvm_clear_exception_queue(&vmx->vcpu);
+        kvm_clear_interrupt_queue(&vmx->vcpu);
+
+        if (!idtv_info_valid)
+                return;
+
         vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
         type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
-        if (vmx->vcpu.arch.nmi_injected) {
+
+        switch(type) {
+        case INTR_TYPE_NMI_INTR:
+                vmx->vcpu.arch.nmi_injected = true;
                 /*
                  * SDM 3: 27.7.1.2 (September 2008)
-                 * Clear bit "block by NMI" before VM entry if a NMI delivery
-                 * faulted.
+                 * Clear bit "block by NMI" before VM entry if a NMI
+                 * delivery faulted.
                  */
-                if (idtv_info_valid && type == INTR_TYPE_NMI_INTR)
-                        vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
-                                        GUEST_INTR_STATE_NMI);
-                else
-                        vmx->vcpu.arch.nmi_injected = false;
-        }
-        kvm_clear_exception_queue(&vmx->vcpu);
-        if (idtv_info_valid && (type == INTR_TYPE_HARD_EXCEPTION ||
-                                type == INTR_TYPE_SOFT_EXCEPTION)) {
+                vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
+                                GUEST_INTR_STATE_NMI);
+                break;
+        case INTR_TYPE_HARD_EXCEPTION:
+        case INTR_TYPE_SOFT_EXCEPTION:
                 if (idt_vectoring_info & VECTORING_INFO_DELIVER_CODE_MASK) {
-                        error = vmcs_read32(IDT_VECTORING_ERROR_CODE);
-                        kvm_queue_exception_e(&vmx->vcpu, vector, error);
+                        u32 err = vmcs_read32(IDT_VECTORING_ERROR_CODE);
+                        kvm_queue_exception_e(&vmx->vcpu, vector, err);
                 } else
                         kvm_queue_exception(&vmx->vcpu, vector);
                 vmx->idt_vectoring_info = 0;
-        }
-        kvm_clear_interrupt_queue(&vmx->vcpu);
-        if (idtv_info_valid && type == INTR_TYPE_EXT_INTR) {
+                break;
+        case INTR_TYPE_EXT_INTR:
                 kvm_queue_interrupt(&vmx->vcpu, vector);
                 vmx->idt_vectoring_info = 0;
+                break;
+        default:
+                break;
         }
 }

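The shape of the new switch() can be shown in isolation as a classifier over the IDT-vectoring information word. This is a sketch under stated assumptions: the enum and classify_vectoring_info are hypothetical, and the numeric constants are assumed from the SDM field layout (vector in bits 7:0, type in bits 10:8, error-code flag in bit 11, valid in bit 31) rather than copied from the patch.

```c
#include <stdint.h>

/* Assumed layout of the IDT-vectoring information field. */
#define VECTORING_INFO_VECTOR_MASK        0xffu
#define VECTORING_INFO_TYPE_MASK          0x700u
#define VECTORING_INFO_DELIVER_CODE_MASK  0x800u
#define VECTORING_INFO_VALID_MASK         0x80000000u

/* Assumed event type encodings (bits 10:8). */
#define INTR_TYPE_EXT_INTR        (0u << 8)
#define INTR_TYPE_NMI_INTR        (2u << 8)
#define INTR_TYPE_HARD_EXCEPTION  (3u << 8)
#define INTR_TYPE_SOFT_EXCEPTION  (6u << 8)

/* Hypothetical result type: what would have to be queued again. */
enum requeue_action {
    REQUEUE_NONE,
    REQUEUE_NMI,
    REQUEUE_EXCEPTION,
    REQUEUE_EXCEPTION_WITH_CODE,
    REQUEUE_INTERRUPT
};

/* Early return when the field is invalid, then one switch on the type. */
static enum requeue_action classify_vectoring_info(uint32_t info)
{
    if (!(info & VECTORING_INFO_VALID_MASK))
        return REQUEUE_NONE;

    switch (info & VECTORING_INFO_TYPE_MASK) {
    case INTR_TYPE_NMI_INTR:
        return REQUEUE_NMI;
    case INTR_TYPE_HARD_EXCEPTION:
    case INTR_TYPE_SOFT_EXCEPTION:
        return (info & VECTORING_INFO_DELIVER_CODE_MASK)
               ? REQUEUE_EXCEPTION_WITH_CODE : REQUEUE_EXCEPTION;
    case INTR_TYPE_EXT_INTR:
        return REQUEUE_INTERRUPT;
    default:
        return REQUEUE_NONE;
    }
}
```

Checking validity once up front is what lets each case drop its own idtv_info_valid test, which is the main simplification the commit describes.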
[PATCH] KVM: Fix task switch back link handling.
From: Gleb Natapov <g...@redhat.com>

Back link is written to a wrong TSS now.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0dcf95b..a13fa70 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3710,7 +3710,6 @@ static void save_state_to_tss32(struct kvm_vcpu *vcpu,
 	tss->fs = get_segment_selector(vcpu, VCPU_SREG_FS);
 	tss->gs = get_segment_selector(vcpu, VCPU_SREG_GS);
 	tss->ldt_selector = get_segment_selector(vcpu, VCPU_SREG_LDTR);
-	tss->prev_task_link = get_segment_selector(vcpu, VCPU_SREG_TR);
 }
 
 static int load_state_from_tss32(struct kvm_vcpu *vcpu,
@@ -3807,8 +3806,8 @@ static int load_state_from_tss16(struct kvm_vcpu *vcpu,
 }
 
 static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
-		       u32 old_tss_base,
-		       struct desc_struct *nseg_desc)
+			      u16 old_tss_sel, u32 old_tss_base,
+			      struct desc_struct *nseg_desc)
 {
 	struct tss_segment_16 tss_segment_16;
 	int ret = 0;
@@ -3827,6 +3826,16 @@ static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
 			   &tss_segment_16, sizeof tss_segment_16))
 		goto out;
 
+	if (old_tss_sel != 0xffff) {
+		tss_segment_16.prev_task_link = old_tss_sel;
+
+		if (kvm_write_guest(vcpu->kvm,
+				    get_tss_base_addr(vcpu, nseg_desc),
+				    &tss_segment_16.prev_task_link,
+				    sizeof tss_segment_16.prev_task_link))
+			goto out;
+	}
+
 	if (load_state_from_tss16(vcpu, &tss_segment_16))
 		goto out;
 
@@ -3836,7 +3845,7 @@ out:
 }
 
 static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
-		       u32 old_tss_base,
+		       u16 old_tss_sel, u32 old_tss_base,
 		       struct desc_struct *nseg_desc)
 {
 	struct tss_segment_32 tss_segment_32;
@@ -3856,6 +3865,16 @@ static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
 			   &tss_segment_32, sizeof tss_segment_32))
 		goto out;
 
+	if (old_tss_sel != 0xffff) {
+		tss_segment_32.prev_task_link = old_tss_sel;
+
+		if (kvm_write_guest(vcpu->kvm,
+				    get_tss_base_addr(vcpu, nseg_desc),
+				    &tss_segment_32.prev_task_link,
+				    sizeof tss_segment_32.prev_task_link))
+			goto out;
+	}
+
 	if (load_state_from_tss32(vcpu, &tss_segment_32))
 		goto out;
 
@@ -3911,12 +3930,17 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason)
 
 	kvm_x86_ops->skip_emulated_instruction(vcpu);
 
+	/* set back link to prev task only if NT bit is set in eflags
+	   note that old_tss_sel is not used afetr this point */
+	if (reason != TASK_SWITCH_CALL && reason != TASK_SWITCH_GATE)
+		old_tss_sel = 0xffff;
+
 	if (nseg_desc.type & 8)
-		ret = kvm_task_switch_32(vcpu, tss_selector, old_tss_base,
-					 &nseg_desc);
+		ret = kvm_task_switch_32(vcpu, tss_selector, old_tss_sel,
+					 old_tss_base, &nseg_desc);
 	else
-		ret = kvm_task_switch_16(vcpu, tss_selector, old_tss_base,
-					 &nseg_desc);
+		ret = kvm_task_switch_16(vcpu, tss_selector, old_tss_sel,
+					 old_tss_base, &nseg_desc);
 
 	if (reason == TASK_SWITCH_CALL || reason == TASK_SWITCH_GATE) {
 		u32 eflags = kvm_x86_ops->get_rflags(vcpu);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
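The rule this patch enforces can be illustrated with a small user-space sketch. It is not the kernel code: the enum, the struct and the function names are hypothetical stand-ins, and the 0xffff "no back link" sentinel is an assumption of the sketch. The point it shows is that the old task's selector is stored in the new task's TSS, and only when the switch was caused by a CALL or a task gate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's task-switch reasons. */
enum switch_reason { SWITCH_CALL, SWITCH_IRET, SWITCH_JMP, SWITCH_GATE };

/* Sentinel meaning "do not touch prev_task_link" (assumed for this sketch). */
#define NO_BACK_LINK 0xffffu

/* Simplified TSS holding only the field this sketch cares about. */
struct toy_tss {
	uint16_t prev_task_link;
};

/* Selector to record as back link, or NO_BACK_LINK if none applies. */
static uint16_t back_link_for(enum switch_reason reason, uint16_t old_tss_sel)
{
	if (reason != SWITCH_CALL && reason != SWITCH_GATE)
		return NO_BACK_LINK;
	return old_tss_sel;
}

/* Store the back link in the NEW task's TSS, never in the old one. */
static void link_new_task(struct toy_tss *new_tss, enum switch_reason reason,
			  uint16_t old_tss_sel)
{
	uint16_t link = back_link_for(reason, old_tss_sel);

	assert(new_tss != NULL);
	if (link != NO_BACK_LINK)
		new_tss->prev_task_link = link;
}
```

Calling `link_new_task()` with `SWITCH_JMP` or `SWITCH_IRET` leaves the new TSS untouched, while `SWITCH_CALL` and `SWITCH_GATE` record the previous selector so that a later IRET can find its way back.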
[PATCH] KVM: VMX: Do not zero idt_vectoring_info in vmx_complete_interrupts().
From: Gleb Natapov <g...@redhat.com>

We will need it later in task_switch().

Code in handle_exception() is dead. is_external_interrupt(vect_info) will always be false since idt_vectoring_info is zeroed in vmx_complete_interrupts().

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 577aa95..e4ad9d3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2626,11 +2626,6 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		printk(KERN_ERR "%s: unexpected, vectoring info 0x%x "
 		       "intr info 0x%x\n", __func__, vect_info, intr_info);
 
-	if (!irqchip_in_kernel(vcpu->kvm) && is_external_interrupt(vect_info)) {
-		int irq = vect_info & VECTORING_INFO_VECTOR_MASK;
-		kvm_push_irq(vcpu, irq);
-	}
-
 	if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR)
 		return 1;  /* already handled by vmx_vcpu_run() */
 
@@ -3329,11 +3324,9 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 			kvm_queue_exception_e(&vmx->vcpu, vector, err);
 		} else
 			kvm_queue_exception(&vmx->vcpu, vector);
-		vmx->idt_vectoring_info = 0;
 		break;
 	case INTR_TYPE_EXT_INTR:
 		kvm_queue_interrupt(&vmx->vcpu, vector);
-		vmx->idt_vectoring_info = 0;
 		break;
 	default:
 		break;
[PATCH] KVM: Fix unneeded instruction skipping during task switching.
From: Gleb Natapov <g...@redhat.com>

There is no need to skip an instruction if the reason for a task switch is a task gate in the IDT and access to it is caused by an external event. The problem is currently solved only for VMX, since there is no reliable way to skip an instruction in SVM. We should emulate it instead.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 82ada75..85574b7 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -225,6 +225,7 @@ struct __attribute__ ((__packed__)) vmcb {
 #define SVM_EVTINJ_VALID_ERR (1 << 11)
 
 #define SVM_EXITINTINFO_VEC_MASK SVM_EVTINJ_VEC_MASK
+#define SVM_EXITINTINFO_TYPE_MASK SVM_EVTINJ_TYPE_MASK
 
 #define	SVM_EXITINTINFO_TYPE_INTR SVM_EVTINJ_TYPE_INTR
 #define	SVM_EXITINTINFO_TYPE_NMI SVM_EVTINJ_TYPE_NMI
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1fcbc17..3ffb695 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1823,17 +1823,28 @@ static int task_switch_interception(struct vcpu_svm *svm,
 				    struct kvm_run *kvm_run)
 {
 	u16 tss_selector;
+	int reason;
+	int int_type = svm->vmcb->control.exit_int_info &
+		SVM_EXITINTINFO_TYPE_MASK;
 
 	tss_selector = (u16)svm->vmcb->control.exit_info_1;
+
 	if (svm->vmcb->control.exit_info_2 &
 	    (1ULL << SVM_EXITINFOSHIFT_TS_REASON_IRET))
-		return kvm_task_switch(&svm->vcpu, tss_selector,
-				       TASK_SWITCH_IRET);
-	if (svm->vmcb->control.exit_info_2 &
-	    (1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
-		return kvm_task_switch(&svm->vcpu, tss_selector,
-				       TASK_SWITCH_JMP);
-	return kvm_task_switch(&svm->vcpu, tss_selector, TASK_SWITCH_CALL);
+		reason = TASK_SWITCH_IRET;
+	else if (svm->vmcb->control.exit_info_2 &
+		 (1ULL << SVM_EXITINFOSHIFT_TS_REASON_JMP))
+		reason = TASK_SWITCH_JMP;
+	else if (svm->vmcb->control.exit_int_info & SVM_EXITINTINFO_VALID)
+		reason = TASK_SWITCH_GATE;
+	else
+		reason = TASK_SWITCH_CALL;
+
+
+	if (reason != TASK_SWITCH_GATE || int_type == SVM_EXITINTINFO_TYPE_SOFT)
+		skip_emulated_instruction(&svm->vcpu);
+
+	return kvm_task_switch(&svm->vcpu, tss_selector, reason);
 }
 
 static int cpuid_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e4ad9d3..c6997c0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3038,22 +3038,40 @@ static int handle_task_switch(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	unsigned long exit_qualification;
 	u16 tss_selector;
-	int reason;
+	int reason, type, idt_v;
+
+	idt_v = (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK);
+	type = (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK);
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
 	reason = (u32)exit_qualification >> 30;
-	if (reason == TASK_SWITCH_GATE && vmx->vcpu.arch.nmi_injected &&
-	    (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
-	    (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK)
-	    == INTR_TYPE_NMI_INTR) {
-		vcpu->arch.nmi_injected = false;
-		if (cpu_has_virtual_nmis())
-			vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
-				      GUEST_INTR_STATE_NMI);
+	if (reason == TASK_SWITCH_GATE && idt_v) {
+		switch (type) {
+		case INTR_TYPE_NMI_INTR:
+			vcpu->arch.nmi_injected = false;
+			if (cpu_has_virtual_nmis())
+				vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+					      GUEST_INTR_STATE_NMI);
+			break;
+		case INTR_TYPE_EXT_INTR:
+			kvm_clear_interrupt_queue(vcpu);
+			break;
+		case INTR_TYPE_HARD_EXCEPTION:
+		case INTR_TYPE_SOFT_EXCEPTION:
+			kvm_clear_exception_queue(vcpu);
+			break;
+		default:
+			break;
+		}
 	}
 	tss_selector = exit_qualification;
 
+	if (!idt_v || (type != INTR_TYPE_HARD_EXCEPTION &&
+		       type != INTR_TYPE_EXT_INTR &&
+		       type != INTR_TYPE_NMI_INTR))
+		skip_emulated_instruction(vcpu);
+
 	if (!kvm_task_switch(vcpu, tss_selector, reason))
 		return 0;
 
@@ -3306,7 +3324,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 	vector = idt_vectoring_info & VECTORING_INFO_VECTOR_MASK;
 	type = idt_vectoring_info & VECTORING_INFO_TYPE_MASK;
[PATCH] KVM: VMX: Clean up Flex Priority related
From: Sheng Yang <sh...@linux.intel.com>

And clean up parentheses on returns.

Signed-off-by: Sheng Yang <sh...@linux.intel.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aba41ae..1caa1fc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -216,61 +216,69 @@ static inline int is_external_interrupt(u32 intr_info)
 
 static inline int cpu_has_vmx_msr_bitmap(void)
 {
-	return (vmcs_config.cpu_based_exec_ctrl & CPU_BASED_USE_MSR_BITMAPS);
+	return vmcs_config.cpu_based_exec_ctrl & CPU_BASED_USE_MSR_BITMAPS;
 }
 
 static inline int cpu_has_vmx_tpr_shadow(void)
 {
-	return (vmcs_config.cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW);
+	return vmcs_config.cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW;
 }
 
 static inline int vm_need_tpr_shadow(struct kvm *kvm)
 {
-	return ((cpu_has_vmx_tpr_shadow()) && (irqchip_in_kernel(kvm)));
+	return (cpu_has_vmx_tpr_shadow()) && (irqchip_in_kernel(kvm));
 }
 
 static inline int cpu_has_secondary_exec_ctrls(void)
 {
-	return (vmcs_config.cpu_based_exec_ctrl &
-		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS);
+	return vmcs_config.cpu_based_exec_ctrl &
+		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
 }
 
 static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
 {
-	return flexpriority_enabled;
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+}
+
+static inline bool cpu_has_vmx_flexpriority(void)
+{
+	return cpu_has_vmx_tpr_shadow() &&
+		cpu_has_vmx_virtualize_apic_accesses();
 }
 
 static inline int cpu_has_vmx_invept_individual_addr(void)
 {
-	return (!!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT));
+	return !!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT);
 }
 
 static inline int cpu_has_vmx_invept_context(void)
 {
-	return (!!(vmx_capability.ept & VMX_EPT_EXTENT_CONTEXT_BIT));
+	return !!(vmx_capability.ept & VMX_EPT_EXTENT_CONTEXT_BIT);
 }
 
 static inline int cpu_has_vmx_invept_global(void)
 {
-	return (!!(vmx_capability.ept & VMX_EPT_EXTENT_GLOBAL_BIT));
+	return !!(vmx_capability.ept & VMX_EPT_EXTENT_GLOBAL_BIT);
 }
 
 static inline int cpu_has_vmx_ept(void)
 {
-	return (vmcs_config.cpu_based_2nd_exec_ctrl &
-		SECONDARY_EXEC_ENABLE_EPT);
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_ENABLE_EPT;
 }
 
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
-	return ((cpu_has_vmx_virtualize_apic_accesses()) &&
-		(irqchip_in_kernel(kvm)));
+	return flexpriority_enabled &&
+		(cpu_has_vmx_virtualize_apic_accesses()) &&
+		(irqchip_in_kernel(kvm));
 }
 
 static inline int cpu_has_vmx_vpid(void)
 {
-	return (vmcs_config.cpu_based_2nd_exec_ctrl &
-		SECONDARY_EXEC_ENABLE_VPID);
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_ENABLE_VPID;
 }
 
 static inline int cpu_has_virtual_nmis(void)
@@ -278,6 +286,11 @@ static inline int cpu_has_virtual_nmis(void)
 	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline bool report_flexpriority(void)
+{
+	return flexpriority_enabled;
+}
+
 static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
 {
 	int i;
@@ -1201,7 +1214,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	if (!cpu_has_vmx_ept())
 		enable_ept = 0;
 
-	if (!(vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
+	if (!cpu_has_vmx_flexpriority())
 		flexpriority_enabled = 0;
 
 	min = 0;
@@ -3655,7 +3668,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.check_processor_compatibility = vmx_check_processor_compat,
 	.hardware_enable = hardware_enable,
 	.hardware_disable = hardware_disable,
-	.cpu_has_accelerated_tpr = cpu_has_vmx_virtualize_apic_accesses,
+	.cpu_has_accelerated_tpr = report_flexpriority,
 
 	.vcpu_create = vmx_create_vcpu,
 	.vcpu_free = vmx_free_vcpu,
[PATCH] KVM: MMU: Discard reserved bits checking on PDE bit 7-8
From: Sheng Yang <sh...@linux.intel.com>

1. It's related to a Linux kernel bug which was fixed by Ingo in 07a66d7c53a538e1a9759954a82bb6c07365eff9. The original code had existed for quite a long time, and it would convert a PDE for a large page into a normal PDE, but the result did not fit the normal PDE format well. With the code before Ingo's fix, the kernel would fail the reserved bit check on bit 8, the leftover global bit of the PTE, so the kernel would receive a double fault.

2. After discussion, we decided to drop the reserved-bit check on PDE bits 7-8 for now. These bits are marked as reserved in the SDM, but in fact the processor does not check them...

Signed-off-by: Sheng Yang <sh...@linux.intel.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 76dd43c..d5bdf3a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2194,7 +2194,7 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
 			rsvd_bits(maxphyaddr, 63) |
 			rsvd_bits(7, 8) | rsvd_bits(1, 2);	/* PDPTE */
 		context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
-			rsvd_bits(maxphyaddr, 62);		/* PDE */
+			rsvd_bits(maxphyaddr, 62);	/* PDE */
 		context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
 			rsvd_bits(maxphyaddr, 62);	/* PTE */
 		context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
@@ -2208,13 +2208,14 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
 		context->rsvd_bits_mask[0][2] = exb_bit_rsvd |
 			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
 		context->rsvd_bits_mask[0][1] = exb_bit_rsvd |
-			rsvd_bits(maxphyaddr, 51) | rsvd_bits(7, 8);
+			rsvd_bits(maxphyaddr, 51);
 		context->rsvd_bits_mask[0][0] = exb_bit_rsvd |
 			rsvd_bits(maxphyaddr, 51);
 		context->rsvd_bits_mask[1][3] = context->rsvd_bits_mask[0][3];
 		context->rsvd_bits_mask[1][2] = context->rsvd_bits_mask[0][2];
 		context->rsvd_bits_mask[1][1] = exb_bit_rsvd |
-			rsvd_bits(maxphyaddr, 51) | rsvd_bits(13, 20);
+			rsvd_bits(maxphyaddr, 51) |
+			rsvd_bits(13, 20);		/* large page */
 		context->rsvd_bits_mask[1][0] = ~0ull;
 		break;
 	}
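To make the masks in this patch concrete, here is a small stand-alone sketch of a range-mask helper in the spirit of rsvd_bits(s, e), which yields a mask with bits s through e set. The helper body and the has_rsvd_bits_set() name are written for this illustration and are assumptions, not quotes from mmu.c. With rsvd_bits(7, 8) in the PDE mask, any PDE carrying bit 7 or bit 8 counts as a reserved-bit violation; once that term is dropped from the mask, the same entry passes.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; sketch of a rsvd_bits()-style helper. */
static uint64_t rsvd_bits(int s, int e)
{
	assert(0 <= s && s <= e && e <= 63);
	assert(e - s + 1 < 64);	/* a full 64-bit shift would be undefined */
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* Hypothetical check: does a paging entry have any masked (reserved) bit set? */
static int has_rsvd_bits_set(uint64_t entry, uint64_t mask)
{
	return (entry & mask) != 0;
}
```

For example, rsvd_bits(7, 8) evaluates to 0x180 and rsvd_bits(13, 20), the term kept for large-page PDEs, evaluates to 0x1fe000.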
[PATCH] KVM: x86 emulator: fix call near emulation
From: Gleb Natapov <g...@redhat.com>

The length of the return address pushed onto the stack depends on the operand size, not the address size.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index ca91749..d7c9f6f 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -1792,7 +1792,6 @@ special_insn:
 		}
 		c->src.val = (unsigned long) c->eip;
 		jmp_rel(c, rel);
-		c->op_bytes = c->ad_bytes;
 		emulate_push(ctxt);
 		break;
 	}
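A toy model of the fixed behaviour, assuming a much-reduced decode context: toy_ctxt, toy_push() and toy_call_near() are hypothetical names for this sketch and do not exist in x86_emulate.c. The push width is taken from op_bytes alone, which is what remains once the assignment of the address size to op_bytes is removed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, much-reduced decode context; the real emulator has more. */
struct toy_ctxt {
	unsigned op_bytes;	/* operand size in bytes: 2, 4 or 8 */
	unsigned ad_bytes;	/* address size in bytes: 2, 4 or 8 */
	uint64_t eip;		/* address of the instruction after the call */
	uint64_t sp;		/* stack pointer, as an offset into stack[] */
	uint8_t stack[64];
};

/* Push the low op_bytes bytes of val, least significant byte first. */
static void toy_push(struct toy_ctxt *c, uint64_t val)
{
	assert(c->op_bytes == 2 || c->op_bytes == 4 || c->op_bytes == 8);
	assert(c->sp <= sizeof c->stack && c->sp >= c->op_bytes);
	c->sp -= c->op_bytes;
	for (unsigned i = 0; i < c->op_bytes; i++)
		c->stack[c->sp + i] = (uint8_t)(val >> (8 * i));
}

/* Near call: jump relative, then push the return address using op_bytes. */
static void toy_call_near(struct toy_ctxt *c, int32_t rel)
{
	uint64_t ret = c->eip;

	c->eip += (uint64_t)(int64_t)rel;
	toy_push(c, ret);	/* width comes from op_bytes, not ad_bytes */
}
```

With op_bytes = 2 and ad_bytes = 4, the call moves the stack pointer by two bytes; the address size plays no part in the push.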
Re: [PATCH -tip 0/6 V4] tracing: kprobe-based event tracer
Pekka Paalanen wrote:

Not just emulation but address diversion, i.e. modifying the operation (not the text) before executing it. Mmiotrace could do something like this:

1. a blob calls ioremap
2. mmiotrace maps the MMIO area privately
3. the blob receives a dummy map from ioremap, that will generate page fault
4. the blob accesses the dummy map and raises a page fault
5. pf handler detects the dummy map
6. mmiotrace pf handler emulates the instruction and replaces the dummy address with the real MMIO address.
7. mmiotrace records the operation and the datum
8. go to step 4, or whatever

This means mmiotrace would not have to fiddle with the page tables and page presence bits like it does now. As said, this would make mmiotrace SMP-proof, and also eliminate the die notifier (used for the instruction single stepping trap). IMO a big step from a hack to a tool.

Getting rid of the custom instruction parser in mmiotrace would be a good step in itself. Avi Kivity noted that the KVM emulator does almost everything. Does it also allow address diversion?

Operand access is by means of a callback, so yes. In kvm's use, it's used to access guest memory, so it modifies the addresses before reading or writing.

--
error compiling committee.c: too many arguments to function
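The callback idea from the reply can be sketched as follows. This is an illustration only: toy_ops, divert and toy_emulate_store32() are hypothetical names and the layout is not the KVM emulator's real ops structure. The "emulator" never touches memory itself, so the callback is free to redirect a dummy address onto the real backing store and to record the access on the way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback table; the real emulator's ops structure differs. */
struct toy_ops {
	int (*read)(void *priv, uint64_t addr, void *val, size_t len);
	int (*write)(void *priv, uint64_t addr, const void *val, size_t len);
};

/* Tracer state: a dummy window that is diverted onto real backing memory. */
struct divert {
	uint64_t dummy_base;	/* address the traced code believes it uses */
	size_t len;		/* size of the window in bytes */
	uint8_t *real;		/* where the access is actually carried out */
	unsigned writes_seen;	/* stand-in for "record the operation" */
};

/* Return 0 if [addr, addr + len) lies inside the dummy window, else -1. */
static int divert_check(const struct divert *d, uint64_t addr, size_t len)
{
	if (addr < d->dummy_base || len > d->len ||
	    addr - d->dummy_base > d->len - len)
		return -1;
	return 0;
}

static int divert_read(void *priv, uint64_t addr, void *val, size_t len)
{
	struct divert *d = priv;

	if (divert_check(d, addr, len))
		return -1;
	memcpy(val, d->real + (addr - d->dummy_base), len);
	return 0;
}

static int divert_write(void *priv, uint64_t addr, const void *val, size_t len)
{
	struct divert *d = priv;

	if (divert_check(d, addr, len))
		return -1;
	memcpy(d->real + (addr - d->dummy_base), val, len);
	d->writes_seen++;
	return 0;
}

/* "Emulate" a 32-bit store: the emulator only ever talks to the callbacks. */
static int toy_emulate_store32(const struct toy_ops *ops, void *priv,
			       uint64_t addr, uint32_t val)
{
	assert(ops != NULL && ops->write != NULL);
	return ops->write(priv, addr, &val, sizeof val);
}
```

A tracer along the lines of steps 5 to 7 above would plug its own read and write callbacks into such a table; swapping the table is all it takes to change where operands really live.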
Re: [PATCH 4/4] add ksm kernel shared memory driver.
Andrey Panin wrote:
On 094, 04 04, 2009 at 05:35:22PM +0300, Izik Eidus wrote:

SNIP

+static inline u32 calc_checksum(struct page *page)
+{
+	u32 checksum;
+	void *addr = kmap_atomic(page, KM_USER0);
+	checksum = jhash(addr, PAGE_SIZE, 17);

Why is jhash2() not used here? It's faster and leads to smaller code size.

Because I didn't know; I will check that and change it. Thanks.
(We should really use the in-CPU CRC on Intel Nehalem, and the dirty bit on the rest of the architectures...)

+	kunmap_atomic(addr, KM_USER0);
+	return checksum;
+}
Re: VMExits on Software Interrupts
Rastogi, Shaurya wrote:

Hi,
Is there any way to generate a VMExit for software interrupts? I am interested in causing a VMExit when the Int 80 instruction is executed in the guest VM. Is it possible to do so? If so, how?

Neither vmx nor svm supports trapping on software interrupts. You might be able to track changes to the IDT and install a hardware breakpoint on the int 80 handler, but that's quite difficult and hacky.

--
error compiling committee.c: too many arguments to function
Re: KVM performance
On Friday 03 April 2009 13:32:50 you wrote:

Hello,

as I want to switch from XEN to KVM I've made some performance tests to see if KVM is as performant as XEN. But tests with a VMU that receives a streamed video, adds a small logo to the video and streams it to a client have shown that XEN performs much better than KVM. In XEN the vlc process (videolan client, used to receive, process and send the video) within the vmu has a CPU load of 33.8 %, whereas in KVM the vlc process has a CPU load of 99.9 %. I'm not sure why; does anybody know some settings to improve the KVM performance?

Thank you.
Regards, Stefanie.

Used hardware and settings:
In the tests I've used the same host hardware for XEN and KVM:
- Dual Core AMD 2.2 GHz, 8 GB RAM
- Tested OSes for KVM Host: Fedora 10, 2.6.27.5-117.fc10.x86_64 with kvm version 10.fc10 version 74; also tested in January: compiled kernel with kvm-83
- KVM Guest settings:
  OS: Fedora 9 2.6.25-14.fc9.x86_64 (i386 also tested)
  RAM: 256 MB (same for XEN vmu)
  CPU: 1 core with 2.2 GHz (same for XEN vmu)
  tested nic models: rtl8139, e1000, virtio

Tested scenario: VMU receives a streamed video, adds a logo (watermark) to the video stream and then streams it to a client

Results:
XEN:
  Host cpu load (virt-manager): 23%
  VMU cpu load (virt-manager): 18%
  VLC process within VMU (top): 33.8%
KVM (no virt-manager cpu load, as I started the vmu with the kvm command):
  Host cpu load: 52%
  qemu-kvm process (top): 77-100%
  VLC process within vmu (top): 80-99.9%

KVM command to start vmu:
/usr/bin/qemu-kvm -boot c -hda /images/vmu01.raw -m 256 -net nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio -net tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown -vnc 127.0.0.1:1 -k de --daemonize

Hi Stefanie,

does vlc perform operations on disk (e.g. caching, logging, ...)? If it does, you can also use virtio for the disk. Just change

-hda /images/vmu01.raw

to

-drive file=/images/vmu01.raw,if=virtio,boot=on

Regards
Hauke

Alcatel-Lucent Deutschland AG
Bell Labs Germany
Service Infrastructure, ZFZ-SI
Stefanie Braun
Phone: +49.711.821-34865
Fax: +49.711.821-32453
Postal address: Alcatel-Lucent Deutschland AG, Lorenzstrasse 10, D-70435 STUTTGART
Mail: stefanie.br...@alcatel-lucent.de
Alcatel-Lucent Deutschland AG
Registered office: Stuttgart - Stuttgart District Court (Amtsgericht Stuttgart) HRB 4026
Chairman of the Supervisory Board: Michael Oppenhoff
Management Board: Alf Henryk Wulf (Chairman), Dr. Rainer Fechner

--
hauke hoffmann service and electronic systems
Moristeig 60, D-23556 Lübeck
Phone: +49 (0) 451 8896462
Fax: +49 (0) 451 8896461
Mobile: +49 (0) 170 7580491
E-Mail: off...@hauke-hoffmann.net
PGP public key: www.hauke-hoffmann.net/static/pgp/kontakt.asc
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
Hi Izik,

Is there some user documentation available? (apart from RTFS? :)) I've compiled a kernel with v2 of your patches, loaded the ksm module and did echo 1 > /proc/sys/kernel/mm/ksm/run, but I think it didn't do anything; at least no pages were collected. Could you advise me a bit?

Thanks a lot in advance... I can't wait to try it on our hosts running 50-60 KVMs :)

BR
nik

On Sat, Apr 04, 2009 at 05:35:18PM +0300, Izik Eidus wrote:

From v1 to v2:

1) Fixed a security issue found by Chris Wright: KSM was checking whether a page is a shared page by testing !PageAnon. Because KSM scans only anonymous memory, every !PageAnon page inside the KSM data structures is a shared page. However, there might be a case where VM_SHARED is used and do_wp_page(), instead of copying the page into a new anonymous page, would reuse the page. This was fixed by adding a check of the dirty bit of the virtual addresses pointing into the shared page. I could not find any VM code that would clear the dirty bit from this virtual address (due to the fact that we allocate the page using page_alloc() - kernel allocated pages), ~but I still want confirmation about this from the VM guys - thanks.~

2) Moved to sysfs to control KSM: it was requested as a better way to control the KSM scanning thread than ioctls. The sysfs API:

dir: /sys/kernel/mm/ksm/
kernel_pages_allocated - how many kernel pages KSM has allocated; these pages are not swappable, and each such page is used by KSM to share pages with identical content
pages_shared - how many pages were shared by KSM
run - set to 1 when you want KSM to run, 0 when not
max_kernel_pages - the maximum number of kernel pages to be allocated by KSM; set 0 for unlimited
pages_to_scan - how many pages to scan before KSM sleeps
sleep - how many usecs KSM sleeps

3) Added a sysfs parameter to control the maximum number of kernel pages to be allocated by KSM.

4) Added statistics about how many pages are really shared.

One issue still to be discussed: there was a suggestion to use madvise(SHAREABLE) instead of ioctls to register memory that needs to be scanned by KSM. Such a change is outside the area of ksm.c and would require adding a new madvise API and changing some parts of the VM and the kernel code, so the first thing to do is to work out whether we really want this.

I don't know of any other open issues.

Thanks.

This is from the first post:
(The kvm part, together with the kvm-userspace part, was posted with V1 about a week ago; whoever wants to test KSM may download the patch from the lkml archive.)

KSM is a Linux driver that allows dynamically sharing identical memory pages between one or more processes. Unlike traditional page sharing, which is set up when the memory is allocated, KSM does it dynamically after the memory was created. Memory is periodically scanned; identical pages are identified and merged. The sharing is unnoticeable by the processes that use this memory. (The shared pages are marked read-only, and in case of a write do_wp_page() takes care of creating a new copy of the page.)

To find identical pages KSM uses an algorithm that is split into three primary levels:

1) KSM starts scanning the memory and calculates a checksum for each page that is registered to be scanned. (In the first round of scanning, KSM only calculates this checksum for all the pages.)

2) KSM goes over the whole memory again and recalculates the checksum of the pages. Pages that are found to have the same checksum value as in the previous round are considered pages that most likely won't change. KSM inserts these pages into an RB-tree sorted by page content, called the unstable tree. It is called unstable because the page contents might change while the pages are still inside the tree, and the tree would then become corrupted. Due to this problem KSM takes two more steps in addition to the checksum calculation:
a) KSM throws away and recreates the entire unstable tree on each round of memory scanning, so if there is corruption, it is fixed when the tree is rebuilt.
b) KSM uses an RB-tree, whose balancing is driven by the node color and not by the content, so even if a page gets corrupted, searching the tree still takes the same amount of time.

3) In addition to the unstable tree, KSM holds another tree called the stable tree. This is an RB-tree sorted by page content whose pages are all write-protected, and therefore it can't get corrupted. Each time KSM finds two identical pages using the unstable tree, it creates a new write-protected shared page, and this page is inserted into the stable tree and kept there. The stable tree, unlike the
Re: CPU Limits on KVM?
Hi Francisco,

I've been trying to limit the cpu usage of a VM using cgroups, so I can share my experience with you. However, from monitoring the kvm process it looks like it doesn't enforce a hard limit on cpu usage %. Hopefully someone can shed light on the issue. I'm interested in limiting the I/O bandwidth of network and disk as well. (dm-ioband - any experience with that? other suggestions?)

1. mount -t cgroup none /dev/cgroup -o cpu,memory
   This creates the cgroup file system API on /dev/cgroup. The top-level hierarchy node contains all the processes by default.
2. cd /dev/cgroup
3. mkdir VM1
4. mkdir VM2
5. echo {kvm1 pid} > VM1/tasks
6. echo {kvm2 pid} > VM2/tasks
   Moves the VMs' processes from the top node to a child node, to enable atomic control.
7. Edit VM1/cpu.shares and VM2/cpu.shares to change the cpu proportion (default is 1024 shares). I've used 128 for VM1 and 1024 for VM2. To my understanding, that should yield a proportion of 128/(1024+128), about 11% of the cpu, as the limit for VM1.

Each of the VMs was allocated 2 vcpus; the host has 4 cores. (I've been trying the same with 1 vcpu each; the results weren't different.)

Monitoring results: I used top, atop and the system monitoring GUI on the host and on the guest for monitoring. I was able to load VM1 to use more than 50%... (Monitoring screenshot at http://tinyurl.com/csx3v7 ) On the left, guest monitoring (the time graph shows that there are peaks of more than 50%); on the right, host monitoring (kvm pid 32620, the VM1 pid, with 63% load).

Regarding the references you asked for, these are the references I've found and worked with:
www.mjmwired.net/kernel/Documentation/cgroups.txt
http://tinyurl.com/dcnoav

- Oshrit
oshr...@il.ibm.com

From: Brian Jackson i...@theiggy.com
To: Francisco Mazzeo francisco.maz...@gmail.com, kvm@vger.kernel.org
Date: 03/04/2009 02:32
Subject: Re: CPU Limits on KVM?

I haven't ever really used cgroups. I always figured a fair host scheduler is good enough to handle spreading load. So I don't know if it will fit exactly what you need. I don't think so. I also don't know of any other options.

I will say, if I gave 4 VMs a single cpu each on a 4 core host, I would expect the host to be fully loaded. I wouldn't see any reason for the host not to be fully loaded. That is, after all, one of the key points of virtualization: better utilization of hardware.

On Thursday 02 April 2009 17:33:07 Francisco Mazzeo wrote:

Hello Brian,

Thanks for the reply. Is there a wiki about cgroups and how to set them up?

Also, I tried just for kicks to see what would happen if I create 4 virtual Windows machines, run prime95 (a tool that does iterations like superpi to stress test memory/cpu) on all of them and assign only ONE core to each of them. The server node did not crash and you are right; however, I was hoping for the server load to stay below 50%, as I only gave one single core to each KVM VE. Instead it seems like KVM let each VE get one slice of each of the 4 cores of my CPU, which did not accomplish what I wanted.

Is cgroups the only choice available?

--
Francisco

On Thu, Apr 2, 2009 at 3:29 PM, Brian Jackson i...@theiggy.com wrote:

There's CPU cgroups. It doesn't have exactly the ability you are after, but it is able to limit process(es) CPU usage.

Maxing out CPU usage won't crash your server. The kernel will arbitrate sharing the CPU evenly among processes/VMs.

--Brian Jackson

On Thursday 02 April 2009 16:41:10 Francisco Mazzeo wrote:

Hello,

I am a new user to KVM and was wondering if there was any way to limit a VE from using up all the resources of the processor. Right now I have a quad core 2.5 GHz, I have a KVM VE (running Windows Server 2003) and assigned 4 CPUs to it. If I max out the load for that VE, the entire host node load will be 100%, which may crash it if I hosted more than one single VE.

OpenVZ has the cpulimit command; does KVM have something similar, or any way that I can implement a limit on a single VE? Say I want to only give a max of 500 MHz per core, to total 2 GHz to the VE.

Thanks
Francisco
www.navigatoris.net / www.serversoutlet.com
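The cpu.shares arithmetic from earlier in this thread can be checked with a few lines of C. share_permille() is a hypothetical helper written for this note, not part of any cgroup tooling. It computes the fraction of CPU time a group is entitled to when every sibling group is busy. Note that cpu.shares is a relative weight rather than a hard cap, so a group can exceed this fraction whenever its siblings leave CPU time unused, which would be consistent with the observation above that VM1 went past 50%.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Entitlement of group idx in parts per thousand, assuming all n sibling
 * groups are runnable at the same time. shares[] holds each group's
 * cpu.shares value.
 */
static unsigned share_permille(const unsigned *shares, size_t n, size_t idx)
{
	unsigned long total = 0;

	assert(shares != NULL && idx < n);
	for (size_t i = 0; i < n; i++)
		total += shares[i];
	assert(total > 0);
	return (unsigned)((unsigned long)shares[idx] * 1000UL / total);
}
```

With shares of 128 and 1024, the first group is entitled to 128/1152 of the CPU under full contention, roughly 11%, and the second group to roughly 89%.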
Re: [PATCH 0/4] ksm - dynamic page sharing driver for linux v2
On Mon, Apr 06, 2009 at 05:04:49PM +1000, Nick Piggin wrote:

They should use a shared memory segment, or MAP_ANONYMOUS|MAP_SHARED etc. Presumably they will probably want to control it to interleave it over all numa nodes and use hugepages for it. It would be very little work.

I thought it's the intermediate result of the computations that leads to lots of equal data too, in which case ksm is the only way to share it all.
Re: strange guest slowness after some time
Tomasz Chmielewski wrote:

As I mentioned, it was using virtio net. Guests running with e1000 (and virtio_blk) don't have this problem.

Also, virtio_console seems to be affected by this slowness issue.

Am I correct to think that if:

* on the guest, lsmod outputs:
  virtio_console 6828 0 [permanent]
* on the guest, /etc/inittab contains:
  6:2345:respawn:/sbin/mingetty ttyS0
* on the host, I start the guest with the parameter:
  -serial unix:/var/run/qemu-server/103.serial,server,nowait

then the guest's ttyS0 console is virtio_console?

If my thinking is correct, then I have a slow serial console on some of the guests using the virtio_pci and virtio_console drivers. By slow serial console I mean that any character typed shows up after a second or so.

It can also be cured as with virtio_net - just run:

dd if=/dev/vda of=/dev/null

and the console reacts normally. Stop dd, and the console is slow again.

I have this issue on two guests with e1000 network, which use virtio_blk (and virtio_console...). I never saw this issue with guests which don't use virtio.

--
Tomasz Chmielewski
http://wpkg.org
Fwd: KVM performance
-----Original Message-----
From: BRAUN, Stefanie
Sent: Monday, 6 April 2009 18:25
To: 'Avi Kivity'
Subject: Re: KVM performance

-----Original Message-----
From: Avi Kivity [mailto:a...@redhat.com]
Sent: Monday, 6 April 2009 13:45
To: BRAUN, Stefanie
Cc: kvm@vger.kernel.org
Subject: Re: KVM performance

BRAUN, Stefanie wrote:

Hello,

as I want to switch from XEN to KVM I've made some performance tests to see if KVM is as performant as XEN. But tests with a VMU that receives a streamed video, adds a small logo to the video and streams it to a client have shown that XEN performs much better than KVM. In XEN the vlc process (videolan client, used to receive, process and send the video) within the vmu has a CPU load of 33.8 %, whereas in KVM the vlc process has a CPU load of 99.9 %. I'm not sure why; does anybody know some settings to improve the KVM performance?

Is this a tcp test? Can you test receive and transmit separately?

Hello,

it's a transcoder test, but without transcoding between video formats; the vmu just adds a logo (a watermark) to the video. The vmu performed several actions at the same time:
- receiving a streamed video via udp
- adding a logo to the video
- sending the streamed video via udp

But I think I can split the test into the following subtests and provide further performance values:

Sub test 1, receive:
- Receiving the video from the network (udp) and saving it locally

Sub test 2, transmit:
- Reading the video from a local resource and sending it via the network

Sub test 3, process:
- Reading the video from a local resource, adding the logo to the video stream and saving it again locally

--
error compiling committee.c: too many arguments to function
Re: KVM performance
-----Original Message-----
From: Hauke Hoffmann [mailto:kont...@hauke-hoffmann.net]
Sent: Monday, 6 April 2009 14:13
To: kvm@vger.kernel.org
Cc: BRAUN, Stefanie
Subject: Re: KVM performance

On Friday 03 April 2009 13:32:50 you wrote:

Hello, as I want to switch from XEN to KVM, I've made some performance tests to see if KVM is as performant as XEN. But tests with a VMU that receives a streamed video, adds a small logo to the video and streams it to a client have shown that XEN performs much better than KVM. In XEN the vlc process (videolan client used to receive, process and send the video) within the vmu has a cpuload of 33.8 %, whereas in KVM the vlc process has a cpuload of 99.9 %. I'm not sure why; does anybody know some settings to improve the KVM performance?

Thank you.
Regards, Stefanie.

Used hardware and settings:

In the tests I've used the same host hardware for XEN and KVM:
- Dual Core AMD 2.2 GHz, 8 GB RAM
- Tested OSes for KVM Host: Fedora 10, 2.6.27.5-117.fc10.x86_64 with kvm version 10.fc10 version 74; also tested in January: compiled kernel with kvm-83
- KVM Guest settings:
  OS: Fedora 9 2.6.25-14.fc9.x86_64 (i386 also tested)
  RAM: 256 MB (same for XEN vmu)
  CPU: 1 Core with 2.2 GHz (same for XEN vmu)
  Tested nic models: rtl8139, e1000, virtio

Tested scenario: VMU receives a streamed video, adds a logo (watermark) to the video stream and then streams it to a client.

Results:

XEN:
Host cpu load (virt-manager): 23%
VMU cpu load (virt-manager): 18%
VLC process within VMU (top): 33.8%

KVM (no virt-manager cpu load, as I started the vmu with the kvm command):
Host cpu load: 52%
qemu-kvm process (top): 77-100%
VLC process within vmu (top): 80-99.9%

KVM command to start vmu:

/usr/bin/qemu-kvm -boot c -hda /images/vmu01.raw -m 256 -net nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio -net tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown -vnc 127.0.0.1:1 -k de --daemonize

Hi Stefanie,

does vlc perform operations on disk (e.g. caching, logging, ...)? If it caches, you can also use virtio for the disk. Just change

-hda /images/vmu01.raw

to

-drive file=/images/vmu01.raw,if=virtio,boot=on

Regards
Hauke

Hi Hauke,

Thanks for your reply. The vlc does not perform excessive operations on disk. Even so, I've added virtio for the disk to the vmu setup. But the qemu-kvm process in the host and the vlc process within the vmu still consume up to 100%.

Regards, Stefanie

Alcatel-Lucent Deutschland AG
Bell Labs Germany
Service Infrastructure, ZFZ-SI
Stefanie Braun
Phone: +49.711.821-34865
Fax: +49.711.821-32453
Postal address: Alcatel-Lucent Deutschland AG, Lorenzstrasse 10, D-70435 STUTTGART
Mail: stefanie.br...@alcatel-lucent.de

Alcatel-Lucent Deutschland AG
Registered office: Stuttgart - Amtsgericht Stuttgart HRB 4026
Chairman of the Supervisory Board: Michael Oppenhoff
Management Board: Alf Henryk Wulf (Chairman), Dr. Rainer Fechner

--
hauke hoffmann service and electronic systems
Moristeig 60, D-23556 Lübeck
Phone: +49 (0) 451 8896462
Fax: +49 (0) 451 8896461
Mobile: +49 (0) 170 7580491
E-Mail: off...@hauke-hoffmann.net
PGP public key: www.hauke-hoffmann.net/static/pgp/kontakt.asc
Re: [PATCH] KVM: Qemu: Flush i-cache after ide-dma operation in IA64
On Thu, 2009-04-02 at 10:01 +0800, Zhang, Yang wrote:

The data from dma will include instructions. In order to execute the right instructions, we should flush the i-cache to ensure that the data can be seen by the cpu.

Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com
Signed-off-by: Yang Zhang yang.zh...@intel.com
---
diff --git a/qemu/cache-utils.h b/qemu/cache-utils.h
index b45fde4..5e11d12 100644
--- a/qemu/cache-utils.h
+++ b/qemu/cache-utils.h
@@ -33,8 +33,22 @@ static inline void flush_icache_range(unsigned long start, unsigned long stop)
     asm volatile ("sync" : : : "memory");
     asm volatile ("isync" : : : "memory");
 }
+#define qemu_sync_idcache flush_icache_range
+#else
+#ifdef __ia64__
+static inline void qemu_sync_idcache(unsigned long start, unsigned long stop)
+{
+    while (start < stop) {
+        asm volatile ("fc %0" :: "r"(start));
+        start += 32;
+    }
+    asm volatile (";;sync.i;;srlz.i;;");
+}
 #else
+static inline void qemu_sync_idcache(unsigned long start, unsigned long stop)
+#endif
+
 #define qemu_cache_utils_init(envp) do { (void) (envp); } while (0)
 #endif

You already have flush_icache_range() in qemu/target-ia64/fake-exec.c, so this is redundant. Moving that to cache-utils.h might make sense, but this should be discussed on qemu-devel.

Also, flush_icache_range() is already called from cpu_physical_memory_rw(). It would be helpful to include a comment in this commit explaining why this path is different. (I can see that it is, but only because I went hunting myself.)
diff --git a/qemu/cutils.c b/qemu/cutils.c
index 5b36cc6..7b57173 100644
--- a/qemu/cutils.c
+++ b/qemu/cutils.c
@@ -23,6 +23,7 @@
  */
 #include "qemu-common.h"
 #include "host-utils.h"
+#include "cache-utils.h"
 #include <assert.h>

 void pstrcpy(char *buf, int buf_size, const char *str)
@@ -215,6 +216,8 @@ void qemu_iovec_from_buffer(QEMUIOVector *qiov, const void *buf, size_t count)
         if (copy > qiov->iov[i].iov_len)
             copy = qiov->iov[i].iov_len;
         memcpy(qiov->iov[i].iov_base, p, copy);
+        qemu_sync_idcache((unsigned long)qiov->iov[i].iov_base,
+                    (unsigned long)(qiov->iov[i].iov_base + copy));
         p += copy;
         count -= copy;
     }

This is way too generic a call for this location. Other architectures also need to synchronize L1 caches sometimes, but they don't need to do it here. You need to comment and guard this call better (probably using some combination of kvm_enabled() and ifdefs).

--
Hollis Blanchard
IBM Linux Technology Center
Re: KVM Port
On Sun, 2009-04-05 at 00:11 +0530, kvm port wrote:

ok, so these are a few steps to begin
(a) add a QEMUMachine for my h/w in qemu

As an alternative, you could start exercising KVM kernel code using kvm-userspace/test before qemu is ready.

(b) Add arch support in kvm

I have a few questions
(a) qemu starts in user space, how would I configure my linux. Should the linux run in Hypervisor state and the apps run in user state, and nothing runs in guest state [ there are 3 states in my processor]

Are there only 3, or are there two independent dimensions (hypervisor/guest, user/supervisor)? If there are only 3, you'll need to figure out how to isolate guest kernel and guest userspace from each other.

(b) qemu starts the VM and somehow ( i dont know yet, how?) , starts my code in processor guest state

Why are you asking us? You are the processor expert... :)

Qemu calls into KVM via an ioctl, and processor-specific KVM code (that's you) somehow jumps into guest mode.

--
Hollis Blanchard
IBM Linux Technology Center
Re: kvm-84 + virtio Ubuntu Hardy guests
On Mon, 2009-04-06 at 18:43 +0100, Mark McLoughlin wrote:

The problem here is that 2.6.24/5 vintage guests are saying they support something they don't. See this for further details: http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00574.html

Agreed, understood. And those kernels should be patched.

However, it's a bit of a chicken/egg problem... One can't even boot those guests (with virtio) to update the kernel, as it panics on boot (without the kvm hack).

:-Dustin
Re: kvm-84 + virtio Ubuntu Hardy guests
On Mon, 2009-04-06 at 09:55 -0700, Dustin Kirkland wrote:

On Tue, Mar 31, 2009 at 5:28 PM, Dustin Kirkland kirkl...@canonical.com wrote:

I'm receiving a heavy volume of Ubuntu Jaunty Beta users reporting that Jaunty hosts running kvm-84 (userspace and kernel) are not able to boot previously-working Hardy guests (2.6.24 kernel) if virtio networking is enabled [1]. Users report that if e1000 is used instead, the guest is able to boot (with degraded network performance, obviously). Users are also reporting that this was not a problem when kvm-82 was used in Jaunty (though we also merged libvirt 0.5.1 up to 0.6.0 in roughly the same timeframe).

...

[1] https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/331128

Howdy-

Just a follow-up... Anthony was able to confirm this issue, and create a patch for KVM, which we're carrying in Ubuntu. It's a bit of a special-case hack, but I'm dropping it here for the sake of completeness.

Basically, Hardy guests do not have working GSO (generic segmentation offload) support. Some changes in kvm/libvirt appear to be exposing this, and breaking some guests when running virtio. This patch from Anthony basically disables this support in KVM userspace (until we have a better solution for auto-detecting GSO support or lack thereof).

The problem here is that 2.6.24/5 vintage guests are saying they support something they don't. See this for further details: http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00574.html

Cheers,
Mark.
Re: [PATCH] KVM: Make kvm header compile under g++.
Hi Avi,

You said that you'd be willing to include this. I don't want to pester or anything, but I would like it to not fall into the abyss. Would you like me to file it as a bug and assign it to you? Are there any changes that you'd like? The one change you mentioned was to pull struct kvm_io outside of struct kvm_run. I mentioned that a grep shows no usage of kvm_io anywhere, so I didn't do that.

Nate

On Fri, Mar 27, 2009 at 9:53 PM, nathan binkert n...@binkert.org wrote:

Two things needed fixing:
1) g++ does not allow a named structure type within an anonymous union, and
2) Avoid name clash between two padding fields within the same struct by giving them different names, as is done elsewhere in the header.

Signed-off-by: Nathan Binkert n...@binkert.org
---
 include/linux/kvm.h | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ee755e2..2e3a734 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -119,7 +119,7 @@ struct kvm_run {
             __u32 error_code;
         } ex;
         /* KVM_EXIT_IO */
-        struct kvm_io {
+        struct {
 #define KVM_EXIT_IO_IN  0
 #define KVM_EXIT_IO_OUT 1
             __u8 direction;
@@ -224,10 +224,10 @@ struct kvm_interrupt {
 /* for KVM_GET_DIRTY_LOG */
 struct kvm_dirty_log {
     __u32 slot;
-    __u32 padding;
+    __u32 padding1;
     union {
         void __user *dirty_bitmap; /* one bit per page */
-        __u64 padding;
+        __u64 padding2;
     };
 };
--
1.6.1.2
Re: [PATCH -tip 2/6 V4.2] x86: add arch-dep register and stack access API to ptrace
I have no comments about the patch, but the subject line is misleading because this has nothing to do with ptrace. It's in asm/ptrace.h, but the fact that it is struct pt_regs does not really have anything to do with ptrace.

Thanks,
Roland
[PATCH kvm-autotest] new test, saves and reloads a guest, from Red Hat QE
---
 client/tests/kvm_runtest_2/kvm_runtest_2.py     |    1 +
 client/tests/kvm_runtest_2/kvm_tests.cfg.sample |    6 ++
 client/tests/kvm_runtest_2/kvm_tests.py         |   97 +++
 3 files changed, 104 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm_runtest_2/kvm_runtest_2.py b/client/tests/kvm_runtest_2/kvm_runtest_2.py
index c53877f..c83dce9 100644
--- a/client/tests/kvm_runtest_2/kvm_runtest_2.py
+++ b/client/tests/kvm_runtest_2/kvm_runtest_2.py
@@ -34,6 +34,7 @@ class kvm_runtest_2(test.test):
                 "migration":    test_routine("kvm_tests", "run_migration"),
                 "yum_update":   test_routine("kvm_tests", "run_yum_update"),
                 "autotest":     test_routine("kvm_tests", "run_autotest"),
+                "saveload":     test_routine("kvm_tests", "run_save_load"),
                 "kvm_install":  test_routine("kvm_install", "run_kvm_install"),
                 "linux_s3":     test_routine("kvm_tests", "run_linux_s3"),
                 }
diff --git a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
index 5619fa8..869398a 100644
--- a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
+++ b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
@@ -48,6 +48,12 @@ variants:
         reboot = yes
         extra_params += -snapshot
         kill_vm_on_error = yes
+
+    - saveload:     install
+        type = saveload
+        pre_command = if [ -f kvm_save_test_file ] ; then rm -rf kvm_save_test_file; fi
+        do_command = touch kvm_save_test_file
+        verify_command = [ -f kvm_save_test_file ]

     - migrate:      install setup
         type = migration
diff --git a/client/tests/kvm_runtest_2/kvm_tests.py b/client/tests/kvm_runtest_2/kvm_tests.py
index 0d19af6..463cc84 100644
--- a/client/tests/kvm_runtest_2/kvm_tests.py
+++ b/client/tests/kvm_runtest_2/kvm_tests.py
@@ -449,3 +449,100 @@ def run_linux_s3(test, params, env):
     kvm_log.info("VM resumed after S3")

     session.close()
+
+def run_save_load(test, params, env):
+    # state testing, save and load vm state
+    vm = kvm_utils.env_get_vm(env, params.get("main_vm"))
+    if not vm:
+        message = "VM object not found in environment"
+        kvm_log.error(message)
+        raise error.TestError, message
+    if not vm.is_alive():
+        message = "VM seems to be dead; Test requires a living VM"
+        kvm_log.error(message)
+        raise error.TestError, message
+
+    kvm_log.info("Waiting for guest to be up...")
+
+    pxssh = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2)
+    if not pxssh:
+        message = "Could not log into guest"
+        kvm_log.error(message)
+        raise error.TestFail, message
+
+    pre_command=params.get("pre_command")
+    do_command=params.get("do_command")
+    verify_command=params.get("verify_command")
+
+    # do preparation
+    kvm_log.info("Logged in")
+    kvm_log.info("Doing preparation ... %s" % pre_command)
+    if not pxssh.send_command(pre_command):
+        message = "%s failed" % pre_command
+        kvm_log.error(message)
+        raise error.TestFail, message
+    pxssh.close()
+    kvm_log.info("Logged out")
+
+    # save state
+    kvm_log.info("Saving VM state ...")
+    vm.send_monitor_cmd('savevm test1')
+    s, o = vm.send_monitor_cmd('info snapshots')
+    if not 'test1' in o:
+        message = "Saving VM state error %s" % o
+        kvm_log.error(message)
+        raise error.TestFail, message
+    kvm_log.info(o)
+    kvm_log.info("VM state saved")
+
+    kvm_log.info("Destroying VM ...")
+    vm.destroy();
+    kvm_log.info("VM Destroyed")
+
+    kvm_log.info("Booting VM...")
+    if not vm.create():
+        message = "Could not recreate VM instance"
+        kvm_log.error(message)
+        raise error.TestError, message
+    kvm_log.info("VM recreated")
+
+    # do modification
+    pxssh = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2)
+    if not pxssh:
+        message = "Could not log into guest"
+        kvm_log.error(message)
+        raise error.TestFail, message
+    kvm_log.info("Logged in")
+    kvm_log.info("Doing modification %s" % do_command)
+    if not pxssh.send_command(do_command):
+        message = "%s failed" % do_command
+        kvm_log.error(message)
+        raise error.TestFail, message
+    pxssh.close()
+    kvm_log.info("Logged out")
+
+    # load state
+    kvm_log.info("Loading VM state ...")
+    s, o = vm.send_monitor_cmd('loadvm test1')
+    kvm_log.info(o)
+    if "Error" in o:
+        message = "VM state load failed: %s" % o
+        kvm_log.error(message)
+        raise error.TestFail, message
+    kvm_log.info("VM state loaded")
+
+    # verify the status
+    pxssh = kvm_utils.wait_for(vm.ssh_login, 240, 0, 2)
+    if not pxssh:
+        message = "Could not log into guest"
+        kvm_log.error(message)
+        raise error.TestFail, message
+    kvm_log.info("Verifying ... %s" % verify_command)
+    if
Re: [PATCH 1/2] qemu: Allow SMBIOS entries to be loaded and provided to the VM BIOS
Alex Williamson wrote:

Create a new -smbios option that takes binary SMBIOS entries to provide to the VM BIOS. The binary can be easily generated using something like:

dmidecode -t 1 -u | grep $'^\t\t[^"]' | xargs -n1 | \
    perl -lne 'printf "%c", hex($_)' > smbios_type_1.bin

For some inventory tools, this makes the VM report the system information for the host. One entry per binary file; multiple files can be chained together as:

-smbios file1,file2,...

or specified independently:

-smbios file1 -smbios file2

Signed-off-by: Alex Williamson alex.william...@hp.com

Hi Alex,

I know we have to support blobs because of OEM specific smbios entries, but there are a number of common ones that it would probably be good to specify in a less user-unfriendly way. What do you think?

Anyway, comments below.

diff --git a/hw/acpi.c b/hw/acpi.c
index 52f50a0..0bd93bf 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -915,3 +915,69 @@ out:
     }
     return -1;
 }
+
+char *smbios_entries;
+size_t smbios_entries_len;

I think an accessor would be better than making these variables global.

+int smbios_entry_add(const char *t)
+{

acpi.c is hardware emulation, I'd rather see the command line parsing done somewhere else (like vl.c).

+    struct stat s;
+    char file[1024], *p, *f, *n;
+    int fd, r;
+    size_t len, off;
+
+    f = (char *)t;
+    do {
+        n = strchr(f, ',');
+        if (n) {
+            strncpy(file, f, (n - f));
+            file[n - f] = '\0';
+            f = n + 1;
+        } else {
+            strcpy(file, f);
+            f += strlen(file);
+        }

I'm happy to just require multiple -smbios options. I dislike overloading with ','s even though we do it a lot in QEMU.

+        fd = open(file, O_RDONLY);
+        if (fd < 0)
+            return -1;
+
+        if (fstat(fd, &s) < 0) {
+            close(fd);
+            return -1;
+        }

May want to look at load_image/get_image_size.

--
Regards,
Anthony Liguori
Re: [PATCH 1/2] qemu: Allow SMBIOS entries to be loaded and provided to the VM BIOS
Hi Anthony,

On Mon, 2009-04-06 at 14:50 -0500, Anthony Liguori wrote:

Alex Williamson wrote:

I know we have to support blobs because of OEM specific smbios entries, but there are a number of common ones that it would probably be good to specify in a less user-unfriendly way. What do you think?

Yeah, I'll admit this is a pretty unfriendly interface. I get from your comment on the other part of the patch that you'd prefer not to get into the mess of having both binary blobs and command line switches augmenting the blobs. This seems reasonable, but also means that we need a way to fully define the tables we generate from the command line. For a type 0 entry, that might mean the following set of switches:

-bios-version, -bios-date, -bios-characteristics, -bios-release

And for a type 1:

-system-manufacturer, -system-name, -system-version, -system-serial, -system-sku, -system-family

type 3:

-chassis-manufacturer, -chassis-type, -chassis-version, -chassis-serial, -chassis-asset, -chassis-oem

I'm sure I'm missing some, plus we might want to allow the memory and processor entries to have some fields changed. Do we want to add that many switches and means to access them from the rombios?

Anyway, comments below.

diff --git a/hw/acpi.c b/hw/acpi.c
index 52f50a0..0bd93bf 100644
--- a/hw/acpi.c
+++ b/hw/acpi.c
@@ -915,3 +915,69 @@ out:
     }
     return -1;
 }
+
+char *smbios_entries;
+size_t smbios_entries_len;

I think an accessor would be better than making these variables global.

Ok

+int smbios_entry_add(const char *t)
+{

acpi.c is hardware emulation, I'd rather see the command line parsing done somewhere else (like vl.c).

Ok. acpi.c was just a convenient place to not bother architectures that don't care about smbios.

+    struct stat s;
+    char file[1024], *p, *f, *n;
+    int fd, r;
+    size_t len, off;
+
+    f = (char *)t;
+    do {
+        n = strchr(f, ',');
+        if (n) {
+            strncpy(file, f, (n - f));
+            file[n - f] = '\0';
+            f = n + 1;
+        } else {
+            strcpy(file, f);
+            f += strlen(file);
+        }

I'm happy to just require multiple -smbios options. I dislike overloading with ','s even though we do it a lot in QEMU.

Yup, I didn't have it initially, but added it because I thought someone might complain that other qemu options allow it.

+        fd = open(file, O_RDONLY);
+        if (fd < 0)
+            return -1;
+
+        if (fstat(fd, &s) < 0) {
+            close(fd);
+            return -1;
+        }

May want to look at load_image/get_image_size.

Will do.

Thanks,
Alex
Re: [PATCH 1/2] qemu: Allow SMBIOS entries to be loaded and provided to the VM BIOS
Alex Williamson wrote:

Hi Anthony,

On Mon, 2009-04-06 at 14:50 -0500, Anthony Liguori wrote:

Alex Williamson wrote:

I know we have to support blobs because of OEM specific smbios entries, but there are a number of common ones that it would probably be good to specify in a less user-unfriendly way. What do you think?

Yeah, I'll admit this is a pretty unfriendly interface. I get from your comment on the other part of the patch that you'd prefer not to get into the mess of having both binary blobs and command line switches augmenting the blobs. This seems reasonable, but also means that we need a way to fully define the tables we generate from the command line. For a type 0 entry, that might mean the following set of switches:

-bios-version, -bios-date, -bios-characteristics, -bios-release

You could go one level higher:

-smbios type=0,bios-version='1.0',bios-date='2009/10/20'

etc.

I'm sure I'm missing some, plus we might want to allow the memory and processor entries to have some fields changed. Do we want to add that many switches and means to access them from the rombios?

I think it's okay to start with some of the more common tables and provide the parsing in QEMU. We could then humanize more tables down the road as people see fit.

At the end of the day, I'm most interested in the tables that are going to be frequently used by management applications. That is, the tables that are required for things like SVVP certification should be specified in a human readable format that QEMU can build reasonable defaults for and management tools can override.

I'm torn between exposing the tables directly to the firmware and providing a higher level interface. I really don't like -uuid overriding a binary blob though, so I'd prefer to avoid that. -uuid should only be respected if using the QEMU generated version of the SMBIOS table. I'll defer to whatever you think is better for what is exposed in the firmware interface, as I can see arguments for both.

--
Regards,
Anthony Liguori
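The "one level higher" form suggested above, -smbios type=0,bios-version='1.0',bios-date='2009/10/20', could be parsed roughly as follows. This is only a minimal sketch in Python to show the shape of the option string. The function name parse_smbios_option is hypothetical, it is not QEMU's option parser, and it assumes that values never contain commas.

```python
def parse_smbios_option(arg):
    """Split 'type=N,key=value,...' into (table_type, fields).

    Hypothetical helper for illustration only; values containing
    commas are not supported in this sketch.
    """
    fields = {}
    for item in arg.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % item)
        # Drop optional single or double quotes around the value.
        fields[key.strip()] = value.strip().strip("'\"")
    if "type" not in fields:
        raise ValueError("missing type= in -smbios option")
    table_type = int(fields.pop("type"))
    return table_type, fields


table_type, fields = parse_smbios_option(
    "type=0,bios-version='1.0',bios-date='2009/10/20'")
print(table_type, fields)
```

With this shape, anything not given on the command line can be left to defaults, and a binary blob for the same table type can be rejected instead of being silently augmented.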
Re: [PATCH -tip 3/6 V4.1] x86: instruction decorder API
On Fri, 2009-04-03 at 20:37 -0400, Masami Hiramatsu wrote:

Hi Peter,

H. Peter Anvin wrote:

Masami Hiramatsu wrote:

Add x86 instruction decoder to arch-specific libraries. This decoder can decode all x86 instructions into prefix, opcode, modrm, sib, displacement and immediates. This can also show the length of instructions.

...

Hi Masami,

On the surface the overall structure looks fine, but I have a couple of concerns:

1. is this meant to be able to decode userspace code or just kernel code? If it is supposed to be able to decode userspace code, is there a reason you're not dealing with 16-bit or V86 mode code at all? If not, why are you including the 32-bit decoder in a 64-bit kernel (as well as instructions which we're pretty much guaranteed to never use in the kernel, such as ENTER.)

Actually, this aims to decode both user-space and kernel code. At this point, it just needs to cover kernel code, because kprobes just wants to decode the kernel binary. However, this is just a starting point; uprobe developers want to use it to decode user-space code. In that case, it needs to be enhanced.

For user-space probing, we've been concentrating on native-built executables. Am I correct in thinking that we'll see 16-bit or V86 mode only on legacy apps built elsewhere? In any case, it only makes sense to build on the kvm folks' work in this regard.

...

4. you have a bunch of magic opcode constants all over the place. This means that as new instructions come in -- and they're going to be coming in -- this is going to be hard to update. It would be cleaner if we could have an intermediate format that preprocesses down to all the relevant tables and perhaps even some of the code rather than open-coding everything in C.

This matters... for example you have:

+ } else if (opcode == 0xea /* jmp far seg:offs */) {
+ __get_immptr(insn);

... but nothing similar for opcode 0x9a. This is extremely hard to spot with this kind of structure.

Oops, that should be a bug. Hmm, I think we'd better use bit-flag tables for classifying opcodes. Jim, can your INAT idea help in this situation? http://sources.redhat.com/ml/systemtap/2009-q2/msg00109.html

As noted, the INAT tables follow the kvm model of one fat bitmap of attributes per opcode, rather than the kprobes/uprobes model of one or two 256-bit tables per attribute. (This latter approach was due to the gradual accumulation of tables over the years.)

I like the bitmap-per-opcode approach because it's relatively easy to see in one place everything you're saying about a particular opcode. But with all the potential clients for this service, it's not clear that we'll get by with a single bitmap for every opcode. (x86 kvm uses 32 bits per opcode, I think, and the INAT tables use 10. Seems like we could overrun 64 bits pretty quickly.) So I guess that means we'll have to get a little creative as to how we expose these attribute sets to the client.

...

Thank you for good advice!

Ditto.

Jim Keniston
Re: [PATCH -tip 3/6 V4.1] x86: instruction decorder API
Jim Keniston wrote:

For user-space probing, we've been concentrating on native-built executables. Am I correct in thinking that we'll see 16-bit or V86 mode only on legacy apps built elsewhere? In any case, it only makes sense to build on the kvm folks' work in this regard.

That's a fair assumption; you will of course need to test it and take appropriate action if it doesn't pan out.

As noted, the INAT tables follow the kvm model of one fat bitmap of attributes per opcode, rather than the kprobes/uprobes model of one or two 256-bit tables per attribute. (This latter approach was due to the gradual accumulation of tables over the years.)

I like the bitmap-per-opcode approach because it's relatively easy to see in one place everything you're saying about a particular opcode. But with all the potential clients for this service, it's not clear that we'll get by with a single bitmap for every opcode. (x86 kvm uses 32 bits per opcode, I think, and the INAT tables use 10. Seems like we could overrun 64 bits pretty quickly.) So I guess that means we'll have to get a little creative as to how we expose these attribute sets to the client.

This is another very good reason to use an instruction table which is preprocessed into a usable format: it means that if the internal data structures change -- and they almost certainly will have to at some point -- the raw data isn't lost.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
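To make the "raw table preprocessed into a usable format" idea concrete, here is a small sketch in Python. It is not the INAT format or the kvm emulator's table: the attribute names, bit values and the four opcodes listed are illustrative assumptions, chosen only to show a human-editable table being compiled into one attribute bitmap per opcode.

```python
# Hypothetical attribute bits; these are NOT the real INAT or kvm values.
ATTR_MODRM = 1 << 0    # instruction has a ModRM byte
ATTR_IMM_PTR = 1 << 1  # far pointer immediate (seg:offs)
ATTR_REL8 = 1 << 2     # 8-bit relative displacement

NAME_TO_BIT = {
    "MODRM": ATTR_MODRM,
    "IMM_PTR": ATTR_IMM_PTR,
    "REL8": ATTR_REL8,
}

# Raw, human-editable data: one entry per opcode, everything known
# about that opcode in one place.
RAW_TABLE = {
    0x89: ["MODRM"],    # mov r/m, r
    0x9A: ["IMM_PTR"],  # call far seg:offs
    0xEA: ["IMM_PTR"],  # jmp far seg:offs
    0xEB: ["REL8"],     # jmp short rel8
}


def build_attr_table(raw):
    """Preprocess the raw table into a 256-entry list of bitmaps."""
    table = [0] * 256
    for opcode, names in raw.items():
        for name in names:
            table[opcode] |= NAME_TO_BIT[name]
    return table


ATTRS = build_attr_table(RAW_TABLE)


def has_attr(opcode, bit):
    """Return True if the one-byte opcode carries the given attribute."""
    return bool(ATTRS[opcode] & bit)


# Both far-pointer opcodes are handled by the same table lookup, so a
# case like 0x9a cannot be forgotten in an if/else chain.
print(has_attr(0x9A, ATTR_IMM_PTR), has_attr(0xEA, ATTR_IMM_PTR))
```

If the internal layout later has to change, for example to a wider bitmap or to several tables, only build_attr_table needs to change; the raw table stays as it is.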
Re: kvm-84 + virtio Ubuntu Hardy guests
Dustin Kirkland wrote:

On Mon, 2009-04-06 at 18:43 +0100, Mark McLoughlin wrote:

The problem here is that 2.6.24/5 vintage guests are saying they support something they don't. See this for further details: http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00574.html

Agreed, understood. And those kernels should be patched.

However, it's a bit of a chicken/egg problem... One can't even boot those guests (with virtio) to update the kernel, as it panics on boot (without the kvm hack).

There's now a fix for this in QEMU stable. It'll make its way to kvm-userspace once Avi merges.

Regards,
Anthony Liguori

:-Dustin
Re: kvm-autotest new test, save and load
David Huff wrote:

Submitting a new test for review and comments. This test was originally developed by the Red Hat QE team. It starts a new guest, saves it and reloads the saved state. Looking for comments and/or improvements that can be made to this test.

I have some code (aimed at testing software within the guest, rather than kvm itself) which does this kind of operation quite frequently; the most common case where I've seen failures happen is when writes are active while state is being saved.

One enhancement might be to test the integrity of block device writes being done during the save/restore process itself -- for instance, write a file of all high bits, start a process overwriting it with all low bits, run the save and restore before the latter process has completed, and ensure that after the migration is complete and the process has finished, the file contains nothing but 0s.
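The guest-side part of that integrity check could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not part of kvm-autotest: the helper names are hypothetical, the file is a small temporary file rather than a block device, and the save/restore step is only represented by a callback invoked halfway through the overwrite.

```python
import os
import tempfile

CHUNK = 4096  # bytes written per step


def fill(path, byte, nchunks):
    """Create the file and fill it with a single repeated byte value."""
    with open(path, "wb") as f:
        for _ in range(nchunks):
            f.write(bytes([byte]) * CHUNK)
        f.flush()
        os.fsync(f.fileno())


def overwrite(path, byte, nchunks, midway=None):
    """Overwrite the file in place, calling midway() halfway through.

    In a real test, midway is where savevm/loadvm would be triggered
    from the host while the guest is still writing.
    """
    with open(path, "r+b") as f:
        for i in range(nchunks):
            if midway is not None and i == nchunks // 2:
                midway()
            f.write(bytes([byte]) * CHUNK)
        f.flush()
        os.fsync(f.fileno())


def only_contains(path, byte):
    """Return True if every byte in the file equals the given value."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return True
            if chunk != bytes([byte]) * len(chunk):
                return False


fd, path = tempfile.mkstemp()
os.close(fd)
events = []
fill(path, 0xFF, 8)                       # all high bits
overwrite(path, 0x00, 8,                  # all low bits
          midway=lambda: events.append("save/restore would run here"))
result = only_contains(path, 0x00)
os.remove(path)
print(result)
```

The verify step then reduces to a single question, whether the file contains nothing but zero bytes, which fits the verify_command pattern the posted test already uses.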
Re: [RFC PATCH] PCI pass-through fixups
* Alex Williamson (alex.william...@hp.com) wrote:

I'm wondering if we need a spot for device specific fixups for PCI pass-through. In the example below, I want to expose a single port of an Intel 82571EB quad port copper NIC to a guest. It works great until I shutdown the guest, at which point the guest e1000e driver knows by the device ID that the NIC is a quad port, and blindly attempts to twiddle some bits on the bridge above it (that doesn't exist).

And what happens?

Obviously some robustness could be added to the driver, but would it make sense to do something like below and automatically remap these devices to identical single port device IDs? Thanks,

Sounds quite fragile to me.

thanks,
-chris
Re: [PATCH][RESEND] kvm-userspace: fix option_rom_setup_reset address
On Mon, Apr 6, 2009 at 1:32 PM, Ryan Harper ry...@us.ibm.com wrote:

Commit f2b690ba461971fb8b04354de8717a73fd08b945 changed the target address for option roms, but failed to use the same address when registering an option rom reset. This manifests itself when using extboot (boot=on) and resetting a guest via reboot or system_reset on the monitor: the guest fails to boot. This patch registers the correct region for each option rom.

looks good to me.

--
Glauber Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.
[RFC PATCH 1/3] PCI: rewrite Function Level Reset
Changes:
1) remove disable_irq() so the shared IRQ won't be disabled.
2) replace the 1s wait with 100, 200 and 400ms wait intervals for the Pending Transaction.
3) replace mdelay() with msleep().
4) add might_sleep().
5) lock the device to prevent PM suspend from accessing the CSRs during the reset.
6) coding style fixes.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/pci.c   |  166 ++-
 include/linux/pci.h |    2 +-
 2 files changed, 85 insertions(+), 83 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af4db4e..46ae997 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2008,111 +2008,112 @@ int pci_set_dma_seg_boundary(struct pci_dev *dev, unsigned long mask)
 EXPORT_SYMBOL(pci_set_dma_seg_boundary);
 #endif
 
-static int __pcie_flr(struct pci_dev *dev, int probe)
+static int pcie_flr(struct pci_dev *dev, int probe)
 {
-	u16 status;
+	int i;
+	int pos;
 	u32 cap;
-	int exppos = pci_find_capability(dev, PCI_CAP_ID_EXP);
+	u16 status;
 
-	if (!exppos)
+	pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
+	if (!pos)
 		return -ENOTTY;
-	pci_read_config_dword(dev, exppos + PCI_EXP_DEVCAP, &cap);
+
+	pci_read_config_dword(dev, pos + PCI_EXP_DEVCAP, &cap);
 	if (!(cap & PCI_EXP_DEVCAP_FLR))
 		return -ENOTTY;
 
 	if (probe)
 		return 0;
 
-	pci_block_user_cfg_access(dev);
-
 	/* Wait for Transaction Pending bit clean */
-	pci_read_config_word(dev, exppos + PCI_EXP_DEVSTA, &status);
-	if (!(status & PCI_EXP_DEVSTA_TRPND))
-		goto transaction_done;
+	for (i = 0; i < 4; i++) {
+		if (i)
+			msleep((1 << (i - 1)) * 100);
 
-	msleep(100);
-	pci_read_config_word(dev, exppos + PCI_EXP_DEVSTA, &status);
-	if (!(status & PCI_EXP_DEVSTA_TRPND))
-		goto transaction_done;
-
-	dev_info(&dev->dev, "Busy after 100ms while trying to reset; "
-			"sleeping for 1 second\n");
-	ssleep(1);
-	pci_read_config_word(dev, exppos + PCI_EXP_DEVSTA, &status);
-	if (status & PCI_EXP_DEVSTA_TRPND)
-		dev_info(&dev->dev, "Still busy after 1s; "
-				"proceeding with reset anyway\n");
-
-transaction_done:
-	pci_write_config_word(dev, exppos + PCI_EXP_DEVCTL,
+		pci_read_config_word(dev, pos + PCI_EXP_DEVSTA, &status);
+		if (!(status & PCI_EXP_DEVSTA_TRPND))
+			goto clear;
+	}
+
+	dev_err(&dev->dev, "transaction is not cleared; "
+			"proceeding with reset anyway\n");
+
+clear:
+	pci_write_config_word(dev, pos + PCI_EXP_DEVCTL,
 				PCI_EXP_DEVCTL_BCR_FLR);
-	mdelay(100);
+	msleep(100);
 
-	pci_unblock_user_cfg_access(dev);
 	return 0;
 }
 
-static int __pci_af_flr(struct pci_dev *dev, int probe)
+static int pci_af_flr(struct pci_dev *dev, int probe)
 {
-	int cappos = pci_find_capability(dev, PCI_CAP_ID_AF);
-	u8 status;
+	int i;
+	int pos;
 	u8 cap;
+	u8 status;
 
-	if (!cappos)
+	pos = pci_find_capability(dev, PCI_CAP_ID_AF);
+	if (!pos)
 		return -ENOTTY;
-	pci_read_config_byte(dev, cappos + PCI_AF_CAP, &cap);
+
+	pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
 	if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
 		return -ENOTTY;
 
 	if (probe)
 		return 0;
 
-	pci_block_user_cfg_access(dev);
-
 	/* Wait for Transaction Pending bit clean */
-	pci_read_config_byte(dev, cappos + PCI_AF_STATUS, &status);
-	if (!(status & PCI_AF_STATUS_TP))
-		goto transaction_done;
+	for (i = 0; i < 4; i++) {
+		if (i)
+			msleep((1 << (i - 1)) * 100);
+
+		pci_read_config_byte(dev, pos + PCI_AF_STATUS, &status);
+		if (!(status & PCI_AF_STATUS_TP))
+			goto clear;
+	}
+
+	dev_err(&dev->dev, "transaction is not cleared; "
+			"proceeding with reset anyway\n");
 
+clear:
+	pci_write_config_byte(dev, pos + PCI_AF_CTRL, PCI_AF_CTRL_FLR);
 	msleep(100);
-	pci_read_config_byte(dev, cappos + PCI_AF_STATUS, &status);
-	if (!(status & PCI_AF_STATUS_TP))
-		goto transaction_done;
-
-	dev_info(&dev->dev, "Busy after 100ms while trying to"
-			" reset; sleeping for 1 second\n");
-	ssleep(1);
-	pci_read_config_byte(dev, cappos + PCI_AF_STATUS, &status);
-	if (status & PCI_AF_STATUS_TP)
-		dev_info(&dev->dev, "Still busy after 1s; "
-				"proceeding with reset anyway\n");
-
-transaction_done:
-	pci_write_config_byte(dev, cappos + PCI_AF_CTRL, PCI_AF_CTRL_FLR);
-	mdelay(100);
-
-
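As a reading aid for change 2), the polling schedule in the patch can be worked out in isolation. The sketch below is plain userspace C, not kernel code; `flr_poll_delay_ms` and `flr_total_wait_ms` are hypothetical names that only mirror the `msleep((1 << (i - 1)) * 100)` expression in the loop.

```c
#include <assert.h>

/* Delay in milliseconds before poll attempt `i` (0-based), mirroring the
 * patch: no wait before the first poll, then 100, 200 and 400 ms. */
static unsigned int flr_poll_delay_ms(unsigned int i)
{
    assert(i < 4);  /* the patch polls at most four times */
    return i ? (1u << (i - 1)) * 100u : 0u;
}

/* Worst-case time spent waiting for Transaction Pending to clear,
 * i.e. the sum of all delays when the bit never clears. */
static unsigned int flr_total_wait_ms(void)
{
    unsigned int i, total = 0;

    for (i = 0; i < 4; i++)
        total += flr_poll_delay_ms(i);
    return total;
}
```

Under these assumptions the worst case is 700 ms of waiting before the reset is issued anyway, compared with 100 ms plus 1 s in the code being replaced.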
[RFC PATCH 3/3] PCI: support Secondary Bus Reset
PCI-to-PCI Bridge 1.2 specifies that the Secondary Bus Reset bit can force the assertion of RST# on the secondary interface, which can be used to reset all devices including subordinates under this bus. This can be used to reset a function if this function is the only device under this bus.

Signed-off-by: Yu Zhao yu.z...@intel.com
---
 drivers/pci/pci.c |   31 +++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e459a0b..a77c33a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2115,6 +2115,33 @@ static int pci_pm_flr(struct pci_dev *dev, int probe)
 	return 0;
 }
 
+static int pci_secondary_bus_reset(struct pci_dev *dev, int probe)
+{
+	u16 ctrl;
+	struct pci_dev *pdev;
+
+	if (dev->subordinate)
+		return -ENOTTY;
+
+	list_for_each_entry(pdev, &dev->bus->devices, bus_list)
+		if (pdev != dev)
+			return -ENOTTY;
+
+	if (probe)
+		return 0;
+
+	pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
+	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
+	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
+	msleep(100);
+
+	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
+	pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
+	msleep(100);
+
+	return 0;
+}
+
 static int pci_dev_reset(struct pci_dev *dev, int probe)
 {
 	int rc;
@@ -2136,6 +2163,10 @@ static int pci_dev_reset(struct pci_dev *dev, int probe)
 		goto done;
 
 	rc = pci_pm_flr(dev, probe);
+	if (rc != -ENOTTY)
+		goto done;
+
+	rc = pci_secondary_bus_reset(dev, probe);
 done:
 	up(&dev->dev.sem);
 
-- 
1.5.6.4
[PATCH] align vga rom to 4k boundary.
Instead of aligning to 2k boundary, as required by the bios, align to 4k boundary, as required by kvm memory functions. Without this patch, starting kvm with -vga std option fails with:

  create_userspace_phys_mem: Invalid argument
  kvm_cpu_register_physical_memory: failed

as described by: https://bugzilla.redhat.com/show_bug.cgi?id=494376

It does not fail with cirrus vga, because it is naturally aligned. This problem does not seem to affect upstream qemu.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 qemu/hw/pc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index cc84772..680d4a2 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -919,7 +919,7 @@ vga_bios_error:
         exit(1);
     }
     /* Round up vga bios size to the next 2k boundary */
-    vga_bios_size = (vga_bios_size + 2047) & ~2047;
+    vga_bios_size = (vga_bios_size + 4095) & ~4095;
     option_rom_start = 0xc0000 + vga_bios_size;
 
     /* setup basic memory access */
-- 
1.5.6.6
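The round-up idiom the patch changes can be tried on its own. The sketch below is a standalone illustration; `align_up` is a hypothetical helper, not a function from qemu, and the sizes used in the note are made-up values chosen to show where 2k and 4k rounding differ.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `alignment`, which must be a
 * nonzero power of two. Same arithmetic as (size + 4095) & ~4095 in the
 * patch when alignment == 4096. */
static size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

For a hypothetical 38000-byte ROM image, `align_up(38000, 2048)` gives 38912, which is not a multiple of 4096, while `align_up(38000, 4096)` gives 40960. A size that already happens to be a multiple of 4096 is left unchanged by both, which would be consistent with the cirrus case described above.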
3525MB RAM Limit
Hi,

I'm running a KVM host using Fedora 10 (x86_64) on a 4x Quad-Core AMD Opteron(tm) Processor 8347 HE system with 16GB RAM.

I've created a Fedora 10 (x86_64) guest and allocated 8 CPUs and 8GB RAM, which I can see using 'virt-manager'. When I log in to the guest, I can see the 8 CPUs, but apparently I only have 3525MB RAM (output from 'free').

At first I thought it was a 32-bit problem, but I've double-checked and the host and guest are both running 64-bit architectures.

virt-manager shows 50% RAM usage, which appears to be correct (4GB out of 8). It also correctly recognises the 16GB available in the host machine.

Is this expected? Will the extra RAM become available once it is required? If not, does anyone have any ideas for how I can fix the problem?

Thanks,
Dan
Re: 3525MB RAM Limit
Read the list archives. A fix was discussed recently.

--Brian Jackson

On Monday 06 April 2009 21:59:24 Daniel Scott wrote:
> Hi,
>
> I'm running a KVM host using Fedora 10 (x86_64) on 4x Quad-Core AMD
> Opteron(tm) Processor 8347 HE system with 16GB ram.
>
> I've created a Fedora 10 (x86_64) guest and allocated 8 CPUs and 8GB ram
> which I can see using 'virt-manager'. When I login to the guest, I can
> see the 8 CPUs, but apparently, I only have 3525MB RAM (output from
> 'free').
>
> At first I thought it was a 32bit problem, but I've double checked and
> the host and guest are both running 64bit architectures.
>
> virt-manager shows 50% ram usage which appears to be correct (4GB, out
> of 8). It also correctly recognises the 16GB available in the host
> machine.
>
> Is this expected? Will the extra ram become available once it is
> required? If not, does anyone have any Ideas for how I can fix the
> problem?
>
> Thanks,
> Dan
Re: [kvm] [PATCH 06/16] Support for device capability
On Saturday 04 April 2009 03:23:31 Alex Williamson wrote:
> On Tue, 2009-03-17 at 11:50 +0800, Sheng Yang wrote:
> > This framework can be easily extended to support device capability,
> > like MSI/MSI-x.
>
> Sheng,
>
> Are you already looking at adding support for PM and EXP capabilities?
> The bnx2 driver is an example that won't claim the device if these
> capabilities aren't present. Thanks,
>
> Alex

Not yet...

(And it's quite strange that this mail didn't reach my private mailbox. Only the mailing list copy arrived...)

--
regards
Yang, Sheng
Re: KVM Port
On Sun, 2009-04-05 at 00:11 +0530, kvm port wrote:
> ok, so these are a few steps to begin
> (a) add a QEMUMachine for my h/w in qemu

As an alternative, you could start exercising KVM kernel code using kvm-userspace/test before qemu is ready.

> (b) Add arch support in kvm
>
> I have a few questions
> (a) qemu starts in user space, how would I configure my linux. Should
> the linux run in Hypervisor state and the apps run in user state, and
> nothing runs in guest state [ there are 3 states in my processor]

Are there only 3, or are there two independent dimensions (hypervisor/guest, user/supervisor)? If there are only 3, you'll need to figure out how to isolate guest kernel and guest userspace from each other.

> (b) qemu starts the VM and somehow ( i dont know yet, how?) , starts my
> code in processor guest state

Why are you asking us? You are the processor expert... :) Qemu calls into KVM via an ioctl, and processor-specific KVM code (that's you) somehow jumps into guest mode.

--
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html