[COMMIT master] Fix request_irq() for 2.6.19
From: Chris Wright chr...@sous-sol.org

The irq handler changes (introduced in 2.6.19, not 2.6.20) dropped
struct pt_regs from the handler prototype; the registers are found
globally now. This introduces the back compat for older kernels. The
handler is just a thin layer which calls the real registered handler
(all this to work around a minor little compiler warning ;-)

Needed for device assignment on older kernels.

Signed-off-by: Chris Wright chr...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h
index c955927..8cb5440 100644
--- a/external-module-compat-comm.h
+++ b/external-module-compat-comm.h
@@ -641,19 +641,41 @@ static inline int pci_reset_function(struct pci_dev *dev)
 #endif
 
 #include <linux/interrupt.h>
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,20)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,19)
+
+typedef irqreturn_t (*kvm_irq_handler_t)(int, void *);
+static kvm_irq_handler_t kvm_irq_handlers[NR_IRQS];
+
+static irqreturn_t kvm_irq_thunk(int irq, void *dev_id, struct pt_regs *regs)
+{
+	kvm_irq_handler_t handler = kvm_irq_handlers[irq];
+	return handler(irq, dev_id);
+}
 
-typedef irqreturn_t (*kvm_irq_handler_t)(int, void *, struct pt_regs *);
 static inline int kvm_request_irq(unsigned int a, kvm_irq_handler_t handler,
 				  unsigned long c, const char *d, void *e)
 {
-	/* FIXME: allocate thunk, etc. */
-	return -EINVAL;
+	int rc;
+	kvm_irq_handler_t old = kvm_irq_handlers[a];
+	if (old)
+		return -EBUSY;
+	kvm_irq_handlers[a] = handler;
+	rc = request_irq(a, kvm_irq_thunk, c, d, e);
+	if (rc)
+		kvm_irq_handlers[a] = NULL;
+	return rc;
+}
+
+static inline void kvm_free_irq(unsigned int irq, void *dev_id)
+{
+	free_irq(irq, dev_id);
+	kvm_irq_handlers[irq] = NULL;
 }
 
 #else
 
 #define kvm_request_irq request_irq
+#define kvm_free_irq free_irq
 
 #endif
 
diff --git a/ia64/hack-module.awk b/ia64/hack-module.awk
index 2e4e05f..dda3347 100644
--- a/ia64/hack-module.awk
+++ b/ia64/hack-module.awk
@@ -2,7 +2,7 @@ BEGIN { split("INIT_WORK on_each_cpu smp_call_function " \
 		    "hrtimer_add_expires_ns hrtimer_get_expires " \
 		    "hrtimer_get_expires_ns hrtimer_start_expires " \
 		    "hrtimer_expires_remaining " \
-		    "request_irq", compat_apis); }
+		    "request_irq free_irq", compat_apis); }
 
 /MODULE_AUTHOR/ {
     printf("MODULE_INFO(version, \"%s\");\n", version)
diff --git a/x86/hack-module.awk b/x86/hack-module.awk
index 260eeef..bdb873a 100644
--- a/x86/hack-module.awk
+++ b/x86/hack-module.awk
@@ -2,7 +2,7 @@ BEGIN { split("INIT_WORK desc_struct ldttss_desc64 desc_ptr " \
 		    "hrtimer_add_expires_ns hrtimer_get_expires " \
 		    "hrtimer_get_expires_ns hrtimer_start_expires " \
 		    "hrtimer_expires_remaining " \
-		    "on_each_cpu relay_open request_irq ", compat_apis); }
+		    "on_each_cpu relay_open request_irq free_irq ", compat_apis); }
 
 /^int kvm_init\(/ { anon_inodes = 1 }
 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
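The thunk scheme in the patch above can be tried outside the kernel. The following is only a minimal user-space sketch of the same idea, not the module code: every demo_* name and DEMO_NR_IRQS are hypothetical, plain int stands in for irqreturn_t, and a void pointer stands in for struct pt_regs.

```c
#include <stddef.h>

#define DEMO_NR_IRQS 16 /* hypothetical table size, stands in for NR_IRQS */

/* New-style handler prototype: no register-state argument. */
typedef int (*demo_handler_t)(int irq, void *dev_id);

/* One slot per IRQ line, mirroring kvm_irq_handlers[] in the patch. */
static demo_handler_t demo_handlers[DEMO_NR_IRQS];

/* Old-style entry point: accepts the third argument, ignores it, and
 * forwards to the new-style handler registered for this line. */
static int demo_thunk(int irq, void *dev_id, void *regs)
{
    (void)regs;
    return demo_handlers[irq](irq, dev_id);
}

/* Returns 0 on success, -1 if the line is out of range or already taken. */
static int demo_request_irq(unsigned int irq, demo_handler_t handler)
{
    if (irq >= DEMO_NR_IRQS || demo_handlers[irq] != NULL)
        return -1;
    demo_handlers[irq] = handler;
    return 0;
}

/* Releases the slot so the line can be registered again. */
static void demo_free_irq(unsigned int irq)
{
    if (irq < DEMO_NR_IRQS)
        demo_handlers[irq] = NULL;
}

/* Example handler: adds the IRQ number to a counter passed via dev_id. */
static int demo_count_handler(int irq, void *dev_id)
{
    *(int *)dev_id += irq;
    return 1;
}
```

As in the patch, the table holds one handler per line, so a second registration on the same line is refused until the first one is freed.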
[COMMIT master] Set default configure options for ia64
From: Xiantao Zhang xiantao.zh...@intel.com

1. Disable xen config support for ia64.
2. Only configure ia64-softmmu for ia64.

Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index fc0fb9b..5f448a2 100755
--- a/configure
+++ b/configure
@@ -337,6 +337,8 @@ if [ $cpu = i386 -o $cpu = x86_64 ] ; then
 fi
 if [ $cpu = ia64 ] ; then
     kvm=yes
+    xen=no
+    target_list=ia64-softmmu
     cpu_emulation=no
     gdbstub=no
     slirp=no
[COMMIT master] exec.c: fix typo in comment (fluch -> flush)
From: Sebastian Herbszt herb...@gmx.de

Fix typo in comment in exec.c (fluch -> flush).

Signed-off-by: Sebastian Herbszt herb...@gmx.de
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/exec.c b/exec.c
index 16d3cf8..0420f29 100644
--- a/exec.c
+++ b/exec.c
@@ -3187,7 +3187,7 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                         (0xff & ~CODE_DIRTY_FLAG);
                 }
                 /* qemu doesn't execute guest code directly, but kvm does
-                   therefore fluch instruction caches */
+                   therefore flush instruction caches */
                 if (kvm_enabled())
                     flush_icache_range((unsigned long)ptr,
                                        ((unsigned long)ptr)+l);
[COMMIT master] kvm: Set kvm_arch=powerpc for PPC builds.
From: Hollis Blanchard holl...@us.ibm.com

The name of the Linux arch directory is powerpc, not ppc.

Signed-off-by: Hollis Blanchard holl...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/configure b/configure
index 5f448a2..9a635ae 100755
--- a/configure
+++ b/configure
@@ -818,6 +818,9 @@ case $cpu in
     i386 | x86_64)
 	kvm_arch=x86
 	;;
+    ppc)
+	kvm_arch=powerpc
+	;;
     *)
 	kvm_arch=$cpu
 	;;
[COMMIT master] Remove memalign call for guess_disk_lchs
From: Anthony Liguori aligu...@us.ibm.com

This code doesn't exist in upstream QEMU because it is not necessary to
provide an aligned buffer to bdrv_read. The API has always worked this
way although at one point, the bouncing was broken which is what led to
this patch.

The places where qemu_memalign() is used in QEMU are only where
performance is sensitive. guess_disk_lchs does not fall into this
category.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/block.c b/block.c
index 8e08f32..8348cf2 100644
--- a/block.c
+++ b/block.c
@@ -30,7 +30,6 @@
 #include "qemu-common.h"
 #include "monitor.h"
 #include "block_int.h"
-#include "osdep.h"
 
 #ifdef HOST_BSD
 #include <sys/types.h>
@@ -772,26 +771,20 @@ struct partition {
 static int guess_disk_lchs(BlockDriverState *bs,
                            int *pcylinders, int *pheads, int *psectors)
 {
-    uint8_t *buf;
+    uint8_t buf[512];
     int ret, i, heads, sectors, cylinders;
     struct partition *p;
     uint32_t nr_sects;
     uint64_t nb_sectors;
 
-    buf = qemu_memalign(512, 512);
-    if (buf == NULL)
-        return -1;
-
     bdrv_get_geometry(bs, &nb_sectors);
 
     ret = bdrv_read(bs, 0, buf, 1);
     if (ret < 0)
         return -1;
     /* test msdos magic */
-    if (buf[510] != 0x55 || buf[511] != 0xaa) {
-        qemu_free(buf);
+    if (buf[510] != 0x55 || buf[511] != 0xaa)
         return -1;
-    }
     for(i = 0; i < 4; i++) {
         p = ((struct partition *)(buf + 0x1be)) + i;
         nr_sects = le32_to_cpu(p->nr_sects);
@@ -812,11 +805,9 @@ static int guess_disk_lchs(BlockDriverState *bs,
             printf("guessed geometry: LCHS=%d %d %d\n",
                    cylinders, heads, sectors);
 #endif
-            qemu_free(buf);
             return 0;
         }
     }
-    qemu_free(buf);
     return -1;
 }
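The practical effect of moving the sector buffer onto the stack is that every early return becomes safe without a matching free. A rough stand-alone sketch of that control flow follows; the demo_* names and the reader callback are hypothetical stand-ins (the callback plays the role of bdrv_read), and only the 0x55 0xaa signature check at offsets 510 and 511 is taken from the patch.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_SECTOR_SIZE 512

/* Returns 1 if the sector ends with the MS-DOS boot signature 0x55 0xaa. */
static int demo_has_msdos_magic(const uint8_t *sector)
{
    return sector[510] == 0x55 && sector[511] == 0xaa;
}

/* Hypothetical reader callback standing in for bdrv_read(): fills buf
 * with one sector, returns 0 on success and a negative value on error. */
typedef int (*demo_read_fn)(uint8_t *buf);

/* Same shape as the patched function: a stack buffer, so no cleanup is
 * needed on any of the return paths. */
static int demo_probe(demo_read_fn read_sector)
{
    uint8_t buf[DEMO_SECTOR_SIZE];

    if (read_sector(buf) < 0)
        return -1;
    if (!demo_has_msdos_magic(buf))
        return -1;
    return 0;
}

/* Test readers: a sector with the signature, a blank one, a failing one. */
static int demo_read_valid(uint8_t *buf)
{
    memset(buf, 0, DEMO_SECTOR_SIZE);
    buf[510] = 0x55;
    buf[511] = 0xaa;
    return 0;
}

static int demo_read_blank(uint8_t *buf)
{
    memset(buf, 0, DEMO_SECTOR_SIZE);
    return 0;
}

static int demo_read_error(uint8_t *buf)
{
    (void)buf;
    return -1;
}
```

The real function goes on to walk the four partition entries at offset 0x1be; that part is left out here.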
[COMMIT master] Remove unnecessary setting of cmos smp_cpu count
From: Anthony Liguori aligu...@us.ibm.com

This is duplicate code.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/hw/pc.c b/hw/pc.c
index 35f9527..db34f53 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -265,7 +265,6 @@ static void cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size,
         rtc_set_memory(s, 0x5c, (unsigned int)above_4g_mem_size >> 24);
         rtc_set_memory(s, 0x5d, (uint64_t)above_4g_mem_size >> 32);
     }
-    rtc_set_memory(s, 0x5f, smp_cpus - 1);
 
     if (ram_size > (16 * 1024 * 1024))
         val = (ram_size / 65536) - ((16 * 1024 * 1024) / 65536);
[COMMIT master] Remove dead macros likely/unlikely in exec.c
From: Anthony Liguori aligu...@us.ibm.com

More leftovers from the old migration code.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/exec.c b/exec.c
index 0420f29..0c5545e 100644
--- a/exec.c
+++ b/exec.c
@@ -3500,14 +3500,6 @@ uint32_t lduw_phys(target_phys_addr_t addr)
     return tswap16(val);
 }
 
-#ifdef __GNUC__
-#define likely(x) __builtin_expect(!!(x), 1)
-#define unlikely(x) __builtin_expect(!!(x), 0)
-#else
-#define likely(x) x
-#define unlikely(x) x
-#endif
-
 /* warning: addr must be aligned. The ram page is not masked as dirty
    and the code inside is not invalidated. It is useful if the dirty
    bits are used to track modified PTEs */
[COMMIT master] Remove stray whitespace
From: Anthony Liguori aligu...@us.ibm.com

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/block-vmdk.c b/block-vmdk.c
index ff5007c..d47d483 100644
--- a/block-vmdk.c
+++ b/block-vmdk.c
@@ -93,6 +93,7 @@ typedef struct ActiveBDRVState{
 
 static ActiveBDRVState activeBDRV;
 
+
 static int vmdk_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
     uint32_t magic;
diff --git a/vl.c b/vl.c
index 9ff4a5a..15f85e2 100644
--- a/vl.c
+++ b/vl.c
@@ -21,7 +21,6 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
  * THE SOFTWARE.
  */
-
 #include <unistd.h>
 #include <fcntl.h>
 #include <signal.h>
[COMMIT master] Remove devfn from BlockDriverState
From: Anthony Liguori aligu...@us.ibm.com

It's no longer used.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/block_int.h b/block_int.h
index 951ff02..e10b906 100644
--- a/block_int.h
+++ b/block_int.h
@@ -150,8 +150,6 @@ struct BlockDriverState {
     int cyls, heads, secs, translation;
     int type;
     char device_name[32];
-    /* PCI devfn of parent */
-    int devfn;
     BlockDriverState *next;
     void *private;
 };
[COMMIT master] Remove IBM copyright in unmodified file in upstream
From: Anthony Liguori aligu...@us.ibm.com

Presumably, it would carry an IBM copyright upstream if needed. qemu-kvm
introduces no additional code over upstream QEMU in this file.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Acked-by: Hollis Blanchard holl...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/hw/ppc4xx.h b/hw/ppc4xx.h
index 25a91bd..7832cd9 100644
--- a/hw/ppc4xx.h
+++ b/hw/ppc4xx.h
@@ -3,9 +3,6 @@
  *
  * Copyright (c) 2007 Jocelyn Mayer
  *
- * Copyright 2008 IBM Corp.
- * Authors: Hollis Blanchard holl...@us.ibm.com
- *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
  * in the Software without restriction, including without limitation the rights
[COMMIT master] Re-add vga dirty logging bits dropped by merge 02d4417f75
From: Avi Kivity a...@redhat.com The last qemu.git merge broke vga. Revert the vga changes pending better dirty logging support. Signed-off-by: Avi Kivity a...@redhat.com diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index 3e67acd..20f17a6 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -2619,6 +2619,7 @@ static CPUWriteMemoryFunc *cirrus_linear_bitblt_write[3] = { static void map_linear_vram(CirrusVGAState *s) { +vga_dirty_log_stop((VGAState *)s); if (!s-map_addr s-lfb_addr s-lfb_end) { s-map_addr = s-lfb_addr; s-map_end = s-lfb_end; @@ -2631,11 +2632,16 @@ static void map_linear_vram(CirrusVGAState *s) #ifndef TARGET_IA64 s-lfb_vram_mapped = 0; +cpu_register_physical_memory(isa_mem_base + 0xa, 0x8000, +(s-vram_offset + s-cirrus_bank_base[0]) | IO_MEM_UNASSIGNED); +cpu_register_physical_memory(isa_mem_base + 0xa8000, 0x8000, +(s-vram_offset + s-cirrus_bank_base[1]) | IO_MEM_UNASSIGNED); if (!(s-cirrus_srcptr != s-cirrus_srcptr_end) !((s-sr[0x07] 0x01) == 0) !((s-gr[0x0B] 0x14) == 0x14) !(s-gr[0x0B] 0x02)) { +vga_dirty_log_stop((VGAState *)s); cpu_register_physical_memory(isa_mem_base + 0xa, 0x8000, (s-vram_offset + s-cirrus_bank_base[0]) | IO_MEM_RAM); cpu_register_physical_memory(isa_mem_base + 0xa8000, 0x8000, @@ -2654,11 +2660,14 @@ static void map_linear_vram(CirrusVGAState *s) static void unmap_linear_vram(CirrusVGAState *s) { +vga_dirty_log_stop((VGAState *)s); if (s-map_addr s-lfb_addr s-lfb_end) s-map_addr = s-map_end = 0; cpu_register_physical_memory(isa_mem_base + 0xa, 0x2, s-vga_io_memory); + +vga_dirty_log_start((VGAState *)s); } /* Compute the memory access functions */ @@ -3305,6 +3314,8 @@ static void cirrus_pci_lfb_map(PCIDevice *d, int region_num, { CirrusVGAState *s = ((PCICirrusVGAState *)d)-cirrus_vga; +vga_dirty_log_stop((VGAState *)s); + /* XXX: add byte swapping apertures */ cpu_register_physical_memory(addr, s-vram_size, s-cirrus_linear_io_addr); @@ -3336,10 +3347,14 @@ static void pci_cirrus_write_config(PCIDevice *d, 
PCICirrusVGAState *pvs = container_of(d, PCICirrusVGAState, dev); CirrusVGAState *s = pvs-cirrus_vga; +vga_dirty_log_stop((VGAState *)s); + pci_default_write_config(d, address, val, len); if (s-map_addr pvs-dev.io_regions[0].addr == -1) s-map_addr = 0; cirrus_update_memory_access(s); + +vga_dirty_log_start((VGAState *)s); } void pci_cirrus_vga_init(PCIBus *bus, int vga_ram_size) diff --git a/hw/vga.c b/hw/vga.c index 4931b69..9ab6e1a 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -1280,6 +1280,8 @@ static void vga_draw_text(VGAState *s, int full_update) vga_draw_glyph8_func *vga_draw_glyph8; vga_draw_glyph9_func *vga_draw_glyph9; +vga_dirty_log_stop(s); + /* compute font data address (in plane 2) */ v = s-sr[3]; offset = (((v 4) 1) | ((v 1) 6)) * 8192 * 4 + 2; @@ -1578,6 +1580,7 @@ static void vga_sync_dirty_bitmap(VGAState *s) cpu_physical_sync_dirty_bitmap(isa_mem_base + 0xa, 0xa8000); cpu_physical_sync_dirty_bitmap(isa_mem_base + 0xa8000, 0xb); } +vga_dirty_log_start(s); } /* @@ -1809,6 +1812,7 @@ static void vga_draw_blank(VGAState *s, int full_update) return; if (s-last_scr_width = 0 || s-last_scr_height = 0) return; +vga_dirty_log_stop(s); s-rgb_to_pixel = rgb_to_pixel_dup_table[get_depth_index(s-ds)]; @@ -2258,6 +2262,18 @@ void vga_dirty_log_start(VGAState *s) } } +void vga_dirty_log_stop(VGAState *s) +{ +if (kvm_enabled() s-map_addr s1) +kvm_log_stop(s-map_addr, s-map_end - s-map_addr); + +if (kvm_enabled() s-lfb_vram_mapped s2) { +kvm_log_stop(isa_mem_base + 0xa, 0x8000); +kvm_log_stop(isa_mem_base + 0xa8000, 0x8000); +} +s1 = s2 = 0; +} + static void vga_map(PCIDevice *pci_dev, int region_num, uint32_t addr, uint32_t size, int type) { @@ -2267,10 +2283,12 @@ static void vga_map(PCIDevice *pci_dev, int region_num, cpu_register_physical_memory(addr, s-bios_size, s-bios_offset); } else { cpu_register_physical_memory(addr, s-vram_size, s-vram_offset); -s-map_addr = addr; -s-map_end = addr + s-vram_size; -vga_dirty_log_start(s); } + +s-map_addr = addr; +s-map_end = 
addr + VGA_RAM_SIZE; + +vga_dirty_log_start(s); } void vga_common_init(VGAState *s, int vga_ram_size) @@ -2498,9 +2516,11 @@ static void pci_vga_write_config(PCIDevice *d, PCIVGAState *pvs = container_of(d, PCIVGAState, dev); VGAState *s = pvs-vga_state; +vga_dirty_log_stop(s); pci_default_write_config(d,
[COMMIT master] kvm: Add header files for ia64
From: Xiantao Zhang xiantao.zh...@intel.com Signed-off-by: Xiantao Zhang xiantao.zh...@intel.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/kvm/kernel/arch/ia64/include/asm/kvm.h b/kvm/kernel/arch/ia64/include/asm/kvm.h new file mode 100644 index 000..73963e3 --- /dev/null +++ b/kvm/kernel/arch/ia64/include/asm/kvm.h @@ -0,0 +1,303 @@ +#ifndef KVM_UNIFDEF_H +#define KVM_UNIFDEF_H + +#ifdef __i386__ +#ifndef CONFIG_X86_32 +#define CONFIG_X86_32 1 +#endif +#endif + +#ifdef __x86_64__ +#ifndef CONFIG_X86_64 +#define CONFIG_X86_64 1 +#endif +#endif + +#if defined(__i386__) || defined (__x86_64__) +#ifndef CONFIG_X86 +#define CONFIG_X86 1 +#endif +#endif + +#ifdef __ia64__ +#ifndef CONFIG_IA64 +#define CONFIG_IA64 1 +#endif +#endif + +#ifdef __PPC__ +#ifndef CONFIG_PPC +#define CONFIG_PPC 1 +#endif +#endif + +#ifdef __s390__ +#ifndef CONFIG_S390 +#define CONFIG_S390 1 +#endif +#endif + +#endif +#ifndef __ASM_IA64_KVM_H +#define __ASM_IA64_KVM_H + +/* + * kvm structure definitions for ia64 + * + * Copyright (C) 2007 Xiantao Zhang xiantao.zh...@intel.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + */ + +#include asm/types.h +#include linux/ioctl.h + +/* Select x86 specific features in linux/kvm.h */ +#define __KVM_HAVE_IOAPIC +#define __KVM_HAVE_DEVICE_ASSIGNMENT + +/* Architectural interrupt line count. 
*/ +#define KVM_NR_INTERRUPTS 256 + +#define KVM_IOAPIC_NUM_PINS 48 + +struct kvm_ioapic_state { + __u64 base_address; + __u32 ioregsel; + __u32 id; + __u32 irr; + __u32 pad; + union { + __u64 bits; + struct { + __u8 vector; + __u8 delivery_mode:3; + __u8 dest_mode:1; + __u8 delivery_status:1; + __u8 polarity:1; + __u8 remote_irr:1; + __u8 trig_mode:1; + __u8 mask:1; + __u8 reserve:7; + __u8 reserved[4]; + __u8 dest_id; + } fields; + } redirtbl[KVM_IOAPIC_NUM_PINS]; +}; + +#define KVM_IRQCHIP_PIC_MASTER 0 +#define KVM_IRQCHIP_PIC_SLAVE1 +#define KVM_IRQCHIP_IOAPIC 2 + +#define KVM_CONTEXT_SIZE 8*1024 + +struct kvm_fpreg { + union { + unsigned long bits[2]; + long double __dummy;/* force 16-byte alignment */ + } u; +}; + +union context { + /* 8K size */ + chardummy[KVM_CONTEXT_SIZE]; + struct { + unsigned long psr; + unsigned long pr; + unsigned long caller_unat; + unsigned long pad; + unsigned long gr[32]; + unsigned long ar[128]; + unsigned long br[8]; + unsigned long cr[128]; + unsigned long rr[8]; + unsigned long ibr[8]; + unsigned long dbr[8]; + unsigned long pkr[8]; + struct kvm_fpreg fr[128]; + }; +}; + +struct thash_data { + union { + struct { + unsigned long p: 1; /* 0 */ + unsigned long rv1 : 1; /* 1 */ + unsigned long ma : 3; /* 2-4 */ + unsigned long a: 1; /* 5 */ + unsigned long d: 1; /* 6 */ + unsigned long pl : 2; /* 7-8 */ + unsigned long ar : 3; /* 9-11 */ + unsigned long ppn : 38; /* 12-49 */ + unsigned long rv2 : 2; /* 50-51 */ + unsigned long ed : 1; /* 52 */ + unsigned long ig1 : 11; /* 53-63 */ + }; + struct { + unsigned long __rv1 : 53; /* 0-52 */ + unsigned long contiguous : 1; /*53 */ + unsigned long tc : 1; /* 54 TR or TC */ + unsigned long cl : 1; + /* 55 I side or D side cache line */ + unsigned long len : 4; /* 56-59 */ + unsigned long io : 1; /* 60 entry is for io or not */ +
[COMMIT master] Leave upstream QEMU comments intact
From: Anthony Liguori aligu...@us.ibm.com

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/vl.c b/vl.c
index 1d568bd..fbc84a7 100644
--- a/vl.c
+++ b/vl.c
@@ -5488,10 +5488,10 @@ int main(int argc, char **argv, char **envp)
             if (bt_parse(bt_opts[i]))
                 exit(1);
 
+    /* init the memory */
     if (ram_size == 0)
         ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
 
-    /* init the memory */
     if (kvm_enabled()) {
         if (kvm_qemu_create_context() < 0) {
             fprintf(stderr, "Could not create KVM context\n");
[COMMIT master] Remove #define __user in usb-linux.c
From: Anthony Liguori aligu...@us.ibm.com

The Makefile defines __user, so this is unnecessary.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/usb-linux.c b/usb-linux.c
index 26643bd..70d7a1c 100644
--- a/usb-linux.c
+++ b/usb-linux.c
@@ -34,10 +34,6 @@
 #include "qemu-timer.h"
 #include "monitor.h"
 
-#if defined(__linux__)
-#define __user
-#endif
-
 #include <dirent.h>
 #include <sys/ioctl.h>
 #include <signal.h>
[COMMIT master] KVM: Make kvm_cpu_(has|get)_interrupt() work for userspace irqchip too
From: Gleb Natapov g...@redhat.com At the vector level, kernel and userspace irqchip are fairly similar. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index cf17ed5..11c2757 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -24,6 +24,7 @@ #include irq.h #include i8254.h +#include x86.h /* * check if there are pending timer events @@ -48,6 +49,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) { struct kvm_pic *s; + if (!irqchip_in_kernel(v-kvm)) + return v-arch.irq_summary; + if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ if (kvm_apic_accept_pic_intr(v)) { s = pic_irqchip(v-kvm);/* PIC */ @@ -67,6 +71,9 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v) struct kvm_pic *s; int vector; + if (!irqchip_in_kernel(v-kvm)) + return kvm_pop_irq(v); + vector = kvm_get_apic_interrupt(v); /* APIC */ if (vector == -1) { if (kvm_apic_accept_pic_intr(v)) { diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 053f3c5..1903c27 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2089,8 +2089,9 @@ static int interrupt_window_interception(struct vcpu_svm *svm, * If the user space waits to inject interrupts, exit as soon as * possible */ - if (kvm_run-request_interrupt_window - !svm-vcpu.arch.irq_summary) { + if (!irqchip_in_kernel(svm-vcpu.kvm) + kvm_run-request_interrupt_window + !kvm_cpu_has_interrupt(svm-vcpu)) { ++svm-vcpu.stat.irq_window_exits; kvm_run-exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN; return 0; @@ -2371,7 +2372,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu, (svm-vmcb-save.rflags X86_EFLAGS_IF) (svm-vcpu.arch.hflags HF_GIF_MASK)); - if (svm-vcpu.arch.interrupt_window_open svm-vcpu.arch.irq_summary) + if (svm-vcpu.arch.interrupt_window_open + kvm_cpu_has_interrupt(svm-vcpu)) /* * If interrupts enabled, and not blocked by sti or mov ss. Good. 
*/ @@ -2381,7 +2383,8 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu, * Interrupts blocked. Wait for unblock. */ if (!svm-vcpu.arch.interrupt_window_open - (svm-vcpu.arch.irq_summary || kvm_run-request_interrupt_window)) + (kvm_cpu_has_interrupt(svm-vcpu) || +kvm_run-request_interrupt_window)) svm_set_vintr(svm); else svm_clear_vintr(svm); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index c6997c0..b3292c1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2535,21 +2535,20 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu, vmx_inject_nmi(vcpu); if (vcpu-arch.nmi_pending) enable_nmi_window(vcpu); - else if (vcpu-arch.irq_summary -|| kvm_run-request_interrupt_window) + else if (kvm_cpu_has_interrupt(vcpu) || +kvm_run-request_interrupt_window) enable_irq_window(vcpu); return; } if (vcpu-arch.interrupt_window_open) { - if (vcpu-arch.irq_summary !vcpu-arch.interrupt.pending) - kvm_queue_interrupt(vcpu, kvm_pop_irq(vcpu)); + if (kvm_cpu_has_interrupt(vcpu) !vcpu-arch.interrupt.pending) + kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu)); if (vcpu-arch.interrupt.pending) vmx_inject_irq(vcpu, vcpu-arch.interrupt.nr); - } - if (!vcpu-arch.interrupt_window_open - (vcpu-arch.irq_summary || kvm_run-request_interrupt_window)) + } else if(kvm_cpu_has_interrupt(vcpu) || + kvm_run-request_interrupt_window) enable_irq_window(vcpu); } @@ -2976,8 +2975,9 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu, * If the user space waits to inject interrupts, exit as soon as * possible */ - if (kvm_run-request_interrupt_window - !vcpu-arch.irq_summary) { + if (!irqchip_in_kernel(vcpu-kvm) + kvm_run-request_interrupt_window + !kvm_cpu_has_interrupt(vcpu)) { kvm_run-exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN; return 0; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ab1fdac..8c730ad 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3065,7 +3065,7 @@ EXPORT_SYMBOL_GPL(kvm_emulate_cpuid); static int 
dm_request_for_irq_injection(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) { -
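The idea of the patch above is that callers ask one predicate whether an interrupt is pending, regardless of which irqchip model is in use. A simplified sketch under stated assumptions: struct demo_vcpu and both functions are hypothetical, the kernel-irqchip path is collapsed into a single pending vector (the real code consults the LAPIC and then the PIC), and the result is normalised to 0 or 1 whereas the patch returns irq_summary directly.

```c
/* Hypothetical, heavily reduced vcpu state; the real struct kvm_vcpu
 * carries far more than this. */
struct demo_vcpu {
    int irqchip_in_kernel;     /* nonzero: kernel models the interrupt controller */
    unsigned long irq_summary; /* userspace irqchip: nonzero when a vector is queued */
    int kernel_pending_vector; /* kernel irqchip: pending vector, or -1 for none */
};

/* Single predicate for both models, in the spirit of
 * kvm_cpu_has_interrupt() after the patch. */
static int demo_cpu_has_interrupt(const struct demo_vcpu *v)
{
    if (!v->irqchip_in_kernel)
        return v->irq_summary != 0;
    return v->kernel_pending_vector != -1;
}

/* Caller-side decision, shaped like the checks in do_interrupt_requests():
 * with the window closed, ask for an exit if there is anything to inject
 * or if userspace requested the window. */
static int demo_should_open_irq_window(const struct demo_vcpu *v,
                                       int window_open, int request_window)
{
    return !window_open &&
           (demo_cpu_has_interrupt(v) || request_window);
}
```

With the predicate in place, the SVM and VMX callers no longer need to read irq_summary themselves, which is what most of the hunks in this commit change.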
[COMMIT master] KVM: VMX: Consolidate userspace and kernel interrupt injection for VMX
From: Gleb Natapov g...@redhat.com Use the same callback to inject irq/nmi events no matter what irqchip is in use. Only from VMX for now. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index cb306cf..5edae35 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -517,7 +517,7 @@ struct kvm_x86_ops { void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr, bool has_error_code, u32 error_code); bool (*exception_injected)(struct kvm_vcpu *vcpu); - void (*inject_pending_irq)(struct kvm_vcpu *vcpu); + void (*inject_pending_irq)(struct kvm_vcpu *vcpu, struct kvm_run *run); void (*inject_pending_vectors)(struct kvm_vcpu *vcpu, struct kvm_run *run); int (*interrupt_allowed)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 1903c27..674a249 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2296,7 +2296,7 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu) (svm-vcpu.arch.hflags HF_GIF_MASK); } -static void svm_intr_assist(struct kvm_vcpu *vcpu) +static void svm_intr_assist(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) { struct vcpu_svm *svm = to_svm(vcpu); struct vmcb *vmcb = svm-vmcb; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b3292c1..06252f7 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2510,48 +2510,6 @@ static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu) return vcpu-arch.interrupt_window_open; } -static void do_interrupt_requests(struct kvm_vcpu *vcpu, - struct kvm_run *kvm_run) -{ - vmx_update_window_states(vcpu); - - if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) - vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO, - GUEST_INTR_STATE_STI | - GUEST_INTR_STATE_MOV_SS); - - if (vcpu-arch.nmi_pending !vcpu-arch.nmi_injected) { - if (vcpu-arch.interrupt.pending) { - enable_nmi_window(vcpu); - } else if (vcpu-arch.nmi_window_open) { - 
vcpu-arch.nmi_pending = false; - vcpu-arch.nmi_injected = true; - } else { - enable_nmi_window(vcpu); - return; - } - } - if (vcpu-arch.nmi_injected) { - vmx_inject_nmi(vcpu); - if (vcpu-arch.nmi_pending) - enable_nmi_window(vcpu); - else if (kvm_cpu_has_interrupt(vcpu) || -kvm_run-request_interrupt_window) - enable_irq_window(vcpu); - return; - } - - if (vcpu-arch.interrupt_window_open) { - if (kvm_cpu_has_interrupt(vcpu) !vcpu-arch.interrupt.pending) - kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu)); - - if (vcpu-arch.interrupt.pending) - vmx_inject_irq(vcpu, vcpu-arch.interrupt.nr); - } else if(kvm_cpu_has_interrupt(vcpu) || - kvm_run-request_interrupt_window) - enable_irq_window(vcpu); -} - static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr) { int ret; @@ -3351,8 +3309,11 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) } } -static void vmx_intr_assist(struct kvm_vcpu *vcpu) +static void vmx_intr_assist(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) { + bool req_int_win = !irqchip_in_kernel(vcpu-kvm) + kvm_run-request_interrupt_window; + update_tpr_threshold(vcpu); vmx_update_window_states(vcpu); @@ -3373,25 +3334,25 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu) return; } } + if (vcpu-arch.nmi_injected) { vmx_inject_nmi(vcpu); - if (vcpu-arch.nmi_pending) - enable_nmi_window(vcpu); - else if (kvm_cpu_has_interrupt(vcpu)) - enable_irq_window(vcpu); - return; + goto out; } + if (!vcpu-arch.interrupt.pending kvm_cpu_has_interrupt(vcpu)) { if (vcpu-arch.interrupt_window_open) kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu)); - else - enable_irq_window(vcpu); } - if (vcpu-arch.interrupt.pending) { + + if (vcpu-arch.interrupt.pending) vmx_inject_irq(vcpu, vcpu-arch.interrupt.nr); - if (kvm_cpu_has_interrupt(vcpu)) - enable_irq_window(vcpu); - } + +out: + if (vcpu-arch.nmi_pending) +
[COMMIT master] KVM: Remove exception_injected() callback.
From: Gleb Natapov g...@redhat.com

It always returns false for VMX/SVM now.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5edae35..ea3741e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -516,7 +516,6 @@ struct kvm_x86_ops {
 	void (*set_irq)(struct kvm_vcpu *vcpu, int vec);
 	void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
 				bool has_error_code, u32 error_code);
-	bool (*exception_injected)(struct kvm_vcpu *vcpu);
 	void (*inject_pending_irq)(struct kvm_vcpu *vcpu, struct kvm_run *run);
 	void (*inject_pending_vectors)(struct kvm_vcpu *vcpu,
 				       struct kvm_run *run);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7b6ab16..872787b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -196,11 +196,6 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	svm->vmcb->control.event_inj_err = error_code;
 }
 
-static bool svm_exception_injected(struct kvm_vcpu *vcpu)
-{
-	return false;
-}
-
 static int is_external_interrupt(u32 info)
 {
 	info &= SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID;
@@ -2657,7 +2652,6 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.get_irq = svm_get_irq,
 	.set_irq = svm_set_irq,
 	.queue_exception = svm_queue_exception,
-	.exception_injected = svm_exception_injected,
 	.inject_pending_irq = svm_intr_assist,
 	.inject_pending_vectors = svm_intr_assist,
 	.interrupt_allowed = svm_interrupt_allowed,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9eb518f..3186fcf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -789,11 +789,6 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
 }
 
-static bool vmx_exception_injected(struct kvm_vcpu *vcpu)
-{
-	return false;
-}
-
 /*
  * Swap MSR entry in host/guest MSR entry array.
  */
@@ -3697,7 +3692,6 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.get_irq = vmx_get_irq,
 	.set_irq = vmx_inject_irq,
 	.queue_exception = vmx_queue_exception,
-	.exception_injected = vmx_exception_injected,
 	.inject_pending_irq = vmx_intr_assist,
 	.inject_pending_vectors = vmx_intr_assist,
 	.interrupt_allowed = vmx_interrupt_allowed,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0ecd238..f20e1e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3233,8 +3233,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 		profile_hit(KVM_PROFILING, (void *)rip);
 	}
 
-	if (vcpu->arch.exception.pending && kvm_x86_ops->exception_injected(vcpu))
-		vcpu->arch.exception.pending = false;
 
 	kvm_lapic_sync_from_vapic(vcpu);
 
[COMMIT master] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
From: Avi Kivity a...@redhat.com

Conflicts:
	arch/ia64/kvm/kvm-ia64.c
	arch/x86/kvm/mmu.c
	include/linux/kvm.h

Signed-off-by: Avi Kivity a...@redhat.com
[COMMIT master] KVM: Use kvm_arch_interrupt_allowed() instead of checking interrupt_window_open directly
From: Gleb Natapov g...@redhat.com

kvm_arch_interrupt_allowed() also checks IF so drop the check.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1727829..0ecd238 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3067,8 +3067,7 @@ static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu,
 {
 	return (!irqchip_in_kernel(vcpu->kvm) && !kvm_cpu_has_interrupt(vcpu) &&
 		kvm_run->request_interrupt_window &&
-		vcpu->arch.interrupt_window_open &&
-		(kvm_x86_ops->get_rflags(vcpu) & X86_EFLAGS_IF));
+		kvm_arch_interrupt_allowed(vcpu));
 }
 
 static void post_kvm_run_save(struct kvm_vcpu *vcpu,
@@ -3081,7 +3080,7 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu,
 		kvm_run->ready_for_interrupt_injection = 1;
 	else
 		kvm_run->ready_for_interrupt_injection =
-			(vcpu->arch.interrupt_window_open &&
+			(kvm_arch_interrupt_allowed(vcpu) &&
 			 !kvm_cpu_has_interrupt(vcpu));
 }
 
[COMMIT master] KVM: Remove inject_pending_vectors() callback
From: Gleb Natapov g...@redhat.com

It is the same as inject_pending_irq() for VMX/SVM now.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ea3741e..aa5a54e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -517,8 +517,6 @@ struct kvm_x86_ops {
 	void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
 				bool has_error_code, u32 error_code);
 	void (*inject_pending_irq)(struct kvm_vcpu *vcpu, struct kvm_run *run);
-	void (*inject_pending_vectors)(struct kvm_vcpu *vcpu,
-				       struct kvm_run *run);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 872787b..d0e4d98 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2653,7 +2653,6 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.set_irq = svm_set_irq,
 	.queue_exception = svm_queue_exception,
 	.inject_pending_irq = svm_intr_assist,
-	.inject_pending_vectors = svm_intr_assist,
 	.interrupt_allowed = svm_interrupt_allowed,
 
 	.set_tss_addr = svm_set_tss_addr,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3186fcf..9162b4c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3693,7 +3693,6 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.set_irq = vmx_inject_irq,
 	.queue_exception = vmx_queue_exception,
 	.inject_pending_irq = vmx_intr_assist,
-	.inject_pending_vectors = vmx_intr_assist,
 	.interrupt_allowed = vmx_interrupt_allowed,
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f20e1e4..1770b02 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3167,10 +3167,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	if (vcpu->arch.exception.pending)
 		__queue_exception(vcpu);
-	else if (irqchip_in_kernel(vcpu->kvm))
-		kvm_x86_ops->inject_pending_irq(vcpu, kvm_run);
 	else
-		kvm_x86_ops->inject_pending_vectors(vcpu, kvm_run);
+		kvm_x86_ops->inject_pending_irq(vcpu, kvm_run);
 
 	kvm_lapic_sync_to_vapic(vcpu);
 
[COMMIT master] KVM: SVM: Add NMI injection support
From: Gleb Natapov g...@redhat.com Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 53533ea..eb140aa 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -763,6 +763,7 @@ enum { #define HF_GIF_MASK(1 0) #define HF_HIF_MASK(1 1) #define HF_VINTR_MASK (1 2) +#define HF_NMI_MASK(1 3) /* * Hardware virtualization extension instructions may fault if a diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 1b09ef5..50c1db9 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1841,6 +1841,14 @@ static int cpuid_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) return 1; } +static int iret_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) +{ + ++svm-vcpu.stat.nmi_window_exits; + svm-vmcb-control.intercept = ~(1UL INTERCEPT_IRET); + svm-vcpu.arch.hflags = ~HF_NMI_MASK; + return 1; +} + static int invlpg_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run) { if (emulate_instruction(svm-vcpu, kvm_run, 0, 0, 0) != EMULATE_DONE) @@ -2118,6 +2126,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm, [SVM_EXIT_VINTR]= interrupt_window_interception, /* [SVM_EXIT_CR0_SEL_WRITE] = emulate_on_interception, */ [SVM_EXIT_CPUID]= cpuid_interception, + [SVM_EXIT_IRET] = iret_interception, [SVM_EXIT_INVD] = emulate_on_interception, [SVM_EXIT_HLT] = halt_interception, [SVM_EXIT_INVLPG] = invlpg_interception, @@ -2225,6 +2234,13 @@ static void pre_svm_run(struct vcpu_svm *svm) new_asid(svm, svm_data); } +static void svm_inject_nmi(struct vcpu_svm *svm) +{ + svm-vmcb-control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI; + vcpu-arch.hflags |= HF_NMI_MASK; + svm-vmcb-control.intercept |= (1UL INTERCEPT_IRET); + ++vcpu-stat.nmi_injections; +} static inline void svm_inject_irq(struct vcpu_svm *svm, int irq) { @@ -2276,6 +2292,14 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu) 
vmcb-control.intercept_cr_write |= INTERCEPT_CR8_MASK; } +static int svm_nmi_allowed(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *svm = to_svm(vcpu); + struct vmcb *vmcb = svm-vmcb; + return !(vmcb-control.int_state SVM_INTERRUPT_SHADOW_MASK) + !(svm-vcpu.arch.hflags HF_NMI_MASK); +} + static int svm_interrupt_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -2291,16 +2315,35 @@ static void enable_irq_window(struct kvm_vcpu *vcpu) svm_inject_irq(to_svm(vcpu), 0x0); } +static void enable_nmi_window(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *svm = to_svm(vcpu); + + if (svm-vmcb-control.int_state SVM_INTERRUPT_SHADOW_MASK) + enable_irq_window(vcpu); +} + static void svm_intr_inject(struct kvm_vcpu *vcpu) { /* try to reinject previous events if any */ + if (vcpu-arch.nmi_injected) { + svm_inject_nmi(to_svm(vcpu)); + return; + } + if (vcpu-arch.interrupt.pending) { svm_queue_irq(to_svm(vcpu), vcpu-arch.interrupt.nr); return; } /* try to inject new event if pending */ - if (kvm_cpu_has_interrupt(vcpu)) { + if (vcpu-arch.nmi_pending) { + if (svm_nmi_allowed(vcpu)) { + vcpu-arch.nmi_pending = false; + vcpu-arch.nmi_injected = true; + svm_inject_nmi(vcpu); + } + } else if (kvm_cpu_has_interrupt(vcpu)) { if (svm_interrupt_allowed(vcpu)) { kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu)); svm_queue_irq(to_svm(vcpu), vcpu-arch.interrupt.nr); @@ -2319,7 +2362,10 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) svm_intr_inject(vcpu); - if (kvm_cpu_has_interrupt(vcpu) || req_int_win) + /* enable NMI/IRQ window open exits if needed */ + if (vcpu-arch.nmi_pending) + enable_nmi_window(vcpu); + else if (kvm_cpu_has_interrupt(vcpu) || req_int_win) enable_irq_window(vcpu); out: -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
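The NMI masking scheme in this patch (inject only when neither the interrupt shadow nor the NMI mask is set, then arm the IRET intercept and clear the mask when the guest executes IRET) can be modelled as a small state machine. What follows is a minimal sketch under stated assumptions, not the kernel code: struct nmi_state and the three helper functions are hypothetical names, and the boolean fields merely stand in for HF_NMI_MASK, SVM_INTERRUPT_SHADOW_MASK and the INTERCEPT_IRET bit.

```c
#include <stdbool.h>

/* Hypothetical model of the per-vcpu NMI bookkeeping; not a KVM structure. */
struct nmi_state {
    bool nmi_pending;      /* an NMI was requested but not yet injected */
    bool nmi_masked;       /* stands in for HF_NMI_MASK (guest is in an NMI handler) */
    bool interrupt_shadow; /* stands in for SVM_INTERRUPT_SHADOW_MASK */
    bool iret_intercepted; /* stands in for the INTERCEPT_IRET bit */
    unsigned injections;   /* counts successful injections */
};

/* An NMI may be injected only outside the interrupt shadow and NMI handler. */
static bool nmi_allowed(const struct nmi_state *s)
{
    return !s->interrupt_shadow && !s->nmi_masked;
}

/* Called before guest entry; returns true if an NMI was injected. */
static bool try_inject_nmi(struct nmi_state *s)
{
    if (!s->nmi_pending || !nmi_allowed(s))
        return false;
    s->nmi_pending = false;
    s->nmi_masked = true;        /* further NMIs are blocked until IRET */
    s->iret_intercepted = true;  /* ask for an exit when the handler returns */
    s->injections++;
    return true;
}

/* Called when the guest executes IRET while the intercept is armed. */
static void on_iret_exit(struct nmi_state *s)
{
    s->iret_intercepted = false;
    s->nmi_masked = false;
}
```

A second NMI that arrives while the first handler is still running stays pending until on_iret_exit() has run, which is the behaviour the IRET intercept is there to provide.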
[COMMIT master] KVM: Do not report TPR write to userspace if new value bigger or equal to a previous one.
From: Gleb Natapov g...@redhat.com

Saves many exits to userspace in a case of IRQ chip in userspace.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 52c99a8..a85a145 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1860,9 +1860,13 @@ static int emulate_on_interception(struct vcpu_svm *svm,
 
 static int cr8_write_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
+	u8 cr8_prev = kvm_get_cr8(&svm->vcpu);
+	/* instruction emulation calls kvm_set_cr8() */
 	emulate_instruction(&svm->vcpu, NULL, 0, 0, 0);
 	if (irqchip_in_kernel(svm->vcpu.kvm))
 		return 1;
+	if (cr8_prev <= kvm_get_cr8(&svm->vcpu))
+		return 1;
 	kvm_run->exit_reason = KVM_EXIT_SET_TPR;
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9162b4c..51f804c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2724,13 +2724,18 @@ static int handle_cr(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 			kvm_set_cr4(vcpu, kvm_register_read(vcpu, reg));
 			skip_emulated_instruction(vcpu);
 			return 1;
-		case 8:
-			kvm_set_cr8(vcpu, kvm_register_read(vcpu, reg));
-			skip_emulated_instruction(vcpu);
-			if (irqchip_in_kernel(vcpu->kvm))
-				return 1;
-			kvm_run->exit_reason = KVM_EXIT_SET_TPR;
-			return 0;
+		case 8: {
+				u8 cr8_prev = kvm_get_cr8(vcpu);
+				u8 cr8 = kvm_register_read(vcpu, reg);
+				kvm_set_cr8(vcpu, cr8);
+				skip_emulated_instruction(vcpu);
+				if (irqchip_in_kernel(vcpu->kvm))
+					return 1;
+				if (cr8_prev <= cr8)
+					return 1;
+				kvm_run->exit_reason = KVM_EXIT_SET_TPR;
+				return 0;
+			}
 		};
 		break;
 	case 2: /* clts */
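The exit-filtering rule of this patch can be looked at in isolation: userspace only needs to hear about a TPR write when the irqchip lives in userspace and the new value is lower than the old one, since only lowering the TPR can make a previously masked interrupt deliverable. The following is a minimal sketch, not the kernel code. The helper name tpr_write_needs_userspace_exit is hypothetical, and its boolean parameter stands in for the irqchip_in_kernel() check.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a KVM function): decide whether a guest CR8/TPR
 * write has to be reported to userspace as KVM_EXIT_SET_TPR.
 */
static bool tpr_write_needs_userspace_exit(bool irqchip_in_kernel,
                                           uint8_t cr8_prev, uint8_t cr8_new)
{
    if (irqchip_in_kernel)
        return false;   /* the in-kernel APIC already saw the new TPR */
    if (cr8_prev <= cr8_new)
        return false;   /* TPR raised or unchanged: nothing gets unmasked */
    return true;        /* TPR lowered: userspace irqchip must re-evaluate */
}
```

With a userspace irqchip, a guest that raises the TPR from 2 to 5 no longer exits, while lowering it from 5 back to 2 still does.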
[COMMIT master] KVM: Enable snooping control for supported hardware
From: Sheng Yang sh...@linux.intel.com Memory aliases with different memory type is a problem for guest. For the guest without assigned device, the memory type of guest memory would always been the same as host(WB); but for the assigned device, some part of memory may be used as DMA and then set to uncacheable memory type(UC/WC), which would be a conflict of host memory type then be a potential issue. Snooping control can guarantee the cache correctness of memory go through the DMA engine of VT-d. Signed-off-by: Sheng Yang sh...@linux.intel.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 8a6f6b6..8e680c3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -393,6 +393,8 @@ struct kvm_arch{ struct list_head active_mmu_pages; struct list_head assigned_dev_head; struct iommu_domain *iommu_domain; +#define KVM_IOMMU_CACHE_COHERENCY 0x1 + int iommu_flags; struct kvm_pic *vpic; struct kvm_ioapic *vioapic; struct kvm_pit *vpit; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 59b080c..e8a5649 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3581,11 +3581,26 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) { u64 ret; + /* For VT-d and EPT combination +* 1. MMIO: always map as UC +* 2. EPT with VT-d: +* a. VT-d without snooping control feature: can't guarantee the +* result, try to trust guest. +* b. VT-d with snooping control feature: snooping control feature of +* VT-d engine can guarantee the cache correctness. Just set it +* to WB to keep consistent with host. So the same as item 3. +* 3. 
EPT without VT-d: always map as WB and set IGMT=1 to keep +*consistent with host MTRR +*/ if (is_mmio) ret = MTRR_TYPE_UNCACHABLE VMX_EPT_MT_EPTE_SHIFT; + else if (vcpu-kvm-arch.iommu_domain + !(vcpu-kvm-arch.iommu_flags KVM_IOMMU_CACHE_COHERENCY)) + ret = kvm_get_guest_memory_type(vcpu, gfn) + VMX_EPT_MT_EPTE_SHIFT; else - ret = (kvm_get_guest_memory_type(vcpu, gfn) - VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IGMT_BIT; + ret = (MTRR_TYPE_WRBACK VMX_EPT_MT_EPTE_SHIFT) + | VMX_EPT_IGMT_BIT; return ret; } diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index 4c40375..1514758 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c @@ -39,11 +39,16 @@ int kvm_iommu_map_pages(struct kvm *kvm, pfn_t pfn; int i, r = 0; struct iommu_domain *domain = kvm-arch.iommu_domain; + int flags; /* check if iommu exists and in use */ if (!domain) return 0; + flags = IOMMU_READ | IOMMU_WRITE; + if (kvm-arch.iommu_flags KVM_IOMMU_CACHE_COHERENCY) + flags |= IOMMU_CACHE; + for (i = 0; i npages; i++) { /* check if already mapped */ if (iommu_iova_to_phys(domain, gfn_to_gpa(gfn))) @@ -53,8 +58,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, r = iommu_map_range(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn), - PAGE_SIZE, - IOMMU_READ | IOMMU_WRITE); + PAGE_SIZE, flags); if (r) { printk(KERN_ERR kvm_iommu_map_address: iommu failed to map pfn=%lx\n, pfn); @@ -88,7 +92,7 @@ int kvm_assign_device(struct kvm *kvm, { struct pci_dev *pdev = NULL; struct iommu_domain *domain = kvm-arch.iommu_domain; - int r; + int r, last_flags; /* check if iommu exists and in use */ if (!domain) @@ -107,12 +111,29 @@ int kvm_assign_device(struct kvm *kvm, return r; } + last_flags = kvm-arch.iommu_flags; + if (iommu_domain_has_cap(kvm-arch.iommu_domain, +IOMMU_CAP_CACHE_COHERENCY)) + kvm-arch.iommu_flags |= KVM_IOMMU_CACHE_COHERENCY; + + /* Check if need to update IOMMU page table for guest memory */ + if ((last_flags ^ kvm-arch.iommu_flags) == + KVM_IOMMU_CACHE_COHERENCY) { + kvm_iommu_unmap_memslots(kvm); + r = 
kvm_iommu_map_memslots(kvm); + if (r) + goto out_unmap; + } + printk(KERN_DEBUG assign device: host bdf = %x:%x:%x\n, assigned_dev-host_busnr, PCI_SLOT(assigned_dev-host_devfn), PCI_FUNC(assigned_dev-host_devfn)); return 0; +out_unmap: + kvm_iommu_unmap_memslots(kvm); +
Implement generic double fault generation mechanism
Move Double-Fault generation logic out of page fault
exception generating function to cover more generic case.

Signed-off-by: Eddie Dong eddie.d...@intel.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab1fdac..51a8dad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -162,12 +162,59 @@ void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_set_apic_base);
 
+#define EXCPT_BENIGN		0
+#define EXCPT_CONTRIBUTORY	1
+#define EXCPT_PF		2
+
+static int exception_class(int vector)
+{
+	if (vector == 14)
+		return EXCPT_PF;
+	else if (vector == 0 || (vector >= 10 && vector <= 13))
+		return EXCPT_CONTRIBUTORY;
+	else
+		return EXCPT_BENIGN;
+}
+
+static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
+		unsigned nr, bool has_error, u32 error_code)
+{
+	u32 prev_nr;
+	int class1, class2;
+
+	if (!vcpu->arch.exception.pending) {
+		vcpu->arch.exception.pending = true;
+		vcpu->arch.exception.has_error_code = has_error;
+		vcpu->arch.exception.nr = nr;
+		vcpu->arch.exception.error_code = error_code;
+		return;
+	}
+
+	/* to check exception */
+	prev_nr = vcpu->arch.exception.nr;
+	class2 = exception_class(nr);
+	class1 = exception_class(prev_nr);
+	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
+		|| (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
+		/* generate double fault per SDM Table 5-5 */
+		printk(KERN_DEBUG "kvm: double fault 0x%x on 0x%x\n",
+			prev_nr, nr);
+		vcpu->arch.exception.pending = true;
+		vcpu->arch.exception.has_error_code = 1;
+		vcpu->arch.exception.nr = DF_VECTOR;
+		vcpu->arch.exception.error_code = 0;
+		if (prev_nr == DF_VECTOR) {
+			/* triple fault - shutdown */
+			set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
+		}
+	} else
+		printk(KERN_ERR "Exception 0x%x on 0x%x happens serially\n",
+			prev_nr, nr);
+}
+
 void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
-	WARN_ON(vcpu->arch.exception.pending);
-	vcpu->arch.exception.pending = true;
-	vcpu->arch.exception.has_error_code = false;
-	vcpu->arch.exception.nr = nr;
+	kvm_multiple_exception(vcpu, nr, false, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception);
 
@@ -176,18 +223,6 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 {
 	++vcpu->stat.pf_guest;
 
-	if (vcpu->arch.exception.pending) {
-		if (vcpu->arch.exception.nr == PF_VECTOR) {
-			printk(KERN_DEBUG "kvm: inject_page_fault:"
-					" double fault 0x%lx\n", addr);
-			vcpu->arch.exception.nr = DF_VECTOR;
-			vcpu->arch.exception.error_code = 0;
-		} else if (vcpu->arch.exception.nr == DF_VECTOR) {
-			/* triple fault - shutdown */
-			set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
-		}
-		return;
-	}
 	vcpu->arch.cr2 = addr;
 	kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
 }
@@ -200,11 +235,7 @@ EXPORT_SYMBOL_GPL(kvm_inject_nmi);
 
 void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
 {
-	WARN_ON(vcpu->arch.exception.pending);
-	vcpu->arch.exception.pending = true;
-	vcpu->arch.exception.has_error_code = true;
-	vcpu->arch.exception.nr = nr;
-	vcpu->arch.exception.error_code = error_code;
+	kvm_multiple_exception(vcpu, nr, true, error_code);
 }
 EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
 
irq3.patch
Description: irq3.patch
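The classification rule the patch takes from the SDM double-fault table can be exercised outside the kernel. The following is a minimal standalone sketch, not the patch itself: enum merge_outcome and merge_exceptions() are hypothetical names introduced for illustration. One deliberate difference is labelled as an assumption: the sketch checks for a pending #DF before consulting the table, whereas the patch as posted tests prev_nr == DF_VECTOR inside the double-fault branch, where a pending #DF (classified as benign by exception_class()) would not appear to reach it.

```c
#include <stdbool.h>

enum { DE_VECTOR = 0, DF_VECTOR = 8, TS_VECTOR = 10, GP_VECTOR = 13, PF_VECTOR = 14 };

enum excpt_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

/* Same classification as exception_class() in the patch. */
static enum excpt_class exception_class(int vector)
{
    if (vector == PF_VECTOR)
        return EXCPT_PF;
    if (vector == DE_VECTOR || (vector >= TS_VECTOR && vector <= GP_VECTOR))
        return EXCPT_CONTRIBUTORY;
    return EXCPT_BENIGN;
}

/* Hypothetical result type: what to do with a second exception. */
enum merge_outcome { HANDLE_SERIALLY, DOUBLE_FAULT, TRIPLE_FAULT };

/*
 * Decide what happens when exception `next` is raised while `prev` is
 * still pending. Assumption: a contributory exception or page fault on
 * top of a pending #DF is treated as a triple fault.
 */
static enum merge_outcome merge_exceptions(int prev, int next)
{
    enum excpt_class c1 = exception_class(prev);
    enum excpt_class c2 = exception_class(next);

    if (prev == DF_VECTOR)
        return c2 != EXCPT_BENIGN ? TRIPLE_FAULT : HANDLE_SERIALLY;
    if ((c1 == EXCPT_CONTRIBUTORY && c2 == EXCPT_CONTRIBUTORY) ||
        (c1 == EXCPT_PF && c2 != EXCPT_BENIGN))
        return DOUBLE_FAULT;
    return HANDLE_SERIALLY;
}
```

For example, a #GP raised while a #PF is pending yields a double fault, while a #PF raised while a #GP is pending is handled serially.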
[ kvm-Bugs-2638990 ] Segfault 284
Bugs item #2638990, was opened at 2009-02-25 23:35
Message generated for change (Comment added) made by ivanvimes
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2638990&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: David Rasche (drasche2)
Assigned to: Nobody/Anonymous (nobody)
Summary: Segfault 284

Initial Comment:
Host
(2) Intel Xeon (E5430) Quad Core Processors (2.66GHz)
16G mem
Host OS: Ubuntu 8.10 (64bit)
kvm-72
libvirt 0.4.4

Guest OS
Win2k3 Server (32 bit)

After running for 8 to 48 hours, Win2k3 guest system crashes with no warning. Syslog shows the following segmentation fault:

Feb 25 16:12:02 host-b kernel: [448190.415857] kvm[25511]: segfault at 284 ip 0043 386f sp 7fff97fa3a70 error 4 in kvm[40+19e000]

This error has been confirmed on 2 different machines with exactly the same setup.

We are running KVM through libvirt with the following XML setup.

<domain type='kvm'>
  <name>exchange</name>
  <uuid>e8d93082-c1db-426c-9ad3-ae651095ceb5</uuid>
  <memory>4096000</memory>
  <currentMemory>4096000</currentMemory>
  <vcpu>3</vcpu>
  <os>
    <type>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='localtime'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/mnt/vg0/lvol3/exchange.qcow2'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/vg1/lv_exchdb'/>
      <target dev='hdb' bus='ide'/>
    </disk>
    <disk type='file' device='cdrom'>
      <target dev='hdc' bus='ide'/>
      <readonly/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/vg2/lv_exchlog'/>
      <target dev='hdd' bus='ide'/>
    </disk>
    <interface type='bridge'>
      <mac address='00:0c:29:cf:71:e4'/>
      <source bridge='br0'/>
    </interface>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5900' listen='127.0.0.1'/>
  </devices>
</domain>

----------------------------------------------------------------------

Comment By: Simon Jagoe (ivanvimes)
Date: 2009-04-30 09:09

Message:
I am running an Ubuntu Hardy guest in KVM-72 (Ubuntu Intrepid host), and got a similar segfault:

Apr 30 04:16:37 hare kernel: [726803.676199] kvm[4930]: segfault at 284 ip 0043386f sp 7fff1d240dd0 error 4 in kvm[40+19e000]

My hardware details are as follows:
HP ProLiant ML110 G5
Intel Xeon CPU 3065 2.33GHz (Dual core)
4GB RAM

I have four guests on the system, all Ubuntu Hardy. Only one of these appears to have crashed. It is allocated one VCPU and 1024 MB of RAM. The others are:
* 2 VCPUs 1024MB RAM
* 2 VCPUs 256MB RAM
* 1 VCPU 256MB RAM

Additionally, I am also using libvirt. My (crashed) domain's XML looks like this:

<domain type='kvm'>
  <name>partridge</name>
  <uuid>4f3bae26-359b-f633-9476-9d95fc2160b0</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <source dev='/dev/hare/partridge_root'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/hare/partridge_home'/>
      <target dev='hdd' bus='ide'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/hare/partridge_opt'/>
      <target dev='hdc' bus='ide'/>
    </disk>
    <disk type='block' device='disk'>
      <source dev='/dev/hare/partridge_var'/>
      <target dev='hdb' bus='ide'/>
    </disk>
    <interface type='bridge'>
      <mac address='00:16:3e:30:99:7c'/>
      <source bridge='br0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5900' listen='127.0.0.1'/>
  </devices>
</domain>

Please let me know if I can provide more information on this. I will likely upgrade to Ubuntu Jaunty this week (and with it KVM-84).

----------------------------------------------------------------------

Comment By: David Rasche (drasche2)
Date: 2009-03-09 17:07

Message:
I just updated one of our machines with KVM-84 and I am getting the exact same segfault as reported above. This is critical. We are trying to run a win2k3 server with exchange and it keeps crashing.

----------------------------------------------------------------------

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info
Re: Missing symlink in qemu-kvm.git?
walt wrote:

When building on x86 I get this error:

make[2]: Entering directory `/home/wa1ter/src/qemu-kvm/kvm/libkvm'
make[2]: *** No rule to make target `/home/wa1ter/src/qemu-kvm/kvm/kernel/include/asm/kvm.h', needed by `libkvm.o'.

I fixed it by adding the same symlink that I add to Linus's kernel.git for exactly the same reason:

#cd qemu-kvm/kvm/kernel/include
#ln -s ../arch/x86/include/asm asm    [there was no asm directory here]

Am I the only one who has this problem?

This is already fixed. Pull again and retry.

--
error compiling committee.c: too many arguments to function
Re: Boot problems with qemu-kvm
Xu, Jiajun wrote:

Yes. If booting guest with -no-kvm, X display can work well. And I am using bridge network, so still can not get network up. :( And qemu cpu utilization is still ~100%.

The last merge with qemu.git broke both vga and networking. Mark fixed networking, we're still looking at vga.

--
error compiling committee.c: too many arguments to function
[RFC 1/2] Add MCE simulation support to qemu/tcg
- MCE features are initialized when VCPU is intialized according to CPUID. - A monitor command mce is added to inject a MCE. - A new interrupt mask: CPU_INTERRUPT_MCE is added to inject the MCE. Signed-off-by: Huang Ying ying.hu...@intel.com --- cpu-all.h |4 ++ cpu-exec.c |4 ++ monitor.c | 49 + target-i386/cpu.h | 22 +++ target-i386/helper.c| 70 target-i386/op_helper.c | 34 +++ 6 files changed, 183 insertions(+) --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -202,6 +202,7 @@ #define CR4_DE_MASK (1 3) #define CR4_PSE_MASK (1 4) #define CR4_PAE_MASK (1 5) +#define CR4_MCE_MASK (1 6) #define CR4_PGE_MASK (1 7) #define CR4_PCE_MASK (1 8) #define CR4_OSFXSR_SHIFT 9 @@ -248,6 +249,17 @@ #define PG_ERROR_RSVD_MASK 0x08 #define PG_ERROR_I_D_MASK 0x10 +#define MCE_CAP_DEF0x100 +#define MCE_BANKS_DEF 4 + +#define MCG_CTL_P (1UL8) + +#define MCG_STATUS_MCIP(1UL2) + +#define MCI_STATUS_VAL (1UL63) +#define MCI_STATUS_OVER(1UL62) +#define MCI_STATUS_UC (1UL61) + #define MSR_IA32_TSC0x10 #define MSR_IA32_APICBASE 0x1b #define MSR_IA32_APICBASE_BSP (18) @@ -288,6 +300,11 @@ #define MSR_MTRRdefType0x2ff +#define MSR_MC0_CTL0x400 +#define MSR_MC0_STATUS 0x401 +#define MSR_MC0_ADDR 0x402 +#define MSR_MC0_MISC 0x403 + #define MSR_EFER0xc080 #define MSR_EFER_SCE (1 0) @@ -673,6 +690,11 @@ typedef struct CPUX86State { /* in order to simplify APIC support, we leave this pointer to the user */ struct APICState *apic_state; + +uint64 mcg_cap; +uint64 mcg_status; +uint64 mcg_ctl; +uint64 *mce_banks; } CPUX86State; CPUX86State *cpu_x86_init(const char *cpu_model); --- a/target-i386/op_helper.c +++ b/target-i386/op_helper.c @@ -3133,7 +3133,23 @@ void helper_wrmsr(void) case MSR_MTRRdefType: env-mtrr_deftype = val; break; +case MSR_MCG_STATUS: +env-mcg_status = val; +break; +case MSR_MCG_CTL: +if ((env-mcg_cap MCG_CTL_P) + (val == 0 || val == ~(uint64_t)0)) +env-mcg_ctl = val; +break; default: +if ((uint32_t)ECX = MSR_MC0_CTL + (uint32_t)ECX MSR_MC0_CTL + (4 * env-mcg_cap 0xff)) { 
+uint32_t offset = (uint32_t)ECX - MSR_MC0_CTL; +if ((offset 0x3) != 0 +|| (val == 0 || val == ~(uint64_t)0)) +env-mce_banks[offset] = val; +break; +} /* XXX: exception ? */ break; } @@ -3252,7 +3268,25 @@ void helper_rdmsr(void) /* XXX: exception ? */ val = 0; break; +case MSR_MCG_CAP: +val = env-mcg_cap; +break; +case MSR_MCG_CTL: +if (env-mcg_cap MCG_CTL_P) +val = env-mcg_ctl; +else +val = 0; +break; +case MSR_MCG_STATUS: +val = env-mcg_status; +break; default: +if ((uint32_t)ECX = MSR_MC0_CTL + (uint32_t)ECX MSR_MC0_CTL + (4 * env-mcg_cap 0xff)) { +uint32_t offset = (uint32_t)ECX - MSR_MC0_CTL; +val = env-mce_banks[offset]; +break; +} /* XXX: exception ? */ val = 0; break; --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -1430,6 +1430,75 @@ static void breakpoint_handler(CPUState } #endif /* !CONFIG_USER_ONLY */ +/* This should come from sysemu.h - if we could include it here... */ +void qemu_system_reset_request(void); + +void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status, +uint64_t mcg_status, uint64_t addr, uint64_t misc) +{ +uint64_t mcg_cap = cenv-mcg_cap; +unsigned bank_num = mcg_cap 0xff; +uint64_t *banks = cenv-mce_banks; + +if (bank = bank_num || !(status MCI_STATUS_VAL)) +return; + +/* + * if MSR_MCG_CTL is not all 1s, the uncorrected error + * reporting is disabled + */ +if ((status MCI_STATUS_UC) (mcg_cap MCG_CTL_P) +cenv-mcg_ctl != ~(uint64_t)0) +return; +banks += 4 * bank; +/* + * if MSR_MCi_CTL is not all 1s, the uncorrected error + * reporting is disabled for the bank + */ +if ((status MCI_STATUS_UC) banks[0] != ~(uint64_t)0) +return; +if (status MCI_STATUS_UC) { +if ((cenv-mcg_status MCG_STATUS_MCIP) || +!(cenv-cr[4] CR4_MCE_MASK)) { +fprintf(stderr, injects mce exception while previous +one is in progress!\n); +qemu_log_mask(CPU_LOG_RESET, Triple fault\n); +qemu_system_reset_request(); +return; +
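The per-bank MSR layout this RFC relies on (four MSRs per bank starting at MSR_MC0_CTL = 0x400, with the bank count in the low byte of MCG_CAP) can be sketched on its own. This is a minimal illustration under stated assumptions, not code from the patch: struct mce_msr and decode_mce_bank_msr() are hypothetical names. The sketch parenthesises the bank-count mask explicitly as 4 * (mcg_cap & 0xff); as far as the garbled patch text can be read, the posted range check applies the mask after the multiplication.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_MC0_CTL        0x400u
#define MCE_REGS_PER_BANK  4u      /* CTL, STATUS, ADDR, MISC */

/* Hypothetical decoded form of a per-bank machine-check MSR index. */
struct mce_msr {
    unsigned bank;  /* bank number, 0-based */
    unsigned reg;   /* 0 = CTL, 1 = STATUS, 2 = ADDR, 3 = MISC */
};

/*
 * Map an MSR index to (bank, register). Returns false if the index is
 * outside the bank MSR range implied by the bank count in mcg_cap[7:0].
 */
static bool decode_mce_bank_msr(uint32_t msr, uint64_t mcg_cap,
                                struct mce_msr *out)
{
    uint32_t bank_count = (uint32_t)(mcg_cap & 0xff);
    uint32_t offset;

    if (msr < MSR_MC0_CTL ||
        msr >= MSR_MC0_CTL + MCE_REGS_PER_BANK * bank_count)
        return false;

    offset = msr - MSR_MC0_CTL;
    out->bank = offset / MCE_REGS_PER_BANK;
    out->reg = offset % MCE_REGS_PER_BANK;
    return true;
}
```

With the defaults from the patch (MCE_BANKS_DEF = 4), MSR 0x401 decodes to the STATUS register of bank 0 and 0x40e to the ADDR register of bank 3, while 0x410 is out of range.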
[RFC 2/2] Add MCE simulation support to qemu/kvm
MCE features are detected, initialized and injected via the corresponding KVM ioctl. Signed-off-by: Huang Ying ying.hu...@intel.com --- kvm-all.c| 24 ++ kvm.h|4 +++ target-i386/helper.c |8 +- target-i386/kvm.c| 67 ++- 4 files changed, 101 insertions(+), 2 deletions(-) --- a/kvm-all.c +++ b/kvm-all.c @@ -765,6 +765,30 @@ int kvm_has_sync_mmu(void) return 0; } +int kvm_has_mce(void) +{ +#ifdef KVM_CAP_MCE +KVMState *s = kvm_state; +int r; + +r = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_MCE); +if (r 0) +return r; +#endif +return 0; +} + +int kvm_get_mce_cap_supported(uint64_t *mce_cap) +{ +#ifdef KVM_CAP_MCE +KVMState *s = kvm_state; + +return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap); +#else +return -ENOSYS; +#endif +} + #ifdef KVM_CAP_SET_GUEST_DEBUG struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env, target_ulong pc) --- a/kvm.h +++ b/kvm.h @@ -47,6 +47,10 @@ int kvm_log_start(target_phys_addr_t phy int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size); int kvm_has_sync_mmu(void); +int kvm_has_mce(void); +int kvm_get_mce_cap_supported(uint64_t *mce_cap); +void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status, +uint64_t mcg_status, uint64_t addr, uint64_t misc); int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size); int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size); --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -1440,6 +1440,11 @@ void cpu_inject_x86_mce(CPUState *cenv, unsigned bank_num = mcg_cap 0xff; uint64_t *banks = cenv-mce_banks; +if (kvm_enabled()) { +kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc); +return; +} + if (bank = bank_num || !(status MCI_STATUS_VAL)) return; @@ -1757,7 +1762,8 @@ CPUX86State *cpu_x86_init(const char *cp cpu_x86_close(env); return NULL; } -mce_init(env); +if (!kvm_enabled()) +mce_init(env); cpu_reset(env); #ifdef CONFIG_KQEMU kqemu_init(env); --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -34,6 +34,42 @@ do { } 
while (0) #endif +static void kvm_arch_setup_mce(CPUState *env) +{ +int banks; +int ret; +uint64_t mcg_cap; + +#ifdef KVM_CAP_MCE +if (((env-cpuid_version 8) 0xf) 6) +return; + +if ((env-cpuid_features (CPUID_MCE|CPUID_MCA)) != (CPUID_MCE|CPUID_MCA)) +return; + +banks = kvm_has_mce(); +if (banks = 0) +return; + +ret = kvm_get_mce_cap_supported(mcg_cap); +if (ret) { +fprintf(stderr, kvm_get_mce_cap_supported FAILED\n); +return; +} + +if (banks MCE_BANKS_DEF) +banks = MCE_BANKS_DEF; +mcg_cap = MCE_CAP_DEF; +mcg_cap |= banks; + +if (kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap)) { +fprintf(stderr, kvm: setup mce FAILED\n); +return; +} +env-mcg_cap = mcg_cap; +#endif +} + int kvm_arch_init_vcpu(CPUState *env) { struct { @@ -42,6 +78,7 @@ int kvm_arch_init_vcpu(CPUState *env) } __attribute__((packed)) cpuid_data; uint32_t limit, i, j, cpuid_i; uint32_t unused; +int ret; cpuid_i = 0; @@ -107,7 +144,13 @@ int kvm_arch_init_vcpu(CPUState *env) cpuid_data.cpuid.nent = cpuid_i; -return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, cpuid_data); +ret = kvm_vcpu_ioctl(env, KVM_SET_CPUID2, cpuid_data); +if (ret 0) +return ret; + +kvm_arch_setup_mce(env); + +return 0; } static int kvm_has_msr_star(CPUState *env) @@ -665,6 +708,28 @@ int kvm_arch_handle_exit(CPUState *env, return ret; } +void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status, +uint64_t mcg_status, uint64_t addr, uint64_t misc) +{ +#ifdef KVM_CAP_MCE +struct kvm_x86_mce mce = { +.bank = bank, +.status = status, +.mcg_status = mcg_status, +.addr = addr, +.misc = misc, +}; +int ret; + +if (kvm_has_mce() = 0) +return; + +ret = kvm_vcpu_ioctl(cenv, KVM_X86_SET_MCE, mce); +if (ret 0) +fprintf(stderr, kvm: inject mce FAILED\n); +#endif +} + #ifdef KVM_CAP_SET_GUEST_DEBUG int kvm_arch_insert_sw_breakpoint(CPUState *env, struct kvm_sw_breakpoint *bp) { signature.asc Description: This is a digitally signed message part
Re: KVM performance vs. Xen
Andrew Theurer wrote:

I wanted to share some performance data for KVM and Xen. I thought it would be interesting to share some performance results especially compared to Xen, using a more complex situation like heterogeneous server consolidation.

The Workload:
The workload is one that simulates a consolidation of servers on to a single host. There are 3 server types: web, imap, and app (j2ee). In addition, there are other helper servers which are also consolidated: a db server, which helps out with the app server, and an nfs server, which helps out with the web server (a portion of the docroot is nfs mounted). There is also one other server that is simply idle. All 6 servers make up one set. The first 3 server types are sent requests, which in turn may send requests to the db and nfs helper servers. The request rate is throttled to produce a fixed amount of work. In order to increase utilization on the host, more sets of these servers are used. The clients which send requests also have a response time requirement which is monitored. The following results have passed the response time requirements.

What's the typical I/O load (disk and network bandwidth) while the tests are running?

The host hardware:
A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4 x 1 GB Ethernet

CPU time measurements with SMT can vary wildly if the system is not fully loaded. If the scheduler happens to schedule two threads on a single core, both of these threads will generate less work compared to if they were scheduled on different cores.

Test Results:
The throughput is equal in these tests, as the clients throttle the work (this is assuming you don't run out of a resource on the host). What's telling is the CPU used to do the same amount of work:

Xen: 52.85%
KVM: 66.93%

So, KVM requires 66.93/52.85 = 26.6% more CPU to do the same amount of work.
Here's the breakdown: totalusernice system irq softirq guest 66.907.200.00 12.940.353.39 43.02 Comparing guest time to all other busy time, that's a 23.88/43.02 = 55% overhead for virtualization. I certainly don't expect it to be 0, but 55% seems a bit high. So, what's the reason for this overhead? At the bottom is oprofile output of top functions for KVM. Some observations: 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seems a bit excessive? Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O. 2) cpu_physical_memory_rw due to not using preadv/pwritev? I think both virtio-net and virtio-blk use memcpy(). 3) vmx_[save|load]_host_state: I take it this is from guest switches? These are called when you context-switch from a guest, and, much more frequently, when you enter qemu. We have 180,000 context switches a second. Is this more than expected? Way more. Across 16 logical cpus, this is 10,000 cs/sec/cpu. I wonder if schedstats can show why we context switch (need to let someone else run, yielded, waiting on io, etc). Yes, there is a scheduler tracer, though I have no idea how to operate it. Do you have kvm_stat logs? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
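The ratios quoted in the message above can be re-derived from the posted figures. The following is only a sketch for checking the arithmetic; the function names are made up for illustration and nothing here comes from the benchmark tooling itself.

```python
# CPU utilisation figures quoted in the thread (percent of host CPU).
XEN_TOTAL = 52.85
KVM_TOTAL = 66.93

# KVM breakdown row from the thread; the columns add up to the 66.90 "total".
KVM_BREAKDOWN = {
    "user": 7.20, "nice": 0.00, "system": 12.94,
    "irq": 0.35, "softirq": 3.39, "guest": 43.02,
}


def extra_cpu_percent(kvm_total, xen_total):
    """Extra CPU that KVM needs relative to Xen, as a percentage."""
    return (kvm_total / xen_total - 1.0) * 100.0


def virtualization_overhead_percent(breakdown):
    """Non-guest busy time relative to guest time, as a percentage."""
    non_guest = sum(v for k, v in breakdown.items() if k != "guest")
    return non_guest / breakdown["guest"] * 100.0


def switches_per_cpu(total_per_sec, logical_cpus):
    """Context switches per second per logical CPU."""
    return total_per_sec / logical_cpus


print("extra CPU vs Xen: %.1f%%" % extra_cpu_percent(KVM_TOTAL, XEN_TOTAL))
print("overhead vs guest time: %.1f%%"
      % virtualization_overhead_percent(KVM_BREAKDOWN))
print("cs/sec/cpu: %.0f" % switches_per_cpu(180000, 16))
```

Note that 180,000 switches spread over 16 logical CPUs is 11,250 per CPU; the thread rounds this to 10,000.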
Re: Event channels in KVM?
Anthony Liguori wrote:
> Kapadia, Vivek wrote:
>> I came across this thread looking for an efficient event channel
>> mechanism between two guests (running on different cpu cores).
>>
>> While I can use available emulated IO mechanism (guest1->host kernel
>> driver->Qemu1->Qemu2) in conjunction with interrupt mechanism
>> (Qemu2->host kernel driver->guest2) in KVM, this involves several
>> context switches. Xen handles notifications in hypervisor via
>> hypercall and hence is likely more efficient.
>
> They almost certainly aren't more efficient.
>
> An event channel notification involves a hypercall to the hypervisor.
> When using VT, the performance difference between a vmcall exit vs. a
> pio exit is quite small (especially compared to the overhead of the
> exit). We're talking in the order of nanoseconds compared to
> microseconds.
>
> What makes KVM particularly different from Xen is that in KVM, the PIO
> operation results in a direct transition to QEMU. In Xen, typically
> event channel notifications result in a bit being set in a bitmap
> which then results in an interrupt injection depending on the next
> opportunity the hypervisor has to schedule/run the receiving domain.
> This is not deterministic and can potentially be a very long period of
> time.
>
> Event channels are inherently asynchronous whereas PIO notifications
> in KVM are synchronous. Since the scheduler isn't involved and control
> never leaves the CPU, the KVM PIO notifications are actually extremely
> efficient. IMHO, it's one of KVM's best design features.

If you make the pio operation wake up another guest, then the operation becomes asynchronous. There's really no fundamental difference between Xen and kvm here, and both will require the same number of context switches (one) to transfer control.

Handling a pio that is completely internal to the guest is different (Xen has to schedule dom0 or the stub domain), but that's not related to interguest communications.
Re: kvm-kmod.git via http
Bernhard Kohl wrote:
> I'm trying to clone this new repository using the http protocol
> because I'm behind a proxy. I get the following error. For kvm.git
> and qemu-kvm.git this works well.
>
> $ git clone http://www.kernel.org/pub/scm/virt/kvm/kvm-kmod.git
> Initialized empty Git repository in /home/bernd/src/kvm-kmod/.git/
> fatal: http://www.kernel.org/pub/scm/virt/kvm/kvm-kmod.git/info/refs not found: did you run git update-server-info on the server?
> $
>
> Thanks

Should work now.
Re: [PATCH 03/04] qemu-kvm: Remove the dependency for phys_ram_base for ipf.c
Zhang, Xiantao wrote:
> Jes Sorensen wrote:
>> The main difference is that my patch cleans up the interfaces and
>> calls to the various functions, and removes a bunch of global
>> variables as well.
>
> I still can't see the difference with the patch in Avi's tree except
> nvram stuff. And I believe the global variable you mentioned should be
> only used for nvram. So I propose an incremental patch for that. :)

Hi Xiantao,

I cannot see your patch in Avi's tree, would you mind sending me the latest version by email, so I can look into this?

Thanks,
Jes
Re: [PATCH 03/04] qemu-kvm: Remove the dependency for phys_ram_base for ipf.c
Jes Sorensen wrote:
> Zhang, Xiantao wrote:
>> Jes Sorensen wrote:
>>> The main difference is that my patch cleans up the interfaces and
>>> calls to the various functions, and removes a bunch of global
>>> variables as well.
>>
>> I still can't see the difference with the patch in Avi's tree except
>> nvram stuff. And I believe the global variable you mentioned should
>> be only used for nvram. So I propose an incremental patch for that. :)
>
> Hi Xiantao,
>
> I cannot see your patch in Avi's tree, would you mind sending me the
> latest version by email, so I can look into this?

I pushed my queue into a branch (named 'queue'). Will merge once I resolve the regressions here.
Re: [PATCH 12/21] Remove odd hack in vga.c
Anthony Liguori wrote:
> I looked closely at the vga code in kvm-userspace a while ago and
> merged every fix I could understand into upstream QEMU. This
> particular change makes no sense to me. I could not figure out from
> revision history what it actually fixed. I'm fairly certain it's not
> useful today.
>
> Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>
> ---
>  hw/vga.c |   27 ++++-----------------------
>  1 files changed, 4 insertions(+), 23 deletions(-)
>
> diff --git a/hw/vga.c b/hw/vga.c
> index d96f1be..385184a 100644
> --- a/hw/vga.c
> +++ b/hw/vga.c
> @@ -2227,33 +2227,14 @@ typedef struct PCIVGAState {
>      VGAState vga_state;
>  } PCIVGAState;
>  
> -static int s1, s2;
> -
> -static void mark_dirty(target_phys_addr_t start, target_phys_addr_t len)
> -{
> -    target_phys_addr_t end = start + len;
> -
> -    while (start < end) {
> -        cpu_physical_memory_set_dirty(cpu_get_physical_page_desc(start));
> -        start += TARGET_PAGE_SIZE;
> -    }
> -}
> -
>  void vga_dirty_log_start(VGAState *s)
>  {
>      if (kvm_enabled() && s->map_addr)
> -        if (!s1) {
> -            kvm_log_start(s->map_addr, s->map_end - s->map_addr);
> -            mark_dirty(s->map_addr, s->map_end - s->map_addr);
> -            s1 = 1;
> -        }
> +        kvm_log_start(s->map_addr, s->map_end - s->map_addr);
> +
>      if (kvm_enabled() && s->lfb_vram_mapped) {
> -        if (!s2) {
> -            kvm_log_start(isa_mem_base + 0xa0000, 0x8000);
> -            kvm_log_start(isa_mem_base + 0xa8000, 0x8000);
> -            mark_dirty(isa_mem_base + 0xa0000, 0x10000);
> -        }
> -        s2 = 1;
> +        kvm_log_start(isa_mem_base + 0xa0000, 0x8000);
> +        kvm_log_start(isa_mem_base + 0xa8000, 0x8000);
>      }
>  }

This makes live migration and vga dirty tracking work together. Unfortunately since the last merge with qemu it's broken.

We have a shared resource, the log_dirty flag of memory slots. We can't call log_start() and log_stop() from different users and expect things to work.

One cleaner way to fix this is to add a parameter containing the mask which will be used by the client to access the qemu bytemap. log_start() can OR this parameter with its own copy, and log_stop() can AND NOT the same thing. When the local copy is nonzero, the slot dirty log is enabled.
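The mask-based scheme proposed above can be illustrated with a small model. This is a sketch under stated assumptions, not qemu code: the class, the flag names and their bit values are hypothetical, and a real implementation would toggle the slot's dirty log through KVM where this model only flips a boolean.

```python
class MemorySlot:
    """Toy model of a memory slot whose dirty logging is shared by clients."""

    def __init__(self):
        self.log_mask = 0             # OR of the masks of all active clients
        self.logging_enabled = False  # stands in for the slot's log_dirty flag

    def _sync(self):
        # The slot keeps logging for as long as any client bit is still set.
        self.logging_enabled = self.log_mask != 0

    def log_start(self, client_mask):
        self.log_mask |= client_mask   # OR in the caller's bits
        self._sync()

    def log_stop(self, client_mask):
        self.log_mask &= ~client_mask  # AND NOT the caller's bits
        self._sync()


# Hypothetical per-client bits; the real values would be whatever bits the
# clients use in the qemu dirty bytemap.
VGA_DIRTY_FLAG = 0x01
MIGRATION_DIRTY_FLAG = 0x08

slot = MemorySlot()
slot.log_start(VGA_DIRTY_FLAG)
slot.log_start(MIGRATION_DIRTY_FLAG)
slot.log_stop(MIGRATION_DIRTY_FLAG)   # migration is done, vga still tracking
print(slot.logging_enabled)
slot.log_stop(VGA_DIRTY_FLAG)         # last client gone
print(slot.logging_enabled)
```

With this bookkeeping a log_stop() from migration no longer switches logging off underneath the vga code, which is the failure mode described in the reply.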
Re: [PATCH 13/21] Remove virtio-console PIF change
Anthony Liguori wrote: If this change should happen, it should happen in upstream QEMU. Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- hw/virtio-console.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/virtio-console.c b/hw/virtio-console.c index 89e8be0..b263281 100644 --- a/hw/virtio-console.c +++ b/hw/virtio-console.c @@ -132,7 +132,7 @@ void *virtio_console_init(PCIBus *bus, CharDriverState *chr) PCI_DEVICE_ID_VIRTIO_CONSOLE, PCI_VENDOR_ID_REDHAT_QUMRANET, VIRTIO_ID_CONSOLE, - PCI_CLASS_OTHERS, 0x00, + PCI_CLASS_DISPLAY_OTHER, 0x00, 0, sizeof(VirtIOConsole)); if (s == NULL) return NULL; Since virtio-console is not enabled by default, it isn't needed, so I'll apply this. But if it were needed, there's no reason to introduce regressions into qemu-kvm.git. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/21] Remove -cpu-vendor-string
Anthony Liguori wrote:
> This isn't in upstream QEMU and is of little utility to KVM. It's
> unlikely to appear in upstream QEMU either.

Since we allow overriding cpuid flags, why not the vendor string? It's necessary for cpu passthrough.
Re: [PATCH 18/21] Remove host_alarm_timer hacks.
Anthony Liguori wrote:
> Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>
> ---
>  vl.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/vl.c b/vl.c
> index 3b0e3dc..848a8f8 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1367,8 +1367,7 @@ static void host_alarm_handler(int host_signum)
>          last_clock = ti;
>      }
>  #endif
> -    if (1 ||
> -        alarm_has_dynticks(alarm_timer) ||
> +    if (alarm_has_dynticks(alarm_timer) ||
>          (!use_icount &&
>              qemu_timer_expired(active_timers[QEMU_TIMER_VIRTUAL],
>                                 qemu_get_clock(vm_clock))) ||

This was added to fix a problem. Have you tested it?
Re: [PATCH 0/21] Remove merge artifacts from qemu-kvm
Anthony Liguori wrote:
> Now that we've got qemu-kvm, it's pretty easy to look at what's
> different between upstream QEMU and qemu-kvm.

This was actually easy in kvm-userspace.git: git diff origin/master origin/qemu-svn/trunk.

> Unfortunately, there's still a lot of gunk that seems to keep
> surviving merges. This series removes all of the gunk I could find.
> I also culled out a number of fixes that should be in upstream QEMU.
> I'll take care of getting those committed.

Applied all except patches for which I had objections (noted in separate replies); thanks.
Re: [PATCH 17/21] Remove #define __user in usb-linux.c
Anthony Liguori wrote: This has been consistently nacked in upstream QEMU. Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- usb-linux.c |4 1 files changed, 0 insertions(+), 4 deletions(-) diff --git a/usb-linux.c b/usb-linux.c index 26643bd..70d7a1c 100644 --- a/usb-linux.c +++ b/usb-linux.c @@ -34,10 +34,6 @@ #include qemu-timer.h #include monitor.h -#if defined(__linux__) -#define __user -#endif - This will introduce a regression into qemu-kvm.git. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm.git now live
Jan Kiszka wrote:
>> That's sort of what's implemented in qemu-kvm.git. In qemu.git vga
>> logging does not get disabled, which is really broken. It prevents
>> optimizations like disabling logging when the screen is not displayed
>> to a human.
>
> Is there a channel that tells vga nothing will be displayed? I may
> have missed it while removing all those
> disable-logging-as-it-may-confuse-slot-management hooks.

I think currently qemu simply stops calling vga_draw_graphic(). This makes sense for tcg since it needs to track dirty memory regardless (so it can invalidate TBs). But for kvm we'll want to add an explicit channel.

Note that it isn't likely to make a huge difference: if you don't actively read-and-reset the dirty bitmap, kvm will keep the shadow ptes with write permission and you won't see any performance hit. The only difference is whether large pages can be used or not.

>>> Where/how does the migration code disable dirty logging?
>>
>> Should be phase 3 of ram_save_live().
>
> But only in qemu-kvm. What is the plan about pushing it upstream? Then
> we could discuss how to extend the existing support best.

Pushing things upstream is quite difficult because of the very different infrastructure. It's unfortunate that upstream rewrote everything instead of changing things incrementally. Rewrites are almost always a mistake since they throw away accumulated knowledge.
Re: [PATCH] qemu-kvm: Remove duplicate set_link monitor command
Jan Kiszka wrote: Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- monitor.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/monitor.c b/monitor.c index 11e48c7..674630b 100644 --- a/monitor.c +++ b/monitor.c @@ -1792,7 +1792,6 @@ static const mon_cmd_t mon_cmds[] = { acl allow vnc.username fred\n acl deny vnc.username bob\n acl reset vnc.username\n }, -{ set_link, ss, do_set_link, name [up|down] }, { cpu_set, is, do_cpu_set_nr, cpu [online|offline], change cpu state }, { NULL, NULL, }, }; Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: external module: fix request_irq for 2.6.19
Chris Wright wrote:
> The irq handler changes (introduced in 2.6.19, not 2.6.20) dropped
> struct pt_regs from the handler prototype, they are found globally
> now. This introduces the back compat for older kernels. The handler
> is just a thin layer which calls the real registered handler (all
> this to work around a minor little compiler warning ;-)
>
> Needed for device assignment on older kernels.

Applied, thanks.
Re: [PATCH] kvm-kmod: fix build on kernels with kvm trace set
Michael S. Tsirkin wrote: CONFIG_KVM_TRACE in kernel conflicts with the definition in external module. external-module-compat-comm.h tried to work around this, but this didn't work as some code still does #include linux/autoconf.h directly. Solve this differently by s/CONFIG_KVM_TRACE/CONFIG_KMOD_KVM_TRACE/ in awk. Had to tighten regular expressions in hack-module.awk so that they don't trigger on kvm_host.h . Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Makefile |5 +++-- configure |2 +- external-module-compat-comm.h |7 --- x86/Kbuild|2 +- x86/hack-module.awk |8 +--- 5 files changed, 10 insertions(+), 14 deletions(-) diff --git a/Makefile b/Makefile index f2ef811..9cdc0af 100644 --- a/Makefile +++ b/Makefile @@ -34,8 +34,8 @@ hack-files-ia64 = kvm_main.c kvm_fw.c kvm_lib.c kvm-ia64.c hack-files = $(hack-files-$(ARCH_DIR)) -ifeq ($(EXT_CONFIG_KVM_TRACE),y) -module_defines += -DEXT_CONFIG_KVM_TRACE=y +ifeq ($(CONFIG_KMOD_KVM_TRACE),y) +module_defines += -DCONFIG_KMOD_KVM_TRACE=1 endif all:: prerequisite @@ -72,6 +72,7 @@ header-sync: for i in $$(find $T -name '*.h'); do \ $(call unifdef,$$i); done $(call hack, include/linux/kvm.h) + $(call hack, include/linux/kvm_host.h) $(call hack, include/asm-$(ARCH_DIR)/kvm.h) set -e for i in $$(find $T -type f -printf '%P '); \ do mkdir -p $$(dirname $$i); cmp -s $$i $T/$$i || cp $T/$$i $$i; done diff --git a/configure b/configure index 30af6e7..6e12bb1 100755 --- a/configure +++ b/configure @@ -122,5 +122,5 @@ DEPMOD_VERSION=$depmod_version EOF cat EOF config.kbuild -EXT_CONFIG_KVM_TRACE=$kvm_trace +CONFIG_KMOD_KVM_TRACE=$kvm_trace EOF diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h index c955927..e561448 100644 --- a/external-module-compat-comm.h +++ b/external-module-compat-comm.h @@ -18,13 +18,6 @@ #include linux/hrtimer.h #include asm/bitops.h -/* Override CONFIG_KVM_TRACE */ -#ifdef EXT_CONFIG_KVM_TRACE -# define CONFIG_KVM_TRACE 1 -#else -# undef CONFIG_KVM_TRACE -#endif - /* * 2.6.16 
does not have GFP_NOWAIT */ diff --git a/x86/Kbuild b/x86/Kbuild index d3aca00..fbdb28b 100644 --- a/x86/Kbuild +++ b/x86/Kbuild @@ -7,7 +7,7 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_emulate.o ../anon_inodes.o irq.o i8259.o lapic.o ioapic.o preempt.o i8254.o coalesced_mmio.o irq_comm.o \ timer.o \ ../external-module-compat.o -ifeq ($(EXT_CONFIG_KVM_TRACE),y) +ifeq ($(CONFIG_KMOD_KVM_TRACE),y) kvm-objs += kvm_trace.o endif ifeq ($(CONFIG_IOMMU_API),y) diff --git a/x86/hack-module.awk b/x86/hack-module.awk index 260eeef..f3d95be 100644 --- a/x86/hack-module.awk +++ b/x86/hack-module.awk @@ -4,7 +4,7 @@ BEGIN { split(INIT_WORK desc_struct ldttss_desc64 desc_ptr \ hrtimer_expires_remaining \ on_each_cpu relay_open request_irq , compat_apis); } -/^int kvm_init\(/ { anon_inodes = 1 } +/^int kvm_init\([^)]*\)$/ { anon_inodes = 1 } /return 0;/ anon_inodes { print \tr = kvm_init_anon_inodes();; @@ -17,7 +17,7 @@ BEGIN { split(INIT_WORK desc_struct ldttss_desc64 desc_ptr \ anon_inodes = 0 } -/^void kvm_exit/ { anon_inodes_exit = 1 } +/^void kvm_exit[^)]*\)$/ { anon_inodes_exit = 1 } /\}/ anon_inodes_exit { print \tkvm_exit_anon_inodes();; @@ -25,7 +25,7 @@ BEGIN { split(INIT_WORK desc_struct ldttss_desc64 desc_ptr \ anon_inodes_exit = 0 } -/^int kvm_arch_init/ { kvm_arch_init = 1 } +/^int kvm_arch_init[^)])$/ { kvm_arch_init = 1 } /\tsc_khz\/ kvm_arch_init { sub(\\tsc_khz\\, kvm_tsc_khz) } /^}/ { kvm_arch_init = 0 } @@ -85,6 +85,8 @@ BEGIN { split(INIT_WORK desc_struct ldttss_desc64 desc_ptr \ /\kvm_.*_fops\.owner = module;/ { $0 = IF_ANON_INODES_DOES_REFCOUNTS( $0 ) } +{ sub(/\CONFIG_KVM_TRACE\/, CONFIG_KMOD_KVM_TRACE) } + { print } /unsigned long flags;/ vmx_load_host_state { Xiantao, do we need to change this for ia64? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
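The rename described in the patch above only stays safe if the symbol is matched as a whole identifier. The patch does this in awk; the Python below is only an analogue of that idea, and the helper name and the third sample line are invented for illustration.

```python
import re

# Word-bounded pattern: matches CONFIG_KVM_TRACE only as a whole identifier,
# so longer names that merely contain it are left alone.
_TRACE_SYMBOL = re.compile(r"\bCONFIG_KVM_TRACE\b")


def rename_trace_symbol(line):
    """Rewrite the in-kernel config symbol to the external-module one."""
    return _TRACE_SYMBOL.sub("CONFIG_KMOD_KVM_TRACE", line)


samples = [
    "#ifdef CONFIG_KVM_TRACE",           # rewritten
    "#ifdef EXT_CONFIG_KVM_TRACE",       # untouched: part of a longer name
    "#define CONFIG_KVM_TRACE_EXTRA 1",  # untouched: hypothetical symbol
]
for sample in samples:
    print(rename_trace_symbol(sample))
```

Because the replacement no longer contains the original identifier, running the substitution twice over the same file leaves it unchanged.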
Re: qemu-kvm.git now live
Jan Kiszka wrote: Avi Kivity wrote: Where/how does the migration code disable dirty logging? Should be phase 3 of ram_save_live(). But only in qemu-kvm. What is the plan about pushing it upstream? Then we could discuss how to extend the exiting support best. Pushing things upstream is quite difficult because of the very different infrastructure. Isn't the midterm goal to get rid of most of these differences (namely libkvm)? Yes, but not by removing existing functionality. It's unfortunate that upstream rewrote everything instead of changing things incrementally. Rewrites are almost always a mistake since they throw away accumulated knowledge. I disagree, at least in this particular case. Upstream already diverged from qemu-kvm, and the latter provided no comparable alternative for slot management and dirty logging. And I still don't see that we lost anything that could not easily be re-integrated into upstream (ie. global dirty logging), finally leading to a cleaner and more complete result. It could have been done differently, by morphing the existing support into something mergable, and merging that. In this way, we'd ensure no needed functionality is lost. As is, we're adding something simple, then discovering it's insufficient. We're throwing away information, that's not a good way to make progress. So, what bits are missing to make KVM migration work in upstream? I don't know of anything beyond dirty logging. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/21] Remove clean rule change
Anthony Liguori wrote: It's not in upstream QEMU so apparently it's not useful. Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- pc-bios/Makefile |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/pc-bios/Makefile b/pc-bios/Makefile index dabeb4c..315288d 100644 --- a/pc-bios/Makefile +++ b/pc-bios/Makefile @@ -16,4 +16,4 @@ all: $(TARGETS) dtc -I dts -O dtb -o $@ $ clean: - rm -f $(TARGETS) *.o *~ *.dtb + rm -f $(TARGETS) *.o *~ Hollis? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/21] Remove unused variables in vga.c
Anthony Liguori wrote: Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- hw/vga.c |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/hw/vga.c b/hw/vga.c index 4931b69..d96f1be 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -1585,12 +1585,11 @@ static void vga_sync_dirty_bitmap(VGAState *s) */ static void vga_draw_graphic(VGAState *s, int full_update) { -int y1, y, update, linesize, y_start, double_scan, mask, depth; -int width, height, shift_control, line_offset, bwidth, bits; +int y1, y, update, page_min, page_max, linesize, y_start, double_scan, mask, depth; +int width, height, shift_control, line_offset, page0, page1, bwidth, bits; int disp_width, multi_scan, multi_run; uint8_t *d; uint32_t v, addr1, addr; -long page0, page1, page_min, page_max; vga_draw_line_func *vga_draw_line; This introduces a regression with 4GB guests. I resolved this by posting a patch to qemu; see 12c7e75a7c. Are you using an outdated checkout? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qemu-kvm.git now live
Avi Kivity wrote: Where/how does the migration code disable dirty logging? Should be phase 3 of ram_save_live(). But only in qemu-kvm. What is the plan about pushing it upstream? Then we could discuss how to extend the exiting support best. Pushing things upstream is quite difficult because of the very different infrastructure. Isn't the midterm goal to get rid of most of these differences (namely libkvm)? It's unfortunate that upstream rewrote everything instead of changing things incrementally. Rewrites are almost always a mistake since they throw away accumulated knowledge. I disagree, at least in this particular case. Upstream already diverged from qemu-kvm, and the latter provided no comparable alternative for slot management and dirty logging. And I still don't see that we lost anything that could not easily be re-integrated into upstream (ie. global dirty logging), finally leading to a cleaner and more complete result. So, what bits are missing to make KVM migration work in upstream? Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: Replace get_mt_mask_shift with get_mt_mask
Sheng Yang wrote:
> Shadow_mt_mask is out of date; it is now only used as a flag to
> indicate whether TDP is enabled. Get rid of it and use tdp_enabled
> instead.
>
> Also put the memory type logic in kvm_x86_ops->get_mt_mask().

Applied both, thanks.
[KVM-AUTOTEST] [PATCH] support for remote migration
Hello everyone,

I would like to submit a patch that adds support for remote migration to kvm-autotest. To use this patch, the following five parameters should be added to the existing migration test:

    remote = dst
    hostip = localhost ip or name
    remoteip = remote host ip or name
    remuser = root
    rempassword = password

The field remote = dst indicates that the VM dst should be created on the remote machine. For example:

    - migrate: install setup
        type = migration
        vms += dst
        migration_test_command = help
        kill_vm_on_error = yes
        remote = dst
        hostip = 192.168.1.2
        remoteip = 192.168.1.3
        remuser = root
        rempassword = 123456
        variants:

Three files are modified in this patch: kvm_utils.py, kvm_tests.py and kvm_vm.py.

kvm_utils.py - if the ssh keys have been exchanged between the test machines, remote login fails with the message "Got unexpected login prompt". To prevent this, remote_login() now returns a session rather than None in that case.

kvm_tests.py - the host address used in migration is made dynamic.

kvm_vm.py - unix sockets are replaced with tcp sockets for the monitor, in both remote and local VMs. Two new variables (remote, ssh_port) are added to class VM: remote is set to True if the VM is on a remote machine, and ssh_port contains the redirection port. The function get_address() returns the IP of the host where the VM is (local or remote).
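As a rough illustration of how the parameters described above might drive the migration command, here is a minimal sketch. The helper names are hypothetical and the port number is made up; only the parameter names and the "migrate -d tcp:host:port" command shape are taken from this message and its patch.

```python
def pick_migration_host(params, dest_name):
    """Return the address the source VM should migrate to.

    If 'remote' names the destination VM, use 'remoteip'; otherwise fall
    back to 'hostip', or to localhost when that is not set either.
    """
    if params.get("remote") == dest_name:
        return params["remoteip"]
    return params.get("hostip", "localhost")


def migration_command(params, dest_name, migration_port):
    """Build the monitor command that starts the migration."""
    host = pick_migration_host(params, dest_name)
    return "migrate -d tcp:%s:%d" % (host, migration_port)


# Parameters as in the example configuration; 5200 is an arbitrary port.
params = {
    "remote": "dst",
    "hostip": "192.168.1.2",
    "remoteip": "192.168.1.3",
}
print(migration_command(params, "dst", 5200))
print(migration_command({}, "dst", 5200))
```

Keeping the address choice in one place means the local-only case still produces the old tcp:localhost form when none of the new parameters are set.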
Thx Yogi kvm_tests.py |2 - kvm_utils.py |3 -- kvm_vm.py| 61 --- 3 files changed, 48 insertions(+), 18 deletions(-) Signed-off-by: Yogananth Subramanian anant...@in.ibm.com --- diff -aurp kvm-autotest.orgi//client/tests/kvm_runtest_2/kvm_tests.py kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py --- kvm-autotest.orgi//client/tests/kvm_runtest_2/kvm_tests.py 2009-04-29 18:33:10.0 + +++ kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py 2009-04-30 05:59:24.0 + @@ -81,7 +81,7 @@ def run_migration(test, params, env): session.close() # Define the migration command -cmd = migrate -d tcp:localhost:%d % dest_vm.migration_port +cmd = migrate -d tcp:%s:%d % (dest_vm.hostip,dest_vm.migration_port) kvm_log.debug(Migration command: %s % cmd) # Migrate diff -aurp kvm-autotest.orgi//client/tests/kvm_runtest_2/kvm_utils.py kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py --- kvm-autotest.orgi//client/tests/kvm_runtest_2/kvm_utils.py 2009-04-29 18:33:10.0 + +++ kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py 2009-04-30 06:13:47.0 + @@ -431,8 +431,7 @@ def remote_login(command, password, prom return None elif match == 2: # login: kvm_log.debug(Got unexpected login prompt) -sub.close() -return None +return sub elif match == 3: # Connection closed kvm_log.debug(Got 'Connection closed') sub.close() diff -aurp kvm-autotest.orgi//client/tests/kvm_runtest_2/kvm_vm.py kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py --- kvm-autotest.orgi//client/tests/kvm_runtest_2/kvm_vm.py 2009-04-29 18:33:10.0 + +++ kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py 2009-04-30 06:31:34.0 + @@ -3,6 +3,7 @@ import time import socket import os +import re import kvm_utils import kvm_log @@ -105,6 +106,7 @@ class VM: self.qemu_path = qemu_path self.image_dir = image_dir self.iso_dir = iso_dir +self.remote = False def verify_process_identity(self): Make sure .pid really points to the original qemu process. 
@@ -124,8 +126,6 @@ class VM: file.close() if not self.qemu_path in cmdline: return False -if not self.monitor_file_name in cmdline: -return False return True def make_qemu_command(self, name=None, params=None, qemu_path=None, image_dir=None, iso_dir=None): @@ -173,7 +173,6 @@ class VM: qemu_cmd = qemu_path qemu_cmd += -name '%s' % name -qemu_cmd += -monitor unix:%s,server,nowait % self.monitor_file_name for image_name in kvm_utils.get_sub_dict_names(params, images): image_params = kvm_utils.get_sub_dict(params, image_name) @@ -211,6 +210,7 @@ class VM: redir_params = kvm_utils.get_sub_dict(params, redir_name) guest_port = int(redir_params.get(guest_port)) host_port = self.get_port(guest_port) +self.ssh_port = host_port qemu_cmd += -redir tcp:%s::%s % (host_port, guest_port) if params.get(display) == vnc: @@ -254,6 +254,17 @@ class VM: image_dir = self.image_dir iso_dir = self.iso_dir +# If VM is remote, set hostip to ip of the remote machine +# If VM is local set hostip to localhost or hostip param +if params.get(remote) == self.name: +
Re: [PATCH] KVM: VMX: Disable VMX when system shutdown
Sheng Yang wrote:
> Intel TXT (Trusted Execution Technology) requires VMX to be off on
> all cpus at system shutdown in order to work.

Applied, thanks.

Is this needed for 2.6.30 and -stable? That is, is the code that enables TXT in 2.6.30 and below or in the BIOS? Or is it new code not yet merged?
Re: KVM performance vs. Xen
Avi Kivity wrote: Andrew Theurer wrote: I wanted to share some performance data for KVM and Xen. I thought it would be interesting to share some performance results especially compared to Xen, using a more complex situation like heterogeneous server consolidation. The Workload: The workload is one that simulates a consolidation of servers on to a single host. There are 3 server types: web, imap, and app (j2ee). In addition, there are other helper servers which are also consolidated: a db server, which helps out with the app server, and an nfs server, which helps out with the web server (a portion of the docroot is nfs mounted). There is also one other server that is simply idle. All 6 servers make up one set. The first 3 server types are sent requests, which in turn may send requests to the db and nfs helper servers. The request rate is throttled to produce a fixed amount of work. In order to increase utilization on the host, more sets of these servers are used. The clients which send requests also have a response time requirement which is monitored. The following results have passed the response time requirements. What's the typical I/O load (disk and network bandwidth) while the tests are running? This is average thrgoughput: network:Tx: 79 MB/sec Rx: 5 MB/sec disk:read: 17 MB/sec write: 40 MB/sec The host hardware: A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4 x 1 GB Ethenret CPU time measurements with SMT can vary wildly if the system is not fully loaded. If the scheduler happens to schedule two threads on a single core, both of these threads will generate less work compared to if they were scheduled on different cores. Understood. Even if at low loads, the scheduler does the right thing and spreads out to all the cores first, once it goes beyond 50% util, the CPU util can climb at a much higher rate (compared to a linear increase in work) because it then starts scheduling 2 threads per core, and each thread can do less work. 
I have always wanted something which could more accurately show the utilization of a processor core, but I guess we have to use what we have today. I will run again with SMT off to see what we get.

>> Test Results:
>> The throughput is equal in these tests, as the clients throttle the work (this is assuming you don't run out of a resource on the host). What's telling is the CPU used to do the same amount of work:
>>
>> Xen: 52.85%
>> KVM: 66.93%
>>
>> So, KVM requires 66.93/52.85 = 26.6% more CPU to do the same amount of work. Here's the breakdown:
>>
>> total   user   nice   system   irq    softirq   guest
>> 66.90   7.20   0.00   12.94    0.35   3.39      43.02
>>
>> Comparing guest time to all other busy time, that's a 23.88/43.02 = 55% overhead for virtualization. I certainly don't expect it to be 0, but 55% seems a bit high. So, what's the reason for this overhead? At the bottom is oprofile output of top functions for KVM. Some observations:
>>
>> 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seem a bit excessive?
>
> Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.

I have an older patch which makes a small change to posix_aio_thread.c by trying to keep the thread pool size a bit lower than it is today. I will dust that off and see if it helps.

>> 2) cpu_physical_memory_rw due to not using preadv/pwritev?
>
> I think both virtio-net and virtio-blk use memcpy().
>
>> 3) vmx_[save|load]_host_state: I take it this is from guest switches?
>
> These are called when you context-switch from a guest, and, much more frequently, when you enter qemu.
>
>> We have 180,000 context switches a second. Is this more than expected?
>
> Way more. Across 16 logical cpus, this is 10,000 cs/sec/cpu.
>
>> I wonder if schedstats can show why we context switch (need to let someone else run, yielded, waiting on io, etc).
>
> Yes, there is a scheduler tracer, though I have no idea how to operate it. Do you have kvm_stat logs?

Sorry, I don't, but I'll run that next time.
BTW, I did not notice a batch/log mode the last time I ran kvm_stat. Or maybe it was not obvious to me. Is there an ideal way to run kvm_stat without a curses-like output?

-Andrew
--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
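For readers who want to re-check the ratios quoted in this message, here is a minimal sketch in C. The function names are made up for illustration; the inputs are the figures from the thread (66.93% vs. 52.85% host CPU, 66.90% total busy with 43.02% guest time, 180,000 context switches per second over 16 logical CPUs).

```c
/* Hypothetical helpers that redo the arithmetic quoted in the thread. */

/* Extra CPU, in percent, that KVM needs relative to Xen for the same work. */
double extra_cpu_percent(double kvm_util, double xen_util)
{
    return (kvm_util / xen_util - 1.0) * 100.0;
}

/* Non-guest busy time expressed as a percentage of guest time. */
double virt_overhead_percent(double total_busy, double guest)
{
    return (total_busy - guest) / guest * 100.0;
}

/* Average event rate per logical CPU. */
double per_cpu_rate(double total_rate, int logical_cpus)
{
    return total_rate / logical_cpus;
}
```

With the thread's numbers, `extra_cpu_percent(66.93, 52.85)` comes out near 26.6 and `virt_overhead_percent(66.90, 43.02)` near 55.5, matching the message. `per_cpu_rate(180000, 16)` is 11,250, which the reply rounds to 10,000 cs/sec/cpu.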
Re: [PATCH] deal with interrupt shadow state for emulated instruction
Glauber Costa wrote:
> we currently unblock shadow interrupt state when we skip an instruction, but failing to do so when we actually emulate one. This blocks interrupts in key instruction blocks, in particular sti; hlt; sequences
>
> If the instruction emulated is an sti, we have to block shadow interrupts. The same goes for mov ss. pop ss also needs it, but we don't currently emulate it.
>
> Without this patch, I cannot boot gpxe option roms at vmx machines. This is described at https://bugzilla.redhat.com/show_bug.cgi?id=494469
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index cb306cf..9455a30 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -510,6 +510,8 @@ struct kvm_x86_ops {
>  	void (*run)(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  	int (*handle_exit)(struct kvm_run *run, struct kvm_vcpu *vcpu);
>  	void (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
> +	void (*interrupt_shadow_mask)(struct kvm_vcpu *vcpu, int mask);

Can you verb this function? set_interrupt_shadow would make it nicely complement get_interrupt_shadow.

> +	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
>  	void (*patch_hypercall)(struct kvm_vcpu *vcpu,
>  				unsigned char *hypercall_addr);
>  	int (*get_irq)(struct kvm_vcpu *vcpu);

> +static u32 svm_get_interrupt_shadow(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	u32 ret = 0;
> +
> +	if (svm->vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK)
> +		ret |= (X86_SHADOW_INT_STI X86_SHADOW_INT_MOV_SS);
> +	return ret;
> +}

Hmm, if the guest runs an infinite emulated 'mov ss', it will keep toggling the MOV_SS bit, but STI will remain set, so we'll never allow an interrupt into the guest kernel.
> diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
> index d2664fc..797d41f 100644
> --- a/arch/x86/kvm/x86_emulate.c
> +++ b/arch/x86/kvm/x86_emulate.c
> @@ -1618,6 +1618,16 @@ special_insn:
>  		int err;
>
>  		sel = c->src.val;
> +		if (c->modrm_reg == VCPU_SREG_SS) {
> +			u32 int_shadow =
> +				kvm_x86_ops->get_interrupt_shadow(ctxt->vcpu);
> +			/* See sti emulation for an explanation of this */
> +			if ((int_shadow & X86_SHADOW_INT_MOV_SS))
> +				ctxt->interruptibility &= ~X86_SHADOW_INT_MOV_SS;
> +			else
> +				ctxt->interruptibility |= X86_SHADOW_INT_MOV_SS;
> +		}

^=

> @@ -1846,10 +1856,23 @@ special_insn:
>  		ctxt->eflags &= ~X86_EFLAGS_IF;
>  		c->dst.type = OP_NONE;	/* Disable writeback. */
>  		break;
> -	case 0xfb: /* sti */
> +	case 0xfb: { /* sti */
> +		u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(ctxt->vcpu);
> +		/*
> +		 * an sti; sti; sequence only disable interrupts for the first
> +		 * instruction. So, if the last instruction, be it emulated or
> +		 * not, left the system with the INT_STI flag enabled, it
> +		 * means that the last instruction is an sti. We should not
> +		 * leave the flag on in this case
> +		 */
> +		if ((int_shadow & X86_SHADOW_INT_STI))
> +			ctxt->interruptibility &= ~X86_SHADOW_INT_STI;
> +		else
> +			ctxt->interruptibility |= X86_SHADOW_INT_STI;

^=

--
error compiling committee.c: too many arguments to function
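The rule the patch comment describes (a back-to-back sti; sti only shadows the first instruction) can be modelled outside the kernel. The sketch below is one possible reading of the `^=` remark, not the code that was merged. It assumes the emulation context's interruptibility field starts at zero for each instruction, and the flag values are invented for the example.

```c
#include <stdint.h>

/* Flag values are assumptions for this sketch; the patch only names them. */
#define X86_SHADOW_INT_MOV_SS 1u
#define X86_SHADOW_INT_STI    2u

/* Branching form, mirroring the if/else in the quoted hunk.
 * prev_shadow is what get_interrupt_shadow() would have returned. */
static uint32_t shadow_after_branchy(uint32_t prev_shadow, uint32_t flag)
{
    uint32_t interruptibility = 0;  /* assumed fresh for each instruction */

    if (prev_shadow & flag)
        interruptibility &= ~flag;  /* previous insn already set it: drop */
    else
        interruptibility |= flag;   /* first in a sequence: raise shadow */
    return interruptibility;
}

/* Compact form: the flag ends up set only if it was clear before. */
static uint32_t shadow_after_xor(uint32_t prev_shadow, uint32_t flag)
{
    return (prev_shadow & flag) ^ flag;
}
```

Under these assumptions both helpers agree for every input. An sti executed with no shadow pending raises the STI flag, and a second sti right after it clears the flag again.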
Re: [PATCH 03/21] Remove use of signalfd in block-raw-posix.c
Anthony Liguori wrote:
> We don't use signalfd in upstream QEMU. Instead, we always emulate it.

With an extra thread - so an extra context switch.

> It's not necessarily a bad thing to use signalfd, but this is something that should be done upstream. It certainly does qemu-kvm no harm to use the upstream code.

It will introduce a (likely minor, but real) performance regression. Instead of this, why not apply the reverse patch to qemu.git?

--
error compiling committee.c: too many arguments to function
Re: [PATCH 03/21] Remove use of signalfd in block-raw-posix.c
Avi Kivity wrote:
> Anthony Liguori wrote:
>> We don't use signalfd in upstream QEMU. Instead, we always emulate it.
>
> With an extra thread - so an extra context switch.

We don't use an extra thread. We just install a signal handler that writes to a pipe. At most, the added overhead is that we get EINTRs more often but this is something we already handle.

>> It's not necessarily a bad thing to use signalfd, but this is something that should be done upstream. It certainly does qemu-kvm no harm to use the upstream code.
>
> It will introduce a (likely minor, but real) performance regression. Instead of this, why not apply the reverse patch to qemu.git?

I'm not sure signalfd really buys us much. To emulate it requires writing a bunch more data to the pipe. When writing more than 1 byte, we have to worry about whether there's a partial write (because the pipe buffer is full). We also have to make sure to read from the fd in properly sized chunks.

Regards,

Anthony Liguori
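The "signal handler that writes to a pipe" approach is the classic self-pipe technique. A minimal sketch follows, assuming a POSIX system. The helper names are invented here and this is not QEMU's code. It writes a single byte per signal, which sidesteps the partial-write concern raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int sig_pipe[2] = { -1, -1 };

/* Async-signal-safe handler: forward the signal number as one byte. */
static void on_signal(int signum)
{
    int saved_errno = errno;
    unsigned char b = (unsigned char)signum;
    ssize_t r = write(sig_pipe[1], &b, 1);  /* non-blocking: dropped if full */

    (void)r;
    errno = saved_errno;
}

static int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Install the handler; returns the readable fd to poll, or -1 on error. */
int sigpipe_init(int signum)
{
    struct sigaction sa;

    if (pipe(sig_pipe) < 0)
        return -1;
    if (set_nonblock(sig_pipe[0]) < 0 || set_nonblock(sig_pipe[1]) < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(signum, &sa, NULL) < 0)
        return -1;
    return sig_pipe[0];
}

/* Returns the next pending signal number, or -1 if the pipe is empty. */
int sigpipe_read(int fd)
{
    unsigned char b;
    ssize_t n;

    do {
        n = read(fd, &b, 1);
    } while (n < 0 && errno == EINTR);  /* the EINTR handling mentioned above */
    return n == 1 ? b : -1;
}
```

A main loop would add the fd returned by `sigpipe_init()` to its select()/poll() set and call `sigpipe_read()` when it becomes readable, so no extra thread is involved.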
Re: KVM performance vs. Xen
Andrew Theurer wrote:
> Avi Kivity wrote:
>> What's the typical I/O load (disk and network bandwidth) while the tests are running?
>
> This is average throughput:
> network: Tx: 79 MB/sec Rx: 5 MB/sec

MB as in Byte or Mb as in bit?

> disk: read: 17 MB/sec write: 40 MB/sec

This could definitely cause the extra load, especially if it's many small requests (compared to a few large ones).

>>> The host hardware:
>>> A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4 x 1 Gb Ethernet
>>
>> CPU time measurements with SMT can vary wildly if the system is not fully loaded. If the scheduler happens to schedule two threads on a single core, both of these threads will generate less work compared to if they were scheduled on different cores.
>
> Understood. Even if at low loads, the scheduler does the right thing and spreads out to all the cores first, once it goes beyond 50% util, the CPU util can climb at a much higher rate (compared to a linear increase in work) because it then starts scheduling 2 threads per core, and each thread can do less work. I have always wanted something which could more accurately show the utilization of a processor core, but I guess we have to use what we have today. I will run again with SMT off to see what we get.

On the other hand, without SMT you will get to overcommit much faster, so you'll have scheduling artifacts. Unfortunately there's no good answer here (except to improve the SMT scheduler).

>> Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.
>
> I have an older patch which makes a small change to posix_aio_thread.c by trying to keep the thread pool size a bit lower than it is today. I will dust that off and see if it helps.

Really, I think linux-aio support can help here.

>> Yes, there is a scheduler tracer, though I have no idea how to operate it. Do you have kvm_stat logs?
>
> Sorry, I don't, but I'll run that next time. BTW, I did not notice a batch/log mode the last time I ran kvm_stat. Or maybe it was not obvious to me.
> Is there an ideal way to run kvm_stat without a curses-like output?

You're probably using an ancient version:

$ kvm_stat --help
Usage: kvm_stat [options]

Options:
  -h, --help            show this help message and exit
  -1, --once, --batch   run in batch mode for one second
  -l, --log             run in logging mode (like vmstat)
  -f FIELDS, --fields=FIELDS
                        fields to display (regex)

--
error compiling committee.c: too many arguments to function
Re: [PATCH 03/21] Remove use of signalfd in block-raw-posix.c
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> We don't use signalfd in upstream QEMU. Instead, we always emulate it.
>>
>> With an extra thread - so an extra context switch.
>
> We don't use an extra thread. We just install a signal handler that writes to a pipe. At most, the added overhead is that we get EINTRs more often but this is something we already handle.

Oh okay. But signal delivery is slow; for example the FPU needs to be reset.

> I'm not sure signalfd really buys us much. To emulate it requires writing a bunch more data to the pipe. When writing more than 1 byte, we have to worry about whether there's a partial write (because the pipe buffer is full). We also have to make sure to read from the fd in properly sized chunks.

Then we can use one byte writes (and reads) when signalfd is not available. 128 byte pipe read/writes should always be atomic on Linux though, likely on other OSes too.

--
error compiling committee.c: too many arguments to function
Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface
On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote:
> This allows an eventfd to be registered as an irq source with a guest. Any signaling operation on the eventfd (via userspace or kernel) will inject the registered GSI at the next available window.
>
> Signed-off-by: Gregory Haskins ghask...@novell.com

If we ever want to use this with e.g. MSI-X emulation in guest, and want to be strictly compliant to MSI-X, we'll need a way for guest to mask interrupts, and for host to report that a masked interrupt is pending. Ideally, all this will be doable with a couple of mmapped pages to avoid vmexits/system calls.

> +static void
> +irqfd_inject(struct work_struct *work)
> +{
> +	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
> +	struct kvm *kvm = irqfd->kvm;
> +
> +	mutex_lock(&kvm->lock);
> +	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> +	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> +	mutex_unlock(&kvm->lock);

This will do weird stuff (deliver the irq twice) if the irq is MSI/MSI-X. I know this was discussed already and is a temporary shortcut, but maybe add a comment that we really want kvm_toggle_irq, so that we won't forget?

> +}
> +

--
MST
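For context on what "any signaling operation on the eventfd" looks like from userspace, here is a small Linux-only sketch of plain eventfd semantics. It does not show the KVM registration step from the patch, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create a non-blocking eventfd with an initial counter of zero. */
int irq_event_create(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* Signal the eventfd once: adds 1 to its 64-bit counter. */
int irq_event_signal(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Read and reset the counter; returns 0 when nothing was signalled. */
uint64_t irq_event_drain(int fd)
{
    uint64_t count;

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

Note that several signals arriving before one drain are coalesced into a single readable event whose counter holds their sum, so a consumer sees one wakeup, not one per write.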
Re: [PATCH 11/21] Remove unused variables in vga.c
Avi Kivity wrote:
>> @@ -1585,12 +1585,11 @@ static void vga_sync_dirty_bitmap(VGAState *s)
>>   */
>>  static void vga_draw_graphic(VGAState *s, int full_update)
>>  {
>> -    int y1, y, update, linesize, y_start, double_scan, mask, depth;
>> -    int width, height, shift_control, line_offset, bwidth, bits;
>> +    int y1, y, update, page_min, page_max, linesize, y_start, double_scan, mask, depth;
>> +    int width, height, shift_control, line_offset, page0, page1, bwidth, bits;
>>      int disp_width, multi_scan, multi_run;
>>      uint8_t *d;
>>      uint32_t v, addr1, addr;
>> -    long page0, page1, page_min, page_max;
>>      vga_draw_line_func *vga_draw_line;
>
> This introduces a regression with 4GB guests. I resolved this by posting a patch to qemu; see 12c7e75a7c. Are you using an outdated checkout?

Oh, I understand what's happening now. It took me a while to see that we're changing the type of variables from int to long.

--
Regards,

Anthony Liguori
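The thread does not spell out why int breaks 4GB guests. A plausible reading, stated here as an assumption, is that the page variables hold byte offsets into guest RAM, which stop fitting in a 32-bit int once they reach 2 GiB. The sketch below only demonstrates that range limit; the helper names are made up.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if a byte offset into guest RAM is representable as an int. */
int offset_fits_in_int(uint64_t offset)
{
    return offset <= (uint64_t)INT_MAX;
}

/* Returns 1 if the same offset is representable as a long. */
int offset_fits_in_long(uint64_t offset)
{
    return offset <= (uint64_t)LONG_MAX;
}
```

On a typical 64-bit Linux host, an offset of 4 GiB passes the long check and fails the int check, which is consistent with the variables having been declared long.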
Re: [PATCH 14/21] Remove -cpu-vendor-string
Avi Kivity wrote:
> Anthony Liguori wrote:
>> This isn't in upstream QEMU and is of little utility to KVM. It's unlikely to appear in upstream QEMU either.
>
> Since we allow overriding cpuid flags, why not the vendor string? It's necessary for cpu passthrough.

But we don't allow explicit override of cpuid flags today. We support choosing CPU models which include vendor id and cpuid flags.

Introducing a host CPU model would be acceptable and would more accurately achieve cpu passthrough.

--
Regards,

Anthony Liguori
Re: [PATCH 17/21] Remove #define __user in usb-linux.c
Avi Kivity wrote:
> Anthony Liguori wrote:
>> This has been consistently nacked in upstream QEMU.
>>
>> Signed-off-by: Anthony Liguori aligu...@us.ibm.com
>> ---
>>  usb-linux.c |    4 ----
>>  1 files changed, 0 insertions(+), 4 deletions(-)
>>
>> diff --git a/usb-linux.c b/usb-linux.c
>> index 26643bd..70d7a1c 100644
>> --- a/usb-linux.c
>> +++ b/usb-linux.c
>> @@ -34,10 +34,6 @@
>>  #include "qemu-timer.h"
>>  #include "monitor.h"
>>
>> -#if defined(__linux__)
>> -#define __user
>> -#endif
>> -
>
> This will introduce a regression into qemu-kvm.git.

It won't because -D__user is in CFLAGS.

--
Regards,

Anthony Liguori
Re: [PATCH 14/21] Remove -cpu-vendor-string
Anthony Liguori wrote:
>>> This isn't in upstream QEMU and is of little utility to KVM. It's unlikely to appear in upstream QEMU either.
>>
>> Since we allow overriding cpuid flags, why not the vendor string? It's necessary for cpu passthrough.
>
> But we don't allow explicit override of cpuid flags today. We support choosing CPU models which include vendor id and cpuid flags.

I think we allow -cpu qemu64,-nx for example.

> Introducing a host CPU model would be acceptable and would more accurately achieve cpu passthrough.

I agree that -cpu host[,modifiers] is desirable. But I don't see why we shouldn't support finegrained control. It's probably better done through a -cpu blah,-nx,vendorid=foobar rather than a separate option.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 17/21] Remove #define __user in usb-linux.c
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> This has been consistently nacked in upstream QEMU.
>>>
>>> -#if defined(__linux__)
>>> -#define __user
>>> -#endif
>>> -
>>
>> This will introduce a regression into qemu-kvm.git.
>
> It won't because -D__user is in CFLAGS.

Ah, ok, will apply. But that's not in upstream either.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 18/21] Remove host_alarm_timer hacks.
Avi Kivity wrote:
> Anthony Liguori wrote:
>> Signed-off-by: Anthony Liguori aligu...@us.ibm.com
>> ---
>>  vl.c |    3 +--
>>  1 files changed, 1 insertions(+), 2 deletions(-)
>>
>> diff --git a/vl.c b/vl.c
>> index 3b0e3dc..848a8f8 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -1367,8 +1367,7 @@ static void host_alarm_handler(int host_signum)
>>          last_clock = ti;
>>      }
>>  #endif
>> -    if (1 ||
>> -        alarm_has_dynticks(alarm_timer) ||
>> +    if (alarm_has_dynticks(alarm_timer) ||
>>          (!use_icount &&
>>              qemu_timer_expired(active_timers[QEMU_TIMER_VIRTUAL],
>>                                 qemu_get_clock(vm_clock))) ||
>
> This was added to fix a problem. Have you tested it? Do you know what problem it fixes?

This goes back a very long time. IIUC, this was added prior to the IO thread as an optimization. This ensures that any time there's a timer, the vcpu is interrupted to allow IO to run.

With non-dynticks, there can be spurious timer signals because we program the timer with a fixed frequency. It's necessary to take this path with dynticks because we need to rearm the timer which happens in the IO path. It's not necessary to take this path with a non-dynticks timer unless there's been an expiration.

In modern KVM, the IO thread is capable of interrupting the CPU whenever it needs to process IO. Therefore this problem no longer exists.

--
Regards,

Anthony Liguori
Re: [PATCH 17/21] Remove #define __user in usb-linux.c
Avi Kivity wrote:
> Ah, ok, will apply. But that's not in upstream either.

Nope, but one step at a time.

--
Regards,

Anthony Liguori
Re: [PATCH 18/21] Remove host_alarm_timer hacks.
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> Signed-off-by: Anthony Liguori aligu...@us.ibm.com
>>> ---
>>>  vl.c |    3 +--
>>>  1 files changed, 1 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/vl.c b/vl.c
>>> index 3b0e3dc..848a8f8 100644
>>> --- a/vl.c
>>> +++ b/vl.c
>>> @@ -1367,8 +1367,7 @@ static void host_alarm_handler(int host_signum)
>>>          last_clock = ti;
>>>      }
>>>  #endif
>>> -    if (1 ||
>>> -        alarm_has_dynticks(alarm_timer) ||
>>> +    if (alarm_has_dynticks(alarm_timer) ||
>>>          (!use_icount &&
>>>              qemu_timer_expired(active_timers[QEMU_TIMER_VIRTUAL],
>>>                                 qemu_get_clock(vm_clock))) ||
>>
>> This was added to fix a problem. Have you tested it? Do you know what problem it fixes?
>
> This goes back a very long time. IIUC, this was added prior to the IO thread as an optimization. This ensures that any time there's a timer, the vcpu is interrupted to allow IO to run.
>
> With non-dynticks, there can be spurious timer signals because we program the timer with a fixed frequency. It's necessary to take this path with dynticks because we need to rearm the timer which happens in the IO path. It's not necessary to take this path with a non-dynticks timer unless there's been an expiration.
>
> In modern KVM, the IO thread is capable of interrupting the CPU whenever it needs to process IO. Therefore this problem no longer exists.

It would still be good to verify that the problem no longer exists. This is not a cosmetic change; some testing is needed to verify it doesn't introduce new latencies.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 14/21] Remove -cpu-vendor-string
Avi Kivity wrote:
> Anthony Liguori wrote:
>>>> This isn't in upstream QEMU and is of little utility to KVM. It's unlikely to appear in upstream QEMU either.
>>>
>>> Since we allow overriding cpuid flags, why not the vendor string? It's necessary for cpu passthrough.
>>
>> But we don't allow explicit override of cpuid flags today. We support choosing CPU models which include vendor id and cpuid flags.
>
> I think we allow -cpu qemu64,-nx for example.

Funny enough, -cpu qemu64,vendor=AuthenticAMD already works today. So yeah, there's no reason to carry -cpu-vendor-string anymore.

--
Regards,

Anthony Liguori
Re: [PATCH 03/21] Remove use of signalfd in block-raw-posix.c
Avi Kivity wrote:
> Anthony Liguori wrote:
>> Avi Kivity wrote:
>>> Anthony Liguori wrote:
>>>> We don't use signalfd in upstream QEMU. Instead, we always emulate it.
>>>
>>> With an extra thread - so an extra context switch.
>>
>> We don't use an extra thread. We just install a signal handler that writes to a pipe. At most, the added overhead is that we get EINTRs more often but this is something we already handle.
>
> Oh okay. But signal delivery is slow; for example the FPU needs to be reset.

Is it really justified to add all of this extra code (including signalfd emulation) for something that probably isn't even measurable?

I like using whiz-bang features of Linux as much as the next guy, but I think we're stretching to justify it here :-)

--
Regards,

Anthony Liguori
Re: [PATCH 0/2] Make savevm versioning compatible with upstream QEMU
Anthony Liguori wrote:
> Right now, there is no way savevm versioning can be compatible with upstream QEMU because KVM adds fields to existing savevm structures without incrementing the versions.
>
> If you assume that KVM will eventually merge into upstream QEMU, this means that eventually KVM is going to have to break backwards compatibility with itself to resolve this issue in a non-graceful way. So let's do that now instead of doing it later when the situation is only worse.
>
> I'm happy to allocate particular version identifiers for KVM to avoid future conflicts. I believe we should try to eliminate the existing differences so that we can converge in the future on a common versioning scheme.

Applied both, thanks.

I think we can avoid the need to synchronize too much by saving kvm-specific state for device x using id x-kvm; this allows the two to evolve independently. Of course it's much better to avoid divergence in the first place, but this isn't always possible.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 14/21] Remove -cpu-vendor-string
Anthony Liguori wrote:
>>>> This isn't in upstream QEMU and is of little utility to KVM. It's unlikely to appear in upstream QEMU either.
>>>
>>> Since we allow overriding cpuid flags, why not the vendor string? It's necessary for cpu passthrough.
>>
>> But we don't allow explicit override of cpuid flags today. We support choosing CPU models which include vendor id and cpuid flags.
>>
>> I think we allow -cpu qemu64,-nx for example.
>
> Funny enough, -cpu qemu64,vendor=AuthenticAMD already works today. So yeah, there's no reason to carry -cpu-vendor-string anymore.

Applied, but had to reverse the sense of the commit log :)

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/2] Make savevm versioning compatible with upstream QEMU
Avi Kivity wrote:
> Anthony Liguori wrote:
>> Right now, there is no way savevm versioning can be compatible with upstream QEMU because KVM adds fields to existing savevm structures without incrementing the versions.
>>
>> If you assume that KVM will eventually merge into upstream QEMU, this means that eventually KVM is going to have to break backwards compatibility with itself to resolve this issue in a non-graceful way. So let's do that now instead of doing it later when the situation is only worse.
>>
>> I'm happy to allocate particular version identifiers for KVM to avoid future conflicts. I believe we should try to eliminate the existing differences so that we can converge in the future on a common versioning scheme.
>
> Applied both, thanks.
>
> I think we can avoid the need to synchronize too much by saving kvm-specific state for device x using id x-kvm; this allows the two to evolve independently.

I need to add save/restore support to upstream QEMU so this is a good excuse to just merge the changes in KVM upstream. So hopefully this will become a non issue.

If something arises and you need more savevm state, introduce a new section suffixed or prefixed with kvm. Alternatively, ask and I can reserve an ID upstream.

For virtio-net, we just need to get the vnet stuff merged upstream.

--
Regards,

Anthony Liguori
Re: [PATCH 03/21] Remove use of signalfd in block-raw-posix.c
Anthony Liguori wrote:
>> Oh okay. But signal delivery is slow; for example the FPU needs to be reset.
>
> Is it really justified to add all of this extra code (including signalfd emulation) for something that probably isn't even measurable?

We don't have to add signalfd emulation; we can simply use signal+pipe in that case. We won't know if it's measurable or not until we measure it (or not).

> I like using whiz-bang features of Linux as much as the next guy, but I think we're stretching to justify it here :-)

I think it's worth it in this case. It will become more important in time, too.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 18/21] Remove host_alarm_timer hacks.
Avi Kivity wrote:
> Anthony Liguori wrote:
>> In modern KVM, the IO thread is capable of interrupting the CPU whenever it needs to process IO. Therefore this problem no longer exists.
>
> It would still be good to verify that the problem no longer exists. This is not a cosmetic change; some testing is needed to verify it doesn't introduce new latencies.

N.B. dynticks is the preferred timer in QEMU on Linux. To even hit this code path, you'd have to use an explicit -clock hpet or -clock rtc. I don't have an hpet on my laptop and -clock rtc boots just as fast as it did before.

Do we really care about optimizing latency with -clock rtc though?

--
Regards,

Anthony Liguori
Re: [kvm] [PATCH 1/2] Increment virtio-net savevm version to avoid conflict with upstream QEMU.
Alex Williamson wrote:
> On Wed, 2009-04-29 at 15:53 -0500, Anthony Liguori wrote:
>> -#define VIRTIO_NET_VM_VERSION    6
>> +/* Version 7 has TAP_VNET_HDR support.  This is reserved in upstream QEMU to
>> + * avoid future conflict.
>> + * We can't assume versions 7 have TAP_VNET_HDR support until this is merged
>> + * in upstream QEMU.
>> + */
>> +#define VIRTIO_NET_VM_VERSION    7
>
> It seems like you're saying you're only going to reserve version number 7, and not the 4 bytes of savevm we're using for version 7 here. Couldn't we fix this by adding a dummy patch to qemu to bump to version 7, and push/pop a 4 byte zero from the savevm? Then we could change the code below to >= 7. Qemu should probably puke on a savevm image with non-zero in this location until the kvm code gets merged. Looks like one byte would be more than sufficient if we wanted to make that change now too. Thanks,

I'd rather just merge vnet into upstream QEMU as quickly as possible. All I have to do to reserve a field is just hope no one submits a patch incrementing version id until we submit vnet support :-)

--
Regards,

Anthony Liguori
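To make the versioning concern concrete, here is a toy model of a version-gated load routine. This is a sketch, not QEMU's savevm API. The struct, the field layout, the big-endian encoding, and the strict trailing-byte check are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: first stream version that carries the extra 4-byte field. */
#define VNET_HDR_VERSION 7

struct net_state {
    uint32_t status;
    uint32_t has_vnet_hdr;
};

/* Read one big-endian 32-bit value from a bounded buffer. */
static int get_be32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
    if ((size_t)(end - *p) < 4)
        return -1;
    *out = ((uint32_t)(*p)[0] << 24) | ((uint32_t)(*p)[1] << 16) |
           ((uint32_t)(*p)[2] << 8) | (uint32_t)(*p)[3];
    *p += 4;
    return 0;
}

/* Load device state; the newer field is only read for new enough streams. */
int net_load(struct net_state *s, const uint8_t *buf, size_t len,
             int version_id)
{
    const uint8_t *p = buf;
    const uint8_t *end = buf + len;

    memset(s, 0, sizeof(*s));
    if (get_be32(&p, end, &s->status) < 0)
        return -1;
    if (version_id >= VNET_HDR_VERSION &&
        get_be32(&p, end, &s->has_vnet_hdr) < 0)
        return -1;
    /* Leftover bytes mean the writer added fields without bumping the version. */
    return p == end ? 0 : -1;
}
```

In this model, a stream that carries the extra field but still claims version 6 is rejected. That is the kind of incompatibility the series avoids by bumping the version number whenever the layout changes.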
Re: [PATCH] Fix build when objdir != srcdir
Anthony Liguori wrote:
> This requires adding the necessary bits to configure to create the directories and symlinks for libkvm. It also requires sticking KVM_CFLAGS in config-host.mak to ensure that it gets the right set of includes for the kernel headers.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: KVM performance vs. Xen
Avi Kivity wrote:
> Andrew Theurer wrote:
>> Avi Kivity wrote:
>>> What's the typical I/O load (disk and network bandwidth) while the tests are running?
>>
>> This is average throughput:
>> network: Tx: 79 MB/sec Rx: 5 MB/sec
>
> MB as in Byte or Mb as in bit?

Byte. There are 4 x 1 Gb adapters, each handling about 20 MB/sec or 160 Mbit/sec.

>> disk: read: 17 MB/sec write: 40 MB/sec
>
> This could definitely cause the extra load, especially if it's many small requests (compared to a few large ones).

I don't have the request sizes at my fingertips, but we have to use a lot of disks to support this I/O, so I think it's safe to assume there are a lot more requests than a simple large sequential read/write.

>>>> The host hardware:
>>>> A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4 x 1 Gb Ethernet
>>>
>>> CPU time measurements with SMT can vary wildly if the system is not fully loaded. If the scheduler happens to schedule two threads on a single core, both of these threads will generate less work compared to if they were scheduled on different cores.
>>
>> Understood. Even if at low loads, the scheduler does the right thing and spreads out to all the cores first, once it goes beyond 50% util, the CPU util can climb at a much higher rate (compared to a linear increase in work) because it then starts scheduling 2 threads per core, and each thread can do less work. I have always wanted something which could more accurately show the utilization of a processor core, but I guess we have to use what we have today. I will run again with SMT off to see what we get.
>
> On the other hand, without SMT you will get to overcommit much faster, so you'll have scheduling artifacts. Unfortunately there's no good answer here (except to improve the SMT scheduler).
>
>>> Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.
>>
>> I have an older patch which makes a small change to posix_aio_thread.c by trying to keep the thread pool size a bit lower than it is today.
>> I will dust that off and see if it helps.
>
> Really, I think linux-aio support can help here.

Yes, I think that would work for real block devices, but would that help for files? I am using real block devices right now, but it would be nice to also see a benefit for files in a file-system. Or maybe I am misunderstanding this, and linux-aio can be used on files?

-Andrew

>>> Yes, there is a scheduler tracer, though I have no idea how to operate it. Do you have kvm_stat logs?
>>
>> Sorry, I don't, but I'll run that next time. BTW, I did not notice a batch/log mode the last time I ran kvm_stat. Or maybe it was not obvious to me. Is there an ideal way to run kvm_stat without a curses-like output?
>
> You're probably using an ancient version:
>
> $ kvm_stat --help
> Usage: kvm_stat [options]
>
> Options:
>   -h, --help            show this help message and exit
>   -1, --once, --batch   run in batch mode for one second
>   -l, --log             run in logging mode (like vmstat)
>   -f FIELDS, --fields=FIELDS
>                         fields to display (regex)
Re: KVM performance vs. Xen
Avi Kivity wrote:
>> 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seem a bit excessive?
> Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.

This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew?

>> 2) cpu_physical_memory_rw due to not using preadv/pwritev?
> I think both virtio-net and virtio-blk use memcpy().

With latest linux-2.6, and a development snapshot of glibc, virtio-blk will not use memcpy() anymore but virtio-net still does on the receive path (but not transmit).

Regards,
Anthony Liguori
Re: KVM performance vs. Xen
Andrew Theurer wrote:
>> Really, I think linux-aio support can help here.
> Yes, I think that would work for real block devices, but would that help for files? I am using real block devices right now, but it would be nice to also see a benefit for files in a file-system. Or maybe I am misunderstanding this, and linux-aio can be used on files?

For cache=off, with some file systems, yes. But not for cache=writethrough/writeback.

Regards,
Anthony Liguori
RE: [PATCH] KVM: VMX: Disable VMX when system shutdown
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Thursday, April 30, 2009 5:31 AM
>
> Sheng Yang wrote:
>> Intel TXT (Trusted Execution Technology) requires VMX to be turned off on all CPUs in order to work at system shutdown.
> Applied, thanks. Is this needed for 2.6.30 and -stable? That is, is the code that enables TXT in 2.6.30 and below or in the BIOS? Or is it new code not yet merged?

The TXT code will not get merged in 2.6.30, though it will hopefully make it soon thereafter. So it would be fine to put it in 2.6.31.

Joe
Re: KVM performance vs. Xen
Andrew Theurer wrote:
>>> disk: read: 17 MB/sec write: 40 MB/sec
>> This could definitely cause the extra load, especially if it's many small requests (compared to a few large ones).
> I don't have the request sizes at my fingertips, but we have to use a lot of disks to support this I/O, so I think it's safe to assume there are a lot more requests than a simple large sequential read/write.

Yes. Well the high context switch rate is the scheduler's way of telling us to use linux-aio. If lots of disks == 100, with a 3ms seek time, that's already 60,000 cs/sec.

>> Really, I think linux-aio support can help here.
> Yes, I think that would work for real block devices, but would that help for files? I am using real block devices right now, but it would be nice to also see a benefit for files in a file-system. Or maybe I am misunderstanding this, and linux-aio can be used on files?

It could work with files with cache=none (though not qcow2 as now written).

--
error compiling committee.c: too many arguments to function
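The back-of-the-envelope figure above can be reproduced with a few lines of arithmetic. The sketch below assumes, as one plausible reading of the estimate, that every request costs two context switches when it is handed to a worker thread (one switch to the worker that blocks in the I/O, one back on completion); that factor is an assumption, not something stated in the thread.

```python
def context_switches_per_sec(disks, service_time_s, switches_per_request=2):
    """Estimate context switches/sec when each request goes through a worker thread.

    switches_per_request=2 is an ASSUMPTION: one switch to the worker thread
    that issues the blocking I/O and one back when it completes.
    """
    # Each disk completes roughly 1/service_time requests per second.
    requests_per_sec = disks / service_time_s
    return requests_per_sec * switches_per_request


estimate = context_switches_per_sec(disks=100, service_time_s=0.003)
print(f"{estimate:,.0f} context switches/sec")
```

With 100 disks and a 3 ms service time this lands in the same range as the roughly 60,000 cs/sec quoted above; the exact value depends entirely on the assumed number of switches per request.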
Re: KVM performance vs. Xen
Anthony Liguori wrote:
>>> 2) cpu_physical_memory_rw due to not using preadv/pwritev?
>> I think both virtio-net and virtio-blk use memcpy().
> With latest linux-2.6, and a development snapshot of glibc, virtio-blk will not use memcpy() anymore but virtio-net still does on the receive path (but not transmit).

There's still the kernel/user copy, so we have two copies on rx, one on tx.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Fix build when objdir != srcdir
Avi Kivity wrote:
> Anthony Liguori wrote:
>> This requires adding the necessary bits to configure to create the directories and symlinks for libkvm. It also requires sticking KVM_CFLAGS in config-host.mak to ensure that it gets the right set of includes for the kernel headers.
> Applied, thanks.

Unapplied, as it breaks ordinary ./configure && make.

--
error compiling committee.c: too many arguments to function
Re: [KVM-AUTOTEST] [PATCH] support for remote migration
yogi wrote:
> Hello everyone, I'd like to submit a patch to add support for remote migration in kvm-autotest.

Thanks for the patch. Uri is out on vacation for a while. I'll apply the patch to my test repo and do some validation testing; however, it may be a little while until it makes it in.

-D
Re: KVM performance vs. Xen
Anthony Liguori wrote:
> Avi Kivity wrote:
>>> 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seem a bit excessive?
>> Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.
> This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew?

Was that before or after the entire path was made copyless?

--
error compiling committee.c: too many arguments to function
Re: KVM performance vs. Xen
Avi Kivity wrote:
> Anthony Liguori wrote:
>> Avi Kivity wrote:
>>>> 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seem a bit excessive?
>>> Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.
>> This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew?

If I recall correctly, it was 2.4% and relative. But with 2.3% in scheduler functions, that's what I expected.

> Was that before or after the entire path was made copyless?

If this is referring to the preadv/writev support, no, I have not tested with that.

-Andrew
Re: [PATCH 03/04] qemu-kvm: Remove the dependency for phys_ram_base for ipf.c
Avi Kivity wrote:
> Jes Sorensen wrote:
> I pushed my queue into a branch (named 'queue'). Will merge once I resolve the regressions here.

Hi Avi,

I don't see that branch - it's in the qemu-kvm repo?

Cheers,
Jes

[...@leavenworth qemu-kvm]$ git branch -a
* master
  origin/HEAD
  origin/bios-merge
  origin/bios-patchqueue
  origin/bochs-bios-cvs
  origin/bochs-bios-vendor-drops
  origin/build
  origin/for-glommer
  origin/ia64-vtd
  origin/irq-routing-2
  origin/kvm-updates-2.6.25
  origin/kvm-updates-2.6.26
  origin/kvm-updates-2.6.27
  origin/kvm-updates/2.6.26
  origin/kvm-updates/2.6.27
  origin/kvm-updates/2.6.28
  origin/kvm-updates/2.6.29
  origin/kvm-updates/2.6.30
  origin/maint/2.6.25
  origin/maint/2.6.26
  origin/maint/2.6.26-test
  origin/maint/2.6.28
  origin/maint/2.6.29
  origin/maint/2.6.30
  origin/master
  origin/merge-tmp
  origin/origin
  origin/pending
  origin/qemu-cvs
  origin/qemu-vendor-drops
  origin/realmode
  origin/release
[...@leavenworth qemu-kvm]$
Re: [KVM-AUTOTEST] [PATCH] support for remote migration
- yogi anant...@linux.vnet.ibm.com wrote:
> Hello everyone, I'd like to submit a patch to add support for remote migration in kvm-autotest.

Sounds like a good idea. Also, the patch isn't too big, which I personally appreciate very much (makes it easier to read).

> To use this patch the following four parameters should be added to the existing migration test:
>
>     remote = dst
>     hostip = localhost ip or name
>     remoteip = remote host ip or name
>     remuser = root
>     rempassword = password
>
> The field remote=dst indicates the VM dst should be created on the remote machine. For example:
>
>     - migrate: install setup
>         type = migration
>         vms += dst
>         migration_test_command = help
>         kill_vm_on_error = yes
>         remote = dst
>         hostip = 192.168.1.2
>         remoteip = 192.168.1.3
>         remuser = root
>         rempassword = 123456
>         variants:
>
> Three files are being modified in this patch: kvm_utils.py, kvm_tests.py and kvm_vm.py.
>
> kvm_utils.py - if the ssh keys have been exchanged between the test machines, then remote login fails with the message "Got unexpected login prompt"; to prevent this, I have made it return a session rather than None.
>
> kvm_tests.py - the host address used in migration is made dynamic.
>
> kvm_vm.py - I have replaced unix sockets with tcp sockets for the monitor, in both remote and local VMs. Added two new variables (remote, ssh_port) to class VM: remote is set to True if the VM is on a remote machine, and ssh_port contains the redirection port. The function get_address() returns the ip of the host where the VM is (local or remote).

I've only looked at the code briefly, and it looks very good overall, but I have a few comments/questions:

Regarding remote_login:

- Why should remote_login return a session when it gets an unexpected login prompt? If you get a login prompt doesn't that mean something went wrong? The username is always provided in the ssh command line, so we shouldn't expect to receive a login prompt -- or am I missing something? I am pretty confident this is true in the general case, but maybe it's different when ssh keys have been exchanged between the hosts.

- I think it makes little sense to return a session object when you see a login prompt because that session will be useless. You can't send any commands to it because you don't have a shell prompt yet. Any command you send will be interpreted as a username, and will most likely be the wrong username.

- When a guest is in the process of booting and we try to log into it, remote_login sometimes fails because it gets an unexpected login prompt. This is good, as far as I understand, because it means the guest isn't ready yet (still booting). The next time remote_login attempts to log in, it usually succeeds. If we consider an unexpected login prompt OK, we pass login attempts that actually should have failed (and the resulting sessions will be useless anyway).

Other things:

- If I understand correctly, remote migration will only work if the remote qemu binary path is exactly the same as the local one. Maybe we should receive a qemu path parameter that will allow for some flexibility.

- In VM.make_qemu_command(), in the code that handles redirections, you add 'self.ssh_port = host_port'. I don't think this is correct because there can be multiple redirections, unrelated to SSH, so you certainly shouldn't assume that the only redirection is an SSH one. When you want the host port redirected to the guest's SSH port, you should use self.get_port(int(self.params.get("ssh_port"))). This will also work if for some reason 'ssh_port' changes while the guest is alive.

- It seems that the purpose of 'remote = dst' is to indicate to 'dst' that it should be started as a remote VM. The preferred way to do this is to pass something like 'remote_dst = yes' and then in VM.create() you can test for params.get("remote") == "yes". See "Addressing objects" in the wiki (http://www.linux-kvm.org/page/KVM-Autotest/Parameters#Addressing_objects_.28VMs.2C_images.2C_NICs_etc.29). In general, any parameter you want to pass to a specific VM, you pass using param_vmname = value, e.g. 'mem_dst = 128', and then in VM.create() the parameter is accessible without the VM name extension (e.g. self.params.get("mem") will equal 128).

Thanks,
Michael
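The per-object addressing convention Michael describes (a 'param_vmname = value' entry overriding plain 'param' for that VM) can be sketched as follows. The helper name params_for_object and the sample dictionary are hypothetical and exist only for illustration; this is not the actual kvm-autotest implementation.

```python
def params_for_object(params, name):
    """Return a copy of `params` in which every 'key_<name>' entry overrides 'key'.

    Hypothetical helper illustrating the addressing convention; it is not
    taken from kvm-autotest.
    """
    suffix = "_" + name
    result = dict(params)
    for key, value in params.items():
        if key.endswith(suffix):
            # 'mem_dst' becomes 'mem' in the dictionary seen by VM 'dst'.
            result[key[:-len(suffix)]] = value
    return result


test_params = {
    "vms": "vm1 dst",
    "mem": "512",          # default for every VM
    "mem_dst": "128",      # override for the VM named 'dst'
    "remote_dst": "yes",   # only 'dst' is started on the remote host
}

dst_params = params_for_object(test_params, "dst")
vm1_params = params_for_object(test_params, "vm1")

print(dst_params["mem"], dst_params.get("remote"))
print(vm1_params["mem"], vm1_params.get("remote"))
```

With this scheme the code that creates a VM only ever looks up the plain key (for example "remote"), so it does not need to know which VM name the test configuration used.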
[PATCH] Fix build when objdir != srcdir (v2)
This requires adding the necessary bits to configure to create the directories and symlinks for libkvm. It also requires sticking KVM_CFLAGS in config-host.mak to ensure that it gets the right set of includes for the kernel headers.

v1 -> v2
  Fix build when objdir == srcdir

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
---
 configure           |   10 ++++++++--
 kvm/libkvm/Makefile |    4 +++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index fc0fb9b..c41e269 100755
--- a/configure
+++ b/configure
@@ -518,7 +518,7 @@ if test "$werror" = "yes" ; then
     CFLAGS="$CFLAGS -Werror"
 fi

-CFLAGS="$CFLAGS -I$(readlink -f kvm/libkvm)"
+CFLAGS="$CFLAGS -I$(readlink -f $source_path/kvm/libkvm)"

 if test "$solaris" = "no" ; then
     if ld --version 2>/dev/null | grep "GNU ld" >/dev/null 2>/dev/null ; then
@@ -1785,6 +1785,11 @@ bsd)
 ;;
 esac

+# this is a temp hack needed for libkvm
+if test "$kvm" = "yes" ; then
+echo "KVM_CFLAGS=$kvm_cflags" >> $config_mak
+fi
+
 tools=
 if test `expr "$target_list" : ".*softmmu.*"` != 0 ; then
   tools="qemu-img\$(EXESUF) $tools"
@@ -2162,10 +2167,11 @@ done # for target in $targets

 # build tree in object directory if source path is different from current one
 if test "$source_path_used" = "yes" ; then
-    DIRS="tests tests/cris slirp audio"
+    DIRS="tests tests/cris slirp audio kvm/libkvm"
     FILES="Makefile tests/Makefile"
     FILES="$FILES tests/cris/Makefile tests/cris/.gdbinit"
     FILES="$FILES tests/test-mmap.c"
+    FILES="$FILES kvm/libkvm/Makefile"
     for dir in $DIRS ; do
             mkdir -p $dir
     done
diff --git a/kvm/libkvm/Makefile b/kvm/libkvm/Makefile
index 727ce48..2f2cfa2 100644
--- a/kvm/libkvm/Makefile
+++ b/kvm/libkvm/Makefile
@@ -1,5 +1,5 @@
 include ../../config-host.mak
-include config-$(ARCH).mak
+include $(VPATH)/kvm/libkvm/config-$(ARCH).mak

 # libkvm is not -Wredundant-decls friendly yet
 CFLAGS += -Wno-redundant-decls
@@ -18,6 +18,8 @@ LDFLAGS += $(CFLAGS)

 CXXFLAGS = $(autodepend-flags)

+VPATH:=$(VPATH)/kvm/libkvm
+
 autodepend-flags = -MMD -MF $(dir $*).$(notdir $*).d

--
1.6.0.6
Re: [PATCH] Fix build when objdir != srcdir
Avi Kivity wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> This requires adding the necessary bits to configure to create the directories and symlinks for libkvm. It also requires sticking KVM_CFLAGS in config-host.mak to ensure that it gets the right set of includes for the kernel headers.
>> Applied, thanks.
> Unapplied, as it breaks ordinary ./configure && make.

Doh, sorry. Sent a new patch fixing this.

--
Regards,
Anthony Liguori
Re: KVM performance vs. Xen
Avi Kivity wrote:
> Anthony Liguori wrote:
>>>> 2) cpu_physical_memory_rw due to not using preadv/pwritev?
>>> I think both virtio-net and virtio-blk use memcpy().
>> With latest linux-2.6, and a development snapshot of glibc, virtio-blk will not use memcpy() anymore but virtio-net still does on the receive path (but not transmit).
> There's still the kernel/user copy, so we have two copies on rx, one on tx.

That won't show up as cpu_physical_memory_rw. stl_phys/ldl_phys are suspect though as they degrade to cpu_physical_memory_rw.

Regards,
Anthony Liguori
Re: KVM performance vs. Xen
Avi Kivity wrote:
> Anthony Liguori wrote:
>> Avi Kivity wrote:
>>>> 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seem a bit excessive?
>>> Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.
>> This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew?
> Was that before or after the entire path was made copyless?

Before, so it's worth updating and trying again.

Regards,
Anthony Liguori
Re: [PATCH 16/21] Remove clean rule change
On Thu, 2009-04-30 at 12:42 +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
>> It's not in upstream QEMU so apparently it's not useful.
>>
>> Signed-off-by: Anthony Liguori aligu...@us.ibm.com
>> ---
>>  pc-bios/Makefile |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/pc-bios/Makefile b/pc-bios/Makefile
>> index dabeb4c..315288d 100644
>> --- a/pc-bios/Makefile
>> +++ b/pc-bios/Makefile
>> @@ -16,4 +16,4 @@ all: $(TARGETS)
>>      dtc -I dts -O dtb -o $@ $<
>>
>>  clean:
>> -    rm -f $(TARGETS) *.o *~ *.dtb
>> +    rm -f $(TARGETS) *.o *~
>
> Hollis?

dtb is the compiled (binary) form of dts (source) device tree files. Think of it like bios.bin: if make clean doesn't delete bios.bin (and it looks like it doesn't), neither should it delete *.dtb, and we can drop the patch.

Acked-by: Hollis Blanchard holl...@us.ibm.com

--
Hollis Blanchard
IBM Linux Technology Center
Re: KVM performance vs. Xen
Andrew Theurer wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> Avi Kivity wrote:
>>>>> 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seem a bit excessive?
>>>> Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.
>>> This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew?
> If I recall correctly, it was 2.4% and relative. But with 2.3% in scheduler functions, that's what I expected.
>> Was that before or after the entire path was made copyless?
> If this is referring to the preadv/writev support, no, I have not tested with that.

Previously, the block API only exposed non-vector interfaces and bounced vectored operations to a linear buffer. That's been eliminated now though so we need to update the linux-aio patch to implement a vectored backend interface.

However, it is an apples to apples comparison in terms of copying since the same is true with the thread pool. My take away was that the thread pool overhead isn't the major source of issues.

Regards,
Anthony Liguori
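The difference between bouncing a vectored request through a linear buffer and passing the vector straight to the kernel can be shown with a small analogy. The sketch below is not QEMU code: it uses Python's os.write and os.writev on a temporary file, and the function names write_bounced and write_vectored are made up for the example.

```python
import os
import tempfile


def write_bounced(fd, chunks):
    """Copy every chunk into one linear buffer, then issue a single write()."""
    bounce = b"".join(chunks)      # extra userspace copy of all payload bytes
    return os.write(fd, bounce)


def write_vectored(fd, chunks):
    """Hand the scatter/gather list to the kernel directly via writev()."""
    return os.writev(fd, chunks)   # no userspace bounce buffer


chunks = [b"head", b"-", b"payload", b"-", b"tail"]

with tempfile.TemporaryFile() as f:
    n1 = write_bounced(f.fileno(), chunks)
    n2 = write_vectored(f.fileno(), chunks)
    f.seek(0)
    data = f.read()

print(n1, n2, data)
```

Both paths put the same bytes on disk; the vectored one simply skips the intermediate copy, which is the copy the discussion above refers to as having been eliminated from the block layer.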
Re: CPU Limitations
Hi,

I tried to get some useful information out of gdb, but it just gives me this:

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /usr/lib/libasound.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libasound.so.2
Reading symbols from /usr/lib/libpulse-simple.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libpulse-simple.so.0
Reading symbols from /usr/lib/libgnutls.so.26...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgnutls.so.26
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /usr/lib/libX11.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libX11.so.6
Reading symbols from /usr/lib/libSDL-1.2.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libSDL-1.2.so.0
Reading symbols from /lib/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libpulse.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libpulse.so.0
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libSM.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libSM.so.6
Reading symbols from /usr/lib/libICE.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libICE.so.6
Reading symbols from /lib/libcap.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libcap.so.2
Reading symbols from /usr/lib/libgdbm.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libgdbm.so.3
Reading symbols from /usr/lib/libtasn1.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libtasn1.so.3
Reading symbols from /lib/libgcrypt.so.11...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcrypt.so.11
Reading symbols from /lib/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib/libxcb.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libxcb.so.1
Reading symbols from /usr/lib/libdirectfb-1.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libdirectfb-1.0.so.0
Reading symbols from /usr/lib/libfusion-1.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libfusion-1.0.so.0
Reading symbols from /usr/lib/libdirect-1.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libdirect-1.0.so.0
Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libuuid.so.1
Reading symbols from /lib/libattr.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libattr.so.1
Reading symbols from /lib/libgpg-error.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libgpg-error.so.0
Reading symbols from /usr/lib/libXau.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libXau.so.6
Reading symbols from /usr/lib/libXdmcp.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libXdmcp.so.6
Reading symbols from /lib/libnss_files.so.2...
---Type <return> to continue, or q <return> to quit---
(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
(no debugging symbols found)
Core was generated by `kvm -smp 32 /fs2/xen/disk0'.
Program terminated with signal 11, Segmentation fault.
[New process 4665]
[New process 4666]
[New process 4674]
[New process 4670]
[New process 4676]
[New process 4678]
[New process 4669]
[New process 4667]
[New process 4677]
[New process 4686]
[New process 4675]
[New process 4672]
[New process 4679]
[New process 4682]
[New process 4673]
[New process 4681]
[New process 4671]
[New process 4683]
[New process 4689]
[New process 4685]
[New process 4668]
[New process 4690]
[New process 4684]
[New process 4691]
[New process 4687]
[New process 4692]
[New process 4693]
[New process 4694]
[New process 4695]
[New process 4696]
[New process 4680]
[New process 4688]
[New process 4697]
#0 0x004092ba in ?? ()

Do I maybe need to compile KVM with some special debug flags? Is there no patch that increases the number of CPUs?

Cheers,
Cornelius

On Tuesday, 28.04.2009 at 11:41 +0300, Avi Kivity wrote:
Re: [PATCH 16/21] Remove clean rule change
Hollis Blanchard wrote:
> dtb is the compiled (binary) form of dts (source) device tree files. Think of it like bios.bin: if make clean doesn't delete bios.bin (and it looks like it doesn't), neither should it delete *.dtb, and we can drop the patch.
>
> Acked-by: Hollis Blanchard holl...@us.ibm.com

make clean doesn't delete bios.bin, because bios.bin is under source control (as it requires special tools to build). I see that *.dtb is also under source control, so will apply the patch.

--
error compiling committee.c: too many arguments to function
Re: KVM performance vs. Xen
Anthony Liguori wrote:
> Previously, the block API only exposed non-vector interfaces and bounced vectored operations to a linear buffer. That's been eliminated now though so we need to update the linux-aio patch to implement a vectored backend interface.
>
> However, it is an apples to apples comparison in terms of copying since the same is true with the thread pool. My take away was that the thread pool overhead isn't the major source of issues.

If the overhead is dominated by copying, then you won't see the difference. Once the copying is eliminated, the comparison may yield different results. We should certainly see a difference in context switches.

One cause of context switches won't be eliminated - the non-saturating workload causes us to switch to the idle thread, which incurs a heavyweight exit. This doesn't matter since we're idle anyway, but when we switch back, we incur a heavyweight entry.

--
error compiling committee.c: too many arguments to function
Re: CPU Limitations
Cornelius Wefelscheid wrote:
> Hi, I tried to get some useful information out of gdb, but it just gives me this:
>
> #0 0x004092ba in ?? ()
>
> Do I maybe need to compile KVM with some special debug flags? Is there no patch that increases the number of CPUs?

Use 'gdb /path/to/qemu core_file'

--
error compiling committee.c: too many arguments to function
Re: qemu-kvm.git now live
Avi Kivity wrote: Jan Kiszka wrote: Avi Kivity wrote:

Where/how does the migration code disable dirty logging?

Should be phase 3 of ram_save_live().

But only in qemu-kvm. What is the plan about pushing it upstream? Then we could discuss how to extend the existing support best.

Pushing things upstream is quite difficult because of the very different infrastructure.

Isn't the midterm goal to get rid of most of these differences (namely libkvm)?

Yes, but not by removing existing functionality.

No one said this.

It's unfortunate that upstream rewrote everything instead of changing things incrementally. Rewrites are almost always a mistake since they throw away accumulated knowledge.

I disagree, at least in this particular case. Upstream already diverged from qemu-kvm, and the latter provided no comparable alternative for slot management and dirty logging. And I still don't see that we lost anything that could not easily be re-integrated into upstream (i.e. global dirty logging), finally leading to a cleaner and more complete result.

It could have been done differently, by morphing the existing support into something mergeable, and merging that. In this way, we'd ensure no needed functionality is lost.

The existing support lacked features upstream already had and instead required additional hacks to make qemu-kvm work.

As is, we're adding something simple, then discovering it's insufficient. We're throwing away information, that's not a good way to make progress.

I doubt this applies here.

So, what bits are missing to make KVM migration work in upstream?

I don't know of anything beyond dirty logging.

OK, then I will pick this up and have a look at something comparable to cpu_physical_memory_set_dirty_tracking() for upstream.

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
Re: [PATCH 18/21] Remove host_alarm_timer hacks.
Avi Kivity wrote:
>> Do we really care about optimizing latency with -clock rtc though?
> People still run kvm on RHEL 5 (or cheap clones thereof), aren't they affected?

Do they use -clock rtc? -clock dynticks should still work on RHEL 5; it's just that you won't get very accurate timer events. You can only use -clock rtc with a single guest at a time so I doubt people use it seriously. The other option would be -clock unix but I can't see why you'd use -clock unix instead of -clock dynticks. The only reason to keep -clock unix around is for non-Linux unices.

--
Regards,
Anthony Liguori
Re: [PATCH 18/21] Remove host_alarm_timer hacks.
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> In modern KVM, the IO thread is capable of interrupting the CPU whenever it needs to process IO. Therefore this problem no longer exists.
>> It would still be good to verify that the problem no longer exists. This is not a cosmetic change; some testing is needed to verify it doesn't introduce new latencies.
> N.B. dynticks is the preferred timer in QEMU on Linux. To even hit this code path, you'd have to use an explicit -clock hpet or -clock rtc. I don't have an hpet on my laptop and -clock rtc boots just as fast as it did before.

I'll apply this and see what happens.

> Do we really care about optimizing latency with -clock rtc though?

People still run kvm on RHEL 5 (or cheap clones thereof), aren't they affected?

--
error compiling committee.c: too many arguments to function