[COMMIT master] document boot option to -drive parameter
From: Bruce Rogers brog...@novell.com The boot option is missing from the documentation for the -drive parameter. If there is a better way to describe it, I'm all ears. Signed-off-by: Bruce Rogers brog...@novell.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/qemu-options.hx b/qemu-options.hx index c5a160c..fbcf61e 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -160,6 +160,8 @@ an untrusted format header. This option specifies the serial number to assign to the device. @item addr=@var{addr} Specify the controller's PCI address (if=virtio only). +@item boot=@var{boot} +@var{boot} is on or off and allows for booting from non-traditional interfaces, such as virtio. @end table By default, writethrough caching is used for all block devices. This means that -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] Test cmps between two IO locations.
From: Gleb Natapov g...@redhat.com Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/emulator.c b/kvm/user/test/x86/emulator.c index c6adbb5..db84c13 100644 --- a/kvm/user/test/x86/emulator.c +++ b/kvm/user/test/x86/emulator.c @@ -17,18 +17,11 @@ void report(const char *name, int result) } } -void test_cmps(void *mem) +void test_cmps_one(unsigned char *m1, unsigned char *m3) { - unsigned char *m1 = mem, *m2 = mem + 1024; - unsigned char m3[1024]; void *rsi, *rdi; long rcx, tmp; - for (int i = 0; i < 100; ++i) - m1[i] = m2[i] = m3[i] = i; - for (int i = 100; i < 200; ++i) - m1[i] = (m3[i] = m2[i] = i) + 1; - rsi = m1; rdi = m3; rcx = 30; asm volatile("xor %[tmp], %[tmp] \n\t" "repe/cmpsb" @@ -91,6 +84,19 @@ void test_cmps(void *mem) } +void test_cmps(void *mem) +{ + unsigned char *m1 = mem, *m2 = mem + 1024; + unsigned char m3[1024]; + + for (int i = 0; i < 100; ++i) + m1[i] = m2[i] = m3[i] = i; + for (int i = 100; i < 200; ++i) + m1[i] = (m3[i] = m2[i] = i) + 1; + test_cmps_one(m1, m3); + test_cmps_one(m1, m2); +} + void test_cr8(void) { unsigned long src, dst;
[COMMIT master] test: access: don't expect fetch fault indication if !efer.nx
From: Avi Kivity a...@redhat.com Bit 4 of the page-fault error code can only be set if efer.nx=1. Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c index 5addd15..3338fbc 100644 --- a/kvm/user/test/x86/access.c +++ b/kvm/user/test/x86/access.c @@ -463,6 +463,8 @@ no_pte: fault: if (!at->expected_fault) at->ignore_pde = 0; +if (!at->flags[AC_CPU_EFER_NX]) +at->expected_error &= ~PFERR_FETCH_MASK; } static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond,
[COMMIT master] test: access: consolidate test failure reporting into a function
From: Avi Kivity a...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c index dbc1213..0906691 100644 --- a/kvm/user/test/x86/access.c +++ b/kvm/user/test/x86/access.c @@ -453,6 +453,28 @@ fault: ; } +static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond, + const char *fmt, ...) +{ +va_list ap; +char buf[500]; + +if (!*success_ret) { +return; +} + +if (!cond) { +return; +} + +*success_ret = false; + +va_start(ap, fmt); +vsnprintf(buf, sizeof(buf), fmt, ap); +va_end(ap); +printf("FAIL: %s\n", buf); +} + int ac_test_do_access(ac_test_t *at) { static unsigned unique = 42; @@ -460,6 +482,7 @@ int ac_test_do_access(ac_test_t *at) unsigned e; static unsigned char user_stack[4096]; unsigned long rsp; +_Bool success = true; ++unique; @@ -531,30 +554,21 @@ int ac_test_do_access(ac_test_t *at) "jmp back_to_kernel \n\t" ".section .text"); -if (fault && !at->expected_fault) { - printf("FAIL: unexpected fault\n"); - return 0; -} -if (!fault && at->expected_fault) { - printf("FAIL: unexpected access\n"); - return 0; -} -if (fault && e != at->expected_error) { - printf("FAIL: error code %x expected %x\n", e, at->expected_error); - return 0; -} -if (at->ptep && *at->ptep != at->expected_pte) { - printf("FAIL: pte %x expected %x\n", *at->ptep, at->expected_pte); - return 0; +ac_test_check(at, &success, fault && !at->expected_fault, + "unexpected fault"); +ac_test_check(at, &success, !fault && at->expected_fault, + "unexpected access"); +ac_test_check(at, &success, fault && e != at->expected_error, + "error code %x expected %x", e, at->expected_error); +ac_test_check(at, &success, at->ptep && *at->ptep != at->expected_pte, + "pte %x expected %x", *at->ptep, at->expected_pte); +ac_test_check(at, &success, *at->pdep != at->expected_pde, + "pde %x expected %x", *at->pdep, at->expected_pde); + +if (success) { +printf("PASS\n"); } - -if (*at->pdep != at->expected_pde) { - printf("FAIL: pde %x expected %x\n", *at->pdep, at->expected_pde); - return 0; -} - -printf("PASS\n"); -return 1; +return success; } static void ac_test_show(ac_test_t *at)
[COMMIT master] test: access: allow the processor not to set pde.a if a fault occurs
From: Avi Kivity a...@redhat.com Some processors only set accessed bits if the translation is valid; allow this behaviour. This squelches errors running with EPT. Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c index c7a7075..5addd15 100644 --- a/kvm/user/test/x86/access.c +++ b/kvm/user/test/x86/access.c @@ -137,6 +137,7 @@ typedef struct { pt_element_t expected_pte; pt_element_t *pdep; pt_element_t expected_pde; +pt_element_t ignore_pde; int expected_fault; unsigned expected_error; idt_entry_t idt[256]; @@ -370,6 +371,7 @@ void ac_test_setup_pte(ac_test_t *at) if (at->ptep) at->expected_pte = *at->ptep; at->expected_pde = *at->pdep; +at->ignore_pde = 0; at->expected_fault = 0; at->expected_error = PFERR_PRESENT_MASK; @@ -416,13 +418,17 @@ void ac_test_setup_pte(ac_test_t *at) if (at->flags[AC_ACCESS_FETCH] && at->flags[AC_PDE_NX]) at->expected_fault = 1; -if (at->expected_fault) +if (!at->flags[AC_PDE_ACCESSED]) +at->ignore_pde = PT_ACCESSED_MASK; + +if (!pde_valid) goto fault; -at->expected_pde |= PT_ACCESSED_MASK; +if (!at->expected_fault) +at->expected_pde |= PT_ACCESSED_MASK; if (at->flags[AC_PDE_PSE]) { - if (at->flags[AC_ACCESS_WRITE]) + if (at->flags[AC_ACCESS_WRITE] && !at->expected_fault) at->expected_pde |= PT_DIRTY_MASK; goto no_pte; } @@ -455,7 +461,8 @@ void ac_test_setup_pte(ac_test_t *at) no_pte: fault: -; +if (!at->expected_fault) +at->ignore_pde = 0; } static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond, @@ -484,6 +491,13 @@ static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond, printf("FAIL: %s\n", buf); } +static int pt_match(pt_element_t pte1, pt_element_t pte2, pt_element_t ignore) +{ +pte1 &= ~ignore; +pte2 &= ~ignore; +return pte1 == pte2; +} + int ac_test_do_access(ac_test_t *at) { static unsigned unique = 42; @@ -571,7 +585,8 @@ int ac_test_do_access(ac_test_t *at) "error code %x expected %x", e, at->expected_error); ac_test_check(at, &success, at->ptep && *at->ptep != at->expected_pte, "pte %x expected %x", *at->ptep, at->expected_pte); -ac_test_check(at, &success, *at->pdep != at->expected_pde, +ac_test_check(at, &success, + !pt_match(*at->pdep, at->expected_pde, at->ignore_pde), "pde %x expected %x", *at->pdep, at->expected_pde); if (success && verbose) {
[COMMIT master] Add test for ljmp.
From: Gleb Natapov g...@redhat.com Test that ljmp with operand in IO memory works. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/emulator.c b/kvm/user/test/x86/emulator.c index db84c13..4967d1f 100644 --- a/kvm/user/test/x86/emulator.c +++ b/kvm/user/test/x86/emulator.c @@ -183,6 +183,19 @@ void test_pop(void *mem) report("ret", 1); } +void test_ljmp(void *mem) +{ +unsigned char *m = mem; +volatile int res = 1; + +*(unsigned long**)m = &&jmpf; +asm volatile ("data16/mov %%cs, %0":"=m"(*(m + sizeof(unsigned long)))); +asm volatile ("rex64/ljmp *%0"::"m"(*m)); +res = 0; +jmpf: +report("ljmp", res); +} + unsigned long read_cr0(void) { unsigned long cr0; @@ -258,6 +271,7 @@ int main() test_smsw(); test_lmsw(); +test_ljmp(mem); printf("\nSUMMARY: %d tests, %d failures\n", tests, fails); return fails ? 1 : 0;
[COMMIT master] qemu-kvm: fix crash on reboot with vhost-net
From: Michael S. Tsirkin m...@redhat.com When vhost-net is disabled on reboot, we set msix mask notifier to NULL to disable further mask/unmask notifications. Code currently tries to pass this NULL to notifier, leading to a crash. The right thing to do is to add explicit APIs to enable/disable notifications. Now when disabling notifications: - if vector is masked, we don't need to notify backend, just disable future notifications - if vector is unmasked, invoke callback to unassign backend, then disable future notifications This patch also polls notifier before closing it, to make sure we don't lose events if poll callback didn't have time to run. Signed-off-by: Michael S. Tsirkin m...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/hw/msix.c b/hw/msix.c index 3ec8805..8f9a621 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -609,14 +609,44 @@ void msix_unuse_all_vectors(PCIDevice *dev) int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque) { +int r; +if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) +return 0; + +assert(dev->msix_mask_notifier); +assert(opaque); +assert(!dev->msix_mask_notifier_opaque[vector]); + +if (msix_is_masked(dev, vector)) { +return 0; +} +r = dev->msix_mask_notifier(dev, vector, opaque, +msix_is_masked(dev, vector)); +if (r < 0) { +return r; +} +dev->msix_mask_notifier_opaque[vector] = opaque; +return r; +} + +int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector) +{ int r = 0; if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) return 0; -if (dev->msix_mask_notifier) -r = dev->msix_mask_notifier(dev, vector, opaque, -msix_is_masked(dev, vector)); -if (r >= 0) -dev->msix_mask_notifier_opaque[vector] = opaque; +assert(dev->msix_mask_notifier); +assert(dev->msix_mask_notifier_opaque[vector]); + +if (msix_is_masked(dev, vector)) { +return 0; +} +r = dev->msix_mask_notifier(dev, vector, +dev->msix_mask_notifier_opaque[vector], +msix_is_masked(dev, vector)); +if (r < 0) { +return r; +} +dev->msix_mask_notifier_opaque[vector] = NULL; return r; } diff --git a/hw/msix.h b/hw/msix.h index f167231..6b21ffb 100644 --- a/hw/msix.h +++ b/hw/msix.h @@ -34,4 +34,5 @@ void msix_reset(PCIDevice *dev); extern int msix_supported; int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque); +int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector); #endif diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 99a588c..c4bc633 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -462,10 +462,13 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign) msix_set_mask_notifier(&proxy->pci_dev, virtio_queue_vector(proxy->vdev, n), vq); } else { -msix_set_mask_notifier(&proxy->pci_dev, - virtio_queue_vector(proxy->vdev, n), NULL); +msix_unset_mask_notifier(&proxy->pci_dev, +virtio_queue_vector(proxy->vdev, n)); qemu_set_fd_handler(event_notifier_get_fd(notifier), NULL, NULL, NULL); +/* Test and clear notifier before closing it, + * in case poll callback didn't have time to run. */ +virtio_pci_guest_notifier_read(vq); event_notifier_cleanup(notifier); }
[COMMIT master] qemu-kvm tests cleanup
From: Naphtali Sprei nsp...@redhat.com Mainly removed unused/unnecessary files and references to them Signed-off-by: Naphtali Sprei nsp...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/README b/kvm/user/README new file mode 100644 index 000..6a83831 --- /dev/null +++ b/kvm/user/README @@ -0,0 +1,23 @@ +This directory contains sources for a kvm test suite. + +Tests for the x86 architecture are run as kernel images for a qemu that supports the multiboot format. +Tests use an infrastructure called from the bios code. The infrastructure initializes the system/cpu's, +switches to long mode and calls the 'main' function of the individual test. +Tests use qemu's virtual test device, named testdev, for services like printing, exiting, querying memory size, etc. +See file testdev.txt for more details. + +To create the tests' images just type 'make' in this directory. +Tests' images are created in ./test/ARCH/*.flat + +An example of a test invocation: +qemu-system-x86_64 -device testdev,chardev=testlog -chardev file,id=testlog,path=msr.out -kernel ./test/x86/msr.flat +This invocation runs the msr test case. The test output is in file msr.out. + + + +Directory structure: +.: Makefile and config files for the tests +./test/lib: general services for the tests +./test/lib/ARCH: architecture dependent services for the tests +./test/ARCH: the sources of the tests and the created objects/images + diff --git a/kvm/user/balloon_ctl.c b/kvm/user/balloon_ctl.c deleted file mode 100755 index e65b08d..000 --- a/kvm/user/balloon_ctl.c +++ /dev/null @@ -1,92 +0,0 @@ -/* - * This binary provides access to the guest's balloon driver - * module. - * - * Copyright (C) 2007 Qumranet - * - * Author: - * - * Dor Laor dor.l...@qumranet.com - * - * This work is licensed under the GNU LGPL license, version 2.
- */ - -#include <unistd.h> -#include <fcntl.h> -#include <stdio.h> -#include <stdlib.h> -#include <sys/mman.h> -#include <string.h> -#include <errno.h> -#include <sys/ioctl.h> - -#define __user -#include <linux/kvm.h> - -#define PAGE_SIZE 4096ul - - -static int balloon_op(int *fd, int bytes) -{ - struct kvm_balloon_op bop; -int r; - - bop.npages = bytes/PAGE_SIZE; - r = ioctl(*fd, KVM_BALLOON_OP, &bop); - if (r == -1) - return -errno; - printf("Ballon handled %d pages successfully\n", bop.npages); - - return 0; -} - -static int balloon_init(int *fd) -{ - *fd = open("/dev/kvm_balloon", O_RDWR); - if (*fd == -1) { - perror("open /dev/kvm_balloon"); - return -1; - } - - return 0; -} - -int main(int argc, char *argv[]) -{ - int fd; - int r; - int bytes; - - if (argc != 3) { - perror("Please provide op=[i|d], bytes\n"); - return 1; - } - bytes = atoi(argv[2]); - - switch (*argv[1]) { - case 'i': - break; - case 'd': - bytes = -bytes; - break; - default: - perror("Wrong op param\n"); - return 1; - } - - if (balloon_init(&fd)) { - perror("balloon_init failed\n"); - return 1; - } - - if ((r = balloon_op(&fd, bytes))) { - perror("balloon_op failed\n"); - goto out; - } - -out: - close(fd); - - return r; -} - diff --git a/kvm/user/bootstrap.lds b/kvm/user/bootstrap.lds deleted file mode 100644 index fd0a4f8..000 --- a/kvm/user/bootstrap.lds +++ /dev/null @@ -1,15 +0,0 @@ -OUTPUT_FORMAT(binary) - -SECTIONS -{ -. = 0; -stext = .; -.text : { *(.init) *(.text) } -. = ALIGN(4K); -.data : { *(.data) } -. = ALIGN(16); -.bss : { *(.bss) } -. = ALIGN(4K); -edata = .; -} - diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak index f3172fb..8e795f0 100644 --- a/kvm/user/config-x86-common.mak +++ b/kvm/user/config-x86-common.mak @@ -2,9 +2,7 @@ CFLAGS += -I../include/x86 -all: kvmtrace test_cases - -balloon_ctl: balloon_ctl.o +all: test_cases cflatobjs += \ test/lib/x86/io.o \ @@ -21,21 +19,17 @@ CFLAGS += -m$(bits) libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name) FLATLIBS = test/lib/libcflat.a $(libgcc) -%.flat: %.o $(FLATLIBS) +%.flat: %.o $(FLATLIBS) flat.lds $(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS) -tests-common = $(TEST_DIR)/bootstrap \ - $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \ - $(TEST_DIR)/smptest.flat $(TEST_DIR)/port80.flat \ - $(TEST_DIR)/realmode.flat $(TEST_DIR)/msr.flat +tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \ + $(TEST_DIR)/smptest.flat $(TEST_DIR)/port80.flat \ + $(TEST_DIR)/realmode.flat $(TEST_DIR)/msr.flat test_cases: $(tests-common) $(tests) $(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I test/lib -I
[COMMIT master] qemu-kvm tests: add printing for passing tests
From: Naphtali Sprei nsp...@redhat.com Signed-off-by: Naphtali Sprei nsp...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c index bfc2942..bc4ed97 100644 --- a/kvm/user/test/x86/realmode.c +++ b/kvm/user/test/x86/realmode.c @@ -160,6 +160,8 @@ void test_xchg(void) if (!regs_equal(&inregs, &outregs, 0)) print_serial("xchg test 1: FAIL\n"); + else + print_serial("xchg test 1: PASS\n"); exec_in_big_real_mode(&inregs, &outregs, insn_xchg_test2, @@ -169,6 +171,8 @@ void test_xchg(void) outregs.eax != inregs.ebx || outregs.ebx != inregs.eax) print_serial("xchg test 2: FAIL\n"); + else + print_serial("xchg test 2: PASS\n"); exec_in_big_real_mode(&inregs, &outregs, insn_xchg_test3, @@ -178,6 +182,8 @@ void test_xchg(void) outregs.eax != inregs.ecx || outregs.ecx != inregs.eax) print_serial("xchg test 3: FAIL\n"); + else + print_serial("xchg test 3: PASS\n"); exec_in_big_real_mode(&inregs, &outregs, insn_xchg_test4, @@ -187,6 +193,8 @@ void test_xchg(void) outregs.eax != inregs.edx || outregs.edx != inregs.eax) print_serial("xchg test 4: FAIL\n"); + else + print_serial("xchg test 4: PASS\n"); exec_in_big_real_mode(&inregs, &outregs, insn_xchg_test5, @@ -196,6 +204,8 @@ void test_xchg(void) outregs.eax != inregs.esi || outregs.esi != inregs.eax) print_serial("xchg test 5: FAIL\n"); + else + print_serial("xchg test 5: PASS\n"); exec_in_big_real_mode(&inregs, &outregs, insn_xchg_test6, @@ -205,6 +215,8 @@ void test_xchg(void) outregs.eax != inregs.edi || outregs.edi != inregs.eax) print_serial("xchg test 6: FAIL\n"); + else + print_serial("xchg test 6: PASS\n"); exec_in_big_real_mode(&inregs, &outregs, insn_xchg_test7, @@ -214,6 +226,8 @@ void test_xchg(void) outregs.eax != inregs.ebp || outregs.ebp != inregs.eax) print_serial("xchg test 7: FAIL\n"); + else + print_serial("xchg test 7: PASS\n"); exec_in_big_real_mode(&inregs, &outregs, insn_xchg_test8, @@ -223,6 +237,8 @@ void test_xchg(void) outregs.eax != inregs.esp || outregs.esp != inregs.eax) print_serial("xchg test 8: FAIL\n"); + else + print_serial("xchg test 8: PASS\n"); } void test_shld(void) @@ -234,9 +250,9 @@ void test_shld(void) insn_shld_test, insn_shld_test_end - insn_shld_test); if (outregs.eax != 0xbeef) - print_serial("shld: failure\n"); + print_serial("shld: FAIL\n"); else - print_serial("shld: success\n"); + print_serial("shld: PASS\n"); } void test_mov_imm(void) @@ -253,6 +269,8 @@ void test_mov_imm(void) insn_mov_r16_imm_1_end - insn_mov_r16_imm_1); if (!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 1234) print_serial("mov test 1: FAIL\n"); + else + print_serial("mov test 1: PASS\n"); /* test mov $imm, %eax */ exec_in_big_real_mode(&inregs, &outregs, @@ -260,6 +278,8 @@ void test_mov_imm(void) insn_mov_r32_imm_1_end - insn_mov_r32_imm_1); if (!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 1234567890) print_serial("mov test 2: FAIL\n"); + else + print_serial("mov test 2: PASS\n"); /* test mov $imm, %al/%ah */ exec_in_big_real_mode(&inregs, &outregs, @@ -267,16 +287,24 @@ void test_mov_imm(void) insn_mov_r8_imm_1_end - insn_mov_r8_imm_1); if (!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1200) print_serial("mov test 3: FAIL\n"); + else + print_serial("mov test 3: PASS\n"); + exec_in_big_real_mode(&inregs, &outregs, insn_mov_r8_imm_2, insn_mov_r8_imm_2_end - insn_mov_r8_imm_2); if (!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x34) print_serial("mov test 4: FAIL\n"); + else + print_serial("mov test 4: PASS\n"); + exec_in_big_real_mode(&inregs, &outregs, insn_mov_r8_imm_3, insn_mov_r8_imm_3_end - insn_mov_r8_imm_3); if (!regs_equal(&inregs, &outregs, R_AX) || outregs.eax != 0x1234)
[COMMIT master] Merge branch 'upstream-merge'
From: Marcelo Tosatti mtosa...@redhat.com * upstream-merge: (65 commits) block: read-only: open cdrom as read-only when using monitor's change command fix whitespace bogon in some versions of make Changes to usb-linux to conform to coding style Add KVM CFLAGS to vhost build QMP: Introduce RESUME event virtio-9p: Create a syntactic shortcut for the file-system pass-thru virtio-9p: Add P9_TFLUSH support virtio-9p: Add P9_TREMOVE support. virtio-9p: Add P9_TWSTAT support virtio-9p: Add P9_TCREATE support virtio-9p: Add P9_TWRITE support virtio-9p: Add P9_TCLUNK support virtio-9p: Add P9_TREAD support virtio-9p: Add P9_TOPEN support. virtio-9p: Add P9_TWALK support virtio-9p: Add P9_TSTAT support virtio-9p: Add P9_TATTACH support. virtio-9p: Add P9_TVERSION support virtio-9p: Add sg helper functions virtio-9p: Add stat and mode related helper functions. ... Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
[COMMIT master] Merge branch 'upstream-merge'
From: Marcelo Tosatti mtosa...@redhat.com * upstream-merge: (243 commits) virtio-serial: Implement flow control for individual ports virtio-serial: Discard data that guest sends us when ports aren't connected virtio-serial: Apps should consume all data that guest sends out / Fix virtio api abuse virtio-serial: Handle scatter/gather input from the guest virtio-serial: Handle scatter-gather buffers for control messages iov: Add iov_to_buf and iov_size helpers iov: Introduce a new file for helpers around iovs, add iov_from_buf() virtio-serial: Send out guest data to ports only if port is opened virtio-serial: Propagate errors in initialising ports / devices in guest virtio-serial: Update copyright year to 2010 virtio-serial: Remove redundant check for 0-sized write request virtio-serial: whitespace: match surrounding code virtio-serial: Use control messages to notify guest of new ports virtio-serial: save/load: Send target host connection status if different virtio-serial: save/load: Ensure we have hot-plugged ports instantiated virtio-serial: save/load: Ensure nr_ports on src and dest are same. virtio-serial: save/load: Ensure target has enough ports microblaze: fix custom fprintf Implement cpu_get_real_ticks for Alpha. target-alpha: Implement RPCC. ... Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
[COMMIT master] qemu-kvm: emulator tests: fix msr test
From: Naphtali Sprei nsp...@redhat.com use correct 64 bit mode inline assembly constraints use a canonical form address when writing to the MSR_KERNEL_GS_BASE MSR Signed-off-by: Naphtali Sprei nsp...@redhat.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/test/x86/msr.c b/kvm/user/test/x86/msr.c index 92102fa..0d6f286 100644 --- a/kvm/user/test/x86/msr.c +++ b/kvm/user/test/x86/msr.c @@ -17,23 +17,25 @@ static void report(const char *name, int passed) static void wrmsr(unsigned index, unsigned long long value) { - asm volatile ("wrmsr" : : "c"(index), "A"(value)); + unsigned a = value, d = value >> 32; + + asm volatile("wrmsr" : : "a"(a), "d"(d), "c"(index)); } static unsigned long long rdmsr(unsigned index) { - unsigned long long value; - - asm volatile ("rdmsr" : "=A"(value) : "c"(index)); + unsigned a, d; - return value; + asm volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(index)); + return ((unsigned long long)d << 32) | a; } + #endif static void test_kernel_gs_base(void) { #ifdef __x86_64__ - unsigned long long v1 = 0x123456789abcdef, v2; + unsigned long long v1 = 0x123456789abc, v2; wrmsr(MSR_KERNEL_GS_BASE, v1); v2 = rdmsr(MSR_KERNEL_GS_BASE);
[COMMIT master] kvm test: Fix i386 crossbuild
From: Jan Kiszka jan.kis...@siemens.com This fixes make ARCH=i386 of the KVM micro tests on x86-64 hosts. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak index 63cca42..f3172fb 100644 --- a/kvm/user/config-x86-common.mak +++ b/kvm/user/config-x86-common.mak @@ -18,6 +18,8 @@ $(libcflat): CFLAGS += -ffreestanding -I test/lib CFLAGS += -m$(bits) +libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name) + FLATLIBS = test/lib/libcflat.a $(libgcc) %.flat: %.o $(FLATLIBS) $(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS) @@ -32,7 +34,7 @@ test_cases: $(tests-common) $(tests) $(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I test/lib -I test/lib/x86 $(TEST_DIR)/bootstrap: $(TEST_DIR)/bootstrap.o - $(CC) -nostdlib -o $@ -Wl,-T,bootstrap.lds $^ + $(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,bootstrap.lds $^ $(TEST_DIR)/access.flat: $(cstart.o) $(TEST_DIR)/access.o $(TEST_DIR)/print.o
Re: [PATCH 1/2] x86: eliminate TS_XSAVE
On 05/04/2010 12:45 AM, H. Peter Anvin wrote: I was trying to avoid a performance regression relative to the current code, as it appears that some care was taken to avoid the memory reference. I agree that it's probably negligible compared to the save/restore code. If the x86 maintainers agree as well, I'll replace it with cpu_has_xsave. I asked Suresh to comment on this, since he wrote the original code. He did confirm that the intent was to avoid a global memory reference. Ok, so you're happy with the patch as is? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2.6.33.3: possible recursive locking detected
I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3. qemu-kvm version 0.12.3. When doing: echo noop /sys/block/vdd/queue/scheduler I got: [ 1424.438241] = [ 1424.439588] [ INFO: possible recursive locking detected ] [ 1424.440368] 2.6.33.3-moocow.20100429-142641 #2 [ 1424.440960] - [ 1424.440960] bash/2186 is trying to acquire lock: [ 1424.440960] (s_active){.+}, at: [811046b8] sysfs_remove_dir+0x75/0x88 [ 1424.440960] [ 1424.440960] but task is already holding lock: [ 1424.440960] (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 1424.440960] [ 1424.440960] other info that might help us debug this: [ 1424.440960] 4 locks held by bash/2186: [ 1424.440960] #0: (buffer-mutex){+.+.+.}, at: [8110317f] sysfs_write_file+0x39/0x126 [ 1424.440960] #1: (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 1424.440960] #2: (s_active){.+}, at: [81104856] sysfs_get_active_two+0x2c/0x46 [ 1424.440960] #3: (q-sysfs_lock){+.+.+.}, at: [8119c3f0] queue_attr_store+0x44/0x85 [ 1424.440960] [ 1424.440960] stack backtrace: [ 1424.440960] Pid: 2186, comm: bash Not tainted 2.6.33.3-moocow.20100429-142641 #2 [ 1424.440960] Call Trace: [ 1424.440960] [8105e775] __lock_acquire+0xf9f/0x178e [ 1424.440960] [8100d3ec] ? save_stack_trace+0x2a/0x48 [ 1424.440960] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 1424.440960] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 1424.440960] [8105cb56] ? trace_hardirqs_on+0xd/0xf [ 1424.440960] [8105f02e] lock_acquire+0xca/0xef [ 1424.440960] [811046b8] ? sysfs_remove_dir+0x75/0x88 [ 1424.440960] [8110458d] sysfs_addrm_finish+0xc8/0x13a [ 1424.440960] [811046b8] ? sysfs_remove_dir+0x75/0x88 [ 1424.440960] [8105cb25] ? 
trace_hardirqs_on_caller+0x110/0x134 [ 1424.440960] [811046b8] sysfs_remove_dir+0x75/0x88 [ 1424.440960] [811ab312] kobject_del+0x16/0x37 [ 1424.440960] [81195489] elv_iosched_store+0x10a/0x214 [ 1424.440960] [8119c416] queue_attr_store+0x6a/0x85 [ 1424.440960] [81103237] sysfs_write_file+0xf1/0x126 [ 1424.440960] [810b747f] vfs_write+0xae/0x14a [ 1424.440960] [810b75df] sys_write+0x47/0x6e [ 1424.440960] [81002202] system_call_fastpath+0x16/0x1b Original scheduler was cfq. Having rebooted and defaulted to noop I tried echo noop /sys/block/vdd/queue/scheduler and got: [ 311.294464] = [ 311.295820] [ INFO: possible recursive locking detected ] [ 311.296603] 2.6.33.3-moocow.20100429-142641 #2 [ 311.296833] - [ 311.296833] bash/2190 is trying to acquire lock: [ 311.296833] (s_active){.+}, at: [81104630] remove_dir+0x31/0x39 [ 311.296833] [ 311.296833] but task is already holding lock: [ 311.296833] (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 311.296833] [ 311.296833] other info that might help us debug this: [ 311.296833] 4 locks held by bash/2190: [ 311.296833] #0: (buffer-mutex){+.+.+.}, at: [8110317f] sysfs_write_file+0x39/0x126 [ 311.296833] #1: (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 311.296833] #2: (s_active){.+}, at: [81104856] sysfs_get_active_two+0x2c/0x46 [ 311.296833] #3: (q-sysfs_lock){+.+.+.}, at: [8119c3f0] queue_attr_store+0x44/0x85 [ 311.296833] [ 311.296833] stack backtrace: [ 311.296833] Pid: 2190, comm: bash Not tainted 2.6.33.3-moocow.20100429-142641 #2 [ 311.296833] Call Trace: [ 311.296833] [8105e775] __lock_acquire+0xf9f/0x178e [ 311.296833] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 311.296833] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 311.296833] [8105cb56] ? trace_hardirqs_on+0xd/0xf [ 311.296833] [8105f02e] lock_acquire+0xca/0xef [ 311.296833] [81104630] ? remove_dir+0x31/0x39 [ 311.296833] [8110458d] sysfs_addrm_finish+0xc8/0x13a [ 311.296833] [81104630] ? 
remove_dir+0x31/0x39 [ 311.296833] [8105cb25] ? trace_hardirqs_on_caller+0x110/0x134 [ 311.296833] [81104630] remove_dir+0x31/0x39 [ 311.296833] [811046c0] sysfs_remove_dir+0x7d/0x88 [ 311.296833] [811ab312] kobject_del+0x16/0x37 [ 311.296833] [81195489] elv_iosched_store+0x10a/0x214 [ 311.296833] [8119c416] queue_attr_store+0x6a/0x85 [ 311.296833] [81103237] sysfs_write_file+0xf1/0x126 [ 311.296833] [810b747f] vfs_write+0xae/0x14a [ 311.296833] [810b75df] sys_write+0x47/0x6e [ 311.296833] [81002202] system_call_fastpath+0x16/0x1b Changing back to noop
Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu
On 05/03/2010 07:32 PM, Joerg Roedel wrote: On Tue, Apr 27, 2010 at 03:58:36PM +0300, Avi Kivity wrote: So we probably need to upgrade gva_t to a u64. Please send this as a separate patch, and test on i386 hosts. Are there _any_ regular tests of KVM on i386 hosts? For me this is terribly broken (also after I fixed the issue which gave me a VMEXIT_INVALID at the first vmrun). No, apart from the poor users. I'll try to set something up using nsvm. -- error compiling committee.c: too many arguments to function
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On 05/04/2010 07:38 AM, Rusty Russell wrote: On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote: I took a stub at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me, this might be a first step to cleaning them. ISTR Christoph had withdrawn some patches in this area, and was waiting for him to resubmit? I've given up on figuring out the block device. What seem to me to be sane semantics along the lines of memory barriers are foreign to disk people: they want (and depend on) flushing everywhere. For example, tdb transactions do not require a flush, they only require what I would call a barrier: that prior data be written out before any future data. Surely that would be more efficient in general than a flush! In fact, TDB wants only writes to *that file* (and metadata) written out first; it has no ordering issues with other I/O on the same device. I think that's SCSI ordered tags. A generic I/O interface would allow you to specify this request depends on these outstanding requests and leave it at that. It might have some sync flush command for dumb applications and OSes. The userspace API might be not be as precise and only allow such a barrier against all prior writes on this fd. Depends on all previous requests, and will commit before all following requests. ie a full barrier. ISTR someone mentioning a desire for such an API years ago, so CC'ing the usual I/O suspects... I'd love to see TCQ exposed to user space. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath - Assertion
On 03.05.2010 23:26, Peter Lieven wrote: Hi Qemu/KVM Devel Team, i'm using qemu-kvm 0.12.3 with latest Kernel 2.6.33.3. As backend we use open-iSCSI with dm-multipath. Multipath is configured to queue i/o if no path is available. If we create a failure on all paths, qemu starts to consume 100% CPU due to i/o waits which is ok so far. 1 odd thing: The Monitor Interface is not responding any more ... What is a real blocker is that KVM crashes with: kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed. after the multipath has reestablished at least one path. Can you get a stack backtrace with gdb? Any ideas? I remember this was working with earlier kernel/kvm/qemu versions. If it works in the same setup with an older qemu version, bisecting might help. Kevin
Re: 2.6.33.3: possible recursive locking detected
On 05/04/2010 10:03 AM, CaT wrote: I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3. qemu-kvm version 0.12.3. Doesn't appear to be related to kvm. Copying lkml. When doing: echo noop/sys/block/vdd/queue/scheduler I got: [ 1424.438241] = [ 1424.439588] [ INFO: possible recursive locking detected ] [ 1424.440368] 2.6.33.3-moocow.20100429-142641 #2 [ 1424.440960] - [ 1424.440960] bash/2186 is trying to acquire lock: [ 1424.440960] (s_active){.+}, at: [811046b8] sysfs_remove_dir+0x75/0x88 [ 1424.440960] [ 1424.440960] but task is already holding lock: [ 1424.440960] (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 1424.440960] [ 1424.440960] other info that might help us debug this: [ 1424.440960] 4 locks held by bash/2186: [ 1424.440960] #0: (buffer-mutex){+.+.+.}, at: [8110317f] sysfs_write_file+0x39/0x126 [ 1424.440960] #1: (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 1424.440960] #2: (s_active){.+}, at: [81104856] sysfs_get_active_two+0x2c/0x46 [ 1424.440960] #3: (q-sysfs_lock){+.+.+.}, at: [8119c3f0] queue_attr_store+0x44/0x85 [ 1424.440960] [ 1424.440960] stack backtrace: [ 1424.440960] Pid: 2186, comm: bash Not tainted 2.6.33.3-moocow.20100429-142641 #2 [ 1424.440960] Call Trace: [ 1424.440960] [8105e775] __lock_acquire+0xf9f/0x178e [ 1424.440960] [8100d3ec] ? save_stack_trace+0x2a/0x48 [ 1424.440960] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 1424.440960] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 1424.440960] [8105cb56] ? trace_hardirqs_on+0xd/0xf [ 1424.440960] [8105f02e] lock_acquire+0xca/0xef [ 1424.440960] [811046b8] ? sysfs_remove_dir+0x75/0x88 [ 1424.440960] [8110458d] sysfs_addrm_finish+0xc8/0x13a [ 1424.440960] [811046b8] ? sysfs_remove_dir+0x75/0x88 [ 1424.440960] [8105cb25] ? 
trace_hardirqs_on_caller+0x110/0x134 [ 1424.440960] [811046b8] sysfs_remove_dir+0x75/0x88 [ 1424.440960] [811ab312] kobject_del+0x16/0x37 [ 1424.440960] [81195489] elv_iosched_store+0x10a/0x214 [ 1424.440960] [8119c416] queue_attr_store+0x6a/0x85 [ 1424.440960] [81103237] sysfs_write_file+0xf1/0x126 [ 1424.440960] [810b747f] vfs_write+0xae/0x14a [ 1424.440960] [810b75df] sys_write+0x47/0x6e [ 1424.440960] [81002202] system_call_fastpath+0x16/0x1b Original scheduler was cfq. Having rebooted and defaulted to noop I tried echo noop/sys/block/vdd/queue/scheduler and got: [ 311.294464] = [ 311.295820] [ INFO: possible recursive locking detected ] [ 311.296603] 2.6.33.3-moocow.20100429-142641 #2 [ 311.296833] - [ 311.296833] bash/2190 is trying to acquire lock: [ 311.296833] (s_active){.+}, at: [81104630] remove_dir+0x31/0x39 [ 311.296833] [ 311.296833] but task is already holding lock: [ 311.296833] (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 311.296833] [ 311.296833] other info that might help us debug this: [ 311.296833] 4 locks held by bash/2190: [ 311.296833] #0: (buffer-mutex){+.+.+.}, at: [8110317f] sysfs_write_file+0x39/0x126 [ 311.296833] #1: (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 311.296833] #2: (s_active){.+}, at: [81104856] sysfs_get_active_two+0x2c/0x46 [ 311.296833] #3: (q-sysfs_lock){+.+.+.}, at: [8119c3f0] queue_attr_store+0x44/0x85 [ 311.296833] [ 311.296833] stack backtrace: [ 311.296833] Pid: 2190, comm: bash Not tainted 2.6.33.3-moocow.20100429-142641 #2 [ 311.296833] Call Trace: [ 311.296833] [8105e775] __lock_acquire+0xf9f/0x178e [ 311.296833] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 311.296833] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 311.296833] [8105cb56] ? trace_hardirqs_on+0xd/0xf [ 311.296833] [8105f02e] lock_acquire+0xca/0xef [ 311.296833] [81104630] ? remove_dir+0x31/0x39 [ 311.296833] [8110458d] sysfs_addrm_finish+0xc8/0x13a [ 311.296833] [81104630] ? 
remove_dir+0x31/0x39 [ 311.296833] [8105cb25] ? trace_hardirqs_on_caller+0x110/0x134 [ 311.296833] [81104630] remove_dir+0x31/0x39 [ 311.296833] [811046c0] sysfs_remove_dir+0x7d/0x88 [ 311.296833] [811ab312] kobject_del+0x16/0x37 [ 311.296833] [81195489] elv_iosched_store+0x10a/0x214 [ 311.296833] [8119c416] queue_attr_store+0x6a/0x85 [ 311.296833] [81103237] sysfs_write_file+0xf1/0x126 [ 311.296833] [810b747f] vfs_write+0xae/0x14a [ 311.296833] [810b75df] sys_write+0x47/0x6e [
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Tue, May 04 2010, Rusty Russell wrote: On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote: I took a stub at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me, this might be a first step to cleaning them. ISTR Christoph had withdrawn some patches in this area, and was waiting for him to resubmit? I've given up on figuring out the block device. What seem to me to be sane semantics along the lines of memory barriers are foreign to disk people: they want (and depend on) flushing everywhere. For example, tdb transactions do not require a flush, they only require what I would call a barrier: that prior data be written out before any future data. Surely that would be more efficient in general than a flush! In fact, TDB wants only writes to *that file* (and metadata) written out first; it has no ordering issues with other I/O on the same device. A generic I/O interface would allow you to specify this request depends on these outstanding requests and leave it at that. It might have some sync flush command for dumb applications and OSes. The userspace API might be not be as precise and only allow such a barrier against all prior writes on this fd. ISTR someone mentioning a desire for such an API years ago, so CC'ing the usual I/O suspects... It would be nice to have a more fuller API for this, but the reality is that only the flush approach is really workable. Even just strict ordering of requests could only be supported on SCSI, and even there the kernel still lacks proper guarantees on error handling to prevent reordering there. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Booting/installing WindowsNT
On 05/03/2010 08:03 PM, Michael Tokarev wrote: Michael, can you try to use -cpu host,-vme and see if that makes a difference? With -cpu host,-vme winNT boots just fine as with just -cpu host. I also tried with -cpu qemu64 and kvm64, with +vme and -vme (4 combinations in total) - in all cases winNT crashes with the same 0x003E error. So it appears that vme makes no difference. Please try again the model/vendor/family. I suggest using x86info on both to see what the differences are, using -cpu host with overrides to make it equivalent to qemu64 (and verifying it fails), then removing the overrides one by one until it works. -- error compiling committee.c: too many arguments to function
Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu
On Tue, May 04, 2010 at 03:53:57AM -0400, Avi Kivity wrote: On 05/03/2010 07:32 PM, Joerg Roedel wrote: On Tue, Apr 27, 2010 at 03:58:36PM +0300, Avi Kivity wrote: So we probably need to upgrade gva_t to a u64. Please send this as a separate patch, and test on i386 hosts. Are there _any_ regular tests of KVM on i386 hosts? For me this is terribly broken (also after I fixed the issue which gave me a VMEXIT_INVALID at the first vmrun). No, apart from the poor users. I'll try to set something up using nsvm. Ok. I will post an initial fix for the VMEXIT_INVALID bug soon. Apart from that I get a lockdep warning when I try to start a guest. The guest actually boots if it is single-vcpu. SMP guests don't even boot through the BIOS for me. Joerg
Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu
On 05/04/2010 12:11 PM, Roedel, Joerg wrote: On Tue, May 04, 2010 at 03:53:57AM -0400, Avi Kivity wrote: On 05/03/2010 07:32 PM, Joerg Roedel wrote: On Tue, Apr 27, 2010 at 03:58:36PM +0300, Avi Kivity wrote: So we probably need to upgrade gva_t to a u64. Please send this as a separate patch, and test on i386 hosts. Are there _any_ regular tests of KVM on i386 hosts? For me this is terribly broken (also after I fixed the issue which gave me a VMEXIT_INVALID at the first vmrun). No, apart from the poor users. I'll try to set something up using nsvm. Ok. I will post an initial fix for the VMEXIT_INVALID bug soon. Apart from that I get a lockdep warning when I try to start a guest. The guest actually boots if it is single-vcpu. SMP guests don't even boot through the BIOS for me. Strange. i386 vs x86_64 shouldn't have that much effect! -- error compiling committee.c: too many arguments to function
[PATCH] qemu-kvm: Process exit requests in kvm loop
This unbreaks the monitor quit command for qemu-kvm. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- qemu-kvm.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 91f0222..43d599d 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -2047,6 +2047,9 @@ int kvm_main_loop(void) vm_stop(EXCP_DEBUG); kvm_debug_cpu_requested = NULL; } +if (qemu_exit_requested()) { +exit(0); +} } pause_all_threads();
Re: KVM: x86: properly update ready_for_interrupt_injection
On 05/04/2010 05:04 AM, Marcelo Tosatti wrote: The recent changes to emulate string instructions without entering guest mode exposed a bug where pending interrupts are not properly reflected in ready_for_interrupt_injection. The result is that userspace overwrites a previously queued interrupt, when irqchip's are emulated in qemu. Applied, thanks. Fix by always updating state before returning to userspace. Why are we even doing this if irqchip_in_kernel? -- error compiling committee.c: too many arguments to function
Re: [PATCH v2] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean
On 04/28/2010 12:50 PM, Takuya Yoshikawa wrote: Hi Marcelo, Avi, I updated the patch as follows. Changelog: 1. Inserted one r = -ENOMEM; line following Avi's advice. 2. Little change of explanation about performance improvements. I'm now testing and cleaning up my next patch series based on this, so please apply this if this makes sense and has no problems. Thanks, Takuya === Although we always allocate a new dirty bitmap in x86's get_dirty_log(), it is only used as a zero-source of copy_to_user() and freed right after that when memslot is clean. This patch uses clear_user() instead of doing this unnecessary zero-source allocation. Performance improvement: as we can expect easily, the time needed to allocate a bitmap is completely reduced. In my test, the improved ioctl was about 4 to 10 times faster than the original one for clean slots. Furthermore, reducing memory allocations and copies will produce good effects to caches too. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu
On Tue, May 04, 2010 at 05:20:02AM -0400, Avi Kivity wrote: On 05/04/2010 12:11 PM, Roedel, Joerg wrote: On Tue, May 04, 2010 at 03:53:57AM -0400, Avi Kivity wrote: On 05/03/2010 07:32 PM, Joerg Roedel wrote: On Tue, Apr 27, 2010 at 03:58:36PM +0300, Avi Kivity wrote: So we probably need to upgrade gva_t to a u64. Please send this as a separate patch, and test on i386 hosts. Are there _any_ regular tests of KVM on i386 hosts? For me this is terribly broken (also after I fixed the issue which gave me a VMEXIT_INVALID at the first vmrun). No, apart from the poor users. I'll try to set something up using nsvm. Ok. I will post an initial fix for the VMEXIT_INVALID bug soon. Apart from that I get a lockdep warning when I try to start a guest. The guest actually boots if it is single-vcpu. SMP guests don't even boot through the BIOS for me. Strange. i386 vs x86_64 shouldn't have that much effect! This is the lockdep warning I get when I start booting a Linux kernel. It is with the nested-npt patchset but the warning occurs without it too (slightly different backtraces then). [60390.953424] === [60390.954324] [ INFO: possible circular locking dependency detected ] [60390.954324] 2.6.34-rc5 #7 [60390.954324] --- [60390.954324] qemu-system-x86/2506 is trying to acquire lock: [60390.954324] (mm-mmap_sem){++}, at: [c10ab0f4] might_fault+0x4c/0x86 [60390.954324] [60390.954324] but task is already holding lock: [60390.954324] ((kvm-mmu_lock)-rlock){+.+...}, at: [f8ec1b50] spin_lock+0xd/0xf [kvm] [60390.954324] [60390.954324] which lock already depends on the new lock. 
[60390.954324] [60390.954324] [60390.954324] the existing dependency chain (in reverse order) is: [60390.954324] [60390.954324] - #1 ((kvm-mmu_lock)-rlock){+.+...}: [60390.954324][c10575ad] __lock_acquire+0x9fa/0xb6c [60390.954324][c10577b8] lock_acquire+0x99/0xb8 [60390.954324][c15afa2b] _raw_spin_lock+0x20/0x2f [60390.954324][f8eafe19] spin_lock+0xd/0xf [kvm] [60390.954324][f8eb104e] kvm_mmu_notifier_invalidate_range_start+0x2f/0x71 [kvm] [60390.954324][c10bc994] __mmu_notifier_invalidate_range_start+0x31/0x57 [60390.954324][c10b1de3] mprotect_fixup+0x153/0x3d5 [60390.954324][c10b21ca] sys_mprotect+0x165/0x1db [60390.954324][c10028cc] sysenter_do_call+0x12/0x32 [60390.954324] [60390.954324] - #0 (mm-mmap_sem){++}: [60390.954324][c10574af] __lock_acquire+0x8fc/0xb6c [60390.954324][c10577b8] lock_acquire+0x99/0xb8 [60390.954324][c10ab111] might_fault+0x69/0x86 [60390.954324][c11d5987] _copy_from_user+0x36/0x119 [60390.954324][f8eafcd9] copy_from_user+0xd/0xf [kvm] [60390.954324][f8eb0ac0] kvm_read_guest_page+0x24/0x33 [kvm] [60390.954324][f8ebb362] kvm_read_guest_page_mmu+0x55/0x63 [kvm] [60390.954324][f8ebb397] kvm_read_nested_guest_page+0x27/0x2e [kvm] [60390.954324][f8ebb3da] load_pdptrs+0x3c/0x9e [kvm] [60390.954324][f84747ac] svm_cache_reg+0x25/0x2b [kvm_amd] [60390.954324][f8ec7894] kvm_mmu_load+0xf1/0x1fa [kvm] [60390.954324][f8ebbdfc] kvm_arch_vcpu_ioctl_run+0x252/0x9c7 [kvm] [60390.954324][f8eb1fb5] kvm_vcpu_ioctl+0xee/0x432 [kvm] [60390.954324][c10cf8e9] vfs_ioctl+0x2c/0x96 [60390.954324][c10cfe88] do_vfs_ioctl+0x491/0x4cf [60390.954324][c10cff0c] sys_ioctl+0x46/0x66 [60390.954324][c10028cc] sysenter_do_call+0x12/0x32 [60390.954324] [60390.954324] other info that might help us debug this: [60390.954324] [60390.954324] 3 locks held by qemu-system-x86/2506: [60390.954324] #0: (vcpu-mutex){+.+.+.}, at: [f8eb1185] vcpu_load+0x16/0x32 [kvm] [60390.954324] #1: (kvm-srcu){.+.+.+}, at: [f8eb952c] srcu_read_lock+0x0/0x33 [kvm] [60390.954324] #2: 
((kvm-mmu_lock)-rlock){+.+...}, at: [f8ec1b50] spin_lock+0xd/0xf [kvm] [60390.954324] [60390.954324] stack backtrace: [60390.954324] Pid: 2506, comm: qemu-system-x86 Not tainted 2.6.34-rc5 #7 [60390.954324] Call Trace: [60390.954324] [c15adf46] ? printk+0x14/0x16 [60390.954324] [c1056877] print_circular_bug+0x8a/0x96 [60390.954324] [c10574af] __lock_acquire+0x8fc/0xb6c [60390.954324] [f8ec1b50] ? spin_lock+0xd/0xf [kvm] [60390.954324] [c10ab0f4] ? might_fault+0x4c/0x86 [60390.954324] [c10577b8] lock_acquire+0x99/0xb8 [60390.954324] [c10ab0f4] ? might_fault+0x4c/0x86 [60390.954324] [c10ab111] might_fault+0x69/0x86 [60390.954324] [c10ab0f4] ? might_fault+0x4c/0x86 [60390.954324] [c11d5987] _copy_from_user+0x36/0x119 [60390.954324] [f8eafcd9] copy_from_user+0xd/0xf [kvm] [60390.954324] [f8eb0ac0] kvm_read_guest_page+0x24/0x33 [kvm] [60390.954324] [f8ebb362] kvm_read_guest_page_mmu+0x55/0x63 [kvm] [60390.954324] [f8ebb397] kvm_read_nested_guest_page+0x27/0x2e
Re: apparent key mapping error for usb keyboard
On 04/27/2010 12:46 PM, Michael Tokarev wrote: I've a debian bugreport that claims to have a fix for apparently wrong keymap for usb keyboard. I noticed this before with ps/2 keyboard too, the sympthoms were that e.g windows keys were not working in guests, but later on that has been fixed. But with `-usbdevice keyboard', i.e. with usb keyboard, it still does not work. See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578846 for details and for the proposed patch which fixes the mentioned issue. Here's the patch itself: --- a/hw/usb-hid.c +++ b/hw/usb-hid.c @@ -399,3 +399,3 @@ 0x51, 0x4e, 0x49, 0x4c, 0x00, 0x00, 0x00, 0x00, -0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +0x00, 0x00, 0x00, 0xe3, 0xe7, 0x65, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, I'm not sure if it's right fix however. Hence I'm asking for opinions here. If it's a right way to go, it should probably be applied to -stable too. I've no idea, but the correct place to ask is qemu-devel (copied). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu
On 05/04/2010 12:37 PM, Roedel, Joerg wrote: This is the lockdep warning I get when I start booting a Linux kernel. It is with the nested-npt patchset but the warning occurs without it too (slightly different backtraces then). [60390.953424] === [60390.954324] [ INFO: possible circular locking dependency detected ] [60390.954324] 2.6.34-rc5 #7 [60390.954324] --- [60390.954324] qemu-system-x86/2506 is trying to acquire lock: [60390.954324] (mm-mmap_sem){++}, at: [c10ab0f4] might_fault+0x4c/0x86 [60390.954324] [60390.954324] but task is already holding lock: [60390.954324] ((kvm-mmu_lock)-rlock){+.+...}, at: [f8ec1b50] spin_lock+0xd/0xf [kvm] [60390.954324] [60390.954324] which lock already depends on the new lock. [60390.954324] [60390.954324] [60390.954324] the existing dependency chain (in reverse order) is: [60390.954324] [60390.954324] - #1 ((kvm-mmu_lock)-rlock){+.+...}: [60390.954324][c10575ad] __lock_acquire+0x9fa/0xb6c [60390.954324][c10577b8] lock_acquire+0x99/0xb8 [60390.954324][c15afa2b] _raw_spin_lock+0x20/0x2f [60390.954324][f8eafe19] spin_lock+0xd/0xf [kvm] [60390.954324][f8eb104e] kvm_mmu_notifier_invalidate_range_start+0x2f/0x71 [kvm] [60390.954324][c10bc994] __mmu_notifier_invalidate_range_start+0x31/0x57 [60390.954324][c10b1de3] mprotect_fixup+0x153/0x3d5 [60390.954324][c10b21ca] sys_mprotect+0x165/0x1db [60390.954324][c10028cc] sysenter_do_call+0x12/0x32 Unrelated. This can take the lock and free it. It only shows up because we do memory ops inside the mmu_lock, which is deeply forbidden (anything which touches user memory, including kmalloc(), can trigger mmu notifiers and recursive locking). 
[60390.954324] [60390.954324] - #0 (mm-mmap_sem){++}: [60390.954324][c10574af] __lock_acquire+0x8fc/0xb6c [60390.954324][c10577b8] lock_acquire+0x99/0xb8 [60390.954324][c10ab111] might_fault+0x69/0x86 [60390.954324][c11d5987] _copy_from_user+0x36/0x119 [60390.954324][f8eafcd9] copy_from_user+0xd/0xf [kvm] [60390.954324][f8eb0ac0] kvm_read_guest_page+0x24/0x33 [kvm] [60390.954324][f8ebb362] kvm_read_guest_page_mmu+0x55/0x63 [kvm] [60390.954324][f8ebb397] kvm_read_nested_guest_page+0x27/0x2e [kvm] [60390.954324][f8ebb3da] load_pdptrs+0x3c/0x9e [kvm] [60390.954324][f84747ac] svm_cache_reg+0x25/0x2b [kvm_amd] [60390.954324][f8ec7894] kvm_mmu_load+0xf1/0x1fa [kvm] [60390.954324][f8ebbdfc] kvm_arch_vcpu_ioctl_run+0x252/0x9c7 [kvm] [60390.954324][f8eb1fb5] kvm_vcpu_ioctl+0xee/0x432 [kvm] [60390.954324][c10cf8e9] vfs_ioctl+0x2c/0x96 [60390.954324][c10cfe88] do_vfs_ioctl+0x491/0x4cf [60390.954324][c10cff0c] sys_ioctl+0x46/0x66 [60390.954324][c10028cc] sysenter_do_call+0x12/0x32 Just a silly bug. kvm_pdptr_read() can cause a guest memory read on svm, in this case with the mmu lock taken. I'll post something to fix it. What makes me wondering about this is that the two traces to the locks seem to belong to different threads. Ever increasing complexity... -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu
On 05/04/2010 12:45 PM, Avi Kivity wrote: Just a silly bug. kvm_pdptr_read() can cause a guest memory read on svm, in this case with the mmu lock taken. I'll post something to fix it. I guess this was not reported because most svm machines have npt, and this requires npt=0 to trigger. Nonpae paging disables npt, so you were hit. Interestingly, nsvm makes it more likely to appear, since npt on i386+pae will need the pdptrs. -- error compiling committee.c: too many arguments to function
[PATCH] KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
On svm, kvm_read_pdptr() may require reading guest memory, which can sleep. Push the spinlock into mmu_alloc_roots(), and only take it after we've read the pdptr. Signed-off-by: Avi Kivity a...@redhat.com --- Marcelo, dropping and re-acquiring the lock before mmu_sync_roots(), is fine, yes? arch/x86/kvm/mmu.c | 7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 51eb6d6..de99638 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2065,11 +2065,13 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) direct = 1; root_gfn = 0; } + spin_lock(&vcpu->kvm->mmu_lock); sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL, direct, ACC_ALL, NULL); root = __pa(sp->spt); ++sp->root_count; + spin_unlock(&vcpu->kvm->mmu_lock); vcpu->arch.mmu.root_hpa = root; return 0; } @@ -2093,11 +2095,14 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) direct = 1; root_gfn = i << 30; } + spin_lock(&vcpu->kvm->mmu_lock); sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL, direct, ACC_ALL, NULL); root = __pa(sp->spt); ++sp->root_count; + spin_unlock(&vcpu->kvm->mmu_lock); + vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK; } vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root); @@ -2466,7 +2471,9 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) goto out; spin_lock(&vcpu->kvm->mmu_lock); kvm_mmu_free_some_pages(vcpu); + spin_unlock(&vcpu->kvm->mmu_lock); r = mmu_alloc_roots(vcpu); + spin_lock(&vcpu->kvm->mmu_lock); mmu_sync_roots(vcpu); spin_unlock(&vcpu->kvm->mmu_lock); if (r) -- 1.7.0.4
[PATCH] KVM: kvm_pdptr_read() may sleep
Annotate it thusly. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/kvm_cache_regs.h | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index cff851c..d2a98f8 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -36,6 +36,8 @@ static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val) static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index) { + might_sleep(); /* on svm */ + if (!test_bit(VCPU_EXREG_PDPTR, (unsigned long *)vcpu->arch.regs_avail)) kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR); -- 1.7.0.4
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Tue, May 04, 2010 at 02:08:24PM +0930, Rusty Russell wrote: On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote: I took a stab at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me, this might be a first step to cleaning them. ISTR Christoph had withdrawn some patches in this area, and was waiting for him to resubmit? Any patches I've withdrawn in this area are withdrawn for good. But what I really need to do is to review Michael's spec updates, sorry. I'll get back to it today.
Re: [PATCH] KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
On Tue, May 04, 2010 at 06:03:50AM -0400, Avi Kivity wrote: On svm, kvm_read_pdptr() may require reading guest memory, which can sleep. Push the spinlock into mmu_alloc_roots(), and only take it after we've read the pdptr. This fixes the lockdep issue for me. Thanks. Tested-by: Joerg Roedel joerg.roe...@amd.com Signed-off-by: Avi Kivity a...@redhat.com --- Marcelo, dropping and re-acquiring the lock before mmu_sync_roots(), is fine, yes? arch/x86/kvm/mmu.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 51eb6d6..de99638 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2065,11 +2065,13 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) direct = 1; root_gfn = 0; } + spin_lock(vcpu-kvm-mmu_lock); sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL, direct, ACC_ALL, NULL); root = __pa(sp-spt); ++sp-root_count; + spin_unlock(vcpu-kvm-mmu_lock); vcpu-arch.mmu.root_hpa = root; return 0; } @@ -2093,11 +2095,14 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) direct = 1; root_gfn = i 30; } + spin_lock(vcpu-kvm-mmu_lock); sp = kvm_mmu_get_page(vcpu, root_gfn, i 30, PT32_ROOT_LEVEL, direct, ACC_ALL, NULL); root = __pa(sp-spt); ++sp-root_count; + spin_unlock(vcpu-kvm-mmu_lock); + vcpu-arch.mmu.pae_root[i] = root | PT_PRESENT_MASK; } vcpu-arch.mmu.root_hpa = __pa(vcpu-arch.mmu.pae_root); @@ -2466,7 +2471,9 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) goto out; spin_lock(vcpu-kvm-mmu_lock); kvm_mmu_free_some_pages(vcpu); + spin_unlock(vcpu-kvm-mmu_lock); r = mmu_alloc_roots(vcpu); + spin_lock(vcpu-kvm-mmu_lock); mmu_sync_roots(vcpu); spin_unlock(vcpu-kvm-mmu_lock); if (r) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] amended: first round of vhost-net enhancements for net-next
David, This is an amended pull request: I have rebased the tree to the correct patches. This has been through basic tests and seems to work fine here. The following tree includes a couple of enhancements that help vhost-net. Please pull them for net-next. Another set of patches is under debugging/testing and I hope to get them ready in time for 2.6.35, so there may be another pull request later. Thanks! The following changes since commit 7ef527377b88ff05fb122a47619ea506c631c914: Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (2010-05-02 22:02:06 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost Michael S. Tsirkin (2): tun: add ioctl to modify vnet header size macvtap: add ioctl to modify vnet header size drivers/net/macvtap.c | 27 +++ drivers/net/tun.c | 32 include/linux/if_tun.h |2 ++ 3 files changed, 53 insertions(+), 8 deletions(-) -- MST
Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath - Assertion
hi kevin, i set a breakpint at bmdma_active_if. the first 2 breaks encountered when the last path in the multipath failed, but the assertion was not true. when i kicked one path back in the breakpoint was reached again, this time leading to an assert. the stacktrace is from the point shortly before. hope this helps. br, peter -- (gdb) b bmdma_active_if Breakpoint 2 at 0x43f2e0: file /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h, line 507. (gdb) c Continuing. [Switching to Thread 0x7f7b3300d950 (LWP 21171)] Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507 507assert(bmdma-unit != (uint8_t)-1); (gdb) c Continuing. Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507 507assert(bmdma-unit != (uint8_t)-1); (gdb) c Continuing. Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507 507assert(bmdma-unit != (uint8_t)-1); (gdb) bt full #0 bmdma_active_if (bmdma=0xe31fd8) at /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507 __PRETTY_FUNCTION__ = bmdma_active_if #1 0x0043f6ba in ide_read_dma_cb (opaque=0xe31fd8, ret=0) at /usr/src/qemu-kvm-0.12.3/hw/ide/core.c:554 bm = (BMDMAState *) 0xe31fd8 s = (IDEState *) 0xe17940 n = 0 sector_num = 0 #2 0x0058730c in dma_bdrv_cb (opaque=0xe17940, ret=0) at /usr/src/qemu-kvm-0.12.3/dma-helpers.c:94 dbs = (DMAAIOCB *) 0xe17940 cur_addr = 0 cur_len = 0 mem = (void *) 0x0 #3 0x0049e510 in qemu_laio_process_completion (s=0xe119c0, laiocb=0xe179c0) at linux-aio.c:68 ret = 0 #4 0x0049e611 in qemu_laio_enqueue_completed (s=0xe119c0, laiocb=0xe179c0) at linux-aio.c:107 No locals. 
#5 0x0049e787 in qemu_laio_completion_cb (opaque=0xe119c0) at linux-aio.c:144 iocb = (struct iocb *) 0xe179f0 laiocb = (struct qemu_laiocb *) 0xe179c0 val = 1 ret = 8 nevents = 1 i = 0 events = {{data = 0x0, obj = 0xe179f0, res = 4096, res2 = 0}, {data = 0x0, obj = 0x0, res = 0, res2 = 0} repeats 46 times, {data = 0x0, obj = 0x0, res = 0, res2 = 4365191}, {data = 0x429abf, obj = 0x7f7b3300c410, res = 4614129721674825936, res2 = 14777248}, {data = 0x300018, obj = 0x7f7b3300c4c0, res = 140167113393152, res2 = 47259417504}, {data = 0xe17740, obj = 0xa3300c4e0, res = 140167113393184, res2 = 0}, {data = 0xe17740, obj = 0x0, res = 0, res2 = 17}, {data = 0x7f7b3300ccf0, obj = 0x92, res = 32, res2 = 168}, {data = 0x7f7b33797a00, obj = 0x801000, res = 0, res2 = 140167141433408}, {data = 0x7f7b34496e00, obj = 0x7f7b33797a00, res = 140167113393392, res2 = 8392704}, {data = 0x0, obj = 0x7f7b34aca040, res = 140167134932480, res2 = 140167118209654}, {data = 0x7f7b3300d950, obj = 0x42603d, res = 0, res2 = 42949672960}, {data = 0x7f7b3300c510, obj = 0xe17ba0, res = 14776128, res2 = 43805361568}, {data = 0x7f7b3300ced0, obj = 0x42797e, res = 0, res2 = 14777248}, { data = 0x174, obj = 0x0, res = 373, res2 = 0}, {data = 0x176, obj = 0x0, res = 3221225601, res2 = 0}, {data = 0x4008ae89c083, obj = 0x0, res = 209379655938, res2 = 0}, { data = 0x7f7bc084, obj = 0x0, res = 3221225602, res2 = 0}, {data = 0x7f7b0012, obj = 0x0, res = 17, res2 = 0}, {data = 0x0, obj = 0x11, res = 140167113395840, res2 = 146}, {data = 0x20, obj = 0xa8, res = 140167121304064, res2 = 8392704}, {data = 0x0, obj = 0x7f7b34aca040, res = 140167134932480, res2 = 140167121304064}, { data = 0x7f7b3300c680, obj = 0x801000, res = 0, res2 = 140167141433408}, {data = 0x7f7b34496e00, obj = 0x7f7b334a4276, res = 140167113398608, res2 = 4350013}, {data = 0x0, obj = 0xa, res = 140167113393824, res2 = 14777248}, {data = 0xe2c010, obj = 0xa3300c730, res = 140167113396320, res2 = 4356478}, {data = 0x0, obj = 0xe17ba0, res = 372, 
res2 = 0}, {data = 0x175, obj = 0x0, res = 374, res2 = 0}, {data = 0xc081, obj = 0x0, res = 3221225603, res2 = 0}, {data = 0xc102, obj = 0x0, res = 3221225604, res2 = 0}, {data = 0xc082, obj = 0x0, res = 18, res2 = 0}, {data = 0x11, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, obj = 0x0, res = 0, res2 = 0}, { data = 0x0, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, obj = 0x0, res = 0, res2 = 140167139245116}, {data = 0x0, obj = 0x7f7b34abe118, res = 9, res2 = 13}, {data = 0x25bf5fc6, obj = 0x7f7b348b40f0, res = 140167117719264, res2 = 6}, { data = 0x96fd7f, obj = 0x7f7b3300c850, res = 140167113394680, res2 = 140167117724520}, {data = 0x0, obj = 0x7f7b34abe168, res = 140167141388288, res2 = 4206037}, { data = 0x7f7b3343a210, obj = 0x402058, res = 21474836480, res2 = 4294968102}, {data = 0x0, obj = 0x7f7b34ac8358, res = 140167113394736, res2 = 140167113394680}, { data = 0x25bf5fc6, obj = 0x7f7b3300c9e0, res = 0, res2 = 140167139246910}, {data = 0x0, obj = 0x7f7b34abe168, res
Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu
On Tue, May 04, 2010 at 05:50:50AM -0400, Avi Kivity wrote:
> On 05/04/2010 12:45 PM, Avi Kivity wrote:
> > Just a silly bug. kvm_pdptr_read() can cause a guest memory read on svm, in this case with the mmu lock taken. I'll post something to fix it.
>
> I guess this was not reported because most svm machines have npt, and this requires npt=0 to trigger. Nonpae paging disables npt, so you were hit. Interestingly, nsvm makes it more likely to appear, since npt on i386+pae will need the pdptrs.

Hmm, actually it happened on 32 bit with npt enabled. I think this can trigger when mmu_alloc_roots is called for a pae guest, because it accidentally tries to read the root_gfn from the guest before it figures out that it runs with tdp and omits the gfn read from the guest. I need to touch this for nested-npt and will look into a way of improving this.

	Joerg
[PATCH] KVM: Fix wallclock version writing race
Wallclock writing uses an unprotected global variable to hold the version; this can cause one guest to interfere with another if both write their wallclock at the same time.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f6f8dad..c3152d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -754,14 +754,22 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 {
-	static int version;
+	int version;
+	int r;
 	struct pvclock_wall_clock wc;
 	struct timespec boot;
 
 	if (!wall_clock)
 		return;
 
-	version++;
+	r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
+	if (r)
+		return;
+
+	if (version & 1)
+		++version;  /* first time write, random junk */
+
+	++version;
 
 	kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
-- 
1.7.0.4
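The odd/even handling above follows the pvclock convention that an odd version word marks an update in progress. A minimal userspace sketch of the fixed flow (a hypothetical `write_wall_clock()` with plain pointer accesses standing in for kvm_read_guest()/kvm_write_guest(); the wallclock payload write is elided):

```c
#include <stdint.h>

/* Sketch, not the kernel code: the version now lives in guest memory,
 * so concurrent guests each update their own copy instead of sharing
 * one static counter. */
static void write_wall_clock(uint32_t *guest_version)
{
    uint32_t version = *guest_version;   /* kvm_read_guest() */

    if (version & 1)
        ++version;                       /* first-time write, random junk */

    ++version;                           /* now odd: update in progress */
    *guest_version = version;            /* kvm_write_guest() */

    /* ... write the pvclock_wall_clock payload here ... */

    ++version;                           /* now even: update complete */
    *guest_version = version;
}
```

Starting from junk (odd) the guest ends up on an even, strictly larger version, which is what pvclock readers rely on to detect a torn read.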
Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu
On 05/04/2010 03:00 PM, Roedel, Joerg wrote:
> On Tue, May 04, 2010 at 05:50:50AM -0400, Avi Kivity wrote:
> > On 05/04/2010 12:45 PM, Avi Kivity wrote:
> > > Just a silly bug. kvm_pdptr_read() can cause a guest memory read on svm, in this case with the mmu lock taken. I'll post something to fix it.
> > I guess this was not reported because most svm machines have npt, and this requires npt=0 to trigger. Nonpae paging disables npt, so you were hit. Interestingly, nsvm makes it more likely to appear, since npt on i386+pae will need the pdptrs.
>
> Hmm, actually it happened on 32 bit with npt enabled. I think this can trigger when mmu_alloc_roots is called for a pae guest because it accidentally tries to read the root_gfn from the guest before it figures out that it runs with tdp and omits the gfn read from the guest.

Yes. I had a patchset which moved the 'direct' calculation earlier and skipped root_gfn if it was direct, but it was broken. If you like I can resurrect it, but it may interfere with your work.

-- 
error compiling committee.c: too many arguments to function
Re: Booting/installing WindowsNT
Avi Kivity wrote:
> On 05/03/2010 08:03 PM, Michael Tokarev wrote:
> > > Michael, can you try to use -cpu host,-vme and see if that makes a difference?
> > With -cpu host,-vme winNT boots just fine as with just -cpu host. I also tried with -cpu qemu64 and kvm64, with +vme and -vme (4 combinations in total) - in all cases winNT crashes with the same 0x003E error. So it appears that vme makes no difference.
> Please try again the model/vendor/family. I suggest using x86info on both to see what the differences are, using -cpu host with overrides to make it equivalent to qemu64 (and verifying it fails), then removing the overrides one by one until it works.

I managed to get a NT4 CD and can confirm the issues you see. I am about to debug this now. With -cpu host (on an AMD K8, similar to Michael's) I get to the point Michael mentioned:

  Microsoft (R) Windows NT (TM) Version 4.0 (Build 1381).
  1 System Processor [512 MB Memory] Multiprocessor Kernel

Then it _seems_ to hang, checking in a tight loop for getting beyond a certain TSC value (rdtsc; cmp %edx, %edi; ja @rdtsc; jb bailout; cmp %eax, %ebx; ja @rdtsc; bailout:). But after some time (when I got back from the monitor, but also without going into it) I could proceed with the installation. Michael, can you confirm this?

I will now try to get behind the STOP 3E error.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath - Assertion
Am 04.05.2010 13:38, schrieb Peter Lieven:
> hi kevin,
> i set a breakpoint at bmdma_active_if. the first 2 breaks were encountered when the last path in the multipath failed, but the assertion was not true. when i kicked one path back in, the breakpoint was reached again, this time leading to an assert. the stacktrace is from the point shortly before. hope this helps.

Hm, looks like there's something wrong with cancelling requests - bdrv_aio_cancel might decide that it completes a request (and consequently calls the callback for it) whereas the IDE emulation decides that it's done with the request before calling bdrv_aio_cancel. I haven't looked in much detail at what this could break, but does something like this help?

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0757528..3cd55e3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
 void ide_dma_cancel(BMDMAState *bm)
 {
     if (bm->status & BM_STATUS_DMAING) {
-        bm->status &= ~BM_STATUS_DMAING;
-        /* cancel DMA request */
-        bm->unit = -1;
-        bm->dma_cb = NULL;
         if (bm->aiocb) {
 #ifdef DEBUG_AIO
             printf("aio_cancel\n");
@@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
             bdrv_aio_cancel(bm->aiocb);
             bm->aiocb = NULL;
         }
+        bm->status &= ~BM_STATUS_DMAING;
+        /* cancel DMA request */
+        bm->unit = -1;
+        bm->dma_cb = NULL;
     }
 }

Kevin
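The ordering matters because bdrv_aio_cancel() may deliver the pending completion synchronously, and the IDE completion callback still needs the DMA state (notably the unit) to be valid. A hypothetical userspace model of the bug and the fix (all names here - `struct bmdma`, `aio_cancel()`, `dma_cancel_fixed()` - are illustrative, not the QEMU API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bmdma {
    bool dmaing;                     /* models BM_STATUS_DMAING */
    int unit;                        /* -1 means "no active drive" */
    void (*dma_cb)(struct bmdma *);
    bool have_aiocb;                 /* models bm->aiocb != NULL */
};

static void dma_done_cb(struct bmdma *bm)
{
    /* Like ide_read_dma_cb() via bmdma_active_if(): the callback
     * derives the drive from bm->unit, so -1 here would assert. */
    assert(bm->unit != -1);
}

/* Models bdrv_aio_cancel(): completing the request invokes its
 * callback before the cancel returns. */
static void aio_cancel(struct bmdma *bm)
{
    if (bm->have_aiocb) {
        bm->dma_cb(bm);
        bm->have_aiocb = false;
    }
}

/* Fixed ordering, as in the patch above: cancel first (the callback
 * still sees valid state), clear the DMA state only afterwards. */
static void dma_cancel_fixed(struct bmdma *bm)
{
    if (bm->dmaing) {
        aio_cancel(bm);
        bm->dmaing = false;
        bm->unit = -1;
        bm->dma_cb = NULL;
    }
}
```

With the original ordering, `unit` would already be -1 when `dma_done_cb` ran, which is exactly the assertion Peter hit.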
Re: Booting/installing WindowsNT
Andre Przywara wrote:
[]
> I managed to get a NT4 CD and can confirm the issues you see. I am about to debug this now. With -cpu host (on an AMD K8, similar to Michael's) I get to the point Michael mentioned:
>   Microsoft (R) Windows NT (TM) Version 4.0 (Build 1381).
>   1 System Processor [512 MB Memory] Multiprocessor Kernel
> Then it _seems_ to hang, checking in a tight loop for getting beyond a certain TSC value. But after some time (when I got back from the monitor, but also without going into it) I could proceed with the installation. Michael, can you confirm this?

I've seen 3 variants here so far:

1. Normal installation. It stops for a while after that kernel message you mentioned - for several seconds, maybe even 20 seconds - and after a while it continues. During all this time the guest cpu usage is 100%, like you describe (a tight loop). This is what I call "working" - I never bothered to think whether that tight loop/pause is normal or not. This is what happens for me with -cpu host.

2. With -cpu pentium it also displays that kernel message but stops there without any cpu usage whatsoever. I waited for some 40 minutes at one point (I just forgot I started it, but later on noticed there's a QEMU window floating around with that NT kernel message on it and nothing happening).

3. In all other cases so far it BSoDs with a STOP 0x3E error right before displaying that kernel message.

So.. I'm not sure if it's a confirmation or not :)

Thanks!

/mjt
[RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space
Hi, sorry for sending from my personal account. The following series are all from me:

From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

This is the 3rd version of moving dirty bitmaps to user space. From this version, we add x86, ppc and asm-generic people to the CC list.

[To KVM people]

Sorry for being late to reply to your comments.

Avi,
 - I've written an answer to your question in patch 5/12: drivers/vhost/vhost.c.
 - I've considered changing set_bit_user_non_atomic to an inline function, but did not, because the other helpers in uaccess.h are written as macros. Anyway, I hope that x86 people will give us appropriate suggestions about this.
 - I thought that documentation about making bitmaps 64-bit aligned would be written when we add an API to register user-allocated bitmaps. So probably in the next series.

Avi, Alex,
 - Could you check the ia64 and ppc parts, please? I tried to keep the logical changes as small as possible.

I personally tried to build these with cross compilers. For ia64, I could check build success with my patch series. But book3s, even without my patch series, failed with the following errors:

arch/powerpc/kvm/book3s_paired_singles.c: In function 'kvmppc_emulate_paired_single':
arch/powerpc/kvm/book3s_paired_singles.c:1289: error: the frame size of 2288 bytes is larger than 2048 bytes
make[1]: *** [arch/powerpc/kvm/book3s_paired_singles.o] Error 1
make: *** [arch/powerpc/kvm] Error 2

About the changelog: there are two main changes from the 2nd version:
 1. I changed the treatment of clean slots (see patch 1/12). This was already applied today, thanks!
 2. I changed the switch API (see patch 11/12). To show this API's advantage, I also did a test (see the end of this mail).

[To x86 people]
Hi, Thomas, Ingo, Peter,
Please review patches 4,5/12. Because this is the first time for me to send patches to x86, please tell me if anything is lacking.

[To ppc people]
Hi, Benjamin, Paul, Alex,
Please see patches 6,7/12.
First, I have to say sorry that I've not tested these yet; in that sense, they may not be of sufficient quality for precise review. But I will be happy if you would give me any comments. Alex, could you help me? Though I have a plan to get a PPC box in the future, currently I cannot test these.

[To asm-generic people]
Hi, Arnd,
Please review patch 8/12. Is this kind of macro acceptable?

[Performance test]

We measured the tsc needed for the ioctl()s that get dirty logs in the kernel.

Test environment: AMD Phenom(tm) 9850 Quad-Core Processor with 8GB memory

1. GUI test (running an Ubuntu guest in graphical mode)

  sudo qemu-system-x86_64 -hda dirtylog_test.img -boot c -m 4192 -net ...

We show a relatively stable part to compare how much time is needed for the basic parts of the dirty log ioctl.

                     get.org  get.opt  switch.opt
slots[7].len=32768    278379    66398       64024
slots[8].len=32768    181246      270         160
slots[7].len=32768    263961    64673       64494
slots[8].len=32768    181655      265         160
slots[7].len=32768    263736    64701       64610
slots[8].len=32768    182785      267         160
slots[7].len=32768    260925    65360       65042
slots[8].len=32768    182579      264         160
slots[7].len=32768    267823    65915       65682
slots[8].len=32768    186350      271         160

At a glance, we see that our optimization improved things significantly compared to the original get dirty log ioctl. This is true for both get.opt and switch.opt. This has a really big impact for personal KVM users who drive KVM in GUI mode on their usual PCs.

Next, we notice that switch.opt improved by a hundred nanoseconds or so for these slots. Although this may sound like a tiny improvement, we can feel it as a difference in the GUI's responses, like mouse reactions. To feel the difference, please try the GUI on your PC with our patch series!

2. Live-migration test (4GB guest, write loop with 1GB buf)

We also did a live-migration test.
                          get.org  get.opt  switch.opt
slots[0].len=655360        797383   261144      222181
slots[1].len=3757047808   2186721  1965244     1842824
slots[2].len=637534208    1433562  1012723     1031213
slots[3].len=131072        216858      331         331
slots[4].len=131072        121635      225         164
slots[5].len=131072        120863      356         164
slots[6].len=16777216      121746     1133         156
slots[7].len=32768         120415      230         278
slots[8].len=32768         120368      216         149
slots[0].len=655360        806497   194710      223582
slots[1].len=3757047808   2142922  1878025     1895369
slots[2].len=637534208    1386512  1021309     1000345
slots[3].len=131072        221118      459         296
slots[4].len=131072        121516      272         166
slots[5].len=131072        122652      244         173
[RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean
Although we always allocate a new dirty bitmap in x86's get_dirty_log(), it is only used as a zero-source of copy_to_user() and freed right after that when the memslot is clean. This patch uses clear_user() instead of doing this unnecessary zero-source allocation.

Performance improvement: as we can easily expect, the time needed to allocate a bitmap is completely eliminated. Furthermore, we can avoid the tlb flush triggered by vmalloc() and get some good effects. In my test, the improved ioctl was about 4 to 10 times faster than the original one for clean slots.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/x86.c |   37 +++--
 1 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..b95a211 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
 	unsigned long is_dirty = 0;
-	unsigned long *dirty_bitmap = NULL;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2759,27 +2758,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	r = -ENOMEM;
-	dirty_bitmap = vmalloc(n);
-	if (!dirty_bitmap)
-		goto out;
-	memset(dirty_bitmap, 0, n);
-
 	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
 		is_dirty = memslot->dirty_bitmap[i];
 
 	/* If nothing is dirty, don't bother messing with page tables. */
 	if (is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
+		unsigned long *dirty_bitmap;
 
 		spin_lock(&kvm->mmu_lock);
 		kvm_mmu_slot_remove_write_access(kvm, log->slot);
 		spin_unlock(&kvm->mmu_lock);
 
-		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-		if (!slots)
-			goto out_free;
+		r = -ENOMEM;
+		dirty_bitmap = vmalloc(n);
+		if (!dirty_bitmap)
+			goto out;
+		memset(dirty_bitmap, 0, n);
+
+		r = -ENOMEM;
+		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+		if (!slots) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
 
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
 
@@ -2788,13 +2790,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		synchronize_srcu_expedited(&kvm->srcu);
 		dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
 		kfree(old_slots);
+
+		r = -EFAULT;
+		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n)) {
+			vfree(dirty_bitmap);
+			goto out;
+		}
+		vfree(dirty_bitmap);
+	} else {
+		r = -EFAULT;
+		if (clear_user(log->dirty_bitmap, n))
+			goto out;
 	}
 
 	r = 0;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-		r = -EFAULT;
-out_free:
-	vfree(dirty_bitmap);
 out:
 	mutex_unlock(&kvm->slots_lock);
 	return r;
-- 
1.7.0.4
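The gain in the clean-slot path comes from zeroing the destination directly instead of materializing a zero-filled source bitmap just to copy it out. A rough userspace analogy (not kernel code: memset() stands in for clear_user(), and the calloc()+memcpy() pair for the old vmalloc()+copy_to_user() path):

```c
#include <stdlib.h>
#include <string.h>

/* Old path: allocate a zero-filled buffer only to copy zeros out. */
static int copy_zeros_via_alloc(unsigned char *user_buf, size_t n)
{
    unsigned char *zeros = calloc(1, n);   /* vmalloc() + memset() */
    if (!zeros)
        return -1;
    memcpy(user_buf, zeros, n);            /* copy_to_user() */
    free(zeros);                           /* vfree() */
    return 0;
}

/* New path: no allocation at all. */
static int copy_zeros_direct(unsigned char *user_buf, size_t n)
{
    memset(user_buf, 0, n);                /* clear_user() */
    return 0;
}
```

Both leave the destination identical; the second simply skips the allocation (and, in the kernel case, the vmalloc-induced tlb flush).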
[RFC][PATCH 2/12] KVM: introduce slot level dirty state management
This patch introduces an is_dirty member for each memory slot. Using this member, we remove the dirty bitmap scans that are done in get_dirty_log(). This is important for moving dirty bitmaps to user space, because we don't have any good way to check bitmaps in user space with low cost, and scanning bitmaps to check memory slot dirtiness would not be acceptable.

When we mark a slot dirty:
 - x86 and ppc: at the timing of mark_page_dirty()
 - ia64: at the timing of kvm_ia64_sync_dirty_log()

ia64 uses a different place to store dirty logs and synchronizes it with the logs of memory slots right before get_dirty_log(). So we use this timing to update is_dirty.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Avi Kivity a...@redhat.com
CC: Alexander Graf ag...@suse.de
---
 arch/ia64/kvm/kvm-ia64.c  |   11 +++
 arch/powerpc/kvm/book3s.c |    9 -
 arch/x86/kvm/x86.c        |    9 +++--
 include/linux/kvm_host.h  |    4 ++--
 virt/kvm/kvm_main.c       |   13 +++--
 5 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d5f4e91..17fd65c 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1824,6 +1824,9 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
 	base = memslot->base_gfn / BITS_PER_LONG;
 
 	for (i = 0; i < n/sizeof(long); ++i) {
+		if (dirty_bitmap[base + i])
+			memslot->is_dirty = true;
+
 		memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
 		dirty_bitmap[base + i] = 0;
 	}
@@ -1838,7 +1841,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	int r;
 	unsigned long n;
 	struct kvm_memory_slot *memslot;
-	int is_dirty = 0;
 
 	mutex_lock(&kvm->slots_lock);
 	spin_lock(&kvm->arch.dirty_log_lock);
@@ -1847,16 +1849,17 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	if (r)
 		goto out;
 
-	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	r = kvm_get_dirty_log(kvm, log);
 	if (r)
 		goto out;
 
+	memslot = &kvm->memslots->memslots[log->slot];
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
+	if (memslot->is_dirty) {
 		kvm_flush_remote_tlbs(kvm);
-		memslot = &kvm->memslots->memslots[log->slot];
 		n = kvm_dirty_bitmap_bytes(memslot);
 		memset(memslot->dirty_bitmap, 0, n);
+		memslot->is_dirty = false;
 	}
 	r = 0;
 out:
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 28e785f..4b074f1 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1191,20 +1191,18 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	struct kvm_memory_slot *memslot;
 	struct kvm_vcpu *vcpu;
 	ulong ga, ga_end;
-	int is_dirty = 0;
 	int r;
 	unsigned long n;
 
 	mutex_lock(&kvm->slots_lock);
 
-	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	r = kvm_get_dirty_log(kvm, log);
 	if (r)
 		goto out;
 
+	memslot = &kvm->memslots->memslots[log->slot];
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
-		memslot = &kvm->memslots->memslots[log->slot];
-
+	if (memslot->is_dirty) {
 		ga = memslot->base_gfn << PAGE_SHIFT;
 		ga_end = ga + (memslot->npages << PAGE_SHIFT);
 
@@ -1213,6 +1211,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 		n = kvm_dirty_bitmap_bytes(memslot);
 		memset(memslot->dirty_bitmap, 0, n);
+		memslot->is_dirty = false;
 	}
 
 	r = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b95a211..023c7f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2740,10 +2740,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				      struct kvm_dirty_log *log)
 {
-	int r, i;
+	int r;
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
-	unsigned long is_dirty = 0;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -2758,11 +2757,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-		is_dirty = memslot->dirty_bitmap[i];
-
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
+	if (memslot->is_dirty) {
 		struct kvm_memslots *slots, *old_slots;
 		unsigned long *dirty_bitmap;
 
@@ -2784,6 +2780,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		}
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 		slots->memslots[log->slot].dirty_bitmap
[RFC][PATCH 3/12] KVM: introduce wrapper functions to create and destroy dirty bitmaps
We will change the vmalloc() and vfree() to do_mmap() and do_munmap() later. This patch makes that easy and cleans up the code.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
---
 virt/kvm/kvm_main.c |   27 ---
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7ab6312..3e3acad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -435,6 +435,12 @@ out_err_nodisable:
 	return ERR_PTR(r);
 }
 
+static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	vfree(memslot->dirty_bitmap);
+	memslot->dirty_bitmap = NULL;
+}
+
 /*
  * Free any memory in @free but not in @dont.
  */
@@ -447,7 +453,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 		vfree(free->rmap);
 
 	if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
-		vfree(free->dirty_bitmap);
+		kvm_destroy_dirty_bitmap(free);
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
@@ -458,7 +464,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 	}
 
 	free->npages = 0;
-	free->dirty_bitmap = NULL;
 	free->rmap = NULL;
 }
 
@@ -520,6 +525,18 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+	unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+	memslot->dirty_bitmap = vmalloc(dirty_bytes);
+	if (!memslot->dirty_bitmap)
+		return -ENOMEM;
+
+	memset(memslot->dirty_bitmap, 0, dirty_bytes);
+	return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -653,12 +670,8 @@ skip_lpage:
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(&new);
-
-		new.dirty_bitmap = vmalloc(dirty_bytes);
-		if (!new.dirty_bitmap)
+		if (kvm_create_dirty_bitmap(&new) < 0)
 			goto out_free;
-		memset(new.dirty_bitmap, 0, dirty_bytes);
 
 		/* destroy any largepage mappings for dirty tracking */
 		if (old.npages)
 			flush_shadow = 1;
-- 
1.7.0.4
[RFC][PATCH 4/12] x86: introduce copy_in_user() for 32-bit
During the work on KVM's dirty page logging optimization, we encountered the need for copy_in_user() on 32-bit x86 and ppc: these will be used for manipulating dirty bitmaps in user space. So we implement copy_in_user() for 32-bit with the existing generic copy-user helpers.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Avi Kivity a...@redhat.com
Cc: Thomas Gleixner t...@linutronix.de
CC: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
---
 arch/x86/include/asm/uaccess_32.h |    2 ++
 arch/x86/lib/usercopy_32.c        |   26 ++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 088d09f..85d396d 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -21,6 +21,8 @@ unsigned long __must_check __copy_from_user_ll_nocache
 	(void *to, const void __user *from, unsigned long n);
 unsigned long __must_check __copy_from_user_ll_nocache_nozero
 	(void *to, const void __user *from, unsigned long n);
+unsigned long __must_check copy_in_user
+	(void __user *to, const void __user *from, unsigned n);
 
 /**
  * __copy_to_user_inatomic: - Copy a block of data into user space, with less checking.
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e218d5d..e90ffc3 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -889,3 +889,29 @@ void copy_from_user_overflow(void)
 	WARN(1, "Buffer overflow detected!\n");
 }
 EXPORT_SYMBOL(copy_from_user_overflow);
+
+/**
+ * copy_in_user: - Copy a block of data from user space to user space.
+ * @to:   Destination address, in user space.
+ * @from: Source address, in user space.
+ * @n:    Number of bytes to copy.
+ *
+ * Context: User context only. This function may sleep.
+ *
+ * Copy data from user space to user space.
+ *
+ * Returns number of bytes that could not be copied.
+ * On success, this will be zero.
+ */
+unsigned long
+copy_in_user(void __user *to, const void __user *from, unsigned n)
+{
+	if (access_ok(VERIFY_WRITE, to, n) && access_ok(VERIFY_READ, from, n)) {
+		if (movsl_is_ok(to, from, n))
+			__copy_user(to, from, n);
+		else
+			n = __copy_user_intel(to, (const void *)from, n);
+	}
+	return n;
+}
+EXPORT_SYMBOL(copy_in_user);
-- 
1.7.0.4
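The contract worth noting in the kernel-doc above is that copy_in_user() returns the number of bytes that could *not* be copied, not an error code. A userspace model of that convention (the `copy_in_bounded()` name and the `limit` parameter are illustrative - `limit` stands in for the access_ok() range check):

```c
#include <string.h>

/* Copy up to n bytes, but only within the first `limit` accessible
 * bytes; return how many bytes could NOT be copied (0 on success). */
static unsigned long copy_in_bounded(unsigned char *to,
                                     const unsigned char *from,
                                     unsigned long n, unsigned long limit)
{
    unsigned long ok = (n <= limit) ? n : limit;

    memcpy(to, from, ok);
    return n - ok;
}
```

Callers therefore check for a non-zero return to detect a partial copy, mirroring how copy_to_user()/copy_from_user() are used.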
[RFC][PATCH 5/12] x86: introduce __set_bit() like function for bitmaps in user space
During the work on KVM's dirty page logging optimization, we encountered the need to manipulate bitmaps in user space efficiently. To achieve this, we introduce a uaccess function for setting a bit in user space, following Avi's suggestion.

KVM is now using dirty bitmaps for live-migration and VGA. Although we need to update them from the kernel side, copying them every time for updating the dirty log is a big bottleneck. In particular, we tested that zero-copy bitmap manipulation improves the responsiveness of GUI manipulations a lot.

We also found one similar need in drivers/vhost/vhost.c, in which the author implemented set_bit_to_user() locally using inefficient functions: see the TODO at the top of that file.

Probably, this kind of need will be common for the virtualization area. So we introduce a macro set_bit_user_non_atomic() following the implementation style of x86's uaccess functions.

Note: there is one restriction to this macro: bitmaps must be 64-bit aligned (see the comment in this patch).

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Avi Kivity a...@redhat.com
Cc: Thomas Gleixner t...@linutronix.de
CC: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
---
 arch/x86/include/asm/uaccess.h |   39 +++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index abd3e0e..3138e65 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -98,6 +98,45 @@ struct exception_table_entry {
 
 extern int fixup_exception(struct pt_regs *regs);
 
+/**
+ * set_bit_user_non_atomic: - set a bit of a bitmap in user space.
+ * @nr:   Bit offset.
+ * @addr: Base address of a bitmap in user space.
+ *
+ * Context: User context only. This function may sleep.
+ *
+ * This macro sets a bit of a bitmap in user space.
+ *
+ * Restriction: the bitmap pointed to by @addr must be 64-bit aligned:
+ * the kernel accesses the bitmap by its own word length, so bitmaps
+ * allocated by 32-bit processes may cause a fault.
+ *
+ * Returns zero on success, or -EFAULT on error.
+ */
+#define __set_bit_user_non_atomic_asm(nr, addr, err, errret)		\
+	asm volatile("1:	bts %1,%2\n"				\
+		     "2:\n"						\
+		     ".section .fixup,\"ax\"\n"				\
+		     "3:	mov %3,%0\n"				\
+		     "	jmp 2b\n"					\
+		     ".previous\n"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : "=r"(err)					\
+		     : "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
+
+#define set_bit_user_non_atomic(nr, addr)				\
+({									\
+	int __ret_sbu;							\
+									\
+	might_fault();							\
+	if (access_ok(VERIFY_WRITE, addr, nr/8 + 1))			\
+		__set_bit_user_non_atomic_asm(nr, addr, __ret_sbu, -EFAULT);\
+	else								\
+		__ret_sbu = -EFAULT;					\
+									\
+	__ret_sbu;							\
+})
+
 /*
  * These are the main single-value transfer routines. They automatically
  * use the right size if we just have the right pointer type.
-- 
1.7.0.4
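Semantically the macro performs a plain (non-atomic) `bts` on a user-space bitmap. A byte-wise C model of the bit numbering it implements (LSB-first within each byte, as x86 `bts` addresses bits; the `set_bit_nonatomic()` name is illustrative and the error-checking/fixup machinery is elided):

```c
/* Non-atomic set-bit model: nr counts bits from the base address,
 * least-significant bit first within each byte. The real macro works
 * with word-sized accesses, which is where the 64-bit alignment
 * restriction in the patch description comes from. */
static void set_bit_nonatomic(unsigned long nr, unsigned char *addr)
{
    addr[nr / 8] |= (unsigned char)(1u << (nr % 8));
}
```

Being non-atomic, this is only safe when the caller serializes writers itself (as KVM does when publishing a dirty log), which is exactly why it can avoid the `lock` prefix.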
[RFC][PATCH 6/12 not tested yet] PPC: introduce copy_in_user() for 32-bit
During the work on KVM's dirty page logging optimization, we encountered the need for copy_in_user() on 32-bit ppc and x86: these will be used for manipulating dirty bitmaps in user space. So we implement copy_in_user() for 32-bit with __copy_tofrom_user().

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Alexander Graf ag...@suse.de
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/uaccess.h |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index bd0fb84..3a01ce8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -359,6 +359,23 @@ static inline unsigned long copy_to_user(void __user *to,
 	return n;
 }
 
+static inline unsigned long copy_in_user(void __user *to,
+		const void __user *from, unsigned long n)
+{
+	unsigned long over;
+
+	if (likely(access_ok(VERIFY_READ, from, n) &&
+		   access_ok(VERIFY_WRITE, to, n)))
+		return __copy_tofrom_user(to, from, n);
+	if (((unsigned long)from < TASK_SIZE) ||
+	    ((unsigned long)to < TASK_SIZE)) {
+		over = max((unsigned long)from, (unsigned long)to) +
+			n - TASK_SIZE;
+		return __copy_tofrom_user(to, from, n - over) + over;
+	}
+	return n;
+}
+
 #else /* __powerpc64__ */
 
 #define __copy_in_user(to, from, size) \
-- 
1.7.0.4
[patch uq/master 5/9] kvm: synchronize state from cpu context
From: Jan Kiszka jan.kis...@siemens.com It is not safe to retrieve the KVM internal state of a given cpu while it is potentially modifying it. Queue the request to run on cpu context, similarly to qemu-kvm. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu/kvm-all.c === --- qemu.orig/kvm-all.c +++ qemu/kvm-all.c @@ -796,14 +796,22 @@ void kvm_flush_coalesced_mmio_buffer(voi #endif } -void kvm_cpu_synchronize_state(CPUState *env) +static void do_kvm_cpu_synchronize_state(void *_env) { +CPUState *env = _env; + if (!env->kvm_vcpu_dirty) { kvm_arch_get_registers(env); env->kvm_vcpu_dirty = 1; } } +void kvm_cpu_synchronize_state(CPUState *env) +{ +if (!env->kvm_vcpu_dirty) +run_on_cpu(env, do_kvm_cpu_synchronize_state, env); +} + void kvm_cpu_synchronize_post_reset(CPUState *env) { kvm_arch_put_registers(env, KVM_PUT_RESET_STATE);
[patch uq/master 4/9] port qemu-kvm's on_vcpu code
run_on_cpu allows to execute work on a given CPUState context. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu/cpu-all.h === --- qemu.orig/cpu-all.h +++ qemu/cpu-all.h @@ -818,6 +818,7 @@ void cpu_watchpoint_remove_all(CPUState void cpu_single_step(CPUState *env, int enabled); void cpu_reset(CPUState *s); +void run_on_cpu(CPUState *env, void (*func)(void *data), void *data); #define CPU_LOG_TB_OUT_ASM (1 0) #define CPU_LOG_TB_IN_ASM (1 1) Index: qemu/cpu-defs.h === --- qemu.orig/cpu-defs.h +++ qemu/cpu-defs.h @@ -132,6 +132,7 @@ typedef struct icount_decr_u16 { struct kvm_run; struct KVMState; +struct qemu_work_item; typedef struct CPUBreakpoint { target_ulong pc; @@ -204,6 +205,7 @@ typedef struct CPUWatchpoint { uint32_t created; \ struct QemuThread *thread; \ struct QemuCond *halt_cond; \ +struct qemu_work_item *queued_work_first, *queued_work_last;\ const char *cpu_model_str; \ struct KVMState *kvm_state; \ struct kvm_run *kvm_run;\ Index: qemu/cpus.c === --- qemu.orig/cpus.c +++ qemu/cpus.c @@ -115,6 +115,8 @@ static int cpu_has_work(CPUState *env) { if (env-stop) return 1; +if (env-queued_work_first) +return 1; if (env-stopped || !vm_running) return 0; if (!env-halted) @@ -252,6 +254,11 @@ int qemu_cpu_self(void *env) return 1; } +void run_on_cpu(CPUState *env, void (*func)(void *data), void *data) +{ +func(data); +} + void resume_all_vcpus(void) { } @@ -304,6 +311,7 @@ static QemuCond qemu_cpu_cond; /* system init */ static QemuCond qemu_system_cond; static QemuCond qemu_pause_cond; +static QemuCond qemu_work_cond; static void tcg_block_io_signals(void); static void kvm_block_io_signals(CPUState *env); @@ -334,6 +342,50 @@ void qemu_main_loop_start(void) qemu_cond_broadcast(qemu_system_cond); } +void run_on_cpu(CPUState *env, void (*func)(void *data), void *data) +{ +struct qemu_work_item wi; + +if (qemu_cpu_self(env)) { +func(data); +return; +} + +wi.func = func; +wi.data = data; +if (!env-queued_work_first) +env-queued_work_first = wi; +else 
+env-queued_work_last-next = wi; +env-queued_work_last = wi; +wi.next = NULL; +wi.done = false; + +qemu_cpu_kick(env); +while (!wi.done) { +CPUState *self_env = cpu_single_env; + +qemu_cond_wait(qemu_work_cond, qemu_global_mutex); +cpu_single_env = self_env; +} +} + +static void flush_queued_work(CPUState *env) +{ +struct qemu_work_item *wi; + +if (!env-queued_work_first) +return; + +while ((wi = env-queued_work_first)) { +env-queued_work_first = wi-next; +wi-func(wi-data); +wi-done = true; +} +env-queued_work_last = NULL; +qemu_cond_broadcast(qemu_work_cond); +} + static void qemu_wait_io_event_common(CPUState *env) { if (env-stop) { @@ -341,6 +393,7 @@ static void qemu_wait_io_event_common(CP env-stopped = 1; qemu_cond_signal(qemu_pause_cond); } +flush_queued_work(env); } static void qemu_wait_io_event(CPUState *env) Index: qemu/qemu-common.h === --- qemu.orig/qemu-common.h +++ qemu/qemu-common.h @@ -249,6 +249,14 @@ void qemu_notify_event(void); void qemu_cpu_kick(void *env); int qemu_cpu_self(void *env); +/* work queue */ +struct qemu_work_item { +struct qemu_work_item *next; +void (*func)(void *data); +void *data; +int done; +}; + #ifdef CONFIG_USER_ONLY #define qemu_init_vcpu(env) do { } while (0) #else -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch uq/master 6/9] add cpu_is_stopped helper
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu/cpu-all.h === --- qemu.orig/cpu-all.h +++ qemu/cpu-all.h @@ -818,6 +818,7 @@ void cpu_watchpoint_remove_all(CPUState void cpu_single_step(CPUState *env, int enabled); void cpu_reset(CPUState *s); +int cpu_is_stopped(CPUState *env); void run_on_cpu(CPUState *env, void (*func)(void *data), void *data); #define CPU_LOG_TB_OUT_ASM (1 << 0) Index: qemu/cpus.c === --- qemu.orig/cpus.c +++ qemu/cpus.c @@ -91,6 +91,11 @@ void cpu_synchronize_all_post_init(void) } } +int cpu_is_stopped(CPUState *env) +{ +return !vm_running || env->stopped; +} + static void do_vm_stop(int reason) { if (vm_running) {
[patch uq/master 9/9] kvm: enable smp 1
Process INIT/SIPI requests and enable -smp 1. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu/kvm-all.c === --- qemu.orig/kvm-all.c +++ qemu/kvm-all.c @@ -593,11 +593,6 @@ int kvm_init(int smp_cpus) int ret; int i; -if (smp_cpus 1) { -fprintf(stderr, No SMP KVM support, use '-smp 1'\n); -return -EINVAL; -} - s = qemu_mallocz(sizeof(KVMState)); #ifdef KVM_CAP_SET_GUEST_DEBUG @@ -840,6 +835,11 @@ int kvm_cpu_exec(CPUState *env) } #endif +if (kvm_arch_process_irqchip_events(env)) { +ret = 0; +break; +} + if (env-kvm_vcpu_dirty) { kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE); env-kvm_vcpu_dirty = 0; Index: qemu/kvm.h === --- qemu.orig/kvm.h +++ qemu/kvm.h @@ -90,6 +90,8 @@ int kvm_arch_handle_exit(CPUState *env, int kvm_arch_pre_run(CPUState *env, struct kvm_run *run); +int kvm_arch_process_irqchip_events(CPUState *env); + int kvm_arch_get_registers(CPUState *env); /* state subset only touched by the VCPU itself during runtime */ Index: qemu/target-i386/kvm.c === --- qemu.orig/target-i386/kvm.c +++ qemu/target-i386/kvm.c @@ -1073,6 +1073,22 @@ int kvm_arch_post_run(CPUState *env, str return 0; } +int kvm_arch_process_irqchip_events(CPUState *env) +{ +if (env-interrupt_request CPU_INTERRUPT_INIT) { +kvm_cpu_synchronize_state(env); +do_cpu_init(env); +env-exception_index = EXCP_HALTED; +} + +if (env-interrupt_request CPU_INTERRUPT_SIPI) { +kvm_cpu_synchronize_state(env); +do_cpu_sipi(env); +} + +return env-halted; +} + static int kvm_handle_halt(CPUState *env) { if (!((env-interrupt_request CPU_INTERRUPT_HARD) Index: qemu/target-ppc/kvm.c === --- qemu.orig/target-ppc/kvm.c +++ qemu/target-ppc/kvm.c @@ -224,6 +224,11 @@ int kvm_arch_post_run(CPUState *env, str return 0; } +int kvm_arch_process_irqchip_events(CPUState *env) +{ +return 0; +} + static int kvmppc_handle_halt(CPUState *env) { if (!(env-interrupt_request CPU_INTERRUPT_HARD) (msr_ee)) { Index: qemu/target-s390x/kvm.c === --- qemu.orig/target-s390x/kvm.c +++ qemu/target-s390x/kvm.c @@ -175,6 
+175,11 @@ int kvm_arch_post_run(CPUState *env, str return 0; } +int kvm_arch_process_irqchip_events(CPUState *env) +{ +return 0; +} + static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm, uint64_t parm64, int vm) { -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space
During the work of KVM's dirty page logging optimization, we encountered the need to manipulate bitmaps in user space efficiently. To achieve this, we introduce a uaccess function for setting a bit in user space, following Avi's suggestion. KVM is now using dirty bitmaps for live-migration and VGA. Although we need to update them from the kernel side, copying them every time for updating the dirty log is a big bottleneck. In particular, we tested that zero-copy bitmap manipulation improves responses of GUI manipulations a lot. We also found a similar need in drivers/vhost/vhost.c, in which the author implemented set_bit_to_user() locally using inefficient functions: see the TODO at the top of that file. This kind of need is probably common in the virtualization area. So we introduce a function set_bit_user_non_atomic(). Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp CC: Alexander Graf ag...@suse.de CC: Benjamin Herrenschmidt b...@kernel.crashing.org CC: Paul Mackerras pau...@samba.org --- arch/powerpc/include/asm/uaccess.h | 19 +++ 1 files changed, 19 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 3a01ce8..f878326 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -321,6 +321,25 @@ do { \ __gu_err; \ }) +static inline int set_bit_user_non_atomic(int nr, void __user *addr) +{ + u8 __user *p; + u8 val; + + p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE); + if (!access_ok(VERIFY_WRITE, p, 1)) + return -EFAULT; + + if (__get_user(val, p)) + return -EFAULT; + + val |= 1U << (nr % BITS_PER_BYTE); + if (__put_user(val, p)) + return -EFAULT; + + return 0; +} + /* more complex routines */ -- 1.7.0.4
[RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
Although we can use the *_le_bit() helpers to treat bitmaps arranged little-endian, having the le bit offset calculation as a separate macro gives us more freedom. For example, KVM has le arranged dirty bitmaps for VGA and live-migration, and they are used in user space too. To avoid bitmap copies between kernel and user space, we want to update the bitmaps in user space directly. To achieve this, the le bit offset used with the *_user() functions helps us a lot. So let us factor out the le bit offset calculation by defining it as a new macro: generic_le_bit_offset(). Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp CC: Arnd Bergmann a...@arndb.de --- include/asm-generic/bitops/le.h |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h index 80e3bf1..ee445fb 100644 --- a/include/asm-generic/bitops/le.h +++ b/include/asm-generic/bitops/le.h @@ -9,6 +9,8 @@ #if defined(__LITTLE_ENDIAN) +#define generic_le_bit_offset(nr) (nr) + #define generic_test_le_bit(nr, addr) test_bit(nr, addr) #define generic___set_le_bit(nr, addr) __set_bit(nr, addr) #define generic___clear_le_bit(nr, addr) __clear_bit(nr, addr) @@ -25,6 +27,8 @@ #elif defined(__BIG_ENDIAN) +#define generic_le_bit_offset(nr) ((nr) ^ BITOP_LE_SWIZZLE) + #define generic_test_le_bit(nr, addr) \ test_bit((nr) ^ BITOP_LE_SWIZZLE, (addr)) #define generic___set_le_bit(nr, addr) \ -- 1.7.0.4
[RFC][PATCH 9/12] KVM: introduce a wrapper function of set_bit_user_non_atomic()
This is not to break the build for other architectures than x86 and ppc. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp --- arch/ia64/include/asm/kvm_host.h|5 + arch/powerpc/include/asm/kvm_host.h |6 ++ arch/s390/include/asm/kvm_host.h|6 ++ arch/x86/include/asm/kvm_host.h |5 + 4 files changed, 22 insertions(+), 0 deletions(-) diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h index a362e67..938041b 100644 --- a/arch/ia64/include/asm/kvm_host.h +++ b/arch/ia64/include/asm/kvm_host.h @@ -589,6 +589,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu); int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run); void kvm_sal_emul(struct kvm_vcpu *vcpu); +static inline int kvm_set_bit_user(int nr, void __user *addr) +{ + return 0; +} + #endif /* __ASSEMBLY__*/ #endif diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0c9ad86..9463524 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -26,6 +26,7 @@ #include linux/types.h #include linux/kvm_types.h #include asm/kvm_asm.h +#include asm/uaccess.h #define KVM_MAX_VCPUS 1 #define KVM_MEMORY_SLOTS 32 @@ -287,4 +288,9 @@ struct kvm_vcpu_arch { #endif }; +static inline int kvm_set_bit_user(int nr, void __user *addr) +{ + return set_bit_user_non_atomic(nr, addr); +} + #endif /* __POWERPC_KVM_HOST_H__ */ diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index 27605b6..36710ee 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -238,4 +238,10 @@ struct kvm_arch{ }; extern int sie64a(struct kvm_s390_sie_block *, unsigned long *); + +static inline int kvm_set_bit_user(int nr, void __user *addr) +{ + return 0; +} + #endif diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3f0007b..9e22df9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ 
b/arch/x86/include/asm/kvm_host.h @@ -795,4 +795,9 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask); bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip); +static inline int kvm_set_bit_user(int nr, void __user *addr) +{ + return set_bit_user_non_atomic(nr, addr); +} + #endif /* _ASM_X86_KVM_HOST_H */ -- 1.7.0.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space
We move dirty bitmaps to user space. - Allocation and destruction: we use do_mmap() and do_munmap(). The new bitmap space is twice longer than the original one and we use the additional space for double buffering: this makes it possible to update the active bitmap while letting the user space read the other one safely. For x86, we can also remove the vmalloc() in kvm_vm_ioctl_get_dirty_log(). - Bitmap manipulations: we replace all functions which access dirty bitmaps with *_user() functions. - For ia64: moving the dirty bitmaps of memory slots does not affect ia64 much because it's using a different place to store dirty logs rather than the dirty bitmaps of memory slots: all we have to change are sync and get of dirty log, so we don't need set_bit_user like functions for ia64. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp CC: Avi Kivity a...@redhat.com CC: Alexander Graf ag...@suse.de --- arch/ia64/kvm/kvm-ia64.c | 15 +- arch/powerpc/kvm/book3s.c |5 +++- arch/x86/kvm/x86.c| 25 -- include/linux/kvm_host.h |3 +- virt/kvm/kvm_main.c | 62 +--- 5 files changed, 82 insertions(+), 28 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 17fd65c..03503e6 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1823,11 +1823,19 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm, n = kvm_dirty_bitmap_bytes(memslot); base = memslot-base_gfn / BITS_PER_LONG; + r = -EFAULT; + if (!access_ok(VERIFY_WRITE, memslot-dirty_bitmap, n)) + goto out; + for (i = 0; i n/sizeof(long); ++i) { if (dirty_bitmap[base + i]) memslot-is_dirty = true; - memslot-dirty_bitmap[i] = dirty_bitmap[base + i]; + if (__put_user(dirty_bitmap[base + i], + memslot-dirty_bitmap[i])) { + r = -EFAULT; + goto out; + } dirty_bitmap[base + i] = 0; } r = 0; @@ -1858,7 +1866,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, if (memslot-is_dirty) { kvm_flush_remote_tlbs(kvm); n = 
kvm_dirty_bitmap_bytes(memslot); - memset(memslot-dirty_bitmap, 0, n); + if (clear_user(memslot-dirty_bitmap, n)) { + r = -EFAULT; + goto out; + } memslot-is_dirty = false; } r = 0; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 4b074f1..2a31d2f 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1210,7 +1210,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, kvmppc_mmu_pte_pflush(vcpu, ga, ga_end); n = kvm_dirty_bitmap_bytes(memslot); - memset(memslot-dirty_bitmap, 0, n); + if (clear_user(memslot-dirty_bitmap, n)) { + r = -EFAULT; + goto out; + } memslot-is_dirty = false; } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 023c7f8..32a3d94 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2760,40 +2760,37 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, /* If nothing is dirty, don't bother messing with page tables. */ if (memslot-is_dirty) { struct kvm_memslots *slots, *old_slots; - unsigned long *dirty_bitmap; + unsigned long __user *dirty_bitmap; + unsigned long __user *dirty_bitmap_old; spin_lock(kvm-mmu_lock); kvm_mmu_slot_remove_write_access(kvm, log-slot); spin_unlock(kvm-mmu_lock); - r = -ENOMEM; - dirty_bitmap = vmalloc(n); - if (!dirty_bitmap) + dirty_bitmap = memslot-dirty_bitmap; + dirty_bitmap_old = memslot-dirty_bitmap_old; + r = -EFAULT; + if (clear_user(dirty_bitmap_old, n)) goto out; - memset(dirty_bitmap, 0, n); r = -ENOMEM; slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); - if (!slots) { - vfree(dirty_bitmap); + if (!slots) goto out; - } + memcpy(slots, kvm-memslots, sizeof(struct kvm_memslots)); - slots-memslots[log-slot].dirty_bitmap = dirty_bitmap; + slots-memslots[log-slot].dirty_bitmap = dirty_bitmap_old; + slots-memslots[log-slot].dirty_bitmap_old = dirty_bitmap; slots-memslots[log-slot].is_dirty = false;
[RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps
Now that dirty bitmaps are accessible from user space, we export the addresses of these to achieve zero-copy dirty log check: KVM_GET_USER_DIRTY_LOG_ADDR We also need an API for triggering dirty bitmap switch to take the full advantage of double buffered bitmaps. KVM_SWITCH_DIRTY_LOG See the documentation in this patch for precise explanations. About performance improvement: the most important feature of switch API is the lightness. In our test, this appeared in the form of improved responses for GUI manipulations. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp CC: Avi Kivity a...@redhat.com CC: Alexander Graf ag...@suse.de --- Documentation/kvm/api.txt | 87 + arch/ia64/kvm/kvm-ia64.c | 27 +- arch/powerpc/kvm/book3s.c | 18 +++-- arch/x86/kvm/x86.c| 44 --- include/linux/kvm.h | 11 ++ include/linux/kvm_host.h |4 ++- virt/kvm/kvm_main.c | 63 + 7 files changed, 220 insertions(+), 34 deletions(-) diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt index a237518..c106c83 100644 --- a/Documentation/kvm/api.txt +++ b/Documentation/kvm/api.txt @@ -892,6 +892,93 @@ arguments. This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel irqchip, the multiprocessing state must be maintained by userspace. +4.39 KVM_GET_USER_DIRTY_LOG_ADDR + +Capability: KVM_CAP_USER_DIRTY_LOG (=1 see below) +Architectures: all +Type: vm ioctl +Parameters: struct kvm_user_dirty_log (in/out) +Returns: 0 on success, -1 on error + +This ioctl makes it possible to use KVM_SWITCH_DIRTY_LOG (see 4.40) instead +of the old dirty log manipulation by KVM_GET_DIRTY_LOG. + +A bit about KVM_CAP_USER_DIRTY_LOG + +Before this ioctl was introduced, dirty bitmaps for dirty page logging were +allocated in the kernel's memory space. But we have now moved them to user +space to get more flexiblity and performance. 
To achieve this move without +breaking compatibility, we will split the KVM_CAP_USER_DIRTY_LOG capability +into a few generations which can be identified by their check extension +return values. + +This KVM_GET_USER_DIRTY_LOG_ADDR belongs to the first generation with the +KVM_SWITCH_DIRTY_LOG (4.40) and must be supported by all generations. + +What you get + +By using this, you can get paired bitmap addresses which are accessible from +user space. See the explanation in 4.40 for the roles of these two bitmaps. + +How to Get + +Before calling this, you have to set the slot member of kvm_user_dirty_log +to indicate the target memory slot. + +struct kvm_user_dirty_log { + __u32 slot; + __u32 flags; + __u64 dirty_bitmap; + __u64 dirty_bitmap_old; +}; + +The addresses will be set in the paired members: dirty_bitmap and _old. + +Note + +In generation 1, we support bitmaps which are created in the kernel but do not +support bitmaps created by users. This means bitmap creation/destruction +are done the same as before when you instruct KVM by KVM_SET_USER_MEMORY_REGION +(see 4.34) to start/stop logging. Please do not try to free the exported +bitmaps by yourself, or KVM will access the freed area and end up with a fault. + +4.40 KVM_SWITCH_DIRTY_LOG + +Capability: KVM_CAP_USER_DIRTY_LOG (=1 see 4.39) +Architectures: all +Type: vm ioctl +Parameters: memory slot id +Returns: 0 if switched, 1 if not (slot was clean), -1 on error + +This ioctl allows you to switch the dirty log to the next one: a newer +ioctl for getting dirty page logs than KVM_GET_DIRTY_LOG (see 4.7 for the +explanation about dirty page logging, log format is not changed). + +If you have the capability KVM_CAP_USER_DIRTY_LOG, using this is strongly +recommended over using KVM_GET_DIRTY_LOG because this does not need a buffer +copy between kernel and user space.
+ +How to Use + +Before calling this, you have to remember the paired addresses of dirty +bitmaps which can be obtained by KVM_GET_USER_DIRTY_LOG_ADDR (see 4.39): +dirty_bitmap (being used now by kernel) and dirty_bitmap_old (not being +used now and containing the last log). + +After calling this, the role of these bitmaps will change like this; +If the return value was 0, kernel cleared dirty_bitmap_old and began to use +it for the next logging, so that you can use the cold dirty_bitmap to check +the log since the last switch. If the return value was 1, all pages were not +dirty and bitmap switch was not done. Note that remembering which bitmap is +now active is your responsibility. So you have to update your remembering +when you get the return value 0. + +Note + +Bitmap switch may also occur when you call KVM_GET_DIRTY_LOG. Please use +either one, preferably this one, only to avoid extra confusion. We do not +guarantee on which condition KVM_GET_DIRTY_LOG causes bitmap switch. + 5. The kvm_run structure Application code obtains a
[RFC][PATCH 12/12 sample] qemu-kvm: use new API for getting/switching dirty bitmaps
We use new API for light dirty log access if KVM supports it. This conflicts with Marcelo's patches. So please take this as a sample patch. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- kvm/include/linux/kvm.h | 11 ++ qemu-kvm.c | 81 ++- qemu-kvm.h |1 + 3 files changed, 85 insertions(+), 8 deletions(-) diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h index 6485981..efd9538 100644 --- a/kvm/include/linux/kvm.h +++ b/kvm/include/linux/kvm.h @@ -317,6 +317,14 @@ struct kvm_dirty_log { }; }; +/* for KVM_GET_USER_DIRTY_LOG_ADDR */ +struct kvm_user_dirty_log { + __u32 slot; + __u32 flags; + __u64 dirty_bitmap; + __u64 dirty_bitmap_old; +}; + /* for KVM_SET_SIGNAL_MASK */ struct kvm_signal_mask { __u32 len; @@ -499,6 +507,7 @@ struct kvm_ioeventfd { #define KVM_CAP_PPC_SEGSTATE 43 #define KVM_CAP_PCI_SEGMENT 47 +#define KVM_CAP_USER_DIRTY_LOG 55 #ifdef KVM_CAP_IRQ_ROUTING @@ -595,6 +604,8 @@ struct kvm_clock_data { struct kvm_userspace_memory_region) #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47) #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64) +#define KVM_GET_USER_DIRTY_LOG_ADDR _IOW(KVMIO, 0x49, struct kvm_user_dirty_log) +#define KVM_SWITCH_DIRTY_LOG _IO(KVMIO, 0x4a) /* Device model IOC */ #define KVM_CREATE_IRQCHIP_IO(KVMIO, 0x60) #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level) diff --git a/qemu-kvm.c b/qemu-kvm.c index 91f0222..98777f0 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -143,6 +143,8 @@ struct slot_info { unsigned long userspace_addr; unsigned flags; int logging_count; +unsigned long *dirty_bitmap; +unsigned long *dirty_bitmap_old; }; struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS]; @@ -232,6 +234,29 @@ int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr, return 1; } +static int kvm_user_dirty_log_works(void) +{ +return kvm_state-user_dirty_log; +} + +static int kvm_set_user_dirty_log(int slot) +{ +int r; +struct kvm_user_dirty_log dlog; + +dlog.slot = slot; +r = 
kvm_vm_ioctl(kvm_state, KVM_GET_USER_DIRTY_LOG_ADDR, dlog); +if (r 0) { +DPRINTF(KVM_GET_USER_DIRTY_LOG_ADDR failed: %s\n, strerror(-r)); +return r; +} +slots[slot].dirty_bitmap = (unsigned long *) + ((unsigned long)dlog.dirty_bitmap); +slots[slot].dirty_bitmap_old = (unsigned long *) + ((unsigned long)dlog.dirty_bitmap_old); +return r; +} + /* * dirty pages logging control */ @@ -265,8 +290,16 @@ static int kvm_dirty_pages_log_change(kvm_context_t kvm, DPRINTF(slot %d start %llx len %llx flags %x\n, mem.slot, mem.guest_phys_addr, mem.memory_size, mem.flags); r = kvm_vm_ioctl(kvm_state, KVM_SET_USER_MEMORY_REGION, mem); -if (r 0) +if (r 0) { fprintf(stderr, %s: %m\n, __FUNCTION__); +return r; +} +} +if (flags KVM_MEM_LOG_DIRTY_PAGES) { +r = kvm_set_user_dirty_log(slot); +} else { +slots[slot].dirty_bitmap = NULL; +slots[slot].dirty_bitmap_old = NULL; } return r; } @@ -589,7 +622,6 @@ int kvm_register_phys_mem(kvm_context_t kvm, unsigned long phys_start, void *userspace_addr, unsigned long len, int log) { - struct kvm_userspace_memory_region memory = { .memory_size = len, .guest_phys_addr = phys_start, @@ -608,6 +640,9 @@ int kvm_register_phys_mem(kvm_context_t kvm, fprintf(stderr, create_userspace_phys_mem: %s\n, strerror(-r)); return -1; } +if (log) { +r = kvm_set_user_dirty_log(memory.slot); +} register_slot(memory.slot, memory.guest_phys_addr, memory.memory_size, memory.userspace_addr, memory.flags); return 0; @@ -652,6 +687,8 @@ void kvm_destroy_phys_mem(kvm_context_t kvm, unsigned long phys_start, fprintf(stderr, destroy_userspace_phys_mem: %s, strerror(-r)); return; } +slots[memory.slot].dirty_bitmap = NULL; +slots[memory.slot].dirty_bitmap_old = NULL; free_slot(memory.slot); } @@ -692,6 +729,21 @@ int kvm_get_dirty_pages(kvm_context_t kvm, unsigned long phys_addr, void *buf) return kvm_get_map(kvm, KVM_GET_DIRTY_LOG, slot, buf); } +static int kvm_switch_map(int slot) +{ +int r; + +r = kvm_vm_ioctl(kvm_state, KVM_SWITCH_DIRTY_LOG, slot); +if (r == 0) { 
+unsigned long *dirty_bitmap; + +dirty_bitmap = slots[slot].dirty_bitmap; +slots[slot].dirty_bitmap = slots[slot].dirty_bitmap_old; +slots[slot].dirty_bitmap_old = dirty_bitmap; +} +return r; +} + int kvm_get_dirty_pages_range(kvm_context_t kvm,
Re: [PATCH 1/3 v2] KVM MMU: make kvm_mmu_zap_page() return the number of zapped sp in total.
On Mon, May 03, 2010 at 09:38:54PM +0800, Gui Jianfeng wrote: Hi Marcelo Actually, it doesn't only affect kvm_mmu_change_mmu_pages() but also affects kvm_mmu_remove_some_alloc_mmu_pages(), which is called by the mmu shrink routine. This will cause the upper layer to get a wrong number, so I think this should be fixed. Here is an updated version. --- From: Gui Jianfeng guijianf...@cn.fujitsu.com Currently, in kvm_mmu_change_mmu_pages(kvm, page), used_pages-- is performed after calling kvm_mmu_zap_page() regardless of whether the page is actually reclaimed. Because a root sp won't be reclaimed by kvm_mmu_zap_page(), making kvm_mmu_zap_page() return the total number of reclaimed sp makes more sense. A new flag is put into kvm_mmu_zap_page() to indicate whether the top page is reclaimed. kvm_mmu_remove_some_alloc_mmu_pages() also relies on kvm_mmu_zap_page() to return a total reclaimed number. Isn't it simpler to have kvm_mmu_zap_page return the number of pages it actually freed? Then always restart the hash walk if the return is positive.
[patch uq/master 0/9] enable smp 1 and related fixes
[patch uq/master 1/9] kvm: set cpu_single_env around KVM_RUN ioctl
Zero cpu_single_env before leaving global lock protection, and restore on return. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu/kvm-all.c === --- qemu.orig/kvm-all.c +++ qemu/kvm-all.c @@ -846,9 +846,11 @@ int kvm_cpu_exec(CPUState *env) } kvm_arch_pre_run(env, run); +cpu_single_env = NULL; qemu_mutex_unlock_iothread(); ret = kvm_vcpu_ioctl(env, KVM_RUN, 0); qemu_mutex_lock_iothread(); +cpu_single_env = env; kvm_arch_post_run(env, run); if (ret == -EINTR || ret == -EAGAIN) {
[patch uq/master 8/9] kvm: validate context for kvm cpu get/put operations
From: Jan Kiszka jan.kis...@siemens.com Validate that KVM vcpu state is only read/written from the cpu thread itself or when the cpu is stopped. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: qemu/target-i386/kvm.c === --- qemu.orig/target-i386/kvm.c +++ qemu/target-i386/kvm.c @@ -949,6 +949,8 @@ int kvm_arch_put_registers(CPUState *env { int ret; +assert(cpu_is_stopped(env) || qemu_cpu_self(env)); + ret = kvm_getput_regs(env, 1); if (ret < 0) return ret; @@ -991,6 +993,8 @@ int kvm_arch_get_registers(CPUState *env { int ret; +assert(cpu_is_stopped(env) || qemu_cpu_self(env)); + ret = kvm_getput_regs(env, 0); if (ret < 0) return ret;
KVM call minutes for May 4
KVM Forum topic ideas
- mgmt interface (qemud)
- working breakout sessions are welcome at the Forum

stable tree
- have a volunteer (thanks Justin)
- Anthony will write up proposal, which is basically:
  - bug fixes actively proposed for stable tree
  - stable maintainer collects and applies
  - periodically release and re-sync w/ main tree

0.12.4?
- RSN...will tag and push shortly
Re: [PATCH] KVM: VMX: Avoid writing HOST_CR0 every entry
On Mon, May 03, 2010 at 05:18:54PM +0300, Avi Kivity wrote: cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each entry. That is slow, so instead, set HOST_CR0 to have TS set unconditionally (which is a safe value), and issue a clts() just before exiting vcpu context if the task indeed owns the fpu. Saves ~50 cycles/exit. Signed-off-by: Avi Kivity a...@redhat.com Looks good to me.
Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
On Tuesday 04 May 2010, Takuya Yoshikawa wrote: Although we can use *_le_bit() helpers to treat bitmaps le arranged, having le bit offset calculation as a seperate macro gives us more freedom. For example, KVM has le arranged dirty bitmaps for VGA, live-migration and they are used in user space too. To avoid bitmap copies between kernel and user space, we want to update the bitmaps in user space directly. To achive this, le bit offset with *_user() functions help us a lot. So let us use the le bit offset calculation part by defining it as a new macro: generic_le_bit_offset() . Does this work correctly if your user space is 32 bits (i.e. unsigned long is a different size in user space and the kernel) in both big- and little-endian systems? I'm not sure about all the details, but I think you cannot in general share bitmaps between user space and the kernel because of this. Arnd
[ kvm-Bugs-2996643 ] qemu-kvm -smb crashes with Samba 3.5.2
Bugs item #2996643, was opened at 2010-05-04 17:23 Message generated for change (Tracker Item Submitted) made by edolstra You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2996643group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Eelco Dolstra (edolstra) Assigned to: Nobody/Anonymous (nobody) Summary: qemu-kvm -smb crashes with Samba 3.5.2 Initial Comment: qemu-kvm 0.12.3 dies with a SIGTERM when using the -smb flag as soon as I try to unmount the SMB/CIFS filesystem. For instance, this sequence of commands in the Linux guest: mount.cifs //10.0.2.4/qemu /hostfs -o guest,username=nobody umount /hostfs causes qemu-kvm to crash almost immediately because it receives a TERM signal that its smbd child process sends to the process group, as strace shows: [pid 27982] kill(0, SIGTERM) = 0 [pid 27982] --- SIGTERM (Terminated) @ 0 (0) --- Process 27982 detached [pid 27980] <... timer_settime resumed> NULL) = 0 [pid 27980] --- SIGTERM (Terminated) @ 0 (0) --- [pid 27980] --- SIGCHLD (Child exited) @ 0 (0) --- (pid 27982 is the smbd child, 27980 is qemu-system-x86_64) This works fine with Samba 3.3.3, but not with 3.3.12 or 3.5.2, so apparently something changed there. Host machine runs NixOS (Linux 2.6.32.11), 32-bit, Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz.
kvm guest: hrtimer issue with smp.
Hi all, Saw a long thread on this back in Oct. but did not see if this was resolved or not. We are experiencing hrtimer: interrupt took huge_number ns errors on the guest console with -smp 2 running on an Intel host. Both are running 2.6.33.3 vanilla. Guest is using kvm-timer, Host is using tsc. Guest will become completely unresponsive after about 23hrs. Review of logs after restart shows that the system suddenly shows a system time of 2 weeks in the past. Not sure what else to report here. I've changed the guest to a single CPU, and the problem appears to be gone... Let me know how we can help on this... Stuart Sheldon -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM testing: New winutils.iso SHA1, refactoring some code
Make it possible to download the winutils.iso file right from its repository, making very convenient for users to perform windows testing. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/get_started.py | 65 +++--- 1 files changed, 46 insertions(+), 19 deletions(-) diff --git a/client/tests/kvm/get_started.py b/client/tests/kvm/get_started.py index 3a6f20f..870485b 100755 --- a/client/tests/kvm/get_started.py +++ b/client/tests/kvm/get_started.py @@ -11,6 +11,34 @@ from autotest_lib.client.common_lib import logging_manager from autotest_lib.client.bin import utils, os_dep +def check_iso(url, destination, hash): + +Verifies if ISO that can be found on url is on destination with right hash. + +This function will verify the SHA1 hash of the ISO image. If the file +turns out to be missing or corrupted, let the user know we can download it. + +@param url: URL where the ISO file can be found. +@param destination: Directory in local disk where we'd like the iso to be. +@param hash: SHA1 hash for the ISO image. + +logging.info(Verifying iso %s, os.path.basename(url)) +if not destination: +os.makedirs(destination) +iso_path = os.path.join(destination, os.path.basename(url)) +if not os.path.isfile(iso_path) or ( +utils.hash_file(iso_path, method=sha1) != hash): +logging.warning(%s not found or corrupted, iso_path) +logging.warning(Would you like to download it? (y/n)) +iso_download = raw_input() +if iso_download == 'y': +utils.unmap_url_cache(destination, url, hash, method=sha1) +else: +logging.warning(Missing file %s. 
Please download it, iso_path) +else: +logging.debug(%s present, with proper checksum, iso_path) + + if __name__ == __main__: logging_manager.configure_logging(kvm_utils.KvmLoggingConfig(), verbose=True) @@ -51,28 +79,27 @@ if __name__ == __main__: else: logging.debug(Config file %s exists, not touching % dst_file) -logging.info(3 - Verifying iso (make sure we have the OS iso needed for +logging.info(3 - Verifying iso (make sure we have the OS ISO needed for the default test set)) -base_iso_name = Fedora-12-x86_64-DVD.iso + +iso_name = Fedora-12-x86_64-DVD.iso fedora_dir = pub/fedora/linux/releases/12/Fedora/x86_64/iso url = os.path.join(http://download.fedoraproject.org/;, fedora_dir, - base_iso_name) -md5sum = 6dd31e292cc2eb1140544e9b1ba61c56 -iso_dir = os.path.join(base_dir, 'isos', 'linux') -if not iso_dir: -os.makedirs(iso_dir) -iso_path = os.path.join(iso_dir, base_iso_name) -if not os.path.isfile(iso_path) or ( - utils.hash_file(iso_path, method=md5) != md5sum): -logging.warning(%s not found or corrupted, iso_path) -logging.warning(Would you like to download it? (y/n)) -iso_download = raw_input() -if iso_download == 'y': -utils.unmap_url_cache(iso_dir, url, md5sum) -else: -logging.warning(Missing file %s. 
Please download it, iso_path) -else: -logging.debug(%s present, with proper checksum, iso_path) + iso_name) +hash = 97a018ba32d43d0e76d032834fe7562bffe8ceb3 +destination = os.path.join(base_dir, 'isos', 'linux') +check_iso(url, destination, hash) + +logging.info(4 - Verifying winutils.iso (make sure we have the utility + ISO needed for Windows testing)) + +logging.info(In order to run the KVM autotests in Windows guests, we + provide you an ISO that this script can download) + +url = http://people.redhat.com/mrodrigu/kvm/winutils.iso; +hash = 301da394fe840172188a32f8ba01524993baa0cb +destination = os.path.join(base_dir, 'isos', 'windows') +check_iso(url, destination, hash) logging.info(4 - Checking if qemu is installed (certify qemu and qemu-kvm are in the place the default config expects)) -- 1.7.0.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
On 05/04/2010 06:03 PM, Arnd Bergmann wrote: On Tuesday 04 May 2010, Takuya Yoshikawa wrote: Although we can use *_le_bit() helpers to treat bitmaps arranged in little-endian order, having the le bit offset calculation as a separate macro gives us more freedom. For example, KVM has le arranged dirty bitmaps for VGA and live migration, and they are used in user space too. To avoid bitmap copies between kernel and user space, we want to update the bitmaps in user space directly. To achieve this, the le bit offset combined with *_user() functions helps us a lot. So let us reuse the le bit offset calculation part by defining it as a new macro: generic_le_bit_offset(). Does this work correctly if your user space is 32 bits (i.e. unsigned long is a different size in user space and kernel) on both big- and little-endian systems? I'm not sure about all the details, but I think you cannot in general share bitmaps between user space and kernel because of this. That's why the bitmaps are defined as little endian u64 aligned, even on big endian 32-bit systems. Little endian bitmaps are wordsize agnostic, and u64 alignment ensures we can use long-sized bitops on mixed size systems. -- error compiling committee.c: too many arguments to function
Re: [PATCH] KVM: Fix wallclock version writing race
On 05/04/2010 03:02 PM, Avi Kivity wrote: Wallclock writing uses an unprotected global variable to hold the version; this can cause one guest to interfere with another if both write their wallclock at the same time. Signed-off-by: Avi Kivity a...@redhat.com This was pointed out by Naphtali. diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f6f8dad..c3152d7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -754,14 +754,22 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data) static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock) { - static int version; + int version; + int r; struct pvclock_wall_clock wc; struct timespec boot; if (!wall_clock) return; - version++; + r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version)); + if (r) + return; + + if (version & 1) + ++version; /* first time write, random junk */ + + ++version; kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
Re: [PATCH] KVM: Fix wallclock version writing race
On Tue, May 04, 2010 at 03:02:24PM +0300, Avi Kivity wrote: Wallclock writing uses an unprotected global variable to hold the version; this can cause one guest to interfere with another if both write their wallclock at the same time. Makes sense to me. ACK.
Re: Booting/installing WindowsNT
On 05/04/2010 06:27 PM, Andre Przywara wrote: 3. In all other cases so far it BSoDs with a STOP 0x3E error right before displaying that kernel message. MSDN talks about a multiprocessor configuration error: http://msdn.microsoft.com/en-us/library/ms819006.aspx I suspected that the offline CPUs in the mptable confuse NT. But -smp 1,maxcpus=1 does not make a difference. I will try to dig deeper in this area. What about disabling ACPI? smp should still work through the mptable.
Re: [PATCH] KVM: Get rid of KVM_REQ_KICK
On Mon, May 03, 2010 at 05:19:08PM +0300, Avi Kivity wrote: KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal operation. This causes the fast path check for a clear vcpu->requests to fail all the time, triggering tons of atomic operations. Avi, Do you have numbers? Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c | 17 ++--- include/linux/kvm_host.h | 1 + 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6b2ce1d..307094a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4499,13 +4499,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (vcpu->fpu_active) kvm_load_guest_fpu(vcpu); - local_irq_disable(); + atomic_set(&vcpu->guest_mode, 1); + smp_wmb(); IPI can trigger here? - clear_bit(KVM_REQ_KICK, &vcpu->requests); - smp_mb__after_clear_bit(); + local_irq_disable(); - if (vcpu->requests || need_resched() || signal_pending(current)) { - set_bit(KVM_REQ_KICK, &vcpu->requests); + if (!atomic_read(&vcpu->guest_mode) || vcpu->requests + || need_resched() || signal_pending(current)) { + atomic_set(&vcpu->guest_mode, 0); + smp_wmb(); local_irq_enable(); preempt_enable(); r = 1; @@ -4550,7 +4552,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (hw_breakpoint_active()) hw_breakpoint_restore(); - set_bit(KVM_REQ_KICK, &vcpu->requests); + atomic_set(&vcpu->guest_mode, 0); + smp_wmb(); local_irq_enable(); ++vcpu->stat.exits; @@ -5470,7 +5473,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu) me = get_cpu(); if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) - if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests)) + if (atomic_xchg(&vcpu->guest_mode, 0)) smp_send_reschedule(cpu); put_cpu(); } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ce027d5..a020fa2 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -81,6 +81,7 @@ struct kvm_vcpu { int vcpu_id; struct mutex mutex; int cpu;
+ atomic_t guest_mode; struct kvm_run *run; unsigned long requests; unsigned long guest_debug; -- 1.7.0.4
Re: [PATCH] KVM: Get rid of KVM_REQ_KICK
On 05/04/2010 07:31 PM, Marcelo Tosatti wrote: On Mon, May 03, 2010 at 05:19:08PM +0300, Avi Kivity wrote: KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal operation. This causes the fast path check for a clear vcpu->requests to fail all the time, triggering tons of atomic operations. Avi, Do you have numbers? Forgot to post, was about 100 cycles (I expected more, all those atomics really show up in the profile). Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c | 17 ++--- include/linux/kvm_host.h | 1 + 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6b2ce1d..307094a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4499,13 +4499,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (vcpu->fpu_active) kvm_load_guest_fpu(vcpu); - local_irq_disable(); + atomic_set(&vcpu->guest_mode, 1); + smp_wmb(); IPI can trigger here? It can... - clear_bit(KVM_REQ_KICK, &vcpu->requests); - smp_mb__after_clear_bit(); + local_irq_disable(); - if (vcpu->requests || need_resched() || signal_pending(current)) { - set_bit(KVM_REQ_KICK, &vcpu->requests); + if (!atomic_read(&vcpu->guest_mode) || vcpu->requests + || need_resched() || signal_pending(current)) { ... and we'll detect that guest_mode was cleared and go back.
+ atomic_set(&vcpu->guest_mode, 0); + smp_wmb(); local_irq_enable(); preempt_enable(); r = 1; @@ -4550,7 +4552,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (hw_breakpoint_active()) hw_breakpoint_restore(); - set_bit(KVM_REQ_KICK, &vcpu->requests); + atomic_set(&vcpu->guest_mode, 0); + smp_wmb(); local_irq_enable(); ++vcpu->stat.exits; @@ -5470,7 +5473,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu) me = get_cpu(); if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) - if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests)) + if (atomic_xchg(&vcpu->guest_mode, 0)) smp_send_reschedule(cpu); put_cpu(); The atomic_xchg() does the trick.
Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath - Assertion
On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote: Great, I'm going to submit it as a proper patch then. Christoph, by now I'm pretty sure it's right, but can you have another look if this is correct, anyway? It looks correct to me - we really shouldn't update the fields until bdrv_aio_cancel has returned. In fact we often cannot cancel a request at all, so there's a fairly high chance it will complete. Reviewed-by: Christoph Hellwig h...@lst.de
Re: [PATCH 1/2] x86: eliminate TS_XSAVE
On Sun, 2010-05-02 at 07:53 -0700, Avi Kivity wrote: The fpu code currently uses current->thread_info->status & TS_XSAVE as a way to distinguish between XSAVE capable processors and older processors. The decision is not really task specific; instead we use the task status to avoid a global memory reference - the value should be the same across all threads. Eliminate this tie-in into the task structure by using an alternative instruction keyed off the XSAVE cpu feature; this results in shorter and faster code, without introducing a global memory reference. Signed-off-by: Avi Kivity a...@redhat.com Acked-by: Suresh Siddha suresh.b.sid...@intel.com
Re: [PATCH 2/2] x86: Introduce 'struct fpu' and related API
On Sun, 2010-05-02 at 07:53 -0700, Avi Kivity wrote: Currently all fpu state access is through tsk->thread.xstate. Since we wish to generalize fpu access to non-task contexts, wrap the state in a new 'struct fpu' and convert existing access to use an fpu API. Signal frame handlers are not converted to the API since they will remain task context only things. Signed-off-by: Avi Kivity a...@redhat.com One comment I have is the name 'fpu'. In future we can use this for non fpu state as well. For now, I can't think of a simple and better name. We can perhaps change it in the future. Acked-by: Suresh Siddha suresh.b.sid...@intel.com
Re: [PATCH 1/2] x86: eliminate TS_XSAVE
On Tue, 2010-05-04 at 00:41 -0700, Avi Kivity wrote: On 05/04/2010 12:45 AM, H. Peter Anvin wrote: I was trying to avoid a performance regression relative to the current code, as it appears that some care was taken to avoid the memory reference. I agree that it's probably negligible compared to the save/restore code. If the x86 maintainers agree as well, I'll replace it with cpu_has_xsave. I asked Suresh to comment on this, since he wrote the original code. He did confirm that the intent was to avoid a global memory reference. Ok, so you're happy with the patch as is? As use_xsave() is in the hot context switch path, I would like to go with Avi's proposal. thanks, suresh
Re: [PATCH 1/2] x86: eliminate TS_XSAVE
On 05/04/2010 11:15 AM, Suresh Siddha wrote: On Tue, 2010-05-04 at 00:41 -0700, Avi Kivity wrote: On 05/04/2010 12:45 AM, H. Peter Anvin wrote: I was trying to avoid a performance regression relative to the current code, as it appears that some care was taken to avoid the memory reference. I agree that it's probably negligible compared to the save/restore code. If the x86 maintainers agree as well, I'll replace it with cpu_has_xsave. I asked Suresh to comment on this, since he wrote the original code. He did confirm that the intent was to avoid a global memory reference. Ok, so you're happy with the patch as is? As use_xsave() is in the hot context switch path, I would like to go with Avi's proposal. I would tend to agree. Saving a likely cache miss in the hot context switch path is worthwhile. I would like to request one change, however. I would like to see the alternatives code to be: movb $0,reg movb $1,reg ... instead of using xor (which has to be padded with NOPs, which is of course pointless since the slot is a fixed size.) I would suggest using a byte-sized variable instead of a dword-size variable to save a few bytes, too. Once the jump label framework is integrated and has matured, I think we should consider using it to save the mov/test/jump. -hpa
Re: virtio: put last_used and last_avail index into ring itself.
virtio: put last_used and last_avail index into ring itself. Generally, the other end of the virtio ring doesn't need to see where you're up to in consuming the ring. However, to completely understand what's going on from the outside, this information must be exposed. For example, if you want to save and restore a virtio_ring, but you're not the consumer because the kernel is using it directly. Fortunately, we have room to expand: the ring is always a whole number of pages and there's hundreds of bytes of padding after the avail ring and the used ring, whatever the number of descriptors (which must be a power of 2). We add a feature bit so the guest can tell the host that it's writing out the current value there, if it wants to use that. Signed-off-by: Rusty Russell ru...@rustcorp.com.au I've been looking at this patch some more (more on why later), and I wonder: would it be better to add some alignment to the last used index address, so that if we later add more stuff at the tail, it all fits in a single cache line? We use a new feature bit anyway, so layout change should not be a problem. Since I raised the question of caches: for used ring, the ring is not aligned to 64 bit, so on CPUs with 64 bit or larger cache lines, used entries will often cross cache line boundaries. Am I right and might it have been better to align ring entries to cache line boundaries? What do you think? --- drivers/virtio/virtio_ring.c | 23 +++ include/linux/virtio_ring.h | 12 +++- 2 files changed, 26 insertions(+), 9 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -71,9 +71,6 @@ struct vring_virtqueue /* Number we've added since last sync. */ unsigned int num_added; - /* Last used index we've seen. */ - u16 last_used_idx; - /* How to notify other side. FIXME: commonalize hcalls! 
*/ void (*notify)(struct virtqueue *vq); @@ -278,12 +275,13 @@ static void detach_buf(struct vring_virt static inline bool more_used(const struct vring_virtqueue *vq) { - return vq-last_used_idx != vq-vring.used-idx; + return vring_last_used(vq-vring) != vq-vring.used-idx; } static void *vring_get_buf(struct virtqueue *_vq, unsigned int *len) { struct vring_virtqueue *vq = to_vvq(_vq); + struct vring_used_elem *u; void *ret; unsigned int i; @@ -300,8 +298,11 @@ static void *vring_get_buf(struct virtqu return NULL; } - i = vq-vring.used-ring[vq-last_used_idx%vq-vring.num].id; - *len = vq-vring.used-ring[vq-last_used_idx%vq-vring.num].len; + u = vq-vring.used-ring[vring_last_used(vq-vring) % vq-vring.num]; + i = u-id; + *len = u-len; + /* Make sure we don't reload i after doing checks. */ + rmb(); if (unlikely(i = vq-vring.num)) { BAD_RING(vq, id %u out of range\n, i); @@ -315,7 +316,8 @@ static void *vring_get_buf(struct virtqu /* detach_buf clears data, so grab it now. */ ret = vq-data[i]; detach_buf(vq, i); - vq-last_used_idx++; + vring_last_used(vq-vring)++; + END_USE(vq); return ret; } @@ -402,7 +404,6 @@ struct virtqueue *vring_new_virtqueue(un vq-vq.name = name; vq-notify = notify; vq-broken = false; - vq-last_used_idx = 0; vq-num_added = 0; list_add_tail(vq-vq.list, vdev-vqs); #ifdef DEBUG @@ -413,6 +414,10 @@ struct virtqueue *vring_new_virtqueue(un vq-indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC); + /* We publish indices whether they offer it or not: if not, it's junk + * space anyway. But calling this acknowledges the feature. */ + virtio_has_feature(vdev, VIRTIO_RING_F_PUBLISH_INDICES); + /* No callback? Tell other side not to bother us. */ if (!callback) vq-vring.avail-flags |= VRING_AVAIL_F_NO_INTERRUPT; @@ -443,6 +448,8 @@ void vring_transport_features(struct vir switch (i) { case VIRTIO_RING_F_INDIRECT_DESC: break; + case VIRTIO_RING_F_PUBLISH_INDICES: + break; default: /* We don't understand this bit. 
*/ clear_bit(i, vdev-features); diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -29,6 +29,9 @@ /* We support indirect buffer descriptors */ #define VIRTIO_RING_F_INDIRECT_DESC 28 +/* We publish our last-seen used index at the end of the avail ring. */ +#define VIRTIO_RING_F_PUBLISH_INDICES29 + /* Virtio ring descriptors: 16 bytes. These can chain together via next. */ struct vring_desc { @@ -87,6 +90,7 @@ struct vring { * __u16
[PATCH 0/2] fix kvmclock bug - memory corruption
This patch series fixes a bug I just found with kvmclock, when I booted into a kernel without kvmclock enabled. Since I am setting msrs, I took the opportunity to use yet another function from upstream qemu (patch 1). Enjoy Glauber Costa (2): replace set_msr_entry with kvm_msr_entry turn off kvmclock when resetting cpu qemu-kvm-x86.c | 58 - target-i386/kvm.c | 3 ++ 2 files changed, 38 insertions(+), 23 deletions(-)
[PATCH 1/2] replace set_msr_entry with kvm_msr_entry
this is yet another function that upstream qemu implements, so we can just use its implementation. Signed-off-by: Glauber Costa glom...@redhat.com --- qemu-kvm-x86.c| 39 --- target-i386/kvm.c |3 +++ 2 files changed, 19 insertions(+), 23 deletions(-) diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 748ff69..439c31a 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -693,13 +693,6 @@ int kvm_arch_qemu_create_context(void) return 0; } -static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index, - uint64_t data) -{ -entry-index = index; -entry-data = data; -} - /* returns 0 on success, non-0 on failure */ static int get_msr_entry(struct kvm_msr_entry *entry, CPUState *env) { @@ -960,19 +953,19 @@ void kvm_arch_load_regs(CPUState *env, int level) /* msrs */ n = 0; /* Remember to increase msrs size if you add new registers below */ -set_msr_entry(msrs[n++], MSR_IA32_SYSENTER_CS, env-sysenter_cs); -set_msr_entry(msrs[n++], MSR_IA32_SYSENTER_ESP, env-sysenter_esp); -set_msr_entry(msrs[n++], MSR_IA32_SYSENTER_EIP, env-sysenter_eip); +kvm_msr_entry_set(msrs[n++], MSR_IA32_SYSENTER_CS, env-sysenter_cs); +kvm_msr_entry_set(msrs[n++], MSR_IA32_SYSENTER_ESP, env-sysenter_esp); +kvm_msr_entry_set(msrs[n++], MSR_IA32_SYSENTER_EIP, env-sysenter_eip); if (kvm_has_msr_star) - set_msr_entry(msrs[n++], MSR_STAR, env-star); + kvm_msr_entry_set(msrs[n++], MSR_STAR, env-star); if (kvm_has_vm_hsave_pa) -set_msr_entry(msrs[n++], MSR_VM_HSAVE_PA, env-vm_hsave); +kvm_msr_entry_set(msrs[n++], MSR_VM_HSAVE_PA, env-vm_hsave); #ifdef TARGET_X86_64 if (lm_capable_kernel) { -set_msr_entry(msrs[n++], MSR_CSTAR, env-cstar); -set_msr_entry(msrs[n++], MSR_KERNELGSBASE, env-kernelgsbase); -set_msr_entry(msrs[n++], MSR_FMASK, env-fmask); -set_msr_entry(msrs[n++], MSR_LSTAR , env-lstar); +kvm_msr_entry_set(msrs[n++], MSR_CSTAR, env-cstar); +kvm_msr_entry_set(msrs[n++], MSR_KERNELGSBASE, env-kernelgsbase); +kvm_msr_entry_set(msrs[n++], MSR_FMASK, env-fmask); +kvm_msr_entry_set(msrs[n++], 
MSR_LSTAR , env-lstar); } #endif if (level == KVM_PUT_FULL_STATE) { @@ -983,20 +976,20 @@ void kvm_arch_load_regs(CPUState *env, int level) * huge jump-backs that would occur without any writeback at all. */ if (smp_cpus == 1 || env-tsc != 0) { -set_msr_entry(msrs[n++], MSR_IA32_TSC, env-tsc); +kvm_msr_entry_set(msrs[n++], MSR_IA32_TSC, env-tsc); } -set_msr_entry(msrs[n++], MSR_KVM_SYSTEM_TIME, env-system_time_msr); -set_msr_entry(msrs[n++], MSR_KVM_WALL_CLOCK, env-wall_clock_msr); +kvm_msr_entry_set(msrs[n++], MSR_KVM_SYSTEM_TIME, env-system_time_msr); +kvm_msr_entry_set(msrs[n++], MSR_KVM_WALL_CLOCK, env-wall_clock_msr); } #ifdef KVM_CAP_MCE if (env-mcg_cap) { if (level == KVM_PUT_RESET_STATE) -set_msr_entry(msrs[n++], MSR_MCG_STATUS, env-mcg_status); +kvm_msr_entry_set(msrs[n++], MSR_MCG_STATUS, env-mcg_status); else if (level == KVM_PUT_FULL_STATE) { -set_msr_entry(msrs[n++], MSR_MCG_STATUS, env-mcg_status); -set_msr_entry(msrs[n++], MSR_MCG_CTL, env-mcg_ctl); +kvm_msr_entry_set(msrs[n++], MSR_MCG_STATUS, env-mcg_status); +kvm_msr_entry_set(msrs[n++], MSR_MCG_CTL, env-mcg_ctl); for (i = 0; i (env-mcg_cap 0xff); i++) -set_msr_entry(msrs[n++], MSR_MC0_CTL + i, env-mce_banks[i]); +kvm_msr_entry_set(msrs[n++], MSR_MC0_CTL + i, env-mce_banks[i]); } } #endif diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 5239eaf..56740bd 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -552,6 +552,8 @@ static int kvm_put_sregs(CPUState *env) return kvm_vcpu_ioctl(env, KVM_SET_SREGS, sregs); } +#endif + static void kvm_msr_entry_set(struct kvm_msr_entry *entry, uint32_t index, uint64_t value) { @@ -559,6 +561,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry, entry-data = value; } +#ifdef KVM_UPSTREAM static int kvm_put_msrs(CPUState *env, int level) { struct { -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] turn off kvmclock when resetting cpu
Currently, in the Linux kernel, we reset kvmclock if we are rebooting into a crash kernel through kexec. The rationale is that a new kernel won't use the same memory addresses, and the memory where kvmclock is located in the first kernel will be something else in the second one. We don't do it in normal reboots, because the second kernel ends up registering kvmclock again, which has the effect of turning off the first instance. This is, however, totally wrong. It assumes we're booting into a kernel that also has kvmclock enabled. If for some reason we reboot into something that doesn't do kvmclock, including but not limited to: * rebooting into an older kernel without kvmclock support, * rebooting with no-kvmclock, * rebooting into another O.S., we'll simply have the hypervisor writing into a random memory position in the guest. Neat, uh? Moreover, I believe the fix belongs in qemu, since it is the entity better prepared to detect all kinds of reboots (by means of a cpu_reset), not to mention the presence of misbehaving guests that can forget to turn kvmclock off. This patch fixes the issue for me.
Signed-off-by: Glauber Costa glom...@redhat.com --- qemu-kvm-x86.c | 19 +++ 1 files changed, 19 insertions(+), 0 deletions(-) diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 439c31a..4b94e04 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -1417,8 +1417,27 @@ void kvm_arch_push_nmi(void *opaque) } #endif /* KVM_CAP_USER_NMI */ +static int kvm_turn_off_clock(CPUState *env) +{ +struct { +struct kvm_msrs info; +struct kvm_msr_entry entries[100]; +} msr_data; + +struct kvm_msr_entry *msrs = msr_data.entries; +int n = 0; + +kvm_msr_entry_set(msrs[n++], MSR_KVM_SYSTEM_TIME, 0); +kvm_msr_entry_set(msrs[n++], MSR_KVM_WALL_CLOCK, 0); +msr_data.info.nmsrs = n; + +return kvm_vcpu_ioctl(env, KVM_SET_MSRS, msr_data); +} + + void kvm_arch_cpu_reset(CPUState *env) { +kvm_turn_off_clock(env); kvm_arch_reset_vcpu(env); kvm_reset_mpstate(env); } -- 1.6.2.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote: I took a stab at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me, this might be a first step to cleaning them. The whole virtio-blk interface is full of warts. It has been extended rather ad-hoc, so that is rather expected. One issue I struggled with especially is how the type field mixes bits and non-bit values. I ended up simply defining all legal values, so that we have CMD = 2, CMD_OUT = 3 and so on. It's basically a complete mess without much logic behind it. +\change_unchanged +the high bit +\change_inserted 0 1266497301 + (VIRTIO_BLK_T_BARRIER) +\change_unchanged + indicates that this request acts as a barrier and that all preceding requests + must be complete before this one, and all following requests must not be + started until this is complete. + +\change_inserted 0 1266504385 + Note that a barrier does not flush caches in the underlying backend device + in host, and thus does not serve as data consistency guarantee. + Driver must use FLUSH request to flush the host cache. +\change_unchanged I'm not sure it's even worth documenting it. I can't see any way to actually implement safe behaviour with the VIRTIO_BLK_T_BARRIER-style barriers. Btw, did I mention that .lyx is a really horrible format to review diffs for? Plain latex would be a lot better..
Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH
On Tue, Apr 20, 2010 at 02:46:35AM +0100, Jamie Lokier wrote: Does this mean that virtio-blk supports all three combinations? 1. FLUSH that isn't a barrier 2. FLUSH that is also a barrier 3. Barrier that is not a flush 1 is good for fsync-like operations; 2 is good for journalling-like ordered operations. 3 sounds like it doesn't mean a lot as the host cache provides no guarantees and has no ordering facility that can be used. No. The Linux virtio_blk guest driver either supports data integrity by using FLUSH or can send down BARRIER requests which aren't much help at all. Qemu only implements FLUSH anyway.
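The distinction matters in practice: combination 1, a flush that is not a barrier, is what a guest filesystem's fsync()/fdatasync() path ultimately relies on, and it is the one request type qemu implements. A minimal userspace sketch of that pattern, with a hypothetical helper name and condensed error handling:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* flush_record() is a hypothetical helper: write a record, then ask the
 * kernel to push this file's data to stable storage. On a virtio-blk
 * guest the fdatasync() is what ends up issuing a cache flush to the
 * host; it imposes no ordering on unrelated in-flight writes. */
int flush_record(const void *buf, size_t len)
{
    char path[] = "/tmp/flushdemo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int ok = write(fd, buf, len) == (ssize_t)len && fdatasync(fd) == 0;

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```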
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Tue, May 04, 2010 at 08:54:59PM +0200, Christoph Hellwig wrote: On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote: I took a stab at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me, this might be a first step to cleaning them. The whole virtio-blk interface is full of warts. It has been extended rather ad-hoc, so that is rather expected. One issue I struggled with especially is how the type field mixes bits and non-bit values. I ended up simply defining all legal values, so that we have CMD = 2, CMD_OUT = 3 and so on. It's basically a complete mess without much logic behind it. +\change_unchanged +the high bit +\change_inserted 0 1266497301 + (VIRTIO_BLK_T_BARRIER) +\change_unchanged + indicates that this request acts as a barrier and that all preceding requests + must be complete before this one, and all following requests must not be + started until this is complete. + +\change_inserted 0 1266504385 + Note that a barrier does not flush caches in the underlying backend device + in host, and thus does not serve as data consistency guarantee. + Driver must use FLUSH request to flush the host cache. +\change_unchanged I'm not sure it's even worth documenting it. I can't see any way to actually implement safe behaviour with the VIRTIO_BLK_T_BARRIER-style barriers. lguest seems to still use this. I guess if you have a reliable host, VIRTIO_BLK_T_BARRIER is enough? Btw, did I mention that .lyx is a really horrible format to review diffs for? Plain latex would be a lot better..
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Tue, May 04, 2010 at 09:56:18PM +0300, Michael S. Tsirkin wrote: On Tue, May 04, 2010 at 08:54:59PM +0200, Christoph Hellwig wrote: On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote: I took a stab at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me, this might be a first step to cleaning them. The whole virtio-blk interface is full of warts. It has been extended rather ad-hoc, so that is rather expected. One issue I struggled with especially is how the type field mixes bits and non-bit values. I ended up simply defining all legal values, so that we have CMD = 2, CMD_OUT = 3 and so on. It's basically a complete mess without much logic behind it. +\change_unchanged +the high bit +\change_inserted 0 1266497301 + (VIRTIO_BLK_T_BARRIER) +\change_unchanged + indicates that this request acts as a barrier and that all preceding requests + must be complete before this one, and all following requests must not be + started until this is complete. + +\change_inserted 0 1266504385 + Note that a barrier does not flush caches in the underlying backend device + in host, and thus does not serve as data consistency guarantee. + Driver must use FLUSH request to flush the host cache. +\change_unchanged I'm not sure it's even worth documenting it. I can't see any way to actually implement safe behaviour with the VIRTIO_BLK_T_BARRIER-style barriers. lguest seems to still use this. Sorry, it doesn't. No idea why I thought it did. I guess if you have a reliable host, VIRTIO_BLK_T_BARRIER is enough? Btw, did I mention that .lyx is a really horrible format to review diffs for? Plain latex would be a lot better..
Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH
On Tue, May 04, 2010 at 08:56:14PM +0200, Christoph Hellwig wrote: On Tue, Apr 20, 2010 at 02:46:35AM +0100, Jamie Lokier wrote: Does this mean that virtio-blk supports all three combinations? 1. FLUSH that isn't a barrier 2. FLUSH that is also a barrier 3. Barrier that is not a flush 1 is good for fsync-like operations; 2 is good for journalling-like ordered operations. 3 sounds like it doesn't mean a lot as the host cache provides no guarantees and has no ordering facility that can be used. No. The Linux virtio_blk guest driver either supports data integrity by using FLUSH or can send down BARRIER requests which aren't much help at all. It seems we use BARRIER when we get REQ_HARDBARRIER, right? What does the REQ_HARDBARRIER flag in a request mean and when is it set? Qemu only implements FLUSH anyway.
Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH
Jens Axboe wrote: On Tue, May 04 2010, Rusty Russell wrote: ISTR someone mentioning a desire for such an API years ago, so CC'ing the usual I/O suspects... It would be nice to have a fuller API for this, but the reality is that only the flush approach is really workable. Even just strict ordering of requests could only be supported on SCSI, and even there the kernel still lacks proper guarantees on error handling to prevent reordering there. There are a few I/O scheduling differences that might be useful: 1. The I/O scheduler could freely move WRITEs before a FLUSH but not before a BARRIER. That might be useful for time-critical WRITEs, and those issued at high I/O priority. 2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is only for data belonging to a particular file (e.g. fdatasync with no file size change, even on btrfs if O_DIRECT was used for the writes being committed). That would entail tagging FLUSHes and WRITEs with a fs-specific identifier (such as inode number), opaque to the scheduler, which only checks equality. 3. By delaying FLUSHes through reordering as above, the I/O scheduler could merge multiple FLUSHes into a single command. 4. On MD/RAID, BARRIER requires every backing device to quiesce before sending the low-level cache-flush, and all of those to finish before resuming each backing device. FLUSH doesn't require as much synchronising. (With per-file FLUSH, see 2, it could even avoid FLUSH altogether to some backing devices for small files.) In other words, FLUSH can be more relaxed than BARRIER inside the kernel. It's ironic that we think of fsync as stronger than fbarrier outside the kernel :-) -- Jamie
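Point 3 above, merging several delayed FLUSHes into a single command, can be sketched as a toy queue transformation; the request encoding and the merge_flushes() helper are illustrative only, not anything the block layer actually exposes:

```c
#include <stddef.h>

enum req { REQ_WRITE, REQ_FLUSH };

/* Coalesce runs of adjacent FLUSH requests: any number of back-to-back
 * cache flushes with no intervening write is equivalent to a single
 * flush. Rewrites the queue in place and returns its new length. */
size_t merge_flushes(enum req *q, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (q[i] == REQ_FLUSH && out > 0 && q[out - 1] == REQ_FLUSH)
            continue; /* drop the redundant flush */
        q[out++] = q[i];
    }
    return out;
}
```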
Re: [PATCH 0/2] fix kvmclock bug - memory corruption
On 05/04/2010 08:35 AM, Glauber Costa wrote: This patch series fixes a bug I just found with kvmclock, when I booted into a kernel without kvmclock enabled. Since I am setting msrs, I took the opportunity to use yet another function from upstream qemu (patch 1). Enjoy Glauber Costa (2): replace set_msr_entry with kvm_msr_entry turn off kvmclock when resetting cpu qemu-kvm-x86.c| 58 - target-i386/kvm.c |3 ++ 2 files changed, 38 insertions(+), 23 deletions(-) Acked-by: Zachary Amsden zams...@redhat.com
Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH
Rusty Russell wrote: On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote: I took a stab at documenting CMD and FLUSH request types in virtio block. Christoph, could you look over this please? I note that the interface seems full of warts to me, this might be a first step to cleaning them. ISTR Christoph had withdrawn some patches in this area, and was waiting for him to resubmit? I've given up on figuring out the block device. What seem to me to be sane semantics along the lines of memory barriers are foreign to disk people: they want (and depend on) flushing everywhere. For example, tdb transactions do not require a flush, they only require what I would call a barrier: that prior data be written out before any future data. Surely that would be more efficient in general than a flush! In fact, TDB wants only writes to *that file* (and metadata) written out first; it has no ordering issues with other I/O on the same device. I've just posted elsewhere on this thread that an I/O level flush can be more efficient than an I/O level barrier (implemented using a cache-flush really), because the barrier has stricter ordering requirements at the I/O scheduling level. By the time you work up to tdb, another way to think of it is distinguishing "eager fsync" from "fsync but I'm not in a hurry - delay as long as is convenient". The latter makes much more sense with AIO. A generic I/O interface would allow you to specify "this request depends on these outstanding requests" and leave it at that. It might have some sync flush command for dumb applications and OSes. For filesystems, it would probably be easy to label in-place overwrites and fdatasync data flushes when there's no file extension with an opaque per-file identifier for certain operations. Typically over-writing in place and fdatasync would match up and wouldn't need ordering against anything else. Other operations would tend to get labelled as ordered against everything including these.
-- Jamie
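The "this request depends on these outstanding requests" interface described above can be modelled in a few lines; the structures below are purely hypothetical, since no real block layer exposes such an API:

```c
#include <stdbool.h>
#include <stddef.h>

/* A request names the ids of earlier requests it must wait for. */
struct io_req {
    int id;
    const int *deps; /* ids of requests this one depends on */
    size_t ndeps;
};

/* completed[] is indexed by request id. A request may be issued only
 * once everything it depends on has completed; everything else can be
 * freely reordered around it. */
bool can_issue(const struct io_req *r, const bool *completed)
{
    for (size_t i = 0; i < r->ndeps; i++)
        if (!completed[r->deps[i]])
            return false;
    return true;
}
```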
Re: [PATCH 1/3 v2] KVM MMU: make kvm_mmu_zap_page() return the number of zapped sp in total.
Marcelo Tosatti wrote: On Mon, May 03, 2010 at 09:38:54PM +0800, Gui Jianfeng wrote: Hi Marcelo Actually, it doesn't only affect kvm_mmu_change_mmu_pages() but also affects kvm_mmu_remove_some_alloc_mmu_pages(), which is called by the mmu shrink routine. This will cause the upper layer to get a wrong number, so I think this should be fixed. Here is an updated version. --- From: Gui Jianfeng guijianf...@cn.fujitsu.com Currently, in kvm_mmu_change_mmu_pages(kvm, page), used_pages-- is performed after calling kvm_mmu_zap_page(), regardless of whether the page is actually reclaimed. Because a root sp won't be reclaimed by kvm_mmu_zap_page(), making kvm_mmu_zap_page() return the total number of reclaimed sps makes more sense. A new flag is put into kvm_mmu_zap_page() to indicate whether the top page is reclaimed. kvm_mmu_remove_some_alloc_mmu_pages() also relies on kvm_mmu_zap_page() to return a total reclaimed number. Isn't it simpler to have kvm_mmu_zap_page return the number of pages it actually freed? Then always restart the hash walk if the return value is positive. OK, although in some cases we might incur an unneeded hash walk restart, it's not a big problem. I don't object to this solution; I will post a new patch. Thanks, Gui
[PATCH] KVM: make kvm_mmu_zap_page() return the number of pages it actually freed.
Currently, kvm_mmu_zap_page() returns only the number of freed children sps. This might confuse the caller, because the caller doesn't know the number actually freed. Let's make kvm_mmu_zap_page() return the number of pages it actually freed. Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com --- arch/x86/kvm/mmu.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 51eb6d6..8ab6820 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1503,6 +1503,8 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp) if (sp->unsync) kvm_unlink_unsync_page(kvm, sp); if (!sp->root_count) { + /* Count self */ + ret++; hlist_del(&sp->hash_link); kvm_mmu_free_page(kvm, sp); } else { @@ -1539,7 +1541,6 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages) page = container_of(kvm->arch.active_mmu_pages.prev, struct kvm_mmu_page, link); used_pages -= kvm_mmu_zap_page(kvm, page); - used_pages--; } kvm_nr_mmu_pages = used_pages; kvm->arch.n_free_mmu_pages = 0; @@ -2908,7 +2909,7 @@ static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm) page = container_of(kvm->arch.active_mmu_pages.prev, struct kvm_mmu_page, link); - return kvm_mmu_zap_page(kvm, page) + 1; + return kvm_mmu_zap_page(kvm, page); } static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask) -- 1.6.5.2
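The accounting change is easy to model outside the kernel: once the zap routine counts the page itself only when it is really freed, callers can simply subtract the return value instead of the old unconditional used_pages--. The struct below is an illustrative stand-in, not the real kvm_mmu_page:

```c
/* Toy stand-in for a shadow page: children are always torn down, but
 * the page itself is freed only when nothing still uses it as a root. */
struct page_s {
    int root_count; /* nonzero: the page stays allocated */
    int nchildren;  /* children unconditionally freed by a zap */
};

/* Return the number of pages actually freed, counting self only when
 * it really was freed, mirroring the patch above. */
int zap_page(const struct page_s *sp)
{
    int ret = sp->nchildren;

    if (sp->root_count == 0)
        ret++; /* count self */
    return ret;
}
```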
[PATCH] KVM: mark page dirty when page is actually modified.
Sometimes cmpxchg_gpte doesn't modify the gpte; in that case, don't mark the page table page as dirty. Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com --- arch/x86/kvm/paging_tmpl.h |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 89d66ca..1ad9843 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -177,10 +177,10 @@ walk: if (!(pte & PT_ACCESSED_MASK)) { trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(pte)); - mark_page_dirty(vcpu->kvm, table_gfn); if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte, pte|PT_ACCESSED_MASK)) goto walk; + mark_page_dirty(vcpu->kvm, table_gfn); pte |= PT_ACCESSED_MASK; } @@ -217,11 +217,11 @@ walk: bool ret; trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte)); - mark_page_dirty(vcpu->kvm, table_gfn); ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte, pte|PT_DIRTY_MASK); if (ret) goto walk; + mark_page_dirty(vcpu->kvm, table_gfn); pte |= PT_DIRTY_MASK; walker->ptes[walker->level - 1] = pte; } -- 1.6.5.2
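The ordering bug can be reproduced in miniature with any compare-and-swap: the dirty mark is only justified once the swap has actually happened. The helper below is an illustrative sketch using GCC's __sync builtin, not the real KVM walker; ACCESSED_BIT stands in for the x86 accessed bit:

```c
#include <stdbool.h>
#include <stdint.h>

#define ACCESSED_BIT 0x20u /* bit 5, as in x86 page-table entries */

/* Try to set the accessed bit on *gpte, which we last read as
 * 'expected'. Flag the containing page dirty only when the CAS
 * succeeded; a failure means someone else changed the entry and we
 * must re-walk without having dirtied anything. */
bool set_accessed(uint64_t *gpte, uint64_t expected, bool *dirty)
{
    if (!__sync_bool_compare_and_swap(gpte, expected, expected | ACCESSED_BIT))
        return false; /* gpte changed under us: no dirty mark */
    *dirty = true;    /* the analogue of mark_page_dirty() */
    return true;
}
```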
vCPU scalability for linux VMs
Gentlemen, Reaching out with a non-development question, sorry if it's not appropriate here. I'm looking for a way to improve Linux SMP VM performance under KVM. My preliminary results show that single-vCPU Linux VMs perform up to 10 times better than 4-vCPU Linux VMs (consolidated performance of 8 VMs on an 8-core pre-Nehalem server). I suspect that I'm missing something major and am looking for any means that can help improve SMP VM performance. VMs are started using: qemu-kvm -m $ram -smp $cpus -name $name -drive file=${newimg},boot=on,cache=writethrough -net nic,macaddr=$mac,vlan=0,model=virtio -net tap,script=/kvm/qemu-ifup,vlan=0,ifname=kvmnet$i -parallel none -usb -k en-us -monitor pty -serial pty -nographic -daemonize -snapshot KVM Host Environment (redhat 5 based): # uname -r 2.6.18-194.el5 # rpm -qa|grep kvm kvm-83-164.el5 kvm-tools-83-164.el5 kmod-kvm-83-164.el5 kvm-qemu-img-83-164.el5 Thank you, Alec
[PATCH] KVM: Fix debug output error
Fix a debug output error in walk_addr Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com --- arch/x86/kvm/paging_tmpl.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 89d66ca..d2c5164 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -229,7 +229,7 @@ walk: walker->pt_access = pt_access; walker->pte_access = pte_access; pgprintk("%s: pte %llx pte_access %x pt_access %x\n", -__func__, (u64)pte, pt_access, pte_access); +__func__, (u64)pte, pte_access, pt_access); return 1; not_present: -- 1.6.5.2
Re: 2.6.33.3: possible recursive locking detected
On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote: On 05/04/2010 10:03 AM, CaT wrote: I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3. qemu-kvm version 0.12.3. Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the latest kernel? Doesn't appear to be related to kvm. Copying lkml. When doing: echo noop > /sys/block/vdd/queue/scheduler I got: [ 1424.438241] = [ 1424.439588] [ INFO: possible recursive locking detected ] [ 1424.440368] 2.6.33.3-moocow.20100429-142641 #2 [ 1424.440960] - [ 1424.440960] bash/2186 is trying to acquire lock: [ 1424.440960] (s_active){.+}, at: [811046b8] sysfs_remove_dir+0x75/0x88 [ 1424.440960] [ 1424.440960] but task is already holding lock: [ 1424.440960] (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 1424.440960] [ 1424.440960] other info that might help us debug this: [ 1424.440960] 4 locks held by bash/2186: [ 1424.440960] #0: (&buffer->mutex){+.+.+.}, at: [8110317f] sysfs_write_file+0x39/0x126 [ 1424.440960] #1: (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 1424.440960] #2: (s_active){.+}, at: [81104856] sysfs_get_active_two+0x2c/0x46 [ 1424.440960] #3: (&q->sysfs_lock){+.+.+.}, at: [8119c3f0] queue_attr_store+0x44/0x85 [ 1424.440960] [ 1424.440960] stack backtrace: [ 1424.440960] Pid: 2186, comm: bash Not tainted 2.6.33.3-moocow.20100429-142641 #2 [ 1424.440960] Call Trace: [ 1424.440960] [8105e775] __lock_acquire+0xf9f/0x178e [ 1424.440960] [8100d3ec] ? save_stack_trace+0x2a/0x48 [ 1424.440960] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 1424.440960] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 1424.440960] [8105cb56] ? trace_hardirqs_on+0xd/0xf [ 1424.440960] [8105f02e] lock_acquire+0xca/0xef [ 1424.440960] [811046b8] ? sysfs_remove_dir+0x75/0x88 [ 1424.440960] [8110458d] sysfs_addrm_finish+0xc8/0x13a [ 1424.440960] [811046b8] ? sysfs_remove_dir+0x75/0x88 [ 1424.440960] [8105cb25] ?
trace_hardirqs_on_caller+0x110/0x134 [ 1424.440960] [811046b8] sysfs_remove_dir+0x75/0x88 [ 1424.440960] [811ab312] kobject_del+0x16/0x37 [ 1424.440960] [81195489] elv_iosched_store+0x10a/0x214 [ 1424.440960] [8119c416] queue_attr_store+0x6a/0x85 [ 1424.440960] [81103237] sysfs_write_file+0xf1/0x126 [ 1424.440960] [810b747f] vfs_write+0xae/0x14a [ 1424.440960] [810b75df] sys_write+0x47/0x6e [ 1424.440960] [81002202] system_call_fastpath+0x16/0x1b Original scheduler was cfq. Having rebooted and defaulted to noop I tried echo noop/sys/block/vdd/queue/scheduler and got: [ 311.294464] = [ 311.295820] [ INFO: possible recursive locking detected ] [ 311.296603] 2.6.33.3-moocow.20100429-142641 #2 [ 311.296833] - [ 311.296833] bash/2190 is trying to acquire lock: [ 311.296833] (s_active){.+}, at: [81104630] remove_dir+0x31/0x39 [ 311.296833] [ 311.296833] but task is already holding lock: [ 311.296833] (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 311.296833] [ 311.296833] other info that might help us debug this: [ 311.296833] 4 locks held by bash/2190: [ 311.296833] #0: (buffer-mutex){+.+.+.}, at: [8110317f] sysfs_write_file+0x39/0x126 [ 311.296833] #1: (s_active){.+}, at: [81104849] sysfs_get_active_two+0x1f/0x46 [ 311.296833] #2: (s_active){.+}, at: [81104856] sysfs_get_active_two+0x2c/0x46 [ 311.296833] #3: (q-sysfs_lock){+.+.+.}, at: [8119c3f0] queue_attr_store+0x44/0x85 [ 311.296833] [ 311.296833] stack backtrace: [ 311.296833] Pid: 2190, comm: bash Not tainted 2.6.33.3-moocow.20100429-142641 #2 [ 311.296833] Call Trace: [ 311.296833] [8105e775] __lock_acquire+0xf9f/0x178e [ 311.296833] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 311.296833] [8105b46c] ? lockdep_init_map+0x9f/0x52f [ 311.296833] [8105cb56] ? trace_hardirqs_on+0xd/0xf [ 311.296833] [8105f02e] lock_acquire+0xca/0xef [ 311.296833] [81104630] ? remove_dir+0x31/0x39 [ 311.296833] [8110458d] sysfs_addrm_finish+0xc8/0x13a [ 311.296833] [81104630] ? 
remove_dir+0x31/0x39 [ 311.296833] [8105cb25] ? trace_hardirqs_on_caller+0x110/0x134 [ 311.296833] [81104630] remove_dir+0x31/0x39 [ 311.296833] [811046c0] sysfs_remove_dir+0x7d/0x88 [ 311.296833] [811ab312] kobject_del+0x16/0x37 [ 311.296833] [81195489]
Re: 2.6.33.3: possible recursive locking detected
On Wed, May 5, 2010 at 10:32 AM, Yong Zhang yong.zh...@windriver.com wrote: On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote: On 05/04/2010 10:03 AM, CaT wrote: I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3. qemu-kvm version 0.12.3. Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the latest kernel? Hmm, 2.6.33 -stable has commit 846f99749ab68bbc7f75c74fec305de675b1a1bf? Actually, these 3 commits fixed it: 6992f5334995af474c2b58d010d08bc597f0f2fe sysfs: Use one lockdep class per sysfs ttribute. a2db6842873c8e5a70652f278d469128cb52db70 sysfs: Only take active references on attributes. e72ceb8ccac5f770b3e696e09bb673dca7024b20 sysfs: Remove sysfs_get/put_active_two However, there are many other patches needed to amend these, so I think it's not suitable for -stable to include, perhaps a revert of 846f99749ab68bbc7f75c74fec305de675b1a1bf is better. Adding Greg into Cc. Thanks.
Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro
On Tue, 04 May 2010 19:08:23 +0300 Avi Kivity a...@redhat.com wrote: On 05/04/2010 06:03 PM, Arnd Bergmann wrote: On Tuesday 04 May 2010, Takuya Yoshikawa wrote: ... So let us use the le bit offset calculation part by defining it as a new macro: generic_le_bit_offset(). Does this work correctly if your user space is 32 bits (i.e. unsigned long is a different size in user space and kernel) on both big- and little-endian systems? I'm not sure about all the details, but I think you cannot in general share bitmaps between user space and kernel because of this. That's why the bitmaps are defined as little endian u64 aligned, even on big endian 32-bit systems. Little endian bitmaps are wordsize agnostic, and u64 alignment ensures we can use long-sized bitops on mixed size systems. There was a suggestion to propose set_le_bit_user()-style macros. But I thought these have the constraint you two explained and seemed a bit too specific to one area, like KVM. So I decided to propose just the offset calculation macro. Thanks, Takuya
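The point that little-endian bitmaps are word-size agnostic is easy to demonstrate with byte-addressed helpers: bit nr always lands in byte nr/8 at bit position nr%8, regardless of whether either side uses 32-bit or 64-bit longs. These helpers are a hypothetical sketch, not the kernel's asm-generic implementation:

```c
#include <stdint.h>

/* In a little-endian bitmap, the location of bit nr depends only on
 * the byte order, never on the word size used to operate on it, which
 * is what makes sharing such a bitmap across the user/kernel boundary
 * safe on mixed 32/64-bit setups. */
void le_set_bit(uint8_t *map, unsigned int nr)
{
    map[nr / 8] |= (uint8_t)(1u << (nr % 8));
}

int le_test_bit(const uint8_t *map, unsigned int nr)
{
    return (map[nr / 8] >> (nr % 8)) & 1;
}
```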