[COMMIT master] document boot option to -drive parameter

2010-05-04 Thread Avi Kivity
From: Bruce Rogers brog...@novell.com

The boot option is missing from the documentation for the -drive parameter.

If there is a better way to describe it, I'm all ears.

Signed-off-by: Bruce Rogers brog...@novell.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/qemu-options.hx b/qemu-options.hx
index c5a160c..fbcf61e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -160,6 +160,8 @@ an untrusted format header.
 This option specifies the serial number to assign to the device.
 @item addr=@var{addr}
 Specify the controller's PCI address (if=virtio only).
+@item boot=@var{boot}
+@var{boot} is on or off and allows for booting from non-traditional
+interfaces, such as virtio.
 @end table
 
 By default, writethrough caching is used for all block devices.  This means that
--
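For illustration only (the disk image name is a placeholder, not from the patch), a guest could then boot from a virtio disk like so:

```shell
# Boot a guest whose only disk is virtio; boot=on marks it bootable
# (qemu-kvm of this era implemented this via an option ROM such as extboot).
qemu-system-x86_64 -drive file=guest.img,if=virtio,boot=on
```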
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] Test cmps between two IO locations.

2010-05-04 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/emulator.c b/kvm/user/test/x86/emulator.c
index c6adbb5..db84c13 100644
--- a/kvm/user/test/x86/emulator.c
+++ b/kvm/user/test/x86/emulator.c
@@ -17,18 +17,11 @@ void report(const char *name, int result)
}
 }
 
-void test_cmps(void *mem)
+void test_cmps_one(unsigned char *m1, unsigned char *m3)
 {
-   unsigned char *m1 = mem, *m2 = mem + 1024;
-   unsigned char m3[1024];
void *rsi, *rdi;
long rcx, tmp;
 
-   for (int i = 0; i < 100; ++i)
-       m1[i] = m2[i] = m3[i] = i;
-   for (int i = 100; i < 200; ++i)
-       m1[i] = (m3[i] = m2[i] = i) + 1;
-
rsi = m1; rdi = m3; rcx = 30;
    asm volatile("xor %[tmp], %[tmp] \n\t"
                 "repe/cmpsb"
@@ -91,6 +84,19 @@ void test_cmps(void *mem)
 
 }
 
+void test_cmps(void *mem)
+{
+   unsigned char *m1 = mem, *m2 = mem + 1024;
+   unsigned char m3[1024];
+
+   for (int i = 0; i < 100; ++i)
+       m1[i] = m2[i] = m3[i] = i;
+   for (int i = 100; i < 200; ++i)
+       m1[i] = (m3[i] = m2[i] = i) + 1;
+   test_cmps_one(m1, m3);
+   test_cmps_one(m1, m2);
+}
+
 void test_cr8(void)
 {
unsigned long src, dst;
--
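The refactoring above only splits the buffer setup out of the comparison loop; for readers without the SDM at hand, here is a small C model (not part of the patch — the function name is invented) of what `repe cmpsb` leaves in the count register:

```c
#include <assert.h>
#include <stddef.h>

/* Pure-C model of "repe cmpsb": compare up to n bytes, decrementing the
 * count on every iteration and stopping after the first mismatching byte.
 * Returns the bytes remaining, like RCX after the instruction retires. */
static size_t repe_cmpsb_count(const unsigned char *s1,
                               const unsigned char *s2, size_t n)
{
    while (n) {
        --n;                    /* rcx is decremented each iteration */
        if (*s1++ != *s2++)
            break;              /* repe stops once ZF is clear */
    }
    return n;
}
```

With a mismatch at index 3 of an 8-byte compare, four bytes remain uncompared.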


[COMMIT master] test: access: don't expect fetch fault indication if !efer.nx

2010-05-04 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Bit 4 of the page-fault error code can only be set if efer.nx=1.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c
index 5addd15..3338fbc 100644
--- a/kvm/user/test/x86/access.c
+++ b/kvm/user/test/x86/access.c
@@ -463,6 +463,8 @@ no_pte:
 fault:
     if (!at->expected_fault)
         at->ignore_pde = 0;
+    if (!at->flags[AC_CPU_EFER_NX])
+        at->expected_error &= ~PFERR_FETCH_MASK;
 }
 
 static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond,
--
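The convention behind this fix can be sketched in plain C. Mask values follow the usual x86 page-fault error-code layout (bit 0 = present, bit 4 = instruction fetch); the helper function is illustrative, not from access.c:

```c
#include <assert.h>

/* x86 page-fault error-code bits; the fetch bit is only ever reported
 * by hardware when EFER.NX=1. */
#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_FETCH_MASK   (1u << 4)

/* Mirror of the fix: drop the fetch bit from the expected error code
 * whenever NX is disabled. */
static unsigned expected_error_for(unsigned expected, int efer_nx)
{
    if (!efer_nx)
        expected &= ~PFERR_FETCH_MASK;
    return expected;
}
```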


[COMMIT master] test: access: consolidate test failure reporting into a function

2010-05-04 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c
index dbc1213..0906691 100644
--- a/kvm/user/test/x86/access.c
+++ b/kvm/user/test/x86/access.c
@@ -453,6 +453,28 @@ fault:
 ;
 }
 
+static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond,
+  const char *fmt, ...)
+{
+    va_list ap;
+    char buf[500];
+
+    if (!*success_ret) {
+        return;
+    }
+
+    if (!cond) {
+        return;
+    }
+
+    *success_ret = false;
+
+    va_start(ap, fmt);
+    vsnprintf(buf, sizeof(buf), fmt, ap);
+    va_end(ap);
+    printf("FAIL: %s\n", buf);
+}
+
 int ac_test_do_access(ac_test_t *at)
 {
 static unsigned unique = 42;
@@ -460,6 +482,7 @@ int ac_test_do_access(ac_test_t *at)
 unsigned e;
 static unsigned char user_stack[4096];
 unsigned long rsp;
+_Bool success = true;
 
 ++unique;
 
@@ -531,30 +554,21 @@ int ac_test_do_access(ac_test_t *at)
  jmp back_to_kernel \n\t
  .section .text);
 
-    if (fault && !at->expected_fault) {
-	printf("FAIL: unexpected fault\n");
-	return 0;
-    }
-    if (!fault && at->expected_fault) {
-	printf("FAIL: unexpected access\n");
-	return 0;
-    }
-    if (fault && e != at->expected_error) {
-	printf("FAIL: error code %x expected %x\n", e, at->expected_error);
-	return 0;
-    }
-    if (at->ptep && *at->ptep != at->expected_pte) {
-	printf("FAIL: pte %x expected %x\n", *at->ptep, at->expected_pte);
-	return 0;
+    ac_test_check(at, &success, fault && !at->expected_fault,
+                  "unexpected fault");
+    ac_test_check(at, &success, !fault && at->expected_fault,
+                  "unexpected access");
+    ac_test_check(at, &success, fault && e != at->expected_error,
+                  "error code %x expected %x", e, at->expected_error);
+    ac_test_check(at, &success, at->ptep && *at->ptep != at->expected_pte,
+                  "pte %x expected %x", *at->ptep, at->expected_pte);
+    ac_test_check(at, &success, *at->pdep != at->expected_pde,
+                  "pde %x expected %x", *at->pdep, at->expected_pde);
+
+    if (success) {
+        printf("PASS\n");
     }
-
-    if (*at->pdep != at->expected_pde) {
-	printf("FAIL: pde %x expected %x\n", *at->pdep, at->expected_pde);
-	return 0;
-    }
-
-    printf("PASS\n");
-    return 1;
+    return success;
 }
 
 static void ac_test_show(ac_test_t *at)
--
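Outside the test harness, the same first-failure-wins pattern looks like this (a standalone sketch; names differ from access.c, and the buffer size is arbitrary):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Once one check fails, *success latches false and every later check
 * returns immediately, so only the first failure message is printed. */
static void check(bool *success, bool failed, const char *fmt, ...)
{
    va_list ap;
    char buf[256];

    if (!*success || !failed)
        return;

    *success = false;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    printf("FAIL: %s\n", buf);
}
```

The caller runs every check unconditionally and inspects the flag once at the end, which is exactly how the patched `ac_test_do_access()` decides between PASS and a single FAIL line.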


[COMMIT master] test: access: allow the processor not to set pde.a if a fault occurs

2010-05-04 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Some processors only set accessed bits if the translation is valid; allow
this behaviour.  This squelches errors running with EPT.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/access.c b/kvm/user/test/x86/access.c
index c7a7075..5addd15 100644
--- a/kvm/user/test/x86/access.c
+++ b/kvm/user/test/x86/access.c
@@ -137,6 +137,7 @@ typedef struct {
 pt_element_t expected_pte;
 pt_element_t *pdep;
 pt_element_t expected_pde;
+pt_element_t ignore_pde;
 int expected_fault;
 unsigned expected_error;
 idt_entry_t idt[256];
@@ -370,6 +371,7 @@ void ac_test_setup_pte(ac_test_t *at)
     if (at->ptep)
 	at->expected_pte = *at->ptep;
     at->expected_pde = *at->pdep;
+    at->ignore_pde = 0;
     at->expected_fault = 0;
     at->expected_error = PFERR_PRESENT_MASK;
 
@@ -416,13 +418,17 @@ void ac_test_setup_pte(ac_test_t *at)
     if (at->flags[AC_ACCESS_FETCH] && at->flags[AC_PDE_NX])
 	at->expected_fault = 1;
 
-    if (at->expected_fault)
+    if (!at->flags[AC_PDE_ACCESSED])
+        at->ignore_pde = PT_ACCESSED_MASK;
+
+    if (!pde_valid)
         goto fault;
 
-    at->expected_pde |= PT_ACCESSED_MASK;
+    if (!at->expected_fault)
+        at->expected_pde |= PT_ACCESSED_MASK;
 
     if (at->flags[AC_PDE_PSE]) {
-	if (at->flags[AC_ACCESS_WRITE])
+	if (at->flags[AC_ACCESS_WRITE] && !at->expected_fault)
 	    at->expected_pde |= PT_DIRTY_MASK;
 	goto no_pte;
     }
@@ -455,7 +461,8 @@ void ac_test_setup_pte(ac_test_t *at)
 
 no_pte:
 fault:
-    ;
+    if (!at->expected_fault)
+        at->ignore_pde = 0;
 }
 
 static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond,
@@ -484,6 +491,13 @@ static void ac_test_check(ac_test_t *at, _Bool *success_ret, _Bool cond,
     printf("FAIL: %s\n", buf);
 }
 
+static int pt_match(pt_element_t pte1, pt_element_t pte2, pt_element_t ignore)
+{
+    pte1 &= ~ignore;
+    pte2 &= ~ignore;
+    return pte1 == pte2;
+}
+
 int ac_test_do_access(ac_test_t *at)
 {
 static unsigned unique = 42;
@@ -571,7 +585,8 @@ int ac_test_do_access(ac_test_t *at)
                   "error code %x expected %x", e, at->expected_error);
     ac_test_check(at, &success, at->ptep && *at->ptep != at->expected_pte,
                   "pte %x expected %x", *at->ptep, at->expected_pte);
-    ac_test_check(at, &success, *at->pdep != at->expected_pde,
+    ac_test_check(at, &success,
+                  !pt_match(*at->pdep, at->expected_pde, at->ignore_pde),
                   "pde %x expected %x", *at->pdep, at->expected_pde);
 
     if (success && verbose) {
--
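A self-contained version of the masked comparison, using the accessed bit as the ignored mask (the bit position is assumed from the standard x86 PTE layout, and the typedef stands in for the test's own):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pt_element_t;

#define PT_ACCESSED_MASK (1ull << 5)   /* x86 PTE/PDE accessed bit */

/* Compare two page-table entries while ignoring a set of bits the CPU
 * may legitimately leave in either state, as the patch's pt_match() does. */
static int pt_match(pt_element_t pte1, pt_element_t pte2, pt_element_t ignore)
{
    pte1 &= ~ignore;
    pte2 &= ~ignore;
    return pte1 == pte2;
}
```

With `ignore = PT_ACCESSED_MASK`, a PDE that differs from the expectation only in the accessed bit still matches, which is precisely the EPT behaviour the commit message allows for.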


[COMMIT master] Add test for ljmp.

2010-05-04 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Test that ljmp with operand in IO memory works.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/emulator.c b/kvm/user/test/x86/emulator.c
index db84c13..4967d1f 100644
--- a/kvm/user/test/x86/emulator.c
+++ b/kvm/user/test/x86/emulator.c
@@ -183,6 +183,19 @@ void test_pop(void *mem)
report(ret, 1);
 }
 
+void test_ljmp(void *mem)
+{
+unsigned char *m = mem;
+volatile int res = 1;
+
+    *(unsigned long **)m = &&jmpf;
+    asm volatile ("data16/mov %%cs, %0" : "=m"(*(m + sizeof(unsigned long))));
+    asm volatile ("rex64/ljmp *%0" : : "m"(*m));
+    res = 0;
+jmpf:
+    report("ljmp", res);
+}
+
 unsigned long read_cr0(void)
 {
unsigned long cr0;
@@ -258,6 +271,7 @@ int main()
 
test_smsw();
test_lmsw();
+test_ljmp(mem);
 
	printf("\nSUMMARY: %d tests, %d failures\n", tests, fails);
return fails ? 1 : 0;
--


[COMMIT master] qemu-kvm: fix crash on reboot with vhost-net

2010-05-04 Thread Avi Kivity
From: Michael S. Tsirkin m...@redhat.com

When vhost-net is disabled on reboot, we set msix mask notifier
to NULL to disable further mask/unmask notifications.
Code currently tries to pass this NULL to notifier,
leading to a crash.  The right thing to do is
to add explicit APIs to enable/disable notifications.
Now when disabling notifications:
- if vector is masked, we don't need to notify backend,
  just disable future notifications
- if vector is unmasked, invoke callback to unassign backend,
  then disable future notifications

This patch also polls notifier before closing it,
to make sure we don't lose events if poll callback
didn't have time to run.
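The two disable cases can be sketched as follows (illustrative names only — this is not qemu's actual msix API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*mask_notifier_fn)(void *opaque, bool masked);

static int notify_calls;                 /* counts backend callbacks */

static int demo_notifier(void *opaque, bool masked)
{
    (void)opaque; (void)masked;
    ++notify_calls;
    return 0;
}

/* Disable notifications for one vector: if it is already masked the
 * backend was never armed, so just clear the slot; if it is unmasked,
 * invoke the callback to unassign the backend first. */
static int unset_mask_notifier(mask_notifier_fn notify, void **opaque_slot,
                               bool vector_masked)
{
    int r = 0;

    if (!vector_masked) {
        r = notify(*opaque_slot, vector_masked);
        if (r < 0)
            return r;
    }
    *opaque_slot = NULL;                 /* no further notifications */
    return r;
}
```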

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/hw/msix.c b/hw/msix.c
index 3ec8805..8f9a621 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -609,14 +609,44 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 {
+    int r;
+    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+        return 0;
+
+    assert(dev->msix_mask_notifier);
+    assert(opaque);
+    assert(!dev->msix_mask_notifier_opaque[vector]);
+
+    if (msix_is_masked(dev, vector)) {
+        return 0;
+    }
+    r = dev->msix_mask_notifier(dev, vector, opaque,
+                                msix_is_masked(dev, vector));
+    if (r < 0) {
+        return r;
+    }
+    dev->msix_mask_notifier_opaque[vector] = opaque;
+    return r;
+}
+
+int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
+{
 int r = 0;
     if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
         return 0;
 
-    if (dev->msix_mask_notifier)
-        r = dev->msix_mask_notifier(dev, vector, opaque,
-                                    msix_is_masked(dev, vector));
-    if (r >= 0)
-        dev->msix_mask_notifier_opaque[vector] = opaque;
+    assert(dev->msix_mask_notifier);
+    assert(dev->msix_mask_notifier_opaque[vector]);
+
+    if (msix_is_masked(dev, vector)) {
+        return 0;
+    }
+    r = dev->msix_mask_notifier(dev, vector,
+                                dev->msix_mask_notifier_opaque[vector],
+                                msix_is_masked(dev, vector));
+    if (r < 0) {
+        return r;
+    }
+    dev->msix_mask_notifier_opaque[vector] = NULL;
 return r;
 }
diff --git a/hw/msix.h b/hw/msix.h
index f167231..6b21ffb 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -34,4 +34,5 @@ void msix_reset(PCIDevice *dev);
 extern int msix_supported;
 
 int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque);
+int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector);
 #endif
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 99a588c..c4bc633 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -462,10 +462,13 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign)
         msix_set_mask_notifier(proxy->pci_dev,
                                virtio_queue_vector(proxy->vdev, n), vq);
     } else {
-        msix_set_mask_notifier(proxy->pci_dev,
-                               virtio_queue_vector(proxy->vdev, n), NULL);
+        msix_unset_mask_notifier(proxy->pci_dev,
+                                 virtio_queue_vector(proxy->vdev, n));
         qemu_set_fd_handler(event_notifier_get_fd(notifier),
                             NULL, NULL, NULL);
+        /* Test and clear notifier before closing it,
+         * in case poll callback didn't have time to run. */
+        virtio_pci_guest_notifier_read(vq);
         event_notifier_cleanup(notifier);
     }
 
--


[COMMIT master] qemu-kvm tests cleanup

2010-05-04 Thread Avi Kivity
From: Naphtali Sprei nsp...@redhat.com

Mainly removes unused and unnecessary files, and references to them.

Signed-off-by: Naphtali Sprei nsp...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/README b/kvm/user/README
new file mode 100644
index 000..6a83831
--- /dev/null
+++ b/kvm/user/README
@@ -0,0 +1,23 @@
+This directory contains sources for a kvm test suite.
+
+Tests for the x86 architecture are run as kernel images under qemu versions that support the multiboot format.
+Tests use an infrastructure called from the BIOS code. The infrastructure initializes the system/CPUs,
+switches to long mode and calls the 'main' function of the individual test.
+Tests use qemu's virtual test device, named testdev, for services like printing, exiting, querying memory size, etc.
+See file testdev.txt for more details.
+
+To create the tests' images, just type 'make' in this directory.
+The tests' images are created in ./test/ARCH/*.flat
+
+An example of a test invocation:
+qemu-system-x86_64 -device testdev,chardev=testlog -chardev file,id=testlog,path=msr.out -kernel ./test/x86/msr.flat
+This invocation runs the msr test case. The test output is in file msr.out.
+
+
+
+Directory structure:
+.:  Makefile and config files for the tests
+./test/lib: general services for the tests
+./test/lib/ARCH: architecture dependent services for the tests
+./test/ARCH: the sources of the tests and the created objects/images
+
diff --git a/kvm/user/balloon_ctl.c b/kvm/user/balloon_ctl.c
deleted file mode 100755
index e65b08d..000
--- a/kvm/user/balloon_ctl.c
+++ /dev/null
@@ -1,92 +0,0 @@
-/*
- * This binary provides access to the guest's balloon driver
- * module.
- *
- * Copyright (C) 2007 Qumranet
- *
- * Author:
- *
- *  Dor Laor dor.l...@qumranet.com
- *
- * This work is licensed under the GNU LGPL license, version 2.
- */
-
-#include <unistd.h>
-#include <fcntl.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <sys/mman.h>
-#include <string.h>
-#include <errno.h>
-#include <sys/ioctl.h>
-
-#define __user
-#include <linux/kvm.h>
-
-#define PAGE_SIZE 4096ul
-
-
-static int balloon_op(int *fd, int bytes)
-{
-   struct kvm_balloon_op bop;
-int r;
-
-   bop.npages = bytes/PAGE_SIZE;
-   r = ioctl(*fd, KVM_BALLOON_OP, &bop);
-   if (r == -1)
-       return -errno;
-   printf("Balloon handled %d pages successfully\n", bop.npages);
-
-   return 0;
-}
-
-static int balloon_init(int *fd)
-{
-   *fd = open("/dev/kvm_balloon", O_RDWR);
-   if (*fd == -1) {
-       perror("open /dev/kvm_balloon");
-   return -1;
-   }
-
-   return 0;
-}
-
-int main(int argc, char *argv[])
-{
-   int fd;
-   int r;
-   int bytes;
-
-   if (argc != 3) {
-       perror("Please provide op=[i|d], bytes\n");
-   return 1;
-   }
-   bytes = atoi(argv[2]);
-
-   switch (*argv[1]) {
-   case 'i':
-   break;
-   case 'd':
-   bytes = -bytes;
-   break;
-   default:
-       perror("Wrong op param\n");
-   return 1;
-   }
-
-   if (balloon_init(&fd)) {
-       perror("balloon_init failed\n");
-   return 1;
-   }
-
-   if ((r = balloon_op(&fd, bytes))) {
-       perror("balloon_op failed\n");
-   goto out;
-   }
-
-out:
-   close(fd);
-
-   return r;
-}
-
diff --git a/kvm/user/bootstrap.lds b/kvm/user/bootstrap.lds
deleted file mode 100644
index fd0a4f8..000
--- a/kvm/user/bootstrap.lds
+++ /dev/null
@@ -1,15 +0,0 @@
-OUTPUT_FORMAT(binary)
-
-SECTIONS
-{
-. = 0;
-stext = .;
-.text : { *(.init) *(.text) }
-. = ALIGN(4K);
-.data : { *(.data) }
-. = ALIGN(16);
-.bss : { *(.bss) }
-. = ALIGN(4K);
-edata = .;
-}
-
diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak
index f3172fb..8e795f0 100644
--- a/kvm/user/config-x86-common.mak
+++ b/kvm/user/config-x86-common.mak
@@ -2,9 +2,7 @@
 
 CFLAGS += -I../include/x86
 
-all: kvmtrace test_cases
-
-balloon_ctl: balloon_ctl.o
+all: test_cases
 
 cflatobjs += \
test/lib/x86/io.o \
@@ -21,21 +19,17 @@ CFLAGS += -m$(bits)
 libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name)
 
 FLATLIBS = test/lib/libcflat.a $(libgcc)
-%.flat: %.o $(FLATLIBS)
+%.flat: %.o $(FLATLIBS) flat.lds
$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)
 
-tests-common = $(TEST_DIR)/bootstrap \
-   $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
-   $(TEST_DIR)/smptest.flat  $(TEST_DIR)/port80.flat \
-   $(TEST_DIR)/realmode.flat $(TEST_DIR)/msr.flat
+tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
+   $(TEST_DIR)/smptest.flat  $(TEST_DIR)/port80.flat \
+   $(TEST_DIR)/realmode.flat $(TEST_DIR)/msr.flat
 
 test_cases: $(tests-common) $(tests)
 
 $(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I test/lib -I 

[COMMIT master] qemu-kvm tests: add printing for passing tests

2010-05-04 Thread Avi Kivity
From: Naphtali Sprei nsp...@redhat.com

Signed-off-by: Naphtali Sprei nsp...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/realmode.c b/kvm/user/test/x86/realmode.c
index bfc2942..bc4ed97 100644
--- a/kvm/user/test/x86/realmode.c
+++ b/kvm/user/test/x86/realmode.c
@@ -160,6 +160,8 @@ void test_xchg(void)
 
if (!regs_equal(inregs, outregs, 0))
 		print_serial("xchg test 1: FAIL\n");
+	else
+		print_serial("xchg test 1: PASS\n");
 
exec_in_big_real_mode(inregs, outregs,
  insn_xchg_test2,
@@ -169,6 +171,8 @@ void test_xchg(void)
 outregs.eax != inregs.ebx ||
 outregs.ebx != inregs.eax)
 		print_serial("xchg test 2: FAIL\n");
+	else
+		print_serial("xchg test 2: PASS\n");
 
exec_in_big_real_mode(inregs, outregs,
  insn_xchg_test3,
@@ -178,6 +182,8 @@ void test_xchg(void)
 outregs.eax != inregs.ecx ||
 outregs.ecx != inregs.eax)
 		print_serial("xchg test 3: FAIL\n");
+	else
+		print_serial("xchg test 3: PASS\n");
 
exec_in_big_real_mode(inregs, outregs,
  insn_xchg_test4,
@@ -187,6 +193,8 @@ void test_xchg(void)
 outregs.eax != inregs.edx ||
 outregs.edx != inregs.eax)
 		print_serial("xchg test 4: FAIL\n");
+	else
+		print_serial("xchg test 4: PASS\n");
 
exec_in_big_real_mode(inregs, outregs,
  insn_xchg_test5,
@@ -196,6 +204,8 @@ void test_xchg(void)
 outregs.eax != inregs.esi ||
 outregs.esi != inregs.eax)
 		print_serial("xchg test 5: FAIL\n");
+	else
+		print_serial("xchg test 5: PASS\n");
 
exec_in_big_real_mode(inregs, outregs,
  insn_xchg_test6,
@@ -205,6 +215,8 @@ void test_xchg(void)
 outregs.eax != inregs.edi ||
 outregs.edi != inregs.eax)
 		print_serial("xchg test 6: FAIL\n");
+	else
+		print_serial("xchg test 6: PASS\n");
 
exec_in_big_real_mode(inregs, outregs,
  insn_xchg_test7,
@@ -214,6 +226,8 @@ void test_xchg(void)
 outregs.eax != inregs.ebp ||
 outregs.ebp != inregs.eax)
 		print_serial("xchg test 7: FAIL\n");
+	else
+		print_serial("xchg test 7: PASS\n");
 
exec_in_big_real_mode(inregs, outregs,
  insn_xchg_test8,
@@ -223,6 +237,8 @@ void test_xchg(void)
 outregs.eax != inregs.esp ||
 outregs.esp != inregs.eax)
 		print_serial("xchg test 8: FAIL\n");
+	else
+		print_serial("xchg test 8: PASS\n");
 }
 
 void test_shld(void)
@@ -234,9 +250,9 @@ void test_shld(void)
  insn_shld_test,
  insn_shld_test_end - insn_shld_test);
if (outregs.eax != 0xbeef)
-		print_serial("shld: failure\n");
+		print_serial("shld: FAIL\n");
 	else
-		print_serial("shld: success\n");
+		print_serial("shld: PASS\n");
 }
 
 void test_mov_imm(void)
@@ -253,6 +269,8 @@ void test_mov_imm(void)
  insn_mov_r16_imm_1_end - insn_mov_r16_imm_1);
if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 1234)
 		print_serial("mov test 1: FAIL\n");
+	else
+		print_serial("mov test 1: PASS\n");
 
/* test mov $imm, %eax */
exec_in_big_real_mode(inregs, outregs,
@@ -260,6 +278,8 @@ void test_mov_imm(void)
  insn_mov_r32_imm_1_end - insn_mov_r32_imm_1);
if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 1234567890)
 		print_serial("mov test 2: FAIL\n");
+	else
+		print_serial("mov test 2: PASS\n");
 
/* test mov $imm, %al/%ah */
exec_in_big_real_mode(inregs, outregs,
@@ -267,16 +287,24 @@ void test_mov_imm(void)
  insn_mov_r8_imm_1_end - insn_mov_r8_imm_1);
if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0x1200)
 		print_serial("mov test 3: FAIL\n");
+	else
+		print_serial("mov test 3: PASS\n");
+
exec_in_big_real_mode(inregs, outregs,
  insn_mov_r8_imm_2,
  insn_mov_r8_imm_2_end - insn_mov_r8_imm_2);
if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0x34)
 		print_serial("mov test 4: FAIL\n");
+	else
+		print_serial("mov test 4: PASS\n");
+
exec_in_big_real_mode(inregs, outregs,
  insn_mov_r8_imm_3,
  insn_mov_r8_imm_3_end - insn_mov_r8_imm_3);
if (!regs_equal(inregs, outregs, R_AX) || outregs.eax != 0x1234)
 

[COMMIT master] Merge branch 'upstream-merge'

2010-05-04 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

* upstream-merge: (65 commits)
  block: read-only: open cdrom as read-only when using monitor's change command
  fix whitespace bogon in some versions of make
  Changes to usb-linux to conform to coding style
  Add KVM CFLAGS to vhost build
  QMP: Introduce RESUME event
  virtio-9p: Create a syntactic shortcut for the file-system pass-thru
  virtio-9p: Add P9_TFLUSH support
  virtio-9p: Add P9_TREMOVE support.
  virtio-9p: Add P9_TWSTAT support
  virtio-9p: Add P9_TCREATE support
  virtio-9p: Add P9_TWRITE support
  virtio-9p: Add P9_TCLUNK support
  virtio-9p: Add P9_TREAD support
  virtio-9p: Add P9_TOPEN support.
  virtio-9p: Add P9_TWALK support
  virtio-9p: Add P9_TSTAT support
  virtio-9p: Add P9_TATTACH support.
  virtio-9p: Add P9_TVERSION support
  virtio-9p: Add sg helper functions
  virtio-9p: Add stat and mode related helper functions.
  ...

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
--


[COMMIT master] Merge branch 'upstream-merge'

2010-05-04 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

* upstream-merge: (243 commits)
  virtio-serial: Implement flow control for individual ports
  virtio-serial: Discard data that guest sends us when ports aren't connected
  virtio-serial: Apps should consume all data that guest sends out / Fix virtio api abuse
  virtio-serial: Handle scatter/gather input from the guest
  virtio-serial: Handle scatter-gather buffers for control messages
  iov: Add iov_to_buf and iov_size helpers
  iov: Introduce a new file for helpers around iovs, add iov_from_buf()
  virtio-serial: Send out guest data to ports only if port is opened
  virtio-serial: Propagate errors in initialising ports / devices in guest
  virtio-serial: Update copyright year to 2010
  virtio-serial: Remove redundant check for 0-sized write request
  virtio-serial: whitespace: match surrounding code
  virtio-serial: Use control messages to notify guest of new ports
  virtio-serial: save/load: Send target host connection status if different
  virtio-serial: save/load: Ensure we have hot-plugged ports instantiated
  virtio-serial: save/load: Ensure nr_ports on src and dest are same.
  virtio-serial: save/load: Ensure target has enough ports
  microblaze: fix custom fprintf
  Implement cpu_get_real_ticks for Alpha.
  target-alpha: Implement RPCC.
  ...

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
--


[COMMIT master] qemu-kvm: emulator tests: fix msr test

2010-05-04 Thread Avi Kivity
From: Naphtali Sprei nsp...@redhat.com

use correct 64-bit mode inline assembly constraints
use a canonical-form address when writing to the MSR_KERNEL_GS_BASE MSR

Signed-off-by: Naphtali Sprei nsp...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/test/x86/msr.c b/kvm/user/test/x86/msr.c
index 92102fa..0d6f286 100644
--- a/kvm/user/test/x86/msr.c
+++ b/kvm/user/test/x86/msr.c
@@ -17,23 +17,25 @@ static void report(const char *name, int passed)
 
 static void wrmsr(unsigned index, unsigned long long value)
 {
-	asm volatile ("wrmsr" : : "c"(index), "A"(value));
+	unsigned a = value, d = value >> 32;
+
+	asm volatile("wrmsr" : : "a"(a), "d"(d), "c"(index));
 }
 
 static unsigned long long rdmsr(unsigned index)
 {
-	unsigned long long value;
-
-	asm volatile ("rdmsr" : "=A"(value) : "c"(index));
+	unsigned a, d;
 
-	return value;
+	asm volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(index));
+	return ((unsigned long long)d << 32) | a;
 }
+
 #endif
 
 static void test_kernel_gs_base(void)
 {
 #ifdef __x86_64__
-   unsigned long long v1 = 0x123456789abcdef, v2;
+   unsigned long long v1 = 0x123456789abc, v2;
 
wrmsr(MSR_KERNEL_GS_BASE, v1);
v2 = rdmsr(MSR_KERNEL_GS_BASE);
--
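The constraint fix boils down to the EAX/EDX split for 64-bit MSR values: the "A" constraint only ties a value to the EDX:EAX pair reliably in 32-bit mode, so on x86-64 the halves must be passed explicitly. A standalone sketch of the split and recombination (helper names are invented for the example; the shrunken test value 0x123456789abc also avoids the non-canonical address 0x123456789abcdef):

```c
#include <assert.h>
#include <stdint.h>

/* MSR values travel as two 32-bit halves: EAX carries bits 0..31 and
 * EDX carries bits 32..63, matching the "a"/"d" asm constraints above. */
static void msr_split(uint64_t v, uint32_t *eax, uint32_t *edx)
{
    *eax = (uint32_t)v;
    *edx = (uint32_t)(v >> 32);
}

static uint64_t msr_join(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}
```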


[COMMIT master] kvm test: Fix i386 crossbuild

2010-05-04 Thread Avi Kivity
From: Jan Kiszka jan.kis...@siemens.com

This fixes 'make ARCH=i386' builds of the KVM micro tests on x86-64 hosts.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak
index 63cca42..f3172fb 100644
--- a/kvm/user/config-x86-common.mak
+++ b/kvm/user/config-x86-common.mak
@@ -18,6 +18,8 @@ $(libcflat): CFLAGS += -ffreestanding -I test/lib
 
 CFLAGS += -m$(bits)
 
+libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name)
+
 FLATLIBS = test/lib/libcflat.a $(libgcc)
 %.flat: %.o $(FLATLIBS)
$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)
@@ -32,7 +34,7 @@ test_cases: $(tests-common) $(tests)
 $(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I test/lib -I test/lib/x86
  
 $(TEST_DIR)/bootstrap: $(TEST_DIR)/bootstrap.o
-   $(CC) -nostdlib -o $@ -Wl,-T,bootstrap.lds $^
+   $(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,bootstrap.lds $^
  
 $(TEST_DIR)/access.flat: $(cstart.o) $(TEST_DIR)/access.o $(TEST_DIR)/print.o
  
--


Re: [PATCH 1/2] x86: eliminate TS_XSAVE

2010-05-04 Thread Avi Kivity

On 05/04/2010 12:45 AM, H. Peter Anvin wrote:



I was trying to avoid a performance regression relative to the current
code, as it appears that some care was taken to avoid the memory reference.

I agree that it's probably negligible compared to the save/restore
code.  If the x86 maintainers agree as well, I'll replace it with
cpu_has_xsave.

 

I asked Suresh to comment on this, since he wrote the original code.  He
did confirm that the intent was to avoid a global memory reference.

   


Ok, so you're happy with the patch as is?

--
error compiling committee.c: too many arguments to function

--


2.6.33.3: possible recursive locking detected

2010-05-04 Thread CaT
I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo
on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3.
qemu-kvm version 0.12.3. When doing:

echo noop > /sys/block/vdd/queue/scheduler

I got:

[ 1424.438241] =
[ 1424.439588] [ INFO: possible recursive locking detected ]
[ 1424.440368] 2.6.33.3-moocow.20100429-142641 #2
[ 1424.440960] -
[ 1424.440960] bash/2186 is trying to acquire lock:
[ 1424.440960]  (s_active){.+}, at: [811046b8] 
sysfs_remove_dir+0x75/0x88
[ 1424.440960] 
[ 1424.440960] but task is already holding lock:
[ 1424.440960]  (s_active){.+}, at: [81104849] 
sysfs_get_active_two+0x1f/0x46
[ 1424.440960] 
[ 1424.440960] other info that might help us debug this:
[ 1424.440960] 4 locks held by bash/2186:
[ 1424.440960]  #0:  (buffer-mutex){+.+.+.}, at: [8110317f] 
sysfs_write_file+0x39/0x126
[ 1424.440960]  #1:  (s_active){.+}, at: [81104849] 
sysfs_get_active_two+0x1f/0x46
[ 1424.440960]  #2:  (s_active){.+}, at: [81104856] 
sysfs_get_active_two+0x2c/0x46
[ 1424.440960]  #3:  (q-sysfs_lock){+.+.+.}, at: [8119c3f0] 
queue_attr_store+0x44/0x85
[ 1424.440960] 
[ 1424.440960] stack backtrace:
[ 1424.440960] Pid: 2186, comm: bash Not tainted 
2.6.33.3-moocow.20100429-142641 #2
[ 1424.440960] Call Trace:
[ 1424.440960]  [8105e775] __lock_acquire+0xf9f/0x178e
[ 1424.440960]  [8100d3ec] ? save_stack_trace+0x2a/0x48
[ 1424.440960]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
[ 1424.440960]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
[ 1424.440960]  [8105cb56] ? trace_hardirqs_on+0xd/0xf
[ 1424.440960]  [8105f02e] lock_acquire+0xca/0xef
[ 1424.440960]  [811046b8] ? sysfs_remove_dir+0x75/0x88
[ 1424.440960]  [8110458d] sysfs_addrm_finish+0xc8/0x13a
[ 1424.440960]  [811046b8] ? sysfs_remove_dir+0x75/0x88
[ 1424.440960]  [8105cb25] ? trace_hardirqs_on_caller+0x110/0x134
[ 1424.440960]  [811046b8] sysfs_remove_dir+0x75/0x88
[ 1424.440960]  [811ab312] kobject_del+0x16/0x37
[ 1424.440960]  [81195489] elv_iosched_store+0x10a/0x214
[ 1424.440960]  [8119c416] queue_attr_store+0x6a/0x85
[ 1424.440960]  [81103237] sysfs_write_file+0xf1/0x126
[ 1424.440960]  [810b747f] vfs_write+0xae/0x14a
[ 1424.440960]  [810b75df] sys_write+0x47/0x6e
[ 1424.440960]  [81002202] system_call_fastpath+0x16/0x1b

Original scheduler was cfq.

Having rebooted and defaulted to noop I tried

echo noop > /sys/block/vdd/queue/scheduler

and got:

[  311.294464] =
[  311.295820] [ INFO: possible recursive locking detected ]
[  311.296603] 2.6.33.3-moocow.20100429-142641 #2
[  311.296833] -
[  311.296833] bash/2190 is trying to acquire lock:
[  311.296833]  (s_active){.+}, at: [81104630] 
remove_dir+0x31/0x39
[  311.296833] 
[  311.296833] but task is already holding lock:
[  311.296833]  (s_active){.+}, at: [81104849] 
sysfs_get_active_two+0x1f/0x46
[  311.296833] 
[  311.296833] other info that might help us debug this:
[  311.296833] 4 locks held by bash/2190:
[  311.296833]  #0:  (buffer-mutex){+.+.+.}, at: [8110317f] 
sysfs_write_file+0x39/0x126
[  311.296833]  #1:  (s_active){.+}, at: [81104849] 
sysfs_get_active_two+0x1f/0x46
[  311.296833]  #2:  (s_active){.+}, at: [81104856] 
sysfs_get_active_two+0x2c/0x46
[  311.296833]  #3:  (q-sysfs_lock){+.+.+.}, at: [8119c3f0] 
queue_attr_store+0x44/0x85
[  311.296833] 
[  311.296833] stack backtrace:
[  311.296833] Pid: 2190, comm: bash Not tainted 
2.6.33.3-moocow.20100429-142641 #2
[  311.296833] Call Trace:
[  311.296833]  [8105e775] __lock_acquire+0xf9f/0x178e
[  311.296833]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
[  311.296833]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
[  311.296833]  [8105cb56] ? trace_hardirqs_on+0xd/0xf
[  311.296833]  [8105f02e] lock_acquire+0xca/0xef
[  311.296833]  [81104630] ? remove_dir+0x31/0x39
[  311.296833]  [8110458d] sysfs_addrm_finish+0xc8/0x13a
[  311.296833]  [81104630] ? remove_dir+0x31/0x39
[  311.296833]  [8105cb25] ? trace_hardirqs_on_caller+0x110/0x134
[  311.296833]  [81104630] remove_dir+0x31/0x39
[  311.296833]  [811046c0] sysfs_remove_dir+0x7d/0x88
[  311.296833]  [811ab312] kobject_del+0x16/0x37
[  311.296833]  [81195489] elv_iosched_store+0x10a/0x214
[  311.296833]  [8119c416] queue_attr_store+0x6a/0x85
[  311.296833]  [81103237] sysfs_write_file+0xf1/0x126
[  311.296833]  [810b747f] vfs_write+0xae/0x14a
[  311.296833]  [810b75df] sys_write+0x47/0x6e
[  311.296833]  [81002202] system_call_fastpath+0x16/0x1b

Changing back to noop 

Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu

2010-05-04 Thread Avi Kivity

On 05/03/2010 07:32 PM, Joerg Roedel wrote:

On Tue, Apr 27, 2010 at 03:58:36PM +0300, Avi Kivity wrote:
   

So we probably need to upgrade gva_t to a u64.  Please send this as
a separate patch, and test on i386 hosts.
 

Are there _any_ regular tests of KVM on i386 hosts? For me this is
terribly broken (also after I fixed the issue which gave me a
VMEXIT_INVALID at the first vmrun).

   


No, apart from the poor users.  I'll try to set something up using nsvm.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Avi Kivity

On 05/04/2010 07:38 AM, Rusty Russell wrote:

On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:
   

I took a stab at documenting CMD and FLUSH request types in virtio
block.  Christoph, could you look over this please?

I note that the interface seems full of warts to me,
this might be a first step to cleaning them.
 

ISTR Christoph had withdrawn some patches in this area, and was waiting
for him to resubmit?

I've given up on figuring out the block device.  What seem to me to be sane
semantics along the lines of memory barriers are foreign to disk people: they
want (and depend on) flushing everywhere.

For example, tdb transactions do not require a flush, they only require what
I would call a barrier: that prior data be written out before any future data.
Surely that would be more efficient in general than a flush!  In fact, TDB
wants only writes to *that file* (and metadata) written out first; it has no
ordering issues with other I/O on the same device.
   


I think that's SCSI ordered tags.


A generic I/O interface would allow you to specify "this request depends on
these outstanding requests" and leave it at that.  It might have some sync flush
command for dumb applications and OSes.  The userspace API might not be as
precise and only allow such a barrier against all prior writes on this fd.
   


Depends on all previous requests, and will commit before all following 
requests, i.e. a full barrier.



ISTR someone mentioning a desire for such an API years ago, so CC'ing the
usual I/O suspects...
   


I'd love to see TCQ exposed to user space.

--
error compiling committee.c: too many arguments to function
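Rusty's tdb example can be restated in userspace terms: the only portable ordering primitive applications have today is fsync(), which is a flush, so code that merely needs "journal before data" ordering also pays for full durability at that point. A minimal Python sketch of the pattern (file names are hypothetical):

```python
import os
import tempfile

def journaled_update(path, record):
    # Write-ahead pattern: the journal record must be stable before the
    # data file is touched.  With only a "flush" primitive available,
    # ordering is bought with a full fsync() between the two writes.
    with open(path + ".journal", "ab") as j:
        j.write(record)
        j.flush()
        os.fsync(j.fileno())   # flush-as-barrier: also waits for durability
    with open(path, "ab") as f:
        f.write(record)

tmpdir = tempfile.mkdtemp()
dbfile = os.path.join(tmpdir, "tdb")
journaled_update(dbfile, b"rec1")
print(open(dbfile, "rb").read())   # b'rec1'
```

A true barrier primitive would let journaled_update() return after queueing the ordered writes, without waiting for the journal record to actually reach the platter.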



Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath - Assertion

2010-05-04 Thread Kevin Wolf
On 03.05.2010 23:26, Peter Lieven wrote:
 Hi Qemu/KVM Devel Team,
 
 i'm using qemu-kvm 0.12.3 with latest Kernel 2.6.33.3.
 As backend we use open-iSCSI with dm-multipath.
 
 Multipath is configured to queue i/o if no path is available.
 
 If we create a failure on all paths, qemu starts to consume 100%
 CPU due to i/o waits which is ok so far.
 
 1 odd thing: The Monitor Interface is not responding any more ...
 
 What is a real blocker is that KVM crashes with:
 kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if: 
 Assertion `bmdma->unit != (uint8_t)-1' failed.
 
 after the multipath has reestablished at least one path.

Can you get a stack backtrace with gdb?

 Any ideas? I remember this was working with earlier kernel/kvm/qemu 
 versions.

If it works in the same setup with an older qemu version, bisecting
might help.

Kevin


Re: 2.6.33.3: possible recursive locking detected

2010-05-04 Thread Avi Kivity

On 05/04/2010 10:03 AM, CaT wrote:

I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo
on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3.
qemu-kvm version 0.12.3.


Doesn't appear to be related to kvm.  Copying lkml.


When doing:

echo noop > /sys/block/vdd/queue/scheduler

I got:

[ 1424.438241] =
[ 1424.439588] [ INFO: possible recursive locking detected ]
[ 1424.440368] 2.6.33.3-moocow.20100429-142641 #2
[ 1424.440960] -
[ 1424.440960] bash/2186 is trying to acquire lock:
[ 1424.440960]  (s_active){.+}, at: [811046b8] 
sysfs_remove_dir+0x75/0x88
[ 1424.440960]
[ 1424.440960] but task is already holding lock:
[ 1424.440960]  (s_active){.+}, at: [81104849] 
sysfs_get_active_two+0x1f/0x46
[ 1424.440960]
[ 1424.440960] other info that might help us debug this:
[ 1424.440960] 4 locks held by bash/2186:
[ 1424.440960]  #0:  (&buffer->mutex){+.+.+.}, at: [8110317f] 
sysfs_write_file+0x39/0x126
[ 1424.440960]  #1:  (s_active){.+}, at: [81104849] 
sysfs_get_active_two+0x1f/0x46
[ 1424.440960]  #2:  (s_active){.+}, at: [81104856] 
sysfs_get_active_two+0x2c/0x46
[ 1424.440960]  #3:  (&q->sysfs_lock){+.+.+.}, at: [8119c3f0] 
queue_attr_store+0x44/0x85
[ 1424.440960]
[ 1424.440960] stack backtrace:
[ 1424.440960] Pid: 2186, comm: bash Not tainted 
2.6.33.3-moocow.20100429-142641 #2
[ 1424.440960] Call Trace:
[ 1424.440960]  [8105e775] __lock_acquire+0xf9f/0x178e
[ 1424.440960]  [8100d3ec] ? save_stack_trace+0x2a/0x48
[ 1424.440960]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
[ 1424.440960]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
[ 1424.440960]  [8105cb56] ? trace_hardirqs_on+0xd/0xf
[ 1424.440960]  [8105f02e] lock_acquire+0xca/0xef
[ 1424.440960]  [811046b8] ? sysfs_remove_dir+0x75/0x88
[ 1424.440960]  [8110458d] sysfs_addrm_finish+0xc8/0x13a
[ 1424.440960]  [811046b8] ? sysfs_remove_dir+0x75/0x88
[ 1424.440960]  [8105cb25] ? trace_hardirqs_on_caller+0x110/0x134
[ 1424.440960]  [811046b8] sysfs_remove_dir+0x75/0x88
[ 1424.440960]  [811ab312] kobject_del+0x16/0x37
[ 1424.440960]  [81195489] elv_iosched_store+0x10a/0x214
[ 1424.440960]  [8119c416] queue_attr_store+0x6a/0x85
[ 1424.440960]  [81103237] sysfs_write_file+0xf1/0x126
[ 1424.440960]  [810b747f] vfs_write+0xae/0x14a
[ 1424.440960]  [810b75df] sys_write+0x47/0x6e
[ 1424.440960]  [81002202] system_call_fastpath+0x16/0x1b

Original scheduler was cfq.

Having rebooted and defaulted to noop I tried

echo noop > /sys/block/vdd/queue/scheduler

and got:

[  311.294464] =
[  311.295820] [ INFO: possible recursive locking detected ]
[  311.296603] 2.6.33.3-moocow.20100429-142641 #2
[  311.296833] -
[  311.296833] bash/2190 is trying to acquire lock:
[  311.296833]  (s_active){.+}, at: [81104630] 
remove_dir+0x31/0x39
[  311.296833]
[  311.296833] but task is already holding lock:
[  311.296833]  (s_active){.+}, at: [81104849] 
sysfs_get_active_two+0x1f/0x46
[  311.296833]
[  311.296833] other info that might help us debug this:
[  311.296833] 4 locks held by bash/2190:
[  311.296833]  #0:  (&buffer->mutex){+.+.+.}, at: [8110317f] 
sysfs_write_file+0x39/0x126
[  311.296833]  #1:  (s_active){.+}, at: [81104849] 
sysfs_get_active_two+0x1f/0x46
[  311.296833]  #2:  (s_active){.+}, at: [81104856] 
sysfs_get_active_two+0x2c/0x46
[  311.296833]  #3:  (&q->sysfs_lock){+.+.+.}, at: [8119c3f0] 
queue_attr_store+0x44/0x85
[  311.296833]
[  311.296833] stack backtrace:
[  311.296833] Pid: 2190, comm: bash Not tainted 
2.6.33.3-moocow.20100429-142641 #2
[  311.296833] Call Trace:
[  311.296833]  [8105e775] __lock_acquire+0xf9f/0x178e
[  311.296833]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
[  311.296833]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
[  311.296833]  [8105cb56] ? trace_hardirqs_on+0xd/0xf
[  311.296833]  [8105f02e] lock_acquire+0xca/0xef
[  311.296833]  [81104630] ? remove_dir+0x31/0x39
[  311.296833]  [8110458d] sysfs_addrm_finish+0xc8/0x13a
[  311.296833]  [81104630] ? remove_dir+0x31/0x39
[  311.296833]  [8105cb25] ? trace_hardirqs_on_caller+0x110/0x134
[  311.296833]  [81104630] remove_dir+0x31/0x39
[  311.296833]  [811046c0] sysfs_remove_dir+0x7d/0x88
[  311.296833]  [811ab312] kobject_del+0x16/0x37
[  311.296833]  [81195489] elv_iosched_store+0x10a/0x214
[  311.296833]  [8119c416] queue_attr_store+0x6a/0x85
[  311.296833]  [81103237] sysfs_write_file+0xf1/0x126
[  311.296833]  [810b747f] vfs_write+0xae/0x14a
[  311.296833]  [810b75df] sys_write+0x47/0x6e
[  
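The report above is lockdep's recursive-lock check: the write to the scheduler attribute ends up in sysfs_remove_dir() trying to take the s_active lock class that sysfs_get_active_two() already holds. The pattern in miniature, with Python's non-reentrant threading.Lock standing in for the kernel lock class (names are illustrative only):

```python
import threading

s_active = threading.Lock()   # non-reentrant, like the sysfs s_active class

def elv_iosched_store():
    # sysfs_write_file() -> sysfs_get_active_two() already took s_active...
    with s_active:
        # ...and kobject_del() -> sysfs_remove_dir() tries to take the
        # same class again; a blocking acquire here would deadlock,
        # so probe non-blockingly to observe the conflict instead
        return s_active.acquire(blocking=False)

print(elv_iosched_store())   # False: the second acquisition cannot succeed
```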

Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Jens Axboe
On Tue, May 04 2010, Rusty Russell wrote:
 On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:
  I took a stab at documenting CMD and FLUSH request types in virtio
  block.  Christoph, could you look over this please?
  
  I note that the interface seems full of warts to me,
  this might be a first step to cleaning them.
 
 ISTR Christoph had withdrawn some patches in this area, and was waiting
 for him to resubmit?
 
 I've given up on figuring out the block device.  What seem to me to be sane
 semantics along the lines of memory barriers are foreign to disk people: they
 want (and depend on) flushing everywhere.
 
 For example, tdb transactions do not require a flush, they only require what
 I would call a barrier: that prior data be written out before any future data.
 Surely that would be more efficient in general than a flush!  In fact, TDB
 wants only writes to *that file* (and metadata) written out first; it has no
 ordering issues with other I/O on the same device.
 
 A generic I/O interface would allow you to specify "this request depends on
 these outstanding requests" and leave it at that.  It might have some sync flush
 command for dumb applications and OSes.  The userspace API might not be as
 precise and only allow such a barrier against all prior writes on this fd.
 
 ISTR someone mentioning a desire for such an API years ago, so CC'ing the
 usual I/O suspects...

It would be nice to have a more fuller API for this, but the reality is
that only the flush approach is really workable. Even just strict
ordering of requests could only be supported on SCSI, and even there the
kernel still lacks proper guarantees on error handling to prevent
reordering there.

-- 
Jens Axboe



Re: Booting/installing WindowsNT

2010-05-04 Thread Avi Kivity

On 05/03/2010 08:03 PM, Michael Tokarev wrote:

Michael, can you try to use -cpu host,-vme and see if that makes a
difference?



With -cpu host,-vme winNT boots just fine as with just -cpu host.

I also tried with -cpu qemu64 and kvm64, with +vme and -vme (4
combinations in total) - in all cases winNT crashes with the
same 0x003E error.  So it appears that vme makes no
difference.


Please try the model/vendor/family again.  I suggest using x86info on 
both to see what the differences are, using -cpu host with overrides to 
make it equivalent to qemu64 (and verifying it fails), then removing the 
overrides one by one until it works.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu

2010-05-04 Thread Roedel, Joerg
On Tue, May 04, 2010 at 03:53:57AM -0400, Avi Kivity wrote:
 On 05/03/2010 07:32 PM, Joerg Roedel wrote:
  On Tue, Apr 27, 2010 at 03:58:36PM +0300, Avi Kivity wrote:
 
  So we probably need to upgrade gva_t to a u64.  Please send this as
  a separate patch, and test on i386 hosts.
   
  Are there _any_ regular tests of KVM on i386 hosts? For me this is
  terribly broken (also after I fixed the issue which gave me a
  VMEXIT_INVALID at the first vmrun).
 
 
 
 No, apart from the poor users.  I'll try to set something up using nsvm.

Ok. I will post an initial fix for the VMEXIT_INVALID bug soon. Apart
from that I get a lockdep warning when I try to start a guest. The guest
actually boots if it is single-vcpu. SMP guests don't even boot through
the BIOS for me.

Joerg




Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu

2010-05-04 Thread Avi Kivity

On 05/04/2010 12:11 PM, Roedel, Joerg wrote:

On Tue, May 04, 2010 at 03:53:57AM -0400, Avi Kivity wrote:
   

On 05/03/2010 07:32 PM, Joerg Roedel wrote:
 

On Tue, Apr 27, 2010 at 03:58:36PM +0300, Avi Kivity wrote:

   

So we probably need to upgrade gva_t to a u64.  Please send this as
a separate patch, and test on i386 hosts.

 

Are there _any_ regular tests of KVM on i386 hosts? For me this is
terribly broken (also after I fixed the issue which gave me a
VMEXIT_INVALID at the first vmrun).


   

No, apart from the poor users.  I'll try to set something up using nsvm.
 

Ok. I will post an initial fix for the VMEXIT_INVALID bug soon. Apart
from that I get a lockdep warning when I try to start a guest. The guest
actually boots if it is single-vcpu. SMP guests don't even boot through
the BIOS for me.

   


Strange.  i386 vs x86_64 shouldn't have that much effect!

--
error compiling committee.c: too many arguments to function



[PATCH] qemu-kvm: Process exit requests in kvm loop

2010-05-04 Thread Jan Kiszka
This unbreaks the monitor quit command for qemu-kvm.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 qemu-kvm.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 91f0222..43d599d 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2047,6 +2047,9 @@ int kvm_main_loop(void)
 vm_stop(EXCP_DEBUG);
 kvm_debug_cpu_requested = NULL;
 }
+if (qemu_exit_requested()) {
+exit(0);
+}
 }
 
 pause_all_threads();
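The change is a one-line check inside the main loop; its effect can be modeled as an event loop that stops as soon as an exit request is observed (the names below are illustrative, not qemu's actual API):

```python
def kvm_main_loop(events):
    # Process events until an exit request is seen, mirroring the
    # qemu_exit_requested() check the patch adds inside kvm_main_loop():
    # without it, a monitor "quit" sets the flag but nobody acts on it.
    handled = []
    for ev in events:
        handled.append(ev)
        if ev == "quit":     # monitor "quit" command raises the exit request
            break
    return handled

print(kvm_main_loop(["io", "quit", "io"]))   # ['io', 'quit']
```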



Re: KVM: x86: properly update ready_for_interrupt_injection

2010-05-04 Thread Avi Kivity

On 05/04/2010 05:04 AM, Marcelo Tosatti wrote:

The recent changes to emulate string instructions without entering guest
mode exposed a bug where pending interrupts are not properly reflected
in ready_for_interrupt_injection.

The result is that userspace overwrites a previously queued interrupt,
when irqchip's are emulated in qemu.

   


Applied, thanks.


Fix by always updating state before returning to userspace.
   


Why are we even doing this if irqchip_in_kernel?

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

2010-05-04 Thread Avi Kivity

On 04/28/2010 12:50 PM, Takuya Yoshikawa wrote:

Hi Marcelo, Avi,

I updated the patch as follows.

Changelog:
  1. Inserted one r = -ENOMEM; line following Avi's advice.
  2. Minor change to the explanation about performance improvements.

I'm now testing and cleaning up my next patch series based on this,
so please apply this if this makes sense and has no problems.

Thanks,
   Takuya

===

Although we always allocate a new dirty bitmap in x86's get_dirty_log(),
it is only used as a zero-source of copy_to_user() and freed right after
that when memslot is clean. This patch uses clear_user() instead of doing
this unnecessary zero-source allocation.

Performance improvement: as we can expect easily, the time needed to
allocate a bitmap is completely reduced. In my test, the improved ioctl
was about 4 to 10 times faster than the original one for clean slots.
Furthermore, reducing memory allocations and copies will have good
effects on caches too.

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function
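The optimization models easily in userspace: rather than allocating a zero-filled source buffer and copying it out, zero the destination directly. A Python sketch of the two schemes (buffer sizes are arbitrary):

```python
def copy_from_zero_source(dst):
    # Original scheme: allocate a zeroed source bitmap, then copy it out.
    src = bytes(len(dst))      # zero-filled temporary allocation
    dst[:] = src

def clear_directly(dst):
    # clear_user()-style scheme: zero the destination in place,
    # with no temporary allocation at all.
    for i in range(len(dst)):
        dst[i] = 0

a = bytearray(b"\xff" * 8)
b = bytearray(b"\xff" * 8)
copy_from_zero_source(a)
clear_directly(b)
print(a == b == bytearray(8))   # True
```

Both end states are identical; only the second avoids the throwaway allocation, which is where the 4-10x speedup for clean slots comes from.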



Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu

2010-05-04 Thread Roedel, Joerg
On Tue, May 04, 2010 at 05:20:02AM -0400, Avi Kivity wrote:
 On 05/04/2010 12:11 PM, Roedel, Joerg wrote:
  On Tue, May 04, 2010 at 03:53:57AM -0400, Avi Kivity wrote:
 
  On 05/03/2010 07:32 PM, Joerg Roedel wrote:
   
  On Tue, Apr 27, 2010 at 03:58:36PM +0300, Avi Kivity wrote:
 
 
  So we probably need to upgrade gva_t to a u64.  Please send this as
  a separate patch, and test on i386 hosts.
 
   
  Are there _any_ regular tests of KVM on i386 hosts? For me this is
  terribly broken (also after I fixed the issue which gave me a
  VMEXIT_INVALID at the first vmrun).
 
 
 
  No, apart from the poor users.  I'll try to set something up using nsvm.
   
  Ok. I will post an initial fix for the VMEXIT_INVALID bug soon. Apart
  from that I get a lockdep warning when I try to start a guest. The guest
  actually boots if it is single-vcpu. SMP guests don't even boot through
  the BIOS for me.
 
 
 
 Strange.  i386 vs x86_64 shouldn't have that much effect!

This is the lockdep warning I get when I start booting a Linux kernel.
It is with the nested-npt patchset but the warning occurs without it too
(slightly different backtraces then).

[60390.953424] ===
[60390.954324] [ INFO: possible circular locking dependency detected ]
[60390.954324] 2.6.34-rc5 #7
[60390.954324] ---
[60390.954324] qemu-system-x86/2506 is trying to acquire lock:
[60390.954324]  (&mm->mmap_sem){++}, at: [c10ab0f4] might_fault+0x4c/0x86
[60390.954324] 
[60390.954324] but task is already holding lock:
[60390.954324]  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [f8ec1b50] 
spin_lock+0xd/0xf [kvm]
[60390.954324] 
[60390.954324] which lock already depends on the new lock.
[60390.954324] 
[60390.954324] 
[60390.954324] the existing dependency chain (in reverse order) is:
[60390.954324] 
[60390.954324] -> #1 (&(&kvm->mmu_lock)->rlock){+.+...}:
[60390.954324][c10575ad] __lock_acquire+0x9fa/0xb6c
[60390.954324][c10577b8] lock_acquire+0x99/0xb8
[60390.954324][c15afa2b] _raw_spin_lock+0x20/0x2f
[60390.954324][f8eafe19] spin_lock+0xd/0xf [kvm]
[60390.954324][f8eb104e] 
kvm_mmu_notifier_invalidate_range_start+0x2f/0x71 [kvm]
[60390.954324][c10bc994] 
__mmu_notifier_invalidate_range_start+0x31/0x57
[60390.954324][c10b1de3] mprotect_fixup+0x153/0x3d5
[60390.954324][c10b21ca] sys_mprotect+0x165/0x1db
[60390.954324][c10028cc] sysenter_do_call+0x12/0x32
[60390.954324] 
[60390.954324] -> #0 (&mm->mmap_sem){++}:
[60390.954324][c10574af] __lock_acquire+0x8fc/0xb6c
[60390.954324][c10577b8] lock_acquire+0x99/0xb8
[60390.954324][c10ab111] might_fault+0x69/0x86
[60390.954324][c11d5987] _copy_from_user+0x36/0x119
[60390.954324][f8eafcd9] copy_from_user+0xd/0xf [kvm]
[60390.954324][f8eb0ac0] kvm_read_guest_page+0x24/0x33 [kvm]
[60390.954324][f8ebb362] kvm_read_guest_page_mmu+0x55/0x63 [kvm]
[60390.954324][f8ebb397] kvm_read_nested_guest_page+0x27/0x2e [kvm]
[60390.954324][f8ebb3da] load_pdptrs+0x3c/0x9e [kvm]
[60390.954324][f84747ac] svm_cache_reg+0x25/0x2b [kvm_amd]
[60390.954324][f8ec7894] kvm_mmu_load+0xf1/0x1fa [kvm]
[60390.954324][f8ebbdfc] kvm_arch_vcpu_ioctl_run+0x252/0x9c7 [kvm]
[60390.954324][f8eb1fb5] kvm_vcpu_ioctl+0xee/0x432 [kvm]
[60390.954324][c10cf8e9] vfs_ioctl+0x2c/0x96
[60390.954324][c10cfe88] do_vfs_ioctl+0x491/0x4cf
[60390.954324][c10cff0c] sys_ioctl+0x46/0x66
[60390.954324][c10028cc] sysenter_do_call+0x12/0x32
[60390.954324] 
[60390.954324] other info that might help us debug this:
[60390.954324] 
[60390.954324] 3 locks held by qemu-system-x86/2506:
[60390.954324]  #0:  (&vcpu->mutex){+.+.+.}, at: [f8eb1185] 
vcpu_load+0x16/0x32 [kvm]
[60390.954324]  #1:  (&kvm->srcu){.+.+.+}, at: [f8eb952c] 
srcu_read_lock+0x0/0x33 [kvm]
[60390.954324]  #2:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [f8ec1b50] 
spin_lock+0xd/0xf [kvm]
[60390.954324] 
[60390.954324] stack backtrace:
[60390.954324] Pid: 2506, comm: qemu-system-x86 Not tainted 2.6.34-rc5 #7
[60390.954324] Call Trace:
[60390.954324]  [c15adf46] ? printk+0x14/0x16
[60390.954324]  [c1056877] print_circular_bug+0x8a/0x96
[60390.954324]  [c10574af] __lock_acquire+0x8fc/0xb6c
[60390.954324]  [f8ec1b50] ? spin_lock+0xd/0xf [kvm]
[60390.954324]  [c10ab0f4] ? might_fault+0x4c/0x86
[60390.954324]  [c10577b8] lock_acquire+0x99/0xb8
[60390.954324]  [c10ab0f4] ? might_fault+0x4c/0x86
[60390.954324]  [c10ab111] might_fault+0x69/0x86
[60390.954324]  [c10ab0f4] ? might_fault+0x4c/0x86
[60390.954324]  [c11d5987] _copy_from_user+0x36/0x119
[60390.954324]  [f8eafcd9] copy_from_user+0xd/0xf [kvm]
[60390.954324]  [f8eb0ac0] kvm_read_guest_page+0x24/0x33 [kvm]
[60390.954324]  [f8ebb362] kvm_read_guest_page_mmu+0x55/0x63 [kvm]
[60390.954324]  [f8ebb397] kvm_read_nested_guest_page+0x27/0x2e 

Re: apparent key mapping error for usb keyboard

2010-05-04 Thread Avi Kivity

On 04/27/2010 12:46 PM, Michael Tokarev wrote:

I've a debian bugreport that claims to have a fix
for apparently wrong keymap for usb keyboard.  I
noticed this before with ps/2 keyboard too, the
sympthoms were that e.g windows keys were not
working in guests, but later on that has been
fixed.  But with `-usbdevice keyboard', i.e. with
usb keyboard, it still does not work.  See
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578846
for details and for the proposed patch which
fixes the mentioned issue.  Here's the patch itself:

--- a/hw/usb-hid.c
+++ b/hw/usb-hid.c
@@ -399,3 +399,3 @@
 0x51, 0x4e, 0x49, 0x4c, 0x00, 0x00, 0x00, 0x00,
-0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0xe3, 0xe7, 0x65, 0x00, 0x00,
 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

I'm not sure if it's right fix however.  Hence I'm
asking for opinions here.  If it's a right way to go,
it should probably be applied to -stable too.


I've no idea, but the correct place to ask is qemu-devel (copied).

--
error compiling committee.c: too many arguments to function
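For context on the patch: the table maps PC scancodes to USB HID usage IDs, and the three nonzero values it adds are, per the HID Usage Tables keyboard/keypad page, 0xe3 (Left GUI, i.e. the left Windows key), 0xe7 (Right GUI) and 0x65 (Application/menu key) - exactly the keys reported broken. A quick lookup:

```python
# USB HID keyboard usage IDs for the three entries the patch fills in
# (values from the HID Usage Tables, keyboard/keypad usage page)
HID_USAGES = {
    0xe3: "Left GUI (left Windows key)",
    0xe7: "Right GUI (right Windows key)",
    0x65: "Application (menu key)",
}

for usage_id, name in sorted(HID_USAGES.items()):
    print(f"{usage_id:#04x}: {name}")
```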



Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu

2010-05-04 Thread Avi Kivity

On 05/04/2010 12:37 PM, Roedel, Joerg wrote:


This is the lockdep warning I get when I start booting a Linux kernel.
It is with the nested-npt patchset but the warning occurs without it too
(slightly different backtraces then).

[60390.953424] ===
[60390.954324] [ INFO: possible circular locking dependency detected ]
[60390.954324] 2.6.34-rc5 #7
[60390.954324] ---
[60390.954324] qemu-system-x86/2506 is trying to acquire lock:
[60390.954324]  (&mm->mmap_sem){++}, at: [c10ab0f4] might_fault+0x4c/0x86
[60390.954324]
[60390.954324] but task is already holding lock:
[60390.954324]  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [f8ec1b50] 
spin_lock+0xd/0xf [kvm]
[60390.954324]
[60390.954324] which lock already depends on the new lock.
[60390.954324]
[60390.954324]
[60390.954324] the existing dependency chain (in reverse order) is:
[60390.954324]
[60390.954324] ->  #1 (&(&kvm->mmu_lock)->rlock){+.+...}:
[60390.954324][c10575ad] __lock_acquire+0x9fa/0xb6c
[60390.954324][c10577b8] lock_acquire+0x99/0xb8
[60390.954324][c15afa2b] _raw_spin_lock+0x20/0x2f
[60390.954324][f8eafe19] spin_lock+0xd/0xf [kvm]
[60390.954324][f8eb104e] 
kvm_mmu_notifier_invalidate_range_start+0x2f/0x71 [kvm]
[60390.954324][c10bc994] 
__mmu_notifier_invalidate_range_start+0x31/0x57
[60390.954324][c10b1de3] mprotect_fixup+0x153/0x3d5
[60390.954324][c10b21ca] sys_mprotect+0x165/0x1db
[60390.954324][c10028cc] sysenter_do_call+0x12/0x32
   


Unrelated.  This can take the lock and free it.  It only shows up 
because we do memory ops inside the mmu_lock, which is deeply forbidden 
(anything which touches user memory, including kmalloc(), can trigger 
mmu notifiers and recursive locking).



[60390.954324]
[60390.954324] ->  #0 (&mm->mmap_sem){++}:
[60390.954324][c10574af] __lock_acquire+0x8fc/0xb6c
[60390.954324][c10577b8] lock_acquire+0x99/0xb8
[60390.954324][c10ab111] might_fault+0x69/0x86
[60390.954324][c11d5987] _copy_from_user+0x36/0x119
[60390.954324][f8eafcd9] copy_from_user+0xd/0xf [kvm]
[60390.954324][f8eb0ac0] kvm_read_guest_page+0x24/0x33 [kvm]
[60390.954324][f8ebb362] kvm_read_guest_page_mmu+0x55/0x63 [kvm]
[60390.954324][f8ebb397] kvm_read_nested_guest_page+0x27/0x2e [kvm]
[60390.954324][f8ebb3da] load_pdptrs+0x3c/0x9e [kvm]
[60390.954324][f84747ac] svm_cache_reg+0x25/0x2b [kvm_amd]
[60390.954324][f8ec7894] kvm_mmu_load+0xf1/0x1fa [kvm]
[60390.954324][f8ebbdfc] kvm_arch_vcpu_ioctl_run+0x252/0x9c7 [kvm]
[60390.954324][f8eb1fb5] kvm_vcpu_ioctl+0xee/0x432 [kvm]
[60390.954324][c10cf8e9] vfs_ioctl+0x2c/0x96
[60390.954324][c10cfe88] do_vfs_ioctl+0x491/0x4cf
[60390.954324][c10cff0c] sys_ioctl+0x46/0x66
[60390.954324][c10028cc] sysenter_do_call+0x12/0x32
   



Just a silly bug.  kvm_pdptr_read() can cause a guest memory read on 
svm, in this case with the mmu lock taken.  I'll post something to fix it.



What makes me wonder about this is that the two lock traces seem to
belong to different threads.
   


Ever increasing complexity...

--
error compiling committee.c: too many arguments to function



Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu

2010-05-04 Thread Avi Kivity

On 05/04/2010 12:45 PM, Avi Kivity wrote:



Just a silly bug.  kvm_pdptr_read() can cause a guest memory read on 
svm, in this case with the mmu lock taken.  I'll post something to fix 
it.


I guess this was not reported because most svm machines have npt, and 
this requires npt=0 to trigger.  Nonpae paging disables npt, so you were 
hit.  Interestingly, nsvm makes it more likely to appear, since npt on 
i386+pae will need the pdptrs.


--
error compiling committee.c: too many arguments to function



[PATCH] KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots

2010-05-04 Thread Avi Kivity
On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.

Push the spinlock into mmu_alloc_roots(), and only take it after we've read
the pdptr.

Signed-off-by: Avi Kivity a...@redhat.com
---

Marcelo, dropping and re-acquiring the lock before mmu_sync_roots(), is fine,
yes?

 arch/x86/kvm/mmu.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 51eb6d6..de99638 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2065,11 +2065,13 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
direct = 1;
root_gfn = 0;
}
+   spin_lock(&vcpu->kvm->mmu_lock);
sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
  PT64_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
root = __pa(sp->spt);
++sp->root_count;
+   spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = root;
return 0;
}
@@ -2093,11 +2095,14 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
direct = 1;
root_gfn = i << 30;
}
+   spin_lock(&vcpu->kvm->mmu_lock);
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
  PT32_ROOT_LEVEL, direct,
  ACC_ALL, NULL);
root = __pa(sp->spt);
++sp->root_count;
+   spin_unlock(&vcpu->kvm->mmu_lock);
+
vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
}
vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
@@ -2466,7 +2471,9 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
goto out;
spin_lock(&vcpu->kvm->mmu_lock);
kvm_mmu_free_some_pages(vcpu);
+   spin_unlock(&vcpu->kvm->mmu_lock);
r = mmu_alloc_roots(vcpu);
+   spin_lock(&vcpu->kvm->mmu_lock);
mmu_sync_roots(vcpu);
spin_unlock(&vcpu->kvm->mmu_lock);
if (r)
-- 
1.7.0.4
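The shape of the fix - never hold a spinlock across an operation that may sleep; drop it, do the sleepy work, retake it - in a small Python model (threading.Lock stands in for mmu_lock, and a stub read stands in for the guest-memory access; all names are illustrative):

```python
import threading
import time

mmu_lock = threading.Lock()

def read_pdptr_may_sleep():
    # stands in for copy_from_user(), which may sleep on a page fault
    time.sleep(0)
    return 0xDEADBEEF

def mmu_load():
    with mmu_lock:
        pass                        # kvm_mmu_free_some_pages(): lock held, no sleeping
    pdptr = read_pdptr_may_sleep()  # lock dropped before the sleeping read
    with mmu_lock:                  # retaken for mmu_sync_roots()
        return pdptr

print(hex(mmu_load()))   # 0xdeadbeef
```

The open question Avi raises to Marcelo is exactly the cost of this shape: whatever invariants held between free_some_pages() and sync_roots() must survive the window where the lock is dropped.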



[PATCH] KVM: kvm_pdptr_read() may sleep

2010-05-04 Thread Avi Kivity
Annotate it thusly.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/kvm_cache_regs.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index cff851c..d2a98f8 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -36,6 +36,8 @@ static inline void kvm_rip_write(struct kvm_vcpu *vcpu, 
unsigned long val)
 
 static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
 {
+   might_sleep();  /* on svm */
+
if (!test_bit(VCPU_EXREG_PDPTR,
  (unsigned long *)&vcpu->arch.regs_avail))
kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
-- 
1.7.0.4
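might_sleep() turns "sleeping function called from atomic context" into a loud, immediate warning instead of a rare deadlock. A userspace analogy of the annotation (atomic_depth plays the role of preempt_count(); names are hypothetical):

```python
atomic_depth = 0                      # like the kernel's preempt_count()

def spin_lock():
    global atomic_depth
    atomic_depth += 1

def spin_unlock():
    global atomic_depth
    atomic_depth -= 1

def might_sleep():
    # fire the moment a sleeping function is reached in atomic context,
    # even on paths where it would not actually have slept this time
    assert atomic_depth == 0, "sleeping function called from atomic context"

def kvm_pdptr_read():
    might_sleep()                     # the annotation added by the patch
    return 0

kvm_pdptr_read()                      # fine: no lock held
spin_lock()
try:
    kvm_pdptr_read()                  # would have been the silent bug
except AssertionError as e:
    print("caught:", e)
finally:
    spin_unlock()
```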



Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Christoph Hellwig
On Tue, May 04, 2010 at 02:08:24PM +0930, Rusty Russell wrote:
 On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:
  I took a stab at documenting CMD and FLUSH request types in virtio
  block.  Christoph, could you look over this please?
  
  I note that the interface seems full of warts to me,
  this might be a first step to cleaning them.
 
 ISTR Christoph had withdrawn some patches in this area, and was waiting
 for him to resubmit?

Any patches I've withdrawn in this area are withdrawn for good.  But
what I really need to do is to review Michael's spec updates, sorry.

I'll get back to it today.



Re: [PATCH] KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots

2010-05-04 Thread Roedel, Joerg
On Tue, May 04, 2010 at 06:03:50AM -0400, Avi Kivity wrote:
 On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.
 
 Push the spinlock into mmu_alloc_roots(), and only take it after we've read
 the pdptr.

This fixes the lockdep issue for me. Thanks.

Tested-by: Joerg Roedel joerg.roe...@amd.com

 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
 
 Marcelo, dropping and re-acquiring the lock before mmu_sync_roots(), is fine,
 yes?
 
  arch/x86/kvm/mmu.c |7 +++
  1 files changed, 7 insertions(+), 0 deletions(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 51eb6d6..de99638 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -2065,11 +2065,13 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
   direct = 1;
   root_gfn = 0;
   }
 + spin_lock(&vcpu->kvm->mmu_lock);
   sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
 PT64_ROOT_LEVEL, direct,
 ACC_ALL, NULL);
   root = __pa(sp->spt);
   ++sp->root_count;
 + spin_unlock(&vcpu->kvm->mmu_lock);
   vcpu->arch.mmu.root_hpa = root;
   return 0;
   }
 @@ -2093,11 +2095,14 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
   direct = 1;
   root_gfn = i << 30;
   }
 + spin_lock(&vcpu->kvm->mmu_lock);
   sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
 PT32_ROOT_LEVEL, direct,
 ACC_ALL, NULL);
   root = __pa(sp->spt);
   ++sp->root_count;
 + spin_unlock(&vcpu->kvm->mmu_lock);
 +
   vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
   }
   vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
 @@ -2466,7 +2471,9 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
   goto out;
   spin_lock(&vcpu->kvm->mmu_lock);
   kvm_mmu_free_some_pages(vcpu);
 + spin_unlock(&vcpu->kvm->mmu_lock);
   r = mmu_alloc_roots(vcpu);
 + spin_lock(&vcpu->kvm->mmu_lock);
   mmu_sync_roots(vcpu);
   spin_unlock(&vcpu->kvm->mmu_lock);
   if (r)



[GIT PULL] amended: first round of vhost-net enhancements for net-next

2010-05-04 Thread Michael S. Tsirkin
David,
This is an amended pull request: I have rebased the tree to the
correct patches. This has been through basic tests and seems
to work fine here.

The following tree includes a couple of enhancements that help vhost-net.
Please pull them for net-next. Another set of patches is under
debugging/testing and I hope to get them ready in time for 2.6.35,
so there may be another pull request later.

Thanks!

The following changes since commit 7ef527377b88ff05fb122a47619ea506c631c914:

  Merge branch 'master' of 
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (2010-05-02 22:02:06 
-0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost

Michael S. Tsirkin (2):
  tun: add ioctl to modify vnet header size
  macvtap: add ioctl to modify vnet header size

 drivers/net/macvtap.c  |   27 +++
 drivers/net/tun.c  |   32 
 include/linux/if_tun.h |2 ++
 3 files changed, 53 insertions(+), 8 deletions(-)

-- 
MST


Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath - Assertion

2010-05-04 Thread Peter Lieven

hi kevin,

I set a breakpoint at bmdma_active_if. The first 2 breaks were encountered
when the last path in the multipath failed, but the assertion was not true.
When I kicked one path back in, the breakpoint was reached again, this
time leading to an assert.

The stack trace is from the point shortly before.

Hope this helps.

br,
peter
--

(gdb) b bmdma_active_if
Breakpoint 2 at 0x43f2e0: file 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h, line 507.

(gdb) c
Continuing.
[Switching to Thread 0x7f7b3300d950 (LWP 21171)]

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507

507assert(bmdma-unit != (uint8_t)-1);
(gdb) c
Continuing.

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507

507assert(bmdma-unit != (uint8_t)-1);
(gdb) c
Continuing.

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507

507assert(bmdma-unit != (uint8_t)-1);
(gdb) bt full
#0  bmdma_active_if (bmdma=0xe31fd8) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507

   __PRETTY_FUNCTION__ = bmdma_active_if
#1  0x0043f6ba in ide_read_dma_cb (opaque=0xe31fd8, ret=0) at 
/usr/src/qemu-kvm-0.12.3/hw/ide/core.c:554

   bm = (BMDMAState *) 0xe31fd8
   s = (IDEState *) 0xe17940
   n = 0
   sector_num = 0
#2  0x0058730c in dma_bdrv_cb (opaque=0xe17940, ret=0) at 
/usr/src/qemu-kvm-0.12.3/dma-helpers.c:94

   dbs = (DMAAIOCB *) 0xe17940
   cur_addr = 0
   cur_len = 0
   mem = (void *) 0x0
#3  0x0049e510 in qemu_laio_process_completion (s=0xe119c0, 
laiocb=0xe179c0) at linux-aio.c:68

   ret = 0
#4  0x0049e611 in qemu_laio_enqueue_completed (s=0xe119c0, 
laiocb=0xe179c0) at linux-aio.c:107

No locals.
#5  0x0049e787 in qemu_laio_completion_cb (opaque=0xe119c0) at 
linux-aio.c:144

   iocb = (struct iocb *) 0xe179f0
   laiocb = (struct qemu_laiocb *) 0xe179c0
   val = 1
   ret = 8
   nevents = 1
   i = 0
   events = {{data = 0x0, obj = 0xe179f0, res = 4096, res2 = 0}, {data 
= 0x0, obj = 0x0, res = 0, res2 = 0} repeats 46 times, {data = 0x0, 
obj = 0x0, res = 0,
   res2 = 4365191}, {data = 0x429abf, obj = 0x7f7b3300c410, res = 
4614129721674825936, res2 = 14777248}, {data = 0x300018, obj = 
0x7f7b3300c4c0, res = 140167113393152,
   res2 = 47259417504}, {data = 0xe17740, obj = 0xa3300c4e0, res = 
140167113393184, res2 = 0}, {data = 0xe17740, obj = 0x0, res = 0, res2 = 
17}, {data = 0x7f7b3300ccf0,
   obj = 0x92, res = 32, res2 = 168}, {data = 0x7f7b33797a00, obj = 
0x801000, res = 0, res2 = 140167141433408}, {data = 0x7f7b34496e00, obj 
= 0x7f7b33797a00,
   res = 140167113393392, res2 = 8392704}, {data = 0x0, obj = 
0x7f7b34aca040, res = 140167134932480, res2 = 140167118209654}, {data = 
0x7f7b3300d950, obj = 0x42603d, res = 0,
   res2 = 42949672960}, {data = 0x7f7b3300c510, obj = 0xe17ba0, res = 
14776128, res2 = 43805361568}, {data = 0x7f7b3300ced0, obj = 0x42797e, 
res = 0, res2 = 14777248}, {
   data = 0x174, obj = 0x0, res = 373, res2 = 0}, {data = 0x176, obj = 
0x0, res = 3221225601, res2 = 0}, {data = 0x4008ae89c083, obj = 0x0, 
res = 209379655938, res2 = 0}, {
   data = 0x7f7bc084, obj = 0x0, res = 3221225602, res2 = 0}, {data 
= 0x7f7b0012, obj = 0x0, res = 17, res2 = 0}, {data = 0x0, obj = 
0x11, res = 140167113395840,
   res2 = 146}, {data = 0x20, obj = 0xa8, res = 140167121304064, res2 = 
8392704}, {data = 0x0, obj = 0x7f7b34aca040, res = 140167134932480, res2 
= 140167121304064}, {
   data = 0x7f7b3300c680, obj = 0x801000, res = 0, res2 = 
140167141433408}, {data = 0x7f7b34496e00, obj = 0x7f7b334a4276, res = 
140167113398608, res2 = 4350013}, {data = 0x0,
   obj = 0xa, res = 140167113393824, res2 = 14777248}, {data = 
0xe2c010, obj = 0xa3300c730, res = 140167113396320, res2 = 4356478}, 
{data = 0x0, obj = 0xe17ba0,
   res = 372, res2 = 0}, {data = 0x175, obj = 0x0, res = 374, res2 = 
0}, {data = 0xc081, obj = 0x0, res = 3221225603, res2 = 0}, {data = 
0xc102, obj = 0x0,
   res = 3221225604, res2 = 0}, {data = 0xc082, obj = 0x0, res = 
18, res2 = 0}, {data = 0x11, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, 
obj = 0x0, res = 0, res2 = 0}, {
   data = 0x0, obj = 0x0, res = 0, res2 = 0}, {data = 0x0, obj = 0x0, 
res = 0, res2 = 0}, {data = 0x0, obj = 0x0, res = 0, res2 = 0}, {data = 
0x0, obj = 0x0, res = 0,
   res2 = 140167139245116}, {data = 0x0, obj = 0x7f7b34abe118, res = 9, 
res2 = 13}, {data = 0x25bf5fc6, obj = 0x7f7b348b40f0, res = 
140167117719264, res2 = 6}, {
   data = 0x96fd7f, obj = 0x7f7b3300c850, res = 140167113394680, res2 = 
140167117724520}, {data = 0x0, obj = 0x7f7b34abe168, res = 
140167141388288, res2 = 4206037}, {
   data = 0x7f7b3343a210, obj = 0x402058, res = 21474836480, res2 = 
4294968102}, {data = 0x0, obj = 0x7f7b34ac8358, res = 140167113394736, 
res2 = 140167113394680}, {
   data = 0x25bf5fc6, obj = 0x7f7b3300c9e0, res = 0, res2 = 
140167139246910}, {data = 0x0, obj = 0x7f7b34abe168, res 

Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu

2010-05-04 Thread Roedel, Joerg
On Tue, May 04, 2010 at 05:50:50AM -0400, Avi Kivity wrote:
 On 05/04/2010 12:45 PM, Avi Kivity wrote:
 
 
  Just a silly bug.  kvm_pdptr_read() can cause a guest memory read on 
  svm, in this case with the mmu lock taken.  I'll post something to fix 
  it.
 
 I guess this was not reported because most svm machines have npt, and 
 this requires npt=0 to trigger.  Nonpae paging disables npt, so you were 
 hit.  Interestingly, nsvm makes it more likely to appear, since npt on 
 i386+pae will need the pdptrs.

Hmm, actually it happened on 32 bit with npt enabled. I think this
can trigger when mmu_alloc_roots is called for a pae guest because it
accidentally tries to read the root_gfn from the guest before it figures
out that it runs with tdp and omits the gfn read from the guest.
I need to touch this for nested-npt and will look into a way of improving
this.

Joerg




[PATCH] KVM: Fix wallclock version writing race

2010-05-04 Thread Avi Kivity
Wallclock writing uses an unprotected global variable to hold the version;
this can cause one guest to interfere with another if both write their
wallclock at the same time.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f6f8dad..c3152d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -754,14 +754,22 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned 
index, u64 *data)
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 {
-   static int version;
+   int version;
+   int r;
struct pvclock_wall_clock wc;
struct timespec boot;
 
if (!wall_clock)
return;
 
-   version++;
+   r = kvm_read_guest(kvm, wall_clock, version, sizeof(version));
+   if (r)
+   return;
+
+   if (version  1)
+   ++version;  /* first time write, random junk */
+
+   ++version;
 
kvm_write_guest(kvm, wall_clock, version, sizeof(version));
 
-- 
1.7.0.4



Re: [PATCH 16/22] KVM: MMU: Track page fault data in struct vcpu

2010-05-04 Thread Avi Kivity

On 05/04/2010 03:00 PM, Roedel, Joerg wrote:

On Tue, May 04, 2010 at 05:50:50AM -0400, Avi Kivity wrote:
   

On 05/04/2010 12:45 PM, Avi Kivity wrote:
 


Just a silly bug.  kvm_pdptr_read() can cause a guest memory read on
svm, in this case with the mmu lock taken.  I'll post something to fix
it.
   

I guess this was not reported because most svm machines have npt, and
this requires npt=0 to trigger.  Nonpae paging disables npt, so you were
hit.  Interestingly, nsvm makes it more likely to appear, since npt on
i386+pae will need the pdptrs.
 

Hmm, actually it happened on 32 bit with npt enabled. I think this
can trigger when mmu_alloc_roots is called for a pae guest because it
accidentally tries to read the root_gfn from the guest before it figures
out that it runs with tdp and omits the gfn read from the guest.
   


Yes.  I had a patchset which moved the 'direct' calculation earlier, and 
skipped root_gfn if it was direct, but it was broken.  If you like I can 
resurrect it, but it may interfere with your work.


--
error compiling committee.c: too many arguments to function



Re: Booting/installing WindowsNT

2010-05-04 Thread Andre Przywara

Avi Kivity wrote:

On 05/03/2010 08:03 PM, Michael Tokarev wrote:

Michael, can you try to use -cpu host,-vme and see if that makes a
difference?



With -cpu host,-vme winNT boots just fine as with just -cpu host.

I also tried with -cpu qemu64 and kvm64, with +vme and -vme (4
combinations in total) - in all cases winNT crashes with the
same 0x003E error.  So it appears that vme makes no
difference.


Please try the model/vendor/family again.  I suggest using x86info on 
both to see what the differences are, using -cpu host with overrides to 
make it equivalent to qemu64 (and verifying it fails), then removing the 
overrides one by one until it works.
I managed to get an NT4 CD and can acknowledge the issues you see. I am 
about to debug this now.
With -cpu host (on a AMD K8, similar to Michael's) I get to the point 
Michael mentioned:

 Microsoft (R) Windows NT (TM) Version 4.0 (Build 1381).
 1 System Processor [512 MB Memory]  Multiprocessor Kernel
Then it _seems_ to hang, checking for getting beyond a certain TSC value 
in a tight loop.
(rdtsc; cmp %edx, %edi; ja @rdtsc; jb bailout; cmp %eax, %ebx; ja 
@rdtsc; bailout:)
But after some time (when I got back from the monitor, but also without 
going into) I could proceed with the installation.

Michael, can you confirm this?
I will now try to get behind the STOP 3E error.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12



Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath - Assertion

2010-05-04 Thread Kevin Wolf
Am 04.05.2010 13:38, schrieb Peter Lieven:
 hi kevin,
 
 i set a breakpoint at bmdma_active_if. the first 2 breaks were encountered 
 when the last path in the multipath
 failed, but the assertion was not true.
 when i kicked one path back in, the breakpoint was reached again, this 
 time leading to an assert.
 the stacktrace is from the point shortly before.
 
 hope this helps.

Hm, looks like there's something wrong with cancelling requests -
bdrv_aio_cancel might decide that it completes a request (and
consequently calls the callback for it) whereas the IDE emulation
decides that it's done with the request before calling bdrv_aio_cancel.

I haven't looked in much detail what this could break, but does
something like this help?

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0757528..3cd55e3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
 void ide_dma_cancel(BMDMAState *bm)
 {
 if (bm-status  BM_STATUS_DMAING) {
-bm-status = ~BM_STATUS_DMAING;
-/* cancel DMA request */
-bm-unit = -1;
-bm-dma_cb = NULL;
 if (bm-aiocb) {
 #ifdef DEBUG_AIO
 printf(aio_cancel\n);
@@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
 bdrv_aio_cancel(bm-aiocb);
 bm-aiocb = NULL;
 }
+bm-status = ~BM_STATUS_DMAING;
+/* cancel DMA request */
+bm-unit = -1;
+bm-dma_cb = NULL;
 }
 }

Kevin


Re: Booting/installing WindowsNT

2010-05-04 Thread Michael Tokarev
Andre Przywara wrote:
[]
 I managed to get a NT4 CD and can acknowledge the issues you see. I am
 about to debug this now.
 With -cpu host (on a AMD K8, similar to Michael's) I get to the point
 Michael mentioned:
  Microsoft (R) Windows NT (TM) Version 4.0 (Build 1381).
  1 System Processor [512 MB Memory]  Multiprocessor Kernel
 Then it _seems_ to hang, checking for getting beyond a certain TSC value
 in a tight loop.
 (rdtsc; cmp %edx, %edi; ja @rdtsc; jb bailout; cmp %eax, %ebx; ja
 @rdtsdc; bailout:)
 But after some time (when I got back from the monitor, but also without
 going into) I could proceed with the installation.
 Michael, can you confirm this?

I've seen 3 variants here so far:

1.  normal installation.  It stops for a while after that kernel
  message you mentioned.  For several seconds, maybe even 20
  seconds.  And after a while it continues.  During all this
  time the guest cpu usage is 100% like you describe (a tight
  loop).  This is what I call working - I never bothered to
  think if that tight loop/pause is normal or not.  This is
  what happens for me with -cpu host.

2. with -cpu pentium it also displays that kernel message but
  stops here without any cpu usage whatsoever.  I waited for
  some 40 minutes at one point (I just forgot I started it but
  later on noticed there's a QEMU window floating around with
  that NT kernel message on it and nothing happening).

3.  In all other cases so far it BSoDs with STOP 0x3E error
  right before displaying that kernel message.

So.. I'm not sure if it's confirmation or not :)

Thanks!

/mjt


[RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

2010-05-04 Thread Takuya Yoshikawa
Hi, sorry for sending from my personal account.
The following series are all from me:

  From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

  The 3rd version of moving dirty bitmaps to user space.

From this version, we add the x86, ppc, and asm-generic people to the CC list.


[To KVM people]

Sorry for being late to reply your comments.

Avi,
 - I've written an answer to your question in patch 5/12: drivers/vhost/vhost.c .

 - I considered changing set_bit_user_non_atomic to an inline function,
   but did not because the other helpers in uaccess.h are written as
   macros. Anyway, I hope that x86 people will give us appropriate suggestions
   about this.

 - I thought that the documentation about making bitmaps 64-bit aligned would
   be written when we add an API to register user-allocated bitmaps. So probably
   in the next series.

Avi, Alex,
 - Could you check the ia64 and ppc parts, please? I tried to keep the logical
   changes as small as possible.

   I personally tried to build these with cross compilers. For ia64, the build
   succeeded with my patch series applied. But book3s, even without my patch
   series, failed with the following errors:

  arch/powerpc/kvm/book3s_paired_singles.c: In function 
'kvmppc_emulate_paired_single':
  arch/powerpc/kvm/book3s_paired_singles.c:1289: error: the frame size of 2288 
bytes is larger than 2048 bytes
  make[1]: *** [arch/powerpc/kvm/book3s_paired_singles.o] Error 1
  make: *** [arch/powerpc/kvm] Error 2


About changelog: there are two main changes from the 2nd version:
  1. I changed the treatment of clean slots (see patch 1/12).
 This was already applied today, thanks!
  2. I changed the switch API. (see patch 11/12).

To show this API's advantage, I also did a test (see the end of this mail).


[To x86 people]

Hi, Thomas, Ingo, Peter,

Please review the patches 4,5/12. Since this is my first time sending patches
to x86, please tell me if anything is missing.


[To ppc people]

Hi, Benjamin, Paul, Alex,

Please see patches 6,7/12. I must apologize first that I have not tested these
yet. In that sense, they may not be of sufficient quality for precise reviews.
But I will be happy if you would give me any comments.

Alex, could you help me? Though I have a plan to get PPC box in the future,
currently I cannot test these.



[To asm-generic people]

Hi, Arnd,

Please review the patch 8/12. This kind of macro is acceptable?





[Performance test]

We measured the tsc needed to the ioctl()s for getting dirty logs in
kernel.

Test environment

  AMD Phenom(tm) 9850 Quad-Core Processor with 8GB memory


1. GUI test (running Ubuntu guest in graphical mode)

  sudo qemu-system-x86_64 -hda dirtylog_test.img -boot c -m 4192 -net ...

We show a relatively stable part to compare how much time is needed
for the basic parts of dirty log ioctl.

   get.org   get.opt  switch.opt

slots[7].len=32768  278379 66398 64024
slots[8].len=32768  181246   270   160
slots[7].len=32768  263961 64673 64494
slots[8].len=32768  181655   265   160
slots[7].len=32768  263736 64701 64610
slots[8].len=32768  182785   267   160
slots[7].len=32768  260925 65360 65042
slots[8].len=32768  182579   264   160
slots[7].len=32768  267823 65915 65682
slots[8].len=32768  186350   271   160

At a glance, we can see that our optimization is a significant improvement
over the original get dirty log ioctl. This is true for both get.opt and
switch.opt. It has a really big impact for personal KVM users who
run KVM in GUI mode on their usual PCs.

Next, we notice that switch.opt saved a hundred nanoseconds or so for
these slots. Although this may sound like a tiny improvement, it is
perceptible in GUI responsiveness, such as mouse reactions.

To feel the difference, please try GUI on your PC with our patch series!


2. Live-migration test (4GB guest, write loop with 1GB buf)

We also did a live-migration test.

   get.org   get.opt  switch.opt

slots[0].len=655360     797383    261144    222181
slots[1].len=3757047808 2186721   1965244   1842824
slots[2].len=637534208 1433562   1012723   1031213
slots[3].len=131072 216858   331   331
slots[4].len=131072 121635   225   164
slots[5].len=131072 120863   356   164
slots[6].len=16777216   121746  1133   156
slots[7].len=32768  120415   230   278
slots[8].len=32768  120368   216   149
slots[0].len=655360     806497    194710    223582
slots[1].len=3757047808 2142922   1878025   1895369
slots[2].len=637534208 1386512   1021309   1000345
slots[3].len=131072 221118   459   296
slots[4].len=131072 121516   272   166
slots[5].len=131072 122652   244   173

[RFC][PATCH 1/12 applied today] KVM: x86: avoid unnecessary bitmap allocation when memslot is clean

2010-05-04 Thread Takuya Yoshikawa
Although we always allocate a new dirty bitmap in x86's get_dirty_log(),
it is only used as a zero-source of copy_to_user() and freed right after
that when memslot is clean. This patch uses clear_user() instead of doing
this unnecessary zero-source allocation.

Performance improvement: as one would expect, the time needed to
allocate a bitmap is eliminated entirely. Furthermore, we avoid the
TLB flush triggered by vmalloc() and get some good knock-on effects. In my
test, the improved ioctl was about 4 to 10 times faster than the original
one for clean slots.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/x86.c |   37 +++--
 1 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..b95a211 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2744,7 +2744,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot;
unsigned long n;
unsigned long is_dirty = 0;
-   unsigned long *dirty_bitmap = NULL;
 
mutex_lock(kvm-slots_lock);
 
@@ -2759,27 +2758,30 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
n = kvm_dirty_bitmap_bytes(memslot);
 
-   r = -ENOMEM;
-   dirty_bitmap = vmalloc(n);
-   if (!dirty_bitmap)
-   goto out;
-   memset(dirty_bitmap, 0, n);
-
for (i = 0; !is_dirty  i  n/sizeof(long); i++)
is_dirty = memslot-dirty_bitmap[i];
 
/* If nothing is dirty, don't bother messing with page tables. */
if (is_dirty) {
struct kvm_memslots *slots, *old_slots;
+   unsigned long *dirty_bitmap;
 
spin_lock(kvm-mmu_lock);
kvm_mmu_slot_remove_write_access(kvm, log-slot);
spin_unlock(kvm-mmu_lock);
 
-   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-   if (!slots)
-   goto out_free;
+   r = -ENOMEM;
+   dirty_bitmap = vmalloc(n);
+   if (!dirty_bitmap)
+   goto out;
+   memset(dirty_bitmap, 0, n);
 
+   r = -ENOMEM;
+   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+   if (!slots) {
+   vfree(dirty_bitmap);
+   goto out;
+   }
memcpy(slots, kvm-memslots, sizeof(struct kvm_memslots));
slots-memslots[log-slot].dirty_bitmap = dirty_bitmap;
 
@@ -2788,13 +2790,20 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
synchronize_srcu_expedited(kvm-srcu);
dirty_bitmap = old_slots-memslots[log-slot].dirty_bitmap;
kfree(old_slots);
+
+   r = -EFAULT;
+   if (copy_to_user(log-dirty_bitmap, dirty_bitmap, n)) {
+   vfree(dirty_bitmap);
+   goto out;
+   }
+   vfree(dirty_bitmap);
+   } else {
+   r = -EFAULT;
+   if (clear_user(log-dirty_bitmap, n))
+   goto out;
}
 
r = 0;
-   if (copy_to_user(log-dirty_bitmap, dirty_bitmap, n))
-   r = -EFAULT;
-out_free:
-   vfree(dirty_bitmap);
 out:
mutex_unlock(kvm-slots_lock);
return r;
-- 
1.7.0.4



[RFC][PATCH 2/12] KVM: introduce slot level dirty state management

2010-05-04 Thread Takuya Yoshikawa
This patch introduces an is_dirty member for each memory slot.
Using this member, we remove the dirty bitmap scans which are done in
get_dirty_log().

This is important for moving dirty bitmaps to user space because we don't
have any good way to check bitmaps in user space at low cost, and scanning
bitmaps to check memory slot dirtiness would not be acceptable.

When we mark a slot dirty:
 - x86 and ppc: at the timing of mark_page_dirty()
 - ia64: at the timing of kvm_ia64_sync_dirty_log()
ia64 uses a different place to store dirty logs and synchronizes it with
the logs of the memory slots right before get_dirty_log(). So we use this
timing to update is_dirty.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Avi Kivity a...@redhat.com
CC: Alexander Graf ag...@suse.de
---
 arch/ia64/kvm/kvm-ia64.c  |   11 +++
 arch/powerpc/kvm/book3s.c |9 -
 arch/x86/kvm/x86.c|9 +++--
 include/linux/kvm_host.h  |4 ++--
 virt/kvm/kvm_main.c   |   13 +++--
 5 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d5f4e91..17fd65c 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1824,6 +1824,9 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
base = memslot-base_gfn / BITS_PER_LONG;
 
for (i = 0; i  n/sizeof(long); ++i) {
+   if (dirty_bitmap[base + i])
+   memslot-is_dirty = true;
+
memslot-dirty_bitmap[i] = dirty_bitmap[base + i];
dirty_bitmap[base + i] = 0;
}
@@ -1838,7 +1841,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
int r;
unsigned long n;
struct kvm_memory_slot *memslot;
-   int is_dirty = 0;
 
mutex_lock(kvm-slots_lock);
spin_lock(kvm-arch.dirty_log_lock);
@@ -1847,16 +1849,17 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
if (r)
goto out;
 
-   r = kvm_get_dirty_log(kvm, log, is_dirty);
+   r = kvm_get_dirty_log(kvm, log);
if (r)
goto out;
 
+   memslot = kvm-memslots-memslots[log-slot];
/* If nothing is dirty, don't bother messing with page tables. */
-   if (is_dirty) {
+   if (memslot-is_dirty) {
kvm_flush_remote_tlbs(kvm);
-   memslot = kvm-memslots-memslots[log-slot];
n = kvm_dirty_bitmap_bytes(memslot);
memset(memslot-dirty_bitmap, 0, n);
+   memslot-is_dirty = false;
}
r = 0;
 out:
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 28e785f..4b074f1 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1191,20 +1191,18 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot;
struct kvm_vcpu *vcpu;
ulong ga, ga_end;
-   int is_dirty = 0;
int r;
unsigned long n;
 
mutex_lock(kvm-slots_lock);
 
-   r = kvm_get_dirty_log(kvm, log, is_dirty);
+   r = kvm_get_dirty_log(kvm, log);
if (r)
goto out;
 
+   memslot = kvm-memslots-memslots[log-slot];
/* If nothing is dirty, don't bother messing with page tables. */
-   if (is_dirty) {
-   memslot = kvm-memslots-memslots[log-slot];
-
+   if (memslot-is_dirty) {
ga = memslot-base_gfn  PAGE_SHIFT;
ga_end = ga + (memslot-npages  PAGE_SHIFT);
 
@@ -1213,6 +1211,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
n = kvm_dirty_bitmap_bytes(memslot);
memset(memslot-dirty_bitmap, 0, n);
+   memslot-is_dirty = false;
}
 
r = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b95a211..023c7f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2740,10 +2740,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
  struct kvm_dirty_log *log)
 {
-   int r, i;
+   int r;
struct kvm_memory_slot *memslot;
unsigned long n;
-   unsigned long is_dirty = 0;
 
mutex_lock(kvm-slots_lock);
 
@@ -2758,11 +2757,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
n = kvm_dirty_bitmap_bytes(memslot);
 
-   for (i = 0; !is_dirty  i  n/sizeof(long); i++)
-   is_dirty = memslot-dirty_bitmap[i];
-
/* If nothing is dirty, don't bother messing with page tables. */
-   if (is_dirty) {
+   if (memslot-is_dirty) {
struct kvm_memslots *slots, *old_slots;
unsigned long *dirty_bitmap;
 
@@ -2784,6 +2780,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
}
memcpy(slots, kvm-memslots, sizeof(struct kvm_memslots));
slots-memslots[log-slot].dirty_bitmap 

[RFC][PATCH 3/12] KVM: introduce wrapper functions to create and destroy dirty bitmaps

2010-05-04 Thread Takuya Yoshikawa
We will change the vmalloc() and vfree() to do_mmap() and do_munmap() later.
This patch makes that easy and cleans up the code.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
---
 virt/kvm/kvm_main.c |   27 ---
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7ab6312..3e3acad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -435,6 +435,12 @@ out_err_nodisable:
return ERR_PTR(r);
 }
 
+static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+   vfree(memslot-dirty_bitmap);
+   memslot-dirty_bitmap = NULL;
+}
+
 /*
  * Free any memory in @free but not in @dont.
  */
@@ -447,7 +453,7 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot 
*free,
vfree(free-rmap);
 
if (!dont || free-dirty_bitmap != dont-dirty_bitmap)
-   vfree(free-dirty_bitmap);
+   kvm_destroy_dirty_bitmap(free);
 
 
for (i = 0; i  KVM_NR_PAGE_SIZES - 1; ++i) {
@@ -458,7 +464,6 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot 
*free,
}
 
free-npages = 0;
-   free-dirty_bitmap = NULL;
free-rmap = NULL;
 }
 
@@ -520,6 +525,18 @@ static int kvm_vm_release(struct inode *inode, struct file 
*filp)
return 0;
 }
 
+static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+{
+   unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
+
+   memslot-dirty_bitmap = vmalloc(dirty_bytes);
+   if (!memslot-dirty_bitmap)
+   return -ENOMEM;
+
+   memset(memslot-dirty_bitmap, 0, dirty_bytes);
+   return 0;
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -653,12 +670,8 @@ skip_lpage:
 
/* Allocate page dirty bitmap if needed */
if ((new.flags  KVM_MEM_LOG_DIRTY_PAGES)  !new.dirty_bitmap) {
-   unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(new);
-
-   new.dirty_bitmap = vmalloc(dirty_bytes);
-   if (!new.dirty_bitmap)
+   if (kvm_create_dirty_bitmap(new)  0)
goto out_free;
-   memset(new.dirty_bitmap, 0, dirty_bytes);
/* destroy any largepage mappings for dirty tracking */
if (old.npages)
flush_shadow = 1;
-- 
1.7.0.4



[RFC][PATCH 4/12] x86: introduce copy_in_user() for 32-bit

2010-05-04 Thread Takuya Yoshikawa
During the work on KVM's dirty page logging optimization, we encountered
the need for copy_in_user() on 32-bit x86 and ppc: it will be used for
manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit with existing generic copy user
helpers.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Avi Kivity a...@redhat.com
Cc: Thomas Gleixner t...@linutronix.de
CC: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
---
 arch/x86/include/asm/uaccess_32.h |2 ++
 arch/x86/lib/usercopy_32.c|   26 ++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess_32.h 
b/arch/x86/include/asm/uaccess_32.h
index 088d09f..85d396d 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -21,6 +21,8 @@ unsigned long __must_check __copy_from_user_ll_nocache
(void *to, const void __user *from, unsigned long n);
 unsigned long __must_check __copy_from_user_ll_nocache_nozero
(void *to, const void __user *from, unsigned long n);
+unsigned long __must_check copy_in_user
+   (void __user *to, const void __user *from, unsigned n);
 
 /**
 * __copy_to_user_inatomic: - Copy a block of data into user space, with less checking.
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index e218d5d..e90ffc3 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -889,3 +889,29 @@ void copy_from_user_overflow(void)
WARN(1, "Buffer overflow detected!\n");
 }
 EXPORT_SYMBOL(copy_from_user_overflow);
+
+/**
+ * copy_in_user: - Copy a block of data from user space to user space.
+ * @to:   Destination address, in user space.
+ * @from: Source address, in user space.
+ * @n:Number of bytes to copy.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * Copy data from user space to user space.
+ *
+ * Returns number of bytes that could not be copied.
+ * On success, this will be zero.
+ */
+unsigned long
+copy_in_user(void __user *to, const void __user *from, unsigned n)
+{
+   if (access_ok(VERIFY_WRITE, to, n) && access_ok(VERIFY_READ, from, n)) {
+   if (movsl_is_ok(to, from, n))
+   __copy_user(to, from, n);
+   else
+   n = __copy_user_intel(to, (const void *)from, n);
+   }
+   return n;
+}
+EXPORT_SYMBOL(copy_in_user);
-- 
1.7.0.4



[RFC][PATCH 5/12] x86: introduce __set_bit() like function for bitmaps in user space

2010-05-04 Thread Takuya Yoshikawa
During the work of KVM's dirty page logging optimization, we encountered
the need to manipulate bitmaps in user space efficiently. To achieve this,
we introduce a uaccess function for setting a bit in user space following
Avi's suggestion.

  KVM is now using dirty bitmaps for live-migration and VGA. Although we need
  to update them from kernel side, copying them every time for updating the
  dirty log is a big bottleneck. Especially, we tested that zero-copy bitmap
  manipulation improves responses of GUI manipulations a lot.

We also found one similar need in drivers/vhost/vhost.c, in which the author
implemented set_bit_to_user() locally using inefficient functions: see the TODO
at the top of that file.

Probably, this kind of need is common in the virtualization area.

So we introduce a macro set_bit_user_non_atomic() following the implementation
style of x86's uaccess functions.

Note: there is one restriction on this macro: bitmaps must be 64-bit
aligned (see the comment in this patch).

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Avi Kivity a...@redhat.com
Cc: Thomas Gleixner t...@linutronix.de
CC: Ingo Molnar mi...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
---
 arch/x86/include/asm/uaccess.h |   39 +++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index abd3e0e..3138e65 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -98,6 +98,45 @@ struct exception_table_entry {
 
 extern int fixup_exception(struct pt_regs *regs);
 
+/**
+ * set_bit_user_non_atomic: - set a bit of a bitmap in user space.
+ * @nr:   Bit offset.
+ * @addr: Base address of a bitmap in user space.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * This macro sets a bit of a bitmap in user space.
+ *
+ * Restriction: the bitmap pointed to by @addr must be 64-bit aligned:
+ * the kernel accesses the bitmap by its own word length, so bitmaps
+ * allocated by 32-bit processes may cause a fault.
+ *
+ * Returns zero on success, or -EFAULT on error.
+ */
+#define __set_bit_user_non_atomic_asm(nr, addr, err, errret)   \
+   asm volatile("1:   bts %1,%2\n"\
+"2:\n" \
+".section .fixup,\"ax\"\n" \
+"3:   mov %3,%0\n" \
+"  jmp 2b\n"   \
+".previous\n"  \
+_ASM_EXTABLE(1b, 3b)   \
+: "=r"(err)\
+: "r" (nr), "m" (__m(addr)), "i" (errret), "0" (err))
+
+#define set_bit_user_non_atomic(nr, addr)  \
+({ \
+   int __ret_sbu;  \
+   \
+   might_fault();  \
+   if (access_ok(VERIFY_WRITE, addr, nr/8 + 1))\
+   __set_bit_user_non_atomic_asm(nr, addr, __ret_sbu, -EFAULT);\
+   else\
+   __ret_sbu = -EFAULT;\
+   \
+   __ret_sbu;  \
+})
+
 /*
  * These are the main single-value transfer routines.  They automatically
  * use the right size if we just have the right pointer type.
-- 
1.7.0.4



[RFC][PATCH 6/12 not tested yet] PPC: introduce copy_in_user() for 32-bit

2010-05-04 Thread Takuya Yoshikawa
During the work of KVM's dirty page logging optimization, we encountered
the need of copy_in_user() for 32-bit ppc and x86: these will be used for
manipulating dirty bitmaps in user space.

So we implement copy_in_user() for 32-bit with __copy_tofrom_user().

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Alexander Graf ag...@suse.de
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/uaccess.h |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index bd0fb84..3a01ce8 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -359,6 +359,23 @@ static inline unsigned long copy_to_user(void __user *to,
return n;
 }
 
+static inline unsigned long copy_in_user(void __user *to,
+   const void __user *from, unsigned long n)
+{
+   unsigned long over;
+
+   if (likely(access_ok(VERIFY_READ, from, n) &&
+   access_ok(VERIFY_WRITE, to, n)))
+   return __copy_tofrom_user(to, from, n);
+   if (((unsigned long)from < TASK_SIZE) ||
+   ((unsigned long)to < TASK_SIZE)) {
+   over = max((unsigned long)from, (unsigned long)to)
+   + n - TASK_SIZE;
+   return __copy_tofrom_user(to, from, n - over) + over;
+   }
+   return n;
+}
+
 #else /* __powerpc64__ */
 
 #define __copy_in_user(to, from, size) \
-- 
1.7.0.4



[patch uq/master 5/9] kvm: synchronize state from cpu context

2010-05-04 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

It is not safe to retrieve the KVM internal state of a given cpu
while it is potentially modifying it.

Queue the request to run on cpu context, similarly to qemu-kvm.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/kvm-all.c
===
--- qemu.orig/kvm-all.c
+++ qemu/kvm-all.c
@@ -796,14 +796,22 @@ void kvm_flush_coalesced_mmio_buffer(voi
 #endif
 }
 
-void kvm_cpu_synchronize_state(CPUState *env)
+static void do_kvm_cpu_synchronize_state(void *_env)
 {
+CPUState *env = _env;
+
 if (!env->kvm_vcpu_dirty) {
 kvm_arch_get_registers(env);
 env->kvm_vcpu_dirty = 1;
 }
 }
 
+void kvm_cpu_synchronize_state(CPUState *env)
+{
+if (!env->kvm_vcpu_dirty)
+run_on_cpu(env, do_kvm_cpu_synchronize_state, env);
+}
+
 void kvm_cpu_synchronize_post_reset(CPUState *env)
 {
 kvm_arch_put_registers(env, KVM_PUT_RESET_STATE);




[patch uq/master 4/9] port qemu-kvm's on_vcpu code

2010-05-04 Thread Marcelo Tosatti
run_on_cpu allows to execute work on a given CPUState context.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/cpu-all.h
===
--- qemu.orig/cpu-all.h
+++ qemu/cpu-all.h
@@ -818,6 +818,7 @@ void cpu_watchpoint_remove_all(CPUState 
 
 void cpu_single_step(CPUState *env, int enabled);
 void cpu_reset(CPUState *s);
+void run_on_cpu(CPUState *env, void (*func)(void *data), void *data);
 
 #define CPU_LOG_TB_OUT_ASM (1 << 0)
 #define CPU_LOG_TB_IN_ASM  (1 << 1)
Index: qemu/cpu-defs.h
===
--- qemu.orig/cpu-defs.h
+++ qemu/cpu-defs.h
@@ -132,6 +132,7 @@ typedef struct icount_decr_u16 {
 
 struct kvm_run;
 struct KVMState;
+struct qemu_work_item;
 
 typedef struct CPUBreakpoint {
 target_ulong pc;
@@ -204,6 +205,7 @@ typedef struct CPUWatchpoint {
 uint32_t created;   \
 struct QemuThread *thread;  \
 struct QemuCond *halt_cond; \
+struct qemu_work_item *queued_work_first, *queued_work_last;\
 const char *cpu_model_str;  \
 struct KVMState *kvm_state; \
 struct kvm_run *kvm_run;\
Index: qemu/cpus.c
===
--- qemu.orig/cpus.c
+++ qemu/cpus.c
@@ -115,6 +115,8 @@ static int cpu_has_work(CPUState *env)
 {
 if (env->stop)
 return 1;
+if (env->queued_work_first)
+return 1;
 if (env->stopped || !vm_running)
 return 0;
 if (!env->halted)
@@ -252,6 +254,11 @@ int qemu_cpu_self(void *env)
 return 1;
 }
 
+void run_on_cpu(CPUState *env, void (*func)(void *data), void *data)
+{
+func(data);
+}
+
 void resume_all_vcpus(void)
 {
 }
@@ -304,6 +311,7 @@ static QemuCond qemu_cpu_cond;
 /* system init */
 static QemuCond qemu_system_cond;
 static QemuCond qemu_pause_cond;
+static QemuCond qemu_work_cond;
 
 static void tcg_block_io_signals(void);
 static void kvm_block_io_signals(CPUState *env);
@@ -334,6 +342,50 @@ void qemu_main_loop_start(void)
 qemu_cond_broadcast(&qemu_system_cond);
 }
 
+void run_on_cpu(CPUState *env, void (*func)(void *data), void *data)
+{
+struct qemu_work_item wi;
+
+if (qemu_cpu_self(env)) {
+func(data);
+return;
+}
+
+wi.func = func;
+wi.data = data;
+if (!env->queued_work_first)
+env->queued_work_first = &wi;
+else
+env->queued_work_last->next = &wi;
+env->queued_work_last = &wi;
+wi.next = NULL;
+wi.done = false;
+
+qemu_cpu_kick(env);
+while (!wi.done) {
+CPUState *self_env = cpu_single_env;
+
+qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
+cpu_single_env = self_env;
+}
+}
+
+static void flush_queued_work(CPUState *env)
+{
+struct qemu_work_item *wi;
+
+if (!env->queued_work_first)
+return;
+
+while ((wi = env->queued_work_first)) {
+env->queued_work_first = wi->next;
+wi->func(wi->data);
+wi->done = true;
+}
+env->queued_work_last = NULL;
+qemu_cond_broadcast(&qemu_work_cond);
+}
+
 static void qemu_wait_io_event_common(CPUState *env)
 {
 if (env->stop) {
@@ -341,6 +393,7 @@ static void qemu_wait_io_event_common(CP
 env->stopped = 1;
 qemu_cond_signal(&qemu_pause_cond);
 }
+flush_queued_work(env);
 }
 
 static void qemu_wait_io_event(CPUState *env)
Index: qemu/qemu-common.h
===
--- qemu.orig/qemu-common.h
+++ qemu/qemu-common.h
@@ -249,6 +249,14 @@ void qemu_notify_event(void);
 void qemu_cpu_kick(void *env);
 int qemu_cpu_self(void *env);
 
+/* work queue */
+struct qemu_work_item {
+struct qemu_work_item *next;
+void (*func)(void *data);
+void *data;
+int done;
+};
+
 #ifdef CONFIG_USER_ONLY
 #define qemu_init_vcpu(env) do { } while (0)
 #else




[patch uq/master 6/9] add cpu_is_stopped helper

2010-05-04 Thread Marcelo Tosatti
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/cpu-all.h
===
--- qemu.orig/cpu-all.h
+++ qemu/cpu-all.h
@@ -818,6 +818,7 @@ void cpu_watchpoint_remove_all(CPUState 
 
 void cpu_single_step(CPUState *env, int enabled);
 void cpu_reset(CPUState *s);
+int cpu_is_stopped(CPUState *env);
 void run_on_cpu(CPUState *env, void (*func)(void *data), void *data);
 
 #define CPU_LOG_TB_OUT_ASM (1 << 0)
Index: qemu/cpus.c
===
--- qemu.orig/cpus.c
+++ qemu/cpus.c
@@ -91,6 +91,11 @@ void cpu_synchronize_all_post_init(void)
 }
 }
 
+int cpu_is_stopped(CPUState *env)
+{
+return !vm_running || env->stopped;
+}
+
 static void do_vm_stop(int reason)
 {
 if (vm_running) {




[patch uq/master 9/9] kvm: enable -smp > 1

2010-05-04 Thread Marcelo Tosatti
Process INIT/SIPI requests and enable -smp > 1.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/kvm-all.c
===
--- qemu.orig/kvm-all.c
+++ qemu/kvm-all.c
@@ -593,11 +593,6 @@ int kvm_init(int smp_cpus)
 int ret;
 int i;
 
-if (smp_cpus > 1) {
-fprintf(stderr, "No SMP KVM support, use '-smp 1'\n");
-return -EINVAL;
-}
-
 s = qemu_mallocz(sizeof(KVMState));
 
 #ifdef KVM_CAP_SET_GUEST_DEBUG
@@ -840,6 +835,11 @@ int kvm_cpu_exec(CPUState *env)
 }
 #endif
 
+if (kvm_arch_process_irqchip_events(env)) {
+ret = 0;
+break;
+}
+
 if (env->kvm_vcpu_dirty) {
 kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
 env->kvm_vcpu_dirty = 0;
Index: qemu/kvm.h
===
--- qemu.orig/kvm.h
+++ qemu/kvm.h
@@ -90,6 +90,8 @@ int kvm_arch_handle_exit(CPUState *env, 
 
 int kvm_arch_pre_run(CPUState *env, struct kvm_run *run);
 
+int kvm_arch_process_irqchip_events(CPUState *env);
+
 int kvm_arch_get_registers(CPUState *env);
 
 /* state subset only touched by the VCPU itself during runtime */
Index: qemu/target-i386/kvm.c
===
--- qemu.orig/target-i386/kvm.c
+++ qemu/target-i386/kvm.c
@@ -1073,6 +1073,22 @@ int kvm_arch_post_run(CPUState *env, str
 return 0;
 }
 
+int kvm_arch_process_irqchip_events(CPUState *env)
+{
+if (env->interrupt_request & CPU_INTERRUPT_INIT) {
+kvm_cpu_synchronize_state(env);
+do_cpu_init(env);
+env->exception_index = EXCP_HALTED;
+}
+
+if (env->interrupt_request & CPU_INTERRUPT_SIPI) {
+kvm_cpu_synchronize_state(env);
+do_cpu_sipi(env);
+}
+
+return env->halted;
+}
+
 static int kvm_handle_halt(CPUState *env)
 {
 if (!((env->interrupt_request & CPU_INTERRUPT_HARD) &&
Index: qemu/target-ppc/kvm.c
===
--- qemu.orig/target-ppc/kvm.c
+++ qemu/target-ppc/kvm.c
@@ -224,6 +224,11 @@ int kvm_arch_post_run(CPUState *env, str
 return 0;
 }
 
+int kvm_arch_process_irqchip_events(CPUState *env)
+{
+return 0;
+}
+
 static int kvmppc_handle_halt(CPUState *env)
 {
 if (!(env->interrupt_request & CPU_INTERRUPT_HARD) && (msr_ee)) {
Index: qemu/target-s390x/kvm.c
===
--- qemu.orig/target-s390x/kvm.c
+++ qemu/target-s390x/kvm.c
@@ -175,6 +175,11 @@ int kvm_arch_post_run(CPUState *env, str
 return 0;
 }
 
+int kvm_arch_process_irqchip_events(CPUState *env)
+{
+return 0;
+}
+
 static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
 uint64_t parm64, int vm)
 {




[RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space

2010-05-04 Thread Takuya Yoshikawa
During the work of KVM's dirty page logging optimization, we encountered
the need to manipulate bitmaps in user space efficiently. To achieve this,
we introduce a uaccess function for setting a bit in user space following
Avi's suggestion.

  KVM is now using dirty bitmaps for live-migration and VGA. Although we need
  to update them from kernel side, copying them every time for updating the
  dirty log is a big bottleneck. Especially, we tested that zero-copy bitmap
  manipulation improves responses of GUI manipulations a lot.

We also found one similar need in drivers/vhost/vhost.c, in which the author
implemented set_bit_to_user() locally using inefficient functions: see the TODO
at the top of that file.

Probably, this kind of need is common in the virtualization area.

So we introduce a function set_bit_user_non_atomic().

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Alexander Graf ag...@suse.de
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/uaccess.h |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 3a01ce8..f878326 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -321,6 +321,25 @@ do {   
\
__gu_err;   \
 })
 
+static inline int set_bit_user_non_atomic(int nr, void __user *addr)
+{
+   u8 __user *p;
+   u8 val;
+
+   p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);
+   if (!access_ok(VERIFY_WRITE, p, 1))
+   return -EFAULT;
+
+   if (__get_user(val, p))
+   return -EFAULT;
+
+   val |= 1U << (nr % BITS_PER_BYTE);
+   if (__put_user(val, p))
+   return -EFAULT;
+
+   return 0;
+}
+
 
 /* more complex routines */
 
-- 
1.7.0.4



[RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro

2010-05-04 Thread Takuya Yoshikawa
Although we can use the *_le_bit() helpers to handle bitmaps arranged in
little-endian order, having the le bit offset calculation as a separate macro
gives us more freedom.

For example, KVM has little-endian dirty bitmaps for VGA and live-migration,
and they are used in user space too. To avoid bitmap copies between kernel
and user space, we want to update the bitmaps in user space directly.
To achieve this, the le bit offset combined with the *_user() functions
helps a lot.

So we split out the le bit offset calculation by defining it as a new
macro: generic_le_bit_offset().

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
CC: Arnd Bergmann a...@arndb.de
---
 include/asm-generic/bitops/le.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 80e3bf1..ee445fb 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -9,6 +9,8 @@
 
 #if defined(__LITTLE_ENDIAN)
 
+#define generic_le_bit_offset(nr)  (nr)
+
 #define generic_test_le_bit(nr, addr) test_bit(nr, addr)
 #define generic___set_le_bit(nr, addr) __set_bit(nr, addr)
 #define generic___clear_le_bit(nr, addr) __clear_bit(nr, addr)
@@ -25,6 +27,8 @@
 
 #elif defined(__BIG_ENDIAN)
 
+#define generic_le_bit_offset(nr)  ((nr) ^ BITOP_LE_SWIZZLE)
+
 #define generic_test_le_bit(nr, addr) \
test_bit((nr) ^ BITOP_LE_SWIZZLE, (addr))
 #define generic___set_le_bit(nr, addr) \
-- 
1.7.0.4



[RFC][PATCH 9/12] KVM: introduce a wrapper function of set_bit_user_non_atomic()

2010-05-04 Thread Takuya Yoshikawa
This avoids breaking the build for architectures other than x86 and ppc.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
---
 arch/ia64/include/asm/kvm_host.h|5 +
 arch/powerpc/include/asm/kvm_host.h |6 ++
 arch/s390/include/asm/kvm_host.h|6 ++
 arch/x86/include/asm/kvm_host.h |5 +
 4 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index a362e67..938041b 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -589,6 +589,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
 void kvm_sal_emul(struct kvm_vcpu *vcpu);
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+   return 0;
+}
+
 #endif /* __ASSEMBLY__*/
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 0c9ad86..9463524 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include linux/types.h
 #include linux/kvm_types.h
 #include asm/kvm_asm.h
+#include asm/uaccess.h
 
 #define KVM_MAX_VCPUS 1
 #define KVM_MEMORY_SLOTS 32
@@ -287,4 +288,9 @@ struct kvm_vcpu_arch {
 #endif
 };
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+   return set_bit_user_non_atomic(nr, addr);
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 27605b6..36710ee 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -238,4 +238,10 @@ struct kvm_arch{
 };
 
 extern int sie64a(struct kvm_s390_sie_block *, unsigned long *);
+
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+   return 0;
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..9e22df9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -795,4 +795,9 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+static inline int kvm_set_bit_user(int nr, void __user *addr)
+{
+   return set_bit_user_non_atomic(nr, addr);
+}
+
 #endif /* _ASM_X86_KVM_HOST_H */
-- 
1.7.0.4



[RFC][PATCH RFC 10/12] KVM: move dirty bitmaps to user space

2010-05-04 Thread Takuya Yoshikawa
We move dirty bitmaps to user space.

 - Allocation and destruction: we use do_mmap() and do_munmap().
   The new bitmap space is twice longer than the original one and we
   use the additional space for double buffering: this makes it
   possible to update the active bitmap while letting the user space
   read the other one safely. For x86, we can also remove the vmalloc()
   in kvm_vm_ioctl_get_dirty_log().

 - Bitmap manipulations: we replace all functions which access dirty
   bitmaps with *_user() functions.

 - For ia64: moving the dirty bitmaps of memory slots does not affect
   ia64 much because it uses a different place to store dirty logs
   rather than the dirty bitmaps of memory slots: all we have to change
   are the sync and get of the dirty log, so we don't need set_bit_user-like
   functions for ia64.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Avi Kivity a...@redhat.com
CC: Alexander Graf ag...@suse.de
---
 arch/ia64/kvm/kvm-ia64.c  |   15 +-
 arch/powerpc/kvm/book3s.c |5 +++-
 arch/x86/kvm/x86.c|   25 --
 include/linux/kvm_host.h  |3 +-
 virt/kvm/kvm_main.c   |   62 +---
 5 files changed, 82 insertions(+), 28 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 17fd65c..03503e6 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1823,11 +1823,19 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
n = kvm_dirty_bitmap_bytes(memslot);
base = memslot->base_gfn / BITS_PER_LONG;
 
+   r = -EFAULT;
+   if (!access_ok(VERIFY_WRITE, memslot->dirty_bitmap, n))
+   goto out;
+
for (i = 0; i < n/sizeof(long); ++i) {
if (dirty_bitmap[base + i])
memslot->is_dirty = true;
 
-   memslot->dirty_bitmap[i] = dirty_bitmap[base + i];
+   if (__put_user(dirty_bitmap[base + i],
+  &memslot->dirty_bitmap[i])) {
+   r = -EFAULT;
+   goto out;
+   }
dirty_bitmap[base + i] = 0;
}
r = 0;
@@ -1858,7 +1866,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
if (memslot->is_dirty) {
kvm_flush_remote_tlbs(kvm);
n = kvm_dirty_bitmap_bytes(memslot);
-   memset(memslot->dirty_bitmap, 0, n);
+   if (clear_user(memslot->dirty_bitmap, n)) {
+   r = -EFAULT;
+   goto out;
+   }
memslot->is_dirty = false;
}
r = 0;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 4b074f1..2a31d2f 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1210,7 +1210,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
 
n = kvm_dirty_bitmap_bytes(memslot);
-   memset(memslot->dirty_bitmap, 0, n);
+   if (clear_user(memslot->dirty_bitmap, n)) {
+   r = -EFAULT;
+   goto out;
+   }
memslot->is_dirty = false;
}
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 023c7f8..32a3d94 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2760,40 +2760,37 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
/* If nothing is dirty, don't bother messing with page tables. */
if (memslot->is_dirty) {
struct kvm_memslots *slots, *old_slots;
-   unsigned long *dirty_bitmap;
+   unsigned long __user *dirty_bitmap;
+   unsigned long __user *dirty_bitmap_old;
 
spin_lock(&kvm->mmu_lock);
kvm_mmu_slot_remove_write_access(kvm, log->slot);
spin_unlock(&kvm->mmu_lock);
 
-   r = -ENOMEM;
-   dirty_bitmap = vmalloc(n);
-   if (!dirty_bitmap)
+   dirty_bitmap = memslot->dirty_bitmap;
+   dirty_bitmap_old = memslot->dirty_bitmap_old;
+   r = -EFAULT;
+   if (clear_user(dirty_bitmap_old, n))
goto out;
-   memset(dirty_bitmap, 0, n);
 
r = -ENOMEM;
slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
-   if (!slots) {
-   vfree(dirty_bitmap);
+   if (!slots)
goto out;
-   }
+
memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
-   slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
+   slots->memslots[log->slot].dirty_bitmap = dirty_bitmap_old;
+   slots->memslots[log->slot].dirty_bitmap_old = dirty_bitmap;
slots->memslots[log->slot].is_dirty = false;
 
   

[RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps

2010-05-04 Thread Takuya Yoshikawa
Now that dirty bitmaps are accessible from user space, we export the
addresses of these to achieve zero-copy dirty log check:

  KVM_GET_USER_DIRTY_LOG_ADDR

We also need an API for triggering dirty bitmap switch to take the full
advantage of double buffered bitmaps.

  KVM_SWITCH_DIRTY_LOG

See the documentation in this patch for precise explanations.

About performance improvement: the most important feature of the switch API
is its lightness. In our tests, this appeared in the form of improved
responsiveness to GUI manipulations.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
CC: Avi Kivity a...@redhat.com
CC: Alexander Graf ag...@suse.de
---
 Documentation/kvm/api.txt |   87 +
 arch/ia64/kvm/kvm-ia64.c  |   27 +-
 arch/powerpc/kvm/book3s.c |   18 +++--
 arch/x86/kvm/x86.c|   44 ---
 include/linux/kvm.h   |   11 ++
 include/linux/kvm_host.h  |4 ++-
 virt/kvm/kvm_main.c   |   63 +
 7 files changed, 220 insertions(+), 34 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index a237518..c106c83 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -892,6 +892,93 @@ arguments.
 This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
 irqchip, the multiprocessing state must be maintained by userspace.
 
+4.39 KVM_GET_USER_DIRTY_LOG_ADDR
+
+Capability: KVM_CAP_USER_DIRTY_LOG (=1 see below)
+Architectures: all
+Type: vm ioctl
+Parameters: struct kvm_user_dirty_log (in/out)
+Returns: 0 on success, -1 on error
+
+This ioctl makes it possible to use KVM_SWITCH_DIRTY_LOG (see 4.40) instead
+of the old dirty log manipulation by KVM_GET_DIRTY_LOG.
+
+A bit about KVM_CAP_USER_DIRTY_LOG
+
+Before this ioctl was introduced, dirty bitmaps for dirty page logging were
+allocated in the kernel's memory space.  But we have now moved them to user
+space to get more flexibility and performance.  To achieve this move without
+breaking compatibility, we will split the KVM_CAP_USER_DIRTY_LOG capability
+into a few generations which can be identified by its check extension
+return values.
+
+This KVM_GET_USER_DIRTY_LOG_ADDR belongs to the first generation with the
+KVM_SWITCH_DIRTY_LOG (4.40) and must be supported by all generations.
+
+What you get
+
+By using this, you can get paired bitmap addresses which are accessible from
+user space.  See the explanation in 4.40 for the roles of these two bitmaps.
+
+How to Get
+
+Before calling this, you have to set the slot member of kvm_user_dirty_log
+to indicate the target memory slot.
+
+struct kvm_user_dirty_log {
+   __u32 slot;
+   __u32 flags;
+   __u64 dirty_bitmap;
+   __u64 dirty_bitmap_old;
+};
+
+The addresses will be set in the paired members: dirty_bitmap and _old.
+
+Note
+
+In generation 1, we support bitmaps which are created in kernel but do not
+support bitmaps created by users.  This means bitmap creation/destruction
+are done the same as before when you instruct KVM via KVM_SET_USER_MEMORY_REGION
+(see 4.34) to start/stop logging.  Please do not try to free the exported
+bitmaps yourself, or KVM will access the freed area and fault.
+
+4.40 KVM_SWITCH_DIRTY_LOG
+
+Capability: KVM_CAP_USER_DIRTY_LOG (=1 see 4.39)
+Architectures: all
+Type: vm ioctl
+Parameters: memory slot id
+Returns: 0 if switched, 1 if not (slot was clean), -1 on error
+
+This ioctl allows you to switch the dirty log to the next one: a newer
+ioctl for getting dirty page logs than KVM_GET_DIRTY_LOG (see 4.7 for the
+explanation about dirty page logging, log format is not changed).
+
+If you have the capability KVM_CAP_USER_DIRTY_LOG, using this is strongly
+recommended over KVM_GET_DIRTY_LOG because it does not need a buffer
+copy between kernel and user space.
+
+How to Use
+
+Before calling this, you have to remember the paired addresses of dirty
+bitmaps which can be obtained by KVM_GET_USER_DIRTY_LOG_ADDR (see 4.39):
+dirty_bitmap (being used now by kernel) and dirty_bitmap_old (not being
+used now and containing the last log).
+
+After calling this, the roles of these bitmaps change as follows:
+If the return value was 0, the kernel cleared dirty_bitmap_old and began to
+use it for the next logging, so that you can use the cold dirty_bitmap to
+check the log since the last switch.  If the return value was 1, no pages
+were dirty and no bitmap switch was done.  Note that remembering which bitmap
+is currently active is your responsibility, so you must update your record
+whenever you get a return value of 0.
+
+Note
+
+Bitmap switch may also occur when you call KVM_GET_DIRTY_LOG.  Please use
+only one of the two, preferably this one, to avoid extra confusion.  We do
+not guarantee under which conditions KVM_GET_DIRTY_LOG causes a bitmap switch.
+
 5. The kvm_run structure
 
 Application code obtains a 

[RFC][PATCH 12/12 sample] qemu-kvm: use new API for getting/switching dirty bitmaps

2010-05-04 Thread Takuya Yoshikawa
We use the new API for lightweight dirty log access if KVM supports it.

This conflicts with Marcelo's patches. So please take this as a sample patch.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 kvm/include/linux/kvm.h |   11 ++
 qemu-kvm.c  |   81 ++-
 qemu-kvm.h  |1 +
 3 files changed, 85 insertions(+), 8 deletions(-)

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index 6485981..efd9538 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -317,6 +317,14 @@ struct kvm_dirty_log {
};
 };
 
+/* for KVM_GET_USER_DIRTY_LOG_ADDR */
+struct kvm_user_dirty_log {
+   __u32 slot;
+   __u32 flags;
+   __u64 dirty_bitmap;
+   __u64 dirty_bitmap_old;
+};
+
 /* for KVM_SET_SIGNAL_MASK */
 struct kvm_signal_mask {
__u32 len;
@@ -499,6 +507,7 @@ struct kvm_ioeventfd {
 #define KVM_CAP_PPC_SEGSTATE 43
 
 #define KVM_CAP_PCI_SEGMENT 47
+#define KVM_CAP_USER_DIRTY_LOG 55
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -595,6 +604,8 @@ struct kvm_clock_data {
struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR  _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
#define KVM_GET_USER_DIRTY_LOG_ADDR _IOW(KVMIO,  0x49, struct kvm_user_dirty_log)
+#define KVM_SWITCH_DIRTY_LOG  _IO(KVMIO,   0x4a)
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP_IO(KVMIO,   0x60)
 #define KVM_IRQ_LINE  _IOW(KVMIO,  0x61, struct kvm_irq_level)
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 91f0222..98777f0 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -143,6 +143,8 @@ struct slot_info {
 unsigned long userspace_addr;
 unsigned flags;
 int logging_count;
+unsigned long *dirty_bitmap;
+unsigned long *dirty_bitmap_old;
 };
 
 struct slot_info slots[KVM_MAX_NUM_MEM_REGIONS];
@@ -232,6 +234,29 @@ int kvm_is_containing_region(kvm_context_t kvm, unsigned long phys_addr,
 return 1;
 }
 
+static int kvm_user_dirty_log_works(void)
+{
+return kvm_state->user_dirty_log;
+}
+
+static int kvm_set_user_dirty_log(int slot)
+{
+int r;
+struct kvm_user_dirty_log dlog;
+
+dlog.slot = slot;
+r = kvm_vm_ioctl(kvm_state, KVM_GET_USER_DIRTY_LOG_ADDR, &dlog);
+if (r < 0) {
+DPRINTF("KVM_GET_USER_DIRTY_LOG_ADDR failed: %s\n", strerror(-r));
+return r;
+}
+slots[slot].dirty_bitmap = (unsigned long *)
+   ((unsigned long)dlog.dirty_bitmap);
+slots[slot].dirty_bitmap_old = (unsigned long *)
+   ((unsigned long)dlog.dirty_bitmap_old);
+return r;
+}
+
 /*
  * dirty pages logging control
  */
@@ -265,8 +290,16 @@ static int kvm_dirty_pages_log_change(kvm_context_t kvm,
 DPRINTF("slot %d start %llx len %llx flags %x\n",
 mem.slot, mem.guest_phys_addr, mem.memory_size, mem.flags);
 r = kvm_vm_ioctl(kvm_state, KVM_SET_USER_MEMORY_REGION, mem);
-if (r < 0)
+if (r < 0) {
 fprintf(stderr, "%s: %m\n", __FUNCTION__);
+return r;
+}
+}
+if (flags & KVM_MEM_LOG_DIRTY_PAGES) {
+r = kvm_set_user_dirty_log(slot);
+} else {
+slots[slot].dirty_bitmap = NULL;
+slots[slot].dirty_bitmap_old = NULL;
 }
 return r;
 }
@@ -589,7 +622,6 @@ int kvm_register_phys_mem(kvm_context_t kvm,
   unsigned long phys_start, void *userspace_addr,
   unsigned long len, int log)
 {
-
 struct kvm_userspace_memory_region memory = {
 .memory_size = len,
 .guest_phys_addr = phys_start,
@@ -608,6 +640,9 @@ int kvm_register_phys_mem(kvm_context_t kvm,
 fprintf(stderr, "create_userspace_phys_mem: %s\n", strerror(-r));
 return -1;
 }
+if (log) {
+r = kvm_set_user_dirty_log(memory.slot);
+}
 register_slot(memory.slot, memory.guest_phys_addr, memory.memory_size,
   memory.userspace_addr, memory.flags);
 return 0;
@@ -652,6 +687,8 @@ void kvm_destroy_phys_mem(kvm_context_t kvm, unsigned long phys_start,
 fprintf(stderr, "destroy_userspace_phys_mem: %s", strerror(-r));
 return;
 }
+slots[memory.slot].dirty_bitmap = NULL;
+slots[memory.slot].dirty_bitmap_old = NULL;
 
 free_slot(memory.slot);
 }
@@ -692,6 +729,21 @@ int kvm_get_dirty_pages(kvm_context_t kvm, unsigned long phys_addr, void *buf)
 return kvm_get_map(kvm, KVM_GET_DIRTY_LOG, slot, buf);
 }
 
+static int kvm_switch_map(int slot)
+{
+int r;
+
+r = kvm_vm_ioctl(kvm_state, KVM_SWITCH_DIRTY_LOG, slot);
+if (r == 0) {
+unsigned long *dirty_bitmap;
+
+dirty_bitmap = slots[slot].dirty_bitmap;
+slots[slot].dirty_bitmap = slots[slot].dirty_bitmap_old;
+slots[slot].dirty_bitmap_old = dirty_bitmap;
+}
+return r;
+}
+
 int kvm_get_dirty_pages_range(kvm_context_t kvm, 

Re: [PATCH 1/3 v2] KVM MMU: make kvm_mmu_zap_page() return the number of zapped sp in total.

2010-05-04 Thread Marcelo Tosatti
On Mon, May 03, 2010 at 09:38:54PM +0800, Gui Jianfeng wrote:
 Hi Marcelo
 
 Actually, it doesn't only affect kvm_mmu_change_mmu_pages() but also
 kvm_mmu_remove_some_alloc_mmu_pages(), which is called by the mmu shrink
 routine. This causes the upper layer to get a wrong number, so I think it
 should be fixed. Here is an updated version.
 
 ---
 From: Gui Jianfeng guijianf...@cn.fujitsu.com
 
 Currently, in kvm_mmu_change_mmu_pages(kvm, page), used_pages-- is
 performed after calling kvm_mmu_zap_page(), regardless of whether the page
 was actually reclaimed. Because a root sp won't be reclaimed by
 kvm_mmu_zap_page(), making kvm_mmu_zap_page() return the total number of
 reclaimed sps makes more sense. A new flag is passed to kvm_mmu_zap_page()
 to indicate whether the top page was reclaimed.
 kvm_mmu_remove_some_alloc_mmu_pages() also relies on kvm_mmu_zap_page() to
 return the total reclaimed number.

Isn't it simpler to have kvm_mmu_zap_page return the number of pages it
actually freed? Then always restart the hash walk if the return is positive.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch uq/master 0/9] enable smp 1 and related fixes

2010-05-04 Thread Marcelo Tosatti



[patch uq/master 1/9] kvm: set cpu_single_env around KVM_RUN ioctl

2010-05-04 Thread Marcelo Tosatti
Zero cpu_single_env before leaving global lock protection, and
restore on return.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/kvm-all.c
===
--- qemu.orig/kvm-all.c
+++ qemu/kvm-all.c
@@ -846,9 +846,11 @@ int kvm_cpu_exec(CPUState *env)
 }
 
 kvm_arch_pre_run(env, run);
+cpu_single_env = NULL;
 qemu_mutex_unlock_iothread();
 ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
 qemu_mutex_lock_iothread();
+cpu_single_env = env;
 kvm_arch_post_run(env, run);
 
 if (ret == -EINTR || ret == -EAGAIN) {




[patch uq/master 8/9] kvm: validate context for kvm cpu get/put operations

2010-05-04 Thread Marcelo Tosatti
From: Jan Kiszka jan.kis...@siemens.com

Validate that KVM vcpu state is only read/written from cpu thread itself
or that cpu is stopped.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/target-i386/kvm.c
===
--- qemu.orig/target-i386/kvm.c
+++ qemu/target-i386/kvm.c
@@ -949,6 +949,8 @@ int kvm_arch_put_registers(CPUState *env
 {
 int ret;
 
+assert(cpu_is_stopped(env) || qemu_cpu_self(env));
+
 ret = kvm_getput_regs(env, 1);
 if (ret < 0)
 return ret;
@@ -991,6 +993,8 @@ int kvm_arch_get_registers(CPUState *env
 {
 int ret;
 
+assert(cpu_is_stopped(env) || qemu_cpu_self(env));
+
 ret = kvm_getput_regs(env, 0);
 if (ret < 0)
 return ret;




KVM call minutes for May 4

2010-05-04 Thread Chris Wright

KVM Forum topic ideas
- mgmt interface (qemud)
- working breakout sessions are welcome at the Forum

stable tree
- have a volunteer (thanks Justin)
- Anthony will write up proposal which is basically
  - bug fixes actively proposed for stable tree
  - stable maintainer collects and applies
  - periodically release and re-sync w/ main tree

0.12.4?
- RSN...will tag and push shortly


Re: [PATCH] KVM: VMX: Avoid writing HOST_CR0 every entry

2010-05-04 Thread Marcelo Tosatti
On Mon, May 03, 2010 at 05:18:54PM +0300, Avi Kivity wrote:
 cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each
 entry.  That is slow, so instead, set HOST_CR0 to have TS set unconditionally
 (which is a safe value), and issue a clts() just before exiting vcpu context
 if the task indeed owns the fpu.
 
 Saves ~50 cycles/exit.
 
 Signed-off-by: Avi Kivity a...@redhat.com

Looks good to me.



Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro

2010-05-04 Thread Arnd Bergmann
On Tuesday 04 May 2010, Takuya Yoshikawa wrote:
 
 Although we can use the *_le_bit() helpers to treat bitmaps as
 little-endian arranged, having the le bit offset calculation as a separate
 macro gives us more freedom.
 
 For example, KVM has le-arranged dirty bitmaps for VGA and live migration,
 and they are used in user space too. To avoid bitmap copies between kernel
 and user space, we want to update the bitmaps in user space directly.
 To achieve this, the le bit offset together with the *_user() functions
 helps us a lot.
 
 So let us use the le bit offset calculation part by defining it as a new
 macro: generic_le_bit_offset().

Does this work correctly if your user space is 32 bits (i.e. unsigned long
is a different size in user space and kernel), on both big- and
little-endian systems?

I'm not sure about all the details, but I think you cannot in general share
bitmaps between user space and kernel because of this.

Arnd


[ kvm-Bugs-2996643 ] qemu-kvm -smb crashes with Samba 3.5.2

2010-05-04 Thread SourceForge.net
Bugs item #2996643, was opened at 2010-05-04 17:23
Message generated for change (Tracker Item Submitted) made by edolstra
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2996643group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Eelco Dolstra (edolstra)
Assigned to: Nobody/Anonymous (nobody)
Summary: qemu-kvm -smb crashes with Samba 3.5.2

Initial Comment:
qemu-kvm 0.12.3 dies with a SIGTERM when using the -smb flag as soon as I try 
to unmount the SMB/CIFS filesystem.  For instance, this sequence of commands in 
the Linux guest:

mount.cifs //10.0.2.4/qemu /hostfs -o guest,username=nobody
umount /hostfs

causes qemu-kvm to crash almost immediately because it receives a TERM signal 
that its smbd child process sends to the process group, as strace shows:

[pid 27982] kill(0, SIGTERM)= 0
[pid 27982] --- SIGTERM (Terminated) @ 0 (0) ---
Process 27982 detached
[pid 27980] <... timer_settime resumed> NULL) = 0
[pid 27980] --- SIGTERM (Terminated) @ 0 (0) ---
[pid 27980] --- SIGCHLD (Child exited) @ 0 (0) ---

(pid 27982 is the smbd child, 27980 is qemu-system-x86_64)

This works fine with Samba 3.3.3 but not with 3.3.12 or 3.5.2, so 
apparently something changed there.

Host machine runs NixOS (Linux 2.6.32.11), 32-bit, Intel(R) Core(TM)2 Duo CPU 
T7700 @ 2.40GHz.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2996643group_id=180599


kvm guest: hrtimer issue with smp.

2010-05-04 Thread Stuart Sheldon
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi all,

Saw a long thread on this back in Oct. but did not see if this was
resolved or not.

We are experiencing "hrtimer: interrupt took huge_number ns" errors on
the guest console with -smp 2 running on an Intel host. Both are running
2.6.33.3 vanilla. Guest is using kvm-timer, Host is using tsc.

Guest will become completely unresponsive after about 23hrs. Review of
logs after restart shows that the system suddenly shows a system time of
2 weeks in the past.

Not sure what else to report here. I've changed the guest to a single
CPU, and the problem appears to be gone...

Let me know how we can help on this...

Stuart Sheldon
ACT USA


- --
If you took all the girls I knew When I was single And brought
them all together for one night I know they'd never match My sweet
imagination And everything looks worse in black and white
   -- Paul Simon - Kodachrome Lyrics
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iQIcBAEBCAAGBQJL4D0/AAoJEFKVLITDJSGSFdgQALYfHEQnFuOKxPAthFZpfV9G
OZT2U0r1drwcxEPDXWlMxX+sOOItMjYkCdfJ4l4dlnOFRYigXP3puuYVocF2fXNW
2gJM7fGBl9uzm/jRBeuoaF1SX8qxp0oBzIj0G+3wstzDX/3f04T0bm/32QeyMgDH
WmM2ElFlwATaVwsj+AIYyEvdFKMHW7erNNb6PVHUvTNv/SLLb7XII7jnVsBsJYZZ
zz1YjhpTUFViKOobkD7nkbRCuhzEF86zKNCi4q5YnGFVBcX5v+qOExtqW/kRs0tv
ibQZ0E8P+uuItdnSoo0hIIWzfetdiEvLp1ZJN1aAazyNypbtH7GIOQezOBQbSN4N
Wzc53bxYL2Cer1/XnhtvgWq3TJc2a0/RiMFNh3fuCiY2WM80/NhdE+tMOXidKlOq
vK3mSWlyu1GOxep2yoH8XQzLqTzDBeYeSaEsL8CGHtn92aim/lrgCoU3W4n9K+mT
cHw1iEUmD9BZFEPAUj6bNwjvOQ0UPvhpiBlyrm6qoU9zK8ALAVTvCZj+9ZTv9tYJ
s28A9UJJ+qGQ1FSGBVpWrhfSILpir+lGJFPRpiAXSqzYC/8i0ZkYJBDVi8LtX/Mf
Q7NLVzXk6RgHcz3y/vr5ICjbyziiROFI8GbhmqtWobknT6AFaCr7rOHNoqbt91qW
zjE2ZS381fcAqCYzQ+CA
=KvAi
-----END PGP SIGNATURE-----


[PATCH] KVM testing: New winutils.iso SHA1, refactoring some code

2010-05-04 Thread Lucas Meneghel Rodrigues
Make it possible to download the winutils.iso file right
from its repository, making very convenient for users
to perform windows testing.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/get_started.py |   65 +++---
 1 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/client/tests/kvm/get_started.py b/client/tests/kvm/get_started.py
index 3a6f20f..870485b 100755
--- a/client/tests/kvm/get_started.py
+++ b/client/tests/kvm/get_started.py
@@ -11,6 +11,34 @@ from autotest_lib.client.common_lib import logging_manager
 from autotest_lib.client.bin import utils, os_dep
 
 
+def check_iso(url, destination, hash):
+    """
+    Verifies if ISO that can be found on url is on destination with right hash.
+
+    This function will verify the SHA1 hash of the ISO image. If the file
+    turns out to be missing or corrupted, let the user know we can download it.
+
+    @param url: URL where the ISO file can be found.
+    @param destination: Directory in local disk where we'd like the iso to be.
+    @param hash: SHA1 hash for the ISO image.
+    """
+    logging.info("Verifying iso %s", os.path.basename(url))
+    if not destination:
+        os.makedirs(destination)
+    iso_path = os.path.join(destination, os.path.basename(url))
+    if not os.path.isfile(iso_path) or (
+            utils.hash_file(iso_path, method="sha1") != hash):
+        logging.warning("%s not found or corrupted", iso_path)
+        logging.warning("Would you like to download it? (y/n)")
+        iso_download = raw_input()
+        if iso_download == 'y':
+            utils.unmap_url_cache(destination, url, hash, method="sha1")
+        else:
+            logging.warning("Missing file %s. Please download it", iso_path)
+    else:
+        logging.debug("%s present, with proper checksum", iso_path)
+
+
 if __name__ == "__main__":
 logging_manager.configure_logging(kvm_utils.KvmLoggingConfig(),
   verbose=True)
@@ -51,28 +79,27 @@ if __name__ == "__main__":
 else:
 logging.debug("Config file %s exists, not touching" % dst_file)
 
-logging.info("3 - Verifying iso (make sure we have the OS iso needed for "
+logging.info("3 - Verifying iso (make sure we have the OS ISO needed for "
              "the default test set)")
-base_iso_name = "Fedora-12-x86_64-DVD.iso"
+
+iso_name = "Fedora-12-x86_64-DVD.iso"
 fedora_dir = "pub/fedora/linux/releases/12/Fedora/x86_64/iso"
 url = os.path.join("http://download.fedoraproject.org/", fedora_dir,
-                   base_iso_name)
-md5sum = "6dd31e292cc2eb1140544e9b1ba61c56"
-iso_dir = os.path.join(base_dir, 'isos', 'linux')
-if not iso_dir:
-    os.makedirs(iso_dir)
-iso_path = os.path.join(iso_dir, base_iso_name)
-if not os.path.isfile(iso_path) or (
-        utils.hash_file(iso_path, method="md5") != md5sum):
-    logging.warning("%s not found or corrupted", iso_path)
-    logging.warning("Would you like to download it? (y/n)")
-    iso_download = raw_input()
-    if iso_download == 'y':
-        utils.unmap_url_cache(iso_dir, url, md5sum)
-    else:
-        logging.warning("Missing file %s. Please download it", iso_path)
-else:
-    logging.debug("%s present, with proper checksum", iso_path)
+                   iso_name)
+hash = "97a018ba32d43d0e76d032834fe7562bffe8ceb3"
+destination = os.path.join(base_dir, 'isos', 'linux')
+check_iso(url, destination, hash)
+
+logging.info("4 - Verifying winutils.iso (make sure we have the utility "
+             "ISO needed for Windows testing)")
+
+logging.info("In order to run the KVM autotests in Windows guests, we "
+             "provide you an ISO that this script can download")
+
+url = "http://people.redhat.com/mrodrigu/kvm/winutils.iso"
+hash = "301da394fe840172188a32f8ba01524993baa0cb"
+destination = os.path.join(base_dir, 'isos', 'windows')
+check_iso(url, destination, hash)
 
 logging.info("4 - Checking if qemu is installed (certify qemu and qemu-kvm "
              "are in the place the default config expects)")
-- 
1.7.0.1



Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro

2010-05-04 Thread Avi Kivity

On 05/04/2010 06:03 PM, Arnd Bergmann wrote:

On Tuesday 04 May 2010, Takuya Yoshikawa wrote:
   

Although we can use the *_le_bit() helpers to treat bitmaps as
little-endian arranged, having the le bit offset calculation as a separate
macro gives us more freedom.

For example, KVM has le-arranged dirty bitmaps for VGA and live migration,
and they are used in user space too. To avoid bitmap copies between kernel
and user space, we want to update the bitmaps in user space directly.
To achieve this, the le bit offset together with the *_user() functions
helps us a lot.

So let us use the le bit offset calculation part by defining it as a new
macro: generic_le_bit_offset().
 

Does this work correctly if your user space is 32 bits (i.e. unsigned long
is a different size in user space and kernel), on both big- and
little-endian systems?

I'm not sure about all the details, but I think you cannot in general share
bitmaps between user space and kernel because of this.
   


That's why the bitmaps are defined as little-endian and u64-aligned, even 
on big-endian 32-bit systems.  Little-endian bitmaps are word-size 
agnostic, and u64 alignment ensures we can use long-sized bitops on 
mixed-size systems.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: Fix wallclock version writing race

2010-05-04 Thread Avi Kivity

On 05/04/2010 03:02 PM, Avi Kivity wrote:

Wallclock writing uses an unprotected global variable to hold the version;
this can cause one guest to interfere with another if both write their
wallclock at the same time.

Signed-off-by: Avi Kivity a...@redhat.com
   


This was pointed out by Naphtali.


diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f6f8dad..c3152d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -754,14 +754,22 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)

  static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
  {
-   static int version;
+   int version;
+   int r;
struct pvclock_wall_clock wc;
struct timespec boot;

if (!wall_clock)
return;

-   version++;
+   r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
+   if (r)
+   return;
+
+   if (version & 1)
+   ++version;  /* first time write, random junk */
+
+   ++version;

 kvm_write_guest(kvm, wall_clock, &version, sizeof(version));

   



--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: Fix wallclock version writing race

2010-05-04 Thread Glauber Costa
On Tue, May 04, 2010 at 03:02:24PM +0300, Avi Kivity wrote:
 Wallclock writing uses an unprotected global variable to hold the version;
 this can cause one guest to interfere with another if both write their
 wallclock at the same time.
 
makes sense to me.

ACK.




Re: Booting/installing WindowsNT

2010-05-04 Thread Avi Kivity

On 05/04/2010 06:27 PM, Andre Przywara wrote:



3.  In all other cases so far it BSoDs with STOP 0x3E error
  right before displaying that kernel message.

MSDN talks about a mulitprocessor configuration error:
http://msdn.microsoft.com/en-us/library/ms819006.aspx
I suspected that the offline CPUs in the mptable confuse NT, but -smp 
1,maxcpus=1 does not make a difference. I will try to dig deeper in 
this area.




What about disabling ACPI?  smp should still work through the mptable.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: Get rid of KVM_REQ_KICK

2010-05-04 Thread Marcelo Tosatti
On Mon, May 03, 2010 at 05:19:08PM +0300, Avi Kivity wrote:
 KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
 operation.  This causes the fast path check for a clear vcpu->requests
 to fail all the time, triggering tons of atomic operations.

Avi,

Do you have numbers? 

 Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.
 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  arch/x86/kvm/x86.c   |   17 ++---
  include/linux/kvm_host.h |1 +
  2 files changed, 11 insertions(+), 7 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 6b2ce1d..307094a 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -4499,13 +4499,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
   if (vcpu->fpu_active)
   kvm_load_guest_fpu(vcpu);
  
 - local_irq_disable();
 + atomic_set(&vcpu->guest_mode, 1);
 + smp_wmb();

IPI can trigger here?

 - clear_bit(KVM_REQ_KICK, &vcpu->requests);
 - smp_mb__after_clear_bit();
 + local_irq_disable();
  
 - if (vcpu->requests || need_resched() || signal_pending(current)) {
 - set_bit(KVM_REQ_KICK, &vcpu->requests);
 + if (!atomic_read(&vcpu->guest_mode) || vcpu->requests
 + || need_resched() || signal_pending(current)) {
 + atomic_set(&vcpu->guest_mode, 0);
 + smp_wmb();
   local_irq_enable();
   preempt_enable();
   r = 1;
 @@ -4550,7 +4552,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
   if (hw_breakpoint_active())
   hw_breakpoint_restore();
  
 - set_bit(KVM_REQ_KICK, &vcpu->requests);
 + atomic_set(&vcpu->guest_mode, 0);
 + smp_wmb();
   local_irq_enable();
  
   ++vcpu->stat.exits;
 @@ -5470,7 +5473,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
  
   me = get_cpu();
   if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
 - if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests))
 + if (atomic_xchg(&vcpu->guest_mode, 0))
   smp_send_reschedule(cpu);
   put_cpu();
  }
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index ce027d5..a020fa2 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -81,6 +81,7 @@ struct kvm_vcpu {
   int vcpu_id;
   struct mutex mutex;
   int   cpu;
 + atomic_t guest_mode;
   struct kvm_run *run;
   unsigned long requests;
   unsigned long guest_debug;
 -- 
 1.7.0.4
 


Re: [PATCH] KVM: Get rid of KVM_REQ_KICK

2010-05-04 Thread Avi Kivity

On 05/04/2010 07:31 PM, Marcelo Tosatti wrote:

On Mon, May 03, 2010 at 05:19:08PM +0300, Avi Kivity wrote:
   

KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation.  This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.
 

Avi,

Do you have numbers?
   


Forgot to post, was about 100 cycles (I expected more, all those atomics 
really show up in the profile).



Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

Signed-off-by: Avi Kivity a...@redhat.com
---
  arch/x86/kvm/x86.c   |   17 ++---
  include/linux/kvm_host.h |1 +
  2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..307094a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4499,13 +4499,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (vcpu->fpu_active)
kvm_load_guest_fpu(vcpu);

-   local_irq_disable();
+   atomic_set(&vcpu->guest_mode, 1);
+   smp_wmb();
 

IPI can trigger here?
   


It can...

   

-   clear_bit(KVM_REQ_KICK, &vcpu->requests);
-   smp_mb__after_clear_bit();
+   local_irq_disable();

-   if (vcpu->requests || need_resched() || signal_pending(current)) {
-   set_bit(KVM_REQ_KICK, &vcpu->requests);
+   if (!atomic_read(&vcpu->guest_mode) || vcpu->requests
+   || need_resched() || signal_pending(current)) {
 


... and we'll detect that guest_mode was cleared and go back.


+   atomic_set(&vcpu->guest_mode, 0);
+   smp_wmb();
local_irq_enable();
preempt_enable();
r = 1;
@@ -4550,7 +4552,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (hw_breakpoint_active())
hw_breakpoint_restore();

-   set_bit(KVM_REQ_KICK, &vcpu->requests);
+   atomic_set(&vcpu->guest_mode, 0);
+   smp_wmb();
local_irq_enable();

 ++vcpu->stat.exits;
@@ -5470,7 +5473,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)

me = get_cpu();
 if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
-   if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests))
+   if (atomic_xchg(&vcpu->guest_mode, 0))
smp_send_reschedule(cpu);
put_cpu();
 


The atomic_xchg() does the trick.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath - Assertion

2010-05-04 Thread Christoph Hellwig
On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
 Great, I'm going to submit it as a proper patch then.
 
 Christoph, by now I'm pretty sure it's right, but can you have another
 look if this is correct, anyway?

It looks correct to me - we really shouldn't update the fields
until bdrv_aio_cancel has returned.  In fact we can't cancel a request
more often than not, so there's a fairly high chance it will
complete.


Reviewed-by: Christoph Hellwig h...@lst.de


Re: [PATCH 1/2] x86: eliminate TS_XSAVE

2010-05-04 Thread Suresh Siddha
On Sun, 2010-05-02 at 07:53 -0700, Avi Kivity wrote:
 The fpu code currently uses current->thread_info->status & TS_XSAVE as
 a way to distinguish between XSAVE capable processors and older processors.
 The decision is not really task specific; instead we use the task status to
 avoid a global memory reference - the value should be the same across all
 threads.
 
 Eliminate this tie-in into the task structure by using an alternative
 instruction keyed off the XSAVE cpu feature; this results in shorter and
 faster code, without introducing a global memory reference.
 
 Signed-off-by: Avi Kivity a...@redhat.com

Acked-by: Suresh Siddha suresh.b.sid...@intel.com



Re: [PATCH 2/2] x86: Introduce 'struct fpu' and related API

2010-05-04 Thread Suresh Siddha
On Sun, 2010-05-02 at 07:53 -0700, Avi Kivity wrote:
 Currently all fpu state access is through tsk->thread.xstate.  Since we wish
 to generalize fpu access to non-task contexts, wrap the state in a new
 'struct fpu' and convert existing access to use an fpu API.
 
 Signal frame handlers are not converted to the API since they will remain
 task context only things.
 
 Signed-off-by: Avi Kivity a...@redhat.com

One comment I have is the name 'fpu'. In the future we can use this for
non-fpu state as well. For now, I can't think of a simple and better name.
We can perhaps change it in the future.

Acked-by: Suresh Siddha suresh.b.sid...@intel.com



Re: [PATCH 1/2] x86: eliminate TS_XSAVE

2010-05-04 Thread Suresh Siddha
On Tue, 2010-05-04 at 00:41 -0700, Avi Kivity wrote:
 On 05/04/2010 12:45 AM, H. Peter Anvin wrote:
 
  I was trying to avoid a performance regression relative to the current
  code, as it appears that some care was taken to avoid the memory reference.
 
  I agree that it's probably negligible compared to the save/restore
  code.  If the x86 maintainers agree as well, I'll replace it with
  cpu_has_xsave.
 
   
  I asked Suresh to comment on this, since he wrote the original code.  He
  did confirm that the intent was to avoid a global memory reference.
 
 
 
 Ok, so you're happy with the patch as is?

As use_xsave() is in the hot context switch path, I would like to go
with Avi's proposal.

thanks,
suresh



Re: [PATCH 1/2] x86: eliminate TS_XSAVE

2010-05-04 Thread H. Peter Anvin
On 05/04/2010 11:15 AM, Suresh Siddha wrote:
 On Tue, 2010-05-04 at 00:41 -0700, Avi Kivity wrote:
 On 05/04/2010 12:45 AM, H. Peter Anvin wrote:

 I was trying to avoid a performance regression relative to the current
 code, as it appears that some care was taken to avoid the memory reference.

 I agree that it's probably negligible compared to the save/restore
 code.  If the x86 maintainers agree as well, I'll replace it with
 cpu_has_xsave.

  
 I asked Suresh to comment on this, since he wrote the original code.  He
 did confirm that the intent was to avoid a global memory reference.



 Ok, so you're happy with the patch as is?
 
 As use_xsave() is in the hot context switch path, I would like to go
 with Avi's proposal.
 

I would tend to agree.  Saving a likely cache miss in the hot context
switch path is worthwhile.

I would like to request one change, however.  I would like to see the
alternatives code to be:

movb $0,reg
movb $1,reg

... instead of using xor (which has to be padded with NOPs, which is of
course pointless since the slot is a fixed size.)  I would suggest using
a byte-sized variable instead of a dword-size variable to save a few
bytes, too.

Once the jump label framework is integrated and has matured, I think we
should consider using it to save the mov/test/jump.

-hpa


Re: virtio: put last_used and last_avail index into ring itself.

2010-05-04 Thread Michael S. Tsirkin
 virtio: put last_used and last_avail index into ring itself.
 
 Generally, the other end of the virtio ring doesn't need to see where
 you're up to in consuming the ring.  However, to completely understand
 what's going on from the outside, this information must be exposed.
 For example, if you want to save and restore a virtio_ring, but you're
 not the consumer because the kernel is using it directly.
 
 Fortunately, we have room to expand: the ring is always a whole number
 of pages and there's hundreds of bytes of padding after the avail ring
 and the used ring, whatever the number of descriptors (which must be a
 power of 2).
 
 We add a feature bit so the guest can tell the host that it's writing
 out the current value there, if it wants to use that.
 
 Signed-off-by: Rusty Russell ru...@rustcorp.com.au

I've been looking at this patch some more (more on why
later), and I wonder: would it be better to add some
alignment to the last used index address, so that
if we later add more stuff at the tail, it all
fits in a single cache line?

We use a new feature bit anyway, so layout change should not be
a problem.

Since I raised the question of caches: for used ring,
the ring is not aligned to 64 bit, so on CPUs with 64 bit
or larger cache lines, used entries will often cross
cache line boundaries. Am I right and might it
have been better to align ring entries to cache line boundaries?

What do you think?

 ---
  drivers/virtio/virtio_ring.c |   23 +++
  include/linux/virtio_ring.h  |   12 +++-
  2 files changed, 26 insertions(+), 9 deletions(-)
 
 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -71,9 +71,6 @@ struct vring_virtqueue
 	/* Number we've added since last sync. */
 	unsigned int num_added;
 
 -	/* Last used index we've seen. */
 -	u16 last_used_idx;
 -
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	void (*notify)(struct virtqueue *vq);
 
 @@ -278,12 +275,13 @@ static void detach_buf(struct vring_virt
 
  static inline bool more_used(const struct vring_virtqueue *vq)
  {
 -	return vq->last_used_idx != vq->vring.used->idx;
 +	return vring_last_used(&vq->vring) != vq->vring.used->idx;
  }
 
  static void *vring_get_buf(struct virtqueue *_vq, unsigned int *len)
  {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 +	struct vring_used_elem *u;
 	void *ret;
 	unsigned int i;
 
 @@ -300,8 +298,11 @@ static void *vring_get_buf(struct virtqu
 		return NULL;
 	}
 
 -	i = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].id;
 -	*len = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].len;
 +	u = &vq->vring.used->ring[vring_last_used(&vq->vring) % vq->vring.num];
 +	i = u->id;
 +	*len = u->len;
 +	/* Make sure we don't reload i after doing checks. */
 +	rmb();
 
 	if (unlikely(i >= vq->vring.num)) {
 		BAD_RING(vq, "id %u out of range\n", i);
 @@ -315,7 +316,8 @@ static void *vring_get_buf(struct virtqu
 	/* detach_buf clears data, so grab it now. */
 	ret = vq->data[i];
 	detach_buf(vq, i);
 -	vq->last_used_idx++;
 +	vring_last_used(&vq->vring)++;
 +
 	END_USE(vq);
 	return ret;
  }
 @@ -402,7 +404,6 @@ struct virtqueue *vring_new_virtqueue(un
 	vq->vq.name = name;
 	vq->notify = notify;
 	vq->broken = false;
 -	vq->last_used_idx = 0;
 	vq->num_added = 0;
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 #ifdef DEBUG
 @@ -413,6 +414,10 @@ struct virtqueue *vring_new_virtqueue(un
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
 
 +	/* We publish indices whether they offer it or not: if not, it's junk
 +	 * space anyway.  But calling this acknowledges the feature. */
 +	virtio_has_feature(vdev, VIRTIO_RING_F_PUBLISH_INDICES);
 +
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback)
 		vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
 @@ -443,6 +448,8 @@ void vring_transport_features(struct vir
 		switch (i) {
 		case VIRTIO_RING_F_INDIRECT_DESC:
 			break;
 +		case VIRTIO_RING_F_PUBLISH_INDICES:
 +			break;
 		default:
 			/* We don't understand this bit. */
 			clear_bit(i, &vdev->features);
 diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
 --- a/include/linux/virtio_ring.h
 +++ b/include/linux/virtio_ring.h
 @@ -29,6 +29,9 @@
  /* We support indirect buffer descriptors */
  #define VIRTIO_RING_F_INDIRECT_DESC	28
 
 +/* We publish our last-seen used index at the end of the avail ring. */
 +#define VIRTIO_RING_F_PUBLISH_INDICES	29
 +
  /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
  struct vring_desc
  {
 @@ -87,6 +90,7 @@ struct vring {
 	 *	__u16 

[PATCH 0/2] fix kvmclock bug - memory corruption

2010-05-04 Thread Glauber Costa
This patch series fixes a bug I just found with kvmclock,
when I booted into a kernel without kvmclock enabled.

Since I am setting MSRs, I took the opportunity to
use yet another function from upstream qemu (patch 1).

Enjoy

Glauber Costa (2):
  replace set_msr_entry with kvm_msr_entry
  turn off kvmclock when resetting cpu

 qemu-kvm-x86.c|   58 -
 target-i386/kvm.c |3 ++
 2 files changed, 38 insertions(+), 23 deletions(-)



[PATCH 1/2] replace set_msr_entry with kvm_msr_entry

2010-05-04 Thread Glauber Costa
this is yet another function that upstream qemu implements,
so we can just use its implementation.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 qemu-kvm-x86.c|   39 ---
 target-i386/kvm.c |3 +++
 2 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 748ff69..439c31a 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -693,13 +693,6 @@ int kvm_arch_qemu_create_context(void)
 return 0;
 }
 
-static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
-                          uint64_t data)
-{
-    entry->index = index;
-    entry->data  = data;
-}
-
 /* returns 0 on success, non-0 on failure */
 static int get_msr_entry(struct kvm_msr_entry *entry, CPUState *env)
 {
@@ -960,19 +953,19 @@ void kvm_arch_load_regs(CPUState *env, int level)
     /* msrs */
     n = 0;
     /* Remember to increase msrs size if you add new registers below */
-    set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_CS,  env->sysenter_cs);
-    set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
-    set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
+    kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS,  env->sysenter_cs);
+    kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
+    kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
     if (kvm_has_msr_star)
-	set_msr_entry(&msrs[n++], MSR_STAR,  env->star);
+	kvm_msr_entry_set(&msrs[n++], MSR_STAR,  env->star);
     if (kvm_has_vm_hsave_pa)
-        set_msr_entry(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
+        kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
     if (lm_capable_kernel) {
-        set_msr_entry(&msrs[n++], MSR_CSTAR, env->cstar);
-        set_msr_entry(&msrs[n++], MSR_KERNELGSBASE,  env->kernelgsbase);
-        set_msr_entry(&msrs[n++], MSR_FMASK, env->fmask);
-        set_msr_entry(&msrs[n++], MSR_LSTAR, env->lstar);
+        kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
+        kvm_msr_entry_set(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
+        kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
+        kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
     }
 #endif
     if (level == KVM_PUT_FULL_STATE) {
@@ -983,20 +976,20 @@ void kvm_arch_load_regs(CPUState *env, int level)
          * huge jump-backs that would occur without any writeback at all.
          */
         if (smp_cpus == 1 || env->tsc != 0) {
-            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+            kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
         }
-        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
-        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
     }
 #ifdef KVM_CAP_MCE
     if (env->mcg_cap) {
         if (level == KVM_PUT_RESET_STATE)
-            set_msr_entry(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
         else if (level == KVM_PUT_FULL_STATE) {
-            set_msr_entry(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-            set_msr_entry(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
             for (i = 0; i < (env->mcg_cap & 0xff); i++)
-                set_msr_entry(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
+                kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
         }
     }
 #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5239eaf..56740bd 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -552,6 +552,8 @@ static int kvm_put_sregs(CPUState *env)
     return kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);
 }
 
+#endif
+
 static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
                               uint32_t index, uint64_t value)
 {
@@ -559,6 +561,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
     entry->data = value;
 }
 
+#ifdef KVM_UPSTREAM
 static int kvm_put_msrs(CPUState *env, int level)
 {
 struct {
-- 
1.6.2.2



[PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-04 Thread Glauber Costa
Currently, in the linux kernel, we reset kvmclock if we are rebooting
into a crash kernel through kexec. The rationale, is that a new kernel
won't follow the same memory addresses, and the memory where kvmclock is
located in the first kernel, will be something else in the second one.

We don't do it in normal reboots, because the second kernel ends up
registering kvmclock again, which has the effect of turning off the
first instance.

This is, however, totally wrong. This assumes we're booting into
a kernel that also has kvmclock enabled. If for some reason we reboot
into something that doesn't do kvmclock, including but not limited to:
 * rebooting into an older kernel without kvmclock support,
 * rebooting with no-kvmclock,
 * rebooting into another OS,

we'll simply have the hypervisor writing into a random memory position
into the guest. Neat, uh?

Moreover, I believe the fix belongs in qemu, since it is the entity
more prepared to detect all kinds of reboots (by means of a cpu_reset),
not to mention the presence of misbehaving guests, that can forget
to turn kvmclock off.

This patch fixes the issue for me.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 qemu-kvm-x86.c |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 439c31a..4b94e04 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1417,8 +1417,27 @@ void kvm_arch_push_nmi(void *opaque)
 }
 #endif /* KVM_CAP_USER_NMI */
 
+static int kvm_turn_off_clock(CPUState *env)
+{
+    struct {
+        struct kvm_msrs info;
+        struct kvm_msr_entry entries[100];
+    } msr_data;
+
+    struct kvm_msr_entry *msrs = msr_data.entries;
+    int n = 0;
+
+    kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, 0);
+    kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, 0);
+    msr_data.info.nmsrs = n;
+
+    return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
+}
+
+
 void kvm_arch_cpu_reset(CPUState *env)
 {
+    kvm_turn_off_clock(env);
 kvm_arch_reset_vcpu(env);
 kvm_reset_mpstate(env);
 }
-- 
1.6.2.2



Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Christoph Hellwig
On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote:
 I took a stab at documenting CMD and FLUSH request types in virtio
 block.  Christoph, could you look over this please?
 
 I note that the interface seems full of warts to me,
 this might be a first step to cleaning them.

The whole virtio-blk interface is full of warts.  It has been
extended rather ad-hoc, so that is rather expected.

 One issue I struggled with especially is how type
 field mixes bits and non-bit values. I ended up
 simply defining all legal values, so that we have
 CMD = 2, CMD_OUT = 3 and so on.

It's basically a complete mess without much logic behind it.

 +\change_unchanged
 +the high bit
 +\change_inserted 0 1266497301
 + (VIRTIO_BLK_T_BARRIER)
 +\change_unchanged
 + indicates that this request acts as a barrier and that all preceeding 
 requests
 + must be complete before this one, and all following requests must not be
 + started until this is complete.
 +
 +\change_inserted 0 1266504385
 + Note that a barrier does not flush caches in the underlying backend device
 + in host, and thus does not serve as data consistency guarantee.
 + Driver must use FLUSH request to flush the host cache.
 +\change_unchanged

I'm not sure it's even worth documenting it.  I can't see any way to
actually implement safe behaviour with the VIRTIO_BLK_T_BARRIER-style
barriers.


Btw, did I mention that .lyx is a really horrible format to review
diffs for?  Plain LaTeX would be a lot better.


Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Christoph Hellwig
On Tue, Apr 20, 2010 at 02:46:35AM +0100, Jamie Lokier wrote:
 Does this mean that virtio-blk supports all three combinations?
 
1. FLUSH that isn't a barrier
2. FLUSH that is also a barrier
3. Barrier that is not a flush
 
 1 is good for fsync-like operations;
 2 is good for journalling-like ordered operations.
 3 sounds like it doesn't mean a lot as the host cache provides no
 guarantees and has no ordering facility that can be used.

No.  The Linux virtio_blk guest driver either supports data integrity
by using FLUSH or can send down BARRIER requests which aren't much
help at all.  Qemu only implements FLUSH anyway.



Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Michael S. Tsirkin
On Tue, May 04, 2010 at 08:54:59PM +0200, Christoph Hellwig wrote:
 On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote:
  I took a stab at documenting CMD and FLUSH request types in virtio
  block.  Christoph, could you look over this please?
  
  I note that the interface seems full of warts to me,
  this might be a first step to cleaning them.
 
 The whole virtio-blk interface is full of warts.  It has been
 extended rather ad-hoc, so that is rather expected.
 
  One issue I struggled with especially is how type
  field mixes bits and non-bit values. I ended up
  simply defining all legal values, so that we have
  CMD = 2, CMD_OUT = 3 and so on.
 
 It's basically a complete mess without much logic behind it.
 
  +\change_unchanged
  +the high bit
  +\change_inserted 0 1266497301
  + (VIRTIO_BLK_T_BARRIER)
  +\change_unchanged
  + indicates that this request acts as a barrier and that all preceeding 
  requests
  + must be complete before this one, and all following requests must not be
  + started until this is complete.
  +
  +\change_inserted 0 1266504385
  + Note that a barrier does not flush caches in the underlying backend device
  + in host, and thus does not serve as data consistency guarantee.
  + Driver must use FLUSH request to flush the host cache.
  +\change_unchanged
 
 I'm not sure it's even worth documenting it.  I can't see any way to
 actually implement safe behaviour with the VIRTIO_BLK_T_BARRIER-style
 barriers.

lguest seems to still use this.
I guess if you have a reliable host, VIRTIO_BLK_T_BARRIER is enough?

 Btw, did I mention that .lyx is a a really horrible format to review
 diffs for?  Plain latex would be a lot better..


Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Michael S. Tsirkin
On Tue, May 04, 2010 at 09:56:18PM +0300, Michael S. Tsirkin wrote:
 On Tue, May 04, 2010 at 08:54:59PM +0200, Christoph Hellwig wrote:
  On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote:
   I took a stab at documenting CMD and FLUSH request types in virtio
   block.  Christoph, could you look over this please?
   
   I note that the interface seems full of warts to me,
   this might be a first step to cleaning them.
  
  The whole virtio-blk interface is full of warts.  It has been
  extended rather ad-hoc, so that is rather expected.
  
   One issue I struggled with especially is how type
   field mixes bits and non-bit values. I ended up
   simply defining all legal values, so that we have
   CMD = 2, CMD_OUT = 3 and so on.
  
  It's basically a complete mess without much logic behind it.
  
   +\change_unchanged
   +the high bit
   +\change_inserted 0 1266497301
   + (VIRTIO_BLK_T_BARRIER)
   +\change_unchanged
   + indicates that this request acts as a barrier and that all preceeding 
   requests
   + must be complete before this one, and all following requests must not be
   + started until this is complete.
   +
   +\change_inserted 0 1266504385
   + Note that a barrier does not flush caches in the underlying backend 
   device
   + in host, and thus does not serve as data consistency guarantee.
   + Driver must use FLUSH request to flush the host cache.
   +\change_unchanged
  
  I'm not sure it's even worth documenting it.  I can't see any way to
  actually implement safe behaviour with the VIRTIO_BLK_T_BARRIER-style
  barriers.
 
 lguest seems to still use this.

Sorry, it doesn't. No idea why I thought it does.

 I guess if you have a reliable host, VIRTIO_BLK_T_BARRIER is enough?
 
  Btw, did I mention that .lyx is a a really horrible format to review
  diffs for?  Plain latex would be a lot better..


Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Michael S. Tsirkin
On Tue, May 04, 2010 at 08:56:14PM +0200, Christoph Hellwig wrote:
 On Tue, Apr 20, 2010 at 02:46:35AM +0100, Jamie Lokier wrote:
  Does this mean that virtio-blk supports all three combinations?
  
 1. FLUSH that isn't a barrier
 2. FLUSH that is also a barrier
 3. Barrier that is not a flush
  
  1 is good for fsync-like operations;
  2 is good for journalling-like ordered operations.
  3 sounds like it doesn't mean a lot as the host cache provides no
  guarantees and has no ordering facility that can be used.
 
 No.  The Linux virtio_blk guest driver either supports data integrity
 by using FLUSH or can send down BARRIER requests which aren't much
 help at all.

It seems we use BARRIER when we get REQ_HARDBARRIER, right?
What does the REQ_HARDBARRIER flag in request mean and when is it set?

  Qemu only implements FLUSH anyway.


Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Jamie Lokier
Jens Axboe wrote:
 On Tue, May 04 2010, Rusty Russell wrote:
  ISTR someone mentioning a desire for such an API years ago, so CC'ing the
  usual I/O suspects...
 
 It would be nice to have a fuller API for this, but the reality is
 that only the flush approach is really workable. Even just strict
 ordering of requests could only be supported on SCSI, and even there the
 kernel still lacks proper guarantees on error handling to prevent
 reordering there.

There's a few I/O scheduling differences that might be useful:

1. The I/O scheduler could freely move WRITEs before a FLUSH but not
   before a BARRIER.  That might be useful for time-critical WRITEs,
   and those issued by high I/O priority.

2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is
   only for data belonging to a particular file (e.g. fdatasync with
   no file size change, even on btrfs if O_DIRECT was used for the
   writes being committed).  That would entail tagging FLUSHes and
   WRITEs with a fs-specific identifier (such as inode number), opaque
   to the scheduler which only checks equality.

3. By delaying FLUSHes through reordering as above, the I/O scheduler
   could merge multiple FLUSHes into a single command.

4. On MD/RAID, BARRIER requires every backing device to quiesce before
   sending the low-level cache-flush, and all of those to finish
   before resuming each backing device.  FLUSH doesn't require as much
   synchronising.  (With per-file FLUSH; see 2; it could even avoid
   FLUSH altogether to some backing devices for small files).

In other words, FLUSH can be more relaxed than BARRIER inside the
kernel.  It's ironic that we think of fsync as stronger than
fbarrier outside the kernel :-)

-- Jamie


Re: [PATCH 0/2] fix kvmclock bug - memory corruption

2010-05-04 Thread Zachary Amsden

On 05/04/2010 08:35 AM, Glauber Costa wrote:

This patch series fixes a bug I just found with kvmclock,
when I booted into a kernel without kvmclock enabled.

Since I am setting MSRs, I took the opportunity to
use yet another function from upstream qemu (patch 1).

Enjoy

Glauber Costa (2):
   replace set_msr_entry with kvm_msr_entry
   turn off kvmclock when resetting cpu

  qemu-kvm-x86.c|   58 -
  target-i386/kvm.c |3 ++
  2 files changed, 38 insertions(+), 23 deletions(-)

   


Acked-by: Zachary Amsden zams...@redhat.com


Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Jamie Lokier
Rusty Russell wrote:
 On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:
  I took a stab at documenting CMD and FLUSH request types in virtio
  block.  Christoph, could you look over this please?
  
  I note that the interface seems full of warts to me,
  this might be a first step to cleaning them.
 
 ISTR Christoph had withdrawn some patches in this area, and was waiting
 for him to resubmit?
 
 I've given up on figuring out the block device.  What seem to me to be sane
 semantics along the lines of memory barriers are foreign to disk people: they
 want (and depend on) flushing everywhere.
 
 For example, tdb transactions do not require a flush, they only require what
 I would call a barrier: that prior data be written out before any future data.
 Surely that would be more efficient in general than a flush!  In fact, TDB
 wants only writes to *that file* (and metadata) written out first; it has no
 ordering issues with other I/O on the same device.

I've just posted elsewhere on this thread, that an I/O level flush can
be more efficient than an I/O level barrier (implemented using a
cache-flush really), because the barrier has stricter ordering
requirements at the I/O scheduling level.

By the time you work up to tdb, another way to think of it is
distinguishing eager fsync from fsync but I'm not in a hurry -
delay as long as is convenient.  The latter makes much more sense
with AIO.

 A generic I/O interface would allow you to specify this request
 depends on these outstanding requests and leave it at that.  It
 might have some sync flush command for dumb applications and OSes.

For filesystems, it would probably be easy to label in-place
overwrites and fdatasync data flushes when there's no file extension
with an opaque per-file identifier for certain operations.  Typically
over-writing in place and fdatasync would match up and wouldn't need
ordering against anything else.  Other operations would tend to get
labelled as ordered against everything including these.

-- Jamie


Re: [PATCH 1/3 v2] KVM MMU: make kvm_mmu_zap_page() return the number of zapped sp in total.

2010-05-04 Thread Gui Jianfeng
Marcelo Tosatti wrote:
 On Mon, May 03, 2010 at 09:38:54PM +0800, Gui Jianfeng wrote:
 Hi Marcelo,

 Actually, it doesn't only affect kvm_mmu_change_mmu_pages() but also
 kvm_mmu_remove_some_alloc_mmu_pages(), which is called by the mmu shrink
 routine. This will cause the upper layer to get a wrong number, so I think
 this should be fixed. Here is an updated version.

 ---
 From: Gui Jianfeng guijianf...@cn.fujitsu.com

 Currently, in kvm_mmu_change_mmu_pages(kvm, page), used_pages-- is performed
 after calling kvm_mmu_zap_page() regardless of whether the page is actually
 reclaimed, but a root sp won't be reclaimed by kvm_mmu_zap_page(). So making
 kvm_mmu_zap_page() return the total number of reclaimed sps makes more sense.
 A new flag is put into kvm_mmu_zap_page() to indicate whether the top page is
 reclaimed. kvm_mmu_remove_some_alloc_mmu_pages() also relies on
 kvm_mmu_zap_page() to return the total reclaimed number.
 
 Isn't it simpler to have kvm_mmu_zap_page() return the number of pages it
 actually freed? Then always restart the hash walk if the return is positive.
 

OK. Although in some cases we might encounter an unneeded hash walk restart,
it's not a big problem. I don't object to this solution and will post a new
patch.

Thanks,
Gui

 
 
 





[PATCH] KVM: make kvm_mmu_zap_page() return the number of pages it actually freed.

2010-05-04 Thread Gui Jianfeng
Currently, kvm_mmu_zap_page() returns the number of freed child sps. This
might confuse the caller, because the caller doesn't know the actual number
freed. Let's make kvm_mmu_zap_page() return the number of pages it actually
freed.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 51eb6d6..8ab6820 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1503,6 +1503,8 @@ static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	if (sp->unsync)
 		kvm_unlink_unsync_page(kvm, sp);
 	if (!sp->root_count) {
+		/* Count self */
+		ret++;
 		hlist_del(&sp->hash_link);
 		kvm_mmu_free_page(kvm, sp);
 	} else {
@@ -1539,7 +1541,6 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages)
 		page = container_of(kvm->arch.active_mmu_pages.prev,
 				    struct kvm_mmu_page, link);
 		used_pages -= kvm_mmu_zap_page(kvm, page);
-		used_pages--;
 	}
 	kvm_nr_mmu_pages = used_pages;
 	kvm->arch.n_free_mmu_pages = 0;
@@ -2908,7 +2909,7 @@ static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm)
 
 	page = container_of(kvm->arch.active_mmu_pages.prev,
 			    struct kvm_mmu_page, link);
-	return kvm_mmu_zap_page(kvm, page) + 1;
+	return kvm_mmu_zap_page(kvm, page);
 }
 
 static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
-- 
1.6.5.2




[PATCH] KVM: mark page dirty when page is actually modified.

2010-05-04 Thread Gui Jianfeng
Sometimes cmpxchg_gpte() doesn't modify the gpte; in that case, don't mark
the page table page as dirty.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 89d66ca..1ad9843 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -177,10 +177,10 @@ walk:
 		if (!(pte & PT_ACCESSED_MASK)) {
 			trace_kvm_mmu_set_accessed_bit(table_gfn, index,
 						       sizeof(pte));
-			mark_page_dirty(vcpu->kvm, table_gfn);
 			if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn,
 			    index, pte, pte|PT_ACCESSED_MASK))
 				goto walk;
+			mark_page_dirty(vcpu->kvm, table_gfn);
 			pte |= PT_ACCESSED_MASK;
 		}
 
@@ -217,11 +217,11 @@ walk:
 			bool ret;
 
 			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
-			mark_page_dirty(vcpu->kvm, table_gfn);
 			ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
 				  pte|PT_DIRTY_MASK);
 			if (ret)
 				goto walk;
+			mark_page_dirty(vcpu->kvm, table_gfn);
 			pte |= PT_DIRTY_MASK;
 			walker->ptes[walker->level - 1] = pte;
 		}
-- 
1.6.5.2





vCPU scalability for linux VMs

2010-05-04 Thread Alec Istomin
Gentlemen,
 Reaching out with a non-development question, sorry if it's not
 appropriate here.
 
 I'm looking for a way to improve Linux SMP VMs performance under KVM.

 My preliminary results show that single-vCPU Linux VMs perform up to 10
 times better than 4-vCPU Linux VMs (consolidated performance of 8 VMs on
 an 8-core pre-Nehalem server). I suspect that I'm missing something major
 and am looking for any means that can help improve SMP VM performance.


VMs are started using:
qemu-kvm -m $ram -smp $cpus -name $name -drive 
file=${newimg},boot=on,cache=writethrough  -net 
nic,macaddr=$mac,vlan=0,model=virtio -net 
tap,script=/kvm/qemu-ifup,vlan=0,ifname=kvmnet$i -parallel none -usb -k en-us  
-monitor pty -serial pty -nographic -daemonize -snapshot


KVM Host Environment (redhat 5 based):
# uname -r
2.6.18-194.el5
# rpm -qa|grep kvm
kvm-83-164.el5
kvm-tools-83-164.el5
kmod-kvm-83-164.el5
kvm-qemu-img-83-164.el5



Thank you,
 Alec



[PATCH] KVM: Fix debug output error

2010-05-04 Thread Gui Jianfeng
Fix a debug output error in walk_addr

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 89d66ca..d2c5164 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -229,7 +229,7 @@ walk:
 	walker->pt_access = pt_access;
 	walker->pte_access = pte_access;
 	pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
-		 __func__, (u64)pte, pt_access, pte_access);
+		 __func__, (u64)pte, pte_access, pt_access);
 	return 1;
 
 not_present:
-- 
1.6.5.2



Re: 2.6.33.3: possible recursive locking detected

2010-05-04 Thread Yong Zhang
On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote:
 On 05/04/2010 10:03 AM, CaT wrote:
 I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo
 on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3.
 qemu-kvm version 0.12.3.

Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the latest
kernel?

 
 Doesn't appear to be related to kvm.  Copying lkml.
 
 When doing:
 
 echo noop > /sys/block/vdd/queue/scheduler
 
 I got:
 
 [ 1424.438241] =
 [ 1424.439588] [ INFO: possible recursive locking detected ]
 [ 1424.440368] 2.6.33.3-moocow.20100429-142641 #2
 [ 1424.440960] -
 [ 1424.440960] bash/2186 is trying to acquire lock:
 [ 1424.440960]  (s_active){.+}, at: [811046b8] 
 sysfs_remove_dir+0x75/0x88
 [ 1424.440960]
 [ 1424.440960] but task is already holding lock:
 [ 1424.440960]  (s_active){.+}, at: [81104849] 
 sysfs_get_active_two+0x1f/0x46
 [ 1424.440960]
 [ 1424.440960] other info that might help us debug this:
 [ 1424.440960] 4 locks held by bash/2186:
 [ 1424.440960]  #0:  (buffer-mutex){+.+.+.}, at: [8110317f] 
 sysfs_write_file+0x39/0x126
 [ 1424.440960]  #1:  (s_active){.+}, at: [81104849] 
 sysfs_get_active_two+0x1f/0x46
 [ 1424.440960]  #2:  (s_active){.+}, at: [81104856] 
 sysfs_get_active_two+0x2c/0x46
 [ 1424.440960]  #3:  (q-sysfs_lock){+.+.+.}, at: [8119c3f0] 
 queue_attr_store+0x44/0x85
 [ 1424.440960]
 [ 1424.440960] stack backtrace:
 [ 1424.440960] Pid: 2186, comm: bash Not tainted 
 2.6.33.3-moocow.20100429-142641 #2
 [ 1424.440960] Call Trace:
 [ 1424.440960]  [8105e775] __lock_acquire+0xf9f/0x178e
 [ 1424.440960]  [8100d3ec] ? save_stack_trace+0x2a/0x48
 [ 1424.440960]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
 [ 1424.440960]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
 [ 1424.440960]  [8105cb56] ? trace_hardirqs_on+0xd/0xf
 [ 1424.440960]  [8105f02e] lock_acquire+0xca/0xef
 [ 1424.440960]  [811046b8] ? sysfs_remove_dir+0x75/0x88
 [ 1424.440960]  [8110458d] sysfs_addrm_finish+0xc8/0x13a
 [ 1424.440960]  [811046b8] ? sysfs_remove_dir+0x75/0x88
 [ 1424.440960]  [8105cb25] ? trace_hardirqs_on_caller+0x110/0x134
 [ 1424.440960]  [811046b8] sysfs_remove_dir+0x75/0x88
 [ 1424.440960]  [811ab312] kobject_del+0x16/0x37
 [ 1424.440960]  [81195489] elv_iosched_store+0x10a/0x214
 [ 1424.440960]  [8119c416] queue_attr_store+0x6a/0x85
 [ 1424.440960]  [81103237] sysfs_write_file+0xf1/0x126
 [ 1424.440960]  [810b747f] vfs_write+0xae/0x14a
 [ 1424.440960]  [810b75df] sys_write+0x47/0x6e
 [ 1424.440960]  [81002202] system_call_fastpath+0x16/0x1b
 
 Original scheduler was cfq.
 
 Having rebooted and defaulted to noop I tried
 
 echo noop > /sys/block/vdd/queue/scheduler
 
 and got:
 
 [  311.294464] =
 [  311.295820] [ INFO: possible recursive locking detected ]
 [  311.296603] 2.6.33.3-moocow.20100429-142641 #2
 [  311.296833] -
 [  311.296833] bash/2190 is trying to acquire lock:
 [  311.296833]  (s_active){.+}, at: [81104630] 
 remove_dir+0x31/0x39
 [  311.296833]
 [  311.296833] but task is already holding lock:
 [  311.296833]  (s_active){.+}, at: [81104849] 
 sysfs_get_active_two+0x1f/0x46
 [  311.296833]
 [  311.296833] other info that might help us debug this:
 [  311.296833] 4 locks held by bash/2190:
 [  311.296833]  #0:  (buffer-mutex){+.+.+.}, at: [8110317f] 
 sysfs_write_file+0x39/0x126
 [  311.296833]  #1:  (s_active){.+}, at: [81104849] 
 sysfs_get_active_two+0x1f/0x46
 [  311.296833]  #2:  (s_active){.+}, at: [81104856] 
 sysfs_get_active_two+0x2c/0x46
 [  311.296833]  #3:  (q-sysfs_lock){+.+.+.}, at: [8119c3f0] 
 queue_attr_store+0x44/0x85
 [  311.296833]
 [  311.296833] stack backtrace:
 [  311.296833] Pid: 2190, comm: bash Not tainted 
 2.6.33.3-moocow.20100429-142641 #2
 [  311.296833] Call Trace:
 [  311.296833]  [8105e775] __lock_acquire+0xf9f/0x178e
 [  311.296833]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
 [  311.296833]  [8105b46c] ? lockdep_init_map+0x9f/0x52f
 [  311.296833]  [8105cb56] ? trace_hardirqs_on+0xd/0xf
 [  311.296833]  [8105f02e] lock_acquire+0xca/0xef
 [  311.296833]  [81104630] ? remove_dir+0x31/0x39
 [  311.296833]  [8110458d] sysfs_addrm_finish+0xc8/0x13a
 [  311.296833]  [81104630] ? remove_dir+0x31/0x39
 [  311.296833]  [8105cb25] ? trace_hardirqs_on_caller+0x110/0x134
 [  311.296833]  [81104630] remove_dir+0x31/0x39
 [  311.296833]  [811046c0] sysfs_remove_dir+0x7d/0x88
 [  311.296833]  [811ab312] kobject_del+0x16/0x37
 [  311.296833]  [81195489] 

Re: 2.6.33.3: possible recursive locking detected

2010-05-04 Thread Américo Wang
On Wed, May 5, 2010 at 10:32 AM, Yong Zhang yong.zh...@windriver.com wrote:
 On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote:
 On 05/04/2010 10:03 AM, CaT wrote:
 I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo
 on 1 cpu with virtio HDs running on top of a core2duo host running 2.6.33.3.
 qemu-kvm version 0.12.3.

 Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the latest
 kernel?


Hmm, 2.6.33 -stable has commit 846f99749ab68bbc7f75c74fec305de675b1a1bf?

Actually, these 3 commits fixed it:

6992f5334995af474c2b58d010d08bc597f0f2fe sysfs: Use one lockdep class
per sysfs attribute.
a2db6842873c8e5a70652f278d469128cb52db70 sysfs: Only take active
references on attributes.
e72ceb8ccac5f770b3e696e09bb673dca7024b20 sysfs: Remove sysfs_get/put_active_two

However, there are many other patches needed to amend these, so I think
they're not suitable for -stable to include; perhaps reverting
846f99749ab68bbc7f75c74fec305de675b1a1bf is better.

Adding Greg into Cc.

Thanks.


Re: [RFC][PATCH resend 8/12] asm-generic: bitops: introduce le bit offset macro

2010-05-04 Thread Takuya Yoshikawa
On Tue, 04 May 2010 19:08:23 +0300
Avi Kivity a...@redhat.com wrote:

 On 05/04/2010 06:03 PM, Arnd Bergmann wrote:
  On Tuesday 04 May 2010, Takuya Yoshikawa wrote:
...
  So let us use the le bit offset calculation part by defining it as a new
  macro: generic_le_bit_offset().
   
  Does this work correctly if your user space is 32 bits (i.e. unsigned long
  is different size in user space and kernel) in both big- and little-endian
  systems?
 
  I'm not sure about all the details, but I think you cannot in general share
  bitmaps between user space and kernel because of this.
 
 
 That's why the bitmaps are defined as little endian u64 aligned, even on 
 big endian 32-bit systems.  Little endian bitmaps are wordsize agnostic, 
 and u64 alignment ensures we can use long-sized bitops on mixed size 
 systems.

There was a suggestion to propose set_le_bit_user() kind of macros.
But what I thought was these have a constraint you two explained and seemed to 
be
a little bit specific to some area, like KVM.

So I decided to propose just the offset calculation macro.

  Thanks, Takuya

