[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #10 from Gleb <g...@redhat.com>  2012-02-16 08:04:22 ---
(In reply to comment #8)
> Not sure what you mean by installing trace-cmd before capturing the trace--I
> did do that, otherwise I wouldn't have had a trace-cmd to run. The package
I meant that if you compile it yourself, you need to run "make install"
instead of running it from where it was compiled. Otherwise it does not find
its plugins. If you install it from the package manager, this is of course
not relevant.

> version is trace-cmd 1.0.3-0ubuntu1 if that helps. I tried running it again
> against qemu 1.0 (the last one was for qemu 0.14), still contained a bunch of
> [FAILED TO PARSE] messages.
>
Something is wrong with the Ubuntu trace-cmd then. Either it does not have the
kvm plugin or it is too old.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779


Jan Kiszka <jan.kis...@web.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jan.kis...@web.de




--- Comment #11 from Jan Kiszka <jan.kis...@web.de>  2012-02-16 08:31:18 ---
(In reply to comment #4)
> Code=8e 2b 01 00 00 8b 4d f0 89 f2 8b 45 ec 0f 0d 82 40 01 00 00 0f 6f 02 0f
> 6f 4a 08 0f 6f 52 10 0f 6f 5a 18 0f 7f 00 0f 7f 48 08 0f 7f 50 10 0f 7f 58 18

This is movq (%rdx),%mm0. Possibly still unsupported by the emulator.
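
For reference, a minimal sketch of how those bytes decode, assuming no
operand-size or repeat prefix precedes them. The opcode 0f 6f is the MMX movq
load, and the ModRM byte that follows selects the destination register and the
memory operand. The helper names are made up for illustration and are not part
of KVM's emulator.

```c
#include <stdint.h>

/* The three fields packed into an x86 ModRM byte. */
struct modrm {
    uint8_t mod; /* bits 7..6: 3 means register operand, otherwise memory */
    uint8_t reg; /* bits 5..3: here, the destination MMX register number */
    uint8_t rm;  /* bits 2..0: 2 selects edx/rdx in 32/64-bit addressing */
};

static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;

    m.mod = byte >> 6;
    m.reg = (byte >> 3) & 7;
    m.rm = byte & 7;
    return m;
}

/*
 * Hypothetical helper: returns 1 if p points at "0f 6f /r" with a memory
 * source, i.e. an MMX movq load. It does not look for prefixes.
 */
static int is_mmx_movq_load(const uint8_t *p)
{
    return p[0] == 0x0f && p[1] == 0x6f && decode_modrm(p[2]).mod != 3;
}
```

Applied to the dump, 0f 6f 02 gives mod=0, reg=0, rm=2 (a load through
edx/rdx into mm0), and 0f 6f 4a 08 gives mod=1, reg=1, rm=2 (a load from
offset 8 into mm1).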



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #12 from Gleb <g...@redhat.com>  2012-02-16 08:40:37 ---
Can you try disabling CONFIG_FB in your guest kernel?



Re: performance trouble

2012-02-16 Thread David Cure
Hello,

On Tue, Feb 14, 2012 at 03:40:30PM +0200, Gleb Natapov wrote:
>
> Try to add -no-hpet to qemu command line and see if it helps.

I added this line to my XML definition for libvirt:

<timer name='hpet' present='no'/> in the <clock> block. And I see
-no-hpet in the command line:

/usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 16384 -smp
2,sockets=2,cores=1,threads=1 -name rds1 -uuid
4f72a64e-de1a-b3da-adac-5dffdb9eb6aa -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/rds1.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-hpet -no-shutdown -drive
file=/dev/drbd1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
-device
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive
file=/Iso/SW_DVD5_Windows_Svr_DC_EE_SE_Web_2008R2_64-bit_French_X15-59758.ISO,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
-netdev tap,fd=18,id=hostnet0 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:36:0c:a1:c4,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -usb -device
usb-tablet,id=input0 -vnc 127.0.0.1:0 -k fr -vga std -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

But unfortunately, the result is the same: it still takes 12s to run
this function (as a reminder, it takes 8s on ESXi 4).

David.



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Gleb Natapov
On Wed, Feb 15, 2012 at 03:59:33PM -0600, Anthony Liguori wrote:
> On 02/15/2012 07:39 AM, Avi Kivity wrote:
> > On 02/07/2012 08:12 PM, Rusty Russell wrote:
> > > > I would really love to have this, but the problem is that we'd need a
> > > > general purpose bytecode VM with binding to some kernel APIs.  The
> > > > bytecode VM, if made general enough to host more complicated devices,
> > > > would likely be much larger than the actual code we have in the kernel now.
> > >
> > > We have the ability to upload bytecode into the kernel already.  It's in
> > > a great bytecode interpreted by the CPU itself.
> >
> > Unfortunately it's inflexible (has to come with the kernel) and open to
> > security vulnerabilities.
>
> I wonder if there's any reasonable way to run device emulation
> within the context of the guest.  Could we effectively do something
> like SMM?
>
> For a given set of traps, reflect back into the guest quickly
> changing the visibility of the VGA region. It may require installing
> a new CR3 but maybe that wouldn't be so bad with VPIDs.
>
What will it buy us? Surely not speed. Entering a guest is not much
(if at all) faster than exiting to userspace and any non trivial
operation will require exit to userspace anyway, so we just added one
more guest entry/exit operation on the way to userspace.

> Then you could implement the PIT as guest firmware using kvmclock as the time
> base.
>
> Once you're back in the guest, you could install the old CR3.
> Perhaps just hide a portion of the physical address space with the
> e820.
>
> Regards,
>
> Anthony Liguori
>
> > > If every user were emulating different machines, LPF this would make
> > > sense.  Are they?
> >
> > They aren't.
> >
> > > Or should we write those helpers once, in C, and
> > > provide that for them.
> >
> > There are many of them: PIT/PIC/IOAPIC/MSIX tables/HPET/kvmclock/Hyper-V
> > stuff/vhost-net/DMA remapping/IO remapping (just for x86), and some of
> > them are quite complicated.  However implementing them in bytecode
> > amounts to exposing a stable kernel ABI, since they use such a vast
> > range of kernel services.
> >

--
Gleb.


Re: performance trouble

2012-02-16 Thread Gleb Natapov
On Thu, Feb 16, 2012 at 09:55:53AM +0100, David Cure wrote:
> Hello,
>
> On Tue, Feb 14, 2012 at 03:40:30PM +0200, Gleb Natapov wrote:
> >
> > Try to add -no-hpet to qemu command line and see if it helps.
>
> I added this line to my XML definition for libvirt:
>
> <timer name='hpet' present='no'/> in the <clock> block. And I see
> -no-hpet in the command line:
>
> /usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 16384 -smp
> 2,sockets=2,cores=1,threads=1 -name rds1 -uuid
> 4f72a64e-de1a-b3da-adac-5dffdb9eb6aa -nodefconfig -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/rds1.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
> -no-hpet -no-shutdown -drive
> file=/dev/drbd1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
> -device
> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -drive
> file=/Iso/SW_DVD5_Windows_Svr_DC_EE_SE_Web_2008R2_64-bit_French_X15-59758.ISO,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
> -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
> -netdev tap,fd=18,id=hostnet0 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:36:0c:a1:c4,bus=pci.0,addr=0x3
> -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0 -usb -device
> usb-tablet,id=input0 -vnc 127.0.0.1:0 -k fr -vga std -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
>
> But unfortunately, the result is the same: it still takes 12s to run
> this function (as a reminder, it takes 8s on ESXi 4).
>
Can you do the trace again with -no-hpet on the command line?

--
Gleb.


Re: performance trouble

2012-02-16 Thread David Cure
On Tue, Feb 14, 2012 at 03:32:16PM +0200, Avi Kivity wrote:
>
> It's reading the HPET like crazy.  There are also tons of interrupts.
> Please use the windows performance tools to see which devices trigger
> these interrupts.
>
> The HPET issue will be fixed by the hyper-V enlightenments, but these
> will take some time to cook.

As you can see, I tried -no-hpet and it doesn't solve the issue, so
perhaps the problem is not related to the HPET.


On Tue, Feb 14, 2012 at 08:48:56AM -0500, Vadim Rozenfeld wrote:
>
> [VR]
> +1
> Try Microsoft Windows Performance Toolkit from Windows SDK
> http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3138
> It's really good.

I have downloaded and installed it. I'll let you know what we see.

Thanks,

David.


[PATCH] Synchronize cpu state with kernel before poking into registers.

2012-02-16 Thread Gleb Natapov
Call to kvm_cpu_synchronize_state() is missing.
kvm_arch_stop_on_emulation_error() may look at outdated registers here.

Signed-off-by: Gleb Natapov <g...@redhat.com>
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7079e87..51d0ae7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2020,6 +2020,7 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run)
 
 bool kvm_arch_stop_on_emulation_error(CPUState *env)
 {
+    kvm_cpu_synchronize_state(env);
     return !(env->cr[0] & CR0_PE_MASK) ||
            ((env->segs[R_CS].selector  & 3) != 3);
 }
--
Gleb.
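
To illustrate why the ordering matters, here is a toy model of the lazy
synchronization pattern. It is not QEMU code: struct toy_cpu and both
functions are hypothetical stand-ins, with kernel_cr0 playing the role of the
state held on the kernel side and cr0 the userspace copy that is only valid
after a sync.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE_MASK 0x1u

/* Toy model of a vcpu whose userspace register copy may be stale. */
struct toy_cpu {
    uint32_t kernel_cr0; /* authoritative value (hypothetical field) */
    uint32_t cr0;        /* cached userspace copy */
    bool synced;         /* true once the cached copy has been refreshed */
};

/* Refresh the cached copy once; later calls are no-ops until invalidated. */
static void toy_synchronize_state(struct toy_cpu *cpu)
{
    if (!cpu->synced) {
        cpu->cr0 = cpu->kernel_cr0;
        cpu->synced = true;
    }
}

/* Sync first, then inspect the register, mirroring the order in the patch. */
static bool toy_in_real_mode(struct toy_cpu *cpu)
{
    toy_synchronize_state(cpu);
    return !(cpu->cr0 & CR0_PE_MASK);
}
```

Without the sync call, toy_in_real_mode() would answer from whatever value
happened to be cached, which is the class of bug the one-line patch addresses.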


[PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Liu Yu
If the guest hypervisor node contains "has-idle" property.

Signed-off-by: Liu Yu <yu@freescale.com>
---
v4:
1. discard the CONFIG_E500 to make code for all powerpc platform
2. code cleanup

 arch/powerpc/kernel/epapr.S  |   29 +
 arch/powerpc/kernel/epapr_para.c |   13 -
 2 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/epapr.S b/arch/powerpc/kernel/epapr.S
index 697b390..f06d636 100644
--- a/arch/powerpc/kernel/epapr.S
+++ b/arch/powerpc/kernel/epapr.S
@@ -15,6 +15,35 @@
 #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
 
+#define HC_VENDOR_EPAPR        (1 << 16)
+#define HC_EV_IDLE             16
+
+_GLOBAL(epapr_ev_idle)
+epapr_ev_idle:
+   rlwinm  r3,r1,0,0,31-THREAD_SHIFT   /* current thread_info */
+   lwz r4,TI_LOCAL_FLAGS(r3)   /* set napping bit */
+   ori r4,r4,_TLF_NAPPING  /* so when we take an exception */
+   stw r4,TI_LOCAL_FLAGS(r3)   /* it will return to our caller */
+
+   wrteei  1
+
+idle_loop:
+   LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
+
+.global epapr_ev_idle_start
+epapr_ev_idle_start:
+   li  r3, -1
+   nop
+   nop
+   nop
+
+   /*
+* Guard against spurious wakeups (e.g. from a hypervisor) --
+* any real interrupt will cause us to return to LR due to
+* _TLF_NAPPING.
+*/
+   b   idle_loop
+
 /* Hypercall entry point. Will be patched with device tree instructions. */
 .global epapr_hypercall_start
 epapr_hypercall_start:
diff --git a/arch/powerpc/kernel/epapr_para.c b/arch/powerpc/kernel/epapr_para.c
index ea13cac..adee4f1 100644
--- a/arch/powerpc/kernel/epapr_para.c
+++ b/arch/powerpc/kernel/epapr_para.c
@@ -20,6 +20,10 @@
 #include <linux/of.h>
 #include <asm/epapr_hcalls.h>
 #include <asm/cacheflush.h>
+#include <asm/machdep.h>
+
+extern void epapr_ev_idle(void);
+extern u32 epapr_ev_idle_start[];
 
 bool epapr_para_enabled = false;
 
@@ -35,10 +39,17 @@ static int __init epapr_para_init(void)
 
         insts = of_get_property(hyper_node, "hcall-instructions", &len);
         if (!(len % 4) && (len >= (4 * 4))) {
-                for (i = 0; i < (len / 4); i++)
+                for (i = 0; i < (len / 4); i++) {
                         epapr_hypercall_start[i] = insts[i];
+                        epapr_ev_idle_start[i] = insts[i];
+                }
                 flush_icache_range((ulong)epapr_hypercall_start,
                                    (ulong)epapr_hypercall_start + len);
+                flush_icache_range((ulong)epapr_ev_idle_start,
+                                   (ulong)epapr_ev_idle_start + len);
+
+                if (of_get_property(hyper_node, "has-idle", NULL))
+                        ppc_md.power_save = epapr_ev_idle;
 
                 epapr_para_enabled = true;
         }
-- 
1.7.0.4




[PATCH v4 1/3] KVM: PPC: epapr: Factor out the epapr init

2012-02-16 Thread Liu Yu
from the kvm guest paravirt init code.

Signed-off-by: Liu Yu <yu@freescale.com>
---
v4:
1. code cleanup
2. move kvm_hypercall_start() to epapr_hypercall_start()

 arch/powerpc/Kconfig|4 ++
 arch/powerpc/include/asm/epapr_hcalls.h |2 +
 arch/powerpc/kernel/Makefile|1 +
 arch/powerpc/kernel/epapr.S |   25 
 arch/powerpc/kernel/epapr_para.c|   49 +++
 arch/powerpc/kernel/kvm.c   |   28 ++
 arch/powerpc/kernel/kvm_emul.S  |   10 --
 arch/powerpc/kvm/Kconfig|1 +
 8 files changed, 85 insertions(+), 35 deletions(-)
 create mode 100644 arch/powerpc/kernel/epapr.S
 create mode 100644 arch/powerpc/kernel/epapr_para.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1919634..1262b43 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -202,6 +202,10 @@ config EPAPR_BOOT
  Used to allow a board to specify it wants an ePAPR compliant wrapper.
default n
 
+config EPAPR_PARAVIRT
+   bool
+   default n
+
 config DEFAULT_UIMAGE
bool
help
diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
index f3b0c2c..0ff3f24 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -148,6 +148,8 @@
 #define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, "r5"
 #define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, "r4"
 
+extern bool epapr_para_enabled;
+extern u32 epapr_hypercall_start[];
 
 /*
  * We use uintptr_t to define a register because it's guaranteed to be a
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index ee728e4..9e807f3 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -136,6 +136,7 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),)
 obj-y  += ppc_save_regs.o
 endif
 
+obj-$(CONFIG_EPAPR_PARAVIRT)   += epapr_para.o epapr.o
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o
 
 # Disable GCOV in odd or sensitive code
diff --git a/arch/powerpc/kernel/epapr.S b/arch/powerpc/kernel/epapr.S
new file mode 100644
index 0000000..697b390
--- /dev/null
+++ b/arch/powerpc/kernel/epapr.S
@@ -0,0 +1,25 @@
+/*
+ * Copyright (C) 2012 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/threads.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/cputable.h>
+#include <asm/thread_info.h>
+#include <asm/ppc_asm.h>
+#include <asm/asm-offsets.h>
+
+/* Hypercall entry point. Will be patched with device tree instructions. */
+.global epapr_hypercall_start
+epapr_hypercall_start:
+   li  r3, -1
+   nop
+   nop
+   nop
+   blr
diff --git a/arch/powerpc/kernel/epapr_para.c b/arch/powerpc/kernel/epapr_para.c
new file mode 100644
index 0000000..ea13cac
--- /dev/null
+++ b/arch/powerpc/kernel/epapr_para.c
@@ -0,0 +1,49 @@
+/*
+ * ePAPR para-virtualization support.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) 2012 Freescale Semiconductor, Inc.
+ */
+
+#include <linux/of.h>
+#include <asm/epapr_hcalls.h>
+#include <asm/cacheflush.h>
+
+bool epapr_para_enabled = false;
+
+static int __init epapr_para_init(void)
+{
+        struct device_node *hyper_node;
+        const u32 *insts;
+        int len, i;
+
+        hyper_node = of_find_node_by_path("/hypervisor");
+        if (!hyper_node)
+                return -ENODEV;
+
+        insts = of_get_property(hyper_node, "hcall-instructions", &len);
+        if (!(len % 4) && (len >= (4 * 4))) {
+                for (i = 0; i < (len / 4); i++)
+                        epapr_hypercall_start[i] = insts[i];
+                flush_icache_range((ulong)epapr_hypercall_start,
+                                   (ulong)epapr_hypercall_start + len);
+
+                epapr_para_enabled = true;
+        }
+
+        return 0;
+}
+
+early_initcall(epapr_para_init);
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 62bdf23..9dfc24a 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -31,6 +31,7 @@
 #include 

[PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for host

2012-02-16 Thread Liu Yu
And add a new flag definition in kvm_ppc_pvinfo to indicate
whether the host supports the EV_IDLE hcall.

Signed-off-by: Liu Yu <yu@freescale.com>
---
v4:
no change

 arch/powerpc/include/asm/kvm_para.h |   14 --
 arch/powerpc/kvm/powerpc.c  |8 
 include/linux/kvm.h |2 ++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index 7b754e7..81a34c9 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -75,9 +75,19 @@ struct kvm_vcpu_arch_shared {
 };
 
 #define KVM_SC_MAGIC_R0        0x4b564d21 /* KVM! */
-#define HC_VENDOR_KVM          (42 << 16)
+
+#include <asm/epapr_hcalls.h>
+
+/* ePAPR Hypercall Vendor ID */
+#define HC_VENDOR_EPAPR        (EV_EPAPR_VENDOR_ID << 16)
+#define HC_VENDOR_KVM          (EV_KVM_VENDOR_ID << 16)
+
+/* ePAPR Hypercall Token */
+#define HC_EV_IDLE             EV_IDLE
+
+/* ePAPR Hypercall Return Codes */
 #define HC_EV_SUCCESS          0
-#define HC_EV_UNIMPLEMENTED    12
+#define HC_EV_UNIMPLEMENTED    EV_UNIMPLEMENTED
 
 #define KVM_FEATURE_MAGIC_PAGE 1
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 0e21d15..03ebd5d 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -81,6 +81,10 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
/* Second return value is in r4 */
break;
+   case HC_VENDOR_EPAPR | HC_EV_IDLE:
+   r = HC_EV_SUCCESS;
+   kvm_vcpu_block(vcpu);
+   break;
default:
r = HC_EV_UNIMPLEMENTED;
break;
@@ -746,6 +750,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
         pvinfo->hcall[2] = inst_sc;
         pvinfo->hcall[3] = inst_nop;
 
+#ifdef CONFIG_BOOKE
+        pvinfo->flags |= KVM_PPC_PVINFO_FLAGS_EV_IDLE;
+#endif
+
return 0;
 }
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index acbe429..6b2c70e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -449,6 +449,8 @@ struct kvm_ppc_pvinfo {
__u8  pad[108];
 };
 
+#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
+
 #define KVMIO 0xAE
 
 /* machine type bits, to be used as argument to KVM_CREATE_VM */
-- 
1.7.0.4




Re: [PATCH v4 1/3] KVM: PPC: epapr: Factor out the epapr init

2012-02-16 Thread Alexander Graf


On 16.02.2012, at 10:26, Liu Yu <yu@freescale.com> wrote:

> from the kvm guest paravirt init code.
>
> Signed-off-by: Liu Yu <yu@freescale.com>
> ---
> v4:
> 1. code cleanup
> 2. move kvm_hypercall_start() to epapr_hypercall_start()
>
>  arch/powerpc/Kconfig|4 ++
>  arch/powerpc/include/asm/epapr_hcalls.h |2 +
>  arch/powerpc/kernel/Makefile|1 +
>  arch/powerpc/kernel/epapr.S |   25 
>  arch/powerpc/kernel/epapr_para.c|   49 +++
>  arch/powerpc/kernel/kvm.c   |   28 ++
>  arch/powerpc/kernel/kvm_emul.S  |   10 --
>  arch/powerpc/kvm/Kconfig|1 +
>  8 files changed, 85 insertions(+), 35 deletions(-)
>  create mode 100644 arch/powerpc/kernel/epapr.S
>  create mode 100644 arch/powerpc/kernel/epapr_para.c
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 1919634..1262b43 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -202,6 +202,10 @@ config EPAPR_BOOT
>            Used to allow a board to specify it wants an ePAPR compliant wrapper.
>          default n
>  
> +config EPAPR_PARAVIRT
> +        bool
> +        default n
> +
>  config DEFAULT_UIMAGE
>          bool
>          help
> diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
> index f3b0c2c..0ff3f24 100644
> --- a/arch/powerpc/include/asm/epapr_hcalls.h
> +++ b/arch/powerpc/include/asm/epapr_hcalls.h
> @@ -148,6 +148,8 @@
>  #define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, "r5"
>  #define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, "r4"
>  
> +extern bool epapr_para_enabled;
> +extern u32 epapr_hypercall_start[];
>  
>  /*
>   * We use uintptr_t to define a register because it's guaranteed to be a
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index ee728e4..9e807f3 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -136,6 +136,7 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),)
>  obj-y                          += ppc_save_regs.o
>  endif
>  
> +obj-$(CONFIG_EPAPR_PARAVIRT)   += epapr_para.o epapr.o
>  obj-$(CONFIG_KVM_GUEST)        += kvm.o kvm_emul.o
>  
>  # Disable GCOV in odd or sensitive code
> diff --git a/arch/powerpc/kernel/epapr.S b/arch/powerpc/kernel/epapr.S
> new file mode 100644
> index 0000000..697b390
> --- /dev/null
> +++ b/arch/powerpc/kernel/epapr.S
> @@ -0,0 +1,25 @@
> +/*
> + * Copyright (C) 2012 Freescale Semiconductor, Inc.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/threads.h>
> +#include <asm/reg.h>
> +#include <asm/page.h>
> +#include <asm/cputable.h>
> +#include <asm/thread_info.h>
> +#include <asm/ppc_asm.h>
> +#include <asm/asm-offsets.h>
> +
> +/* Hypercall entry point. Will be patched with device tree instructions. */
> +.global epapr_hypercall_start
> +epapr_hypercall_start:
> +        li      r3, -1
> +        nop
> +        nop
> +        nop
> +        blr
> diff --git a/arch/powerpc/kernel/epapr_para.c b/arch/powerpc/kernel/epapr_para.c
> new file mode 100644
> index 0000000..ea13cac
> --- /dev/null
> +++ b/arch/powerpc/kernel/epapr_para.c
> @@ -0,0 +1,49 @@
> +/*
> + * ePAPR para-virtualization support.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + *
> + * Copyright (C) 2012 Freescale Semiconductor, Inc.
> + */
> +
> +#include <linux/of.h>
> +#include <asm/epapr_hcalls.h>
> +#include <asm/cacheflush.h>
> +
> +bool epapr_para_enabled = false;
> +
> +static int __init epapr_para_init(void)
> +{
> +        struct device_node *hyper_node;
> +        const u32 *insts;
> +        int len, i;
> +
> +        hyper_node = of_find_node_by_path("/hypervisor");
> +        if (!hyper_node)
> +                return -ENODEV;
> +
> +        insts = of_get_property(hyper_node, "hcall-instructions", &len);
> +        if (!(len % 4) && (len >= (4 * 4))) {
> +                for (i = 0; i < (len / 4); i++)

So if the dt passes in less than 4 ops, you bail out, and for more you 
overwrite the buffer? Not good :)
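
A minimal sketch of the kind of check that avoids both problems, assuming the
patch area holds exactly four instruction slots (the li/nop/nop/nop ahead of
the blr). copy_hcall_insts and HCALL_SLOTS are made-up names for illustration,
not kernel API, and the sketch leaves out the icache flush.

```c
#include <stddef.h>
#include <stdint.h>

#define HCALL_SLOTS 4 /* assumed size of the patchable area, in instructions */

/*
 * Copy a device-tree instruction blob of len bytes into dst, which must
 * have room for HCALL_SLOTS words. Returns the number of instructions
 * copied, or -1 if the property is missing, misaligned, or too large.
 */
static int copy_hcall_insts(uint32_t *dst, const uint32_t *insts, int len)
{
    int i;

    if (insts == NULL || len <= 0 || len % 4 != 0 || len > HCALL_SLOTS * 4)
        return -1;

    for (i = 0; i < len / 4; i++)
        dst[i] = insts[i];

    return len / 4;
}
```

With an upper bound on len, a shorter blob is copied in full and the remaining
nops stay in place, while an oversized blob is rejected instead of running
past the end of the patch area.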

> +                        epapr_hypercall_start[i] = insts[i];
> +                flush_icache_range((ulong)epapr_hypercall_start,
> +                                   (ulong)epapr_hypercall_start + len);
> +
> +                epapr_para_enabled = true;
> +        }
> +
> +        return 0;
> +}
> +
> +early_initcall(epapr_para_init);
> diff --git

Re: [PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for host

2012-02-16 Thread Alexander Graf


On 16.02.2012, at 10:26, Liu Yu <yu@freescale.com> wrote:

> And add a new flag definition in kvm_ppc_pvinfo to indicate
> whether host support EV_IDLE hcall.
>
> Signed-off-by: Liu Yu <yu@freescale.com>
> ---
> v4:
> no change
>
>  arch/powerpc/include/asm/kvm_para.h |   14 --
>  arch/powerpc/kvm/powerpc.c  |8 
>  include/linux/kvm.h |2 ++
>  3 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
> index 7b754e7..81a34c9 100644
> --- a/arch/powerpc/include/asm/kvm_para.h
> +++ b/arch/powerpc/include/asm/kvm_para.h
> @@ -75,9 +75,19 @@ struct kvm_vcpu_arch_shared {
>  };
>  
>  #define KVM_SC_MAGIC_R0        0x4b564d21 /* KVM! */
> -#define HC_VENDOR_KVM          (42 << 16)
> +
> +#include <asm/epapr_hcalls.h>
> +
> +/* ePAPR Hypercall Vendor ID */
> +#define HC_VENDOR_EPAPR        (EV_EPAPR_VENDOR_ID << 16)
> +#define HC_VENDOR_KVM          (EV_KVM_VENDOR_ID << 16)
> +
> +/* ePAPR Hypercall Token */
> +#define HC_EV_IDLE             EV_IDLE
> +
> +/* ePAPR Hypercall Return Codes */
>  #define HC_EV_SUCCESS          0
> -#define HC_EV_UNIMPLEMENTED    12
> +#define HC_EV_UNIMPLEMENTED    EV_UNIMPLEMENTED
>  
>  #define KVM_FEATURE_MAGIC_PAGE 1
>  
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 0e21d15..03ebd5d 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -81,6 +81,10 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
>  
>                  /* Second return value is in r4 */
>                  break;
> +        case HC_VENDOR_EPAPR | HC_EV_IDLE:
> +                r = HC_EV_SUCCESS;
> +                kvm_vcpu_block(vcpu);
> +                break;
>          default:
>                  r = HC_EV_UNIMPLEMENTED;
>                  break;
> @@ -746,6 +750,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
>          pvinfo->hcall[2] = inst_sc;
>          pvinfo->hcall[3] = inst_nop;
>  
> +#ifdef CONFIG_BOOKE

Why limit it to booke? The less ifdefs our code has, the better :)

Alex

> +        pvinfo->flags |= KVM_PPC_PVINFO_FLAGS_EV_IDLE;
> +#endif
> +
>          return 0;
>  }
>  
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index acbe429..6b2c70e 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -449,6 +449,8 @@ struct kvm_ppc_pvinfo {
>          __u8  pad[108];
>  };
>  
> +#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
> +
>  #define KVMIO 0xAE
>  
>  /* machine type bits, to be used as argument to KVM_CREATE_VM */
> -- 
> 1.7.0.4
 
 


Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Alexander Graf


On 16.02.2012, at 10:26, Liu Yu <yu@freescale.com> wrote:

> If the guest hypervisor node contains "has-idle" property.
>
> Signed-off-by: Liu Yu <yu@freescale.com>
> ---
> v4:
> 1. discard the CONFIG_E500 to make code for all powerpc platform
> 2. code cleanup
>
>  arch/powerpc/kernel/epapr.S  |   29 +
>  arch/powerpc/kernel/epapr_para.c |   13 -
>  2 files changed, 41 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/kernel/epapr.S b/arch/powerpc/kernel/epapr.S
> index 697b390..f06d636 100644
> --- a/arch/powerpc/kernel/epapr.S
> +++ b/arch/powerpc/kernel/epapr.S
> @@ -15,6 +15,35 @@
>  #include <asm/ppc_asm.h>
>  #include <asm/asm-offsets.h>
>  
> +#define HC_VENDOR_EPAPR        (1 << 16)
> +#define HC_EV_IDLE             16
> +
> +_GLOBAL(epapr_ev_idle)
> +epapr_ev_idle:
> +        rlwinm  r3,r1,0,0,31-THREAD_SHIFT       /* current thread_info */
> +        lwz     r4,TI_LOCAL_FLAGS(r3)   /* set napping bit */
> +        ori     r4,r4,_TLF_NAPPING      /* so when we take an exception */
> +        stw     r4,TI_LOCAL_FLAGS(r3)   /* it will return to our caller */
> +
> +        wrteei  1
> +
> +idle_loop:
> +        LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
> +
> +.global epapr_ev_idle_start
> +epapr_ev_idle_start:
> +        li      r3, -1
> +        nop
> +        nop
> +        nop

Can't you just bl into epapr_hypercall_start? You don't even have to save the 
old lr. because we never return anyways :)

Alex

> +
> +        /*
> +         * Guard against spurious wakeups (e.g. from a hypervisor) --
> +         * any real interrupt will cause us to return to LR due to
> +         * _TLF_NAPPING.
> +         */
> +        b       idle_loop
> +
>  /* Hypercall entry point. Will be patched with device tree instructions. */
>  .global epapr_hypercall_start
>  epapr_hypercall_start:
> diff --git a/arch/powerpc/kernel/epapr_para.c b/arch/powerpc/kernel/epapr_para.c
> index ea13cac..adee4f1 100644
> --- a/arch/powerpc/kernel/epapr_para.c
> +++ b/arch/powerpc/kernel/epapr_para.c
> @@ -20,6 +20,10 @@
>  #include <linux/of.h>
>  #include <asm/epapr_hcalls.h>
>  #include <asm/cacheflush.h>
> +#include <asm/machdep.h>
> +
> +extern void epapr_ev_idle(void);
> +extern u32 epapr_ev_idle_start[];
>  
>  bool epapr_para_enabled = false;
>  
> @@ -35,10 +39,17 @@ static int __init epapr_para_init(void)
>  
>          insts = of_get_property(hyper_node, "hcall-instructions", &len);
>          if (!(len % 4) && (len >= (4 * 4))) {
> -                for (i = 0; i < (len / 4); i++)
> +                for (i = 0; i < (len / 4); i++) {
>                          epapr_hypercall_start[i] = insts[i];
> +                        epapr_ev_idle_start[i] = insts[i];
> +                }
>                  flush_icache_range((ulong)epapr_hypercall_start,
>                                     (ulong)epapr_hypercall_start + len);
> +                flush_icache_range((ulong)epapr_ev_idle_start,
> +                                   (ulong)epapr_ev_idle_start + len);
> +
> +                if (of_get_property(hyper_node, "has-idle", NULL))
> +                        ppc_md.power_save = epapr_ev_idle;
>  
>                  epapr_para_enabled = true;
>          }
> -- 
> 1.7.0.4
 
 


Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/16/2012 12:21 AM, Arnd Bergmann wrote:
> ioctl is good for hardware devices and stuff that you want to enumerate
> and/or control permissions on. For something like KVM that is really a
> core kernel service, a syscall makes much more sense.
>
> I would certainly never mix the two concepts: If you use a chardev to get
> a file descriptor, use ioctl to do operations on it, and if you use a
> syscall to get the file descriptor then use other syscalls to do operations
> on it.
>
> I don't really have a good recommendation whether or not to change from an
> ioctl based interface to syscall for KVM now. On the one hand I believe it
> would be significantly cleaner, on the other hand we cannot remove the
> chardev interface any more since there are many existing users.


This sums up my feelings exactly.  Moving to syscalls would be an
improvement, but not so much an improvement as to warrant the thrashing
and the pain from having to maintain the old interface for a long while.

-- 
error compiling committee.c: too many arguments to function



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779


Avi Kivity a...@redhat.com changed:

   What|Removed |Added

 CC||a...@redhat.com




--- Comment #13 from Avi Kivity a...@redhat.com  2012-02-16 11:40:22 ---
Indeed the emulator doesn't support mmx yet.  But why is the kernel using mmx
instead of sse?



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #14 from Jan Kiszka jan.kis...@web.de  2012-02-16 11:51:05 ---
Because the selected CPU type doesn't support SSE? Didn't check the config yet,
but I bet that's the reason.



IO_PAGE_FAULT while starting xorg

2012-02-16 Thread edm
Hi,

I got this error while doing startx with 3.2.6 kernel and I can't start xorg:

[   54.683907] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x000220583200 flags=0x0010]

I saw the same error in this mailing list
(http://www.spinics.net/lists/kvm/msg48472.html) so I thought it was
the right ml for reporting this problem, am I wrong?

I attached the output of dmesg, lspci -vv and dmidecode.

Disabling amd_iommu allows X to start, but it's only a workaround.

This is the relevant part of my grub.cfg file:

# (0) Arch Linux
menuentry 'Arch Linux' {
 set root='(hd0,1)'; set legacy_hdbias='0'
 legacy_kernel   '/boot/vmlinuz-linux' '/boot/vmlinuz-linux'
'root=/dev/disk/by-uuid/a4259127-650c-4767-af64-9c86cdc1a5e1' 'ro'
'vga=773' 'amd_iommu_dump' '3'
 legacy_initrd '/boot/initramfs-linux.img' '/boot/initramfs-linux.img'

 # (1) Arch Linux
}


Thanks in advance,

EDM.


dmesg+dmidecode+lspcivv.tar.gz
Description: GNU Zip compressed data


Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-16 Thread Avi Kivity
On 02/16/2012 06:50 AM, Xiao Guangrong wrote:
 I think we do not need handle all tlb-flushed request here since all of these
 request can be delayed to the point where mmu-lock is released , we can simply
 do it:

 void kvm_mmu_defer_remote_flush(kvm, need_flush)
 {
   if (need_flush)
     ++kvm->tlbs_dirty;
 }

 void kvm_mmu_commit_remote_flush(struct kvm *kvm)
 {
   int dirty_count = kvm->tlbs_dirty;

   smp_mb();

   if (!dirty_count)
     return;

   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
     ++kvm->stat.remote_tlb_flush;

<-- point A

   cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
 }

 if this is ok, we only need do small change in the current code, since
 kvm_mmu_commit_remote_flush is very similar with kvm_flush_remote_tlbs().

Suppose at point A another thread executes defer_remote_flush(),
commit_remote_flush(), and defer_remote_flush() again.  This brings the
value of tlbs_dirty back to 1 again, with the tlbs dirty.  The cmpxchg()
then resets tlbs_dirty, leaving the actual tlbs dirty.
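
The interleaving described above can be replayed deterministically. The
following is a minimal single-threaded sketch, not kernel code: tlbs_dirty,
tlb_is_stale, do_flush() and cmpxchg_model() are hypothetical stand-ins for
the real KVM state and primitives, and the second thread's steps are simply
inlined at point A.

```c
#include <assert.h>

/* Hypothetical stand-ins for kvm->tlbs_dirty and the hardware TLB state. */
static long tlbs_dirty;   /* count of deferred flush requests */
static int tlb_is_stale;  /* 1 while some TLB may hold a stale entry */

static void defer_remote_flush(void)
{
    tlb_is_stale = 1;     /* a mapping changed; TLBs are now stale */
    ++tlbs_dirty;
}

static void do_flush(void)
{
    tlb_is_stale = 0;     /* models the remote TLB flush request */
}

/* Non-atomic model of cmpxchg(): store new_val only if *p still equals old. */
static long cmpxchg_model(long *p, long old, long new_val)
{
    long seen = *p;

    if (seen == old)
        *p = new_val;
    return seen;
}

/* Returns 1 if the replay ends with tlbs_dirty == 0 but a stale TLB. */
int replay_race(void)
{
    long dirty_count, d;

    tlbs_dirty = 0;
    tlb_is_stale = 0;

    /* Thread 1: defer, then start the commit. */
    defer_remote_flush();               /* tlbs_dirty == 1 */
    dirty_count = tlbs_dirty;           /* thread 1 samples 1 */
    do_flush();

    /* Point A, thread 2: defer, commit, defer again. */
    defer_remote_flush();               /* tlbs_dirty == 2 */
    d = tlbs_dirty;
    do_flush();
    cmpxchg_model(&tlbs_dirty, d, 0);   /* tlbs_dirty == 0 */
    defer_remote_flush();               /* tlbs_dirty == 1, TLBs stale */
    assert(tlbs_dirty == 1);

    /* Thread 1 resumes: the compare succeeds because 1 == 1. */
    cmpxchg_model(&tlbs_dirty, dirty_count, 0);

    return tlbs_dirty == 0 && tlb_is_stale;
}
```

In this replay the counter compares equal even though it was reset and
incremented in between, so the final cmpxchg clears a request that was
never flushed.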

-- 
error compiling committee.c: too many arguments to function



Consistent VM backup via snapshot

2012-02-16 Thread Michael Weiser
Hello,

I am trying to backup a KVM VM by creating a snapshot of the running VM
and extracting that snapshot from the running VM using qemu-img convert.
Sources on the net suggest that I would get an image that contains
all the machine state of the snapshot. So when I run the VM it would
continue to run just where it was when the snapshot was created.

But instead, when I run the VM with the converted image, the guest boots
up and does a filesystem recovery just as if it had crashed. So the
machine state information seems to go missing when extracting.

Should this work or am I missing something?

Would it instead be possible to create a snapshot of the running VM and
then just rsync the qcow2 image file to the backup location while the VM
is still running? This way, the snapshot inside the image with all state
information of the running VM would stay intact and could be resumed. I
tried this and it worked. I'm just not sure whether it would always
work, that is: Is a snapshot guaranteed to stay consistent even when the
VM is running on top of it and the image file is copied away from under
it?
-- 
Thank you,
Micha


[Bug 42782] New: IO_PAGE_FAULT while starting xorg

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42782

   Summary: IO_PAGE_FAULT while starting xorg
   Product: Virtualization
   Version: unspecified
Kernel Version: 3.2.6
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: fuffi.il.fu...@gmail.com
Regression: No


Hi,

I got this error while doing startx with 3.2.6 kernel and I can't start xorg:

[   54.683907] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0
domain=0x0018 address=0x000220583200 flags=0x0010]

I saw the same error in this mailing list
(http://www.spinics.net/lists/kvm/msg48472.html) so I thought it was
the right ml for reporting this problem, am I wrong?

I attached the output of dmesg, lspci -vv and dmidecode.

Disabling amd_iommu allows X to start, but it's only a workaround.

This is the relevant part of my grub.cfg file:

# (0) Arch Linux
menuentry 'Arch Linux' {
 set root='(hd0,1)'; set legacy_hdbias='0'
 legacy_kernel   '/boot/vmlinuz-linux' '/boot/vmlinuz-linux'
'root=/dev/disk/by-uuid/a4259127-650c-4767-af64-9c86cdc1a5e1' 'ro'
'vga=773' 'amd_iommu_dump' '3'
 legacy_initrd '/boot/initramfs-linux.img' '/boot/initramfs-linux.img'

 # (1) Arch Linux
}


Thanks in advance,

EDM.



[Bug 42782] IO_PAGE_FAULT while starting xorg

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42782





--- Comment #1 from edm fuffi.il.fu...@gmail.com  2012-02-16 12:14:28 ---
Created an attachment (id=72398)
 -- (https://bugzilla.kernel.org/attachment.cgi?id=72398)
dmesg output



[Bug 42782] IO_PAGE_FAULT while starting xorg

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42782





--- Comment #2 from edm fuffi.il.fu...@gmail.com  2012-02-16 12:17:11 ---
Created an attachment (id=72399)
 -- (https://bugzilla.kernel.org/attachment.cgi?id=72399)
dmidecode output



[Bug 42782] IO_PAGE_FAULT while starting xorg

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42782





--- Comment #3 from edm fuffi.il.fu...@gmail.com  2012-02-16 12:17:43 ---
Created an attachment (id=72400)
 -- (https://bugzilla.kernel.org/attachment.cgi?id=72400)
lspci -vv output



[PATCH] correctly mask pmc index bits in RDPMC instruction emulation

2012-02-16 Thread Gleb Natapov

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 7aad544..3e48c1d 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -413,7 +413,7 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data)
 	struct kvm_pmc *counters;
 	u64 ctr;
 
-	pmc &= (3u << 30) - 1;
+	pmc &= ~(3u << 30);
 	if (!fixed && pmc >= pmu->nr_arch_gp_counters)
 		return 1;
 	if (fixed && pmc >= pmu->nr_arch_fixed_counters)
--
Gleb.
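
A small stand-alone sketch of what the one-line change does to the index,
assuming the top two bits (30 and 31) of the RDPMC index are flag bits
rather than part of the counter number. mask_old() and mask_new() are
hypothetical helper names used only for this illustration.

```c
#include <stdint.h>

/* Mask before the patch: (3u << 30) - 1 == 0xBFFFFFFF, clears only bit 30. */
static uint32_t mask_old(uint32_t pmc)
{
    return pmc & ((3u << 30) - 1);
}

/* Mask after the patch: ~(3u << 30) == 0x3FFFFFFF, clears bits 30 and 31. */
static uint32_t mask_new(uint32_t pmc)
{
    return pmc & ~(3u << 30);
}
```

With bit 31 set in the input, the old mask leaves a huge index behind, so
the range checks that follow would reject an otherwise valid counter.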


Re: [PATCH uq/master] Synchronize cpu state with kernel before poking into registers.

2012-02-16 Thread Jan Kiszka
On 2012-02-16 10:12, Gleb Natapov wrote:
 Call to kvm_cpu_synchronize_state() is missing. 
 kvm_arch_stop_on_emulation_error may
 look at outdated registers here.
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 7079e87..51d0ae7 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -2020,6 +2020,7 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run 
 *run)
  
  bool kvm_arch_stop_on_emulation_error(CPUState *env)
  {
 +kvm_cpu_synchronize_state(env);
  return !(env->cr[0] & CR0_PE_MASK) ||
 ((env->segs[R_CS].selector & 3) != 3);
  }

Reviewed-by: Jan Kiszka jan.kis...@siemens.com

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


[PATCH 0/2] Fix build on i386 due to the latest tsc changes

2012-02-16 Thread Avi Kivity
The code fixed by the second patch looks suspect though:

  nsdiff = data - kvm->arch.last_tsc_write;
  nsdiff = (nsdiff * 1000) / vcpu->arch.virtual_tsc_khz;

before the division, nsdiff is in tsc units.  Dividing it by
tsc_khz/1000 is equivalent to multiplying it by 1000000 and dividing it by
tsc_hz, so the result is in units of mega-seconds.  I expect we want

  nsdiff /= (u64)tsc_khz * 1000;

(which is somewhat hard on i386)

Avi Kivity (2):
  KVM: Fix 64-bit division in kvm_set_tsc_khz()
  KVM: Fix 64-bit division in kvm_write_tsc()

 arch/x86/kvm/x86.c |   18 --
 1 files changed, 16 insertions(+), 2 deletions(-)

-- 
1.7.9



[PATCH 1/2] KVM: Fix 64-bit division in kvm_set_tsc_khz()

2012-02-16 Thread Avi Kivity
Breaks i386 build.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1312f13..1d71b13 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -978,6 +978,13 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 			      vcpu->arch.virtual_tsc_shift);
 }
 
+static u32 adjust_tsc_khz(u32 khz, s32 ppm)
+{
+	u64 v = (u64)khz * (1000000 + ppm);
+	do_div(v, 1000000);
+	return v;
+}
+
 static void kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 this_tsc_khz)
 {
 	u32 thresh_lo, thresh_hi;
@@ -995,8 +1002,8 @@ static void kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 this_tsc_khz)
 	 * rate being applied is within that bounds of the hardware
 	 * rate.  If so, no scaling or compensation need be done.
 	 */
-	thresh_lo = (u64)tsc_khz * (1000000 - tsc_tolerance_ppm) / 1000000;
-	thresh_hi = (u64)tsc_khz * (1000000 + tsc_tolerance_ppm) / 1000000;
+	thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
+	thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
 	if (this_tsc_khz < thresh_lo || this_tsc_khz > thresh_hi) {
 		pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", this_tsc_khz, thresh_lo, thresh_hi);
 		use_scaling = 1;
-- 
1.7.9
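
For readers following along outside the kernel tree, here is a user-space
sketch of the tolerance arithmetic in the helper above. It is an
illustration only: plain '/' stands in for do_div(), and the scale of
1000000 assumes the tolerance is expressed in parts per million. On 32-bit
x86 a plain 64-bit '/' makes gcc emit a call to a libgcc helper that the
kernel does not link against, which is presumably the build breakage the
patch addresses.

```c
#include <stdint.h>

/* Scale khz by (1 + ppm / 1000000); ppm may be negative. */
static uint32_t adjust_tsc_khz(uint32_t khz, int32_t ppm)
{
    uint64_t v = (uint64_t)khz * (uint64_t)(1000000 + ppm);

    return (uint32_t)(v / 1000000);   /* do_div(v, 1000000) in the kernel */
}
```

For a 2 GHz TSC (2000000 kHz) and a tolerance of 250 ppm this gives a
window of 1999500 to 2000500 kHz.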



[PATCH 2/2] KVM: Fix 64-bit division in kvm_write_tsc()

2012-02-16 Thread Avi Kivity
Breaks i386 build.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1d71b13..c9d99e5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1034,7 +1034,14 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
 
 	/* n.b - signed multiplication and division required */
 	nsdiff = data - kvm->arch.last_tsc_write;
+#ifdef CONFIG_X86_64
 	nsdiff = (nsdiff * 1000) / vcpu->arch.virtual_tsc_khz;
+#else
+	/* do_div() only does unsigned */
+	asm("idiv %2; xor %%edx, %%edx"
+	    : "=A"(nsdiff)
+	    : "A"(nsdiff * 1000), "rm"(vcpu->arch.virtual_tsc_khz));
+#endif
 	nsdiff -= elapsed;
 	if (nsdiff < 0)
 		nsdiff = -nsdiff;
-- 
1.7.9



Re: [PATCH 0/2] Fix build on i386 due to the latest tsc changes

2012-02-16 Thread Avi Kivity
On 02/16/2012 03:48 PM, Avi Kivity wrote:
 The code fixed by the second patch looks suspect though:

   nsdiff = data - kvm->arch.last_tsc_write;
   nsdiff = (nsdiff * 1000) / vcpu->arch.virtual_tsc_khz;

 before the division, nsdiff is in tsc units.  Dividing it by
 tsc_khz/1000 is equivalent to multiplying it by 1000000 and dividing it by
 tsc_hz, so the result is in units of mega-seconds.  I expect we want


Actually it results in units of microseconds, while we want nanoseconds.

So maybe the correct code is

  nsdiff = (nsdiff * 1000000) / vcpu->arch.virtual_tsc_khz;

returning nanoseconds.

I note that if the guest writes a large value into the tsc, this breaks.
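
The unit argument can be checked with a few lines of stand-alone C. This is
a sketch under the assumption that the input is a signed cycle count and
the rate is in kHz; the function names are hypothetical. Cycles divided by
kHz gives milliseconds, so a factor of 1000 yields microseconds and a
factor of 1000000 yields nanoseconds. The last helper shows one plausible
reading of the "large value" concern: the multiplication can overflow a
signed 64-bit value.

```c
#include <stdint.h>

/* cycles / kHz = milliseconds; scaling by 1000 gives microseconds. */
static int64_t cycles_to_us(int64_t cycles, uint32_t tsc_khz)
{
    return cycles * 1000 / (int64_t)tsc_khz;
}

/* Scaling by 1000000 instead gives nanoseconds. */
static int64_t cycles_to_ns(int64_t cycles, uint32_t tsc_khz)
{
    return cycles * 1000000 / (int64_t)tsc_khz;
}

/* 1 if cycles * 1000000 would not fit in a signed 64-bit integer. */
static int ns_conversion_overflows(int64_t cycles)
{
    return cycles > INT64_MAX / 1000000 || cycles < INT64_MIN / 1000000;
}
```

With a 2 GHz TSC, 2000 cycles is one microsecond, which is what the
1000-based expression returns, while the 1000000-based one returns 1000.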

-- 
error compiling committee.c: too many arguments to function



[PATCH] arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice

2012-02-16 Thread Danny Kukawka
arch/powerpc/kvm/book3s_hv.c: included 'linux/sched.h' twice,
remove the duplicate.

Signed-off-by: Danny Kukawka danny.kuka...@bisect.de
---
 arch/powerpc/kvm/book3s_hv.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 336983d..a726716 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -46,7 +46,6 @@
 #include <asm/page.h>
 #include <asm/hvcall.h>
 #include <linux/gfp.h>
-#include <linux/sched.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
 
-- 
1.7.8.3



Re: [PATCH] Synchronize cpu state with kernel before poking into registers.

2012-02-16 Thread Avi Kivity
On 02/16/2012 11:12 AM, Gleb Natapov wrote:
 Call to kvm_cpu_synchronize_state() is missing. 
 kvm_arch_stop_on_emulation_error may
 look at outdated registers here.



Thanks, applied to uq/master.

-- 
error compiling committee.c: too many arguments to function



[Bug 42782] IO_PAGE_FAULT while starting xorg

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42782


Joerg Roedel j...@8bytes.org changed:

   What|Removed |Added

 CC||j...@8bytes.org




--- Comment #4 from Joerg Roedel j...@8bytes.org  2012-02-16 14:02:17 ---
You are using the closed-source NVidia driver which apparently has a bug. The
driver seems not to use the DMA-API properly and tries to DMA to an invalid
handle.

Please try the open source NVidia drivers or report the problem to NVidia
directly.



Re: [PATCH] BUG in pv_clock when overflow condition is detected

2012-02-16 Thread Avi Kivity
On 02/15/2012 07:18 PM, Igor Mammedov wrote:
  On 02/15/2012 01:23 PM, Igor Mammedov wrote:
 static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time
   *shadow)
 {
   -u64 delta = native_read_tsc() - shadow->tsc_timestamp;
   +u64 delta;
   +u64 tsc = native_read_tsc();
   +BUG_ON(tsc < shadow->tsc_timestamp);
   +delta = tsc - shadow->tsc_timestamp;
 return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
shadow->tsc_shift);
  
   Maybe a WARN_ON_ONCE()?  Otherwise a relatively minor hypervisor
   bug can
   kill the guest.
  
  
   An attempt to print from this place is not perfect since it often
   leads
   to recursive calling to this very function and it hang there
   anyway.
   But if you insist I'll re-post it with WARN_ON_ONCE,
   It won't make much difference because guest will hang/stall due
   overflow
   anyway.
  
  Won't a BUG_ON() also result in a printk?
 Yes, it will. But stack will still keep failure point and poking
 with crash/gdb at core will always show where it's BUGged.

 In case it manages to print dump somehow (saw it couple times from ~
 30 test cycles), logs from console or from kernel message buffer
 (again poking with gdb) will show where it was called from.

 If WARN* is used, it will still totally screw up the clock and
 last value and the system will become unusable, requiring looking with
 gdb/crash at the core anyway.

 So I've just used more stable failure point that will leave trace
 everywhere it manages (maybe in console log, but for sure in stack)
 in case of WARN it might leave trace on console or not and probably
 won't reflect failure point in stack either leaving only kernel
 message buffer for clue.


Makes sense.  But do get an ack from the Xen people to ensure this
doesn't break for them.
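
For context, the condition the quoted BUG_ON() checks can be shown in
isolation. This is a stand-alone sketch with hypothetical function names,
not the pvclock code: when the current TSC reading is smaller than the
recorded timestamp, the unsigned subtraction wraps around to a value close
to 2^64, which then gets scaled into an enormous nanosecond offset.

```c
#include <stdint.h>

/* Unsigned difference, as in "delta = tsc - shadow->tsc_timestamp". */
static uint64_t tsc_delta(uint64_t tsc, uint64_t tsc_timestamp)
{
    return tsc - tsc_timestamp;
}

/* The condition the patch turns into a BUG_ON(). */
static int tsc_went_backwards(uint64_t tsc, uint64_t tsc_timestamp)
{
    return tsc < tsc_timestamp;
}
```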


-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 3/4 V13] Add ioctl for KVM_KVMCLOCK_CTRL

2012-02-16 Thread Avi Kivity
On 02/08/2012 05:07 PM, Eric B Munson wrote:
 Now that we have a flag that will tell the guest it was suspended, create an
 interface for that communication using a KVM ioctl.

 +
 +Capability: KVM_CAP_KVMCLOCK_CTRL
 +Architectures: Any that implement pvclocks (currently x86 only)
 +Type: vcpu ioctl
 +Parameters: None
 +Returns: 0 on success, -1 on error
 +
 +This signals to the host kernel that the specified guest is being paused by
 +userspace.  The host will set a flag in the pvclock structure that is checked
 +from the soft lockup watchdog.  

This needs to be expanded.  There is a protocol between the host and
guest that needs to be documented, including the requirement to use a
single instruction (or ll/sc) to clear the flag.  See
Documentation/virtual/kvm/msr.txt

 This ioctl can be called during pause or
 +unpause.
 +

After pausing the vcpu, but before resuming it.
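
To illustrate the single-instruction requirement mentioned above, here is a
minimal user-space sketch using C11 atomics. The flag value and the function
name are assumptions made for this example, not the actual pvclock ABI. The
point is that the guest must test and clear the bit in one atomic
read-modify-write; a separate load and store could overwrite a bit the host
sets in between.

```c
#include <stdatomic.h>
#include <stdint.h>

#define GUEST_PAUSED_FLAG 0x02u   /* hypothetical bit position */

/* Atomically clear the flag; returns 1 if it was set beforehand. */
static int test_and_clear_paused(_Atomic uint8_t *flags)
{
    uint8_t old = atomic_fetch_and(flags, (uint8_t)~GUEST_PAUSED_FLAG);

    return (old & GUEST_PAUSED_FLAG) != 0;
}
```

Other bits in the same byte are left untouched, and a second call reports
that the flag is no longer set.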


-- 
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Anthony Liguori

On 02/16/2012 02:57 AM, Gleb Natapov wrote:

On Wed, Feb 15, 2012 at 03:59:33PM -0600, Anthony Liguori wrote:

On 02/15/2012 07:39 AM, Avi Kivity wrote:

On 02/07/2012 08:12 PM, Rusty Russell wrote:

I would really love to have this, but the problem is that we'd need a
general purpose bytecode VM with binding to some kernel APIs.  The
bytecode VM, if made general enough to host more complicated devices,
would likely be much larger than the actual code we have in the kernel now.


We have the ability to upload bytecode into the kernel already.  It's in
a great bytecode interpreted by the CPU itself.


Unfortunately it's inflexible (has to come with the kernel) and open to
security vulnerabilities.


I wonder if there's any reasonable way to run device emulation
within the context of the guest.  Could we effectively do something
like SMM?

For a given set of traps, reflect back into the guest quickly
changing the visibility of the VGA region. It may require installing
a new CR3 but maybe that wouldn't be so bad with VPIDs.


What will it buy us? Surely not speed. Entering a guest is not much
(if at all) faster than exiting to userspace and any non trivial
operation will require exit to userspace anyway,


You can emulate the PIT/RTC entirely within the guest using kvmclock which 
doesn't require an additional exit to get the current time base.


So instead of:

1) guest -> host kernel
2) host kernel -> userspace
3) implement logic using rdtscp via VDSO
4) userspace -> host kernel
5) host kernel -> guest

You go:

1) guest -> host kernel
2) host kernel -> guest (with special CR3)
3) implement logic using rdtscp + kvmclock page
4) change CR3 within guest and RETI to VMEXIT source RIP

Same basic concept as PS/2 emulation with SMM.

Regards,

Anthony Liguori


Re: [PATCH v3 4/9] target-i386: Add infrastructure for reporting TPR MMIO accesses

2012-02-16 Thread Jan Kiszka
On 2012-02-16 16:21, Avi Kivity wrote:
 On 02/14/2012 05:13 PM, Jan Kiszka wrote:
 Note that KVM without in-kernel irqchip will report the address after
 the instruction that triggered a write access. In contrast, read
 accesses will return the precise information.

 
 Well this is weird.  We could retro-doc one or the other behaviour, but
 this-on-read-but-that-on-write is just too strange.
 
 The documented way of dealing with this is to queue a signal and reenter
 the guest.  kvm will perform anything it needs to complete the
 instruction (perhaps issuing more mmio, say if someone used movsd to
 read the APIC) and then exit on the signal.  By then rip will point
 exactly after the instruction.

Hmm, true. And can trivially be changed (I'm injecting the event after
instruction completion). Will roll out a new version.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH v3 6/9] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-16 Thread Avi Kivity
On 02/14/2012 05:13 PM, Jan Kiszka wrote:
 This enables acceleration for MMIO-based TPR registers accesses of
 32-bit Windows guest systems. It is mostly useful with KVM enabled,
 either on older Intel CPUs (without flexpriority feature, can also be
 manually disabled for testing) or any current AMD processor.

 The approach introduced here is derived from the original version of
 qemu-kvm. It was refactored, documented, and extended by support for
 user space APIC emulation, both with and without KVM acceleration. The
 VMState format was kept compatible, so was the ABI to the option ROM
 that implements the guest-side para-virtualized driver service. This
 enables seamless migration from qemu-kvm to upstream or, one day,
 between KVM and TCG mode.


 +printf("patch @%lx\n", (long)ip);

Debug code


-- 
error compiling committee.c: too many arguments to function



Re: [PATCH v3 4/9] target-i386: Add infrastructure for reporting TPR MMIO accesses

2012-02-16 Thread Avi Kivity
On 02/14/2012 05:13 PM, Jan Kiszka wrote:
 Note that KVM without in-kernel irqchip will report the address after
 the instruction that triggered a write access. In contrast, read
 accesses will return the precise information.


Well this is weird.  We could retro-doc one or the other behaviour, but
this-on-read-but-that-on-write is just too strange.

The documented way of dealing with this is to queue a signal and reenter
the guest.  kvm will perform anything it needs to complete the
instruction (perhaps issuing more mmio, say if someone used movsd to
read the APIC) and then exit on the signal.  By then rip will point
exactly after the instruction.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice

2012-02-16 Thread Avi Kivity
On 02/16/2012 03:55 PM, Danny Kukawka wrote:
 arch/powerpc/kvm/book3s_hv.c: included 'linux/sched.h' twice,
 remove the duplicate.


Thanks, applied.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH] correctly mask pmc index bits in RDPMC instruction emulation

2012-02-16 Thread Avi Kivity
On 02/16/2012 02:44 PM, Gleb Natapov wrote:
 Signed-off-by: Gleb Natapov g...@redhat.com


Thanks, applied and queued for 3.3

-- 
error compiling committee.c: too many arguments to function



Re: [KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-16 Thread Avi Kivity
On 02/15/2012 09:36 PM, Andy Lutomirski wrote:
 Hi, kvm people-

 Here's a strange failure.  It could be a bug in something
 RHEL6-specific, but it could be a generic issue that only triggers
 with a paravirt guest with old userspace on a non-ept host.  There was
 a bug like this on Xen, and I'm wondering something's wrong on kvm as
 well.

 For background, a change in 3.1 (IIRC) means that, when
 vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is
 NX.  It seems like Amit's machine is marking the physical PTE present
 but unreadable.  

No such thing as present and unreadable, without EPT.

 So I could have messed up, or there could be a subtle
 bug somewhere.  Any ideas?

What's the code trying to do?  Execute an instruction from a
non-executable page, trap the #PF, and emulate?  And what are the
symptoms?  Wrong error code for the #PF?  That could easily be a kvm bug.


 I'll try to reproduce on a non-ept host later on, but that will
 involve finding one.

rmmod kvm-intel
modprobe kvm-intel ept=0

 Hmm.  You don't have ept.  If your guest kernel supports paravirt,
 then you might use the hypercall interface instead of programming the
 fixmap directly.

There is no hypercall interface for writing page tables in kvm.


 
  This is what I get with vsyscall=none, where emulate and native work
  fine on the 3.2 kernel on different host hardware, the guest stays the
  same:
 
 
  [2.874661] debug: unmapping init memory 
  8167f000..818dc000
  [2.876778] Write protecting the kernel read-only data: 6144k
  [2.879111] debug: unmapping init memory 
  880001318000..88000140
  [2.881242] debug: unmapping init memory 
  8800015a..88000160
  [2.884637] init[1] vsyscall attempted with vsyscall=none 
  ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 
  di:0

 This line (vsyscall attempted) means that the emulation worked
 correctly.  Your other traces didn't have it or anything like it,
 which mostly rules out do_emulate_vsyscall issues.


Can you point me at the code in question?

Amit, a trace would be nice.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 3/3] KVM: perf: kvm events analysis tool

2012-02-16 Thread Arnaldo Carvalho de Melo
Em Wed, Feb 15, 2012 at 10:05:08PM -0700, David Ahern escreveu:
 On 2/15/12 9:59 PM, Xiao Guangrong wrote:
 
 
 Okay, i will post the next version after collecting your new comments!
 
 Thanks for your time, David! :)
 
 
 I had more comments, but got sidetracked and forgot to come back to
 this. I still haven't looked at the code yet, but some comments from
 testing:
 
 1. The error message:
   Warning: Error: expected type 5 but read 4
   Warning: Error: expected type 5 but read 0
   Warning: unknown op '}'
 
 is fixed by this patch which has not yet made its way into perf:
 https://lkml.org/lkml/2011/9/4/41
 
 The most recent request:
 https://lkml.org/lkml/2012/2/8/479
 
 Arnaldo: the patch still applies cleanly (but with an offset of -2 lines).

I'll merge this one now, recall Steven asked me to, thanks for the
reminder,

- Arnaldo


Re: [KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-16 Thread Andy Lutomirski
On Thu, Feb 16, 2012 at 8:17 AM, Avi Kivity a...@redhat.com wrote:
 On 02/15/2012 09:36 PM, Andy Lutomirski wrote:
 Hi, kvm people-

 Here's a strange failure.  It could be a bug in something
 RHEL6-specific, but it could be a generic issue that only triggers
 with a paravirt guest with old userspace on a non-ept host.  There was
 a bug like this on Xen, and I'm wondering something's wrong on kvm as
 well.

 For background, a change in 3.1 (IIRC) means that, when
 vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is
 NX.  It seems like Amit's machine is marking the physical PTE present
 but unreadable.

 No such thing as present and unreadable, without EPT.

 So I could have messed up, or there could be a subtle
 bug somewhere.  Any ideas?

 What's the code trying to do?  Execute an instruction from an
 non-executable page, trap the #PF, and emulate?  And what are the
 symptoms? wrong error code for the #PF?  That could easily be a kvm bug.


The symptom is that some kind of access to a page that's supposed to
be readable, NX is reporting error 5.  I'm not quite sure what kind of
access is causing that.


 I'll try to reproduce on a non-ept host later on, but that will
 involve finding one.

 rmmod kvm-intel
 modprobe kvm-intel ept=0

I just tried that and still can't reproduce the problem.  FWIW, I also
failed to reproduce it on the one RHEL6 machine I have access to.


 Hmm.  You don't have ept.  If your guest kernel supports paravirt,
 then you might use the hypercall interface instead of programming the
 fixmap directly.

 There is no hypercall interface for writing page tables in kvm.

Evidently I was looking at the removed kvm_set_pte stuff :)



 
  This is what I get with vsyscall=none, where emulate and native work
  fine on the 3.2 kernel on different host hardware, the guest stays the
  same:
 
 
  [    2.874661] debug: unmapping init memory 
  8167f000..818dc000
  [    2.876778] Write protecting the kernel read-only data: 6144k
  [    2.879111] debug: unmapping init memory 
  880001318000..88000140
  [    2.881242] debug: unmapping init memory 
  8800015a..88000160
  [    2.884637] init[1] vsyscall attempted with vsyscall=none 
  ip:ff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 
  di:0

 This line (vsyscall attempted) means that the emulation worked
 correctly.  Your other traces didn't have it or anything like it,
 which mostly rules out do_emulate_vsyscall issues.


 Can you point me at the code in question?

The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall.
The bad access is to the vsyscall page.


 Amit, a trace would be nice.

The full output from a test boot of my (updated this morning) initramfs here:
http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
may give a better hint.

The updated code is here:

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef time_t (*vsys_time_t)(time_t *);

int main()
{
  vsys_time_t vsys_time = (vsys_time_t)(0xff600400);
  unsigned char *p = (unsigned char *)0xff600400;
  int i;

  printf("Will try reading...\n");
  printf("The first few bytes are:\n");
  for (i = 0; i < 16; i++) {
    unsigned char c = p[i];
    printf("%02x ", (int)c);
  }
  printf("\n");

  printf("Will try executing...\n");
  printf("The time is %ld\n", (long)( vsys_time(0) ));

  printf("All done\n");
  while(1)
    pause();
}

--Andy


[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #15 from madenginee...@gmail.com  2012-02-16 16:48:56 ---
I set the kernel up for an AMD Geode LX800 (CONFIG_MGEODE_LX), which does
supposedly support SSE. No idea why it would have built using MMX if that is
now deprecated in favor of SSE.

I can try disabling the framebuffer
(CONFIG_FB/CONFIG_FB_GEODE/CONFIG_FB_GEODE_LX) for testing purposes;
unfortunately this wouldn't work for the real thing, since we need it for the
Qt-based user interface.



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #16 from madenginee...@gmail.com  2012-02-16 16:51:10 ---
Background: I'm using KVM as a test platform for code destined for a panel PC
using a Geode LX800.



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Scott Wood
On 02/16/2012 04:24 AM, Alexander Graf wrote:
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 +_GLOBAL(epapr_ev_idle)
 +epapr_ev_idle:
 +    rlwinm  r3,r1,0,0,31-THREAD_SHIFT  /* current thread_info */
 +    lwz     r4,TI_LOCAL_FLAGS(r3)      /* set napping bit */
 +    ori     r4,r4,_TLF_NAPPING         /* so when we take an exception */
 +    stw     r4,TI_LOCAL_FLAGS(r3)      /* it will return to our caller */
 +
 +    wrteei  1
 +
 +idle_loop:
 +    LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
 +
 +.global epapr_ev_idle_start
 +epapr_ev_idle_start:
 +    li      r3, -1
 +    nop
 +    nop
 +    nop
 
 Can't you just bl into epapr_hypercall_start? You don't even have to save the 
 old lr. because we never return anyways :)

The interrupt will branch to LR, so no, we can't trash it or put it
anywhere else.
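
As a reading aid for the quoted assembly: the rlwinm line clears the low THREAD_SHIFT bits of the stack pointer (r1) to locate the current thread_info, and the lwz/ori/stw triple ORs the napping flag into its local flags word. A rough C equivalent is sketched below. THREAD_SHIFT = 13, the flag value, and the struct layout are assumptions for illustration only and are not taken from the patch.

```c
#include <stdint.h>

/* Assumed values for illustration only; the real ones come from the
 * kernel headers of the platform being built. */
#define THREAD_SHIFT     13     /* assumed: 8 KiB kernel stacks */
#define TLF_NAPPING_BIT  0x1u   /* assumed stand-in for _TLF_NAPPING */

/* Hypothetical, cut-down stand-in for struct thread_info. */
struct thread_info_sketch {
	uint32_t local_flags;
};

/* rlwinm r3,r1,0,0,31-THREAD_SHIFT: keep the high bits of the stack
 * pointer and clear the low THREAD_SHIFT bits. */
static uint32_t thread_info_base(uint32_t sp)
{
	return sp & ~((UINT32_C(1) << THREAD_SHIFT) - 1);
}

/* lwz / ori / stw: read-modify-write of the local flags word. */
static void set_napping(struct thread_info_sketch *ti)
{
	ti->local_flags |= TLF_NAPPING_BIT;
}
```

With the flag set, the exception return path sends the CPU back to the caller through LR, which is why LR has to stay intact across the hypercall.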

-scott



Re: [PATCH v4 1/3] KVM: PPC: epapr: Factor out the epapr init

2012-02-16 Thread Scott Wood
On 02/16/2012 03:26 AM, Liu Yu wrote:
 from the kvm guest paravirt init code.
 
 Signed-off-by: Liu Yu yu@freescale.com
 ---
 v4:
 1. code cleanup
 2. move kvm_hypercall_start() to epapr_hypercall_start()
 
  arch/powerpc/Kconfig|4 ++
  arch/powerpc/include/asm/epapr_hcalls.h |2 +
  arch/powerpc/kernel/Makefile|1 +
  arch/powerpc/kernel/epapr.S |   25 
  arch/powerpc/kernel/epapr_para.c|   49 
 +++
  arch/powerpc/kernel/kvm.c   |   28 ++
  arch/powerpc/kernel/kvm_emul.S  |   10 --
  arch/powerpc/kvm/Kconfig|1 +
  8 files changed, 85 insertions(+), 35 deletions(-)
  create mode 100644 arch/powerpc/kernel/epapr.S
  create mode 100644 arch/powerpc/kernel/epapr_para.c

The comment about spelling out paravirt wasn't meant to be restricted
to the kconfig symbol.  There are lots of words that begin with para,
and ePAPR isn't just about virtualization.

 diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
 index 1919634..1262b43 100644
 --- a/arch/powerpc/Kconfig
 +++ b/arch/powerpc/Kconfig
 @@ -202,6 +202,10 @@ config EPAPR_BOOT
 Used to allow a board to specify it wants an ePAPR compliant wrapper.
   default n
  
 +config EPAPR_PARAVIRT
 + bool
 + default n

Why isn't this user-selectable?  It's useful on its own.

 diff --git a/arch/powerpc/kernel/epapr.S b/arch/powerpc/kernel/epapr.S
 new file mode 100644
 index 000..697b390
 --- /dev/null
 +++ b/arch/powerpc/kernel/epapr.S

epapr-hcall.S

 +bool epapr_para_enabled = false;
 +
 +static int __init epapr_para_init(void)
 +{
 + struct device_node *hyper_node;
 + const u32 *insts;
 + int len, i;
 +
 + hyper_node = of_find_node_by_path("/hypervisor");
 + if (!hyper_node)
 + return -ENODEV;
 +
 + insts = of_get_property(hyper_node, "hcall-instructions", &len);
 + if (!(len % 4)  (len = (4 * 4))) {
 + for (i = 0; i < (len / 4); i++)
 + epapr_hypercall_start[i] = insts[i];
 + flush_icache_range((ulong)epapr_hypercall_start,
 +(ulong)epapr_hypercall_start + len);
 +
 + epapr_para_enabled = true;
 + }

Use patch_instruction(), fix the if test, and remove unnecessary
parentheses.  Print an error if the if test fails, but return silently
if the property is absent.
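
The three outcomes asked for here (absent property, malformed property, usable property) can be sketched as a small standalone check. This is only an illustration of the control flow, not the patch itself: the function and enum names are hypothetical, and the upper bound of four instructions is an assumption read off the 4 * 4 in the quoted test.

```c
#include <stddef.h>

/* Assumed from the "4 * 4" in the quoted test: at most four
 * 32-bit instructions in the property. */
#define HCALL_MAX_INSTS 4

enum hcall_prop_status {
	HCALL_PROP_ABSENT,  /* no property: caller returns silently */
	HCALL_PROP_BAD,     /* malformed: caller prints an error */
	HCALL_PROP_OK       /* safe to copy len / 4 instructions */
};

/* insts/len mimic what a property lookup would hand back:
 * NULL when the property does not exist, else a byte length. */
static enum hcall_prop_status check_hcall_prop(const void *insts, int len)
{
	if (!insts)
		return HCALL_PROP_ABSENT;
	if (len <= 0 || len % 4 || len > HCALL_MAX_INSTS * 4)
		return HCALL_PROP_BAD;
	return HCALL_PROP_OK;
}
```

Checking the pointer before the length also avoids reading len when the lookup failed and never wrote it.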

Please make asm/epapr_hcalls.h and asm/fsl_hcalls.h work with this as well.

-Scott



Re: [KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-16 Thread Avi Kivity
On 02/16/2012 06:45 PM, Andy Lutomirski wrote:
 
  So I could have messed up, or there could be a subtle
  bug somewhere.  Any ideas?
 
  What's the code trying to do?  Execute an instruction from a
  non-executable page, trap the #PF, and emulate?  And what are the
  symptoms? wrong error code for the #PF?  That could easily be a kvm bug.
 

 The symptom is that some kind of access to a page that's supposed to
 be readable, NX is reporting error 5.  I'm not quite sure what kind of
 access is causing that.

Might it be a fetch access, with kvm forgetting to set bit 4 correctly?

 
  Can you point me at the code in question?

 The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall.
 The bad access is to the vsyscall page.

The bad access is on purpose, yes?

From fault.c:

#ifdef CONFIG_X86_64
        /*
         * Instruction fetch faults in the vsyscall page might need
         * emulation.
         */
        if (unlikely((error_code & PF_INSTR) &&
                     ((address & ~0xfff) == VSYSCALL_START))) {
                if (emulate_vsyscall(regs, address))
                        return;
        }
#endif

so it seems like kvm doesn't set PF_INSTR?

I thought we unit tested that, but maybe not this exact scenario.
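
For reference, the reported "error 5" can be decoded with the x86 page-fault error code bits. The sketch below is a standalone userspace helper, not kernel code; the PF_* names mirror the ones used in fault.c, and pf_describe is a hypothetical name. Error 5 comes out as a user-mode read hitting a protection violation, with the instruction-fetch bit (bit 4) clear, whereas a genuine NX fetch fault from user mode would be expected to carry 0x15.

```c
#include <stdio.h>

/* x86 page-fault error code bits (names as used in fault.c). */
#define PF_PROT   (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE  (1u << 1)  /* 0: read access, 1: write access */
#define PF_USER   (1u << 2)  /* 0: kernel mode, 1: user mode */
#define PF_RSVD   (1u << 3)  /* reserved bit set in a paging entry */
#define PF_INSTR  (1u << 4)  /* fault was an instruction fetch */

/* Hypothetical helper: render an error code as text in buf. */
static int pf_describe(unsigned int error_code, char *buf, size_t len)
{
	return snprintf(buf, len, "%s, %s, %s mode%s%s",
			(error_code & PF_PROT) ? "protection violation"
					       : "page not present",
			(error_code & PF_WRITE) ? "write" : "read",
			(error_code & PF_USER) ? "user" : "kernel",
			(error_code & PF_RSVD) ? ", reserved bit set" : "",
			(error_code & PF_INSTR) ? ", instruction fetch" : "");
}
```

Calling pf_describe(5, buf, sizeof(buf)) yields "protection violation, read, user mode", which is why the PF_INSTR test quoted above never fires for this fault.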


-- 
error compiling committee.c: too many arguments to function



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Scott Wood
On 02/16/2012 03:26 AM, Liu Yu wrote:
 If the guest hypervisor node contains has-idle property.
 
 Signed-off-by: Liu Yu yu@freescale.com
 ---
 v4:
 1. discard the CONFIG_E500 to make code for all powerpc platform
 2. code cleanup

Is the TLF_NAPPING stuff supported on all powerpc platforms?

-Scott



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Alexander Graf

On 16.02.2012, at 17:58, Scott Wood wrote:

 On 02/16/2012 04:24 AM, Alexander Graf wrote:
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 +_GLOBAL(epapr_ev_idle)
 +epapr_ev_idle:
 +    rlwinm  r3,r1,0,0,31-THREAD_SHIFT  /* current thread_info */
 +    lwz     r4,TI_LOCAL_FLAGS(r3)      /* set napping bit */
 +    ori     r4,r4,_TLF_NAPPING         /* so when we take an exception */
 +    stw     r4,TI_LOCAL_FLAGS(r3)      /* it will return to our caller */
 +
 +    wrteei  1
 +
 +idle_loop:
 +    LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
 +
 +.global epapr_ev_idle_start
 +epapr_ev_idle_start:
 +    li      r3, -1
 +    nop
 +    nop
 +    nop
 
 Can't you just bl into epapr_hypercall_start? You don't even have to save 
 the old lr. because we never return anyways :)
 
 The interrupt will branch to LR, so no, we can't trash it or put it
 anywhere else.

Hrm. But we can clobber ctr, right? So how about we make the generic version do 
a bctr and then just do a small C wrapper that takes lr, moves it to ctr and 
branches to the generic one?

Then we don't have to replicate the hypercall code all over again for every 
invocation.

Alex



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #17 from Jan Kiszka jan.kis...@web.de  2012-02-16 17:24:34 ---
(In reply to comment #15)
 I set the kernel up for an AMD Geode LX800 (CONFIG_MGEODE_LX), which does
 supposedly support SSE. No idea why it would have built using MMX if that is
 now deprecated in favor of SSE.

At least not according to the Linux configuration: MGEODE_LX only sets
X86_USE_3DNOW, thus MMX.



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Scott Wood
On 02/16/2012 11:18 AM, Alexander Graf wrote:
 
 On 16.02.2012, at 17:58, Scott Wood wrote:
 
 On 02/16/2012 04:24 AM, Alexander Graf wrote:
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 +_GLOBAL(epapr_ev_idle)
 +epapr_ev_idle:
 +    rlwinm  r3,r1,0,0,31-THREAD_SHIFT  /* current thread_info */
 +    lwz     r4,TI_LOCAL_FLAGS(r3)      /* set napping bit */
 +    ori     r4,r4,_TLF_NAPPING         /* so when we take an exception */
 +    stw     r4,TI_LOCAL_FLAGS(r3)      /* it will return to our caller */
 +
 +    wrteei  1
 +
 +idle_loop:
 +    LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
 +
 +.global epapr_ev_idle_start
 +epapr_ev_idle_start:
 +    li      r3, -1
 +    nop
 +    nop
 +    nop

 Can't you just bl into epapr_hypercall_start? You don't even have to save 
 the old lr. because we never return anyways :)

 The interrupt will branch to LR, so no, we can't trash it or put it
 anywhere else.
 
 Hrm. But we can clobber ctr, right? So how about we make the generic version 
 do a bctr and then just do a small C wrapper that takes lr, moves it to ctr 
 and branches to the generic one?

If it's just for this, I would say don't mess with the normal hcall path
for the sake of idle.  If using CTR would let us get away without
creating a stack frame in call sites, maybe that would be worthwhile,
depending on what sort of hcalls we end up having.

 Then we don't have to replicate the hypercall code all over again for every 
 invocation.

We shouldn't need to do it for every invocation.  Idle is special due to
the TLF_NAPPING hack.

-Scott



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Alexander Graf

On 16.02.2012, at 18:28, Scott Wood wrote:

 On 02/16/2012 11:18 AM, Alexander Graf wrote:
 
 On 16.02.2012, at 17:58, Scott Wood wrote:
 
 On 02/16/2012 04:24 AM, Alexander Graf wrote:
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 +_GLOBAL(epapr_ev_idle)
 +epapr_ev_idle:
 +    rlwinm  r3,r1,0,0,31-THREAD_SHIFT  /* current thread_info */
 +    lwz     r4,TI_LOCAL_FLAGS(r3)      /* set napping bit */
 +    ori     r4,r4,_TLF_NAPPING         /* so when we take an exception */
 +    stw     r4,TI_LOCAL_FLAGS(r3)      /* it will return to our caller */
 +
 +    wrteei  1
 +
 +idle_loop:
 +    LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
 +
 +.global epapr_ev_idle_start
 +epapr_ev_idle_start:
 +    li      r3, -1
 +    nop
 +    nop
 +    nop
 
 Can't you just bl into epapr_hypercall_start? You don't even have to save 
 the old lr. because we never return anyways :)
 
 The interrupt will branch to LR, so no, we can't trash it or put it
 anywhere else.
 
 Hrm. But we can clobber ctr, right? So how about we make the generic version 
 do a bctr and then just do a small C wrapper that takes lr, moves it to ctr 
 and branches to the generic one?
 
 If it's just for this, I would say don't mess with the normal hcall path
 for the sake of idle.  If using CTR would let us get away without
 creating a stack frame in call sites, maybe that would be worthwhile,
 depending on what sort of hcalls we end up having.
 
 Then we don't have to replicate the hypercall code all over again for every 
 invocation.
 
 We shouldn't need to do it for every invocation.  Idle is special due to
 the TLF_NAPPING hack.

Famous last words. If it's the only case, duplication should be ok. Let's hope 
there are no others.


Alex



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #18 from Avi Kivity a...@redhat.com  2012-02-16 17:30:47 ---
As a workaround you can use -cpu blah,-3dnow.  But we'll have to implement mmx
movq.



Re: [KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-16 Thread Andy Lutomirski
On Thu, Feb 16, 2012 at 9:14 AM, Avi Kivity a...@redhat.com wrote:
 On 02/16/2012 06:45 PM, Andy Lutomirski wrote:
 
  So I could have messed up, or there could be a subtle
  bug somewhere.  Any ideas?
 
  What's the code trying to do?  Execute an instruction from a
  non-executable page, trap the #PF, and emulate?  And what are the
  symptoms? wrong error code for the #PF?  That could easily be a kvm bug.
 

 The symptom is that some kind of access to a page that's supposed to
 be readable, NX is reporting error 5.  I'm not quite sure what kind of
 access is causing that.

 Might it be a fetch access, with kvm forgetting to set bit 4 correctly?

 
  Can you point me at the code in question?

 The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall.
 The bad access is to the vsyscall page.

 The bad access is on purpose, yes?

 From fault.c:

 #ifdef CONFIG_X86_64
                /*
                 * Instruction fetch faults in the vsyscall page might need
                 * emulation.
                 */
                 if (unlikely((error_code & PF_INSTR) &&
                              ((address & ~0xfff) == VSYSCALL_START))) {
                        if (emulate_vsyscall(regs, address))
                                return;
                }
 #endif

 so it seems like kvm doesn't set PF_INSTR?

Yes, this is on purpose, and you're almost certainly right (and I feel
dumb for not figuring this out immediately).  The error message is:

segfault at ff600400 ip ff600400 sp 7fff103d72f8 error 5

which is garbage.  The instruction at 0xff600400 can't fetch
itself as data and fault on the data access (at least not in 64-bit
mode, as far as I can think of, without evil messing with the TLBs).

So... what do we do about this?  This (whitespace-damaged, untested)
patch will probably work around it well enough to boot the system:

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d74824..52b9522 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -741,8 +741,11 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long
 * Instruction fetch faults in the vsyscall page might need
 * emulation.
 */
-   if (unlikely((error_code & PF_INSTR) &&
+   if (unlikely(address == regs->ip && !(error_code & PF_WRITE) &&
 ((address & ~0xfff) == VSYSCALL_START))) {
+   WARN_ONCE(!(error_code & PF_INSTR),
+ "Fixing up bogus vsyscall read fault -- "
+ "your hypervisor is buggy.");
if (emulate_vsyscall(regs, address))
return;
}

Before we patch the guest like this, though, it would be nice to know
what hosts are affected.  If it's just one version of RHEL6, maybe it
makes sense to fix the hypervisor and either leave the guest alone or
just add a warning saying to fix your hypervisor, like:

WARN_ONCE(address == regs->ip && !(error_code & (PF_INSTR | PF_WRITE))
&& user_64bit_mode(regs), "Fishy page fault -- you might need to fix "
"your hypervisor");

near some exit path in the page fault handler.  The 64-bit check is
because (I think) 32-bit code can mess with regs->ip using a cs offset
in the LDT and trigger the warning at will.
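
To make the difference between the existing test and the proposed workaround concrete, here are both conditions as standalone predicates. This is a userspace sketch rather than the kernel code: the function names are hypothetical, and VSYSCALL_START is assumed to be the usual x86-64 vsyscall page address.

```c
#include <stdbool.h>
#include <stdint.h>

#define PF_WRITE  (1u << 1)
#define PF_INSTR  (1u << 4)

/* Assumed: the fixed x86-64 vsyscall page address. */
#define VSYSCALL_START  UINT64_C(0xffffffffff600000)

/* Existing check: trust the hardware-reported instruction-fetch bit. */
static bool strict_vsyscall_fetch(uint64_t address, unsigned int error_code)
{
	return (error_code & PF_INSTR) &&
	       (address & ~UINT64_C(0xfff)) == VSYSCALL_START;
}

/* Proposed workaround: infer a fetch from "fault address == ip" on a
 * non-write fault, so a missing PF_INSTR bit no longer matters. */
static bool relaxed_vsyscall_fetch(uint64_t address, uint64_t ip,
				   unsigned int error_code)
{
	return address == ip && !(error_code & PF_WRITE) &&
	       (address & ~UINT64_C(0xfff)) == VSYSCALL_START;
}
```

For a fault at the vsyscall time() entry with error 5, the strict predicate is false while the relaxed one is true; with a correct error code of 0x15 both are true.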

--Andy


Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Scott Wood
On 02/16/2012 11:30 AM, Alexander Graf wrote:
 
 On 16.02.2012, at 18:28, Scott Wood wrote:
 
 On 02/16/2012 11:18 AM, Alexander Graf wrote:
 Hrm. But we can clobber ctr, right? So how about we make the generic 
 version do a bctr and then just do a small C wrapper that takes lr, moves 
 it to ctr and branches to the generic one?

 If it's just for this, I would say don't mess with the normal hcall path
 for the sake of idle.  If using CTR would let us get away without
 creating a stack frame in call sites, maybe that would be worthwhile,
 depending on what sort of hcalls we end up having.

 Then we don't have to replicate the hypercall code all over again for every 
 invocation.

 We shouldn't need to do it for every invocation.  Idle is special due to
 the TLF_NAPPING hack.
 
 Famous last words. If it's the only case, duplication should be ok. Let's 
 hope there are no others.

Actually, we can't use CTR -- it's volatile in the ePAPR hypercall ABI.

-Scott



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Alexander Graf

On 16.02.2012, at 18:36, Scott Wood wrote:

 On 02/16/2012 11:30 AM, Alexander Graf wrote:
 
 On 16.02.2012, at 18:28, Scott Wood wrote:
 
 On 02/16/2012 11:18 AM, Alexander Graf wrote:
 Hrm. But we can clobber ctr, right? So how about we make the generic 
 version do a bctr and then just do a small C wrapper that takes lr, moves 
 it to ctr and branches to the generic one?
 
 If it's just for this, I would say don't mess with the normal hcall path
 for the sake of idle.  If using CTR would let us get away without
 creating a stack frame in call sites, maybe that would be worthwhile,
 depending on what sort of hcalls we end up having.
 
 Then we don't have to replicate the hypercall code all over again for 
 every invocation.
 
 We shouldn't need to do it for every invocation.  Idle is special due to
 the TLF_NAPPING hack.
 
 Famous last words. If it's the only case, duplication should be ok. Let's 
 hope there are no others.
 
 Actually, we can't use CTR -- it's volatile in the ePAPR hypercall ABI.

Ugh. Alrighty, let's duplicate the hc code then.


Alex



Offline

2012-02-16 Thread Avi Kivity
I'll be on vacation again, from Sunday (the 19th) until the following
Sunday.  Unfortunately, Marcelo is on vacation as well, so there won't
be anyone to tend to patches.  This is doubly unfortunate since there
are still unreviewed patches on the list.  I'll do my best to catch up a
little tomorrow and when I return.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [KVM paravirt issue?] Re: vsyscall=emulate regression

2012-02-16 Thread Avi Kivity
On 02/16/2012 07:35 PM, Andy Lutomirski wrote:
 
  so it seems like kvm doesn't set PF_INSTR?

 Yes, this is on purpose, and you're almost certainly right (and I feel
 dumb for not figuring this out immediately).  The error message is:

 segfault at ff600400 ip ff600400 sp 7fff103d72f8 error 5

 which is garbage.  The instruction at 0xff600400 can't fetch
 itself as data and fault on the data access (at least not in 64-bit
 mode, as far as I can think of, without evil messing with the TLBs).

 So... what do we do about this?  This (whitespace-damaged, untested)
 patch will probably work around it well enough to boot the system:

 diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
 index 9d74824..52b9522 100644
 --- a/arch/x86/mm/fault.c
 +++ b/arch/x86/mm/fault.c
 @@ -741,8 +741,11 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned 
 long
  * Instruction fetch faults in the vsyscall page might need
  * emulation.
  */
 -   if (unlikely((error_code & PF_INSTR) &&
 +   if (unlikely(address == regs->ip && !(error_code & PF_WRITE) &&
 
  ((address & ~0xfff) == VSYSCALL_START))) {
 +   WARN_ONCE(!(error_code & PF_INSTR),
 + "Fixing up bogus vsyscall read fault -- "
 + "your hypervisor is buggy.");
 if (emulate_vsyscall(regs, address))
 return;
 }

 Before we patch the guest like this, though, it would be nice to know
 what hosts are affected.  If it's just one version of RHEL6, maybe it
 makes sense to fix the hypervisor and either leave the guest alone or
 just add a warning saying to fix your hypervisor, like:

 WARN_ONCE(address == regs->ip && !(error_code & (PF_INSTR | PF_WRITE))
 && user_64bit_mode(regs), "Fishy page fault -- you might need to fix "
 "your hypervisor");

 near some exit path in the page fault handler.  The 64-bit check is
 because (I think) 32-bit code can mess with regs->ip using a cs offset
 in the LDT and trigger the warning at will.


We'll just fix all affected hypervisor versions.  No need to uglify the
guest for a clear kvm bug.

-- 
error compiling committee.c: too many arguments to function



Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-16 Thread Shradha Shah
Hello,

Please find my comments inline.

Regards,
Shradha Shah

On 02/16/2012 03:58 AM, Ben Hutchings wrote:
 [I'm just catching up with this after getting my own driver changes into
 shape.]
 
 On Fri, 2012-02-10 at 10:18 -0500, jamal wrote:
 Hi John,

 I went backwards to summarize at the top after going through your email.

 TL;DR version 0.1: 
 you provide a good use case where it makes sense to do things in the
 kernel. IMO, you could make the same argument if your embedded switch
 could do ACLs, IPv4 forwarding etc. And the kernel bloats.
 I am always bigoted to move all policy control to user space instead of
 bloating in the kernel.
 [...]
 Now here is the potential issue,

 (G) The frame transmitted from ethx.y with the destination address of
 veth0 but the embedded switch is not a learning switch. If the FDB
 update is done in user space its possible (likely?) that the FDB
 entry for veth0 has not been added to the embedded switch yet. 

 Ok, got it - so the catch here is the switch is not capable of learning.
 I think this depends on where learning is done. Your intent is to
 use the S/W bridge as something that does the learning for you i.e in
 the kernel. This makes the s/w bridge part of MUST-have-for-this-to-run.
 And that maybe the case for your use case.
 [...]
 
 Well, in addition, there are SR-IOV network adapters that don't have any
 bridge.  For these, the software bridge is necessary to handle
 multicast, broadcast and forwarding between local ports, not only to do
 learning.
 
 Solarflare's implementation of accelerated guest networking (which
 Shradha and I are gradually sending upstream) builds on libvirt's
 existing support for software bridges and assigns VFs to guests as a
 means to offload some of the forwarding.

I am also trying to work with bridging using macvtap. Libvirt supports
macvtap in four modes; vepa, bridge, private and passthrough mode.
Macvtap used in the bridge mode will work similar to a software bridge and 
will improve performance.

 
 If and when we implement a hardware bridge, we would probably still want
 to keep the software bridge as a fallback.  If a guest is dependent on a
 VF that's connected to a hardware bridge, it becomes impossible or at
 least very disruptive to migrate it to another host that doesn't have a
 compatible VF available.
 
 Ben.
 




Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/15/2012 04:08 PM, Alexander Graf wrote:
  
  Well, the scatter/gather registers I proposed will give you just one
  register or all of them.

 One register is hardly any use. We either need all ways of a respective 
 address to do a full fledged lookup or all of them. 

I should have said, just one register, or all of them, or anything in
between.

 By sharing the same data structures between qemu and kvm, we actually managed 
 to reuse all of the tcg code for lookups, just like you do for x86.

Sharing the data structures is not needed.  Simply synchronize them before
lookup, like we do for ordinary registers.

  On x86 you also have shared memory for page tables, it's just guest visible, 
 hence in guest memory. The concept is the same.

But cr3 isn't, and if we put it in shared memory, we'd have to VMREAD it
on every exit.  And you're risking the same thing if your hardware gets
cleverer.

  
  btw, why are you interested in virtual addresses in userspace at all?
  
  We need them for gdb and monitor introspection.
  
  Hardly fast paths that justify shared memory.  I should be much harder
  on you.

 It was a tradeoff on speed and complexity. This way we have the least amount 
 of complexity IMHO. All KVM code paths just magically fit in with the TCG 
 code. 

It's too magical, fitting a random version of a random userspace
component.  Now you can't change this tcg code (and still keep the magic).

Some complexity is part of keeping software as separate components.

 There are essentially no if(kvm_enabled)'s in our MMU walking code, because 
 the tables are just there. Makes everything a lot easier (without dragging 
 down performance).

We have the same issue with registers.  There we call
cpu_synchronize_state() before every access.  No magic, but we get to
reuse the code just the same.
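
The synchronize-before-access pattern described here can be illustrated with a toy model. This is not QEMU's implementation of cpu_synchronize_state(); every name below is hypothetical, and the "kernel" side is just a second array standing in for state that would normally be fetched through the kvm interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NREGS 4

/* Hypothetical vcpu: kernel_regs stands in for state owned by the
 * kernel, regs is the userspace copy that shared code reads. */
struct toy_vcpu {
	uint64_t kernel_regs[TOY_NREGS];
	uint64_t regs[TOY_NREGS];
	bool     regs_valid;    /* is the userspace copy current? */
	unsigned fetch_count;   /* how often we had to refetch */
};

/* Pull the state over once; later calls are free until the vcpu runs. */
static void toy_synchronize_state(struct toy_vcpu *v)
{
	if (v->regs_valid)
		return;
	for (int i = 0; i < TOY_NREGS; i++)
		v->regs[i] = v->kernel_regs[i];
	v->regs_valid = true;
	v->fetch_count++;
}

/* Every accessor synchronizes first, so callers need no kvm special case. */
static uint64_t toy_read_reg(struct toy_vcpu *v, unsigned idx)
{
	toy_synchronize_state(v);
	return v->regs[idx];
}

/* Running the vcpu changes kernel-side state and invalidates the copy. */
static void toy_vcpu_run(struct toy_vcpu *v)
{
	v->kernel_regs[0]++;
	v->regs_valid = false;
}
```

The cost is one explicit call per access path, but the shared lookup code only ever reads the userspace copy, which is the reuse argument being made above.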

  
  
  One thing that's different is that virtio offloads itself to a thread
  very quickly, while IDE does a lot of work in vcpu thread context.
  
  So it's all about latencies again, which could be reduced at least a fair 
  bit with the scheme I described above. But really, this needs to be 
  prototyped and benchmarked to actually give us data on how fast it would 
  get us.
  
  Simply making qemu issue the request from a thread would be way better. 
  Something like socketpair mmio, configured for not waiting for the
  writes to be seen (posted writes) will also help by buffering writes in
  the socket buffer.

 Yup, nice idea. That only works when all parts of a device are actually 
 implemented through the same socket though. 

Right, but that's not an issue.

 Otherwise you could run out of order. So if you have a PCI device with a PIO 
 and an MMIO BAR region, they would both have to be handled through the same 
 socket.

I'm more worried about interactions between hotplug and a device, and
between people issuing unrelated PCI reads to flush writes (not sure
what the hardware semantics are there).  It's easy to get this wrong.

  
  COWs usually happen from guest userspace, while mmio is usually from the
  guest kernel, so you can switch on that, maybe.
  
  Hrm, nice idea. That might fall apart with user space drivers that we 
  might eventually have once vfio turns out to work well, but for the time 
  being it's a nice hack :).
  
  Or nested virt...

 Nested virt on ppc with device assignment? And here I thought I was the crazy 
 one of the two of us :)

I don't mind being crazy on somebody else's arch.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/16/2012 03:04 AM, Michael Ellerman wrote:
  
  ioctl is good for hardware devices and stuff that you want to enumerate
  and/or control permissions on. For something like KVM that is really a
  core kernel service, a syscall makes much more sense.

 Yeah maybe. That distinction is at least in part just historical.

 The first problem I see with using a syscall is that you don't need one
 syscall for KVM, you need ~90. OK so you wouldn't do that, you'd use a
 multiplexed syscall like epoll_ctl() - or probably several
 (vm/vcpu/etc).

No.  Many of our ioctls are for state save/restore - we reduce that to
two.  Many others are due to the with/without irqchip support - we slash
that as well.  The device assignment stuff is relegated to vfio.

I still have to draw up a concrete proposal, but I think we'll end up
with 10-15.


 Secondly you still need a handle/context for those syscalls, and I think
 the most sane thing to use for that is an fd.

The context is the process (for vm-wide calls) and thread (for vcpu
local calls).


 At that point you've basically reinvented ioctl :)

 I also think it is an advantage that you have a node in /dev for
 permissions. I know other core kernel interfaces don't use a /dev
 node, but arguably that is their loss.

Have to agree with that.  Theoretically we don't need permissions for
/dev/kvm, but in practice we do.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Alexander Graf

On 16.02.2012, at 20:24, Avi Kivity wrote:

 On 02/15/2012 04:08 PM, Alexander Graf wrote:
 
 Well, the scatter/gather registers I proposed will give you just one
 register or all of them.
 
 One register is hardly any use. We either need all ways of a respective 
 address to do a full fledged lookup or all of them. 
 
 I should have said, just one register, or all of them, or anything in
 between.
 
 By sharing the same data structures between qemu and kvm, we actually 
 managed to reuse all of the tcg code for lookups, just like you do for x86.
 
 Sharing the data structures is not needed.  Simply synchronize them before
 lookup, like we do for ordinary registers.

Ordinary registers are a few bytes. We're talking of dozens of kbytes here.

 
 On x86 you also have shared memory for page tables, it's just guest visible, 
 hence in guest memory. The concept is the same.
 
 But cr3 isn't, and if we put it in shared memory, we'd have to VMREAD it
 on every exit.  And you're risking the same thing if your hardware gets
 cleverer.

Yes, we do. When that day comes, we forget the CAP and do it another way. Which 
way we will find out by the time that day of more clever hardware comes :).

 
 
 btw, why are you interested in virtual addresses in userspace at all?
 
 We need them for gdb and monitor introspection.
 
 Hardly fast paths that justify shared memory.  I should be much harder
 on you.
 
 It was a tradeoff on speed and complexity. This way we have the least amount 
 of complexity IMHO. All KVM code paths just magically fit in with the TCG 
 code. 
 
 It's too magical, fitting a random version of a random userspace
 component.  Now you can't change this tcg code (and still keep the magic).
 
 Some complexity is part of keeping software as separate components.

Why? If another user space wants to use this, they can

a) do the slow copy path
or
b) simply use our struct definitions

The whole copy thing really only makes sense when you have existing code in 
user space that you don't want to touch, but easily add on KVM to it. If KVM is 
part of your whole design, then integrating things makes a lot more sense.

 
 There are essentially no if(kvm_enabled)'s in our MMU walking code, because 
 the tables are just there. Makes everything a lot easier (without dragging 
 down performance).
 
 We have the same issue with registers.  There we call
 cpu_synchronize_state() before every access.  No magic, but we get to
 reuse the code just the same.

Yes, and for those few bytes it's ok to do so - most of the time. On s390, even 
those get shared by now. And it makes sense to do so - if we synchronize it 
every time anyways, why not do so implicitly?


Alex



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/16/2012 04:46 PM, Anthony Liguori wrote:
 What will it buy us? Surely not speed. Entering a guest is not much
 (if at all) faster than exiting to userspace and any non trivial
 operation will require exit to userspace anyway,


 You can emulate the PIT/RTC entirely within the guest using kvmclock
 which doesn't require an additional exit to get the current time base.

 So instead of:

 1) guest -> host kernel
 2) host kernel -> userspace
 3) implement logic using rdtscp via VDSO
 4) userspace -> host kernel
 5) host kernel -> guest

 You go:

 1) guest -> host kernel
 2) host kernel -> guest (with special CR3)
 3) implement logic using rdtscp + kvmclock page
 4) change CR3 within guest and RETI to VMEXIT source RIP

 Same basic concept as PS/2 emulation with SMM.

Interesting, but unimplementable in practice.  SMM requires a VMEXIT for
RSM, and anything non-SMM wants a virtual address mapping (and some RAM)
which you can't get without guest cooperation.  There are other
complications like an NMI interrupting hypervisor-provided code and
finding unexpected addresses on its stack (SMM at least blocks NMIs).

Tangentially related, Intel introduced a VMFUNC that allows you to
change the guest's physical memory map to a pre-set alternative provided
by the host, without a VMEXIT.  Seems similar to SMM but requires guest
cooperation.  I guess it's for unintrusive virus scanners and the like.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/16/2012 09:34 PM, Alexander Graf wrote:
 On 16.02.2012, at 20:24, Avi Kivity wrote:

  On 02/15/2012 04:08 PM, Alexander Graf wrote:
  
  Well, the scatter/gather registers I proposed will give you just one
  register or all of them.
  
  One register is hardly any use. We either need all ways of a respective 
  address to do a full fledged lookup or all of them. 
  
  I should have said, just one register, or all of them, or anything in
  between.
  
  By sharing the same data structures between qemu and kvm, we actually 
  managed to reuse all of the tcg code for lookups, just like you do for x86.
  
  Sharing the data structures is not needed.  Simply synchronize them before
  lookup, like we do for ordinary registers.

 Ordinary registers are a few bytes. We're talking of dozens of kbytes here.

A TLB way is a few dozen bytes, no?

  
  On x86 you also have shared memory for page tables, it's just guest 
  visible, hence in guest memory. The concept is the same.
  
  But cr3 isn't, and if we put it in shared memory, we'd have to VMREAD it
  on every exit.  And you're risking the same thing if your hardware gets
  cleverer.

 Yes, we do. When that day comes, we forget the CAP and do it another way. 
 Which way we will find out by the time that day of more clever hardware comes 
 :).

Or we try to be less clever unless we have a really compelling reason. 
qemu monitor and gdb support aren't compelling reasons to optimize.

  
  It's too magical, fitting a random version of a random userspace
  component.  Now you can't change this tcg code (and still keep the magic).
  
  Some complexity is part of keeping software as separate components.

 Why? If another user space wants to use this, they can

 a) do the slow copy path
 or
 b) simply use our struct definitions

 The whole copy thing really only makes sense when you have existing code in 
 user space that you don't want to touch, but easily add on KVM to it. If KVM 
 is part of your whole design, then integrating things makes a lot more sense.

Yeah, I guess.


  
  There are essentially no if(kvm_enabled)'s in our MMU walking code, 
  because the tables are just there. Makes everything a lot easier (without 
  dragging down performance).
  
  We have the same issue with registers.  There we call
  cpu_synchronize_state() before every access.  No magic, but we get to
  reuse the code just the same.

 Yes, and for those few bytes it's ok to do so - most of the time. On s390, 
 even those get shared by now. And it makes sense to do so - if we synchronize 
 it every time anyways, why not do so implicitly?


At least on x86, we synchronize only rarely.
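
For readers unfamiliar with the synchronize-before-access pattern being argued over, here is a minimal model of it. Everything in it is hypothetical (struct vcpu_model, the model_* helpers); it is not qemu's cpu_synchronize_state(), only a sketch of the idea that userspace copies state lazily and pays for the copy once per guest run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NREGS 4

/* Hypothetical model: kernel_regs stands in for state owned by the kernel. */
struct vcpu_model {
    uint64_t kernel_regs[NREGS]; /* authoritative copy while the guest runs */
    uint64_t user_regs[NREGS];   /* userspace cache */
    bool     user_valid;         /* true once the cache matches the kernel */
    unsigned fetches;            /* how many times a copy was paid for */
};

/* Copy the state only if the userspace cache is stale. */
static void model_synchronize_state(struct vcpu_model *v)
{
    if (v->user_valid)
        return;
    memcpy(v->user_regs, v->kernel_regs, sizeof(v->user_regs));
    v->user_valid = true;
    v->fetches++;
}

/* Every userspace access goes through the synchronize call first. */
static uint64_t model_read_reg(struct vcpu_model *v, unsigned idx)
{
    assert(idx < NREGS);
    model_synchronize_state(v);
    return v->user_regs[idx];
}

/* Running the guest changes kernel state and invalidates the cache. */
static void model_guest_ran(struct vcpu_model *v, unsigned idx, uint64_t val)
{
    assert(idx < NREGS);
    v->kernel_regs[idx] = val;
    v->user_valid = false;
}
```

With a few registers the memcpy is cheap; the disagreement in this thread is whether the same pattern stays acceptable when the state is a whole guest TLB rather than a handful of bytes.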



-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[PATCH 1/4 V14] Add flag to indicate that a vm was stopped by the host

2012-02-16 Thread Eric B Munson
This flag will be used to check if the vm was stopped by the host when a soft
lockup was detected.  The host will set the flag when it stops the guest.  On
resume, the guest will check this flag if a soft lockup is detected and skip
issuing the warning.

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/include/asm/pvclock-abi.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 35f2d19..6167fd7 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -40,5 +40,6 @@ struct pvclock_wall_clock {
 } __attribute__((__packed__));
 
 #define PVCLOCK_TSC_STABLE_BIT (1 << 0)
+#define PVCLOCK_GUEST_STOPPED  (1 << 1)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
-- 
1.7.5.4



[PATCH 0/4 V14] Avoid soft lockup message when KVM is stopped by host

2012-02-16 Thread Eric B Munson
Changes from V13:
Expand on KVM_KVMCLOCK_CTRL ioctl documentation

Changes from V12:
Re-add missing kvm.c code after rebase
Rename CAP to KVM_CAP_KVMCLOCK_CTRL
Rename ioctl to KVM_KVMCLOCK_CTRL

Changes from V11:
Re-add asm-generic stub
Correct api.txt typo
add kvm_make_request() call after setting PVCLOCK_GUEST_STOPPED

Changes from V10:
Return ioctl to per vcpu instead of per vm

Changes from V9:
Use kvm_for_each_vcpu to iterate online vcpu's

Changes from V8:
Make KVM_GUEST_PAUSED a per vm ioctl instead of per vcpu

Changes from V7:
Define KVM_CAP_GUEST_PAUSED and support check
Call mark_page_dirty () after setting PVCLOCK_GUEST_STOPPED

Changes from V6:
Use __this_cpu_and when clearing the PVCLOCK_GUEST_STOPPED flag

Changes from V5:
Collapse generic check_and_clear_guest_stopped into patch 2
Include check_and_clear_guest_stopped definition to ia64, s390, and powerpc
Change check_and_clear_guest_stopped to use __get_cpu_var instead of taking the
 cpuid arg.
Protect check_and_clear_guest_stopped declaration with CONFIG_KVM_CLOCK check

Changes from V4:
Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
Add description of KVMCLOCK_GUEST_PAUSED ioctl to api.txt

Changes from V3:
Include CC's on patch 3
Drop clear flag ioctl and have the watchdog clear the flag when it is reset

Changes from V2:
New kvm functions are defined in kvm_para.h; the only change to pvclock is the
initial flag definition

Changes from V1:
(Thanks Marcelo)
Host code has all been moved to arch/x86/kvm/x86.c
KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED

When a guest kernel is stopped by the host hypervisor it can look like a soft
lockup to the guest kernel.  This false warning can mask later soft lockup
warnings which may be real.  This patch series adds a method for a host
hypervisor to communicate to a guest kernel that it is being stopped.  The
final patch in the series has the watchdog check this flag when it goes to
issue a soft lockup warning and skip the warning if the guest knows it was
stopped.

An attempt was made to solve this in QEMU, but the side effects of saving and
restoring the clock and TSC for each vcpu put the guest's wall clock behind
by the duration of the pause.  That forces a guest to run ntp in order to keep
the wall clock accurate.

Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org

Eric B Munson (4):
  Add flag to indicate that a vm was stopped by the host
  Add functions to check if the host has stopped the vm
  Add ioctl for KVM_KVMCLOCK_CTRL
  Add check for suspended vm in softlockup detector

 Documentation/virtual/kvm/api.txt   |   21 +
 arch/ia64/include/asm/kvm_para.h|5 +
 arch/powerpc/include/asm/kvm_para.h |5 +
 arch/s390/include/asm/kvm_para.h|5 +
 arch/x86/include/asm/kvm_para.h |8 
 arch/x86/include/asm/pvclock-abi.h  |1 +
 arch/x86/kernel/kvmclock.c  |   21 +
 arch/x86/kvm/x86.c  |   22 ++
 include/asm-generic/kvm_para.h  |   14 ++
 include/linux/kvm.h |3 +++
 kernel/watchdog.c   |   12 
 11 files changed, 117 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/kvm_para.h

-- 
1.7.5.4



[PATCH 3/4 V13] Add ioctl for KVM_KVMCLOCK_CTRL

2012-02-16 Thread Eric B Munson
Now that we have a flag that will tell the guest it was suspended, create an
interface for that communication using a KVM ioctl.

Signed-off-by: Eric B Munson emun...@mgebm.net

Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
Changes from V13:
 Expand the ioctl documentation
Changes from V12:
 Rename KVM_CAP and KVM ioctl
Changes from V11:
 Correct typo in api.txt
 Add kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu) call to kvm_set_guest_paused()
Changes from V10:
 Return the ioctl to per vcpu instead of per vm
Changes from V9:
 Use kvm_for_each_vcpu to iterate online vcpu's
Changes from V8:
 Make KVM_GUEST_PAUSED a per vm ioctl instead of per vcpu
Changes from V7:
 Define KVM_CAP_GUEST_PAUSED and support check
 Call mark_page_dirty () after setting PVCLOCK_GUEST_STOPPED
Changes from V4:
 Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED
 Add new ioctl description to api.txt

 Documentation/virtual/kvm/api.txt |   21 +
 arch/x86/kvm/x86.c|   22 ++
 include/linux/kvm.h   |3 +++
 3 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 59a3826..d868e80 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1628,6 +1628,27 @@ at the memory location pointed to by addr.
 The list of registers accessible using this interface is identical to the
 list in 4.64.
 
+4.70 KVM_KVMCLOCK_CTRL
+
+Capability: KVM_CAP_KVMCLOCK_CTRL
+Architectures: Any that implement pvclocks (currently x86 only)
+Type: vcpu ioctl
+Parameters: None
+Returns: 0 on success, -1 on error
+
+This signals to the host kernel that the specified guest is being paused by
+userspace.  The host will set a flag in the pvclock structure that is checked
+from the soft lockup watchdog.  The flag is part of the pvclock structure that
+is shared between guest and host, specifically the second bit of the flags
+field of the pvclock_vcpu_time_info structure.  It will be set exclusively by
+the host and read/cleared exclusively by the guest.  The guest operation of
+checking and clearing the flag must be an atomic operation so use of the function
+check_and_clear_guest_paused() is encouraged, but it could also be done with
+load-link/store-conditional.  There are two cases where the guest will clear
+the flag: when the soft lockup watchdog timer resets itself or when a soft
+lockup is detected.  This ioctl can be called any time after pausing
+the vcpu, but before it is resumed.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c9d99e5..7ba8213 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2143,6 +2143,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
 	case KVM_CAP_GET_TSC_KHZ:
+	case KVM_CAP_KVMCLOCK_CTRL:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -2593,6 +2594,23 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu,
 	return r;
 }
 
+/*
+ * kvm_set_guest_paused() indicates to the guest kernel that it has been
+ * stopped by the hypervisor.  This function will be called from the host only.
+ * EINVAL is returned when the host attempts to set the flag for a guest that
+ * does not support pv clocks.
+ */
+static int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
+{
+	struct pvclock_vcpu_time_info *src = &vcpu->arch.hv_clock;
+	if (!vcpu->arch.time_page)
+		return -EINVAL;
+	src->flags |= PVCLOCK_GUEST_STOPPED;
+	mark_page_dirty(vcpu->kvm, vcpu->arch.time >> PAGE_SHIFT);
+	kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+	return 0;
+}
+
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg)
 {
@@ -2869,6 +2887,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		r = vcpu->arch.virtual_tsc_khz;
 		goto out;
 	}
+	case KVM_KVMCLOCK_CTRL: {
+		r = kvm_set_guest_paused(vcpu);
+		goto out;
+	}
 	default:
 		r = -EINVAL;
 	}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index acbe429..a90db11 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -588,6 +588,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_TSC_DEADLINE_TIMER 72
 #define KVM_CAP_S390_UCONTROL 73
 #define KVM_CAP_SYNC_REGS 74
+#define KVM_CAP_KVMCLOCK_CTRL 75
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -855,6 +856,8 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_ONE_REG */
 #define KVM_GET_ONE_REG  _IOW(KVMIO,  0xab, struct kvm_one_reg)
 #define KVM_SET_ONE_REG  
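
For illustration, the userspace side of this ioctl might look roughly like the sketch below. It assumes the installed <linux/kvm.h> comes from a kernel that carries this patch and so defines KVM_KVMCLOCK_CTRL; the helper name notify_guest_paused and the vcpu_fds array are hypothetical, standing in for descriptors a VMM obtained earlier through KVM_CREATE_VCPU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Tell the host kernel that each paused vcpu should have its
 * PVCLOCK_GUEST_STOPPED flag set.  Returns how many vcpus were flagged.
 * Hypothetical helper, not part of qemu or the patch series.
 */
static size_t notify_guest_paused(const int *vcpu_fds, size_t n)
{
    size_t ok = 0;

    for (size_t i = 0; i < n; i++) {
        /*
         * The ioctl takes no parameters.  Per the patch, the kernel
         * answers EINVAL when the vcpu has no pvclock page registered.
         */
        if (ioctl(vcpu_fds[i], KVM_KVMCLOCK_CTRL, 0) == 0)
            ok++;
    }
    return ok;
}
```

Since this is a per-vcpu ioctl, a VMM would call it for every vcpu after pausing them and before resuming, as the api.txt text in the patch describes.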

[PATCH 4/4 V14] Add check for suspended vm in softlockup detector

2012-02-16 Thread Eric B Munson
A suspended VM can cause spurious soft lockup warnings.  To avoid these, the
watchdog now checks if the kernel knows it was stopped by the host and skips
the warning if so.  When the watchdog is reset successfully, clear the guest
paused flag.

Signed-off-by: Eric B Munson emun...@mgebm.net
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
Changes from V3:
 Clear the PAUSED flag when the watchdog is reset

 kernel/watchdog.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index d117262..f6ffdc2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -25,6 +25,7 @@
 #include <linux/sysctl.h>
 
 #include <asm/irq_regs.h>
+#include <linux/kvm_para.h>
 #include <linux/perf_event.h>
 
 int watchdog_enabled = 1;
@@ -280,6 +281,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			__this_cpu_write(softlockup_touch_sync, false);
 			sched_clock_tick();
 		}
+
+		/* Clear the guest paused flag on watchdog reset */
+		kvm_check_and_clear_guest_paused();
 		__touch_watchdog();
 		return HRTIMER_RESTART;
 	}
@@ -292,6 +296,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	 */
 	duration = is_softlockup(touch_ts);
 	if (unlikely(duration)) {
+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a soft lockup, check to see if the host
+		 * stopped the vm before we issue the warning
+		 */
+		if (kvm_check_and_clear_guest_paused())
+			return HRTIMER_RESTART;
+
 		/* only warn once */
 		if (__this_cpu_read(soft_watchdog_warn) == true)
 			return HRTIMER_RESTART;
-- 
1.7.5.4
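
The control flow this patch adds to watchdog_timer_fn() can be modelled in a few lines of plain C. This is a simplified stand-in, not kernel code: struct wd_model and the model_* functions are hypothetical, and the per-cpu variables and hrtimer plumbing are reduced to ordinary fields and booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-cpu state the real watchdog keeps. */
struct wd_model {
    bool guest_paused; /* models PVCLOCK_GUEST_STOPPED */
    bool warned;       /* models soft_watchdog_warn */
    int  warnings;     /* number of soft lockup warnings "printed" */
};

/* Models kvm_check_and_clear_guest_paused(): report and clear together. */
static bool model_check_and_clear(struct wd_model *wd)
{
    bool was = wd->guest_paused;
    wd->guest_paused = false;
    return was;
}

/*
 * One timer tick.  `touched` models a watchdog reset request, `stalled`
 * models is_softlockup() returning a non-zero duration.
 */
static void model_timer_tick(struct wd_model *wd, bool touched, bool stalled)
{
    if (touched) {
        model_check_and_clear(wd); /* clear the flag on watchdog reset */
        return;
    }
    if (!stalled) {
        wd->warned = false;
        return;
    }
    if (model_check_and_clear(wd))
        return;                    /* stall explained by a host pause */
    if (wd->warned)
        return;                    /* only warn once */
    wd->warned = true;
    wd->warnings++;
}
```

The point of clearing the flag in the same step as checking it is that a single host pause suppresses a single apparent lockup; a stall that persists on the next tick is reported as usual.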



[PATCH 2/4 V14] Add functions to check if the host has stopped the vm

2012-02-16 Thread Eric B Munson
When a host stops or suspends a VM it will set a flag to show this.  The
watchdog will use these functions to determine if a softlockup is real, or the
result of a suspended VM.

Signed-off-by: Eric B Munson emun...@mgebm.net
asm-generic changes Acked-by: Arnd Bergmann a...@arndb.de
Cc: mi...@redhat.com
Cc: h...@zytor.com
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-ker...@vger.kernel.org
---
Changes from V11:
 Re-add the missing asm-generic stub for check_and_clear_guest_stopped()
Changes from V6:
 Use __this_cpu_and when clearing the PVCLOCK_GUEST_STOPPED flag
Changes from V5:
 Collapse generic stubs into this patch
 check_and_clear_guest_stopped() takes no args and uses __get_cpu_var()
 Include individual definitions in ia64, s390, and powerpc

 arch/ia64/include/asm/kvm_para.h|5 +
 arch/powerpc/include/asm/kvm_para.h |5 +
 arch/s390/include/asm/kvm_para.h|5 +
 arch/x86/include/asm/kvm_para.h |8 
 arch/x86/kernel/kvmclock.c  |   21 +
 include/asm-generic/kvm_para.h  |   14 ++
 6 files changed, 58 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/kvm_para.h

diff --git a/arch/ia64/include/asm/kvm_para.h b/arch/ia64/include/asm/kvm_para.h
index 1588aee..2019cb9 100644
--- a/arch/ia64/include/asm/kvm_para.h
+++ b/arch/ia64/include/asm/kvm_para.h
@@ -26,6 +26,11 @@ static inline unsigned int kvm_arch_para_features(void)
return 0;
 }
 
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+
 #endif
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index 7b754e7..c18916b 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -206,6 +206,11 @@ static inline unsigned int kvm_arch_para_features(void)
return r;
 }
 
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* __POWERPC_KVM_PARA_H__ */
diff --git a/arch/s390/include/asm/kvm_para.h b/arch/s390/include/asm/kvm_para.h
index 6964db2..a988329 100644
--- a/arch/s390/include/asm/kvm_para.h
+++ b/arch/s390/include/asm/kvm_para.h
@@ -149,6 +149,11 @@ static inline unsigned int kvm_arch_para_features(void)
return 0;
 }
 
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+
 #endif
 
 #endif /* __S390_KVM_PARA_H */
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..99c4bbe 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -95,6 +95,14 @@ struct kvm_vcpu_pv_apf_data {
 extern void kvmclock_init(void);
 extern int kvm_register_clock(char *txt);
 
+#ifdef CONFIG_KVM_CLOCK
+bool kvm_check_and_clear_guest_paused(void);
+#else
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+#endif /* CONFIG_KVM_CLOCK */
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ca4e735..8562f77 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -22,6 +22,7 @@
 #include <asm/msr.h>
 #include <asm/apic.h>
 #include <linux/percpu.h>
+#include <linux/hardirq.h>
 
 #include <asm/x86_init.h>
 #include <asm/reboot.h>
@@ -114,6 +115,26 @@ static void kvm_get_preset_lpj(void)
 	preset_lpj = lpj;
 }
 
+bool kvm_check_and_clear_guest_paused(void)
+{
+	bool ret = false;
+	struct pvclock_vcpu_time_info *src;
+
+	/*
+	 * per_cpu() is safe here because this function is only called from
+	 * timer functions where preemption is already disabled.
+	 */
+	WARN_ON(!in_atomic());
+	src = &__get_cpu_var(hv_clock);
+	if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) {
+		__this_cpu_and(hv_clock.flags, ~PVCLOCK_GUEST_STOPPED);
+		ret = true;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_check_and_clear_guest_paused);
+
 static struct clocksource kvm_clock = {
 	.name = "kvm-clock",
 	.read = kvm_clock_get_cycles,
diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h
new file mode 100644
index 0000000..05ef7e7
--- /dev/null
+++ b/include/asm-generic/kvm_para.h
@@ -0,0 +1,14 @@
+#ifndef _ASM_GENERIC_KVM_PARA_H
+#define _ASM_GENERIC_KVM_PARA_H
+
+
+/*
+ * This function is used by architectures that support kvm to avoid issuing
+ * false soft lockup messages.
+ */
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+   return false;
+}
+
+#endif
-- 
1.7.5.4
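
The api.txt text in patch 3 asks that the guest check and clear the flag as one atomic operation. Outside the kernel, that requirement can be sketched with C11 atomics as below. This is an illustration only: check_and_clear_stopped is a hypothetical name, and the kernel itself uses the per-cpu helpers shown in the patch rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PVCLOCK_TSC_STABLE_BIT (1 << 0)
#define PVCLOCK_GUEST_STOPPED  (1 << 1)

/*
 * Atomically clear the stopped bit and report whether it had been set.
 * A single fetch-and leaves no window in which the host could set the
 * flag between the guest's test and the guest's clear.
 */
static bool check_and_clear_stopped(_Atomic uint8_t *flags)
{
    uint8_t old = atomic_fetch_and(flags, (uint8_t)~PVCLOCK_GUEST_STOPPED);

    return (old & PVCLOCK_GUEST_STOPPED) != 0;
}
```

Other bits in the flags byte, such as PVCLOCK_TSC_STABLE_BIT, are left untouched by the mask.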


Re: [PATCH 3/4 V13] Add ioctl for KVM_KVMCLOCK_CTRL

2012-02-16 Thread Eric B Munson
On Thu, 16 Feb 2012, Eric B Munson wrote:

 Now that we have a flag that will tell the guest it was suspended, create an
 interface for that communication using a KVM ioctl.
 
 Signed-off-by: Eric B Munson emun...@mgebm.net

Sorry, this is actually V14.






Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Scott Wood
On 02/16/2012 01:38 PM, Avi Kivity wrote:
 On 02/16/2012 09:34 PM, Alexander Graf wrote:
 On 16.02.2012, at 20:24, Avi Kivity wrote:

 On 02/15/2012 04:08 PM, Alexander Graf wrote:

 Well, the scatter/gather registers I proposed will give you just one
 register or all of them.

 One register is hardly any use. We either need all ways of a respective 
 address to do a full fledged lookup or all of them. 

 I should have said, just one register, or all of them, or anything in
 between.

 By sharing the same data structures between qemu and kvm, we actually 
 managed to reuse all of the tcg code for lookups, just like you do for x86.

 Sharing the data structures is not needed.  Simply synchronize them before
 lookup, like we do for ordinary registers.

 Ordinary registers are a few bytes. We're talking of dozens of kbytes here.
 
 A TLB way is a few dozen bytes, no?

I think you mean a TLB set... but the TLB (or part of it) may be fully
associative.

On e500mc, it's 24 bytes for one TLB entry, and you'd need 4 entries for
a set of TLB0, and all 64 entries in TLB1.  So 1632 bytes total.
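
The arithmetic behind that total, written out as a small check. The constant names are made up for the example; the values (24 bytes per entry, 4 entries per TLB0 set, 64 TLB1 entries) are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>

enum {
    TLB_ENTRY_BYTES = 24, /* size of one TLB entry as quoted above */
    TLB0_SET_WAYS   = 4,  /* entries needed for one TLB0 set */
    TLB1_ENTRIES    = 64, /* TLB1 is searched in full */
};

/* Bytes that must be synchronized for a single address lookup. */
static size_t tlb_lookup_sync_bytes(void)
{
    return (size_t)TLB_ENTRY_BYTES * (TLB0_SET_WAYS + TLB1_ENTRIES);
}
```

That is 24 * (4 + 64) = 24 * 68 = 1632 bytes per lookup.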

Then we'd need to deal with tracking whether we synchronized one or more
specific sets, or everything (for migration or debug TLB dump).  The
request to synchronize would have to come from within the QEMU MMU code,
since that's the point where we know what to ask for (unless we
duplicate the logic elsewhere).  I'm not sure that reusing the standard
QEMU MMU code for individual debug address translation is really
simplifying things...

And yes, we do have fancier hardware coming fairly soon for which this
breaks (TLB0 entries can be loaded without host involvement, as long as
there's a translation from guest physical to physical in a separate
hardware table).  It'd be reasonable to ignore TLB0 for migration (treat
it as invalidated), but not for debug since that may be where the
translation we're interested in resides.

-Scott



[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel

2012-02-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42779





--- Comment #19 from madenginee...@gmail.com  2012-02-16 22:01:38 ---
Doesn't seem to help. I used -cpu core2duo,-3dnow (since core2duo is the name
that libvirt, my usual launcher for qemu, has been using), failed on the same
movq. Also tried -cpu core2duo,-3dnow,-mmx, same results. I'll see if I can
build a kernel with the Geode framebuffer driver but without setting the CPU to
Geode (thus eliminating the use of MMX).



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Michael Ellerman
On Thu, 2012-02-16 at 21:28 +0200, Avi Kivity wrote:
 On 02/16/2012 03:04 AM, Michael Ellerman wrote:
   
   ioctl is good for hardware devices and stuff that you want to enumerate
   and/or control permissions on. For something like KVM that is really a
   core kernel service, a syscall makes much more sense.
 
  Yeah maybe. That distinction is at least in part just historical.
 
  The first problem I see with using a syscall is that you don't need one
  syscall for KVM, you need ~90. OK so you wouldn't do that, you'd use a
  multiplexed syscall like epoll_ctl() - or probably several
  (vm/vcpu/etc).
 
 No.  Many of our ioctls are for state save/restore - we reduce that to
 two.  Many others are due to the with/without irqchip support - we slash
 that as well.  The device assignment stuff is relegated to vfio.
 
 I still have to draw up a concrete proposal, but I think we'll end up
 with 10-15.

That's true, you certainly could reduce it, though by how much I'm not
sure. On powerpc I'm working on moving the irq controller emulation into
the kernel, and some associated firmware emulation, so that's at least
one new ioctl. And there will always be more, whatever scheme you have
must be easily extensible - ie. not requiring new syscalls for each new
weird platform.

  Secondly you still need a handle/context for those syscalls, and I think
  the most sane thing to use for that is an fd.
 
 The context is the process (for vm-wide calls) and thread (for vcpu
 local calls).

Yeah OK I forgot you'd mentioned that. But isn't that change basically
orthogonal to how you get into the kernel? ie. we could have the
kvm/vcpu pointers in mm_struct/task_struct today?

I guess it wouldn't win you much though because you still have the fd
and ioctl overhead as well.

cheers




Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Alexander Graf

On 16.02.2012, at 20:38, Avi Kivity wrote:

 On 02/16/2012 09:34 PM, Alexander Graf wrote:
 On 16.02.2012, at 20:24, Avi Kivity wrote:
 
 On 02/15/2012 04:08 PM, Alexander Graf wrote:
 
 Well, the scatter/gather registers I proposed will give you just one
 register or all of them.
 
 One register is hardly any use. We either need all ways of a respective 
 address to do a full fledged lookup or all of them. 
 
 I should have said, just one register, or all of them, or anything in
 between.
 
 By sharing the same data structures between qemu and kvm, we actually 
 managed to reuse all of the tcg code for lookups, just like you do for x86.
 
 Sharing the data structures is not needed.  Simply synchronize them before
 lookup, like we do for ordinary registers.
 
 Ordinary registers are a few bytes. We're talking of dozens of kbytes here.
 
 A TLB way is a few dozen bytes, no?
 
 
 On x86 you also have shared memory for page tables, it's just guest 
 visible, hence in guest memory. The concept is the same.
 
 But cr3 isn't, and if we put it in shared memory, we'd have to VMREAD it
 on every exit.  And you're risking the same thing if your hardware gets
 cleverer.
 
 Yes, we do. When that day comes, we forget the CAP and do it another way. 
 Which way we will find out by the time that day of more clever hardware 
 comes :).
 
 Or we try to be less clever unless we have a really compelling reason. 
 qemu monitor and gdb support aren't compelling reasons to optimize.

The goal here was simplicity with a grain of performance concerns.

So what would you be envisioning? Should we make all of the MMU walker code in 
target-ppc KVM aware, so it fetches the single way it actually cares about on 
demand from the kernel? That is pretty intrusive and goes against the general 
principle that KVM should fit in nicely, as it does today.

Also, we need to store the guest TLB somewhere. With this model, we can just 
store it in user space memory, so we keep only a single copy around, reducing 
memory footprint. If we had to copy it, we would need more than a single copy.

 
 
 It's too magical, fitting a random version of a random userspace
 component.  Now you can't change this tcg code (and still keep the magic).
 
 Some complexity is part of keeping software as separate components.
 
 Why? If another user space wants to use this, they can
 
 a) do the slow copy path
 or
 b) simply use our struct definitions
 
 The whole copy thing really only makes sense when you have existing code in 
 user space that you don't want to touch, but easily add on KVM to it. If KVM 
 is part of your whole design, then integrating things makes a lot more sense.
 
 Yeah, I guess.
 
 
 
 There are essentially no if(kvm_enabled)'s in our MMU walking code, 
 because the tables are just there. Makes everything a lot easier (without 
 dragging down performance).
 
 We have the same issue with registers.  There we call
 cpu_synchronize_state() before every access.  No magic, but we get to
 reuse the code just the same.
 
 Yes, and for those few bytes it's ok to do so - most of the time. On s390, 
 even those get shared by now. And it makes sense to do so - if we 
 synchronize it every time anyways, why not do so implicitly?
 
 
 At least on x86, we synchronize only rarely.

Yeah, on s390 we only know which registers actually contain the information we 
need for traps / hypercalls when in user space, since that's where the decoding 
happens. So we better have all GPRs available to read from and write to.


Alex



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Alexander Graf

On 16.02.2012, at 21:41, Scott Wood wrote:

 On 02/16/2012 01:38 PM, Avi Kivity wrote:
 On 02/16/2012 09:34 PM, Alexander Graf wrote:
 On 16.02.2012, at 20:24, Avi Kivity wrote:
 
 On 02/15/2012 04:08 PM, Alexander Graf wrote:
 
 Well, the scatter/gather registers I proposed will give you just one
 register or all of them.
 
 One register is hardly any use. We either need all ways of a respective 
 address to do a full fledged lookup or all of them. 
 
 I should have said, just one register, or all of them, or anything in
 between.
 
 By sharing the same data structures between qemu and kvm, we actually 
 managed to reuse all of the tcg code for lookups, just like you do for 
 x86.
 
 Sharing the data structures is not needed.  Simply synchronize them before
 lookup, like we do for ordinary registers.
 
 Ordinary registers are a few bytes. We're talking of dozens of kbytes here.
 
 A TLB way is a few dozen bytes, no?
 
 I think you mean a TLB set... but the TLB (or part of it) may be fully
 associative.
 
 On e500mc, it's 24 bytes for one TLB entry, and you'd need 4 entries for
 a set of TLB0, and all 64 entries in TLB1.  So 1632 bytes total.
 
 Then we'd need to deal with tracking whether we synchronized one or more
 specific sets, or everything (for migration or debug TLB dump).  The
 request to synchronize would have to come from within the QEMU MMU code,
 since that's the point where we know what to ask for (unless we
 duplicate the logic elsewhere).  I'm not sure that reusing the standard
 QEMU MMU code for individual debug address translation is really
 simplifying things...
 
 And yes, we do have fancier hardware coming fairly soon for which this
 breaks (TLB0 entries can be loaded without host involvement, as long as
 there's a translation from guest physical to physical in a separate
 hardware table).  It'd be reasonable to ignore TLB0 for migration (treat
 it as invalidated), but not for debug since that may be where the
 translation we're interested in resides.

Could we maybe add an ioctl that forces kvm to read out the current tlb0 
contents and push them to memory? How slow would that be?


Alex



RE: [PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for host

2012-02-16 Thread Liu Yu-B13201


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, February 16, 2012 6:20 PM
 To: Liu Yu-B13201
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
 d...@ozlabs.org; Wood Scott-B07421; Liu Yu-B13201
 Subject: Re: [PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for
 host
 
 
 
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 
  And add a new flag definition in kvm_ppc_pvinfo to indicate whether
  the host supports the EV_IDLE hcall.
 
  Signed-off-by: Liu Yu yu@freescale.com
  ---
  v4:
  no change
 
  arch/powerpc/include/asm/kvm_para.h |   14 --
  arch/powerpc/kvm/powerpc.c  |8 
  include/linux/kvm.h |2 ++
  3 files changed, 22 insertions(+), 2 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm_para.h
  b/arch/powerpc/include/asm/kvm_para.h
  index 7b754e7..81a34c9 100644
  --- a/arch/powerpc/include/asm/kvm_para.h
  +++ b/arch/powerpc/include/asm/kvm_para.h
  @@ -75,9 +75,19 @@ struct kvm_vcpu_arch_shared { };
 
  #define KVM_SC_MAGIC_R0		0x4b564d21 /* KVM! */
  -#define HC_VENDOR_KVM		(42 << 16)
  +
  +#include <asm/epapr_hcalls.h>
  +
  +/* ePAPR Hypercall Vendor ID */
  +#define HC_VENDOR_EPAPR		(EV_EPAPR_VENDOR_ID << 16)
  +#define HC_VENDOR_KVM		(EV_KVM_VENDOR_ID << 16)
  +
  +/* ePAPR Hypercall Token */
  +#define HC_EV_IDLE		EV_IDLE
  +
  +/* ePAPR Hypercall Return Codes */
  #define HC_EV_SUCCESS		0
  -#define HC_EV_UNIMPLEMENTED	12
  +#define HC_EV_UNIMPLEMENTED	EV_UNIMPLEMENTED
 
  #define KVM_FEATURE_MAGIC_PAGE1
 
  diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
  index 0e21d15..03ebd5d 100644
  --- a/arch/powerpc/kvm/powerpc.c
  +++ b/arch/powerpc/kvm/powerpc.c
  @@ -81,6 +81,10 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
 /* Second return value is in r4 */
 break;
  +case HC_VENDOR_EPAPR | HC_EV_IDLE:
  +r = HC_EV_SUCCESS;
  +kvm_vcpu_block(vcpu);
  +break;
 default:
 r = HC_EV_UNIMPLEMENTED;
 break;
  @@ -746,6 +750,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct
 kvm_ppc_pvinfo *pvinfo)
 	pvinfo->hcall[2] = inst_sc;
 	pvinfo->hcall[3] = inst_nop;
 
  +#ifdef CONFIG_BOOKE
  +	pvinfo->flags |= KVM_PPC_PVINFO_FLAGS_EV_IDLE;
  +#endif
  +
 return 0;
  } 

 Why limit it to booke? The less ifdefs our code has, the better :)

The code here tells userspace that kvm supports ev_idle.
But I'm not sure whether the ev_idle code works on other platforms.

So I think we should keep the ifdef until other platforms have tested the code :)

Thanks,
Yu




Re: [PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for host

2012-02-16 Thread Alexander Graf

On 17.02.2012, at 03:13, Liu Yu-B13201 wrote:

 
 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, February 16, 2012 6:20 PM
 To: Liu Yu-B13201
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc-
 d...@ozlabs.org; Wood Scott-B07421; Liu Yu-B13201
 Subject: Re: [PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for
 host
 
 
 
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 
 And add a new flag definition in kvm_ppc_pvinfo to indicate whether
 the host supports the EV_IDLE hcall.
 
 Signed-off-by: Liu Yu yu@freescale.com
 ---
 v4:
 no change
 
 arch/powerpc/include/asm/kvm_para.h |   14 --
 arch/powerpc/kvm/powerpc.c  |8 
 include/linux/kvm.h |2 ++
 3 files changed, 22 insertions(+), 2 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_para.h
 b/arch/powerpc/include/asm/kvm_para.h
 index 7b754e7..81a34c9 100644
 --- a/arch/powerpc/include/asm/kvm_para.h
 +++ b/arch/powerpc/include/asm/kvm_para.h
 @@ -75,9 +75,19 @@ struct kvm_vcpu_arch_shared { };
 
 #define KVM_SC_MAGIC_R0		0x4b564d21 /* KVM! */
 -#define HC_VENDOR_KVM		(42 << 16)
 +
 +#include <asm/epapr_hcalls.h>
 +
 +/* ePAPR Hypercall Vendor ID */
 +#define HC_VENDOR_EPAPR		(EV_EPAPR_VENDOR_ID << 16)
 +#define HC_VENDOR_KVM		(EV_KVM_VENDOR_ID << 16)
 +
 +/* ePAPR Hypercall Token */
 +#define HC_EV_IDLE		EV_IDLE
 +
 +/* ePAPR Hypercall Return Codes */
 #define HC_EV_SUCCESS		0
 -#define HC_EV_UNIMPLEMENTED	12
 +#define HC_EV_UNIMPLEMENTED	EV_UNIMPLEMENTED
 
 #define KVM_FEATURE_MAGIC_PAGE1
 
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 0e21d15..03ebd5d 100644
 --- a/arch/powerpc/kvm/powerpc.c
 +++ b/arch/powerpc/kvm/powerpc.c
 @@ -81,6 +81,10 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
   /* Second return value is in r4 */
   break;
 +case HC_VENDOR_EPAPR | HC_EV_IDLE:
 +r = HC_EV_SUCCESS;
 +kvm_vcpu_block(vcpu);
 +break;
   default:
   r = HC_EV_UNIMPLEMENTED;
   break;
 @@ -746,6 +750,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct
 kvm_ppc_pvinfo *pvinfo)
  	pvinfo->hcall[2] = inst_sc;
  	pvinfo->hcall[3] = inst_nop;
 
 +#ifdef CONFIG_BOOKE
 +	pvinfo->flags |= KVM_PPC_PVINFO_FLAGS_EV_IDLE;
 +#endif
 +
   return 0;
 } 
 
 Why limit it to booke? The less ifdefs our code has, the better :)
 
 The code here tells userspace that kvm supports ev_idle.
 But I'm not sure whether the ev_idle code works on other platforms.
 
 So I think we should keep the ifdef until other platforms have tested the code :)

But the implementation is in generic code and is not #ifdef'ed, so a guest 
could still call it just fine. It looks simple enough to work without major 
testing on different platforms, so I'd say just expose it and be done :)
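
To make the distinction concrete, here is a minimal standalone sketch, not the
kernel code. The numeric values are the ones visible in this patch series
(vendor ID 1 and token 16 from the guest-side patch, 12 as the old
"unimplemented" return code); handle_hcall(), get_pvinfo_flags() and
PVINFO_FLAG_EV_IDLE are hypothetical names made up for illustration. It only
models the observation that the dispatcher accepts the idle token
unconditionally, while the flag reported to userspace is the one conditional
part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in this patch series. */
#define HC_VENDOR_EPAPR      (1u << 16)
#define HC_EV_IDLE           16u
#define HC_EV_SUCCESS        0
#define HC_EV_UNIMPLEMENTED  12

/* Hypothetical stand-in for the flag bit reported to userspace. */
#define PVINFO_FLAG_EV_IDLE  (1u << 0)

/* Hypothetical model of the generic dispatcher: nothing here is #ifdef'ed. */
static int handle_hcall(uint32_t token, bool *vcpu_blocked)
{
	assert(vcpu_blocked);

	switch (token) {
	case HC_VENDOR_EPAPR | HC_EV_IDLE:
		*vcpu_blocked = true;	/* stands in for kvm_vcpu_block() */
		return HC_EV_SUCCESS;
	default:
		return HC_EV_UNIMPLEMENTED;
	}
}

/* Only the advertisement to userspace depends on the platform switch. */
static uint32_t get_pvinfo_flags(bool advertise_idle)
{
	return advertise_idle ? PVINFO_FLAG_EV_IDLE : 0;
}
```

In this model a guest that issues the idle token is blocked even when
get_pvinfo_flags(false) reports no idle support, which is the asymmetry
described above.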


Alex



Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-16 Thread Xiao Guangrong
On 02/16/2012 07:57 PM, Avi Kivity wrote:


 Suppose at point A another thread executes defer_remote_flush(),
 commit_remote_flush(), and defer_remote_flush() again.  This brings the
 value of tlbs_dirty back to 1 again, with the tlbs dirty.  The cmpxchg()
 then resets tlbs_dirty, leaving the actual tlbs dirty.
 


Oh, right, sorry for my carelessness!
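
For readers following along, the interleaving described above can be replayed
in a small single-threaded C11 model. This is only a sketch under assumptions:
the function names defer_remote_flush() and commit_remote_flush() and the
counter tlbs_dirty come from the thread, but their bodies here are guesses at
the scheme under review, and tlb_stale, do_flush() and replay_race() are
hypothetical helpers. The two "threads" are simply executed in the problematic
order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_long tlbs_dirty;	/* pending-flush counter named in the thread */
static bool tlb_stale;		/* hypothetical: "the TLBs hold stale entries" */

static void defer_remote_flush(void)
{
	tlb_stale = true;
	atomic_fetch_add(&tlbs_dirty, 1);
}

static void do_flush(void)
{
	tlb_stale = false;
}

static void commit_remote_flush(void)
{
	long dirty = atomic_load(&tlbs_dirty);

	if (!dirty)
		return;
	do_flush();
	/* Reset the counter only if it still equals the snapshot. */
	atomic_compare_exchange_strong(&tlbs_dirty, &dirty, 0);
}

/* Replays the race; returns true if the TLB is stale while the counter is 0. */
static bool replay_race(void)
{
	long snapshot;

	atomic_store(&tlbs_dirty, 0);
	tlb_stale = false;

	/* Thread 1: one deferred flush, then the first half of a commit. */
	defer_remote_flush();			/* counter: 1 */
	snapshot = atomic_load(&tlbs_dirty);
	assert(snapshot == 1);
	do_flush();

	/* Point A: thread 2 runs defer, commit, defer. */
	defer_remote_flush();			/* counter: 2 */
	commit_remote_flush();			/* flush, counter: 0 */
	defer_remote_flush();			/* counter: 1, TLB stale again */

	/* Thread 1 resumes: the counter matches the snapshot, so this succeeds. */
	atomic_compare_exchange_strong(&tlbs_dirty, &snapshot, 0);

	return tlb_stale && atomic_load(&tlbs_dirty) == 0;
}
```

The compare-and-exchange cannot tell "still 1" apart from "went to 2, back to
0, and up to 1 again", which is the classic ABA pattern.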



kvm + raid1 showstopper bug

2012-02-16 Thread Pete Ashdown
I've been waiting for some response from the Ubuntu team regarding a bug on
launchpad, but it appears that it isn't being taken seriously:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/745785

We are running Ubuntu Natty 11.04 with KVM with our storage on RAID10 + drbd.
While we were running the resync command, we could expect an IO block during
every resync.  I eventually disabled resync.  Yesterday we had a kernel panic,
which I am presuming may trace back to KVM and its interactions with RAID10.
I run KVM on some other boxes with some built-in Dell hardware RAIDs.  They
have been stable for well over a year on Ubuntu 10.04.

Here are the 11.04 current versions if that helps.

kvm  84+dfsg-0ubuntu16+0.14.0+noroms+0ubuntu4.4
kvm-pxe  5.4.4-7ubuntu2
qemu-kvm 0.14.0+noroms-0ubuntu4.4
drbd8-utils  8.3.9-1ubuntu1

What I'd like to know is if this bug between KVM and RAID1 is known and whether
it has been addressed in newer versions of the kernel and/or KVM.  I'm on the
verge of buying a hardware RAID card to resolve this and I'd rather not have
to deal with that.  Software RAID is so much nicer to manage.


buildbot failure in kvm on next-s390

2012-02-16 Thread kvm
The Buildbot has detected a new failure on builder next-s390 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-s390/builds/447

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist: 

BUILD FAILED: failed git

sincerely,
 -The Buildbot



buildbot failure in kvm on next-ppc64

2012-02-16 Thread kvm
The Buildbot has detected a new failure on builder next-ppc64 while building 
kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-ppc64/builds/446

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist: 

BUILD FAILED: failed git

sincerely,
 -The Buildbot


buildbot failure in kvm on next-x86_64

2012-02-16 Thread kvm
The Buildbot has detected a new failure on builder next-x86_64 while building 
kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/next-x86_64/builds/445

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist: 

BUILD FAILED: failed git

sincerely,
 -The Buildbot



[PATCH] KVM: Avoid zapping unrelated shadows in __kvm_set_memory_region()

2012-02-16 Thread Takuya Yoshikawa
We do not need to zap all shadow pages of the guest when we create or
destroy a slot in this function.

To change this, we make kvm_mmu_zap_all()/kvm_arch_flush_shadow()
zap only those which have mappings into a given slot.

Furthermore, the condition to see if we have any mmio sptes to clear is
changed so that we will not do flush for newly created slots.

Note that the way we iterate over the active shadow pages is also
changed a bit to avoid checking unrelated pages again and again.

With all these changes applied, the total amount of time used to flush
the shadow pages of a usual Linux guest, running Fedora with 4GB memory,
during a shutdown was reduced from 90ms to 60ms.

Furthermore, the total number of flushes needed to boot and shut down
that guest was also reduced from 52 to 31.

Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/ia64/kvm/kvm-ia64.c|2 +-
 arch/powerpc/kvm/powerpc.c  |2 +-
 arch/s390/kvm/kvm-s390.c|2 +-
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/mmu.c  |   22 ++
 arch/x86/kvm/x86.c  |   13 ++---
 include/linux/kvm_host.h|2 +-
 virt/kvm/kvm_main.c |   15 ++-
 8 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d8ddbba..c86799f 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1621,7 +1621,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
return;
 }
 
-void kvm_arch_flush_shadow(struct kvm *kvm)
+void kvm_arch_flush_shadow(struct kvm *kvm, int slot)
 {
kvm_flush_remote_tlbs(kvm);
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 00d7e34..f050637 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,7 +309,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 }
 
 
-void kvm_arch_flush_shadow(struct kvm *kvm)
+void kvm_arch_flush_shadow(struct kvm *kvm, int slot)
 {
 }
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 17ad69d..290fcf0 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -871,7 +871,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
return;
 }
 
-void kvm_arch_flush_shadow(struct kvm *kvm)
+void kvm_arch_flush_shadow(struct kvm *kvm, int slot)
 {
 }
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 74c9edf..ecb3f78 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -714,7 +714,7 @@ int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
 int kvm_mmu_rmap_write_protect(struct kvm *kvm, u64 gfn,
   struct kvm_memory_slot *slot);
-void kvm_mmu_zap_all(struct kvm *kvm);
+void kvm_mmu_zap_all(struct kvm *kvm, int slot);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ff053ca..df3e73a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3867,16 +3867,30 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 	kvm_flush_remote_tlbs(kvm);
 }
 
-void kvm_mmu_zap_all(struct kvm *kvm)
+/*
+ * kvm_mmu_zap_all - zap all shadows which have mappings into a given slot
+ * @kvm: the kvm instance
+ * @slot: id of the target slot
+ *
+ * If @slot is -1, zap all shadow pages.
+ */
+void kvm_mmu_zap_all(struct kvm *kvm, int slot)
 {
 	struct kvm_mmu_page *sp, *node;
 	LIST_HEAD(invalid_list);
+	int try_again;
 
 	spin_lock(&kvm->mmu_lock);
 restart:
-	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
-		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
-			goto restart;
+	try_again = 0;
+	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
+		if ((slot >= 0) && !test_bit(slot, sp->slot_bitmap))
+			continue;
+
+		try_again |= kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+	}
+	if (try_again)
+		goto restart;
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 	spin_unlock(&kvm->mmu_lock);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c9d99e5..7ce8046 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5035,7 +5035,7 @@ int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
 * to ensure that the updated hypercall appears atomically across all
 * VCPUs.
 */
-	kvm_mmu_zap_all(vcpu->kvm);
+	kvm_mmu_zap_all(vcpu->kvm, -1);
 
 	kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
@@ -6368,9 +6368,16 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	spin_unlock(&kvm->mmu_lock);
 }
 
-void kvm_arch_flush_shadow(struct kvm 

[PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Liu Yu
If the guest hypervisor node contains the has-idle property.

Signed-off-by: Liu Yu yu@freescale.com
---
v4:
1. drop the CONFIG_E500 guard so the code applies to all powerpc platforms
2. code cleanup

 arch/powerpc/kernel/epapr.S  |   29 +
 arch/powerpc/kernel/epapr_para.c |   13 -
 2 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/epapr.S b/arch/powerpc/kernel/epapr.S
index 697b390..f06d636 100644
--- a/arch/powerpc/kernel/epapr.S
+++ b/arch/powerpc/kernel/epapr.S
@@ -15,6 +15,35 @@
 #include <asm/ppc_asm.h>
 #include <asm/asm-offsets.h>
 
+#define HC_VENDOR_EPAPR	(1 << 16)
+#define HC_EV_IDLE	16
+
+_GLOBAL(epapr_ev_idle)
+epapr_ev_idle:
+   rlwinm  r3,r1,0,0,31-THREAD_SHIFT   /* current thread_info */
+   lwz r4,TI_LOCAL_FLAGS(r3)   /* set napping bit */
+   ori r4,r4,_TLF_NAPPING  /* so when we take an exception */
+   stw r4,TI_LOCAL_FLAGS(r3)   /* it will return to our caller */
+
+   wrteei  1
+
+idle_loop:
+   LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
+
+.global epapr_ev_idle_start
+epapr_ev_idle_start:
+   li  r3, -1
+   nop
+   nop
+   nop
+
+   /*
+* Guard against spurious wakeups (e.g. from a hypervisor) --
+* any real interrupt will cause us to return to LR due to
+* _TLF_NAPPING.
+*/
+   b   idle_loop
+
 /* Hypercall entry point. Will be patched with device tree instructions. */
 .global epapr_hypercall_start
 epapr_hypercall_start:
diff --git a/arch/powerpc/kernel/epapr_para.c b/arch/powerpc/kernel/epapr_para.c
index ea13cac..adee4f1 100644
--- a/arch/powerpc/kernel/epapr_para.c
+++ b/arch/powerpc/kernel/epapr_para.c
@@ -20,6 +20,10 @@
 #include <linux/of.h>
 #include <asm/epapr_hcalls.h>
 #include <asm/cacheflush.h>
+#include <asm/machdep.h>
+
+extern void epapr_ev_idle(void);
+extern u32 epapr_ev_idle_start[];
 
 bool epapr_para_enabled = false;
 
@@ -35,10 +39,17 @@ static int __init epapr_para_init(void)
 
 	insts = of_get_property(hyper_node, "hcall-instructions", &len);
 	if (!(len % 4) && (len <= (4 * 4))) {
-		for (i = 0; i < (len / 4); i++)
+		for (i = 0; i < (len / 4); i++) {
 			epapr_hypercall_start[i] = insts[i];
+			epapr_ev_idle_start[i] = insts[i];
+		}
 		flush_icache_range((ulong)epapr_hypercall_start,
 				   (ulong)epapr_hypercall_start + len);
+		flush_icache_range((ulong)epapr_ev_idle_start,
+				   (ulong)epapr_ev_idle_start + len);
+
+		if (of_get_property(hyper_node, "has-idle", NULL))
+			ppc_md.power_save = epapr_ev_idle;
 
epapr_para_enabled = true;
}
-- 
1.7.0.4


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice

2012-02-16 Thread Avi Kivity
On 02/16/2012 03:55 PM, Danny Kukawka wrote:
 arch/powerpc/kvm/book3s_hv.c: included 'linux/sched.h' twice,
 remove the duplicate.


Thanks, applied.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Scott Wood
On 02/16/2012 04:24 AM, Alexander Graf wrote:
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 +_GLOBAL(epapr_ev_idle)
 +epapr_ev_idle:
 +	rlwinm	r3,r1,0,0,31-THREAD_SHIFT	/* current thread_info */
 +	lwz	r4,TI_LOCAL_FLAGS(r3)	/* set napping bit */
 +	ori	r4,r4,_TLF_NAPPING	/* so when we take an exception */
 +	stw	r4,TI_LOCAL_FLAGS(r3)	/* it will return to our caller */
 +
 +	wrteei	1
 +
 +idle_loop:
 +	LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
 +
 +.global epapr_ev_idle_start
 +epapr_ev_idle_start:
 +	li	r3, -1
 +	nop
 +	nop
 +	nop
 
 Can't you just bl into epapr_hypercall_start? You don't even have to save the 
 old lr. because we never return anyways :)

The interrupt will branch to LR, so no, we can't trash it or put it
anywhere else.

-scott



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Alexander Graf

On 16.02.2012, at 17:58, Scott Wood wrote:

 On 02/16/2012 04:24 AM, Alexander Graf wrote:
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 +_GLOBAL(epapr_ev_idle)
 +epapr_ev_idle:
 +	rlwinm	r3,r1,0,0,31-THREAD_SHIFT	/* current thread_info */
 +	lwz	r4,TI_LOCAL_FLAGS(r3)	/* set napping bit */
 +	ori	r4,r4,_TLF_NAPPING	/* so when we take an exception */
 +	stw	r4,TI_LOCAL_FLAGS(r3)	/* it will return to our caller */
 +
 +	wrteei	1
 +
 +idle_loop:
 +	LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
 +
 +.global epapr_ev_idle_start
 +epapr_ev_idle_start:
 +	li	r3, -1
 +	nop
 +	nop
 +	nop
 
 Can't you just bl into epapr_hypercall_start? You don't even have to save 
 the old lr. because we never return anyways :)
 
 The interrupt will branch to LR, so no, we can't trash it or put it
 anywhere else.

Hrm. But we can clobber ctr, right? So how about we make the generic version do 
a bctr and then just do a small C wrapper that takes lr, moves it to ctr and 
branches to the generic one?

Then we don't have to replicate the hypercall code all over again for every 
invocation.

Alex



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Scott Wood
On 02/16/2012 11:18 AM, Alexander Graf wrote:
 
 On 16.02.2012, at 17:58, Scott Wood wrote:
 
 On 02/16/2012 04:24 AM, Alexander Graf wrote:
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 +_GLOBAL(epapr_ev_idle)
 +epapr_ev_idle:
 +	rlwinm	r3,r1,0,0,31-THREAD_SHIFT	/* current thread_info */
 +	lwz	r4,TI_LOCAL_FLAGS(r3)	/* set napping bit */
 +	ori	r4,r4,_TLF_NAPPING	/* so when we take an exception */
 +	stw	r4,TI_LOCAL_FLAGS(r3)	/* it will return to our caller */
 +
 +	wrteei	1
 +
 +idle_loop:
 +	LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
 +
 +.global epapr_ev_idle_start
 +epapr_ev_idle_start:
 +	li	r3, -1
 +	nop
 +	nop
 +	nop

 Can't you just bl into epapr_hypercall_start? You don't even have to save 
 the old lr. because we never return anyways :)

 The interrupt will branch to LR, so no, we can't trash it or put it
 anywhere else.
 
 Hrm. But we can clobber ctr, right? So how about we make the generic version 
 do a bctr and then just do a small C wrapper that takes lr, moves it to ctr 
 and branches to the generic one?

If it's just for this, I would say don't mess with the normal hcall path
for the sake of idle.  If using CTR would let us get away without
creating a stack frame in call sites, maybe that would be worthwhile,
depending on what sort of hcalls we end up having.

 Then we don't have to replicate the hypercall code all over again for every 
 invocation.

We shouldn't need to do it for every invocation.  Idle is special due to
the TLF_NAPPING hack.

-Scott



Re: [PATCH v4 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-16 Thread Scott Wood
On 02/16/2012 11:30 AM, Alexander Graf wrote:
 
 On 16.02.2012, at 18:28, Scott Wood wrote:
 
 On 02/16/2012 11:18 AM, Alexander Graf wrote:
 Hrm. But we can clobber ctr, right? So how about we make the generic 
 version do a bctr and then just do a small C wrapper that takes lr, moves 
 it to ctr and branches to the generic one?

 If it's just for this, I would say don't mess with the normal hcall path
 for the sake of idle.  If using CTR would let us get away without
 creating a stack frame in call sites, maybe that would be worthwhile,
 depending on what sort of hcalls we end up having.

 Then we don't have to replicate the hypercall code all over again for every 
 invocation.

 We shouldn't need to do it for every invocation.  Idle is special due to
 the TLF_NAPPING hack.
 
 Famous last words. If it's the only case, duplication should be ok. Let's 
 hope there are no others.

Actually, we can't use CTR -- it's volatile in the ePAPR hypercall ABI.

-Scott



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Avi Kivity
On 02/16/2012 09:34 PM, Alexander Graf wrote:
 On 16.02.2012, at 20:24, Avi Kivity wrote:

  On 02/15/2012 04:08 PM, Alexander Graf wrote:
  
  Well, the scatter/gather registers I proposed will give you just one
  register or all of them.
  
  One register is hardly any use. We either need all ways of a respective 
  address to do a full fledged lookup or all of them. 
  
  I should have said, just one register, or all of them, or anything in
  between.
  
  By sharing the same data structures between qemu and kvm, we actually 
  managed to reuse all of the tcg code for lookups, just like you do for x86.
  
  Sharing the data structures is not needed.  Simply synchronize them before
  lookup, like we do for ordinary registers.

 Ordinary registers are a few bytes. We're talking of dozens of kbytes here.

A TLB way is a few dozen bytes, no?

  
  On x86 you also have shared memory for page tables, it's just guest 
  visible, hence in guest memory. The concept is the same.
  
  But cr3 isn't, and if we put it in shared memory, we'd have to VMREAD it
  on every exit.  And you're risking the same thing if your hardware gets
  cleverer.

 Yes, we do. When that day comes, we forget the CAP and do it another way. 
 Which way we will find out by the time that day of more clever hardware comes 
 :).

Or we try to be less clever unless we have a really compelling reason. 
qemu monitor and gdb support aren't compelling reasons to optimize.

  
  It's too magical, fitting a random version of a random userspace
  component.  Now you can't change this tcg code (and still keep the magic).
  
  Some complexity is part of keeping software as separate components.

 Why? If another user space wants to use this, they can

 a) do the slow copy path
 or
 b) simply use our struct definitions

 The whole copy thing really only makes sense when you have existing code in 
 user space that you don't want to touch, but easily add on KVM to it. If KVM 
 is part of your whole design, then integrating things makes a lot more sense.

Yeah, I guess.


  
  There are essentially no if(kvm_enabled)'s in our MMU walking code, 
  because the tables are just there. Makes everything a lot easier (without 
  dragging down performance).
  
  We have the same issue with registers.  There we call
  cpu_synchronize_state() before every access.  No magic, but we get to
  reuse the code just the same.

 Yes, and for those few bytes it's ok to do so - most of the time. On s390, 
 even those get shared by now. And it makes sense to do so - if we synchronize 
 it every time anyways, why not do so implicitly?


At least on x86, we synchronize only rarely.
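
As an illustration of the synchronize-before-access pattern discussed here, the
following is a toy model only. It is not QEMU's cpu_synchronize_state()
implementation; struct vcpu_model, synchronize_state(), read_gpr() and
run_guest() are hypothetical names, and the "kernel copy" is just a second
struct in the same process. The idea shown is that every accessor synchronizes
first, and a validity flag keeps the actual copies rare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file; real layouts are architecture specific. */
struct regs {
	uint64_t gpr[4];
};

struct vcpu_model {
	struct regs kernel_copy;	/* stands in for state held by the kernel */
	struct regs user_copy;		/* userspace cache of that state */
	bool user_copy_valid;		/* true while the cache matches the kernel */
	unsigned int sync_count;	/* number of real copies performed */
};

/* Copy state from the "kernel" only if the cache is stale. */
static void synchronize_state(struct vcpu_model *v)
{
	if (v->user_copy_valid)
		return;
	v->user_copy = v->kernel_copy;
	v->user_copy_valid = true;
	v->sync_count++;
}

/* Accessors synchronize first, so callers need no KVM-specific checks. */
static uint64_t read_gpr(struct vcpu_model *v, unsigned int idx)
{
	assert(idx < 4);
	synchronize_state(v);
	return v->user_copy.gpr[idx];
}

/* Re-entering the guest may change registers and invalidates the cache. */
static void run_guest(struct vcpu_model *v, uint64_t new_r0)
{
	v->kernel_copy.gpr[0] = new_r0;
	v->user_copy_valid = false;
}
```

Two reads after one guest run cost a single copy in this model; the second
read is served from the cache.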



-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Scott Wood
On 02/16/2012 01:38 PM, Avi Kivity wrote:
 On 02/16/2012 09:34 PM, Alexander Graf wrote:
 On 16.02.2012, at 20:24, Avi Kivity wrote:

 On 02/15/2012 04:08 PM, Alexander Graf wrote:

 Well, the scatter/gather registers I proposed will give you just one
 register or all of them.

 One register is hardly any use. We either need all ways of a respective 
 address to do a full fledged lookup or all of them. 

 I should have said, just one register, or all of them, or anything in
 between.

 By sharing the same data structures between qemu and kvm, we actually 
 managed to reuse all of the tcg code for lookups, just like you do for x86.

 Sharing the data structures is not needed.  Simply synchronize them before
 lookup, like we do for ordinary registers.

 Ordinary registers are a few bytes. We're talking of dozens of kbytes here.
 
 A TLB way is a few dozen bytes, no?

I think you mean a TLB set... but the TLB (or part of it) may be fully
associative.

On e500mc, it's 24 bytes for one TLB entry, and you'd need 4 entries for
a set of TLB0, and all 64 entries in TLB1.  So 1632 bytes total.
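
The arithmetic behind that total, written out as a small sketch. The constant names are made up for illustration; the numbers are the ones given above.

```c
#include <stddef.h>

enum {
    TLB_ENTRY_BYTES = 24,  /* per-entry size quoted above for e500mc */
    TLB0_WAYS       = 4,   /* entries in one TLB0 set */
    TLB1_ENTRIES    = 64   /* TLB1 has to be searched in full */
};

/* Bytes to synchronize for a single lookup: one TLB0 set plus all of TLB1. */
static size_t tlb_sync_bytes_per_lookup(void)
{
    return (size_t)TLB_ENTRY_BYTES * (TLB0_WAYS + TLB1_ENTRIES);
}
```

That is 24 * (4 + 64) = 24 * 68 = 1632 bytes per lookup.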

Then we'd need to deal with tracking whether we synchronized one or more
specific sets, or everything (for migration or debug TLB dump).  The
request to synchronize would have to come from within the QEMU MMU code,
since that's the point where we know what to ask for (unless we
duplicate the logic elsewhere).  I'm not sure that reusing the standard
QEMU MMU code for individual debug address translation is really
simplifying things...
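
A rough sketch of the bookkeeping this would imply, under stated assumptions: the names are hypothetical, the set count of 128 is only an illustrative value, and the fetch from the kernel is reduced to a counter. It is not code from QEMU or KVM.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_TLB0_SETS 128   /* assumed set count, for illustration only */

struct toy_tlb_sync {
    uint8_t  set_synced[TOY_TLB0_SETS]; /* 1 if this TLB0 set is current */
    bool     tlb1_synced;               /* TLB1 is fetched as a whole */
    unsigned fetches;                   /* number of simulated fetches */
};

/* Stand-in for asking the kernel for one TLB0 set or for TLB1. */
static void toy_fetch(struct toy_tlb_sync *s)
{
    s->fetches++;
}

/* Called from the MMU lookup path; set must be below TOY_TLB0_SETS. */
static void toy_sync_for_lookup(struct toy_tlb_sync *s, unsigned set)
{
    if (!s->set_synced[set]) {
        toy_fetch(s);
        s->set_synced[set] = 1;
    }
    if (!s->tlb1_synced) {
        toy_fetch(s);
        s->tlb1_synced = true;
    }
}

/* Migration or a debug TLB dump needs everything. */
static void toy_sync_all(struct toy_tlb_sync *s)
{
    for (unsigned set = 0; set < TOY_TLB0_SETS; set++)
        toy_sync_for_lookup(s, set);
}

/* After the guest runs, nothing cached can be trusted. */
static void toy_invalidate(struct toy_tlb_sync *s)
{
    memset(s->set_synced, 0, sizeof(s->set_synced));
    s->tlb1_synced = false;
}
```

Even in this reduced form, the lookup path has to know which set it is about to read before it can ask for it, which is the coupling to the MMU code described above.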

And yes, we do have fancier hardware coming fairly soon for which this
breaks (TLB0 entries can be loaded without host involvement, as long as
there's a translation from guest physical to physical in a separate
hardware table).  It'd be reasonable to ignore TLB0 for migration (treat
it as invalidated), but not for debug since that may be where the
translation we're interested in resides.

-Scott



Re: [Qemu-devel] [RFC] Next gen kvm api

2012-02-16 Thread Alexander Graf

On 16.02.2012, at 21:41, Scott Wood wrote:

 On 02/16/2012 01:38 PM, Avi Kivity wrote:
 On 02/16/2012 09:34 PM, Alexander Graf wrote:
 On 16.02.2012, at 20:24, Avi Kivity wrote:
 
 On 02/15/2012 04:08 PM, Alexander Graf wrote:
 
 Well, the scatter/gather registers I proposed will give you just one
 register or all of them.
 
 One register is hardly any use. We need either all ways of a respective 
 address, to do a full-fledged lookup, or all of them. 
 
 I should have said, just one register, or all of them, or anything in
 between.
 
 By sharing the same data structures between qemu and kvm, we actually 
 managed to reuse all of the tcg code for lookups, just like you do for 
 x86.
 
 Sharing the data structures is not needed.  Simply synchronize them before
 lookup, like we do for ordinary registers.
 
 Ordinary registers are a few bytes. We're talking of dozens of kbytes here.
 
 A TLB way is a few dozen bytes, no?
 
 I think you mean a TLB set... but the TLB (or part of it) may be fully
 associative.
 
 On e500mc, it's 24 bytes for one TLB entry, and you'd need 4 entries for
 a set of TLB0, and all 64 entries in TLB1.  So 1632 bytes total.
 
 Then we'd need to deal with tracking whether we synchronized one or more
 specific sets, or everything (for migration or debug TLB dump).  The
 request to synchronize would have to come from within the QEMU MMU code,
 since that's the point where we know what to ask for (unless we
 duplicate the logic elsewhere).  I'm not sure that reusing the standard
 QEMU MMU code for individual debug address translation is really
 simplifying things...
 
 And yes, we do have fancier hardware coming fairly soon for which this
 breaks (TLB0 entries can be loaded without host involvement, as long as
 there's a translation from guest physical to physical in a separate
 hardware table).  It'd be reasonable to ignore TLB0 for migration (treat
 it as invalidated), but not for debug since that may be where the
 translation we're interested in resides.

Could we maybe add an ioctl that forces kvm to read out the current tlb0 
contents and push them to memory? How slow would that be?


Alex



RE: [PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for host

2012-02-16 Thread Liu Yu-B13201


 -----Original Message-----
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, February 16, 2012 6:20 PM
 To: Liu Yu-B13201
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
 d...@ozlabs.org; Wood Scott-B07421; Liu Yu-B13201
 Subject: Re: [PATCH v4 2/3] KVM: PPC: epapr: Add idle hcall support for
 host
 
 
 
 On 16.02.2012, at 10:26, Liu Yu yu@freescale.com wrote:
 
  And add a new flag definition in kvm_ppc_pvinfo to indicate whether
  the host supports the EV_IDLE hcall.
 
  Signed-off-by: Liu Yu yu@freescale.com
  ---
  v4:
  no change
 
  arch/powerpc/include/asm/kvm_para.h |   14 ++++++++++++--
  arch/powerpc/kvm/powerpc.c          |    8 ++++++++
  include/linux/kvm.h                 |    2 ++
  3 files changed, 22 insertions(+), 2 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm_para.h
  b/arch/powerpc/include/asm/kvm_para.h
  index 7b754e7..81a34c9 100644
  --- a/arch/powerpc/include/asm/kvm_para.h
  +++ b/arch/powerpc/include/asm/kvm_para.h
  @@ -75,9 +75,19 @@ struct kvm_vcpu_arch_shared { };
 
  #define KVM_SC_MAGIC_R0    0x4b564d21 /* KVM! */
  -#define HC_VENDOR_KVM    (42 << 16)
  +
  +#include <asm/epapr_hcalls.h>
  +
  +/* ePAPR Hypercall Vendor ID */
  +#define HC_VENDOR_EPAPR    (EV_EPAPR_VENDOR_ID << 16)
  +#define HC_VENDOR_KVM    (EV_KVM_VENDOR_ID << 16)
  +
  +/* ePAPR Hypercall Token */
  +#define HC_EV_IDLE    EV_IDLE
  +
  +/* ePAPR Hypercall Return Codes */
  #define HC_EV_SUCCESS    0
  -#define HC_EV_UNIMPLEMENTED    12
  +#define HC_EV_UNIMPLEMENTED    EV_UNIMPLEMENTED
 
  #define KVM_FEATURE_MAGIC_PAGE    1
 
  diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
  index 0e21d15..03ebd5d 100644
  --- a/arch/powerpc/kvm/powerpc.c
  +++ b/arch/powerpc/kvm/powerpc.c
  @@ -81,6 +81,10 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
 /* Second return value is in r4 */
 break;
  +case HC_VENDOR_EPAPR | HC_EV_IDLE:
  +r = HC_EV_SUCCESS;
  +kvm_vcpu_block(vcpu);
  +break;
 default:
 r = HC_EV_UNIMPLEMENTED;
 break;
  @@ -746,6 +750,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
 pvinfo->hcall[2] = inst_sc;
 pvinfo->hcall[3] = inst_nop;
 
  +#ifdef CONFIG_BOOKE
  +pvinfo->flags |= KVM_PPC_PVINFO_FLAGS_EV_IDLE;
  +#endif
  +
 return 0;
  } 

 Why limit it to booke? The less ifdefs our code has, the better :)

The code here tells userspace that kvm supports ev_idle.
But I'm not sure whether the ev_idle code works on other platforms.

So I think we should keep the ifdef until other platforms test the code :)
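
For reference, a standalone sketch of how the hypercall number in the patch is composed (vendor ID in the upper 16 bits, token in the lower) and dispatched. The TOY_ names are hypothetical. The values 42 and 12 come from the lines the patch removes; the ePAPR vendor ID and the EV_IDLE token below are placeholders, since the real ones live in asm/epapr_hcalls.h.

```c
#include <stdint.h>

/* Values taken from the lines the patch removes. */
#define TOY_KVM_VENDOR_ID     42
#define TOY_EV_UNIMPLEMENTED  12
/* Placeholders: the real values come from asm/epapr_hcalls.h. */
#define TOY_EPAPR_VENDOR_ID   1
#define TOY_EV_IDLE           16

#define TOY_HC_VENDOR_EPAPR   (TOY_EPAPR_VENDOR_ID << 16)
#define TOY_HC_VENDOR_KVM     (TOY_KVM_VENDOR_ID << 16)
#define TOY_HC_EV_SUCCESS     0

static int toy_blocked;  /* set when the idle hcall was taken */

/* Mirrors the shape of the switch in kvmppc_kvm_pv(), nothing more. */
static int toy_dispatch(uint32_t nr)
{
    switch (nr) {
    case TOY_HC_VENDOR_EPAPR | TOY_EV_IDLE:
        toy_blocked = 1;  /* stand-in for kvm_vcpu_block(vcpu) */
        return TOY_HC_EV_SUCCESS;
    default:
        return TOY_EV_UNIMPLEMENTED;
    }
}
```

The same token under a different vendor ID falls through to the default case, so the guest sees the unimplemented return code there.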

Thanks,
Yu

