date:20150320

Hardware support for vt-posted interrupts described in vt-directed-io-spec for assigned devices

2015-03-20 Thread bk rakesh

Hello,

I am using an S2600CP server board with an Intel(R) Xeon(R) CPU
E5-2620 v2 @ 2.10GHz, which supports APICv.

Does this hardware support the VT-d posted-interrupt feature described in
"[v3 00/26] Add VT-d Posted-Interrupts support"
(https://lkml.org/lkml/2014/12/3/102)?

I would appreciate any information about hardware that supports this feature.

Thanks,
Rakesh
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 9/9] qspinlock,x86,kvm: Implement KVM support for paravirt qspinlock

2015-03-20 Thread Raghavendra K T


On 03/20/2015 02:38 AM, Waiman Long wrote:

On 03/19/2015 06:01 AM, Peter Zijlstra wrote:

[...]

You are probably right. The initial apply_paravirt() was done before the
SMP boot. Subsequent ones were at kernel module load time. I put a
counter in the __native_queue_spin_unlock() and it registered 26949
unlock calls in a 16-cpu guest before it got patched out.


because even printk() takes a lock.


[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-20 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=93251

--- Comment #8 from Igor Mammedov imamm...@redhat.com ---
(In reply to Thomas Stein from comment #7)
> Hello.
>
> After reverting commit 1d4e7e3c0bca747d0fc54069a6ab8393349431c0 I had no
> problems any more. But we have to keep in mind that this error only happened
> now and then. In particular, creating snapshots of running VMs triggered the
> error, but this now runs without crashing the VM. I tested it a few times.
>
> cheers
> t.

Could you try the following patch without reverting 1d4e7e3c0bca?
http://article.gmane.org/gmane.comp.emulators.kvm.devel/134191

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-20 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=93251

--- Comment #9 from Thomas Stein himbe...@meine-oma.de ---
Hello.

I applied the patch to vanilla 3.19.2. No problems so far. Did a few snapshots
and vm restarts.

cheers
t.


[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-20 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=93251

--- Comment #7 from Thomas Stein himbe...@meine-oma.de ---
Hello.

After reverting commit 1d4e7e3c0bca747d0fc54069a6ab8393349431c0 I had no
problems any more. But we have to keep in mind that this error only happened
now and then. In particular, creating snapshots of running VMs triggered the
error, but this now runs without crashing the VM. I tested it a few times.

cheers
t.


[PATCH 10/23] KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode

2015-03-20 Thread Paul Mackerras

From: Suresh Warrier warr...@linux.vnet.ibm.com

Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
to switch to the host to complete the rest of the hypercall function in
virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
functions to be runnable in hypervisor real mode, thus avoiding the need
to switch to the host to execute these functions in virtual mode. However,
the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
events, because these cannot be handled in real mode and still need
a switch to host virtual mode.

There are sufficient differences between the real mode code and the
virtual mode code for the ICS/ICP resend and reject functions that
for now the code has been duplicated instead of sharing common code.
In the future, we can look at creating common functions.
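The fast-path/fallback split described above can be sketched in plain C. This is only an illustration of the pattern, not the kernel code: the names hcall_req, handle_fast, handle_slow and dispatch, and the RC_* values, are hypothetical stand-ins for the real-mode handler, the virtual-mode handler and the H_* return codes.

```c
/* Hypothetical return codes; the real H_* values live in the kernel headers. */
enum hcall_rc { RC_SUCCESS = 0, RC_TOO_HARD = 1 };

/* Hypothetical request: only what is needed to illustrate the split. */
struct hcall_req {
	int needs_wakeup;	/* nonzero if completion requires waking a vcpu */
	int handled_fast;	/* set by the fast path (bookkeeping only) */
	int handled_slow;	/* set by the slow path (bookkeeping only) */
};

/* Fast path: does what it can and punts on work it cannot do itself. */
static enum hcall_rc handle_fast(struct hcall_req *req)
{
	if (req->needs_wakeup)
		return RC_TOO_HARD;
	req->handled_fast = 1;
	return RC_SUCCESS;
}

/* Slow path: stands in for the host virtual-mode handler. */
static enum hcall_rc handle_slow(struct hcall_req *req)
{
	req->handled_slow = 1;
	return RC_SUCCESS;
}

/* Dispatcher: try the fast path first, fall back only when asked to. */
static enum hcall_rc dispatch(struct hcall_req *req)
{
	enum hcall_rc rc = handle_fast(req);

	if (rc == RC_TOO_HARD)
		rc = handle_slow(req);
	return rc;
}
```

The more cases the fast path can complete on its own, the fewer requests pay for the fallback, which is the motivation given in the patch description.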

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 225 ---
 1 file changed, 211 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 7c22997..73bbe92 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -23,12 +23,39 @@
 
 #define DEBUG_PASSUP
 
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+   u32 new_irq);
+
 static inline void rm_writeb(unsigned long paddr, u8 val)
 {
__asm__ __volatile__("sync; stbcix %0,0,%1"
: : "r" (val), "r" (paddr) : "memory");
 }
 
+/* -- ICS routines -- */
+static void ics_rm_check_resend(struct kvmppc_xics *xics,
+   struct kvmppc_ics *ics, struct kvmppc_icp *icp)
+{
+   int i;
+
+   arch_spin_lock(&ics->lock);
+
+   for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+   struct ics_irq_state *state = &ics->irq_state[i];
+
+   if (!state->resend)
+   continue;
+
+   arch_spin_unlock(&ics->lock);
+   icp_rm_deliver_irq(xics, icp, state->number);
+   arch_spin_lock(&ics->lock);
+   }
+
+   arch_spin_unlock(&ics->lock);
+}
+
+/* -- ICP routines -- */
+
 static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
@@ -116,6 +143,178 @@ static inline int check_too_hard(struct kvmppc_xics *xics,
return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS;
 }
 
+static void icp_rm_check_resend(struct kvmppc_xics *xics,
+struct kvmppc_icp *icp)
+{
+   u32 icsid;
+
+   /* Order this load with the test for need_resend in the caller */
+   smp_rmb();
+   for_each_set_bit(icsid, icp->resend_map, xics->max_icsid + 1) {
+   struct kvmppc_ics *ics = xics->ics[icsid];
+
+   if (!test_and_clear_bit(icsid, icp->resend_map))
+   continue;
+   if (!ics)
+   continue;
+   ics_rm_check_resend(xics, ics, icp);
+   }
+}
+
+static bool icp_rm_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
+  u32 *reject)
+{
+   union kvmppc_icp_state old_state, new_state;
+   bool success;
+
+   do {
+   old_state = new_state = READ_ONCE(icp->state);
+
+   *reject = 0;
+
+   /* See if we can deliver */
+   success = new_state.cppr > priority &&
+   new_state.mfrr > priority &&
+   new_state.pending_pri > priority;
+
+   /*
+* If we can, check for a rejection and perform the
+* delivery
+*/
+   if (success) {
+   *reject = new_state.xisr;
+   new_state.xisr = irq;
+   new_state.pending_pri = priority;
+   } else {
+   /*
+* If we failed to deliver we set need_resend
+* so a subsequent CPPR state change causes us
+* to try a new delivery.
+*/
+   new_state.need_resend = true;
+   }
+
+   } while (!icp_rm_try_update(icp, old_state, new_state));
+
+   return success;
+}
+
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+   u32 new_irq)
+{
+   struct ics_irq_state *state;
+   struct kvmppc_ics *ics;
+   u32 reject;
+   u16 src;
+
+   /*
+* This is used both for initial delivery of an interrupt and
+* for subsequent rejection.
+*
+* Rejection can be racy vs. resends. We have evaluated the
+* rejection in an atomic ICP transaction which is now complete,
+* so potentially the ICP can already accept the

[PATCH 04/23] KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.

2015-03-20 Thread Paul Mackerras

From: Michael Ellerman mich...@ellerman.id.au

Some PowerNV systems include a hardware random-number generator.
This HWRNG is present on POWER7+ and POWER8 chips and is capable of
generating one 64-bit random number every microsecond.  The random
numbers are produced by sampling a set of 64 unstable high-frequency
oscillators and are almost completely entropic.

PAPR defines an H_RANDOM hypercall which guests can use to obtain one
64-bit random sample from the HWRNG.  This adds a real-mode
implementation of the H_RANDOM hypercall.  This hypercall was
implemented in real mode because the latency of reading the HWRNG is
generally small compared to the latency of a guest exit and entry for
all the threads in the same virtual core.

Userspace can detect the presence of the HWRNG and the H_RANDOM
implementation by querying the KVM_CAP_PPC_HWRNG capability.  The
H_RANDOM hypercall implementation will only be invoked when the guest
does an H_RANDOM hypercall if userspace first enables the in-kernel
H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability.
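From the userspace side, the two steps described above might look roughly like the following sketch. It assumes a Linux host whose linux/kvm.h already defines KVM_CAP_PPC_HWRNG, that kvm_fd and vm_fd are descriptors obtained from /dev/kvm and KVM_CREATE_VM, and that H_RANDOM has the value 0x300 used in the kernel's hypercall header; the helper names hwrng_available and enable_h_random are made up for illustration.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumption: hcall number taken from the kernel's asm/hvcall.h; it is
 * defined locally because it is not exported through linux/kvm.h. */
#define H_RANDOM 0x300

/* Returns 1 if the kernel reports KVM_CAP_PPC_HWRNG, 0 otherwise
 * (including when the ioctl itself fails). */
static int hwrng_available(int kvm_fd)
{
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HWRNG) > 0;
}

/* Asks KVM to handle H_RANDOM in the kernel for this VM.
 * Returns 0 on success, -1 on error (errno set by ioctl). */
static int enable_h_random(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_PPC_ENABLE_HCALL;
	cap.args[0] = H_RANDOM;	/* hcall to be handled in the kernel */
	cap.args[1] = 1;	/* 1 = enable, 0 = disable */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

A caller would normally check hwrng_available() first and only then call enable_h_random(), since enabling the hcall without a backing HWRNG would not give the guest a working H_RANDOM.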

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt   |  17 +
 arch/powerpc/include/asm/archrandom.h   |  11 ++-
 arch/powerpc/include/asm/kvm_ppc.h  |   2 +
 arch/powerpc/kvm/book3s_hv_builtin.c|  15 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 115 
 arch/powerpc/kvm/powerpc.c  |   3 +
 arch/powerpc/platforms/powernv/rng.c|  29 
 include/uapi/linux/kvm.h|   1 +
 8 files changed, 191 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b112efc..ce10b48 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3248,3 +3248,20 @@ All other orders will be handled completely in user space.
 Only privileged operation exceptions will be checked for in the kernel (or even
 in the hardware prior to interception). If this capability is not enabled, the
 old way of handling SIGP orders is used (partially in kernel and user space).
+
+
+8. Other capabilities.
+----------------------
+
+This section lists capabilities that give information about other
+features of the KVM implementation.
+
+8.1 KVM_CAP_PPC_HWRNG
+
+Architectures: ppc
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel has an implementation of the
+H_RANDOM hypercall backed by a hardware random-number generator.
+If present, the kernel H_RANDOM handler can be enabled for guest use
+with the KVM_CAP_PPC_ENABLE_HCALL capability.
diff --git a/arch/powerpc/include/asm/archrandom.h b/arch/powerpc/include/asm/archrandom.h
index bde5311..0cc6eed 100644
--- a/arch/powerpc/include/asm/archrandom.h
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -30,8 +30,6 @@ static inline int arch_has_random(void)
return !!ppc_md.get_random_long;
 }
 
-int powernv_get_random_long(unsigned long *v);
-
 static inline int arch_get_random_seed_long(unsigned long *v)
 {
return 0;
@@ -47,4 +45,13 @@ static inline int arch_has_random_seed(void)
 
 #endif /* CONFIG_ARCH_RANDOM */
 
+#ifdef CONFIG_PPC_POWERNV
+int powernv_hwrng_present(void);
+int powernv_get_random_long(unsigned long *v);
+int powernv_get_random_real_mode(unsigned long *v);
+#else
+static inline int powernv_hwrng_present(void) { return 0; }
+static inline int powernv_get_random_real_mode(unsigned long *v) { return 0; }
+#endif
+
 #endif /* _ASM_POWERPC_ARCHRANDOM_H */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 46bf652..b8475da 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -302,6 +302,8 @@ static inline bool is_kvmppc_hv_enabled(struct kvm *kvm)
return kvm->arch.kvm_ops == kvmppc_hv_ops;
 }
 
+extern int kvmppc_hwrng_present(void);
+
 /*
  * Cuts out inst bits with ordering according to spec.
  * That means the leftmost bit is zero. All given bits are included.
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1f083ff..1954a1c 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -21,6 +21,7 @@
 #include <asm/cputable.h>
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
+#include <asm/archrandom.h>
 
 #define KVM_CMA_CHUNK_ORDER	18
 
@@ -169,3 +170,17 @@ int kvmppc_hcall_impl_hv_realmode(unsigned long cmd)
return 0;
 }
 EXPORT_SYMBOL_GPL(kvmppc_hcall_impl_hv_realmode);
+
+int kvmppc_hwrng_present(void)
+{
+   return powernv_hwrng_present();
+}
+EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
+
+long kvmppc_h_random(struct kvm_vcpu *vcpu)
+{
+   if (powernv_get_random_real_mode(&vcpu->arch.gpr[4]))
+   return H_SUCCESS;
+
+   return H_HARDWARE;
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S

[PATCH 03/23] KVM: PPC: Book3S HV: Fix instruction emulation

2015-03-20 Thread Paul Mackerras

Commit 4a157d61b48c ("KVM: PPC: Book3S HV: Fix endianness of
instruction obtained from HEIR register") had the side effect that
we no longer reset vcpu->arch.last_inst to -1 on guest exit in
the cases where the instruction is not fetched from the guest.
This means that if instruction emulation turns out to be required
in those cases, the host will emulate the wrong instruction, since
vcpu->arch.last_inst will contain the last instruction that was
emulated.

This fixes it by making sure that vcpu->arch.last_inst is reset
to -1 in those cases.
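The stale-cache problem and its fix can be illustrated with a small C sketch. It is a simplified model, not the real-mode handler: vcpu_sketch, on_guest_exit, inst_to_emulate and INST_FETCH_FAILED are hypothetical names, and guest_inst merely stands in for whatever a fetch from guest memory would return.

```c
#include <stdint.h>

#define INST_FETCH_FAILED ((uint32_t)-1)	/* sentinel: nothing cached */

struct vcpu_sketch {
	uint32_t last_inst;	/* cached instruction, or the sentinel */
	uint32_t guest_inst;	/* stand-in for a fetch from guest memory */
};

/* On guest exit: always reset the cache first, then record an instruction
 * only if the hardware supplied one. Skipping the reset is the bug the
 * patch describes, because an older value would survive the exit. */
static void on_guest_exit(struct vcpu_sketch *v, int have_inst, uint32_t inst)
{
	v->last_inst = INST_FETCH_FAILED;
	if (have_inst)
		v->last_inst = inst;
}

/* At emulation time: fetch from the guest when the cache holds the sentinel. */
static uint32_t inst_to_emulate(struct vcpu_sketch *v)
{
	if (v->last_inst == INST_FETCH_FAILED)
		v->last_inst = v->guest_inst;
	return v->last_inst;
}
```

With the unconditional reset in place, an exit that supplies no instruction leaves the sentinel behind, so the emulation path fetches afresh instead of reusing the previously emulated instruction.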

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bb94e6f..6cbf163 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1005,6 +1005,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
/* Save HEIR (HV emulation assist reg) in emul_inst
   if this is an HEI (HV emulation interrupt, e40) */
li  r3,KVM_INST_FETCH_FAILED
+   stw r3,VCPU_LAST_INST(r9)
cmpwi   r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST
bne 11f
mfspr   r3,SPRN_HEIR
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 01/23] KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()

2015-03-20 Thread Paul Mackerras

Currently, kvmppc_set_lpcr() has a spinlock around the whole function,
and inside that does mutex_lock(&kvm->lock).  It is not permitted to
take a mutex while holding a spinlock, because the mutex_lock might
call schedule().  In addition, this causes lockdep to warn about a
lock ordering issue:

======================================================
[ INFO: possible circular locking dependency detected ]
3.18.0-kvm-04645-gdfea862-dirty #131 Not tainted
-------------------------------------------------------
qemu-system-ppc/8179 is trying to acquire lock:
 (&kvm->lock){+.+.+.}, at: [decc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]

but task is already holding lock:
 (&(&vcore->lock)->rlock){+.+...}, at: [decc1ea0] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&vcore->lock)->rlock){+.+...}:
   [c0b3c120] .mutex_lock_nested+0x80/0x570
   [decc7a14] .kvmppc_vcpu_run_hv+0xc4/0xe40 [kvm_hv]
   [deb9f5cc] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
   [deb9cb24] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
   [deb94478] .kvm_vcpu_ioctl+0x4a8/0x7b0 [kvm]
   [c026cbb4] .do_vfs_ioctl+0x444/0x770
   [c026cfa4] .SyS_ioctl+0xc4/0xe0
   [c0009264] syscall_exit+0x0/0x98

-> #0 (&kvm->lock){+.+.+.}:
   [c00ff28c] .lock_acquire+0xcc/0x1a0
   [c0b3c120] .mutex_lock_nested+0x80/0x570
   [decc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
   [decc510c] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
   [deb9f234] .kvmppc_set_one_reg+0x44/0x330 [kvm]
   [deb9c9dc] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
   [deb9ced4] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
   [deb940b0] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
   [c026cbb4] .do_vfs_ioctl+0x444/0x770
   [c026cfa4] .SyS_ioctl+0xc4/0xe0
   [c0009264] syscall_exit+0x0/0x98

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&vcore->lock)->rlock);
                               lock(&kvm->lock);
                               lock(&(&vcore->lock)->rlock);
  lock(&kvm->lock);

 *** DEADLOCK ***

2 locks held by qemu-system-ppc/8179:
 #0:  (&vcpu->mutex){+.+.+.}, at: [deb93f18] .vcpu_load+0x28/0x90 [kvm]
 #1:  (&(&vcore->lock)->rlock){+.+...}, at: [decc1ea0] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]

stack backtrace:
CPU: 4 PID: 8179 Comm: qemu-system-ppc Not tainted 3.18.0-kvm-04645-gdfea862-dirty #131
Call Trace:
[c01a66c0f310] [c0b486ac] .dump_stack+0x88/0xb4 (unreliable)
[c01a66c0f390] [c00f8bec] .print_circular_bug+0x27c/0x3d0
[c01a66c0f440] [c00fe9e8] .__lock_acquire+0x2028/0x2190
[c01a66c0f5d0] [c00ff28c] .lock_acquire+0xcc/0x1a0
[c01a66c0f6a0] [c0b3c120] .mutex_lock_nested+0x80/0x570
[c01a66c0f7c0] [decc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
[c01a66c0f860] [decc510c] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
[c01a66c0f8d0] [deb9f234] .kvmppc_set_one_reg+0x44/0x330 [kvm]
[c01a66c0f960] [deb9c9dc] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
[c01a66c0f9f0] [deb9ced4] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
[c01a66c0faf0] [deb940b0] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
[c01a66c0fcb0] [c026cbb4] .do_vfs_ioctl+0x444/0x770
[c01a66c0fd90] [c026cfa4] .SyS_ioctl+0xc4/0xe0
[c01a66c0fe30] [c0009264] syscall_exit+0x0/0x98

This fixes it by moving the mutex_lock()/mutex_unlock() pair outside
the spin-locked region.
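As a userspace analogy of the ordering rule, the sketch below takes a lock that may sleep (a pthread mutex) before a lock that busy-waits (a C11 atomic_flag used as a simple spinlock), and releases them in reverse order. It only illustrates the rule; big_mutex, small_spin, shared_value and update_shared are hypothetical names, and the kernel's mutex and spinlock primitives behave differently in detail.

```c
#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t big_mutex = PTHREAD_MUTEX_INITIALIZER;	/* may sleep */
static atomic_flag small_spin = ATOMIC_FLAG_INIT;		/* busy-waits */
static int shared_value;

static void spin_acquire(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void spin_release(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Sleeping lock first, spinning lock second, release in reverse order.
 * Nothing that can block runs while the spinlock is held. */
static int update_shared(int new_value)
{
	int old;

	pthread_mutex_lock(&big_mutex);
	spin_acquire(&small_spin);
	old = shared_value;
	shared_value = new_value;
	spin_release(&small_spin);
	pthread_mutex_unlock(&big_mutex);
	return old;
}
```

Keeping the same acquisition order on every code path also removes the circular dependency that the lockdep report above complains about.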

Cc: sta...@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de4018a..b273193 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -942,20 +942,20 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu,
 static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
bool preserve_top32)
 {
+   struct kvm *kvm = vcpu->kvm;
struct kvmppc_vcore *vc = vcpu->arch.vcore;
u64 mask;
 
+   mutex_lock(&kvm->lock);
spin_lock(&vc->lock);
/*
 * If ILE (interrupt little-endian) has changed, update the
 * MSR_LE bit in the intr_msr for each vcpu in this vcore.
 */
if ((new_lpcr & LPCR_ILE) != (vc->lpcr & LPCR_ILE)) {
-   struct kvm *kvm = vcpu->kvm;
struct kvm_vcpu *vcpu;
int i;
 
-   mutex_lock(&kvm->lock);
kvm_for_each_vcpu(i, vcpu, kvm) {
if (vcpu->arch.vcore != vc)
continue;
@@ -964,7

[PATCH 09/23] KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock

2015-03-20 Thread Paul Mackerras

From: Suresh Warrier warr...@linux.vnet.ibm.com

Replaces the ICS mutex lock with a spin lock since we will be porting
these routines to real mode. Note that we need to disable interrupts
before we take the lock in anticipation of the fact that on the guest
side, we are running in the context of a hard irq and interrupts are
disabled (EE bit off) when the lock is acquired. Again, because we
will be acquiring the lock in hypervisor real mode, we need to use
an arch_spinlock_t instead of a normal spinlock here as we want to
avoid running any lockdep code (which may not be safe to execute in
real mode).

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_xics.c | 68 +-
 arch/powerpc/kvm/book3s_xics.h |  2 +-
 2 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 48f0bda..56ed9b4 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -20,6 +20,7 @@
 #include <asm/xics.h>
 #include <asm/debug.h>
 #include <asm/time.h>
+#include <asm/spinlock.h>
 
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
@@ -39,7 +40,7 @@
  * LOCKING
  * ===
  *
- * Each ICS has a mutex protecting the information about the IRQ
+ * Each ICS has a spin lock protecting the information about the IRQ
  * sources and avoiding simultaneous deliveries if the same interrupt.
  *
  * ICP operations are done via a single compare & swap transaction
@@ -109,7 +110,10 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 {
int i;
 
-   mutex_lock(&ics->lock);
+   unsigned long flags;
+
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
struct ics_irq_state *state = &ics->irq_state[i];
@@ -120,12 +124,15 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
XICS_DBG("resend %#x prio %#x\n", state->number,
  state->priority);
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
icp_deliver_irq(xics, icp, state->number);
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
}
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 }
 
 static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
@@ -133,8 +140,10 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
   u32 server, u32 priority, u32 saved_priority)
 {
bool deliver;
+   unsigned long flags;
 
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
state->server = server;
state->priority = priority;
@@ -145,7 +154,8 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
deliver = true;
}
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 
return deliver;
 }
@@ -186,6 +196,7 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
struct kvmppc_ics *ics;
struct ics_irq_state *state;
u16 src;
+   unsigned long flags;
 
if (!xics)
return -ENODEV;
@@ -195,10 +206,12 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
return -EINVAL;
state = &ics->irq_state[src];
 
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
*server = state->server;
*priority = state->priority;
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 
return 0;
 }
@@ -365,6 +378,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
struct kvmppc_ics *ics;
u32 reject;
u16 src;
+   unsigned long flags;
 
/*
 * This is used both for initial delivery of an interrupt and
@@ -391,7 +405,8 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
state = &ics->irq_state[src];
 
/* Get a lock on the ICS */
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
/* Get our server */
if (!icp || state->server != icp->server_num) {
@@ -434,7 +449,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 *
 * Note that if successful, the new delivery might have itself
 * rejected an interrupt that was delivered before we took the
-* icp mutex.
+* ics spin lock.
 *
 * In this case we do the whole

[PATCH 06/23] KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte

2015-03-20 Thread Paul Mackerras

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

This adds helper routines for locking and unlocking HPTEs, and uses
them in the rest of the code.  We don't change any locking rules in
this patch.
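The idea behind such helpers, a lock bit kept inside the word it protects, can be sketched with C11 atomics. This is an analogy under stated assumptions, not the kernel implementation: LOCK_BIT, try_lock_word and unlock_word are hypothetical names, the bit position is arbitrary, and the release store plays the role that the barrier plays in unlock_hpte().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock bit; the real HPTE_V_HVLOCK sits elsewhere in the word. */
#define LOCK_BIT (UINT64_C(1) << 0)

/* Try to set the lock bit; returns 1 on success, 0 if the word is
 * already locked or changed underneath us. */
static int try_lock_word(_Atomic uint64_t *w)
{
	uint64_t old = atomic_load_explicit(w, memory_order_relaxed);

	if (old & LOCK_BIT)
		return 0;
	return atomic_compare_exchange_strong_explicit(
		w, &old, old | LOCK_BIT,
		memory_order_acquire, memory_order_relaxed);
}

/* Store a new value with the lock bit cleared. The release ordering makes
 * earlier writes visible before the word is seen as unlocked. */
static void unlock_word(_Atomic uint64_t *w, uint64_t v)
{
	v &= ~LOCK_BIT;
	atomic_store_explicit(w, v, memory_order_release);
}
```

Having the unlock helper clear the bit itself means callers can pass the value they read while holding the lock without masking it first, which is how the call sites in this patch use unlock_hpte() and __unlock_hpte().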

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 25 ++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 25 +
 3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e20..0789a0f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+   hpte_v &= ~HPTE_V_HVLOCK;
+   asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+   hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+   hpte_v &= ~HPTE_V_HVLOCK;
+   hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dbf1271..6c6825a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
gr = kvm->arch.revmap[index].guest_rpte;
 
-   /* Unlock the HPTE */
-   asm volatile("lwsync" : : : "memory");
-   hptep[0] = cpu_to_be64(v);
+   unlock_hpte(hptep, v);
preempt_enable();
 
gpte->eaddr = eaddr;
@@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
hpte[1] = be64_to_cpu(hptep[1]);
hpte[2] = r = rev->guest_rpte;
-   asm volatile("lwsync" : : : "memory");
-   hptep[0] = cpu_to_be64(hpte[0]);
+   unlock_hpte(hptep, hpte[0]);
preempt_enable();
 
if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
hptep[1] = cpu_to_be64(r);
eieio();
-   hptep[0] = cpu_to_be64(hpte[0]);
+   __unlock_hpte(hptep, hpte[0]);
asm volatile("ptesync" : : : "memory");
preempt_enable();
if (page && hpte_is_writable(r))
@@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
return ret;
 
  out_unlock:
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
preempt_enable();
goto out_put;
 }
@@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
}
}
unlock_rmap(rmapp);
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
}
return 0;
 }
@@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
}
ret = 1;
}
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 
/* Now check and modify the HPTE */
if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-   /* unlock and continue */
-   hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   __unlock_hpte(hptep, be64_to_cpu(hptep[0]));
continue;
}
 
@@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
npages_dirty = n;
eieio();
}
-   v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+   v &= ~HPTE_V_ABSENT;
v |= HPTE_V_VALID;
-   hptep[0] = cpu_to_be64(v);
+   __unlock_hpte(hptep, v);
} while ((i = j) != head);
 
unlock_rmap(rmapp);
@@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
r &= ~HPTE_GR_MODIFIED;
revp->guest_rpte = r;
}
-   asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
-   hptp[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+   unlock_hpte(hptp, be64_to_cpu(hptp[0]));


[PATCH 05/23] KVM: PPC: Book3S HV: Remove RMA-related variables from code

2015-03-20 Thread Paul Mackerras

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

We don't support real-mode areas now that 970 support is removed.
Remove the remaining details of rma from the code.  Also rename
rma_setup_done to hpte_setup_done to better reflect the changes.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |  3 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 ++--
 arch/powerpc/kvm/book3s_hv.c| 10 +-
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 8ef0512..015773f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -228,9 +228,8 @@ struct kvm_arch {
int tlbie_lock;
unsigned long lpcr;
unsigned long rmor;
-   struct kvm_rma_info *rma;
unsigned long vrma_slb_v;
-   int rma_setup_done;
+   int hpte_setup_done;
u32 hpt_order;
atomic_t vcpus_running;
u32 online_vcores;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3..dbf1271 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -116,12 +116,12 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
 	long order;
 
 	mutex_lock(&kvm->lock);
-	if (kvm->arch.rma_setup_done) {
-		kvm->arch.rma_setup_done = 0;
-		/* order rma_setup_done vs. vcpus_running */
+	if (kvm->arch.hpte_setup_done) {
+		kvm->arch.hpte_setup_done = 0;
+		/* order hpte_setup_done vs. vcpus_running */
 		smp_mb();
 		if (atomic_read(&kvm->arch.vcpus_running)) {
-			kvm->arch.rma_setup_done = 1;
+			kvm->arch.hpte_setup_done = 1;
 			goto out;
 		}
 	}
@@ -1339,20 +1339,20 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 	unsigned long tmp[2];
 	ssize_t nb;
 	long int err, ret;
-	int rma_setup;
+	int hpte_setup;
 
 	if (!access_ok(VERIFY_READ, buf, count))
 		return -EFAULT;
 
 	/* lock out vcpus from running while we're doing this */
 	mutex_lock(&kvm->lock);
-	rma_setup = kvm->arch.rma_setup_done;
-	if (rma_setup) {
-		kvm->arch.rma_setup_done = 0;	/* temporarily */
-		/* order rma_setup_done vs. vcpus_running */
+	hpte_setup = kvm->arch.hpte_setup_done;
+	if (hpte_setup) {
+		kvm->arch.hpte_setup_done = 0;	/* temporarily */
+		/* order hpte_setup_done vs. vcpus_running */
 		smp_mb();
 		if (atomic_read(&kvm->arch.vcpus_running)) {
-			kvm->arch.rma_setup_done = 1;
+			kvm->arch.hpte_setup_done = 1;
 			mutex_unlock(&kvm->lock);
 			return -EBUSY;
 		}
@@ -1405,7 +1405,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 				       "r=%lx\n", ret, i, v, r);
 				goto out;
 			}
-			if (!rma_setup && is_vrma_hpte(v)) {
+			if (!hpte_setup && is_vrma_hpte(v)) {
 				unsigned long psize = hpte_base_page_size(v, r);
 				unsigned long senc = slb_pgsize_encoding(psize);
 				unsigned long lpcr;
@@ -1414,7 +1414,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 					(VRMA_VSID << SLB_VSID_SHIFT_1T);
 				lpcr = senc << (LPCR_VRMASD_SH - 4);
 				kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
-				rma_setup = 1;
+				hpte_setup = 1;
 			}
 			++i;
 			hptp += 2;
@@ -1430,9 +1430,9 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 	}
 
  out:
-	/* Order HPTE updates vs. rma_setup_done */
+	/* Order HPTE updates vs. hpte_setup_done */
 	smp_wmb();
-	kvm->arch.rma_setup_done = rma_setup;
+	kvm->arch.hpte_setup_done = hpte_setup;
 	mutex_unlock(&kvm->lock);
 
 	if (err)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de74756..7b7102a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2032,11 +2032,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	}
 
 	atomic_inc(&vcpu->kvm->arch.vcpus_running);
-	/* Order vcpus_running vs. rma_setup_done, see kvmppc_alloc_reset_hpt */
+   /* Order vcpus_running vs.

[PATCH 11/23] KVM: PPC: Book3S HV: Add ICP real mode counters

2015-03-20 Thread Paul Mackerras

From: Suresh Warrier warr...@linux.vnet.ibm.com

Add two counters to count how often we generate real-mode ICS resend
and reject events. The counters provide some performance statistics
that could be used in the future to consider if the real mode functions
need further optimizing. The counters are displayed as part of ICS and
ICP state provided by /sys/kernel/debug/powerpc/kvm* for each VM.

Also added two counters that count (approximately) how many times we
don't find an ICP or ICS we're looking for. These are not currently
exposed through sysfs, but can be useful when debugging crashes.

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  7 +++
 arch/powerpc/kvm/book3s_xics.c   | 10 --
 arch/powerpc/kvm/book3s_xics.h   |  5 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 73bbe92..6dded8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -227,6 +227,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	ics = kvmppc_xics_find_ics(xics, new_irq, &src);
 	if (!ics) {
 		/* Unsafe increment, but this does not need to be accurate */
+		xics->err_noics++;
 		return;
 	}
 	state = &ics->irq_state[src];
@@ -239,6 +240,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 		icp = kvmppc_xics_find_server(xics->kvm, state->server);
 		if (!icp) {
 			/* Unsafe increment again*/
+			xics->err_noicp++;
 			goto out;
 		}
 	}
@@ -383,6 +385,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	 * separately here as well.
 	 */
 	if (resend) {
+		icp->n_check_resend++;
 		icp_rm_check_resend(xics, icp);
 	}
 }
@@ -500,11 +503,13 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 
 	/* Handle reject in real mode */
 	if (reject && reject != XICS_IPI) {
+		this_icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, reject);
 	}
 
 	/* Handle resends in real mode */
 	if (resend) {
+		this_icp->n_check_resend++;
 		icp_rm_check_resend(xics, icp);
 	}
 
@@ -566,6 +571,7 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
 	 * attempt (see comments in icp_rm_deliver_irq).
 	 */
 	if (reject && reject != XICS_IPI) {
+		icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, reject);
 	}
  bail:
@@ -616,6 +622,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 
 	/* Still asserted, resend it */
 	if (state->asserted) {
+		icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, irq);
 	}
 
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 56ed9b4..eb2569a 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -901,6 +901,7 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	unsigned long flags;
 	unsigned long t_rm_kick_vcpu, t_rm_check_resend;
 	unsigned long t_rm_reject, t_rm_notify_eoi;
+	unsigned long t_reject, t_check_resend;
 
 	if (!kvm)
 		return 0;
@@ -909,6 +910,8 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	t_rm_notify_eoi = 0;
 	t_rm_check_resend = 0;
 	t_rm_reject = 0;
+	t_check_resend = 0;
+	t_reject = 0;
 
 	seq_printf(m, "=========\nICP state\n=========\n");
 
@@ -928,12 +931,15 @@ static int xics_debug_show(struct seq_file *m, void *private)
 		t_rm_notify_eoi += icp->n_rm_notify_eoi;
 		t_rm_check_resend += icp->n_rm_check_resend;
 		t_rm_reject += icp->n_rm_reject;
+		t_check_resend += icp->n_check_resend;
+		t_reject += icp->n_reject;
 	}
 
-	seq_puts(m, "ICP Guest Real Mode exit totals: ");
-	seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
+	seq_printf(m, "ICP Guest->Host totals: kick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
 			t_rm_kick_vcpu, t_rm_check_resend,
 			t_rm_reject, t_rm_notify_eoi);
+	seq_printf(m, "ICP Real Mode totals: check_resend=%lu resend=%lu\n",
+			t_check_resend, t_reject);
 	for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
 		struct kvmppc_ics *ics = xics->ics[icsid];
 
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 055424c..56ea44f 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++

[PATCH 08/23] KVM: PPC: Book3S HV: Add guest-host real mode completion counters

2015-03-20 Thread Paul Mackerras

From: Suresh E. Warrier warr...@linux.vnet.ibm.com

Add counters to track the number of times we switch from guest real mode
to host virtual mode during an interrupt-related hypercall because the
hypercall requires actions that cannot be completed in real mode. This
will help when making optimizations that reduce guest-host transitions.

It is safe to use an ordinary increment rather than an atomic operation
because there is one ICP per virtual CPU and kvmppc_xics_rm_complete()
only works on the ICP for the current VCPU.

The counters are displayed as part of ICS and ICP state provided by
/sys/kernel/debug/powerpc/kvm* for each VM.

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_xics.c | 31 +++
 arch/powerpc/kvm/book3s_xics.h |  6 ++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index ead3a35..48f0bda 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -802,14 +802,22 @@ static noinline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
 	XICS_DBG("XICS_RM: H_%x completing, act: %x state: %lx tgt: %p\n",
 		 hcall, icp->rm_action, icp->rm_dbgstate.raw, icp->rm_dbgtgt);
 
-	if (icp->rm_action & XICS_RM_KICK_VCPU)
+	if (icp->rm_action & XICS_RM_KICK_VCPU) {
+		icp->n_rm_kick_vcpu++;
 		kvmppc_fast_vcpu_kick(icp->rm_kick_target);
-	if (icp->rm_action & XICS_RM_CHECK_RESEND)
+	}
+	if (icp->rm_action & XICS_RM_CHECK_RESEND) {
+		icp->n_rm_check_resend++;
 		icp_check_resend(xics, icp->rm_resend_icp);
-	if (icp->rm_action & XICS_RM_REJECT)
+	}
+	if (icp->rm_action & XICS_RM_REJECT) {
+		icp->n_rm_reject++;
 		icp_deliver_irq(xics, icp, icp->rm_reject);
-	if (icp->rm_action & XICS_RM_NOTIFY_EOI)
+	}
+	if (icp->rm_action & XICS_RM_NOTIFY_EOI) {
+		icp->n_rm_notify_eoi++;
 		kvm_notify_acked_irq(vcpu->kvm, 0, icp->rm_eoied_irq);
+	}
 
 	icp->rm_action = 0;
 
@@ -872,10 +880,17 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	struct kvm *kvm = xics->kvm;
 	struct kvm_vcpu *vcpu;
 	int icsid, i;
+	unsigned long t_rm_kick_vcpu, t_rm_check_resend;
+	unsigned long t_rm_reject, t_rm_notify_eoi;
 
 	if (!kvm)
 		return 0;
 
+	t_rm_kick_vcpu = 0;
+	t_rm_notify_eoi = 0;
+	t_rm_check_resend = 0;
+	t_rm_reject = 0;
+
 	seq_printf(m, "=========\nICP state\n=========\n");
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
@@ -890,8 +905,16 @@ static int xics_debug_show(struct seq_file *m, void *private)
 			   icp->server_num, state.xisr,
 			   state.pending_pri, state.cppr, state.mfrr,
 			   state.out_ee, state.need_resend);
+		t_rm_kick_vcpu += icp->n_rm_kick_vcpu;
+		t_rm_notify_eoi += icp->n_rm_notify_eoi;
+		t_rm_check_resend += icp->n_rm_check_resend;
+		t_rm_reject += icp->n_rm_reject;
 	}
 
+	seq_puts(m, "ICP Guest Real Mode exit totals: ");
+	seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
+			t_rm_kick_vcpu, t_rm_check_resend,
+			t_rm_reject, t_rm_notify_eoi);
 	for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
 		struct kvmppc_ics *ics = xics->ics[icsid];
 
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 73f0f27..de970ec 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -78,6 +78,12 @@ struct kvmppc_icp {
u32  rm_reject;
u32  rm_eoied_irq;
 
+   /* Counters for each reason we exited real mode */
+   unsigned long n_rm_kick_vcpu;
+   unsigned long n_rm_check_resend;
+   unsigned long n_rm_reject;
+   unsigned long n_rm_notify_eoi;
+
/* Debug stuff for real mode */
union kvmppc_icp_state rm_dbgstate;
struct kvm_vcpu *rm_dbgtgt;
-- 
2.1.4
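
The counting scheme in this patch (one plain counter per exit reason in
each ICP, summed over all vCPUs when the debug file is read) can be
sketched in standalone C. The demo_* types and the DEMO_RM_* flag values
are hypothetical stand-ins, not the kernel's definitions.

```c
/* Hypothetical action flags, one bit per reason for leaving real mode. */
#define DEMO_RM_KICK_VCPU	0x1
#define DEMO_RM_CHECK_RESEND	0x2
#define DEMO_RM_REJECT		0x4
#define DEMO_RM_NOTIFY_EOI	0x8

/* One of these exists per virtual CPU, so plain increments suffice. */
struct demo_icp {
	unsigned int rm_action;
	unsigned long n_rm_kick_vcpu;
	unsigned long n_rm_check_resend;
	unsigned long n_rm_reject;
	unsigned long n_rm_notify_eoi;
};

struct demo_totals {
	unsigned long kick_vcpu, check_resend, reject, notify_eoi;
};

/* Count each pending action once, then clear the pending set. */
void demo_rm_complete(struct demo_icp *icp)
{
	if (icp->rm_action & DEMO_RM_KICK_VCPU)
		icp->n_rm_kick_vcpu++;
	if (icp->rm_action & DEMO_RM_CHECK_RESEND)
		icp->n_rm_check_resend++;
	if (icp->rm_action & DEMO_RM_REJECT)
		icp->n_rm_reject++;
	if (icp->rm_action & DEMO_RM_NOTIFY_EOI)
		icp->n_rm_notify_eoi++;
	icp->rm_action = 0;
}

/* Sum the per-ICP counters, as the debug output does per VM. */
struct demo_totals demo_sum(const struct demo_icp *icps, int n)
{
	struct demo_totals t = { 0, 0, 0, 0 };
	int i;

	for (i = 0; i < n; i++) {
		t.kick_vcpu += icps[i].n_rm_kick_vcpu;
		t.check_resend += icps[i].n_rm_check_resend;
		t.reject += icps[i].n_rm_reject;
		t.notify_eoi += icps[i].n_rm_notify_eoi;
	}
	return t;
}
```

The totals are read without any locking, so in the real code they are
only approximate while vCPUs are running.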


[PATCH 02/23] KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count

2015-03-20 Thread Paul Mackerras

The VPA (virtual processor area) is defined by PAPR and is therefore
big-endian, so we need a be32_to_cpu when reading it in
kvmppc_get_yield_count().  Without this, H_CONFER always fails on a
little-endian host, causing SMP guests to waste time spinning on
spinlocks.

Cc: sta...@vger.kernel.org # v3.19
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b273193..de74756 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -636,7 +636,7 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
 	spin_lock(&vcpu->arch.vpa_update_lock);
 	lppaca = (struct lppaca *)vcpu->arch.vpa.pinned_addr;
 	if (lppaca)
-		yield_count = lppaca->yield_count;
+		yield_count = be32_to_cpu(lppaca->yield_count);
 	spin_unlock(&vcpu->arch.vpa_update_lock);
 	return yield_count;
 }
-- 
2.1.4
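
To see why the missing conversion matters, here is a small userspace
sketch. The demo_* helpers are hypothetical; demo_be32_to_cpu() is only a
portable stand-in for what the kernel's be32_to_cpu() yields, and
demo_raw_load_le() models what a little-endian host reads when it loads
the big-endian field directly.

```c
#include <stdint.h>

/* Assemble a big-endian 32-bit field byte by byte; host-order independent. */
uint32_t demo_be32_to_cpu(const uint8_t raw[4])
{
	return ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
	       ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];
}

/* What a little-endian host sees when it loads the same bytes unconverted. */
uint32_t demo_raw_load_le(const uint8_t raw[4])
{
	return ((uint32_t)raw[3] << 24) | ((uint32_t)raw[2] << 16) |
	       ((uint32_t)raw[1] << 8) | (uint32_t)raw[0];
}
```

For a yield count of 5 stored as bytes 00 00 00 05, the converted value
is 5 while the unconverted little-endian load is 0x05000000, so the host
would be working with a value that does not match the guest's count.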


[PATCH 06/23] KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte

2015-03-20 Thread Paul Mackerras

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

This adds helper routines for locking and unlocking HPTEs, and uses
them in the rest of the code.  We don't change any locking rules in
this patch.
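
As a rough userspace analogy for the two helpers, the sketch below uses
C11 atomics in place of the PowerPC barriers. Everything named demo_* is
hypothetical, DEMO_HVLOCK is an illustrative bit rather than the kernel's
constant, and the big-endian conversion of the real HPTE words is left out.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_HVLOCK 0x40ULL	/* hypothetical lock bit in word 0 */

/* Try to take the lock bit; returns 1 if we got it, 0 if already held. */
int demo_try_lock(_Atomic uint64_t *hpte)
{
	uint64_t old = atomic_fetch_or_explicit(hpte, DEMO_HVLOCK,
						memory_order_acquire);
	return (old & DEMO_HVLOCK) == 0;
}

/* Like unlock_hpte(): clear the lock bit with release ordering. */
void demo_unlock(_Atomic uint64_t *hpte, uint64_t v)
{
	atomic_store_explicit(hpte, v & ~DEMO_HVLOCK, memory_order_release);
}

/* Like __unlock_hpte(): no barrier; the caller orders the store itself. */
void demo_unlock_relaxed(_Atomic uint64_t *hpte, uint64_t v)
{
	atomic_store_explicit(hpte, v & ~DEMO_HVLOCK, memory_order_relaxed);
}
```

Both unlock variants write the whole first word, so the caller passes
the value it wants stored and the helper only guarantees that the lock
bit is clear in it.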

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 25 ++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 25 +
 3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e20..0789a0f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
 	return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
 	int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dbf1271..6c6825a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
 
-	/* Unlock the HPTE */
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(v);
+	unlock_hpte(hptep, v);
 	preempt_enable();
 
 	gpte->eaddr = eaddr;
@@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	hpte[1] = be64_to_cpu(hptep[1]);
 	hpte[2] = r = rev->guest_rpte;
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(hpte[0]);
+	unlock_hpte(hptep, hpte[0]);
 	preempt_enable();
 
 	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	hptep[1] = cpu_to_be64(r);
 	eieio();
-	hptep[0] = cpu_to_be64(hpte[0]);
+	__unlock_hpte(hptep, hpte[0]);
 	asm volatile("ptesync" : : : "memory");
 	preempt_enable();
 	if (page && hpte_is_writable(r))
@@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
  out_unlock:
-	hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 		}
 		unlock_rmap(rmapp);
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			/* unlock and continue */
-			hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
 
@@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		hptep[0] = cpu_to_be64(v);
+		__unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 			r &= ~HPTE_GR_MODIFIED;
 			revp->guest_rpte = r;
 		}
-		asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
-		hptp[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		unlock_hpte(hptp, be64_to_cpu(hptp[0]));

[PATCH 04/23] KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.

2015-03-20 Thread Paul Mackerras

From: Michael Ellerman mich...@ellerman.id.au

Some PowerNV systems include a hardware random-number generator.
This HWRNG is present on POWER7+ and POWER8 chips and is capable of
generating one 64-bit random number every microsecond.  The random
numbers are produced by sampling a set of 64 unstable high-frequency
oscillators and are almost completely entropic.

PAPR defines an H_RANDOM hypercall which guests can use to obtain one
64-bit random sample from the HWRNG.  This adds a real-mode
implementation of the H_RANDOM hypercall.  This hypercall was
implemented in real mode because the latency of reading the HWRNG is
generally small compared to the latency of a guest exit and entry for
all the threads in the same virtual core.

Userspace can detect the presence of the HWRNG and the H_RANDOM
implementation by querying the KVM_CAP_PPC_HWRNG capability.  The
H_RANDOM hypercall implementation will only be invoked when the guest
does an H_RANDOM hypercall if userspace first enables the in-kernel
H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability.
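
From the userspace side, the two steps might look roughly like the sketch
below. It assumes a Linux build environment whose <linux/kvm.h> already
defines KVM_CAP_PPC_HWRNG; the demo_* function names are hypothetical, and
DEMO_H_RANDOM is assumed to match the H_RANDOM hcall number in the
kernel's asm/hvcall.h, which should be checked against the actual header.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumed sPAPR hcall number for H_RANDOM; verify against asm/hvcall.h. */
#define DEMO_H_RANDOM 0x300

/*
 * Ask KVM whether the HWRNG-backed H_RANDOM is available.
 * kvm_fd is an open /dev/kvm (or VM) file descriptor.
 * Returns 1 if advertised, 0 if not, -1 if the ioctl itself fails.
 */
int demo_hwrng_available(int kvm_fd)
{
	int r = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HWRNG);

	if (r < 0)
		return -1;
	return r > 0;
}

/*
 * Enable in-kernel handling of H_RANDOM on a VM file descriptor.
 * For KVM_CAP_PPC_ENABLE_HCALL, args[0] is the hcall number and
 * args[1] is 1 to enable (0 to disable).
 * Returns 0 on success, -1 on failure.
 */
int demo_enable_h_random(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_PPC_ENABLE_HCALL,
		.args = { DEMO_H_RANDOM, 1 },
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0 ? -1 : 0;
}
```

A VMM would typically call demo_hwrng_available() first and only enable
the hcall when it returns 1, falling back to its own H_RANDOM handling
otherwise.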

Signed-off-by: Michael Ellerman mich...@ellerman.id.au
Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt   |  17 +
 arch/powerpc/include/asm/archrandom.h   |  11 ++-
 arch/powerpc/include/asm/kvm_ppc.h  |   2 +
 arch/powerpc/kvm/book3s_hv_builtin.c|  15 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 115 
 arch/powerpc/kvm/powerpc.c  |   3 +
 arch/powerpc/platforms/powernv/rng.c|  29 
 include/uapi/linux/kvm.h|   1 +
 8 files changed, 191 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index b112efc..ce10b48 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3248,3 +3248,20 @@ All other orders will be handled completely in user space.
 Only privileged operation exceptions will be checked for in the kernel (or even
 in the hardware prior to interception). If this capability is not enabled, the
 old way of handling SIGP orders is used (partially in kernel and user space).
+
+
+8. Other capabilities.
+----------------------
+
+This section lists capabilities that give information about other
+features of the KVM implementation.
+
+8.1 KVM_CAP_PPC_HWRNG
+
+Architectures: ppc
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel has an implementation of the
+H_RANDOM hypercall backed by a hardware random-number generator.
+If present, the kernel H_RANDOM handler can be enabled for guest use
+with the KVM_CAP_PPC_ENABLE_HCALL capability.
diff --git a/arch/powerpc/include/asm/archrandom.h 
b/arch/powerpc/include/asm/archrandom.h
index bde5311..0cc6eed 100644
--- a/arch/powerpc/include/asm/archrandom.h
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -30,8 +30,6 @@ static inline int arch_has_random(void)
return !!ppc_md.get_random_long;
 }
 
-int powernv_get_random_long(unsigned long *v);
-
 static inline int arch_get_random_seed_long(unsigned long *v)
 {
return 0;
@@ -47,4 +45,13 @@ static inline int arch_has_random_seed(void)
 
 #endif /* CONFIG_ARCH_RANDOM */
 
+#ifdef CONFIG_PPC_POWERNV
+int powernv_hwrng_present(void);
+int powernv_get_random_long(unsigned long *v);
+int powernv_get_random_real_mode(unsigned long *v);
+#else
+static inline int powernv_hwrng_present(void) { return 0; }
+static inline int powernv_get_random_real_mode(unsigned long *v) { return 0; }
+#endif
+
 #endif /* _ASM_POWERPC_ARCHRANDOM_H */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 46bf652..b8475da 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -302,6 +302,8 @@ static inline bool is_kvmppc_hv_enabled(struct kvm *kvm)
 	return kvm->arch.kvm_ops == &kvmppc_hv_ops;
 }
 
+extern int kvmppc_hwrng_present(void);
+
 /*
  * Cuts out inst bits with ordering according to spec.
  * That means the leftmost bit is zero. All given bits are included.
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1f083ff..1954a1c 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -21,6 +21,7 @@
 #include <asm/cputable.h>
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
+#include <asm/archrandom.h>
 
 #define KVM_CMA_CHUNK_ORDER	18
 
@@ -169,3 +170,17 @@ int kvmppc_hcall_impl_hv_realmode(unsigned long cmd)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvmppc_hcall_impl_hv_realmode);
+
+int kvmppc_hwrng_present(void)
+{
+	return powernv_hwrng_present();
+}
+EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
+
+long kvmppc_h_random(struct kvm_vcpu *vcpu)
+{
+	if (powernv_get_random_real_mode(&vcpu->arch.gpr[4]))
+		return H_SUCCESS;
+
+	return H_HARDWARE;
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S

[PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-20 Thread Paul Mackerras

From: Bharata B Rao bhar...@linux.vnet.ibm.com

Since KVM isn't equipped to handle closure of a vcpu fd from userspace (QEMU)
correctly, certain workarounds have to be employed to allow reuse of a
vcpu array slot in KVM during cpu hotplug/unplug from the guest. One such
proposed workaround is to park the vcpu fd in userspace during cpu unplug
and reuse it later during the next hotplug.

More details can be found here:
KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html

In order to support this workaround with PowerPC KVM, don't create or
initialize ICP if the vCPU is found to be already associated with an ICP.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_xics.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index a4a8d9f..ead3a35 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu,
 		return -EPERM;
 	if (xics->kvm != vcpu->kvm)
 		return -EPERM;
-	if (vcpu->arch.irq_type)
-		return -EBUSY;
+
+	/*
+	 * If irq_type is already set, don't reinitialize but
+	 * return success allowing this vcpu to be reused.
+	 */
+	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
+		return 0;
 
 	r = kvmppc_xics_create_icp(vcpu, xcpu);
 	if (!r)
-- 
2.1.4
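
The behavioural change (a second connect on a parked vCPU now succeeds
without creating another ICP, instead of failing with -EBUSY) can be
modelled in a few lines. The demo_* names and enum values below are
hypothetical and only mirror the shape of the check in the hunk above.

```c
/* Hypothetical stand-ins for the vcpu's interrupt controller type. */
enum { DEMO_IRQ_DEFAULT = 0, DEMO_IRQ_XICS = 1 };

struct demo_vcpu {
	int irq_type;		/* DEMO_IRQ_DEFAULT until an ICP is attached */
	int icp_creations;	/* how many times an ICP was created */
};

/*
 * Connect a vcpu to the interrupt controller.  If it is already
 * connected (for example a parked vcpu being hot-plugged again),
 * return success without creating a second ICP.
 */
int demo_connect_vcpu(struct demo_vcpu *vcpu)
{
	if (vcpu->irq_type != DEMO_IRQ_DEFAULT)
		return 0;

	vcpu->icp_creations++;
	vcpu->irq_type = DEMO_IRQ_XICS;
	return 0;
}
```

Calling demo_connect_vcpu() twice on the same object leaves
icp_creations at 1, which is the property the workaround relies on.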


[PATCH 00/23] Bug fixes and improvements for HV KVM

2015-03-20 Thread Paul Mackerras

This is my current patch queue for HV KVM on PPC.  This series is
based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3
plus a set of recent KVM changes which don't intersect with the
changes in this series.  On top of that, in my testing I have some
patches which are not KVM-related but are needed to boot and run a
recent upstream kernel successfully:

tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
tick/hotplug: Handover time related duties before cpu offline
powerpc/powernv: Check image loaded or not before calling flash
powerpc/powernv: Fixes for hypervisor doorbell handling
powerpc/powernv: Fix return value from power7_nap() et al.
powerpc: Export __spin_yield

These patches have been posted by their authors and are on their way
upstream via various trees.  They are not included in this series.

The first three patches are bug fixes that should go into v4.0 if
possible.  The remainder are intended for the 4.1 merge window.

The patch "powerpc: Export __spin_yield" is a prerequisite for patch
9/23 of this series ("KVM: PPC: Book3S HV: Convert ICS mutex lock to
spin lock").  It is on its way upstream through the linuxppc-dev
mailing list.

The patch "powerpc/powernv: Fixes for hypervisor doorbell handling" is
needed for correct operation with patch 20/23, "KVM: PPC: Book3S HV:
Use msgsnd for signalling threads".  It is also on its way upstream
through the linuxppc-dev list.  I am expecting both of these
prerequisite patches to go into 4.0.

Finally, the last patch in this series converts some of the assembly
code in book3s_hv_rmhandlers.S into C.  I intend to continue this
trend.

Paul.

 Documentation/virtual/kvm/api.txt        |  17 +
 arch/powerpc/include/asm/archrandom.h    |  11 +-
 arch/powerpc/include/asm/kvm_book3s_64.h |  18 ++
 arch/powerpc/include/asm/kvm_host.h      |  45 ++-
 arch/powerpc/include/asm/kvm_ppc.h       |   2 +
 arch/powerpc/include/asm/time.h          |   3 +
 arch/powerpc/kernel/asm-offsets.c        |  19 +-
 arch/powerpc/kernel/time.c               |   6 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 189 +--
 arch/powerpc/kvm/book3s_hv.c             | 413 +---
 arch/powerpc/kvm/book3s_hv_builtin.c     |  98 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c     | 239 --
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 524 +++
 arch/powerpc/kvm/book3s_xics.c           | 114 +--
 arch/powerpc/kvm/book3s_xics.h           |  13 +-
 arch/powerpc/kvm/powerpc.c               |   3 +
 arch/powerpc/platforms/powernv/rng.c     |  29 ++
 include/uapi/linux/kvm.h                 |   1 +
 virt/kvm/kvm_main.c                      |   1 +
 20 files changed, 1401 insertions(+), 369 deletions(-)

[PATCH 08/23] KVM: PPC: Book3S HV: Add guest-host real mode completion counters

2015-03-20 Thread Paul Mackerras

From: Suresh E. Warrier warr...@linux.vnet.ibm.com

Add counters to track the number of times we switch from guest real mode
to host virtual mode during an interrupt-related hypercall because the
hypercall requires actions that cannot be completed in real mode. This
will help when making optimizations that reduce guest-host transitions.

It is safe to use an ordinary increment rather than an atomic operation
because there is one ICP per virtual CPU and kvmppc_xics_rm_complete()
only works on the ICP for the current VCPU.

The counters are displayed as part of IPC and ICP state provided by
/sys/kernel/debug/powerpc/kvm* for each VM.

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_xics.c | 31 +++
 arch/powerpc/kvm/book3s_xics.h |  6 ++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index ead3a35..48f0bda 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -802,14 +802,22 @@ static noinline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
 	XICS_DBG("XICS_RM: H_%x completing, act: %x state: %lx tgt: %p\n",
 		 hcall, icp->rm_action, icp->rm_dbgstate.raw, icp->rm_dbgtgt);
 
-	if (icp->rm_action & XICS_RM_KICK_VCPU)
+	if (icp->rm_action & XICS_RM_KICK_VCPU) {
+		icp->n_rm_kick_vcpu++;
 		kvmppc_fast_vcpu_kick(icp->rm_kick_target);
-	if (icp->rm_action & XICS_RM_CHECK_RESEND)
+	}
+	if (icp->rm_action & XICS_RM_CHECK_RESEND) {
+		icp->n_rm_check_resend++;
 		icp_check_resend(xics, icp->rm_resend_icp);
-	if (icp->rm_action & XICS_RM_REJECT)
+	}
+	if (icp->rm_action & XICS_RM_REJECT) {
+		icp->n_rm_reject++;
 		icp_deliver_irq(xics, icp, icp->rm_reject);
-	if (icp->rm_action & XICS_RM_NOTIFY_EOI)
+	}
+	if (icp->rm_action & XICS_RM_NOTIFY_EOI) {
+		icp->n_rm_notify_eoi++;
 		kvm_notify_acked_irq(vcpu->kvm, 0, icp->rm_eoied_irq);
+	}
 
 	icp->rm_action = 0;
 
@@ -872,10 +880,17 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	struct kvm *kvm = xics->kvm;
 	struct kvm_vcpu *vcpu;
 	int icsid, i;
+	unsigned long t_rm_kick_vcpu, t_rm_check_resend;
+	unsigned long t_rm_reject, t_rm_notify_eoi;
 
 	if (!kvm)
 		return 0;
 
+	t_rm_kick_vcpu = 0;
+	t_rm_notify_eoi = 0;
+	t_rm_check_resend = 0;
+	t_rm_reject = 0;
+
 	seq_printf(m, "=========\nICP state\n=========\n");
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
@@ -890,8 +905,16 @@ static int xics_debug_show(struct seq_file *m, void *private)
 			   icp->server_num, state.xisr,
 			   state.pending_pri, state.cppr, state.mfrr,
 			   state.out_ee, state.need_resend);
+		t_rm_kick_vcpu += icp->n_rm_kick_vcpu;
+		t_rm_notify_eoi += icp->n_rm_notify_eoi;
+		t_rm_check_resend += icp->n_rm_check_resend;
+		t_rm_reject += icp->n_rm_reject;
 	}
 
+	seq_puts(m, "ICP Guest Real Mode exit totals: ");
+	seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
+			t_rm_kick_vcpu, t_rm_check_resend,
+			t_rm_reject, t_rm_notify_eoi);
 	for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
 		struct kvmppc_ics *ics = xics->ics[icsid];
 
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 73f0f27..de970ec 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -78,6 +78,12 @@ struct kvmppc_icp {
u32  rm_reject;
u32  rm_eoied_irq;
 
+   /* Counters for each reason we exited real mode */
+   unsigned long n_rm_kick_vcpu;
+   unsigned long n_rm_check_resend;
+   unsigned long n_rm_reject;
+   unsigned long n_rm_notify_eoi;
+
/* Debug stuff for real mode */
union kvmppc_icp_state rm_dbgstate;
struct kvm_vcpu *rm_dbgtgt;
-- 
2.1.4
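
The counting scheme in this patch can be sketched outside the kernel as
follows. The struct and function names are hypothetical; the sketch only
shows the idea stated in the commit message: each ICP carries its own
counters, written by a single vCPU with a plain increment, and the debug
output adds them up across all ICPs when it is read.

```c
#include <stddef.h>

/* Hypothetical mirror of the four per-ICP counters added by the patch. */
struct icp_counters_sketch {
	unsigned long n_rm_kick_vcpu;
	unsigned long n_rm_check_resend;
	unsigned long n_rm_reject;
	unsigned long n_rm_notify_eoi;
};

/*
 * Sum the counters of n ICPs into *total, the way the debug output
 * builds its "exit totals" line.  Writers never share an ICP, so the
 * per-ICP increments need no atomic operation; only this reader walks
 * all of them.
 */
void sum_icp_counters_sketch(const struct icp_counters_sketch *icps, size_t n,
			     struct icp_counters_sketch *total)
{
	size_t i;

	total->n_rm_kick_vcpu = 0;
	total->n_rm_check_resend = 0;
	total->n_rm_reject = 0;
	total->n_rm_notify_eoi = 0;

	for (i = 0; i < n; i++) {
		total->n_rm_kick_vcpu += icps[i].n_rm_kick_vcpu;
		total->n_rm_check_resend += icps[i].n_rm_check_resend;
		total->n_rm_reject += icps[i].n_rm_reject;
		total->n_rm_notify_eoi += icps[i].n_rm_notify_eoi;
	}
}
```

The totals read this way are a snapshot and may lag behind increments
that happen while the loop runs, which is acceptable for debug statistics.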

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 03/23] KVM: PPC: Book3S HV: Fix instruction emulation

2015-03-20 Thread Paul Mackerras

Commit 4a157d61b48c ("KVM: PPC: Book3S HV: Fix endianness of
instruction obtained from HEIR register") had the side effect that
we no longer reset vcpu->arch.last_inst to -1 on guest exit in
the cases where the instruction is not fetched from the guest.
This means that if instruction emulation turns out to be required
in those cases, the host will emulate the wrong instruction, since
vcpu->arch.last_inst will contain the last instruction that was
emulated.

This fixes it by making sure that vcpu->arch.last_inst is reset
to -1 in those cases.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bb94e6f..6cbf163 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1005,6 +1005,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
/* Save HEIR (HV emulation assist reg) in emul_inst
   if this is an HEI (HV emulation interrupt, e40) */
li  r3,KVM_INST_FETCH_FAILED
+   stw r3,VCPU_LAST_INST(r9)
cmpwi   r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST
bne 11f
mfspr   r3,SPRN_HEIR
-- 
2.1.4
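
The bug and the fix can be modelled in a few lines of C. This is only a
sketch under assumptions: the struct, the function names and the sentinel
macro are hypothetical, and the real code is the assembly shown in the
hunk above. The point is that every guest exit first stores the "fetch
failed" sentinel, so a value left over from an earlier emulation can
never be mistaken for the current instruction.

```c
#include <stdint.h>

/* Hypothetical sentinel; the commit message describes the reset value as -1. */
#define INST_FETCH_FAILED_SKETCH ((uint32_t)-1)

struct vcpu_inst_sketch {
	uint32_t last_inst;	/* instruction to emulate, or the sentinel */
};

/*
 * Model of a guest exit.  have_inst says whether this exit path obtained
 * the instruction (for example from HEIR); if it did not, last_inst must
 * not keep whatever the previous emulation left behind.
 */
void guest_exit_sketch(struct vcpu_inst_sketch *vcpu, int have_inst,
		       uint32_t inst)
{
	vcpu->last_inst = INST_FETCH_FAILED_SKETCH;	/* the reset the fix restores */
	if (have_inst)
		vcpu->last_inst = inst;
}

/* The emulation path has to fetch the instruction when it sees the sentinel. */
int needs_fetch_sketch(const struct vcpu_inst_sketch *vcpu)
{
	return vcpu->last_inst == INST_FETCH_FAILED_SKETCH;
}
```

Without the unconditional store in guest_exit_sketch(), a second exit
with have_inst == 0 would leave the old instruction in place, which is
the stale-instruction case the commit message describes.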


[PATCH] kvm: avoid page allocation failure in kvm_set_memory_region()

2015-03-20 Thread Igor Mammedov

A KVM guest can fail to start up with the following trace on the host:

qemu-system-x86: page allocation failure: order:4, mode:0x40d0
Call Trace:
  dump_stack+0x47/0x67
  warn_alloc_failed+0xee/0x150
  __alloc_pages_direct_compact+0x14a/0x150
  __alloc_pages_nodemask+0x776/0xb80
  alloc_kmem_pages+0x3a/0x110
  kmalloc_order+0x13/0x50
  kmemdup+0x1b/0x40
  __kvm_set_memory_region+0x24a/0x9f0 [kvm]
  kvm_set_ioapic+0x130/0x130 [kvm]
  kvm_set_memory_region+0x21/0x40 [kvm]
  kvm_vm_ioctl+0x43f/0x750 [kvm]

The failure happens when attempting to allocate pages for
'struct kvm_memslots'. However, the structure doesn't have to
be in physically contiguous (kmalloc-ed) address space, so
change the allocation to kvm_kvzalloc() so that it will be
vmalloc-ed when its size is more than a page.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
TODO:
 - work on follow up patches to allocate space for
   actual amount of memory_slots instead of possible maximum.
---
 virt/kvm/kvm_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a2214d9..7ed1f5c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -871,10 +871,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			goto out_free;
 	}
 
-	slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
-			GFP_KERNEL);
+	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
 	if (!slots)
 		goto out_free;
+	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 
 	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
 		slot = id_to_memslot(slots, mem->slot);
@@ -936,7 +936,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
return 0;
 
 out_slots:
-   kfree(slots);
+   kvfree(slots);
 out_free:
kvm_free_physmem_slot(kvm, new, old);
 out:
-- 
1.8.3.1
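
For readers unfamiliar with the "order:4" in the trace: an order-n
allocation asks for 2^n physically contiguous pages, so order 4 is 16
pages, or 64 KiB assuming 4 KiB pages. The sketch below works through
that arithmetic and the "more than a page" rule from the commit message.
The names and the page-size constant are assumptions made for
illustration, not kernel code.

```c
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096u	/* assumption: 4 KiB pages */

/* Smallest order such that (PAGE_SIZE_SKETCH << order) >= size. */
unsigned int alloc_order_sketch(size_t size)
{
	unsigned int order = 0;
	size_t span = PAGE_SIZE_SKETCH;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Rule from the commit message: use vmalloc once size exceeds one page. */
int uses_vmalloc_sketch(size_t size)
{
	return size > PAGE_SIZE_SKETCH;
}
```

Any size between 32 KiB + 1 and 64 KiB maps to order 4, which is hard to
satisfy on a fragmented host; a vmalloc-ed buffer only needs individual
pages.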


Re: Hardware support for vt-posted interrupts described in vt-directed-io-spec for assigned devices

2015-03-20 Thread bk rakesh

Adding some more information about the setup I created to
test VT-d posted interrupts for assigned devices.

Hardware used for evaluating VT-d posted interrupts:
CPU E5-2620 v2 @ 2.10GHz and S2600CP server board

I used kernel 3.18 patched with KVM-VFIO IRQ forward
control (posted by eric.au...@linaro.org), hierarchy irqdomain (posted
by jiang@linux.intel.com) and VT-d Posted-Interrupts
support (http://lwn.net/Articles/626050/), and assigned the ixgbe 10G
NIC via VFIO passthrough using qemu-kvm. This resulted in the following
dmesg output:

[233783.657187] dmar: DRHD: handling fault status reg 602
[233783.662926] dmar: INTR-REMAP: Request device [[02:00.0] fault index 47
INTR-REMAP:[fault reason 36] Detected reserved fields in the IRTE entry

I checked whether the hardware supports the posted-interrupt capability
via bit 59 of the capability register
(#define cap_pi_support(c) (((c) >> 59) & 1)), as described in
http://www.intel.com/content/www/us/en/embedded/technology/virtualization/vt-directed-io-spec.html,
and the result was "not supported". Can anyone tell me whether this
hardware supports the VT-d posted-interrupt feature? If not, which
hardware should I use?
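
For reference, the bit-59 test described above amounts to the following
stand-alone check. The wrapper function name is hypothetical, and how the
64-bit capability register value is read from the hardware is left out;
the sketch only shows the shift-and-mask on a value that has already been
read.

```c
#include <stdint.h>

/* Bit 59 of the 64-bit capability register: posted-interrupt support. */
#define cap_pi_support(c) (((c) >> 59) & 1)

/* Returns 1 if the capability register value advertises posted interrupts. */
int has_posted_interrupts_sketch(uint64_t cap)
{
	return (int)cap_pi_support(cap);
}
```

A value with only bit 59 set yields 1, and any value with that bit clear
yields 0 regardless of the other bits.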

Thanks


On Fri, Mar 20, 2015 at 12:49 PM, bk rakesh rakeshbkr...@gmail.com wrote:

> Hello,
>
> I am using the S2600CP server board with cpu Intel(R) Xeon(R) CPU
> E5-2620 v2 @ 2.10GHz which supports for APICv,
>
> Does this hardware support the vt-d posted interrupt feature as  described in
> [v3 00/26] Add VT-d Posted-Interrupts support and
> https://lkml.org/lkml/2014/12/3/102 ,
>
> Requesting to get any information about the hardware that supports this
> feature.
>
> Thanks,
> Rakesh

Re: KVM live migration i/o error

2015-03-20 Thread Stefan Hajnoczi

On Wed, Mar 18, 2015 at 04:53:28PM +0100, Francesc Guasch wrote:
> I have three Ubuntu Server 14.04 trusty with KVM. Two of
> them are HP servers and one is Dell. Both brands run fine
> the KVM virtual servers, and I can do live migration between
> the HPs. But I get I/O errors in the vda when I migrate to
> or from the Dell server.
>
> I have shared storage with NFS, mounted the same way in all
> of them:
>
> nfs.sever:/kvm /var/lib/libvirt/images nfs auto,vers=3
>
> I checked the version of all the packages to make sure are
> the same. I got:
>
> kernel: 3.13.0-43-generic #72-Ubuntu SMP x86_64
> libvirt: 1.2.2-0ubuntu13.1.9
> qemu-utils: 2.0.0+dfsg-2ubuntu1.10
> qemu-kvm: 2.0.0+dfsg-2ubuntu1.10
>
> I made sure the Cache in the Storage is set to None.
>
> Disk bus: virtio Cache mode: none IO mode: default
>
> I run this to do live migration:
>
> virsh migrate --live virtual qemu+ssh://dellserver/system
>
> As soon as it starts in the origin console I spot I/O error
> messages, when it finishes I got them in the console in the
> destination server. The file system is read only and I have to
> shut it down hard.
>
> end request I/O error, /dev/vda, sector 8790327

origin console == guest's console?

I/O errors starting while the guest is still running on the migration
source host is strange.  I wonder if something happened to the NFS file
related to file permissions or SELinux labels?

Stefan



Re: [PATCH 00/23] Bug fixes and improvements for HV KVM

2015-03-20 Thread Alexander Graf



On 20.03.15 10:39, Paul Mackerras wrote:
> This is my current patch queue for HV KVM on PPC.  This series is
> based on the queue branch of the KVM tree, i.e. roughly v4.0-rc3
> plus a set of recent KVM changes which don't intersect with the
> changes in this series.  On top of that, in my testing I have some
> patches which are not KVM-related but are needed to boot and run a
> recent upstream kernel successfully:
>
> tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
> tick/hotplug: Handover time related duties before cpu offline
> powerpc/powernv: Check image loaded or not before calling flash
> powerpc/powernv: Fixes for hypervisor doorbell handling
> powerpc/powernv: Fix return value from power7_nap() et al.
> powerpc: Export __spin_yield
>
> These patches have been posted by their authors and are on their way
> upstream via various trees.  They are not included in this series.
>
> The first three patches are bug fixes that should go into v4.0 if
> possible.

Thanks, applied the first 3 to my for-4.0 branch which is going through
autotest now. If everything runs fine, I'll send it to Paolo for
upstream merge.


Alex

Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-20 Thread Alexander Graf



On 20.03.15 10:39, Paul Mackerras wrote:
> From: Bharata B Rao bhar...@linux.vnet.ibm.com
>
> Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU)
> correctly, certain work arounds have to be employed to allow reuse of
> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
> proposed workaround is to park the vcpu fd in userspace during cpu unplug
> and reuse it later during next hotplug.
>
> More details can be found here:
> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>
> In order to support this workaround with PowerPC KVM, don't create or
> initialize ICP if the vCPU is found to be already associated with an ICP.
>
> Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
> Signed-off-by: Paul Mackerras pau...@samba.org

This probably makes some sense, but please make sure that user space has
some way to figure out whether hotplug works at all.

Also Paul, for patches that you pick up from others, I'd prefer if they
send the patches to the ML themselves first and you pick them up from
there then. That way we give everyone the same treatment.


Alex

Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Alexander Graf



On 20.03.15 10:39, Paul Mackerras wrote:
 This reads the timebase at various points in the real-mode guest
 entry/exit code and uses that to accumulate total, minimum and
 maximum time spent in those parts of the code.  Currently these
 times are accumulated per vcpu in 5 parts of the code:
 
 * rm_entry - time taken from the start of kvmppc_hv_entry() until
   just before entering the guest.
 * rm_intr - time from when we take a hypervisor interrupt in the
   guest until we either re-enter the guest or decide to exit to the
   host.  This includes time spent handling hcalls in real mode.
 * rm_exit - time from when we decide to exit the guest until the
   return from kvmppc_hv_entry().
 * guest - time spend in the guest
 * cede - time spent napping in real mode due to an H_CEDE hcall
   while other threads in the same vcore are active.
 
 These times are exposed in debugfs in a directory per vcpu that
 contains a file called timings.  This file contains one line for
 each of the 5 timings above, with the name followed by a colon and
 4 numbers, which are the count (number of times the code has been
 executed), the total time, the minimum time, and the maximum time,
 all in nanoseconds.
 
 Signed-off-by: Paul Mackerras pau...@samba.org

Have you measured the additional overhead this brings?
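
To make the accounting in the quoted description concrete, here is a
minimal single-threaded sketch of one accumulator. It assumes the usual
seqcount convention (odd while an update is in progress, so the number of
completed updates is seqcount / 2), which matches the "also count * 2"
comment in the quoted struct. The struct and function names are
hypothetical, and the memory barriers a real concurrent reader would need
are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical user-space mirror of struct kvmhv_tb_accumulator. */
struct tb_accumulator_sketch {
	uint64_t seqcount;	/* odd during an update, otherwise count * 2 */
	uint64_t tb_total;	/* total time in timebase ticks */
	uint64_t tb_min;	/* shortest interval seen so far */
	uint64_t tb_max;	/* longest interval seen so far */
};

/* Fold one measured interval (in timebase ticks) into the accumulator. */
void tb_accumulate_sketch(struct tb_accumulator_sketch *acc, uint64_t delta)
{
	int first = (acc->seqcount == 0);

	acc->seqcount++;		/* odd: a reader should retry */
	acc->tb_total += delta;
	if (first || delta < acc->tb_min)
		acc->tb_min = delta;
	if (delta > acc->tb_max)
		acc->tb_max = delta;
	acc->seqcount++;		/* even again: update complete */
}

/* Number of completed updates, i.e. the "count" column of the timings file. */
uint64_t tb_count_sketch(const struct tb_accumulator_sketch *acc)
{
	return acc->seqcount / 2;
}
```

Each update costs two counter increments, one add and two compares on
top of the timebase reads, which gives a rough idea of what the overhead
question is about.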

 ---
  arch/powerpc/include/asm/kvm_host.h |  19 +
  arch/powerpc/include/asm/time.h |   3 +
  arch/powerpc/kernel/asm-offsets.c   |  11 +++
  arch/powerpc/kernel/time.c  |   6 ++
  arch/powerpc/kvm/book3s_hv.c| 135 
 
  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 -
  6 files changed, 276 insertions(+), 3 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index f1d0bbc..286c0ce 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -369,6 +369,14 @@ struct kvmppc_slb {
   u8 base_page_size;  /* MMU_PAGE_xxx */
  };
  
 +/* Struct used to accumulate timing information in HV real mode code */
 +struct kvmhv_tb_accumulator {
 + u64 seqcount;   /* used to synchronize access, also count * 2 */
 + u64 tb_total;   /* total time in timebase ticks */
 + u64 tb_min; /* min time */
 + u64 tb_max; /* max time */
 +};
 +
  # ifdef CONFIG_PPC_FSL_BOOK3E
  #define KVMPPC_BOOKE_IAC_NUM 2
  #define KVMPPC_BOOKE_DAC_NUM 2
 @@ -656,6 +664,17 @@ struct kvm_vcpu_arch {
   u64 busy_preempt;
  
   u32 emul_inst;
 +
 + struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
 + u64 cur_tb_start;   /* when it started */
 + struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
 + struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
 + struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
 + struct kvmhv_tb_accumulator guest_time; /* guest execution */
 + struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
 +
 + struct dentry *debugfs_dir;
 + struct dentry *debugfs_timings;
  #endif
  };
  
 diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
 index 03cbada..10fc784 100644
 --- a/arch/powerpc/include/asm/time.h
 +++ b/arch/powerpc/include/asm/time.h
 @@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
  
  DECLARE_PER_CPU(u64, decrementers_next_tb);
  
 +/* Convert timebase ticks to nanoseconds */
 +unsigned long long tb_to_ns(unsigned long long tb_ticks);
 +
  #endif /* __KERNEL__ */
  #endif /* __POWERPC_TIME_H */
 diff --git a/arch/powerpc/kernel/asm-offsets.c 
 b/arch/powerpc/kernel/asm-offsets.c
 index 4717859..ec9f59c 100644
 --- a/arch/powerpc/kernel/asm-offsets.c
 +++ b/arch/powerpc/kernel/asm-offsets.c
 @@ -458,6 +458,17 @@ int main(void)
   DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
   DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
   DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
 + DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
 + DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
 + DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
 + DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
 + DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
 + DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
 + DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, 
 arch.cur_tb_start));
 + DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount));
 + DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total));
 + DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min));
 + DEFINE(TAS_MAX, offsetof(struct kvmhv_tb_accumulator, tb_max));
  #endif
   DEFINE(VCPU_SHARED_SPRG3,

Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Paul Mackerras

On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
 
 
 On 20.03.15 10:39, Paul Mackerras wrote:
  This reads the timebase at various points in the real-mode guest
  entry/exit code and uses that to accumulate total, minimum and
  maximum time spent in those parts of the code.  Currently these
  times are accumulated per vcpu in 5 parts of the code:
  
  * rm_entry - time taken from the start of kvmppc_hv_entry() until
just before entering the guest.
  * rm_intr - time from when we take a hypervisor interrupt in the
guest until we either re-enter the guest or decide to exit to the
host.  This includes time spent handling hcalls in real mode.
  * rm_exit - time from when we decide to exit the guest until the
return from kvmppc_hv_entry().
  * guest - time spend in the guest
  * cede - time spent napping in real mode due to an H_CEDE hcall
while other threads in the same vcore are active.
  
  These times are exposed in debugfs in a directory per vcpu that
  contains a file called timings.  This file contains one line for
  each of the 5 timings above, with the name followed by a colon and
  4 numbers, which are the count (number of times the code has been
  executed), the total time, the minimum time, and the maximum time,
  all in nanoseconds.
  
  Signed-off-by: Paul Mackerras pau...@samba.org
 
 Have you measure the additional overhead this brings?

I haven't - in fact I did this patch so I could measure the overhead
or improvement from other changes I did, but it doesn't measure its
own overhead, of course.  I guess I need a workload that does a
defined number of guest entries and exits and measure how fast it runs
with and without the patch (maybe something like H_SET_MODE in a
loop).  I'll figure something out and post the results.  

Paul.

Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-20 Thread Alexander Graf



On 20.03.15 12:26, Paul Mackerras wrote:
 On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:


 On 20.03.15 10:39, Paul Mackerras wrote:
 From: Bharata B Rao bhar...@linux.vnet.ibm.com

 Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU)
 correctly, certain work arounds have to be employed to allow reuse of
 vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
 proposed workaround is to park the vcpu fd in userspace during cpu unplug
 and reuse it later during next hotplug.

 More details can be found here:
 KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
 QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html

 In order to support this workaround with PowerPC KVM, don't create or
 initialize ICP if the vCPU is found to be already associated with an ICP.

 Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org

 This probably makes some sense, but please make sure that user space has
 some way to figure out whether hotplug works at all.
 
 Bharata is working on the qemu side of all this, so I assume he has
 that covered.

Well, so far the kernel doesn't expose anything he can query, so I
suppose he just blindly assumes that older host kernels will randomly
break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
can check on.

 
 Also Paul, for patches that you pick up from others, I'd prefer if they
 send the patches to the ML themselves first and you pick them up from
 there then. That way we give everyone the same treatment.
 
 Fair enough.  In fact Bharata did post the patch but he sent it to
 linuxppc-...@ozlabs.org not the KVM lists.

Please make sure you only take patches into your queue that made it to
at least kvm@vger, preferably kvm-ppc@vger as well. If you see related
patches on other mailing lists, just ask the respective people to resend
with proper ML exposure.


Alex

Re: KVM live migration i/o error

2015-03-20 Thread Francesc Guasch

On Fri, Mar 20, 2015 at 10:03:20AM +, Stefan Hajnoczi wrote:

Hi Stefan, thank you very much for answering me.

 On Wed, Mar 18, 2015 at 04:53:28PM +0100, Francesc Guasch wrote:
  I have three Ubuntu Server 14.04 trusty with KVM. Two of
  them are HP servers and one is Dell. Both brands run fine
  the KVM virtual servers, and I can do live migration between
  the HPs. But I get I/O errors in the vda when I migrate to
  or from the Dell server.
  
  I have shared storage with NFS, mounted the same way in all
  of them:
  
  As soon as it starts in the origin console I spot I/O error
  messages, when it finishes I got them in the console in the
  destination server. The file system is read only and I have to
  shut it down hard.
  
  end request I/O error, /dev/vda, sector 8790327
 
 origin console == guest's console?

Yes, I mean I open two consoles with virt-manager, one in
the origin host and another one in the destination
 
 I/O errors starting while the guest is still running on the migration
 source host is strange.  I wonder if something happened to the NFS file
 related to file permissions or SELinux labels?

I think I found something while checking SELinux. ls -Z and getfattr
return nothing, but ps -eZ shows something very different
on the Dell server.

This is on the HP server:
/usr/sbin/libvirtd       1034 ?  11:51:44 libvirtd
libvirt-09540b5d-82       701 ?  05:28:40 qemu-system-x86
unconfined                  1 ?  00:01:00 init

On the Dell server init is confined in lxc and there are also
lxc-start processes:

/usr/sbin/libvirtd       1622 ?  05:07:07 libvirtd
libvirt-8a0f9087-32d... 29926 ?  00:00:01 qemu-system-x86
lxc-container-default    1774 ?  00:00:00 init
/usr/bin/lxc-start       1763 ?  00:00:00 lxc-start

LXC is also installed on that server! Maybe that is interfering
with KVM. The qemu processes look fine to me, but there is a chance
the problem comes from there.

I could move LXC somewhere else, or I can keep it there while we
try to fix this issue. What do you advise I do now?


Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Alexander Graf



On 20.03.15 12:25, Paul Mackerras wrote:
 On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:


 On 20.03.15 10:39, Paul Mackerras wrote:
 This reads the timebase at various points in the real-mode guest
 entry/exit code and uses that to accumulate total, minimum and
 maximum time spent in those parts of the code.  Currently these
 times are accumulated per vcpu in 5 parts of the code:

 * rm_entry - time taken from the start of kvmppc_hv_entry() until
   just before entering the guest.
 * rm_intr - time from when we take a hypervisor interrupt in the
   guest until we either re-enter the guest or decide to exit to the
   host.  This includes time spent handling hcalls in real mode.
 * rm_exit - time from when we decide to exit the guest until the
   return from kvmppc_hv_entry().
 * guest - time spent in the guest
 * cede - time spent napping in real mode due to an H_CEDE hcall
   while other threads in the same vcore are active.

 These times are exposed in debugfs in a directory per vcpu that
 contains a file called timings.  This file contains one line for
 each of the 5 timings above, with the name followed by a colon and
 4 numbers, which are the count (number of times the code has been
 executed), the total time, the minimum time, and the maximum time,
 all in nanoseconds.

 Signed-off-by: Paul Mackerras pau...@samba.org

 Have you measured the additional overhead this brings?
 
 I haven't - in fact I did this patch so I could measure the overhead
 or improvement from other changes I did, but it doesn't measure its
 own overhead, of course.  I guess I need a workload that does a
 defined number of guest entries and exits and measure how fast it runs
 with and without the patch (maybe something like H_SET_MODE in a
 loop).  I'll figure something out and post the results.  

Yeah, just measure the number of exits you can handle for a simple
hcall. If there is measurable overhead, it's probably a good idea to
move the statistics gathering into #ifdef paths for DEBUGFS or maybe
even a separate EXIT_TIMING config option as we have it for booke.
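
One possible shape for that suggestion, shown as a plain C sketch rather than the real-mode assembly the patch actually touches. The macro EXIT_TIMING_SKETCH and the names exit_stats and exit_stats_add are made up for illustration; they stand in for whatever Kconfig symbol and helpers would really be used:

```c
#include <stdint.h>

/* Hypothetical switch; remove this line to compile the accounting out. */
#define EXIT_TIMING_SKETCH 1

#ifdef EXIT_TIMING_SKETCH

struct exit_stats {
	uint64_t count;		/* number of recorded intervals */
	uint64_t total_ticks;	/* sum of all recorded intervals */
};

static inline void exit_stats_add(struct exit_stats *s, uint64_t ticks)
{
	s->count++;
	s->total_ticks += ticks;
}

#else

struct exit_stats {
	char unused;		/* placeholder: ISO C has no empty structs */
};

/* Same signature, empty body, so call sites need no #ifdef of their own. */
static inline void exit_stats_add(struct exit_stats *s, uint64_t ticks)
{
	(void)s;
	(void)ticks;
}

#endif
```

With the switch off, the calls compile to nothing, so the hot path would only pay for the statistics when they are explicitly configured in.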


Alex

Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Alexander Graf



On 20.03.15 10:39, Paul Mackerras wrote:
 This reads the timebase at various points in the real-mode guest
 entry/exit code and uses that to accumulate total, minimum and
 maximum time spent in those parts of the code.  Currently these
 times are accumulated per vcpu in 5 parts of the code:
 
 * rm_entry - time taken from the start of kvmppc_hv_entry() until
   just before entering the guest.
 * rm_intr - time from when we take a hypervisor interrupt in the
   guest until we either re-enter the guest or decide to exit to the
   host.  This includes time spent handling hcalls in real mode.
 * rm_exit - time from when we decide to exit the guest until the
   return from kvmppc_hv_entry().
 * guest - time spent in the guest
 * cede - time spent napping in real mode due to an H_CEDE hcall
   while other threads in the same vcore are active.
 
 These times are exposed in debugfs in a directory per vcpu that
 contains a file called timings.  This file contains one line for
 each of the 5 timings above, with the name followed by a colon and
 4 numbers, which are the count (number of times the code has been
 executed), the total time, the minimum time, and the maximum time,
 all in nanoseconds.
 
 Signed-off-by: Paul Mackerras pau...@samba.org

Have you measured the additional overhead this brings?
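
For readers following along, the accumulation scheme in the quoted description (count, total, min and max, guarded by a sequence count that doubles as twice the sample count) can be sketched in user-space C. This is only an illustration using C11 atomics and assuming a single writer; the patch does this in real-mode assembly with kernel barriers, and the names tb_acc, tb_snapshot, tb_acc_add and tb_acc_try_read are invented here:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct tb_acc {
	_Atomic uint64_t seqcount;	/* odd while an update is in flight, else count * 2 */
	_Atomic uint64_t total;
	_Atomic uint64_t min;
	_Atomic uint64_t max;
};

struct tb_snapshot {
	uint64_t count, total, min, max;
};

/* Writer side (one writer only): fold one interval of `delta` ticks in. */
static void tb_acc_add(struct tb_acc *a, uint64_t delta)
{
	uint64_t seq = atomic_load_explicit(&a->seqcount, memory_order_relaxed);
	uint64_t total = atomic_load_explicit(&a->total, memory_order_relaxed);
	uint64_t min = atomic_load_explicit(&a->min, memory_order_relaxed);
	uint64_t max = atomic_load_explicit(&a->max, memory_order_relaxed);

	/* Make the count odd so readers know the fields are in flux. */
	atomic_store_explicit(&a->seqcount, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&a->total, total + delta, memory_order_relaxed);
	if (seq == 0 || delta < min)
		atomic_store_explicit(&a->min, delta, memory_order_relaxed);
	if (delta > max)
		atomic_store_explicit(&a->max, delta, memory_order_relaxed);

	/* Even again: publishes the update and bumps the count by one. */
	atomic_store_explicit(&a->seqcount, seq + 2, memory_order_release);
}

/* Reader side: returns false if an update raced with us; caller retries. */
static bool tb_acc_try_read(struct tb_acc *a, struct tb_snapshot *out)
{
	uint64_t s0 = atomic_load_explicit(&a->seqcount, memory_order_acquire);

	if (s0 & 1)
		return false;	/* update in progress */

	out->total = atomic_load_explicit(&a->total, memory_order_relaxed);
	out->min = atomic_load_explicit(&a->min, memory_order_relaxed);
	out->max = atomic_load_explicit(&a->max, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);

	if (atomic_load_explicit(&a->seqcount, memory_order_relaxed) != s0)
		return false;	/* the writer got in between */

	out->count = s0 / 2;
	return true;
}
```

The reader never blocks the writer, which matters when the writer is guest entry/exit code; the cost is that a reader may have to loop on tb_acc_try_read() until it gets a consistent snapshot.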

 ---
  arch/powerpc/include/asm/kvm_host.h     |  19 +
  arch/powerpc/include/asm/time.h         |   3 +
  arch/powerpc/kernel/asm-offsets.c       |  11 +++
  arch/powerpc/kernel/time.c              |   6 ++
  arch/powerpc/kvm/book3s_hv.c            | 135
  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 -
  6 files changed, 276 insertions(+), 3 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
 index f1d0bbc..286c0ce 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -369,6 +369,14 @@ struct kvmppc_slb {
  	u8 base_page_size;	/* MMU_PAGE_xxx */
  };
  
 +/* Struct used to accumulate timing information in HV real mode code */
 +struct kvmhv_tb_accumulator {
 +	u64 seqcount;	/* used to synchronize access, also count * 2 */
 +	u64 tb_total;	/* total time in timebase ticks */
 +	u64 tb_min;	/* min time */
 +	u64 tb_max;	/* max time */
 +};
 +
  # ifdef CONFIG_PPC_FSL_BOOK3E
  #define KVMPPC_BOOKE_IAC_NUM 2
  #define KVMPPC_BOOKE_DAC_NUM 2
 @@ -656,6 +664,17 @@ struct kvm_vcpu_arch {
  	u64 busy_preempt;
  
  	u32 emul_inst;
 +
 +	struct kvmhv_tb_accumulator *cur_activity;	/* What we're timing */
 +	u64 cur_tb_start;	/* when it started */
 +	struct kvmhv_tb_accumulator rm_entry;	/* real-mode entry code */
 +	struct kvmhv_tb_accumulator rm_intr;	/* real-mode intr handling */
 +	struct kvmhv_tb_accumulator rm_exit;	/* real-mode exit code */
 +	struct kvmhv_tb_accumulator guest_time;	/* guest execution */
 +	struct kvmhv_tb_accumulator cede_time;	/* time napping inside guest */
 +
 +	struct dentry *debugfs_dir;
 +	struct dentry *debugfs_timings;
  #endif
  };
  
 diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
 index 03cbada..10fc784 100644
 --- a/arch/powerpc/include/asm/time.h
 +++ b/arch/powerpc/include/asm/time.h
 @@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
  
  DECLARE_PER_CPU(u64, decrementers_next_tb);
  
 +/* Convert timebase ticks to nanoseconds */
 +unsigned long long tb_to_ns(unsigned long long tb_ticks);
 +
  #endif /* __KERNEL__ */
  #endif /* __POWERPC_TIME_H */
 diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
 index 4717859..ec9f59c 100644
 --- a/arch/powerpc/kernel/asm-offsets.c
 +++ b/arch/powerpc/kernel/asm-offsets.c
 @@ -458,6 +458,17 @@ int main(void)
  	DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
  	DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
  	DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
 +	DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
 +	DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
 +	DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
 +	DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
 +	DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
 +	DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
 +	DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, arch.cur_tb_start));
 +	DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount));
 +	DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total));
 +	DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min));
 +	DEFINE(TAS_MAX, offsetof(struct kvmhv_tb_accumulator, tb_max));
  #endif
   DEFINE(VCPU_SHARED_SPRG3,

Re: [PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT

2015-03-20 Thread Alexander Graf



On 20.03.15 10:39, Paul Mackerras wrote:
 This creates a debugfs directory for each HV guest (assuming debugfs
 is enabled in the kernel config), and within that directory, a file
 by which the contents of the guest's HPT (hashed page table) can be
 read.  The directory is named vm<pid>, where <pid> is the PID of the
 process that created the guest.  The file is named htab.  This is
 intended to help in debugging problems in the host's management
 of guest memory.
 
 The contents of the file consist of a series of lines like this:
 
   3f48 4000d032bf003505 000bd7ff1196 0003b5c71196
 
 The first field is the index of the entry in the HPT, the second and
 third are the HPT entry, so the third entry contains the real page
 number that is mapped by the entry if the entry's valid bit is set.
 The fourth field is the guest's view of the second doubleword of the
 entry, so it contains the guest physical address.  (The format of the
 second through fourth fields are described in the Power ISA and also
 in arch/powerpc/include/asm/mmu-hash64.h.)
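
As a reading aid, one line of the htab file in the format described above could be split into its four hexadecimal fields like this. The struct and function names are invented for this sketch; the field names v, hr and gr follow the local variables in the quoted debugfs_htab_read():

```c
#include <stdbool.h>
#include <stdio.h>

struct htab_line {
	unsigned long index;	/* index of the entry in the HPT */
	unsigned long long v;	/* first doubleword of the HPT entry */
	unsigned long long hr;	/* second doubleword, host view (real page number) */
	unsigned long long gr;	/* second doubleword, guest view (guest physical address) */
};

/* Parses one line of four whitespace-separated hex fields. */
static bool parse_htab_line(const char *line, struct htab_line *out)
{
	return sscanf(line, "%lx %llx %llx %llx",
		      &out->index, &out->v, &out->hr, &out->gr) == 4;
}
```

Decoding the individual bits of v, hr and gr is deliberately left out; that needs the definitions from the Power ISA or mmu-hash64.h mentioned in the description.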
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 ---
  arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
  arch/powerpc/include/asm/kvm_host.h      |   2 +
  arch/powerpc/kvm/book3s_64_mmu_hv.c      | 136 +++
  arch/powerpc/kvm/book3s_hv.c             |  12 +++
  virt/kvm/kvm_main.c                      |   1 +
  5 files changed, 153 insertions(+)
 
 diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
 index 0789a0f..869c53f 100644
 --- a/arch/powerpc/include/asm/kvm_book3s_64.h
 +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
 @@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
  	return rcu_dereference_raw_notrace(kvm->memslots);
  }
  
 +extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 +
  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
  
  #endif /* __ASM_KVM_BOOK3S_64_H__ */
 diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
 index 015773f..f1d0bbc 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -238,6 +238,8 @@ struct kvm_arch {
  	atomic_t hpte_mod_interest;
  	cpumask_t need_tlb_flush;
  	int hpt_cma_alloc;
 +	struct dentry *debugfs_dir;
 +	struct dentry *htab_dentry;
  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
  	struct mutex hpt_mutex;
 diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 index 6c6825a..d6fe308 100644
 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
 +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 @@ -27,6 +27,7 @@
  #include <linux/srcu.h>
  #include <linux/anon_inodes.h>
  #include <linux/file.h>
 +#include <linux/debugfs.h>
  
  #include <asm/tlbflush.h>
  #include <asm/kvm_ppc.h>
 @@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf)
  	return ret;
  }
  
 +struct debugfs_htab_state {
 +	struct kvm	*kvm;
 +	struct mutex	mutex;
 +	unsigned long	hpt_index;
 +	int		chars_left;
 +	int		buf_index;
 +	char		buf[64];
 +};
 +
 +static int debugfs_htab_open(struct inode *inode, struct file *file)
 +{
 +	struct kvm *kvm = inode->i_private;
 +	struct debugfs_htab_state *p;
 +
 +	p = kzalloc(sizeof(*p), GFP_KERNEL);
 +	if (!p)
 +		return -ENOMEM;
 +
 +	kvm_get_kvm(kvm);
 +	p->kvm = kvm;
 +	mutex_init(&p->mutex);
 +	file->private_data = p;
 +
 +	return nonseekable_open(inode, file);
 +}
 +
 +static int debugfs_htab_release(struct inode *inode, struct file *file)
 +{
 +	struct debugfs_htab_state *p = file->private_data;
 +
 +	kvm_put_kvm(p->kvm);
 +	kfree(p);
 +	return 0;
 +}
 +
 +static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
 +				 size_t len, loff_t *ppos)
 +{
 +	struct debugfs_htab_state *p = file->private_data;
 +	ssize_t ret, r;
 +	unsigned long i, n;
 +	unsigned long v, hr, gr;
 +	struct kvm *kvm;
 +	__be64 *hptp;
 +
 +	ret = mutex_lock_interruptible(&p->mutex);
 +	if (ret)
 +		return ret;
 +
 +	if (p->chars_left) {
 +		n = p->chars_left;
 +		if (n > len)
 +			n = len;
 +		r = copy_to_user(buf, p->buf + p->buf_index, n);
 +		n -= r;
 +		p->chars_left -= n;
 +		p->buf_index += n;
 +		buf += n;
 +		len -= n;
 +		ret = n;
 +		if (r) {
 +			if (!n)
 +				ret = -EFAULT;
 +			goto out;
 +		}
 +	}
 +
 +	kvm = p->kvm;
 +	i = p->hpt_index;
 +	hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
 +	for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
 +		if (!(be64_to_cpu(hptp[0])
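
The quoted patch is cut off here in the archive. The leftover-buffer step at the top of debugfs_htab_read(), which hands out the unread tail of a previously formatted line before any new entries are formatted, can be sketched in user space as follows. This is a simplified illustration: memcpy() stands in for copy_to_user() and cannot fail part-way, and the names line_state and drain_leftover are invented:

```c
#include <stddef.h>
#include <string.h>

struct line_state {
	char buf[64];		/* one formatted line */
	int chars_left;		/* bytes of buf not yet handed to the reader */
	int buf_index;		/* offset of the first unread byte in buf */
};

/* Copies at most len leftover bytes into dst and returns how many were copied. */
static size_t drain_leftover(struct line_state *s, char *dst, size_t len)
{
	size_t n = (size_t)s->chars_left;

	if (n > len)
		n = len;
	memcpy(dst, s->buf + s->buf_index, n);
	s->chars_left -= (int)n;
	s->buf_index += (int)n;
	return n;
}
```

Keeping the partially consumed line in per-open state is what lets a reader use any buffer size without ever seeing a line torn or repeated.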

Re: [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-03-20 Thread Alexander Graf



On 20.03.15 10:39, Paul Mackerras wrote:
 This uses msgsnd where possible for signalling other threads within
 the same core on POWER8 systems, rather than IPIs through the XICS
 interrupt controller.  This includes waking secondary threads to run
 the guest, the interrupts generated by the virtual XICS, and the
 interrupts to bring the other threads out of the guest when exiting.
 
 Signed-off-by: Paul Mackerras pau...@samba.org
 ---
  arch/powerpc/kernel/asm-offsets.c       |  4 +++
  arch/powerpc/kvm/book3s_hv.c            | 48 ++---
  arch/powerpc/kvm/book3s_hv_rm_xics.c    | 11
  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41
  4 files changed, 83 insertions(+), 21 deletions(-)
 
 diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
 index fa7b57d..0ce2aa6 100644
 --- a/arch/powerpc/kernel/asm-offsets.c
 +++ b/arch/powerpc/kernel/asm-offsets.c
 @@ -37,6 +37,7 @@
  #include <asm/thread_info.h>
  #include <asm/rtas.h>
  #include <asm/vdso_datapage.h>
 +#include <asm/dbell.h>
  #ifdef CONFIG_PPC64
  #include <asm/paca.h>
  #include <asm/lppaca.h>
 @@ -568,6 +569,7 @@ int main(void)
  	DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
  	DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
  	DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
 +	DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu));
  	DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
  	DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
  	DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
 @@ -757,5 +759,7 @@ int main(void)
  			offsetof(struct paca_struct, subcore_sibling_mask));
  #endif
  
 +	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
 +
  	return 0;
  }
 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
 index 03a8bb4..2c34bae 100644
 --- a/arch/powerpc/kvm/book3s_hv.c
 +++ b/arch/powerpc/kvm/book3s_hv.c
 @@ -51,6 +51,7 @@
  #include <asm/hvcall.h>
  #include <asm/switch_to.h>
  #include <asm/smp.h>
 +#include <asm/dbell.h>
  #include <linux/gfp.h>
  #include <linux/vmalloc.h>
  #include <linux/highmem.h>
 @@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
  static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
  
 +static bool kvmppc_ipi_thread(int cpu)
 +{
 +	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
 +	if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
 +		preempt_disable();
 +		if ((cpu & ~7) == (smp_processor_id() & ~7)) {
 +			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
 +			msg |= cpu & 7;
 +			smp_mb();
 +			__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
 +			preempt_enable();
 +			return true;
 +		}
 +		preempt_enable();
 +	}
 +
 +#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
 +	if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
 +		xics_wake_cpu(cpu);
 +		return true;
 +	}
 +#endif
 +
 +	return false;
 +}
 +
  static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
  {
 -	int me;
  	int cpu = vcpu->cpu;
  	wait_queue_head_t *wqp;
  
 @@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
  		++vcpu->stat.halt_wakeup;
  	}
  
 -	me = get_cpu();
 +	if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
 +		return;
  
  	/* CPU points to the first thread of the core */
 -	if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
 -#ifdef CONFIG_PPC_ICP_NATIVE
 -		int real_cpu = cpu + vcpu->arch.ptid;
 -		if (paca[real_cpu].kvm_hstate.xics_phys)
 -			xics_wake_cpu(real_cpu);
 -		else
 -#endif
 -		if (cpu_online(cpu))
 -			smp_send_reschedule(cpu);
 -	}
 -	put_cpu();
 +	if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
 +		smp_send_reschedule(cpu);
  }
  
  /*
 @@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
  	/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
  	smp_wmb();
  	tpaca->kvm_hstate.kvm_vcpu = vcpu;
 -#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
  	if (cpu != smp_processor_id())
 -		xics_wake_cpu(cpu);
 -#endif
 +		kvmppc_ipi_thread(cpu);
  }
  
  static void kvmppc_wait_for_nap(void)
 diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
 index 6dded8c..457a8b1 100644
 --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
 +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
 @@ -18,6 +18,7 @@
  #include <asm/debug.h>
  #include <asm/synch.h>
  #include <asm/ppc-opcode.h>
 +#include <asm/dbell.h>
  
  #include "book3s_xics.h"
  
 @@ -83,6
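
The quoted patch is truncated at this point in the archive. The same-core test in the quoted kvmppc_ipi_thread() compares two CPU numbers with their low three bits masked off, and the thread number within the core is those low three bits. A standalone sketch of that arithmetic, assuming 8 hardware threads per core numbered consecutively (the helper names are invented and the doorbell message encoding is not reproduced):

```c
#include <stdbool.h>

#define THREADS_PER_CORE 8	/* assumption matching the mask of 7 in the quoted patch */

/* True if both CPU numbers fall in the same group of THREADS_PER_CORE. */
static bool same_core(int cpu_a, int cpu_b)
{
	return (cpu_a & ~(THREADS_PER_CORE - 1)) ==
	       (cpu_b & ~(THREADS_PER_CORE - 1));
}

/* Index of the hardware thread within its core. */
static int thread_in_core(int cpu)
{
	return cpu & (THREADS_PER_CORE - 1);
}
```

For example, CPUs 8 and 15 share a core under this numbering, while 7 and 8 do not, so only the first pair would be eligible for the msgsnd path.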

Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Paul Mackerras

On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
 
 
 On 20.03.15 10:39, Paul Mackerras wrote:
  This reads the timebase at various points in the real-mode guest
  entry/exit code and uses that to accumulate total, minimum and
  maximum time spent in those parts of the code.  Currently these
  times are accumulated per vcpu in 5 parts of the code:
  
  * rm_entry - time taken from the start of kvmppc_hv_entry() until
    just before entering the guest.
  * rm_intr - time from when we take a hypervisor interrupt in the
    guest until we either re-enter the guest or decide to exit to the
    host.  This includes time spent handling hcalls in real mode.
  * rm_exit - time from when we decide to exit the guest until the
    return from kvmppc_hv_entry().
  * guest - time spent in the guest
  * cede - time spent napping in real mode due to an H_CEDE hcall
    while other threads in the same vcore are active.
  
  These times are exposed in debugfs in a directory per vcpu that
  contains a file called timings.  This file contains one line for
  each of the 5 timings above, with the name followed by a colon and
  4 numbers, which are the count (number of times the code has been
  executed), the total time, the minimum time, and the maximum time,
  all in nanoseconds.
  
  Signed-off-by: Paul Mackerras pau...@samba.org
 
 Have you measured the additional overhead this brings?

I haven't - in fact I did this patch so I could measure the overhead
or improvement from other changes I did, but it doesn't measure its
own overhead, of course.  I guess I need a workload that does a
defined number of guest entries and exits and measure how fast it runs
with and without the patch (maybe something like H_SET_MODE in a
loop).  I'll figure something out and post the results.  
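
For comparing such runs, a line of the per-vcpu timings file in the format described in the quoted commit message (name, colon, then count, total, min and max) could be parsed and reduced to a mean as below. This is a hedged sketch: the struct and function names are invented, and the exact spacing of the real file is assumed rather than taken from the patch:

```c
#include <stdbool.h>
#include <stdio.h>

struct timing_line {
	char name[32];			/* e.g. rm_entry, rm_intr, rm_exit, guest, cede */
	unsigned long long count;	/* number of times the code was executed */
	unsigned long long total_ns;
	unsigned long long min_ns;
	unsigned long long max_ns;
};

/* Parses "name: count total min max"; returns false on any mismatch. */
static bool parse_timing_line(const char *line, struct timing_line *out)
{
	int n = sscanf(line, " %31[^:]: %llu %llu %llu %llu",
		       out->name, &out->count, &out->total_ns,
		       &out->min_ns, &out->max_ns);

	return n == 5;
}

/* Mean time per execution in nanoseconds, or 0 if nothing was recorded. */
static double mean_ns(const struct timing_line *t)
{
	return t->count ? (double)t->total_ns / (double)t->count : 0.0;
}
```

Comparing mean_ns() for the same entry across two kernels only shows the effect of other changes; as noted above, it cannot show the cost of the instrumentation itself.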

Paul.

Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-20 Thread Paul Mackerras

On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
 
 
 On 20.03.15 10:39, Paul Mackerras wrote:
  From: Bharata B Rao bhar...@linux.vnet.ibm.com
  
  Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
  correctly, certain workarounds have to be employed to allow reuse of
  vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
  proposed workaround is to park the vcpu fd in userspace during cpu unplug
  and reuse it later during next hotplug.
  
  More details can be found here:
  KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
  QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
  
  In order to support this workaround with PowerPC KVM, don't create or
  initialize ICP if the vCPU is found to be already associated with an ICP.
  
  Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
  Signed-off-by: Paul Mackerras pau...@samba.org
 
 This probably makes some sense, but please make sure that user space has
 some way to figure out whether hotplug works at all.

Bharata is working on the qemu side of all this, so I assume he has
that covered.

 Also Paul, for patches that you pick up from others, I'd prefer if they
 send the patches to the ML themselves first and you pick them up from
 there then. That way we give everyone the same treatment.

Fair enough.  In fact Bharata did post the patch but he sent it to
linuxppc-...@ozlabs.org not the KVM lists.

Paul.

Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-20 Thread Alexander Graf



On 20.03.15 12:26, Paul Mackerras wrote:
 On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:


 On 20.03.15 10:39, Paul Mackerras wrote:
 From: Bharata B Rao bhar...@linux.vnet.ibm.com

 Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
 correctly, certain workarounds have to be employed to allow reuse of
 vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
 proposed workaround is to park the vcpu fd in userspace during cpu unplug
 and reuse it later during next hotplug.

 More details can be found here:
 KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
 QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html

 In order to support this workaround with PowerPC KVM, don't create or
 initialize ICP if the vCPU is found to be already associated with an ICP.

 Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org

 This probably makes some sense, but please make sure that user space has
 some way to figure out whether hotplug works at all.
 
 Bharata is working on the qemu side of all this, so I assume he has
 that covered.

Well, so far the kernel doesn't expose anything he can query, so I
suppose he just blindly assumes that older host kernels will randomly
break and nobody cares. I'd much rather see a CAP exposed that QEMU
can check for.
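
For context, user space normally probes for such a capability with the KVM_CHECK_EXTENSION ioctl. A minimal sketch of the check follows; the helper name kvm_has_cap is made up, and the capability number is passed in as a parameter because no capability for vCPU reuse is defined anywhere in this thread:

```c
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Returns true if the kernel behind kvm_fd reports the extension `cap`.
 * kvm_fd is expected to be an open file descriptor for /dev/kvm.
 */
static bool kvm_has_cap(int kvm_fd, long cap)
{
	int ret = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);

	/* 0 means "not supported"; a negative value means the ioctl failed. */
	return ret > 0;
}
```

An older host kernel that does not know the capability number reports 0, which is what would let QEMU refuse hotplug cleanly instead of breaking at random.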

 
>> Also Paul, for patches that you pick up from others, I'd prefer if they
>> send the patches to the ML themselves first and you pick them up from
>> there then. That way we give everyone the same treatment.
>
> Fair enough.  In fact Bharata did post the patch but he sent it to
> linuxppc-...@ozlabs.org not the KVM lists.

Please make sure you only take patches into your queue that made it to
at least kvm@vger, preferably kvm-ppc@vger as well. If you see related
patches on other mailing lists, just ask the respective people to resend
with proper ML exposure.


Alex

Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Alexander Graf



On 20.03.15 12:25, Paul Mackerras wrote:
> On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
>>
>>
>> On 20.03.15 10:39, Paul Mackerras wrote:
>>> This reads the timebase at various points in the real-mode guest
>>> entry/exit code and uses that to accumulate total, minimum and
>>> maximum time spent in those parts of the code.  Currently these
>>> times are accumulated per vcpu in 5 parts of the code:
>>>
>>> * rm_entry - time taken from the start of kvmppc_hv_entry() until
>>>   just before entering the guest.
>>> * rm_intr - time from when we take a hypervisor interrupt in the
>>>   guest until we either re-enter the guest or decide to exit to the
>>>   host.  This includes time spent handling hcalls in real mode.
>>> * rm_exit - time from when we decide to exit the guest until the
>>>   return from kvmppc_hv_entry().
>>> * guest - time spent in the guest
>>> * cede - time spent napping in real mode due to an H_CEDE hcall
>>>   while other threads in the same vcore are active.
>>>
>>> These times are exposed in debugfs in a directory per vcpu that
>>> contains a file called "timings".  This file contains one line for
>>> each of the 5 timings above, with the name followed by a colon and
>>> 4 numbers, which are the count (number of times the code has been
>>> executed), the total time, the minimum time, and the maximum time,
>>> all in nanoseconds.
>>>
>>> Signed-off-by: Paul Mackerras pau...@samba.org
>>
>> Have you measured the additional overhead this brings?
>
> I haven't - in fact I did this patch so I could measure the overhead
> or improvement from other changes I did, but it doesn't measure its
> own overhead, of course.  I guess I need a workload that does a
> defined number of guest entries and exits and measure how fast it runs
> with and without the patch (maybe something like H_SET_MODE in a
> loop).  I'll figure something out and post the results.

Yeah, just measure the number of exits you can handle for a simple
hcall. If there is measurable overhead, it's probably a good idea to
move the statistics gathering into #ifdef paths for DEBUGFS or maybe
even a separate EXIT_TIMING config option as we have it for booke.
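
To illustrate the compile-time gating suggested here, the sketch below hides the accounting behind a build switch so that it costs nothing when disabled. `EXAMPLE_EXIT_TIMING`, `struct exit_stats` and `account_exit()` are hypothetical names made up for this example; they are not the kernel's actual config symbol or helpers.

```c
#include <stdint.h>

/*
 * Hypothetical build switch standing in for a Kconfig symbol such as an
 * exit-timing option. Define it to 0 to compile the accounting out.
 */
#ifndef EXAMPLE_EXIT_TIMING
#define EXAMPLE_EXIT_TIMING 1
#endif

struct exit_stats {
    uint64_t exits;  /* number of guest exits seen */
    uint64_t ticks;  /* total ticks spent handling them */
};

#if EXAMPLE_EXIT_TIMING
/* Accounting enabled: record one exit and the time it took. */
static inline void account_exit(struct exit_stats *s, uint64_t ticks)
{
    s->exits++;
    s->ticks += ticks;
}
#else
/* Accounting disabled: an empty stub the compiler can drop entirely. */
static inline void account_exit(struct exit_stats *s, uint64_t ticks)
{
    (void)s;
    (void)ticks;
}
#endif
```

Call sites stay identical in both configurations, so the hot path needs no `#ifdef` clutter of its own.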


Alex

Re: [PATCH 00/23] Bug fixes and improvements for HV KVM

2015-03-20 Thread Alexander Graf



On 20.03.15 10:39, Paul Mackerras wrote:
> This is my current patch queue for HV KVM on PPC.  This series is
> based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3
> plus a set of recent KVM changes which don't intersect with the
> changes in this series.  On top of that, in my testing I have some
> patches which are not KVM-related but are needed to boot and run a
> recent upstream kernel successfully:
>
> tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
> tick/hotplug: Handover time related duties before cpu offline
> powerpc/powernv: Check image loaded or not before calling flash
> powerpc/powernv: Fixes for hypervisor doorbell handling
> powerpc/powernv: Fix return value from power7_nap() et al.
> powerpc: Export __spin_yield
>
> These patches have been posted by their authors and are on their way
> upstream via various trees.  They are not included in this series.
>
> The first three patches are bug fixes that should go into v4.0 if
> possible.  The remainder are intended for the 4.1 merge window.
>
> The patch "powerpc: Export __spin_yield" is a prerequisite for patch
> 9/23 of this series ("KVM: PPC: Book3S HV: Convert ICS mutex lock to
> spin lock").  It is on its way upstream through the linuxppc-dev
> mailing list.
>
> The patch "powerpc/powernv: Fixes for hypervisor doorbell handling" is
> needed for correct operation with patch 20/23, "KVM: PPC: Book3S HV:
> Use msgsnd for signalling threads".  It is also on its way upstream
> through the linuxppc-dev list.  I am expecting both of these
> prerequisite patches to go into 4.0.
>
> Finally, the last patch in this series converts some of the assembly
> code in book3s_hv_rmhandlers.S into C.  I intend to continue this
> trend.

Thanks, applied patches 4-11 to kvm-ppc-queue.


Alex

[Bug 93251] qemu-kvm guests randomly hangs after reboot command in guest

2015-03-20 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=93251

Igor Mammedov imamm...@redhat.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |imamm...@redhat.com

--- Comment #6 from Igor Mammedov imamm...@redhat.com ---
Since commit 1d4e7e3c increased the size of struct kvm_memslots from order 2
to order 4, the likelihood of the above failure grew ~2x (there were 90
order-2 free pages).

Looking at the log from comment 1, the allocation failure shouldn't have
happened, since there are still 40 order-4 free pages:

 úno 13 18:10:27 argus kernel: Normal: 18555*4kB (UEMC) 145*8kB (UMC) 92*16kB 
 (UM) 54*32kB (UM) 40*64kB (UM) 9*128kB (UM) 2*256kB (U) 0*512kB 0*1024kB 
 0*2048kB 0*4096kB = 82804kB

Anyway, I'll post a patch that changes the allocation to vmalloc-ed memory as
a quick fix, and look into the possibility of allocating the actual number of
slots instead of the possible maximum, as is done now.
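
To make the order arithmetic concrete, the following sketch recomputes the numbers from the "Normal" zone line quoted above. It assumes 4kB pages (the order-0 size shown in the log) and deliberately ignores watermarks, migrate types and fragmentation effects that the real page allocator takes into account, so the second function is only an upper bound.

```c
#include <stddef.h>

/* Free block counts per order, copied from the "Normal" zone log line. */
static const unsigned long free_blocks[] = {
    18555, 145, 92, 54, 40, 9, 2, 0, 0, 0, 0
};
#define NR_ORDERS (sizeof(free_blocks) / sizeof(free_blocks[0]))
#define PAGE_KB 4UL /* order-0 block size shown in the log */

/* Total free memory in kB: sum over orders of count * (4kB << order). */
static unsigned long total_free_kb(void)
{
    unsigned long kb = 0;

    for (size_t order = 0; order < NR_ORDERS; order++)
        kb += free_blocks[order] * (PAGE_KB << order);
    return kb;
}

/*
 * Upper bound on how many allocations of the given order could be served:
 * a free block of a higher order o splits into 2^(o - order) such pieces.
 */
static unsigned long max_allocs_of_order(size_t order)
{
    unsigned long n = 0;

    for (size_t o = order; o < NR_ORDERS; o++)
        n += free_blocks[o] << (o - order);
    return n;
}
```

The total reproduces the 82804kB printed in the log, and by this count an order-4 (64kB) request still has free blocks available, which matches the observation that the failure was unexpected.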

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH 18/23] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI

2015-03-20 Thread Paul Mackerras

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.

Signed-off-by: Paul Mackerras pau...@samba.org
---
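
For readers less familiar with `rlwimi`, the LPCR update in this patch can be sketched in C as "set PECE0 and PECE1, then force PECEDP to the caller's choice". The bit values below are placeholders chosen for this illustration (the authoritative definitions are in the kernel's register headers), and `nap_lpcr()` is a hypothetical helper, not a function from the patch. The patch applies the PECEDP step only inside the `CPU_FTR_ARCH_207S` feature section; the sketch models that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; see the kernel headers for the real ones. */
#define EX_LPCR_PECE1  0x2000u
#define EX_LPCR_PECE0  0x4000u
#define EX_LPCR_PECEDP 0x10000u

/*
 * Compute the LPCR value to use while napping.
 * ceding == true  : napping due to H_CEDE, wake on a privileged doorbell.
 * ceding == false : no vcpu on this thread, do not wake on a doorbell.
 */
static uint64_t nap_lpcr(uint64_t lpcr, bool ceding)
{
    uint64_t want = ceding ? EX_LPCR_PECEDP : 0;

    /* Always wake on decrementer and external interrupts. */
    lpcr |= EX_LPCR_PECE0 | EX_LPCR_PECE1;

    /* Like rlwimi under the PECEDP mask: replace just that one bit. */
    lpcr = (lpcr & ~(uint64_t)EX_LPCR_PECEDP) | want;
    return lpcr;
}
```

The point of inserting rather than OR-ing is that a PECEDP bit left set from an earlier nap gets cleared when the thread has no vcpu.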
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3b148d..8afc8a8 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -191,6 +191,7 @@ kvmppc_primary_no_guest:
li  r3, NAPPING_NOVCPU
stb r3, HSTATE_NAPPING(r13)
 
+   li  r3, 0   /* Don't wake on privileged (OS) doorbell */
b   kvm_do_nap
 
 kvm_novcpu_wakeup:
@@ -2107,10 +2108,13 @@ _GLOBAL(kvmppc_h_cede)  /* r3 = vcpu pointer, 
r11 = msr, r13 = paca */
addi    r3, r4, VCPU_TB_CEDE
bl  kvmhv_accumulate_time
 
+   lis r3, LPCR_PECEDP@h   /* Do wake on privileged doorbell */
+
/*
 * Take a nap until a decrementer or external or doobell interrupt
-* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
-* runlatch bit before napping.
+* occurs, with PECE1 and PECE0 set in LPCR.
+* On POWER8, if we are ceding, also set PECEDP.
+* Also clear the runlatch bit before napping.
 */
 kvm_do_nap:
mfspr   r0, SPRN_CTRLF
@@ -2122,7 +2126,7 @@ kvm_do_nap:
mfspr   r5,SPRN_LPCR
ori r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-   oris    r5,r5,LPCR_PECEDP@h
+   rlwimi  r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_LPCR,r5
isync
-- 
2.1.4


[PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-03-20 Thread Paul Mackerras

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.

Signed-off-by: Paul Mackerras pau...@samba.org
---
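
The choice between a doorbell message and an XICS IPI comes down to whether sender and target share a core. The sketch below shows that test in isolation. It assumes 8 hardware threads per core, as implied by the `& ~7` mask in the patch; `pick_ipi_method()` and the enum are hypothetical names for this illustration, and the patch's further fallback to `smp_send_reschedule()` is left out.

```c
#include <stdbool.h>

#define THREADS_PER_CORE 8 /* assumption taken from the "& ~7" mask */

/* Two CPU numbers are in the same core if they agree above the low 3 bits. */
static bool same_core(int cpu_a, int cpu_b)
{
    return (cpu_a & ~(THREADS_PER_CORE - 1)) ==
           (cpu_b & ~(THREADS_PER_CORE - 1));
}

/* Thread number within the core: this is what gets OR-ed into the message. */
static int thread_in_core(int cpu)
{
    return cpu & (THREADS_PER_CORE - 1);
}

enum ipi_method { IPI_DOORBELL, IPI_XICS };

/* Use msgsnd only when the CPU supports it and the target is a sibling. */
static enum ipi_method pick_ipi_method(bool has_msgsnd, int self, int target)
{
    if (has_msgsnd && same_core(self, target))
        return IPI_DOORBELL;
    return IPI_XICS;
}
```

In the patch itself the comparison runs with preemption disabled, so that the sending thread cannot migrate to another core between the check and the `msgsnd`.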
 arch/powerpc/kernel/asm-offsets.c   |  4 +++
 arch/powerpc/kvm/book3s_hv.c| 48 ++---
 arch/powerpc/kvm/book3s_hv_rm_xics.c| 11 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 
 4 files changed, 83 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index fa7b57d..0ce2aa6 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -37,6 +37,7 @@
 #include <asm/thread_info.h>
 #include <asm/rtas.h>
 #include <asm/vdso_datapage.h>
+#include <asm/dbell.h>
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
 #include <asm/lppaca.h>
@@ -568,6 +569,7 @@ int main(void)
DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
+   DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu));
DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
@@ -757,5 +759,7 @@ int main(void)
offsetof(struct paca_struct, subcore_sibling_mask));
 #endif
 
+   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+
return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 03a8bb4..2c34bae 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -51,6 +51,7 @@
 #include <asm/hvcall.h>
 #include <asm/switch_to.h>
 #include <asm/smp.h>
+#include <asm/dbell.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, 
MAX_HCALL_OPCODE/4 + 1);
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+static bool kvmppc_ipi_thread(int cpu)
+{
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+       preempt_disable();
+       if ((cpu & ~7) == (smp_processor_id() & ~7)) {
+           unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+           msg |= cpu & 7;
+           smp_mb();
+           __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+           preempt_enable();
+           return true;
+       }
+       preempt_enable();
+   }
+
+#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
+   if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
+       xics_wake_cpu(cpu);
+       return true;
+   }
+#endif
+
+   return false;
+}
+
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int me;
    int cpu = vcpu->cpu;
    wait_queue_head_t *wqp;
 
@@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
        ++vcpu->stat.halt_wakeup;
    }
 
-   me = get_cpu();
+   if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
+       return;
 
    /* CPU points to the first thread of the core */
-   if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
-#ifdef CONFIG_PPC_ICP_NATIVE
-       int real_cpu = cpu + vcpu->arch.ptid;
-       if (paca[real_cpu].kvm_hstate.xics_phys)
-           xics_wake_cpu(real_cpu);
-       else
-#endif
-       if (cpu_online(cpu))
-           smp_send_reschedule(cpu);
-   }
-   put_cpu();
+   if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
+       smp_send_reschedule(cpu);
 }
 
 /*
@@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
    /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
    smp_wmb();
    tpaca->kvm_hstate.kvm_vcpu = vcpu;
-#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
    if (cpu != smp_processor_id())
-       xics_wake_cpu(cpu);
-#endif
+       kvmppc_ipi_thread(cpu);
 }
 
 static void kvmppc_wait_for_nap(void)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 6dded8c..457a8b1 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -18,6 +18,7 @@
 #include <asm/debug.h>
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
+#include <asm/dbell.h>
 
 #include "book3s_xics.h"
 
@@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
/* In SMT

[PATCH 19/23] KVM: PPC: Book3S HV: Use decrementer to wake napping threads

2015-03-20 Thread Paul Mackerras

This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.

Signed-off-by: Paul Mackerras pau...@samba.org
---
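
The decrementer handling described in this patch can be modelled in a few lines of C. This is a sketch under assumptions: the helper names and `struct nap_dec` are invented for illustration, the timebase read before the nap is taken to be the guest timebase (which is why the offset is subtracted to get a host value, following the patch's own comment), and DEC/HDEC are treated as signed 32-bit quantities as in the `cmpw`/`extsw` sequence.

```c
#include <stdint.h>

struct nap_dec {
    int32_t  programmed_dec; /* value left in DEC while napping */
    uint64_t dec_expires;    /* guest DEC expiry as a host timebase value */
};

/* Before napping: pick min(DEC, HDEC) and remember the guest's expiry. */
static struct nap_dec dec_before_nap(int32_t guest_dec, int32_t hdec,
                                     uint64_t guest_tb, uint64_t tb_offset)
{
    struct nap_dec r;

    /* Wake no later than the end of the timeslice. */
    r.programmed_dec = guest_dec <= hdec ? guest_dec : hdec;

    /* Sign-extend DEC, add the current timebase, convert to host TB. */
    r.dec_expires = guest_tb + (uint64_t)(int64_t)guest_dec - tb_offset;
    return r;
}

/* After the nap: convert back to guest TB and compute the remaining DEC. */
static int32_t dec_after_nap(uint64_t dec_expires, uint64_t guest_tb_now,
                             uint64_t tb_offset)
{
    return (int32_t)(dec_expires + tb_offset - guest_tb_now);
}
```

For example, a guest DEC of 1000 with an HDEC of 500 naps with DEC set to 500; if the thread wakes 300 ticks later, the restored guest DEC is 700, as if the nap had never touched it.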
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 8afc8a8..03a37a0 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */
+   /* put the HDEC into the DEC, since HDEC interrupts don't wake us */
+   mfspr   r3, SPRN_HDEC
+   mtspr   SPRN_DEC, r3
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -223,6 +226,12 @@ kvm_novcpu_wakeup:
cmpdi   r3, 0
bge kvm_novcpu_exit
 
+   /* See if our timeslice has expired (HDEC is negative) */
+   mfspr   r0, SPRN_HDEC
+   li  r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+   cmpwi   r0, 0
+   blt kvm_novcpu_exit
+
/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
ld  r4, HSTATE_KVM_VCPU(r13)
cmpdi   r4, 0
@@ -1478,10 +1487,10 @@ kvmhv_do_exit:  /* r12 = trap, r13 = 
paca */
cmpwi   r3,0x100    /* Are we the first here? */
bge 43f
cmpwi   r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   beq 40f
+   beq 43f
li  r0,0
mtspr   SPRN_HDEC,r0
-40:
+
/*
 * Send an IPI to any napping threads, since an HDEC interrupt
 * doesn't wake CPUs up from nap.
@@ -2104,6 +2113,27 @@ _GLOBAL(kvmppc_h_cede)   /* r3 = vcpu pointer, 
r11 = msr, r13 = paca */
/* save FP state */
bl  kvmppc_save_fp
 
+   /*
+* Set DEC to the smaller of DEC and HDEC, so that we wake
+* no later than the end of our timeslice (HDEC interrupts
+* don't wake us from nap).
+*/
+   mfspr   r3, SPRN_DEC
+   mfspr   r4, SPRN_HDEC
+   mftb    r5
+   cmpw    r3, r4
+   ble 67f
+   mtspr   SPRN_DEC, r4
+67:
+   /* save expiry time of guest decrementer */
+   extsw   r3, r3
+   add r3, r3, r5
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   subf    r3, r6, r3  /* convert to host TB value */
+   std r3, VCPU_DEC_EXPIRES(r4)
+
ld  r4, HSTATE_KVM_VCPU(r13)
addi    r3, r4, VCPU_TB_CEDE
bl  kvmhv_accumulate_time
@@ -2157,6 +2187,15 @@ kvm_end_cede:
/* load up FP state */
bl  kvmppc_load_fp
 
+   /* Restore guest decrementer */
+   ld  r3, VCPU_DEC_EXPIRES(r4)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   add r3, r3, r6  /* convert host TB to guest TB value */
+   mftb    r7
+   subf    r3, r7, r3
+   mtspr   SPRN_DEC, r3
+
/* Load NV GPRS */
ld  r14, VCPU_GPR(R14)(r4)
ld  r15, VCPU_GPR(R15)(r4)
-- 
2.1.4


[PATCH 23/23] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C

2015-03-20 Thread Paul Mackerras

This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c.  It also moves the IPI/message sending code
that was in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function
so it can be used by kvmhv_commence_exit() as well as
icp_rm_set_vcpu_irq().

Signed-off-by: Paul Mackerras pau...@samba.org
---
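
The core of the C version of kvmhv_commence_exit() is a compare-and-swap loop on the entry/exit map. The userspace sketch below reproduces that pattern with C11 atomics instead of the kernel's `cmpxchg()`. `commence_exit()` and its out-parameter are hypothetical; the map layout (entered threads in the low byte, exiting threads in the 0xff00 bits) follows the comment in the patch, and the hypervisor-decrementer special case is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Mark thread `ptid` as exiting. Returns true if this thread was the first
 * to start exiting; in that case *to_interrupt holds the bitmask of the
 * other threads that entered the guest and need to be kicked out.
 */
static bool commence_exit(atomic_int *map, int ptid, int *to_interrupt)
{
    int me = 0x100 << ptid;
    int ee = atomic_load(map);

    /* On failure, `ee` is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(map, &ee, ee | me))
        ;

    /* Some other thread already set an exit bit: nothing more to do. */
    if ((ee >> 8) != 0)
        return false;

    *to_interrupt = ee & ~(1 << ptid);
    return true;
}
```

Only the first exiting thread sends interrupts, so the siblings are signalled once per guest exit no matter how many threads exit concurrently.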
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +
 arch/powerpc/kvm/book3s_hv_builtin.c | 73 
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 22 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 81 
 4 files changed, 85 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 869c53f..2b84e48 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
+extern void kvmhv_rm_send_ipi(int cpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 2754251..dbfc525 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -22,6 +22,8 @@
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
 #include <asm/archrandom.h>
+#include <asm/dbell.h>
+#include <asm/xics.h>
 
 #define KVM_CMA_CHUNK_ORDER    18
 
@@ -184,3 +186,74 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu)
 
return H_HARDWARE;
 }
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+   __asm__ __volatile__("stbcix %0,0,%1"
+       : : "r" (val), "r" (paddr) : "memory");
+}
+
+/*
+ * Send an interrupt or message to another CPU.
+ * This can only be called in real mode.
+ * The caller needs to include any barrier needed to order writes
+ * to memory vs. the IPI/message.
+ */
+void kvmhv_rm_send_ipi(int cpu)
+{
+   unsigned long xics_phys;
+
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
+       (cpu & ~7) == (raw_smp_processor_id() & ~7)) {
+       unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+       msg |= cpu & 7;
+       __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+       return;
+   }
+
+   /* Not too hard, then poke the target */
+   xics_phys = paca[cpu].kvm_hstate.xics_phys;
+   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+/*
+ * The following functions are called from the assembly code
+ * in book3s_hv_rmhandlers.S.
+ */
+static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active)
+{
+   int cpu = vc->pcpu;
+
+   /* Order setting of exit map vs. msgsnd/IPI */
+   smp_mb();
+   for (; active; active >>= 1, ++cpu)
+       if (active & 1)
+           kvmhv_rm_send_ipi(cpu);
+}
+
+void kvmhv_commence_exit(int trap)
+{
+   struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore;
+   int ptid = local_paca->kvm_hstate.ptid;
+   int me, ee;
+
+   /* Set our bit in the threads-exiting-guest map in the 0xff00
+      bits of vcore->entry_exit_map */
+   me = 0x100 << ptid;
+   do {
+       ee = vc->entry_exit_map;
+   } while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee);
+
+   /* Are we the first here? */
+   if ((ee >> 8) != 0)
+       return;
+
+   /*
+    * Trigger the other threads in this vcore to exit the guest.
+    * If this is a hypervisor decrementer interrupt then they
+    * will be already on their way out of the guest.
+    */
+   if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER)
+       kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid));
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 457a8b1..046ab44 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -27,12 +27,6 @@
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
u32 new_irq);
 
-static inline void rm_writeb(unsigned long paddr, u8 val)
-{
-   __asm__ __volatile__("sync; stbcix %0,0,%1"
-       : : "r" (val), "r" (paddr) : "memory");
-}
-
 /* -- ICS routines -- */
 static void ics_rm_check_resend(struct kvmppc_xics *xics,
struct kvmppc_ics *ics, struct kvmppc_icp *icp)
@@ -61,7 +55,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
    struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
-   unsigned long xics_phys;
int cpu;
 
/* Mark the target VCPU as having an interrupt pending */
@@ -84,19 +77,8 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
/* In SMT cpu will always point to thread 0,

[PATCH 17/23] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken

2015-03-20 Thread Paul Mackerras

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.

Signed-off-by: Paul Mackerras pau...@samba.org
---
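
The polling scheme described above (watch each secondary thread's vcpu pointer until it is cleared, then report exactly which threads are still busy) can be sketched as a small standalone function. `wait_for_secondaries()`, `MAX_SPINS` and the slot array are stand-ins invented for this example; the patch itself walks `paca[cpu + i].kvm_hstate.kvm_vcpu` and prints an error per stuck CPU.

```c
#include <stddef.h>

#define MAX_SPINS 100 /* arbitrary bound chosen for this sketch */

/*
 * Poll the per-thread slots of the secondary threads (index 1 and up).
 * Returns 0 once all of them are NULL, or, after giving up, a bitmask with
 * bit i set for every thread i whose slot is still non-NULL.
 */
static unsigned int wait_for_secondaries(void *volatile *vcpu_slot,
                                         int nthreads)
{
    unsigned int stuck = 0;

    for (int loops = 0; loops < MAX_SPINS; ++loops) {
        int i;

        /* Look for any secondary that has not cleared its slot yet. */
        for (i = 1; i < nthreads; ++i)
            if (vcpu_slot[i] != NULL)
                break;
        if (i == nthreads)
            return 0; /* everyone has finished */
    }

    /* Timed out: record which threads appear to be stuck. */
    for (int i = 1; i < nthreads; ++i)
        if (vcpu_slot[i] != NULL)
            stuck |= 1u << i;
    return stuck;
}
```

Compared with comparing two counters, the per-thread view is what allows the error message to name the specific CPU that failed to finish.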
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv.c| 47 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 +
 4 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 7b327e5..f6d4232 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,8 +272,6 @@ struct kvmppc_vcore {
int n_runnable;
int num_threads;
int entry_exit_count;
-   int n_woken;
-   int nap_count;
int napping_threads;
int first_vcpuid;
u16 pcpu;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 5eda551..fa7b57d 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,6 @@ int main(void)
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
-   DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bd16b03..03a8bb4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1702,8 +1702,10 @@ static int kvmppc_grab_hwthread(int cpu)
    tpaca = &paca[cpu];
 
    /* Ensure the thread won't go into the kernel if it wakes */
-   tpaca->kvm_hstate.hwthread_req = 1;
    tpaca->kvm_hstate.kvm_vcpu = NULL;
+   tpaca->kvm_hstate.napping = 0;
+   smp_wmb();
+   tpaca->kvm_hstate.hwthread_req = 1;
 
/*
 * If the thread is already executing in the kernel (e.g. handling
@@ -1746,35 +1748,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
}
    cpu = vc->pcpu + vcpu->arch.ptid;
    tpaca = &paca[cpu];
-   tpaca->kvm_hstate.kvm_vcpu = vcpu;
    tpaca->kvm_hstate.kvm_vcore = vc;
    tpaca->kvm_hstate.ptid = vcpu->arch.ptid;
    vcpu->cpu = vc->pcpu;
+   /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
    smp_wmb();
+   tpaca->kvm_hstate.kvm_vcpu = vcpu;
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-   if (cpu != smp_processor_id()) {
+   if (cpu != smp_processor_id())
        xics_wake_cpu(cpu);
-       if (vcpu->arch.ptid)
-           ++vc->n_woken;
-   }
 #endif
 }
 
-static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
+static void kvmppc_wait_for_nap(void)
 {
-   int i;
+   int cpu = smp_processor_id();
+   int i, loops;
 
-   HMT_low();
-   i = 0;
-   while (vc->nap_count < vc->n_woken) {
-       if (++i >= 100) {
-           pr_err("kvmppc_wait_for_nap timeout %d %d\n",
-                  vc->nap_count, vc->n_woken);
-           break;
+   for (loops = 0; loops < 100; ++loops) {
+       /*
+        * Check if all threads are finished.
+        * We set the vcpu pointer when starting a thread
+        * and the thread clears it when finished, so we look
+        * for any threads that still have a non-NULL vcpu ptr.
+        */
+       for (i = 1; i < threads_per_subcore; ++i)
+           if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+               break;
+       if (i == threads_per_subcore) {
+           HMT_medium();
+           return;
        }
-       cpu_relax();
+       HMT_low();
    }
    HMT_medium();
+   for (i = 1; i < threads_per_subcore; ++i)
+       if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+           pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
 /*
@@ -1915,8 +1925,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->n_woken = 0;
-   vc->nap_count = 0;
    vc->entry_exit_count = 0;
    vc->preempt_tb =

[PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Paul Mackerras

This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spent in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called timings.  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.

Signed-off-by: Paul Mackerras pau...@samba.org
---
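
As a companion to the description above, here is a simplified, single-threaded model of one accumulator and of the "name: count total min max" line format. It is a sketch only: the function names are invented, the exact spacing of the debugfs output is an assumption, the values stay in raw ticks (the patch converts them to nanoseconds), and the memory barriers a real seqcount needs against concurrent readers are left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Mirrors the fields of struct kvmhv_tb_accumulator from the patch. */
struct tb_accumulator {
    uint64_t seqcount; /* odd while updating; otherwise equals count * 2 */
    uint64_t tb_total; /* total time in timebase ticks */
    uint64_t tb_min;   /* min time */
    uint64_t tb_max;   /* max time */
};

/* Fold one measured interval into the accumulator. */
static void tb_accumulate(struct tb_accumulator *acc, uint64_t delta)
{
    acc->seqcount++; /* now odd: a reader would have to retry */
    acc->tb_total += delta;
    if (acc->seqcount == 1 || delta < acc->tb_min)
        acc->tb_min = delta; /* first sample, or a new minimum */
    if (delta > acc->tb_max)
        acc->tb_max = delta;
    acc->seqcount++; /* even again: update complete */
}

/* Format one line: name, colon, then count, total, min and max. */
static int tb_format(char *buf, size_t len, const char *name,
                     const struct tb_accumulator *acc)
{
    return snprintf(buf, len,
                    "%s: %" PRIu64 " %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
                    name, acc->seqcount / 2, acc->tb_total,
                    acc->tb_min, acc->tb_max);
}
```

Because the sequence counter is bumped twice per update, it doubles as the event count, which is why the struct needs no separate count field.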
 arch/powerpc/include/asm/kvm_host.h |  19 +
 arch/powerpc/include/asm/time.h |   3 +
 arch/powerpc/kernel/asm-offsets.c   |  11 +++
 arch/powerpc/kernel/time.c  |   6 ++
 arch/powerpc/kvm/book3s_hv.c| 135 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 -
 6 files changed, 276 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index f1d0bbc..286c0ce 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -369,6 +369,14 @@ struct kvmppc_slb {
u8 base_page_size;  /* MMU_PAGE_xxx */
 };
 
+/* Struct used to accumulate timing information in HV real mode code */
+struct kvmhv_tb_accumulator {
+   u64 seqcount;   /* used to synchronize access, also count * 2 */
+   u64 tb_total;   /* total time in timebase ticks */
+   u64 tb_min; /* min time */
+   u64 tb_max; /* max time */
+};
+
 # ifdef CONFIG_PPC_FSL_BOOK3E
 #define KVMPPC_BOOKE_IAC_NUM   2
 #define KVMPPC_BOOKE_DAC_NUM   2
@@ -656,6 +664,17 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
 
u32 emul_inst;
+
+   struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
+   u64 cur_tb_start;   /* when it started */
+   struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
+   struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
+   struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
+   struct kvmhv_tb_accumulator guest_time; /* guest execution */
+   struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+
+   struct dentry *debugfs_dir;
+   struct dentry *debugfs_timings;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 03cbada..10fc784 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+/* Convert timebase ticks to nanoseconds */
+unsigned long long tb_to_ns(unsigned long long tb_ticks);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 4717859..ec9f59c 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -458,6 +458,17 @@ int main(void)
DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
+   DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
+   DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
+   DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
+   DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
+   DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
+   DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
+   DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, 
arch.cur_tb_start));
+   DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount));
+   DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total));
+   DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min));
+   DEFINE(TAS_MAX, offsetof(struct kvmhv_tb_accumulator, tb_max));
 #endif
DEFINE(VCPU_SHARED_SPRG3, offsetof(struct kvm_vcpu_arch_shared, sprg3));
DEFINE(VCPU_SHARED_SPRG4, offsetof(struct kvm_vcpu_arch_shared, sprg4));
diff --git

[PATCH 15/23] KVM: PPC: Book3S HV: Minor cleanups

2015-03-20 Thread Paul Mackerras

* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++---
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index cee6e55..ec4cf37 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,7 +227,6 @@ struct kvm_arch {
unsigned long host_sdr1;
int tlbie_lock;
unsigned long lpcr;
-   unsigned long rmor;
unsigned long vrma_slb_v;
int hpte_setup_done;
u32 hpt_order;
@@ -271,7 +270,6 @@ struct kvm_arch {
  */
 struct kvmppc_vcore {
int n_runnable;
-   int n_busy;
int num_threads;
int entry_exit_count;
int n_woken;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index ec9f59c..5eda551 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -503,7 +503,6 @@ int main(void)
DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
-   DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d71ae2f..b2e6718 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -244,9 +244,9 @@ kvm_novcpu_exit:
 kvm_start_guest:
 
/* Set runlatch bit the minute you wake up from nap */
-   mfspr   r1, SPRN_CTRLF
-   ori r1, r1, 1
-   mtspr   SPRN_CTRLT, r1
+   mfspr   r0, SPRN_CTRLF
+   ori r0, r0, 1
+   mtspr   SPRN_CTRLT, r0
 
ld  r2,PACATOC(r13)
 
@@ -490,11 +490,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
cmpwi   r0,0
beq 20b
 
-   /* Set LPCR and RMOR. */
+   /* Set LPCR. */
 10: ld  r8,VCORE_LPCR(r5)
mtspr   SPRN_LPCR,r8
-   ld  r8,KVM_RMOR(r9)
-   mtspr   SPRN_RMOR,r8
isync
 
/* Check if HDEC expires soon */
@@ -1065,7 +1063,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
bne 2f
mfspr   r3,SPRN_HDEC
cmpwi   r3,0
-   bge ignore_hdec
+   mr  r4,r9
+   bge fast_guest_return
 2:
/* See if this is an hcall we can handle in real mode */
cmpwi   r12,BOOK3S_INTERRUPT_SYSCALL
@@ -1073,26 +1072,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* External interrupt ? */
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
-   bne+    ext_interrupt_to_host
+   bne+    guest_exit_cont
 
/* External interrupt, first check for host_ipi. If this is
 * set, we know the host wants us out so let's do it now
 */
bl  kvmppc_read_intr
cmpdi   r3, 0
-   bgt ext_interrupt_to_host
+   bgt guest_exit_cont
 
/* Check if any CPU is heading out to the host, if so head out too */
ld  r5, HSTATE_KVM_VCORE(r13)
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
-   bge ext_interrupt_to_host
-
-   /* Return to guest after delivering any pending interrupt */
mr  r4, r9
-   b   deliver_guest_interrupt
-
-ext_interrupt_to_host:
+   blt deliver_guest_interrupt
 
 guest_exit_cont:   /* r9 = vcpu, r12 = trap, r13 = paca */
/* Save more register state  */
@@ -1743,8 +1737,10 @@ kvmppc_hisi:
  * Returns to the guest if we handle it, or continues on up to
  * the kernel if we can't (i.e. if we don't have a handler for
  * it, or if the handler returns H_TOO_HARD).
+ *
+ * r5 - r8 contain hcall args,
+ * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca
  */
-   .globl  hcall_try_real_mode
 hcall_try_real_mode:
ld  r3,VCPU_GPR(R3)(r9)
andi.   r0,r11,MSR_PR
@@ -2004,10 +2000,6 @@ hcall_real_table:
.globl  hcall_real_table_end
 hcall_real_table_end:
 
-ignore_hdec:
-   mr  r4,r9
-   b   fast_guest_return
-
 _GLOBAL(kvmppc_h_set_xdabr)

[PATCH 22/23] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count

2015-03-20 Thread Paul Mackerras

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.
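
The two-bitmap scheme described above can be sketched in user space as
follows.  This is only an illustrative model using C11 atomics, not the
kernel code: the helper names vcore_try_enter() and vcore_begin_exit()
and the ENTRY_MAP()/EXIT_MAP() macros are hypothetical and chosen for
this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the entry_exit_map word:
 * bits 0-7 = threads that have entered the guest,
 * bits 8-15 = threads that have started exiting.
 */
#define ENTRY_MAP(m)	((m) & 0xff)
#define EXIT_MAP(m)	(((m) >> 8) & 0xff)

/* Set our entry bit only if no thread has started exiting. */
static bool vcore_try_enter(atomic_int *map, int ptid)
{
	int old = atomic_load(map);

	do {
		if (EXIT_MAP(old) != 0)
			return false;	/* someone is already on the way out */
	} while (!atomic_compare_exchange_weak(map, &old, old | (1 << ptid)));
	return true;
}

/*
 * Set our exit bit.  Returns the bitmap of other entered threads that
 * need to be signalled, or 0 if another thread began exiting first.
 */
static int vcore_begin_exit(atomic_int *map, int ptid)
{
	int old = atomic_fetch_or(map, 0x100 << ptid);

	if (EXIT_MAP(old) != 0)
		return 0;
	return ENTRY_MAP(old) & ~(1 << ptid);
}
```

With a count, the first exiting thread only knows how many threads are
in the guest; with a bitmap, the value returned by vcore_begin_exit()
names exactly which threads to signal.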

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h | 15 
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kvm/book3s_hv.c|  5 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c| 10 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 63 +++--
 5 files changed, 45 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f6d4232..c2b9551 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -263,15 +263,15 @@ struct kvm_arch {
 
 /*
  * Struct for a virtual core.
- * Note: entry_exit_count combines an entry count in the bottom 8 bits
- * and an exit count in the next 8 bits.  This is so that we can
- * atomically increment the entry count iff the exit count is 0
- * without taking the lock.
+ * Note: entry_exit_map combines a bitmap of threads that have entered
+ * in the bottom 8 bits and a bitmap of threads that have exited in the
+ * next 8 bits.  This is so that we can atomically set the entry bit
+ * iff the exit map is 0 without taking a lock.
  */
 struct kvmppc_vcore {
int n_runnable;
int num_threads;
-   int entry_exit_count;
+   int entry_exit_map;
int napping_threads;
int first_vcpuid;
u16 pcpu;
@@ -296,8 +296,9 @@ struct kvmppc_vcore {
ulong conferring_threads;
 };
 
-#define VCORE_ENTRY_COUNT(vc)  ((vc)->entry_exit_count & 0xff)
-#define VCORE_EXIT_COUNT(vc)   ((vc)->entry_exit_count >> 8)
+#define VCORE_ENTRY_MAP(vc)    ((vc)->entry_exit_map & 0xff)
+#define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
+#define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0ce2aa6..ed348e5 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,7 @@ int main(void)
DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
-   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
+   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2c34bae..9ea0eb5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1941,7 +1941,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->entry_exit_count = 0;
+   vc->entry_exit_map = 0;
vc->preempt_tb = TB_NIL;
vc->in_guest = 0;
vc->napping_threads = 0;
@@ -2108,8 +2108,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 * this thread straight away and have it join in.
 */
if (!signal_pending(current)) {
-   if (vc->vcore_state == VCORE_RUNNING &&
-   VCORE_EXIT_COUNT(vc) == 0) {
+   if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) {
kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
trace_kvm_guest_enter(vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1954a1c..2754251 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int target,
int rv = H_SUCCESS; /* => don't yield */
 
set_bit(vcpu->arch.ptid, &vc->conferring_threads);
-   while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) {
-   threads_running = VCORE_ENTRY_COUNT(vc);
-   threads_ceded = hweight32(vc->napping_threads);
-   threads_conferring = hweight32(vc->conferring_threads);
-   if (threads_ceded + threads_conferring >= threads_running) {
+   while

[PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT

2015-03-20 Thread Paul Mackerras

This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vm<pid>, where <pid> is the PID of the
process that created the guest.  The file is named htab.  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 000bd7ff1196 0003b5c71196

The first field is the index of the entry in the HPT.  The second and
third fields are the HPT entry itself, so the third field contains the
real page number that is mapped by the entry if the entry's valid bit
is set.  The fourth field is the guest's view of the second doubleword
of the entry, so it contains the guest physical address.  (The format
of the second through fourth fields is described in the Power ISA and
also in arch/powerpc/include/asm/mmu-hash64.h.)
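
As a reading aid, a line in the format above could be split into its
four fields with something like the following user-space sketch.  The
struct htab_line type and parse_htab_line() are hypothetical names for
this example; the field names v, hr and gr merely mirror the local
variables used by the read routine in the patch.

```c
#include <stdio.h>

/* Hypothetical record for one line of the htab file. */
struct htab_line {
	unsigned long long index;	/* index of the entry in the HPT */
	unsigned long long v;		/* first doubleword of the HPTE */
	unsigned long long hr;		/* second doubleword, host view */
	unsigned long long gr;		/* second doubleword, guest view */
};

/* Parse one line of four hexadecimal fields; returns 0 on success. */
static int parse_htab_line(const char *line, struct htab_line *out)
{
	int n = sscanf(line, "%llx %llx %llx %llx",
		       &out->index, &out->v, &out->hr, &out->gr);

	return n == 4 ? 0 : -1;
}
```

Decoding the individual bits of v, hr and gr (valid bit, page size, real
page number) is left out here, since that depends on the HPTE layout
given in the Power ISA.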

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
 arch/powerpc/include/asm/kvm_host.h  |   2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 +++
 arch/powerpc/kvm/book3s_hv.c |  12 +++
 virt/kvm/kvm_main.c  |   1 +
 5 files changed, 153 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0789a0f..869c53f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
return rcu_dereference_raw_notrace(kvm->memslots);
 }
 
+extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 015773f..f1d0bbc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
int hpt_cma_alloc;
+   struct dentry *debugfs_dir;
+   struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6c6825a..d6fe308 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -27,6 +27,7 @@
 #include <linux/srcu.h>
 #include <linux/anon_inodes.h>
 #include <linux/file.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/kvm_ppc.h>
@@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf)
return ret;
 }
 
+struct debugfs_htab_state {
+   struct kvm  *kvm;
+   struct mutex    mutex;
+   unsigned long   hpt_index;
+   int chars_left;
+   int buf_index;
+   char    buf[64];
+};
+
+static int debugfs_htab_open(struct inode *inode, struct file *file)
+{
+   struct kvm *kvm = inode->i_private;
+   struct debugfs_htab_state *p;
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p)
+   return -ENOMEM;
+
+   kvm_get_kvm(kvm);
+   p->kvm = kvm;
+   mutex_init(&p->mutex);
+   file->private_data = p;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_htab_release(struct inode *inode, struct file *file)
+{
+   struct debugfs_htab_state *p = file->private_data;
+
+   kvm_put_kvm(p->kvm);
+   kfree(p);
+   return 0;
+}
+
+static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
+size_t len, loff_t *ppos)
+{
+   struct debugfs_htab_state *p = file->private_data;
+   ssize_t ret, r;
+   unsigned long i, n;
+   unsigned long v, hr, gr;
+   struct kvm *kvm;
+   __be64 *hptp;
+
+   ret = mutex_lock_interruptible(&p->mutex);
+   if (ret)
+   return ret;
+
+   if (p->chars_left) {
+   n = p->chars_left;
+   if (n > len)
+   n = len;
+   r = copy_to_user(buf, p->buf + p->buf_index, n);
+   n -= r;
+   p->chars_left -= n;
+   p->buf_index += n;
+   buf += n;
+   len -= n;
+   ret = n;
+   if (r) {
+   if (!n)
+   ret = -EFAULT;
+   goto out;
+   }
+   }
+
+   kvm = p->kvm;
+   i = p->hpt_index;
+   hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+   for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+   if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
+   continue;
+
+

[PATCH 23/23] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C

2015-03-20 Thread Paul Mackerras

This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c.  It also moves the IPI/message sending code
that was in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function
so it can be used by kvmhv_commence_exit() as well as
icp_rm_set_vcpu_irq().
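
The two decisions the C version has to make (which threads to poke, and
whether a target shares a core with the sender) can be modelled in user
space roughly as below.  This is a sketch under the assumption of 8
hardware threads per core; same_core() and collect_targets() are
hypothetical helpers for illustration and do not send anything.

```c
#include <stdbool.h>

/*
 * Assumption: 8 hardware threads per core, so CPUs that share a core
 * differ only in their low three bits.
 */
static bool same_core(int cpu_a, int cpu_b)
{
	return (cpu_a & ~7) == (cpu_b & ~7);
}

/*
 * Walk a bitmap of active threads starting at first_cpu and record the
 * CPU number of each set bit.  Returns how many targets were found.
 */
static int collect_targets(int first_cpu, int active, int *cpus, int max)
{
	int n = 0;
	int cpu = first_cpu;

	for (; active && n < max; active >>= 1, ++cpu)
		if (active & 1)
			cpus[n++] = cpu;
	return n;
}
```

In the patch itself the same-core case uses msgsnd on POWER8 and the
other case writes the target's XICS MFRR; the sketch only shows how the
targets are selected.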

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +
 arch/powerpc/kvm/book3s_hv_builtin.c | 73 
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 22 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 81 
 4 files changed, 85 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 869c53f..2b84e48 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
+extern void kvmhv_rm_send_ipi(int cpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 2754251..dbfc525 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -22,6 +22,8 @@
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
 #include <asm/archrandom.h>
+#include <asm/dbell.h>
+#include <asm/xics.h>
 
 #define KVM_CMA_CHUNK_ORDER 18
 
@@ -184,3 +186,74 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu)
 
return H_HARDWARE;
 }
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+   __asm__ __volatile__("stbcix %0,0,%1"
+   : : "r" (val), "r" (paddr) : "memory");
+}
+
+/*
+ * Send an interrupt or message to another CPU.
+ * This can only be called in real mode.
+ * The caller needs to include any barrier needed to order writes
+ * to memory vs. the IPI/message.
+ */
+void kvmhv_rm_send_ipi(int cpu)
+{
+   unsigned long xics_phys;
+
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
+   (cpu & ~7) == (raw_smp_processor_id() & ~7)) {
+   unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+   msg |= cpu & 7;
+   __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+   return;
+   }
+
+   /* Not too hard, then poke the target */
+   xics_phys = paca[cpu].kvm_hstate.xics_phys;
+   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+/*
+ * The following functions are called from the assembly code
+ * in book3s_hv_rmhandlers.S.
+ */
+static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active)
+{
+   int cpu = vc->pcpu;
+
+   /* Order setting of exit map vs. msgsnd/IPI */
+   smp_mb();
+   for (; active; active >>= 1, ++cpu)
+   if (active & 1)
+   kvmhv_rm_send_ipi(cpu);
+}
+
+void kvmhv_commence_exit(int trap)
+{
+   struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore;
+   int ptid = local_paca->kvm_hstate.ptid;
+   int me, ee;
+
+   /* Set our bit in the threads-exiting-guest map in the 0xff00
+  bits of vcore->entry_exit_map */
+   me = 0x100 << ptid;
+   do {
+   ee = vc->entry_exit_map;
+   } while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee);
+
+   /* Are we the first here? */
+   if ((ee >> 8) != 0)
+   return;
+
+   /*
+* Trigger the other threads in this vcore to exit the guest.
+* If this is a hypervisor decrementer interrupt then they
+* will be already on their way out of the guest.
+*/
+   if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER)
+   kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid));
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 457a8b1..046ab44 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -27,12 +27,6 @@
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
u32 new_irq);
 
-static inline void rm_writeb(unsigned long paddr, u8 val)
-{
-   __asm__ __volatile__("sync; stbcix %0,0,%1"
-   : : "r" (val), "r" (paddr) : "memory");
-}
-
 /* -- ICS routines -- */
 static void ics_rm_check_resend(struct kvmppc_xics *xics,
struct kvmppc_ics *ics, struct kvmppc_icp *icp)
@@ -61,7 +55,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
-   unsigned long xics_phys;
int cpu;
 
/* Mark the target VCPU as having an interrupt pending */
@@ -84,19 +77,8 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
/* In SMT cpu will always point to thread 0,

[PATCH 10/23] KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode

2015-03-20 Thread Paul Mackerras

From: Suresh Warrier warr...@linux.vnet.ibm.com

Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
to switch to the host to complete the rest of hypercall function in
virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
functions to be runnable in hypervisor real mode, thus avoiding the need
to switch to the host to execute these functions in virtual mode. However,
the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
events - these events cannot be done in real mode and they will still need
a switch to host virtual mode.

There are sufficient differences between the real mode code and the
virtual mode code for the ICS/ICP resend and reject functions that
for now the code has been duplicated instead of sharing common code.
In the future, we can look at creating common functions.

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 225 ---
 1 file changed, 211 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 7c22997..73bbe92 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -23,12 +23,39 @@
 
 #define DEBUG_PASSUP
 
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+   u32 new_irq);
+
 static inline void rm_writeb(unsigned long paddr, u8 val)
 {
__asm__ __volatile__("sync; stbcix %0,0,%1"
: : "r" (val), "r" (paddr) : "memory");
 }
 
+/* -- ICS routines -- */
+static void ics_rm_check_resend(struct kvmppc_xics *xics,
+   struct kvmppc_ics *ics, struct kvmppc_icp *icp)
+{
+   int i;
+
+   arch_spin_lock(&ics->lock);
+
+   for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+   struct ics_irq_state *state = &ics->irq_state[i];
+
+   if (!state->resend)
+   continue;
+
+   arch_spin_unlock(&ics->lock);
+   icp_rm_deliver_irq(xics, icp, state->number);
+   arch_spin_lock(&ics->lock);
+   }
+
+   arch_spin_unlock(&ics->lock);
+}
+
+/* -- ICP routines -- */
+
 static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
@@ -116,6 +143,178 @@ static inline int check_too_hard(struct kvmppc_xics *xics,
return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS;
 }
 
+static void icp_rm_check_resend(struct kvmppc_xics *xics,
+struct kvmppc_icp *icp)
+{
+   u32 icsid;
+
+   /* Order this load with the test for need_resend in the caller */
+   smp_rmb();
+   for_each_set_bit(icsid, icp->resend_map, xics->max_icsid + 1) {
+   struct kvmppc_ics *ics = xics->ics[icsid];
+
+   if (!test_and_clear_bit(icsid, icp->resend_map))
+   continue;
+   if (!ics)
+   continue;
+   ics_rm_check_resend(xics, ics, icp);
+   }
+}
+
+static bool icp_rm_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
+  u32 *reject)
+{
+   union kvmppc_icp_state old_state, new_state;
+   bool success;
+
+   do {
+   old_state = new_state = READ_ONCE(icp->state);
+
+   *reject = 0;
+
+   /* See if we can deliver */
+   success = new_state.cppr > priority &&
+   new_state.mfrr > priority &&
+   new_state.pending_pri > priority;
+
+   /*
+* If we can, check for a rejection and perform the
+* delivery
+*/
+   if (success) {
+   *reject = new_state.xisr;
+   new_state.xisr = irq;
+   new_state.pending_pri = priority;
+   } else {
+   /*
+* If we failed to deliver we set need_resend
+* so a subsequent CPPR state change causes us
+* to try a new delivery.
+*/
+   new_state.need_resend = true;
+   }
+
+   } while (!icp_rm_try_update(icp, old_state, new_state));
+
+   return success;
+}
+
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+   u32 new_irq)
+{
+   struct ics_irq_state *state;
+   struct kvmppc_ics *ics;
+   u32 reject;
+   u16 src;
+
+   /*
+* This is used both for initial delivery of an interrupt and
+* for subsequent rejection.
+*
+* Rejection can be racy vs. resends. We have evaluated the
+* rejection in an atomic ICP transaction which is now complete,
+* so potentially the ICP can already accept the

[PATCH 09/23] KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock

2015-03-20 Thread Paul Mackerras

From: Suresh Warrier warr...@linux.vnet.ibm.com

Replaces the ICS mutex lock with a spin lock since we will be porting
these routines to real mode. Note that we need to disable interrupts
before we take the lock in anticipation of the fact that on the guest
side, we are running in the context of a hard irq and interrupts are
disabled (EE bit off) when the lock is acquired. Again, because we
will be acquiring the lock in hypervisor real mode, we need to use
an arch_spinlock_t instead of a normal spinlock here as we want to
avoid running any lockdep code (which may not be safe to execute in
real mode).
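
For readers unfamiliar with the distinction, the following user-space
sketch shows roughly what "a bare lock with no instrumentation" means.
It is an analogy built on C11 atomic_flag, not the kernel's
arch_spinlock_t; struct raw_lock and its helpers are hypothetical names.
User space has no equivalent of local_irq_save(), so the
interrupt-disabling half of the pattern is only noted in a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a raw spin lock: a test-and-set flag with
 * no debugging or lock-validation hooks attached.
 */
struct raw_lock {
	atomic_flag held;
};

/*
 * In the patch, callers in virtual mode disable interrupts before
 * taking the lock and restore them after dropping it.  That step has
 * no user-space counterpart and is omitted here.
 */
static void raw_lock_acquire(struct raw_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

/* Try once; returns true if the lock was taken. */
static bool raw_lock_try(struct raw_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void raw_lock_release(struct raw_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Unlike a mutex, nothing here can sleep, which is the property needed
for code that will eventually run in real mode.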

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_xics.c | 68 +-
 arch/powerpc/kvm/book3s_xics.h |  2 +-
 2 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 48f0bda..56ed9b4 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -20,6 +20,7 @@
 #include <asm/xics.h>
 #include <asm/debug.h>
 #include <asm/time.h>
+#include <asm/spinlock.h>
 
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
@@ -39,7 +40,7 @@
  * LOCKING
  * ===
  *
- * Each ICS has a mutex protecting the information about the IRQ
+ * Each ICS has a spin lock protecting the information about the IRQ
  * sources and avoiding simultaneous deliveries if the same interrupt.
  *
  * ICP operations are done via a single compare & swap transaction
@@ -109,7 +110,10 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 {
int i;
 
-   mutex_lock(&ics->lock);
+   unsigned long flags;
+
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
struct ics_irq_state *state = &ics->irq_state[i];
@@ -120,12 +124,15 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
XICS_DBG("resend %#x prio %#x\n", state->number,
  state->priority);
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
icp_deliver_irq(xics, icp, state->number);
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
}
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 }
 
 static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
@@ -133,8 +140,10 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
  u32 server, u32 priority, u32 saved_priority)
 {
bool deliver;
+   unsigned long flags;
 
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
state->server = server;
state->priority = priority;
@@ -145,7 +154,8 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
deliver = true;
}
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 
return deliver;
 }
@@ -186,6 +196,7 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
struct kvmppc_ics *ics;
struct ics_irq_state *state;
u16 src;
+   unsigned long flags;
 
if (!xics)
return -ENODEV;
@@ -195,10 +206,12 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
return -EINVAL;
state = &ics->irq_state[src];
 
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
*server = state->server;
*priority = state->priority;
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 
return 0;
 }
@@ -365,6 +378,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
struct kvmppc_ics *ics;
u32 reject;
u16 src;
+   unsigned long flags;
 
/*
 * This is used both for initial delivery of an interrupt and
@@ -391,7 +405,8 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
state = &ics->irq_state[src];
 
/* Get a lock on the ICS */
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
/* Get our server */
if (!icp || state->server != icp->server_num) {
@@ -434,7 +449,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 *
 * Note that if successful, the new delivery might have itself
 * rejected an interrupt that was delivered before we took the
-* icp mutex.
+* ics spin lock.
 *
 * In this case we do the whole

[PATCH 17/23] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken

2015-03-20 Thread Paul Mackerras

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.
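
The polling scheme described above can be sketched in ordinary user-space
C.  This is only an illustration under stated assumptions: demo_thread,
demo_wait_for_nap and DEMO_THREADS are hypothetical names standing in for
the paca entries and kvmppc_wait_for_nap(), and the sketch does not model
the other threads clearing their pointers concurrently.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_THREADS 8   /* hypothetical upper bound on threads per subcore */

/* Stand-in for a per-thread kvm_hstate.kvm_vcpu pointer. */
struct demo_thread {
    void *vcpu;
};

/*
 * Poll the secondary threads (index 1 and up) until every vcpu pointer
 * is NULL or max_loops polls have been made.  Returns a bitmask of the
 * threads that still hold a vcpu pointer; 0 means all have finished.
 */
static unsigned int demo_wait_for_nap(const struct demo_thread *t,
                                      int nthreads, int max_loops)
{
    unsigned int stuck = 0;
    int i, loops;

    assert(nthreads >= 1 && nthreads <= DEMO_THREADS);
    for (loops = 0; loops < max_loops; ++loops) {
        for (i = 1; i < nthreads; ++i)
            if (t[i].vcpu)
                break;
        if (i == nthreads)
            return 0;
    }
    /* Timed out: report exactly which threads are still busy. */
    for (i = 1; i < nthreads; ++i)
        if (t[i].vcpu)
            stuck |= 1u << i;
    return stuck;
}
```

Because the check is per thread rather than a single counter, the caller
learns which threads failed to finish, which is what makes the more
informative error message possible.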

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv.c| 47 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 +
 4 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7b327e5..f6d4232 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,8 +272,6 @@ struct kvmppc_vcore {
int n_runnable;
int num_threads;
int entry_exit_count;
-   int n_woken;
-   int nap_count;
int napping_threads;
int first_vcpuid;
u16 pcpu;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 5eda551..fa7b57d 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,6 @@ int main(void)
    DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
    DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
    DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
-   DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
    DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
    DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
    DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bd16b03..03a8bb4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1702,8 +1702,10 @@ static int kvmppc_grab_hwthread(int cpu)
    tpaca = &paca[cpu];
 
    /* Ensure the thread won't go into the kernel if it wakes */
-   tpaca->kvm_hstate.hwthread_req = 1;
    tpaca->kvm_hstate.kvm_vcpu = NULL;
+   tpaca->kvm_hstate.napping = 0;
+   smp_wmb();
+   tpaca->kvm_hstate.hwthread_req = 1;
 
/*
 * If the thread is already executing in the kernel (e.g. handling
@@ -1746,35 +1748,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
}
    cpu = vc->pcpu + vcpu->arch.ptid;
    tpaca = &paca[cpu];
-   tpaca->kvm_hstate.kvm_vcpu = vcpu;
    tpaca->kvm_hstate.kvm_vcore = vc;
    tpaca->kvm_hstate.ptid = vcpu->arch.ptid;
    vcpu->cpu = vc->pcpu;
+   /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
    smp_wmb();
+   tpaca->kvm_hstate.kvm_vcpu = vcpu;
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-   if (cpu != smp_processor_id()) {
+   if (cpu != smp_processor_id())
        xics_wake_cpu(cpu);
-       if (vcpu->arch.ptid)
-           ++vc->n_woken;
-   }
 #endif
 }
 
-static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
+static void kvmppc_wait_for_nap(void)
 {
-   int i;
+   int cpu = smp_processor_id();
+   int i, loops;
 
-   HMT_low();
-   i = 0;
-   while (vc->nap_count < vc->n_woken) {
-       if (++i >= 100) {
-           pr_err("kvmppc_wait_for_nap timeout %d %d\n",
-                  vc->nap_count, vc->n_woken);
-           break;
+   for (loops = 0; loops < 100; ++loops) {
+       /*
+        * Check if all threads are finished.
+        * We set the vcpu pointer when starting a thread
+        * and the thread clears it when finished, so we look
+        * for any threads that still have a non-NULL vcpu ptr.
+        */
+       for (i = 1; i < threads_per_subcore; ++i)
+           if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+               break;
+       if (i == threads_per_subcore) {
+           HMT_medium();
+           return;
        }
-       cpu_relax();
+       HMT_low();
    }
    HMT_medium();
+   for (i = 1; i < threads_per_subcore; ++i)
+       if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+           pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
 /*
@@ -1915,8 +1925,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->n_woken = 0;
-   vc->nap_count = 0;
    vc->entry_exit_count = 0;
    vc->preempt_tb =

[PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT

2015-03-20 Thread Paul Mackerras

This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vm<pid>, where <pid> is the PID of the
process that created the guest.  The file is named htab.  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 000bd7ff1196 0003b5c71196

The first field is the index of the entry in the HPT.  The second and
third fields are the two doublewords of the HPT entry, so the third field
contains the real page number that is mapped by the entry if the entry's
valid bit is set.  The fourth field is the guest's view of the second
doubleword of the entry, so it contains the guest physical address.  (The
format of the second through fourth fields is described in the Power ISA
and also in arch/powerpc/include/asm/mmu-hash64.h.)
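
For anyone wanting to post-process the htab file, a line in the format
shown above can be split into its four hexadecimal fields with sscanf().
The following is a minimal user-space sketch, not part of the patch;
demo_hpte_line and demo_parse_htab_line are hypothetical names, and the
member names v, hr and gr are assumed to correspond to the local
variables of the same names in debugfs_htab_read().

```c
#include <assert.h>
#include <stdio.h>

/* One parsed line of the htab file (hypothetical helper type). */
struct demo_hpte_line {
    unsigned long index;    /* field 1: index of the entry in the HPT */
    unsigned long long v;   /* field 2: first doubleword of the entry */
    unsigned long long hr;  /* field 3: second doubleword, host view */
    unsigned long long gr;  /* field 4: second doubleword, guest view */
};

/* Returns 0 if all four hex fields were parsed, -1 otherwise. */
static int demo_parse_htab_line(const char *line, struct demo_hpte_line *out)
{
    int n = sscanf(line, "%lx %llx %llx %llx",
                   &out->index, &out->v, &out->hr, &out->gr);

    return n == 4 ? 0 : -1;
}
```

Decoding the individual bits of each field is deliberately left out; the
layout is defined by the Power ISA and the header named above.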

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
 arch/powerpc/include/asm/kvm_host.h  |   2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 +++
 arch/powerpc/kvm/book3s_hv.c |  12 +++
 virt/kvm/kvm_main.c  |   1 +
 5 files changed, 153 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0789a0f..869c53f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
    return rcu_dereference_raw_notrace(kvm->memslots);
 }
 
+extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 015773f..f1d0bbc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
int hpt_cma_alloc;
+   struct dentry *debugfs_dir;
+   struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6c6825a..d6fe308 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -27,6 +27,7 @@
 #include <linux/srcu.h>
 #include <linux/anon_inodes.h>
 #include <linux/file.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/kvm_ppc.h>
@@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf)
return ret;
 }
 
+struct debugfs_htab_state {
+   struct kvm      *kvm;
+   struct mutex    mutex;
+   unsigned long   hpt_index;
+   int             chars_left;
+   int             buf_index;
+   char            buf[64];
+};
+
+static int debugfs_htab_open(struct inode *inode, struct file *file)
+{
+   struct kvm *kvm = inode->i_private;
+   struct debugfs_htab_state *p;
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p)
+       return -ENOMEM;
+
+   kvm_get_kvm(kvm);
+   p->kvm = kvm;
+   mutex_init(&p->mutex);
+   file->private_data = p;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_htab_release(struct inode *inode, struct file *file)
+{
+   struct debugfs_htab_state *p = file->private_data;
+
+   kvm_put_kvm(p->kvm);
+   kfree(p);
+   return 0;
+}
+
+static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
+                                size_t len, loff_t *ppos)
+{
+   struct debugfs_htab_state *p = file->private_data;
+   ssize_t ret, r;
+   unsigned long i, n;
+   unsigned long v, hr, gr;
+   struct kvm *kvm;
+   __be64 *hptp;
+
+   ret = mutex_lock_interruptible(&p->mutex);
+   if (ret)
+       return ret;
+
+   if (p->chars_left) {
+       n = p->chars_left;
+       if (n > len)
+           n = len;
+       r = copy_to_user(buf, p->buf + p->buf_index, n);
+       n -= r;
+       p->chars_left -= n;
+       p->buf_index += n;
+       buf += n;
+       len -= n;
+       ret = n;
+       if (r) {
+           if (!n)
+               ret = -EFAULT;
+           goto out;
+       }
+   }
+
+   kvm = p->kvm;
+   i = p->hpt_index;
+   hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+   for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+       if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
+           continue;
+
+

[PATCH 16/23] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu

2015-03-20 Thread Paul Mackerras

Rather than calling cond_resched() in kvmppc_run_core() before doing
the post-processing for the vcpus that we have just run (that is,
calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
that post-processing before calling cond_resched(), and that post-
processing is moved out into its own function, post_guest_process().

The reschedule point is now in kvmppc_run_vcpu() and we define a new
vcore state, VCORE_PREEMPT, to indicate that the vcore's runner
task is runnable but not running.  (Doing the reschedule with the
vcore in VCORE_INACTIVE state would be bad because there are potentially
other vcpus waiting for the runner in kvmppc_wait_for_exec() which
then wouldn't get woken up.)

Also, we make use of the handy cond_resched_lock() function, which
unlocks and relocks vc->lock for us around the reschedule.
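
To illustrate why the state matters while the lock is dropped, here is a
toy user-space model.  It is a sketch only: the lock is a plain flag,
demo_resched_lock() merely records what another vcpu task would observe
during the unlocked window, and returning to the inactive state afterwards
is an assumption, since that part of the patch is not shown here.  All
demo_* and DEMO_* names are hypothetical.

```c
#include <assert.h>

enum demo_vcore_state {
    DEMO_INACTIVE, DEMO_SLEEPING, DEMO_PREEMPT, DEMO_RUNNING, DEMO_EXITING
};

struct demo_vcore {
    int locked;                                 /* toy stand-in for vc->lock */
    enum demo_vcore_state state;
    enum demo_vcore_state seen_while_unlocked;  /* what other tasks saw */
};

/* Toy analogue of cond_resched_lock(): unlock, "reschedule", relock. */
static void demo_resched_lock(struct demo_vcore *vc)
{
    assert(vc->locked);
    vc->locked = 0;
    vc->seen_while_unlocked = vc->state;
    vc->locked = 1;
}

/* Mark the vcore preempted for the duration of the reschedule. */
static void demo_preempt_point(struct demo_vcore *vc, int need_resched)
{
    if (!need_resched)
        return;
    vc->state = DEMO_PREEMPT;
    demo_resched_lock(vc);
    vc->state = DEMO_INACTIVE;  /* assumed: back to inactive afterwards */
}
```

In this model, any task that takes the lock during the window sees the
preempted state rather than the inactive one, so it knows a runner still
exists.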

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |  5 +-
 arch/powerpc/kvm/book3s_hv.c| 92 +
 2 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ec4cf37..7b327e5 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -304,8 +304,9 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_RUNNING  2
-#define VCORE_EXITING  3
+#define VCORE_PREEMPT  2
+#define VCORE_RUNNING  3
+#define VCORE_EXITING  4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 15598be..bd16b03 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1855,15 +1855,50 @@ static void prepare_threads(struct kvmppc_vcore *vc)
}
 }
 
+static void post_guest_process(struct kvmppc_vcore *vc)
+{
+   u64 now;
+   long ret;
+   struct kvm_vcpu *vcpu, *vnext;
+
+   now = get_tb();
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+                            arch.run_list) {
+       /* cancel pending dec exception if dec is positive */
+       if (now < vcpu->arch.dec_expires &&
+           kvmppc_core_pending_dec(vcpu))
+           kvmppc_core_dequeue_dec(vcpu);
+
+       trace_kvm_guest_exit(vcpu);
+
+       ret = RESUME_GUEST;
+       if (vcpu->arch.trap)
+           ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
+                                       vcpu->arch.run_task);
+
+       vcpu->arch.ret = ret;
+       vcpu->arch.trap = 0;
+
+       if (vcpu->arch.ceded) {
+           if (!is_kvmppc_resume_guest(ret))
+               kvmppc_end_cede(vcpu);
+           else
+               kvmppc_set_timer(vcpu);
+       }
+       if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
+           kvmppc_remove_runnable(vc, vcpu);
+           wake_up(&vcpu->arch.cpu_run);
+       }
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
 static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-   struct kvm_vcpu *vcpu, *vnext;
-   long ret;
-   u64 now;
+   struct kvm_vcpu *vcpu;
int i;
int srcu_idx;
 
@@ -1895,8 +1930,11 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 */
    if ((threads_per_core > 1) &&
        ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-       list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+       list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
            vcpu->arch.ret = -EBUSY;
+           kvmppc_remove_runnable(vc, vcpu);
+           wake_up(&vcpu->arch.cpu_run);
+       }
        goto out;
    }
 
@@ -1952,44 +1990,12 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
    kvm_guest_exit();
 
    preempt_enable();
-   cond_resched();
 
    spin_lock(&vc->lock);
-   now = get_tb();
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-       /* cancel pending dec exception if dec is positive */
-       if (now < vcpu->arch.dec_expires &&
-           kvmppc_core_pending_dec(vcpu))
-           kvmppc_core_dequeue_dec(vcpu);
-
-       trace_kvm_guest_exit(vcpu);
-
-       ret = RESUME_GUEST;
-       if (vcpu->arch.trap)
-           ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
-                                       vcpu->arch.run_task);
-
-       vcpu->arch.ret = ret;
-       vcpu->arch.trap = 0;
-
-       if (vcpu->arch.ceded) {
-           if (!is_kvmppc_resume_guest(ret))
-

[PATCH 14/23] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update

2015-03-20 Thread Paul Mackerras

Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
update (i.e. one of its 3 virtual processor areas needed to be pinned
in memory so the host real mode code can update it on guest entry and
exit), we would drop the vcore lock and do the update there and then.
Future changes will make it inconvenient to drop the lock, so instead
we now remove it from the list of runnable VCPUs and wake up its
VCPU task.  This will have the effect that the VCPU task will exit
kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
to kvmppc_update_vpas() and then rejoin the vcore.

The one complication is that the runner VCPU (whose VCPU task is the
current task) might be one of the ones that gets removed from the
runnable list.  In that case we just return from kvmppc_run_core()
and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
the runner if necessary.

This all means that the VCORE_STARTING state is no longer used, so we
remove it.
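
The control flow described above can be modelled with a small user-space
sketch.  It is illustrative only: demo_vcpu and demo_prepare_threads are
hypothetical names, the runnable list is a plain array, and "woken" stands
in for the wake_up() of the VCPU task.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_vcpu {
    bool signal_pending;
    bool vpa_update_pending;
    bool runnable;   /* still on the runnable list */
    bool woken;      /* its task has been woken to handle the condition */
};

/*
 * Drop from the runnable set every vcpu that has a signal pending or
 * needs a VPA update, waking its task so it can deal with that outside
 * the vcore lock.  Returns true if the runner itself is still runnable;
 * if not, the caller has to pick another runner.
 */
static bool demo_prepare_threads(struct demo_vcpu *v, int n, int runner)
{
    int i;

    assert(runner >= 0 && runner < n);
    for (i = 0; i < n; ++i) {
        if (!v[i].runnable)
            continue;
        if (!v[i].signal_pending && !v[i].vpa_update_pending)
            continue;
        v[i].runnable = false;
        v[i].woken = true;
    }
    return v[runner].runnable;
}
```

The return value corresponds to the complication mentioned above: when
the runner is among the removed VCPUs, the core is not run and a new
runner is chosen by the caller.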

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |  5 ++--
 arch/powerpc/kvm/book3s_hv.c| 56 -
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 286c0ce..cee6e55 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -306,9 +306,8 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_STARTING 2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_RUNNING  2
+#define VCORE_EXITING  3
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8517c33..15598be 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1836,6 +1836,25 @@ static void kvmppc_start_restoring_l2_cache(const struct kvmppc_vcore *vc)
    mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE);
 }
 
+static void prepare_threads(struct kvmppc_vcore *vc)
+{
+   struct kvm_vcpu *vcpu, *vnext;
+
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+                            arch.run_list) {
+       if (signal_pending(vcpu->arch.run_task))
+           vcpu->arch.ret = -EINTR;
+       else if (vcpu->arch.vpa.update_pending ||
+                vcpu->arch.slb_shadow.update_pending ||
+                vcpu->arch.dtl.update_pending)
+           vcpu->arch.ret = RESUME_GUEST;
+       else
+           continue;
+       kvmppc_remove_runnable(vc, vcpu);
+       wake_up(&vcpu->arch.cpu_run);
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
@@ -1845,46 +1864,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
    struct kvm_vcpu *vcpu, *vnext;
    long ret;
    u64 now;
-   int i, need_vpa_update;
+   int i;
    int srcu_idx;
-   struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
-   /* don't start if any threads have a signal pending */
-   need_vpa_update = 0;
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-       if (signal_pending(vcpu->arch.run_task))
-           return;
-       if (vcpu->arch.vpa.update_pending ||
-           vcpu->arch.slb_shadow.update_pending ||
-           vcpu->arch.dtl.update_pending)
-           vcpus_to_update[need_vpa_update++] = vcpu;
-   }
+   /*
+    * Remove from the list any threads that have a signal pending
+    * or need a VPA update done
+    */
+   prepare_threads(vc);
+
+   /* if the runner is no longer runnable, let the caller pick a new one */
+   if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE)
+       return;
 
    /*
-    * Initialize *vc, in particular vc->vcore_state, so we can
-    * drop the vcore lock if necessary.
+    * Initialize *vc.
     */
    vc->n_woken = 0;
    vc->nap_count = 0;
    vc->entry_exit_count = 0;
    vc->preempt_tb = TB_NIL;
-   vc->vcore_state = VCORE_STARTING;
    vc->in_guest = 0;
    vc->napping_threads = 0;
    vc->conferring_threads = 0;
 
    /*
-    * Updating any of the vpas requires calling kvmppc_pin_guest_page,
-    * which can't be called with any spinlocks held.
-    */
-   if (need_vpa_update) {
-       spin_unlock(&vc->lock);
-       for (i = 0; i < need_vpa_update; ++i)
-           kvmppc_update_vpas(vcpus_to_update[i]);
-       spin_lock(&vc->lock);
-   }
-
-   /*
     * Make sure we are running on primary threads, and that secondary
     * threads are offline.  Also check if the number of

[PATCH 18/23] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI

2015-03-20 Thread Paul Mackerras

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.
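
The effect on the LPCR value can be written out in C as a bit-insert
under a mask, which is what the rlwimi in the patch below does.  This is
a sketch under stated assumptions: the DEMO_* bit positions are
placeholders for illustration rather than values quoted from the kernel
headers, and demo_nap_lpcr is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions, for illustration only. */
#define DEMO_PECE0  (UINT64_C(1) << 14)
#define DEMO_PECE1  (UINT64_C(1) << 13)
#define DEMO_PECEDP (UINT64_C(1) << 16)

/*
 * Compute the LPCR value to use while napping.  wake_mask plays the role
 * of r3 on entry to kvm_do_nap: 0 when napping without a vcpu, or the
 * doorbell-wake bit when napping because of H_CEDE.  The doorbell bit is
 * only touched when the CPU has the doorbell facility.
 */
static uint64_t demo_nap_lpcr(uint64_t lpcr, uint64_t wake_mask,
                              int has_doorbell)
{
    lpcr |= DEMO_PECE0 | DEMO_PECE1;
    if (has_doorbell)
        lpcr = (lpcr & ~DEMO_PECEDP) | (wake_mask & DEMO_PECEDP);
    return lpcr;
}
```

Unlike the previous OR-only sequence, the insert clears the bit when the
caller passes 0, so a stale wake-on-doorbell setting cannot survive into
the no-vcpu nap.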

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3b148d..8afc8a8 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -191,6 +191,7 @@ kvmppc_primary_no_guest:
li  r3, NAPPING_NOVCPU
stb r3, HSTATE_NAPPING(r13)
 
+   li  r3, 0   /* Don't wake on privileged (OS) doorbell */
b   kvm_do_nap
 
 kvm_novcpu_wakeup:
@@ -2107,10 +2108,13 @@ _GLOBAL(kvmppc_h_cede)  /* r3 = vcpu pointer, r11 = msr, r13 = paca */
    addi    r3, r4, VCPU_TB_CEDE
bl  kvmhv_accumulate_time
 
+   lis r3, LPCR_PECEDP@h   /* Do wake on privileged doorbell */
+
/*
 * Take a nap until a decrementer or external or doobell interrupt
-* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
-* runlatch bit before napping.
+* occurs, with PECE1 and PECE0 set in LPCR.
+* On POWER8, if we are ceding, also set PECEDP.
+* Also clear the runlatch bit before napping.
 */
 kvm_do_nap:
mfspr   r0, SPRN_CTRLF
@@ -2122,7 +2126,7 @@ kvm_do_nap:
mfspr   r5,SPRN_LPCR
ori r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-   oris    r5,r5,LPCR_PECEDP@h
+   rlwimi  r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_LPCR,r5
isync
-- 
2.1.4

[PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-03-20 Thread Paul Mackerras

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.
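
The choice between the two signalling paths comes down to whether the
target thread shares a core with the sender.  The sketch below models
that decision in user space, assuming 8 hardware threads per core as the
masks in this patch do; demo_same_core, demo_thread_in_core and
demo_pick_ipi are hypothetical names, and no doorbell is actually sent.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_THREADS_PER_CORE 8   /* assumed, matching the ~7 / 7 masks */

enum demo_ipi_path { DEMO_IPI_DOORBELL, DEMO_IPI_XICS };

/* Two CPU numbers are on the same core if they agree above the low bits. */
static bool demo_same_core(int cpu, int me)
{
    return (cpu & ~(DEMO_THREADS_PER_CORE - 1)) ==
           (me & ~(DEMO_THREADS_PER_CORE - 1));
}

/* Thread number within the core; this is what goes in the message. */
static int demo_thread_in_core(int cpu)
{
    assert(cpu >= 0);
    return cpu & (DEMO_THREADS_PER_CORE - 1);
}

/* Use the doorbell only on POWER8 and only within the sender's core. */
static enum demo_ipi_path demo_pick_ipi(int cpu, int me, bool is_power8)
{
    if (is_power8 && demo_same_core(cpu, me))
        return DEMO_IPI_DOORBELL;
    return DEMO_IPI_XICS;
}
```

In the patch the comparison is done with preemption disabled, since the
answer depends on which CPU the caller is running on at that moment.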

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kernel/asm-offsets.c   |  4 +++
 arch/powerpc/kvm/book3s_hv.c| 48 ++---
 arch/powerpc/kvm/book3s_hv_rm_xics.c| 11 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 
 4 files changed, 83 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index fa7b57d..0ce2aa6 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -37,6 +37,7 @@
 #include <asm/thread_info.h>
 #include <asm/rtas.h>
 #include <asm/vdso_datapage.h>
+#include <asm/dbell.h>
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
 #include <asm/lppaca.h>
@@ -568,6 +569,7 @@ int main(void)
DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
+   DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu));
DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
@@ -757,5 +759,7 @@ int main(void)
offsetof(struct paca_struct, subcore_sibling_mask));
 #endif
 
+   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+
return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 03a8bb4..2c34bae 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -51,6 +51,7 @@
 #include <asm/hvcall.h>
 #include <asm/switch_to.h>
 #include <asm/smp.h>
+#include <asm/dbell.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+static bool kvmppc_ipi_thread(int cpu)
+{
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+       preempt_disable();
+       if ((cpu & ~7) == (smp_processor_id() & ~7)) {
+           unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+           msg |= cpu & 7;
+           smp_mb();
+           __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+           preempt_enable();
+           return true;
+       }
+       preempt_enable();
+   }
+
+#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
+   if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
+       xics_wake_cpu(cpu);
+       return true;
+   }
+#endif
+
+   return false;
+}
+
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int me;
    int cpu = vcpu->cpu;
    wait_queue_head_t *wqp;
 
@@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
        ++vcpu->stat.halt_wakeup;
    }
 
-   me = get_cpu();
+   if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
+       return;
 
    /* CPU points to the first thread of the core */
-   if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
-#ifdef CONFIG_PPC_ICP_NATIVE
-       int real_cpu = cpu + vcpu->arch.ptid;
-       if (paca[real_cpu].kvm_hstate.xics_phys)
-           xics_wake_cpu(real_cpu);
-       else
-#endif
-       if (cpu_online(cpu))
-           smp_send_reschedule(cpu);
-   }
-   put_cpu();
+   if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
+       smp_send_reschedule(cpu);
 }
 
 /*
@@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
    /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
    smp_wmb();
    tpaca->kvm_hstate.kvm_vcpu = vcpu;
-#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
    if (cpu != smp_processor_id())
-       xics_wake_cpu(cpu);
-#endif
+       kvmppc_ipi_thread(cpu);
 }
 
 static void kvmppc_wait_for_nap(void)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 6dded8c..457a8b1 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -18,6 +18,7 @@
 #include <asm/debug.h>
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
+#include <asm/dbell.h>
 
 #include "book3s_xics.h"
 
@@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
/* In SMT

[PATCH 19/23] KVM: PPC: Book3S HV: Use decrementer to wake napping threads

2015-03-20 Thread Paul Mackerras

This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.
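
The arithmetic described above can be sketched in C roughly as follows.
This is only a model of the calculation, not the patch itself (which does
this in assembly): the struct and function names are made up for
illustration, and the timebase values are passed in as parameters instead
of being read with mftb.

```c
#include <stdint.h>

/* Hypothetical model of the registers and saved state involved. */
struct nap_state {
	int32_t dec;		/* guest decrementer (DEC), in timebase ticks */
	int32_t hdec;		/* hypervisor decrementer (HDEC) */
	int64_t dec_expires;	/* guest DEC expiry, as a host timebase value */
};

/* Before napping: clamp DEC to HDEC and remember when the guest DEC expires. */
static void nap_prepare(struct nap_state *s, int64_t guest_tb, int64_t tb_offset)
{
	int32_t guest_dec = s->dec;

	/* Wake no later than the end of the timeslice. */
	if (guest_dec > s->hdec)
		s->dec = s->hdec;

	/* Expiry in guest timebase, then converted to host timebase. */
	s->dec_expires = guest_tb + (int64_t)guest_dec - tb_offset;
}

/* After waking: rebuild the guest DEC from the saved expiry time. */
static void nap_finish(struct nap_state *s, int64_t guest_tb_now, int64_t tb_offset)
{
	s->dec = (int32_t)(s->dec_expires + tb_offset - guest_tb_now);
}
```

With DEC = 1000 and HDEC = 300, the thread naps with DEC = 300; if it wakes
250 ticks later, the guest sees DEC = 750, as if DEC had never been touched.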

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 8afc8a8..03a37a0 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */
+   /* put the HDEC into the DEC, since HDEC interrupts don't wake us */
+   mfspr   r3, SPRN_HDEC
+   mtspr   SPRN_DEC, r3
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -223,6 +226,12 @@ kvm_novcpu_wakeup:
cmpdi   r3, 0
bge kvm_novcpu_exit
 
+   /* See if our timeslice has expired (HDEC is negative) */
+   mfspr   r0, SPRN_HDEC
+   li  r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+   cmpwi   r0, 0
+   blt kvm_novcpu_exit
+
/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
ld  r4, HSTATE_KVM_VCPU(r13)
cmpdi   r4, 0
@@ -1478,10 +1487,10 @@ kvmhv_do_exit:  /* r12 = trap, r13 = paca */
cmpwi   r3,0x100    /* Are we the first here? */
bge 43f
cmpwi   r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   beq 40f
+   beq 43f
li  r0,0
mtspr   SPRN_HDEC,r0
-40:
+
/*
 * Send an IPI to any napping threads, since an HDEC interrupt
 * doesn't wake CPUs up from nap.
@@ -2104,6 +2113,27 @@ _GLOBAL(kvmppc_h_cede)   /* r3 = vcpu pointer, r11 = msr, r13 = paca */
/* save FP state */
bl  kvmppc_save_fp
 
+   /*
+* Set DEC to the smaller of DEC and HDEC, so that we wake
+* no later than the end of our timeslice (HDEC interrupts
+* don't wake us from nap).
+*/
+   mfspr   r3, SPRN_DEC
+   mfspr   r4, SPRN_HDEC
+   mftb    r5
+   cmpw    r3, r4
+   ble 67f
+   mtspr   SPRN_DEC, r4
+67:
+   /* save expiry time of guest decrementer */
+   extsw   r3, r3
+   add r3, r3, r5
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   subf    r3, r6, r3  /* convert to host TB value */
+   std r3, VCPU_DEC_EXPIRES(r4)
+
ld  r4, HSTATE_KVM_VCPU(r13)
addi    r3, r4, VCPU_TB_CEDE
bl  kvmhv_accumulate_time
@@ -2157,6 +2187,15 @@ kvm_end_cede:
/* load up FP state */
bl  kvmppc_load_fp
 
+   /* Restore guest decrementer */
+   ld  r3, VCPU_DEC_EXPIRES(r4)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   add r3, r3, r6  /* convert host TB to guest TB value */
+   mftb    r7
+   subf    r3, r7, r3
+   mtspr   SPRN_DEC, r3
+
/* Load NV GPRS */
ld  r14, VCPU_GPR(R14)(r4)
ld  r15, VCPU_GPR(R15)(r4)
-- 
2.1.4


[PATCH 11/23] KVM: PPC: Book3S HV: Add ICP real mode counters

2015-03-20 Thread Paul Mackerras

From: Suresh Warrier warr...@linux.vnet.ibm.com

Add two counters to count how often we generate real-mode ICS resend
and reject events. The counters provide some performance statistics
that could be used in the future to consider if the real mode functions
need further optimizing. The counters are displayed as part of the ICS and
ICP state provided by /sys/kernel/debug/powerpc/kvm* for each VM.

Also added two counters that count (approximately) how many times we
don't find an ICP or ICS we're looking for. These are not currently
exposed through sysfs, but can be useful when debugging crashes.
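
As a small illustration of how per-ICP counters like these end up as
per-VM totals in the debug output, here is a stand-alone C sketch. The
struct and function names are invented for the example and are not the
kernel's; as in the patch, the counters are plain increments with no
locking, so the totals are approximate by design.

```c
#include <stddef.h>

/* Illustrative stand-in for the per-ICP statistics (not the kernel struct). */
struct icp_stats {
	unsigned long n_check_resend;	/* resend checks done for this ICP */
	unsigned long n_reject;		/* rejects handled for this ICP */
};

struct icp_totals {
	unsigned long check_resend;
	unsigned long reject;
};

/* Sum the counters of all ICPs belonging to one VM. */
static struct icp_totals sum_icp_stats(const struct icp_stats *icps, size_t n)
{
	struct icp_totals t = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		t.check_resend += icps[i].n_check_resend;
		t.reject += icps[i].n_reject;
	}
	return t;
}
```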

Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  7 +++
 arch/powerpc/kvm/book3s_xics.c   | 10 --
 arch/powerpc/kvm/book3s_xics.h   |  5 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 73bbe92..6dded8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -227,6 +227,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	ics = kvmppc_xics_find_ics(xics, new_irq, &src);
 	if (!ics) {
 		/* Unsafe increment, but this does not need to be accurate */
+		xics->err_noics++;
 		return;
 	}
 	state = &ics->irq_state[src];
@@ -239,6 +240,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 		icp = kvmppc_xics_find_server(xics->kvm, state->server);
 		if (!icp) {
 			/* Unsafe increment again*/
+			xics->err_noicp++;
 			goto out;
 		}
 	}
@@ -383,6 +385,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	 * separately here as well.
 	 */
 	if (resend) {
+		icp->n_check_resend++;
 		icp_rm_check_resend(xics, icp);
 	}
 }
@@ -500,11 +503,13 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 
 	/* Handle reject in real mode */
 	if (reject && reject != XICS_IPI) {
+		this_icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, reject);
 	}
 
 	/* Handle resends in real mode */
 	if (resend) {
+		this_icp->n_check_resend++;
 		icp_rm_check_resend(xics, icp);
 	}
 
@@ -566,6 +571,7 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
 	 * attempt (see comments in icp_rm_deliver_irq).
 	 */
 	if (reject && reject != XICS_IPI) {
+		icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, reject);
 	}
  bail:
@@ -616,6 +622,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 
 	/* Still asserted, resend it */
 	if (state->asserted) {
+		icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, irq);
 	}
 
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 56ed9b4..eb2569a 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -901,6 +901,7 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	unsigned long flags;
 	unsigned long t_rm_kick_vcpu, t_rm_check_resend;
 	unsigned long t_rm_reject, t_rm_notify_eoi;
+	unsigned long t_reject, t_check_resend;
 
 	if (!kvm)
 		return 0;
@@ -909,6 +910,8 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	t_rm_notify_eoi = 0;
 	t_rm_check_resend = 0;
 	t_rm_reject = 0;
+	t_check_resend = 0;
+	t_reject = 0;
 
 	seq_printf(m, "=========\nICP state\n=========\n");
 
@@ -928,12 +931,15 @@ static int xics_debug_show(struct seq_file *m, void *private)
 		t_rm_notify_eoi += icp->n_rm_notify_eoi;
 		t_rm_check_resend += icp->n_rm_check_resend;
 		t_rm_reject += icp->n_rm_reject;
+		t_check_resend += icp->n_check_resend;
+		t_reject += icp->n_reject;
 	}
 
-	seq_puts(m, "ICP Guest Real Mode exit totals: ");
-	seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
+	seq_printf(m, "ICP Guest->Host totals: kick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
 			t_rm_kick_vcpu, t_rm_check_resend,
 			t_rm_reject, t_rm_notify_eoi);
+	seq_printf(m, "ICP Real Mode totals: check_resend=%lu resend=%lu\n",
+			t_check_resend, t_reject);
 	for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
 		struct kvmppc_ics *ics = xics->ics[icsid];
 
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 055424c..56ea44f 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++

[PATCH 21/23] KVM: PPC: Book3S HV: Streamline guest entry and exit

2015-03-20 Thread Paul Mackerras

On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  That means that the other threads
exit sooner and means that the first thread doesn't spend so much
time waiting for the other threads at the point where the MMU gets
switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry: avg 3919.3ns (244 - 56492, 742665 samples)
  rm_exit: avg 4102.5ns (130 - 36272, 704056 samples)
  rm_intr: avg 1006.0ns (12 - 75040, 2819905 samples)

and this after the change:

 rm_entry: avg 2979.8ns (258 - 83740, 836403 samples)
  rm_exit: avg 3992.9ns (12 - 45572, 838034 samples)
  rm_intr: avg  922.2ns (12 - 66694, 3127066 samples)

showing a substantial reduction in the time spent in the real-mode
guest entry code, and smaller reductions in the real mode guest exit
and interrupt handling times.  (The test was to start the guest and
boot Fedora 20 big-endian to the login prompt.)
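
For reference, the relative improvement can be worked out from the
averages quoted above with a one-line helper. The function name is just
for illustration.

```c
/* Percentage reduction from a "before" average to an "after" average. */
static double percent_reduction(double before_ns, double after_ns)
{
	return (before_ns - after_ns) / before_ns * 100.0;
}
```

Applied to the numbers above, this gives roughly 24.0% for rm_entry
(3919.3ns to 2979.8ns), 2.7% for rm_exit (4102.5ns to 3992.9ns) and
8.3% for rm_intr (1006.0ns to 922.2ns).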

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 240 +++-
 1 file changed, 141 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 04728ce..ff1461d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -175,6 +175,19 @@ kvmppc_primary_no_guest:
/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
mfspr   r3, SPRN_HDEC
mtspr   SPRN_DEC, r3
+   /*
+* Make sure the primary has finished the MMU switch.
+* We should never get here on a secondary thread, but
+* check it for robustness' sake.
+*/
+   ld  r5, HSTATE_KVM_VCORE(r13)
+65:    lbz r0, VCORE_IN_GUEST(r5)
+   cmpwi   r0, 0
+   beq 65b
+   /* Set LPCR. */
+   ld  r8,VCORE_LPCR(r5)
+   mtspr   SPRN_LPCR,r8
+   isync
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -206,7 +219,7 @@ kvm_novcpu_wakeup:
 
/* check the wake reason */
bl  kvmppc_check_wake_reason
-   
+
/* see if any other thread is already exiting */
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
@@ -243,7 +256,12 @@ kvm_novcpu_wakeup:
 
 kvm_novcpu_exit:
ld  r4, HSTATE_KVM_VCPU(r13)
-   b   hdec_soon
+   cmpdi   r4, 0
+   beq 13f
+   addi    r3, r4, VCPU_TB_RMEXIT
+   bl  kvmhv_accumulate_time
+13:    bl  kvmhv_commence_exit
+   b   kvmhv_switch_to_host
 
 /*
  * We come in here when wakened from nap mode.
@@ -417,7 +435,7 @@ kvmppc_hv_entry:
ld  r9,VCORE_KVM(r5)    /* pointer to struct kvm */
lbz r6,HSTATE_PTID(r13)
cmpwi   r6,0
-   bne 20f
+   bne 10f
ld  r6,KVM_SDR1(r9)
lwz r7,KVM_LPID(r9)
li  r0,LPID_RSVD    /* switch to reserved LPID */
@@ -488,26 +506,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
li  r0,1
stb r0,VCORE_IN_GUEST(r5)   /* signal secondaries to continue */
-   b   10f
-
-   /* Secondary threads wait for primary to have done partition switch */
-20:    lbz r0,VCORE_IN_GUEST(r5)
-   cmpwi   r0,0
-   beq 20b
-
-   /* Set LPCR. */
-10:    ld  r8,VCORE_LPCR(r5)
-   mtspr   SPRN_LPCR,r8
-   isync
-
-   /* Check if HDEC expires soon */
-   mfspr   r3,SPRN_HDEC
-   cmpwi   r3,512  /* 1 microsecond */
-   li  r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   blt hdec_soon
 
/* Do we have a guest vcpu to run? */
-   cmpdi   r4, 0
+10:    cmpdi   r4, 0
beq kvmppc_primary_no_guest
 kvmppc_got_guest:
 
@@ -832,6 +833,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
clrrdi  r6,r6,1
mtspr   SPRN_CTRLT,r6
 4:
+   /* Secondary threads wait for primary to have done partition switch */
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   lbz r6, HSTATE_PTID(r13)
+   cmpwi   r6, 0
+   beq 21f
+   lbz r0, VCORE_IN_GUEST(r5)
+   cmpwi   r0, 0
+

[PATCH 22/23] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count

2015-03-20 Thread Paul Mackerras

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.
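
The "set the entry bit only if the exit map is 0" idea can be sketched in
portable C11 as below. This is a hedged illustration using <stdatomic.h>,
not the kernel code (which uses lwarx/stwcx. loops in assembly); the
function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Layout assumed from the description: bits 0-7 hold the threads that
 * have entered the guest, bits 8-15 the threads that have started exiting.
 */

/* Try to mark thread ptid as entered; fails once any thread is exiting. */
static bool try_enter(atomic_int *map, unsigned int ptid)
{
	int old = atomic_load(map);

	assert(ptid < 8);
	do {
		if ((old >> 8) != 0)	/* exit map non-zero: too late to join */
			return false;
	} while (!atomic_compare_exchange_weak(map, &old, old | (1 << ptid)));
	return true;
}

/*
 * Mark thread ptid as exiting.  The first thread out gets back the bitmap
 * of the other entered threads, which are the ones it has to signal;
 * later threads get 0.
 */
static int commence_exit(atomic_int *map, unsigned int ptid)
{
	int old;

	assert(ptid < 8);
	old = atomic_fetch_or(map, 0x100 << ptid);
	if ((old >> 8) != 0)
		return 0;		/* someone else already started the exit */
	return old & 0xff & ~(1 << ptid);
}
```

Because the first exiting thread learns exactly which threads are in the
guest, it can interrupt just those threads instead of forcing HDEC to 0.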

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h | 15 
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kvm/book3s_hv.c|  5 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c| 10 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 63 +++--
 5 files changed, 45 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f6d4232..c2b9551 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -263,15 +263,15 @@ struct kvm_arch {
 
 /*
  * Struct for a virtual core.
- * Note: entry_exit_count combines an entry count in the bottom 8 bits
- * and an exit count in the next 8 bits.  This is so that we can
- * atomically increment the entry count iff the exit count is 0
- * without taking the lock.
+ * Note: entry_exit_map combines a bitmap of threads that have entered
+ * in the bottom 8 bits and a bitmap of threads that have exited in the
+ * next 8 bits.  This is so that we can atomically set the entry bit
+ * iff the exit map is 0 without taking a lock.
  */
 struct kvmppc_vcore {
int n_runnable;
int num_threads;
-   int entry_exit_count;
+   int entry_exit_map;
int napping_threads;
int first_vcpuid;
u16 pcpu;
@@ -296,8 +296,9 @@ struct kvmppc_vcore {
ulong conferring_threads;
 };
 
-#define VCORE_ENTRY_COUNT(vc)  ((vc)->entry_exit_count & 0xff)
-#define VCORE_EXIT_COUNT(vc)   ((vc)->entry_exit_count >> 8)
+#define VCORE_ENTRY_MAP(vc)    ((vc)->entry_exit_map & 0xff)
+#define VCORE_EXIT_MAP(vc)     ((vc)->entry_exit_map >> 8)
+#define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0ce2aa6..ed348e5 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,7 @@ int main(void)
DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
-   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
+   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2c34bae..9ea0eb5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1941,7 +1941,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->entry_exit_count = 0;
+   vc->entry_exit_map = 0;
vc->preempt_tb = TB_NIL;
vc->in_guest = 0;
vc->napping_threads = 0;
@@ -2108,8 +2108,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 * this thread straight away and have it join in.
 */
if (!signal_pending(current)) {
-   if (vc->vcore_state == VCORE_RUNNING &&
-   VCORE_EXIT_COUNT(vc) == 0) {
+   if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) {
kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
trace_kvm_guest_enter(vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1954a1c..2754251 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int target,
int rv = H_SUCCESS; /* => don't yield */
 
set_bit(vcpu->arch.ptid, &vc->conferring_threads);
-   while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) {
-   threads_running = VCORE_ENTRY_COUNT(vc);
-   threads_ceded = hweight32(vc->napping_threads);
-   threads_conferring = hweight32(vc->conferring_threads);
-   if (threads_ceded + threads_conferring >= threads_running) {
+   while

[PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Paul Mackerras

This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spent in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called timings.  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.
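
The bookkeeping behind each of these timings can be sketched in C as
follows. The field meanings follow the kvmhv_tb_accumulator struct in the
patch, but the function names and the first-sample handling are my own
assumptions, and the sketch is single-threaded: the real code also needs
write barriers around the update and a reader that retries while the
sequence count is odd.

```c
#include <stdint.h>

/* Mirrors the fields described in the patch; values are timebase ticks. */
struct tb_accumulator {
	uint64_t seqcount;	/* odd while updating; otherwise count * 2 */
	uint64_t tb_total;	/* total time */
	uint64_t tb_min;	/* minimum time */
	uint64_t tb_max;	/* maximum time */
};

/* Record one sample.  Assumes the accumulator starts zero-initialized. */
static void tb_accumulate(struct tb_accumulator *acc, uint64_t ticks)
{
	acc->seqcount++;		/* now odd: update in progress */
	acc->tb_total += ticks;
	if (acc->seqcount == 1 || ticks < acc->tb_min)
		acc->tb_min = ticks;	/* first sample or a new minimum */
	if (ticks > acc->tb_max)
		acc->tb_max = ticks;
	acc->seqcount++;		/* even again: update complete */
}

/* Number of samples recorded so far. */
static uint64_t tb_count(const struct tb_accumulator *acc)
{
	return acc->seqcount / 2;
}
```

The count, total, minimum and maximum are the four numbers printed per
line in the timings file, after conversion from ticks to nanoseconds.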

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h |  19 +
 arch/powerpc/include/asm/time.h |   3 +
 arch/powerpc/kernel/asm-offsets.c   |  11 +++
 arch/powerpc/kernel/time.c  |   6 ++
 arch/powerpc/kvm/book3s_hv.c| 135 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 -
 6 files changed, 276 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f1d0bbc..286c0ce 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -369,6 +369,14 @@ struct kvmppc_slb {
u8 base_page_size;  /* MMU_PAGE_xxx */
 };
 
+/* Struct used to accumulate timing information in HV real mode code */
+struct kvmhv_tb_accumulator {
+   u64 seqcount;   /* used to synchronize access, also count * 2 */
+   u64 tb_total;   /* total time in timebase ticks */
+   u64 tb_min; /* min time */
+   u64 tb_max; /* max time */
+};
+
 # ifdef CONFIG_PPC_FSL_BOOK3E
 #define KVMPPC_BOOKE_IAC_NUM   2
 #define KVMPPC_BOOKE_DAC_NUM   2
@@ -656,6 +664,17 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
 
u32 emul_inst;
+
+   struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
+   u64 cur_tb_start;   /* when it started */
+   struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
+   struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
+   struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
+   struct kvmhv_tb_accumulator guest_time; /* guest execution */
+   struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+
+   struct dentry *debugfs_dir;
+   struct dentry *debugfs_timings;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 03cbada..10fc784 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+/* Convert timebase ticks to nanoseconds */
+unsigned long long tb_to_ns(unsigned long long tb_ticks);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4717859..ec9f59c 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -458,6 +458,17 @@ int main(void)
DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
+   DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
+   DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
+   DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
+   DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
+   DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
+   DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
+   DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, arch.cur_tb_start));
+   DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount));
+   DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total));
+   DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min));
+   DEFINE(TAS_MAX, offsetof(struct kvmhv_tb_accumulator, tb_max));
 #endif
DEFINE(VCPU_SHARED_SPRG3, offsetof(struct kvm_vcpu_arch_shared, sprg3));
DEFINE(VCPU_SHARED_SPRG4, offsetof(struct kvm_vcpu_arch_shared, sprg4));
diff --git

Re: Hardware support for vt-posted interrupts described in vt-directed-io-spec for assigned devices

2015-03-20 Thread Eric Auger

On 03/20/2015 03:04 PM, Alex Williamson wrote:
> On Fri, 2015-03-20 at 15:24 +0530, bk rakesh wrote:
>> Adding few more information regarding the setup which i had created to
>> test the vt-d posted interrupts for assigned devices,
>>
>> Hardware used for evaluating vt-posted interrupts
>> cpu E5-2620 v2 @ 2.10GHz and S2600CP server board
>>
>> I had used kernel-3.18 patched with KVM-VFIO IRQ forward
>> control(posted by eric.au...@linaro.org),
>
> IRQ forwarding is an ARM technology for handling level triggered
> interrupts, not Intel, not even x86.

Hi Alex,

Feng's series relies on a few patches from the KVM-VFIO IRQ forward
control series, basically for the introduction of the KVM_DEV_VFIO_DEVICE
group in KVM-VFIO. I guess this explains why you find some references to
that series here.

Best Regards

Eric

>
>> hierarchy irqdomain(posted
>> by jiang@linux.intel.com) and VT-d Posted-Interrupts
>> support(http://lwn.net/Articles/626050/) and assigned the ixgbe 10G
>> NIC via vfio passthrough using qemu-kvm, But resulted in the following
>> dmesg output,
>>
>> [233783.657187] dmar: DRHD: handling fault status reg 602
>> [233783.662926] dmar: INTR-REMAP: Request device [[02:00.0] fault index 47
>> INTR-REMAP:[fault reason 36] Detected reserved fields in the IRTE entry
>
> This suggests bugs in the patch series for setting bits that are
> reserved on the hardware in your test system.
>
>> I had checked the hardware supported for posted interrupt capability
>> via capability register bit 59 (#define cap_pi_support(c) (((c) >> 59) & 1)),
>> as described in
>> http://www.intel.com/content/www/us/en/embedded/technology/virtualization/vt-directed-io-spec.html,
>> Which resulted as not supported, Can anyone suggest that does this hw
>> support posted vt-d feature ?
>
> Your own hardware is telling you that it doesn't support it.
>
>> if not then which one to use.
>
> Personally I would have no expectation that any currently shipping
> hardware supports this feature.  If you watch one of GregKH's talks on
> how the Linux community works or follow development for a while, you'll
> see and hear that Intel will often pre-enable features before the
> hardware that supports it is available.  I suspect this is one of those
> features.  Thanks,
>
> Alex
>

Re: [PATCH] KVM: x86: call irq notifiers with directed EOI

2015-03-20 Thread Radim Krčmář

2015-03-19 18:44-0300, Marcelo Tosatti:
> On Wed, Mar 18, 2015 at 07:38:22PM +0100, Radim Krčmář wrote:
> > kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
> > We need to do that for irq notifiers.  (Like with edge interrupts.)
> >
> > Fix it by skipping EOI broadcast only.
> >
> > Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211
> > Signed-off-by: Radim Krčmář rkrc...@redhat.com
> > ---
> > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> > @@ -443,7 +444,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> > -   if (trigger_mode != IOAPIC_LEVEL_TRIG)
> > +   if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> > +   kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >     continue;
>
> Don't you have to handle kvm_ioapic_eoi_inject_work as well?

It works without that: ent->fields.remote_irr == 1, thus
kvm_ioapic_eoi_inject_work() will do nothing.
Adding a check would be better for clarity, though.

We could add the EOI register (implement IO-APIC version 0x20), because
kernels are forced to do ugly hacks otherwise (switching to
edge-triggered mode and back).
We also clear remote_irr on a different occasion (just a write to
ioreg).

I'll take a closer look at the second one.

> >   ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>
> This assert can now fail?

I think it can't (nothing changed), but that is how asserts should be.
It checks a different variable than the condition above.
('trigger_mode' is sourced from APIC_TMR, which should correctly match
 'ent->fields.trig_mode'.)

The assert would be more useful before 'continue;', and modified:
  ASSERT(ent->fields.trig_mode == trigger_mode)

Thanks for the review, I'll incorporate your comments into v2.

Re: Hardware support for vt-posted interrupts described in vt-directed-io-spec for assigned devices

2015-03-20 Thread Alex Williamson

On Fri, 2015-03-20 at 15:24 +0530, bk rakesh wrote:
> Adding few more information regarding the setup which i had created to
> test the vt-d posted interrupts for assigned devices,
>
> Hardware used for evaluating vt-posted interrupts
> cpu E5-2620 v2 @ 2.10GHz and S2600CP server board
>
> I had used kernel-3.18 patched with KVM-VFIO IRQ forward
> control(posted by eric.au...@linaro.org),

IRQ forwarding is an ARM technology for handling level triggered
interrupts, not Intel, not even x86.

> hierarchy irqdomain(posted
> by jiang@linux.intel.com) and VT-d Posted-Interrupts
> support(http://lwn.net/Articles/626050/) and assigned the ixgbe 10G
> NIC via vfio passthrough using qemu-kvm, But resulted in the following
> dmesg output,
>
> [233783.657187] dmar: DRHD: handling fault status reg 602
> [233783.662926] dmar: INTR-REMAP: Request device [[02:00.0] fault index 47
> INTR-REMAP:[fault reason 36] Detected reserved fields in the IRTE entry

This suggests bugs in the patch series for setting bits that are
reserved on the hardware in your test system.

 I had checked the hardware support for the posted interrupt capability
 via capability register bit 59 (#define cap_pi_support(c) (((c) >> 59) & 1)),
 as described in
 http://www.intel.com/content/www/us/en/embedded/technology/virtualization/vt-directed-io-spec.html,
 which reported it as not supported. Can anyone say whether this hardware
 supports the VT-d posted interrupt feature?

Your own hardware is telling you that it doesn't support it.
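
As a minimal sketch of the bit-59 test quoted above: the macro is the one
from the question, while parse_cap() and the sample values in the usage
note are hypothetical and only there to exercise it. How the capability
register value is obtained on a given system is left out on purpose.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Posted-interrupt support is reported in bit 59 of the VT-d capability register. */
#define cap_pi_support(c)  (((c) >> 59) & 1)

/* Hypothetical helper: parse a capability value given as a hex string. */
static uint64_t parse_cap(const char *hex)
{
	assert(hex != NULL);
	return strtoull(hex, NULL, 16);
}
```

For example, cap_pi_support(parse_cap("0800000000000000")) evaluates to 1,
while any value that fits in 56 bits, such as the made-up
"d2008c10ef0466", evaluates to 0.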

  if not then which one to use.

Personally I would have no expectation that any currently shipping
hardware supports this feature.  If you watch one of GregKH's talks on
how the Linux community works or follow development for a while, you'll
see and hear that Intel will often pre-enable features before the
hardware that supports it is available.  I suspect this is one of those
features.  Thanks,

Alex


Re: Hardware support for vt-posted interrupts described in vt-directed-io-spec for assigned devices

2015-03-20 Thread Alex Williamson

On Fri, 2015-03-20 at 15:10 +0100, Eric Auger wrote:
 On 03/20/2015 03:04 PM, Alex Williamson wrote:
  On Fri, 2015-03-20 at 15:24 +0530, bk rakesh wrote:
  Adding few more information regarding the setup which i had created to
  test the vt-d posted interrupts for assigned devices,
 
  Hardware used for evaluating vt-posted interrupts
  cpu E5-2620 v2 @ 2.10GHz and S2600CP server board
 
  I had used kernel-3.18 patched with KVM-VFIO IRQ forward
  control(posted by eric.au...@linaro.org),
  
  IRQ forwarding is an ARM technology for handling level-triggered
  interrupts, not Intel, not even x86.
 
 Hi Alex,
 
 Feng's series relies on a few patch files in KVM-VFIO IRQ forward
 control, basically for KVM_DEV_VFIO_DEVICE group introduction in
 KVM-VFIO. This explains why you find some references to that patch here,
 I guess.

Ah yes, makes sense.  Thanks,

Alex


[PATCH v2] kvm: avoid page allocation failure in kvm_set_memory_region()

2015-03-20 Thread Igor Mammedov

A KVM guest can fail to start up with the following trace on the host:

qemu-system-x86: page allocation failure: order:4, mode:0x40d0
Call Trace:
  dump_stack+0x47/0x67
  warn_alloc_failed+0xee/0x150
  __alloc_pages_direct_compact+0x14a/0x150
  __alloc_pages_nodemask+0x776/0xb80
  alloc_kmem_pages+0x3a/0x110
  kmalloc_order+0x13/0x50
  kmemdup+0x1b/0x40
  __kvm_set_memory_region+0x24a/0x9f0 [kvm]
  kvm_set_ioapic+0x130/0x130 [kvm]
  kvm_set_memory_region+0x21/0x40 [kvm]
  kvm_vm_ioctl+0x43f/0x750 [kvm]

The failure happens when attempting to allocate pages for
'struct kvm_memslots'. However, the structure doesn't have to be
present in physically contiguous (kmalloc-ed) address
space, so change the allocation to kvm_kvzalloc() so that
it will be vmalloc-ed when its size is more than a page.

Signed-off-by: Igor Mammedov imamm...@redhat.com
---
v2:
 - alloc initial memslots with vmalloc
 - use kvfree in every place where memslots are freed

TODO:
 - work on follow up patches to allocate space for
   actual amount of memory_slots instead of possible maximum.
---
 virt/kvm/kvm_main.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a2214d9..cc6a25d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -471,7 +471,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
 
 	r = -ENOMEM;
-	kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+	kvm->memslots = kvm_kvzalloc(sizeof(struct kvm_memslots));
 	if (!kvm->memslots)
 		goto out_err_no_srcu;
 
@@ -522,7 +522,7 @@ out_err_no_srcu:
 out_err_no_disable:
 	for (i = 0; i < KVM_NR_BUSES; i++)
 		kfree(kvm->buses[i]);
-	kfree(kvm->memslots);
+	kvfree(kvm->memslots);
 	kvm_arch_free_vm(kvm);
 	return ERR_PTR(r);
 }
@@ -578,7 +578,7 @@ static void kvm_free_physmem(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, slots)
 		kvm_free_physmem_slot(kvm, memslot, NULL);
 
-	kfree(kvm->memslots);
+	kvfree(kvm->memslots);
 }
 
 static void kvm_destroy_devices(struct kvm *kvm)
@@ -871,10 +871,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
 			goto out_free;
 	}
 
-	slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
-			GFP_KERNEL);
+	slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
 	if (!slots)
 		goto out_free;
+	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 
 	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
 		slot = id_to_memslot(slots, mem->slot);
@@ -917,7 +917,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	kvm_arch_commit_memory_region(kvm, mem, old, change);
 
 	kvm_free_physmem_slot(kvm, old, new);
-	kfree(old_memslots);
+	kvfree(old_memslots);
 
 	/*
 	 * IOMMU mapping:  New slots need to be mapped.  Old slots need to be
@@ -936,7 +936,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	return 0;
 
 out_slots:
-	kfree(slots);
+	kvfree(slots);
 out_free:
 	kvm_free_physmem_slot(kvm, new, old);
 out:
-- 
1.8.3.1
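
To make the "order:4" in the trace concrete: an order-n allocation asks
for 2^n physically contiguous pages, so order 4 means 16 pages (64 KiB if
pages are 4 KiB). The following is a rough user-space sketch of that
arithmetic and of the "vmalloc only above one page" rule from the commit
message. It assumes 4 KiB pages; alloc_order() and wants_vmalloc() are
hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages, as on x86 */

/* Smallest order such that (PAGE_SIZE_BYTES << order) >= size. */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;
	size_t span = PAGE_SIZE_BYTES;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Mirrors the rule in the commit message: fall back to vmalloc above one page. */
static int wants_vmalloc(size_t size)
{
	return size > PAGE_SIZE_BYTES;
}
```

Any structure larger than eight pages lands in order 4 under these
assumptions, which is the kind of request that becomes hard to satisfy
once physical memory is fragmented.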


Re: [PATCH 08/12] KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC

2015-03-20 Thread Christoffer Dall

On Fri, Mar 20, 2015 at 12:40:02PM +, Andre Przywara wrote:
 On 03/19/2015 03:44 PM, Andre Przywara wrote:
  Hej Christoffer,
  
 
 [ ... ]
 
  +static int vgic_handle_mmio_access(struct kvm_vcpu *vcpu,
  +				   struct kvm_io_device *this, gpa_t addr,
  +				   int len, void *val, bool is_write)
  +{
  +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
  +	struct vgic_io_device *iodev = container_of(this,
  +						    struct vgic_io_device, dev);
  +	struct kvm_run *run = vcpu->run;
  +	const struct vgic_io_range *range;
  +	struct kvm_exit_mmio mmio;
  +	bool updated_state;
  +	gpa_t offset;
  +
  +	offset = addr - iodev->addr;
  +	range = vgic_find_range(iodev->reg_ranges, len, offset);
  +	if (unlikely(!range || !range->handle_mmio)) {
  +		pr_warn("Unhandled access %d %08llx %d\n", is_write, addr, len);
  +		return -ENXIO;
  +	}
  +
  +	mmio.phys_addr = addr;
  +	mmio.len = len;
  +	mmio.is_write = is_write;
  +	if (is_write)
  +		memcpy(mmio.data, val, len);
  +	mmio.private = iodev->redist_vcpu;
  +
  +	spin_lock(&dist->lock);
  +	offset -= range->base;
  +	if (vgic_validate_access(dist, range, offset)) {
  +		updated_state = call_range_handler(vcpu, &mmio, offset, range);
  +		if (!is_write)
  +			memcpy(val, mmio.data, len);
  +	} else {
  +		if (!is_write)
  +			memset(val, 0, len);
  +		updated_state = false;
  +	}
  +	spin_unlock(&dist->lock);
  +	kvm_prepare_mmio(run, &mmio);
 
  we're not the only user of kvm_exit_mmio I believe, so we could rename
  
  (assuming you mean we _are_ the only user here, which I can acknowledge)
  
  this to vgic_io as well and you could change the mmio.data array to be a
  void *val pointer, which just gets set to the pointer passed into this
  function (which I think points to the kvm_run structs data array) and
  you can avoid all these memcopies, right?
  
  That sounds indeed tempting, but the comment on the struct kvm_exit_mmio
  declaration reads:
  /*
   * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
   * which is an anonymous type. Use our own type instead.
   */
  How I understand this the structure was introduced to _not_ use the same
  memory, but use a copy instead. Do you remember any reason for this? And
  in how far is this type anonymous? It's even in an uapi header.
  
  Briefly looking at the code we do quite some memcpy on the way.
  I am about to go all the way down into that ARM MMIO handling cave now
  to check this (Marc, if I am not showing up again after some hours,
  please come and rescue me ;-)
 
 So, I feel that there is quite some unneeded copying and masking on the
 way, but a real fix would be quite invasive and needs quite some testing
 and review. I don't feel like rushing this into a v2 of this series.
 I quickly did what you proposed (replacing memcpy by pointer
 assignment), and that seems to work, but I don't have many chances of
 testing this this weekend, since I am on the road. Also I have to dig
 out my cross-endian test scripts first. So not sure if you want to take
 the risk with this series.
 I changed the other minor points you mentioned in the review though, so
 do you want to have a v1.5 or how do we proceed from here?
 
Why is it so invasive?  See my previous mail.  If you can convince me
that we're touching something truly nasty (code snippet?) then ok, we
can take a version without the cleanup and clean up later.
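
As a rough user-space illustration of the two shapes being weighed here
(a private data array that is copied in and out, versus a pointer to the
caller's buffer), consider the sketch below. All names in it
(exit_mmio_copy, exit_mmio_ptr, fake_reg, access_with_copy,
access_with_ptr) are hypothetical stand-ins and not the kernel's
definitions; it says nothing about endianness handling or about who else
relies on the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct exit_mmio_copy {
	uint64_t phys_addr;
	uint8_t  data[8];   /* private bounce buffer */
	uint32_t len;
	bool     is_write;
};

struct exit_mmio_ptr {
	uint64_t phys_addr;
	void    *data;      /* points at the caller's buffer */
	uint32_t len;
	bool     is_write;
};

static uint8_t fake_reg[8];  /* pretend device register */

/* Variant 1: bounce through a private copy, as in the quoted patch. */
static void access_with_copy(void *val, uint32_t len, bool is_write)
{
	struct exit_mmio_copy mmio = { .len = len, .is_write = is_write };

	assert(len <= sizeof(mmio.data));
	if (is_write) {
		memcpy(mmio.data, val, len);
		memcpy(fake_reg, mmio.data, len);
	} else {
		memcpy(mmio.data, fake_reg, len);
		memcpy(val, mmio.data, len);
	}
}

/* Variant 2: the handler works on the caller's buffer directly. */
static void access_with_ptr(void *val, uint32_t len, bool is_write)
{
	struct exit_mmio_ptr mmio = { .data = val, .len = len, .is_write = is_write };

	assert(len <= sizeof(fake_reg));
	if (is_write)
		memcpy(fake_reg, mmio.data, len);
	else
		memcpy(mmio.data, fake_reg, len);
}
```

Both variants produce the same register contents and read-back values in
this toy setting; the second one simply does one copy per access instead
of two.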

If you're out of time, send me whatever you have, and we'll see if I fix
it up further or just leave it for the future.

Thanks,
-Christoffer

Re: [PATCH 08/12] KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC

2015-03-20 Thread Andre Przywara

On 03/19/2015 03:44 PM, Andre Przywara wrote:
 Hej Christoffer,
 

[ ... ]

 +static int vgic_handle_mmio_access(struct kvm_vcpu *vcpu,
 +				   struct kvm_io_device *this, gpa_t addr,
 +				   int len, void *val, bool is_write)
 +{
 +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 +	struct vgic_io_device *iodev = container_of(this,
 +						    struct vgic_io_device, dev);
 +	struct kvm_run *run = vcpu->run;
 +	const struct vgic_io_range *range;
 +	struct kvm_exit_mmio mmio;
 +	bool updated_state;
 +	gpa_t offset;
 +
 +	offset = addr - iodev->addr;
 +	range = vgic_find_range(iodev->reg_ranges, len, offset);
 +	if (unlikely(!range || !range->handle_mmio)) {
 +		pr_warn("Unhandled access %d %08llx %d\n", is_write, addr, len);
 +		return -ENXIO;
 +	}
 +
 +	mmio.phys_addr = addr;
 +	mmio.len = len;
 +	mmio.is_write = is_write;
 +	if (is_write)
 +		memcpy(mmio.data, val, len);
 +	mmio.private = iodev->redist_vcpu;
 +
 +	spin_lock(&dist->lock);
 +	offset -= range->base;
 +	if (vgic_validate_access(dist, range, offset)) {
 +		updated_state = call_range_handler(vcpu, &mmio, offset, range);
 +		if (!is_write)
 +			memcpy(val, mmio.data, len);
 +	} else {
 +		if (!is_write)
 +			memset(val, 0, len);
 +		updated_state = false;
 +	}
 +	spin_unlock(&dist->lock);
 +	kvm_prepare_mmio(run, &mmio);

 we're not the only user of kvm_exit_mmio I believe, so we could rename
 
 (assuming you mean we _are_ the only user here, which I can acknowledge)
 
 this to vgic_io as well and you could change the mmio.data array to be a
 void *val pointer, which just gets set to the pointer passed into this
 function (which I think points to the kvm_run structs data array) and
 you can avoid all these memcopies, right?
 
 That sounds indeed tempting, but the comment on the struct kvm_exit_mmio
 declaration reads:
 /*
  * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
  * which is an anonymous type. Use our own type instead.
  */
 How I understand this the structure was introduced to _not_ use the same
 memory, but use a copy instead. Do you remember any reason for this? And
 in how far is this type anonymous? It's even in an uapi header.
 
 Briefly looking at the code we do quite some memcpy on the way.
 I am about to go all the way down into that ARM MMIO handling cave now
 to check this (Marc, if I am not showing up again after some hours,
 please come and rescue me ;-)

So, I feel that there is quite some unneeded copying and masking on the
way, but a real fix would be quite invasive and needs quite some testing
and review. I don't feel like rushing this into a v2 of this series.
I quickly did what you proposed (replacing memcpy by pointer
assignment), and that seems to work, but I don't have many chances of
testing this this weekend, since I am on the road. Also I have to dig
out my cross-endian test scripts first. So not sure if you want to take
the risk with this series.
I changed the other minor points you mentioned in the review though, so
do you want to have a v1.5 or how do we proceed from here?

Cheers,
Andre.


Re: [PATCH 08/12] KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGIC

2015-03-20 Thread Christoffer Dall

On Thu, Mar 19, 2015 at 03:44:51PM +, Andre Przywara wrote:
 Hej Christoffer,
 
 On 14/03/15 14:27, Christoffer Dall wrote:
  On Fri, Mar 13, 2015 at 04:10:08PM +, Andre Przywara wrote:
  Currently we use a lot of VGIC specific code to do the MMIO
  dispatching.
  Use the previous reworks to add kvm_io_bus style MMIO handlers.
 
  Those are not yet called by the MMIO abort handler, also the actual
  VGIC emulator function do not make use of it yet, but will be enabled
  with the following patches.
 
  Signed-off-by: Andre Przywara andre.przyw...@arm.com
  ---
   include/kvm/arm_vgic.h |9 
   virt/kvm/arm/vgic.c|  111 
  
   virt/kvm/arm/vgic.h|7 +++
   3 files changed, 127 insertions(+)
 
  diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
  index b81630b..4bfc6a3 100644
  --- a/include/kvm/arm_vgic.h
  +++ b/include/kvm/arm_vgic.h
  @@ -24,6 +24,7 @@
   #include <linux/irqreturn.h>
   #include <linux/spinlock.h>
   #include <linux/types.h>
  +#include <kvm/iodev.h>
 
   #define VGIC_NR_IRQS_LEGACY	256
   #define VGIC_NR_SGIS		16
  @@ -147,6 +148,14 @@ struct vgic_vm_ops {
   	int	(*map_resources)(struct kvm *, const struct vgic_params *);
   };
 
  +struct vgic_io_device {
  +	gpa_t addr;
  +	int len;
  +	const struct vgic_io_range *reg_ranges;
  +	struct kvm_vcpu *redist_vcpu;
  +	struct kvm_io_device dev;
  +};
  +
   struct vgic_dist {
   	spinlock_t		lock;
   	bool			in_kernel;
  diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
  index 7aae19b..71389b8 100644
  --- a/virt/kvm/arm/vgic.c
  +++ b/virt/kvm/arm/vgic.c
  @@ -32,6 +32,8 @@
   #include <asm/kvm_arm.h>
   #include <asm/kvm_mmu.h>
   #include <trace/events/kvm.h>
  +#include <asm/kvm.h>
  +#include <kvm/iodev.h>
 
   /*
    * How the whole thing works (courtesy of Christoffer Dall):
  @@ -774,6 +776,66 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run *run,
   }
 
   /**
  + * vgic_handle_mmio_access - handle an in-kernel MMIO access
  + * This is called by the read/write KVM IO device wrappers below.
  + * @vcpu:	pointer to the vcpu performing the access
  + * @this:	pointer to the KVM IO device in charge
  + * @addr:	guest physical address of the access
  + * @len:	size of the access
  + * @val:	pointer to the data region
  + * @is_write:	read or write access
  + *
  + * returns true if the MMIO access could be performed
  + */
  +static int vgic_handle_mmio_access(struct kvm_vcpu *vcpu,
  +				   struct kvm_io_device *this, gpa_t addr,
  +				   int len, void *val, bool is_write)
  +{
  +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
  +	struct vgic_io_device *iodev = container_of(this,
  +						    struct vgic_io_device, dev);
  +	struct kvm_run *run = vcpu->run;
  +	const struct vgic_io_range *range;
  +	struct kvm_exit_mmio mmio;
  +	bool updated_state;
  +	gpa_t offset;
  +
  +	offset = addr - iodev->addr;
  +	range = vgic_find_range(iodev->reg_ranges, len, offset);
  +	if (unlikely(!range || !range->handle_mmio)) {
  +		pr_warn("Unhandled access %d %08llx %d\n", is_write, addr, len);
  +		return -ENXIO;
  +	}
  +
  +	mmio.phys_addr = addr;
  +	mmio.len = len;
  +	mmio.is_write = is_write;
  +	if (is_write)
  +		memcpy(mmio.data, val, len);
  +	mmio.private = iodev->redist_vcpu;
  +
  +	spin_lock(&dist->lock);
  +	offset -= range->base;
  +	if (vgic_validate_access(dist, range, offset)) {
  +		updated_state = call_range_handler(vcpu, &mmio, offset, range);
  +		if (!is_write)
  +			memcpy(val, mmio.data, len);
  +	} else {
  +		if (!is_write)
  +			memset(val, 0, len);
  +		updated_state = false;
  +	}
  +	spin_unlock(&dist->lock);
  +	kvm_prepare_mmio(run, &mmio);
 
  we're not the only user of kvm_exit_mmio I believe, so we could rename
 
 (assuming you mean we _are_ the only user here, which I can acknowledge)
 

yes, I think I wanted to write "now", not "not".

  this to vgic_io as well and you could change the mmio.data array to be a
  void *val pointer, which just gets set to the pointer passed into this
  function (which I think points to the kvm_run structs data array) and
  you can avoid all these memcopies, right?
 
 That sounds indeed tempting, but the comment on the struct kvm_exit_mmio
 declaration reads:
 /*
  * The in-kernel MMIO emulation code wants to use a copy of run->mmio,
  * which is an anonymous type. Use our own type instead.
  */
 How I understand this the structure was introduced to _not_ use the same
 memory, but use a copy instead. Do you remember any reason for this? And
 in how far is this type anonymous? It's even in an uapi header.

the kvm exit api is designed without regard for

Re: [Patch v5] x86: irq_comm: Add check for RH bit in kvm_set_msi_irq

2015-03-20 Thread Radim Krčmář

2015-03-19 16:51-0600, James Sullivan:
 I played around with native_compose_msi_msg and discovered the following:
 
 * dm=0, rh=0 = Physical Destination Mode
 * dm=0, rh=1 = Failed delivery
 * dm=1, rh=0 = Logical Destination Mode, No Redirection
 * dm=1, rh=1 = Logical Destination Mode, Redirection

Great!  (What CPU family was that?)

 So it seems to be the case that logical destination mode is used whenever
 DM=1, regardless of RH. Furthermore, the case where DM=0 and RH=1 is
 undefined, as was indicated in the closing response to the thread in
 https://software.intel.com/en-us/forums/topic/23 :

DM=0+RH=1 might be defined to fail, but I think it's acceptable to
treat it as undefined.  (Deliver them in KVM if it improves something.)
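
The four observed combinations can be written down as a small decoder.
This is only a sketch of the table reported above: the bit positions (DM
in bit 2, RH in bit 3 of the MSI address) follow the x86 MSI address
layout as Linux defines it, while enum msi_dest_kind and msi_classify()
are hypothetical names, and the DM=0/RH=1 case is labelled "undefined"
purely because that is how it is being treated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the low 32 bits of the x86 MSI address. */
#define MSI_ADDR_DEST_MODE_SHIFT    2  /* DM */
#define MSI_ADDR_REDIRECTION_SHIFT  3  /* RH */

enum msi_dest_kind {
	MSI_DEST_PHYSICAL,          /* dm=0, rh=0 */
	MSI_DEST_UNDEFINED,         /* dm=0, rh=1: delivery failed in the test */
	MSI_DEST_LOGICAL,           /* dm=1, rh=0 */
	MSI_DEST_LOGICAL_REDIRECT,  /* dm=1, rh=1 */
};

static enum msi_dest_kind msi_classify(uint32_t addr_lo)
{
	unsigned int dm = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 1;
	unsigned int rh = (addr_lo >> MSI_ADDR_REDIRECTION_SHIFT) & 1;

	/* x86 MSI addresses live in the 0xFEExxxxx window. */
	assert((addr_lo >> 20) == 0xfeeu);

	if (dm)
		return rh ? MSI_DEST_LOGICAL_REDIRECT : MSI_DEST_LOGICAL;
	return rh ? MSI_DEST_UNDEFINED : MSI_DEST_PHYSICAL;
}
```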

I'm still wondering about last sentence from that link, the
parenthesised part to be exact,
  The reference to the APIC ID being 0xff is because 0xff is broadcast
  and lowest priority (what the RH bit really is for X86) is illegal
  with broadcast.

Can you also check if RH=1 does something to delivery mode?

Thanks.

Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-20 Thread Bharata B Rao

On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote:
 
 
 On 20.03.15 12:26, Paul Mackerras wrote:
  On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
 
 
  On 20.03.15 10:39, Paul Mackerras wrote:
  From: Bharata B Rao bhar...@linux.vnet.ibm.com
 
  Since KVM isn't equipped to handle closure of vcpu fd from userspace(QEMU)
  correctly, certain work arounds have to be employed to allow reuse of
  vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
  proposed workaround is to park the vcpu fd in userspace during cpu unplug
  and reuse it later during next hotplug.
 
  More details can be found here:
  KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
  QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
 
  In order to support this workaround with PowerPC KVM, don't create or
  initialize ICP if the vCPU is found to be already associated with an ICP.
 
  Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
  Signed-off-by: Paul Mackerras pau...@samba.org
 
  This probably makes some sense, but please make sure that user space has
  some way to figure out whether hotplug works at all.
  
  Bharata is working on the qemu side of all this, so I assume he has
  that covered.
 
 Well, so far the kernel doesn't expose anything he can query, so I
 suppose he just blindly assumes that older host kernels will randomly
 break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
 can check on.

I see that you have already taken this into your tree. I have an updated
patch to expose a CAP. If the below patch looks ok, then let me know how
you would prefer to take this patch in.

Regards,
Bharata.

KVM: PPC: BOOK3S: Allow reuse of vCPU object

From: Bharata B Rao bhar...@linux.vnet.ibm.com

Since KVM isn't equipped to handle closure of a vcpu fd from userspace (QEMU)
correctly, certain workarounds have to be employed to allow reuse of a
vcpu array slot in KVM during CPU hotplug/unplug from the guest. One such
proposed workaround is to park the vcpu fd in userspace during CPU unplug
and reuse it later during the next hotplug.

More details can be found here:
KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html

In order to support this workaround with PowerPC KVM, don't create or
initialize ICP if the vCPU is found to be already associated with an ICP.
User space (QEMU) can reuse the vCPU after checking for the availability
of KVM_CAP_SPAPR_REUSE_VCPU capability.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---
 arch/powerpc/kvm/book3s_xics.c |9 +++--
 arch/powerpc/kvm/powerpc.c |   12 
 include/uapi/linux/kvm.h   |1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index a4a8d9f..ead3a35 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu,
 		return -EPERM;
 	if (xics->kvm != vcpu->kvm)
 		return -EPERM;
-	if (vcpu->arch.irq_type)
-		return -EBUSY;
+
+	/*
+	 * If irq_type is already set, don't reinitialize but
+	 * return success allowing this vcpu to be reused.
+	 */
+	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
+		return 0;
 
 	r = kvmppc_xics_create_icp(vcpu, xcpu);
 	if (!r)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 27c0fac..5b7007c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -564,6 +564,18 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 #endif
+	case KVM_CAP_SPAPR_REUSE_VCPU:
+		/*
+		 * Kernel currently doesn't support closing of vCPU fd from
+		 * user space (QEMU) correctly. Hence the option available
+		 * is to park the vCPU fd in user space whenever a guest
+		 * CPU is hot removed and reuse the same later when another
+		 * guest CPU is hotplugged. This capability determines whether
+		 * it is safe to assume if parking of vCPU fd and reuse from
+		 * user space works for sPAPR guests.
+		 */
+		r = 1;
+		break;
 	default:
 		r = 0;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8055706..8464755 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -760,6 +760,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_ENABLE_HCALL 104
 #define KVM_CAP_CHECK_EXTENSION_VM 105
 #define KVM_CAP_S390_USER_SIGP 106
+#define KVM_CAP_SPAPR_REUSE_VCPU 107
 
 #ifdef KVM_CAP_IRQ_ROUTING
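
On the user-space side, a check for the proposed capability could look
roughly like the sketch below. KVM_CHECK_EXTENSION is the existing KVM
ioctl; kvm_has_cap() is a hypothetical helper, and the fallback value 107
is simply copied from this patch, so it is only meaningful on a kernel
that carries the patch with that number.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Proposed in the patch above; absent from unpatched kernel headers. */
#ifndef KVM_CAP_SPAPR_REUSE_VCPU
#define KVM_CAP_SPAPR_REUSE_VCPU 107
#endif

/*
 * Returns 1 if the kernel behind 'fd' (a /dev/kvm or VM file descriptor)
 * reports the capability, 0 if it does not or if the ioctl fails.
 */
static int kvm_has_cap(int fd, long cap)
{
	int r;

	assert(cap >= 0);
	r = ioctl(fd, KVM_CHECK_EXTENSION, cap);
	return r > 0;
}
```

A caller would only park and reuse vCPU file descriptors when
kvm_has_cap(fd, KVM_CAP_SPAPR_REUSE_VCPU) returns 1, and would otherwise
refuse CPU hot unplug.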
 

Re: [Patch v5] x86: irq_comm: Add check for RH bit in kvm_set_msi_irq

2015-03-20 Thread James Sullivan

On 03/20/2015 09:15 AM, Radim Krčmář wrote:
 2015-03-19 16:51-0600, James Sullivan:
 I played around with native_compose_msi_msg and discovered the following:

 * dm=0, rh=0 = Physical Destination Mode
 * dm=0, rh=1 = Failed delivery
 * dm=1, rh=0 = Logical Destination Mode, No Redirection
 * dm=1, rh=1 = Logical Destination Mode, Redirection
 
 Great!  (What CPU family was that?)
 

This was on Intel x86_64 (Core i5-3210m, 'Ivy Bridge').

 So it seems to be the case that logical destination mode is used whenever
 DM=1, regardless of RH. Furthermore, the case where DM=0 and RH=1 is
 undefined, as was indicated in the closing response to the thread in
 https://software.intel.com/en-us/forums/topic/23 :
 
 DM=0+RH=1 might be defined to fail, but I think it's acceptable to
 treat it as undefined.  (Deliver them in KVM if it improves something.)
 

My thoughts as well.

 I'm still wondering about last sentence from that link, the
 parenthesised part to be exact,
   The reference to the APIC ID being 0xff is because 0xff is broadcast
   and lowest priority (what the RH bit really is for X86) is illegal
   with broadcast.
 
 Can you also check if RH=1 does something to delivery mode?
 
 Thanks.
 

Sure, I'll look into that as well.

-James

Re: [PATCH] kvm: avoid page allocation failure in kvm_set_memory_region()

2015-03-20 Thread Marcelo Tosatti

On Fri, Mar 20, 2015 at 09:51:26AM +, Igor Mammedov wrote:
 KVM guest can fail to startup with following trace on host:
 
 qemu-system-x86: page allocation failure: order:4, mode:0x40d0
 Call Trace:
   dump_stack+0x47/0x67
   warn_alloc_failed+0xee/0x150
   __alloc_pages_direct_compact+0x14a/0x150
   __alloc_pages_nodemask+0x776/0xb80
   alloc_kmem_pages+0x3a/0x110
   kmalloc_order+0x13/0x50
   kmemdup+0x1b/0x40
   __kvm_set_memory_region+0x24a/0x9f0 [kvm]
   kvm_set_ioapic+0x130/0x130 [kvm]
   kvm_set_memory_region+0x21/0x40 [kvm]
   kvm_vm_ioctl+0x43f/0x750 [kvm]
 
 Failure happens when attempting to allocate pages for
 'struct kvm_memslots', however it doesn't have to be
 present in physically contiguous (kmalloc-ed) address
 space, change allocation to kvm_kvzalloc() so that
 it will be vmalloc-ed when its size is more than a page.
 
 Signed-off-by: Igor Mammedov imamm...@redhat.com

Igor, two things:

1) kvm_create_vm should also use vmalloc

	r = -ENOMEM;
	kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
	if (!kvm->memslots)
		goto out_err_no_srcu;

2) there are additional places where it's necessary to use the
proper freeing function, I believe:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ce7888a..651ff2d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -522,7 +522,7 @@ out_err_no_srcu:
 out_err_no_disable:
 	for (i = 0; i < KVM_NR_BUSES; i++)
 		kfree(kvm->buses[i]);
-	kfree(kvm->memslots);
+	kvfree(kvm->memslots);
 	kvm_arch_free_vm(kvm);
 	return ERR_PTR(r);
 }
@@ -570,7 +570,7 @@ static void kvm_free_physmem(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, slots)
 		kvm_free_physmem_slot(kvm, memslot, NULL);
 
-	kfree(kvm->memslots);
+	kvfree(kvm->memslots);
 }
 
 static void kvm_destroy_devices(struct kvm *kvm)
@@ -909,7 +922,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	kvm_arch_commit_memory_region(kvm, mem, old, change);
 
 	kvm_free_physmem_slot(kvm, old, new);
-	kfree(old_memslots);
+	kvfree(old_memslots);
 
 	/*
 	 * IOMMU mapping:  New slots need to be mapped.  Old slots need to be

 ---
 TODO:
  - work on follow up patches to allocate space for
actual amount of memory_slots instead of possible maximum.
 ---
  virt/kvm/kvm_main.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index a2214d9..7ed1f5c 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -871,10 +871,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
   goto out_free;
   }
  
 - slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
 - GFP_KERNEL);
 + slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
   if (!slots)
   goto out_free;
 + memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
  
   if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
   slot = id_to_memslot(slots, mem->slot);
 @@ -936,7 +936,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
   return 0;
  
  out_slots:
 - kfree(slots);
 + kvfree(slots);
  out_free:
   kvm_free_physmem_slot(kvm, new, old);
  out:
 -- 
 1.8.3.1
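
Editor's note on the "order:4" in the trace above: the page allocator hands out physically contiguous blocks of 2^order pages, so an order-4 request needs 16 contiguous pages. The userspace sketch below only reproduces that arithmetic. PAGE_SIZE_SKETCH and order_for_size() are names made up for illustration, and the 4 KiB page size is an assumption (the usual x86 value, not something stated in the thread).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size: 4 KiB, as is usual on x86. Not taken from the thread. */
#define PAGE_SIZE_SKETCH ((size_t)4096)

/*
 * Hypothetical helper: smallest order such that 2^order pages are
 * enough to hold `size` bytes, i.e. how many physically contiguous
 * pages a kmalloc-style allocation of that size would need.
 */
static unsigned int order_for_size(size_t size)
{
	size_t pages;
	unsigned int order = 0;

	assert(size > 0);

	/* Round up to whole pages without overflowing for large sizes. */
	pages = size / PAGE_SIZE_SKETCH + (size % PAGE_SIZE_SKETCH != 0);

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

Under these assumptions any size from 32769 to 65536 bytes lands in order 4, which matches the failing request in the trace. A vmalloc-backed allocation sidesteps the problem because it only needs individual pages that are contiguous in virtual address space.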

Re: [PATCH] kvm: avoid page allocation failure in kvm_set_memory_region()

2015-03-20 Thread Igor Mammedov

On Fri, 20 Mar 2015 08:59:03 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:

 On Fri, Mar 20, 2015 at 09:51:26AM +, Igor Mammedov wrote:
  KVM guest can fail to startup with following trace on host:
  
  qemu-system-x86: page allocation failure: order:4, mode:0x40d0
  Call Trace:
dump_stack+0x47/0x67
warn_alloc_failed+0xee/0x150
__alloc_pages_direct_compact+0x14a/0x150
__alloc_pages_nodemask+0x776/0xb80
alloc_kmem_pages+0x3a/0x110
kmalloc_order+0x13/0x50
kmemdup+0x1b/0x40
__kvm_set_memory_region+0x24a/0x9f0 [kvm]
kvm_set_ioapic+0x130/0x130 [kvm]
kvm_set_memory_region+0x21/0x40 [kvm]
kvm_vm_ioctl+0x43f/0x750 [kvm]
  
  Failure happens when attempting to allocate pages for
  'struct kvm_memslots'. However, it doesn't have to be
  present in physically contiguous (kmalloc-ed) address
  space, so change the allocation to kvm_kvzalloc() so that
  it will be vmalloc-ed when its size is more than a page.
  
  Signed-off-by: Igor Mammedov imamm...@redhat.com
 
 Igor, two things:
 
 1) kvm_create_vm should also use vmalloc
 
 r = -ENOMEM;
 kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 if (!kvm->memslots)
 goto out_err_no_srcu;
 
 2) there are additional places where it's necessary to use the
 proper freeing function, I believe:
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index ce7888a..651ff2d 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -522,7 +522,7 @@ out_err_no_srcu:
  out_err_no_disable:
   for (i = 0; i < KVM_NR_BUSES; i++)
   kfree(kvm->buses[i]);
 - kfree(kvm->memslots);
 + kvfree(kvm->memslots);
   kvm_arch_free_vm(kvm);
   return ERR_PTR(r);
  }
 @@ -570,7 +570,7 @@ static void kvm_free_physmem(struct kvm *kvm)
   kvm_for_each_memslot(memslot, slots)
   kvm_free_physmem_slot(kvm, memslot, NULL);
  
 - kfree(kvm->memslots);
 + kvfree(kvm->memslots);
  }
  
  static void kvm_destroy_devices(struct kvm *kvm)
 @@ -909,7 +922,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
   kvm_arch_commit_memory_region(kvm, mem, old, change);
  
   kvm_free_physmem_slot(kvm, old, new);
 - kfree(old_memslots);
 + kvfree(old_memslots);
  
   /*
* IOMMU mapping:  New slots need to be mapped.  Old slots need to be

Thanks for the review,
I'll respin the patch after fixing it up.


 
  ---
  TODO:
   - work on follow up patches to allocate space for
 actual amount of memory_slots instead of possible maximum.
  ---
   virt/kvm/kvm_main.c | 6 +++---
   1 file changed, 3 insertions(+), 3 deletions(-)
  
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index a2214d9..7ed1f5c 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -871,10 +871,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
  goto out_free;
  }
   
  -   slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
  -   GFP_KERNEL);
  +   slots = kvm_kvzalloc(sizeof(struct kvm_memslots));
  if (!slots)
  goto out_free;
  +   memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
   
  if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
  slot = id_to_memslot(slots, mem->slot);
  @@ -936,7 +936,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
  return 0;
   
   out_slots:
  -   kfree(slots);
  +   kvfree(slots);
   out_free:
  kvm_free_physmem_slot(kvm, new, old);
   out:
  -- 
  1.8.3.1


Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-20 Thread jacob jacob

On Thu, Mar 19, 2015 at 10:18 AM, Stefan Assmann sassm...@redhat.com wrote:
 On 19.03.2015 15:04, jacob jacob wrote:
 Hi Stefan,
 have you been able to get PCI passthrough working without any issues
 after the upgrade?

 My XL710 fails to transfer regular TCP traffic (netperf). If that works
 for you then you're already one step ahead of me. Afraid I can't help
 you there.

I have data transfer working when trying the test runs on the host
itself. Are you seeing problems when directly trying the TCP traffic
from the host itself?
The issues that I am seeing are specific to the case when the devices
are passed via PCI passthrough into the VM.

Any ideas whether this would be a kvm/qemu or i40e driver issue?
(Updating to the latest firmware and using latest i40e driver didn't
seem to help.)



   Stefan

 Thanks
 Jacob

 On Thu, Mar 19, 2015 at 4:15 AM, Stefan Assmann sassm...@redhat.com wrote:
 On 18.03.2015 23:06, Shannon Nelson wrote:
 On Wed, Mar 18, 2015 at 3:01 PM, Shannon Nelson
 shannon.nel...@intel.com wrote:


 On Wed, Mar 18, 2015 at 8:40 AM, jacob jacob opstk...@gmail.com wrote:

 On Wed, Mar 18, 2015 at 11:24 AM, Bandan Das b...@redhat.com wrote:

 Actually, Stefan suggests that support for this card is still sketchy
 and your best bet is to try out net-next
 http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git

 Also, could you please post more information about your hardware setup
 (chipset/processor/firmware version on the card etc) ?

 Host CPU : Model name: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz

 Manufacturer Part Number:  XL710QDA1BLK
 Ethernet controller: Intel Corporation Ethernet Controller XL710 for
 40GbE QSFP+ (rev 01)
  #ethtool -i enp9s0
 driver: i40e
 version: 1.2.6-k
 firmware-version: f4.22 a1.1 n04.24 e800013fd
 bus-info: :09:00.0
 supports-statistics: yes
 supports-test: yes
 supports-eeprom-access: yes
 supports-register-dump: yes
 supports-priv-flags: no


 Jacob,

 It looks like you're using a NIC with the e800013fd firmware from last
 summer, and from a separate message that you saw these issues with
 both the 1.2.2-k and the 1.2.37 version drivers.  I suggest the next
 step would be to update the NIC firmware as there are some performance
 and stability updates available that deal with similar issues.  Please
 see the Intel Networking support webpage at
 https://downloadcenter.intel.com/download/24769 and look for the
 NVMUpdatePackage.zip.  This should take care of several of the things
 Stefan might describe as sketchy :-).

 Interesting, the following might explain why my XL710 feels a bit
 sketchy then. ;-)
 # ethtool -i p4p1
 driver: i40e
 version: 1.2.37-k
 firmware-version: f4.22.26225 a1.1 n4.24 e12ef
 Looks like the firmware on this NIC is even older.

 I tried to update the firmware with nvmupdate64e and the first thing I
 noticed is that you cannot update the firmware even with today's Linux
 git. The tool errors out because it cannot access the NVM. Only with a
 recent net-next kernel was I able to update the firmware.
 ethtool -i p4p1
 driver: i40e
 version: 1.2.37-k
 firmware-version: f4.33.31377 a1.2 n4.42 e1932

 However during the update I got a lot of errors in dmesg.
 [  301.796664] i40e :82:00.0: ARQ Error: Unknown event 0x0702 received
 [  301.893933] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [  302.005223] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [...]
 [  387.884635] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [  387.896862] i40e :82:00.0: ARQ Overflow Error detected
 [  387.902995] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received
 [...]
 [  391.583799] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  391.714217] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  391.842656] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  391.973080] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  392.107586] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  392.244140] i40e :82:00.0: NVMUpdate write failed err=-53 status=0x0 errno=-16 module=70 offset=0x0 size=2
 [  392.373966] i40e :82:00.0: ARQ Error: Unknown event 0x0703 received

 Not sure if that flash was actually successful or not.

   Stefan


Re: [Patch v5] x86: irq_comm: Add check for RH bit in kvm_set_msi_irq

2015-03-20 Thread James Sullivan

On 03/20/2015 09:22 AM, James Sullivan wrote:
 On 03/20/2015 09:15 AM, Radim Krčmář wrote:
 2015-03-19 16:51-0600, James Sullivan:
 I played around with native_compose_msi_msg and discovered the following:

 * dm=0, rh=0 => Physical Destination Mode
 * dm=0, rh=1 => Failed delivery
 * dm=1, rh=0 => Logical Destination Mode, No Redirection
 * dm=1, rh=1 => Logical Destination Mode, Redirection

 Great!  (What CPU family was that?)

 
 This was on Intel x86_64 (Core i5-3210m, 'Ivy Bridge').
 
 So it seems to be the case that logical destination mode is used whenever
 DM=1, regardless of RH. Furthermore, the case where DM=0 and RH=1 is
 undefined, as was indicated in the closing response to the thread in
 https://software.intel.com/en-us/forums/topic/23 :

 DM=0+RH=1 might be defined to fail, but I think it's acceptable to
 treat it as undefined.  (Deliver them in KVM if it improves something.)

 
 My thoughts as well.
 
 I'm still wondering about last sentence from that link, the
 parenthesised part to be exact,
   The reference to the APIC ID being 0xff is because 0xff is broadcast
   and lowest priority (what the RH bit really is for X86) is illegal
   with broadcast.

 Can you also check if RH=1 does something to delivery mode?

 Thanks.

 
 Sure, I'll look into that as well.
 
 -James
 

I haven't seen any changes in the MSI Data Register for any values of RH,
but I don't have a great sample size (one machine with one set of PCI devices),
so if anyone else can confirm that I would appreciate it.

Worth noting that low prio delivery was used across the board for my PCI devices
regardless of RH=1 or 0, so it doesn't seem to be de facto the case that the RH
bit's only purpose is for lowprio delivery on x86. Again, need to have some more
PCI devices to test against to confirm anything.

-James
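
Editor's note: the four dm/rh combinations discussed in this thread can be written down as a small decoder for the x86 MSI address register. This is a minimal sketch, not KVM's code. The bit positions (0xFEE in bits 31:20, destination ID in bits 19:12, RH in bit 3, DM in bit 2) follow the MSI address layout in the Intel SDM rather than anything quoted above, and the enum and function names are made up for illustration. The classification mirrors what was observed on a single Ivy Bridge machine, so the "undefined" case in particular is an observation, not a specification.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_ADDR_RH_BIT (1u << 3)	/* Redirection Hint */
#define MSI_ADDR_DM_BIT (1u << 2)	/* Destination Mode */

/* Hypothetical labels for the four cases listed in the thread. */
enum msi_delivery_sketch {
	MSI_PHYSICAL,		/* dm=0, rh=0 */
	MSI_UNDEFINED,		/* dm=0, rh=1: delivery failed in the experiment */
	MSI_LOGICAL,		/* dm=1, rh=0: logical, no redirection */
	MSI_LOGICAL_REDIRECT,	/* dm=1, rh=1: logical, redirection */
};

static enum msi_delivery_sketch classify_msi_addr(uint32_t addr)
{
	int dm = !!(addr & MSI_ADDR_DM_BIT);
	int rh = !!(addr & MSI_ADDR_RH_BIT);

	/* x86 MSI addresses carry 0xFEE in the top twelve bits. */
	assert((addr >> 20) == 0xfeeu);

	if (dm)
		return rh ? MSI_LOGICAL_REDIRECT : MSI_LOGICAL;
	return rh ? MSI_UNDEFINED : MSI_PHYSICAL;
}

/* Destination ID field; 0xff is the broadcast ID mentioned in the thread. */
static uint8_t msi_dest_id(uint32_t addr)
{
	return (uint8_t)((addr >> 12) & 0xff);
}
```

The decoder checks DM first because, per the experiment above, logical destination mode applies whenever DM=1 regardless of RH.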

Windows 7 guest installer does not detect drive if physical partition used instead of disk file.

2015-03-20 Thread Emmanuel Noobadmin

Running
3.18.9-200.fc21.x86_64
qemu 2:2.1.3-3.fc21
libvirt 1.2.9.2-1.fc21
System is a Thinkpad X250 with Intel i7-5600u Broadwell GT2

I'm trying to replace the Win7 installation on my laptop with Fedora
21 and virtualizing Windows 7 for work purposes. I'd prefer to give
the guest its own NTFS partition instead of using a file for both
performance and ease of potential recovery.

So I've set aside unpartitioned space on the hard disk and added
/dev/sda to the virt-manager storage pool, created a new volume and
assigned it to the guest as an IDE drive. Unfortunately, the Windows 7
installer does not see this drive despite being IDE and not virtio.
If I use a qcow2 file as the drive, the installer has no problems
detecting it.

To eliminate virt-manager from the equation, I've also tried to do a
very basic install using virt-install, with similar results: the
physical partition cannot be detected regardless of bus type
(IDE/SATA/virtio), even with the signed Red Hat virtio drivers loaded by
the installer.

I was unable to find any similar issues or solutions online except a
2-year-old thread on linuxquestions, which quoted a statement that we
must specify the whole disk instead of a partition. However, I cannot
find the source of that quote.
http://www.linuxquestions.org/questions/linux-virtualization-and-cloud-90/qemu-kvm-on-a-real-partition-947162/

Is this really the case, and the reason why Windows 7 cannot see the
physical partition, or is there something else I am doing wrong?

[PATCH v3 2/2] drivers/vfio: Support EEH error injection

2015-03-20 Thread Gavin Shan

The patch adds one more EEH sub-command (VFIO_EEH_PE_INJECT_ERR)
to inject the specified EEH error, which is represented by
(struct vfio_eeh_pe_err), to the indicated PE for testing purpose.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 Documentation/vfio.txt| 12 
 drivers/vfio/vfio_spapr_eeh.c | 10 ++
 include/uapi/linux/vfio.h | 36 +++-
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 96978ec..c6e11a3 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -385,6 +385,18 @@ The code flow from the example above should be slightly changed:
 

 
+   /* Inject EEH error, which is expected to be caused by 32-bits
+* config load.
+*/
+   pe_op.op = VFIO_EEH_PE_INJECT_ERR;
+   pe_op.err.type = VFIO_EEH_ERR_TYPE_32;
+   pe_op.err.func = VFIO_EEH_ERR_FUNC_LD_CFG_ADDR;
+   pe_op.err.addr = 0ul;
+   pe_op.err.mask = 0ul;
+   ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+   
+
/* When 0xFF's returned from reading PCI config space or IO BARs
 * of the PCI device. Check the PE's state to see if that has been
 * frozen.
diff --git a/drivers/vfio/vfio_spapr_eeh.c b/drivers/vfio/vfio_spapr_eeh.c
index 5fa42db..38edeb4 100644
--- a/drivers/vfio/vfio_spapr_eeh.c
+++ b/drivers/vfio/vfio_spapr_eeh.c
@@ -85,6 +85,16 @@ long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
case VFIO_EEH_PE_CONFIGURE:
ret = eeh_pe_configure(pe);
break;
+   case VFIO_EEH_PE_INJECT_ERR:
+   minsz = offsetofend(struct vfio_eeh_pe_op, err.mask);
+   if (op.argsz < minsz)
+   return -EINVAL;
+   if (copy_from_user(&op, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   ret = eeh_pe_inject_err(pe, op.err.type, op.err.func,
+   op.err.addr, op.err.mask);
+   break;
default:
ret = -EINVAL;
}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 82889c3..f68e962 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -468,12 +468,23 @@ struct vfio_iommu_spapr_tce_info {
  * - unfreeze IO/DMA for frozen PE;
  * - read PE state;
  * - reset PE;
- * - configure PE.
+ * - configure PE;
+ * - inject EEH error.
  */
+struct vfio_eeh_pe_err {
+   __u32 type;
+   __u32 func;
+   __u64 addr;
+   __u64 mask;
+};
+
 struct vfio_eeh_pe_op {
__u32 argsz;
__u32 flags;
__u32 op;
+   union {
+   struct vfio_eeh_pe_err err;
+   };
 };
 
 #define VFIO_EEH_PE_DISABLE    0   /* Disable EEH functionality */
@@ -490,6 +501,29 @@ struct vfio_eeh_pe_op {
 #define VFIO_EEH_PE_RESET_HOT  6   /* Assert hot reset  */
 #define VFIO_EEH_PE_RESET_FUNDAMENTAL  7   /* Assert fundamental reset  */
 #define VFIO_EEH_PE_CONFIGURE  8   /* PE configuration  */
+#define VFIO_EEH_PE_INJECT_ERR 9   /* Inject EEH error  */
+#define  VFIO_EEH_ERR_TYPE_32  0   /* 32-bits EEH error type*/
+#define  VFIO_EEH_ERR_TYPE_64  1   /* 64-bits EEH error type*/
+#define  VFIO_EEH_ERR_FUNC_LD_MEM_ADDR 0   /* Memory load  */
+#define  VFIO_EEH_ERR_FUNC_LD_MEM_DATA 1
+#define  VFIO_EEH_ERR_FUNC_LD_IO_ADDR  2   /* IO load  */
+#define  VFIO_EEH_ERR_FUNC_LD_IO_DATA  3
+#define  VFIO_EEH_ERR_FUNC_LD_CFG_ADDR 4   /* Config load  */
+#define  VFIO_EEH_ERR_FUNC_LD_CFG_DATA 5
+#define  VFIO_EEH_ERR_FUNC_ST_MEM_ADDR 6   /* Memory store */
+#define  VFIO_EEH_ERR_FUNC_ST_MEM_DATA 7
+#define  VFIO_EEH_ERR_FUNC_ST_IO_ADDR  8   /* IO store */
+#define  VFIO_EEH_ERR_FUNC_ST_IO_DATA  9
+#define  VFIO_EEH_ERR_FUNC_ST_CFG_ADDR 10  /* Config store */
+#define  VFIO_EEH_ERR_FUNC_ST_CFG_DATA 11
+#define  VFIO_EEH_ERR_FUNC_DMA_RD_ADDR 12  /* DMA read */
+#define  VFIO_EEH_ERR_FUNC_DMA_RD_DATA 13
+#define  VFIO_EEH_ERR_FUNC_DMA_RD_MASTER   14
+#define  VFIO_EEH_ERR_FUNC_DMA_RD_TARGET   15
+#define  VFIO_EEH_ERR_FUNC_DMA_WR_ADDR 16  /* DMA write*/
+#define  VFIO_EEH_ERR_FUNC_DMA_WR_DATA 17
+#define  VFIO_EEH_ERR_FUNC_DMA_WR_MASTER   18
+#define  VFIO_EEH_ERR_FUNC_DMA_WR_TARGET   19
 
 #define VFIO_EEH_PE_OP _IO(VFIO_TYPE, VFIO_BASE + 21)
 
-- 
1.8.3.2
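
Editor's note: the argsz check in the VFIO_EEH_PE_INJECT_ERR case can be tried out in userspace. The sketch below is an illustration under stated assumptions, not the kernel code. The two structs are local copies of the layout in the patch with made-up names, OFFSETOFEND_SKETCH is a stand-in for the kernel's offsetofend(), and inject_argsz_ok() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the layouts from the patch (names are illustrative). */
struct pe_err_sketch {
	uint32_t type;
	uint32_t func;
	uint64_t addr;
	uint64_t mask;
};

struct pe_op_sketch {
	uint32_t argsz;
	uint32_t flags;
	uint32_t op;
	union {
		struct pe_err_sketch err;
	};
};

/* Stand-in for offsetofend(): offset of the first byte past MEMBER. */
#define OFFSETOFEND_SKETCH(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* err.mask is the last member, so here the minimum size is the whole struct. */
static_assert(OFFSETOFEND_SKETCH(struct pe_op_sketch, err.mask) ==
	      sizeof(struct pe_op_sketch),
	      "err.mask is expected to end the structure");

/* Returns 1 if the caller-supplied argsz covers everything up to err.mask. */
static int inject_argsz_ok(uint32_t argsz)
{
	size_t minsz = OFFSETOFEND_SKETCH(struct pe_op_sketch, err.mask);

	return argsz >= minsz;
}
```

A caller that only fills in argsz, flags and op would pass a 12-byte argsz and be rejected for this sub-command, since the error description would be missing.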


[PATCH v3 1/2] powerpc/eeh: Introduce eeh_pe_inject_err()

2015-03-20 Thread Gavin Shan

The patch defines PCI error types and functions in eeh.h and
exports function eeh_pe_inject_err(), which will be called by
VFIO driver to inject the specified PCI error to the indicated
PE for testing purpose.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h | 26 ++
 arch/powerpc/kernel/eeh.c  | 35 +++
 2 files changed, 61 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 61912fc..85a17de 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -194,6 +194,30 @@ enum {
 #define EEH_RESET_FUNDAMENTAL  3   /* Fundamental reset*/
 #define EEH_LOG_TEMP   1   /* EEH temporary error log  */
 #define EEH_LOG_PERM   2   /* EEH permanent error log  */
+#define EEH_ERR_TYPE_32            0   /* 32-bits PCI error */
+#define EEH_ERR_TYPE_64            1   /* 64-bits PCI error */
+#define EEH_ERR_FUNC_MIN           0
+#define EEH_ERR_FUNC_LD_MEM_ADDR   0   /* Memory load  */
+#define EEH_ERR_FUNC_LD_MEM_DATA   1
+#define EEH_ERR_FUNC_LD_IO_ADDR    2   /* IO load  */
+#define EEH_ERR_FUNC_LD_IO_DATA    3
+#define EEH_ERR_FUNC_LD_CFG_ADDR   4   /* Config load  */
+#define EEH_ERR_FUNC_LD_CFG_DATA   5
+#define EEH_ERR_FUNC_ST_MEM_ADDR   6   /* Memory store */
+#define EEH_ERR_FUNC_ST_MEM_DATA   7
+#define EEH_ERR_FUNC_ST_IO_ADDR    8   /* IO store */
+#define EEH_ERR_FUNC_ST_IO_DATA    9
+#define EEH_ERR_FUNC_ST_CFG_ADDR   10  /* Config store */
+#define EEH_ERR_FUNC_ST_CFG_DATA   11
+#define EEH_ERR_FUNC_DMA_RD_ADDR   12  /* DMA read */
+#define EEH_ERR_FUNC_DMA_RD_DATA   13
+#define EEH_ERR_FUNC_DMA_RD_MASTER 14
+#define EEH_ERR_FUNC_DMA_RD_TARGET 15
+#define EEH_ERR_FUNC_DMA_WR_ADDR   16  /* DMA write*/
+#define EEH_ERR_FUNC_DMA_WR_DATA   17
+#define EEH_ERR_FUNC_DMA_WR_MASTER 18
+#define EEH_ERR_FUNC_DMA_WR_TARGET 19
+#define EEH_ERR_FUNC_MAX   19
 
 struct eeh_ops {
char *name;
@@ -293,6 +317,8 @@ int eeh_pe_set_option(struct eeh_pe *pe, int option);
 int eeh_pe_get_state(struct eeh_pe *pe);
 int eeh_pe_reset(struct eeh_pe *pe, int option);
 int eeh_pe_configure(struct eeh_pe *pe);
+int eeh_pe_inject_err(struct eeh_pe *pe, int type, int func,
+ unsigned long addr, unsigned long mask);
 
 /**
  * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 76253eb..daa68a1 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1636,6 +1636,41 @@ int eeh_pe_configure(struct eeh_pe *pe)
 }
 EXPORT_SYMBOL_GPL(eeh_pe_configure);
 
+/**
+ * eeh_pe_inject_err - Injecting the specified PCI error to the indicated PE
+ * @pe: the indicated PE
+ * @type: error type
+ * @function: error function
+ * @addr: address
+ * @mask: address mask
+ *
+ * The routine is called to inject the specified PCI error, which
+ * is determined by @type and @function, to the indicated PE for
+ * testing purpose.
+ */
+int eeh_pe_inject_err(struct eeh_pe *pe, int type, int func,
+ unsigned long addr, unsigned long mask)
+{
+   /* Invalid PE ? */
+   if (!pe)
+   return -ENODEV;
+
+   /* Unsupported operation ? */
+   if (!eeh_ops || !eeh_ops->err_inject)
+   return -ENOENT;
+
+   /* Check on PCI error type */
+   if (type != EEH_ERR_TYPE_32 && type != EEH_ERR_TYPE_64)
+   return -EINVAL;
+
+   /* Check on PCI error function */
+   if (func < EEH_ERR_FUNC_MIN || func > EEH_ERR_FUNC_MAX)
+   return -EINVAL;
+
+   return eeh_ops->err_inject(pe, type, func, addr, mask);
+}
+EXPORT_SYMBOL_GPL(eeh_pe_inject_err);
+
 static int proc_eeh_show(struct seq_file *m, void *v)
 {
if (!eeh_enabled()) {
-- 
1.8.3.2
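
Editor's note: the order of checks in eeh_pe_inject_err() can be exercised outside the kernel with a small mock. This is a sketch under stated assumptions. struct pe_sketch, struct ops_sketch, inject_err_sketch() and fake_inject() are hypothetical stand-ins, and the ops table is passed as a parameter here, whereas the patch uses the global eeh_ops. The numeric limits are copied from the defines in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ERR_TYPE_32	0
#define ERR_TYPE_64	1
#define ERR_FUNC_MIN	0
#define ERR_FUNC_MAX	19

/* Stand-in for struct eeh_pe; the content does not matter here. */
struct pe_sketch {
	int id;
};

/* Stand-in for the single eeh_ops callback that the patch relies on. */
struct ops_sketch {
	int (*err_inject)(struct pe_sketch *pe, int type, int func,
			  unsigned long addr, unsigned long mask);
};

/* Same sequence of checks as in the patch: PE, backend, type, function. */
static int inject_err_sketch(const struct ops_sketch *ops,
			     struct pe_sketch *pe, int type, int func,
			     unsigned long addr, unsigned long mask)
{
	if (!pe)
		return -ENODEV;
	if (!ops || !ops->err_inject)
		return -ENOENT;
	if (type != ERR_TYPE_32 && type != ERR_TYPE_64)
		return -EINVAL;
	if (func < ERR_FUNC_MIN || func > ERR_FUNC_MAX)
		return -EINVAL;
	return ops->err_inject(pe, type, func, addr, mask);
}

/* Mock backend that accepts every request that reaches it. */
static int fake_inject(struct pe_sketch *pe, int type, int func,
		       unsigned long addr, unsigned long mask)
{
	/* The wrapper above never forwards a NULL PE. */
	assert(pe != NULL);
	(void)type;
	(void)func;
	(void)addr;
	(void)mask;
	return 0;
}
```

With this ordering a missing PE is reported as -ENODEV even when the arguments are also out of range, and a platform without an err_inject backend reports -ENOENT before any argument validation happens.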


[PATCH v3 0/2] EEH Error Injection Support for VFIO Devices

2015-03-20 Thread Gavin Shan

These two patches are an extension to EEH support for VFIO PCI devices,
which allows injecting EEH errors into VFIO PCI devices from userspace
for testing purposes.

Changelog
=========
v2 -> v3:
* Use offsetofend(), instead of sizeof(struct vfio_eeh_pe_op)
  to calculate argument buffer size.
v1 -> v2:
* Use EEH_ERR_FUNC_{MIN,MAX} to validate PCI error function.
* Put additional arguments for error injection to union in
  struct vfio_eeh_pe_op.

Gavin Shan (2):
  powerpc/eeh: Introduce eeh_pe_inject_err()
  drivers/vfio: Support EEH error injection

 Documentation/vfio.txt | 12 
 arch/powerpc/include/asm/eeh.h | 26 ++
 arch/powerpc/kernel/eeh.c  | 35 +++
 drivers/vfio/vfio_spapr_eeh.c  | 10 ++
 include/uapi/linux/vfio.h  | 36 +++-
 5 files changed, 118 insertions(+), 1 deletion(-)

-- 
1.8.3.2

