from:"Paul Mackerras"

[GIT PULL] Please pull my kvm-ppc-fixes branch

2015-12-09 Thread Paul Mackerras

Hi Paolo,

I have a small patch that I would like to get into 4.4 because it
fixes a bug which for certain kernel configs allows userspace to crash
the kernel.  The configs are those for which KVM_BOOK3S_64_HV is set
(y or m) and KVM_BOOK3S_64_PR is not.  Fortunately most distros that
enable KVM_BOOK3S_64_HV also enable KVM_BOOK3S_64_PR, as far as I can
tell.

Thanks,
Paul.

The following changes since commit 09922076003ad66de41ea14d2f8c3b4a16ec7774:

  Merge tag 'kvm-arm-for-v4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master (2015-12-04 18:32:32 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

for you to fetch changes up to c20875a3e638e4a03e099b343ec798edd1af5cc6:

  KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR (2015-12-10 11:34:27 +1100)

----
Paul Mackerras (1):
  KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR

 arch/powerpc/kvm/book3s_hv.c | 6 ++++++
 1 file changed, 6 insertions(+)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Increase memslots to 512

2015-12-09 Thread Paul Mackerras

On Wed, Dec 09, 2015 at 11:34:07AM +0100, Thomas Huth wrote:
> Only using 32 memslots for KVM on powerpc is way too low, you can
> nowadays hit this limit quite fast by adding a couple of PCI devices
> and/or pluggable memory DIMMs to the guest.
> 
> x86 already increased the KVM_USER_MEM_SLOTS to 509, to satisfy 256
> pluggable DIMM slots, 3 private slots and 253 slots for other things
> like PCI devices (i.e. resulting in 256 + 3 + 253 = 512 slots in
> total). We should do something similar for powerpc, and since we do
> not use private slots here, we can set the value to 512 directly.
> 
> While we're at it, also remove the KVM_MEM_SLOTS_NUM definition
> from the powerpc-specific header since this gets defined in the
> generic kvm_host.h header anyway.
> 
> Signed-off-by: Thomas Huth 

Thanks, applied to my kvm-ppc-next branch.

Paul.

Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-12-09 Thread Paul Mackerras

On Fri, Nov 20, 2015 at 09:11:45AM +0100, Thomas Huth wrote:
> In the old DABR register, the BT (Breakpoint Translation) bit
> is bit number 61. In the new DAWRX register, the WT (Watchpoint
> Translation) bit is bit number 59. So to move the DABR-BT bit
> into the position of the DAWRX-WT bit, it has to be shifted by
> two, not only by one. This fixes hardware watchpoints in gdb of
> older guests that only use the H_SET_DABR/X interface instead
> of the new H_SET_MODE interface.
> 
> Signed-off-by: Thomas Huth 

Thanks, applied to my kvm-ppc-next branch, with cc: sta...@vger.kernel.org.

Paul.
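
For readers following the bit arithmetic in the quoted description, here is a
minimal sketch in C. It is an illustration only, not the patch itself; the
macro and function names are hypothetical. It assumes IBM bit numbering, where
bit 0 is the most significant bit of a 64-bit register, so bit 61 has the
value 0x4 and bit 59 has the value 0x10.

```c
#include <stdint.h>

/* IBM numbering: bit 0 is the most significant bit of a 64-bit register. */
#define PPC_BIT64(n)	(1ULL << (63 - (n)))

#define DABR_BT		PPC_BIT64(61)	/* Breakpoint Translation, value 0x4 */
#define DAWRX_WT	PPC_BIT64(59)	/* Watchpoint Translation, value 0x10 */

/*
 * Move the BT bit of a DABR value into the WT position of a DAWRX value.
 * Going from bit 61 to bit 59 is a left shift by two; a shift by one would
 * land on bit 60 instead.
 */
static uint64_t dabr_bt_to_dawrx_wt(uint64_t dabr)
{
	return (dabr & DABR_BT) << 2;
}
```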

Re: [PATCH] kvm: remove unused variable 'vcpu_book3s'

2015-12-09 Thread Paul Mackerras

On Tue, Dec 01, 2015 at 08:42:10PM -0300, Geyslan G. Bem wrote:
> The vcpu_book3s struct is assigned but never used. So remove it.
> 
> Signed-off-by: Geyslan G. Bem 

Thanks, applied to my kvm-ppc-next branch.

Paul.

Re: [PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-12-08 Thread Paul Mackerras

On Wed, Nov 04, 2015 at 10:03:48AM +0100, Thomas Huth wrote:
> Only using 32 memslots for KVM on powerpc is way too low, you can
> nowadays hit this limit quite fast by adding a couple of PCI devices
> and/or pluggable memory DIMMs to the guest.
> x86 already increased the limit to 512 in total, to satisfy 256
> pluggable DIMM slots, 3 private slots and 253 slots for other things
> like PCI devices. On powerpc, we only have 32 pluggable DIMMs in

I agree with increasing the limit.  Is there a reason we have only 32
pluggable DIMMs in QEMU on powerpc, not more?  Should we be increasing
that limit too?  If so, maybe we should increase the number of memory
slots to 512?

> QEMU, not 256, so we likely do not as much slots as on x86. Thus

"so we likely do not need as many slots as on x86" would be better
English.

> setting the slot limit to 320 sounds like a good value for the
> time being (until we have some code in the future to resize the
> memslot array dynamically).
> And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
> from the powerpc-specific header since this gets defined in the
> generic kvm_host.h header anyway.
> 
> Signed-off-by: Thomas Huth 

Paul.

Re: [PATCH kernel 5/9] KVM: PPC: Account TCE-containing pages in locked_vm

2015-11-29 Thread Paul Mackerras

On Tue, Sep 15, 2015 at 08:49:35PM +1000, Alexey Kardashevskiy wrote:
> At the moment pages used for TCE tables (in addition to pages addressed
> by TCEs) are not counted in locked_vm counter so a malicious userspace
> tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE and
> lock a lot of memory.
> 
> This adds counting for pages used for TCE tables.
> 
> This counts the number of pages required for a table plus pages for
> the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.
> 
> This does not change the amount of (de)allocated memory.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 51 
> +++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 9526c34..b70787d 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -45,13 +45,56 @@ static long kvmppc_stt_npages(unsigned long window_size)
>* sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> +static long kvmppc_account_memlimit(long npages, bool inc)
> +{
> + long ret = 0;
> + const long bytes = sizeof(struct kvmppc_spapr_tce_table) +
> + (abs(npages) * sizeof(struct page *));

Why abs(npages)?  Can npages be negative?  If so, what does that mean?

Paul.

Re: [RFC] kvm - possible out of bounds

2015-11-29 Thread Paul Mackerras

On Sun, Nov 29, 2015 at 05:14:03PM -0300, Geyslan Gregório Bem wrote:
> Hello,
> 
> I have found a possible out of bounds reading in
> arch/powerpc/kvm/book3s_64_mmu.c (kvmppc_mmu_book3s_64_xlate
> function). pteg[] array could be accessed twice using the i variable
> after the for iteration. What happens is that in the last iteration
> the i index is incremented to 16, checked (i<16) then confirmed
> exiting the loop.
> 
> 277    for (i=0; i<16; i+=2) { ...
> 
> Later there are reading attempts to the pteg last elements, but using
> again the already incremented i (16).
> 
> 303    v = be64_to_cpu(pteg[i]);  /* pteg[16] */
> 304    r = be64_to_cpu(pteg[i+1]); /* pteg[17] */

Was it some automated tool that came up with this?

There is actually no problem because the accesses outside the loop are
only done if the 'found' variable is true; 'found' is initialized to
false and only ever set to true inside the loop just before a break
statement.  Thus there is a correlation between the value of 'i' and
the value of 'found' -- if 'found' is true then we know 'i' is less
than 16.
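
The correlation can be seen in a reduced form of the pattern. This is only a
sketch: find_pte() and PTEG_ENTRIES are hypothetical stand-ins, and the match
test is simplified compared with the real lookup in
kvmppc_mmu_book3s_64_xlate().

```c
#include <stdbool.h>
#include <stdint.h>

#define PTEG_ENTRIES 16	/* 8 PTEs, two 64-bit words each */

/*
 * Returns the index of the first word pair whose first word equals 'want',
 * or -1 if there is none.
 */
static int find_pte(const uint64_t pteg[PTEG_ENTRIES], uint64_t want)
{
	bool found = false;
	int i;

	for (i = 0; i < PTEG_ENTRIES; i += 2) {
		if (pteg[i] == want) {
			found = true;
			break;		/* leaves the loop with i <= 14 */
		}
	}

	if (!found)
		return -1;	/* i == 16 here, but pteg[i] is never read */

	/* found implies i < 16, so pteg[i] and pteg[i + 1] are in bounds */
	return i;
}
```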

> I really don't know if the for lace will somehow iterate until i is
> 16, anyway I think that the last readings must be using a defined max
> len/index or another more clear method.

I think it's perfectly clear to a human programmer, though some tools
(such as gcc) struggle with this kind of correlation between
variables.  That's why I asked whether your report was based on the
output from some tool.

> Eg.
> 
> v = be64_to_cpu(pteg[PTEG_LEN - 2]);
> r = be64_to_cpu(pteg[PTEG_LEN - 1]);
> 
> Or just.
> 
> v = be64_to_cpu(pteg[14]);
> r = be64_to_cpu(pteg[15]);

Either of those options would cause the code to malfunction.

> I found in the same file a variable that is not used.
> 
> 380    struct kvmppc_vcpu_book3s *vcpu_book3s;
> ...
> 387    vcpu_book3s = to_book3s(vcpu);

True.  It could be removed.

> A question, the kvmppc_mmu_book3s_64_init function is accessed by
> unconventional way? Because I have not found any calling to it.

Try arch/powerpc/kvm/book3s_pr.c line 410:

kvmppc_mmu_book3s_64_init(vcpu);

Grep (or git grep) is your friend.

Paul.

[PATCH 1/2] KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR

2015-11-11 Thread Paul Mackerras

Currently it is possible for userspace (e.g. QEMU) to set a value
for the MSR for a guest VCPU which has both of the TS bits set,
which is an illegal combination.  The result of this is that when
we execute a hrfid (hypervisor return from interrupt doubleword)
instruction to enter the guest, the CPU will take a TM Bad Thing
type of program interrupt (vector 0x700).

Now, if PR KVM is configured in the kernel along with HV KVM, we
actually handle this without crashing the host or giving hypervisor
privilege to the guest; instead what happens is that we deliver a
program interrupt to the guest, with SRR0 reflecting the address
of the hrfid instruction and SRR1 containing the MSR value at that
point.  If PR KVM is not configured in the kernel, then we try to
run the host's program interrupt handler with the MMU set to the
guest context, which almost certainly causes a host crash.

This closes the hole by making kvmppc_set_msr_hv() check for the
illegal combination and force the TS field to a safe value (00,
meaning non-transactional).

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index becad3a..f668712 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -231,6 +231,12 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 
 static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
 {
+	/*
+	 * Check for illegal transactional state bit combination
+	 * and if we find it, force the TS field to a safe state.
+	 */
+	if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
+		msr &= ~MSR_TS_MASK;
 	vcpu->arch.shregs.msr = msr;
 	kvmppc_end_cede(vcpu);
 }
-- 
2.1.4
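
As an illustration outside the patch, the same check can be tried in
userspace with the sketch below. The bit positions are an assumption (S at
bit 33 and T at bit 34, counting from the least significant bit, as I believe
arch/powerpc/include/asm/reg.h defines them), and sanitize_msr_ts() is a
hypothetical name.

```c
#include <stdint.h>

/* Assumed bit positions, counting from the least significant bit. */
#define MSR_TS_S	(1ULL << 33)	/* transaction suspended */
#define MSR_TS_T	(1ULL << 34)	/* transactional */
#define MSR_TS_MASK	(MSR_TS_T | MSR_TS_S)

/* Return msr with the TS field forced to 00 if both TS bits are set. */
static uint64_t sanitize_msr_ts(uint64_t msr)
{
	if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
		msr &= ~MSR_TS_MASK;
	return msr;
}
```

Only the 11 combination is rewritten; the legal values 01 (suspended) and
10 (transactional) pass through unchanged.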


[PATCH 2/2] KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better

2015-11-11 Thread Paul Mackerras

As we saw with the TM Bad Thing type of program interrupt occurring
on the hrfid that enters the guest, it is not completely impossible
to have a trap occurring in the guest entry/exit code, despite the
fact that the code has been written to avoid taking any traps.

This adds a check in the kvmppc_handle_exit_hv() function to detect
the case when a trap has occurred in the hypervisor-mode code, and
instead of treating it just like a trap in guest code, we now print
a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR
exit reason.

Of the various interrupts that get handled in the assembly code in
the guest exit path and that can return directly to the guest, the
only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check
(other than system call, which we can avoid just by not doing a sc
instruction).  Therefore this adds code to the machine check path to
ensure that if the MCE occurred in hypervisor mode, we exit to the
host rather than trying to continue the guest.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c            | 18 ++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f668712..d6baf0a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -846,6 +846,24 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	vcpu->stat.sum_exits++;
 
+	/*
+	 * This can happen if an interrupt occurs in the last stages
+	 * of guest entry or the first stages of guest exit (i.e. after
+	 * setting paca->kvm_hstate.in_guest to KVM_GUEST_MODE_GUEST_HV
+	 * and before setting it to KVM_GUEST_MODE_HOST_HV).
+	 * That can happen due to a bug, or due to a machine check
+	 * occurring at just the wrong time.
+	 */
+	if (vcpu->arch.shregs.msr & MSR_HV) {
+		printk(KERN_EMERG "KVM trap in HV mode!\n");
+		printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
+			vcpu->arch.trap, kvmppc_get_pc(vcpu),
+			vcpu->arch.shregs.msr);
+		kvmppc_dump_regs(vcpu);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		run->hw.hardware_exit_reason = vcpu->arch.trap;
+		return RESUME_HOST;
+	}
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
 	switch (vcpu->arch.trap) {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..b3ce8ff 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2404,6 +2404,8 @@ machine_check_realmode:
 	 * guest as machine check causing guest to crash.
 	 */
 	ld	r11, VCPU_MSR(r9)
+	rldicl.	r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */
+	bne	mc_cont			/* if so, exit to host */
 	andi.	r10, r11, MSR_RI	/* check for unrecoverable exception */
 	beq	1f			/* Deliver a machine check to guest */
 	ld	r10, VCPU_PC(r9)
-- 
2.1.4
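
For readers less used to rotate-and-mask instructions, the rldicl. test in the
machine check path can be modelled in C roughly as follows. This is a sketch
under stated assumptions: MSR_HV_LG is taken to be 60, and rotl64(), rldicl()
and trap_in_hv_mode() are hypothetical helpers that only mimic the
instruction's result, not its effect on CR0.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_HV_LG 60	/* assumed position of MSR[HV], counting from the LSB */

static uint64_t rotl64(uint64_t x, unsigned int n)
{
	n &= 63;
	return n ? (x << n) | (x >> (64 - n)) : x;
}

/* rldicl rD, rS, sh, mb: rotate left by sh, then clear the mb high-order bits */
static uint64_t rldicl(uint64_t rs, unsigned int sh, unsigned int mb)
{
	return rotl64(rs, sh) & (~0ULL >> mb);
}

/* Nonzero result means MSR[HV] was set, i.e. the trap hit hypervisor code. */
static bool trap_in_hv_mode(uint64_t msr)
{
	return rldicl(msr, 64 - MSR_HV_LG, 63) != 0;
}
```

Rotating left by 64 - 60 = 4 brings bit 60 round to bit 0, and masking off the
upper 63 bits leaves just that bit, which is equivalent to (msr >> 60) & 1.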


[GIT PULL] Please pull my kvm-ppc-fixes branch

2015-11-11 Thread Paul Mackerras

Paolo,

I have two fixes for HV KVM which I would like to have included in
v4.4-rc1.  The first one is a fix for a bug identified by Red Hat
which causes occasional guest crashes.  The second one fixes a bug
which causes host stalls and timeouts under certain circumstances when
the host is configured for static 2-way micro-threading mode.

Thanks,
Paul.

The following changes since commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1:

  KVM: VMX: Fix commit which broke PML (2015-11-05 11:34:11 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

for you to fetch changes up to f74f2e2e26199f695ca3df94f29e9ab7cb707ea4:

  KVM: PPC: Book3S HV: Don't dynamically split core when already split (2015-11-06 16:02:59 +1100)


Paul Mackerras (2):
  KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails
  KVM: PPC: Book3S HV: Don't dynamically split core when already split

 arch/powerpc/kvm/book3s_hv.c            |  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++++++++++++--------
 2 files changed, 13 insertions(+), 9 deletions(-)

[PATCH] KVM: PPC: Book3S HV: Don't dynamically split core when already split

2015-11-05 Thread Paul Mackerras

In static micro-threading modes, the dynamic micro-threading code
is supposed to be disabled, because subcores can't make independent
decisions about what micro-threading mode to put the core in - there is
only one micro-threading mode for the whole core.  The code that
implements dynamic micro-threading checks for this, except that the
check was missed in one case.  This means that it is possible for a
subcore in static 2-way micro-threading mode to try to put the core
into 4-way micro-threading mode, which usually leads to stuck CPUs,
spinlock lockups, and other stalls in the host.

The problem was in the can_split_piggybacked_subcores() function, which
should always return false if the system is in a static micro-threading
mode.  This fixes the problem by making can_split_piggybacked_subcores()
use subcore_config_ok() for its checks, as subcore_config_ok() includes
the necessary check for the static micro-threading modes.

Credit to Gautham Shenoy for working out that the reason for the hangs
and stalls we were seeing was that we were trying to do dynamic 4-way
micro-threading while we were in static 2-way mode.

Fixes: b4deba5c41e9
Cc: v...@stable.kernel.org # v4.3
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2280497..becad3a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2060,7 +2060,7 @@ static bool can_split_piggybacked_subcores(struct core_info *cip)
 			return false;
 		n_subcores += (cip->subcore_threads[sub] - 1) >> 1;
 	}
-	if (n_subcores > 3 || large_sub < 0)
+	if (large_sub < 0 || !subcore_config_ok(n_subcores + 1, 2))
 		return false;
 
 	/*
-- 
2.6.2


[PATCH] KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails

2015-10-26 Thread Paul Mackerras

When handling a hypervisor data or instruction storage interrupt (HDSI
or HISI), we look up the SLB entry for the address being accessed in
order to translate the effective address to a virtual address which can
be looked up in the guest HPT.  This lookup can occasionally fail due
to the guest replacing an SLB entry without invalidating the evicted
SLB entry.  In this situation an ERAT (effective to real address
translation cache) entry can persist and be used by the hardware even
though there is no longer a corresponding SLB entry.

Previously we would just deliver a data or instruction storage interrupt
(DSI or ISI) to the guest in this case.  However, this is not correct
and has been observed to cause guests to crash, typically with a
data storage protection interrupt on a store to the vmemmap area.

Instead, what we do now is to synthesize a data or instruction segment
interrupt.  That should cause the guest to reload an appropriate entry
into the SLB and retry the faulting instruction.  If it still faults,
we should find an appropriate SLB entry next time and be able to handle
the fault.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b1dab8d..3c6badc 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1749,7 +1749,8 @@ kvmppc_hdsi:
 	beq	3f
 	clrrdi	r0, r4, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
-	bne	1f			/* if no SLB entry found */
+	li	r0, BOOK3S_INTERRUPT_DATA_SEGMENT
+	bne	7f			/* if no SLB entry found */
 4:	std	r4, VCPU_FAULT_DAR(r9)
 	stw	r6, VCPU_FAULT_DSISR(r9)
 
@@ -1768,14 +1769,15 @@ kvmppc_hdsi:
 	cmpdi	r3, -2			/* MMIO emulation; need instr word */
 	beq	2f
 
-	/* Synthesize a DSI for the guest */
+	/* Synthesize a DSI (or DSegI) for the guest */
 	ld	r4, VCPU_FAULT_DAR(r9)
 	mr	r6, r3
-1:	mtspr	SPRN_DAR, r4
+1:	li	r0, BOOK3S_INTERRUPT_DATA_STORAGE
 	mtspr	SPRN_DSISR, r6
+7:	mtspr	SPRN_DAR, r4
 	mtspr	SPRN_SRR0, r10
 	mtspr	SPRN_SRR1, r11
-	li	r10, BOOK3S_INTERRUPT_DATA_STORAGE
+	mr	r10, r0
 	bl	kvmppc_msr_interrupt
 fast_interrupt_c_return:
 6:	ld	r7, VCPU_CTR(r9)
@@ -1823,7 +1825,8 @@ kvmppc_hisi:
 	beq	3f
 	clrrdi	r0, r10, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
-	bne	1f			/* if no SLB entry found */
+	li	r0, BOOK3S_INTERRUPT_INST_SEGMENT
+	bne	7f			/* if no SLB entry found */
 4:
 	/* Search the hash table. */
 	mr	r3, r9			/* vcpu pointer */
@@ -1840,11 +1843,12 @@ kvmppc_hisi:
 	cmpdi	r3, -1			/* handle in kernel mode */
 	beq	guest_exit_cont
 
-	/* Synthesize an ISI for the guest */
+	/* Synthesize an ISI (or ISegI) for the guest */
 	mr	r11, r3
-1:	mtspr	SPRN_SRR0, r10
+1:	li	r0, BOOK3S_INTERRUPT_INST_STORAGE
+7:	mtspr	SPRN_SRR0, r10
 	mtspr	SPRN_SRR1, r11
-	li	r10, BOOK3S_INTERRUPT_INST_STORAGE
+	mr	r10, r0
 	bl	kvmppc_msr_interrupt
 	b	fast_interrupt_c_return
 
 
-- 
2.6.2
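
The choice of interrupt that the assembly makes for a fault reflected to the
guest can be summarised in C. This is a sketch only: synthesized_vector() is a
hypothetical helper, and the vector numbers are assumed to match the usual
Book3S values (0x300, 0x380, 0x400, 0x480).

```c
#include <stdbool.h>

/* Book3S interrupt vectors; values assumed, not taken from the patch. */
#define BOOK3S_INTERRUPT_DATA_STORAGE	0x300
#define BOOK3S_INTERRUPT_DATA_SEGMENT	0x380
#define BOOK3S_INTERRUPT_INST_STORAGE	0x400
#define BOOK3S_INTERRUPT_INST_SEGMENT	0x480

/*
 * Pick the interrupt to deliver to the guest.
 * is_data:  true for an HDSI, false for an HISI.
 * slb_hit:  true if the SLB lookup for the faulting address succeeded.
 */
static int synthesized_vector(bool is_data, bool slb_hit)
{
	if (!slb_hit)
		return is_data ? BOOK3S_INTERRUPT_DATA_SEGMENT
			       : BOOK3S_INTERRUPT_INST_SEGMENT;

	return is_data ? BOOK3S_INTERRUPT_DATA_STORAGE
		       : BOOK3S_INTERRUPT_INST_STORAGE;
}
```

Before the patch, the no-SLB-entry case fell through to the storage interrupt;
the sketch shows the new behaviour, where it gets the segment interrupt.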


Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-25 Thread Paul Mackerras

On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote:
> Yes, we'll likely need this soon! 32 slots are not enough...

Would anyone object if I raised the limit for PPC to 512 slots?
Would that cause problems on embedded PPC, for instance?

Paul.

[GIT PULL] Please pull my kvm-ppc-next branch

2015-10-25 Thread Paul Mackerras

Paolo,

Here is my current patch queue for KVM on PPC.  There's nothing much
in the way of new features this time; it's mostly bug fixes, plus
Nikunj has implemented support for KVM_CAP_NR_MEMSLOTS.  These are
intended for the "next" branch of the KVM tree.  Please pull.

Thanks,
Paul.

The following changes since commit 9ffecb10283508260936b96022d4ee43a7798b4c:

  Linux 4.3-rc3 (2015-09-27 07:50:08 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-next

for you to fetch changes up to 70aa3961a196ac32baf54032b2051bac9a941118:

  KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path (2015-10-21 16:31:52 +1100)


Andrzej Hajda (1):
  KVM: PPC: e500: fix handling local_sid_lookup result

Gautham R. Shenoy (1):
  KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path

Mahesh Salgaonkar (1):
  KVM: PPC: Book3S HV: Deliver machine check with MSR(RI=0) to guest as MCE

Nikunj A Dadhania (1):
  KVM: PPC: Implement extension to report number of memslots

Paul Mackerras (2):
  KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl
  KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs

Tudor Laurentiu (3):
  powerpc/e6500: add TMCFG0 register definition
  KVM: PPC: e500: Emulate TMCFG0 TMRN register
  KVM: PPC: e500: fix couple of shift operations on 64 bits

 arch/powerpc/include/asm/disassemble.h  |  5 +
 arch/powerpc/include/asm/reg_booke.h|  6 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c |  3 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  2 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 29 ++---
 arch/powerpc/kvm/e500.c |  3 ++-
 arch/powerpc/kvm/e500_emulate.c | 19 +++
 arch/powerpc/kvm/e500_mmu_host.c|  4 ++--
 arch/powerpc/kvm/powerpc.c  |  3 +++
 9 files changed, 63 insertions(+), 11 deletions(-)

[PATCH] KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs

2015-10-14 Thread Paul Mackerras

This fixes a bug where the old HPTE value returned by H_REMOVE has
the valid bit clear if the HPTE was an absent HPTE, as happens for
HPTEs for emulated MMIO pages and for RAM pages that have been paged
out by the host.  If the absent bit is set, we clear it and set the
valid bit, because from the guest's point of view, the HPTE is valid.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c1df9bb..97e7f8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -470,6 +470,8 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	note_hpte_modification(kvm, rev);
 	unlock_hpte(hpte, 0);
 
+	if (v & HPTE_V_ABSENT)
+		v = (v & ~HPTE_V_ABSENT) | HPTE_V_VALID;
 	hpret[0] = v;
 	hpret[1] = r;
 	return H_SUCCESS;
-- 
2.6.1
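
A standalone sketch of the same transformation, for experimenting outside the
kernel. The flag values are assumptions for illustration (in particular the
value of the software-defined HPTE_V_ABSENT bit), and hpte_v_for_guest() is a
hypothetical name.

```c
#include <stdint.h>

#define HPTE_V_VALID	0x0000000000000001ULL
#define HPTE_V_ABSENT	0x0000000000000020ULL	/* software bit; value assumed */

/*
 * Convert the first HPTE doubleword as stored by the host into the value
 * the guest should see: an absent HPTE is reported as a valid one.
 */
static uint64_t hpte_v_for_guest(uint64_t v)
{
	if (v & HPTE_V_ABSENT)
		v = (v & ~HPTE_V_ABSENT) | HPTE_V_VALID;
	return v;
}
```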


[PATCH] KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl

2015-10-14 Thread Paul Mackerras

Currently the KVM_PPC_ALLOCATE_HTAB ioctl will try to allocate the requested
size of HPT, and if that is not possible, then try to allocate smaller
sizes (by factors of 2) until either a minimum is reached or the
allocation succeeds.  This is not ideal for userspace, particularly in
migration scenarios, where the destination VM really does require the
size requested.  Also, the minimum HPT size of 256kB may be
insufficient for the guest to run successfully.

This removes the fallback to smaller sizes on allocation failure for
the KVM_PPC_ALLOCATE_HTAB ioctl.  The fallback still exists for the
case where the HPT is allocated at the time the first VCPU is run, if
no HPT has been allocated by ioctl by that time.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1f9c0a1..10722b1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -70,7 +70,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	}
 
 	/* Lastly try successively smaller sizes from the page allocator */
-	while (!hpt && order > PPC_MIN_HPT_ORDER) {
+	/* Only do this if userspace didn't specify a size via ioctl */
+	while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
 		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
 				       __GFP_NOWARN, order - PAGE_SHIFT);
 		if (!hpt)
-- 
2.6.1
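
The effect of the extra loop condition can be modelled with a toy allocator.
This is a sketch, not kernel code: try_alloc() and alloc_hpt_order() are
hypothetical, the "allocator" simply succeeds for any order up to max_ok, and
the minimum order of 18 corresponds to the 256kB minimum mentioned above.

```c
#include <stdbool.h>

#define MIN_HPT_ORDER 18	/* 2^18 bytes = 256kB */

/* Hypothetical allocator: pretend only tables up to 2^max_ok bytes fit. */
static bool try_alloc(unsigned int order, unsigned int max_ok)
{
	return order <= max_ok;
}

/*
 * Returns the order actually allocated, or 0 on failure.
 * explicit_size is true when userspace asked for this size via the ioctl.
 */
static unsigned int alloc_hpt_order(unsigned int order, bool explicit_size,
				    unsigned int max_ok)
{
	bool ok = try_alloc(order, max_ok);

	/* Fall back to smaller sizes only if userspace did not pick one */
	while (!ok && order > MIN_HPT_ORDER && !explicit_size) {
		order--;
		ok = try_alloc(order, max_ok);
	}
	return ok ? order : 0;
}
```

With an explicit request the function either returns the requested order or
fails; without one it may return any order down to the minimum.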


Re: [PATCH 15/19] KVM: PPC: e500: fix handling local_sid_lookup result

2015-10-14 Thread Paul Mackerras

On Thu, Sep 24, 2015 at 04:00:23PM +0200, Andrzej Hajda wrote:
> The function can return negative value.
> 
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
> 
> Signed-off-by: Andrzej Hajda 

Thanks, applied to my kvm-ppc-next branch.

Paul.

Re: [PATCH][v2] KVM: PPC: e500: Emulate TMCFG0 TMRN register

2015-10-14 Thread Paul Mackerras

On Fri, Sep 25, 2015 at 06:02:23PM +0300, Laurentiu Tudor wrote:
> Emulate TMCFG0 TMRN register exposing one HW thread per vcpu.
> 
> Signed-off-by: Mihai Caraman 
> [laurentiu.tu...@freescale.com: rebased on latest kernel, use
>  define instead of hardcoded value, moved code in own function]
> Signed-off-by: Laurentiu Tudor 

Thanks, applied to my kvm-ppc-next branch.

Paul.

Re: [PATCH] KVM: PPC: e500: fix couple of shift operations on 64 bits

2015-10-14 Thread Paul Mackerras

On Thu, Oct 01, 2015 at 03:58:03PM +0300, Laurentiu Tudor wrote:
> Fix couple of cases where we shift left a 32-bit
> value thus might get truncated results on 64-bit
> targets.
> 
> Signed-off-by: Laurentiu Tudor 
> Suggested-by: Scott Wood 

Thanks, applied to my kvm-ppc-next branch.

Paul.

Re: [PATCH] KVM: PPC: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

2015-09-21 Thread Paul Mackerras

On Mon, Sep 21, 2015 at 07:50:22AM +0200, Paolo Bonzini wrote:
> 
> 
> On 21/09/2015 03:37, David Gibson wrote:
> > On Fri, Sep 18, 2015 at 08:57:28AM +0200, Thomas Huth wrote:
> >> Access to the kvm->buses (like with the kvm_io_bus_read() and
> >> -write() functions) has to be protected via the kvm->srcu lock. 
> >> The kvmppc_h_logical_ci_load() and -store() functions are
> >> missing this lock so far, so let's add it there, too. This fixes
> >> the problem that the kernel reports "suspicious RCU usage" when
> >> lock debugging is enabled.
> >> 
> >> Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
> >> Signed-off-by: Thomas Huth 
> > 
> > Nice catch.  Looks like I missed this because the places 
> > kvm_io_bus_{read,write}() are called on x86 are buried about 5
> > layers below where the srcu lock is taken :/.
> > 
> > Reviewed-by: David Gibson 
...
> Paul,
> 
> shall I take this directly into my tree for -rc3?
> 
> Paolo

I have that and two other fixes in my kvm-ppc-fixes branch on
kernel.org.  They were in linux-next today.  I was going to send you a
pull request tomorrow, but if you are about to send stuff off to Linus
you could pull now from:

git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

The three patches in there are:

Gautham R. Shenoy (1):
  KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit

Paul Mackerras (1):
  KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs

Thomas Huth (1):
  KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

The one from Gautham is a 1-liner that has been around for months and
got missed, and is obviously correct.  The one from me fixes a
regression that was introduced in 4.3-rc1 by one of my patches, which
causes oopses and soft lockups due to a use-after-free bug.

Thanks,
Paul.

[PATCH] KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs

2015-09-17 Thread Paul Mackerras

This fixes a bug which results in stale vcore pointers being left in
the per-cpu preempted vcore lists when a VM is destroyed.  The result
of the stale vcore pointers is usually either a crash or a lockup
inside collect_piggybacks() when another VM is run.  A typical
lockup message looks like:

[  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
[  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
[  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
[  472.162187] task: c01e38512750 ti: c01e41bfc000 task.ti: c01e41bfc000
[  472.162262] NIP: c096b094 LR: c096b08c CTR: c030
[  472.162337] REGS: c01e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
[  472.162399] MSR: 90019033   CR: 24848844  
XER: 
[  472.162588] CFAR: c096b0ac SOFTE: 1
GPR00: c070 c01e41bff7a0 c127df00 0001
GPR04: 0003 0001  00874821
GPR08: c01e41bff8e0 0001  defde740
GPR12: c030 cfdae400
[  472.163053] NIP [c096b094] _raw_spin_lock_irqsave+0xa4/0x130
[  472.163117] LR [c096b08c] _raw_spin_lock_irqsave+0x9c/0x130
[  472.163179] Call Trace:
[  472.163206] [c01e41bff7a0] [c01e41bff7f0] 0xc01e41bff7f0 (unreliable)
[  472.163295] [c01e41bff7e0] [c070] __wake_up+0x40/0x90
[  472.163375] [c01e41bff830] [defd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
[  472.163465] [c01e41bffa30] [defd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
[  472.163559] [c01e41bffb70] [de9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
[  472.163653] [c01e41bffba0] [de92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[  472.163745] [c01e41bffbe0] [de9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
[  472.163834] [c01e41bffd40] [c02d0f50] do_vfs_ioctl+0x480/0x7c0
[  472.163910] [c01e41bffde0] [c02d1364] SyS_ioctl+0xd4/0xf0
[  472.163986] [c01e41bffe30] [c0009260] system_call+0x38/0xd0
[  472.164060] Instruction dump:
[  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 6000 6000 6042 8bad02e2
[  472.164224] 7fc3f378 4b6a57c1 6000 7c210b78  89290009 792affe3 40820070

The bug is that kvmppc_run_vcpu does not correctly handle the case
where a vcpu task receives a signal while its guest vcpu is executing
in the guest as a result of being piggy-backed onto the execution of
another vcore.  In that case we need to wait for the vcpu to finish
executing inside the guest, and then remove this vcore from the
preempted vcores list.  That way, we avoid leaving this vcpu's vcore
on the preempted vcores list when the vcpu gets interrupted.

Fixes: ec2571650826
Reported-by: Thomas Huth 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9754e68..2280497 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2692,9 +2692,13 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 
 	while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
 	       (vc->vcore_state == VCORE_RUNNING ||
-		vc->vcore_state == VCORE_EXITING))
+		vc->vcore_state == VCORE_EXITING ||
+		vc->vcore_state == VCORE_PIGGYBACK))
 		kvmppc_wait_for_exec(vc, vcpu, TASK_UNINTERRUPTIBLE);
 
+	if (vc->vcore_state == VCORE_PREEMPT && vc->runner == NULL)
+		kvmppc_vcore_end_preempt(vc);
+
 	if (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE) {
 		kvmppc_remove_runnable(vc, vcpu);
 		vcpu->stat.signal_exits++;
-- 
2.5.1


Re: [PATCH 3/3] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD

2015-09-06 Thread Paul Mackerras

On Sun, Sep 06, 2015 at 12:47:12PM -0700, Nathan Whitehorn wrote:
> Anything I can do to help move these along? It's a big performance
> improvement for FreeBSD guests.

These patches are in Paolo's kvm-ppc-next branch and should go into
Linus' tree in the next couple of days.

Paul.

[GIT PULL] Please pull my kvm-ppc-next branch

2015-09-05 Thread Paul Mackerras

Paolo,

Please pull the commits listed below into your tree.  I would like
them to go in for 4.3 as they are all small bug fixes, not new
features, and they can only affect HV-mode KVM on IBM server
machines (in fact one has no effect on code at all since it is a typo
fix for a comment).

Please let me know if you want me to re-post all the patches.

Thanks,
Paul.

The following changes since commit e3dbc572fe11a5231568e106fa3dcedd1d1bec0f:

  Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue (2015-08-22 14:57:59 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-next

for you to fetch changes up to 4e33d1f0a145d48e8cf287954bbf791af8387cfb:

  KVM: PPC: Book3S: Fix typo in top comment about locking (2015-09-04 07:28:05 +1000)


Gautham R. Shenoy (2):
  KVM: PPC: Book3S HV: Fix race in starting secondary threads
  KVM: PPC: Book3S HV: Exit on H_DOORBELL if HOST_IPI is set

Greg Kurz (1):
  KVM: PPC: Book3S: Fix typo in top comment about locking

Thomas Huth (1):
  KVM: PPC: Book3S: Fix size of the PSPB register

 arch/powerpc/include/asm/kvm_host.h     |  2 +-
 arch/powerpc/kvm/book3s_hv.c            | 10 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  9 +
 arch/powerpc/kvm/book3s_xics.c          |  2 +-
 4 files changed, 20 insertions(+), 3 deletions(-)

[PATCH] KVM: PPC: Book3S HV: Exit on H_DOORBELL only if HOST_IPI is set

2015-09-02 Thread Paul Mackerras

From: "Gautham R. Shenoy" 

The code that handles the case when we receive a H_DOORBELL interrupt
has a comment which says "Hypervisor doorbell - exit only if host IPI
flag set".  However, the current code does not actually check if the
host IPI flag is set.  This is due to a comparison instruction that
got missed.

As a result, the current code performs the exit to host only
if some sibling thread or a sibling sub-core is exiting to the
host.  This implies that an IPI sent to a sibling core in
(subcores-per-core != 1) mode will be missed by the host unless the
sibling core is on the exit path to the host.

This patch adds the missing comparison operation which will ensure
that when HOST_IPI flag is set, we unconditionally exit to the host.

Fixes: 66feed61cdf6
Cc: sta...@vger.kernel.org # v4.1+
Signed-off-by: Gautham R. Shenoy 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b07f045..2273dca 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1213,6 +1213,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
 	bne	3f
 	lbz	r0, HSTATE_HOST_IPI(r13)
+	cmpwi	r0, 0
 	beq	4f
 	b	guest_exit_cont
 3:
-- 
2.1.4
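
The effect of the missing comparison in the patch above can be modeled in
C.  This is only a sketch, not kernel code: cr_model, doorbell_path() and
the H_DOORBELL value are assumptions made for illustration.  It tracks the
single condition-register bit that bne/beq test, and shows that without
the second cmpwi the beq still sees the result of the first comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the one condition-register bit that beq/bne test. */
struct cr_model { bool eq; };

enum path { PATH_3F, PATH_4F, PATH_GUEST_EXIT_CONT };

#define H_DOORBELL 0xe80	/* illustrative trap number, an assumption */

/* cmpwi sets the EQ bit; a load such as lbz leaves it untouched. */
static void cmpwi(struct cr_model *cr, int reg, int imm)
{
	cr->eq = (reg == imm);
}

/* Which label the handler reaches, with or without the added cmpwi. */
static enum path doorbell_path(int trap, int host_ipi, bool with_fix)
{
	struct cr_model cr = { false };

	cmpwi(&cr, trap, H_DOORBELL);
	if (!cr.eq)
		return PATH_3F;			/* bne 3f */
	/* lbz r0, HSTATE_HOST_IPI(r13): loads host_ipi, EQ is still set */
	if (with_fix)
		cmpwi(&cr, host_ipi, 0);	/* the instruction the patch adds */
	if (cr.eq)
		return PATH_4F;			/* beq 4f */
	return PATH_GUEST_EXIT_CONT;		/* b guest_exit_cont */
}
```

In this model, a doorbell with the host IPI flag set reaches
guest_exit_cont only when with_fix is true; without the fix it always
takes the branch to 4f.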


[PATCH] KVM: PPC: Book3S HV: Fix race in starting secondary threads

2015-09-02 Thread Paul Mackerras

From: "Gautham R. Shenoy" 

The current dynamic micro-threading code has a race due to which a
secondary thread naps when it is supposed to be running a vcpu. As a
side effect of this, on a guest exit, the primary thread in
kvmppc_wait_for_nap() finds that this secondary thread hasn't cleared
its vcore pointer. This results in "CPU X seems to be stuck!"
warnings.

The race is possible since the primary thread, on exiting the guest,
only waits for all the secondaries to clear their vcore pointers. It
subsequently expects the secondary threads to enter nap while it
unsplits the core. A secondary thread which hasn't yet entered the nap
will loop in kvm_no_guest until its vcore pointer and the do_nap flag
are unset. Once the core has been unsplit, a new vcpu thread can grab
the core and set the do_nap flag *before* setting the vcore pointers
of the secondary. As a result, the secondary thread will now enter nap
via kvm_unsplit_nap instead of running the guest vcpu.

Fix this by setting the do_nap flag after setting the vcore pointer in
the PACA of the secondary in kvmppc_run_core. Also, ensure that a
secondary thread doesn't nap in kvm_unsplit_nap when the vcore pointer
in its PACA struct is set.

Fixes: b4deba5c41e9
Signed-off-by: Gautham R. Shenoy 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c            | 10 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fad52f2..c5edf17 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2411,7 +2411,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 				break;
 			cpu_relax();
 		}
-		split_info.do_nap = 1;	/* ask secondaries to nap when done */
 	}
 
 	/* Start all the threads */
@@ -2440,6 +2439,15 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 			thr += pvc->num_threads;
 		}
 	}
+
+	/*
+	 * Ensure that split_info.do_nap is set after setting
+	 * the vcore pointer in the PACA of the secondaries.
+	 */
+	smp_mb();
+	if (cmd_bit)
+		split_info.do_nap = 1;	/* ask secondaries to nap when done */
+
 	/*
 	 * When doing micro-threading, poke the inactive threads as well.
 	 * This gets them to the nap instruction after kvm_do_nap,
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 472680f..b07f045 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -421,6 +421,14 @@ kvm_no_guest:
  * whole-core mode, so we need to nap.
  */
 kvm_unsplit_nap:
+	/*
+	 * Ensure that secondary doesn't nap when it has
+	 * its vcore pointer set.
+	 */
+	sync		/* matches smp_mb() before setting split_info.do_nap */
+	ld	r0, HSTATE_KVM_VCORE(r13)
+	cmpdi	r0, 0
+	bne	kvm_no_guest
 	/* clear any pending message */
 BEGIN_FTR_SECTION
 	lis	r6, (PPC_DBELL_SERVER << (63-36))@h
-- 
2.1.4
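
The ordering that the patch above establishes can be sketched in portable
C11.  This is a user-space model under stated assumptions, not the kernel
code: split_model, secondary_model and the function names are
hypothetical, and a sequentially consistent fence stands in for both
smp_mb() and the sync instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct secondary_model {
	_Atomic(void *) vcore;	/* stands in for the PACA vcore pointer */
};

struct split_model {
	atomic_int do_nap;	/* stands in for split_info.do_nap */
};

static void model_reset(struct split_model *s, struct secondary_model *t)
{
	atomic_store(&s->do_nap, 0);
	atomic_store(&t->vcore, NULL);
}

/* Primary side: publish the vcore pointer first, then set do_nap. */
static void primary_start(struct split_model *s, struct secondary_model *t,
			  void *vc)
{
	atomic_store_explicit(&t->vcore, vc, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* role of smp_mb() */
	atomic_store_explicit(&s->do_nap, 1, memory_order_relaxed);
}

/* Secondary side: after seeing do_nap, re-check the vcore pointer. */
static bool secondary_may_nap(struct split_model *s, struct secondary_model *t)
{
	if (!atomic_load_explicit(&s->do_nap, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_seq_cst);	/* role of sync */
	return atomic_load_explicit(&t->vcore, memory_order_relaxed) == NULL;
}
```

With the stores in this order, a secondary that observes do_nap set also
observes any vcore pointer written before it, so it declines to nap when
it has been given work.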


Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Paul Mackerras

On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
> > The size of the Problem State Priority Boost Register is only
> > 32 bits, so let's change the type of the corresponding variable
> > accordingly to avoid future trouble.
> 
> It's not future trouble, it's broken today for LE and this should fix
> it BUT 

No, it's broken today for BE hosts, which will always see 0 for the
PSPB register value.  LE hosts are fine.

> The asm accesses it using lwz/stw and C accesses it as a ulong. On LE
> that will mean that userspace will see the value << 32

No, that will happen on BE, and since KVM_REG_PPC_PSPB says it's a
32-bit register, we'll just pass 0 back to userspace when it reads it.
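
A small host-independent model may make the BE/LE difference concrete.
It is only a sketch: the byte order is simulated explicitly, and
store32(), load64() and pspb_seen_by_userspace() are hypothetical names.
A 32-bit store (as stw would do) writes the first four bytes of an
8-byte field, the field is read back as a 64-bit value, and a 32-bit
read keeps the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value at offset 0 of an 8-byte field, as stw would. */
static void store32(uint8_t field[8], uint32_t v, int big_endian)
{
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		field[i] = (uint8_t)(v >> shift);
	}
}

/* Read all 8 bytes back as one 64-bit value, as C would read a ulong. */
static uint64_t load64(const uint8_t field[8], int big_endian)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++) {
		int shift = big_endian ? 8 * (7 - i) : 8 * i;

		v |= (uint64_t)field[i] << shift;
	}
	return v;
}

/* What a 32-bit read of the field returns: the low 32 bits of the ulong. */
static uint32_t pspb_seen_by_userspace(uint32_t guest_value, int big_endian)
{
	uint8_t field[8] = { 0 };

	store32(field, guest_value, big_endian);
	return (uint32_t)load64(field, big_endian);
}
```

In the simulated BE case the 64-bit read yields the value shifted left
by 32, so the 32-bit result is 0; in the LE case the value comes back
unchanged.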

> Now "fixing" it might break migration if that field is already
> stored/loaded in its "broken" form. We may have to keep the "broken"
> behaviour and document that qemu sees a value shifted by 32.

It is currently being set to 0 on BE hosts across migration
(fortunately 0 is a benign value for PSPB).  If we fix this on both
the source and destination host, then the value will get migrated
across correctly.

I think Thomas's patch is fine; it just needs a stronger patch
description saying that it fixes an actual bug.

Paul.

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Paul Mackerras

On Tue, Sep 01, 2015 at 11:41:18PM +0200, Thomas Huth wrote:
> The size of the Problem State Priority Boost Register is only
> 32 bits, so let's change the type of the corresponding variable
> accordingly to avoid future trouble.

Since we're already using lwz/stw in the assembly code in
book3s_hv_rmhandlers.S, this is actually a bug fix, isn't it?
How did you find it?  Did you observe a failure of some kind, or did
you just find it by code inspection?

Paul.

Re: [PATCH] vfio: Enable VFIO device for powerpc

2015-08-26 Thread Paul Mackerras

On Wed, Aug 26, 2015 at 11:34:26AM +0200, Alexander Graf wrote:
> 
> 
> On 13.08.15 03:15, David Gibson wrote:
> > ec53500f "kvm: Add VFIO device" added a special KVM pseudo-device which is
> > used to handle any necessary interactions between KVM and VFIO.
> > 
> > Currently that device is built on x86 and ARM, but not powerpc, although
> > powerpc does support both KVM and VFIO.  This makes things awkward in
> > userspace
> > 
> > Currently qemu prints an alarming error message if you attempt to use VFIO
> > and it can't initialize the KVM VFIO device.  We don't want to remove the
> > warning, because lack of the KVM VFIO device could mean coherency problems
> > on x86.  On powerpc, however, the error is harmless but looks disturbing,
> > and a test based on host architecture in qemu would be ugly, and break if
> > we do need the KVM VFIO device for something important in future.
> > 
> > There's nothing preventing the KVM VFIO device from being built for
> > powerpc, so this patch turns it on.  It won't actually do anything, since
> > we don't define any of the arch_*() hooks, but it will make qemu happy and
> > we can extend it in future if we need to.
> > 
> > Signed-off-by: David Gibson 
> > Reviewed-by: Eric Auger 
> 
> Paul is going to take care of the kvm-ppc tree for 4.3. Also, ppc kvm
> patches should get CC on the kvm-ppc@vger mailing list ;).
> 
> Paul, could you please pick this one up?

Sure, I'll do that once I get home (end of this week).

Paul.

Re: [PATCH] kvm:powerpc:Fix return statements for wrapper functions in the file book3s_64_mmu_hv.c

2015-08-20 Thread Paul Mackerras

On Mon, Aug 10, 2015 at 11:27:31AM -0400, Nicholas Krause wrote:
> This fixes the wrapper functions kvm_unmap_hva_hv and
> kvm_unmap_hva_range_hv to return the return value of the function
> kvm_handle_hva or kvm_handle_hva_range that they wrap, rather than
> always returning zero directly and making the callers of these
> wrapper functions think they always run successfully.

In fact these functions do always run successfully, and there is no
bug fixed here (see below).

I don't object to the change per se, since it reduces the code size
very slightly, but the commit message and headline need to be
reworded to avoid giving the impression that this fixes something.

>  int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva)
>  {
> - kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
> - return 0;
> + return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
>  }
>  
>  int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end)
>  {
> - kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
> - return 0;
> + return kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);

kvm_handle_hva and kvm_handle_hva_range call the handler function
(kvm_unmap_rmapp in this case) one or more times, and return the
logical OR of the return values from the handler.  Since
kvm_unmap_rmapp always returns 0, the return value from
kvm_handle_hva{,_range} will always be 0 here.
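
To illustrate the point, here is a minimal stand-alone sketch.  The names
handle_range(), unmap_handler() and so on are hypothetical stand-ins, not
the kernel functions; the sketch only assumes the behaviour described
above, namely that the results of the handler calls are ORed together.

```c
#include <assert.h>

typedef int (*hva_handler)(unsigned long gfn);

/*
 * Hypothetical stand-in for kvm_handle_hva_range: call the handler for
 * each page in [start, end) and OR the results together.
 */
static int handle_range(unsigned long start, unsigned long end, hva_handler fn)
{
	int ret = 0;

	for (unsigned long gfn = start; gfn < end; gfn++)
		ret |= fn(gfn);
	return ret;
}

/* Like kvm_unmap_rmapp in the discussion: always returns 0. */
static int unmap_handler(unsigned long gfn)
{
	(void)gfn;
	return 0;
}

/* A handler that can return non-zero, for contrast only. */
static int odd_page_handler(unsigned long gfn)
{
	return (int)(gfn & 1);
}

/* The wrapper before the proposed change. */
static int unmap_old(unsigned long start, unsigned long end)
{
	handle_range(start, end, unmap_handler);
	return 0;
}

/* The wrapper after the proposed change. */
static int unmap_new(unsigned long start, unsigned long end)
{
	return handle_range(start, end, unmap_handler);
}
```

Because the handler always returns 0, unmap_old() and unmap_new() return
the same value for every range; only a handler that can return non-zero
would make the two differ.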

Paul.

Re: [PATCH 0/2] Two fixes for dynamic micro-threading

2015-07-24 Thread Paul Mackerras

On Thu, Jul 23, 2015 at 02:02:51PM +0200, Alexander Graf wrote:
> 
> The host crash should only occur with dynamic micro-threading enabled,
> which is not in Linus' tree, correct?

Correct.

Paul.

[PATCH 0/2] Two fixes for dynamic micro-threading

2015-07-16 Thread Paul Mackerras

This series contains two fixes for the new dynamic micro-threading
code that was added recently for HV-mode KVM on Power servers.
The patches are against Alex Graf's kvm-ppc-queue branch.  Please
apply.

Paul.

 arch/powerpc/kvm/book3s_hv.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)


[PATCH 2/2] KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation

2015-07-16 Thread Paul Mackerras

Whenever a vcore's state is VCORE_PREEMPT we need to be counting stolen
time for it.  This currently isn't the case when we have a vcore that
no longer has any runnable threads in it but still has a runner task,
so we do an explicit call to kvmppc_core_start_stolen() in that case.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3d02276..fad52f2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2283,9 +2283,14 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 	}
 	list_del_init(&vc->preempt_list);
 	if (!is_master) {
-		vc->vcore_state = vc->runner ? VCORE_PREEMPT : VCORE_INACTIVE;
-		if (still_running > 0)
+		if (still_running > 0) {
 			kvmppc_vcore_preempt(vc);
+		} else if (vc->runner) {
+			vc->vcore_state = VCORE_PREEMPT;
+			kvmppc_core_start_stolen(vc);
+		} else {
+			vc->vcore_state = VCORE_INACTIVE;
+		}
 		if (vc->n_runnable > 0 && vc->runner == NULL) {
 			/* make sure there's a candidate runner awake */
 			vcpu = list_first_entry(&vc->runnable_threads,
-- 
2.1.4
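
The invariant behind the patch above (a vcore in VCORE_PREEMPT state is
always being charged stolen time) can be checked with a small model.
This is a sketch under assumptions: the types and function names are
hypothetical, and kvmppc_vcore_preempt() is assumed to set the state to
VCORE_PREEMPT and start stolen-time accounting, which the patch text
implies but does not show.

```c
#include <assert.h>
#include <stdbool.h>

enum vcore_state_model { MODEL_INACTIVE, MODEL_PREEMPT };

struct outcome {
	enum vcore_state_model state;
	bool counting_stolen;
};

/* Logic before the patch, for a non-master vcore. */
static struct outcome old_logic(int still_running, bool has_runner)
{
	struct outcome o;

	o.state = has_runner ? MODEL_PREEMPT : MODEL_INACTIVE;
	o.counting_stolen = false;
	if (still_running > 0) {	/* assumed effect of kvmppc_vcore_preempt() */
		o.state = MODEL_PREEMPT;
		o.counting_stolen = true;
	}
	return o;
}

/* Logic after the patch. */
static struct outcome new_logic(int still_running, bool has_runner)
{
	struct outcome o = { MODEL_INACTIVE, false };

	if (still_running > 0 || has_runner) {
		o.state = MODEL_PREEMPT;
		o.counting_stolen = true;	/* preempt path or explicit start */
	}
	return o;
}

/* The invariant: stolen time is counted exactly when state is PREEMPT. */
static bool invariant_holds(struct outcome o)
{
	return (o.state == MODEL_PREEMPT) == o.counting_stolen;
}
```

In this model the old logic breaks the invariant in exactly one case, no
runnable threads but a runner task still present, which is the case the
commit message describes.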


[PATCH 1/2] KVM: PPC: Book3S HV: Fix preempted vcore list locking

2015-07-16 Thread Paul Mackerras

When a vcore gets preempted, we put it on the preempted vcore list for
the current CPU.  The runner task then calls schedule() and comes back
some time later and takes itself off the list.  We need to be careful
to lock the list that it was put onto, which may not be the list for the
current CPU since the runner task may have moved to another CPU.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6e3ef30..3d02276 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1962,10 +1962,11 @@ static void kvmppc_vcore_preempt(struct kvmppc_vcore *vc)
 
 static void kvmppc_vcore_end_preempt(struct kvmppc_vcore *vc)
 {
-	struct preempted_vcore_list *lp = this_cpu_ptr(&preempted_vcores);
+	struct preempted_vcore_list *lp;
 
 	kvmppc_core_end_stolen(vc);
 	if (!list_empty(&vc->preempt_list)) {
+		lp = &per_cpu(preempted_vcores, vc->pcpu);
 		spin_lock(&lp->lock);
 		list_del_init(&vc->preempt_list);
 		spin_unlock(&lp->lock);
-- 
2.1.4
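
The idea in the patch above, recording which CPU's list the vcore went
onto and locking that list on removal, can be sketched with a
single-threaded model.  Everything here is hypothetical: the per-CPU
lists are a plain array, and lock_count simply counts how often each
list's lock would have been taken.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4

struct list_model {
	int lock_count;		/* times this list's lock was taken */
	int n_entries;		/* vcores currently on this list */
};

static struct list_model preempted[NR_CPUS_MODEL];

struct vcore_model {
	int pcpu;		/* CPU whose list the vcore was put on */
	int on_list;
};

/* Put the vcore on the list of the CPU we are running on right now. */
static void vcore_preempt(struct vcore_model *vc, int this_cpu)
{
	struct list_model *lp = &preempted[this_cpu];

	vc->pcpu = this_cpu;
	lp->lock_count++;	/* lock, add, unlock */
	lp->n_entries++;
	vc->on_list = 1;
}

/* Remove the vcore; the runner may have moved to another CPU since. */
static void vcore_end_preempt(struct vcore_model *vc, int this_cpu)
{
	(void)this_cpu;		/* deliberately not used to pick the list */
	if (vc->on_list) {
		struct list_model *lp = &preempted[vc->pcpu];

		lp->lock_count++;	/* lock, delete, unlock */
		lp->n_entries--;
		vc->on_list = 0;
	}
}
```

If the removal used this_cpu instead of vc->pcpu, a runner that had
migrated would take the wrong list's lock while modifying the original
list.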


[PATCH v3] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-07-02 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.
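
As a rough illustration of how such a bitmap can be read (a sketch only;
mt_mode_allowed() is a hypothetical helper, not a function from this
patch), the bit with value 2 enables 2-way mode and the bit with value 4
enables 4-way mode:

```c
#include <assert.h>
#include <stdbool.h>

/* True if n-way micro-threading (n = 2 or 4) is enabled in the bitmap. */
static bool mt_mode_allowed(unsigned int dynamic_mt_modes, unsigned int n_way)
{
	if (n_way != 2 && n_way != 4)
		return false;
	return (dynamic_mt_modes & n_way) != 0;
}
```

With the default value of 6, both mt_mode_allowed(6, 2) and
mt_mode_allowed(6, 4) are true; with 0, neither is.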

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras 
---
v3: Rename MAX_THREADS to MAX_SMT_THREADS to avoid a compile warning

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h       |   3 +
 arch/powerpc/kernel/asm-offsets.c         |   7 +
 arch/powerpc/kvm/book3s_hv.c              | 367 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 473 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..57d5dfe 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_SMT_THREADS	8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_SMT_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc

[PATCH v2 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-06-30 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras 
---
v2: List allowed values for dynamic_mt_modes module parameter in the
module parameter description.

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h       |   3 +
 arch/powerpc/kernel/asm-offsets.c         |   7 +
 arch/powerpc/kvm/book3s_hv.c              | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS	8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --g

[PATCH 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-06-24 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h   |   3 +
 arch/powerpc/kernel/asm-offsets.c |   7 +
 arch/powerpc/kvm/book3s_hv.c  | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS    8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index d333664..c3e11e0 1

[PATCH 5/5] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD

2015-06-24 Thread Paul Mackerras

This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes.  When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.
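
The transfer of the R bit can be pictured with a small user-space model.
This is only a sketch of the idea for a valid HPTE: struct model_pte and
model_clear_ref() are made-up names, locking and TLB invalidation are left
out, and the bit values are stand-ins for HPTE_R_R, HPTE_R_C and
KVMPPC_RMAP_REFERENCED.

```c
#include <stdint.h>

/* Stand-in bit values for this model only. */
#define MODEL_R_BIT           UINT64_C(0x100)	/* reference bit */
#define MODEL_C_BIT           UINT64_C(0x080)	/* change bit */
#define MODEL_RMAP_REFERENCED (MODEL_R_BIT << 32)

/* Hypothetical model of the three places where R/C state lives. */
struct model_pte {
	uint64_t guest_r;	/* guest view of the HPTE second word */
	uint64_t real_r;	/* real HPTE second word, set by hardware */
	uint64_t rmap;		/* rmap entry for the host page */
};

/*
 * Test and clear the reference bit.  Returns the guest-visible value
 * as it was before clearing, with hardware-set R/C merged in.
 */
static uint64_t model_clear_ref(struct model_pte *p)
{
	uint64_t gr = p->guest_r | (p->real_r & (MODEL_R_BIT | MODEL_C_BIT));

	p->guest_r &= ~MODEL_R_BIT;
	if (p->real_r & MODEL_R_BIT) {
		/* Clear R in the real HPTE so future references are seen. */
		p->real_r &= ~MODEL_R_BIT;
		/* Remember the reference for the host's ageing code. */
		p->rmap |= MODEL_RMAP_REFERENCED;
	}
	return gr;
}
```

A second call on the same entry returns a value without R set, while the
rmap entry still records that the page was referenced.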

These hypercalls are not used by Linux guests.  These implementations
have been tested using a FreeBSD guest.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 126 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   4 +-
 2 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c7a3ab2..c1df9bb 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -112,25 +112,38 @@ void kvmppc_update_rmap_change(unsigned long *rmap, 
unsigned long psize)
 }
 EXPORT_SYMBOL_GPL(kvmppc_update_rmap_change);
 
+/* Returns a pointer to the revmap entry for the page mapped by a HPTE */
+static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
+ unsigned long hpte_gr)
+{
+   struct kvm_memory_slot *memslot;
+   unsigned long *rmap;
+   unsigned long gfn;
+
+   gfn = hpte_rpn(hpte_gr, hpte_page_size(hpte_v, hpte_gr));
+   memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
+   if (!memslot)
+   return NULL;
+
+   rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
+   return rmap;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
struct revmap_entry *rev,
unsigned long hpte_v, unsigned long hpte_r)
 {
struct revmap_entry *next, *prev;
-   unsigned long gfn, ptel, head;
-   struct kvm_memory_slot *memslot;
+   unsigned long ptel, head;
unsigned long *rmap;
unsigned long rcbits;
 
rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
ptel = rev->guest_rpte |= rcbits;
-   gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
-   memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
-   if (!memslot)
+   rmap = revmap_for_hpte(kvm, hpte_v, ptel);
+   if (!rmap)
return;
-
-   rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
lock_rmap(rmap);
 
head = *rmap & KVMPPC_RMAP_INDEX;
@@ -678,6 +691,105 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long 
flags,
return H_SUCCESS;
 }
 
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+   unsigned long pte_index)
+{
+   struct kvm *kvm = vcpu->kvm;
+   __be64 *hpte;
+   unsigned long v, r, gr;
+   struct revmap_entry *rev;
+   unsigned long *rmap;
+   long ret = H_NOT_FOUND;
+
+   if (pte_index >= kvm->arch.hpt_npte)
+   return H_PARAMETER;
+
+   rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+   hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+   while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+   cpu_relax();
+   v = be64_to_cpu(hpte[0]);
+   r = be64_to_cpu(hpte[1]);
+   if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+   goto out;
+
+   gr = rev->guest_rpte;
+   if (rev->guest_rpte & HPTE_R_R) {
+   rev->guest_rpte &= ~HPTE_R_R;
+   note_hpte_modification(kvm, rev);
+   }
+   if (v & HPTE_V_VALID) {
+   gr |= r & (HPTE_R_R | HPTE_R_C);
+   if (r & HPTE_R_R) {
+   kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
+   rmap = revmap_for_hpte(kvm, v, gr);
+   if (rmap) {
+   lock_rmap(rmap);
+   *rmap |= KVMPPC_RMAP_REFERENCED;
+   unlock_rmap(rmap);
+   }
+   }
+   }
+   vcpu->arch.gpr[4] = gr;
+   ret = H_SUCCESS;
+ out:
+   unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+   return ret;
+}
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+   unsigned long pte_index)
+{
+   struct kvm *kvm = vcpu->kvm;
+   __be64 *hpte;
+   unsigned long v, r, gr;
+   struct revmap_entry *rev;
+   unsigned long *rmap;
+   long ret = H_NOT_FOUND;
+
+   if (pte_index >= kvm->arch.hpt_npte)
+   return H_PARAMETER;
+
+   rev = real_vmalloc_addr(&

[PATCH 4/5] KVM: PPC: Book3S HV: Fix bug in dirty page tracking

2015-06-24 Thread Paul Mackerras

This fixes a bug in the tracking of pages that get modified by the
guest.  If the guest creates a large-page HPTE, writes to memory
somewhere within the large page, and then removes the HPTE, we only
record the modified state for the first normal page within the large
page, when in fact the guest might have modified some other normal
page within the large page.

To fix this we use some unused bits in the rmap entry to record the
order (log base 2) of the size of the page that was modified, when
removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
order to return the correct number of modified pages.
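
The encoding can be sketched in user space as follows.  This is a model
under stated assumptions, not the kernel code: record_change() and
test_clear_dirty_npages() are illustrative names, MODEL_PAGE_SHIFT is
assumed to be 12, the change bit value is a stand-in for HPTE_R_C, and the
model sets the changed flag in the same helper that records the order,
which the kernel does separately.

```c
#include <stdint.h>

#define MODEL_RC_SHIFT    32
#define MODEL_CHG_SHIFT   48
#define MODEL_CHANGED     (UINT64_C(0x80) << MODEL_RC_SHIFT)
#define MODEL_CHG_ORDER   (UINT64_C(0x3f) << MODEL_CHG_SHIFT)
#define MODEL_PAGE_SHIFT  12	/* assumption for this sketch */

/* Floor of log base 2; x must be non-zero. */
static unsigned int log2_floor(uint64_t x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Record that a page of psize bytes was modified (model only). */
static void record_change(uint64_t *rmap, uint64_t psize)
{
	uint64_t order;

	if (!psize)
		return;
	order = (uint64_t)log2_floor(psize) << MODEL_CHG_SHIFT;
	*rmap |= MODEL_CHANGED;
	/* Keep the largest order seen since the last harvest. */
	if (order > (*rmap & MODEL_CHG_ORDER))
		*rmap = (*rmap & ~MODEL_CHG_ORDER) | order;
}

/* Return the number of normal pages to report dirty, and reset. */
static unsigned long test_clear_dirty_npages(uint64_t *rmap)
{
	unsigned long npages = 0;

	if (*rmap & MODEL_CHANGED) {
		unsigned int order = (*rmap & MODEL_CHG_ORDER) >> MODEL_CHG_SHIFT;

		*rmap &= ~(MODEL_CHANGED | MODEL_CHG_ORDER);
		npages = 1;
		if (order > MODEL_PAGE_SHIFT)
			npages = 1ul << (order - MODEL_PAGE_SHIFT);
	}
	return npages;
}
```

In this model a modified 16MB page (order 24) is reported as
1 << (24 - 12) normal pages rather than one.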

The same thing could in principle happen when removing a HPTE at the
host's request, i.e. when paging out a page, except that we never
page out large pages, and the guest can only create large-page HPTEs
if the guest RAM is backed by large pages.  However, we also fix
this case for the sake of future-proofing.

The reference bit is also subject to the same loss of information.  We
don't make the same fix here for the reference bit because there isn't
an interface for userspace to find out which pages the guest has
referenced, whereas there is one for userspace to find out which pages
the guest has modified.  Because of this loss of information, the
kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
say that a page has not been referenced when it has, but that doesn't
matter greatly because we never page or swap out large pages.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h |  1 +
 arch/powerpc/include/asm/kvm_host.h   |  2 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  8 +++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 17 +
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index b91e74a..e6b2534 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -158,6 +158,7 @@ extern pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t 
gpa, bool writing,
bool *writable);
 extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
unsigned long *rmap, long pte_index, int realmode);
+extern void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long 
psize);
 extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
unsigned long pte_index);
 void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 80eb29a..e187b6a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -205,8 +205,10 @@ struct revmap_entry {
  */
 #define KVMPPC_RMAP_LOCK_BIT   63
 #define KVMPPC_RMAP_RC_SHIFT   32
+#define KVMPPC_RMAP_CHG_SHIFT  48
 #define KVMPPC_RMAP_REFERENCED (HPTE_R_R << KVMPPC_RMAP_RC_SHIFT)
 #define KVMPPC_RMAP_CHANGED    (HPTE_R_C << KVMPPC_RMAP_RC_SHIFT)
+#define KVMPPC_RMAP_CHG_ORDER  (0x3ful << KVMPPC_RMAP_CHG_SHIFT)
 #define KVMPPC_RMAP_PRESENT    0x1ul
 #define KVMPPC_RMAP_INDEX  0xul
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dab68b7..1f9c0a1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -761,6 +761,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long 
*rmapp,
/* Harvest R and C */
rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+   if (rcbits & HPTE_R_C)
+   kvmppc_update_rmap_change(rmapp, psize);
if (rcbits & ~rev[i].guest_rpte) {
rev[i].guest_rpte = ptel | rcbits;
note_hpte_modification(kvm, &rev[i]);
@@ -927,8 +929,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, 
unsigned long *rmapp)
  retry:
lock_rmap(rmapp);
if (*rmapp & KVMPPC_RMAP_CHANGED) {
-   *rmapp &= ~KVMPPC_RMAP_CHANGED;
+   long change_order = (*rmapp & KVMPPC_RMAP_CHG_ORDER)
+   >> KVMPPC_RMAP_CHG_SHIFT;
+   *rmapp &= ~(KVMPPC_RMAP_CHANGED | KVMPPC_RMAP_CHG_ORDER);
npages_dirty = 1;
+   if (change_order > PAGE_SHIFT)
+   npages_dirty = 1ul << (change_order - PAGE_SHIFT);
}
if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c6d601c..c7a3ab2 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -12,6 +12,7 @@
 #include 
 #include

[PATCH 3/5] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE

2015-06-24 Thread Paul Mackerras

The reference (R) and change (C) bits in a HPT entry can be set by
hardware at any time up until the HPTE is invalidated and the TLB
invalidation sequence has completed.  This means that when removing
a HPTE, we need to read the HPTE after the invalidation sequence has
completed in order to obtain reliable values of R and C.  The code
in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd3265fb
("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
read after invalidation as a side effect of other changes.  This
restores the read of the HPTE after invalidation.

The user-visible effect of this bug would be that when migrating a
guest, there is a small probability that a page modified by the guest
and then unmapped by the guest might not get re-transmitted and thus
the destination might end up with a stale copy of the page.

Fixes: 6f22bd3265fb
Cc: sta...@vger.kernel.org # v3.17+
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index b027a89..c6d601c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -421,14 +421,20 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long 
flags,
rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
v = pte & ~HPTE_V_HVLOCK;
if (v & HPTE_V_VALID) {
-   u64 pte1;
-
-   pte1 = be64_to_cpu(hpte[1]);
hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
-   rb = compute_tlbie_rb(v, pte1, pte_index);
+   rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
-   /* Read PTE low word after tlbie to get final R/C values */
-   remove_revmap_chain(kvm, pte_index, rev, v, pte1);
+   /*
+* The reference (R) and change (C) bits in a HPT
+* entry can be set by hardware at any time up until
+* the HPTE is invalidated and the TLB invalidation
+* sequence has completed.  This means that when
+* removing a HPTE, we need to re-read the HPTE after
+* the invalidation sequence has completed in order to
+* obtain reliable values of R and C.
+*/
+   remove_revmap_chain(kvm, pte_index, rev, v,
+   be64_to_cpu(hpte[1]));
}
r = rev->guest_rpte & ~HPTE_GR_RESERVED;
note_hpte_modification(kvm, rev);
-- 
2.1.4

[PATCH 1/5] KVM: PPC: Book3S HV: Make use of unused threads when running guests

2015-06-24 Thread Paul Mackerras

When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical
threads are currently unused.  This patch makes it possible to use them to
run one or more other virtual cores from the same guest when certain
conditions are met.  This applies on POWER7, and on POWER8 to guests
with one thread per virtual core.  (It doesn't apply to POWER8 guests
with multiple threads per vcore because they require a 1-1 virtual to
physical thread mapping in order to be able to use msgsndp and the
TIR.)

The idea is that we maintain a list of preempted vcores for each
physical cpu (i.e. each core, since the host runs single-threaded).
Then, when a vcore is about to run, it checks to see if there are
any vcores on the list for its physical cpu that could be
piggybacked onto this vcore's execution.  If so, those additional
vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
threads are started as well as the original vcore, which is called
the master vcore.

After the vcores have exited the guest, the extra ones are put back
onto the preempted list if any of their VCPUs are still runnable and
not idle.

This means that vcpu->arch.ptid is no longer necessarily the same as
the physical thread that the vcpu runs on.  In order to make it easier
for code that wants to send an IPI to know which CPU to target, we
now store that in a new field in struct vcpu_arch, called thread_cpu.

Reviewed-by: David Gibson 
Tested-by: Laurent Vivier 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  19 +-
 arch/powerpc/kernel/asm-offsets.c   |   2 +
 arch/powerpc/kvm/book3s_hv.c| 333 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c|   7 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c|   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   5 +
 6 files changed, 298 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..2b74490 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -278,7 +278,9 @@ struct kvmppc_vcore {
u16 last_cpu;
u8 vcore_state;
u8 in_guest;
+   struct kvmppc_vcore *master_vcore;
struct list_head runnable_threads;
+   struct list_head preempt_list;
spinlock_t lock;
wait_queue_head_t wq;
spinlock_t stoltb_lock; /* protects stolen_tb and preempt_tb */
@@ -300,12 +302,18 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
-/* Values for vcore_state */
+/*
+ * Values for vcore_state.
+ * Note that these are arranged such that lower values
+ * (< VCORE_SLEEPING) don't require stolen time accounting
+ * on load/unload, and higher values do.
+ */
 #define VCORE_INACTIVE 0
-#define VCORE_SLEEPING 1
-#define VCORE_PREEMPT  2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_PREEMPT  1
+#define VCORE_PIGGYBACK    2
+#define VCORE_SLEEPING 3
+#define VCORE_RUNNING  4
+#define VCORE_EXITING  5
 
 /*
  * Struct used to manage memory for a virtual processor area
@@ -619,6 +627,7 @@ struct kvm_vcpu_arch {
int trap;
int state;
int ptid;
+   int thread_cpu;
bool timer_running;
wait_queue_head_t cpu_run;
 
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 0034b6b..d333664 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -512,6 +512,8 @@ int main(void)
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst));
+   DEFINE(VCPU_CPU, offsetof(struct kvm_vcpu, cpu));
+   DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68d067a..2048309 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,9 @@ static DECLARE_BITMAP(default_enabled_hcalls, 
MAX_HCALL_OPCODE/4 + 1);
 #define MPP_BUFFER_ORDER   3
 #endif
 
+static int target_smt_mode;
+module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -114,7 +117,7 @@ static bool kvmppc_ipi_thread(int cpu)
 
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int cpu = vcpu->cpu;
+   int cpu;
wait_queue_head_t *wqp;
 
wqp = kvm_arch_vcpu_wq(vcpu);
@@ -123,10 +126,11 @@ static void kvmp

[PATCH 0/5] PPC: Current patch queue for HV KVM

2015-06-24 Thread Paul Mackerras

This is my current queue of patches for HV KVM.  This series is based
on the kvm next branch.  All of the patches were posted 6 weeks ago or
more, though I have just added a 3-line fix to patch 2/5 for a bug
that we found in testing migration, and I expanded a comment (no code
change) in patch 3/5 following a suggestion by Aneesh.

I'd like to see these go into 4.2 if possible.

Paul.
---

 arch/powerpc/include/asm/kvm_book3s.h |   1 +
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 +
 arch/powerpc/include/asm/kvm_host.h   |  24 +-
 arch/powerpc/kernel/asm-offsets.c |   9 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   8 +-
 arch/powerpc/kvm/book3s_hv.c  | 648 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  32 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 161 +++-
 arch/powerpc/kvm/book3s_hv_rm_xics.c  |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 122 +-
 10 files changed, 906 insertions(+), 123 deletions(-)


[PATCH v2] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-06-24 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras 
---
v2: Add a test (3 lines) to book3s_hv_rmhandlers.S to ensure that we
don't subtract the timebase offset in cases where we didn't add it.
This fixes a bug found in testing where the timebase could get out of
sync, causing soft lockups and crashes.

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h   |   3 +
 arch/powerpc/kernel/asm-offsets.c |   7 +
 arch/powerpc/kvm/book3s_hv.c  | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS    8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered

Re: [PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core

2015-06-21 Thread Paul Mackerras

On Wed, Jun 17, 2015 at 07:30:09PM +0200, Laurent Vivier wrote:
> 
> Tested-by: Laurent Vivier 
> 
> Performance is better, but Paul could you explain why it is better if I 
> disable dynamic micro-threading ?
> Did I miss something ?
> 
> My test system is an IBM Power S822L.
> 
> I run two guests with 8 vCPUs (-smp 8,sockets=8,cores=1,threads=1) both
> attached on the same core (with pinning option of virt-manager). Then, I
> measure the time needed to compile a kernel in parallel in both guests
> with "make -j 16".
> 
> My kernel without micro-threading:
> 
> real    37m23.424s      real    37m24.959s
> user    167m31.474s     user    165m44.142s
> sys     113m26.195s     sys     113m45.072s
> 
> With micro-threading patches (PATCH 1+2):
> 
> target_smt_mode 0 [in fact it was 8 here, but it should behave like 0,
> as it is > max threads/sub-core]
> dynamic_mt_modes 6
> 
> real    32m13.338s      real    32m26.652s
> user    139m21.181s     user    140m20.994s
> sys     77m35.339s      sys     78m16.599s
> 
> It's better, but if I disable dynamic micro-threading (but PATCH 1+2):
> 
> target_smt_mode 0
> dynamic_mt_modes 0
> 
> real    30m49.100s      real    30m48.161s
> user    144m22.989s     user    142m53.886s
> sys     65m4.942s       sys     66m8.159s
> 
> it's even better.

I think what's happening here is that with dynamic_mt_modes=0 the
system alternates between the two guests, whereas with
dynamic_mt_modes=6 it will spend some of the time running both guests
simultaneously in two-way split mode.  Since you have two
compute-bound guests that each have threads=1 and 8 vcpus, it can fill
up the core either way.  In that case it is more efficient to fill up
the core with vcpus from one guest and not have to split the core,
firstly because you avoid the split/unsplit latency and secondly
because the threads run a little faster in whole-core mode than in
split-core.

I am considering adding an additional heuristic, which would be to do
two passes through the list of preempted vcores, considering only
vcores from the same guest as the primary vcore on the first pass, and
then considering all vcores on the second pass.  Maybe we could then
also say after the first pass that if we have collected 4 or more
runnable vcpus we don't bother with the second pass.
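
To make the proposed heuristic concrete, here is a rough user-space sketch.
It is not code from any posted patch: struct model_vcore, collect_vcores()
and the two constants are made up for illustration, subcore layout
constraints are ignored, and it assumes the count of 4 includes the primary
vcore's own runnable vcpus.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_THREADS      8	/* threads per physical core */
#define SECOND_PASS_THRESHOLD  4	/* skip pass 2 at or above this */

/* Hypothetical stand-in for an entry on the preempted-vcore list. */
struct model_vcore {
	int guest_id;		/* which VM the vcore belongs to */
	int n_runnable;		/* runnable vcpus in this vcore */
	bool taken;		/* set when chosen to run this time */
};

/*
 * Pass 0 considers only vcores from the primary vcore's guest.
 * Pass 1 considers all remaining vcores, and is skipped when
 * pass 0 already collected enough runnable vcpus.
 * Returns the total number of vcpu threads selected.
 */
static int collect_vcores(struct model_vcore *list, size_t n,
			  int primary_guest, int primary_threads)
{
	int total = primary_threads;

	for (int pass = 0; pass < 2; pass++) {
		if (pass == 1 && total >= SECOND_PASS_THRESHOLD)
			break;
		for (size_t i = 0; i < n; i++) {
			struct model_vcore *vc = &list[i];

			if (vc->taken)
				continue;
			if (pass == 0 && vc->guest_id != primary_guest)
				continue;
			if (total + vc->n_runnable > MODEL_MAX_THREADS)
				continue;
			vc->taken = true;
			total += vc->n_runnable;
		}
	}
	return total;
}
```

In this model, a core that can be filled to 4 or more threads from one
guest never picks up a vcore from another guest, so it would not need to
be split.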

Paul.

[PATCH 2/2] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-05-27 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h   |   3 +
 arch/powerpc/kernel/asm-offsets.c |   7 +
 arch/powerpc/kvm/book3s_hv.c  | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 106 +++--
 6 files changed, 469 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS    8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index d333664..c3e11e0 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -676,7 +676,14 @@ int main(void)
HSTATE_FIELD(HSTATE_DSCR, host_dscr);
HSTATE_FIELD(HSTATE_DABR, dabr);
HSTATE_FIELD(HSTATE_DECEXP, dec_expires);
+   HSTATE_FIELD(HSTATE_SPLIT_MODE, kvm_split_mode);
DEFINE(IPI_PRIORITY, IPI_PRIORITY);
+   DEFINE(KVM_SPLIT_RPR, offsetof(struct kvm_split_

[PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core

2015-05-27 Thread Paul Mackerras

This patch series provides a way to use more of the capacity of each
processor core when running guests configured with threads=1, 2 or 4
on a POWER8 host with HV KVM, without having to change the static
micro-threading (the official name for split-core) mode for the whole
machine.  The problem with setting the machine to static 2-way or
4-way micro-threading mode is that (a) then you can't run guests with
threads=8 and (b) selecting the right mode can be tricky and requires
knowledge of what guests you will be running.

Instead, with these two patches, we can now run more than one virtual
core (vcore) on a given physical core if possible, and if that means
we need to switch the core to 2-way or 4-way micro-threading mode,
then we do that on entry to the guests and switch back to whole-core
mode on exit (and we only switch the one core, not the whole machine).
The core mode switching is only done if the machine is in static
whole-core mode.

All of this only comes into effect when a core is over-committed.
When the machine is lightly loaded, everything operates the same with
these patches as without.  The logic kicks in only when a core has a
vcore that is able to run and another vcore that wanted to run on that
core but got preempted; it then tries to run both vcores at once.
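
As a rough illustration of the mode selection described above, the
following C sketch picks the smallest split (1, 2 or 4 subcores) that
can hold a set of vcores.  MAX_THREADS and MAX_SUBCORES are taken from
the patch; the function name pick_split_mode() and the fitting rule
(one vcore per subcore, each vcore needing at most MAX_THREADS / split
threads) are assumptions made for illustration and are not the logic
in book3s_hv.c.

```c
#include <stddef.h>

#define MAX_THREADS   8   /* threads per physical core (from the patch) */
#define MAX_SUBCORES  4   /* subcores per physical core (from the patch) */

/*
 * Hypothetical helper: return the number of subcores (1, 2 or 4) needed
 * to run n_vcores vcores at once, one vcore per subcore, or 0 if they
 * cannot all share the core.  threads[i] is the thread count of vcore i.
 */
int pick_split_mode(const int *threads, size_t n_vcores)
{
	int split;
	size_t i;

	if (n_vcores == 0 || n_vcores > MAX_SUBCORES)
		return 0;

	for (split = 1; split <= MAX_SUBCORES; split *= 2) {
		int subcore_size = MAX_THREADS / split;

		if ((size_t)split < n_vcores)
			continue;	/* not enough subcores yet */
		for (i = 0; i < n_vcores; i++)
			if (threads[i] > subcore_size)
				break;
		/*
		 * If a vcore does not fit here, a larger split only
		 * makes the subcores smaller, so give up.
		 */
		return (i == n_vcores) ? split : 0;
	}
	return 0;
}
```

With this rule, a single threads=8 vcore keeps the core in whole-core
mode, two threads=4 vcores need 2-way mode, and three or four vcores
with at most two threads each need 4-way mode.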

Paul.
---

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 +
 arch/powerpc/include/asm/kvm_host.h   |  22 +-
 arch/powerpc/kernel/asm-offsets.c |   9 +
 arch/powerpc/kvm/book3s_hv.c  | 648 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  32 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c  |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 111 -
 7 files changed, 740 insertions(+), 106 deletions(-)

[PATCH 1/2] KVM: PPC: Book3S HV: Make use of unused threads when running guests

2015-05-27 Thread Paul Mackerras

When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical
threads are currently unused.  This makes it possible to use them to
run one or more other virtual cores from the same guest when certain
conditions are met.  This applies on POWER7, and on POWER8 to guests
with one thread per virtual core.  (It doesn't apply to POWER8 guests
with multiple threads per vcore because they require a 1-1 virtual to
physical thread mapping in order to be able to use msgsndp and the
TIR.)

The idea is that we maintain a list of preempted vcores for each
physical cpu (i.e. each core, since the host runs single-threaded).
Then, when a vcore is about to run, it checks to see if there are
any vcores on the list for its physical cpu that could be
piggybacked onto this vcore's execution.  If so, those additional
vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
threads are started as well as the original vcore, which is called
the master vcore.

After the vcores have exited the guest, the extra ones are put back
onto the preempted list if any of their VCPUs are still runnable and
not idle.
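
The selection step described above can be sketched in C as follows.
This is only a simplified model: struct vcore_sketch, its fields and
collect_piggybacks() are hypothetical names, and the only conditions
checked are "same guest" and "fits in the unused threads".  The real
patch applies further conditions that are not modelled here.

```c
#include <stddef.h>

/* Simplified stand-in for struct kvmppc_vcore; field names are hypothetical. */
struct vcore_sketch {
	int guest_id;      /* which guest the vcore belongs to */
	int num_threads;   /* runnable threads the vcore needs */
	int piggybacked;   /* set when chosen to run alongside the master */
};

/*
 * Mark preempted vcores that can run alongside the master vcore on a
 * core with threads_per_core hardware threads.  Returns the number of
 * vcores chosen.
 */
int collect_piggybacks(const struct vcore_sketch *master,
		       struct vcore_sketch *preempted, size_t n,
		       int threads_per_core)
{
	int free_threads = threads_per_core - master->num_threads;
	int chosen = 0;
	size_t i;

	for (i = 0; i < n && free_threads > 0; i++) {
		struct vcore_sketch *vc = &preempted[i];

		if (vc->guest_id != master->guest_id)
			continue;	/* only vcores from the same guest */
		if (vc->num_threads > free_threads)
			continue;	/* would not fit in the unused threads */
		vc->piggybacked = 1;
		free_threads -= vc->num_threads;
		chosen++;
	}
	return chosen;
}
```

The walk is greedy in list order, so a vcore early in the preempted
list can take threads that a later, larger vcore would have needed.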

This means that vcpu->arch.ptid is no longer necessarily the same as
the physical thread that the vcpu runs on.  In order to make it easier
for code that wants to send an IPI to know which CPU to target, we
now store that in a new field in struct vcpu_arch, called thread_cpu.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  19 +-
 arch/powerpc/kernel/asm-offsets.c   |   2 +
 arch/powerpc/kvm/book3s_hv.c| 333 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c|   7 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c|   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   5 +
 6 files changed, 298 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..2b74490 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -278,7 +278,9 @@ struct kvmppc_vcore {
u16 last_cpu;
u8 vcore_state;
u8 in_guest;
+   struct kvmppc_vcore *master_vcore;
struct list_head runnable_threads;
+   struct list_head preempt_list;
spinlock_t lock;
wait_queue_head_t wq;
spinlock_t stoltb_lock; /* protects stolen_tb and preempt_tb */
@@ -300,12 +302,18 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
-/* Values for vcore_state */
+/*
+ * Values for vcore_state.
+ * Note that these are arranged such that lower values
+ * (< VCORE_SLEEPING) don't require stolen time accounting
+ * on load/unload, and higher values do.
+ */
 #define VCORE_INACTIVE 0
-#define VCORE_SLEEPING 1
-#define VCORE_PREEMPT  2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_PREEMPT  1
+#define VCORE_PIGGYBACK 2
+#define VCORE_SLEEPING 3
+#define VCORE_RUNNING  4
+#define VCORE_EXITING  5
 
 /*
  * Struct used to manage memory for a virtual processor area
@@ -619,6 +627,7 @@ struct kvm_vcpu_arch {
int trap;
int state;
int ptid;
+   int thread_cpu;
bool timer_running;
wait_queue_head_t cpu_run;
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0034b6b..d333664 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -512,6 +512,8 @@ int main(void)
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst));
+   DEFINE(VCPU_CPU, offsetof(struct kvm_vcpu, cpu));
+   DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68d067a..2048309 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,9 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 #define MPP_BUFFER_ORDER   3
 #endif
 
+static int target_smt_mode;
+module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -114,7 +117,7 @@ static bool kvmppc_ipi_thread(int cpu)
 
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int cpu = vcpu->cpu;
+   int cpu;
wait_queue_head_t *wqp;
 
wqp = kvm_arch_vcpu_wq(vcpu);
@@ -123,10 +126,11 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu 
*vcp

[PATCH] KVM: PPC: Book3S HV: Fix list traversal in error case

2015-04-28 Thread Paul Mackerras

This fixes a regression introduced in commit 25fedfca94cf, "KVM: PPC:
Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which
leads to a user-triggerable oops.

In the case where we try to run a vcore on a physical core that is
not in single-threaded mode, or the vcore has too many threads for
the physical core, we iterate the list of runnable vcpus to make
each one return an EBUSY error to userspace.  Since this involves
taking each vcpu off the runnable_threads list for the vcore, we
need to use list_for_each_entry_safe rather than list_for_each_entry
to traverse the list.  Otherwise the kernel will crash with an oops
message like this:

Unable to handle kernel paging request for data at address 0x000fff88
Faulting instruction address: 0xd0001e635dc8
Oops: Kernel access of bad area, sig: 11 [#2]
SMP NR_CPUS=1024 NUMA PowerNV
...
CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G  D3.18.0 #1
task: c0274e507500 ti: c027d1924000 task.ti: c027d1924000
NIP: d0001e635dc8 LR: d0001e635df8 CTR: c011ba50
REGS: c027d19275b0 TRAP: 0300   Tainted: G  D (3.18.0)
MSR: 90009033   CR: 22002824  XER: 
CFAR: c0008468 DAR: 000fff88 DSISR: 4000 SOFTE: 1
GPR00: d0001e635df8 c027d1927830 d0001e64c850 0001
GPR04: 0001 0001  
GPR08: 00200200   d0001e63e588
GPR12: 2200 c7dbc800 c00fc780 000a
GPR16: fffc c00fd5439690 c00fc7801c98 0001
GPR20: 0003 c027d1927aa8 c00fd543b348 c00fd543b350
GPR24:  c00fa57f 0030 
GPR28: fff0 c00fd543b328 000fe468 c00fd543b300
NIP [d0001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv]
LR [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv]
Call Trace:
[c027d1927830] [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] (unreliable)
[c027d1927a30] [d0001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv]
[c027d1927b70] [d0001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm]
[c027d1927ba0] [d0001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[c027d1927be0] [d0001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm]
[c027d1927d40] [c02d6720] do_vfs_ioctl+0x490/0x780
[c027d1927de0] [c02d6ae4] SyS_ioctl+0xd4/0xf0
[c027d1927e30] [c0009358] syscall_exit+0x0/0x98
Instruction dump:
6000 6042 387e1b30 3883 38a1 38c0 480087d9 e8410018
ebde1c98 7fbdf040 3bdee368 419e0048 <813e1b20> 939e1b18 2f890001 409effcc
---[ end trace 8cdf50251cca6680 ]---
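
To illustrate why the plain iterator is unsafe here, the following
self-contained C sketch removes every entry from a circular doubly
linked list while walking it.  It does not use the kernel's list.h;
struct node, node_del(), list_add_tail_sketch() and fail_all() are
hypothetical stand-ins.  The point is only that the successor must be
saved before the current entry is unlinked, which is what the extra
cursor argument of list_for_each_entry_safe() is for.

```c
#include <stddef.h>

struct node {
	struct node *next, *prev;
	int ret;
};

/*
 * Unlink n and clear its pointers.  (The kernel's list_del() poisons
 * them instead; either way n->next must not be followed afterwards.)
 */
void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

/* Append n at the tail of the circular list headed by head. */
void list_add_tail_sketch(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Remove every entry, storing err in each.  The successor is read into
 * 'next' before the current node is unlinked, so the walk never follows
 * a pointer out of a node that has already been removed.
 */
int fail_all(struct node *head, int err)
{
	struct node *cur, *next;
	int removed = 0;

	for (cur = head->next, next = cur->next; cur != head;
	     cur = next, next = cur->next) {
		cur->ret = err;
		node_del(cur);	/* cur->next is no longer usable after this */
		removed++;
	}
	return removed;
}
```

A loop that instead advanced with cur = cur->next after node_del()
would follow the cleared (or, in the kernel, poisoned) pointer, which
is the kind of access that produces an oops like the one above.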

Fixes: 25fedfca94cf
Signed-off-by: Paul Mackerras 
---
Since this is a regression fix for a patch that went in post 4.0,
it should go in for 4.1.

 arch/powerpc/kvm/book3s_hv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 48d3c5d..df81caa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1952,7 +1952,7 @@ static void post_guest_process(struct kvmppc_vcore *vc)
  */
 static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-   struct kvm_vcpu *vcpu;
+   struct kvm_vcpu *vcpu, *vnext;
int i;
int srcu_idx;
 
@@ -1982,7 +1982,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 */
if ((threads_per_core > 1) &&
((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
vcpu->arch.ret = -EBUSY;
kvmppc_remove_runnable(vc, vcpu);
wake_up(&vcpu->arch.cpu_run);
-- 
2.1.4


Re: [PATCH 1/3] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE

2015-04-28 Thread Paul Mackerras

On Tue, Apr 28, 2015 at 10:36:52AM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras  writes:
> 
> > The reference (R) and change (C) bits in a HPT entry can be set by
> > hardware at any time up until the HPTE is invalidated and the TLB
> > invalidation sequence has completed.  This means that when removing
> > a HPTE, we need to read the HPTE after the invalidation sequence has
> > completed in order to obtain reliable values of R and C.  The code
> > in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd3265fb
> > ("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
> > read after invalidation as a side effect of other changes.  This
> > restores the read of the HPTE after invalidation.
> >
> > The user-visible effect of this bug would be that when migrating a
> > guest, there is a small probability that a page modified by the guest
> > and then unmapped by the guest might not get re-transmitted and thus
> > the destination might end up with a stale copy of the page.
> >
> > Fixes: 6f22bd3265fb
> > Cc: sta...@vger.kernel.org # v3.17+
> > Signed-off-by: Paul Mackerras 
> > ---
> >  arch/powerpc/kvm/book3s_hv_rm_mmu.c | 8 +++-
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > index f6bf0b1..5c1737f 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > @@ -413,14 +413,12 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
> > rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
> > v = pte & ~HPTE_V_HVLOCK;
> > if (v & HPTE_V_VALID) {
> > -   u64 pte1;
> > -
> > -   pte1 = be64_to_cpu(hpte[1]);
> > hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
> > -   rb = compute_tlbie_rb(v, pte1, pte_index);
> > +   rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
> > do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
> > /* Read PTE low word after tlbie to get final R/C values */
> > -   remove_revmap_chain(kvm, pte_index, rev, v, pte1);
> > +   remove_revmap_chain(kvm, pte_index, rev, v,
> > +   be64_to_cpu(hpte[1]));
> > }
> 
> Maybe add the above commit message as a code comment?

Well, that's what "/* Read PTE low word after tlbie to get final R/C
values */" was trying to be, originally, but maybe it would be helpful
to expand on it.

Paul.

[PATCH 1/3] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE

2015-04-23 Thread Paul Mackerras

The reference (R) and change (C) bits in a HPT entry can be set by
hardware at any time up until the HPTE is invalidated and the TLB
invalidation sequence has completed.  This means that when removing
a HPTE, we need to read the HPTE after the invalidation sequence has
completed in order to obtain reliable values of R and C.  The code
in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd3265fb
("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
read after invalidation as a side effect of other changes.  This
restores the read of the HPTE after invalidation.

The user-visible effect of this bug would be that when migrating a
guest, there is a small probability that a page modified by the guest
and then unmapped by the guest might not get re-transmitted and thus
the destination might end up with a stale copy of the page.

Fixes: 6f22bd3265fb
Cc: sta...@vger.kernel.org # v3.17+
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index f6bf0b1..5c1737f 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -413,14 +413,12 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
v = pte & ~HPTE_V_HVLOCK;
if (v & HPTE_V_VALID) {
-   u64 pte1;
-
-   pte1 = be64_to_cpu(hpte[1]);
hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
-   rb = compute_tlbie_rb(v, pte1, pte_index);
+   rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
/* Read PTE low word after tlbie to get final R/C values */
-   remove_revmap_chain(kvm, pte_index, rev, v, pte1);
+   remove_revmap_chain(kvm, pte_index, rev, v,
+   be64_to_cpu(hpte[1]));
}
r = rev->guest_rpte & ~HPTE_GR_RESERVED;
note_hpte_modification(kvm, rev);
-- 
2.1.4


[PATCH 3/3] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD

2015-04-23 Thread Paul Mackerras

This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes.  When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.

These hypercalls are not used by Linux guests and these implementations
are only compile tested.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 126 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   4 +-
 2 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 24ccc79..479ff7e 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -109,25 +109,38 @@ void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize)
 }
 EXPORT_SYMBOL_GPL(kvmppc_update_rmap_change);
 
+/* Returns a pointer to the revmap entry for the page mapped by a HPTE */
+static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
+ unsigned long hpte_gr)
+{
+   struct kvm_memory_slot *memslot;
+   unsigned long *rmap;
+   unsigned long gfn;
+
+   gfn = hpte_rpn(hpte_gr, hpte_page_size(hpte_v, hpte_gr));
+   memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
+   if (!memslot)
+   return NULL;
+
+   rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
+   return rmap;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
struct revmap_entry *rev,
unsigned long hpte_v, unsigned long hpte_r)
 {
struct revmap_entry *next, *prev;
-   unsigned long gfn, ptel, head;
-   struct kvm_memory_slot *memslot;
+   unsigned long ptel, head;
unsigned long *rmap;
unsigned long rcbits;
 
rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
ptel = rev->guest_rpte |= rcbits;
-   gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
-   memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
-   if (!memslot)
+   rmap = revmap_for_hpte(kvm, hpte_v, ptel);
+   if (!rmap)
return;
-
-   rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
lock_rmap(rmap);
 
head = *rmap & KVMPPC_RMAP_INDEX;
@@ -662,6 +675,105 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
return H_SUCCESS;
 }
 
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+   unsigned long pte_index)
+{
+   struct kvm *kvm = vcpu->kvm;
+   __be64 *hpte;
+   unsigned long v, r, gr;
+   struct revmap_entry *rev;
+   unsigned long *rmap;
+   long ret = H_NOT_FOUND;
+
+   if (pte_index >= kvm->arch.hpt_npte)
+   return H_PARAMETER;
+
+   rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+   hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+   while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+   cpu_relax();
+   v = be64_to_cpu(hpte[0]);
+   r = be64_to_cpu(hpte[1]);
+   if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+   goto out;
+
+   gr = rev->guest_rpte;
+   if (rev->guest_rpte & HPTE_R_R) {
+   rev->guest_rpte &= ~HPTE_R_R;
+   note_hpte_modification(kvm, rev);
+   }
+   if (v & HPTE_V_VALID) {
+   gr |= r & (HPTE_R_R | HPTE_R_C);
+   if (r & HPTE_R_R) {
+   kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
+   rmap = revmap_for_hpte(kvm, v, gr);
+   if (rmap) {
+   lock_rmap(rmap);
+   *rmap |= KVMPPC_RMAP_REFERENCED;
+   unlock_rmap(rmap);
+   }
+   }
+   }
+   vcpu->arch.gpr[4] = gr;
+   ret = H_SUCCESS;
+ out:
+   unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+   return ret;
+}
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+   unsigned long pte_index)
+{
+   struct kvm *kvm = vcpu->kvm;
+   __be64 *hpte;
+   unsigned long v, r, gr;
+   struct revmap_entry *rev;
+   unsigned long *rmap;
+   long ret = H_NOT_FOUND;
+
+   if (pte_index >= kvm->arch.hpt_npte)
+   return H_PARAMETER;
+
+   rev = real_vmalloc_addr(&kvm->arch.revmap[p

[PATCH 2/3] KVM: PPC: Book3S HV: Fix bug in dirty page tracking

2015-04-23 Thread Paul Mackerras

This fixes a bug in the tracking of pages that get modified by the
guest.  If the guest creates a large-page HPTE, writes to memory
somewhere within the large page, and then removes the HPTE, we only
record the modified state for the first normal page within the large
page, when in fact the guest might have modified some other normal
page within the large page.

To fix this we use some unused bits in the rmap entry to record the
order (log base 2) of the size of the page that was modified, when
removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
order to return the correct number of modified pages.
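
The encoding described above can be sketched in C roughly as follows.
The shift and mask values match the kvm_host.h hunk in this patch; the
value 0x80 for HPTE_R_C and the function names rmap_note_change() and
rmap_test_clear_dirty() are assumptions made for illustration.  Unlike
the patch, this sketch also sets the changed bit inside the recording
helper, and it ignores locking entirely.

```c
#include <stdint.h>

#define RMAP_RC_SHIFT   32
#define RMAP_CHG_SHIFT  48
#define HPTE_R_C_SKETCH 0x80ULL   /* assumed value of HPTE_R_C */
#define RMAP_CHANGED    (HPTE_R_C_SKETCH << RMAP_RC_SHIFT)
#define RMAP_CHG_ORDER  (0x3fULL << RMAP_CHG_SHIFT)

/* Record that a page of psize bytes (a power of two) was modified. */
void rmap_note_change(uint64_t *rmap, uint64_t psize)
{
	uint64_t order = 0;

	if (!psize)
		return;
	while ((psize >> order) > 1)
		order++;	/* order = log2(psize) */
	*rmap |= RMAP_CHANGED;
	/* Keep the largest order seen since the last test-and-clear. */
	if ((order << RMAP_CHG_SHIFT) > (*rmap & RMAP_CHG_ORDER))
		*rmap = (*rmap & ~RMAP_CHG_ORDER) | (order << RMAP_CHG_SHIFT);
}

/* Return the number of dirty base pages and clear the changed state. */
uint64_t rmap_test_clear_dirty(uint64_t *rmap, unsigned int page_shift)
{
	uint64_t npages = 0;

	if (*rmap & RMAP_CHANGED) {
		uint64_t order = (*rmap & RMAP_CHG_ORDER) >> RMAP_CHG_SHIFT;

		*rmap &= ~(RMAP_CHANGED | RMAP_CHG_ORDER);
		npages = 1;
		if (order > page_shift)
			npages = 1ULL << (order - page_shift);
	}
	return npages;
}
```

For example, with a 16MB large page (order 24) and 64kB base pages
(page_shift 16), the test-and-clear step reports 1 << (24 - 16) = 256
dirty pages rather than 1.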

The same thing could in principle happen when removing a HPTE at the
host's request, i.e. when paging out a page, except that we never
page out large pages, and the guest can only create large-page HPTEs
if the guest RAM is backed by large pages.  However, we also fix
this case for the sake of future-proofing.

The reference bit is also subject to the same loss of information.  We
don't make the same fix here for the reference bit because there isn't
an interface for userspace to find out which pages the guest has
referenced, whereas there is one for userspace to find out which pages
the guest has modified.  Because of this loss of information, the
kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
say that a page has not been referenced when it has, but that doesn't
matter greatly because we never page or swap out large pages.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s.h |  1 +
 arch/powerpc/include/asm/kvm_host.h   |  2 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  8 +++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 17 +
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 578e550..9b072a5 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -162,6 +162,7 @@ extern pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
bool *writable);
 extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
unsigned long *rmap, long pte_index, int realmode);
+extern void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize);
 extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
unsigned long pte_index);
 void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d67a838..9c2617e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -205,8 +205,10 @@ struct revmap_entry {
  */
 #define KVMPPC_RMAP_LOCK_BIT   63
 #define KVMPPC_RMAP_RC_SHIFT   32
+#define KVMPPC_RMAP_CHG_SHIFT  48
 #define KVMPPC_RMAP_REFERENCED (HPTE_R_R << KVMPPC_RMAP_RC_SHIFT)
 #define KVMPPC_RMAP_CHANGED    (HPTE_R_C << KVMPPC_RMAP_RC_SHIFT)
+#define KVMPPC_RMAP_CHG_ORDER  (0x3ful << KVMPPC_RMAP_CHG_SHIFT)
 #define KVMPPC_RMAP_PRESENT    0x1ul
 #define KVMPPC_RMAP_INDEX  0xul
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index d6fe308..c9c25af 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -763,6 +763,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
/* Harvest R and C */
rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+   if (rcbits & HPTE_R_C)
+   kvmppc_update_rmap_change(rmapp, psize);
if (rcbits & ~rev[i].guest_rpte) {
rev[i].guest_rpte = ptel | rcbits;
note_hpte_modification(kvm, &rev[i]);
@@ -929,8 +931,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
  retry:
lock_rmap(rmapp);
if (*rmapp & KVMPPC_RMAP_CHANGED) {
-   *rmapp &= ~KVMPPC_RMAP_CHANGED;
+   long change_order = (*rmapp & KVMPPC_RMAP_CHG_ORDER)
+   >> KVMPPC_RMAP_CHG_SHIFT;
+   *rmapp &= ~(KVMPPC_RMAP_CHANGED | KVMPPC_RMAP_CHG_ORDER);
npages_dirty = 1;
+   if (change_order > PAGE_SHIFT)
+   npages_dirty = 1ul << (change_order - PAGE_SHIFT);
}
if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 5c1737f..24ccc79 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -12,6 +12,7 @@
 #include 
 #include

[PATCH 0/3] PPC HV bug fixes + hcalls for FreeBSD

2015-04-23 Thread Paul Mackerras

The main purpose of this series is to implement the H_CLEAR_REF and
H_CLEAR_MOD hypercalls defined by PAPR.  We are doing this for the
sake of FreeBSD guests as Linux guests don't use them.  Along the way
I found a couple of bugs, so the fixes for those are split out as the
first two patches.

The first two patches could go in immediately.  I'd like to get
feedback from actual users of H_CLEAR_REF/MOD before the third patch
goes in.

These patches are against Alex Graf's kvm-ppc-queue branch.

Paul.

 arch/powerpc/include/asm/kvm_host.h |  2 ++
 arch/powerpc/kernel/asm-offsets.c   |  3 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c |  6 +++-
 arch/powerpc/kvm/book3s_hv.c| 51 ++---
 arch/powerpc/kvm/book3s_hv_builtin.c| 16 +--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 26 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 --
 7 files changed, 98 insertions(+), 28 deletions(-)

Re: [PATCH v2 00/12] Remaining improvements for HV KVM

2015-04-21 Thread Paul Mackerras

On Wed, Apr 15, 2015 at 10:16:41PM +0200, Alexander Graf wrote:
> 
> 
> On 14.04.15 13:56, Paul Mackerras wrote:
> > Did you forget to push it out or something?  Your kvm-ppc-queue branch
> > is still at 4.0-rc1 as far as I can see.
> 
> Oops, not sure how that happened. Does it show up correctly for you now?

Yes, it's fine now, thanks.

Paul.

Re: [PATCH v2 00/12] Remaining improvements for HV KVM

2015-04-14 Thread Paul Mackerras

On Thu, Apr 09, 2015 at 12:57:58AM +0200, Alexander Graf wrote:
> On 03/28/2015 04:21 AM, Paul Mackerras wrote:
> >This is the rest of my current patch queue for HV KVM on PPC.  This
> >series is based on Alex Graf's kvm-ppc-queue branch.  The only change
> >from the previous version of this series is that patch 2 has been
> >updated to take account of the timebase offset.
> >
> >The last patch in this series needs a definition of PPC_MSGCLR that is
> >added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
> >handling", which has now gone upstream into Linus' tree as commit
> >755563bc79c7 via the linuxppc-dev mailing list.  Alex, how do you want
> >to handle that?  You could pull in the master branch of the kvm tree,
> >which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and
> >let the subsequent merge fix it up.
> 
> I've just cherry-picked it for now since it still lives in my queue, so it
> will get thrown out automatically once I rebase on next if it's included in
> there.
> 
> Paolo / Marcelo, could you please try to get the commit above into the
> next branch somehow? I guess the easiest would be to merge linus/master
> into kvm/next.
> 
> Thanks, applied all to kvm-ppc-queue.

Did you forget to push it out or something?  Your kvm-ppc-queue branch
is still at 4.0-rc1 as far as I can see.

Paul.

Re: H_CLEAR_REF and H_CLEAR_MOD

2015-04-14 Thread Paul Mackerras

On Sat, Apr 11, 2015 at 12:57:54PM -0700, Nathan Whitehorn wrote:
> 
> 
> On 02/18/15 15:33, Nathan Whitehorn wrote:
> >
> >On 02/18/15 14:00, Paul Mackerras wrote:
> >>On Wed, Feb 18, 2015 at 09:34:54AM +0100, Alexander Graf wrote:
> >>>>Am 18.02.2015 um 07:12 schrieb Nathan Whitehorn
> >>>>:
> >>>>
> >>>>It seems like KVM doesn't implement the H_CLEAR_REF and H_CLEAR_MOD
> >>>>hypervisor calls, which are absolutely critical for memory
> >>>>management in the FreeBSD kernel (and are marked "mandatory" in the
> >>>>PAPR manual). It seems some patches have been contributed already in 
> >>>>https://lists.ozlabs.org/pipermail/linuxppc-dev/2011-December/095013.html,
> >>>>so it would be fantastic if these could end up upstream.
> >>>Paul, I guess we never included this because  there was no user. If
> >>>FreeBSD does use it though, I think it makes a lot of sense to resend
> >>>it for inclusion.
> >>I agree.  I just need to check the locking and synchronization around
> >>the reference and change bit recording, then I'll resend it.
> >
> >Thanks much! Please let me know if I can help at all with this.
> 
> Any news on this? I'm happy to test the patch if you like.
> -Nathan

I have discovered a bug where we can lose the host-side view of the
dirtiness of pages when the guest does a H_CLEAR_MOD on a huge-page
HPTE, so I'm working on a fix for that.

Paul.

[PATCH 10/12] KVM: PPC: Book3S HV: Streamline guest entry and exit

2015-03-27 Thread Paul Mackerras

On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  As a result, the other threads exit
sooner, and the first thread spends less time waiting for them at the
point where the MMU gets switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry: avg 4537.3ns (222 - 48444, 1068878 samples)
  rm_exit: avg 4787.6ns (152 - 165490, 1010717 samples)
  rm_intr: avg 1673.6ns (12 - 341304, 3818691 samples)

and this after the change:

 rm_entry: avg 3427.7ns (232 - 68150, 1118921 samples)
  rm_exit: avg 4716.0ns (12 - 150720, 1119477 samples)
  rm_intr: avg 1614.8ns (12 - 522436, 3850432 samples)

showing a substantial reduction in the time spent per guest entry in
the real-mode guest entry code, and smaller reductions in the real-mode
guest exit and interrupt handling times.  (The test was to start
the guest and boot Fedora 20 big-endian to the login prompt.)
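
For reference, the relative improvement in each average can be worked
out from the figures above with a small helper such as the following.
The function name pct_reduction() is hypothetical; the inputs are the
averages quoted in this message.

```c
/* Percentage by which 'after' is lower than 'before'. */
double pct_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

Applied to the averages above, this gives roughly 24.5% for rm_entry
(4537.3ns to 3427.7ns), about 1.5% for rm_exit (4787.6ns to 4716.0ns)
and about 3.5% for rm_intr (1673.6ns to 1614.8ns).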

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 212 +++-
 1 file changed, 126 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 063c235..1de596f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -175,6 +175,19 @@ kvmppc_primary_no_guest:
/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
mfspr   r3, SPRN_HDEC
mtspr   SPRN_DEC, r3
+   /*
+* Make sure the primary has finished the MMU switch.
+* We should never get here on a secondary thread, but
+* check it for robustness' sake.
+*/
+   ld  r5, HSTATE_KVM_VCORE(r13)
+65:    lbz r0, VCORE_IN_GUEST(r5)
+   cmpwi   r0, 0
+   beq 65b
+   /* Set LPCR. */
+   ld  r8,VCORE_LPCR(r5)
+   mtspr   SPRN_LPCR,r8
+   isync
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -206,7 +219,7 @@ kvm_novcpu_wakeup:
 
/* check the wake reason */
bl  kvmppc_check_wake_reason
-   
+
/* see if any other thread is already exiting */
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
@@ -244,7 +257,15 @@ kvm_novcpu_wakeup:
b   kvmppc_got_guest
 
 kvm_novcpu_exit:
-   b   hdec_soon
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   cmpdi   r4, 0
+   beq 13f
+   addi    r3, r4, VCPU_TB_RMEXIT
+   bl  kvmhv_accumulate_time
+#endif
+13:    bl  kvmhv_commence_exit
+   b   kvmhv_switch_to_host
 
 /*
  * We come in here when wakened from nap mode.
@@ -422,7 +443,7 @@ kvmppc_hv_entry:
/* Primary thread switches to guest partition. */
ld  r9,VCORE_KVM(r5)/* pointer to struct kvm */
cmpwi   r6,0
-   bne 20f
+   bne 10f
ld  r6,KVM_SDR1(r9)
lwz r7,KVM_LPID(r9)
li  r0,LPID_RSVD/* switch to reserved LPID */
@@ -493,26 +514,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
li  r0,1
stb r0,VCORE_IN_GUEST(r5)   /* signal secondaries to continue */
-   b   10f
-
-   /* Secondary threads wait for primary to have done partition switch */
-20:    lbz r0,VCORE_IN_GUEST(r5)
-   cmpwi   r0,0
-   beq 20b
-
-   /* Set LPCR. */
-10:    ld  r8,VCORE_LPCR(r5)
-   mtspr   SPRN_LPCR,r8
-   isync
-
-   /* Check if HDEC expires soon */
-   mfspr   r3,SPRN_HDEC
-   cmpwi   r3,512  /* 1 microsecond */
-   li  r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   blt hdec_soon
 
/* Do we have a guest vcpu to run? */
-   cmpdi   r4, 0
+10:    cmpdi   r4, 0
beq kvmppc_primary_no_guest
 kvmppc_got_guest:
 
@@ -837,6 +841,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
clrrdi  r6,r6,1
mtspr   SPRN_CTRLT,r6
 4:
+   /* Secondary threads wait for primary to have done partition switch */
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   lbz

[PATCH 11/12] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C

2015-03-27 Thread Paul Mackerras

This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c.  It also moves the IPI sending code that was
in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +
 arch/powerpc/kvm/book3s_hv_builtin.c | 63 ++
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 12 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 66 
 4 files changed, 75 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 869c53f..2b84e48 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
+extern void kvmhv_rm_send_ipi(int cpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 2754251..c42aa55 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_CMA_CHUNK_ORDER    18
 
@@ -184,3 +185,65 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu)
 
return H_HARDWARE;
 }
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+   __asm__ __volatile__("stbcix %0,0,%1"
+   : : "r" (val), "r" (paddr) : "memory");
+}
+
+/*
+ * Send an interrupt to another CPU.
+ * This can only be called in real mode.
+ * The caller needs to include any barrier needed to order writes
+ * to memory vs. the IPI/message.
+ */
+void kvmhv_rm_send_ipi(int cpu)
+{
+   unsigned long xics_phys;
+
+   /* Poke the target */
+   xics_phys = paca[cpu].kvm_hstate.xics_phys;
+   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+/*
+ * The following functions are called from the assembly code
+ * in book3s_hv_rmhandlers.S.
+ */
+static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active)
+{
+   int cpu = vc->pcpu;
+
+   /* Order setting of exit map vs. msgsnd/IPI */
+   smp_mb();
+   for (; active; active >>= 1, ++cpu)
+   if (active & 1)
+   kvmhv_rm_send_ipi(cpu);
+}
+
+void kvmhv_commence_exit(int trap)
+{
+   struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore;
+   int ptid = local_paca->kvm_hstate.ptid;
+   int me, ee;
+
+   /* Set our bit in the threads-exiting-guest map in the 0xff00
+  bits of vcore->entry_exit_map */
+   me = 0x100 << ptid;
+   do {
+   ee = vc->entry_exit_map;
+   } while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee);
+
+   /* Are we the first here? */
+   if ((ee >> 8) != 0)
+   return;
+
+   /*
+* Trigger the other threads in this vcore to exit the guest.
+* If this is a hypervisor decrementer interrupt then they
+* will be already on their way out of the guest.
+*/
+   if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER)
+   kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid));
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 6dded8c..00e45b6 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -26,12 +26,6 @@
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
u32 new_irq);
 
-static inline void rm_writeb(unsigned long paddr, u8 val)
-{
-   __asm__ __volatile__("sync; stbcix %0,0,%1"
-   : : "r" (val), "r" (paddr) : "memory");
-}
-
 /* -- ICS routines -- */
 static void ics_rm_check_resend(struct kvmppc_xics *xics,
struct kvmppc_ics *ics, struct kvmppc_icp *icp)
@@ -60,7 +54,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
-   unsigned long xics_phys;
int cpu;
 
/* Mark the target VCPU as having an interrupt pending */
@@ -83,9 +76,8 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
/* In SMT cpu will always point to thread 0, we adjust it */
cpu += vcpu->arch.ptid;
 
-   /* Not too hard, then poke the target */
-   xics_phys = paca[cpu].kvm_hstate.xics_phys;
-   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+   smp_mb();
+   kvmhv_rm_send_ipi(cpu);
 }
 
 static void icp_rm_clr_vcpu_irq(struct kvm_vcpu *vcpu)
diff --g

[PATCH 09/12] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count

2015-03-27 Thread Paul Mackerras

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h | 15 
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kvm/book3s_hv.c|  5 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c| 10 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 61 +++--
 5 files changed, 44 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 1517faa..d67a838 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -263,15 +263,15 @@ struct kvm_arch {
 
 /*
  * Struct for a virtual core.
- * Note: entry_exit_count combines an entry count in the bottom 8 bits
- * and an exit count in the next 8 bits.  This is so that we can
- * atomically increment the entry count iff the exit count is 0
- * without taking the lock.
+ * Note: entry_exit_map combines a bitmap of threads that have entered
+ * in the bottom 8 bits and a bitmap of threads that have exited in the
+ * next 8 bits.  This is so that we can atomically set the entry bit
+ * iff the exit map is 0 without taking a lock.
  */
 struct kvmppc_vcore {
int n_runnable;
int num_threads;
-   int entry_exit_count;
+   int entry_exit_map;
int napping_threads;
int first_vcpuid;
u16 pcpu;
@@ -296,8 +296,9 @@ struct kvmppc_vcore {
ulong conferring_threads;
 };
 
-#define VCORE_ENTRY_COUNT(vc)  ((vc)->entry_exit_count & 0xff)
-#define VCORE_EXIT_COUNT(vc)   ((vc)->entry_exit_count >> 8)
+#define VCORE_ENTRY_MAP(vc)    ((vc)->entry_exit_map & 0xff)
+#define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
+#define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8aa8246..0d07efb 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -562,7 +562,7 @@ int main(void)
DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
-   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
+   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6741505..1426459 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1952,7 +1952,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->entry_exit_count = 0;
+   vc->entry_exit_map = 0;
vc->preempt_tb = TB_NIL;
vc->in_guest = 0;
vc->napping_threads = 0;
@@ -2119,8 +2119,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 * this thread straight away and have it join in.
 */
if (!signal_pending(current)) {
-   if (vc->vcore_state == VCORE_RUNNING &&
-   VCORE_EXIT_COUNT(vc) == 0) {
+   if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) {
kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
trace_kvm_guest_enter(vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1954a1c..2754251 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int 
target,
int rv = H_SUCCESS; /* => don't yield */
 
set_bit(vcpu->arch.ptid, &vc->conferring_threads);
-   while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) {
-   threads_running = VCORE_ENTRY_COUNT(vc);
-   threads_ceded = hweight32(vc->napping_threads);
-   threads_conferring = hweight

[PATCH 05/12] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu

2015-03-27 Thread Paul Mackerras

Rather than calling cond_resched() in kvmppc_run_core() before doing
the post-processing for the vcpus that we have just run (that is,
calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
that post-processing before calling cond_resched(), and that post-
processing is moved out into its own function, post_guest_process().

The reschedule point is now in kvmppc_run_vcpu() and we define a new
vcore state, VCORE_PREEMPT, to indicate that the vcore's runner
task is runnable but not running.  (Doing the reschedule with the
vcore in VCORE_INACTIVE state would be bad because there are potentially
other vcpus waiting for the runner in kvmppc_wait_for_exec() which
then wouldn't get woken up.)

Also, we make use of the handy cond_resched_lock() function, which
unlocks and relocks vc->lock for us around the reschedule.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  5 +-
 arch/powerpc/kvm/book3s_hv.c| 92 +
 2 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3eecd88..83c4425 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -304,8 +304,9 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_RUNNING  2
-#define VCORE_EXITING  3
+#define VCORE_PREEMPT  2
+#define VCORE_RUNNING  3
+#define VCORE_EXITING  4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1a6ea6e..5a1abf6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1882,15 +1882,50 @@ static void prepare_threads(struct kvmppc_vcore *vc)
}
 }
 
+static void post_guest_process(struct kvmppc_vcore *vc)
+{
+   u64 now;
+   long ret;
+   struct kvm_vcpu *vcpu, *vnext;
+
+   now = get_tb();
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
+   /* cancel pending dec exception if dec is positive */
+   if (now < vcpu->arch.dec_expires &&
+   kvmppc_core_pending_dec(vcpu))
+   kvmppc_core_dequeue_dec(vcpu);
+
+   trace_kvm_guest_exit(vcpu);
+
+   ret = RESUME_GUEST;
+   if (vcpu->arch.trap)
+   ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
+   vcpu->arch.run_task);
+
+   vcpu->arch.ret = ret;
+   vcpu->arch.trap = 0;
+
+   if (vcpu->arch.ceded) {
+   if (!is_kvmppc_resume_guest(ret))
+   kvmppc_end_cede(vcpu);
+   else
+   kvmppc_set_timer(vcpu);
+   }
+   if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
 static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-   struct kvm_vcpu *vcpu, *vnext;
-   long ret;
-   u64 now;
+   struct kvm_vcpu *vcpu;
int i;
int srcu_idx;
 
@@ -1922,8 +1957,11 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 */
if ((threads_per_core > 1) &&
((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) 
{
vcpu->arch.ret = -EBUSY;
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
goto out;
}
 
@@ -1979,44 +2017,12 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
kvm_guest_exit();
 
preempt_enable();
-   cond_resched();
 
spin_lock(&vc->lock);
-   now = get_tb();
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-   /* cancel pending dec exception if dec is positive */
-   if (now < vcpu->arch.dec_expires &&
-   kvmppc_core_pending_dec(vcpu))
-   kvmppc_core_dequeue_dec(vcpu);
-
-   trace_kvm_guest_exit(vcpu);
-
-   ret = RESUME_GUEST;
-   if (vcpu->arch.trap)
-   ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
-   vcpu->arch.run_task);
-

[PATCH 12/12] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-03-27 Thread Paul Mackerras

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.

Aggregated statistics from debugfs across vcpus for a guest with 32
vcpus, 8 threads/vcore, running on a POWER8, show this before the
change:

 rm_entry: 3387.6ns (228 - 86600, 1008969 samples)
  rm_exit: 4561.5ns (12 - 3477452, 1009402 samples)
  rm_intr: 1660.0ns (12 - 553050, 3600051 samples)

and this after the change:

 rm_entry: 3060.1ns (212 - 65138, 953873 samples)
  rm_exit: 4244.1ns (12 - 9693408, 954331 samples)
  rm_intr: 1342.3ns (12 - 1104718, 3405326 samples)

for a test of booting Fedora 20 big-endian to the login prompt.

The time taken for a H_PROD hcall (which is handled in the host
kernel) went down from about 35 microseconds to about 16 microseconds
with this change.

The noinline added to kvmppc_run_core turned out to be necessary for
good performance, at least with gcc 4.9.2 as packaged with Fedora 21
and a little-endian POWER8 host.

Signed-off-by: Paul Mackerras 
---
Note that this patch depends on the patch "powerpc/powernv: Fixes for
hypervisor doorbell handling", which is now upstream in Linus' tree as
commit 755563bc79c7, for the definition of PPC_MSGCLR().

 arch/powerpc/kernel/asm-offsets.c   |  3 ++
 arch/powerpc/kvm/book3s_hv.c| 51 ++---
 arch/powerpc/kvm/book3s_hv_builtin.c| 16 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 --
 4 files changed, 70 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 0d07efb..0034b6b 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_PPC64
 #include 
 #include 
@@ -759,5 +760,7 @@ int main(void)
offsetof(struct paca_struct, subcore_sibling_mask));
 #endif
 
+   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+
return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1426459..bb29e75 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -84,9 +85,35 @@ static DECLARE_BITMAP(default_enabled_hcalls, 
MAX_HCALL_OPCODE/4 + 1);
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+static bool kvmppc_ipi_thread(int cpu)
+{
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   preempt_disable();
+   if (cpu_first_thread_sibling(cpu) ==
+   cpu_first_thread_sibling(smp_processor_id())) {
+   unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+   msg |= cpu_thread_in_core(cpu);
+   smp_mb();
+   __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+   preempt_enable();
+   return true;
+   }
+   preempt_enable();
+   }
+
+#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
+   if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
+   xics_wake_cpu(cpu);
+   return true;
+   }
+#endif
+
+   return false;
+}
+
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int me;
int cpu = vcpu->cpu;
wait_queue_head_t *wqp;
 
@@ -96,20 +123,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}
 
-   me = get_cpu();
+   if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
+   return;
 
/* CPU points to the first thread of the core */
-   if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
-#ifdef CONFIG_PPC_ICP_NATIVE
-   int real_cpu = cpu + vcpu->arch.ptid;
-   if (paca[real_cpu].kvm_hstate.xics_phys)
-   xics_wake_cpu(real_cpu);
-   else
-#endif
-   if (cpu_online(cpu))
-   smp_send_reschedule(cpu);
-   }
-   put_cpu();
+   if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
+   smp_send_reschedule(cpu);
 }
 
 /*
@@ -1781,10 +1800,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
tpaca->kvm_hstate.kvm_vcpu = vcpu;
-#if defined(CONF

[PATCH 03/12] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update

2015-03-27 Thread Paul Mackerras

Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
update (i.e. one of its 3 virtual processor areas needed to be pinned
in memory so the host real mode code can update it on guest entry and
exit), we would drop the vcore lock and do the update there and then.
Future changes will make it inconvenient to drop the lock, so instead
we now remove it from the list of runnable VCPUs and wake up its
VCPU task.  This will have the effect that the VCPU task will exit
kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
to kvmppc_update_vpas() and then rejoin the vcore.

The one complication is that the runner VCPU (whose VCPU task is the
current task) might be one of the ones that gets removed from the
runnable list.  In that case we just return from kvmppc_run_core()
and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
the runner if necessary.

This all means that the VCORE_STARTING state is no longer used, so we
remove it.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  5 ++--
 arch/powerpc/kvm/book3s_hv.c| 56 -
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d2068bb..2f339ff 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -306,9 +306,8 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_STARTING 2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_RUNNING  2
+#define VCORE_EXITING  3
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c7b18ac..1a6ea6e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1863,6 +1863,25 @@ static void kvmppc_start_restoring_l2_cache(const struct 
kvmppc_vcore *vc)
mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE);
 }
 
+static void prepare_threads(struct kvmppc_vcore *vc)
+{
+   struct kvm_vcpu *vcpu, *vnext;
+
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
+   if (signal_pending(vcpu->arch.run_task))
+   vcpu->arch.ret = -EINTR;
+   else if (vcpu->arch.vpa.update_pending ||
+vcpu->arch.slb_shadow.update_pending ||
+vcpu->arch.dtl.update_pending)
+   vcpu->arch.ret = RESUME_GUEST;
+   else
+   continue;
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
@@ -1872,46 +1891,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
struct kvm_vcpu *vcpu, *vnext;
long ret;
u64 now;
-   int i, need_vpa_update;
+   int i;
int srcu_idx;
-   struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
-   /* don't start if any threads have a signal pending */
-   need_vpa_update = 0;
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-   if (signal_pending(vcpu->arch.run_task))
-   return;
-   if (vcpu->arch.vpa.update_pending ||
-   vcpu->arch.slb_shadow.update_pending ||
-   vcpu->arch.dtl.update_pending)
-   vcpus_to_update[need_vpa_update++] = vcpu;
-   }
+   /*
+* Remove from the list any threads that have a signal pending
+* or need a VPA update done
+*/
+   prepare_threads(vc);
+
+   /* if the runner is no longer runnable, let the caller pick a new one */
+   if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE)
+   return;
 
/*
-* Initialize *vc, in particular vc->vcore_state, so we can
-* drop the vcore lock if necessary.
+* Initialize *vc.
 */
vc->n_woken = 0;
vc->nap_count = 0;
vc->entry_exit_count = 0;
vc->preempt_tb = TB_NIL;
-   vc->vcore_state = VCORE_STARTING;
vc->in_guest = 0;
vc->napping_threads = 0;
vc->conferring_threads = 0;
 
/*
-* Updating any of the vpas requires calling kvmppc_pin_guest_page,
-* which can't be called with any spinlocks held.
-*/
-   if (need_vpa_update) {
-   spin_unlock(&vc->lock);
-   for (i = 0; i < need_vpa_update; ++i)
-   kvmppc_update_vpas(vcpus_to_update[i]);
-   spin_lock(&vc->lock);
-   }
-
-   /*

[PATCH v2 00/12] Remaining improvements for HV KVM

2015-03-27 Thread Paul Mackerras

This is the rest of my current patch queue for HV KVM on PPC.  This
series is based on Alex Graf's kvm-ppc-queue branch.  The only change
from the previous version of this series is that patch 2 has been
updated to take account of the timebase offset.

The last patch in this series needs a definition of PPC_MSGCLR that is
added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
handling", which has now gone upstream into Linus' tree as commit
755563bc79c7 via the linuxppc-dev mailing list.  Alex, how do you want
to handle that?  You could pull in the master branch of the kvm tree,
which includes 755563bc79c7, or you could cherry-pick 755563bc79c7 and
let the subsequent merge fix it up.

I would like to see these patches go into 4.1.

Paul.

 arch/powerpc/include/asm/kvm_book3s_64.h |   4 +
 arch/powerpc/include/asm/kvm_host.h  |  44 ++-
 arch/powerpc/include/asm/time.h  |   3 +
 arch/powerpc/kernel/asm-offsets.c|  20 +-
 arch/powerpc/kernel/time.c   |   6 +
 arch/powerpc/kvm/Kconfig |  14 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 ++
 arch/powerpc/kvm/book3s_hv.c | 413 
 arch/powerpc/kvm/book3s_hv_builtin.c |  85 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  12 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 444 +--
 virt/kvm/kvm_main.c  |   1 +
 12 files changed, 909 insertions(+), 273 deletions(-)

[PATCH 06/12] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken

2015-03-27 Thread Paul Mackerras

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv.c| 47 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 +
 4 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 83c4425..1517faa 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,8 +272,6 @@ struct kvmppc_vcore {
int n_runnable;
int num_threads;
int entry_exit_count;
-   int n_woken;
-   int nap_count;
int napping_threads;
int first_vcpuid;
u16 pcpu;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 92ec3fc..8aa8246 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -563,7 +563,6 @@ int main(void)
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
-   DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5a1abf6..6741505 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1729,8 +1729,10 @@ static int kvmppc_grab_hwthread(int cpu)
tpaca = &paca[cpu];
 
/* Ensure the thread won't go into the kernel if it wakes */
-   tpaca->kvm_hstate.hwthread_req = 1;
tpaca->kvm_hstate.kvm_vcpu = NULL;
+   tpaca->kvm_hstate.napping = 0;
+   smp_wmb();
+   tpaca->kvm_hstate.hwthread_req = 1;
 
/*
 * If the thread is already executing in the kernel (e.g. handling
@@ -1773,35 +1775,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
}
cpu = vc->pcpu + vcpu->arch.ptid;
tpaca = &paca[cpu];
-   tpaca->kvm_hstate.kvm_vcpu = vcpu;
tpaca->kvm_hstate.kvm_vcore = vc;
tpaca->kvm_hstate.ptid = vcpu->arch.ptid;
vcpu->cpu = vc->pcpu;
+   /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
+   tpaca->kvm_hstate.kvm_vcpu = vcpu;
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-   if (cpu != smp_processor_id()) {
+   if (cpu != smp_processor_id())
xics_wake_cpu(cpu);
-   if (vcpu->arch.ptid)
-   ++vc->n_woken;
-   }
 #endif
 }
 
-static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
+static void kvmppc_wait_for_nap(void)
 {
-   int i;
+   int cpu = smp_processor_id();
+   int i, loops;
 
-   HMT_low();
-   i = 0;
-   while (vc->nap_count < vc->n_woken) {
-   if (++i >= 100) {
-   pr_err("kvmppc_wait_for_nap timeout %d %d\n",
-  vc->nap_count, vc->n_woken);
-   break;
+   for (loops = 0; loops < 100; ++loops) {
+   /*
+* Check if all threads are finished.
+* We set the vcpu pointer when starting a thread
+* and the thread clears it when finished, so we look
+* for any threads that still have a non-NULL vcpu ptr.
+*/
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   break;
+   if (i == threads_per_subcore) {
+   HMT_medium();
+   return;
}
-   cpu_relax();
+   HMT_low();
}
HMT_medium();
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
 /*
@@ -1942,8 +1952,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initiali

[PATCH 01/12] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT

2015-03-27 Thread Paul Mackerras

This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vm<pid>, where <pid> is the PID of the
process that created the guest.  The file is named "htab".  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 000bd7ff1196 0003b5c71196

The first field is the index of the entry in the HPT.  The second and
third fields are the two doublewords of the HPT entry itself, so the
third field contains the real page number that is mapped by the entry
if the entry's valid bit is set.  The fourth field is the guest's view
of the second doubleword of the entry, so it contains the guest
physical address.  (The format of the second through fourth fields is
described in the Power ISA and also in
arch/powerpc/include/asm/mmu-hash64.h.)

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
 arch/powerpc/include/asm/kvm_host.h  |   2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 +++
 arch/powerpc/kvm/book3s_hv.c |  12 +++
 virt/kvm/kvm_main.c  |   1 +
 5 files changed, 153 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0789a0f..869c53f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
return rcu_dereference_raw_notrace(kvm->memslots);
 }
 
+extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 015773f..f1d0bbc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
int hpt_cma_alloc;
+   struct dentry *debugfs_dir;
+   struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6c6825a..d6fe308 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf)
return ret;
 }
 
+struct debugfs_htab_state {
+   struct kvm      *kvm;
+   struct mutex    mutex;
+   unsigned long   hpt_index;
+   int             chars_left;
+   int             buf_index;
+   char            buf[64];
+};
+
+static int debugfs_htab_open(struct inode *inode, struct file *file)
+{
+   struct kvm *kvm = inode->i_private;
+   struct debugfs_htab_state *p;
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p)
+   return -ENOMEM;
+
+   kvm_get_kvm(kvm);
+   p->kvm = kvm;
+   mutex_init(&p->mutex);
+   file->private_data = p;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_htab_release(struct inode *inode, struct file *file)
+{
+   struct debugfs_htab_state *p = file->private_data;
+
+   kvm_put_kvm(p->kvm);
+   kfree(p);
+   return 0;
+}
+
+static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
+size_t len, loff_t *ppos)
+{
+   struct debugfs_htab_state *p = file->private_data;
+   ssize_t ret, r;
+   unsigned long i, n;
+   unsigned long v, hr, gr;
+   struct kvm *kvm;
+   __be64 *hptp;
+
+   ret = mutex_lock_interruptible(&p->mutex);
+   if (ret)
+   return ret;
+
+   if (p->chars_left) {
+   n = p->chars_left;
+   if (n > len)
+   n = len;
+   r = copy_to_user(buf, p->buf + p->buf_index, n);
+   n -= r;
+   p->chars_left -= n;
+   p->buf_index += n;
+   buf += n;
+   len -= n;
+   ret = n;
+   if (r) {
+   if (!n)
+   ret = -EFAULT;
+   goto out;
+   }
+   }
+
+   kvm = p->kvm;
+   i = p->hpt_index;
+   hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+   for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+   if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID |

[PATCH 04/12] KVM: PPC: Book3S HV: Minor cleanups

2015-03-27 Thread Paul Mackerras

* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++---
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2f339ff..3eecd88 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,7 +227,6 @@ struct kvm_arch {
unsigned long host_sdr1;
int tlbie_lock;
unsigned long lpcr;
-   unsigned long rmor;
unsigned long vrma_slb_v;
int hpte_setup_done;
u32 hpt_order;
@@ -271,7 +270,6 @@ struct kvm_arch {
  */
 struct kvmppc_vcore {
int n_runnable;
-   int n_busy;
int num_threads;
int entry_exit_count;
int n_woken;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 3fea721..92ec3fc 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -505,7 +505,6 @@ int main(void)
DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
-   DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f05ae0c..29190af 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -245,9 +245,9 @@ kvm_novcpu_exit:
 kvm_start_guest:
 
/* Set runlatch bit the minute you wake up from nap */
-   mfspr   r1, SPRN_CTRLF
-   ori r1, r1, 1
-   mtspr   SPRN_CTRLT, r1
+   mfspr   r0, SPRN_CTRLF
+   ori r0, r0, 1
+   mtspr   SPRN_CTRLT, r0
 
ld  r2,PACATOC(r13)
 
@@ -493,11 +493,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
cmpwi   r0,0
beq 20b
 
-   /* Set LPCR and RMOR. */
+   /* Set LPCR. */
 10:     ld      r8,VCORE_LPCR(r5)
mtspr   SPRN_LPCR,r8
-   ld  r8,KVM_RMOR(r9)
-   mtspr   SPRN_RMOR,r8
isync
 
/* Check if HDEC expires soon */
@@ -1074,7 +1072,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
bne 2f
mfspr   r3,SPRN_HDEC
cmpwi   r3,0
-   bge ignore_hdec
+   mr  r4,r9
+   bge fast_guest_return
 2:
/* See if this is an hcall we can handle in real mode */
cmpwi   r12,BOOK3S_INTERRUPT_SYSCALL
@@ -1082,26 +1081,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* External interrupt ? */
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
-   bne+    ext_interrupt_to_host
+   bne+    guest_exit_cont
 
/* External interrupt, first check for host_ipi. If this is
 * set, we know the host wants us out so let's do it now
 */
bl  kvmppc_read_intr
cmpdi   r3, 0
-   bgt ext_interrupt_to_host
+   bgt guest_exit_cont
 
/* Check if any CPU is heading out to the host, if so head out too */
ld  r5, HSTATE_KVM_VCORE(r13)
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
-   bge ext_interrupt_to_host
-
-   /* Return to guest after delivering any pending interrupt */
mr  r4, r9
-   b   deliver_guest_interrupt
-
-ext_interrupt_to_host:
+   blt deliver_guest_interrupt
 
 guest_exit_cont:   /* r9 = vcpu, r12 = trap, r13 = paca */
/* Save more register state  */
@@ -1762,8 +1756,10 @@ kvmppc_hisi:
  * Returns to the guest if we handle it, or continues on up to
  * the kernel if we can't (i.e. if we don't have a handler for
  * it, or if the handler returns H_TOO_HARD).
+ *
+ * r5 - r8 contain hcall args,
+ * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca
  */
-   .globl  hcall_try_real_mode
 hcall_try_real_mode:
ld  r3,VCPU_GPR(R3)(r9)
andi.   r0,r11,MSR_PR
@@ -2023,10 +2019,6 @@ hcall_real_table:
.globl  hcall_real_table_end
 hcall_real_table_end:
 
-ignore_hdec:
-   mr  r4,r9
-   b   fast_guest_return
-
 _GLOBAL(kv

[PATCH 02/12] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-27 Thread Paul Mackerras

This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spent in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.
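
As a rough C model of what each accumulator holds (the real update is
done in assembly in book3s_hv_rmhandlers.S), the sketch below mirrors
the four fields of struct kvmhv_tb_accumulator.  The function names are
hypothetical, the handling of the first sample for the minimum is an
assumption, and the memory barriers a concurrent reader would need are
omitted.

```c
#include <stdint.h>

/* Userspace mirror of the four fields in struct kvmhv_tb_accumulator. */
struct tb_accumulator {
    uint64_t seqcount;  /* even when stable; completed updates = seqcount / 2 */
    uint64_t tb_total;  /* total time in timebase ticks */
    uint64_t tb_min;    /* shortest interval seen */
    uint64_t tb_max;    /* longest interval seen */
};

/* Fold one interval (in timebase ticks) into the accumulator. */
void tb_accumulate(struct tb_accumulator *acc, uint64_t delta)
{
    int first = (acc->seqcount == 0);

    acc->seqcount++;                /* odd: update in progress */
    acc->tb_total += delta;
    if (first || delta < acc->tb_min)
        acc->tb_min = delta;        /* assumption: first sample seeds the minimum */
    if (delta > acc->tb_max)
        acc->tb_max = delta;
    acc->seqcount++;                /* even again: update complete */
}

/* Number of completed updates, i.e. the "count" column. */
uint64_t tb_count(const struct tb_accumulator *acc)
{
    return acc->seqcount / 2;
}
```

A reader would sample seqcount before and after copying the fields and
retry if it was odd or changed, which is why one field can serve as
both the synchronization word and the count.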

The overhead of the extra code amounts to about 30ns for an hcall that
is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
production environments may not wish to incur this overhead, the new
code is conditional on a new config symbol,
CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  21 +
 arch/powerpc/include/asm/time.h |   3 +
 arch/powerpc/kernel/asm-offsets.c   |  13 +++
 arch/powerpc/kernel/time.c  |   6 ++
 arch/powerpc/kvm/Kconfig|  14 +++
 arch/powerpc/kvm/book3s_hv.c| 150 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 141 +-
 7 files changed, 346 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f1d0bbc..d2068bb 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -369,6 +369,14 @@ struct kvmppc_slb {
u8 base_page_size;  /* MMU_PAGE_xxx */
 };
 
+/* Struct used to accumulate timing information in HV real mode code */
+struct kvmhv_tb_accumulator {
+   u64 seqcount;   /* used to synchronize access, also count * 2 */
+   u64 tb_total;   /* total time in timebase ticks */
+   u64 tb_min; /* min time */
+   u64 tb_max; /* max time */
+};
+
 # ifdef CONFIG_PPC_FSL_BOOK3E
 #define KVMPPC_BOOKE_IAC_NUM   2
 #define KVMPPC_BOOKE_DAC_NUM   2
@@ -657,6 +665,19 @@ struct kvm_vcpu_arch {
 
u32 emul_inst;
 #endif
+
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
+   u64 cur_tb_start;   /* when it started */
+   struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
+   struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
+   struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
+   struct kvmhv_tb_accumulator guest_time; /* guest execution */
+   struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+
+   struct dentry *debugfs_dir;
+   struct dentry *debugfs_timings;
+#endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
 };
 
 #define VCPU_FPR(vcpu, i)  (vcpu)->arch.fp.fpr[i][TS_FPROFFSET]
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 03cbada..10fc784 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+/* Convert timebase ticks to nanoseconds */
+unsigned long long tb_to_ns(unsigned long long tb_ticks);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4717859..3fea721 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -459,6 +459,19 @@ int main(void)
DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
 #endif
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
+   DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
+   DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
+   DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
+   DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
+   DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
+   DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, arch.cur_tb_start));
+   DEFINE(TAS_SEQCOUN

[PATCH 07/12] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI

2015-03-27 Thread Paul Mackerras

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to an H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.
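
In C terms, the LPCR value used for the nap is computed roughly as in
the sketch below.  The bit values are assumptions for illustration (the
real definitions are in the kernel's reg.h), nap_lpcr() is a made-up
name, and the PECEDP step only applies on POWER8, where the feature
section is enabled.

```c
#include <stdint.h>

/* Illustrative bit values; assumed here, the kernel's reg.h is authoritative. */
#define LPCR_PECE0   0x4000ULL
#define LPCR_PECE1   0x2000ULL
#define LPCR_PECEDP  0x10000ULL

/*
 * Compute the LPCR value to use while napping.  'wake_on_doorbell' plays
 * the role of r3 in the patch: LPCR_PECEDP when ceding, 0 when the thread
 * has no vcpu.
 */
uint64_t nap_lpcr(uint64_t lpcr, uint64_t wake_on_doorbell)
{
    lpcr |= LPCR_PECE0 | LPCR_PECE1;
    /* Like rlwimi: copy only the PECEDP bit from the argument into lpcr. */
    lpcr = (lpcr & ~LPCR_PECEDP) | (wake_on_doorbell & LPCR_PECEDP);
    return lpcr;
}
```

Using an insert rather than an OR means a PECEDP bit left set by an
earlier cede is cleared again when the thread naps without a vcpu.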

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 9a2ad8f..f3fef6c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -191,6 +191,7 @@ kvmppc_primary_no_guest:
li  r3, NAPPING_NOVCPU
stb r3, HSTATE_NAPPING(r13)
 
+   li  r3, 0   /* Don't wake on privileged (OS) doorbell */
b   kvm_do_nap
 
 kvm_novcpu_wakeup:
@@ -2128,10 +2129,13 @@ _GLOBAL(kvmppc_h_cede)  /* r3 = vcpu pointer, r11 = msr, r13 = paca */
bl  kvmhv_accumulate_time
 #endif
 
+   lis r3, LPCR_PECEDP@h   /* Do wake on privileged doorbell */
+
/*
 * Take a nap until a decrementer or external or doobell interrupt
-* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
-* runlatch bit before napping.
+* occurs, with PECE1 and PECE0 set in LPCR.
+* On POWER8, if we are ceding, also set PECEDP.
+* Also clear the runlatch bit before napping.
 */
 kvm_do_nap:
mfspr   r0, SPRN_CTRLF
@@ -2143,7 +2147,7 @@ kvm_do_nap:
mfspr   r5,SPRN_LPCR
ori r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-   oris    r5,r5,LPCR_PECEDP@h
+   rlwimi  r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_LPCR,r5
isync
-- 
2.1.4


[PATCH 08/12] KVM: PPC: Book3S HV: Use decrementer to wake napping threads

2015-03-27 Thread Paul Mackerras

This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.
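
The save/restore arithmetic can be modelled in C as below.  This is a
sketch of the calculation only, not the actual code: the function names
are hypothetical, guest_tb stands for the value mftb returns while the
guest timebase offset is applied, and tb_offset corresponds to the
vcore's tb_offset field.

```c
#include <stdint.h>

/* Host timebase value at which the guest decrementer would expire. */
uint64_t dec_expires_host_tb(int32_t dec, uint64_t guest_tb,
                             uint64_t tb_offset)
{
    /* DEC is a signed 32-bit count; widen it with sign as extsw does. */
    int64_t remaining = dec;

    return guest_tb + (uint64_t)remaining - tb_offset;
}

/* Value to reload into DEC after the nap, given the current guest TB. */
int32_t dec_restore(uint64_t expires_host_tb, uint64_t guest_tb_now,
                    uint64_t tb_offset)
{
    /* Convert back to a guest TB value, then take the ticks remaining. */
    return (int32_t)(uint32_t)(expires_host_tb + tb_offset - guest_tb_now);
}
```

If the nap outlasts the guest's original deadline, the restored value
is negative, which is what a decrementer that had kept counting would
show.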

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f3fef6c..1c5d052 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */
+   /* put the HDEC into the DEC, since HDEC interrupts don't wake us */
+   mfspr   r3, SPRN_HDEC
+   mtspr   SPRN_DEC, r3
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -223,6 +226,12 @@ kvm_novcpu_wakeup:
cmpdi   r3, 0
bge kvm_novcpu_exit
 
+   /* See if our timeslice has expired (HDEC is negative) */
+   mfspr   r0, SPRN_HDEC
+   li  r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+   cmpwi   r0, 0
+   blt kvm_novcpu_exit
+
/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
ld  r4, HSTATE_KVM_VCPU(r13)
cmpdi   r4, 0
@@ -1492,10 +1501,10 @@ kvmhv_do_exit:  /* r12 = trap, r13 = paca */
cmpwi   r3,0x100/* Are we the first here? */
bge 43f
cmpwi   r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   beq 40f
+   beq 43f
li  r0,0
mtspr   SPRN_HDEC,r0
-40:
+
/*
 * Send an IPI to any napping threads, since an HDEC interrupt
 * doesn't wake CPUs up from nap.
@@ -2123,6 +2132,27 @@ _GLOBAL(kvmppc_h_cede)   /* r3 = vcpu pointer, r11 = msr, r13 = paca */
/* save FP state */
bl  kvmppc_save_fp
 
+   /*
+* Set DEC to the smaller of DEC and HDEC, so that we wake
+* no later than the end of our timeslice (HDEC interrupts
+* don't wake us from nap).
+*/
+   mfspr   r3, SPRN_DEC
+   mfspr   r4, SPRN_HDEC
+   mftb    r5
+   cmpw    r3, r4
+   ble 67f
+   mtspr   SPRN_DEC, r4
+67:
+   /* save expiry time of guest decrementer */
+   extsw   r3, r3
+   add r3, r3, r5
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   subf    r3, r6, r3  /* convert to host TB value */
+   std r3, VCPU_DEC_EXPIRES(r4)
+
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
ld  r4, HSTATE_KVM_VCPU(r13)
addi    r3, r4, VCPU_TB_CEDE
@@ -2180,6 +2210,15 @@ kvm_end_cede:
/* load up FP state */
bl  kvmppc_load_fp
 
+   /* Restore guest decrementer */
+   ld  r3, VCPU_DEC_EXPIRES(r4)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   add r3, r3, r6  /* convert host TB to guest TB value */
+   mftb    r7
+   subf    r3, r7, r3
+   mtspr   SPRN_DEC, r3
+
/* Load NV GPRS */
ld  r14, VCPU_GPR(R14)(r4)
ld  r15, VCPU_GPR(R15)(r4)
-- 
2.1.4


Re: [PATCH 02/12] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-27 Thread Paul Mackerras

On Fri, Mar 27, 2015 at 02:29:46PM +1100, Paul Mackerras wrote:
> This reads the timebase at various points in the real-mode guest
> entry/exit code and uses that to accumulate total, minimum and
> maximum time spent in those parts of the code.  Currently these
> times are accumulated per vcpu in 5 parts of the code:

I just realized that this is going to give bogus results if we have a
non-zero timebase offset, so ignore this patch for now.  I'll fix it
and repost.

Paul.

[PATCH 02/12] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-26 Thread Paul Mackerras

This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spent in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.
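
A userspace tool could read one line of the "timings" file along these
lines.  This is a sketch under assumptions: the struct and function
names are hypothetical, and the exact spacing of the file is not
specified above, so the parser only relies on "name, colon, four
decimal numbers".

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One line of the per-vcpu "timings" file; the layout is illustrative. */
struct timing_line {
    char name[32];
    uint64_t count;     /* number of times the code was executed */
    uint64_t total_ns;  /* total time in nanoseconds */
    uint64_t min_ns;    /* minimum time in nanoseconds */
    uint64_t max_ns;    /* maximum time in nanoseconds */
};

/* Returns 0 on success, -1 if the line does not match the expected shape. */
int parse_timing_line(const char *line, struct timing_line *t)
{
    int n = sscanf(line,
                   " %31[^:]: %" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64,
                   t->name, &t->count, &t->total_ns, &t->min_ns, &t->max_ns);

    return n == 5 ? 0 : -1;
}

/* Mean time per execution in nanoseconds, or 0 if nothing was counted. */
uint64_t timing_mean_ns(const struct timing_line *t)
{
    return t->count ? t->total_ns / t->count : 0;
}
```

The mean is not in the file itself but follows directly from the count
and total columns.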

The overhead of the extra code amounts to about 29ns for an hcall that
is handled in real mode (e.g. H_SET_DABR), or 25%, and about 43ns for
an hcall that is handled in the host kernel in virtual mode
(e.g. H_PROD), or 1.2%.  Since production environments may not wish to
incur this overhead, the new code is conditional on a new config
symbol, CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  21 +
 arch/powerpc/include/asm/time.h |   3 +
 arch/powerpc/kernel/asm-offsets.c   |  13 +++
 arch/powerpc/kernel/time.c  |   6 ++
 arch/powerpc/kvm/Kconfig|  14 +++
 arch/powerpc/kvm/book3s_hv.c| 150 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 129 ++-
 7 files changed, 334 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f1d0bbc..d2068bb 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -369,6 +369,14 @@ struct kvmppc_slb {
u8 base_page_size;  /* MMU_PAGE_xxx */
 };
 
+/* Struct used to accumulate timing information in HV real mode code */
+struct kvmhv_tb_accumulator {
+   u64 seqcount;   /* used to synchronize access, also count * 2 */
+   u64 tb_total;   /* total time in timebase ticks */
+   u64 tb_min; /* min time */
+   u64 tb_max; /* max time */
+};
+
 # ifdef CONFIG_PPC_FSL_BOOK3E
 #define KVMPPC_BOOKE_IAC_NUM   2
 #define KVMPPC_BOOKE_DAC_NUM   2
@@ -657,6 +665,19 @@ struct kvm_vcpu_arch {
 
u32 emul_inst;
 #endif
+
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
+   u64 cur_tb_start;   /* when it started */
+   struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
+   struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
+   struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
+   struct kvmhv_tb_accumulator guest_time; /* guest execution */
+   struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+
+   struct dentry *debugfs_dir;
+   struct dentry *debugfs_timings;
+#endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
 };
 
 #define VCPU_FPR(vcpu, i)  (vcpu)->arch.fp.fpr[i][TS_FPROFFSET]
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 03cbada..10fc784 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+/* Convert timebase ticks to nanoseconds */
+unsigned long long tb_to_ns(unsigned long long tb_ticks);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4717859..3fea721 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -459,6 +459,19 @@ int main(void)
DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
 #endif
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
+   DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
+   DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
+   DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
+   DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
+   DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
+   DEFINE(V

[PATCH 08/12] KVM: PPC: Book3S HV: Use decrementer to wake napping threads

2015-03-26 Thread Paul Mackerras

This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.
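
The choice of DEC value before napping, and the check made on wakeup,
amount to something like the following.  The function names are
hypothetical and the registers are modelled as signed 32-bit values,
matching the cmpw/cmpwi comparisons in the patch.

```c
#include <stdint.h>

/* DEC value to program before napping so we wake by the end of the timeslice. */
int32_t nap_dec_value(int has_vcpu, int32_t guest_dec, int32_t hdec)
{
    if (!has_vcpu)
        return hdec;    /* thread with no vcpu: copy HDEC into DEC */
    /* ceded vcpu: keep the guest DEC unless HDEC would expire first */
    return guest_dec <= hdec ? guest_dec : hdec;
}

/* After waking, a negative HDEC means the timeslice has expired. */
int timeslice_expired(int32_t hdec)
{
    return hdec < 0;
}
```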

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 35b1847..bc2ab3d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */
+   /* put the HDEC into the DEC, since HDEC interrupts don't wake us */
+   mfspr   r3, SPRN_HDEC
+   mtspr   SPRN_DEC, r3
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -223,6 +226,12 @@ kvm_novcpu_wakeup:
cmpdi   r3, 0
bge kvm_novcpu_exit
 
+   /* See if our timeslice has expired (HDEC is negative) */
+   mfspr   r0, SPRN_HDEC
+   li  r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+   cmpwi   r0, 0
+   blt kvm_novcpu_exit
+
/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
ld  r4, HSTATE_KVM_VCPU(r13)
cmpdi   r4, 0
@@ -1492,10 +1501,10 @@ kvmhv_do_exit:  /* r12 = trap, r13 = paca */
cmpwi   r3,0x100/* Are we the first here? */
bge 43f
cmpwi   r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   beq 40f
+   beq 43f
li  r0,0
mtspr   SPRN_HDEC,r0
-40:
+
/*
 * Send an IPI to any napping threads, since an HDEC interrupt
 * doesn't wake CPUs up from nap.
@@ -2123,6 +2132,27 @@ _GLOBAL(kvmppc_h_cede)   /* r3 = vcpu pointer, r11 = msr, r13 = paca */
/* save FP state */
bl  kvmppc_save_fp
 
+   /*
+* Set DEC to the smaller of DEC and HDEC, so that we wake
+* no later than the end of our timeslice (HDEC interrupts
+* don't wake us from nap).
+*/
+   mfspr   r3, SPRN_DEC
+   mfspr   r4, SPRN_HDEC
+   mftb    r5
+   cmpw    r3, r4
+   ble 67f
+   mtspr   SPRN_DEC, r4
+67:
+   /* save expiry time of guest decrementer */
+   extsw   r3, r3
+   add r3, r3, r5
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   subf    r3, r6, r3  /* convert to host TB value */
+   std r3, VCPU_DEC_EXPIRES(r4)
+
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
ld  r4, HSTATE_KVM_VCPU(r13)
addi    r3, r4, VCPU_TB_CEDE
@@ -2180,6 +2210,15 @@ kvm_end_cede:
/* load up FP state */
bl  kvmppc_load_fp
 
+   /* Restore guest decrementer */
+   ld  r3, VCPU_DEC_EXPIRES(r4)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   add r3, r3, r6  /* convert host TB to guest TB value */
+   mftb    r7
+   subf    r3, r7, r3
+   mtspr   SPRN_DEC, r3
+
/* Load NV GPRS */
ld  r14, VCPU_GPR(R14)(r4)
ld  r15, VCPU_GPR(R15)(r4)
-- 
2.1.4


[PATCH 04/12] KVM: PPC: Book3S HV: Minor cleanups

2015-03-26 Thread Paul Mackerras

* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++---
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2f339ff..3eecd88 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,7 +227,6 @@ struct kvm_arch {
unsigned long host_sdr1;
int tlbie_lock;
unsigned long lpcr;
-   unsigned long rmor;
unsigned long vrma_slb_v;
int hpte_setup_done;
u32 hpt_order;
@@ -271,7 +270,6 @@ struct kvm_arch {
  */
 struct kvmppc_vcore {
int n_runnable;
-   int n_busy;
int num_threads;
int entry_exit_count;
int n_woken;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 3fea721..92ec3fc 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -505,7 +505,6 @@ int main(void)
DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
-   DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index ee6ba4c..a69e09e 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -245,9 +245,9 @@ kvm_novcpu_exit:
 kvm_start_guest:
 
/* Set runlatch bit the minute you wake up from nap */
-   mfspr   r1, SPRN_CTRLF
-   ori r1, r1, 1
-   mtspr   SPRN_CTRLT, r1
+   mfspr   r0, SPRN_CTRLF
+   ori r0, r0, 1
+   mtspr   SPRN_CTRLT, r0
 
ld  r2,PACATOC(r13)
 
@@ -493,11 +493,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
cmpwi   r0,0
beq 20b
 
-   /* Set LPCR and RMOR. */
+   /* Set LPCR. */
 10:     ld      r8,VCORE_LPCR(r5)
mtspr   SPRN_LPCR,r8
-   ld  r8,KVM_RMOR(r9)
-   mtspr   SPRN_RMOR,r8
isync
 
/* Check if HDEC expires soon */
@@ -1074,7 +1072,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
bne 2f
mfspr   r3,SPRN_HDEC
cmpwi   r3,0
-   bge ignore_hdec
+   mr  r4,r9
+   bge fast_guest_return
 2:
/* See if this is an hcall we can handle in real mode */
cmpwi   r12,BOOK3S_INTERRUPT_SYSCALL
@@ -1082,26 +1081,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* External interrupt ? */
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
-   bne+    ext_interrupt_to_host
+   bne+    guest_exit_cont
 
/* External interrupt, first check for host_ipi. If this is
 * set, we know the host wants us out so let's do it now
 */
bl  kvmppc_read_intr
cmpdi   r3, 0
-   bgt ext_interrupt_to_host
+   bgt guest_exit_cont
 
/* Check if any CPU is heading out to the host, if so head out too */
ld  r5, HSTATE_KVM_VCORE(r13)
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
-   bge ext_interrupt_to_host
-
-   /* Return to guest after delivering any pending interrupt */
mr  r4, r9
-   b   deliver_guest_interrupt
-
-ext_interrupt_to_host:
+   blt deliver_guest_interrupt
 
 guest_exit_cont:   /* r9 = vcpu, r12 = trap, r13 = paca */
/* Save more register state  */
@@ -1762,8 +1756,10 @@ kvmppc_hisi:
  * Returns to the guest if we handle it, or continues on up to
  * the kernel if we can't (i.e. if we don't have a handler for
  * it, or if the handler returns H_TOO_HARD).
+ *
+ * r5 - r8 contain hcall args,
+ * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca
  */
-   .globl  hcall_try_real_mode
 hcall_try_real_mode:
ld  r3,VCPU_GPR(R3)(r9)
andi.   r0,r11,MSR_PR
@@ -2023,10 +2019,6 @@ hcall_real_table:
.globl  hcall_real_table_end
 hcall_real_table_end:
 
-ignore_hdec:
-   mr  r4,r9
-   b   fast_guest_return
-
 _GLOBAL(kv

[PATCH 09/12] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count

2015-03-26 Thread Paul Mackerras

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.
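
To illustrate the "set the entry bit iff the exit map is 0" idea
without a lock, here is a userspace sketch using C11 atomics.  It is
not the kernel code (which uses lwarx/stwcx. in assembly and its own
helpers in C); the function names and EXIT_MAP_SHIFT are made up for
the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define EXIT_MAP_SHIFT 8    /* exit bitmap sits above the 8-bit entry bitmap */

/* Try to mark thread 'ptid' (0..7) as entered; fails once anyone is exiting. */
bool vcore_try_enter(_Atomic int *entry_exit_map, int ptid)
{
    int old = atomic_load(entry_exit_map);

    do {
        if ((old >> EXIT_MAP_SHIFT) != 0)
            return false;   /* exit map non-zero: too late to enter */
    } while (!atomic_compare_exchange_weak(entry_exit_map, &old,
                                           old | (1 << ptid)));
    return true;
}

/* Mark 'ptid' as exiting; returns the entry bits of the other threads. */
int vcore_begin_exit(_Atomic int *entry_exit_map, int ptid)
{
    int old = atomic_fetch_or(entry_exit_map, 1 << (ptid + EXIT_MAP_SHIFT));

    return old & 0xff & ~(1 << ptid);
}
```

The value returned by vcore_begin_exit() is the kind of bitmap the
description refers to: it names the threads that have entered and so
may need to be signalled, which a pair of counts could not provide.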

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h | 15 
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kvm/book3s_hv.c|  5 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c| 10 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 61 +++--
 5 files changed, 44 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1517faa..d67a838 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -263,15 +263,15 @@ struct kvm_arch {
 
 /*
  * Struct for a virtual core.
- * Note: entry_exit_count combines an entry count in the bottom 8 bits
- * and an exit count in the next 8 bits.  This is so that we can
- * atomically increment the entry count iff the exit count is 0
- * without taking the lock.
+ * Note: entry_exit_map combines a bitmap of threads that have entered
+ * in the bottom 8 bits and a bitmap of threads that have exited in the
+ * next 8 bits.  This is so that we can atomically set the entry bit
+ * iff the exit map is 0 without taking a lock.
  */
 struct kvmppc_vcore {
int n_runnable;
int num_threads;
-   int entry_exit_count;
+   int entry_exit_map;
int napping_threads;
int first_vcpuid;
u16 pcpu;
@@ -296,8 +296,9 @@ struct kvmppc_vcore {
ulong conferring_threads;
 };
 
-#define VCORE_ENTRY_COUNT(vc)  ((vc)->entry_exit_count & 0xff)
-#define VCORE_EXIT_COUNT(vc)   ((vc)->entry_exit_count >> 8)
+#define VCORE_ENTRY_MAP(vc)((vc)->entry_exit_map & 0xff)
+#define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
+#define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8aa8246..0d07efb 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -562,7 +562,7 @@ int main(void)
DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
-   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
+   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6741505..1426459 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1952,7 +1952,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->entry_exit_count = 0;
+   vc->entry_exit_map = 0;
vc->preempt_tb = TB_NIL;
vc->in_guest = 0;
vc->napping_threads = 0;
@@ -2119,8 +2119,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 * this thread straight away and have it join in.
 */
if (!signal_pending(current)) {
-   if (vc->vcore_state == VCORE_RUNNING &&
-   VCORE_EXIT_COUNT(vc) == 0) {
+   if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) {
kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
trace_kvm_guest_enter(vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1954a1c..2754251 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int 
target,
int rv = H_SUCCESS; /* => don't yield */
 
set_bit(vcpu->arch.ptid, &vc->conferring_threads);
-   while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) {
-   threads_running = VCORE_ENTRY_COUNT(vc);
-   threads_ceded = hweight32(vc->napping_threads);
-   threads_conferring = hweight

[PATCH 11/12] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C

2015-03-26 Thread Paul Mackerras

This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c.  It also moves the IPI sending code that was
in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().
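
As a rough user-space model of the exit path, the sketch below sets
the caller's exit bit and, if it was the first thread out, walks the
entry bitmap to signal the others.  The assumptions are loud ones:
send_ipi() just counts calls in an array instead of poking an
interrupt controller, atomic_fetch_or() stands in for the cmpxchg
loop and the smp_mb(), and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CPUS 64

/* Stand-in for real IPI delivery: count how often each CPU is poked. */
static int ipi_sent[MAX_CPUS];

static void send_ipi(int cpu)
{
    ipi_sent[cpu]++;
}

/* Signal every thread whose bit is set; bit 0 maps to first_cpu. */
static void interrupt_threads(int first_cpu, int active)
{
    for (int cpu = first_cpu; active; active >>= 1, ++cpu)
        if (active & 1)
            send_ipi(cpu);
}

/*
 * Set our bit in the exit map (bits 8-15).  Only the first thread to
 * do so signals the others, and not even then if the trap was a
 * hypervisor decrementer, since that fires on every thread anyway.
 */
static void commence_exit(atomic_int *map, int first_cpu, int ptid,
                          bool is_hdec)
{
    int old = atomic_fetch_or(map, 0x100 << ptid);

    if ((old >> 8) != 0)
        return;             /* another thread got here first */
    if (!is_hdec)
        interrupt_threads(first_cpu, old & ~(1 << ptid));
}
```

The caller's own bit is masked out of the entry map so that it does
not interrupt itself.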

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +
 arch/powerpc/kvm/book3s_hv_builtin.c | 63 ++
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 12 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 66 
 4 files changed, 75 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 869c53f..2b84e48 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
+extern void kvmhv_rm_send_ipi(int cpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 2754251..c42aa55 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define KVM_CMA_CHUNK_ORDER18
 
@@ -184,3 +185,65 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu)
 
return H_HARDWARE;
 }
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+   __asm__ __volatile__("stbcix %0,0,%1"
+   : : "r" (val), "r" (paddr) : "memory");
+}
+
+/*
+ * Send an interrupt to another CPU.
+ * This can only be called in real mode.
+ * The caller needs to include any barrier needed to order writes
+ * to memory vs. the IPI/message.
+ */
+void kvmhv_rm_send_ipi(int cpu)
+{
+   unsigned long xics_phys;
+
+   /* Poke the target */
+   xics_phys = paca[cpu].kvm_hstate.xics_phys;
+   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+/*
+ * The following functions are called from the assembly code
+ * in book3s_hv_rmhandlers.S.
+ */
+static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active)
+{
+   int cpu = vc->pcpu;
+
+   /* Order setting of exit map vs. msgsnd/IPI */
+   smp_mb();
+   for (; active; active >>= 1, ++cpu)
+   if (active & 1)
+   kvmhv_rm_send_ipi(cpu);
+}
+
+void kvmhv_commence_exit(int trap)
+{
+   struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore;
+   int ptid = local_paca->kvm_hstate.ptid;
+   int me, ee;
+
+   /* Set our bit in the threads-exiting-guest map in the 0xff00
+  bits of vcore->entry_exit_map */
+   me = 0x100 << ptid;
+   do {
+   ee = vc->entry_exit_map;
+   } while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee);
+
+   /* Are we the first here? */
+   if ((ee >> 8) != 0)
+   return;
+
+   /*
+* Trigger the other threads in this vcore to exit the guest.
+* If this is a hypervisor decrementer interrupt then they
+* will be already on their way out of the guest.
+*/
+   if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER)
+   kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid));
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 6dded8c..00e45b6 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -26,12 +26,6 @@
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
u32 new_irq);
 
-static inline void rm_writeb(unsigned long paddr, u8 val)
-{
-   __asm__ __volatile__("sync; stbcix %0,0,%1"
-   : : "r" (val), "r" (paddr) : "memory");
-}
-
 /* -- ICS routines -- */
 static void ics_rm_check_resend(struct kvmppc_xics *xics,
struct kvmppc_ics *ics, struct kvmppc_icp *icp)
@@ -60,7 +54,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
-   unsigned long xics_phys;
int cpu;
 
/* Mark the target VCPU as having an interrupt pending */
@@ -83,9 +76,8 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
/* In SMT cpu will always point to thread 0, we adjust it */
cpu += vcpu->arch.ptid;
 
-   /* Not too hard, then poke the target */
-   xics_phys = paca[cpu].kvm_hstate.xics_phys;
-   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+   smp_mb();
+   kvmhv_rm_send_ipi(cpu);
 }
 
 static void icp_rm_clr_vcpu_irq(struct kvm_vcpu *vcpu)
diff --g

[PATCH 07/12] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI

2015-03-26 Thread Paul Mackerras

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.
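
In C terms, the LPCR value used for napping is computed roughly as
below.  This is a sketch, not the real-mode code: the bit values are
illustrative placeholders (the kernel's reg.h is authoritative), and
nap_lpcr() and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; see the kernel headers for the
 * real definitions. */
#define PECE0   UINT64_C(0x4000)    /* wake on external interrupt */
#define PECE1   UINT64_C(0x2000)    /* wake on decrementer */
#define PECEDP  UINT64_C(0x10000)   /* wake on privileged doorbell */

/*
 * Hypothetical helper mirroring the nap setup: PECE0 and PECE1 are
 * always set.  On CPUs with doorbells, PECEDP is inserted from the
 * caller's request (like rlwimi), so it is set when ceding and
 * cleared when napping with no vcpu.
 */
static uint64_t nap_lpcr(uint64_t lpcr, bool ceding, bool has_doorbells)
{
    lpcr |= PECE0 | PECE1;
    if (has_doorbells) {
        uint64_t want = ceding ? PECEDP : 0;

        lpcr = (lpcr & ~PECEDP) | want;
    }
    return lpcr;
}
```

The point of inserting rather than OR-ing is that a PECEDP bit left
over in LPCR gets cleared when the thread has no vcpu to run.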

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 0a35b8d..35b1847 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -191,6 +191,7 @@ kvmppc_primary_no_guest:
li  r3, NAPPING_NOVCPU
stb r3, HSTATE_NAPPING(r13)
 
+   li  r3, 0   /* Don't wake on privileged (OS) doorbell */
b   kvm_do_nap
 
 kvm_novcpu_wakeup:
@@ -2128,10 +2129,13 @@ _GLOBAL(kvmppc_h_cede)  /* r3 = vcpu pointer, 
r11 = msr, r13 = paca */
bl  kvmhv_accumulate_time
 #endif
 
+   lis r3, LPCR_PECEDP@h   /* Do wake on privileged doorbell */
+
/*
 * Take a nap until a decrementer or external or doobell interrupt
-* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
-* runlatch bit before napping.
+* occurs, with PECE1 and PECE0 set in LPCR.
+* On POWER8, if we are ceding, also set PECEDP.
+* Also clear the runlatch bit before napping.
 */
 kvm_do_nap:
mfspr   r0, SPRN_CTRLF
@@ -2143,7 +2147,7 @@ kvm_do_nap:
mfspr   r5,SPRN_LPCR
ori r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-   orisr5,r5,LPCR_PECEDP@h
+   rlwimi  r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_LPCR,r5
isync
-- 
2.1.4


[PATCH 06/12] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken

2015-03-26 Thread Paul Mackerras

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.
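
The polling idea can be modelled in plain C as follows.  This is a
simplified sketch: an array of pointers stands in for the per-thread
kvm_hstate.kvm_vcpu fields, the priority hints (HMT_low/HMT_medium)
are omitted, and stuck_threads() is a hypothetical name.

```c
#include <stddef.h>

#define MAX_LOOPS 100

/*
 * Poll the vcpu pointers of the secondary threads (index 1 and up).
 * A thread clears its pointer when it has finished, so a non-NULL
 * entry means "still busy".  Returns 0 once all are clear, or after
 * MAX_LOOPS polls a bitmap of the threads that are still busy, which
 * is what allows a per-thread error message.
 */
static unsigned stuck_threads(void *volatile *vcpu_ptrs, int nthreads)
{
    unsigned stuck = 0;

    for (int loops = 0; loops < MAX_LOOPS; ++loops) {
        int i;

        for (i = 1; i < nthreads; ++i)
            if (vcpu_ptrs[i])
                break;
        if (i == nthreads)
            return 0;       /* every secondary has finished */
    }

    for (int i = 1; i < nthreads; ++i)
        if (vcpu_ptrs[i])
            stuck |= 1u << i;
    return stuck;
}
```

With a counter such as nap_count, only the number of missing threads
is known; with the pointers, their identity is known too.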

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv.c| 47 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 +
 4 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 83c4425..1517faa 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,8 +272,6 @@ struct kvmppc_vcore {
int n_runnable;
int num_threads;
int entry_exit_count;
-   int n_woken;
-   int nap_count;
int napping_threads;
int first_vcpuid;
u16 pcpu;
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 92ec3fc..8aa8246 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -563,7 +563,6 @@ int main(void)
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
-   DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5a1abf6..6741505 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1729,8 +1729,10 @@ static int kvmppc_grab_hwthread(int cpu)
tpaca = &paca[cpu];
 
/* Ensure the thread won't go into the kernel if it wakes */
-   tpaca->kvm_hstate.hwthread_req = 1;
tpaca->kvm_hstate.kvm_vcpu = NULL;
+   tpaca->kvm_hstate.napping = 0;
+   smp_wmb();
+   tpaca->kvm_hstate.hwthread_req = 1;
 
/*
 * If the thread is already executing in the kernel (e.g. handling
@@ -1773,35 +1775,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
}
cpu = vc->pcpu + vcpu->arch.ptid;
tpaca = &paca[cpu];
-   tpaca->kvm_hstate.kvm_vcpu = vcpu;
tpaca->kvm_hstate.kvm_vcore = vc;
tpaca->kvm_hstate.ptid = vcpu->arch.ptid;
vcpu->cpu = vc->pcpu;
+   /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
+   tpaca->kvm_hstate.kvm_vcpu = vcpu;
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-   if (cpu != smp_processor_id()) {
+   if (cpu != smp_processor_id())
xics_wake_cpu(cpu);
-   if (vcpu->arch.ptid)
-   ++vc->n_woken;
-   }
 #endif
 }
 
-static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
+static void kvmppc_wait_for_nap(void)
 {
-   int i;
+   int cpu = smp_processor_id();
+   int i, loops;
 
-   HMT_low();
-   i = 0;
-   while (vc->nap_count < vc->n_woken) {
-   if (++i >= 100) {
-   pr_err("kvmppc_wait_for_nap timeout %d %d\n",
-  vc->nap_count, vc->n_woken);
-   break;
+   for (loops = 0; loops < 100; ++loops) {
+   /*
+* Check if all threads are finished.
+* We set the vcpu pointer when starting a thread
+* and the thread clears it when finished, so we look
+* for any threads that still have a non-NULL vcpu ptr.
+*/
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   break;
+   if (i == threads_per_subcore) {
+   HMT_medium();
+   return;
}
-   cpu_relax();
+   HMT_low();
}
HMT_medium();
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
 /*
@@ -1942,8 +1952,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initiali

[PATCH 12/12] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-03-26 Thread Paul Mackerras

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.
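
The choice between the two signalling paths comes down to whether the
target is in the same core as the sender.  A minimal sketch of that
decision, assuming threads of a core are numbered contiguously and
threads_per_core is a power of two; the enum and function names here
are hypothetical:

```c
#include <stdbool.h>

enum ipi_path { IPI_DOORBELL, IPI_XICS, IPI_NONE };

/* First hardware thread of the core that `cpu` belongs to. */
static int first_thread_sibling(int cpu, int threads_per_core)
{
    return cpu & ~(threads_per_core - 1);
}

/* Index of `cpu` within its core; this goes into the doorbell message. */
static int thread_in_core(int cpu, int threads_per_core)
{
    return cpu & (threads_per_core - 1);
}

/*
 * Use a doorbell (msgsnd) only for a target in the sender's own core;
 * otherwise fall back to the interrupt controller if it is usable.
 * IPI_NONE means the caller has to find some other way.
 */
static enum ipi_path pick_ipi_path(int me, int target, int threads_per_core,
                                   bool has_msgsnd, bool has_xics)
{
    if (has_msgsnd &&
        first_thread_sibling(me, threads_per_core) ==
        first_thread_sibling(target, threads_per_core))
        return IPI_DOORBELL;
    if (has_xics)
        return IPI_XICS;
    return IPI_NONE;
}
```

In the kernel the comparison has to run with preemption disabled so
that the sender cannot migrate between the check and the msgsnd.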

Aggregated statistics from debugfs across vcpus for a guest with 32
vcpus, 8 threads/vcore, running on a POWER8, show this before the
change:

 rm_entry: 3387.6ns (228 - 86600, 1008969 samples)
  rm_exit: 4561.5ns (12 - 3477452, 1009402 samples)
  rm_intr: 1660.0ns (12 - 553050, 3600051 samples)

and this after the change:

 rm_entry: 3060.1ns (212 - 65138, 953873 samples)
  rm_exit: 4244.1ns (12 - 9693408, 954331 samples)
  rm_intr: 1342.3ns (12 - 1104718, 3405326 samples)

for a test of booting Fedora 20 big-endian to the login prompt.

The time taken for a H_PROD hcall (which is handled in the host
kernel) went down from about 35 microseconds to about 16 microseconds
with this change.

The noinline added to kvmppc_run_core turned out to be necessary for
good performance, at least with gcc 4.9.2 as packaged with Fedora 21
and a little-endian POWER8 host.

Signed-off-by: Paul Mackerras 
---
Note that this patch depends on the patch "powerpc/powernv: Fixes for
hypervisor doorbell handling", which is now upstream in Linus' tree as
commit 755563bc79c7, for the definition of PPC_MSGCLR().

 arch/powerpc/kernel/asm-offsets.c   |  3 ++
 arch/powerpc/kvm/book3s_hv.c| 51 ++---
 arch/powerpc/kvm/book3s_hv_builtin.c| 16 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 22 --
 4 files changed, 70 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 0d07efb..0034b6b 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_PPC64
 #include 
 #include 
@@ -759,5 +760,7 @@ int main(void)
offsetof(struct paca_struct, subcore_sibling_mask));
 #endif
 
+   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+
return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1426459..bb29e75 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -84,9 +85,35 @@ static DECLARE_BITMAP(default_enabled_hcalls, 
MAX_HCALL_OPCODE/4 + 1);
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+static bool kvmppc_ipi_thread(int cpu)
+{
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   preempt_disable();
+   if (cpu_first_thread_sibling(cpu) ==
+   cpu_first_thread_sibling(smp_processor_id())) {
+   unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+   msg |= cpu_thread_in_core(cpu);
+   smp_mb();
+   __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+   preempt_enable();
+   return true;
+   }
+   preempt_enable();
+   }
+
+#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
+   if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
+   xics_wake_cpu(cpu);
+   return true;
+   }
+#endif
+
+   return false;
+}
+
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int me;
int cpu = vcpu->cpu;
wait_queue_head_t *wqp;
 
@@ -96,20 +123,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}
 
-   me = get_cpu();
+   if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
+   return;
 
/* CPU points to the first thread of the core */
-   if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
-#ifdef CONFIG_PPC_ICP_NATIVE
-   int real_cpu = cpu + vcpu->arch.ptid;
-   if (paca[real_cpu].kvm_hstate.xics_phys)
-   xics_wake_cpu(real_cpu);
-   else
-#endif
-   if (cpu_online(cpu))
-   smp_send_reschedule(cpu);
-   }
-   put_cpu();
+   if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
+   smp_send_reschedule(cpu);
 }
 
 /*
@@ -1781,10 +1800,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
tpaca->kvm_hstate.kvm_vcpu = vcpu;
-#if defined(CONF

[PATCH 05/12] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu

2015-03-26 Thread Paul Mackerras

Rather than calling cond_resched() in kvmppc_run_core() before doing
the post-processing for the vcpus that we have just run (that is,
calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
that post-processing before calling cond_resched(), and that post-
processing is moved out into its own function, post_guest_process().

The reschedule point is now in kvmppc_run_vcpu() and we define a new
vcore state, VCORE_PREEMPT, to indicate that the vcore's runner
task is runnable but not running.  (Doing the reschedule with the
vcore in VCORE_INACTIVE state would be bad because there are potentially
other vcpus waiting for the runner in kvmppc_wait_for_exec() which
then wouldn't get woken up.)

Also, we make use of the handy cond_resched_lock() function, which
unlocks and relocks vc->lock for us around the reschedule.
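
A user-space analogue of the new reschedule point might look like the
sketch below.  It rests on assumptions not shown in this mail: a toy
spinlock built from atomic_flag replaces vc->lock, sched_yield()
replaces the scheduler call, and returning to VCORE_INACTIVE after
the yield is a guess about the surrounding code.  All function names
are hypothetical.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

enum vcore_state {
    VCORE_INACTIVE, VCORE_SLEEPING, VCORE_PREEMPT,
    VCORE_RUNNING, VCORE_EXITING
};

struct vcore {
    atomic_flag lock;           /* toy stand-in for vc->lock */
    enum vcore_state state;
};

static void vc_lock(struct vcore *vc)
{
    while (atomic_flag_test_and_set(&vc->lock))
        sched_yield();
}

static void vc_unlock(struct vcore *vc)
{
    atomic_flag_clear(&vc->lock);
}

/*
 * Called with the lock held, returns with it held.  While the runner
 * is off the CPU the state says PREEMPT rather than INACTIVE, so
 * other vcpu tasks can tell that a runner still exists.
 */
static void resched_point(struct vcore *vc, bool need_resched)
{
    if (!need_resched)
        return;
    vc->state = VCORE_PREEMPT;
    vc_unlock(vc);              /* what cond_resched_lock() does for us */
    sched_yield();
    vc_lock(vc);
    vc->state = VCORE_INACTIVE; /* assumption: back to idle afterwards */
}
```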

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  5 +-
 arch/powerpc/kvm/book3s_hv.c| 92 +
 2 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3eecd88..83c4425 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -304,8 +304,9 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_RUNNING  2
-#define VCORE_EXITING  3
+#define VCORE_PREEMPT  2
+#define VCORE_RUNNING  3
+#define VCORE_EXITING  4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1a6ea6e..5a1abf6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1882,15 +1882,50 @@ static void prepare_threads(struct kvmppc_vcore *vc)
}
 }
 
+static void post_guest_process(struct kvmppc_vcore *vc)
+{
+   u64 now;
+   long ret;
+   struct kvm_vcpu *vcpu, *vnext;
+
+   now = get_tb();
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
+   /* cancel pending dec exception if dec is positive */
+   if (now < vcpu->arch.dec_expires &&
+   kvmppc_core_pending_dec(vcpu))
+   kvmppc_core_dequeue_dec(vcpu);
+
+   trace_kvm_guest_exit(vcpu);
+
+   ret = RESUME_GUEST;
+   if (vcpu->arch.trap)
+   ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
+   vcpu->arch.run_task);
+
+   vcpu->arch.ret = ret;
+   vcpu->arch.trap = 0;
+
+   if (vcpu->arch.ceded) {
+   if (!is_kvmppc_resume_guest(ret))
+   kvmppc_end_cede(vcpu);
+   else
+   kvmppc_set_timer(vcpu);
+   }
+   if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
 static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-   struct kvm_vcpu *vcpu, *vnext;
-   long ret;
-   u64 now;
+   struct kvm_vcpu *vcpu;
int i;
int srcu_idx;
 
@@ -1922,8 +1957,11 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 */
if ((threads_per_core > 1) &&
((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) 
{
vcpu->arch.ret = -EBUSY;
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
goto out;
}
 
@@ -1979,44 +2017,12 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
kvm_guest_exit();
 
preempt_enable();
-   cond_resched();
 
spin_lock(&vc->lock);
-   now = get_tb();
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-   /* cancel pending dec exception if dec is positive */
-   if (now < vcpu->arch.dec_expires &&
-   kvmppc_core_pending_dec(vcpu))
-   kvmppc_core_dequeue_dec(vcpu);
-
-   trace_kvm_guest_exit(vcpu);
-
-   ret = RESUME_GUEST;
-   if (vcpu->arch.trap)
-   ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
-   vcpu->arch.run_task);
-

[PATCH 10/12] KVM: PPC: Book3S HV: Streamline guest entry and exit

2015-03-26 Thread Paul Mackerras

On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  That means that the other threads
exit sooner and means that the first thread doesn't spend so much
time waiting for the other threads at the point where the MMU gets
switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry: avg 4537.3ns (222 - 48444, 1068878 samples)
  rm_exit: avg 4787.6ns (152 - 165490, 1010717 samples)
  rm_intr: avg 1673.6ns (12 - 341304, 3818691 samples)

and this after the change:

 rm_entry: avg 3427.7ns (232 - 68150, 1118921 samples)
  rm_exit: avg 4716.0ns (12 - 150720, 1119477 samples)
  rm_intr: avg 1614.8ns (12 - 522436, 3850432 samples)

showing a substantial reduction in the time spent per guest entry in
the real-mode guest entry code, and smaller reductions in the real
mode guest exit and interrupt handling times.  (The test was to start
the guest and boot Fedora 20 big-endian to the login prompt.)
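
For reference, the relative changes can be worked out from the
averages above with a one-line helper (pct_reduction() is just an
illustrative name):

```c
/* Percentage by which the average time dropped. */
static double pct_reduction(double before_ns, double after_ns)
{
    return 100.0 * (before_ns - after_ns) / before_ns;
}
```

Applied to the figures quoted here, that gives roughly 24% for
rm_entry (4537.3ns to 3427.7ns), about 1.5% for rm_exit and about
3.5% for rm_intr.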

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 212 +++-
 1 file changed, 126 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 2edbf66..7074783 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -175,6 +175,19 @@ kvmppc_primary_no_guest:
/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
mfspr   r3, SPRN_HDEC
mtspr   SPRN_DEC, r3
+   /*
+* Make sure the primary has finished the MMU switch.
+* We should never get here on a secondary thread, but
+* check it for robustness' sake.
+*/
+   ld  r5, HSTATE_KVM_VCORE(r13)
+65:lbz r0, VCORE_IN_GUEST(r5)
+   cmpwi   r0, 0
+   beq 65b
+   /* Set LPCR. */
+   ld  r8,VCORE_LPCR(r5)
+   mtspr   SPRN_LPCR,r8
+   isync
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -206,7 +219,7 @@ kvm_novcpu_wakeup:
 
/* check the wake reason */
bl  kvmppc_check_wake_reason
-   
+
/* see if any other thread is already exiting */
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
@@ -244,7 +257,15 @@ kvm_novcpu_wakeup:
b   kvmppc_got_guest
 
 kvm_novcpu_exit:
-   b   hdec_soon
+#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   cmpdi   r4, 0
+   beq 13f
+   addir3, r4, VCPU_TB_RMEXIT
+   bl  kvmhv_accumulate_time
+#endif
+13:bl  kvmhv_commence_exit
+   b   kvmhv_switch_to_host
 
 /*
  * We come in here when wakened from nap mode.
@@ -422,7 +443,7 @@ kvmppc_hv_entry:
/* Primary thread switches to guest partition. */
ld  r9,VCORE_KVM(r5)/* pointer to struct kvm */
cmpwi   r6,0
-   bne 20f
+   bne 10f
ld  r6,KVM_SDR1(r9)
lwz r7,KVM_LPID(r9)
li  r0,LPID_RSVD/* switch to reserved LPID */
@@ -493,26 +514,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
li  r0,1
stb r0,VCORE_IN_GUEST(r5)   /* signal secondaries to continue */
-   b   10f
-
-   /* Secondary threads wait for primary to have done partition switch */
-20:lbz r0,VCORE_IN_GUEST(r5)
-   cmpwi   r0,0
-   beq 20b
-
-   /* Set LPCR. */
-10:ld  r8,VCORE_LPCR(r5)
-   mtspr   SPRN_LPCR,r8
-   isync
-
-   /* Check if HDEC expires soon */
-   mfspr   r3,SPRN_HDEC
-   cmpwi   r3,512  /* 1 microsecond */
-   li  r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   blt hdec_soon
 
/* Do we have a guest vcpu to run? */
-   cmpdi   r4, 0
+10:cmpdi   r4, 0
beq kvmppc_primary_no_guest
 kvmppc_got_guest:
 
@@ -837,6 +841,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
clrrdi  r6,r6,1
mtspr   SPRN_CTRLT,r6
 4:
+   /* Secondary threads wait for primary to have done partition switch */
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   lbz

[PATCH 00/12] Remaining improvements for HV KVM

2015-03-26 Thread Paul Mackerras

This is the rest of my current patch queue for HV KVM on PPC.  This
series is based on Alex Graf's kvm-ppc-queue branch.

The last patch in this series needs a definition of PPC_MSGCLR that is
added by the patch "powerpc/powernv: Fixes for hypervisor doorbell
handling", which has now gone upstream into Linus' tree as commit
755563bc79c7 via the linuxppc-dev mailing list.  Alex, how do you want
to handle that?  You could pull in the master branch of the kvm tree,
or you could cherry-pick 755563bc79c7 and let the subsequent merge fix
it up.

Paul.

 arch/powerpc/include/asm/kvm_book3s_64.h |   4 +
 arch/powerpc/include/asm/kvm_host.h  |  44 +++-
 arch/powerpc/include/asm/time.h  |   3 +
 arch/powerpc/kernel/asm-offsets.c|  20 +-
 arch/powerpc/kernel/time.c   |   6 +
 arch/powerpc/kvm/Kconfig |  14 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 ++
 arch/powerpc/kvm/book3s_hv.c | 413 +
 arch/powerpc/kvm/book3s_hv_builtin.c |  85 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  12 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 432 +--
 virt/kvm/kvm_main.c  |   1 +
 12 files changed, 897 insertions(+), 273 deletions(-)

[PATCH 03/12] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update

2015-03-26 Thread Paul Mackerras

Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
update (i.e. one of its 3 virtual processor areas needed to be pinned
in memory so the host real mode code can update it on guest entry and
exit), we would drop the vcore lock and do the update there and then.
Future changes will make it inconvenient to drop the lock, so instead
we now remove it from the list of runnable VCPUs and wake up its
VCPU task.  This will have the effect that the VCPU task will exit
kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
to kvmppc_update_vpas() and then rejoin the vcore.
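
The "remove it and let its task come back later" step can be modelled
with a plain singly linked list.  This is a simplified sketch, not
the kernel code: struct vcpu here carries only the fields needed for
the illustration, there is no wake_up(), and the helper names are
chosen for readability.

```c
#include <stdbool.h>
#include <stddef.h>

struct vcpu {
    struct vcpu *next;
    bool signal_pending;
    bool vpa_update_pending;
    bool runnable;
};

/*
 * Unlink every vcpu that cannot enter the guest right now.  Walking
 * via a pointer to the link keeps the traversal valid while entries
 * are being removed, which is the job list_for_each_entry_safe()
 * does in the kernel.  Returns the number of vcpus removed.
 */
static int prepare_threads(struct vcpu **head)
{
    struct vcpu **link = head;
    int removed = 0;

    while (*link) {
        struct vcpu *v = *link;

        if (v->signal_pending || v->vpa_update_pending) {
            *link = v->next;    /* unlink; successor is examined next */
            v->next = NULL;
            v->runnable = false;
            ++removed;
        } else {
            link = &v->next;
        }
    }
    return removed;
}

/* The runner may itself have been removed; the caller must check. */
static bool runner_still_runnable(const struct vcpu *runner)
{
    return runner->runnable;
}
```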

The one complication is that the runner VCPU (whose VCPU task is the
current task) might be one of the ones that gets removed from the
runnable list.  In that case we just return from kvmppc_run_core()
and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
the runner if necessary.

This all means that the VCORE_STARTING state is no longer used, so we
remove it.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  5 ++--
 arch/powerpc/kvm/book3s_hv.c| 56 -
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d2068bb..2f339ff 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -306,9 +306,8 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_STARTING 2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_RUNNING  2
+#define VCORE_EXITING  3
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c7b18ac..1a6ea6e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1863,6 +1863,25 @@ static void kvmppc_start_restoring_l2_cache(const struct 
kvmppc_vcore *vc)
mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE);
 }
 
+static void prepare_threads(struct kvmppc_vcore *vc)
+{
+   struct kvm_vcpu *vcpu, *vnext;
+
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
+   if (signal_pending(vcpu->arch.run_task))
+   vcpu->arch.ret = -EINTR;
+   else if (vcpu->arch.vpa.update_pending ||
+vcpu->arch.slb_shadow.update_pending ||
+vcpu->arch.dtl.update_pending)
+   vcpu->arch.ret = RESUME_GUEST;
+   else
+   continue;
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
@@ -1872,46 +1891,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
struct kvm_vcpu *vcpu, *vnext;
long ret;
u64 now;
-   int i, need_vpa_update;
+   int i;
int srcu_idx;
-   struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
-   /* don't start if any threads have a signal pending */
-   need_vpa_update = 0;
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-   if (signal_pending(vcpu->arch.run_task))
-   return;
-   if (vcpu->arch.vpa.update_pending ||
-   vcpu->arch.slb_shadow.update_pending ||
-   vcpu->arch.dtl.update_pending)
-   vcpus_to_update[need_vpa_update++] = vcpu;
-   }
+   /*
+* Remove from the list any threads that have a signal pending
+* or need a VPA update done
+*/
+   prepare_threads(vc);
+
+   /* if the runner is no longer runnable, let the caller pick a new one */
+   if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE)
+   return;
 
/*
-* Initialize *vc, in particular vc->vcore_state, so we can
-* drop the vcore lock if necessary.
+* Initialize *vc.
 */
vc->n_woken = 0;
vc->nap_count = 0;
vc->entry_exit_count = 0;
vc->preempt_tb = TB_NIL;
-   vc->vcore_state = VCORE_STARTING;
vc->in_guest = 0;
vc->napping_threads = 0;
vc->conferring_threads = 0;
 
/*
-* Updating any of the vpas requires calling kvmppc_pin_guest_page,
-* which can't be called with any spinlocks held.
-*/
-   if (need_vpa_update) {
-   spin_unlock(&vc->lock);
-   for (i = 0; i < need_vpa_update; ++i)
-   kvmppc_update_vpas(vcpus_to_update[i]);
-   spin_lock(&vc->lock);
-   }
-
-   /*

[PATCH 01/12] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT

2015-03-26 Thread Paul Mackerras

This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vm<pid>, where <pid> is the PID of the
process that created the guest.  The file is named "htab".  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 000bd7ff1196 0003b5c71196

The first field is the index of the entry in the HPT, the second and
third are the HPT entry, so the third field contains the real page
number that is mapped by the entry if the entry's valid bit is set.
The fourth field is the guest's view of the second doubleword of the
entry, so it contains the guest physical address.  (The format of the
second through fourth fields is described in the Power ISA and also
in arch/powerpc/include/asm/mmu-hash64.h.)
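
As an illustration only, a user-space tool could split one line of the
"htab" file into its four hexadecimal fields roughly as follows.  The
struct and function names are hypothetical and not part of the patch;
the member names v, hr and gr simply mirror the local variables used in
debugfs_htab_read().

```c
#include <stdio.h>

/* Hypothetical holder for one line of the "htab" debugfs file. */
struct htab_line {
	unsigned long index;	/* index of the entry in the HPT */
	unsigned long long v;	/* first doubleword of the HPT entry */
	unsigned long long hr;	/* second doubleword, host view (real page number) */
	unsigned long long gr;	/* second doubleword, guest view (guest physical address) */
};

/* Returns 0 on success, -1 if the line does not hold four hex fields. */
static int parse_htab_line(const char *line, struct htab_line *out)
{
	int n = sscanf(line, "%lx %llx %llx %llx",
		       &out->index, &out->v, &out->hr, &out->gr);

	return n == 4 ? 0 : -1;
}
```

Decoding the individual bits of v, hr and gr is left out here; that
needs the HPTE_* definitions from the header mentioned above.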

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
 arch/powerpc/include/asm/kvm_host.h  |   2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 +++
 arch/powerpc/kvm/book3s_hv.c |  12 +++
 virt/kvm/kvm_main.c  |   1 +
 5 files changed, 153 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0789a0f..869c53f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
return rcu_dereference_raw_notrace(kvm->memslots);
 }
 
+extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 015773f..f1d0bbc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
int hpt_cma_alloc;
+   struct dentry *debugfs_dir;
+   struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6c6825a..d6fe308 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct 
kvm_get_htab_fd *ghf)
return ret;
 }
 
+struct debugfs_htab_state {
+   struct kvm  *kvm;
+   struct mutexmutex;
+   unsigned long   hpt_index;
+   int chars_left;
+   int buf_index;
+   charbuf[64];
+};
+
+static int debugfs_htab_open(struct inode *inode, struct file *file)
+{
+   struct kvm *kvm = inode->i_private;
+   struct debugfs_htab_state *p;
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p)
+   return -ENOMEM;
+
+   kvm_get_kvm(kvm);
+   p->kvm = kvm;
+   mutex_init(&p->mutex);
+   file->private_data = p;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_htab_release(struct inode *inode, struct file *file)
+{
+   struct debugfs_htab_state *p = file->private_data;
+
+   kvm_put_kvm(p->kvm);
+   kfree(p);
+   return 0;
+}
+
+static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
+size_t len, loff_t *ppos)
+{
+   struct debugfs_htab_state *p = file->private_data;
+   ssize_t ret, r;
+   unsigned long i, n;
+   unsigned long v, hr, gr;
+   struct kvm *kvm;
+   __be64 *hptp;
+
+   ret = mutex_lock_interruptible(&p->mutex);
+   if (ret)
+   return ret;
+
+   if (p->chars_left) {
+   n = p->chars_left;
+   if (n > len)
+   n = len;
+   r = copy_to_user(buf, p->buf + p->buf_index, n);
+   n -= r;
+   p->chars_left -= n;
+   p->buf_index += n;
+   buf += n;
+   len -= n;
+   ret = n;
+   if (r) {
+   if (!n)
+   ret = -EFAULT;
+   goto out;
+   }
+   }
+
+   kvm = p->kvm;
+   i = p->hpt_index;
+   hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+   for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+   if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID |

Re: [kvm-ppc:kvm-ppc-queue 7/9] ERROR: ".__spin_yield" [arch/powerpc/kvm/kvm.ko] undefined!

2015-03-22 Thread Paul Mackerras

On Fri, Mar 20, 2015 at 08:07:53PM +0800, kbuild test robot wrote:
> tree:   git://github.com/agraf/linux-2.6.git kvm-ppc-queue
> head:   9b1daf3cfba1801768aa41b1b6ad0b653844241f
> commit: aba777f5ce0accb4c6a277e671de0330752954e8 [7/9] KVM: PPC: Book3S HV: 
> Convert ICS mutex lock to spin lock
> config: powerpc-defconfig (attached as .config)
> reproduce:
>   wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout aba777f5ce0accb4c6a277e671de0330752954e8
>   # save the attached .config to linux build tree
>   make.cross ARCH=powerpc 
> 
> All error/warnings:
> 
> >> ERROR: ".__spin_yield" [arch/powerpc/kvm/kvm.ko] undefined!

Yes, this is the patch that depends on the "powerpc: Export
__spin_yield" patch that Suresh posted to linuxppc-...@ozlabs.org and
I acked.

I think the best thing at this stage is probably for Alex to take that
patch through his tree, assuming Michael is OK with that.

The patch is at https://patchwork.ozlabs.org/patch/443703/ or
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-February/125315.html.

Alex, if you prefer me to post it to kvm-ppc@vger and kvm@vger, let me
know.

Paul.


Re: [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-03-22 Thread Paul Mackerras

On Fri, Mar 20, 2015 at 12:28:25PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 10:39, Paul Mackerras wrote:
> > +   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
> > +   if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
> > +   (cpu & ~7) == (raw_smp_processor_id() & ~7)) {
> 
> Can we somehow encapsulate the secret knowledge that 8 threads mean one
> core?

Looks like I want:

if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
cpu_first_thread_sibling(cpu) ==
cpu_first_thread_sibling(raw_smp_processor_id())) {
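
For readers unfamiliar with the helper, here is a user-space sketch of
the idea, assuming the number of threads per core is a power of two.
The function names below are hypothetical stand-ins, not the kernel's
own definitions; the point is only that masking off the low bits of a
CPU number yields the first thread of its core, so the hard-coded
"& ~7" is the 8-threads-per-core special case.

```c
/*
 * Hypothetical stand-in for cpu_first_thread_sibling().
 * threads_per_core must be a power of two.
 */
static int first_thread_sibling(int cpu, int threads_per_core)
{
	return cpu & ~(threads_per_core - 1);
}

/* Two CPUs share a core iff they have the same first sibling. */
static int same_core(int a, int b, int threads_per_core)
{
	return first_thread_sibling(a, threads_per_core) ==
	       first_thread_sibling(b, threads_per_core);
}
```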

Paul.

Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-22 Thread Paul Mackerras

On Fri, Mar 20, 2015 at 12:35:43PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 12:25, Paul Mackerras wrote:
> > On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
> >> Have you measured the additional overhead this brings?
> > 
> > I haven't - in fact I did this patch so I could measure the overhead
> > or improvement from other changes I did, but it doesn't measure its
> > own overhead, of course.  I guess I need a workload that does a
> > defined number of guest entries and exits and measure how fast it runs
> > with and without the patch (maybe something like H_SET_MODE in a
> > loop).  I'll figure something out and post the results.  
> 
> Yeah, just measure the number of exits you can handle for a simple
> hcall. If there is measurable overhead, it's probably a good idea to
> move the statistics gathering into #ifdef paths for DEBUGFS or maybe
> even a separate EXIT_TIMING config option as we have it for booke.

For a 1-vcpu guest on POWER8, it adds 29ns to the time for an hcall that
is handled in real mode (H_SET_DABR), which is 25%.  It adds 43ns to
the time for an hcall that is handled in the host kernel in virtual
mode (H_PROD), which is 1.2%.
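
To put those percentages in context, the unpatched hcall times can be
estimated from the figures above.  This is only a sketch of the
arithmetic, and it assumes the percentages are relative to the time
without the patch; the function name is made up for illustration.

```c
/*
 * Estimate the time without the patch from the added time and the
 * relative overhead (assumed to be a fraction of the unpatched time).
 */
static double baseline_ns(double added_ns, double overhead_fraction)
{
	return added_ns / overhead_fraction;
}
```

Under that assumption, baseline_ns(29.0, 0.25) gives 116ns for the
real-mode hcall and baseline_ns(43.0, 0.012) gives roughly 3583ns for
the virtual-mode one.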

I'll add a config option.

Paul.

Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Paul Mackerras

On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 10:39, Paul Mackerras wrote:
> > This reads the timebase at various points in the real-mode guest
> > entry/exit code and uses that to accumulate total, minimum and
> > maximum time spent in those parts of the code.  Currently these
> > times are accumulated per vcpu in 5 parts of the code:
> > 
> > * rm_entry - time taken from the start of kvmppc_hv_entry() until
> >   just before entering the guest.
> > * rm_intr - time from when we take a hypervisor interrupt in the
> >   guest until we either re-enter the guest or decide to exit to the
> >   host.  This includes time spent handling hcalls in real mode.
> > * rm_exit - time from when we decide to exit the guest until the
> >   return from kvmppc_hv_entry().
> > * guest - time spent in the guest
> > * cede - time spent napping in real mode due to an H_CEDE hcall
> >   while other threads in the same vcore are active.
> > 
> > These times are exposed in debugfs in a directory per vcpu that
> > contains a file called "timings".  This file contains one line for
> > each of the 5 timings above, with the name followed by a colon and
> > 4 numbers, which are the count (number of times the code has been
> > executed), the total time, the minimum time, and the maximum time,
> > all in nanoseconds.
> > 
> > Signed-off-by: Paul Mackerras 
> 
> Have you measured the additional overhead this brings?

I haven't - in fact I did this patch so I could measure the overhead
or improvement from other changes I did, but it doesn't measure its
own overhead, of course.  I guess I need a workload that does a
defined number of guest entries and exits and measure how fast it runs
with and without the patch (maybe something like H_SET_MODE in a
loop).  I'll figure something out and post the results.  
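
For anyone who wants to post-process the "timings" file, the quoted
description (a name, a colon, then count, total, minimum and maximum)
suggests a reader along the following lines.  This is a sketch based
only on that description; the struct and function names are
hypothetical and the exact spacing of the real file is an assumption.

```c
#include <stdio.h>

/* Hypothetical holder for one line of a per-vcpu "timings" file. */
struct timing_line {
	char name[32];
	unsigned long long count, total_ns, min_ns, max_ns;
};

/* Returns 0 on success, -1 if the line is not "name: n n n n". */
static int parse_timing_line(const char *line, struct timing_line *t)
{
	int n = sscanf(line, " %31[^:]: %llu %llu %llu %llu",
		       t->name, &t->count, &t->total_ns,
		       &t->min_ns, &t->max_ns);

	return n == 5 ? 0 : -1;
}

/* Average time per execution, or 0 if the code never ran. */
static double average_ns(const struct timing_line *t)
{
	return t->count ? (double)t->total_ns / (double)t->count : 0.0;
}
```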

Paul.

Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object

2015-03-20 Thread Paul Mackerras

On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 10:39, Paul Mackerras wrote:
> > From: Bharata B Rao 
> > 
> > Since KVM isn't equipped to handle closure of a vcpu fd from userspace (QEMU)
> > correctly, certain workarounds have to be employed to allow reuse of a
> > vcpu array slot in KVM during cpu hotplug/unplug from the guest. One such
> > proposed workaround is to park the vcpu fd in userspace during cpu unplug
> > and reuse it on the next hotplug.
> > 
> > More details can be found here:
> > KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> > QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
> > 
> > In order to support this workaround with PowerPC KVM, don't create or
> > initialize ICP if the vCPU is found to be already associated with an ICP.
> > 
> > Signed-off-by: Bharata B Rao 
> > Signed-off-by: Paul Mackerras 
> 
> This probably makes some sense, but please make sure that user space has
> some way to figure out whether hotplug works at all.

Bharata is working on the qemu side of all this, so I assume he has
that covered.

> Also Paul, for patches that you pick up from others, I'd prefer if they
> send the patches to the ML themselves first and you pick them up from
> there then. That way we give everyone the same treatment.

Fair enough.  In fact Bharata did post the patch but he sent it to
linuxppc-...@ozlabs.org not the KVM lists.

Paul.

[PATCH 22/23] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count

2015-03-20 Thread Paul Mackerras

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.
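
The layout can be pictured with a small user-space analogue.  This is
a sketch using C11 atomics, not the kernel code: the real entry path is
in assembly in book3s_hv_rmhandlers.S, and the names ENTRY_MAP,
EXIT_MAP and try_enter below are made up for illustration.  It shows
how a thread can set its entry bit only while the exit bitmap is still
zero, without taking a lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Entered threads in bits 0-7, exiting threads in bits 8-15. */
#define ENTRY_MAP(m)	((m) & 0xff)
#define EXIT_MAP(m)	(((m) >> 8) & 0xff)

/*
 * Try to set the entry bit for thread ptid.  Fails, leaving the map
 * untouched, if any thread has already started exiting.
 */
static bool try_enter(atomic_int *map, int ptid)
{
	int old = atomic_load(map);

	do {
		if (EXIT_MAP(old) != 0)
			return false;
		/* on failure, old is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(map, &old, old | (1 << ptid)));

	return true;
}
```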

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h | 15 
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kvm/book3s_hv.c|  5 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c| 10 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 63 +++--
 5 files changed, 45 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index f6d4232..c2b9551 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -263,15 +263,15 @@ struct kvm_arch {
 
 /*
  * Struct for a virtual core.
- * Note: entry_exit_count combines an entry count in the bottom 8 bits
- * and an exit count in the next 8 bits.  This is so that we can
- * atomically increment the entry count iff the exit count is 0
- * without taking the lock.
+ * Note: entry_exit_map combines a bitmap of threads that have entered
+ * in the bottom 8 bits and a bitmap of threads that have exited in the
+ * next 8 bits.  This is so that we can atomically set the entry bit
+ * iff the exit map is 0 without taking a lock.
  */
 struct kvmppc_vcore {
int n_runnable;
int num_threads;
-   int entry_exit_count;
+   int entry_exit_map;
int napping_threads;
int first_vcpuid;
u16 pcpu;
@@ -296,8 +296,9 @@ struct kvmppc_vcore {
ulong conferring_threads;
 };
 
-#define VCORE_ENTRY_COUNT(vc)  ((vc)->entry_exit_count & 0xff)
-#define VCORE_EXIT_COUNT(vc)   ((vc)->entry_exit_count >> 8)
+#define VCORE_ENTRY_MAP(vc)((vc)->entry_exit_map & 0xff)
+#define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
+#define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 0ce2aa6..ed348e5 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,7 @@ int main(void)
DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
-   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, 
entry_exit_count));
+   DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, 
napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2c34bae..9ea0eb5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1941,7 +1941,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initialize *vc.
 */
-   vc->entry_exit_count = 0;
+   vc->entry_exit_map = 0;
vc->preempt_tb = TB_NIL;
vc->in_guest = 0;
vc->napping_threads = 0;
@@ -2108,8 +2108,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
struct kvm_vcpu *vcpu)
 * this thread straight away and have it join in.
 */
if (!signal_pending(current)) {
-   if (vc->vcore_state == VCORE_RUNNING &&
-   VCORE_EXIT_COUNT(vc) == 0) {
+   if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) {
kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
trace_kvm_guest_enter(vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1954a1c..2754251 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int 
target,
int rv = H_SUCCESS; /* => don't yield */
 
set_bit(vcpu->arch.ptid, &vc->conferring_threads);
-   while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) {
-   threads_running = VCORE_ENTRY_COUNT(vc);
-   threads_ceded = hweight32(vc->napping_threads);
-   threads_conferring = hweight

[PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT

2015-03-20 Thread Paul Mackerras

This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vm<pid>, where <pid> is the PID of the
process that created the guest.  The file is named "htab".  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 000bd7ff1196 0003b5c71196

The first field is the index of the entry in the HPT, the second and
third are the HPT entry, so the third field contains the real page
number that is mapped by the entry if the entry's valid bit is set.
The fourth field is the guest's view of the second doubleword of the
entry, so it contains the guest physical address.  (The format of the
second through fourth fields is described in the Power ISA and also
in arch/powerpc/include/asm/mmu-hash64.h.)

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
 arch/powerpc/include/asm/kvm_host.h  |   2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 136 +++
 arch/powerpc/kvm/book3s_hv.c |  12 +++
 virt/kvm/kvm_main.c  |   1 +
 5 files changed, 153 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0789a0f..869c53f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
return rcu_dereference_raw_notrace(kvm->memslots);
 }
 
+extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 015773f..f1d0bbc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ struct kvm_arch {
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
int hpt_cma_alloc;
+   struct dentry *debugfs_dir;
+   struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6c6825a..d6fe308 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct 
kvm_get_htab_fd *ghf)
return ret;
 }
 
+struct debugfs_htab_state {
+   struct kvm  *kvm;
+   struct mutexmutex;
+   unsigned long   hpt_index;
+   int chars_left;
+   int buf_index;
+   charbuf[64];
+};
+
+static int debugfs_htab_open(struct inode *inode, struct file *file)
+{
+   struct kvm *kvm = inode->i_private;
+   struct debugfs_htab_state *p;
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p)
+   return -ENOMEM;
+
+   kvm_get_kvm(kvm);
+   p->kvm = kvm;
+   mutex_init(&p->mutex);
+   file->private_data = p;
+
+   return nonseekable_open(inode, file);
+}
+
+static int debugfs_htab_release(struct inode *inode, struct file *file)
+{
+   struct debugfs_htab_state *p = file->private_data;
+
+   kvm_put_kvm(p->kvm);
+   kfree(p);
+   return 0;
+}
+
+static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
+size_t len, loff_t *ppos)
+{
+   struct debugfs_htab_state *p = file->private_data;
+   ssize_t ret, r;
+   unsigned long i, n;
+   unsigned long v, hr, gr;
+   struct kvm *kvm;
+   __be64 *hptp;
+
+   ret = mutex_lock_interruptible(&p->mutex);
+   if (ret)
+   return ret;
+
+   if (p->chars_left) {
+   n = p->chars_left;
+   if (n > len)
+   n = len;
+   r = copy_to_user(buf, p->buf + p->buf_index, n);
+   n -= r;
+   p->chars_left -= n;
+   p->buf_index += n;
+   buf += n;
+   len -= n;
+   ret = n;
+   if (r) {
+   if (!n)
+   ret = -EFAULT;
+   goto out;
+   }
+   }
+
+   kvm = p->kvm;
+   i = p->hpt_index;
+   hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+   for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+   if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID |

[PATCH 23/23] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C

2015-03-20 Thread Paul Mackerras

This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c.  It also moves the IPI/message sending code
that was in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function
so it can be used by kvmhv_commence_exit() as well as
icp_rm_set_vcpu_irq().
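
The core of the exit protocol can be tried out in user space with C11
atomics.  The sketch below is an analogue, not the patch itself: it
uses atomic_fetch_or() where the kernel code uses a cmpxchg() loop, it
leaves out the hypervisor decrementer check, and the function name and
signature are invented for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Set this thread's bit in the exit half (0xff00) of the map.
 * Returns true if we were the first thread to start exiting, in which
 * case *others receives the bitmap of the other entered threads that
 * would need an IPI or message.  Returns false, leaving *others
 * alone, if some other thread got there first.
 */
static bool commence_exit(atomic_int *map, int ptid, int *others)
{
	int ee = atomic_fetch_or(map, 0x100 << ptid);

	if ((ee >> 8) != 0)
		return false;

	*others = ee & 0xff & ~(1 << ptid);
	return true;
}
```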

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +
 arch/powerpc/kvm/book3s_hv_builtin.c | 73 
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 22 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 81 
 4 files changed, 85 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 869c53f..2b84e48 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct 
kvm *kvm)
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
+extern void kvmhv_rm_send_ipi(int cpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 2754251..dbfc525 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #define KVM_CMA_CHUNK_ORDER18
 
@@ -184,3 +186,74 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu)
 
return H_HARDWARE;
 }
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+   __asm__ __volatile__("stbcix %0,0,%1"
+   : : "r" (val), "r" (paddr) : "memory");
+}
+
+/*
+ * Send an interrupt or message to another CPU.
+ * This can only be called in real mode.
+ * The caller needs to include any barrier needed to order writes
+ * to memory vs. the IPI/message.
+ */
+void kvmhv_rm_send_ipi(int cpu)
+{
+   unsigned long xics_phys;
+
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
+   (cpu & ~7) == (raw_smp_processor_id() & ~7)) {
+   unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+   msg |= cpu & 7;
+   __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+   return;
+   }
+
+   /* Not too hard, then poke the target */
+   xics_phys = paca[cpu].kvm_hstate.xics_phys;
+   rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+/*
+ * The following functions are called from the assembly code
+ * in book3s_hv_rmhandlers.S.
+ */
+static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active)
+{
+   int cpu = vc->pcpu;
+
+   /* Order setting of exit map vs. msgsnd/IPI */
+   smp_mb();
+   for (; active; active >>= 1, ++cpu)
+   if (active & 1)
+   kvmhv_rm_send_ipi(cpu);
+}
+
+void kvmhv_commence_exit(int trap)
+{
+   struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore;
+   int ptid = local_paca->kvm_hstate.ptid;
+   int me, ee;
+
+   /* Set our bit in the threads-exiting-guest map in the 0xff00
+  bits of vcore->entry_exit_map */
+   me = 0x100 << ptid;
+   do {
+   ee = vc->entry_exit_map;
+   } while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee);
+
+   /* Are we the first here? */
+   if ((ee >> 8) != 0)
+   return;
+
+   /*
+* Trigger the other threads in this vcore to exit the guest.
+* If this is a hypervisor decrementer interrupt then they
+* will be already on their way out of the guest.
+*/
+   if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER)
+   kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid));
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 457a8b1..046ab44 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -27,12 +27,6 @@
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
u32 new_irq);
 
-static inline void rm_writeb(unsigned long paddr, u8 val)
-{
-   __asm__ __volatile__("sync; stbcix %0,0,%1"
-   : : "r" (val), "r" (paddr) : "memory");
-}
-
 /* -- ICS routines -- */
 static void ics_rm_check_resend(struct kvmppc_xics *xics,
struct kvmppc_ics *ics, struct kvmppc_icp *icp)
@@ -61,7 +55,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
-   unsigned long xics_phys;
int cpu;
 
/* Mark the target VCPU as having an interrupt pendin

[PATCH 10/23] KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode

2015-03-20 Thread Paul Mackerras

From: Suresh Warrier 

Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
to switch to the host to complete the rest of the hypercall function in
virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
functions to be runnable in hypervisor real mode, thus avoiding the need
to switch to the host to execute these functions in virtual mode. However,
the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
events - these events cannot be done in real mode and they will still need
a switch to host virtual mode.

There are sufficient differences between the real mode code and the
virtual mode code for the ICS/ICP resend and reject functions that
for now the code has been duplicated instead of sharing common code.
In the future, we can look at creating common functions.

Signed-off-by: Suresh Warrier 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 225 ---
 1 file changed, 211 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 7c22997..73bbe92 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -23,12 +23,39 @@
 
 #define DEBUG_PASSUP
 
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
+   u32 new_irq);
+
 static inline void rm_writeb(unsigned long paddr, u8 val)
 {
__asm__ __volatile__("sync; stbcix %0,0,%1"
: : "r" (val), "r" (paddr) : "memory");
 }
 
+/* -- ICS routines -- */
+static void ics_rm_check_resend(struct kvmppc_xics *xics,
+   struct kvmppc_ics *ics, struct kvmppc_icp *icp)
+{
+   int i;
+
+   arch_spin_lock(&ics->lock);
+
+   for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+   struct ics_irq_state *state = &ics->irq_state[i];
+
+   if (!state->resend)
+   continue;
+
+   arch_spin_unlock(&ics->lock);
+   icp_rm_deliver_irq(xics, icp, state->number);
+   arch_spin_lock(&ics->lock);
+   }
+
+   arch_spin_unlock(&ics->lock);
+}
+
+/* -- ICP routines -- */
+
 static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
@@ -116,6 +143,178 @@ static inline int check_too_hard(struct kvmppc_xics *xics,
return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS;
 }
 
+static void icp_rm_check_resend(struct kvmppc_xics *xics,
+struct kvmppc_icp *icp)
+{
+   u32 icsid;
+
+   /* Order this load with the test for need_resend in the caller */
+   smp_rmb();
+   for_each_set_bit(icsid, icp->resend_map, xics->max_icsid + 1) {
+   struct kvmppc_ics *ics = xics->ics[icsid];
+
+   if (!test_and_clear_bit(icsid, icp->resend_map))
+   continue;
+   if (!ics)
+   continue;
+   ics_rm_check_resend(xics, ics, icp);
+   }
+}
+
+static bool icp_rm_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
+  u32 *reject)
+{
+   union kvmppc_icp_state old_state, new_state;
+   bool success;
+
+   do {
+   old_state = new_state = READ_ONCE(icp->state);
+
+   *reject = 0;
+
+   /* See if we can deliver */
+   success = new_state.cppr > priority &&
+   new_state.mfrr > priority &&
+   new_state.pending_pri > priority;
+
+   /*
+* If we can, check for a rejection and perform the
+* delivery
+*/
+   if (success) {
+   *reject = new_state.xisr;
+   new_state.xisr = irq;
+   new_state.pending_pri = priority;
+   } else {
+   /*
+* If we failed to deliver we set need_resend
+* so a subsequent CPPR state change causes us
+* to try a new delivery.
+*/
+   new_state.need_resend = true;
+   }
+
+   } while (!icp_rm_try_update(icp, old_state, new_state));
+
+   return success;
+}
+
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
+   u32 new_irq)
+{
+   struct ics_irq_state *state;
+   struct kvmppc_ics *ics;
+   u32 reject;
+   u16 src;
+
+   /*
+* This is used both for initial delivery of an interrupt and
+* for subsequent rejection.
+*
+* Rejection can be racy vs. resends. We have evaluated the
+* reje

[PATCH 21/23] KVM: PPC: Book3S HV: Streamline guest entry and exit

2015-03-20 Thread Paul Mackerras

On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  That means that the other threads
exit sooner and means that the first thread doesn't spend so much
time waiting for the other threads at the point where the MMU gets
switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry: avg 3919.3ns (244 - 56492, 742665 samples)
  rm_exit: avg 4102.5ns (130 - 36272, 704056 samples)
  rm_intr: avg 1006.0ns (12 - 75040, 2819905 samples)

and this after the change:

 rm_entry: avg 2979.8ns (258 - 83740, 836403 samples)
  rm_exit: avg 3992.9ns (12 - 45572, 838034 samples)
  rm_intr: avg  922.2ns (12 - 66694, 3127066 samples)

showing a substantial reduction in the time spent in the real-mode
guest entry code, and smaller reductions in the real-mode guest exit
and interrupt handling times.  (The test was to start the guest and
boot Fedora 20 big-endian to the login prompt.)
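
As a rough check on the figures quoted above, the relative change in
each average can be computed directly.  The helper below is only a
sketch; the name pct_reduction is hypothetical and is not part of the
patch.

```c
#include <assert.h>

/*
 * Relative reduction, in percent, between a "before" and an "after"
 * average.  Hypothetical helper, used only to restate the numbers
 * quoted in the commit message.
 */
double pct_reduction(double before_ns, double after_ns)
{
    assert(before_ns > 0.0);
    return 100.0 * (before_ns - after_ns) / before_ns;
}
```

Applied to the averages above, this gives roughly 24% for rm_entry
(3919.3 -> 2979.8), about 2.7% for rm_exit (4102.5 -> 3992.9) and
about 8.3% for rm_intr (1006.0 -> 922.2).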

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 240 +++-
 1 file changed, 141 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 04728ce..ff1461d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -175,6 +175,19 @@ kvmppc_primary_no_guest:
/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
mfspr   r3, SPRN_HDEC
mtspr   SPRN_DEC, r3
+   /*
+* Make sure the primary has finished the MMU switch.
+* We should never get here on a secondary thread, but
+* check it for robustness' sake.
+*/
+   ld  r5, HSTATE_KVM_VCORE(r13)
+65:    lbz     r0, VCORE_IN_GUEST(r5)
+   cmpwi   r0, 0
+   beq 65b
+   /* Set LPCR. */
+   ld  r8,VCORE_LPCR(r5)
+   mtspr   SPRN_LPCR,r8
+   isync
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -206,7 +219,7 @@ kvm_novcpu_wakeup:
 
/* check the wake reason */
bl  kvmppc_check_wake_reason
-   
+
/* see if any other thread is already exiting */
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
@@ -243,7 +256,12 @@ kvm_novcpu_wakeup:
 
 kvm_novcpu_exit:
ld  r4, HSTATE_KVM_VCPU(r13)
-   b   hdec_soon
+   cmpdi   r4, 0
+   beq 13f
+   addi    r3, r4, VCPU_TB_RMEXIT
+   bl  kvmhv_accumulate_time
+13:    bl  kvmhv_commence_exit
+   b   kvmhv_switch_to_host
 
 /*
  * We come in here when wakened from nap mode.
@@ -417,7 +435,7 @@ kvmppc_hv_entry:
ld  r9,VCORE_KVM(r5)/* pointer to struct kvm */
lbz r6,HSTATE_PTID(r13)
cmpwi   r6,0
-   bne 20f
+   bne 10f
ld  r6,KVM_SDR1(r9)
lwz r7,KVM_LPID(r9)
li  r0,LPID_RSVD/* switch to reserved LPID */
@@ -488,26 +506,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
li  r0,1
stb r0,VCORE_IN_GUEST(r5)   /* signal secondaries to continue */
-   b   10f
-
-   /* Secondary threads wait for primary to have done partition switch */
-20:    lbz     r0,VCORE_IN_GUEST(r5)
-   cmpwi   r0,0
-   beq 20b
-
-   /* Set LPCR. */
-10:    ld  r8,VCORE_LPCR(r5)
-   mtspr   SPRN_LPCR,r8
-   isync
-
-   /* Check if HDEC expires soon */
-   mfspr   r3,SPRN_HDEC
-   cmpwi   r3,512  /* 1 microsecond */
-   li  r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   blt hdec_soon
 
/* Do we have a guest vcpu to run? */
-   cmpdi   r4, 0
+10:    cmpdi   r4, 0
beq kvmppc_primary_no_guest
 kvmppc_got_guest:
 
@@ -832,6 +833,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
clrrdi  r6,r6,1
mtspr   SPRN_CTRLT,r6
 4:
+   /* Secondary threads wait for primary to have done partition switch */
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   lbz r6, HSTATE_PTID(r13)
+   cmpwi   r6, 0
+   beq 21f
+   lbz r0, VCORE_IN_GUEST(r5)
+   cmpwi   r0, 0
+

[PATCH 18/23] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI

2015-03-20 Thread Paul Mackerras

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.
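
The effect of passing the wake selection in r3 and merging it with
rlwimi can be sketched in C as a masked insert.  This is only an
illustration: the SKETCH_* bit values and the function name nap_lpcr
are assumptions made for the example, not definitions taken from the
patch or the kernel headers.

```c
#include <stdint.h>

/* Illustrative bit values; assumptions for this sketch only. */
#define SKETCH_PECE0  UINT64_C(0x4000)
#define SKETCH_PECE1  UINT64_C(0x2000)
#define SKETCH_PECEDP UINT64_C(0x10000)

/*
 * wake_sel plays the role of r3 on entry to kvm_do_nap:
 * 0 when napping with no vcpu, SKETCH_PECEDP when ceding.
 * On POWER8 the PECEDP bit of lpcr is replaced by the one in
 * wake_sel (a masked insert, like rlwimi with a rotate of 0);
 * on other CPUs it is left untouched.
 */
uint64_t nap_lpcr(uint64_t lpcr, uint64_t wake_sel, int is_power8)
{
    lpcr |= SKETCH_PECE0 | SKETCH_PECE1;
    if (is_power8)
        lpcr = (lpcr & ~SKETCH_PECEDP) | (wake_sel & SKETCH_PECEDP);
    return lpcr;
}
```

Unlike the previous oris, which could only set the bit, the masked
insert can also clear a PECEDP bit that was already set in LPCR.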

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3b148d..8afc8a8 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -191,6 +191,7 @@ kvmppc_primary_no_guest:
li  r3, NAPPING_NOVCPU
stb r3, HSTATE_NAPPING(r13)
 
+   li  r3, 0   /* Don't wake on privileged (OS) doorbell */
b   kvm_do_nap
 
 kvm_novcpu_wakeup:
@@ -2107,10 +2108,13 @@ _GLOBAL(kvmppc_h_cede)  /* r3 = vcpu pointer, r11 = msr, r13 = paca */
addi    r3, r4, VCPU_TB_CEDE
bl  kvmhv_accumulate_time
 
+   lis r3, LPCR_PECEDP@h   /* Do wake on privileged doorbell */
+
/*
 * Take a nap until a decrementer or external or doobell interrupt
-* occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
-* runlatch bit before napping.
+* occurs, with PECE1 and PECE0 set in LPCR.
+* On POWER8, if we are ceding, also set PECEDP.
+* Also clear the runlatch bit before napping.
 */
 kvm_do_nap:
mfspr   r0, SPRN_CTRLF
@@ -2122,7 +2126,7 @@ kvm_do_nap:
mfspr   r5,SPRN_LPCR
ori r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-   oris    r5,r5,LPCR_PECEDP@h
+   rlwimi  r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_LPCR,r5
isync
-- 
2.1.4


[PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8

2015-03-20 Thread Paul Mackerras

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.
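
The "same core" test that decides whether msgsnd can be used can be
sketched in plain C as follows.  The sketch assumes 8 hardware threads
per core, matching the "& ~7" mask used in the patch; the names
same_core and thread_in_core are hypothetical.

```c
#include <stdbool.h>

/* Assumption: 8 hardware threads per core, as implied by "& ~7". */
#define SKETCH_THREADS_PER_CORE 8

/* True if both CPU numbers fall in the same group of 8 threads. */
bool same_core(int cpu_a, int cpu_b)
{
    int mask = ~(SKETCH_THREADS_PER_CORE - 1);

    return (cpu_a & mask) == (cpu_b & mask);
}

/* Thread index within the core, i.e. the low bits of the CPU number. */
int thread_in_core(int cpu)
{
    return cpu & (SKETCH_THREADS_PER_CORE - 1);
}
```

For example, CPUs 8 and 15 are on the same core, while CPUs 7 and 8
are not; CPU 13 is thread 5 of its core.  When the test fails, the
patch falls back to the XICS IPI path.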

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kernel/asm-offsets.c   |  4 +++
 arch/powerpc/kvm/book3s_hv.c| 48 ++---
 arch/powerpc/kvm/book3s_hv_rm_xics.c| 11 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 
 4 files changed, 83 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index fa7b57d..0ce2aa6 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_PPC64
 #include 
 #include 
@@ -568,6 +569,7 @@ int main(void)
DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
+   DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu));
DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
@@ -757,5 +759,7 @@ int main(void)
offsetof(struct paca_struct, subcore_sibling_mask));
 #endif
 
+   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+
return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 03a8bb4..2c34bae 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+static bool kvmppc_ipi_thread(int cpu)
+{
+   /* On POWER8 for IPIs to threads in the same core, use msgsnd */
+   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   preempt_disable();
+   if ((cpu & ~7) == (smp_processor_id() & ~7)) {
+   unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+   msg |= cpu & 7;
+   smp_mb();
+   __asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+   preempt_enable();
+   return true;
+   }
+   preempt_enable();
+   }
+
+#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
+   if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
+   xics_wake_cpu(cpu);
+   return true;
+   }
+#endif
+
+   return false;
+}
+
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int me;
int cpu = vcpu->cpu;
wait_queue_head_t *wqp;
 
@@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup;
}
 
-   me = get_cpu();
+   if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
+   return;
 
/* CPU points to the first thread of the core */
-   if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
-#ifdef CONFIG_PPC_ICP_NATIVE
-   int real_cpu = cpu + vcpu->arch.ptid;
-   if (paca[real_cpu].kvm_hstate.xics_phys)
-   xics_wake_cpu(real_cpu);
-   else
-#endif
-   if (cpu_online(cpu))
-   smp_send_reschedule(cpu);
-   }
-   put_cpu();
+   if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
+   smp_send_reschedule(cpu);
 }
 
 /*
@@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
tpaca->kvm_hstate.kvm_vcpu = vcpu;
-#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
if (cpu != smp_processor_id())
-   xics_wake_cpu(cpu);
-#endif
+   kvmppc_ipi_thread(cpu);
 }
 
 static void kvmppc_wait_for_nap(void)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 6dded8c..457a8b1 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "book3s_xics.h"
 
@@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
/* In SMT cpu will always point to thread 0, we adjust it */

[PATCH 14/23] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update

2015-03-20 Thread Paul Mackerras

Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
update (i.e. one of its 3 virtual processor areas needed to be pinned
in memory so the host real mode code can update it on guest entry and
exit), we would drop the vcore lock and do the update there and then.
Future changes will make it inconvenient to drop the lock, so instead
we now remove it from the list of runnable VCPUs and wake up its
VCPU task.  This will have the effect that the VCPU task will exit
kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
to kvmppc_update_vpas() and then rejoin the vcore.

The one complication is that the runner VCPU (whose VCPU task is the
current task) might be one of the ones that gets removed from the
runnable list.  In that case we just return from kvmppc_run_core()
and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
the runner if necessary.

This all means that the VCORE_STARTING state is no longer used, so we
remove it.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  5 ++--
 arch/powerpc/kvm/book3s_hv.c| 56 -
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 286c0ce..cee6e55 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -306,9 +306,8 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE 0
 #define VCORE_SLEEPING 1
-#define VCORE_STARTING 2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_RUNNING  2
+#define VCORE_EXITING  3
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8517c33..15598be 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1836,6 +1836,25 @@ static void kvmppc_start_restoring_l2_cache(const struct kvmppc_vcore *vc)
mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE);
 }
 
+static void prepare_threads(struct kvmppc_vcore *vc)
+{
+   struct kvm_vcpu *vcpu, *vnext;
+
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+arch.run_list) {
+   if (signal_pending(vcpu->arch.run_task))
+   vcpu->arch.ret = -EINTR;
+   else if (vcpu->arch.vpa.update_pending ||
+vcpu->arch.slb_shadow.update_pending ||
+vcpu->arch.dtl.update_pending)
+   vcpu->arch.ret = RESUME_GUEST;
+   else
+   continue;
+   kvmppc_remove_runnable(vc, vcpu);
+   wake_up(&vcpu->arch.cpu_run);
+   }
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
@@ -1845,46 +1864,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
struct kvm_vcpu *vcpu, *vnext;
long ret;
u64 now;
-   int i, need_vpa_update;
+   int i;
int srcu_idx;
-   struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
-   /* don't start if any threads have a signal pending */
-   need_vpa_update = 0;
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-   if (signal_pending(vcpu->arch.run_task))
-   return;
-   if (vcpu->arch.vpa.update_pending ||
-   vcpu->arch.slb_shadow.update_pending ||
-   vcpu->arch.dtl.update_pending)
-   vcpus_to_update[need_vpa_update++] = vcpu;
-   }
+   /*
+* Remove from the list any threads that have a signal pending
+* or need a VPA update done
+*/
+   prepare_threads(vc);
+
+   /* if the runner is no longer runnable, let the caller pick a new one */
+   if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE)
+   return;
 
/*
-* Initialize *vc, in particular vc->vcore_state, so we can
-* drop the vcore lock if necessary.
+* Initialize *vc.
 */
vc->n_woken = 0;
vc->nap_count = 0;
vc->entry_exit_count = 0;
vc->preempt_tb = TB_NIL;
-   vc->vcore_state = VCORE_STARTING;
vc->in_guest = 0;
vc->napping_threads = 0;
vc->conferring_threads = 0;
 
/*
-* Updating any of the vpas requires calling kvmppc_pin_guest_page,
-* which can't be called with any spinlocks held.
-*/
-   if (need_vpa_update) {
-   spin_unlock(&vc->lock);
-   for (i = 0; i < need_vpa_update; ++i)
-   kvmppc_update_vpas(vcpus_to_update[i]);
-   spin_lock(&vc->lock);
-   }
-
-   /*

[PATCH 19/23] KVM: PPC: Book3S HV: Use decrementer to wake napping threads

2015-03-20 Thread Paul Mackerras

This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.
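
The save and restore arithmetic can be sketched in C as below.  This
is a simplified model rather than the real-mode code itself: the
struct and function names are hypothetical, the decrementer is treated
as a signed 32-bit value (as the extsw in the patch suggests), and
tb_now stands for the timebase as read at that point.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct nap_plan {
    int32_t  dec_for_nap;          /* value left in DEC while napping */
    uint64_t dec_expires_host_tb;  /* guest DEC expiry as a host TB value */
};

/*
 * Before napping: use the smaller of DEC and HDEC so that we wake no
 * later than the end of the timeslice, and remember when the guest
 * decrementer would have expired.
 */
struct nap_plan plan_nap(int32_t guest_dec, int32_t hdec,
                         uint64_t tb_now, uint64_t tb_offset)
{
    struct nap_plan p;

    p.dec_for_nap = guest_dec <= hdec ? guest_dec : hdec;
    /* sign-extend DEC, add the timebase, convert to a host TB value */
    p.dec_expires_host_tb = tb_now + (uint64_t)(int64_t)guest_dec - tb_offset;
    return p;
}

/* After the nap: recompute the guest DEC from the saved expiry time. */
int32_t restore_guest_dec(uint64_t dec_expires_host_tb,
                          uint64_t tb_now, uint64_t tb_offset)
{
    return (int32_t)(int64_t)(dec_expires_host_tb + tb_offset - tb_now);
}
```

For instance, with DEC = 1000, HDEC = 500, a timebase of 10000 and an
offset of 300, the thread naps with DEC = 500 and the saved expiry is
10700; restoring at timebase 10400 gives a guest DEC of 600.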

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 8afc8a8..03a37a0 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvmppc_primary_no_guest:
/* We handle this much like a ceded vcpu */
+   /* put the HDEC into the DEC, since HDEC interrupts don't wake us */
+   mfspr   r3, SPRN_HDEC
+   mtspr   SPRN_DEC, r3
/* set our bit in napping_threads */
ld  r5, HSTATE_KVM_VCORE(r13)
lbz r7, HSTATE_PTID(r13)
@@ -223,6 +226,12 @@ kvm_novcpu_wakeup:
cmpdi   r3, 0
bge kvm_novcpu_exit
 
+   /* See if our timeslice has expired (HDEC is negative) */
+   mfspr   r0, SPRN_HDEC
+   li  r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+   cmpwi   r0, 0
+   blt kvm_novcpu_exit
+
/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
ld  r4, HSTATE_KVM_VCPU(r13)
cmpdi   r4, 0
@@ -1478,10 +1487,10 @@ kvmhv_do_exit:  /* r12 = trap, r13 = paca */
cmpwi   r3,0x100/* Are we the first here? */
bge 43f
cmpwi   r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-   beq 40f
+   beq 43f
li  r0,0
mtspr   SPRN_HDEC,r0
-40:
+
/*
 * Send an IPI to any napping threads, since an HDEC interrupt
 * doesn't wake CPUs up from nap.
@@ -2104,6 +2113,27 @@ _GLOBAL(kvmppc_h_cede)   /* r3 = vcpu pointer, r11 = msr, r13 = paca */
/* save FP state */
bl  kvmppc_save_fp
 
+   /*
+* Set DEC to the smaller of DEC and HDEC, so that we wake
+* no later than the end of our timeslice (HDEC interrupts
+* don't wake us from nap).
+*/
+   mfspr   r3, SPRN_DEC
+   mfspr   r4, SPRN_HDEC
+   mftb    r5
+   cmpw    r3, r4
+   ble 67f
+   mtspr   SPRN_DEC, r4
+67:
+   /* save expiry time of guest decrementer */
+   extsw   r3, r3
+   add r3, r3, r5
+   ld  r4, HSTATE_KVM_VCPU(r13)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   subf    r3, r6, r3  /* convert to host TB value */
+   std r3, VCPU_DEC_EXPIRES(r4)
+
ld  r4, HSTATE_KVM_VCPU(r13)
addi    r3, r4, VCPU_TB_CEDE
bl  kvmhv_accumulate_time
@@ -2157,6 +2187,15 @@ kvm_end_cede:
/* load up FP state */
bl  kvmppc_load_fp
 
+   /* Restore guest decrementer */
+   ld  r3, VCPU_DEC_EXPIRES(r4)
+   ld  r5, HSTATE_KVM_VCORE(r13)
+   ld  r6, VCORE_TB_OFFSET(r5)
+   add r3, r3, r6  /* convert host TB to guest TB value */
+   mftb    r7
+   subf    r3, r7, r3
+   mtspr   SPRN_DEC, r3
+
/* Load NV GPRS */
ld  r14, VCPU_GPR(R14)(r4)
ld  r15, VCPU_GPR(R15)(r4)
-- 
2.1.4


[PATCH 15/23] KVM: PPC: Book3S HV: Minor cleanups

2015-03-20 Thread Paul Mackerras

* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++---
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index cee6e55..ec4cf37 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,7 +227,6 @@ struct kvm_arch {
unsigned long host_sdr1;
int tlbie_lock;
unsigned long lpcr;
-   unsigned long rmor;
unsigned long vrma_slb_v;
int hpte_setup_done;
u32 hpt_order;
@@ -271,7 +270,6 @@ struct kvm_arch {
  */
 struct kvmppc_vcore {
int n_runnable;
-   int n_busy;
int num_threads;
int entry_exit_count;
int n_woken;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index ec9f59c..5eda551 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -503,7 +503,6 @@ int main(void)
DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
-   DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d71ae2f..b2e6718 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -244,9 +244,9 @@ kvm_novcpu_exit:
 kvm_start_guest:
 
/* Set runlatch bit the minute you wake up from nap */
-   mfspr   r1, SPRN_CTRLF
-   ori r1, r1, 1
-   mtspr   SPRN_CTRLT, r1
+   mfspr   r0, SPRN_CTRLF
+   ori r0, r0, 1
+   mtspr   SPRN_CTRLT, r0
 
ld  r2,PACATOC(r13)
 
@@ -490,11 +490,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
cmpwi   r0,0
beq 20b
 
-   /* Set LPCR and RMOR. */
+   /* Set LPCR. */
10:     ld  r8,VCORE_LPCR(r5)
mtspr   SPRN_LPCR,r8
-   ld  r8,KVM_RMOR(r9)
-   mtspr   SPRN_RMOR,r8
isync
 
/* Check if HDEC expires soon */
@@ -1065,7 +1063,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
bne 2f
mfspr   r3,SPRN_HDEC
cmpwi   r3,0
-   bge ignore_hdec
+   mr  r4,r9
+   bge fast_guest_return
 2:
/* See if this is an hcall we can handle in real mode */
cmpwi   r12,BOOK3S_INTERRUPT_SYSCALL
@@ -1073,26 +1072,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
/* External interrupt ? */
cmpwi   r12, BOOK3S_INTERRUPT_EXTERNAL
-   bne+    ext_interrupt_to_host
+   bne+    guest_exit_cont
 
/* External interrupt, first check for host_ipi. If this is
 * set, we know the host wants us out so let's do it now
 */
bl  kvmppc_read_intr
cmpdi   r3, 0
-   bgt ext_interrupt_to_host
+   bgt guest_exit_cont
 
/* Check if any CPU is heading out to the host, if so head out too */
ld  r5, HSTATE_KVM_VCORE(r13)
lwz r0, VCORE_ENTRY_EXIT(r5)
cmpwi   r0, 0x100
-   bge ext_interrupt_to_host
-
-   /* Return to guest after delivering any pending interrupt */
mr  r4, r9
-   b   deliver_guest_interrupt
-
-ext_interrupt_to_host:
+   blt deliver_guest_interrupt
 
 guest_exit_cont:   /* r9 = vcpu, r12 = trap, r13 = paca */
/* Save more register state  */
@@ -1743,8 +1737,10 @@ kvmppc_hisi:
  * Returns to the guest if we handle it, or continues on up to
  * the kernel if we can't (i.e. if we don't have a handler for
  * it, or if the handler returns H_TOO_HARD).
+ *
+ * r5 - r8 contain hcall args,
+ * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca
  */
-   .globl  hcall_try_real_mode
 hcall_try_real_mode:
ld  r3,VCPU_GPR(R3)(r9)
andi.   r0,r11,MSR_PR
@@ -2004,10 +2000,6 @@ hcall_real_table:
.globl  hcall_real_table_end
 hcall_real_table_end:
 
-ignore_hdec:
-   mr  r4,r9
-   b   fast_guest_return
-
 _GLOBAL(kv

[PATCH 17/23] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken

2015-03-20 Thread Paul Mackerras

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.
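
The polling idea can be illustrated with a small user-space sketch.
It is not the kernel code: the array stands in for the per-thread
kvm_hstate.kvm_vcpu pointers, and the function name
stuck_secondaries is hypothetical.

```c
#include <stddef.h>

/*
 * Return a bitmask of secondary threads (indices 1..nthreads-1) whose
 * vcpu slot is still non-NULL, i.e. threads that have not yet
 * finished.  Index 0 is the primary thread and is ignored.
 */
unsigned int stuck_secondaries(void *const vcpu_slots[], int nthreads)
{
    unsigned int mask = 0;
    int i;

    for (i = 1; i < nthreads; ++i)
        if (vcpu_slots[i] != NULL)
            mask |= 1u << i;
    return mask;
}
```

A result of 0 means every secondary has cleared its pointer; any set
bit identifies a particular thread, which is what makes a per-CPU
error message possible.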

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  2 --
 arch/powerpc/kernel/asm-offsets.c   |  1 -
 arch/powerpc/kvm/book3s_hv.c| 47 +++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 +
 4 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7b327e5..f6d4232 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,8 +272,6 @@ struct kvmppc_vcore {
int n_runnable;
int num_threads;
int entry_exit_count;
-   int n_woken;
-   int nap_count;
int napping_threads;
int first_vcpuid;
u16 pcpu;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 5eda551..fa7b57d 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,6 @@ int main(void)
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
-   DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bd16b03..03a8bb4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1702,8 +1702,10 @@ static int kvmppc_grab_hwthread(int cpu)
tpaca = &paca[cpu];
 
/* Ensure the thread won't go into the kernel if it wakes */
-   tpaca->kvm_hstate.hwthread_req = 1;
tpaca->kvm_hstate.kvm_vcpu = NULL;
+   tpaca->kvm_hstate.napping = 0;
+   smp_wmb();
+   tpaca->kvm_hstate.hwthread_req = 1;
 
/*
 * If the thread is already executing in the kernel (e.g. handling
@@ -1746,35 +1748,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
}
cpu = vc->pcpu + vcpu->arch.ptid;
tpaca = &paca[cpu];
-   tpaca->kvm_hstate.kvm_vcpu = vcpu;
tpaca->kvm_hstate.kvm_vcore = vc;
tpaca->kvm_hstate.ptid = vcpu->arch.ptid;
vcpu->cpu = vc->pcpu;
+   /* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
smp_wmb();
+   tpaca->kvm_hstate.kvm_vcpu = vcpu;
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-   if (cpu != smp_processor_id()) {
+   if (cpu != smp_processor_id())
xics_wake_cpu(cpu);
-   if (vcpu->arch.ptid)
-   ++vc->n_woken;
-   }
 #endif
 }
 
-static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
+static void kvmppc_wait_for_nap(void)
 {
-   int i;
+   int cpu = smp_processor_id();
+   int i, loops;
 
-   HMT_low();
-   i = 0;
-   while (vc->nap_count < vc->n_woken) {
-   if (++i >= 100) {
-   pr_err("kvmppc_wait_for_nap timeout %d %d\n",
-  vc->nap_count, vc->n_woken);
-   break;
+   for (loops = 0; loops < 100; ++loops) {
+   /*
+* Check if all threads are finished.
+* We set the vcpu pointer when starting a thread
+* and the thread clears it when finished, so we look
+* for any threads that still have a non-NULL vcpu ptr.
+*/
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   break;
+   if (i == threads_per_subcore) {
+   HMT_medium();
+   return;
}
-   cpu_relax();
+   HMT_low();
}
HMT_medium();
+   for (i = 1; i < threads_per_subcore; ++i)
+   if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+   pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
 /*
@@ -1915,8 +1925,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/*
 * Initiali

[PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code

2015-03-20 Thread Paul Mackerras

This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spent in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.
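
How one such accumulator might be updated can be sketched as follows.
This is a single-threaded illustration under stated assumptions: the
struct mirrors the fields of kvmhv_tb_accumulator, but the function
names are hypothetical, the memory barriers a real seqcount needs are
omitted, and the handling of the first sample is a choice made for the
sketch.

```c
#include <stdint.h>

/* Mirrors the fields of struct kvmhv_tb_accumulator. */
struct tb_accum {
    uint64_t seqcount;  /* odd while an update is in progress; count * 2 */
    uint64_t tb_total;  /* total time in timebase ticks */
    uint64_t tb_min;    /* min time */
    uint64_t tb_max;    /* max time */
};

/* Add one interval of "ticks" timebase ticks to the accumulator. */
void tb_accum_add(struct tb_accum *a, uint64_t ticks)
{
    int first = (a->seqcount == 0);

    a->seqcount++;              /* now odd: readers should retry */
    a->tb_total += ticks;
    if (first || ticks < a->tb_min)
        a->tb_min = ticks;
    if (ticks > a->tb_max)
        a->tb_max = ticks;
    a->seqcount++;              /* even again: update complete */
}

/* Number of intervals recorded so far. */
uint64_t tb_accum_count(const struct tb_accum *a)
{
    return a->seqcount / 2;
}
```

After adding intervals of 30, 10 and 20 ticks to a zeroed accumulator,
the count is 3, the total is 60, the minimum is 10 and the maximum is
30.  The debugfs file reports these values converted to nanoseconds.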

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/kvm_host.h |  19 +
 arch/powerpc/include/asm/time.h |   3 +
 arch/powerpc/kernel/asm-offsets.c   |  11 +++
 arch/powerpc/kernel/time.c  |   6 ++
 arch/powerpc/kvm/book3s_hv.c| 135 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 -
 6 files changed, 276 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f1d0bbc..286c0ce 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -369,6 +369,14 @@ struct kvmppc_slb {
u8 base_page_size;  /* MMU_PAGE_xxx */
 };
 
+/* Struct used to accumulate timing information in HV real mode code */
+struct kvmhv_tb_accumulator {
+   u64 seqcount;   /* used to synchronize access, also count * 2 */
+   u64 tb_total;   /* total time in timebase ticks */
+   u64 tb_min; /* min time */
+   u64 tb_max; /* max time */
+};
+
 # ifdef CONFIG_PPC_FSL_BOOK3E
 #define KVMPPC_BOOKE_IAC_NUM   2
 #define KVMPPC_BOOKE_DAC_NUM   2
@@ -656,6 +664,17 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
 
u32 emul_inst;
+
+   struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
+   u64 cur_tb_start;   /* when it started */
+   struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
+   struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
+   struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
+   struct kvmhv_tb_accumulator guest_time; /* guest execution */
+   struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+
+   struct dentry *debugfs_dir;
+   struct dentry *debugfs_timings;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 03cbada..10fc784 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+/* Convert timebase ticks to nanoseconds */
+unsigned long long tb_to_ns(unsigned long long tb_ticks);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4717859..ec9f59c 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -458,6 +458,17 @@ int main(void)
DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
+   DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
+   DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
+   DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
+   DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
+   DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
+   DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
+   DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, arch.cur_tb_start));
+   DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount));
+   DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total));
+   DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min));
+   DEFINE(TAS_MAX, offsetof(struct kvmhv_tb_accumulator, tb_max));
 #endif
DEFINE(VCPU_SHARED_SPRG3, offsetof(struct kvm_vcpu_arch_shared, sprg3));
DEFINE(VCPU_SHARED_SPRG4, offsetof(struct kvm_vcpu_arch_shared, sprg4));
diff --git

[PATCH 09/23] KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock

2015-03-20 Thread Paul Mackerras

From: Suresh Warrier 

Replaces the ICS mutex lock with a spin lock since we will be porting
these routines to real mode. Note that we need to disable interrupts
before we take the lock in anticipation of the fact that on the guest
side, we are running in the context of a hard irq and interrupts are
disabled (EE bit off) when the lock is acquired. Again, because we
will be acquiring the lock in hypervisor real mode, we need to use
an arch_spinlock_t instead of a normal spinlock here as we want to
avoid running any lockdep code (which may not be safe to execute in
real mode).

Signed-off-by: Suresh Warrier 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_xics.c | 68 +-
 arch/powerpc/kvm/book3s_xics.h |  2 +-
 2 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 48f0bda..56ed9b4 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -39,7 +40,7 @@
  * LOCKING
  * ===
  *
- * Each ICS has a mutex protecting the information about the IRQ
+ * Each ICS has a spin lock protecting the information about the IRQ
  * sources and avoiding simultaneous deliveries if the same interrupt.
  *
  * ICP operations are done via a single compare & swap transaction
@@ -109,7 +110,10 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 {
int i;
 
-   mutex_lock(&ics->lock);
+   unsigned long flags;
+
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
struct ics_irq_state *state = &ics->irq_state[i];
@@ -120,12 +124,15 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
XICS_DBG("resend %#x prio %#x\n", state->number,
  state->priority);
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
icp_deliver_irq(xics, icp, state->number);
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
}
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 }
 
 static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
@@ -133,8 +140,10 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
   u32 server, u32 priority, u32 saved_priority)
 {
bool deliver;
+   unsigned long flags;
 
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
state->server = server;
state->priority = priority;
@@ -145,7 +154,8 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
deliver = true;
}
 
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 
return deliver;
 }
@@ -186,6 +196,7 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
struct kvmppc_ics *ics;
struct ics_irq_state *state;
u16 src;
+   unsigned long flags;
 
if (!xics)
return -ENODEV;
@@ -195,10 +206,12 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
return -EINVAL;
state = &ics->irq_state[src];
 
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
*server = state->server;
*priority = state->priority;
-   mutex_unlock(&ics->lock);
+   arch_spin_unlock(&ics->lock);
+   local_irq_restore(flags);
 
return 0;
 }
@@ -365,6 +378,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
struct kvmppc_ics *ics;
u32 reject;
u16 src;
+   unsigned long flags;
 
/*
 * This is used both for initial delivery of an interrupt and
@@ -391,7 +405,8 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
state = &ics->irq_state[src];
 
/* Get a lock on the ICS */
-   mutex_lock(&ics->lock);
+   local_irq_save(flags);
+   arch_spin_lock(&ics->lock);
 
/* Get our server */
if (!icp || state->server != icp->server_num) {
@@ -434,7 +449,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 *
 * Note that if successful, the new delivery might have itself
 * rejected an interrupt that was "delivered" bef
