from:"Paul"

Re: [PATCH] KVM: PPC: Increase memslots to 512

2015-12-09 Thread Paul Mackerras

On Wed, Dec 09, 2015 at 11:34:07AM +0100, Thomas Huth wrote:
> Only using 32 memslots for KVM on powerpc is way too low; nowadays
> you can hit this limit quite fast by adding a couple of PCI devices
> and/or pluggable memory DIMMs to the guest.
> 
> x86 already increased the KVM_USER_MEM_SLOTS to 509, to satisfy 256
> pluggable DIMM slots, 3 private slots and 253 slots for other things
> like PCI devices (i.e. resulting in 256 + 3 + 253 = 512 slots in
> total). We should do something similar for powerpc, and since we do
> not use private slots here, we can set the value to 512 directly.
> 
> While we're at it, also remove the KVM_MEM_SLOTS_NUM definition
> from the powerpc-specific header since this gets defined in the
> generic kvm_host.h header anyway.
> 
> Signed-off-by: Thomas Huth <th...@redhat.com>
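For reference, the slot arithmetic above can be checked with a small C sketch. It assumes the generic linux/kvm_host.h fallback works as I understand it: KVM_PRIVATE_MEM_SLOTS defaults to 0, and KVM_MEM_SLOTS_NUM defaults to KVM_USER_MEM_SLOTS + KVM_PRIVATE_MEM_SLOTS. That is why powerpc need not define KVM_MEM_SLOTS_NUM itself. The X86_*/PPC_*/GENERIC_* macros and the helper functions are hypothetical stand-ins, not kernel names.

```c
#include <assert.h>

/* Hypothetical per-arch values, taken from the patch description. */
#define X86_USER_MEM_SLOTS     509   /* 256 DIMMs + 253 other (e.g. PCI) */
#define X86_PRIVATE_MEM_SLOTS  3
#define PPC_USER_MEM_SLOTS     512   /* value set by this patch */

/*
 * Assumed shape of the generic fallback: an arch that defines no
 * private slots gets 0, and the total is user + private.
 */
#define GENERIC_PRIVATE_MEM_SLOTS 0
#define MEM_SLOTS_NUM(user, priv) ((user) + (priv))

static_assert(MEM_SLOTS_NUM(X86_USER_MEM_SLOTS, X86_PRIVATE_MEM_SLOTS) == 512,
	      "x86: 256 + 253 user slots plus 3 private slots");

int x86_mem_slots_num(void)
{
	return MEM_SLOTS_NUM(X86_USER_MEM_SLOTS, X86_PRIVATE_MEM_SLOTS);
}

int ppc_mem_slots_num(void)
{
	/* powerpc uses no private slots, so the user count is the total */
	return MEM_SLOTS_NUM(PPC_USER_MEM_SLOTS, GENERIC_PRIVATE_MEM_SLOTS);
}
```

With no private slots on powerpc, the user-visible limit and the total are both 512, while x86 reaches the same total as 509 + 3.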

Thanks, applied to my kvm-ppc-next branch.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-12-09 Thread Paul Mackerras

On Fri, Nov 20, 2015 at 09:11:45AM +0100, Thomas Huth wrote:
> In the old DABR register, the BT (Breakpoint Translation) bit
> is bit number 61. In the new DAWRX register, the WT (Watchpoint
> Translation) bit is bit number 59. So to move the DABR-BT bit
> into the position of the DAWRX-WT bit, it has to be shifted by
> two, not only by one. This fixes hardware watchpoints in gdb of
> older guests that only use the H_SET_DABR/X interface instead
> of the new H_SET_MODE interface.
> 
> Signed-off-by: Thomas Huth <th...@redhat.com>
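A small C sketch of the bit arithmetic, assuming the Power ISA convention that bit 0 is the most-significant bit of a 64-bit register, so bit n has the value 1 << (63 - n). Only the bit numbers 61 and 59 come from the description above; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB, bit 63 the LSB of a 64-bit value. */
#define IBM_BIT64(n)  (UINT64_C(1) << (63 - (n)))

#define DABR_BT   IBM_BIT64(61)   /* hypothetical name: value 0x4  */
#define DAWRX_WT  IBM_BIT64(59)   /* hypothetical name: value 0x10 */

/* Moving from IBM bit 61 to bit 59 is a left shift by 61 - 59 = 2. */
static_assert(DAWRX_WT == (DABR_BT << 2), "BT -> WT needs a shift of two");

uint64_t dabr_bt_to_dawrx_wt(uint64_t dabr)
{
	return (dabr & DABR_BT) << (61 - 59);
}

/* The old behaviour: shifting by one lands on IBM bit 60 instead. */
uint64_t dabr_bt_to_dawrx_wt_shift1(uint64_t dabr)
{
	return (dabr & DABR_BT) << 1;
}
```

Shifting by one produces 0x8 (bit 60), which is not the WT bit.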

Thanks, applied to my kvm-ppc-next branch, with cc: sta...@vger.kernel.org.

Paul.

Re: [PATCH] kvm: remove unused variable 'vcpu_book3s'

2015-12-09 Thread Paul Mackerras

On Tue, Dec 01, 2015 at 08:42:10PM -0300, Geyslan G. Bem wrote:
> The vcpu_book3s variable is assigned but never used, so remove it.
> 
> Signed-off-by: Geyslan G. Bem <geys...@gmail.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.

[GIT PULL] Please pull my kvm-ppc-fixes branch

2015-12-09 Thread Paul Mackerras

Hi Paolo,

I have a small patch that I would like to get into 4.4 because it
fixes a bug which for certain kernel configs allows userspace to crash
the kernel.  The configs are those for which KVM_BOOK3S_64_HV is set
(y or m) and KVM_BOOK3S_64_PR is not.  Fortunately most distros that
enable KVM_BOOK3S_64_HV also enable KVM_BOOK3S_64_PR, as far as I can
tell.

Thanks,
Paul.

The following changes since commit 09922076003ad66de41ea14d2f8c3b4a16ec7774:

  Merge tag 'kvm-arm-for-v4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master (2015-12-04 18:32:32 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

for you to fetch changes up to c20875a3e638e4a03e099b343ec798edd1af5cc6:

  KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR (2015-12-10 11:34:27 +1100)


Paul Mackerras (1):
  KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR

 arch/powerpc/kvm/book3s_hv.c | 6 ++++++
 1 file changed, 6 insertions(+)

Re: [PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-12-08 Thread Paul Mackerras

On Wed, Nov 04, 2015 at 10:03:48AM +0100, Thomas Huth wrote:
> Only using 32 memslots for KVM on powerpc is way too low; nowadays
> you can hit this limit quite fast by adding a couple of PCI devices
> and/or pluggable memory DIMMs to the guest.
> x86 already increased the limit to 512 in total, to satisfy 256
> pluggable DIMM slots, 3 private slots and 253 slots for other things
> like PCI devices. On powerpc, we only have 32 pluggable DIMMs in

I agree with increasing the limit.  Is there a reason we have only 32
pluggable DIMMs in QEMU on powerpc, not more?  Should we be increasing
that limit too?  If so, maybe we should increase the number of memory
slots to 512?

> QEMU, not 256, so we likely do not as much slots as on x86. Thus

"so we likely do not need as many slots as on x86" would be better
English.

> setting the slot limit to 320 sounds like a good value for the
> time being (until we have some code in the future to resize the
> memslot array dynamically).
> And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
> from the powerpc-specific header since this gets defined in the
> generic kvm_host.h header anyway.
> 
> Signed-off-by: Thomas Huth <th...@redhat.com>

Paul.

Re: [RFC] kvm - possible out of bounds

2015-11-29 Thread Paul Mackerras

On Sun, Nov 29, 2015 at 05:14:03PM -0300, Geyslan Gregório Bem wrote:
> Hello,
> 
> I have found a possible out-of-bounds read in
> arch/powerpc/kvm/book3s_64_mmu.c (kvmppc_mmu_book3s_64_xlate
> function). The pteg[] array could be accessed twice using the i
> variable after the for loop. What happens is that in the last
> iteration the index i is incremented to 16, the check (i<16) fails,
> and the loop exits.
> 
> 277    for (i=0; i<16; i+=2) { ...
> 
> Later there are attempts to read the last elements of pteg, but again
> using the already incremented i (16).
> 
> 303    v = be64_to_cpu(pteg[i]);   /* pteg[16] */
> 304    r = be64_to_cpu(pteg[i+1]); /* pteg[17] */

Was it some automated tool that came up with this?

There is actually no problem because the accesses outside the loop are
only done if the 'found' variable is true; 'found' is initialized to
false and only ever set to true inside the loop just before a break
statement.  Thus there is a correlation between the value of 'i' and
the value of 'found' -- if 'found' is true then we know 'i' is less
than 16.
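The correlation described above can be sketched in C. This is a hypothetical, simplified stand-in for the search in kvmppc_mmu_book3s_64_xlate() (plain 64-bit compares instead of be64_to_cpu() and the real PTE matching); it only illustrates why the reads after the loop stay in bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTEG_WORDS 16   /* 8 HPTEs of two 64-bit words each, as in the loop */

/*
 * Scan the even slots of pteg[] for a matching first word and, only if
 * one was found, read the pair at index i.  Returns true on a match.
 */
bool pteg_lookup(const uint64_t pteg[PTEG_WORDS], uint64_t want,
		 uint64_t *v, uint64_t *r)
{
	bool found = false;
	int i;

	for (i = 0; i < PTEG_WORDS; i += 2) {
		if (pteg[i] == want) {
			found = true;   /* only ever set just before break */
			break;
		}
	}

	if (!found)
		return false;   /* i == 16 here, but pteg[i] is never read */

	/* found implies we left the loop via break, so i <= 14 */
	assert(i < PTEG_WORDS);
	*v = pteg[i];
	*r = pteg[i + 1];
	return true;
}
```

If the match is in slot 4, v and r come from pteg[4] and pteg[5]; reading fixed indices such as pteg[14]/pteg[15] would return the last entry of the group rather than the matching one.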

> I really don't know whether the for loop will somehow iterate until i
> is 16; anyway, I think the final reads should use a defined maximum
> length/index or some other clearer method.

I think it's perfectly clear to a human programmer, though some tools
(such as gcc) struggle with this kind of correlation between
variables.  That's why I asked whether your report was based on the
output from some tool.

> E.g.:
> 
> v = be64_to_cpu(pteg[PTEG_LEN - 2]);
> r = be64_to_cpu(pteg[PTEG_LEN - 1]);
> 
> Or just:
> 
> v = be64_to_cpu(pteg[14]);
> r = be64_to_cpu(pteg[15]);

Either of those options would cause the code to malfunction.

> I also found a variable in the same file that is not used.
> 
> 380    struct kvmppc_vcpu_book3s *vcpu_book3s;
> ...
> 387    vcpu_book3s = to_book3s(vcpu);

True.  It could be removed.

> A question: is the kvmppc_mmu_book3s_64_init function called in some
> unconventional way? I have not found any calls to it.

Try arch/powerpc/kvm/book3s_pr.c line 410:

kvmppc_mmu_book3s_64_init(vcpu);

Grep (or git grep) is your friend.

Paul.

Re: [PATCH kernel 5/9] KVM: PPC: Account TCE-containing pages in locked_vm

2015-11-29 Thread Paul Mackerras

On Tue, Sep 15, 2015 at 08:49:35PM +1000, Alexey Kardashevskiy wrote:
> At the moment pages used for TCE tables (in addition to pages addressed
> by TCEs) are not counted in the locked_vm counter, so a malicious
> userspace tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as
> RLIMIT_NOFILE allows and lock a lot of memory.
> 
> This adds counting for pages used for TCE tables.
> 
> This counts the number of pages required for a table plus pages for
> the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.
> 
> This does not change the amount of (de)allocated memory.
> 
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 51 +++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 9526c34..b70787d 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -45,13 +45,56 @@ static long kvmppc_stt_npages(unsigned long window_size)
>* sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>  
> +static long kvmppc_account_memlimit(long npages, bool inc)
> +{
> + long ret = 0;
> + const long bytes = sizeof(struct kvmppc_spapr_tce_table) +
> + (abs(npages) * sizeof(struct page *));

Why abs(npages)?  Can npages be negative?  If so, what does that mean?
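As a rough illustration of the quantities the quoted patch charges to locked_vm, here is a C sketch of the page counts. It assumes 4 KiB TCE pages (one 8-byte TCE per 4 KiB of DMA window), a 64 KiB host PAGE_SIZE, and a caller-supplied descriptor size standing in for sizeof(struct kvmppc_spapr_tce_table). All function names are hypothetical, and the sketch does not model the inc/dec handling questioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCE_SHIFT     12          /* assumption: 4 KiB IOMMU pages */
#define HOST_PAGE_SZ  65536UL     /* assumption: 64 KiB host pages */

/* Round a byte count up to a whole number of host pages. */
static unsigned long pages_for_bytes(unsigned long bytes)
{
	return (bytes + HOST_PAGE_SZ - 1) / HOST_PAGE_SZ;
}

/* Pages holding the TCE table: one 8-byte entry per 4 KiB of window. */
unsigned long tce_table_pages(unsigned long window_size)
{
	return pages_for_bytes((window_size >> TCE_SHIFT) * sizeof(uint64_t));
}

/*
 * Pages charged for the descriptor: a fixed struct plus one struct page
 * pointer per table page, mirroring the 'bytes' expression in the patch.
 * desc_size is a hypothetical stand-in for the real struct size.
 */
unsigned long tce_desc_pages(unsigned long npages, size_t desc_size)
{
	return pages_for_bytes(desc_size + npages * sizeof(void *));
}
```

For a 1 GiB DMA window this gives 262144 TCEs, i.e. 2 MiB or 32 pages of 64 KiB, plus one page for a small descriptor.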

Paul.

[PATCH 1/2] KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR

2015-11-11 Thread Paul Mackerras

Currently it is possible for userspace (e.g. QEMU) to set a value
for the MSR for a guest VCPU which has both of the TS bits set,
which is an illegal combination.  The result of this is that when
we execute a hrfid (hypervisor return from interrupt doubleword)
instruction to enter the guest, the CPU will take a TM Bad Thing
type of program interrupt (vector 0x700).

Now, if PR KVM is configured in the kernel along with HV KVM, we
actually handle this without crashing the host or giving hypervisor
privilege to the guest; instead what happens is that we deliver a
program interrupt to the guest, with SRR0 reflecting the address
of the hrfid instruction and SRR1 containing the MSR value at that
point.  If PR KVM is not configured in the kernel, then we try to
run the host's program interrupt handler with the MMU set to the
guest context, which almost certainly causes a host crash.

This closes the hole by making kvmppc_set_msr_hv() check for the
illegal combination and force the TS field to a safe value (00,
meaning non-transactional).

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index becad3a..f668712 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -231,6 +231,12 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 
 static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
 {
+	/*
+	 * Check for illegal transactional state bit combination
+	 * and if we find it, force the TS field to a safe state.
+	 */
+	if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
+		msr &= ~MSR_TS_MASK;
 	vcpu->arch.shregs.msr = msr;
 	kvmppc_end_cede(vcpu);
 }
-- 
2.1.4
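The check added to kvmppc_set_msr_hv() can be exercised on its own. In this C sketch the TS field is the two bits suspended (S) and transactional (T): 00 means non-transactional, 01 suspended, 10 transactional, and 11 is the illegal combination. The bit positions (S at bit 33, T at bit 34, counting from the least-significant bit) follow my reading of asm/reg.h and should be treated as assumptions; sanitize_msr_ts() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MSR[TS] layout (LSB-0 bit numbers), modelled on asm/reg.h. */
#define MSR_TS_S     (UINT64_C(1) << 33)   /* suspended */
#define MSR_TS_T     (UINT64_C(1) << 34)   /* transactional */
#define MSR_TS_MASK  (MSR_TS_T | MSR_TS_S)

/*
 * Same test as the patch: if both TS bits are set (the illegal 11
 * state), clear the field to 00, i.e. non-transactional, and leave
 * every other MSR bit untouched.
 */
uint64_t sanitize_msr_ts(uint64_t msr)
{
	if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
		msr &= ~MSR_TS_MASK;
	return msr;
}
```

Only the 11 pattern is rewritten (to 00); the suspended and transactional states, and all other MSR bits, pass through unchanged.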

[PATCH] powerpc/64: Include KVM guest test in all interrupt vectors

2015-11-11 Thread Paul Mackerras

Currently, if HV KVM is configured but PR KVM isn't, we don't include
a test to see whether we were interrupted in KVM guest context for the
set of interrupts which get delivered directly to the guest by hardware
if they occur in the guest.  This includes things like program
interrupts.

However, the recent bug where userspace could set the MSR for a VCPU
to have an illegal value in the TS field, and thus cause a TM Bad Thing
type of program interrupt on the hrfid that enters the guest, showed that
we can never be completely sure that these interrupts will not occur
in the guest entry/exit code.  If one of these interrupts does happen
and we have HV KVM configured but not PR KVM, then we end up trying to
run the handler in the host with the MMU set to the guest MMU context,
which generally ends badly.

Thus, for robustness it is better to have the test in every interrupt
vector, so that if some way is found to trigger some interrupt in the
guest entry/exit path, we can handle it without immediately crashing
the host.

This means that the distinction between KVMTEST and KVMTEST_PR goes
away.  Thus we delete KVMTEST_PR and associated macros and use KVMTEST
everywhere that we previously used either KVMTEST_PR or KVMTEST.  It
also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR,
so we delete SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/exception-64s.h | 21 +++-
 arch/powerpc/kernel/exceptions-64s.S | 34 
 2 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 77f52b2..9ee1078 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -263,17 +263,6 @@ do_kvm_##n:	\
 #define KVM_HANDLER_SKIP(area, h, n)
 #endif
 
-#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
-#define KVMTEST_PR(n)  __KVMTEST(n)
-#define KVM_HANDLER_PR(area, h, n) __KVM_HANDLER(area, h, n)
-#define KVM_HANDLER_PR_SKIP(area, h, n)	__KVM_HANDLER_SKIP(area, h, n)
-
-#else
-#define KVMTEST_PR(n)
-#define KVM_HANDLER_PR(area, h, n)
-#define KVM_HANDLER_PR_SKIP(area, h, n)
-#endif
-
 #define NOTEST(n)
 
 /*
@@ -360,13 +349,13 @@ label##_pSeries:  \
HMT_MEDIUM_PPR_DISCARD; \
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common,\
-EXC_STD, KVMTEST_PR, vec)
+EXC_STD, KVMTEST, vec)
 
 /* Version of above for when we have to branch out-of-line */
 #define STD_EXCEPTION_PSERIES_OOL(vec, label)  \
.globl label##_pSeries; \
 label##_pSeries:   \
-   EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);\
+   EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, vec);   \
EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD)
 
 #define STD_EXCEPTION_HV(loc, vec, label)  \
@@ -436,17 +425,13 @@ label##_relon_hv:	\
 #define _SOFTEN_TEST(h, vec)   __SOFTEN_TEST(h, vec)
 
 #define SOFTEN_TEST_PR(vec)\
-   KVMTEST_PR(vec);\
+   KVMTEST(vec);   \
_SOFTEN_TEST(EXC_STD, vec)
 
 #define SOFTEN_TEST_HV(vec)\
KVMTEST(vec);   \
_SOFTEN_TEST(EXC_HV, vec)
 
-#define SOFTEN_TEST_HV_201(vec)	\
-   KVMTEST(vec);   \
-   _SOFTEN_TEST(EXC_STD, vec)
-
 #define SOFTEN_NOTEST_PR(vec)  _SOFTEN_TEST(EXC_STD, vec)
 #define SOFTEN_NOTEST_HV(vec)  _SOFTEN_TEST(EXC_HV, vec)
 
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 0a0399c2..1a03142 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -242,7 +242,7 @@ instruction_access_slb_pSeries:
HMT_MEDIUM_PPR_DISCARD
SET_SCRATCH0(r13)
EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480)
+   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST, 0x480)
std r3,PACA_EXSLB+EX_R3(r13)
mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
 #ifdef __DISABLED__
@@ -276,18 +276,18 @@ hardware_interrupt_hv:
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0x502)
FTR_SECTION_ELSE
_MASKABLE_EXCEPTION_PSERIES(0x500, hardware_int

[PATCH 2/2] KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better

2015-11-11 Thread Paul Mackerras

As we saw with the TM Bad Thing type of program interrupt occurring
on the hrfid that enters the guest, it is not completely impossible
to have a trap occurring in the guest entry/exit code, despite the
fact that the code has been written to avoid taking any traps.

This adds a check in the kvmppc_handle_exit_hv() function to detect
the case where a trap has occurred in hypervisor-mode code.  Instead
of treating it just like a trap in guest code, we now print a message
and return to userspace with a KVM_EXIT_INTERNAL_ERROR exit reason.

Of the various interrupts that get handled in the assembly code in
the guest exit path and that can return directly to the guest, the
only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check
(other than system call, which we can avoid just by not doing a sc
instruction).  Therefore this adds code to the machine check path to
ensure that if the MCE occurred in hypervisor mode, we exit to the
host rather than trying to continue the guest.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c            | 18 ++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f668712..d6baf0a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -846,6 +846,24 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	vcpu->stat.sum_exits++;
 
+	/*
+	 * This can happen if an interrupt occurs in the last stages
+	 * of guest entry or the first stages of guest exit (i.e. after
+	 * setting paca->kvm_hstate.in_guest to KVM_GUEST_MODE_GUEST_HV
+	 * and before setting it to KVM_GUEST_MODE_HOST_HV).
+	 * That can happen due to a bug, or due to a machine check
+	 * occurring at just the wrong time.
+	 */
+	if (vcpu->arch.shregs.msr & MSR_HV) {
+		printk(KERN_EMERG "KVM trap in HV mode!\n");
+		printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
+			vcpu->arch.trap, kvmppc_get_pc(vcpu),
+			vcpu->arch.shregs.msr);
+		kvmppc_dump_regs(vcpu);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		run->hw.hardware_exit_reason = vcpu->arch.trap;
+		return RESUME_HOST;
+	}
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
 	switch (vcpu->arch.trap) {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..b3ce8ff 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2404,6 +2404,8 @@ machine_check_realmode:
 	 * guest as machine check causing guest to crash.
 	 */
 	ld	r11, VCPU_MSR(r9)
+	rldicl.	r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */
+	bne	mc_cont			/* if so, exit to host */
 	andi.	r10, r11, MSR_RI	/* check for unrecoverable exception */
 	beq	1f			/* Deliver a machine check to guest */
 	ld	r10, VCPU_PC(r9)
-- 
2.1.4
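The new rldicl. instruction rotates the saved guest MSR left by 64 - MSR_HV_LG bits and keeps only the least-significant bit, which extracts MSR[HV]; the dot form sets CR0 so that bne branches when that bit is 1. A C sketch of the equivalence, assuming MSR_HV_LG is 60 (the kernel's value as I understand it); rotl64(), rldicl_mb63() and trapped_in_hv_mode() are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_HV_LG  60                          /* assumption */
#define MSR_HV     (UINT64_C(1) << MSR_HV_LG)

/* 64-bit rotate left; n is reduced modulo 64 to avoid an undefined shift. */
static uint64_t rotl64(uint64_t x, unsigned int n)
{
	n &= 63;
	return n ? (x << n) | (x >> (64 - n)) : x;
}

/* rldicl rD, rS, SH, 63: rotate left by SH, then keep only the LSB. */
static uint64_t rldicl_mb63(uint64_t rs, unsigned int sh)
{
	return rotl64(rs, sh) & 1;
}

/*
 * True when the interrupt was taken with MSR[HV] set, i.e. in the
 * hypervisor's own entry/exit code rather than in the guest.
 */
bool trapped_in_hv_mode(uint64_t msr)
{
	return rldicl_mb63(msr, 64 - MSR_HV_LG) != 0;
}
```

This is the same condition the C side tests with vcpu->arch.shregs.msr & MSR_HV.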

[GIT PULL] Please pull my kvm-ppc-fixes branch

2015-11-11 Thread Paul Mackerras

Paolo,

I have two fixes for HV KVM which I would like to have included in
v4.4-rc1.  The first one is a fix for a bug identified by Red Hat
which causes occasional guest crashes.  The second one fixes a bug
which causes host stalls and timeouts under certain circumstances when
the host is configured for static 2-way micro-threading mode.

Thanks,
Paul.

The following changes since commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1:

  KVM: VMX: Fix commit which broke PML (2015-11-05 11:34:11 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

for you to fetch changes up to f74f2e2e26199f695ca3df94f29e9ab7cb707ea4:

  KVM: PPC: Book3S HV: Don't dynamically split core when already split (2015-11-06 16:02:59 +1100)


Paul Mackerras (2):
  KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails
  KVM: PPC: Book3S HV: Don't dynamically split core when already split

 arch/powerpc/kvm/book3s_hv.c            |  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 ++++++++++++--------
 2 files changed, 13 insertions(+), 9 deletions(-)

[GIT PULL] Please pull my kvm-ppc-fixes branch

2015-11-11 Thread Paul Mackerras

Paolo,

I have two fixes for HV KVM which I would like to have included in
v4.4-rc1.  The first one is a fix for a bug identified by Red Hat
which causes occasional guest crashes.  The second one fixes a bug
which causes host stalls and timeouts under certain circumstances when
the host is configured for static 2-way micro-threading mode.

Thanks,
Paul.

The following changes since commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1:

  KVM: VMX: Fix commit which broke PML (2015-11-05 11:34:11 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

for you to fetch changes up to f74f2e2e26199f695ca3df94f29e9ab7cb707ea4:

  KVM: PPC: Book3S HV: Don't dynamically split core when already split 
(2015-11-06 16:02:59 +1100)


Paul Mackerras (2):
  KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails
  KVM: PPC: Book3S HV: Don't dynamically split core when already split

 arch/powerpc/kvm/book3s_hv.c|  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 
 2 files changed, 13 insertions(+), 9 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better

2015-11-11 Thread Paul Mackerras

As we saw with the TM Bad Thing type of program interrupt occurring
on the hrfid that enters the guest, it is not completely impossible
to have a trap occurring in the guest entry/exit code, despite the
fact that the code has been written to avoid taking any traps.

This adds a check in the kvmppc_handle_exit_hv() function to detect
the case when a trap has occurred in the hypervisor-mode code, and
instead of treating it just like a trap in guest code, we now print
a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR
exit reason.

Of the various interrupts that get handled in the assembly code in
the guest exit path and that can return directly to the guest, the
only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check
(other than system call, which we can avoid just by not doing an sc
instruction).  Therefore this adds code to the machine check path to
ensure that if the MCE occurred in hypervisor mode, we exit to the
host rather than trying to continue the guest.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c            | 18 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f668712..d6baf0a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -846,6 +846,24 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	vcpu->stat.sum_exits++;
 
+	/*
+	 * This can happen if an interrupt occurs in the last stages
+	 * of guest entry or the first stages of guest exit (i.e. after
+	 * setting paca->kvm_hstate.in_guest to KVM_GUEST_MODE_GUEST_HV
+	 * and before setting it to KVM_GUEST_MODE_HOST_HV).
+	 * That can happen due to a bug, or due to a machine check
+	 * occurring at just the wrong time.
+	 */
+	if (vcpu->arch.shregs.msr & MSR_HV) {
+		printk(KERN_EMERG "KVM trap in HV mode!\n");
+		printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
+			vcpu->arch.trap, kvmppc_get_pc(vcpu),
+			vcpu->arch.shregs.msr);
+		kvmppc_dump_regs(vcpu);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		run->hw.hardware_exit_reason = vcpu->arch.trap;
+		return RESUME_HOST;
+	}
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
 	switch (vcpu->arch.trap) {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..b3ce8ff 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2404,6 +2404,8 @@ machine_check_realmode:
 	 * guest as machine check causing guest to crash.
 	 */
 	ld	r11, VCPU_MSR(r9)
+	rldicl.	r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */
+	bne	mc_cont			/* if so, exit to host */
 	andi.	r10, r11, MSR_RI	/* check for unrecoverable exception */
 	beq	1f			/* Deliver a machine check to guest */
 	ld	r10, VCPU_PC(r9)
-- 
2.1.4
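
The new rldicl. test in machine_check_realmode is compact, so a small user-space C model of it may help. The model also shows how it is ordered relative to the existing MSR_RI test. It assumes the 64-bit Book3S bit numbers MSR_HV_LG = 60 and MSR_RI_LG = 1. All MODEL_ and model_ names are hypothetical, and none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 64-bit Book3S MSR bit numbers, counting from the LSB. */
#define MODEL_MSR_HV_LG 60
#define MODEL_MSR_RI_LG 1
#define MODEL_MSR_HV    (UINT64_C(1) << MODEL_MSR_HV_LG)
#define MODEL_MSR_RI    (UINT64_C(1) << MODEL_MSR_RI_LG)

static_assert(MODEL_MSR_HV == UINT64_C(0x1000000000000000), "MSR[HV] is the 2^60 bit");

/* Rotate left, as rldicl does before applying its mask. */
static uint64_t rotl64(uint64_t x, unsigned int n)
{
	n &= 63;
	return n ? (x << n) | (x >> (64 - n)) : x;
}

/*
 * "rldicl. r0, r11, 64-MSR_HV_LG, 63": rotating left by 64 - MSR_HV_LG
 * brings MSR[HV] down to the least significant position, and a mask that
 * starts at IBM bit 63 keeps only that one bit.  The "." form sets CR0,
 * so the following "bne mc_cont" is taken when the bit is 1.
 */
static bool model_mce_in_hv_mode(uint64_t msr)
{
	return (rotl64(msr, 64 - MODEL_MSR_HV_LG) & 1) != 0;
}

enum model_mce_action {
	MODEL_EXIT_TO_HOST,		/* bne mc_cont */
	MODEL_DELIVER_MCE_TO_GUEST,	/* MSR[RI] = 0: beq 1f */
	MODEL_FURTHER_CHECKS,		/* falls through to "ld r10, VCPU_PC(r9)" */
};

/* The order of the two tests matches the patched machine_check_realmode. */
static enum model_mce_action model_classify_mce(uint64_t msr)
{
	if (model_mce_in_hv_mode(msr))
		return MODEL_EXIT_TO_HOST;
	if (!(msr & MODEL_MSR_RI))
		return MODEL_DELIVER_MCE_TO_GUEST;
	return MODEL_FURTHER_CHECKS;
}
```

For example, model_classify_mce(MODEL_MSR_HV) yields MODEL_EXIT_TO_HOST even though MSR[RI] is clear. In other words, a machine check taken with MSR[HV] set goes back to the host and is not treated as a guest machine check.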


[PATCH] KVM: PPC: Book3S HV: Don't dynamically split core when already split

2015-11-05 Thread Paul Mackerras

In static micro-threading modes, the dynamic micro-threading code
is supposed to be disabled, because subcores can't make independent
decisions about what micro-threading mode to put the core in - there is
only one micro-threading mode for the whole core.  The code that
implements dynamic micro-threading checks for this, except that the
check was missed in one case.  This means that it is possible for a
subcore in static 2-way micro-threading mode to try to put the core
into 4-way micro-threading mode, which usually leads to stuck CPUs,
spinlock lockups, and other stalls in the host.

The problem was in the can_split_piggybacked_subcores() function, which
should always return false if the system is in a static micro-threading
mode.  This fixes the problem by making can_split_piggybacked_subcores()
use subcore_config_ok() for its checks, as subcore_config_ok() includes
the necessary check for the static micro-threading modes.

Credit to Gautham Shenoy for working out that the reason for the hangs
and stalls we were seeing was that we were trying to do dynamic 4-way
micro-threading while we were in static 2-way mode.

Fixes: b4deba5c41e9
Cc: v...@stable.kernel.org # v4.3
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2280497..becad3a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2060,7 +2060,7 @@ static bool can_split_piggybacked_subcores(struct core_info *cip)
 			return false;
 		n_subcores += (cip->subcore_threads[sub] - 1) >> 1;
 	}
-	if (n_subcores > 3 || large_sub < 0)
+	if (large_sub < 0 || !subcore_config_ok(n_subcores + 1, 2))
 		return false;
 
/*
-- 
2.6.2
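
The failure mode is easier to see in a simplified user-space model of the old and new checks. This sketch assumes an 8-thread core and treats a model_threads_per_subcore value below 8 as meaning "the host is in a static split mode". All names are hypothetical. The real subcore_config_ok() may check more than this model does.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_SMT_THREADS 8	/* assumption: 8 hardware threads per core */
#define MODEL_MAX_SUBCORES    4

/* Hypothetical host setting: 8 = whole core, 4 = static 2-way split. */
static int model_threads_per_subcore = MODEL_MAX_SMT_THREADS;

static int roundup_pow2(int n)
{
	int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Simplified model: may the core be split into n_subcores of n_threads each? */
static bool model_subcore_config_ok(int n_subcores, int n_threads)
{
	assert(n_subcores > 0 && n_threads > 0);
	/* A statically split core must never be split further dynamically. */
	if (n_subcores > 1 && model_threads_per_subcore < MODEL_MAX_SMT_THREADS)
		return false;
	if (n_subcores > MODEL_MAX_SUBCORES)
		return false;
	return n_subcores * roundup_pow2(n_threads) <= MODEL_MAX_SMT_THREADS;
}

/* Old test (ignoring the large_sub condition): blind to the static mode. */
static bool old_check_allows_split(int n_subcores)
{
	return n_subcores <= 3;
}

/* New test, using the same arguments the patch passes. */
static bool new_check_allows_split(int n_subcores)
{
	return model_subcore_config_ok(n_subcores + 1, 2);
}
```

Take the static 2-way case, model_threads_per_subcore = 4. Here old_check_allows_split(1) returns true while new_check_allows_split(1) returns false. That is the situation in which the old code tried to put the core into 4-way mode.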

[PATCH] KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails

2015-10-26 Thread Paul Mackerras

When handling a hypervisor data or instruction storage interrupt (HDSI
or HISI), we look up the SLB entry for the address being accessed in
order to translate the effective address to a virtual address which can
be looked up in the guest HPT.  This lookup can occasionally fail due
to the guest replacing an SLB entry without invalidating the evicted
SLB entry.  In this situation an ERAT (effective to real address
translation cache) entry can persist and be used by the hardware even
though there is no longer a corresponding SLB entry.

Previously we would just deliver a data or instruction storage interrupt
(DSI or ISI) to the guest in this case.  However, this is not correct
and has been observed to cause guests to crash, typically with a
data storage protection interrupt on a store to the vmemmap area.

Instead, what we do now is to synthesize a data or instruction segment
interrupt.  That should cause the guest to reload an appropriate entry
into the SLB and retry the faulting instruction.  If it still faults,
we should find an appropriate SLB entry next time and be able to handle
the fault.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b1dab8d..3c6badc 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1749,7 +1749,8 @@ kvmppc_hdsi:
 	beq	3f
 	clrrdi	r0, r4, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
-	bne	1f			/* if no SLB entry found */
+	li	r0, BOOK3S_INTERRUPT_DATA_SEGMENT
+	bne	7f			/* if no SLB entry found */
 4:	std	r4, VCPU_FAULT_DAR(r9)
 	stw	r6, VCPU_FAULT_DSISR(r9)
 
@@ -1768,14 +1769,15 @@ kvmppc_hdsi:
 	cmpdi	r3, -2			/* MMIO emulation; need instr word */
 	beq	2f
 
-	/* Synthesize a DSI for the guest */
+	/* Synthesize a DSI (or DSegI) for the guest */
 	ld	r4, VCPU_FAULT_DAR(r9)
 	mr	r6, r3
-1:	mtspr	SPRN_DAR, r4
+1:	li	r0, BOOK3S_INTERRUPT_DATA_STORAGE
 	mtspr	SPRN_DSISR, r6
+7:	mtspr	SPRN_DAR, r4
 	mtspr	SPRN_SRR0, r10
 	mtspr	SPRN_SRR1, r11
-	li	r10, BOOK3S_INTERRUPT_DATA_STORAGE
+	mr	r10, r0
 	bl	kvmppc_msr_interrupt
 fast_interrupt_c_return:
 6:	ld	r7, VCPU_CTR(r9)
@@ -1823,7 +1825,8 @@ kvmppc_hisi:
 	beq	3f
 	clrrdi	r0, r10, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
-	bne	1f			/* if no SLB entry found */
+	li	r0, BOOK3S_INTERRUPT_INST_SEGMENT
+	bne	7f			/* if no SLB entry found */
 4:
 	/* Search the hash table. */
 	mr	r3, r9			/* vcpu pointer */
@@ -1840,11 +1843,12 @@ kvmppc_hisi:
 	cmpdi	r3, -1			/* handle in kernel mode */
 	beq	guest_exit_cont
 
-	/* Synthesize an ISI for the guest */
+	/* Synthesize an ISI (or ISegI) for the guest */
 	mr	r11, r3
-1:	mtspr	SPRN_SRR0, r10
+1:	li	r0, BOOK3S_INTERRUPT_INST_STORAGE
+7:	mtspr	SPRN_SRR0, r10
 	mtspr	SPRN_SRR1, r11
-	li	r10, BOOK3S_INTERRUPT_INST_STORAGE
+	mr	r10, r0
 	bl	kvmppc_msr_interrupt
 	b	fast_interrupt_c_return
 
-- 
2.6.2
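
The patch loads the vector number into r0 before branching and then falls into a common tail at label 7. That control flow can be modelled in C as below. The vector numbers are the architected Book3S offsets named by the BOOK3S_INTERRUPT_* constants. The VEC_ and model_ names are hypothetical. The model only covers the final step, where the host has already decided to reflect the fault to the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architected Book3S interrupt vector offsets. */
#define VEC_DATA_STORAGE  0x300	/* DSI */
#define VEC_DATA_SEGMENT  0x380	/* DSegI */
#define VEC_INST_STORAGE  0x400	/* ISI */
#define VEC_INST_SEGMENT  0x480	/* ISegI */

static_assert((UINT64_C(1) << 28) == 256 * 1024 * 1024, "a segment is 256MB");

/* "clrrdi r0, rX, 28": clear the low 28 bits, leaving the segment base. */
static uint64_t segment_base(uint64_t ea)
{
	return ea & ~((UINT64_C(1) << 28) - 1);
}

/*
 * If slbfee. found no SLB entry for the faulting address, reflect a
 * segment interrupt so that the guest reloads its SLB and retries;
 * otherwise reflect the storage interrupt as before.
 */
static unsigned int model_reflected_vector(bool is_data, bool slb_entry_found)
{
	if (is_data)
		return slb_entry_found ? VEC_DATA_STORAGE : VEC_DATA_SEGMENT;
	return slb_entry_found ? VEC_INST_STORAGE : VEC_INST_SEGMENT;
}

/* For data faults the segment path enters at label 7, after the DSISR write. */
static bool model_sets_dsisr(bool is_data, bool slb_entry_found)
{
	return is_data && slb_entry_found;
}
```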

[GIT PULL] Please pull my kvm-ppc-next branch

2015-10-25 Thread Paul Mackerras

Paolo,

Here is my current patch queue for KVM on PPC.  There's nothing much
in the way of new features this time; it's mostly bug fixes, plus
Nikunj has implemented support for KVM_CAP_NR_MEMSLOTS.  These are
intended for the "next" branch of the KVM tree.  Please pull.

Thanks,
Paul.

The following changes since commit 9ffecb10283508260936b96022d4ee43a7798b4c:

  Linux 4.3-rc3 (2015-09-27 07:50:08 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-next

for you to fetch changes up to 70aa3961a196ac32baf54032b2051bac9a941118:

  KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path (2015-10-21 16:31:52 +1100)


Andrzej Hajda (1):
  KVM: PPC: e500: fix handling local_sid_lookup result

Gautham R. Shenoy (1):
  KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path

Mahesh Salgaonkar (1):
  KVM: PPC: Book3S HV: Deliver machine check with MSR(RI=0) to guest as MCE

Nikunj A Dadhania (1):
  KVM: PPC: Implement extension to report number of memslots

Paul Mackerras (2):
  KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl
  KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs

Tudor Laurentiu (3):
  powerpc/e6500: add TMCFG0 register definition
  KVM: PPC: e500: Emulate TMCFG0 TMRN register
  KVM: PPC: e500: fix couple of shift operations on 64 bits

 arch/powerpc/include/asm/disassemble.h  |  5 +
 arch/powerpc/include/asm/reg_booke.h    |  6 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c     |  3 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c     |  2 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 29 ++---
 arch/powerpc/kvm/e500.c                 |  3 ++-
 arch/powerpc/kvm/e500_emulate.c         | 19 +++
 arch/powerpc/kvm/e500_mmu_host.c        |  4 ++--
 arch/powerpc/kvm/powerpc.c              |  3 +++
 9 files changed, 63 insertions(+), 11 deletions(-)

Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-25 Thread Paul Mackerras

On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote:
> Yes, we'll likely need this soon! 32 slots are not enough...

Would anyone object if I raised the limit for PPC to 512 slots?
Would that cause problems on embedded PPC, for instance?

Paul.

Re: [PATCH] KVM: PPC: e500: fix couple of shift operations on 64 bits

2015-10-14 Thread Paul Mackerras

On Thu, Oct 01, 2015 at 03:58:03PM +0300, Laurentiu Tudor wrote:
> Fix couple of cases where we shift left a 32-bit
> value thus might get truncated results on 64-bit
> targets.
> 
> Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>
> Suggested-by: Scott Wood <scotttw...@freescale.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.
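
The class of bug described in this patch can be reproduced in a few lines of C. The helpers below are hypothetical and are not the e500 code touched by the patch. They assume the usual Linux ABIs, where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* The shift happens in 32-bit arithmetic and is only then widened: bits are lost. */
static uint64_t shift_truncating(uint32_t val, unsigned int shift)
{
	assert(shift < 32);	/* larger shifts of a 32-bit value are undefined */
	return val << shift;
}

/* Widen first, then shift: all result bits are kept. */
static uint64_t shift_widened(uint32_t val, unsigned int shift)
{
	assert(shift < 64);
	return (uint64_t)val << shift;
}
```

For instance, shift_truncating(0x80000000, 4) returns 0, while shift_widened(0x80000000, 4) returns 0x800000000.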

Re: [PATCH][v2] KVM: PPC: e500: Emulate TMCFG0 TMRN register

2015-10-14 Thread Paul Mackerras

On Fri, Sep 25, 2015 at 06:02:23PM +0300, Laurentiu Tudor wrote:
> Emulate TMCFG0 TMRN register exposing one HW thread per vcpu.
> 
> Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
> [laurentiu.tu...@freescale.com: rebased on latest kernel, use
>  define instead of hardcoded value, moved code in own function]
> Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.

Re: [PATCH] powerpc/e6500: add TMCFG0 register definition

2015-10-14 Thread Paul Mackerras

On Wed, Sep 23, 2015 at 06:06:22PM +0300, Laurentiu Tudor wrote:
> The register is not currently used in the base kernel
> but will be in a forthcoming kvm patch.
> 
> Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.

Re: [PATCH 15/19] KVM: PPC: e500: fix handling local_sid_lookup result

2015-10-14 Thread Paul Mackerras

On Thu, Sep 24, 2015 at 04:00:23PM +0200, Andrzej Hajda wrote:
> The function can return negative value.
> 
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
> 
> Signed-off-by: Andrzej Hajda <a.ha...@samsung.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.
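
The pattern the Coccinelle script flags can be illustrated with a hypothetical lookup function that stands in for local_sid_lookup(). The names and return values below are for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static_assert((unsigned int)-1 == UINT_MAX, "-1 converts to UINT_MAX");

/* Hypothetical lookup that, like local_sid_lookup(), can return a negative value. */
static int model_lookup(int key)
{
	return key == 42 ? 7 : -1;	/* -1: not found */
}

/* Buggy pattern: the negative result is stored in an unsigned variable. */
static bool found_buggy(int key)
{
	unsigned int sid = model_lookup(key);

	return !(sid < 0);	/* always true: an unsigned value is never < 0 */
}

/* Fixed pattern: keep the result signed so that the error check works. */
static bool found_fixed(int key)
{
	int sid = model_lookup(key);

	return sid >= 0;
}
```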

[PATCH] KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs

2015-10-14 Thread Paul Mackerras

This fixes a bug where the old HPTE value returned by H_REMOVE has
the valid bit clear if the HPTE was an absent HPTE, as happens for
HPTEs for emulated MMIO pages and for RAM pages that have been paged
out by the host.  If the absent bit is set, we clear it and set the
valid bit, because from the guest's point of view, the HPTE is valid.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c1df9bb..97e7f8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -470,6 +470,8 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	note_hpte_modification(kvm, rev);
 	unlock_hpte(hpte, 0);
 
+	if (v & HPTE_V_ABSENT)
+		v = (v & ~HPTE_V_ABSENT) | HPTE_V_VALID;
 	hpret[0] = v;
 	hpret[1] = r;
 	return H_SUCCESS;
-- 
2.6.1
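
The fix only rewrites the value handed back to the guest. A stand-alone C model of that transformation follows. MODEL_HPTE_V_VALID uses the architected valid bit, the least significant bit of the first doubleword. The absent bit is a KVM software bit, so the value chosen for MODEL_HPTE_V_ABSENT is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_HPTE_V_VALID  UINT64_C(0x1)	/* architected valid bit */
#define MODEL_HPTE_V_ABSENT UINT64_C(0x20)	/* assumed value, for illustration */

static_assert((MODEL_HPTE_V_VALID & MODEL_HPTE_V_ABSENT) == 0, "bits must be distinct");

/* First doubleword of a removed HPTE, as the guest should see it. */
static uint64_t model_guest_hpte_v(uint64_t v)
{
	/* An absent HPTE is valid from the guest's point of view. */
	if (v & MODEL_HPTE_V_ABSENT)
		v = (v & ~MODEL_HPTE_V_ABSENT) | MODEL_HPTE_V_VALID;
	return v;
}
```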

[PATCH] KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl

2015-10-14 Thread Paul Mackerras

Currently the KVM_PPC_ALLOCATE_HTAB ioctl will try to allocate the requested
size of HPT, and if that is not possible, then try to allocate smaller
sizes (by factors of 2) until either a minimum is reached or the
allocation succeeds.  This is not ideal for userspace, particularly in
migration scenarios, where the destination VM really does require the
size requested.  Also, the minimum HPT size of 256kB may be
insufficient for the guest to run successfully.

This removes the fallback to smaller sizes on allocation failure for
the KVM_PPC_ALLOCATE_HTAB ioctl.  The fallback still exists for the
case where the HPT is allocated at the time the first VCPU is run, if
no HPT has been allocated by ioctl by that time.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1f9c0a1..10722b1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -70,7 +70,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	}
 
 	/* Lastly try successively smaller sizes from the page allocator */
-	while (!hpt && order > PPC_MIN_HPT_ORDER) {
+	/* Only do this if userspace didn't specify a size via ioctl */
+	while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
 		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
 				       __GFP_NOWARN, order - PAGE_SHIFT);
 		if (!hpt)
-- 
2.6.1
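
The change in policy can be sketched as a small user-space model in which any earlier allocation attempts are collapsed into a single try_alloc() call. The allocator callbacks and all model_ names are hypothetical. The only value taken from the commit message is the minimum HPT order of 18 (2^18 bytes = 256kB).

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MIN_HPT_ORDER 18	/* 2^18 bytes = 256kB */

/* Hypothetical allocator: returns true if an HPT of 2^order bytes was obtained. */
typedef bool (*model_alloc_fn)(int order);

/*
 * Returns the order actually allocated, or -1 on failure.  size_from_ioctl
 * plays the role of a non-NULL htab_orderp: when userspace asked for a
 * specific size, there is no fallback to smaller sizes.
 */
static int model_alloc_hpt(int order, bool size_from_ioctl, model_alloc_fn try_alloc)
{
	assert(order >= MODEL_MIN_HPT_ORDER);
	if (try_alloc(order))
		return order;
	while (!size_from_ioctl && order > MODEL_MIN_HPT_ORDER) {
		--order;
		if (try_alloc(order))
			return order;
	}
	return -1;
}

/* Simulated memory pressure: only HPTs up to 1MB (order 20) can be obtained. */
static bool model_alloc_up_to_1mb(int order)
{
	return order <= 20;
}

/* Simulated exhaustion: nothing can be allocated. */
static bool model_alloc_never(int order)
{
	(void)order;
	return false;
}
```

Under the simulated pressure, a 16MB (order 24) request made at first VCPU run quietly falls back to 1MB. The same request made through the ioctl fails instead, so a migration destination can report the problem rather than run with a smaller HPT than the source.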

Re: [PATCH] KVM: PPC: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

2015-09-21 Thread Paul Mackerras

On Mon, Sep 21, 2015 at 07:50:22AM +0200, Paolo Bonzini wrote:
> 
> 
> On 21/09/2015 03:37, David Gibson wrote:
> > On Fri, Sep 18, 2015 at 08:57:28AM +0200, Thomas Huth wrote:
> >> Access to the kvm->buses (like with the kvm_io_bus_read() and
> >> -write() functions) has to be protected via the kvm->srcu lock. 
> >> The kvmppc_h_logical_ci_load() and -store() functions are
> >> missing this lock so far, so let's add it there, too. This fixes
> >> the problem that the kernel reports "suspicious RCU usage" when
> >> lock debugging is enabled.
> >> 
> >> Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
> >> Signed-off-by: Thomas Huth <th...@redhat.com>
> > 
> > Nice catch.  Looks like I missed this because the places 
> > kvm_io_bus_{read,write}() are called on x86 are buried about 5
> > layers below where the srcu lock is taken :/.
> > 
> > Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>
...
> Paul,
> 
> shall I take this directly into my tree for -rc3?
> 
> Paolo

I have that and two other fixes in my kvm-ppc-fixes branch on
kernel.org.  They were in linux-next today.  I was going to send you a
pull request tomorrow, but if you are about to send stuff off to Linus
you could pull now from:

git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

The three patches in there are:

Gautham R. Shenoy (1):
  KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit

Paul Mackerras (1):
  KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs

Thomas Huth (1):
  KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

The one from Gautham is a 1-liner that has been around for months and
got missed, and is obviously correct.  The one from me fixes a
regression that was introduced in 4.3-rc1 by one of my patches, which
causes oopses and soft lockups due to a use-after-free bug.

Thanks,
Paul.

[PATCH] KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs

2015-09-18 Thread Paul Mackerras

This fixes a bug which results in stale vcore pointers being left in
the per-cpu preempted vcore lists when a VM is destroyed.  The result
of the stale vcore pointers is usually either a crash or a lockup
inside collect_piggybacks() when another VM is run.  A typical
lockup message looks like:

[  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
[  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng
binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm
drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
[  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
[  472.162187] task: c01e38512750 ti: c01e41bfc000 task.ti: c01e41bfc000
[  472.162262] NIP: c096b094 LR: c096b08c CTR: c030
[  472.162337] REGS: c01e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
[  472.162399] MSR: 90019033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24848844  XER:
[  472.162588] CFAR: c096b0ac SOFTE: 1
GPR00: c070 c01e41bff7a0 c127df00 0001
GPR04: 0003 0001  00874821
GPR08: c01e41bff8e0 0001  defde740
GPR12: c030 cfdae400
[  472.163053] NIP [c096b094] _raw_spin_lock_irqsave+0xa4/0x130
[  472.163117] LR [c096b08c] _raw_spin_lock_irqsave+0x9c/0x130
[  472.163179] Call Trace:
[  472.163206] [c01e41bff7a0] [c01e41bff7f0] 0xc01e41bff7f0 (unreliable)
[  472.163295] [c01e41bff7e0] [c070] __wake_up+0x40/0x90
[  472.163375] [c01e41bff830] [defd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
[  472.163465] [c01e41bffa30] [defd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
[  472.163559] [c01e41bffb70] [de9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
[  472.163653] [c01e41bffba0] [de92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[  472.163745] [c01e41bffbe0] [de9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
[  472.163834] [c01e41bffd40] [c02d0f50] do_vfs_ioctl+0x480/0x7c0
[  472.163910] [c01e41bffde0] [c02d1364] SyS_ioctl+0xd4/0xf0
[  472.163986] [c01e41bffe30] [c0009260] system_call+0x38/0xd0
[  472.164060] Instruction dump:
[  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 6000 6000 6042 8bad02e2
[  472.164224] 7fc3f378 4b6a57c1 6000 7c210b78  89290009 792affe3 40820070

The bug is that kvmppc_run_vcpu does not correctly handle the case
where a vcpu task receives a signal while its guest vcpu is executing
in the guest as a result of being piggy-backed onto the execution of
another vcore.  In that case we need to wait for the vcpu to finish
executing inside the guest, and then remove this vcore from the
preempted vcores list.  That way, we avoid leaving this vcpu's vcore
on the preempted vcores list when the vcpu gets interrupted.

Fixes: ec2571650826
Reported-by: Thomas Huth <th...@redhat.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9754e68..2280497 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2692,9 +2692,13 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 
 	while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
 	       (vc->vcore_state == VCORE_RUNNING ||
-		vc->vcore_state == VCORE_EXITING))
+		vc->vcore_state == VCORE_EXITING ||
+		vc->vcore_state == VCORE_PIGGYBACK))
 		kvmppc_wait_for_exec(vc, vcpu, TASK_UNINTERRUPTIBLE);
 
+	if (vc->vcore_state == VCORE_PREEMPT && vc->runner == NULL)
+		kvmppc_vcore_end_preempt(vc);
+
 	if (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE) {
 		kvmppc_remove_runnable(vc, vcpu);
 		vcpu->stat.signal_exits++;
-- 
2.5.1

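The two conditions this patch changes can be restated as plain predicates, which makes it easy to see that a vcpu whose vcore is piggy-backed (VCORE_PIGGYBACK) now waits instead of falling straight through. This is a hedged sketch: the enum only reuses the state names from the diff, its values are illustrative rather than the kernel's, and the vcpu is assumed to still be KVMPPC_VCPU_RUNNABLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State names mirror the diff; the values are illustrative only. */
enum vcore_state {
	VCORE_PREEMPT,
	VCORE_PIGGYBACK,
	VCORE_RUNNING,
	VCORE_EXITING,
};

/* Old wait-loop condition (vcpu assumed still runnable). */
static bool old_must_wait(enum vcore_state s)
{
	return s == VCORE_RUNNING || s == VCORE_EXITING;
}

/* Patched condition: also wait while piggy-backed on another vcore. */
static bool new_must_wait(enum vcore_state s)
{
	return s == VCORE_RUNNING || s == VCORE_EXITING ||
	       s == VCORE_PIGGYBACK;
}

/*
 * Patched follow-up check: a preempted vcore with no runner is taken
 * off the preempted list (kvmppc_vcore_end_preempt() in the diff).
 */
static bool new_should_end_preempt(enum vcore_state s, const void *runner)
{
	return s == VCORE_PREEMPT && runner == NULL;
}
```

The only state whose treatment differs between the two loop conditions is VCORE_PIGGYBACK; the second predicate covers the case where the wait ends with the vcore back on the preempted list and nobody left to run it.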

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-15 Thread Paul E. McKenney

On Tue, Sep 15, 2015 at 09:24:15PM -0400, Tejun Heo wrote:
> Hello, Paul.
> 
> On Tue, Sep 15, 2015 at 04:38:18PM -0700, Paul E. McKenney wrote:
> > Well, the decision as to what is too big for -stable is owned by the
> > -stable maintainers, not by me.
> 
> Is it tho?  Usually the subsystem maintainer knows the best and has
> most say in it.  I was mostly curious whether you'd think that the
> changes would be too risky.  If not, great.

I do hope that they would listen to what I thought about it, but at
the end of the day, it is the -stable maintainers who pull a given
patch, or don't.

> > I am suggesting trying the options and seeing what works best, then
> > working to convince people as needed.
> 
> Yeah, sure thing.  Let's wait for Christian.

Indeed.  Is there enough benefit to risk jamming this thing into 4.3?
I believe that 4.4 should be a no-brainer.

Thanx, Paul

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-15 Thread Paul E. McKenney

On Tue, Sep 15, 2015 at 06:28:11PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Sep 15, 2015 at 02:38:30PM -0700, Paul E. McKenney wrote:
> > I did take a shot at adding the rcu_sync stuff during this past merge
> > window, but it did not converge quickly enough to make it.  It looks
> > quite good for the next merge window.  There have been changes in most
> > of the relevant areas, so probably best to just try them and see which
> > works best.
> 
> Heh, I'm having a bit of trouble following.  Are you saying that the
> changes would be too big for -stable?  If so, I'll send out reverts of
> the culprit patches and then reapply them for this cycle so that it
> can land together with the rcu changes in the next merge window, but
> it'd be great to find out whether the rcu changes are enough for the
> issue that Christian is seeing to go away.  If not, I'll switch to a
> different locking scheme and mark those patches w/ stable tag.

Well, the decision as to what is too big for -stable is owned by the
-stable maintainers, not by me.

I am suggesting trying the options and seeing what works best, then
working to convince people as needed.

Thanx, Paul

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-15 Thread Paul E. McKenney

On Tue, Sep 15, 2015 at 06:42:19PM +0200, Paolo Bonzini wrote:
> 
> 
> On 15/09/2015 15:36, Christian Borntraeger wrote:
> > I am wondering why the old code behaved in such fatal ways. Is there
> > some interaction between waiting for a reschedule in the
> > synchronize_sched writer and some fork code actually waiting for the
> > read side to get the lock together with some rescheduling going on
> > waiting for a lock that fork holds? lockdep does not give me any hints
> > so I have no clue :-(
> 
> It may just be consuming too much CPU.  kernel/rcu/tree.c warns
> about it:
> 
>  * if you are using synchronize_sched_expedited() in a loop, please
>  * restructure your code to batch your updates, and then use a single
>  * synchronize_sched() instead.
> 
> and you may remember that in KVM we switched from RCU to SRCU exactly to
> avoid userspace-controlled synchronize_rcu_expedited().
> 
> In fact, I would say that any userspace-controlled call to *_expedited()
> is a bug waiting to happen and a bad idea---because userspace can, with
> little effort, end up calling it in a loop.

Excellent points!

Other options in such situations include the following:

o   Rework so that the code uses call_rcu*() instead of *_expedited().

o   Maintain a per-task or per-CPU counter so that once every so
    many *_expedited() invocations, the non-expedited counterpart
    is used instead.  (For example, synchronize_rcu() instead of
    synchronize_rcu_expedited().)
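The counter-based option above can be sketched in C as a throttle that sends every Nth request down the normal path. This is an illustrative model only: `expedited_gp()`, `normal_gp()`, `sync_maybe_expedited()` and `EXPEDITE_BUDGET` are hypothetical names, and the single static counter stands in for the per-task or per-CPU counter; in the kernel the two paths would be, for example, synchronize_rcu_expedited() and synchronize_rcu().

```c
#include <assert.h>

/* Hypothetical: every EXPEDITE_BUDGET-th request takes the normal path. */
#define EXPEDITE_BUDGET 4

static unsigned int n_expedited, n_normal;

/* Stand-ins that only count how often each kind of grace period ran. */
static void expedited_gp(void) { n_expedited++; }
static void normal_gp(void)    { n_normal++; }

/* Single counter standing in for a per-task or per-CPU one. */
static unsigned int calls;

static void sync_maybe_expedited(void)
{
	if (++calls % EXPEDITE_BUDGET == 0)
		normal_gp();		/* throttle: use the slow primitive */
	else
		expedited_gp();
}
```

Ten back-to-back calls in this model run eight expedited and two normal grace periods, so a userspace-driven loop can no longer exercise only the expedited path.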

Note that synchronize_srcu_expedited() is less troublesome than are the
other *_expedited() functions, because synchronize_srcu_expedited() does
not inflict OS jitter on other CPUs.  This situation is being improved,
so that the other *_expedited() functions inflict less OS jitter and
(mostly) avoid inflicting OS jitter on nohz_full CPUs and idle CPUs (the
latter being important for battery-powered systems).  In addition, the
*_expedited() functions avoid hammering CPUs with N-squared OS jitter
in response to concurrent invocation from all CPUs because multiple
concurrent *_expedited() calls will be satisfied by a single expedited
grace-period operation.  Nevertheless, as Paolo points out, it is still
necessary to exercise caution when exposing synchronous grace periods
to userspace control.
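One way to let concurrent callers share a single grace period is a sequence counter: each caller takes a snapshot, and any grace period that both starts and ends after the snapshot satisfies it. The single-threaded model below is a simplified sketch of that idea, not the kernel's implementation; `gp_seq`, `seq_snap()`, `seq_done()`, `run_gp()` and `wait_for_gp()` are hypothetical names, the low bit of the counter means "grace period in progress", and real code must also cope with concurrent updates of the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Even: idle.  Odd: a grace period is in progress. */
static unsigned long gp_seq;
static unsigned int gps_run;

/* Counter value once a grace period starting after this call has ended. */
static unsigned long seq_snap(void)
{
	return (gp_seq + 3) & ~1UL;
}

/* Has the counter reached the snapshot (wrap-tolerant comparison)? */
static bool seq_done(unsigned long s)
{
	return (long)(gp_seq - s) >= 0;
}

/* Stand-in for one expensive expedited grace-period operation. */
static void run_gp(void)
{
	gp_seq++;		/* start: counter becomes odd */
	gps_run++;
	gp_seq++;		/* end: counter becomes even again */
}

/* Only callers whose snapshot is not yet satisfied do any work. */
static void wait_for_gp(unsigned long s)
{
	while (!seq_done(s))
		run_gp();
}
```

Three callers that snapshot before any grace period runs are all satisfied by one `run_gp()`; a caller that snapshots afterwards needs a new one.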

Thanx, Paul

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

2015-09-15 Thread Paul E. McKenney

On Tue, Sep 15, 2015 at 05:26:22PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Sep 15, 2015 at 11:11:45PM +0200, Christian Borntraeger wrote:
> > > In fact, I would say that any userspace-controlled call to *_expedited()
> > > is a bug waiting to happen and a bad idea---because userspace can, with
> > > little effort, end up calling it in a loop.
> > 
> > Right. This also implies that we should fix this for 4.2 - I guess.
> 
> Are the percpu_rwsem changes enough?  If so, we can try to backport
> those.  If those are too risky, we can revert the patches which
> switched threadgroup lock to percpu_rwsem.

I did take a shot at adding the rcu_sync stuff during this past merge
window, but it did not converge quickly enough to make it.  It looks
quite good for the next merge window.  There have been changes in most
of the relevant areas, so probably best to just try them and see which
works best.

    Thanx, Paul

Re: [PATCH 3/3] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD

2015-09-06 Thread Paul Mackerras

On Sun, Sep 06, 2015 at 12:47:12PM -0700, Nathan Whitehorn wrote:
> Anything I can do to help move these along? It's a big performance
> improvement for FreeBSD guests.

These patches are in Paolo's kvm-ppc-next branch and should go into
Linus' tree in the next couple of days.

Paul.

[GIT PULL] Please pull my kvm-ppc-next branch

2015-09-05 Thread Paul Mackerras

Paolo,

Please pull the commits listed below into your tree.  I would like
them to go in for 4.3 as they are all small bug fixes not new
features, and they all can only affect HV-mode KVM on IBM server
machines (in fact one has no effect on code at all since it is a typo
fix for a comment).

Please let me know if you want me to re-post all the patches.

Thanks,
Paul.

The following changes since commit e3dbc572fe11a5231568e106fa3dcedd1d1bec0f:

  Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue (2015-08-22 14:57:59 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-next

for you to fetch changes up to 4e33d1f0a145d48e8cf287954bbf791af8387cfb:

  KVM: PPC: Book3S: Fix typo in top comment about locking (2015-09-04 07:28:05 +1000)


Gautham R. Shenoy (2):
  KVM: PPC: Book3S HV: Fix race in starting secondary threads
  KVM: PPC: Book3S HV: Exit on H_DOORBELL if HOST_IPI is set

Greg Kurz (1):
  KVM: PPC: Book3S: Fix typo in top comment about locking

Thomas Huth (1):
  KVM: PPC: Book3S: Fix size of the PSPB register

 arch/powerpc/include/asm/kvm_host.h |  2 +-
 arch/powerpc/kvm/book3s_hv.c| 10 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  9 +
 arch/powerpc/kvm/book3s_xics.c  |  2 +-
 4 files changed, 20 insertions(+), 3 deletions(-)

Please add my kvm-ppc-next branch to linux-next

2015-09-03 Thread Paul Mackerras

Hi Stephen,

Please include the kvm-ppc-next branch of my powerpc git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git

in linux-next.  This branch currently only has commits that are
intended to go into 4.3, and I won't put in any commits for 4.4 until
4.3-rc1 is out.

Thanks,
Paul.

[PATCH] KVM: PPC: Book3S HV: Exit on H_DOORBELL only if HOST_IPI is set

2015-09-02 Thread Paul Mackerras

From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>

The code that handles the case when we receive an H_DOORBELL interrupt
has a comment which says "Hypervisor doorbell - exit only if host IPI
flag set".  However, the current code does not actually check if the
host IPI flag is set.  This is due to a comparison instruction that
got missed.

As a result, the current code performs the exit to host only
if some sibling thread or a sibling sub-core is exiting to the
host.  This implies that an IPI sent to a sibling core in
(subcores-per-core != 1) mode will be missed by the host unless the
sibling core is on the exit path to the host.

This patch adds the missing comparison operation which will ensure
that when HOST_IPI flag is set, we unconditionally exit to the host.

Fixes: 66feed61cdf6
Cc: sta...@vger.kernel.org # v4.1+
Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b07f045..2273dca 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1213,6 +1213,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
 	bne	3f
 	lbz	r0, HSTATE_HOST_IPI(r13)
+	cmpwi	r0, 0
 	beq	4f
 	b	guest_exit_cont
 3:
-- 
2.1.4

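The missing compare is easy to overlook because `lbz` does not set CR0, so the `beq 4f` tests the condition left by the earlier `cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL`, which is always "equal" on that path. The C model below spells that out. `H_DOORBELL_TRAP` is an illustrative value, the enum names are hypothetical, and `CONTINUE_CHECKS` assumes, per the commit message, that label 4f goes on to check whether sibling threads or sub-cores are exiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative trap number; only the equality test matters here. */
#define H_DOORBELL_TRAP 0xe80

enum doorbell_action { NOT_DOORBELL, CONTINUE_CHECKS, EXIT_TO_HOST };

/* Original code: cr0_eq models the CR0 EQ bit. */
static enum doorbell_action doorbell_before(int trap, uint8_t host_ipi)
{
	bool cr0_eq = (trap == H_DOORBELL_TRAP);	/* cmpwi r12, ... */

	if (!cr0_eq)
		return NOT_DOORBELL;			/* bne 3f */
	(void)host_ipi;					/* lbz: CR0 unchanged */
	if (cr0_eq)
		return CONTINUE_CHECKS;			/* beq 4f: always taken */
	return EXIT_TO_HOST;				/* never reached */
}

/* Patched code with the added "cmpwi r0, 0". */
static enum doorbell_action doorbell_after(int trap, uint8_t host_ipi)
{
	bool cr0_eq = (trap == H_DOORBELL_TRAP);

	if (!cr0_eq)
		return NOT_DOORBELL;
	cr0_eq = (host_ipi == 0);			/* cmpwi r0, 0 */
	if (cr0_eq)
		return CONTINUE_CHECKS;			/* beq 4f */
	return EXIT_TO_HOST;				/* b guest_exit_cont */
}
```

With the host IPI flag set, the original sequence still takes `beq 4f`, while the patched one reaches `guest_exit_cont`.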

[PATCH] KVM: PPC: Book3S HV: Fix race in starting secondary threads

2015-09-02 Thread Paul Mackerras

From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>

The current dynamic micro-threading code has a race due to which a
secondary thread naps when it is supposed to be running a vcpu. As a
side effect of this, on a guest exit, the primary thread in
kvmppc_wait_for_nap() finds that this secondary thread hasn't cleared
its vcore pointer. This results in "CPU X seems to be stuck!"
warnings.

The race is possible because the primary thread, on exiting the guest,
only waits for all the secondaries to clear their vcore pointers. It
subsequently expects the secondary threads to enter nap while it
unsplits the core. A secondary thread that hasn't yet entered nap
will loop in kvm_no_guest until either its vcore pointer or the do_nap
flag is set. Once the core has been unsplit, a new vcpu thread can grab
the core and set the do_nap flag *before* setting the vcore pointers
of the secondary. As a result, the secondary thread will now enter nap
via kvm_unsplit_nap instead of running the guest vcpu.

Fix this by setting the do_nap flag after setting the vcore pointer in
the PACA of the secondary in kvmppc_run_core. Also, ensure that a
secondary thread doesn't nap in kvm_unsplit_nap when the vcore pointer
in its PACA struct is set.

Fixes: b4deba5c41e9
Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c| 10 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fad52f2..c5edf17 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2411,7 +2411,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 				break;
 			cpu_relax();
 		}
-		split_info.do_nap = 1;	/* ask secondaries to nap when done */
 	}
 
 	/* Start all the threads */
@@ -2440,6 +2439,15 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 			thr += pvc->num_threads;
 		}
 	}
+
+	/*
+	 * Ensure that split_info.do_nap is set after setting
+	 * the vcore pointer in the PACA of the secondaries.
+	 */
+	smp_mb();
+	if (cmd_bit)
+		split_info.do_nap = 1;	/* ask secondaries to nap when done */
+
 	/*
 	 * When doing micro-threading, poke the inactive threads as well.
 	 * This gets them to the nap instruction after kvm_do_nap,
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 472680f..b07f045 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -421,6 +421,14 @@ kvm_no_guest:
  * whole-core mode, so we need to nap.
  */
 kvm_unsplit_nap:
+	/*
+	 * Ensure that secondary doesn't nap when it has
+	 * its vcore pointer set.
+	 */
+	sync		/* matches smp_mb() before setting split_info.do_nap */
+	ld	r0, HSTATE_KVM_VCORE(r13)
+	cmpdi	r0, 0
+	bne	kvm_no_guest
 	/* clear any pending message */
 BEGIN_FTR_SECTION
 	lis	r6, (PPC_DBELL_SERVER << (63-36))@h
-- 
2.1.4

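The interleavings involved can be counted with a small model under sequential consistency. It assumes that a waiting secondary reads its vcore pointer before it reads do_nap, and that the patched kvm_unsplit_nap re-reads the vcore pointer after the `sync`; on real hardware it is the smp_mb()/sync pair that provides the ordering this model simply takes for granted. All names in the code are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* The two stores made by the primary thread, in some order. */
enum primary_write { SET_VCORE, SET_DO_NAP };

/* Pre-patch order (do_nap first) and patched order (vcore first). */
static const enum primary_write old_order[2] = { SET_DO_NAP, SET_VCORE };
static const enum primary_write new_order[2] = { SET_VCORE, SET_DO_NAP };

/* Is store w visible after the first k primary stores? */
static bool is_set(const enum primary_write seq[2], int k,
		   enum primary_write w)
{
	for (int i = 0; i < k; i++)
		if (seq[i] == w)
			return true;
	return false;
}

/*
 * Count interleavings in which the secondary naps even though a vcore
 * is being assigned to it.  The secondary reads its vcore pointer at
 * time k1, do_nap at k2 >= k1 and, with recheck, the vcore pointer
 * again at k3 >= k2 (the new check in kvm_unsplit_nap).
 */
static int count_bad_naps(const enum primary_write seq[2], bool recheck)
{
	int bad = 0;

	for (int k1 = 0; k1 <= 2; k1++) {
		for (int k2 = k1; k2 <= 2; k2++) {
			if (is_set(seq, k1, SET_VCORE) ||
			    !is_set(seq, k2, SET_DO_NAP))
				continue;	/* runs the guest or keeps polling */
			if (!recheck) {
				bad++;		/* old kvm_unsplit_nap: naps */
				continue;
			}
			for (int k3 = k2; k3 <= 2; k3++)
				if (!is_set(seq, k3, SET_VCORE))
					bad++;	/* re-read still NULL: naps */
		}
	}
	return bad;
}
```

In this model, reordering the stores alone still leaves one bad interleaving (the secondary reads a NULL vcore pointer just before both stores become visible), and the re-check alone leaves two; only the combination in the patch brings the count to zero.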

[PATCH] KVM: PPC: Book3S HV: Exit on H_DOORBELL only if HOST_IPI is set

2015-09-02 Thread Paul Mackerras

From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>

The code that handles the case when we receive a H_DOORBELL interrupt
has a comment which says "Hypervisor doorbell - exit only if host IPI
flag set".  However, the current code does not actually check if the
host IPI flag is set.  This is due to a comparison instruction that
got missed.

As a result, the current code performs the exit to host only
if some sibling thread or a sibling sub-core is exiting to the
host.  This implies that, an IPI sent to a sibling core in
(subcores-per-core != 1) mode will be missed by the host unless the
sibling core is on the exit path to the host.

This patch adds the missing comparison operation which will ensure
that when HOST_IPI flag is set, we unconditionally exit to the host.

Fixes: 66feed61cdf6
Cc: sta...@vger.kernel.org # v4.1+
Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b07f045..2273dca 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1213,6 +1213,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
cmpwi   r12, BOOK3S_INTERRUPT_H_DOORBELL
bne 3f
lbz r0, HSTATE_HOST_IPI(r13)
+   cmpwi   r0, 0
beq 4f
b   guest_exit_cont
 3:
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: PPC: Book3S HV: Fix race in starting secondary threads

2015-09-02 Thread Paul Mackerras

From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>

The current dynamic micro-threading code has a race due to which a
secondary thread naps when it is supposed to be running a vcpu. As a
side effect of this, on a guest exit, the primary thread in
kvmppc_wait_for_nap() finds that this secondary thread hasn't cleared
its vcore pointer. This results in "CPU X seems to be stuck!"
warnings.

The race is possible since the primary thread on exiting the guests
only waits for all the secondaries to clear its vcore pointer. It
subsequently expects the secondary threads to enter nap while it
unsplits the core. A secondary thread which hasn't yet entered the nap
will loop in kvm_no_guest until its vcore pointer and the do_nap flag
are unset. Once the core has been unsplit, a new vcpu thread can grab
the core and set the do_nap flag *before* setting the vcore pointers
of the secondary. As a result, the secondary thread will now enter nap
via kvm_unsplit_nap instead of running the guest vcpu.

Fix this by setting the do_nap flag after setting the vcore pointer in
the PACA of the secondary in kvmppc_run_core. Also, ensure that a
secondary thread doesn't nap in kvm_unsplit_nap when the vcore pointer
in its PACA struct is set.

Fixes: b4deba5c41e9
Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c| 10 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fad52f2..c5edf17 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2411,7 +2411,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 				break;
 			cpu_relax();
 		}
-		split_info.do_nap = 1;	/* ask secondaries to nap when done */
 	}
 
 	/* Start all the threads */
@@ -2440,6 +2439,15 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 			thr += pvc->num_threads;
 		}
 	}
+
+	/*
+	 * Ensure that split_info.do_nap is set after setting
+	 * the vcore pointer in the PACA of the secondaries.
+	 */
+	smp_mb();
+	if (cmd_bit)
+		split_info.do_nap = 1;	/* ask secondaries to nap when done */
+
 	/*
 	 * When doing micro-threading, poke the inactive threads as well.
 	 * This gets them to the nap instruction after kvm_do_nap,
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 472680f..b07f045 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -421,6 +421,14 @@ kvm_no_guest:
  * whole-core mode, so we need to nap.
  */
 kvm_unsplit_nap:
+	/*
+	 * Ensure that secondary doesn't nap when it has
+	 * its vcore pointer set.
+	 */
+	sync		/* matches smp_mb() before setting split_info.do_nap */
+	ld	r0, HSTATE_KVM_VCORE(r13)
+	cmpdi	r0, 0
+	bne	kvm_no_guest
 	/* clear any pending message */
 BEGIN_FTR_SECTION
 	lis	r6, (PPC_DBELL_SERVER << (63-36))@h

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Paul Mackerras

On Tue, Sep 01, 2015 at 11:41:18PM +0200, Thomas Huth wrote:
> The size of the Problem State Priority Boost Register is only
> 32 bits, so let's change the type of the corresponding variable
> accordingly to avoid future trouble.

Since we're already using lwz/stw in the assembly code in
book3s_hv_rmhandlers.S, this is actually a bug fix, isn't it?
How did you find it?  Did you observe a failure of some kind, or did
you just find it by code inspection?

Paul.

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Paul Mackerras

On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
> > The size of the Problem State Priority Boost Register is only
> > 32 bits, so let's change the type of the corresponding variable
> > accordingly to avoid future trouble.
> 
> It's not future trouble, it's broken today for LE and this should fix
> it BUT 

No, it's broken today for BE hosts, which will always see 0 for the
PSPB register value.  LE hosts are fine.

> The asm accesses it using lwz/stw and C accesses it as a ulong. On LE
> that will mean that userspace will see the value << 32

No, that will happen on BE, and since KVM_REG_PPC_PSPB says it's a
32-bit register, we'll just pass 0 back to userspace when it reads it.
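A small simulation can check the endianness argument. This sketch makes three assumptions. First, the field is an 8-byte unsigned long that the assembly writes with a 32-bit store at its lowest address, as stw does. Second, C reads the field back as a 64-bit value. Third, a read of the 32-bit register reports the low 32 bits. The sketch models both byte orders explicitly, so it behaves the same on any host. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte order of the simulated host. */
enum byte_order { HOST_BE, HOST_LE };

/* Store a 32-bit value at the lowest address of an 8-byte slot, as stw would. */
static void store_word_at_offset0(uint8_t slot[8], uint32_t v, enum byte_order bo)
{
	for (int i = 0; i < 4; i++) {
		int shift = (bo == HOST_BE) ? 8 * (3 - i) : 8 * i;
		slot[i] = (uint8_t)(v >> shift);
	}
}

/* Read the slot back as a 64-bit unsigned long, as the C code would. */
static uint64_t load_ulong(const uint8_t slot[8], enum byte_order bo)
{
	uint64_t r = 0;
	for (int i = 0; i < 8; i++) {
		int shift = (bo == HOST_BE) ? 8 * (7 - i) : 8 * i;
		r |= (uint64_t)slot[i] << shift;
	}
	return r;
}

/* What a 32-bit register read would report: the low 32 bits of the ulong. */
static uint32_t pspb_seen_by_userspace(uint32_t asm_value, enum byte_order bo)
{
	uint8_t slot[8] = {0};
	store_word_at_offset0(slot, asm_value, bo);
	return (uint32_t)load_ulong(slot, bo);
}
```

For 0x1234, the little-endian case reports 0x1234. On big-endian, the word lands in the high half of the unsigned long (0x1234 << 32), so the 32-bit read returns 0.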

> Now "fixing" it might break migration if that field is already
> stored/loaded in its "broken" form. We may have to keep the "broken"
> behaviour and document that qemu sees a value shifted by 32.

It is presumably being set to 0 on BE hosts across migration today
(fortunately 0 is a benign value for PSPB).  If we fix this on both
the source and destination hosts, then the value will get migrated
across correctly.

I think Thomas's patch is fine; it just needs a stronger patch
description saying that it fixes an actual bug.

Paul.

Re: [PATCH] vfio: Enable VFIO device for powerpc

2015-08-26 Thread Paul Mackerras

On Wed, Aug 26, 2015 at 11:34:26AM +0200, Alexander Graf wrote:
> 
> 
> On 13.08.15 03:15, David Gibson wrote:
> > ec53500f "kvm: Add VFIO device" added a special KVM pseudo-device which is
> > used to handle any necessary interactions between KVM and VFIO.
> > 
> > Currently that device is built on x86 and ARM, but not powerpc, although
> > powerpc does support both KVM and VFIO.  This makes things awkward in
> > userspace
> > 
> > Currently qemu prints an alarming error message if you attempt to use VFIO
> > and it can't initialize the KVM VFIO device.  We don't want to remove the
> > warning, because lack of the KVM VFIO device could mean coherency problems
> > on x86.  On powerpc, however, the error is harmless but looks disturbing,
> > and a test based on host architecture in qemu would be ugly, and break if
> > we do need the KVM VFIO device for something important in future.
> > 
> > There's nothing preventing the KVM VFIO device from being built for
> > powerpc, so this patch turns it on.  It won't actually do anything, since
> > we don't define any of the arch_*() hooks, but it will make qemu happy and
> > we can extend it in future if we need to.
> > 
> > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
> > Reviewed-by: Eric Auger <eric.au...@linaro.org>
> 
> Paul is going to take care of the kvm-ppc tree for 4.3. Also, ppc kvm
> patches should get CC on the kvm-ppc@vger mailing list ;).
> 
> Paul, could you please pick this one up?

Sure, I'll do that once I get home (end of this week).

Paul.

Re: [PATCH] kvm:powerpc:Fix return statements for wrapper functions in the file book3s_64_mmu_hv.c

2015-08-20 Thread Paul Mackerras

On Mon, Aug 10, 2015 at 11:27:31AM -0400, Nicholas Krause wrote:
> This fixes the wrapper functions kvm_umap_hva_hv and the function
> kvm_unmap_hav_range_hv to return the return value of the function
> kvm_handle_hva or kvm_handle_hva_range that they are wrapped to
> call internally rather then always making the caller of these
> wrapper functions think they always run successfully by returning
> the value of zero directly.

In fact these functions do always run successfully, and there is no
bug fixed here (see below).

I don't object to the change per se, since it reduces the code size
very slightly, but the commit message and headline need to be
reworded to avoid giving the impression that this fixes something.

>  int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva)
>  {
> -	kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
> -	return 0;
> +	return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
>  }
>  
>  int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end)
>  {
> -	kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
> -	return 0;
> +	return kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);

kvm_handle_hva and kvm_handle_hva_range call the handler function
(kvm_unmap_rmapp in this case) one or more times, and return the
logical OR of the return values from the handler.  Since
kvm_unmap_rmapp always returns 0, the return value from
kvm_handle_hva{,_range} will always be 0 here.
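A minimal sketch of the "call the handler repeatedly and OR the results" pattern makes the point concrete. handle_range(), unmap_handler() and odd_gfn_handler() are hypothetical simplifications; the real kvm_handle_hva_range() walks memslots and rmap chains. The sketch shows only one thing: when every handler call returns 0, the OR is also 0.

```c
#include <assert.h>

/* Hypothetical handler type, loosely modelled on the rmap handlers. */
typedef int (*rmap_handler_fn)(unsigned long gfn);

/* Stand-in for kvm_unmap_rmapp in this discussion: always returns 0. */
static int unmap_handler(unsigned long gfn)
{
	(void)gfn;
	return 0;
}

/* Stand-in for a handler that does report something for some pages. */
static int odd_gfn_handler(unsigned long gfn)
{
	return (int)(gfn & 1);
}

/* Call the handler once per gfn in [start, end) and OR the results together. */
static int handle_range(unsigned long start, unsigned long end, rmap_handler_fn handler)
{
	int ret = 0;

	for (unsigned long gfn = start; gfn < end; gfn++)
		ret |= handler(gfn);
	return ret;
}
```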

Paul.

Re: [PATCH 0/2] Two fixes for dynamic micro-threading

2015-07-24 Thread Paul Mackerras

On Thu, Jul 23, 2015 at 02:02:51PM +0200, Alexander Graf wrote:
> 
> The host crash should only occur with dynamic micro-threading enabled,
> which is not in Linus' tree, correct?

Correct.

Paul.

[PATCH 2/2] KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation

2015-07-16 Thread Paul Mackerras

Whenever a vcore's state is VCORE_PREEMPT, we need to be counting
stolen time for it.  This currently isn't the case when we have a
vcore that no longer has any runnable threads in it but still has a
runner task, so we now make an explicit call to
kvmppc_core_start_stolen() in that case.
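The post-fix decision logic can be modelled as a small state function. This is a simplified sketch, not kernel code. fake_vcore, the FAKE_* states and fake_vcore_preempt() are hypothetical, and fake_vcore_preempt() is assumed to both mark the vcore preempted and start stolen-time accounting. The test checks the invariant stated above: whenever the state is PREEMPT, stolen time is being counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified vcore states (the real set has more values). */
enum fake_vcore_state { FAKE_INACTIVE, FAKE_PREEMPT, FAKE_EXITING };

struct fake_vcore {
	enum fake_vcore_state state;
	bool has_runner;	/* models vc->runner != NULL */
	bool counting_stolen;	/* models "stolen-time accounting has started" */
};

/* Assumed model of kvmppc_vcore_preempt(): mark preempted, start stolen time. */
static void fake_vcore_preempt(struct fake_vcore *vc)
{
	vc->state = FAKE_PREEMPT;
	vc->counting_stolen = true;
}

/* Post-fix logic for a non-master vcore after guest exit. */
static void fake_post_guest(struct fake_vcore *vc, int still_running)
{
	if (still_running > 0) {
		fake_vcore_preempt(vc);
	} else if (vc->has_runner) {
		vc->state = FAKE_PREEMPT;
		vc->counting_stolen = true;	/* the explicit kvmppc_core_start_stolen() */
	} else {
		vc->state = FAKE_INACTIVE;
	}
}

/* PREEMPT must imply that stolen time is being counted. */
static bool invariant_holds(const struct fake_vcore *vc)
{
	return vc->state != FAKE_PREEMPT || vc->counting_stolen;
}
```

Under the old code, a vcore with no running threads but with a runner was set to VCORE_PREEMPT without starting stolen-time accounting. That is exactly the combination invariant_holds() rejects.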

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3d02276..fad52f2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2283,9 +2283,14 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 	}
 	list_del_init(&vc->preempt_list);
 	if (!is_master) {
-		vc->vcore_state = vc->runner ? VCORE_PREEMPT : VCORE_INACTIVE;
-		if (still_running > 0)
+		if (still_running > 0) {
 			kvmppc_vcore_preempt(vc);
+		} else if (vc->runner) {
+			vc->vcore_state = VCORE_PREEMPT;
+			kvmppc_core_start_stolen(vc);
+		} else {
+			vc->vcore_state = VCORE_INACTIVE;
+		}
 		if (vc->n_runnable > 0 && vc->runner == NULL) {
 			/* make sure there's a candidate runner awake */
 			vcpu = list_first_entry(&vc->runnable_threads,

[PATCH 1/2] KVM: PPC: Book3S HV: Fix preempted vcore list locking

2015-07-16 Thread Paul Mackerras

When a vcore gets preempted, we put it on the preempted vcore list for
the current CPU.  The runner task then calls schedule() and comes back
some time later and takes itself off the list.  We need to be careful
to lock the list that it was put onto, which may not be the list for the
current CPU since the runner task may have moved to another CPU.
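A toy model of per-CPU lists illustrates the locking rule. This is a hypothetical sketch: each list is reduced to a counter plus a lock flag, and fake_vcore.pcpu plays the role of vc->pcpu, recording which CPU's list the vcore was added to. The key detail is that vcore_end_preempt() locks the list named by that recorded CPU. It does not use the list of whichever CPU the runner happens to resume on.

```c
#include <assert.h>

#define NR_FAKE_CPUS 4

/* Hypothetical per-CPU preempted-vcore list: a count plus a lock flag. */
struct fake_preempt_list {
	int locked;
	int nr_entries;
};

static struct fake_preempt_list preempt_lists[NR_FAKE_CPUS];

struct fake_vcore {
	int pcpu;	/* CPU whose list we were added to (cf. vc->pcpu) */
	int on_list;
};

static void fake_lock(struct fake_preempt_list *lp)   { assert(!lp->locked); lp->locked = 1; }
static void fake_unlock(struct fake_preempt_list *lp) { assert(lp->locked); lp->locked = 0; }

/* Add the vcore to the current CPU's list and remember which list that was. */
static void vcore_preempt(struct fake_vcore *vc, int cur_cpu)
{
	struct fake_preempt_list *lp = &preempt_lists[cur_cpu];

	vc->pcpu = cur_cpu;
	fake_lock(lp);
	lp->nr_entries++;
	vc->on_list = 1;
	fake_unlock(lp);
}

/* The runner may resume on another CPU: use the recorded CPU, not cur_cpu. */
static void vcore_end_preempt(struct fake_vcore *vc, int cur_cpu)
{
	(void)cur_cpu;	/* deliberately unused: using it was the bug */
	if (vc->on_list) {
		struct fake_preempt_list *lp = &preempt_lists[vc->pcpu];

		fake_lock(lp);
		lp->nr_entries--;
		vc->on_list = 0;
		fake_unlock(lp);
	}
}
```

Suppose vcore_end_preempt() locked the list for cur_cpu instead. In the scenario below, the vcore is added on CPU 1 and the runner resumes on CPU 3. The function would then touch CPU 3's list while the entry actually sits on CPU 1's list.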

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6e3ef30..3d02276 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1962,10 +1962,11 @@ static void kvmppc_vcore_preempt(struct kvmppc_vcore *vc)
 
 static void kvmppc_vcore_end_preempt(struct kvmppc_vcore *vc)
 {
-	struct preempted_vcore_list *lp = this_cpu_ptr(&preempted_vcores);
+	struct preempted_vcore_list *lp;
 
 	kvmppc_core_end_stolen(vc);
 	if (!list_empty(&vc->preempt_list)) {
+		lp = &per_cpu(preempted_vcores, vc->pcpu);
 		spin_lock(&lp->lock);
 		list_del_init(&vc->preempt_list);
 		spin_unlock(&lp->lock);

[PATCH 0/2] Two fixes for dynamic micro-threading

2015-07-16 Thread Paul Mackerras

This series contains two fixes for the new dynamic micro-threading
code that was added recently for HV-mode KVM on Power servers.
The patches are against Alex Graf's kvm-ppc-queue branch.  Please
apply.

Paul.

 arch/powerpc/kvm/book3s_hv.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)


[PATCH v3] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-07-02 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.
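The dynamic_mt_modes values amount to a simple bitmap test. This is a minimal sketch that assumes only what the description states: bit value 2 selects 2-way and bit value 4 selects 4-way. mt_mode_allowed() is a hypothetical helper, not the function used in book3s_hv.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: may n-way micro-threading be considered under a
 * dynamic_mt_modes-style bitmap?  Bit value 2 means 2-way, 4 means 4-way.
 */
static bool mt_mode_allowed(int dynamic_mt_modes, int n_way)
{
	if (n_way != 2 && n_way != 4)
		return false;	/* only 2-way and 4-way split modes are described */
	return (dynamic_mt_modes & n_way) != 0;
}
```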

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.
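The early-exit case can be sketched as a flag that records whether the guest timebase offset was actually applied. This is a hypothetical model: fake_tb and both helpers are illustrative names. It also ignores the fact that the real timebase keeps ticking. It shows only why the exit path must skip subtracting an offset that was never added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a subcore's timebase as handled by thread 0. */
struct fake_tb {
	uint64_t tb;		/* timebase value */
	bool switched_to_guest;	/* did we actually enter guest context? */
};

static void fake_guest_entry(struct fake_tb *s, uint64_t tb_offset, bool exit_requested)
{
	if (exit_requested)
		return;			/* exit already requested: stay in host context */
	s->tb += tb_offset;		/* apply the guest's timebase offset */
	s->switched_to_guest = true;
}

static void fake_guest_exit(struct fake_tb *s, uint64_t tb_offset)
{
	if (!s->switched_to_guest)
		return;			/* skip the whole switch-back path */
	s->tb -= tb_offset;		/* undo the offset only if it was applied */
	s->switched_to_guest = false;
}
```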

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
v3: Rename MAX_THREADS to MAX_SMT_THREADS to avoid a compile warning

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h   |   3 +
 arch/powerpc/kernel/asm-offsets.c |   7 +
 arch/powerpc/kvm/book3s_hv.c  | 367 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 473 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..57d5dfe 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_SMT_THREADS	8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_SMT_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm

[PATCH v2 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-06-30 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.
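
A minimal sketch of how a bitmap like dynamic_mt_modes might be interpreted, assuming (as described above) that the value 2 enables 2-way and the value 4 enables 4-way split-core mode. The helper names are hypothetical and not taken from the patch; MAX_THREADS matches the value the patch defines for a POWER8 core:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 8	/* threads per physical POWER8 core */

/* Hypothetical helper: is n-way micro-threading allowed by the bitmap?
 * Only n = 2 and n = 4 are meaningful split modes. */
static inline bool split_mode_allowed(int dynamic_mt_modes, int n)
{
	if (n != 2 && n != 4)
		return false;
	return (dynamic_mt_modes & n) != 0;
}

/* Threads available to each subcore when the core is split n ways. */
static inline int threads_per_subcore(int n)
{
	assert(n == 2 || n == 4);
	return MAX_THREADS / n;
}
```

With the default of 6 both modes pass the check; a 2-way split leaves 4 threads per subcore and a 4-way split leaves 2, which lines up with the 2 or 4 vcpus per vcore mentioned above.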

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.
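
The guard can be modelled in a few lines: record whether the offset was actually added, and let the exit path undo only what the entry path did. This is a simplified user-space sketch with hypothetical names (struct tb_state, guest_entry(), guest_exit()); the real change is made in the real-mode assembly in book3s_hv_rmhandlers.S:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one core's timebase state. */
struct tb_state {
	uint64_t timebase;	/* current timebase value */
	uint64_t tb_offset;	/* guest's timebase offset */
	bool offset_applied;	/* set only once the offset was added */
};

/* Entry path: add the guest offset, unless an exit was already requested. */
static inline void guest_entry(struct tb_state *s, bool exit_requested)
{
	assert(!s->offset_applied);	/* entries must not nest */
	if (exit_requested)
		return;			/* never switched to guest context */
	s->timebase += s->tb_offset;
	s->offset_applied = true;
}

/* Exit path: subtract the offset only if the entry path added it. */
static inline void guest_exit(struct tb_state *s)
{
	if (!s->offset_applied)
		return;			/* skip the switch-back-to-host path */
	s->timebase -= s->tb_offset;
	s->offset_applied = false;
}
```

Without the offset_applied check, an early exit would subtract an offset that was never added and leave the timebase out of sync.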

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
v2: List allowed values for dynamic_mt_modes module parameter in the
module parameter description.

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h   |   3 +
 arch/powerpc/kernel/asm-offsets.c |   7 +
 arch/powerpc/kvm/book3s_hv.c  | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS    8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch

[PATCH 3/5] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE

2015-06-24 Thread Paul Mackerras

The reference (R) and change (C) bits in a HPT entry can be set by
hardware at any time up until the HPTE is invalidated and the TLB
invalidation sequence has completed.  This means that when removing
a HPTE, we need to read the HPTE after the invalidation sequence has
completed in order to obtain reliable values of R and C.  The code
in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd3265fb
(KVM: PPC: Book3S HV: Make HTAB code LE host aware) removed the
read after invalidation as a side effect of other changes.  This
restores the read of the HPTE after invalidation.

The user-visible effect of this bug would be that when migrating a
guest, there is a small probability that a page modified by the guest
and then unmapped by the guest might not get re-transmitted and thus
the destination might end up with a stale copy of the page.
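
The ordering problem can be illustrated with a deterministic model in which a late guest store sets R and C while the invalidation is still in progress. A sketch only: the bit values below mirror HPTE_R_R and HPTE_R_C as we understand them from the kernel's hash MMU header, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define HPTE_R_C 0x0000000000000080ULL	/* change bit */
#define HPTE_R_R 0x0000000000000100ULL	/* reference bit */

/* Hypothetical stand-in for a guest store that the hardware records in
 * the HPTE while the translation is still usable. */
static inline void hw_guest_store(uint64_t *hpte_r)
{
	*hpte_r |= HPTE_R_R | HPTE_R_C;
}

/* Buggy ordering: snapshot R/C first, then invalidate. */
static inline uint64_t rc_read_before_invalidate(uint64_t *hpte_r)
{
	uint64_t rc = *hpte_r & (HPTE_R_R | HPTE_R_C);

	hw_guest_store(hpte_r);	/* lands before the tlbie sequence ends */
	/* ... invalidation sequence completes here ... */
	return rc;		/* misses the late R/C update */
}

/* Fixed ordering: complete the invalidation, then read R/C. */
static inline uint64_t rc_read_after_invalidate(uint64_t *hpte_r)
{
	hw_guest_store(hpte_r);	/* same late store */
	/* ... invalidation sequence completes here ... */
	return *hpte_r & (HPTE_R_R | HPTE_R_C);
}
```

Starting from a clean entry, the first function reports no change even though the page was written, which is exactly the case that can leave a stale page on the migration destination; the second reports both bits.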

Fixes: 6f22bd3265fb
Cc: sta...@vger.kernel.org # v3.17+
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index b027a89..c6d601c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -421,14 +421,20 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	v = pte & ~HPTE_V_HVLOCK;
 	if (v & HPTE_V_VALID) {
-		u64 pte1;
-
-		pte1 = be64_to_cpu(hpte[1]);
 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
-		rb = compute_tlbie_rb(v, pte1, pte_index);
+		rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
 		do_tlbies(kvm, rb, 1, global_invalidates(kvm, flags), true);
-		/* Read PTE low word after tlbie to get final R/C values */
-		remove_revmap_chain(kvm, pte_index, rev, v, pte1);
+		/*
+		 * The reference (R) and change (C) bits in a HPT
+		 * entry can be set by hardware at any time up until
+		 * the HPTE is invalidated and the TLB invalidation
+		 * sequence has completed.  This means that when
+		 * removing a HPTE, we need to re-read the HPTE after
+		 * the invalidation sequence has completed in order to
+		 * obtain reliable values of R and C.
+		 */
+		remove_revmap_chain(kvm, pte_index, rev, v,
+				    be64_to_cpu(hpte[1]));
 	}
 	r = rev->guest_rpte & ~HPTE_GR_RESERVED;
 	note_hpte_modification(kvm, rev);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/5] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD

2015-06-24 Thread Paul Mackerras

This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes.  When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.
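
The test-and-clear logic can be modelled in user space as follows. This is a sketch only: struct hpte_model and the RMAP_* flag values are hypothetical, the R/C bit values mirror HPTE_R_R and HPTE_R_C as we understand them from the kernel's hash MMU header, the entry is assumed to be valid, and the HPTE locking and TLB invalidation the real hypercalls need (kvmppc_clear_ref_hpte() in the patch) are not modelled:

```c
#include <assert.h>
#include <stdint.h>

#define HPTE_R_C 0x0000000000000080ULL	/* change bit */
#define HPTE_R_R 0x0000000000000100ULL	/* reference bit */

/* Hypothetical rmap flags for the underlying host page. */
#define RMAP_REFERENCED 0x1ULL
#define RMAP_CHANGED    0x2ULL

/* Simplified HPTE: guest view, real entry updated by hardware, host rmap. */
struct hpte_model {
	uint64_t guest_r;
	uint64_t real_r;
	uint64_t rmap;
};

/*
 * Test-and-clear R or C.  Returns the guest view with the real R/C bits
 * merged in (like "gr |= r & (HPTE_R_R | HPTE_R_C)" in the patch).  The
 * real bit is cleared so later accesses can be detected, and its old
 * value is transferred to the rmap so the host does not lose it.
 */
static inline uint64_t test_and_clear(struct hpte_model *h, uint64_t bit,
				      uint64_t rmap_flag)
{
	uint64_t old;

	assert(bit == HPTE_R_R || bit == HPTE_R_C);
	old = h->guest_r | (h->real_r & (HPTE_R_R | HPTE_R_C));
	h->guest_r &= ~bit;
	if (h->real_r & bit) {
		h->real_r &= ~bit;
		h->rmap |= rmap_flag;
	}
	return old;
}
```

An H_CLEAR_REF-style call corresponds to test_and_clear(h, HPTE_R_R, RMAP_REFERENCED), and an H_CLEAR_MOD-style call to test_and_clear(h, HPTE_R_C, RMAP_CHANGED).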

These hypercalls are not used by Linux guests.  These implementations
have been tested using a FreeBSD guest.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 126 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   4 +-
 2 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c7a3ab2..c1df9bb 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -112,25 +112,38 @@ void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize)
 }
 EXPORT_SYMBOL_GPL(kvmppc_update_rmap_change);
 
+/* Returns a pointer to the revmap entry for the page mapped by a HPTE */
+static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
+				      unsigned long hpte_gr)
+{
+	struct kvm_memory_slot *memslot;
+	unsigned long *rmap;
+	unsigned long gfn;
+
+	gfn = hpte_rpn(hpte_gr, hpte_page_size(hpte_v, hpte_gr));
+	memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
+	if (!memslot)
+		return NULL;
+
+	rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
+	return rmap;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 				struct revmap_entry *rev,
 				unsigned long hpte_v, unsigned long hpte_r)
 {
 	struct revmap_entry *next, *prev;
-	unsigned long gfn, ptel, head;
-	struct kvm_memory_slot *memslot;
+	unsigned long ptel, head;
 	unsigned long *rmap;
 	unsigned long rcbits;
 
 	rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
 	ptel = rev->guest_rpte |= rcbits;
-	gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
-	memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
-	if (!memslot)
+	rmap = revmap_for_hpte(kvm, hpte_v, ptel);
+	if (!rmap)
 		return;
-
-	rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
 	lock_rmap(rmap);
 
 	head = *rmap & KVMPPC_RMAP_INDEX;
@@ -678,6 +691,105 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 	return H_SUCCESS;
 }
 
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	__be64 *hpte;
+	unsigned long v, r, gr;
+	struct revmap_entry *rev;
+	unsigned long *rmap;
+	long ret = H_NOT_FOUND;
+
+	if (pte_index >= kvm->arch.hpt_npte)
+		return H_PARAMETER;
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+		cpu_relax();
+	v = be64_to_cpu(hpte[0]);
+	r = be64_to_cpu(hpte[1]);
+	if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+		goto out;
+
+	gr = rev->guest_rpte;
+	if (rev->guest_rpte & HPTE_R_R) {
+		rev->guest_rpte &= ~HPTE_R_R;
+		note_hpte_modification(kvm, rev);
+	}
+	if (v & HPTE_V_VALID) {
+		gr |= r & (HPTE_R_R | HPTE_R_C);
+		if (r & HPTE_R_R) {
+			kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
+			rmap = revmap_for_hpte(kvm, v, gr);
+			if (rmap) {
+				lock_rmap(rmap);
+				*rmap |= KVMPPC_RMAP_REFERENCED;
+				unlock_rmap(rmap);
+			}
+		}
+	}
+	vcpu->arch.gpr[4] = gr;
+	ret = H_SUCCESS;
+ out:
+	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+	return ret;
+}
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	__be64 *hpte;
+	unsigned long v, r, gr;
+	struct revmap_entry *rev;
+	unsigned long *rmap;
+	long ret = H_NOT_FOUND;
+
+	if (pte_index >= kvm->arch.hpt_npte)
+		return H_PARAMETER;
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+   while (!try_lock_hpte(hpte

[PATCH 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-06-24 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h   |   3 +
 arch/powerpc/kernel/asm-offsets.c |   7 +
 arch/powerpc/kvm/book3s_hv.c  | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS    8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index d333664..c3e11e0 100644

[PATCH 1/5] KVM: PPC: Book3S HV: Make use of unused threads when running guests

2015-06-24 Thread Paul Mackerras

When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical
threads are currently unused.  This makes it possible to use them to
run one or more other virtual cores from the same guest when certain
conditions are met.  This applies on POWER7, and on POWER8 to guests
with one thread per virtual core.  (It doesn't apply to POWER8 guests
with multiple threads per vcore because they require a 1-1 virtual to
physical thread mapping in order to be able to use msgsndp and the
TIR.)

The idea is that we maintain a list of preempted vcores for each
physical cpu (i.e. each core, since the host runs single-threaded).
Then, when a vcore is about to run, it checks to see if there are
any vcores on the list for its physical cpu that could be
piggybacked onto this vcore's execution.  If so, those additional
vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
threads are started as well as the original vcore, which is called
the master vcore.

After the vcores have exited the guest, the extra ones are put back
onto the preempted list if any of their VCPUs are still runnable and
not idle.
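
The selection step can be sketched as a walk over the per-cpu preempted list. This is a simplified model under stated assumptions: struct vcore_model, the same-VM test and the thread-count fit check are illustrative only, and the real code applies further conditions (for example the one-thread-per-vcore restriction on POWER8) that are omitted here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a vcore on the preempted list. */
struct vcore_model {
	int vm_id;		/* guest the vcore belongs to */
	int num_threads;	/* runnable vcpu threads */
	bool piggybacked;	/* chosen to run alongside the master */
};

/*
 * Piggyback every vcore from the same guest whose threads still fit
 * in the physical core.  Returns the number of threads that will run.
 */
static inline int select_piggybacks(const struct vcore_model *master,
				    struct vcore_model *preempted, size_t n,
				    int threads_per_core)
{
	int used = master->num_threads;

	assert(used <= threads_per_core);
	for (size_t i = 0; i < n; i++) {
		struct vcore_model *vc = &preempted[i];

		vc->piggybacked = false;
		if (vc->vm_id != master->vm_id)
			continue;	/* only vcores from the same guest */
		if (used + vc->num_threads > threads_per_core)
			continue;	/* not enough free threads left */
		vc->piggybacked = true;
		used += vc->num_threads;
	}
	return used;
}
```

A greedy first-fit walk is simply the easiest policy to illustrate; it is not necessarily the order or policy the patch itself uses.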

This means that vcpu->arch.ptid is no longer necessarily the same as
the physical thread that the vcpu runs on.  In order to make it easier
for code that wants to send an IPI to know which CPU to target, we
now store that in a new field in struct kvm_vcpu_arch, called thread_cpu.

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>
Tested-by: Laurent Vivier <lviv...@redhat.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |  19 +-
 arch/powerpc/kernel/asm-offsets.c   |   2 +
 arch/powerpc/kvm/book3s_hv.c| 333 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c|   7 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c|   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   5 +
 6 files changed, 298 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..2b74490 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -278,7 +278,9 @@ struct kvmppc_vcore {
u16 last_cpu;
u8 vcore_state;
u8 in_guest;
+   struct kvmppc_vcore *master_vcore;
struct list_head runnable_threads;
+   struct list_head preempt_list;
spinlock_t lock;
wait_queue_head_t wq;
spinlock_t stoltb_lock; /* protects stolen_tb and preempt_tb */
@@ -300,12 +302,18 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
-/* Values for vcore_state */
+/*
+ * Values for vcore_state.
+ * Note that these are arranged such that lower values
+ * (< VCORE_SLEEPING) don't require stolen time accounting
+ * on load/unload, and higher values do.
+ */
 #define VCORE_INACTIVE 0
-#define VCORE_SLEEPING 1
-#define VCORE_PREEMPT  2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_PREEMPT  1
+#define VCORE_PIGGYBACK 2
+#define VCORE_SLEEPING 3
+#define VCORE_RUNNING  4
+#define VCORE_EXITING  5
 
 /*
  * Struct used to manage memory for a virtual processor area
@@ -619,6 +627,7 @@ struct kvm_vcpu_arch {
int trap;
int state;
int ptid;
+   int thread_cpu;
bool timer_running;
wait_queue_head_t cpu_run;
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0034b6b..d333664 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -512,6 +512,8 @@ int main(void)
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst));
+   DEFINE(VCPU_CPU, offsetof(struct kvm_vcpu, cpu));
+   DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68d067a..2048309 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,9 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 #define MPP_BUFFER_ORDER   3
 #endif
 
+static int target_smt_mode;
+module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -114,7 +117,7 @@ static bool kvmppc_ipi_thread(int cpu)
 
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int cpu = vcpu->cpu;
+   int cpu;
wait_queue_head_t *wqp;
 
wqp = kvm_arch_vcpu_wq(vcpu);
@@ -123,10 +126,11 @@ static void

[PATCH 0/5] PPC: Current patch queue for HV KVM

2015-06-24 Thread Paul Mackerras

This is my current queue of patches for HV KVM.  This series is based
on the kvm next branch.  They were all posted at least 6 weeks ago,
though I have just added a 3-line fix to patch 2/5 to fix a bug
that we found in testing migration, and I expanded a comment (no code
change) in patch 3/5 following a suggestion by Aneesh.

I'd like to see these go into 4.2 if possible.

Paul.
---

 arch/powerpc/include/asm/kvm_book3s.h |   1 +
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 +
 arch/powerpc/include/asm/kvm_host.h   |  24 +-
 arch/powerpc/kernel/asm-offsets.c |   9 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |   8 +-
 arch/powerpc/kvm/book3s_hv.c  | 648 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  32 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 161 +++-
 arch/powerpc/kvm/book3s_hv_rm_xics.c  |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 122 +-
 10 files changed, 906 insertions(+), 123 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-06-24 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.
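To make the bitmap semantics concrete, here is a minimal C sketch (not code from this patch) of a check that combines dynamic_mt_modes with the limits above: 8 threads per core and at most 4 subcores, which gives 4 threads per subcore in 2-way mode and 2 in 4-way mode. The function name subcore_config_ok_sketch is hypothetical. The rule that 3 subcores need 4-way mode is an assumption that follows from the only split modes being 2-way and 4-way.

```c
#include <stdbool.h>

#define MAX_THREADS  8  /* threads per POWER8 core, as in the patch */
#define MAX_SUBCORES 4  /* subcores per core, as in the patch */

/*
 * Hypothetical check: can n_subcores subcores, each running a vcore with
 * at most n_threads vcpu threads, share one core, given the
 * dynamic_mt_modes bitmap (2 = 2-way allowed, 4 = 4-way allowed)?
 */
bool subcore_config_ok_sketch(int n_subcores, int n_threads, int mt_modes)
{
	int split;

	if (n_subcores < 1 || n_subcores > MAX_SUBCORES)
		return false;

	if (n_subcores == 1)
		split = 1;		/* whole-core mode, no split needed */
	else if (n_subcores == 2 && (mt_modes & 2))
		split = 2;		/* 2-way micro-threading */
	else if (mt_modes & 4)
		split = 4;		/* 3 or 4 subcores, or 2 with 2-way disabled */
	else
		return false;		/* required split mode is not enabled */

	/* Each subcore gets MAX_THREADS / split hardware threads. */
	return n_threads <= MAX_THREADS / split;
}
```

A check like this would only be relevant when the host cores are statically in whole-core mode, since dynamic micro-threading is not used otherwise.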

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.
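The invariant behind this early-exit case (subtract the guest timebase offset on exit only if it was added on entry) can be modelled in C with a flag set by the entry path. This is a hypothetical sketch, not the real-mode assembly in book3s_hv_rmhandlers.S; struct tb_sketch and both functions are invented for illustration, and the model ignores the fact that the timebase keeps counting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one core's timebase state. */
struct tb_sketch {
	uint64_t tb;		/* current timebase value */
	bool offset_applied;	/* set only once the guest offset is added */
};

/* Entry path: switch to guest context and apply the offset. */
void guest_entry_sketch(struct tb_sketch *s, uint64_t tb_offset,
			bool exit_requested)
{
	if (exit_requested)
		return;		/* exit requested early: never enter the guest */
	s->tb += tb_offset;
	s->offset_applied = true;
}

/* Exit path: undo the offset only if the entry path applied it. */
void guest_exit_sketch(struct tb_sketch *s, uint64_t tb_offset)
{
	if (!s->offset_applied)
		return;		/* skip the whole switch-back-to-host path */
	s->tb -= tb_offset;
	s->offset_applied = false;
}
```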

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
v2: Add a test (3 lines) to book3s_hv_rmhandlers.S to ensure that we
don't subtract the timebase offset in cases where we didn't add it.
This fixes a bug found in testing where the timebase could get out of
sync, causing soft lockups and crashes.

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h   |   3 +
 arch/powerpc/kernel/asm-offsets.c |   7 +
 arch/powerpc/kvm/book3s_hv.c  | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS    8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore

[PATCH 5/5] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD

2015-06-24 Thread Paul Mackerras

This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes.  When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.
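The bookkeeping described above can be modelled in plain C: the hypercall reports the R/C state to the guest, clears R in both the guest view and the real HPTE, and records the reference in the rmap word so that host-side ageing still sees it. This is a simplified, hypothetical model rather than kvmppc_h_clear_ref() itself; the struct, the function name and the R/C bit values are assumptions, while the shift of 32 comes from KVMPPC_RMAP_RC_SHIFT.

```c
#include <stdint.h>

/* Assumed bit values for the R (referenced) and C (changed) bits. */
#define SK_HPTE_R_R		0x100ull
#define SK_HPTE_R_C		0x080ull
/* As in the kernel headers: R/C are stored in the rmap word shifted by 32. */
#define SK_RMAP_RC_SHIFT	32
#define SK_RMAP_REFERENCED	(SK_HPTE_R_R << SK_RMAP_RC_SHIFT)

/* Hypothetical, simplified model of one guest page mapping. */
struct page_sketch {
	uint64_t guest_rpte;	/* guest's view of the HPTE second word */
	uint64_t real_r;	/* real HPTE second word */
	uint64_t rmap;		/* host rmap entry for the backing page */
};

/* Test-and-clear R: returns the R/C bits as the guest should see them. */
uint64_t clear_ref_sketch(struct page_sketch *p)
{
	/* Merge hardware-set R/C bits into the value reported to the guest. */
	uint64_t gr = p->guest_rpte | (p->real_r & (SK_HPTE_R_R | SK_HPTE_R_C));

	p->guest_rpte &= ~SK_HPTE_R_R;
	if (p->real_r & SK_HPTE_R_R) {
		p->real_r &= ~SK_HPTE_R_R;	/* so future references are seen */
		p->rmap |= SK_RMAP_REFERENCED;	/* keep the info for host ageing */
	}
	return gr & (SK_HPTE_R_R | SK_HPTE_R_C);
}
```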

These hypercalls are not used by Linux guests.  These implementations
have been tested using a FreeBSD guest.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 126 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   4 +-
 2 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c7a3ab2..c1df9bb 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -112,25 +112,38 @@ void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize)
 }
 EXPORT_SYMBOL_GPL(kvmppc_update_rmap_change);
 
+/* Returns a pointer to the revmap entry for the page mapped by a HPTE */
+static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
+ unsigned long hpte_gr)
+{
+   struct kvm_memory_slot *memslot;
+   unsigned long *rmap;
+   unsigned long gfn;
+
+   gfn = hpte_rpn(hpte_gr, hpte_page_size(hpte_v, hpte_gr));
+   memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
+   if (!memslot)
+   return NULL;
+
+   rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
+   return rmap;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
struct revmap_entry *rev,
unsigned long hpte_v, unsigned long hpte_r)
 {
struct revmap_entry *next, *prev;
-   unsigned long gfn, ptel, head;
-   struct kvm_memory_slot *memslot;
+   unsigned long ptel, head;
unsigned long *rmap;
unsigned long rcbits;
 
rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
ptel = rev->guest_rpte |= rcbits;
-   gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
-   memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
-   if (!memslot)
+   rmap = revmap_for_hpte(kvm, hpte_v, ptel);
+   if (!rmap)
return;
-
-   rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
lock_rmap(rmap);
 
head = *rmap & KVMPPC_RMAP_INDEX;
@@ -678,6 +691,105 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
return H_SUCCESS;
 }
 
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+   unsigned long pte_index)
+{
+   struct kvm *kvm = vcpu->kvm;
+   __be64 *hpte;
+   unsigned long v, r, gr;
+   struct revmap_entry *rev;
+   unsigned long *rmap;
+   long ret = H_NOT_FOUND;
+
+   if (pte_index >= kvm->arch.hpt_npte)
+   return H_PARAMETER;
+
+   rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+   hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+   while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+   cpu_relax();
+   v = be64_to_cpu(hpte[0]);
+   r = be64_to_cpu(hpte[1]);
+   if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+   goto out;
+
+   gr = rev->guest_rpte;
+   if (rev->guest_rpte & HPTE_R_R) {
+   rev->guest_rpte &= ~HPTE_R_R;
+   note_hpte_modification(kvm, rev);
+   }
+   if (v & HPTE_V_VALID) {
+   gr |= r & (HPTE_R_R | HPTE_R_C);
+   if (r & HPTE_R_R) {
+   kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
+   rmap = revmap_for_hpte(kvm, v, gr);
+   if (rmap) {
+   lock_rmap(rmap);
+   *rmap |= KVMPPC_RMAP_REFERENCED;
+   unlock_rmap(rmap);
+   }
+   }
+   }
+   vcpu->arch.gpr[4] = gr;
+   ret = H_SUCCESS;
+ out:
+   unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+   return ret;
+}
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+   unsigned long pte_index)
+{
+   struct kvm *kvm = vcpu->kvm;
+   __be64 *hpte;
+   unsigned long v, r, gr;
+   struct revmap_entry *rev;
+   unsigned long *rmap;
+   long ret = H_NOT_FOUND;
+
+   if (pte_index >= kvm->arch.hpt_npte)
+   return H_PARAMETER;
+
+   rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+   hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+   while (!try_lock_hpte(hpte

[PATCH 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-06-24 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h   |   3 +
 arch/powerpc/kernel/asm-offsets.c |   7 +
 arch/powerpc/kvm/book3s_hv.c  | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS    8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index d333664..c3e11e0 100644

[PATCH 4/5] KVM: PPC: Book3S HV: Fix bug in dirty page tracking

2015-06-24 Thread Paul Mackerras

This fixes a bug in the tracking of pages that get modified by the
guest.  If the guest creates a large-page HPTE, writes to memory
somewhere within the large page, and then removes the HPTE, we only
record the modified state for the first normal page within the large
page, when in fact the guest might have modified some other normal
page within the large page.

To fix this we use some unused bits in the rmap entry to record the
order (log base 2) of the size of the page that was modified, when
removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
order to return the correct number of modified pages.
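The encoding can be illustrated with a small C sketch: the log2 of the modified page size is kept in a 6-bit field at bit 48 of the rmap word (KVMPPC_RMAP_CHG_SHIFT and KVMPPC_RMAP_CHG_ORDER in the patch), and the number of dirty base pages is derived from it with the same arithmetic as kvm_test_clear_dirty_npages(). The function names, the keep-the-largest-order policy and the 64K base page size used in the usage example are assumptions for illustration, and the sketch assumes the page is already known to be changed.

```c
#include <stdint.h>

#define RMAP_CHG_SHIFT	48				/* KVMPPC_RMAP_CHG_SHIFT */
#define RMAP_CHG_ORDER	(0x3full << RMAP_CHG_SHIFT)	/* KVMPPC_RMAP_CHG_ORDER */

/* log2 of a power-of-two page size. */
static unsigned int order_of(uint64_t psize)
{
	unsigned int order = 0;

	while (psize > 1) {
		psize >>= 1;
		order++;
	}
	return order;
}

/* Record that a page of size psize was changed, keeping the largest order. */
void record_change_sketch(uint64_t *rmap, uint64_t psize)
{
	uint64_t order;

	if (!psize)
		return;
	order = (uint64_t)order_of(psize) << RMAP_CHG_SHIFT;
	if (order > (*rmap & RMAP_CHG_ORDER))
		*rmap = (*rmap & ~RMAP_CHG_ORDER) | order;
}

/* Base pages to report as dirty, assuming the changed bit is set. */
uint64_t dirty_npages_sketch(uint64_t rmap, unsigned int page_shift)
{
	uint64_t change_order = (rmap & RMAP_CHG_ORDER) >> RMAP_CHG_SHIFT;

	if (change_order > page_shift)
		return 1ull << (change_order - page_shift);
	return 1;
}
```

For example, a change recorded for a 16M page (order 24) with 64K base pages (page_shift 16) reports 1 << 8 = 256 dirty pages instead of 1.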

The same thing could in principle happen when removing a HPTE at the
host's request, i.e. when paging out a page, except that we never
page out large pages, and the guest can only create large-page HPTEs
if the guest RAM is backed by large pages.  However, we also fix
this case for the sake of future-proofing.

The reference bit is also subject to the same loss of information.  We
don't make the same fix here for the reference bit because there isn't
an interface for userspace to find out which pages the guest has
referenced, whereas there is one for userspace to find out which pages
the guest has modified.  Because of this loss of information, the
kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
say that a page has not been referenced when it has, but that doesn't
matter greatly because we never page or swap out large pages.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s.h |  1 +
 arch/powerpc/include/asm/kvm_host.h   |  2 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  8 +++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 17 +
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index b91e74a..e6b2534 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -158,6 +158,7 @@ extern pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
bool *writable);
 extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
unsigned long *rmap, long pte_index, int realmode);
+extern void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize);
 extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
unsigned long pte_index);
 void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 80eb29a..e187b6a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -205,8 +205,10 @@ struct revmap_entry {
  */
 #define KVMPPC_RMAP_LOCK_BIT   63
 #define KVMPPC_RMAP_RC_SHIFT   32
+#define KVMPPC_RMAP_CHG_SHIFT  48
 #define KVMPPC_RMAP_REFERENCED (HPTE_R_R << KVMPPC_RMAP_RC_SHIFT)
 #define KVMPPC_RMAP_CHANGED    (HPTE_R_C << KVMPPC_RMAP_RC_SHIFT)
+#define KVMPPC_RMAP_CHG_ORDER  (0x3ful << KVMPPC_RMAP_CHG_SHIFT)
 #define KVMPPC_RMAP_PRESENT    0x100000000ul
 #define KVMPPC_RMAP_INDEX      0xfffffffful
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dab68b7..1f9c0a1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -761,6 +761,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
/* Harvest R and C */
rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+   if (rcbits & HPTE_R_C)
+   kvmppc_update_rmap_change(rmapp, psize);
if (rcbits & ~rev[i].guest_rpte) {
rev[i].guest_rpte = ptel | rcbits;
note_hpte_modification(kvm, &rev[i]);
@@ -927,8 +929,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
  retry:
lock_rmap(rmapp);
if (*rmapp & KVMPPC_RMAP_CHANGED) {
-   *rmapp &= ~KVMPPC_RMAP_CHANGED;
+   long change_order = (*rmapp & KVMPPC_RMAP_CHG_ORDER)
+       >> KVMPPC_RMAP_CHG_SHIFT;
+   *rmapp &= ~(KVMPPC_RMAP_CHANGED | KVMPPC_RMAP_CHG_ORDER);
npages_dirty = 1;
+   if (change_order > PAGE_SHIFT)
+   npages_dirty = 1ul << (change_order - PAGE_SHIFT);
}
if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c6d601c..c7a3ab2 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -12,6 +12,7 @@
 #include <linux/kvm_host.h>
 #include <linux/hugetlb.h>
 #include <linux/module.h>
+#include <linux/log2.h>
 
 #include asm

Re: [PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core

2015-06-21 Thread Paul Mackerras

On Wed, Jun 17, 2015 at 07:30:09PM +0200, Laurent Vivier wrote:
> 
> Tested-by: Laurent Vivier <lviv...@redhat.com>
> 
> Performance is better, but Paul could you explain why it is better if I 
> disable dynamic micro-threading ?
> Did I miss something ?
> 
> My test system is an IBM Power S822L.
> 
> I run two guests with 8 vCPUs (-smp 8,sockets=8,cores=1,threads=1) both
> attached on the same core (with pinning option of virt-manager). Then, I
> measure the time needed to compile a kernel in parallel in both guests
> with make -j 16.
> 
> My kernel without micro-threading:
> 
> real    37m23.424s    real    37m24.959s
> user    167m31.474s   user    165m44.142s
> sys     113m26.195s   sys     113m45.072s
> 
> With micro-threading patches (PATCH 1+2):
> 
> target_smt_mode 0 [in fact It was 8 here, but it should behave like 0, as it 
> is max threads/sub-core]
> dynamic_mt_modes 6
> 
> real    32m13.338s    real    32m26.652s
> user    139m21.181s   user    140m20.994s
> sys     77m35.339s    sys     78m16.599s
> 
> It's better, but if I disable dynamic micro-threading (but PATCH 1+2):
> 
> target_smt_mode 0
> dynamic_mt_modes 0
> 
> real    30m49.100s    real    30m48.161s
> user    144m22.989s   user    142m53.886s
> sys     65m4.942s     sys     66m8.159s
> 
> it's even better.

I think what's happening here is that with dynamic_mt_modes=0 the
system alternates between the two guests, whereas with
dynamic_mt_modes=6 it will spend some of the time running both guests
simultaneously in two-way split mode.  Since you have two
compute-bound guests that each have threads=1 and 8 vcpus, it can fill
up the core either way.  In that case it is more efficient to fill up
the core with vcpus from one guest and not have to split the core,
firstly because you avoid the split/unsplit latency and secondly
because the threads run a little faster in whole-core mode than in
split-core.

I am considering adding an additional heuristic, which would be to do
two passes through the list of preempted vcores, considering only
vcores from the same guest as the primary vcore on the first pass, and
then considering all vcores on the second pass.  Maybe we could then
also say after the first pass that if we have collected 4 or more
runnable vcpus we don't bother with the second pass.
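The two-pass heuristic described above could look roughly like the following. This is a hypothetical sketch over an array rather than the kernel's per-cpu preempted-vcore list; struct vcore_sketch, its fields and the simple "total threads fit on the core" capacity check are assumptions (the check ignores split-mode constraints), while the threshold of 4 runnable vcpus comes from the paragraph above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 8		/* threads per POWER8 core */

/* Hypothetical stand-in for a preempted vcore. */
struct vcore_sketch {
	int vm_id;		/* which guest the vcore belongs to */
	int n_runnable;		/* runnable vcpus in this vcore */
	bool chosen;		/* selected to run alongside the primary */
};

/*
 * Pass 1 considers only vcores from the primary vcore's guest; pass 2,
 * run only if fewer than 4 vcpus were collected, considers all remaining
 * vcores. Returns the total number of runnable vcpus selected.
 */
int select_vcores_sketch(const struct vcore_sketch *primary,
			 struct vcore_sketch *preempted, size_t n)
{
	int total = primary->n_runnable;

	for (int pass = 0; pass < 2; pass++) {
		if (pass == 1 && total >= 4)
			break;		/* core already reasonably full */
		for (size_t i = 0; i < n; i++) {
			struct vcore_sketch *vc = &preempted[i];

			if (vc->chosen)
				continue;
			if (pass == 0 && vc->vm_id != primary->vm_id)
				continue;
			if (total + vc->n_runnable > MAX_THREADS)
				continue;	/* would not fit on the core */
			vc->chosen = true;
			total += vc->n_runnable;
		}
	}
	return total;
}
```

Preferring same-guest vcores on the first pass keeps the core unsplit when it can be filled from one guest, which matches the explanation of Laurent's results.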

Paul.

[PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core

2015-05-27 Thread Paul Mackerras

This patch series provides a way to use more of the capacity of each
processor core when running guests configured with threads=1, 2 or 4
on a POWER8 host with HV KVM, without having to change the static
micro-threading (the official name for split-core) mode for the whole
machine.  The problem with setting the machine to static 2-way or
4-way micro-threading mode is that (a) then you can't run guests with
threads=8 and (b) selecting the right mode can be tricky and requires
knowledge of what guests you will be running.

Instead, with these two patches, we can now run more than one virtual
core (vcore) on a given physical core if possible, and if that means
we need to switch the core to 2-way or 4-way micro-threading mode,
then we do that on entry to the guests and switch back to whole-core
mode on exit (and we only switch the one core, not the whole machine).
The core mode switching is only done if the machine is in static
whole-core mode.

All of this only comes into effect when a core is over-committed.
When the machine is lightly loaded everything operates the same with
these patches as without.  Only when some core has a vcore that is
able to run while there is also another vcore that was wanting to run
on that core but got preempted does the logic kick in to try to run
both vcores at once.

Paul.
---

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 +
 arch/powerpc/include/asm/kvm_host.h   |  22 +-
 arch/powerpc/kernel/asm-offsets.c |   9 +
 arch/powerpc/kvm/book3s_hv.c  | 648 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |  32 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c  |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 111 -
 7 files changed, 740 insertions(+), 106 deletions(-)

[PATCH 1/2] KVM: PPC: Book3S HV: Make use of unused threads when running guests

2015-05-27 Thread Paul Mackerras

When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical
threads are currently unused.  This makes it possible to use them to
run one or more other virtual cores from the same guest when certain
conditions are met.  This applies on POWER7, and on POWER8 to guests
with one thread per virtual core.  (It doesn't apply to POWER8 guests
with multiple threads per vcore because they require a 1-1 virtual to
physical thread mapping in order to be able to use msgsndp and the
TIR.)

The idea is that we maintain a list of preempted vcores for each
physical cpu (i.e. each core, since the host runs single-threaded).
Then, when a vcore is about to run, it checks to see if there are
any vcores on the list for its physical cpu that could be
piggybacked onto this vcore's execution.  If so, those additional
vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
threads are started as well as the original vcore, which is called
the master vcore.

After the vcores have exited the guest, the extra ones are put back
onto the preempted list if any of their VCPUs are still runnable and
not idle.
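The post-exit step for a piggybacked vcore might be sketched as follows. The vcore_state values are the ones this patch defines; the structs and the function are hypothetical, and marking a vcore with no remaining work as VCORE_INACTIVE is an assumption rather than something stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/* vcore_state values as defined by this patch (subset) */
#define VCORE_INACTIVE  0
#define VCORE_PREEMPT   1
#define VCORE_PIGGYBACK 2

/* Hypothetical, simplified vcpu: is it runnable, and is it idle? */
struct vcpu_sketch {
	bool runnable;
	bool idle;
};

/* Hypothetical, simplified vcore that was piggybacked onto a master vcore. */
struct pvcore_sketch {
	int state;
	struct vcpu_sketch *vcpus;
	size_t n_vcpus;
};

/*
 * After the guest exit: if any vcpu is still runnable and not idle, mark
 * the vcore preempted and return true so the caller can put it back on the
 * preempted list; otherwise mark it inactive and return false.
 */
bool post_exit_sketch(struct pvcore_sketch *vc)
{
	for (size_t i = 0; i < vc->n_vcpus; i++) {
		if (vc->vcpus[i].runnable && !vc->vcpus[i].idle) {
			vc->state = VCORE_PREEMPT;
			return true;
		}
	}
	vc->state = VCORE_INACTIVE;
	return false;
}
```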

This means that vcpu->arch.ptid is no longer necessarily the same as
the physical thread that the vcpu runs on.  In order to make it easier
for code that wants to send an IPI to know which CPU to target, we
now store that in a new field in struct vcpu_arch, called thread_cpu.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |  19 +-
 arch/powerpc/kernel/asm-offsets.c   |   2 +
 arch/powerpc/kvm/book3s_hv.c| 333 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c|   7 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c|   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   5 +
 6 files changed, 298 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..2b74490 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -278,7 +278,9 @@ struct kvmppc_vcore {
u16 last_cpu;
u8 vcore_state;
u8 in_guest;
+   struct kvmppc_vcore *master_vcore;
struct list_head runnable_threads;
+   struct list_head preempt_list;
spinlock_t lock;
wait_queue_head_t wq;
spinlock_t stoltb_lock; /* protects stolen_tb and preempt_tb */
@@ -300,12 +302,18 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
-/* Values for vcore_state */
+/*
+ * Values for vcore_state.
+ * Note that these are arranged such that lower values
+ * (< VCORE_SLEEPING) don't require stolen time accounting
+ * on load/unload, and higher values do.
+ */
 #define VCORE_INACTIVE 0
-#define VCORE_SLEEPING 1
-#define VCORE_PREEMPT  2
-#define VCORE_RUNNING  3
-#define VCORE_EXITING  4
+#define VCORE_PREEMPT  1
+#define VCORE_PIGGYBACK 2
+#define VCORE_SLEEPING 3
+#define VCORE_RUNNING  4
+#define VCORE_EXITING  5
 
 /*
  * Struct used to manage memory for a virtual processor area
@@ -619,6 +627,7 @@ struct kvm_vcpu_arch {
int trap;
int state;
int ptid;
+   int thread_cpu;
bool timer_running;
wait_queue_head_t cpu_run;
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0034b6b..d333664 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -512,6 +512,8 @@ int main(void)
DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst));
+   DEFINE(VCPU_CPU, offsetof(struct kvm_vcpu, cpu));
+   DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68d067a..2048309 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,9 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 #define MPP_BUFFER_ORDER   3
 #endif
 
+static int target_smt_mode;
+module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -114,7 +117,7 @@ static bool kvmppc_ipi_thread(int cpu)
 
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-   int cpu = vcpu->cpu;
+   int cpu;
wait_queue_head_t *wqp;
 
wqp = kvm_arch_vcpu_wq(vcpu);
@@ -123,10 +126,11 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
++vcpu->stat.halt_wakeup
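
The reordered vcore_state values in the kvm_host.h hunk above let a single comparison decide whether a state needs stolen time accounting on load/unload. Here is a minimal sketch of that check using the values from the diff. needs_stolen_time_accounting() is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* vcore_state values as reordered by this patch (copied from the diff). */
#define VCORE_INACTIVE	0
#define VCORE_PREEMPT	1
#define VCORE_PIGGYBACK	2
#define VCORE_SLEEPING	3
#define VCORE_RUNNING	4
#define VCORE_EXITING	5

/*
 * States below VCORE_SLEEPING need no stolen time accounting on
 * load/unload; VCORE_SLEEPING and above do (per the patch's comment).
 */
static bool needs_stolen_time_accounting(int vcore_state)
{
	return vcore_state >= VCORE_SLEEPING;
}
```

Placing VCORE_PREEMPT and the new VCORE_PIGGYBACK below VCORE_SLEEPING is what keeps this a one-line test.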

[PATCH 2/2] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8

2015-05-27 Thread Paul Mackerras

This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.
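
To make the thread arithmetic concrete: a POWER8 core has 8 threads (MAX_THREADS in the patch), and splitting it 2 or 4 ways gives subcores of 4 or 2 threads. The sketch below picks the smallest split that gives each vcore its own large-enough subcore. It is a simplified model, not the kernel's selection logic. min_split_ways() is a hypothetical name, and piggybacking several vcores onto one subcore is ignored.

```c
#include <assert.h>

#define MAX_THREADS	8	/* threads per POWER8 core, as in the patch */

/*
 * Return the smallest split (1 = whole-core, 2-way or 4-way) in which each of
 * the n vcores gets its own subcore with enough threads, or 0 if none works.
 * Simplified model: one vcore per subcore, no piggybacking.
 */
static int min_split_ways(const int *vcore_threads, int n)
{
	static const int ways[] = { 1, 2, 4 };
	int i, j;

	for (i = 0; i < 3; i++) {
		int subcore_size = MAX_THREADS / ways[i];
		int ok = n <= ways[i];	/* one subcore per vcore */

		for (j = 0; ok && j < n; j++)
			if (vcore_threads[j] > subcore_size)
				ok = 0;
		if (ok)
			return ways[i];
	}
	return 0;
}
```

For example, four 2-thread vcores from four different VMs fill a 4-way split core. That is the "8 vcpu threads from up to 4 different VMs" case.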

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.
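
The two options can be pictured with a deliberately simplified model. Everything in this sketch is hypothetical: the structures, place_vcore(), and the rule that piggybacking needs the same VM and enough free threads in the subcore are assumptions made for illustration. The real code applies further conditions, for example the POWER8 one-thread-per-vcore restriction described in patch 1/2.

```c
#include <assert.h>

#define MAX_THREADS	8	/* threads per physical core (from the patch) */
#define MAX_SUBCORES	4	/* subcores per physical core (from the patch) */

/* Hypothetical view of one subcore: which VM it runs, how many threads it uses. */
struct subcore_model {
	int vm_id;
	int nthreads;
};

struct core_model {
	int nsubcores;
	struct subcore_model sc[MAX_SUBCORES];
};

enum placement { PLACE_NONE, PLACE_PIGGYBACK, PLACE_NEW_SUBCORE };

/* Threads per subcore when split n ways: 1 -> 8, 2 -> 4, 3 or 4 -> 2 (4-way mode). */
static int subcore_capacity(int n)
{
	return n <= 1 ? MAX_THREADS : n == 2 ? MAX_THREADS / 2 : MAX_THREADS / 4;
}

static enum placement place_vcore(struct core_model *c, int vm_id, int nthreads)
{
	int i, cap = subcore_capacity(c->nsubcores);

	/* (a) piggyback onto an existing subcore running the same VM */
	for (i = 0; i < c->nsubcores; i++) {
		if (c->sc[i].vm_id == vm_id && c->sc[i].nthreads + nthreads <= cap) {
			c->sc[i].nthreads += nthreads;
			return PLACE_PIGGYBACK;
		}
	}

	/* (b) start a new subcore: all subcores must fit the smaller size */
	if (c->nsubcores >= MAX_SUBCORES)
		return PLACE_NONE;
	cap = subcore_capacity(c->nsubcores + 1);
	if (nthreads > cap)
		return PLACE_NONE;
	for (i = 0; i < c->nsubcores; i++)
		if (c->sc[i].nthreads > cap)
			return PLACE_NONE;
	c->sc[c->nsubcores].vm_id = vm_id;
	c->sc[c->nsubcores].nthreads = nthreads;
	c->nsubcores++;
	return PLACE_NEW_SUBCORE;
}
```

Trying (a) before (b) avoids splitting the core further when an existing subcore still has room.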

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.
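
Read as a bitmap, the parameter is tested against the split width itself. The value 2 enables 2-way mode, the value 4 enables 4-way mode, and the default of 6 (2 | 4) enables both. The following is a minimal sketch of that test. mt_mode_allowed() is a hypothetical helper, not a function from the patch, and whole-core mode is treated as always permitted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check against a dynamic_mt_modes-style bitmap: the bit whose
 * value equals the split width (2 or 4) must be set.  Whole-core mode
 * (ways == 1) needs no permission; other widths are never valid.
 */
static bool mt_mode_allowed(int dynamic_mt_modes, int ways)
{
	if (ways == 1)
		return true;
	if (ways != 2 && ways != 4)
		return false;
	return (dynamic_mt_modes & ways) != 0;
}
```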

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h       |   3 +
 arch/powerpc/kernel/asm-offsets.c         |   7 +
 arch/powerpc/kvm/book3s_hv.c              | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 106 +++--
 6 files changed, 469 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR  0xc
 #define XICS_IPI   2   /* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS    8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES   4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+   unsigned long   rpr;
+   unsigned long   pmmar;
+   unsigned long   ldbar;
+   u8  subcore_size;
+   u8  do_nap;
+   u8  napped[MAX_THREADS];
+   struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
u64 host_spurr;
u64 host_dscr;
u64 dec_expires;
+   struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc) ((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)   (VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ 0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index d333664..c3e11e0 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -676,7 +676,14 @@ int main(void)
HSTATE_FIELD(HSTATE_DSCR, host_dscr);
HSTATE_FIELD(HSTATE_DABR, dabr);
HSTATE_FIELD(HSTATE_DECEXP, dec_expires);
+   HSTATE_FIELD(HSTATE_SPLIT_MODE, kvm_split_mode);
DEFINE(IPI_PRIORITY, IPI_PRIORITY);
+   DEFINE(KVM_SPLIT_RPR, offsetof(struct

[PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core

2015-05-27 Thread Paul Mackerras

This patch series provides a way to use more of the capacity of each
processor core when running guests configured with threads=1, 2 or 4
on a POWER8 host with HV KVM, without having to change the static
micro-threading (the official name for split-core) mode for the whole
machine.  The problem with setting the machine to static 2-way or
4-way micro-threading mode is that (a) then you can't run guests with
threads=8 and (b) selecting the right mode can be tricky and requires
knowledge of what guests you will be running.

Instead, with these two patches, we can now run more than one virtual
core (vcore) on a given physical core if possible, and if that means
we need to switch the core to 2-way or 4-way micro-threading mode,
then we do that on entry to the guests and switch back to whole-core
mode on exit (and we only switch the one core, not the whole machine).
The core mode switching is only done if the machine is in static
whole-core mode.

All of this only comes into effect when a core is over-committed.
When the machine is lightly loaded everything operates the same with
these patches as without.  Only when some core has a vcore that is
able to run while there is also another vcore that was wanting to run
on that core but got preempted does the logic kick in to try to run
both vcores at once.

Paul.
---

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 +
 arch/powerpc/include/asm/kvm_host.h       |  22 +-
 arch/powerpc/kernel/asm-offsets.c         |   9 +
 arch/powerpc/kvm/book3s_hv.c              | 648 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  32 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c      |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 111 -
 7 files changed, 740 insertions(+), 106 deletions(-)

Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-05-20 Thread Paul Mackerras

On Wed, May 20, 2015 at 03:26:12PM +1000, Sam Bobroff wrote:
> In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
> bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
> accessed as such.
> 
> This patch corrects places where it is accessed as a 32 bit field by a
> 64 bit kernel.  In some cases this is via a 32 bit load or store
> instruction which, depending on endianness, will cause either the
> lower or upper 32 bits to be missed.  In another case it is cast as a
> u32, causing the upper 32 bits to be cleared.
> 
> This patch corrects those places by extending the access methods to
> 64 bits.
> 
> Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>

Acked-by: Paul Mackerras <pau...@samba.org>
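
The following userspace illustration shows the two failure modes described in the quoted message, assuming XER is held in a 64-bit field. A 32-bit load from the field's address sees the low half on little-endian and the high half on big-endian. A cast to u32 discards the upper 32 bits. The helper names are hypothetical, and memcpy() stands in for a 32-bit load instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What a 32-bit load from the start of a 64-bit field sees (first 4 bytes in memory). */
static uint32_t load32_at_base(const uint64_t *field)
{
	uint32_t w;

	memcpy(&w, field, sizeof(w));
	return w;
}

/* 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}
```

On big-endian, the load therefore returns the upper half, which ISA 2.06 reserves. It misses the architected XER bits in the lower half.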

Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-05-20 Thread Paul Mackerras

On Wed, May 20, 2015 at 05:35:08PM -0500, Scott Wood wrote:
 
 It's nominally a 64-bit register, but the upper 32 bits are reserved in
 ISA 2.06.  Do newer ISAs or certain implementations define things in the
 upper 32 bits, or is this just about the asm accesses being wrong on
 big-endian?

It's primarily about the asm accesses being wrong.

Paul.

[PATCH] KVM: PPC: Book3S HV: Fix list traversal in error case

2015-04-28 Thread Paul Mackerras

This fixes a regression introduced in commit 25fedfca94cf, "KVM: PPC:
Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which
leads to a user-triggerable oops.

In the case where we try to run a vcore on a physical core that is
not in single-threaded mode, or the vcore has too many threads for
the physical core, we iterate the list of runnable vcpus to make
each one return an EBUSY error to userspace.  Since this involves
taking each vcpu off the runnable_threads list for the vcore, we
need to use list_for_each_entry_safe rather than list_for_each_entry
to traverse the list.  Otherwise the kernel will crash with an oops
message like this:

Unable to handle kernel paging request for data at address 0x000fff88
Faulting instruction address: 0xd0001e635dc8
Oops: Kernel access of bad area, sig: 11 [#2]
SMP NR_CPUS=1024 NUMA PowerNV
...
CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G  D3.18.0 #1
task: c0274e507500 ti: c027d1924000 task.ti: c027d1924000
NIP: d0001e635dc8 LR: d0001e635df8 CTR: c011ba50
REGS: c027d19275b0 TRAP: 0300   Tainted: G  D (3.18.0)
MSR: 90009033 SF,HV,EE,ME,IR,DR,RI,LE  CR: 22002824  XER: 
CFAR: c0008468 DAR: 000fff88 DSISR: 4000 SOFTE: 1
GPR00: d0001e635df8 c027d1927830 d0001e64c850 0001
GPR04: 0001 0001  
GPR08: 00200200   d0001e63e588
GPR12: 2200 c7dbc800 c00fc780 000a
GPR16: fffc c00fd5439690 c00fc7801c98 0001
GPR20: 0003 c027d1927aa8 c00fd543b348 c00fd543b350
GPR24:  c00fa57f 0030 
GPR28: fff0 c00fd543b328 000fe468 c00fd543b300
NIP [d0001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv]
LR [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv]
Call Trace:
[c027d1927830] [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] (unreliable)
[c027d1927a30] [d0001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv]
[c027d1927b70] [d0001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm]
[c027d1927ba0] [d0001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[c027d1927be0] [d0001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm]
[c027d1927d40] [c02d6720] do_vfs_ioctl+0x490/0x780
[c027d1927de0] [c02d6ae4] SyS_ioctl+0xd4/0xf0
[c027d1927e30] [c0009358] syscall_exit+0x0/0x98
Instruction dump:
6000 6042 387e1b30 3883 38a1 38c0 480087d9 e8410018
ebde1c98 7fbdf040 3bdee368 419e0048 813e1b20 939e1b18 2f890001 409effcc
---[ end trace 8cdf50251cca6680 ]---
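
Why the _safe variant matters can be shown in userspace with a minimal re-implementation of the <linux/list.h> pattern. The list functions, LIST_POISON values, struct vcpu_stub and fail_all_runnable() below are simplified stand-ins, not the kernel's code. The kernel's list_del() poisons the removed entry's pointers. A plain list_for_each_entry() that reads pos->next after removing pos therefore follows a poison pointer. Saving the successor first, as list_for_each_entry_safe() does, avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal circular doubly-linked list modelled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Illustrative poison values: a removed entry must never be followed. */
#define LIST_POISON1 ((struct list_head *)0x100)
#define LIST_POISON2 ((struct list_head *)0x200)

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;	/* reading this after removal is the bug */
	entry->prev = LIST_POISON2;
}

struct vcpu_stub {
	int ret;
	struct list_head run_list;
};

/*
 * Fail and unlink every runnable vcpu.  'next' is captured before list_del(),
 * the same role as the extra cursor in list_for_each_entry_safe().
 */
static int fail_all_runnable(struct list_head *runnable)
{
	struct list_head *pos, *next;
	int n = 0;

	for (pos = runnable->next, next = pos->next; pos != runnable;
	     pos = next, next = pos->next) {
		struct vcpu_stub *v = container_of(pos, struct vcpu_stub, run_list);

		v->ret = -EBUSY;
		list_del(pos);
		n++;
	}
	return n;
}
```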

Fixes: 25fedfca94cf
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
Since this is a regression fix for a patch that went in post 4.0,
it should go in for 4.1.

 arch/powerpc/kvm/book3s_hv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 48d3c5d..df81caa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1952,7 +1952,7 @@ static void post_guest_process(struct kvmppc_vcore *vc)
  */
 static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-   struct kvm_vcpu *vcpu;
+   struct kvm_vcpu *vcpu, *vnext;
int i;
int srcu_idx;
 
@@ -1982,7 +1982,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 */
if ((threads_per_core > 1) &&
((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-   list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
+   list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+                            arch.run_list) {
vcpu->arch.ret = -EBUSY;
kvmppc_remove_runnable(vc, vcpu);
wake_up(&vcpu->arch.cpu_run);
-- 
2.1.4
