Re: [PATCH] KVM: PPC: Increase memslots to 512
On Wed, Dec 09, 2015 at 11:34:07AM +0100, Thomas Huth wrote:
> Only using 32 memslots for KVM on powerpc is way too low, you can
> nowadays hit this limit quite fast by adding a couple of PCI devices
> and/or pluggable memory DIMMs to the guest.
>
> x86 already increased the KVM_USER_MEM_SLOTS to 509, to satisfy 256
> pluggable DIMM slots, 3 private slots and 253 slots for other things
> like PCI devices (i.e. resulting in 256 + 3 + 253 = 512 slots in
> total). We should do something similar for powerpc, and since we do
> not use private slots here, we can set the value to 512 directly.
>
> While we're at it, also remove the KVM_MEM_SLOTS_NUM definition
> from the powerpc-specific header since this gets defined in the
> generic kvm_host.h header anyway.
>
> Signed-off-by: Thomas Huth <th...@redhat.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.
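The slot arithmetic in the quoted commit message can be checked with a small C sketch. The EX_-prefixed names below are invented for this illustration and are not kernel identifiers; only the numbers come from the message, and the "total = user + private" relation is an assumption based on how the message adds them up.

```c
#include <assert.h>

/* Hypothetical names for illustration; only the values come from the thread. */
enum {
	EX_X86_DIMM_SLOTS    = 256, /* pluggable DIMM slots */
	EX_X86_PRIVATE_SLOTS = 3,   /* private slots */
	EX_X86_OTHER_SLOTS   = 253, /* PCI devices and other users */
	EX_PPC_PRIVATE_SLOTS = 0,   /* powerpc uses no private slots */
	EX_PPC_USER_SLOTS    = 512  /* value proposed by the patch */
};

/* Assumed relation: total slots = user-visible slots + private slots. */
static int ex_total_slots(int user_slots, int private_slots)
{
	assert(user_slots >= 0 && private_slots >= 0);
	return user_slots + private_slots;
}

/* x86 user-visible slots: everything except the private ones. */
static int ex_x86_user_slots(void)
{
	return EX_X86_DIMM_SLOTS + EX_X86_OTHER_SLOTS;
}
```

With no private slots on powerpc, setting the user-visible count to 512 gives the same total as x86 reaches with 509 + 3.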
Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
On Fri, Nov 20, 2015 at 09:11:45AM +0100, Thomas Huth wrote:
> In the old DABR register, the BT (Breakpoint Translation) bit
> is bit number 61. In the new DAWRX register, the WT (Watchpoint
> Translation) bit is bit number 59. So to move the DABR-BT bit
> into the position of the DAWRX-WT bit, it has to be shifted by
> two, not only by one. This fixes hardware watchpoints in gdb of
> older guests that only use the H_SET_DABR/X interface instead
> of the new H_SET_MODE interface.
>
> Signed-off-by: Thomas Huth <th...@redhat.com>

Thanks, applied to my kvm-ppc-next branch, with cc: sta...@vger.kernel.org.

Paul.
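The shift count in the quoted message follows from IBM bit numbering, where bit 0 is the most significant bit of a 64-bit register. A minimal sketch, assuming that numbering; the EX_ macro and function names are made up here and this is not the code from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering on a 64-bit register: bit 0 is the most significant. */
#define EX_PPC_BIT(n)  (1ULL << (63 - (n)))

#define EX_DABR_BT   EX_PPC_BIT(61) /* Breakpoint Translation, per the message */
#define EX_DAWRX_WT  EX_PPC_BIT(59) /* Watchpoint Translation, per the message */

/* Move the DABR BT bit into the DAWRX WT position: two places to the left. */
static uint64_t ex_dabr_bt_to_dawrx_wt(uint64_t dabr)
{
	uint64_t wt = (dabr & EX_DABR_BT) << 2;

	assert((wt & ~EX_DAWRX_WT) == 0); /* only the WT bit can be set */
	return wt;
}
```

Bit 61 has the value 0x4 and bit 59 the value 0x10, so a shift by one would land on bit 60 (0x8) rather than on WT.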
Re: [PATCH] kvm: remove unused variable 'vcpu_book3s'
On Tue, Dec 01, 2015 at 08:42:10PM -0300, Geyslan G. Bem wrote:
> The vcpu_book3s struct is assigned but never used. So remove it.
>
> Signed-off-by: Geyslan G. Bem <geys...@gmail.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.
[GIT PULL] Please pull my kvm-ppc-fixes branch
Hi Paolo,

I have a small patch that I would like to get into 4.4 because it
fixes a bug which for certain kernel configs allows userspace to crash
the kernel. The configs are those for which KVM_BOOK3S_64_HV is set
(y or m) and KVM_BOOK3S_64_PR is not. Fortunately most distros that
enable KVM_BOOK3S_64_HV also enable KVM_BOOK3S_64_PR, as far as I can
tell.

Thanks,
Paul.

The following changes since commit 09922076003ad66de41ea14d2f8c3b4a16ec7774:

  Merge tag 'kvm-arm-for-v4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master (2015-12-04 18:32:32 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

for you to fetch changes up to c20875a3e638e4a03e099b343ec798edd1af5cc6:

  KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR (2015-12-10 11:34:27 +1100)

Paul Mackerras (1):
      KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR

 arch/powerpc/kvm/book3s_hv.c | 6 ++
 1 file changed, 6 insertions(+)
Re: [PATCH 1/1] KVM: PPC: Increase memslots to 320
On Wed, Nov 04, 2015 at 10:03:48AM +0100, Thomas Huth wrote:
> Only using 32 memslots for KVM on powerpc is way too low, you can
> nowadays hit this limit quite fast by adding a couple of PCI devices
> and/or pluggable memory DIMMs to the guest.
> x86 already increased the limit to 512 in total, to satisfy 256
> pluggable DIMM slots, 3 private slots and 253 slots for other things
> like PCI devices. On powerpc, we only have 32 pluggable DIMMs in

I agree with increasing the limit. Is there a reason we have only 32
pluggable DIMMs in QEMU on powerpc, not more? Should we be increasing
that limit too? If so, maybe we should increase the number of memory
slots to 512?

> QEMU, not 256, so we likely do not as much slots as on x86. Thus

"so we likely do not need as many slots as on x86" would be better
English.

> setting the slot limit to 320 sounds like a good value for the
> time being (until we have some code in the future to resize the
> memslot array dynamically).
> And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
> from the powerpc-specific header since this gets defined in the
> generic kvm_host.h header anyway.
>
> Signed-off-by: Thomas Huth <th...@redhat.com>

Paul.
Re: [RFC] kvm - possible out of bounds
On Sun, Nov 29, 2015 at 05:14:03PM -0300, Geyslan Gregório Bem wrote:
> Hello,
>
> I have found a possible out of bounds reading in
> arch/powerpc/kvm/book3s_64_mmu.c (kvmppc_mmu_book3s_64_xlate
> function). pteg[] array could be accessed twice using the i variable
> after the for iteration. What happens is that in the last iteration
> the i index is incremented to 16, checked (i<16) then confirmed
> exiting the loop.
>
> 277 for (i=0; i<16; i+=2) { ...
>
> Later there are reading attempts to the pteg last elements, but using
> again the already incremented i (16).
>
> 303 v = be64_to_cpu(pteg[i]); /* pteg[16] */
> 304 r = be64_to_cpu(pteg[i+1]); /* pteg[17] */

Was it some automated tool that came up with this?

There is actually no problem because the accesses outside the loop are
only done if the 'found' variable is true; 'found' is initialized to
false and only ever set to true inside the loop just before a break
statement. Thus there is a correlation between the value of 'i' and
the value of 'found' -- if 'found' is true then we know 'i' is less
than 16.

> I really don't know if the for lace will somehow iterate until i is
> 16, anyway I think that the last readings must be using a defined max
> len/index or another more clear method.

I think it's perfectly clear to a human programmer, though some tools
(such as gcc) struggle with this kind of correlation between
variables. That's why I asked whether your report was based on the
output from some tool.

> Eg.
>
> v = be64_to_cpu(pteg[PTEG_LEN - 2]);
> r = be64_to_cpu(pteg[PTEG_LEN - 1]);
>
> Or just.
>
> v = be64_to_cpu(pteg[14]);
> r = be64_to_cpu(pteg[15]);

Either of those options would cause the code to malfunction.

> I found in the same file a variable that is not used.
>
> 380 struct kvmppc_vcpu_book3s *vcpu_book3s;
> ...
> 387 vcpu_book3s = to_book3s(vcpu);

True. It could be removed.

> A question, the kvmppc_mmu_book3s_64_init function is accessed by
> unconventional way?
> Because I have not found any calling to it.

Try arch/powerpc/kvm/book3s_pr.c line 410:

	kvmppc_mmu_book3s_64_init(vcpu);

Grep (or git grep) is your friend.

Paul.
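The correlation between 'found' and 'i' described in the reply can be shown with a simplified stand-alone sketch. This is not the kernel function: the matching condition, the EX_ names and the function signature are assumptions made for illustration, and the byte-order conversion is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PTEG_LEN 16 /* eight (v, r) pairs, as in the quoted loop */

/*
 * Search the group for an entry whose first word equals `want`.
 * On success, store the pair in *v and *r and return true.
 */
static bool ex_find_pte(const uint64_t pteg[EX_PTEG_LEN], uint64_t want,
			uint64_t *v, uint64_t *r)
{
	bool found = false;
	size_t i;

	for (i = 0; i < EX_PTEG_LEN; i += 2) {
		if (pteg[i] == want) {
			found = true;
			break; /* leaves the loop with i < EX_PTEG_LEN */
		}
	}

	if (!found)
		return false; /* i == EX_PTEG_LEN here; pteg[i] is never read */

	assert(i + 1 < EX_PTEG_LEN);
	*v = pteg[i];     /* the matching entry, not necessarily the last one */
	*r = pteg[i + 1];
	return true;
}
```

Replacing the two reads with fixed indices 14 and 15 would always return the last pair instead of the one that matched, which is why the reply says those alternatives would make the code malfunction.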
Re: [PATCH kernel 5/9] KVM: PPC: Account TCE-containing pages in locked_vm
On Tue, Sep 15, 2015 at 08:49:35PM +1000, Alexey Kardashevskiy wrote:
> At the moment pages used for TCE tables (in addition to pages addressed
> by TCEs) are not counted in locked_vm counter so a malicious userspace
> tool can call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE and
> lock a lot of memory.
>
> This adds counting for pages used for TCE tables.
>
> This counts the number of pages required for a table plus pages for
> the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.
>
> This does not change the amount of (de)allocated memory.
>
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_64_vio.c | 51 +++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 9526c34..b70787d 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -45,13 +45,56 @@ static long kvmppc_stt_npages(unsigned long window_size)
>  		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
>  }
>
> +static long kvmppc_account_memlimit(long npages, bool inc)
> +{
> +	long ret = 0;
> +	const long bytes = sizeof(struct kvmppc_spapr_tce_table) +
> +				(abs(npages) * sizeof(struct page *));

Why abs(npages)? Can npages be negative? If so, what does that mean?

Paul.
[PATCH 1/2] KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR
Currently it is possible for userspace (e.g. QEMU) to set a value
for the MSR for a guest VCPU which has both of the TS bits set,
which is an illegal combination. The result of this is that when
we execute a hrfid (hypervisor return from interrupt doubleword)
instruction to enter the guest, the CPU will take a TM Bad Thing
type of program interrupt (vector 0x700).

Now, if PR KVM is configured in the kernel along with HV KVM, we
actually handle this without crashing the host or giving hypervisor
privilege to the guest; instead what happens is that we deliver a
program interrupt to the guest, with SRR0 reflecting the address
of the hrfid instruction and SRR1 containing the MSR value at that
point. If PR KVM is not configured in the kernel, then we try to
run the host's program interrupt handler with the MMU set to the
guest context, which almost certainly causes a host crash.

This closes the hole by making kvmppc_set_msr_hv() check for the
illegal combination and force the TS field to a safe value (00,
meaning non-transactional).

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index becad3a..f668712 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -231,6 +231,12 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)

 static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr)
 {
+	/*
+	 * Check for illegal transactional state bit combination
+	 * and if we find it, force the TS field to a safe state.
+	 */
+	if ((msr & MSR_TS_MASK) == MSR_TS_MASK)
+		msr &= ~MSR_TS_MASK;
 	vcpu->arch.shregs.msr = msr;
 	kvmppc_end_cede(vcpu);
 }
--
2.1.4
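The check added by the patch can be tried outside the kernel with a stand-alone sketch. The two bit positions below are assumptions (they should be checked against the kernel's reg.h), and the EX_ names are hypothetical; the masking logic itself mirrors the two added lines in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed positions of the two TS bits, counted from the LSB. */
#define EX_MSR_TS_S    (1ULL << 33) /* suspended */
#define EX_MSR_TS_T    (1ULL << 34) /* transactional */
#define EX_MSR_TS_MASK (EX_MSR_TS_S | EX_MSR_TS_T)

/* If both TS bits are set (illegal), force TS to 00 (non-transactional). */
static uint64_t ex_sanitize_msr_ts(uint64_t msr)
{
	if ((msr & EX_MSR_TS_MASK) == EX_MSR_TS_MASK)
		msr &= ~EX_MSR_TS_MASK;

	assert((msr & EX_MSR_TS_MASK) != EX_MSR_TS_MASK);
	return msr;
}
```

Values with only one of the two bits set pass through unchanged, so legal suspended and transactional states are preserved.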
[PATCH] powerpc/64: Include KVM guest test in all interrupt vectors
Currently, if HV KVM is configured but PR KVM isn't, we don't include
a test to see whether we were interrupted in KVM guest context for the
set of interrupts which get delivered directly to the guest by hardware
if they occur in the guest. This includes things like program
interrupts.

However, the recent bug where userspace could set the MSR for a VCPU
to have an illegal value in the TS field, and thus cause a TM Bad Thing
type of program interrupt on the hrfid that enters the guest, showed that
we can never be completely sure that these interrupts can never occur
in the guest entry/exit code. If one of these interrupts does happen
and we have HV KVM configured but not PR KVM, then we end up trying to
run the handler in the host with the MMU set to the guest MMU context,
which generally ends badly.

Thus, for robustness it is better to have the test in every interrupt
vector, so that if some way is found to trigger some interrupt in the
guest entry/exit path, we can handle it without immediately crashing
the host.

This means that the distinction between KVMTEST and KVMTEST_PR goes
away. Thus we delete KVMTEST_PR and associated macros and use KVMTEST
everywhere that we previously used either KVMTEST_PR or KVMTEST. It
also means that SOFTEN_TEST_HV_201 becomes the same as SOFTEN_TEST_PR,
so we delete SOFTEN_TEST_HV_201 and use SOFTEN_TEST_PR instead.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/exception-64s.h | 21 +++-
 arch/powerpc/kernel/exceptions-64s.S     | 34
 2 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 77f52b2..9ee1078 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -263,17 +263,6 @@ do_kvm_##n: \
 #define KVM_HANDLER_SKIP(area, h, n)
 #endif

-#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
-#define KVMTEST_PR(n)			__KVMTEST(n)
-#define KVM_HANDLER_PR(area, h, n)	__KVM_HANDLER(area, h, n)
-#define KVM_HANDLER_PR_SKIP(area, h, n)	__KVM_HANDLER_SKIP(area, h, n)
-
-#else
-#define KVMTEST_PR(n)
-#define KVM_HANDLER_PR(area, h, n)
-#define KVM_HANDLER_PR_SKIP(area, h, n)
-#endif
-
 #define NOTEST(n)

 /*
@@ -360,13 +349,13 @@ label##_pSeries: \
 	HMT_MEDIUM_PPR_DISCARD; \
 	SET_SCRATCH0(r13);		/* save r13 */ \
 	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common, \
-				 EXC_STD, KVMTEST_PR, vec)
+				 EXC_STD, KVMTEST, vec)

 /* Version of above for when we have to branch out-of-line */
 #define STD_EXCEPTION_PSERIES_OOL(vec, label) \
 	.globl label##_pSeries; \
 label##_pSeries: \
-	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec); \
+	EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST, vec); \
 	EXCEPTION_PROLOG_PSERIES_1(label##_common, EXC_STD)

 #define STD_EXCEPTION_HV(loc, vec, label) \
@@ -436,17 +425,13 @@ label##_relon_hv: \
 #define _SOFTEN_TEST(h, vec)	__SOFTEN_TEST(h, vec)

 #define SOFTEN_TEST_PR(vec) \
-	KVMTEST_PR(vec); \
+	KVMTEST(vec); \
 	_SOFTEN_TEST(EXC_STD, vec)

 #define SOFTEN_TEST_HV(vec) \
 	KVMTEST(vec); \
 	_SOFTEN_TEST(EXC_HV, vec)

-#define SOFTEN_TEST_HV_201(vec) \
-	KVMTEST(vec); \
-	_SOFTEN_TEST(EXC_STD, vec)
-
 #define SOFTEN_NOTEST_PR(vec)		_SOFTEN_TEST(EXC_STD, vec)
 #define SOFTEN_NOTEST_HV(vec)		_SOFTEN_TEST(EXC_HV, vec)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 0a0399c2..1a03142 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -242,7 +242,7 @@ instruction_access_slb_pSeries:
 	HMT_MEDIUM_PPR_DISCARD
 	SET_SCRATCH0(r13)
 	EXCEPTION_PROLOG_0(PACA_EXSLB)
-	EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x480)
+	EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST, 0x480)
 	std	r3,PACA_EXSLB+EX_R3(r13)
 	mfspr	r3,SPRN_SRR0		/* SRR0 is faulting address */
 #ifdef __DISABLED__
@@ -276,18 +276,18 @@ hardware_interrupt_hv:
 		KVM_HANDLER(PACA_EXGEN, EXC_HV, 0x502)
 	FTR_SECTION_ELSE
 		_MASKABLE_EXCEPTION_PSERIES(0x500, hardware_int
[PATCH 2/2] KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better
As we saw with the TM Bad Thing type of program interrupt occurring
on the hrfid that enters the guest, it is not completely impossible
to have a trap occurring in the guest entry/exit code, despite the
fact that the code has been written to avoid taking any traps.

This adds a check in the kvmppc_handle_exit_hv() function to detect
the case when a trap has occurred in the hypervisor-mode code, and
instead of treating it just like a trap in guest code, we now print
a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR
exit reason.

Of the various interrupts that get handled in the assembly code in
the guest exit path and that can return directly to the guest, the
only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check
(other than system call, which we can avoid just by not doing a sc
instruction). Therefore this adds code to the machine check path to
ensure that if the MCE occurred in hypervisor mode, we exit to the
host rather than trying to continue the guest.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c            | 18 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f668712..d6baf0a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -846,6 +846,24 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,

 	vcpu->stat.sum_exits++;

+	/*
+	 * This can happen if an interrupt occurs in the last stages
+	 * of guest entry or the first stages of guest exit (i.e. after
+	 * setting paca->kvm_hstate.in_guest to KVM_GUEST_MODE_GUEST_HV
+	 * and before setting it to KVM_GUEST_MODE_HOST_HV).
+	 * That can happen due to a bug, or due to a machine check
+	 * occurring at just the wrong time.
+	 */
+	if (vcpu->arch.shregs.msr & MSR_HV) {
+		printk(KERN_EMERG "KVM trap in HV mode!\n");
+		printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
+			vcpu->arch.trap, kvmppc_get_pc(vcpu),
+			vcpu->arch.shregs.msr);
+		kvmppc_dump_regs(vcpu);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		run->hw.hardware_exit_reason = vcpu->arch.trap;
+		return RESUME_HOST;
+	}
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
 	switch (vcpu->arch.trap) {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..b3ce8ff 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2404,6 +2404,8 @@ machine_check_realmode:
 	 * guest as machine check causing guest to crash.
 	 */
 	ld	r11, VCPU_MSR(r9)
+	rldicl.	r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */
+	bne	mc_cont			/* if so, exit to host */
 	andi.	r10, r11, MSR_RI	/* check for unrecoverable exception */
 	beq	1f			/* Deliver a machine check to guest */
 	ld	r10, VCPU_PC(r9)
--
2.1.4
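The "rldicl." line in the assembly hunk tests a single MSR bit: it rotates the value left by 64-MSR_HV_LG and keeps only the last bit, and the following bne branches when the result is nonzero. A rough C equivalent, assuming MSR_HV_LG is 60 (that value is an assumption here, as are the EX_ names):

```c
#include <assert.h>
#include <stdint.h>

#define EX_MSR_HV_LG 60 /* assumed position of MSR.HV, counted from the LSB */

/* Rotate a 64-bit value left by s bits (0 <= s < 64). */
static uint64_t ex_rotl64(uint64_t x, unsigned int s)
{
	assert(s < 64);
	return s ? (x << s) | (x >> (64 - s)) : x;
}

/* Rotate left by 64 - EX_MSR_HV_LG, then keep only the least significant bit. */
static unsigned int ex_msr_hv_set(uint64_t msr)
{
	return (unsigned int)(ex_rotl64(msr, 64 - EX_MSR_HV_LG) & 1);
}
```

The result is the same as (msr >> EX_MSR_HV_LG) & 1; a plain andi. cannot be used for this bit because its immediate only covers the low 16 bits of the register.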
[GIT PULL] Please pull my kvm-ppc-fixes branch
Paolo,

I have two fixes for HV KVM which I would like to have included in
v4.4-rc1. The first one is a fix for a bug identified by Red Hat
which causes occasional guest crashes. The second one fixes a bug
which causes host stalls and timeouts under certain circumstances
when the host is configured for static 2-way micro-threading mode.

Thanks,
Paul.

The following changes since commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1:

  KVM: VMX: Fix commit which broke PML (2015-11-05 11:34:11 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

for you to fetch changes up to f74f2e2e26199f695ca3df94f29e9ab7cb707ea4:

  KVM: PPC: Book3S HV: Don't dynamically split core when already split (2015-11-06 16:02:59 +1100)

Paul Mackerras (2):
      KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails
      KVM: PPC: Book3S HV: Don't dynamically split core when already split

 arch/powerpc/kvm/book3s_hv.c            |  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20
 2 files changed, 13 insertions(+), 9 deletions(-)
[GIT PULL] Please pull my kvm-ppc-fixes branch
Paolo, I have two fixes for HV KVM which I would like to have included in v4.4-rc1. The first one is a fix for a bug identified by Red Hat which causes occasional guest crashes. The second one fixes a bug which causes host stalls and timeouts under certain circumstances when the host is configured for static 2-way micro-threading mode. Thanks, Paul. The following changes since commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1: KVM: VMX: Fix commit which broke PML (2015-11-05 11:34:11 +0100) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes for you to fetch changes up to f74f2e2e26199f695ca3df94f29e9ab7cb707ea4: KVM: PPC: Book3S HV: Don't dynamically split core when already split (2015-11-06 16:02:59 +1100) Paul Mackerras (2): KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails KVM: PPC: Book3S HV: Don't dynamically split core when already split arch/powerpc/kvm/book3s_hv.c| 2 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 2 files changed, 13 insertions(+), 9 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better
As we saw with the TM Bad Thing type of program interrupt occurring
on the hrfid that enters the guest, it is not completely impossible
to have a trap occurring in the guest entry/exit code, despite the
fact that the code has been written to avoid taking any traps.

This adds a check in the kvmppc_handle_exit_hv() function to detect
the case when a trap has occurred in the hypervisor-mode code, and
instead of treating it just like a trap in guest code, we now print
a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR
exit reason.

Of the various interrupts that get handled in the assembly code in
the guest exit path and that can return directly to the guest, the
only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check
(other than system call, which we can avoid just by not doing a sc
instruction). Therefore this adds code to the machine check path to
ensure that if the MCE occurred in hypervisor mode, we exit to the
host rather than trying to continue the guest.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c            | 18 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f668712..d6baf0a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -846,6 +846,24 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	vcpu->stat.sum_exits++;
 
+	/*
+	 * This can happen if an interrupt occurs in the last stages
+	 * of guest entry or the first stages of guest exit (i.e. after
+	 * setting paca->kvm_hstate.in_guest to KVM_GUEST_MODE_GUEST_HV
+	 * and before setting it to KVM_GUEST_MODE_HOST_HV).
+	 * That can happen due to a bug, or due to a machine check
+	 * occurring at just the wrong time.
+	 */
+	if (vcpu->arch.shregs.msr & MSR_HV) {
+		printk(KERN_EMERG "KVM trap in HV mode!\n");
+		printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
+			vcpu->arch.trap, kvmppc_get_pc(vcpu),
+			vcpu->arch.shregs.msr);
+		kvmppc_dump_regs(vcpu);
+		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+		run->hw.hardware_exit_reason = vcpu->arch.trap;
+		return RESUME_HOST;
+	}
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
 	switch (vcpu->arch.trap) {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 3c6badc..b3ce8ff 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2404,6 +2404,8 @@ machine_check_realmode:
 	 * guest as machine check causing guest to crash.
 	 */
 	ld	r11, VCPU_MSR(r9)
+	rldicl.	r0, r11, 64-MSR_HV_LG, 63 /* check if it happened in HV mode */
+	bne	mc_cont			/* if so, exit to host */
 	andi.	r10, r11, MSR_RI	/* check for unrecoverable exception */
 	beq	1f			/* Deliver a machine check to guest */
 	ld	r10, VCPU_PC(r9)
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
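For readers following the two hunks above, here is a small standalone C sketch of the same test expressed two ways: the plain mask test used in kvmppc_handle_exit_hv(), and a C model of what the rldicl. instruction in the machine check path computes. It assumes MSR_HV_LG is 60, which is my understanding of the 64-bit Book3S definition; the helper names are hypothetical and do not appear in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: MSR[HV] is bit 60 counting from the least significant bit. */
#define MSR_HV_LG 60
#define MSR_HV    (1ULL << MSR_HV_LG)

/* The C-level check: was the saved MSR taken while in hypervisor mode? */
static bool trap_was_in_hv_mode(uint64_t msr)
{
	return (msr & MSR_HV) != 0;
}

/* Rotate a 64-bit value left by n bits, for 0 < n < 64. */
static uint64_t rotl64(uint64_t x, unsigned int n)
{
	assert(n > 0 && n < 64);
	return (x << n) | (x >> (64 - n));
}

/*
 * Model of "rldicl. r0, r11, 64-MSR_HV_LG, 63": rotate left by
 * 64-MSR_HV_LG (equivalently right by MSR_HV_LG), then clear every bit
 * except the least significant one. The result is non-zero exactly when
 * MSR[HV] was set, which is what the following "bne mc_cont" tests.
 */
static uint64_t rldicl_extract_hv(uint64_t msr)
{
	return rotl64(msr, 64 - MSR_HV_LG) & 1ULL;
}
```

Both forms should agree for any MSR value; the assembly uses the rotate-and-mask form because a 64-bit mask with only bit 60 set cannot be given as an immediate to andi.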
[PATCH] KVM: PPC: Book3S HV: Don't dynamically split core when already split
In static micro-threading modes, the dynamic micro-threading code
is supposed to be disabled, because subcores can't make independent
decisions about what micro-threading mode to put the core in - there is
only one micro-threading mode for the whole core. The code that
implements dynamic micro-threading checks for this, except that the
check was missed in one case. This means that it is possible for a
subcore in static 2-way micro-threading mode to try to put the core
into 4-way micro-threading mode, which usually leads to stuck CPUs,
spinlock lockups, and other stalls in the host.

The problem was in the can_split_piggybacked_subcores() function, which
should always return false if the system is in a static micro-threading
mode. This fixes the problem by making can_split_piggybacked_subcores()
use subcore_config_ok() for its checks, as subcore_config_ok() includes
the necessary check for the static micro-threading modes.

Credit to Gautham Shenoy for working out that the reason for the hangs
and stalls we were seeing was that we were trying to do dynamic 4-way
micro-threading while we were in static 2-way mode.

Fixes: b4deba5c41e9
Cc: v...@stable.kernel.org # v4.3
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2280497..becad3a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2060,7 +2060,7 @@ static bool can_split_piggybacked_subcores(struct core_info *cip)
 			return false;
 		n_subcores += (cip->subcore_threads[sub] - 1) >> 1;
 	}
-	if (n_subcores > 3 || large_sub < 0)
+	if (large_sub < 0 || !subcore_config_ok(n_subcores + 1, 2))
 		return false;
 
 	/*
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
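To make the difference between the old and new conditions concrete, here is a simplified standalone model in C. It is only a sketch: config_ok_sketch() is a hypothetical stand-in for subcore_config_ok(), it takes the current threads-per-subcore value as a parameter instead of reading global state, and it leaves out any tunables the real function may consult. The limits of 8 threads and 4 subcores per core are assumptions for a POWER8-style core.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limits for a POWER8-style core. */
#define MAX_SMT_THREADS 8
#define MAX_SUBCORES    4

/* Round n up to the next power of two (n >= 1). */
static int roundup_pow2(int n)
{
	int p = 1;

	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical stand-in for subcore_config_ok(). A core that is already
 * statically split has threads_per_subcore < MAX_SMT_THREADS, and in
 * that case any request for more than one subcore is refused.
 */
static bool config_ok_sketch(int threads_per_subcore, int n_subcores,
			     int n_threads)
{
	if (n_subcores > 1 && threads_per_subcore < MAX_SMT_THREADS)
		return false;
	if (n_subcores > MAX_SUBCORES)
		return false;
	return n_subcores * roundup_pow2(n_threads) <= MAX_SMT_THREADS;
}

/* The condition the patch removes: it only looked at the subcore count. */
static bool old_check_allows(int n_subcores)
{
	return !(n_subcores > 3);
}
```

With threads_per_subcore equal to 4 (static 2-way mode), old_check_allows(1) is true while config_ok_sketch(4, 2, 2) is false, which is the case the commit message describes.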
[PATCH] KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails
When handling a hypervisor data or instruction storage interrupt (HDSI
or HISI), we look up the SLB entry for the address being accessed in
order to translate the effective address to a virtual address which can
be looked up in the guest HPT. This lookup can occasionally fail due
to the guest replacing an SLB entry without invalidating the evicted
SLB entry. In this situation an ERAT (effective to real address
translation cache) entry can persist and be used by the hardware even
though there is no longer a corresponding SLB entry.

Previously we would just deliver a data or instruction storage interrupt
(DSI or ISI) to the guest in this case. However, this is not correct
and has been observed to cause guests to crash, typically with a
data storage protection interrupt on a store to the vmemmap area.

Instead, what we do now is to synthesize a data or instruction segment
interrupt. That should cause the guest to reload an appropriate entry
into the SLB and retry the faulting instruction. If it still faults,
we should find an appropriate SLB entry next time and be able to handle
the fault.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b1dab8d..3c6badc 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1749,7 +1749,8 @@ kvmppc_hdsi:
 	beq	3f
 	clrrdi	r0, r4, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
-	bne	1f			/* if no SLB entry found */
+	li	r0, BOOK3S_INTERRUPT_DATA_SEGMENT
+	bne	7f			/* if no SLB entry found */
 4:	std	r4, VCPU_FAULT_DAR(r9)
 	stw	r6, VCPU_FAULT_DSISR(r9)
 
@@ -1768,14 +1769,15 @@ kvmppc_hdsi:
 	cmpdi	r3, -2			/* MMIO emulation; need instr word */
 	beq	2f
 
-	/* Synthesize a DSI for the guest */
+	/* Synthesize a DSI (or DSegI) for the guest */
 	ld	r4, VCPU_FAULT_DAR(r9)
 	mr	r6, r3
-1:	mtspr	SPRN_DAR, r4
+1:	li	r0, BOOK3S_INTERRUPT_DATA_STORAGE
 	mtspr	SPRN_DSISR, r6
+7:	mtspr	SPRN_DAR, r4
 	mtspr	SPRN_SRR0, r10
 	mtspr	SPRN_SRR1, r11
-	li	r10, BOOK3S_INTERRUPT_DATA_STORAGE
+	mr	r10, r0
 	bl	kvmppc_msr_interrupt
 fast_interrupt_c_return:
 6:	ld	r7, VCPU_CTR(r9)
@@ -1823,7 +1825,8 @@ kvmppc_hisi:
 	beq	3f
 	clrrdi	r0, r10, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
-	bne	1f			/* if no SLB entry found */
+	li	r0, BOOK3S_INTERRUPT_INST_SEGMENT
+	bne	7f			/* if no SLB entry found */
 4:
 	/* Search the hash table. */
 	mr	r3, r9			/* vcpu pointer */
@@ -1840,11 +1843,12 @@ kvmppc_hisi:
 	cmpdi	r3, -1			/* handle in kernel mode */
 	beq	guest_exit_cont
 
-	/* Synthesize an ISI for the guest */
+	/* Synthesize an ISI (or ISegI) for the guest */
 	mr	r11, r3
-1:	mtspr	SPRN_SRR0, r10
+1:	li	r0, BOOK3S_INTERRUPT_INST_STORAGE
+7:	mtspr	SPRN_SRR0, r10
 	mtspr	SPRN_SRR1, r11
-	li	r10, BOOK3S_INTERRUPT_INST_STORAGE
+	mr	r10, r0
 	bl	kvmppc_msr_interrupt
 	b	fast_interrupt_c_return
 
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
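The control flow in the assembly above can be summarised in a few lines of C. This is only an illustrative model of which interrupt vector ends up in r10 once the handler has decided to reflect the fault to the guest; the function name and enum names are hypothetical, and the vector offsets (0x300, 0x380, 0x400, 0x480) are the architected Book3S values as I understand them.

```c
#include <assert.h>
#include <stdbool.h>

/* Architected Book3S interrupt vector offsets (assumed values). */
enum sketch_vector {
	SKETCH_DATA_STORAGE = 0x300,	/* DSI   */
	SKETCH_DATA_SEGMENT = 0x380,	/* DSegI */
	SKETCH_INST_STORAGE = 0x400,	/* ISI   */
	SKETCH_INST_SEGMENT = 0x480	/* ISegI */
};

/*
 * Which interrupt the guest receives when the fault is reflected to it.
 * Before the patch, a failed SLB lookup fell through to the storage
 * interrupt; after it, a failed lookup yields a segment interrupt so
 * that the guest reloads its SLB and retries the access.
 */
static enum sketch_vector guest_interrupt_vector(bool is_data_access,
						 bool slb_entry_found)
{
	if (!slb_entry_found)
		return is_data_access ? SKETCH_DATA_SEGMENT
				      : SKETCH_INST_SEGMENT;
	return is_data_access ? SKETCH_DATA_STORAGE : SKETCH_INST_STORAGE;
}
```

In the patch itself the choice is carried in r0: it is loaded with the segment vector just before the "bne 7f", or with the storage vector at label 1, and copied into r10 once the shared tail at label 7 has set up SRR0/SRR1.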
[GIT PULL] Please pull my kvm-ppc-next branch
Paolo,

Here is my current patch queue for KVM on PPC. There's nothing much
in the way of new features this time; it's mostly bug fixes, plus
Nikunj has implemented support for KVM_CAP_NR_MEMSLOTS. These are
intended for the "next" branch of the KVM tree. Please pull.

Thanks,
Paul.

The following changes since commit 9ffecb10283508260936b96022d4ee43a7798b4c:

  Linux 4.3-rc3 (2015-09-27 07:50:08 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-next

for you to fetch changes up to 70aa3961a196ac32baf54032b2051bac9a941118:

  KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path (2015-10-21 16:31:52 +1100)

Andrzej Hajda (1):
      KVM: PPC: e500: fix handling local_sid_lookup result

Gautham R. Shenoy (1):
      KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path

Mahesh Salgaonkar (1):
      KVM: PPC: Book3S HV: Deliver machine check with MSR(RI=0) to guest as MCE

Nikunj A Dadhania (1):
      KVM: PPC: Implement extension to report number of memslots

Paul Mackerras (2):
      KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl
      KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs

Tudor Laurentiu (3):
      powerpc/e6500: add TMCFG0 register definition
      KVM: PPC: e500: Emulate TMCFG0 TMRN register
      KVM: PPC: e500: fix couple of shift operations on 64 bits

 arch/powerpc/include/asm/disassemble.h  |  5 +
 arch/powerpc/include/asm/reg_booke.h    |  6 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c     |  3 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c     |  2 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 29 ++---
 arch/powerpc/kvm/e500.c                 |  3 ++-
 arch/powerpc/kvm/e500_emulate.c         | 19 +++
 arch/powerpc/kvm/e500_mmu_host.c        |  4 ++--
 arch/powerpc/kvm/powerpc.c              |  3 +++
 9 files changed, 63 insertions(+), 11 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Implement extension to report number of memslots
On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote:
> Yes, we'll likely need this soon! 32 slots are not enough...

Would anyone object if I raised the limit for PPC to 512 slots?
Would that cause problems on embedded PPC, for instance?

Paul.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: e500: fix couple of shift operations on 64 bits
On Thu, Oct 01, 2015 at 03:58:03PM +0300, Laurentiu Tudor wrote:
> Fix couple of cases where we shift left a 32-bit
> value thus might get truncated results on 64-bit
> targets.
>
> Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>
> Suggested-by: Scott Wood <scotttw...@freescale.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
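The patch itself is not quoted in this reply, so the following is only a generic illustration of the class of bug the description refers to, not the actual e500 code. When a 32-bit operand is shifted left, the shift is carried out in 32 bits and the high bits are lost before the result is widened; casting the operand to a 64-bit type first preserves them. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift performed in 32-bit arithmetic: bits shifted past bit 31 are
 * discarded, and only then is the result widened to 64 bits.
 * The shift count must be less than 32.
 */
static uint64_t shift_in_32_bits(uint32_t value, unsigned int shift)
{
	assert(shift < 32);
	return value << shift;
}

/* Widen first, then shift: no bits are lost for shift counts below 32. */
static uint64_t shift_in_64_bits(uint32_t value, unsigned int shift)
{
	assert(shift < 32);
	return (uint64_t)value << shift;
}
```

For example, with value 0x80000001 and a shift of 4, the first helper returns 0x10 while the second returns 0x800000010.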
Re: [PATCH][v2] KVM: PPC: e500: Emulate TMCFG0 TMRN register
On Fri, Sep 25, 2015 at 06:02:23PM +0300, Laurentiu Tudor wrote:
> Emulate TMCFG0 TMRN register exposing one HW thread per vcpu.
>
> Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
> [laurentiu.tu...@freescale.com: rebased on latest kernel, use
> define instead of hardcoded value, moved code in own function]
> Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] powerpc/e6500: add TMCFG0 register definition
On Wed, Sep 23, 2015 at 06:06:22PM +0300, Laurentiu Tudor wrote:
> The register is not currently used in the base kernel
> but will be in a forthcoming kvm patch.
>
> Signed-off-by: Laurentiu Tudor <laurentiu.tu...@freescale.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 15/19] KVM: PPC: e500: fix handling local_sid_lookup result
On Thu, Sep 24, 2015 at 04:00:23PM +0200, Andrzej Hajda wrote:
> The function can return negative value.
>
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
>
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
>
> Signed-off-by: Andrzej Hajda <a.ha...@samsung.com>

Thanks, applied to my kvm-ppc-next branch.

Paul.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
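Since the patch body is not quoted here, the following is a generic sketch of the pattern that kind of semantic patch looks for, rather than the e500 code itself. lookup_sketch() is a hypothetical function that returns a negative value on failure; storing its result in an unsigned variable turns that failure into a very large positive number, so a later "<= 0" test no longer catches it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lookup: a positive id on success, -1 on failure. */
static int lookup_sketch(int key)
{
	return key >= 0 ? key + 1 : -1;
}

/*
 * Buggy pattern: the signed result is stored in an unsigned variable,
 * so -1 becomes the largest unsigned long value and the failure check
 * only ever matches an exact zero.
 */
static bool lookup_failed_unsigned(int key)
{
	unsigned long id = lookup_sketch(key);

	return id <= 0;
}

/* Fixed pattern: keep the result in a signed variable before testing. */
static bool lookup_failed_signed(int key)
{
	int id = lookup_sketch(key);

	return id <= 0;
}
```

With a failing key such as -5, the unsigned variant reports success and the signed variant reports failure, which is the discrepancy the fix addresses.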
[PATCH] KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs
This fixes a bug where the old HPTE value returned by H_REMOVE has
the valid bit clear if the HPTE was an absent HPTE, as happens for
HPTEs for emulated MMIO pages and for RAM pages that have been
paged out by the host. If the absent bit is set, we clear it and
set the valid bit, because from the guest's point of view, the HPTE
is valid.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c1df9bb..97e7f8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -470,6 +470,8 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	note_hpte_modification(kvm, rev);
 	unlock_hpte(hpte, 0);
 
+	if (v & HPTE_V_ABSENT)
+		v = (v & ~HPTE_V_ABSENT) | HPTE_V_VALID;
 	hpret[0] = v;
 	hpret[1] = r;
 	return H_SUCCESS;
-- 
2.6.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
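The two added lines can be tried out in isolation with the sketch below. The bit positions used for HPTE_V_VALID and HPTE_V_ABSENT are assumptions on my part (bit 0 for valid, a software-defined bit for absent); only the masking logic is taken from the patch, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values; the kernel headers are the authority here. */
#define HPTE_V_VALID  0x01ULL
#define HPTE_V_ABSENT 0x20ULL

/*
 * Convert the host's internal view of the first HPTE doubleword into
 * what the guest should see: an "absent" entry is presented as valid,
 * and the host-only absent bit is hidden.
 */
static uint64_t hpte_v_guest_view(uint64_t v)
{
	if (v & HPTE_V_ABSENT)
		v = (v & ~HPTE_V_ABSENT) | HPTE_V_VALID;
	return v;
}
```

An entry that is already valid, or that is neither valid nor absent, passes through unchanged.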
[PATCH] KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl
Currently the KVM_PPC_ALLOCATE_HTAB ioctl will try to allocate the
requested size of HPT, and if that is not possible, then try to
allocate smaller sizes (by factors of 2) until either a minimum is
reached or the allocation succeeds. This is not ideal for userspace,
particularly in migration scenarios, where the destination VM really
does require the size requested. Also, the minimum HPT size of 256kB
may be insufficient for the guest to run successfully.

This removes the fallback to smaller sizes on allocation failure for
the KVM_PPC_ALLOCATE_HTAB ioctl. The fallback still exists for the
case where the HPT is allocated at the time the first VCPU is run, if
no HPT has been allocated by ioctl by that time.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1f9c0a1..10722b1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -70,7 +70,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
 	}
 
 	/* Lastly try successively smaller sizes from the page allocator */
-	while (!hpt && order > PPC_MIN_HPT_ORDER) {
+	/* Only do this if userspace didn't specify a size via ioctl */
+	while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
 		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
 				       __GFP_NOWARN, order - PAGE_SHIFT);
 		if (!hpt)
-- 
2.6.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
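The behavioural change can be modelled without a real allocator. The sketch below is a deliberately simplified stand-in: it pretends an allocation of a given order succeeds whenever the order does not exceed a "largest free order" parameter, and it does not reproduce the exact sequence of allocation attempts in kvmppc_alloc_hpt(). All names are hypothetical; the minimum order of 18 corresponds to the 256kB minimum mentioned in the description.

```c
#include <assert.h>
#include <stdbool.h>

/* 2^18 bytes = 256kB, the minimum HPT size named in the description. */
#define SKETCH_MIN_HPT_ORDER 18

/* Pretend allocator: succeeds when the requested order fits. */
static bool sketch_alloc_ok(int order, int largest_free_order)
{
	return order <= largest_free_order;
}

/*
 * Returns the order actually allocated, or -1 on failure.
 * When the size came from the ioctl (size_from_ioctl is true), no
 * smaller size is tried; otherwise the order is halved step by step
 * down to the minimum.
 */
static int sketch_alloc_hpt_order(int order, int largest_free_order,
				  bool size_from_ioctl)
{
	bool ok = sketch_alloc_ok(order, largest_free_order);

	while (!ok && order > SKETCH_MIN_HPT_ORDER && !size_from_ioctl) {
		--order;
		ok = sketch_alloc_ok(order, largest_free_order);
	}
	return ok ? order : -1;
}
```

With a request of order 24 and only order 20 available, the model returns 20 when the HPT is allocated implicitly and -1 when the size was given via the ioctl, so userspace learns that the requested size could not be provided.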
Re: [PATCH] KVM: PPC: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
On Mon, Sep 21, 2015 at 07:50:22AM +0200, Paolo Bonzini wrote:
>
>
> On 21/09/2015 03:37, David Gibson wrote:
> > On Fri, Sep 18, 2015 at 08:57:28AM +0200, Thomas Huth wrote:
> >> Access to the kvm->buses (like with the kvm_io_bus_read() and
> >> -write() functions) has to be protected via the kvm->srcu lock.
> >> The kvmppc_h_logical_ci_load() and -store() functions are
> >> missing this lock so far, so let's add it there, too. This fixes
> >> the problem that the kernel reports "suspicious RCU usage" when
> >> lock debugging is enabled.
> >>
> >> Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1 Signed-off-by:
> >> Thomas Huth <th...@redhat.com>
> >
> > Nice catch. Looks like I missed this because the places
> > kvm_io_bus_{read,write}() are called on x86 are buried about 5
> > layers below where the srcu lock is taken :/.
> >
> > Reviewed-by: David Gibson <da...@gibson.dropbear.id.au> ...
> Paul,
>
> shall I take this directly into my tree for -rc3?
>
> Paolo

I have that and two other fixes in my kvm-ppc-fixes branch on
kernel.org. They were in linux-next today. I was going to send you
a pull request tomorrow, but if you are about to send stuff off to
Linus you could pull now from:

git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-fixes

The three patches in there are:

Gautham R. Shenoy (1):
      KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit

Paul Mackerras (1):
      KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs

Thomas Huth (1):
      KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

The one from Gautham is a 1-liner that has been around for months and
got missed, and is obviously correct. The one from me fixes a
regression that was introduced in 4.3-rc1 by one of my patches, which
causes oopses and soft lockups due to a use-after-free bug.

Thanks,
Paul.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs

This fixes a bug which results in stale vcore pointers being left in
the per-cpu preempted vcore lists when a VM is destroyed. The result
of the stale vcore pointers is usually either a crash or a lockup
inside collect_piggybacks() when another VM is run. A typical
lockup message looks like:

[ 472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
[ 472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
[ 472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
[ 472.162187] task: c01e38512750 ti: c01e41bfc000 task.ti: c01e41bfc000
[ 472.162262] NIP: c096b094 LR: c096b08c CTR: c030
[ 472.162337] REGS: c01e41bff520 TRAP: 0901 Not tainted (4.2.0-kvm+)
[ 472.162399] MSR: 90019033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24848844 XER:
[ 472.162588] CFAR: c096b0ac SOFTE: 1
GPR00: c070 c01e41bff7a0 c127df00 0001
GPR04: 0003 0001 00874821
GPR08: c01e41bff8e0 0001 defde740
GPR12: c030 cfdae400
[ 472.163053] NIP [c096b094] _raw_spin_lock_irqsave+0xa4/0x130
[ 472.163117] LR [c096b08c] _raw_spin_lock_irqsave+0x9c/0x130
[ 472.163179] Call Trace:
[ 472.163206] [c01e41bff7a0] [c01e41bff7f0] 0xc01e41bff7f0 (unreliable)
[ 472.163295] [c01e41bff7e0] [c070] __wake_up+0x40/0x90
[ 472.163375] [c01e41bff830] [defd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
[ 472.163465] [c01e41bffa30] [defd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
[ 472.163559] [c01e41bffb70] [de9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
[ 472.163653] [c01e41bffba0] [de92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[ 472.163745] [c01e41bffbe0] [de9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
[ 472.163834] [c01e41bffd40] [c02d0f50] do_vfs_ioctl+0x480/0x7c0
[ 472.163910] [c01e41bffde0] [c02d1364] SyS_ioctl+0xd4/0xf0
[ 472.163986] [c01e41bffe30] [c0009260] system_call+0x38/0xd0
[ 472.164060] Instruction dump:
[ 472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 6000 6000 6042 8bad02e2
[ 472.164224] 7fc3f378 4b6a57c1 6000 7c210b78 89290009 792affe3 40820070

The bug is that kvmppc_run_vcpu does not correctly handle the case
where a vcpu task receives a signal while its guest vcpu is executing
in the guest as a result of being piggy-backed onto the execution of
another vcore. In that case we need to wait for the vcpu to finish
executing inside the guest, and then remove this vcore from the
preempted vcores list. That way, we avoid leaving this vcpu's vcore
on the preempted vcores list when the vcpu gets interrupted.

Fixes: ec2571650826
Reported-by: Thomas Huth <th...@redhat.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9754e68..2280497 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2692,9 +2692,13 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 
 	while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
 	       (vc->vcore_state == VCORE_RUNNING ||
-		vc->vcore_state == VCORE_EXITING))
+		vc->vcore_state == VCORE_EXITING ||
+		vc->vcore_state == VCORE_PIGGYBACK))
 		kvmppc_wait_for_exec(vc, vcpu, TASK_UNINTERRUPTIBLE);
 
+	if (vc->vcore_state == VCORE_PREEMPT && vc->runner == NULL)
+		kvmppc_vcore_end_preempt(vc);
+
 	if (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE) {
 		kvmppc_remove_runnable(vc, vcpu);
 		vcpu->stat.signal_exits++;
-- 
2.5.1

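The change to the wait condition in the patch above can be modelled
outside the kernel as a small predicate. This is only a sketch: the
state names are taken from the diff, but the numeric values and the
helpers must_wait_old() and must_wait_new() are made up for
illustration and are not kernel code.

```c
#include <stdbool.h>

/* State names follow the patch; the numeric values are arbitrary. */
enum vcore_state {
	VCORE_PREEMPT,
	VCORE_PIGGYBACK,
	VCORE_RUNNING,
	VCORE_EXITING,
};

/* Wait condition before the patch: a piggy-backed vcore is not waited for. */
static bool must_wait_old(enum vcore_state s)
{
	return s == VCORE_RUNNING || s == VCORE_EXITING;
}

/* Wait condition after the patch: also wait while piggy-backed. */
static bool must_wait_new(enum vcore_state s)
{
	return s == VCORE_RUNNING || s == VCORE_EXITING ||
	       s == VCORE_PIGGYBACK;
}
```

With the old predicate a signalled vcpu task whose vcore is in
VCORE_PIGGYBACK falls straight through the loop, which is the case
the commit message describes.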
Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

On Tue, Sep 15, 2015 at 09:24:15PM -0400, Tejun Heo wrote:
> Hello, Paul.
>
> On Tue, Sep 15, 2015 at 04:38:18PM -0700, Paul E. McKenney wrote:
> > Well, the decision as to what is too big for -stable is owned by the
> > -stable maintainers, not by me.
>
> Is it tho? Usually the subsystem maintainer knows the best and has
> most say in it. I was mostly curious whether you'd think that the
> changes would be too risky. If not, great.

I do hope that they would listen to what I thought about it, but at
the end of the day, it is the -stable maintainers who pull a given
patch, or don't.

> > I am suggesting trying the options and seeing what works best, then
> > working to convince people as needed.
>
> Yeah, sure thing. Let's wait for Christian.

Indeed. Is there enough benefit to risk jamming this thing into 4.3?
I believe that 4.4 should be a no-brainer.

Thanx, Paul

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

On Tue, Sep 15, 2015 at 06:28:11PM -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 15, 2015 at 02:38:30PM -0700, Paul E. McKenney wrote:
> > I did take a shot at adding the rcu_sync stuff during this past merge
> > window, but it did not converge quickly enough to make it. It looks
> > quite good for the next merge window. There have been changes in most
> > of the relevant areas, so probably best to just try them and see which
> > works best.
>
> Heh, I'm having a bit of trouble following. Are you saying that the
> changes would be too big for -stable? If so, I'll send out reverts of
> the culprit patches and then reapply them for this cycle so that it
> can land together with the rcu changes in the next merge window, but
> it'd be great to find out whether the rcu changes are enough for the
> issue that Christian is seeing to go away. If not, I'll switch to a
> different locking scheme and mark those patches w/ stable tag.

Well, the decision as to what is too big for -stable is owned by the
-stable maintainers, not by me.

I am suggesting trying the options and seeing what works best, then
working to convince people as needed.

Thanx, Paul

Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

On Tue, Sep 15, 2015 at 06:42:19PM +0200, Paolo Bonzini wrote:
>
>
> On 15/09/2015 15:36, Christian Borntraeger wrote:
> > I am wondering why the old code behaved in such fatal ways. Is there
> > some interaction between waiting for a reschedule in the
> > synchronize_sched writer and some fork code actually waiting for the
> > read side to get the lock together with some rescheduling going on
> > waiting for a lock that fork holds? lockdep does not give me an hints
> > so I have no clue :-(
>
> It may just be consuming too much CPU usage. kernel/rcu/tree.c warns
> about it:
>
> * if you are using synchronize_sched_expedited() in a loop, please
> * restructure your code to batch your updates, and then use a single
> * synchronize_sched() instead.
>
> and you may remember that in KVM we switched from RCU to SRCU exactly to
> avoid userspace-controlled synchronize_rcu_expedited().
>
> In fact, I would say that any userspace-controlled call to *_expedited()
> is a bug waiting to happen and a bad idea---because userspace can, with
> little effort, end up calling it in a loop.

Excellent points!

Other options in such situations include the following:

o	Rework so that the code uses call_rcu*() instead of *_expedited().

o	Maintain a per-task or per-CPU counter so that every so many
	*_expedited() invocations instead uses the non-expedited
	counterpart. (For example, synchronize_rcu instead of
	synchronize_rcu_expedited().)

Note that synchronize_srcu_expedited() is less troublesome than are
the other *_expedited() functions, because synchronize_srcu_expedited()
does not inflict OS jitter on other CPUs. This situation is being
improved, so that the other *_expedited() functions inflict less OS
jitter and (mostly) avoid inflicting OS jitter on nohz_full CPUs and
idle CPUs (the latter being important for battery-powered systems).
In addition, the *_expedited() functions avoid hammering CPUs with
N-squared OS jitter in response to concurrent invocation from all CPUs
because multiple concurrent *_expedited() calls will be satisfied by
a single expedited grace-period operation. Nevertheless, as Paolo
points out, it is still necessary to exercise caution when exposing
synchronous grace periods to userspace control.

Thanx, Paul

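The second option listed above (falling back to the non-expedited
primitive every so many calls) could be sketched roughly as follows.
This is a userspace model, not kernel code: EXPEDITED_BUDGET,
struct gp_throttle and use_expedited() are hypothetical names, and
the budget of 8 is an arbitrary assumption. A real user would keep
the counter per task or per CPU and pick synchronize_rcu_expedited()
or synchronize_rcu() based on the result.

```c
#include <stdbool.h>

/* Hypothetical budget: at most this many expedited calls in a row. */
#define EXPEDITED_BUDGET 8

/* Hypothetical per-task (or per-CPU) bookkeeping. */
struct gp_throttle {
	unsigned int expedited_in_a_row;
};

/*
 * Return true if the caller may use the expedited primitive, false if
 * this invocation should fall back to the normal grace period.
 */
static bool use_expedited(struct gp_throttle *t)
{
	if (t->expedited_in_a_row >= EXPEDITED_BUDGET) {
		t->expedited_in_a_row = 0;	/* pay for one normal grace period */
		return false;
	}
	t->expedited_in_a_row++;
	return true;
}
```

A caller looping under userspace control then ends up waiting for an
ordinary grace period after every EXPEDITED_BUDGET expedited calls,
which bounds the rate at which it can disturb other CPUs.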
Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

On Tue, Sep 15, 2015 at 05:26:22PM -0400, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 15, 2015 at 11:11:45PM +0200, Christian Borntraeger wrote:
> > > In fact, I would say that any userspace-controlled call to *_expedited()
> > > is a bug waiting to happen and a bad idea---because userspace can, with
> > > little effort, end up calling it in a loop.
> >
> > Right. This also implies that we should fix this for 4.2 - I guess.
>
> Are the percpu_rwsem changes enough? If so, we can try to backport
> those. If those are too risky, we can revert the patches which
> switched threadgroup lock to percpu_rwsem.

I did take a shot at adding the rcu_sync stuff during this past merge
window, but it did not converge quickly enough to make it. It looks
quite good for the next merge window. There have been changes in most
of the relevant areas, so probably best to just try them and see which
works best.

Thanx, Paul

Re: [PATCH 3/3] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD

On Sun, Sep 06, 2015 at 12:47:12PM -0700, Nathan Whitehorn wrote:
> Anything I can do to help move these along? It's a big performance
> improvement for FreeBSD guests.

These patches are in Paolo's kvm-ppc-next branch and should go into
Linus' tree in the next couple of days.

Paul.

[GIT PULL] Please pull my kvm-ppc-next branch

Paolo,

Please pull the commits listed below into your tree. I would like
them to go in for 4.3 as they are all small bug fixes not new
features, and they all can only affect HV-mode KVM on IBM server
machines (in fact one has no effect on code at all since it is a typo
fix for a comment).

Please let me know if you want me to re-post all the patches.

Thanks,
Paul.

The following changes since commit e3dbc572fe11a5231568e106fa3dcedd1d1bec0f:

  Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queue (2015-08-22 14:57:59 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-next

for you to fetch changes up to 4e33d1f0a145d48e8cf287954bbf791af8387cfb:

  KVM: PPC: Book3S: Fix typo in top comment about locking (2015-09-04 07:28:05 +1000)

Gautham R. Shenoy (2):
      KVM: PPC: Book3S HV: Fix race in starting secondary threads
      KVM: PPC: Book3S HV: Exit on H_DOORBELL if HOST_IPI is set

Greg Kurz (1):
      KVM: PPC: Book3S: Fix typo in top comment about locking

Thomas Huth (1):
      KVM: PPC: Book3S: Fix size of the PSPB register

 arch/powerpc/include/asm/kvm_host.h     |  2 +-
 arch/powerpc/kvm/book3s_hv.c            | 10 +++++++++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  9 +++++++++
 arch/powerpc/kvm/book3s_xics.c          |  2 +-
 4 files changed, 20 insertions(+), 3 deletions(-)

Please add my kvm-ppc-next branch to linux-next

Hi Stephen,

Please include the kvm-ppc-next branch of my powerpc git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git

to linux-next. This branch currently only has commits that are
intended to go into 4.3, and I won't put in any commits for 4.4 until
4.3-rc1 is out.

Thanks,
Paul.

[PATCH] KVM: PPC: Book3S HV: Exit on H_DOORBELL only if HOST_IPI is set

From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>

The code that handles the case when we receive a H_DOORBELL interrupt
has a comment which says "Hypervisor doorbell - exit only if host IPI
flag set". However, the current code does not actually check if the
host IPI flag is set. This is due to a comparison instruction that
got missed.

As a result, the current code performs the exit to host only
if some sibling thread or a sibling sub-core is exiting to the
host. This implies that, an IPI sent to a sibling core in
(subcores-per-core != 1) mode will be missed by the host unless the
sibling core is on the exit path to the host.

This patch adds the missing comparison operation which will ensure
that when HOST_IPI flag is set, we unconditionally exit to the host.

Fixes: 66feed61cdf6
Cc: sta...@vger.kernel.org # v4.1+
Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b07f045..2273dca 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1213,6 +1213,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
 	bne	3f
 	lbz	r0, HSTATE_HOST_IPI(r13)
+	cmpwi	r0, 0
 	beq	4f
 	b	guest_exit_cont
 3:
-- 
2.1.4

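Why the missing compare matters: on PowerPC, lbz does not update the
condition register, so without the added cmpwi the beq tests the
result of the earlier "cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL",
which is always "equal" on this path. The C model below is a sketch
only; struct cr0, branch_to_4f_old() and branch_to_4f_new() are
invented names that mimic the two instruction sequences from the
diff, not anything in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of the one CR0 bit that matters here. */
struct cr0 {
	bool eq;
};

/* cmpwi rX, imm: sets CR0[EQ] when the register equals the immediate. */
static void cmpwi(struct cr0 *cr, int32_t reg, int32_t imm)
{
	cr->eq = (reg == imm);
}

/*
 * Old sequence, entered with CR0[EQ] set by the doorbell compare.
 * The byte load leaves CR0 alone, so "beq 4f" is always taken.
 */
static bool branch_to_4f_old(uint8_t host_ipi)
{
	struct cr0 cr = { .eq = true };	/* left over from cmpwi r12, ... */

	(void)host_ipi;			/* lbz r0, HSTATE_HOST_IPI(r13) */
	return cr.eq;			/* beq 4f */
}

/* New sequence: the added "cmpwi r0, 0" makes beq test the flag. */
static bool branch_to_4f_new(uint8_t host_ipi)
{
	struct cr0 cr = { .eq = true };	/* left over from cmpwi r12, ... */

	cmpwi(&cr, host_ipi, 0);	/* cmpwi r0, 0 */
	return cr.eq;			/* beq 4f */
}
```

In this model the old sequence branches to 4f whatever the flag
holds, while the new one falls through to "b guest_exit_cont"
exactly when the host IPI flag is non-zero.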
[PATCH] KVM: PPC: Book3S HV: Fix race in starting secondary threads

From: "Gautham R. Shenoy" <e...@linux.vnet.ibm.com>

The current dynamic micro-threading code has a race due to which a
secondary thread naps when it is supposed to be running a vcpu. As a
side effect of this, on a guest exit, the primary thread in
kvmppc_wait_for_nap() finds that this secondary thread hasn't cleared
its vcore pointer. This results in "CPU X seems to be stuck!"
warnings.

The race is possible since the primary thread on exiting the guests
only waits for all the secondaries to clear its vcore pointer. It
subsequently expects the secondary threads to enter nap while it
unsplits the core. A secondary thread which hasn't yet entered the nap
will loop in kvm_no_guest until its vcore pointer and the do_nap flag
are unset. Once the core has been unsplit, a new vcpu thread can grab
the core and set the do_nap flag *before* setting the vcore pointers
of the secondary. As a result, the secondary thread will now enter nap
via kvm_unsplit_nap instead of running the guest vcpu.

Fix this by setting the do_nap flag after setting the vcore pointer in
the PACA of the secondary in kvmppc_run_core. Also, ensure that a
secondary thread doesn't nap in kvm_unsplit_nap when the vcore pointer
in its PACA struct is set.

Fixes: b4deba5c41e9
Signed-off-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c            | 10 +++++++++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  8 ++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fad52f2..c5edf17 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2411,7 +2411,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 				break;
 			cpu_relax();
 		}
-		split_info.do_nap = 1;	/* ask secondaries to nap when done */
 	}
 
 	/* Start all the threads */
@@ -2440,6 +2439,15 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 			thr += pvc->num_threads;
 		}
 	}
+
+	/*
+	 * Ensure that split_info.do_nap is set after setting
+	 * the vcore pointer in the PACA of the secondaries.
+	 */
+	smp_mb();
+	if (cmd_bit)
+		split_info.do_nap = 1;	/* ask secondaries to nap when done */
+
 	/*
 	 * When doing micro-threading, poke the inactive threads as well.
 	 * This gets them to the nap instruction after kvm_do_nap,
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 472680f..b07f045 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -421,6 +421,14 @@ kvm_no_guest:
  * whole-core mode, so we need to nap.
  */
 kvm_unsplit_nap:
+	/*
+	 * Ensure that secondary doesn't nap when it has
+	 * its vcore pointer set.
+	 */
+	sync		/* matches smp_mb() before setting split_info.do_nap */
+	ld	r0, HSTATE_KVM_VCORE(r13)
+	cmpdi	r0, 0
+	bne	kvm_no_guest
 	/* clear any pending message */
 BEGIN_FTR_SECTION
 	lis	r6, (PPC_DBELL_SERVER << (63-36))@h
-- 
2.1.4

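The ordering that the fix relies on (publish the vcore pointer, full
barrier, then set do_nap, with a matching barrier on the reader side)
can be sketched with C11 atomics. This is a userspace analogy under
stated assumptions, not the kernel code: struct split_model and the
three functions are hypothetical, and atomic_thread_fence() with
memory_order_seq_cst stands in for smp_mb() on the primary and for
sync on the secondary.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PACA vcore pointer and split_info.do_nap. */
struct split_model {
	_Atomic(void *) vcore;	/* per-secondary vcore pointer */
	atomic_int do_nap;	/* "nap when done" request */
};

static void split_model_init(struct split_model *m)
{
	atomic_init(&m->vcore, NULL);
	atomic_init(&m->do_nap, 0);
}

/* Primary: publish the vcore pointer first, then ask for naps. */
static void primary_start(struct split_model *m, void *vcore)
{
	atomic_store_explicit(&m->vcore, vcore, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	atomic_store_explicit(&m->do_nap, 1, memory_order_relaxed);
}

/* Secondary: only nap if asked to and no vcore has been handed over. */
static bool secondary_may_nap(struct split_model *m)
{
	if (!atomic_load_explicit(&m->do_nap, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_seq_cst);	/* models sync */
	return atomic_load_explicit(&m->vcore, memory_order_relaxed) == NULL;
}
```

Because the pointer is stored before the flag and the two fences
pair up, a secondary that observes do_nap set is also guaranteed to
observe the vcore pointer, so in this model it does not nap while it
has work assigned.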
Re: [PATCH] KVM: ppc: Fix size of the PSPB register

On Tue, Sep 01, 2015 at 11:41:18PM +0200, Thomas Huth wrote:
> The size of the Problem State Priority Boost Register is only
> 32 bits, so let's change the type of the corresponding variable
> accordingly to avoid future trouble.

Since we're already using lwz/stw in the assembly code in
book3s_hv_rmhandlers.S, this is actually a bug fix, isn't it?
How did you find it? Did you observe a failure of some kind, or did
you just find it by code inspection?

Paul.

Re: [PATCH] KVM: ppc: Fix size of the PSPB register

On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
> > The size of the Problem State Priority Boost Register is only
> > 32 bits, so let's change the type of the corresponding variable
> > accordingly to avoid future trouble.
>
> It's not future trouble, it's broken today for LE and this should fix
> it BUT

No, it's broken today for BE hosts, which will always see 0 for the
PSPB register value. LE hosts are fine.

> The asm accesses it using lwz/stw and C accesses it as a ulong. On LE
> that will mean that userspace will see the value << 32

No, that will happen on BE, and since KVM_REG_PPC_PSPB says it's a
32-bit register, we'll just pass 0 back to userspace when it reads it.

> Now "fixing" it might break migration if that field is already
> stored/loaded in its "broken" form. We may have to keep the "broken"
> behaviour and document that qemu sees a value shifted by 32.

It will be being set to 0 on BE hosts across migration today
(fortunately 0 is a benign value for PSPB). If we fix this on both the
source and destination host, then the value will get migrated across
correctly.

I think Thomas's patch is fine, it just needs a stronger patch
description saying that it fixes an actual bug.

Paul.

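The BE versus LE behaviour described in this reply can be checked
with a small byte-level model. This is a sketch, not kernel code:
read_ulong_after_stw() and reg_value_seen() are hypothetical helpers
that assume the field is 8 bytes wide, that the assembly writes it
with a 32-bit store at offset 0, and that the 32-bit register read
simply truncates the 64-bit C value.

```c
#include <stdint.h>
#include <string.h>

/*
 * Model an 8-byte "unsigned long" field in memory, written with a
 * 32-bit store at offset 0 (as stw does) and then read back as a full
 * 64-bit value, for a given byte order.
 */
static uint64_t read_ulong_after_stw(uint32_t val, int big_endian)
{
	uint8_t mem[8];
	uint64_t out = 0;
	int i;

	memset(mem, 0, sizeof(mem));

	/* stw: 4 bytes at the field's base address. */
	for (i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		mem[i] = (uint8_t)(val >> shift);
	}

	/* C access: all 8 bytes interpreted in the same byte order. */
	for (i = 0; i < 8; i++) {
		int shift = big_endian ? 8 * (7 - i) : 8 * i;

		out |= (uint64_t)mem[i] << shift;
	}
	return out;
}

/* What a 32-bit register read would then return (assumed truncation). */
static uint32_t reg_value_seen(uint32_t val, int big_endian)
{
	return (uint32_t)read_ulong_after_stw(val, big_endian);
}
```

On the big-endian model the stored word lands in the upper half of
the 64-bit field, so the C side sees the value shifted left by 32 and
the truncated register value is 0. On the little-endian model the
value comes back unchanged.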
Re: [PATCH] vfio: Enable VFIO device for powerpc
On Wed, Aug 26, 2015 at 11:34:26AM +0200, Alexander Graf wrote:
> On 13.08.15 03:15, David Gibson wrote:
> > ec53500f "kvm: Add VFIO device" added a special KVM pseudo-device which is
> > used to handle any necessary interactions between KVM and VFIO.
> >
> > Currently that device is built on x86 and ARM, but not powerpc, although
> > powerpc does support both KVM and VFIO.  This makes things awkward in
> > userspace.
> >
> > Currently qemu prints an alarming error message if you attempt to use VFIO
> > and it can't initialize the KVM VFIO device.  We don't want to remove the
> > warning, because lack of the KVM VFIO device could mean coherency problems
> > on x86.  On powerpc, however, the error is harmless but looks disturbing,
> > and a test based on host architecture in qemu would be ugly, and break if
> > we do need the KVM VFIO device for something important in future.
> >
> > There's nothing preventing the KVM VFIO device from being built for
> > powerpc, so this patch turns it on.  It won't actually do anything, since
> > we don't define any of the arch_*() hooks, but it will make qemu happy and
> > we can extend it in future if we need to.
> >
> > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
> > Reviewed-by: Eric Auger <eric.au...@linaro.org>
>
> Paul is going to take care of the kvm-ppc tree for 4.3. Also, ppc kvm
> patches should get CC on the kvm-ppc@vger mailing list ;).
>
> Paul, could you please pick this one up?

Sure, I'll do that once I get home (end of this week).

Paul.
Re: [PATCH] kvm:powerpc:Fix return statements for wrapper functions in the file book3s_64_mmu_hv.c
On Mon, Aug 10, 2015 at 11:27:31AM -0400, Nicholas Krause wrote:
> This fixes the wrapper functions kvm_umap_hva_hv and the function
> kvm_unmap_hav_range_hv to return the return value of the function
> kvm_handle_hva or kvm_handle_hva_range that they are wrapped to
> call internally rather then always making the caller of these
> wrapper functions think they always run successfully by returning
> the value of zero directly.

In fact these functions do always run successfully, and there is no
bug fixed here (see below).  I don't object to the change per se,
since it reduces the code size very slightly, but the commit message
and headline needs to be reworded to avoid giving the impression that
this fixes something.

>  int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva)
>  {
> -	kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
> -	return 0;
> +	return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
>  }
>
>  int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, unsigned long end)
>  {
> -	kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);
> -	return 0;
> +	return kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp);

kvm_handle_hva and kvm_handle_hva_range call the handler function
(kvm_unmap_rmapp in this case) one or more times, and return the
logical OR of the return values from the handler.  Since
kvm_unmap_rmapp always returns 0, the return value from
kvm_handle_hva{,_range} will always be 0 here.

Paul.
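[Editor's illustration, not part of the original thread.] The point about the return value can be seen in a simplified stand-alone model. The names below (`model_handle_range`, `model_unmap_handler`, `model_sometimes_nonzero`) are hypothetical and the loop is an assumption about the general shape only; the real kvm_handle_hva_range() walks memslots and is considerably more involved.

```c
#include <assert.h>

typedef int (*model_handler_fn)(unsigned long gfn);

/* Stands in for a handler that, like kvm_unmap_rmapp in the email,
 * always returns 0. */
static int model_unmap_handler(unsigned long gfn)
{
	(void)gfn;
	return 0;
}

/* A contrasting handler that reports 1 for odd frame numbers. */
static int model_sometimes_nonzero(unsigned long gfn)
{
	return (int)(gfn & 1);
}

/* Call the handler once per frame in [start, end) and OR the results
 * together, which is the accumulation behaviour described above. */
static int model_handle_range(unsigned long start, unsigned long end,
			      model_handler_fn handler)
{
	int ret = 0;

	for (unsigned long gfn = start; gfn < end; gfn++)
		ret |= handler(gfn);
	return ret;
}
```

With a handler that always returns 0, the OR over any range is 0, so returning that value and returning a literal 0 are indistinguishable to the caller. Only a handler that can return non-zero, such as `model_sometimes_nonzero`, would make the two forms differ.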
Re: [PATCH 0/2] Two fixes for dynamic micro-threading
On Thu, Jul 23, 2015 at 02:02:51PM +0200, Alexander Graf wrote:
> The host crash should only occur with dynamic micro-threading enabled,
> which is not in Linus' tree, correct?

Correct.

Paul.
[PATCH 2/2] KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation
Whenever a vcore state is VCORE_PREEMPT we need to be counting stolen
time for it.  This currently isn't the case when we have a vcore that
no longer has any runnable threads in it but still has a runner task,
so we do an explicit call to kvmppc_core_start_stolen() in that case.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3d02276..fad52f2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2283,9 +2283,14 @@ static void post_guest_process(struct kvmppc_vcore *vc, bool is_master)
 	}
 	list_del_init(&vc->preempt_list);
 	if (!is_master) {
-		vc->vcore_state = vc->runner ? VCORE_PREEMPT : VCORE_INACTIVE;
-		if (still_running > 0)
+		if (still_running > 0) {
 			kvmppc_vcore_preempt(vc);
+		} else if (vc->runner) {
+			vc->vcore_state = VCORE_PREEMPT;
+			kvmppc_core_start_stolen(vc);
+		} else {
+			vc->vcore_state = VCORE_INACTIVE;
+		}
 		if (vc->n_runnable > 0 && vc->runner == NULL) {
 			/* make sure there's a candidate runner awake */
 			vcpu = list_first_entry(&vc->runnable_threads,
-- 
2.1.4
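[Editor's illustration, not part of the original patch.] The invariant the patch restores, "state is PREEMPT implies stolen time is being counted", can be sketched as a tiny state model. Everything here is hypothetical: `struct sketch_vcore`, the `counting_stolen` flag and the function names are stand-ins, and the assumption that the preempt helper itself starts stolen-time accounting is the editor's reading of the patch description, not something shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_state { SKETCH_INACTIVE, SKETCH_PREEMPT };

struct sketch_vcore {
	enum sketch_state state;
	bool has_runner;	/* stands in for vc->runner != NULL */
	bool counting_stolen;	/* stands in for stolen-time accounting */
};

/* Assumed behaviour of the preempt helper: mark the vcore preempted
 * and start counting stolen time. */
static void sketch_vcore_preempt(struct sketch_vcore *vc)
{
	vc->state = SKETCH_PREEMPT;
	vc->counting_stolen = true;
}

/* Mirrors the three-way branch the patch introduces. */
static void sketch_post_guest(struct sketch_vcore *vc, int still_running)
{
	if (still_running > 0) {
		sketch_vcore_preempt(vc);
	} else if (vc->has_runner) {
		vc->state = SKETCH_PREEMPT;
		vc->counting_stolen = true;	/* the explicit start the patch adds */
	} else {
		vc->state = SKETCH_INACTIVE;
	}
}

/* The invariant from the commit message. */
static bool sketch_invariant_holds(const struct sketch_vcore *vc)
{
	return vc->state != SKETCH_PREEMPT || vc->counting_stolen;
}
```

The case the patch fixes corresponds to `still_running == 0` with `has_runner` set: in the model the vcore ends up in the PREEMPT state with stolen time being counted, so the invariant holds on every path.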
[PATCH 1/2] KVM: PPC: Book3S HV: Fix preempted vcore list locking
When a vcore gets preempted, we put it on the preempted vcore list for
the current CPU.  The runner task then calls schedule() and comes back
some time later and takes itself off the list.  We need to be careful
to lock the list that it was put onto, which may not be the list for the
current CPU since the runner task may have moved to another CPU.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6e3ef30..3d02276 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1962,10 +1962,11 @@ static void kvmppc_vcore_preempt(struct kvmppc_vcore *vc)
 
 static void kvmppc_vcore_end_preempt(struct kvmppc_vcore *vc)
 {
-	struct preempted_vcore_list *lp = this_cpu_ptr(&preempted_vcores);
+	struct preempted_vcore_list *lp;
 
 	kvmppc_core_end_stolen(vc);
 	if (!list_empty(&vc->preempt_list)) {
+		lp = &per_cpu(preempted_vcores, vc->pcpu);
 		spin_lock(&lp->lock);
 		list_del_init(&vc->preempt_list);
 		spin_unlock(&lp->lock);
-- 
2.1.4
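[Editor's illustration, not part of the original patch.] The locking rule in this patch, "lock the list the vcore was queued on, not the list of whichever CPU you happen to be on now", can be modelled without any kernel primitives. The sketch below is an assumption-laden stand-in: a counter replaces the spinlock, a count replaces the linked list, and all `toy_` names are hypothetical.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

struct toy_cpu_list {
	int lock_acquisitions;	/* stands in for taking the list's spinlock */
	int nr_queued;		/* stands in for the list contents */
};

struct toy_vcore {
	int pcpu;	/* CPU whose list the vcore was queued on */
	int queued;
};

static struct toy_cpu_list toy_lists[TOY_NR_CPUS];

/* Queue the vcore on the list of the CPU it is preempted on and
 * remember which CPU that was. */
static void toy_preempt(struct toy_vcore *vc, int current_cpu)
{
	struct toy_cpu_list *lp = &toy_lists[current_cpu];

	lp->lock_acquisitions++;
	vc->pcpu = current_cpu;
	vc->queued = 1;
	lp->nr_queued++;
}

/* Dequeue using the remembered CPU. current_cpu is deliberately
 * ignored, since the runner task may have migrated in the meantime. */
static void toy_end_preempt(struct toy_vcore *vc, int current_cpu)
{
	(void)current_cpu;
	if (vc->queued) {
		struct toy_cpu_list *lp = &toy_lists[vc->pcpu];

		lp->lock_acquisitions++;
		vc->queued = 0;
		lp->nr_queued--;
	}
}
```

If a vcore is queued while running on CPU 1 and its runner later wakes up on CPU 3, the model touches only CPU 1's list and lock, which is the behaviour the `per_cpu(preempted_vcores, vc->pcpu)` lookup in the patch is after.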
[PATCH 0/2] Two fixes for dynamic micro-threading
This series contains two fixes for the new dynamic micro-threading
code that was added recently for HV-mode KVM on Power servers.
The patches are against Alex Graf's kvm-ppc-queue branch.  Please
apply.

Paul.

 arch/powerpc/kvm/book3s_hv.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
[PATCH v3] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
v3: Rename MAX_THREADS to MAX_SMT_THREADS to avoid a compile warning

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h       |   3 +
 arch/powerpc/kernel/asm-offsets.c         |   7 +
 arch/powerpc/kvm/book3s_hv.c              | 367 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 473 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..57d5dfe 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_SMT_THREADS	8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES		4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+	unsigned long	rpr;
+	unsigned long	pmmar;
+	unsigned long	ldbar;
+	u8		subcore_size;
+	u8		do_nap;
+	u8		napped[MAX_SMT_THREADS];
+	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ		0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm
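[Editor's illustration, not part of the original patch.] The dynamic_mt_modes description in the commit message above (0 = none, 2 = 2-way only, 4 = 4-way only, 6 = both) amounts to testing a bit whose value equals the split factor. The helper below is a hypothetical sketch of that reading; the name `split_mode_allowed` does not come from the patch, and the actual selection logic in book3s_hv.c is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if an n-way split (n_way is 2 or 4) is permitted by the
 * bitmap, where bit value 2 enables 2-way and bit value 4 enables 4-way
 * micro-threading. Any other n_way is rejected. */
static bool split_mode_allowed(unsigned int dynamic_mt_modes,
			       unsigned int n_way)
{
	if (n_way != 2 && n_way != 4)
		return false;
	return (dynamic_mt_modes & n_way) != 0;
}
```

Under this reading, the default value 6 permits both split factors, 2 or 4 permits exactly one, and 0 disables dynamic micro-threading altogether.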
[PATCH v2 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
v2: List allowed values for dynamic_mt_modes module parameter in the
    module parameter description.

 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h       |   3 +
 arch/powerpc/kernel/asm-offsets.c         |   7 +
 arch/powerpc/kvm/book3s_hv.c              | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS		8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES		4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+	unsigned long	rpr;
+	unsigned long	pmmar;
+	unsigned long	ldbar;
+	u8		subcore_size;
+	u8		do_nap;
+	u8		napped[MAX_THREADS];
+	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ		0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch
[PATCH v2 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
v2: List allowed values for dynamic_mt_modes module parameter in the
module parameter description.

 arch/powerpc/include/asm/kvm_book3s_asm.h | 20 ++
 arch/powerpc/include/asm/kvm_host.h | 3 +
 arch/powerpc/kernel/asm-offsets.c | 7 +
 arch/powerpc/kvm/book3s_hv.c | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c | 25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS		8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES		4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+	unsigned long	rpr;
+	unsigned long	pmmar;
+	unsigned long	ldbar;
+	u8		subcore_size;
+	u8		do_nap;
+	u8		napped[MAX_THREADS];
+	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ		0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch
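As an illustration of the dynamic_mt_modes bitmap described in the commit message above, here is a minimal sketch of how such a value could be decoded.  The helper names mt_mode_allowed() and max_allowed_split() are hypothetical and do not appear in the patch; the only thing taken from the message is that bit value 2 stands for 2-way and bit value 4 for 4-way micro-threading.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): report whether an n-way
 * split is permitted by a dynamic_mt_modes bitmap.  Per the commit
 * message, the bit value equals the number of subcores:
 * 2 -> 2-way, 4 -> 4-way, 6 -> both, 0 -> none.
 */
static bool mt_mode_allowed(unsigned int dynamic_mt_modes,
			    unsigned int n_subcores)
{
	assert(n_subcores == 2 || n_subcores == 4);
	return (dynamic_mt_modes & n_subcores) != 0;
}

/*
 * Hypothetical helper: the largest split the bitmap permits, or 1
 * when only whole-core mode may be used.
 */
static unsigned int max_allowed_split(unsigned int dynamic_mt_modes)
{
	if (mt_mode_allowed(dynamic_mt_modes, 4))
		return 4;
	if (mt_mode_allowed(dynamic_mt_modes, 2))
		return 2;
	return 1;
}
```

With the default value of 6 this sketch reports that both modes are allowed, and with 0 it falls back to whole-core mode.  How the real code chooses between the permitted modes is not shown in the quoted part of the diff.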
[PATCH 3/5] KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
The reference (R) and change (C) bits in a HPT entry can be set by
hardware at any time up until the HPTE is invalidated and the TLB
invalidation sequence has completed.  This means that when removing
a HPTE, we need to read the HPTE after the invalidation sequence has
completed in order to obtain reliable values of R and C.  The code
in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd3265fb
("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
read after invalidation as a side effect of other changes.  This
restores the read of the HPTE after invalidation.

The user-visible effect of this bug would be that when migrating a
guest, there is a small probability that a page modified by the guest
and then unmapped by the guest might not get re-transmitted and thus
the destination might end up with a stale copy of the page.

Fixes: 6f22bd3265fb
Cc: sta...@vger.kernel.org # v3.17+
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index b027a89..c6d601c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -421,14 +421,20 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	v = pte & ~HPTE_V_HVLOCK;
 	if (v & HPTE_V_VALID) {
-		u64 pte1;
-
-		pte1 = be64_to_cpu(hpte[1]);
 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
-		rb = compute_tlbie_rb(v, pte1, pte_index);
+		rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
-		/* Read PTE low word after tlbie to get final R/C values */
-		remove_revmap_chain(kvm, pte_index, rev, v, pte1);
+		/*
+		 * The reference (R) and change (C) bits in a HPT
+		 * entry can be set by hardware at any time up until
+		 * the HPTE is invalidated and the TLB invalidation
+		 * sequence has completed.  This means that when
+		 * removing a HPTE, we need to re-read the HPTE after
+		 * the invalidation sequence has completed in order to
+		 * obtain reliable values of R and C.
+		 */
+		remove_revmap_chain(kvm, pte_index, rev, v,
+				    be64_to_cpu(hpte[1]));
 	}
 	r = rev->guest_rpte & ~HPTE_GR_RESERVED;
 	note_hpte_modification(kvm, rev);
-- 
2.1.4
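The ordering problem fixed above can be shown with a small user-space model.  This is only a sketch under stated assumptions: struct toy_hpte, the TOY_* bit values and toy_hw_late_update() are invented for the illustration, and the "hardware" is simulated by an ordinary function call that sets R and C between the clearing of the valid bit and the end of the invalidation sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; not taken from the kernel headers */
#define TOY_VALID	0x1u
#define TOY_R		0x100u
#define TOY_C		0x080u

struct toy_hpte {
	uint64_t v;	/* first doubleword: valid bit */
	uint64_t r;	/* second doubleword: R and C bits */
};

/*
 * Stand-in for hardware: an access already in flight may still set
 * R and C until the invalidation sequence has completed.
 */
static void toy_hw_late_update(struct toy_hpte *e)
{
	e->r |= TOY_R | TOY_C;
}

/* Buggy order: R/C are sampled before the invalidation finishes */
static uint64_t toy_remove_read_before(struct toy_hpte *e)
{
	uint64_t r = e->r;		/* sampled too early */

	e->v &= ~(uint64_t)TOY_VALID;
	toy_hw_late_update(e);		/* lands during the tlbie sequence */
	return r & (TOY_R | TOY_C);
}

/* Fixed order: R/C are read after the invalidation has completed */
static uint64_t toy_remove_read_after(struct toy_hpte *e)
{
	e->v &= ~(uint64_t)TOY_VALID;
	toy_hw_late_update(e);
	return e->r & (TOY_R | TOY_C);	/* final values */
}
```

In this model the early read misses the change bit, which corresponds to the lost dirty page described in the commit message; the late read sees it.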
[PATCH 5/5] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes.  When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.

These hypercalls are not used by Linux guests.  These implementations
have been tested using a FreeBSD guest.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 126 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 +-
 2 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c7a3ab2..c1df9bb 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -112,25 +112,38 @@ void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize)
 }
 EXPORT_SYMBOL_GPL(kvmppc_update_rmap_change);
 
+/* Returns a pointer to the revmap entry for the page mapped by a HPTE */
+static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
+				      unsigned long hpte_gr)
+{
+	struct kvm_memory_slot *memslot;
+	unsigned long *rmap;
+	unsigned long gfn;
+
+	gfn = hpte_rpn(hpte_gr, hpte_page_size(hpte_v, hpte_gr));
+	memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
+	if (!memslot)
+		return NULL;
+
+	rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
+	return rmap;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 				struct revmap_entry *rev,
 				unsigned long hpte_v, unsigned long hpte_r)
 {
 	struct revmap_entry *next, *prev;
-	unsigned long gfn, ptel, head;
-	struct kvm_memory_slot *memslot;
+	unsigned long ptel, head;
 	unsigned long *rmap;
 	unsigned long rcbits;
 
 	rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
 	ptel = rev->guest_rpte |= rcbits;
-	gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
-	memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
-	if (!memslot)
+	rmap = revmap_for_hpte(kvm, hpte_v, ptel);
+	if (!rmap)
 		return;
-
-	rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
 	lock_rmap(rmap);
 
 	head = *rmap & KVMPPC_RMAP_INDEX;
@@ -678,6 +691,105 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 	return H_SUCCESS;
 }
 
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	__be64 *hpte;
+	unsigned long v, r, gr;
+	struct revmap_entry *rev;
+	unsigned long *rmap;
+	long ret = H_NOT_FOUND;
+
+	if (pte_index >= kvm->arch.hpt_npte)
+		return H_PARAMETER;
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+		cpu_relax();
+	v = be64_to_cpu(hpte[0]);
+	r = be64_to_cpu(hpte[1]);
+	if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+		goto out;
+
+	gr = rev->guest_rpte;
+	if (rev->guest_rpte & HPTE_R_R) {
+		rev->guest_rpte &= ~HPTE_R_R;
+		note_hpte_modification(kvm, rev);
+	}
+	if (v & HPTE_V_VALID) {
+		gr |= r & (HPTE_R_R | HPTE_R_C);
+		if (r & HPTE_R_R) {
+			kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
+			rmap = revmap_for_hpte(kvm, v, gr);
+			if (rmap) {
+				lock_rmap(rmap);
+				*rmap |= KVMPPC_RMAP_REFERENCED;
+				unlock_rmap(rmap);
+			}
+		}
+	}
+	vcpu->arch.gpr[4] = gr;
+	ret = H_SUCCESS;
+ out:
+	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+	return ret;
+}
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	__be64 *hpte;
+	unsigned long v, r, gr;
+	struct revmap_entry *rev;
+	unsigned long *rmap;
+	long ret = H_NOT_FOUND;
+
+	if (pte_index >= kvm->arch.hpt_npte)
+		return H_PARAMETER;
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte
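The test-and-clear behaviour that the commit message describes for H_CLEAR_REF can be modelled without any kernel infrastructure.  The following is a simplified sketch, not the hypercall itself: struct toy_mapping, its fields and the TOY_* bit values are made up for the example, locking is omitted, and the rmap entry is reduced to a single flag.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; not taken from the kernel headers */
#define TOY_R	0x100u
#define TOY_C	0x080u

/* Hypothetical, flattened view of the state the hypercall touches */
struct toy_mapping {
	uint64_t guest_rpte;	/* guest view of the second doubleword */
	uint64_t hpte_r;	/* real HPTE second doubleword */
	int valid;		/* is the real HPTE valid? */
	int rmap_referenced;	/* host-side "page was referenced" flag */
};

/*
 * Return the old R/C state to the guest, clear R in both the guest
 * view and the real HPTE, and remember the reference on the host
 * side so that page aging still sees it.
 */
static uint64_t toy_clear_ref(struct toy_mapping *m)
{
	uint64_t gr = m->guest_rpte;

	m->guest_rpte &= ~(uint64_t)TOY_R;
	if (m->valid) {
		gr |= m->hpte_r & (TOY_R | TOY_C);
		if (m->hpte_r & TOY_R) {
			m->hpte_r &= ~(uint64_t)TOY_R;
			m->rmap_referenced = 1;
		}
	}
	return gr;
}
```

A second call on the same mapping returns no R bit until the page is referenced again, while the host-side flag keeps the earlier reference visible.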
[PATCH 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_asm.h | 20 ++
 arch/powerpc/include/asm/kvm_host.h | 3 +
 arch/powerpc/kernel/asm-offsets.c | 7 +
 arch/powerpc/kvm/book3s_hv.c | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c | 25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS		8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES		4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+	unsigned long	rpr;
+	unsigned long	pmmar;
+	unsigned long	ldbar;
+	u8		subcore_size;
+	u8		do_nap;
+	u8		napped[MAX_THREADS];
+	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ		0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index d333664..c3e11e0 100644
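The remark above that some secondary threads become the primary thread of their subcore can be made concrete with a little arithmetic.  The sketch below assumes that a split core divides its MAX_THREADS hardware threads into equal, contiguous groups, one group per subcore; that layout is an assumption of this example rather than something the quoted diff states, and the three helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS	8	/* value taken from the patch */

/* Threads per subcore when the core is split n_subcores ways */
static unsigned int subcore_size(unsigned int n_subcores)
{
	assert(n_subcores == 1 || n_subcores == 2 || n_subcores == 4);
	return MAX_THREADS / n_subcores;
}

/* Assumed layout: subcore k owns threads k*size .. k*size + size - 1 */
static unsigned int thread_to_subcore(unsigned int thread,
				      unsigned int n_subcores)
{
	assert(thread < MAX_THREADS);
	return thread / subcore_size(n_subcores);
}

/* The first thread of each subcore acts as that subcore's primary */
static bool is_subcore_primary(unsigned int thread,
			       unsigned int n_subcores)
{
	assert(thread < MAX_THREADS);
	return thread % subcore_size(n_subcores) == 0;
}
```

Under this assumption, thread 4 is an ordinary secondary thread in whole-core mode but is the primary thread of the second subcore in 2-way mode, which is why it has to be started even when it has no vcpu of its own to run.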
[PATCH 1/5] KVM: PPC: Book3S HV: Make use of unused threads when running guests
When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical
threads are currently unused.  This makes it possible to use them to
run one or more other virtual cores from the same guest when certain
conditions are met.  This applies on POWER7, and on POWER8 to guests
with one thread per virtual core.  (It doesn't apply to POWER8 guests
with multiple threads per vcore because they require a 1-1 virtual to
physical thread mapping in order to be able to use msgsndp and the
TIR.)

The idea is that we maintain a list of preempted vcores for each
physical cpu (i.e. each core, since the host runs single-threaded).
Then, when a vcore is about to run, it checks to see if there are
any vcores on the list for its physical cpu that could be
piggybacked onto this vcore's execution.  If so, those additional
vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
threads are started as well as the original vcore, which is called
the master vcore.

After the vcores have exited the guest, the extra ones are put back
onto the preempted list if any of their VCPUs are still runnable and
not idle.

This means that vcpu->arch.ptid is no longer necessarily the same as
the physical thread that the vcpu runs on.  In order to make it easier
for code that wants to send an IPI to know which CPU to target, we
now store that in a new field in struct vcpu_arch, called thread_cpu.

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>
Tested-by: Laurent Vivier <lviv...@redhat.com>
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h | 19 +-
 arch/powerpc/kernel/asm-offsets.c | 2 +
 arch/powerpc/kvm/book3s_hv.c | 333 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c | 7 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 5 +
 6 files changed, 298 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..2b74490 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -278,7 +278,9 @@ struct kvmppc_vcore {
 	u16 last_cpu;
 	u8 vcore_state;
 	u8 in_guest;
+	struct kvmppc_vcore *master_vcore;
 	struct list_head runnable_threads;
+	struct list_head preempt_list;
 	spinlock_t lock;
 	wait_queue_head_t wq;
 	spinlock_t stoltb_lock;	/* protects stolen_tb and preempt_tb */
@@ -300,12 +302,18 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
-/* Values for vcore_state */
+/*
+ * Values for vcore_state.
+ * Note that these are arranged such that lower values
+ * (< VCORE_SLEEPING) don't require stolen time accounting
+ * on load/unload, and higher values do.
+ */
 #define VCORE_INACTIVE	0
-#define VCORE_SLEEPING	1
-#define VCORE_PREEMPT	2
-#define VCORE_RUNNING	3
-#define VCORE_EXITING	4
+#define VCORE_PREEMPT	1
+#define VCORE_PIGGYBACK	2
+#define VCORE_SLEEPING	3
+#define VCORE_RUNNING	4
+#define VCORE_EXITING	5
 
 /*
  * Struct used to manage memory for a virtual processor area
@@ -619,6 +627,7 @@ struct kvm_vcpu_arch {
 	int trap;
 	int state;
 	int ptid;
+	int thread_cpu;
 	bool timer_running;
 	wait_queue_head_t cpu_run;
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0034b6b..d333664 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -512,6 +512,8 @@ int main(void)
 	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
 	DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
 	DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst));
+	DEFINE(VCPU_CPU, offsetof(struct kvm_vcpu, cpu));
+	DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
 	DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68d067a..2048309 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,9 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 #define MPP_BUFFER_ORDER	3
 #endif
 
+static int target_smt_mode;
+module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -114,7 +117,7 @@ static bool kvmppc_ipi_thread(int cpu)
 
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-	int cpu = vcpu->cpu;
+	int cpu;
 	wait_queue_head_t *wqp;
 
 	wqp = kvm_arch_vcpu_wq(vcpu);
@@ -123,10 +126,11 @@ static void
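The renumbering of the vcore_state values above lets a single comparison decide whether stolen time accounting applies.  The following sketch shows that idea; the state values are copied from the patch, while the helper needs_stolen_time_accounting() is a hypothetical name chosen for the example, and the part of the diff that actually uses the ordering is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>

/* State values as renumbered by the patch */
#define VCORE_INACTIVE	0
#define VCORE_PREEMPT	1
#define VCORE_PIGGYBACK	2
#define VCORE_SLEEPING	3
#define VCORE_RUNNING	4
#define VCORE_EXITING	5

/*
 * Hypothetical helper: per the comment in the patch, states below
 * VCORE_SLEEPING need no stolen time accounting on load/unload,
 * and VCORE_SLEEPING and above do.
 */
static bool needs_stolen_time_accounting(int vcore_state)
{
	assert(vcore_state >= VCORE_INACTIVE && vcore_state <= VCORE_EXITING);
	return vcore_state >= VCORE_SLEEPING;
}
```

Placing VCORE_PREEMPT and the new VCORE_PIGGYBACK below VCORE_SLEEPING is what keeps this a one-line range check instead of a list of individual states.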
[PATCH 0/5] PPC: Current patch queue for HV KVM
This is my current queue of patches for HV KVM.  This series is based
on the kvm next branch.  They have all been posted 6 weeks ago or
more, though I have just added a 3-line fix to patch 2/5 to fix a bug
that we found in testing migration, and I expanded a comment (no code
change) in patch 3/5 following a suggestion by Aneesh.

I'd like to see these go into 4.2 if possible.

Paul.
---
 arch/powerpc/include/asm/kvm_book3s.h | 1 +
 arch/powerpc/include/asm/kvm_book3s_asm.h | 20 +
 arch/powerpc/include/asm/kvm_host.h | 24 +-
 arch/powerpc/kernel/asm-offsets.c | 9 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 8 +-
 arch/powerpc/kvm/book3s_hv.c | 648 ++
 arch/powerpc/kvm/book3s_hv_builtin.c | 32 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 161 +++-
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 122 +-
 10 files changed, 906 insertions(+), 123 deletions(-)
[PATCH v2] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
v2: Add a test (3 lines) to book3s_hv_rmhandlers.S to ensure that we
don't subtract the timebase offset in cases where we didn't add it.
This fixes a bug found in testing where the timebase could get out of
sync, causing soft lockups and crashes.

 arch/powerpc/include/asm/kvm_book3s_asm.h | 20 ++
 arch/powerpc/include/asm/kvm_host.h | 3 +
 arch/powerpc/kernel/asm-offsets.c | 7 +
 arch/powerpc/kvm/book3s_hv.c | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c | 25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS		8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES		4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+	unsigned long	rpr;
+	unsigned long	pmmar;
+	unsigned long	ldbar;
+	u8		subcore_size;
+	u8		do_nap;
+	u8		napped[MAX_THREADS];
+	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore
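The v2 note above concerns keeping the add and the subtract of the guest timebase offset paired.  The real check lives in assembly in book3s_hv_rmhandlers.S and is not part of the quoted diff, so the following is only a toy model of the invariant in C: struct toy_tb, its offset_applied flag and both functions are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the timebase state for one core */
struct toy_tb {
	int64_t timebase;	/* current timebase value */
	int64_t guest_offset;	/* guest's timebase offset */
	bool offset_applied;	/* did we switch to guest context? */
};

/* Entry path: skip the switch if an exit was already requested */
static void toy_guest_entry(struct toy_tb *tb, bool exit_requested)
{
	if (exit_requested)
		return;		/* never switched to guest context */
	tb->timebase += tb->guest_offset;
	tb->offset_applied = true;
}

/* Exit path: only undo what the entry path actually did */
static void toy_guest_exit(struct toy_tb *tb)
{
	if (!tb->offset_applied)
		return;		/* nothing to switch back */
	tb->timebase -= tb->guest_offset;
	tb->offset_applied = false;
}
```

Without the guard in the exit path, an early exit request would leave the timebase short by one offset, which matches the kind of drift the v2 note reports as causing soft lockups.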
[PATCH 5/5] KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
This adds implementations for the H_CLEAR_REF (test and clear reference
bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.

When clearing the reference or change bit in the guest view of the HPTE,
we also have to clear it in the real HPTE so that we can detect future
references or changes.  When we do so, we transfer the R or C bit value
to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
has been referenced and/or changed.

These hypercalls are not used by Linux guests.  These implementations
have been tested using a FreeBSD guest.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c     | 126 ++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   4 +-
 2 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c7a3ab2..c1df9bb 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -112,25 +112,38 @@ void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize)
 }
 EXPORT_SYMBOL_GPL(kvmppc_update_rmap_change);
 
+/* Returns a pointer to the revmap entry for the page mapped by a HPTE */
+static unsigned long *revmap_for_hpte(struct kvm *kvm, unsigned long hpte_v,
+				      unsigned long hpte_gr)
+{
+	struct kvm_memory_slot *memslot;
+	unsigned long *rmap;
+	unsigned long gfn;
+
+	gfn = hpte_rpn(hpte_gr, hpte_page_size(hpte_v, hpte_gr));
+	memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
+	if (!memslot)
+		return NULL;
+
+	rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
+	return rmap;
+}
+
 /* Remove this HPTE from the chain for a real page */
 static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 				struct revmap_entry *rev,
 				unsigned long hpte_v, unsigned long hpte_r)
 {
 	struct revmap_entry *next, *prev;
-	unsigned long gfn, ptel, head;
-	struct kvm_memory_slot *memslot;
+	unsigned long ptel, head;
 	unsigned long *rmap;
 	unsigned long rcbits;
 
 	rcbits = hpte_r & (HPTE_R_R | HPTE_R_C);
 	ptel = rev->guest_rpte |= rcbits;
-	gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
-	memslot = __gfn_to_memslot(kvm_memslots_raw(kvm), gfn);
-	if (!memslot)
+	rmap = revmap_for_hpte(kvm, hpte_v, ptel);
+	if (!rmap)
 		return;
-
-	rmap = real_vmalloc_addr(&memslot->arch.rmap[gfn - memslot->base_gfn]);
 	lock_rmap(rmap);
 
 	head = *rmap & KVMPPC_RMAP_INDEX;
@@ -678,6 +691,105 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 	return H_SUCCESS;
 }
 
+long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	__be64 *hpte;
+	unsigned long v, r, gr;
+	struct revmap_entry *rev;
+	unsigned long *rmap;
+	long ret = H_NOT_FOUND;
+
+	if (pte_index >= kvm->arch.hpt_npte)
+		return H_PARAMETER;
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+		cpu_relax();
+	v = be64_to_cpu(hpte[0]);
+	r = be64_to_cpu(hpte[1]);
+	if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+		goto out;
+
+	gr = rev->guest_rpte;
+	if (rev->guest_rpte & HPTE_R_R) {
+		rev->guest_rpte &= ~HPTE_R_R;
+		note_hpte_modification(kvm, rev);
+	}
+	if (v & HPTE_V_VALID) {
+		gr |= r & (HPTE_R_R | HPTE_R_C);
+		if (r & HPTE_R_R) {
+			kvmppc_clear_ref_hpte(kvm, hpte, pte_index);
+			rmap = revmap_for_hpte(kvm, v, gr);
+			if (rmap) {
+				lock_rmap(rmap);
+				*rmap |= KVMPPC_RMAP_REFERENCED;
+				unlock_rmap(rmap);
+			}
+		}
+	}
+	vcpu->arch.gpr[4] = gr;
+	ret = H_SUCCESS;
+ out:
+	unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
+	return ret;
+}
+
+long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
+			unsigned long pte_index)
+{
+	struct kvm *kvm = vcpu->kvm;
+	__be64 *hpte;
+	unsigned long v, r, gr;
+	struct revmap_entry *rev;
+	unsigned long *rmap;
+	long ret = H_NOT_FOUND;
+
+	if (pte_index >= kvm->arch.hpt_npte)
+		return H_PARAMETER;
+
+	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+	hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte
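For readers who want the test-and-clear idea without the HPT locking, here is a minimal user-space sketch. It is not the kernel code: struct toy_hpte and toy_clear_ref() are hypothetical names, the bit values are assumed to match the kernel headers, and HPTE locking and the valid/absent checks are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the kernel headers hold the authoritative definitions. */
#define HPTE_R_R        0x100ull   /* reference bit */
#define HPTE_R_C        0x080ull   /* change bit */
#define RMAP_RC_SHIFT   32
#define RMAP_REFERENCED (HPTE_R_R << RMAP_RC_SHIFT)

/* Hypothetical stand-in for one HPTE plus its bookkeeping words. */
struct toy_hpte {
	uint64_t real_r;      /* second doubleword of the hardware HPTE */
	uint64_t guest_rpte;  /* guest view kept in the revmap entry */
	uint64_t rmap;        /* rmap word for the backing host page */
};

/*
 * Test and clear the reference bit.  Returns the R/C state as the guest
 * would have seen it before the clear.
 */
static uint64_t toy_clear_ref(struct toy_hpte *h)
{
	uint64_t gr;

	assert(h);
	gr = h->guest_rpte;
	h->guest_rpte &= ~HPTE_R_R;

	/* Fold in whatever the hardware has recorded since the last harvest. */
	gr |= h->real_r & (HPTE_R_R | HPTE_R_C);
	if (h->real_r & HPTE_R_R) {
		h->real_r &= ~HPTE_R_R;      /* so a future reference is detected */
		h->rmap |= RMAP_REFERENCED;  /* the host keeps its own record */
	}
	return gr;
}
```

The point of the last two assignments is that the guest and the host each need their own copy of the reference information: the guest is allowed to clear its view, but page aging on the host side must still see that the page was touched.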
[PATCH 2/5] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

It is now possible for thread 0 to find that an exit has been
requested before it gets to switch the subcore state to the guest.  In
that case we haven't added the guest's timebase offset to the
timebase, so we need to be careful not to subtract the offset in the
guest exit path.  In fact we just skip the whole path that switches
back to host context, since we haven't switched to the guest context.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h       |   3 +
 arch/powerpc/kernel/asm-offsets.c         |   7 +
 arch/powerpc/kvm/book3s_hv.c              | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 113 +++--
 6 files changed, 475 insertions(+), 62 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS		8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES		4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+	unsigned long	rpr;
+	unsigned long	pmmar;
+	unsigned long	ldbar;
+	u8		subcore_size;
+	u8		do_nap;
+	u8		napped[MAX_THREADS];
+	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ		0x10000
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index d333664..c3e11e0 100644
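To make the dynamic_mt_modes bitmap concrete, here is a small sketch of how such a mask could be consulted.  It only assumes what the description says (bit value 2 enables 2-way, bit value 4 enables 4-way); mt_mode_allowed() and pick_split() are hypothetical helpers, not functions from the patch, and the selection policy shown is just one plausible choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values follow the patch description: 2 = 2-way, 4 = 4-way. */
static bool mt_mode_allowed(int dynamic_mt_modes, int n_subcores)
{
	assert(n_subcores == 2 || n_subcores == 4);
	return (dynamic_mt_modes & n_subcores) != 0;
}

/*
 * Pick the smallest permitted split that provides at least `wanted`
 * subcores.  Returns 1 for whole-core mode (no split needed) and 0 if
 * no permitted mode is large enough.
 */
static int pick_split(int dynamic_mt_modes, int wanted)
{
	if (wanted <= 1)
		return 1;
	if (wanted <= 2 && mt_mode_allowed(dynamic_mt_modes, 2))
		return 2;
	if (wanted <= 4 && mt_mode_allowed(dynamic_mt_modes, 4))
		return 4;
	return 0;
}
```

With the default of 6, two vcores from different VMs would get a 2-way split; with dynamic_mt_modes=4 the same request would fall through to a 4-way split, and with 0 it would not be satisfied at all.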
[PATCH 4/5] KVM: PPC: Book3S HV: Fix bug in dirty page tracking
This fixes a bug in the tracking of pages that get modified by the
guest.  If the guest creates a large-page HPTE, writes to memory
somewhere within the large page, and then removes the HPTE, we only
record the modified state for the first normal page within the large
page, when in fact the guest might have modified some other normal
page within the large page.

To fix this we use some unused bits in the rmap entry to record the
order (log base 2) of the size of the page that was modified, when
removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
order to return the correct number of modified pages.

The same thing could in principle happen when removing a HPTE at the
host's request, i.e. when paging out a page, except that we never
page out large pages, and the guest can only create large-page HPTEs
if the guest RAM is backed by large pages.  However, we also fix this
case for the sake of future-proofing.

The reference bit is also subject to the same loss of information.  We
don't make the same fix here for the reference bit because there isn't
an interface for userspace to find out which pages the guest has
referenced, whereas there is one for userspace to find out which pages
the guest has modified.  Because of this loss of information, the
kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
say that a page has not been referenced when it has, but that doesn't
matter greatly because we never page or swap out large pages.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s.h |  1 +
 arch/powerpc/include/asm/kvm_host.h   |  2 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  8 +++++++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   | 17 +++++++++++++++++
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index b91e74a..e6b2534 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -158,6 +158,7 @@ extern pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 			bool *writable);
 extern void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,
 			unsigned long *rmap, long pte_index, int realmode);
+extern void kvmppc_update_rmap_change(unsigned long *rmap, unsigned long psize);
 extern void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
 			unsigned long pte_index);
 void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 80eb29a..e187b6a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -205,8 +205,10 @@ struct revmap_entry {
  */
 #define KVMPPC_RMAP_LOCK_BIT	63
 #define KVMPPC_RMAP_RC_SHIFT	32
+#define KVMPPC_RMAP_CHG_SHIFT	48
 #define KVMPPC_RMAP_REFERENCED	(HPTE_R_R << KVMPPC_RMAP_RC_SHIFT)
 #define KVMPPC_RMAP_CHANGED	(HPTE_R_C << KVMPPC_RMAP_RC_SHIFT)
+#define KVMPPC_RMAP_CHG_ORDER	(0x3ful << KVMPPC_RMAP_CHG_SHIFT)
 #define KVMPPC_RMAP_PRESENT	0x100000000ul
 #define KVMPPC_RMAP_INDEX	0xfffffffful
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dab68b7..1f9c0a1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -761,6 +761,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			/* Harvest R and C */
 			rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
 			*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+			if (rcbits & HPTE_R_C)
+				kvmppc_update_rmap_change(rmapp, psize);
 			if (rcbits & ~rev[i].guest_rpte) {
 				rev[i].guest_rpte = ptel | rcbits;
 				note_hpte_modification(kvm, &rev[i]);
@@ -927,8 +929,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
  retry:
 	lock_rmap(rmapp);
 	if (*rmapp & KVMPPC_RMAP_CHANGED) {
-		*rmapp &= ~KVMPPC_RMAP_CHANGED;
+		long change_order = (*rmapp & KVMPPC_RMAP_CHG_ORDER)
+			>> KVMPPC_RMAP_CHG_SHIFT;
+		*rmapp &= ~(KVMPPC_RMAP_CHANGED | KVMPPC_RMAP_CHG_ORDER);
 		npages_dirty = 1;
+		if (change_order > PAGE_SHIFT)
+			npages_dirty = 1ul << (change_order - PAGE_SHIFT);
 	}
 	if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
 		unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index c6d601c..c7a3ab2 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -12,6 +12,7 @@
 #include <linux/kvm_host.h>
 #include <linux/hugetlb.h>
 #include <linux/module.h>
+#include <linux/log2.h>
 
 #include <asm
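As a worked example of the order bookkeeping, the following user-space sketch records the page-size order in bits 48..53 of an rmap word and converts it back to a count of base pages.  The toy_* names are hypothetical, the base page size is assumed to be 64 KiB (PAGE_SHIFT of 16), and unlike the patch the helper here also sets the changed bit itself, purely to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  16                          /* assumption: 64 KiB pages */
#define HPTE_R_C        0x80ull
#define RMAP_RC_SHIFT   32
#define RMAP_CHG_SHIFT  48
#define RMAP_CHANGED    (HPTE_R_C << RMAP_RC_SHIFT)
#define RMAP_CHG_ORDER  (0x3full << RMAP_CHG_SHIFT)

/* floor(log2(v)) for v > 0 */
static unsigned int toy_ilog2(uint64_t v)
{
	unsigned int order = 0;

	assert(v != 0);
	while ((v >>= 1) != 0)
		order++;
	return order;
}

/* Note a modification of a page of psize bytes; keep the largest order seen. */
static void toy_update_rmap_change(uint64_t *rmap, uint64_t psize)
{
	uint64_t order;

	if (!psize)
		return;
	order = (uint64_t)toy_ilog2(psize) << RMAP_CHG_SHIFT;
	*rmap |= RMAP_CHANGED;
	if (order > (*rmap & RMAP_CHG_ORDER))
		*rmap = (*rmap & ~RMAP_CHG_ORDER) | order;
}

/* Test and clear: returns how many base pages to report as dirty. */
static unsigned long toy_test_clear_dirty_npages(uint64_t *rmap)
{
	unsigned long npages = 0;

	if (*rmap & RMAP_CHANGED) {
		unsigned int order = (*rmap & RMAP_CHG_ORDER) >> RMAP_CHG_SHIFT;

		*rmap &= ~(RMAP_CHANGED | RMAP_CHG_ORDER);
		npages = 1;
		if (order > TOY_PAGE_SHIFT)
			npages = 1ul << (order - TOY_PAGE_SHIFT);
	}
	return npages;
}
```

Under these assumptions a modified 16 MiB page has order 24, so the dirty count is 1 << (24 - 16) = 256 base pages rather than 1.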
Re: [PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core
On Wed, Jun 17, 2015 at 07:30:09PM +0200, Laurent Vivier wrote:
> Tested-by: Laurent Vivier <lviv...@redhat.com>
>
> Performance is better, but Paul could you explain why it is better if
> I disable dynamic micro-threading? Did I miss something?
>
> My test system is an IBM Power S822L.
>
> I run two guests with 8 vCPUs (-smp 8,sockets=8,cores=1,threads=1) both
> attached on the same core (with pinning option of virt-manager). Then, I
> measure the time needed to compile a kernel in parallel in both guests
> with "make -j 16".
>
> My kernel without micro-threading:
>
> real    37m23.424s      real    37m24.959s
> user    167m31.474s     user    165m44.142s
> sys     113m26.195s     sys     113m45.072s
>
> With micro-threading patches (PATCH 1+2):
>
> target_smt_mode 0 [in fact it was 8 here, but it should behave like 0,
> as it is max threads/sub-core]
> dynamic_mt_modes 6
>
> real    32m13.338s      real    32m26.652s
> user    139m21.181s     user    140m20.994s
> sys     77m35.339s      sys     78m16.599s
>
> It's better, but if I disable dynamic micro-threading (but PATCH 1+2):
>
> target_smt_mode 0
> dynamic_mt_modes 0
>
> real    30m49.100s      real    30m48.161s
> user    144m22.989s     user    142m53.886s
> sys     65m4.942s       sys     66m8.159s
>
> it's even better.

I think what's happening here is that with dynamic_mt_modes=0 the
system alternates between the two guests, whereas with
dynamic_mt_modes=6 it will spend some of the time running both guests
simultaneously in two-way split mode.  Since you have two
compute-bound guests that each have threads=1 and 8 vcpus, it can fill
up the core either way.  In that case it is more efficient to fill up
the core with vcpus from one guest and not have to split the core,
firstly because you avoid the split/unsplit latency and secondly
because the threads run a little faster in whole-core mode than in
split-core.

I am considering adding an additional heuristic, which would be to do
two passes through the list of preempted vcores, considering only
vcores from the same guest as the primary vcore on the first pass, and
then considering all vcores on the second pass.  Maybe we could then
also say after the first pass that if we have collected 4 or more
runnable vcpus we don't bother with the second pass.

Paul.
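Purely as an illustration of the two-pass idea floated above (nothing like this is in the posted patches), the selection could be sketched as follows.  struct toy_vcore, toy_collect() and toy_two_pass() are hypothetical, the capacity check is reduced to a thread count, and counting the primary vcore's own vcpus towards the threshold of 4 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8

/* Hypothetical, much simplified view of a preempted vcore. */
struct toy_vcore {
	int vm_id;       /* which guest the vcore belongs to */
	int n_runnable;  /* runnable vcpus in this vcore */
	int chosen;      /* set once the vcore has been picked */
};

/* One pass over the list; same_vm_only restricts it to the primary's guest. */
static int toy_collect(struct toy_vcore *list, size_t n, int primary_vm,
		       int same_vm_only, int total)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_vcore *vc = &list[i];

		if (vc->chosen)
			continue;
		if (same_vm_only && vc->vm_id != primary_vm)
			continue;
		if (total + vc->n_runnable > MAX_THREADS)
			continue;
		vc->chosen = 1;
		total += vc->n_runnable;
	}
	return total;
}

/* Returns the number of vcpu threads that would run, primary included. */
static int toy_two_pass(struct toy_vcore *list, size_t n,
			int primary_vm, int primary_runnable)
{
	int total;

	assert(primary_runnable >= 0 && primary_runnable <= MAX_THREADS);
	total = toy_collect(list, n, primary_vm, 1, primary_runnable);
	if (total >= 4)
		return total;  /* enough work from one guest: no second pass */
	return toy_collect(list, n, primary_vm, 0, total);
}
```

In this sketch a primary vcore with one runnable vcpu plus a same-guest vcore with three stops after the first pass, so the core never has to be split for a second guest.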
[PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core
This patch series provides a way to use more of the capacity of each
processor core when running guests configured with threads=1, 2 or 4
on a POWER8 host with HV KVM, without having to change the static
micro-threading (the official name for split-core) mode for the whole
machine.  The problem with setting the machine to static 2-way or
4-way micro-threading mode is that (a) then you can't run guests with
threads=8 and (b) selecting the right mode can be tricky and requires
knowledge of what guests you will be running.

Instead, with these two patches, we can now run more than one virtual
core (vcore) on a given physical core if possible, and if that means
we need to switch the core to 2-way or 4-way micro-threading mode,
then we do that on entry to the guests and switch back to whole-core
mode on exit (and we only switch the one core, not the whole machine).
The core mode switching is only done if the machine is in static
whole-core mode.

All of this only comes into effect when a core is over-committed.
When the machine is lightly loaded everything operates the same with
these patches as without.  Only when some core has a vcore that is
able to run while there is also another vcore that was wanting to run
on that core but got preempted does the logic kick in to try to run
both vcores at once.

Paul.
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 +
 arch/powerpc/include/asm/kvm_host.h       |  22 +-
 arch/powerpc/kernel/asm-offsets.c         |   9 +
 arch/powerpc/kvm/book3s_hv.c              | 648 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  32 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c      |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 111 -
 7 files changed, 740 insertions(+), 106 deletions(-)
[PATCH 1/2] KVM: PPC: Book3S HV: Make use of unused threads when running guests
When running a virtual core of a guest that is configured with fewer
threads per core than the physical cores have, the extra physical
threads are currently unused.  This makes it possible to use them to
run one or more other virtual cores from the same guest when certain
conditions are met.  This applies on POWER7, and on POWER8 to guests
with one thread per virtual core.  (It doesn't apply to POWER8 guests
with multiple threads per vcore because they require a 1-1 virtual to
physical thread mapping in order to be able to use msgsndp and the
TIR.)

The idea is that we maintain a list of preempted vcores for each
physical cpu (i.e. each core, since the host runs single-threaded).
Then, when a vcore is about to run, it checks to see if there are
any vcores on the list for its physical cpu that could be
piggybacked onto this vcore's execution.  If so, those additional
vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
threads are started as well as the original vcore, which is called
the master vcore.

After the vcores have exited the guest, the extra ones are put back
onto the preempted list if any of their VCPUs are still runnable and
not idle.

This means that vcpu->arch.ptid is no longer necessarily the same as
the physical thread that the vcpu runs on.  In order to make it easier
for code that wants to send an IPI to know which CPU to target, we
now store that in a new field in struct vcpu_arch, called thread_cpu.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h     |  19 +-
 arch/powerpc/kernel/asm-offsets.c       |   2 +
 arch/powerpc/kvm/book3s_hv.c            | 333 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c    |   7 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c    |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   5 +
 6 files changed, 298 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..2b74490 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -278,7 +278,9 @@ struct kvmppc_vcore {
 	u16 last_cpu;
 	u8 vcore_state;
 	u8 in_guest;
+	struct kvmppc_vcore *master_vcore;
 	struct list_head runnable_threads;
+	struct list_head preempt_list;
 	spinlock_t lock;
 	wait_queue_head_t wq;
 	spinlock_t stoltb_lock;	/* protects stolen_tb and preempt_tb */
@@ -300,12 +302,18 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
-/* Values for vcore_state */
+/*
+ * Values for vcore_state.
+ * Note that these are arranged such that lower values
+ * (< VCORE_SLEEPING) don't require stolen time accounting
+ * on load/unload, and higher values do.
+ */
 #define VCORE_INACTIVE	0
-#define VCORE_SLEEPING	1
-#define VCORE_PREEMPT	2
-#define VCORE_RUNNING	3
-#define VCORE_EXITING	4
+#define VCORE_PREEMPT	1
+#define VCORE_PIGGYBACK	2
+#define VCORE_SLEEPING	3
+#define VCORE_RUNNING	4
+#define VCORE_EXITING	5
 
 /*
  * Struct used to manage memory for a virtual processor area
@@ -619,6 +627,7 @@ struct kvm_vcpu_arch {
 	int trap;
 	int state;
 	int ptid;
+	int thread_cpu;
 	bool timer_running;
 	wait_queue_head_t cpu_run;
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0034b6b..d333664 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -512,6 +512,8 @@ int main(void)
 	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
 	DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
 	DEFINE(VCPU_HEIR, offsetof(struct kvm_vcpu, arch.emul_inst));
+	DEFINE(VCPU_CPU, offsetof(struct kvm_vcpu, cpu));
+	DEFINE(VCPU_THREAD_CPU, offsetof(struct kvm_vcpu, arch.thread_cpu));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
 	DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 68d067a..2048309 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,9 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 #define MPP_BUFFER_ORDER	3
 #endif
 
+static int target_smt_mode;
+module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -114,7 +117,7 @@ static bool kvmppc_ipi_thread(int cpu)
 
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-	int cpu = vcpu->cpu;
+	int cpu;
 	wait_queue_head_t *wqp;
 
 	wqp = kvm_arch_vcpu_wq(vcpu);
@@ -123,10 +126,11 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 		++vcpu->stat.halt_wakeup
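The renumbering of the vcore_state values is what lets a single comparison decide whether stolen-time accounting is needed.  A minimal sketch of that check, using the new values from the hunk above; toy_needs_stolen_accounting() is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* State values as renumbered by the patch. */
enum toy_vcore_state {
	VCORE_INACTIVE  = 0,
	VCORE_PREEMPT   = 1,
	VCORE_PIGGYBACK = 2,
	VCORE_SLEEPING  = 3,
	VCORE_RUNNING   = 4,
	VCORE_EXITING   = 5,
};

/*
 * States below VCORE_SLEEPING need no stolen time accounting on
 * load/unload; VCORE_SLEEPING and above do.
 */
static bool toy_needs_stolen_accounting(enum toy_vcore_state s)
{
	assert(s <= VCORE_EXITING);
	return s >= VCORE_SLEEPING;
}
```

With the old numbering (SLEEPING = 1, PREEMPT = 2) the same comparison would have treated preempted vcores as needing accounting, which is why PREEMPT and the new PIGGYBACK state were moved below SLEEPING.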
[PATCH 2/2] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h       |   3 +
 arch/powerpc/kernel/asm-offsets.c         |   7 +
 arch/powerpc/kvm/book3s_hv.c              | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 106 +++--
 6 files changed, 469 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS		8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES		4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else  /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+	unsigned long	rpr;
+	unsigned long	pmmar;
+	unsigned long	ldbar;
+	u8		subcore_size;
+	u8		do_nap;
+	u8		napped[MAX_THREADS];
+	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ		0x10000
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index d333664..c3e11e0 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -676,7 +676,14 @@ int main(void)
 	HSTATE_FIELD(HSTATE_DSCR, host_dscr);
 	HSTATE_FIELD(HSTATE_DABR, dabr);
 	HSTATE_FIELD(HSTATE_DECEXP, dec_expires);
+	HSTATE_FIELD(HSTATE_SPLIT_MODE, kvm_split_mode);
 	DEFINE(IPI_PRIORITY, IPI_PRIORITY);
+	DEFINE(KVM_SPLIT_RPR, offsetof(struct
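To see why some secondary threads become "the primary thread for their subcore", it helps to write down the thread-to-subcore arithmetic for an 8-thread core.  This is a sketch only: the toy_* helpers are hypothetical and assume that subcores are contiguous, equal-sized groups of hardware threads, which is how the 2-way and 4-way modes are described above.

```c
#include <assert.h>

#define MAX_THREADS  8
#define MAX_SUBCORES 4

/* Threads per subcore for a 1-, 2- or 4-way split of an 8-thread core. */
static int toy_subcore_size(int n_subcores)
{
	assert(n_subcores == 1 || n_subcores == 2 || n_subcores == MAX_SUBCORES);
	return MAX_THREADS / n_subcores;
}

/* Which subcore a hardware thread (0..7) falls into. */
static int toy_thread_to_subcore(int thread, int n_subcores)
{
	assert(thread >= 0 && thread < MAX_THREADS);
	return thread / toy_subcore_size(n_subcores);
}

/* Non-zero if the thread is the first (primary) thread of its subcore. */
static int toy_is_subcore_primary(int thread, int n_subcores)
{
	assert(thread >= 0 && thread < MAX_THREADS);
	return thread % toy_subcore_size(n_subcores) == 0;
}
```

In 2-way mode thread 4 is a secondary thread of the core but the primary thread of subcore 1, so it is one of the threads that has to be started to do the MMU switch even when it has no vcpu of its own to run.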
[PATCH 2/2] KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
This builds on the ability to run more than one vcore on a physical
core by using the micro-threading (split-core) modes of the POWER8
chip.  Previously, only vcores from the same VM could be run together,
and (on POWER8) only if they had just one thread per core.  With the
ability to split the core on guest entry and unsplit it on guest exit,
we can run up to 8 vcpu threads from up to 4 different VMs, and we can
run multiple vcores with 2 or 4 vcpus per vcore.

Dynamic micro-threading is only available if the static configuration
of the cores is whole-core mode (unsplit), and only on POWER8.

To manage this, we introduce a new kvm_split_mode struct which is
shared across all of the subcores in the core, with a pointer in the
paca on each thread.  In addition we extend the core_info struct to
have information on each subcore.  When deciding whether to add a
vcore to the set already on the core, we now have two possibilities:
(a) piggyback the vcore onto an existing subcore, or (b) start a new
subcore.

Currently, when any vcpu needs to exit the guest and switch to host
virtual mode, we interrupt all the threads in all subcores and switch
the core back to whole-core mode.  It may be possible in future to
allow some of the subcores to keep executing in the guest while
subcore 0 switches to the host, but that is not implemented in this
patch.

This adds a module parameter called dynamic_mt_modes which controls
which micro-threading (split-core) modes the code will consider, as a
bitmap.  In other words, if it is 0, no micro-threading mode is
considered; if it is 2, only 2-way micro-threading is considered; if
it is 4, only 4-way, and if it is 6, both 2-way and 4-way
micro-threading mode will be considered.  The default is 6.

With this, we now have secondary threads which are the primary thread
for their subcore and therefore need to do the MMU switch.  These
threads will need to be started even if they have no vcpu to run, so
we use the vcore pointer in the PACA rather than the vcpu pointer to
trigger them.

Signed-off-by: Paul Mackerras <pau...@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  20 ++
 arch/powerpc/include/asm/kvm_host.h       |   3 +
 arch/powerpc/kernel/asm-offsets.c         |   7 +
 arch/powerpc/kvm/book3s_hv.c              | 369 ++
 arch/powerpc/kvm/book3s_hv_builtin.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 106 +++--
 6 files changed, 469 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 5bdfb5d..4024d24 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -25,6 +25,12 @@
 #define XICS_MFRR		0xc
 #define XICS_IPI		2	/* interrupt source # for IPIs */
 
+/* Maximum number of threads per physical core */
+#define MAX_THREADS		8
+
+/* Maximum number of subcores per physical core */
+#define MAX_SUBCORES		4
+
 #ifdef __ASSEMBLY__
 
 #ifdef CONFIG_KVM_BOOK3S_HANDLER
@@ -65,6 +71,19 @@ kvmppc_resume_\intno:
 
 #else /*__ASSEMBLY__ */
 
+struct kvmppc_vcore;
+
+/* Struct used for coordinating micro-threading (split-core) mode changes */
+struct kvm_split_mode {
+	unsigned long rpr;
+	unsigned long pmmar;
+	unsigned long ldbar;
+	u8 subcore_size;
+	u8 do_nap;
+	u8 napped[MAX_THREADS];
+	struct kvmppc_vcore *master_vcs[MAX_SUBCORES];
+};
+
 /*
  * This struct goes in the PACA on 64-bit processors.  It is used
  * to store host state that needs to be saved when we enter a guest
@@ -100,6 +119,7 @@ struct kvmppc_host_state {
 	u64 host_spurr;
 	u64 host_dscr;
 	u64 dec_expires;
+	struct kvm_split_mode *kvm_split_mode;
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
 	u64 cfar;
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2b74490..80eb29a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -302,6 +302,9 @@ struct kvmppc_vcore {
 #define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
 #define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
+/* This bit is used when a vcore exit is triggered from outside the vcore */
+#define VCORE_EXIT_REQ		0x1
+
 /*
  * Values for vcore_state.
  * Note that these are arranged such that lower values
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index d333664..c3e11e0 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -676,7 +676,14 @@ int main(void)
 	HSTATE_FIELD(HSTATE_DSCR, host_dscr);
 	HSTATE_FIELD(HSTATE_DABR, dabr);
 	HSTATE_FIELD(HSTATE_DECEXP, dec_expires);
+	HSTATE_FIELD(HSTATE_SPLIT_MODE, kvm_split_mode);
 	DEFINE(IPI_PRIORITY, IPI_PRIORITY);
+	DEFINE(KVM_SPLIT_RPR, offsetof(struct
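The dynamic_mt_modes description in the commit message above amounts to a bitmap in which bit value 2 enables 2-way and bit value 4 enables 4-way micro-threading. A minimal sketch of that interpretation follows; the helper mt_mode_allowed() is hypothetical and only mirrors the wording of the commit message, it is not the check the patch itself performs.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: report whether n-way micro-threading (n = 2 or 4)
 * is enabled by a dynamic_mt_modes bitmap as described in the commit
 * message (0 = none, 2 = 2-way only, 4 = 4-way only, 6 = both).
 */
bool mt_mode_allowed(int dynamic_mt_modes, int n_subcores)
{
	if (n_subcores != 2 && n_subcores != 4)
		return false;	/* only 2-way and 4-way splits exist */
	return (dynamic_mt_modes & n_subcores) != 0;
}
```

With the default of 6, both mt_mode_allowed(6, 2) and mt_mode_allowed(6, 4) hold; setting the parameter to 0 makes the code consider whole-core mode only.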
Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling
On Wed, May 20, 2015 at 03:26:12PM +1000, Sam Bobroff wrote:
> In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
> bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
> accessed as such.
>
> This patch corrects places where it is accessed as a 32 bit field by a
> 64 bit kernel.  In some cases this is via a 32 bit load or store
> instruction which, depending on endianness, will cause either the
> lower or upper 32 bits to be missed.  In another case it is cast as a
> u32, causing the upper 32 bits to be cleared.
>
> This patch corrects those places by extending the access methods to
> 64 bits.
>
> Signed-off-by: Sam Bobroff <sam.bobr...@au1.ibm.com>

Acked-by: Paul Mackerras <pau...@samba.org>
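For readers who want to see the two failure modes from the quoted description in isolation, here is a small userspace sketch. It is an illustration only, not code from the patch: both function names are hypothetical, and memcpy() stands in for a 32-bit load instruction issued at the address of a 64-bit field.

```c
#include <stdint.h>
#include <string.h>

/*
 * Mimic a 32-bit load from the start of a 64-bit field.  Which half of
 * the value comes back depends on the byte order of the machine.
 */
uint32_t load32_at_field_start(const uint64_t *field)
{
	uint32_t v;

	memcpy(&v, field, sizeof(v));	/* reads only the first 4 bytes */
	return v;
}

/* The cast case: going through a u32 clears the upper 32 bits. */
uint64_t xer_via_u32_cast(uint64_t xer)
{
	return (uint32_t)xer;
}
```

On a little-endian host the first function returns the lower 32 bits and on a big-endian host the upper 32 bits, which matches the "either the lower or upper 32 bits" wording above.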
Re: [PATCH 1/1] KVM: PPC: Book3S: correct width in XER handling
On Wed, May 20, 2015 at 05:35:08PM -0500, Scott Wood wrote:
> It's nominally a 64-bit register, but the upper 32 bits are reserved in
> ISA 2.06.  Do newer ISAs or certain implementations define things in the
> upper 32 bits, or is this just about the asm accesses being wrong on
> big-endian?

It's primarily about the asm accesses being wrong.

Paul.
[PATCH] KVM: PPC: Book3S HV: Fix list traversal in error case
This fixes a regression introduced in commit 25fedfca94cf, "KVM: PPC:
Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which
leads to a user-triggerable oops.

In the case where we try to run a vcore on a physical core that is
not in single-threaded mode, or the vcore has too many threads for
the physical core, we iterate the list of runnable vcpus to make
each one return an EBUSY error to userspace.  Since this involves
taking each vcpu off the runnable_threads list for the vcore, we
need to use list_for_each_entry_safe rather than list_for_each_entry
to traverse the list.  Otherwise the kernel will crash with an oops
message like this:

Unable to handle kernel paging request for data at address 0x000fff88
Faulting instruction address: 0xd0001e635dc8
Oops: Kernel access of bad area, sig: 11 [#2]
SMP NR_CPUS=1024 NUMA PowerNV
...
CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G D 3.18.0 #1
task: c0274e507500 ti: c027d1924000 task.ti: c027d1924000
NIP: d0001e635dc8 LR: d0001e635df8 CTR: c011ba50
REGS: c027d19275b0 TRAP: 0300 Tainted: G D (3.18.0)
MSR: 90009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22002824 XER:
CFAR: c0008468 DAR: 000fff88 DSISR: 4000 SOFTE: 1
GPR00: d0001e635df8 c027d1927830 d0001e64c850 0001
GPR04: 0001 0001
GPR08: 00200200 d0001e63e588
GPR12: 2200 c7dbc800 c00fc780 000a
GPR16: fffc c00fd5439690 c00fc7801c98 0001
GPR20: 0003 c027d1927aa8 c00fd543b348 c00fd543b350
GPR24: c00fa57f 0030
GPR28: fff0 c00fd543b328 000fe468 c00fd543b300
NIP [d0001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv]
LR [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv]
Call Trace:
[c027d1927830] [d0001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] (unreliable)
[c027d1927a30] [d0001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv]
[c027d1927b70] [d0001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm]
[c027d1927ba0] [d0001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[c027d1927be0] [d0001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm]
[c027d1927d40] [c02d6720] do_vfs_ioctl+0x490/0x780
[c027d1927de0] [c02d6ae4] SyS_ioctl+0xd4/0xf0
[c027d1927e30] [c0009358] syscall_exit+0x0/0x98
Instruction dump:
6000 6042 387e1b30 3883 38a1 38c0 480087d9 e8410018
ebde1c98 7fbdf040 3bdee368 419e0048 813e1b20 939e1b18 2f890001 409effcc
---[ end trace 8cdf50251cca6680 ]---

Fixes: 25fedfca94cf
Signed-off-by: Paul Mackerras <pau...@samba.org>
---
Since this is a regression fix for a patch that went in post 4.0,
it should go in for 4.1.

 arch/powerpc/kvm/book3s_hv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 48d3c5d..df81caa 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1952,7 +1952,7 @@ static void post_guest_process(struct kvmppc_vcore *vc)
  */
 static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-	struct kvm_vcpu *vcpu;
+	struct kvm_vcpu *vcpu, *vnext;
 	int i;
 	int srcu_idx;
 
@@ -1982,7 +1982,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	 */
 	if ((threads_per_core > 1) &&
 	    ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
+		list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+					 arch.run_list) {
 			vcpu->arch.ret = -EBUSY;
 			kvmppc_remove_runnable(vc, vcpu);
 			wake_up(&vcpu->arch.cpu_run);
-- 
2.1.4
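The reason the _safe variant is needed can be shown outside the kernel. The sketch below is a userspace illustration under stated assumptions, not the kernel's list.h: struct node, list_init(), list_add_tail(), list_del_poison() and drain_with_ebusy() are all hypothetical names, and clearing the pointers to NULL on removal is a stand-in for whatever the real removal does to them. The point is only that the successor is saved before the current entry is unlinked, which is what list_for_each_entry_safe does with its extra cursor.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct node {
	struct node *prev, *next;
	int ret;
};

void list_init(struct node *head)
{
	head->prev = head->next = head;
}

void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and clear its pointers, so a stale n->next cannot be followed. */
void list_del_poison(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/*
 * Take every entry off the list and mark it with -EBUSY.  The successor
 * is read into nxt before pos is unlinked; reading pos->next after the
 * removal would follow a cleared pointer.  Returns the number removed.
 */
int drain_with_ebusy(struct node *head)
{
	struct node *pos, *nxt;
	int removed = 0;

	for (pos = head->next, nxt = pos->next; pos != head;
	     pos = nxt, nxt = pos->next) {
		pos->ret = -EBUSY;
		list_del_poison(pos);
		removed++;
	}
	return removed;
}
```

A plain traversal that advanced with pos = pos->next after list_del_poison(pos) would dereference the cleared pointer on the next step, which is the same class of bug as the oops shown in the commit message.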