Re: [RFC PATCH v0 5/5] pseries: Asynchronous page fault support

2021-08-12 Thread Bharata B Rao
On Fri, Aug 13, 2021 at 02:06:40PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of August 5, 2021 5:24 pm:
> > Add asynchronous page fault support for pseries guests.
> > 
> > 1. Setup the guest to handle async-pf
> >- Issue H_REG_SNS hcall to register the SNS region.
> >- Setup the subvention interrupt irq.
> >- Enable async-pf by updating the byte_b9 of VPA for each
> >  CPU.
> > 2. Check if the page fault is an expropriation notification
> >(SRR1_PROGTRAP set in SRR1) and if so put the task on
> >wait queue based on the expropriation correlation number
> >read from the VPA.
> > 3. Handle subvention interrupt to wake any waiting tasks.
> >The wait and wakeup mechanism from x86 async-pf implementation
> >is being reused here.
> 
> I don't know too much about the background of this.
> 
> How much benefit does this give? What situations?

I haven't yet gotten around to measuring the benefit of this. Once
the patches are a bit more stable than they currently are, we will
measure and evaluate the benefits.

> Does PowerVM implement it?

I suppose so, need to check though.

> Do other architectures KVM have something similar?

Yes, x86 and s390 KVM have had this feature for a while now
and generic KVM interfaces exist to support it.

> 
> The SRR1 setting for the DSI is in PAPR? In that case it should be okay,
> it might be good to add a small comment in exceptions-64s.S.

Yes, SRR1 setting is part of PAPR.

> 
> [...]
> 
> > @@ -395,6 +395,11 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
> > vm_fault_t fault, major = 0;
> > bool kprobe_fault = kprobe_page_fault(regs, 11);
> >  
> > +#ifdef CONFIG_PPC_PSERIES
> > +   if (handle_async_page_fault(regs, address))
> > +   return 0;
> > +#endif
> > +
> > if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
> > return 0;
> 
> [...]
> 
> > +int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
> > +{
> > +   struct async_pf_sleep_node n;
> > +   DECLARE_SWAITQUEUE(wait);
> > +   unsigned long exp_corr_nr;
> > +
> > +   /* Is this Expropriation notification? */
> > +   if (!(mfspr(SPRN_SRR1) & SRR1_PROGTRAP))
> > +   return 0;
> 
> Yep this should be an inline that is guarded by a static key, and then 
> probably have an inline check for SRR1_PROGTRAP. You shouldn't need to
> mfspr here, but just use regs->msr.

Right.
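
Something like this minimal sketch, assuming a static key named
async_pf_enabled and a slow-path helper __handle_async_page_fault
(both illustrative names, not part of the posted patch):

DEFINE_STATIC_KEY_FALSE(async_pf_enabled);

static inline int handle_async_page_fault(struct pt_regs *regs,
					  unsigned long addr)
{
	/* Patched to a nop until async-pf is negotiated with the host */
	if (!static_branch_unlikely(&async_pf_enabled))
		return 0;

	/* regs->msr already holds SRR1 here, so no mfspr on the fast path */
	if (!(regs->msr & SRR1_PROGTRAP))
		return 0;

	return __handle_async_page_fault(regs, addr);
}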

> 
> > +
> > +   if (unlikely(!user_mode(regs)))
> > +   panic("Host injected async PF in kernel mode\n");
> 
> Hmm. Is there anything in the PAPR interface that specifies that the
> OS can only deal with problem state access faults here? Or is that
> inherent in the expropriation feature?

Didn't see anything specific to that effect in PAPR. However, since
this puts the faulting guest process to sleep until the page
becomes ready in the host, I have limited it to guest user space
faults.

Regards,
Bharata.


Re: [RFC PATCH v0 0/5] PPC: KVM: pseries: Asynchronous page fault

2021-08-05 Thread Bharata B Rao
On Thu, Aug 05, 2021 at 12:54:34PM +0530, Bharata B Rao wrote:
> Hi,
> 
> This series adds asynchronous page fault support for pseries guests
> and enables the support for the same in powerpc KVM. This is an
> early RFC with details and multiple TODOs listed in patch descriptions.
> 
> This patch needs supporting enablement in QEMU too which will be
> posted separately.

QEMU part is posted here:
https://lore.kernel.org/qemu-devel/20210805073228.502292-2-bhar...@linux.ibm.com/T/#u

Regards,
Bharata.


[RFC PATCH v0 5/5] pseries: Asynchronous page fault support

2021-08-05 Thread Bharata B Rao
Add asynchronous page fault support for pseries guests.

1. Setup the guest to handle async-pf
   - Issue H_REG_SNS hcall to register the SNS region.
   - Setup the subvention interrupt irq.
   - Enable async-pf by updating the byte_b9 of VPA for each
 CPU.
2. Check if the page fault is an expropriation notification
   (SRR1_PROGTRAP set in SRR1) and if so put the task on
   wait queue based on the expropriation correlation number
   read from the VPA.
3. Handle subvention interrupt to wake any waiting tasks.
   The wait and wakeup mechanism from x86 async-pf implementation
   is being reused here.
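
For reference, the wake side being reused is shaped like the x86
helper it derives from, roughly (a sketch with illustrative naming):

static void apf_task_wake_one(struct async_pf_sleep_node *n)
{
	/* Remove the waiter from the hash bucket and wake it */
	hlist_del_init(&n->link);
	if (swq_has_sleeper(&n->wq))
		swake_up_one(&n->wq);
}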

TODO:
- Check how to keep this feature together with other CMO features.
- The async-pf check in the page fault handler path is limited to
  guests only by a compile-time #ifdef. This isn't sufficient and hence
  needs to be replaced by an appropriate runtime check.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/async-pf.h   |  12 ++
 arch/powerpc/mm/fault.c   |   7 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 arch/powerpc/platforms/pseries/async-pf.c | 219 ++
 4 files changed, 238 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/async-pf.h
 create mode 100644 arch/powerpc/platforms/pseries/async-pf.c

diff --git a/arch/powerpc/include/asm/async-pf.h b/arch/powerpc/include/asm/async-pf.h
new file mode 100644
index ..95d6c3da9f50
--- /dev/null
+++ b/arch/powerpc/include/asm/async-pf.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option (ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. 
+ */
+
+#ifndef _ASM_POWERPC_ASYNC_PF_H
+#define _ASM_POWERPC_ASYNC_PF_H
+int handle_async_page_fault(struct pt_regs *regs, unsigned long addr);
+#endif
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a8d0ce85d39a..bbdc61605885 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -44,7 +44,7 @@
 #include 
 #include 
 #include 
-
+#include <asm/async-pf.h>
 
 /*
  * do_page_fault error handling helpers
@@ -395,6 +395,11 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
vm_fault_t fault, major = 0;
bool kprobe_fault = kprobe_page_fault(regs, 11);
 
+#ifdef CONFIG_PPC_PSERIES
+   if (handle_async_page_fault(regs, address))
+   return 0;
+#endif
+
if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
return 0;
 
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 4cda0ef87be0..e0ada605ef20 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,7 +6,7 @@ obj-y   := lpar.o hvCall.o nvram.o reconfig.o \
   of_helpers.o \
   setup.o iommu.o event_sources.o ras.o \
   firmware.o power.o dlpar.o mobility.o rng.o \
-  pci.o pci_dlpar.o eeh_pseries.o msi.o
+  pci.o pci_dlpar.o eeh_pseries.o msi.o async-pf.o
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_SCANLOG)  += scanlog.o
 obj-$(CONFIG_KEXEC_CORE)   += kexec.o
diff --git a/arch/powerpc/platforms/pseries/async-pf.c b/arch/powerpc/platforms/pseries/async-pf.c
new file mode 100644
index ..c2f3bbc0d674
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/async-pf.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option (ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static char sns_buffer[PAGE_SIZE] __aligned(4096);
+static uint16_t *esn_q = (uint16_t *)sns_buffer + 1;
+static unsigned long next_eq_entry, nr_eq_entries;
+
+#define ASYNC_PF_SLEEP_HASHBITS 8
+#define ASYNC_PF_SLEEP_HASHSIZE (1<<ASYNC_PF_SLEEP_HASHBITS)
+
+struct async_pf_sleep_node {
+   struct hlist_node link;
+   struct swait_queue_head wq;
+   u64 token;
+   int cpu;
+};
+
+struct async_pf_sleep_head {
+   raw_spinlock_t lock;
+   struct hlist_head list;
+};
+
+static struct async_pf_sleep_head async_pf_sleepers[ASYNC_PF_SLEEP_HASHSIZE];
+
+static struct async_pf_sleep_node *_find_apf_task(struct async_pf_sleep_head *b,
+  u64 token)
+{
+   struct hlist_node *p;
+
+   hlist_for_each(p, &b->list) {
+   struct async_pf_sleep_node *n =
+   hlist_entry(p, typeof(*n), link);
+   if (n->token == token)
+   return n;
+   }
+
+   return NULL;
+}
+static int async_pf_queue_task(u64 token, struct async_pf_sleep_node *n)
+{
+   u64 key = hash_64(token, ASYNC_PF_SLEEP_HASHBITS);
+   struct async_pf_sleep_head *b = &async_pf_sleepers[key];
+   struct async_pf_sleep_node *e;
+
+   raw_spin_lock(&b->lock);
+   e = _find_apf_task(b, token);
+   if (e) {
+   /* dummy entry exists -> wake up was delivered ahead of PF */
+   hlist_del(&e->link);
+   raw_spin_unlock(&b->lock);
+   kfree(e);
+   return false;
+   }
+
+   n->token = token;
+   n->cpu = smp_processor_id();
+   init_swait_queue_head(&n->wq);
+   hlist_add_head(&n->link, &b->list);
+   raw_spin_unlock(&b->lock);
+   return true;
+}
+
+/*
+ * Handle E

[RFC PATCH v0 4/5] KVM: PPC: BOOK3S HV: Async PF support

2021-08-05 Thread Bharata B Rao
Add asynchronous page fault support for PowerKVM by making
use of the Expropriation/Subvention Notification Option
defined by PAPR specifications.

1. When guest accessed page isn't immediately available in the
host, update the vcpu's VPA with a unique expropriation correlation
number and inject a DSI to the guest with SRR1_PROGTRAP bit set in
SRR1. This informs the guest vcpu to put the process to wait and
schedule a different process.
   - Async PF is supported for data pages in this implementation
 though PAPR allows it for code pages too.
   - Async PF is supported only for user pages here.
   - The feature is currently limited only to radix guests.

2. When the page becomes available, update the Subvention Notification
Structure with the corresponding expropriation correlation number and
inform the guest via a subvention interrupt.
   - Subvention Notification Structure (SNS) is a region of memory
 shared between host and guest via which the communication related
 to expropriated and subvened pages happens between guest and host.
   - SNS region is registered by the guest via H_REG_SNS hcall which
 is implemented in QEMU.
   - H_REG_SNS implementation in QEMU needs a new ioctl KVM_PPC_SET_SNS.
 This ioctl is used to map and pin the guest page containing SNS
 in the host.
   - Subvention notification interrupt is raised to the guest by
 QEMU in response to the guest exit via KVM_REQ_ESN_EXIT. This
 interrupt informs the guest about the availability of the
 pages.
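
Roughly, the injection in step 1 could look like this sketch (the
exp_corr_nr VPA field and esn_corr_nr counter are illustrative names,
not necessarily what the patch uses):

static void kvmppc_inject_esn(struct kvm_vcpu *vcpu, unsigned long ea,
			      unsigned long dsisr)
{
	struct lppaca *vpa = vcpu->arch.vpa.pinned_addr;

	/* Tag the fault with a unique expropriation correlation number */
	vpa->exp_corr_nr = cpu_to_be64(++vcpu->kvm->arch.esn_corr_nr);

	/* Reflect a DSI with SRR1_PROGTRAP set (enabled by patch 3/5) */
	kvmppc_core_queue_data_storage(vcpu, ea, dsisr, SRR1_PROGTRAP);
}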

TODO:
- H_REG_SNS is implemented in QEMU because this hcall needs to return
  the interrupt source number associated with the subvention interrupt.
  Claiming the IRQ line and raising an external interrupt seem to be
  straightforward from QEMU. Figure out the in-kernel equivalents for
  these two so that we can avoid a guest exit for each expropriated
  page and move the entire hcall implementation into the host kernel.
- The code is pretty much experimental and is barely able to boot a
  guest. I do see some requests for expropriated pages not getting
  fulfilled by the host, leading to long delays in the guest. This
  needs some debugging.
- A few other aspects recommended by PAPR around this feature (like
  setting of page state flags) need to be evaluated and incorporated
  into the implementation if found appropriate.

Signed-off-by: Bharata B Rao 
---
 Documentation/virt/kvm/api.rst|  15 ++
 arch/powerpc/include/asm/hvcall.h |   1 +
 arch/powerpc/include/asm/kvm_book3s_esn.h |  24 +++
 arch/powerpc/include/asm/kvm_host.h   |  21 +++
 arch/powerpc/include/asm/kvm_ppc.h|   1 +
 arch/powerpc/include/asm/lppaca.h |  12 +-
 arch/powerpc/include/uapi/asm/kvm.h   |   6 +
 arch/powerpc/kvm/Kconfig  |   2 +
 arch/powerpc/kvm/Makefile |   5 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c|   3 +
 arch/powerpc/kvm/book3s_hv.c  |  25 +++
 arch/powerpc/kvm/book3s_hv_esn.c  | 189 ++
 include/uapi/linux/kvm.h  |   1 +
 tools/include/uapi/linux/kvm.h|   1 +
 14 files changed, 303 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_esn.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_esn.c

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index dae68e68ca23..512f078b9d02 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5293,6 +5293,21 @@ the trailing ``'\0'``, is indicated by ``name_size`` in the header.
 The Stats Data block contains an array of 64-bit values in the same order
 as the descriptors in Descriptors block.
 
+4.134 KVM_PPC_SET_SNS
+---------------------
+
+:Capability: basic
+:Architectures: powerpc
+:Type: vm ioctl
+:Parameters: none
+:Returns: 0 on successful completion
+
+As part of the H_REG_SNS hypercall, this ioctl is used to map and pin
+the guest-provided SNS structure in the host.
+
+This is used for providing asynchronous page fault support for
+powerpc pseries KVM guests.
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 9bcf345cb208..9e33500c1723 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -321,6 +321,7 @@
 #define H_SCM_UNBIND_ALL0x3FC
 #define H_SCM_HEALTH0x400
 #define H_SCM_PERFORMANCE_STATS 0x418
+#define H_REG_SNS  0x41C
 #define H_RPT_INVALIDATE   0x448
 #define H_SCM_FLUSH0x44C
 #define MAX_HCALL_OPCODE   H_SCM_FLUSH
diff --git a/arch/powerpc/include/asm/kvm_book3s_esn.h b/arch/powerpc/include/asm/kvm_book3s_esn.h
new file mode 100644
index ..d79a441ea31d
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_esn.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_BOOK3S_ESN_H__
+#define __ASM_KVM_BOOK3S_ESN_H__
+
+/* SNS

[RFC PATCH v0 3/5] KVM: PPC: Book3S: Enable setting SRR1 flags for DSI

2021-08-05 Thread Bharata B Rao
kvmppc_core_queue_data_storage() doesn't provide an option to
set SRR1 flags when raising a DSI. Since kvmppc_inject_interrupt()
allows for such a provision, add an argument for the same.

This will be used to raise a DSI with SRR1_PROGTRAP set when an
expropriation interrupt needs to be injected into the guest.
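
For instance, the expropriation path in patch 4/5 can then do the
following, while all existing callers simply pass 0:

	/* Reflect a DSI tagged as an expropriation notification */
	kvmppc_core_queue_data_storage(vcpu, ea, dsisr, SRR1_PROGTRAP);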

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/kvm_ppc.h | 3 ++-
 arch/powerpc/kvm/book3s.c  | 6 +++---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 +++---
 arch/powerpc/kvm/book3s_hv.c   | 4 ++--
 arch/powerpc/kvm/book3s_hv_nested.c| 4 ++--
 arch/powerpc/kvm/book3s_pr.c   | 4 ++--
 6 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2d88944f9f34..09235bdfd4ac 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -143,7 +143,8 @@ extern void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu 
*vcpu, ulong dear_flags,
ulong esr_flags);
 extern void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu,
   ulong dear_flags,
-  ulong esr_flags);
+  ulong esr_flags,
+  ulong srr1_flags);
 extern void kvmppc_core_queue_itlb_miss(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
   ulong esr_flags);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 79833f78d1da..f7f6641a788d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -284,11 +284,11 @@ void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu)
 }
 
 void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar,
-   ulong flags)
+   ulong dsisr, ulong srr1)
 {
kvmppc_set_dar(vcpu, dar);
-   kvmppc_set_dsisr(vcpu, flags);
-   kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, 0);
+   kvmppc_set_dsisr(vcpu, dsisr);
+   kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, srr1);
 }
 EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index b5905ae4377c..618206a504b0 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -946,7 +946,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
if (dsisr & DSISR_BADACCESS) {
/* Reflect to the guest as DSI */
pr_err("KVM: Got radix HV page fault with DSISR=%lx\n", dsisr);
-   kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+   kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
return RESUME_GUEST;
}
 
@@ -971,7 +971,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 * Bad address in guest page table tree, or other
 * unusual error - reflect it to the guest as DSI.
 */
-   kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+   kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
return RESUME_GUEST;
}
return kvmppc_hv_emulate_mmio(vcpu, gpa, ea, writing);
@@ -981,7 +981,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
if (writing) {
/* give the guest a DSI */
kvmppc_core_queue_data_storage(vcpu, ea, DSISR_ISSTORE |
-  DSISR_PROTFAULT);
+  DSISR_PROTFAULT, 0);
return RESUME_GUEST;
}
kvm_ro = true;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 47ccd4a2df54..d07e9065f7c1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1592,7 +1592,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 
if (!(vcpu->arch.fault_dsisr & (DSISR_NOHPTE | DSISR_PROTFAULT))) {
kvmppc_core_queue_data_storage(vcpu,
-   vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
+   vcpu->arch.fault_dar, vcpu->arch.fault_dsisr, 0);
r = RESUME_GUEST;
break;
}
@@ -1610,7 +1610,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
r = RESUME_PAGE_FAULT;
} else {
kvmppc_core_queue_data_storage(vcpu,
-   vcpu->arch.fault_dar, err);
+   

[RFC PATCH v0 2/5] KVM: PPC: Add support for KVM_REQ_ESN_EXIT

2021-08-05 Thread Bharata B Rao
Add a new KVM exit request, KVM_REQ_ESN_EXIT, that will be used
to exit to userspace (QEMU) whenever a subvention notification
needs to be sent to the guest.

The userspace (QEMU) issues the subvention notification by
injecting an interrupt into the guest.
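
The request itself would be raised from wherever the host learns that
an expropriated page has become ready, along these lines (a sketch):

	/* Force an exit to userspace so QEMU can inject the interrupt */
	kvm_make_request(KVM_REQ_ESN_EXIT, vcpu);
	kvm_vcpu_kick(vcpu);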

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/powerpc/kvm/book3s_hv.c| 8 
 include/uapi/linux/kvm.h| 1 +
 3 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9f52f282b1aa..204dc2d91388 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -52,6 +52,7 @@
 #define KVM_REQ_WATCHDOG   KVM_ARCH_REQ(0)
 #define KVM_REQ_EPR_EXIT   KVM_ARCH_REQ(1)
 #define KVM_REQ_PENDING_TIMER  KVM_ARCH_REQ(2)
+#define KVM_REQ_ESN_EXIT   KVM_ARCH_REQ(3)
 
 #include 
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 085fb8ecbf68..47ccd4a2df54 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2820,6 +2820,14 @@ static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu 
*vcpu)
 
 static int kvmppc_core_check_requests_hv(struct kvm_vcpu *vcpu)
 {
+   /*
+    * If a subvention interrupt needs to be injected to the guest,
+    * exit to user space.
+    */
+   if (kvm_check_request(KVM_REQ_ESN_EXIT, vcpu)) {
+   vcpu->run->exit_reason = KVM_EXIT_ESN;
+   return 0;
+   }
/* Indicate we want to get back into the guest */
return 1;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..47be532ed14b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -269,6 +269,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_AP_RESET_HOLD32
 #define KVM_EXIT_X86_BUS_LOCK 33
 #define KVM_EXIT_XEN  34
+#define KVM_EXIT_ESN 35
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
-- 
2.31.1



[RFC PATCH v0 0/5] PPC: KVM: pseries: Asynchronous page fault

2021-08-05 Thread Bharata B Rao
Hi,

This series adds asynchronous page fault support for pseries guests
and enables the support for the same in powerpc KVM. This is an
early RFC with details and multiple TODOs listed in patch descriptions.

This patch needs supporting enablement in QEMU too which will be
posted separately.

Bharata B Rao (5):
  powerpc: Define Expropriation interrupt bit to VPA byte offset 0xB9
  KVM: PPC: Add support for KVM_REQ_ESN_EXIT
  KVM: PPC: Book3S: Enable setting SRR1 flags for DSI
  KVM: PPC: BOOK3S HV: Async PF support
  pseries: Asynchronous page fault support

 Documentation/virt/kvm/api.rst|  15 ++
 arch/powerpc/include/asm/async-pf.h   |  12 ++
 arch/powerpc/include/asm/hvcall.h |   1 +
 arch/powerpc/include/asm/kvm_book3s_esn.h |  24 +++
 arch/powerpc/include/asm/kvm_host.h   |  22 +++
 arch/powerpc/include/asm/kvm_ppc.h|   4 +-
 arch/powerpc/include/asm/lppaca.h |  20 +-
 arch/powerpc/include/uapi/asm/kvm.h   |   6 +
 arch/powerpc/kvm/Kconfig  |   2 +
 arch/powerpc/kvm/Makefile |   5 +-
 arch/powerpc/kvm/book3s.c |   6 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c|   9 +-
 arch/powerpc/kvm/book3s_hv.c  |  37 +++-
 arch/powerpc/kvm/book3s_hv_esn.c  | 189 +++
 arch/powerpc/kvm/book3s_hv_nested.c   |   4 +-
 arch/powerpc/kvm/book3s_pr.c  |   4 +-
 arch/powerpc/mm/fault.c   |   7 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 arch/powerpc/platforms/pseries/async-pf.c | 219 ++
 drivers/cpuidle/cpuidle-pseries.c |   4 +-
 include/uapi/linux/kvm.h  |   2 +
 tools/include/uapi/linux/kvm.h|   1 +
 22 files changed, 574 insertions(+), 21 deletions(-)
 create mode 100644 arch/powerpc/include/asm/async-pf.h
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_esn.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_esn.c
 create mode 100644 arch/powerpc/platforms/pseries/async-pf.c

-- 
2.31.1



[RFC PATCH v0 1/5] powerpc: Define Expropriation interrupt bit to VPA byte offset 0xB9

2021-08-05 Thread Bharata B Rao
VPA byte offset 0xB9 was named donate_dedicated_cpu as that
was the only bit in use. The Expropriation/Subvention support defines
another bit in byte offset 0xB9. Define this bit and rename the field
in the VPA to a generic name.
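
With the new flag defined, the guest-side enablement in patch 5/5 can
then be, for example:

	/* Opt this CPU in to expropriation interrupts */
	get_lppaca()->byte_b9 |= LPPACA_EXP_INT_ENABLED;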

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/lppaca.h | 8 +++-
 drivers/cpuidle/cpuidle-pseries.c | 4 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index c390ec377bae..57e432766f3e 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -80,7 +80,7 @@ struct lppaca {
u8  ebb_regs_in_use;
u8  reserved7[6];
u8  dtl_enable_mask;/* Dispatch Trace Log mask */
-   u8  donate_dedicated_cpu;   /* Donate dedicated CPU cycles */
+   u8  byte_b9; /* Donate dedicated CPU cycles & Expropriation int */
u8  fpregs_in_use;
u8  pmcregs_in_use;
u8  reserved8[28];
@@ -116,6 +116,12 @@ struct lppaca {
 
 #define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
 
+/*
+ * Flags for Byte offset 0xB9
+ */
+#define LPPACA_DONATE_DED_CPU_CYCLES   0x1
+#define LPPACA_EXP_INT_ENABLED 0x2
+
 /*
  * We are using a non architected field to determine if a partition is
  * shared or dedicated. This currently works on both KVM and PHYP, but
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index a2b5c6f60cf0..b9d0f41c3f19 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -221,7 +221,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
u8 old_latency_hint;
 
pseries_idle_prolog();
-   get_lppaca()->donate_dedicated_cpu = 1;
+   get_lppaca()->byte_b9 |= LPPACA_DONATE_DED_CPU_CYCLES;
old_latency_hint = get_lppaca()->cede_latency_hint;
get_lppaca()->cede_latency_hint = cede_latency_hint[index];
 
@@ -229,7 +229,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
check_and_cede_processor();
 
local_irq_disable();
-   get_lppaca()->donate_dedicated_cpu = 0;
+   get_lppaca()->byte_b9 &= ~LPPACA_DONATE_DED_CPU_CYCLES;
get_lppaca()->cede_latency_hint = old_latency_hint;
 
pseries_idle_epilog();
-- 
2.31.1



Re: [RFC PATCH v0 1/1] powerpc/percpu: Use 2MB atom_size in percpu allocator on radix

2021-07-11 Thread Bharata B Rao
On Mon, Jul 12, 2021 at 01:00:10PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of July 8, 2021 3:29 pm:
> > The atom_size used by the percpu allocator on powerpc is currently
> > determined by mmu_linear_psize, which is initialized to 4K and
> > modified only by hash. Until now, the atom_size on radix was
> > defaulting to PAGE_SIZE (64K).
> 
> Looks like it was 1MB to me?

Was it hash? Because atom_size will get set to 1MB on hash.
And both on baremetal and KVM radix, I see 64K atom_size.

> 
> > Go for 2MB
> > atom_size on radix if support for 2MB pages exists.
> > 
> > 2MB atom_size on radix will allow using PMD mappings in the
> > vmalloc area if and when support for higher sized vmalloc
> > mappings is enabled for the percpu allocator. However right now
> 
> That would be nice.
> 
> > this change will result in a larger number of units being allocated
> > within one allocation due to increased upa (units per allocation).
> 
> In that case is there any reason to do it until then?

Not strictly. I observed a similar setting on x86 which has
been there for a long time, so I was just checking if it makes sense
here too.

> 
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  arch/powerpc/kernel/setup_64.c | 34 +-
> >  1 file changed, 25 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index 1ff258f6c76c..45ce2d6e8112 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -871,6 +871,30 @@ static void __init pcpu_populate_pte(unsigned long addr)
> >   __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
> >  }
> >  
> > +static size_t pcpu_atom_size(void)
> > +{
> > +   size_t atom_size = PAGE_SIZE;
> > +
> > +   /*
> > +* Radix: Use PAGE_SIZE by default or 2M if available.
> > +*/
> > +   if (radix_enabled()) {
> > +   if (mmu_psize_defs[MMU_PAGE_2M].shift)
> > +   atom_size = 1 << mmu_psize_defs[MMU_PAGE_2M].shift;
> 
> Looks like this changes behaviour for radix.

Yes, it does, as it increases the atom_size, which results in a higher
upa as noted. Did you mean some other behaviour change?

> 
> Also mmu_psize_defs is a pretty horrible interface you only need it in 
> some low level instruction encodings. You already explicitly know it's
> 2MB there, so you can just PMD_SHIFT.

Ok.

> 
> If you want to know whether huge PMD is supported and enabled in vmalloc
> memory, you would have to add some check which also accounts for
> vmap_allow_huge, so that would be another patch.

Yes, it makes sense if we want to tie the setting of a higher atom_size
to the actual availability of PMD mappings in vmalloc.

Regards,
Bharata.


Re: [PATCH] powerpc: preempt: Don't touch the idle task's preempt_count during hotplug

2021-07-08 Thread Bharata B Rao
On Wed, Jul 07, 2021 at 07:38:31PM +0100, Valentin Schneider wrote:
> Powerpc currently resets a CPU's idle task preempt_count to 0 before said
> task starts executing the secondary startup routine (and becomes an idle
> task proper).
> 
> This conflicts with commit
> 
>   f1a0a376ca0c ("sched/core: Initialize the idle task with preemption 
> disabled")
> 
> which initializes all of the idle tasks' preempt_count to PREEMPT_DISABLED
> during smp_init(). Note that this was superfluous before said commit, as
> back then the hotplug machinery would invoke init_idle() via
> idle_thread_get(), which would have already reset the CPU's idle task's
> preempt_count to PREEMPT_ENABLED.
> 
> Get rid of this preempt_count write.
> 
> Cc: Guenter Roeck 
> Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption 
> disabled")
> Reported-by: Bharata B Rao 
> Signed-off-by: Valentin Schneider 
> ---
>  arch/powerpc/platforms/cell/smp.c| 3 ---
>  arch/powerpc/platforms/pseries/smp.c | 5 +
>  2 files changed, 1 insertion(+), 7 deletions(-)

The messages like "BUG: scheduling while atomic: swapper/1/0/0x0000"
for each secondary CPU are no longer seen after this patch on powerpc.

Tested-by: Bharata B Rao 


[RFC PATCH v0 1/1] powerpc/percpu: Use 2MB atom_size in percpu allocator on radix

2021-07-07 Thread Bharata B Rao
The atom_size used by the percpu allocator on powerpc is currently
determined by mmu_linear_psize, which is initialized to 4K and
modified only by hash. Until now, the atom_size on radix was
defaulting to PAGE_SIZE (64K). Go for 2MB atom_size on radix if
support for 2MB pages exists.

2MB atom_size on radix will allow using PMD mappings in the
vmalloc area if and when support for higher sized vmalloc
mappings is enabled for the percpu allocator. However, right now
this change will result in a larger number of units being allocated
within one allocation due to the increased upa (units per allocation).

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/kernel/setup_64.c | 34 +-
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 1ff258f6c76c..45ce2d6e8112 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -871,6 +871,30 @@ static void __init pcpu_populate_pte(unsigned long addr)
  __func__, PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
 }
 
+static size_t pcpu_atom_size(void)
+{
+   size_t atom_size = PAGE_SIZE;
+
+   /*
+* Radix: Use PAGE_SIZE by default or 2M if available.
+*/
+   if (radix_enabled()) {
+   if (mmu_psize_defs[MMU_PAGE_2M].shift)
+   atom_size = 1 << mmu_psize_defs[MMU_PAGE_2M].shift;
+   goto out;
+   }
+
+   /*
+* Hash: Linear mapping is one of 4K, 1M and 16M.  For 4K, no need
+* to group units.  For larger mappings, use 1M atom which
+* should be large enough to contain a number of units.
+*/
+   if (mmu_linear_psize != MMU_PAGE_4K)
+   atom_size = 1 << 20;
+
+out:
+   return atom_size;
+}
 
 void __init setup_per_cpu_areas(void)
 {
@@ -880,15 +904,7 @@ void __init setup_per_cpu_areas(void)
unsigned int cpu;
int rc = -EINVAL;
 
-   /*
-* Linear mapping is one of 4K, 1M and 16M.  For 4K, no need
-* to group units.  For larger mappings, use 1M atom which
-* should be large enough to contain a number of units.
-*/
-   if (mmu_linear_psize == MMU_PAGE_4K)
-   atom_size = PAGE_SIZE;
-   else
-   atom_size = 1 << 20;
+   atom_size = pcpu_atom_size();
 
if (pcpu_chosen_fc != PCPU_FC_PAGE) {
rc = pcpu_embed_first_chunk(0, dyn_size, atom_size, pcpu_cpu_distance,
-- 
2.31.1



Re: [PATCH v8 3/6] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-07-05 Thread Bharata B Rao
On Mon, Jul 05, 2021 at 02:42:33PM +1000, David Gibson wrote:
> On Mon, Jun 21, 2021 at 02:20:00PM +0530, Bharata B Rao wrote:
> > diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> > index 4bc45d3ed8b0..b44f291fc909 100644
> > --- a/arch/powerpc/include/asm/mmu_context.h
> > +++ b/arch/powerpc/include/asm/mmu_context.h
> > @@ -124,8 +124,17 @@ static inline bool need_extra_context(struct mm_struct *mm, unsigned long ea)
> >  
> >  #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
> >  extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
> > +void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
> > +unsigned long type, unsigned long pg_sizes,
> > +unsigned long start, unsigned long end);
> >  #else
> >  static inline void radix_kvm_prefetch_workaround(struct mm_struct *mm) { }
> > +static inline void do_h_rpt_invalidate_prt(unsigned long pid,
> > +  unsigned long lpid,
> > +  unsigned long type,
> > +  unsigned long pg_sizes,
> > +  unsigned long start,
> > +  unsigned long end) { }
> 
> Since the only plausible caller is in KVM HV code, why do you need the
> #else clause.

The call to the above routine is prevented for non-radix guests
in KVM HV code at runtime using the kvm_is_radix() check, and not by
CONFIG_PPC_RADIX_MMU. Hence the #else version is needed.
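
That is, the call site compiles unconditionally and is only guarded at
runtime, roughly:

	/* kvm_is_radix(), not CONFIG_PPC_RADIX_MMU, gates the call, so
	 * the stub must exist for CONFIG_PPC_RADIX_MMU=n builds to link.
	 */
	if (kvm_is_radix(vcpu->kvm))
		do_h_rpt_invalidate_prt(pid, lpid, type, pg_sizes, start, end);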

Regards,
Bharata.


Re: PowerPC guest getting "BUG: scheduling while atomic" on linux-next-20210623 during secondary CPUs bringup

2021-06-25 Thread Bharata B Rao
On Fri, Jun 25, 2021 at 12:16:52PM +0200, Peter Zijlstra wrote:
> You mean: CONFIG_PREEMPTION=n, what about CONFIG_PREEMPT_COUNT?
> 
> Because if both are =n, then I don't see how that warning could trigger.
> in_atomic_preempt_off() would then result in prempt_count() == 0, and
> per the print above, it *is* 0.

CONFIG_PREEMPTION isn't set.

Also other PREEMPT related options are as under:

# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set

Regards,
Bharata.


Re: PowerPC guest getting "BUG: scheduling while atomic" on linux-next-20210623 during secondary CPUs bringup

2021-06-25 Thread Bharata B Rao
On Fri, Jun 25, 2021 at 09:28:09AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 25, 2021 at 11:16:08AM +0530, Srikar Dronamraju wrote:
> > * Bharata B Rao  [2021-06-24 21:25:09]:
> > 
> > > A PowerPC KVM guest gets the following BUG message when booting
> > > linux-next-20210623:
> > > 
> > > smp: Bringing up secondary CPUs ...
> > > BUG: scheduling while atomic: swapper/1/0/0x
> 
> 'funny', your preempt_count is actually too low. The check here is for
> preempt_count() == DISABLE_OFFSET (aka. 1 when PREEMPT=y), but you have
> 0.
> 
> > > no locks held by swapper/1/0.
> > > Modules linked in:
> > > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.13.0-rc7-next-20210623
> > > Call Trace:
> > > [cae5bc20] [c0badc64] dump_stack_lvl+0x98/0xe0 (unreliable)
> > > [cae5bc60] [c0210200] __schedule_bug+0xb0/0xe0
> > > [cae5bcd0] [c1609e28] __schedule+0x1788/0x1c70
> > > [cae5be20] [c160a8cc] schedule_idle+0x3c/0x70
> > > [cae5be50] [c022984c] do_idle+0x2bc/0x420
> > > [cae5bf00] [c0229d88] cpu_startup_entry+0x38/0x40
> > > [cae5bf30] [c00666c0] start_secondary+0x290/0x2a0
> > > [cae5bf90] [c000be54] start_secondary_prolog+0x10/0x14
> > > 
> > > 
> > > 
> > > smp: Brought up 2 nodes, 16 CPUs
> > > numa: Node 0 CPUs: 0-7
> > > numa: Node 1 CPUs: 8-15
> > > 
> > > This seems to have started from next-20210521 and isn't seen on
> > > next-20210511.
> > > 
> > 
> > Bharata,
> > 
> > I think the regression is due to Commit f1a0a376ca0c ("sched/core:
> > Initialize the idle task with preemption disabled")
> 
> So that extra preempt_disable() that got removed would've incremented it
> to 1 and then things would've been fine.
> 
> Except.. Valentin changed things such that preempt_count() should've
> been initialized to 1, instead of 0, but for some raisin that didn't
> stick.. what gives.
> 
> So we have init_idle(p) -> init_idle_preempt_count(p) ->
> task_thread_info(p)->preempt_count = PREEMPT_DISABLED;
> 
> But somehow, by the time you're running start_secondary(), that's gotten
> to be 0 again. Does DEBUG_PREEMPT give more clues?

PREEMPTION is off here.

Regards,
Bharata.


Re: PowerPC guest getting "BUG: scheduling while atomic" on linux-next-20210623 during secondary CPUs bringup

2021-06-24 Thread Bharata B Rao
On Fri, Jun 25, 2021 at 11:16:08AM +0530, Srikar Dronamraju wrote:
> * Bharata B Rao  [2021-06-24 21:25:09]:
> 
> > A PowerPC KVM guest gets the following BUG message when booting
> > linux-next-20210623:
> > 
> > smp: Bringing up secondary CPUs ...
> > BUG: scheduling while atomic: swapper/1/0/0x
> > no locks held by swapper/1/0.
> > Modules linked in:
> > CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.13.0-rc7-next-20210623
> > Call Trace:
> > [cae5bc20] [c0badc64] dump_stack_lvl+0x98/0xe0 (unreliable)
> > [cae5bc60] [c0210200] __schedule_bug+0xb0/0xe0
> > [cae5bcd0] [c1609e28] __schedule+0x1788/0x1c70
> > [cae5be20] [c160a8cc] schedule_idle+0x3c/0x70
> > [cae5be50] [c022984c] do_idle+0x2bc/0x420
> > [cae5bf00] [c0229d88] cpu_startup_entry+0x38/0x40
> > [cae5bf30] [c00666c0] start_secondary+0x290/0x2a0
> > [cae5bf90] [c000be54] start_secondary_prolog+0x10/0x14
> > 
> > 
> > 
> > smp: Brought up 2 nodes, 16 CPUs
> > numa: Node 0 CPUs: 0-7
> > numa: Node 1 CPUs: 8-15
> > 
> > This seems to have started from next-20210521 and isn't seen on
> > next-20210511.
> > 
> 
> Bharata,
> 
> I think the regression is due to Commit f1a0a376ca0c ("sched/core:
> Initialize the idle task with preemption disabled")
> 
> Can you please try with the above commit reverted?

Yes, reverting that commit helps.

Regards,
Bharata.


PowerPC guest getting "BUG: scheduling while atomic" on linux-next-20210623 during secondary CPUs bringup

2021-06-24 Thread Bharata B Rao
Hi,

A PowerPC KVM guest gets the following BUG message when booting
linux-next-20210623:

smp: Bringing up secondary CPUs ...
BUG: scheduling while atomic: swapper/1/0/0x
no locks held by swapper/1/0.
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.13.0-rc7-next-20210623
Call Trace:
[cae5bc20] [c0badc64] dump_stack_lvl+0x98/0xe0 (unreliable)
[cae5bc60] [c0210200] __schedule_bug+0xb0/0xe0
[cae5bcd0] [c1609e28] __schedule+0x1788/0x1c70
[cae5be20] [c160a8cc] schedule_idle+0x3c/0x70
[cae5be50] [c022984c] do_idle+0x2bc/0x420
[cae5bf00] [c0229d88] cpu_startup_entry+0x38/0x40
[cae5bf30] [c00666c0] start_secondary+0x290/0x2a0
[cae5bf90] [c000be54] start_secondary_prolog+0x10/0x14



smp: Brought up 2 nodes, 16 CPUs
numa: Node 0 CPUs: 0-7
numa: Node 1 CPUs: 8-15

This seems to have started from next-20210521 and isn't seen on
next-20210511.

Regards,
Bharata.


Re: [PATCH v8 4/6] KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE

2021-06-22 Thread Bharata B Rao
On Tue, Jun 22, 2021 at 10:05:45AM +0530, Bharata B Rao wrote:
> On Mon, Jun 21, 2021 at 10:12:42AM -0700, Nathan Chancellor wrote:
> > I have not seen this reported yet so apologies if it has and there is a
> > fix I am missing:
> > 
> > arch/powerpc/kvm/book3s_hv_nested.c:1334:11: error: variable 'ap' is uninitialized when used here [-Werror,-Wuninitialized]
> >ap, start, end);
> >^~
> > arch/powerpc/kvm/book3s_hv_nested.c:1276:25: note: initialize the variable 'ap' to silence this warning
> > unsigned long psize, ap;
> >^
> > = 0
> 
> Thanks for catching this, this wasn't caught in my environment.
> 
> I will repost the series with proper initialization to ap.

Michael,

Here is the fix for this on top of powerpc/next. If it is easier
and cleaner to fold this into the original series and re-post
the whole series against any updated tree, let me know.


From 2e7198e28c0d1137f3230d4645e9cfddaccf4987 Mon Sep 17 00:00:00 2001
From: Bharata B Rao 
Date: Tue, 22 Jun 2021 12:07:01 +0530
Subject: [PATCH 1/1] KVM: PPC: Book3S HV: Use proper ap value in
 H_RPT_INVALIDATE

The ap value that is used when performing range-based partition-scoped
invalidations for nested guests wasn't initialized correctly.

Fix this, and while we are here, reorganize the routine that does
this invalidation for better readability.

Fixes: 0e67d866cb32 ("KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE")
Signed-off-by: Bharata B Rao 
---
 arch/powerpc/kvm/book3s_hv_nested.c | 90 +
 1 file changed, 40 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index d78efb5f5bb3..3a06ac0b53e2 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1222,27 +1222,6 @@ long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu)
return H_SUCCESS;
 }
 
-static long do_tlb_invalidate_nested_tlb(struct kvm_vcpu *vcpu,
-unsigned long lpid,
-unsigned long page_size,
-unsigned long ap,
-unsigned long start,
-unsigned long end)
-{
-   unsigned long addr = start;
-   int ret;
-
-   do {
-   ret = kvmhv_emulate_tlbie_tlb_addr(vcpu, lpid, ap,
-  get_epn(addr));
-   if (ret)
-   return ret;
-   addr += page_size;
-   } while (addr < end);
-
-   return ret;
-}
-
 static long do_tlb_invalidate_nested_all(struct kvm_vcpu *vcpu,
 unsigned long lpid, unsigned long ric)
 {
@@ -1263,6 +1242,42 @@ static long do_tlb_invalidate_nested_all(struct kvm_vcpu *vcpu,
  */
 static unsigned long tlb_range_flush_page_ceiling __read_mostly = 33;
 
+static long do_tlb_invalidate_nested_tlb(struct kvm_vcpu *vcpu,
+unsigned long lpid,
+unsigned long pg_sizes,
+unsigned long start,
+unsigned long end)
+{
+   int ret = H_P4;
+   unsigned long addr, nr_pages;
+   struct mmu_psize_def *def;
+   unsigned long psize, ap, page_size;
+   bool flush_lpid;
+
+   for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
+   def = &mmu_psize_defs[psize];
+   if (!(pg_sizes & def->h_rpt_pgsize))
+   continue;
+
+   nr_pages = (end - start) >> def->shift;
+   flush_lpid = nr_pages > tlb_range_flush_page_ceiling;
+   if (flush_lpid)
+   return do_tlb_invalidate_nested_all(vcpu, lpid,
+   RIC_FLUSH_TLB);
+   addr = start;
+   ap = mmu_get_ap(psize);
+   page_size = 1UL << def->shift;
+   do {
+   ret = kvmhv_emulate_tlbie_tlb_addr(vcpu, lpid, ap,
+  get_epn(addr));
+   if (ret)
+   return H_P4;
+   addr += page_size;
+   } while (addr < end);
+   }
+   return ret;
+}
+
 /*
  * Performs partition-scoped invalidations for nested guests
  * as part of H_RPT_INVALIDATE hcall.
@@ -1271,10 +1286,6 @@ long do_h_rpt_invalidate_pat(struct kvm_vcpu *vcpu, unsigned long lpid,
 unsigned long type, unsigned long pg_sizes,
  

Re: [PATCH v8 4/6] KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE

2021-06-21 Thread Bharata B Rao
On Mon, Jun 21, 2021 at 10:12:42AM -0700, Nathan Chancellor wrote:
> > +long do_h_rpt_invalidate_pat(struct kvm_vcpu *vcpu, unsigned long lpid,
> > +unsigned long type, unsigned long pg_sizes,
> > +unsigned long start, unsigned long end)
> > +{
> > +   struct kvm_nested_guest *gp;
> > +   long ret;
> > +   unsigned long psize, ap;
> > +
> > +   /*
> > +* If L2 lpid isn't valid, we need to return H_PARAMETER.
> > +*
> > +* However, nested KVM issues a L2 lpid flush call when creating
> > +* partition table entries for L2. This happens even before the
> > +* corresponding shadow lpid is created in HV which happens in
> > +* H_ENTER_NESTED call. Since we can't differentiate this case from
> > +* the invalid case, we ignore such flush requests and return success.
> > +*/
> > +   gp = kvmhv_find_nested(vcpu->kvm, lpid);
> > +   if (!gp)
> > +   return H_SUCCESS;
> > +
> > +   /*
> > +* A flush all request can be handled by a full lpid flush only.
> > +*/
> > +   if ((type & H_RPTI_TYPE_NESTED_ALL) == H_RPTI_TYPE_NESTED_ALL)
> > +   return do_tlb_invalidate_nested_all(vcpu, lpid, RIC_FLUSH_ALL);
> > +
> > +   /*
> > +* We don't need to handle a PWC flush like process table here,
> > +* because intermediate partition scoped table in nested guest doesn't
> > +* really have PWC. Only level we have PWC is in L0 and for nested
> > +* invalidate at L0 we always do kvm_flush_lpid() which does
> > +* radix__flush_all_lpid(). For range invalidate at any level, we
> > +* are not removing the higher level page tables and hence there is
> > +* no PWC invalidate needed.
> > +*
> > +* if (type & H_RPTI_TYPE_PWC) {
> > +*  ret = do_tlb_invalidate_nested_all(vcpu, lpid, RIC_FLUSH_PWC);
> > +*  if (ret)
> > +*  return H_P4;
> > +* }
> > +*/
> > +
> > +   if (start == 0 && end == -1)
> > +   return do_tlb_invalidate_nested_all(vcpu, lpid, RIC_FLUSH_TLB);
> > +
> > +   if (type & H_RPTI_TYPE_TLB) {
> > +   struct mmu_psize_def *def;
> > +   bool flush_lpid;
> > +   unsigned long nr_pages;
> > +
> > +   for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
> > +   def = &mmu_psize_defs[psize];
> > +   if (!(pg_sizes & def->h_rpt_pgsize))
> > +   continue;
> > +
> > +   nr_pages = (end - start) >> def->shift;
> > +   flush_lpid = nr_pages > tlb_range_flush_page_ceiling;
> > +   if (flush_lpid)
> > +   return do_tlb_invalidate_nested_all(vcpu, lpid,
> > +   RIC_FLUSH_TLB);
> > +
> > +   ret = do_tlb_invalidate_nested_tlb(vcpu, lpid,
> > +  (1UL << def->shift),
> > +  ap, start, end);
> 
> I have not seen this reported yet so apologies if it has and there is a
> fix I am missing:
> 
> arch/powerpc/kvm/book3s_hv_nested.c:1334:11: error: variable 'ap' is uninitialized when used here [-Werror,-Wuninitialized]
>ap, start, end);
>^~
> arch/powerpc/kvm/book3s_hv_nested.c:1276:25: note: initialize the variable 'ap' to silence this warning
> unsigned long psize, ap;
>^
> = 0

Thanks for catching this, this wasn't caught in my environment.

I will repost the series with proper initialization to ap.

Regards,
Bharata.


[PATCH v8 6/6] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2021-06-21 Thread Bharata B Rao
In the nested KVM case, replace H_TLB_INVALIDATE with the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from the "hcall-rpt-invalidate" string in the
ibm,hypertas-functions DT property.
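
The detection follows the usual pseries firmware-feature wiring,
roughly like this sketch (the hypertas table entry lives in the
separate firmware enablement, not in this patch):

	/* hypertas string -> firmware feature bit, firmware.c style */
	{FW_FEATURE_RPT_INVALIDATE, "hcall-rpt-invalidate"},

after which the call sites below simply test:

	if (firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
					    H_RPTI_TYPE_NESTED | H_RPTI_TYPE_TLB,
					    H_RPTI_PAGE_ALL, 0, -1UL);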

Signed-off-by: Bharata B Rao 
Reviewed-by: Fabiano Rosas 
Reviewed-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 +-
 arch/powerpc/kvm/book3s_hv_nested.c| 12 ++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index d909c069363e..b5905ae4377c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,19 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
}
 
psi = shift_to_mmu_psize(pshift);
-   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-   lpid, rb);
+
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
+   lpid, rb);
+   } else {
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB,
+   psize_to_rpti_pgsize(psi),
+   addr, addr + psize);
+   }
+
if (rc)
pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +345,14 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+   0, -1UL);
if (rc)
pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 056d3df68de1..d78efb5f5bb3 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -467,8 +468,15 @@ static void kvmhv_flush_lpid(unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+   H_RPTI_TYPE_PAT,
+   H_RPTI_PAGE_ALL, 0, -1UL);
if (rc)
pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.31.1



[PATCH v8 5/6] KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability

2021-06-21 Thread Bharata B Rao
Now that we have H_RPT_INVALIDATE fully implemented, enable
support for the same via the KVM_CAP_PPC_RPT_INVALIDATE KVM capability.
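
A sketch of the intended userspace flow (QEMU-style, assuming the
standard KVM_CHECK_EXTENSION usage):

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RPT_INVALIDATE) > 0) {
		/* advertise "hcall-rpt-invalidate" in the guest's
		 * ibm,hypertas-functions device-tree property
		 */
	}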

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 Documentation/virt/kvm/api.rst | 18 ++
 arch/powerpc/kvm/powerpc.c |  3 +++
 include/uapi/linux/kvm.h   |  1 +
 3 files changed, 22 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fcb2fd38f42..9977e845633f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6362,6 +6362,24 @@ default.
 
 See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
 
+7.26 KVM_CAP_PPC_RPT_INVALIDATE
+-------------------------------
+
+:Capability: KVM_CAP_PPC_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is enabled for hypervisors on platforms like POWER9
+that support radix MMU.
+
 8. Other capabilities.
 ==
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a2a68a958fa0..be33b5321a76 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -682,6 +682,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = !!(hv_enabled && kvmppc_hv_ops->enable_dawr1 &&
   !kvmppc_hv_ops->enable_dawr1(NULL));
break;
+   case KVM_CAP_PPC_RPT_INVALIDATE:
+   r = 1;
+   break;
 #endif
default:
r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 79d9c44d1ad7..9016e96de971 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1083,6 +1083,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_PPC_RPT_INVALIDATE 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.31.1



[PATCH v8 4/6] KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE

2021-06-21 Thread Bharata B Rao
Enable support for process-scoped invalidations from nested
guests and partition-scoped invalidations for nested guests.

Process-scoped invalidations for any level of nested guests
are handled by implementing H_RPT_INVALIDATE handler in the
nested guest exit path in L0.

Partition-scoped invalidation requests are forwarded to the
right nested guest, handled there and passed down to L0
for eventual handling.

Signed-off-by: Bharata B Rao 
Signed-off-by: Aneesh Kumar K.V 
[Nested guest partition-scoped invalidation changes]
---
 .../include/asm/book3s/64/tlbflush-radix.h|   4 +
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/kvm/book3s_hv.c  |  59 -
 arch/powerpc/kvm/book3s_hv_nested.c   | 117 ++
 arch/powerpc/mm/book3s64/radix_tlb.c  |   4 -
 5 files changed, 180 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 8b33601cdb9d..a46fd37ad552 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index e6b53c6e21e3..caaa0f592d8e 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -307,6 +307,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long do_h_rpt_invalidate_pat(struct kvm_vcpu *vcpu, unsigned long lpid,
+unsigned long type, unsigned long pg_sizes,
+unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7e6da4687d88..3d5b8ba3786d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -925,6 +925,34 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
return yield_count;
 }
 
+/*
+ * H_RPT_INVALIDATE hcall handler for nested guests.
+ *
+ * Handles only nested process-scoped invalidation requests in L0.
+ */
+static int kvmppc_nested_h_rpt_invalidate(struct kvm_vcpu *vcpu)
+{
+   unsigned long type = kvmppc_get_gpr(vcpu, 6);
+   unsigned long pid, pg_sizes, start, end;
+
+   /*
+* The partition-scoped invalidations aren't handled here in L0.
+*/
+   if (type & H_RPTI_TYPE_NESTED)
+   return RESUME_HOST;
+
+   pid = kvmppc_get_gpr(vcpu, 4);
+   pg_sizes = kvmppc_get_gpr(vcpu, 7);
+   start = kvmppc_get_gpr(vcpu, 8);
+   end = kvmppc_get_gpr(vcpu, 9);
+
+   do_h_rpt_invalidate_prt(pid, vcpu->arch.nested->shadow_lpid,
+   type, pg_sizes, start, end);
+
+   kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
+   return RESUME_GUEST;
+}
+
 static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
unsigned long id, unsigned long target,
unsigned long type, unsigned long pg_sizes,
@@ -938,10 +966,18 @@ static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
 
/*
 * Partition-scoped invalidation for nested guests.
-* Not yet supported
 */
-   if (type & H_RPTI_TYPE_NESTED)
-   return H_P3;
+   if (type & H_RPTI_TYPE_NESTED) {
+   if (!nesting_enabled(vcpu->kvm))
+   return H_FUNCTION;
+
+   /* Support only cores as target */
+   if (target != H_RPTI_TARGET_CMMU)
+   return H_P2;
+
+   return do_h_rpt_invalidate_pat(vcpu, id, type, pg_sizes,
+  start, end);
+   }
 
/*
 * Process-scoped invalidation for L1 guests.
@@ -1629,6 +1665,23 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
if (!xics_on_xive())
kvmppc_xics_rm_complete(vcpu, 0);
break;
+   case BOOK3S_INTERRUPT_SYSCALL:
+   {
+   unsigned long req = kvmppc_get_gpr(vcpu, 3);
+
+   /*
+* The H_RPT_INVALIDATE hcalls issued by nested
+* guests for process-scoped invalidations when
+* GTSE=0, are handled here in L0.
+*/
+   if (req == H_RPT_INVALIDATE) {
+   r = kvmppc_nested_h_rpt_invalidate(vcpu);
+   

[PATCH v8 0/6] Support for H_RPT_INVALIDATE in PowerPC KVM

2021-06-21 Thread Bharata B Rao
This patchset adds support for the new hcall H_RPT_INVALIDATE
and replaces the nested tlb flush calls with this new hcall
if support for the same exists.

Changes in v8:
-
- Used tlb_single_page_flush_ceiling in the process-scoped range
  flush routine to switch to full PID invalidation if
  the number of pages is above the threshold
- Moved iterating over page sizes into the actual routine that
  handles the eventual flushing thereby limiting the page size
  iteration only to range based flushing
- Converted the #if 0 section into a comment section to keep
  checkpatch from complaining.
- Used a threshold in the partition-scoped range flushing
  to switch to full LPID invalidation

v7: 
https://lore.kernel.org/linuxppc-dev/20210505154642.178702-1-bhar...@linux.ibm.com/

Aneesh Kumar K.V (1):
  KVM: PPC: Book3S HV: Fix comments of H_RPT_INVALIDATE arguments

Bharata B Rao (5):
  powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to
mmu_psize_def
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst|  18 ++
 arch/powerpc/include/asm/book3s/64/mmu.h  |   1 +
 .../include/asm/book3s/64/tlbflush-radix.h|   4 +
 arch/powerpc/include/asm/hvcall.h |   4 +-
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/include/asm/mmu_context.h|   9 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c|  27 ++-
 arch/powerpc/kvm/book3s_hv.c  |  89 +
 arch/powerpc/kvm/book3s_hv_nested.c   | 129 -
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_pgtable.c  |   5 +
 arch/powerpc/mm/book3s64/radix_tlb.c  | 176 +-
 include/uapi/linux/kvm.h  |   1 +
 13 files changed, 456 insertions(+), 13 deletions(-)

-- 
2.31.1



[PATCH v8 3/6] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-06-21 Thread Bharata B Rao
H_RPT_INVALIDATE does two types of TLB invalidations:

1. Process-scoped invalidations for guests when LPCR[GTSE]=0.
   This is currently not used in KVM as GTSE is not usually
   disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
   behalf of an L2 guest. This is currently handled by the
   H_TLB_INVALIDATE hcall, which the new hcall replaces.

This commit enables process-scoped invalidations for L1 guests.
Support for process-scoped and partition-scoped invalidations
from/for nested guests will be added separately.

Process-scoped tlbie invalidations from L1 and nested guests
need the RS register of the TLBIE instruction to contain both PID
and LPID. This patch introduces primitives that execute the tlbie
instruction with both PID and LPID set, in preparation for the
H_RPT_INVALIDATE hcall.
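
The shape of such a primitive, along the lines of the existing
__tlbie_pid() (a sketch; the real helpers are in the radix_tlb.c part
of this patch):

static __always_inline void __tlbie_pid_lpid(unsigned long pid,
					     unsigned long lpid,
					     unsigned long ric)
{
	unsigned long rb, rs, prs, r;

	rb = PPC_BIT(53);	/* IS = 1 */
	/* RS: PID in the upper 32 bits, LPID in the lower 32 bits */
	rs = (pid << PPC_BITLSHIFT(31)) | (lpid & ~(PPC_BITMASK(0, 31)));
	prs = 1;		/* process scoped */
	r = 1;			/* radix format */

	asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
		     : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs)
		     : "memory");
}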

A description of H_RPT_INVALIDATE follows:

int64   /* H_Success: Return code on successful completion */
        /* H_Busy - repeat the call with the same parameters */
        /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid
           parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation
                                        lookaside information */
      uint64 id,       /* PID/LPID to invalidate */
      uint64 target,   /* Invalidation target */
      uint64 type,     /* Type of lookaside information */
      uint64 pg_sizes, /* Page sizes */
      uint64 start,    /* Start of Effective Address (EA)
                          range (inclusive) */
      uint64 end)      /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU        0x01 /* All virtual processors in the partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU        0x04 /* All nest/accelerator agents in use by
                        the partition */

A combination of the above can be specified,
except core and core local.

Type of translation to invalidate (type)
----------------------------------------
NESTED   0x0001  /* invalidate nested guest partition-scope */
TLB  0x0002  /* Invalidate TLB */
PWC  0x0004  /* Invalidate Page Walk Cache */
PRT  0x0008  /* Invalidate caching of Process Table
Entries if NESTED is clear */
PAT  0x0008  /* Invalidate caching of Partition Table
Entries if NESTED is set */

A combination of the above can be specified.

Page size mask (pages)
----------------------
4K  0x01
64K 0x02
2M  0x04
1G  0x08
All sizes   (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
   matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters
  are different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and
  end should be a valid Quadrant address and end > start.
* Return H_NotSupported if the partition is not running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid
  addresses. Else start and end should be aligned to 4kB (lower 11
  bits clear).
* If NESTED is clear, then invalidate process scoped lookaside
  information. Else pid specifies a nested LPID, and the invalidation
  is performed on nested guest partition table and nested guest
  partition scope real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3
  and quadrant 0 spaces. Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside
  invalidation range, which allows a caller to optimally invalidate
  ranges that may contain mixed page sizes.
* Return H_SUCCESS on success.
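
The "H_Busy - repeat the call" rule above maps onto a guest-side wrapper
along these lines (a minimal sketch assuming the usual
plpar_hcall_norets() convention, not necessarily the exact helper this
series adds):

static inline long pseries_rpt_invalidate(u32 pid, u64 target, u64 type,
					  u64 pg_sizes, u64 start, u64 end)
{
	long rc;

	do {
		rc = plpar_hcall_norets(H_RPT_INVALIDATE, pid, target,
					type, pg_sizes, start, end);
		if (rc == H_BUSY)
			cpu_relax();	/* repeat with the same arguments */
	} while (rc == H_BUSY);

	return rc;
}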

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/mmu_context.h |   9 ++
 arch/powerpc/kvm/book3s_hv.c   |  36 ++
 arch/powerpc/mm/book3s64/radix_tlb.c   | 172 +
 3 files changed, 217 insertions(+)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 4bc45d3ed8b0..b44f291fc909 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -124,8 +124,17 @@ static inline bool need_extra_context(struct mm_struct 
*mm, unsigned long ea)
 
 #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
 extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
+void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
+unsigned long type, unsigned lo

[PATCH v8 1/6] KVM: PPC: Book3S HV: Fix comments of H_RPT_INVALIDATE arguments

2021-06-21 Thread Bharata B Rao
From: "Aneesh Kumar K.V" 

The type values H_RPTI_TYPE_PRT and H_RPTI_TYPE_PAT indicate
invalidating the caching of process and partition scoped entries
respectively.

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 arch/powerpc/include/asm/hvcall.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index e3b29eda8074..7e4b2cef40c2 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -413,9 +413,9 @@
 #define H_RPTI_TYPE_NESTED	0x0001	/* Invalidate nested guest partition-scope */
 #define H_RPTI_TYPE_TLB	0x0002	/* Invalidate TLB */
 #define H_RPTI_TYPE_PWC	0x0004	/* Invalidate Page Walk Cache */
-/* Invalidate Process Table Entries if H_RPTI_TYPE_NESTED is clear */
+/* Invalidate caching of Process Table Entries if H_RPTI_TYPE_NESTED is clear */
 #define H_RPTI_TYPE_PRT	0x0008
-/* Invalidate Partition Table Entries if H_RPTI_TYPE_NESTED is set */
+/* Invalidate caching of Partition Table Entries if H_RPTI_TYPE_NESTED is set */
 #define H_RPTI_TYPE_PAT	0x0008
 #define H_RPTI_TYPE_ALL(H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC | \
 H_RPTI_TYPE_PRT)
-- 
2.31.1



[PATCH v8 2/6] powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to mmu_psize_def

2021-06-21 Thread Bharata B Rao
Add a field to mmu_psize_def to store the page size encodings
of H_RPT_INVALIDATE hcall. Initialize this while scanning the radix
AP encodings. This will be used when invalidating with required
page size encoding in the hcall.
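
For context, psize_to_rpti_pgsize() maps a Linux MMU page-size index to
the hcall's page-size mask, roughly as follows (a sketch assuming the
H_RPTI_PAGE_* defines in hvcall.h; unknown sizes fall back to "all
sizes"):

static inline unsigned long psize_to_rpti_pgsize(unsigned long psize)
{
	unsigned long pgsize = H_RPTI_PAGE_ALL;

	if (psize == MMU_PAGE_4K)
		pgsize = H_RPTI_PAGE_4K;
	else if (psize == MMU_PAGE_64K)
		pgsize = H_RPTI_PAGE_64K;
	else if (psize == MMU_PAGE_2M)
		pgsize = H_RPTI_PAGE_2M;
	else if (psize == MMU_PAGE_1G)
		pgsize = H_RPTI_PAGE_1G;
	return pgsize;
}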

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 1 +
 arch/powerpc/mm/book3s64/radix_pgtable.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index eace8c3f7b0a..c02f42d1031e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -19,6 +19,7 @@ struct mmu_psize_def {
 	int		penc[MMU_PAGE_COUNT];	/* HPTE encoding */
 	unsigned int	tlbiel;	/* tlbiel supported for that page size */
 	unsigned long	avpnm;	/* bits to mask out in AVPN in the HPTE */
+	unsigned long	h_rpt_pgsize; /* H_RPT_INVALIDATE page size encoding */
 	union {
 		unsigned long	sllp;	/* SLB L||LP (exact mask to use in slbmte) */
 		unsigned long	ap;	/* Ap encoding used by PowerISA 3.0 */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 5fef8db3b463..637db10d841e 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -486,6 +486,7 @@ static int __init radix_dt_scan_page_sizes(unsigned long 
node,
		def = &mmu_psize_defs[idx];
def->shift = shift;
def->ap  = ap;
+   def->h_rpt_pgsize = psize_to_rpti_pgsize(idx);
}
 
/* needed ? */
@@ -560,9 +561,13 @@ void __init radix__early_init_devtree(void)
 */
mmu_psize_defs[MMU_PAGE_4K].shift = 12;
mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
+   mmu_psize_defs[MMU_PAGE_4K].h_rpt_pgsize =
+   psize_to_rpti_pgsize(MMU_PAGE_4K);
 
mmu_psize_defs[MMU_PAGE_64K].shift = 16;
mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
+   mmu_psize_defs[MMU_PAGE_64K].h_rpt_pgsize =
+   psize_to_rpti_pgsize(MMU_PAGE_64K);
}
 
/*
-- 
2.31.1



Re: [PATCH v7 3/6] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-05-06 Thread Bharata B Rao
On Thu, May 06, 2021 at 03:45:21PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of May 6, 2021 1:46 am:
> >  
> > +static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
> > +   unsigned long id, unsigned long target,
> > +   unsigned long type, unsigned long pg_sizes,
> > +   unsigned long start, unsigned long end)
> > +{
> > +   unsigned long psize;
> > +   struct mmu_psize_def *def;
> > +
> > +   if (!kvm_is_radix(vcpu->kvm))
> > +   return H_UNSUPPORTED;
> > +
> > +   if (end < start)
> > +   return H_P5;
> > +
> > +   /*
> > +* Partition-scoped invalidation for nested guests.
> > +* Not yet supported
> > +*/
> > +   if (type & H_RPTI_TYPE_NESTED)
> > +   return H_P3;
> > +
> > +   /*
> > +* Process-scoped invalidation for L1 guests.
> > +*/
> > +   for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
> > +		def = &mmu_psize_defs[psize];
> > +   if (!(pg_sizes & def->h_rpt_pgsize))
> > +   continue;
> 
> Not that it really matters but why did you go this approach rather than
> use a bitmask iteration over h_rpt_pgsize?

If you are asking why I am not just looping over the hcall argument
@pg_sizes bitmask then, I was doing that in my earlier version. But
David suggested that it would be good to have page size encodings
of H_RPT_INVALIDATE within mmu_pgsize_defs[]. Based on this, I am
populating mmu_pgsize_defs[] during radix page size initialization
and using that here to check for those page sizes that have been set
in @pg_sizes.
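
For comparison, the bitmask iteration Nick refers to would look roughly
like this (a reconstruction for illustration; rpti_pgsize_to_psize() is
the inverse helper that earlier versions of the series carried, and
do_flush() stands in for the eventual flush routine):

	unsigned long bit;

	for_each_set_bit(bit, &pg_sizes, BITS_PER_LONG) {
		psize = rpti_pgsize_to_psize(1UL << bit);
		if (psize < MMU_PAGE_COUNT)
			do_flush(psize);	/* hypothetical flush helper */
	}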

> 
> I would actually prefer to put this loop into the TLB invalidation code
> itself.

Yes, I could easily move it there.

> 
> The reason is that not all flush types are based on page size. You only
> need to do IS=1/2/3 flushes once and it takes out all page sizes.

I see. So we have to do explicit flushing for different page sizes
only if we are doing range based invalidation (IS=0). For rest of
the cases (IS=1/2/3), that's not necessary.

> 
> You don't need to do all these optimisations right now, but it would
> be good to make them possible to implement.

Sure.

> > +void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
> > +unsigned long type, unsigned long page_size,
> > +unsigned long psize, unsigned long start,
> > +unsigned long end)
> > +{
> > +   /*
> > +* A H_RPTI_TYPE_ALL request implies RIC=3, hence
> > +* do a single IS=1 based flush.
> > +*/
> > +   if ((type & H_RPTI_TYPE_ALL) == H_RPTI_TYPE_ALL) {
> > +   _tlbie_pid_lpid(pid, lpid, RIC_FLUSH_ALL);
> > +   return;
> > +   }
> > +
> > +   if (type & H_RPTI_TYPE_PWC)
> > +   _tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);
> > +
> > +   if (start == 0 && end == -1) /* PID */
> > +   _tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
> > +   else /* EA */
> > +   _tlbie_va_range_lpid(start, end, pid, lpid, page_size,
> > +psize, false);
> 
> At least one thing that is probably needed is to use the 
> single_page_flush_ceiling to flip the va range flush over to a pid 
> flush, so the guest can't cause problems in the hypervisor with an 
> enormous range.

Yes, makes sense. I shall do this and the above as later optimizations.
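
Concretely, the ceiling check being suggested could gate the range flush
like this (a sketch; tlb_single_page_flush_ceiling already exists in
radix_tlb.c, while page_shift here stands for the shift of the page size
being flushed):

	unsigned long nr_pages = (end - start) >> page_shift;

	if (nr_pages > tlb_single_page_flush_ceiling)
		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB); /* full-PID flush */
	else
		_tlbie_va_range_lpid(start, end, pid, lpid,
				     page_size, psize, false);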

Regards,
Bharata.


[PATCH v7 6/6] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2021-05-05 Thread Bharata B Rao
In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.
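
The detection plumbing on the pseries side amounts to one new entry in
the hypertas feature table, something like the sketch below (the table
and feature-bit names are assumptions), after which the flush paths can
simply test firmware_has_feature(FW_FEATURE_RPT_INVALIDATE):

/* arch/powerpc/platforms/pseries/firmware.c (sketch) */
static __initdata struct hypertas_fw_feature hypertas_fw_features_table[] = {
	/* ... existing entries ... */
	{FW_FEATURE_RPT_INVALIDATE,	"hcall-rpt-invalidate"},
};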

Signed-off-by: Bharata B Rao 
Reviewed-by: Fabiano Rosas 
Reviewed-by: David Gibson 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 +-
 arch/powerpc/kvm/book3s_hv_nested.c| 12 ++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index ec4f58fa9f5a..6980f8ef08f9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,19 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned 
long addr,
}
 
psi = shift_to_mmu_psize(pshift);
-   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-   lpid, rb);
+
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
+   lpid, rb);
+   } else {
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB,
+   psize_to_rpti_pgsize(psi),
+   addr, addr + psize);
+   }
+
if (rc)
pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +345,14 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, 
unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+   0, -1UL);
if (rc)
pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 91f10290130d..d1529251b078 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -467,8 +468,15 @@ static void kvmhv_flush_lpid(unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+   H_RPTI_TYPE_PAT,
+   H_RPTI_PAGE_ALL, 0, -1UL);
if (rc)
pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.26.2



[PATCH v7 5/6] KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability

2021-05-05 Thread Bharata B Rao
Now that we have H_RPT_INVALIDATE fully implemented, enable
support for the same via the KVM_CAP_PPC_RPT_INVALIDATE KVM capability.

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 Documentation/virt/kvm/api.rst | 18 ++
 arch/powerpc/kvm/powerpc.c |  3 +++
 include/uapi/linux/kvm.h   |  1 +
 3 files changed, 22 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 22d077562149..233a9c0a15be 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6362,6 +6362,24 @@ default.
 
 See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
 
+7.26 KVM_CAP_PPC_RPT_INVALIDATE
+-------------------------------
+
+:Capability: KVM_CAP_PPC_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is enabled for hypervisors on platforms like POWER9
+that support radix MMU.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a2a68a958fa0..be33b5321a76 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -682,6 +682,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = !!(hv_enabled && kvmppc_hv_ops->enable_dawr1 &&
   !kvmppc_hv_ops->enable_dawr1(NULL));
break;
+   case KVM_CAP_PPC_RPT_INVALIDATE:
+   r = 1;
+   break;
 #endif
default:
r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..613198a94c43 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_PPC_RPT_INVALIDATE 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2



[PATCH v7 3/6] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-05-05 Thread Bharata B Rao
H_RPT_INVALIDATE does two types of TLB invalidations:

1. Process-scoped invalidations for guests when LPCR[GTSE]=0.
   This is currently not used in KVM as GTSE is not usually
   disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
   behalf of an L2 guest. This is currently handled
   by the H_TLB_INVALIDATE hcall, and the new hcall replaces the old one.

This commit enables process-scoped invalidations for L1 guests.
Support for process-scoped and partition-scoped invalidations
from/for nested guests will be added separately.

Process-scoped tlbie invalidations from L1 and nested guests
need the RS register of the TLBIE instruction to contain both
the PID and the LPID. This patch introduces primitives that
execute the tlbie instruction with both PID and LPID set, in
preparation for the H_RPT_INVALIDATE hcall.

A description of H_RPT_INVALIDATE follows:

int64   /* H_Success: Return code on successful completion */
    /* H_Busy - repeat the call with the same */
    /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid
   parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT
translation
lookaside information */
  uint64 id,    /* PID/LPID to invalidate */
  uint64 target,    /* Invalidation target */
  uint64 type,  /* Type of lookaside information */
  uint64 pg_sizes,  /* Page sizes */
  uint64 start, /* Start of Effective Address (EA)
   range (inclusive) */
  uint64 end)   /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU    0x01 /* All virtual processors in the
partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU    0x04 /* All nest/accelerator agents
in use by the partition */

A combination of the above can be specified,
except core and core local.

Type of translation to invalidate (type)
----------------------------------------
NESTED   0x0001  /* invalidate nested guest partition-scope */
TLB  0x0002  /* Invalidate TLB */
PWC  0x0004  /* Invalidate Page Walk Cache */
PRT  0x0008  /* Invalidate caching of Process Table
Entries if NESTED is clear */
PAT  0x0008  /* Invalidate caching of Partition Table
Entries if NESTED is set */

A combination of the above can be specified.

Page size mask (pages)
----------------------
4K  0x01
64K 0x02
2M  0x04
1G  0x08
All sizes   (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
   matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters
  are different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and
  end should be a valid Quadrant address and end > start.
* Return H_NotSupported if the partition is not running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid
  addresses. Else start and end should be aligned to 4kB (lower 11
  bits clear).
* If NESTED is clear, then invalidate process scoped lookaside
  information. Else pid specifies a nested LPID, and the invalidation
  is performed on nested guest partition table and nested guest
  partition scope real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3
  and quadrant 0 spaces. Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside
  invalidation range, which allows a caller to optimally invalidate
  ranges that may contain mixed page sizes.
* Return H_SUCCESS on success.
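
As a worked example of the encoding above, a full process-scoped flush
of PID 42 across all page sizes, targeting both core and nest MMUs,
would be issued as follows (illustrative only; H_RPTI_TARGET_NMMU is
assumed per the "Nest MMU 0x04" line above):

	long rc;

	rc = pseries_rpt_invalidate(42,
				    H_RPTI_TARGET_CMMU | H_RPTI_TARGET_NMMU,
				    H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
				    H_RPTI_TYPE_PRT,
				    H_RPTI_PAGE_ALL, 0, -1UL);
	if (rc)
		pr_err("H_RPT_INVALIDATE failed, rc=%ld\n", rc);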

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 arch/powerpc/include/asm/mmu_context.h |  11 ++
 arch/powerpc/kvm/book3s_hv.c   |  46 
 arch/powerpc/mm/book3s64/radix_tlb.c   | 148 +
 3 files changed, 205 insertions(+)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 4bc45d3ed8b0..128760eb598e 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -124,8 +124,19 @@ static inline bool need_extra_context(struct mm_struct 
*mm, unsigned long ea)
 
 #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
 extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
+void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
+unsigned long type, unsigned lon

[PATCH v7 4/6] KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE

2021-05-05 Thread Bharata B Rao
Enable support for process-scoped invalidations from nested
guests and partition-scoped invalidations for nested guests.

Process-scoped invalidations for any level of nested guests
are handled by implementing H_RPT_INVALIDATE handler in the
nested guest exit path in L0.

Partition-scoped invalidation requests are forwarded to the
right nested guest, handled there and passed down to L0
for eventual handling.
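
In outline, the partition-scoped path in L1 has this shape (a sketch,
not the patch's exact code; kvmhv_get_nested()/kvmhv_put_nested() are
existing nested-KVM helpers):

long do_h_rpt_invalidate_pat(struct kvm_vcpu *vcpu, unsigned long lpid,
			     unsigned long type, unsigned long pg_sizes,
			     unsigned long start, unsigned long end)
{
	struct kvm_nested_guest *gp;

	gp = kvmhv_get_nested(vcpu->kvm, lpid, false);
	if (!gp)
		return H_P2;

	/*
	 * Invalidate the shadow partition-scoped mappings kept for this
	 * nested guest, per @type and @pg_sizes; flushing those shadow
	 * translations is what eventually reaches L0.
	 */

	kvmhv_put_nested(gp);
	return H_SUCCESS;
}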

Signed-off-by: Bharata B Rao 
Signed-off-by: Aneesh Kumar K.V 
[Nested guest partition-scoped invalidation changes]
---
 .../include/asm/book3s/64/tlbflush-radix.h|   4 +
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/kvm/book3s_hv.c  |  66 ++-
 arch/powerpc/kvm/book3s_hv_nested.c   | 104 ++
 arch/powerpc/mm/book3s64/radix_tlb.c  |   4 -
 5 files changed, 174 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 8b33601cdb9d..a46fd37ad552 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index a6e9a5585e61..fdf54741c58c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -307,6 +307,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 
dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long do_h_rpt_invalidate_pat(struct kvm_vcpu *vcpu, unsigned long lpid,
+unsigned long type, unsigned long pg_sizes,
+unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bcf34246bbe9..a2e7fbec796a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -925,6 +925,41 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
return yield_count;
 }
 
+/*
+ * H_RPT_INVALIDATE hcall handler for nested guests.
+ *
+ * Handles only nested process-scoped invalidation requests in L0.
+ */
+static int kvmppc_nested_h_rpt_invalidate(struct kvm_vcpu *vcpu)
+{
+   unsigned long type = kvmppc_get_gpr(vcpu, 6);
+   unsigned long pid, pg_sizes, start, end, psize;
+   struct mmu_psize_def *def;
+
+   /*
+* The partition-scoped invalidations aren't handled here in L0.
+*/
+   if (type & H_RPTI_TYPE_NESTED)
+   return RESUME_HOST;
+
+   pid = kvmppc_get_gpr(vcpu, 4);
+   pg_sizes = kvmppc_get_gpr(vcpu, 7);
+   start = kvmppc_get_gpr(vcpu, 8);
+   end = kvmppc_get_gpr(vcpu, 9);
+
+   for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
		def = &mmu_psize_defs[psize];
+   if (pg_sizes & def->h_rpt_pgsize)
+   do_h_rpt_invalidate_prt(pid,
+   vcpu->arch.nested->shadow_lpid,
+   type, (1UL << def->shift),
+   psize, start, end);
+   }
+
+   kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
+   return RESUME_GUEST;
+}
+
 static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
unsigned long id, unsigned long target,
unsigned long type, unsigned long pg_sizes,
@@ -941,10 +976,18 @@ static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
 
/*
 * Partition-scoped invalidation for nested guests.
-* Not yet supported
 */
-   if (type & H_RPTI_TYPE_NESTED)
-   return H_P3;
+   if (type & H_RPTI_TYPE_NESTED) {
+   if (!nesting_enabled(vcpu->kvm))
+   return H_FUNCTION;
+
+   /* Support only cores as target */
+   if (target != H_RPTI_TARGET_CMMU)
+   return H_P2;
+
+   return do_h_rpt_invalidate_pat(vcpu, id, type, pg_sizes,
+  start, end);
+   }
 
/*
 * Process-scoped invalidation for L1 guests.
@@ -1639,6 +1682,23 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
*vcpu)
if (!xics_on_xive())
kvmppc_xics_rm_complete(vcpu, 0);
break;
+   case BOOK3S_INTERRUPT_SYSCALL:
+   {
+   un

[PATCH v7 2/6] powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to mmu_psize_def

2021-05-05 Thread Bharata B Rao
Add a field to mmu_psize_def to store the page size encodings
of H_RPT_INVALIDATE hcall. Initialize this while scanning the radix
AP encodings. This will be used when invalidating with required
page size encoding in the hcall.

Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 1 +
 arch/powerpc/mm/book3s64/radix_pgtable.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index eace8c3f7b0a..c02f42d1031e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -19,6 +19,7 @@ struct mmu_psize_def {
 	int		penc[MMU_PAGE_COUNT];	/* HPTE encoding */
 	unsigned int	tlbiel;	/* tlbiel supported for that page size */
 	unsigned long	avpnm;	/* bits to mask out in AVPN in the HPTE */
+	unsigned long	h_rpt_pgsize; /* H_RPT_INVALIDATE page size encoding */
 	union {
 		unsigned long	sllp;	/* SLB L||LP (exact mask to use in slbmte) */
 		unsigned long	ap;	/* Ap encoding used by PowerISA 3.0 */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 5fef8db3b463..637db10d841e 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -486,6 +486,7 @@ static int __init radix_dt_scan_page_sizes(unsigned long 
node,
		def = &mmu_psize_defs[idx];
def->shift = shift;
def->ap  = ap;
+   def->h_rpt_pgsize = psize_to_rpti_pgsize(idx);
}
 
/* needed ? */
@@ -560,9 +561,13 @@ void __init radix__early_init_devtree(void)
 */
mmu_psize_defs[MMU_PAGE_4K].shift = 12;
mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
+   mmu_psize_defs[MMU_PAGE_4K].h_rpt_pgsize =
+   psize_to_rpti_pgsize(MMU_PAGE_4K);
 
mmu_psize_defs[MMU_PAGE_64K].shift = 16;
mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
+   mmu_psize_defs[MMU_PAGE_64K].h_rpt_pgsize =
+   psize_to_rpti_pgsize(MMU_PAGE_64K);
}
 
/*
-- 
2.26.2



[PATCH v7 0/6] Support for H_RPT_INVALIDATE in PowerPC KVM

2021-05-05 Thread Bharata B Rao
This patchset adds support for the new hcall H_RPT_INVALIDATE
and replaces the nested tlb flush calls with this new hcall
if support for the same exists.

Changes in v7:
--------------
- Fixed a bug where the LPID of the nested guest was fetched
  incorrectly in the process-scoped invalidation part of the
  nested guest exit handler.
  (In kvmppc_nested_h_rpt_invalidate() of patch 4/6)
- Moved the RIC_FLUSH_* definitions to the appropriate patch.

v6: 
https://lore.kernel.org/linuxppc-dev/20210311083939.595568-1-bhar...@linux.ibm.com/

Aneesh Kumar K.V (1):
  KVM: PPC: Book3S HV: Fix comments of H_RPT_INVALIDATE arguments

Bharata B Rao (5):
  powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to
mmu_psize_def
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst|  18 +++
 arch/powerpc/include/asm/book3s/64/mmu.h  |   1 +
 .../include/asm/book3s/64/tlbflush-radix.h|   4 +
 arch/powerpc/include/asm/hvcall.h |   4 +-
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/include/asm/mmu_context.h|  11 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c|  27 +++-
 arch/powerpc/kvm/book3s_hv.c  | 106 
 arch/powerpc/kvm/book3s_hv_nested.c   | 116 -
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_pgtable.c  |   5 +
 arch/powerpc/mm/book3s64/radix_tlb.c  | 152 +-
 include/uapi/linux/kvm.h  |   1 +
 13 files changed, 438 insertions(+), 13 deletions(-)

-- 
2.26.2



[PATCH v7 1/6] KVM: PPC: Book3S HV: Fix comments of H_RPT_INVALIDATE arguments

2021-05-05 Thread Bharata B Rao
From: "Aneesh Kumar K.V" 

The type values H_RPTI_TYPE_PRT and H_RPTI_TYPE_PAT indicate
invalidating the caching of process and partition scoped entries
respectively.

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Bharata B Rao 
Reviewed-by: David Gibson 
---
 arch/powerpc/include/asm/hvcall.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 443050906018..f9927a1545ea 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -413,9 +413,9 @@
 #define H_RPTI_TYPE_NESTED	0x0001	/* Invalidate nested guest partition-scope */
 #define H_RPTI_TYPE_TLB	0x0002	/* Invalidate TLB */
 #define H_RPTI_TYPE_PWC	0x0004	/* Invalidate Page Walk Cache */
-/* Invalidate Process Table Entries if H_RPTI_TYPE_NESTED is clear */
+/* Invalidate caching of Process Table Entries if H_RPTI_TYPE_NESTED is clear */
 #define H_RPTI_TYPE_PRT	0x0008
-/* Invalidate Partition Table Entries if H_RPTI_TYPE_NESTED is set */
+/* Invalidate caching of Partition Table Entries if H_RPTI_TYPE_NESTED is set */
 #define H_RPTI_TYPE_PAT	0x0008
 #define H_RPTI_TYPE_ALL(H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC | \
 H_RPTI_TYPE_PRT)
-- 
2.26.2



Re: [PATCH v6 3/6] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-03-22 Thread Bharata B Rao
On Tue, Mar 23, 2021 at 01:26:56PM +1100, David Gibson wrote:
> On Thu, Mar 11, 2021 at 02:09:36PM +0530, Bharata B Rao wrote:
> > H_RPT_INVALIDATE does two types of TLB invalidations:
> > 
> > 1. Process-scoped invalidations for guests when LPCR[GTSE]=0.
> >This is currently not used in KVM as GTSE is not usually
> >disabled in KVM.
> > 2. Partition-scoped invalidations that an L1 hypervisor does on
> >behalf of an L2 guest. This is currently handled
> >by the H_TLB_INVALIDATE hcall, and the new hcall replaces the old one.
> > 
> > This commit enables process-scoped invalidations for L1 guests.
> > Support for process-scoped and partition-scoped invalidations
> > from/for nested guests will be added separately.
> > 
> > Process scoped tlbie invalidations from L1 and nested guests
> > need RS register for TLBIE instruction to contain both PID and
> > LPID.  This patch introduces primitives that execute tlbie
> > instruction with both PID and LPID set in preparation for
> > H_RPT_INVALIDATE hcall.
> > 
> > A description of H_RPT_INVALIDATE follows:
> > 
> > int64   /* H_Success: Return code on successful completion */
> >     /* H_Busy - repeat the call with the same */
> >     /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid
> >parameters */
> > hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT
> > translation
> > lookaside information */
> >   uint64 id,    /* PID/LPID to invalidate */
> >   uint64 target,    /* Invalidation target */
> >   uint64 type,  /* Type of lookaside information */
> >   uint64 pg_sizes,  /* Page sizes */
> >   uint64 start, /* Start of Effective Address (EA)
> >range (inclusive) */
> >   uint64 end)   /* End of EA range (exclusive) */
> > 
> > Invalidation targets (target)
> > -
> > Core MMU    0x01 /* All virtual processors in the
> > partition */
> > Core local MMU  0x02 /* Current virtual processor */
> > Nest MMU    0x04 /* All nest/accelerator agents
> > in use by the partition */
> > 
> > A combination of the above can be specified,
> > except core and core local.
> > 
> > Type of translation to invalidate (type)
> > ---
> > NESTED   0x0001  /* invalidate nested guest partition-scope */
> > TLB  0x0002  /* Invalidate TLB */
> > PWC  0x0004  /* Invalidate Page Walk Cache */
> > PRT  0x0008  /* Invalidate caching of Process Table
> > Entries if NESTED is clear */
> > PAT  0x0008  /* Invalidate caching of Partition Table
> > Entries if NESTED is set */
> > 
> > A combination of the above can be specified.
> > 
> > Page size mask (pages)
> > --
> > 4K  0x01
> > 64K 0x02
> > 2M  0x04
> > 1G  0x08
> > All sizes   (-1UL)
> > 
> > A combination of the above can be specified.
> > All page sizes can be selected with -1.
> > 
> > Semantics: Invalidate radix tree lookaside information
> >    matching the parameters given.
> > * Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters
> >   are different from the defined values.
> > * Return H_PARAMETER if NESTED is set and pid is not a valid nested
> >   LPID allocated to this partition
> > * Return H_P5 if (start, end) doesn't form a valid range. Start and
> >   end should be a valid Quadrant address and end > start.
> > * Return H_NotSupported if the partition is not running in radix
> >   translation mode.
> > * May invalidate more translation information than requested.
> > * If start = 0 and end = -1, set the range to cover all valid
> >   addresses. Else start and end should be aligned to 4kB (lower 11
> >   bits clear).
> > * If NESTED is clear, then invalidate process scoped lookaside
> >   information. Else pid specifies a nested LPID, and the invalidation
> >   is performed on nested guest partition table and nested guest
> >   partition scope real addresses.
> > * If pid = 0 and NESTED is clear, then valid addresses are quadrant 3
> >   and quadrant 0 spaces. Else valid addresses are quadrant 0.
> > * Pages which are fully covered by the range are to be invalidated.
> >   Those which are partially covered are considered outside
> >

[PATCH v6 5/6] KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability

2021-03-11 Thread Bharata B Rao
Now that we have H_RPT_INVALIDATE fully implemented, enable
support for the same via the KVM_CAP_PPC_RPT_INVALIDATE KVM capability.

Signed-off-by: Bharata B Rao 
---
 Documentation/virt/kvm/api.rst | 18 ++
 arch/powerpc/kvm/powerpc.c |  3 +++
 include/uapi/linux/kvm.h   |  1 +
 3 files changed, 22 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1a2b5210cdbf..d769cef5f904 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6227,6 +6227,24 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between 
them.
 This capability can be used to check / enable 2nd DAWR feature provided
 by POWER10 processor.
 
+7.24 KVM_CAP_PPC_RPT_INVALIDATE
+--
+
+:Capability: KVM_CAP_PPC_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is enabled for hypervisors on platforms like POWER9
+that support radix MMU.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a2a68a958fa0..be33b5321a76 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -682,6 +682,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = !!(hv_enabled && kvmppc_hv_ops->enable_dawr1 &&
   !kvmppc_hv_ops->enable_dawr1(NULL));
break;
+   case KVM_CAP_PPC_RPT_INVALIDATE:
+   r = 1;
+   break;
 #endif
default:
r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620..2b2370475cec 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1078,6 +1078,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING 192
 #define KVM_CAP_X86_BUS_LOCK_EXIT 193
 #define KVM_CAP_PPC_DAWR1 194
+#define KVM_CAP_PPC_RPT_INVALIDATE 195
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2



[PATCH v6 6/6] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2021-03-11 Thread Bharata B Rao
In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao 
Reviewed-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 +-
 arch/powerpc/kvm/book3s_hv_nested.c| 12 ++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index e603de7ade52..1e1e55fd0ee5 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,19 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned 
long addr,
}
 
psi = shift_to_mmu_psize(pshift);
-   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-   lpid, rb);
+
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
+   lpid, rb);
+   } else {
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB,
+   psize_to_rpti_pgsize(psi),
+   addr, addr + psize);
+   }
+
if (rc)
pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +345,14 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, 
unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+   0, -1UL);
if (rc)
pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index adcc8e26ef22..5601b7eb9b89 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -444,8 +445,15 @@ static void kvmhv_flush_lpid(unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+   H_RPTI_TYPE_PAT,
+   H_RPTI_PAGE_ALL, 0, -1UL);
if (rc)
pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.26.2



[PATCH v6 4/6] KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE

2021-03-11 Thread Bharata B Rao
Enable support for process-scoped invalidations from nested
guests and partition-scoped invalidations for nested guests.

Process-scoped invalidations for any level of nested guests
are handled by implementing H_RPT_INVALIDATE handler in the
nested guest exit path in L0.

Partition-scoped invalidation requests are forwarded to the
right nested guest, handled there and passed down to L0
for eventual handling.

Signed-off-by: Bharata B Rao 
Signed-off-by: Aneesh Kumar K.V 
[Nested guest partition-scoped invalidation changes]
---
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/kvm/book3s_hv.c  |  71 +-
 arch/powerpc/kvm/book3s_hv_nested.c   | 104 ++
 3 files changed, 175 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 2f5f919f6cd3..de8fc5a4d19c 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -305,6 +305,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 
dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long do_h_rpt_invalidate_pat(struct kvm_vcpu *vcpu, unsigned long lpid,
+unsigned long type, unsigned long pg_sizes,
+unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5d008468347c..03755389efd1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -922,6 +922,46 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
return yield_count;
 }
 
+/*
+ * H_RPT_INVALIDATE hcall handler for nested guests.
+ *
+ * Handles only nested process-scoped invalidation requests in L0.
+ */
+static int kvmppc_nested_h_rpt_invalidate(struct kvm_vcpu *vcpu)
+{
+   unsigned long type = kvmppc_get_gpr(vcpu, 6);
+   unsigned long pid, pg_sizes, start, end, psize;
+   struct kvm_nested_guest *gp;
+   struct mmu_psize_def *def;
+
+   /*
+* The partition-scoped invalidations aren't handled here in L0.
+*/
+   if (type & H_RPTI_TYPE_NESTED)
+   return RESUME_HOST;
+
+   pid = kvmppc_get_gpr(vcpu, 4);
+   pg_sizes = kvmppc_get_gpr(vcpu, 7);
+   start = kvmppc_get_gpr(vcpu, 8);
+   end = kvmppc_get_gpr(vcpu, 9);
+
+   gp = kvmhv_get_nested(vcpu->kvm, vcpu->kvm->arch.lpid, false);
+   if (!gp)
+   goto out;
+
+   for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
		def = &mmu_psize_defs[psize];
+   if (pg_sizes & def->h_rpt_pgsize)
+   do_h_rpt_invalidate_prt(pid, gp->shadow_lpid, type,
+   (1UL << def->shift), psize,
+   start, end);
+   }
+   kvmhv_put_nested(gp);
+out:
+   kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
+   return RESUME_GUEST;
+}
+
 static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
unsigned long id, unsigned long target,
unsigned long type, unsigned long pg_sizes,
@@ -938,10 +978,18 @@ static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
 
/*
 * Partition-scoped invalidation for nested guests.
-* Not yet supported
 */
-   if (type & H_RPTI_TYPE_NESTED)
-   return H_P3;
+   if (type & H_RPTI_TYPE_NESTED) {
+   if (!nesting_enabled(vcpu->kvm))
+   return H_FUNCTION;
+
+   /* Support only cores as target */
+   if (target != H_RPTI_TARGET_CMMU)
+   return H_P2;
+
+   return do_h_rpt_invalidate_pat(vcpu, id, type, pg_sizes,
+  start, end);
+   }
 
/*
 * Process-scoped invalidation for L1 guests.
@@ -1636,6 +1684,23 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
*vcpu)
if (!xics_on_xive())
kvmppc_xics_rm_complete(vcpu, 0);
break;
+   case BOOK3S_INTERRUPT_SYSCALL:
+   {
+   unsigned long req = kvmppc_get_gpr(vcpu, 3);
+
+   /*
+* The H_RPT_INVALIDATE hcalls issued by nested
+* guests for process-scoped invalidations when
+* GTSE=0, are handled here in L0.
+*/
+   if (req == H_RPT_INVALIDATE) {
+   r = kvmppc_nested_h_rpt_invalidate(vcpu);
+   break;
+   }
+
+   

[PATCH v6 2/6] powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to mmu_psize_def

2021-03-11 Thread Bharata B Rao
Add a field to mmu_psize_def to store the page size encodings
of H_RPT_INVALIDATE hcall. Initialize this while scanning the radix
AP encodings. This will be used when invalidating with required
page size encoding in the hcall.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 1 +
 arch/powerpc/mm/book3s64/radix_pgtable.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index eace8c3f7b0a..c02f42d1031e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -19,6 +19,7 @@ struct mmu_psize_def {
 	int		penc[MMU_PAGE_COUNT];	/* HPTE encoding */
 	unsigned int	tlbiel;	/* tlbiel supported for that page size */
 	unsigned long	avpnm;	/* bits to mask out in AVPN in the HPTE */
+	unsigned long	h_rpt_pgsize; /* H_RPT_INVALIDATE page size encoding */
 	union {
 		unsigned long	sllp;	/* SLB L||LP (exact mask to use in slbmte) */
 		unsigned long	ap;	/* Ap encoding used by PowerISA 3.0 */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 98f0b243c1ab..1b749899016b 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -486,6 +486,7 @@ static int __init radix_dt_scan_page_sizes(unsigned long 
node,
		def = &mmu_psize_defs[idx];
def->shift = shift;
def->ap  = ap;
+   def->h_rpt_pgsize = psize_to_rpti_pgsize(idx);
}
 
/* needed ? */
@@ -560,9 +561,13 @@ void __init radix__early_init_devtree(void)
 */
mmu_psize_defs[MMU_PAGE_4K].shift = 12;
mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
+   mmu_psize_defs[MMU_PAGE_4K].h_rpt_pgsize =
+   psize_to_rpti_pgsize(MMU_PAGE_4K);
 
mmu_psize_defs[MMU_PAGE_64K].shift = 16;
mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
+   mmu_psize_defs[MMU_PAGE_64K].h_rpt_pgsize =
+   psize_to_rpti_pgsize(MMU_PAGE_64K);
}
 
/*
-- 
2.26.2



[PATCH v6 3/6] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-03-11 Thread Bharata B Rao
H_RPT_INVALIDATE does two types of TLB invalidations:

1. Process-scoped invalidations for guests when LPCR[GTSE]=0.
   This is currently not used in KVM as GTSE is not usually
   disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
   behalf of an L2 guest. This is currently handled
   by the H_TLB_INVALIDATE hcall, and the new hcall replaces the old one.

This commit enables process-scoped invalidations for L1 guests.
Support for process-scoped and partition-scoped invalidations
from/for nested guests will be added separately.

Process-scoped tlbie invalidations from L1 and nested guests
need the RS register of the TLBIE instruction to contain both
the PID and the LPID. This patch introduces primitives that
execute the tlbie instruction with both PID and LPID set, in
preparation for the H_RPT_INVALIDATE hcall.

A description of H_RPT_INVALIDATE follows:

int64   /* H_Success: Return code on successful completion */
    /* H_Busy - repeat the call with the same */
    /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid
   parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT
translation
lookaside information */
  uint64 id,    /* PID/LPID to invalidate */
  uint64 target,    /* Invalidation target */
  uint64 type,  /* Type of lookaside information */
  uint64 pg_sizes,  /* Page sizes */
  uint64 start, /* Start of Effective Address (EA)
   range (inclusive) */
  uint64 end)   /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU    0x01 /* All virtual processors in the
partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU    0x04 /* All nest/accelerator agents
in use by the partition */

A combination of the above can be specified,
except core and core local.

Type of translation to invalidate (type)
----------------------------------------
NESTED   0x0001  /* invalidate nested guest partition-scope */
TLB  0x0002  /* Invalidate TLB */
PWC  0x0004  /* Invalidate Page Walk Cache */
PRT  0x0008  /* Invalidate caching of Process Table
Entries if NESTED is clear */
PAT  0x0008  /* Invalidate caching of Partition Table
Entries if NESTED is set */

A combination of the above can be specified.

Page size mask (pages)
----------------------
4K  0x01
64K 0x02
2M  0x04
1G  0x08
All sizes   (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
   matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters
  are different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and
  end should be a valid Quadrant address and end > start.
* Return H_NotSupported if the partition is not running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid
  addresses. Else start and end should be aligned to 4kB (lower 11
  bits clear).
* If NESTED is clear, then invalidate process scoped lookaside
  information. Else pid specifies a nested LPID, and the invalidation
  is performed on nested guest partition table and nested guest
  partition scope real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3
  and quadrant 0 spaces. Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside
  invalidation range, which allows a caller to optimally invalidate
  ranges that may contain mixed page sizes.
* Return H_SUCCESS on success.
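
The "start = 0 and end = -1" convention above maps naturally onto the
process-scoped dispatch (a sketch mirroring the do_h_rpt_invalidate_prt()
hunk quoted earlier in this thread):

	if (end < start)
		return H_P5;

	if (start == 0 && end == -1UL)	/* whole PID */
		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
	else				/* EA range, 4kB aligned */
		_tlbie_va_range_lpid(start, end, pid, lpid,
				     page_size, psize, false);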

Signed-off-by: Bharata B Rao 
---
 .../include/asm/book3s/64/tlbflush-radix.h|   4 +
 arch/powerpc/include/asm/mmu_context.h|  11 ++
 arch/powerpc/kvm/book3s_hv.c  |  46 ++
 arch/powerpc/mm/book3s64/radix_tlb.c  | 152 +-
 4 files changed, 209 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 8b33601cdb9d..a46fd37ad552 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
in

[PATCH v6 1/6] KVM: PPC: Book3S HV: Fix comments of H_RPT_INVALIDATE arguments

2021-03-11 Thread Bharata B Rao
From: "Aneesh Kumar K.V" 

The type values H_RPTI_TYPE_PRT and H_RPTI_TYPE_PAT indicate
invalidating the caching of process and partition scoped entries
respectively.

Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/hvcall.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index ed6086d57b22..6af7bb3c9121 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -411,9 +411,9 @@
 #define H_RPTI_TYPE_NESTED	0x0001	/* Invalidate nested guest partition-scope */
 #define H_RPTI_TYPE_TLB	0x0002	/* Invalidate TLB */
 #define H_RPTI_TYPE_PWC	0x0004	/* Invalidate Page Walk Cache */
-/* Invalidate Process Table Entries if H_RPTI_TYPE_NESTED is clear */
+/* Invalidate caching of Process Table Entries if H_RPTI_TYPE_NESTED is clear */
 #define H_RPTI_TYPE_PRT	0x0008
-/* Invalidate Partition Table Entries if H_RPTI_TYPE_NESTED is set */
+/* Invalidate caching of Partition Table Entries if H_RPTI_TYPE_NESTED is set */
 #define H_RPTI_TYPE_PAT	0x0008
 #define H_RPTI_TYPE_ALL(H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC | \
 H_RPTI_TYPE_PRT)
-- 
2.26.2



[PATCH v6 0/6] Support for H_RPT_INVALIDATE in PowerPC KVM

2021-03-11 Thread Bharata B Rao
This patchset adds support for the new hcall H_RPT_INVALIDATE
and replaces the nested tlb flush calls with this new hcall
if support for the same exists.

Changes in v6:
--------------
- Split the patch that adds hcall support into three parts so that
  it becomes easier to understand:
  part1: hcall support for L1 guest
  part2: hcall support for nested guests
  part3: add KVM_CAP_PPC_RPT_INVALIDATE
- Renames, comments and code reorgs to improve readability of the
  patches.
- Added a patch to fix comments in hcall args (Aneesh).
- Added hcall documentation as part of commit message.
- Correct handling for partition-scoped invalidation of nested
  guests (Aneesh).

I haven't addressed the optimization that David suggested for
psize_to_rpti_pgsize() and that could be done later separately.

This patchset passes the following tests:
-
- Boot and reboot of L1, L2 and L3 guests with GTSE=0.
- Boot and reboot of L1 and multiple L2 guests with GTSE=0
- vm class stress-ng tests on L3 with GTSE=0
- Boot and reboot of L1, L2 and L3 guest with GTSE=1

v5: 
https://lore.kernel.org/linuxppc-dev/20210224082510.3962423-1-bhar...@linux.ibm.com/T/#t

Aneesh Kumar K.V (1):
  KVM: PPC: Book3S HV: Fix comments of H_RPT_INVALIDATE arguments

Bharata B Rao (5):
  powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to
mmu_psize_def
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst|  18 +++
 arch/powerpc/include/asm/book3s/64/mmu.h  |   1 +
 .../include/asm/book3s/64/tlbflush-radix.h|   4 +
 arch/powerpc/include/asm/hvcall.h |   4 +-
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/include/asm/mmu_context.h|  11 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c|  27 +++-
 arch/powerpc/kvm/book3s_hv.c  | 111 +
 arch/powerpc/kvm/book3s_hv_nested.c   | 116 -
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_pgtable.c  |   5 +
 arch/powerpc/mm/book3s64/radix_tlb.c  | 152 +-
 include/uapi/linux/kvm.h  |   1 +
 13 files changed, 443 insertions(+), 13 deletions(-)

-- 
2.26.2



Re: [PATCH v5 2/3] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-03-01 Thread Bharata B Rao
On Tue, Mar 02, 2021 at 12:45:18PM +1100, David Gibson wrote:
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 45fd862ac128..38ce3f21b21f 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6225,6 +6225,24 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between 
> > them.
> >  This capability can be used to check / enable 2nd DAWR feature provided
> >  by POWER10 processor.
> >  
> > +7.23 KVM_CAP_PPC_RPT_INVALIDATE
> > +--
> > +
> > +:Capability: KVM_CAP_PPC_RPT_INVALIDATE
> > +:Architectures: ppc
> > +:Type: vm
> > +
> > +This capability indicates that the kernel is capable of handling
> > +H_RPT_INVALIDATE hcall.
> > +
> > +In order to enable the use of H_RPT_INVALIDATE in the guest,
> > +user space might have to advertise it for the guest. For example,
> > +IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
> > +present in the "ibm,hypertas-functions" device-tree property.
> > +
> > +This capability is enabled for hypervisors on platforms like POWER9
> > +that support radix MMU.
> 
> Does this mean that KVM will handle the hypercall, even if not
> explicitly enabled by userspace (qemu)?  That's generally not what we
> want, since we need to allow qemu to set up backwards compatible
> guests.

This capability only indicates that hypervisor supports the hcall.

QEMU will check for this and conditionally enable the hcall
(via KVM_CAP_PPC_ENABLE_HCALL ioctl). Enabling the hcall is
conditional to cap-rpt-invalidate sPAPR machine capability being
enabled by the user. Will post a followup QEMU patch shortly.

Older QEMU patch can be found here:
https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg00627.html
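
The userspace sequence described here boils down to the following (a
sketch using raw KVM ioctls; cap_rpt_invalidate_requested stands in for
QEMU's machine-capability flag):

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_PPC_ENABLE_HCALL,
		.args = { H_RPT_INVALIDATE, 1 /* enable */ },
	};

	if (cap_rpt_invalidate_requested &&
	    ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_RPT_INVALIDATE) > 0)
		ioctl(vm_fd, KVM_ENABLE_CAP, &cap);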

> 
> > +
> >  8. Other capabilities.
> >  ==
> >  
> > diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
> > b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> > index 8b33601cdb9d..a46fd37ad552 100644
> > --- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> > +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> > @@ -4,6 +4,10 @@
> >  
> >  #include 
> >  
> > +#define RIC_FLUSH_TLB 0
> > +#define RIC_FLUSH_PWC 1
> > +#define RIC_FLUSH_ALL 2
> > +
> >  struct vm_area_struct;
> >  struct mm_struct;
> >  struct mmu_gather;
> > diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> > b/arch/powerpc/include/asm/kvm_book3s.h
> > index 2f5f919f6cd3..a1515f94400e 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s.h
> > @@ -305,6 +305,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, 
> > u64 dw1);
> >  void kvmhv_release_all_nested(struct kvm *kvm);
> >  long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
> >  long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
> > +long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
> > +unsigned long type, unsigned long pg_sizes,
> > +unsigned long start, unsigned long end);
> >  int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
> >   u64 time_limit, unsigned long lpcr);
> >  void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
> > diff --git a/arch/powerpc/include/asm/mmu_context.h 
> > b/arch/powerpc/include/asm/mmu_context.h
> > index 652ce85f9410..820caf4e01b7 100644
> > --- a/arch/powerpc/include/asm/mmu_context.h
> > +++ b/arch/powerpc/include/asm/mmu_context.h
> > @@ -124,8 +124,19 @@ static inline bool need_extra_context(struct mm_struct 
> > *mm, unsigned long ea)
> >  
> >  #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
> >  extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
> > +void do_h_rpt_invalidate(unsigned long pid, unsigned long lpid,
> > +unsigned long type, unsigned long page_size,
> > +unsigned long psize, unsigned long start,
> > +unsigned long end);
> >  #else
> >  static inline void radix_kvm_prefetch_workaround(struct mm_struct *mm) { }
> > +static inline void do_h_rpt_invalidate(unsigned long pid,
> > +  unsigned long lpid,
> > +  unsigned long type,
> > +  unsigned long page_size,
> > +  unsigned long psize,
> > +  unsigned long start,
> > +  unsigned long end) { }
> >  #endif
> >  
> >  extern void switch_cop(struct mm_struct *next);
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 13bad6bf4c95..d83f006fc19d 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -921,6 +921,69 @@ static int kvmppc_get_yield_count(struct kvm_vcpu 
> > *vcpu)
> > return yield_count;
> >  }
> >  
> > +static void do_h_rpt_invalidate_prs(unsigned long pid, 

Re: [PATCH v5 1/3] powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to mmu_psize_def

2021-03-01 Thread Bharata B Rao
On Tue, Mar 02, 2021 at 12:28:34PM +1100, David Gibson wrote:
> On Wed, Feb 24, 2021 at 01:55:08PM +0530, Bharata B Rao wrote:
> > Add a field to mmu_psize_def to store the page size encodings
> > of H_RPT_INVALIDATE hcall. Initialize this while scanning the radix
> > AP encodings. This will be used when invalidating with required
> > page size encoding in the hcall.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu.h | 1 +
> >  arch/powerpc/mm/book3s64/radix_pgtable.c | 5 +
> >  2 files changed, 6 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
> > b/arch/powerpc/include/asm/book3s/64/mmu.h
> > index eace8c3f7b0a..c02f42d1031e 100644
> > --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> > @@ -19,6 +19,7 @@ struct mmu_psize_def {
> > int penc[MMU_PAGE_COUNT];   /* HPTE encoding */
> > unsigned int    tlbiel; /* tlbiel supported for that page size */
> > unsigned long   avpnm;  /* bits to mask out in AVPN in the HPTE */
> > +   unsigned long   h_rpt_pgsize; /* H_RPT_INVALIDATE page size encoding */
> > union {
> > unsigned long   sllp;   /* SLB L||LP (exact mask to use in 
> > slbmte) */
> > unsigned long ap;   /* Ap encoding used by PowerISA 3.0 */
> > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
> > b/arch/powerpc/mm/book3s64/radix_pgtable.c
> > index 98f0b243c1ab..1b749899016b 100644
> > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> > @@ -486,6 +486,7 @@ static int __init radix_dt_scan_page_sizes(unsigned 
> > long node,
> > def = &mmu_psize_defs[idx];
> > def->shift = shift;
> > def->ap  = ap;
> > +   def->h_rpt_pgsize = psize_to_rpti_pgsize(idx);
> > }
> >  
> > /* needed ? */
> > @@ -560,9 +561,13 @@ void __init radix__early_init_devtree(void)
> >  */
> > mmu_psize_defs[MMU_PAGE_4K].shift = 12;
> > mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
> > +   mmu_psize_defs[MMU_PAGE_4K].h_rpt_pgsize =
> > +   psize_to_rpti_pgsize(MMU_PAGE_4K);
> 
> Hm.  TBH, I was thinking of this as replacing psize_to_rpti_pgsize() -
> that is, you directly put the correct codes in there, then just have
> psize_to_rpti_pgsize() look them up in the table.
> 
> I guess that could be a followup change, though.
> 
> >  
> > mmu_psize_defs[MMU_PAGE_64K].shift = 16;
> > mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
> > +   mmu_psize_defs[MMU_PAGE_64K].h_rpt_pgsize =
> > +   psize_to_rpti_pgsize(MMU_PAGE_64K);

Hmm, if you see, I got rid of rpti_pgsize_to_psize() by having the
defines directly in mmu_psize_defs[].

There are two cases in the above code (radix__early_init_devtree):

1. If radix pagesize encodings are present in the DT, we walk
the page sizes in the loop and populate the encoding for
H_RPT_INVALIDATE. I am not sure if we can use the direct codes
in this case.

2. If DT doesn't have the radix pagesize encodings, 4K and 64K
sizes are assumed as fallback sizes where we can use direct
encodings.
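
For illustration, the table-driven conversion David suggests could
then become a simple scan (a sketch only, assuming h_rpt_pgsize has
been populated for every valid mmu_psize_defs[] entry by one of the
two cases above):

	static inline int rpti_pgsize_to_psize(unsigned long page_size)
	{
		int psize;

		for (psize = 0; psize < MMU_PAGE_COUNT; psize++)
			if (mmu_psize_defs[psize].h_rpt_pgsize == page_size)
				return psize;
		return MMU_PAGE_64K;	/* default, as in the open-coded helper */
	}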

Regards,
Bharata.


Re: [PATCH v5 2/3] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-03-01 Thread Bharata B Rao
On Wed, Feb 24, 2021 at 12:58:02PM -0300, Fabiano Rosas wrote:
> > @@ -1590,6 +1662,24 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
> > *vcpu)
> > if (!xics_on_xive())
> > kvmppc_xics_rm_complete(vcpu, 0);
> > break;
> > +   case BOOK3S_INTERRUPT_SYSCALL:
> > +   {
> > +   unsigned long req = kvmppc_get_gpr(vcpu, 3);
> > +
> > +   /*
> > +* The H_RPT_INVALIDATE hcalls issued by nested
> > +* guests for process scoped invalidations when
> > +* GTSE=0, are handled here in L0.
> > +*/
> > +   if (req == H_RPT_INVALIDATE) {
> > +   kvmppc_nested_rpt_invalidate(vcpu);
> > +   r = RESUME_GUEST;
> > +   break;
> > +   }
> 
> I'm inclined to say this is a bit too early. We're handling the hcall
> before kvmhv_run_single_vcpu has fully finished and we'll skip some
> code that has been running in all guest exits:
> 
>   if (trap) {
>   if (!nested)
>   r = kvmppc_handle_exit_hv(vcpu, current);
>   else
>   r = kvmppc_handle_nested_exit(vcpu);  <--- we're here
>   }
>   vcpu->arch.ret = r;
> 
> (...)
> 
>   vcpu->arch.ceded = 0;
> 
>   vc->vcore_state = VCORE_INACTIVE;
>   trace_kvmppc_run_core(vc, 1);
> 
>  done:
>   kvmppc_remove_runnable(vc, vcpu);
>   trace_kvmppc_run_vcpu_exit(vcpu);
> 
>   return vcpu->arch.ret;
> 
> Especially the kvmppc_remove_runnable function because it sets the
> vcpu state:
> 
>   vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
> 
> which should be the case if we're handling a hypercall.
> 
> I suggest we do similarly to the L1 exit code and defer the hcall
> handling until after kvmppc_run_single_vcpu has exited, still inside the
> is_kvmppc_resume_guest(r) loop.
> 
> So we'd set:
> case BOOK3S_INTERRUPT_SYSCALL:
>   vcpu->run->exit_reason = KVM_EXIT_PAPR_HCALL;
>   r = RESUME_HOST;
> break;
> 
> and perhaps introduce a new kvmppc_pseries_do_nested_hcall that's called
> after kvmppc_run_single_vcpu.

Yes, looks like we should, but I wasn't sure if an exit similar to the
L1 exit for hcall handling is needed here too, hence I took this approach.

Paul, could you please clarify?
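
For the sake of discussion, a rough sketch of the deferral being
suggested (kvmppc_pseries_do_nested_hcall() is only a proposed name
from the review, not an existing function, and the exact loop
placement is an assumption):

	do {
		r = kvmhv_run_single_vcpu(vcpu, time_limit, lpcr);
		/* Nested H_RPT_INVALIDATE deferred from the exit handler */
		if (r == RESUME_HOST &&
		    vcpu->run->exit_reason == KVM_EXIT_PAPR_HCALL &&
		    kvmppc_get_gpr(vcpu, 3) == H_RPT_INVALIDATE) {
			long ret = kvmppc_pseries_do_nested_hcall(vcpu);

			kvmppc_set_gpr(vcpu, 3, ret);
			r = RESUME_GUEST;
		}
	} while (is_kvmppc_resume_guest(r));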

Regards,
Bharata.


[PATCH v5 3/3] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2021-02-24 Thread Bharata B Rao
In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao 
Reviewed-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 +-
 arch/powerpc/kvm/book3s_hv_nested.c| 12 ++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index bb35490400e9..7ea5459022cb 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,19 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned 
long addr,
}
 
psi = shift_to_mmu_psize(pshift);
-   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-   lpid, rb);
+
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
+   lpid, rb);
+   } else {
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB,
+   psize_to_rpti_pgsize(psi),
+   addr, addr + psize);
+   }
+
if (rc)
pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +345,14 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, 
unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+   0, -1UL);
if (rc)
pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index ca43b2d38dce..2a6570e6c2c4 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -444,8 +445,15 @@ static void kvmhv_flush_lpid(unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+   H_RPTI_TYPE_PAT,
+   H_RPTI_PAGE_ALL, 0, -1UL);
if (rc)
pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.26.2



[PATCH v5 1/3] powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to mmu_psize_def

2021-02-24 Thread Bharata B Rao
Add a field to mmu_psize_def to store the page size encodings
of H_RPT_INVALIDATE hcall. Initialize this while scanning the radix
AP encodings. This will be used when invalidating with required
page size encoding in the hcall.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 1 +
 arch/powerpc/mm/book3s64/radix_pgtable.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index eace8c3f7b0a..c02f42d1031e 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -19,6 +19,7 @@ struct mmu_psize_def {
int penc[MMU_PAGE_COUNT];   /* HPTE encoding */
unsigned int    tlbiel; /* tlbiel supported for that page size */
unsigned long   avpnm;  /* bits to mask out in AVPN in the HPTE */
+   unsigned long   h_rpt_pgsize; /* H_RPT_INVALIDATE page size encoding */
union {
unsigned long   sllp;   /* SLB L||LP (exact mask to use in 
slbmte) */
unsigned long ap;   /* Ap encoding used by PowerISA 3.0 */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 98f0b243c1ab..1b749899016b 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -486,6 +486,7 @@ static int __init radix_dt_scan_page_sizes(unsigned long 
node,
def = &mmu_psize_defs[idx];
def->shift = shift;
def->ap  = ap;
+   def->h_rpt_pgsize = psize_to_rpti_pgsize(idx);
}
 
/* needed ? */
@@ -560,9 +561,13 @@ void __init radix__early_init_devtree(void)
 */
mmu_psize_defs[MMU_PAGE_4K].shift = 12;
mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
+   mmu_psize_defs[MMU_PAGE_4K].h_rpt_pgsize =
+   psize_to_rpti_pgsize(MMU_PAGE_4K);
 
mmu_psize_defs[MMU_PAGE_64K].shift = 16;
mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
+   mmu_psize_defs[MMU_PAGE_64K].h_rpt_pgsize =
+   psize_to_rpti_pgsize(MMU_PAGE_64K);
}
 
/*
-- 
2.26.2



[PATCH v5 2/3] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-02-24 Thread Bharata B Rao
Implement H_RPT_INVALIDATE hcall and add KVM capability
KVM_CAP_PPC_RPT_INVALIDATE to indicate the support for the same.

This hcall does two types of TLB invalidations:

1. Process-scoped invalidations for guests with LPCR[GTSE]=0.
   This is currently not used in KVM as GTSE is not usually
   disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
   behalf of an L2 guest. This replaces the uses of the existing
   hcall H_TLB_INVALIDATE.

In order to handle process scoped invalidations of L2, we
intercept the nested exit handling code in L0 only to handle
H_TLB_INVALIDATE hcall.

Process scoped tlbie invalidations from L1 and nested guests
need the RS register for the TLBIE instruction to contain both PID
and LPID.  This patch introduces primitives that execute the tlbie
instruction with both PID and LPID set in preparation for the
H_RPT_INVALIDATE hcall.

Signed-off-by: Bharata B Rao 
---
 Documentation/virt/kvm/api.rst|  18 +++
 .../include/asm/book3s/64/tlbflush-radix.h|   4 +
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/include/asm/mmu_context.h|  11 ++
 arch/powerpc/kvm/book3s_hv.c  |  90 +++
 arch/powerpc/kvm/book3s_hv_nested.c   |  77 +
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_tlb.c  | 147 +-
 include/uapi/linux/kvm.h  |   1 +
 9 files changed, 350 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 45fd862ac128..38ce3f21b21f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6225,6 +6225,24 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between 
them.
 This capability can be used to check / enable 2nd DAWR feature provided
 by POWER10 processor.
 
+7.23 KVM_CAP_PPC_RPT_INVALIDATE
+-------------------------------
+
+:Capability: KVM_CAP_PPC_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is enabled for hypervisors on platforms like POWER9
+that support radix MMU.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 8b33601cdb9d..a46fd37ad552 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 2f5f919f6cd3..a1515f94400e 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -305,6 +305,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 
dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
+unsigned long type, unsigned long pg_sizes,
+unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 652ce85f9410..820caf4e01b7 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -124,8 +124,19 @@ static inline bool need_extra_context(struct mm_struct 
*mm, unsigned long ea)
 
 #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
 extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
+void do_h_rpt_invalidate(unsigned long pid, unsigned long lpid,
+unsigned long type, unsigned long page_size,
+unsigned long psize, unsigned long start,
+unsigned long end);
 #else
 static inline void radix_kvm_prefetch_workaround(struct mm_struct *mm) { }
+static inline void do_h_rpt_invalidate(unsigned long pid,
+  unsigned long lpid,
+  unsigned long type,
+  unsigned long page_size,
+  unsigned long psize,
+ 

[PATCH v5 0/3] Support for H_RPT_INVALIDATE in PowerPC KVM

2021-02-24 Thread Bharata B Rao
This patchset adds support for the new hcall H_RPT_INVALIDATE
and replaces the nested tlb flush calls with this new hcall
if support for the same exists.

Changes in v5:
--------------
- Included the h_rpt_invalidate page size information within
  mmu_psize_defs[] as per David Gibson's suggestion.
- Redid nested exit changes as per Paul Mackerras' suggestion.
- Folded the patch that added tlbie primitives into the
  hcall implementation patch.

v4: 
https://lore.kernel.org/linuxppc-dev/20210215063542.3642366-1-bhar...@linux.ibm.com/T/#t

Bharata B Rao (3):
  powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to
mmu_psize_def
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst|  18 +++
 arch/powerpc/include/asm/book3s/64/mmu.h  |   1 +
 .../include/asm/book3s/64/tlbflush-radix.h|   4 +
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/include/asm/mmu_context.h|  11 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c|  27 +++-
 arch/powerpc/kvm/book3s_hv.c  |  90 +++
 arch/powerpc/kvm/book3s_hv_nested.c   |  89 ++-
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_pgtable.c  |   5 +
 arch/powerpc/mm/book3s64/radix_tlb.c  | 147 +-
 include/uapi/linux/kvm.h  |   1 +
 12 files changed, 388 insertions(+), 11 deletions(-)

-- 
2.26.2



Re: [PATCH v4 2/3] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-02-21 Thread Bharata B Rao
On Wed, Feb 17, 2021 at 11:38:07AM +1100, David Gibson wrote:
> On Mon, Feb 15, 2021 at 12:05:41PM +0530, Bharata B Rao wrote:
> > Implement H_RPT_INVALIDATE hcall and add KVM capability
> > KVM_CAP_PPC_RPT_INVALIDATE to indicate the support for the same.
> > 
> > This hcall does two types of TLB invalidations:
> > 
> > 1. Process-scoped invalidations for guests with LPCR[GTSE]=0.
> >This is currently not used in KVM as GTSE is not usually
> >disabled in KVM.
> > 2. Partition-scoped invalidations that an L1 hypervisor does on
> >behalf of an L2 guest. This replaces the uses of the existing
> >hcall H_TLB_INVALIDATE.
> > 
> > In order to handle process scoped invalidations of L2, we
> > intercept the nested exit handling code in L0 only to handle
> > H_TLB_INVALIDATE hcall.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  Documentation/virt/kvm/api.rst | 17 +
> >  arch/powerpc/include/asm/kvm_book3s.h  |  3 +
> >  arch/powerpc/include/asm/mmu_context.h | 11 +++
> >  arch/powerpc/kvm/book3s_hv.c   | 91 
> >  arch/powerpc/kvm/book3s_hv_nested.c| 96 ++
> >  arch/powerpc/kvm/powerpc.c |  3 +
> >  arch/powerpc/mm/book3s64/radix_tlb.c   | 25 +++
> >  include/uapi/linux/kvm.h   |  1 +
> >  8 files changed, 247 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 99ceb978c8b0..416c36aa35d4 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6038,6 +6038,23 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit 
> > notifications which user space
> >  can then handle to implement model specific MSR handling and/or user 
> > notifications
> >  to inform a user that an MSR was not handled.
> >  
> > +7.22 KVM_CAP_PPC_RPT_INVALIDATE
> > +-------------------------------
> > +
> > +:Capability: KVM_CAP_PPC_RPT_INVALIDATE
> > +:Architectures: ppc
> > +:Type: vm
> > +
> > +This capability indicates that the kernel is capable of handling
> > +H_RPT_INVALIDATE hcall.
> > +
> > +In order to enable the use of H_RPT_INVALIDATE in the guest,
> > +user space might have to advertise it for the guest. For example,
> > +IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
> > +present in the "ibm,hypertas-functions" device-tree property.
> > +
> > +This capability is always enabled.
> 
> I guess that means it's always enabled when it's available - I'm
> pretty sure it won't be enabled on POWER8 or on PR KVM.

Correct, will reword this and restrict this to POWER9, radix etc

> 
> > +
> >  8. Other capabilities.
> >  ==
> >  
> > diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> > b/arch/powerpc/include/asm/kvm_book3s.h
> > index d32ec9ae73bd..0f1c5fa6e8ce 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s.h
> > @@ -298,6 +298,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, 
> > u64 dw1);
> >  void kvmhv_release_all_nested(struct kvm *kvm);
> >  long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
> >  long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
> > +long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
> > +unsigned long type, unsigned long pg_sizes,
> > +unsigned long start, unsigned long end);
> >  int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
> >   u64 time_limit, unsigned long lpcr);
> >  void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
> > diff --git a/arch/powerpc/include/asm/mmu_context.h 
> > b/arch/powerpc/include/asm/mmu_context.h
> > index d5821834dba9..fbf3b5b45fe9 100644
> > --- a/arch/powerpc/include/asm/mmu_context.h
> > +++ b/arch/powerpc/include/asm/mmu_context.h
> > @@ -124,8 +124,19 @@ static inline bool need_extra_context(struct mm_struct 
> > *mm, unsigned long ea)
> >  
> >  #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
> >  extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
> > +void do_h_rpt_invalidate(unsigned long pid, unsigned long lpid,
> > +unsigned long type, unsigned long page_size,
> > +unsigned long psize, unsigned long start,
> > +unsigned long end);
> >  #else
> >  static inline void radix_kvm_prefetch

Re: [PATCH v4 1/3] powerpc/book3s64/radix/tlb: tlbie primitives for process-scoped invalidations from guests

2021-02-21 Thread Bharata B Rao
On Wed, Feb 17, 2021 at 11:24:48AM +1100, David Gibson wrote:
> On Mon, Feb 15, 2021 at 12:05:40PM +0530, Bharata B Rao wrote:
> > H_RPT_INVALIDATE hcall needs to perform process scoped tlbie
> > invalidations of L1 and nested guests from L0. This needs RS register
> > for TLBIE instruction to contain both PID and LPID. Introduce
> > primitives that execute tlbie instruction with both PID
> > and LPID set in prepartion for H_RPT_INVALIDATE hcall.
> > 
> > While we are here, move RIC_FLUSH definitions to header file
> > and introduce helper rpti_pgsize_to_psize() that will be needed
> > by the upcoming hcall.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
> >  arch/powerpc/mm/book3s64/radix_tlb.c  | 122 +-
> >  2 files changed, 136 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
> > b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> > index 94439e0cefc9..aace7e9b2397 100644
> > --- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> > +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
> > @@ -4,6 +4,10 @@
> >  
> >  #include 
> >  
> > +#define RIC_FLUSH_TLB 0
> > +#define RIC_FLUSH_PWC 1
> > +#define RIC_FLUSH_ALL 2
> > +
> >  struct vm_area_struct;
> >  struct mm_struct;
> >  struct mmu_gather;
> > @@ -21,6 +25,20 @@ static inline u64 psize_to_rpti_pgsize(unsigned long 
> > psize)
> > return H_RPTI_PAGE_ALL;
> >  }
> >  
> > +static inline int rpti_pgsize_to_psize(unsigned long page_size)
> > +{
> > +   if (page_size == H_RPTI_PAGE_4K)
> > +   return MMU_PAGE_4K;
> > +   if (page_size == H_RPTI_PAGE_64K)
> > +   return MMU_PAGE_64K;
> > +   if (page_size == H_RPTI_PAGE_2M)
> > +   return MMU_PAGE_2M;
> > +   if (page_size == H_RPTI_PAGE_1G)
> > +   return MMU_PAGE_1G;
> > +   else
> > +   return MMU_PAGE_64K; /* Default */
> > +}
> 
> Would it make sense to put the H_RPT_PAGE_ tags into the
> mmu_psize_defs table and scan that here, rather than open coding the
> conversion?

I will give this a try and see how it looks.

Otherwise, do the changes in the patch look right? They are mainly
about introducing primitives that set both PID and LPID for the
tlbie instruction.
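
For reference, the RS encoding those primitives rely on packs the PID
into IBM bits 0:31 (the upper word) and the LPID into bits 32:63; an
illustrative helper (not part of the patch) would be:

	static inline unsigned long tlbie_rs_pid_lpid(unsigned long pid,
						      unsigned long lpid)
	{
		return (pid << 32) | (lpid & 0xffffffffUL);
	}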

Regards,
Bharata.


[PATCH v4 2/3] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-02-14 Thread Bharata B Rao
Implement H_RPT_INVALIDATE hcall and add KVM capability
KVM_CAP_PPC_RPT_INVALIDATE to indicate the support for the same.

This hcall does two types of TLB invalidations:

1. Process-scoped invalidations for guests with LPCR[GTSE]=0.
   This is currently not used in KVM as GTSE is not usually
   disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
   behalf of an L2 guest. This replaces the uses of the existing
   hcall H_TLB_INVALIDATE.

In order to handle process scoped invalidations of L2, we
intercept the nested exit handling code in L0 only to handle
H_TLB_INVALIDATE hcall.

Signed-off-by: Bharata B Rao 
---
 Documentation/virt/kvm/api.rst | 17 +
 arch/powerpc/include/asm/kvm_book3s.h  |  3 +
 arch/powerpc/include/asm/mmu_context.h | 11 +++
 arch/powerpc/kvm/book3s_hv.c   | 91 
 arch/powerpc/kvm/book3s_hv_nested.c| 96 ++
 arch/powerpc/kvm/powerpc.c |  3 +
 arch/powerpc/mm/book3s64/radix_tlb.c   | 25 +++
 include/uapi/linux/kvm.h   |  1 +
 8 files changed, 247 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 99ceb978c8b0..416c36aa35d4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6038,6 +6038,23 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit 
notifications which user space
 can then handle to implement model specific MSR handling and/or user 
notifications
 to inform a user that an MSR was not handled.
 
+7.22 KVM_CAP_PPC_RPT_INVALIDATE
+-------------------------------
+
+:Capability: KVM_CAP_PPC_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is always enabled.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index d32ec9ae73bd..0f1c5fa6e8ce 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -298,6 +298,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 
dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
+unsigned long type, unsigned long pg_sizes,
+unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index d5821834dba9..fbf3b5b45fe9 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -124,8 +124,19 @@ static inline bool need_extra_context(struct mm_struct 
*mm, unsigned long ea)
 
 #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
 extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
+void do_h_rpt_invalidate(unsigned long pid, unsigned long lpid,
+unsigned long type, unsigned long page_size,
+unsigned long psize, unsigned long start,
+unsigned long end);
 #else
 static inline void radix_kvm_prefetch_workaround(struct mm_struct *mm) { }
+static inline void do_h_rpt_invalidate(unsigned long pid,
+  unsigned long lpid,
+  unsigned long type,
+  unsigned long page_size,
+  unsigned long psize,
+  unsigned long start,
+  unsigned long end) { }
 #endif
 
 extern void switch_cop(struct mm_struct *next);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6f612d240392..802cb77c39cc 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -904,6 +904,64 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
return yield_count;
 }
 
+static void do_h_rpt_invalidate_prs(unsigned long pid, unsigned long lpid,
+   unsigned long type, unsigned long pg_sizes,
+   unsigned long start, unsigned long end)
+{
+   unsigned long psize;
+
+   if (pg_sizes & H_RPTI_PAGE_64K) {
+   psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_64K);
+

[PATCH v4 1/3] powerpc/book3s64/radix/tlb: tlbie primitives for process-scoped invalidations from guests

2021-02-14 Thread Bharata B Rao
H_RPT_INVALIDATE hcall needs to perform process scoped tlbie
invalidations of L1 and nested guests from L0. This needs the RS register
for the TLBIE instruction to contain both PID and LPID. Introduce
primitives that execute the tlbie instruction with both PID
and LPID set in preparation for the H_RPT_INVALIDATE hcall.

While we are here, move RIC_FLUSH definitions to header file
and introduce helper rpti_pgsize_to_psize() that will be needed
by the upcoming hcall.

Signed-off-by: Bharata B Rao 
---
 .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
 arch/powerpc/mm/book3s64/radix_tlb.c  | 122 +-
 2 files changed, 136 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 94439e0cefc9..aace7e9b2397 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
@@ -21,6 +25,20 @@ static inline u64 psize_to_rpti_pgsize(unsigned long psize)
return H_RPTI_PAGE_ALL;
 }
 
+static inline int rpti_pgsize_to_psize(unsigned long page_size)
+{
+   if (page_size == H_RPTI_PAGE_4K)
+   return MMU_PAGE_4K;
+   if (page_size == H_RPTI_PAGE_64K)
+   return MMU_PAGE_64K;
+   if (page_size == H_RPTI_PAGE_2M)
+   return MMU_PAGE_2M;
+   if (page_size == H_RPTI_PAGE_1G)
+   return MMU_PAGE_1G;
+   else
+   return MMU_PAGE_64K; /* Default */
+}
+
 static inline int mmu_get_ap(int psize)
 {
return mmu_psize_defs[psize].ap;
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c 
b/arch/powerpc/mm/book3s64/radix_tlb.c
index fb66d154b26c..097402435303 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -18,10 +18,6 @@
 #include 
 #include 
 
-#define RIC_FLUSH_TLB 0
-#define RIC_FLUSH_PWC 1
-#define RIC_FLUSH_ALL 2
-
 /*
  * tlbiel instruction for radix, set invalidation
  * i.e., r=1 and is=01 or is=10 or is=11
@@ -128,6 +124,21 @@ static __always_inline void __tlbie_pid(unsigned long pid, 
unsigned long ric)
trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }
 
+static __always_inline void __tlbie_pid_lpid(unsigned long pid,
+unsigned long lpid,
+unsigned long ric)
+{
+   unsigned long rb, rs, prs, r;
+
+   rb = PPC_BIT(53); /* IS = 1 */
+   rs = (pid << PPC_BITLSHIFT(31)) | (lpid & ~(PPC_BITMASK(0, 31)));
+   prs = 1; /* process scoped */
+   r = 1;   /* radix format */
+
+   asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+: : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
+   trace_tlbie(0, 0, rb, rs, ric, prs, r);
+}
 static __always_inline void __tlbie_lpid(unsigned long lpid, unsigned long ric)
 {
unsigned long rb,rs,prs,r;
@@ -188,6 +199,23 @@ static __always_inline void __tlbie_va(unsigned long va, 
unsigned long pid,
trace_tlbie(0, 0, rb, rs, ric, prs, r);
 }
 
+static __always_inline void __tlbie_va_lpid(unsigned long va, unsigned long pid,
+   unsigned long lpid,
+   unsigned long ap, unsigned long ric)
+{
+   unsigned long rb, rs, prs, r;
+
+   rb = va & ~(PPC_BITMASK(52, 63));
+   rb |= ap << PPC_BITLSHIFT(58);
+   rs = (pid << PPC_BITLSHIFT(31)) | (lpid & ~(PPC_BITMASK(0, 31)));
+   prs = 1; /* process scoped */
+   r = 1;   /* radix format */
+
+   asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+: : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
+   trace_tlbie(0, 0, rb, rs, ric, prs, r);
+}
+
 static __always_inline void __tlbie_lpid_va(unsigned long va, unsigned long lpid,
unsigned long ap, unsigned long ric)
 {
@@ -233,6 +261,22 @@ static inline void fixup_tlbie_va_range(unsigned long va, 
unsigned long pid,
}
 }
 
+static inline void fixup_tlbie_va_range_lpid(unsigned long va,
+unsigned long pid,
+unsigned long lpid,
+unsigned long ap)
+{
+   if (cpu_has_feature(CPU_FTR_P9_TLBIE_ERAT_BUG)) {
+   asm volatile("ptesync" : : : "memory");
+   __tlbie_pid_lpid(0, lpid, RIC_FLUSH_TLB);
+   }
+
+   if (cpu_has_feature(CPU_FTR_P9_TLBIE_STQ_BUG)) {
+   asm volatile("ptesync" : : : "memory");
+   __tlbie_va_lpid(va

[PATCH v4 0/3] Support for H_RPT_INVALIDATE in PowerPC KVM

2021-02-14 Thread Bharata B Rao
This patchset adds support for the new hcall H_RPT_INVALIDATE
and replaces the nested tlb flush calls with this new hcall
if support for the same exists.

Changes in v4:
--------------
- While reusing the tlb flush routines from radix_tlb.c in v3,
  setting of LPID got missed out. Take care of this by
  introducing new flush routines that set both PID and LPID
  when using tlbie instruction. This is required for
  process-scoped invalidations from guests (both L1 and
  nested guests). Added a new patch 1/3 for this.
- Added code to handle H_RPT_INVALIDATE hcall issued
  by nested guest in L0 nested guest exit path.

v3: 
https://lore.kernel.org/linuxppc-dev/20210105090557.2150104-1-bhar...@linux.ibm.com/T/#t

Bharata B Rao (3):
  powerpc/book3s64/radix/tlb: tlbie primitives for process-scoped
invalidations from guests
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst|  17 ++
 .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/include/asm/mmu_context.h|  11 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c|  27 +++-
 arch/powerpc/kvm/book3s_hv.c  |  91 +++
 arch/powerpc/kvm/book3s_hv_nested.c   | 108 -
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_tlb.c  | 147 +-
 include/uapi/linux/kvm.h  |   1 +
 10 files changed, 415 insertions(+), 11 deletions(-)

-- 
2.26.2



[PATCH v4 3/3] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2021-02-14 Thread Bharata B Rao
In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao 
Reviewed-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 +-
 arch/powerpc/kvm/book3s_hv_nested.c| 12 ++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index bb35490400e9..7ea5459022cb 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,19 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned 
long addr,
}
 
psi = shift_to_mmu_psize(pshift);
-   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-   lpid, rb);
+
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
+   lpid, rb);
+   } else {
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB,
+   psize_to_rpti_pgsize(psi),
+   addr, addr + psize);
+   }
+
if (rc)
pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +345,14 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, 
unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+   0, -1UL);
if (rc)
pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 40ed4eb80adb..0ebddb615684 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -402,8 +403,15 @@ static void kvmhv_flush_lpid(unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+   H_RPTI_TYPE_PAT,
+   H_RPTI_PAGE_ALL, 0, -1UL);
if (rc)
pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.26.2



Re: [PATCH v3 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-01-06 Thread Bharata B Rao
On Wed, Jan 06, 2021 at 05:27:27PM -0300, Fabiano Rosas wrote:
> Bharata B Rao  writes:
> > +
> > +long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
> > +unsigned long type, unsigned long pg_sizes,
> > +unsigned long start, unsigned long end)
> > +{
> > +   struct kvm_nested_guest *gp;
> > +   long ret;
> > +   unsigned long psize, ap;
> > +
> > +   /*
> > +* If L2 lpid isn't valid, we need to return H_PARAMETER.
> > +*
> > +* However, nested KVM issues a L2 lpid flush call when creating
> > +* partition table entries for L2. This happens even before the
> > +* corresponding shadow lpid is created in HV which happens in
> > +* H_ENTER_NESTED call. Since we can't differentiate this case from
> > +* the invalid case, we ignore such flush requests and return success.
> > +*/
> 
> So for a nested lpid the H_TLB_INVALIDATE in:
> 
> kvmppc_core_init_vm_hv -> kvmppc_setup_partition_table ->
> kvmhv_set_ptbl_entry -> kvmhv_flush_lpid
> 
> has always been a noop? It seems that we could just skip
> kvmhv_flush_lpid in L1 during init_vm then.

Maybe, but I suppose that flush is required and could be fixed
eventually.

Regards,
Bharata.


[PATCH v3 2/2] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2021-01-05 Thread Bharata B Rao
In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao 
Reviewed-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 +-
 arch/powerpc/kvm/book3s_hv_nested.c| 12 ++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index bb35490400e9..7ea5459022cb 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,19 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned 
long addr,
}
 
psi = shift_to_mmu_psize(pshift);
-   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-   lpid, rb);
+
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
+   lpid, rb);
+   } else {
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB,
+   psize_to_rpti_pgsize(psi),
+   addr, addr + psize);
+   }
+
if (rc)
pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +345,14 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, 
unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+   0, -1UL);
if (rc)
pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 40ed4eb80adb..0ebddb615684 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -402,8 +403,15 @@ static void kvmhv_flush_lpid(unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+   H_RPTI_TYPE_PAT,
+   H_RPTI_PAGE_ALL, 0, -1UL);
if (rc)
pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.26.2



[PATCH v3 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2021-01-05 Thread Bharata B Rao
Implement H_RPT_INVALIDATE hcall and add KVM capability
KVM_CAP_PPC_RPT_INVALIDATE to indicate the support for the same.

This hcall does two types of TLB invalidations:

1. Process-scoped invalidations for guests with LPCR[GTSE]=0.
   This is currently not used in KVM as GTSE is not usually
   disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
   behalf of an L2 guest. This replaces the uses of the existing
   hcall H_TLB_INVALIDATE.

Signed-off-by: Bharata B Rao 
---
 Documentation/virt/kvm/api.rst| 17 
 .../include/asm/book3s/64/tlbflush-radix.h| 18 
 arch/powerpc/include/asm/kvm_book3s.h |  3 +
 arch/powerpc/include/asm/mmu_context.h|  7 ++
 arch/powerpc/kvm/book3s_hv.c  | 56 +++
 arch/powerpc/kvm/book3s_hv_nested.c   | 96 +++
 arch/powerpc/kvm/powerpc.c|  3 +
 arch/powerpc/mm/book3s64/radix_tlb.c  | 24 -
 include/uapi/linux/kvm.h  |  1 +
 9 files changed, 221 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 70254eaa5229..acf1fb4a0e1d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6032,6 +6032,23 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit 
notifications which user space
 can then handle to implement model specific MSR handling and/or user 
notifications
 to inform a user that an MSR was not handled.
 
+7.22 KVM_CAP_PPC_RPT_INVALIDATE
+-------------------------------
+
+:Capability: KVM_CAP_PPC_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is always enabled.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 94439e0cefc9..aace7e9b2397 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
@@ -21,6 +25,20 @@ static inline u64 psize_to_rpti_pgsize(unsigned long psize)
return H_RPTI_PAGE_ALL;
 }
 
+static inline int rpti_pgsize_to_psize(unsigned long page_size)
+{
+   if (page_size == H_RPTI_PAGE_4K)
+   return MMU_PAGE_4K;
+   if (page_size == H_RPTI_PAGE_64K)
+   return MMU_PAGE_64K;
+   if (page_size == H_RPTI_PAGE_2M)
+   return MMU_PAGE_2M;
+   if (page_size == H_RPTI_PAGE_1G)
+   return MMU_PAGE_1G;
+   else
+   return MMU_PAGE_64K; /* Default */
+}
+
 static inline int mmu_get_ap(int psize)
 {
return mmu_psize_defs[psize].ap;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index d32ec9ae73bd..0f1c5fa6e8ce 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -298,6 +298,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 
dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
+unsigned long type, unsigned long pg_sizes,
+unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index d5821834dba9..17b2995a55ed 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -124,8 +124,15 @@ static inline bool need_extra_context(struct mm_struct 
*mm, unsigned long ea)
 
 #if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE) && defined(CONFIG_PPC_RADIX_MMU)
 extern void radix_kvm_prefetch_workaround(struct mm_struct *mm);
+void do_h_rpt_invalidate(unsigned long pid, unsigned long type,
+unsigned long page_size, unsigned long psize,
+unsigned long start, unsigned long end);
 #else
 static inline void radix_kvm_prefetch_workaround(struct mm_struct *mm) { }
+static inline void do_h_rpt_invalidate(unsigned long pid, unsigned long type,
+  unsigned long page_size,
+   

[PATCH v3 0/2] Support for H_RPT_INVALIDATE in PowerPC KVM

2021-01-05 Thread Bharata B Rao
This patchset adds support for the new hcall H_RPT_INVALIDATE
and replaces the nested tlb flush calls with this new hcall
if support for the same exists.

Changes in v3:
--------------
- Reused the tlb flush routines from radix_tlb.c
- Ensured fixups are taken care of due to the above change.
 
v2: 
https://lore.kernel.org/linuxppc-dev/2020121705.gd310...@yekko.fritz.box/T/#t

H_RPT_INVALIDATE

Syntax:
int64   /* H_Success: Return code on successful completion */
    /* H_Busy - repeat the call with the same */
    /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
    hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation lookaside information */
  uint64 pid,   /* PID/LPID to invalidate */
  uint64 target,    /* Invalidation target */
  uint64 type,  /* Type of lookaside information */
  uint64 pageSizes, /* Page sizes */
  uint64 start, /* Start of Effective Address (EA) range (inclusive) */
  uint64 end)   /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU    0x01 /* All virtual processors in the partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU    0x04 /* All nest/accelerator agents in use by the partition */

A combination of the above can be specified, except core and core local.

Type of translation to invalidate (type)
----------------------------------------
NESTED   0x0001  /* Invalidate nested guest partition-scope */
TLB  0x0002  /* Invalidate TLB */
PWC  0x0004  /* Invalidate Page Walk Cache */
PRT  0x0008  /* Invalidate Process Table Entries if NESTED is clear */
PAT  0x0008  /* Invalidate Partition Table Entries if NESTED is set */

A combination of the above can be specified.

Page size mask (pageSizes)
--------------------------
4K  0x01
64K 0x02
2M  0x04
1G  0x08
All sizes   (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
   matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
  different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and end
  should be valid quadrant addresses and end > start.
* Return H_NotSupported if the partition is not running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid addresses.
  Else start and end should be aligned to 4kB (lower 12 bits clear).
* If NESTED is clear, then invalidate process scoped lookaside information.
  Else pid specifies a nested LPID, and the invalidation is performed
  on nested guest partition table and nested guest partition scope real
  addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
  quadrant 0 spaces, Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside invalidation
  range, which allows a caller to optimally invalidate ranges that may
  contain mixed page sizes.
* Return H_SUCCESS on success.
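
As a usage sketch (not part of the patchset), a radix pseries guest
with LPCR[GTSE]=0 could flush all process-scoped TLB entries of a PID
using the encodings above; plpar_hcall_norets() is the standard
pseries hcall wrapper:

	long rc;

	rc = plpar_hcall_norets(H_RPT_INVALIDATE, pid,
				H_RPTI_TARGET_CMMU,	/* all vCPUs */
				H_RPTI_TYPE_TLB,	/* TLB entries only */
				H_RPTI_PAGE_ALL,	/* all page sizes */
				0, -1UL);		/* full EA range */
	if (rc != H_SUCCESS)
		pr_err("H_RPT_INVALIDATE failed, rc=%ld\n", rc);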

Bharata B Rao (2):
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst|  17 +++
 .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/include/asm/mmu_context.h|   7 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c|  27 -
 arch/powerpc/kvm/book3s_hv.c  |  56 +
 arch/powerpc/kvm/book3s_hv_nested.c   | 108 +-
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_tlb.c  |  24 +++-
 include/uapi/linux/kvm.h  |   1 +
 10 files changed, 253 insertions(+), 11 deletions(-)

-- 
2.26.2



Re: [PATCH v2 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2020-12-17 Thread Bharata B Rao
On Thu, Dec 17, 2020 at 02:42:15PM +1100, David Gibson wrote:
> On Wed, Dec 16, 2020 at 02:24:46PM +0530, Bharata B Rao wrote:
> > +static void do_tlb_invalidate(unsigned long rs, unsigned long target,
> > + unsigned long type, unsigned long page_size,
> > + unsigned long ap, unsigned long start,
> > + unsigned long end)
> > +{
> > +   unsigned long rb;
> > +   unsigned long addr = start;
> > +
> > +   if ((type & H_RPTI_TYPE_ALL) == H_RPTI_TYPE_ALL) {
> > +   rb = PPC_BIT(53); /* IS = 1 */
> > +   do_tlb_invalidate_all(rb, rs);
> > +   return;
> > +   }
> > +
> > +   if (type & H_RPTI_TYPE_PWC) {
> > +   rb = PPC_BIT(53); /* IS = 1 */
> > +   do_tlb_invalidate_pwc(rb, rs);
> > +   }
> > +
> > +   if (!addr && end == -1) { /* PID */
> > +   rb = PPC_BIT(53); /* IS = 1 */
> > +   do_tlb_invalidate_tlb(rb, rs);
> > +   } else { /* EA */
> > +   do {
> > +   rb = addr & ~(PPC_BITMASK(52, 63));
> > +   rb |= ap << PPC_BITLSHIFT(58);
> > +   do_tlb_invalidate_tlb(rb, rs);
> > +   addr += page_size;
> > +   } while (addr < end);
> > +   }
> > +}
> > +
> > +static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
> > +   unsigned long pid, unsigned long target,
> > +   unsigned long type, unsigned long pg_sizes,
> > +   unsigned long start, unsigned long end)
> > +{
> > +   unsigned long rs, ap, psize;
> > +
> > +   if (!kvm_is_radix(vcpu->kvm))
> > +   return H_FUNCTION;
> 
> IIUC the cover note said this case was H_NOT_SUPPORTED, rather than
> H_FUNCTION.
> 
> > +
> > +   if (end < start)
> > +   return H_P5;
> > +
> > +   if (type & H_RPTI_TYPE_NESTED) {
> > +   if (!nesting_enabled(vcpu->kvm))
> > +   return H_FUNCTION;
> 
> Likewise, I'm not sure that H_FUNCTION is the right choice here.

Yes to both, will switch to H_NOT_SUPPORTED in the next iteration.

> 
> > +
> > +   /* Support only cores as target */
> > +   if (target != H_RPTI_TARGET_CMMU)
> > +   return H_P2;
> > +
> > +   return kvmhv_h_rpti_nested(vcpu, pid,
> > +  (type & ~H_RPTI_TYPE_NESTED),
> > +   pg_sizes, start, end);
> > +   }
> > +
> > +   rs = pid << PPC_BITLSHIFT(31);
> > +   rs |= vcpu->kvm->arch.lpid;
> > +
> > +   if (pg_sizes & H_RPTI_PAGE_64K) {
> > +   psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_64K);
> > +   ap = mmu_get_ap(psize);
> > +   do_tlb_invalidate(rs, target, type, (1UL << 16), ap, start,
> > + end);
> 
> Should these be conditional on the TLB flag in type?

Didn't quite get you. Do you mean that depending on the type flag
we may not need to do invalidations for different page sizes
separately?

Regards,
Bharata.


Re: [PATCH v2 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2020-12-17 Thread Bharata B Rao
On Wed, Dec 16, 2020 at 07:47:29PM -0300, Fabiano Rosas wrote:
> > +static void do_tlb_invalidate(unsigned long rs, unsigned long target,
> > + unsigned long type, unsigned long page_size,
> > + unsigned long ap, unsigned long start,
> > + unsigned long end)
> > +{
> > +   unsigned long rb;
> > +   unsigned long addr = start;
> > +
> > +   if ((type & H_RPTI_TYPE_ALL) == H_RPTI_TYPE_ALL) {
> > +   rb = PPC_BIT(53); /* IS = 1 */
> > +   do_tlb_invalidate_all(rb, rs);
> > +   return;
> > +   }
> > +
> > +   if (type & H_RPTI_TYPE_PWC) {
> > +   rb = PPC_BIT(53); /* IS = 1 */
> > +   do_tlb_invalidate_pwc(rb, rs);
> > +   }
> > +
> > +   if (!addr && end == -1) { /* PID */
> > +   rb = PPC_BIT(53); /* IS = 1 */
> > +   do_tlb_invalidate_tlb(rb, rs);
> > +   } else { /* EA */
> > +   do {
> > +   rb = addr & ~(PPC_BITMASK(52, 63));
> > +   rb |= ap << PPC_BITLSHIFT(58);
> > +   do_tlb_invalidate_tlb(rb, rs);
> > +   addr += page_size;
> > +   } while (addr < end);
> > +   }
> > +}
> 
> This is all quite similar to _tlbie_pid in mm/book3s64/radix_tlb.c so:
> 
> 1) Shouldn't do_tlb_invalidate be in that file so we could reuse
> __tlbie_pid and __tlbie_va? There are also the tracepoints in that file
> that we might want to reuse.

Will see how much reuse is possible.

> 
> 2) For my own understanding, don't the "fixups" in _tlbie_pid apply to
> this scenario as well?

Yes, I think, will add fixups.

> > +long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
> > +unsigned long type, unsigned long pg_sizes,
> > +unsigned long start, unsigned long end)
> > +{
> > +   struct kvm_nested_guest *gp;
> > +   long ret;
> > +   unsigned long psize, ap;
> > +
> > +   /*
> > +* If L2 lpid isn't valid, we need to return H_PARAMETER.
> > +* Nested KVM issues a L2 lpid flush call when creating
> > +* partition table entries for L2. This happens even before
> > +* the corresponding shadow lpid is created in HV. Until
> > +* this is fixed, ignore such flush requests.
> 
> From the text, it seems that you are talking about kvmhv_set_ptbl_entry
> in L1 calling kvmhv_flush_lpid, but I'm not sure. Could you clarify that
> scenario a bit?

Yes, this is the scenario I am talking about here.

> 
> Maybe it would be good to have a more concrete hint of the issue here or
> in the commit message, since you mentioned this is something that needs
> fixing.

Hmm let me see if I can make the comment more verbose/concrete in the
next version.

Thanks for your review.

Regards,
Bharata.


[PATCH v2 0/2] Support for H_RPT_INVALIDATE in PowerPC KVM

2020-12-16 Thread Bharata B Rao
This patchset adds support for the new hcall H_RPT_INVALIDATE
and replaces the nested tlb flush calls with this new hcall
if support for the same exists.

Changes in v2:
--------------
- Not enabling the hcall by default now, userspace can enable it when
  required.
- Added implementation for process-scoped invalidations in the hcall.

v1: 
https://lore.kernel.org/linuxppc-dev/20201019112642.53016-1-bhar...@linux.ibm.com/T/#t

H_RPT_INVALIDATE

Syntax:
int64   /* H_Success: Return code on successful completion */
    /* H_Busy - repeat the call with the same */
    /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
    hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation lookaside information */
  uint64 pid,   /* PID/LPID to invalidate */
  uint64 target,    /* Invalidation target */
  uint64 type,  /* Type of lookaside information */
  uint64 pageSizes, /* Page sizes */
  uint64 start, /* Start of Effective Address (EA) range (inclusive) */
  uint64 end)   /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU    0x01 /* All virtual processors in the partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU    0x04 /* All nest/accelerator agents in use by the partition */

A combination of the above can be specified, except core and core local.

Type of translation to invalidate (type)
----------------------------------------
NESTED   0x0001  /* Invalidate nested guest partition-scope */
TLB  0x0002  /* Invalidate TLB */
PWC  0x0004  /* Invalidate Page Walk Cache */
PRT  0x0008  /* Invalidate Process Table Entries if NESTED is clear */
PAT  0x0008  /* Invalidate Partition Table Entries if NESTED is set */

A combination of the above can be specified.

Page size mask (pageSizes)
--------------------------
4K  0x01
64K 0x02
2M  0x04
1G  0x08
All sizes   (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
   matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
  different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition.
* Return H_P5 if (start, end) doesn't form a valid range. Start and end
  should be valid quadrant addresses and end > start.
* Return H_NotSupported if the partition is not running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid addresses.
  Else start and end should be aligned to 4kB (lower 12 bits clear).
* If NESTED is clear, then invalidate process-scoped lookaside information.
  Else pid specifies a nested LPID, and the invalidation is performed
  on the nested guest partition table and nested guest partition-scoped
  real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
  quadrant 0 spaces; else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside invalidation
  range, which allows a caller to optimally invalidate ranges that may
  contain mixed page sizes.
* Return H_SUCCESS on success.
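
To make the argument encoding concrete: a guest-side call boils down
to a single hcall. do_rpt_invalidate() below is a hypothetical wrapper
written only to illustrate the parameter layout (the in-tree helper is
pseries_rpt_invalidate(); the H_RPTI_* names are the ones used in the
patches below):

static inline long do_rpt_invalidate(u64 pid, u64 target, u64 type,
				     u64 pg_sizes, u64 start, u64 end)
{
	/* The six parameters map 1:1 onto the hcall arguments above */
	return plpar_hcall_norets(H_RPT_INVALIDATE, pid, target, type,
				  pg_sizes, start, end);
}

For example, flushing the TLB and PWC for one PID on all cores of the
partition, over the whole address space, would be:

	do_rpt_invalidate(pid, H_RPTI_TARGET_CMMU,
			  H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC,
			  H_RPTI_PAGE_ALL, 0, -1UL);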

Bharata B Rao (2):
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst|  17 +++
 .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c|  27 +++-
 arch/powerpc/kvm/book3s_hv.c  | 121 ++
 arch/powerpc/kvm/book3s_hv_nested.c   | 106 ++-
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_tlb.c  |   4 -
 include/uapi/linux/kvm.h  |   1 +
 9 files changed, 289 insertions(+), 11 deletions(-)

-- 
2.26.2



[PATCH v2 2/2] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2020-12-16 Thread Bharata B Rao
In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 +-
 arch/powerpc/kvm/book3s_hv_nested.c| 12 ++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index bb35490400e9..7ea5459022cb 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,19 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned 
long addr,
}
 
psi = shift_to_mmu_psize(pshift);
-   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-   lpid, rb);
+
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 
1),
+   lpid, rb);
+   } else {
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB,
+   psize_to_rpti_pgsize(psi),
+   addr, addr + psize);
+   }
+
if (rc)
pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +345,14 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, 
unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 
1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+   0, -1UL);
if (rc)
pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index a54ba4b1d4a7..9dc694288757 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -402,8 +403,15 @@ static void kvmhv_flush_lpid(unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 
1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+   H_RPTI_TYPE_PAT,
+   H_RPTI_PAGE_ALL, 0, -1UL);
if (rc)
pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.26.2



[PATCH v2 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

2020-12-16 Thread Bharata B Rao
Implement H_RPT_INVALIDATE hcall and add KVM capability
KVM_CAP_PPC_RPT_INVALIDATE to indicate the support for the same.

This hcall does two types of TLB invalidations:

1. Process-scoped invalidations for guests with LPCR[GTSE]=0.
   This is currently not used in KVM as GTSE is not usually
   disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
   behalf of an L2 guest. This replaces the uses of the existing
   hcall H_TLB_INVALIDATE.

Signed-off-by: Bharata B Rao 
---
 Documentation/virt/kvm/api.rst|  17 +++
 .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/kvm/book3s_hv.c  | 121 ++
 arch/powerpc/kvm/book3s_hv_nested.c   |  94 ++
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_tlb.c  |   4 -
 include/uapi/linux/kvm.h  |   1 +
 8 files changed, 257 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e00a66d72372..5ce237c0d707 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6014,6 +6014,23 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit 
notifications which user space
 can then handle to implement model specific MSR handling and/or user 
notifications
 to inform a user that an MSR was not handled.
 
+7.22 KVM_CAP_PPC_RPT_INVALIDATE
+-------------------------------
+
+:Capability: KVM_CAP_PPC_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is always enabled.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 94439e0cefc9..aace7e9b2397 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
@@ -21,6 +25,20 @@ static inline u64 psize_to_rpti_pgsize(unsigned long psize)
return H_RPTI_PAGE_ALL;
 }
 
+static inline int rpti_pgsize_to_psize(unsigned long page_size)
+{
+   if (page_size == H_RPTI_PAGE_4K)
+   return MMU_PAGE_4K;
+   if (page_size == H_RPTI_PAGE_64K)
+   return MMU_PAGE_64K;
+   if (page_size == H_RPTI_PAGE_2M)
+   return MMU_PAGE_2M;
+   if (page_size == H_RPTI_PAGE_1G)
+   return MMU_PAGE_1G;
+   else
+   return MMU_PAGE_64K; /* Default */
+}
+
 static inline int mmu_get_ap(int psize)
 {
return mmu_psize_defs[psize].ap;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index d32ec9ae73bd..0f1c5fa6e8ce 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -298,6 +298,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 
dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
+unsigned long type, unsigned long pg_sizes,
+unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index e3b1839fc251..adf2d1191581 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -904,6 +904,118 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
return yield_count;
 }
 
+static inline void do_tlb_invalidate_all(unsigned long rb, unsigned long rs)
+{
+   asm volatile("ptesync" : : : "memory");
+   asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+   : : "r"(rb), "i"(1), "i"(1), "i"(RIC_FLUSH_ALL), "r"(rs)
+   : "memory");
+   asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+}
+
+static inline void do_tlb_invalidate_pwc(unsigned long rb, unsigned long rs)
+{
+   asm volatile("ptesync" : : : "memory");
+   asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+   : : "r"(rb), "i"(1), "i"(1), "i"(RIC_FLUSH_PWC), "r"(rs)
+   : "memory");
+   asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+}

Re: [PATCH v1 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE (nested case only)

2020-12-11 Thread Bharata B Rao
On Mon, Oct 19, 2020 at 04:56:41PM +0530, Bharata B Rao wrote:
> Implements H_RPT_INVALIDATE hcall and supports only nested case
> currently.
> 
> A KVM capability KVM_CAP_RPT_INVALIDATE is added to indicate the
> support for this hcall.

As Paul mentioned in the thread, this hcall does both process-scoped
invalidations and partition-scoped invalidations for the L2 guest.
I am adding the KVM_CAP_RPT_INVALIDATE capability with only
partition-scoped invalidations (the nested case) implemented in the
hcall, since we don't see the need for KVM to implement the
process-scoped invalidation function: KVM may never run with
LPCR[GTSE]=0.

I am wondering if enabling the capability with only a partial
implementation of the hcall is the correct thing to do. In the future,
if we ever want process-scoped invalidation support in this hcall,
we may not be able to differentiate the availability of the two
functions cleanly from QEMU.

So does it make sense to implement the process-scoped invalidation
function now as well, even if it is not going to be used in
KVM?

Regards,
Bharata.


Re: [PATCH v1 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE (nested case only)

2020-12-09 Thread Bharata B Rao
On Wed, Dec 09, 2020 at 03:15:42PM +1100, Paul Mackerras wrote:
> On Mon, Oct 19, 2020 at 04:56:41PM +0530, Bharata B Rao wrote:
> > Implements H_RPT_INVALIDATE hcall and supports only nested case
> > currently.
> > 
> > A KVM capability KVM_CAP_RPT_INVALIDATE is added to indicate the
> > support for this hcall.
> 
> I have a couple of questions about this patch:
> 
> 1. Is this something that is useful today, or is it something that may
> become useful in the future depending on future product plans? In
> other words, what advantage is there to forcing L2 guests to use this
> hcall instead of doing tlbie themselves?

H_RPT_INVALIDATE will replace the use of the existing H_TLB_INVALIDATE
for nested partition-scoped invalidations. Implementations that want to
off-load invalidations to the host (when GTSE=0) would then have to
bother with only one hcall (H_RPT_INVALIDATE).

> 
> 2. Why does it need to be added to the default-enabled hcall list?
> 
> There is a concern that if this is enabled by default we could get the
> situation where a guest using it gets migrated to a host that doesn't
> support it, which would be bad.  That is the reason that all new
> things like this are disabled by default and only enabled by userspace
> (i.e. QEMU) in situations where we can enforce that it is available on
> all hosts to which the VM might be migrated.

As you suggested privately, I am thinking of falling back to
H_TLB_INVALIDATE in the case where this new hcall fails due to not
being present, along the lines of the sketch below. That should address
the migration case that you mention above. With that, is leaving the
new hcall enabled by default okay?
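
A sketch of that fallback, i.e. retrying with the old hcall when the
new one is absent (untested; argument encoding as in
kvmhv_flush_lpid()):

	rc = plpar_hcall_norets(H_RPT_INVALIDATE, lpid,
				H_RPTI_TARGET_CMMU,
				H_RPTI_TYPE_NESTED | H_RPTI_TYPE_TLB |
				H_RPTI_TYPE_PWC | H_RPTI_TYPE_PAT,
				H_RPTI_PAGE_ALL, 0, -1UL);
	if (rc == H_FUNCTION)	/* new hcall missing on this host */
		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
					H_TLBIE_P1_ENC(2, 0, 1), lpid,
					TLBIEL_INVAL_SET_LPID);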

Regards,
Bharata.


Re: [PATCH] powerpc/book3s_hv_uvmem: Check for failed page migration

2020-12-04 Thread Bharata B Rao
On Thu, Dec 03, 2020 at 04:08:12PM +1100, Alistair Popple wrote:
> migrate_vma_pages() may still clear MIGRATE_PFN_MIGRATE on pages which
> are not able to be migrated. Drivers may safely copy data prior to
> calling migrate_vma_pages(); however, a remote mapping must not be
> established until after migrate_vma_pages() has returned, as the
> migration could still fail.
> 
> UV_PAGE_IN both copies and maps the data page, therefore it should
> only be called after checking the results of migrate_vma_pages().
> 
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 84e5a2dc8be5..08aa6a90c525 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -762,7 +762,10 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>   goto out_finalize;
>   }
>  
> - if (pagein) {
> + *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> + migrate_vma_pages(&mig);
> +
> + if ((*mig.src & MIGRATE_PFN_MIGRATE) && pagein) {
>   pfn = *mig.src >> MIGRATE_PFN_SHIFT;
>   spage = migrate_pfn_to_page(*mig.src);
>   if (spage) {
> @@ -773,8 +776,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
>   }
>   }
>  
> - *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> - migrate_vma_pages(&mig);
>  out_finalize:
>   migrate_vma_finalize(&mig);
>   return ret;

Reviewed-by: Bharata B Rao 

Did you actually hit this scenario with secure VMs where a UV-paged-in
page was later found to be not migratable?

Regards,
Bharata.


Re: [PATCH v1 0/2] Use H_RPT_INVALIDATE for nested guest

2020-11-24 Thread Bharata B Rao
Hi,

Any comments on this patchset? Anything specific to be addressed
before it could be considered for inclusion?

Regards,
Bharata.

On Mon, Oct 19, 2020 at 04:56:40PM +0530, Bharata B Rao wrote:
> This patchset adds support for the new hcall H_RPT_INVALIDATE
> (currently handles nested case only) and replaces the nested tlb flush
> calls with this new hcall if the support for the same exists.
> 
> Changes in v1:
> --------------
> - Removed the bits that added the FW_FEATURE_RPT_INVALIDATE feature
>   as they are already upstream.
> 
> v0: 
> https://lore.kernel.org/linuxppc-dev/20200703104420.21349-1-bhar...@linux.ibm.com/T/#m1800c5f5b3d4f6a154ae58fc1c617c06f286358f
> 
> H_RPT_INVALIDATE
> 
> Syntax:
> int64   /* H_Success: Return code on successful completion */
>     /* H_Busy - repeat the call with the same parameters */
>     /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
>     hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation 
> lookaside information */
>   uint64 pid,   /* PID/LPID to invalidate */
>   uint64 target,    /* Invalidation target */
>   uint64 type,  /* Type of lookaside information */
>   uint64 pageSizes, /* Page sizes */
>   uint64 start, /* Start of Effective Address (EA) range 
> (inclusive) */
>   uint64 end)   /* End of EA range (exclusive) */
> 
> Invalidation targets (target)
> -----------------------------
> Core MMU    0x01 /* All virtual processors in the partition */
> Core local MMU  0x02 /* Current virtual processor */
> Nest MMU    0x04 /* All nest/accelerator agents in use by the partition */
> 
> A combination of the above can be specified, except core and core local.
> 
> Type of translation to invalidate (type)
> ----------------------------------------
> NESTED   0x0001  /* Invalidate nested guest partition-scope */
> TLB  0x0002  /* Invalidate TLB */
> PWC  0x0004  /* Invalidate Page Walk Cache */
> PRT  0x0008  /* Invalidate Process Table Entries if NESTED is clear */
> PAT  0x0008  /* Invalidate Partition Table Entries if NESTED is set */
> 
> A combination of the above can be specified.
> 
> Page size mask (pageSizes)
> --------------------------
> 4K  0x01
> 64K 0x02
> 2M  0x04
> 1G  0x08
> All sizes   (-1UL)
> 
> A combination of the above can be specified.
> All page sizes can be selected with -1.
> 
> Semantics: Invalidate radix tree lookaside information
>    matching the parameters given.
> * Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
>   different from the defined values.
> * Return H_PARAMETER if NESTED is set and pid is not a valid nested
>   LPID allocated to this partition.
> * Return H_P5 if (start, end) doesn't form a valid range. Start and end
>   should be valid quadrant addresses and end > start.
> * Return H_NotSupported if the partition is not running in radix
>   translation mode.
> * May invalidate more translation information than requested.
> * If start = 0 and end = -1, set the range to cover all valid addresses.
>   Else start and end should be aligned to 4kB (lower 12 bits clear).
> * If NESTED is clear, then invalidate process-scoped lookaside information.
>   Else pid specifies a nested LPID, and the invalidation is performed
>   on the nested guest partition table and nested guest partition-scoped
>   real addresses.
> * If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
>   quadrant 0 spaces; else valid addresses are quadrant 0.
> * Pages which are fully covered by the range are to be invalidated.
>   Those which are partially covered are considered outside invalidation
>   range, which allows a caller to optimally invalidate ranges that may
>   contain mixed page sizes.
> * Return H_SUCCESS on success.
> 
> Bharata B Rao (2):
>   KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE (nested case
> only)
>   KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM
> 
>  Documentation/virt/kvm/api.rst|  17 +++
>  .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
>  arch/powerpc/include/asm/kvm_book3s.h |   3 +
>  arch/powerpc/kvm/book3s_64_mmu_radix.c|  26 -
>  arch/powerpc/kvm/book3s_hv.c  |  32 ++
>  arch/powerpc/kvm/book3s_hv_nested.c   | 107 +-
>  arch/powerpc/kvm/powerpc.c|   3 +
>  arch/powerpc/mm/book3s64/radix_tlb.c  |   4 -
>  include/uapi/linux/kvm.h  |   1 +
>  9 files changed, 200 insertions(+), 11 deletions(-)
> 
> -- 
> 2.26.2


[PATCH v1 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE (nested case only)

2020-10-19 Thread Bharata B Rao
Implements H_RPT_INVALIDATE hcall and supports only nested case
currently.

A KVM capability KVM_CAP_RPT_INVALIDATE is added to indicate the
support for this hcall.

Signed-off-by: Bharata B Rao 
---
 Documentation/virt/kvm/api.rst| 17 
 .../include/asm/book3s/64/tlbflush-radix.h| 18 
 arch/powerpc/include/asm/kvm_book3s.h |  3 +
 arch/powerpc/kvm/book3s_hv.c  | 32 +++
 arch/powerpc/kvm/book3s_hv_nested.c   | 94 +++
 arch/powerpc/kvm/powerpc.c|  3 +
 arch/powerpc/mm/book3s64/radix_tlb.c  |  4 -
 include/uapi/linux/kvm.h  |  1 +
 8 files changed, 168 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1f26d83e6b168..67e98a56271ae 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5852,6 +5852,23 @@ controlled by the kvm module parameter halt_poll_ns. 
This capability allows
 the maximum halt time to specified on a per-VM basis, effectively overriding
 the module parameter for the target VM.
 
+7.21 KVM_CAP_RPT_INVALIDATE
+---------------------------
+
+:Capability: KVM_CAP_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is always enabled.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h 
b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 94439e0cefc9c..aace7e9b2397d 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include 
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
@@ -21,6 +25,20 @@ static inline u64 psize_to_rpti_pgsize(unsigned long psize)
return H_RPTI_PAGE_ALL;
 }
 
+static inline int rpti_pgsize_to_psize(unsigned long page_size)
+{
+   if (page_size == H_RPTI_PAGE_4K)
+   return MMU_PAGE_4K;
+   if (page_size == H_RPTI_PAGE_64K)
+   return MMU_PAGE_64K;
+   if (page_size == H_RPTI_PAGE_2M)
+   return MMU_PAGE_2M;
+   if (page_size == H_RPTI_PAGE_1G)
+   return MMU_PAGE_1G;
+   else
+   return MMU_PAGE_64K; /* Default */
+}
+
 static inline int mmu_get_ap(int psize)
 {
return mmu_psize_defs[psize].ap;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index d32ec9ae73bd4..0f1c5fa6e8ce3 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -298,6 +298,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 
dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
+unsigned long type, unsigned long pg_sizes,
+unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3bd3118c76330..6cbd37af91ebf 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -904,6 +904,28 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
return yield_count;
 }
 
+static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
+   unsigned long pid, unsigned long target,
+   unsigned long type, unsigned long pg_sizes,
+   unsigned long start, unsigned long end)
+{
+   if (end < start)
+   return H_P5;
+
+   if (!(type & H_RPTI_TYPE_NESTED))
+   return H_P3;
+
+   if (!nesting_enabled(vcpu->kvm))
+   return H_FUNCTION;
+
+   /* Support only cores as target */
+   if (target != H_RPTI_TARGET_CMMU)
+   return H_P2;
+
+   return kvmhv_h_rpti_nested(vcpu, pid, (type & ~H_RPTI_TYPE_NESTED),
+  pg_sizes, start, end);
+}
+
 int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 {
unsigned long req = kvmppc_get_gpr(vcpu, 3);
@@ -1112,6 +1134,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 */
ret = kvmppc_h_svm_init_abort(vcpu->kvm);
break;

[PATCH v1 2/2] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2020-10-19 Thread Bharata B Rao
In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 26 +-
 arch/powerpc/kvm/book3s_hv_nested.c| 13 +++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 22a677b18695e..9934a91adcc3b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,17 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned 
long addr,
}
 
psi = shift_to_mmu_psize(pshift);
-   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-   lpid, rb);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+   rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+   H_TLBIE_P1_ENC(0, 0, 1), lpid, rb);
+   } else {
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB,
+   psize_to_rpti_pgsize(psi),
+   addr, addr + psize);
+   }
if (rc)
pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +343,15 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, 
unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+   H_TLBIE_P1_ENC(1, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+   0, -1UL);
if (rc)
pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 3ec0231628b42..2a187c782e89b 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -402,8 +403,16 @@ static void kvmhv_flush_lpid(unsigned int lpid)
return;
}
 
-   rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-   lpid, TLBIEL_INVAL_SET_LPID);
+   if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+   rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+   H_TLBIE_P1_ENC(2, 0, 1),
+   lpid, TLBIEL_INVAL_SET_LPID);
+   else
+   rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+   H_RPTI_TYPE_NESTED |
+   H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+   H_RPTI_TYPE_PAT,
+   H_RPTI_PAGE_ALL, 0, -1UL);
if (rc)
pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.26.2



[PATCH v1 0/2] Use H_RPT_INVALIDATE for nested guest

2020-10-19 Thread Bharata B Rao
This patchset adds support for the new hcall H_RPT_INVALIDATE
(currently handles nested case only) and replaces the nested tlb flush
calls with this new hcall if the support for the same exists.

Changes in v1:
--------------
- Removed the bits that added the FW_FEATURE_RPT_INVALIDATE feature
  as they are already upstream.

v0: 
https://lore.kernel.org/linuxppc-dev/20200703104420.21349-1-bhar...@linux.ibm.com/T/#m1800c5f5b3d4f6a154ae58fc1c617c06f286358f

H_RPT_INVALIDATE

Syntax:
int64   /* H_Success: Return code on successful completion */
    /* H_Busy - repeat the call with the same parameters */
    /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
    hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation 
lookaside information */
  uint64 pid,   /* PID/LPID to invalidate */
  uint64 target,    /* Invalidation target */
  uint64 type,  /* Type of lookaside information */
  uint64 pageSizes, /* Page sizes */
  uint64 start, /* Start of Effective Address (EA) range 
(inclusive) */
  uint64 end)   /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU    0x01 /* All virtual processors in the partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU    0x04 /* All nest/accelerator agents in use by the partition */

A combination of the above can be specified, except core and core local.

Type of translation to invalidate (type)
----------------------------------------
NESTED   0x0001  /* Invalidate nested guest partition-scope */
TLB  0x0002  /* Invalidate TLB */
PWC  0x0004  /* Invalidate Page Walk Cache */
PRT  0x0008  /* Invalidate Process Table Entries if NESTED is clear */
PAT  0x0008  /* Invalidate Partition Table Entries if NESTED is set */

A combination of the above can be specified.

Page size mask (pageSizes)
--------------------------
4K  0x01
64K 0x02
2M  0x04
1G  0x08
All sizes   (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
   matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
  different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition.
* Return H_P5 if (start, end) doesn't form a valid range. Start and end
  should be valid quadrant addresses and end > start.
* Return H_NotSupported if the partition is not running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid addresses.
  Else start and end should be aligned to 4kB (lower 12 bits clear).
* If NESTED is clear, then invalidate process-scoped lookaside information.
  Else pid specifies a nested LPID, and the invalidation is performed
  on the nested guest partition table and nested guest partition-scoped
  real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
  quadrant 0 spaces; else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside invalidation
  range, which allows a caller to optimally invalidate ranges that may
  contain mixed page sizes.
* Return H_SUCCESS on success.

Bharata B Rao (2):
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE (nested case
only)
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst|  17 +++
 .../include/asm/book3s/64/tlbflush-radix.h|  18 +++
 arch/powerpc/include/asm/kvm_book3s.h |   3 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c|  26 -
 arch/powerpc/kvm/book3s_hv.c  |  32 ++
 arch/powerpc/kvm/book3s_hv_nested.c   | 107 +-
 arch/powerpc/kvm/powerpc.c|   3 +
 arch/powerpc/mm/book3s64/radix_tlb.c  |   4 -
 include/uapi/linux/kvm.h  |   1 +
 9 files changed, 200 insertions(+), 11 deletions(-)

-- 
2.26.2



Re: [RFC v1 2/2] KVM: PPC: Book3S HV: abstract secure VM related calls.

2020-10-12 Thread Bharata B Rao
On Mon, Oct 12, 2020 at 12:27:43AM -0700, Ram Pai wrote:
> Abstract the secure VM related calls into generic calls.
> 
> These generic calls will call the corresponding method of the
> backend that provides the implementation to support secure VMs.
> 
> Currently there is only the ultravisor-based implementation.
> Modify that implementation to act as a backend to the generic calls.
> 
> This plumbing will provide the flexibility to add more backends
> in the future.
> 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h   | 100 ---
>  arch/powerpc/include/asm/kvmppc_svm_backend.h | 250 
> ++
>  arch/powerpc/kvm/book3s_64_mmu_radix.c|   6 +-
>  arch/powerpc/kvm/book3s_hv.c  |  28 +--
>  arch/powerpc/kvm/book3s_hv_uvmem.c|  78 ++--
>  5 files changed, 327 insertions(+), 135 deletions(-)
>  delete mode 100644 arch/powerpc/include/asm/kvm_book3s_uvmem.h
>  create mode 100644 arch/powerpc/include/asm/kvmppc_svm_backend.h
> 
> diff --git a/arch/powerpc/include/asm/kvmppc_svm_backend.h 
> b/arch/powerpc/include/asm/kvmppc_svm_backend.h
> new file mode 100644
> index 000..be60d80
> --- /dev/null
> +++ b/arch/powerpc/include/asm/kvmppc_svm_backend.h
> @@ -0,0 +1,250 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + *
> + * Copyright IBM Corp. 2020
> + *
> + * Authors: Ram Pai 
> + */
> +
> +#ifndef __POWERPC_KVMPPC_SVM_BACKEND_H__
> +#define __POWERPC_KVMPPC_SVM_BACKEND_H__
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#ifdef CONFIG_PPC_BOOK3S
> +#include 
> +#else
> +#include 
> +#endif
> +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
> +#include 
> +#include 
> +#include 
> +#endif
> +
> +struct kvmppc_hmm_backend {

Though we started with HMM initially, what we ended up with eventually
has nothing to do with HMM. Please don't introduce hmm again :-)

> + /* initialize */
> + int (*kvmppc_secmem_init)(void);
> +
> + /* cleanup */
> + void (*kvmppc_secmem_free)(void);
> +
> + /* is memory available */
> + bool (*kvmppc_secmem_available)(void);
> +
> + /* allocate a protected/secure page for the secure VM */
> + unsigned long (*kvmppc_svm_page_in)(struct kvm *kvm,
> + unsigned long gra,
> + unsigned long flags,
> + unsigned long page_shift);
> +
> + /* recover the protected/secure page from the secure VM */
> + unsigned long (*kvmppc_svm_page_out)(struct kvm *kvm,
> + unsigned long gra,
> + unsigned long flags,
> + unsigned long page_shift);
> +
> + /* initiate the transition of a VM to secure VM */
> + unsigned long (*kvmppc_svm_init_start)(struct kvm *kvm);
> +
> + /* finalize the transition of a secure VM */
> + unsigned long (*kvmppc_svm_init_done)(struct kvm *kvm);
> +
> + /* share the page on page fault */
> + int (*kvmppc_svm_page_share)(struct kvm *kvm, unsigned long gfn);
> +
> + /* abort the transition to a secure VM */
> + unsigned long (*kvmppc_svm_init_abort)(struct kvm *kvm);
> +
> + /* add a memory slot */
> + int (*kvmppc_svm_memslot_create)(struct kvm *kvm,
> + const struct kvm_memory_slot *new);
> +
> + /* free a memory slot */
> + void (*kvmppc_svm_memslot_delete)(struct kvm *kvm,
> + const struct kvm_memory_slot *old);
> +
> + /* drop pages allocated to the secure VM */
> + void (*kvmppc_svm_drop_pages)(const struct kvm_memory_slot *free,
> +  struct kvm *kvm, bool skip_page_out);
> +};

Since the structure has the kvmppc_ prefix, maybe you can drop it from
the members to make the field names shorter?
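
That is, something like this (a sketch of the renaming only, hooks
taken from the quoted patch):

	struct kvmppc_svm_backend {
		int (*secmem_init)(void);
		void (*secmem_free)(void);
		bool (*secmem_available)(void);

		unsigned long (*svm_page_in)(struct kvm *kvm,
					     unsigned long gra,
					     unsigned long flags,
					     unsigned long page_shift);
		unsigned long (*svm_page_out)(struct kvm *kvm,
					      unsigned long gra,
					      unsigned long flags,
					      unsigned long page_shift);
		unsigned long (*svm_init_start)(struct kvm *kvm);
		unsigned long (*svm_init_done)(struct kvm *kvm);
		unsigned long (*svm_init_abort)(struct kvm *kvm);
	};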

> +
> +extern const struct kvmppc_hmm_backend *kvmppc_svm_backend;
> +
> +static inline int kvmppc_svm_page_share(struct kvm *kvm, unsigned long gfn)
> +{
> + if (!kvmppc_svm_backend)
> + return -ENODEV;
> +
> + return kvmppc_svm_backend->kvmppc_svm_page_share(kvm,
> + gfn);
> +}
> +
> +static inline void kvmppc_svm_drop_pages(const struct kvm_memory_slot 
> *memslot,
> + struct kvm *kvm, bool skip_page_out)
> +{
> + if (!kvmppc_svm_backend)
> + return;
> +
> + kvmppc_svm_backend->kvmppc_svm_drop_pages(memslot,
> + kvm, skip_page_out);
> +}
> +
> +static inline int kvmppc_svm_page_in(struct kvm *kvm,
> + unsigned long gpa,
> + unsigned long flags,
> + unsigned long page_shift)
> +{
> + if (!kvmppc_svm_backend)
> + return -ENODEV;
> +
> + return kvmppc_svm_backend->kvmppc_svm_page_in(kvm,
> + gpa, flags, page_shift);
> +}
> +
> +static inline int kvmppc_svm_page_out(struct kvm *kvm,
> + unsigned long gpa,
> + unsigned long flags,
> + unsigned long page_shift)
> +{
> + if (!kvmppc_svm_backend)
> + return -ENODEV;
> +
> + return kvmppc_svm_backend->kvmppc_svm_page_out(kvm,
> + gpa, flags, page_shift);
> +}

Re: [PATCH] KVM: PPC: Book3S HV: fix a oops in kvmppc_uvmem_page_free()

2020-07-31 Thread Bharata B Rao
On Fri, Jul 31, 2020 at 01:37:00AM -0700, Ram Pai wrote:
> On Fri, Jul 31, 2020 at 09:59:40AM +0530, Bharata B Rao wrote:
> > On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote:
> > In our case, device pages that are in use are always associated with a valid
> > pvt member. See kvmppc_uvmem_get_page() which returns failure if it
> > runs out of device pfns and that will result in proper failure of
> > page-in calls.
> 
> I looked at the code, and yes, that code path looks correct. So my
> reasoning about the root cause of this bug is incorrect. However, the
> bug is surfacing and there must be a reason.
> 
> > 
> > For the case where we run out of device pfns, migrate_vma_finalize() will
> > restore the original PTE and will not replace the PTE with device private 
> > PTE.
> > 
> > Also kvmppc_uvmem_page_free() (=dev_pagemap_ops.page_free()) is never
> > called for non-device-private pages.
> 
> Yes, it should not be called. But as seen in the stack trace above, it
> is called.
> 
> What would cause the HMM to call ->page_free() on a page that is not
> associated with that device's pfn?

I believe it is being called for a device-private page; can you verify
that when you hit it the next time?

> 
> > 
> > This could be a use-after-free case possibly arising out of the new state
> > changes in HV. If so, this fix will only mask the bug and not address the
> > original problem.
> 
> I can verify by rerunning the tests without the new state changes. But
> I do not see how those changes could cause this fault.
> 
> This could also be caused by a duplicate ->page_free() call due to some
> bug in the migrate_page path? Could there be a race between
> migrate_page() and a page_fault ?
> 
> 
> Regardless, kvmppc_uvmem_page_free() needs to be fixed. It should not
> access the contents of pvt without verifying that pvt is valid.

We don't expect pvt to be NULL here. Checking for NULL and returning
isn't the right fix, I think.

Regards,
Bharata.


Re: [PATCH] KVM: PPC: Book3S HV: Define H_PAGE_IN_NONSHARED for H_SVM_PAGE_IN hcall

2020-07-30 Thread Bharata B Rao
On Thu, Jul 30, 2020 at 04:21:01PM -0700, Ram Pai wrote:
> H_SVM_PAGE_IN hcall takes a flag parameter. This parameter specifies the
> way in which a page will be treated.  H_PAGE_IN_NONSHARED indicates
> that the page will be shared with the Secure VM, and H_PAGE_IN_SHARED
> indicates that the page will not be shared but its contents will
> be copied.

Looks like you got the definitions of shared and non-shared interchanged.

> 
> However H_PAGE_IN_NONSHARED is not defined in the header file, though
> it is defined and documented in the API captured in
> Documentation/powerpc/ultravisor.rst
> 
> Define H_PAGE_IN_NONSHARED in the header file.

What is the use of defining this? Is this used directly in any place?
Or are you planning to introduce such a usage?

Regards,
Bharata.


Re: [PATCH] KVM: PPC: Book3S HV: fix a oops in kvmppc_uvmem_page_free()

2020-07-30 Thread Bharata B Rao
On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote:
> Observed the following oops while stress-testing, using multiple
> secure VMs on a distro kernel. However, this issue theoretically exists
> in 5.5 kernels and later.
> 
> This issue occurs when the total number of requested device-PFNs exceeds
> the total number of available device-PFNs.  PFN migration fails to
> allocate a device-pfn, which causes migrate_vma_finalize() to trigger
> kvmppc_uvmem_page_free() on a page, that is not associated with any
> device-pfn.  kvmppc_uvmem_page_free() blindly tries to access the
> contents of the private data which can be null, leading to the following
> kernel fault.
> 
>  --
>  Unable to handle kernel paging request for data at address 0x0011
>  Faulting instruction address: 0xc0080e36e110
>  Oops: Kernel access of bad area, sig: 11 [#1]
>  LE SMP NR_CPUS=2048 NUMA PowerNV
> 
>  MSR:  9280b033 
>CR: 24424822  XER: 
>  CFAR: c0e3d764 DAR: 0011 DSISR: 4000 IRQMASK: 0
>  GPR00: c0080e36e0a4 c01f1d59f610 c0080e38a400 
>  GPR04: c01fa500 fffe  c000201fffeaf300
>  GPR08: 01f0  0f80 c0080e373608
>  GPR12: c0e3d710 c000201fffeaf300 0001 7fef8736
>  GPR16: 7fff97db4410 c000201c3b66a578  
>  GPR20: 000119db9ad0 000a fffc 0001
>  GPR24: c000201c3b66 c01f1d59f7a0 c04cffb0 0001
>  GPR28:  c00a001ff003e000 c0080e386150 0f80
>  NIP [c0080e36e110] kvmppc_uvmem_page_free+0xc8/0x210 [kvm_hv]
>  LR [c0080e36e0a4] kvmppc_uvmem_page_free+0x5c/0x210 [kvm_hv]
>  Call Trace:
>  [c0512010] free_devmap_managed_page+0xd0/0x100
>  [c03f71d0] put_devmap_managed_page+0xa0/0xc0
>  [c04d24bc] migrate_vma_finalize+0x32c/0x410
>  [c0080e36e828] kvmppc_svm_page_in.constprop.5+0xa0/0x460 [kvm_hv]
>  [c0080e36eddc] kvmppc_uv_migrate_mem_slot.isra.2+0x1f4/0x230 [kvm_hv]
>  [c0080e36fa98] kvmppc_h_svm_init_done+0x90/0x170 [kvm_hv]
>  [c0080e35bb14] kvmppc_pseries_do_hcall+0x1ac/0x10a0 [kvm_hv]
>  [c0080e35edf4] kvmppc_vcpu_run_hv+0x83c/0x1060 [kvm_hv]
>  [c0080e95eb2c] kvmppc_vcpu_run+0x34/0x48 [kvm]
>  [c0080e95a2dc] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
>  [c0080e9433b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
>  [c05451d0] do_vfs_ioctl+0xe0/0xaa0
>  [c0545d64] sys_ioctl+0xc4/0x160
>  [c000b408] system_call+0x5c/0x70
>  Instruction dump:
>  a12d1174 2f89 409e0158 a1271172 3929 b1271172 7c2004ac 3920
>  913e0140 3920 e87d0010 f93d0010 <89230011> e8c3 e9030008 2f89
>  --
> 
>  Fix the oops..
> 
> fixes: ca9f49 ("KVM: PPC: Book3S HV: Support for running secure guests")
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 2806983..f4002bf 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -1018,13 +1018,15 @@ static void kvmppc_uvmem_page_free(struct page *page)
>  {
>   unsigned long pfn = page_to_pfn(page) -
>   (kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
> - struct kvmppc_uvmem_page_pvt *pvt;
> + struct kvmppc_uvmem_page_pvt *pvt = page->zone_device_data;
> +
> + if (!pvt)
> + return;
>  
>   spin_lock(&kvmppc_uvmem_bitmap_lock);
>   bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
>   spin_unlock(&kvmppc_uvmem_bitmap_lock);
>  
> - pvt = page->zone_device_data;
>   page->zone_device_data = NULL;
>   if (pvt->remove_gfn)
>   kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);

In our case, device pages that are in use are always associated with a valid
pvt member. See kvmppc_uvmem_get_page() which returns failure if it
runs out of device pfns and that will result in proper failure of
page-in calls.
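
For context, the allocation path fails cleanly when the device PFNs
run out. Roughly (a sketch of kvmppc_uvmem_get_page() from memory;
details may differ from the exact tree):

static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
{
	struct page *dpage = NULL;
	unsigned long bit, uvmem_pfn, pfn_first, pfn_last;
	struct kvmppc_uvmem_page_pvt *pvt;

	pfn_first = kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT;
	pfn_last = pfn_first +
		   (resource_size(&kvmppc_uvmem_pgmap.res) >> PAGE_SHIFT);

	spin_lock(&kvmppc_uvmem_bitmap_lock);
	bit = find_first_zero_bit(kvmppc_uvmem_bitmap, pfn_last - pfn_first);
	if (bit >= (pfn_last - pfn_first))
		goto out;	/* out of device PFNs: the page-in fails */
	bitmap_set(kvmppc_uvmem_bitmap, bit, 1);
	spin_unlock(&kvmppc_uvmem_bitmap_lock);

	pvt = kzalloc(sizeof(*pvt), GFP_KERNEL);
	if (!pvt)
		goto out_clear;

	uvmem_pfn = bit + pfn_first;
	pvt->gpa = gpa;
	pvt->kvm = kvm;

	dpage = pfn_to_page(uvmem_pfn);
	/* an in-use device page therefore always carries a valid pvt */
	dpage->zone_device_data = pvt;
	get_page(dpage);
	lock_page(dpage);
	return dpage;
out_clear:
	spin_lock(&kvmppc_uvmem_bitmap_lock);
	bitmap_clear(kvmppc_uvmem_bitmap, bit, 1);
out:
	spin_unlock(&kvmppc_uvmem_bitmap_lock);
	return NULL;
}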

For the case where we run out of device pfns, migrate_vma_finalize() will
restore the original PTE and will not replace the PTE with device private PTE.

Also kvmppc_uvmem_page_free() (=dev_pagemap_ops.page_free()) is never
called for non-device-private pages.

This could be a use-after-free case possibly arising out of the new state
changes in HV. If so, this fix will only mask the bug and not address the
original problem.

Regards,
Bharata.


Re: [PATCH v2] powerpc/book3s64/radix: Add kernel command line option to disable radix GTSE

2020-07-27 Thread Bharata B Rao
On Mon, Jul 27, 2020 at 02:29:08PM +0530, Aneesh Kumar K.V wrote:
> This adds a kernel command line option that can be used to disable GTSE 
> support.
> Disabling GTSE implies kernel will make hcalls to invalidate TLB entries.
> 
> This was done so that we can do VM migration between configs that 
> enable/disable
> GTSE support via hypervisor. To migrate a VM from a system that supports
> GTSE to a system that doesn't, we can boot the guest with
> radix_hcall_invalidate=on, thereby forcing the guest to use hcalls for TLB
> invalidates.
> 
> The check for hcall availability is done in pSeries_setup_arch so that
> the panic message appears on the console. This should only happen on
> a hypervisor that doesn't force the guest to hash translation even
> though it can't handle the radix GTSE=0 request via CAS. With
> radix_hcall_invalidate=on if the hypervisor doesn't support 
> hcall_rpt_invalidate
> hcall it should force the LPAR to hash translation.
> 
> Signed-off-by: Aneesh Kumar K.V 

Tested:

1. radix_hcall_invalidate=on with KVM implementation of H_RPT_INVALIDATE hcall,
   the tlb flush calls get off-loaded to the hcall.
2. radix_hcall_invalidate=on w/o H_RPT_INVALIDATE hcall, the guest kernel
   panics as per design.

Tested-by: Bharata B Rao 


[PATCH] powerpc/mm: Limit resize_hpt_for_hotplug() call to hash guests only

2020-07-27 Thread Bharata B Rao
During memory hotplug and unplug, resize_hpt_for_hotplug() gets called
for both hash and radix guests but it should be called only for hash
guests. Though the call does nothing in the radix guest case, it is
cleaner to push this call into hash specific memory hotplug routines.

Reported-by: Nathan Lynch 
Signed-off-by: Bharata B Rao 
---
Tested with memory hotplug and unplug for hash and radix KVM guests.

 arch/powerpc/include/asm/sparsemem.h  | 6 --
 arch/powerpc/mm/book3s64/hash_utils.c | 8 +++-
 arch/powerpc/mm/mem.c | 5 -
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/sparsemem.h 
b/arch/powerpc/include/asm/sparsemem.h
index c89b32443cff..1e6fa371cc38 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -17,12 +17,6 @@ extern int create_section_mapping(unsigned long start, 
unsigned long end,
  int nid, pgprot_t prot);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 
-#ifdef CONFIG_PPC_BOOK3S_64
-extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
-#else
-static inline int resize_hpt_for_hotplug(unsigned long new_mem_size) { return 
0; }
-#endif
-
 #ifdef CONFIG_NUMA
 extern int hot_add_scn_to_nid(unsigned long scn_addr);
 #else
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index 9fdabea04990..30a4a91d9987 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -785,7 +785,7 @@ static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int resize_hpt_for_hotplug(unsigned long new_mem_size)
+static int resize_hpt_for_hotplug(unsigned long new_mem_size)
 {
unsigned target_hpt_shift;
 
@@ -819,6 +819,8 @@ int hash__create_section_mapping(unsigned long start, 
unsigned long end,
return -1;
}
 
+   resize_hpt_for_hotplug(memblock_phys_mem_size());
+
rc = htab_bolt_mapping(start, end, __pa(start),
   pgprot_val(prot), mmu_linear_psize,
   mmu_kernel_ssize);
@@ -836,6 +838,10 @@ int hash__remove_section_mapping(unsigned long start, 
unsigned long end)
int rc = htab_remove_mapping(start, end, mmu_linear_psize,
 mmu_kernel_ssize);
WARN_ON(rc < 0);
+
+   if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
+   pr_warn("Hash collision while resizing HPT\n");
+
return rc;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index c2c11eb8dcfc..9dafc636588f 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -127,8 +127,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
unsigned long nr_pages = size >> PAGE_SHIFT;
int rc;
 
-   resize_hpt_for_hotplug(memblock_phys_mem_size());
-
start = (unsigned long)__va(start);
rc = create_section_mapping(start, start + size, nid,
params->pgprot);
@@ -161,9 +159,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
 * hit that section of memory
 */
vm_unmap_aliases();
-
-   if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
-   pr_warn("Hash collision while resizing HPT\n");
 }
 #endif
 
-- 
2.26.2



Re: [PATCH] powerpc/book3s64/radix: Add kernel command line option to disable radix GTSE

2020-07-27 Thread Bharata B Rao
On Fri, Jul 24, 2020 at 01:26:00PM +0530, Aneesh Kumar K.V wrote:
> This adds a kernel command line option that can be used to disable GTSE 
> support.
> Disabling GTSE implies kernel will make hcalls to invalidate TLB entries.
> 
> This was done so that we can do VM migration between configs that 
> enable/disable
> GTSE support via hypervisor. To migrate a VM from a system that supports
> GTSE to a system that doesn't, we can boot the guest with radix_gtse=off, 
> thereby
> forcing the guest to use hcalls for TLB invalidates.
> 
> The check for hcall availability is done in pSeries_setup_arch so that
> the panic message appears on the console. This should only happen on
> a hypervisor that doesn't force the guest to hash translation even
> though it can't handle the radix GTSE=0 request via CAS. With radix_gtse=off
> if the hypervisor doesn't support hcall_rpt_invalidate hcall it should
> force the LPAR to hash translation.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  3 +++
>  arch/powerpc/include/asm/firmware.h |  4 +++-
>  arch/powerpc/kernel/prom_init.c | 13 +
>  arch/powerpc/platforms/pseries/firmware.c   |  1 +
>  arch/powerpc/platforms/pseries/setup.c  |  5 +
>  5 files changed, 21 insertions(+), 5 deletions(-)
 
Tested:

1. radix_gtse=off with KVM implementation of H_RPT_INVALIDATE hcall, the
   tlb flush calls get off-loaded to hcalls.
2. radix_gtse=off w/o H_RPT_INVALIDATE hcall, the guest kernel panics
   as per design.

However, in both cases the guest kernel prints out
"WARNING: Hypervisor doesn't support RADIX with GTSE", which can be a
bit confusing in case 1, as GTSE has been disabled by the guest and the
hypervisor is capable of supporting the same via the hcall.
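
One way to avoid that confusion could be to print the warning only
when the hcall is genuinely unavailable, along these lines (a sketch;
radix_gtse below stands for whatever flag records the CAS outcome, and
where exactly the check would live in the CAS/feature handling is left
open):

	if (!radix_gtse && !firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
		pr_warn("WARNING: Hypervisor doesn't support RADIX with GTSE\n");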

Regards,
Bharata.


Re: [PATCH v5 5/7] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-07-26 Thread Bharata B Rao
On Thu, Jul 23, 2020 at 01:07:22PM -0700, Ram Pai wrote:
> From: Laurent Dufour 
> 
> When a memory slot is hot plugged to a SVM, PFNs associated with the
> GFNs in that slot must be migrated to the secure-PFNs, aka device-PFNs.
> 
> Call kvmppc_uv_migrate_mem_slot() to accomplish this.
> Disable page-merge for all pages in the memory slot.
> 
> Signed-off-by: Ram Pai 
> [rearranged the code, and modified the commit log]
> Signed-off-by: Laurent Dufour 
> ---
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h | 14 ++
>  arch/powerpc/kvm/book3s_hv.c| 10 ++
>  arch/powerpc/kvm/book3s_hv_uvmem.c  | 23 +++
>  3 files changed, 35 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
> b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index f229ab5..59c17ca 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -25,6 +25,10 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot 
> *free,
>struct kvm *kvm, bool skip_page_out);
>  int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
>   const struct kvm_memory_slot *memslot);
> +int kvmppc_uvmem_memslot_create(struct kvm *kvm,
> + const struct kvm_memory_slot *new);
> +void kvmppc_uvmem_memslot_delete(struct kvm *kvm,
> + const struct kvm_memory_slot *old);
>  #else
>  static inline int kvmppc_uvmem_init(void)
>  {
> @@ -84,5 +88,15 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
> unsigned long gfn)
>  static inline void
>  kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>   struct kvm *kvm, bool skip_page_out) { }
> +
> +static inline int  kvmppc_uvmem_memslot_create(struct kvm *kvm,
> + const struct kvm_memory_slot *new)
> +{
> + return H_UNSUPPORTED;
> +}
> +
> +static inline void  kvmppc_uvmem_memslot_delete(struct kvm *kvm,
> + const struct kvm_memory_slot *old) { }
> +
>  #endif /* CONFIG_PPC_UV */
>  #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index d331b46..b1485ca 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4515,16 +4515,10 @@ static void 
> kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
>  
>   switch (change) {
>   case KVM_MR_CREATE:
> - if (kvmppc_uvmem_slot_init(kvm, new))
> - return;
> - uv_register_mem_slot(kvm->arch.lpid,
> -  new->base_gfn << PAGE_SHIFT,
> -  new->npages * PAGE_SIZE,
> -  0, new->id);
> + kvmppc_uvmem_memslot_create(kvm, new);

My only concern is that kvmppc_uvmem_memslot_create() can fail for
multiple reasons, but we ignore the failures and go ahead with the
memory hotplug.
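
For illustration, the TODO might sit right at the call site in the
quoted hunk, something like (sketch only):

	case KVM_MR_CREATE:
		/*
		 * TODO: kvmppc_uvmem_memslot_create() can fail (slot init,
		 * page-merge disabling, UV slot registration), but the
		 * failure is currently ignored; explore whether the memory
		 * hotplug can be failed or rolled back from here instead.
		 */
		kvmppc_uvmem_memslot_create(kvm, new);
		break;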

Maybe this hasn't been observed in practice, but if we can note this
as a TODO in the comments (as sketched above), to dig further and
explore the possibility of recovering from here later, then:

Reviewed-by: Bharata B Rao 

Regards,
Bharata.


Re: [PATCH v5 6/7] KVM: PPC: Book3S HV: move kvmppc_svm_page_out up

2020-07-26 Thread Bharata B Rao
On Thu, Jul 23, 2020 at 01:07:23PM -0700, Ram Pai wrote:
> From: Laurent Dufour 
> 
> kvmppc_svm_page_out() will need to be called by kvmppc_uvmem_drop_pages(),
> so move it up in this file.
> 
> Furthermore, it will be interesting to call this function while already
> holding the kvm->arch.uvmem_lock, so prefix the original function with __
> and remove the locking in it, and introduce a wrapper which calls that
> function with the lock held.
> 
> There is no functional change.
> 
> Cc: Ram Pai 
> Cc: Bharata B Rao 
> Cc: Paul Mackerras 
> Signed-off-by: Ram Pai 
> Signed-off-by: Laurent Dufour 

Reviewed-by: Bharata B Rao 

Regards,
Bharata.


Re: [PATCH] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-26 Thread Bharata B Rao
On Fri, Jul 24, 2020 at 10:35:27AM +0200, Laurent Dufour wrote:
> When a secure memslot is dropped, all the pages backed in the secure
> device (aka really backed by secure memory by the Ultravisor)
> should be paged out to a normal page. Previously, this was
> achieved by triggering the page fault mechanism, which calls
> kvmppc_svm_page_out() on each page.
> 
> This can't work when hot unplugging a memory slot because the memory
> slot is flagged as invalid and gfn_to_pfn() is then not trying to access
> the page, so the page fault mechanism is not triggered.
> 
> Since the final goal is to make a call to kvmppc_svm_page_out() it seems
> simpler to call directly instead of triggering such a mechanism. This
> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
> memslot.
> 
> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
> the call to __kvmppc_svm_page_out() is made.  As
> __kvmppc_svm_page_out needs the vma pointer to migrate the pages,
> the VMA is fetched in a lazy way, to not trigger find_vma() all
> the time. In addition, the mmap_sem is held in read mode during
> that time, not in write mode since the virual memory layout is not
> impacted, and kvm->arch.uvmem_lock prevents concurrent operation
> on the secure device.
> 
> Cc: Ram Pai 
> Cc: Bharata B Rao 
> Cc: Paul Mackerras 
> Signed-off-by: Ram Pai 
>   [modified the changelog description]
> Signed-off-by: Laurent Dufour 
> [modified check on the VMA in kvmppc_uvmem_drop_pages]

Reviewed-by: Bharata B Rao 

Regards,
Bharata.


Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes

2020-07-24 Thread Bharata B Rao
On Fri, Jul 24, 2020 at 09:52:14PM +1000, Michael Ellerman wrote:
> Bharata B Rao  writes:
> > On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote:
> >> Bharata B Rao  writes:
> >> > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> >> >> Nathan Lynch  writes:
> >> >> > "Aneesh Kumar K.V"  writes:
> >> >> >> This is the next version of the fixes for memory unplug on radix.
> >> >> >> The issues and the fix are described in the actual patches.
> >> >> >
> >> >> > I guess this isn't actually causing problems at runtime right now, 
> >> >> > but I
> >> >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> >> >> > arch_remove_memory(), which ought to be mmu-agnostic:
> >> >> >
> >> >> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> >> >> > struct mhp_params *params)
> >> >> > {
> >> >> >   unsigned long start_pfn = start >> PAGE_SHIFT;
> >> >> >   unsigned long nr_pages = size >> PAGE_SHIFT;
> >> >> >   int rc;
> >> >> >
> >> >> >   resize_hpt_for_hotplug(memblock_phys_mem_size());
> >> >> >
> >> >> >   start = (unsigned long)__va(start);
> >> >> >   rc = create_section_mapping(start, start + size, nid,
> >> >> >   params->pgprot);
> >> >> > ...
> >> >> 
> >> >> Hmm well spotted.
> >> >> 
> >> >> That does return early if the ops are not setup:
> >> >> 
> >> >> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> >> >> {
> >> >> unsigned target_hpt_shift;
> >> >> 
> >> >> if (!mmu_hash_ops.resize_hpt)
> >> >> return 0;
> >> >> 
> >> >> 
> >> >> And:
> >> >> 
> >> >> void __init hpte_init_pseries(void)
> >> >> {
> >> >> ...
> >> >> if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> >> >> mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> >> >> 
> >> >> And that comes in via ibm,hypertas-functions:
> >> >> 
> >> >> {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
> >> >> 
> >> >> 
> >> >> But firmware is not necessarily going to add/remove that call based on
> >> >> whether we're using hash/radix.
> >> >
> >> > Correct but hpte_init_pseries() will not be called for radix guests.
> >> 
> >> Yeah, duh. You'd think the function name would have been a sufficient
> >> clue for me :)
> >> 
> >> >> So I think a follow-up patch is needed to make this more robust.
> >> >> 
> >> >> Aneesh/Bharata what platform did you test this series on? I'm curious
> >> >> how this didn't break.
> >> >
> >> > I have tested memory hotplug/unplug for radix guest on zz platform and
> >> > sanity-tested this for hash guest on P8.
> >> >
> >> > As noted above, mmu_hash_ops.resize_hpt will not be set for radix
> >> > guest and hence we won't see any breakage.
> >> 
> >> OK.
> >> 
> >> That's probably fine as it is then. Or maybe just a comment in
> >> resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if
> >> we're using radix.
> >
> > Or we could move these calls to hpt-only routines like below?
> 
> That looks like it would be equivalent, and would nicely isolate those
> calls in hash specific code. So yeah I think that's worth sending as a
> proper patch, even better if you can test it.

Sure I will send it as a proper patch. I did test minimal hotplug/unplug
for hash guest with that patch, will do more extensive test and resend.

> 
> > David - Do you remember if there was any particular reason to have
> > these two hpt-resize calls within powerpc-generic memory hotplug code?
> 
> I think the HPT resizing was developed before or concurrently with the
> radix support, so I would guess it was just not something we thought
> about at the time.

Right.

Regards,
Bharata.


Re: [PATCH v5 4/7] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.

2020-07-23 Thread Bharata B Rao
On Thu, Jul 23, 2020 at 01:07:21PM -0700, Ram Pai wrote:
> The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the
> pages of the SVM before calling H_SVM_INIT_DONE. This causes a huge
> delay in transitioning the VM to SVM. The Ultravisor is only interested
> in the pages that contain the kernel, initrd and other important data
> structures. The rest contain throw-away content.
> 
> However if not all pages are requested by the Ultravisor, the Hypervisor
> continues to consider the GFNs corresponding to the non-requested pages
> as normal GFNs. This can lead to data-corruption and undefined behavior.
> 
> In H_SVM_INIT_DONE handler, move all the PFNs associated with the SVM's
> GFNs to secure-PFNs. Skip the GFNs that are already Paged-in or Shared
> or Paged-in followed by a Paged-out.
> 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Cc: Michael Ellerman 
> Cc: Bharata B Rao 
> Cc: Aneesh Kumar K.V 
> Cc: Sukadev Bhattiprolu 
> Cc: Laurent Dufour 
> Cc: Thiago Jung Bauermann 
> Cc: David Gibson 
> Cc: Claudio Carvalho 
> Cc: kvm-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ram Pai 
> ---
>  Documentation/powerpc/ultravisor.rst|   2 +
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c  | 136 +---
>  3 files changed, 127 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/powerpc/ultravisor.rst 
> b/Documentation/powerpc/ultravisor.rst
> index a1c8c37..ba6b1bf 100644
> --- a/Documentation/powerpc/ultravisor.rst
> +++ b/Documentation/powerpc/ultravisor.rst
> @@ -934,6 +934,8 @@ Return values
>   * H_UNSUPPORTED if called from the wrong context (e.g.
>   from an SVM or before an H_SVM_INIT_START
>   hypercall).
> + * H_STATE   if the hypervisor could not successfully
> +transition the VM to Secure VM.
>  
>  Description
>  ~~~
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
> b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index 9cb7d8b..f229ab5 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -23,6 +23,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
>  unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
>  void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>struct kvm *kvm, bool skip_page_out);
> +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> + const struct kvm_memory_slot *memslot);

I still don't see why this needs to be a global function. You should be able
to move around a few functions in book3s_hv_uvmem.c up/down and
satisfy the calling order dependencies.
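
For illustration, a minimal sketch, assuming nothing outside
book3s_hv_uvmem.c ends up needing the symbol: a forward declaration near
the top of the file keeps it static without reordering any function
bodies at all.

	/* book3s_hv_uvmem.c, near the top (sketch only) */
	static int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
			const struct kvm_memory_slot *memslot);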

Otherwise, Reviewed-by: Bharata B Rao 


Re: [PATCH v5 7/7] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-23 Thread Bharata B Rao
On Thu, Jul 23, 2020 at 01:07:24PM -0700, Ram Pai wrote:
> From: Laurent Dufour 
> 
> When a secure memslot is dropped, all the pages backed in the secure
> device (aka really backed by secure memory by the Ultravisor)
> should be paged out to a normal page. Previously, this was
> achieved by triggering the page fault mechanism which is calling
> kvmppc_svm_page_out() on each page.
> 
> This can't work when hot unplugging a memory slot because the memory
> slot is flagged as invalid and gfn_to_pfn() is then not trying to access
> the page, so the page fault mechanism is not triggered.
> 
> Since the final goal is to make a call to kvmppc_svm_page_out() it seems
> simpler to call it directly instead of triggering such a mechanism. This
> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
> memslot.
> 
> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
> the call to __kvmppc_svm_page_out() is made.  As
> __kvmppc_svm_page_out needs the vma pointer to migrate the pages,
> the VMA is fetched in a lazy way, to not trigger find_vma() all
> the time. In addition, the mmap_sem is held in read mode during
> that time, not in write mode since the virtual memory layout is not
> impacted, and kvm->arch.uvmem_lock prevents concurrent operation
> on the secure device.
> 
> Cc: Ram Pai 
> Cc: Bharata B Rao 
> Cc: Paul Mackerras 
> Signed-off-by: Ram Pai 
>   [modified the changelog description]
> Signed-off-by: Laurent Dufour 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++
>  1 file changed, 37 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index c772e92..daffa6e 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -632,35 +632,55 @@ static inline int kvmppc_svm_page_out(struct 
> vm_area_struct *vma,
>   * fault on them, do fault time migration to replace the device PTEs in
>   * QEMU page table with normal PTEs from newly allocated pages.
>   */
> -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
>struct kvm *kvm, bool skip_page_out)
>  {
>   int i;
>   struct kvmppc_uvmem_page_pvt *pvt;
> - unsigned long pfn, uvmem_pfn;
> - unsigned long gfn = free->base_gfn;
> + struct page *uvmem_page;
> + struct vm_area_struct *vma = NULL;
> + unsigned long uvmem_pfn, gfn;
> + unsigned long addr, end;
> +
> + mmap_read_lock(kvm->mm);
> +
> + addr = slot->userspace_addr;
> + end = addr + (slot->npages * PAGE_SIZE);
>  
> - for (i = free->npages; i; --i, ++gfn) {
> - struct page *uvmem_page;
> + gfn = slot->base_gfn;
> + for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
> +
> + /* Fetch the VMA if addr is not in the latest fetched one */
> + if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) {
> + vma = find_vma_intersection(kvm->mm, addr, end);
> + if (!vma ||
> + vma->vm_start > addr || vma->vm_end < end) {
> + pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
> + break;
> + }

There is a potential issue with the boundary condition check here
which I discussed with Laurent yesterday. Guess he hasn't gotten around
to looking at it yet.

Regards,
Bharata.


Re: [v4 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out

2020-07-23 Thread Bharata B Rao
On Fri, Jul 17, 2020 at 01:00:26AM -0700, Ram Pai wrote:
> @@ -812,7 +842,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
> unsigned long gpa,
>   struct vm_area_struct *vma;
>   int srcu_idx;
>   unsigned long gfn = gpa >> page_shift;
> - int ret;
> + int ret, repeat_count = REPEAT_COUNT;
>  
>   if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
>   return H_UNSUPPORTED;
> @@ -826,34 +856,44 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
> unsigned long gpa,
>   if (flags & H_PAGE_IN_SHARED)
>   return kvmppc_share_page(kvm, gpa, page_shift);
>  
> - ret = H_PARAMETER;
>   srcu_idx = srcu_read_lock(&kvm->srcu);
> - mmap_read_lock(kvm->mm);
>  
> - start = gfn_to_hva(kvm, gfn);
> - if (kvm_is_error_hva(start))
> - goto out;
> -
> - mutex_lock(&kvm->arch.uvmem_lock);
>   /* Fail the page-in request of an already paged-in page */
> - if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
> - goto out_unlock;
> + mutex_lock(&kvm->arch.uvmem_lock);
> + ret = kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL);
> + mutex_unlock(&kvm->arch.uvmem_lock);

Same comment as for the prev patch. I don't think you can release
the lock here.
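
A sketch of what keeping the lock across the check and the page-in could
look like (illustrative only; the retry loop and error paths are elided):

	mutex_lock(&kvm->arch.uvmem_lock);
	/* the GFN cannot transition while uvmem_lock is held */
	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
		ret = H_PARAMETER;
	else
		ret = kvmppc_svm_migrate_page(vma, start, end, gpa, kvm,
					      page_shift, true);
	mutex_unlock(&kvm->arch.uvmem_lock);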

> + if (ret) {
> + srcu_read_unlock(&kvm->srcu, srcu_idx);
> + return H_PARAMETER;
> + }
>  
> - end = start + (1UL << page_shift);
> - vma = find_vma_intersection(kvm->mm, start, end);
> - if (!vma || vma->vm_start > start || vma->vm_end < end)
> - goto out_unlock;
> + do {
> + ret = H_PARAMETER;
> + mmap_read_lock(kvm->mm);
>  
> - if (kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, page_shift,
> - true))
> - goto out_unlock;
> + start = gfn_to_hva(kvm, gfn);
> + if (kvm_is_error_hva(start)) {
> + mmap_read_unlock(kvm->mm);
> + break;
> + }
>  
> - ret = H_SUCCESS;
> + end = start + (1UL << page_shift);
> + vma = find_vma_intersection(kvm->mm, start, end);
> + if (!vma || vma->vm_start > start || vma->vm_end < end) {
> + mmap_read_unlock(kvm->mm);
> + break;
> + }
> +
> + mutex_lock(&kvm->arch.uvmem_lock);
> + ret = kvmppc_svm_migrate_page(vma, start, end, gpa, kvm, 
> page_shift, true);
> + mutex_unlock(&kvm->arch.uvmem_lock);
> +
> + mmap_read_unlock(kvm->mm);
> + } while (ret == -2 && repeat_count--);
> +
> + if (ret == -2)
> + ret = H_BUSY;
>  
> -out_unlock:
> - mutex_unlock(&kvm->arch.uvmem_lock);
> -out:
> - mmap_read_unlock(kvm->mm);
>   srcu_read_unlock(&kvm->srcu, srcu_idx);
>   return ret;
>  }
> -- 
> 1.8.3.1


Re: [v4 3/5] KVM: PPC: Book3S HV: in H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs.

2020-07-23 Thread Bharata B Rao
On Fri, Jul 17, 2020 at 01:00:25AM -0700, Ram Pai wrote:
>  
> +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> + const struct kvm_memory_slot *memslot)

Don't see any callers for this outside of this file, so why not static?

> +{
> + unsigned long gfn = memslot->base_gfn;
> + struct vm_area_struct *vma;
> + unsigned long start, end;
> + int ret = 0;
> +
> + while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {

So you check the state of the gfn under uvmem_lock above, but then
release the lock.

> +
> + mmap_read_lock(kvm->mm);
> + start = gfn_to_hva(kvm, gfn);
> + if (kvm_is_error_hva(start)) {
> + ret = H_STATE;
> + goto next;
> + }
> +
> + end = start + (1UL << PAGE_SHIFT);
> + vma = find_vma_intersection(kvm->mm, start, end);
> + if (!vma || vma->vm_start > start || vma->vm_end < end) {
> + ret = H_STATE;
> + goto next;
> + }
> +
> + mutex_lock(&kvm->arch.uvmem_lock);
> + ret = kvmppc_svm_migrate_page(vma, start, end,
> + (gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);

What is the guarantee that the gfn is still in that state when you
do the migration here?
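
One way to close that window, sketched below: re-check the GFN state
after re-taking uvmem_lock and skip the GFN if it has transitioned in
the meantime. kvmppc_gfn_is_nontransitioned() is a hypothetical helper
using the same predicate as the scan.

	mutex_lock(&kvm->arch.uvmem_lock);
	if (kvmppc_gfn_is_nontransitioned(memslot, kvm, gfn))
		ret = kvmppc_svm_migrate_page(vma, start, end,
				gfn << PAGE_SHIFT, kvm, PAGE_SHIFT, false);
	mutex_unlock(&kvm->arch.uvmem_lock);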

Regards,
Bharata.


Re: [v4 2/5] KVM: PPC: Book3S HV: track the state GFNs associated with secure VMs

2020-07-22 Thread Bharata B Rao
  |   |
>  | Shared | Shared | Secure   |Normal |Shared |
>  |||  |   |   |
>  | Normal | Shared | Secure   |Normal |Secure |
>  --
> 
>  7. Life cycle of a VM
> 
>  
>  | |  start|  H_SVM_  |H_SVM_   |H_SVM_ |UV_SVM_|
>  | |  VM   |INIT_START|INIT_DONE|INIT_ABORT |TERMINATE  |
>  | |   |  | |   |   |
>  - --
>  | |   |  | |   |   |
>  | Normal  | Normal| Transient|Error|Error  |Normal |
>  | |   |  | |   |   |
>  | Secure  |   Error   | Error|Error|Error  |Normal |
>  | |       |  | |   |   |
>  |Transient|   N/A | Error|Secure   |Normal |Normal |
>  
> 
> 
> 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Cc: Michael Ellerman 
> Cc: Bharata B Rao 
> Cc: Aneesh Kumar K.V 
> Cc: Sukadev Bhattiprolu 
> Cc: Laurent Dufour 
> Cc: Thiago Jung Bauermann 
> Cc: David Gibson 
> Cc: Claudio Carvalho 
> Cc: kvm-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Reviewed-by: Thiago Jung Bauermann 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 187 +
>  1 file changed, 168 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 0baa293..df2e272 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -98,7 +98,127 @@
>  static unsigned long *kvmppc_uvmem_bitmap;
>  static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
>  
> -#define KVMPPC_UVMEM_PFN (1UL << 63)
> +/*
> + * States of a GFN
> + * ---
> + * The GFN can be in one of the following states.
> + *
> + * (a) Secure - The GFN is secure. The GFN is associated with
> + *   a Secure VM, the contents of the GFN is not accessible
> + *   to the Hypervisor.  This GFN can be backed by a secure-PFN,
> + *   or can be backed by a normal-PFN with contents encrypted.
> + *   The former is true when the GFN is paged-in into the
> + *   ultravisor. The latter is true when the GFN is paged-out
> + *   of the ultravisor.
> + *
> + * (b) Shared - The GFN is shared. The GFN is associated with
> + *   a secure VM. The contents of the GFN is accessible to
> + *   Hypervisor. This GFN is backed by a normal-PFN and its
> + *   content is un-encrypted.
> + *
> + * (c) Normal - The GFN is normal. The GFN is associated with
> + *   a normal VM. The contents of the GFN is accessible to
> + *   the Hypervisor. Its content is never encrypted.
> + *
> + * States of a VM.
> + * ---
> + *
> + * Normal VM:  A VM whose contents are always accessible to
> + *   the hypervisor.  All its GFNs are normal-GFNs.
> + *
> + * Secure VM: A VM whose contents are not accessible to the
> + *   hypervisor without the VM's consent.  Its GFNs are
> + *   either Shared-GFN or Secure-GFNs.
> + *
> + * Transient VM: A Normal VM that is transitioning to secure VM.
> + *   The transition starts on successful return of
> + *   H_SVM_INIT_START, and ends on successful return
> + *   of H_SVM_INIT_DONE. This transient VM, can have GFNs
> + *   in any of the three states; i.e Secure-GFN, Shared-GFN,
> + *   and Normal-GFN. The VM never executes in this state
> + *   in supervisor-mode.
> + *
> + * Memory slot State.
> + * -
> + *   The state of a memory slot mirrors the state of the
> + *   VM the memory slot is associated with.
> + *
> + * VM State transition.
> + * 
> + *
> + *  A VM always starts in Normal Mode.
> + *
> + *  H_SVM_INIT_START moves the VM into transient state. During this
> + *  time the Ultravisor may request some of its GFNs to be shared or
> + *  secured. So its GFNs can be in one of the three GFN states.
> + *
> + *  H_SVM_INIT_DONE moves the VM entirely from transient state to
> + *  secure-state. At this point any left-over normal-GFNs are
> + *  transitioned to Secure-GFN.
> + *
> + *  H_SVM_INIT_ABORT moves the transient VM back to normal VM.
> + *  All its GFNs are moved to Normal-GFNs.
> + *
> + *  UV_TERMINATE transitions the secure-VM back to normal-VM. All
> + *  th

Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-22 Thread Bharata B Rao
On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote:
> When a secure memslot is dropped, all the pages backed in the secure device
> (aka really backed by secure memory by the Ultravisor) should be paged out
> to a normal page. Previously, this was achieved by triggering the page
> fault mechanism which is calling kvmppc_svm_page_out() on each page.
> 
> This can't work when hot unplugging a memory slot because the memory slot
> is flagged as invalid and gfn_to_pfn() is then not trying to access the
> page, so the page fault mechanism is not triggered.
> 
> Since the final goal is to make a call to kvmppc_svm_page_out() it seems
> simpler to call it directly instead of triggering such a mechanism. This
> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
> memslot.
> 
> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
> the call to __kvmppc_svm_page_out() is made.
> As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the
> VMA is fetched in a lazy way, to not trigger find_vma() all the time. In
> addition, the mmap_sem is held in read mode during that time, not in write
> mode since the virtual memory layout is not impacted, and
> kvm->arch.uvmem_lock prevents concurrent operation on the secure device.
> 
> Cc: Ram Pai 
> Cc: Bharata B Rao 
> Cc: Paul Mackerras 
> Signed-off-by: Laurent Dufour 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 54 --
>  1 file changed, 37 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 5a4b02d3f651..ba5c7c77cc3a 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -624,35 +624,55 @@ static inline int kvmppc_svm_page_out(struct 
> vm_area_struct *vma,
>   * fault on them, do fault time migration to replace the device PTEs in
>   * QEMU page table with normal PTEs from newly allocated pages.
>   */
> -void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
> +void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
>struct kvm *kvm, bool skip_page_out)
>  {
>   int i;
>   struct kvmppc_uvmem_page_pvt *pvt;
> - unsigned long pfn, uvmem_pfn;
> - unsigned long gfn = free->base_gfn;
> + struct page *uvmem_page;
> + struct vm_area_struct *vma = NULL;
> + unsigned long uvmem_pfn, gfn;
> + unsigned long addr, end;
> +
> + mmap_read_lock(kvm->mm);
> +
> + addr = slot->userspace_addr;

We typically use gfn_to_hva() for that, but that won't work for a
memslot that is already marked INVALID which is the case here.
I think it is ok to access slot->userspace_addr of an INVALID
memslot here, but just thought of bringing this up explicitly.
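
For reference, the reason gfn_to_hva() fails here is that its lookup
path refuses invalid slots, roughly (condensed from virt/kvm/kvm_main.c
of this vintage, details elided):

	static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot,
				gfn_t gfn, gfn_t *nr_pages, bool write)
	{
		if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
			return KVM_HVA_ERR_BAD;
		...
	}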

> + end = addr + (slot->npages * PAGE_SIZE);
>  
> - for (i = free->npages; i; --i, ++gfn) {
> - struct page *uvmem_page;
> + gfn = slot->base_gfn;
> + for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
> +
> + /* Fetch the VMA if addr is not in the latest fetched one */
> + if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) {
> + vma = find_vma_intersection(kvm->mm, addr, end);
> + if (!vma ||
> + vma->vm_start > addr || vma->vm_end < end) {
> + pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
> + break;
> + }
> + }

In Ram's series, kvmppc_memslot_page_merge() also walks the VMAs spanning
the memslot, but it uses a different logic for that walk. Why can't these
two cases use the same method to walk the VMAs? Is there anything subtly
different between the two cases?
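
If there is nothing subtly different, both could share one walker, e.g.
(hypothetical callback-based helper, error value illustrative):

	static int kvmppc_for_each_memslot_vma(struct kvm *kvm,
			unsigned long start, unsigned long end,
			int (*fn)(struct vm_area_struct *vma,
				  unsigned long start, unsigned long end,
				  void *data),
			void *data)
	{
		struct vm_area_struct *vma;
		int ret = 0;

		while (start < end) {
			vma = find_vma_intersection(kvm->mm, start, end);
			if (!vma)
				return -EFAULT;
			ret = fn(vma, max(start, vma->vm_start),
				 min(end, vma->vm_end), data);
			if (ret)
				break;
			start = vma->vm_end;
		}
		return ret;
	}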

Regards,
Bharata.


Re: [v4 5/5] KVM: PPC: Book3S HV: migrate hot plugged memory

2020-07-22 Thread Bharata B Rao
On Fri, Jul 17, 2020 at 01:00:27AM -0700, Ram Pai wrote:
> From: Laurent Dufour 
> 
> When a memory slot is hot plugged to a SVM, PFNs associated with the
> GFNs in that slot must be migrated to secure-PFNs, aka device-PFNs.
> 
> Call kvmppc_uv_migrate_mem_slot() to accomplish this.
> Disable page-merge for all pages in the memory slot.
> 
> Signed-off-by: Ram Pai 
> [rearranged the code, and modified the commit log]
> Signed-off-by: Laurent Dufour 
> ---
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h | 10 ++
>  arch/powerpc/kvm/book3s_hv.c| 10 ++
>  arch/powerpc/kvm/book3s_hv_uvmem.c  | 22 ++
>  3 files changed, 34 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
> b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index f229ab5..6f7da00 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -25,6 +25,9 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot 
> *free,
>struct kvm *kvm, bool skip_page_out);
>  int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
>   const struct kvm_memory_slot *memslot);
> +void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot 
> *new);
> +void kvmppc_memslot_delete(struct kvm *kvm, const struct kvm_memory_slot 
> *old);

The names look a bit generic, but these functions are specific
to secure guests. Maybe rename them to kvmppc_uvmem_memslot_[create/delete]?

> +
>  #else
>  static inline int kvmppc_uvmem_init(void)
>  {
> @@ -84,5 +87,12 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, 
> unsigned long gfn)
>  static inline void
>  kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>   struct kvm *kvm, bool skip_page_out) { }
> +
> +static inline void  kvmppc_memslot_create(struct kvm *kvm,
> + const struct kvm_memory_slot *new) { }
> +
> +static inline void  kvmppc_memslot_delete(struct kvm *kvm,
> + const struct kvm_memory_slot *old) { }
> +
>  #endif /* CONFIG_PPC_UV */
>  #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index d331b46..bf3be3b 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4515,16 +4515,10 @@ static void 
> kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
>  
>   switch (change) {
>   case KVM_MR_CREATE:
> - if (kvmppc_uvmem_slot_init(kvm, new))
> - return;
> - uv_register_mem_slot(kvm->arch.lpid,
> -  new->base_gfn << PAGE_SHIFT,
> -  new->npages * PAGE_SIZE,
> -  0, new->id);
> + kvmppc_memslot_create(kvm, new);
>   break;
>   case KVM_MR_DELETE:
> - uv_unregister_mem_slot(kvm->arch.lpid, old->id);
> - kvmppc_uvmem_slot_free(kvm, old);
> + kvmppc_memslot_delete(kvm, old);
>   break;
>   default:
>   /* TODO: Handle KVM_MR_MOVE */
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index a206984..a2b4d25 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -1089,6 +1089,28 @@ int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned 
> long gfn)
>   return (ret == U_SUCCESS) ? RESUME_GUEST : -EFAULT;
>  }
>  
> +void kvmppc_memslot_create(struct kvm *kvm, const struct kvm_memory_slot 
> *new)
> +{
> + if (kvmppc_uvmem_slot_init(kvm, new))
> + return;
> +
> + if (kvmppc_memslot_page_merge(kvm, new, false))
> + return;
> +
> + if (uv_register_mem_slot(kvm->arch.lpid, new->base_gfn << PAGE_SHIFT,
> + new->npages * PAGE_SIZE, 0, new->id))
> + return;
> +
> + kvmppc_uv_migrate_mem_slot(kvm, new);

Quite a few things can return failure here, including
kvmppc_uv_migrate_mem_slot() and we are ignoring all of those.
I am wondering if this should be called from prepare_memory_region callback
instead of commit_memory_region. In the prepare phase, we have a way
to back out in case of error. Can you check if moving this call to
prepare callback is feasible?
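
For illustration, roughly what that could look like (hypothetical and
simplified: the signature is abbreviated, kvmppc_is_secure_vm() is a
made-up guard, and kvmppc_memslot_create() would have to return its
error instead of void):

	static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm, ...,
			enum kvm_mr_change change)
	{
		...
		if (change == KVM_MR_CREATE && kvmppc_is_secure_vm(kvm)) {
			int ret = kvmppc_memslot_create(kvm, new);

			if (ret)
				return ret;	/* memslot change is backed out */
		}
		...
	}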

In the other case in 1/5, the code issues ksm unmerge request on error,
but not here.

Also check if the code for 1st three calls can be shared with similar
code in 1/5.

Regards,
Bharata.


Re: [v4 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START

2020-07-22 Thread Bharata B Rao
On Fri, Jul 17, 2020 at 01:00:23AM -0700, Ram Pai wrote:
> Page-merging of pages in memory-slots associated with a Secure VM
> is disabled in H_SVM_PAGE_IN handler.
> 
> This operation should have been done much earlier; the moment the VM
> is initiated for secure-transition. Delaying this operation increases
> the probability of those pages acquiring new references, making it
> impossible to migrate those pages.
> 
> Disable page-merging in H_SVM_INIT_START handling.
> 
> Signed-off-by: Ram Pai 

Reviewed-by: Bharata B Rao 

with a few observations below...

> ---
>  Documentation/powerpc/ultravisor.rst |  1 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 98 +++-
>  2 files changed, 76 insertions(+), 23 deletions(-)
> 
> diff --git a/Documentation/powerpc/ultravisor.rst 
> b/Documentation/powerpc/ultravisor.rst
> index df136c8..a1c8c37 100644
> --- a/Documentation/powerpc/ultravisor.rst
> +++ b/Documentation/powerpc/ultravisor.rst
> @@ -895,6 +895,7 @@ Return values
>  One of the following values:
>  
>   * H_SUCCESS  on success.
> +* H_STATEif the VM is not in a position to switch to secure.
>  
>  Description
>  ~~~
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index e6f76bc..0baa293 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
> struct kvm *kvm,
>   return false;
>  }
>  
> +static int kvmppc_memslot_page_merge(struct kvm *kvm,
> + struct kvm_memory_slot *memslot, bool merge)
> +{
> + unsigned long gfn = memslot->base_gfn;
> + unsigned long end, start = gfn_to_hva(kvm, gfn);
> + int ret = 0;
> + struct vm_area_struct *vma;
> + int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
> +
> + if (kvm_is_error_hva(start))
> + return H_STATE;
> +
> + end = start + (memslot->npages << PAGE_SHIFT);
> +
> + mmap_write_lock(kvm->mm);
> + do {
> + vma = find_vma_intersection(kvm->mm, start, end);
> + if (!vma) {
> + ret = H_STATE;
> + break;
> + }
> + ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +   merge_flag, &vma->vm_flags);
> + if (ret) {
> + ret = H_STATE;
> + break;
> + }
> + start = vma->vm_end + 1;

This should be start = vma->vm_end I believe.
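
vm_end is the first address *after* the VMA, so it is already the right
point to continue the search from; "+ 1" can skip a VMA that starts
exactly at vm_end. A sketch of the corrected walk:

	do {
		vma = find_vma_intersection(kvm->mm, start, end);
		if (!vma) {
			ret = H_STATE;
			break;
		}
		ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
				  merge_flag, &vma->vm_flags);
		if (ret) {
			ret = H_STATE;
			break;
		}
		start = vma->vm_end;	/* not vm_end + 1 */
	} while (end > vma->vm_end);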

> + } while (end > vma->vm_end);
> +
> + mmap_write_unlock(kvm->mm);
> + return ret;
> +}
> +
> +static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
> +{
> + struct kvm_memslots *slots;
> + struct kvm_memory_slot *memslot;
> + int ret = 0;
> +
> + slots = kvm_memslots(kvm);
> + kvm_for_each_memslot(memslot, slots) {
> + ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
> + if (ret)
> + break;
> + }
> + return ret;
> +}

You walk through all the slots here to issue ksm_madvise, but...

> +
> +static inline int kvmppc_disable_page_merge(struct kvm *kvm)
> +{
> + return __kvmppc_page_merge(kvm, false);
> +}
> +
> +static inline int kvmppc_enable_page_merge(struct kvm *kvm)
> +{
> + return __kvmppc_page_merge(kvm, true);
> +}
> +
>  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  {
>   struct kvm_memslots *slots;
> @@ -232,11 +291,18 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>   return H_AUTHORITY;
>  
>   srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> + /* disable page-merging for all memslot */
> + ret = kvmppc_disable_page_merge(kvm);
> + if (ret)
> + goto out;
> +
> + /* register the memslot */
>   slots = kvm_memslots(kvm);
>   kvm_for_each_memslot(memslot, slots) {

... you are walking thro' the same set of slots here anyway. I think
it makes sense to issue the merge advice from here itself. That will
help you to share code with kvmppc_memslot_create() in 5/5.

All the below 3 calls are common to both the code paths, I think
they can be carved out into a separate function if you prefer.

kvmppc_uvmem_slot_init
kvmppc_memslot_page_merge
uv_register_mem_slot
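
A rough sketch of such a carved-out helper (name, return values and
unwinding are illustrative only):

	static int kvmppc_uvmem_memslot_create(struct kvm *kvm,
			struct kvm_memory_slot *memslot)
	{
		int ret;

		if (kvmppc_uvmem_slot_init(kvm, memslot))
			return H_PARAMETER;

		if (kvmppc_memslot_page_merge(kvm, memslot, false)) {
			ret = H_STATE;
			goto out_free;
		}

		if (uv_register_mem_slot(kvm->arch.lpid,
					 memslot->base_gfn << PAGE_SHIFT,
					 memslot->npages * PAGE_SIZE,
					 0, memslot->id) < 0) {
			ret = H_PARAMETER;
			goto out_unmerge;
		}
		return 0;

	out_unmerge:
		kvmppc_memslot_page_merge(kvm, memslot, true);
	out_free:
		kvmppc_uvmem_slot_free(kvm, memslot);
		return ret;
	}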

Regards,
Bharata.


Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes

2020-07-22 Thread Bharata B Rao
On Tue, Jul 21, 2020 at 10:25:58PM +1000, Michael Ellerman wrote:
> Bharata B Rao  writes:
> > On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> >> Nathan Lynch  writes:
> >> > "Aneesh Kumar K.V"  writes:
> >> >> This is the next version of the fixes for memory unplug on radix.
> >> >> The issues and the fix are described in the actual patches.
> >> >
> >> > I guess this isn't actually causing problems at runtime right now, but I
> >> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> >> > arch_remove_memory(), which ought to be mmu-agnostic:
> >> >
> >> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> >> >struct mhp_params *params)
> >> > {
> >> >  unsigned long start_pfn = start >> PAGE_SHIFT;
> >> >  unsigned long nr_pages = size >> PAGE_SHIFT;
> >> >  int rc;
> >> >
> >> >  resize_hpt_for_hotplug(memblock_phys_mem_size());
> >> >
> >> >  start = (unsigned long)__va(start);
> >> >  rc = create_section_mapping(start, start + size, nid,
> >> >  params->pgprot);
> >> > ...
> >> 
> >> Hmm well spotted.
> >> 
> >> That does return early if the ops are not setup:
> >> 
> >> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> >> {
> >>unsigned target_hpt_shift;
> >> 
> >>if (!mmu_hash_ops.resize_hpt)
> >>return 0;
> >> 
> >> 
> >> And:
> >> 
> >> void __init hpte_init_pseries(void)
> >> {
> >>...
> >>if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> >>mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> >> 
> >> And that comes in via ibm,hypertas-functions:
> >> 
> >>{FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
> >> 
> >> 
> >> But firmware is not necessarily going to add/remove that call based on
> >> whether we're using hash/radix.
> >
> > Correct but hpte_init_pseries() will not be called for radix guests.
> 
> Yeah, duh. You'd think the function name would have been a sufficient
> clue for me :)
> 
> >> So I think a follow-up patch is needed to make this more robust.
> >> 
> >> Aneesh/Bharata what platform did you test this series on? I'm curious
> >> how this didn't break.
> >
> > I have tested memory hotplug/unplug for radix guest on zz platform and
> > sanity-tested this for hash guest on P8.
> >
> > As noted above, mmu_hash_ops.resize_hpt will not be set for radix
> > guest and hence we won't see any breakage.
> 
> OK.
> 
> That's probably fine as it is then. Or maybe just a comment in
> resize_hpt_for_hotplug() pointing out that resize_hpt will be NULL if
> we're using radix.

Or we could move these calls to hpt-only routines like below?

David - Do you remember if there was any particular reason to have
these two hpt-resize calls within powerpc-generic memory hotplug code?

diff --git a/arch/powerpc/include/asm/sparsemem.h 
b/arch/powerpc/include/asm/sparsemem.h
index c89b32443cff..1e6fa371cc38 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -17,12 +17,6 @@ extern int create_section_mapping(unsigned long start, 
unsigned long end,
  int nid, pgprot_t prot);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 
-#ifdef CONFIG_PPC_BOOK3S_64
-extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
-#else
-static inline int resize_hpt_for_hotplug(unsigned long new_mem_size) { return 0; }
-#endif
-
 #ifdef CONFIG_NUMA
 extern int hot_add_scn_to_nid(unsigned long scn_addr);
 #else
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index eec6f4e5e481..5daf53ec7600 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -787,7 +787,7 @@ static unsigned long __init htab_get_table_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int resize_hpt_for_hotplug(unsigned long new_mem_size)
+static int resize_hpt_for_hotplug(unsigned long new_mem_size)
 {
unsigned target_hpt_shift;
 
@@ -821,6 +821,8 @@ int hash__create_section_mapping(unsigned long start, 
unsigned long end,
return -1;
}
 
+   resize_hpt_for_hotplug(memblock_phys_mem_size());
+
rc = htab_bolt_mapping(start, end, __pa(start),
   pgprot_val(prot), mmu_linear_psize,

Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes

2020-07-20 Thread Bharata B Rao
On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> Nathan Lynch  writes:
> > "Aneesh Kumar K.V"  writes:
> >> This is the next version of the fixes for memory unplug on radix.
> >> The issues and the fix are described in the actual patches.
> >
> > I guess this isn't actually causing problems at runtime right now, but I
> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> > arch_remove_memory(), which ought to be mmu-agnostic:
> >
> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> >   struct mhp_params *params)
> > {
> > unsigned long start_pfn = start >> PAGE_SHIFT;
> > unsigned long nr_pages = size >> PAGE_SHIFT;
> > int rc;
> >
> > resize_hpt_for_hotplug(memblock_phys_mem_size());
> >
> > start = (unsigned long)__va(start);
> > rc = create_section_mapping(start, start + size, nid,
> > params->pgprot);
> > ...
> 
> Hmm well spotted.
> 
> That does return early if the ops are not setup:
> 
> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> {
>   unsigned target_hpt_shift;
> 
>   if (!mmu_hash_ops.resize_hpt)
>   return 0;
> 
> 
> And:
> 
> void __init hpte_init_pseries(void)
> {
>   ...
>   if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
>   mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> 
> And that comes in via ibm,hypertas-functions:
> 
>   {FW_FEATURE_HPT_RESIZE, "hcall-hpt-resize"},
> 
> 
> But firmware is not necessarily going to add/remove that call based on
> whether we're using hash/radix.

Correct but hpte_init_pseries() will not be called for radix guests.

> 
> So I think a follow-up patch is needed to make this more robust.
> 
> Aneesh/Bharata what platform did you test this series on? I'm curious
> how this didn't break.

I have tested memory hotplug/unplug for radix guest on zz platform and
sanity-tested this for hash guest on P8.

As noted above, mmu_hash_ops.resize_hpt will not be set for radix
guest and hence we won't see any breakage.

However a separate patch to fix this will be good.

Regards,
Bharata.


Re: [FIX PATCH] powerpc/prom: Enable Radix GTSE in cpu pa-features

2020-07-20 Thread Bharata B Rao
On Mon, Jul 20, 2020 at 03:38:29PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of July 20, 2020 2:42 pm:
> > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> > index 9cc49f265c86..a9594bad572a 100644
> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -163,7 +163,8 @@ static struct ibm_pa_feature {
> > { .pabyte = 0,  .pabit = 6, .cpu_features  = CPU_FTR_NOEXECUTE },
> > { .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
> >  #ifdef CONFIG_PPC_RADIX_MMU
> > -   { .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
> > +   { .pabyte = 40, .pabit = 0,
> > + .mmu_features  = (MMU_FTR_TYPE_RADIX | MMU_FTR_GTSE) },
> 
> It might look better like this:
> 
> { .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
> #ifdef CONFIG_PPC_RADIX_MMU
> { .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
> { .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX |
>  MMU_FTR_GTSE },
> #endif
>   { .pabyte = 1,  .pabit = 1, .invert = 1, .cpu_features = 
> CPU_FTR_NODSISRALIGN },
> 
> But that's bikeshedding a bit and the optional bits already put it out 
> of alignment.

Here it is...

From 1be7f3f8b43503740431b7bdf585e488ecdeb48f Mon Sep 17 00:00:00 2001
From: Nicholas Piggin 
Date: Mon, 20 Jul 2020 09:05:05 +0530
Subject: [FIX PATCH] powerpc/prom: Enable Radix GTSE in cpu pa-features

When '029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")'
made GTSE an MMU feature, it was enabled by default in
powerpc-cpu-features but was missed in pa-features. This causes
random memory corruption during boot of PowerNV kernels if
CONFIG_PPC_DT_CPU_FTRS isn't enabled.

Fixes: 029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")
Reported-by: Qian Cai 
Signed-off-by: Nicholas Piggin 
Signed-off-by: Bharata B Rao 
---
 arch/powerpc/kernel/prom.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9cc49f265c86..dae30e805e42 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -163,7 +163,8 @@ static struct ibm_pa_feature {
{ .pabyte = 0,  .pabit = 6, .cpu_features  = CPU_FTR_NOEXECUTE },
{ .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
 #ifdef CONFIG_PPC_RADIX_MMU
-   { .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
+   { .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX |
+MMU_FTR_GTSE },
 #endif
{ .pabyte = 1,  .pabit = 1, .invert = 1, .cpu_features = 
CPU_FTR_NODSISRALIGN },
{ .pabyte = 5,  .pabit = 0, .cpu_features  = CPU_FTR_REAL_LE,
-- 
2.26.2



[FIX PATCH] powerpc/prom: Enable Radix GTSE in cpu pa-features

2020-07-19 Thread Bharata B Rao
From: Nicholas Piggin 

When '029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")'
made GTSE an MMU feature, it was enabled by default in
powerpc-cpu-features but was missed in pa-features. This causes
random memory corruption during boot of PowerNV kernels where
CONFIG_PPC_DT_CPU_FTRS isn't enabled.

Fixes: 029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")
Reported-by: Qian Cai 
Signed-off-by: Nicholas Piggin 
Signed-off-by: Bharata B Rao 
---
 arch/powerpc/kernel/prom.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 9cc49f265c86..a9594bad572a 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -163,7 +163,8 @@ static struct ibm_pa_feature {
{ .pabyte = 0,  .pabit = 6, .cpu_features  = CPU_FTR_NOEXECUTE },
{ .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
 #ifdef CONFIG_PPC_RADIX_MMU
-   { .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
+   { .pabyte = 40, .pabit = 0,
+ .mmu_features  = (MMU_FTR_TYPE_RADIX | MMU_FTR_GTSE) },
 #endif
{ .pabyte = 1,  .pabit = 1, .invert = 1, .cpu_features = 
CPU_FTR_NODSISRALIGN },
{ .pabyte = 5,  .pabit = 0, .cpu_features  = CPU_FTR_REAL_LE,
-- 
2.26.2



Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE

2020-07-16 Thread Bharata B Rao
On Fri, Jul 17, 2020 at 12:44:00PM +1000, Nicholas Piggin wrote:
> Excerpts from Nicholas Piggin's message of July 17, 2020 12:08 pm:
> > Excerpts from Qian Cai's message of July 17, 2020 3:27 am:
> >> On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
> >>> Hypervisor may choose not to enable Guest Translation Shootdown Enable
> >>> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
> >>> permitted to use instructions like tblie and tlbsync directly, but is
> >>> expected to make hypervisor calls to get the TLB flushed.
> >>> 
> >>> This series enables the TLB flush routines in the radix code to
> >>> off-load TLB flushing to hypervisor via the newly proposed hcall
> >>> H_RPT_INVALIDATE. 
> >>> 
> >>> To easily check the availability of GTSE, it is made an MMU feature.
> >>> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
> >>> handle GTSE as an optionally available feature and to not assume GTSE
> >>> when radix support is available.
> >>> 
> >>> The actual hcall implementation for KVM isn't included in this
> >>> patchset and will be posted separately.
> >>> 
> >>> Changes in v3
> >>> =
> >>> - Fixed a bug in the hcall wrapper code where we were missing setting
> >>>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
> >>>   a full flush for the nested case.
> >>> - s/psize_to_h_rpti/psize_to_rpti_pgsize
> >>> 
> >>> v2: 
> >>> https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bhar...@linux.ibm.com/T/#t
> >>> 
> >>> Bharata B Rao (2):
> >>>   powerpc/mm: Enable radix GTSE only if supported.
> >>>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
> >>> enabled
> >>> 
> >>> Nicholas Piggin (1):
> >>>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
> >>> !GTSE
> >> 
> >> Reverting the whole series fixed random memory corruptions during boot on
> >> POWER9 PowerNV systems below.
> > 
> > If I s/mmu_has_feature(MMU_FTR_GTSE)/(1)/g in radix_tlb.c, then the .o
> > disasm is the same as reverting my patch.
> > 
> > Feature bits not being set right? PowerNV should be pretty simple, seems
> > to do the same as FTR_TYPE_RADIX.
> 
> Might need this fix
> 
> ---
> 
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 9cc49f265c86..54c9bcea9d4e 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -163,7 +163,7 @@ static struct ibm_pa_feature {
>   { .pabyte = 0,  .pabit = 6, .cpu_features  = CPU_FTR_NOEXECUTE },
>   { .pabyte = 1,  .pabit = 2, .mmu_features  = MMU_FTR_CI_LARGE_PAGE },
>  #ifdef CONFIG_PPC_RADIX_MMU
> - { .pabyte = 40, .pabit = 0, .mmu_features  = MMU_FTR_TYPE_RADIX },
> + { .pabyte = 40, .pabit = 0, .mmu_features  = (MMU_FTR_TYPE_RADIX | 
> MMU_FTR_GTSE) },
>  #endif
>   { .pabyte = 1,  .pabit = 1, .invert = 1, .cpu_features = 
> CPU_FTR_NODSISRALIGN },
>   { .pabyte = 5,  .pabit = 0, .cpu_features  = CPU_FTR_REAL_LE,

Michael - Let me know if this should be folded into 1/3 and the complete
series resent.

Regards,
Bharata.


Re: [v3 3/5] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-07-15 Thread Bharata B Rao
On Tue, Jul 14, 2020 at 10:05:41PM -0700, Ram Pai wrote:
> On Mon, Jul 13, 2020 at 03:15:06PM +0530, Bharata B Rao wrote:
> > On Sat, Jul 11, 2020 at 02:13:45AM -0700, Ram Pai wrote:
> > > The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the 
> > > pages
> > >  
> > >   if (!(*mig.src & MIGRATE_PFN_MIGRATE)) {
> > > - ret = -1;
> > > + ret = -2;
> > 
> > migrate_vma_setup() has marked that this pfn can't be migrated. What
> > transient errors are you observing which will disappear within 10
> > retries?
> > 
> > Also till now when UV used to pull in all the pages, we never seemed to
> > have hit these transient errors. But now when HV is pushing the same
> > pages, we see these errors which are disappearing after 10 retries.
> > Can you explain this more please? What sort of pages are these?
> 
> We did see them even before this patch. The retry alleviates the
> problem, but does not entirely eliminate it. If the chance of seeing
> the issue without the patch is 1%,  the chance of seeing this issue
> with this patch becomes 0.25%.

Okay, but maybe we should investigate the problem a bit more to
understand why the page migrations are failing before taking this
route?

> 
> > 
> > >   goto out_finalize;
> > >   }
> > > + bool retry = 0;
> ...snip...
> > > +
> > > + *ret = 0;
> > > + while (kvmppc_next_nontransitioned_gfn(memslot, kvm, &gfn)) {
> > > +
> > > + down_write(&kvm->mm->mmap_sem);
> > 
> > Acquiring and releasing mmap_sem in a loop? Any reason?
> > 
> > Now that you have moved ksm_madvise() calls to init time, any specific
> > reason to take write mmap_sem here?
> 
> The semaphore protects the VMA, right?

We took the write lock just for ksm_madvise() and then downgraded to
read. Now that you are moving that to init time, read is sufficient here.
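
I.e. something along these lines should do (sketch with the same calls
as in the patch; the VMA layout is only looked up, never changed, and
uvmem_lock already serializes operations on the secure device):

	mmap_read_lock(kvm->mm);
	start = gfn_to_hva(kvm, gfn);
	if (!kvm_is_error_hva(start)) {
		end = start + (1UL << PAGE_SHIFT);
		vma = find_vma_intersection(kvm->mm, start, end);
		if (vma && vma->vm_start <= start && vma->vm_end >= end)
			ret = kvmppc_svm_migrate_page(vma, start, end,
					gfn << PAGE_SHIFT, kvm,
					PAGE_SHIFT, false);
	}
	mmap_read_unlock(kvm->mm);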

Regards,
Bharata.


Re: [v3 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START

2020-07-15 Thread Bharata B Rao
On Tue, Jul 14, 2020 at 10:16:14PM -0700, Ram Pai wrote:
> On Mon, Jul 13, 2020 at 10:59:41AM +0530, Bharata B Rao wrote:
> > On Sat, Jul 11, 2020 at 02:13:43AM -0700, Ram Pai wrote:
> > > Merging of pages associated with each memslot of an SVM is
> > > disabled when the page is migrated in the H_SVM_PAGE_IN handler.
> > > 
> > > This operation should have been done much earlier; the moment the VM
> > > is initiated for secure-transition. Delaying this operation increases
> > > the probability of those pages acquiring new references, making it
> > > impossible to migrate those pages in H_SVM_PAGE_IN handler.
> > > 
> > > Disable page-merging in H_SVM_INIT_START handling.
> > 
> > While it is a good idea to disable KSM merging for all VMAs during
> > H_SVM_INIT_START, I am curious if you did observe an actual case of
> > ksm_madvise() failing which resulted in subsequent H_SVM_PAGE_IN
> > failing to migrate?
> 
> No. I did not find any ksm_madvise() failing.  But it did not make sense
> to ksm_madvise() every time a page_in was requested. Hence I proposed
> this patch. H_SVM_INIT_START is the right place for ksm_madvise().

Indeed yes. Then you may want to update the description which currently
seems to imply that this change is being done to avoid issues arising
out of delayed KSM unmerging advice.

> 
> > 
> > > 
> > > Signed-off-by: Ram Pai 
> > > ---
> > >  arch/powerpc/kvm/book3s_hv_uvmem.c | 96 +-
> > >  1 file changed, 74 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> > > b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > > index 3d987b1..bfc3841 100644
> > > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> > > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > > @@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long 
> > > gfn, struct kvm *kvm,
> > >   return false;
> > >  }
> > >  
> > > +static int kvmppc_memslot_page_merge(struct kvm *kvm,
> > > + struct kvm_memory_slot *memslot, bool merge)
> > > +{
> > > + unsigned long gfn = memslot->base_gfn;
> > > + unsigned long end, start = gfn_to_hva(kvm, gfn);
> > > + int ret = 0;
> > > + struct vm_area_struct *vma;
> > > + int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
> > > +
> > > + if (kvm_is_error_hva(start))
> > > + return H_STATE;
> > 
> > This and other cases below seem to be a new return value from
> > H_SVM_INIT_START. May be update the documentation too along with
> > this patch?
> 
> ok.
> 
> > 
> > > +
> > > + end = start + (memslot->npages << PAGE_SHIFT);
> > > +
> > > + down_write(&kvm->mm->mmap_sem);
> > 
> > When you rebase the patches against latest upstream you may want to
> > replace the above and other instances by mmap_write/read_lock().
> 
> ok.
> 
> > 
> > > + do {
> > > + vma = find_vma_intersection(kvm->mm, start, end);
> > > + if (!vma) {
> > > + ret = H_STATE;
> > > + break;
> > > + }
> > > + ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > > +   merge_flag, &vma->vm_flags);
> > > + if (ret) {
> > > + ret = H_STATE;
> > > + break;
> > > + }
> > > + start = vma->vm_end + 1;
> > > + } while (end > vma->vm_end);
> > > +
> > > + up_write(&kvm->mm->mmap_sem);
> > > + return ret;
> > > +}
> > > +
> > > +static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
> > > +{
> > > + struct kvm_memslots *slots;
> > > + struct kvm_memory_slot *memslot;
> > > + int ret = 0;
> > > +
> > > + slots = kvm_memslots(kvm);
> > > + kvm_for_each_memslot(memslot, slots) {
> > > + ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
> > > + if (ret)
> > > + break;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > +static inline int kvmppc_disable_page_merge(struct kvm *kvm)
> > > +{
> > > + return __kvmppc_page_merge(kvm, false);
> > > +}
> > > +
> > > +static inline int kvmppc_enable_page_merge(struct kvm *kvm)
> > > +{
> > > + return __kvmppc_page_merge(kvm, true);
> 

Re: [RFC PATCH v0 2/2] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

2020-07-13 Thread Bharata B Rao
On Thu, Jul 09, 2020 at 08:07:11PM +1000, Paul Mackerras wrote:
> On Thu, Jul 09, 2020 at 02:38:51PM +0530, Bharata B Rao wrote:
> > On Thu, Jul 09, 2020 at 03:18:03PM +1000, Paul Mackerras wrote:
> > > On Fri, Jul 03, 2020 at 04:14:20PM +0530, Bharata B Rao wrote:
> > > > In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
> > > > H_RPT_INVALIDATE if available. The availability of this hcall
> > > > is determined from "hcall-rpt-invalidate" string in 
> > > > ibm,hypertas-functions
> > > > DT property.
> > > 
> > > What are we going to use when nested KVM supports HPT guests at L2?
> > > L1 will need to do partition-scoped tlbies with R=0 via a hypercall,
> > > but H_RPT_INVALIDATE says in its name that it only handles radix
> > > page tables (i.e. R=1).
> > 
> > For L2 HPT guests, the old hcall is expected to work after it adds
> > support for R=0 case?
> 
> That was the plan.
> 
> > The new hcall should be advertised via ibm,hypertas-functions only
> > for radix guests I suppose.
> 
> Well, the L1 hypervisor is a radix guest of L0, so it would have
> H_RPT_INVALIDATE available to it?
> 
> I guess the question is whether H_RPT_INVALIDATE is supposed to do
> everything, that is, radix process-scoped invalidations, radix
> partition-scoped invalidations, and HPT partition-scoped
> invalidations.  If that is the plan then we should call it something
> different.

Guess we are a bit late now to rename it and include HPT in the scope.

> 
> This patchset seems to imply that H_RPT_INVALIDATE is at least going
> to be used for radix partition-scoped invalidations as well as radix
> process-scoped invalidations.  If you are thinking that in future when
> we need HPT partition-scoped invalidations for a radix L1 hypervisor
> running a HPT L2 guest, we are going to define a new hypercall for
> that, I suppose that is OK, though it doesn't really seem necessary.

Guess a new hcall would be the way forward to cover the HPT L2 guest
requirements.

Thanks for pointing this out.

Regards,
Bharata.


Re: [v3 4/5] KVM: PPC: Book3S HV: retry page migration before erroring-out H_SVM_PAGE_IN

2020-07-13 Thread Bharata B Rao
On Sat, Jul 11, 2020 at 02:13:46AM -0700, Ram Pai wrote:
> The page requested for page-in; sometimes, can have transient
> references, and hence cannot migrate immediately. Retry a few times
> before returning error.

As I noted on the previous patch, we need to understand what these
transient errors are and what type of pages they occur on.

The previous patch also introduced a bit of retry logic in the
page-in path. Can you consolidate the retry logic into a separate
patch?
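
And if the retries do stay, the duplicated loops could be folded into
one small wrapper, e.g. (hypothetical name; -2 is the transient-failure
value these patches use):

	static int kvmppc_svm_migrate_page_retry(struct vm_area_struct *vma,
			unsigned long start, unsigned long end,
			unsigned long gpa, struct kvm *kvm,
			unsigned long page_shift, bool pagein)
	{
		int repeat_count = REPEAT_COUNT;
		int ret;

		do {
			ret = kvmppc_svm_migrate_page(vma, start, end, gpa,
						      kvm, page_shift, pagein);
		} while (ret == -2 && repeat_count--);

		return ret;
	}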

> 
> H_SVM_PAGE_IN interface is enhanced to return H_BUSY if the page is
> not in a migratable state.
> 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Cc: Michael Ellerman 
> Cc: Bharata B Rao 
> Cc: Aneesh Kumar K.V 
> Cc: Sukadev Bhattiprolu 
> Cc: Laurent Dufour 
> Cc: Thiago Jung Bauermann 
> Cc: David Gibson 
> Cc: Claudio Carvalho 
> Cc: kvm-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> 
> Signed-off-by: Ram Pai 
> ---
>  Documentation/powerpc/ultravisor.rst |  1 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 54 +---
>  2 files changed, 33 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/powerpc/ultravisor.rst 
> b/Documentation/powerpc/ultravisor.rst
> index d98fc85..638d1a7 100644
> --- a/Documentation/powerpc/ultravisor.rst
> +++ b/Documentation/powerpc/ultravisor.rst
> @@ -1034,6 +1034,7 @@ Return values
>   * H_PARAMETER   if ``guest_pa`` is invalid.
>   * H_P2  if ``flags`` is invalid.
>   * H_P3  if ``order`` of page is invalid.
> + * H_BUSY  if ``page`` is not in a state to pagein
>  
>  Description
>  ~~~
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 12ed52a..c9bdef6 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -843,7 +843,7 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
> unsigned long gpa,
>   struct vm_area_struct *vma;
>   int srcu_idx;
>   unsigned long gfn = gpa >> page_shift;
> - int ret;
> + int ret, repeat_count = REPEAT_COUNT;
>  
>   if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
>   return H_UNSUPPORTED;
> @@ -857,34 +857,44 @@ unsigned long kvmppc_h_svm_page_in(struct kvm *kvm, 
> unsigned long gpa,
>   if (flags & H_PAGE_IN_SHARED)
>   return kvmppc_share_page(kvm, gpa, page_shift);
>  
> - ret = H_PARAMETER;
>   srcu_idx = srcu_read_lock(&kvm->srcu);
> - down_write(&kvm->mm->mmap_sem);
>  
> - start = gfn_to_hva(kvm, gfn);
> - if (kvm_is_error_hva(start))
> - goto out;
> -
> - mutex_lock(&kvm->arch.uvmem_lock);
>   /* Fail the page-in request of an already paged-in page */
> - if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL))
> - goto out_unlock;
> + mutex_lock(&kvm->arch.uvmem_lock);
> + ret = kvmppc_gfn_is_uvmem_pfn(gfn, kvm, NULL);
> + mutex_unlock(&kvm->arch.uvmem_lock);
> + if (ret) {
> + srcu_read_unlock(&kvm->srcu, srcu_idx);
> + return H_PARAMETER;
> + }
>  
> - end = start + (1UL << page_shift);
> - vma = find_vma_intersection(kvm->mm, start, end);
> - if (!vma || vma->vm_start > start || vma->vm_end < end)
> - goto out_unlock;
> + do {
> + ret = H_PARAMETER;
> + down_write(&kvm->mm->mmap_sem);

Again, with ksm_madvise() moved to init time, check whether you still
need the write mmap_sem here.

Regards,
Bharata.


Re: [v3 3/5] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE

2020-07-13 Thread Bharata B Rao
On Sat, Jul 11, 2020 at 02:13:45AM -0700, Ram Pai wrote:
> The Ultravisor is expected to explicitly call H_SVM_PAGE_IN for all the pages
> of the SVM before calling H_SVM_INIT_DONE. This causes a huge delay in
> transitioning the VM to SVM. The Ultravisor is interested in the pages that
> contain the kernel, initrd and other important data structures. The rest of 
> the
> pages contain throw-away content. Hence it is enough to request just the
> necessary and sufficient pages from the Hypervisor.
> 
> However if not all pages are requested by the Ultravisor, the Hypervisor
> continues to consider the GFNs corresponding to the non-requested pages as
> normal GFNs. This can lead to data-corruption and undefined behavior.
> 
> Move all the PFNs associated with the SVM's GFNs to secure-PFNs, in
> H_SVM_INIT_DONE. Skip the GFNs that are already Paged-in or Shared or
> Paged-in followed by a Paged-out.
> 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Cc: Michael Ellerman 
> Cc: Bharata B Rao 
> Cc: Aneesh Kumar K.V 
> Cc: Sukadev Bhattiprolu 
> Cc: Laurent Dufour 
> Cc: Thiago Jung Bauermann 
> Cc: David Gibson 
> Cc: Claudio Carvalho 
> Cc: kvm-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ram Pai 
> ---
>  Documentation/powerpc/ultravisor.rst|   2 +
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c  | 166 +---
>  3 files changed, 156 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/powerpc/ultravisor.rst 
> b/Documentation/powerpc/ultravisor.rst
> index df136c8..d98fc85 100644
> --- a/Documentation/powerpc/ultravisor.rst
> +++ b/Documentation/powerpc/ultravisor.rst
> @@ -933,6 +933,8 @@ Return values
>   * H_UNSUPPORTED if called from the wrong context (e.g.
>   from an SVM or before an H_SVM_INIT_START
>   hypercall).
> + * H_STATE   if the hypervisor could not successfully
> +transition the VM to Secure VM.
>  
>  Description
>  ~~~
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h 
> b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index 9cb7d8b..f229ab5 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -23,6 +23,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
>  unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
>  void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>struct kvm *kvm, bool skip_page_out);
> +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> + const struct kvm_memory_slot *memslot);
>  #else
>  static inline int kvmppc_uvmem_init(void)
>  {
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 6d6c256..12ed52a 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -93,6 +93,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static struct dev_pagemap kvmppc_uvmem_pgmap;
>  static unsigned long *kvmppc_uvmem_bitmap;
> @@ -348,6 +349,43 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
> struct kvm *kvm,
>   return false;
>  }
>  
> +/*
> + * starting from *gfn search for the next available GFN that is not yet
> + * transitioned to a secure GFN.  return the value of that GFN in *gfn.  If a
> + * GFN is found, return true, else return false
> + */
> +static bool kvmppc_next_nontransitioned_gfn(const struct kvm_memory_slot 
> *memslot,
> + struct kvm *kvm, unsigned long *gfn)
> +{
> + struct kvmppc_uvmem_slot *p;
> + bool ret = false;
> + unsigned long i;
> +
> + mutex_lock(&kvm->arch.uvmem_lock);
> +
> + list_for_each_entry(p, &kvm->arch.uvmem_pfns, list)
> + if (*gfn >= p->base_pfn && *gfn < p->base_pfn + p->nr_pfns)
> + break;
> + if (!p)
> + goto out;
> + /*
> +  * The code below assumes, one to one correspondence between
> +  * kvmppc_uvmem_slot and memslot.
> +  */
> + for (i = *gfn; i < p->base_pfn + p->nr_pfns; i++) {
> + unsigned long index = i - p->base_pfn;
> +
> + if (!(p->pfns[index] & KVMPPC_GFN_FLAG_MASK)) {
> + *gfn = i;
> + ret = true;
> + break;
> + }
> + }
> +out:
> + mutex_unlock(&kvm->arch.uvmem_lock);
> + return ret;
> +}
> +
>  static int kvmppc_me

Re: [v3 1/5] KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START

2020-07-12 Thread Bharata B Rao
On Sat, Jul 11, 2020 at 02:13:43AM -0700, Ram Pai wrote:
> Merging of pages associated with each memslot of an SVM is
> disabled when the page is migrated in the H_SVM_PAGE_IN handler.
> 
> This operation should have been done much earlier; the moment the VM
> is initiated for secure-transition. Delaying this operation increases
> the probability of those pages acquiring new references, making it
> impossible to migrate those pages in H_SVM_PAGE_IN handler.
> 
> Disable page-merging in H_SVM_INIT_START handling.

While it is a good idea to disable KSM merging for all VMAs during
H_SVM_INIT_START, I am curious if you did observe an actual case of
ksm_madvise() failing which resulted in subsequent H_SVM_PAGE_IN
failing to migrate?

> 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 96 +-
>  1 file changed, 74 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
> b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 3d987b1..bfc3841 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -211,6 +211,65 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, 
> struct kvm *kvm,
>   return false;
>  }
>  
> +static int kvmppc_memslot_page_merge(struct kvm *kvm,
> + struct kvm_memory_slot *memslot, bool merge)
> +{
> + unsigned long gfn = memslot->base_gfn;
> + unsigned long end, start = gfn_to_hva(kvm, gfn);
> + int ret = 0;
> + struct vm_area_struct *vma;
> + int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
> +
> + if (kvm_is_error_hva(start))
> + return H_STATE;

This and other cases below seem to be a new return value from
H_SVM_INIT_START. May be update the documentation too along with
this patch?

> +
> + end = start + (memslot->npages << PAGE_SHIFT);
> +
> + down_write(&kvm->mm->mmap_sem);

When you rebase the patches against latest upstream you may want to
replace the above and other instances by mmap_write/read_lock().

> + do {
> + vma = find_vma_intersection(kvm->mm, start, end);
> + if (!vma) {
> + ret = H_STATE;
> + break;
> + }
> + ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +   merge_flag, &vma->vm_flags);
> + if (ret) {
> + ret = H_STATE;
> + break;
> + }
> + start = vma->vm_end + 1;
> + } while (end > vma->vm_end);
> +
> + up_write(&kvm->mm->mmap_sem);
> + return ret;
> +}
> +
> +static int __kvmppc_page_merge(struct kvm *kvm, bool merge)
> +{
> + struct kvm_memslots *slots;
> + struct kvm_memory_slot *memslot;
> + int ret = 0;
> +
> + slots = kvm_memslots(kvm);
> + kvm_for_each_memslot(memslot, slots) {
> + ret = kvmppc_memslot_page_merge(kvm, memslot, merge);
> + if (ret)
> + break;
> + }
> + return ret;
> +}
> +
> +static inline int kvmppc_disable_page_merge(struct kvm *kvm)
> +{
> + return __kvmppc_page_merge(kvm, false);
> +}
> +
> +static inline int kvmppc_enable_page_merge(struct kvm *kvm)
> +{
> + return __kvmppc_page_merge(kvm, true);
> +}
> +
>  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  {
>   struct kvm_memslots *slots;
> @@ -232,11 +291,18 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>   return H_AUTHORITY;
>  
>   srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> + /* disable page-merging for all memslot */
> + ret = kvmppc_disable_page_merge(kvm);
> + if (ret)
> + goto out;
> +
> + /* register the memslot */
>   slots = kvm_memslots(kvm);
>   kvm_for_each_memslot(memslot, slots) {
>   if (kvmppc_uvmem_slot_init(kvm, memslot)) {
>   ret = H_PARAMETER;
> - goto out;
> + break;
>   }
>   ret = uv_register_mem_slot(kvm->arch.lpid,
>  memslot->base_gfn << PAGE_SHIFT,
> @@ -245,9 +311,12 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>   if (ret < 0) {
>   kvmppc_uvmem_slot_free(kvm, memslot);
>   ret = H_PARAMETER;
> - goto out;
> + break;
>   }
>   }
> +
> + if (ret)
> + kvmppc_enable_page_merge(kvm);

Is there any use of enabling KSM merging in the failure path here?
Won't UV terminate the VM if H_SVM_INIT_START fails? If there is no need,
you can do away with some extra routines above.

>  out:
>   srcu_read_unlock(&kvm->srcu, srcu_idx);
>   return ret;
> @@ -384,7 +453,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
> gpa, struct kvm *kvm)
>   */
>  static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long 
> start,
>  unsigned long 
