[PATCH] KVM: PPC: Book3S HV: Fix decrementer migration

2022-08-16 Thread Fabiano Rosas
We used to have a workaround[1] for a hang during migration that was
made ineffective when we converted the decrementer expiry to be
relative to guest timebase.

The point of the workaround was that in the absence of an explicit
decrementer expiry value provided by userspace during migration, KVM
needs to initialize dec_expires to a value that will result in an
expired decrementer after subtracting the current guest timebase. That
stops the vcpu from hanging after migration due to a decrementer
that's too large.

Since dec_expires is now relative to the guest timebase, its
initialization needs to be guest timebase-relative as well; otherwise
we end up with a decrementer expiry that is still larger than the
guest timebase.

1- https://git.kernel.org/torvalds/c/5855564c8ab2

Fixes: 3c1a4322bba7 ("KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase")
Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.c | 18 ++++++++++++++++--
 arch/powerpc/kvm/powerpc.c   |  1 -
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 57d0835e56fd..917abda9e5ce 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2517,10 +2517,24 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
		r = set_vpa(vcpu, &vcpu->arch.dtl, addr, len);
		break;
break;
case KVM_REG_PPC_TB_OFFSET:
+   {
/* round up to multiple of 2^24 */
-   vcpu->arch.vcore->tb_offset =
-   ALIGN(set_reg_val(id, *val), 1UL << 24);
+   u64 tb_offset = ALIGN(set_reg_val(id, *val), 1UL << 24);
+
+   /*
+* Now that we know the timebase offset, update the
+* decrementer expiry with a guest timebase value. If
+* the userspace does not set DEC_EXPIRY, this ensures
+* a migrated vcpu at least starts with an expired
+* decrementer, which is better than a large one that
+* causes a hang.
+*/
+   if (!vcpu->arch.dec_expires && tb_offset)
+   vcpu->arch.dec_expires = get_tb() + tb_offset;
+
+   vcpu->arch.vcore->tb_offset = tb_offset;
break;
+   }
case KVM_REG_PPC_LPCR:
kvmppc_set_lpcr(vcpu, set_reg_val(id, *val), true);
break;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index fb1490761c87..757491dd6b7b 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -786,7 +786,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
	hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
-   vcpu->arch.dec_expires = get_tb();
 
 #ifdef CONFIG_KVM_EXIT_TIMING
	mutex_init(&vcpu->arch.exit_timing_lock);
-- 
2.35.3



Re: [PATCH v2 2/2] powerpc/kvm: don't crash on missing rng, and use darn

2022-06-24 Thread Fabiano Rosas
"Jason A. Donenfeld"  writes:

> On POWER8 systems that don't have ibm,power-rng available, a guest that
> ignores the KVM_CAP_PPC_HWRNG flag and calls H_RANDOM anyway will
> dereference a NULL pointer. And on machines with darn instead of
> ibm,power-rng, H_RANDOM won't work at all.
>
> This patch kills two birds with one stone, by routing H_RANDOM calls to
> ppc_md.get_random_seed, and doing the real mode check inside of it.
>
> Cc: sta...@vger.kernel.org # v4.1+
> Cc: Michael Ellerman 
> Fixes: e928e9cb3601 ("KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.")
> Signed-off-by: Jason A. Donenfeld 

Reviewed-by: Fabiano Rosas 



Re: [PATCH v2 1/2] powerpc/powernv: rename remaining rng powernv_ functions to pnv_

2022-06-24 Thread Fabiano Rosas
"Jason A. Donenfeld"  writes:

> The preferred nomenclature is pnv_, not powernv_, but rng.c used
> powernv_ for some reason, which isn't consistent with the rest. A recent
> commit added a few pnv_ functions to rng.c, making the file a bit of a
> mishmash. This commit just replaces the rest of them.
>
> Cc: Michael Ellerman 
> Fixes: f3eac426657 ("powerpc/powernv: wire up rng during setup_arch")
> Signed-off-by: Jason A. Donenfeld 

Reviewed-by: Fabiano Rosas 



[PATCH v2] KVM: PPC: Align pt_regs in kvm_vcpu_arch structure

2022-06-24 Thread Fabiano Rosas
The H_ENTER_NESTED hypercall receives as second parameter the address
of a region of memory containing the values for the nested guest
privileged registers. We currently use the pt_regs structure contained
within kvm_vcpu_arch for that end.

Most hypercalls that receive a memory address expect that region not
to cross a 4K page boundary. We want H_ENTER_NESTED to follow the
same pattern, so this patch ensures the pt_regs structure sits within
a single page.

Note: the pt_regs structure is currently 384 bytes in size, so
aligning to 512 is sufficient to ensure it will not cross a 4K page
and avoids punching too big a hole in struct kvm_vcpu_arch.

Signed-off-by: Fabiano Rosas 
Signed-off-by: Murilo Opsfelder Araújo 
---
v2:
 - updated commit message to inform the rationale for aligning to 512;

 - added Murilo's sign-off which I had forgotten, we worked on this
   together.
---
 arch/powerpc/include/asm/kvm_host.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 2909a88acd16..2c7219cef4ec 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -523,7 +523,11 @@ struct kvm_vcpu_arch {
struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
 #endif
 
-   struct pt_regs regs;
+   /*
+* This is passed along to the HV via H_ENTER_NESTED. Align to
+* prevent it crossing a real 4K page.
+*/
+   struct pt_regs regs __aligned(512);
 
struct thread_fp_state fp;
 
-- 
2.35.3



Re: [PATCH] powerpc/kvm: don't crash on missing rng, and use darn

2022-06-24 Thread Fabiano Rosas
"Jason A. Donenfeld"  writes:

> On POWER8 systems that don't have ibm,power-rng available, a guest that
> ignores the KVM_CAP_PPC_HWRNG flag and calls H_RANDOM anyway will
> dereference a NULL pointer. And on machines with darn instead of
> ibm,power-rng, H_RANDOM won't work at all.
>
> This patch kills two birds with one stone, by routing H_RANDOM calls to
> ppc_md.get_random_seed, and doing the real mode check inside of it.
>
> Cc: sta...@vger.kernel.org # v4.1+
> Cc: Michael Ellerman 
> Fixes: e928e9cb3601 ("KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.")
> Signed-off-by: Jason A. Donenfeld 
> ---
>
> This patch must be applied ontop of:
> 1) 
> https://github.com/linuxppc/linux/commit/f3eac426657d985b97c92fa5f7ae1d43f04721f3
> 2) https://lore.kernel.org/all/20220622102532.173393-1-ja...@zx2c4.com/
>
>
>  arch/powerpc/include/asm/archrandom.h |  5 
>  arch/powerpc/kvm/book3s_hv_builtin.c  |  5 ++--
>  arch/powerpc/platforms/powernv/rng.c  | 33 +++
>  3 files changed, 11 insertions(+), 32 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/archrandom.h 
> b/arch/powerpc/include/asm/archrandom.h
> index 11d4815841ab..3af27bb84a3d 100644
> --- a/arch/powerpc/include/asm/archrandom.h
> +++ b/arch/powerpc/include/asm/archrandom.h
> @@ -38,12 +38,7 @@ static inline bool __must_check 
> arch_get_random_seed_int(unsigned int *v)
>  #endif /* CONFIG_ARCH_RANDOM */
>
>  #ifdef CONFIG_PPC_POWERNV
> -int pnv_hwrng_present(void);
>  int pnv_get_random_long(unsigned long *v);
> -int pnv_get_random_real_mode(unsigned long *v);
> -#else
> -static inline int pnv_hwrng_present(void) { return 0; }
> -static inline int pnv_get_random_real_mode(unsigned long *v) { return 0; }
>  #endif
>
>  #endif /* _ASM_POWERPC_ARCHRANDOM_H */
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
> b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 799d40c2ab4f..1c6672826db5 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -176,13 +176,14 @@ EXPORT_SYMBOL_GPL(kvmppc_hcall_impl_hv_realmode);
>
>  int kvmppc_hwrng_present(void)
>  {
> - return pnv_hwrng_present();
> + return ppc_md.get_random_seed != NULL;
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
>
>  long kvmppc_rm_h_random(struct kvm_vcpu *vcpu)
>  {
> - if (pnv_get_random_real_mode(&vcpu->arch.regs.gpr[4]))
> + if (ppc_md.get_random_seed &&
> + ppc_md.get_random_seed(&vcpu->arch.regs.gpr[4]))
>   return H_SUCCESS;

This is the same as arch_get_random_seed_long, perhaps you could use it
instead.

Otherwise, the archrandom.h include is not needed anymore and could be
replaced with #include  for ppc_md.

>
>   return H_HARDWARE;
> diff --git a/arch/powerpc/platforms/powernv/rng.c 
> b/arch/powerpc/platforms/powernv/rng.c
> index 868bb9777425..c748567cd47e 100644
> --- a/arch/powerpc/platforms/powernv/rng.c
> +++ b/arch/powerpc/platforms/powernv/rng.c
> @@ -29,15 +29,6 @@ struct pnv_rng {
>
>  static DEFINE_PER_CPU(struct pnv_rng *, pnv_rng);
>
> -int pnv_hwrng_present(void)
> -{
> - struct pnv_rng *rng;
> -
> - rng = get_cpu_var(pnv_rng);
> - put_cpu_var(rng);
> - return rng != NULL;
> -}
> -
>  static unsigned long rng_whiten(struct pnv_rng *rng, unsigned long val)
>  {
>   unsigned long parity;
> @@ -58,17 +49,6 @@ static unsigned long rng_whiten(struct pnv_rng *rng, 
> unsigned long val)
>   return val;
>  }
>
> -int pnv_get_random_real_mode(unsigned long *v)
> -{
> - struct pnv_rng *rng;
> -
> - rng = raw_cpu_read(pnv_rng);
> -
> - *v = rng_whiten(rng, __raw_rm_readq(rng->regs_real));
> -
> - return 1;
> -}
> -
>  static int pnv_get_random_darn(unsigned long *v)
>  {
>   unsigned long val;
> @@ -105,11 +85,14 @@ int pnv_get_random_long(unsigned long *v)
>  {
>   struct pnv_rng *rng;
>
> - rng = get_cpu_var(pnv_rng);
> -
> - *v = rng_whiten(rng, in_be64(rng->regs));
> -
> - put_cpu_var(rng);
> + if (mfmsr() & MSR_DR) {
> + rng = raw_cpu_read(pnv_rng);
> + *v = rng_whiten(rng, __raw_rm_readq(rng->regs_real));
> + } else {
> + rng = get_cpu_var(pnv_rng);
> + *v = rng_whiten(rng, in_be64(rng->regs));
> + put_cpu_var(rng);
> + }
>
>   return 1;
>  }


[PATCH] KVM: PPC: Book3S HV: tracing: Add missing hcall names

2022-06-14 Thread Fabiano Rosas
The kvm_trace_symbol_hcall macro is missing several of the hypercalls
defined in hvcall.h.

Add the most common ones that are issued during guest lifetime,
including the ones that are only used by QEMU and SLOF.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/include/asm/hvcall.h |  8 
 arch/powerpc/kvm/trace_hv.h   | 21 -
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index d92a20a85395..1d454c70e7c6 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -350,6 +350,14 @@
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS 0xf000
 
+/*
+ * Platform specific hcalls, used by QEMU/SLOF. These are ignored by
+ * KVM and only kept here so we can identify them during tracing.
+ */
+#define H_LOGICAL_MEMOP  0xF001
+#define H_CAS			0xF002
+#define H_UPDATE_DT		0xF003
+
 /* "Platform specific hcalls", provided by PHYP */
 #define H_GET_24X7_CATALOG_PAGE0xF078
 #define H_GET_24X7_DATA0xF07C
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 32e2cb5811cc..8d57c8428531 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -94,6 +94,7 @@
{H_GET_HCA_INFO,"H_GET_HCA_INFO"}, \
{H_GET_PERF_COUNT,  "H_GET_PERF_COUNT"}, \
{H_MANAGE_TRACE,"H_MANAGE_TRACE"}, \
+   {H_GET_CPU_CHARACTERISTICS, "H_GET_CPU_CHARACTERISTICS"}, \
{H_FREE_LOGICAL_LAN_BUFFER, "H_FREE_LOGICAL_LAN_BUFFER"}, \
{H_QUERY_INT_STATE, "H_QUERY_INT_STATE"}, \
{H_POLL_PENDING,"H_POLL_PENDING"}, \
@@ -125,7 +126,25 @@
{H_COP, "H_COP"}, \
{H_GET_MPP_X,   "H_GET_MPP_X"}, \
{H_SET_MODE,"H_SET_MODE"}, \
-   {H_RTAS,"H_RTAS"}
+   {H_REGISTER_PROC_TBL,   "H_REGISTER_PROC_TBL"}, \
+   {H_QUERY_VAS_CAPABILITIES,  "H_QUERY_VAS_CAPABILITIES"}, \
+   {H_INT_GET_SOURCE_INFO, "H_INT_GET_SOURCE_INFO"}, \
+   {H_INT_SET_SOURCE_CONFIG,   "H_INT_SET_SOURCE_CONFIG"}, \
+   {H_INT_GET_QUEUE_INFO,  "H_INT_GET_QUEUE_INFO"}, \
+   {H_INT_SET_QUEUE_CONFIG,"H_INT_SET_QUEUE_CONFIG"}, \
+   {H_INT_ESB, "H_INT_ESB"}, \
+   {H_INT_RESET,   "H_INT_RESET"}, \
+   {H_RPT_INVALIDATE,  "H_RPT_INVALIDATE"}, \
+   {H_RTAS,"H_RTAS"}, \
+   {H_LOGICAL_MEMOP,   "H_LOGICAL_MEMOP"}, \
+   {H_CAS, "H_CAS"}, \
+   {H_UPDATE_DT,   "H_UPDATE_DT"}, \
+   {H_GET_PERF_COUNTER_INFO,   "H_GET_PERF_COUNTER_INFO"}, \
+   {H_SET_PARTITION_TABLE, "H_SET_PARTITION_TABLE"}, \
+   {H_ENTER_NESTED,"H_ENTER_NESTED"}, \
+   {H_TLB_INVALIDATE,  "H_TLB_INVALIDATE"}, \
+   {H_COPY_TOFROM_GUEST,   "H_COPY_TOFROM_GUEST"}
+
 
 #define kvm_trace_symbol_kvmret \
{RESUME_GUEST,  "RESUME_GUEST"}, \
-- 
2.35.3



Re: [PATCH 4/4] powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTING

2022-05-27 Thread Fabiano Rosas
Nicholas Piggin  writes:

> CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not implement
> stolen time accounting. Implement it with the paravirt time
> accounting option.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  .../admin-guide/kernel-parameters.txt |  6 +++---
>  arch/powerpc/include/asm/paravirt.h   | 12 
>  arch/powerpc/platforms/pseries/Kconfig|  8 
>  arch/powerpc/platforms/pseries/lpar.c | 11 +++
>  arch/powerpc/platforms/pseries/setup.c| 19 +++
>  5 files changed, 53 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 3f1cc5e317ed..855fc7b02261 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3604,9 +3604,9 @@
>   [X86,PV_OPS] Disable paravirtualized VMware scheduler
>   clock and use the default one.
>
> - no-steal-acc[X86,PV_OPS,ARM64] Disable paravirtualized steal time
> - accounting. steal time is computed, but won't
> - influence scheduler behaviour
> + no-steal-acc[X86,PV_OPS,ARM64,PPC/PSERIES] Disable paravirtualized
> + steal time accounting. steal time is computed, but
> + won't influence scheduler behaviour
>
>   nolapic [X86-32,APIC] Do not enable or use the local APIC.
>
> diff --git a/arch/powerpc/include/asm/paravirt.h 
> b/arch/powerpc/include/asm/paravirt.h
> index eb7df559ae74..f5ba1a3c41f8 100644
> --- a/arch/powerpc/include/asm/paravirt.h
> +++ b/arch/powerpc/include/asm/paravirt.h
> @@ -21,6 +21,18 @@ static inline bool is_shared_processor(void)
>   return static_branch_unlikely(&shared_processor);
>  }
>
> +#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> +extern struct static_key paravirt_steal_enabled;
> +extern struct static_key paravirt_steal_rq_enabled;
> +
> +u64 pseries_paravirt_steal_clock(int cpu);
> +
> +static inline u64 paravirt_steal_clock(int cpu)
> +{
> + return pseries_paravirt_steal_clock(cpu);
> +}
> +#endif
> +
>  /* If bit 0 is set, the cpu has been ceded, conferred, or preempted */
>  static inline u32 yield_count_of(int cpu)
>  {
> diff --git a/arch/powerpc/platforms/pseries/Kconfig 
> b/arch/powerpc/platforms/pseries/Kconfig
> index f7fd91d153a4..d4306ebdca5e 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -24,13 +24,21 @@ config PPC_PSERIES
>   select SWIOTLB
>   default y
>
> +config PARAVIRT
> + bool
> +

In file included from ../kernel/sched/build_utility.c:53:
../kernel/sched/sched.h:87:11: fatal error: asm/paravirt_api_clock.h: No such file or directory
   87 | # include <asm/paravirt_api_clock.h>

$ find . -name paravirt_api_clock.h
./arch/arm64/include/asm/paravirt_api_clock.h
./arch/x86/include/asm/paravirt_api_clock.h
./arch/arm/include/asm/paravirt_api_clock.h

>  config PARAVIRT_SPINLOCKS
>   bool
>
> +config PARAVIRT_TIME_ACCOUNTING
> + select PARAVIRT
> + bool
> +
>  config PPC_SPLPAR
>   bool "Support for shared-processor logical partitions"
>   depends on PPC_PSERIES
>   select PARAVIRT_SPINLOCKS if PPC_QUEUED_SPINLOCKS
> + select PARAVIRT_TIME_ACCOUNTING if VIRT_CPU_ACCOUNTING_GEN
>   default y
>   help
> Enabling this option will make the kernel run more efficiently


Re: [PATCH 3/4] KVM: PPC: Book3S HV: Implement scheduling wait interval counters in the VPA

2022-05-27 Thread Fabiano Rosas
Nicholas Piggin  writes:

> PAPR specifies accumulated virtual processor wait intervals that relate
> to partition scheduling interval times. Implement these counters in the
> same way as they are reported by dtl.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 2/4] powerpc/pseries: Add wait interval counters to VPA

2022-05-27 Thread Fabiano Rosas
Nicholas Piggin  writes:

> The hypervisor exposes accumulated partition scheduling interval times
> in the VPA (lppaca). These can be used to implement a simple stolen time
> in the guest without complex and costly dtl scanning.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/lppaca.h | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/lppaca.h 
> b/arch/powerpc/include/asm/lppaca.h
> index c390ec377bae..34d44cb17c87 100644
> --- a/arch/powerpc/include/asm/lppaca.h
> +++ b/arch/powerpc/include/asm/lppaca.h
> @@ -104,14 +104,18 @@ struct lppaca {
>   volatile __be32 dispersion_count; /* dispatch changed physical cpu */
>   volatile __be64 cmo_faults; /* CMO page fault count */
>   volatile __be64 cmo_fault_time; /* CMO page fault time */
> - u8  reserved10[104];
> + u8  reserved10[64]; /* [S]PURR expropriated/donated */
> + volatile __be64 enqueue_dispatch_tb; /* Total TB enqueue->dispatch */
> + volatile __be64 ready_enqueue_tb; /* Total TB ready->enqueue */
> + volatile __be64 wait_ready_tb;  /* Total TB wait->ready */

This last one is unused, but I assume you are adding it anyway because
it could later be added to lparcfg. So:

Reviewed-by: Fabiano Rosas 


Re: [PATCH 1/4] KVM: PPC: Book3S HV P9: Restore stolen time logging in dtl

2022-05-27 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Stolen time logging in dtl was removed from the P9 path, so guests had
> no stolen time accounting. Add it back in a simpler way that still
> avoids locks and per-core accounting code.
>
> Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kvm/book3s_hv.c | 49 +---
>  1 file changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 6fa518f6501d..0a0835edb64a 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -248,6 +248,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu 
> *vcpu)
>
>  /*
>   * We use the vcpu_load/put functions to measure stolen time.
> + *
>   * Stolen time is counted as time when either the vcpu is able to
>   * run as part of a virtual core, but the task running the vcore
>   * is preempted or sleeping, or when the vcpu needs something done
> @@ -277,6 +278,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu 
> *vcpu)
>   * lock.  The stolen times are measured in units of timebase ticks.
>   * (Note that the != TB_NIL checks below are purely defensive;
>   * they should never fail.)
> + *
> + * The POWER9 path is simpler, one vcpu per virtual core so the
> + * former case does not exist. If a vcpu is preempted when it is
> + * BUSY_IN_HOST and not ceded or otherwise blocked, then accumulate
> + * the stolen cycles in busy_stolen. RUNNING is not a preemptible
> + * state in the P9 path.

Do you mean RUNNABLE? The only RUNNING state I see is in relation to the
vcore.



[PATCH 5/5] KVM: PPC: Book3S HV: Provide more detailed timings for P9 entry path

2022-05-25 Thread Fabiano Rosas
Alter the data collection points for the debug timing code in the P9
path to be more in line with what the code does. The points where we
accumulate time are now the following:

vcpu_entry: From vcpu_run_hv entry until the start of the inner loop;

guest_entry: From the start of the inner loop until the guest entry in
 asm;

in_guest: From the guest entry in asm until the return to KVM C code;

guest_exit: From the return into KVM C code until the corresponding
hypercall/page fault handling or re-entry into the guest;

hypercall: Time spent handling hcalls in the kernel (hcalls can go to
   QEMU, not accounted here);

page_fault: Time spent handling page faults;

vcpu_exit: vcpu_run_hv exit (almost no code here currently).

Like before, these are exposed in debugfs in a file called
"timings". There are four values:

- number of occurrences of the accumulation point;
- total time the vcpu spent in the phase in ns;
- shortest time the vcpu spent in the phase in ns;
- longest time the vcpu spent in the phase in ns;

===
Before:

  rm_entry: 53132 16793518 256 4060
  rm_intr: 53132 2125914 22 340
  rm_exit: 53132 24108344 374 2180
  guest: 53132 40980507996 404 9997650
  cede: 0 0 0 0

After:

  vcpu_entry: 34637 7716108 178 4416
  guest_entry: 52414 49365608 324 747542
  in_guest: 52411 40828715840 258 9997480
  guest_exit: 52410 19681717182 826 102496674
  vcpu_exit: 34636 1744462 38 182
  hypercall: 45712 22878288 38 1307962
  page_fault: 992 04034 568 168688

  With just one instruction (hcall):

  vcpu_entry: 1 942 942 942
  guest_entry: 1 4044 4044 4044
  in_guest: 1 1540 1540 1540
  guest_exit: 1 3542 3542 3542
  vcpu_exit: 1 80 80 80
  hypercall: 0 0 0 0
  page_fault: 0 0 0 0
===

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/include/asm/kvm_host.h   | 12 +++-
 arch/powerpc/kvm/Kconfig  |  9 +
 arch/powerpc/kvm/book3s_hv.c  | 23 ++-
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 14 --
 4 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 37f03665bfa2..de2b226aa350 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -827,11 +827,13 @@ struct kvm_vcpu_arch {
struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
u64 cur_tb_start;   /* when it started */
 #ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING
-   struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
-   struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
-   struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
-   struct kvmhv_tb_accumulator guest_time; /* guest execution */
-   struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+   struct kvmhv_tb_accumulator vcpu_entry;
+   struct kvmhv_tb_accumulator vcpu_exit;
+   struct kvmhv_tb_accumulator in_guest;
+   struct kvmhv_tb_accumulator hcall;
+   struct kvmhv_tb_accumulator pg_fault;
+   struct kvmhv_tb_accumulator guest_entry;
+   struct kvmhv_tb_accumulator guest_exit;
 #else
struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 191347f44731..cedf1e0f50e1 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -135,10 +135,11 @@ config KVM_BOOK3S_HV_P9_TIMING
select KVM_BOOK3S_HV_EXIT_TIMING
depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS
help
- Calculate time taken for each vcpu in various parts of the
- code. The total, minimum and maximum times in nanoseconds
- together with the number of executions are reported in debugfs in
- kvm/vm#/vcpu#/timings.
+ Calculate time taken for each vcpu during vcpu entry and
+ exit, time spent inside the guest and time spent handling
+ hypercalls and page faults. The total, minimum and maximum
+ times in nanoseconds together with the number of executions
+ are reported in debugfs in kvm/vm#/vcpu#/timings.
 
  If unsure, say N.
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 69a6b40d58b9..f485632f247a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2654,11 +2654,13 @@ static struct debugfs_timings_element {
size_t offset;
 } timings[] = {
 #ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING
-   {"rm_entry",offsetof(struct kvm_vcpu, arch.rm_entry)},
-   {"rm_intr", offsetof(struct kvm_vcpu, arch.rm_intr)},
-   {"rm_exit", offsetof(struct kvm_vcpu, arch.rm_exit)},
-   {"guest",   offsetof(struct kvm_vcpu, arch.guest_time)},
- 

[PATCH 4/5] KVM: PPC: Book3S HV: Expose timing functions to module code

2022-05-25 Thread Fabiano Rosas
The next patch adds new timing points to the P9 entry path, some of
which are in the module code, so we need to export the timing
functions.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.h  | 10 ++
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 11 ++-
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.h b/arch/powerpc/kvm/book3s_hv.h
index 6b7f07d9026b..2f2e59d7d433 100644
--- a/arch/powerpc/kvm/book3s_hv.h
+++ b/arch/powerpc/kvm/book3s_hv.h
@@ -40,3 +40,13 @@ void switch_pmu_to_guest(struct kvm_vcpu *vcpu,
struct p9_host_os_sprs *host_os_sprs);
 void switch_pmu_to_host(struct kvm_vcpu *vcpu,
struct p9_host_os_sprs *host_os_sprs);
+
+#ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING
+void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next);
+#define start_timing(vcpu, next) accumulate_time(vcpu, next)
+#define end_timing(vcpu) accumulate_time(vcpu, NULL)
+#else
+#define accumulate_time(vcpu, next) do {} while (0)
+#define start_timing(vcpu, next) do {} while (0)
+#define end_timing(vcpu) do {} while (0)
+#endif
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index f8ce473149b7..8b2a9a360e4e 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -438,7 +438,7 @@ void restore_p9_host_os_sprs(struct kvm_vcpu *vcpu,
 EXPORT_SYMBOL_GPL(restore_p9_host_os_sprs);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING
-static void __accumulate_time(struct kvm_vcpu *vcpu, struct 
kvmhv_tb_accumulator *next)
+void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
struct kvmhv_tb_accumulator *curr;
@@ -468,14 +468,7 @@ static void __accumulate_time(struct kvm_vcpu *vcpu, 
struct kvmhv_tb_accumulator
smp_wmb();
curr->seqcount = seq + 2;
 }
-
-#define start_timing(vcpu, next) __accumulate_time(vcpu, next)
-#define end_timing(vcpu) __accumulate_time(vcpu, NULL)
-#define accumulate_time(vcpu, next) __accumulate_time(vcpu, next)
-#else
-#define start_timing(vcpu, next) do {} while (0)
-#define end_timing(vcpu) do {} while (0)
-#define accumulate_time(vcpu, next) do {} while (0)
+EXPORT_SYMBOL_GPL(accumulate_time);
 #endif
 
 static inline u64 mfslbv(unsigned int idx)
-- 
2.35.1



[PATCH 3/5] KVM: PPC: Book3S HV: Decouple the debug timing from the P8 entry path

2022-05-25 Thread Fabiano Rosas
We are currently doing the timing for debug purposes of the P9 entry
path using the accumulators and terminology defined by the old entry
path for P8 machines.

Not only are the "real-mode" and "napping" mentions out of place for
the P9 Radix entry path, but we also cannot change them, because the
timing code is coupled to the structures defined in struct
kvm_vcpu_arch.

Add a new CONFIG_KVM_BOOK3S_HV_P9_TIMING to enable the timing code for
the P9 entry path. For now, just add the new CONFIG and duplicate the
structures. A subsequent patch will add the P9 changes.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/include/asm/kvm_host.h   |  8 
 arch/powerpc/kvm/Kconfig  | 14 +-
 arch/powerpc/kvm/book3s_hv.c  | 13 +++--
 arch/powerpc/kvm/book3s_hv_p9_entry.c |  2 +-
 4 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index faf301d0dec0..37f03665bfa2 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -826,11 +826,19 @@ struct kvm_vcpu_arch {
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
struct kvmhv_tb_accumulator *cur_activity;  /* What we're timing */
u64 cur_tb_start;   /* when it started */
+#ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING
struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
struct kvmhv_tb_accumulator guest_time; /* guest execution */
struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+#else
+   struct kvmhv_tb_accumulator rm_entry;   /* real-mode entry code */
+   struct kvmhv_tb_accumulator rm_intr;/* real-mode intr handling */
+   struct kvmhv_tb_accumulator rm_exit;/* real-mode exit code */
+   struct kvmhv_tb_accumulator guest_time; /* guest execution */
+   struct kvmhv_tb_accumulator cede_time;  /* time napping inside guest */
+#endif
 #endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
 };
 
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 73f8277df7d1..191347f44731 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -130,10 +130,22 @@ config KVM_BOOK3S_64_PR
 config KVM_BOOK3S_HV_EXIT_TIMING
bool
 
+config KVM_BOOK3S_HV_P9_TIMING
+   bool "Detailed timing for the P9 entry point"
+   select KVM_BOOK3S_HV_EXIT_TIMING
+   depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS
+   help
+ Calculate time taken for each vcpu in various parts of the
+ code. The total, minimum and maximum times in nanoseconds
+ together with the number of executions are reported in debugfs in
+ kvm/vm#/vcpu#/timings.
+
+ If unsure, say N.
+
 config KVM_BOOK3S_HV_P8_TIMING
bool "Detailed timing for hypervisor real-mode code (for POWER8)"
select KVM_BOOK3S_HV_EXIT_TIMING
-   depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS
-   depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS && !KVM_BOOK3S_HV_P9_TIMING
help
  Calculate time taken for each vcpu in the real-mode guest entry,
  exit, and interrupt handling code, plus time spent in the guest
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6fa518f6501d..69a6b40d58b9 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2653,11 +2653,19 @@ static struct debugfs_timings_element {
const char *name;
size_t offset;
 } timings[] = {
+#ifdef CONFIG_KVM_BOOK3S_HV_P9_TIMING
{"rm_entry",offsetof(struct kvm_vcpu, arch.rm_entry)},
{"rm_intr", offsetof(struct kvm_vcpu, arch.rm_intr)},
{"rm_exit", offsetof(struct kvm_vcpu, arch.rm_exit)},
{"guest",   offsetof(struct kvm_vcpu, arch.guest_time)},
{"cede",offsetof(struct kvm_vcpu, arch.cede_time)},
+#else
+   {"rm_entry",offsetof(struct kvm_vcpu, arch.rm_entry)},
+   {"rm_intr", offsetof(struct kvm_vcpu, arch.rm_intr)},
+   {"rm_exit", offsetof(struct kvm_vcpu, arch.rm_exit)},
+   {"guest",   offsetof(struct kvm_vcpu, arch.guest_time)},
+   {"cede",offsetof(struct kvm_vcpu, arch.cede_time)},
+#endif
 };
 
 #define N_TIMINGS  (ARRAY_SIZE(timings))
@@ -2776,8 +2784,9 @@ static const struct file_operations debugfs_timings_ops = 
{
 /* Create a debugfs directory for the vcpu */
 static int kvmppc_arch_create_vcpu_debugfs_hv(struct kvm_vcpu *vcpu, struct 
dentry *debugfs_dentry)
 {
-   debugfs_create_file("timings", 0444, debugfs_dentry, vcpu,
-   

[PATCH 2/5] KVM: PPC: Book3S HV: Add a new config for P8 debug timing

2022-05-25 Thread Fabiano Rosas
Turn the existing Kconfig KVM_BOOK3S_HV_EXIT_TIMING into
KVM_BOOK3S_HV_P8_TIMING in preparation for the addition of a new
config for P9 timings.

This applies only to P8 code, the generic timing code is still kept
under KVM_BOOK3S_HV_EXIT_TIMING.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kvm/Kconfig|  6 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 24 
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index eec536aef83a..8c10f536e478 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -379,7 +379,7 @@ int main(void)
OFFSET(VCPU_SPRG2, kvm_vcpu, arch.shregs.sprg2);
OFFSET(VCPU_SPRG3, kvm_vcpu, arch.shregs.sprg3);
 #endif
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
OFFSET(VCPU_TB_RMENTRY, kvm_vcpu, arch.rm_entry);
OFFSET(VCPU_TB_RMINTR, kvm_vcpu, arch.rm_intr);
OFFSET(VCPU_TB_RMEXIT, kvm_vcpu, arch.rm_exit);
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index ddd88179110a..73f8277df7d1 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -128,7 +128,11 @@ config KVM_BOOK3S_64_PR
  and system calls on the host.
 
 config KVM_BOOK3S_HV_EXIT_TIMING
-   bool "Detailed timing for hypervisor real-mode code"
+   bool
+
+config KVM_BOOK3S_HV_P8_TIMING
+   bool "Detailed timing for hypervisor real-mode code (for POWER8)"
+   select KVM_BOOK3S_HV_EXIT_TIMING
depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS
help
  Calculate time taken for each vcpu in the real-mode guest entry,
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d185dee26026..c34932e31dcd 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -229,14 +229,14 @@ kvm_novcpu_wakeup:
cmpdi   r4, 0
beq kvmppc_primary_no_guest
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
addir3, r4, VCPU_TB_RMENTRY
bl  kvmhv_start_timing
 #endif
b   kvmppc_got_guest
 
 kvm_novcpu_exit:
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
ld  r4, HSTATE_KVM_VCPU(r13)
cmpdi   r4, 0
beq 13f
@@ -515,7 +515,7 @@ kvmppc_hv_entry:
li  r6, KVM_GUEST_MODE_HOST_HV
stb r6, HSTATE_IN_GUEST(r13)
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
/* Store initial timestamp */
cmpdi   r4, 0
beq 1f
@@ -886,7 +886,7 @@ fast_guest_return:
li  r9, KVM_GUEST_MODE_GUEST_HV
stb r9, HSTATE_IN_GUEST(r13)
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
/* Accumulate timing */
addir3, r4, VCPU_TB_GUEST
bl  kvmhv_accumulate_time
@@ -937,7 +937,7 @@ secondary_too_late:
cmpdi   r4, 0
beq 11f
stw r12, VCPU_TRAP(r4)
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
addir3, r4, VCPU_TB_RMEXIT
bl  kvmhv_accumulate_time
 #endif
@@ -951,7 +951,7 @@ hdec_soon:
li  r12, BOOK3S_INTERRUPT_HV_DECREMENTER
 12:stw r12, VCPU_TRAP(r4)
mr  r9, r4
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
addir3, r4, VCPU_TB_RMEXIT
bl  kvmhv_accumulate_time
 #endif
@@ -1048,7 +1048,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
li  r0, MSR_RI
mtmsrd  r0, 1
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
addir3, r9, VCPU_TB_RMINTR
mr  r4, r9
bl  kvmhv_accumulate_time
@@ -1127,7 +1127,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 guest_exit_cont:   /* r9 = vcpu, r12 = trap, r13 = paca */
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
addir3, r9, VCPU_TB_RMEXIT
mr  r4, r9
bl  kvmhv_accumulate_time
@@ -1487,7 +1487,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
mtspr   SPRN_LPCR,r8
isync
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
/* Finish timing, if we have a vcpu */
ld  r4, HSTATE_KVM_VCPU(r13)
cmpdi   r4, 0
@@ -2155,7 +2155,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
ld  r4, HSTATE_KVM_VCPU(r13)
std r3, VCPU_DEC_EXPIRES(r4)
 
-#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
+#ifdef CONFIG_KVM_BOOK3S_HV_P8_TIMING
ld  r4, HSTATE_KVM_VCPU(r13)
addir3, r4, VCPU_TB_CEDE
bl  kvmhv_accumulate_time
@@ -2223,7 +2223,7 @@ kvm_end_cede:
/* get vcpu p

[PATCH 1/5] KVM: PPC: Book3S HV: Fix "rm_exit" entry in debugfs timings

2022-05-25 Thread Fabiano Rosas
At debugfs/kvm//vcpu0/timings we show how long each part of the
code takes to run:

$ cat /sys/kernel/debug/kvm/*-*/vcpu0/timings
rm_entry: 123785 49398892 118 4898
rm_intr: 123780 6075890 22 390
rm_exit: 0 0 0 0 <-- NOK
guest: 123780 46732919988 402 9997638
cede: 0 0 0 0<-- OK, no cede napping in P9

The "rm_exit" entry always shows zero because it is the last timing
point and end_timing does not increment the previous entry's counter.

We can fix it by calling accumulate_time again instead of end_timing,
so that the counter gets incremented. The rest of the arithmetic can
be ignored because there are no timing points after this one and the
accumulators are reset before the next round.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index a28e5b3daabd..f7591b6c92d1 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -438,15 +438,6 @@ void restore_p9_host_os_sprs(struct kvm_vcpu *vcpu,
 EXPORT_SYMBOL_GPL(restore_p9_host_os_sprs);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
-static void __start_timing(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator 
*next)
-{
-   struct kvmppc_vcore *vc = vcpu->arch.vcore;
-   u64 tb = mftb() - vc->tb_offset_applied;
-
-   vcpu->arch.cur_activity = next;
-   vcpu->arch.cur_tb_start = tb;
-}
-
 static void __accumulate_time(struct kvm_vcpu *vcpu, struct 
kvmhv_tb_accumulator *next)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
@@ -478,8 +469,8 @@ static void __accumulate_time(struct kvm_vcpu *vcpu, struct 
kvmhv_tb_accumulator
curr->seqcount = seq + 2;
 }
 
-#define start_timing(vcpu, next) __start_timing(vcpu, next)
-#define end_timing(vcpu) __start_timing(vcpu, NULL)
+#define start_timing(vcpu, next) __accumulate_time(vcpu, next)
+#define end_timing(vcpu) __accumulate_time(vcpu, NULL)
 #define accumulate_time(vcpu, next) __accumulate_time(vcpu, next)
 #else
 #define start_timing(vcpu, next) do {} while (0)
-- 
2.35.1



[PATCH 0/5] KVM: PPC: Book3S HV: Update debug timing code

2022-05-25 Thread Fabiano Rosas
We have some debug information at /sys/kernel/debug/kvm//vcpu#/timings
which shows the time it takes to run various parts of the code.

That infrastructure was written in the P8 timeframe and wasn't updated
along with the guest entry point changes for P9.

Ideally we would be able to just add new/different accounting points
to the code as it changes over time, but since the P8 and P9 entry
points are different code paths, we first need to separate them from
each other. This series alters the KVM Kconfig to make that
distinction.

Currently:
CONFIG_KVM_BOOK3S_HV_EXIT_TIMING - timing infrastructure in asm (P8 only)
   timing infrastructure in C (P9 only)
   generic timing variables (P8/P9)
   debugfs code
   timing points for P8

After this series:
CONFIG_KVM_BOOK3S_HV_EXIT_TIMING - generic timing variables (P8/P9)
   debugfs code

CONFIG_KVM_BOOK3S_HV_P8_TIMING - timing infrastructure in asm (P8 only)
 timing points for P8

CONFIG_KVM_BOOK3S_HV_P9_TIMING - timing infrastructure in C (P9 only)
 timing points for P9

The new Kconfig rules are:

a) CONFIG_KVM_BOOK3S_HV_P8_TIMING selects CONFIG_KVM_BOOK3S_HV_EXIT_TIMING,
   resulting in the previous behavior. Tested on P8.

b) CONFIG_KVM_BOOK3S_HV_P9_TIMING selects CONFIG_KVM_BOOK3S_HV_EXIT_TIMING,
   resulting in the new behavior. Tested on P9.

c) CONFIG_KVM_BOOK3S_HV_P8_TIMING and CONFIG_KVM_BOOK3S_HV_P9_TIMING
   are mutually exclusive. If both are set, P9 takes precedence.

Fabiano Rosas (5):
  KVM: PPC: Book3S HV: Fix "rm_exit" entry in debugfs timings
  KVM: PPC: Book3S HV: Add a new config for P8 debug timing
  KVM: PPC: Book3S HV: Decouple the debug timing from the P8 entry path
  KVM: PPC: Book3S HV: Expose timing functions to module code
  KVM: PPC: Book3S HV: Provide more detailed timings for P9 entry path

 arch/powerpc/include/asm/kvm_host.h | 10 +++
 arch/powerpc/kernel/asm-offsets.c   |  2 +-
 arch/powerpc/kvm/Kconfig| 19 -
 arch/powerpc/kvm/book3s_hv.c| 26 --
 arch/powerpc/kvm/book3s_hv.h| 10 +++
 arch/powerpc/kvm/book3s_hv_p9_entry.c   | 36 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 24 -
 7 files changed, 82 insertions(+), 45 deletions(-)

-- 
2.35.1



[PATCH] KVM: PPC: Align pt_regs in kvm_vcpu_arch structure

2022-05-25 Thread Fabiano Rosas
The H_ENTER_NESTED hypercall receives as second parameter the address
of a region of memory containing the values for the nested guest
privileged registers. We currently use the pt_regs structure
contained within kvm_vcpu_arch to that end.

Most hypercalls that receive a memory address expect that region not
to cross a 4k page boundary. We want H_ENTER_NESTED to follow the
same pattern, so this patch ensures the pt_regs structure sits within
a single page.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/include/asm/kvm_host.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index faf301d0dec0..87eba60f2920 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -519,7 +519,11 @@ struct kvm_vcpu_arch {
struct kvmppc_book3s_shadow_vcpu *shadow_vcpu;
 #endif
 
-   struct pt_regs regs;
+   /*
+* This is passed along to the HV via H_ENTER_NESTED. Align to
+* prevent it crossing a real 4K page.
+*/
+   struct pt_regs regs __aligned(512);
 
struct thread_fp_state fp;
 
-- 
2.35.1



Re: [PATCH kernel] KVM: PPC: Book3s: PR: Enable default TCE hypercalls

2022-05-06 Thread Fabiano Rosas
Alexey Kardashevskiy  writes:

> When KVM_CAP_PPC_ENABLE_HCALL was introduced, H_GET_TCE and H_PUT_TCE
> were already implemented and enabled by default; however H_GET_TCE
> was missed out on PR KVM (probably because the handler was in
> the real mode code at the time).
>
> This enables H_GET_TCE by default. While at this, this wraps
> the checks in ifdef CONFIG_SPAPR_TCE_IOMMU just like HV KVM.
>
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: Fabiano Rosas 


Re: [PATCH v2] powerpc/rtas: Keep MSR[RI] set when calling RTAS

2022-05-03 Thread Fabiano Rosas
Michael Ellerman  writes:

>> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
>> index 9581906b5ee9..65cb14b56f8d 100644
>> --- a/arch/powerpc/kernel/entry_64.S
>> +++ b/arch/powerpc/kernel/entry_64.S
>> @@ -330,22 +330,18 @@ _GLOBAL(enter_rtas)
>>  clrldi  r4,r4,2 /* convert to realmode address */
>>  mtlrr4
>>  
>> -li  r0,0
>> -ori r0,r0,MSR_EE|MSR_SE|MSR_BE|MSR_RI
>> -andcr0,r6,r0
>> -
>> -li  r9,1
>> -rldicr  r9,r9,MSR_SF_LG,(63-MSR_SF_LG)
>> -ori r9,r9,MSR_IR|MSR_DR|MSR_FE0|MSR_FE1|MSR_FP|MSR_RI|MSR_LE
>> -andcr6,r0,r9
>  
> One advantage of the old method is it can adapt to new MSR bits being
> set by the kernel.
>
> For example we used to use RTAS on powernv, and this code didn't need
> updating to cater to MSR_HV being set. We will probably never use RTAS
> on bare-metal again, so that's OK.
>
> But your change might break secure virtual machines, because it clears
> MSR_S whereas the old code didn't. I think SVMs did use RTAS, but I
> don't know whether it matters if it's called with MSR_S set or not?
>
> Not sure if anyone will remember, or has a working setup they can test.
> Maybe for now we just copy MSR_S from the kernel MSR the way the
> current code does.

Would the kernel even be able to change the bit? I think only urfid can
clear MSR_S.


Re: [PATCH kernel] KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers

2022-05-02 Thread Fabiano Rosas
Alexey Kardashevskiy  writes:

> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index d185dee26026..44d74bfe05df 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1784,13 +1784,8 @@ hcall_real_table:
>   .long   DOTSYM(kvmppc_h_clear_mod) - hcall_real_table
>   .long   DOTSYM(kvmppc_h_clear_ref) - hcall_real_table
>   .long   DOTSYM(kvmppc_h_protect) - hcall_real_table
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
> - .long   DOTSYM(kvmppc_h_get_tce) - hcall_real_table
> - .long   DOTSYM(kvmppc_rm_h_put_tce) - hcall_real_table
> -#else
>   .long   0   /* 0x1c */
>   .long   0   /* 0x20 */
> -#endif
>   .long   0   /* 0x24 - H_SET_SPRG0 */
>   .long   DOTSYM(kvmppc_h_set_dabr) - hcall_real_table
>   .long   DOTSYM(kvmppc_rm_h_page_init) - hcall_real_table
> @@ -1868,13 +1863,8 @@ hcall_real_table:
>   .long   0   /* 0x12c */
>   .long   0   /* 0x130 */
>   .long   DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
> - .long   DOTSYM(kvmppc_rm_h_stuff_tce) - hcall_real_table
> - .long   DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table
> -#else
>   .long   0   /* 0x138 */
>   .long   0   /* 0x13c */
> -#endif
>   .long   0   /* 0x140 */
>   .long   0   /* 0x144 */
>   .long   0   /* 0x148 */

The ones you remove from here need to be added to kvmppc_hcall_impl_hv,
otherwise we get the WARN at init_default_hcalls because
kvmppc_hcall_impl_hv_realmode can't find them.


Re: serial hang in qemu-system-ppc64 -M pseries

2022-04-29 Thread Fabiano Rosas
Rob Landley  writes:

> On 4/27/22 10:27, Thomas Huth wrote:
>> On 26/04/2022 12.26, Rob Landley wrote:
>>> When I cut and paste 80-ish characters of text into the Linux serial 
>>> console, it
>>> reads 16 characters and stops. When I hit space, it reads another 16 
>>> characters,
>>> and if I keep at it will eventually catch up without losing data. If I type,
>>> every character shows up immediately.
>> 
>> That "16" certainly comes from VTERM_BUFSIZE in hw/char/spapr_vty.c in the 
>> QEMU sources, I think.
>> 
>>> (On other qemu targets and kernels I can cut and paste an entire uuencoded
>>> binary and it goes through just fine in one go, but this target hangs with 
>>> big
>>> pastes until I hit keys.)
>>> 
>>> Is this a qemu-side bug, or a kernel-side bug?
>>> 
>>> Kernel config attached (linux 5.18-rc3 or thereabouts), qemu invocation is:
>>> 
>>> qemu-system-ppc64 -M pseries -vga none -nographic -no-reboot -m 256 -kernel
>>> vmlinux -initrd powerpc64leroot.cpio.gz -append "panic=1 HOST=powerpc64le
>>> console=hvc0"
>> 
>> Which version of QEMU are you using?
>
> $ qemu-system-ppc64 --version
> QEMU emulator version 6.2.92 (v6.2.0-rc2)
> Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>
> From november. I can pull and rebuild but it'll take a bit. (Hopefully
> rebuilding would fix the need to echo -e '\e[?7h' afterwards to undo the "bash
> command line history marches up the screen because qemu's x86 bios disabled 
> line
> wrap and then left it that way" issue...)
>
>> Which frontend (GTK or terminal?) ... 
>
> The above command line has -nographic, forcing terminal. Running ldd on the
> binary doesn't pull up anything gtk. (It pulls up libncursesw though.)
>
> If you want to reproduce my test locally:
>
> wget 
> https://landley.net/toybox/downloads/binaries/mkroot/0.8.5/powerpc64le.tgz
> tar xvzf powerpc64le.tgz
> cd powerpc64le
> ./qemu-*.sh
>
> Then paste something longer than 16 characters at the eventual command prompt
> once the kernel finishes booting.

I suspect this is due to how the tty driver (n_tty.c) interacts with
the console (hvc_console.c) driver's buffer size.

This is the stack:

#0 hvc_push  <-- calls into KVM/QEMU to write up to 16 bytes
#1 hvc_write
#2 tty_put_char
#3 do_output_char
#4 __process_echoes  <-- writes up to tty_write_room() chars
#5 flush_echoes  <-- returns early if ~ECHO && ~ECHONL
#6 n_tty_receive_buf_common  <-- buffers more than 16 bytes
#7 tty_ldisc_receive_buf
#8 tty_port_default_receive_buf
#9 receive_buf
#10 process_one_work

In my busybox instance which does not have this issue I can see that
termios.c_lflag = 0x447, so in the stack above #4 is not called (ECHO
is 0x8, ECHONL is 0x10).

In the bug scenario: termios.c_lflag = 0x5cf, so we go into #4 which
is supposed to write (say) 17 bytes, but ends up writing only 16
because that is what tty_write_room() returns.

What I think is causing this issue is that the hvc_vio.c driver is
configured with hp->outbuf_size = 16. That number comes from the
H_PUT_TERM_CHAR hypercall spec, which transfers at most two registers
(16 bytes) per call. However, the hvc_write function supports writes
larger than 16 bytes, so it seems we're needlessly propagating the
16-byte limitation to the upper layer.

The driver is also not buffering the write, so if it gets called to
write one char at a time (like __process_echoes does) there should be no
limit to how much it can write.

I think if we increase hp->outbuf_size to a value that is larger than
N_TTY_BUF_SIZE=4096 the echo buffer would drain before reaching this new
hvc driver limit.

I tested that and it seems to work, but I'm not sure if it's the right
fix, there are some things I couldn't figure out:

   i) if a driver actually runs out of buffer space, what
  __process_echoes should do about it;

  ii) why the bug sometimes happens only at the 32 characters boundary
  (or other multiple of 16);

 iii) why the ECHO flag differs from the working case.

> If you want to reproduce it all from source:
>
> git clone https://github.com/landley/toybox
> cd toybox && mkdir ccc && cd ccc
> wget
> https://landley.net/toybox/downloads/binaries/toolchains/latest/powerpc64le-linux-musl-cross.tar.xz
> -O - | tar xv
> cd ..
> CROSS=powerpc64le LINUX=~/linux scripts/mkroot.sh
> cd root/powerpc64le
> ./qemu-*.sh
>
> This assumes your linux kernel source directory is in ~/linux of course, and
> that qemu-system-ppc64 is in the $PATH...
>
>> this rings a distant bell, but I thought we had fixed these issues long ago 
>> in the past... e.g.:
>>
>> https://yhbt.net/lore/all/1380113886-16845-10-git-send-email-mdr...@linux.vnet.ibm.com/
>> 
>> https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8a273cbe53221d28
>
> The qemu I'm running is newer than 2016. :)
>
> Most targets are fine with this: I cut and paste entire uuencoded binaries 
> into
> the serial console as an easy way to insert a file into an initramfs. It can
> usually take multiple megabytes 

Re: [PATCH] KVM: PPC: Book3S HV: Initialize AMOR in nested entry

2022-04-28 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Excerpts from Fabiano Rosas's message of April 26, 2022 12:21 am:
>> The hypervisor always sets AMOR to ~0, but let's ensure we're not
>> passing stale values around.
>> 
>
> Reviewed-by: Nicholas Piggin 
>
> Looks like our L0 doesn't do anything with hvregs.amor ?

It doesn't. And if the HV ever starts clearing bits from AMOR, then
we would need to change any kernel code that reads and writes AMR
(such as KUAP) to take into account that we might read back a
different value from what we wrote.


[PATCH] KVM: PPC: Book3S HV: Initialize AMOR in nested entry

2022-04-25 Thread Fabiano Rosas
The hypervisor always sets AMOR to ~0, but let's ensure we're not
passing stale values around.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6fa518f6501d..b5f504576765 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3967,6 +3967,7 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
 
kvmhv_save_hv_regs(vcpu, );
hvregs.lpcr = lpcr;
+   hvregs.amor = ~0;
vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
hvregs.version = HV_GUEST_STATE_VERSION;
if (vcpu->arch.nested) {
-- 
2.35.1



Re: [PATCH 3/3] KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused

2022-04-19 Thread Fabiano Rosas
Sean Christopherson  writes:

> Add wrappers to acquire/release KVM's SRCU lock when stashing the index
> in vcpu->src_idx, along with rudimentary detection of illegal usage,
> e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx.  Because the
> SRCU index is (currently) either 0 or 1, illegal nesting bugs can go
> unnoticed for quite some time and only cause problems when the nested
> lock happens to get a different index.
>
> Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will
> likely yell so loudly that it will bring the kernel to its knees.
>
> Signed-off-by: Sean Christopherson 

For the powerpc part:

Tested-by: Fabiano Rosas 

> ---
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |  9 +
>  arch/powerpc/kvm/book3s_hv_nested.c| 16 +++
>  arch/powerpc/kvm/book3s_rtas.c |  4 ++--
>  arch/powerpc/kvm/powerpc.c |  4 ++--
>  arch/riscv/kvm/vcpu.c  | 16 +++
>  arch/riscv/kvm/vcpu_exit.c |  4 ++--
>  arch/s390/kvm/interrupt.c  |  4 ++--
>  arch/s390/kvm/kvm-s390.c   |  8 
>  arch/s390/kvm/vsie.c   |  4 ++--
>  arch/x86/kvm/x86.c | 28 --
>  include/linux/kvm_host.h   | 24 +-
>  11 files changed, 71 insertions(+), 50 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
> b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index e4ce2a35483f..42851c32ff3b 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -168,9 +168,10 @@ int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, 
> gva_t eaddr,
>   return -EINVAL;
>   /* Read the entry from guest memory */
>   addr = base + (index * sizeof(rpte));
> - vcpu->srcu_idx = srcu_read_lock(>srcu);
> +
> + kvm_vcpu_srcu_read_lock(vcpu);
>   ret = kvm_read_guest(kvm, addr, , sizeof(rpte));
> - srcu_read_unlock(>srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>   if (ret) {
>   if (pte_ret_p)
>   *pte_ret_p = addr;
> @@ -246,9 +247,9 @@ int kvmppc_mmu_radix_translate_table(struct kvm_vcpu 
> *vcpu, gva_t eaddr,
>  
>   /* Read the table to find the root of the radix tree */
>   ptbl = (table & PRTB_MASK) + (table_index * sizeof(entry));
> - vcpu->srcu_idx = srcu_read_lock(>srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>   ret = kvm_read_guest(kvm, ptbl, , sizeof(entry));
> - srcu_read_unlock(>srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>   if (ret)
>   return ret;
>  
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/kvm/book3s_hv_nested.c
> index 9d373f8963ee..c943a051c6e7 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -306,10 +306,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>   /* copy parameters in */
>   hv_ptr = kvmppc_get_gpr(vcpu, 4);
>   regs_ptr = kvmppc_get_gpr(vcpu, 5);
> - vcpu->srcu_idx = srcu_read_lock(>kvm->srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>   err = kvmhv_read_guest_state_and_regs(vcpu, _hv, _regs,
> hv_ptr, regs_ptr);
> - srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>   if (err)
>   return H_PARAMETER;
>  
> @@ -410,10 +410,10 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>   byteswap_hv_regs(_hv);
>   byteswap_pt_regs(_regs);
>   }
> - vcpu->srcu_idx = srcu_read_lock(>kvm->srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>   err = kvmhv_write_guest_state_and_regs(vcpu, _hv, _regs,
>  hv_ptr, regs_ptr);
> - srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>   if (err)
>   return H_AUTHORITY;
>  
> @@ -600,16 +600,16 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu 
> *vcpu)
>   goto not_found;
>  
>   /* Write what was loaded into our buffer back to the L1 guest */
> - vcpu->srcu_idx = srcu_read_lock(>kvm->srcu);
> + kvm_vcpu_srcu_read_lock(vcpu);
>   rc = kvm_vcpu_write_guest(vcpu, gp_to, buf, n);
> - srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx);
> + kvm_vcpu_srcu_read_unlock(vcpu);
>   

[PATCH] KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint

2022-03-28 Thread Fabiano Rosas
We removed most of the vcore logic from the P9 path, but there's
still a tracepoint that tries to dereference vc->runner.

Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.c | 8 
 arch/powerpc/kvm/trace_hv.h  | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c886557638a1..5f5b2d0dee8c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4218,13 +4218,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore 
*vc)
start_wait = ktime_get();
 
vc->vcore_state = VCORE_SLEEPING;
-   trace_kvmppc_vcore_blocked(vc, 0);
+   trace_kvmppc_vcore_blocked(vc->runner, 0);
spin_unlock(>lock);
schedule();
finish_rcuwait(>wait);
spin_lock(>lock);
vc->vcore_state = VCORE_INACTIVE;
-   trace_kvmppc_vcore_blocked(vc, 1);
+   trace_kvmppc_vcore_blocked(vc->runner, 1);
++vc->runner->stat.halt_successful_wait;
 
cur = ktime_get();
@@ -4596,9 +4596,9 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
if (kvmppc_vcpu_check_block(vcpu))
break;
 
-   trace_kvmppc_vcore_blocked(vc, 0);
+   trace_kvmppc_vcore_blocked(vcpu, 0);
schedule();
-   trace_kvmppc_vcore_blocked(vc, 1);
+   trace_kvmppc_vcore_blocked(vcpu, 1);
}
finish_rcuwait(wait);
}
diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
index 38cd0ed0a617..32e2cb5811cc 100644
--- a/arch/powerpc/kvm/trace_hv.h
+++ b/arch/powerpc/kvm/trace_hv.h
@@ -409,9 +409,9 @@ TRACE_EVENT(kvmppc_run_core,
 );
 
 TRACE_EVENT(kvmppc_vcore_blocked,
-   TP_PROTO(struct kvmppc_vcore *vc, int where),
+   TP_PROTO(struct kvm_vcpu *vcpu, int where),
 
-   TP_ARGS(vc, where),
+   TP_ARGS(vcpu, where),
 
TP_STRUCT__entry(
__field(int,n_runnable)
@@ -421,8 +421,8 @@ TRACE_EVENT(kvmppc_vcore_blocked,
),
 
TP_fast_assign(
-   __entry->runner_vcpu = vc->runner->vcpu_id;
-   __entry->n_runnable  = vc->n_runnable;
+   __entry->runner_vcpu = vcpu->vcpu_id;
+   __entry->n_runnable  = vcpu->arch.vcore->n_runnable;
__entry->where   = where;
__entry->tgid= current->tgid;
),
-- 
2.35.1



Re: [PATCH 6/6] KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting

2022-03-09 Thread Fabiano Rosas
Nicholas Piggin  writes:

> The L1 should not be able to adjust LPES mode for the L2. Setting LPES
> if the L0 needs it clear would cause external interrupts to be sent to
> L2 and missed by the L0.
>
> Clearing LPES when it may be set, as typically happens with XIVE enabled
> could cause a performance issue despite having no native XIVE support in
> the guest, because it will cause mediated interrupts for the L2 to be
> taken in HV mode, which then have to be injected.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 


Re: [PATCH 4/6] KVM: PPC: Book3S HV P9: Split !nested case out from guest entry

2022-03-09 Thread Fabiano Rosas
Nicholas Piggin  writes:

> The differences between nested and !nested will become larger in
> later changes so split them out for readability.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 


Re: [PATCH 3/6] KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming

2022-03-09 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Move the cede abort logic out of xive escalation rearming and into
> the caller to prepare for handling a similar case with nested guest
> entry.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/kvm_ppc.h |  4 ++--
>  arch/powerpc/kvm/book3s_hv.c   | 10 --
>  arch/powerpc/kvm/book3s_xive.c |  9 ++---
>  3 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index a14dbcd1b8ce..94fa5f246657 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -671,7 +671,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int 
> irq_source_id, u32 irq,
>  int level, bool line_status);
>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>  extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
> -extern void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
> +extern bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu);
>
>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>  {
> @@ -709,7 +709,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, 
> int irq_source_id, u32 ir
> int level, bool line_status) { return 
> -ENODEV; }
>  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>  static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
> -static inline void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { }
> +static inline bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu) { 
> return true; }
>
>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>   { return 0; }
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 5df359053147..a0b674d3a2da 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4073,10 +4073,16 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu 
> *vcpu, u64 time_limit,
>   !(vcpu->arch.shregs.msr & MSR_PR)) {
>   unsigned long req = kvmppc_get_gpr(vcpu, 3);
>
> - /* H_CEDE has to be handled now, not later */
> + /* H_CEDE has to be handled now */
>   if (req == H_CEDE) {
>   kvmppc_cede(vcpu);
> - kvmppc_xive_rearm_escalation(vcpu); /* may 
> un-cede */
> + if (!kvmppc_xive_rearm_escalation(vcpu)) {
> + /*
> +  * Pending escalation so abort
> +  * the cede.
> +  */
> + vcpu->arch.ceded = 0;

This was moved after the mb() in kvmppc_xive_rearm_escalation, so I
think a concurrent H_PROD might continue to see tvcpu->arch.ceded = 1
after the escalation has been set. Is this an issue?

> + }
>   kvmppc_set_gpr(vcpu, 3, 0);
>   trap = 0;
>
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index e216c068075d..7b513e14cada 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -179,12 +179,13 @@ void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_xive_pull_vcpu);
>
> -void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
> +bool kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
>  {
>   void __iomem *esc_vaddr = (void __iomem *)vcpu->arch.xive_esc_vaddr;
> + bool ret = true;
>
>   if (!esc_vaddr)
> - return;
> + return ret;
>
>   /* we are using XIVE with single escalation */
>
> @@ -197,7 +198,7 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
>* we also don't want to set xive_esc_on to 1 here in
>* case we race with xive_esc_irq().
>*/
> - vcpu->arch.ceded = 0;
> + ret = false;
>   /*
>* The escalation interrupts are special as we don't EOI them.
>* There is no need to use the load-after-store ordering offset
> @@ -210,6 +211,8 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
>   __raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_00);
>   }
>   mb();
> +
> + return ret;
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_xive_rearm_escalation);


Re: [PATCH 1/6] KVM: PPC: Book3S HV P9: Fix "lost kick" race

2022-03-09 Thread Fabiano Rosas
Nicholas Piggin  writes:

> When new work is created that requires attention from the hypervisor
> (e.g., to inject an interrupt into the guest), fast_vcpu_kick is used to
> pull the target vcpu out of the guest if it may have been running.
>
> Therefore the work creation side looks like this:
>
>   vcpu->arch.doorbell_request = 1;
>   kvmppc_fast_vcpu_kick_hv(vcpu) {
> smp_mb();
> cpu = vcpu->cpu;
> if (cpu != -1)
> send_ipi(cpu);
>   }
>
> And the guest entry side *should* look like this:
>
>   vcpu->cpu = smp_processor_id();
>   smp_mb();
>   if (vcpu->arch.doorbell_request) {
> // do something (abort entry or inject doorbell etc)
>   }
>
> But currently the store and load are flipped, so it is possible for the
> entry to see no doorbell pending, and the doorbell creation misses the
> store to set cpu, resulting lost work (or at least delayed until the
> next guest exit).
>
> Fix this by reordering the entry operations and adding a smp_mb
> between them. The P8 path appears to have a similar race which is
> commented but not addressed yet.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 


[RFC PATCH] KVM: PPC: Book3S HV: Add KVM_CAP_PPC_GTSE

2022-03-08 Thread Fabiano Rosas
This patch adds a new KVM capability to address a crash we're
currently having inside the nested guest kernel when running with
GTSE disabled in the nested hypervisor.

The summary is:

We allow any guest a cmdline override of GTSE for migration
purposes. The nested guest does not know it needs to use the option
and tries to run 'tlbie' with LPCR_GTSE=0.

The details are a bit more intricate:

QEMU always sets GTSE=1 in OV5 even before calling KVM. At prom_init,
guests use the OV5 value to set MMU_FTR_GTSE. This setting can be
overridden by 'radix_hcall_invalidate=on' in the kernel cmdline. The
option itself depends on the availability of
FW_FEATURE_RPT_INVALIDATE, which is tied to QEMU's cap-rpt-invalidate
capability.

The MMU_FTR_GTSE flag leads guests to set PROC_TABLE_GTSE in their
process tables and after H_REGISTER_PROC_TBL, both QEMU and KVM will
set LPCR_GTSE=1 for that guest. Unless the guest uses the cmdline
override, in which case:

  MMU_FTR_GTSE=0 -> PROC_TABLE_GTSE=0 -> LPCR_GTSE=0

We don't allow the nested hypervisor to set some LPCR bits for its
nested guests, so if the nested HV has LPCR_GTSE=0, its nested guests
will also have LPCR_GTSE=0. But since the only thing that can really
flip GTSE is the cmdline override, if a nested guest runs without it,
then the sequence goes:

  MMU_FTR_GTSE=1 -> PROC_TABLE_GTSE=1 -> LPCR_GTSE=0.

With LPCR_GTSE=0 the HW will treat 'tlbie' as HV privileged.

How the new capability helps:

By having QEMU consult KVM on what the correct GTSE value is, we can
have the nested hypervisor return the same value that it is currently
using. QEMU will then put the correct value in the device-tree for the
nested guest and MMU_FTR_GTSE will match LPCR_GTSE.

Fixes: b87cc116c7e1 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability")
Signed-off-by: Fabiano Rosas 
---
This supersedes the previous RFC: "KVM: PPC: Book3s HV: Allow setting
GTSE for the nested guest"*. Aneesh explained to me that we don't want
to allow L1 and L2 GTSE values to differ.

*- https://lore.kernel.org/r/20220304182657.2489303-1-faro...@linux.ibm.com
---
 arch/powerpc/kvm/powerpc.c | 3 +++
 include/uapi/linux/kvm.h   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..dd08b3b729cd 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -677,6 +677,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_PPC_RPT_INVALIDATE:
r = 1;
break;
+   case KVM_CAP_PPC_GTSE:
+   r = mmu_has_feature(MMU_FTR_GTSE);
+   break;
 #endif
default:
r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 507ee1f2aa96..cc581e345d2a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1135,6 +1135,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
 #define KVM_CAP_PPC_AIL_MODE_3 210
+#define KVM_CAP_PPC_GTSE 211
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.34.1



Re: [RFC PATCH 2/2] KVM: PPC: Book3S HV: Provide a more accurate MAX_VCPU_ID in P9

2022-03-08 Thread Fabiano Rosas
Christophe Leroy  writes:

> Le 13/04/2021 à 00:26, Fabiano Rosas a écrit :
>> The KVM_CAP_MAX_VCPU_ID capability was added by commit 0b1b1dfd52a6
>> ("kvm: introduce KVM_MAX_VCPU_ID") to allow for vcpu ids larger than
>> KVM_MAX_VCPU in powerpc.
>> 
>> For a P9 host we depend on the guest VSMT value to know what is the
>> maximum number of vcpu id we can support:
>> 
>> kvmppc_core_vcpu_create_hv:
>>  (...)
>>  if (cpu_has_feature(CPU_FTR_ARCH_300)) {
>> --> if (id >= (KVM_MAX_VCPUS * kvm->arch.emul_smt_mode)) {
>>  pr_devel("KVM: VCPU ID too high\n");
>>  core = KVM_MAX_VCORES;
>>  } else {
>>  BUG_ON(kvm->arch.smt_mode != 1);
>>  core = kvmppc_pack_vcpu_id(kvm, id);
>>  }
>>  } else {
>>  core = id / kvm->arch.smt_mode;
>>  }
>> 
>> which means that the value being returned by the capability today for
>> a given guest is potentially way larger than what we actually support:
>> 
>> #define KVM_MAX_VCPU_ID (MAX_SMT_THREADS * KVM_MAX_VCORES)
>> 
>> If the capability is queried before userspace enables the
>> KVM_CAP_PPC_SMT ioctl there is not much we can do, but if the emulated
>> smt mode is already known we could provide a more accurate value.
>> 
>> The only practical effect of this change today is to make the
>> kvm_create_max_vcpus test pass for powerpc. The QEMU spapr machine has
>> a lower max vcpu than what KVM allows so even KVM_MAX_VCPU is not
>> reached.
>> 
>> Signed-off-by: Fabiano Rosas 
>
> This patch won't apply after commit a1c42ddedf35 ("kvm: rename 
> KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS")
>
> Feel free to resubmit if still applicable.

Thanks for the reminder, Christophe.

I was focusing on enabling the rest of the kvm-selftests:
https://lore.kernel.org/r/20220120170109.948681-1-faro...@linux.ibm.com

I'm preparing a v2 for that series and will try to include these patches
as well.



Re: [PATCH 2/6] KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry

2022-03-07 Thread Fabiano Rosas
Nicholas Piggin  writes:

> If there is a pending xive interrupt, inject it at guest entry (if
> MSR[EE] is enabled) rather than take another interrupt when the guest
> is entered. If xive is enabled then LPCR[LPES] is set so this behaviour
> should be expected.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kvm/book3s_hv.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index f8c0f1f52a1e..5df359053147 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4524,9 +4524,14 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>
>   if (!nested) {
>   kvmppc_core_prepare_to_enter(vcpu);
> - if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
> -  &vcpu->arch.pending_exceptions))
> + if (vcpu->arch.shregs.msr & MSR_EE) {
> + if (xive_interrupt_pending(vcpu))
> + kvmppc_inject_interrupt_hv(vcpu,
> + BOOK3S_INTERRUPT_EXTERNAL, 0);
> + } else if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
> +  &vcpu->arch.pending_exceptions)) {
>   lpcr |= LPCR_MER;
> + }
>   } else if (vcpu->arch.pending_exceptions ||
>      vcpu->arch.doorbell_request ||
>  xive_interrupt_pending(vcpu)) {

Reviewed-by: Fabiano Rosas 


Re: [RFC PATCH] KVM: PPC: Book3s HV: Allow setting GTSE for the nested guest

2022-03-07 Thread Fabiano Rosas
"Aneesh Kumar K.V"  writes:

> Fabiano Rosas  writes:
>
>> We're currently getting a Program Interrupt inside the nested guest
>> kernel when running with GTSE disabled in the nested hypervisor. We
>> allow any guest a cmdline override of GTSE for migration purposes. The
>> nested guest does not know it needs to use the option and tries to run
>> 'tlbie' with LPCR_GTSE=0.
>>
>> The details are a bit more intricate:
>>
>> QEMU always sets GTSE=1 in OV5 even before calling KVM. At prom_init,
>> guests use the OV5 value to set MMU_FTR_GTSE. This setting can be
>> overridden by 'radix_hcall_invalidate=on' in the kernel cmdline. The
>> option itself depends on the availability of
>> FW_FEATURE_RPT_INVALIDATE, which is tied to QEMU's cap-rpt-invalidate
>> capability.
>>
>> The MMU_FTR_GTSE flag leads guests to set PROC_TABLE_GTSE in their
>> process tables and after H_REGISTER_PROC_TBL, both QEMU and KVM will
>> set LPCR_GTSE=1 for that guest. Unless the guest uses the cmdline
>> override, in which case:
>>
>>   MMU_FTR_GTSE=0 -> PROC_TABLE_GTSE=0 -> LPCR_GTSE=0
>>
>> We don't allow the nested hypervisor to set some LPCR bits for its
>> nested guests, so if the nested HV has LPCR_GTSE=0, its nested guests
>> will also have LPCR_GTSE=0. But since the only thing that can really
>> flip GTSE is the cmdline override, if a nested guest runs without it,
>> then the sequence goes:
>>
>>   MMU_FTR_GTSE=1 -> PROC_TABLE_GTSE=1 -> LPCR_GTSE=0.
>>
>> With LPCR_GTSE=0 the HW will treat 'tlbie' as HV privileged.
>>
>> This patch allows a nested HV to set LPCR_GTSE for its nested guests
>> so the LPCR setting will match what the nested guest sees in OV5.
>
> This needs a Fixes: tag?

This feature was done in many pieces, I think it will end up being the
commit that enabled the H_RPT_INVALIDATE capability:

Fixes: b87cc116c7e1 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability")

> I am not sure what is broken. If L1 doesn't support GTSE, then it should
> publish the same to L2 and L2 should not use tlbie.

L1 cannot set L2's LPCR to the correct value because L0 will not allow
it. That is what this patch is changing.

I looked into having QEMU set the proper values to use with CAS, but
that is done in QEMU too early, before the first dispatch of L2 (which
is when L0 decides that L1 is not allowed to modify some bits). So QEMU
always advertises GTSE=1.

> That was working before? Or is it that the kernel command to disable
> gtse when used with L2 kernel is broken?

The command line works, but it needs to be explicitly given when
starting the L2. There is no link between L1 and L2 when it comes to
GTSE aside from the LPCR value L1 chose to use. So L2 can start with no
command line at all, while L1 had GTSE disabled.

AFAICT, this has always been broken. The stack leading to this is:

NIP [c008615c] radix__flush_tlb_kernel_range+0x13c/0x420
[c0075840] change_page_attr+0xb0/0x240
[c044624c] __apply_to_page_range+0x5ac/0xb90
[c0075bbc] change_memory_attr+0x7c/0x150
[c0350390] bpf_prog_select_runtime+0x200/0x290
[c0d9400c] bpf_migrate_filter+0x18c/0x1e0
[c0d95f38] bpf_prog_create+0x178/0x1d0
[c130e4f4] ptp_classifier_init+0x4c/0x80
[c0000130d874] sock_init+0xe0/0x100
[c00121e0] do_one_initcall+0x60/0x2c0
[c12b48cc] kernel_init_freeable+0x33c/0x3dc
[c00127c8] kernel_init+0x44/0x18c
[c000ce64] ret_from_kernel_thread+0x5c/0x64

>>
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
>> index 9d373f8963ee..5b9008d89f90 100644
>> --- a/arch/powerpc/kvm/book3s_hv_nested.c
>> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
>> @@ -262,7 +262,7 @@ static void load_l2_hv_regs(struct kvm_vcpu *vcpu,
>>   * Don't let L1 change LPCR bits for the L2 except these:
>>   */
>>  mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
>> -LPCR_LPES | LPCR_MER;
>> +LPCR_LPES | LPCR_MER | LPCR_GTSE;
>>  
>>  /*
>>   * Additional filtering is required depending on hardware
>> -- 
>> 2.34.1


[RFC PATCH] KVM: PPC: Book3s HV: Allow setting GTSE for the nested guest

2022-03-04 Thread Fabiano Rosas
We're currently getting a Program Interrupt inside the nested guest
kernel when running with GTSE disabled in the nested hypervisor. We
allow any guest a cmdline override of GTSE for migration purposes. The
nested guest does not know it needs to use the option and tries to run
'tlbie' with LPCR_GTSE=0.

The details are a bit more intricate:

QEMU always sets GTSE=1 in OV5 even before calling KVM. At prom_init,
guests use the OV5 value to set MMU_FTR_GTSE. This setting can be
overridden by 'radix_hcall_invalidate=on' in the kernel cmdline. The
option itself depends on the availability of
FW_FEATURE_RPT_INVALIDATE, which is tied to QEMU's cap-rpt-invalidate
capability.

The MMU_FTR_GTSE flag leads guests to set PROC_TABLE_GTSE in their
process tables and after H_REGISTER_PROC_TBL, both QEMU and KVM will
set LPCR_GTSE=1 for that guest. Unless the guest uses the cmdline
override, in which case:

  MMU_FTR_GTSE=0 -> PROC_TABLE_GTSE=0 -> LPCR_GTSE=0

We don't allow the nested hypervisor to set some LPCR bits for its
nested guests, so if the nested HV has LPCR_GTSE=0, its nested guests
will also have LPCR_GTSE=0. But since the only thing that can really
flip GTSE is the cmdline override, if a nested guest runs without it,
then the sequence goes:

  MMU_FTR_GTSE=1 -> PROC_TABLE_GTSE=1 -> LPCR_GTSE=0.

With LPCR_GTSE=0 the HW will treat 'tlbie' as HV privileged.

This patch allows a nested HV to set LPCR_GTSE for its nested guests
so the LPCR setting will match what the nested guest sees in OV5.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 9d373f8963ee..5b9008d89f90 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -262,7 +262,7 @@ static void load_l2_hv_regs(struct kvm_vcpu *vcpu,
 * Don't let L1 change LPCR bits for the L2 except these:
 */
mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
-   LPCR_LPES | LPCR_MER;
+   LPCR_LPES | LPCR_MER | LPCR_GTSE;
 
/*
 * Additional filtering is required depending on hardware
-- 
2.34.1



Re: [PATCH v3 3/3] KVM: PPC: Add KVM_CAP_PPC_AIL_MODE_3

2022-02-21 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Add KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL
> resource mode to 3 with the H_SET_MODE hypercall. This capability
> differs between processor types and KVM types (PR, HV, Nested HV), and
> affects guest-visible behaviour.
>
> QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and
> use the KVM CAP if available to determine KVM support[2].
>
> [1] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00437.html
> [2] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00439.html
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  Documentation/virt/kvm/api.rst | 14 ++
>  arch/powerpc/include/asm/setup.h   |  2 ++
>  arch/powerpc/kvm/powerpc.c | 20 
>  arch/powerpc/platforms/pseries/setup.c | 12 +++-
>  include/uapi/linux/kvm.h   |  1 +
>  tools/include/uapi/linux/kvm.h |  1 +
>  6 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index bb8cfddbb22d..404056a9a35a 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6995,6 +6995,20 @@ indicated by the fd to the VM this is called on.
> This is intended to support intra-host migration of VMs between userspace VMMs,
>  upgrading the VMM process without interrupting the guest.
>
> +7.30 KVM_CAP_PPC_AIL_MODE_3
> +---
> +
> +:Capability: KVM_CAP_PPC_AIL_MODE_3
> +:Architectures: ppc
> +:Type: vm
> +
> +This capability indicates that the kernel supports the mode 3 setting for the
> +"Address Translation Mode on Interrupt" aka "Alternate Interrupt Location"
> +resource that is controlled with the H_SET_MODE hypercall.
> +
> +This capability allows a guest kernel to use a better-performance mode for
> +handling interrupts and system calls.
> +
>  8. Other capabilities.
>  ==
>
> diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
> index d0d3dd531c7f..a555fb77258a 100644
> --- a/arch/powerpc/include/asm/setup.h
> +++ b/arch/powerpc/include/asm/setup.h
> @@ -28,11 +28,13 @@ void setup_panic(void);
>  #define ARCH_PANIC_TIMEOUT 180
>
>  #ifdef CONFIG_PPC_PSERIES
> +extern bool pseries_reloc_on_exception(void);
>  extern bool pseries_enable_reloc_on_exc(void);
>  extern void pseries_disable_reloc_on_exc(void);
>  extern void pseries_big_endian_exceptions(void);
>  void __init pseries_little_endian_exceptions(void);
>  #else
> +static inline bool pseries_reloc_on_exception(void) { return false; }
>  static inline bool pseries_enable_reloc_on_exc(void) { return false; }
>  static inline void pseries_disable_reloc_on_exc(void) {}
>  static inline void pseries_big_endian_exceptions(void) {}
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 2ad0ccd202d5..7dc101ea778c 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -678,6 +678,26 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   r = 1;
>   break;
>  #endif
> + case KVM_CAP_PPC_AIL_MODE_3:
> + /*
> +  * KVM PR, POWER7, and some POWER9s don't support AIL=3 mode.
> +  * The POWER9s can support it if the guest runs in hash mode,
> +  * but QEMU doesn't necessarily query the capability in time.
> +  */
> + if (hv_enabled) {
> + if (kvmhv_on_pseries()) {
> + if (pseries_reloc_on_exception())
> + r = 1;
> + } else if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
> +    !cpu_has_feature(CPU_FTR_P9_RADIX_PREFETCH_BUG)) {
> + r = 1;
> + } else {
> + r = 0;
> + }
> + } else {
> + r = 0;
> + }
> + break;
>   default:
>   r = 0;
>   break;
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 83a04d967a59..182525c2abd5 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -353,6 +353,13 @@ static void pseries_lpar_idle(void)
>   pseries_idle_epilog();
>  }
>
> +static bool pseries_reloc_on_exception_enabled;
> +
> +bool pseries_reloc_on_exception(void)
> +{
> + return pseries_reloc_on_e

Re: [PATCH v2 1/2] KVM: PPC: Book3S PR: Disable SCV when AIL could be disabled

2022-01-31 Thread Fabiano Rosas
Nicholas Piggin  writes:

> PR KVM does not support running with AIL enabled, and SCV is not
> supported with AIL disabled. Fix this by ensuring the SCV facility is
> disabled with FSCR while a CPU could be running with AIL=0.
>
> The PowerNV host supports disabling AIL on a per-CPU basis, so SCV just
> needs to be disabled when a vCPU is being run.
>
> The pSeries machine can only switch AIL on a system-wide basis, so it
> must disable SCV support at boot if the configuration can potentially
> run a PR KVM guest.
>
> Also ensure the FSCR[SCV] bit cannot be enabled when emulating
> mtFSCR for the guest.
>
> SCV is not emulated for the PR guest at the moment, this just fixes the
> host crashes.
>
> Alternatives considered and rejected:
> - SCV support can not be disabled by PR KVM after boot, because it is
>   advertised to userspace with HWCAP.
> - AIL can not be disabled on a per-CPU basis. At least when running on
>   pseries it is a per-LPAR setting.
> - Support for real-mode SCV vectors will not be added because they are
>   at 0x17000 so making such a large fixed head space causes immediate
>   value limits to be exceeded, requiring a lot of rework and more code.
> - Disabling SCV for any PR KVM possible kernel will cause a slowdown
>   when not using PR KVM.
> - A boot time option to disable SCV to use PR KVM is user-hostile.
> - System call instruction emulation for SCV facility unavailable
>   instructions is too complex and old emulation code was subtly broken
>   and removed.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/exceptions-64s.S |  4 
>  arch/powerpc/kernel/setup_64.c   | 28 
>  arch/powerpc/kvm/Kconfig |  9 +
>  arch/powerpc/kvm/book3s_pr.c | 20 ++--
>  4 files changed, 55 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 55caeee37c08..b66dd6f775a4 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -809,6 +809,10 @@ __start_interrupts:
>   * - MSR_EE|MSR_RI is clear (no reentrant exceptions)
>   * - Standard kernel environment is set up (stack, paca, etc)
>   *
> + * KVM:
> + * These interrupts do not elevate HV 0->1, so HV is not involved. PR KVM
> + * ensures that FSCR[SCV] is disabled whenever it has to force AIL off.
> + *
>   * Call convention:
>   *
>   * syscall register convention is in Documentation/powerpc/syscall64-abi.rst
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index be8577ac9397..7f7da641e551 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -197,6 +197,34 @@ static void __init configure_exceptions(void)
>
>   /* Under a PAPR hypervisor, we need hypercalls */
>   if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
> + /*
> +  * - PR KVM does not support AIL mode interrupts in the host
> +  *   while a PR guest is running.
> +  *
> +  * - SCV system call interrupt vectors are only implemented for
> +  *   AIL mode interrupts.
> +  *
> +  * - On pseries, AIL mode can only be enabled and disabled
> +  *   system-wide so when a PR VM is created on a pseries host,
> +  *   all CPUs of the host are set to AIL=0 mode.
> +  *
> +  * - Therefore host CPUs must not execute scv while a PR VM
> +  *   exists.
> +  *
> +  * - SCV support can not be disabled dynamically because the
> +  *   feature is advertised to host userspace. Disabling the
> +  *   facility and emulating it would be possible but is not
> +  *   implemented.
> +  *
> +  * - So SCV support is blanket diabled if PR KVM could possibly

disabled

Reviewed-by: Fabiano Rosas 



Re: [PATCH v2 2/2] KVM: PPC: Book3S PR: Disallow AIL != 0

2022-01-31 Thread Fabiano Rosas
Nicholas Piggin  writes:

> KVM PR does not implement address translation modes on interrupt, so it
> must not allow H_SET_MODE to succeed. The behaviour change caused by
> this mode is architected and not advisory (interrupts *must* behave
> differently).
>
> QEMU does not deal with differences in AIL support in the host. The
> solution to that is a spapr capability and corresponding KVM CAP, but
> this patch does not break things more than before (the host behaviour
> already differs, this change just disallows some modes that are not
> implemented properly).
>
> By happy coincidence, this allows PR Linux guests that are using the SCV
> facility to boot and run, because Linux disables the use of SCV if AIL
> can not be set to 3. This does not fix the underlying problem of missing
> SCV support (an OS could implement real-mode SCV vectors and try to
> enable the facility). The true fix for that is for KVM PR to emulate scv
> interrupts from the facility unavailable interrupt.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/kvm/book3s_pr_papr.c | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> index 1f10e7dfcdd0..dc4f51ac84bc 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -281,6 +281,22 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu *vcpu)
>   return EMULATE_DONE;
>  }
>
> +static int kvmppc_h_pr_set_mode(struct kvm_vcpu *vcpu)
> +{
> + unsigned long mflags = kvmppc_get_gpr(vcpu, 4);
> + unsigned long resource = kvmppc_get_gpr(vcpu, 5);
> +
> + if (resource == H_SET_MODE_RESOURCE_ADDR_TRANS_MODE) {
> + /* KVM PR does not provide AIL!=0 to guests */
> + if (mflags == 0)
> + kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
> + else
> + kvmppc_set_gpr(vcpu, 3, H_UNSUPPORTED_FLAG_START - 63);
> + return EMULATE_DONE;
> + }
> + return EMULATE_FAIL;
> +}
> +
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu)
>  {
> @@ -384,6 +400,8 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>   return kvmppc_h_pr_logical_ci_load(vcpu);
>   case H_LOGICAL_CI_STORE:
>   return kvmppc_h_pr_logical_ci_store(vcpu);
> + case H_SET_MODE:
> + return kvmppc_h_pr_set_mode(vcpu);
>   case H_XIRR:
>   case H_CPPR:
>   case H_EOI:
> @@ -421,6 +439,7 @@ int kvmppc_hcall_impl_pr(unsigned long cmd)
>   case H_CEDE:
>   case H_LOGICAL_CI_LOAD:
>   case H_LOGICAL_CI_STORE:
> + case H_SET_MODE:
>  #ifdef CONFIG_KVM_XICS
>   case H_XIRR:
>   case H_CPPR:
> @@ -447,6 +466,7 @@ static unsigned int default_hcall_list[] = {
>   H_BULK_REMOVE,
>   H_PUT_TCE,
>   H_CEDE,
> + H_SET_MODE,
>  #ifdef CONFIG_KVM_XICS
>   H_XIRR,
>   H_CPPR,


[PATCH v5 5/5] KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure

2022-01-25 Thread Fabiano Rosas
MMIO emulation can fail if the guest uses an instruction that we are
not prepared to emulate. Since these instructions can be and most
likely are valid ones, this is (slightly) closer to an access fault
than to an illegal instruction, so deliver a Data Storage interrupt
instead of a Program interrupt.

BookE ignores bad faults, so it will keep using a Program interrupt
because a DSI would cause a fault loop in the guest.

Suggested-by: Nicholas Piggin 
Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/emulate_loadstore.c | 10 +++---
 arch/powerpc/kvm/powerpc.c   | 22 ++
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 48272a9b9c30..cfc9114b87d0 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -73,7 +73,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 {
u32 inst;
enum emulation_result emulated = EMULATE_FAIL;
-   int advance = 1;
struct instruction_op op;
 
/* this default type might be overwritten by subcategories */
@@ -98,6 +97,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
int type = op.type & INSTR_TYPE_MASK;
int size = GETSIZE(op.type);
 
+   vcpu->mmio_is_write = OP_IS_STORE(type);
+
switch (type) {
case LOAD:  {
int instr_byte_swap = op.type & BYTEREV;
@@ -355,15 +356,10 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
}
}
 
-   if (emulated == EMULATE_FAIL) {
-   advance = 0;
-   kvmppc_core_queue_program(vcpu, 0);
-   }
-
trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
 
/* Advance past emulated instruction. */
-   if (advance)
+   if (emulated != EMULATE_FAIL)
kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
 
return emulated;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index acb0d2a4bdb9..82d889db2b6b 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,6 +309,28 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
kvm_debug_ratelimited("Guest access to device memory using unsupported instruction (opcode: %#08x)\n",
  last_inst);
+
+   /*
+* Injecting a Data Storage here is a bit more
+* accurate since the instruction that caused the
+* access could still be a valid one.
+*/
+   if (!IS_ENABLED(CONFIG_BOOKE)) {
+   ulong dsisr = DSISR_BADACCESS;
+
+   if (vcpu->mmio_is_write)
+   dsisr |= DSISR_ISSTORE;
+
+   kvmppc_core_queue_data_storage(vcpu, vcpu->arch.vaddr_accessed, dsisr);
+   } else {
+   /*
+* BookE does not send a SIGBUS on a bad
+* fault, so use a Program interrupt instead
+* to avoid a fault loop.
+*/
+   kvmppc_core_queue_program(vcpu, 0);
+   }
+
r = RESUME_GUEST;
break;
}
-- 
2.34.1



[PATCH v5 4/5] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-25 Thread Fabiano Rosas
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Also change the message to a ratelimited one since we're letting the
guest run and it could flood the host logs.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/powerpc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 27fb2b70f631..acb0d2a4bdb9 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -307,9 +307,9 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
u32 last_inst;
 
kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
-   /* XXX Deliver Program interrupt to guest. */
-   pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
-   r = RESUME_HOST;
+   kvm_debug_ratelimited("Guest access to device memory using unsupported instruction (opcode: %#08x)\n",
+ last_inst);
+   r = RESUME_GUEST;
break;
}
default:
-- 
2.34.1



[PATCH v5 3/5] KVM: PPC: mmio: Reject instructions that access more than mmio.data size

2022-01-25 Thread Fabiano Rosas
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8-bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation in
parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer but we don't fail the emulation until
later at kvmppc_complete_mmio_load and even then we allow userspace to
make a partial copy of the data, which ends up overwriting some fields
of the mmio structure.

This patch makes the emulation fail earlier at kvmppc_handle_load|store,
which will send a Program interrupt to the guest. This is better than
allowing the guest to proceed with partial data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq), there's no account of an actual guest
in the wild running instructions that are not properly emulated.

(While here, remove the "bad MMIO" messages. The caller already has an
error message.)

Signed-off-by: Fabiano Rosas 
Reviewed-by: Alexey Kardashevskiy 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/powerpc.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index c2bd29e90314..27fb2b70f631 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1114,10 +1114,8 @@ static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu)
struct kvm_run *run = vcpu->run;
u64 gpr;
 
-   if (run->mmio.len > sizeof(gpr)) {
-   printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len);
+   if (run->mmio.len > sizeof(gpr))
return;
-   }
 
if (!vcpu->arch.mmio_host_swabbed) {
switch (run->mmio.len) {
@@ -1236,10 +1234,8 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
host_swabbed = !is_default_endian;
}
 
-   if (bytes > sizeof(run->mmio.data)) {
-   printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
-   }
+   if (bytes > sizeof(run->mmio.data))
+   return EMULATE_FAIL;
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
run->mmio.len = bytes;
@@ -1325,10 +1321,8 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
host_swabbed = !is_default_endian;
}
 
-   if (bytes > sizeof(run->mmio.data)) {
-   printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
-   }
+   if (bytes > sizeof(run->mmio.data))
+   return EMULATE_FAIL;
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
run->mmio.len = bytes;
-- 
2.34.1



[PATCH v5 2/5] KVM: PPC: Fix vmx/vsx mixup in mmio emulation

2022-01-25 Thread Fabiano Rosas
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 50414fb2a5ea..c2bd29e90314 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1499,7 +1499,7 @@ int kvmppc_handle_vmx_load(struct kvm_vcpu *vcpu,
 {
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
while (vcpu->arch.mmio_vmx_copy_nums) {
@@ -1596,7 +1596,7 @@ int kvmppc_handle_vmx_store(struct kvm_vcpu *vcpu,
unsigned int index = rs & KVM_MMIO_REG_MASK;
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
vcpu->arch.io_gpr = rs;
-- 
2.34.1



[PATCH v5 0/5] KVM: PPC: MMIO fixes

2022-01-25 Thread Fabiano Rosas
Changes from v4:

-patch 4: switched to kvm_debug_ratelimited.

-patch 5: kept the Program interrupt for BookE.

v4:
https://lore.kernel.org/r/20220121222626.972495-1-faro...@linux.ibm.com

v3:
https://lore.kernel.org/r/20220107210012.4091153-1-faro...@linux.ibm.com

v2:
https://lore.kernel.org/r/20220106200304.4070825-1-faro...@linux.ibm.com

v1:
https://lore.kernel.org/r/20211223211528.3560711-1-faro...@linux.ibm.com

Fabiano Rosas (5):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: PPC: Fix vmx/vsx mixup in mmio emulation
  KVM: PPC: mmio: Reject instructions that access more than mmio.data
size
  KVM: PPC: mmio: Return to guest after emulation failure
  KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure

 arch/powerpc/kvm/emulate_loadstore.c | 10 ++---
 arch/powerpc/kvm/powerpc.c   | 56 
 2 files changed, 43 insertions(+), 23 deletions(-)

-- 
2.34.1



[PATCH v5 1/5] KVM: PPC: Book3S HV: Stop returning internal values to userspace

2022-01-25 Thread Fabiano Rosas
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, contrary to the KVM_RUN ioctl API, which returns 0 on
success.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert in the _vcpu_run function when we return a value different
from the expected one.
---
 arch/powerpc/kvm/powerpc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..50414fb2a5ea 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1841,6 +1841,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+   /*
+* We're already returning to userspace, don't pass the
+* RESUME_HOST flags along.
+*/
+   if (r > 0)
+   r = 0;
+
vcpu_put(vcpu);
return r;
 }
-- 
2.34.1



[PATCH v3 4/4] KVM: PPC: Decrement module refcount if init_vm fails

2022-01-25 Thread Fabiano Rosas
We increment the reference count for KVM-HV/PR before the call to
kvmppc_core_init_vm. If that function fails we need to decrement the
refcount.

Also remove the check on kvm_ops->owner because try_module_get can
handle a NULL module.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/powerpc.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..a6d6d452243f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -431,6 +431,8 @@ int kvm_arch_check_processor_compat(void *opaque)
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
struct kvmppc_ops *kvm_ops = NULL;
+   int r;
+
/*
 * if we have both HV and PR enabled, default is HV
 */
@@ -452,11 +454,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
} else
goto err_out;
 
-   if (kvm_ops->owner && !try_module_get(kvm_ops->owner))
+   if (!try_module_get(kvm_ops->owner))
return -ENOENT;
 
kvm->arch.kvm_ops = kvm_ops;
-   return kvmppc_core_init_vm(kvm);
+   r = kvmppc_core_init_vm(kvm);
+   if (r)
+   module_put(kvm_ops->owner);
+   return r;
 err_out:
return -EINVAL;
 }
-- 
2.34.1



[PATCH v3 3/4] KVM: PPC: Book3S HV: Free allocated memory if module init fails

2022-01-25 Thread Fabiano Rosas
The module's exit function is not called when init fails, so we need
to do the cleanup before returning.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b9aace212599..87a49651a402 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6104,7 +6104,7 @@ static int kvmppc_book3s_init_hv(void)
if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
r = kvm_init_subcore_bitmap();
if (r)
-   return r;
+   goto err;
}
 
/*
@@ -6120,7 +6120,8 @@ static int kvmppc_book3s_init_hv(void)
np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
if (!np) {
		pr_err("KVM-HV: Cannot determine method for accessing XICS\n");
-   return -ENODEV;
+   r = -ENODEV;
+   goto err;
}
/* presence of intc confirmed - node can be dropped again */
of_node_put(np);
@@ -6133,12 +6134,12 @@ static int kvmppc_book3s_init_hv(void)
 
r = kvmppc_mmu_hv_init();
if (r)
-   return r;
+   goto err;
 
if (kvmppc_radix_possible()) {
r = kvmppc_radix_init();
if (r)
-   return r;
+   goto err;
}
 
r = kvmppc_uvmem_init();
@@ -6151,6 +6152,12 @@ static int kvmppc_book3s_init_hv(void)
	kvmppc_hv_ops = &kvm_ops_hv;
 
return 0;
+
+err:
+   kvmhv_nested_exit();
+   kvmppc_radix_exit();
+
+   return r;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.34.1



[PATCH v3 2/4] KVM: PPC: Book3S HV: Delay setting of kvm ops

2022-01-25 Thread Fabiano Rosas
Delay the setting of kvm_hv_ops until after all init code has
completed. This avoids leaving the ops still accessible if the init
fails.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3a3845f366d4..b9aace212599 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6127,9 +6127,6 @@ static int kvmppc_book3s_init_hv(void)
}
 #endif
 
-   kvm_ops_hv.owner = THIS_MODULE;
-   kvmppc_hv_ops = &kvm_ops_hv;
-
init_default_hcalls();
 
init_vcore_lists();
@@ -6145,10 +6142,15 @@ static int kvmppc_book3s_init_hv(void)
}
 
r = kvmppc_uvmem_init();
-   if (r < 0)
+   if (r < 0) {
pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
+   return r;
+   }
 
-   return r;
+   kvm_ops_hv.owner = THIS_MODULE;
+   kvmppc_hv_ops = &kvm_ops_hv;
+
+   return 0;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.34.1



[PATCH v3 1/4] KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init

2022-01-25 Thread Fabiano Rosas
The function's return value is being shadowed by the subsequent call
to kvmppc_uvmem_init.

Fixes: ca9f4942670c ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d1817cd9a691..3a3845f366d4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6138,8 +6138,11 @@ static int kvmppc_book3s_init_hv(void)
if (r)
return r;
 
-   if (kvmppc_radix_possible())
+   if (kvmppc_radix_possible()) {
r = kvmppc_radix_init();
+   if (r)
+   return r;
+   }
 
r = kvmppc_uvmem_init();
if (r < 0)
-- 
2.34.1



[PATCH v3 0/4] KVM: PPC: KVM module exit fixes

2022-01-25 Thread Fabiano Rosas
changes from v2:

- patch 4: Matched module_put() to try_module_get()

v2:
https://lore.kernel.org/r/20220124220803.1011673-1-faro...@linux.ibm.com

v1:
https://lore.kernel.org/r/20211223211931.3560887-1-faro...@linux.ibm.com

Fabiano Rosas (4):
  KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
  KVM: PPC: Book3S HV: Delay setting of kvm ops
  KVM: PPC: Book3S HV: Free allocated memory if module init fails
  KVM: PPC: Decrement module refcount if init_vm fails

 arch/powerpc/kvm/book3s_hv.c | 28 
 arch/powerpc/kvm/powerpc.c   |  9 +++--
 2 files changed, 27 insertions(+), 10 deletions(-)

-- 
2.34.1



Re: [PATCH v2 4/4] KVM: PPC: Decrement module refcount if init_vm fails

2022-01-25 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Excerpts from Fabiano Rosas's message of January 25, 2022 8:08 am:
>> We increment the reference count for KVM-HV/PR before the call to
>> kvmppc_core_init_vm. If that function fails we need to decrement the
>> refcount.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>> Caught this while testing Nick's LPID patches by looking at
>> /sys/module/kvm_hv/refcnt
>
> Nice catch. Is this the only change in the series?

Yes.

> You can just use kvm_ops->owner like try_module_get() does I think? Also
> try_module_get works on a NULL module same as module_put by the looks,
> so you could adjust that in this patch to remove the NULL check so it
> is consistent with the put.

Sure, I'll send a v3.



Re: [PATCH 2/2] KVM: PPC: Book3S PR: Disallow AIL != 0

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> KVM PR does not implement address translation modes on interrupt, so it
> must not allow H_SET_MODE to succeed.
>
> This is not compatible with QEMU behaviour. The solution might be to
> have a cap-ail for this, but now it's broken either way so fix it in
> KVM to start with.
>
> This allows PR Linux guests that are using the SCV facility to boot and
> run, because Linux disables the use of SCV if AIL can not be set to 3.
> This isn't a real fix because Linux or another OS could implement real
> mode SCV vectors and try to enable it. The right solution is for KVM to
> emulate scv interrupts from the facility unavailable interrupt.
>
> Signed-off-by: Nicholas Piggin 
> ---

Reviewed-by: Fabiano Rosas 

>  arch/powerpc/kvm/book3s_pr_papr.c | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c 
> b/arch/powerpc/kvm/book3s_pr_papr.c
> index 1f10e7dfcdd0..dc4f51ac84bc 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -281,6 +281,22 @@ static int kvmppc_h_pr_logical_ci_store(struct kvm_vcpu 
> *vcpu)
>   return EMULATE_DONE;
>  }
>
> +static int kvmppc_h_pr_set_mode(struct kvm_vcpu *vcpu)
> +{
> + unsigned long mflags = kvmppc_get_gpr(vcpu, 4);
> + unsigned long resource = kvmppc_get_gpr(vcpu, 5);
> +
> + if (resource == H_SET_MODE_RESOURCE_ADDR_TRANS_MODE) {
> + /* KVM PR does not provide AIL!=0 to guests */
> + if (mflags == 0)
> + kvmppc_set_gpr(vcpu, 3, H_SUCCESS);
> + else
> + kvmppc_set_gpr(vcpu, 3, H_UNSUPPORTED_FLAG_START - 63);
> + return EMULATE_DONE;
> + }
> + return EMULATE_FAIL;
> +}
> +
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu)
>  {
> @@ -384,6 +400,8 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>   return kvmppc_h_pr_logical_ci_load(vcpu);
>   case H_LOGICAL_CI_STORE:
>   return kvmppc_h_pr_logical_ci_store(vcpu);
> + case H_SET_MODE:
> + return kvmppc_h_pr_set_mode(vcpu);
>   case H_XIRR:
>   case H_CPPR:
>   case H_EOI:
> @@ -421,6 +439,7 @@ int kvmppc_hcall_impl_pr(unsigned long cmd)
>   case H_CEDE:
>   case H_LOGICAL_CI_LOAD:
>   case H_LOGICAL_CI_STORE:
> + case H_SET_MODE:
>  #ifdef CONFIG_KVM_XICS
>   case H_XIRR:
>   case H_CPPR:
> @@ -447,6 +466,7 @@ static unsigned int default_hcall_list[] = {
>   H_BULK_REMOVE,
>   H_PUT_TCE,
>   H_CEDE,
> + H_SET_MODE,
>  #ifdef CONFIG_KVM_XICS
>   H_XIRR,
>   H_CPPR,


Re: [PATCH 1/2] KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> PR KVM does not support running with AIL enabled, and SCV is not
> supported with AIL disabled.
>
> Fix this by ensuring the SCV facility is disabled with FSCR while a
> CPU can be running with AIL=0. PowerNV host supports disabling AIL on a
> per-CPU basis, so SCV just needs to be disabled when a vCPU is run.
>
> The pSeries machine can only switch AIL on a system-wide basis, so it
> must disable SCV support at boot if the configuration can potentially
> run a PR KVM guest.
>
> SCV is not emulated for the PR guest at the moment, this just fixes the
> host crashes.
>
> Alternatives considered and rejected:
> - SCV support can not be disabled by PR KVM after boot, because it is
>   advertised to userspace with HWCAP.
> - AIL can not be disabled on a per-CPU basis. At least when running on
>   pseries it is a per-LPAR setting.
> - Support for real-mode SCV vectors will not be added because they are
>   at 0x17000 so making such a large fixed head space causes immediate
>   value limits to be exceeded, requiring a lot of rework and more code.
> - Disabling SCV for any PR KVM possible kernel will cause a slowdown
>   when not using PR KVM.
> - A boot time option to disable SCV to use PR KVM is user-hostile.
> - System call instruction emulation for SCV facility unavailable
>   instructions is too complex and old emulation code was subtly broken
>   and removed.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/kernel/exceptions-64s.S |  4 
>  arch/powerpc/kernel/setup_64.c   | 15 +++
>  arch/powerpc/kvm/book3s_pr.c | 20 ++--
>  3 files changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 55caeee37c08..b66dd6f775a4 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -809,6 +809,10 @@ __start_interrupts:
>   * - MSR_EE|MSR_RI is clear (no reentrant exceptions)
>   * - Standard kernel environment is set up (stack, paca, etc)
>   *
> + * KVM:
> + * These interrupts do not elevate HV 0->1, so HV is not involved. PR KVM
> + * ensures that FSCR[SCV] is disabled whenever it has to force AIL off.
> + *
>   * Call convention:
>   *
>   * syscall register convention is in Documentation/powerpc/syscall64-abi.rst
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index be8577ac9397..ac52c69a3811 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -197,6 +197,21 @@ static void __init configure_exceptions(void)
>
>   /* Under a PAPR hypervisor, we need hypercalls */
>   if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
> + /*
> +  * PR KVM does not support AIL mode interrupts in the host, and
> +  * SCV system call interrupt vectors are only implemented for
> +  * AIL mode. Under pseries, AIL mode can only be enabled and
> +  * disabled system-wide so when PR KVM is loaded, all CPUs in
> +  * the host are set to AIL=0 mode. SCV can not be disabled
> +  * dynamically because the feature is advertised to host
> +  * userspace, so SCV support must not be enabled if PR KVM can
> +  * possibly be run.
> +  */
> + if (IS_ENABLED(CONFIG_KVM_BOOK3S_PR_POSSIBLE) && !radix_enabled()) {
> + init_task.thread.fscr &= ~FSCR_SCV;
> + cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV;
> + }
> +

"Under pseries, AIL mode can only be enabled and disabled system-wide so
 when PR KVM is loaded, all CPUs in the host are set to AIL=0 mode."

Loaded as in 'modprobe kvm_pr'? And host as in "nested host"
surely. Unless I completely misunderstood the patch (likely).

Is there a way to make this less unexpected to users? Maybe a few words
in the Kconfig entry for PR_POSSIBLE saying "if you enable this and run
a Hash MMU guest, you lose SCV"?

>   /* Enable AIL if possible */
>   if (!pseries_enable_reloc_on_exc()) {
>   init_task.thread.fscr &= ~FSCR_SCV;
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 34a801c3604a..4d1c84b94b77 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -140,9 +140,12 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu 
> *vcpu, int cpu)
>  #endif
>
>   /* Disable AIL if supported */
> - if (cpu_has_feature(CPU_FTR_HVMODE) &&
> - cpu_has_feature(CPU_FTR_ARCH_207S))
> - mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
> + if (cpu_has_feature(CPU_FTR_HVMODE)) {
> + if (cpu_has_feature(CPU_FTR_ARCH_207S))
> + mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
> + if (cpu_has_feature(CPU_FTR_ARCH_300) && (current->thread.fscr & FSCR_SCV))
> + 

[PATCH v2 0/4] KVM: PPC: KVM module exit fixes

2022-01-24 Thread Fabiano Rosas
I stumbled upon another issue with our module exit so I'm sending
another version to add a fix for it.

- patches 1 and 3 are already reviewed;

- patch 2 lacks a Reviewed-by. Nick asked about an issue Alexey might
  have encountered. I haven't heard of any issues with the module exit
  aside from the ones that this series fixes;

- patch 4 is new. It fixes an issue with module refcounting.

v1:
https://lore.kernel.org/r/20211223211931.3560887-1-faro...@linux.ibm.com

Fabiano Rosas (4):
  KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
  KVM: PPC: Book3S HV: Delay setting of kvm ops
  KVM: PPC: Book3S HV: Free allocated memory if module init fails
  KVM: PPC: Decrement module refcount if init_vm fails

 arch/powerpc/kvm/book3s_hv.c | 28 
 arch/powerpc/kvm/powerpc.c   |  7 ++-
 2 files changed, 26 insertions(+), 9 deletions(-)

-- 
2.34.1



Re: [PATCH 6/6] KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> KVMPPC_NR_LPIDS no longer represents any size restriction on the
> LPID space and can be removed. A CPU with more than 12 LPID bits
> implemented will now be able to create more than 4095 guests.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/include/asm/kvm_book3s_asm.h | 3 ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 3 ---
>  2 files changed, 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
> b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index e6bda70b1d93..c8882d9b86c2 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -14,9 +14,6 @@
>  #define XICS_MFRR0xc
>  #define XICS_IPI 2   /* interrupt source # for IPIs */
>
> -/* LPIDs we support with this build -- runtime limit may be lower */
> -#define KVMPPC_NR_LPIDS  (1UL << 12)
> -
>  /* Maximum number of threads per physical core */
>  #define MAX_SMT_THREADS  8
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index f983fb36cbf2..aafd2a74304c 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -269,9 +269,6 @@ int kvmppc_mmu_hv_init(void)
>   nr_lpids = 1UL << KVM_MAX_NESTED_GUESTS_SHIFT;
>   }
>
> - if (nr_lpids > KVMPPC_NR_LPIDS)
> - nr_lpids = KVMPPC_NR_LPIDS;
> -
>   if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
>   /* POWER7 has 10-bit LPIDs, POWER8 has 12-bit LPIDs */
>   if (cpu_has_feature(CPU_FTR_ARCH_207S))


Re: [PATCH 5/6] KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic,
> fix it to 4096 (12-bits) explicitly for now.
>
> kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS
> because the L1 partition table registration hcall already did that, and
> it checks against the partition table size.
>
> This patch also puts all the partition table size calculations into the
> same form, using 12 for the architected size field shift and 4 for the
> shift corresponding to the partition table entry size.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/include/asm/kvm_host.h |  7 ++-
>  arch/powerpc/kvm/book3s_64_mmu_hv.c |  2 +-
>  arch/powerpc/kvm/book3s_hv_nested.c | 24 +++-
>  3 files changed, 18 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 5fd0564e5c94..e6fb03884dcc 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -34,7 +34,12 @@
>  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>  #include   /* for MAX_SMT_THREADS */
>  #define KVM_MAX_VCPU_IDS (MAX_SMT_THREADS * KVM_MAX_VCORES)
> -#define KVM_MAX_NESTED_GUESTSKVMPPC_NR_LPIDS
> +
> +/*
> + * Limit the nested partition table to 4096 entries (because that's what
> + * hardware supports). Both guest and host use this value.
> + */
> +#define KVM_MAX_NESTED_GUESTS_SHIFT  12
>
>  #else
>  #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 5be92d5bc099..f983fb36cbf2 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -266,7 +266,7 @@ int kvmppc_mmu_hv_init(void)
>   return -EINVAL;
>   nr_lpids = 1UL << mmu_lpid_bits;
>   } else {
> - nr_lpids = KVM_MAX_NESTED_GUESTS;
> + nr_lpids = 1UL << KVM_MAX_NESTED_GUESTS_SHIFT;
>   }
>
>   if (nr_lpids > KVMPPC_NR_LPIDS)
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/kvm/book3s_hv_nested.c
> index 1eff969b095c..75169e0753ce 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -439,10 +439,11 @@ long kvmhv_nested_init(void)
>   if (!radix_enabled())
>   return -ENODEV;
>
> - /* find log base 2 of KVMPPC_NR_LPIDS, rounding up */
> - ptb_order = __ilog2(KVMPPC_NR_LPIDS - 1) + 1;
> - if (ptb_order < 8)
> - ptb_order = 8;
> + /* Partition table entry is 1<<4 bytes in size, hence the 4. */
> + ptb_order = KVM_MAX_NESTED_GUESTS_SHIFT + 4;
> + /* Minimum partition table size is 1<<12 bytes */
> + if (ptb_order < 12)
> + ptb_order = 12;
>   pseries_partition_tb = kmalloc(sizeof(struct patb_entry) << ptb_order,
>  GFP_KERNEL);
>   if (!pseries_partition_tb) {
> @@ -450,7 +451,7 @@ long kvmhv_nested_init(void)
>   return -ENOMEM;
>   }
>
> - ptcr = __pa(pseries_partition_tb) | (ptb_order - 8);
> + ptcr = __pa(pseries_partition_tb) | (ptb_order - 12);
>   rc = plpar_hcall_norets(H_SET_PARTITION_TABLE, ptcr);
>   if (rc != H_SUCCESS) {
>   pr_err("kvm-hv: Parent hypervisor does not support nesting 
> (rc=%ld)\n",
> @@ -534,16 +535,14 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu)
>   long ret = H_SUCCESS;
>
>   srcu_idx = srcu_read_lock(&kvm->srcu);
> - /*
> -  * Limit the partition table to 4096 entries (because that's what
> -  * hardware supports), and check the base address.
> -  */
> - if ((ptcr & PRTS_MASK) > 12 - 8 ||
> + /* Check partition size and base address. */
> + if ((ptcr & PRTS_MASK) + 12 - 4 > KVM_MAX_NESTED_GUESTS_SHIFT ||
>   !kvm_is_visible_gfn(vcpu->kvm, (ptcr & PRTB_MASK) >> PAGE_SHIFT))
>   ret = H_PARAMETER;
>   srcu_read_unlock(&kvm->srcu, srcu_idx);
>   if (ret == H_SUCCESS)
>   kvm->arch.l1_ptcr = ptcr;
> +
>   return ret;
>  }
>
> @@ -639,7 +638,7 @@ static void kvmhv_update_ptbl_cache(struct 
> kvm_nested_guest *gp)
>
>   ret = -EFAULT;
>   ptbl_addr = (kvm->arch.l1_ptcr & PRTB_MASK) + (gp->l1_lpid << 4);
> - if (gp->l1_lpid < (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 8))) {
> + if (gp->l1_lpid < (1ul << ((kvm->arch.l1_ptcr & PRTS_MASK) + 12 - 4))) {
>
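
The partition-table size arithmetic discussed above can be checked in a
stand-alone sketch. The constants mirror the patch (PRTS encodes
log2(table size in bytes) - 12; each entry is 1 << 4 bytes, so the entry
count is 1 << (PRTS + 12 - 4)); this is an illustration, not kernel code:

```c
#include <assert.h>

#define PRTS_MASK 0x1fUL
#define KVM_MAX_NESTED_GUESTS_SHIFT 12

/* Reject a PTCR whose encoded table holds more than 4096 entries. */
static int ptcr_size_ok(unsigned long ptcr)
{
	return (ptcr & PRTS_MASK) + 12 - 4 <= KVM_MAX_NESTED_GUESTS_SHIFT;
}

int main(void)
{
	/* 4096 entries * 16 bytes = 64 KiB table -> PRTS = 16 - 12 = 4 */
	assert(ptcr_size_ok(4));
	/* PRTS = 5 would mean 8192 entries, above the 4096-entry limit */
	assert(!ptcr_size_ok(5));
	return 0;
}
```
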

Re: [PATCH 4/6] KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This removes the fixed sized kvm->arch.nested_guests array.
>
> Signed-off-by: Nicholas Piggin 
> ---

Reviewed-by: Fabiano Rosas 

>  arch/powerpc/include/asm/kvm_host.h |   3 +-
>  arch/powerpc/kvm/book3s_hv_nested.c | 110 +++-
>  2 files changed, 59 insertions(+), 54 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index d9bf60bf0816..5fd0564e5c94 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -326,8 +326,7 @@ struct kvm_arch {
>   struct list_head uvmem_pfns;
>   struct mutex mmu_setup_lock;/* nests inside vcpu mutexes */
>   u64 l1_ptcr;
> - int max_nested_lpid;
> - struct kvm_nested_guest *nested_guests[KVM_MAX_NESTED_GUESTS];
> + struct idr kvm_nested_guest_idr;
>   /* This array can grow quite large, keep it at the end */
>   struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
>  #endif
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/kvm/book3s_hv_nested.c
> index 9d373f8963ee..1eff969b095c 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -521,11 +521,6 @@ static void kvmhv_set_nested_ptbl(struct 
> kvm_nested_guest *gp)
>   kvmhv_set_ptbl_entry(gp->shadow_lpid, dw0, gp->process_table);
>  }
>
> -void kvmhv_vm_nested_init(struct kvm *kvm)
> -{
> - kvm->arch.max_nested_lpid = -1;
> -}
> -
>  /*
>   * Handle the H_SET_PARTITION_TABLE hcall.
>   * r4 = guest real address of partition table + log_2(size) - 12
> @@ -660,6 +655,35 @@ static void kvmhv_update_ptbl_cache(struct 
> kvm_nested_guest *gp)
>   kvmhv_set_nested_ptbl(gp);
>  }
>
> +void kvmhv_vm_nested_init(struct kvm *kvm)
> +{
> + idr_init(&kvm->arch.kvm_nested_guest_idr);
> +}
> +
> +static struct kvm_nested_guest *__find_nested(struct kvm *kvm, int lpid)
> +{
> + return idr_find(&kvm->arch.kvm_nested_guest_idr, lpid);
> +}
> +
> +static bool __prealloc_nested(struct kvm *kvm, int lpid)
> +{
> + if (idr_alloc(&kvm->arch.kvm_nested_guest_idr,
> + NULL, lpid, lpid + 1, GFP_KERNEL) != lpid)
> + return false;
> + return true;
> +}
> +
> +static void __add_nested(struct kvm *kvm, int lpid, struct kvm_nested_guest 
> *gp)
> +{
> + if (idr_replace(&kvm->arch.kvm_nested_guest_idr, gp, lpid))
> + WARN_ON(1);
> +}
> +
> +static void __remove_nested(struct kvm *kvm, int lpid)
> +{
> + idr_remove(&kvm->arch.kvm_nested_guest_idr, lpid);
> +}
> +
>  static struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned 
> int lpid)
>  {
>   struct kvm_nested_guest *gp;
> @@ -720,13 +744,8 @@ static void kvmhv_remove_nested(struct kvm_nested_guest 
> *gp)
>   long ref;
>
>   spin_lock(&kvm->mmu_lock);
> - if (gp == kvm->arch.nested_guests[lpid]) {
> - kvm->arch.nested_guests[lpid] = NULL;
> - if (lpid == kvm->arch.max_nested_lpid) {
> - while (--lpid >= 0 && !kvm->arch.nested_guests[lpid])
> - ;
> - kvm->arch.max_nested_lpid = lpid;
> - }
> + if (gp == __find_nested(kvm, lpid)) {
> + __remove_nested(kvm, lpid);
>   --gp->refcnt;
>   }
>   ref = gp->refcnt;
> @@ -743,24 +762,22 @@ static void kvmhv_remove_nested(struct kvm_nested_guest 
> *gp)
>   */
>  void kvmhv_release_all_nested(struct kvm *kvm)
>  {
> - int i;
> + int lpid;
>   struct kvm_nested_guest *gp;
>   struct kvm_nested_guest *freelist = NULL;
>   struct kvm_memory_slot *memslot;
>   int srcu_idx, bkt;
>
>   spin_lock(&kvm->mmu_lock);
> - for (i = 0; i <= kvm->arch.max_nested_lpid; i++) {
> - gp = kvm->arch.nested_guests[i];
> - if (!gp)
> - continue;
> - kvm->arch.nested_guests[i] = NULL;
> + idr_for_each_entry(&kvm->arch.kvm_nested_guest_idr, gp, lpid) {
> + __remove_nested(kvm, lpid);
>   if (--gp->refcnt == 0) {
>   gp->next = freelist;
>   freelist = gp;
>   }
>   }
> - kvm->arch.max_nested_lpid = -1;
> + idr_destroy(&kvm->arch.kvm_nested_guest_idr);
> + /* idr is empty and may be reused at this point */
>   spin_unlock(&kvm->mmu_lock);
>   while ((gp = freelist) != NULL) {
>   freelist = gp->next;
> @@ -797,7 +814,7 @@ struct kvm_nes

Re: [PATCH 3/6] KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This removes the fixed-size lpid_inuse array.
>
> Signed-off-by: Nicholas Piggin 
> ---

Reviewed-by: Fabiano Rosas 

>  arch/powerpc/kvm/powerpc.c | 25 +
>  1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 102993462872..c527a5751b46 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -2453,20 +2453,22 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   return r;
>  }
>
> -static unsigned long lpid_inuse[BITS_TO_LONGS(KVMPPC_NR_LPIDS)];
> +static DEFINE_IDA(lpid_inuse);
>  static unsigned long nr_lpids;
>
>  long kvmppc_alloc_lpid(void)
>  {
> - long lpid;
> + int lpid;
>
> - do {
> - lpid = find_first_zero_bit(lpid_inuse, KVMPPC_NR_LPIDS);
> - if (lpid >= nr_lpids) {
> + /* The host LPID must always be 0 (allocation starts at 1) */
> + lpid = ida_alloc_range(&lpid_inuse, 1, nr_lpids - 1, GFP_KERNEL);
> + if (lpid < 0) {
> + if (lpid == -ENOMEM)
> + pr_err("%s: Out of memory\n", __func__);
> + else
>   pr_err("%s: No LPIDs free\n", __func__);
> - return -ENOMEM;
> - }
> - } while (test_and_set_bit(lpid, lpid_inuse));
> + return -ENOMEM;
> + }
>
>   return lpid;
>  }
> @@ -2474,15 +2476,14 @@ EXPORT_SYMBOL_GPL(kvmppc_alloc_lpid);
>
>  void kvmppc_free_lpid(long lpid)
>  {
> - clear_bit(lpid, lpid_inuse);
> + ida_free(&lpid_inuse, lpid);
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_free_lpid);
>
> +/* nr_lpids_param includes the host LPID */
>  void kvmppc_init_lpid(unsigned long nr_lpids_param)
>  {
> - nr_lpids = min_t(unsigned long, KVMPPC_NR_LPIDS, nr_lpids_param);
> - memset(lpid_inuse, 0, sizeof(lpid_inuse));
> - set_bit(0, lpid_inuse); /* The host LPID must always be 0 */
> + nr_lpids = nr_lpids_param;
>  }
>  EXPORT_SYMBOL_GPL(kvmppc_init_lpid);


Re: [PATCH 2/6] KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> The LPID allocator init is changed to:
> - use mmu_lpid_bits rather than hard-coding;
> - use KVM_MAX_NESTED_GUESTS for nested hypervisors;
> - not reserve the top LPID on POWER9 and newer CPUs.
>
> The reserved LPID is made a POWER7/8-specific detail.
>
> Signed-off-by: Nicholas Piggin 
> ---

Reviewed-by: Fabiano Rosas 

>  arch/powerpc/include/asm/kvm_book3s_asm.h |  2 +-
>  arch/powerpc/include/asm/reg.h|  2 --
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 29 ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  8 +++
>  arch/powerpc/mm/init_64.c |  3 +++
>  5 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
> b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index b6d31bff5209..e6bda70b1d93 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -15,7 +15,7 @@
>  #define XICS_IPI 2   /* interrupt source # for IPIs */
>
>  /* LPIDs we support with this build -- runtime limit may be lower */
> -#define KVMPPC_NR_LPIDS  (LPID_RSVD + 1)
> +#define KVMPPC_NR_LPIDS  (1UL << 12)
>
>  /* Maximum number of threads per physical core */
>  #define MAX_SMT_THREADS  8
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 1e14324c5190..1e8b2e04e626 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -473,8 +473,6 @@
>  #ifndef SPRN_LPID
>  #define SPRN_LPID0x13F   /* Logical Partition Identifier */
>  #endif
> -#define   LPID_RSVD_POWER7   0x3ff   /* Reserved LPID for partn switching */
> -#define   LPID_RSVD  0xfff   /* Reserved LPID for partn switching */
>  #define  SPRN_HMER   0x150   /* Hypervisor maintenance exception reg 
> */
>  #define   HMER_DEBUG_TRIG(1ul << (63 - 17)) /* Debug trigger */
>  #define  SPRN_HMEER  0x151   /* Hyp maintenance exception enable reg 
> */
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
> b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 09fc52b6f390..5be92d5bc099 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -256,7 +256,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct 
> kvm_memory_slot *memslot,
>
>  int kvmppc_mmu_hv_init(void)
>  {
> - unsigned long rsvd_lpid;
> + unsigned long nr_lpids;
>
>   if (!mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE))
>   return -EINVAL;
> @@ -264,16 +264,29 @@ int kvmppc_mmu_hv_init(void)
>   if (cpu_has_feature(CPU_FTR_HVMODE)) {
>   if (WARN_ON(mfspr(SPRN_LPID) != 0))
>   return -EINVAL;
> + nr_lpids = 1UL << mmu_lpid_bits;
> + } else {
> + nr_lpids = KVM_MAX_NESTED_GUESTS;
>   }
>
> - /* POWER8 and above have 12-bit LPIDs (10-bit in POWER7) */
> - if (cpu_has_feature(CPU_FTR_ARCH_207S))
> - rsvd_lpid = LPID_RSVD;
> - else
> - rsvd_lpid = LPID_RSVD_POWER7;
> + if (nr_lpids > KVMPPC_NR_LPIDS)
> + nr_lpids = KVMPPC_NR_LPIDS;
> +
> + if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
> + /* POWER7 has 10-bit LPIDs, POWER8 has 12-bit LPIDs */
> + if (cpu_has_feature(CPU_FTR_ARCH_207S))
> + WARN_ON(nr_lpids != 1UL << 12);
> + else
> + WARN_ON(nr_lpids != 1UL << 10);
> +
> + /*
> +  * Reserve the last implemented LPID use in partition
> +  * switching for POWER7 and POWER8.
> +  */
> + nr_lpids -= 1;
> + }
>
> - /* rsvd_lpid is reserved for use in partition switching */
> - kvmppc_init_lpid(rsvd_lpid);
> + kvmppc_init_lpid(nr_lpids);
>
>   return 0;
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index d185dee26026..0c552885a032 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -50,6 +50,14 @@
>  #define STACK_SLOT_UAMOR (SFS-88)
>  #define STACK_SLOT_FSCR  (SFS-96)
>
> +/*
> + * Use the last LPID (all implemented LPID bits = 1) for partition switching.
> + * This is reserved in the LPID allocator. POWER7 only implements 0x3ff, but
> + * we write 0xfff into the LPID SPR anyway, which seems to work and just
> + * ignores the top bits.
> + */
> +#define   LPID_RSVD  0xfff
> +
>  /*
>   * Call kvmppc_hv_entry in real mode.
>   *

[PATCH v2 4/4] KVM: PPC: Decrement module refcount if init_vm fails

2022-01-24 Thread Fabiano Rosas
We increment the reference count for KVM-HV/PR before the call to
kvmppc_core_init_vm. If that function fails we need to decrement the
refcount.

Signed-off-by: Fabiano Rosas 
---
Caught this while testing Nick's LPID patches by looking at
/sys/module/kvm_hv/refcnt
---
 arch/powerpc/kvm/powerpc.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..4285d0eac900 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -431,6 +431,8 @@ int kvm_arch_check_processor_compat(void *opaque)
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
struct kvmppc_ops *kvm_ops = NULL;
+   int r;
+
/*
 * if we have both HV and PR enabled, default is HV
 */
@@ -456,7 +458,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return -ENOENT;
 
kvm->arch.kvm_ops = kvm_ops;
-   return kvmppc_core_init_vm(kvm);
+   r = kvmppc_core_init_vm(kvm);
+   if (r)
+   module_put(kvm->arch.kvm_ops->owner);
+   return r;
 err_out:
return -EINVAL;
 }
-- 
2.34.1



[PATCH v2 3/4] KVM: PPC: Book3S HV: Free allocated memory if module init fails

2022-01-24 Thread Fabiano Rosas
The module's exit function is not called when init fails, so we need
to do the cleanup before returning.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b9aace212599..87a49651a402 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6104,7 +6104,7 @@ static int kvmppc_book3s_init_hv(void)
if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
r = kvm_init_subcore_bitmap();
if (r)
-   return r;
+   goto err;
}
 
/*
@@ -6120,7 +6120,8 @@ static int kvmppc_book3s_init_hv(void)
np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
if (!np) {
pr_err("KVM-HV: Cannot determine method for accessing 
XICS\n");
-   return -ENODEV;
+   r = -ENODEV;
+   goto err;
}
/* presence of intc confirmed - node can be dropped again */
of_node_put(np);
@@ -6133,12 +6134,12 @@ static int kvmppc_book3s_init_hv(void)
 
r = kvmppc_mmu_hv_init();
if (r)
-   return r;
+   goto err;
 
if (kvmppc_radix_possible()) {
r = kvmppc_radix_init();
if (r)
-   return r;
+   goto err;
}
 
r = kvmppc_uvmem_init();
@@ -6151,6 +6152,12 @@ static int kvmppc_book3s_init_hv(void)
kvmppc_hv_ops = &kvm_ops_hv;
 
return 0;
+
+err:
+   kvmhv_nested_exit();
+   kvmppc_radix_exit();
+
+   return r;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.34.1



[PATCH v2 2/4] KVM: PPC: Book3S HV: Delay setting of kvm ops

2022-01-24 Thread Fabiano Rosas
Delay the setting of kvm_hv_ops until after all init code has
completed. This avoids leaving the ops still accessible if the init
fails.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3a3845f366d4..b9aace212599 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6127,9 +6127,6 @@ static int kvmppc_book3s_init_hv(void)
}
 #endif
 
-   kvm_ops_hv.owner = THIS_MODULE;
-   kvmppc_hv_ops = &kvm_ops_hv;
-
init_default_hcalls();
 
init_vcore_lists();
@@ -6145,10 +6142,15 @@ static int kvmppc_book3s_init_hv(void)
}
 
r = kvmppc_uvmem_init();
-   if (r < 0)
+   if (r < 0) {
pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
+   return r;
+   }
 
-   return r;
+   kvm_ops_hv.owner = THIS_MODULE;
+   kvmppc_hv_ops = &kvm_ops_hv;
+
+   return 0;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.34.1



[PATCH v2 1/4] KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init

2022-01-24 Thread Fabiano Rosas
The return value of kvmppc_radix_init is being shadowed by the
subsequent call to kvmppc_uvmem_init.

Fixes: ca9f4942670c ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d1817cd9a691..3a3845f366d4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6138,8 +6138,11 @@ static int kvmppc_book3s_init_hv(void)
if (r)
return r;
 
-   if (kvmppc_radix_possible())
+   if (kvmppc_radix_possible()) {
r = kvmppc_radix_init();
+   if (r)
+   return r;
+   }
 
r = kvmppc_uvmem_init();
if (r < 0)
-- 
2.34.1



Re: [PATCH] KVM: PPC: Book3S HV P9: Optimise loads around context switch

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> It is better to get all loads for the register values in flight
> before starting to switch LPID, PID, and LPCR because those
> mtSPRs are expensive and serialising.
>
> This also just tidies up the code for a potential future change
> to the context switching sequence.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 


Re: [PATCH] KVM: PPC: Book3S HV: HFSCR[PREFIX] does not exist

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This facility is controlled by FSCR only. Reserved bits should not be
> set in the HFSCR register (although it's likely harmless as this
> position would not be re-used, and the L0 is forgiving here too).
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/include/asm/reg.h | 1 -
>  arch/powerpc/kvm/book3s_hv.c   | 2 +-
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 2835f6363228..1e14324c5190 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -417,7 +417,6 @@
>  #define   FSCR_DSCR  __MASK(FSCR_DSCR_LG)
>  #define   FSCR_INTR_CAUSE (ASM_CONST(0xFF) << 56)/* interrupt cause */
>  #define SPRN_HFSCR   0xbe/* HV=1 Facility Status & Control Register */
> -#define   HFSCR_PREFIX   __MASK(FSCR_PREFIX_LG)
>  #define   HFSCR_MSGP __MASK(FSCR_MSGP_LG)
>  #define   HFSCR_TAR  __MASK(FSCR_TAR_LG)
>  #define   HFSCR_EBB  __MASK(FSCR_EBB_LG)
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 84c89f08ae9a..be8914c3dde9 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2830,7 +2830,7 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu 
> *vcpu)
>* to trap and then we emulate them.
>*/
>   vcpu->arch.hfscr = HFSCR_TAR | HFSCR_EBB | HFSCR_PM | HFSCR_BHRB |
> - HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP | HFSCR_PREFIX;
> + HFSCR_DSCR | HFSCR_VECVSX | HFSCR_FP;
>   if (cpu_has_feature(CPU_FTR_HVMODE)) {
>   vcpu->arch.hfscr &= mfspr(SPRN_HFSCR);
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM


Re: [PATCH] KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs

2022-01-24 Thread Fabiano Rosas
Nicholas Piggin  writes:

> The L0 is storing HFSCR requested by the L1 for the L2 in struct
> kvm_nested_guest when the L1 requests a vCPU enter L2. kvm_nested_guest
> is not a per-vCPU structure. Hilarity ensues.
>
> Fix it by moving the nested hfscr into the vCPU structure together with
> the other per-vCPU nested fields.
>
> Fixes: 8b210a880b35 ("KVM: PPC: Book3S HV Nested: Make nested HFSCR state 
> accessible")
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 


[PATCH v4 5/5] KVM: PPC: mmio: Deliver DSI after emulation failure

2022-01-21 Thread Fabiano Rosas
MMIO emulation can fail if the guest uses an instruction that we are
not prepared to emulate. Since these instructions can be and most
likely are valid ones, this is (slightly) closer to an access fault
than to an illegal instruction, so deliver a Data Storage interrupt
instead of a Program interrupt.

Suggested-by: Nicholas Piggin 
Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/emulate_loadstore.c | 10 +++---
 arch/powerpc/kvm/powerpc.c   | 12 
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/emulate_loadstore.c 
b/arch/powerpc/kvm/emulate_loadstore.c
index 48272a9b9c30..cfc9114b87d0 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -73,7 +73,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 {
u32 inst;
enum emulation_result emulated = EMULATE_FAIL;
-   int advance = 1;
struct instruction_op op;
 
/* this default type might be overwritten by subcategories */
@@ -98,6 +97,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
int type = op.type & INSTR_TYPE_MASK;
int size = GETSIZE(op.type);
 
+   vcpu->mmio_is_write = OP_IS_STORE(type);
+
switch (type) {
case LOAD:  {
int instr_byte_swap = op.type & BYTEREV;
@@ -355,15 +356,10 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
}
}
 
-   if (emulated == EMULATE_FAIL) {
-   advance = 0;
-   kvmppc_core_queue_program(vcpu, 0);
-   }
-
trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
 
/* Advance past emulated instruction. */
-   if (advance)
+   if (emulated != EMULATE_FAIL)
kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
 
return emulated;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 214602c58f13..9befb121dddb 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -305,10 +305,22 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
case EMULATE_FAIL:
{
u32 last_inst;
+   ulong store_bit = DSISR_ISSTORE;
+   ulong cause = DSISR_BADACCESS;
 
+#ifdef CONFIG_BOOKE
+   store_bit = ESR_ST;
+   cause = 0;
+#endif
kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
pr_info_ratelimited("KVM: guest access to device memory using 
unsupported instruction (PID: %d opcode: %#08x)\n",
current->pid, last_inst);
+
+   if (vcpu->mmio_is_write)
+   cause |= store_bit;
+
+   kvmppc_core_queue_data_storage(vcpu, vcpu->arch.vaddr_accessed,
+  cause);
r = RESUME_GUEST;
break;
}
-- 
2.34.1



[PATCH v4 4/5] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-21 Thread Fabiano Rosas
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Also change the message to a ratelimited one since we're letting the
guest run and it could flood the host logs.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 27fb2b70f631..214602c58f13 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -307,9 +307,9 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
u32 last_inst;
 
kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
-   /* XXX Deliver Program interrupt to guest. */
-   pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
-   r = RESUME_HOST;
+   pr_info_ratelimited("KVM: guest access to device memory using 
unsupported instruction (PID: %d opcode: %#08x)\n",
+   current->pid, last_inst);
+   r = RESUME_GUEST;
break;
}
default:
-- 
2.34.1



[PATCH v4 3/5] KVM: PPC: mmio: Reject instructions that access more than mmio.data size

2022-01-21 Thread Fabiano Rosas
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8-bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation in
parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer but we don't fail the emulation until
later at kvmppc_complete_mmio_load and even then we allow userspace to
make a partial copy of the data, which ends up overwriting some fields
of the mmio structure.

This patch makes the emulation fail earlier at kvmppc_handle_load|store,
which will send a Program interrupt to the guest. This is better than
allowing the guest to proceed with partial data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq), there's no account of an actual guest
in the wild running instructions that are not properly emulated.

(While here, remove the "bad MMIO" messages. The caller already has an
error message.)

Signed-off-by: Fabiano Rosas 
Reviewed-by: Alexey Kardashevskiy 
---
 arch/powerpc/kvm/powerpc.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index c2bd29e90314..27fb2b70f631 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1114,10 +1114,8 @@ static void kvmppc_complete_mmio_load(struct kvm_vcpu 
*vcpu)
struct kvm_run *run = vcpu->run;
u64 gpr;
 
-   if (run->mmio.len > sizeof(gpr)) {
-   printk(KERN_ERR "bad MMIO length: %d\n", run->mmio.len);
+   if (run->mmio.len > sizeof(gpr))
return;
-   }
 
if (!vcpu->arch.mmio_host_swabbed) {
switch (run->mmio.len) {
@@ -1236,10 +1234,8 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
host_swabbed = !is_default_endian;
}
 
-   if (bytes > sizeof(run->mmio.data)) {
-   printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
-   }
+   if (bytes > sizeof(run->mmio.data))
+   return EMULATE_FAIL;
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
run->mmio.len = bytes;
@@ -1325,10 +1321,8 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
host_swabbed = !is_default_endian;
}
 
-   if (bytes > sizeof(run->mmio.data)) {
-   printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
-   }
+   if (bytes > sizeof(run->mmio.data))
+   return EMULATE_FAIL;
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
run->mmio.len = bytes;
-- 
2.34.1



[PATCH v4 2/5] KVM: PPC: Fix vmx/vsx mixup in mmio emulation

2022-01-21 Thread Fabiano Rosas
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 50414fb2a5ea..c2bd29e90314 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1499,7 +1499,7 @@ int kvmppc_handle_vmx_load(struct kvm_vcpu *vcpu,
 {
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
while (vcpu->arch.mmio_vmx_copy_nums) {
@@ -1596,7 +1596,7 @@ int kvmppc_handle_vmx_store(struct kvm_vcpu *vcpu,
unsigned int index = rs & KVM_MMIO_REG_MASK;
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
vcpu->arch.io_gpr = rs;
-- 
2.34.1



[PATCH v4 1/5] KVM: PPC: Book3S HV: Stop returning internal values to userspace

2022-01-21 Thread Fabiano Rosas
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, against the API of the KVM_RUN ioctl which returns 0 on
success.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert at the _vcpu_run function when we return a value different
from the expected one.
---
 arch/powerpc/kvm/powerpc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..50414fb2a5ea 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1841,6 +1841,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+   /*
+* We're already returning to userspace, don't pass the
+* RESUME_HOST flags along.
+*/
+   if (r > 0)
+   r = 0;
+
vcpu_put(vcpu);
return r;
 }
-- 
2.34.1



[PATCH v4 0/5] KVM: PPC: MMIO fixes

2022-01-21 Thread Fabiano Rosas
Changes from v3:

Removed all of the low level messages and altered the pr_emerg in the
emulate_mmio to a more descriptive message.

Changed the Program interrupt to a Data Storage. There's an ifdef
needed because this code is shared by HV, PR and booke.

v3:
https://lore.kernel.org/r/20220107210012.4091153-1-faro...@linux.ibm.com

v2:
https://lore.kernel.org/r/20220106200304.4070825-1-faro...@linux.ibm.com

v1:
https://lore.kernel.org/r/20211223211528.3560711-1-faro...@linux.ibm.com

Fabiano Rosas (5):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: PPC: Fix vmx/vsx mixup in mmio emulation
  KVM: PPC: mmio: Reject instructions that access more than mmio.data
size
  KVM: PPC: mmio: Return to guest after emulation failure
  KVM: PPC: mmio: Deliver DSI after emulation failure

 arch/powerpc/kvm/emulate_loadstore.c | 10 ++
 arch/powerpc/kvm/powerpc.c   | 46 ++--
 2 files changed, 33 insertions(+), 23 deletions(-)

-- 
2.34.1



[PATCH 2/2] KVM: selftests: Add support for ppc64le

2022-01-20 Thread Fabiano Rosas
This adds the infrastructure for writing tests for the powerpc
platform (only Radix MMU for now).

This patch also enables two tests:

- a dummy sample test that creates a guest with one vcpu, issues
  hypercalls and reads/writes test values from memory.

- the kvm_page_table test, although at this point I'm not using it to
  test KVM, but mostly as a way to stress test this code.

$ make -C tools/testing/selftests TARGETS=kvm
$ make -C tools/testing/selftests TARGETS=kvm run_tests

Signed-off-by: Fabiano Rosas 
---
 MAINTAINERS   |   3 +
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |  14 +-
 .../selftests/kvm/include/kvm_util_base.h |   7 +
 .../selftests/kvm/include/ppc64le/processor.h |  43 +++
 tools/testing/selftests/kvm/lib/kvm_util.c|   5 +
 .../testing/selftests/kvm/lib/powerpc/hcall.S |   6 +
 .../selftests/kvm/lib/powerpc/processor.c | 343 ++
 .../testing/selftests/kvm/lib/powerpc/ucall.c |  67 
 .../selftests/kvm/powerpc/sample_test.c   | 144 
 10 files changed, 630 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/ppc64le/processor.h
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.S
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/sample_test.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a76e7558b151..15c89d33d584 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10537,6 +10537,9 @@ F:  arch/powerpc/include/asm/kvm*
 F: arch/powerpc/include/uapi/asm/kvm*
 F: arch/powerpc/kernel/kvm*
 F: arch/powerpc/kvm/
+F: tools/testing/selftests/kvm/include/ppc64le/
+F: tools/testing/selftests/kvm/lib/powerpc/
+F: tools/testing/selftests/kvm/powerpc/
 
 KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)
 M: Anup Patel 
diff --git a/tools/testing/selftests/kvm/.gitignore 
b/tools/testing/selftests/kvm/.gitignore
index 8c129961accf..45ab993e2845 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -46,6 +46,7 @@
 /x86_64/xen_vmcall_test
 /x86_64/xss_msr_test
 /x86_64/vmx_pmu_msrs_test
+/powerpc/sample_test
 /access_tracking_perf_test
 /demand_paging_test
 /dirty_log_test
diff --git a/tools/testing/selftests/kvm/Makefile 
b/tools/testing/selftests/kvm/Makefile
index 556da71c33b8..5ae27109e9b9 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -17,9 +17,9 @@ KSFT_KHDR_INSTALL := 1
 # LINUX_TOOL_ARCH_INCLUDE is set using ARCH variable.
 #
 # x86_64 targets are named to include x86_64 as a suffix and directories
-# for includes are in x86_64 sub-directory. s390x and aarch64 follow the
-# same convention. "uname -m" doesn't result in the correct mapping for
-# s390x and aarch64.
+# for includes are in x86_64 sub-directory. s390x, aarch64 and ppc64le
+# follow the same convention. "uname -m" doesn't result in the correct
+# mapping for s390x, aarch64 and ppc64le.
 #
 # No change necessary for x86_64
 UNAME_M := $(shell uname -m)
@@ -36,12 +36,17 @@ endif
 ifeq ($(ARCH),riscv)
UNAME_M := riscv
 endif
+# Set UNAME_M for ppc64le compile/install to work
+ifeq ($(ARCH),powerpc)
+   UNAME_M := ppc64le
+endif
 
 LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/rbtree.c 
lib/sparsebit.c lib/test_util.c lib/guest_modes.c lib/perf_test_util.c
 LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/processor.c lib/x86_64/vmx.c 
lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S
 LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c 
lib/aarch64/handlers.S lib/aarch64/spinlock.c lib/aarch64/gic.c 
lib/aarch64/gic_v3.c lib/aarch64/vgic.c
 LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c 
lib/s390x/diag318_test_handler.c
 LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c
+LIBKVM_ppc64le = lib/powerpc/processor.c lib/powerpc/ucall.c 
lib/powerpc/hcall.S
 
 TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
 TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
@@ -133,6 +138,9 @@ TEST_GEN_PROGS_riscv += kvm_page_table_test
 TEST_GEN_PROGS_riscv += set_memory_region_test
 TEST_GEN_PROGS_riscv += kvm_binary_stats_test
 
+TEST_GEN_PROGS_ppc64le += powerpc/sample_test
+TEST_GEN_PROGS_ppc64le += kvm_page_table_test
+
 TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
 LIBKVM += $(LIBKVM_$(UNAME_M))
 
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h 
b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 66775de26952..a930d663fe67 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -54,6 +54,7 @@ enum vm_guest_mode {
VM_MODE_P36V48_16K,
VM_MODE_P36V48_64K,
VM_MODE_P36V47_16K,
+   VM_MODE_P51V52_64K,
NUM_VM_M

[PATCH 0/2] KVM: selftests: Add powerpc support

2022-01-20 Thread Fabiano Rosas
This series adds the initial support for ppc64le Book3s with Radix
MMU.

At this time I'm including only the kvm_page_table test and a dummy
test to serve as a sample of what can be done with these tests. I
intend to make a pass over the remaining common tests and add the ones
which could be built for powerpc as well.

patch 1: a prerequisite small fix for the powerpc vcpu_ioctl. It is
 the same I already sent to the ppc mailing list but I'll
 include it here to make this a complete series.

patch 2: the actual infrastructure support.

Fabiano Rosas (2):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: selftests: Add support for ppc64le

 MAINTAINERS   |   3 +
 arch/powerpc/kvm/powerpc.c|   8 +
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |  14 +-
 .../selftests/kvm/include/kvm_util_base.h |   7 +
 .../selftests/kvm/include/ppc64le/processor.h |  43 +++
 tools/testing/selftests/kvm/lib/kvm_util.c|   5 +
 .../testing/selftests/kvm/lib/powerpc/hcall.S |   6 +
 .../selftests/kvm/lib/powerpc/processor.c | 343 ++
 .../testing/selftests/kvm/lib/powerpc/ucall.c |  67 
 .../selftests/kvm/powerpc/sample_test.c   | 144 
 11 files changed, 638 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/ppc64le/processor.h
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.S
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/sample_test.c

-- 
2.34.1



[PATCH 1/2] KVM: PPC: Book3S HV: Stop returning internal values to userspace

2022-01-20 Thread Fabiano Rosas
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, against the API of the KVM_RUN ioctl which returns 0 on
success.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert at the _vcpu_run function when we return a value different
from the expected one.
---
 arch/powerpc/kvm/powerpc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..50414fb2a5ea 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1841,6 +1841,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+   /*
+* We're already returning to userspace, don't pass the
+* RESUME_HOST flags along.
+*/
+   if (r > 0)
+   r = 0;
+
vcpu_put(vcpu);
return r;
 }
-- 
2.34.1



Re: [PATCH 2/3] KVM: PPC: Book3S HV: Delay setting of kvm ops

2022-01-11 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Excerpts from Fabiano Rosas's message of December 24, 2021 7:19 am:
>> Delay the setting of kvm_hv_ops until after all init code has
>> completed. This avoids leaving the ops still accessible if the init
>> fails.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  arch/powerpc/kvm/book3s_hv.c | 12 +++-
>>  1 file changed, 7 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 9f4765951733..53400932f5d8 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -6087,9 +6087,6 @@ static int kvmppc_book3s_init_hv(void)
>>  }
>>  #endif
>>  
>> -kvm_ops_hv.owner = THIS_MODULE;
>> -kvmppc_hv_ops = &kvm_ops_hv;
>> -
>>  init_default_hcalls();
>>  
>>  init_vcore_lists();
>> @@ -6105,10 +6102,15 @@ static int kvmppc_book3s_init_hv(void)
>>  }
>>  
>>  r = kvmppc_uvmem_init();
>> -if (r < 0)
>> +if (r < 0) {
>>  pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
>> +return r;
>> +}
>>  
>> -return r;
>> +kvm_ops_hv.owner = THIS_MODULE;
>> +kvmppc_hv_ops = &kvm_ops_hv;
>> +
>> +return 0;
>>  }
>>  
>>  static void kvmppc_book3s_exit_hv(void)
>> -- 
>> 2.33.1
>> 
>> 
>
> Also looks okay to me but KVM init has lots of details. IIRC Alexey may 
> have run into a related issue with ops being set too early (or was it 
> cleared too late?)
>
> Thanks,
> Nick
>

CC Alexey





Re: [PATCH v3 5/6] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-11 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Excerpts from Alexey Kardashevskiy's message of January 11, 2022 9:51 am:
>> 
>> 
>> On 1/10/22 18:36, Nicholas Piggin wrote:
>>> Excerpts from Fabiano Rosas's message of January 8, 2022 7:00 am:
>>>> If MMIO emulation fails we don't want to crash the whole guest by
>>>> returning to userspace.
>>>>
>>>> The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
>>>> implementation") added a todo:
>>>>
>>>>/* XXX Deliver Program interrupt to guest. */
>>>>
>>>> and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
>>>> emulation from priv emulation") added the Program interrupt injection
>>>> but in another file, so I'm assuming it was missed that this block
>>>> needed to be altered.
>>>>
>>>> Signed-off-by: Fabiano Rosas 
>>>> Reviewed-by: Alexey Kardashevskiy 
>>>> ---
>>>>   arch/powerpc/kvm/powerpc.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>>>> index 6daeea4a7de1..56b0faab7a5f 100644
>>>> --- a/arch/powerpc/kvm/powerpc.c
>>>> +++ b/arch/powerpc/kvm/powerpc.c
>>>> @@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
>>>>	kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
>>>>kvmppc_core_queue_program(vcpu, 0);
>>>>pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
>>>> -  r = RESUME_HOST;
>>>> +  r = RESUME_GUEST;
>>> 
>>> So at this point can the pr_info just go away?
>>> 
>>> I wonder if this shouldn't be a DSI rather than a program check.
>>> DSI with DSISR[37] looks a bit more expected. Not that Linux
>>> probably does much with it but at least it would give a SIGBUS
>>> rather than SIGILL.
>> 
>> It does not look like it is more expected to me, it is not about wrong memory 
>> attributes, it is the instruction itself which cannot execute.
>
> It's not an illegal instruction though, it can't execute because of the
> nature of the data / address it is operating on. That says d-side to me.
>
> DSISR[37] isn't perfect but if you squint it's not terrible. It's about
> certain instructions that have restrictions operating on other than
> normal cacheable mappings.

I think I agree with Nick on this one. At least the DSISR gives _some_
information while the Program is maybe too generic. I would probably be
staring at the opcode wondering what is wrong with it.


Re: [PATCH v3 3/6] KVM: PPC: Don't use pr_emerg when mmio emulation fails

2022-01-11 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Excerpts from Fabiano Rosas's message of January 8, 2022 7:00 am:
>> If MMIO emulation fails we deliver a Program interrupt to the
>> guest. This is a normal event for the host, so use pr_info.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>
> Should it be rate limited to prevent guest spamming host log? In the 
> case of informational messages it's always good if it gives the 
> administrator some idea of what to do with it. If it's normal
> for the host does it even need a message? If yes, then identifying which
> guest and adding something like "(this might becaused by a bug in guest 
> driver)" would set the poor admin's mind at rest.

I'll drop this message then and improve the other one that is emitted
earlier at the emulation code.


Re: [PATCH v3 6/6] KVM: PPC: mmio: Reject instructions that access more than mmio.data size

2022-01-11 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Excerpts from Fabiano Rosas's message of January 8, 2022 7:00 am:
>> The MMIO interface between the kernel and userspace uses a structure
>> that supports a maximum of 8-bytes of data. Instructions that access
>> more than that need to be emulated in parts.
>> 
>> We currently don't have generic support for splitting the emulation in
>> parts and each set of instructions needs to be explicitly included.
>> 
>> There's already an error message being printed when a load or store
>> exceeds the mmio.data buffer but we don't fail the emulation until
>> later at kvmppc_complete_mmio_load and even then we allow userspace to
>> make a partial copy of the data, which ends up overwriting some fields
>> of the mmio structure.
>> 
>> This patch makes the emulation fail earlier at kvmppc_handle_load|store,
>> which will send a Program interrupt to the guest. This is better than
>> allowing the guest to proceed with partial data.
>> 
>> Note that this was caught in a somewhat artificial scenario using
>> quadword instructions (lq/stq), there's no account of an actual guest
>> in the wild running instructions that are not properly emulated.
>> 
>> (While here, fix the error message to check against 'bytes' and not
>> 'run->mmio.len' which at this point has an old value.)
>
> This looks good to me
>
> Reviewed-by: Nicholas Piggin 
>
>> 
>> Signed-off-by: Fabiano Rosas 
>> Reviewed-by: Alexey Kardashevskiy 
>> ---
>>  arch/powerpc/kvm/powerpc.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 56b0faab7a5f..a1643ca988e0 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -1246,7 +1246,8 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
>>  
>>  if (bytes > sizeof(run->mmio.data)) {
>>  printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
>> -   run->mmio.len);
>> +   bytes);
>
> I wonder though this should probably be ratelimited, informational (or 
> at least warning because it's a host message), and perhaps a bit more
> explanatory that it's a guest problem (or at least lack of host support
> for particular guest MMIO sizes).

Yes, I'll ratelimit it and try to make it clear that this is something
that happened inside the guest but lacks support in KVM. Then
hopefully people will tell us if they ever need that support.


[PATCH v3 3/6] KVM: PPC: Don't use pr_emerg when mmio emulation fails

2022-01-07 Thread Fabiano Rosas
If MMIO emulation fails we deliver a Program interrupt to the
guest. This is a normal event for the host, so use pr_info.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 92e552ab5a77..4d7d0d080232 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -308,7 +308,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 
kvmppc_get_last_inst(vcpu, INST_GENERIC, _inst);
/* XXX Deliver Program interrupt to guest. */
-   pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
+   pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
r = RESUME_HOST;
break;
}
-- 
2.33.1



[PATCH v3 4/6] KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio

2022-01-07 Thread Fabiano Rosas
If MMIO emulation fails, we queue a Program interrupt to the
guest. Move that line up into kvmppc_emulate_mmio, which is where we
set RESUME_GUEST/HOST. This allows the removal of the 'advance'
variable.

No functional change, just separation of responsibilities.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/emulate_loadstore.c | 8 +---
 arch/powerpc/kvm/powerpc.c   | 2 +-
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 48272a9b9c30..4dec920fe4c9 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -73,7 +73,6 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 {
u32 inst;
enum emulation_result emulated = EMULATE_FAIL;
-   int advance = 1;
struct instruction_op op;
 
/* this default type might be overwritten by subcategories */
@@ -355,15 +354,10 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
}
}
 
-   if (emulated == EMULATE_FAIL) {
-   advance = 0;
-   kvmppc_core_queue_program(vcpu, 0);
-   }
-
trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
 
/* Advance past emulated instruction. */
-   if (advance)
+   if (emulated != EMULATE_FAIL)
kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
 
return emulated;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4d7d0d080232..6daeea4a7de1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -307,7 +307,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
u32 last_inst;
 
kvmppc_get_last_inst(vcpu, INST_GENERIC, _inst);
-   /* XXX Deliver Program interrupt to guest. */
+   kvmppc_core_queue_program(vcpu, 0);
pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
r = RESUME_HOST;
break;
-- 
2.33.1



[PATCH v3 6/6] KVM: PPC: mmio: Reject instructions that access more than mmio.data size

2022-01-07 Thread Fabiano Rosas
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8-bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation in
parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer but we don't fail the emulation until
later at kvmppc_complete_mmio_load and even then we allow userspace to
make a partial copy of the data, which ends up overwriting some fields
of the mmio structure.

This patch makes the emulation fail earlier at kvmppc_handle_load|store,
which will send a Program interrupt to the guest. This is better than
allowing the guest to proceed with partial data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq), there's no account of an actual guest
in the wild running instructions that are not properly emulated.

(While here, fix the error message to check against 'bytes' and not
'run->mmio.len' which at this point has an old value.)

Signed-off-by: Fabiano Rosas 
Reviewed-by: Alexey Kardashevskiy 
---
 arch/powerpc/kvm/powerpc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 56b0faab7a5f..a1643ca988e0 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1246,7 +1246,8 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
 
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);
+   return EMULATE_FAIL;
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1335,7 +1336,8 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);
+   return EMULATE_FAIL;
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
-- 
2.33.1



[PATCH v3 5/6] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-07 Thread Fabiano Rosas
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Alexey Kardashevskiy 
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6daeea4a7de1..56b0faab7a5f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
kvmppc_get_last_inst(vcpu, INST_GENERIC, _inst);
kvmppc_core_queue_program(vcpu, 0);
pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
-   r = RESUME_HOST;
+   r = RESUME_GUEST;
break;
}
default:
-- 
2.33.1



[PATCH v3 1/6] KVM: PPC: Book3S HV: Stop returning internal values to userspace

2022-01-07 Thread Fabiano Rosas
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, contrary to the API of the KVM_RUN ioctl, which returns
0 on success.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert at the _vcpu_run function when we return a value different
from the expected.
---
 arch/powerpc/kvm/powerpc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a72920f4f221..1e130bb087c4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1849,6 +1849,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+   /*
+* We're already returning to userspace, don't pass the
+* RESUME_HOST flags along.
+*/
+   if (r > 0)
+   r = 0;
+
vcpu_put(vcpu);
return r;
 }
-- 
2.33.1



[PATCH v3 2/6] KVM: PPC: Fix vmx/vsx mixup in mmio emulation

2022-01-07 Thread Fabiano Rosas
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1e130bb087c4..92e552ab5a77 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1507,7 +1507,7 @@ int kvmppc_handle_vmx_load(struct kvm_vcpu *vcpu,
 {
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
while (vcpu->arch.mmio_vmx_copy_nums) {
@@ -1604,7 +1604,7 @@ int kvmppc_handle_vmx_store(struct kvm_vcpu *vcpu,
unsigned int index = rs & KVM_MMIO_REG_MASK;
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
vcpu->arch.io_gpr = rs;
-- 
2.33.1



[PATCH v3 0/6] KVM: PPC: MMIO fixes

2022-01-07 Thread Fabiano Rosas
This v3 addresses review comments:

Merge patches 3 and 7, now patch 6, which returns EMULATE_FAIL and now
also alters the error message.

Remove the now unnecessary 'advance' variable from emulate_loadstore
in patch 4.

v2:
https://lore.kernel.org/r/20220106200304.4070825-1-faro...@linux.ibm.com

v1:
https://lore.kernel.org/r/20211223211528.3560711-1-faro...@linux.ibm.com

Fabiano Rosas (6):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: PPC: Fix vmx/vsx mixup in mmio emulation
  KVM: PPC: Don't use pr_emerg when mmio emulation fails
  KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio
  KVM: PPC: mmio: Return to guest after emulation failure
  KVM: PPC: mmio: Reject instructions that access more than mmio.data
size

 arch/powerpc/kvm/emulate_loadstore.c |  8 +---
 arch/powerpc/kvm/powerpc.c   | 24 +---
 2 files changed, 18 insertions(+), 14 deletions(-)

-- 
2.33.1



Re: [PATCH v2 6/7] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-07 Thread Fabiano Rosas
Alexey Kardashevskiy  writes:

> On 07/01/2022 07:03, Fabiano Rosas wrote:
>> If MMIO emulation fails we don't want to crash the whole guest by
>> returning to userspace.
>> 
>> The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
>> implementation") added a todo:
>> 
>>/* XXX Deliver Program interrupt to guest. */
>> 
>> and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
>> emulation from priv emulation") added the Program interrupt injection
>> but in another file, so I'm assuming it was missed that this block
>> needed to be altered.
>> 
>> Signed-off-by: Fabiano Rosas 
>
>
> Looks right.
> Reviewed-by: Alexey Kardashevskiy 
>
> but this means if I want to keep debugging those kvm selftests in 
> comfort, I'll have to have some exception handlers in the vm as 
> otherwise the failing $pc is lost after this change :)

Yes! But that will be a problem for any test. These kinds of issues are
why I wanted a trial period before sending the test infrastructure
upstream. Maybe we don't need exception handlers, but just a way to
force the test to crash if it tries to fetch from 0x700.


[PATCH v2 7/7] KVM: PPC: mmio: Reject instructions that access more than mmio.data size

2022-01-06 Thread Fabiano Rosas
The MMIO interface between the kernel and userspace uses a structure
that supports a maximum of 8-bytes of data. Instructions that access
more than that need to be emulated in parts.

We currently don't have generic support for splitting the emulation in
parts and each set of instructions needs to be explicitly included.

There's already an error message being printed when a load or store
exceeds the mmio.data buffer but we don't fail the emulation until
later at kvmppc_complete_mmio_load and even then we allow userspace to
make a partial copy of the data, which ends up overwriting some fields
of the mmio structure.

This patch makes the emulation fail earlier at kvmppc_handle_load|store,
which will send a Program interrupt to the guest. This is better than
allowing the guest to proceed with partial data.

Note that this was caught in a somewhat artificial scenario using
quadword instructions (lq/stq), there's no account of an actual guest
in the wild running instructions that are not properly emulated.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 50e08635e18a..a1643ca988e0 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1247,6 +1247,7 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
   bytes);
+   return EMULATE_FAIL;
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1336,6 +1337,7 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
   bytes);
+   return EMULATE_FAIL;
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
-- 
2.33.1



[PATCH v2 6/7] KVM: PPC: mmio: Return to guest after emulation failure

2022-01-06 Thread Fabiano Rosas
If MMIO emulation fails we don't want to crash the whole guest by
returning to userspace.

The original commit bbf45ba57eae ("KVM: ppc: PowerPC 440 KVM
implementation") added a todo:

  /* XXX Deliver Program interrupt to guest. */

and later the commit d69614a295ae ("KVM: PPC: Separate loadstore
emulation from priv emulation") added the Program interrupt injection
but in another file, so I'm assuming it was missed that this block
needed to be altered.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a2e78229d645..50e08635e18a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -309,7 +309,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
kvmppc_get_last_inst(vcpu, INST_GENERIC, _inst);
kvmppc_core_queue_program(vcpu, 0);
pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
-   r = RESUME_HOST;
+   r = RESUME_GUEST;
break;
}
default:
-- 
2.33.1



[PATCH v2 5/7] KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio

2022-01-06 Thread Fabiano Rosas
If MMIO emulation fails, we queue a Program interrupt to the
guest. Move that line up into kvmppc_emulate_mmio, which is where we
set RESUME_GUEST/HOST.

No functional change, just separation of responsibilities.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/emulate_loadstore.c | 4 +---
 arch/powerpc/kvm/powerpc.c   | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 48272a9b9c30..ef50e8cfd988 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -355,10 +355,8 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
}
}
 
-   if (emulated == EMULATE_FAIL) {
+   if (emulated == EMULATE_FAIL)
advance = 0;
-   kvmppc_core_queue_program(vcpu, 0);
-   }
 
trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3fc8057db4b4..a2e78229d645 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -307,7 +307,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
u32 last_inst;
 
kvmppc_get_last_inst(vcpu, INST_GENERIC, _inst);
-   /* XXX Deliver Program interrupt to guest. */
+   kvmppc_core_queue_program(vcpu, 0);
pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
r = RESUME_HOST;
break;
-- 
2.33.1



[PATCH v2 4/7] KVM: PPC: Don't use pr_emerg when mmio emulation fails

2022-01-06 Thread Fabiano Rosas
If MMIO emulation fails we deliver a Program interrupt to the
guest. This is a normal event for the host, so use pr_info.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 0b0818d032e1..3fc8057db4b4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -308,7 +308,7 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 
kvmppc_get_last_inst(vcpu, INST_GENERIC, _inst);
/* XXX Deliver Program interrupt to guest. */
-   pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
+   pr_info("%s: emulation failed (%08x)\n", __func__, last_inst);
r = RESUME_HOST;
break;
}
-- 
2.33.1



[PATCH v2 3/7] KVM: PPC: Fix mmio length message

2022-01-06 Thread Fabiano Rosas
We check against 'bytes' but print 'run->mmio.len' which at that point
has an old value.

e.g. 16-byte load:

before:
__kvmppc_handle_load: bad MMIO length: 8

now:
__kvmppc_handle_load: bad MMIO length: 16

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 92e552ab5a77..0b0818d032e1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1246,7 +1246,7 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
 
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1335,7 +1335,7 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
-- 
2.33.1



[PATCH v2 2/7] KVM: PPC: Fix vmx/vsx mixup in mmio emulation

2022-01-06 Thread Fabiano Rosas
The MMIO emulation code for vector instructions is duplicated between
VSX and VMX. When emulating VMX we should check the VMX copy size
instead of the VSX one.

Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 1e130bb087c4..92e552ab5a77 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1507,7 +1507,7 @@ int kvmppc_handle_vmx_load(struct kvm_vcpu *vcpu,
 {
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
while (vcpu->arch.mmio_vmx_copy_nums) {
@@ -1604,7 +1604,7 @@ int kvmppc_handle_vmx_store(struct kvm_vcpu *vcpu,
unsigned int index = rs & KVM_MMIO_REG_MASK;
enum emulation_result emulated = EMULATE_DONE;
 
-   if (vcpu->arch.mmio_vsx_copy_nums > 2)
+   if (vcpu->arch.mmio_vmx_copy_nums > 2)
return EMULATE_FAIL;
 
vcpu->arch.io_gpr = rs;
-- 
2.33.1



[PATCH v2 1/7] KVM: PPC: Book3S HV: Stop returning internal values to userspace

2022-01-06 Thread Fabiano Rosas
Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values
to userspace, contrary to the API of the KVM_RUN ioctl, which returns
0 on success.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Nicholas Piggin 
---
This was noticed while enabling the kvm selftests for powerpc. There's
an assert at the _vcpu_run function when we return a value different
from the expected.
---
 arch/powerpc/kvm/powerpc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index a72920f4f221..1e130bb087c4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1849,6 +1849,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_ALTIVEC
 out:
 #endif
+
+   /*
+* We're already returning to userspace, don't pass the
+* RESUME_HOST flags along.
+*/
+   if (r > 0)
+   r = 0;
+
vcpu_put(vcpu);
return r;
 }
-- 
2.33.1



[PATCH v2 0/7] KVM: PPC: MMIO fixes

2022-01-06 Thread Fabiano Rosas
The change from v1 is that I have altered the MMIO size check to fail
the emulation in case the size exceeds 8 bytes. That brought some
cleanups and another fix along with it: we were returning to userspace
in case of failure instead of to the guest.

This has now become an MMIO series, but since the first commit has
been reviewed already, I'm leaving it here.

v1:
https://lore.kernel.org/r/20211223211528.3560711-1-faro...@linux.ibm.com

Fabiano Rosas (7):
  KVM: PPC: Book3S HV: Stop returning internal values to userspace
  KVM: PPC: Fix vmx/vsx mixup in mmio emulation
  KVM: PPC: Fix mmio length message
  KVM: PPC: Don't use pr_emerg when mmio emulation fails
  KVM: PPC: mmio: Queue interrupt at kvmppc_emulate_mmio
  KVM: PPC: mmio: Return to guest after emulation failure
  KVM: PPC: mmio: Reject instructions that access more than mmio.data
size

 arch/powerpc/kvm/emulate_loadstore.c |  4 +---
 arch/powerpc/kvm/powerpc.c   | 24 +---
 2 files changed, 18 insertions(+), 10 deletions(-)

-- 
2.33.1



Re: [PATCH 3/3] KVM: PPC: Fix mmio length message

2021-12-30 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Excerpts from Fabiano Rosas's message of December 24, 2021 7:15 am:
>> We check against 'bytes' but print 'run->mmio.len' which at that point
>> has an old value.
>> 
>> e.g. 16-byte load:
>> 
>> before:
>> __kvmppc_handle_load: bad MMIO length: 8
>> 
>> now:
>> __kvmppc_handle_load: bad MMIO length: 16
>> 
>> Signed-off-by: Fabiano Rosas 
>
> This patch is fine, but in the case of overflow we continue anyway here.
> Can that overwrite some other memory in the kvm_run struct?

I tested this and QEMU will indeed overwrite the subsequent fields of
kvm_run. A `lq` on this data:

mmio_test_data:
.long   0xdeadbeef
.long   0x8badf00d
.long   0x1337c0de
.long   0x01abcdef

produces:

__kvmppc_handle_load: bad MMIO length: 16
kvmppc_complete_mmio_load data: 0x8badf00ddeadbeef
bad MMIO length: 322420958  <-- mmio.len got nuked

But then we return from kvmppc_complete_mmio_load without writing to the
registers.

>
> This is familiar, maybe something Alexey has noticed in the past too?
> What was the consensus on fixing it? (at least it should have a comment
> if it's not a problem IMO)

My plan was to just add quadword support. And whatever else might
missing. But I got sidetracked with how to test this so I'm just now
coming back to it.

Perhaps a more immediate fix is needed before that? We could block loads
and stores larger than 8 bytes earlier at kvmppc_emulate_loadstore for
instance.


Re: [PATCH 2/3] KVM: PPC: Fix vmx/vsx mixup in mmio emulation

2021-12-27 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Excerpts from Fabiano Rosas's message of December 24, 2021 7:15 am:
>> The MMIO emulation code for vector instructions is duplicated between
>> VSX and VMX. When emulating VMX we should check the VMX copy size
>> instead of the VSX one.
>> 
>> Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...")
>> Signed-off-by: Fabiano Rosas 
>
> Good catch. AFAIKS handle_vmx_store needs the same treatment? If you
> agree then

Half the bug now, half the bug next year... haha I'll send a v2.

aside:
All this duplication is kind of annoying. I'm looking into what it would
take to have quadword instruction emulation here as well (Alexey caught
a bug with syskaller) and the code would be really similar. I see that
x86 has a more generic implementation that maybe we could take advantage
of. See "f78146b0f923 (KVM: Fix page-crossing MMIO)"


[PATCH 2/3] KVM: PPC: Book3S HV: Delay setting of kvm ops

2021-12-23 Thread Fabiano Rosas
Delay the setting of kvm_hv_ops until after all init code has
completed. This avoids leaving the ops still accessible if the init
fails.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9f4765951733..53400932f5d8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6087,9 +6087,6 @@ static int kvmppc_book3s_init_hv(void)
}
 #endif
 
-   kvm_ops_hv.owner = THIS_MODULE;
-   kvmppc_hv_ops = _ops_hv;
-
init_default_hcalls();
 
init_vcore_lists();
@@ -6105,10 +6102,15 @@ static int kvmppc_book3s_init_hv(void)
}
 
r = kvmppc_uvmem_init();
-   if (r < 0)
+   if (r < 0) {
pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
+   return r;
+   }
 
-   return r;
+   kvm_ops_hv.owner = THIS_MODULE;
+   kvmppc_hv_ops = _ops_hv;
+
+   return 0;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.33.1



[PATCH 1/3] KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init

2021-12-23 Thread Fabiano Rosas
The return value of kvmppc_radix_init is being shadowed by the
subsequent call to kvmppc_uvmem_init.

Fixes: ca9f4942670c ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7b74fc0a986b..9f4765951733 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6098,8 +6098,11 @@ static int kvmppc_book3s_init_hv(void)
if (r)
return r;
 
-   if (kvmppc_radix_possible())
+   if (kvmppc_radix_possible()) {
r = kvmppc_radix_init();
+   if (r)
+   return r;
+   }
 
r = kvmppc_uvmem_init();
if (r < 0)
-- 
2.33.1



[PATCH 3/3] KVM: PPC: Book3S HV: Free allocated memory if module init fails

2021-12-23 Thread Fabiano Rosas
The module's exit function is not called when init fails, so we need
to do the cleanup before returning.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_hv.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 53400932f5d8..2d79298e7ca4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6065,7 +6065,7 @@ static int kvmppc_book3s_init_hv(void)
 
r = kvm_init_subcore_bitmap();
if (r)
-   return r;
+   goto err;
 
/*
 * We need a way of accessing the XICS interrupt controller,
@@ -6080,7 +6080,8 @@ static int kvmppc_book3s_init_hv(void)
np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
if (!np) {
pr_err("KVM-HV: Cannot determine method for accessing XICS\n");
-   return -ENODEV;
+   r = -ENODEV;
+   goto err;
}
/* presence of intc confirmed - node can be dropped again */
of_node_put(np);
@@ -6093,12 +6094,12 @@ static int kvmppc_book3s_init_hv(void)
 
r = kvmppc_mmu_hv_init();
if (r)
-   return r;
+   goto err;
 
if (kvmppc_radix_possible()) {
r = kvmppc_radix_init();
if (r)
-   return r;
+   goto err;
}
 
r = kvmppc_uvmem_init();
@@ -6111,6 +6112,12 @@ static int kvmppc_book3s_init_hv(void)
kvmppc_hv_ops = _ops_hv;
 
return 0;
+
+err:
+   kvmhv_nested_exit();
+   kvmppc_radix_exit();
+
+   return r;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.33.1



[PATCH 0/3] KVM: PPC: KVM module exit fixes

2021-12-23 Thread Fabiano Rosas
This is a resend of the module cleanup fixes, but this time without the
HV/PR merge.

Fabiano Rosas (3):
  KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
  KVM: PPC: Book3S HV: Delay setting of kvm ops
  KVM: PPC: Book3S HV: Free allocated memory if module init fails

 arch/powerpc/kvm/book3s_hv.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

-- 
2.33.1



[PATCH 3/3] KVM: PPC: Fix mmio length message

2021-12-23 Thread Fabiano Rosas
We check against 'bytes' but print 'run->mmio.len' which at that point
has an old value.

e.g. 16-byte load:

before:
__kvmppc_handle_load: bad MMIO length: 8

now:
__kvmppc_handle_load: bad MMIO length: 16

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 793d42bd6c8f..7823207eb8f1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1246,7 +1246,7 @@ static int __kvmppc_handle_load(struct kvm_vcpu *vcpu,
 
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
@@ -1335,7 +1335,7 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 
if (bytes > sizeof(run->mmio.data)) {
printk(KERN_ERR "%s: bad MMIO length: %d\n", __func__,
-  run->mmio.len);
+  bytes);
}
 
run->mmio.phys_addr = vcpu->arch.paddr_accessed;
-- 
2.33.1


