Re: [PATCH V4 3/4] cpufreq: powerenv: Migrate to ->exit() callback instead of ->stop_cpu()

2021-06-22 Thread Michael Ellerman
Viresh Kumar  writes:
>
> Subject: Re: [PATCH V4 3/4] cpufreq: powerenv: Migrate to ->exit() callback 
> instead of ->stop_cpu()

Typo in subject should be "powernv".

cheers


[PATCH v2] powerpc/kprobes: Fix Oops by passing ppc_inst as a pointer to emulate_step() on ppc32

2021-06-22 Thread Christophe Leroy
From: Naveen N. Rao 

Trying to use a kprobe on ppc32 results in the below splat:
BUG: Unable to handle kernel data access on read at 0x7c0802a6
Faulting instruction address: 0xc002e9f0
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K PowerPC 44x Platform
Modules linked in:
CPU: 0 PID: 89 Comm: sh Not tainted 5.13.0-rc1-01824-g3a81c0495fdb #7
NIP:  c002e9f0 LR: c0011858 CTR: 8a47
REGS: c292fd50 TRAP: 0300   Not tainted  (5.13.0-rc1-01824-g3a81c0495fdb)
MSR:  9000   CR: 24002002  XER: 2000
DEAR: 7c0802a6 ESR: 

NIP [c002e9f0] emulate_step+0x28/0x324
LR [c0011858] optinsn_slot+0x128/0x1
Call Trace:
 opt_pre_handler+0x7c/0xb4 (unreliable)
 optinsn_slot+0x128/0x1
 ret_from_syscall+0x0/0x28

The offending instruction is:
81 24 00 00 lwz r9,0(r4)

Here, we are trying to load the second argument to emulate_step():
struct ppc_inst, which is the instruction to be emulated. On ppc64,
structures are passed in registers when passed by value. However, per
the ppc32 ABI, structures are always passed to functions as pointers.
This isn't being adhered to when setting up the call to emulate_step()
in the optprobe trampoline. Fix the same.
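
Roughly, the difference is in what the trampoline must load into r4 (the
second argument register). A sketch using the helpers touched below, not
the actual trampoline code:

	unsigned long r4_val;	/* value the trampoline materialises in r4 */

	if (IS_ENABLED(CONFIG_PPC64))
		/* struct ppc_inst fits in a register: pass the instruction word */
		r4_val = ppc_inst_as_ulong(ppc_inst_read(p->ainsn.insn));
	else
		/* ppc32 ABI: aggregates are passed by reference, pass a pointer */
		r4_val = (unsigned long)p->ainsn.insn;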

Fixes: eacf4c0202654a ("powerpc: Enable OPTPROBES on PPC32")
Cc: sta...@vger.kernel.org
Signed-off-by: Naveen N. Rao 
---
v2: Rebased on powerpc/merge 7f030e9d57b8
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/optprobes.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 2b8fe40069ad..53facb4b377f 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -228,8 +228,12 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe 
*op, struct kprobe *p)
/*
 * 3. load instruction to be emulated into relevant register, and
 */
-   temp = ppc_inst_read(p->ainsn.insn);
-   patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + TMPL_INSN_IDX);
+   if (IS_ENABLED(CONFIG_PPC64)) {
+   temp = ppc_inst_read(p->ainsn.insn);
+   patch_imm_load_insns(ppc_inst_as_ulong(temp), 4, buff + 
TMPL_INSN_IDX);
+   } else {
+   patch_imm_load_insns((unsigned long)p->ainsn.insn, 4, buff + 
TMPL_INSN_IDX);
+   }
 
/*
 * 4. branch back from trampoline
-- 
2.25.0



[PATCH] selftests/powerpc: Use req_max_processed_len from sysfs NX capabilities

2021-06-22 Thread Haren Myneni


On PowerVM, the hypervisor defines the maximum buffer length for
each NX request and the kernel exports this value via sysfs.

This patch reads this value if the sysfs entry is available and
uses it to limit the request length.

Signed-off-by: Haren Myneni 
---
 .../testing/selftests/powerpc/nx-gzip/Makefile  |  4 ++--
 .../selftests/powerpc/nx-gzip/gzfht_test.c  | 17 +++--
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/powerpc/nx-gzip/Makefile 
b/tools/testing/selftests/powerpc/nx-gzip/Makefile
index 640fad6cc2c7..0785c2e99d40 100644
--- a/tools/testing/selftests/powerpc/nx-gzip/Makefile
+++ b/tools/testing/selftests/powerpc/nx-gzip/Makefile
@@ -1,8 +1,8 @@
-CFLAGS = -O3 -m64 -I./include
+CFLAGS = -O3 -m64 -I./include -I../include
 
 TEST_GEN_FILES := gzfht_test gunz_test
 TEST_PROGS := nx-gzip-test.sh
 
 include ../../lib.mk
 
-$(TEST_GEN_FILES): gzip_vas.c
+$(TEST_GEN_FILES): gzip_vas.c ../utils.c
diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c 
b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
index b099753b50e4..095195a25687 100644
--- a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
+++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
@@ -60,6 +60,7 @@
 #include 
 #include 
 #include 
+#include "utils.h"
 #include "nxu.h"
 #include "nx.h"
 
@@ -70,6 +71,8 @@ FILE *nx_gzip_log;
 #define FNAME_MAX 1024
 #define FEXT ".nx.gz"
 
+#define SYSFS_MAX_REQ_BUF_PATH 
"devices/vio/ibm,compression-v1/nx_gzip_caps/req_max_processed_len"
+
 /*
  * LZ counts returned in the user supplied nx_gzip_crb_cpb_t structure.
  */
@@ -244,6 +247,7 @@ int compress_file(int argc, char **argv, void *handle)
struct nx_gzip_crb_cpb_t *cmdp;
uint32_t pagelen = 65536;
int fault_tries = NX_MAX_FAULTS;
+   char buf[32];
 
cmdp = (void *)(uintptr_t)
aligned_alloc(sizeof(struct nx_gzip_crb_cpb_t),
@@ -263,8 +267,17 @@ int compress_file(int argc, char **argv, void *handle)
assert(NULL != (outbuf = (char *)malloc(outlen)));
nxu_touch_pages(outbuf, outlen, pagelen, 1);
 
-   /* Compress piecemeal in smallish chunks */
-   chunk = 1<<22;
+   /*
+* On PowerVM, the hypervisor defines the maximum request buffer
+* size and this value is available via sysfs.
+*/
+   if (!read_sysfs_file(SYSFS_MAX_REQ_BUF_PATH, buf, sizeof(buf))) {
+   chunk = atoi(buf);
+   } else {
+   /* sysfs entry is not available on PowerNV */
+   /* Compress piecemeal in smallish chunks */
+   chunk = 1<<22;
+   }
 
/* Write the gzip header to the stream */
num_hdr_bytes = gzip_header_blank(outbuf);
-- 
2.18.2




[powerpc:topic/ppc-kvm] BUILD SUCCESS 51696f39cbee5bb684e7959c0c98b5f54548aa34

2021-06-22 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
topic/ppc-kvm
branch HEAD: 51696f39cbee5bb684e7959c0c98b5f54548aa34  KVM: PPC: Book3S HV: 
Workaround high stack usage with clang

elapsed time: 777m

configs tested: 97
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
arm   h3600_defconfig
powerpcklondike_defconfig
arm   imx_v6_v7_defconfig
powerpc  arches_defconfig
arm rpc_defconfig
powerpcsam440ep_defconfig
sh   se7705_defconfig
powerpc   mpc834x_itxgp_defconfig
mips  malta_defconfig
xtensa   alldefconfig
powerpc  makalu_defconfig
h8300 edosk2674_defconfig
sh   se7724_defconfig
arc   tb10x_defconfig
mips   rbtx49xx_defconfig
sh   se7343_defconfig
m68kmvme16x_defconfig
armrealview_defconfig
x86_64allnoconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a001-20210622
i386 randconfig-a002-20210622
i386 randconfig-a003-20210622
i386 randconfig-a006-20210622
i386 randconfig-a005-20210622
i386 randconfig-a004-20210622
x86_64   randconfig-a012-20210622
x86_64   randconfig-a016-20210622
x86_64   randconfig-a015-20210622
x86_64   randconfig-a014-20210622
x86_64   randconfig-a013-20210622
x86_64   randconfig-a011-20210622
i386 randconfig-a011-20210622
i386 randconfig-a014-20210622
i386 randconfig-a013-20210622
i386 randconfig-a015-20210622
i386 randconfig-a012-20210622
i386 randconfig-a016-20210622
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscvallmodconfig
riscvnommu_k210_defconfig
riscvnommu_virt_defconfig
riscv  rv32_defconfig
x86_64rhel-8.3-kselftests
um   x86_64_defconfig
um i386_defconfig
umkunit_defconfig
x86_64   allyesconfig
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-b001-20210622
x86_64   randconfig-a002-20210622
x86_64   randconfig-a001-20210622
x86_64   randconfig-a005-20210622
x86_64   randconfig-a003-20210622
x86_64   randconfig-a004-20210622
x86_64   randconfig-a006-20210622

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH V4 3/4] cpufreq: powerenv: Migrate to ->exit() callback instead of ->stop_cpu()

2021-06-22 Thread Viresh Kumar
commit 367dc4aa932b ("cpufreq: Add stop CPU callback to cpufreq_driver
interface") added the stop_cpu() callback to allow the drivers to do
clean up before the CPU is completely down and its state can't be
modified.

At that time the CPU hotplug framework used to call the cpufreq core's
registered notifier for different events like CPU_DOWN_PREPARE and
CPU_POST_DEAD. The stop_cpu() callback was called during the
CPU_DOWN_PREPARE event.

This is no longer the case: cpuhp_cpufreq_offline() is now called only
once by the CPU hotplug core, so we don't really need two separate
callbacks for cpufreq drivers, i.e. stop_cpu() and exit(), as everything
can be done from the exit() callback itself.

Migrate to using the exit() callback instead of stop_cpu().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/powernv-cpufreq.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index e439b43c19eb..005600cef273 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -875,7 +875,15 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 
 static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 {
-   /* timer is deleted in cpufreq_cpu_stop() */
+   struct powernv_smp_call_data freq_data;
+   struct global_pstate_info *gpstates = policy->driver_data;
+
+   freq_data.pstate_id = idx_to_pstate(powernv_pstate_info.min);
+   freq_data.gpstate_id = idx_to_pstate(powernv_pstate_info.min);
+   smp_call_function_single(policy->cpu, set_pstate, &freq_data, 1);
+   if (gpstates)
+   del_timer_sync(&gpstates->timer);
+
kfree(policy->driver_data);
 
return 0;
@@ -1007,18 +1015,6 @@ static struct notifier_block powernv_cpufreq_opal_nb = {
.priority   = 0,
 };
 
-static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
-{
-   struct powernv_smp_call_data freq_data;
-   struct global_pstate_info *gpstates = policy->driver_data;
-
-   freq_data.pstate_id = idx_to_pstate(powernv_pstate_info.min);
-   freq_data.gpstate_id = idx_to_pstate(powernv_pstate_info.min);
-   smp_call_function_single(policy->cpu, set_pstate, &freq_data, 1);
-   if (gpstates)
-   del_timer_sync(&gpstates->timer);
-}
-
 static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq)
 {
@@ -1042,7 +1038,6 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
.target_index   = powernv_cpufreq_target_index,
.fast_switch= powernv_fast_switch,
.get= powernv_cpufreq_get,
-   .stop_cpu   = powernv_cpufreq_stop_cpu,
.attr   = powernv_cpu_freq_attr,
 };
 
-- 
2.31.1.272.g89b43f80a514



[PATCH V4 0/4] cpufreq: Migrate away from ->stop_cpu() callback

2021-06-22 Thread Viresh Kumar
Hi Rafael,

These are based on your patch [1] now.

commit 367dc4aa932b ("cpufreq: Add stop CPU callback to cpufreq_driver
interface") added the stop_cpu() callback to allow the drivers to do
clean up before the CPU is completely down and its state can't be
modified.

At that time the CPU hotplug framework used to call the cpufreq core's
registered notifier for different events like CPU_DOWN_PREPARE and
CPU_POST_DEAD. The stop_cpu() callback was called during the
CPU_DOWN_PREPARE event.

This is no longer the case: cpuhp_cpufreq_offline() is now called only
once by the CPU hotplug core, so we don't really need two separate
callbacks for cpufreq drivers, i.e. stop_cpu() and exit(), as everything
can be done from the exit() callback itself.

Migrate to using the offline() or exit() callback instead of stop_cpu().
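
As a sketch of the end state, a (hypothetical) driver that used to need
both callbacks simply folds its former stop_cpu() work into the top of
exit(); the real conversions are in the individual patches:

	static int foo_cpufreq_cpu_exit(struct cpufreq_policy *policy)
	{
		/* work formerly done in ->stop_cpu(): quiesce the hardware */
		foo_quiesce(policy->cpu);

		/* existing ->exit() teardown */
		kfree(policy->driver_data);
		return 0;
	}

	static struct cpufreq_driver foo_cpufreq_driver = {
		.init	= foo_cpufreq_cpu_init,
		.exit	= foo_cpufreq_cpu_exit,
		/* no .stop_cpu callback any more */
	};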

V3->V4:
- Based on a cleanup patch [1] from Rafael, apart from 5.13-rc7.
- No need to update exit() for intel pstate anymore.
- Remove the stop_cpu() callback completely.

--
Viresh

[1] https://lore.kernel.org/linux-pm/5490292.DvuYhMxLoT@kreacher/

Viresh Kumar (4):
  cpufreq: cppc: Migrate to ->exit() callback instead of ->stop_cpu()
  cpufreq: intel_pstate: Migrate to ->offline() instead of ->stop_cpu()
  cpufreq: powerenv: Migrate to ->exit() callback instead of
->stop_cpu()
  cpufreq: Remove stop_cpu() callback

 Documentation/cpu-freq/cpu-drivers.rst|  3 --
 .../zh_CN/cpu-freq/cpu-drivers.rst|  3 --
 drivers/cpufreq/cppc_cpufreq.c| 46 ++-
 drivers/cpufreq/cpufreq.c |  3 --
 drivers/cpufreq/intel_pstate.c| 10 +---
 drivers/cpufreq/powernv-cpufreq.c | 23 --
 include/linux/cpufreq.h   |  1 -
 7 files changed, 35 insertions(+), 54 deletions(-)

-- 
2.31.1.272.g89b43f80a514



[PATCH] powerpc: offline CPU in stop_this_cpu

2021-06-22 Thread Nicholas Piggin
printk_safe_flush_on_panic() has special lock breaking code for the
case where we panic()ed with the console lock held. It relies on
panic IPI causing other CPUs to mark themselves offline.

Do as most other architectures do.

This effectively reverts commit de6e5d38417e ("powerpc: smp_send_stop do
not offline stopped CPUs"). Unfortunately it may result in some false
positive warnings, but the alternative is more situations where we can
crash without getting messages out.
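
Roughly, the panic-time flush path only breaks the lock once it believes
no other CPU can still hold it, along these lines (a simplified sketch,
not the exact printk code):

	if (num_online_cpus() > 1)
		return;		/* another CPU may legitimately hold the lock */

	debug_locks_off();
	/* re-init the lock and flush the per-CPU printk buffers */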

Fixes: de6e5d38417e ("powerpc: smp_send_stop do not offline stopped CPUs")
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/smp.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 2e05c783440a..bf12cca86d70 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -619,6 +619,8 @@ static void nmi_stop_this_cpu(struct pt_regs *regs)
/*
 * IRQs are already hard disabled by the smp_handle_nmi_ipi.
 */
+   set_cpu_online(smp_processor_id(), false);
+
spin_begin();
while (1)
spin_cpu_relax();
@@ -634,6 +636,15 @@ void smp_send_stop(void)
 static void stop_this_cpu(void *dummy)
 {
hard_irq_disable();
+
+   /*
+* Offlining CPUs in stop_this_cpu can result in scheduler warnings,
+* (see commit de6e5d38417e), but printk_safe_flush_on_panic() wants
+* to know other CPUs are offline before it breaks locks to flush
+* printk buffers, in case we panic()ed while holding the lock.
+*/
+   set_cpu_online(smp_processor_id(), false);
+
spin_begin();
while (1)
spin_cpu_relax();
-- 
2.23.0



[powerpc:next] BUILD SUCCESS a736143afd036f2078fe19435b16fd55abc789a9

2021-06-22 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next
branch HEAD: a736143afd036f2078fe19435b16fd55abc789a9  Merge branch 
'topic/ppc-kvm' into next

elapsed time: 725m

configs tested: 104
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
arm   h3600_defconfig
powerpcklondike_defconfig
arm   imx_v6_v7_defconfig
powerpc  arches_defconfig
arm rpc_defconfig
nds32alldefconfig
powerpc   maple_defconfig
arm   corgi_defconfig
arcvdk_hs38_smp_defconfig
mips   sb1250_swarm_defconfig
h8300 edosk2674_defconfig
sh   se7724_defconfig
mips  malta_defconfig
arc   tb10x_defconfig
mips   ip28_defconfig
arm eseries_pxa_defconfig
powerpc ep8248e_defconfig
sh   rts7751r2dplus_defconfig
mips   rbtx49xx_defconfig
sh   se7343_defconfig
armvt8500_v6_v7_defconfig
arm axm55xx_defconfig
arm lpc18xx_defconfig
powerpc tqm8560_defconfig
openrisc simple_smp_defconfig
x86_64allnoconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a001-20210622
i386 randconfig-a002-20210622
i386 randconfig-a003-20210622
i386 randconfig-a006-20210622
i386 randconfig-a005-20210622
i386 randconfig-a004-20210622
x86_64   randconfig-a012-20210622
x86_64   randconfig-a016-20210622
x86_64   randconfig-a015-20210622
x86_64   randconfig-a014-20210622
x86_64   randconfig-a013-20210622
x86_64   randconfig-a011-20210622
i386 randconfig-a014-20210622
i386 randconfig-a013-20210622
i386 randconfig-a015-20210622
i386 randconfig-a012-20210622
i386 randconfig-a016-20210622
i386 randconfig-a011-20210622
riscvnommu_k210_defconfig
riscvallyesconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
riscvallmodconfig
um   x86_64_defconfig
um i386_defconfig
umkunit_defconfig
x86_64   allyesconfig
x86_64rhel-8.3-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-b001-20210622
x86_64   randconfig-a002-20210622
x86_64   randconfig-a001-20210622
x86_64

[powerpc:next-test] BUILD REGRESSION a23408e2575e49c4394f8733c78dce907286ac8e

2021-06-22 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: a23408e2575e49c4394f8733c78dce907286ac8e  powerpc/64s/interrupt: 
Check and fix srr_valid without crashing

possible Error/Warning in current branch:

arch/powerpc/platforms/52xx/mpc52xx_pm.c:58:5: error: stack frame size (1040) 
exceeds limit (1024) in function 'mpc52xx_pm_prepare' 
[-Werror,-Wframe-larger-than]
arch/powerpc/platforms/52xx/mpc52xx_pm.c:58:5: error: stack frame size of 1040 
bytes in function 'mpc52xx_pm_prepare' [-Werror,-Wframe-larger-than]
arch/powerpc/sysdev/ehv_pic.c:111:5: error: no previous prototype for function 
'ehv_pic_set_irq_type' [-Werror,-Wmissing-prototypes]

Error/Warning ids grouped by kconfigs:

clang_recent_errors
|-- powerpc-randconfig-r005-20210622
|   |-- 
arch-powerpc-platforms-52xx-mpc52xx_pm.c:error:stack-frame-size-()-exceeds-limit-()-in-function-mpc52xx_pm_prepare-Werror-Wframe-larger-than
|   `-- 
arch-powerpc-platforms-52xx-mpc52xx_pm.c:error:stack-frame-size-of-bytes-in-function-mpc52xx_pm_prepare-Werror-Wframe-larger-than
`-- powerpc-randconfig-r035-20210622
`-- 
arch-powerpc-sysdev-ehv_pic.c:error:no-previous-prototype-for-function-ehv_pic_set_irq_type-Werror-Wmissing-prototypes

elapsed time: 725m

configs tested: 107
configs skipped: 2

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
arm   h3600_defconfig
powerpcklondike_defconfig
arm   imx_v6_v7_defconfig
powerpc  arches_defconfig
arm rpc_defconfig
powerpc   motionpro_defconfig
arcvdk_hs38_defconfig
powerpc powernv_defconfig
powerpc mpc832x_rdb_defconfig
powerpc tqm5200_defconfig
powerpcgamecube_defconfig
h8300 edosk2674_defconfig
sh   se7724_defconfig
mips  malta_defconfig
arc   tb10x_defconfig
mips   ip28_defconfig
arm eseries_pxa_defconfig
powerpc ep8248e_defconfig
sh   rts7751r2dplus_defconfig
mips   rbtx49xx_defconfig
sh   se7343_defconfig
powerpc mpc836x_rdk_defconfig
sh kfr2r09-romimage_defconfig
m68k  atari_defconfig
powerpc tqm8555_defconfig
powerpc mpc8315_rdb_defconfig
powerpcsam440ep_defconfig
powerpc sequoia_defconfig
x86_64allnoconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
arc  allyesconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a001-20210622
i386 randconfig-a002-20210622
i386 randconfig-a003-20210622
i386 randconfig-a006-20210622
i386 randconfig-a005-20210622
i386 randconfig-a004-20210622
x86_64   randconfig-a012-20210622
x86_64   randconfig-a016-20210622
x86_64   randconfig-a015-20210622
x86_64   randconfig-a014-20210622
x86_64   randconfig-a013-20210622
x86_64   randconfig-a011-20210622
i386 randconfig-a014-20210622
i386

[PATCH] powerpc: Make PPC_IRQ_SOFT_MASK_DEBUG depend on PPC64

2021-06-22 Thread Nicholas Piggin
32-bit platforms don't have irq soft masking.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig.debug | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 6342f9da4545..45d871fb9155 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -84,6 +84,7 @@ config MSI_BITMAP_SELFTEST
 
 config PPC_IRQ_SOFT_MASK_DEBUG
bool "Include extra checks for powerpc irq soft masking"
+   depends on PPC64
 
 config XMON
bool "Include xmon kernel debugger"
-- 
2.23.0



[PATCH v2] powerpc: add compile-time support for lbarx, lharx

2021-06-22 Thread Nicholas Piggin
ISA v2.06 (POWER7 and up) as well as e6500 support lbarx and lharx.
Add a compile option that allows code to use them, and add support for
8 and 16 bit values in cmpxchg and xchg without shifting and masking.
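
For reference, without lbarx/lharx a 1-byte exchange has to be emulated
on the enclosing aligned word with shifts and masks, roughly like the
sketch below (load_reserve()/store_conditional() stand in for the
lwarx/stwcx. asm, big-endian lane selection and barriers omitted):

	unsigned long off = (unsigned long)p & 3;
	u32 *word = (u32 *)((unsigned long)p - off);
	u32 shift = off * 8;
	u32 mask = 0xffu << shift;
	u32 old_word, new_word;

	do {
		old_word = load_reserve(word);			/* lwarx  */
		new_word = (old_word & ~mask) | ((u32)val << shift);
	} while (!store_conditional(word, new_word));		/* stwcx. */

	prev = (old_word & mask) >> shift;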

Signed-off-by: Nicholas Piggin 
---
v2: Fixed lwarx->lharx typo, switched to PPC_HAS_

 arch/powerpc/Kconfig   |   3 +
 arch/powerpc/include/asm/cmpxchg.h | 236 -
 arch/powerpc/lib/sstep.c   |  21 +--
 arch/powerpc/platforms/Kconfig.cputype |   5 +
 4 files changed, 254 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 088dd2afcfe4..dc17f4d51a79 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -278,6 +278,9 @@ config PPC_BARRIER_NOSPEC
default y
depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
 
+config PPC_HAS_LBARX_LHARX
+   bool
+
 config EARLY_PRINTK
bool
default y
diff --git a/arch/powerpc/include/asm/cmpxchg.h 
b/arch/powerpc/include/asm/cmpxchg.h
index cf091c4c22e5..28fbd57db1ec 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -77,10 +77,76 @@ u32 __cmpxchg_##type##sfx(volatile void *p, u32 old, u32 
new)   \
  * the previous value stored there.
  */
 
+#ifndef CONFIG_PPC_HAS_LBARX_LHARX
 XCHG_GEN(u8, _local, "memory");
 XCHG_GEN(u8, _relaxed, "cc");
 XCHG_GEN(u16, _local, "memory");
 XCHG_GEN(u16, _relaxed, "cc");
+#else
+static __always_inline unsigned long
+__xchg_u8_local(volatile void *p, unsigned long val)
+{
+   unsigned long prev;
+
+   __asm__ __volatile__(
+"1:lbarx   %0,0,%2 \n"
+"  stbcx.  %3,0,%2 \n\
+   bne-1b"
+   : "=&r" (prev), "+m" (*(volatile unsigned char *)p)
+   : "r" (p), "r" (val)
+   : "cc", "memory");
+
+   return prev;
+}
+
+static __always_inline unsigned long
+__xchg_u8_relaxed(u8 *p, unsigned long val)
+{
+   unsigned long prev;
+
+   __asm__ __volatile__(
+"1:lbarx   %0,0,%2\n"
+"  stbcx.  %3,0,%2\n"
+"  bne-1b"
+   : "=&r" (prev), "+m" (*p)
+   : "r" (p), "r" (val)
+   : "cc");
+
+   return prev;
+}
+
+static __always_inline unsigned long
+__xchg_u16_local(volatile void *p, unsigned long val)
+{
+   unsigned long prev;
+
+   __asm__ __volatile__(
+"1:lharx   %0,0,%2 \n"
+"  sthcx.  %3,0,%2 \n\
+   bne-1b"
+   : "=&r" (prev), "+m" (*(volatile unsigned short *)p)
+   : "r" (p), "r" (val)
+   : "cc", "memory");
+
+   return prev;
+}
+
+static __always_inline unsigned long
+__xchg_u16_relaxed(u16 *p, unsigned long val)
+{
+   unsigned long prev;
+
+   __asm__ __volatile__(
+"1:lharx   %0,0,%2\n"
+"  sthcx.  %3,0,%2\n"
+"  bne-1b"
+   : "=&r" (prev), "+m" (*p)
+   : "r" (p), "r" (val)
+   : "cc");
+
+   return prev;
+}
+#endif
 
 static __always_inline unsigned long
 __xchg_u32_local(volatile void *p, unsigned long val)
@@ -198,11 +264,12 @@ __xchg_relaxed(void *ptr, unsigned long x, unsigned int 
size)
(__typeof__(*(ptr))) __xchg_relaxed((ptr),  \
(unsigned long)_x_, sizeof(*(ptr)));\
 })
+
 /*
  * Compare and exchange - if *p == old, set it to new,
  * and return the old value of *p.
  */
-
+#ifndef CONFIG_PPC_HAS_LBARX_LHARX
 CMPXCHG_GEN(u8, , PPC_ATOMIC_ENTRY_BARRIER, PPC_ATOMIC_EXIT_BARRIER, "memory");
 CMPXCHG_GEN(u8, _local, , , "memory");
 CMPXCHG_GEN(u8, _acquire, , PPC_ACQUIRE_BARRIER, "memory");
@@ -211,6 +278,173 @@ CMPXCHG_GEN(u16, , PPC_ATOMIC_ENTRY_BARRIER, 
PPC_ATOMIC_EXIT_BARRIER, "memory");
 CMPXCHG_GEN(u16, _local, , , "memory");
 CMPXCHG_GEN(u16, _acquire, , PPC_ACQUIRE_BARRIER, "memory");
 CMPXCHG_GEN(u16, _relaxed, , , "cc");
+#else
+static __always_inline unsigned long
+__cmpxchg_u8(volatile unsigned char *p, unsigned long old, unsigned long new)
+{
+   unsigned int prev;
+
+   __asm__ __volatile__ (
+   PPC_ATOMIC_ENTRY_BARRIER
+"1:lbarx   %0,0,%2 # __cmpxchg_u8\n\
+   cmpw0,%0,%3\n\
+   bne-2f\n"
+"  stbcx.  %4,0,%2\n\
+   bne-1b"
+   PPC_ATOMIC_EXIT_BARRIER
+   "\n\
+2:"
+   : "=&r" (prev), "+m" (*p)
+   : "r" (p), "r" (old), "r" (new)
+   : "cc", "memory");
+
+   return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u8_local(volatile unsigned char *p, unsigned long old,
+   unsigned long new)
+{
+   unsigned int prev;
+
+   __asm__ __volatile__ (
+"1:lbarx   %0,0,%2 # __cmpxchg_u8\n\
+   cmpw0,%0,%3\n\
+   bne-2f\n"
+"  stbcx.  %4,0,%2\n\
+   bne-1b"
+   "\n\
+2:"
+   : "=&r" (prev), "+m" (*p)
+   : "r" (p), "r" (old), "r" (new)
+   : "cc", "memory");
+
+   return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u8_relaxed(u8 *p, unsigned long old, unsigned long new)
+{
+   unsigned long prev;
+
+   __asm__ __volatile__ (
+"1:lbarx   %0,0,%2 # 

[PATCH] powerpc/64s: accumulate_stolen_time remove irq mask workaround

2021-06-22 Thread Nicholas Piggin
The caller has been moved to C after irq soft-mask state has been
reconciled, and Linux irqs have been marked as disabled, so this
does not have to play games with irq internals.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/time.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b67d93a609a2..d0308e804063 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -231,24 +231,13 @@ static u64 scan_dispatch_log(u64 stop_tb)
 void notrace accumulate_stolen_time(void)
 {
u64 sst, ust;
-   unsigned long save_irq_soft_mask = irq_soft_mask_return();
	struct cpu_accounting_data *acct = &local_paca->accounting;
 
-   /* We are called early in the exception entry, before
-* soft/hard_enabled are sync'ed to the expected state
-* for the exception. We are hard disabled but the PACA
-* needs to reflect that so various debug stuff doesn't
-* complain
-*/
-   irq_soft_mask_set(IRQS_DISABLED);
-
sst = scan_dispatch_log(acct->starttime_user);
ust = scan_dispatch_log(acct->starttime);
acct->stime -= sst;
acct->utime -= ust;
acct->steal_time += ust + sst;
-
-   irq_soft_mask_set(save_irq_soft_mask);
 }
 
 static inline u64 calculate_stolen_time(u64 stop_tb)
-- 
2.23.0



[PATCH v2] powerpc/pseries: Enable hardlockup watchdog for PowerVM partitions

2021-06-22 Thread Nicholas Piggin
PowerVM will not arbitrarily oversubscribe or stop guests, page out the
guest kernel text to a NFS volume connected by carrier pigeon to abacus
based storage, etc., as a KVM host might. So PowerVM guests are not
likely to be killed by the hard lockup watchdog in normal operation,
even with shared processor LPARs which still get a minimum allotment of
CPU time.

Enable the hard lockup detector by default on !KVM guests, which we will
assume is PowerVM. It has been useful in finding problems on bare metal
kernels.

Signed-off-by: Nicholas Piggin 
---
v2: Fix 64e build by including kvm_guest.h

 arch/powerpc/kernel/setup_64.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index e42b85e4f1aa..428058dc5114 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -33,6 +33,7 @@
 #include 
 
 #include 
+#include <asm/kvm_guest.h>
 #include 
 #include 
 #include 
@@ -939,16 +940,20 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
  * disable it by default. Book3S has a soft-nmi hardlockup detector based
  * on the decrementer interrupt, so it does not suffer from this problem.
  *
- * It is likely to get false positives in VM guests, so disable it there
- * by default too.
+ * It is likely to get false positives in KVM guests, so disable it there
+ * by default too. PowerVM will not stop or arbitrarily oversubscribe
+ * CPUs, but give a minimum regular allotment even with SPLPAR, so enable
+ * the detector for non-KVM guests, assume PowerVM.
  */
 static int __init disable_hardlockup_detector(void)
 {
 #ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
hardlockup_detector_disable();
 #else
-   if (firmware_has_feature(FW_FEATURE_LPAR))
-   hardlockup_detector_disable();
+   if (firmware_has_feature(FW_FEATURE_LPAR)) {
+   if (is_kvm_guest())
+   hardlockup_detector_disable();
+   }
 #endif
 
return 0;
-- 
2.23.0



Re: [PATCH 2/2] powerpc/prom_init: Pass linux_banner to firmware via option vector 7

2021-06-22 Thread Michael Ellerman
Tyrel Datwyler  writes:
> On 6/20/21 11:49 PM, Michael Ellerman wrote:
>> Pass the value of linux_banner to firmware via option vector 7.
>> 
>> Option vector 7 is described in "LoPAR" Linux on Power Architecture
>> Reference v2.9, in table B.7 on page 824:
>> 
>>   An ASCII character formatted null terminated string that describes
>>   the client operating system. The string shall be human readable and
>>   may be displayed on the console.
>> 
>> The string can be up to 256 bytes total, including the nul terminator.
>> 
>> linux_banner contains lots of information, and should make it possible
>> to identify the exact kernel version that is running:
>> 
>>   const char linux_banner[] =
>>   "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
>> 
>> For example:
>>   Linux version 4.15.0-144-generic (buildd@bos02-ppc64el-018) (gcc
>>   version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #148-Ubuntu SMP Sat May 8
>>   02:32:13 UTC 2021 (Ubuntu 4.15.0-144.148-generic 4.15.18)
>> 
>> It's also printed at boot to the console/dmesg, which should make it
>> possible to correlate what firmware receives with the console/dmesg on
>> the machine.
>> 
>> Signed-off-by: Michael Ellerman 
>> ---
>> 
>> NB. linux_banner is already allowed by prom_init_check.sh
>> 
>> LoPAR: 
>> https://openpowerfoundation.org/?resource_lib=linux-on-power-architecture-reference-a-papr-linux-subset-review-draft
>> ---
>>  arch/powerpc/kernel/prom_init.c | 15 +++
>>  1 file changed, 15 insertions(+)
>> 
>> diff --git a/arch/powerpc/kernel/prom_init.c 
>> b/arch/powerpc/kernel/prom_init.c
>> index c18d55f8b951..7343076b261c 100644
>> --- a/arch/powerpc/kernel/prom_init.c
>> +++ b/arch/powerpc/kernel/prom_init.c
...
>> @@ -1340,6 +1351,10 @@ static void __init prom_check_platform_support(void)
>>  memcpy(&ibm_architecture_vec, &ibm_architecture_vec_template,
>> sizeof(ibm_architecture_vec));
>> 
>> +prom_strscpy_pad(ibm_architecture_vec.vec7.os_id, linux_banner, 256);
>> +// Ensure nul termination
>> +ibm_architecture_vec.vec7.os_id[255] = '\0';
>> +
>
> Doesn't the implementation of prom_strscpy_pad() in patch 1 ensure nul 
> termination?

Yes! I was originally using strncpy(), but forgot to drop this when I
switched to strscpy_pad(). I dropped it when applying.
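
For the 256 byte buffer the boundary case also falls out of
prom_strscpy_pad() itself: if linux_banner is 256 bytes or longer the
copy stops at i == n, the rewind step writes the nul into os_id[255] and
-E2BIG is returned, so the destination is nul terminated either way.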

Thanks for reviewing.

cheers


Re: [PATCH v14 01/12] swiotlb: Refactor swiotlb init functions

2021-06-22 Thread Stefano Stabellini
On Sat, 19 Jun 2021, Claire Chang wrote:
> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
> initialization to make the code reusable.
> 
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> Tested-by: Stefano Stabellini 
> Tested-by: Will Deacon 

Acked-by: Stefano Stabellini 


> ---
>  kernel/dma/swiotlb.c | 50 ++--
>  1 file changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 52e2ac526757..1f9b2b9e7490 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
>   memset(vaddr, 0, bytes);
>  }
>  
> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t 
> start,
> + unsigned long nslabs, bool late_alloc)
>  {
> + void *vaddr = phys_to_virt(start);
>   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> +
> + mem->nslabs = nslabs;
> + mem->start = start;
> + mem->end = mem->start + bytes;
> + mem->index = 0;
> + mem->late_alloc = late_alloc;
> + spin_lock_init(&mem->lock);
> + for (i = 0; i < mem->nslabs; i++) {
> + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> + mem->slots[i].alloc_size = 0;
> + }
> + memset(vaddr, 0, bytes);
> +}
> +
> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
> +{
>   struct io_tlb_mem *mem;
>   size_t alloc_size;
>  
> @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned 
> long nslabs, int verbose)
>   if (!mem)
>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> __func__, alloc_size, PAGE_SIZE);
> - mem->nslabs = nslabs;
> - mem->start = __pa(tlb);
> - mem->end = mem->start + bytes;
> - mem->index = 0;
> - spin_lock_init(&mem->lock);
> - for (i = 0; i < mem->nslabs; i++) {
> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> - mem->slots[i].alloc_size = 0;
> - }
> +
> + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>  
>   io_tlb_default_mem = mem;
>   if (verbose)
> @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
>  int
>  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>  {
> - unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>   struct io_tlb_mem *mem;
> + unsigned long bytes = nslabs << IO_TLB_SHIFT;
>  
>   if (swiotlb_force == SWIOTLB_NO_FORCE)
>   return 0;
> @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long 
> nslabs)
>   if (!mem)
>   return -ENOMEM;
>  
> - mem->nslabs = nslabs;
> - mem->start = virt_to_phys(tlb);
> - mem->end = mem->start + bytes;
> - mem->index = 0;
> - mem->late_alloc = 1;
> - spin_lock_init(>lock);
> - for (i = 0; i < mem->nslabs; i++) {
> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> - mem->slots[i].alloc_size = 0;
> - }
> -
> + memset(mem, 0, sizeof(*mem));
>   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
> - memset(tlb, 0, bytes);
> + swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
>  
>   io_tlb_default_mem = mem;
>   swiotlb_print_info();
> -- 
> 2.32.0.288.g62a8d224e6-goog
> 


Re: [PATCH 1/2] powerpc/prom_init: Convert prom_strcpy() into prom_strscpy_pad()

2021-06-22 Thread Tyrel Datwyler
On 6/21/21 9:11 PM, Michael Ellerman wrote:
> Daniel Axtens  writes:
>> Hi
>>
>>> -static char __init *prom_strcpy(char *dest, const char *src)
>>> +static ssize_t __init prom_strscpy_pad(char *dest, const char *src, size_t 
>>> n)
>>>  {
>>> -   char *tmp = dest;
>>> +   ssize_t rc;
>>> +   size_t i;
>>>  
>>> -   while ((*dest++ = *src++) != '\0')
>>> -   /* nothing */;
>>> -   return tmp;
>>> +   if (n == 0 || n > INT_MAX)
>>> +   return -E2BIG;
>>> +
>>> +   // Copy up to n bytes
>>> +   for (i = 0; i < n && src[i] != '\0'; i++)
>>> +   dest[i] = src[i];
>>> +
>>> +   rc = i;
>>> +
>>> +   // If we copied all n then we have run out of space for the nul
>>> +   if (rc == n) {
>>> +   // Rewind by one character to ensure nul termination
>>> +   i--;
>>> +   rc = -E2BIG;
>>> +   }
>>> +
>>> +   for (; i < n; i++)
>>> +   dest[i] = '\0';
>>> +
>>> +   return rc;
>>>  }
>>>  
>>
>> This implementation seems good to me.
>>
>> I copied it into a new C file and added the following:
>>
>> int main() {
>>  char longstr[255]="abcdefghijklmnopqrstuvwxyz";
>>  char shortstr[5];
>>  assert(prom_strscpy_pad(longstr, "", 0) == -E2BIG);
>>  assert(prom_strscpy_pad(longstr, "hello", 255) == 5);
>>  assert(prom_strscpy_pad(shortstr, "hello", 5) == -E2BIG);
>>  assert(memcmp(shortstr, "hell", 5) == 0);
>>  assert(memcmp(longstr, "hello\0\0\0\0\0\0\0\0\0", 6) == 0);
>>  return 0;
>> }
>>
>> All the assertions pass. I believe this covers all the conditions from
>> the strscpy_pad docstring.
>>
>> Reviewed-by: Daniel Axtens 
> 
> Thanks.
> 
> I'll also drop the explicit nul termination in patch 2, which is a
> leftover from when I was using strncpy().

I guess you can ignore my other email questioning this.

-Tyrel

> 
> cheers
> 



Re: [PATCH 2/2] powerpc/prom_init: Pass linux_banner to firmware via option vector 7

2021-06-22 Thread Tyrel Datwyler
On 6/20/21 11:49 PM, Michael Ellerman wrote:
> Pass the value of linux_banner to firmware via option vector 7.
> 
> Option vector 7 is described in "LoPAR" Linux on Power Architecture
> Reference v2.9, in table B.7 on page 824:
> 
>   An ASCII character formatted null terminated string that describes
>   the client operating system. The string shall be human readable and
>   may be displayed on the console.
> 
> The string can be up to 256 bytes total, including the nul terminator.
> 
> linux_banner contains lots of information, and should make it possible
> to identify the exact kernel version that is running:
> 
>   const char linux_banner[] =
>   "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
> 
> For example:
>   Linux version 4.15.0-144-generic (buildd@bos02-ppc64el-018) (gcc
>   version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #148-Ubuntu SMP Sat May 8
>   02:32:13 UTC 2021 (Ubuntu 4.15.0-144.148-generic 4.15.18)
> 
> It's also printed at boot to the console/dmesg, which should make it
> possible to correlate what firmware receives with the console/dmesg on
> the machine.
> 
> Signed-off-by: Michael Ellerman 
> ---
> 
> NB. linux_banner is already allowed by prom_init_check.sh
> 
> LoPAR: 
> https://openpowerfoundation.org/?resource_lib=linux-on-power-architecture-reference-a-papr-linux-subset-review-draft
> ---
>  arch/powerpc/kernel/prom_init.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index c18d55f8b951..7343076b261c 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -944,6 +945,10 @@ struct option_vector6 {
>   u8 os_name;
>  } __packed;
> 
> +struct option_vector7 {
> + u8 os_id[256];
> +} __packed;
> +
>  struct ibm_arch_vec {
>   struct { u32 mask, val; } pvrs[14];
> 
> @@ -966,6 +971,9 @@ struct ibm_arch_vec {
> 
>   u8 vec6_len;
>   struct option_vector6 vec6;
> +
> + u8 vec7_len;
> + struct option_vector7 vec7;
>  } __packed;
> 
>  static const struct ibm_arch_vec ibm_architecture_vec_template __initconst = 
> {
> @@ -1112,6 +1120,9 @@ static const struct ibm_arch_vec 
> ibm_architecture_vec_template __initconst = {
>   .secondary_pteg = 0,
>   .os_name = OV6_LINUX,
>   },
> +
> + /* option vector 7: OS Identification */
> + .vec7_len = VECTOR_LENGTH(sizeof(struct option_vector7)),
>  };
> 
>  static struct ibm_arch_vec __prombss ibm_architecture_vec  
> cacheline_aligned;
> @@ -1340,6 +1351,10 @@ static void __init prom_check_platform_support(void)
>   memcpy(&ibm_architecture_vec, &ibm_architecture_vec_template,
>  sizeof(ibm_architecture_vec));
> 
> + prom_strscpy_pad(ibm_architecture_vec.vec7.os_id, linux_banner, 256);
> + // Ensure nul termination
> + ibm_architecture_vec.vec7.os_id[255] = '\0';
> +

Doesn't the implementation of prom_strscpy_pad() in patch 1 ensure nul 
termination?

-Tyrel

>   if (prop_len > 1) {
>   int i;
>   u8 vec[8];
> 



Re: linux-next: manual merge of the kvm tree with the powerpc tree

2021-06-22 Thread Paolo Bonzini

On 22/06/21 16:51, Michael Ellerman wrote:
>> Please drop the patches at
>> https://www.spinics.net/lists/kvm-ppc/msg18666.html from the powerpc
>> tree, and merge them through either the kvm-powerpc or kvm trees.
>
> The kvm-ppc tree is not taking patches at the moment.

If so, let's remove the "T" entry from MAINTAINERS and add an entry for 
the k...@vger.kernel.org mailing list.

>   
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=topic/ppc-kvm
>
> The commit Stephen mentioned has been rebased since to squash in a fix.
> But what is in the topic branch is now final, I won't rebase what's
> there.

Thanks, I pulled it.  Anyway, if the workflow is not the one indicated 
by MAINTAINERS it's never a bad idea to Cc more people when applying 
patches.

Paolo



Re: [PATCH v4 7/7] powerpc/pseries: Add support for FORM2 associativity

2021-06-22 Thread Daniel Henrique Barboza




On 6/22/21 9:07 AM, Aneesh Kumar K.V wrote:

Daniel Henrique Barboza  writes:


On 6/17/21 1:51 PM, Aneesh Kumar K.V wrote:

PAPR interface currently supports two different ways of communicating resource
grouping details to the OS. These are referred to as Form 0 and Form 1
associativity grouping. Form 0 is the older format and is now considered
deprecated. This patch adds another resource grouping named FORM2.

Signed-off-by: Daniel Henrique Barboza 
Signed-off-by: Aneesh Kumar K.V 
---
   Documentation/powerpc/associativity.rst   | 135 
   arch/powerpc/include/asm/firmware.h   |   3 +-
   arch/powerpc/include/asm/prom.h   |   1 +
   arch/powerpc/kernel/prom_init.c   |   3 +-
   arch/powerpc/mm/numa.c| 149 +-
   arch/powerpc/platforms/pseries/firmware.c |   1 +
   6 files changed, 286 insertions(+), 6 deletions(-)
   create mode 100644 Documentation/powerpc/associativity.rst

diff --git a/Documentation/powerpc/associativity.rst 
b/Documentation/powerpc/associativity.rst
new file mode 100644
index ..93be604ac54d
--- /dev/null
+++ b/Documentation/powerpc/associativity.rst
@@ -0,0 +1,135 @@
+
+NUMA resource associativity
+=
+
+Associativity represents the groupings of the various platform resources into
+domains of substantially similar mean performance relative to resources outside
+of that domain. Resource subsets of a given domain that exhibit better
+performance relative to each other than relative to other resource subsets
+are represented as being members of a sub-grouping domain. This performance
+characteristic is presented in terms of NUMA node distance within the Linux 
kernel.
+From the platform view, these groups are also referred to as domains.
+
+PAPR interface currently supports different ways of communicating these 
resource
+grouping details to the OS. These are referred to as Form 0, Form 1 and Form 2
+associativity grouping. Form 0 is the older format and is now considered 
deprecated.
+
+Hypervisor indicates the type/form of associativity used via "ibm,architecture-vec-5 
property".
+Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of 
Form 0 or Form 1.
+A value of 1 indicates the usage of Form 1 associativity. For Form 2 
associativity
+bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used.
+
+Form 0
+-
+Form 0 associativity supports only two NUMA distance (LOCAL and REMOTE).
+
+Form 1
+-
+With Form 1 a combination of ibm,associativity-reference-points and 
ibm,associativity
+device tree properties are used to determine the NUMA distance between 
resource groups/domains.
+
+The “ibm,associativity” property contains one or more lists of numbers 
(domainID)
+representing the resource’s platform grouping domains.
+
+The “ibm,associativity-reference-points” property contains one or more list of 
numbers
+(domainID index) that represents the 1 based ordinal in the associativity 
lists.
+The list of domainID indexes represents an increasing hierarchy of resource grouping.
+
+ex:
+{ primary domainID index, secondary domainID index, tertiary domainID index.. }
+
+Linux kernel uses the domainID at the primary domainID index as the NUMA node 
id.
+Linux kernel computes NUMA distance between two domains by recursively 
comparing
+if they belong to the same higher-level domains. For mismatch at every higher
+level of the resource group, the kernel doubles the NUMA distance between the
+comparing domains.
+
+Form 2
+---
+Form 2 associativity format adds separate device tree properties representing 
NUMA node distance
+thereby making the node distance computation flexible. Form 2 also allows 
flexible primary
+domain numbering. With numa distance computation now detached from the index 
value of
+"ibm,associativity" property, Form 2 allows a large number of primary domain 
ids at the
+same domainID index representing resource groups of different 
performance/latency characteristics.
+
+Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in 
the
+"ibm,architecture-vec-5" property.
+
+"ibm,numa-lookup-index-table" property contains one or more list numbers 
representing
+the domainIDs present in the system. The offset of the domainID in this 
property is considered
+the domainID index.
+
+prop-encoded-array: The number N of the domainIDs encoded as with encode-int, 
followed by
+N domainID encoded as with encode-int
+
+For ex:
+ibm,numa-lookup-index-table =  {4, 0, 8, 250, 252}, domainID index for 
domainID 8 is 1.
+
+"ibm,numa-distance-table" property contains one or more list of numbers 
representing the NUMA
+distance between resource groups/domains present in the system.
+
+prop-encoded-array: The number N of the distance values encoded as with 
encode-int, followed by
+N distance values encoded as with encode-bytes. The max distance value we 
could encode is 255.
+
+For ex:

Re: [powerpc][next-20210621] WARNING at kernel/sched/fair.c:3277 during boot

2021-06-22 Thread Sachin Sant
>> On Tue, 22 Jun 2021 at 09:39, Sachin Sant  wrote:
>>> 
>>> While booting 5.13.0-rc7-next-20210621 on a PowerVM LPAR following warning
>>> is seen
>>> 
>>> [   30.922154] [ cut here ]
>>> [   30.922201] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || 
>>> cfs_rq->avg.runnable_avg
>>> [   30.922219] WARNING: CPU: 6 PID: 762 at kernel/sched/fair.c:3277 
>>> update_blocked_averages+0x758/0x780
>> 
>> Yes. That was exactly the purpose of the patch. There is one last
>> remaining part which could generate this. I'm going to prepare a patch
> 
> Could you try the patch below ? I have been able to reproduce the problem 
> locally and this
> fix it on my system:
> 
I can still recreate the issue with this patch applied.

 Starting Terminate Plymouth Boot Screen...
 Starting Hold until boot process finishes up...
[FAILED] Failed to start Crash recovery kernel arming.
See 'systemctl status kdump.service' for details.
[   10.737913] [ cut here ]
[   10.737960] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || 
cfs_rq->avg.runnable_avg
[   10.737976] WARNING: CPU: 27 PID: 146 at kernel/sched/fair.c:3279 
update_blocked_averages+0x758/0x780
[   10.738010] Modules linked in: stp llc rfkill sunrpc pseries_rng xts 
vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables xfs libcrc32c sr_mod 
sd_mod cdrom t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror 
dm_region_hash dm_log dm_mod fuse
[   10.738089] CPU: 27 PID: 146 Comm: ksoftirqd/27 Not tainted 
5.13.0-rc7-next-20210621-dirty #2
[   10.738103] NIP:  c01b2768 LR: c01b2764 CTR: c0729120
[   10.738116] REGS: c00015973840 TRAP: 0700   Not tainted  
(5.13.0-rc7-next-20210621-dirty)
[   10.738130] MSR:  8282b033   CR: 
48000224  XER: 0005
[   10.738161] CFAR: c014d120 IRQMASK: 1 
[   10.738161] GPR00: c01b2764 c00015973ae0 c29bb900 
0048 
[   10.738161] GPR04: fffe c000159737a0 0027 
c0154f9f7e18 
[   10.738161] GPR08: 0023 0001 0027 
c0167f1d7fe8 
[   10.738161] GPR12:  c0154ffd7e80 c0154fa82580 
b78a 
[   10.738161] GPR16: 00028007883c 02ed c00038d31000 
 
[   10.738161] GPR20:  c29fdfe0  
037b 
[   10.738161] GPR24:  c0154fa82f90 0001 
c0003d4ca400 
[   10.738161] GPR28: 02ed c00038d311c0 c00038d31100 
 
[   10.738281] NIP [c01b2768] update_blocked_averages+0x758/0x780
[   10.738290] LR [c01b2764] update_blocked_averages+0x754/0x780
[   10.738299] Call Trace:
[   10.738303] [c00015973ae0] [c01b2764] 
update_blocked_averages+0x754/0x780 (unreliable)
[   10.738315] [c00015973c00] [c01be720] 
run_rebalance_domains+0xa0/0xd0
[   10.738326] [c00015973c30] [c0cf9acc] __do_softirq+0x15c/0x3d4
[   10.738337] [c00015973d20] [c0158464] run_ksoftirqd+0x64/0x90
[   10.738346] [c00015973d40] [c018fd24] 
smpboot_thread_fn+0x204/0x270
[   10.738357] [c00015973da0] [c0189770] kthread+0x190/0x1a0
[   10.738367] [c00015973e10] [c000ceec] 
ret_from_kernel_thread+0x5c/0x70
[   10.738381] Instruction dump:
[   10.738388] 3863c808 9be9eefe 4bf9a979 6000 0fe0 4bfff980 e9210070 
e8610088 
[   10.738410] 3941 99490003 4bf9a959 6000 <0fe0> 4bfffc24 3d22fff6 
8929eefb 
[   10.738431] ---[ end trace 9ca80b55840c53f0 ]---

Thanks
-Sachin

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8cc27b847ad8..da91db1c137f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3037,8 +3037,9 @@ enqueue_load_avg(struct cfs_rq *cfs_rq, struct 
> sched_entity *se)
> static inline void
> dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> +   u32 divider = get_pelt_divider(&cfs_rq->avg);
>    sub_positive(&cfs_rq->avg.load_avg, se->avg.load_avg);
> -   sub_positive(&cfs_rq->avg.load_sum, se_weight(se) * se->avg.load_sum);
> +   cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * divider;
> }
> #else
> static inline void
> 


Re: linux-next: manual merge of the kvm tree with the powerpc tree

2021-06-22 Thread Michael Ellerman
Paolo Bonzini  writes:
> On 22/06/21 07:25, Stephen Rothwell wrote:
>> Hi all,
>> 
>> Today's linux-next merge of the kvm tree got a conflict in:
>> 
>>include/uapi/linux/kvm.h
>> 
>> between commit:
>> 
>>9bb4a6f38fd4 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE 
>> capability")
>> 
>> from the powerpc tree and commits:
>> 
>>644f706719f0 ("KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_ENFORCE_CPUID")
>>6dba94035203 ("KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2")
>>0dbb11230437 ("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall")
>> 
>> from the kvm tree.
>> 
>> I fixed it up (see below) and can carry the fix as necessary. This
>> is now fixed as far as linux-next is concerned, but any non trivial
>> conflicts should be mentioned to your upstream maintainer when your tree
>> is submitted for merging.  You may also want to consider cooperating
>> with the maintainer of the conflicting tree to minimise any particularly
>> complex conflicts.
>> 
>
> What are the dependencies of these KVM patches on patches from the bare 
> metal trees,

I don't think there's actually a semantic dependency on my tree, but
there's multiple textual conflicts with my tree. That series has to go
via both trees, or there will be conflicts.

> ... and can you guys *please* start using topic branches?
>
> I've been asking you for literally years, but this is the first time I 
> remember that Linus will have to resolve conflicts in uAPI changes and 
> it is *not* acceptable.

The patches are in a topic branch, which I will ask you to pull before
the merge window, in order to resolve any conflicts.

> Please drop the patches at 
> https://www.spinics.net/lists/kvm-ppc/msg18666.html from the powerpc 
> tree, and merge them through either the kvm-powerpc or kvm trees.

The kvm-ppc tree is not taking patches at the moment.

But it doesn't matter anyway, this series needs to be merged into my
tree and the KVM tree regardless.

The topic branch is here:

  
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=topic/ppc-kvm


The commit Stephen mentioned has been rebased since to squash in a fix.
But what is in the topic branch is now final, I won't rebase what's
there.

cheers


Re: [powerpc][next-20210621] WARNING at kernel/sched/fair.c:3277 during boot

2021-06-22 Thread Vincent Guittot
Le mardi 22 juin 2021 à 09:49:31 (+0200), Vincent Guittot a écrit :
> Hi Sachin,
> 
> On Tue, 22 Jun 2021 at 09:39, Sachin Sant  wrote:
> >
> > While booting 5.13.0-rc7-next-20210621 on a PowerVM LPAR following warning
> > is seen
> >
> > [   30.922154] [ cut here ]
> > [   30.922201] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || 
> > cfs_rq->avg.runnable_avg
> > [   30.922219] WARNING: CPU: 6 PID: 762 at kernel/sched/fair.c:3277 
> > update_blocked_averages+0x758/0x780
> > [   30.922259] Modules linked in: pseries_rng xts vmx_crypto 
> > uio_pdrv_genirq uio sch_fq_codel ip_tables sd_mod t10_pi sg fuse
> > [   30.922309] CPU: 6 PID: 762 Comm: augenrules Not tainted 
> > 5.13.0-rc7-next-20210621 #1
> > [   30.922329] NIP:  c01b27e8 LR: c01b27e4 CTR: 
> > c07cfda0
> > [   30.922344] REGS: c00023fcb660 TRAP: 0700   Not tainted  
> > (5.13.0-rc7-next-20210621)
> > [   30.922359] MSR:  80029033   CR: 48488224  
> > XER: 0005
> > [   30.922394] CFAR: c014d120 IRQMASK: 1
> >GPR00: c01b27e4 c00023fcb900 c2a08400 
> > 0048
> >GPR04: 7fff c00023fcb5c0 0027 
> > c00f6fdd7e18
> >GPR08: 0023 0001 0027 
> > c28a6650
> >GPR12: 8000 c00f6fff7680 c00f6fe62600 
> > 0032
> >GPR16: 0007331a989a c00f6fe62600 c000238a6800 
> > 0001
> >GPR20:  c2a4dfe0  
> > 0006
> >GPR24:  c00f6fe63010 0001 
> > c00f6fe62680
> >GPR28: 0006 c000238a69c0  
> > c00f6fe62600
> > [   30.922569] NIP [c01b27e8] update_blocked_averages+0x758/0x780
> > [   30.922599] LR [c01b27e4] update_blocked_averages+0x754/0x780
> > [   30.922624] Call Trace:
> > [   30.922631] [c00023fcb900] [c01b27e4] 
> > update_blocked_averages+0x754/0x780 (unreliable)
> > [   30.922653] [c00023fcba20] [c01bd668] 
> > newidle_balance+0x258/0x5c0
> > [   30.922674] [c00023fcbab0] [c01bdaac] 
> > pick_next_task_fair+0x7c/0x4d0
> > [   30.922692] [c00023fcbb10] [c0dcd31c] __schedule+0x15c/0x1780
> > [   30.922708] [c00023fcbc50] [c01a5a04] do_task_dead+0x64/0x70
> > [   30.922726] [c00023fcbc80] [c0156338] do_exit+0x848/0xcc0
> > [   30.922743] [c00023fcbd50] [c0156884] do_group_exit+0x64/0xe0
> > [   30.922758] [c00023fcbd90] [c0156924] 
> > sys_exit_group+0x24/0x30
> > [   30.922774] [c00023fcbdb0] [c00310c0] 
> > system_call_exception+0x150/0x2d0
> > [   30.922792] [c00023fcbe10] [c000cc5c] 
> > system_call_common+0xec/0x278
> > [   30.922808] --- interrupt: c00 at 0x7fffb3acddcc
> > [   30.922821] NIP:  7fffb3acddcc LR: 7fffb3a27f04 CTR: 
> > 
> > [   30.922833] REGS: c00023fcbe80 TRAP: 0c00   Not tainted  
> > (5.13.0-rc7-next-20210621)
> > [   30.922847] MSR:  8280f033   
> > CR: 28444202  XER: 
> > [   30.922882] IRQMASK: 0
> >GPR00: 00ea 7fffc8f21780 7fffb3bf7100 
> > 
> >GPR04:  000155f142f0  
> > 7fffb3d23740
> >GPR08: fbad2a87   
> > 
> >GPR12:  7fffb3d2aeb0 000116be95e0 
> > 0032
> >GPR16:  7fffc8f21cd8 002d 
> > 0024
> >GPR20: 7fffc8f21cd4 7fffb3bf4f98 0001 
> > 0001
> >GPR24: 7fffb3bf0950   
> > 0001
> >GPR28:   7fffb3d23ec0 
> > 
> > [   30.923023] NIP [7fffb3acddcc] 0x7fffb3acddcc
> > [   30.923035] LR [7fffb3a27f04] 0x7fffb3a27f04
> > [   30.923045] --- interrupt: c00
> > [   30.923052] Instruction dump:
> > [   30.923061] 3863be48 9be97ae6 4bf9a8f9 6000 0fe0 4bfff980 
> > e9210070 e8610088
> > [   30.923088] 3941 99490003 4bf9a8d9 6000 <0fe0> 4bfffc24 
> > 3d22fff5 89297ae3
> > [   30.923113] ---[ end trace ed07974d2149c499 ]---
> >
> > This warning was introduced with commit 9e077b52d86a
> > sched/pelt: Check that *_avg are null when *_sum are
> 
> Yes. That was exactly the purpose of the patch. There is one last
> remaining part which could generate this. I'm going to prepare a patch

Could you try the patch below ? I have been able to reproduce the problem 
locally and this
fix it on my system:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8cc27b847ad8..da91db1c137f 100644
--- a/kernel/sched/fair.c

[PATCH 3/3] powerpc/pseries: fail quicker in dlpar_memory_add_by_ic()

2021-06-22 Thread Daniel Henrique Barboza
The validation done at the start of dlpar_memory_add_by_ic() is an all
or nothing scenario - if any LMB in the range is marked as RESERVED we
can fail right away.

We can then remove the 'lmbs_available' var and its check against
'lmbs_to_add' since the whole LMB range was already validated in the
previous step.

Signed-off-by: Daniel Henrique Barboza 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c0a03e1537cb..377d852f5a9a 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -796,7 +796,6 @@ static int dlpar_memory_add_by_index(u32 drc_index)
 static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 drc_index)
 {
struct drmem_lmb *lmb, *start_lmb, *end_lmb;
-   int lmbs_available = 0;
int rc;
 
pr_info("Attempting to hot-add %u LMB(s) at index %x\n",
@@ -811,15 +810,14 @@ static int dlpar_memory_add_by_ic(u32 lmbs_to_add, u32 
drc_index)
 
/* Validate that the LMBs in this range are not reserved */
for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
-   if (lmb->flags & DRCONF_MEM_RESERVED)
-   break;
-
-   lmbs_available++;
+   /* Fail immediately if the whole range can't be hot-added */
+   if (lmb->flags & DRCONF_MEM_RESERVED) {
+   pr_err("Memory at %llx (drc index %x) is reserved\n",
+   lmb->base_addr, lmb->drc_index);
+   return -EINVAL;
+   }
}
 
-   if (lmbs_available < lmbs_to_add)
-   return -EINVAL;
-
for_each_drmem_lmb_in_range(lmb, start_lmb, end_lmb) {
if (lmb->flags & DRCONF_MEM_ASSIGNED)
continue;
-- 
2.31.1



[PATCH 2/3] powerpc/pseries: break early in dlpar_memory_add_by_count() loops

2021-06-22 Thread Daniel Henrique Barboza
After a successful dlpar_add_lmb() call the LMB is marked as reserved.
Later on, depending on whether we added enough LMBs or not, we rely on
the marked LMBs to see which ones might need to be removed, and we
remove the reservation of all of them.

These are done in for_each_drmem_lmb() loops without any break
condition. This means that we're going to check all LMBs of the partition
even after going through all the reserved ones.

This patch adds break conditions in both loops to avoid this. The
'lmbs_added' variable was renamed to 'lmbs_reserved', and it is now
decremented each time an LMB reservation is removed, indicating whether
there are still marked LMBs to be processed.

Signed-off-by: Daniel Henrique Barboza 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 28a7fd90232f..c0a03e1537cb 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -673,7 +673,7 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
 {
struct drmem_lmb *lmb;
int lmbs_available = 0;
-   int lmbs_added = 0;
+   int lmbs_reserved = 0;
int rc;
 
pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
@@ -714,13 +714,12 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
 * requested LMBs cannot be added.
 */
drmem_mark_lmb_reserved(lmb);
-
-   lmbs_added++;
-   if (lmbs_added == lmbs_to_add)
+   lmbs_reserved++;
+   if (lmbs_reserved == lmbs_to_add)
break;
}
 
-   if (lmbs_added != lmbs_to_add) {
+   if (lmbs_reserved != lmbs_to_add) {
pr_err("Memory hot-add failed, removing any added LMBs\n");
 
for_each_drmem_lmb(lmb) {
@@ -735,6 +734,10 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
dlpar_release_drc(lmb->drc_index);
 
drmem_remove_lmb_reservation(lmb);
+   lmbs_reserved--;
+
+   if (lmbs_reserved == 0)
+   break;
}
rc = -EINVAL;
} else {
@@ -745,6 +748,10 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
pr_debug("Memory at %llx (drc index %x) was 
hot-added\n",
 lmb->base_addr, lmb->drc_index);
drmem_remove_lmb_reservation(lmb);
+   lmbs_reserved--;
+
+   if (lmbs_reserved == 0)
+   break;
}
rc = 0;
}
-- 
2.31.1



[PATCH 0/3] powerpc/pseries: cleanups for dlpar_memory_add* functions

2021-06-22 Thread Daniel Henrique Barboza
Hi,

These are a couple of cleanups for the dlpar_memory_add* functions
that are similar to those I did a month or so ago in
dlpar_memory_remove_by_count and dlpar_memory_remove_by_ic. 



Daniel Henrique Barboza (3):
  powerpc/pseries: skip reserved LMBs in dlpar_memory_add_by_count()
  powerpc/pseries: break early in dlpar_memory_add_by_count() loops
  powerpc/pseries: fail quicker in dlpar_memory_add_by_ic()

 .../platforms/pseries/hotplug-memory.c| 34 ---
 1 file changed, 21 insertions(+), 13 deletions(-)

-- 
2.31.1



[PATCH 1/3] powerpc/pseries: skip reserved LMBs in dlpar_memory_add_by_count()

2021-06-22 Thread Daniel Henrique Barboza
The function is counting reserved LMBs as available to be added, but
they aren't. This will cause the function to miscalculate the available
LMBs and can trigger errors later on when executing dlpar_add_lmb().

Signed-off-by: Daniel Henrique Barboza 
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 36f66556a7c6..28a7fd90232f 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -683,6 +683,9 @@ static int dlpar_memory_add_by_count(u32 lmbs_to_add)
 
/* Validate that there are enough LMBs to satisfy the request */
for_each_drmem_lmb(lmb) {
+   if (lmb->flags & DRCONF_MEM_RESERVED)
+   continue;
+
if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
lmbs_available++;
 
-- 
2.31.1



Re: [PATCH v3 0/4] Add perf interface to expose nvdimm

2021-06-22 Thread Peter Zijlstra
On Thu, Jun 17, 2021 at 06:56:13PM +0530, Kajol Jain wrote:
> ---
> Kajol Jain (4):
>   drivers/nvdimm: Add nvdimm pmu structure
>   drivers/nvdimm: Add perf interface to expose nvdimm performance stats
>   powerpc/papr_scm: Add perf interface support
>   powerpc/papr_scm: Document papr_scm sysfs event format entries

Don't see anything obviously wrong with this one.

Acked-by: Peter Zijlstra (Intel) 


Re: [PATCH v2 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-06-22 Thread Andy Shevchenko
On Tue, Jun 22, 2021 at 03:44:56PM +0300, Andy Shevchenko wrote:
> On Wed, Jun 16, 2021 at 04:43:03PM +0300, Andy Shevchenko wrote:
> > Parse to and export from the UUID's own type before dereferencing.
> > This also fixes a wrong comment (a little-endian UUID is something else)
> > and should eliminate the direct strict-type assignments.
> 
> Any comments on this version? Can it be applied?

"Any _other_ comments..."

> > Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset 
> > cookie")
> > Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for 
> > storing uuid from the device tree")

AFAIU it's fine to have Fixes tags, but if anybody insists I will remove them
and send v3.

> > ---
> > v2: added missed header (Vaibhav), updated comment (Aneesh),
> > rewrite part of the commit message to avoid mentioning the Sparse

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v2 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-06-22 Thread Andy Shevchenko
On Wed, Jun 16, 2021 at 04:43:03PM +0300, Andy Shevchenko wrote:
> Parse to and export from the UUID's own type before dereferencing.
> This also fixes a wrong comment (a little-endian UUID is something else)
> and should eliminate the direct strict-type assignments.

Any comments on this version? Can it be applied?

> Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie")
> Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for 
> storing uuid from the device tree")
> Cc: Oliver O'Halloran 
> Cc: Aneesh Kumar K.V 
> Signed-off-by: Andy Shevchenko 
> ---
> v2: added missed header (Vaibhav), updated comment (Aneesh),
> rewrite part of the commit message to avoid mentioning the Sparse

-- 
With Best Regards,
Andy Shevchenko
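
For readers without the diff at hand, the change described in the changelog
amounts to the pattern sketched below. This is only an illustration of the
uuid_parse()/export_uuid() flow using the generic helpers from <linux/uuid.h>;
the function name, variable names and the little-endian cookie layout here are
assumptions made for the sketch, not the actual papr_scm diff:

#include <linux/uuid.h>
#include <asm/unaligned.h>

/*
 * Illustration only: parse the device-tree "ibm,unit-guid" string into the
 * UUID's own type, export it back to raw bytes, and only then build the
 * iset cookies, instead of dereferencing the raw property directly.
 */
static int example_setup_cookies(const char *uuid_str, u64 *cookie1, u64 *cookie2)
{
	u8 raw[UUID_SIZE];
	uuid_t uuid;
	int rc;

	rc = uuid_parse(uuid_str, &uuid);
	if (rc)
		return rc;

	export_uuid(raw, &uuid);		/* uuid_t -> 16 raw bytes */
	*cookie1 = get_unaligned_le64(&raw[0]);
	*cookie2 = get_unaligned_le64(&raw[8]);

	return 0;
}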




Re: [PATCH v4 7/7] powerpc/pseries: Add support for FORM2 associativity

2021-06-22 Thread Aneesh Kumar K.V
Daniel Henrique Barboza  writes:

> On 6/17/21 1:51 PM, Aneesh Kumar K.V wrote:
>> PAPR interface currently supports two different ways of communicating 
>> resource
>> grouping details to the OS. These are referred to as Form 0 and Form 1
>> associativity grouping. Form 0 is the older format and is now considered
>> deprecated. This patch adds another resource grouping named FORM2.
>> 
>> Signed-off-by: Daniel Henrique Barboza 
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>   Documentation/powerpc/associativity.rst   | 135 
>>   arch/powerpc/include/asm/firmware.h   |   3 +-
>>   arch/powerpc/include/asm/prom.h   |   1 +
>>   arch/powerpc/kernel/prom_init.c   |   3 +-
>>   arch/powerpc/mm/numa.c| 149 +-
>>   arch/powerpc/platforms/pseries/firmware.c |   1 +
>>   6 files changed, 286 insertions(+), 6 deletions(-)
>>   create mode 100644 Documentation/powerpc/associativity.rst
>> 
>> diff --git a/Documentation/powerpc/associativity.rst 
>> b/Documentation/powerpc/associativity.rst
>> new file mode 100644
>> index ..93be604ac54d
>> --- /dev/null
>> +++ b/Documentation/powerpc/associativity.rst
>> @@ -0,0 +1,135 @@
>> +
>> +NUMA resource associativity
>> +=
>> +
>> +Associativity represents the groupings of the various platform resources 
>> into
>> +domains of substantially similar mean performance relative to resources 
>> outside
>> +of that domain. Resources subsets of a given domain that exhibit better
>> +performance relative to each other than relative to other resources subsets
>> +are represented as being members of a sub-grouping domain. This performance
>> +characteristic is presented in terms of NUMA node distance within the Linux 
>> kernel.
>> +From the platform view, these groups are also referred to as domains.
>> +
>> +PAPR interface currently supports different ways of communicating these 
>> resource
>> +grouping details to the OS. These are referred to as Form 0, Form 1 and 
>> Form2
>> +associativity grouping. Form 0 is the older format and is now considered 
>> deprecated.
>> +
>> +Hypervisor indicates the type/form of associativity used via 
>> "ibm,arcitecture-vec-5 property".
>> +Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of 
>> Form 0 or Form 1.
>> +A value of 1 indicates the usage of Form 1 associativity. For Form 2 
>> associativity
>> +bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used.
>> +
>> +Form 0
>> +-
>> +Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE).
>> +
>> +Form 1
>> +-
>> +With Form 1 a combination of ibm,associativity-reference-points and 
>> ibm,associativity
>> +device tree properties are used to determine the NUMA distance between 
>> resource groups/domains.
>> +
>> +The “ibm,associativity” property contains one or more lists of numbers 
>> (domainID)
>> +representing the resource’s platform grouping domains.
>> +
>> +The “ibm,associativity-reference-points” property contains one or more list 
>> of numbers
>> +(domainID index) that represents the 1 based ordinal in the associativity 
>> lists.
>> +The list of domainID indexes represents an increasing hierarchy of resource 
>> grouping.
>> +
>> +ex:
>> +{ primary domainID index, secondary domainID index, tertiary domainID 
>> index.. }
>> +
>> +Linux kernel uses the domainID at the primary domainID index as the NUMA 
>> node id.
>> +Linux kernel computes NUMA distance between two domains by recursively 
>> comparing
>> +if they belong to the same higher-level domains. For mismatch at every 
>> higher
>> +level of the resource group, the kernel doubles the NUMA distance between 
>> the
>> +comparing domains.
>> +
>> +Form 2
>> +---
>> +Form 2 associativity format adds separate device tree properties 
>> representing NUMA node distance
>> +thereby making the node distance computation flexible. Form 2 also allows 
>> flexible primary
>> +domain numbering. With numa distance computation now detached from the 
>> index value of
>> +"ibm,associativity" property, Form 2 allows a large number of primary 
>> domain ids at the
>> +same domainID index representing resource groups of different 
>> performance/latency characteristics.
>> +
>> +Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 
>> in the
>> +"ibm,architecture-vec-5" property.
>> +
>> +"ibm,numa-lookup-index-table" property contains one or more list numbers 
>> representing
>> +the domainIDs present in the system. The offset of the domainID in this 
>> property is considered
>> +the domainID index.
>> +
>> +prop-encoded-array: The number N of the domainIDs encoded as with 
>> encode-int, followed by
>> +N domainID encoded as with encode-int
>> +
>> +For ex:
>> +ibm,numa-lookup-index-table =  {4, 0, 8, 250, 252}, domainID index for 
>> domainID 8 is 1.
>> +
>> +"ibm,numa-distance-table" property contains one or more 

[RFC PATCH 43/43] KVM: PPC: Book3S HV P9: Optimise hash guest SLB saving

2021-06-22 Thread Nicholas Piggin
slbmfee/slbmfev instructions are very expensive, more so than a regular
mfspr instruction, so minimising them significantly improves hash guest
exit performance. The slbmfev is only required if slbmfee found a valid
SLB entry.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 3fffcec67ff8..5e9e9f809297 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -459,10 +459,22 @@ static void __accumulate_time(struct kvm_vcpu *vcpu, 
struct kvmhv_tb_accumulator
 #define accumulate_time(vcpu, next) do {} while (0)
 #endif
 
-static inline void mfslb(unsigned int idx, u64 *slbee, u64 *slbev)
+static inline u64 mfslbv(unsigned int idx)
 {
-   asm volatile("slbmfev  %0,%1" : "=r" (*slbev) : "r" (idx));
-   asm volatile("slbmfee  %0,%1" : "=r" (*slbee) : "r" (idx));
+   u64 slbev;
+
+   asm volatile("slbmfev  %0,%1" : "=r" (slbev) : "r" (idx));
+
+   return slbev;
+}
+
+static inline u64 mfslbe(unsigned int idx)
+{
+   u64 slbee;
+
+   asm volatile("slbmfee  %0,%1" : "=r" (slbee) : "r" (idx));
+
+   return slbee;
 }
 
 static inline void mtslb(u64 slbee, u64 slbev)
@@ -592,8 +604,10 @@ static void save_clear_guest_mmu(struct kvm *kvm, struct 
kvm_vcpu *vcpu)
 */
for (i = 0; i < vcpu->arch.slb_nr; i++) {
u64 slbee, slbev;
-		mfslb(i, &slbee, &slbev);
+
+   slbee = mfslbe(i);
if (slbee & SLB_ESID_V) {
+   slbev = mfslbv(i);
vcpu->arch.slb[nr].orige = slbee | i;
vcpu->arch.slb[nr].origv = slbev;
nr++;
-- 
2.23.0



[RFC PATCH 42/43] KVM: PPC: Book3S HV P9: Improve mfmsr performance on entry

2021-06-22 Thread Nicholas Piggin
Rearrange the MSR saving on entry so it does not follow the mtmsrd to
disable interrupts, avoiding a possible RAW scoreboard stall.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +
 arch/powerpc/kvm/book3s_hv.c | 18 ++-
 arch/powerpc/kvm/book3s_hv_p9_entry.c| 66 +++-
 3 files changed, 47 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index f8a0ed90b853..20ca9b1a2d41 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -153,6 +153,8 @@ static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu 
*vcpu)
return radix;
 }
 
+unsigned long kvmppc_msr_hard_disable_set_facilities(struct kvm_vcpu *vcpu, 
unsigned long msr);
+
 int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long 
lpcr, u64 *tb);
 
 #define KVM_DEFAULT_HPT_ORDER  24  /* 16MB HPT by default */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7cb9e87b50b7..c8edab9a90cb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3759,6 +3759,8 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
s64 dec;
int trap;
 
+   msr = mfmsr();
+
	save_p9_host_os_sprs(&host_os_sprs);
 
/*
@@ -3769,24 +3771,10 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
 */
host_psscr = mfspr(SPRN_PSSCR_PR);
 
-   hard_irq_disable();
+   kvmppc_msr_hard_disable_set_facilities(vcpu, msr);
if (lazy_irq_pending())
return 0;
 
-   /* MSR bits may have been cleared by context switch */
-   msr = 0;
-   if (IS_ENABLED(CONFIG_PPC_FPU))
-   msr |= MSR_FP;
-   if (cpu_has_feature(CPU_FTR_ALTIVEC))
-   msr |= MSR_VEC;
-   if (cpu_has_feature(CPU_FTR_VSX))
-   msr |= MSR_VSX;
-   if ((cpu_has_feature(CPU_FTR_TM) ||
-   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) &&
-   (vcpu->arch.hfscr & HFSCR_TM))
-   msr |= MSR_TM;
-   msr = msr_check_and_set(msr);
-
	load_vcpu_state(vcpu, &host_os_sprs);
 
if (vcpu->arch.psscr != host_psscr)
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 48b0ce9e0c39..3fffcec67ff8 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -604,6 +604,44 @@ static void save_clear_guest_mmu(struct kvm *kvm, struct 
kvm_vcpu *vcpu)
}
 }
 
+unsigned long kvmppc_msr_hard_disable_set_facilities(struct kvm_vcpu *vcpu, 
unsigned long msr)
+{
+   unsigned long msr_needed = 0;
+
+   msr &= ~MSR_EE;
+
+   /* MSR bits may have been cleared by context switch so must recheck */
+   if (IS_ENABLED(CONFIG_PPC_FPU))
+   msr_needed |= MSR_FP;
+   if (cpu_has_feature(CPU_FTR_ALTIVEC))
+   msr_needed |= MSR_VEC;
+   if (cpu_has_feature(CPU_FTR_VSX))
+   msr_needed |= MSR_VSX;
+   if ((cpu_has_feature(CPU_FTR_TM) ||
+   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) &&
+   (vcpu->arch.hfscr & HFSCR_TM))
+   msr_needed |= MSR_TM;
+
+   /*
+* This could be combined with MSR[RI] clearing, but that expands
+* the unrecoverable window. It would be better to cover unrecoverable
+* with KVM bad interrupt handling rather than use MSR[RI] at all.
+*
+* Much more difficult and less worthwhile to combine with IR/DR
+* disable.
+*/
+   if ((msr & msr_needed) != msr_needed) {
+   msr |= msr_needed;
+   __mtmsrd(msr, 0);
+   } else {
+   __hard_irq_disable();
+   }
+   local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+
+   return msr;
+}
+EXPORT_SYMBOL_GPL(kvmppc_msr_hard_disable_set_facilities);
+
 int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long 
lpcr, u64 *tb)
 {
struct p9_host_os_sprs host_os_sprs;
@@ -637,6 +675,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
 
vcpu->arch.ceded = 0;
 
+   /* Save MSR for restore, with EE clear. */
+   msr = mfmsr() & ~MSR_EE;
+
host_hfscr = mfspr(SPRN_HFSCR);
host_ciabr = mfspr(SPRN_CIABR);
host_psscr = mfspr(SPRN_PSSCR_PR);
@@ -658,35 +699,12 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
 
save_p9_host_os_sprs(_os_sprs);
 
-   /*
-* This could be combined with MSR[RI] clearing, but that expands
-* the unrecoverable window. It would be better to cover unrecoverable
-* with KVM bad interrupt handling rather than use MSR[RI] at all.
-*
-* Much more difficult and less worthwhile to combine with IR/DR
-* 

[RFC PATCH 41/43] KVM: PPC: Book3S HV Nested: Avoid extra mftb() in nested entry

2021-06-22 Thread Nicholas Piggin
mftb() is expensive and one can be avoided on nested guest dispatch.

If the time checking code distinguishes between the L0 timer and the
nested HV timer, then both can be tested in the same place with the
same mftb() value.

This also nicely illustrates the relationship between the L0 and nested
HV timers.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_asm.h  |  1 +
 arch/powerpc/kvm/book3s_hv.c| 12 
 arch/powerpc/kvm/book3s_hv_nested.c |  5 -
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index fbbf3cec92e9..d68d71987d5c 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -79,6 +79,7 @@
 #define BOOK3S_INTERRUPT_FP_UNAVAIL0x800
 #define BOOK3S_INTERRUPT_DECREMENTER   0x900
 #define BOOK3S_INTERRUPT_HV_DECREMENTER0x980
+#define BOOK3S_INTERRUPT_NESTED_HV_DECREMENTER 0x1980
 #define BOOK3S_INTERRUPT_DOORBELL  0xa00
 #define BOOK3S_INTERRUPT_SYSCALL   0xc00
 #define BOOK3S_INTERRUPT_TRACE 0xd00
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9d8277a4c829..7cb9e87b50b7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1410,6 +1410,10 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
run->ready_for_interrupt_injection = 1;
switch (vcpu->arch.trap) {
/* We're good on these - the host merely wanted to get our attention */
+   case BOOK3S_INTERRUPT_NESTED_HV_DECREMENTER:
+   WARN_ON_ONCE(1); /* Should never happen */
+   vcpu->arch.trap = BOOK3S_INTERRUPT_HV_DECREMENTER;
+   fallthrough;
case BOOK3S_INTERRUPT_HV_DECREMENTER:
vcpu->stat.dec_exits++;
r = RESUME_GUEST;
@@ -1737,6 +1741,12 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
*vcpu)
vcpu->stat.ext_intr_exits++;
r = RESUME_GUEST;
break;
+   /* These need to go to the nested HV */
+   case BOOK3S_INTERRUPT_NESTED_HV_DECREMENTER:
+   vcpu->arch.trap = BOOK3S_INTERRUPT_HV_DECREMENTER;
+   vcpu->stat.dec_exits++;
+   r = RESUME_HOST;
+   break;
/* SR/HMI/PMI are HV interrupts that host has handled. Resume guest.*/
case BOOK3S_INTERRUPT_HMI:
case BOOK3S_INTERRUPT_PERFMON:
@@ -3855,6 +3865,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
return BOOK3S_INTERRUPT_HV_DECREMENTER;
if (next_timer < time_limit)
time_limit = next_timer;
+   else if (*tb >= time_limit) /* nested time limit */
+   return BOOK3S_INTERRUPT_NESTED_HV_DECREMENTER;
 
vcpu->arch.ceded = 0;
 
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 5a534f7924f2..a92808a927ff 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -361,11 +361,6 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
vcpu->arch.ret = RESUME_GUEST;
vcpu->arch.trap = 0;
do {
-   if (mftb() >= hdec_exp) {
-   vcpu->arch.trap = BOOK3S_INTERRUPT_HV_DECREMENTER;
-   r = RESUME_HOST;
-   break;
-   }
r = kvmhv_run_single_vcpu(vcpu, hdec_exp, l2_hv.lpcr);
} while (is_kvmppc_resume_guest(r));
 
-- 
2.23.0



[RFC PATCH 40/43] KVM: PPC: Book3S HV P9: Avoid tlbsync sequence on radix guest exit

2021-06-22 Thread Nicholas Piggin
Use the existing TLB flushing logic to IPI the previous CPU and run the
necessary barriers before running a guest vCPU on a new physical CPU,
covering the radix GTSE barriers needed to handle the case of an
interrupted guest tlbie sequence.

This results in more IPIs than the TLB flush logic requires, but it's
a significant win for common case scheduling when the vCPU remains on
the same physical CPU.

-522 cycles (5754) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c  | 31 +++
 arch/powerpc/kvm/book3s_hv_p9_entry.c |  9 
 2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 91bbd0a8f6b6..9d8277a4c829 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2906,6 +2906,25 @@ static void radix_flush_cpu(struct kvm *kvm, int cpu, 
struct kvm_vcpu *vcpu)
smp_call_function_single(i, do_nothing, NULL, 1);
 }
 
+static void do_migrate_away_vcpu(void *arg)
+{
+   struct kvm_vcpu *vcpu = arg;
+   struct kvm *kvm = vcpu->kvm;
+
+   /*
+* If the guest has GTSE, it may execute tlbie, so do a eieio; tlbsync;
+* ptesync sequence on the old CPU before migrating to a new one, in
+* case we interrupted the guest between a tlbie ; eieio ;
+* tlbsync; ptesync sequence.
+*
+* Otherwise, ptesync is sufficient.
+*/
+   if (kvm->arch.lpcr & LPCR_GTSE)
+   asm volatile("eieio; tlbsync; ptesync");
+   else
+   asm volatile("ptesync");
+}
+
 static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu *vcpu, int pcpu)
 {
struct kvm_nested_guest *nested = vcpu->arch.nested;
@@ -2933,10 +2952,14 @@ static void kvmppc_prepare_radix_vcpu(struct kvm_vcpu 
*vcpu, int pcpu)
 * so we use a single bit in .need_tlb_flush for all 4 threads.
 */
if (prev_cpu != pcpu) {
-   if (prev_cpu >= 0 &&
-   cpu_first_tlb_thread_sibling(prev_cpu) !=
-   cpu_first_tlb_thread_sibling(pcpu))
-   radix_flush_cpu(kvm, prev_cpu, vcpu);
+   if (prev_cpu >= 0) {
+   if (cpu_first_tlb_thread_sibling(prev_cpu) !=
+   cpu_first_tlb_thread_sibling(pcpu))
+   radix_flush_cpu(kvm, prev_cpu, vcpu);
+
+   smp_call_function_single(prev_cpu,
+   do_migrate_away_vcpu, vcpu, 1);
+   }
if (nested)
nested->prev_cpu[vcpu->arch.nested_vcpu_id] = pcpu;
else
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 4bab56c10254..48b0ce9e0c39 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -994,15 +994,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
 
local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_NONE;
 
-   if (kvm_is_radix(kvm)) {
-   /*
-* Since this is radix, do a eieio; tlbsync; ptesync sequence
-* in case we interrupted the guest between a tlbie and a
-* ptesync.
-*/
-   asm volatile("eieio; tlbsync; ptesync");
-   }
-
/*
 * cp_abort is required if the processor supports local copy-paste
 * to clear the copy buffer that was under control of the guest.
-- 
2.23.0



[RFC PATCH 39/43] KVM: PPC: Book3S HV P9: Don't restore PSSCR if not needed

2021-06-22 Thread Nicholas Piggin
This also moves the PSSCR update in nested entry to avoid a SPR
scoreboard stall.

-45 cycles (6276) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c  |  7 +--
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 26 +++---
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c7cf771d3351..91bbd0a8f6b6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3756,7 +3756,9 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
 
load_vcpu_state(vcpu, _os_sprs);
 
-   mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
+   if (vcpu->arch.psscr != host_psscr)
+   mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
+
	kvmhv_save_hv_regs(vcpu, &hvregs);
hvregs.lpcr = lpcr;
vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
@@ -3797,7 +3799,6 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR);
-   mtspr(SPRN_PSSCR_PR, host_psscr);
 
store_vcpu_state(vcpu);
 
@@ -3810,6 +3811,8 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
timer_rearm_host_dec(*tb);
 
	restore_p9_host_os_sprs(vcpu, &host_os_sprs);
+   if (vcpu->arch.psscr != host_psscr)
+   mtspr(SPRN_PSSCR_PR, host_psscr);
 
return trap;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index f305d1d6445c..4bab56c10254 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -621,6 +621,7 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
unsigned long host_dawr0;
unsigned long host_dawrx0;
unsigned long host_psscr;
+   unsigned long host_hpsscr;
unsigned long host_pidr;
unsigned long host_dawr1;
unsigned long host_dawrx1;
@@ -638,7 +639,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
 
host_hfscr = mfspr(SPRN_HFSCR);
host_ciabr = mfspr(SPRN_CIABR);
-   host_psscr = mfspr(SPRN_PSSCR);
+   host_psscr = mfspr(SPRN_PSSCR_PR);
+   if (cpu_has_feature(CPU_FTRS_POWER9_DD2_2))
+   host_hpsscr = mfspr(SPRN_PSSCR);
host_pidr = mfspr(SPRN_PID);
 
if (dawr_enabled()) {
@@ -719,8 +722,14 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
if (vcpu->arch.ciabr != host_ciabr)
mtspr(SPRN_CIABR, vcpu->arch.ciabr);
 
-   mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
- (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
+
+   if (cpu_has_feature(CPU_FTRS_POWER9_DD2_2)) {
+   mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
+ (local_paca->kvm_hstate.fake_suspend << 
PSSCR_FAKE_SUSPEND_LG));
+   } else {
+   if (vcpu->arch.psscr != host_psscr)
+   mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
+   }
 
mtspr(SPRN_HFSCR, vcpu->arch.hfscr);
 
@@ -905,7 +914,7 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
 
vcpu->arch.ic = mfspr(SPRN_IC);
vcpu->arch.pid = mfspr(SPRN_PID);
-   vcpu->arch.psscr = mfspr(SPRN_PSSCR) & PSSCR_GUEST_VIS;
+   vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR);
 
vcpu->arch.shregs.sprg0 = mfspr(SPRN_SPRG0);
vcpu->arch.shregs.sprg1 = mfspr(SPRN_SPRG1);
@@ -948,9 +957,12 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
mtspr(SPRN_PURR, local_paca->kvm_hstate.host_purr);
mtspr(SPRN_SPURR, local_paca->kvm_hstate.host_spurr);
 
-   /* Preserve PSSCR[FAKE_SUSPEND] until we've called kvmppc_save_tm_hv */
-   mtspr(SPRN_PSSCR, host_psscr |
- (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
+   if (cpu_has_feature(CPU_FTRS_POWER9_DD2_2)) {
+   /* Preserve PSSCR[FAKE_SUSPEND] until we've called 
kvmppc_save_tm_hv */
+   mtspr(SPRN_PSSCR, host_hpsscr |
+ (local_paca->kvm_hstate.fake_suspend << 
PSSCR_FAKE_SUSPEND_LG));
+   }
+
mtspr(SPRN_HFSCR, host_hfscr);
if (vcpu->arch.ciabr != host_ciabr)
mtspr(SPRN_CIABR, host_ciabr);
-- 
2.23.0



[RFC PATCH 38/43] KVM: PPC: Book3S HV P9: Test dawr_enabled() before saving host DAWR SPRs

2021-06-22 Thread Nicholas Piggin
Some of the DAWR SPR accesses are already predicated on dawr_enabled();
apply this to the remainder of the accesses.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 34 ---
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 7aa72efcac6c..f305d1d6445c 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -638,13 +638,16 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
 
host_hfscr = mfspr(SPRN_HFSCR);
host_ciabr = mfspr(SPRN_CIABR);
-   host_dawr0 = mfspr(SPRN_DAWR0);
-   host_dawrx0 = mfspr(SPRN_DAWRX0);
host_psscr = mfspr(SPRN_PSSCR);
host_pidr = mfspr(SPRN_PID);
-   if (cpu_has_feature(CPU_FTR_DAWR1)) {
-   host_dawr1 = mfspr(SPRN_DAWR1);
-   host_dawrx1 = mfspr(SPRN_DAWRX1);
+
+   if (dawr_enabled()) {
+   host_dawr0 = mfspr(SPRN_DAWR0);
+   host_dawrx0 = mfspr(SPRN_DAWRX0);
+   if (cpu_has_feature(CPU_FTR_DAWR1)) {
+   host_dawr1 = mfspr(SPRN_DAWR1);
+   host_dawrx1 = mfspr(SPRN_DAWRX1);
+   }
}
 
local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR);
@@ -951,15 +954,18 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
mtspr(SPRN_HFSCR, host_hfscr);
if (vcpu->arch.ciabr != host_ciabr)
mtspr(SPRN_CIABR, host_ciabr);
-   if (vcpu->arch.dawr0 != host_dawr0)
-   mtspr(SPRN_DAWR0, host_dawr0);
-   if (vcpu->arch.dawrx0 != host_dawrx0)
-   mtspr(SPRN_DAWRX0, host_dawrx0);
-   if (cpu_has_feature(CPU_FTR_DAWR1)) {
-   if (vcpu->arch.dawr1 != host_dawr1)
-   mtspr(SPRN_DAWR1, host_dawr1);
-   if (vcpu->arch.dawrx1 != host_dawrx1)
-   mtspr(SPRN_DAWRX1, host_dawrx1);
+
+   if (dawr_enabled()) {
+   if (vcpu->arch.dawr0 != host_dawr0)
+   mtspr(SPRN_DAWR0, host_dawr0);
+   if (vcpu->arch.dawrx0 != host_dawrx0)
+   mtspr(SPRN_DAWRX0, host_dawrx0);
+   if (cpu_has_feature(CPU_FTR_DAWR1)) {
+   if (vcpu->arch.dawr1 != host_dawr1)
+   mtspr(SPRN_DAWR1, host_dawr1);
+   if (vcpu->arch.dawrx1 != host_dawrx1)
+   mtspr(SPRN_DAWRX1, host_dawrx1);
+   }
}
 
if (vc->dpdes)
-- 
2.23.0



[RFC PATCH 37/43] KVM: PPC: Book3S HV P9: Comment and fix MMU context switching code

2021-06-22 Thread Nicholas Piggin
Tighten up partition switching code synchronisation and comments.

In particular, hwsync ; isync is required after the last access that is
performed in the context of a partition, before the partition is
switched away from.

-301 cycles (6319) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  4 +++
 arch/powerpc/kvm/book3s_hv_p9_entry.c  | 40 +++---
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index d909c069363e..5a6ab0a61b68 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -53,6 +53,8 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
pid,
 
preempt_disable();
 
+   asm volatile("hwsync" ::: "memory");
+   isync();
/* switch the lpid first to avoid running host with unallocated pid */
old_lpid = mfspr(SPRN_LPID);
if (old_lpid != lpid)
@@ -69,6 +71,8 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
pid,
else
ret = copy_to_user_nofault((void __user *)to, from, n);
 
+   asm volatile("hwsync" ::: "memory");
+   isync();
/* switch the pid first to avoid running host with unallocated pid */
if (quadrant == 1 && pid != old_pid)
mtspr(SPRN_PID, old_pid);
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 55286a8357f7..7aa72efcac6c 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -503,17 +503,19 @@ static void switch_mmu_to_guest_radix(struct kvm *kvm, 
struct kvm_vcpu *vcpu, u6
lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
 
/*
-* All the isync()s are overkill but trivially follow the ISA
-* requirements. Some can likely be replaced with justification
-* comment for why they are not needed.
+* Prior memory accesses to host PID Q3 must be completed before we
+* start switching, and stores must be drained to avoid not-my-LPAR
+* logic (see switch_mmu_to_host).
 */
+   asm volatile("hwsync" ::: "memory");
isync();
mtspr(SPRN_LPID, lpid);
-   isync();
mtspr(SPRN_LPCR, lpcr);
-   isync();
mtspr(SPRN_PID, vcpu->arch.pid);
-   isync();
+   /*
+* isync not required here because we are HRFID'ing to guest before
+* any guest context access, which is context synchronising.
+*/
 }
 
 static void switch_mmu_to_guest_hpt(struct kvm *kvm, struct kvm_vcpu *vcpu, 
u64 lpcr)
@@ -523,25 +525,41 @@ static void switch_mmu_to_guest_hpt(struct kvm *kvm, 
struct kvm_vcpu *vcpu, u64
 
lpid = kvm->arch.lpid;
 
+   /*
+* See switch_mmu_to_guest_radix. ptesync should not be required here
+* even if the host is in HPT mode because speculative accesses would
+* not cause RC updates (we are in real mode).
+*/
+   asm volatile("hwsync" ::: "memory");
+   isync();
mtspr(SPRN_LPID, lpid);
mtspr(SPRN_LPCR, lpcr);
mtspr(SPRN_PID, vcpu->arch.pid);
 
for (i = 0; i < vcpu->arch.slb_max; i++)
mtslb(vcpu->arch.slb[i].orige, vcpu->arch.slb[i].origv);
-
-   isync();
+   /*
+* isync not required here, see switch_mmu_to_guest_radix.
+*/
 }
 
 static void switch_mmu_to_host(struct kvm *kvm, u32 pid)
 {
+   /*
+* The guest has exited, so guest MMU context is no longer being
+* non-speculatively accessed, but a hwsync is needed before the
+* mtLPIDR / mtPIDR switch, in order to ensure all stores are drained,
+* so the not-my-LPAR tlbie logic does not overlook them.
+*/
+   asm volatile("hwsync" ::: "memory");
isync();
mtspr(SPRN_PID, pid);
-   isync();
mtspr(SPRN_LPID, kvm->arch.host_lpid);
-   isync();
mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
-   isync();
+   /*
+* isync is not required after the switch, because mtmsrd with L=0
+* is performed after this switch, which is context synchronising.
+*/
 
if (!radix_enabled())
slb_restore_bolted_realmode();
-- 
2.23.0



[RFC PATCH 36/43] KVM: PPC: Book3S HV P9: Use Linux SPR save/restore to manage some host SPRs

2021-06-22 Thread Nicholas Piggin
Linux implements SPR save/restore including storage space for registers
in the task struct for process context switching. Make use of this
similarly to the way we make use of the context switching fp/vec save
restore.

This improves code reuse, allows some stack space to be saved, and helps
with avoiding VRSAVE updates if they are not required.

-61 cycles (6620) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/switch_to.h  |  2 +
 arch/powerpc/kernel/process.c |  6 ++
 arch/powerpc/kvm/book3s_hv.c  | 22 +--
 arch/powerpc/kvm/book3s_hv.h  |  3 -
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 88 +++
 5 files changed, 72 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 9d1fbd8be1c7..de17c45314bc 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -112,6 +112,8 @@ static inline void clear_task_ebb(struct task_struct *t)
 #endif
 }
 
+void kvmppc_save_current_sprs(void);
+
 extern int set_thread_tidr(struct task_struct *t);
 
 #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dfce089ac424..29b8fd9704be 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1175,6 +1175,12 @@ static inline void save_sprs(struct thread_struct *t)
 #endif
 }
 
+void kvmppc_save_current_sprs(void)
+{
+	save_sprs(&current->thread);
+}
+EXPORT_SYMBOL_GPL(kvmppc_save_current_sprs);
+
 static inline void restore_sprs(struct thread_struct *old_thread,
struct thread_struct *new_thread)
 {
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2430725f29f7..c7cf771d3351 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4410,9 +4410,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
struct kvm_run *run = vcpu->run;
int r;
int srcu_idx;
-   unsigned long ebb_regs[3] = {}; /* shut up GCC */
-   unsigned long user_tar = 0;
-   unsigned int user_vrsave;
struct kvm *kvm;
unsigned long msr;
 
@@ -4473,14 +4470,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 
save_user_regs_kvm();
 
-   /* Save userspace EBB and other register values */
-   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   ebb_regs[0] = mfspr(SPRN_EBBHR);
-   ebb_regs[1] = mfspr(SPRN_EBBRR);
-   ebb_regs[2] = mfspr(SPRN_BESCR);
-   user_tar = mfspr(SPRN_TAR);
-   }
-   user_vrsave = mfspr(SPRN_VRSAVE);
+   kvmppc_save_current_sprs();
 
	vcpu->arch.waitp = &vcpu->arch.vcore->wait;
vcpu->arch.pgdir = kvm->mm->pgd;
@@ -4521,17 +4511,9 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
}
} while (is_kvmppc_resume_guest(r));
 
-   /* Restore userspace EBB and other register values */
-   if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   mtspr(SPRN_EBBHR, ebb_regs[0]);
-   mtspr(SPRN_EBBRR, ebb_regs[1]);
-   mtspr(SPRN_BESCR, ebb_regs[2]);
-   mtspr(SPRN_TAR, user_tar);
-   }
-   mtspr(SPRN_VRSAVE, user_vrsave);
-
vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
	atomic_dec(&kvm->arch.vcpus_running);
+
return r;
 }
 
diff --git a/arch/powerpc/kvm/book3s_hv.h b/arch/powerpc/kvm/book3s_hv.h
index 72e3a8f4c2cf..c7ad1127462d 100644
--- a/arch/powerpc/kvm/book3s_hv.h
+++ b/arch/powerpc/kvm/book3s_hv.h
@@ -3,11 +3,8 @@
  * Privileged (non-hypervisor) host registers to save.
  */
 struct p9_host_os_sprs {
-   unsigned long dscr;
-   unsigned long tidr;
unsigned long iamr;
unsigned long amr;
-   unsigned long fscr;
 
unsigned int pmc1;
unsigned int pmc2;
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 653f2765a399..55286a8357f7 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -217,15 +217,26 @@ EXPORT_SYMBOL_GPL(switch_pmu_to_host);
 static void load_spr_state(struct kvm_vcpu *vcpu,
struct p9_host_os_sprs *host_os_sprs)
 {
+   /* TAR is very fast */
mtspr(SPRN_TAR, vcpu->arch.tar);
 
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC) &&
+   current->thread.vrsave != vcpu->arch.vrsave)
+   mtspr(SPRN_VRSAVE, vcpu->arch.vrsave);
+#endif
+
if (vcpu->arch.hfscr & HFSCR_EBB) {
-   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
-   mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
-   mtspr(SPRN_BESCR, vcpu->arch.bescr);
+   if (current->thread.ebbhr != vcpu->arch.ebbhr)
+   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
+   if (current->thread.ebbrr != vcpu->arch.ebbrr)
+   

[RFC PATCH 35/43] KVM: PPC: Book3S HV P9: Demand fault TM facility registers

2021-06-22 Thread Nicholas Piggin
Use HFSCR facility disabling to implement demand faulting for TM, with
a hysteresis counter similar to the load_fp etc counters in context
switching that implement the equivalent demand faulting for userspace
facilities.

This speeds up guest entry/exit by avoiding the register save/restore
when a guest is not frequently using them. When a guest does use them
often, there will be some additional demand fault overhead, but these
are not commonly used facilities.

-304 cycles (6681) POWER9 virt-mode NULL hcall with the previous patch

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_host.h   |  1 +
 arch/powerpc/kvm/book3s_hv.c  | 21 +
 arch/powerpc/kvm/book3s_hv_nested.c   |  2 +-
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 18 --
 4 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index bee95106c1f2..d79f0b1b1578 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -586,6 +586,7 @@ struct kvm_vcpu_arch {
ulong ppr;
u32 pspb;
u8 load_ebb;
+   u8 load_tm;
ulong fscr;
ulong shadow_fscr;
ulong ebbhr;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 99e9da078e7d..2430725f29f7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1373,6 +1373,13 @@ static int kvmppc_ebb_unavailable(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
 }
 
+static int kvmppc_tm_unavailable(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.hfscr |= HFSCR_TM;
+
+   return RESUME_GUEST;
+}
+
 static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 struct task_struct *tsk)
 {
@@ -1654,6 +1661,8 @@ XXX benchmark guest exits
r = kvmppc_pmu_unavailable(vcpu);
if (cause == FSCR_EBB_LG)
r = kvmppc_ebb_unavailable(vcpu);
+   if (cause == FSCR_TM_LG)
+   r = kvmppc_tm_unavailable(vcpu);
}
if (r == EMULATE_FAIL) {
kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
@@ -1775,6 +1784,8 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
*vcpu)
r = kvmppc_pmu_unavailable(vcpu);
if (cause == FSCR_EBB_LG && (vcpu->arch.nested_hfscr & 
HFSCR_EBB))
r = kvmppc_ebb_unavailable(vcpu);
+   if (cause == FSCR_TM_LG && (vcpu->arch.nested_hfscr & HFSCR_TM))
+   r = kvmppc_tm_unavailable(vcpu);
 
if (r == EMULATE_FAIL)
r = RESUME_HOST;
@@ -3737,8 +3748,9 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
msr |= MSR_VEC;
if (cpu_has_feature(CPU_FTR_VSX))
msr |= MSR_VSX;
-   if (cpu_has_feature(CPU_FTR_TM) ||
-   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
+   if ((cpu_has_feature(CPU_FTR_TM) ||
+   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) &&
+   (vcpu->arch.hfscr & HFSCR_TM))
msr |= MSR_TM;
msr = msr_check_and_set(msr);
 
@@ -4453,8 +4465,9 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
msr |= MSR_VEC;
if (cpu_has_feature(CPU_FTR_VSX))
msr |= MSR_VSX;
-   if (cpu_has_feature(CPU_FTR_TM) ||
-   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
+   if ((cpu_has_feature(CPU_FTR_TM) ||
+   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) &&
+   (vcpu->arch.hfscr & HFSCR_TM))
msr |= MSR_TM;
msr = msr_check_and_set(msr);
 
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index ee8668f056f9..5a534f7924f2 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -168,7 +168,7 @@ static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct 
hv_guest_state *hr)
 * but preserve the interrupt cause field and facilities that might
 * be disabled for demand faulting in the L1.
 */
-   hr->hfscr &= (HFSCR_INTR_CAUSE | HFSCR_PM | HFSCR_EBB |
+   hr->hfscr &= (HFSCR_INTR_CAUSE | HFSCR_PM | HFSCR_TM | HFSCR_EBB |
vcpu->arch.hfscr);
 
/* Don't let data address watchpoint match in hypervisor state */
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index cf41261daa97..653f2765a399 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -284,8 +284,9 @@ static void store_spr_state(struct kvm_vcpu *vcpu)
 void load_vcpu_state(struct kvm_vcpu *vcpu,
   struct p9_host_os_sprs *host_os_sprs)
 {
-   if (cpu_has_feature(CPU_FTR_TM) ||
-   

[RFC PATCH 34/43] KVM: PPC: Book3S HV P9: Demand fault EBB facility registers

2021-06-22 Thread Nicholas Piggin
Use HFSCR facility disabling to implement demand faulting for EBB, with
a hysteresis counter similar to the load_fp etc counters in context
switching that implement the equivalent demand faulting for userspace
facilities.

This speeds up guest entry/exit by avoiding the register save/restore
when a guest is not frequently using them. When a guest does use them
often, there will be some additional demand fault overhead, but these
are not commonly used facilities.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_host.h   |  1 +
 arch/powerpc/kvm/book3s_hv.c  | 11 +++
 arch/powerpc/kvm/book3s_hv_nested.c   |  3 ++-
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 26 --
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 118b388ea887..bee95106c1f2 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -585,6 +585,7 @@ struct kvm_vcpu_arch {
ulong cfar;
ulong ppr;
u32 pspb;
+   u8 load_ebb;
ulong fscr;
ulong shadow_fscr;
ulong ebbhr;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ae528eb37792..99e9da078e7d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1366,6 +1366,13 @@ static int kvmppc_pmu_unavailable(struct kvm_vcpu *vcpu)
return RESUME_GUEST;
 }
 
+static int kvmppc_ebb_unavailable(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.hfscr |= HFSCR_EBB;
+
+   return RESUME_GUEST;
+}
+
 static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 struct task_struct *tsk)
 {
@@ -1645,6 +1652,8 @@ XXX benchmark guest exits
r = kvmppc_emulate_doorbell_instr(vcpu);
if (cause == FSCR_PM_LG)
r = kvmppc_pmu_unavailable(vcpu);
+   if (cause == FSCR_EBB_LG)
+   r = kvmppc_ebb_unavailable(vcpu);
}
if (r == EMULATE_FAIL) {
kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
@@ -1764,6 +1773,8 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
*vcpu)
r = EMULATE_FAIL;
if (cause == FSCR_PM_LG && (vcpu->arch.nested_hfscr & HFSCR_PM))
r = kvmppc_pmu_unavailable(vcpu);
+   if (cause == FSCR_EBB_LG && (vcpu->arch.nested_hfscr & 
HFSCR_EBB))
+   r = kvmppc_ebb_unavailable(vcpu);
 
if (r == EMULATE_FAIL)
r = RESUME_HOST;
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 024b0ce5b702..ee8668f056f9 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -168,7 +168,8 @@ static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct 
hv_guest_state *hr)
 * but preserve the interrupt cause field and facilities that might
 * be disabled for demand faulting in the L1.
 */
-   hr->hfscr &= (HFSCR_INTR_CAUSE | HFSCR_PM | vcpu->arch.hfscr);
+   hr->hfscr &= (HFSCR_INTR_CAUSE | HFSCR_PM | HFSCR_EBB |
+   vcpu->arch.hfscr);
 
/* Don't let data address watchpoint match in hypervisor state */
hr->dawrx0 &= ~DAWRX_HYP;
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 4d1a2d1ff4c1..cf41261daa97 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -218,9 +218,12 @@ static void load_spr_state(struct kvm_vcpu *vcpu,
struct p9_host_os_sprs *host_os_sprs)
 {
mtspr(SPRN_TAR, vcpu->arch.tar);
-   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
-   mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
-   mtspr(SPRN_BESCR, vcpu->arch.bescr);
+
+   if (vcpu->arch.hfscr & HFSCR_EBB) {
+   mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
+   mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
+   mtspr(SPRN_BESCR, vcpu->arch.bescr);
+   }
 
if (!cpu_has_feature(CPU_FTR_ARCH_31))
mtspr(SPRN_TIDR, vcpu->arch.tid);
@@ -251,9 +254,20 @@ static void load_spr_state(struct kvm_vcpu *vcpu,
 static void store_spr_state(struct kvm_vcpu *vcpu)
 {
vcpu->arch.tar = mfspr(SPRN_TAR);
-   vcpu->arch.ebbhr = mfspr(SPRN_EBBHR);
-   vcpu->arch.ebbrr = mfspr(SPRN_EBBRR);
-   vcpu->arch.bescr = mfspr(SPRN_BESCR);
+
+   if (vcpu->arch.hfscr & HFSCR_EBB) {
+   vcpu->arch.ebbhr = mfspr(SPRN_EBBHR);
+   vcpu->arch.ebbrr = mfspr(SPRN_EBBRR);
+   vcpu->arch.bescr = mfspr(SPRN_BESCR);
+   /*
+* This is like load_fp in context switching, turn off the
+* facility after it wraps the u8 to try avoiding saving
+* and 
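
The hysteresis idea used by the EBB and TM demand-fault patches above can be
illustrated with a small stand-alone sketch. This is a toy model, not the KVM
code: "facility_enabled" stands in for an HFSCR bit such as HFSCR_EBB or
HFSCR_TM, and "load_count" for the vcpu->arch.load_ebb / load_tm counters added
by these patches; the exact placement in the real entry/exit path is not shown
in the truncated diff.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct toy_vcpu {
	bool	facility_enabled;	/* stand-in for an HFSCR facility bit */
	uint8_t	load_count;		/* stand-in for load_ebb / load_tm */
};

/* The guest touched the facility while it was disabled: demand fault. */
static void facility_unavailable_interrupt(struct toy_vcpu *v)
{
	v->facility_enabled = true;
	v->load_count = 1;
}

/* Called on every guest entry/exit cycle. */
static void guest_exit(struct toy_vcpu *v)
{
	if (!v->facility_enabled)
		return;		/* fast path: nothing to save/restore */

	/* ...facility register save/restore would go here... */

	if (++v->load_count == 0) {
		/*
		 * The u8 wrapped: disable the facility again. If the guest has
		 * stopped using it we keep skipping the save/restore; if it is
		 * still in use we just take one more demand fault.
		 */
		v->facility_enabled = false;
	}
}

int main(void)
{
	struct toy_vcpu v = { false, 0 };
	int i;

	facility_unavailable_interrupt(&v);	/* first guest use */
	for (i = 0; i < 300; i++)
		guest_exit(&v);

	/* After ~255 exits the counter wrapped and the facility was disabled. */
	printf("enabled after 300 exits: %s\n", v.facility_enabled ? "yes" : "no");
	return 0;
}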

[RFC PATCH 33/43] KVM: PPC: Book3S HV P9: More SPR speed improvements

2021-06-22 Thread Nicholas Piggin
This avoids more scoreboard stalls and reduces mtSPRs.

-193 cycles (6985) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 67 ---
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index b41be3d8f101..4d1a2d1ff4c1 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -618,24 +618,29 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
vc->tb_offset_applied = vc->tb_offset;
}
 
-   if (vc->pcr)
-   mtspr(SPRN_PCR, vc->pcr | PCR_MASK);
-   mtspr(SPRN_DPDES, vc->dpdes);
mtspr(SPRN_VTB, vc->vtb);
-
mtspr(SPRN_PURR, vcpu->arch.purr);
mtspr(SPRN_SPURR, vcpu->arch.spurr);
 
+   if (vc->pcr)
+   mtspr(SPRN_PCR, vc->pcr | PCR_MASK);
+   if (vc->dpdes)
+   mtspr(SPRN_DPDES, vc->dpdes);
+
if (dawr_enabled()) {
-   mtspr(SPRN_DAWR0, vcpu->arch.dawr0);
-   mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0);
+   if (vcpu->arch.dawr0 != host_dawr0)
+   mtspr(SPRN_DAWR0, vcpu->arch.dawr0);
+   if (vcpu->arch.dawrx0 != host_dawrx0)
+   mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0);
if (cpu_has_feature(CPU_FTR_DAWR1)) {
-   mtspr(SPRN_DAWR1, vcpu->arch.dawr1);
-   mtspr(SPRN_DAWRX1, vcpu->arch.dawrx1);
+   if (vcpu->arch.dawr1 != host_dawr1)
+   mtspr(SPRN_DAWR1, vcpu->arch.dawr1);
+   if (vcpu->arch.dawrx1 != host_dawrx1)
+   mtspr(SPRN_DAWRX1, vcpu->arch.dawrx1);
}
}
-   mtspr(SPRN_CIABR, vcpu->arch.ciabr);
-   mtspr(SPRN_IC, vcpu->arch.ic);
+   if (vcpu->arch.ciabr != host_ciabr)
+   mtspr(SPRN_CIABR, vcpu->arch.ciabr);
 
mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
  (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
@@ -833,17 +838,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
vc->dpdes = mfspr(SPRN_DPDES);
vc->vtb = mfspr(SPRN_VTB);
 
-   save_clear_guest_mmu(kvm, vcpu);
-   switch_mmu_to_host(kvm, host_pidr);
-
-   /*
-* If we are in real mode, only switch MMU on after the MMU is
-* switched to host, to avoid the P9_RADIX_PREFETCH_BUG.
-*/
-   __mtmsrd(msr, 0);
-
-   store_vcpu_state(vcpu);
-
dec = mfspr(SPRN_DEC);
if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
dec = (s32) dec;
@@ -861,6 +855,19 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
vc->tb_offset_applied = 0;
}
 
+   save_clear_guest_mmu(kvm, vcpu);
+   switch_mmu_to_host(kvm, host_pidr);
+
+   /*
+* Enable MSR here in order to have facilities enabled to save
+* guest registers. This enables MMU (if we were in realmode), so
+* only switch MMU on after the MMU is switched to host, to avoid
+* the P9_RADIX_PREFETCH_BUG or hash guest context.
+*/
+   __mtmsrd(msr, 0);
+
+   store_vcpu_state(vcpu);
+
mtspr(SPRN_PURR, local_paca->kvm_hstate.host_purr);
mtspr(SPRN_SPURR, local_paca->kvm_hstate.host_spurr);
 
@@ -868,15 +875,21 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
mtspr(SPRN_PSSCR, host_psscr |
  (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
mtspr(SPRN_HFSCR, host_hfscr);
-   mtspr(SPRN_CIABR, host_ciabr);
-   mtspr(SPRN_DAWR0, host_dawr0);
-   mtspr(SPRN_DAWRX0, host_dawrx0);
+   if (vcpu->arch.ciabr != host_ciabr)
+   mtspr(SPRN_CIABR, host_ciabr);
+   if (vcpu->arch.dawr0 != host_dawr0)
+   mtspr(SPRN_DAWR0, host_dawr0);
+   if (vcpu->arch.dawrx0 != host_dawrx0)
+   mtspr(SPRN_DAWRX0, host_dawrx0);
if (cpu_has_feature(CPU_FTR_DAWR1)) {
-   mtspr(SPRN_DAWR1, host_dawr1);
-   mtspr(SPRN_DAWRX1, host_dawrx1);
+   if (vcpu->arch.dawr1 != host_dawr1)
+   mtspr(SPRN_DAWR1, host_dawr1);
+   if (vcpu->arch.dawrx1 != host_dawrx1)
+   mtspr(SPRN_DAWRX1, host_dawrx1);
}
 
-   mtspr(SPRN_DPDES, 0);
+   if (vc->dpdes)
+   mtspr(SPRN_DPDES, 0);
if (vc->pcr)
mtspr(SPRN_PCR, PCR_MASK);
 
-- 
2.23.0



[RFC PATCH 32/43] KVM: PPC: Book3S HV P9: Restrict DSISR canary workaround to processors that require it

2021-06-22 Thread Nicholas Piggin
Use CPU_FTR_P9_RADIX_PREFETCH_BUG for this, to test for DD2.1 and below
processors.

-43 cycles (7178) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c  | 3 ++-
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 6 --
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a31397fde98e..ae528eb37792 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1523,7 +1523,8 @@ XXX benchmark guest exits
unsigned long vsid;
long err;
 
-   if (vcpu->arch.fault_dsisr == HDSISR_CANARY) {
+   if (cpu_has_feature(CPU_FTR_P9_RADIX_PREFETCH_BUG) &&
+   unlikely(vcpu->arch.fault_dsisr == HDSISR_CANARY)) {
r = RESUME_GUEST; /* Just retry if it's the canary */
break;
}
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 9e58624566a4..b41be3d8f101 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -656,9 +656,11 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
 * HDSI which should correctly update the HDSISR the second time HDSI
 * entry.
 *
-* Just do this on all p9 processors for now.
+* The "radix prefetch bug" test can be used to test for this bug, as
+* it also exists for DD2.1 and below.
 */
-   mtspr(SPRN_HDSISR, HDSISR_CANARY);
+   if (cpu_has_feature(CPU_FTR_P9_RADIX_PREFETCH_BUG))
+   mtspr(SPRN_HDSISR, HDSISR_CANARY);
 
mtspr(SPRN_SPRG0, vcpu->arch.shregs.sprg0);
mtspr(SPRN_SPRG1, vcpu->arch.shregs.sprg1);
-- 
2.23.0



[RFC PATCH 31/43] KVM: PPC: Book3S HV P9: Switch PMU to guest as late as possible

2021-06-22 Thread Nicholas Piggin
This moves the PMU switch to the guest as late as possible in entry, and
the switch back to the host as early as possible at exit. This gives the
host as much perf coverage of the KVM entry/exit code as possible.

This is slightly suboptimal from an SPR scheduling point of view when the
PMU is enabled, but when perf is disabled there is no real difference.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c  | 6 ++
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ee4002c33f89..a31397fde98e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3703,8 +3703,6 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
s64 dec;
int trap;
 
-	switch_pmu_to_guest(vcpu, &host_os_sprs);
-
	save_p9_host_os_sprs(&host_os_sprs);
 
/*
@@ -3766,9 +3764,11 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
 
mtspr(SPRN_DAR, vcpu->arch.shregs.dar);
mtspr(SPRN_DSISR, vcpu->arch.shregs.dsisr);
+	switch_pmu_to_guest(vcpu, &host_os_sprs);
	trap = plpar_hcall_norets(H_ENTER_NESTED, __pa(&hvregs),
				  __pa(&vcpu->arch.regs));
	kvmhv_restore_hv_return_state(vcpu, &hvregs);
+	switch_pmu_to_host(vcpu, &host_os_sprs);
vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
@@ -3787,8 +3787,6 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
 
	restore_p9_host_os_sprs(vcpu, &host_os_sprs);
 
-	switch_pmu_to_host(vcpu, &host_os_sprs);
-
return trap;
 }
 
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 81ff8479ac32..9e58624566a4 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -577,8 +577,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR);
local_paca->kvm_hstate.host_spurr = mfspr(SPRN_SPURR);
 
-	switch_pmu_to_guest(vcpu, &host_os_sprs);
-
	save_p9_host_os_sprs(&host_os_sprs);
 
/*
@@ -708,7 +706,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
 
accumulate_time(vcpu, >arch.guest_time);
 
+	switch_pmu_to_guest(vcpu, &host_os_sprs);
	kvmppc_p9_enter_guest(vcpu);
+	switch_pmu_to_host(vcpu, &host_os_sprs);
 
accumulate_time(vcpu, >arch.rm_intr);
 
@@ -904,8 +904,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
asm volatile(PPC_CP_ABORT);
 
 out:
-   switch_pmu_to_host(vcpu, _os_sprs);
-
end_timing(vcpu);
 
return trap;
-- 
2.23.0



[RFC PATCH 30/43] KVM: PPC: Book3S HV P9: Implement TM fastpath for guest entry/exit

2021-06-22 Thread Nicholas Piggin
If TM is not active, only the TM SPRs (TEXASR, TFHAR, TFIAR) need to be
saved and restored, not the full checkpointed state.

-348 cycles (7218) POWER9 virt-mode NULL hcall
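
A minimal compilable sketch of the fastpath idea (stubbed SPR accessors and a
placeholder MSR test, not the kernel implementation):

/* Sketch: only touch the three TM SPRs when no transaction is active. */
#include <stdio.h>

#define MSR_TM_ACTIVE(msr) ((msr) & 0x3ull)	/* placeholder for the real test */

struct tm_state { unsigned long texasr, tfhar, tfiar; };

static unsigned long mfspr_stub(const char *name) { printf("mfspr %s\n", name); return 0; }
static void save_tm_full(void) { puts("full checkpointed TM state save (slow path)"); }

static void store_tm_state(struct tm_state *s, unsigned long guest_msr)
{
	if (MSR_TM_ACTIVE(guest_msr)) {
		save_tm_full();			/* transaction in flight: full save */
	} else {
		s->texasr = mfspr_stub("TEXASR");	/* fastpath: three SPRs only */
		s->tfhar  = mfspr_stub("TFHAR");
		s->tfiar  = mfspr_stub("TFIAR");
	}
}

int main(void)
{
	struct tm_state s;

	store_tm_state(&s, 0);	/* TM inactive, so the fastpath is taken */
	return 0;
}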

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index f5098995f5cb..81ff8479ac32 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -271,8 +271,16 @@ void load_vcpu_state(struct kvm_vcpu *vcpu,
   struct p9_host_os_sprs *host_os_sprs)
 {
if (cpu_has_feature(CPU_FTR_TM) ||
-   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
-   kvmppc_restore_tm_hv(vcpu, vcpu->arch.shregs.msr, true);
+   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) {
+   unsigned long msr = vcpu->arch.shregs.msr;
+   if (MSR_TM_ACTIVE(msr)) {
+   kvmppc_restore_tm_hv(vcpu, msr, true);
+   } else {
+   mtspr(SPRN_TEXASR, vcpu->arch.texasr);
+   mtspr(SPRN_TFHAR, vcpu->arch.tfhar);
+   mtspr(SPRN_TFIAR, vcpu->arch.tfiar);
+   }
+   }
 
load_spr_state(vcpu, host_os_sprs);
 
@@ -295,8 +303,16 @@ void store_vcpu_state(struct kvm_vcpu *vcpu)
vcpu->arch.vrsave = mfspr(SPRN_VRSAVE);
 
if (cpu_has_feature(CPU_FTR_TM) ||
-   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
-   kvmppc_save_tm_hv(vcpu, vcpu->arch.shregs.msr, true);
+   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST)) {
+   unsigned long msr = vcpu->arch.shregs.msr;
+   if (MSR_TM_ACTIVE(msr)) {
+   kvmppc_save_tm_hv(vcpu, msr, true);
+   } else {
+   vcpu->arch.texasr = mfspr(SPRN_TEXASR);
+   vcpu->arch.tfhar = mfspr(SPRN_TFHAR);
+   vcpu->arch.tfiar = mfspr(SPRN_TFIAR);
+   }
+   }
 }
 EXPORT_SYMBOL_GPL(store_vcpu_state);
 
-- 
2.23.0



[RFC PATCH 29/43] KVM: PPC: Book3S HV P9: Move remaining SPR and MSR access into low level entry

2021-06-22 Thread Nicholas Piggin
Move register saving and loading from kvmhv_p9_guest_entry() into the HV
and nested entry handlers.

Accesses are scheduled to reduce mtSPR / mfSPR interleaving which
reduces SPR scoreboard stalls.

XXX +212 cycles here somewhere (7566), investigate  POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c  | 77 --
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 80 ---
 2 files changed, 96 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 64386fc0cd00..ee4002c33f89 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3697,9 +3697,15 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
unsigned long host_psscr;
+   unsigned long msr;
struct hv_guest_state hvregs;
-   int trap;
+   struct p9_host_os_sprs host_os_sprs;
s64 dec;
+   int trap;
+
+   switch_pmu_to_guest(vcpu, _os_sprs);
+
+   save_p9_host_os_sprs(_os_sprs);
 
/*
 * We need to save and restore the guest visible part of the
@@ -3708,6 +3714,26 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
 * this is done in kvmhv_vcpu_entry_p9() below otherwise.
 */
host_psscr = mfspr(SPRN_PSSCR_PR);
+
+   hard_irq_disable();
+   if (lazy_irq_pending())
+   return 0;
+
+   /* MSR bits may have been cleared by context switch */
+   msr = 0;
+   if (IS_ENABLED(CONFIG_PPC_FPU))
+   msr |= MSR_FP;
+   if (cpu_has_feature(CPU_FTR_ALTIVEC))
+   msr |= MSR_VEC;
+   if (cpu_has_feature(CPU_FTR_VSX))
+   msr |= MSR_VSX;
+   if (cpu_has_feature(CPU_FTR_TM) ||
+   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
+   msr |= MSR_TM;
+   msr = msr_check_and_set(msr);
+
+   load_vcpu_state(vcpu, _os_sprs);
+
mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
kvmhv_save_hv_regs(vcpu, );
hvregs.lpcr = lpcr;
@@ -3749,12 +3775,20 @@ static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu 
*vcpu, u64 time_limit, uns
vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR);
mtspr(SPRN_PSSCR_PR, host_psscr);
 
+   store_vcpu_state(vcpu);
+
dec = mfspr(SPRN_DEC);
if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
dec = (s32) dec;
*tb = mftb();
vcpu->arch.dec_expires = dec + (*tb + vc->tb_offset);
 
+   timer_rearm_host_dec(*tb);
+
+   restore_p9_host_os_sprs(vcpu, _os_sprs);
+
+   switch_pmu_to_host(vcpu, _os_sprs);
+
return trap;
 }
 
@@ -3765,9 +3799,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 unsigned long lpcr, u64 *tb)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
-   struct p9_host_os_sprs host_os_sprs;
u64 next_timer;
-   unsigned long msr;
int trap;
 
next_timer = timer_get_next_tb();
@@ -3778,33 +3810,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
vcpu->arch.ceded = 0;
 
-   save_p9_host_os_sprs(_os_sprs);
-
-   /*
-* This could be combined with MSR[RI] clearing, but that expands
-* the unrecoverable window. It would be better to cover unrecoverable
-* with KVM bad interrupt handling rather than use MSR[RI] at all.
-*
-* Much more difficult and less worthwhile to combine with IR/DR
-* disable.
-*/
-   hard_irq_disable();
-   if (lazy_irq_pending())
-   return 0;
-
-   /* MSR bits may have been cleared by context switch */
-   msr = 0;
-   if (IS_ENABLED(CONFIG_PPC_FPU))
-   msr |= MSR_FP;
-   if (cpu_has_feature(CPU_FTR_ALTIVEC))
-   msr |= MSR_VEC;
-   if (cpu_has_feature(CPU_FTR_VSX))
-   msr |= MSR_VSX;
-   if (cpu_has_feature(CPU_FTR_TM) ||
-   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
-   msr |= MSR_TM;
-   msr = msr_check_and_set(msr);
-
kvmppc_subcore_enter_guest();
 
vc->entry_exit_map = 1;
@@ -3812,10 +3817,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
vcpu_vpa_increment_dispatch(vcpu);
 
-   load_vcpu_state(vcpu, _os_sprs);
-
-   switch_pmu_to_guest(vcpu, _os_sprs);
-
if (kvmhv_on_pseries()) {
trap = kvmhv_vcpu_entry_p9_nested(vcpu, time_limit, lpcr, tb);
 
@@ -3858,16 +3859,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vcpu->arch.slb_max = 0;
}
 
-   switch_pmu_to_host(vcpu, _os_sprs);
-
-   store_vcpu_state(vcpu);
-
vcpu_vpa_increment_dispatch(vcpu);
 
-   timer_rearm_host_dec(*tb);
-
-   

[RFC PATCH 28/43] KVM: PPC: Book3S HV P9: Move nested guest entry into its own function

2021-06-22 Thread Nicholas Piggin
This is just refactoring.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 125 +++
 1 file changed, 67 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a7660af22161..64386fc0cd00 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3692,6 +3692,72 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu 
*vcpu)
}
 }
 
+/* call our hypervisor to load up HV regs and go */
+static int kvmhv_vcpu_entry_p9_nested(struct kvm_vcpu *vcpu, u64 time_limit, 
unsigned long lpcr, u64 *tb)
+{
+   struct kvmppc_vcore *vc = vcpu->arch.vcore;
+   unsigned long host_psscr;
+   struct hv_guest_state hvregs;
+   int trap;
+   s64 dec;
+
+   /*
+* We need to save and restore the guest visible part of the
+* psscr (i.e. using SPRN_PSSCR_PR) since the hypervisor
+* doesn't do this for us. Note only required if pseries since
+* this is done in kvmhv_vcpu_entry_p9() below otherwise.
+*/
+   host_psscr = mfspr(SPRN_PSSCR_PR);
+   mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
+   kvmhv_save_hv_regs(vcpu, );
+   hvregs.lpcr = lpcr;
+   vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
+   hvregs.version = HV_GUEST_STATE_VERSION;
+   if (vcpu->arch.nested) {
+   hvregs.lpid = vcpu->arch.nested->shadow_lpid;
+   hvregs.vcpu_token = vcpu->arch.nested_vcpu_id;
+   } else {
+   hvregs.lpid = vcpu->kvm->arch.lpid;
+   hvregs.vcpu_token = vcpu->vcpu_id;
+   }
+   hvregs.hdec_expiry = time_limit;
+
+   /*
+* When setting DEC, we must always deal with irq_work_raise
+* via NMI vs setting DEC. The problem occurs right as we
+* switch into guest mode if a NMI hits and sets pending work
+* and sets DEC, then that will apply to the guest and not
+* bring us back to the host.
+*
+* irq_work_raise could check a flag (or possibly LPCR[HDICE]
+* for example) and set HDEC to 1? That wouldn't solve the
+* nested hv case which needs to abort the hcall or zero the
+* time limit.
+*
+* XXX: Another day's problem.
+*/
+   mtspr(SPRN_DEC, kvmppc_dec_expires_host_tb(vcpu) - *tb);
+
+   mtspr(SPRN_DAR, vcpu->arch.shregs.dar);
+   mtspr(SPRN_DSISR, vcpu->arch.shregs.dsisr);
+   trap = plpar_hcall_norets(H_ENTER_NESTED, __pa(),
+ __pa(>arch.regs));
+   kvmhv_restore_hv_return_state(vcpu, );
+   vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
+   vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
+   vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
+   vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR);
+   mtspr(SPRN_PSSCR_PR, host_psscr);
+
+   dec = mfspr(SPRN_DEC);
+   if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
+   dec = (s32) dec;
+   *tb = mftb();
+   vcpu->arch.dec_expires = dec + (*tb + vc->tb_offset);
+
+   return trap;
+}
+
 /*
  * Guest entry for POWER9 and later CPUs.
  */
@@ -3700,7 +3766,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
struct p9_host_os_sprs host_os_sprs;
-   s64 dec;
u64 next_timer;
unsigned long msr;
int trap;
@@ -3752,63 +3817,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
switch_pmu_to_guest(vcpu, _os_sprs);
 
if (kvmhv_on_pseries()) {
-   /*
-* We need to save and restore the guest visible part of the
-* psscr (i.e. using SPRN_PSSCR_PR) since the hypervisor
-* doesn't do this for us. Note only required if pseries since
-* this is done in kvmhv_vcpu_entry_p9() below otherwise.
-*/
-   unsigned long host_psscr;
-   /* call our hypervisor to load up HV regs and go */
-   struct hv_guest_state hvregs;
-
-   host_psscr = mfspr(SPRN_PSSCR_PR);
-   mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr);
-   kvmhv_save_hv_regs(vcpu, );
-   hvregs.lpcr = lpcr;
-   vcpu->arch.regs.msr = vcpu->arch.shregs.msr;
-   hvregs.version = HV_GUEST_STATE_VERSION;
-   if (vcpu->arch.nested) {
-   hvregs.lpid = vcpu->arch.nested->shadow_lpid;
-   hvregs.vcpu_token = vcpu->arch.nested_vcpu_id;
-   } else {
-   hvregs.lpid = vcpu->kvm->arch.lpid;
-   hvregs.vcpu_token = vcpu->vcpu_id;
-   }
-   hvregs.hdec_expiry = time_limit;
-
-   /*
-* When setting DEC, we must always deal with irq_work_raise
-* via NMI vs setting DEC. The problem occurs 

[RFC PATCH 27/43] KVM: PPC: Book3S HV P9: Move host OS save/restore functions to built-in

2021-06-22 Thread Nicholas Piggin
Move the P9 guest/host register switching functions to the built-in
P9 entry code, and export them for the nested case to use as well.

This allows more flexibility in scheduling these supervisor privileged
SPR accesses with the HV privileged and PR SPR accesses in the low level
entry code.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c  | 351 +-
 arch/powerpc/kvm/book3s_hv.h  |  39 +++
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 332 
 3 files changed, 372 insertions(+), 350 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv.h

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 35749b0b663f..a7660af22161 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -79,6 +79,7 @@
 #include 
 
 #include "book3s.h"
+#include "book3s_hv.h"
 
 #define CREATE_TRACE_POINTS
 #include "trace_hv.h"
@@ -3675,356 +3676,6 @@ static noinline void kvmppc_run_core(struct 
kvmppc_vcore *vc)
trace_kvmppc_run_core(vc, 1);
 }
 
-/*
- * Privileged (non-hypervisor) host registers to save.
- */
-struct p9_host_os_sprs {
-   unsigned long dscr;
-   unsigned long tidr;
-   unsigned long iamr;
-   unsigned long amr;
-   unsigned long fscr;
-
-   unsigned int pmc1;
-   unsigned int pmc2;
-   unsigned int pmc3;
-   unsigned int pmc4;
-   unsigned int pmc5;
-   unsigned int pmc6;
-   unsigned long mmcr0;
-   unsigned long mmcr1;
-   unsigned long mmcr2;
-   unsigned long mmcr3;
-   unsigned long mmcra;
-   unsigned long siar;
-   unsigned long sier1;
-   unsigned long sier2;
-   unsigned long sier3;
-   unsigned long sdar;
-};
-
-static void freeze_pmu(unsigned long mmcr0, unsigned long mmcra)
-{
-   if (!(mmcr0 & MMCR0_FC))
-   goto do_freeze;
-   if (mmcra & MMCRA_SAMPLE_ENABLE)
-   goto do_freeze;
-   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
-   if (!(mmcr0 & MMCR0_PMCCEXT))
-   goto do_freeze;
-   if (!(mmcra & MMCRA_BHRB_DISABLE))
-   goto do_freeze;
-   }
-   return;
-
-do_freeze:
-   mmcr0 = MMCR0_FC;
-   mmcra = 0;
-   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
-   mmcr0 |= MMCR0_PMCCEXT;
-   mmcra = MMCRA_BHRB_DISABLE;
-   }
-
-   mtspr(SPRN_MMCR0, mmcr0);
-   mtspr(SPRN_MMCRA, mmcra);
-   isync();
-}
-
-static void switch_pmu_to_guest(struct kvm_vcpu *vcpu,
-   struct p9_host_os_sprs *host_os_sprs)
-{
-   struct lppaca *lp;
-   int load_pmu = 1;
-
-   lp = vcpu->arch.vpa.pinned_addr;
-   if (lp)
-   load_pmu = lp->pmcregs_in_use;
-
-   if (load_pmu)
- vcpu->arch.hfscr |= HFSCR_PM;
-
-   /* Save host */
-   if (ppc_get_pmu_inuse()) {
-   /*
-* It might be better to put PMU handling (at least for the
-* host) in the perf subsystem because it knows more about what
-* is being used.
-*/
-
-   /* POWER9, POWER10 do not implement HPMC or SPMC */
-
-   host_os_sprs->mmcr0 = mfspr(SPRN_MMCR0);
-   host_os_sprs->mmcra = mfspr(SPRN_MMCRA);
-
-   freeze_pmu(host_os_sprs->mmcr0, host_os_sprs->mmcra);
-
-   host_os_sprs->pmc1 = mfspr(SPRN_PMC1);
-   host_os_sprs->pmc2 = mfspr(SPRN_PMC2);
-   host_os_sprs->pmc3 = mfspr(SPRN_PMC3);
-   host_os_sprs->pmc4 = mfspr(SPRN_PMC4);
-   host_os_sprs->pmc5 = mfspr(SPRN_PMC5);
-   host_os_sprs->pmc6 = mfspr(SPRN_PMC6);
-   host_os_sprs->mmcr1 = mfspr(SPRN_MMCR1);
-   host_os_sprs->mmcr2 = mfspr(SPRN_MMCR2);
-   host_os_sprs->sdar = mfspr(SPRN_SDAR);
-   host_os_sprs->siar = mfspr(SPRN_SIAR);
-   host_os_sprs->sier1 = mfspr(SPRN_SIER);
-
-   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
-   host_os_sprs->mmcr3 = mfspr(SPRN_MMCR3);
-   host_os_sprs->sier2 = mfspr(SPRN_SIER2);
-   host_os_sprs->sier3 = mfspr(SPRN_SIER3);
-   }
-   }
-
-#ifdef CONFIG_PPC_PSERIES
-   if (kvmhv_on_pseries()) {
-   if (vcpu->arch.vpa.pinned_addr) {
-   struct lppaca *lp = vcpu->arch.vpa.pinned_addr;
-   get_lppaca()->pmcregs_in_use = lp->pmcregs_in_use;
-   } else {
-   get_lppaca()->pmcregs_in_use = 1;
-   }
-   }
-#endif
-
-   /* Load guest */
-   if (vcpu->arch.hfscr & HFSCR_PM) {
-   mtspr(SPRN_PMC1, vcpu->arch.pmc[0]);
-   mtspr(SPRN_PMC2, vcpu->arch.pmc[1]);
-   mtspr(SPRN_PMC3, vcpu->arch.pmc[2]);
-   mtspr(SPRN_PMC4, vcpu->arch.pmc[3]);
-   mtspr(SPRN_PMC5, 

[RFC PATCH 26/43] KVM: PPC: Book3S HV P9: Move vcpu register save/restore into functions

2021-06-22 Thread Nicholas Piggin
This should make no functional difference, but it makes the caller
easier to read.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 55 +---
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a780a9b9effd..35749b0b663f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3958,6 +3958,37 @@ static void store_spr_state(struct kvm_vcpu *vcpu)
vcpu->arch.ctrl = mfspr(SPRN_CTRLF);
 }
 
+static void load_vcpu_state(struct kvm_vcpu *vcpu,
+  struct p9_host_os_sprs *host_os_sprs)
+{
+   if (cpu_has_feature(CPU_FTR_TM) ||
+   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
+   kvmppc_restore_tm_hv(vcpu, vcpu->arch.shregs.msr, true);
+
+   load_spr_state(vcpu, host_os_sprs);
+
+   load_fp_state(>arch.fp);
+#ifdef CONFIG_ALTIVEC
+   load_vr_state(>arch.vr);
+#endif
+   mtspr(SPRN_VRSAVE, vcpu->arch.vrsave);
+}
+
+static void store_vcpu_state(struct kvm_vcpu *vcpu)
+{
+   store_spr_state(vcpu);
+
+   store_fp_state(>arch.fp);
+#ifdef CONFIG_ALTIVEC
+   store_vr_state(>arch.vr);
+#endif
+   vcpu->arch.vrsave = mfspr(SPRN_VRSAVE);
+
+   if (cpu_has_feature(CPU_FTR_TM) ||
+   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
+   kvmppc_save_tm_hv(vcpu, vcpu->arch.shregs.msr, true);
+}
+
 static void save_p9_host_os_sprs(struct p9_host_os_sprs *host_os_sprs)
 {
if (!cpu_has_feature(CPU_FTR_ARCH_31))
@@ -4065,17 +4096,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
vcpu_vpa_increment_dispatch(vcpu);
 
-   if (cpu_has_feature(CPU_FTR_TM) ||
-   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
-   kvmppc_restore_tm_hv(vcpu, vcpu->arch.shregs.msr, true);
-
-   load_spr_state(vcpu, _os_sprs);
-
-   load_fp_state(>arch.fp);
-#ifdef CONFIG_ALTIVEC
-   load_vr_state(>arch.vr);
-#endif
-   mtspr(SPRN_VRSAVE, vcpu->arch.vrsave);
+   load_vcpu_state(vcpu, _os_sprs);
 
switch_pmu_to_guest(vcpu, _os_sprs);
 
@@ -4179,17 +4200,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
switch_pmu_to_host(vcpu, _os_sprs);
 
-   store_spr_state(vcpu);
-
-   store_fp_state(>arch.fp);
-#ifdef CONFIG_ALTIVEC
-   store_vr_state(>arch.vr);
-#endif
-   vcpu->arch.vrsave = mfspr(SPRN_VRSAVE);
-
-   if (cpu_has_feature(CPU_FTR_TM) ||
-   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
-   kvmppc_save_tm_hv(vcpu, vcpu->arch.shregs.msr, true);
+   store_vcpu_state(vcpu);
 
vcpu_vpa_increment_dispatch(vcpu);
 
-- 
2.23.0



[RFC PATCH 23/43] KVM: PPC: Book3S HV P9: Avoid SPR scoreboard stalls

2021-06-22 Thread Nicholas Piggin
Avoid interleaving mfSPR and mtSPR.

-151 cycles (7427) POWER9 virt-mode NULL hcall
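
A toy sketch of the scheduling idea (stub accessors, not the kernel code):
keep the reads together and the writes together rather than interleaving
them, so dependent SPR operations don't stall the scoreboard.

/* Sketch: batch mfspr-style reads, then mtspr-style writes. */
#include <stdio.h>

static unsigned long regs[4];	/* stand-in SPR file */

static unsigned long mfspr_stub(int n)         { return regs[n]; }
static void mtspr_stub(int n, unsigned long v) { regs[n] = v; }

int main(void)
{
	unsigned long save[4];
	int i;

	for (i = 0; i < 4; i++)
		save[i] = mfspr_stub(i);	/* all reads first */
	for (i = 0; i < 4; i++)
		mtspr_stub(i, 100 + i);		/* then all writes */

	printf("saved %lu..%lu, loaded new context\n", save[0], save[3]);
	return 0;
}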

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c  |  8 
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 19 +++
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 99b19f4e7ed7..8c6ba04e1fdf 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4165,10 +4165,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
store_spr_state(vcpu);
 
-   timer_rearm_host_dec(*tb);
-
-   restore_p9_host_os_sprs(vcpu, _os_sprs);
-
store_fp_state(>arch.fp);
 #ifdef CONFIG_ALTIVEC
store_vr_state(>arch.vr);
@@ -4183,6 +4179,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
switch_pmu_to_host(vcpu, _os_sprs);
 
+   timer_rearm_host_dec(*tb);
+
+   restore_p9_host_os_sprs(vcpu, _os_sprs);
+
vc->entry_exit_map = 0x101;
vc->in_guest = 0;
 
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 237ea1ef1eab..afdd7dfa1c08 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -228,6 +228,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
host_dawrx1 = mfspr(SPRN_DAWRX1);
}
 
+   local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR);
+   local_paca->kvm_hstate.host_spurr = mfspr(SPRN_SPURR);
+
if (vc->tb_offset) {
u64 new_tb = *tb + vc->tb_offset;
mtspr(SPRN_TBU40, new_tb);
@@ -244,8 +247,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
mtspr(SPRN_DPDES, vc->dpdes);
mtspr(SPRN_VTB, vc->vtb);
 
-   local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR);
-   local_paca->kvm_hstate.host_spurr = mfspr(SPRN_SPURR);
mtspr(SPRN_PURR, vcpu->arch.purr);
mtspr(SPRN_SPURR, vcpu->arch.spurr);
 
@@ -433,10 +434,8 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
/* Advance host PURR/SPURR by the amount used by guest */
purr = mfspr(SPRN_PURR);
spurr = mfspr(SPRN_SPURR);
-   mtspr(SPRN_PURR, local_paca->kvm_hstate.host_purr +
- purr - vcpu->arch.purr);
-   mtspr(SPRN_SPURR, local_paca->kvm_hstate.host_spurr +
- spurr - vcpu->arch.spurr);
+   local_paca->kvm_hstate.host_purr += purr - vcpu->arch.purr;
+   local_paca->kvm_hstate.host_spurr += spurr - vcpu->arch.spurr;
vcpu->arch.purr = purr;
vcpu->arch.spurr = spurr;
 
@@ -449,6 +448,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
vcpu->arch.shregs.sprg2 = mfspr(SPRN_SPRG2);
vcpu->arch.shregs.sprg3 = mfspr(SPRN_SPRG3);
 
+   vc->dpdes = mfspr(SPRN_DPDES);
+   vc->vtb = mfspr(SPRN_VTB);
+
dec = mfspr(SPRN_DEC);
if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
dec = (s32) dec;
@@ -466,6 +468,9 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
vc->tb_offset_applied = 0;
}
 
+   mtspr(SPRN_PURR, local_paca->kvm_hstate.host_purr);
+   mtspr(SPRN_SPURR, local_paca->kvm_hstate.host_spurr);
+
/* Preserve PSSCR[FAKE_SUSPEND] until we've called kvmppc_save_tm_hv */
mtspr(SPRN_PSSCR, host_psscr |
  (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
@@ -494,8 +499,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
if (cpu_has_feature(CPU_FTR_ARCH_31))
asm volatile(PPC_CP_ABORT);
 
-   vc->dpdes = mfspr(SPRN_DPDES);
-   vc->vtb = mfspr(SPRN_VTB);
mtspr(SPRN_DPDES, 0);
if (vc->pcr)
mtspr(SPRN_PCR, PCR_MASK);
-- 
2.23.0



[RFC PATCH 25/43] KVM: PPC: Book3S HV P9: Juggle SPR switching around

2021-06-22 Thread Nicholas Piggin
This juggles SPR switching on the entry and exit sides to be more
symmetric, which makes the next refactoring patch possible with no
functional change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 612b70216e75..a780a9b9effd 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4069,7 +4069,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
kvmppc_restore_tm_hv(vcpu, vcpu->arch.shregs.msr, true);
 
-   switch_pmu_to_guest(vcpu, _os_sprs);
+   load_spr_state(vcpu, _os_sprs);
 
load_fp_state(>arch.fp);
 #ifdef CONFIG_ALTIVEC
@@ -4077,7 +4077,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 #endif
mtspr(SPRN_VRSAVE, vcpu->arch.vrsave);
 
-   load_spr_state(vcpu, _os_sprs);
+   switch_pmu_to_guest(vcpu, _os_sprs);
 
if (kvmhv_on_pseries()) {
/*
@@ -4177,6 +4177,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vcpu->arch.slb_max = 0;
}
 
+   switch_pmu_to_host(vcpu, _os_sprs);
+
store_spr_state(vcpu);
 
store_fp_state(>arch.fp);
@@ -4191,8 +4193,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
vcpu_vpa_increment_dispatch(vcpu);
 
-   switch_pmu_to_host(vcpu, _os_sprs);
-
timer_rearm_host_dec(*tb);
 
restore_p9_host_os_sprs(vcpu, _os_sprs);
-- 
2.23.0



[RFC PATCH 24/43] KVM: PPC: Book3S HV P9: Only execute mtSPR if the value changed

2021-06-22 Thread Nicholas Piggin
Keep better track of the current SPR values in places where they are
about to be loaded with a new context, to reduce expensive mtSPR
operations.

-73 cycles (7354) POWER9 virt-mode NULL hcall
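
A minimal sketch of the "write only if changed" pattern (hypothetical
register names and a stubbed mtspr, not the kernel code):

/* Sketch: skip the expensive SPR write when the value already matches. */
#include <stdio.h>

struct ctx { unsigned long amr, iamr, dscr; };

static void mtspr_stub(const char *name, unsigned long v)
{
	printf("mtspr %s <- %#lx\n", name, v);	/* the operation we want to avoid */
}

static void load_ctx(const struct ctx *cur, const struct ctx *next)
{
	if (cur->amr != next->amr)
		mtspr_stub("AMR", next->amr);
	if (cur->iamr != next->iamr)
		mtspr_stub("IAMR", next->iamr);
	if (cur->dscr != next->dscr)
		mtspr_stub("DSCR", next->dscr);
}

int main(void)
{
	struct ctx host  = { .amr = 1, .iamr = 2, .dscr = 0 };
	struct ctx guest = { .amr = 1, .iamr = 7, .dscr = 0 };

	load_ctx(&host, &guest);	/* only IAMR differs, so only one write */
	return 0;
}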

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 64 ++--
 1 file changed, 39 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8c6ba04e1fdf..612b70216e75 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3905,19 +3905,28 @@ static void switch_pmu_to_host(struct kvm_vcpu *vcpu,
}
 }
 
-static void load_spr_state(struct kvm_vcpu *vcpu)
+static void load_spr_state(struct kvm_vcpu *vcpu,
+   struct p9_host_os_sprs *host_os_sprs)
 {
-   mtspr(SPRN_DSCR, vcpu->arch.dscr);
-   mtspr(SPRN_IAMR, vcpu->arch.iamr);
-   mtspr(SPRN_PSPB, vcpu->arch.pspb);
-   mtspr(SPRN_FSCR, vcpu->arch.fscr);
mtspr(SPRN_TAR, vcpu->arch.tar);
mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
mtspr(SPRN_BESCR, vcpu->arch.bescr);
-   mtspr(SPRN_TIDR, vcpu->arch.tid);
-   mtspr(SPRN_AMR, vcpu->arch.amr);
-   mtspr(SPRN_UAMOR, vcpu->arch.uamor);
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   mtspr(SPRN_TIDR, vcpu->arch.tid);
+   if (host_os_sprs->iamr != vcpu->arch.iamr)
+   mtspr(SPRN_IAMR, vcpu->arch.iamr);
+   if (host_os_sprs->amr != vcpu->arch.amr)
+   mtspr(SPRN_AMR, vcpu->arch.amr);
+   if (vcpu->arch.uamor != 0)
+   mtspr(SPRN_UAMOR, vcpu->arch.uamor);
+   if (host_os_sprs->fscr != vcpu->arch.fscr)
+   mtspr(SPRN_FSCR, vcpu->arch.fscr);
+   if (host_os_sprs->dscr != vcpu->arch.dscr)
+   mtspr(SPRN_DSCR, vcpu->arch.dscr);
+   if (vcpu->arch.pspb != 0)
+   mtspr(SPRN_PSPB, vcpu->arch.pspb);
 
/*
 * DAR, DSISR, and for nested HV, SPRGs must be set with MSR[RI]
@@ -3932,28 +3941,31 @@ static void load_spr_state(struct kvm_vcpu *vcpu)
 
 static void store_spr_state(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.ctrl = mfspr(SPRN_CTRLF);
-
-   vcpu->arch.iamr = mfspr(SPRN_IAMR);
-   vcpu->arch.pspb = mfspr(SPRN_PSPB);
-   vcpu->arch.fscr = mfspr(SPRN_FSCR);
vcpu->arch.tar = mfspr(SPRN_TAR);
vcpu->arch.ebbhr = mfspr(SPRN_EBBHR);
vcpu->arch.ebbrr = mfspr(SPRN_EBBRR);
vcpu->arch.bescr = mfspr(SPRN_BESCR);
-   vcpu->arch.tid = mfspr(SPRN_TIDR);
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   vcpu->arch.tid = mfspr(SPRN_TIDR);
+   vcpu->arch.iamr = mfspr(SPRN_IAMR);
vcpu->arch.amr = mfspr(SPRN_AMR);
vcpu->arch.uamor = mfspr(SPRN_UAMOR);
+   vcpu->arch.fscr = mfspr(SPRN_FSCR);
vcpu->arch.dscr = mfspr(SPRN_DSCR);
+   vcpu->arch.pspb = mfspr(SPRN_PSPB);
+
+   vcpu->arch.ctrl = mfspr(SPRN_CTRLF);
 }
 
 static void save_p9_host_os_sprs(struct p9_host_os_sprs *host_os_sprs)
 {
-   host_os_sprs->dscr = mfspr(SPRN_DSCR);
-   host_os_sprs->tidr = mfspr(SPRN_TIDR);
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   host_os_sprs->tidr = mfspr(SPRN_TIDR);
host_os_sprs->iamr = mfspr(SPRN_IAMR);
host_os_sprs->amr = mfspr(SPRN_AMR);
host_os_sprs->fscr = mfspr(SPRN_FSCR);
+   host_os_sprs->dscr = mfspr(SPRN_DSCR);
 }
 
 /* vcpu guest regs must already be saved */
@@ -3962,18 +3974,20 @@ static void restore_p9_host_os_sprs(struct kvm_vcpu 
*vcpu,
 {
mtspr(SPRN_SPRG_VDSO_WRITE, local_paca->sprg_vdso);
 
-   mtspr(SPRN_PSPB, 0);
-   mtspr(SPRN_UAMOR, 0);
-
-   mtspr(SPRN_DSCR, host_os_sprs->dscr);
-   mtspr(SPRN_TIDR, host_os_sprs->tidr);
-   mtspr(SPRN_IAMR, host_os_sprs->iamr);
-
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   mtspr(SPRN_TIDR, host_os_sprs->tidr);
+   if (host_os_sprs->iamr != vcpu->arch.iamr)
+   mtspr(SPRN_IAMR, host_os_sprs->iamr);
+   if (vcpu->arch.uamor != 0)
+   mtspr(SPRN_UAMOR, 0);
if (host_os_sprs->amr != vcpu->arch.amr)
mtspr(SPRN_AMR, host_os_sprs->amr);
-
if (host_os_sprs->fscr != vcpu->arch.fscr)
mtspr(SPRN_FSCR, host_os_sprs->fscr);
+   if (host_os_sprs->dscr != vcpu->arch.dscr)
+   mtspr(SPRN_DSCR, host_os_sprs->dscr);
+   if (vcpu->arch.pspb != 0)
+   mtspr(SPRN_PSPB, 0);
 
/* Save guest CTRL register, set runlatch to 1 */
if (!(vcpu->arch.ctrl & 1))
@@ -4063,7 +4077,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 #endif
mtspr(SPRN_VRSAVE, vcpu->arch.vrsave);
 
-   load_spr_state(vcpu);
+   load_spr_state(vcpu, _os_sprs);
 
if (kvmhv_on_pseries()) {
/*
-- 
2.23.0



[RFC PATCH 22/43] KVM: PPC: Book3S HV P9: Optimise timebase reads

2021-06-22 Thread Nicholas Piggin
Reduce the number of mfTB reads executed by passing the current timebase
around the entry and exit code rather than reading it multiple times.

-213 cycles (7578) POWER9 virt-mode NULL hcall
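
A small sketch of the refactoring pattern (stubbed mftb, hypothetical
helpers): read the timebase once and hand it to every consumer instead of
letting each helper re-read it.

/* Sketch: one timebase read shared by all accounting helpers. */
#include <stdio.h>

static unsigned long long fake_tb = 1000;
static unsigned long long mftb_stub(void) { return fake_tb += 7; }	/* each read costs cycles */

static void account_stolen(unsigned long long now)   { printf("stolen   @ %llu\n", now); }
static void account_dispatch(unsigned long long now) { printf("dispatch @ %llu\n", now); }

int main(void)
{
	unsigned long long tb = mftb_stub();	/* one read ... */

	account_stolen(tb);			/* ... reused everywhere */
	account_dispatch(tb);
	return 0;
}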

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +-
 arch/powerpc/kvm/book3s_hv.c | 88 +---
 arch/powerpc/kvm/book3s_hv_p9_entry.c| 33 +
 3 files changed, 65 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index eaf3a562bf1e..f8a0ed90b853 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -153,7 +153,7 @@ static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu 
*vcpu)
return radix;
 }
 
-int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long 
lpcr);
+int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long 
lpcr, u64 *tb);
 
 #define KVM_DEFAULT_HPT_ORDER  24  /* 16MB HPT by default */
 #endif
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 218dacd78e25..99b19f4e7ed7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -275,22 +275,22 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu 
*vcpu)
  * they should never fail.)
  */
 
-static void kvmppc_core_start_stolen(struct kvmppc_vcore *vc)
+static void kvmppc_core_start_stolen(struct kvmppc_vcore *vc, u64 tb)
 {
unsigned long flags;
 
spin_lock_irqsave(>stoltb_lock, flags);
-   vc->preempt_tb = mftb();
+   vc->preempt_tb = tb;
spin_unlock_irqrestore(>stoltb_lock, flags);
 }
 
-static void kvmppc_core_end_stolen(struct kvmppc_vcore *vc)
+static void kvmppc_core_end_stolen(struct kvmppc_vcore *vc, u64 tb)
 {
unsigned long flags;
 
spin_lock_irqsave(>stoltb_lock, flags);
if (vc->preempt_tb != TB_NIL) {
-   vc->stolen_tb += mftb() - vc->preempt_tb;
+   vc->stolen_tb += tb - vc->preempt_tb;
vc->preempt_tb = TB_NIL;
}
spin_unlock_irqrestore(>stoltb_lock, flags);
@@ -300,6 +300,7 @@ static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, 
int cpu)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
unsigned long flags;
+   u64 now = mftb();
 
/*
 * We can test vc->runner without taking the vcore lock,
@@ -308,12 +309,12 @@ static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu 
*vcpu, int cpu)
 * ever sets it to NULL.
 */
if (vc->runner == vcpu && vc->vcore_state >= VCORE_SLEEPING)
-   kvmppc_core_end_stolen(vc);
+   kvmppc_core_end_stolen(vc, now);
 
spin_lock_irqsave(>arch.tbacct_lock, flags);
if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST &&
vcpu->arch.busy_preempt != TB_NIL) {
-   vcpu->arch.busy_stolen += mftb() - vcpu->arch.busy_preempt;
+   vcpu->arch.busy_stolen += now - vcpu->arch.busy_preempt;
vcpu->arch.busy_preempt = TB_NIL;
}
spin_unlock_irqrestore(>arch.tbacct_lock, flags);
@@ -323,13 +324,14 @@ static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu)
 {
struct kvmppc_vcore *vc = vcpu->arch.vcore;
unsigned long flags;
+   u64 now = mftb();
 
if (vc->runner == vcpu && vc->vcore_state >= VCORE_SLEEPING)
-   kvmppc_core_start_stolen(vc);
+   kvmppc_core_start_stolen(vc, now);
 
spin_lock_irqsave(>arch.tbacct_lock, flags);
if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST)
-   vcpu->arch.busy_preempt = mftb();
+   vcpu->arch.busy_preempt = now;
spin_unlock_irqrestore(>arch.tbacct_lock, flags);
 }
 
@@ -684,7 +686,7 @@ static u64 vcore_stolen_time(struct kvmppc_vcore *vc, u64 
now)
 }
 
 static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
-   struct kvmppc_vcore *vc)
+   struct kvmppc_vcore *vc, u64 tb)
 {
struct dtl_entry *dt;
struct lppaca *vpa;
@@ -695,7 +697,7 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
 
dt = vcpu->arch.dtl_ptr;
vpa = vcpu->arch.vpa.pinned_addr;
-   now = mftb();
+   now = tb;
core_stolen = vcore_stolen_time(vc, now);
stolen = core_stolen - vcpu->arch.stolen_logged;
vcpu->arch.stolen_logged = core_stolen;
@@ -2792,14 +2794,14 @@ static void kvmppc_set_timer(struct kvm_vcpu *vcpu)
 extern int __kvmppc_vcore_entry(void);
 
 static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
-  struct kvm_vcpu *vcpu)
+  struct kvm_vcpu *vcpu, u64 tb)
 {
u64 now;
 
if (vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
return;
spin_lock_irq(>arch.tbacct_lock);
-   now = mftb();
+   now = tb;
vcpu->arch.busy_stolen 

[RFC PATCH 21/43] KVM: PPC: Book3S HV P9: Move TB updates

2021-06-22 Thread Nicholas Piggin
Move the TB updates between saving and loading guest and host SPRs,
to improve scheduling by keeping issue-NTC operations together as
much as possible.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 36 +--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 469dd5cbb52d..44ee805875ba 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -215,15 +215,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
 
vcpu->arch.ceded = 0;
 
-   if (vc->tb_offset) {
-   u64 new_tb = tb + vc->tb_offset;
-   mtspr(SPRN_TBU40, new_tb);
-   tb = mftb();
-   if ((tb & 0xff) < (new_tb & 0xff))
-   mtspr(SPRN_TBU40, new_tb + 0x100);
-   vc->tb_offset_applied = vc->tb_offset;
-   }
-
/* Could avoid mfmsr by passing around, but probably no big deal */
msr = mfmsr();
 
@@ -238,6 +229,15 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
host_dawrx1 = mfspr(SPRN_DAWRX1);
}
 
+   if (vc->tb_offset) {
+   u64 new_tb = tb + vc->tb_offset;
+   mtspr(SPRN_TBU40, new_tb);
+   tb = mftb();
+   if ((tb & 0xff) < (new_tb & 0xff))
+   mtspr(SPRN_TBU40, new_tb + 0x100);
+   vc->tb_offset_applied = vc->tb_offset;
+   }
+
if (vc->pcr)
mtspr(SPRN_PCR, vc->pcr | PCR_MASK);
mtspr(SPRN_DPDES, vc->dpdes);
@@ -454,6 +454,15 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
tb = mftb();
vcpu->arch.dec_expires = dec + tb;
 
+   if (vc->tb_offset_applied) {
+   u64 new_tb = tb - vc->tb_offset_applied;
+   mtspr(SPRN_TBU40, new_tb);
+   tb = mftb();
+   if ((tb & 0xff) < (new_tb & 0xff))
+   mtspr(SPRN_TBU40, new_tb + 0x100);
+   vc->tb_offset_applied = 0;
+   }
+
/* Preserve PSSCR[FAKE_SUSPEND] until we've called kvmppc_save_tm_hv */
mtspr(SPRN_PSSCR, host_psscr |
  (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
@@ -488,15 +497,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
if (vc->pcr)
mtspr(SPRN_PCR, PCR_MASK);
 
-   if (vc->tb_offset_applied) {
-   u64 new_tb = mftb() - vc->tb_offset_applied;
-   mtspr(SPRN_TBU40, new_tb);
-   tb = mftb();
-   if ((tb & 0xff) < (new_tb & 0xff))
-   mtspr(SPRN_TBU40, new_tb + 0x100);
-   vc->tb_offset_applied = 0;
-   }
-
/* HDEC must be at least as large as DEC, so decrementer_max fits */
mtspr(SPRN_HDEC, decrementer_max);
 
-- 
2.23.0



[RFC PATCH 20/43] KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase

2021-06-22 Thread Nicholas Piggin
Change dec_expires to be relative to the guest timebase, which allows
the DEC update to be moved into the low level P9 guest entry functions,
improving SPR access scheduling.
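
The conversion the new helper performs can be written out as a one-line
sketch (hypothetical names, mirroring kvmppc_dec_expires_host_tb() in the
patch):

/* Sketch: guest TB = host TB + tb_offset, so an expiry stored in guest
 * timebase units is converted to host units by subtracting the offset. */
#include <stdio.h>

static unsigned long long dec_expires_host_tb(unsigned long long dec_expires_guest_tb,
					      unsigned long long tb_offset)
{
	return dec_expires_guest_tb - tb_offset;
}

int main(void)
{
	printf("%llu\n", dec_expires_host_tb(5000, 200));	/* prints 4800 */
	return 0;
}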

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_book3s.h   |  6 +++
 arch/powerpc/include/asm/kvm_host.h |  2 +-
 arch/powerpc/kvm/book3s_hv.c| 58 +
 arch/powerpc/kvm/book3s_hv_nested.c |  3 ++
 arch/powerpc/kvm/book3s_hv_p9_entry.c   | 10 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 14 --
 6 files changed, 49 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index e6b53c6e21e3..032c597db0a9 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -403,6 +403,12 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu 
*vcpu)
return vcpu->arch.fault_dar;
 }
 
+/* Expiry time of vcpu DEC relative to host TB */
+static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.dec_expires - vcpu->arch.vcore->tb_offset;
+}
+
 static inline bool is_kvmppc_resume_guest(int r)
 {
return (r == RESUME_GUEST || r == RESUME_GUEST_NV);
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 5c003a5ff854..118b388ea887 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -747,7 +747,7 @@ struct kvm_vcpu_arch {
 
struct hrtimer dec_timer;
u64 dec_jiffies;
-   u64 dec_expires;
+   u64 dec_expires;/* Relative to guest timebase. */
unsigned long pending_exceptions;
u8 ceded;
u8 prodded;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 86c85e303a6d..218dacd78e25 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2149,8 +2149,7 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
*val = get_reg_val(id, vcpu->arch.vcore->arch_compat);
break;
case KVM_REG_PPC_DEC_EXPIRY:
-   *val = get_reg_val(id, vcpu->arch.dec_expires +
-  vcpu->arch.vcore->tb_offset);
+   *val = get_reg_val(id, vcpu->arch.dec_expires);
break;
case KVM_REG_PPC_ONLINE:
*val = get_reg_val(id, vcpu->arch.online);
@@ -2402,8 +2401,7 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
r = kvmppc_set_arch_compat(vcpu, set_reg_val(id, *val));
break;
case KVM_REG_PPC_DEC_EXPIRY:
-   vcpu->arch.dec_expires = set_reg_val(id, *val) -
-   vcpu->arch.vcore->tb_offset;
+   vcpu->arch.dec_expires = set_reg_val(id, *val);
break;
case KVM_REG_PPC_ONLINE:
i = set_reg_val(id, *val);
@@ -2780,13 +2778,13 @@ static void kvmppc_set_timer(struct kvm_vcpu *vcpu)
unsigned long dec_nsec, now;
 
now = get_tb();
-   if (now > vcpu->arch.dec_expires) {
+   if (now > kvmppc_dec_expires_host_tb(vcpu)) {
/* decrementer has already gone negative */
kvmppc_core_queue_dec(vcpu);
kvmppc_core_prepare_to_enter(vcpu);
return;
}
-   dec_nsec = tb_to_ns(vcpu->arch.dec_expires - now);
+   dec_nsec = tb_to_ns(kvmppc_dec_expires_host_tb(vcpu) - now);
hrtimer_start(>arch.dec_timer, dec_nsec, HRTIMER_MODE_REL);
vcpu->arch.timer_running = 1;
 }
@@ -3258,7 +3256,7 @@ static void post_guest_process(struct kvmppc_vcore *vc, 
bool is_master)
 */
spin_unlock(>lock);
/* cancel pending dec exception if dec is positive */
-   if (now < vcpu->arch.dec_expires &&
+   if (now < kvmppc_dec_expires_host_tb(vcpu) &&
kvmppc_core_pending_dec(vcpu))
kvmppc_core_dequeue_dec(vcpu);
 
@@ -4068,20 +4066,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
load_spr_state(vcpu);
 
-   /*
-* When setting DEC, we must always deal with irq_work_raise via NMI vs
-* setting DEC. The problem occurs right as we switch into guest mode
-* if a NMI hits and sets pending work and sets DEC, then that will
-* apply to the guest and not bring us back to the host.
-*
-* irq_work_raise could check a flag (or possibly LPCR[HDICE] for
-* example) and set HDEC to 1? That wouldn't solve the nested hv
-* case which needs to abort the hcall or zero the time limit.
-*
-* XXX: Another day's problem.
-*/
-   mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb);
-
if (kvmhv_on_pseries()) {
/*
 * We need to save and restore the guest visible part of the
@@ -4107,6 +4091,23 @@ static int 

[RFC PATCH 19/43] KVM: PPC: Book3S HV P9: Add kvmppc_stop_thread to match kvmppc_start_thread

2021-06-22 Thread Nicholas Piggin
This small cleanup makes it a bit easier to match up entry and exit
operations.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b8b0695a9312..86c85e303a6d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2948,6 +2948,13 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu, 
struct kvmppc_vcore *vc)
kvmppc_ipi_thread(cpu);
 }
 
+/* Old path does this in asm */
+static void kvmppc_stop_thread(struct kvm_vcpu *vcpu)
+{
+   vcpu->cpu = -1;
+   vcpu->arch.thread_cpu = -1;
+}
+
 static void kvmppc_wait_for_nap(int n_threads)
 {
int cpu = smp_processor_id();
@@ -4154,8 +4161,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
dec = (s32) dec;
tb = mftb();
vcpu->arch.dec_expires = dec + tb;
-   vcpu->cpu = -1;
-   vcpu->arch.thread_cpu = -1;
 
store_spr_state(vcpu);
 
@@ -4627,6 +4632,8 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
 
guest_exit_irqoff();
 
+   kvmppc_stop_thread(vcpu);
+
powerpc_local_irq_pmu_restore(flags);
 
cpumask_clear_cpu(pcpu, >arch.cpu_in_guest);
-- 
2.23.0



[RFC PATCH 18/43] KVM: PPC: Book3S HV P9: Improve mtmsrd scheduling by delaying MSR[EE] disable

2021-06-22 Thread Nicholas Piggin
Moving the mtmsrd to after the host SPRs are saved and before the guest
SPRs start to be loaded can prevent an SPR scoreboard stall (because
the mtmsrd is the L=1 type, which does not cause context synchronisation).

This is also now more convenient to combine with the mtmsrd L=0
instruction that enables facilities just below, but that is not done yet.

-12 cycles (7791) POWER9 virt-mode NULL hcall
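
A rough sketch of the new early-exit shape (hypothetical stubs, not the
kernel code): hard-disable interrupts first and bail out before touching any
guest state if something is already pending.

/* Sketch: disable interrupts early; if work is pending, return before any
 * guest state has been loaded, so nothing needs to be undone. */
#include <stdio.h>
#include <stdbool.h>

static bool pending;				/* stand-in for lazy_irq_pending() */
static void hard_irq_disable_stub(void) { puts("EE off"); }

static int guest_entry(void)
{
	hard_irq_disable_stub();
	if (pending)
		return 0;		/* nothing switched yet, caller just retries */

	puts("load guest SPRs, enter guest");
	return 1;
}

int main(void)
{
	pending = true;
	printf("trap=%d\n", guest_entry());	/* early bail */
	pending = false;
	printf("trap=%d\n", guest_entry());	/* real entry */
	return 0;
}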

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3ac5dbdb59f8..b8b0695a9312 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4015,6 +4015,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
save_p9_host_os_sprs(_os_sprs);
 
+   /*
+* This could be combined with MSR[RI] clearing, but that expands
+* the unrecoverable window. It would be better to cover unrecoverable
+* with KVM bad interrupt handling rather than use MSR[RI] at all.
+*
+* Much more difficult and less worthwhile to combine with IR/DR
+* disable.
+*/
+   hard_irq_disable();
+   if (lazy_irq_pending())
+   return 0;
+
/* MSR bits may have been cleared by context switch */
msr = 0;
if (IS_ENABLED(CONFIG_PPC_FPU))
@@ -4512,6 +4524,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
struct kvmppc_vcore *vc;
struct kvm *kvm = vcpu->kvm;
struct kvm_nested_guest *nested = vcpu->arch.nested;
+   unsigned long flags;
 
trace_kvmppc_run_vcpu_enter(vcpu);
 
@@ -4555,11 +4568,11 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
if (kvm_is_radix(kvm))
kvmppc_prepare_radix_vcpu(vcpu, pcpu);
 
-   local_irq_disable();
-   hard_irq_disable();
+   /* flags save not required, but irq_pmu has no disable/enable API */
+   powerpc_local_irq_pmu_save(flags);
if (signal_pending(current))
goto sigpend;
-   if (lazy_irq_pending() || need_resched() || !kvm->arch.mmu_ready)
+   if (need_resched() || !kvm->arch.mmu_ready)
goto out;
 
if (!nested) {
@@ -4614,7 +4627,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
 
guest_exit_irqoff();
 
-   local_irq_enable();
+   powerpc_local_irq_pmu_restore(flags);
 
cpumask_clear_cpu(pcpu, >arch.cpu_in_guest);
 
@@ -4672,7 +4685,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
run->exit_reason = KVM_EXIT_INTR;
vcpu->arch.ret = -EINTR;
  out:
-   local_irq_enable();
+   powerpc_local_irq_pmu_restore(flags);
preempt_enable();
goto done;
 }
-- 
2.23.0



[RFC PATCH 17/43] KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs

2021-06-22 Thread Nicholas Piggin
This reduces the number of mtmsrd required to enable facility bits when
saving/restoring registers, by having the KVM code set all bits up front
rather than using individual facility functions that set their particular
MSR bits.

-42 cycles (7803) POWER9 virt-mode NULL hcall
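
A compilable sketch of "build the facility mask once, apply it with one
operation" (stubbed MSR helper and made-up feature flags, not the kernel
code):

/* Sketch: OR together every facility bit that will be needed and set them
 * with a single MSR update instead of one update per facility. */
#include <stdio.h>

#define MSR_FP  0x1ul
#define MSR_VEC 0x2ul
#define MSR_VSX 0x4ul
#define MSR_TM  0x8ul

static unsigned long cur_msr;

static unsigned long msr_set_stub(unsigned long bits)
{
	cur_msr |= bits;	/* one "mtmsrd" instead of several */
	return cur_msr;
}

int main(void)
{
	unsigned long msr = 0;
	int have_vec = 1, have_vsx = 1, have_tm = 0;	/* stand-ins for CPU features */

	msr |= MSR_FP;
	if (have_vec) msr |= MSR_VEC;
	if (have_vsx) msr |= MSR_VSX;
	if (have_tm)  msr |= MSR_TM;

	printf("msr = %#lx\n", msr_set_stub(msr));
	return 0;
}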

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/process.c | 24 +++
 arch/powerpc/kvm/book3s_hv.c  | 57 ++-
 arch/powerpc/kvm/book3s_hv_p9_entry.c |  1 +
 3 files changed, 64 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 89e34aa273e2..dfce089ac424 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -592,6 +592,30 @@ static void save_all(struct task_struct *tsk)
msr_check_and_clear(msr_all_available);
 }
 
+void save_user_regs_kvm(void)
+{
+   unsigned long usermsr;
+
+   if (!current->thread.regs)
+   return;
+
+   usermsr = current->thread.regs->msr;
+
+   if (usermsr & MSR_FP)
+   save_fpu(current);
+
+   if (usermsr & MSR_VEC)
+   save_altivec(current);
+
+   if (usermsr & MSR_TM) {
+current->thread.tm_tfhar = mfspr(SPRN_TFHAR);
+current->thread.tm_tfiar = mfspr(SPRN_TFIAR);
+current->thread.tm_texasr = mfspr(SPRN_TEXASR);
+current->thread.regs->msr &= ~MSR_TM;
+   }
+}
+EXPORT_SYMBOL_GPL(save_user_regs_kvm);
+
 void flush_all_to_thread(struct task_struct *tsk)
 {
if (tsk->thread.regs) {
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 73a8b45249e8..3ac5dbdb59f8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3999,6 +3999,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
struct p9_host_os_sprs host_os_sprs;
s64 dec;
u64 tb, next_timer;
+   unsigned long msr;
int trap;
 
WARN_ON_ONCE(vcpu->arch.ceded);
@@ -4010,8 +4011,23 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
if (next_timer < time_limit)
time_limit = next_timer;
 
+   vcpu->arch.ceded = 0;
+
save_p9_host_os_sprs(_os_sprs);
 
+   /* MSR bits may have been cleared by context switch */
+   msr = 0;
+   if (IS_ENABLED(CONFIG_PPC_FPU))
+   msr |= MSR_FP;
+   if (cpu_has_feature(CPU_FTR_ALTIVEC))
+   msr |= MSR_VEC;
+   if (cpu_has_feature(CPU_FTR_VSX))
+   msr |= MSR_VSX;
+   if (cpu_has_feature(CPU_FTR_TM) ||
+   cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
+   msr |= MSR_TM;
+   msr = msr_check_and_set(msr);
+
kvmppc_subcore_enter_guest();
 
vc->entry_exit_map = 1;
@@ -4025,7 +4041,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
switch_pmu_to_guest(vcpu, _os_sprs);
 
-   msr_check_and_set(MSR_FP | MSR_VEC | MSR_VSX);
load_fp_state(>arch.fp);
 #ifdef CONFIG_ALTIVEC
load_vr_state(>arch.vr);
@@ -4134,7 +4149,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
restore_p9_host_os_sprs(vcpu, _os_sprs);
 
-   msr_check_and_set(MSR_FP | MSR_VEC | MSR_VSX);
store_fp_state(>arch.fp);
 #ifdef CONFIG_ALTIVEC
store_vr_state(>arch.vr);
@@ -4663,6 +4677,8 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
time_limit,
goto done;
 }
 
+void save_user_regs_kvm(void);
+
 static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 {
struct kvm_run *run = vcpu->run;
@@ -4672,19 +4688,24 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
unsigned long user_tar = 0;
unsigned int user_vrsave;
struct kvm *kvm;
+   unsigned long msr;
 
if (!vcpu->arch.sane) {
run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
return -EINVAL;
}
 
+   /* No need to go into the guest when all we'll do is come back out */
+   if (signal_pending(current)) {
+   run->exit_reason = KVM_EXIT_INTR;
+   return -EINTR;
+   }
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/*
 * Don't allow entry with a suspended transaction, because
 * the guest entry/exit code will lose it.
-* If the guest has TM enabled, save away their TM-related SPRs
-* (they will get restored by the TM unavailable interrupt).
 */
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
if (cpu_has_feature(CPU_FTR_TM) && current->thread.regs &&
(current->thread.regs->msr & MSR_TM)) {
if (MSR_TM_ACTIVE(current->thread.regs->msr)) {
@@ -4692,12 +4713,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
run->fail_entry.hardware_entry_failure_reason = 0;
return -EINVAL;
}
-   /* Enable TM so we can read the TM 

[RFC PATCH 16/43] KVM: PPC: Book3S HV P9: Move SPRG restore to restore_p9_host_os_sprs

2021-06-22 Thread Nicholas Piggin
Move the SPR update into its relevant helper function. This will
help with SPR scheduling improvements in later changes.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f0298b286c42..73a8b45249e8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3953,6 +3953,8 @@ static void save_p9_host_os_sprs(struct p9_host_os_sprs 
*host_os_sprs)
 static void restore_p9_host_os_sprs(struct kvm_vcpu *vcpu,
struct p9_host_os_sprs *host_os_sprs)
 {
+   mtspr(SPRN_SPRG_VDSO_WRITE, local_paca->sprg_vdso);
+
mtspr(SPRN_PSPB, 0);
mtspr(SPRN_UAMOR, 0);
 
@@ -4152,8 +4154,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
timer_rearm_host_dec(tb);
 
-   mtspr(SPRN_SPRG_VDSO_WRITE, local_paca->sprg_vdso);
-
kvmppc_subcore_exit_guest();
 
return trap;
-- 
2.23.0



[RFC PATCH 15/43] KVM: PPC: Book3S HV: CTRL SPR does not require read-modify-write

2021-06-22 Thread Nicholas Piggin
Processors that support KVM HV do not require read-modify-write of
the CTRL SPR to set/clear their thread's runlatch. Just write 1 or 0
to it.
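
The change amounts to replacing a read-modify-write with a plain store; a
tiny sketch of the idea (stub accessor, not the kernel code):

/* Sketch: the runlatch is bit 0 of CTRL and the rest of the register is
 * don't-care on these CPUs, so write 0 or 1 directly instead of
 * mfspr + mask + mtspr. */
#include <stdio.h>

static unsigned long ctrl = 1;

static void mtspr_ctrl(unsigned long v) { ctrl = v; printf("CTRL <- %lu\n", v); }

int main(void)
{
	mtspr_ctrl(0);	/* clear runlatch: was mfspr; clrrdi; mtspr */
	mtspr_ctrl(1);	/* set runlatch:   was mfspr; ori;    mtspr */
	return 0;
}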

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c|  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 15 ++-
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0733bb95f439..f0298b286c42 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3920,7 +3920,7 @@ static void load_spr_state(struct kvm_vcpu *vcpu)
 */
 
if (!(vcpu->arch.ctrl & 1))
-   mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1);
+   mtspr(SPRN_CTRLT, 0);
 }
 
 static void store_spr_state(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 0eb06734bc26..488a1e07958c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -775,12 +775,11 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
mtspr   SPRN_AMR,r5
mtspr   SPRN_UAMOR,r6
 
-   /* Restore state of CTRL run bit; assume 1 on entry */
+   /* Restore state of CTRL run bit; the host currently has it set to 1 */
lwz r5,VCPU_CTRL(r4)
andi.   r5,r5,1
bne 4f
-   mfspr   r6,SPRN_CTRLF
-   clrrdi  r6,r6,1
+   li  r6,0
mtspr   SPRN_CTRLT,r6
 4:
/* Secondary threads wait for primary to have done partition switch */
@@ -1209,12 +1208,12 @@ guest_bypass:
stw r0, VCPU_CPU(r9)
stw r0, VCPU_THREAD_CPU(r9)
 
-   /* Save guest CTRL register, set runlatch to 1 */
+   /* Save guest CTRL register, set runlatch to 1 if it was clear */
mfspr   r6,SPRN_CTRLF
stw r6,VCPU_CTRL(r9)
andi.   r0,r6,1
bne 4f
-   ori r6,r6,1
+   li  r6,1
mtspr   SPRN_CTRLT,r6
 4:
/*
@@ -2220,8 +2219,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_TM)
 * Also clear the runlatch bit before napping.
 */
 kvm_do_nap:
-   mfspr   r0, SPRN_CTRLF
-   clrrdi  r0, r0, 1
+   li  r0,0
mtspr   SPRN_CTRLT, r0
 
li  r0,1
@@ -2240,8 +2238,7 @@ kvm_nap_sequence: /* desired LPCR value in r5 */
 
bl  isa206_idle_insn_mayloss
 
-   mfspr   r0, SPRN_CTRLF
-   ori r0, r0, 1
+   li  r0,1
mtspr   SPRN_CTRLT, r0
 
mtspr   SPRN_SRR1, r3
-- 
2.23.0



[RFC PATCH 14/43] KVM: PPC: Book3S HV P9: Demand fault PMU SPRs when marked not inuse

2021-06-22 Thread Nicholas Piggin
The pmcregs_in_use field in the guest VPA cannot be trusted to reflect
what the guest is doing with the PMU SPRs, so the PMU must always be
managed (stopped) when exiting the guest, and SPR values set when
entering the guest, to ensure the guest can't create a covert channel
or otherwise cause other guests or the host to misbehave.

So prevent guest access to the PMU with HFSCR[PM] if pmcregs_in_use is
clear, and avoid the PMU SPR accesses on every partition switch. A guest
that sets pmcregs_in_use incorrectly, or that touches the PMU just after
first setting it, will take a hypervisor facility unavailable interrupt
that brings in the PMU SPRs.

-774 cycles (7759) POWER9 virt-mode NULL hcall
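
A simplified sketch of the demand-fault flow (hypothetical constants and
stubs, not the kernel code):

/* Sketch: drop HFSCR[PM] when the guest says the PMU is unused; if the
 * guest later touches a PMU SPR it takes a facility-unavailable interrupt,
 * the handler re-grants HFSCR[PM], and the next entry loads the PMU SPRs. */
#include <stdio.h>

#define HFSCR_PM 0x1ul
#define CAUSE_PM 9		/* placeholder facility-unavailable cause code */

struct vcpu { unsigned long hfscr; int pmcregs_in_use; };

static void guest_exit(struct vcpu *v)
{
	if (!v->pmcregs_in_use)
		v->hfscr &= ~HFSCR_PM;	/* next entry skips PMU SPR loads */
}

static void fac_unavail_interrupt(struct vcpu *v, int cause)
{
	if (cause == CAUSE_PM)
		v->hfscr |= HFSCR_PM;	/* grant access; SPRs loaded on re-entry */
}

int main(void)
{
	struct vcpu v = { .hfscr = HFSCR_PM, .pmcregs_in_use = 0 };

	guest_exit(&v);				/* PM removed */
	fac_unavail_interrupt(&v, CAUSE_PM);	/* guest touched PMU: PM re-granted */
	printf("hfscr=%#lx\n", v.hfscr);
	return 0;
}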

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_host.h |   1 +
 arch/powerpc/kvm/book3s_hv.c| 122 ++--
 arch/powerpc/kvm/book3s_hv_nested.c |  12 ++-
 3 files changed, 105 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 7e4c3a741951..5c003a5ff854 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -819,6 +819,7 @@ struct kvm_vcpu_arch {
/* For support of nested guests */
struct kvm_nested_guest *nested;
u32 nested_vcpu_id;
+   u64 nested_hfscr;
gpa_t nested_io_gpr;
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 13b8389b0479..0733bb95f439 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1349,6 +1349,20 @@ static int kvmppc_emulate_doorbell_instr(struct kvm_vcpu 
*vcpu)
return RESUME_GUEST;
 }
 
+/*
+ * If the lppaca had pmcregs_in_use clear when we exited the guest, then
+ * HFSCR_PM is cleared for next entry. If the guest then tries to access
+ * the PMU SPRs, we get this facility unavailable interrupt. Putting HFSCR_PM
+ * back in the guest HFSCR will cause the next entry to load the PMU SPRs and
+ * allow the guest access to continue.
+ */
+static int kvmppc_pmu_unavailable(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.hfscr |= HFSCR_PM;
+
+   return RESUME_GUEST;
+}
+
 static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 struct task_struct *tsk)
 {
@@ -1618,16 +1632,22 @@ XXX benchmark guest exits
 * to emulate.
 * Otherwise, we just generate a program interrupt to the guest.
 */
-   case BOOK3S_INTERRUPT_H_FAC_UNAVAIL:
+   case BOOK3S_INTERRUPT_H_FAC_UNAVAIL: {
r = EMULATE_FAIL;
-   if (((vcpu->arch.hfscr >> 56) == FSCR_MSGP_LG) &&
-   cpu_has_feature(CPU_FTR_ARCH_300))
-   r = kvmppc_emulate_doorbell_instr(vcpu);
+   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   unsigned long cause = vcpu->arch.hfscr >> 56;
+
+   if (cause == FSCR_MSGP_LG)
+   r = kvmppc_emulate_doorbell_instr(vcpu);
+   if (cause == FSCR_PM_LG)
+   r = kvmppc_pmu_unavailable(vcpu);
+   }
if (r == EMULATE_FAIL) {
kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
r = RESUME_GUEST;
}
break;
+   }
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case BOOK3S_INTERRUPT_HV_SOFTPATCH:
@@ -1734,6 +1754,19 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
*vcpu)
srcu_read_unlock(>kvm->srcu, srcu_idx);
break;
 
+   case BOOK3S_INTERRUPT_H_FAC_UNAVAIL: {
+   unsigned long cause = vcpu->arch.hfscr >> 56;
+
+   r = EMULATE_FAIL;
+   if (cause == FSCR_PM_LG && (vcpu->arch.nested_hfscr & HFSCR_PM))
+   r = kvmppc_pmu_unavailable(vcpu);
+
+   if (r == EMULATE_FAIL)
+   r = RESUME_HOST;
+
+   break;
+   }
+
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
case BOOK3S_INTERRUPT_HV_SOFTPATCH:
/*
@@ -3693,6 +3726,17 @@ static void freeze_pmu(unsigned long mmcr0, unsigned 
long mmcra)
 static void switch_pmu_to_guest(struct kvm_vcpu *vcpu,
struct p9_host_os_sprs *host_os_sprs)
 {
+   struct lppaca *lp;
+   int load_pmu = 1;
+
+   lp = vcpu->arch.vpa.pinned_addr;
+   if (lp)
+   load_pmu = lp->pmcregs_in_use;
+
+   if (load_pmu)
+ vcpu->arch.hfscr |= HFSCR_PM;
+
+   /* Save host */
if (ppc_get_pmu_inuse()) {
/*
 * It might be better to put PMU handling (at least for the
@@ -3737,29 +3781,31 @@ static void switch_pmu_to_guest(struct kvm_vcpu *vcpu,
}
 #endif
 
-   /* load guest */
-   mtspr(SPRN_PMC1, vcpu->arch.pmc[0]);
-   mtspr(SPRN_PMC2, vcpu->arch.pmc[1]);
-   mtspr(SPRN_PMC3, vcpu->arch.pmc[2]);
-   mtspr(SPRN_PMC4, 

[RFC PATCH 13/43] KVM: PPC: Book3S HV P9: Factor PMU save/load into context switch functions

2021-06-22 Thread Nicholas Piggin
Rather than separate guest/host save/restore functions, implement context
switch functions that take care of details like the VPA update for nested
guests.

The reason to split this kind of helper into explicit save/load
functions is mainly to schedule SPR accesses nicely, but the PMU is a
special case where the load requires mtSPR (to stop the counters) and
has other difficulties, so there is less opportunity to schedule those
nicely. The SPR accesses also have side-effects if the PMU is running,
and in later changes we keep the host PMU running as long as possible
so this code can be better profiled, which also complicates scheduling.
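
In shape, the change pairs the save and load halves into one
direction-specific function; a sketch with stub helpers (not the kernel
code):

/* Sketch: one switch_pmu_to_guest() replaces save_host_pmu() +
 * load_guest_pmu(), so per-direction details like the VPA pmcregs_in_use
 * propagation live in a single place. */
#include <stdio.h>

static void save_host_pmu(void)  { puts("freeze + save host PMU"); }
static void load_guest_pmu(void) { puts("load guest PMU counters"); }

static void switch_pmu_to_guest(int nested_on_pseries)
{
	save_host_pmu();
	if (nested_on_pseries)
		puts("propagate pmcregs_in_use to our own VPA");
	load_guest_pmu();
}

int main(void)
{
	switch_pmu_to_guest(1);
	return 0;
}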

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 51 
 1 file changed, 23 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 38d8afa16839..13b8389b0479 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3690,7 +3690,8 @@ static void freeze_pmu(unsigned long mmcr0, unsigned long 
mmcra)
isync();
 }
 
-static void save_p9_host_pmu(struct p9_host_os_sprs *host_os_sprs)
+static void switch_pmu_to_guest(struct kvm_vcpu *vcpu,
+   struct p9_host_os_sprs *host_os_sprs)
 {
if (ppc_get_pmu_inuse()) {
/*
@@ -3724,10 +3725,19 @@ static void save_p9_host_pmu(struct p9_host_os_sprs 
*host_os_sprs)
host_os_sprs->sier3 = mfspr(SPRN_SIER3);
}
}
-}
 
-static void load_p9_guest_pmu(struct kvm_vcpu *vcpu)
-{
+#ifdef CONFIG_PPC_PSERIES
+   if (kvmhv_on_pseries()) {
+   if (vcpu->arch.vpa.pinned_addr) {
+   struct lppaca *lp = vcpu->arch.vpa.pinned_addr;
+   get_lppaca()->pmcregs_in_use = lp->pmcregs_in_use;
+   } else {
+   get_lppaca()->pmcregs_in_use = 1;
+   }
+   }
+#endif
+
+   /* load guest */
mtspr(SPRN_PMC1, vcpu->arch.pmc[0]);
mtspr(SPRN_PMC2, vcpu->arch.pmc[1]);
mtspr(SPRN_PMC3, vcpu->arch.pmc[2]);
@@ -3752,7 +3762,8 @@ static void load_p9_guest_pmu(struct kvm_vcpu *vcpu)
/* No isync necessary because we're starting counters */
 }
 
-static void save_p9_guest_pmu(struct kvm_vcpu *vcpu)
+static void switch_pmu_to_host(struct kvm_vcpu *vcpu,
+   struct p9_host_os_sprs *host_os_sprs)
 {
struct lppaca *lp;
int save_pmu = 1;
@@ -3787,10 +3798,12 @@ static void save_p9_guest_pmu(struct kvm_vcpu *vcpu)
} else {
freeze_pmu(mfspr(SPRN_MMCR0), mfspr(SPRN_MMCRA));
}
-}
 
-static void load_p9_host_pmu(struct p9_host_os_sprs *host_os_sprs)
-{
+#ifdef CONFIG_PPC_PSERIES
+   if (kvmhv_on_pseries())
+   get_lppaca()->pmcregs_in_use = ppc_get_pmu_inuse();
+#endif
+
if (ppc_get_pmu_inuse()) {
mtspr(SPRN_PMC1, host_os_sprs->pmc1);
mtspr(SPRN_PMC2, host_os_sprs->pmc2);
@@ -3929,8 +3942,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
save_p9_host_os_sprs(&host_os_sprs);
 
-   save_p9_host_pmu(&host_os_sprs);
-
kvmppc_subcore_enter_guest();
 
vc->entry_exit_map = 1;
@@ -3942,17 +3953,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
kvmppc_restore_tm_hv(vcpu, vcpu->arch.shregs.msr, true);
 
-#ifdef CONFIG_PPC_PSERIES
-   if (kvmhv_on_pseries()) {
-   if (vcpu->arch.vpa.pinned_addr) {
-   struct lppaca *lp = vcpu->arch.vpa.pinned_addr;
-   get_lppaca()->pmcregs_in_use = lp->pmcregs_in_use;
-   } else {
-   get_lppaca()->pmcregs_in_use = 1;
-   }
-   }
-#endif
-   load_p9_guest_pmu(vcpu);
+   switch_pmu_to_guest(vcpu, &host_os_sprs);
 
msr_check_and_set(MSR_FP | MSR_VEC | MSR_VSX);
load_fp_state(&vcpu->arch.fp);
@@ -4076,11 +4077,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
vcpu_vpa_increment_dispatch(vcpu);
 
-   save_p9_guest_pmu(vcpu);
-#ifdef CONFIG_PPC_PSERIES
-   if (kvmhv_on_pseries())
-   get_lppaca()->pmcregs_in_use = ppc_get_pmu_inuse();
-#endif
+   switch_pmu_to_host(vcpu, &host_os_sprs);
 
vc->entry_exit_map = 0x101;
vc->in_guest = 0;
@@ -4089,8 +4086,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 
mtspr(SPRN_SPRG_VDSO_WRITE, local_paca->sprg_vdso);
 
-   load_p9_host_pmu(&host_os_sprs);
-
kvmppc_subcore_exit_guest();
 
return trap;
-- 
2.23.0



[RFC PATCH 12/43] KVM: PPC: Book3S HV P9: Factor out yield_count increment

2021-06-22 Thread Nicholas Piggin
Factor duplicated code into a helper function.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b1b94b3563b7..38d8afa16839 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3896,6 +3896,16 @@ static inline bool hcall_is_xics(unsigned long req)
req == H_IPOLL || req == H_XIRR || req == H_XIRR_X;
 }
 
+static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
+{
+   struct lppaca *lp = vcpu->arch.vpa.pinned_addr;
+   if (lp) {
+   u32 yield_count = be32_to_cpu(lp->yield_count) + 1;
+   lp->yield_count = cpu_to_be32(yield_count);
+   vcpu->arch.vpa.dirty = 1;
+   }
+}
+
 /*
  * Guest entry for POWER9 and later CPUs.
  */
@@ -3926,12 +3936,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vc->entry_exit_map = 1;
vc->in_guest = 1;
 
-   if (vcpu->arch.vpa.pinned_addr) {
-   struct lppaca *lp = vcpu->arch.vpa.pinned_addr;
-   u32 yield_count = be32_to_cpu(lp->yield_count) + 1;
-   lp->yield_count = cpu_to_be32(yield_count);
-   vcpu->arch.vpa.dirty = 1;
-   }
+   vcpu_vpa_increment_dispatch(vcpu);
 
if (cpu_has_feature(CPU_FTR_TM) ||
cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
@@ -4069,12 +4074,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
cpu_has_feature(CPU_FTR_P9_TM_HV_ASSIST))
kvmppc_save_tm_hv(vcpu, vcpu->arch.shregs.msr, true);
 
-   if (vcpu->arch.vpa.pinned_addr) {
-   struct lppaca *lp = vcpu->arch.vpa.pinned_addr;
-   u32 yield_count = be32_to_cpu(lp->yield_count) + 1;
-   lp->yield_count = cpu_to_be32(yield_count);
-   vcpu->arch.vpa.dirty = 1;
-   }
+   vcpu_vpa_increment_dispatch(vcpu);
 
save_p9_guest_pmu(vcpu);
 #ifdef CONFIG_PPC_PSERIES
-- 
2.23.0



[RFC PATCH 11/43] KVM: PPC: Book3S HV P9: Implement PMU save/restore in C

2021-06-22 Thread Nicholas Piggin
Implement the P9 path PMU save/restore code in C, and remove the
POWER9/10 code from the P7/8 path assembly.

-449 cycles (8533) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/asm-prototypes.h |   5 -
 arch/powerpc/kvm/book3s_hv.c  | 205 --
 arch/powerpc/kvm/book3s_hv_interrupts.S   |  13 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  43 +
 4 files changed, 200 insertions(+), 66 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 02ee6f5ac9fe..928db8ef9a5a 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -136,11 +136,6 @@ static inline void kvmppc_restore_tm_hv(struct kvm_vcpu 
*vcpu, u64 msr,
bool preserve_nv) { }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-void kvmhv_save_host_pmu(void);
-void kvmhv_load_host_pmu(void);
-void kvmhv_save_guest_pmu(struct kvm_vcpu *vcpu, bool pmu_in_use);
-void kvmhv_load_guest_pmu(struct kvm_vcpu *vcpu);
-
 void kvmppc_p9_enter_guest(struct kvm_vcpu *vcpu);
 
 long kvmppc_h_set_dabr(struct kvm_vcpu *vcpu, unsigned long dabr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f7349d150828..b1b94b3563b7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3635,6 +3635,188 @@ static noinline void kvmppc_run_core(struct 
kvmppc_vcore *vc)
trace_kvmppc_run_core(vc, 1);
 }
 
+/*
+ * Privileged (non-hypervisor) host registers to save.
+ */
+struct p9_host_os_sprs {
+   unsigned long dscr;
+   unsigned long tidr;
+   unsigned long iamr;
+   unsigned long amr;
+   unsigned long fscr;
+
+   unsigned int pmc1;
+   unsigned int pmc2;
+   unsigned int pmc3;
+   unsigned int pmc4;
+   unsigned int pmc5;
+   unsigned int pmc6;
+   unsigned long mmcr0;
+   unsigned long mmcr1;
+   unsigned long mmcr2;
+   unsigned long mmcr3;
+   unsigned long mmcra;
+   unsigned long siar;
+   unsigned long sier1;
+   unsigned long sier2;
+   unsigned long sier3;
+   unsigned long sdar;
+};
+
+static void freeze_pmu(unsigned long mmcr0, unsigned long mmcra)
+{
+   if (!(mmcr0 & MMCR0_FC))
+   goto do_freeze;
+   if (mmcra & MMCRA_SAMPLE_ENABLE)
+   goto do_freeze;
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   if (!(mmcr0 & MMCR0_PMCCEXT))
+   goto do_freeze;
+   if (!(mmcra & MMCRA_BHRB_DISABLE))
+   goto do_freeze;
+   }
+   return;
+
+do_freeze:
+   mmcr0 = MMCR0_FC;
+   mmcra = 0;
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   mmcr0 |= MMCR0_PMCCEXT;
+   mmcra = MMCRA_BHRB_DISABLE;
+   }
+
+   mtspr(SPRN_MMCR0, mmcr0);
+   mtspr(SPRN_MMCRA, mmcra);
+   isync();
+}
+
+static void save_p9_host_pmu(struct p9_host_os_sprs *host_os_sprs)
+{
+   if (ppc_get_pmu_inuse()) {
+   /*
+* It might be better to put PMU handling (at least for the
+* host) in the perf subsystem because it knows more about what
+* is being used.
+*/
+
+   /* POWER9, POWER10 do not implement HPMC or SPMC */
+
+   host_os_sprs->mmcr0 = mfspr(SPRN_MMCR0);
+   host_os_sprs->mmcra = mfspr(SPRN_MMCRA);
+
+   freeze_pmu(host_os_sprs->mmcr0, host_os_sprs->mmcra);
+
+   host_os_sprs->pmc1 = mfspr(SPRN_PMC1);
+   host_os_sprs->pmc2 = mfspr(SPRN_PMC2);
+   host_os_sprs->pmc3 = mfspr(SPRN_PMC3);
+   host_os_sprs->pmc4 = mfspr(SPRN_PMC4);
+   host_os_sprs->pmc5 = mfspr(SPRN_PMC5);
+   host_os_sprs->pmc6 = mfspr(SPRN_PMC6);
+   host_os_sprs->mmcr1 = mfspr(SPRN_MMCR1);
+   host_os_sprs->mmcr2 = mfspr(SPRN_MMCR2);
+   host_os_sprs->sdar = mfspr(SPRN_SDAR);
+   host_os_sprs->siar = mfspr(SPRN_SIAR);
+   host_os_sprs->sier1 = mfspr(SPRN_SIER);
+
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   host_os_sprs->mmcr3 = mfspr(SPRN_MMCR3);
+   host_os_sprs->sier2 = mfspr(SPRN_SIER2);
+   host_os_sprs->sier3 = mfspr(SPRN_SIER3);
+   }
+   }
+}
+
+static void load_p9_guest_pmu(struct kvm_vcpu *vcpu)
+{
+   mtspr(SPRN_PMC1, vcpu->arch.pmc[0]);
+   mtspr(SPRN_PMC2, vcpu->arch.pmc[1]);
+   mtspr(SPRN_PMC3, vcpu->arch.pmc[2]);
+   mtspr(SPRN_PMC4, vcpu->arch.pmc[3]);
+   mtspr(SPRN_PMC5, vcpu->arch.pmc[4]);
+   mtspr(SPRN_PMC6, vcpu->arch.pmc[5]);
+   mtspr(SPRN_MMCR1, vcpu->arch.mmcr[1]);
+   mtspr(SPRN_MMCR2, vcpu->arch.mmcr[2]);
+   mtspr(SPRN_SDAR, vcpu->arch.sdar);
+   mtspr(SPRN_SIAR, vcpu->arch.siar);
+   mtspr(SPRN_SIER, 

[RFC PATCH 10/43] powerpc/64s: Always set PMU control registers to frozen/disabled when not in use

2021-06-22 Thread Nicholas Piggin
KVM PMU management code looks for particular frozen/disabled bits in
the PMU registers so it knows whether it must clear them when coming
out of a guest or not. Setting this up helps KVM make these optimisations
without getting confused. Longer term the better approach might be to
move guest/host PMU switching to the perf subsystem.
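
(Illustrative note, not part of the patch: this is roughly the check the
KVM exit path can then rely on. It mirrors the freeze_pmu() helper added
later in this series; MMCR0_FC, MMCRA_SAMPLE_ENABLE and MMCRA_BHRB_DISABLE
are the real bit names, the helper name here is made up.)

    /* Skip the expensive freeze sequence if the PMU is already frozen */
    static void freeze_pmu_if_needed(unsigned long mmcr0, unsigned long mmcra)
    {
            if ((mmcr0 & MMCR0_FC) && !(mmcra & MMCRA_SAMPLE_ENABLE))
                    return;         /* host left the PMU frozen, nothing to do */

            mtspr(SPRN_MMCR0, MMCR0_FC);
            mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
            isync();
    }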

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/cpu_setup_power.c | 4 ++--
 arch/powerpc/kernel/dt_cpu_ftrs.c | 6 +++---
 arch/powerpc/kvm/book3s_hv.c  | 5 +
 arch/powerpc/perf/core-book3s.c   | 7 +++
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_power.c 
b/arch/powerpc/kernel/cpu_setup_power.c
index a29dc8326622..3dc61e203f37 100644
--- a/arch/powerpc/kernel/cpu_setup_power.c
+++ b/arch/powerpc/kernel/cpu_setup_power.c
@@ -109,7 +109,7 @@ static void init_PMU_HV_ISA207(void)
 static void init_PMU(void)
 {
mtspr(SPRN_MMCRA, 0);
-   mtspr(SPRN_MMCR0, 0);
+   mtspr(SPRN_MMCR0, MMCR0_FC);
mtspr(SPRN_MMCR1, 0);
mtspr(SPRN_MMCR2, 0);
 }
@@ -123,7 +123,7 @@ static void init_PMU_ISA31(void)
 {
mtspr(SPRN_MMCR3, 0);
mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
-   mtspr(SPRN_MMCR0, MMCR0_PMCCEXT);
+   mtspr(SPRN_MMCR0, MMCR0_FC | MMCR0_PMCCEXT);
 }
 
 /*
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 0a6b36b4bda8..06a089fbeaa7 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -353,7 +353,7 @@ static void init_pmu_power8(void)
}
 
mtspr(SPRN_MMCRA, 0);
-   mtspr(SPRN_MMCR0, 0);
+   mtspr(SPRN_MMCR0, MMCR0_FC);
mtspr(SPRN_MMCR1, 0);
mtspr(SPRN_MMCR2, 0);
mtspr(SPRN_MMCRS, 0);
@@ -392,7 +392,7 @@ static void init_pmu_power9(void)
mtspr(SPRN_MMCRC, 0);
 
mtspr(SPRN_MMCRA, 0);
-   mtspr(SPRN_MMCR0, 0);
+   mtspr(SPRN_MMCR0, MMCR0_FC);
mtspr(SPRN_MMCR1, 0);
mtspr(SPRN_MMCR2, 0);
 }
@@ -428,7 +428,7 @@ static void init_pmu_power10(void)
 
mtspr(SPRN_MMCR3, 0);
mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
-   mtspr(SPRN_MMCR0, MMCR0_PMCCEXT);
+   mtspr(SPRN_MMCR0, MMCR0_FC | MMCR0_PMCCEXT);
 }
 
 static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1f30f98b09d1..f7349d150828 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2593,6 +2593,11 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu 
*vcpu)
 #endif
 #endif
vcpu->arch.mmcr[0] = MMCR0_FC;
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   vcpu->arch.mmcr[0] |= MMCR0_PMCCEXT;
+   vcpu->arch.mmcra = MMCRA_BHRB_DISABLE;
+   }
+
vcpu->arch.ctrl = CTRL_RUNLATCH;
/* default to host PVR, since we can't spoof it */
kvmppc_set_pvr_hv(vcpu, mfspr(SPRN_PVR));
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 51622411a7cc..e33b29ec1a65 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1361,6 +1361,13 @@ static void power_pmu_enable(struct pmu *pmu)
goto out;
 
if (cpuhw->n_events == 0) {
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
+   mtspr(SPRN_MMCR0, MMCR0_FC | MMCR0_PMCCEXT);
+   } else {
+   mtspr(SPRN_MMCRA, 0);
+   mtspr(SPRN_MMCR0, MMCR0_FC);
+   }
ppc_set_pmu_inuse(0);
goto out;
}
-- 
2.23.0



[RFC PATCH 09/43] KVM: PPC: Book3S HV: Don't always save PMU for guest capable of nesting

2021-06-22 Thread Nicholas Piggin
Revert the workaround added by commit 63279eeb7f93a ("KVM: PPC: Book3S
HV: Always save guest pmu for guest capable of nesting").

Nested capable guests running with the earlier commit ("KVM: PPC: Book3S
HV Nested: Indicate guest PMU in-use in VPA") will now indicate the PMU
in-use status of their guests, which means the parent does not need to
unconditionally save the PMU for nested capable guests.

This will cause the PMU to break for nested guests when running older
nested hypervisor guests under a kernel with this change. It's unclear
there's an easy way to avoid that, so this could wait for a release or
so for the fix to filter into stable kernels.

-134 cycles (8982) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ed713f49fbd5..1f30f98b09d1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3901,8 +3901,6 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vcpu->arch.vpa.dirty = 1;
save_pmu = lp->pmcregs_in_use;
}
-   /* Must save pmu if this guest is capable of running nested guests */
-   save_pmu |= nesting_enabled(vcpu->kvm);
 
kvmhv_save_guest_pmu(vcpu, save_pmu);
 #ifdef CONFIG_PPC_PSERIES
-- 
2.23.0



[RFC PATCH 08/43] powerpc/64s: Keep AMOR SPR a constant ~0 at runtime

2021-06-22 Thread Nicholas Piggin
This register controls supervisor SPR modifications, and as such is only
relevant for KVM. KVM always sets AMOR to ~0 on guest entry, and never
restores it coming back out to the host, so it can be kept constant and
avoid the mtSPR in KVM guest entry.

-21 cycles (9116) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/cpu_setup_power.c|  8 
 arch/powerpc/kernel/dt_cpu_ftrs.c|  2 ++
 arch/powerpc/kvm/book3s_hv_p9_entry.c|  2 --
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  2 --
 arch/powerpc/mm/book3s64/radix_pgtable.c | 15 ---
 arch/powerpc/platforms/powernv/idle.c|  8 +++-
 6 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_power.c 
b/arch/powerpc/kernel/cpu_setup_power.c
index 3cca88ee96d7..a29dc8326622 100644
--- a/arch/powerpc/kernel/cpu_setup_power.c
+++ b/arch/powerpc/kernel/cpu_setup_power.c
@@ -137,6 +137,7 @@ void __setup_cpu_power7(unsigned long offset, struct 
cpu_spec *t)
return;
 
mtspr(SPRN_LPID, 0);
+   mtspr(SPRN_AMOR, ~0);
mtspr(SPRN_PCR, PCR_MASK);
init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH);
 }
@@ -150,6 +151,7 @@ void __restore_cpu_power7(void)
return;
 
mtspr(SPRN_LPID, 0);
+   mtspr(SPRN_AMOR, ~0);
mtspr(SPRN_PCR, PCR_MASK);
init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH);
 }
@@ -164,6 +166,7 @@ void __setup_cpu_power8(unsigned long offset, struct 
cpu_spec *t)
return;
 
mtspr(SPRN_LPID, 0);
+   mtspr(SPRN_AMOR, ~0);
mtspr(SPRN_PCR, PCR_MASK);
init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */
init_HFSCR();
@@ -184,6 +187,7 @@ void __restore_cpu_power8(void)
return;
 
mtspr(SPRN_LPID, 0);
+   mtspr(SPRN_AMOR, ~0);
mtspr(SPRN_PCR, PCR_MASK);
init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */
init_HFSCR();
@@ -202,6 +206,7 @@ void __setup_cpu_power9(unsigned long offset, struct 
cpu_spec *t)
mtspr(SPRN_PSSCR, 0);
mtspr(SPRN_LPID, 0);
mtspr(SPRN_PID, 0);
+   mtspr(SPRN_AMOR, ~0);
mtspr(SPRN_PCR, PCR_MASK);
init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
 LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
@@ -223,6 +228,7 @@ void __restore_cpu_power9(void)
mtspr(SPRN_PSSCR, 0);
mtspr(SPRN_LPID, 0);
mtspr(SPRN_PID, 0);
+   mtspr(SPRN_AMOR, ~0);
mtspr(SPRN_PCR, PCR_MASK);
init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
 LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
@@ -242,6 +248,7 @@ void __setup_cpu_power10(unsigned long offset, struct 
cpu_spec *t)
mtspr(SPRN_PSSCR, 0);
mtspr(SPRN_LPID, 0);
mtspr(SPRN_PID, 0);
+   mtspr(SPRN_AMOR, ~0);
mtspr(SPRN_PCR, PCR_MASK);
init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
 LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
@@ -264,6 +271,7 @@ void __restore_cpu_power10(void)
mtspr(SPRN_PSSCR, 0);
mtspr(SPRN_LPID, 0);
mtspr(SPRN_PID, 0);
+   mtspr(SPRN_AMOR, ~0);
mtspr(SPRN_PCR, PCR_MASK);
init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
 LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 358aee7c2d79..0a6b36b4bda8 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -80,6 +80,7 @@ static void __restore_cpu_cpufeatures(void)
mtspr(SPRN_LPCR, system_registers.lpcr);
if (hv_mode) {
mtspr(SPRN_LPID, 0);
+   mtspr(SPRN_AMOR, ~0);
mtspr(SPRN_HFSCR, system_registers.hfscr);
mtspr(SPRN_PCR, system_registers.pcr);
}
@@ -216,6 +217,7 @@ static int __init feat_enable_hv(struct dt_cpu_feature *f)
}
 
mtspr(SPRN_LPID, 0);
+   mtspr(SPRN_AMOR, ~0);
 
lpcr = mfspr(SPRN_LPCR);
lpcr &=  ~LPCR_LPES0; /* HV external interrupts */
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index c4f3e066fcb4..a3281f0c9214 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -286,8 +286,6 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
mtspr(SPRN_SPRG2, vcpu->arch.shregs.sprg2);
mtspr(SPRN_SPRG3, vcpu->arch.shregs.sprg3);
 
-   mtspr(SPRN_AMOR, ~0UL);
-
local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_HV_P9;
 
/*
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 

[RFC PATCH 07/43] KVM: PPC: Book3S HV: POWER10 enable HAIL when running radix guests

2021-06-22 Thread Nicholas Piggin
HV interrupts may be taken with the MMU enabled when radix guests are
running. Enable LPCR[HAIL] on ISA v3.1 processors for radix guests.
Make this depend on the host LPCR[HAIL] being enabled. Currently that is
always enabled, but having this test means any issue that might require
LPCR[HAIL] to be disabled in the host will not have to be duplicated in
KVM.

-1380 cycles on P10 NULL hcall entry+exit

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 36e1db48fccf..ed713f49fbd5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4896,6 +4896,8 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
  */
 int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
 {
+   unsigned long lpcr, lpcr_mask;
+
if (nesting_enabled(kvm))
kvmhv_release_all_nested(kvm);
kvmppc_rmap_reset(kvm);
@@ -4905,8 +4907,13 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
kvm->arch.radix = 0;
spin_unlock(&kvm->mmu_lock);
kvmppc_free_radix(kvm);
-   kvmppc_update_lpcr(kvm, LPCR_VPM1,
-  LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR);
+
+   lpcr = LPCR_VPM1;
+   lpcr_mask = LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR;
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   lpcr_mask |= LPCR_HAIL;
+   kvmppc_update_lpcr(kvm, lpcr, lpcr_mask);
+
return 0;
 }
 
@@ -4916,6 +4923,7 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
  */
 int kvmppc_switch_mmu_to_radix(struct kvm *kvm)
 {
+   unsigned long lpcr, lpcr_mask;
int err;
 
err = kvmppc_init_vm_radix(kvm);
@@ -4927,8 +4935,17 @@ int kvmppc_switch_mmu_to_radix(struct kvm *kvm)
kvm->arch.radix = 1;
spin_unlock(&kvm->mmu_lock);
kvmppc_free_hpt(&kvm->arch.hpt);
-   kvmppc_update_lpcr(kvm, LPCR_UPRT | LPCR_GTSE | LPCR_HR,
-  LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR);
+
+   lpcr = LPCR_UPRT | LPCR_GTSE | LPCR_HR;
+   lpcr_mask = LPCR_VPM1 | LPCR_UPRT | LPCR_GTSE | LPCR_HR;
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   lpcr_mask |= LPCR_HAIL;
+   if (cpu_has_feature(CPU_FTR_HVMODE) &&
+   (kvm->arch.host_lpcr & LPCR_HAIL))
+   lpcr |= LPCR_HAIL;
+   }
+   kvmppc_update_lpcr(kvm, lpcr, lpcr_mask);
+
return 0;
 }
 
@@ -5092,6 +5109,10 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
kvm->arch.mmu_ready = 1;
lpcr &= ~LPCR_VPM1;
lpcr |= LPCR_UPRT | LPCR_GTSE | LPCR_HR;
+   if (cpu_has_feature(CPU_FTR_HVMODE) &&
+   cpu_has_feature(CPU_FTR_ARCH_31) &&
+   (kvm->arch.host_lpcr & LPCR_HAIL))
+   lpcr |= LPCR_HAIL;
ret = kvmppc_init_vm_radix(kvm);
if (ret) {
kvmppc_free_lpid(kvm->arch.lpid);
-- 
2.23.0



[RFC PATCH 06/43] powerpc/time: add API for KVM to re-arm the host timer/decrementer

2021-06-22 Thread Nicholas Piggin
Rather than have KVM look up the host timer and fiddle with the
irq-work internal details, have the powerpc/time.c code provide a
function for KVM to re-arm the Linux timer code when exiting a
guest.

This implementation has an improvement over the existing code: it
marks a decrementer interrupt as soft-pending if a timer has
expired, rather than setting DEC to a -ve value, which tended to
cause host timers to take two interrupts (first hdec to exit the
guest, then the immediate dec).

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/time.h | 16 +++---
 arch/powerpc/kernel/time.c  | 52 +++--
 arch/powerpc/kvm/book3s_hv.c|  7 ++---
 3 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 69b6be617772..924b2157882f 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -99,18 +99,6 @@ extern void div128_by_32(u64 dividend_high, u64 dividend_low,
 extern void secondary_cpu_time_init(void);
 extern void __init time_init(void);
 
-#ifdef CONFIG_PPC64
-static inline unsigned long test_irq_work_pending(void)
-{
-   unsigned long x;
-
-   asm volatile("lbz %0,%1(13)"
-   : "=r" (x)
-   : "i" (offsetof(struct paca_struct, irq_work_pending)));
-   return x;
-}
-#endif
-
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
 static inline u64 timer_get_next_tb(void)
@@ -118,6 +106,10 @@ static inline u64 timer_get_next_tb(void)
return __this_cpu_read(decrementers_next_tb);
 }
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+void timer_rearm_host_dec(u64 now);
+#endif
+
 /* Convert timebase ticks to nanoseconds */
 unsigned long long tb_to_ns(unsigned long long tb_ticks);
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 026b3c0b648c..7c9de3498548 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -510,6 +510,16 @@ EXPORT_SYMBOL(profile_pc);
  * 64-bit uses a byte in the PACA, 32-bit uses a per-cpu variable...
  */
 #ifdef CONFIG_PPC64
+static inline unsigned long test_irq_work_pending(void)
+{
+   unsigned long x;
+
+   asm volatile("lbz %0,%1(13)"
+   : "=r" (x)
+   : "i" (offsetof(struct paca_struct, irq_work_pending)));
+   return x;
+}
+
 static inline void set_irq_work_pending_flag(void)
 {
asm volatile("stb %0,%1(13)" : :
@@ -553,13 +563,44 @@ void arch_irq_work_raise(void)
preempt_enable();
 }
 
+static void set_dec_or_work(u64 val)
+{
+   set_dec(val);
+   /* We may have raced with new irq work */
+   if (unlikely(test_irq_work_pending()))
+   set_dec(1);
+}
+
 #else  /* CONFIG_IRQ_WORK */
 
 #define test_irq_work_pending()0
 #define clear_irq_work_pending()
 
+static void set_dec_or_work(u64 val)
+{
+   set_dec(val);
+}
 #endif /* CONFIG_IRQ_WORK */
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+void timer_rearm_host_dec(u64 now)
+{
+   u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
+
+   WARN_ON_ONCE(!arch_irqs_disabled());
+   WARN_ON_ONCE(mfmsr() & MSR_EE);
+
+   if (now >= *next_tb) {
+   local_paca->irq_happened |= PACA_IRQ_DEC;
+   } else {
+   now = *next_tb - now;
+   if (now <= decrementer_max)
+   set_dec_or_work(now);
+   }
+}
+EXPORT_SYMBOL_GPL(timer_rearm_host_dec);
+#endif
+
 /*
  * timer_interrupt - gets called when the decrementer overflows,
  * with interrupts disabled.
@@ -620,10 +661,7 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
} else {
now = *next_tb - now;
if (now <= decrementer_max)
-   set_dec(now);
-   /* We may have raced with new irq work */
-   if (test_irq_work_pending())
-   set_dec(1);
+   set_dec_or_work(now);
__this_cpu_inc(irq_stat.timer_irqs_others);
}
 
@@ -865,11 +903,7 @@ static int decrementer_set_next_event(unsigned long evt,
  struct clock_event_device *dev)
 {
__this_cpu_write(decrementers_next_tb, get_tb() + evt);
-   set_dec(evt);
-
-   /* We may have raced with new irq work */
-   if (test_irq_work_pending())
-   set_dec(1);
+   set_dec_or_work(evt);
 
return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 5ec534620e07..36e1db48fccf 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3913,11 +3913,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vc->entry_exit_map = 0x101;
vc->in_guest = 0;
 
-   next_timer = timer_get_next_tb();
-   set_dec(next_timer - tb);
-   /* We may have raced with new irq work */
-   if (test_irq_work_pending())
-   set_dec(1);
+   timer_rearm_host_dec(tb);
+

[RFC PATCH 05/43] KVM: PPC: Book3S HV P9: Reduce mftb per guest entry/exit

2021-06-22 Thread Nicholas Piggin
mftb is serialising (dispatch next-to-complete) so it is heavy weight
for a mfspr. Avoid reading it multiple times in the entry or exit paths.
A small number of cycles delay to timers is tolerable.

-118 cycles (9137) POWER9 virt-mode NULL hcall

Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c  | 4 ++--
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a413377aafb5..5ec534620e07 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3794,7 +3794,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
 *
 * XXX: Another day's problem.
 */
-   mtspr(SPRN_DEC, vcpu->arch.dec_expires - mftb());
+   mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb);
 
if (kvmhv_on_pseries()) {
/*
@@ -3914,7 +3914,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vc->in_guest = 0;
 
next_timer = timer_get_next_tb();
-   set_dec(next_timer - mftb());
+   set_dec(next_timer - tb);
/* We may have raced with new irq work */
if (test_irq_work_pending())
set_dec(1);
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 63afd277c5f3..c4f3e066fcb4 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -203,7 +203,8 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
unsigned long host_dawr1;
unsigned long host_dawrx1;
 
-   hdec = time_limit - mftb();
+   tb = mftb();
+   hdec = time_limit - tb;
if (hdec < 0)
return BOOK3S_INTERRUPT_HV_DECREMENTER;
 
@@ -215,7 +216,7 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
vcpu->arch.ceded = 0;
 
if (vc->tb_offset) {
-   u64 new_tb = mftb() + vc->tb_offset;
+   u64 new_tb = tb + vc->tb_offset;
mtspr(SPRN_TBU40, new_tb);
tb = mftb();
if ((tb & 0xff) < (new_tb & 0xff))
-- 
2.23.0



[RFC PATCH 04/43] KVM: PPC: Book3S HV P9: Use large decrementer for HDEC

2021-06-22 Thread Nicholas Piggin
On processors that don't suppress the HDEC exceptions when LPCR[HDICE]=0,
this could help reduce needless guest exits due to leftover exceptions on
entering the guest.

Reviewed-by: Alexey Kardashevskiy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/time.h   | 2 ++
 arch/powerpc/kernel/time.c| 1 +
 arch/powerpc/kvm/book3s_hv_p9_entry.c | 3 ++-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index fd09b4797fd7..69b6be617772 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -18,6 +18,8 @@
 #include 
 
 /* time.c */
+extern u64 decrementer_max;
+
 extern unsigned long tb_ticks_per_jiffy;
 extern unsigned long tb_ticks_per_usec;
 extern unsigned long tb_ticks_per_sec;
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 98bdd96141f2..026b3c0b648c 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -89,6 +89,7 @@ static struct clocksource clocksource_timebase = {
 
 #define DECREMENTER_DEFAULT_MAX 0x7FFF
 u64 decrementer_max = DECREMENTER_DEFAULT_MAX;
+EXPORT_SYMBOL_GPL(decrementer_max); /* for KVM HDEC */
 
 static int decrementer_set_next_event(unsigned long evt,
  struct clock_event_device *dev);
diff --git a/arch/powerpc/kvm/book3s_hv_p9_entry.c 
b/arch/powerpc/kvm/book3s_hv_p9_entry.c
index 83f592eadcd2..63afd277c5f3 100644
--- a/arch/powerpc/kvm/book3s_hv_p9_entry.c
+++ b/arch/powerpc/kvm/book3s_hv_p9_entry.c
@@ -489,7 +489,8 @@ int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 
time_limit, unsigned long lpc
vc->tb_offset_applied = 0;
}
 
-   mtspr(SPRN_HDEC, 0x7fff);
+   /* HDEC must be at least as large as DEC, so decrementer_max fits */
+   mtspr(SPRN_HDEC, decrementer_max);
 
save_clear_guest_mmu(kvm, vcpu);
switch_mmu_to_host(kvm, host_pidr);
-- 
2.23.0



[RFC PATCH 03/43] KVM: PPC: Book3S HV P9: Use host timer accounting to avoid decrementer read

2021-06-22 Thread Nicholas Piggin
There is no need to save away the host DEC value, as it is derived
from the host timer subsystem which maintains the next timer time,
so it can be restored from there.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/time.h |  5 +
 arch/powerpc/kernel/time.c  |  1 +
 arch/powerpc/kvm/book3s_hv.c| 14 +++---
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 8c2c3dd4ddba..fd09b4797fd7 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -111,6 +111,11 @@ static inline unsigned long test_irq_work_pending(void)
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+static inline u64 timer_get_next_tb(void)
+{
+   return __this_cpu_read(decrementers_next_tb);
+}
+
 /* Convert timebase ticks to nanoseconds */
 unsigned long long tb_to_ns(unsigned long long tb_ticks);
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index da995c5fb97d..98bdd96141f2 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -108,6 +108,7 @@ struct clock_event_device decrementer_clockevent = {
 EXPORT_SYMBOL(decrementer_clockevent);
 
 DEFINE_PER_CPU(u64, decrementers_next_tb);
+EXPORT_SYMBOL_GPL(decrementers_next_tb);
 static DEFINE_PER_CPU(struct clock_event_device, decrementers);
 
 #define XSEC_PER_SEC (1024*1024)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d19b4ae01642..a413377aafb5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3729,18 +3729,17 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
struct kvmppc_vcore *vc = vcpu->arch.vcore;
struct p9_host_os_sprs host_os_sprs;
s64 dec;
-   u64 tb;
+   u64 tb, next_timer;
int trap, save_pmu;
 
WARN_ON_ONCE(vcpu->arch.ceded);
 
-   dec = mfspr(SPRN_DEC);
tb = mftb();
-   if (dec < 0)
+   next_timer = timer_get_next_tb();
+   if (tb >= next_timer)
return BOOK3S_INTERRUPT_HV_DECREMENTER;
-   local_paca->kvm_hstate.dec_expires = dec + tb;
-   if (local_paca->kvm_hstate.dec_expires < time_limit)
-   time_limit = local_paca->kvm_hstate.dec_expires;
+   if (next_timer < time_limit)
+   time_limit = next_timer;
 
save_p9_host_os_sprs(&host_os_sprs);
 
@@ -3914,7 +3913,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vc->entry_exit_map = 0x101;
vc->in_guest = 0;
 
-   set_dec(local_paca->kvm_hstate.dec_expires - mftb());
+   next_timer = timer_get_next_tb();
+   set_dec(next_timer - mftb());
/* We may have raced with new irq work */
if (test_irq_work_pending())
set_dec(1);
-- 
2.23.0



[RFC PATCH 02/43] KVM: PPC: Book3S HV P9: Use set_dec to set decrementer to host

2021-06-22 Thread Nicholas Piggin
The host Linux timer code arms the decrementer with the value
'decrementers_next_tb - current_tb' using set_dec(), which stores
val - 1 on Book3S-64, which is not quite the same as what KVM does
to re-arm the host decrementer when exiting the guest.

This shouldn't be a significant change, but it makes the logic match
and avoids this small extra change being brought into the next patch.
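
(For reference, a minimal sketch of the set_dec() behaviour described
above; the real helper lives in arch/powerpc/include/asm/time.h and also
covers the 40x/BookE variants, which do not subtract one.)

    static inline void set_dec(u64 val)
    {
            /* Book3S-64: the decrementer is programmed with val - 1 */
            mtspr(SPRN_DEC, val - 1);
    }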

Suggested-by: Alexey Kardashevskiy 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 97f3d6d54b61..d19b4ae01642 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3914,7 +3914,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vc->entry_exit_map = 0x101;
vc->in_guest = 0;
 
-   mtspr(SPRN_DEC, local_paca->kvm_hstate.dec_expires - mftb());
+   set_dec(local_paca->kvm_hstate.dec_expires - mftb());
/* We may have raced with new irq work */
if (test_irq_work_pending())
set_dec(1);
-- 
2.23.0



[RFC PATCH 01/43] powerpc/64s: Remove WORT SPR from POWER9/10

2021-06-22 Thread Nicholas Piggin
This register is not architected and not implemented in POWER9 or 10;
it just reads back zeroes for compatibility.

-78 cycles (9255) POWER9 virt-mode NULL hcall

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kvm/book3s_hv.c  | 3 ---
 arch/powerpc/platforms/powernv/idle.c | 2 --
 2 files changed, 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9228042bd54f..97f3d6d54b61 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3640,7 +3640,6 @@ static void load_spr_state(struct kvm_vcpu *vcpu)
mtspr(SPRN_EBBHR, vcpu->arch.ebbhr);
mtspr(SPRN_EBBRR, vcpu->arch.ebbrr);
mtspr(SPRN_BESCR, vcpu->arch.bescr);
-   mtspr(SPRN_WORT, vcpu->arch.wort);
mtspr(SPRN_TIDR, vcpu->arch.tid);
mtspr(SPRN_AMR, vcpu->arch.amr);
mtspr(SPRN_UAMOR, vcpu->arch.uamor);
@@ -3667,7 +3666,6 @@ static void store_spr_state(struct kvm_vcpu *vcpu)
vcpu->arch.ebbhr = mfspr(SPRN_EBBHR);
vcpu->arch.ebbrr = mfspr(SPRN_EBBRR);
vcpu->arch.bescr = mfspr(SPRN_BESCR);
-   vcpu->arch.wort = mfspr(SPRN_WORT);
vcpu->arch.tid = mfspr(SPRN_TIDR);
vcpu->arch.amr = mfspr(SPRN_AMR);
vcpu->arch.uamor = mfspr(SPRN_UAMOR);
@@ -3699,7 +3697,6 @@ static void restore_p9_host_os_sprs(struct kvm_vcpu *vcpu,
struct p9_host_os_sprs *host_os_sprs)
 {
mtspr(SPRN_PSPB, 0);
-   mtspr(SPRN_WORT, 0);
mtspr(SPRN_UAMOR, 0);
 
mtspr(SPRN_DSCR, host_os_sprs->dscr);
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 528a7e0cf83a..180baecad914 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -667,7 +667,6 @@ static unsigned long power9_idle_stop(unsigned long psscr)
sprs.purr   = mfspr(SPRN_PURR);
sprs.spurr  = mfspr(SPRN_SPURR);
sprs.dscr   = mfspr(SPRN_DSCR);
-   sprs.wort   = mfspr(SPRN_WORT);
sprs.ciabr  = mfspr(SPRN_CIABR);
 
sprs.mmcra  = mfspr(SPRN_MMCRA);
@@ -785,7 +784,6 @@ static unsigned long power9_idle_stop(unsigned long psscr)
mtspr(SPRN_PURR,sprs.purr);
mtspr(SPRN_SPURR,   sprs.spurr);
mtspr(SPRN_DSCR,sprs.dscr);
-   mtspr(SPRN_WORT,sprs.wort);
mtspr(SPRN_CIABR,   sprs.ciabr);
 
mtspr(SPRN_MMCRA,   sprs.mmcra);
-- 
2.23.0



[RFC PATCH 00/43] KVM: PPC: Book3S HV P9: entry/exit optimisations round 1

2021-06-22 Thread Nicholas Piggin
This series applies to powerpc topic/ppc-kvm branch (KVM Cify
series in particular), plus "KVM: PPC: Book3S HV Nested: Reflect L2 PMU
in-use to L0 when L2 SPRs are live" posted to kvm-ppc.

This reduces radix guest full entry/exit latency on POWER9 and POWER10
by almost 2x (hash is similar but it's still significantly slower than
the P7/8 real mode handler). Nested HV guests should see speedups with
some smaller improvements in the L1, plus the L0 switching sees many
of the same speedups as a direct guest.

It does this in several main ways:

- Rearrange code to optimise SPR accesses. Mainly, avoid scoreboard
  stalls.

- Test SPR values to avoid mtSPRs where possible. mtSPRs are expensive
  (a small sketch of this pattern follows this list).

- Reduce mftb. mftb is expensive.

- Demand fault certain facilities to avoid saving and/or restoring them
  (at the cost of fault when they are used, but this is mitigated over
  a number of entries, like the facilities when context switching 
  processes). PM, TM, and EBB so far.

- Defer some sequences that are made just in case a guest is interrupted
  in the middle of a critical section to the case where the guest is
  scheduled on a different CPU, rather than every time (at the cost of
  an extra IPI in this case). Namely the tlbsync sequence for radix with
  GTSE, which is very expensive.
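
(The small sketch referred to in the mtSPR item above. It is not lifted
from the series; DSCR is just one of the SPRs the series applies this to,
and the helper name is invented.)

    static void load_guest_dscr(struct kvm_vcpu *vcpu,
                                struct p9_host_os_sprs *host_os_sprs)
    {
            /* mtspr is expensive; only write DSCR if the guest value differs */
            if (vcpu->arch.dscr != host_os_sprs->dscr)
                    mtspr(SPRN_DSCR, vcpu->arch.dscr);
    }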

Some of the numbers quoted in changelogs may have changed a bit with
patches being updated, reordered, etc. They give a bit of a guide, but
I might remove them from the final submission because they're too much
to maintain.

Thanks,
Nick

Nicholas Piggin (43):
  powerpc/64s: Remove WORT SPR from POWER9/10
  KVM: PPC: Book3S HV P9: Use set_dec to set decrementer to host
  KVM: PPC: Book3S HV P9: Use host timer accounting to avoid decrementer
read
  KVM: PPC: Book3S HV P9: Use large decrementer for HDEC
  KVM: PPC: Book3S HV P9: Reduce mftb per guest entry/exit
  powerpc/time: add API for KVM to re-arm the host timer/decrementer
  KVM: PPC: Book3S HV: POWER10 enable HAIL when running radix guests
  powerpc/64s: Keep AMOR SPR a constant ~0 at runtime
  KVM: PPC: Book3S HV: Don't always save PMU for guest capable of
nesting
  powerpc/64s: Always set PMU control registers to frozen/disabled when
not in use
  KVM: PPC: Book3S HV P9: Implement PMU save/restore in C
  KVM: PPC: Book3S HV P9: Factor out yield_count increment
  KVM: PPC: Book3S HV P9: Factor PMU save/load into context switch
functions
  KVM: PPC: Book3S HV P9: Demand fault PMU SPRs when marked not inuse
  KVM: PPC: Book3S HV: CTRL SPR does not require read-modify-write
  KVM: PPC: Book3S HV P9: Move SPRG restore to restore_p9_host_os_sprs
  KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save
host SPRs
  KVM: PPC: Book3S HV P9: Improve mtmsrd scheduling by delaying MSR[EE]
disable
  KVM: PPC: Book3S HV P9: Add kvmppc_stop_thread to match
kvmppc_start_thread
  KVM: PPC: Book3S HV: Change dec_expires to be relative to guest
timebase
  KVM: PPC: Book3S HV P9: Move TB updates
  KVM: PPC: Book3S HV P9: Optimise timebase reads
  KVM: PPC: Book3S HV P9: Avoid SPR scoreboard stalls
  KVM: PPC: Book3S HV P9: Only execute mtSPR if the value changed
  KVM: PPC: Book3S HV P9: Juggle SPR switching around
  KVM: PPC: Book3S HV P9: Move vcpu register save/restore into functions
  KVM: PPC: Book3S HV P9: Move host OS save/restore functions to
built-in
  KVM: PPC: Book3S HV P9: Move nested guest entry into its own function
  KVM: PPC: Book3S HV P9: Move remaining SPR and MSR access into low
level entry
  KVM: PPC: Book3S HV P9: Implement TM fastpath for guest entry/exit
  KVM: PPC: Book3S HV P9: Switch PMU to guest as late as possible
  KVM: PPC: Book3S HV P9: Restrict DSISR canary workaround to processors
that require it
  KVM: PPC: Book3S HV P9: More SPR speed improvements
  KVM: PPC: Book3S HV P9: Demand fault EBB facility registers
  KVM: PPC: Book3S HV P9: Demand fault TM facility registers
  KVM: PPC: Book3S HV P9: Use Linux SPR save/restore to manage some host
SPRs
  KVM: PPC: Book3S HV P9: Comment and fix MMU context switching code
  KVM: PPC: Book3S HV P9: Test dawr_enabled() before saving host DAWR
SPRs
  KVM: PPC: Book3S HV P9: Don't restore PSSCR if not needed
  KVM: PPC: Book3S HV P9: Avoid tlbsync sequence on radix guest exit
  KVM: PPC: Book3S HV Nested: Avoid extra mftb() in nested entry
  KVM: PPC: Book3S HV P9: Improve mfmsr performance on entry
  KVM: PPC: Book3S HV P9: Optimise hash guest SLB saving

 arch/powerpc/include/asm/asm-prototypes.h |   5 -
 arch/powerpc/include/asm/kvm_asm.h|   1 +
 arch/powerpc/include/asm/kvm_book3s.h |   6 +
 arch/powerpc/include/asm/kvm_book3s_64.h  |   4 +-
 arch/powerpc/include/asm/kvm_host.h   |   5 +-
 arch/powerpc/include/asm/switch_to.h  |   2 +
 arch/powerpc/include/asm/time.h   |  19 +-
 arch/powerpc/kernel/cpu_setup_power.c |  12 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |   8 +-
 arch/powerpc/kernel/process.c |  

Re: [PATCH] KVM: PPC: Book3S HV: Workaround high stack usage with clang

2021-06-22 Thread Nicholas Piggin
Excerpts from Nathan Chancellor's message of June 22, 2021 4:24 am:
> LLVM does not emit optimal byteswap assembly, which results in high
> stack usage in kvmhv_enter_nested_guest() due to the inlining of
> byteswap_pt_regs(). With LLVM 12.0.0:
> 
> arch/powerpc/kvm/book3s_hv_nested.c:289:6: error: stack frame size of
> 2512 bytes in function 'kvmhv_enter_nested_guest' 
> [-Werror,-Wframe-larger-than=]
> long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>  ^
> 1 error generated.
> 
> While this gets fixed in LLVM, mark byteswap_pt_regs() as
> noinline_for_stack so that it does not get inlined and break the build
> due to -Werror by default in arch/powerpc/. Not inlining saves
> approximately 800 bytes with LLVM 12.0.0:
> 
> arch/powerpc/kvm/book3s_hv_nested.c:290:6: warning: stack frame size of
> 1728 bytes in function 'kvmhv_enter_nested_guest' [-Wframe-larger-than=]
> long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>  ^
> 1 warning generated.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/1292
> Link: https://bugs.llvm.org/show_bug.cgi?id=49610
> Link: https://lore.kernel.org/r/202104031853.vdt0qjqj-...@intel.com/
> Link: https://gist.github.com/ba710e3703bf45043a31e2806c843ffd
> Reported-by: kernel test robot 
> Signed-off-by: Nathan Chancellor 

Seems okay to me. If it was something where performance might be 
significant I guess you could ifdef on CC_IS_CLANG, but for this
it shouldn't matter.

Acked-by: Nicholas Piggin 

Thanks,
Nick

> ---
>  arch/powerpc/kvm/book3s_hv_nested.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/kvm/book3s_hv_nested.c
> index 60724f674421..1b3ff0af1264 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -53,7 +53,8 @@ void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct 
> hv_guest_state *hr)
>   hr->dawrx1 = vcpu->arch.dawrx1;
>  }
>  
> -static void byteswap_pt_regs(struct pt_regs *regs)
> +/* Use noinline_for_stack due to https://bugs.llvm.org/show_bug.cgi?id=49610 
> */
> +static noinline_for_stack void byteswap_pt_regs(struct pt_regs *regs)
>  {
>   unsigned long *addr = (unsigned long *) regs;
>  
> 
> base-commit: 4a21192e2796c3338c4b0083b494a84a61311aaf
> -- 
> 2.32.0.93.g670b81a890
> 
> 


Re: [PATCH 4/4] powerpc: Enable KFENCE on BOOK3S/64

2021-06-22 Thread Michael Ellerman
Jordan Niethe  writes:
> From: Christophe Leroy 
>
> This reuses the DEBUG_PAGEALLOC logic.
>
> Tested with CONFIG_KFENCE + CONFIG_KUNIT + CONFIG_KFENCE_KUNIT_TEST on
> radix and hash.
>
> Signed-off-by: Christophe Leroy 
> [jpn: Handle radix]
> Signed-off-by: Jordan Niethe 
> ---
>  arch/powerpc/Kconfig |  2 +-
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  2 +-
>  arch/powerpc/include/asm/kfence.h| 19 +++
>  arch/powerpc/mm/book3s64/hash_utils.c| 12 ++--
>  arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +---
>  5 files changed, 32 insertions(+), 11 deletions(-)

This makes lockdep very unhappy :(

  [   24.016750][C0] 
  [   24.017145][C0] WARNING: inconsistent lock state
  [   24.017600][C0] 5.13.0-rc2-00196-g8bf29f9c76e2 #1 Not tainted
  [   24.018222][C0] 
  [   24.018612][C0] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
  [   24.019146][C0] S55runtest/104 [HC0[0]:SC1[1]:HE1:SE0] takes:
  [   24.019695][C0] c278bf50 
(init_mm.page_table_lock){+.?.}-{2:2}, at: change_page_attr+0x54/0x290
  [   24.021847][C0] {SOFTIRQ-ON-W} state was registered at:
  [   24.022353][C0]   lock_acquire+0x128/0x600
  [   24.022941][C0]   _raw_spin_lock+0x54/0x80
  [   24.023301][C0]   change_page_attr+0x54/0x290
  [   24.023667][C0]   __apply_to_page_range+0x550/0xa70
  [   24.024070][C0]   change_memory_attr+0x7c/0x140
  [   24.024445][C0]   bpf_prog_select_runtime+0x230/0x2a0
  [   24.024911][C0]   bpf_migrate_filter+0x18c/0x1e0
  [   24.025310][C0]   bpf_prog_create+0x178/0x1d0
  [   24.025681][C0]   ptp_classifier_init+0x4c/0x80
  [   24.026090][C0]   sock_init+0xe0/0x100
  [   24.026422][C0]   do_one_initcall+0x88/0x4b0
  [   24.026790][C0]   kernel_init_freeable+0x364/0x40c
  [   24.027196][C0]   kernel_init+0x24/0x188
  [   24.027539][C0]   ret_from_kernel_thread+0x5c/0x70
  [   24.027987][C0] irq event stamp: 1322
  [   24.028315][C0] hardirqs last  enabled at (1322): [] 
_raw_spin_unlock_irqrestore+0x94/0xd0
  [   24.029084][C0] hardirqs last disabled at (1321): [] 
_raw_spin_lock_irqsave+0xa8/0xc0
  [   24.029813][C0] softirqs last  enabled at (738): [] 
__do_softirq+0x5f8/0x668
  [   24.030531][C0] softirqs last disabled at (1271): [] 
__irq_exit_rcu+0x1c4/0x1d0
  [   24.031271][C0]
  [   24.031271][C0] other info that might help us debug this:
  [   24.031917][C0]  Possible unsafe locking scenario:
  [   24.031917][C0]
  [   24.032460][C0]CPU0
  [   24.032720][C0]
  [   24.033400][C0]   <Interrupt>
  [   24.033400][C0]   
  [   24.033668][C0] lock(init_mm.page_table_lock);
  [   24.034102][C0]
  [   24.034102][C0]  *** DEADLOCK ***
  [   24.034102][C0]
  [   24.034735][C0] 5 locks held by S55runtest/104:
  [   24.035162][C0]  #0: ca9ef098 (>ldisc_sem){}-{0:0}, 
at: tty_ldisc_ref_wait+0x3c/0xa0
  [   24.035998][C0]  #1: ca9ef130 
(>atomic_write_lock){+.+.}-{3:3}, at: file_tty_write.constprop.0+0xd8/0x3b0
  [   24.036849][C0]  #2: ca9ef2e8 
(>termios_rwsem){}-{3:3}, at: n_tty_write+0xd0/0x6b0
  [   24.037591][C0]  #3: c008001d2378 
(>output_lock){+.+.}-{3:3}, at: n_tty_write+0x248/0x6b0
  [   24.038342][C0]  #4: c2618448 (rcu_callback){}-{0:0}, at: 
rcu_core+0x450/0x1360
  [   24.039093][C0]
  [   24.039093][C0] stack backtrace:
  [   24.039727][C0] CPU: 0 PID: 104 Comm: S55runtest Not tainted 
5.13.0-rc2-00196-g8bf29f9c76e2 #1
  [   24.040790][C0] Call Trace:
  [   24.041120][C0] [cadc2be0] [c0940868] 
dump_stack+0xec/0x144 (unreliable)
  [   24.041925][C0] [cadc2c30] [c01f1b38] 
print_usage_bug.part.0+0x24c/0x278
  [   24.042611][C0] [cadc2cd0] [c01eb0c0] 
mark_lock+0x950/0xc00
  [   24.043186][C0] [cadc2df0] [c01ebb74] 
__lock_acquire+0x494/0x28b0
  [   24.043794][C0] [cadc2f20] [c01eeba8] 
lock_acquire+0x128/0x600
  [   24.044384][C0] [cadc3020] [c1098f64] 
_raw_spin_lock+0x54/0x80
  [   24.044976][C0] [cadc3050] [c008aa14] 
change_page_attr+0x54/0x290
  [   24.045586][C0] [cadc30b0] [c04347e0] 
__apply_to_page_range+0x550/0xa70
  [   24.046238][C0] [cadc31a0] [c008accc] 
change_memory_attr+0x7c/0x140
  [   24.046857][C0] [cadc31e0] [c0099f78] 
radix__kernel_map_pages+0x68/0x80
  [   24.047501][C0] [cadc3200] [c04a8028] 
kfence_protect+0x48/0x80
  [   24.048091][C0] [cadc3230] [c04a84a8] 
kfence_guarded_free+0x448/0x590
  [   24.048718][C0] [cadc3290] [c049e1b0] 
__slab_free+0x400/0x6c0
  [   24.049307][C0] 

Re: [PATCH 2/2] powerpc/64s/interrupt: Check and fix srr_valid without crashing

2021-06-22 Thread Christophe Leroy




On 22/06/2021 at 10:54, Nicholas Piggin wrote:

Excerpts from Christophe Leroy's message of June 22, 2021 4:47 pm:



On 22/06/2021 at 08:04, Nicholas Piggin wrote:

The PPC_RFI_SRR_DEBUG check added by patch "powerpc/64s: avoid reloading
(H)SRR registers if they are still valid" has a few deficiencies. It
does not fix the actual problem, it's not enabled by default, and it
causes a program check interrupt which can cause more difficulties.

However there are a lot of paths which may clobber SRRs or change return
regs, and it is difficult to have high confidence that all paths are covered
without wider testing.

Add a relatively low overhead always-enabled check that catches most
such cases, reports once, and fixes it so the kernel can continue.

Signed-off-by: Nicholas Piggin 
---
   arch/powerpc/kernel/interrupt.c | 58 +
   1 file changed, 58 insertions(+)

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 05fa3ae56e25..5920a3e8d1d5 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -231,6 +231,56 @@ static notrace void booke_load_dbcr0(void)
   #endif
   }
   
+#include  /* for show_regs */

+static void check_return_regs_valid(struct pt_regs *regs)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+   static bool warned = false;
+
+   if (regs->trap == 0x980 || regs->trap == 0xe00 || regs->trap == 0xe20 ||
+   regs->trap == 0xe40 || regs->trap == 0xe60 || regs->trap == 0xe80 ||
+   regs->trap == 0xea0 || regs->trap == 0xf80 || regs->trap == 0x1200 
||
+   regs->trap == 0x1500 || regs->trap == 0x1600 || regs->trap == 
0x1800) {


Can you use names defined in asm/interrupt.h instead of raw values ?
Some are already there, others can be added.


Good idea. Could go into a helper too actually.

I wanted to clean up the KVM definitions and unify them with interrupt.h
defs but that's a bit of churn. Can I get to that in the next merge or
so?




Sure

Christophe


Re: [PATCH 2/2] powerpc/64s/interrupt: Check and fix srr_valid without crashing

2021-06-22 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of June 22, 2021 4:47 pm:
> 
> 
> On 22/06/2021 at 08:04, Nicholas Piggin wrote:
>> The PPC_RFI_SRR_DEBUG check added by patch "powerpc/64s: avoid reloading
>> (H)SRR registers if they are still valid" has a few deficiencies. It
>> does not fix the actual problem, it's not enabled by default, and it
>> causes a program check interrupt which can cause more difficulties.
>> 
>> However there are a lot of paths which may clobber SRRs or change return
>> regs, and it is difficult to have high confidence that all paths are covered
>> without wider testing.
>> 
>> Add a relatively low overhead always-enabled check that catches most
>> such cases, reports once, and fixes it so the kernel can continue.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/powerpc/kernel/interrupt.c | 58 +
>>   1 file changed, 58 insertions(+)
>> 
>> diff --git a/arch/powerpc/kernel/interrupt.c 
>> b/arch/powerpc/kernel/interrupt.c
>> index 05fa3ae56e25..5920a3e8d1d5 100644
>> --- a/arch/powerpc/kernel/interrupt.c
>> +++ b/arch/powerpc/kernel/interrupt.c
>> @@ -231,6 +231,56 @@ static notrace void booke_load_dbcr0(void)
>>   #endif
>>   }
>>   
>> +#include  /* for show_regs */
>> +static void check_return_regs_valid(struct pt_regs *regs)
>> +{
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +static bool warned = false;
>> +
>> +if (regs->trap == 0x980 || regs->trap == 0xe00 || regs->trap == 0xe20 ||
>> +regs->trap == 0xe40 || regs->trap == 0xe60 || regs->trap == 0xe80 ||
>> +regs->trap == 0xea0 || regs->trap == 0xf80 || regs->trap == 0x1200 
>> ||
>> +regs->trap == 0x1500 || regs->trap == 0x1600 || regs->trap == 
>> 0x1800) {
> 
> Can you use names defined in asm/interrupt.h instead of raw values ?
> Some are already there, others can be added.

Good idea. Could go into a helper too actually.

I wanted to clean up the KVM definitions and unify them with interrupt.h 
defs but that's a bit of churn. Can I get to that in the next merge or 
so?

Thanks,
Nick
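
(A rough sketch of the helper idea mentioned above, purely illustrative
and not a posted patch: collect the "delivered via HSRR0/HSRR1 rather
than SRR0/SRR1" test in one place. Some of these vectors may already have
INTERRUPT_* names in asm/interrupt.h, others would still need to be added
as noted above, so the raw values are kept as case labels here with their
meaning in comments; the function name is invented.)

    /* True for interrupts that use HSRR0/HSRR1 rather than SRR0/SRR1 */
    static bool regs_is_hsrr_interrupt(struct pt_regs *regs)
    {
            switch (regs->trap) {
            case 0x980:     /* hypervisor decrementer */
            case 0xe00:     /* hypervisor data storage */
            case 0xe20:     /* hypervisor instruction storage */
            case 0xe40:     /* hypervisor emulation assist */
            case 0xe60:     /* hypervisor maintenance (HMI) */
            case 0xe80:     /* hypervisor doorbell */
            case 0xea0:     /* hypervisor virtualization */
            case 0xf80:     /* hypervisor facility unavailable */
            case 0x1200: case 0x1500: case 0x1600: case 0x1800:
                    return true;
            default:
                    return false;
            }
    }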


Re: [powerpc][next-20210621] WARNING at kernel/sched/fair.c:3277 during boot

2021-06-22 Thread Vincent Guittot
Hi Sachin,

On Tue, 22 Jun 2021 at 09:39, Sachin Sant  wrote:
>
> While booting 5.13.0-rc7-next-20210621 on a PowerVM LPAR following warning
> is seen
>
> [   30.922154] [ cut here ]
> [   30.922201] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || 
> cfs_rq->avg.runnable_avg
> [   30.922219] WARNING: CPU: 6 PID: 762 at kernel/sched/fair.c:3277 
> update_blocked_averages+0x758/0x780
> [   30.922259] Modules linked in: pseries_rng xts vmx_crypto uio_pdrv_genirq 
> uio sch_fq_codel ip_tables sd_mod t10_pi sg fuse
> [   30.922309] CPU: 6 PID: 762 Comm: augenrules Not tainted 
> 5.13.0-rc7-next-20210621 #1
> [   30.922329] NIP:  c01b27e8 LR: c01b27e4 CTR: 
> c07cfda0
> [   30.922344] REGS: c00023fcb660 TRAP: 0700   Not tainted  
> (5.13.0-rc7-next-20210621)
> [   30.922359] MSR:  80029033   CR: 48488224  
> XER: 0005
> [   30.922394] CFAR: c014d120 IRQMASK: 1
>GPR00: c01b27e4 c00023fcb900 c2a08400 
> 0048
>GPR04: 7fff c00023fcb5c0 0027 
> c00f6fdd7e18
>GPR08: 0023 0001 0027 
> c28a6650
>GPR12: 8000 c00f6fff7680 c00f6fe62600 
> 0032
>GPR16: 0007331a989a c00f6fe62600 c000238a6800 
> 0001
>GPR20:  c2a4dfe0  
> 0006
>GPR24:  c00f6fe63010 0001 
> c00f6fe62680
>GPR28: 0006 c000238a69c0  
> c00f6fe62600
> [   30.922569] NIP [c01b27e8] update_blocked_averages+0x758/0x780
> [   30.922599] LR [c01b27e4] update_blocked_averages+0x754/0x780
> [   30.922624] Call Trace:
> [   30.922631] [c00023fcb900] [c01b27e4] 
> update_blocked_averages+0x754/0x780 (unreliable)
> [   30.922653] [c00023fcba20] [c01bd668] 
> newidle_balance+0x258/0x5c0
> [   30.922674] [c00023fcbab0] [c01bdaac] 
> pick_next_task_fair+0x7c/0x4d0
> [   30.922692] [c00023fcbb10] [c0dcd31c] __schedule+0x15c/0x1780
> [   30.922708] [c00023fcbc50] [c01a5a04] do_task_dead+0x64/0x70
> [   30.922726] [c00023fcbc80] [c0156338] do_exit+0x848/0xcc0
> [   30.922743] [c00023fcbd50] [c0156884] do_group_exit+0x64/0xe0
> [   30.922758] [c00023fcbd90] [c0156924] sys_exit_group+0x24/0x30
> [   30.922774] [c00023fcbdb0] [c00310c0] 
> system_call_exception+0x150/0x2d0
> [   30.922792] [c00023fcbe10] [c000cc5c] 
> system_call_common+0xec/0x278
> [   30.922808] --- interrupt: c00 at 0x7fffb3acddcc
> [   30.922821] NIP:  7fffb3acddcc LR: 7fffb3a27f04 CTR: 
> 
> [   30.922833] REGS: c00023fcbe80 TRAP: 0c00   Not tainted  
> (5.13.0-rc7-next-20210621)
> [   30.922847] MSR:  8280f033   
> CR: 28444202  XER: 
> [   30.922882] IRQMASK: 0
>GPR00: 00ea 7fffc8f21780 7fffb3bf7100 
> 
>GPR04:  000155f142f0  
> 7fffb3d23740
>GPR08: fbad2a87   
> 
>GPR12:  7fffb3d2aeb0 000116be95e0 
> 0032
>GPR16:  7fffc8f21cd8 002d 
> 0024
>GPR20: 7fffc8f21cd4 7fffb3bf4f98 0001 
> 0001
>GPR24: 7fffb3bf0950   
> 0001
>GPR28:   7fffb3d23ec0 
> 
> [   30.923023] NIP [7fffb3acddcc] 0x7fffb3acddcc
> [   30.923035] LR [7fffb3a27f04] 0x7fffb3a27f04
> [   30.923045] --- interrupt: c00
> [   30.923052] Instruction dump:
> [   30.923061] 3863be48 9be97ae6 4bf9a8f9 6000 0fe0 4bfff980 e9210070 
> e8610088
> [   30.923088] 3941 99490003 4bf9a8d9 6000 <0fe0> 4bfffc24 
> 3d22fff5 89297ae3
> [   30.923113] ---[ end trace ed07974d2149c499 ]---
>
> This warning was introduced with commit 9e077b52d86a
> sched/pelt: Check that *_avg are null when *_sum are

Yes. That was exactly the purpose of the patch. There is one last
remaining part which could generate this. I'm going to prepare a patch

Thanks

>
> next-20210618 was good.
>
> Thanks
> -Sachin


[powerpc][next-20210621] WARNING at kernel/sched/fair.c:3277 during boot

2021-06-22 Thread Sachin Sant
While booting 5.13.0-rc7-next-20210621 on a PowerVM LPAR following warning
is seen

[   30.922154] [ cut here ]
[   30.922201] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || 
cfs_rq->avg.runnable_avg
[   30.922219] WARNING: CPU: 6 PID: 762 at kernel/sched/fair.c:3277 
update_blocked_averages+0x758/0x780
[   30.922259] Modules linked in: pseries_rng xts vmx_crypto uio_pdrv_genirq 
uio sch_fq_codel ip_tables sd_mod t10_pi sg fuse
[   30.922309] CPU: 6 PID: 762 Comm: augenrules Not tainted 
5.13.0-rc7-next-20210621 #1
[   30.922329] NIP:  c01b27e8 LR: c01b27e4 CTR: c07cfda0
[   30.922344] REGS: c00023fcb660 TRAP: 0700   Not tainted  
(5.13.0-rc7-next-20210621)
[   30.922359] MSR:  80029033   CR: 48488224  
XER: 0005
[   30.922394] CFAR: c014d120 IRQMASK: 1 
   GPR00: c01b27e4 c00023fcb900 c2a08400 
0048 
   GPR04: 7fff c00023fcb5c0 0027 
c00f6fdd7e18 
   GPR08: 0023 0001 0027 
c28a6650 
   GPR12: 8000 c00f6fff7680 c00f6fe62600 
0032 
   GPR16: 0007331a989a c00f6fe62600 c000238a6800 
0001 
   GPR20:  c2a4dfe0  
0006 
   GPR24:  c00f6fe63010 0001 
c00f6fe62680 
   GPR28: 0006 c000238a69c0  
c00f6fe62600 
[   30.922569] NIP [c01b27e8] update_blocked_averages+0x758/0x780
[   30.922599] LR [c01b27e4] update_blocked_averages+0x754/0x780
[   30.922624] Call Trace:
[   30.922631] [c00023fcb900] [c01b27e4] 
update_blocked_averages+0x754/0x780 (unreliable)
[   30.922653] [c00023fcba20] [c01bd668] newidle_balance+0x258/0x5c0
[   30.922674] [c00023fcbab0] [c01bdaac] 
pick_next_task_fair+0x7c/0x4d0
[   30.922692] [c00023fcbb10] [c0dcd31c] __schedule+0x15c/0x1780
[   30.922708] [c00023fcbc50] [c01a5a04] do_task_dead+0x64/0x70
[   30.922726] [c00023fcbc80] [c0156338] do_exit+0x848/0xcc0
[   30.922743] [c00023fcbd50] [c0156884] do_group_exit+0x64/0xe0
[   30.922758] [c00023fcbd90] [c0156924] sys_exit_group+0x24/0x30
[   30.922774] [c00023fcbdb0] [c00310c0] 
system_call_exception+0x150/0x2d0
[   30.922792] [c00023fcbe10] [c000cc5c] 
system_call_common+0xec/0x278
[   30.922808] --- interrupt: c00 at 0x7fffb3acddcc
[   30.922821] NIP:  7fffb3acddcc LR: 7fffb3a27f04 CTR: 
[   30.922833] REGS: c00023fcbe80 TRAP: 0c00   Not tainted  
(5.13.0-rc7-next-20210621)
[   30.922847] MSR:  8280f033   CR: 
28444202  XER: 
[   30.922882] IRQMASK: 0 
   GPR00: 00ea 7fffc8f21780 7fffb3bf7100 
 
   GPR04:  000155f142f0  
7fffb3d23740 
   GPR08: fbad2a87   
 
   GPR12:  7fffb3d2aeb0 000116be95e0 
0032 
   GPR16:  7fffc8f21cd8 002d 
0024 
   GPR20: 7fffc8f21cd4 7fffb3bf4f98 0001 
0001 
   GPR24: 7fffb3bf0950   
0001 
   GPR28:   7fffb3d23ec0 
 
[   30.923023] NIP [7fffb3acddcc] 0x7fffb3acddcc
[   30.923035] LR [7fffb3a27f04] 0x7fffb3a27f04
[   30.923045] --- interrupt: c00
[   30.923052] Instruction dump:
[   30.923061] 3863be48 9be97ae6 4bf9a8f9 6000 0fe0 4bfff980 e9210070 
e8610088 
[   30.923088] 3941 99490003 4bf9a8d9 6000 <0fe0> 4bfffc24 3d22fff5 
89297ae3 
[   30.923113] ---[ end trace ed07974d2149c499 ]---

This warning was introduced with commit 9e077b52d86a
("sched/pelt: Check that *_avg are null when *_sum are null").

next-20210618 was good.

Thanks
-Sachin

Re: [PATCH v8 4/6] KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE

2021-06-22 Thread Bharata B Rao
On Tue, Jun 22, 2021 at 10:05:45AM +0530, Bharata B Rao wrote:
> On Mon, Jun 21, 2021 at 10:12:42AM -0700, Nathan Chancellor wrote:
> > I have not seen this reported yet so apologies if it has and there is a
> > fix I am missing:
> > 
> > arch/powerpc/kvm/book3s_hv_nested.c:1334:11: error: variable 'ap' is 
> > uninitialized when used here [-Werror,-Wuninitialized]
> >ap, start, end);
> >^~
> > arch/powerpc/kvm/book3s_hv_nested.c:1276:25: note: initialize the variable 
> > 'ap' to silence this warning
> > unsigned long psize, ap;
> >^
> > = 0
> 
> Thanks for catching this, this wasn't caught in my environment.
> 
> I will repost the series with proper initialization to ap.
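
For context, the pattern clang is flagging is the usual one: an
automatic variable read before any assignment. A toy example (not the
actual kernel routine) of what -Wuninitialized catches:

long do_invalidate(unsigned long ap, unsigned long start, unsigned long end);

/*
 * 'ap' is read before it is ever assigned, so its value is
 * indeterminate.  The fix in the patch below derives it from the
 * page size inside the loop via mmu_get_ap().
 */
long invalidate_range(unsigned long pg_sizes, unsigned long start,
		      unsigned long end)
{
	unsigned long psize, ap;	/* 'ap' never initialised */
	long ret = 0;

	for (psize = 0; psize < 16; psize++) {
		if (!(pg_sizes & (1UL << psize)))
			continue;
		/* missing: ap = ...derived from psize... */
		ret = do_invalidate(ap, start, end);	/* clang warns here */
		if (ret)
			break;
	}
	return ret;
}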

Michael,

Here is the fix for this on top of powerpc/next. If it is easier
and cleaner to fold this into the original series and re-post
the whole series against any updated tree, let me know.


>From 2e7198e28c0d1137f3230d4645e9cfddaccf4987 Mon Sep 17 00:00:00 2001
From: Bharata B Rao 
Date: Tue, 22 Jun 2021 12:07:01 +0530
Subject: [PATCH 1/1] KVM: PPC: Book3S HV: Use proper ap value in
 H_RPT_INVALIDATE

The ap value used when performing range-based partition-scoped
invalidations for nested guests wasn't initialized correctly.

Fix this and while we are here, reorganize the routine that does
this invalidation for better readability.

Fixes: 0e67d866cb32 ("KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE")
Signed-off-by: Bharata B Rao 
---
 arch/powerpc/kvm/book3s_hv_nested.c | 90 +
 1 file changed, 40 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index d78efb5f5bb3..3a06ac0b53e2 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1222,27 +1222,6 @@ long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu)
return H_SUCCESS;
 }
 
-static long do_tlb_invalidate_nested_tlb(struct kvm_vcpu *vcpu,
-unsigned long lpid,
-unsigned long page_size,
-unsigned long ap,
-unsigned long start,
-unsigned long end)
-{
-   unsigned long addr = start;
-   int ret;
-
-   do {
-   ret = kvmhv_emulate_tlbie_tlb_addr(vcpu, lpid, ap,
-  get_epn(addr));
-   if (ret)
-   return ret;
-   addr += page_size;
-   } while (addr < end);
-
-   return ret;
-}
-
 static long do_tlb_invalidate_nested_all(struct kvm_vcpu *vcpu,
 unsigned long lpid, unsigned long ric)
 {
@@ -1263,6 +1242,42 @@ static long do_tlb_invalidate_nested_all(struct kvm_vcpu 
*vcpu,
  */
 static unsigned long tlb_range_flush_page_ceiling __read_mostly = 33;
 
+static long do_tlb_invalidate_nested_tlb(struct kvm_vcpu *vcpu,
+unsigned long lpid,
+unsigned long pg_sizes,
+unsigned long start,
+unsigned long end)
+{
+   int ret = H_P4;
+   unsigned long addr, nr_pages;
+   struct mmu_psize_def *def;
+   unsigned long psize, ap, page_size;
+   bool flush_lpid;
+
+   for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
+   def = _psize_defs[psize];
+   if (!(pg_sizes & def->h_rpt_pgsize))
+   continue;
+
+   nr_pages = (end - start) >> def->shift;
+   flush_lpid = nr_pages > tlb_range_flush_page_ceiling;
+   if (flush_lpid)
+   return do_tlb_invalidate_nested_all(vcpu, lpid,
+   RIC_FLUSH_TLB);
+   addr = start;
+   ap = mmu_get_ap(psize);
+   page_size = 1UL << def->shift;
+   do {
+   ret = kvmhv_emulate_tlbie_tlb_addr(vcpu, lpid, ap,
+  get_epn(addr));
+   if (ret)
+   return H_P4;
+   addr += page_size;
+   } while (addr < end);
+   }
+   return ret;
+}
+
 /*
  * Performs partition-scoped invalidations for nested guests
  * as part of H_RPT_INVALIDATE hcall.
@@ -1271,10 +1286,6 @@ long do_h_rpt_invalidate_pat(struct kvm_vcpu *vcpu, 
unsigned long lpid,
 unsigned long type, unsigned long pg_sizes,
 unsigned long start, unsigned long end)
 {
-   struct kvm_nested_guest *gp;
-   long ret;
-   unsigned 

Re: [PATCH 2/2] powerpc/64s/interrupt: Check and fix srr_valid without crashing

2021-06-22 Thread Christophe Leroy




Le 22/06/2021 à 08:04, Nicholas Piggin a écrit :

The PPC_RFI_SRR_DEBUG check added by patch "powerpc/64s: avoid reloading
(H)SRR registers if they are still valid" has a few deficiencies. It
does not fix the actual problem, it's not enabled by default, and it
causes a program check interrupt which can cause more difficulties.

However there are a lot of paths which may clobber SRRs or change return
regs, and it is difficult to have high confidence that all paths are
covered without wider testing.

Add a relatively low overhead always-enabled check that catches most
such cases, reports once, and fixes it so the kernel can continue.

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/kernel/interrupt.c | 58 +
  1 file changed, 58 insertions(+)

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 05fa3ae56e25..5920a3e8d1d5 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -231,6 +231,56 @@ static notrace void booke_load_dbcr0(void)
  #endif
  }
  
+#include  /* for show_regs */

+static void check_return_regs_valid(struct pt_regs *regs)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+   static bool warned = false;
+
+   if (regs->trap == 0x980 || regs->trap == 0xe00 || regs->trap == 0xe20 ||
+   regs->trap == 0xe40 || regs->trap == 0xe60 || regs->trap == 0xe80 ||
+   regs->trap == 0xea0 || regs->trap == 0xf80 || regs->trap == 0x1200 
||
+   regs->trap == 0x1500 || regs->trap == 0x1600 || regs->trap == 
0x1800) {


Can you use names defined in asm/interrupt.h instead of raw values?
Some are already there, others can be added.
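
Something along these lines, for instance (sketch only: the hex values
are the ones from your patch, the INTERRUPT_* names are assumed, and
whichever of them are not yet in asm/interrupt.h would need to be added
first):

/*
 * Sketch: does this trap number use HSRR0/HSRR1?  The symbolic names
 * are illustrative; the raw values match the list in the patch.
 */
static bool trap_uses_hsrr(unsigned long trap)
{
	switch (trap) {
	case INTERRUPT_H_DECREMENTER:		/* 0x980  */
	case INTERRUPT_H_DATA_STORAGE:		/* 0xe00  */
	case INTERRUPT_H_INST_STORAGE:		/* 0xe20  */
	case INTERRUPT_H_EMUL_ASSIST:		/* 0xe40  */
	case INTERRUPT_HMI:			/* 0xe60  */
	case INTERRUPT_H_DOORBELL:		/* 0xe80  */
	case INTERRUPT_H_VIRT:			/* 0xea0  */
	case INTERRUPT_HV_FACILITY_UNAVAIL:	/* 0xf80  */
	case 0x1200:
	case 0x1500:
	case 0x1600:
	case 0x1800:
		return true;
	default:
		return false;
	}
}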



+   if (local_paca->hsrr_valid) {
+   unsigned long hsrr0 = mfspr(SPRN_HSRR0);
+   unsigned long hsrr1 = mfspr(SPRN_HSRR1);
+
+   if (hsrr0 == regs->nip && hsrr1 == regs->msr)
+   return;
+
+   if (!warned) {
+   warned = true;
+   printk("HSRR0 was: %lx should be: %lx\n",
+   hsrr0, regs->nip);
+   printk("HSRR1 was: %lx should be: %lx\n",
+   hsrr1, regs->msr);
+   show_regs(regs);
+   }
+   local_paca->hsrr_valid = 0; /* fixup */
+   }
+
+   } else if (regs->trap != 0x3000) {
+   if (local_paca->srr_valid) {
+   unsigned long srr0 = mfspr(SPRN_SRR0);
+   unsigned long srr1 = mfspr(SPRN_SRR1);
+
+   if (srr0 == regs->nip && srr1 == regs->msr)
+   return;
+
+   if (!warned) {
+   warned = true;
+   printk("SRR0 was: %lx should be: %lx\n",
+   srr0, regs->nip);
+   printk("SRR1 was: %lx should be: %lx\n",
+   srr1, regs->msr);
+   show_regs(regs);
+   }
+   local_paca->srr_valid = 0; /* fixup */
+   }
+   }
+#endif
+}
+
  /*
   * This should be called after a syscall returns, with r3 the return value
   * from the syscall. If this function returns non-zero, the system call
@@ -327,6 +377,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
}
}
  
+	check_return_regs_valid(regs);

+
user_enter_irqoff();
  
  	/* scv need not set RI=0 because SRRs are not used */

@@ -405,6 +457,8 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs)
}
}
  
+	check_return_regs_valid(regs);

+
user_enter_irqoff();
  
  	if (unlikely(!__prep_irq_for_enabled_exit(true))) {

@@ -469,9 +523,13 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct 
pt_regs *regs)
}
}
  
+		check_return_regs_valid(regs);

+
if (unlikely(!prep_irq_for_enabled_exit(true, 
!irqs_disabled_flags(flags
goto again;
} else {
+   check_return_regs_valid(regs);
+
/* Returning to a kernel context with local irqs disabled. */
__hard_EE_RI_disable();
  #ifdef CONFIG_PPC64



Re: linux-next: manual merge of the kvm tree with the powerpc tree

2021-06-22 Thread Paolo Bonzini

On 22/06/21 07:25, Stephen Rothwell wrote:

Hi all,

Today's linux-next merge of the kvm tree got a conflict in:

   include/uapi/linux/kvm.h

between commit:

   9bb4a6f38fd4 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE 
capability")

from the powerpc tree and commits:

   644f706719f0 ("KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_ENFORCE_CPUID")
   6dba94035203 ("KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2")
   0dbb11230437 ("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall")

from the kvm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.



What are the dependencies of these KVM patches on patches from the bare 
metal trees, and can you guys *please* start using topic branches?


I've been asking you for literally years, but this is the first time I 
remember that Linus will have to resolve conflicts in uAPI changes and 
it is *not* acceptable.


Please drop the patches at 
https://www.spinics.net/lists/kvm-ppc/msg18666.html from the powerpc 
tree, and merge them through either the kvm-powerpc or kvm trees.


Paolo



[PATCH 2/2] powerpc/64s/interrupt: Check and fix srr_valid without crashing

2021-06-22 Thread Nicholas Piggin
The PPC_RFI_SRR_DEBUG check added by patch "powerpc/64s: avoid reloading
(H)SRR registers if they are still valid" has a few deficiencies. It
does not fix the actual problem, it's not enabled by default, and it
causes a program check interrupt which can cause more difficulties.

However there are a lot of paths which may clobber SRRs or change return
regs, and it is difficult to have high confidence that all paths are
covered without wider testing.

Add a relatively low overhead always-enabled check that catches most
such cases, reports once, and fixes it so the kernel can continue.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/interrupt.c | 58 +
 1 file changed, 58 insertions(+)

diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index 05fa3ae56e25..5920a3e8d1d5 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -231,6 +231,56 @@ static notrace void booke_load_dbcr0(void)
 #endif
 }
 
+#include  /* for show_regs */
+static void check_return_regs_valid(struct pt_regs *regs)
+{
+#ifdef CONFIG_PPC_BOOK3S_64
+   static bool warned = false;
+
+   if (regs->trap == 0x980 || regs->trap == 0xe00 || regs->trap == 0xe20 ||
+   regs->trap == 0xe40 || regs->trap == 0xe60 || regs->trap == 0xe80 ||
+   regs->trap == 0xea0 || regs->trap == 0xf80 || regs->trap == 0x1200 
||
+   regs->trap == 0x1500 || regs->trap == 0x1600 || regs->trap == 
0x1800) {
+   if (local_paca->hsrr_valid) {
+   unsigned long hsrr0 = mfspr(SPRN_HSRR0);
+   unsigned long hsrr1 = mfspr(SPRN_HSRR1);
+
+   if (hsrr0 == regs->nip && hsrr1 == regs->msr)
+   return;
+
+   if (!warned) {
+   warned = true;
+   printk("HSRR0 was: %lx should be: %lx\n",
+   hsrr0, regs->nip);
+   printk("HSRR1 was: %lx should be: %lx\n",
+   hsrr1, regs->msr);
+   show_regs(regs);
+   }
+   local_paca->hsrr_valid = 0; /* fixup */
+   }
+
+   } else if (regs->trap != 0x3000) {
+   if (local_paca->srr_valid) {
+   unsigned long srr0 = mfspr(SPRN_SRR0);
+   unsigned long srr1 = mfspr(SPRN_SRR1);
+
+   if (srr0 == regs->nip && srr1 == regs->msr)
+   return;
+
+   if (!warned) {
+   warned = true;
+   printk("SRR0 was: %lx should be: %lx\n",
+   srr0, regs->nip);
+   printk("SRR1 was: %lx should be: %lx\n",
+   srr1, regs->msr);
+   show_regs(regs);
+   }
+   local_paca->srr_valid = 0; /* fixup */
+   }
+   }
+#endif
+}
+
 /*
  * This should be called after a syscall returns, with r3 the return value
  * from the syscall. If this function returns non-zero, the system call
@@ -327,6 +377,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
}
}
 
+   check_return_regs_valid(regs);
+
user_enter_irqoff();
 
/* scv need not set RI=0 because SRRs are not used */
@@ -405,6 +457,8 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs)
}
}
 
+   check_return_regs_valid(regs);
+
user_enter_irqoff();
 
if (unlikely(!__prep_irq_for_enabled_exit(true))) {
@@ -469,9 +523,13 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct 
pt_regs *regs)
}
}
 
+   check_return_regs_valid(regs);
+
if (unlikely(!prep_irq_for_enabled_exit(true, 
!irqs_disabled_flags(flags
goto again;
} else {
+   check_return_regs_valid(regs);
+
/* Returning to a kernel context with local irqs disabled. */
__hard_EE_RI_disable();
 #ifdef CONFIG_PPC64
-- 
2.23.0



[PATCH 1/2] powerpc/64s: Fix "avoid reloading (H)SRR registers if they are still valid"

2021-06-22 Thread Nicholas Piggin
This picks up a bunch of SRR invalidation cases that were missed, and
tries to convert the entire tree, including 64e and 32-bit, to use the
accessors, to save headaches in future. It also stops doing the general
clobber after changing several things, in favour of always using the
accessors (a few extra stores to the paca are less important than
simplicity of the code). The only place left where return regs are
invalidated is context switching. This also fixes a bug in context
switching, where that invalidation was not done for a newly created
task because of the way ret_from_fork etc. works.
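
For reference, the accessor idiom being converted to looks roughly like
this (simplified sketch; the real helpers in ptrace.h carry more
detail):

/*
 * Every write to the saved return NIP/MSR goes through a helper that
 * also marks the cached (H)SRR values in the paca as stale, so the
 * interrupt-exit path knows it must reload SRR0/1 (or HSRR0/1) rather
 * than trust the cached copy.
 */
static inline void set_return_regs_changed(void)
{
#ifdef CONFIG_PPC_BOOK3S_64
	local_paca->hsrr_valid = 0;
	local_paca->srr_valid = 0;
#endif
}

static inline void regs_set_return_ip(struct pt_regs *regs, unsigned long ip)
{
	regs->nip = ip;
	set_return_regs_changed();
}

static inline void regs_set_return_msr(struct pt_regs *regs, unsigned long msr)
{
	regs->msr = msr;
	set_return_regs_changed();
}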

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/interrupt.h  |  2 +-
 arch/powerpc/include/asm/livepatch.h  |  2 +-
 arch/powerpc/include/asm/probes.h |  4 +-
 arch/powerpc/include/asm/ptrace.h |  2 +-
 arch/powerpc/kernel/hw_breakpoint.c   |  4 +-
 arch/powerpc/kernel/kgdb.c|  8 ++--
 arch/powerpc/kernel/kprobes-ftrace.c  |  2 +-
 arch/powerpc/kernel/kprobes.c | 13 +++---
 arch/powerpc/kernel/mce.c |  2 +-
 arch/powerpc/kernel/optprobes.c   |  2 +-
 arch/powerpc/kernel/process.c | 24 +--
 arch/powerpc/kernel/ptrace/ptrace-adv.c   | 20 +
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 14 +++
 arch/powerpc/kernel/ptrace/ptrace-view.c  |  5 ++-
 arch/powerpc/kernel/signal.c  | 10 ++---
 arch/powerpc/kernel/signal_32.c   | 42 +--
 arch/powerpc/kernel/signal_64.c   | 35 +++-
 arch/powerpc/kernel/traps.c   | 24 +--
 arch/powerpc/kernel/uprobes.c |  4 +-
 arch/powerpc/lib/error-inject.c   |  2 +-
 arch/powerpc/lib/sstep.c  | 17 
 arch/powerpc/lib/test_emulate_step.c  |  1 +
 arch/powerpc/math-emu/math_efp.c  |  2 +-
 arch/powerpc/platforms/embedded6xx/holly.c|  4 +-
 .../platforms/embedded6xx/mpc7448_hpc2.c  |  4 +-
 arch/powerpc/platforms/pasemi/idle.c  |  4 +-
 arch/powerpc/platforms/powernv/opal.c |  2 +-
 arch/powerpc/platforms/pseries/ras.c  |  4 +-
 arch/powerpc/sysdev/fsl_rio.c |  4 +-
 arch/powerpc/xmon/xmon.c  | 14 +++
 30 files changed, 135 insertions(+), 142 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h 
b/arch/powerpc/include/asm/interrupt.h
index 6e9d18838d56..de36fb5d9c51 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -91,7 +91,7 @@ static inline void nap_adjust_return(struct pt_regs *regs)
if (unlikely(test_thread_local_flags(_TLF_NAPPING))) {
/* Can avoid a test-and-clear because NMIs do not call this */
clear_thread_local_flags(_TLF_NAPPING);
-   regs->nip = (unsigned long)power4_idle_nap_return;
+   regs_set_return_ip(regs, (unsigned long)power4_idle_nap_return);
}
 #endif
 }
diff --git a/arch/powerpc/include/asm/livepatch.h 
b/arch/powerpc/include/asm/livepatch.h
index ae25e6e72997..4fe018cc207b 100644
--- a/arch/powerpc/include/asm/livepatch.h
+++ b/arch/powerpc/include/asm/livepatch.h
@@ -16,7 +16,7 @@ static inline void klp_arch_set_pc(struct ftrace_regs *fregs, 
unsigned long ip)
 {
struct pt_regs *regs = ftrace_get_regs(fregs);
 
-   regs->nip = ip;
+   regs_set_return_ip(regs, ip);
 }
 
 #define klp_get_ftrace_location klp_get_ftrace_location
diff --git a/arch/powerpc/include/asm/probes.h 
b/arch/powerpc/include/asm/probes.h
index 84dd1addd434..c5d984700d24 100644
--- a/arch/powerpc/include/asm/probes.h
+++ b/arch/powerpc/include/asm/probes.h
@@ -34,14 +34,14 @@ typedef u32 ppc_opcode_t;
 /* Enable single stepping for the current task */
 static inline void enable_single_step(struct pt_regs *regs)
 {
-   regs->msr |= MSR_SINGLESTEP;
+   regs_set_return_msr(regs, regs->msr | MSR_SINGLESTEP);
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
/*
 * We turn off Critical Input Exception(CE) to ensure that the single
 * step will be for the instruction we have the probe on; if we don't,
 * it is possible we'd get the single step reported for CE.
 */
-   regs->msr &= ~MSR_CE;
+   regs_set_return_msr(regs, regs->msr & ~MSR_CE);
mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM);
 #ifdef CONFIG_PPC_47x
isync();
diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index cd01423b1c24..891ac7189d31 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -242,7 +242,7 @@ static inline void regs_set_return_msr(struct pt_regs 
*regs, unsigned long msr)
 #endif
 }
 
-static inline void return_ip_or_msr_changed(void)
+static inline void set_return_regs_changed(void)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
local_paca->hsrr_valid = 0;
diff --git 

[PATCH 0/2] fast interrupts fixes

2021-06-22 Thread Nicholas Piggin
These are a couple of improvements to the fast interrupt exits series
that fit after patch 4. The first patch is incremental to patch 4 and
should be folded in; the second could go anywhere in the series, but I
based it on top.

I can rebase it or resend the series if needed.

Thanks,
Nick

Nicholas Piggin (2):
  powerpc/64s: Fix "avoid reloading (H)SRR registers if they are still
valid"
  powerpc/64s/interrupt: Check and fix srr_valid without crashing

 arch/powerpc/include/asm/interrupt.h  |  2 +-
 arch/powerpc/include/asm/livepatch.h  |  2 +-
 arch/powerpc/include/asm/probes.h |  4 +-
 arch/powerpc/include/asm/ptrace.h |  2 +-
 arch/powerpc/kernel/hw_breakpoint.c   |  4 +-
 arch/powerpc/kernel/interrupt.c   | 58 +++
 arch/powerpc/kernel/kgdb.c|  8 +--
 arch/powerpc/kernel/kprobes-ftrace.c  |  2 +-
 arch/powerpc/kernel/kprobes.c | 13 +++--
 arch/powerpc/kernel/mce.c |  2 +-
 arch/powerpc/kernel/optprobes.c   |  2 +-
 arch/powerpc/kernel/process.c | 24 
 arch/powerpc/kernel/ptrace/ptrace-adv.c   | 20 ---
 arch/powerpc/kernel/ptrace/ptrace-noadv.c | 14 ++---
 arch/powerpc/kernel/ptrace/ptrace-view.c  |  5 +-
 arch/powerpc/kernel/signal.c  | 10 ++--
 arch/powerpc/kernel/signal_32.c   | 42 +++---
 arch/powerpc/kernel/signal_64.c   | 35 +--
 arch/powerpc/kernel/traps.c   | 24 
 arch/powerpc/kernel/uprobes.c |  4 +-
 arch/powerpc/lib/error-inject.c   |  2 +-
 arch/powerpc/lib/sstep.c  | 17 +++---
 arch/powerpc/lib/test_emulate_step.c  |  1 +
 arch/powerpc/math-emu/math_efp.c  |  2 +-
 arch/powerpc/platforms/embedded6xx/holly.c|  4 +-
 .../platforms/embedded6xx/mpc7448_hpc2.c  |  4 +-
 arch/powerpc/platforms/pasemi/idle.c  |  4 +-
 arch/powerpc/platforms/powernv/opal.c |  2 +-
 arch/powerpc/platforms/pseries/ras.c  |  4 +-
 arch/powerpc/sysdev/fsl_rio.c |  4 +-
 arch/powerpc/xmon/xmon.c  | 14 ++---
 31 files changed, 193 insertions(+), 142 deletions(-)

-- 
2.23.0